[jira] [Issue Comment Edited] (HBASE-5816) Balancer and ServerShutdownHandler concurrently reassigning the same region
[ https://issues.apache.org/jira/browse/HBASE-5816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257195#comment-13257195 ] Zhihong Yu edited comment on HBASE-5816 at 4/19/12 4:18 AM:

stack, your suggestion seems the ultimate solution to the current HMaster workflow. Trunk has the same problem. The case I attached was actually introduced by HBASE-5396, which tried to let ServerShutdownHandler assign the region at an earlier stage instead of waiting for the TimeoutMonitor to do the job. But the isRegionOnline test seems too weak here.
{code}
for (HRegionInfo hri : regionsFromRegionPlansForServer) {
  if (!this.services.getAssignmentManager().isRegionOnline(hri)) {
    this.services.getAssignmentManager().assign(hri, true);
    reassignedPlans++;
  }
}
{code}
However, I think any client call to HBaseAdmin.assign() that coincides at this point would cause the same problem. There is a lock guarding the private assign() method to deal with concurrent assigns, but the entire assign process is not atomic. It should be safe for the later thread to just return or get an exception if the region has already been assigned by an earlier thread.

was (Author: maryannxue):
stack, your suggestion seems the ultimate solution to the current HMaster workflow. Trunk has the same problem. The case I attached was actually introduced by HBASE-5396, which tried to let ServerShutdownHandler assign the region at an earlier stage instead of waiting for the TimeoutMonitor to do the job. But the isRegionOnline test seems too weak here.
{code}
for (HRegionInfo hri : regionsFromRegionPlansForServer) {
  if (!this.services.getAssignmentManager().isRegionOnline(hri)) {
    this.services.getAssignmentManager().assign(hri, true);
    reassignedPlans++;
  }
}
{code}
However, I think any client call to HBaseAdmin.assign() that coincides at this point would cause the same problem.
There is a lock guarding the private assign() method to deal with concurrent assigns, but the entire assign process is not atomic. It should be safe for the later thread to just return or get an exception if the region has already been assigned by an earlier thread.

Balancer and ServerShutdownHandler concurrently reassigning the same region
---
Key: HBASE-5816
URL: https://issues.apache.org/jira/browse/HBASE-5816
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.90.6
Reporter: Maryann Xue
Priority: Critical
Attachments: HBASE-5816.patch

The first assign thread exits with success after updating the RegionState to PENDING_OPEN, while the second assign follows immediately into assign and fails the RegionState check in setOfflineInZooKeeper(). This causes the master to abort. In the case below, the two concurrent assigns occurred when the AM tried to assign a region to a dying/dead RS, and meanwhile the ServerShutdownHandler tried to assign this region (from the region plan) spontaneously.

2012-04-17 05:44:57,648 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., src=hadoop05.sh.intel.com,60020,1334544902186, dest=xmlqa-clv16.sh.intel.com,60020,1334612497253
2012-04-17 05:44:57,648 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. (offlining)
2012-04-17 05:44:57,648 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=hadoop05.sh.intel.com,60020,1334544902186, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.
2012-04-17 05:44:57,666 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase/unassigned/fe38fe31caf40b6e607a3e6bbed6404b (region=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., server=hadoop05.sh.intel.com,60020,1334544902186, state=RS_ZK_REGION_CLOSING)
2012-04-17 05:52:58,984 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. state=CLOSED, ts=1334612697672, server=hadoop05.sh.intel.com,60020,1334544902186
2012-04-17 05:52:58,984 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x236b912e9b3000e Creating (or updating) unassigned node for fe38fe31caf40b6e607a3e6bbed6404b with OFFLINE state
2012-04-17 05:52:59,096 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region
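The "later thread should just return" idea from the comment above can be sketched with an atomic claim on the region state, so a second concurrent assign (balancer, ServerShutdownHandler, or HBaseAdmin.assign()) backs off instead of failing the ZK state check and aborting the master. This is a hypothetical standalone sketch, not HBase's actual AssignmentManager API; class and method names are illustrative.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: claim the region atomically before doing any assign
// work, so only one of several concurrent assign callers proceeds.
class IdempotentAssign {
  enum State { PENDING_OPEN, OPEN }

  private final ConcurrentMap<String, State> regionStates = new ConcurrentHashMap<>();

  /** Returns true only for the caller that won the race and should open the region. */
  public boolean tryAssign(String encodedRegionName) {
    // putIfAbsent is atomic: only one thread installs PENDING_OPEN.
    State prev = regionStates.putIfAbsent(encodedRegionName, State.PENDING_OPEN);
    // A non-null previous state means another assign already claimed the
    // region; the later thread just returns instead of aborting.
    return prev == null;
  }

  public void markOpen(String encodedRegionName) {
    regionStates.put(encodedRegionName, State.OPEN);
  }
}
```

The point of the sketch is that the check-and-transition is a single atomic step, whereas in the scenario above the check (isRegionOnline) and the transition (assign) are separate, leaving a window for two actors to pass the check.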
[jira] [Issue Comment Edited] (HBASE-5677) The master never does balance because duplicate openhandled the one region
[ https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250443#comment-13250443 ] Zhihong Yu edited comment on HBASE-5677 at 4/10/12 3:58 AM:

Some comments about coding style:
{code}
+ //the master is running and it can provide service
+ public boolean isMasterAvailable() {
+ return !isStopped() && isInitialized();
+ }
{code}
@Override is missing for the above method. Please leave a space between // and the comment. Indentation for the return line should be 4 spaces, i.e. 'r' of return should be under 'b' of public.
{code}
+ if(isAvailable) {
{code}
Please leave a space between if and the left parenthesis.
{code}
+ throw new MasterNotRunningException();
{code}
You can create a new exception in place of MasterNotRunningException above.

was (Author: zhi...@ebaysf.com):
Some comments about coding style:
{code}
+ //the master is running and it can provide service
+ public boolean isMasterAvailable() {
+ return !isStopped() && isInitialized();
+ }
{code}
@Override is missing for the above method. Please leave a space between // and the comment. Indentation for the return line should be 4 spaces, i.e. 'r' of return should be under 'b' of public.
{code}
+ if(isAvailable) {
{code}
Please leave a space between if and the left parenthesis.
{code}
+ throw new MasterNotRunningException();
{code}
You can create a new exception or provide a cause to MasterNotRunningException.

The master never does balance because duplicate openhandled the one region
--
Key: HBASE-5677
URL: https://issues.apache.org/jira/browse/HBASE-5677
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.6
Environment: 0.90
Reporter: xufeng
Assignee: xufeng
Attachments: HBASE-5677-90-v1.patch, surefire-report_no_patched_v1.html, surefire-report_patched_v1.html

If a region is assigned while the master is doing initialization (before processFailover), the region will be open-handled twice.
Because the unassigned node in ZooKeeper will be handled again in AssignmentManager#processFailover(), the region stays in RIT, and thus the master never does balance.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
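For illustration, here is a minimal self-contained sketch of the reviewed method with the style comments above applied (@Override present, a space after //, a space after if, four-space indentation for the body). The interface and stub fields are hypothetical stand-ins, not the actual HMaster code:

```java
// Hypothetical interface standing in for the RPC interface that would
// declare isMasterAvailable(), so @Override applies.
interface MasterServices {
  boolean isMasterAvailable();
}

class SketchMaster implements MasterServices {
  // Stub state flags standing in for the real master's lifecycle state.
  private volatile boolean stopped = false;
  private volatile boolean initialized = true;

  boolean isStopped() { return stopped; }
  boolean isInitialized() { return initialized; }

  // the master is running and it can provide service
  @Override
  public boolean isMasterAvailable() {
    return !isStopped() && isInitialized();
  }
}
```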
[jira] [Issue Comment Edited] (HBASE-5689) Skip RecoveredEdits may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243057#comment-13243057 ] Zhihong Yu edited comment on HBASE-5689 at 3/31/12 2:00 PM:

@Chunhui Good test case. Yes, I was able to reproduce the problem :).. Just trying to understand more on what is happening there.

was (Author: ram_krish):
@Chunhui Good test case. Yes, I was able to reproduce the problem :).. Just try to understand more on what is happening there.

Skip RecoveredEdits may cause data loss
---
Key: HBASE-5689
URL: https://issues.apache.org/jira/browse/HBASE-5689
Project: HBase
Issue Type: Bug
Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
Attachments: 5689-testcase.patch, HBASE-5689.patch

Let's see the following scenario:
1. Region is on server A
2. put KV(r1-v1) to the region
3. move the region from server A to server B
4. put KV(r2-v2) to the region
5. move the region from server B to server A
6. put KV(r3-v3) to the region
7. kill -9 server B and start it
8. kill -9 server A and start it
9. scan the region; we can only get two KVs (r1-v1, r2-v2), the third KV (r3-v3) is lost.

Let's analyse the above scenario from the code:
1. The edit logs of KV(r1-v1) and KV(r3-v3) are both recorded in the same hlog file on server A.
2. When we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
3. When we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
However, RecoveredEdits file f2 will be skipped when initializing the region in HRegion#replayRecoveredEditsIfAny:
{code}
for (Path edits: files) {
  if (edits == null || !this.fs.exists(edits)) {
    LOG.warn("Null or non-existent edits file: " + edits);
    continue;
  }
  if (isZeroLengthThenDelete(this.fs, edits)) continue;

  if (checkSafeToSkip) {
    Path higher = files.higher(edits);
    long maxSeqId = Long.MAX_VALUE;
    if (higher != null) {
      // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
      String fileName = higher.getName();
      maxSeqId = Math.abs(Long.parseLong(fileName));
    }
    if (maxSeqId <= minSeqId) {
      String msg = "Maximum possible sequenceid for this log is " + maxSeqId
          + ", skipped the whole file, path=" + edits;
      LOG.debug(msg);
      continue;
    } else {
      checkSafeToSkip = false;
    }
  }
}
{code}
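The skip heuristic quoted above can be isolated into a small standalone sketch (not the HBase code itself): recovered-edits files are named by a sequence id and kept sorted, the name of the next-higher file is taken as an upper bound on the sequence ids in the current file, and a file whose bound is at most the region's flushed minSeqId is skipped. The data-loss scenario arises because this bound does not hold once recovered-edits files from different servers interleave.

```java
import java.util.TreeSet;

// Standalone sketch of the skip heuristic in
// HRegion#replayRecoveredEditsIfAny; file names are modeled as bare seqids.
class RecoveredEditsSkip {
  /** Returns the files (as seqids) that the heuristic would replay. */
  static TreeSet<Long> filesToReplay(TreeSet<Long> files, long minSeqId) {
    TreeSet<Long> replay = new TreeSet<>();
    boolean checkSafeToSkip = true;
    for (Long edits : files) {
      if (checkSafeToSkip) {
        // The next file's name bounds the max seqid of the current file.
        Long higher = files.higher(edits);
        long maxSeqId = (higher != null) ? higher : Long.MAX_VALUE;
        if (maxSeqId <= minSeqId) {
          continue; // "skipped the whole file"
        }
        checkSafeToSkip = false; // from here on, every file is replayed
      }
      replay.add(edits);
    }
    return replay;
  }
}
```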
[jira] [Issue Comment Edited] (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906486#comment-12906486 ] Zhihong Yu edited comment on HBASE-50 at 3/29/12 7:02 PM:

Message from: Chongxin Li lichong...@zju.edu.cn

---
This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/467/
---

(Updated 2010-09-06 04:34:53.459404)

Review request for hbase.

Changes
---
Add MapReduce-based export (ExportSnapshot) and import (ImportSnapshot) for snapshots, so that a snapshot of an HBase table can be exported and imported to other data centers. Unit test (TestSnapshotExport) has passed.

Summary
---
This patch includes the first three sub-tasks of HBASE-50:
1. Start and monitor the creation of snapshots via ZooKeeper
2. Create a snapshot of an HBase table
3. Some existing functions of HBase are modified to support snapshots

Currently snapshots can be created as expected, but cannot be restored or deleted yet.

This addresses bug HBASE-50. http://issues.apache.org/jira/browse/HBASE-50

Diff: http://review.cloudera.org/r/467/diff

Testing
---
Unit tests and integration tests with mini cluster passed.

Thanks, Chongxin

was (Author: hbasereviewboard):
Message from: Chongxin Li lichong...@zju.edu.cn

---
This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/467/
---

(Updated 2010-09-06 04:34:53.459404)

Review request for hbase.

Changes
---
Add MapReduce-based export (ExportSnapshot) and import (ImportSnapshot) for snapshots, so that a snapshot of an HBase table can be exported and imported to other data centers. Unit test (TestSnapshotExport) has passed.

Summary
---
This patch includes the first three sub-tasks of HBASE-50:
1. Start and monitor the creation of snapshots via ZooKeeper
2. Create a snapshot of an HBase table
3.
Some existing functions of HBase are modified to support snapshots.

Currently snapshots can be created as expected, but cannot be restored or deleted yet.

This addresses bug HBASE-50. http://issues.apache.org/jira/browse/HBASE-50

Diffs (updated)
-
bin/add_snapshot_family.rb PRE-CREATION
src/main/java/org/apache/hadoop/hbase/HConstants.java bfaa4a1
src/main/java/org/apache/hadoop/hbase/HRegionInfo.java ee94690
src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java 0d57270
src/main/java/org/apache/hadoop/hbase/SnapshotDescriptor.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/SnapshotExistsException.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/TablePartiallyOpenException.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 8b01aa0
src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java d35a28a
src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 20860d6
src/main/java/org/apache/hadoop/hbase/io/Reference.java 219203c
src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPCProtocolVersion.java d4bcbed
src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java bd48a4b
src/main/java/org/apache/hadoop/hbase/mapreduce/ExportSnapshot.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/mapreduce/ImportSnapshot.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java 1183584
src/main/java/org/apache/hadoop/hbase/master/BaseScanner.java 2deea4a
src/main/java/org/apache/hadoop/hbase/master/DeleteSnapshot.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/master/HMaster.java 4735304
src/main/java/org/apache/hadoop/hbase/master/LogsCleaner.java 9d1a8b8
src/main/java/org/apache/hadoop/hbase/master/RestoreSnapshot.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/master/SnapshotLogCleaner.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/master/SnapshotMonitor.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/master/SnapshotOperation.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/master/SnapshotSentinel.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/master/TableDelete.java 1153e62
src/main/java/org/apache/hadoop/hbase/master/TableSnapshot.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 9fdd86d
src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 8356d64
src/main/java/org/apache/hadoop/hbase/regionserver/Snapshotter.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java f1d52b7
src/main/java/org/apache/hadoop/hbase/regionserver/Store.java ae9e190
[jira] [Issue Comment Edited] (HBASE-5568) Multi concurrent flushcache() for one region could cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231361#comment-13231361 ] Zhihong Yu edited comment on HBASE-5568 at 3/16/12 4:41 PM:

Integrated to TRUNK. Thanks for the patch Chunhui. Thanks for the review Ramkrishna and Lars.

was (Author: zhi...@ebaysf.com):
Integrated to TRUNK. Thanks for the patch Chunhui.

Multi concurrent flushcache() for one region could cause data loss
--
Key: HBASE-5568
URL: https://issues.apache.org/jira/browse/HBASE-5568
Project: HBase
Issue Type: Bug
Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
Attachments: HBASE-5568-90.patch, HBASE-5568.patch, HBASE-5568.patch, HBASE-5568v2.patch

We can call HRegion#flushcache() concurrently now, through HRegionServer#splitRegion or HRegionServer#flushRegion via HBaseAdmin. However, we find that if HRegion#internalFlushcache() is called concurrently by multiple threads, HRegion.memstoreSize will be calculated wrongly. At the end of HRegion#internalFlushcache(), we do this.addAndGetGlobalMemstoreSize(-flushsize), but flushsize may not be the actual memstore size that was flushed to HDFS. This causes HRegion.memstoreSize to go negative and prevents the next flush when we close this region.
Logs in RS for region e9d827913a056e696c39bc569ea3f99f:

2012-03-11 16:31:36,690 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f., current region memstore size 128.0m
2012-03-11 16:31:37,999 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf1/8162481165586107427, entries=153106, sequenceid=619316544, memsize=59.6m, filesize=31.2m
2012-03-11 16:31:38,830 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f., current region memstore size 134.8m
2012-03-11 16:31:39,458 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf2/3425971951499794221, entries=230183, sequenceid=619316544, memsize=68.5m, filesize=26.6m
2012-03-11 16:31:39,459 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~128.1m for region writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f. in 2769ms, sequenceid=619316544, compaction requested=false
2012-03-11 16:31:39,459 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f., current region memstore size 6.8m
2012-03-11 16:31:39,529 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf1/1811012969998104626, entries=8002, sequenceid=619332759, memsize=3.1m, filesize=1.6m
2012-03-11 16:31:39,640 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf2/770333473623552048, entries=12231, sequenceid=619332759, memsize=3.6m, filesize=1.4m
2012-03-11 16:31:39,641 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~134.8m for region writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f. in 811ms, sequenceid=619332759, compaction requested=true
2012-03-11 16:31:39,707 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf1/5656568849587368557, entries=119, sequenceid=619332979, memsize=47.4k, filesize=25.6k
2012-03-11 16:31:39,775 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf2/794343845650987521, entries=157, sequenceid=619332979, memsize=47.8k, filesize=19.3k
2012-03-11 16:31:39,777 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~6.8m for region writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f. in 318ms, sequenceid=619332979, compaction requested=true
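The accounting failure described above can be sketched in a few lines: if two overlapping flushes each subtract the full memstore size they observed at flush start, the global counter is decremented by more than was ever added and goes negative; subtracting only the size of the snapshot each flush actually wrote keeps it consistent. This is a hypothetical minimal model, not the HRegion code:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of HRegion.memstoreSize accounting under concurrent
// flushes. Sizes are plain longs; the region/store machinery is elided.
class MemstoreAccounting {
  final AtomicLong memstoreSize = new AtomicLong();

  long add(long bytes) {
    return memstoreSize.addAndGet(bytes);
  }

  // Buggy pattern: subtract the size observed when the flush started, which
  // may include edits another in-flight flush already accounted for.
  long buggyFlush(long observedAtStart) {
    return memstoreSize.addAndGet(-observedAtStart);
  }

  // Safer pattern: subtract exactly the snapshot size this flush wrote out.
  long flush(long snapshotBytes) {
    return memstoreSize.addAndGet(-snapshotBytes);
  }
}
```

With 100 bytes in the memstore, two racing flushes that each observed 100 drive the buggy counter to -100, matching the negative-memstoreSize symptom in the report; two flushes that subtract their actual snapshot sizes (60 and 40) land back at 0.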
[jira] [Issue Comment Edited] (HBASE-5206) Port HBASE-5155 to 0.92 and TRUNK
[ https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229315#comment-13229315 ] Zhihong Yu edited comment on HBASE-5206 at 3/14/12 10:27 PM:

Will integrate patch v3 to 0.94 and TRUNK if there is no objection.

was (Author: zhi...@ebaysf.com):
Will integrated patch v3 to 0.94 and TRUNK if there is no objection.

Port HBASE-5155 to 0.92 and TRUNK
-
Key: HBASE-5206
URL: https://issues.apache.org/jira/browse/HBASE-5206
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.2, 0.96.0
Reporter: Zhihong Yu
Assignee: Ashutosh Jindal
Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 5206_92_latest_2.patch, 5206_trunk-v3.patch, 5206_trunk_1.patch, 5206_trunk_latest_1.patch

This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted) to 0.92 and TRUNK.
[jira] [Issue Comment Edited] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228954#comment-13228954 ] Zhihong Yu edited comment on HBASE-4608 at 3/14/12 4:28 AM:

Please wrap the long line:
{code}
+ public static final String ENABLE_WAL_COMPRESSION = "hbase.regionserver.wal.enablecompression";
{code}
w.r.t. the following code:
{code}
+ static final int VERSION = COMPRESSION_VERSION;
+ static final Text WAL_VERSION = new Text("" + VERSION);
{code}
It implies that WAL_VERSION is the same as COMPRESSION_VERSION. As I explained earlier, we would likely have another compression scheme for WAL in the future, resulting in the introduction of e.g. PREFIX_COMPRESSION_VERSION. Then we face a choice: what value would WAL_VERSION carry? I propose naming the above COMPRESSION_VERSION constant DICTIONARY_COMPRESSION_VERSION and decoupling it from WAL_VERSION. In the future, WAL_VERSION of 2 can carry either dictionary or prefix compression.

was (Author: zhi...@ebaysf.com):
Please wrap the long line:
{code}
+ public static final String ENABLE_WAL_COMPRESSION = "hbase.regionserver.wal.enablecompression";
{code}
w.r.t. the following code:
{code}
+ static final int VERSION = COMPRESSION_VERSION;
+ static final Text WAL_VERSION = new Text("" + VERSION);
{code}
It implies that WAL_VERSION is the same as COMPRESSION_VERSION. As I explained earlier, we would likely have another compression scheme for WAL in the future, resulting in the introduction of e.g. PREFIX_COMPRESSION_VERSION. Then we face a choice: what value would WAL_VERSION carry? I propose naming COMPRESSION_VERSION above DICTIONARY_COMPRESSION_VERSION and decoupling it from WAL_VERSION. In the future, WAL_VERSION of 2 can carry either dictionary or prefix compression.
HLog Compression
Key: HBASE-4608
URL: https://issues.apache.org/jira/browse/HBASE-4608
Project: HBase
Issue Type: New Feature
Reporter: Li Pi
Assignee: stack
Fix For: 0.94.0
Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt

The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. The current plan involves using a dictionary to compress the table name, region id, cf name, and possibly other bits of repeated data. Also, the HLog format may be changed in other ways to produce a smaller HLog.
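The decoupling proposed in the review comment above can be sketched as follows: the WAL format version is its own constant, and the header records which compression scheme a given file uses, so a future prefix-compression codec can ship under the same WAL version. All names and the header encoding here are illustrative, not HBase's actual constants:

```java
// Hypothetical sketch of decoupled version constants, per the review
// comment: WAL_VERSION is independent of the compression scheme version.
class WalVersions {
  static final int WAL_VERSION = 2;
  static final int DICTIONARY_COMPRESSION_VERSION = 1;
  static final int PREFIX_COMPRESSION_VERSION = 2; // hypothetical future scheme

  /** Encode both into a header token, e.g. "2:1" = WAL v2, dictionary. */
  static String header(int compressionVersion) {
    return WAL_VERSION + ":" + compressionVersion;
  }
}
```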
[jira] [Issue Comment Edited] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13224786#comment-13224786 ] Zhihong Yu edited comment on HBASE-4608 at 3/7/12 10:33 PM:

I issued the following command:
{code}
bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 5
{code}
After the above job finished, I saw this in the region server log:
{code}
2012-03-07 13:01:12,408 INFO wal.SequenceFileLogWriter (SequenceFileLogWriter.java:init(91)) regionserver60020.logRoller - WAL compression enabled for hdfs://sea-lab-0:54310/hbase/.logs/sea-lab-5,60020,1331150872956/sea-lab-5%2C60020%2C1331150872956.1331154072399
{code}
After copying the HLog to local, I issued:
{code}
bin/hbase org.apache.hadoop.hbase.regionserver.wal.Compressor -u sea-lab-5%2C60020%2C1331150872956.1331154072399 sea-lab-5.decomp
{code}
I got:
{code}
-rwxr-xr-x 1 hduser hduser 119487372 2012-03-07 14:12 sea-lab-5.decomp
-rw-r--r-- 1 hduser hduser 120660017 2012-03-07 14:11 sea-lab-5%2C60020%2C1331150872956.1331154072399
{code}
When I issued the compression command, I saw:
{code}
$ bin/hbase org.apache.hadoop.hbase.regionserver.wal.Compressor -c sea-lab-5.decomp sea-lab-5.comp
12/03/07 14:14:17 INFO wal.SequenceFileLogReader: Input stream class: org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker, not adjusting length
12/03/07 14:14:17 INFO wal.SequenceFileLogWriter: WAL compression enabled for sea-lab-5.comp
12/03/07 14:14:17 DEBUG wal.SequenceFileLogWriter: new createWriter -- HADOOP-6840 -- not available
12/03/07 14:14:17 WARN util.NativeCodeLoader: Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
12/03/07 14:14:17 WARN util.NativeCodeLoader: java.library.path=/apache/hbase/bin/../lib/native/Linux-amd64-64
12/03/07 14:14:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/03/07 14:14:17 INFO compress.CodecPool: Got brand-new compressor [.deflate]
12/03/07 14:14:17 DEBUG wal.SequenceFileLogWriter: Path=sea-lab-5.comp, syncFs=true, hflush=true
Exception in thread "main" java.io.IOException: sea-lab-5.decomp, entryStart=124, pos=1406386, end=119487372, edit=0
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.addFileInfoToException(SequenceFileLogReader.java:275)
	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:231)
	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:200)
	at org.apache.hadoop.hbase.regionserver.wal.Compressor.transformFile(Compressor.java:93)
	at org.apache.hadoop.hbase.regionserver.wal.Compressor.main(Compressor.java:59)
Caused by: java.io.IOException: //0 read 36 bytes, should read 22
	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2118)
	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2155)
	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:229)
	... 3 more
{code}

was (Author: zhi...@ebaysf.com):
I saw this in the region server log:
{code}
2012-03-07 13:01:12,408 INFO wal.SequenceFileLogWriter (SequenceFileLogWriter.java:init(91)) regionserver60020.logRoller - WAL compression enabled for hdfs://sea-lab-0:54310/hbase/.logs/sea-lab-5,60020,1331150872956/sea-lab-5%2C60020%2C1331150872956.1331154072399
{code}
After copying the HLog to local, I issued:
{code}
bin/hbase org.apache.hadoop.hbase.regionserver.wal.Compressor -u sea-lab-5%2C60020%2C1331150872956.1331154072399 sea-lab-5.decomp
{code}
I got:
{code}
-rwxr-xr-x 1 hduser hduser 119487372 2012-03-07 14:12 sea-lab-5.decomp
-rw-r--r-- 1 hduser hduser 120660017 2012-03-07 14:11 sea-lab-5%2C60020%2C1331150872956.1331154072399
{code}
When I issued the compression command, I saw:
{code}
$ bin/hbase org.apache.hadoop.hbase.regionserver.wal.Compressor -c sea-lab-5.decomp sea-lab-5.comp
12/03/07 14:14:17 INFO wal.SequenceFileLogReader: Input stream class: org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker, not adjusting length
12/03/07 14:14:17 INFO wal.SequenceFileLogWriter: WAL compression enabled for sea-lab-5.comp
12/03/07 14:14:17 DEBUG wal.SequenceFileLogWriter: new createWriter -- HADOOP-6840 -- not available
12/03/07 14:14:17 WARN util.NativeCodeLoader: Failed to load native-hadoop with error:
{code}
[jira] [Issue Comment Edited] (HBASE-5209) HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup
[ https://issues.apache.org/jira/browse/HBASE-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13224838#comment-13224838 ] Zhihong Yu edited comment on HBASE-5209 at 3/7/12 11:40 PM:

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/#review5696
---

src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
https://reviews.apache.org/r/3892/#comment12401
Another JIRA is good for me.

- Ted

was (Author: jirapos...@reviews.apache.org):
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/#review5696
---

src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
https://reviews.apache.org/r/3892/#comment12401
Another JIRA is good for me.

- Ted

On 2012-02-16 06:30:31, David Wang wrote:
bq.
bq. ---
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/3892/
bq. ---
bq.
bq. (Updated 2012-02-16 06:30:31)
bq.
bq. Review request for hbase.
bq.
bq. Summary
bq. ---
bq.
bq. Problem:
bq. There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters.
bq.
bq. Solution:
bq. Augment ClusterStatus to return the currently active master and the list of backup masters.
bq.
bq. Notes:
bq. * I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it.
bq. * I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes.
bq.
bq. This addresses bug HBASE-5209.
bq. https://issues.apache.org/jira/browse/HBASE-5209
bq.
bq. Diffs
bq. -
bq. src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429
bq. src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23
bq. src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903
bq. src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71
bq. src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e
bq. src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131
bq. src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744
bq. src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0
bq.
bq. Diff: https://reviews.apache.org/r/3892/diff
bq.
bq. Testing
bq. ---
bq.
bq. * Ran mvn -P localTests test multiple times - no new tests fail
bq. * Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures
bq. * Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures
bq. * Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master)
bq.   * Did the following before and after killing
bq.     * hbase hbck -details - checked output to see that active and backup masters are reported properly
bq.     * zk_dump - checked that active and backup masters are reported properly
bq. * Started cluster with no backup masters to make sure change operates correctly that way
bq. * Tested build with this diff vs. build without this diff, in all combinations of client and server
bq.   * Verified that new client can run against old servers without incident and with the defaults applied.
bq.   * Note that old clients get an error when running against new servers, because the old readFields() code in ClusterStatus does not handle exceptions of any kind. This is not solvable, at least in the scope of this change.
bq.
bq. 12/02/15 15:15:38 INFO zookeeper.ClientCnxn: Session establishment complete on server haus02.sf.cloudera.com/172.29.5.33:30181, sessionid = 0x135834c75e20008, negotiated timeout = 5000
bq. 12/02/15 15:15:39 ERROR io.HbaseObjectWritable: Error in readFields
bq. A record version mismatch occured. Expecting v2, found v3
bq.
[jira] [Issue Comment Edited] (HBASE-5512) Add support for INCLUDE_AND_SEEK_USING_HINT
[ https://issues.apache.org/jira/browse/HBASE-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13221915#comment-13221915 ] Zhihong Yu edited comment on HBASE-5512 at 3/4/12 3:45 PM:
---
CustomIncludeAndNextUsingHintFilter.java is missing a license and class javadoc.
In ScanQueryMatcher.java:
{code}
+boolean filterSeek = false;
{code}
Can the above variable be named so that the name reflects the semantics of INCLUDE_AND_NEXT_USING_HINT? That would make the following code easier to understand:
{code}
+if (filterSeek) {
+  switch(colChecker) {
+    case INCLUDE:
{code}

was (Author: zhi...@ebaysf.com):
CustomIncludeAndNextUsingHintFilter.java is missing a license and class javadoc.
{code}
+boolean filterSeek = false;
{code}
Can the above variable be named so that the name reflects the semantics of INCLUDE_AND_NEXT_USING_HINT? That would make the following code easier to understand:
{code}
+if (filterSeek) {
+  switch(colChecker) {
+    case INCLUDE:
{code}

Add support for INCLUDE_AND_SEEK_USING_HINT
---
Key: HBASE-5512
URL: https://issues.apache.org/jira/browse/HBASE-5512
Project: HBase
Issue Type: Improvement
Reporter: Zhihong Yu
Assignee: Lars Hofhansl
Attachments: 5512-v2.txt, 5512.txt

This came up from HBASE-2038. From Anoop: What we want from the filter is to include a row and then seek to the next row we are interested in. I can't see such a facility in our Filter right now; correct me if I am wrong. So suppose we have already seeked to one row and it needs to be included in the result: the Filter should return INCLUDE. Only when the next next() call happens can we return a SEEK_USING_HINT, so one extra row read is needed. This might even cause an unwanted HFileBlock fetch (who knows). Can we add reseek() at a higher level? From Lars: Yep, for that we'd need to add INCLUDE_AND_SEEK_USING_HINT (similar to the INCLUDE_AND_SEEK_NEXT_ROW that we already have).
Shouldn't be hard to add, I'm happy to do that, if that's the route we want to go with this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
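The cost of the missing return code can be illustrated with a toy scanner model in plain Java (no HBase classes; the row layout and counts are invented for illustration): with only INCLUDE, a wanted row forces one extra next() before SEEK_USING_HINT can fire, while a combined INCLUDE_AND_SEEK_USING_HINT folds the two into one step.

```java
public class SeekHintSketch {
    enum ReturnCode { INCLUDE, SEEK_USING_HINT, INCLUDE_AND_SEEK_USING_HINT }

    // Toy filter: wants every k-th row and can hint the next wanted row.
    static ReturnCode filter(int row, int k, boolean combinedHint) {
        if (row % k == 0) {
            return combinedHint ? ReturnCode.INCLUDE_AND_SEEK_USING_HINT : ReturnCode.INCLUDE;
        }
        return ReturnCode.SEEK_USING_HINT;
    }

    // Scan rows [0, n); returns how many rows the scanner had to touch.
    static int scan(int n, int k, boolean combinedHint) {
        int touched = 0;
        for (int row = 0; row < n; ) {
            touched++;
            switch (filter(row, k, combinedHint)) {
                case INCLUDE:
                    row++;                    // include, but must read one extra row...
                    break;
                case SEEK_USING_HINT:
                    row += k - (row % k);     // ...before the hint can jump to the next wanted row
                    break;
                case INCLUDE_AND_SEEK_USING_HINT:
                    row += k;                 // include and jump in a single step
                    break;
            }
        }
        return touched;
    }
}
```

With 1000 rows and every 100th row wanted, the combined code touches 10 rows and the INCLUDE-only path touches 20 - exactly the "one extra row reading" Anoop describes.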
[jira] [Issue Comment Edited] (HBASE-5512) Add support for INCLUDE_AND_SEEK_USING_HINT
[ https://issues.apache.org/jira/browse/HBASE-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13221559#comment-13221559 ] Zhihong Yu edited comment on HBASE-5512 at 3/3/12 3:24 PM:
---
ScanQueryMatcher:
{code}
} else if (filterResponse == ReturnCode.INCLUDE_AND_SEEK_NEXT_USING_HINT) {
+ }
{code}
You need to set filterSeek=true?

was (Author: anoopsamjohn):
ScanQueryMatcher
} else if (filterResponse == ReturnCode.INCLUDE_AND_SEEK_NEXT_USING_HINT) {
+ }
You need to set filterSeek=true?

Add support for INCLUDE_AND_SEEK_USING_HINT
---
Key: HBASE-5512
URL: https://issues.apache.org/jira/browse/HBASE-5512
Project: HBase
Issue Type: Improvement
Reporter: Zhihong Yu
Assignee: Lars Hofhansl
Attachments: 5512.txt

This came up from HBASE-2038. From Anoop: What we want from the filter is to include a row and then seek to the next row we are interested in. I can't see such a facility in our Filter right now; correct me if I am wrong. So suppose we have already seeked to one row and it needs to be included in the result: the Filter should return INCLUDE. Only when the next next() call happens can we return a SEEK_USING_HINT, so one extra row read is needed. This might even cause an unwanted HFileBlock fetch (who knows). Can we add reseek() at a higher level? From Lars: Yep, for that we'd need to add INCLUDE_AND_SEEK_USING_HINT (similar to the INCLUDE_AND_SEEK_NEXT_ROW that we already have). Shouldn't be hard to add, I'm happy to do that, if that's the route we want to go with this.
[jira] [Issue Comment Edited] (HBASE-5509) MR based copier for copying HFiles (trunk version)
[ https://issues.apache.org/jira/browse/HBASE-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13221371#comment-13221371 ] Zhihong Yu edited comment on HBASE-5509 at 3/3/12 12:18 AM:
---
SnapshotUtilities.java is missing a license and class javadoc.
{code}
+  public static boolean sameFile(FileSystem srcfs, FileStatus srcstatus,
+      FileSystem dstfs, Path dstpath, boolean skipCRCCheck) throws IOException {
{code}
Is it possible to make src and dst use the same data type, either FileStatus or Path?
For sameFile(), I think false should be returned for the dest file in the following case:
{code}
+    //return true if checksum is not supported
+    //(i.e. some of the checksums is null)
{code}
{code}
+  public static Path getPathInTrash(Path path, String hbaseUser,
+      FileSystem srcFileSys) throws IOException {
{code}
I think the FileSystem parameter should be the first parameter of the above method.
{code}
+    String trashPrefix = "/user/" + hbaseUser + "/.Trash";
{code}
I think the name of the trash folder should be made configurable.
For getStoreFileList():
{code}
+   * @param families
+   *          a comma separated list of column families for which we need to
{code}
I think List<String> may be a better data type for the families parameter. This would make the method more general, in that it is not tied to the format of the user input.
{code}
+    long retryTimeInMins =
+        conf.getInt("hbase.backups.region.retryTimeInMins", 5) * 60 * 1000L;
{code}
Please rename the above variable, since its value is converted to milliseconds.
SnapshotMR.java is missing a license.

was (Author: zhi...@ebaysf.com):
SnapshotUtilities.java is missing a license and class javadoc.
{code}
+  public static boolean sameFile(FileSystem srcfs, FileStatus srcstatus,
+      FileSystem dstfs, Path dstpath, boolean skipCRCCheck) throws IOException {
{code}
Is it possible to make src and dst use the same data type, either FileStatus or Path?
For sameFile(), I think false should be returned for the dest file in the following case:
{code}
+    //return true if checksum is not supported
+    //(i.e. some of the checksums is null)
{code}
{code}
+  public static Path getPathInTrash(Path path, String hbaseUser,
+      FileSystem srcFileSys) throws IOException {
{code}
I think the FileSystem parameter should be the first parameter of the above method.

MR based copier for copying HFiles (trunk version)
---
Key: HBASE-5509
URL: https://issues.apache.org/jira/browse/HBASE-5509
Project: HBase
Issue Type: Sub-task
Components: documentation, regionserver
Reporter: Karthik Ranganathan
Assignee: Lars Hofhansl
Fix For: 0.94.0, 0.96.0
Attachments: 5509.txt

This copier is a modification of the distcp tool in HDFS. It does the following:
1. List out all the regions in the HBase cluster for the required table
2. Write the above out to a file
3. Each mapper
3.1 lists all the HFiles for a given region by querying the regionserver
3.2 copies all the HFiles
3.3 outputs success if the copy succeeded, failure otherwise. Failed regions are retried in another loop
4. Mappers are placed on nodes which have maximum locality for a given region to speed up copying
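The sameFile() question above - when may the copier skip a byte-level comparison? - can be sketched with plain JDK checksums. The real patch compares HDFS-level file checksums, so the method below is an illustrative stand-in, not the patch's API:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

public class SameFileSketch {
    static long crc(Path p) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(Files.readAllBytes(p));  // fine for a sketch; stream in chunks for large files
        return crc.getValue();
    }

    // Illustrative stand-in for sameFile(): equal length, and equal checksum unless skipped.
    static boolean sameFile(Path src, Path dst, boolean skipCrcCheck) throws IOException {
        if (Files.size(src) != Files.size(dst)) return false;
        return skipCrcCheck || crc(src) == crc(dst);
    }
}
```

Note how skipCrcCheck makes two equal-length files compare as "same" - which is why the review asks that an unsupported checksum should lead to false rather than true.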
[jira] [Issue Comment Edited] (HBASE-5491) Delete the HBaseConfiguration.create for coprocessor.Exec class
[ https://issues.apache.org/jira/browse/HBASE-5491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219338#comment-13219338 ] Zhihong Yu edited comment on HBASE-5491 at 2/29/12 4:53 PM:
---
The default Exec ctor is only used by TestCoprocessorEndpoint. I think we shouldn't introduce public method(s) just for unit tests. Stack's last comment makes sense.
@Honghua: Can you attach a new patch?

was (Author: zhi...@ebaysf.com):
The default Exec ctor is only used by TestCoprocessorEndpoint. I think we shouldn't introduce public method(s) just for unit tests. Stack's last comment makes sense.
@Honghai: Can you attach a new patch?

Delete the HBaseConfiguration.create for coprocessor.Exec class
---
Key: HBASE-5491
URL: https://issues.apache.org/jira/browse/HBASE-5491
Project: HBase
Issue Type: Improvement
Components: coprocessors
Affects Versions: 0.92.0
Environment: all
Reporter: honghua zhu
Fix For: 0.92.1
Attachments: HBASE-5491.patch

The Exec class has a field: private Configuration conf = HBaseConfiguration.create(). The client side creates an Exec instance for every coprocessor request initiated through ExecRPCInvoker, so HBaseConfiguration.create() is called for each request, and the server side calls HBaseConfiguration.create() once more when it deserializes the Exec. HBaseConfiguration.create() is a time-consuming operation. The default value is only useful for test code (org.apache.hadoop.hbase.coprocessor.TestCoprocessorEndpoint.testExecDeserialization); everywhere else an Exec is constructed, a Configuration is passed in, so there is no need to give the conf field a default value.
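The underlying pattern of the fix - do not give a field an expensive default that callers immediately replace - can be shown with a toy stand-in for HBaseConfiguration.create() (the counter, map, and class names below are invented for illustration):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class ExecSketch {
    static final AtomicInteger creates = new AtomicInteger();

    // Stand-in for HBaseConfiguration.create(): imagine this is expensive, so count calls.
    static Map<String, String> createConf() {
        creates.incrementAndGet();
        return new HashMap<>();
    }

    final Map<String, String> conf;

    // Anti-pattern (what the issue removes): a default that costs a full create()
    // per instance even when the caller supplies its own configuration:
    //   ExecSketch() { this.conf = createConf(); }

    // Preferred: require the caller to pass a shared configuration in.
    ExecSketch(Map<String, String> conf) {
        this.conf = conf;
    }
}
```

With the shared-configuration constructor, creating many request objects costs zero extra create() calls.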
[jira] [Issue Comment Edited] (HBASE-4662) Replay the required hlog edits to make the backup preserve row atomicity.
[ https://issues.apache.org/jira/browse/HBASE-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217709#comment-13217709 ] Zhihong Yu edited comment on HBASE-4662 at 2/28/12 12:25 AM:
---
The task framework ('cassini') is interesting and may be useful for implementing region deletion - see HBASE-4991. How does cassini handle failure scenarios?

was (Author: zhi...@ebaysf.com):
The task framework ('cassini') is interesting and may be useful for implementing region deletion - see HBase-4991. How does cassini handle failure scenarios?

Replay the required hlog edits to make the backup preserve row atomicity.
---
Key: HBASE-4662
URL: https://issues.apache.org/jira/browse/HBASE-4662
Project: HBase
Issue Type: Sub-task
Components: documentation, regionserver
Reporter: Karthik Ranganathan
Assignee: Karthik Ranganathan

The algorithm is as follows:
A. For HFiles:
1. Need to track t1, t2 for each backup (start and end times of the backup)
2. For a point in time restore to time t, pick an HFile snapshot which has t2 < t
3. Copy the HFile snapshot to a temp location - HTABLE_RESTORE_t
B. For HLogs: for each regionserver, for .logs and .oldlogs:
1. log file is hlog.TIME
2. if (t > TIME and hlog.TIME is open for write) fail the restore for t
3. Pick the latest HLog whose create time is < t1
4. Pick all HLogs whose create time is > t1 and <= t2
5. Copy the hlogs to the right structures inside HTABLE_RESTORE_t
C. Split logs:
1. Enhance HLog.splitLog to take a timestamp t
2. Enhance the distributed log split tool to pass HTABLE_RESTORE_t, so that the log split output is picked up and put in the right location
3. Enhance the distributed log split tool to pass t so that all edits up to t are included and others are ignored
D. Import the directory into the running HBase with META entries, etc. (this already exists)
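Steps B.3-B.4 above are a time-window selection over HLog create times. A self-contained sketch of that selection, under the assumption (the comparison operators were garbled in the issue text) that we want the latest log created before t1 plus every log created in the window up to t2:

```java
import java.util.ArrayList;
import java.util.List;

public class HLogSelectionSketch {
    // Given sorted HLog create times, pick the latest log with createTime < t1
    // plus every log with t1 <= createTime <= t2 (assumed semantics).
    static List<Long> select(List<Long> createTimes, long t1, long t2) {
        List<Long> picked = new ArrayList<>();
        Long latestBefore = null;
        for (long ct : createTimes) {
            if (ct < t1) {
                latestBefore = ct;       // keep only the latest pre-t1 log (input is sorted)
            } else if (ct <= t2) {
                picked.add(ct);          // log falls inside the backup window
            }
        }
        if (latestBefore != null) picked.add(0, latestBefore);
        return picked;
    }
}
```

The latest pre-t1 log is needed because edits from before the window may still live in a log that was open when the window started.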
[jira] [Issue Comment Edited] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216355#comment-13216355 ] Zhihong Yu edited comment on HBASE-5416 at 2/25/12 3:09 PM:
---
Atomicity can be achieved by applying the filter set twice. I agree with Mikhail that we need to have good code quality and decent unit test coverage. Complexity in the critical path might be a concern. Performance might be another, as certain use cases benefit from the approach while others don't. Thus, we could consider an execution plan from the relational database world for SQL tuning: once the data is in the table, tune the execution plan (which way to go) against particular use case(s). Just my $0.02.

was (Author: thomaspan):
Atomicity can be achieved by applying the filter set twice. I agree with Mikhail that we need to have good code quality and decent unit test coverage. Complexity in the critical path might be a concern. Performance might be another, as certain use cases benefit from the approach while others don't. Thus, we could consider an execution plan from the rational database world for SQL tuning: once the data is in the table, tune the execution plan (which way to go) against particular use case(s). Just my $0.02.

Improve performance of scans with some kind of filters.
---
Key: HBASE-5416
URL: https://issues.apache.org/jira/browse/HBASE-5416
Project: HBase
Issue Type: Improvement
Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch

When a scan is performed, the whole row is loaded into the result list, and only then is the filter (if any) applied to decide whether the row is needed. But when the scan covers several CFs and the filter checks data from only a subset of them, the data from the unchecked CFs is not needed during the filter stage - only once we have decided to include the current row. In that case we can significantly reduce the amount of IO performed by a scan by loading only the values actually checked by the filter. For example, we have two CFs: flags and snap. flags is quite small (a bunch of megabytes) and is used to filter large entries from snap. snap is very large (tens of GB) and is quite costly to scan. If we need only rows with some flag specified, we use SingleColumnValueFilter to limit the result to a small subset of the region, but the current implementation loads both CFs to perform the scan when only the small subset is needed. The attached patch adds one routine to the Filter interface that lets a filter specify which CFs it needs for its operation. In HRegion, we separate all scanners into two groups: those needed by the filter and the rest (joined). When a new row is considered, only the needed data is loaded, the filter is applied, and only if the filter accepts the row is the rest of the data loaded. On our data, this speeds up such scans 30-50 times. It also gives us a way to better normalize the data into separate columns by optimizing the scans performed.
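The two-scanner-group idea reduces to: read the cheap "essential" family for every row, and the expensive families only for accepted rows. A toy model (maps instead of HFiles; the flags/snap family names come from the example above, everything else is invented) that counts family reads:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class EssentialCfSketch {
    static int loads = 0;  // count every family read so the two strategies can be compared

    static String load(Map<String, String> row, String family) {
        loads++;
        return row.get(family);
    }

    // Load the small "flags" family for every row; only rows the filter
    // accepts pay for the large "snap" family (the "joined" scanner group).
    static List<String> scan(List<Map<String, String>> rows, Predicate<String> flagFilter) {
        List<String> result = new ArrayList<>();
        for (Map<String, String> row : rows) {
            String flag = load(row, "flags");
            if (flagFilter.test(flag)) {
                result.add(load(row, "snap"));  // fetched lazily, post-filter
            }
        }
        return result;
    }
}
```

A naive scan would load both families for every row (2 reads per row); here rejected rows cost one read, which is where the 30-50x savings on mostly-filtered data comes from.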
[jira] [Issue Comment Edited] (HBASE-5479) Postpone CompactionSelection to compaction execution time
[ https://issues.apache.org/jira/browse/HBASE-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216639#comment-13216639 ] Zhihong Yu edited comment on HBASE-5479 at 2/26/12 5:46 AM:
---
I think the compactionPriority score needs to be designed in such a way, when multiple column families are involved, that no single column family would exclusively come off the head of the PriorityQueue for an extended period of time.

was (Author: zhi...@ebaysf.com):
I think the compactionPriority score needs to be designed in such a way, when multiple column families are involved, that no single column family would consistently come off the head of the PriorityQueue for an extended period of time.

Postpone CompactionSelection to compaction execution time
---
Key: HBASE-5479
URL: https://issues.apache.org/jira/browse/HBASE-5479
Project: HBase
Issue Type: New Feature
Components: io, performance, regionserver
Reporter: Matt Corgan

It can be commonplace for regionservers to develop long compaction queues, meaning a CompactionRequest may execute hours after it was created. The CompactionRequest holds a CompactionSelection that was selected at request time but may no longer be the optimal selection. The CompactionSelection should be created at compaction execution time rather than compaction request time. The current mechanism breaks down during high volume insertion. The inefficiency is clearest when the inserts are finished. Inserting for 5 hours may build up 50 storefiles and a 40 element compaction queue. When finished inserting, you would prefer that the next compaction merges all 50 files (or some large subset), but the current system will churn through each of the 40 compaction requests, the first of which may be hours old. This ends up re-compacting the same data many times. The current system is especially inefficient when dealing with time series data where the data in the storefiles has minimal overlap. With time series data there is even less benefit to intermediate merges, because most storefiles can be eliminated based on their key range during a read, even without bloomfilters. The only goal should be to reduce file count, not to minimize the number of files merged for each read. There are other aspects of the current queuing mechanism that would need to be looked at. You would want to avoid having the same Store in the queue multiple times, and you would want the completion of one compaction to possibly queue another compaction request for the store. An alternative architecture to the current style of queues would be to have each Store (all open in memory) keep a compactionPriority score up to date after events like flushes, compactions, schema changes, etc. Then you create a CompactionPriorityComparator implements Comparator<Store> and stick all the Stores into a PriorityQueue (synchronized remove/add from the queue when the value changes). The async compaction threads would keep pulling off the head of that queue as long as the head has a compactionPriority above some threshold X.
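The PriorityQueue architecture proposed in the last paragraph can be modeled with plain java.util (Store is a stub here, and the priority metric is assumed, not HBase's actual scoring):

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class CompactionQueueSketch {
    static class Store {
        final String name;
        double compactionPriority;   // kept up to date after flushes/compactions (assumed metric)
        Store(String name, double p) { this.name = name; this.compactionPriority = p; }
    }

    // Highest-priority store at the head of the queue.
    private final PriorityQueue<Store> queue = new PriorityQueue<>(
        Comparator.comparingDouble((Store s) -> s.compactionPriority).reversed());

    synchronized void add(Store s) { queue.add(s); }

    // PriorityQueue does not reorder elements in place, so a score change
    // must be a synchronized remove + re-add, as the description suggests.
    synchronized void updatePriority(Store s, double p) {
        queue.remove(s);
        s.compactionPriority = p;
        queue.add(s);
    }

    // Compaction threads keep polling the head while it exceeds threshold x.
    synchronized Store pollIfAbove(double x) {
        Store head = queue.peek();
        return (head != null && head.compactionPriority > x) ? queue.poll() : null;
    }
}
```

Selecting the files to compact only when a store is polled is what makes the selection execution-time rather than request-time.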
[jira] [Issue Comment Edited] (HBASE-5456) Introduce PowerMock into our unit tests to reduce unnecessary method exposure
[ https://issues.apache.org/jira/browse/HBASE-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214300#comment-13214300 ] Zhihong Yu edited comment on HBASE-5456 at 2/23/12 4:53 AM: @Mikhail: Thanks for the reminder and pardon my math :-) Looking at https://builds.apache.org/job/PreCommit-HBASE-Build/1013/console again, we can see: {code} Results : Tests run: 519, Failures: 0, Errors: 0, Skipped: 0 ... Results : Failed tests: testMROnTable(org.apache.hadoop.hbase.mapreduce.TestImportTsv) testMROnTableWithCustomMapper(org.apache.hadoop.hbase.mapreduce.TestImportTsv) testMRIncrementalLoad(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat) testMRIncrementalLoadWithSplit(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat) testExcludeMinorCompaction(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat) queueFailover(org.apache.hadoop.hbase.replication.TestReplication): Waited too much time for queueFailover replication Tests in error: testMultiRegionTable(org.apache.hadoop.hbase.mapred.TestTableMapReduce): Job failed! Tests run: 885, Failures: 6, Errors: 1, Skipped: 10 {code} 519+885 is 1404 which is very close to what you have seen. was (Author: zhi...@ebaysf.com): @Mikhail: Thanks for the reminder and pardon my math :-) {code} Looking at https://builds.apache.org/job/PreCommit-HBASE-Build/1013/console again, we can see: Results : Tests run: 519, Failures: 0, Errors: 0, Skipped: 0 ... 
Results :
Failed tests:
testMROnTable(org.apache.hadoop.hbase.mapreduce.TestImportTsv)
testMROnTableWithCustomMapper(org.apache.hadoop.hbase.mapreduce.TestImportTsv)
testMRIncrementalLoad(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat)
testMRIncrementalLoadWithSplit(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat)
testExcludeMinorCompaction(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat)
queueFailover(org.apache.hadoop.hbase.replication.TestReplication): Waited too much time for queueFailover replication
Tests in error:
testMultiRegionTable(org.apache.hadoop.hbase.mapred.TestTableMapReduce): Job failed!
Tests run: 885, Failures: 6, Errors: 1, Skipped: 10
{code}
519+885 is 1404 which is very close to what you have seen.

Introduce PowerMock into our unit tests to reduce unnecessary method exposure
---
Key: HBASE-5456
URL: https://issues.apache.org/jira/browse/HBASE-5456
Project: HBase
Issue Type: Task
Reporter: Zhihong Yu

We should introduce PowerMock into our unit tests so that we don't have to expose methods intended to be used by unit tests. Here was Benoit's reply to a user of asynchbase about testability: OpenTSDB has unit tests that are mocking out HBaseClient just fine [1]. You can mock out pretty much anything on the JVM: final, private, JDK stuff, etc. All you need is the right tools. I've been very happy with PowerMock. It supports Mockito and EasyMock. I've never been keen on mutilating public interfaces for the sake of testing. With tools like PowerMock, we can keep the public APIs tidy while mocking and overriding anything, even in the most private guts of the classes.
[1] https://github.com/stumbleupon/opentsdb/blob/master/src/uid/TestUniqueId.java#L66
[jira] [Issue Comment Edited] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208695#comment-13208695 ] Zhihong Yu edited comment on HBASE-4991 at 2/16/12 3:57 AM: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3917/ --- (Updated 2012-02-15 19:16:37.200644) Review request for hbase. Changes --- Removing HBASE-4991 from bugs field. Thanks. Summary --- Code review request for HBASE-4991 (Provide capability to delete named region). Diffs - /src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java 1236386 /src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 1236386 /src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 1236386 /src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 1236386 /src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1236386 /src/main/java/org/apache/hadoop/hbase/master/MasterServices.java 1236386 /src/main/java/org/apache/hadoop/hbase/master/handler/DeleteRegionHandler.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1236386 /src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1236386 /src/main/java/org/apache/hadoop/hbase/regionserver/OnlineRegions.java 1236386 /src/main/java/org/apache/hadoop/hbase/zookeeper/MasterDeleteRegionTracker.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/zookeeper/RegionServerDeleteRegionTracker.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 1241350 /src/main/ruby/hbase/admin.rb 1236386 /src/main/ruby/shell.rb 1236386 /src/main/ruby/shell/commands/delete_region.rb PRE-CREATION /src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 1236386 /src/test/java/org/apache/hadoop/hbase/client/TestDeleteRegion.java PRE-CREATION /src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java 1236386 
/src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java 1236386 Diff: https://reviews.apache.org/r/3917/diff Testing --- Unit tests are passed. Tested in 5 node dev cluster. Test case: src/test/java/org/apache/hadoop/hbase/client/TestDeleteRegion.java Thanks, Mubarak was (Author: jirapos...@reviews.apache.org): --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3909/ --- (Updated 2012-02-15 19:16:37.200644) Review request for hbase. Changes --- Removing HBASE-4991 from bugs field. Thanks. Summary --- Code review request for HBASE-4991 (Provide capability to delete named region). Diffs - /src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java 1236386 /src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 1236386 /src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 1236386 /src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 1236386 /src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1236386 /src/main/java/org/apache/hadoop/hbase/master/MasterServices.java 1236386 /src/main/java/org/apache/hadoop/hbase/master/handler/DeleteRegionHandler.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1236386 /src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1236386 /src/main/java/org/apache/hadoop/hbase/regionserver/OnlineRegions.java 1236386 /src/main/java/org/apache/hadoop/hbase/zookeeper/MasterDeleteRegionTracker.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/zookeeper/RegionServerDeleteRegionTracker.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 1241350 /src/main/ruby/hbase/admin.rb 1236386 /src/main/ruby/shell.rb 1236386 /src/main/ruby/shell/commands/delete_region.rb PRE-CREATION /src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 1236386 /src/test/java/org/apache/hadoop/hbase/client/TestDeleteRegion.java PRE-CREATION 
/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java 1236386 /src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java 1236386 Diff: https://reviews.apache.org/r/3909/diff Testing --- Unit tests are passed. Tested in 5 node dev cluster. Test case: src/test/java/org/apache/hadoop/hbase/client/TestDeleteRegion.java Thanks, Mubarak Provide capability to delete named region - Key: HBASE-4991 URL:
[jira] [Issue Comment Edited] (HBASE-5387) Reuse compression streams in HFileBlock.Writer
[ https://issues.apache.org/jira/browse/HBASE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13206287#comment-13206287 ] Zhihong Yu edited comment on HBASE-5387 at 2/11/12 10:32 PM:
---
We could also do something heuristically. Get a random float and if it is < 0.001 (for example, every 1000th time on average) throw away the old BAOS and get a new one. Can the BAOS really get the blocksize? Also what happens when somebody (foolishly, but it does happen) sets the block size to 512mb? In that case we might want to free this BAOS every single time.

was (Author: lhofhansl):
We could also do something heuristically. Get a random float and if it is < 0.001 (for example, every 1000th time on average) throw away the old BOAS and get a new one. Can the BOAS really get the the blocksize? Also what happens when somebody (foolishly, but it does happen) sets the block size to 512mb? In that case we might want to free this BOAS every single time.

Reuse compression streams in HFileBlock.Writer
---
Key: HBASE-5387
URL: https://issues.apache.org/jira/browse/HBASE-5387
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.0
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Critical
Fix For: 0.94.0
Attachments: D1719.1.patch, Fix-deflater-leak-2012-02-10_18_48_45.patch

We need to reuse compression streams in HFileBlock.Writer instead of allocating them every time. The motivation is that when using Java's built-in implementation of Gzip, we allocate a new GZIPOutputStream object and an associated native data structure every time we create a compression stream. The native data structure is only deallocated in the finalizer. This is one suspected cause of recent TestHFileBlock failures on Hadoop QA: https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.
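The two ideas in this entry - reuse the compression machinery across blocks, and occasionally discard the reused output buffer so a single oversized block does not pin memory - can be sketched with java.util.zip directly. A reset() Deflater stands in for the patch's actual stream reuse; the class, the 0.001 heuristic, and the buffer handling are illustrative:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Random;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class CompressionReuseSketch {
    private final Deflater deflater = new Deflater();   // one native structure, reused for all blocks
    private ByteArrayOutputStream baos = new ByteArrayOutputStream();
    private final Random rnd = new Random();

    byte[] compressBlock(byte[] block) throws IOException {
        // Heuristic from the discussion: roughly every 1000th call, discard the
        // reused buffer so one huge block does not pin its capacity forever.
        if (rnd.nextFloat() < 0.001f) baos = new ByteArrayOutputStream();
        baos.reset();
        deflater.reset();                               // reset native state instead of reallocating
        DeflaterOutputStream out = new DeflaterOutputStream(baos, deflater);
        out.write(block);
        out.finish();
        return baos.toByteArray();
    }

    static byte[] decompress(byte[] compressed, int expectedLen) throws IOException {
        InflaterInputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed));
        byte[] outBuf = new byte[expectedLen];
        int off = 0, n;
        while (off < expectedLen && (n = in.read(outBuf, off, expectedLen - off)) > 0) off += n;
        return outBuf;
    }
}
```

Reusing one Deflater avoids the per-block native allocation that GZIPOutputStream would otherwise only release in its finalizer.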
[jira] [Issue Comment Edited] (HBASE-5387) Reuse compression streams in HFileBlock.Writer
[ https://issues.apache.org/jira/browse/HBASE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13206040#comment-13206040 ] Zhihong Yu edited comment on HBASE-5387 at 2/11/12 6:11 AM:
---
See http://crawler.archive.org/apidocs/org/archive/io/GzipHeader.html#MINIMAL_GZIP_HEADER_LENGTH for where the header length came from. http://kickjava.com/src/java/util/zip/GZIPOutputStream.java.htm, line 109 gives us a better idea about the header length.

was (Author: zhi...@ebaysf.com):
See http://crawler.archive.org/apidocs/org/archive/io/GzipHeader.html#MINIMAL_GZIP_HEADER_LENGTH for where the header length came from.

Reuse compression streams in HFileBlock.Writer
---
Key: HBASE-5387
URL: https://issues.apache.org/jira/browse/HBASE-5387
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.0
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Critical
Fix For: 0.94.0
Attachments: Fix-deflater-leak-2012-02-10_18_48_45.patch

We need to reuse compression streams in HFileBlock.Writer instead of allocating them every time. The motivation is that when using Java's built-in implementation of Gzip, we allocate a new GZIPOutputStream object and an associated native data structure every time we create a compression stream. The native data structure is only deallocated in the finalizer. This is one suspected cause of recent TestHFileBlock failures on Hadoop QA: https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.
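The header length being discussed can be checked directly: java.util.zip.GZIPOutputStream writes a fixed 10-byte header, and the stream starts with the gzip magic bytes 0x1f 0x8b. A minimal, self-contained check (constant name borrowed from the linked javadoc):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class GzipHeaderSketch {
    // magic(2) + compression method(1) + flags(1) + mtime(4) + extra flags(1) + OS(1)
    static final int MINIMAL_GZIP_HEADER_LENGTH = 10;

    // Compress the input and return the raw gzip bytes, header and trailer included.
    static byte[] gzip(byte[] input) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        GZIPOutputStream out = new GZIPOutputStream(baos);
        out.write(input);
        out.close();  // finishes the deflate stream and writes the 8-byte trailer
        return baos.toByteArray();
    }
}
```

Even an empty input yields at least the 10-byte header plus the 8-byte CRC/length trailer.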
[jira] [Issue Comment Edited] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202213#comment-13202213 ] Zhihong Yu edited comment on HBASE-5074 at 2/7/12 10:13 AM:
---
@Dhruba: Your explanation of CRC algorithm selection makes sense.

was (Author: zhi...@ebaysf.com):
@Dhruba: Your explanation of CDC algorithm selection makes sense.

support checksums in HBase block cache
---
Key: HBASE-5074
URL: https://issues.apache.org/jira/browse/HBASE-5074
Project: HBase
Issue Type: Improvement
Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch

The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers.
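The direction of the fix - inline the checksum with the data so a block read costs one IO instead of two - can be sketched by appending a CRC32 to each block. This is a toy layout; the actual HBase patch defines its own block format and checksum types:

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public class InlineChecksumSketch {
    // Layout: [data][8-byte CRC32 of data] in one contiguous buffer, so the data
    // and its checksum come back in a single read instead of two separate IOs.
    static byte[] writeBlock(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        ByteBuffer buf = ByteBuffer.allocate(data.length + 8);
        buf.put(data);
        buf.putLong(crc.getValue());
        return buf.array();
    }

    static byte[] readBlock(byte[] block) {
        ByteBuffer buf = ByteBuffer.wrap(block);
        byte[] data = new byte[block.length - 8];
        buf.get(data);
        long stored = buf.getLong();
        CRC32 crc = new CRC32();
        crc.update(data);
        if (crc.getValue() != stored) {
            // a real reader would fall back to the checksum-verified HDFS read path
            throw new IllegalStateException("checksum mismatch");
        }
        return data;
    }
}
```

Verification happens on the same bytes that were just read, so no second seek to a separate checksum file is needed.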
[jira] [Issue Comment Edited] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23
[ https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201460#comment-13201460 ] Zhihong Yu edited comment on HBASE-5317 at 2/6/12 6:47 PM: --- Configuration.handleDeprecation() is private. We may need to borrow deprecatedKeyMap and come up with a good strategy of providing up-to-date config parameters to MRv2 (when the hadoop.profile property carries the value 23). was (Author: zhi...@ebaysf.com): Configuration.handleDeprecation() is private. We may need to borrow deprecatedKeyMap and come up with a good strategy of providing up-to-date config parameters to MRv2. Fix TestHFileOutputFormat to work against hadoop 0.23 - Key: HBASE-5317 URL: https://issues.apache.org/jira/browse/HBASE-5317 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.94.0, 0.92.0 Reporter: Gregory Chanan Assignee: Gregory Chanan Attachments: HBASE-5317-v0.patch, HBASE-5317-v1.patch Running mvn -Dhadoop.profile=23 test -P localTests -Dtest=org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat yields this on 0.92: Failed tests: testColumnFamilyCompression(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): HFile for column family info-A not found Tests in error: test_TIMERANGE(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): /home/gchanan/workspace/apache92/target/test-data/276cbd0c-c771-4f81-9ba8-c464c9dd7486/test_TIMERANGE_present/_temporary/0/_temporary/_attempt_200707121733_0001_m_00_0 (Is a directory) testMRIncrementalLoad(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable testMRIncrementalLoadWithSplit(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable It looks like on trunk, this also results in an error: testExcludeMinorCompaction(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable I have a patch that fixes testColumnFamilyCompression and test_TIMERANGE, but haven't fixed the other 3 yet. 
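"Borrowing deprecatedKeyMap", as suggested above, amounts to translating old MRv1 configuration names to their MRv2 equivalents before handing them to the job. A hypothetical sketch of that translation step (the two mappings shown are illustrative examples of Hadoop's deprecated-key pairs, not an authoritative list):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a deprecated-key translation table, standing in for the
// private Configuration.handleDeprecation() logic discussed in the comment.
public class DeprecatedKeys {
    private static final Map<String, String> DEPRECATED = new HashMap<>();
    static {
        // Illustrative MRv1 -> MRv2 renames; a real table would be much larger.
        DEPRECATED.put("mapred.job.tracker", "mapreduce.jobtracker.address");
        DEPRECATED.put("mapred.output.dir", "mapreduce.output.fileoutputformat.outputdir");
    }

    // Return the up-to-date name for a key, or the key itself if not deprecated.
    public static String translate(String key) {
        return DEPRECATED.getOrDefault(key, key);
    }
}
```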
[jira] [Issue Comment Edited] (HBASE-5231) Backport HBASE-3373 (per-table load balancing) to 0.92
[ https://issues.apache.org/jira/browse/HBASE-5231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190945#comment-13190945 ] Zhihong Yu edited comment on HBASE-5231 at 1/23/12 5:16 PM: I think we can do that. Regarding the log message "Done. Calculated a load balance in ", we can move balanceCluster out. Move it to the code below? {code} + for (Map<ServerName, List<HRegionInfo>> assignments : assignmentsByTable.values()) { + List<RegionPlan> partialPlans = this.balancer.balanceCluster(assignments); + if (partialPlans != null) plans.addAll(partialPlans); + } {code} was (Author: sunnygao): I think we can do that. Regarding the log message "Done. Calculated a load balance in ", we can move balanceCluster out. Move it to the code below? + for (Map<ServerName, List<HRegionInfo>> assignments : assignmentsByTable.values()) { + List<RegionPlan> partialPlans = this.balancer.balanceCluster(assignments); + if (partialPlans != null) plans.addAll(partialPlans); + } Backport HBASE-3373 (per-table load balancing) to 0.92 -- Key: HBASE-5231 URL: https://issues.apache.org/jira/browse/HBASE-5231 Project: HBase Issue Type: Improvement Reporter: Zhihong Yu Fix For: 0.92.1 Attachments: 5231.txt This JIRA backports per-table load balancing to 0.92
[jira] [Issue Comment Edited] (HBASE-5255) Use singletons for OperationStatus to save memory
[ https://issues.apache.org/jira/browse/HBASE-5255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190878#comment-13190878 ] Zhihong Yu edited comment on HBASE-5255 at 1/23/12 5:42 AM: I like the introduction of singletons. Since there is no setter for code or exceptionMsg, I think we should make them final. was (Author: zhi...@ebaysf.com): I like the introduction of singletons. Currently there is no constraint that OperationStatus.FAILURE cannot have an exceptionMsg which deviates from its OperationStatusCode. At the moment we can put in javadoc to clarify the usage of these new singletons. Use singletons for OperationStatus to save memory - Key: HBASE-5255 URL: https://issues.apache.org/jira/browse/HBASE-5255 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.92.0, 0.90.5 Reporter: Benoit Sigoure Assignee: Benoit Sigoure Priority: Minor Labels: performance Fix For: 0.94.0, 0.92.1 Attachments: HBASE-5255-0.92-Use-singletons-to-remove-unnecessary-memory-allocati.patch, HBASE-5255-trunk-Use-singletons-to-remove-unnecessary-memory-allocati.patch Every single {{Put}} causes the allocation of at least one {{OperationStatus}}, yet {{OperationStatus}} is almost always stateless, so these allocations are unnecessary and could be avoided. Attached patch adds a few singletons and uses them, with no public API change. I didn't test the patches, but you get the idea.
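The pattern discussed above — shared stateless instances with final fields so they are safely immutable — can be sketched as follows. This is a hypothetical `OpStatus` class illustrating the idea, not the actual HBase `OperationStatus` from the patch.

```java
// Hypothetical sketch of the singleton idea in HBASE-5255: stateless
// success/failure results shared by every Put, instead of one allocation each.
public class OpStatus {
    public static final OpStatus SUCCESS = new OpStatus("SUCCESS", null);
    public static final OpStatus FAILURE = new OpStatus("FAILURE", null);

    private final String code;          // final: no setter, safe to share
    private final String exceptionMsg;  // final: per the review comment above

    private OpStatus(String code, String exceptionMsg) {
        this.code = code;
        this.exceptionMsg = exceptionMsg;
    }

    public String getCode() { return code; }
    public String getExceptionMsg() { return exceptionMsg; }
}
```

Because the fields are final and there are no setters, handing the same instance to every operation is thread-safe, which is what makes the memory saving free of correctness risk.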
[jira] [Issue Comment Edited] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190002#comment-13190002 ] Zhihong Yu edited comment on HBASE-5179 at 1/20/12 7:08 PM: Patch v17 for 0.90 passed unit tests. Got a strange complaint about TestLruBlockCache. In org.apache.hadoop.hbase.io.hfile.TestLruBlockCache.txt: {code} testBackgroundEvictionThread(org.apache.hadoop.hbase.io.hfile.TestLruBlockCache) Time elapsed: 3.157 sec FAILURE! junit.framework.AssertionFailedError: null at junit.framework.Assert.fail(Assert.java:47) at junit.framework.Assert.assertTrue(Assert.java:20) at junit.framework.Assert.assertTrue(Assert.java:27) at org.apache.hadoop.hbase.io.hfile.TestLruBlockCache.testBackgroundEvictionThread(TestLruBlockCache.java:58) {code} Looks like the following assertion didn't give eviction enough time to run: {code} Thread.sleep(1000); assertTrue(n++ < 2); {code} Running TestLruBlockCache alone passed. was (Author: zhi...@ebaysf.com): Patch v17 for 0.90 passed unit tests. Got a strange complaint about TestLruBlockCache. 
But in org.apache.hadoop.hbase.io.hfile.TestLruBlockCache.txt: {code} --- Test set: org.apache.hadoop.hbase.io.hfile.TestLruBlockCache --- Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.152 sec {code} Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v16.patch, 5179-90v17.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing of its failover and ServerShutdownHandler's processing happen concurrently, the following case may appear. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered a dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4 (assigning regions), ServerShutdownHandler may still be splitting logs. Therefore, it may cause data loss.
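The flaky assertion quoted above gives the background eviction thread a fixed, bounded amount of time before failing. A common remedy, shown here as a hypothetical helper (not the actual TestLruBlockCache fix), is to poll the condition until a deadline instead of sleeping a fixed interval:

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch: poll a condition until a deadline rather than a fixed
// Thread.sleep + assert, tolerating a slow background eviction thread.
public class WaitFor {
    public static boolean waitFor(BooleanSupplier condition, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) return true;
            Thread.sleep(50);   // short poll interval instead of one long sleep
        }
        return condition.getAsBoolean();
    }
}
```

With this shape, the test fails only when the condition genuinely never becomes true within the budget, not because one 1-second sleep happened to be too short on a loaded Jenkins machine.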
[jira] [Issue Comment Edited] (HBASE-5203) Fix atomic put/delete with region server failures.
[ https://issues.apache.org/jira/browse/HBASE-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186622#comment-13186622 ] Zhihong Yu edited comment on HBASE-5203 at 1/15/12 11:52 PM: - I think we should log a JIRA for adapting replication to work with option #1. After that JIRA is completed, we can revisit this JIRA. was (Author: zhi...@ebaysf.com): I think we should log a JIRA for adapting replication work with option #1. After that JIRA is completed, we can revisit this JIRA. Fix atomic put/delete with region server failures. -- Key: HBASE-5203 URL: https://issues.apache.org/jira/browse/HBASE-5203 Project: HBase Issue Type: Sub-task Components: client, coprocessors, regionserver Reporter: Lars Hofhansl Fix For: 0.94.0 HBASE-3584 does not provide a fully atomic operation in case of region server failures (see explanation there). What should happen is that either (1) all edits are applied via a single WALEdit, or (2) the WALEdits are applied in async mode and then sync'ed together. For #1 it is not clear whether it is advisable to manage multiple *different* operations (Put/Delete) via a single WAL edit. A quick check reveals that WAL replay on region startup would work, but that replication would need to be adapted. The refactoring needed would be non-trivial. #2 Might actually not work, as another operation could request sync'ing a later edit and hence flush these entries out as well.
[jira] [Issue Comment Edited] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
[ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186312#comment-13186312 ] Zhihong Yu edited comment on HBASE-5128 at 1/14/12 9:27 PM: I think we should keep offline() and deprecate clearRegionFromTransition(). Let's remove clearRegionFromTransition() in another JIRA. was (Author: zhi...@ebaysf.com): I think we should keep offline() and deprecate the other method. Let's remove that one in another Jira. [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online. - Key: HBASE-5128 URL: https://issues.apache.org/jira/browse/HBASE-5128 Project: HBase Issue Type: New Feature Components: hbck Affects Versions: 0.92.0, 0.90.5 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh The current (0.90.5, 0.92.0rc2) versions of hbck detect most region consistency and table integrity invariant violations. However, with '-fix' it can only automatically repair region consistency cases having to do with deployment problems. This updated version should be able to handle all cases (including a new orphan regiondir case). When complete, it will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issues. Here's the approach (from the comment at the top of the new version of the file). {code} /** * HBaseFsck (hbck) is a tool for checking and repairing region consistency and * table integrity. * * Region consistency checks verify that META, region deployment on * region servers and the state of data in HDFS (.regioninfo files) all are in * accordance. * * Table integrity checks verify that all possible row keys can resolve to * exactly one region of a table. This means there are no individual degenerate * or backwards regions; no holes between regions; and that there are no overlapping * regions. * * The general repair strategy works in these steps. * 1) Repair Table Integrity on HDFS. 
(merge or fabricate regions) * 2) Repair Region Consistency with META and assignments * * For table integrity repairs, the tables' region directories are scanned * for .regioninfo files. Each table's integrity is then verified. If there * are any orphan regions (regions with no .regioninfo files), or holes, new * regions are fabricated. Backwards regions are sidelined as well as empty * degenerate (endkey==startkey) regions. If there are any overlapping regions, * a new region is created and all data is merged into the new region. * * Table integrity repairs deal solely with HDFS and can be done offline -- the * hbase region servers or master do not need to be running. These phases can be * used to completely reconstruct the META table in an offline fashion. * * Region consistency requires three conditions -- 1) valid .regioninfo file * present in an hdfs region dir, 2) valid row with .regioninfo data in META, * and 3) a region is deployed only at the regionserver that it was assigned to. * * Region consistency requires hbck to contact the HBase master and region * servers, so the connect() must first be called successfully. Much of the * region consistency information is transient and less risky to repair. */ {code}
[jira] [Issue Comment Edited] (HBASE-5120) Timeout monitor races with table disable handler
[ https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184205#comment-13184205 ] Zhihong Yu edited comment on HBASE-5120 at 1/11/12 5:17 PM: Can you change LOG.debug() to LOG.error() in deleteClosingOrClosedNode() ? {code} +LOG.debug("The deletion of the CLOSED node for the region " + region.getEncodedName() + " returned " + deleteNode); {code} was (Author: zhi...@ebaysf.com): Can you change LOG.debug() to LOG.error() in deleteClosingOrClosedNode() ? Timeout monitor races with table disable handler Key: HBASE-5120 URL: https://issues.apache.org/jira/browse/HBASE-5120 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Zhihong Yu Priority: Blocker Fix For: 0.94.0, 0.92.1 Attachments: HBASE-5120.patch, HBASE-5120_1.patch, HBASE-5120_2.patch, HBASE-5120_3.patch, HBASE-5120_4.patch Here is what J-D described here: https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176 I think I will retract from my statement that it used to be extremely racy and caused more troubles than it fixed, on my first test I got a stuck region in transition instead of being able to recover. The timeout was set to 2 minutes to be sure I hit it. First the region gets closed {quote} 2012-01-04 00:16:25,811 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to sv4r5s38,62023,1325635980913 for region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. {quote} 2 minutes later it times out: {quote} 2012-01-04 00:18:30,026 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
state=PENDING_CLOSE, ts=1325636185810, server=null 2012-01-04 00:18:30,026 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 2012-01-04 00:18:30,027 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. (offlining) {quote} 100ms later the master finally gets the event: {quote} 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for 1a4b111bcc228043e89f59c4c3f6a791 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so deleting ZK node and removing from regions in transition, skipping assignment of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x134589d3db03587 Deleting existing unassigned node for 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x134589d3db03587 Successfully deleted unassigned node for region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED {quote} At this point everything is fine, the region was processed as closed. But wait, remember that line where it said it was going to force an unassign? 
{quote} 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x134589d3db03587 Creating unassigned node for 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state 2012-01-04 00:18:30,328 INFO org.apache.hadoop.hbase.master.AssignmentManager: Server null returned java.lang.NullPointerException: Passed server is null for 1a4b111bcc228043e89f59c4c3f6a791 {quote} Now the master is confused, it recreated the RIT znode but the region doesn't even exist anymore. It even tries to shut it down but is blocked by NPEs. Now this is what's going on. The late ZK notification that the znode was deleted (but it got recreated after): {quote} 2012-01-04 00:19:33,285 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: The znode of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been deleted. {quote} Then it prints this, and much later tries to unassign it again: {quote} 2012-01-04 00:19:46,607 DEBUG org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on
[jira] [Issue Comment Edited] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184286#comment-13184286 ] Zhihong Yu edited comment on HBASE-5179 at 1/11/12 7:07 PM: I agree with the spirit of this class. Good stuff Chunhui. This is an awkward name for a method, getDeadServersUnderProcessing. Should it be getDeadServers? Does it need to be a public method? Seems fine that it be package private. Is serversWithoutSplitLog a good name for a local variable? Should it be deadServers with a comment saying that deadServers are processed by servershutdownhandler and it will be taking care of the log splitting? Is this right -- for trunk? {code} - } else if (!serverManager.isServerOnline(regionLocation.getServerName())) { + } else if (!onlineServers.contains(regionLocation.getHostname())) { {code} Online servers is keyed by a ServerName, not a hostname. What is a deadServersUnderProcessing? Does DeadServers keep a list of all servers that ever died? Is that a good idea? Shouldn't finish remove the item from deadServers rather than just from deadServersUnderProcessing? Change the name of this method, cloneProcessingDeadServers. Just call it getDeadServers? That it is a clone is an internal implementation detail? was (Author: stack): I agree with the spirit of this class. Good stuff Chunhui. This is an awkward name for a method, getDeadServersUnderProcessing. Should it be getDeadServers? Does it need to be a public method? Seems fine that it be package private. Is serversWithoutSplitLog a good name for a local variable? Should it be deadServers with a comment saying that deadServers are processed by servershutdownhandler and it will be taking care of the log splitting? Is this right -- for trunk? {code} - } else if (!serverManager.isServerOnline(regionLocation.getServerName())) { + } else if (!onlineServers.contains(regionLocation.getHostname())) { {code} Online servers is keyed by a ServerName, not a hostname. 
What is a deadServersUnderProcessing? Does DeadServers keep a list of all servers that ever died? Is that a good idea? Shouldn't finish remove the item from deadServers rather than just from deadServersUnderProcessing? Change the name of this method, cloneProcessingDeadServers. Just call it getDeadServers? That it is a clone is an internal implementation detail? Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss --- Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Attachments: 5179-90.txt, 5179-v2.txt, hbase-5179.patch If master's processing of its failover and ServerShutdownHandler's processing happen concurrently, the following case may appear. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered a dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4 (assigning regions), ServerShutdownHandler may still be splitting logs. Therefore, it may cause data loss.
[jira] [Issue Comment Edited] (HBASE-5165) Concurrent processing of DeleteTableHandler and ServerShutdownHandler may cause deleted region to assign again
[ https://issues.apache.org/jira/browse/HBASE-5165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183167#comment-13183167 ] Zhihong Yu edited comment on HBASE-5165 at 1/10/12 9:39 AM: As Stack said in HBASE-5155: bq. now when a table is disabled, we now set a flag for the table in zk rather than do it individually on each region @Chunhui: Can you address the above in ServerShutdownHandler ? was (Author: zhi...@ebaysf.com): As Stack said: bq. now when a table is disabled, we now set a flag for the table in zk rather than do it individually on each region @Chunhui: Can you address the above in ServerShutdownHandler ? Concurrent processing of DeleteTableHandler and ServerShutdownHandler may cause deleted region to assign again -- Key: HBASE-5165 URL: https://issues.apache.org/jira/browse/HBASE-5165 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: chunhui shen Attachments: hbase-5165.patch, hbase-5165v2.patch Concurrent processing of DeleteTableHandler and ServerShutdownHandler may cause the following situation 1.Table has already been disabled. 2.ServerShutdownHandler is doing MetaReader.getServerUserRegions. 3.When step2 is processing or has just completed, DeleteTableHandler starts to delete the region (remove the region from META and delete the region from the FS) 4.DeleteTableHandler sets the table enabled. 5.ServerShutdownHandler starts to assign the region which has already been deleted by DeleteTableHandler. The result of the above operations is an invalid record in .META. that can't be fixed by hbck
[jira] [Issue Comment Edited] (HBASE-5120) Timeout monitor races with table disable handler
[ https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183380#comment-13183380 ] Zhihong Yu edited comment on HBASE-5120 at 1/10/12 5:07 PM: {code} +LOG.debug("The deletion of the CLOSED node returned " + deleteNode); {code} I think the above should be at ERROR level. Also, please log the region name. was (Author: zhi...@ebaysf.com): {code} +LOG.debug("The deletion of the CLOSED node returned " + deleteNode); {code} I think the above should be at ERROR level. Also, please make the sentence syntactically correct. Timeout monitor races with table disable handler Key: HBASE-5120 URL: https://issues.apache.org/jira/browse/HBASE-5120 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Zhihong Yu Priority: Blocker Fix For: 0.94.0, 0.92.1 Attachments: HBASE-5120.patch, HBASE-5120_1.patch, HBASE-5120_2.patch, HBASE-5120_3.patch Here is what J-D described here: https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176 I think I will retract from my statement that it used to be extremely racy and caused more troubles than it fixed, on my first test I got a stuck region in transition instead of being able to recover. The timeout was set to 2 minutes to be sure I hit it. First the region gets closed {quote} 2012-01-04 00:16:25,811 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to sv4r5s38,62023,1325635980913 for region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. {quote} 2 minutes later it times out: {quote} 2012-01-04 00:18:30,026 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
state=PENDING_CLOSE, ts=1325636185810, server=null 2012-01-04 00:18:30,026 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 2012-01-04 00:18:30,027 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. (offlining) {quote} 100ms later the master finally gets the event: {quote} 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for 1a4b111bcc228043e89f59c4c3f6a791 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so deleting ZK node and removing from regions in transition, skipping assignment of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x134589d3db03587 Deleting existing unassigned node for 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x134589d3db03587 Successfully deleted unassigned node for region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED {quote} At this point everything is fine, the region was processed as closed. But wait, remember that line where it said it was going to force an unassign? 
{quote} 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x134589d3db03587 Creating unassigned node for 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state 2012-01-04 00:18:30,328 INFO org.apache.hadoop.hbase.master.AssignmentManager: Server null returned java.lang.NullPointerException: Passed server is null for 1a4b111bcc228043e89f59c4c3f6a791 {quote} Now the master is confused, it recreated the RIT znode but the region doesn't even exist anymore. It even tries to shut it down but is blocked by NPEs. Now this is what's going on. The late ZK notification that the znode was deleted (but it got recreated after): {quote} 2012-01-04 00:19:33,285 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: The znode of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been deleted. {quote} Then it prints this, and much later tries to unassign it again: {quote} 2012-01-04 00:19:46,607 DEBUG
[jira] [Issue Comment Edited] (HBASE-5123) Provide more aggregate functions for Aggregations Protocol
[ https://issues.apache.org/jira/browse/HBASE-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182337#comment-13182337 ] Zhihong Yu edited comment on HBASE-5123 at 1/9/12 12:32 AM: AggregationClient.std() already provides support for standard deviation. was (Author: zhi...@ebaysf.com): AggregationClient.std() already provides this capability. Provide more aggregate functions for Aggregations Protocol -- Key: HBASE-5123 URL: https://issues.apache.org/jira/browse/HBASE-5123 Project: HBase Issue Type: Improvement Reporter: Zhihong Yu Royston requested the following aggregates on top of what we already have: Median, Weighted Median, Mult See discussion entitled 'AggregateProtocol Help' on user list
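For reference, a standard deviation aggregate like the `AggregationClient.std()` mentioned above can be computed from two running sums (sum and sum of squares), which is exactly the shape that distributes well across region coprocessors. A hypothetical, single-node sketch of the math (not the actual AggregationClient implementation):

```java
// Hypothetical sketch: population standard deviation from running sums,
// the form an aggregation coprocessor can combine across regions.
public class StdDev {
    public static double std(double[] values) {
        double sum = 0, sumSq = 0;
        for (double v : values) { sum += v; sumSq += v * v; }
        double mean = sum / values.length;
        // Var(X) = E[X^2] - E[X]^2
        return Math.sqrt(sumSq / values.length - mean * mean);
    }
}
```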
[jira] [Issue Comment Edited] (HBASE-5137) MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws IOException
[ https://issues.apache.org/jira/browse/HBASE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182111#comment-13182111 ] Zhihong Yu edited comment on HBASE-5137 at 1/7/12 10:04 PM: Nicolas might know the reason for introducing the hbase.hlog.split.failure.retry.interval parameter was (Author: zhi...@ebaysf.com): Nicolas might know the reason for introducing the hbase.hlog.split.failure.retry.interval parameter Please provide a patch for 0.92 and TRUNK which adds a check for retrySplitting in the following if statement (line 220): {code} if (!checkFileSystem()) { {code} MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws IOException Key: HBASE-5137 URL: https://issues.apache.org/jira/browse/HBASE-5137 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: 5137-trunk.txt, HBASE-5137.patch I am not sure if this bug was already raised in JIRA. In our test cluster we had a scenario where the RS had gone down and ServerShutDownHandler started with splitLog. But as the HDFS was down the check waitOnSafeMode throws IOException. {code} try { // If FS is in safe mode, just wait till out of it. FSUtils.waitOnSafeMode(conf, conf.getInt(HConstants.THREAD_WAKE_FREQUENCY, 1000)); splitter.splitLog(); } catch (OrphanHLogAfterSplitException e) { {code} We catch the exception {code} } catch (IOException e) { checkFileSystem(); LOG.error("Failed splitting " + logDir.toString(), e); } {code} So the HLog split itself did not happen. We encountered a case where 4 regions that were recently split in the crashed RS were lost. Can we abort the Master in such scenarios? Pls suggest.
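The retry idea hinted at by `hbase.hlog.split.failure.retry.interval` — retry the split instead of swallowing the IOException, and abort only once retries are exhausted — can be sketched as below. This is a hypothetical helper, not the actual MasterFileSystem patch; the interface and parameter names are illustrative.

```java
import java.io.IOException;

// Hypothetical sketch: retry log splitting on IOException, rethrowing only
// after the retries are exhausted so the caller (the master) can abort.
public class SplitRetry {
    public interface Splitter { void splitLog() throws IOException; }

    public static void splitWithRetry(Splitter splitter, int maxRetries, long intervalMs)
            throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                splitter.splitLog();
                return;                       // success: logs are split
            } catch (IOException e) {
                last = e;                     // remember and retry after the interval
                try {
                    Thread.sleep(intervalMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw last;
                }
            }
        }
        throw last;                           // exhausted: surface the failure
    }
}
```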
[jira] [Issue Comment Edited] (HBASE-5140) TableInputFormat subclass to allow N number of splits per region during MR jobs
[ https://issues.apache.org/jira/browse/HBASE-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13181572#comment-13181572 ] Zhihong Yu edited comment on HBASE-5140 at 1/6/12 8:40 PM: --- I suggest utilizing this method in HTable: {code} public Pair<byte[][],byte[][]> getStartEndKeys() throws IOException { {code} i.e. start and end keys returned by the above method are passed to the splitter interface. was (Author: zhi...@ebaysf.com): I suggest utilizing this method in HTable: {code} public Pair<byte[][],byte[][]> getStartEndKeys() throws IOException { {code} i.e. start and end keys are passed to the splitter interface. TableInputFormat subclass to allow N number of splits per region during MR jobs --- Key: HBASE-5140 URL: https://issues.apache.org/jira/browse/HBASE-5140 Project: HBase Issue Type: New Feature Components: mapreduce Reporter: Josh Wymer Priority: Trivial Original Estimate: 72h Remaining Estimate: 72h In regards to [HBASE-5138|https://issues.apache.org/jira/browse/HBASE-5138] I am working on a subclass of the TableInputFormat class that overrides getSplits in order to generate N splits per region and/or N splits per job. The idea is to convert the startKey and endKey for each region from byte[] to BigDecimal, take the difference, divide by N, convert back to byte[] and generate splits on the resulting values. Assuming your keys are fully distributed, this should generate splits with nearly the same number of rows per split. Any suggestions on this issue are welcome.
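The range-division idea described above can be sketched like this. It is a minimal illustration, not the actual getSplits override; the description mentions BigDecimal, but BigInteger is the more natural fit for interpreting raw byte[] row keys as unsigned integers, so it is used here. `RangeSplitter` and `splitPoints` are invented names:

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class RangeSplitter {
  /**
   * Treats startKey/endKey as unsigned big-endian integers, divides the
   * range into n equal slices, and returns the n-1 interior boundaries.
   * Converting each boundary back to byte[] would yield the split keys.
   */
  static List<BigInteger> splitPoints(byte[] startKey, byte[] endKey, int n) {
    BigInteger start = new BigInteger(1, startKey); // signum=1: unsigned interpretation
    BigInteger end = new BigInteger(1, endKey);
    BigInteger step = end.subtract(start).divide(BigInteger.valueOf(n));
    List<BigInteger> points = new ArrayList<>();
    for (int i = 1; i < n; i++) {
      points.add(start.add(step.multiply(BigInteger.valueOf(i))));
    }
    return points;
  }
}
```

As the reporter notes, the even-row-count property only holds if keys are uniformly distributed over the byte range.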
[jira] [Issue Comment Edited] (HBASE-5121) MajorCompaction may affect scan's correctness
[ https://issues.apache.org/jira/browse/HBASE-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13180281#comment-13180281 ] Zhihong Yu edited comment on HBASE-5121 at 1/5/12 11:20 AM: The above test failure shows the bug, that's great. About the patch itself, some minor comments: Please add Apache license to PeekChangedException header. {code} ++ this.lastTop.toString() + ", and after = " ++ (this.heap.peek() == null ? null : this.heap.peek().toString())); {code} The null check on the second line is not needed for string concatenation. See http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/StringBuffer.html#append%28java.lang.String%29 {code} + mayContainsMoreRows = currentAsInternal.next(result, limit); {code} Since this part of the code is changed, please name the variable mayContainMoreRows. Please combine the two patches into one next time they're uploaded. Great work, Chunhui. was (Author: zhi...@ebaysf.com): The above test failure shows the bug, that's great. About the patch itself, some minor comments: Please add Apache license to PeekChangedException header. {code} ++ this.lastTop.toString() + ", and after = " ++ (this.heap.peek() == null ? null : this.heap.peek().toString())); {code} The null check on the second line is not needed for string concatenation. {code} + mayContainsMoreRows = currentAsInternal.next(result, limit); {code} Since this part of the code is changed, please name the variable mayContainMoreRows. Please combine the two patches into one next time they're uploaded. Great work, Chunhui. MajorCompaction may affect scan's correctness - Key: HBASE-5121 URL: https://issues.apache.org/jira/browse/HBASE-5121 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.90.4 Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: hbase-5121-testcase.patch, hbase-5121.patch, hbase-5121v2.patch In our test, there are two families' keyvalue for one row.
But we could find an infrequent problem when doing scan's next() if a majorCompaction happens concurrently. In the client's two consecutive scan.next() calls: 1. First time, scan's next() returns the result where family A is null. 2. Second time, scan's next() returns the result where family B is null. The two next() results have the same row. If there are more families, I think the scenario will be even stranger... We find the reason is that storescanner.peek() is changed after majorCompaction if there are delete-type KeyValues. This change means the PriorityQueue<KeyValueScanner> of RegionScanner's heap is no longer guaranteed to be sorted.
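The heap invariant being violated here can be shown with a toy example: a PriorityQueue orders elements only at insertion time, so if an element's key changes in place (as StoreScanner.peek() effectively does after a major compaction swaps the underlying files), the queue is no longer sorted until the element is removed and re-added. This is a generic illustration, not the RegionScanner fix itself; `Box` and `resort` are invented names:

```java
import java.util.PriorityQueue;

public class HeapInvariant {
  static class Box implements Comparable<Box> {
    int key;
    Box(int k) { key = k; }
    public int compareTo(Box o) { return Integer.compare(key, o.key); }
  }

  /** Restores the heap invariant for an element whose key changed in place. */
  static Box resort(PriorityQueue<Box> heap, Box changed) {
    heap.remove(changed);  // remove by identity...
    heap.add(changed);     // ...and re-insert so the queue re-sifts it
    return heap.peek();
  }
}
```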
[jira] [Issue Comment Edited] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry
[ https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179394#comment-13179394 ] Zhihong Yu edited comment on HBASE-5081 at 1/4/12 2:48 PM: --- In the above scenario, would there be many duplicate "log split scheduled for" messages in master log file ? I think the latest patch should go through verification in a real cluster. was (Author: zhi...@ebaysf.com): In the above scenario, would there be many duplicate log messages in master log file ? Distributed log splitting deleteNode races against splitLog retry -- Key: HBASE-5081 URL: https://issues.apache.org/jira/browse/HBASE-5081 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Prakash Khemani Fix For: 0.92.0 Attachments: 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, patch_for_92_v3.txt Recently, during 0.92 rc testing, we found distributed log splitting hangs there forever. Please see attached screen shot. I looked into it and here is what happened I think: 1. One rs died, the servershutdownhandler found it out and started the distributed log splitting; 2. All three tasks failed, so the three tasks were deleted, asynchronously; 3. Servershutdownhandler retried the log splitting; 4. During the retrial, it created these three tasks again, and put them in a hashmap (tasks); 5. The asynchronous deletion in step 2 finally happened for one task, in the callback, it removed one task in the hashmap; 6.
One of the newly submitted tasks' zookeeper watcher found out that the task is unassigned, and it is not in the hashmap, so it created a new orphan task. 7. All three tasks failed, but the task created in step 6 is an orphan, so the batch.err counter was one short, so the log splitting hangs there and keeps waiting for the last task to finish, which is never going to happen. So I think the problem is step 2. The fix is to make the deletion sync, instead of async, so that the retry will have a clean start. Async deleteNode will mess up the split log retry. In an extreme situation, if the async deleteNode doesn't happen soon enough, some node created during the retry could be deleted. deleteNode should be sync.
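Steps 2-6 above can be modeled deterministically in a few lines: a delete issued asynchronously for the failed first attempt completes only after the retry has re-created the task, so the stale callback removes the fresh task and leaves an orphan. The map and path names below are illustrative stand-ins, not SplitLogManager APIs:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class DeleteRace {
  /** Returns whether the retried task is still tracked after the stale async delete fires. */
  static boolean retriedTaskSurvives() {
    ConcurrentMap<String, String> tasks = new ConcurrentHashMap<>();
    tasks.put("/splitlog/task1", "failed");                      // attempt 1 failed
    Runnable staleAsyncDelete = () -> tasks.remove("/splitlog/task1");
    tasks.put("/splitlog/task1", "retrying");                    // retry re-creates the task
    staleAsyncDelete.run();                                      // async delete fires late
    return tasks.containsKey("/splitlog/task1");                 // the fresh task is gone
  }
}
```

A synchronous delete completed before the retry re-creates the task closes this window entirely, which is the fix argued for above.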
[jira] [Issue Comment Edited] (HBASE-5121) MajorCompaction may affect scan's correctness
[ https://issues.apache.org/jira/browse/HBASE-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179390#comment-13179390 ] Zhihong Yu edited comment on HBASE-5121 at 1/4/12 5:04 PM: --- {code} +this.lastTop = null; +LOG.debug("Storescanner.peek() is changed where before = " ++ this.lastTop.toString() + ", and after = " {code} Should null assignment for lastTop be after the debug log ? was (Author: zhi...@ebaysf.com): Should null assignment for lastTop be after the debug log ? MajorCompaction may affect scan's correctness - Key: HBASE-5121 URL: https://issues.apache.org/jira/browse/HBASE-5121 Project: HBase Issue Type: Bug Components: regionserver Reporter: chunhui shen Attachments: hbase-5121.patch In our test, there are two families' keyvalue for one row. But we could find an infrequent problem when doing scan's next() if a majorCompaction happens concurrently. In the client's two consecutive scan.next() calls: 1. First time, scan's next() returns the result where family A is null. 2. Second time, scan's next() returns the result where family B is null. The two next() results have the same row. If there are more families, I think the scenario will be even stranger... We find the reason is that storescanner.peek() is changed after majorCompaction if there are delete-type KeyValues. This change means the PriorityQueue<KeyValueScanner> of RegionScanner's heap is no longer guaranteed to be sorted.
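The review point here is an ordering bug: nulling lastTop before the debug line dereferences it would throw a NullPointerException. A toy version of the corrected ordering, with all names as stand-ins for the StoreScanner fields in the snippet (not the actual patch):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class LogThenNull {
  static String lastTop = "row1/cf:a";
  static Deque<String> heap = new ArrayDeque<>();

  /** Builds the debug message while lastTop is still non-null, then clears it. */
  static String describeChange() {
    String msg = "peek changed: before=" + lastTop
        + ", after=" + (heap.peek() == null ? "null" : heap.peek());
    lastTop = null;  // null assignment moved AFTER the log line
    return msg;
  }
}
```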
[jira] [Issue Comment Edited] (HBASE-4822) TestCoprocessorEndpoint is failing intermittently
[ https://issues.apache.org/jira/browse/HBASE-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178185#comment-13178185 ] Zhihong Yu edited comment on HBASE-4822 at 1/1/12 4:51 PM: --- testAggregation disabled in TRUNK. was (Author: zhi...@ebaysf.com): testAggregation disabled. TestCoprocessorEndpoint is failing intermittently - Key: HBASE-4822 URL: https://issues.apache.org/jira/browse/HBASE-4822 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.92.0, 0.94.0 Reporter: Gary Helmling Attachments: 4822.txt TestCoprocessorEndpoint is failing intermittently in Jenkins. The errors are all similar to: {noformat} --- Test set: org.apache.hadoop.hbase.coprocessor.TestCoprocessorEndpoint --- Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 30.562 sec FAILURE! testAggregation(org.apache.hadoop.hbase.coprocessor.TestCoprocessorEndpoint) Time elapsed: 0.069 sec FAILURE! java.lang.AssertionError: Invalid result expected:180 but was:190 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.hadoop.hbase.coprocessor.TestCoprocessorEndpoint.testAggregation(TestCoprocessorEndpoint.java:149) {noformat}
[jira] [Issue Comment Edited] (HBASE-5064) use surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177282#comment-13177282 ] Zhihong Yu edited comment on HBASE-5064 at 12/29/11 5:20 PM: - I saw patch v19 coming and started to run test suite with it - I didn't keep the output from TestMiniClusterLoadParallel w.r.t. parameter for the hadoop-qa build, we should set default value (of 2) for surefire.secondPartThreadCount if the parameter isn't set. was (Author: zhi...@ebaysf.com): I saw patch v19 coming and started to run test suited with it - I didn't keep the output from TestMiniClusterLoadParallel w.r.t. parameter for the hadoop-qa build, we should set default value (of 2) for surefire.secondPartThreadCount if the parameter isn't set. use surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 5064.v18.patch, 5064.v19.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (HBASE-5097) RegionObserver implementation whose preScannerOpen and postScannerOpen Impl return null can stall the system initialization through NPE
[ https://issues.apache.org/jira/browse/HBASE-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176701#comment-13176701 ] Zhihong Yu edited comment on HBASE-5097 at 12/28/11 4:57 PM: - fair enough. So if postOpenScanner returns null we just won't create a scanner? Might lead to hard to detect problems. was (Author: lhofhansl): fair enough. So if posyOpenScanner returns null we just won't create a scanner? Might lead to hard to detect problems. RegionObserver implementation whose preScannerOpen and postScannerOpen Impl return null can stall the system initialization through NPE --- Key: HBASE-5097 URL: https://issues.apache.org/jira/browse/HBASE-5097 Project: HBase Issue Type: Bug Components: coprocessors Reporter: ramkrishna.s.vasudevan In HRegionServer.java openScanner() {code} r.prepareScanner(scan); RegionScanner s = null; if (r.getCoprocessorHost() != null) { s = r.getCoprocessorHost().preScannerOpen(scan); } if (s == null) { s = r.getScanner(scan); } if (r.getCoprocessorHost() != null) { s = r.getCoprocessorHost().postScannerOpen(scan, s); } {code} If we dont have implemention for postScannerOpen the RegionScanner is null and so throwing nullpointer {code} java.lang.NullPointerException at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881) at org.apache.hadoop.hbase.regionserver.HRegionServer.addScanner(HRegionServer.java:2282) at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2272) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326) {code} Making this defect as blocker.. 
Pls feel free to change the priority if I am wrong. Also correct me if my way of trying out coprocessors without implementing postScannerOpen is wrong. Am just a learner.
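The guard being asked for in the openScanner() snippet above can be sketched as follows: if the coprocessor's post-hook returns null, fall back to the scanner built so far (or fail fast with a clear error) instead of registering null and hitting the NPE in addScanner(). The interfaces are simplified stand-ins, not the HRegionServer API:

```java
interface RegionScanner {}

class ScannerOpener {
  /**
   * fromHooks: what postScannerOpen returned (possibly null);
   * fallback:  the scanner created before the post-hook ran.
   */
  static RegionScanner openScanner(RegionScanner fromHooks, RegionScanner fallback) {
    RegionScanner s = (fromHooks != null) ? fromHooks : fallback;
    if (s == null) {
      // Fail with a descriptive error instead of a later NullPointerException.
      throw new IllegalStateException("coprocessor returned null scanner");
    }
    return s;
  }
}
```

Whether a null post-hook result should fall back silently or fail is exactly the trade-off debated in the comment ("might lead to hard to detect problems").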
[jira] [Issue Comment Edited] (HBASE-5097) RegionObserver implementation whose preScannerOpen and postScannerOpen Impl return null can stall the system initialization through NPE
[ https://issues.apache.org/jira/browse/HBASE-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176701#comment-13176701 ] Zhihong Yu edited comment on HBASE-5097 at 12/28/11 4:58 PM: - fair enough. So if postScannerOpen() returns null we just won't create a scanner? Might lead to hard to detect problems. was (Author: lhofhansl): fair enough. So if postOpenScanner returns null we just won't create a scanner? Might lead to hard to detect problems. RegionObserver implementation whose preScannerOpen and postScannerOpen Impl return null can stall the system initialization through NPE --- Key: HBASE-5097 URL: https://issues.apache.org/jira/browse/HBASE-5097 Project: HBase Issue Type: Bug Components: coprocessors Reporter: ramkrishna.s.vasudevan In HRegionServer.java openScanner() {code} r.prepareScanner(scan); RegionScanner s = null; if (r.getCoprocessorHost() != null) { s = r.getCoprocessorHost().preScannerOpen(scan); } if (s == null) { s = r.getScanner(scan); } if (r.getCoprocessorHost() != null) { s = r.getCoprocessorHost().postScannerOpen(scan, s); } {code} If we dont have implemention for postScannerOpen the RegionScanner is null and so throwing nullpointer {code} java.lang.NullPointerException at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881) at org.apache.hadoop.hbase.regionserver.HRegionServer.addScanner(HRegionServer.java:2282) at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2272) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326) {code} Making this defect as 
blocker.. Pls feel free to change the priority if I am wrong. Also correct me if my way of trying out coprocessors without implementing postScannerOpen is wrong. Am just a learner.
[jira] [Issue Comment Edited] (HBASE-5101) Add a max number of regions per regionserver limit
[ https://issues.apache.org/jira/browse/HBASE-5101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176951#comment-13176951 ] Zhihong Yu edited comment on HBASE-5101 at 12/29/11 2:27 AM: - @Jon: Can you describe how the cluster got into this state ? I guess a lot of region servers crashed ? Otherwise there should have been better planning for the region size. was (Author: zhi...@ebaysf.com): @Jon: Can you describe how the cluster got into this state ? I guess a lot of region servers crashed ? Otherwise there should have been better planning for the HFile size limit. Add a max number of regions per regionserver limit -- Key: HBASE-5101 URL: https://issues.apache.org/jira/browse/HBASE-5101 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.92.0 Reporter: Jonathan Hsieh In a testing environment, a cluster got to a state with more than 1500 regions per region server, and essentially became stuck and unavailable. We could add a limit to the number of regions that a region server can serve to prevent this from happening. This looks like it could be implemented in the core or as a coprocessor.
[jira] [Issue Comment Edited] (HBASE-5100) Rollback of split would cause closed region to opened
[ https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176465#comment-13176465 ] Zhihong Yu edited comment on HBASE-5100 at 12/28/11 6:01 AM: - When this.parent.close(false) returns null (it means the region has already been closed), we needn't add the JournalEntry.CLOSED_PARENT_REGION was (Author: zjushch): When this.parent.close(false) returns null (it means the region has already been closed), we needn't add the JournalEntry.CLOSED_PARENT_REGION Rollback of split would cause closed region to opened -- Key: HBASE-5100 URL: https://issues.apache.org/jira/browse/HBASE-5100 Project: HBase Issue Type: Bug Reporter: chunhui shen Assignee: chunhui shen If the master sends close-region to the RS while the region's split transaction is concurrently in progress, it may cause a closed region to be reopened. See the detailed code in SplitTransaction#createDaughters {code} List<StoreFile> hstoreFilesToSplit = null; try{ hstoreFilesToSplit = this.parent.close(false); if (hstoreFilesToSplit == null) { // The region was closed by a concurrent thread. We can't continue // with the split, instead we must just abandon the split. If we // reopen or split this could cause problems because the region has // probably already been moved to a different server, or is in the // process of moving to a different server. throw new IOException("Failed to close region: already closed by " + "another thread"); } } finally { this.journal.add(JournalEntry.CLOSED_PARENT_REGION); } {code} when rolling back, the JournalEntry.CLOSED_PARENT_REGION causes this.parent.initialize(); Although this region is not onlined in the regionserver, it may bring some potential problems. For example, in our environment, the closed parent region was rolled back successfully, and then started compaction and split again.
The parent region is f892dd6107b6b4130199582abc78e9c1 master log {code} 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., src=dw87.kgb.sqa.cm4,60020,1324827866085, dest=dw80.kgb.sqa.cm4,60020,1324827865780 2011-12-26 00:24:42,693 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. (offlining) 2011-12-26 00:24:42,694 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. 
2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING) 2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, region=f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,348 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, region=f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. state=CLOSED, ts=1324830285347 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x13447f283f40e73 Creating (or updating) unassigned node for f892dd6107b6b4130199582abc78e9c1 with OFFLINE state 2011-12-26 00:24:45,354 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=dw75.kgb.sqa.cm4:6, region=f892dd6107b6b4130199582abc78e9c1
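The fix proposed in the comment above — only journal CLOSED_PARENT_REGION after close() actually returned store files, so a rollback never re-initializes a region this thread never closed — can be sketched like this. Types are simplified stand-ins for SplitTransaction's journal, not the actual patch:

```java
import java.util.ArrayList;
import java.util.List;

public class SplitJournal {
  enum JournalEntry { CLOSED_PARENT_REGION }
  final List<JournalEntry> journal = new ArrayList<>();

  /**
   * storeFilesFromClose models the return of parent.close(false).
   * Returns true if this thread performed (and journaled) the close.
   */
  boolean closeParent(List<String> storeFilesFromClose) {
    if (storeFilesFromClose == null) {
      // Region already closed by a concurrent thread: abandon the split,
      // and crucially do NOT journal a close we did not perform, so the
      // rollback cannot call parent.initialize() on it.
      return false;
    }
    journal.add(JournalEntry.CLOSED_PARENT_REGION);
    return true;
  }
}
```

Contrast with the original code, where journal.add() sat in a finally block and therefore ran even on the abandon path.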
[jira] [Issue Comment Edited] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13175770#comment-13175770 ] Zhihong Yu edited comment on HBASE-4218 at 12/25/11 12:21 AM: -- Integrated to TRUNK Thanks for the awesome work, Jacek. Thanks for the persistence to finish this feature, Mikhail. Thanks for the detailed review Kannan. Thanks for the suggestions, Matt. was (Author: zhi...@ebaysf.com): Integrated to TRUNK Thanks for the awesome work, Jacek. Thanks for the persistence to finish this feature, Mikhail. Thanks for the suggestions, Matt. Data Block Encoding of KeyValues (aka delta encoding / prefix compression) --- Key: HBASE-4218 URL: https://issues.apache.org/jira/browse/HBASE-4218 Project: HBase Issue Type: Improvement Components: io Affects Versions: 0.94.0 Reporter: Jacek Migdal Assignee: Mikhail Bautin Labels: compression Fix For: 0.94.0 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 0001-Delta-encoding.patch, D447.1.patch, D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.2.patch, D447.3.patch, D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, D447.9.patch, Data-block-encoding-2011-12-23.patch, Delta-encoding.patch-2011-12-22_11_52_07.patch, Delta_encoding_with_memstore_TS.patch, open-source.diff A compression for keys. Keys are sorted in HFile and they are usually very similar. Because of that, it is possible to design better compression than general purpose algorithms, It is an additional step designed to be used in memory. It aims to save memory in cache as well as speeding seeks within HFileBlocks. It should improve performance a lot, if key lengths are larger than value lengths. For example, it makes a lot of sense to use it when value is a counter. 
Initial tests on real data (key length = ~90 bytes, value length = 8 bytes) show that I could achieve a decent level of compression: key compression ratio: 92% total compression ratio: 85% LZO on the same data: 85% LZO after delta encoding: 91% While having much better performance (20-80% faster decompression than LZO). Moreover, it should allow far more efficient seeking, which should improve performance a bit. It seems that simple compression algorithms are good enough. Most of the savings are due to prefix compression, int128 encoding, timestamp diffs and bitfields to avoid duplication. That way, comparisons of compressed data can be much faster than a byte comparator (thanks to prefix compression and bitfields). In order to implement it in HBase two important changes in design will be needed: -solidify interface to HFileBlock / HFileReader Scanner to provide seeking and iterating; access to uncompressed buffer in HFileBlock will have bad performance -extend comparators to support comparison assuming that the N first bytes are equal (or some fields are equal) Link to a discussion about something similar: http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression
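The prefix-compression idea that drives most of the savings described above can be shown with a toy codec: for sorted keys, store only the length of the prefix shared with the previous key plus the differing suffix. This is a minimal illustration (using a readable "<sharedLen>:<suffix>" string form), not HBase's actual DataBlockEncoder wire format:

```java
import java.nio.charset.StandardCharsets;

public class PrefixCodec {
  /** Length of the byte prefix shared by two keys. */
  static int commonPrefix(byte[] prev, byte[] cur) {
    int n = Math.min(prev.length, cur.length);
    int i = 0;
    while (i < n && prev[i] == cur[i]) i++;
    return i;
  }

  /** Encodes cur relative to prev: shared-prefix length, then the new suffix. */
  static String encode(String prev, String cur) {
    int shared = commonPrefix(prev.getBytes(StandardCharsets.UTF_8),
                              cur.getBytes(StandardCharsets.UTF_8));
    return shared + ":" + cur.substring(shared);
  }
}
```

Since adjacent HFile keys mostly differ only in a short suffix, each encoded entry is a few bytes instead of the full ~90-byte key, and comparisons can skip the known-equal prefix, which is the comparator extension the second design change calls for.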
[jira] [Issue Comment Edited] (HBASE-4938) Create a HRegion.getScanner public method that allows reading from a specified readPoint
[ https://issues.apache.org/jira/browse/HBASE-4938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170980#comment-13170980 ] Zhihong Yu edited comment on HBASE-4938 at 12/16/11 2:08 PM: - Hadoop Qa checks attachment Id to not rerun same attachment. That's why a new file has to be attached. was (Author: zhi...@ebaysf.com): Hadoop Qa uses attachment Id to not rerun same attachment. That's why a new file has to be attached. Create a HRegion.getScanner public method that allows reading from a specified readPoint Key: HBASE-4938 URL: https://issues.apache.org/jira/browse/HBASE-4938 Project: HBase Issue Type: Improvement Reporter: dhruba borthakur Assignee: dhruba borthakur Priority: Minor Attachments: scannerMVCC1.txt There is an existing api HRegion.getScanner(Scan) that allows scanning a table. My proposal is to extend it to HRegion.getScanner(Scan, long readPoint)
[jira] [Issue Comment Edited] (HBASE-5055) Build against hadoop 0.22 broken
[ https://issues.apache.org/jira/browse/HBASE-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13171258#comment-13171258 ] Zhihong Yu edited comment on HBASE-5055 at 12/16/11 10:19 PM: -- Shall we change the version for hadoop 0.22 to 0.22.0 from 0.22.0-SNAPSHOT ? was (Author: zhi...@ebaysf.com): Shall we change the version for hadoop 0.22 to 0.22.0 instead of SNAPSHOT ? Build against hadoop 0.22 broken Key: HBASE-5055 URL: https://issues.apache.org/jira/browse/HBASE-5055 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Zhihong Yu Priority: Blocker Fix For: 0.92.0, 0.94.0 I got the following when compiling TRUNK against hadoop 0.22: {code} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.0.2:compile (default-compile) on project hbase: Compilation failure: Compilation failure: [ERROR] /Users/zhihyu/trunk-hbase/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java:[37,39] cannot find symbol [ERROR] symbol : class DFSInputStream [ERROR] location: class org.apache.hadoop.hdfs.DFSClient [ERROR] [ERROR] /Users/zhihyu/trunk-hbase/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java:[109,37] cannot find symbol [ERROR] symbol : class DFSInputStream [ERROR] location: class org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.WALReader.WALReaderFSDataInputStream {code}
[jira] [Issue Comment Edited] (HBASE-4683) Always cache index and bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13168899#comment-13168899 ] Zhihong Yu edited comment on HBASE-4683 at 12/13/11 11:59 PM: -- SchemaConfigured was introduced by HBASE-4768. Looks like backport is needed since this JIRA is marked against 0.92 was (Author: zhi...@ebaysf.com): SchemaConfigured was introduced by HBASE-4768. Always cache index and bloom blocks --- Key: HBASE-4683 URL: https://issues.apache.org/jira/browse/HBASE-4683 Project: HBase Issue Type: New Feature Reporter: Lars Hofhansl Assignee: Mikhail Bautin Priority: Minor Fix For: 0.92.0, 0.94.0 Attachments: 4683-v2.txt, 4683.txt, D807.1.patch, D807.2.patch, HBASE-4683-v3.patch This would add a new boolean config option: hfile.block.cache.datablocks Default would be true. Setting this to false allows HBase in a mode where only index blocks are cached, which is useful for analytical scenarios where a useful working set of the data cannot be expected to fit into the (aggregate) cache. This is the equivalent of setting cacheBlocks to false on all scans (including scans on behalf of gets). I would like to get a general feeling about what folks think about this. The change itself would be simple. Update (Mikhail): we probably don't need a new conf option. Instead, we will make index blocks cached by default.