date:20130110


 [ 
https://issues.apache.org/jira/browse/HBASE-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-7403:


Attachment: hbase-7403-trunkv9.patch

Addressing Ted's comments and adding a test for the concurrent region 
splitting and region merging scenario

 Online Merge
 

 Key: HBASE-7403
 URL: https://issues.apache.org/jira/browse/HBASE-7403
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0, 0.94.5

 Attachments: 7403-trunkv5.patch, 7403-trunkv6.patch, 7403v5.diff, 
 7403-v5.txt, 7403v5.txt, hbase-7403-94v1.patch, hbase-7403-trunkv1.patch, 
 hbase-7403-trunkv5.patch, hbase-7403-trunkv6.patch, hbase-7403-trunkv7.patch, 
 hbase-7403-trunkv8.patch, hbase-7403-trunkv9.patch, merge region.pdf


 The features of this online merge:
 1. It is online; there is no need to disable the table.
 2. It changes little of the current code, so it could be applied to trunk, 
 0.94, 0.92 or 0.90.
 3. A merge request is easy to issue: there is no need to enter a long region 
 name, the encoded name is enough.
 4. There are no restrictions during the operation: you don't need to take 
 care of events such as Server Dead, Balance, Split or Disabling/Enabling 
 table, nor of whether you sent a wrong merge request; this is already 
 handled for you.
 5. The two merging regions are offline only briefly.
 We need merge in the following cases:
 1. A region hole or region overlap that can't be fixed by hbck
 2. A region becomes empty because of TTL and an unreasonable rowkey design
 3. A region is always empty or very small because of presplitting when the 
 table was created
 4. Too many empty or small regions reduce system performance (e.g. mslab)
 The current merge tools only work offline and cannot redo the merge if an 
 exception is thrown partway through, which leaves dirty data.
 For an online system, we need an online merge.
 The implementation logic of this patch for Online Merge is as follows.
 For example, to merge regionA and regionB into regionC:
 1. Offline the two regions A and B
 2. Merge the two regions in HDFS (create regionC's directory, move 
 regionA's and regionB's files to regionC's directory, delete regionA's and 
 regionB's directories)
 3. Add the merged regionC to .META.
 4. Assign the merged regionC
 By design, once the merge work in HDFS has been done, it can be redone until 
 it succeeds if it throws an exception, aborts, or the server restarts, but 
 it cannot be rolled back.
 It depends on the following:
 Use ZooKeeper to record the transaction journal state, which makes redo easier
 Use ZooKeeper to send/receive merge requests
 The merge transaction is executed on the master
 Merge requests can be issued through the API or the shell tool
 For the merge process, please see the attachment and the patch
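
 The redo-only behaviour described above can be illustrated with a small 
 sketch. It is not taken from the patch: the class name, the state names and 
 the in-memory field that stands in for the ZooKeeper journal are all 
 assumptions made for illustration. The idea shown is that each step is 
 journaled only after it completes, so a retry resumes at the first step that 
 has not been recorded.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a redo-only merge journal; not the patch's actual classes. */
class MergeJournalSketch {
    /** Journal states in execution order; the names are illustrative. */
    enum State { CREATED, REGIONS_OFFLINED, FILES_MERGED, META_UPDATED, ASSIGNED }

    /** In-memory stand-in for the state that would be persisted in ZooKeeper. */
    private State recorded = State.CREATED;
    final List<String> log = new ArrayList<>();

    State recorded() { return recorded; }

    /**
     * Runs the remaining steps, starting from the recorded state.
     * failBefore, if non-null, simulates a crash just before that state is reached.
     */
    void run(State failBefore) {
        while (recorded != State.ASSIGNED) {
            State next = State.values()[recorded.ordinal() + 1];
            if (next == failBefore) {
                throw new IllegalStateException("simulated failure before " + next);
            }
            log.add(describe(next));
            recorded = next; // journal the step only after it has completed
        }
    }

    private static String describe(State s) {
        switch (s) {
            case REGIONS_OFFLINED: return "offline regionA and regionB";
            case FILES_MERGED:     return "move files of A and B into regionC's directory";
            case META_UPDATED:     return "add regionC to .META.";
            case ASSIGNED:         return "assign regionC";
            default:               throw new IllegalArgumentException(s.toString());
        }
    }

    public static void main(String[] args) {
        MergeJournalSketch tx = new MergeJournalSketch();
        try {
            tx.run(State.META_UPDATED); // crash after the HDFS merge
        } catch (IllegalStateException e) {
            System.out.println("interrupted at " + tx.recorded());
        }
        tx.run(null);                   // redo: resumes after FILES_MERGED
        System.out.println(tx.log);
    }
}
```

 In this sketch there is deliberately no path that moves the recorded state 
 backwards, which mirrors the statement that the merge can be redone but not 
 rolled back.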

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-7506) Judgment of carrying ROOT/META will become wrong when expiring server


 [ 
https://issues.apache.org/jira/browse/HBASE-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-7506:


Attachment: 7506-trunkv1.patch

 Judgment of carrying ROOT/META will become wrong when expiring server
 -

 Key: HBASE-7506
 URL: https://issues.apache.org/jira/browse/HBASE-7506
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0

 Attachments: 7506-trunk v1.patch, 7506-trunkv1.patch, 
 7506-trunkv2.patch


 We check whether a server is carrying ROOT/META when expiring the server.
 See ServerManager#expireServer.
 If the dead server was carrying META, we assign META directly in the process of 
 ServerShutdownHandler.
 If the dead server was carrying ROOT, we offline ROOT and then call 
 verifyAndAssignRootWithRetries().
 How does the judgment of carrying ROOT/META become wrong?
 If the region is in RIT, isCarryingRegion() returns true after reading the 
 address from zk.
 However, once the RIT times out (which could be caused by 
 this.allRegionServersOffline && !noRSAvailable, see 
 AssignmentManager#TimeoutMonitor) and we assign the region elsewhere, this 
 judgment becomes wrong.
 See AssignmentManager#isCarryingRegion for details.
 With the wrong judgment of carrying ROOT/META, we would assign ROOT/META 
 twice.
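
 A toy model may make the stale judgment easier to follow. Everything in it 
 (class, method and server names) is hypothetical and it does not use HBase 
 classes. It only contrasts a handler that trusts the flag captured when the 
 server was expired with one that re-evaluates the judgment when it runs. 
 Re-evaluating is shown as one conceivable mitigation, not necessarily what 
 the attached patch does.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the stale "carrying ROOT" judgment; all names are hypothetical. */
class StaleCarryingSketch {
    /** Region name -> server currently believed to host (or be opening) it. */
    final Map<String, String> regionToServer = new HashMap<>();
    int rootAssignments = 0;

    boolean isCarrying(String server, String region) {
        return server.equals(regionToServer.get(region));
    }

    void assign(String region, String server) {
        regionToServer.put(region, server);
        if (region.equals("ROOT")) rootAssignments++;
    }

    /** Handler that trusts the flag captured when the server was expired. */
    void handleShutdownWithCapturedFlag(boolean carryingRootAtExpiry, String fallback) {
        if (carryingRootAtExpiry) assign("ROOT", fallback);
    }

    /** Handler that re-evaluates the judgment at the time it actually runs. */
    void handleShutdownWithRecheck(String deadServer, String fallback) {
        if (isCarrying(deadServer, "ROOT")) assign("ROOT", fallback);
    }

    public static void main(String[] args) {
        StaleCarryingSketch m = new StaleCarryingSketch();
        m.assign("ROOT", "rs1");                        // ROOT is opening on rs1 (in RIT)
        boolean flag = m.isCarrying("rs1", "ROOT");     // captured at expiry time
        m.assign("ROOT", "rs2");                        // RIT times out, ROOT goes elsewhere
        m.handleShutdownWithCapturedFlag(flag, "rs3");  // stale flag: ROOT assigned again
        System.out.println("ROOT assignments: " + m.rootAssignments);
    }
}
```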


[jira] [Updated] (HBASE-7506) Judgment of carrying ROOT/META will become wrong when expiring server


 [ 
https://issues.apache.org/jira/browse/HBASE-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-7506:


Status: Patch Available  (was: Open)

 Judgment of carrying ROOT/META will become wrong when expiring server
 -

 Key: HBASE-7506
 URL: https://issues.apache.org/jira/browse/HBASE-7506
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0

 Attachments: 7506-trunk v1.patch, 7506-trunkv1.patch, 
 7506-trunkv2.patch



[jira] [Commented] (HBASE-7504) -ROOT- may be offline forever after FullGC of RS


[ 
https://issues.apache.org/jira/browse/HBASE-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549425#comment-13549425
 ] 

chunhui shen commented on HBASE-7504:
-

Patch v2 committed to trunk and the 0.94 branch.

 -ROOT- may be offline forever after FullGC of  RS
 -

 Key: HBASE-7504
 URL: https://issues.apache.org/jira/browse/HBASE-7504
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0

 Attachments: 7504-trunk v1.patch, 7504-trunk v2.patch


 1. A full GC happens on the regionserver carrying ROOT.
 2. The ZK session times out; the master expires the regionserver and submits 
 it to ServerShutdownHandler.
 3. The regionserver completes the full GC.
 4. In the process of ServerShutdownHandler, verifyRootRegionLocation returns 
 true.
 5. ServerShutdownHandler skips assigning the ROOT region.
 6. The regionserver aborts itself because it receives YouAreDeadException 
 after a regionserver report.
 7. ROOT is now offline and won't be assigned again unless we restart the 
 master.
 Master Log:
 {code}
 2012-10-31 19:51:39,043 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=dw88.kgb.sqa.cm4,60020,1351671478752 to dead servers, submitted shutdown handler to be executed, root=true, meta=false
 2012-10-31 19:51:39,045 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs for dw88.kgb.sqa.cm4,60020,1351671478752
 2012-10-31 19:51:50,113 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Server dw88.kgb.sqa.cm4,60020,1351671478752 was carrying ROOT. Trying to assign.
 2012-10-31 19:52:15,939 DEBUG org.apache.hadoop.hbase.master.ServerManager: Server REPORT rejected; currently processing dw88.kgb.sqa.cm4,60020,1351671478752 as dead server
 2012-10-31 19:52:15,945 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Skipping log splitting for dw88.kgb.sqa.cm4,60020,1351671478752
 {code}
 There is no log entry for assigning ROOT.
 Regionserver log:
 {code}
 2012-10-31 19:52:15,923 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 229128ms instead of 10ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
 {code}
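
 The seven steps above can be replayed in a toy model. The class, fields and 
 server names below are hypothetical and are not HBase APIs. The optional 
 guard (do not trust a ROOT location on a server that is already in the dead 
 servers set) is only one possible way to close the window and is an 
 assumption of this sketch, not a description of the committed patch.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy replay of the seven steps; names are hypothetical, not HBase APIs. */
class RootOfflineRaceSketch {
    final Set<String> deadServers = new HashSet<>();
    String rootLocation = "rs1";
    boolean rs1Responsive = true;
    boolean rootOnline = true;

    /** Stand-in for the location check: true if the recorded host answers. */
    boolean verifyRootLocation() {
        return rootLocation != null && rs1Responsive;
    }

    /** Shutdown handling for rs1; ignoreDeadHost enables the extra guard. */
    void handleShutdown(boolean ignoreDeadHost) {
        boolean locationOk = verifyRootLocation();
        if (ignoreDeadHost && deadServers.contains(rootLocation)) {
            locationOk = false;   // a host already marked dead is not trusted
        }
        if (!locationOk) {
            rootLocation = "rs2"; // reassign ROOT to a live server
        }
    }

    /** rs1 reports in, is rejected as dead, and aborts. */
    void rs1ReportsAndAborts() {
        rs1Responsive = false;
        if ("rs1".equals(rootLocation)) rootOnline = false;
    }

    /** Returns whether ROOT is still served after the whole sequence. */
    static boolean replay(boolean ignoreDeadHost) {
        RootOfflineRaceSketch m = new RootOfflineRaceSketch();
        m.rs1Responsive = false;          // 1. full GC pauses rs1
        m.deadServers.add("rs1");         // 2. ZK session expires, master expires rs1
        m.rs1Responsive = true;           // 3. GC completes
        m.handleShutdown(ignoreDeadHost); // 4-5. handler verifies and may skip assignment
        m.rs1ReportsAndAborts();          // 6. YouAreDeadException, rs1 aborts
        return m.rootOnline;              // 7. is ROOT still online?
    }

    public static void main(String[] args) {
        System.out.println("without guard, ROOT online: " + replay(false));
        System.out.println("with guard, ROOT online: " + replay(true));
    }
}
```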


[jira] [Updated] (HBASE-7504) -ROOT- may be offline forever after FullGC of RS


 [ 
https://issues.apache.org/jira/browse/HBASE-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-7504:


Fix Version/s: 0.94.5

 -ROOT- may be offline forever after FullGC of  RS
 -

 Key: HBASE-7504
 URL: https://issues.apache.org/jira/browse/HBASE-7504
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0, 0.94.5

 Attachments: 7504-trunk v1.patch, 7504-trunk v2.patch



[jira] [Updated] (HBASE-7504) -ROOT- may be offline forever after FullGC of RS


 [ 
https://issues.apache.org/jira/browse/HBASE-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-7504:


Attachment: 7504-94.patch

 -ROOT- may be offline forever after FullGC of  RS
 -

 Key: HBASE-7504
 URL: https://issues.apache.org/jira/browse/HBASE-7504
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0, 0.94.5

 Attachments: 7504-94.patch, 7504-trunk v1.patch, 7504-trunk v2.patch



[jira] [Updated] (HBASE-7505) Server will hang when stopping cluster, caused by waiting for split threads


 [ 
https://issues.apache.org/jira/browse/HBASE-7505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-7505:


   Resolution: Fixed
Fix Version/s: 0.94.5 (was: 0.94.4)
 Hadoop Flags: Reviewed
       Status: Resolved (was: Patch Available)

 Server will hang when stopping cluster, caused by waiting for split threads
 ---

 Key: HBASE-7505
 URL: https://issues.apache.org/jira/browse/HBASE-7505
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0, 0.94.5

 Attachments: 7505-94.patch, 7505-trunk v1.patch


 We now retry 100 times (about 3200 minutes) for 
 HRegionServer#postOpenDeployTasks; see 
 HConnectionManager#setServerSideHConnectionRetries.
 However, when stopping the cluster we wait for the split threads in 
 HRegionServer#join. If the META/ROOT server has already been stopped, the 
 split thread won't exit because it is still retrying 
 HRegionServer#postOpenDeployTasks.

[jira] [Commented] (HBASE-7506) Judgment of carrying ROOT/META will become wrong when expiring server

[
https://issues.apache.org/jira/browse/HBASE-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549435#comment-13549435
]

Hadoop QA commented on HBASE-7506:
--

{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12564138/7506-trunkv1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author
tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include
any new or modified tests.
Please justify why no new tests are needed for this
patch.
Also please list what manual steps were performed to
verify this patch.

{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop
2.0 profile.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any
warning messages.

{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.

{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.

{color:green}+1 lineLengths{color}. The patch does not introduce lines
longer than 100

{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.regionserver.TestSplitTransaction

{color:red}-1 core zombie tests{color}. There are 1 zombie test(s):
at
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.testBalancerWithRackLocality(TestBalancerWithNodeGroup.java:220)

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/3958//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3958//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3958//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3958//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3958//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3958//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3958//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3958//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/3958//console

This message is automatically generated.

Judgment of carrying ROOT/META will become wrong when expiring server
-

Key: HBASE-7506
URL: https://issues.apache.org/jira/browse/HBASE-7506
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
Fix For: 0.96.0

Attachments: 7506-trunk v1.patch, 7506-trunkv1.patch,
7506-trunkv2.patch


[jira] [Commented] (HBASE-7504) -ROOT- may be offline forever after FullGC of RS


[ 
https://issues.apache.org/jira/browse/HBASE-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549436#comment-13549436
 ] 

Hadoop QA commented on HBASE-7504:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12564139/7504-94.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3960//console

This message is automatically generated.

 -ROOT- may be offline forever after FullGC of  RS
 -

 Key: HBASE-7504
 URL: https://issues.apache.org/jira/browse/HBASE-7504
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0, 0.94.5

 Attachments: 7504-94.patch, 7504-trunk v1.patch, 7504-trunk v2.patch



[jira] [Commented] (HBASE-7474) Endpoint Implementation to support Scans with Sorting of Rows based on column values(similar to order by clause of RDBMS)

2013-01-10 Thread Anil Gupta (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549437#comment-13549437
]

Anil Gupta commented on HBASE-7474:
---

For SortingProtocol:

+ T Result[] sortIncreasing(Scan scan, byte[] columnFamily, byte[]
columnQualifier,

I think sortAscending would be more familiar to people who have worked with
RDBMS.
Anil: Done. I also thought that Ascending and Descending are more familiar.

+ T Result[] sortDecreasing(Scan scan, byte[] columnFamily, byte[]
columnQualifier,

sortDescending would be a better method name.
Anil: Done

+ * @param singleRegion does this scan request spans multiple regions?

spelling: 'spans' -> 'span'
Anil: spans is the correct word

Looking at SortingProtocolImplementation.sortIncreasing(), singleRegion is not
referenced in the loop - we scan until there is no more row.
Anil: When the scan is limited to a single region, we return only results
startIndex to startIndex+(pageSize-1) to the client, since no merge sort is
needed on the client side. So we don't need to use singleRegion in the loop.
Otherwise, if the scan spans multiple regions, each region returns results 0 to
startIndex+(pageSize-1) to the client so that it can carry out the merge sort.
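
The paging rule in the reply above can be written down as a small helper. The
class and method names are hypothetical and do not come from the patch; only
the index arithmetic follows the description.

```java
/** Hypothetical helper mirroring the paging rule described in the reply. */
class PageRangeSketch {
    /**
     * Returns {firstIndex, lastIndexInclusive} of the sorted rows that a
     * region would send back to the client.
     */
    static int[] rowsToReturn(int startIndex, int pageSize, boolean singleRegion) {
        int last = startIndex + pageSize - 1;
        // A single region can cut the page itself; with several regions the
        // client needs every row from index 0 to merge-sort correctly.
        int first = singleRegion ? startIndex : 0;
        return new int[] { first, last };
    }

    public static void main(String[] args) {
        int[] single = rowsToReturn(20, 10, true);
        int[] multi = rowsToReturn(20, 10, false);
        System.out.println("single region: " + single[0] + ".." + single[1]);
        System.out.println("multiple regions: " + multi[0] + ".." + multi[1]);
    }
}
```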

Some clarification is needed in javadoc and variable name.
Anil: Do you want me to write the above description of singleRegion in the
comments? Won't it confuse the user with too much information?

Endpoint Implementation to support Scans with Sorting of Rows based on column
values(similar to order by clause of RDBMS)
---

Key: HBASE-7474
URL: https://issues.apache.org/jira/browse/HBASE-7474
Project: HBase
Issue Type: New Feature
Components: Coprocessors, Scanners
Affects Versions: 0.94.3
Reporter: Anil Gupta
Assignee: Anil Gupta
Priority: Minor
Labels: coprocessors, scan, sort
Fix For: 0.94.5

Attachments: hbase-7474.patch, hbase-7474-v2.patch,
SortingEndpoint_high_level_flowchart.pdf

Recently, I developed an Endpoint that can sort the Results (rows) on
the basis of column values. This functionality is similar to the order by
clause of an RDBMS. I will be submitting this patch for HBase 0.94.3.
I am almost done with the initial development and testing of the feature, but
I still need to write the JUnit tests for it. I will also try to write a
design doc.
Thanks,
Anil Gupta
Software Engineer II, Intuit, Inc

[jira] [Commented] (HBASE-7474) Endpoint Implementation to support Scans with Sorting of Rows based on column values(similar to order by clause of RDBMS)

2013-01-10 Thread Anil Gupta (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549452#comment-13549452
 ] 

Anil Gupta commented on HBASE-7474:
---

License headers in SortingClient.java and 
BigDecimalSortingColumnInterpreter.java are not properly formatted.
Anil: Pending
Some log statements, such as the following, can be at debug level.

+  log.info("Querying only one region for sorting");
Anil: Done

+if (sortDecreasing) return instance.sortDecreasing(scan, 
columnFamily, columnQualifier,
+  colInterpreter, startIndex, pageSize, true);
+else return instance.sortIncreasing(scan, columnFamily, 
columnQualifier,

'else' keyword is not needed above.
Anil: Done. But personally I like to use the else keyword explicitly for 
readability. I am curious to know whether there is any technical reason for 
not using else in the above case.

+   * This method is used to do the merge sort the rows from multiple regions 
and produce the final output

Remove 'do the'. Wrap long line.
Anil: Done

+for (Map.Entry<byte[], Result[]> regionResultsEntryMap : 
regionResultMap.entrySet()) {

regionResultsEntryMap -> regionResultsEntry or regionResultsMapEntry
Anil: Done

+if(totalNoOfRows  startIndex)
+{

Normally left brace is on the same line as if statement. Insert a space between 
if and (.
Anil: Done

currentMaxorMinValueRegion and maxOrMin are used in the if / else blocks. You 
can move them inside if / else block and give them names that are clearer in 
meaning.
Anil: Done

+for (Result[] regionResult : regionResults) {
+  if ((regionResult.length - 1)  arrayIndex[regionNum]) {

regionResults and arrayIndex are both arrays. So you can use the same index to 
access them - in my opinion the code is more readable.
Anil:Pending

+  finalResult[finalResultCurrentSize++] = 
regionResults[currentMaxorMinValueRegion][arrayIndex[currentMaxorMinValueRegion]];

Wrap long line above.
Anil: Done

+  if (colInterpreter.compare(tmp, maxOrMin)  0) {

If I read the code correctly, the above comparison is the major difference 
between ascending and descending sorting. A little abstraction would allow you 
to unify the two cases.
Anil: IMHO, the only way to do that is to add an if (sortDecreasing) condition 
and then do either a less-than or a greater-than comparison depending on 
sortDecreasing. I am worried that this abstraction will make the 
implementation a tad slower, since the worst-case complexity of this sorting 
is O(n*n). I would prefer performance over a few extra lines of code. Let me 
know your views.
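
For reference, one way to unify the two directions is to pass the ordering in 
as a java.util.Comparator, so the merge loop is written once and descending 
order is obtained by reversing the comparator. The sketch below is 
hypothetical: it uses plain lists instead of Result[] and the patch's column 
interpreter, and whether the extra indirection costs anything measurable 
would need benchmarking.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical client-side merge; the patch's own classes are not used here. */
class RegionMergeSketch {
    /**
     * Merges per-region lists that are each already sorted by order and
     * returns at most limit rows in that order.
     */
    static <T> List<T> merge(List<List<T>> regionResults, Comparator<T> order, int limit) {
        int[] next = new int[regionResults.size()]; // read position per region
        List<T> out = new ArrayList<>();
        while (out.size() < limit) {
            int best = -1;
            for (int r = 0; r < next.length; r++) {
                List<T> rows = regionResults.get(r);
                if (next[r] >= rows.size()) continue; // region exhausted
                if (best < 0 || order.compare(rows.get(next[r]),
                        regionResults.get(best).get(next[best])) < 0) {
                    best = r;
                }
            }
            if (best < 0) break; // all regions exhausted
            out.add(regionResults.get(best).get(next[best]++));
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> ascending = Arrays.asList(
                Arrays.asList(1, 4, 7), Arrays.asList(2, 3, 9));
        List<List<Integer>> descending = Arrays.asList(
                Arrays.asList(7, 4, 1), Arrays.asList(9, 3, 2));
        System.out.println(merge(ascending, Comparator.<Integer>naturalOrder(), 4));
        System.out.println(merge(descending, Comparator.<Integer>reverseOrder(), 3));
    }
}
```

The only difference between the two calls is the comparator argument; the 
per-region inputs must already be sorted in the same direction.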

Looking at SortingColumnInterpreter, this is the only method which is not 
present in ColumnInterpreter:

+  T getValue(KeyValue kv) throws IOException;

The following method is already provided by ColumnInterpreter:

  public abstract T getValue(byte[] colFamily, byte[] colQualifier, KeyValue kv)
  throws IOException;
Anil: I am thinking of adding the missing method T getValue(KeyValue kv) 
throws IOException; to ColumnInterpreter. Is that fine? I don't understand why 
we need colFamily and colQualifier in the getValue method, given that a 
KeyValue is passed.

Please consider dropping SortingColumnInterpreter

Thanks a lot for doing the code review.
~Anil Gupta
Software Engineer II, Intuit, Inc


 Endpoint Implementation to support Scans with Sorting of Rows based on column 
 values(similar to order by clause of RDBMS)
 ---

 Key: HBASE-7474
 URL: https://issues.apache.org/jira/browse/HBASE-7474
 Project: HBase
  Issue Type: New Feature
  Components: Coprocessors, Scanners
Affects Versions: 0.94.3
Reporter: Anil Gupta
Assignee: Anil Gupta
Priority: Minor
  Labels: coprocessors, scan, sort
 Fix For: 0.94.5

 Attachments: hbase-7474.patch, hbase-7474-v2.patch, 
 SortingEndpoint_high_level_flowchart.pdf



[jira] [Commented] (HBASE-7504) -ROOT- may be offline forever after FullGC of RS


[ 
https://issues.apache.org/jira/browse/HBASE-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549457#comment-13549457
 ] 

Hudson commented on HBASE-7504:
---

Integrated in HBase-TRUNK #3721 (See 
[https://builds.apache.org/job/HBase-TRUNK/3721/])
HBASE-7504 -ROOT- may be offline forever after FullGC of RS (Chunhui) 
(Revision 1431208)

 Result = FAILURE
zjushch : 
Files : 
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java


 -ROOT- may be offline forever after FullGC of  RS
 -

 Key: HBASE-7504
 URL: https://issues.apache.org/jira/browse/HBASE-7504
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0, 0.94.5

 Attachments: 7504-94.patch, 7504-trunk v1.patch, 7504-trunk v2.patch


 1.FullGC happen on ROOT regionserver.
 2.ZK session timeout, master expire the regionserver and submit to 
 ServerShutdownHandler
 3.Regionserver complete the FullGC
 4.In the process of ServerShutdownHandler, verifyRootRegionLocation returns 
 true
 5.ServerShutdownHandler skip assigning ROOT region
 6.Regionserver abort itself because it reveive YouAreDeadException after a 
 regionserver report
 7.ROOT is offline now, and won't be assigned any more unless we restart master
 Master Log:
 {code}
 2012-10-31 19:51:39,043 DEBUG org.apache.hadoop.hbase.master.ServerManager: 
 Added=dw88.kgb.sqa.cm4,60020,1351671478752 to dead servers, submitted 
 shutdown handler to be executed, root=true, meta=false
 2012-10-31 19:51:39,045 INFO 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs 
 for dw88.kgb.sqa.cm4,60020,1351671478752
 2012-10-31 19:51:50,113 INFO 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Server 
 dw88.kgb.sqa.cm4,60020,1351671478752 was carrying ROOT. Trying to assign.
 2012-10-31 19:52:15,939 DEBUG org.apache.hadoop.hbase.master.ServerManager: 
 Server REPORT rejected; currently processing 
 dw88.kgb.sqa.cm4,60020,1351671478752 as dead server
 2012-10-31 19:52:15,945 INFO 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Skipping log 
 splitting for dw88.kgb.sqa.cm4,60020,1351671478752
 {code}
 The master log has no entry for assigning ROOT.
 Regionserver log:
 {code}
 2012-10-31 19:52:15,923 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 
 229128ms instead of 10ms, this is likely due to a long garbage collecting 
 pause and it's usually bad, see 
 http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
 {code}
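
The decision in steps 4 and 5 of the report can be modelled in a few lines. This is a toy sketch, not the ServerShutdownHandler source: the class and method names are hypothetical, and the guard in `reassignWithGuard` is only one possible remedy, assumed here for illustration and not necessarily what the attached patches do.

```java
import java.util.Objects;

// Toy model of the decision described in steps 4-5 of the report.
// All names are hypothetical; this is not HBase code.
final class RootReassignDecision {

    // Behaviour described in the report: a successful location check alone
    // makes the handler skip reassignment.
    static boolean reassignAsReported(boolean locationVerified) {
        return !locationVerified;
    }

    // One possible guard (an assumption): a verified location does not count
    // if it points at the server that is being processed as dead.
    static boolean reassignWithGuard(boolean locationVerified,
                                     String verifiedServer, String deadServer) {
        boolean hostedOnDeadServer = Objects.equals(verifiedServer, deadServer);
        return !locationVerified || hostedOnDeadServer;
    }

    public static void main(String[] args) {
        String dead = "dw88.kgb.sqa.cm4,60020,1351671478752";
        // The regionserver has woken up from GC, so the check succeeds (step 4).
        System.out.println(reassignAsReported(true));            // false: ROOT is never reassigned
        System.out.println(reassignWithGuard(true, dead, dead)); // true: ROOT is reassigned
    }
}
```

The point of the model is only that a location check which succeeds against a server already declared dead should not be treated as proof that ROOT is safely hosted.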

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7403) Online Merge


[ 
https://issues.apache.org/jira/browse/HBASE-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549464#comment-13549464
 ] 

Hadoop QA commented on HBASE-7403:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12564137/hbase-7403-trunkv9.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.replication.TestReplicationWithCompression
  org.apache.hadoop.hbase.client.TestMultiParallel

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3959//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3959//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3959//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3959//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3959//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3959//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3959//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3959//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3959//console

This message is automatically generated.

 Online Merge
 

 Key: HBASE-7403
 URL: https://issues.apache.org/jira/browse/HBASE-7403
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0, 0.94.5

 Attachments: 7403-trunkv5.patch, 7403-trunkv6.patch, 7403v5.diff, 
 7403-v5.txt, 7403v5.txt, hbase-7403-94v1.patch, hbase-7403-trunkv1.patch, 
 hbase-7403-trunkv5.patch, hbase-7403-trunkv6.patch, hbase-7403-trunkv7.patch, 
 hbase-7403-trunkv8.patch, hbase-7403-trunkv9.patch, merge region.pdf


 Features of this online merge:
 1. Online: there is no need to disable the table.
 2. Few changes to the current code; it could be applied to trunk, 0.94, 0.92 or 0.90.
 3. A merge request is easy to issue: no need to enter a long region name, the 
 encoded name is enough.
 4. No restrictions during the operation: you don't need to take care of events like 
 Server Dead, Balance, Split, Disabling/Enabling table, or of 
 whether you sent a wrong merge request; this is already handled for you.
 5. Only a short offline time for the two merging regions.
 We need merge in the following cases:
 1. Region holes or region overlaps that can't be fixed by hbck.
 2. Regions become empty because of TTL or an unreasonable rowkey design.
 3. Regions are always empty or very small because of presplitting at table creation.
 4. Too many empty or small regions reduce system performance (e.g. 
 mslab).
 The current merge tools only work offline and cannot redo the merge if an 
 exception is thrown during merging, which leaves dirty data.
 For an online system, we need an online merge.
 The implementation logic of this patch for Online Merge is:
 For example, to merge regionA and regionB into regionC:
 1. Offline the two regions A and B.
 2. Merge the two regions in HDFS (create regionC's directory, move 
 regionA's and regionB's files to regionC's directory, delete regionA's and 
 regionB's directories).
 3. Add the merged regionC to .META.
 4. Assign the merged regionC.
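
Step 2 above (create, move, delete) can be sketched on a local filesystem. The following is a minimal illustration under stated assumptions: `java.nio.file` stands in for HDFS, region directories are treated as flat, and the class and method names are hypothetical rather than taken from the patch. It is written so that calling it again after a partial failure simply continues the work.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch of step 2 only, on a local filesystem standing in for HDFS.
// Class and method names are hypothetical; this is not code from the patch.
final class MergeDirsSketch {

    // Moves every entry of each source directory into target, then deletes the
    // emptied source. Safe to call again after a failure part-way through.
    static void mergeInto(Path target, Path... sources) throws IOException {
        Files.createDirectories(target); // no-op if the directory already exists
        for (Path source : sources) {
            if (!Files.isDirectory(source)) {
                continue; // already merged and deleted by an earlier attempt
            }
            List<Path> entries;
            try (Stream<Path> listing = Files.list(source)) {
                entries = listing.collect(Collectors.toList());
            }
            for (Path entry : entries) {
                // No REPLACE_EXISTING: a name clash between regions should fail loudly.
                Files.move(entry, target.resolve(entry.getFileName()));
            }
            Files.delete(source); // succeeds only because the directory is now empty
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("merge-sketch");
        Path a = Files.createDirectory(root.resolve("regionA"));
        Path b = Files.createDirectory(root.resolve("regionB"));
        Files.write(a.resolve("fileA"), new byte[] {1});
        Files.write(b.resolve("fileB"), new byte[] {2});
        Path c = root.resolve("regionC");
        mergeInto(c, a, b);
        mergeInto(c, a, b); // redo: nothing left to move, no error
        try (Stream<Path> merged = Files.list(c)) {
            System.out.println(merged.count()); // 2
        }
    }
}
```

Because each step checks the current state before acting, a retry after a crash picks up where the previous attempt stopped, which is the property the description relies on for redoing the merge.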
 By the design of this patch, once we do the merge work in HDFS, we can redo 
 it until it succeeds if it throws an exception, aborts, or the server restarts, but 
 couldn’t

[jira] [Commented] (HBASE-7504) -ROOT- may be offline forever after FullGC of RS


[ 
https://issues.apache.org/jira/browse/HBASE-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549494#comment-13549494
 ] 

Hudson commented on HBASE-7504:
---

Integrated in HBase-0.94 #721 (See 
[https://builds.apache.org/job/HBase-0.94/721/])
HBASE-7504 -ROOT- may be offline forever after FullGC of RS (Chunhui) 
(Revision 1431204)

 Result = SUCCESS
zjushch : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java



[jira] [Commented] (HBASE-7504) -ROOT- may be offline forever after FullGC of RS


[ 
https://issues.apache.org/jira/browse/HBASE-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549560#comment-13549560
 ] 

Hudson commented on HBASE-7504:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #340 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/340/])
HBASE-7504 -ROOT- may be offline forever after FullGC of RS (Chunhui) 
(Revision 1431208)

 Result = FAILURE
zjushch : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java



[jira] [Commented] (HBASE-7474) Endpoint Implementation to support Scans with Sorting of Rows based on column values(similar to order by clause of RDBMS)


[ 
https://issues.apache.org/jira/browse/HBASE-7474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549585#comment-13549585
 ] 

Ted Yu commented on HBASE-7474:
---

@Anil:
Thanks for the detailed response. In the future, you can quote comments using 
bqdot. People would be able to correlate your response with original comment.

w.r.t. singleRegion in SortingProtocolImplementation.sortIncreasing(), your 
explanation makes sense.

+   * @param singleRegion does this scan request spans multiple regions?

Here 'scan request' is singular, so 'spans' should be 'span'. I am fine with 
the javadoc after correcting spelling.

+if (sortDecreasing) return instance.sortDecreasing(scan, columnFamily, columnQualifier,
+  colInterpreter, startIndex, pageSize, true);
+else return instance.sortIncreasing(scan, columnFamily, columnQualifier,

bq. I am curious to know if there is any technical reason for not using else 
in the above case?

The reason is that when sortDecreasing is true, we return from the 
method, so the else branch would never be reached.
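
As a small illustration of this point, the two forms below behave identically. The class and method names are hypothetical and return strings instead of calling the endpoint, so the sketch stays self-contained.

```java
// Hypothetical names; the strings stand in for the real sort calls.
final class EarlyReturnSketch {

    // Form used in the patch: explicit else after a returning if.
    static String withElse(boolean sortDecreasing) {
        if (sortDecreasing) return "decreasing";
        else return "increasing";
    }

    // Equivalent form: the if branch has already returned, so no else is needed.
    static String withoutElse(boolean sortDecreasing) {
        if (sortDecreasing) return "decreasing";
        return "increasing";
    }

    public static void main(String[] args) {
        System.out.println(withElse(true).equals(withoutElse(true)));   // true
        System.out.println(withElse(false).equals(withoutElse(false))); // true
    }
}
```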

bq. I am worried that this abstraction will make the implementation a tad 
slower

There are several conditional statements inside colInterpreter.compare(), so I 
doubt there would be a noticeable impact on performance if we unify the code for 
ascending and descending sorting. You can record the performance number for the 
current implementation and compare the performance of the rewritten code with that 
number.
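
One way to unify the two code paths, sketched here with plain `java.util` types rather than the endpoint's own classes, is to keep a single sort routine and reverse the comparator when a descending order is requested. The class and method names are hypothetical; the patch's ColumnInterpreter is not used here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: one sort path for both directions.
final class UnifiedSortSketch {

    // Sorts a copy of rows; the direction flag only selects the comparator.
    static <T> List<T> sort(List<T> rows, Comparator<T> ascending, boolean sortDecreasing) {
        Comparator<T> order = sortDecreasing ? ascending.reversed() : ascending;
        List<T> copy = new ArrayList<>(rows);
        copy.sort(order);
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(3, 1, 2);
        System.out.println(sort(rows, Comparator.<Integer>naturalOrder(), false)); // [1, 2, 3]
        System.out.println(sort(rows, Comparator.<Integer>naturalOrder(), true));  // [3, 2, 1]
    }
}
```

The extra cost is one branch per sort call rather than per comparison, which is why a measurable slowdown seems unlikely; timing both versions as suggested above would confirm it.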

bq. I am thinking of adding the missing method T getValue(KeyValue kv) throws 
IOException; in ColumnInterpreter. Is that fine?

ColumnInterpreter is able to provide access to the value of the passed-in KeyValue, 
so I don't think there is a need to add the new method.


 Endpoint Implementation to support Scans with Sorting of Rows based on column 
 values(similar to order by clause of RDBMS)
 ---

 Key: HBASE-7474
 URL: https://issues.apache.org/jira/browse/HBASE-7474
 Project: HBase
  Issue Type: New Feature
  Components: Coprocessors, Scanners
Affects Versions: 0.94.3
Reporter: Anil Gupta
Assignee: Anil Gupta
Priority: Minor
  Labels: coprocessors, scan, sort
 Fix For: 0.94.5

 Attachments: hbase-7474.patch, hbase-7474-v2.patch, 
 SortingEndpoint_high_level_flowchart.pdf


 Recently, I developed an Endpoint which can sort the Results (rows) on 
 the basis of column values. This functionality is similar to the order by 
 clause of an RDBMS. I will be submitting this patch for HBase 0.94.3.
 I am almost done with the initial development and testing of the feature, but I 
 need to write the JUnit tests for it. I will also try to write a design doc.
 Thanks,
 Anil Gupta
 Software Engineer II, Intuit, inc


[jira] [Created] (HBASE-7529) Wrong ExecutorType for EventType.M_RS_OPEN_ROOT in trunk

chunhui shen created HBASE-7529:
---

 Summary: Wrong ExecutorType for EventType.M_RS_OPEN_ROOT in trunk
 Key: HBASE-7529
 URL: https://issues.apache.org/jira/browse/HBASE-7529
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.96.0
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0


{code}
M_RS_OPEN_ROOT(21, ExecutorType.RS_OPEN_REGION),  // Master asking RS to open root
{code}

It's a mistake only in trunk, and it can keep ROOT from coming online for a very 
long time:

1. ROOT waits for an open-region thread to handle opening it.
2. The regions being opened wait for ROOT to come online, but occupy the threads...
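
The starvation described in these two points can be reproduced with plain `java.util.concurrent` pools. This is a minimal sketch, not HBase's executor service: the pool size of 2, the 500 ms timeout and all names are assumptions chosen for the demonstration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model: region-open tasks block until ROOT is online, and the
// ROOT-open task is either queued behind them or given its own pool.
final class ExecutorStarvationSketch {

    // Returns true if the "open ROOT" task ran within the timeout.
    static boolean rootOpens(boolean dedicatedRootPool) throws InterruptedException {
        ExecutorService regionPool = Executors.newFixedThreadPool(2);
        ExecutorService rootPool =
            dedicatedRootPool ? Executors.newSingleThreadExecutor() : regionPool;
        CountDownLatch rootOnline = new CountDownLatch(1);
        try {
            // Two ordinary region opens take both worker threads and wait for ROOT.
            for (int i = 0; i < 2; i++) {
                regionPool.submit(() -> { rootOnline.await(); return null; });
            }
            // Opening ROOT is what releases the waiting region opens.
            rootPool.execute(rootOnline::countDown);
            return rootOnline.await(500, TimeUnit.MILLISECONDS);
        } finally {
            regionPool.shutdownNow(); // interrupts any task still waiting
            rootPool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("shared pool:    " + rootOpens(false)); // false
        System.out.println("dedicated pool: " + rootOpens(true));  // true
    }
}
```

With the shared pool the ROOT task sits in the queue behind the very tasks that are waiting for it, so it never runs; routing it to a separate executor breaks the cycle.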


[jira] [Updated] (HBASE-7529) Wrong ExecutorType for EventType.M_RS_OPEN_ROOT in trunk


 [ 
https://issues.apache.org/jira/browse/HBASE-7529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-7529:


Attachment: 7529-trunk.patch


[jira] [Commented] (HBASE-7529) Wrong ExecutorType for EventType.M_RS_OPEN_ROOT in trunk


[ 
https://issues.apache.org/jira/browse/HBASE-7529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549710#comment-13549710
 ] 

ramkrishna.s.vasudevan commented on HBASE-7529:
---

Good catch.  +1.


[jira] [Updated] (HBASE-7529) Wrong ExecutorType for EventType.M_RS_OPEN_ROOT in trunk


 [ 
https://issues.apache.org/jira/browse/HBASE-7529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-7529:
--

Priority: Critical  (was: Major)
Hadoop Flags: Reviewed

This was discovered when Chunhui tried to find root cause for TestMultiParallel 
failure.

+1 from me.


[jira] [Updated] (HBASE-7529) Wrong ExecutorType for EventType.M_RS_OPEN_ROOT in trunk


 [ 
https://issues.apache.org/jira/browse/HBASE-7529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-7529:
--

Status: Patch Available  (was: Open)


[jira] [Commented] (HBASE-7474) Endpoint Implementation to support Scans with Sorting of Rows based on column values(similar to order by clause of RDBMS)

2013-01-10 Thread Lars Hofhansl (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549776#comment-13549776
 ] 

Lars Hofhansl commented on HBASE-7474:
--

[~giacomotaylor] do you have any comments? Would this be useful for you?


[jira] [Commented] (HBASE-7529) Wrong ExecutorType for EventType.M_RS_OPEN_ROOT in trunk


[ 
https://issues.apache.org/jira/browse/HBASE-7529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549790#comment-13549790
 ] 

Hadoop QA commented on HBASE-7529:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12564183/7529-trunk.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.TestLocalHBaseCluster

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):   
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.testBalancerWithRackLocality(TestBalancerWithNodeGroup.java:220)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3961//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3961//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3961//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3961//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3961//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3961//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3961//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3961//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3961//console

This message is automatically generated.


[jira] [Commented] (HBASE-7468) TestSplitTransactionOnCluster hangs frequently


[ 
https://issues.apache.org/jira/browse/HBASE-7468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549789#comment-13549789
 ] 

ramkrishna.s.vasudevan commented on HBASE-7468:
---

So I found the reason.  As stated in the above comment, after rollback we need to 
delete the znode.  Only after the znode deletion happens is it possible to 
remove the region from RIT, and only then will the disable succeed. 
In the previous commit, the infinite loops were changed to finite 
loops.  So basically here the 
{code}
   assertFalse("region is still in transition",
       am.getRegionsInTransition().containsKey(regions.get(0).getRegionInfo().getEncodedName()));
{code}
assertion has failed, and the test then tried to disable the table, which did not 
happen.  
But in the output file attached by Lars, the node-deleted event 
never happened at all, and I suspect this is because of the session expiry error 
that came just after the rollback:
{code}
2013-01-06 21:49:35,500 WARN  
[Master:0;bunnypig,51009,1357537755267-EventThread] zookeeper.ZKUtil(423): 
hconnection-0x13c138da85b0019 Unable to set watcher on znode /hbase/master
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = 
Session expired for /hbase/master
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
at 
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:172)
at 
org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:414)
at 
org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.nodeDeleted(ZooKeeperNodeTracker.java:188)
at 
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:301)
{code}
So my suggestion is that we wait until the RIT entry for the 
SPLITTING znode is removed, which happens through AM.nodeDeleted().  We should also introduce a 
timeout for the test, which is currently missing.  The same test case does not exist in 
trunk.
@Lars
Please provide your thoughts.
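
A bounded wait of the kind suggested above could look like the following sketch. The helper and its name are hypothetical, not an existing HBase test utility; in the real test the condition would be a check that the region has left RIT.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical helper: poll a condition until it holds or a deadline passes.
final class WaitForSketch {

    // Returns true as soon as condition holds, false once timeoutMs has elapsed.
    static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // give up instead of looping forever
            }
            Thread.sleep(intervalMs);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger polls = new AtomicInteger();
        // Stand-in condition that becomes true on the third poll.
        System.out.println(waitFor(1000, 10, () -> polls.incrementAndGet() >= 3)); // true
        System.out.println(waitFor(50, 10, () -> false));                         // false
    }
}
```

The test could then assert on the returned value, and JUnit 4's `@Test(timeout = ...)` attribute can additionally cap the whole test method so that a hang fails the build instead of stalling it.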


 TestSplitTransactionOnCluster hangs frequently
 --

 Key: HBASE-7468
 URL: https://issues.apache.org/jira/browse/HBASE-7468
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: Lars Hofhansl
Assignee: ramkrishna.s.vasudevan
 Attachments: 7468-jstack.txt, 7468-output.zip, 
 TestSplitTransactionOnCluster-jstack.txt


 This is what I saw once in a local build.
 {code}
 java.lang.Thread.State: TIMED_WAITING (sleeping)
 at java.lang.Thread.sleep(Native Method)
 at 
 org.apache.hadoop.hbase.client.HBaseAdmin.disableTable(HBaseAdmin.java:831)
 at 
 org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.testShouldClearRITWhenNodeFoundInSplittingState(TestSplitTransactionOnCluster.java:650)
 {code}


[jira] [Commented] (HBASE-7521) fix HBASE-6060 (regions stuck in opening state) in 0.94


[ 
https://issues.apache.org/jira/browse/HBASE-7521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549809#comment-13549809
 ] 

ramkrishna.s.vasudevan commented on HBASE-7521:
---

@Sergey
I had a glance at the patch.  I think the SSH (ServerShutdownHandler) and the retry logic in assign, 
triggered on seeing that an RS is down, will lead to race conditions, which are handled in the 
patches in HBASE-6060.
[~rajesh32]
Want to take a look at this?  I know you have a version of this running in your 
cluster for 0.94.

 fix HBASE-6060 (regions stuck in opening state) in 0.94
 ---

 Key: HBASE-7521
 URL: https://issues.apache.org/jira/browse/HBASE-7521
 Project: HBase
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-7521-v0.patch, HBASE-7521-v1.patch


 Discussion in HBASE-6060 implies that the fix there does not work on 0.94. 
 Still, we may want to fix the issue in 0.94 (via some different fix) because 
 the regions stuck in opening for ridiculous amounts of time is not a good 
 thing to have.


[jira] [Commented] (HBASE-7529) Wrong ExecutorType for EventType.M_RS_OPEN_ROOT in trunk


[ 
https://issues.apache.org/jira/browse/HBASE-7529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549826#comment-13549826
 ] 

stack commented on HBASE-7529:
--

+1 Excellent find.  Good stuff Chunhui.


[jira] [Commented] (HBASE-7468) TestSplitTransactionOnCluster hangs frequently


[ 
https://issues.apache.org/jira/browse/HBASE-7468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549831#comment-13549831
 ] 

stack commented on HBASE-7468:
--

bq. ...we need to wait till the RIT is removed for the SPLITTING znode...

Or can we just block until it is removed (with a timeout on the test) rather 
than have a timer?  Will it get removed if we wait long enough?  Why is it 
taking a while?

Why is this test not in trunk, do you know Ram?

Thanks for taking a looksee.

 TestSplitTransactionOnCluster hangs frequently
 --

 Key: HBASE-7468
 URL: https://issues.apache.org/jira/browse/HBASE-7468
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: Lars Hofhansl
Assignee: ramkrishna.s.vasudevan
 Attachments: 7468-jstack.txt, 7468-output.zip, 
 TestSplitTransactionOnCluster-jstack.txt


 This what I saw once in a local build.
 {code}
 java.lang.Thread.State: TIMED_WAITING (sleeping)
 at java.lang.Thread.sleep(Native Method)
 at org.apache.hadoop.hbase.client.HBaseAdmin.disableTable(HBaseAdmin.java:831)
 at org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.testShouldClearRITWhenNodeFoundInSplittingState(TestSplitTransactionOnCluster.java:650)
 {code}


[jira] [Commented] (HBASE-7468) TestSplitTransactionOnCluster hangs frequently


[ 
https://issues.apache.org/jira/browse/HBASE-7468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549835#comment-13549835
 ] 

ramkrishna.s.vasudevan commented on HBASE-7468:
---

The way RIT is populated for SPLITTING in 0.94 is not there in trunk after the 
AM related changes.  
The logs do not clearly say why it is taking a while, but 
currently the test case waits 2 seconds for it to happen.

I feel the znode should be removed if we wait.  Need to run the tests repeatedly to 
see if there is any other reason for it.


[jira] [Updated] (HBASE-4274) RS should periodically ping its HLog pipeline even if no writes are arriving


 [ 
https://issues.apache.org/jira/browse/HBASE-4274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4274:
-

 Priority: Major  (was: Critical)
Fix Version/s: (was: 0.96.0)

Marking down to major and moving out of 0.96.  Bring back in if folks want RS 
to die quickly when HDFS goes out from under HBase (the general tendency, 
though, seems to be to go the other direction and try to ride over an HDFS 
outage if possible).

 RS should periodically ping its HLog pipeline even if no writes are arriving
 

 Key: HBASE-4274
 URL: https://issues.apache.org/jira/browse/HBASE-4274
 Project: HBase
  Issue Type: Improvement
  Components: regionserver, wal
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon

 If you restart HDFS underneath HBase, when HBase isn't taking any write load, 
 the region servers won't notice that there's any problem until the next 
 time they take a write, at which point they will abort (because the pipeline 
 is gone from beneath them). It would be better if they wrote some garbage to 
 their HLog once every few seconds as a sort of keepalive, so they will 
 aggressively abort as soon as there's an issue.
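
One way to picture the keepalive idea is the sketch below. It is not HBase code: `LogPipeline`, `PipelineKeepalive`, and the abort callback are hypothetical names, and the real HLog write path is not modelled.

```java
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for whatever appends and syncs a small marker to the log. */
interface LogPipeline {
    void writeKeepalive() throws IOException;
}

/** Pokes the pipeline on a timer and triggers an abort on the first failure. */
class PipelineKeepalive {
    private final LogPipeline pipeline;
    private final Runnable abort;
    private volatile boolean failed = false;

    PipelineKeepalive(LogPipeline pipeline, Runnable abort) {
        this.pipeline = pipeline;
        this.abort = abort;
    }

    /** One keepalive attempt; returns false once the pipeline has failed. */
    boolean tick() {
        if (failed) {
            return false;
        }
        try {
            pipeline.writeKeepalive();
            return true;
        } catch (IOException e) {
            failed = true;   // remember the failure so abort runs only once
            abort.run();
            return false;
        }
    }

    /** Schedules tick() every periodSeconds; the caller shuts the timer down. */
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(() -> { tick(); }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return timer;
    }
}
```

Keeping `tick()` separate from the scheduling makes the failure path easy to exercise in a test without waiting on a timer.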


[jira] [Updated] (HBASE-4147) StoreFile query usage report

[
https://issues.apache.org/jira/browse/HBASE-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

stack updated HBASE-4147:
-

Priority: Major (was: Critical)
Fix Version/s: (was: 0.96.0)

This turned into a (useful) discussion. [~eclark] Can you take a look and note
what in 0.96 metrics2 might help answer the questions Doug poses above? We
also have a trace mechanism committed, but again it would take some work to get
it to the level Doug is asking for above.

It would seem that this issue should now become two issues: one to improve the
trace so it can go down to the per-storefile level, and another to add to metrics
so they can be emitted at the storefile level (if possible).

Meantime, marking this as non-critical and moving out of 0.96 while it is w/o a
sponsor.

StoreFile query usage report

Key: HBASE-4147
URL: https://issues.apache.org/jira/browse/HBASE-4147
Project: HBase
Issue Type: Improvement
Reporter: Doug Meil
Attachments: hbase_4147_storefilereport_2011_08_10.pdf,
hbase_4147_storefilereport.pdf

Detailed information on what HBase is doing in terms of reads is hard to come
by.
What would be useful is to have a periodic StoreFile query report.
Specifically, this could run on a configured interval (e.g., every 30
seconds, 60 seconds) and dump the output to the log files.
This would have all StoreFiles accessed during the reporting period (and with
the Path we would also know region, CF, and table), # of times the StoreFile
was accessed, the size of the StoreFile, and the total time (ms) spent
processing that StoreFile.
Even this level of summary would be useful to detect which tables and CFs are
being accessed the most, and including the StoreFile would provide insight
into relative uncompaction (i.e., lots of StoreFiles).
I think the log-output, as opposed to UI, is an important facet with this.
I'm assuming that users will slice and dice this data on their own so I think
we should skip any kind of admin view for now (i.e., new JSPs, new APIs to
expose this data). Just getting this to log-file would be a big improvement.
Will this have a non-zero performance impact? Yes. Hopefully small, but yes
it will. However, flying a plane without any instrumentation isn't fun. :-)
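
The report described above could be accumulated roughly as follows. This is a sketch under stated assumptions: `StoreFileUsageReport`, `record`, and `drain` are hypothetical names, the StoreFile size column is left out, and a real version would hand the drained text to the logger on the configured interval.

```java
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Collects per-StoreFile access counts and time spent for one reporting period. */
class StoreFileUsageReport {
    private static final class Stats {
        final LongAdder accesses = new LongAdder();
        final LongAdder totalMillis = new LongAdder();
    }

    private final ConcurrentHashMap<String, Stats> byPath = new ConcurrentHashMap<>();

    /** Called by the read path after each StoreFile access. */
    void record(String storeFilePath, long elapsedMillis) {
        Stats stats = byPath.computeIfAbsent(storeFilePath, p -> new Stats());
        stats.accesses.increment();
        stats.totalMillis.add(elapsedMillis);
    }

    /**
     * Renders one line per StoreFile, sorted by path, and clears the counters.
     * An access racing with the removal may be dropped; acceptable for a sketch.
     */
    String drain() {
        StringBuilder out = new StringBuilder();
        for (String path : new TreeSet<>(byPath.keySet())) {
            Stats stats = byPath.remove(path);
            if (stats == null) {
                continue;
            }
            out.append(path)
               .append(" accesses=").append(stats.accesses.sum())
               .append(" totalMs=").append(stats.totalMillis.sum())
               .append('\n');
        }
        return out.toString();
    }
}
```

Because the path encodes table, region, and CF, users can slice the log output on their own, which matches the request to skip any admin view.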

[jira] [Assigned] (HBASE-7492) add new online-snapshot properties to hbase-default.xml


 [ 
https://issues.apache.org/jira/browse/HBASE-7492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh reassigned HBASE-7492:
-

Assignee: Jonathan Hsieh

 add new online-snapshot properties to hbase-default.xml
 ---

 Key: HBASE-7492
 URL: https://issues.apache.org/jira/browse/HBASE-7492
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh

 Suggested by jesse on the HBASE-6864 review.
 {code}
   /** Maximum number of concurrent snapshot region tasks that can run concurrently */
   private static final String CONCURENT_SNAPSHOT_TASKS_KEY =
       "hbase.snapshot.region.concurrentTasks";
   private static final int DEFAULT_CONCURRENT_SNAPSHOT_TASKS = 3;

   /** Conf key for number of request threads to start snapshots on regionservers */
   public static final String SNAPSHOT_REQUEST_THREADS_KEY =
       "hbase.snapshot.region.pool.threads";
   /** # of threads for snapshotting regions on the rs. */
   public static final int SNAPSHOT_REQUEST_THREADS_DEFAULT = 10;

   /** Conf key for max time to keep threads in snapshot request pool waiting */
   public static final String SNAPSHOT_TIMEOUT_MILLIS_KEY =
       "hbase.snapshot.region.timeout";
   /** Keep threads alive in request pool for max of 60 seconds */
   public static final long SNAPSHOT_TIMEOUT_MILLIS_DEFAULT = 60000;

   /** Conf key for millis between checks to see if snapshot completed or if there are errors */
   public static final String SNAPSHOT_REQUEST_WAKE_MILLIS_KEY =
       "hbase.snapshot.region.wakefrequency";
   /** Default amount of time to check for errors while regions finish snapshotting */
   private static final long SNAPSHOT_REQUEST_WAKE_MILLIS_DEFAULT = 500;
 {code}
 nit: add these to hbase-default.xml?
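
For context on why the defaults file matters, the sketch below shows how two of these keys might be resolved with a fallback to the in-code default. `java.util.Properties` is used here only as a stand-in for the Hadoop configuration object, and `SnapshotSettings` is a hypothetical class; the key names and default values are taken from the snippet above.

```java
import java.util.Properties;

/** Resolves snapshot settings from a key/value store, falling back to defaults. */
class SnapshotSettings {
    static final String SNAPSHOT_REQUEST_THREADS_KEY = "hbase.snapshot.region.pool.threads";
    static final int SNAPSHOT_REQUEST_THREADS_DEFAULT = 10;

    static final String SNAPSHOT_REQUEST_WAKE_MILLIS_KEY = "hbase.snapshot.region.wakefrequency";
    static final long SNAPSHOT_REQUEST_WAKE_MILLIS_DEFAULT = 500;

    /** Number of request threads, or the default when the key is absent. */
    static int requestThreads(Properties conf) {
        return Integer.parseInt(conf.getProperty(
            SNAPSHOT_REQUEST_THREADS_KEY, String.valueOf(SNAPSHOT_REQUEST_THREADS_DEFAULT)));
    }

    /** Wake frequency in milliseconds, or the default when the key is absent. */
    static long wakeMillis(Properties conf) {
        return Long.parseLong(conf.getProperty(
            SNAPSHOT_REQUEST_WAKE_MILLIS_KEY, String.valueOf(SNAPSHOT_REQUEST_WAKE_MILLIS_DEFAULT)));
    }
}
```

When a key is missing from the shipped defaults file, the in-code default is the only place the value is documented, which is presumably what the review nit is about.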


[jira] [Work started] (HBASE-7492) add new online-snapshot properties to hbase-default.xml


 [ 
https://issues.apache.org/jira/browse/HBASE-7492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-7492 started by Jonathan Hsieh.


[jira] [Commented] (HBASE-4147) StoreFile query usage report

2013-01-10 Thread Andrew Purtell (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549869#comment-13549869
]

Andrew Purtell commented on HBASE-4147:
---

bq. Meantime, marking this as non-critical and moving out of 0.96 while it is
w/o a sponsor.

I might be looking at this again in the future in the context of HBASE-6572.
Deciding what stores to migrate.

StoreFile query usage report

[jira] [Created] (HBASE-7530) [replication] Work around HDFS-4380 else we get NPEs

Jean-Daniel Cryans created HBASE-7530:
-

 Summary: [replication] Work around HDFS-4380 else we get NPEs
 Key: HBASE-7530
 URL: https://issues.apache.org/jira/browse/HBASE-7530
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.96.0, 0.94.5


I've been spending a lot of time trying to figure out the recent test failures 
related to replication. One I seem to be constantly getting is this NPE:

{noformat}
2013-01-09 10:08:56,912 ERROR [RegionServer:1;172.23.7.205,61604,1357754664830-EventThread.replicationSource,2] regionserver.ReplicationSource$1(727): Unexpected exception in ReplicationSource, currentPath=hdfs://localhost:61589/user/jdcryans/hbase/.logs/172.23.7.205,61604,1357754664830/172.23.7.205%2C61604%2C1357754664830.1357754936216
java.lang.NullPointerException
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.updateBlockInfo(DFSClient.java:1885)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1858)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1834)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:578)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154)
at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:108)
at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1495)
at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.openFile(SequenceFileLogReader.java:62)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1482)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470)
at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:55)
at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.reset(SequenceFileLogReader.java:308)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.openReader(ReplicationHLogReaderManager.java:69)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:500)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:312)

{noformat}

Talking to [~tlipcon], he said it was likely fixed in Hadoop 2.0 via HDFS-3222 
but for Hadoop 1.0 he created HDFS-4380. This seems to happen while crossing 
block boundaries and TestReplication uses a 20KB block size for the HLog. The 
intent was just to get HLogs to roll more often, and this can also be achieved 
with *hbase.regionserver.logroll.multiplier* with a value of 0.0003f.
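
To see why 0.0003f gives roughly the same roll frequency as a 20KB block size, here is the arithmetic as a small sketch. It assumes the roll threshold is the block size times the multiplier and that the block size is the 64 MB default; both are assumptions rather than something stated in this message, and `LogRollSize` is a hypothetical helper.

```java
/** Illustrative arithmetic only; not the HLog implementation. */
class LogRollSize {
    /** Assumed default HDFS block size for Hadoop 1.0: 64 MB. */
    static final long ASSUMED_BLOCK_SIZE_BYTES = 64L * 1024 * 1024;

    /** Roll threshold in bytes, assuming threshold = block size * multiplier. */
    static long rollSize(long blockSizeBytes, float multiplier) {
        return (long) (blockSizeBytes * multiplier);
    }
}
```

Under those assumptions, `LogRollSize.rollSize(LogRollSize.ASSUMED_BLOCK_SIZE_BYTES, 0.0003f)` comes out at roughly 20,000 bytes, so the logs roll about as often as before while the HDFS block size stays at its default.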


[jira] [Updated] (HBASE-3669) Region in PENDING_OPEN keeps being bounced between RS and master


 [ 
https://issues.apache.org/jira/browse/HBASE-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3669:
-

 Priority: Major  (was: Critical)
Fix Version/s: (was: 0.96.0)

Knocking down priority.  My sense is that in 0.96, after all the AM work, this 
issue is less likely.  Leaving open in case we do see it again.  Moving out of 
0.96 in meantime.  Making major rather than critical.

 Region in PENDING_OPEN keeps being bounced between RS and master
 

 Key: HBASE-3669
 URL: https://issues.apache.org/jira/browse/HBASE-3669
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.1
Reporter: Jean-Daniel Cryans
 Attachments: HBASE-3669-debug-v1.patch


 After going crazy killing region servers after HBASE-3668, most of the 
 cluster recovered except for 3 regions that kept being refused by the region 
 servers.
 On the master I would see:
 {code}
 2011-03-17 22:23:14,828 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  
 supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
  state=PENDING_OPEN, ts=1300400554826
 2011-03-17 22:23:14,828 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
 PENDING_OPEN for too long, reassigning 
 region=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
 2011-03-17 22:23:14,828 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
  state=PENDING_OPEN, ts=1300400554826
 2011-03-17 22:23:14,828 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan 
 was found (or we are ignoring an existing plan) for 
 supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
  so generated a random one; 
 hri=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.,
  src=, dest=sv2borg171,60020,1300399357135; 17 (online=17, exclude=null) 
 available servers
 2011-03-17 22:23:14,828 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
  to sv2borg171,60020,1300399357135
 {code}
 Then on the region server:
 {code}
 2011-03-17 22:23:14,829 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:60020-0x22d627c142707d2 Attempting to transition node 
 f11849557c64c4efdbe0498f3fe97a21 from M_ZK_REGION_OFFLINE to 
 RS_ZK_REGION_OPENING
 2011-03-17 22:23:14,832 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: 
 regionserver:60020-0x22d627c142707d2 Retrieved 166 byte(s) of data from znode 
 /hbase/unassigned/f11849557c64c4efdbe0498f3fe97a21; 
 data=region=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.,
  server=sv2borg180,60020,1300384550966, state=RS_ZK_REGION_OPENING
 2011-03-17 22:23:14,832 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:60020-0x22d627c142707d2 Attempt to transition the unassigned 
 node for f11849557c64c4efdbe0498f3fe97a21 from M_ZK_REGION_OFFLINE to 
 RS_ZK_REGION_OPENING failed, the node existed but was in the state 
 RS_ZK_REGION_OPENING
 2011-03-17 22:23:14,832 WARN 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed 
 transition from OFFLINE to OPENING for region=f11849557c64c4efdbe0498f3fe97a21
 {code}
 I'm not sure I fully understand what was going on... the master was supposed 
 to OFFLINE the znode but then that's not what the region server was seeing? 
 In any case, I was able to recover by doing a force unassign for each region 
 and then assign.
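
The region server log above shows a conditional transition that fails because the node is not in the expected state. The sketch below models that check in memory; it is not the ZooKeeper or ZKAssign API. `UnassignedNode` is a hypothetical class, and `RS_ZK_REGION_OPENED` is added only to round out the enum.

```java
import java.util.concurrent.atomic.AtomicReference;

/** State names as they appear in the logs, plus one assumed terminal state. */
enum RegionZkState {
    M_ZK_REGION_OFFLINE,
    RS_ZK_REGION_OPENING,
    RS_ZK_REGION_OPENED
}

/** In-memory stand-in for an unassigned-region znode. */
class UnassignedNode {
    private final AtomicReference<RegionZkState> state;

    UnassignedNode(RegionZkState initial) {
        this.state = new AtomicReference<>(initial);
    }

    /** Moves to next only if the node is currently in the expected state. */
    boolean transition(RegionZkState expected, RegionZkState next) {
        return state.compareAndSet(expected, next);
    }

    RegionZkState current() {
        return state.get();
    }
}
```

In the logged case the node was still `RS_ZK_REGION_OPENING` under the old server, so a transition expecting `M_ZK_REGION_OFFLINE` is refused, and the master's timeout then reassigns the region again.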


[jira] [Updated] (HBASE-7384) Introducing waitForCondition function into test cases

2013-01-10 Thread Jeffrey Zhong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-7384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-7384:
-

Attachment: hbase-7384_2.4.patch


Resubmitting the patch to incorporate Enis's feedback.

Thanks,
-Jeffrey

 Introducing waitForCondition function into test cases
 -

 Key: HBASE-7384
 URL: https://issues.apache.org/jira/browse/HBASE-7384
 Project: HBase
  Issue Type: Test
  Components: test
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
  Labels: test
 Fix For: 0.96.0

 Attachments: hbase-7384_1.0.patch, hbase-7384_2.4.patch, 
 hbase-7384.patch, Waiter.java


 Recently I've been working on flaky test cases and found many places that use a 
 while loop and sleep to wait for a condition to become true. There are several 
 issues with the existing approach:
 1) Much similar code does the same thing
 2) When a timeout happens, different errors are reported without explicitly 
 indicating a timeout
 3) When we want to increase the max timeout value to verify whether a test case 
 fails because the timeout is too short, we have to recompile & redeploy code
 I propose creating a waitForCondition function as a test utility function 
 like the following:
 {code}
 public interface WaitCheck {
   public boolean Check();
 }

 public boolean waitForCondition(int timeOutInMilliSeconds,
     int checkIntervalInMilliSeconds, WaitCheck s) throws InterruptedException {
   // Optional multiplier that stretches every timeout without recompiling.
   int multiplier = 1;
   String multiplierProp = System.getProperty("extremeWaitMultiplier");
   if (multiplierProp != null) {
     multiplier = Integer.parseInt(multiplierProp);
     if (multiplier < 1) {
       LOG.warn(String.format(
           "Invalid extremeWaitMultiplier property value:%s. is ignored.", multiplierProp));
       multiplier = 1;
     }
   }
   // Poll the condition until it holds or the (scaled) timeout expires.
   int timeElapsed = 0;
   while (timeElapsed < timeOutInMilliSeconds * multiplier) {
     if (s.Check()) {
       return true;
     }
     Thread.sleep(checkIntervalInMilliSeconds);
     timeElapsed += checkIntervalInMilliSeconds;
   }
   assertTrue("WaitForCondition failed due to time out(" + timeOutInMilliSeconds
       + " milliseconds expired)", false);
   return false;
 }
 {code}
 Doing it this way has several advantages:
 1) Clearly reports a timeout error when that situation happens
 2) Uses the System property extremeWaitMultiplier to increase the max timeout 
 dynamically for quick verification
 3) Standardizes the current wait situations
 Please let me know your thoughts on this.
 Thanks,
 -Jeffrey


[jira] [Commented] (HBASE-7530) [replication] Work around HDFS-4380 else we get NPEs

2013-01-10 Thread Lars Hofhansl (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549907#comment-13549907
 ] 

Lars Hofhansl commented on HBASE-7530:
--

Interesting. Did this only start recently (which would be strange)?
This happens with larger blocksizes too, right? If so this should be critical.


[jira] [Commented] (HBASE-7530) [replication] Work around HDFS-4380 else we get NPEs


[ 
https://issues.apache.org/jira/browse/HBASE-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549913#comment-13549913
 ] 

Jean-Daniel Cryans commented on HBASE-7530:
---

[~lhofhansl] Not sure when it started happening; the code has changed on the 
HBase side but not on the Hadoop side, so we should have seen this before. It 
should happen with larger block sizes too, just a few orders of magnitude less 
likely than it does in TestReplication :)


[jira] [Created] (HBASE-7531) [replication] NPE in SequenceFileLogReader because ReplicationSource doesn't nullify the reader

Jean-Daniel Cryans created HBASE-7531:
-

 Summary: [replication] NPE in SequenceFileLogReader because 
ReplicationSource doesn't nullify the reader
 Key: HBASE-7531
 URL: https://issues.apache.org/jira/browse/HBASE-7531
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Daniel Cryans
 Attachments: HBASE-7531.patch

Here's an NPE I get half the time I run TestReplication:

{noformat}
2012-12-20 08:59:17,259 ERROR [RegionServer:1;192.168.10.135,49168,1356011734418-EventThread.replicationSource,2] regionserver.ReplicationSource$1(727): Unexpected exception in ReplicationSource, currentPath=hdfs://localhost:65533/user/jdcryans/hbase/.logs/192.168.10.135,49168,1356011734418/192.168.10.135%2C49168%2C1356011734418.1356011956626
java.lang.NullPointerException
at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.seek(SequenceFileLogReader.java:261)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.seek(ReplicationHLogReaderManager.java:103)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:414)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:332)
{noformat}

The issue happens after an IOE is caught while opening the reader: the reader 
isn't set to null afterwards, so the rest of the code assumes the reader is 
usable.
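
The shape of that fix can be sketched as below. The types are hypothetical stand-ins (`LogReader`, `ReaderOpener`, `ReaderHolder`), not the actual ReplicationSource or ReplicationHLogReaderManager code, and the attached patch may differ in detail.

```java
import java.io.IOException;

/** Hypothetical minimal reader; the real one wraps a SequenceFile reader. */
interface LogReader {
    void seek(long position);
}

/** Hypothetical factory that may fail with an IOException. */
interface ReaderOpener {
    LogReader open(String path) throws IOException;
}

/** Holds the current reader and never exposes one that failed to open. */
class ReaderHolder {
    private final ReaderOpener opener;
    private LogReader reader;

    ReaderHolder(ReaderOpener opener) {
        this.opener = opener;
    }

    /** Returns true if a usable reader is available after the call. */
    boolean openReader(String path) {
        try {
            reader = opener.open(path);
            return true;
        } catch (IOException e) {
            // Drop whatever reader was held before so later code cannot use it.
            reader = null;
            return false;
        }
    }

    /** Seeks only when a reader is actually available. */
    boolean seek(long position) {
        if (reader == null) {
            return false;
        }
        reader.seek(position);
        return true;
    }
}
```

Callers can then treat a false return from `seek` the same way as a failed open, instead of running into an NPE deeper in the reader.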


[jira] [Updated] (HBASE-7531) [replication] NPE in SequenceFileLogReader because ReplicationSource doesn't nullify the reader


 [ 
https://issues.apache.org/jira/browse/HBASE-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-7531:
--

Attachment: HBASE-7531.patch

Just a simple fix, setting the reader to null if we couldn't get it.


[jira] [Assigned] (HBASE-7531) [replication] NPE in SequenceFileLogReader because ReplicationSource doesn't nullify the reader


 [ 
https://issues.apache.org/jira/browse/HBASE-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans reassigned HBASE-7531:
-

Assignee: Jean-Daniel Cryans

 [replication] NPE in SequenceFileLogReader because ReplicationSource doesn't 
 nullify the reader
 ---

 Key: HBASE-7531
 URL: https://issues.apache.org/jira/browse/HBASE-7531
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Attachments: HBASE-7531.patch


 Here's a NPE I get half the time I run TestReplication:
 {noformat}
 2012-12-20 08:59:17,259 ERROR 
 [RegionServer:1;192.168.10.135,49168,1356011734418-EventThread.replicationSource,2]
  regionserver.ReplicationSource$1(727): Unexpected exception in 
 ReplicationSource, 
 currentPath=hdfs://localhost:65533/user/jdcryans/hbase/.logs/192.168.10.135,49168,1356011734418/192.168.10.135%2C49168%2C1356011734418.1356011956626
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.seek(SequenceFileLogReader.java:261)
 at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.seek(ReplicationHLogReaderManager.java:103)
 at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:414)
 at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:332)
 {noformat}
 The issue happens after an IOE is caught while opening the reader: the reader 
 isn't set to null afterwards, so the rest of the code assumes it is still 
 usable.


[jira] [Commented] (HBASE-3643) Close the filesystem handle when HRS is aborting

[
https://issues.apache.org/jira/browse/HBASE-3643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549917#comment-13549917
]

stack commented on HBASE-3643:
--

We should at least try this for 0.96... Can punt if too much work.

Close the filesystem handle when HRS is aborting

Key: HBASE-3643
URL: https://issues.apache.org/jira/browse/HBASE-3643
Project: HBase
Issue Type: Improvement
Affects Versions: 0.90.1
Reporter: Jean-Daniel Cryans
Priority: Critical
Fix For: 0.96.0

I thought of a way to fix HBASE-3515 that has a very broad impact, so I'm
creating this jira to *raise awareness* and gather comments.
Currently when we call HRS.abort, it's still possible to do HDFS operations
like rolling logs and flushing files. It also has the impact that some
threads cannot write to ZK (like the situation described in HBASE-3515) but
then can still write to HDFS. Since that call is so central, I think we
should {color:red} add fs.close() inside the abort method{color}.
The impact of this is that everything else that happens after the close call,
like closing files or appending, will fail in the most horrible ways. On the
bright side, this means less disruptive changes on HDFS.
Todd pointed at HBASE-2231 as related, but I think my solution is still too
sloppy as we could still finish a compaction and immediately close the
filesystem after that (damage's done).
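
A rough sketch of the proposal above, using an in-memory stand-in rather than the real HDFS client. `AbortClosesFsSketch`, `FakeFs`, `Server` and `tryFlush` are hypothetical names introduced only for illustration. It shows that once abort closes the filesystem handle, any later write attempt fails instead of reaching storage.

```java
class AbortClosesFsSketch {
    /** Hypothetical in-memory stand-in for a filesystem handle. */
    static class FakeFs {
        private boolean closed;

        void close() {
            closed = true;
        }

        void write(String path) {
            if (closed) {
                throw new IllegalStateException("filesystem closed");
            }
        }
    }

    /** Hypothetical server whose abort also closes its filesystem handle. */
    static class Server {
        final FakeFs fs = new FakeFs();
        volatile boolean aborted;

        void abort() {
            aborted = true;
            // After this call every write through fs fails.
            fs.close();
        }

        /** Returns true if the write went through, false if the handle was closed. */
        boolean tryFlush(String path) {
            try {
                fs.write(path);
                return true;
            } catch (IllegalStateException e) {
                return false;
            }
        }
    }

    public static void main(String[] args) {
        Server server = new Server();
        System.out.println("before abort: " + server.tryFlush("/hbase/region/file"));
        server.abort();
        System.out.println("after abort: " + server.tryFlush("/hbase/region/file"));
    }
}
```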

[jira] [Commented] (HBASE-7521) fix HBASE-6060 (regions stuck in opening state) in 0.94


[ 
https://issues.apache.org/jira/browse/HBASE-7521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549925#comment-13549925
 ] 

Sergey Shelukhin commented on HBASE-7521:
-

Can you please elaborate on the race conditions? Do you mean HBASE-5816?
As far as I can see, this patch preserves existing race conditions but doesn't 
add new ones :)
That said, my experience with AM is limited, even more so in 0.94.

We can try to rebase the latest 0.94 patch from HBASE-6060 instead...


 fix HBASE-6060 (regions stuck in opening state) in 0.94
 ---

 Key: HBASE-7521
 URL: https://issues.apache.org/jira/browse/HBASE-7521
 Project: HBase
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-7521-v0.patch, HBASE-7521-v1.patch


 Discussion in HBASE-6060 implies that the fix there does not work on 0.94. 
 Still, we may want to fix the issue in 0.94 (via some different fix) because 
 regions stuck in opening for ridiculous amounts of time are not a good thing 
 to have.


[jira] [Updated] (HBASE-7530) [replication] Work around HDFS-4380 else we get NPEs


 [ 
https://issues.apache.org/jira/browse/HBASE-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-7530:
--

Attachment: HBASE-7530.patch

Here is the fix I proposed; I'm currently testing it in a loop.

 [replication] Work around HDFS-4380 else we get NPEs
 

 Key: HBASE-7530
 URL: https://issues.apache.org/jira/browse/HBASE-7530
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.96.0, 0.94.5

 Attachments: HBASE-7530.patch


 I've been spending a lot of time trying to figure out the recent test failures 
 related to replication. One I seem to be constantly getting is this NPE:
 {noformat}
 2013-01-09 10:08:56,912 ERROR 
 [RegionServer:1;172.23.7.205,61604,1357754664830-EventThread.replicationSource,2]
  regionserver.ReplicationSource$1(727): Unexpected exception in 
 ReplicationSource, 
 currentPath=hdfs://localhost:61589/user/jdcryans/hbase/.logs/172.23.7.205,61604,1357754664830/172.23.7.205%2C61604%2C1357754664830.1357754936216
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSInputStream.updateBlockInfo(DFSClient.java:1885)
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1858)
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSInputStream.init(DFSClient.java:1834)
 at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:578)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154)
 at 
 org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:108)
 at 
 org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1495)
 at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.openFile(SequenceFileLogReader.java:62)
 at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1482)
 at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1475)
 at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1470)
 at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.init(SequenceFileLogReader.java:55)
 at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.reset(SequenceFileLogReader.java:308)
 at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.openReader(ReplicationHLogReaderManager.java:69)
 at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:500)
 at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:312)
 {noformat}
 Talking to [~tlipcon], he said it was likely fixed in Hadoop 2.0 via 
 HDFS-3222 but for Hadoop 1.0 he created HDFS-4380. This seems to happen while 
 crossing block boundaries and TestReplication uses a 20KB block size for the 
 HLog. The intent was just to get HLogs to roll more often, and this can also 
 be achieved with *hbase.regionserver.logroll.multiplier* with a value of 
 0.0003f.
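
A small arithmetic sketch of why a multiplier of 0.0003f lands near 20KB. It assumes the roll threshold is the block size times the multiplier and that the block size is 64 MB; both are assumptions made for illustration, and `LogRollSizeSketch` and `rollSize` are hypothetical names.

```java
class LogRollSizeSketch {
    /** Roll threshold in bytes, assuming threshold = block size * multiplier. */
    static long rollSize(long blockSizeBytes, float multiplier) {
        return (long) (blockSizeBytes * multiplier);
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // assumed 64 MB block size
        long threshold = rollSize(blockSize, 0.0003f);
        // Prints the threshold in bytes; with these inputs it is roughly 20 KB.
        System.out.println(threshold + " bytes");
    }
}
```

With these assumed inputs the logs would roll about as often as with a 20KB block size, without making reads cross block boundaries.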


[jira] [Commented] (HBASE-7528) NPE in hbck -repair when adopting orphans if not tableinfo is found.


[ 
https://issues.apache.org/jira/browse/HBASE-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549932#comment-13549932
 ] 

Sergey Shelukhin commented on HBASE-7528:
-

Do you mean only the precheck is in error, or the null being there as such? For 
now fixing the precheck.

 NPE in hbck -repair when adopting orphans if not tableinfo is found.
 

 Key: HBASE-7528
 URL: https://issues.apache.org/jira/browse/HBASE-7528
 Project: HBase
  Issue Type: Bug
Reporter: Jonathan Hsieh
Assignee: Sergey Shelukhin
Priority: Trivial
 Attachments: HBASE-7528-v0.patch


 {code}
 13/01/09 17:34:54 DEBUG util.HBaseFsck: Attempting to adopt orphan: { meta = 
 null, hdfs = 
 hdfs://c1514.hal.cloudera.com:56020/hbase-cdh4.2/pe-2-table/28fbe62eee2ffd8ea2611335ed78f8ce,
  deployed =  }
 Exception in thread "main" java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.util.HBaseFsck$TableInfo.access$000(HBaseFsck.java:1871)
 at 
 org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphan(HBaseFsck.java:482)
 at 
 org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphans(HBaseFsck.java:455)
 at 
 org.apache.hadoop.hbase.util.HBaseFsck.restoreHdfsIntegrity(HBaseFsck.java:576)
 at 
 org.apache.hadoop.hbase.util.HBaseFsck.offlineHdfsIntegrityRepair(HBaseFsck.java:353)
 at 
 org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:431)
 at org.apache.hadoop.hbase.util.HBaseFsck.exec(HBaseFsck.java:3614)
 at org.apache.hadoop.hbase.util.HBaseFsck.run(HBaseFsck.java:3436)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
 at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3430)
 {code}


[jira] [Updated] (HBASE-7528) NPE in hbck -repair when adopting orphans if not tableinfo is found.


 [ 
https://issues.apache.org/jira/browse/HBASE-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-7528:


Attachment: HBASE-7528-v0.patch

 NPE in hbck -repair when adopting orphans if not tableinfo is found.
 

 Key: HBASE-7528
 URL: https://issues.apache.org/jira/browse/HBASE-7528
 Project: HBase
  Issue Type: Bug
Reporter: Jonathan Hsieh
Priority: Trivial
 Attachments: HBASE-7528-v0.patch


 {code}
 13/01/09 17:34:54 DEBUG util.HBaseFsck: Attempting to adopt orphan: { meta = 
 null, hdfs = 
 hdfs://c1514.hal.cloudera.com:56020/hbase-cdh4.2/pe-2-table/28fbe62eee2ffd8ea2611335ed78f8ce,
  deployed =  }
 Exception in thread "main" java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.util.HBaseFsck$TableInfo.access$000(HBaseFsck.java:1871)
 at 
 org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphan(HBaseFsck.java:482)
 at 
 org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphans(HBaseFsck.java:455)
 at 
 org.apache.hadoop.hbase.util.HBaseFsck.restoreHdfsIntegrity(HBaseFsck.java:576)
 at 
 org.apache.hadoop.hbase.util.HBaseFsck.offlineHdfsIntegrityRepair(HBaseFsck.java:353)
 at 
 org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:431)
 at org.apache.hadoop.hbase.util.HBaseFsck.exec(HBaseFsck.java:3614)
 at org.apache.hadoop.hbase.util.HBaseFsck.run(HBaseFsck.java:3436)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
 at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3430)
 {code}


[jira] [Updated] (HBASE-7528) NPE in hbck -repair when adopting orphans if not tableinfo is found.


 [ 
https://issues.apache.org/jira/browse/HBASE-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-7528:


Assignee: Sergey Shelukhin
  Status: Patch Available  (was: Open)

 NPE in hbck -repair when adopting orphans if not tableinfo is found.
 

 Key: HBASE-7528
 URL: https://issues.apache.org/jira/browse/HBASE-7528
 Project: HBase
  Issue Type: Bug
Reporter: Jonathan Hsieh
Assignee: Sergey Shelukhin
Priority: Trivial
 Attachments: HBASE-7528-v0.patch


 {code}
 13/01/09 17:34:54 DEBUG util.HBaseFsck: Attempting to adopt orphan: { meta = 
 null, hdfs = 
 hdfs://c1514.hal.cloudera.com:56020/hbase-cdh4.2/pe-2-table/28fbe62eee2ffd8ea2611335ed78f8ce,
  deployed =  }
 Exception in thread "main" java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.util.HBaseFsck$TableInfo.access$000(HBaseFsck.java:1871)
 at 
 org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphan(HBaseFsck.java:482)
 at 
 org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphans(HBaseFsck.java:455)
 at 
 org.apache.hadoop.hbase.util.HBaseFsck.restoreHdfsIntegrity(HBaseFsck.java:576)
 at 
 org.apache.hadoop.hbase.util.HBaseFsck.offlineHdfsIntegrityRepair(HBaseFsck.java:353)
 at 
 org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:431)
 at org.apache.hadoop.hbase.util.HBaseFsck.exec(HBaseFsck.java:3614)
 at org.apache.hadoop.hbase.util.HBaseFsck.run(HBaseFsck.java:3436)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
 at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3430)
 {code}


[jira] [Commented] (HBASE-7522) Tests should not be writing under /tmp/

2013-01-10 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549937#comment-13549937
 ] 

Andrew Purtell commented on HBASE-7522:
---

TestLocalHBaseCluster is certainly picking up files under /tmp/hbase-${user}.

 Tests should not be writing under /tmp/
 ---

 Key: HBASE-7522
 URL: https://issues.apache.org/jira/browse/HBASE-7522
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0, 0.94.5
Reporter: Enis Soztutar

 As per the discussion 
 http://mail-archives.apache.org/mod_mbox/hbase-dev/201301.mbox/%3CCA%2BRK%3D_BmV%3Dvwws4VeDJVPt6hY7NKCDEafex3XTNam630pQRBbA%40mail.gmail.com%3E,
  tests should not be writing under the /tmp/ directory. 
 TestStoreFile is one of the offending ones. Some of them will be fixed in 
 HBASE-6824. 


[jira] [Commented] (HBASE-7531) [replication] NPE in SequenceFileLogReader because ReplicationSource doesn't nullify the reader


[ 
https://issues.apache.org/jira/browse/HBASE-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549948#comment-13549948
 ] 

Sergey Shelukhin commented on HBASE-7531:
-

+1. IMHO the cause is the dubious semantics of openReader (but I may just be 
unfamiliar with the code); the sleepMultiplier decision could live in the outer 
loop, and the meaning of openReader's return value could then be simpler.

 [replication] NPE in SequenceFileLogReader because ReplicationSource doesn't 
 nullify the reader
 ---

 Key: HBASE-7531
 URL: https://issues.apache.org/jira/browse/HBASE-7531
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Attachments: HBASE-7531.patch


 Here's an NPE I get half the time I run TestReplication:
 {noformat}
 2012-12-20 08:59:17,259 ERROR 
 [RegionServer:1;192.168.10.135,49168,1356011734418-EventThread.replicationSource,2]
  regionserver.ReplicationSource$1(727): Unexpected exception in 
 ReplicationSource, 
 currentPath=hdfs://localhost:65533/user/jdcryans/hbase/.logs/192.168.10.135,49168,1356011734418/192.168.10.135%2C49168%2C1356011734418.1356011956626
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.seek(SequenceFileLogReader.java:261)
 at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.seek(ReplicationHLogReaderManager.java:103)
 at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:414)
 at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:332)
 {noformat}
 The issue happens after an IOE is caught while opening the reader: the reader 
 isn't set to null afterwards, so the rest of the code assumes it is still 
 usable.


[jira] [Commented] (HBASE-6466) Enable multi-thread for memstore flush


[ 
https://issues.apache.org/jira/browse/HBASE-6466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549954#comment-13549954
 ] 

Sergey Shelukhin commented on HBASE-6466:
-

I didn't see this on EC2 when I was doing perf testing, or just in exploratory 
test w/LTT.


 Enable multi-thread for memstore flush
 --

 Key: HBASE-6466
 URL: https://issues.apache.org/jira/browse/HBASE-6466
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.96.0
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.96.0

 Attachments: HBASE-6466.patch, HBASE-6466v2.patch, 
 HBASE-6466v3.1.patch, HBASE-6466v3.patch, HBASE-6466-v4.patch, 
 HBASE-6466-v4.patch


 If the KV is large or the HLog is closed under high-pressure putting, we found 
 the memstore is often above the high water mark and blocks the putting.
 So should we enable multi-threading for Memstore Flush?
 Some performance test data for reference:
 1. Test environment: 
 random writing; upper memstore limit 5.6GB; lower memstore limit 4.8GB; 400 
 regions per regionserver; row len=50 bytes, value len=1024 bytes; 5 
 regionservers, 300 ipc handlers per regionserver; 5 clients, 50 thread handlers 
 per client for writing
 2. Test results:
 one cacheFlush handler, tps: 7.8k/s per regionserver, Flush: 10.1MB/s per 
 regionserver, many aboveGlobalMemstoreLimit blocking events appear
 two cacheFlush handlers, tps: 10.7k/s per regionserver, Flush: 12.46MB/s per 
 regionserver
 200 thread handlers per client and two cacheFlush handlers, tps: 16.1k/s per 
 regionserver, Flush: 18.6MB/s per regionserver


[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.

[
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549955#comment-13549955
]

Sergey Shelukhin commented on HBASE-5416:
-

Is this JIRA unresolved pending 0.94 commit? Just checking as it shows up in my
filter :)

Improve performance of scans with some kind of filters.
---

Key: HBASE-5416
URL: https://issues.apache.org/jira/browse/HBASE-5416
Project: HBase
Issue Type: Improvement
Components: Filters, Performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Sergey Shelukhin
Fix For: 0.96.0

Attachments: 5416-0.94-v1.txt, 5416-0.94-v2.txt,
5416-Filtered_scans_v6.patch, 5416-v13.patch, 5416-v14.patch, 5416-v15.patch,
5416-v16.patch, 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch,
Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch,
Filtered_scans_v5.1.patch, Filtered_scans_v5.patch, Filtered_scans_v7.patch,
HBASE-5416-v10.patch, HBASE-5416-v11.patch, HBASE-5416-v12.patch,
HBASE-5416-v12.patch, HBASE-5416-v7-rebased.patch, HBASE-5416-v8.patch,
HBASE-5416-v9.patch

When a scan is performed, the whole row is loaded into the result list, and
after that the filter (if one exists) is applied to decide whether the row is
needed. But when a scan is performed on several CFs and the filter checks only
data from a subset of these CFs, data from the CFs not checked by the filter is
not needed at the filter stage; it is needed only once we decide to include the
current row. In such a case we can significantly reduce the amount of IO
performed by a scan by loading only the values actually checked by the filter.
For example, we have two CFs: flags and snap. Flags is quite small (a bunch of
megabytes) and is used to filter large entries from snap. Snap is very large
(10s of GB) and is quite costly to scan. If we need only rows with some flag
specified, we use SingleColumnValueFilter to limit the result to only a small
subset of the region. But the current implementation loads both CFs to perform
the scan, when only a small subset is needed.
The attached patch adds one routine to the Filter interface to allow a filter
to specify which CFs are needed for its operation. In HRegion, we separate all
scanners into two groups: those needed for the filter and the rest (joined).
When a new row is considered, only the needed data is loaded, the filter is
applied, and only if the filter accepts the row is the rest of the data loaded.
On our data, this speeds up such scans 30-50 times. Also, this gives us a way
to better normalize the data into separate columns by optimizing the scans
performed.
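
The two-phase loading described above can be sketched with plain maps standing in for the two column families. This is only an illustration of the idea, not the patch itself: `TwoPhaseScanSketch`, `scan` and the `largeFamilyReads` counter are hypothetical names, and no HBase Filter or HRegion API is used.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class TwoPhaseScanSketch {
    /** Counts reads of the large family, to make the saving visible. */
    int largeFamilyReads = 0;

    private final Map<String, String> flags; // small family: row -> flag value
    private final Map<String, String> snap;  // large family: row -> payload

    TwoPhaseScanSketch(Map<String, String> flags, Map<String, String> snap) {
        this.flags = flags;
        this.snap = snap;
    }

    /** Returns "row=payload" for every row whose flag passes the filter. */
    List<String> scan(Predicate<String> flagFilter) {
        List<String> results = new ArrayList<>();
        for (Map.Entry<String, String> row : flags.entrySet()) {
            // Phase 1: evaluate the filter on the small family only.
            if (!flagFilter.test(row.getValue())) {
                continue;
            }
            // Phase 2: the row was accepted, so load the remaining family.
            largeFamilyReads++;
            results.add(row.getKey() + "=" + snap.get(row.getKey()));
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, String> flags = new LinkedHashMap<>();
        Map<String, String> snap = new LinkedHashMap<>();
        flags.put("r1", "skip");
        flags.put("r2", "keep");
        snap.put("r1", "large payload 1");
        snap.put("r2", "large payload 2");
        TwoPhaseScanSketch scanner = new TwoPhaseScanSketch(flags, snap);
        System.out.println(scanner.scan("keep"::equals));
        System.out.println("large family reads: " + scanner.largeFamilyReads);
    }
}
```

In this toy setup the large family is touched once per accepted row rather than once per scanned row, which is where the reported IO saving would come from.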

[jira] [Updated] (HBASE-7268) correct local region location cache information can be overwritten w/stale information from an old server

[
https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Sergey Shelukhin updated HBASE-7268:

Attachment: HBASE-7268-v6.patch

feedback from /r/, removing lines longer than 100

correct local region location cache information can be overwritten w/stale
information from an old server
-

Key: HBASE-7268
URL: https://issues.apache.org/jira/browse/HBASE-7268
Project: HBase
Issue Type: Bug
Affects Versions: 0.96.0
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Minor
Fix For: 0.96.0

Attachments: HBASE-7268-v0.patch, HBASE-7268-v0.patch,
HBASE-7268-v1.patch, HBASE-7268-v2.patch, HBASE-7268-v2-plus-masterTs.patch,
HBASE-7268-v2-plus-masterTs.patch, HBASE-7268-v3.patch, HBASE-7268-v4.patch,
HBASE-7268-v5.patch, HBASE-7268-v6.patch

Discovered via HBASE-7250; related to HBASE-5877.
Test is writing from multiple threads.
Server A has region R; client knows that.
R gets moved from A to server B.
B gets killed.
R gets moved by master to server C.
~15 seconds later, client tries to write to it (on A?).
Multiple client threads report, from the RegionMoved exception processing
logic, that R moved from C to B, even though such a transition never happened
(neither in nor before the sequence described above). Not quite sure how the
client learned of the transition to C; I assume it's from meta via some other
thread...
Then the put fails (it may fail due to accumulated errors that are not logged,
which I am investigating... but the bogus cache update is there
notwithstanding).
I have a patch but am not sure if it works; the test still fails locally for a
yet unknown reason.
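
One possible guard against this kind of stale update, sketched with strings for regions and servers. This is an assumption about how such a check could look, not necessarily what the attached patches do; `RegionLocationCacheSketch` and `applyMovedHint` are hypothetical names. The idea is to accept a "region moved" hint only when it comes from the server the cache currently points at.

```java
import java.util.HashMap;
import java.util.Map;

class RegionLocationCacheSketch {
    private final Map<String, String> locations = new HashMap<>();

    void put(String region, String server) {
        locations.put(region, server);
    }

    String get(String region) {
        return locations.get(region);
    }

    /**
     * Applies a "region moved" hint only if the reporting server is the one
     * currently cached for the region; returns whether the cache changed.
     */
    boolean applyMovedHint(String region, String reportingServer, String newServer) {
        String cached = locations.get(region);
        if (cached == null || !cached.equals(reportingServer)) {
            // The hint comes from a server the cache no longer points at; ignore it.
            return false;
        }
        locations.put(region, newServer);
        return true;
    }

    public static void main(String[] args) {
        RegionLocationCacheSketch cache = new RegionLocationCacheSketch();
        cache.put("R", "C");
        // A late reply from old server A claims R moved to B; the cache keeps C.
        System.out.println(cache.applyMovedHint("R", "A", "B") + " -> " + cache.get("R"));
    }
}
```

Ignoring hints when nothing is cached is a design choice of this sketch; a real client might instead fall back to a meta lookup.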

[jira] [Commented] (HBASE-7268) correct local region location cache information can be overwritten w/stale information from an old server

[
https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549973#comment-13549973
]

Sergey Shelukhin commented on HBASE-7268:
-

The test repeatedly passes locally...

correct local region location cache information can be overwritten w/stale
information from an old server
-

[jira] [Commented] (HBASE-7384) Introducing waitForCondition function into test cases


[ 
https://issues.apache.org/jira/browse/HBASE-7384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549980#comment-13549980
 ] 

Hadoop QA commented on HBASE-7384:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12564216/hbase-7384_2.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 11 new 
or modified tests.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces lines longer than 
100

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestMultiParallel
  
org.apache.hadoop.hbase.replication.TestReplicationWithCompression
  org.apache.hadoop.hbase.TestLocalHBaseCluster

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):   
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.testBalancerWithRackLocality(TestBalancerWithNodeGroup.java:220)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3962//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3962//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3962//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3962//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3962//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3962//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3962//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3962//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3962//console

This message is automatically generated.

 Introducing waitForCondition function into test cases
 -

 Key: HBASE-7384
 URL: https://issues.apache.org/jira/browse/HBASE-7384
 Project: HBase
  Issue Type: Test
  Components: test
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
  Labels: test
 Fix For: 0.96.0

 Attachments: hbase-7384_1.0.patch, hbase-7384_2.4.patch, 
 hbase-7384.patch, Waiter.java


 Recently I have been working on flaky test cases and found many places that 
 use a while loop and sleep to wait for a condition to become true. The 
 existing approach has several issues:
 1) Much similar code doing the same thing
 2) When a timeout happens, different errors are reported without explicitly 
 indicating a timeout
 3) When we want to increase the max timeout value to verify whether a test case 
 fails because the timeout is too short, we have to recompile & redeploy code
 I propose to create a waitForCondition function as a test utility function 
 like the following:
 {code}
 public interface WaitCheck {
   public boolean Check();
 }
 public boolean waitForCondition(int timeOutInMilliSeconds,
     int checkIntervalInMilliSeconds, WaitCheck s)
     throws InterruptedException {
   int multiplier = 1;
   String multiplierProp = System.getProperty("extremeWaitMultiplier");
   if (multiplierProp != null) {
     multiplier = Integer.parseInt(multiplierProp);
     if (multiplier < 1) {
       LOG.warn(String.format("Invalid extremeWaitMultiplier "
           + "property value:%s. is ignored.", multiplierProp));
       multiplier = 1;
     }
   }
   int timeElapsed = 0;
   while (timeElapsed < timeOutInMilliSeconds * multiplier) {
     if (s.Check()) {
       return true;
     }
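
A self-contained sketch of the proposed helper. It assumes the loop sleeps for the check interval between polls and returns false on timeout; neither step is visible in the snippet above, so both are assumptions, as is the choice to fall back to a multiplier of 1 when the property cannot be parsed. `WaitSketch` is a hypothetical class name, and the sketch measures elapsed time with `System.nanoTime()` rather than summing intervals.

```java
class WaitSketch {
    /** Condition to poll; mirrors the WaitCheck idea from the proposal. */
    interface WaitCheck {
        boolean check();
    }

    /**
     * Polls the condition until it holds or the (possibly multiplied) timeout
     * expires. Returns true if the condition held, false on timeout.
     */
    static boolean waitForCondition(long timeoutMs, long intervalMs, WaitCheck condition)
            throws InterruptedException {
        long multiplier = 1;
        String prop = System.getProperty("extremeWaitMultiplier");
        if (prop != null) {
            try {
                // Values below 1 are treated as 1.
                multiplier = Math.max(1, Long.parseLong(prop));
            } catch (NumberFormatException e) {
                multiplier = 1; // assumed fallback for an unparsable value
            }
        }
        long deadline = System.nanoTime() + timeoutMs * multiplier * 1_000_000L;
        while (true) {
            if (condition.check()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // assumed timeout behaviour
            }
            Thread.sleep(intervalMs);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        boolean met = waitForCondition(2000, 10,
                () -> System.currentTimeMillis() - start >= 50);
        System.out.println("condition met: " + met);
    }
}
```

Running a test with `-DextremeWaitMultiplier=10` would then stretch every timeout without recompiling, which is the third issue the proposal lists.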

[jira] [Updated] (HBASE-7213) Have HLog files for .META. edits only

2013-01-10 Thread Devaraj Das (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated HBASE-7213:
---

Attachment: 7213-2.10.patch

Rebased (again). Also I fixed a bug in HMaster.java. Some of the unit test 
failures were legit and were caused by the bug.

 Have HLog files for .META. edits only
 -

 Key: HBASE-7213
 URL: https://issues.apache.org/jira/browse/HBASE-7213
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Reporter: Devaraj Das
Assignee: Devaraj Das
Priority: Critical
 Fix For: 0.96.0

 Attachments: 7213-2.10.patch, 7213-2.4.patch, 7213-2.6.patch, 
 7213-2.8.patch, 7213-2.9.patch, 7213-in-progress.2.2.patch, 
 7213-in-progress.2.patch, 7213-in-progress.patch


 Over on HBASE-6774, there is a discussion on separating out the edits for 
 .META. regions from the other regions' edits w.r.t where the edits are 
 written. This jira is to track an implementation of that.


[jira] [Commented] (HBASE-7383) create integration test for HBASE-5416 (improving scan performance for certain filters)


[ 
https://issues.apache.org/jira/browse/HBASE-7383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550350#comment-13550350
 ] 

Sergey Shelukhin commented on HBASE-7383:
-

bq. generateColumnsForCf() returns byte[][], but Set<byte[]> is passed as param 
above. Can you explain why the difference ?
Convenience of existing users/implementation. I know, not a very good reason... 
Do you want me to change it? Should be easy to change to either if needed.
bq. Please use SecureRandom instead.
Why?

 create integration test for HBASE-5416 (improving scan performance for 
 certain filters)
 ---

 Key: HBASE-7383
 URL: https://issues.apache.org/jira/browse/HBASE-7383
 Project: HBase
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-7383-v0.patch, HBASE-7383-v1.patch, 
 HBASE-7383-v1.patch


 HBASE-5416 is risky and needs an integration test.


[jira] [Commented] (HBASE-7528) NPE in hbck -repair when adopting orphans if not tableinfo is found.


[ 
https://issues.apache.org/jira/browse/HBASE-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550351#comment-13550351
 ] 

Hadoop QA commented on HBASE-7528:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12564228/HBASE-7528-v0.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3963//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3963//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3963//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3963//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3963//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3963//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3963//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3963//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3963//console

This message is automatically generated.

 NPE in hbck -repair when adopting orphans if not tableinfo is found.
 

 Key: HBASE-7528
 URL: https://issues.apache.org/jira/browse/HBASE-7528
 Project: HBase
  Issue Type: Bug
Reporter: Jonathan Hsieh
Assignee: Sergey Shelukhin
Priority: Trivial
 Attachments: HBASE-7528-v0.patch


 {code}
 13/01/09 17:34:54 DEBUG util.HBaseFsck: Attempting to adopt orphan: { meta = 
 null, hdfs = 
 hdfs://c1514.hal.cloudera.com:56020/hbase-cdh4.2/pe-2-table/28fbe62eee2ffd8ea2611335ed78f8ce,
  deployed =  }
 Exception in thread main java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.util.HBaseFsck$TableInfo.access$000(HBaseFsck.java:1871)
 at 
 org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphan(HBaseFsck.java:482)
 at 
 org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphans(HBaseFsck.java:455)
 at 
 org.apache.hadoop.hbase.util.HBaseFsck.restoreHdfsIntegrity(HBaseFsck.java:576)
 at 
 org.apache.hadoop.hbase.util.HBaseFsck.offlineHdfsIntegrityRepair(HBaseFsck.java:353)
 at 
 org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:431)
 at org.apache.hadoop.hbase.util.HBaseFsck.exec(HBaseFsck.java:3614)
 at org.apache.hadoop.hbase.util.HBaseFsck.run(HBaseFsck.java:3436)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
 at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3430)
 {code}
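
The trace above shows a bare NullPointerException surfacing from inside adoptHdfsOrphan when no table info exists for the orphan's table. As a minimal sketch of the kind of guard that turns such a failure into a readable error (this is not the attached patch; the class OrphanAdoptionSketch, the method lookupTableInfo, and the message wording are all hypothetical), one might write:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is not the HBaseFsck source.
class OrphanAdoptionSketch {
  // Stand-in for the per-table metadata a checker collects (hypothetical).
  static final class TableInfo {
    final String tableName;
    TableInfo(String tableName) { this.tableName = tableName; }
  }

  private final Map<String, TableInfo> tablesInfo = new HashMap<>();

  void register(String tableName) {
    tablesInfo.put(tableName, new TableInfo(tableName));
  }

  // Fails with a descriptive IOException instead of letting a later
  // dereference of a null TableInfo surface as a bare NPE.
  TableInfo lookupTableInfo(String tableName, String orphanDir) throws IOException {
    TableInfo info = tablesInfo.get(tableName);
    if (info == null) {
      throw new IOException("Unable to adopt orphan " + orphanDir
          + ": no table info found for table '" + tableName + "'");
    }
    return info;
  }
}
```

A guard like this does not repair the orphan; it only makes the reason for the failure visible to the operator.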


[jira] [Updated] (HBASE-7383) create integration test for HBASE-5416 (improving scan performance for certain filters)


 [ 
https://issues.apache.org/jira/browse/HBASE-7383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-7383:


Attachment: HBASE-7383-v2.patch

 create integration test for HBASE-5416 (improving scan performance for 
 certain filters)
 ---

 Key: HBASE-7383
 URL: https://issues.apache.org/jira/browse/HBASE-7383
 Project: HBase
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-7383-v0.patch, HBASE-7383-v1.patch, 
 HBASE-7383-v1.patch, HBASE-7383-v2.patch


 HBASE-5416 is risky and needs an integration test.


[jira] [Commented] (HBASE-6466) Enable multi-thread for memstore flush

2013-01-10 Thread Elliott Clark (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550362#comment-13550362
 ] 

Elliott Clark commented on HBASE-6466:
--

I'll circle back around and give this patch another run on a cluster next week. 
 I'll try and get more details for you.

 Enable multi-thread for memstore flush
 --

 Key: HBASE-6466
 URL: https://issues.apache.org/jira/browse/HBASE-6466
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.96.0
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.96.0

 Attachments: HBASE-6466.patch, HBASE-6466v2.patch, 
 HBASE-6466v3.1.patch, HBASE-6466v3.patch, HBASE-6466-v4.patch, 
 HBASE-6466-v4.patch


 If the KVs are large, or the HLog is closed under high-pressure puts, we found 
 that the memstore is often above the high water mark and blocks the puts.
 So should we enable multi-threading for memstore flush?
 Some performance test data for reference:
 1. Test environment:
 random writes; upper memstore limit 5.6GB; lower memstore limit 4.8GB; 400 
 regions per regionserver; row len=50 bytes, value len=1024 bytes; 5 
 regionservers, 300 ipc handlers per regionserver; 5 clients, 50 thread handlers 
 per client for writing
 2. Test results:
 one cacheFlush handler, tps: 7.8k/s per regionserver, Flush: 10.1MB/s per 
 regionserver, many aboveGlobalMemstoreLimit blockings appear
 two cacheFlush handlers, tps: 10.7k/s per regionserver, Flush: 12.46MB/s per 
 regionserver
 200 thread handlers per client, two cacheFlush handlers, tps: 16.1k/s per 
 regionserver, Flush: 18.6MB/s per regionserver
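
The results above compare one and two cacheFlush handlers. As a rough sketch of the general idea (several handler threads draining pending flush requests in parallel), and not the regionserver's actual flusher, the following uses a fixed thread pool. The class FlushHandlerPoolSketch, the method flushAll, and the use of a byte counter in place of real flush work are assumptions made for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; not the HBase memstore flusher.
class FlushHandlerPoolSketch {
  // Submits one flush task per memstore size to a pool of handlerCount
  // threads and returns the total number of bytes "flushed".
  static long flushAll(int handlerCount, List<Long> memstoreSizes)
      throws InterruptedException, ExecutionException {
    ExecutorService handlers = Executors.newFixedThreadPool(handlerCount);
    final AtomicLong flushedBytes = new AtomicLong();
    try {
      List<Future<?>> pending = new ArrayList<>();
      for (final long size : memstoreSizes) {
        // A real handler would write the memstore snapshot to a store file;
        // here the work is reduced to adding the size to a shared counter.
        pending.add(handlers.submit(() -> { flushedBytes.addAndGet(size); }));
      }
      for (Future<?> f : pending) {
        f.get(); // wait for each flush task to finish
      }
    } finally {
      handlers.shutdown();
    }
    return flushedBytes.get();
  }
}
```

With handlerCount set to 1 the tasks run one after another, which corresponds to the single-handler case in the measurements; larger values let flushes overlap.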


[jira] [Updated] (HBASE-7424) Enable the DeltaEncoding for the HFileOutputFormat

2013-01-10 Thread Manukranth Kolloju (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manukranth Kolloju updated HBASE-7424:
--

Description: HFileOutputFormat has a writer embedded but it is not 
configured to use the DeltaEncoding. This revision is to add that support to 
the HFileOutputFormat while it is used as an OutputFormat in either the Mapper 
or the Reducer of a MapReduce task.  (was: HFileOutputFormat has a writer 
embedded but it is not configured to use the DeltaEncoding and FavoredNodes. 
This revision is to add that support to the HFileOutputFormat while it is used 
as an OutputFormat in either the Mapper or the Reducer of a MapReduce task.)
Summary: Enable the DeltaEncoding for the HFileOutputFormat  (was: 
Enable the DeltaEncoding and FavoredNodes for the HFileOutputFormat)

 Enable the DeltaEncoding for the HFileOutputFormat
 --

 Key: HBASE-7424
 URL: https://issues.apache.org/jira/browse/HBASE-7424
 Project: HBase
  Issue Type: New Feature
Reporter: Manukranth Kolloju
Priority: Minor
  Labels: HFileOutputFormat

 HFileOutputFormat has a writer embedded but it is not configured to use the 
 DeltaEncoding. This revision is to add that support to the HFileOutputFormat 
 while it is used as an OutputFormat in either the Mapper or the Reducer of a 
 MapReduce task.


[jira] [Created] (HBASE-7532) Enable the FavoredNodes for the HFileOutputFormat

2013-01-10 Thread Manukranth Kolloju (JIRA)

Manukranth Kolloju created HBASE-7532:
-

 Summary: Enable the FavoredNodes for the HFileOutputFormat
 Key: HBASE-7532
 URL: https://issues.apache.org/jira/browse/HBASE-7532
 Project: HBase
  Issue Type: New Feature
Reporter: Manukranth Kolloju
Priority: Minor





[jira] [Created] (HBASE-7533) Write an RPC Specification for 0.96

stack created HBASE-7533:


 Summary: Write an RPC Specification for 0.96
 Key: HBASE-7533
 URL: https://issues.apache.org/jira/browse/HBASE-7533
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.96.0


RPC format is changing for 0.96 to accommodate our protobufing all around.  Here 
is a first cut.  Please shred: 
https://docs.google.com/document/d/1-1RJMLXzYldmHgKP7M7ynK6euRpucD03fZ603DlZfGI/edit


[jira] [Commented] (HBASE-7383) create integration test for HBASE-5416 (improving scan performance for certain filters)


[ 
https://issues.apache.org/jira/browse/HBASE-7383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550390#comment-13550390
 ] 

Ted Yu commented on HBASE-7383:
---

w.r.t. SecureRandom, take a look at :
http://www.coderanch.com/t/410832/java/java/Java-Random-SecureRandom

 create integration test for HBASE-5416 (improving scan performance for 
 certain filters)
 ---

 Key: HBASE-7383
 URL: https://issues.apache.org/jira/browse/HBASE-7383
 Project: HBase
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-7383-v0.patch, HBASE-7383-v1.patch, 
 HBASE-7383-v1.patch, HBASE-7383-v2.patch


 HBASE-5416 is risky and needs an integration test.


[jira] [Commented] (HBASE-7528) NPE in hbck -repair when adopting orphans if not tableinfo is found.


[ 
https://issues.apache.org/jira/browse/HBASE-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550402#comment-13550402
 ] 

Jonathan Hsieh commented on HBASE-7528:
---

Thanks Sergey.  I'll commit this with one minor fix (there is a missing ' 
char in my description and in the patch).  It still doesn't fix the problem but 
it does make the error message much better.



 NPE in hbck -repair when adopting orphans if not tableinfo is found.
 

 Key: HBASE-7528
 URL: https://issues.apache.org/jira/browse/HBASE-7528
 Project: HBase
  Issue Type: Bug
Reporter: Jonathan Hsieh
Assignee: Sergey Shelukhin
Priority: Trivial
 Attachments: HBASE-7528-v0.patch



[jira] [Comment Edited] (HBASE-7528) NPE in hbck -repair when adopting orphans if not tableinfo is found.


[ 
https://issues.apache.org/jira/browse/HBASE-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550402#comment-13550402
 ] 

Jonathan Hsieh edited comment on HBASE-7528 at 1/10/13 8:56 PM:


Thanks sergey.  I'll commit this with one minor fix (there is a missing ' 
char in my description and in the patch).  There still is a problem here but it 
does make the error message much better.



  was (Author: jmhsieh):
Thanks sergey.  I'll commit this with one minor fix (there is a missing ' 
char in my description and in the patch).  It still dones' tfix the problem but 
it does make the error message much better.


  
 NPE in hbck -repair when adopting orphans if not tableinfo is found.
 

 Key: HBASE-7528
 URL: https://issues.apache.org/jira/browse/HBASE-7528
 Project: HBase
  Issue Type: Bug
Reporter: Jonathan Hsieh
Assignee: Sergey Shelukhin
Priority: Trivial
 Attachments: HBASE-7528-v0.patch



[jira] [Commented] (HBASE-7383) create integration test for HBASE-5416 (improving scan performance for certain filters)


[ 
https://issues.apache.org/jira/browse/HBASE-7383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550405#comment-13550405
 ] 

Sergey Shelukhin commented on HBASE-7383:
-

Well, it says "For general statistics, Random is fine. It's a typical modulo 
congruent function.

SecureRandom is more random. Specifically, it aims to make it impossible to 
predict the next random number from a sequence, which is trivial to do with 
most modulo congruent algorithms.", so for test data generation Random would 
seemingly be the right choice.
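
To illustrate the point for test data: java.util.Random seeded with a fixed value yields the same sequence on every run, which makes a failing test reproducible, whereas SecureRandom aims at unpredictability that test data does not need. A minimal sketch follows; the class TestDataSketch and the method randomBytes are hypothetical names and are not taken from the patch:

```java
import java.util.Random;

// Hypothetical illustration only; not code from the HBASE-7383 patch.
class TestDataSketch {
  // Generates `length` pseudo-random bytes from a fixed seed so that a
  // failing test run can be repeated with exactly the same data.
  static byte[] randomBytes(long seed, int length) {
    byte[] data = new byte[length];
    new Random(seed).nextBytes(data);
    return data;
  }
}
```

Logging the seed at the start of a test run is usually enough to replay the same data later.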

 create integration test for HBASE-5416 (improving scan performance for 
 certain filters)
 ---

 Key: HBASE-7383
 URL: https://issues.apache.org/jira/browse/HBASE-7383
 Project: HBase
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-7383-v0.patch, HBASE-7383-v1.patch, 
 HBASE-7383-v1.patch, HBASE-7383-v2.patch


 HBASE-5416 is risky and needs an integration test.


[jira] [Updated] (HBASE-7528) Unhelpful NPE in hbck -repair when adopting orphans if no tableinfo is found.


 [ 
https://issues.apache.org/jira/browse/HBASE-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-7528:
--

Summary: Unhelpful NPE in hbck -repair when adopting orphans if no 
tableinfo is found.  (was: NPE in hbck -repair when adopting orphans if not 
tableinfo is found.)

 Unhelpful NPE in hbck -repair when adopting orphans if no tableinfo is found.
 -

 Key: HBASE-7528
 URL: https://issues.apache.org/jira/browse/HBASE-7528
 Project: HBase
  Issue Type: Bug
Reporter: Jonathan Hsieh
Assignee: Sergey Shelukhin
Priority: Trivial
 Attachments: HBASE-7528-v0.patch



[jira] [Updated] (HBASE-7528) Unhelpful NPE in hbck -repair when adopting orphans if no tableinfo is found.


 [ 
https://issues.apache.org/jira/browse/HBASE-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-7528:
--

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

 Unhelpful NPE in hbck -repair when adopting orphans if no tableinfo is found.
 -

 Key: HBASE-7528
 URL: https://issues.apache.org/jira/browse/HBASE-7528
 Project: HBase
  Issue Type: Bug
Reporter: Jonathan Hsieh
Assignee: Sergey Shelukhin
Priority: Trivial
 Attachments: HBASE-7528-v0.patch



[jira] [Updated] (HBASE-7528) Unhelpful NPE in hbck -repair when adopting orphans if no tableinfo is found.


 [ 
https://issues.apache.org/jira/browse/HBASE-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-7528:
--

  Component/s: hbck
Affects Version/s: 0.96.0
   0.90.6
   0.92.2
   0.94.3
Fix Version/s: 0.96.0

 Unhelpful NPE in hbck -repair when adopting orphans if no tableinfo is found.
 -

 Key: HBASE-7528
 URL: https://issues.apache.org/jira/browse/HBASE-7528
 Project: HBase
  Issue Type: Bug
  Components: hbck
Affects Versions: 0.90.6, 0.92.2, 0.94.3, 0.96.0
Reporter: Jonathan Hsieh
Assignee: Sergey Shelukhin
Priority: Trivial
 Fix For: 0.96.0

 Attachments: HBASE-7528-v0.patch



[jira] [Commented] (HBASE-7268) correct local region location cache information can be overwritten w/stale information from an old server

[
https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550410#comment-13550410
]

Hadoop QA commented on HBASE-7268:
--

{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12564234/HBASE-7268-v6.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author
tags.

{color:green}+1 tests included{color}. The patch appears to include 18 new
or modified tests.

{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop
2.0 profile.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any
warning messages.

{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.

{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.

{color:green}+1 lineLengths{color}. The patch does not introduce lines
longer than 100

{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.client.TestFromClientSide
org.apache.hadoop.hbase.TestLocalHBaseCluster
org.apache.hadoop.hbase.client.TestMultiParallel

{color:red}-1 core zombie tests{color}. There are 9 zombie test(s):
at
org.apache.hadoop.hbase.catalog.TestCatalogTracker.testServerNotRunningIOException(TestCatalogTracker.java:250)
at
org.apache.hadoop.hbase.master.TestMasterFailover.testMasterFailoverWithMockedRITOnDeadRS(TestMasterFailover.java:833)

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/3964//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3964//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3964//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3964//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3964//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3964//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3964//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3964//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/3964//console

This message is automatically generated.

correct local region location cache information can be overwritten w/stale
information from an old server
-


[jira] [Commented] (HBASE-7383) create integration test for HBASE-5416 (improving scan performance for certain filters)

[
https://issues.apache.org/jira/browse/HBASE-7383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550413#comment-13550413
]

Hadoop QA commented on HBASE-7383:
--

{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12564243/HBASE-7383-v2.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author
tags.

{color:green}+1 tests included{color}. The patch appears to include 31 new
or modified tests.

{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop
2.0 profile.

{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1
warning messages.

{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.

{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.

{color:green}+1 lineLengths{color}. The patch does not introduce lines
longer than 100

{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.replication.TestReplication

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/3966//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3966//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3966//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3966//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3966//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3966//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3966//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3966//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/3966//console

This message is automatically generated.

create integration test for HBASE-5416 (improving scan performance for
certain filters)
---

Key: HBASE-7383
URL: https://issues.apache.org/jira/browse/HBASE-7383
Project: HBase
Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Attachments: HBASE-7383-v0.patch, HBASE-7383-v1.patch,
HBASE-7383-v1.patch, HBASE-7383-v2.patch

HBASE-5416 is risky and needs an integration test.

[jira] [Created] (HBASE-7534) [replication] TestReplication.queueFailover can fail because HBaseTestingUtility.createMultiRegions is dangerous

Jean-Daniel Cryans created HBASE-7534:
-

 Summary: [replication] TestReplication.queueFailover can fail 
because HBaseTestingUtility.createMultiRegions is dangerous
 Key: HBASE-7534
 URL: https://issues.apache.org/jira/browse/HBASE-7534
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.96.0, 0.94.5


{{HBaseTestingUtility.createMultiRegions}} is an abomination: it uses an 
already existing table and hot-replaces the regions in it. I've seen 
TestReplication fail a few times because the old first region was still 
assigned and tried to flush, but crashed because the region's 
folder was missing in HDFS: 

{noformat}
2013-01-04 10:04:45,500 DEBUG 
[RegionServer:1;172.21.3.117,57114,1357322589018.cacheFlusher] 
regionserver.Store(844): Renaming flushed file at 
hdfs://localhost:57099/user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/.tmp/b938b33268064312abfc250d2eeca61d
 to 
hdfs://localhost:57099/user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/f/b938b33268064312abfc250d2eeca61d
2013-01-04 10:04:45,500 WARN  [IPC Server handler 8 on 57099] 
namenode.FSDirectory(422): DIR* FSDirectory.unprotectedRenameTo: failed to 
rename 
/user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/.tmp/b938b33268064312abfc250d2eeca61d
 to 
/user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/f/b938b33268064312abfc250d2eeca61d
 because destination's parent does not exist
2013-01-04 10:04:45,503 WARN  
[RegionServer:1;172.21.3.117,57114,1357322589018.cacheFlusher] 
regionserver.Store(847): Unable to rename 
hdfs://localhost:57099/user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/.tmp/b938b33268064312abfc250d2eeca61d
 to 
hdfs://localhost:57099/user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/f/b938b33268064312abfc250d2eeca61d
2013-01-04 10:04:45,504 WARN  [DataStreamer for file 
/user/jdcryans/hbase/.logs/172.21.3.117,57113,1357322588994/172.21.3.117%2C57113%2C1357322588994.1357322683769]
 hdfs.DFSClient$DFSOutputStream$DataStreamer(2873): DataStreamer Exception: 
org.apache.hadoop.ipc.RemoteException: 
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on 
/user/jdcryans/hbase/.logs/172.21.3.117,57113,1357322588994/172.21.3.117%2C57113%2C1357322588994.1357322683769
 File does not exist. [Lease.  Holder: 
DFSClient_hb_rs_172.21.3.117,57113,1357322588994, pendingcreates: 1]
{noformat}

Eventually the test times out because both region servers on the master cluster 
are dead.

It can be easily fixed by pre-creating the table with enough regions.
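
As a sketch of what pre-creating with enough regions could look like, the helper below computes evenly spaced single-byte split keys in plain Java. The class PreSplitSketch and the method evenSplitKeys are hypothetical; in a test the resulting keys would be handed to a table-creation call that accepts split keys (for example HBaseAdmin.createTable(desc, splitKeys)), and whether the eventual fix does exactly this is an assumption:

```java
// Hypothetical illustration only; not code from the HBASE-7534 fix.
class PreSplitSketch {
  // Returns numRegions - 1 single-byte split keys that divide the
  // one-byte key space [0x00, 0xFF] into roughly equal ranges.
  static byte[][] evenSplitKeys(int numRegions) {
    if (numRegions < 2 || numRegions > 256) {
      throw new IllegalArgumentException("numRegions must be in [2, 256]");
    }
    byte[][] splitKeys = new byte[numRegions - 1][];
    for (int i = 1; i < numRegions; i++) {
      // Boundary i sits at i/numRegions of the 256-value key space.
      splitKeys[i - 1] = new byte[] { (byte) (i * 256 / numRegions) };
    }
    return splitKeys;
  }
}
```

Creating the table with its final region boundaries up front avoids replacing regions underneath a table that is already assigned.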

FWIW a bunch of other tests are using this facility; my IDE tells me that the 3 
methods are called 25 times outside of {{HBaseTestingUtility}}.


[jira] [Updated] (HBASE-7528) Unhelpful NPE in hbck -repair when adopting orphans if no tableinfo is found.


 [ 
https://issues.apache.org/jira/browse/HBASE-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-7528:
--

Attachment: hbase-7528.v1

v1 is what I committed.

 Unhelpful NPE in hbck -repair when adopting orphans if no tableinfo is found.
 -

 Key: HBASE-7528
 URL: https://issues.apache.org/jira/browse/HBASE-7528
 Project: HBase
  Issue Type: Bug
  Components: hbck
Affects Versions: 0.90.6, 0.92.2, 0.94.3, 0.96.0
Reporter: Jonathan Hsieh
Assignee: Sergey Shelukhin
Priority: Trivial
 Fix For: 0.96.0

 Attachments: HBASE-7528-v0.patch, hbase-7528.v1


 {code}
 13/01/09 17:34:54 DEBUG util.HBaseFsck: Attempting to adopt orphan: { meta = null, hdfs = hdfs://c1514.hal.cloudera.com:56020/hbase-cdh4.2/pe-2-table/28fbe62eee2ffd8ea2611335ed78f8ce, deployed =  }
 Exception in thread "main" java.lang.NullPointerException
     at org.apache.hadoop.hbase.util.HBaseFsck$TableInfo.access$000(HBaseFsck.java:1871)
     at org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphan(HBaseFsck.java:482)
     at org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphans(HBaseFsck.java:455)
     at org.apache.hadoop.hbase.util.HBaseFsck.restoreHdfsIntegrity(HBaseFsck.java:576)
     at org.apache.hadoop.hbase.util.HBaseFsck.offlineHdfsIntegrityRepair(HBaseFsck.java:353)
     at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:431)
     at org.apache.hadoop.hbase.util.HBaseFsck.exec(HBaseFsck.java:3614)
     at org.apache.hadoop.hbase.util.HBaseFsck.run(HBaseFsck.java:3436)
     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
     at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3430)
 {code}


[jira] [Commented] (HBASE-7213) Have HLog files for .META. edits only

[
https://issues.apache.org/jira/browse/HBASE-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550423#comment-13550423
]

Hadoop QA commented on HBASE-7213:
--

{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12564237/7213-2.10.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author
tags.

{color:green}+1 tests included{color}. The patch appears to include 6 new
or modified tests.

{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop
2.0 profile.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any
warning messages.

{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.

{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.

{color:red}-1 lineLengths{color}. The patch introduces lines longer than
100

{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.client.TestScannerTimeout
org.apache.hadoop.hbase.client.TestMultiParallel
org.apache.hadoop.hbase.TestLocalHBaseCluster

{color:red}-1 core zombie tests{color}. There are 3 zombie test(s):
at
org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.testSplitBeforeSettingSplittingInZKInternals(TestSplitTransactionOnCluster.java:738)
at
org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.testSplitBeforeSettingSplittingInZK(TestSplitTransactionOnCluster.java:541)
at
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.testBalancerWithRackLocality(TestBalancerWithNodeGroup.java:220)

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/3965//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3965//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3965//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3965//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3965//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3965//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3965//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3965//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/3965//console

This message is automatically generated.

Have HLog files for .META. edits only
-

Key: HBASE-7213
URL: https://issues.apache.org/jira/browse/HBASE-7213
Project: HBase
Issue Type: Improvement
Components: master, regionserver
Reporter: Devaraj Das
Assignee: Devaraj Das
Priority: Critical
Fix For: 0.96.0

Attachments: 7213-2.10.patch, 7213-2.4.patch, 7213-2.6.patch,
7213-2.8.patch, 7213-2.9.patch, 7213-in-progress.2.2.patch,
7213-in-progress.2.patch, 7213-in-progress.patch

Over on HBASE-6774, there is a discussion on separating out the edits for
.META. regions from the other regions' edits w.r.t where the edits are
written. This jira is to track an implementation of that.

[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.


[ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550431#comment-13550431
 ] 

Ted Yu commented on HBASE-5416:
---

For 0.94 patch, I saw the following on my Mac:
{code}
testScanner_JoinedScannersWithLimits(org.apache.hadoop.hbase.regionserver.TestHRegion)  Time elapsed: 0.001 sec  <<< FAILURE!
junit.framework.AssertionFailedError: expected:<3> but was:<1>
  at junit.framework.Assert.fail(Assert.java:50)
  at junit.framework.Assert.failNotEquals(Assert.java:287)
  at junit.framework.Assert.assertEquals(Assert.java:67)
  at junit.framework.Assert.assertEquals(Assert.java:199)
  at junit.framework.Assert.assertEquals(Assert.java:205)
  at org.apache.hadoop.hbase.regionserver.TestHRegion.testScanner_JoinedScannersWithLimits(TestHRegion.java:2976)
{code}

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: Filters, Performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Sergey Shelukhin
 Fix For: 0.96.0

 Attachments: 5416-0.94-v1.txt, 5416-0.94-v2.txt, 
 5416-Filtered_scans_v6.patch, 5416-v13.patch, 5416-v14.patch, 5416-v15.patch, 
 5416-v16.patch, 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, 
 Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch, 
 Filtered_scans_v5.1.patch, Filtered_scans_v5.patch, Filtered_scans_v7.patch, 
 HBASE-5416-v10.patch, HBASE-5416-v11.patch, HBASE-5416-v12.patch, 
 HBASE-5416-v12.patch, HBASE-5416-v7-rebased.patch, HBASE-5416-v8.patch, 
 HBASE-5416-v9.patch


 When a scan is performed, the whole row is loaded into the result list, and 
 only then is the filter (if one exists) applied to decide whether the row is 
 needed.
 But when a scan covers several CFs and the filter checks data from only a 
 subset of them, data from the CFs not checked by the filter is not needed at 
 the filter stage; it is needed only once we have decided to include the 
 current row. In that case we can significantly reduce the amount of IO 
 performed by the scan by first loading only the values actually checked by 
 the filter.
 For example, we have two CFs: flags and snap. Flags is quite small (a bunch 
 of megabytes) and is used to filter large entries from snap. Snap is very 
 large (10s of GB) and is quite costly to scan. If we need only rows with some 
 flag specified, we use SingleColumnValueFilter to limit the result to a small 
 subset of the region. But the current implementation loads both CFs to 
 perform the scan, when only a small subset is needed.
 The attached patch adds one routine to the Filter interface that lets a 
 filter specify which CFs it needs for its operation. In HRegion, we separate 
 all scanners into two groups: those needed for the filter and the rest 
 (joined). When a new row is considered, only the needed data is loaded and 
 the filter is applied; only if the filter accepts the row is the rest of the 
 data loaded. On our data, this speeds up such scans 30-50 times. It also 
 gives us a way to better normalize the data into separate columns by 
 optimizing the scans performed.
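
The two-phase loading described above can be sketched in plain Java. This illustrates the idea only and is not the patch: the class TwoPhaseScanSketch, the in-memory maps standing in for the flags and snap CFs, and the snapLoads counter are all hypothetical names introduced here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustration only: in-memory maps stand in for two column families. */
final class TwoPhaseScanSketch {
  private final Map<String, String> flags; // small CF, checked by the filter
  private final Map<String, String> snap;  // large CF, costly to read
  private int snapLoads = 0;               // how many rows of snap were read

  TwoPhaseScanSketch(Map<String, String> flags, Map<String, String> snap) {
    this.flags = flags;
    this.snap = snap;
  }

  int snapLoads() {
    return snapLoads;
  }

  /** Returns the snap values of the rows whose flag equals wantedFlag. */
  List<String> scan(String wantedFlag) {
    List<String> results = new ArrayList<>();
    for (Map.Entry<String, String> row : flags.entrySet()) {
      // Phase 1: evaluate the filter using only the small family.
      if (!wantedFlag.equals(row.getValue())) {
        continue; // rejected rows never touch the large family
      }
      // Phase 2: the row is accepted, so load the rest of its data.
      snapLoads++;
      results.add(snap.get(row.getKey()));
    }
    return results;
  }

  public static void main(String[] args) {
    Map<String, String> flags = new LinkedHashMap<>();
    Map<String, String> snap = new LinkedHashMap<>();
    flags.put("row1", "keep");
    snap.put("row1", "large-value-1");
    flags.put("row2", "skip");
    snap.put("row2", "large-value-2");

    TwoPhaseScanSketch scanner = new TwoPhaseScanSketch(flags, snap);
    System.out.println(scanner.scan("keep") + " snapLoads=" + scanner.snapLoads());
  }
}
```

The counter makes the saving visible: the large family is read once per accepted row rather than once per scanned row.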


[jira] [Commented] (HBASE-6201) HBase integration/system tests

2013-01-10 Thread Nick Dimiduk (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-6201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550447#comment-13550447
]

Nick Dimiduk commented on HBASE-6201:
-

FYI, the guys at Wibidata have provided a [maven
plugin|https://github.com/kijiproject/hbase-maven-plugin] that looks
potentially interesting for the purpose of running these integration tests
locally. It may need to be jury-rigged to launch a cluster out of the local
sandbox rather than one provided by an external release...

HBase integration/system tests
--

Key: HBASE-6201
URL: https://issues.apache.org/jira/browse/HBASE-6201
Project: HBase
Issue Type: Bug
Components: test
Affects Versions: 0.96.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar

Integration and general system tests have been discussed previously, and the
conclusion is that we need to unify how we do release candidate testing
(HBASE-6091).
In this issue, I would like to discuss and agree on a general plan, and open
subtickets for execution so that we can carry out most of the tests in
HBASE-6091 automatically.
Initially, here is what I have in mind:
1. Create hbase-it (or hbase-tests) containing a forward port of HBASE-4454
(without any tests). This will allow integration tests to be run with
{code}
mvn verify
{code}
2. Add the ability to run all integration/system tests on a given cluster.
Something like:
{code}
mvn verify -Dconf=/etc/hbase/conf/
{code}
should run the test suite on the given cluster. (Right now we can launch some
of the tests (TestAcidGuarantees) from the command line.) Most of the system
tests will be client side, and will interface with the cluster through public
APIs. We need either a tool on top of MiniHBaseCluster or improvements to
HBaseTestingUtility, so that tests can interface with the mini cluster or the
actual cluster uniformly.
3. Port candidate unit tests to the integration tests module. Some of the
candidates are:
- TestAcidGuarantees / TestAtomicOperation
- TestRegionBalancing (HBASE-6053)
- TestFullLogReconstruction
- TestMasterFailover
- TestImportExport
- TestMultiVersions / TestKeepDeletes
- TestFromClientSide
- TestShell and src/test/ruby
- TestRollingRestart
- Test**OnCluster
- Balancer tests
These tests should continue to be run as unit tests w/o any change in
semantics. However, given an actual cluster, they should use that instead of
spinning up a mini cluster.
4. Add more tests, especially long-running ingestion tests (goraci, BigTop's
TestLoadAndVerify, LoadTestTool), and chaos monkey style fault tests.
All suggestions welcome.

[jira] [Created] (HBASE-7535) Fix restore reference files

Matteo Bertozzi created HBASE-7535:
--

 Summary: Fix restore reference files
 Key: HBASE-7535
 URL: https://issues.apache.org/jira/browse/HBASE-7535
 Project: HBase
  Issue Type: Sub-task
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Blocker
 Attachments: HBASE-7535-v0.patch

After HBASE-7419 the HFileLink regex became stricter, so that the 
isHFileLink() check is accurate.

However, HFileLink should be able to open both reference files and hfiles, 
since the main idea behind it is to open anything in /table/region/family/XYZ.

This patch fixes the reference (split files) restore problem and relaxes the 
hfilelink regex for HFileLink(/table/region/family/xyz).open()


[jira] [Updated] (HBASE-7535) Fix restore reference files


 [ 
https://issues.apache.org/jira/browse/HBASE-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-7535:
---

Attachment: HBASE-7535-v0.patch

 Fix restore reference files
 ---

 Key: HBASE-7535
 URL: https://issues.apache.org/jira/browse/HBASE-7535
 Project: HBase
  Issue Type: Sub-task
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Blocker
 Attachments: HBASE-7535-v0.patch



[jira] [Updated] (HBASE-7535) Fix restore reference files


 [ 
https://issues.apache.org/jira/browse/HBASE-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-7535:
---

Status: Patch Available  (was: Open)

 Fix restore reference files
 ---

 Key: HBASE-7535
 URL: https://issues.apache.org/jira/browse/HBASE-7535
 Project: HBase
  Issue Type: Sub-task
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Blocker
 Attachments: HBASE-7535-v0.patch



[jira] [Commented] (HBASE-7383) create integration test for HBASE-5416 (improving scan performance for certain filters)


[ 
https://issues.apache.org/jira/browse/HBASE-7383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550467#comment-13550467
 ] 

Ted Yu commented on HBASE-7383:
---

My interpretation of the article about SecureRandom is that it gives us better 
randomness.

BTW there is a javadoc warning:

[WARNING] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/util/test/LoadTestKVGenerator.java:83: warning - Tag @link: can't find verify(byte[], byte[]...) in org.apache.hadoop.hbase.util.test.LoadTestKVGenerator
[INFO] 

 create integration test for HBASE-5416 (improving scan performance for 
 certain filters)
 ---

 Key: HBASE-7383
 URL: https://issues.apache.org/jira/browse/HBASE-7383
 Project: HBase
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-7383-v0.patch, HBASE-7383-v1.patch, 
 HBASE-7383-v1.patch, HBASE-7383-v2.patch


 HBASE-5416 is risky and needs an integration test.


[jira] [Commented] (HBASE-7535) Fix restore reference files


[ 
https://issues.apache.org/jira/browse/HBASE-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550471#comment-13550471
 ] 

Hadoop QA commented on HBASE-7535:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12564271/HBASE-7535-v0.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified tests.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3967//console

This message is automatically generated.

 Fix restore reference files
 ---

 Key: HBASE-7535
 URL: https://issues.apache.org/jira/browse/HBASE-7535
 Project: HBase
  Issue Type: Sub-task
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Blocker
 Attachments: HBASE-7535-v0.patch



[jira] [Updated] (HBASE-7534) [replication] TestReplication.queueFailover can fail because HBaseTestingUtility.createMultiRegions is dangerous


 [ 
https://issues.apache.org/jira/browse/HBASE-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-7534:
--

Attachment: HBASE-7534.patch

This patch adds a new set of keys (almost the same, but the semantics are 
different, and I also didn't want to mess with Arrays), which is now used when 
creating the table.

 [replication] TestReplication.queueFailover can fail because 
 HBaseTestingUtility.createMultiRegions is dangerous
 

 Key: HBASE-7534
 URL: https://issues.apache.org/jira/browse/HBASE-7534
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.96.0, 0.94.5

 Attachments: HBASE-7534.patch


 {{HBaseTestingUtility.createMultiRegions}} is an abomination, it uses an 
 already existing table and hot replaces the regions in it. I've seen 
 TestReplication failing a few times because the old first region is still 
 assigned and tried to flush but crashed due to the fact that the region's 
 folder is missing in HDFS: 
 {noformat}
 2013-01-04 10:04:45,500 DEBUG 
 [RegionServer:1;172.21.3.117,57114,1357322589018.cacheFlusher] 
 regionserver.Store(844): Renaming flushed file at 
 hdfs://localhost:57099/user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/.tmp/b938b33268064312abfc250d2eeca61d
  to 
 hdfs://localhost:57099/user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/f/b938b33268064312abfc250d2eeca61d
 2013-01-04 10:04:45,500 WARN  [IPC Server handler 8 on 57099] 
 namenode.FSDirectory(422): DIR* FSDirectory.unprotectedRenameTo: failed to 
 rename 
 /user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/.tmp/b938b33268064312abfc250d2eeca61d
  to 
 /user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/f/b938b33268064312abfc250d2eeca61d
  because destination's parent does not exist
 2013-01-04 10:04:45,503 WARN  
 [RegionServer:1;172.21.3.117,57114,1357322589018.cacheFlusher] 
 regionserver.Store(847): Unable to rename 
 hdfs://localhost:57099/user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/.tmp/b938b33268064312abfc250d2eeca61d
  to 
 hdfs://localhost:57099/user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/f/b938b33268064312abfc250d2eeca61d
 2013-01-04 10:04:45,504 WARN  [DataStreamer for file 
 /user/jdcryans/hbase/.logs/172.21.3.117,57113,1357322588994/172.21.3.117%2C57113%2C1357322588994.1357322683769]
  hdfs.DFSClient$DFSOutputStream$DataStreamer(2873): DataStreamer Exception: 
 org.apache.hadoop.ipc.RemoteException: 
 org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on 
 /user/jdcryans/hbase/.logs/172.21.3.117,57113,1357322588994/172.21.3.117%2C57113%2C1357322588994.1357322683769
  File does not exist. [Lease.  Holder: 
 DFSClient_hb_rs_172.21.3.117,57113,1357322588994, pendingcreates: 1]
 {noformat}
 Eventually the test times out because both region servers on the master 
 cluster are dead.
 It can be easily fixed by pre-creating the table with enough regions.
 FWIW a bunch of other tests are using this facility, my IDE tells me that the 
 3 methods are called 25 times outside of {{HBaseTestingUtility}}.


[jira] [Updated] (HBASE-7458) TestReplicationWithCompression fails intermittently in both PreCommit and trunk builds

2013-01-10 Thread Nick Dimiduk (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-7458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-7458:


Labels: tes  (was: )

 TestReplicationWithCompression fails intermittently in both PreCommit and 
 trunk builds
 --

 Key: HBASE-7458
 URL: https://issues.apache.org/jira/browse/HBASE-7458
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Priority: Critical
  Labels: tes
 Fix For: 0.96.0


 TestReplicationWithCompression has been failing often.
 Here are a few examples:
 https://builds.apache.org/job/PreCommit-HBASE-Build/3755/testReport/
 https://builds.apache.org/job/HBase-TRUNK/3672/testReport/org.apache.hadoop.hbase.replication/TestReplicationWithCompression/testDeleteTypes/
 https://builds.apache.org/job/HBase-0.94/677/testReport/junit/org.apache.hadoop.hbase.replication/TestReplicationWithCompression/queueFailover/


[jira] [Commented] (HBASE-7534) [replication] TestReplication.queueFailover can fail because HBaseTestingUtility.createMultiRegions is dangerous


[ 
https://issues.apache.org/jira/browse/HBASE-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550500#comment-13550500
 ] 

Jean-Daniel Cryans commented on HBASE-7534:
---

bq. Go commit. See if it fixes the fails.

FWIW the current failures are unrelated to this. From what I can tell, either 
the machine where Jenkins runs is slow or something is slowing us down. For 
example, in build 717's log for TestReplication.queueFailover:

{noformat}
2013-01-09 06:14:47,771 DEBUG 
[RegionServer:1;vesta.apache.org,41495,1357711464011-EventThread.replicationSource,2]
 regionserver.ReplicationSource(638): Replicating 3
2013-01-09 06:14:49,730 INFO  [Thread-1887] replication.TestReplication(779): 
Only got 9720 rows instead of 17576 current i=-16
2013-01-09 06:14:55,176 INFO  [Thread-1887] replication.TestReplication(779): 
Only got 9720 rows instead of 17576 current i=-15
2013-01-09 06:14:56,789 DEBUG 
[RegionServer:1;vesta.apache.org,41495,1357711464011-EventThread.replicationSource,2]
 regionserver.ReplicationSource(651): Replicated in total: 1837
{noformat}

You can see that it took 9 seconds to replicate a bunch of rows, during which 
the test saw no progress. It runs way faster than that on my machine.

 [replication] TestReplication.queueFailover can fail because 
 HBaseTestingUtility.createMultiRegions is dangerous
 

 Key: HBASE-7534
 URL: https://issues.apache.org/jira/browse/HBASE-7534
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.96.0, 0.94.5

 Attachments: HBASE-7534.patch



[jira] [Commented] (HBASE-7399) Health check chore for HMaster

2013-01-10 Thread Nick Dimiduk (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550505#comment-13550505
 ] 

Nick Dimiduk commented on HBASE-7399:
-

{code}
+  private boolean isHealthCheckerConfigured() {
+    String healthScriptLocation = this.conf.get(HConstants.HEALTH_SCRIPT_LOC);
+    return org.apache.commons.lang.StringUtils.isNotBlank(healthScriptLocation);
+  }
{code}

Nit: {{isNotBlank}} could/should be a static import.

 Health check chore for HMaster
 --

 Key: HBASE-7399
 URL: https://issues.apache.org/jira/browse/HBASE-7399
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Affects Versions: 0.96.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Priority: Trivial
 Fix For: 0.96.0, 0.94.4

 Attachments: HBASE-7399-0.94.patch, HBASE-7399-trunk.patch





[jira] [Commented] (HBASE-7534) [replication] TestReplication.queueFailover can fail because HBaseTestingUtility.createMultiRegions is dangerous


[ 
https://issues.apache.org/jira/browse/HBASE-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550512#comment-13550512
 ] 

Jean-Daniel Cryans commented on HBASE-7534:
---

Actually I was able to find one case where the test timed out on Jenkins:

https://builds.apache.org/job/HBase-0.94/649/testReport/org.apache.hadoop.hbase.replication/TestReplicationWithCompression/queueFailover/

Look for "failed to rename".

 [replication] TestReplication.queueFailover can fail because 
 HBaseTestingUtility.createMultiRegions is dangerous
 

 Key: HBASE-7534
 URL: https://issues.apache.org/jira/browse/HBASE-7534
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.96.0, 0.94.5

 Attachments: HBASE-7534.patch



[jira] [Commented] (HBASE-7531) [replication] NPE in SequenceFileLogReader because ReplicationSource doesn't nullify the reader


[ 
https://issues.apache.org/jira/browse/HBASE-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550515#comment-13550515
 ] 

Jean-Daniel Cryans commented on HBASE-7531:
---

I was able to find one test failure caused by this:

https://builds.apache.org/job/HBase-0.94/656/testReport/org.apache.hadoop.hbase.replication/TestReplicationWithCompression/testVerifyRepJob/

The replication thread dies so truncating can't complete.

bq. The cause is the dubious semantics of openReader imho

Yeah, I should probably fold that reader into ReplicationHLogReaderManager 
somehow.

 [replication] NPE in SequenceFileLogReader because ReplicationSource doesn't 
 nullify the reader
 ---

 Key: HBASE-7531
 URL: https://issues.apache.org/jira/browse/HBASE-7531
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Attachments: HBASE-7531.patch


 Here's an NPE I get half the time I run TestReplication:
 {noformat}
 2012-12-20 08:59:17,259 ERROR [RegionServer:1;192.168.10.135,49168,1356011734418-EventThread.replicationSource,2] regionserver.ReplicationSource$1(727): Unexpected exception in ReplicationSource, currentPath=hdfs://localhost:65533/user/jdcryans/hbase/.logs/192.168.10.135,49168,1356011734418/192.168.10.135%2C49168%2C1356011734418.1356011956626
 java.lang.NullPointerException
     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.seek(SequenceFileLogReader.java:261)
     at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.seek(ReplicationHLogReaderManager.java:103)
     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:414)
     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:332)
 {noformat}
 This happens after an IOException is caught while opening the reader: the 
 reader isn't set to null afterwards, so the rest of the code assumes the 
 reader is usable.
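
As a rough illustration of the failure mode described above, here is a minimal Java sketch of "clear the reference when the open fails". It is not the HBase code: ReaderHolderSketch, LogReader, ReaderFactory, seekIfOpen() and hasReader() are hypothetical names invented for this example, and the actual patch may be structured differently.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical stand-in for a component that owns a log reader; not an HBase class. */
final class ReaderHolderSketch {
    /** Minimal reader abstraction used only for this illustration. */
    interface LogReader extends Closeable {
        void seek(long position) throws IOException;
    }

    /** Supplies a reader or fails, mimicking an open call that can throw. */
    interface ReaderFactory {
        LogReader open() throws IOException;
    }

    private LogReader reader;

    /** Opens a reader; on failure the stale reference is cleared before rethrowing. */
    void openReader(ReaderFactory factory) throws IOException {
        try {
            reader = factory.open();
        } catch (IOException e) {
            reader = null; // do not leave an unusable reader behind
            throw e;
        }
    }

    /** Seeks only when a usable reader exists; returns false otherwise. */
    boolean seekIfOpen(long position) throws IOException {
        if (reader == null) {
            return false;
        }
        reader.seek(position);
        return true;
    }

    boolean hasReader() {
        return reader != null;
    }

    public static void main(String[] args) throws IOException {
        ReaderHolderSketch holder = new ReaderHolderSketch();
        try {
            holder.openReader(() -> { throw new IOException("simulated open failure"); });
        } catch (IOException e) {
            System.out.println("open failed: " + e.getMessage());
        }
        // Later code can check for a reader instead of hitting an NPE.
        System.out.println("seek attempted: " + holder.seekIfOpen(0L));
    }
}
```

The point of the sketch is only that callers get an explicit "no reader" state to test, rather than a reference that looks valid but fails on the next seek.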


[jira] [Commented] (HBASE-7528) Unhelpful NPE in hbck -repair when adopting orphans if no tableinfo is found.


[ 
https://issues.apache.org/jira/browse/HBASE-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550549#comment-13550549
 ] 

Hudson commented on HBASE-7528:
---

Integrated in HBase-TRUNK #3725 (See 
[https://builds.apache.org/job/HBase-TRUNK/3725/])
HBASE-7528 Unhelpful NPE in hbck -repair when adopting orphans if no 
tableinfo is found (Sergey Shelukhin) (Revision 1431637)

 Result = FAILURE
jmhsieh : 
Files : 
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java


 Unhelpful NPE in hbck -repair when adopting orphans if no tableinfo is found.
 -

 Key: HBASE-7528
 URL: https://issues.apache.org/jira/browse/HBASE-7528
 Project: HBase
  Issue Type: Bug
  Components: hbck
Affects Versions: 0.90.6, 0.92.2, 0.94.3, 0.96.0
Reporter: Jonathan Hsieh
Assignee: Sergey Shelukhin
Priority: Trivial
 Fix For: 0.96.0

 Attachments: HBASE-7528-v0.patch, hbase-7528.v1


 {code}
 13/01/09 17:34:54 DEBUG util.HBaseFsck: Attempting to adopt orphan: { meta = null, hdfs = hdfs://c1514.hal.cloudera.com:56020/hbase-cdh4.2/pe-2-table/28fbe62eee2ffd8ea2611335ed78f8ce, deployed =  }
 Exception in thread "main" java.lang.NullPointerException
     at org.apache.hadoop.hbase.util.HBaseFsck$TableInfo.access$000(HBaseFsck.java:1871)
     at org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphan(HBaseFsck.java:482)
     at org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphans(HBaseFsck.java:455)
     at org.apache.hadoop.hbase.util.HBaseFsck.restoreHdfsIntegrity(HBaseFsck.java:576)
     at org.apache.hadoop.hbase.util.HBaseFsck.offlineHdfsIntegrityRepair(HBaseFsck.java:353)
     at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:431)
     at org.apache.hadoop.hbase.util.HBaseFsck.exec(HBaseFsck.java:3614)
     at org.apache.hadoop.hbase.util.HBaseFsck.run(HBaseFsck.java:3436)
     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
     at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3430)
 {code}


[jira] [Commented] (HBASE-7535) Fix restore reference files


[ 
https://issues.apache.org/jira/browse/HBASE-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550564#comment-13550564
 ] 

Ted Yu commented on HBASE-7535:
---

Some log statements were removed in the patch:
{code}
-LOG.info("getReferredToFile(): p=" + p + " g1=" + m.group(1) + " g2=" + m.group(2));
+
{code}
Would they be useful in debugging? Maybe change them to debug level.
{code}
+  LOG.info("restore file as link-link=" + hfileName + " in=" + familyDir);
{code}
'link-link' means an HFileLink created from an HFileLink. Maybe call it 
'link-from-link' or something similar?

In the test:
{code}
+HTableDescriptor htd = createTableDescriptor("table");
{code}
Please create a constant for the table name so that it can be referred to later:
{code}
+Path basePath = new Path(new Path("table", "region"), "cf");
{code}


 Fix restore reference files
 ---

 Key: HBASE-7535
 URL: https://issues.apache.org/jira/browse/HBASE-7535
 Project: HBase
  Issue Type: Sub-task
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Blocker
 Attachments: HBASE-7535-v0.patch


 After HBASE-7419 the HFileLink regex became stricter, so that the 
 isHFileLink() check is accurate.
 However, HFileLink should be able to open both references and hfiles, 
 since the main idea behind it is to open anything in /table/region/family/XYZ.
 This patch fixes the reference (split files) restore problem and relaxes the 
 hfilelink regex used by HFileLink(/table/region/family/xyz).open().


[jira] [Commented] (HBASE-7453) HBASE-7423 snapshot followup


[ 
https://issues.apache.org/jira/browse/HBASE-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550571#comment-13550571
 ] 

Ted Yu commented on HBASE-7453:
---

+1 from me.

 HBASE-7423 snapshot followup
 

 Key: HBASE-7453
 URL: https://issues.apache.org/jira/browse/HBASE-7453
 Project: HBase
  Issue Type: Sub-task
  Components: Client, master, regionserver, snapshots, Zookeeper
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Fix For: hbase-6055, hbase-7290

 Attachments: HBASE-7453-v0.patch, HBASE-7453-v1.patch


 HBASE-7423 changed the arguments of one method used by the restore code.


[jira] [Assigned] (HBASE-7536) Add test that confirms that multiple concurrent snapshot requests are rejected.


 [ 
https://issues.apache.org/jira/browse/HBASE-7536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh reassigned HBASE-7536:
-

Assignee: Jonathan Hsieh

 Add test that confirms that multiple concurrent snapshot requests are 
 rejected.
 ---

 Key: HBASE-7536
 URL: https://issues.apache.org/jira/browse/HBASE-7536
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh

 Currently the rule is that we can only have one online snapshot running at a 
 time.  This test tries to prove this.
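
To make the "one snapshot at a time" rule concrete, the following Java sketch fires several concurrent requests at a guard and counts how many are accepted. It is only an illustration under assumptions: SingleSnapshotGuardSketch, tryStart(), finish() and countAccepted() are hypothetical names, and the real snapshot manager and its test are not shown in this thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical guard that admits a single running snapshot; not HBase code. */
final class SingleSnapshotGuardSketch {
    private final AtomicBoolean running = new AtomicBoolean(false);

    /** Returns true if this request may run, false if another snapshot is in progress. */
    boolean tryStart() {
        return running.compareAndSet(false, true);
    }

    /** Marks the running snapshot as finished so a new one can be accepted. */
    void finish() {
        running.set(false);
    }

    /** Issues the given number of concurrent requests and counts the accepted ones. */
    static int countAccepted(int requests) throws InterruptedException {
        SingleSnapshotGuardSketch guard = new SingleSnapshotGuardSketch();
        AtomicInteger accepted = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] threads = new Thread[requests];
        for (int i = 0; i < requests; i++) {
            threads[i] = new Thread(() -> {
                try {
                    start.await(); // release all requests at the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (guard.tryStart()) {
                    accepted.incrementAndGet();
                }
            });
            threads[i].start();
        }
        start.countDown();
        for (Thread t : threads) {
            t.join();
        }
        return accepted.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("accepted requests: " + countAccepted(8));
    }
}
```

Because no request calls finish() in countAccepted(), exactly one of the concurrent requests should be accepted and the rest rejected.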


[jira] [Updated] (HBASE-7535) Fix restore reference files


 [ 
https://issues.apache.org/jira/browse/HBASE-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-7535:
---

Attachment: HBASE-7535-v1.patch

Re-added the getReferredToFile() log at debug level, and removed the log.info() 
in restoreStoreFile(), since the log.trace() one call earlier already reports 
the file that we are going to restore.

The table in the table descriptor and the table in new Path(table, region, cf) 
are not related. The second one is a fake path to keep 
StoreFile.getReferredToFile() happy, and we don't care about the real path in 
the test. Changed the names to be more explicit about that.

 Fix restore reference files
 ---

 Key: HBASE-7535
 URL: https://issues.apache.org/jira/browse/HBASE-7535
 Project: HBase
  Issue Type: Sub-task
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Blocker
 Attachments: HBASE-7535-v0.patch, HBASE-7535-v1.patch




[jira] [Updated] (HBASE-7365) Safer table creation and deletion using .tmp dir


 [ 
https://issues.apache.org/jira/browse/HBASE-7365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-7365:
---

Attachment: HBASE-7365-v2.patch

 Safer table creation and deletion using .tmp dir
 

 Key: HBASE-7365
 URL: https://issues.apache.org/jira/browse/HBASE-7365
 Project: HBase
  Issue Type: Improvement
  Components: master
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Fix For: 0.96.0

 Attachments: HBASE-7365-v0.patch, HBASE-7365-v1.patch, 
 HBASE-7365-v2.patch


 Currently tables are created directly in the root directory, and removal also 
 works directly on the root directory.
 Change the code to use a /hbase/.tmp directory to make creation and 
 removal a bit safer.
 Table Creation steps
  * Create the table descriptor (table folder, in /hbase/.tmp/)
  * Create the table regions (always in temp)
  * Move the table from temp to the root folder
  * Add the regions to meta
  * Trigger assignment
  * Set enable flag in ZooKeeper
 Table Deletion steps
  * Wait for regions in transition
  * Remove regions from meta (use bulk delete)
  * Move the table to /hbase/.tmp
  * Remove the table from the descriptor cache
  * Remove table from zookeeper
  * Archive the table
 The main changes in the current code are:
  * Writing to /hbase/.tmp and then renaming
  * Using bulk delete in DeletionTableHandler
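
The "write to .tmp, then rename" part of the creation steps above can be sketched on a local filesystem with java.nio. This is an analogy only: the real change works against HDFS through HBase's own classes, and TmpDirTableCreateSketch, createTable() and the .tableinfo file name used here are assumptions made for the example. Deletion, meta updates, assignment and ZooKeeper steps are not covered.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Local-filesystem analogy of creating a table under .tmp and renaming it into place. */
final class TmpDirTableCreateSketch {
    /** Builds the table layout under root/.tmp and then moves it into root in one step. */
    static Path createTable(Path root, String table, String... regions) throws IOException {
        Path finalTable = root.resolve(table);
        if (Files.exists(finalTable)) {
            throw new FileAlreadyExistsException(finalTable.toString());
        }
        // Steps 1 and 2: descriptor and regions are created in the temp area only.
        Path tmpTable = root.resolve(".tmp").resolve(table);
        Files.createDirectories(tmpTable);
        Files.write(tmpTable.resolve(".tableinfo"), table.getBytes(StandardCharsets.UTF_8));
        for (String region : regions) {
            Files.createDirectories(tmpTable.resolve(region));
        }
        // Step 3: a single rename makes the complete table visible under the root.
        return Files.move(tmpTable, finalTable, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("hbase-root-sketch");
        Path table = createTable(root, "usertable", "region-a", "region-b");
        System.out.println("table created at " + table);
    }
}
```

If the process dies before the rename, the half-built table stays under .tmp and never appears in the root directory, which is the safety property the description is after.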


[jira] [Assigned] (HBASE-7471) Enable Cleaners required for Snapshots by default


 [ 
https://issues.apache.org/jira/browse/HBASE-7471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reassigned HBASE-7471:
-

Assignee: Ted Yu

 Enable Cleaners required for Snapshots by default
 -

 Key: HBASE-7471
 URL: https://issues.apache.org/jira/browse/HBASE-7471
 Project: HBase
  Issue Type: Sub-task
  Components: Client, master, regionserver, snapshots, Zookeeper
Reporter: Jonathan Hsieh
Assignee: Ted Yu
 Fix For: hbase-6055, 0.96.0

 Attachments: 7471.txt


 Currently, snapshots require admins to add configuration to their 
 hbase-site.xml to have snapshot functionality available.  It is at the moment 
 off by default.
 {code}
   <property>
     <name>hbase.snapshot.enabled</name>
     <value>true</value>
   </property>
 {code}
 Maybe we should just enable snapshots by default.  Discuss.


[jira] [Updated] (HBASE-7471) Enable Cleaners required for Snapshots by default


 [ 
https://issues.apache.org/jira/browse/HBASE-7471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-7471:
--

Attachment: 7471.txt

 Enable Cleaners required for Snapshots by default
 -

 Key: HBASE-7471
 URL: https://issues.apache.org/jira/browse/HBASE-7471
 Project: HBase
  Issue Type: Sub-task
  Components: Client, master, regionserver, snapshots, Zookeeper
Reporter: Jonathan Hsieh
Assignee: Ted Yu
 Fix For: hbase-6055, 0.96.0

 Attachments: 7471.txt




[jira] [Commented] (HBASE-7530) [replication] Work around HDFS-4380 else we get NPEs


[ 
https://issues.apache.org/jira/browse/HBASE-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550583#comment-13550583
 ] 

stack commented on HBASE-7530:
--

I don't get what this change does.  Previously we had explicit sizing.  This 
does explicit sizing too, right, by rolling at some multiple of current size?

 [replication] Work around HDFS-4380 else we get NPEs
 

 Key: HBASE-7530
 URL: https://issues.apache.org/jira/browse/HBASE-7530
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.96.0, 0.94.5

 Attachments: HBASE-7530.patch


 I've been spending a lot of time trying to figure out the recent test failures 
 related to replication. One I seem to be constantly getting is this NPE:
 {noformat}
 2013-01-09 10:08:56,912 ERROR [RegionServer:1;172.23.7.205,61604,1357754664830-EventThread.replicationSource,2] regionserver.ReplicationSource$1(727): Unexpected exception in ReplicationSource, currentPath=hdfs://localhost:61589/user/jdcryans/hbase/.logs/172.23.7.205,61604,1357754664830/172.23.7.205%2C61604%2C1357754664830.1357754936216
 java.lang.NullPointerException
     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.updateBlockInfo(DFSClient.java:1885)
     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1858)
     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1834)
     at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:578)
     at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154)
     at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:108)
     at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1495)
     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.openFile(SequenceFileLogReader.java:62)
     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1482)
     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470)
     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:55)
     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.reset(SequenceFileLogReader.java:308)
     at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.openReader(ReplicationHLogReaderManager.java:69)
     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:500)
     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:312)
 {noformat}
 Talking to [~tlipcon], he said it was likely fixed in Hadoop 2.0 via 
 HDFS-3222 but for Hadoop 1.0 he created HDFS-4380. This seems to happen while 
 crossing block boundaries and TestReplication uses a 20KB block size for the 
 HLog. The intent was just to get HLogs to roll more often, and this can also 
 be achieved with *hbase.regionserver.logroll.multiplier* with a value of 
 0.0003f.
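
For a sense of scale, the arithmetic behind that multiplier can be sketched in Java. Two assumptions are made here and are not stated in the issue: that the roll threshold is computed as block size times hbase.regionserver.logroll.multiplier, and that the HLog block size is 64 MB. LogRollSizeSketch and rollSize() are hypothetical names for this example.

```java
/** Arithmetic sketch: roll threshold as block size times multiplier (assumed formula). */
final class LogRollSizeSketch {
    /** Returns the size in bytes at which a log would roll under the assumed formula. */
    static long rollSize(long blockSizeBytes, float multiplier) {
        return (long) (blockSizeBytes * multiplier);
    }

    public static void main(String[] args) {
        long assumedBlockSize = 64L * 1024 * 1024; // 64 MB, an assumption for this example
        // With the multiplier from the description the threshold is roughly 20 KB.
        System.out.println("roll size with 0.0003f: " + rollSize(assumedBlockSize, 0.0003f));
    }
}
```

Under those assumptions, 0.0003f keeps HLogs rolling at about the same size as the 20KB block size did, without making HDFS cross block boundaries that often.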


[jira] [Commented] (HBASE-7531) [replication] NPE in SequenceFileLogReader because ReplicationSource doesn't nullify the reader


[ 
https://issues.apache.org/jira/browse/HBASE-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550584#comment-13550584
 ] 

stack commented on HBASE-7531:
--

+1

 [replication] NPE in SequenceFileLogReader because ReplicationSource doesn't 
 nullify the reader
 ---

 Key: HBASE-7531
 URL: https://issues.apache.org/jira/browse/HBASE-7531
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Attachments: HBASE-7531.patch




[jira] [Created] (HBASE-7537) .regioninfo not created by createHRegion()

Matteo Bertozzi created HBASE-7537:
--

 Summary: .regioninfo not created by createHRegion()
 Key: HBASE-7537
 URL: https://issues.apache.org/jira/browse/HBASE-7537
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.96.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi


After HBASE-5683 the .regioninfo is no longer written to disk during table 
creation. So, if we fail before adding entries to .META., we end up with 
regions on disk that have no information, and hbck is not able to recover 
from this situation.

The .regioninfo is written in checkRegioninfoOnFilesystem(), which was called 
by initialize() during table creation and region opening. With HBASE-5683 we 
skip the call to initialize() during region creation, to avoid 
initializing the memstore & co.
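
One possible shape for a fix is to write the metadata file explicitly on the creation path, independent of the heavier initialization. The Java sketch below shows that idea on a local filesystem only; RegionCreateSketch, ensureRegionInfo() and createRegion() are hypothetical names, and the issue does not say how the actual patch does it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Local-filesystem sketch: region creation that always leaves a metadata file behind. */
final class RegionCreateSketch {
    static final String REGIONINFO_FILE = ".regioninfo";

    /** Writes the region metadata file if it is missing (loosely analogous to a filesystem check). */
    static void ensureRegionInfo(Path regionDir, String encodedInfo) throws IOException {
        Path info = regionDir.resolve(REGIONINFO_FILE);
        if (!Files.exists(info)) {
            Files.write(info, encodedInfo.getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Creates the region directory and its metadata without any further initialization. */
    static Path createRegion(Path tableDir, String regionName) throws IOException {
        Path regionDir = Files.createDirectories(tableDir.resolve(regionName));
        ensureRegionInfo(regionDir, regionName);
        return regionDir;
    }

    public static void main(String[] args) throws IOException {
        Path tableDir = Files.createTempDirectory("table-sketch");
        Path region = createRegion(tableDir, "region-a");
        System.out.println("metadata present: " + Files.exists(region.resolve(REGIONINFO_FILE)));
    }
}
```

With the metadata written at creation time, a failure before the catalog update still leaves enough on disk for a repair tool to identify the region.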


[jira] [Commented] (HBASE-7480) Explicit message for not allowed snapshot on meta tables


[ 
https://issues.apache.org/jira/browse/HBASE-7480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550587#comment-13550587
 ] 

Ted Yu commented on HBASE-7480:
---

+1 from me.

 Explicit message for not allowed snapshot on meta tables
 

 Key: HBASE-7480
 URL: https://issues.apache.org/jira/browse/HBASE-7480
 Project: HBase
  Issue Type: Sub-task
  Components: Client, master, regionserver, snapshots, Zookeeper
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Minor
 Fix For: hbase-6055, 0.96.0

 Attachments: HBASE-7480-v0.patch


 Taking a snapshot of -ROOT- or .META. now results in something like this:
 {code}
 Illegal first character 46 at 0. User-space table names can only start with 
 'word characters': i.e. [a-zA-Z_0-9]
 {code}
 Change the message to something more human-readable that tells the user 
 that meta tables are not supported.
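
A check of this kind could run before the generic table-name validation, so the user sees the real reason. The Java sketch below is illustrative only: SnapshotTableCheckSketch, checkSnapshotAllowed() and the message wording are assumptions, not what the patch necessarily does.

```java
/** Hypothetical pre-check that rejects catalog tables with an explicit message. */
final class SnapshotTableCheckSketch {
    /** Throws with a readable message when the table is -ROOT- or .META.. */
    static void checkSnapshotAllowed(String tableName) {
        if ("-ROOT-".equals(tableName) || ".META.".equals(tableName)) {
            throw new IllegalArgumentException(
                "Snapshots of catalog table '" + tableName + "' are not supported");
        }
    }

    public static void main(String[] args) {
        checkSnapshotAllowed("usertable"); // passes silently
        try {
            checkSnapshotAllowed(".META.");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```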


[jira] [Commented] (HBASE-7535) Fix restore reference files