from:"Ted Yu \(JIRA\)"

[
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Zhihong Ted Yu updated HBASE-5416:
--

Attachment: 5416-Filtered_scans_v6.patch

Rebased Max's latest patch on trunk.

Improve performance of scans with some kind of filters.
---

Key: HBASE-5416
URL: https://issues.apache.org/jira/browse/HBASE-5416
Project: HBase
Issue Type: Improvement
Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Attachments: 5416-Filtered_scans_v6.patch, 5416-v5.txt, 5416-v6.txt,
Filtered_scans.patch, Filtered_scans_v2.patch, Filtered_scans_v3.patch,
Filtered_scans_v4.patch, Filtered_scans_v5.1.patch, Filtered_scans_v5.patch

When a scan is performed, the whole row is loaded into the result list, and
only then is the filter (if any) applied to decide whether the row is needed.
But when a scan covers several CFs and the filter checks data from only a
subset of them, data from the CFs not checked by the filter is not needed at
the filter stage; it is needed only once we have decided to include the
current row. In that case we can significantly reduce the amount of IO
performed by the scan by loading only the values actually checked by the
filter.
For example, we have two CFs: flags and snap. Flags is quite small (a bunch of
megabytes) and is used to filter large entries from snap. Snap is very large
(tens of GB) and is quite costly to scan. If we need only rows with some flag
set, we use SingleColumnValueFilter to limit the result to a small subset of
the region. But the current implementation loads both CFs to perform the
scan, when only a small subset is needed.
The attached patch adds one routine to the Filter interface that lets a filter
specify which CFs are needed for its operation. In HRegion, we separate all
scanners into two groups: those needed by the filter and the rest (joined).
When a new row is considered, only the needed data is loaded and the filter is
applied; only if the filter accepts the row is the rest of the data loaded.
On our data, this speeds up such scans 30-50 times. It also gives us a way to
better normalize the data into separate columns by optimizing the scans
performed.
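
The two-group idea described above can be sketched roughly as follows. This
is only an illustration, not the patch itself: TwoPhaseScanSketch,
FamilyStore and scan() are hypothetical names, the "stores" are in-memory
maps, and the sketch assumes every row has a value in the small family.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Illustrative sketch of two-phase row loading; not the HBase API. */
final class TwoPhaseScanSketch {

    /** Hypothetical stand-in for one column family: row key -> value. */
    static final class FamilyStore {
        final Map<String, String> rows = new LinkedHashMap<>();
        int reads = 0; // counts simulated IO against this family

        String read(String rowKey) {
            reads++;
            return rows.get(rowKey);
        }
    }

    /**
     * Reads the small "essential" family for every row, and touches the
     * large "joined" family only for rows the filter accepts.
     */
    static List<String> scan(FamilyStore essential, FamilyStore joined,
                             Predicate<String> filter) {
        List<String> results = new ArrayList<>();
        for (String rowKey : essential.rows.keySet()) {
            String flag = essential.read(rowKey);
            if (filter.test(flag)) {
                // Only now pay for the expensive family.
                results.add(rowKey + "=" + joined.read(rowKey));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        FamilyStore flags = new FamilyStore();
        FamilyStore snap = new FamilyStore();
        for (int i = 0; i < 5; i++) {
            flags.rows.put("row" + i, i == 3 ? "hot" : "cold");
            snap.rows.put("row" + i, "blob" + i);
        }
        System.out.println(scan(flags, snap, "hot"::equals));
        System.out.println("snap reads: " + snap.reads);
    }
}
```

In this toy setup the large family is read once instead of five times; the
saving in a real region would depend on how selective the filter is.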

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5974) Scanner retry behavior with RPC timeout on next() seems incorrect


 [ 
https://issues.apache.org/jira/browse/HBASE-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-5974:
--

Attachment: 5974_94-V4.patch

Patch v4 makes a small change to JVMClusterUtil.java so that 
RegionServerWithScanTimeout can be made private.

TestClientScannerRPCTimeout passes.

 Scanner retry behavior with RPC timeout on next() seems incorrect
 -

 Key: HBASE-5974
 URL: https://issues.apache.org/jira/browse/HBASE-5974
 Project: HBase
  Issue Type: Bug
  Components: client, regionserver
Affects Versions: 0.90.7, 0.92.1, 0.94.0, 0.96.0
Reporter: Todd Lipcon
Assignee: Anoop Sam John
Priority: Critical
 Fix For: 0.94.1

 Attachments: 5974_94-V4.patch, HBASE-5974_0.94.patch, 
 HBASE-5974_94-V2.patch, HBASE-5974_94-V3.patch


 I'm seeing the following behavior:
 - set RPC timeout to a short value
 - call next() for some batch of rows, big enough so the client times out 
 before the result is returned
 - the HConnectionManager stuff will retry the next() call to the same server. 
 At this point, one of two things can happen: 1) the previous next() call will 
 still be processing, in which case you get a LeaseException, because it was 
 removed from the map during the processing, or 2) the next() call will 
 succeed but skip the prior batch of rows.
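
 One way to make the skipped-batch case detectable is a per-scanner call
 sequence number, along the lines of the seqNo discussed on this issue. The
 sketch below is only an illustration under that assumption: ScannerSeqSketch,
 ServerScanner and OutOfOrderNextException are hypothetical names and do not
 come from the attached patches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch of a sequence-number guard on next(); not HBase code. */
final class ScannerSeqSketch {

    /** Thrown when the client's sequence number does not match the server's. */
    static final class OutOfOrderNextException extends RuntimeException {
        OutOfOrderNextException(String msg) {
            super(msg);
        }
    }

    /** Hypothetical server-side scanner over an in-memory list of rows. */
    static final class ServerScanner {
        private final List<String> rows;
        private int position = 0;
        private long nextCallSeq = 0;

        ServerScanner(List<String> rows) {
            this.rows = rows;
        }

        /** Returns the next batch only if callSeq is the one expected. */
        synchronized List<String> next(int nbRows, long callSeq) {
            if (callSeq != nextCallSeq) {
                throw new OutOfOrderNextException(
                    "expected seq " + nextCallSeq + " but got " + callSeq);
            }
            nextCallSeq++;
            int end = Math.min(position + nbRows, rows.size());
            List<String> batch = new ArrayList<>(rows.subList(position, end));
            position = end;
            return batch;
        }
    }

    public static void main(String[] args) {
        ServerScanner scanner =
            new ServerScanner(Arrays.asList("r0", "r1", "r2", "r3", "r4"));
        System.out.println(scanner.next(2, 0));
        try {
            // Simulates a client that timed out on call 0 and retries it.
            scanner.next(2, 0);
        } catch (OutOfOrderNextException e) {
            System.out.println("retry rejected: " + e.getMessage());
        }
        System.out.println(scanner.next(2, 1));
    }
}
```

 With a guard like this the blind retry fails loudly instead of silently
 skipping rows; the client would presumably then reopen the scanner from the
 last row it actually received.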


[jira] [Updated] (HBASE-6138) HadoopQA not running findbugs [Trunk]


 [ 
https://issues.apache.org/jira/browse/HBASE-6138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6138:
--

Attachment: 6138-addendum.txt

Addendum provided by Jesse.

 HadoopQA not running findbugs [Trunk]
 -

 Key: HBASE-6138
 URL: https://issues.apache.org/jira/browse/HBASE-6138
 Project: HBase
  Issue Type: Bug
  Components: build
Affects Versions: 0.96.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 0.96.0

 Attachments: 6138-addendum.txt, 6138.txt


 HadoopQA shows something like
  -1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.
 but I am not able to see any reports link.
 When I checked the console output for the build, I saw:
 {code}
 [INFO] --- findbugs-maven-plugin:2.4.0:findbugs (default-cli) @ hbase-common 
 ---
 [INFO] Fork Value is true
 [INFO] 
 
 [INFO] Reactor Summary:
 [INFO] 
 [INFO] HBase . SUCCESS [1.890s]
 [INFO] HBase - Common  FAILURE [2.238s]
 [INFO] HBase - Server  SKIPPED
 [INFO] HBase - Assembly .. SKIPPED
 [INFO] HBase - Site .. SKIPPED
 [INFO] 
 
 [INFO] BUILD FAILURE
 [INFO] 
 
 [INFO] Total time: 4.856s
 [INFO] Finished at: Thu May 31 03:35:35 UTC 2012
 [INFO] Final Memory: 23M/154M
 [INFO] 
 
 [ERROR] Could not find resource 
 '${parent.basedir}/dev-support/findbugs-exclude.xml'. - [Help 1]
 [ERROR] 
 {code}
 Because of this error, Findbugs is not getting run!


[jira] [Updated] (HBASE-6067) HBase won't start when hbase.rootdir uses ViewFileSystem


 [ 
https://issues.apache.org/jira/browse/HBASE-6067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6067:
--

Fix Version/s: 0.92.3
   0.94.1

 HBase won't start when hbase.rootdir uses ViewFileSystem
 

 Key: HBASE-6067
 URL: https://issues.apache.org/jira/browse/HBASE-6067
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Eli Collins
Assignee: Zhihong Ted Yu
 Fix For: 0.96.0, 0.94.1, 0.92.3

 Attachments: 6067-v2.txt, 6067.txt


 HBase currently doesn't work with HDFS federation (hbase.rootdir with a 
 client that uses viewfs) because HLog#init uses 
 FileSystem#getDefaultBlockSize and getDefaultReplication. These throw an 
 exception because there is no default filesystem in a viewfs client so 
 there's no way to determine a default block size or replication factor. They 
 could use the versions of these methods that take a path, however these were 
 introduced in HADOOP-8014 and are not yet available in Hadoop 1.x.
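
 A reflection-based fallback is one way to use the path-taking methods when
 they exist and the older ones otherwise. The sketch below only illustrates
 that pattern: OldStyleFs and PathAwareFs are hypothetical stand-ins for a
 filesystem class, a String stands in for a Hadoop Path, and the block sizes
 are made up.

```java
import java.lang.reflect.Method;

/** Illustrative sketch of a reflective fallback; not Hadoop or HBase code. */
final class BlockSizeFallbackSketch {

    /** Hypothetical older API: only the no-argument method exists. */
    public static class OldStyleFs {
        public long getDefaultBlockSize() {
            return 64L * 1024 * 1024;
        }
    }

    /** Hypothetical newer API: the no-arg form fails, the path form works. */
    public static class PathAwareFs extends OldStyleFs {
        @Override
        public long getDefaultBlockSize() {
            throw new UnsupportedOperationException("no default filesystem");
        }

        public long getDefaultBlockSize(String path) {
            return 128L * 1024 * 1024;
        }
    }

    /** Prefers the path-taking overload when the runtime class has one. */
    static long blockSizeFor(OldStyleFs fs, String path) {
        try {
            Method m = fs.getClass().getMethod("getDefaultBlockSize", String.class);
            return (Long) m.invoke(fs, path);
        } catch (NoSuchMethodException e) {
            // Older API: fall back to the no-argument method.
            return fs.getDefaultBlockSize();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(blockSizeFor(new OldStyleFs(), "/hbase"));
        System.out.println(blockSizeFor(new PathAwareFs(), "/hbase"));
    }
}
```

 The lookup happens at run time, so the same compiled code can run against
 either version of the class.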


[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.

[
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Zhihong Ted Yu updated HBASE-5416:
--

Attachment: (was: 5416-Filtered_scans_v6.patch)

Improve performance of scans with some kind of filters.
---

[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.

[
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Zhihong Ted Yu updated HBASE-5416:
--

Attachment: 5416-Filtered_scans_v6.patch

Improve performance of scans with some kind of filters.
---

[jira] [Commented] (HBASE-6067) HBase won't start when hbase.rootdir uses ViewFileSystem


[ 
https://issues.apache.org/jira/browse/HBASE-6067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288218#comment-13288218
 ] 

Zhihong Ted Yu commented on HBASE-6067:
---

Integrated to trunk first.

Thanks for the review, Eli.

 HBase won't start when hbase.rootdir uses ViewFileSystem
 

 Key: HBASE-6067
 URL: https://issues.apache.org/jira/browse/HBASE-6067
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Eli Collins
Assignee: Zhihong Ted Yu
 Fix For: 0.96.0, 0.94.1, 0.92.3

 Attachments: 6067-v2.txt, 6067.txt



[jira] [Commented] (HBASE-6067) HBase won't start when hbase.rootdir uses ViewFileSystem


[ 
https://issues.apache.org/jira/browse/HBASE-6067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288248#comment-13288248
 ] 

Zhihong Ted Yu commented on HBASE-6067:
---

Integrated to 0.94
Will wait for release of 0.92.2 before integrating to 0.92 branch.

 HBase won't start when hbase.rootdir uses ViewFileSystem
 

 Key: HBASE-6067
 URL: https://issues.apache.org/jira/browse/HBASE-6067
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Eli Collins
Assignee: Zhihong Ted Yu
 Fix For: 0.96.0, 0.94.1, 0.92.3

 Attachments: 6067-v2.txt, 6067.txt



[jira] [Assigned] (HBASE-6138) HadoopQA not running findbugs [Trunk]


 [ 
https://issues.apache.org/jira/browse/HBASE-6138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu reassigned HBASE-6138:
-

Assignee: Jesse Yates  (was: Anoop Sam John)

 HadoopQA not running findbugs [Trunk]
 -

 Key: HBASE-6138
 URL: https://issues.apache.org/jira/browse/HBASE-6138
 Project: HBase
  Issue Type: Bug
  Components: build
Affects Versions: 0.96.0
Reporter: Anoop Sam John
Assignee: Jesse Yates
 Fix For: 0.96.0

 Attachments: 6138-addendum.txt, 6138.txt



[jira] [Resolved] (HBASE-6138) HadoopQA not running findbugs [Trunk]


 [ 
https://issues.apache.org/jira/browse/HBASE-6138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu resolved HBASE-6138.
---

  Resolution: Fixed
Hadoop Flags: Reviewed

From QA report of HBASE-5416:
{code}
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2095//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2095//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
{code}

 HadoopQA not running findbugs [Trunk]
 -

 Key: HBASE-6138
 URL: https://issues.apache.org/jira/browse/HBASE-6138
 Project: HBase
  Issue Type: Bug
  Components: build
Affects Versions: 0.96.0
Reporter: Anoop Sam John
Assignee: Jesse Yates
 Fix For: 0.96.0

 Attachments: 6138-addendum.txt, 6138.txt



[jira] [Commented] (HBASE-5974) Scanner retry behavior with RPC timeout on next() seems incorrect


[ 
https://issues.apache.org/jira/browse/HBASE-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288318#comment-13288318
 ] 

Zhihong Ted Yu commented on HBASE-5974:
---

w.r.t. potential change to RegionScanner, if users create wrapper(s), the 
maintenance of seqNo would still be completed by core implementation. See the 
following:
{code}
@InterfaceAudience.Private
public interface RegionScanner extends InternalScanner {
{code}

 Scanner retry behavior with RPC timeout on next() seems incorrect
 -

 Key: HBASE-5974
 URL: https://issues.apache.org/jira/browse/HBASE-5974
 Project: HBase
  Issue Type: Bug
  Components: client, regionserver
Affects Versions: 0.90.7, 0.92.1, 0.94.0, 0.96.0
Reporter: Todd Lipcon
Assignee: Anoop Sam John
Priority: Critical
 Fix For: 0.94.1

 Attachments: 5974_94-V4.patch, HBASE-5974_0.94.patch, 
 HBASE-5974_94-V2.patch, HBASE-5974_94-V3.patch



[jira] [Comment Edited] (HBASE-5974) Scanner retry behavior with RPC timeout on next() seems incorrect


[ 
https://issues.apache.org/jira/browse/HBASE-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288318#comment-13288318
 ] 

Zhihong Ted Yu edited comment on HBASE-5974 at 6/4/12 4:18 AM:
---

w.r.t. potential change to RegionScanner, if users create wrapper(s), the 
maintenance of seqNo would still be completed by core implementation.
See the following in patch:
{code}
+  public Result[] next(final long scannerId, int nbRows) throws IOException {
+    return next(scannerId, nbRows, -1);
+  }
{code}
and the following in current code base:
{code}
@InterfaceAudience.Private
public interface RegionScanner extends InternalScanner {
{code}

  was (Author: zhi...@ebaysf.com):
w.r.t. potential change to RegionScanner, if users create wrapper(s), the 
maintenance of seqNo would still be completed by core implementation. See the 
following:
{code}
@InterfaceAudience.Private
public interface RegionScanner extends InternalScanner {
{code}
  
 Scanner retry behavior with RPC timeout on next() seems incorrect
 -

 Key: HBASE-5974
 URL: https://issues.apache.org/jira/browse/HBASE-5974
 Project: HBase
  Issue Type: Bug
  Components: client, regionserver
Affects Versions: 0.90.7, 0.92.1, 0.94.0, 0.96.0
Reporter: Todd Lipcon
Assignee: Anoop Sam John
Priority: Critical
 Fix For: 0.94.1

 Attachments: 5974_94-V4.patch, HBASE-5974_0.94.patch, 
 HBASE-5974_94-V2.patch, HBASE-5974_94-V3.patch



[jira] [Commented] (HBASE-5699) Run with > 1 WAL in HRegionServer


[ 
https://issues.apache.org/jira/browse/HBASE-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288592#comment-13288592
 ] 

Zhihong Ted Yu commented on HBASE-5699:
---

Can you run YCSB with a 50% insert and 50% update load?
Performance numbers in attachment match what I got based on my implementation.

Thanks

 Run with > 1 WAL in HRegionServer
 -

 Key: HBASE-5699
 URL: https://issues.apache.org/jira/browse/HBASE-5699
 Project: HBase
  Issue Type: Improvement
Reporter: binlijin
Assignee: Li Pi
 Attachments: PerfHbase.txt





[jira] [Commented] (HBASE-6067) HBase won't start when hbase.rootdir uses ViewFileSystem


[ 
https://issues.apache.org/jira/browse/HBASE-6067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288606#comment-13288606
 ] 

Zhihong Ted Yu commented on HBASE-6067:
---

@Daryn:
Looking at:
http://docs.oracle.com/javase/1.4.2/docs/api/java/lang/Class.html#getDeclaredMethod%28java.lang.String,%20java.lang.Class[]%29
I don't see why getDeclaredMethod wouldn't find the desired method from this.fs

w.r.t. setAccessible(), I didn't include it in patch v1 for this reason.
Then I found that getGetNumCurrentReplicas() was using the call. And:
{code}
public int getNumCurrentReplicas() throws IOException {
./src/hdfs/org/apache/hadoop/hdfs/DFSClient.java
{code}
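
The lookup behaviour in question can be checked with a small standalone
sketch. The classes below are hypothetical stand-ins (they only borrow the
method name getNumCurrentReplicas); the point is that getDeclaredMethod sees
non-public methods declared directly on the class but not inherited ones,
while getMethod sees public methods including inherited ones.

```java
import java.lang.reflect.Method;

/** Illustrative sketch of getDeclaredMethod vs getMethod; not HBase code. */
final class DeclaredMethodSketch {

    public static class BaseStream {
        public String describe() {
            return "base";
        }
    }

    /** Hypothetical stream class with a non-public method. */
    public static class WrappedStream extends BaseStream {
        private int getNumCurrentReplicas() {
            return 3;
        }
    }

    /** True if the method is declared directly on c (any visibility). */
    static boolean declares(Class<?> c, String name) {
        try {
            c.getDeclaredMethod(name);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    /** True if c exposes a public method of that name, inherited or not. */
    static boolean exposes(Class<?> c, String name) {
        try {
            c.getMethod(name);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    /** Invokes the non-public method reflectively. */
    static int replicasOf(Object stream) {
        try {
            Method m = stream.getClass().getDeclaredMethod("getNumCurrentReplicas");
            m.setAccessible(true); // needed in general for non-public members
            return (Integer) m.invoke(stream);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(declares(WrappedStream.class, "getNumCurrentReplicas"));
        System.out.println(declares(WrappedStream.class, "describe"));
        System.out.println(exposes(WrappedStream.class, "describe"));
        System.out.println(replicasOf(new WrappedStream()));
    }
}
```

So getDeclaredMethod finds the method only if it is declared on the runtime
class of the object itself, which is the case being discussed for this.fs.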

 HBase won't start when hbase.rootdir uses ViewFileSystem
 

 Key: HBASE-6067
 URL: https://issues.apache.org/jira/browse/HBASE-6067
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Eli Collins
Assignee: Zhihong Ted Yu
 Fix For: 0.96.0, 0.94.1, 0.92.3

 Attachments: 6067-v2.txt, 6067.txt



[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

[
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288608#comment-13288608
]

Zhihong Ted Yu commented on HBASE-6060:
---

Agreed.
We can integrate once patch for 0.92 is ready.

Regions's in OPENING state from failed regionservers takes a long time to
recover
-

Key: HBASE-6060
URL: https://issues.apache.org/jira/browse/HBASE-6060
Project: HBase
Issue Type: Bug
Components: master, regionserver
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Fix For: 0.96.0, 0.94.1, 0.92.3

Attachments: 6060-94-v3.patch, 6060-94-v4.patch, 6060-94-v4_1.patch,
6060-94-v4_1.patch, 6060-trunk.patch, 6060-trunk.patch, 6060-trunk_2.patch,
HBASE-6060-94.patch

We have seen a pattern in tests where regions are stuck in the OPENING state
for a very long time when the region server that is opening the region fails.
My understanding of the process:

- The master calls the rs to open the region. If the rs is offline, a new
plan is generated (a new rs is chosen). RegionState is set to PENDING_OPEN
(only in master memory; zk still shows OFFLINE). See
HRegionServer.openRegion(), HMaster.assign()
- The RegionServer starts opening a region and changes the state in the
znode. But that znode is not ephemeral. (see ZkAssign)
- The rs transitions the zk node from OFFLINE to OPENING. See
OpenRegionHandler.process()
- The rs then opens the region and changes the znode from OPENING to OPENED
- When the rs is killed between the OPENING and OPENED states, zk shows the
OPENING state, and the master just waits for the rs to change the region
state, but since the rs is down, that won't happen.
- There is an AssignmentManager.TimeoutMonitor, which guards against exactly
these kinds of conditions. It periodically checks (every 10 sec by default)
the regions in transition to see whether they have timed out
(hbase.master.assignment.timeoutmonitor.timeout). The default timeout is
30 min, which explains what you and I are seeing.
- ServerShutdownHandler in Master does not reassign regions in OPENING
state, although it handles other states.
Lowering that threshold in the configuration is one option, but I still
think we can do better.
Will investigate more.
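
The periodic check described in the list above can be sketched roughly as
follows. This is a simplified illustration, not the AssignmentManager code:
TimeoutMonitorSketch and its methods are hypothetical names, time is passed
in explicitly, and the sketch assumes a region leaves the transition map as
soon as it reaches OPENED.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative sketch of a regions-in-transition timeout check. */
final class TimeoutMonitorSketch {

    enum State { OFFLINE, PENDING_OPEN, OPENING, OPENED }

    /** State of one region plus the time it entered that state. */
    private static final class Transition {
        final State state;
        final long sinceMillis;

        Transition(State state, long sinceMillis) {
            this.state = state;
            this.sinceMillis = sinceMillis;
        }
    }

    private final Map<String, Transition> regionsInTransition =
        new ConcurrentHashMap<>();
    private final long timeoutMillis;

    TimeoutMonitorSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Records a state change; OPENED regions are no longer in transition. */
    void transition(String region, State state, long nowMillis) {
        if (state == State.OPENED) {
            regionsInTransition.remove(region);
        } else {
            regionsInTransition.put(region, new Transition(state, nowMillis));
        }
    }

    /** One pass of the periodic check: regions stuck longer than the timeout. */
    List<String> findTimedOut(long nowMillis) {
        List<String> stuck = new ArrayList<>();
        for (Map.Entry<String, Transition> e : regionsInTransition.entrySet()) {
            if (nowMillis - e.getValue().sinceMillis > timeoutMillis) {
                stuck.add(e.getKey());
            }
        }
        Collections.sort(stuck); // deterministic order for callers
        return stuck;
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        TimeoutMonitorSketch monitor = new TimeoutMonitorSketch(30 * minute);
        monitor.transition("regionA", State.OPENING, 0L);
        monitor.transition("regionB", State.OPENING, 25 * minute);
        System.out.println(monitor.findTimedOut(31 * minute));
    }
}
```

With a 30 min timeout, a region whose server died right after the OPENING
transition is noticed only on the first check after 30 min, which matches
the delay described above.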

[jira] [Commented] (HBASE-6067) HBase won't start when hbase.rootdir uses ViewFileSystem


[ 
https://issues.apache.org/jira/browse/HBASE-6067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288636#comment-13288636
 ] 

Zhihong Ted Yu commented on HBASE-6067:
---

@Stack:
Do you think this JIRA should be in the 0.92.2 RC?

 HBase won't start when hbase.rootdir uses ViewFileSystem
 

 Key: HBASE-6067
 URL: https://issues.apache.org/jira/browse/HBASE-6067
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Eli Collins
Assignee: Zhihong Ted Yu
 Fix For: 0.96.0, 0.94.1, 0.92.3

 Attachments: 6067-v2.txt, 6067.txt



[jira] [Commented] (HBASE-5974) Scanner retry behavior with RPC timeout on next() seems incorrect


[ 
https://issues.apache.org/jira/browse/HBASE-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288644#comment-13288644
 ] 

Zhihong Ted Yu commented on HBASE-5974:
---

I would listen to Todd and Andy's opinion.

 Scanner retry behavior with RPC timeout on next() seems incorrect
 -

 Key: HBASE-5974
 URL: https://issues.apache.org/jira/browse/HBASE-5974
 Project: HBase
  Issue Type: Bug
  Components: client, regionserver
Affects Versions: 0.90.7, 0.92.1, 0.94.0, 0.96.0
Reporter: Todd Lipcon
Assignee: Anoop Sam John
Priority: Critical
 Fix For: 0.96.0, 0.94.1

 Attachments: 5974_94-V4.patch, 5974_trunk.patch, 
 HBASE-5974_0.94.patch, HBASE-5974_94-V2.patch, HBASE-5974_94-V3.patch



[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96


[ 
https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288733#comment-13288733
 ] 

Zhihong Ted Yu commented on HBASE-6055:
---

bq. The HLog will have edits from regions not relevant to the table's regions.
Over in HBASE-5699, each one of the multiple WALs can be devised to receive 
edits from a single table.

 Snapshots in HBase 0.96
 ---

 Key: HBASE-6055
 URL: https://issues.apache.org/jira/browse/HBASE-6055
 Project: HBase
  Issue Type: New Feature
  Components: client, master, regionserver, zookeeper
Reporter: Jesse Yates
Assignee: Jesse Yates
 Fix For: 0.96.0

 Attachments: Snapshots in HBase.docx


 Continuation of HBASE-50 for the current trunk. Since the implementation has 
 drastically changed, opening as a new ticket.


[jira] [Updated] (HBASE-6067) HBase won't start when hbase.rootdir uses ViewFileSystem


 [ 
https://issues.apache.org/jira/browse/HBASE-6067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6067:
--

Attachment: 6067-addendum.txt

 HBase won't start when hbase.rootdir uses ViewFileSystem
 

 Key: HBASE-6067
 URL: https://issues.apache.org/jira/browse/HBASE-6067
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Eli Collins
Assignee: Zhihong Ted Yu
 Fix For: 0.96.0, 0.94.1, 0.92.3

 Attachments: 6067-addendum.txt, 6067-v2.txt, 6067.txt



[jira] [Commented] (HBASE-6067) HBase won't start when hbase.rootdir uses ViewFileSystem


[ 
https://issues.apache.org/jira/browse/HBASE-6067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288740#comment-13288740
 ] 

Zhihong Ted Yu commented on HBASE-6067:
---

Addendum integrated to 0.94 and trunk.

Thanks for the tip, Daryn.

 HBase won't start when hbase.rootdir uses ViewFileSystem
 

 Key: HBASE-6067
 URL: https://issues.apache.org/jira/browse/HBASE-6067
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Eli Collins
Assignee: Zhihong Ted Yu
 Fix For: 0.96.0, 0.94.1, 0.92.3

 Attachments: 6067-addendum.txt, 6067-v2.txt, 6067.txt



[jira] [Commented] (HBASE-6046) Master retry on ZK session expiry causes inconsistent region assignments.


[ 
https://issues.apache.org/jira/browse/HBASE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288756#comment-13288756
 ] 

Zhihong Ted Yu commented on HBASE-6046:
---

bq. New ServerManager is not created
I think there is a typo above: 'not' -> 'now'

 Master retry on ZK session expiry causes inconsistent region assignments.
 -

 Key: HBASE-6046
 URL: https://issues.apache.org/jira/browse/HBASE-6046
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.92.1, 0.94.0
Reporter: Gopinathan A
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE_6046-trunk.patch, HBASE_6046_0.94.patch, 
 HBASE_6046_0.94_1.patch, HBASE_6046_0.94_2.patch, HBASE_6046_0.94_3.patch


 1. A ZK session timeout in the HMaster leads to bulk assignment even though all the 
 RSs are online.
 2. While doing bulk assignment, if the master goes down again and restarts (or a 
 backup comes up), all the nodes created in ZK will be reassigned 
 to the new RSs. This leads to double assignment.
 We had 2800 regions; of these, 1900 regions got double assignment, taking the 
 region count to 4700.


[jira] [Updated] (HBASE-6131) Add attribution for code added by HBASE-5533 metrics


 [ 
https://issues.apache.org/jira/browse/HBASE-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6131:
--

Fix Version/s: 0.96.0
   0.92.2

 Add attribution for code added by HBASE-5533 metrics
 

 Key: HBASE-6131
 URL: https://issues.apache.org/jira/browse/HBASE-6131
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: 6131.txt, 6131_092.txt, 6131_094.txt


 See the comment over in 
 https://issues.apache.org/jira/browse/HBASE-5533?focusedCommentId=13283920&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13283920
 The metrics histogram code was copied w/o attribution.  Fix.


[jira] [Assigned] (HBASE-6158) Data loss if the words 'merges' or 'splits' are used as Column Family name


 [ 
https://issues.apache.org/jira/browse/HBASE-6158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu reassigned HBASE-6158:
-

Assignee: Aditya Kishore

 Data loss if the words 'merges' or 'splits' are used as Column Family name
 --

 Key: HBASE-6158
 URL: https://issues.apache.org/jira/browse/HBASE-6158
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Affects Versions: 0.94.0
Reporter: Aditya Kishore
Assignee: Aditya Kishore
 Attachments: HBASE-6158.patch


 If a table is created with either 'merges' or 'splits' as one of its column 
 family names, it can never be flushed to disk even though the table 
 creation (and data population) succeeds.
 The reason is that these two names are used as temporary directory names 
 inside the region folder for merges and splits respectively, and hence conflict 
 with the directories created for a CF with the same name.
 A simple fix would be to use '.merges' and '.splits' as the working folders 
 (patch attached). This would also be consistent with other work folder names. 
 An alternate fix would be to declare these words (and other similar ones) 
 reserved words and throw an exception when they are used. However, I find the 
 alternate approach unnecessarily restrictive.
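
To illustrate why the dot-prefixed names avoid the clash, here is a minimal sketch. It assumes that column family names may not begin with a period (stated here as an assumption, not taken from the issue text); RegionDirNames and its methods are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class RegionDirNames {
    // Working-directory names inside the region folder, before and after the proposed fix.
    static final Set<String> OLD_WORK_DIRS =
        new HashSet<String>(Arrays.asList("merges", "splits"));
    static final Set<String> NEW_WORK_DIRS =
        new HashSet<String>(Arrays.asList(".merges", ".splits"));

    // Assumption: a family name is rejected if it is empty or starts with '.'.
    static boolean isLegalFamilyName(String family) {
        return !family.isEmpty() && family.charAt(0) != '.';
    }

    // A family directory collides when it has the same name as a working directory.
    static boolean collides(String family, Set<String> workDirs) {
        return workDirs.contains(family);
    }
}
```

Under that assumption, no legal family name can equal '.merges' or '.splits', so no reserved-word list is needed.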


[jira] [Commented] (HBASE-6158) Data loss if the words 'merges' or 'splits' are used as Column Family name


[ 
https://issues.apache.org/jira/browse/HBASE-6158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288769#comment-13288769
 ] 

Zhihong Ted Yu commented on HBASE-6158:
---

Can you generate patch for trunk ?
{code}
-  static final String MERGEDIR = "merges";
+  static final String MERGEDIR = ".merges";
{code}
The above constant is only used in HRegion. We can make it private, right ?

 Data loss if the words 'merges' or 'splits' are used as Column Family name
 --

 Key: HBASE-6158
 URL: https://issues.apache.org/jira/browse/HBASE-6158
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Affects Versions: 0.94.0
Reporter: Aditya Kishore
Assignee: Aditya Kishore
 Fix For: 0.96.0, 0.94.1

 Attachments: HBASE-6158.patch



[jira] [Updated] (HBASE-6158) Data loss if the words 'merges' or 'splits' are used as Column Family name


 [ 
https://issues.apache.org/jira/browse/HBASE-6158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6158:
--

Fix Version/s: 0.94.1
   0.96.0

 Data loss if the words 'merges' or 'splits' are used as Column Family name
 --

 Key: HBASE-6158
 URL: https://issues.apache.org/jira/browse/HBASE-6158
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Affects Versions: 0.94.0
Reporter: Aditya Kishore
Assignee: Aditya Kishore
 Fix For: 0.96.0, 0.94.1

 Attachments: HBASE-6158.patch



[jira] [Updated] (HBASE-6158) Data loss if the words 'merges' or 'splits' are used as Column Family name


 [ 
https://issues.apache.org/jira/browse/HBASE-6158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6158:
--

Hadoop Flags: Reviewed
  Status: Patch Available  (was: Open)

 Data loss if the words 'merges' or 'splits' are used as Column Family name
 --

 Key: HBASE-6158
 URL: https://issues.apache.org/jira/browse/HBASE-6158
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Affects Versions: 0.94.0
Reporter: Aditya Kishore
Assignee: Aditya Kishore
 Fix For: 0.96.0, 0.94.1

 Attachments: HBASE-6158_94.patch, HBASE-6158_trunk.patch



[jira] [Updated] (HBASE-6067) HBase won't start when hbase.rootdir uses ViewFileSystem


 [ 
https://issues.apache.org/jira/browse/HBASE-6067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6067:
--

Fix Version/s: (was: 0.92.3)
   0.92.2

Integrated to 0.92 branch.

 HBase won't start when hbase.rootdir uses ViewFileSystem
 

 Key: HBASE-6067
 URL: https://issues.apache.org/jira/browse/HBASE-6067
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Eli Collins
Assignee: Zhihong Ted Yu
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: 6067-addendum.txt, 6067-v2.txt, 6067.txt



[jira] [Updated] (HBASE-6067) HBase won't start when hbase.rootdir uses ViewFileSystem


 [ 
https://issues.apache.org/jira/browse/HBASE-6067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6067:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 HBase won't start when hbase.rootdir uses ViewFileSystem
 

 Key: HBASE-6067
 URL: https://issues.apache.org/jira/browse/HBASE-6067
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Eli Collins
Assignee: Zhihong Ted Yu
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: 6067-addendum.txt, 6067-v2.txt, 6067.txt



[jira] [Commented] (HBASE-6160) META entries from daughters can be deleted before parent entries


[ 
https://issues.apache.org/jira/browse/HBASE-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288892#comment-13288892
 ] 

Zhihong Ted Yu commented on HBASE-6160:
---

@Enis:
Can you attach log snippets to show the problem ?
e.g. what was the duration between the two splits.

 META entries from daughters can be deleted before parent entries
 

 Key: HBASE-6160
 URL: https://issues.apache.org/jira/browse/HBASE-6160
 Project: HBase
  Issue Type: Bug
  Components: client, regionserver
Affects Versions: 0.92.2, 0.94.0, 0.96.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar

 HBASE-5986 fixed an issue where the client sees the META entry for the 
 parent, but not the children. However, after the fix, we have seen the 
 following issue in tests: 
 Region A is split to -> B, C
 Region B is split to -> D, E
 After some time, the META entry for B is deleted since it is not needed anymore, 
 but the META entry for Region A stays in META (C still refers to it). In this case, 
 the client throws RegionOfflineException for B.
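
The ordering constraint implied by this scenario can be sketched as follows: a split parent's META entry is removed only when its daughters no longer reference it and its own parent's entry is already gone. This is a toy model under stated assumptions, not the attached patch; MetaCleanSketch and all of its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class MetaCleanSketch {
    private final Map<String, String[]> daughtersOf = new HashMap<String, String[]>();
    private final Map<String, String> parentOf = new HashMap<String, String>();
    private final Set<String> inMeta = new HashSet<String>();
    private final Set<String> stillReferencesParent = new HashSet<String>();

    void addRegion(String region) { inMeta.add(region); }

    // Records a split: both daughters get META entries and initially reference the parent.
    void split(String parent, String d1, String d2) {
        daughtersOf.put(parent, new String[] {d1, d2});
        for (String d : new String[] {d1, d2}) {
            parentOf.put(d, parent);
            inMeta.add(d);
            stillReferencesParent.add(d);
        }
    }

    void referencesDropped(String daughter) { stillReferencesParent.remove(daughter); }

    // Tries to remove a split parent's META entry; returns true only if it was removed.
    boolean clean(String region) {
        String[] daughters = daughtersOf.get(region);
        if (daughters == null || !inMeta.contains(region)) {
            return false; // not a split parent, or already cleaned
        }
        for (String d : daughters) {
            if (stillReferencesParent.contains(d)) {
                return false; // a daughter still needs this region
            }
        }
        String parent = parentOf.get(region);
        if (parent != null && inMeta.contains(parent)) {
            return false; // ordering rule: never clean a daughter before its parent
        }
        inMeta.remove(region);
        return true;
    }

    boolean hasMetaEntry(String region) { return inMeta.contains(region); }
}
```

With A split to B, C and B split to D, E, the entry for B survives for as long as the entry for A does, which is the situation the client needs.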


[jira] [Comment Edited] (HBASE-6160) META entries from daughters can be deleted before parent entries

[
https://issues.apache.org/jira/browse/HBASE-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288892#comment-13288892
]

Zhihong Ted Yu edited comment on HBASE-6160 at 6/4/12 8:47 PM:
---

@Enis:
Can you attach log snippets to show the problem ?
e.g. what was the interval between the two splits.

was (Author: zhi...@ebaysf.com):
@Enis:
Can you attach log snippets to show the problem ?
e.g. what was the duration between the two splits.

META entries from daughters can be deleted before parent entries

Key: HBASE-6160
URL: https://issues.apache.org/jira/browse/HBASE-6160
Project: HBase
Issue Type: Bug
Components: client, regionserver
Affects Versions: 0.92.2, 0.94.0, 0.96.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar


[jira] [Comment Edited] (HBASE-3271) Allow .META. table to be exported


[ 
https://issues.apache.org/jira/browse/HBASE-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935184#comment-12935184
 ] 

Zhihong Ted Yu edited comment on HBASE-3271 at 6/4/12 11:32 PM:


I used this code:
{code}
if (keys == null || keys.getFirst() == null ||
    keys.getFirst().length == 0) {
  HRegionLocation regLoc =
      table.getRegionLocation(HConstants.EMPTY_BYTE_ARRAY);
  if (null == regLoc)
    throw new IOException("Expecting at least one region.");
  List<InputSplit> splits = new ArrayList<InputSplit>(1);
  InputSplit split = new TableSplit(table.getTableName(),
      HConstants.EMPTY_BYTE_ARRAY,
      HConstants.EMPTY_BYTE_ARRAY,
      regLoc.getServerAddress().getHostname());
  splits.add(split);
  return splits;
}
{code}
The following command only exports rows in .META. which have 'packageindex' 
(refer to HBASE-3255):
bin/hbase org.apache.hadoop.hbase.mapreduce.Export .META. h-meta 1 0 0 
packageindex

-rwxrwxrwx 1 hadoop users 90700 Nov 24 03:31 h-meta/part-m-0

  was (Author: yuzhih...@gmail.com):
I used this code:
if (keys == null || keys.getFirst() == null ||
    keys.getFirst().length == 0) {
  HRegionLocation regLoc =
      table.getRegionLocation(HConstants.EMPTY_BYTE_ARRAY);
  if (null == regLoc)
    throw new IOException("Expecting at least one region.");
  List<InputSplit> splits = new ArrayList<InputSplit>(1);
  InputSplit split = new TableSplit(table.getTableName(),
      HConstants.EMPTY_BYTE_ARRAY,
      HConstants.EMPTY_BYTE_ARRAY,
      regLoc.getServerAddress().getHostname());
  splits.add(split);
  return splits;
}

The following command only exports rows in .META. which have 'packageindex' 
(refer to HBASE-3255):
bin/hbase org.apache.hadoop.hbase.mapreduce.Export .META. h-meta 1 0 0 
packageindex

-rwxrwxrwx 1 hadoop users 90700 Nov 24 03:31 h-meta/part-m-0
  
 Allow .META. table to be exported
 -

 Key: HBASE-3271
 URL: https://issues.apache.org/jira/browse/HBASE-3271
 Project: HBase
  Issue Type: Improvement
  Components: util
Affects Versions: 0.20.6
Reporter: Ted Yu

 I tried to export .META. table in 0.20.6 and got:
 [hadoop@us01-ciqps1-name01 hbase]$ bin/hbase 
 org.apache.hadoop.hbase.mapreduce.Export .META. h-meta 1 0 0
 10/11/23 20:59:05 INFO jvm.JvmMetrics: Initializing JVM Metrics with 
 processName=JobTracker, sessionId=
 2010-11-23 20:59:05.255::INFO:  Logging to STDERR via 
 org.mortbay.log.StdErrLog
 2010-11-23 20:59:05.255::INFO:  verisons=1, starttime=0, 
 endtime=9223372036854775807
 10/11/23 20:59:05 INFO zookeeper.ZooKeeper: Client 
 environment:zookeeper.version=3.2.2-888565, built on 12/08/2009 21:51 GMT
 10/11/23 20:59:05 INFO zookeeper.ZooKeeper: Client 
 environment:host.name=us01-ciqps1-name01.carrieriq.com
 10/11/23 20:59:05 INFO zookeeper.ZooKeeper: Client 
 environment:java.version=1.6.0_21
 10/11/23 20:59:05 INFO zookeeper.ZooKeeper: Client 
 environment:java.vendor=Sun Microsystems Inc.
 ...
 10/11/23 20:59:05 INFO zookeeper.ClientCnxn: Server connection successful
 10/11/23 20:59:05 DEBUG zookeeper.ZooKeeperWrapper: Read ZNode 
 /hbase/root-region-server got 10.202.50.112:60020
 10/11/23 20:59:05 DEBUG client.HConnectionManager$TableServers: Found ROOT at 
 10.202.50.112:60020
 10/11/23 20:59:05 DEBUG client.HConnectionManager$TableServers: Cached 
 location for .META.,,1 is us01-ciqps1-grid02.carrieriq.com:60020
 Exception in thread "main" java.io.IOException: Expecting at least one region.
 at 
 org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:281)
 at 
 org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
 at 
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
 at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
 at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
 at org.apache.hadoop.hbase.mapreduce.Export.main(Export.java:146)
 Related code is:
 if (keys == null || keys.getFirst() == null ||
 keys.getFirst().length == 0) {
   throw new IOException("Expecting at least one region.");
 }
 My intention was to save the dangling rows in .META. (for future 
 investigation) which prevented a table from being created.


[jira] [Commented] (HBASE-6160) META entries from daughters can be deleted before parent entries


[ 
https://issues.apache.org/jira/browse/HBASE-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289131#comment-13289131
 ] 

Zhihong Ted Yu commented on HBASE-6160:
---

Minor comments:
{code}
+   * Scans META ane returns a pair of number of scanned rows, and
{code}
'ane' -> 'and'
I think 'a pair of' is not needed above.
{code}
+//we could not clean the parent, so it's daughters should not be 
cleaned as well (HBASE-6160)
{code}
'as well' -> 'either'


 META entries from daughters can be deleted before parent entries
 

 Key: HBASE-6160
 URL: https://issues.apache.org/jira/browse/HBASE-6160
 Project: HBase
  Issue Type: Bug
  Components: client, regionserver
Affects Versions: 0.92.2, 0.94.0, 0.96.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Attachments: HBASE-6160_v1.patch



[jira] [Commented] (HBASE-6147) SSH and AM.joinCluster leads to region assignment inconsistency in many cases.


[ 
https://issues.apache.org/jira/browse/HBASE-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289455#comment-13289455
 ] 

Zhihong Ted Yu commented on HBASE-6147:
---

Nice start.
{code}
+  Thread.sleep(100);
+  waitedTimeForMasterInitialized += 100;
{code}
We don't know how long sleep() call may actually have taken. Better maintain 
timing ourselves.
{code}
+  Thread.currentThread().interrupt();
+  throw new IOException("Interrupted", e);
{code}
InterruptedIOException should be created above.
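
A minimal sketch of both suggestions, not the patch itself: elapsed time is measured against a deadline instead of summing nominal sleep intervals, and an interrupt is surfaced as InterruptedIOException with the original exception as its cause. WaitSketch and waitFor are hypothetical names, and the sketch uses Java 8 syntax for brevity.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.function.BooleanSupplier;

final class WaitSketch {
    /**
     * Polls until the condition holds or the timeout elapses.
     * Returns true if the condition became true, false on timeout.
     */
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs)
            throws IOException {
        // Track real elapsed time; sleep() may take longer than requested.
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                // Restore the interrupt flag, then report the interruption as an I/O-level error.
                Thread.currentThread().interrupt();
                InterruptedIOException iioe =
                    new InterruptedIOException("Interrupted while waiting");
                iioe.initCause(e);
                throw iioe;
            }
        }
        return true;
    }
}
```

InterruptedIOException is a subclass of IOException, so callers that already handle IOException keep working while others can distinguish the interrupt.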

 SSH and AM.joinCluster leads to region assignment inconsistency in many cases.
 --

 Key: HBASE-6147
 URL: https://issues.apache.org/jira/browse/HBASE-6147
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
 Fix For: 0.92.3

 Attachments: HBASE-6147.patch


 We are facing a few issues when a master restart and SSH (ServerShutdownHandler) run in parallel.
 Chunhui also suggested that we need to rework this part. This JIRA is 
 aimed at covering all such cases of region assignment inconsistency.


[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover


[ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289503#comment-13289503
 ] 

Zhihong Ted Yu commented on HBASE-6060:
---

{code}
+  if (newPlan) {
+this.regionPlans.remove(randomPlan.getRegionName());
+LOG
+.info("Server shutdown handler already in progress for the region "
++ randomPlan.getRegionName());
+randomPlan = RegionPlan.REGION_PLAN_ALREADY_INUSE;
{code}
It would be confusing to label a new plan 'ALREADY_INUSE'.
{code}
+  // the following singleton signifies that the plan is not usable
+  static final RegionPlan REGION_PLAN_ALREADY_INUSE = new RegionPlan(null, 
null, null);
{code}
I think UNUSABLE_REGION_PLAN would be a better name.

 Regions's in OPENING state from failed regionservers takes a long time to 
 recover
 -

 Key: HBASE-6060
 URL: https://issues.apache.org/jira/browse/HBASE-6060
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.96.0, 0.94.1, 0.92.3

 Attachments: 6060-94-v3.patch, 6060-94-v4.patch, 6060-94-v4_1.patch, 
 6060-94-v4_1.patch, 6060-trunk.patch, 6060-trunk.patch, 6060-trunk_2.patch, 
 6060-trunk_3.patch, HBASE-6060-92.patch, HBASE-6060-94.patch


 We have seen a pattern in tests where regions are stuck in the OPENING state 
 for a very long time when the region server that is opening the region fails. 
 My understanding of the process: 
  
  - The master calls the rs to open the region. If the rs is offline, a new plan is 
 generated (a new rs is chosen). RegionState is set to PENDING_OPEN (only in 
 master memory, zk still shows OFFLINE). See HRegionServer.openRegion(), 
 HMaster.assign()
  - The RegionServer starts opening a region and changes the state in the znode. But 
 that znode is not ephemeral. (see ZkAssign)
  - The rs transitions the zk node from OFFLINE to OPENING. See 
 OpenRegionHandler.process()
  - The rs then opens the region, and changes the znode from OPENING to OPENED
  - When the rs is killed between the OPENING and OPENED states, zk shows the OPENING 
 state, and the master just waits for the rs to change the region state, but since 
 the rs is down, that won't happen. 
  - There is an AssignmentManager.TimeoutMonitor, which guards against exactly 
 these kinds of conditions. It periodically checks (every 10 sec by 
 default) the regions in transition to see whether they have timed out 
 (hbase.master.assignment.timeoutmonitor.timeout). The default timeout is 30 min, 
 which explains what you and I are seeing. 
  - ServerShutdownHandler in the Master does not reassign regions in OPENING 
 state, although it handles other states. 
 Lowering that threshold in the configuration is one option, but I still 
 think we can do better. 
 Will investigate more.


[jira] [Updated] (HBASE-6162) Move KeyValue to hbase-common module


 [ 
https://issues.apache.org/jira/browse/HBASE-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6162:
--

Fix Version/s: 0.96.0
   Status: Patch Available  (was: Open)

 Move KeyValue to hbase-common module
 

 Key: HBASE-6162
 URL: https://issues.apache.org/jira/browse/HBASE-6162
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.96.0
Reporter: Matt Corgan
Assignee: Matt Corgan
 Fix For: 0.96.0

 Attachments: HBASE-6162-v1.patch


 * pull KeyValue up to hbase-common module
 This is part of the modularization strategy in HBASE-5977, and is 
 specifically necessary to modularize HBASE-4676.
 also brings these classes to hbase-common:
 * ClassSize, HeapSize
 * HTestConst
 * TestKeyValue, KeyValueTestUtil
 * LoadTestKVGenerator, TestLoadTestKVGenerator
 * MD5Hash
 moves a trivial constant (HRegionInfo.DELIMITER) from HRegionInfo to 
 HConstants


[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.


[ 
https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289703#comment-13289703
 ] 

Zhihong Ted Yu commented on HBASE-5924:
---

Nice work.
{code}
private static class Process<R> {
{code}
Can we name the above class more meaningfully ? javadoc is desirable.
{code}
   * @param sleepTime - sleep time befora actually executing the actions. 
Can be zero.
{code}
'befora' -> 'before'
{code}
   for (Action<R> aTodo : actionsList) {
{code}
aTodo -> anAction ?
{code}
  Callable<MultiResponse> callable = createDelayedCallable(sleepTime, 
e.getKey(), e.getValue());
  Triple<MultiAction<R>, HRegionLocation, Future<MultiResponse>> p =
new Triple<MultiAction<R>, HRegionLocation, 
Future<MultiResponse>>(e.getValue(), e.getKey(), this.pool.submit(callable));
{code}
Wrap the two long lines above.
{code}
  throw new IllegalArgumentException(
"argument results must be the same size as argument list");
{code}
It would be nice to include the sizes in exception message.


 In the client code, don't wait for all the requests to be executed before 
 resubmitting a request in error.
 --

 Key: HBASE-5924
 URL: https://issues.apache.org/jira/browse/HBASE-5924
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 5924.v5.patch


 The client (in the function HConnectionManager#processBatchCallback) works in 
 two steps:
  - make the requests
  - collect the failures and successes and prepare for retry
 It means that when there is an immediate error (region moved, split, dead 
 server, ...) we still wait for all the initial requests to be executed before 
 submitting again the failed request. If we have a scenario with all the 
 requests taking 5 seconds we have a final execution time of: 5 (initial 
 requests) + 1 (wait time) + 5 (final request) = 11s.
 We could improve this by analyzing the results immediately. For the scenario 
 mentioned above, this would bring the total down to 6 seconds. 
 So we could see a performance improvement of nearly 50% in many cases, and 
 much more than 50% if the request execution times differ.
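
The arithmetic above can be written out as a small model. This is a sketch under stated assumptions: all durations are in seconds, the failing request is assumed to report its error at time failAt (0 for an immediate error), and RetryTimingSketch with its method names is hypothetical.

```java
final class RetryTimingSketch {
    private static long slowest(long[] durations) {
        long max = 0;
        for (long d : durations) {
            max = Math.max(max, d);
        }
        return max;
    }

    // Current behavior: wait for every initial request, then back off, then retry.
    static long waitForAllThenRetry(long[] initial, long backoff, long retry) {
        return slowest(initial) + backoff + retry;
    }

    // Proposed behavior: start the backoff and retry as soon as the failure is seen,
    // while the other initial requests are still running.
    static long retryImmediately(long[] initial, long failAt, long backoff, long retry) {
        return Math.max(slowest(initial), failAt + backoff + retry);
    }
}
```

With initial requests of 5 seconds, a 1 second wait and a 5 second retry, the first method gives 5 + 1 + 5 = 11 and the second gives max(5, 0 + 1 + 5) = 6, matching the figures in the description.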


[jira] [Updated] (HBASE-5816) Balancer and ServerShutdownHandler concurrently reassign the same region


 [ 
https://issues.apache.org/jira/browse/HBASE-5816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-5816:
--

Description: 
The first assign thread exits with success after updating the RegionState to 
PENDING_OPEN, while the second assign follows immediately into assign and 
fails the RegionState check in setOfflineInZooKeeper(). This causes the master 
to abort.
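
One general way to keep two callers from both proceeding with the same region is an atomic state transition, so that the second caller learns it lost the race and can back off rather than fail a later check. The sketch below only illustrates that idea and is not the approach taken in this issue's patch; AssignGuardSketch, its State enum and its methods are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class AssignGuardSketch {
    enum State { OFFLINE, PENDING_OPEN }

    private final ConcurrentMap<String, State> states =
        new ConcurrentHashMap<String, State>();

    void markOffline(String region) {
        states.put(region, State.OFFLINE);
    }

    // Atomically moves OFFLINE -> PENDING_OPEN; only one concurrent caller gets true.
    boolean tryBeginAssign(String region) {
        return states.replace(region, State.OFFLINE, State.PENDING_OPEN);
    }
}
```

A caller that receives false knows another assign is already in flight for that region.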

In the below case, the two concurrent assigns occurred when AM tried to assign 
a region to a dying/dead RS, and meanwhile the ShutdownServerHandler tried to 
assign this region (from the region plan) spontaneously.
{code}
2012-04-17 05:44:57,648 INFO org.apache.hadoop.hbase.master.HMaster: balance 
hri=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., 
src=hadoop05.sh.intel.com,60020,1334544902186, 
dest=xmlqa-clv16.sh.intel.com,60020,1334612497253
2012-04-17 05:44:57,648 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Starting unassignment of region 
TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. 
(offlining)
2012-04-17 05:44:57,648 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Sent CLOSE to serverName=hadoop05.sh.intel.com,60020,1334544902186, 
load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region 
TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.
2012-04-17 05:44:57,666 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling new unassigned node: 
/hbase/unassigned/fe38fe31caf40b6e607a3e6bbed6404b 
(region=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., 
server=hadoop05.sh.intel.com,60020,1334544902186, state=RS_ZK_REGION_CLOSING)
2012-04-17 05:52:58,984 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Forcing OFFLINE; 
was=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. 
state=CLOSED, ts=1334612697672, server=hadoop05.sh.intel.com,60020,1334544902186
2012-04-17 05:52:58,984 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:6-0x236b912e9b3000e Creating (or updating) unassigned node for 
fe38fe31caf40b6e607a3e6bbed6404b with OFFLINE state
2012-04-17 05:52:59,096 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Using pre-existing plan for region 
TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.; 
plan=hri=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., 
src=hadoop05.sh.intel.com,60020,1334544902186, 
dest=xmlqa-clv16.sh.intel.com,60020,1334612497253
2012-04-17 05:52:59,096 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Assigning region 
TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. to 
xmlqa-clv16.sh.intel.com,60020,1334612497253
2012-04-17 05:54:19,159 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Forcing OFFLINE; 
was=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. 
state=PENDING_OPEN, ts=1334613179096, 
server=xmlqa-clv16.sh.intel.com,60020,1334612497253
2012-04-17 05:54:59,033 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
Failed assignment of 
TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. to 
serverName=xmlqa-clv16.sh.intel.com,60020,1334612497253, load=(requests=0, 
regions=0, usedHeap=0, maxHeap=0), trying to assign elsewhere instead; retry=0
java.net.SocketTimeoutException: Call to /10.239.47.87:60020 failed on socket 
timeout exception: java.net.SocketTimeoutException: 12 millis timeout while 
waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/10.239.47.89:41302 
remote=/10.239.47.87:60020]
at 
org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:805)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:778)
at 
org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:283)
at $Proxy7.openRegion(Unknown Source)
at 
org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:573)
at 
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1127)
at 
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:912)
at 
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:892)
at 
org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:92)
at 
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:162)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.SocketTimeoutException: 12 millis timeout while waiting 
for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/10.239.47.89:41302

[jira] [Commented] (HBASE-6164) Correct the bug in block encoding usage in bulkload


[ 
https://issues.apache.org/jira/browse/HBASE-6164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290189#comment-13290189
 ] 

Zhihong Ted Yu commented on HBASE-6164:
---

Patch looks good.
{code}
-  // Save data block encoder metadata in the file info.
-  dataBlockEncoder.saveMetadata(this);
{code}
Why is the above method lifted out of StoreFile ?

 Correct the bug in block encoding usage in bulkload
 ---

 Key: HBASE-6164
 URL: https://issues.apache.org/jira/browse/HBASE-6164
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.96.0, 0.94.1
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 0.96.0, 0.94.1

 Attachments: 6164_94.patch, 6164_Trunk.patch


 Address the issue raised under HBASE-6040
 https://issues.apache.org/jira/browse/HBASE-6040?focusedCommentId=13289334&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13289334

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6012) Handling RegionOpeningState for bulk assign since SSH using


[ 
https://issues.apache.org/jira/browse/HBASE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290254#comment-13290254
 ] 

Zhihong Ted Yu commented on HBASE-6012:
---

{code}
+  List<RegionOpeningState> regionOpeningStateList = this.serverManager
+  .sendRegionOpen(destination, regions);
+  for (int i = 0; i < regionOpeningStateList.size(); i++) {
{code}
Should we check whether the return from sendRegionOpen() is null ?
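
To make the question concrete, here is a minimal sketch of a null-tolerant loop over the returned states. The enum values, the class name and the `regionsNeedingFollowUp` helper are stand-ins invented for illustration, not code from the patch; the sketch only assumes that a null or short list should be read as "no answer for that region".

```java
import java.util.ArrayList;
import java.util.List;

final class BulkOpenNullCheckSketch {
  // Stand-in for the RegionOpeningState type used in the patch; values are illustrative.
  enum RegionOpeningState { OPENED, ALREADY_OPENED, FAILED_OPENING }

  /**
   * Returns the indexes of regions that still need attention. A null list, or
   * a list shorter than regionCount, is treated as "no answer" for the
   * affected regions instead of causing a NullPointerException.
   */
  static List<Integer> regionsNeedingFollowUp(List<RegionOpeningState> states, int regionCount) {
    List<Integer> pending = new ArrayList<>();
    for (int i = 0; i < regionCount; i++) {
      RegionOpeningState s = (states != null && i < states.size()) ? states.get(i) : null;
      if (s != RegionOpeningState.OPENED) {
        pending.add(i);
      }
    }
    return pending;
  }
}
```

Whether a null return should mean "retry everything" or "fail fast" is a design decision for the patch author; the sketch only shows that the loop can be written so that it does not dereference null.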

In ServerShutdownHandler.java:
{code}
+if(rit != null){
+  //clean zk node
+  try{
+    ZKAssign.deleteNodeFailSilent(services.getZooKeeper(), e.getKey());
{code}
Log statement should be added that reveals the value of rit.


 Handling RegionOpeningState for bulk assign since SSH using
 ---

 Key: HBASE-6012
 URL: https://issues.apache.org/jira/browse/HBASE-6012
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0

 Attachments: HBASE-6012.patch, HBASE-6012v2.patch, HBASE-6012v3.patch


 Since HBASE-5914, we use bulk assign for SSH.
 But in the bulk assign case, if we get an ALREADY_OPENED response there is no 
 one to clear the znode created by bulk assign. 
 Also, when an RS is opening a list of regions, if one region is already in 
 transition, it throws RegionAlreadyInTransitionException and stops opening 
 the other regions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5360) [uberhbck] Add options for how to handle offline split parents.


[ 
https://issues.apache.org/jira/browse/HBASE-5360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290300#comment-13290300
 ] 

Zhihong Ted Yu commented on HBASE-5360:
---

{code}
-  Path referencePath = getReferredToFile(p);
+  Path referencePath = StoreFileUtil.getReferredToFile(p);
{code}
Can you outline the rationale behind moving methods to StoreFileUtil ?
For sidelineSplitParent():
{code}
+if (needReassign) {
+  toBeReassigned.add(child);
+}
{code}
The above code is inside inner loop. Would a child be added to toBeReassigned 
multiple times ?
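
One way to rule out duplicates regardless of how often the inner loop flags a child is to collect into a set. The sketch below is only an illustration: the class name, the `collectChildren` helper and the use of `String` for a child region are assumptions, not code from the patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ReassignDedupSketch {
  /**
   * Collects each child at most once, even when several inner-loop passes
   * flag the same child. LinkedHashSet keeps first-seen order.
   */
  static List<String> collectChildren(List<List<String>> flaggedPerPass) {
    Set<String> toBeReassigned = new LinkedHashSet<>();
    for (List<String> pass : flaggedPerPass) {
      for (String child : pass) {
        toBeReassigned.add(child); // no-op if the child is already present
      }
    }
    return new ArrayList<>(toBeReassigned);
  }
}
```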
{code}
+  // scenario (2)
+  errors.reportError(ERROR_CODE.FAILED_SPLIT_PARENT, "Region "
+    + descriptiveName + ", key=" + key + ", on HDFS, failed split parent");
+  if (shouldFixSplitParents()) {
+    resetSplitParent(hbi);
{code}
Do we need to do something about the children in above scenario ?
I think we should provide two flags to user corresponding to the two scenarios 
so that they can decide which scenario(s) to fix.
{code}
+   * Daughters do refer to parent.
+   */
+  @Test
+  public void testLingeringSplitParent2() throws Exception {
{code}
Please give the two test cases meaningful names that are consistent with 
javadoc.

 [uberhbck] Add options for how to handle offline split parents. 
 

 Key: HBASE-5360
 URL: https://issues.apache.org/jira/browse/HBASE-5360
 Project: HBase
  Issue Type: Improvement
  Components: hbck
Affects Versions: 0.90.7, 0.92.1, 0.94.0
Reporter: Jonathan Hsieh
Assignee: Jimmy Xiang
 Attachments: hbase-5360.path


 In a recent case, we attempted to repair a cluster that suffered from 
 HBASE-4238 and had about 6-7 generations of leftover split data.  The hbck 
 repair options in a development version of HBASE-5128 treat HDFS as ground 
 truth but didn't check the SPLIT and OFFLINE flags found only in meta.  The net 
 effect was that it essentially attempted to merge many regions back into their 
 eldest generation's parent's range.  
 More safeguards to prevent mega-merges are being added in HBASE-5128.
 This issue would automate the handling of the mega-merge avoiding cases 
 such as lingering grandparents.  The strategy here would be to add more 
 checks against .META., and perform part of the catalog janitor's 
 responsibilities for lingering grandparents.  This would potentially include 
 options to sideline regions, delete grandparent regions, set a min size for 
 sidelining, and mechanisms for cleaning .META..  
 Note: There already exists a mechanism to reload these regions -- the bulk 
 load mechanisms in LoadIncrementalHFiles can be used to re-add grandparents 
 (automatically splitting them if necessary) to HBase.


[jira] [Commented] (HBASE-6162) Move KeyValue to hbase-common module


[ 
https://issues.apache.org/jira/browse/HBASE-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290305#comment-13290305
 ] 

Zhihong Ted Yu commented on HBASE-6162:
---

Hadoop QA didn't run any test.
See https://builds.apache.org/job/PreCommit-HBASE-Build/2108/console.

 Move KeyValue to hbase-common module
 

 Key: HBASE-6162
 URL: https://issues.apache.org/jira/browse/HBASE-6162
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.96.0
Reporter: Matt Corgan
Assignee: Matt Corgan
 Fix For: 0.96.0

 Attachments: HBASE-6162-v1.patch


 * pull KeyValue up to hbase-common module
 This is part of the modularization strategy in HBASE-5977, and is 
 specifically necessary to modularize HBASE-4676.
 also brings these classes to hbase-common:
 * ClassSize, HeapSize
 * HTestConst
 * TestKeyValue, KeyValueTestUtil
 * LoadTestKVGenerator, TestLoadTestKVGenerator
 * MD5Hash
 moves a trivial constant (HRegionInfo.DELIMITER) from HRegionInfo to 
 HConstants


[jira] [Commented] (HBASE-6153) RS aborted due to rename problem (maybe a race)


[ 
https://issues.apache.org/jira/browse/HBASE-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290367#comment-13290367
 ] 

Zhihong Ted Yu commented on HBASE-6153:
---

ip-10-68-7-146.ec2.internal went down:
{code}
2012-05-31 18:34:42,541 FATAL 
org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
ip-10-68-7-146.ec2.internal,60020,1338343120038: Replay of HLog required. 
Forcing server shutdown
{code}
The above lagged the other log snippets by 3 hours.
More log around 05-31 15:11 from ip-10-68-7-146.ec2.internal should help 
clarify.

 RS aborted due to rename problem (maybe a race)
 ---

 Key: HBASE-6153
 URL: https://issues.apache.org/jira/browse/HBASE-6153
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Devaraj Das
Assignee: Devaraj Das

 I had a RS crash with the following:
 2012-05-31 18:34:42,534 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 Renaming flushed file at 
 hdfs://ip-10-140-14-134.ec2.internal:8020/apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/.tmp/294a7a31f04949b8bf07682a43157b35
  to 
 hdfs://ip-10-140-14-134.ec2.internal:8020/apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/f1/294a7a31f04949b8bf07682a43157b35
 2012-05-31 18:34:42,536 WARN org.apache.hadoop.hbase.regionserver.Store: 
 Unable to rename 
 hdfs://ip-10-140-14-134.ec2.internal:8020/apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/.tmp/294a7a31f04949b8bf07682a43157b35
  to 
 hdfs://ip-10-140-14-134.ec2.internal:8020/apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/f1/294a7a31f04949b8bf07682a43157b35
 2012-05-31 18:34:42,541 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 ip-10-68-7-146.ec2.internal,60020,1338343120038: Replay of HLog required. 
 Forcing server shutdown
 org.apache.hadoop.hbase.DroppedSnapshotException: region: 
 TestLoadAndVerify_1338488017181,\x15\xD9\x01\x00\x00\x00\x00\x00/87_0,1338491364569.8974506aa04c5a04e5cc23c11de0039d.
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1288)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1172)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1114)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:400)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:374)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:243)
 at java.lang.Thread.run(Thread.java:662)
 Caused by: java.io.FileNotFoundException: File does not exist: 
 /apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/f1/294a7a31f04949b8bf07682a43157b35
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1901)
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1892)
 at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:636)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154)
 at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427)
 at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:387)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFile$Reader.<init>(StoreFile.java:1008)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:470)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:548)
 at 
 org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:595)
 On the NameNode logs:
 2012-05-31 18:34:42,588 WARN org.apache.hadoop.hdfs.StateChange: DIR* 
 FSDirectory.unprotectedRenameTo: failed to rename 
 /apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/.tmp/294a7a31f04949b8bf07682a43157b35
  to 
 /apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/f1/294a7a31f04949b8bf07682a43157b35
  because destination's parent does not exist
 I haven't looked deeply yet but I guess it is a race of some sort.


[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.


[ 
https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290483#comment-13290483
 ] 

Zhihong Ted Yu commented on HBASE-5924:
---

{code}
+  // We need the origin multi action to find out what are the actions to replay if
{code}
'origin' -> 'original', 'what are the actions to replay' -> 'what actions to 
replay'
{code}
+} catch (InterruptedException e) {
+  throw new IOException(e);
{code}
InterruptedIOException should be thrown.
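
A minimal sketch of that translation follows. The `BlockingStep` interface and the helper name are hypothetical stand-ins for the blocking call in the patch; restoring the thread's interrupt flag before rethrowing is an added choice that the comment above does not ask for.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

final class InterruptTranslationSketch {
  /** Hypothetical blocking step standing in for the wait in the patch. */
  interface BlockingStep {
    void run() throws InterruptedException;
  }

  /** Runs the step, surfacing an interrupt as InterruptedIOException. */
  static void runTranslatingInterrupt(BlockingStep step) throws IOException {
    try {
      step.run();
    } catch (InterruptedException e) {
      // Restore the interrupt flag so callers further up can still observe it.
      Thread.currentThread().interrupt();
      InterruptedIOException iioe = new InterruptedIOException(e.getMessage());
      iioe.initCause(e);
      throw iioe;
    }
  }
}
```

Because InterruptedIOException is a subclass of IOException, callers that only catch IOException keep working, while callers that care about interruption can tell the two apart.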
{code}
+  // mutate list so that it is empty for complete success, or contains
+  // only failed records results are returned in the same order as the
+  // requests in list walk the list backwards, so we can remove from list
{code}
The above is hard to read. Should there be a period between 'records' and 
'results', and another between 'list' and 'walk'?

Hadoop QA didn't run tests:
https://builds.apache.org/job/PreCommit-HBASE-Build/2116/console

 In the client code, don't wait for all the requests to be executed before 
 resubmitting a request in error.
 --

 Key: HBASE-5924
 URL: https://issues.apache.org/jira/browse/HBASE-5924
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 5924.v5.patch, 5924.v9.patch


 The client (in the function HConnectionManager#processBatchCallback) works in 
 two steps:
  - make the requests
  - collect the failures and successes and prepare for retry
 It means that when there is an immediate error (region moved, split, dead 
 server, ...) we still wait for all the initial requests to be executed before 
 submitting again the failed request. If we have a scenario with all the 
 requests taking 5 seconds we have a final execution time of: 5 (initial 
 requests) + 1 (wait time) + 5 (final request) = 11s.
 We could improve this by analyzing immediately the results. This would lead 
 us, for the scenario mentioned above, to 6 seconds. 
 So we could have a performance improvement of nearly 50% in many cases, and 
 much more than 50% if the request execution time is different.
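
The arithmetic in the scenario above can be checked with a small sketch. The class and method names are invented for illustration, and the model assumes a single failed request whose error is known at time `failedAt`.

```java
final class RetryLatencySketch {
  /** Total time when the retry waits for every initial request to finish. */
  static int waitForAll(int slowestInitial, int pause, int retry) {
    return slowestInitial + pause + retry;
  }

  /**
   * Total time when a request whose failure is known at failedAt is retried
   * right after the pause, in parallel with the remaining initial requests.
   */
  static int retryEarly(int slowestInitial, int failedAt, int pause, int retry) {
    return Math.max(slowestInitial, failedAt + pause + retry);
  }

  public static void main(String[] args) {
    System.out.println(waitForAll(5, 1, 5));    // 5 + 1 + 5
    System.out.println(retryEarly(5, 0, 1, 5)); // max(5, 0 + 1 + 5)
  }
}
```

With the numbers from the description (5 s requests, 1 s wait, immediate failure) the two totals come out as 11 s and 6 s, a saving of 5/11, or about 45%.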


[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.


[ 
https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290509#comment-13290509
 ] 

Zhihong Ted Yu commented on HBASE-5924:
---

hbase-server/src/main/java/org/apache/hadoop/hbase/util/Triple.java was not 
included in patch v9.
Hence:
{code}
[ERROR] 
/home/hduser/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java:[2061,23]
 cannot find symbol
[ERROR] symbol  : class Triple
[ERROR] location: class 
org.apache.hadoop.hbase.client.HConnectionManager.HConnectionImplementation.ProcessR
{code}

 In the client code, don't wait for all the requests to be executed before 
 resubmitting a request in error.
 --

 Key: HBASE-5924
 URL: https://issues.apache.org/jira/browse/HBASE-5924
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 5924.v5.patch, 5924.v9.patch


 The client (in the function HConnectionManager#processBatchCallback) works in 
 two steps:
  - make the requests
  - collect the failures and successes and prepare for retry
 It means that when there is an immediate error (region moved, split, dead 
 server, ...) we still wait for all the initial requests to be executed before 
 submitting again the failed request. If we have a scenario with all the 
 requests taking 5 seconds we have a final execution time of: 5 (initial 
 requests) + 1 (wait time) + 5 (final request) = 11s.
 We could improve this by analyzing immediately the results. This would lead 
 us, for the scenario mentioned above, to 6 seconds. 
 So we could have a performance improvement of nearly 50% in many cases, and 
 much more than 50% if the request execution time is different.


[jira] [Commented] (HBASE-6162) Move KeyValue to hbase-common module


[ 
https://issues.apache.org/jira/browse/HBASE-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290525#comment-13290525
 ] 

Zhihong Ted Yu commented on HBASE-6162:
---

The patch produces a lot of compilation errors:
{code}
[ERROR] 
/home/hduser/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestProcessBasedCluster.java:[29,30]
 cannot find symbol
[ERROR] symbol  : class HTestConst
[ERROR] location: package org.apache.hadoop.hbase
[ERROR] 
[ERROR] 
/home/hduser/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/MultiThreadedWriter.java:[121,11]
 cannot find symbol
[ERROR] symbol  : variable LoadTestKVGenerator
[ERROR] location: class org.apache.hadoop.hbase.util.MultiThreadedWriter
[ERROR] 
[ERROR] 
/home/hduser/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/MultiThreadedWriter.java:[128,58]
 cannot find symbol
[ERROR] symbol  : class LoadTestKVGenerator
[ERROR] location: class 
org.apache.hadoop.hbase.util.MultiThreadedWriter.HBaseWriterThread
[ERROR] 
[ERROR] 
/home/hduser/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/MultiThreadedWriter.java:[191,24]
 cannot find symbol
[ERROR] symbol  : variable LoadTestKVGenerator
[ERROR] location: class 
org.apache.hadoop.hbase.util.MultiThreadedWriter.HBaseWriterThread
[ERROR] 
[ERROR] 
/home/hduser/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/MultiThreadedReader.java:[226,10]
 cannot find symbol
[ERROR] symbol  : variable LoadTestKVGenerator
[ERROR] location: class 
org.apache.hadoop.hbase.util.MultiThreadedReader.HBaseReaderThread
[ERROR] 
[ERROR] 
/home/hduser/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/MultiThreadedReader.java:[282,17]
 cannot find symbol
[ERROR] symbol  : variable LoadTestKVGenerator
[ERROR] location: class 
org.apache.hadoop.hbase.util.MultiThreadedReader.HBaseReaderThread
[ERROR] 
[ERROR] 
/home/hduser/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestColumnRangeFilter.java:[192,24]
 cannot find symbol
[ERROR] symbol  : variable KeyValueTestUtil
[ERROR] location: class org.apache.hadoop.hbase.filter.TestColumnRangeFilter
[ERROR] 
[ERROR] 
/home/hduser/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreScanner.java:[80,8]
 cannot find symbol
[ERROR] symbol  : variable KeyValueTestUtil
[ERROR] location: class org.apache.hadoop.hbase.regionserver.TestStoreScanner
{code}

 Move KeyValue to hbase-common module
 

 Key: HBASE-6162
 URL: https://issues.apache.org/jira/browse/HBASE-6162
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.96.0
Reporter: Matt Corgan
Assignee: Matt Corgan
 Fix For: 0.96.0

 Attachments: HBASE-6162-v1.patch


 * pull KeyValue up to hbase-common module
 This is part of the modularization strategy in HBASE-5977, and is 
 specifically necessary to modularize HBASE-4676.
 also brings these classes to hbase-common:
 * ClassSize, HeapSize
 * HTestConst
 * TestKeyValue, KeyValueTestUtil
 * LoadTestKVGenerator, TestLoadTestKVGenerator
 * MD5Hash
 moves a trivial constant (HRegionInfo.DELIMITER) from HRegionInfo to 
 HConstants


[jira] [Commented] (HBASE-6012) Handling RegionOpeningState for bulk assign since SSH using


[ 
https://issues.apache.org/jira/browse/HBASE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291122#comment-13291122
 ] 

Zhihong Ted Yu commented on HBASE-6012:
---

{code}
   LOG.info("Unable to communicate with the region server in order " +
     "to assign regions", e);
-  return false;
+  // Server may already get RPC
+  return true;
{code}
What was the reasoning behind the above change ?
{code}
-  try {
-if (!assign(e.getKey(), e.getValue())) {
-  failedPlans.put(e.getKey(), e.getValue());
-}
-  } catch (Throwable t) {
+  if (!assign(e.getKey(), e.getValue())) {
{code}
I think the catch clause should be kept.
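
To illustrate why the clause matters, here is a minimal sketch in which one throwing plan does not stop the remaining plans from being tried. The `Assigner` interface and the class name are hypothetical, and recording the plan as failed inside the catch block is an assumption; the original catch body is not shown in the diff.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class KeepCatchSketch {
  /** Hypothetical stand-in for assign(server, regions); may return false or throw. */
  interface Assigner<K, V> {
    boolean assign(K key, V value) throws Exception;
  }

  /** Tries every plan and returns the ones that failed or threw. */
  static <K, V> Map<K, V> assignAll(Map<K, V> plans, Assigner<K, V> assigner) {
    Map<K, V> failedPlans = new LinkedHashMap<>();
    for (Map.Entry<K, V> e : plans.entrySet()) {
      try {
        if (!assigner.assign(e.getKey(), e.getValue())) {
          failedPlans.put(e.getKey(), e.getValue());
        }
      } catch (Throwable t) {
        // Without this clause, one throwing plan would abort the remaining ones.
        failedPlans.put(e.getKey(), e.getValue());
      }
    }
    return failedPlans;
  }
}
```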

For HRegionServer.java, there are a lot of formatting changes which distract 
from reviewing.
{code}
+  } catch (RegionAlreadyInTransitionException rie) {
+    LOG.warn("", rie);
{code}
Please add a descriptive message to the log statement above.

 Handling RegionOpeningState for bulk assign since SSH using
 ---

 Key: HBASE-6012
 URL: https://issues.apache.org/jira/browse/HBASE-6012
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0

 Attachments: HBASE-6012.patch, HBASE-6012v2.patch, 
 HBASE-6012v3.patch, HBASE-6012v4.patch, HBASE-6012v5.patch


 Since HBASE-5914, we use bulk assign for SSH.
 But in the bulk assign case, if we get an ALREADY_OPENED response there is no 
 one to clear the znode created by bulk assign. 
 Also, when an RS is opening a list of regions, if one region is already in 
 transition, it throws RegionAlreadyInTransitionException and stops opening 
 the other regions.


[jira] [Commented] (HBASE-6147) SSH and AM.joinCluster leads to region assignment inconsistency in many cases.


[ 
https://issues.apache.org/jira/browse/HBASE-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291139#comment-13291139
 ] 

Zhihong Ted Yu commented on HBASE-6147:
---

In testing phase, an option may be introduced to enable the following:
{code}
+  waitTillMasterInitialized();
{code}
so that we can compare performance difference.

 SSH and AM.joinCluster leads to region assignment inconsistency in many cases.
 --

 Key: HBASE-6147
 URL: https://issues.apache.org/jira/browse/HBASE-6147
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
 Fix For: 0.92.3

 Attachments: HBASE-6147.patch, HBASE-6147_trunk.patch


 We are facing a few issues when master restart and SSH run in parallel.
 Chunhui also suggested that we need to rework this part.  This JIRA is 
 aimed at solving all such possibilities of region assignment inconsistency.


[jira] [Commented] (HBASE-5498) Secure Bulk Load


[ 
https://issues.apache.org/jira/browse/HBASE-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291182#comment-13291182
 ] 

Zhihong Ted Yu commented on HBASE-5498:
---

{code}
+   * @param familyPaths list of family names to store files adding
+   * or removing from this list will add or remove HFiles to be bulk loaded.
{code}
Add a period between files and adding. Capitalize 'a' of adding.
{code}
+for(Pair<byte[], String> el: familyPaths)
+    families.add(el.getFirst());
{code}
Add a space between 'for' and '(', and between 'el' and the colon. 
families.add() should be put on the same line as the for statement.
{code}
-class StoreFileScanner implements KeyValueScanner {
+public class StoreFileScanner implements KeyValueScanner {
{code}
I don't see StoreFileScanner accessed in AccessController. So the above change 
is not needed.
{code}
+  //TODO make this configurable
+  //two levels so it doesn't get deleted accidentally
+  //no sticky bit in Hadoop 1.0
+  private Path stagingDir = new Path("/tmp/hbase-staging");
{code}
I think the path should be configurable.
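
A minimal sketch of what "configurable" could look like, using java.util.Properties as a stand-in for the Hadoop Configuration object so that the example is self-contained. The class name and the property key are hypothetical, not names from the patch; only the default path comes from the quoted diff.

```java
import java.util.Properties;

final class StagingDirSketch {
  // Hypothetical property key, chosen only for this illustration.
  static final String STAGING_DIR_KEY = "hbase.bulkload.staging.dir";
  // Default taken from the hard-coded path in the patch.
  static final String DEFAULT_STAGING_DIR = "/tmp/hbase-staging";

  /** Returns the configured staging directory, or the default when unset. */
  static String stagingDir(Properties conf) {
    return conf.getProperty(STAGING_DIR_KEY, DEFAULT_STAGING_DIR);
  }
}
```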
{code}
+  private User getActiveUser() throws IOException {
+User user = RequestContext.getRequestUser();
+if (!RequestContext.isInRequestContext()) {
{code}
The if statement can be lifted above the assignment.
{code}
+public interface SecureBulkLoadProtocol extends CoprocessorProtocol {
{code}
Add javadoc for the protocol.

 Secure Bulk Load
 

 Key: HBASE-5498
 URL: https://issues.apache.org/jira/browse/HBASE-5498
 Project: HBase
  Issue Type: Improvement
  Components: mapred, security
Reporter: Francis Liu
 Attachments: HBASE-5498_draft.patch


 Design doc: 
 https://cwiki.apache.org/confluence/display/HCATALOG/HBase+Secure+Bulk+Load
 Short summary:
 Security as it stands does not cover the bulkLoadHFiles() feature. Users 
 calling this method will bypass ACLs. Also loading is made more cumbersome in 
 a secure setting because of hdfs privileges. bulkLoadHFiles() moves the data 
 from user's directory to the hbase directory, which would require certain 
 write access privileges to be set.
 Our solution is to create a coprocessor which makes use of AuthManager to 
 verify if a user has write access to the table. If so, it launches an MR job as 
 the hbase user to do the importing (i.e. rewrite from text to hfiles). One 
 tricky part is that this job will have to impersonate the calling user when 
 reading the input files. We can do this by expecting the user to pass an hdfs 
 delegation token as part of the secureBulkLoad() coprocessor call and extending 
 an inputformat to make use of that token. The output is written to a 
 temporary directory accessible only by hbase and then bulkLoadHFiles() is 
 called.


[jira] [Assigned] (HBASE-5498) Secure Bulk Load

[
https://issues.apache.org/jira/browse/HBASE-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Zhihong Ted Yu reassigned HBASE-5498:
-

Assignee: Francis Liu

Secure Bulk Load

Key: HBASE-5498
URL: https://issues.apache.org/jira/browse/HBASE-5498
Project: HBase
Issue Type: Improvement
Components: mapred, security
Reporter: Francis Liu
Assignee: Francis Liu
Attachments: HBASE-5498_draft.patch

Design doc:
https://cwiki.apache.org/confluence/display/HCATALOG/HBase+Secure+Bulk+Load
Short summary:
Security as it stands does not cover the bulkLoadHFiles() feature. Users
calling this method will bypass ACLs. Also loading is made more cumbersome in
a secure setting because of hdfs privileges. bulkLoadHFiles() moves the data
from user's directory to the hbase directory, which would require certain
write access privileges to be set.
Our solution is to create a coprocessor which makes use of AuthManager to
verify if a user has write access to the table. If so, it launches an MR job as
the hbase user to do the importing (i.e. rewrite from text to hfiles). One
tricky part is that this job will have to impersonate the calling user when
reading the input files. We can do this by expecting the user to pass an hdfs
delegation token as part of the secureBulkLoad() coprocessor call and extending
an inputformat to make use of that token. The output is written to a
temporary directory accessible only by hbase and then bulkLoadHFiles() is
called.

[jira] [Updated] (HBASE-5498) Secure Bulk Load

[
https://issues.apache.org/jira/browse/HBASE-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Zhihong Ted Yu updated HBASE-5498:
--

Fix Version/s: 0.96.0

Secure Bulk Load

Key: HBASE-5498
URL: https://issues.apache.org/jira/browse/HBASE-5498
Project: HBase
Issue Type: Improvement
Components: mapred, security
Reporter: Francis Liu
Assignee: Francis Liu
Fix For: 0.96.0

Attachments: HBASE-5498_draft.patch

[jira] [Commented] (HBASE-5498) Secure Bulk Load

[
https://issues.apache.org/jira/browse/HBASE-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291273#comment-13291273
]

Zhihong Ted Yu commented on HBASE-5498:
---

What happens if user continues using LoadIncrementalHFiles directly ?

Secure Bulk Load

Key: HBASE-5498
URL: https://issues.apache.org/jira/browse/HBASE-5498
Project: HBase
Issue Type: Improvement
Components: mapred, security
Reporter: Francis Liu
Assignee: Francis Liu
Fix For: 0.96.0

Attachments: HBASE-5498_draft.patch

[jira] [Commented] (HBASE-5533) Add more metrics to HBase


[ 
https://issues.apache.org/jira/browse/HBASE-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291318#comment-13291318
 ] 

Zhihong Ted Yu commented on HBASE-5533:
---

I saw a lot of the following in test output 
(https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/3000/testReport/org.apache.hadoop.hbase.regionserver/TestSplitTransactionOnCluster/testShutdownFixupWhenDaughterHasSplit/):
{code}
2012-06-07 18:33:14,623 ERROR 
[RegionServer:0;juno.apache.org,39424,1339093992166] 
util.MetricsDynamicMBeanBase(116): unknown metrics type: 
org.apache.hadoop.hbase.metrics.histogram.MetricsHistogram
{code}

 Add more metrics to HBase
 -

 Key: HBASE-5533
 URL: https://issues.apache.org/jira/browse/HBASE-5533
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.2, 0.94.0
Reporter: Shaneal Manek
Assignee: Shaneal Manek
Priority: Minor
 Fix For: 0.92.2, 0.94.0, 0.96.0

 Attachments: BlockingQueueContention.java, HBASE-5533-0.92-v4.patch, 
 HBASE-5533-TRUNK-v6.patch, HBASE-5533-TRUNK-v6.patch, 
 HBASE-5533-v7-0.92.patch, TimingOverhead.java, hbase-5533-0.92.patch, 
 hbase5533-0.92-v2.patch, hbase5533-0.92-v3.patch, hbase5533-0.92-v5.patch, 
 histogram_web_ui.png


 To debug/monitor production clusters, there are some more metrics I wish I 
 had available.
 In particular:
 - Although the average FS latencies are useful, a 'histogram' of recent 
 latencies (90% of reads completed in under 100ms, 99% in under 200ms, etc) 
 would be more useful
 - Similar histograms of latencies on common operations (GET, PUT, DELETE) 
 would be useful
 - Counting the number of accesses to each region to detect hotspotting
 - Exposing the current number of HLog files
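
 The percentile readout described above ("90% of reads completed in under 
 100ms") can be illustrated with a minimal sketch. This is not the 
 MetricsHistogram class from the patch; it is a plain nearest-rank percentile 
 over a stored window of samples, and the class and method names are 
 hypothetical.

```java
import java.util.Arrays;

final class LatencyPercentileSketch {
  /**
   * Nearest-rank percentile: the smallest sample such that at least
   * p percent of all samples are less than or equal to it.
   * Hypothetical helper, not part of HBase.
   */
  static long percentile(long[] samplesMs, double p) {
    if (samplesMs.length == 0) {
      throw new IllegalArgumentException("no samples");
    }
    if (p <= 0 || p > 100) {
      throw new IllegalArgumentException("p must be in (0, 100]");
    }
    long[] sorted = samplesMs.clone();   // leave the caller's array untouched
    Arrays.sort(sorted);
    int rank = (int) Math.ceil(p / 100.0 * sorted.length);
    return sorted[rank - 1];
  }
}
```

 A production histogram would more likely sample or bucket the latencies 
 instead of sorting every recorded value; the sketch only shows what the 
 reported numbers mean.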

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5726) TestSplitTransactionOnCluster occasionally failing


[ 
https://issues.apache.org/jira/browse/HBASE-5726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291353#comment-13291353
 ] 

Zhihong Ted Yu commented on HBASE-5726:
---

From 
https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/3000/testReport/org.apache.hadoop.hbase.regionserver/TestSplitTransactionOnCluster/testShutdownFixupWhenDaughterHasSplit/:
{code}
2012-06-07 18:33:22,794 DEBUG [pool-1-thread-1-EventThread] 
zookeeper.ZKUtil(1142): master:49315-0x137c838bfa6 Retrieved 103 byte(s) of 
data from znode /hbase/unassigned/73830568ee93434ba97f7b5ade48ae30 and set 
watcher; region=ephemeral,,1339093997065.73830568ee93434ba97f7b5ade48ae30., 
state=RS_ZK_REGION_SPLITTING, servername=juno.apache.org,39424,1339093992166, 
createTime=1339094002792, payload.length=0
...
2012-06-07 18:33:47,887 DEBUG [Thread-941] 
regionserver.TestSplitTransactionOnCluster(482): Waiting on region to split
2012-06-07 18:33:47,922 DEBUG 
[RegionServer:8;juno.apache.org,43570,1339094025325-splits-1339094027483] 
regionserver.HRegion(463): Instantiated 
testMasterRestartAtRegionSplitPendingCatalogJanitor,,1339094027484.23694c0a5312f5801dfd5a2857cc3556.
2012-06-07 18:33:23,648 DEBUG 
[RegionServer:0;juno.apache.org,39424,1339093992166-splits-1339094002786] 
regionserver.HRegion(463): Instantiated 
ephemeral,mnk,1339094002786.b5c2d9c3e0939c583f874e3efd51b478.
2012-06-07 18:33:23,680 INFO  
[RegionServer:0;juno.apache.org,39424,1339093992166-splits-1339094002786] 
catalog.MetaEditor(191): Offlined parent region 
ephemeral,,1339093997065.73830568ee93434ba97f7b5ade48ae30. in META
{code}
We can see that region 73830568ee93434ba97f7b5ade48ae30 didn't finish splitting 
after the last 'Waiting on region to split' was printed.
In the split() method:
{code}
while (ProtobufUtil.getOnlineRegions(server).size() <= regionCount) {
  LOG.debug("Waiting on region to split");
{code}
I think the above loop should be improved: if a region is moved onto the server, 
the loop would exit, but the number of daughter regions wouldn't be 2.
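
One way to make the wait condition robust against unrelated region moves is to 
count the daughters of the region being split instead of comparing the total 
number of online regions. The sketch below is an illustration only, not the 
content of the attached patch; Region, countDaughters and waitForSplit are 
hypothetical names, and a real test would derive the parent/daughter 
relationship from HBase metadata.

```java
import java.util.List;
import java.util.function.Supplier;

final class SplitWaitSketch {
  /** Hypothetical stand-in for a region: its name and the parent it was split from (null if none). */
  static final class Region {
    final String name;
    final String parent;
    Region(String name, String parent) {
      this.name = name;
      this.parent = parent;
    }
  }

  /** Counts the online regions whose parent is the region being split. */
  static int countDaughters(List<Region> online, String parentName) {
    int n = 0;
    for (Region r : online) {
      if (parentName.equals(r.parent)) {
        n++;
      }
    }
    return n;
  }

  /**
   * Polls until both daughters are online or the timeout expires.
   * Returns true on success and false on timeout, so a region that was
   * merely moved onto the server cannot end the wait early.
   */
  static boolean waitForSplit(Supplier<List<Region>> onlineRegions, String parentName,
      long timeoutMs, long pollMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (countDaughters(onlineRegions.get(), parentName) < 2) {
      if (System.currentTimeMillis() >= deadline) {
        return false;
      }
      Thread.sleep(pollMs);
    }
    return true;
  }
}
```

The timeout is an addition of the sketch: it turns a hung split into a test 
failure instead of an endless loop.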

 TestSplitTransactionOnCluster occasionally failing
 --

 Key: HBASE-5726
 URL: https://issues.apache.org/jira/browse/HBASE-5726
 Project: HBase
  Issue Type: Bug
Reporter: Uma Maheswara Rao G
Priority: Critical
 Attachments: Hbase.log_testExistingZnodeBlocksSplitAndWeRollback  
 testShutdownFixupWhenDaughterHasSplit, 
 Hbase.log_testRSSplitEphemeralsDisappearButDaughtersAreOnlinedAfterShutdownHandling


 When I ran TestSplitTransactionOnCluster, tests sometimes failed.
 {quote}
 java.lang.AssertionError: expected:<1> but was:<0>
   at org.junit.Assert.fail(Assert.java:93)
   at org.junit.Assert.failNotEquals(Assert.java:647)
   at org.junit.Assert.assertEquals(Assert.java:128)
   at org.junit.Assert.assertEquals(Assert.java:472)
   at org.junit.Assert.assertEquals(Assert.java:456)
   at 
 org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.getAndCheckSingleTableRegion(TestSplitTransactionOnCluster.java:89)
   at 
 org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.testShutdownFixupWhenDaughterHasSplit(TestSplitTransactionOnCluster.java:298)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
   at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
   at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
   at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
   at 
 org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:62)
 {quote}
 Seems like the test is flaky; random other cases also fail.


[jira] [Updated] (HBASE-5726) TestSplitTransactionOnCluster occasionally failing


 [ 
https://issues.apache.org/jira/browse/HBASE-5726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-5726:
--

Attachment: 5726.txt

I ran TestSplitTransactionOnCluster#testShutdownFixupWhenDaughterHasSplit 5 
times with the patch; every run passed.

 TestSplitTransactionOnCluster occasionally failing
 --

 Key: HBASE-5726
 URL: https://issues.apache.org/jira/browse/HBASE-5726
 Project: HBase
  Issue Type: Bug
Reporter: Uma Maheswara Rao G
Priority: Critical
 Attachments: 5726.txt, 
 Hbase.log_testExistingZnodeBlocksSplitAndWeRollback  
 testShutdownFixupWhenDaughterHasSplit, 
 Hbase.log_testRSSplitEphemeralsDisappearButDaughtersAreOnlinedAfterShutdownHandling




[jira] [Updated] (HBASE-5726) TestSplitTransactionOnCluster occasionally failing


 [ 
https://issues.apache.org/jira/browse/HBASE-5726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-5726:
--

Status: Patch Available  (was: Open)

 TestSplitTransactionOnCluster occasionally failing
 --

 Key: HBASE-5726
 URL: https://issues.apache.org/jira/browse/HBASE-5726
 Project: HBase
  Issue Type: Bug
Reporter: Uma Maheswara Rao G
Priority: Critical
 Attachments: 5726.txt, 
 Hbase.log_testExistingZnodeBlocksSplitAndWeRollback  
 testShutdownFixupWhenDaughterHasSplit, 
 Hbase.log_testRSSplitEphemeralsDisappearButDaughtersAreOnlinedAfterShutdownHandling




[jira] [Updated] (HBASE-5726) TestSplitTransactionOnCluster occasionally failing


 [ 
https://issues.apache.org/jira/browse/HBASE-5726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-5726:
--

Fix Version/s: 0.96.0

 TestSplitTransactionOnCluster occasionally failing
 --

 Key: HBASE-5726
 URL: https://issues.apache.org/jira/browse/HBASE-5726
 Project: HBase
  Issue Type: Bug
Reporter: Uma Maheswara Rao G
Priority: Critical
 Fix For: 0.96.0

 Attachments: 5726.txt, 
 Hbase.log_testExistingZnodeBlocksSplitAndWeRollback  
 testShutdownFixupWhenDaughterHasSplit, 
 Hbase.log_testRSSplitEphemeralsDisappearButDaughtersAreOnlinedAfterShutdownHandling




[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.

[
https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291476#comment-13291476
]

Zhihong Ted Yu commented on HBASE-5924:
---

TestRegionServerCoprocessorExceptionWithAbort failed on QA machine.
Should investigate.

In the client code, don't wait for all the requests to be executed before
resubmitting a request in error.
--

Key: HBASE-5924
URL: https://issues.apache.org/jira/browse/HBASE-5924
Project: HBase
Issue Type: Improvement
Components: client
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Attachments: 5924.v11.patch, 5924.v5.patch, 5924.v9.patch

The client (in the function HConnectionManager#processBatchCallback) works in
two steps:
- make the requests
- collect the failures and successes and prepare for retry
This means that when there is an immediate error (region moved, split, dead
server, ...) we still wait for all the initial requests to complete before
resubmitting the failed request. In a scenario where every request takes
5 seconds, the total execution time is: 5 (initial requests) + 1 (wait time)
+ 5 (final request) = 11s.
We could improve this by analyzing the results as soon as they arrive. For the
scenario above, this would bring the total down to 6 seconds.
So we could see a performance improvement of nearly 50% in many cases, and
much more than 50% if request execution times differ.
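
The idea of handling each result as soon as it completes can be sketched with
the JDK's ExecutorCompletionService. This is an illustration under
assumptions, not the code of the attached patches: Action, Outcome and runAll
are hypothetical names, and the wait time before a retry is left out for
brevity.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;

final class EagerRetrySketch {
  /** Hypothetical unit of work: returns a value, or throws to signal a retryable failure. */
  interface Action {
    String call(int attempt) throws Exception;
  }

  /** Result of one attempt; error is null when the attempt succeeded. */
  private static final class Outcome {
    final int index;
    final int attempt;
    final String value;
    final Exception error;
    Outcome(int index, int attempt, String value, Exception error) {
      this.index = index;
      this.attempt = attempt;
      this.value = value;
      this.error = error;
    }
  }

  /** Wraps an action so that a failure is reported as data instead of an exception. */
  private static Callable<Outcome> wrap(List<Action> actions, int index, int attempt) {
    return () -> {
      try {
        return new Outcome(index, attempt, actions.get(index).call(attempt), null);
      } catch (Exception e) {
        return new Outcome(index, attempt, null, e);
      }
    };
  }

  /** Runs all actions, resubmitting a failed one as soon as its failure is seen. */
  static Map<Integer, String> runAll(List<Action> actions, int maxAttempts, ExecutorService pool)
      throws InterruptedException, ExecutionException {
    CompletionService<Outcome> completed = new ExecutorCompletionService<>(pool);
    for (int i = 0; i < actions.size(); i++) {
      completed.submit(wrap(actions, i, 1));
    }
    Map<Integer, String> results = new TreeMap<>();
    int pending = actions.size();
    while (pending > 0) {
      Outcome o = completed.take().get();   // next finished attempt, in completion order
      if (o.error == null) {
        results.put(o.index, o.value);
        pending--;
      } else if (o.attempt < maxAttempts) {
        // Retry immediately; the slower initial requests keep running meanwhile.
        completed.submit(wrap(actions, o.index, o.attempt + 1));
      } else {
        pending--;                          // out of attempts, give up on this action
      }
    }
    return results;
  }
}
```

Because take() returns attempts in completion order, a request that fails
fast is retried while the remaining initial requests are still in flight,
which is where the saving described above comes from.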

[jira] [Commented] (HBASE-5498) Secure Bulk Load

[
https://issues.apache.org/jira/browse/HBASE-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291474#comment-13291474
]

Zhihong Ted Yu commented on HBASE-5498:
---

bq. we should make LoadIncrementalHFiles use the secure api when security is
enabled?
Sure.

Secure Bulk Load

Key: HBASE-5498
URL: https://issues.apache.org/jira/browse/HBASE-5498
Project: HBase
Issue Type: Improvement
Components: mapred, security
Reporter: Francis Liu
Assignee: Francis Liu
Fix For: 0.96.0

Attachments: HBASE-5498_draft.patch

[jira] [Commented] (HBASE-5726) TestSplitTransactionOnCluster occasionally failing


[ 
https://issues.apache.org/jira/browse/HBASE-5726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291477#comment-13291477
 ] 

Zhihong Ted Yu commented on HBASE-5726:
---

Integrated to trunk.

Will resolve after at least 5 trunk builds where this test passes.

Thanks for the review, Stack.

 TestSplitTransactionOnCluster occasionally failing
 --

 Key: HBASE-5726
 URL: https://issues.apache.org/jira/browse/HBASE-5726
 Project: HBase
  Issue Type: Bug
Reporter: Uma Maheswara Rao G
Priority: Critical
 Fix For: 0.96.0

 Attachments: 5726.txt, 
 Hbase.log_testExistingZnodeBlocksSplitAndWeRollback  
 testShutdownFixupWhenDaughterHasSplit, 
 Hbase.log_testRSSplitEphemeralsDisappearButDaughtersAreOnlinedAfterShutdownHandling




[jira] [Updated] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.


 [ 
https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-5924:
--

Fix Version/s: 0.96.0
 Hadoop Flags: Reviewed

 In the client code, don't wait for all the requests to be executed before 
 resubmitting a request in error.
 --

 Key: HBASE-5924
 URL: https://issues.apache.org/jira/browse/HBASE-5924
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Fix For: 0.96.0

 Attachments: 5924.v11.patch, 5924.v14.patch, 5924.v5.patch, 
 5924.v9.patch




[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.


[ 
https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291657#comment-13291657
 ] 

Zhihong Ted Yu commented on HBASE-5924:
---

With RSTracker gone, the following flag is no longer checked:
{code}
-    public synchronized void nodeDeleted(String path) {
-      if (path.equals(rsNode)) {
-        regionZKNodeWasDeleted = true;
{code}
Can we keep the check?
{code}
-    assertTrue("RegionServer aborted on coprocessor exception, as expected.",
-        rsTracker.regionZKNodeWasDeleted);
{code}
I think this should be kept:
{code}
-table.close();
{code}
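
For readers without the test source at hand, the flag-and-assert pattern 
quoted above can be sketched as follows. This is a simplified stand-in, not 
the RSTracker class from the test: it has no ZooKeeper dependency, and the 
class and method names are hypothetical.

```java
final class NodeDeletionTrackerSketch {
  private final String watchedPath;
  private boolean nodeWasDeleted = false;

  NodeDeletionTrackerSketch(String watchedPath) {
    this.watchedPath = watchedPath;
  }

  /** Callback for a deleted node; only the watched path sets the flag. */
  synchronized void nodeDeleted(String path) {
    if (path.equals(watchedPath)) {
      nodeWasDeleted = true;
    }
  }

  /** What a test would assert on after provoking the abort. */
  synchronized boolean wasDeleted() {
    return nodeWasDeleted;
  }
}
```

A test would register such a tracker, trigger the coprocessor exception, and 
then assert that wasDeleted() returns true.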

 In the client code, don't wait for all the requests to be executed before 
 resubmitting a request in error.
 --

 Key: HBASE-5924
 URL: https://issues.apache.org/jira/browse/HBASE-5924
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Fix For: 0.96.0

 Attachments: 5924.v11.patch, 5924.v14.patch, 5924.v5.patch, 
 5924.v9.patch




[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.


[ 
https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291811#comment-13291811
 ] 

Zhihong Ted Yu commented on HBASE-5924:
---

{code}
$ find hbase-server/src/test -name '*.java' -exec grep 'nized void nodeDeleted(Str' {} \; -print
public synchronized void nodeDeleted(String path) {
hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorExceptionWithAbort.java
public synchronized void nodeDeleted(String path) {
hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorExceptionWithRemove.java
public synchronized void nodeDeleted(String path) {
hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java
{code}
The other two checks are for the master znode.

W.r.t. table.close(), cleaning up resources is good programming practice.
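
As a small illustration of the cleanup point, the sketch below closes a 
handle in a finally block so that close() runs even when the body throws. 
FakeTable is a hypothetical stand-in, not an HBase class; it only records 
whether close() was called.

```java
final class CloseSketch {
  /** Hypothetical stand-in for a table handle; it only tracks whether close() ran. */
  static final class FakeTable implements AutoCloseable {
    boolean closed = false;

    void put(String row) {
      if (row == null) {
        throw new IllegalArgumentException("row must not be null");
      }
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  /** Writes one row and always closes the table, even if put() throws. */
  static FakeTable writeAndClose(String row) {
    FakeTable table = new FakeTable();
    try {
      table.put(row);
    } catch (IllegalArgumentException e) {
      // Swallowed only so the sketch can return the handle for inspection;
      // a real test would let the failure propagate.
    } finally {
      table.close();
    }
    return table;
  }
}
```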

 In the client code, don't wait for all the requests to be executed before 
 resubmitting a request in error.
 --

 Key: HBASE-5924
 URL: https://issues.apache.org/jira/browse/HBASE-5924
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Fix For: 0.96.0

 Attachments: 5924.v11.patch, 5924.v14.patch, 5924.v5.patch, 
 5924.v9.patch




[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.


[ 
https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291889#comment-13291889
 ] 

Zhihong Ted Yu commented on HBASE-5924:
---

{code}
+if (loc == null)
+  throw new IOException();
{code}
Without braces, the throw statement should be on the same line as if. Please 
include a brief message for the exception.
Some long lines should be wrapped:
{code}
+final Map<HRegionLocation, MultiAction<R>> actionsByServer = new HashMap<HRegionLocation, MultiAction<R>>();
...
+new Triple<MultiAction<R>, HRegionLocation, Future<MultiResponse>>(e.getValue(), e.getKey(), this.pool.submit(callable));
{code}
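
A minimal sketch of the two style points, with placeholder types in place of 
the HBase classes. The names locate and newActionsByServer and the exception 
text are hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

final class StyleSketch {
  /** Looks up a location for a row and fails with a descriptive message when none is known. */
  static String locate(Map<String, String> locations, String row) throws IOException {
    String loc = locations.get(row);
    // No braces, so the throw stays on the same line as the if, and it carries a message.
    if (loc == null) throw new IOException("No location found for row " + row);
    return loc;
  }

  /** A long generic declaration, wrapped after '=' to stay within the line limit. */
  static Map<String, Map<String, Integer>> newActionsByServer() {
    final Map<String, Map<String, Integer>> actionsByServer =
        new HashMap<String, Map<String, Integer>>();
    return actionsByServer;
  }
}
```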

 In the client code, don't wait for all the requests to be executed before 
 resubmitting a request in error.
 --

 Key: HBASE-5924
 URL: https://issues.apache.org/jira/browse/HBASE-5924
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Fix For: 0.96.0

 Attachments: 5924.v11.patch, 5924.v14.patch, 5924.v5.patch, 
 5924.v9.patch




[jira] [Commented] (HBASE-6194) add open time for a region and list recently closed regions in a regionserver UI


[ 
https://issues.apache.org/jira/browse/HBASE-6194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13292187#comment-13292187
 ] 

Zhihong Ted Yu commented on HBASE-6194:
---

bq. all the region servers that it is hosting
Did you mean regions?

 add open time for a region and list recently closed regions in a regionserver 
 UI
 

 Key: HBASE-6194
 URL: https://issues.apache.org/jira/browse/HBASE-6194
 Project: HBase
  Issue Type: Improvement
Reporter: Feifei Ji

 The region server currently lists all the region servers that it is hosting. 
 It will be useful to report when those regions were opened on this server. It 
 will also be useful to report which regions were recently closed, and when.


[jira] [Updated] (HBASE-5726) TestSplitTransactionOnCluster occasionally failing

2012-06-09 Thread Zhihong Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-5726:
--

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Trunk build has succeeded 5 times.

 TestSplitTransactionOnCluster occasionally failing
 --

 Key: HBASE-5726
 URL: https://issues.apache.org/jira/browse/HBASE-5726
 Project: HBase
  Issue Type: Bug
Reporter: Uma Maheswara Rao G
Assignee: Zhihong Ted Yu
Priority: Critical
 Fix For: 0.96.0

 Attachments: 5726.txt, 
 Hbase.log_testExistingZnodeBlocksSplitAndWeRollback  
 testShutdownFixupWhenDaughterHasSplit, 
 Hbase.log_testRSSplitEphemeralsDisappearButDaughtersAreOnlinedAfterShutdownHandling




[jira] [Assigned] (HBASE-5726) TestSplitTransactionOnCluster occasionally failing

2012-06-09 Thread Zhihong Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu reassigned HBASE-5726:
-

Assignee: Zhihong Ted Yu

 TestSplitTransactionOnCluster occasionally failing
 --

 Key: HBASE-5726
 URL: https://issues.apache.org/jira/browse/HBASE-5726
 Project: HBase
  Issue Type: Bug
Reporter: Uma Maheswara Rao G
Assignee: Zhihong Ted Yu
Priority: Critical
 Fix For: 0.96.0

 Attachments: 5726.txt, 
 Hbase.log_testExistingZnodeBlocksSplitAndWeRollback  
 testShutdownFixupWhenDaughterHasSplit, 
 Hbase.log_testRSSplitEphemeralsDisappearButDaughtersAreOnlinedAfterShutdownHandling


 When I ran TestSplitTransactionOnCluster, some times tests are failing.
 {quote}
 java.lang.AssertionError: expected:1 but was:0
   at org.junit.Assert.fail(Assert.java:93)
   at org.junit.Assert.failNotEquals(Assert.java:647)
   at org.junit.Assert.assertEquals(Assert.java:128)
   at org.junit.Assert.assertEquals(Assert.java:472)
   at org.junit.Assert.assertEquals(Assert.java:456)
   at 
 org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.getAndCheckSingleTableRegion(TestSplitTransactionOnCluster.java:89)
   at 
 org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.testShutdownFixupWhenDaughterHasSplit(TestSplitTransactionOnCluster.java:298)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
   at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
   at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
   at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
   at 
 org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:62)
 {quote}
 The test seems flaky; other random cases also fail.


[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

2012-06-10 Thread Zhihong Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292530#comment-13292530
 ] 

Zhihong Ted Yu commented on HBASE-6060:
---

For 6060_suggestion_toassign_rs_wentdown_beforerequest.patch:

Can you give the following variable a better name?
{code}
+Set<HRegionInfo> regionPlans = new ConcurrentSkipListSet<HRegionInfo>();
{code}
The set doesn't hold region plans. The following javadoc needs to be adjusted 
accordingly.
{code}
+   * @return Pair that has all regionplans that pertain to this dead server and a list that has
{code}
{code}
+  if ((region.getState() == RegionState.State.OFFLINE)
+  && (region.getState() == RegionState.State.PENDING_OPEN)) {
{code}
A region cannot be in both states at the same time. '||' should be used instead 
of '&&'.
{code}
+deadRegions = new TreeSet<HRegionInfo>(assignedRegions);
{code}
The assignment of deadRegions above is in a different code block from 
the following:
{code}
   if (deadRegions.remove(region.getRegion())) {
{code}
Running testSSHWhenSourceRSandDestRSInRegionPlanGoneDown (from v3) would lead 
to an NPE on deadRegions.

After fixing the above, testSSHWhenSourceRSandDestRSInRegionPlanGoneDown still 
fails.
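
To illustrate the two review points above (the '||' condition and the NPE on deadRegions), here is a minimal sketch. It is not code from the patch: the State enum, the String region names and both helper methods are hypothetical stand-ins for the HBase types.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

public class DeadRegionCheck {
    // Hypothetical stand-in for RegionState.State (not the HBase enum).
    enum State { OFFLINE, PENDING_OPEN, OPENING, OPEN }

    // A region holds exactly one state, so the two comparisons are joined with ||.
    static boolean isOfflineOrPendingOpen(State state) {
        return state == State.OFFLINE || state == State.PENDING_OPEN;
    }

    // Always returns a set (possibly empty), so a later remove() in another
    // code block cannot hit a null reference.
    static Set<String> copyDeadRegions(Collection<String> assignedRegions) {
        Set<String> deadRegions = new TreeSet<String>();
        if (assignedRegions != null) {
            deadRegions.addAll(assignedRegions);
        }
        return deadRegions;
    }

    public static void main(String[] args) {
        Set<String> deadRegions = copyDeadRegions(null);
        System.out.println(isOfflineOrPendingOpen(State.PENDING_OPEN));
        System.out.println(deadRegions.remove("region-a"));
    }
}
```

Whether an empty set or an explicit null check is the better choice in the patch depends on how the surrounding code distinguishes "no regions" from "not computed".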

 Regions's in OPENING state from failed regionservers takes a long time to 
 recover
 -

 Key: HBASE-6060
 URL: https://issues.apache.org/jira/browse/HBASE-6060
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Reporter: Enis Soztutar
Assignee: rajeshbabu
 Fix For: 0.96.0, 0.94.1, 0.92.3

 Attachments: 6060-94-v3.patch, 6060-94-v4.patch, 6060-94-v4_1.patch, 
 6060-94-v4_1.patch, 6060-trunk.patch, 6060-trunk.patch, 6060-trunk_2.patch, 
 6060-trunk_3.patch, 6060_alternative_suggestion.txt, 
 6060_suggestion2_based_off_v3.patch, 6060_suggestion_based_off_v3.patch, 
 6060_suggestion_toassign_rs_wentdown_beforerequest.patch, 
 HBASE-6060-92.patch, HBASE-6060-94.patch


 We have seen a pattern in tests where regions are stuck in OPENING state 
 for a very long time when the region server that is opening the region fails. 
 My understanding of the process: 
  
  - The master calls the rs to open the region. If the rs is offline, a new plan is 
 generated (a new rs is chosen). RegionState is set to PENDING_OPEN (only in 
 master memory; zk still shows OFFLINE). See HRegionServer.openRegion(), 
 HMaster.assign()
  - The RegionServer starts opening the region and changes the state in the znode. But 
 that znode is not ephemeral. (see ZkAssign)
  - The rs transitions the zk node from OFFLINE to OPENING. See 
 OpenRegionHandler.process()
  - The rs then opens the region and changes the znode from OPENING to OPENED
  - When the rs is killed between the OPENING and OPENED states, zk shows OPENING 
 state, and the master just waits for the rs to change the region state, but since 
 the rs is down, that won't happen. 
  - There is an AssignmentManager.TimeoutMonitor, which guards against exactly 
 these kinds of conditions. It periodically checks (every 10 sec by 
 default) the regions in transition to see whether they have timed out 
 (hbase.master.assignment.timeoutmonitor.timeout). The default timeout is 30 min, 
 which explains what you and I are seeing. 
  - ServerShutdownHandler in the Master does not reassign regions in OPENING 
 state, although it handles other states. 
 Lowering that threshold in the configuration is one option, but I still 
 think we can do better. 
 Will investigate more. 
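
 One way to "do better" than waiting for the timeout could be to pick out, at server shutdown time, the regions that were mid-open on the dead server. The sketch below only illustrates that selection. The Transition class, the State enum and the String server names are hypothetical and do not mirror the actual AssignmentManager or ServerShutdownHandler code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OpeningRegionRecovery {
    // Hypothetical subset of the region states named in the description.
    enum State { OFFLINE, PENDING_OPEN, OPENING, OPENED }

    // Hypothetical record of a region in transition: its state and the server opening it.
    static final class Transition {
        final State state;
        final String server;

        Transition(State state, String server) {
            this.state = state;
            this.server = server;
        }
    }

    // Selects regions stuck in OPENING on the dead server, so they can be
    // reassigned right away instead of after the monitor timeout.
    static List<String> regionsToReassign(Map<String, Transition> inTransition, String deadServer) {
        List<String> result = new ArrayList<String>();
        for (Map.Entry<String, Transition> entry : inTransition.entrySet()) {
            Transition t = entry.getValue();
            if (t.state == State.OPENING && t.server.equals(deadServer)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Transition> inTransition = new LinkedHashMap<String, Transition>();
        inTransition.put("r1", new Transition(State.OPENING, "rs1"));
        inTransition.put("r2", new Transition(State.OPENING, "rs2"));
        inTransition.put("r3", new Transition(State.OPENED, "rs1"));
        System.out.println(regionsToReassign(inTransition, "rs1"));
    }
}
```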


[jira] [Updated] (HBASE-6012) Handling RegionOpeningState for bulk assign


 [ 
https://issues.apache.org/jira/browse/HBASE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6012:
--

Summary: Handling RegionOpeningState for bulk assign  (was: Handling 
RegionOpeningState for bulk assign since SSH using)

 Handling RegionOpeningState for bulk assign
 ---

 Key: HBASE-6012
 URL: https://issues.apache.org/jira/browse/HBASE-6012
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0

 Attachments: HBASE-6012.patch, HBASE-6012v2.patch, 
 HBASE-6012v3.patch, HBASE-6012v4.patch, HBASE-6012v5.patch, HBASE-6012v6.patch


 Since HBASE-5914, we have been using bulk assign for SSH.
 But in the bulk assign case, if we get an ALREADY_OPENED result, there is no one 
 to clear the znode created by bulk assign. 
 Another thing: when an RS is opening a list of regions and one region is already in 
 transition, it will throw RegionAlreadyInTransitionException and stop opening 
 the other regions.
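
 The second problem can be sketched as follows: if the exception is caught per region, the remaining regions in the list still get opened. This is only an illustration of the idea and not necessarily what the attached patches do. The class, the exception type and the String region names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BulkOpenSketch {
    // Hypothetical stand-in for RegionAlreadyInTransitionException.
    static class AlreadyInTransitionException extends Exception {
        AlreadyInTransitionException(String region) {
            super(region + " is already in transition");
        }
    }

    private final Set<String> inTransition = new HashSet<String>();

    BulkOpenSketch(Set<String> alreadyInTransition) {
        inTransition.addAll(alreadyInTransition);
    }

    // Opening a single region fails if that region is already in transition.
    void openRegion(String region) throws AlreadyInTransitionException {
        if (!inTransition.add(region)) {
            throw new AlreadyInTransitionException(region);
        }
    }

    // Catching the exception inside the loop keeps one busy region from
    // aborting the open requests for the rest of the list.
    List<String> openRegions(List<String> regions) {
        List<String> opened = new ArrayList<String>();
        for (String region : regions) {
            try {
                openRegion(region);
                opened.add(region);
            } catch (AlreadyInTransitionException e) {
                System.out.println("Skipping: " + e.getMessage());
            }
        }
        return opened;
    }

    public static void main(String[] args) {
        Set<String> busy = new HashSet<String>();
        busy.add("r2");
        List<String> regions = new ArrayList<String>();
        regions.add("r1");
        regions.add("r2");
        regions.add("r3");
        System.out.println(new BulkOpenSketch(busy).openRegions(regions));
    }
}
```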


[jira] [Updated] (HBASE-6195) Increment data will lost when the memstore flushed


 [ 
https://issues.apache.org/jira/browse/HBASE-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6195:
--

Status: Patch Available  (was: Open)

 Increment data will lost when the memstore flushed
 --

 Key: HBASE-6195
 URL: https://issues.apache.org/jira/browse/HBASE-6195
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: Xing Shi
 Attachments: HBASE-6195-trunk-V2.patch, HBASE-6195-trunk.patch


 There are two problems in increment() now:
 First:
 The timestamp (the variable now) in HRegion's Increment() is 
 generated before the rowLock is acquired, so when multiple threads increment 
 the same row, a thread that generated its timestamp earlier may get the lock later. 
 Because increment stores just one version, the result is still 
 right up to this point.
 When the region is flushing, these increments read the kv from the snapshot 
 and the memstore, take whichever has the larger timestamp, and write it back to the memstore. 
 If the snapshot's timestamp is larger than the memstore's, the increment gets 
 the old data and then does the increment, which is wrong.
 Second:
 There is also a risk in increment. Because it writes the memstore first and 
 then the HLog, if the HLog write fails, the client will still read the 
 incremented value.
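
 Both points can be sketched in a few lines: take the timestamp only after the row lock is held, and write the log entry before updating the in-memory value. This is a simplified model under stated assumptions, not the HRegion code. The class, its fields and the list standing in for the HLog are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class IncrementOrdering {
    private final ReentrantLock rowLock = new ReentrantLock();
    // Hypothetical stand-in for the HLog: one entry per increment.
    private final List<String> log = new ArrayList<String>();
    private long memstoreValue;
    private long memstoreTimestamp;

    long increment(long amount) {
        rowLock.lock();
        try {
            // The timestamp is taken while holding the lock and never moves
            // backwards, so timestamps follow the order of lock acquisition.
            long now = Math.max(System.currentTimeMillis(), memstoreTimestamp);
            long updated = memstoreValue + amount;
            // Log first: if this step threw, the in-memory value would stay unchanged.
            log.add(now + ":" + updated);
            memstoreValue = updated;
            memstoreTimestamp = now;
            return updated;
        } finally {
            rowLock.unlock();
        }
    }

    int logSize() {
        return log.size();
    }

    public static void main(String[] args) {
        IncrementOrdering row = new IncrementOrdering();
        row.increment(2);
        System.out.println(row.increment(3));
    }
}
```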


[jira] [Commented] (HBASE-6195) Increment data will lost when the memstore flushed


[ 
https://issues.apache.org/jira/browse/HBASE-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292849#comment-13292849
 ] 

Zhihong Ted Yu commented on HBASE-6195:
---

@Xing:
Hadoop QA run wasn't triggered.
Can you add a unit test showing this problem and present test suite results?

Thanks

 Increment data will lost when the memstore flushed
 --

 Key: HBASE-6195
 URL: https://issues.apache.org/jira/browse/HBASE-6195
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: Xing Shi
 Attachments: HBASE-6195-trunk-V2.patch, HBASE-6195-trunk-V3.patch, 
 HBASE-6195-trunk.patch




[jira] [Commented] (HBASE-6195) Increment data will lost when the memstore flushed


[ 
https://issues.apache.org/jira/browse/HBASE-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292862#comment-13292862
 ] 

Zhihong Ted Yu commented on HBASE-6195:
---

In patch v3:
{code}
+  long now = EnvironmentEdgeManager.currentTimeMillis();
   Integer lid = getLock(lockid, row, true);
{code}
The variable now isn't actually referenced. Do we need it?
{code}
+  //store the kvs to the tmp memory for write hlog first, then write memory
{code}
The above should read: 'to temporary memstore before writing HLog'

 Increment data will lost when the memstore flushed
 --

 Key: HBASE-6195
 URL: https://issues.apache.org/jira/browse/HBASE-6195
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: Xing Shi
 Attachments: HBASE-6195-trunk-V2.patch, HBASE-6195-trunk-V3.patch, 
 HBASE-6195-trunk.patch




[jira] [Updated] (HBASE-6195) Increment data will be lost when the memstore is flushed


 [ 
https://issues.apache.org/jira/browse/HBASE-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6195:
--

Assignee: ShiXing
Hadoop Flags: Reviewed
 Summary: Increment data will be lost when the memstore is flushed  
(was: Increment data will lost when the memstore flushed)

 Increment data will be lost when the memstore is flushed
 

 Key: HBASE-6195
 URL: https://issues.apache.org/jira/browse/HBASE-6195
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: Xing Shi
Assignee: ShiXing
 Attachments: HBASE-6195-trunk-V2.patch, HBASE-6195-trunk-V3.patch, 
 HBASE-6195-trunk.patch




[jira] [Commented] (HBASE-6012) Handling RegionOpeningState for bulk assign


[ 
https://issues.apache.org/jira/browse/HBASE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292868#comment-13292868
 ] 

Zhihong Ted Yu commented on HBASE-6012:
---

@Chunhui:
Hadoop QA is not functioning. Can you run the whole test suite and post the 
result?


 Handling RegionOpeningState for bulk assign
 ---

 Key: HBASE-6012
 URL: https://issues.apache.org/jira/browse/HBASE-6012
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0

 Attachments: HBASE-6012.patch, HBASE-6012v2.patch, 
 HBASE-6012v3.patch, HBASE-6012v4.patch, HBASE-6012v5.patch, HBASE-6012v6.patch




[jira] [Commented] (HBASE-6012) Handling RegionOpeningState for bulk assign


[ 
https://issues.apache.org/jira/browse/HBASE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293270#comment-13293270
 ] 

Zhihong Ted Yu commented on HBASE-6012:
---

There was an NPE in TestAssignmentManager#testSSHWhenSplitRegionInProgress.
Please fix.

Thanks

 Handling RegionOpeningState for bulk assign
 ---

 Key: HBASE-6012
 URL: https://issues.apache.org/jira/browse/HBASE-6012
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0

 Attachments: HBASE-6012.patch, HBASE-6012v2.patch, 
 HBASE-6012v3.patch, HBASE-6012v4.patch, HBASE-6012v5.patch, HBASE-6012v6.patch




[jira] [Commented] (HBASE-6012) Handling RegionOpeningState for bulk assign


[ 
https://issues.apache.org/jira/browse/HBASE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293356#comment-13293356
 ] 

Zhihong Ted Yu commented on HBASE-6012:
---

I ran the two failed tests manually and they passed.

Will integrate tomorrow if there is no objection.

 Handling RegionOpeningState for bulk assign
 ---

 Key: HBASE-6012
 URL: https://issues.apache.org/jira/browse/HBASE-6012
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0

 Attachments: HBASE-6012.patch, HBASE-6012v2.patch, 
 HBASE-6012v3.patch, HBASE-6012v4.patch, HBASE-6012v5.patch, 
 HBASE-6012v6.patch, HBASE-6012v7.patch, HBASE-6012v8.patch




[jira] [Updated] (HBASE-5914) Bulk assign regions in the process of ServerShutdownHandler


 [ 
https://issues.apache.org/jira/browse/HBASE-5914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-5914:
--

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

 Bulk assign regions in the process of ServerShutdownHandler
 ---

 Key: HBASE-5914
 URL: https://issues.apache.org/jira/browse/HBASE-5914
 Project: HBase
  Issue Type: Improvement
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0

 Attachments: HBASE-5914.patch, HBASE-5914v2.patch, HBASE-5914v3.patch


 In the process of ServerShutdownHandler, we currently assign regions one at a time.
 In a large cluster, one regionserver usually carries many regions, so this 
 is quite slow.
 What about bulk assigning regions, as at cluster startup?
 In the current logic, if we fail to assign many regions to one destination 
 server, we wait until timeout; 
 however, in the process of ServerShutdownHandler, we should retry on 
 another server.


[jira] [Commented] (HBASE-5564) Bulkload is discarding duplicate records


[ 
https://issues.apache.org/jira/browse/HBASE-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293375#comment-13293375
 ] 

Zhihong Ted Yu commented on HBASE-5564:
---

Minor comment:
{code}
+  throw new BadTsvLineException("Invalid timestamp");
{code}
Can the timestamp string be included?
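
A minimal sketch of what including the timestamp string could look like. BadLineException and parseTimestamp are hypothetical stand-ins; the actual parsing code in the patch may differ.

```java
public class TimestampParse {
    // Hypothetical stand-in for BadTsvLineException.
    static class BadLineException extends Exception {
        BadLineException(String message) {
            super(message);
        }
    }

    // Parses a timestamp column and reports the offending text on failure.
    static long parseTimestamp(String raw) throws BadLineException {
        try {
            return Long.parseLong(raw);
        } catch (NumberFormatException e) {
            throw new BadLineException("Invalid timestamp: " + raw);
        }
    }

    public static void main(String[] args) {
        try {
            parseTimestamp("12ab");
        } catch (BadLineException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the raw string in the message, a user can find the bad line in the input file without re-running the job.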

 Bulkload is discarding duplicate records
 

 Key: HBASE-5564
 URL: https://issues.apache.org/jira/browse/HBASE-5564
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.96.0
 Environment: HBase 0.92
Reporter: Laxman
Assignee: Laxman
  Labels: bulkloader
 Fix For: 0.96.0

 Attachments: 5564.lint, 5564v5.txt, HBASE-5564.patch, 
 HBASE-5564_trunk.1.patch, HBASE-5564_trunk.1.patch, HBASE-5564_trunk.2.patch, 
 HBASE-5564_trunk.3.patch, HBASE-5564_trunk.4_final.patch, 
 HBASE-5564_trunk.patch


 Duplicate records are discarded when they exist in the same 
 input file, and more specifically when they exist in the same split.
 Duplicate records are considered if the records are from different 
 splits.
 Version under test: HBase 0.92


[jira] [Commented] (HBASE-5699) Run with > 1 WAL in HRegionServer


[ 
https://issues.apache.org/jira/browse/HBASE-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293517#comment-13293517
 ] 

Zhihong Ted Yu commented on HBASE-5699:
---

As I mentioned in HBASE-6055 @ 04/Jun/12 17:47, one of the benefits of this 
feature is for each HLog file to receive edits for one single table.

 Run with > 1 WAL in HRegionServer
 -

 Key: HBASE-5699
 URL: https://issues.apache.org/jira/browse/HBASE-5699
 Project: HBase
  Issue Type: Improvement
Reporter: binlijin
Assignee: Li Pi
 Attachments: PerfHbase.txt





[jira] [Commented] (HBASE-6012) Handling RegionOpeningState for bulk assign


[ 
https://issues.apache.org/jira/browse/HBASE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293668#comment-13293668
 ] 

Zhihong Ted Yu commented on HBASE-6012:
---

Integrated to trunk.

Thanks for the patch, Chunhui.

Thanks for the review, Stack and Ram.

 Handling RegionOpeningState for bulk assign
 ---

 Key: HBASE-6012
 URL: https://issues.apache.org/jira/browse/HBASE-6012
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0

 Attachments: HBASE-6012.patch, HBASE-6012v2.patch, 
 HBASE-6012v3.patch, HBASE-6012v4.patch, HBASE-6012v5.patch, 
 HBASE-6012v6.patch, HBASE-6012v7.patch, HBASE-6012v8.patch




[jira] [Commented] (HBASE-6195) Increment data will be lost when the memstore is flushed


[ 
https://issues.apache.org/jira/browse/HBASE-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293708#comment-13293708
 ] 

Zhihong Ted Yu commented on HBASE-6195:
---

The new test fails without the fix in the patch:
{code}
Failed tests:   
testParallelIncrementWithMemStoreFlush(org.apache.hadoop.hbase.regionserver.TestHRegion):
 expected:<2000> but was:<968>
{code}

Will integrate this afternoon if there is no objection.

 Increment data will be lost when the memstore is flushed
 

 Key: HBASE-6195
 URL: https://issues.apache.org/jira/browse/HBASE-6195
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: Xing Shi
Assignee: ShiXing
 Attachments: HBASE-6195-trunk-V2.patch, HBASE-6195-trunk-V3.patch, 
 HBASE-6195-trunk-V4.patch, HBASE-6195-trunk-V5.patch, 
 HBASE-6195-trunk-V6.patch, HBASE-6195-trunk.patch




[jira] [Updated] (HBASE-6195) Increment data will be lost when the memstore is flushed


 [ 
https://issues.apache.org/jira/browse/HBASE-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6195:
--

Attachment: 6195-trunk-V7.patch

Modified the test slightly.
Made Incrementer class private, removed unused variable.

 Increment data will be lost when the memstore is flushed
 

 Key: HBASE-6195
 URL: https://issues.apache.org/jira/browse/HBASE-6195
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: Xing Shi
Assignee: ShiXing
 Attachments: 6195-trunk-V7.patch, HBASE-6195-trunk-V2.patch, 
 HBASE-6195-trunk-V3.patch, HBASE-6195-trunk-V4.patch, 
 HBASE-6195-trunk-V5.patch, HBASE-6195-trunk-V6.patch, HBASE-6195-trunk.patch




[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.


[ 
https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293778#comment-13293778
 ] 

Zhihong Ted Yu commented on HBASE-5924:
---

I don't have further comments.

Thanks

 In the client code, don't wait for all the requests to be executed before 
 resubmitting a request in error.
 --

 Key: HBASE-5924
 URL: https://issues.apache.org/jira/browse/HBASE-5924
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Fix For: 0.96.0

 Attachments: 5924.v11.patch, 5924.v14.patch, 5924.v5.patch, 
 5924.v9.patch


 The client (in the function HConnectionManager#processBatchCallback) works in 
 two steps:
  - make the requests
  - collect the failures and successes and prepare for retry
 This means that when there is an immediate error (region moved, split, dead 
 server, ...) we still wait for all the initial requests to be executed before 
 resubmitting the failed request. In a scenario where every 
 request takes 5 seconds, the final execution time is: 5 (initial 
 requests) + 1 (wait time) + 5 (final request) = 11s.
 We could improve this by analyzing the results immediately. This would bring 
 us, for the scenario mentioned above, to 6 seconds. 
 So we could have a performance improvement of nearly 50% in many cases, and 
 much more than 50% if request execution times differ.
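
 The arithmetic above can be written out as a small sketch. It assumes the failure is reported at time zero and that the retry runs in parallel with the initial requests that are still in flight. The class and method names are hypothetical and are not part of HConnectionManager.

```java
public class RetryTiming {
    // Current behaviour: wait for every initial request, pause, then retry the failed one.
    static int batchThenRetry(int requestSeconds, int waitSeconds) {
        return requestSeconds + waitSeconds + requestSeconds;
    }

    // Proposed behaviour: the retry starts after the pause and overlaps with
    // the initial requests, so the total is whichever finishes last.
    static int retryImmediately(int requestSeconds, int waitSeconds) {
        return Math.max(requestSeconds, waitSeconds + requestSeconds);
    }

    public static void main(String[] args) {
        System.out.println(batchThenRetry(5, 1));   // 11
        System.out.println(retryImmediately(5, 1)); // 6
    }
}
```

 Going from 11 to 6 seconds is a saving of 5/11, about 45%, which matches the "nearly 50%" estimate.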


[jira] [Commented] (HBASE-6195) Increment data will be lost when the memstore is flushed


[ 
https://issues.apache.org/jira/browse/HBASE-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293828#comment-13293828
 ] 

Zhihong Ted Yu commented on HBASE-6195:
---

Integrated to trunk.

Thanks for the patch, Xing.

 Increment data will be lost when the memstore is flushed
 

 Key: HBASE-6195
 URL: https://issues.apache.org/jira/browse/HBASE-6195
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: Xing Shi
Assignee: ShiXing
 Attachments: 6195-trunk-V7.patch, HBASE-6195-trunk-V2.patch, 
 HBASE-6195-trunk-V3.patch, HBASE-6195-trunk-V4.patch, 
 HBASE-6195-trunk-V5.patch, HBASE-6195-trunk-V6.patch, HBASE-6195-trunk.patch




[jira] [Commented] (HBASE-6185) region autoSplit when not reach 'hbase.hregion.max.filesize'


[ 
https://issues.apache.org/jira/browse/HBASE-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294004#comment-13294004
 ] 

Zhihong Ted Yu commented on HBASE-6185:
---

Please also wrap the long line in the patch.
Currently we maintain 100 characters per line.

 region autoSplit when not reach 'hbase.hregion.max.filesize'
 

 Key: HBASE-6185
 URL: https://issues.apache.org/jira/browse/HBASE-6185
 Project: HBase
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.94.0
Reporter: nneverwei
 Attachments: HBASE-6185.patch


 While using HBase 0.94.0 we hit a strange problem.
 We set 'hbase.hregion.max.filesize' to 100 GB (the recommended value for 
 effectively turning auto-split off). 
 {code:xml}
 <property>
   <name>hbase.hregion.max.filesize</name>
   <value>107374182400</value>
 </property>
 {code}
 We then kept putting data into a table.
 But while the data size was still far below 100 GB (about 500~600, 
 uncompressed), the table auto-split into 2 regions.
 I changed the log4j level to DEBUG and saw the logs below:
 {code}
 2012-06-07 10:30:52,161 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
 Finished memstore flush of ~128.0m/134221272, currentsize=1.5m/1617744 for 
 region FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8. in 
 3201ms, sequenceid=176387980, compaction requested=false
 2012-06-07 10:30:52,161 DEBUG 
 org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy: 
 ShouldSplit because info size=138657416, sizeToCheck=134217728, 
 regionsWithCommonTable=1
 2012-06-07 10:30:52,161 DEBUG   
 org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy: 
 ShouldSplit because info size=138657416, sizeToCheck=134217728, 
 regionsWithCommonTable=1
 2012-06-07 10:30:52,240 DEBUG 
 org.apache.hadoop.hbase.regionserver.CompactSplitThread: Split requested for 
 FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8..  
 compaction_queue=(0:0), split_queue=0
 2012-06-07 10:30:52,265 INFO 
 org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of 
 region FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8.
 2012-06-07 10:30:52,265 DEBUG 
 org.apache.hadoop.hbase.regionserver.SplitTransaction: 
 regionserver:60020-0x137c4929efe0001 Creating ephemeral node for 
 7b229abcd0785408251a579e9bdf49c8 in SPLITTING state
 2012-06-07 10:30:52,368 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:60020-0x137c4929efe0001 Attempting to transition node 
 7b229abcd0785408251a579e9bdf49c8 from RS_ZK_REGION_SPLITTING to 
 RS_ZK_REGION_SPLITTING
 2012-06-07 10:30:52,382 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:60020-0x137c4929efe0001 Successfully transitioned node 
 7b229abcd0785408251a579e9bdf49c8 from RS_ZK_REGION_SPLITTING to 
 RS_ZK_REGION_SPLITTING
 2012-06-07 10:30:52,410 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 Closing FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8.: 
 disabling compactions & flushes
 2012-06-07 10:30:52,410 DEBUG 
 org.apache.hadoop.hbase.regionserver.HRegionServer: 
 NotServingRegionException; 
 FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8. is closing
 2012-06-07 10:30:52,411 DEBUG 
 org.apache.hadoop.hbase.regionserver.HRegionServer: 
 NotServingRegionException; 
 FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8. is closing
 {code}
 {color:red}IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because info 
 size=138657416, sizeToCheck=134217728{color}
 I did not configure a split policy for HBase, which means 
 *IncreasingToUpperBoundRegionSplitPolicy is the default split policy in 0.94.0*.
 After adding
 {code:xml}
 <property>
   <name>hbase.regionserver.region.split.policy</name>
   <value>org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy</value>
 </property>
 {code}
 auto-split did not happen again and everything went well.
 But the javadoc on ConstantSizeRegionSplitPolicy still says 'This is the 
 default split policy', and so does the book at 
 http://hbase.apache.org/book/regions.arch.html, section 9.7.4.1. Custom Split 
 Policies: 'default split policy: ConstantSizeRegionSplitPolicy.'.
 These may mislead users into thinking that setting 
 hbase.hregion.max.filesize to 100 GB almost shuts auto-split off.
 Please update those docs. What's more, in many scenarios we actually need to 
 control splits manually (as you know, while splitting, the table is offline 
 and reads and writes will fail).
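
The logged sizeToCheck=134217728 with regionsWithCommonTable=1 can be reproduced with a small sketch. It assumes the 0.94.0 policy takes the smaller of the configured max file size and the memstore flush size multiplied by the square of the table's region count on that server, and it assumes a 128 MB flush size (suggested by the "~128.0m" flush in the log). The formula is recalled rather than copied from the HBase source, and the class and method names are hypothetical.

```java
// Hypothetical sketch of the split-size check; the formula is an assumption
// that happens to match the DEBUG log lines above.
class SplitSizeSketch {
    static long sizeToCheck(long maxFileSize, long flushSize, int regionsWithCommonTable) {
        if (regionsWithCommonTable == 0) {
            return maxFileSize;
        }
        long n = regionsWithCommonTable;
        // Grows with the square of the region count, capped at max file size.
        return Math.min(maxFileSize, flushSize * n * n);
    }

    static boolean shouldSplit(long storeSize, long sizeToCheck) {
        return storeSize > sizeToCheck;
    }

    public static void main(String[] args) {
        long maxFileSize = 107374182400L; // hbase.hregion.max.filesize from the report
        long flushSize = 134217728L;      // assumed 128 MB memstore flush size
        long check = sizeToCheck(maxFileSize, flushSize, 1);
        System.out.println(check);                          // 134217728
        System.out.println(shouldSplit(138657416L, check)); // true
    }
}
```

Under these assumptions the 100 GB limit only takes effect once the table has 29 or more regions on the server, since 29 squared times 128 MB is the first product to exceed 100 GB.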
  


[jira] [Updated] (HBASE-6134) Improvement for split-worker to speed up distributed-split-log


 [ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6134:
--

Attachment: 6134v4.patch

TestSplitLogManager passes locally.

Reattaching patch v4.

 Improvement for split-worker to speed up distributed-split-log
 --

 Key: HBASE-6134
 URL: https://issues.apache.org/jira/browse/HBASE-6134
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.96.0

 Attachments: 6134v4.patch, HBASE-6134.patch, HBASE-6134v2.patch, 
 HBASE-6134v3-92.patch, HBASE-6134v3.patch, HBASE-6134v4.patch


 First, we ran a test comparing local-master-splitting and 
 distributed-log-splitting.
 Environment: 34 hlog files, 5 regionservers (after killing one, only 4 RSs do 
 the splitting work), 400 regions in one hlog file
 local-master-split: 60s+
 distributed-log-splitting: 165s+
 In fact, in our production environment, distributed-log-splitting also took 
 60s with 30 regionservers for 34 hlog files (the regionservers may be under 
 high load).
 We found that a split-worker took about 20s to split one log file 
 (30ms~50ms per writer.close(); 10ms per writer creation).
 I think we could improve this by parallelizing the creation and closing of 
 writers in threads.
 The patch changes the logic for distributed-log-splitting to match 
 local-master-splitting and parallelizes the close in threads.
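
With 400 regions in one hlog file, 400 serial closes at 30ms~50ms each come to roughly 12s~20s, which is in line with the 20s measured per log file. A minimal sketch of closing the writers on a thread pool follows. It is not the code from the attached patch: ParallelCloser, closeAll, and the pool size are hypothetical, and any java.io.Closeable stands in for a log writer.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not taken from the HBASE-6134 patch.
class ParallelCloser {
    /** Closes all writers on a fixed-size pool; returns how many closed cleanly. */
    static int closeAll(List<? extends Closeable> writers, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Void>> pending = new ArrayList<>();
            for (Closeable writer : writers) {
                Callable<Void> task = () -> {
                    writer.close();
                    return null;
                };
                pending.add(pool.submit(task));
            }
            int closed = 0;
            IOException failure = null;
            // Wait for every close so no writer is left half-flushed.
            for (Future<Void> f : pending) {
                try {
                    f.get();
                    closed++;
                } catch (ExecutionException e) {
                    Throwable cause = e.getCause();
                    if (failure == null) {
                        failure = cause instanceof IOException
                            ? (IOException) cause : new IOException(cause);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while closing", e);
                }
            }
            if (failure != null) {
                throw new UncheckedIOException(failure);
            }
            return closed;
        } finally {
            pool.shutdown();
        }
    }
}
```

The sketch waits for all closes before reporting the first failure, on the assumption that a split must not be declared finished while any writer is still open.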


[jira] [Updated] (HBASE-6134) Improvement for split-worker to speed up distributed-split-log


 [ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6134:
--

Hadoop Flags: Reviewed

 Improvement for split-worker to speed up distributed-split-log
 --

 Key: HBASE-6134
 URL: https://issues.apache.org/jira/browse/HBASE-6134
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.96.0

 Attachments: 6134v4.patch, HBASE-6134.patch, HBASE-6134v2.patch, 
 HBASE-6134v3-92.patch, HBASE-6134v3.patch, HBASE-6134v4.patch



[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed-split-log


[ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294053#comment-13294053
 ] 

Zhihong Ted Yu commented on HBASE-6134:
---

TestServerCustomProtocol passes locally.
Will integrate later if there is no objection.

 Improvement for split-worker to speed up distributed-split-log
 --

 Key: HBASE-6134
 URL: https://issues.apache.org/jira/browse/HBASE-6134
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.96.0

 Attachments: 6134v4.patch, HBASE-6134.patch, HBASE-6134v2.patch, 
 HBASE-6134v3-92.patch, HBASE-6134v3.patch, HBASE-6134v4.patch



[jira] [Commented] (HBASE-5970) Improve the AssignmentManager#updateTimer and speed up handling opened event


[ 
https://issues.apache.org/jira/browse/HBASE-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294072#comment-13294072
 ] 

Zhihong Ted Yu commented on HBASE-5970:
---

@Chunhui:
You can open a new issue for improving the above code.

 Improve the AssignmentManager#updateTimer and speed up handling opened event
 

 Key: HBASE-5970
 URL: https://issues.apache.org/jira/browse/HBASE-5970
 Project: HBase
  Issue Type: Improvement
  Components: master
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.96.0

 Attachments: 5970v3.patch, HBASE-5970.patch, HBASE-5970v2.patch, 
 HBASE-5970v3.patch, HBASE-5970v4.patch, HBASE-5970v4.patch


 We found that handling the opened event is very slow in environments with 
 lots of regions.
 The cause is the slow AssignmentManager#updateTimer.
 We tested bulk assigning 10w (i.e. 100k) regions; the whole bulk-assign 
 process took 1 hour.
 2012-05-06 20:31:49,201 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning 10 
 region(s) round-robin across 5 server(s)
 2012-05-06 21:26:32,103 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning done
 I think we could improve AssignmentManager#updateTimer by having a separate 
 thread do this work.
 After the improvement, it took only 4.5 mins.
 2012-05-07 11:03:36,581 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning 10 
 region(s) across 5 server(s), retainAssignment=true 
 2012-05-07 11:07:57,073 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning done 
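
The elapsed times can be checked directly from the four log timestamps above. The sketch below is only a worked calculation; the class name is hypothetical, and the timestamp pattern is assumed from the log lines.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper that measures the time between two log timestamps.
class BulkAssignTiming {
    private static final DateTimeFormatter LOG_TIME =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    static Duration elapsed(String start, String end) {
        return Duration.between(
            LocalDateTime.parse(start, LOG_TIME),
            LocalDateTime.parse(end, LOG_TIME));
    }

    public static void main(String[] args) {
        Duration before = elapsed("2012-05-06 20:31:49,201", "2012-05-06 21:26:32,103");
        Duration after = elapsed("2012-05-07 11:03:36,581", "2012-05-07 11:07:57,073");
        System.out.println(before.toMillis()); // 3282902
        System.out.println(after.toMillis());  // 260492
    }
}
```

That is about 54.7 minutes before the change and about 4.3 minutes after it, in line with the rounded figures quoted in the description.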


[jira] [Commented] (HBASE-6185) region autoSplit when not reach 'hbase.hregion.max.filesize'


[ 
https://issues.apache.org/jira/browse/HBASE-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294079#comment-13294079
 ] 

Zhihong Ted Yu commented on HBASE-6185:
---

{code}
+ * This is the default split policy. From 0.94.0 the default split policy change
{code}
The above should read 'This was the default split policy. From 0.94.0 on the 
default split policy has changed'

 region autoSplit when not reach 'hbase.hregion.max.filesize'
 

 Key: HBASE-6185
 URL: https://issues.apache.org/jira/browse/HBASE-6185
 Project: HBase
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.94.0
Reporter: nneverwei
 Fix For: 0.94.1

 Attachments: HBASE-6185.patch



[jira] [Updated] (HBASE-6185) Update javadoc for ConstantSizeRegionSplitPolicy class