[jira] [Commented] (HBASE-10411) [Book] Add a kerberos 'request is a replay (34)' issue at troubleshooting section
[ https://issues.apache.org/jira/browse/HBASE-10411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897618#comment-13897618 ]

Liang Xie commented on HBASE-10411:

Hmm, it seems what I said above applies only to JDK6; I just found JDK-7085018 and JDK-6882687...

[Book] Add a kerberos 'request is a replay (34)' issue at troubleshooting section

Key: HBASE-10411
URL: https://issues.apache.org/jira/browse/HBASE-10411
Project: HBase
Issue Type: Improvement
Components: documentation, security
Reporter: takeshi.miao
Assignee: takeshi.miao
Priority: Minor
Attachments: HBASE-10411-trunk-v01.patch, HBASE-10411-v01.odt

For the kerberos 'request is a replay (34)' issue (HBASE-10379), add it to the troubleshooting section in the HBase book.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10490) Simplify RpcClient code
[ https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897617#comment-13897617 ]

Devaraj Das commented on HBASE-10490:

Nice cleanup, but hopefully we can make sure the changes don't break any assumptions in the RPC layer. The 'ping' removal stood out while I was doing a review. I am wondering if the server side works well with this change. I mean, could this happen: (1) the client sends an RPC, (2) the server gets to it but the request takes a long time to process, (3) meanwhile the server sees the connection as idle and closes it (since no ping came)? The other thing is, if the client's intended socket timeout is 0 (infinite timeout), is the ping still relevant to prevent the server from closing the connection on incomplete/not-yet-responded RPCs?

Simplify RpcClient code

Key: HBASE-10490
URL: https://issues.apache.org/jira/browse/HBASE-10490
Project: HBase
Issue Type: Bug
Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
Fix For: 0.99.0
Attachments: 10490.v1.patch

The code is complex. Here is a set of proposed changes, for trunk:
1) Remove PingInputStream: if rpcTimeout > 0 it just rethrows the exception. I expect that we always have an rpcTimeout, so we can remove the code.
2) Remove the sendPing: instead, just close the connection if it's not used for a while, instead of trying to ping the server.
3) Remove maxIdle time: to avoid the confusion if someone has overwritten the conf.
4) Remove shouldCloseConnection: it was more or less synchronized with closeException. Having a single variable instead of two avoids the synchronization.
5) Remove lastActivity: instead of trying to have an exact timeout, just kill the connection after some time. lastActivity could be set to wrong values if the server was slow to answer.
6) Hopefully, better management of the exceptions: we don't use the close exception of someone else as an input for another one. Same goes for interruption.
I may have something wrong in the code; I will review it myself again. Feedback welcome, especially on the ping removal: I hope I got all the use cases.
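Change (2) above, closing idle connections instead of pinging, can be illustrated with a small self-contained sketch. This is not the actual RpcClient code; the class and field names here are made up for illustration, assuming a per-connection last-used timestamp and a periodic sweep:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of change (2): instead of pinging the server to keep a
// connection alive, record when each connection was last used and close any
// connection that has been idle longer than maxIdleMs.
public class IdleConnectionSweeper {
    static class Conn {
        volatile long lastUsedMs;
        volatile boolean closed;
        void close() { closed = true; }
    }

    private final Map<String, Conn> connections = new ConcurrentHashMap<>();
    private final long maxIdleMs;

    IdleConnectionSweeper(long maxIdleMs) { this.maxIdleMs = maxIdleMs; }

    void markUsed(String id, long nowMs) {
        connections.computeIfAbsent(id, k -> new Conn()).lastUsedMs = nowMs;
    }

    // Called periodically; closes and forgets connections idle too long.
    void sweep(long nowMs) {
        connections.entrySet().removeIf(e -> {
            if (nowMs - e.getValue().lastUsedMs > maxIdleMs) {
                e.getValue().close();
                return true;
            }
            return false;
        });
    }

    boolean isOpen(String id) { return connections.containsKey(id); }

    public static void main(String[] args) {
        IdleConnectionSweeper s = new IdleConnectionSweeper(10_000);
        s.markUsed("rs1", 0);
        s.markUsed("rs2", 9_000);
        s.sweep(15_000); // rs1 idle 15s -> closed; rs2 idle 6s -> kept
        System.out.println(s.isOpen("rs1") + " " + s.isOpen("rs2"));
    }
}
```

This also makes Devaraj's concern concrete: a connection with a long-running in-flight request would look "unused" under this scheme unless markUsed is also called when a request is still pending.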
[jira] [Commented] (HBASE-10487) Avoid allocating new KeyValue and according bytes-copying for appended kvs which don't have existing values
[ https://issues.apache.org/jira/browse/HBASE-10487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897619#comment-13897619 ]

Hudson commented on HBASE-10487:

FAILURE: Integrated in HBase-TRUNK #4908 (See [https://builds.apache.org/job/HBase-TRUNK/4908/])
HBASE-10487 Avoid allocating new KeyValue and according bytes-copying for appended kvs which don't have existing values (Honghua) (tedyu: rev 1566981)
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java

Avoid allocating new KeyValue and according bytes-copying for appended kvs which don't have existing values

Key: HBASE-10487
URL: https://issues.apache.org/jira/browse/HBASE-10487
Project: HBase
Issue Type: Improvement
Components: regionserver
Reporter: Feng Honghua
Assignee: Feng Honghua
Attachments: HBASE-10487-trunk_v1.patch

In HRegion.append, new KeyValues are allocated and the corresponding bytes copied regardless of whether an existing kv is present for the appended cells. We can improve this by avoiding the allocation and bytes-copying for kvs which don't have existing (old) values: reuse the passed-in kv and only update its timestamp to 'now' (its original timestamp is latest, so it can be updated).
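The optimization above can be sketched with a simplified stand-in for KeyValue (the real HBase class has a packed byte[] layout; the `Kv` type and `appendResult` helper below are hypothetical, for illustration only):

```java
// Illustrative sketch of HBASE-10487's idea: when the appended column has no
// existing value, reuse the passed-in kv and only rewrite its timestamp,
// instead of allocating a new object and copying bytes.
public class AppendSketch {
    static class Kv {
        final byte[] value;
        long timestamp;
        Kv(byte[] value, long ts) { this.value = value; this.timestamp = ts; }
    }

    // oldKv == null means there is no existing value for this cell.
    static Kv appendResult(Kv oldKv, Kv newKv, long now) {
        if (oldKv == null) {
            newKv.timestamp = now; // reuse: no allocation, no byte copying
            return newKv;
        }
        // Existing value present: a merged kv must be allocated (old + new bytes).
        byte[] merged = new byte[oldKv.value.length + newKv.value.length];
        System.arraycopy(oldKv.value, 0, merged, 0, oldKv.value.length);
        System.arraycopy(newKv.value, 0, merged, oldKv.value.length, newKv.value.length);
        return new Kv(merged, now);
    }

    public static void main(String[] args) {
        Kv in = new Kv(new byte[]{1, 2}, 0L);
        Kv out = appendResult(null, in, 42L);
        // Same object comes back, with only the timestamp updated.
        System.out.println((out == in) + " " + out.timestamp);
    }
}
```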
[jira] [Commented] (HBASE-10495) upgrade script is printing usage two times with help option.
[ https://issues.apache.org/jira/browse/HBASE-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897656#comment-13897656 ]

Hadoop QA commented on HBASE-10495:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12628159/HBASE-10495.patch
against trunk revision .

ATTACHMENT ID: 12628159

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:red}-1 site{color}. The patch appears to cause the mvn site goal to fail.
{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8659//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8659//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8659//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8659//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8659//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8659//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8659//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8659//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8659//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8659//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8659//console

This message is automatically generated.

upgrade script is printing usage two times with help option.

Key: HBASE-10495
URL: https://issues.apache.org/jira/browse/HBASE-10495
Project: HBase
Issue Type: Bug
Components: Usability
Affects Versions: 0.96.0
Reporter: rajeshbabu
Assignee: rajeshbabu
Priority: Minor
Fix For: 0.96.2, 0.98.1, 0.99.0
Attachments: HBASE-10495.patch

While testing the 0.98 RC, found that the usage is printed two times with the help option.
{code}
HOST-10-18-91-14:/home/rajeshbabu/98RC3/hbase-0.98.0-hadoop2/bin # ./hbase upgrade -h
usage: $bin/hbase upgrade -check [-dir DIR]|-execute
 -check     Run upgrade check; looks for HFileV1 under ${hbase.rootdir}
            or provided 'dir' directory.
 -dir       Relative path of dir to check for HFileV1s.
 -execute   Run upgrade; zk and hdfs must be up, hbase down
 -h,--help  Help
Read http://hbase.apache.org/book.html#upgrade0.96 before attempting upgrade

Example usage:
Run upgrade check; looks for HFileV1s under ${hbase.rootdir}:
 $ bin/hbase upgrade -check
Run the upgrade:
 $ bin/hbase upgrade -execute

usage: $bin/hbase upgrade -check [-dir DIR]|-execute
 -check     Run upgrade check; looks for HFileV1 under ${hbase.rootdir}
            or provided 'dir' directory.
 -dir       Relative path of dir to check for HFileV1s.
 -execute   Run upgrade; zk and hdfs must be up, hbase down
 -h,--help  Help
Read http://hbase.apache.org/book.html#upgrade0.96 before attempting upgrade

Example usage:
Run upgrade check; looks for HFileV1s under
{code}
[jira] [Commented] (HBASE-10490) Simplify RpcClient code
[ https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897660#comment-13897660 ]

Nicolas Liochon commented on HBASE-10490:

Nice catch. The max idle time is actually used on the server as well. But it does not really work, because the defaults are different:
server: maxIdleTime = 2*conf.getInt("ipc.client.connection.maxidletime", 1000);
client: maxIdleTime = conf.getInt("hbase.ipc.client.connection.maxidletime", 10000); // 10s

So it means that the server disconnects any client that has not spoken for 2 seconds, while the client pings every 10 seconds. Note as well that one is prefixed by 'hbase.' while the other is not; in 2008 they were sharing the same name, then it diverged. I suppose the code on the server doesn't do much because of this:
{code}
protected int thresholdIdleConnections; // the number of idle
                                        // connections after which we
                                        // will start cleaning up idle
                                        // connections
{code}
thresholdIdleConnections defaults to 4000. Likely it never triggers, and if it did trigger it would not work because of the difference in default values. I suppose the best way of doing this is: order the connections not used for at least x seconds, then kill some of the oldest. But we can say as well that if we're satisfied with the way the server behaves today, we can remove the ping on the client without changing anything in the server: the behavior won't change.

Simplify RpcClient code

Key: HBASE-10490
URL: https://issues.apache.org/jira/browse/HBASE-10490
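The mismatch described in the comment above can be reduced to arithmetic. The concrete values here are taken from the comment's own text (2 * 1000 ms server-side vs. a 10000 ms client ping interval), not re-checked against the HBase/Hadoop source:

```java
// Sketch of the default mismatch: the server computes its idle limit as
// 2 * ipc.client.connection.maxidletime (default 1000 ms), while the client's
// ping interval comes from hbase.ipc.client.connection.maxidletime
// (default 10000 ms). Values are as quoted in the comment above.
public class IdleDefaults {
    public static void main(String[] args) {
        int serverBase = 1000;                 // ipc.client.connection.maxidletime
        int serverMaxIdleMs = 2 * serverBase;  // 2 s: server drops silent clients
        int clientPingIntervalMs = 10_000;     // hbase.ipc.client.connection.maxidletime
        // The server would close the connection long before the client's next ping:
        System.out.println(serverMaxIdleMs < clientPingIntervalMs);
    }
}
```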
[jira] [Created] (HBASE-10497) Add according handling for swallowed InterruptedException thrown by Thread.sleep under HBase-Client/HBase-Server folders systematically
Feng Honghua created HBASE-10497:

Summary: Add according handling for swallowed InterruptedException thrown by Thread.sleep under HBase-Client/HBase-Server folders systematically
Key: HBASE-10497
URL: https://issues.apache.org/jira/browse/HBASE-10497
Project: HBase
Issue Type: Improvement
Components: Client, regionserver
Reporter: Feng Honghua
Assignee: Feng Honghua
Priority: Minor

There are many places where an InterruptedException thrown by Thread.sleep is swallowed silently (neither declared in the caller method's throws clause nor rethrown immediately) under the HBase-Client/HBase-Server folders. It'd be better to add the standard 'log and call currentThread.interrupt' handling for such cases.
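The 'log and restore interrupt status' pattern the issue proposes looks like this in a minimal, dependency-free form (System.err stands in for the real LOG here):

```java
// Minimal example of the pattern: never swallow InterruptedException silently;
// log it and restore the thread's interrupt status so callers can still see it.
public class InterruptPattern {
    static void pause(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            System.err.println("Interrupted while sleeping: " + e);
            // Restore the interrupt status for code further up the stack.
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        Thread.currentThread().interrupt(); // simulate an interrupt arriving
        pause(10);                          // sleep throws immediately; status restored
        System.out.println(Thread.interrupted()); // true: status was preserved
    }
}
```

Without the `interrupt()` call, the sleep would have cleared the interrupt status and the cancellation request would be lost.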
[jira] [Updated] (HBASE-10497) Add according handling for swallowed InterruptedException thrown by Thread.sleep under HBase-Client/HBase-Server folders systematically
[ https://issues.apache.org/jira/browse/HBASE-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Feng Honghua updated HBASE-10497:

Attachment: HBASE-10497-trunk_v1.patch

Patch attached.
Note: the handling for the InterruptedException thrown by sleep within createTable in HBaseAdmin.java is to rethrow a wrapped InterruptedIOException, while it's ignored for the one thrown within deleteTable. To keep the previous semantics, I just log and call Thread.currentThread.interrupt there. But maybe rethrowing a wrapped InterruptedIOException as in createTable is more appropriate.
[jira] [Commented] (HBASE-10497) Add according handling for swallowed InterruptedException thrown by Thread.sleep under HBase-Client/HBase-Server folders systematically
[ https://issues.apache.org/jira/browse/HBASE-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897694#comment-13897694 ]

Feng Honghua commented on HBASE-10497:

Threads.sleep() (which prints the full call stack and calls Thread.currentThread.interrupt) is also a good alternative, and it's used in some places for this purpose, but maybe printing the full call stack is a bit more heavyweight than one-line logging for most cases?
[jira] [Updated] (HBASE-10497) Add standard handling for swallowed InterruptedException thrown by Thread.sleep under HBase-Client/HBase-Server folders systematically
[ https://issues.apache.org/jira/browse/HBASE-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Feng Honghua updated HBASE-10497:

Summary: Add standard handling for swallowed InterruptedException thrown by Thread.sleep under HBase-Client/HBase-Server folders systematically (was: Add according handling for swallowed InterruptedException thrown by Thread.sleep under HBase-Client/HBase-Server folders systematically)
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897700#comment-13897700 ]

Lukas Nalezenec commented on HBASE-10413:

Hi, thank you very much for your time. I need one small change. It's not critical, but it will make a considerable difference in user experience. My line
LOG.info(MessageFormat.format("Input split length: {0} bytes.", tSplit.getLength()));
was changed to
LOG.info("Input split length: " + tSplit.getLength() + " bytes.");
in the last code review. The reason I used MessageFormat.format is that the length is a large number and it needs to be printed with a thousands separator. It takes a few seconds to read the number 54798765321. How fast can you say whether that number represents 5.4 TB or 5.4 GB? But if you print it with separators you can read it correctly in a moment: 54,798,765,321. Can we add some formatting consistent with HBase coding standards? Maybe String.format, I don't know.
Lukas

Tablesplit.getLength returns 0

Key: HBASE-10413
URL: https://issues.apache.org/jira/browse/HBASE-10413
Project: HBase
Issue Type: Bug
Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec
Fix For: 0.98.1, 0.99.0
Attachments: 10413-7.patch, HBASE-10413-2.patch, HBASE-10413-3.patch, HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, HBASE-10413.patch

InputSplits should be sorted by length, but TableSplit does not contain a real getLength implementation:
{code}
@Override
public long getLength() {
  // Not clear how to obtain this... seems to be used only for sorting splits
  return 0;
}
{code}
This is causing us problems with scheduling: we have jobs that are supposed to finish in a limited time, but they often get stuck in the last mapper working on a large region. Can we implement this method? What is the best way? We were thinking about estimating the size by the size of the files on HDFS. We would like to get a Scanner from TableSplit, use startRow, stopRow and column families to get the corresponding region, then compute the HDFS size for the given region and column family.
Update: This ticket was about a production issue. I talked with the guy who worked on this, and he said our production issue was probably not directly caused by getLength() returning 0.
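The readability point in the comment above can be shown directly with String.format: the "%,d" conversion inserts grouping separators. The Locale is pinned here so the separator character is deterministic (it varies with the default locale):

```java
import java.util.Locale;

// Demonstrates the thousands-separator formatting requested in the comment:
// plain concatenation vs. String.format with the "%,d" grouping flag.
public class SplitLengthFormat {
    public static void main(String[] args) {
        long length = 54798765321L;
        // Hard to scan at a glance:
        System.out.println("Input split length: " + length + " bytes.");
        // Readable in a moment:
        System.out.println(String.format(Locale.US,
            "Input split length: %,d bytes.", length));
    }
}
```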
[jira] [Commented] (HBASE-10497) Add standard handling for swallowed InterruptedException thrown by Thread.sleep under HBase-Client/HBase-Server folders systematically
[ https://issues.apache.org/jira/browse/HBASE-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897708#comment-13897708 ]

Nicolas Liochon commented on HBASE-10497:

I'm not sure of this one.
{code}
+++ hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSyncUp.java (working copy)
@@ -111,6 +111,7 @@
       }
     } catch (InterruptedException e) {
       System.err.println("didn't wait long enough: " + e);
+      Thread.currentThread().interrupt();
       return (-1);
     }
{code}
Because we took care of the interruption already (maybe wrongly, however), the thread is not interrupted any more: the -1 means we stop. It's the same for the one with the split worker, likely.

This one is likely wrong:
{code}
@@ -185,6 +185,8 @@
     try {
       Thread.sleep(100);
     } catch (InterruptedException ignored) {
+      LOG.warn("Interrupted while sleeping");
+      Thread.currentThread().interrupt();
     }
     if (System.currentTimeMillis() > startTime + 30000) {
       throw new RuntimeException("Master not active after 30 seconds");
{code}
As it's inside a loop, we're likely to loop again; then the sleep will be interrupted immediately since we restored the interruption status, then we will log again, and we will flood the logs. I haven't checked the whole patch, but inside a loop you can't simply restore the status: you need to take a decision (stop the loop) or store the interruption and restore the status later. When we can, it's better to take care of the interruption explicitly by stopping our process and/or rethrowing an exception to the caller. In the case above, maybe we should throw a runtime exception, as in "Master not active after 30 seconds"? See as well ExceptionUtils (it's new).
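The loop concern in the comment above can be reproduced in miniature. Once the interrupt status is restored, the next sleep in the loop throws immediately, so merely restoring the status makes the loop spin and flood the log; one way out, as suggested, is to rethrow and leave the loop. The method and message names below are illustrative, not the actual HBase code:

```java
// Sketch: inside a retry loop, restore the interrupt status AND stop the loop
// (here by rethrowing), instead of restoring the status and looping again.
public class LoopInterrupt {
    static void waitUntilActive(int maxAttempts) {
        for (int attempts = 0; attempts < maxAttempts; attempts++) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve status for callers
                // Take a decision instead of spinning: leave the loop.
                throw new RuntimeException("Interrupted while waiting for master", e);
            }
        }
    }

    public static void main(String[] args) {
        Thread.currentThread().interrupt(); // simulate an interrupt arriving
        try {
            waitUntilActive(1000);
        } catch (RuntimeException e) {
            // The loop exited on the very first interrupted sleep, no log flood.
            System.out.println(e.getMessage());
        }
    }
}
```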
[jira] [Created] (HBASE-10498) Add new APIs to load balancer interface
rajeshbabu created HBASE-10498:

Summary: Add new APIs to load balancer interface
Key: HBASE-10498
URL: https://issues.apache.org/jira/browse/HBASE-10498
Project: HBase
Issue Type: Improvement
Components: Balancer
Reporter: rajeshbabu
Assignee: rajeshbabu
Fix For: 0.99.0

If a custom load balancer is required to maintain region and corresponding server locations, we can capture this information when we run any balancer algorithm before assignment (like random, retain). But during master startup we will not call any balancer algorithm if a region is already assigned. During split also, we open the child regions first on the RS and then notify the master through zookeeper, so split regions' information cannot be captured into the balancer. Since the balancer has access to the master, we can get the information from the online regions or the region plan data structures in the AM, but in some use cases we cannot rely on this information (mainly to maintain colocation of two tables' regions). So it's better to add some APIs to the load balancer to notify it when a *region is online or offline*. These APIs help a lot to maintain *regions colocation through a custom load balancer*, which is very important for secondary indexing.
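The proposed notification hooks can be sketched as follows. The method and type names here are illustrative (the real LoadBalancer interface works with HRegionInfo and ServerName, not Strings); the point is that online/offline callbacks let a custom balancer track region locations even for assignments it never chose, e.g. startup retains or splits reported via zookeeper:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed balancer callbacks.
interface RegionListener {
    void regionOnline(String region, String server);
    void regionOffline(String region);
}

// A custom balancer could implement the callbacks to keep an up-to-date
// region -> server map, e.g. to colocate index-table regions with their
// data-table regions (the secondary-indexing use case from the issue).
public class ColocationTracker implements RegionListener {
    private final Map<String, String> locations = new HashMap<>();

    @Override public void regionOnline(String region, String server) {
        locations.put(region, server);
    }
    @Override public void regionOffline(String region) {
        locations.remove(region);
    }
    String serverOf(String region) { return locations.get(region); }

    public static void main(String[] args) {
        ColocationTracker t = new ColocationTracker();
        t.regionOnline("usertable,aaa", "rs1");       // assignment the balancer never chose
        t.regionOnline("index_usertable,aaa", "rs1"); // colocated index region
        t.regionOffline("usertable,aaa");
        System.out.println(t.serverOf("index_usertable,aaa") + " " + t.serverOf("usertable,aaa"));
    }
}
```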
[jira] [Updated] (HBASE-8332) Add truncate as HMaster method
[ https://issues.apache.org/jira/browse/HBASE-8332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-8332:

Attachment: HBASE-8332-v3.patch

Add truncate as HMaster method

Key: HBASE-8332
URL: https://issues.apache.org/jira/browse/HBASE-8332
Project: HBase
Issue Type: Improvement
Components: master
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Minor
Attachments: HBASE-8332-v0.patch, HBASE-8332-v2.patch, HBASE-8332-v3.patch, HBASE-8332.draft.patch

Currently truncate and truncate_preserve are only shell functions, implemented as deleteTable() + createTable(). With ACLs, the user running truncate must have the rights to create a table, and only globally granted users can create tables. Add truncate() and truncatePreserve() to HBaseAdmin/HMaster with their own ACL check.
https://reviews.apache.org/r/15835/
[jira] [Updated] (HBASE-8332) Add truncate as HMaster method
[ https://issues.apache.org/jira/browse/HBASE-8332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-8332:

Attachment: (was: HBASE-8332-v3.patch)
[jira] [Updated] (HBASE-8332) Add truncate as HMaster method
[ https://issues.apache.org/jira/browse/HBASE-8332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-8332:

Attachment: HBASE-8332-v3.patch
[jira] [Created] (HBASE-10499) In write heavy scenario one of the regions does not get flushed causing RegionTooBusyException
ramkrishna.s.vasudevan created HBASE-10499:

Summary: In write heavy scenario one of the regions does not get flushed causing RegionTooBusyException
Key: HBASE-10499
URL: https://issues.apache.org/jira/browse/HBASE-10499
Project: HBase
Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
Fix For: 0.98.0, 0.98.1

I got this while testing the 0.98 RC, but I am not sure if it is specific to this version; it doesn't seem so to me. It is also somewhat similar to HBASE-5312 and HBASE-5568.
Using 10 threads I do writes to 4 RS using YCSB. The table created has 200 regions. In one of the runs with 0.98 server and 0.98 client I hit this problem: the number of hlogs kept growing and the system requested flushes for that many regions. One by one everything was flushed except one, and that one region remained unflushed. The ripple effect of this on the client side:
{code}
com.yahoo.ycsb.DBException: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 54 actions: RegionTooBusyException: 54 times,
    at com.yahoo.ycsb.db.HBaseClient.cleanup(HBaseClient.java:245)
    at com.yahoo.ycsb.DBWrapper.cleanup(DBWrapper.java:73)
    at com.yahoo.ycsb.ClientThread.run(Client.java:307)
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 54 actions: RegionTooBusyException: 54 times,
    at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:187)
    at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$500(AsyncProcess.java:171)
    at org.apache.hadoop.hbase.client.AsyncProcess.getErrors(AsyncProcess.java:897)
    at org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:961)
    at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1225)
    at com.yahoo.ycsb.db.HBaseClient.cleanup(HBaseClient.java:232)
    ... 2 more
{code}
On one of the RS:
{code}
2014-02-11 08:45:58,714 INFO [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=38, maxlogs=32; forcing flush of 23 regions(s): 97d8ae2f78910cc5ded5fbb1ddad8492, d396b8a1da05c871edcb68a15608fdf2, 01a68742a1be3a9705d574ad68fec1d7, 1250381046301e7465b6cf398759378e, 127c133f47d0419bd5ab66675aff76d4, 9f01c5d25ddc6675f750968873721253, 29c055b5690839c2fa357cd8e871741e, ca4e33e3eb0d5f8314ff9a870fc43463, acfc6ae756e193b58d956cb71ccf0aa3, 187ea304069bc2a3c825bc10a59c7e84, 0ea411edc32d5c924d04bf126fa52d1e, e2f9331fc7208b1b230a24045f3c869e, d9309ca864055eddf766a330352efc7a, 1a71bdf457288d449050141b5ff00c69, 0ba9089db28e977f86a27f90bbab9717, fdbb3242d3b673bbe4790a47bc30576f, bbadaa1f0e62d8a8650080b824187850, b1a5de30d8603bd5d9022e09c574501b, cc6a9fabe44347ed65e7c325faa72030, 313b17dbff2497f5041b57fe13fa651e, 6b788c498503ddd3e1433a4cd3fb4e39, 3d71274fe4f815882e9626e1cfa050d1, acc43e4b42c1a041078774f4f20a3ff5
..
2014-02-11 08:47:49,580 INFO [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=53, maxlogs=32; forcing flush of 2 regions(s): fdbb3242d3b673bbe4790a47bc30576f, 6b788c498503ddd3e1433a4cd3fb4e39
{code}
{code}
2014-02-11 09:42:44,237 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user3654,1392107806977.fdbb3242d3b673bbe4790a47bc30576f. after a delay of 16689
2014-02-11 09:42:44,237 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user6264,1392107806983.6b788c498503ddd3e1433a4cd3fb4e39. after a delay of 15868
2014-02-11 09:42:54,238 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user3654,1392107806977.fdbb3242d3b673bbe4790a47bc30576f. after a delay of 20847
2014-02-11 09:42:54,238 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user6264,1392107806983.6b788c498503ddd3e1433a4cd3fb4e39. after a delay of 20099
2014-02-11 09:43:04,238 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user3654,1392107806977.fdbb3242d3b673bbe4790a47bc30576f. after a delay of 8677
{code}
{code}
2014-02-11 10:31:21,020 INFO [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=54, maxlogs=32; forcing flush of 1 regions(s): fdbb3242d3b673bbe4790a47bc30576f
{code}
I restarted another RS and there were region movements involving other regions, but this region stayed with the RS that has this issue. One important observation is that in
[jira] [Commented] (HBASE-10499) In write heavy scenario one of the regions does not get flushed causing RegionTooBusyException
[ https://issues.apache.org/jira/browse/HBASE-10499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897779#comment-13897779 ]

ramkrishna.s.vasudevan commented on HBASE-10499:

Am not sure if this could come up in 0.96 and trunk also. I feel it is possible in 0.96, but with trunk (the recent HLog disruptor changes) I am not sure. It may also be possible in 0.94. I don't have any solution in hand except adding log messages for the case where memstoreSize could be zero. Will check more on this.
The ripple effect of this on the client side {code} com.yahoo.ycsb.DBException: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 54 actions: RegionTooBusyException: 54 times, at com.yahoo.ycsb.db.HBaseClient.cleanup(HBaseClient.java:245) at com.yahoo.ycsb.DBWrapper.cleanup(DBWrapper.java:73) at com.yahoo.ycsb.ClientThread.run(Client.java:307) Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 54 actions: RegionTooBusyException: 54 times, at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:187) at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$500(AsyncProcess.java:171) at org.apache.hadoop.hbase.client.AsyncProcess.getErrors(AsyncProcess.java:897) at org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:961) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1225) at com.yahoo.ycsb.db.HBaseClient.cleanup(HBaseClient.java:232) ... 2 more {code} On one of the RS {code} 2014-02-11 08:45:58,714 INFO [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=38, maxlogs=32; forcing flush of 23 regions(s): 97d8ae2f78910cc5ded5fbb1ddad8492, d396b8a1da05c871edcb68a15608fdf2, 01a68742a1be3a9705d574ad68fec1d7, 1250381046301e7465b6cf398759378e, 127c133f47d0419bd5ab66675aff76d4, 9f01c5d25ddc6675f750968873721253, 29c055b5690839c2fa357cd8e871741e, ca4e33e3eb0d5f8314ff9a870fc43463, acfc6ae756e193b58d956cb71ccf0aa3, 187ea304069bc2a3c825bc10a59c7e84, 0ea411edc32d5c924d04bf126fa52d1e, e2f9331fc7208b1b230a24045f3c869e, d9309ca864055eddf766a330352efc7a, 1a71bdf457288d449050141b5ff00c69, 0ba9089db28e977f86a27f90bbab9717, fdbb3242d3b673bbe4790a47bc30576f, bbadaa1f0e62d8a8650080b824187850, b1a5de30d8603bd5d9022e09c574501b, cc6a9fabe44347ed65e7c325faa72030, 313b17dbff2497f5041b57fe13fa651e, 6b788c498503ddd3e1433a4cd3fb4e39, 3d71274fe4f815882e9626e1cfa050d1, acc43e4b42c1a041078774f4f20a3ff5 .. 
2014-02-11 08:47:49,580 INFO [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=53, maxlogs=32; forcing flush of 2 regions(s): fdbb3242d3b673bbe4790a47bc30576f, 6b788c498503ddd3e1433a4cd3fb4e39 {code} {code} 2014-02-11 09:42:44,237 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user3654,1392107806977.fdbb3242d3b673bbe4790a47bc30576f. after a delay of 16689 2014-02-11 09:42:44,237 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user6264,1392107806983.6b788c498503ddd3e1433a4cd3fb4e39. after a delay of 15868 2014-02-11 09:42:54,238 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user3654,1392107806977.fdbb3242d3b673bbe4790a47bc30576f. after a delay of 20847 2014-02-11 09:42:54,238 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user6264,1392107806983.6b788c498503ddd3e1433a4cd3fb4e39. after a delay
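The "Too many hlogs" messages above come from the WAL roller asking regions that still pin old log files to flush so those logs can be archived. A toy model of that selection logic (the class, the map layout, and MAX_LOGS are illustrative, not HBase's actual FSHLog code) helps explain why the same two region hashes keep reappearing: a region whose oldest unflushed edit lives in one of the oldest WALs must flush before those WALs can go away.

```java
import java.util.*;

// Toy model of the "Too many hlogs: forcing flush" selection seen in the
// quoted FSHLog messages. Names here are illustrative, not HBase classes.
public class WalPressureSketch {
    static final int MAX_LOGS = 32;

    // Given, for each region, the index of the oldest WAL still holding its
    // unflushed edits, return the regions that must flush before the
    // (walCount - MAX_LOGS) oldest WALs can be archived.
    static List<String> regionsToFlush(Map<String, Integer> oldestWalForRegion, int walCount) {
        if (walCount <= MAX_LOGS) return Collections.emptyList();
        int cutoff = walCount - MAX_LOGS; // WAL indices 0..cutoff-1 must go
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : oldestWalForRegion.entrySet()) {
            // This region pins one of the oldest WALs -> force a flush.
            if (e.getValue() < cutoff) result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> regions = new LinkedHashMap<>();
        regions.put("fdbb3242", 0);  // pinned to the very first WAL -> must flush
        regions.put("6b788c49", 1);  // also pins an old WAL
        regions.put("97d8ae2f", 40); // only recent edits -> safe
        System.out.println(regionsToFlush(regions, 38)); // prints [fdbb3242, 6b788c49]
    }
}
```

In the bug report, the requested flush for one such region never actually happens, so the log count keeps climbing (38, 53, 54 ...) with the same region named every time.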
[jira] [Updated] (HBASE-10499) In write heavy scenario one of the regions does not get flushed causing RegionTooBusyException
[ https://issues.apache.org/jira/browse/HBASE-10499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-10499: --- Affects Version/s: 0.98.0
[jira] [Commented] (HBASE-10499) In write heavy scenario one of the regions does not get flushed causing RegionTooBusyException
[ https://issues.apache.org/jira/browse/HBASE-10499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897780#comment-13897780 ] ramkrishna.s.vasudevan commented on HBASE-10499: I have logs and thread dumps taken during this time. If needed, I can attach them here.
[jira] [Updated] (HBASE-10499) In write heavy scenario one of the regions does not get flushed causing RegionTooBusyException
[ https://issues.apache.org/jira/browse/HBASE-10499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-10499: --- Fix Version/s: (was: 0.98.0)
[jira] [Commented] (HBASE-10499) In write heavy scenario one of the regions does not get flushed causing RegionTooBusyException
[ https://issues.apache.org/jira/browse/HBASE-10499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897787#comment-13897787 ] ramkrishna.s.vasudevan commented on HBASE-10499: In HBASE-5568 and HBASE-5312 there were multiple flushes on the same region and regions were being split. Nothing like that happens here: the region in question has never been flushed even once, and no splits or compactions have occurred on it.
[jira] [Commented] (HBASE-8332) Add truncate as HMaster method
[ https://issues.apache.org/jira/browse/HBASE-8332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897799#comment-13897799 ] Hadoop QA commented on HBASE-8332: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12628180/HBASE-8332-v3.patch against trunk revision . ATTACHMENT ID: 12628180 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 14 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. 
The patch introduces the following lines longer than 100: +private TruncateTableRequest(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); } +private TruncateTableResponse(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); } +private EnableTableRequest(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); } +private EnableTableResponse(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); } +private DisableTableRequest(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); } +private DisableTableResponse(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); } +private ModifyTableRequest(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); } +.preTruncateTable(ObserverContext.createAndPrepare(CP_ENV, null), TEST_TABLE.getTableName()); {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8660//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8660//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8660//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8660//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8660//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8660//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8660//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8660//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8660//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8660//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8660//console This message is automatically generated. Add truncate as HMaster method -- Key: HBASE-8332 URL: https://issues.apache.org/jira/browse/HBASE-8332 Project: HBase Issue Type: Improvement Components: master Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Attachments: HBASE-8332-v0.patch, HBASE-8332-v2.patch, HBASE-8332-v3.patch, HBASE-8332.draft.patch Currently truncate and truncate_preserve are only shell functions, and implemented as deleteTable() + createTable(). 
With ACLs, the user running truncate must have the right to create a table, and only globally granted users can create tables. Add truncate() and truncatePreserve() to HBaseAdmin/HMaster with their own ACL check.
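The shell-side implementation the description refers to is essentially disable + delete + re-create. A minimal sketch of that sequence against a toy in-memory catalog (TableCatalog, Perm, and the permission model here are illustrative, not HBase's actual AccessController) shows why the caller currently needs a global CREATE right, which is the motivation for a dedicated master-side truncate with its own ACL check:

```java
import java.util.*;

// Toy in-memory catalog; TableCatalog/Perm are illustrative, not HBase API.
public class TruncateSketch {
    enum Perm { ADMIN, CREATE }

    static class TableCatalog {
        final Map<String, List<String>> tables = new HashMap<>(); // name -> rows
        final Set<Perm> callerPerms;
        TableCatalog(Set<Perm> perms) { this.callerPerms = perms; }

        void deleteTable(String name) {
            require(Perm.ADMIN);
            tables.remove(name);
        }
        void createTable(String name) {
            require(Perm.CREATE); // the global right truncate should not need
            tables.put(name, new ArrayList<>());
        }
        // truncate implemented as deleteTable() + createTable(), like the shell.
        void truncate(String name) {
            deleteTable(name);
            createTable(name);
        }
        private void require(Perm p) {
            if (!callerPerms.contains(p))
                throw new SecurityException("missing " + p);
        }
    }

    public static void main(String[] args) {
        TableCatalog adminOnly = new TableCatalog(EnumSet.of(Perm.ADMIN));
        adminOnly.tables.put("t1", new ArrayList<>(List.of("row1")));
        try {
            adminOnly.truncate("t1"); // delete succeeds, create is denied
        } catch (SecurityException e) {
            System.out.println("truncate failed: " + e.getMessage());
        }
    }
}
```

Note the failure mode: the table is already gone when the create step is denied, another reason to make truncate a single master operation rather than two client-side calls.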
[jira] [Commented] (HBASE-8332) Add truncate as HMaster method
[ https://issues.apache.org/jira/browse/HBASE-8332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897809#comment-13897809 ] Hadoop QA commented on HBASE-8332: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12628184/HBASE-8332-v3.patch against trunk revision . ATTACHMENT ID: 12628184 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 14 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. 
The patch introduces the following lines longer than 100: +private TruncateTableRequest(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); } +private TruncateTableResponse(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); } +private EnableTableRequest(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); } +private EnableTableResponse(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); } +private DisableTableRequest(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); } +private DisableTableResponse(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); } +private ModifyTableRequest(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); } +.preTruncateTable(ObserverContext.createAndPrepare(CP_ENV, null), TEST_TABLE.getTableName()); {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8661//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8661//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8661//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8661//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8661//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8661//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8661//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8661//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8661//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8661//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8661//console This message is automatically generated. Add truncate as HMaster method -- Key: HBASE-8332 URL: https://issues.apache.org/jira/browse/HBASE-8332 Project: HBase Issue Type: Improvement Components: master Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Attachments: HBASE-8332-v0.patch, HBASE-8332-v2.patch, HBASE-8332-v3.patch, HBASE-8332.draft.patch https://reviews.apache.org/r/15835/
[jira] [Commented] (HBASE-10486) ProtobufUtil Append Increment deserialization lost cell level timestamp
[ https://issues.apache.org/jira/browse/HBASE-10486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897814#comment-13897814 ] Hudson commented on HBASE-10486: SUCCESS: Integrated in hbase-0.96-hadoop2 #199 (See [https://builds.apache.org/job/hbase-0.96-hadoop2/199/]) HBASE-10486: ProtobufUtil Append Increment deserialization lost cell level timestamp (jeffreyz: rev 1566962) * /hbase/branches/0.96/hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java * /hbase/branches/0.96/hbase-server/src/test/java/org/apache/hadoop/hbase/protobuf/TestProtobufUtil.java ProtobufUtil Append Increment deserialization lost cell level timestamp - Key: HBASE-10486 URL: https://issues.apache.org/jira/browse/HBASE-10486 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1 Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.98.1, 0.99.0 Attachments: hbase-10486-v2.patch, hbase-10486.patch When deserializing an Append or Increment, we use the wrong timestamp value in the trunk/0.98 code and discard the value in the 0.96 code base. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
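The gist of the fix is to keep a cell's own timestamp when it was set, instead of overwriting it during deserialization. A minimal sketch of that idea (resolveTimestamp and the LATEST_TIMESTAMP sentinel here are illustrative, not the actual ProtobufUtil code):

```java
// Illustrative model of preserving a cell-level timestamp on deserialization.
public class TimestampSketch {
    static final long LATEST_TIMESTAMP = Long.MAX_VALUE; // sentinel: "unset"

    // Returns the timestamp for a deserialized cell: keep the cell's own
    // timestamp when it was explicitly set, otherwise fall back to the
    // operation-level (Append/Increment) timestamp.
    static long resolveTimestamp(long cellTs, long opTs) {
        return cellTs != LATEST_TIMESTAMP ? cellTs : opTs;
    }

    public static void main(String[] args) {
        // Cell carried its own ts -> must survive the round trip.
        System.out.println(resolveTimestamp(1392107806983L, 5L)); // 1392107806983
        // Cell ts unset -> use the operation-level ts.
        System.out.println(resolveTimestamp(LATEST_TIMESTAMP, 5L)); // 5
    }
}
```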
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897820#comment-13897820 ] Lukas Nalezenec commented on HBASE-10413: - One more thing: there is some versioning in class TableSplit (methods write/read). Don't we need to increment it? (I am just asking.) Tablesplit.getLength returns 0 -- Key: HBASE-10413 URL: https://issues.apache.org/jira/browse/HBASE-10413 Project: HBase Issue Type: Bug Components: Client, mapreduce Affects Versions: 0.96.1.1 Reporter: Lukas Nalezenec Assignee: Lukas Nalezenec Fix For: 0.98.1, 0.99.0 Attachments: 10413-7.patch, HBASE-10413-2.patch, HBASE-10413-3.patch, HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, HBASE-10413.patch InputSplits should be sorted by length but TableSplit does not contain a real getLength implementation: @Override public long getLength() { // Not clear how to obtain this... seems to be used only for sorting splits return 0; } This is causing us problems with scheduling - we have jobs that are supposed to finish in a limited time, but they often get stuck in the last mapper working on a large region. Can we implement this method? What is the best way? We were thinking about estimating the size from the size of the files on HDFS: get a Scanner from the TableSplit, use startRow, stopRow and column families to find the corresponding region, then compute the HDFS size for that region and column family. Update: This ticket was about a production issue - I talked with the guy who worked on this and he said our production issue was probably not directly caused by getLength() returning 0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
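The scheduling problem described above comes from how MapReduce uses getLength(): splits are sorted by length so the largest ones are scheduled first, and a constant 0 makes that ordering arbitrary. A minimal, self-contained sketch of the effect (the Split class here is a hypothetical stand-in, not HBase's TableSplit):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for an InputSplit: only the length matters here.
class Split {
    final String region;
    final long length; // estimated bytes, e.g. from the region's HDFS store files

    Split(String region, long length) {
        this.region = region;
        this.length = length;
    }
}

public class SplitSortDemo {
    public static void main(String[] args) {
        List<Split> splits = new ArrayList<>();
        splits.add(new Split("regionA", 0L)); // what getLength()==0 reports for everything
        splits.add(new Split("regionB", 512L * 1024 * 1024));
        splits.add(new Split("regionC", 64L * 1024 * 1024));

        // MapReduce schedules the biggest splits first; with all lengths 0 the
        // order is arbitrary and a huge region can land in the last mapper wave.
        splits.sort(Comparator.comparingLong((Split s) -> s.length).reversed());

        System.out.println(splits.get(0).region); // regionB: largest starts first
    }
}
```

With real length estimates the 512 MB region is scheduled first instead of possibly last.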
[jira] [Updated] (HBASE-10497) Add standard handling for swallowed InterruptedException thrown by Thread.sleep under HBase-Client/HBase-Server folders systematically
[ https://issues.apache.org/jira/browse/HBASE-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Honghua updated HBASE-10497: - Attachment: HBASE-10497-trunk_v2.patch Patch attached per [~nkeywal]'s review feedback, and thanks [~nkeywal] again:-) Add standard handling for swallowed InterruptedException thrown by Thread.sleep under HBase-Client/HBase-Server folders systematically -- Key: HBASE-10497 URL: https://issues.apache.org/jira/browse/HBASE-10497 Project: HBase Issue Type: Improvement Components: Client, regionserver Reporter: Feng Honghua Assignee: Feng Honghua Priority: Minor Attachments: HBASE-10497-trunk_v1.patch, HBASE-10497-trunk_v2.patch There are many places where InterruptedException thrown by Thread.sleep are swallowed silently (which are neither declared in the caller method's throws clause nor rethrown immediately) under HBase-Client/HBase-Server folders. It'd be better to add standard 'log and call currentThread.interrupt' for such cases. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10497) Add standard handling for swallowed InterruptedException thrown by Thread.sleep under HBase-Client/HBase-Server folders systematically
[ https://issues.apache.org/jira/browse/HBASE-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897827#comment-13897827 ] Feng Honghua commented on HBASE-10497: -- Thanks [~nkeywal] for the prompt review! bq.This one is likely wrong...As it's inside a loop, we're likely to loop again, then the sleep will be interrupted immediately as we restored the interruption status, then we will log again = we will flood the logs Good catch, my carelessness. To keep the same semantics (in terms of how/when we handle the exception), I use the code block below for non-cancelable tasks such as while/for loops, to prevent a later sleep from immediately throwing InterruptedException due to the re-interrupt issued by an earlier iteration's catch:
{code}
boolean interrupted = false;
try {
  while (...) {
    try {
      ...
      Thread.sleep(...);
      ...
    } catch (InterruptedException e) {
      interrupted = true;
    }
  }
} finally {
  if (interrupted) {
    LOG.warn(...);
    Thread.currentThread().interrupt();
  }
}
{code}
bq.Because we took care of the interruption already (may be wrongly however), so the thread is not interrupted any more: the -1 means: we stop. It's the same for the one with the split worker likely. As you said, we do take care of the interruption already, but since they are within a Runnable and a Tool respectively, it is still meaningful to re-interrupt the current thread so that code higher up the call stack can know about the interrupt, right? At least it does no harm to re-interrupt here. 
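The deferred-re-interrupt pattern above can be exercised with a small self-contained demo (a hypothetical worker, not HBase code): the loop swallows the interrupt while it runs, so later sleeps are not broken immediately, and the thread's interrupt status is restored exactly once after the loop exits.

```java
public class DeferredInterruptDemo {
    public static void main(String[] args) throws Exception {
        Thread worker = new Thread(() -> {
            boolean interrupted = false;
            try {
                for (int i = 0; i < 3; i++) {
                    try {
                        Thread.sleep(50); // simulated non-cancelable work
                    } catch (InterruptedException e) {
                        // Swallow for now; re-interrupting here would make the
                        // next iteration's sleep throw immediately and flood logs.
                        interrupted = true;
                    }
                }
            } finally {
                if (interrupted) {
                    // Restore the interrupt status once, for callers up the stack.
                    Thread.currentThread().interrupt();
                }
            }
            System.out.println("interrupted=" + Thread.currentThread().isInterrupted());
        });
        worker.start();
        Thread.sleep(10);
        worker.interrupt(); // lands during the first sleep
        worker.join();
    }
}
```

All three loop iterations complete despite the interrupt, and the flag is set again only at the end.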
Add standard handling for swallowed InterruptedException thrown by Thread.sleep under HBase-Client/HBase-Server folders systematically -- Key: HBASE-10497 URL: https://issues.apache.org/jira/browse/HBASE-10497 Project: HBase Issue Type: Improvement Components: Client, regionserver Reporter: Feng Honghua Assignee: Feng Honghua Priority: Minor Attachments: HBASE-10497-trunk_v1.patch There are many places where InterruptedException thrown by Thread.sleep are swallowed silently (which are neither declared in the caller method's throws clause nor rethrown immediately) under HBase-Client/HBase-Server folders. It'd be better to add standard 'log and call currentThread.interrupt' for such cases. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10487) Avoid allocating new KeyValue and according bytes-copying for appended kvs which don't have existing values
[ https://issues.apache.org/jira/browse/HBASE-10487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10487: --- Fix Version/s: 0.99.0 Avoid allocating new KeyValue and according bytes-copying for appended kvs which don't have existing values --- Key: HBASE-10487 URL: https://issues.apache.org/jira/browse/HBASE-10487 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Feng Honghua Assignee: Feng Honghua Fix For: 0.99.0 Attachments: HBASE-10487-trunk_v1.patch In HRegion.append, new KeyValues are allocated and the corresponding bytes copied regardless of whether existing kvs are present for the appended cells. We can improve this by avoiding the allocation of a new KeyValue and the byte copying for kvs which don't have existing (old) values, by reusing the passed-in kv and only updating its timestamp to 'now' (its original timestamp is latest, so it can be updated). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
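A toy model of the proposed optimization (the Kv class is a simplified stand-in, not HBase's KeyValue): when no old value exists for the appended row, the incoming kv is reused with a refreshed timestamp; only when an old value exists is a merged kv allocated and bytes copied.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified cell; not HBase's KeyValue.
class Kv {
    final String row;
    byte[] value;
    long timestamp;

    Kv(String row, byte[] value, long timestamp) {
        this.row = row;
        this.value = value;
        this.timestamp = timestamp;
    }
}

public class AppendReuseDemo {
    static final Map<String, Kv> store = new HashMap<>();

    // Append semantics: concatenate with any existing value for the row.
    static Kv append(Kv incoming, long now) {
        Kv existing = store.get(incoming.row);
        if (existing == null) {
            // No old value: reuse the passed-in kv, only refresh its timestamp.
            incoming.timestamp = now;
            store.put(incoming.row, incoming);
            return incoming;
        }
        // Old value present: allocate a merged kv and copy both byte arrays.
        byte[] merged = new byte[existing.value.length + incoming.value.length];
        System.arraycopy(existing.value, 0, merged, 0, existing.value.length);
        System.arraycopy(incoming.value, 0, merged, existing.value.length, incoming.value.length);
        Kv result = new Kv(incoming.row, merged, now);
        store.put(incoming.row, result);
        return result;
    }

    public static void main(String[] args) {
        Kv first = new Kv("r1", "ab".getBytes(), Long.MAX_VALUE);
        Kv out1 = append(first, 100L);
        System.out.println(out1 == first);          // reused: no allocation or copy
        Kv out2 = append(new Kv("r1", "cd".getBytes(), Long.MAX_VALUE), 200L);
        System.out.println(new String(out2.value)); // merged copy of both values
    }
}
```

The first append takes the fast path (same object back), the second pays for the copy because a prior value exists.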
[jira] [Commented] (HBASE-10499) In write heavy scenario one of the regions does not get flushed causing RegionTooBusyException
[ https://issues.apache.org/jira/browse/HBASE-10499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898030#comment-13898030 ] Ted Yu commented on HBASE-10499: bq. I have logs and thread dumps taken during this time Please attach them. In write heavy scenario one of the regions does not get flushed causing RegionTooBusyException -- Key: HBASE-10499 URL: https://issues.apache.org/jira/browse/HBASE-10499 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.98.1 I got this while testing the 0.98 RC, but I am not sure it is specific to this version; it doesn't seem so to me. It is also somewhat similar to HBASE-5312 and HBASE-5568. Using 10 threads I do writes to 4 RSs using YCSB. The table created has 200 regions. In one of the runs with a 0.98 server and 0.98 client I hit this problem: the hlog count grew and the system requested flushes for that many regions. One by one everything was flushed except one, and that one region remained unflushed. 
The ripple effect of this on the client side {code} com.yahoo.ycsb.DBException: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 54 actions: RegionTooBusyException: 54 times, at com.yahoo.ycsb.db.HBaseClient.cleanup(HBaseClient.java:245) at com.yahoo.ycsb.DBWrapper.cleanup(DBWrapper.java:73) at com.yahoo.ycsb.ClientThread.run(Client.java:307) Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 54 actions: RegionTooBusyException: 54 times, at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:187) at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$500(AsyncProcess.java:171) at org.apache.hadoop.hbase.client.AsyncProcess.getErrors(AsyncProcess.java:897) at org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:961) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1225) at com.yahoo.ycsb.db.HBaseClient.cleanup(HBaseClient.java:232) ... 2 more {code} On one of the RS {code} 2014-02-11 08:45:58,714 INFO [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=38, maxlogs=32; forcing flush of 23 regions(s): 97d8ae2f78910cc5ded5fbb1ddad8492, d396b8a1da05c871edcb68a15608fdf2, 01a68742a1be3a9705d574ad68fec1d7, 1250381046301e7465b6cf398759378e, 127c133f47d0419bd5ab66675aff76d4, 9f01c5d25ddc6675f750968873721253, 29c055b5690839c2fa357cd8e871741e, ca4e33e3eb0d5f8314ff9a870fc43463, acfc6ae756e193b58d956cb71ccf0aa3, 187ea304069bc2a3c825bc10a59c7e84, 0ea411edc32d5c924d04bf126fa52d1e, e2f9331fc7208b1b230a24045f3c869e, d9309ca864055eddf766a330352efc7a, 1a71bdf457288d449050141b5ff00c69, 0ba9089db28e977f86a27f90bbab9717, fdbb3242d3b673bbe4790a47bc30576f, bbadaa1f0e62d8a8650080b824187850, b1a5de30d8603bd5d9022e09c574501b, cc6a9fabe44347ed65e7c325faa72030, 313b17dbff2497f5041b57fe13fa651e, 6b788c498503ddd3e1433a4cd3fb4e39, 3d71274fe4f815882e9626e1cfa050d1, acc43e4b42c1a041078774f4f20a3ff5 .. 
2014-02-11 08:47:49,580 INFO [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=53, maxlogs=32; forcing flush of 2 regions(s): fdbb3242d3b673bbe4790a47bc30576f, 6b788c498503ddd3e1433a4cd3fb4e39 {code} {code} 2014-02-11 09:42:44,237 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user3654,1392107806977.fdbb3242d3b673bbe4790a47bc30576f. after a delay of 16689 2014-02-11 09:42:44,237 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user6264,1392107806983.6b788c498503ddd3e1433a4cd3fb4e39. after a delay of 15868 2014-02-11 09:42:54,238 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user3654,1392107806977.fdbb3242d3b673bbe4790a47bc30576f. after a delay of 20847 2014-02-11 09:42:54,238 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user6264,1392107806983.6b788c498503ddd3e1433a4cd3fb4e39. after a delay of 20099 2014-02-11 09:43:04,238 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user3654,1392107806977.fdbb3242d3b673bbe4790a47bc30576f.
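The "Too many hlogs" lines above reflect a simple policy that can be sketched as follows (a simplified model, not the FSHLog implementation): when the WAL count exceeds maxlogs, the server requests flushes for every region that still has unflushed edits in the oldest surplus WALs, so those files can be archived.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class WalRollDemo {
    public static void main(String[] args) {
        int maxLogs = 2; // cf. maxlogs=32 in the log excerpt
        // Oldest-first: WAL file -> regions with unflushed edits still in it.
        Map<String, Set<String>> wals = new LinkedHashMap<>();
        wals.put("wal-1", new LinkedHashSet<>(Arrays.asList("regionA", "regionB")));
        wals.put("wal-2", new LinkedHashSet<>(Arrays.asList("regionB")));
        wals.put("wal-3", new LinkedHashSet<>(Arrays.asList("regionC")));

        // Flush every region pinned by the oldest (size - maxLogs) WALs.
        List<String> toFlush = new ArrayList<>();
        int excess = wals.size() - maxLogs;
        for (Map.Entry<String, Set<String>> e : wals.entrySet()) {
            if (excess-- <= 0) break;
            for (String region : e.getValue()) {
                if (!toFlush.contains(region)) toFlush.add(region);
            }
        }
        // Once regionA and regionB flush, wal-1 holds no live edits and is archived.
        System.out.println(toFlush);
    }
}
```

In the reported bug the flush requests keep being issued for the same two regions but never take effect, so the WAL count keeps growing.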
[jira] [Updated] (HBASE-10498) Add new APIs to load balancer interface
[ https://issues.apache.org/jira/browse/HBASE-10498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Taylor updated HBASE-10498: - Tags: Phoenix Add new APIs to load balancer interface --- Key: HBASE-10498 URL: https://issues.apache.org/jira/browse/HBASE-10498 Project: HBase Issue Type: Improvement Components: Balancer Reporter: rajeshbabu Assignee: rajeshbabu Fix For: 0.99.0 If a custom load balancer needs to maintain regions and their corresponding server locations, we can capture this information when we run any balancer algorithm before assignment (like random, retain). But during master startup we will not call any balancer algorithm if a region is already assigned. During split, too, we open child regions first on the RS and then notify the master through zookeeper, so split region information cannot be captured by the balancer. Since the balancer has access to the master we can get the information from the online regions or region plan data structures in the AM, but in some use cases we cannot rely on this information (mainly to maintain colocation of two tables' regions). So it's better to add some APIs to the load balancer to notify it when a *region is online or offline*. These APIs help a lot to maintain *region colocation through a custom load balancer*, which is very important in secondary indexing. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
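A sketch of what the proposed notification APIs might look like (all names here are hypothetical, not a committed interface): a colocation-aware balancer implements online/offline callbacks to keep its region-to-server map current even for assignments that bypass balance(), such as master startup and post-split region opens.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical callback surface for the proposed balancer notifications.
interface RegionLocationObserver {
    void regionOnline(String regionName, String serverName);
    void regionOffline(String regionName);
}

// A custom balancer keeping a live region -> server map via the callbacks,
// so it can answer colocation questions for e.g. secondary index tables.
class ColocationTracker implements RegionLocationObserver {
    final Map<String, String> locations = new HashMap<>();

    public void regionOnline(String regionName, String serverName) {
        locations.put(regionName, serverName);
    }

    public void regionOffline(String regionName) {
        locations.remove(regionName);
    }

    boolean colocated(String regionA, String regionB) {
        String a = locations.get(regionA);
        return a != null && a.equals(locations.get(regionB));
    }
}

public class BalancerCallbackDemo {
    public static void main(String[] args) {
        ColocationTracker tracker = new ColocationTracker();
        tracker.regionOnline("userTable,a", "rs1");   // e.g. fired at master startup
        tracker.regionOnline("indexTable,a", "rs1");  // e.g. fired after a split open
        System.out.println(tracker.colocated("userTable,a", "indexTable,a"));
        tracker.regionOffline("indexTable,a");
        System.out.println(tracker.colocated("userTable,a", "indexTable,a"));
    }
}
```

The point of the issue is precisely that today no such callbacks exist, so the tracker would miss startup and split assignments.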
[jira] [Commented] (HBASE-10498) Add new APIs to load balancer interface
[ https://issues.apache.org/jira/browse/HBASE-10498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898035#comment-13898035 ] James Taylor commented on HBASE-10498: -- Can this work be done in 0.98? Add new APIs to load balancer interface --- Key: HBASE-10498 URL: https://issues.apache.org/jira/browse/HBASE-10498 Project: HBase Issue Type: Improvement Components: Balancer Reporter: rajeshbabu Assignee: rajeshbabu Fix For: 0.99.0 If a custom load balancer required to maintain region and corresponding server locations, we can capture this information when we run any balancer algorithm before assignment(like random,retain). But during master startup we will not call any balancer algorithm if a region already assinged During split also we open child regions first in RS and then notify to master through zookeeper. So split regions information cannot be captured into balancer. Since balancer has access to master we can get the information from online regions or region plan data structures in AM. But some use cases we cannot relay on this information(mainly to maintain colocation of two tables regions). So it's better to add some APIs to load balancer to notify balancer when *region is online or offline*. These APIs helps a lot to maintain *regions colocation through custom load balancer* which is very important in secondary indexing. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10499) In write heavy scenario one of the regions does not get flushed causing RegionTooBusyException
[ https://issues.apache.org/jira/browse/HBASE-10499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-10499: --- Attachment: hbase-root-regionserver-ip-10-93-128-92.zip t2.dump t1.dump In write heavy scenario one of the regions does not get flushed causing RegionTooBusyException -- Key: HBASE-10499 URL: https://issues.apache.org/jira/browse/HBASE-10499 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.98.1 Attachments: hbase-root-regionserver-ip-10-93-128-92.zip, t1.dump, t2.dump I got this while testing 0.98RC. But am not sure if it is specific to this version. Doesn't seem so to me. Also it is something similar to HBASE-5312 and HBASE-5568. Using 10 threads i do writes to 4 RS using YCSB. The table created has 200 regions. In one of the run with 0.98 server and 0.98 client I faced this problem like the hlogs became more and the system requested flushes for those many regions. One by one everything was flushed except one and that one thing remained unflushed. 
The ripple effect of this on the client side {code} com.yahoo.ycsb.DBException: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 54 actions: RegionTooBusyException: 54 times, at com.yahoo.ycsb.db.HBaseClient.cleanup(HBaseClient.java:245) at com.yahoo.ycsb.DBWrapper.cleanup(DBWrapper.java:73) at com.yahoo.ycsb.ClientThread.run(Client.java:307) Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 54 actions: RegionTooBusyException: 54 times, at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:187) at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$500(AsyncProcess.java:171) at org.apache.hadoop.hbase.client.AsyncProcess.getErrors(AsyncProcess.java:897) at org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:961) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1225) at com.yahoo.ycsb.db.HBaseClient.cleanup(HBaseClient.java:232) ... 2 more {code} On one of the RS {code} 2014-02-11 08:45:58,714 INFO [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=38, maxlogs=32; forcing flush of 23 regions(s): 97d8ae2f78910cc5ded5fbb1ddad8492, d396b8a1da05c871edcb68a15608fdf2, 01a68742a1be3a9705d574ad68fec1d7, 1250381046301e7465b6cf398759378e, 127c133f47d0419bd5ab66675aff76d4, 9f01c5d25ddc6675f750968873721253, 29c055b5690839c2fa357cd8e871741e, ca4e33e3eb0d5f8314ff9a870fc43463, acfc6ae756e193b58d956cb71ccf0aa3, 187ea304069bc2a3c825bc10a59c7e84, 0ea411edc32d5c924d04bf126fa52d1e, e2f9331fc7208b1b230a24045f3c869e, d9309ca864055eddf766a330352efc7a, 1a71bdf457288d449050141b5ff00c69, 0ba9089db28e977f86a27f90bbab9717, fdbb3242d3b673bbe4790a47bc30576f, bbadaa1f0e62d8a8650080b824187850, b1a5de30d8603bd5d9022e09c574501b, cc6a9fabe44347ed65e7c325faa72030, 313b17dbff2497f5041b57fe13fa651e, 6b788c498503ddd3e1433a4cd3fb4e39, 3d71274fe4f815882e9626e1cfa050d1, acc43e4b42c1a041078774f4f20a3ff5 .. 
2014-02-11 08:47:49,580 INFO [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=53, maxlogs=32; forcing flush of 2 regions(s): fdbb3242d3b673bbe4790a47bc30576f, 6b788c498503ddd3e1433a4cd3fb4e39 {code} {code} 2014-02-11 09:42:44,237 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user3654,1392107806977.fdbb3242d3b673bbe4790a47bc30576f. after a delay of 16689 2014-02-11 09:42:44,237 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user6264,1392107806983.6b788c498503ddd3e1433a4cd3fb4e39. after a delay of 15868 2014-02-11 09:42:54,238 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user3654,1392107806977.fdbb3242d3b673bbe4790a47bc30576f. after a delay of 20847 2014-02-11 09:42:54,238 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user6264,1392107806983.6b788c498503ddd3e1433a4cd3fb4e39. after a delay of 20099 2014-02-11 09:43:04,238 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898037#comment-13898037 ] Nick Dimiduk commented on HBASE-10413: -- bq. Can we add some formatting consistent with hbase coding standards ? Maybe String.format i dont know. I agree, this is difficult to read. Usually we use [StringUtils#humanReadableInt(long)|http://hadoop.apache.org/docs/r1.1.1/api/org/apache/hadoop/util/StringUtils.html#humanReadableInt(long)]. Tablesplit.getLength returns 0 -- Key: HBASE-10413 URL: https://issues.apache.org/jira/browse/HBASE-10413 Project: HBase Issue Type: Bug Components: Client, mapreduce Affects Versions: 0.96.1.1 Reporter: Lukas Nalezenec Assignee: Lukas Nalezenec Fix For: 0.98.1, 0.99.0 Attachments: 10413-7.patch, HBASE-10413-2.patch, HBASE-10413-3.patch, HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, HBASE-10413.patch InputSplits should be sorted by length but TableSplit does not contain real getLength implementation: @Override public long getLength() { // Not clear how to obtain this... seems to be used only for sorting splits return 0; } This is causing us problem with scheduling - we have got jobs that are supposed to finish in limited time but they get often stuck in last mapper working on large region. Can we implement this method ? What is the best way ? We were thinking about estimating size by size of files on HDFS. We would like to get Scanner from TableSplit, use startRow, stopRow and column families to get corresponding region than computing size of HDFS for given region and column family. Update: This ticket was about production issue - I talked with guy who worked on this and he said our production issue was probably not directly caused by getLength() returning 0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
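For illustration, a minimal stand-in formatter in the spirit of the suggestion (this is not Hadoop's StringUtils#humanReadableInt, whose exact output format differs):

```java
import java.util.Locale;

// Minimal human-readable byte formatter; a sketch, not Hadoop's implementation.
public class HumanBytesDemo {
    static String humanReadable(long bytes) {
        String[] units = {"B", "KB", "MB", "GB", "TB"};
        double value = bytes;
        int unit = 0;
        while (value >= 1024 && unit < units.length - 1) {
            value /= 1024;
            unit++;
        }
        // Locale.ROOT keeps the decimal point stable across environments.
        return String.format(Locale.ROOT, "%.1f %s", value, units[unit]);
    }

    public static void main(String[] args) {
        System.out.println(humanReadable(0));
        System.out.println(humanReadable(134217728L));              // 128 MB region
        System.out.println(humanReadable(5L * 1024 * 1024 * 1024)); // 5 GB region
    }
}
```

Printing split lengths this way in logs is far easier to scan than raw byte counts.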
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898038#comment-13898038 ] Nick Dimiduk commented on HBASE-10413: -- The TableSplit writable does not persist beyond the life of a mapreduce job. A single job will have the same version of the serialization code, so there's no versioning to increment. Tablesplit.getLength returns 0 -- Key: HBASE-10413 URL: https://issues.apache.org/jira/browse/HBASE-10413 Project: HBase Issue Type: Bug Components: Client, mapreduce Affects Versions: 0.96.1.1 Reporter: Lukas Nalezenec Assignee: Lukas Nalezenec Fix For: 0.98.1, 0.99.0 Attachments: 10413-7.patch, HBASE-10413-2.patch, HBASE-10413-3.patch, HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, HBASE-10413.patch InputSplits should be sorted by length but TableSplit does not contain real getLength implementation: @Override public long getLength() { // Not clear how to obtain this... seems to be used only for sorting splits return 0; } This is causing us problem with scheduling - we have got jobs that are supposed to finish in limited time but they get often stuck in last mapper working on large region. Can we implement this method ? What is the best way ? We were thinking about estimating size by size of files on HDFS. We would like to get Scanner from TableSplit, use startRow, stopRow and column families to get corresponding region than computing size of HDFS for given region and column family. Update: This ticket was about production issue - I talked with guy who worked on this and he said our production issue was probably not directly caused by getLength() returning 0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10413: --- Attachment: 10413.addendum Tablesplit.getLength returns 0 -- Key: HBASE-10413 URL: https://issues.apache.org/jira/browse/HBASE-10413 Project: HBase Issue Type: Bug Components: Client, mapreduce Affects Versions: 0.96.1.1 Reporter: Lukas Nalezenec Assignee: Lukas Nalezenec Fix For: 0.98.1, 0.99.0 Attachments: 10413-7.patch, 10413.addendum, HBASE-10413-2.patch, HBASE-10413-3.patch, HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, HBASE-10413.patch InputSplits should be sorted by length but TableSplit does not contain real getLength implementation: @Override public long getLength() { // Not clear how to obtain this... seems to be used only for sorting splits return 0; } This is causing us problem with scheduling - we have got jobs that are supposed to finish in limited time but they get often stuck in last mapper working on large region. Can we implement this method ? What is the best way ? We were thinking about estimating size by size of files on HDFS. We would like to get Scanner from TableSplit, use startRow, stopRow and column families to get corresponding region than computing size of HDFS for given region and column family. Update: This ticket was about production issue - I talked with guy who worked on this and he said our production issue was probably not directly caused by getLength() returning 0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10498) Add new APIs to load balancer interface
[ https://issues.apache.org/jira/browse/HBASE-10498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rajeshbabu updated HBASE-10498: --- Fix Version/s: 0.98.1 Add new APIs to load balancer interface --- Key: HBASE-10498 URL: https://issues.apache.org/jira/browse/HBASE-10498 Project: HBase Issue Type: Improvement Components: Balancer Reporter: rajeshbabu Assignee: rajeshbabu Fix For: 0.98.1, 0.99.0 If a custom load balancer required to maintain region and corresponding server locations, we can capture this information when we run any balancer algorithm before assignment(like random,retain). But during master startup we will not call any balancer algorithm if a region already assinged During split also we open child regions first in RS and then notify to master through zookeeper. So split regions information cannot be captured into balancer. Since balancer has access to master we can get the information from online regions or region plan data structures in AM. But some use cases we cannot relay on this information(mainly to maintain colocation of two tables regions). So it's better to add some APIs to load balancer to notify balancer when *region is online or offline*. These APIs helps a lot to maintain *regions colocation through custom load balancer* which is very important in secondary indexing. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10498) Add new APIs to load balancer interface
[ https://issues.apache.org/jira/browse/HBASE-10498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898059#comment-13898059 ] rajeshbabu commented on HBASE-10498: Added 0.98.1 to fix versions. Add new APIs to load balancer interface --- Key: HBASE-10498 URL: https://issues.apache.org/jira/browse/HBASE-10498 Project: HBase Issue Type: Improvement Components: Balancer Reporter: rajeshbabu Assignee: rajeshbabu Fix For: 0.98.1, 0.99.0 If a custom load balancer required to maintain region and corresponding server locations, we can capture this information when we run any balancer algorithm before assignment(like random,retain). But during master startup we will not call any balancer algorithm if a region already assinged During split also we open child regions first in RS and then notify to master through zookeeper. So split regions information cannot be captured into balancer. Since balancer has access to master we can get the information from online regions or region plan data structures in AM. But some use cases we cannot relay on this information(mainly to maintain colocation of two tables regions). So it's better to add some APIs to load balancer to notify balancer when *region is online or offline*. These APIs helps a lot to maintain *regions colocation through custom load balancer* which is very important in secondary indexing. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898065#comment-13898065 ] Ted Yu commented on HBASE-10413: Integrated addendum to 0.98 and trunk. Tablesplit.getLength returns 0 -- Key: HBASE-10413 URL: https://issues.apache.org/jira/browse/HBASE-10413 Project: HBase Issue Type: Bug Components: Client, mapreduce Affects Versions: 0.96.1.1 Reporter: Lukas Nalezenec Assignee: Lukas Nalezenec Fix For: 0.98.1, 0.99.0 Attachments: 10413-7.patch, 10413.addendum, HBASE-10413-2.patch, HBASE-10413-3.patch, HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, HBASE-10413.patch InputSplits should be sorted by length but TableSplit does not contain real getLength implementation: @Override public long getLength() { // Not clear how to obtain this... seems to be used only for sorting splits return 0; } This is causing us problem with scheduling - we have got jobs that are supposed to finish in limited time but they get often stuck in last mapper working on large region. Can we implement this method ? What is the best way ? We were thinking about estimating size by size of files on HDFS. We would like to get Scanner from TableSplit, use startRow, stopRow and column families to get corresponding region than computing size of HDFS for given region and column family. Update: This ticket was about production issue - I talked with guy who worked on this and he said our production issue was probably not directly caused by getLength() returning 0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10500) hbck and OOM when BucketCache is enabled
Nick Dimiduk created HBASE-10500: Summary: hbck and OOM when BucketCache is enabled Key: HBASE-10500 URL: https://issues.apache.org/jira/browse/HBASE-10500 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.98.0 Reporter: Nick Dimiduk Assignee: Nick Dimiduk Running {{hbck --repair}} when BucketCache is enabled in offheap mode can cause OOM. This is apparently because {{bin/hbase}} does not include $HBASE_REGIONSERVER_OPTS for hbck. It instantiates an HRegion instance as part of HDFSIntegrityFixer.handleHoleInRegionChain. That HRegion initializes its CacheConfig, which doesn't have the necessary Direct Memory. Possible solutions include: - disable blockcache in the config used by hbck when running its repairs - include HBASE_REGIONSERVER_OPTS in the HBaseFSCK startup arguments I'm leaning toward the former because it's possible that hbck is run on a host with the same hardware profile as the RS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10500) hbck and OOM when BucketCache is enabled
[ https://issues.apache.org/jira/browse/HBASE-10500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898087#comment-13898087 ] Nick Dimiduk commented on HBASE-10500: -- Here's the full stack trace: {noformat} Exception in thread main java.io.IOException: java.lang.OutOfMemoryError: Direct buffer memory at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionStores(HRegion.java:731) at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:638) at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:609) at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:595) at org.apache.hadoop.hbase.regionserver.HRegion.createHRegion(HRegion.java:4195) at org.apache.hadoop.hbase.regionserver.HRegion.createHRegion(HRegion.java:4154) at org.apache.hadoop.hbase.regionserver.HRegion.createHRegion(HRegion.java:4127) at org.apache.hadoop.hbase.regionserver.HRegion.createHRegion(HRegion.java:4205) at org.apache.hadoop.hbase.regionserver.HRegion.createHRegion(HRegion.java:4085) at org.apache.hadoop.hbase.util.HBaseFsckRepair.createHDFSRegionDir(HBaseFsckRepair.java:190) at org.apache.hadoop.hbase.util.HBaseFsck$TableInfo$HDFSIntegrityFixer.handleHoleInRegionChain(HBaseFsck.java:2312) at org.apache.hadoop.hbase.util.HBaseFsck$TableInfo.checkRegionChain(HBaseFsck.java:2492) at org.apache.hadoop.hbase.util.HBaseFsck.checkHdfsIntegrity(HBaseFsck.java:1226) at org.apache.hadoop.hbase.util.HBaseFsck.restoreHdfsIntegrity(HBaseFsck.java:741) at org.apache.hadoop.hbase.util.HBaseFsck.offlineHdfsIntegrityRepair(HBaseFsck.java:386) at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:475) at org.apache.hadoop.hbase.util.HBaseFsck.exec(HBaseFsck.java:4029) at org.apache.hadoop.hbase.util.HBaseFsck$HBaseFsckTool.run(HBaseFsck.java:3838) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at 
org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3826) Caused by: java.lang.OutOfMemoryError: Direct buffer memory at java.nio.Bits.reserveMemory(Bits.java:658) at java.nio.DirectByteBuffer.init(DirectByteBuffer.java:123) at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) at org.apache.hadoop.hbase.util.ByteBufferArray.init(ByteBufferArray.java:65) at org.apache.hadoop.hbase.io.hfile.bucket.ByteBufferIOEngine.init(ByteBufferIOEngine.java:44) at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getIOEngineFromName(BucketCache.java:270) at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.init(BucketCache.java:210) at org.apache.hadoop.hbase.io.hfile.CacheConfig.instantiateBlockCache(CacheConfig.java:399) at org.apache.hadoop.hbase.io.hfile.CacheConfig.init(CacheConfig.java:143) at org.apache.hadoop.hbase.regionserver.HStore.init(HStore.java:231) at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:3309) at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:702) at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:699) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} hbck and OOM when BucketCache is enabled Key: HBASE-10500 URL: https://issues.apache.org/jira/browse/HBASE-10500 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.98.0 Reporter: Nick Dimiduk Assignee: Nick Dimiduk Running {{hbck --repair}} when BucketCache is enabled in offheap mode can cause OOM. 
This is apparently because {{bin/hbase}} does not include $HBASE_REGIONSERVER_OPTS for hbck. It instantiates an HRegion instance as part of HDFSIntegrityFixer.handleHoleInRegionChain. That HRegion initializes its CacheConfig, which doesn't have the necessary Direct Memory. Possible solutions include: - disable blockcache in the config used by hbck when running its repairs - include HBASE_REGIONSERVER_OPTS in the HBaseFSCK startup arguments I'm leaning toward the former because it's possible that hbck is run on a host with the same hardware profile as the RS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
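The first proposed fix — disabling the block cache in the configuration hbck uses — can be approximated with a client-side override. This is a hedged sketch, not the actual patch: it assumes the standard block-cache sizing key is honored by the tool's CacheConfig, which is how the region server interprets it:

```xml
<!-- Hypothetical hbase-site.xml override for a client-side tool such as hbck:
     a cache size of 0 prevents CacheConfig from instantiating a block cache,
     so no direct memory is reserved for an offheap BucketCache. -->
<property>
  <name>hfile.block.cache.size</name>
  <value>0</value>
</property>
```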
[jira] [Updated] (HBASE-10500) hbck and OOM when BucketCache is enabled
[ https://issues.apache.org/jira/browse/HBASE-10500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-10500: - Description: Running {{hbck --repair}} when BucketCache is enabled in offheap mode can cause OOM. This is apparently because {{bin/hbase}} does not include $HBASE_REGIONSERVER_OPTS for hbck. It instantiates an HRegion instance as part of HDFSIntegrityFixer.handleHoleInRegionChain. That HRegion initializes its CacheConfig, which doesn't have the necessary Direct Memory. Possible solutions include: - disable blockcache in the config used by hbck when running its repairs - include HBASE_REGIONSERVER_OPTS in the HBaseFSCK startup arguments I'm leaning toward the former because it's possible that hbck is run on a host with different hardware profile as the RS. was: Running {{hbck --repair}} when BucketCache is enabled in offheap mode can cause OOM. This is apparently because {{bin/hbase}} does not include $HBASE_REGIONSERVER_OPTS for hbck. It instantiates an HRegion instance as part of HDFSIntegrityFixer.handleHoleInRegionChain. That HRegion initializes its CacheConfig, which doesn't have the necessary Direct Memory. Possible solutions include: - disable blockcache in the config used by hbck when running its repairs - include HBASE_REGIONSERVER_OPTS in the HBaseFSCK startup arguments I'm leaning toward the former because it's possible that hbck is run on a host with the same hardware profile as the RS. hbck and OOM when BucketCache is enabled Key: HBASE-10500 URL: https://issues.apache.org/jira/browse/HBASE-10500 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.98.0 Reporter: Nick Dimiduk Assignee: Nick Dimiduk Running {{hbck --repair}} when BucketCache is enabled in offheap mode can cause OOM. This is apparently because {{bin/hbase}} does not include $HBASE_REGIONSERVER_OPTS for hbck. It instantiates an HRegion instance as part of HDFSIntegrityFixer.handleHoleInRegionChain. 
That HRegion initializes its CacheConfig, which doesn't have the necessary Direct Memory. Possible solutions include: - disable blockcache in the config used by hbck when running its repairs - include HBASE_REGIONSERVER_OPTS in the HBaseFSCK startup arguments I'm leaning toward the former because it's possible that hbck is run on a host with different hardware profile as the RS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10501) Make IncreasingToUpperBoundRegionSplitPolicy configurable
Lars Hofhansl created HBASE-10501: - Summary: Make IncreasingToUpperBoundRegionSplitPolicy configurable Key: HBASE-10501 URL: https://issues.apache.org/jira/browse/HBASE-10501 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl
During some (admittedly artificial) load testing we found a large amount of split activity, which we tracked down to the IncreasingToUpperBoundRegionSplitPolicy. The current logic is this (from the comments): regions that are on this server that all are of the same table, squared, times the region flush size OR the maximum region split size, whichever is smaller. So with a flush size of 128mb and a max file size of 20gb, we'd need 13 regions of the same table on an RS to reach the max size. With a 10gb max file size it is still 9 regions of the same table. Considering that the number of regions an RS can carry is limited and there might be multiple tables, this should be more configurable. I think the squaring is smart and we do not need to change it. We could
* Make the start size configurable and default it to the flush size
* Add a multiplier for the initial size, i.e. start with n * flushSize
Of course one can override the default split policy, but these seem like simple tweaks. Or we could instead set the goal of how many regions of the same table would need to be present in order to reach the max size. In that case we'd start with maxSize/goal^2. So if the max size is 20gb and the goal is three, we'd start with 20g/9 = 2.2g for the initial region size. [~stack], I'm interested in your opinion.
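The region counts quoted above can be checked with a small standalone sketch of the bound (plain Java, not the actual policy class; the names are illustrative):

```java
// Sketch of the size bound described above: a region splits when its largest
// store exceeds min(flushSize * (same-table regions on this RS)^2, maxFileSize).
public class SplitBoundSketch {

    static long splitSize(long flushSize, long maxFileSize, int regionCount) {
        return Math.min(flushSize * regionCount * regionCount, maxFileSize);
    }

    // How many same-table regions must sit on one RS before the bound
    // reaches the configured maximum file size.
    static int regionsToReachMax(long flushSize, long maxFileSize) {
        int n = 1;
        while (splitSize(flushSize, maxFileSize, n) < maxFileSize) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        long gb = 1L << 30;
        // 128mb flush size, 20gb max file size -> 13 regions, as stated above.
        System.out.println(regionsToReachMax(128 * mb, 20 * gb)); // 13
        // 10gb max file size -> 9 regions.
        System.out.println(regionsToReachMax(128 * mb, 10 * gb)); // 9
    }
}
```

The goal-based alternative is just the same formula run backwards: starting at maxSize/goal^2 (20g/9 ≈ 2.2g for a goal of three) makes the bound hit maxFileSize after exactly `goal` regions.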
[jira] [Created] (HBASE-10502) [89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel.
Liyin Tang created HBASE-10502: -- Summary: [89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel. Key: HBASE-10502 URL: https://issues.apache.org/jira/browse/HBASE-10502 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Fix For: 0.89-fb
ParallelScanner is a utility class for the HBase client to perform multiple scan requests in parallel. It requires all the scan requests to have the same caching size, for simplicity. This class provides 3 very basic functionalities:
* The initialize function initializes all the ResultScanners by calling {@link HTable#getScanner(Scan)} in parallel for each scan request.
* The next function calls the corresponding {@link ResultScanner#next(int numRows)} for each scan request in parallel, and then returns all the results together as a list. Also, if the result list is empty, it indicates there is no data left in any of the scanners, and the user can call {@link #close()} afterwards.
* The close function closes all the scanners and shuts down the thread pool.
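The next/close contract described above can be sketched with plain java.util.concurrent primitives. This is not the 0.89-fb implementation — `Scanner` is a stand-in interface for ResultScanner and every name is illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelScannerSketch {
    /** Stand-in for ResultScanner#next(int numRows). */
    public interface Scanner {
        List<String> next(int numRows);
    }

    private final ExecutorService pool;
    private final List<Scanner> scanners;

    public ParallelScannerSketch(List<Scanner> scanners, int threads) {
        this.scanners = scanners;
        this.pool = Executors.newFixedThreadPool(threads);
    }

    /** Calls next(numRows) on every scanner in parallel and merges the
     *  batches; an empty merged list means every scanner is exhausted. */
    public List<String> next(int numRows) {
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (Scanner s : scanners) {
                tasks.add(() -> s.next(numRows));
            }
            List<String> merged = new ArrayList<>();
            for (Future<List<String>> f : pool.invokeAll(tasks)) {
                merged.addAll(f.get());
            }
            return merged;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** Shuts down the thread pool (the real utility also closes each scanner). */
    public void close() {
        pool.shutdown();
    }
}
```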
[jira] [Commented] (HBASE-10502) [89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel.
[ https://issues.apache.org/jira/browse/HBASE-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898111#comment-13898111 ] Lars Hofhansl commented on HBASE-10502: --- see also HBASE-9272
[89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel. Key: HBASE-10502 URL: https://issues.apache.org/jira/browse/HBASE-10502 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Fix For: 0.89-fb
ParallelScanner is a utility class for the HBase client to perform multiple scan requests in parallel. It requires all the scan requests to have the same caching size, for simplicity. This class provides 3 very basic functionalities:
* The initialize function initializes all the ResultScanners by calling {@link HTable#getScanner(Scan)} in parallel for each scan request.
* The next function calls the corresponding {@link ResultScanner#next(int numRows)} for each scan request in parallel, and then returns all the results together as a list. Also, if the result list is empty, it indicates there is no data left in any of the scanners, and the user can call {@link #close()} afterwards.
* The close function closes all the scanners and shuts down the thread pool.
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898127#comment-13898127 ] Nick Dimiduk commented on HBASE-10413: -- Thanks Ted.
Tablesplit.getLength returns 0 -- Key: HBASE-10413 URL: https://issues.apache.org/jira/browse/HBASE-10413 Project: HBase Issue Type: Bug Components: Client, mapreduce Affects Versions: 0.96.1.1 Reporter: Lukas Nalezenec Assignee: Lukas Nalezenec Fix For: 0.98.1, 0.99.0 Attachments: 10413-7.patch, 10413.addendum, HBASE-10413-2.patch, HBASE-10413-3.patch, HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, HBASE-10413.patch
InputSplits should be sorted by length but TableSplit does not contain a real getLength implementation:
@Override
public long getLength() {
  // Not clear how to obtain this... seems to be used only for sorting splits
  return 0;
}
This is causing us problems with scheduling - we have jobs that are supposed to finish in a limited time but they often get stuck in the last mapper working on a large region. Can we implement this method? What is the best way? We were thinking about estimating the size by the size of the files on HDFS. We would like to get a Scanner from the TableSplit, use startRow, stopRow and the column families to get the corresponding region, then compute the HDFS size for the given region and column family.
Update: This ticket was about a production issue - I talked with the guy who worked on this and he said our production issue was probably not directly caused by getLength() returning 0.
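A small standalone illustration of why a real getLength() matters (plain Java, not HBase code): MapReduce sorts input splits by their reported length so the biggest ones are scheduled first, and all-zero lengths make that ordering meaningless:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SplitOrderSketch {
    // Minimal stand-in for an InputSplit: a region name plus the HDFS bytes
    // estimated for it (e.g. summed store-file sizes for the split's
    // start/stop-row range and column families, as proposed above).
    static class Split {
        final String region;
        final long length;
        Split(String region, long length) {
            this.region = region;
            this.length = length;
        }
    }

    // Largest splits first, so a large region is started early instead of
    // becoming the straggling last mapper.
    static List<Split> scheduleOrder(List<Split> splits) {
        List<Split> sorted = new ArrayList<>(splits);
        sorted.sort(Comparator.comparingLong((Split s) -> s.length).reversed());
        return sorted;
    }
}
```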
[jira] [Commented] (HBASE-10502) [89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel.
[ https://issues.apache.org/jira/browse/HBASE-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898128#comment-13898128 ] Liyin Tang commented on HBASE-10502: By skimming through HBASE-9272, the semantics seem to be a little different. In this case, the client actually wants to construct multiple scan requests, while HBASE-9272 is about performing a single scan request in parallel.
[89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel. Key: HBASE-10502 URL: https://issues.apache.org/jira/browse/HBASE-10502 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Fix For: 0.89-fb
ParallelScanner is a utility class for the HBase client to perform multiple scan requests in parallel. It requires all the scan requests to have the same caching size, for simplicity. This class provides 3 very basic functionalities:
* The initialize function initializes all the ResultScanners by calling {@link HTable#getScanner(Scan)} in parallel for each scan request.
* The next function calls the corresponding {@link ResultScanner#next(int numRows)} for each scan request in parallel, and then returns all the results together as a list. Also, if the result list is empty, it indicates there is no data left in any of the scanners, and the user can call {@link #close()} afterwards.
* The close function closes all the scanners and shuts down the thread pool.
[jira] [Updated] (HBASE-10501) Make IncreasingToUpperBoundRegionSplitPolicy configurable
[ https://issues.apache.org/jira/browse/HBASE-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-10501: -- Description:
During some (admittedly artificial) load testing we found a large amount of split activity, which we tracked down to the IncreasingToUpperBoundRegionSplitPolicy. The current logic is this (from the comments): regions that are on this server that all are of the same table, squared, times the region flush size OR the maximum region split size, whichever is smaller. So with a flush size of 128mb and a max file size of 20gb, we'd need 13 regions of the same table on an RS to reach the max size. With a 10gb max file size it is still 9 regions of the same table. Considering that the number of regions an RS can carry is limited and there might be multiple tables, this should be more configurable. I think the squaring is smart and we do not need to change it. We could
* Make the start size configurable and default it to the flush size
* Add a multiplier for the initial size, i.e. start with n * flushSize
* Also change the default to start with 2*flush size
Of course one can override the default split policy, but these seem like simple tweaks. Or we could instead set the goal of how many regions of the same table would need to be present in order to reach the max size. In that case we'd start with maxSize/goal^2. So if the max size is 20gb and the goal is three, we'd start with 20g/9 = 2.2g for the initial region size. [~stack], I'm especially interested in your opinion.
was: During some (admittedly) artificial load testing we found a large amount split activity, which we tracked down the IncreasingToUpperBoundRegionSplitPolicy. The current logic is this (from the comment) regions that are on this server that all are of the same table, squared, times the region flush size OR the maximum region split size, whichever is smaller So with a flush size of 128mb and max file size of 20gb, we'd need 13 region of the same table on an RS to reach the max size. With 10gb file sized it is still 9 regions of the same table. Considering that the number of regions that an RS can carry is limited and might be multiple tables, this should be more configurable. I think the squaring is smart and we do not need to change it. We could * Make the start size configurable and default it to the flush size * Add multiplier for the initial size, i.e. start with n * flushSize Of course one can override the default split policy, but these seem like simple tweaks. Or we could instead set the goal of how many regions of the same table would need to be present in order to reach the max size. In that case we'd start with maxSize/goal^2. So if max size is 20gb and the goal is three we'd start with 20g/9 = 2.2g for the initial region size. [~stack], I'm interested in your opinion.
Make IncreasingToUpperBoundRegionSplitPolicy configurable - Key: HBASE-10501 URL: https://issues.apache.org/jira/browse/HBASE-10501 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl
During some (admittedly artificial) load testing we found a large amount of split activity, which we tracked down to the IncreasingToUpperBoundRegionSplitPolicy. The current logic is this (from the comments): regions that are on this server that all are of the same table, squared, times the region flush size OR the maximum region split size, whichever is smaller. So with a flush size of 128mb and a max file size of 20gb, we'd need 13 regions of the same table on an RS to reach the max size. With a 10gb max file size it is still 9 regions of the same table.
Considering that the number of regions an RS can carry is limited and there might be multiple tables, this should be more configurable. I think the squaring is smart and we do not need to change it. We could
* Make the start size configurable and default it to the flush size
* Add a multiplier for the initial size, i.e. start with n * flushSize
* Also change the default to start with 2*flush size
Of course one can override the default split policy, but these seem like simple tweaks. Or we could instead set the goal of how many regions of the same table would need to be present in order to reach the max size. In that case we'd start with maxSize/goal^2. So if the max size is 20gb and the goal is three, we'd start with 20g/9 = 2.2g for the initial region size. [~stack], I'm especially interested in your opinion.
[jira] [Commented] (HBASE-10502) [89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel.
[ https://issues.apache.org/jira/browse/HBASE-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898132#comment-13898132 ] Liyin Tang commented on HBASE-10502: Actually HBASE-9272 + HBASE-10502 is quite effective for optimizing join queries. Assuming a join query such as Table A joins Table B based on row key / some prefix, HBASE-9272 is useful for issuing the initial scan in parallel to retrieve all the join keys; based on the join keys, multiple scan queries for Table B can then be constructed and submitted in parallel via HBASE-10502.
[89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel. Key: HBASE-10502 URL: https://issues.apache.org/jira/browse/HBASE-10502 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Fix For: 0.89-fb
ParallelScanner is a utility class for the HBase client to perform multiple scan requests in parallel. It requires all the scan requests to have the same caching size, for simplicity. This class provides 3 very basic functionalities:
* The initialize function initializes all the ResultScanners by calling {@link HTable#getScanner(Scan)} in parallel for each scan request.
* The next function calls the corresponding {@link ResultScanner#next(int numRows)} for each scan request in parallel, and then returns all the results together as a list. Also, if the result list is empty, it indicates there is no data left in any of the scanners, and the user can call {@link #close()} afterwards.
* The close function closes all the scanners and shuts down the thread pool.
[jira] [Commented] (HBASE-10502) [89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel.
[ https://issues.apache.org/jira/browse/HBASE-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898138#comment-13898138 ] Liyin Tang commented on HBASE-10502: In addition, the API of HBASE-10502 seems more flexible (to me), because if there is a single scan request spanning multiple region boundaries, the hbase client is always able to split this scan request into multiple region-local scan requests and then submit them to HBASE-10502 for parallel execution.
[89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel. Key: HBASE-10502 URL: https://issues.apache.org/jira/browse/HBASE-10502 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Fix For: 0.89-fb
ParallelScanner is a utility class for the HBase client to perform multiple scan requests in parallel. It requires all the scan requests to have the same caching size, for simplicity. This class provides 3 very basic functionalities:
* The initialize function initializes all the ResultScanners by calling {@link HTable#getScanner(Scan)} in parallel for each scan request.
* The next function calls the corresponding {@link ResultScanner#next(int numRows)} for each scan request in parallel, and then returns all the results together as a list. Also, if the result list is empty, it indicates there is no data left in any of the scanners, and the user can call {@link #close()} afterwards.
* The close function closes all the scanners and shuts down the thread pool.
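The splitting step described in that comment — cutting one [startRow, stopRow) scan at region boundaries — can be sketched in plain Java. Keys are modeled as strings for illustration; a real client would use byte[] region start keys looked up from the meta table:

```java
import java.util.ArrayList;
import java.util.List;

public class ScanRangeSplitter {
    // Splits [startRow, stopRow) at the given sorted region start keys,
    // producing one region-local [start, stop) range per crossed boundary.
    // Each resulting range can then be handed to a parallel-scan utility.
    static List<String[]> split(String startRow, String stopRow, List<String> regionStartKeys) {
        List<String[]> ranges = new ArrayList<>();
        String cur = startRow;
        for (String boundary : regionStartKeys) {
            // Only boundaries strictly inside the requested range cut it.
            if (boundary.compareTo(cur) > 0 && boundary.compareTo(stopRow) < 0) {
                ranges.add(new String[] { cur, boundary });
                cur = boundary;
            }
        }
        ranges.add(new String[] { cur, stopRow });
        return ranges;
    }
}
```

For example, a scan of [b, z) over regions starting at a, m, and t yields the three region-local ranges [b, m), [m, t), and [t, z).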
[jira] [Commented] (HBASE-10500) hbck and OOM when BucketCache is enabled
[ https://issues.apache.org/jira/browse/HBASE-10500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898157#comment-13898157 ] Nick Dimiduk commented on HBASE-10500: -- Looks like the same kind of issue crops up with LoadIncrementalHFiles:
{noformat}
2014-02-11 18:14:30,021 ERROR [main] mapreduce.LoadIncrementalHFiles: Unexpected execution exception during splitting
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Direct buffer memory
	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
	at java.util.concurrent.FutureTask.get(FutureTask.java:111)
	at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplitPhase(LoadIncrementalHFiles.java:407)
	at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:288)
	at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:822)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
	at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:827)
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
	at java.nio.Bits.reserveMemory(Bits.java:658)
	at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
	at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
	at org.apache.hadoop.hbase.util.ByteBufferArray.<init>(ByteBufferArray.java:65)
	at org.apache.hadoop.hbase.io.hfile.bucket.ByteBufferIOEngine.<init>(ByteBufferIOEngine.java:44)
	at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getIOEngineFromName(BucketCache.java:270)
	at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.<init>(BucketCache.java:210)
	at org.apache.hadoop.hbase.io.hfile.CacheConfig.instantiateBlockCache(CacheConfig.java:399)
	at org.apache.hadoop.hbase.io.hfile.CacheConfig.<init>(CacheConfig.java:166)
	at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplit(LoadIncrementalHFiles.java:476)
	at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:397)
	at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:395)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:722)
{noformat}
hbck and OOM when BucketCache is enabled Key: HBASE-10500 URL: https://issues.apache.org/jira/browse/HBASE-10500 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.98.0 Reporter: Nick Dimiduk Assignee: Nick Dimiduk
Running {{hbck --repair}} when BucketCache is enabled in offheap mode can cause OOM. This is apparently because {{bin/hbase}} does not include $HBASE_REGIONSERVER_OPTS for hbck. It instantiates an HRegion instance as part of HDFSIntegrityFixer.handleHoleInRegionChain. That HRegion initializes its CacheConfig, which doesn't have the necessary Direct Memory. Possible solutions include: - disable blockcache in the config used by hbck when running its repairs - include HBASE_REGIONSERVER_OPTS in the HBaseFSCK startup arguments I'm leaning toward the former because it's possible that hbck is run on a host with a different hardware profile than the RS.
[jira] [Commented] (HBASE-10486) ProtobufUtil Append Increment deserialization lost cell level timestamp
[ https://issues.apache.org/jira/browse/HBASE-10486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898164#comment-13898164 ] Enis Soztutar commented on HBASE-10486: --- Jeffrey, gentle reminder to set the appropriate fix versions once you commit the patch to the branch(es).
ProtobufUtil Append Increment deserialization lost cell level timestamp - Key: HBASE-10486 URL: https://issues.apache.org/jira/browse/HBASE-10486 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1 Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.96.2, 0.98.1, 0.99.0 Attachments: hbase-10486-v2.patch, hbase-10486.patch
When we deserialize Append/Increment, we use the wrong timestamp value during deserialization in the trunk/0.98 code and discard the value in the 0.96 code base.
[jira] [Updated] (HBASE-10486) ProtobufUtil Append Increment deserialization lost cell level timestamp
[ https://issues.apache.org/jira/browse/HBASE-10486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-10486: -- Fix Version/s: 0.96.2
ProtobufUtil Append Increment deserialization lost cell level timestamp - Key: HBASE-10486 URL: https://issues.apache.org/jira/browse/HBASE-10486 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1 Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.96.2, 0.98.1, 0.99.0 Attachments: hbase-10486-v2.patch, hbase-10486.patch
When we deserialize Append/Increment, we use the wrong timestamp value during deserialization in the trunk/0.98 code and discard the value in the 0.96 code base.
[jira] [Commented] (HBASE-10498) Add new APIs to load balancer interface
[ https://issues.apache.org/jira/browse/HBASE-10498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898181#comment-13898181 ] Enis Soztutar commented on HBASE-10498: --- Is this for doing all the placement decisions through the LB interfaces? I think it makes sense.
Add new APIs to load balancer interface --- Key: HBASE-10498 URL: https://issues.apache.org/jira/browse/HBASE-10498 Project: HBase Issue Type: Improvement Components: Balancer Reporter: rajeshbabu Assignee: rajeshbabu Fix For: 0.98.1, 0.99.0
If a custom load balancer is required to maintain region and corresponding server locations, we can capture this information when we run any balancer algorithm before assignment (like random, retain). But during master startup we will not call any balancer algorithm if a region is already assigned. During a split we also open the child regions first in the RS and then notify the master through zookeeper, so the split regions' information cannot be captured by the balancer. Since the balancer has access to the master, we can get the information from the online regions or the region plan data structures in the AM. But in some use cases we cannot rely on this information (mainly to maintain colocation of two tables' regions). So it's better to add some APIs to the load balancer to notify it when a *region is online or offline*. These APIs help a lot to maintain *region colocation through a custom load balancer*, which is very important in secondary indexing.
[jira] [Commented] (HBASE-10498) Add new APIs to load balancer interface
[ https://issues.apache.org/jira/browse/HBASE-10498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898190#comment-13898190 ] rajeshbabu commented on HBASE-10498: bq. Is this for doing all the placement decisions through the LB interfaces? Yes [~enis].
Add new APIs to load balancer interface --- Key: HBASE-10498 URL: https://issues.apache.org/jira/browse/HBASE-10498 Project: HBase Issue Type: Improvement Components: Balancer Reporter: rajeshbabu Assignee: rajeshbabu Fix For: 0.98.1, 0.99.0
If a custom load balancer is required to maintain region and corresponding server locations, we can capture this information when we run any balancer algorithm before assignment (like random, retain). But during master startup we will not call any balancer algorithm if a region is already assigned. During a split we also open the child regions first in the RS and then notify the master through zookeeper, so the split regions' information cannot be captured by the balancer. Since the balancer has access to the master, we can get the information from the online regions or the region plan data structures in the AM. But in some use cases we cannot rely on this information (mainly to maintain colocation of two tables' regions). So it's better to add some APIs to the load balancer to notify it when a *region is online or offline*. These APIs help a lot to maintain *region colocation through a custom load balancer*, which is very important in secondary indexing.
[jira] [Created] (HBASE-10503) [0.89-fb] Add metrics to track compaction hook progress
Adela Maznikar created HBASE-10503: -- Summary: [0.89-fb] Add metrics to track compaction hook progress Key: HBASE-10503 URL: https://issues.apache.org/jira/browse/HBASE-10503 Project: HBase Issue Type: Improvement Components: Compaction Affects Versions: 0.89-fb Reporter: Adela Maznikar Assignee: Adela Maznikar Priority: Minor Add a metric to track how many KVs we have converted with the compaction hook, and bytes that we have saved during the process. This will help us to see when there are no new KVs converted and give us a good signal when to disable it (all KVs are converted). Related JIRA: HBASE-7099
[jira] [Assigned] (HBASE-7849) Provide administrative limits around bulkloads of files into a single region
[ https://issues.apache.org/jira/browse/HBASE-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang reassigned HBASE-7849: -- Assignee: Jimmy Xiang
Provide administrative limits around bulkloads of files into a single region Key: HBASE-7849 URL: https://issues.apache.org/jira/browse/HBASE-7849 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Harsh J Assignee: Jimmy Xiang
Given the current mechanism, it is possible for users to flood a single region with 1k+ store files via the bulkload API and basically cause the region to become a flying dutchman - never getting assigned successfully again. Ideally, an administrative limit could solve this. If the bulkload RPC call can check whether the region already has X store files, then it can reject the request to add another and throw a failure at the client with an appropriate message. This may be an intrusive change, but it seems necessary for closing the gap between devs and ops in managing HBase clusters. It would especially prevent abuse in the form of unaware devs not pre-splitting tables before bulkloading things in. Currently, this leads to ops pain, as the devs think HBase has gone non-functional and begin complaining.
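The proposed guard is simple; here is a hedged sketch of the check described above (the method name and limit are hypothetical — the eventual patch may implement it quite differently):

```java
public class BulkLoadGuard {
    // Reject a bulkload that would push a region past the administrative
    // store-file limit, so a single region cannot be flooded with files.
    static boolean acceptBulkLoad(int currentStoreFiles, int incomingFiles, int maxStoreFiles) {
        return currentStoreFiles + incomingFiles <= maxStoreFiles;
    }

    public static void main(String[] args) {
        // 990 existing files + 20 incoming against a limit of 1000: rejected,
        // with an error returned to the client instead of a hung region.
        System.out.println(acceptBulkLoad(990, 20, 1000)); // false
    }
}
```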
[jira] [Updated] (HBASE-10500) hbck and OOM when BucketCache is enabled
[ https://issues.apache.org/jira/browse/HBASE-10500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-10500: - Attachment: HBASE-10500.00.patch Here's a simple patch for HadoopQA that disables the blockcache for these tools. hbck and OOM when BucketCache is enabled Key: HBASE-10500 URL: https://issues.apache.org/jira/browse/HBASE-10500 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.98.0 Reporter: Nick Dimiduk Assignee: Nick Dimiduk Attachments: HBASE-10500.00.patch Running {{hbck --repair}} when BucketCache is enabled in offheap mode can cause OOM. This is apparently because {{bin/hbase}} does not include $HBASE_REGIONSERVER_OPTS for hbck. It instantiates an HRegion instance as part of HDFSIntegrityFixer.handleHoleInRegionChain. That HRegion initializes its CacheConfig, which doesn't have the necessary Direct Memory. Possible solutions include: - disable blockcache in the config used by hbck when running its repairs - include HBASE_REGIONSERVER_OPTS in the HBaseFSCK startup arguments I'm leaning toward the former because it's possible that hbck is run on a host with different hardware profile as the RS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10500) hbck and OOM when BucketCache is enabled
[ https://issues.apache.org/jira/browse/HBASE-10500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-10500: - Affects Version/s: 0.99.0 0.96.0 Status: Patch Available (was: Open) hbck and OOM when BucketCache is enabled Key: HBASE-10500 URL: https://issues.apache.org/jira/browse/HBASE-10500 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.96.0, 0.98.0, 0.99.0 Reporter: Nick Dimiduk Assignee: Nick Dimiduk Attachments: HBASE-10500.00.patch Running {{hbck --repair}} when BucketCache is enabled in offheap mode can cause OOM. This is apparently because {{bin/hbase}} does not include $HBASE_REGIONSERVER_OPTS for hbck. It instantiates an HRegion instance as part of HDFSIntegrityFixer.handleHoleInRegionChain. That HRegion initializes its CacheConfig, which doesn't have the necessary Direct Memory. Possible solutions include: - disable blockcache in the config used by hbck when running its repairs - include HBASE_REGIONSERVER_OPTS in the HBaseFSCK startup arguments I'm leaning toward the former because it's possible that hbck is run on a host with different hardware profile as the RS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898263#comment-13898263 ] Hudson commented on HBASE-10413: SUCCESS: Integrated in HBase-TRUNK #4909 (See [https://builds.apache.org/job/HBase-TRUNK/4909/]) HBASE-10413 addendum makes split length readable (tedyu: rev 1567232) * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java Tablesplit.getLength returns 0 -- Key: HBASE-10413 URL: https://issues.apache.org/jira/browse/HBASE-10413 Project: HBase Issue Type: Bug Components: Client, mapreduce Affects Versions: 0.96.1.1 Reporter: Lukas Nalezenec Assignee: Lukas Nalezenec Fix For: 0.98.1, 0.99.0 Attachments: 10413-7.patch, 10413.addendum, HBASE-10413-2.patch, HBASE-10413-3.patch, HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, HBASE-10413.patch InputSplits should be sorted by length but TableSplit does not contain real getLength implementation: @Override public long getLength() { // Not clear how to obtain this... seems to be used only for sorting splits return 0; } This is causing us problem with scheduling - we have got jobs that are supposed to finish in limited time but they get often stuck in last mapper working on large region. Can we implement this method ? What is the best way ? We were thinking about estimating size by size of files on HDFS. We would like to get Scanner from TableSplit, use startRow, stopRow and column families to get corresponding region than computing size of HDFS for given region and column family. Update: This ticket was about production issue - I talked with guy who worked on this and he said our production issue was probably not directly caused by getLength() returning 0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
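The estimation approach described in the ticket (size a split by the HDFS store files of its region and column family) can be sketched as below. This is a self-contained illustration with hypothetical names, not the HBase implementation: the HDFS listing is modeled as a map from file path to length, where a real version would walk FileSystem.listStatus() under the region/family directory.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of "estimate TableSplit.getLength() from HDFS store-file sizes".
// Paths are strings of the form "<regionDir>/<family>/<file>".
public class SplitLengthEstimator {
    private final Map<String, Long> fileLengths = new TreeMap<>();

    void addStoreFile(String path, long length) {
        fileLengths.put(path, length);
    }

    // Sum the sizes of all store files under the given region/family prefix;
    // this sum would be stored in the TableSplit and returned by getLength().
    long estimateSplitLength(String regionDir, String family) {
        String prefix = regionDir + "/" + family + "/";
        long total = 0;
        for (Map.Entry<String, Long> e : fileLengths.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                total += e.getValue();
            }
        }
        return total;
    }
}
```

A non-zero getLength() lets the MapReduce scheduler sort splits largest-first, so the large region the reporter mentions starts early instead of becoming the straggler mapper.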
[jira] [Commented] (HBASE-10492) open daughter regions can unpredictably take long time
[ https://issues.apache.org/jira/browse/HBASE-10492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898262#comment-13898262 ] Jerry He commented on HBASE-10492: -- The machines are 24 CPU 48G memory with Red Hat Enterprise Linux Server release 6.4 (Santiago) 2.6.32-358.el6.x86_64 IBM JDK 6 5 region servers (each with datanode and task tracker). The load MR job with loading of data. I have been trying to reproduce the long delay in opening the daughter regions. With 'org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 200' I have seen delays up to 6 mins. See the log below (from 2014-02-11 02:35:52 to 2014-02-11 02:41:14 at the end) {code} 2014-02-11 02:35:52,473 WARN org.apache.hadoop.hbase.regionserver.HRegionFileSystem: .regioninfo file not found for region: 10a421ac8075a42cbcb53bdc393c8e8c 2014-02-11 02:35:52,479 WARN org.apache.hadoop.hbase.regionserver.HRegionFileSystem: .regioninfo file not found for region: 5ff07e59d13c99ca14408807a6e61722 2014-02-11 02:35:52,589 INFO org.apache.hadoop.hbase.regionserver.compactions.CompactionConfiguration: size [4194304, 9223372036854775807); files [3, 10); ratio 1.20; off-peak ratio 5.00; throttle point 2684354560; delete expired; major period 0, major jitter 0.50 2014-02-11 02:35:52,596 INFO org.apache.hadoop.hbase.regionserver.compactions.CompactionConfiguration: size [4194304, 9223372036854775807); files [3, 10); ratio 1.20; off-peak ratio 5.00; throttle point 2684354560; delete expired; major period 0, major jitter 0.50 2014-02-11 02:35:55,458 INFO org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher: Flushed, sequenceid=4289924, memsize=256.6 M, hasBloomFilter=true, into tmp file gpfs:/hbase/data/default/TestTable/ed4d9fb392ae52c1a406a221defc6b00/.tmp/9e2cb318b0114248b9c62948cf47ac5b 2014-02-11 02:36:37,894 INFO org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher: Flushed, sequenceid=4289926, memsize=153.1 M, hasBloomFilter=true, into tmp file 
gpfs:/hbase/data/default/TestTable/110cc21c77569d595f7717b8c75fbf66/.tmp/4e55d6ba4b5644838163101f2ba20fdb 2014-02-11 02:36:53,067 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Rolled WAL /hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392114789609 with entries=416, filesize=578.7 M; new WAL /hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392114958416 2014-02-11 02:36:53,067 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: moving old hlog file /hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112795409 whose highest sequenceid is 4285071 to /hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112795409 2014-02-11 02:36:53,162 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: moving old hlog file /hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112818204 whose highest sequenceid is 4285169 to /hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112818204 2014-02-11 02:36:53,210 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: moving old hlog file /hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112839023 whose highest sequenceid is 4285266 to /hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112839023 2014-02-11 02:37:13,297 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: moving old hlog file /hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112862511 whose highest sequenceid is 4285362 to /hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112862511 2014-02-11 02:37:13,326 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: moving old hlog file /hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112871587 whose 
highest sequenceid is 4285453 to /hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112871587 2014-02-11 02:37:13,383 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: moving old hlog file /hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112877894 whose highest sequenceid is 4285546 to /hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112877894 2014-02-11 02:37:33,474 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: moving old hlog file /hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112891408 whose highest sequenceid is 4285641 to /hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112891408 2014-02-11 02:37:33,481 INFO org.apache.hadoop.hbase.regionserver.HStore: Added
[jira] [Commented] (HBASE-10500) hbck and OOM when BucketCache is enabled
[ https://issues.apache.org/jira/browse/HBASE-10500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898295#comment-13898295 ] Jonathan Hsieh commented on HBASE-10500: lgtm.. +1 hbck and OOM when BucketCache is enabled Key: HBASE-10500 URL: https://issues.apache.org/jira/browse/HBASE-10500 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.98.0, 0.96.0, 0.99.0 Reporter: Nick Dimiduk Assignee: Nick Dimiduk Attachments: HBASE-10500.00.patch Running {{hbck --repair}} when BucketCache is enabled in offheap mode can cause OOM. This is apparently because {{bin/hbase}} does not include $HBASE_REGIONSERVER_OPTS for hbck. It instantiates an HRegion instance as part of HDFSIntegrityFixer.handleHoleInRegionChain. That HRegion initializes its CacheConfig, which doesn't have the necessary Direct Memory. Possible solutions include: - disable blockcache in the config used by hbck when running its repairs - include HBASE_REGIONSERVER_OPTS in the HBaseFSCK startup arguments I'm leaning toward the former because it's possible that hbck is run on a host with different hardware profile as the RS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10490) Simplify RpcClient code
[ https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898298#comment-13898298 ] Devaraj Das commented on HBASE-10490: - I can't say for sure if in HBase anyone configures infinite timeout (rpcTimeout = 0) on the sockets but the pingery would have protected the client if it wanted to wait for a while in the situations where the server is busy. So if the rpcTimeout is passed as zero, the socket timeout is set to the ping interval. That means the client won't retry when the timeout happens. It'll just send a ping to figure out whether the server is still alive. If so, then it'll continue to wait (as opposed to resending the request). But I agree that if no one uses rpcTimeout = 0, we could remove the ping stuff. Simplify RpcClient code --- Key: HBASE-10490 URL: https://issues.apache.org/jira/browse/HBASE-10490 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.99.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 0.99.0 Attachments: 10490.v1.patch The code is complex. Here is a set of proposed changes, for trunk: 1) remove PingInputStream. if rpcTimeout 0 it just rethrows the exception. I expect that we always have a rpcTimeout. So we can remove the code. 2) remove the sendPing: instead, just close the connection if it's not used for a while, instead of trying to ping the server. 3) remove maxIddle time: to avoid the confusion if someone has overwritten the conf. 4) remove shouldCloseConnection: it was more or less synchronized with closeException. Having a single variable instead of two avoids the synchro 5) remove lastActivity: instead of trying to have an exact timeout, just kill the connection after some time. lastActivity could be set to wrong values if the server was slow to answer. 6) hopefully, a better management of the exception; we don't use the close exception of someone else as an input for another one. Same goes for interruption. 
I may have something wrong in the code. I will review it myself again. Feedback welcome, especially on the ping removal: I hope I got all the use cases. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
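Point 2 of the proposal (drop sendPing; just close connections that have sat unused) amounts to tracking a last-use timestamp per connection and periodically reaping stale ones. A minimal sketch with hypothetical names, using plain Java in place of the real RpcClient internals:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of "close idle connections instead of pinging the server".
// Each connection records when it was last used; a periodic sweep closes
// anything idle longer than maxIdleMillis.
public class IdleConnectionReaper {
    private final Map<String, Long> lastUsed = new ConcurrentHashMap<>();
    private final long maxIdleMillis;

    IdleConnectionReaper(long maxIdleMillis) {
        this.maxIdleMillis = maxIdleMillis;
    }

    // Called on every send/receive for the connection.
    void touch(String connectionId, long nowMillis) {
        lastUsed.put(connectionId, nowMillis);
    }

    // Returns the number of connections closed; called from a chore thread.
    int sweep(long nowMillis) {
        int closed = 0;
        Iterator<Map.Entry<String, Long>> it = lastUsed.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (nowMillis - e.getValue() > maxIdleMillis) {
                it.remove(); // the real code would also close the socket here
                closed++;
            }
        }
        return closed;
    }
}
```

Note this is exactly where Devaraj's concern bites: if "used" only means bytes on the wire, a connection waiting on a slow in-flight RPC looks idle, so a real implementation would also have to count outstanding calls before reaping.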
[jira] [Commented] (HBASE-10490) Simplify RpcClient code
[ https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898304#comment-13898304 ] Devaraj Das commented on HBASE-10490: - rpcTimeout in my last comment refers to the configured value of hbase.rpc.timeout. Simplify RpcClient code --- Key: HBASE-10490 URL: https://issues.apache.org/jira/browse/HBASE-10490 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.99.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 0.99.0 Attachments: 10490.v1.patch The code is complex. Here is a set of proposed changes, for trunk: 1) remove PingInputStream. if rpcTimeout 0 it just rethrows the exception. I expect that we always have a rpcTimeout. So we can remove the code. 2) remove the sendPing: instead, just close the connection if it's not used for a while, instead of trying to ping the server. 3) remove maxIddle time: to avoid the confusion if someone has overwritten the conf. 4) remove shouldCloseConnection: it was more or less synchronized with closeException. Having a single variable instead of two avoids the synchro 5) remove lastActivity: instead of trying to have an exact timeout, just kill the connection after some time. lastActivity could be set to wrong values if the server was slow to answer. 6) hopefully, a better management of the exception; we don't use the close exception of someone else as an input for another one. Same goes for interruption. I may have something wrong in the code. I will review it myself again. Feedback welcome, especially on the ping removal: I hope I got all the use cases. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10493) InclusiveStopFilter#filterKeyValue() should perform filtering on row key
[ https://issues.apache.org/jira/browse/HBASE-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898344#comment-13898344 ] Lars Hofhansl commented on HBASE-10493: --- +1 InclusiveStopFilter#filterKeyValue() should perform filtering on row key Key: HBASE-10493 URL: https://issues.apache.org/jira/browse/HBASE-10493 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 0.98.1, 0.99.0 Attachments: 10493-v1.txt, 10493-v2.txt InclusiveStopFilter inherits filterKeyValue() from FilterBase which always returns ReturnCode.INCLUDE InclusiveStopFilter#filterKeyValue() should be consistent with filtering on row key. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
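The fix is to make filterKeyValue consult the stop row rather than inherit FilterBase's unconditional INCLUDE. The core of it is an unsigned lexicographic comparison of the cell's row against the stop row; a self-contained sketch of that predicate (using java.util.Arrays.compareUnsigned as a stand-in for HBase's Bytes.compareTo, and a boolean in place of the filter's done flag):

```java
import java.util.Arrays;

// Toy model of InclusiveStopFilter's row check: include cells only while
// their row sorts at or before the stop row (unsigned lexicographic order).
public class InclusiveStopCheck {
    private final byte[] stopRow;
    private boolean done = false;

    InclusiveStopCheck(byte[] stopRow) {
        this.stopRow = stopRow;
    }

    // True if a cell on this row should be included; flips 'done' once a row
    // past the stop row is seen, mirroring filterKeyValue + filterAllRemaining.
    boolean includeRow(byte[] row) {
        if (done) {
            return false;
        }
        if (Arrays.compareUnsigned(row, stopRow) > 0) {
            done = true;
            return false;
        }
        return true;
    }
}
```

The "inclusive" part is the `> 0` test: the stop row itself still compares equal and is returned, only strictly greater rows are filtered.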
[jira] [Updated] (HBASE-10493) InclusiveStopFilter#filterKeyValue() should perform filtering on row key
[ https://issues.apache.org/jira/browse/HBASE-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10493: --- Hadoop Flags: Reviewed Integrated to 0.98 and trunk. Thanks Lars for the reviews. InclusiveStopFilter#filterKeyValue() should perform filtering on row key Key: HBASE-10493 URL: https://issues.apache.org/jira/browse/HBASE-10493 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 0.98.1, 0.99.0 Attachments: 10493-v1.txt, 10493-v2.txt InclusiveStopFilter inherits filterKeyValue() from FilterBase which always returns ReturnCode.INCLUDE InclusiveStopFilter#filterKeyValue() should be consistent with filtering on row key. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898361#comment-13898361 ] Hudson commented on HBASE-10413: SUCCESS: Integrated in HBase-0.98 #148 (See [https://builds.apache.org/job/HBase-0.98/148/]) HBASE-10413 addendum makes split length readable (tedyu: rev 1567230) * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java Tablesplit.getLength returns 0 -- Key: HBASE-10413 URL: https://issues.apache.org/jira/browse/HBASE-10413 Project: HBase Issue Type: Bug Components: Client, mapreduce Affects Versions: 0.96.1.1 Reporter: Lukas Nalezenec Assignee: Lukas Nalezenec Fix For: 0.98.1, 0.99.0 Attachments: 10413-7.patch, 10413.addendum, HBASE-10413-2.patch, HBASE-10413-3.patch, HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, HBASE-10413.patch InputSplits should be sorted by length but TableSplit does not contain real getLength implementation: @Override public long getLength() { // Not clear how to obtain this... seems to be used only for sorting splits return 0; } This is causing us problem with scheduling - we have got jobs that are supposed to finish in limited time but they get often stuck in last mapper working on large region. Can we implement this method ? What is the best way ? We were thinking about estimating size by size of files on HDFS. We would like to get Scanner from TableSplit, use startRow, stopRow and column families to get corresponding region than computing size of HDFS for given region and column family. Update: This ticket was about production issue - I talked with guy who worked on this and he said our production issue was probably not directly caused by getLength() returning 0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10493) InclusiveStopFilter#filterKeyValue() should perform filtering on row key
[ https://issues.apache.org/jira/browse/HBASE-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-10493: -- Fix Version/s: 0.94.17 0.96.2 This is important correctness stuff that should be in 0.94 and 0.96 as well. InclusiveStopFilter#filterKeyValue() should perform filtering on row key Key: HBASE-10493 URL: https://issues.apache.org/jira/browse/HBASE-10493 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 0.96.2, 0.98.1, 0.99.0, 0.94.17 Attachments: 10493-v1.txt, 10493-v2.txt InclusiveStopFilter inherits filterKeyValue() from FilterBase which always returns ReturnCode.INCLUDE InclusiveStopFilter#filterKeyValue() should be consistent with filtering on row key. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10493) InclusiveStopFilter#filterKeyValue() should perform filtering on row key
[ https://issues.apache.org/jira/browse/HBASE-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898378#comment-13898378 ] stack commented on HBASE-10493: --- Agree InclusiveStopFilter#filterKeyValue() should perform filtering on row key Key: HBASE-10493 URL: https://issues.apache.org/jira/browse/HBASE-10493 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 0.96.2, 0.98.1, 0.99.0, 0.94.17 Attachments: 10493-v1.txt, 10493-v2.txt InclusiveStopFilter inherits filterKeyValue() from FilterBase which always returns ReturnCode.INCLUDE InclusiveStopFilter#filterKeyValue() should be consistent with filtering on row key. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-7849) Provide administrative limits around bulkloads of files into a single region
[ https://issues.apache.org/jira/browse/HBASE-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7849: --- Status: Patch Available (was: Open) Provide administrative limits around bulkloads of files into a single region Key: HBASE-7849 URL: https://issues.apache.org/jira/browse/HBASE-7849 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Harsh J Assignee: Jimmy Xiang Attachments: hbase-7849.patch Given the current mechanism, it is possible for users to flood a single region with 1k+ store files via the bulkload API and basically cause the region to become a flying dutchman - never getting assigned successfully again. Ideally, an administrative limit could solve this. If the bulkload RPC call can check if the region already has X store files, then it can reject the request to add another and throw a failure at the client with an appropriate message. This may be an intrusive change, but seems necessary in perfecting the gap between devs and ops in managing a HBase clusters. This would especially prevent abuse in form of unaware devs not pre-splitting tables before bulkloading things in. Currently, this leads to ops pain, as the devs think HBase has gone non-functional and begin complaining. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
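The administrative limit Harsh proposes boils down to a pre-check in the bulkload RPC path: if the target store already holds the configured maximum number of files, reject the request with a clear error instead of letting the region accumulate 1k+ store files. A minimal sketch with hypothetical names; the actual patch (and how the limit is configured or counted, per store vs per region) may differ:

```java
// Toy model of an administrative cap on bulkloaded files per store.
// A real implementation would read maxStoreFiles from configuration and
// run this check inside the region server's bulkload handler.
public class BulkLoadLimiter {
    private final int maxStoreFiles;

    BulkLoadLimiter(int maxStoreFiles) {
        this.maxStoreFiles = maxStoreFiles;
    }

    // Accept the request only if the resulting file count stays at or
    // below the cap; otherwise the client gets a failure it can act on,
    // e.g. by pre-splitting the table before reloading.
    boolean canAccept(int currentStoreFiles, int incomingFiles) {
        return currentStoreFiles + incomingFiles <= maxStoreFiles;
    }
}
```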
[jira] [Updated] (HBASE-7849) Provide administrative limits around bulkloads of files into a single region
[ https://issues.apache.org/jira/browse/HBASE-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7849: --- Attachment: hbase-7849.patch Provide administrative limits around bulkloads of files into a single region Key: HBASE-7849 URL: https://issues.apache.org/jira/browse/HBASE-7849 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Harsh J Assignee: Jimmy Xiang Attachments: hbase-7849.patch Given the current mechanism, it is possible for users to flood a single region with 1k+ store files via the bulkload API and basically cause the region to become a flying dutchman - never getting assigned successfully again. Ideally, an administrative limit could solve this. If the bulkload RPC call can check if the region already has X store files, then it can reject the request to add another and throw a failure at the client with an appropriate message. This may be an intrusive change, but seems necessary in perfecting the gap between devs and ops in managing a HBase clusters. This would especially prevent abuse in form of unaware devs not pre-splitting tables before bulkloading things in. Currently, this leads to ops pain, as the devs think HBase has gone non-functional and begin complaining. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10493) InclusiveStopFilter#filterKeyValue() should perform filtering on row key
[ https://issues.apache.org/jira/browse/HBASE-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10493: --- Priority: Major (was: Minor) InclusiveStopFilter#filterKeyValue() should perform filtering on row key Key: HBASE-10493 URL: https://issues.apache.org/jira/browse/HBASE-10493 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.96.2, 0.98.1, 0.99.0, 0.94.17 Attachments: 10493-v1.txt, 10493-v2.txt InclusiveStopFilter inherits filterKeyValue() from FilterBase which always returns ReturnCode.INCLUDE InclusiveStopFilter#filterKeyValue() should be consistent with filtering on row key. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898382#comment-13898382 ] Hudson commented on HBASE-10413: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #136 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/136/]) HBASE-10413 addendum makes split length readable (tedyu: rev 1567230) * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java Tablesplit.getLength returns 0 -- Key: HBASE-10413 URL: https://issues.apache.org/jira/browse/HBASE-10413 Project: HBase Issue Type: Bug Components: Client, mapreduce Affects Versions: 0.96.1.1 Reporter: Lukas Nalezenec Assignee: Lukas Nalezenec Fix For: 0.98.1, 0.99.0 Attachments: 10413-7.patch, 10413.addendum, HBASE-10413-2.patch, HBASE-10413-3.patch, HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, HBASE-10413.patch InputSplits should be sorted by length but TableSplit does not contain real getLength implementation: @Override public long getLength() { // Not clear how to obtain this... seems to be used only for sorting splits return 0; } This is causing us problem with scheduling - we have got jobs that are supposed to finish in limited time but they get often stuck in last mapper working on large region. Can we implement this method ? What is the best way ? We were thinking about estimating size by size of files on HDFS. We would like to get Scanner from TableSplit, use startRow, stopRow and column families to get corresponding region than computing size of HDFS for given region and column family. Update: This ticket was about production issue - I talked with guy who worked on this and he said our production issue was probably not directly caused by getLength() returning 0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-7849) Provide administrative limits around bulkloads of files into a single region
[ https://issues.apache.org/jira/browse/HBASE-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898386#comment-13898386 ] stack commented on HBASE-7849: -- +1 Provide administrative limits around bulkloads of files into a single region Key: HBASE-7849 URL: https://issues.apache.org/jira/browse/HBASE-7849 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Harsh J Assignee: Jimmy Xiang Attachments: hbase-7849.patch Given the current mechanism, it is possible for users to flood a single region with 1k+ store files via the bulkload API and basically cause the region to become a flying dutchman - never getting assigned successfully again. Ideally, an administrative limit could solve this. If the bulkload RPC call can check if the region already has X store files, then it can reject the request to add another and throw a failure at the client with an appropriate message. This may be an intrusive change, but seems necessary in perfecting the gap between devs and ops in managing a HBase clusters. This would especially prevent abuse in form of unaware devs not pre-splitting tables before bulkloading things in. Currently, this leads to ops pain, as the devs think HBase has gone non-functional and begin complaining. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10493) InclusiveStopFilter#filterKeyValue() should perform filtering on row key
[ https://issues.apache.org/jira/browse/HBASE-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10493: --- Status: Open (was: Patch Available) Patch for 0.94 coming InclusiveStopFilter#filterKeyValue() should perform filtering on row key Key: HBASE-10493 URL: https://issues.apache.org/jira/browse/HBASE-10493 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.96.2, 0.98.1, 0.99.0, 0.94.17 Attachments: 10493-v1.txt, 10493-v2.txt InclusiveStopFilter inherits filterKeyValue() from FilterBase which always returns ReturnCode.INCLUDE InclusiveStopFilter#filterKeyValue() should be consistent with filtering on row key. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10490) Simplify RpcClient code
[ https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898396#comment-13898396 ] stack commented on HBASE-10490: --- bq. meanwhile the server sees the connection as idle and closes it If server is taking timeout to reply, the client will be gone anyways? If request is taking tens of seconds, we should kill it? If this is expected, up the socket timeout? bq. we can remove the ping on the client without changing anything in the server: We could. I like purging ping altogether (unless I'm wrong above) since it puts a particular shape on how we process the incoming requests (look for the special -1 length indicator and short circuit if a ping) and would like this cleaned up so easier putting in another request handling (e.g. async). bq. But I agree that if no one uses rpcTimeout = 0, we could remove the ping stuff. Lets beat anyone who has their rpcTimeout to 0. Smile (That said, I have vague recollection that rpcTimeout==0 was how we defaulted at one time so let me go beat myself in the past) Simplify RpcClient code --- Key: HBASE-10490 URL: https://issues.apache.org/jira/browse/HBASE-10490 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.99.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 0.99.0 Attachments: 10490.v1.patch The code is complex. Here is a set of proposed changes, for trunk: 1) remove PingInputStream. if rpcTimeout 0 it just rethrows the exception. I expect that we always have a rpcTimeout. So we can remove the code. 2) remove the sendPing: instead, just close the connection if it's not used for a while, instead of trying to ping the server. 3) remove maxIddle time: to avoid the confusion if someone has overwritten the conf. 4) remove shouldCloseConnection: it was more or less synchronized with closeException. 
Having a single variable instead of two avoids the synchro 5) remove lastActivity: instead of trying to have an exact timeout, just kill the connection after some time. lastActivity could be set to wrong values if the server was slow to answer. 6) hopefully, a better management of the exception; we don't use the close exception of someone else as an input for another one. Same goes for interruption. I may have something wrong in the code. I will review it myself again. Feedback welcome, especially on the ping removal: I hope I got all the use cases. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10500) hbck and OOM when BucketCache is enabled
[ https://issues.apache.org/jira/browse/HBASE-10500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898398#comment-13898398 ] stack commented on HBASE-10500: --- +1 hbck and OOM when BucketCache is enabled Key: HBASE-10500 URL: https://issues.apache.org/jira/browse/HBASE-10500 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.98.0, 0.96.0, 0.99.0 Reporter: Nick Dimiduk Assignee: Nick Dimiduk Attachments: HBASE-10500.00.patch Running {{hbck --repair}} when BucketCache is enabled in offheap mode can cause OOM. This is apparently because {{bin/hbase}} does not include $HBASE_REGIONSERVER_OPTS for hbck. It instantiates an HRegion instance as part of HDFSIntegrityFixer.handleHoleInRegionChain. That HRegion initializes its CacheConfig, which doesn't have the necessary Direct Memory. Possible solutions include: - disable blockcache in the config used by hbck when running its repairs - include HBASE_REGIONSERVER_OPTS in the HBaseFSCK startup arguments I'm leaning toward the former because it's possible that hbck is run on a host with different hardware profile as the RS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10492) open daughter regions can unpredictably take long time
[ https://issues.apache.org/jira/browse/HBASE-10492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898410#comment-13898410 ] stack commented on HBASE-10492: --- 6 minutes is eons, a body blow. If you run the Oracle JVM, does it exhibit the same latencies? The IBM JDK taking this long to schedule a thread is a problem, a problem for us if we are to run well on the IBM JDK. We should dig in and figure out if it is something about the way this particular thread is scheduled, or if it is the case that any thread can be swapped out for pauses of this magnitude. open daughter regions can unpredictably take long time -- Key: HBASE-10492 URL: https://issues.apache.org/jira/browse/HBASE-10492 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.0 Reporter: Jerry He During stress testing I have seen the client getting RetriesExhaustedWithDetailsException: Failed 748 actions: NotServingRegionException On the master log, 2014-02-08 20:43 is the timestamp from OFFLINE to SPLITTING_NEW, 2014-02-08 21:41 is the timestamp from SPLITTING_NEW to OPEN. 
The corresponding time period on the region server log is:
{code}
2014-02-08 20:44:12,662 WARN org.apache.hadoop.hbase.regionserver.HRegionFileSystem: .regioninfo file not found for region: 010c1981882d1a59201af5e2dc589d44
2014-02-08 20:44:12,666 WARN org.apache.hadoop.hbase.regionserver.HRegionFileSystem: .regioninfo file not found for region: c2eb9b7971ca7f3fed3da86df5b788e7
{code}
There were no INFO entries related to these two regions until the following (at the end, note: Split took 57mins, 16sec):
{code}
2014-02-08 21:41:14,029 INFO org.apache.hadoop.hbase.regionserver.HRegion: Onlined c2eb9b7971ca7f3fed3da86df5b788e7; next sequenceid=213355
2014-02-08 21:41:14,031 INFO org.apache.hadoop.hbase.regionserver.HRegion: Onlined 010c1981882d1a59201af5e2dc589d44; next sequenceid=213354
2014-02-08 21:41:14,032 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Post open deploy tasks for region=tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.
2014-02-08 21:41:14,054 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Updated row tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7. with server=hdtest208.svl.ibm.com,60020,1391887547473
2014-02-08 21:41:14,054 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Finished post open deploy task for tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.
2014-02-08 21:41:14,054 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Post open deploy tasks for region=tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.
2014-02-08 21:41:14,059 INFO org.apache.hadoop.hbase.regionserver.HStore: Completed compaction of 10 file(s) in cf of tpch_hb_1000_2.lineitem,^\x01\x8B\xE7(\x80\x01\x80\x93\xFD\x01\x01\x80\x00\x00\x00\xB5\x0E\xCC'\x01\x80\x00\x00\x03,1391918508561.1fbcfc0a792435dfd73ec5b0ef5c953c. into 451be6df8c604993ae540b808d9cfa08(size=72.8 M), total size for store is 2.4 G. This selection was in queue for 0sec, and took 1mins, 40sec to execute.
2014-02-08 21:41:14,059 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: Completed compaction: Request = regionName=tpch_hb_1000_2.lineitem,^\x01\x8B\xE7(\x80\x01\x80\x93\xFD\x01\x01\x80\x00\x00\x00\xB5\x0E\xCC'\x01\x80\x00\x00\x03,1391918508561.1fbcfc0a792435dfd73ec5b0ef5c953c., storeName=cf, fileCount=10, fileSize=94.1 M, priority=9883, time=1391924373278861000; duration=1mins, 40sec
2014-02-08 21:41:14,059 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting compaction on cf in region tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.
2014-02-08 21:41:14,059 INFO org.apache.hadoop.hbase.regionserver.HStore: Starting compaction of 10 file(s) in cf of tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7. into tmpdir=gpfs:/hbase/data/default/tpch_hb_1000_2.lineitem/c2eb9b7971ca7f3fed3da86df5b788e7/.tmp, totalSize=709.7 M
2014-02-08 21:41:14,066 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Updated row tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44. with server=hdtest208.svl.ibm.com,60020,1391887547473
2014-02-08 21:41:14,066 INFO
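The hour-long gap in the log above is exactly the scheduling question raised in the comment. A minimal, self-contained probe for checking whether a JVM is waking sleeping threads late could look like this; it is illustrative only (not HBase code) and simply measures how far past the requested sleep time a thread actually resumes:

```java
public class SchedulingProbe {
    /** Returns the worst observed wake-up overshoot in milliseconds over the given rounds. */
    public static long worstOvershootMillis(int rounds, long sleepMillis) {
        long worst = 0;
        for (int i = 0; i < rounds; i++) {
            long start = System.nanoTime();
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag and stop probing
                break;
            }
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            worst = Math.max(worst, elapsedMillis - sleepMillis);
        }
        return worst;
    }

    public static void main(String[] args) {
        // On a healthy JVM/OS this prints a small number; multi-second values
        // would point at the scheduler rather than at HBase itself.
        System.out.println("worst overshoot: " + worstOvershootMillis(20, 10) + " ms");
    }
}
```

Running such a probe under load on both the IBM JDK and the Oracle JVM would help separate a scheduler-wide pause from something specific to the split thread.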
[jira] [Commented] (HBASE-10490) Simplify RpcClient code
[ https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898417#comment-13898417 ] Devaraj Das commented on HBASE-10490: - bq. Lets beat anyone who has their rpcTimeout to 0. If it's as simple as beating them up, +1 (though I would advise you to not beat yourself up just yet, Stack [smile]). Applications could break because a timeout of 0 won't be supported any more (maybe log a big warning if we detect this in the RPC client). And if there is agreement, this should be one of the things we stop supporting in the upcoming 1.0. Simplify RpcClient code --- Key: HBASE-10490 URL: https://issues.apache.org/jira/browse/HBASE-10490 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.99.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 0.99.0 Attachments: 10490.v1.patch The code is complex. Here is a set of proposed changes, for trunk: 1) remove PingInputStream. if rpcTimeout 0 it just rethrows the exception. I expect that we always have a rpcTimeout. So we can remove the code. 2) remove the sendPing: instead, just close the connection if it's not used for a while, instead of trying to ping the server. 3) remove maxIdle time: to avoid the confusion if someone has overwritten the conf. 4) remove shouldCloseConnection: it was more or less synchronized with closeException. Having a single variable instead of two avoids the synchro. 5) remove lastActivity: instead of trying to have an exact timeout, just kill the connection after some time. lastActivity could be set to wrong values if the server was slow to answer. 6) hopefully, a better management of the exceptions; we don't use the close exception of someone else as an input for another one. Same goes for interruption. I may have something wrong in the code. I will review it myself again. Feedback welcome, especially on the ping removal: I hope I got all the use cases.
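Item 2 of the proposal replaces active pinging with closing connections that have sat unused too long. A rough standalone sketch of the bookkeeping involved; the class and field names here (IdleTracker, maxIdleMillis) are hypothetical, not the actual RpcClient internals:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: track each connection's last-use time; a periodic sweep can then
// close any connection idle longer than the threshold instead of pinging it.
public class IdleTracker {
    private final long maxIdleMillis;
    private final Map<String, Long> lastUsed = new ConcurrentHashMap<>();

    public IdleTracker(long maxIdleMillis) {
        this.maxIdleMillis = maxIdleMillis;
    }

    /** Record that a connection was just used (call sent or response received). */
    public void touch(String connectionId) {
        lastUsed.put(connectionId, System.currentTimeMillis());
    }

    /** True if the connection should be closed rather than kept alive. */
    public boolean shouldClose(String connectionId, long now) {
        Long last = lastUsed.get(connectionId);
        return last != null && now - last > maxIdleMillis;
    }
}
```

Note Devaraj's concern above still applies to any such scheme: a connection with a long-running in-flight call must count as "in use", or the server/client may close it under a slow RPC.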
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10493) InclusiveStopFilter#filterKeyValue() should perform filtering on row key
[ https://issues.apache.org/jira/browse/HBASE-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10493: --- Attachment: 10493-0.94.txt InclusiveStopFilter#filterKeyValue() should perform filtering on row key Key: HBASE-10493 URL: https://issues.apache.org/jira/browse/HBASE-10493 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.96.2, 0.98.1, 0.99.0, 0.94.17 Attachments: 10493-0.94.txt, 10493-v1.txt, 10493-v2.txt InclusiveStopFilter inherits filterKeyValue() from FilterBase, which always returns ReturnCode.INCLUDE. InclusiveStopFilter#filterKeyValue() should be consistent with filtering on row key.
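The fix described above amounts to comparing each cell's row key against the configured stop row instead of inheriting FilterBase's unconditional INCLUDE. A simplified standalone model of that check (plain byte[] rows, unsigned lexicographic order as HBase uses for row keys; this is not the actual Filter API):

```java
public class InclusiveStopCheck {
    // HBase compares row-key bytes as unsigned values, lexicographically,
    // with a shorter prefix sorting before a longer key.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) return cmp;
        }
        return a.length - b.length;
    }

    /** True if a cell in this row should be included: row <= stopRow (stop row inclusive). */
    public static boolean include(byte[] row, byte[] stopRow) {
        return compareUnsigned(row, stopRow) <= 0;
    }
}
```

In the real filter, a cell in a row past the stop row would map to a non-INCLUDE ReturnCode, making filterKeyValue() consistent with the filter's row-level behavior.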
[jira] [Resolved] (HBASE-10493) InclusiveStopFilter#filterKeyValue() should perform filtering on row key
[ https://issues.apache.org/jira/browse/HBASE-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved HBASE-10493. Resolution: Fixed
[jira] [Commented] (HBASE-10498) Add new APIs to load balancer interface
[ https://issues.apache.org/jira/browse/HBASE-10498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898420#comment-13898420 ] stack commented on HBASE-10498: --- Can you not add a new attribute for the Stochastic LB to consider -- colocation -- and weight it above others rather than add API? Or it may be the case that colocating regions cross-cuts how SLB works currently (it having a single region focus). Add new APIs to load balancer interface --- Key: HBASE-10498 URL: https://issues.apache.org/jira/browse/HBASE-10498 Project: HBase Issue Type: Improvement Components: Balancer Reporter: rajeshbabu Assignee: rajeshbabu Fix For: 0.98.1, 0.99.0 If a custom load balancer is required to maintain region and corresponding server locations, we can capture this information when we run any balancer algorithm before assignment (like random, retain). But during master startup we will not call any balancer algorithm if a region is already assigned. During split we also open child regions first in the RS and then notify the master through ZooKeeper, so split region information cannot be captured into the balancer. Since the balancer has access to the master we can get the information from online regions or region plan data structures in AM, but in some use cases we cannot rely on this information (mainly to maintain colocation of two tables' regions). So it's better to add some APIs to the load balancer to notify it when a *region is online or offline*. These APIs help a lot to maintain *regions colocation through custom load balancer*, which is very important in secondary indexing.
[jira] [Commented] (HBASE-10493) InclusiveStopFilter#filterKeyValue() should perform filtering on row key
[ https://issues.apache.org/jira/browse/HBASE-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898421#comment-13898421 ] Lars Hofhansl commented on HBASE-10493: --- +1 on 0.94. I assume the trunk patch applies to 0.96. I'll handle HBASE-10485 after this has gone in.
[jira] [Commented] (HBASE-9360) Enable 0.94 - 0.96 replication to minimize upgrade down time
[ https://issues.apache.org/jira/browse/HBASE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898431#comment-13898431 ] Francis Liu commented on HBASE-9360: Sorry, late to the party. [~saint@gmail.com] mentioned this. We have a different approach: we instead extended the replication source/sink to use a thrift client/server to ship/receive the edits. We plan on using it for 0.94-0.96 replication as well as for encrypting replication communication. We currently have 0.94-0.94; the next step is 0.94-0.96. If people are interested I can try and share the code once we have things stable, though 0.94-0.96 might be a bit later. Enable 0.94 - 0.96 replication to minimize upgrade down time - Key: HBASE-9360 URL: https://issues.apache.org/jira/browse/HBASE-9360 Project: HBase Issue Type: Brainstorming Components: migration Affects Versions: 0.98.0, 0.96.0 Reporter: Jeffrey Zhong As we know, 0.96 is a singularity release; as of today a 0.94 hbase user has to do an in-place upgrade: make corresponding client changes, recompile client application code, fully shut down the existing 0.94 hbase cluster, deploy the 0.96 binary, run the upgrade script and then start the upgraded cluster. You can imagine the downtime will be extended if something goes wrong in between. To minimize the downtime, another possible way is to set up a secondary 0.96 cluster and then set up replication between the existing 0.94 cluster and the new 0.96 slave cluster. Once the 0.96 cluster is synced, a user can switch the traffic to the 0.96 cluster and decommission the old one.
The ideal steps will be: 1) Set up a 0.96 cluster 2) Set up replication between a running 0.94 cluster and the newly created 0.96 cluster 3) Wait till they're in sync in replication 4) Start duplicated writes to both the 0.94 and 0.96 clusters (could stop replication now) 5) Forward read traffic to the slave 0.96 cluster 6) After a certain period, stop writes to the original 0.94 cluster if everything is good, completing the upgrade To get us there, there are two tasks: 1) Enable replication from 0.94 to 0.96 I've run the idea by [~jdcryans], [~devaraj] and [~ndimiduk]. Currently it seems the best approach is to build a very similar service, or build on top of https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep, with support for three commands: replicateLogEntries, multi and delete. Inside the three commands, we just pass the corresponding requests down to the destination 0.96 cluster as a bridge. The reason to support multi and delete is for CopyTable to copy data from a 0.94 cluster to a 0.96 one. The other approach is to provide limited support of the 0.94 RPC protocol in 0.96. An issue with this is that a 0.94 client needs to talk to ZooKeeper first before it can connect to a 0.96 region server. Therefore, we would need a fake ZooKeeper setup in front of the 0.96 cluster for a 0.94 client to connect to. It may also pollute the 0.96 code base with 0.94 RPC code. 2) To support writes to a 0.96 cluster and a 0.94 one at the same time, we need to load both hbase clients into one single JVM using different class loaders. Let me know if you think this is worth doing and any better approach we could take. Thanks!
[jira] [Commented] (HBASE-10490) Simplify RpcClient code
[ https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898435#comment-13898435 ] Lars Hofhansl commented on HBASE-10490: --- +1 on removing the pingery. Just this discussion shows how convoluted it has become. Even with rpcTimeout = 0, would clients actually break? Wouldn't they just reconnect? (I might be confused)
[jira] [Updated] (HBASE-9507) Promote methods of WALActionsListener to WALObserver
[ https://issues.apache.org/jira/browse/HBASE-9507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-9507: Fix Version/s: 0.99.0 Promote methods of WALActionsListener to WALObserver Key: HBASE-9507 URL: https://issues.apache.org/jira/browse/HBASE-9507 Project: HBase Issue Type: Brainstorming Components: Coprocessors, wal Reporter: Nick Dimiduk Priority: Minor Fix For: 0.99.0 The interface exposed by WALObserver is quite minimal. To implement anything of significance based on WAL events, WALActionsListener (at a minimum) is required. This is demonstrated by the implementation of the replication feature (not currently possible with coprocessors) and the corresponding interface exploitation that is the [Side-Effect Processor|https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep]. Consider promoting the interface of WALActionsListener into WALObserver. This goes a long way toward being able to refactor replication into a coprocessor. It also removes the duplicate code path for listeners, because they're already available via the coprocessor hook.
[jira] [Commented] (HBASE-10490) Simplify RpcClient code
[ https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898447#comment-13898447 ] Devaraj Das commented on HBASE-10490: - bq. Even with rpcTimeout = 0, would clients actually break? Wouldn't they just reconnect? Most likely, they will. My point was that there is some change in semantics. Clients might be handling them well enough already.
[jira] [Commented] (HBASE-10493) InclusiveStopFilter#filterKeyValue() should perform filtering on row key
[ https://issues.apache.org/jira/browse/HBASE-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898452#comment-13898452 ] Ted Yu commented on HBASE-10493: Integrated to 0.94 and 0.96
[jira] [Updated] (HBASE-10361) Enable/AlterTable support for region replicas
[ https://issues.apache.org/jira/browse/HBASE-10361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated HBASE-10361: Attachment: 10361-1.txt Patch that assumes HBASE-10350's patch on jira (10350-3.txt). Enable/AlterTable support for region replicas - Key: HBASE-10361 URL: https://issues.apache.org/jira/browse/HBASE-10361 Project: HBase Issue Type: Sub-task Components: master Reporter: Enis Soztutar Assignee: Devaraj Das Fix For: 0.99.0 Attachments: 10361-1.txt Add support for region replicas in master operations enable table and modify table.
[jira] [Commented] (HBASE-10490) Simplify RpcClient code
[ https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898468#comment-13898468 ] stack commented on HBASE-10490: --- If someone wants timeout=0, and they want to avoid a beating, they could do timeout=Long.MAX_VALUE? I can't think of a place where timeout=0 would make any sense. Good stuff.
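One way to realize the suggestion above: treat a configured timeout of 0 as unsupported and normalize it to a large finite value, with a loud warning as Devaraj proposed earlier in the thread. A hedged sketch; the class name and logging are hypothetical, not the actual RpcClient behavior:

```java
public class TimeoutNormalizer {
    /**
     * Map a legacy "infinite" timeout (0 or negative) to an effectively
     * infinite but finite value, warning the user about the semantic change.
     */
    public static long normalize(long configuredTimeoutMillis) {
        if (configuredTimeoutMillis <= 0) {
            System.err.println(
                "WARN: rpc timeout <= 0 is no longer supported; using Long.MAX_VALUE");
            return Long.MAX_VALUE;
        }
        return configuredTimeoutMillis;
    }
}
```

This preserves old configs without keeping the ping machinery around solely for the timeout=0 case.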
[jira] [Updated] (HBASE-10500) Some tools OOM when BucketCache is enabled
[ https://issues.apache.org/jira/browse/HBASE-10500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-10500: - Component/s: (was: hbck) HFile Description: Running {{hbck --repair}} or {{LoadIncrementalHFiles}} when BucketCache is enabled in offheap mode can cause OOM. This is apparently because {{bin/hbase}} does not include $HBASE_REGIONSERVER_OPTS for these tools. This results in HRegion or HFileReaders initialized with a CacheConfig that doesn't have the necessary Direct Memory. Possible solutions include: - disable blockcache in the config used by hbck when running its repairs - include HBASE_REGIONSERVER_OPTS in the HBaseFSCK startup arguments I'm leaning toward the former because it's possible that hbck is run on a host with a different hardware profile than the RS. was: Running {{hbck --repair}} when BucketCache is enabled in offheap mode can cause OOM. This is apparently because {{bin/hbase}} does not include $HBASE_REGIONSERVER_OPTS for hbck. It instantiates an HRegion instance as part of HDFSIntegrityFixer.handleHoleInRegionChain. That HRegion initializes its CacheConfig, which doesn't have the necessary Direct Memory. Possible solutions include: - disable blockcache in the config used by hbck when running its repairs - include HBASE_REGIONSERVER_OPTS in the HBaseFSCK startup arguments I'm leaning toward the former because it's possible that hbck is run on a host with a different hardware profile than the RS. Summary: Some tools OOM when BucketCache is enabled (was: hbck and OOM when BucketCache is enabled) Some tools OOM when BucketCache is enabled -- Key: HBASE-10500 URL: https://issues.apache.org/jira/browse/HBASE-10500 Project: HBase Issue Type: Bug Components: HFile Affects Versions: 0.98.0, 0.96.0, 0.99.0 Reporter: Nick Dimiduk Assignee: Nick Dimiduk Attachments: HBASE-10500.00.patch
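The first proposed fix, disabling the block cache in the tool's private copy of the configuration, could look roughly like this. The config is modeled here as a plain map rather than a Hadoop Configuration object; the key names are the standard HBase cache settings, but treat the exact wiring as an assumption, not the shape of the actual patch:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: before a tool such as hbck opens regions or HFile readers, override
// the cache settings in its own copy of the config so no BucketCache (and
// hence no direct memory) is ever allocated on the tool's JVM.
public class ToolCacheConfig {
    public static Map<String, String> disableBlockCache(Map<String, String> conf) {
        Map<String, String> copy = new HashMap<>(conf); // leave the caller's config untouched
        copy.put("hfile.block.cache.size", "0");        // no on-heap LRU block cache
        copy.put("hbase.bucketcache.ioengine", "");     // no bucket cache / direct memory
        return copy;
    }
}
```

Keeping the override in the tool's copy, rather than the shared config, matches the observation above that the tool host may have a different hardware profile than the region servers.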
[jira] [Updated] (HBASE-10500) Some tools OOM when BucketCache is enabled
[ https://issues.apache.org/jira/browse/HBASE-10500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-10500: - Attachment: HBASE-10500.01.patch Patch moves conf management into the constructors so that programmatic use is also corrected. Without it, IntegrationTestImportTsv and IntegrationTestBulkLoad fail. Also removes the apparently redundant config from LoadIncrementalHFiles. If you were kind enough to provide a +1 earlier, note that this patch is a little more invasive.
[jira] [Commented] (HBASE-7849) Provide administrative limits around bulkloads of files into a single region
[ https://issues.apache.org/jira/browse/HBASE-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898483#comment-13898483 ] Hadoop QA commented on HBASE-7849: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12628345/hbase-7849.patch against trunk revision . ATTACHMENT ID: 12628345 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:red}-1 hadoop1.0{color}. The patch failed to compile against the hadoop 1.0 profile. Here is snippet of errors: {code}[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile (default-testCompile) on project hbase-server: Compilation failure: Compilation failure: [ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFiles.java:[50,29] cannot find symbol [ERROR] symbol : class GenericTestUtils [ERROR] location: package org.apache.hadoop.test [ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFiles.java:[384,6] cannot find symbol [ERROR] symbol : variable GenericTestUtils -- org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile (default-testCompile) on project hbase-server: Compilation failure at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:213) at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153) at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145) at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84) at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59) -- Caused by: org.apache.maven.plugin.CompilationFailureException: Compilation failure at org.apache.maven.plugin.AbstractCompilerMojo.execute(AbstractCompilerMojo.java:729) at org.apache.maven.plugin.TestCompilerMojo.execute(TestCompilerMojo.java:161) at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:101) at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:209) ... 19 more{code} Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8663//console This message is automatically generated. Provide administrative limits around bulkloads of files into a single region Key: HBASE-7849 URL: https://issues.apache.org/jira/browse/HBASE-7849 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Harsh J Assignee: Jimmy Xiang Attachments: hbase-7849.patch Given the current mechanism, it is possible for users to flood a single region with 1k+ store files via the bulkload API and basically cause the region to become a flying dutchman - never getting assigned successfully again. Ideally, an administrative limit could solve this. If the bulkload RPC call can check whether the region already has X store files, then it can reject the request to add another and throw a failure at the client with an appropriate message. This may be an intrusive change, but it seems necessary in closing the gap between devs and ops in managing HBase clusters. It would especially prevent abuse in the form of unaware devs not pre-splitting tables before bulkloading things in. Currently, this leads to ops pain, as the devs think HBase has gone non-functional and begin complaining.
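The administrative limit proposed in the description reduces to a simple pre-check on the bulkload path: reject the request when the store already holds too many files. A minimal illustrative sketch; the method name and threshold are hypothetical, and the real patch would wire this into the region server's bulkload RPC:

```java
// Sketch of the pre-check: before accepting a bulkload into a region's store,
// ask whether admitting the new files would push the store past the
// administrator-configured limit; if so, the RPC would fail with a clear message.
public class BulkLoadLimit {
    /** True if adding newFiles would push the store past maxStoreFiles. */
    public static boolean wouldExceed(int currentStoreFiles, int newFiles, int maxStoreFiles) {
        return currentStoreFiles + newFiles > maxStoreFiles;
    }
}
```

Failing fast at the RPC boundary gives the client an actionable error (pre-split the table, compact, or raise the limit) instead of silently producing an unassignable region.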
[jira] [Commented] (HBASE-10493) InclusiveStopFilter#filterKeyValue() should perform filtering on row key
[ https://issues.apache.org/jira/browse/HBASE-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898493#comment-13898493 ] stack commented on HBASE-10493: --- Thanks [~ted_yu]
[jira] [Commented] (HBASE-10500) Some tools OOM when BucketCache is enabled
[ https://issues.apache.org/jira/browse/HBASE-10500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898491#comment-13898491 ] stack commented on HBASE-10500: --- Yeah, probably better. I can't think of a case where a tool would need to go offheap. If one does, let's deal with it then. Meantime, get these tools usable again when offheap is enabled.
[jira] [Commented] (HBASE-10500) Some tools OOM when BucketCache is enabled
[ https://issues.apache.org/jira/browse/HBASE-10500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898492#comment-13898492 ] Hadoop QA commented on HBASE-10500: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12628314/HBASE-10500.00.patch against trunk revision . ATTACHMENT ID: 12628314 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:red}-1 core tests{color}. The patch failed these unit tests: {color:red}-1 core zombie tests{color}. 
There are 1 zombie test(s):

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8662//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8662//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8662//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8662//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8662//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8662//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8662//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8662//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8662//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8662//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8662//console

This message is automatically generated.
[jira] [Commented] (HBASE-10485) PrefixFilter#filterKeyValue() should perform filtering on row key
[ https://issues.apache.org/jira/browse/HBASE-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898498#comment-13898498 ]

Lars Hofhansl commented on HBASE-10485:
---

Committed to 0.96 after HBASE-10493 (also made AlwaysNextColFilter private).

PrefixFilter#filterKeyValue() should perform filtering on row key
---

             Key: HBASE-10485
             URL: https://issues.apache.org/jira/browse/HBASE-10485
         Project: HBase
      Issue Type: Bug
        Reporter: Ted Yu
        Assignee: Ted Yu
         Fix For: 0.96.2, 0.98.1, 0.99.0, 0.94.17
     Attachments: 10485-0.94-v2.txt, 10485-0.94.txt, 10485-trunk-v2.txt, 10485-trunk.addendum, 10485-v1.txt

Niels reported an issue under the thread 'Trouble writing custom filter for use in FilterList' where his custom filter, used in a FilterList along with PrefixFilter, produced unexpected results. His test can be found here: https://github.com/nielsbasjes/HBase-filter-problem

This happens because PrefixFilter#filterKeyValue() falls through to FilterBase#filterKeyValue(), which returns ReturnCode.INCLUDE. When FilterList.Operator.MUST_PASS_ONE is specified, FilterList#filterKeyValue() therefore returns ReturnCode.INCLUDE even when the row key prefix doesn't match, while the other filter's filterKeyValue() returns ReturnCode.NEXT_COL.
[jira] [Updated] (HBASE-10485) PrefixFilter#filterKeyValue() should perform filtering on row key
[ https://issues.apache.org/jira/browse/HBASE-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-10485:
---
    Resolution: Fixed
        Status: Resolved (was: Patch Available)

And committed to 0.94 as well. Thanks Ted.
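The MUST_PASS_ONE interaction behind this bug can be illustrated with a small, self-contained simulation. The ReturnCode enum and the lambda "filters" below are simplified stand-ins for the real org.apache.hadoop.hbase.filter classes, not the actual API; they only model the combination logic described in the report.

```java
import java.util.List;
import java.util.function.Function;

public class MustPassOneDemo {
    enum ReturnCode { INCLUDE, NEXT_COL }

    // MUST_PASS_ONE semantics: a cell passes if any one filter includes it.
    static ReturnCode mustPassOne(List<Function<String, ReturnCode>> filters, String rowKey) {
        for (Function<String, ReturnCode> f : filters) {
            if (f.apply(rowKey) == ReturnCode.INCLUDE) return ReturnCode.INCLUDE;
        }
        return ReturnCode.NEXT_COL;
    }

    // Stand-in for a custom filter (like Niels') that rejects this cell.
    static final Function<String, ReturnCode> OTHER = row -> ReturnCode.NEXT_COL;

    // Buggy PrefixFilter: filterKeyValue() inherits FilterBase's unconditional
    // INCLUDE; the prefix check lives elsewhere and never reaches FilterList.
    static ReturnCode buggyResult(String rowKey) {
        Function<String, ReturnCode> buggyPrefix = row -> ReturnCode.INCLUDE;
        return mustPassOne(List.of(buggyPrefix, OTHER), rowKey);
    }

    // Fixed PrefixFilter: filterKeyValue() itself rejects rows outside the prefix.
    static ReturnCode fixedResult(String rowKey) {
        Function<String, ReturnCode> fixedPrefix =
            row -> row.startsWith("abc") ? ReturnCode.INCLUDE : ReturnCode.NEXT_COL;
        return mustPassOne(List.of(fixedPrefix, OTHER), rowKey);
    }

    public static void main(String[] args) {
        // Row "zzz" does not match the "abc" prefix, yet the buggy filter
        // lets it through under MUST_PASS_ONE -- the unexpected result above.
        System.out.println(buggyResult("zzz"));  // INCLUDE
        System.out.println(fixedResult("zzz"));  // NEXT_COL
        System.out.println(fixedResult("abc1")); // INCLUDE
    }
}
```

This is why the fix moves the prefix comparison into filterKeyValue() itself: under MUST_PASS_ONE, any filter that answers INCLUDE unconditionally makes the whole list include the cell, regardless of what the sibling filters return.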
[jira] [Created] (HBASE-10504) Define Replication Interface
stack created HBASE-10504:
---

         Summary: Define Replication Interface
             Key: HBASE-10504
             URL: https://issues.apache.org/jira/browse/HBASE-10504
         Project: HBase
      Issue Type: Task
        Reporter: stack
        Assignee: stack
         Fix For: 0.99.0

HBase has replication. Fellas have been hijacking the replication apis to do all kinds of perverse stuff like indexing hbase content (hbase-indexer https://github.com/NGDATA/hbase-indexer) and our [~toffer] just showed up w/ overrides that replicate via an alternate channel (over a secure thrift channel between dcs over on HBASE-9360). This issue is about surfacing these APIs as public, with guarantees to downstreamers similar to those we have on our public client-facing APIs (so we don't break them for downstreamers).

Any input [~phunt] or [~gabriel.reid] or [~toffer]? Thanks.
[jira] [Commented] (HBASE-10504) Define Replication Interface
[ https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898504#comment-13898504 ]

stack commented on HBASE-10504:
---

Is HBASE-9507 related?
[jira] [Commented] (HBASE-10504) Define Replication Interface
[ https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898507#comment-13898507 ]

Nick Dimiduk commented on HBASE-10504:
---

bq. Is HBASE-9507 related?

I think so, yes, but that's based on code study and on catching changes along the 0.94 line that broke the above tool.
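To make the request concrete: none of the names below exist in HBase at the time of this discussion (defining such an interface is precisely what this issue asks for). This is a purely hypothetical, self-contained sketch of the kind of stable contract a downstreamer like hbase-indexer could implement instead of overriding replication internals:

```java
import java.util.ArrayList;
import java.util.List;

public class ReplicationSinkDemo {
    /** Hypothetical public contract: receive replicated edits, however the peer consumes them. */
    interface ReplicationSink {
        /** Apply a batch of replicated edits; return true once the batch is fully handled. */
        boolean replicate(List<String> edits);
    }

    /** Toy sink that "indexes" edits in memory instead of shipping them to a remote cluster. */
    static class IndexingSink implements ReplicationSink {
        final List<String> indexed = new ArrayList<>();

        @Override
        public boolean replicate(List<String> edits) {
            indexed.addAll(edits);
            return true;
        }
    }

    public static void main(String[] args) {
        IndexingSink sink = new IndexingSink();
        sink.replicate(List.of("row1/cf:q=a", "row2/cf:q=b"));
        System.out.println(sink.indexed.size()); // 2
    }
}
```

The design point is that replication internals would call only the narrow replicate() contract, so indexers, thrift-channel shippers, and the stock cluster-to-cluster path all become interchangeable implementations that survive internal refactorings.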