[jira] [Commented] (HBASE-10411) [Book] Add a kerberos 'request is a replay (34)' issue at troubleshooting section
[ https://issues.apache.org/jira/browse/HBASE-10411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897618#comment-13897618 ]

Liang Xie commented on HBASE-10411:

Hmm, it seems what I said above applies only to JDK6; I just found JDK-7085018 and JDK-6882687...

[Book] Add a kerberos 'request is a replay (34)' issue at troubleshooting section

Key: HBASE-10411
URL: https://issues.apache.org/jira/browse/HBASE-10411
Project: HBase
Issue Type: Improvement
Components: documentation, security
Reporter: takeshi.miao
Assignee: takeshi.miao
Priority: Minor
Attachments: HBASE-10411-trunk-v01.patch, HBASE-10411-v01.odt

For the kerberos 'request is a replay (34)' issue (HBASE-10379), add it to the troubleshooting section in the HBase book.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10490) Simplify RpcClient code
[ https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897617#comment-13897617 ]

Devaraj Das commented on HBASE-10490:

Nice cleanup, but hopefully we can make sure the changes don't break any assumptions in the RPC layer. The 'ping' removal stood out while I was doing a review. I am wondering if the server side works well with this change. I mean, could this happen: (1) the client sends an RPC, (2) the server gets to it but the request takes a long time to process, (3) meanwhile the server sees the connection as idle and closes it (since no ping came)? The other thing is, if the client's intended socket timeout is 0 (infinite timeout), is the ping still relevant to prevent the server from closing the connection on incomplete/not-yet-responded RPCs?

Simplify RpcClient code

Key: HBASE-10490
URL: https://issues.apache.org/jira/browse/HBASE-10490
Project: HBase
Issue Type: Bug
Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
Fix For: 0.99.0
Attachments: 10490.v1.patch

The code is complex. Here is a set of proposed changes, for trunk:
1) Remove PingInputStream: if rpcTimeout > 0 it just rethrows the exception. I expect that we always have an rpcTimeout, so we can remove the code.
2) Remove the sendPing: instead, just close the connection if it's not used for a while, instead of trying to ping the server.
3) Remove maxIdle time: to avoid the confusion if someone has overwritten the conf.
4) Remove shouldCloseConnection: it was more or less synchronized with closeException. Having a single variable instead of two avoids the synchronization.
5) Remove lastActivity: instead of trying to have an exact timeout, just kill the connection after some time. lastActivity could be set to wrong values if the server was slow to answer.
6) Hopefully, better management of the exceptions: we don't use the close exception of someone else as an input for another one. Same goes for interruption.
I may have something wrong in the code; I will review it myself again. Feedback welcome, especially on the ping removal: I hope I got all the use cases.
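Change (2) above, closing idle connections instead of pinging, can be illustrated with a small self-contained sketch. This is not the actual RpcClient code; the class and field names here are made up for illustration, assuming a per-connection last-used timestamp and a periodic sweep:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of change (2): instead of pinging the server to keep a
// connection alive, record when each connection was last used and close any
// connection that has been idle longer than maxIdleMs.
public class IdleConnectionSweeper {
    static class Conn {
        volatile long lastUsedMs;
        volatile boolean closed;
        void close() { closed = true; }
    }

    private final Map<String, Conn> connections = new ConcurrentHashMap<>();
    private final long maxIdleMs;

    IdleConnectionSweeper(long maxIdleMs) { this.maxIdleMs = maxIdleMs; }

    void markUsed(String id, long nowMs) {
        connections.computeIfAbsent(id, k -> new Conn()).lastUsedMs = nowMs;
    }

    // Called periodically; closes and forgets connections idle too long.
    void sweep(long nowMs) {
        connections.entrySet().removeIf(e -> {
            if (nowMs - e.getValue().lastUsedMs > maxIdleMs) {
                e.getValue().close();
                return true;
            }
            return false;
        });
    }

    boolean isOpen(String id) { return connections.containsKey(id); }

    public static void main(String[] args) {
        IdleConnectionSweeper s = new IdleConnectionSweeper(10_000);
        s.markUsed("rs1", 0);
        s.markUsed("rs2", 9_000);
        s.sweep(15_000); // rs1 idle 15s -> closed; rs2 idle 6s -> kept
        System.out.println(s.isOpen("rs1") + " " + s.isOpen("rs2"));
    }
}
```

This also makes Devaraj's concern concrete: a connection with a long-running in-flight request would look "unused" under this scheme unless markUsed is also called when a request is still pending.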
[jira] [Commented] (HBASE-10487) Avoid allocating new KeyValue and according bytes-copying for appended kvs which don't have existing values
[ https://issues.apache.org/jira/browse/HBASE-10487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897619#comment-13897619 ]

Hudson commented on HBASE-10487:

FAILURE: Integrated in HBase-TRUNK #4908 (See [https://builds.apache.org/job/HBase-TRUNK/4908/])
HBASE-10487 Avoid allocating new KeyValue and according bytes-copying for appended kvs which don't have existing values (Honghua) (tedyu: rev 1566981)
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java

Avoid allocating new KeyValue and according bytes-copying for appended kvs which don't have existing values

Key: HBASE-10487
URL: https://issues.apache.org/jira/browse/HBASE-10487
Project: HBase
Issue Type: Improvement
Components: regionserver
Reporter: Feng Honghua
Assignee: Feng Honghua
Attachments: HBASE-10487-trunk_v1.patch

In HRegion.append, new KeyValues are allocated and the corresponding bytes copied regardless of whether an existing kv is present for the appended cells. We can improve this by avoiding the allocation and bytes-copying for kvs which don't have existing (old) values: reuse the passed-in kv and only update its timestamp to 'now' (its original timestamp is latest, so it can be updated).
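The optimization above can be sketched with a simplified stand-in for KeyValue (the real HBase class has a packed byte[] layout; the `Kv` type and `appendResult` helper below are hypothetical, for illustration only):

```java
// Illustrative sketch of HBASE-10487's idea: when the appended column has no
// existing value, reuse the passed-in kv and only rewrite its timestamp,
// instead of allocating a new object and copying bytes.
public class AppendSketch {
    static class Kv {
        final byte[] value;
        long timestamp;
        Kv(byte[] value, long ts) { this.value = value; this.timestamp = ts; }
    }

    // oldKv == null means there is no existing value for this cell.
    static Kv appendResult(Kv oldKv, Kv newKv, long now) {
        if (oldKv == null) {
            newKv.timestamp = now; // reuse: no allocation, no byte copying
            return newKv;
        }
        // Existing value present: a merged kv must be allocated (old + new bytes).
        byte[] merged = new byte[oldKv.value.length + newKv.value.length];
        System.arraycopy(oldKv.value, 0, merged, 0, oldKv.value.length);
        System.arraycopy(newKv.value, 0, merged, oldKv.value.length, newKv.value.length);
        return new Kv(merged, now);
    }

    public static void main(String[] args) {
        Kv in = new Kv(new byte[]{1, 2}, 0L);
        Kv out = appendResult(null, in, 42L);
        // Same object comes back, with only the timestamp updated.
        System.out.println((out == in) + " " + out.timestamp);
    }
}
```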
[jira] [Commented] (HBASE-10495) upgrade script is printing usage two times with help option.
[ https://issues.apache.org/jira/browse/HBASE-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897656#comment-13897656 ]

Hadoop QA commented on HBASE-10495:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12628159/HBASE-10495.patch
against trunk revision .

ATTACHMENT ID: 12628159

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:red}-1 site{color}. The patch appears to cause the mvn site goal to fail.
{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8659//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8659//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8659//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8659//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8659//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8659//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8659//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8659//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8659//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8659//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8659//console

This message is automatically generated.

upgrade script is printing usage two times with help option.

Key: HBASE-10495
URL: https://issues.apache.org/jira/browse/HBASE-10495
Project: HBase
Issue Type: Bug
Components: Usability
Affects Versions: 0.96.0
Reporter: rajeshbabu
Assignee: rajeshbabu
Priority: Minor
Fix For: 0.96.2, 0.98.1, 0.99.0
Attachments: HBASE-10495.patch

While testing the 0.98 RC, found that the usage is printed two times with the help option.
{code}
HOST-10-18-91-14:/home/rajeshbabu/98RC3/hbase-0.98.0-hadoop2/bin # ./hbase upgrade -h
usage: $bin/hbase upgrade -check [-dir DIR]|-execute
 -check     Run upgrade check; looks for HFileV1 under ${hbase.rootdir}
            or provided 'dir' directory.
 -dir       Relative path of dir to check for HFileV1s.
 -execute   Run upgrade; zk and hdfs must be up, hbase down
 -h,--help  Help
Read http://hbase.apache.org/book.html#upgrade0.96 before attempting upgrade

Example usage:
Run upgrade check; looks for HFileV1s under ${hbase.rootdir}:
 $ bin/hbase upgrade -check
Run the upgrade:
 $ bin/hbase upgrade -execute

usage: $bin/hbase upgrade -check [-dir DIR]|-execute
 -check     Run upgrade check; looks for HFileV1 under ${hbase.rootdir}
            or provided 'dir' directory.
 -dir       Relative path of dir to check for HFileV1s.
 -execute   Run upgrade; zk and hdfs must be up, hbase down
 -h,--help  Help
Read http://hbase.apache.org/book.html#upgrade0.96 before attempting upgrade

Example usage:
Run upgrade check; looks for HFileV1s under
{code}
[jira] [Commented] (HBASE-10490) Simplify RpcClient code
[ https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897660#comment-13897660 ]

Nicolas Liochon commented on HBASE-10490:

Nice catch. The max idle time is actually used on the server as well. But it does not really work, because the defaults are different:
server: maxIdleTime = 2*conf.getInt("ipc.client.connection.maxidletime", 1000);
client: maxIdleTime = conf.getInt("hbase.ipc.client.connection.maxidletime", 10000); // 10s

So it means that the server disconnects any client that has not spoken for 2 seconds, while the client pings every 10 seconds. Note as well that one is prefixed by 'hbase.' while the other is not; in 2008 they were sharing the same name, then it diverged. I suppose the code on the server doesn't do much because of this:
{code}
protected int thresholdIdleConnections; // the number of idle
                                        // connections after which we
                                        // will start cleaning up idle
                                        // connections
{code}
thresholdIdleConnections defaults to 4000. Likely it never triggers, and if it did trigger it would not work because of the difference in default values. I suppose the best way of doing this is: order the connections not used for at least x seconds, then kill some of the oldest. But we can say as well that if we're satisfied with the way the server behaves today, we can remove the ping on the client without changing anything in the server: the behavior won't change.

Simplify RpcClient code

Key: HBASE-10490
URL: https://issues.apache.org/jira/browse/HBASE-10490
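The mismatch described in the comment above can be reduced to arithmetic. The concrete values here are taken from the comment's own text (2 * 1000 ms server-side vs. a 10000 ms client ping interval), not re-checked against the HBase/Hadoop source:

```java
// Sketch of the default mismatch: the server computes its idle limit as
// 2 * ipc.client.connection.maxidletime (default 1000 ms), while the client's
// ping interval comes from hbase.ipc.client.connection.maxidletime
// (default 10000 ms). Values are as quoted in the comment above.
public class IdleDefaults {
    public static void main(String[] args) {
        int serverBase = 1000;                 // ipc.client.connection.maxidletime
        int serverMaxIdleMs = 2 * serverBase;  // 2 s: server drops silent clients
        int clientPingIntervalMs = 10_000;     // hbase.ipc.client.connection.maxidletime
        // The server would close the connection long before the client's next ping:
        System.out.println(serverMaxIdleMs < clientPingIntervalMs);
    }
}
```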
[jira] [Created] (HBASE-10497) Add according handling for swallowed InterruptedException thrown by Thread.sleep under HBase-Client/HBase-Server folders systematically
Feng Honghua created HBASE-10497:

Summary: Add according handling for swallowed InterruptedException thrown by Thread.sleep under HBase-Client/HBase-Server folders systematically
Key: HBASE-10497
URL: https://issues.apache.org/jira/browse/HBASE-10497
Project: HBase
Issue Type: Improvement
Components: Client, regionserver
Reporter: Feng Honghua
Assignee: Feng Honghua
Priority: Minor

There are many places where an InterruptedException thrown by Thread.sleep is swallowed silently (neither declared in the caller method's throws clause nor rethrown immediately) under the HBase-Client/HBase-Server folders. It'd be better to add the standard 'log and call currentThread.interrupt' handling for such cases.
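The 'log and restore interrupt status' pattern the issue proposes looks like this in a minimal, dependency-free form (System.err stands in for the real LOG here):

```java
// Minimal example of the pattern: never swallow InterruptedException silently;
// log it and restore the thread's interrupt status so callers can still see it.
public class InterruptPattern {
    static void pause(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            System.err.println("Interrupted while sleeping: " + e);
            // Restore the interrupt status for code further up the stack.
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        Thread.currentThread().interrupt(); // simulate an interrupt arriving
        pause(10);                          // sleep throws immediately; status restored
        System.out.println(Thread.interrupted()); // true: status was preserved
    }
}
```

Without the `interrupt()` call, the sleep would have cleared the interrupt status and the cancellation request would be lost.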
[jira] [Updated] (HBASE-10497) Add according handling for swallowed InterruptedException thrown by Thread.sleep under HBase-Client/HBase-Server folders systematically
[ https://issues.apache.org/jira/browse/HBASE-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Feng Honghua updated HBASE-10497:

Attachment: HBASE-10497-trunk_v1.patch

Patch attached.
Note: the handling for the InterruptedException thrown by sleep within createTable in HBaseAdmin.java is to rethrow a wrapped InterruptedIOException, while it's ignored for the one thrown within deleteTable. To keep the previous semantics, I just log and call Thread.currentThread.interrupt there. But maybe rethrowing a wrapped InterruptedIOException as in createTable is more appropriate.
[jira] [Commented] (HBASE-10497) Add according handling for swallowed InterruptedException thrown by Thread.sleep under HBase-Client/HBase-Server folders systematically
[ https://issues.apache.org/jira/browse/HBASE-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897694#comment-13897694 ]

Feng Honghua commented on HBASE-10497:

Threads.sleep() (which prints the full call stack and calls Thread.currentThread.interrupt) is also a good alternative, and it's used in some places for this purpose, but maybe printing the full call stack is a bit more heavyweight than one-line logging for most cases?
[jira] [Updated] (HBASE-10497) Add standard handling for swallowed InterruptedException thrown by Thread.sleep under HBase-Client/HBase-Server folders systematically
[ https://issues.apache.org/jira/browse/HBASE-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Feng Honghua updated HBASE-10497:

Summary: Add standard handling for swallowed InterruptedException thrown by Thread.sleep under HBase-Client/HBase-Server folders systematically (was: Add according handling for swallowed InterruptedException thrown by Thread.sleep under HBase-Client/HBase-Server folders systematically)
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897700#comment-13897700 ]

Lukas Nalezenec commented on HBASE-10413:

Hi, thank you very much for your time. I need one small change. It's not critical, but it will make a considerable difference in user experience. My line
LOG.info(MessageFormat.format("Input split length: {0} bytes.", tSplit.getLength()));
was changed to
LOG.info("Input split length: " + tSplit.getLength() + " bytes.");
in the last code review. The reason I used MessageFormat.format is that the length is a large number and it needs to be printed with a thousands separator. It takes a few seconds to read the number 54798765321. How fast can you say whether that number represents 5.4 TB or 5.4 GB? But if you print it with separators you can read it correctly in a moment: 54,798,765,321. Can we add some formatting consistent with HBase coding standards? Maybe String.format, I don't know.
Lukas

Tablesplit.getLength returns 0

Key: HBASE-10413
URL: https://issues.apache.org/jira/browse/HBASE-10413
Project: HBase
Issue Type: Bug
Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec
Fix For: 0.98.1, 0.99.0
Attachments: 10413-7.patch, HBASE-10413-2.patch, HBASE-10413-3.patch, HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, HBASE-10413.patch

InputSplits should be sorted by length, but TableSplit does not contain a real getLength implementation:
{code}
@Override
public long getLength() {
  // Not clear how to obtain this... seems to be used only for sorting splits
  return 0;
}
{code}
This is causing us problems with scheduling: we have jobs that are supposed to finish in a limited time, but they often get stuck in the last mapper working on a large region. Can we implement this method? What is the best way? We were thinking about estimating the size by the size of the files on HDFS. We would like to get a Scanner from TableSplit, use startRow, stopRow and column families to get the corresponding region, then compute the HDFS size for the given region and column family.
Update: This ticket was about a production issue. I talked with the guy who worked on this, and he said our production issue was probably not directly caused by getLength() returning 0.
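The readability point in the comment above can be shown directly with String.format: the "%,d" conversion inserts grouping separators. The Locale is pinned here so the separator character is deterministic (it varies with the default locale):

```java
import java.util.Locale;

// Demonstrates the thousands-separator formatting requested in the comment:
// plain concatenation vs. String.format with the "%,d" grouping flag.
public class SplitLengthFormat {
    public static void main(String[] args) {
        long length = 54798765321L;
        // Hard to scan at a glance:
        System.out.println("Input split length: " + length + " bytes.");
        // Readable in a moment:
        System.out.println(String.format(Locale.US,
            "Input split length: %,d bytes.", length));
    }
}
```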
[jira] [Commented] (HBASE-10497) Add standard handling for swallowed InterruptedException thrown by Thread.sleep under HBase-Client/HBase-Server folders systematically
[ https://issues.apache.org/jira/browse/HBASE-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897708#comment-13897708 ]

Nicolas Liochon commented on HBASE-10497:

I'm not sure of this one.
{code}
+++ hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSyncUp.java (working copy)
@@ -111,6 +111,7 @@
       }
     } catch (InterruptedException e) {
       System.err.println("didn't wait long enough: " + e);
+      Thread.currentThread().interrupt();
       return (-1);
     }
{code}
Because we took care of the interruption already (maybe wrongly, however), the thread is not interrupted any more: the -1 means we stop. It's the same for the one with the split worker, likely.

This one is likely wrong:
{code}
@@ -185,6 +185,8 @@
     try {
       Thread.sleep(100);
     } catch (InterruptedException ignored) {
+      LOG.warn("Interrupted while sleeping");
+      Thread.currentThread().interrupt();
     }
     if (System.currentTimeMillis() > startTime + 30000) {
       throw new RuntimeException("Master not active after 30 seconds");
{code}
As it's inside a loop, we're likely to loop again; then the sleep will be interrupted immediately since we restored the interruption status, then we will log again, and we will flood the logs. I haven't checked the whole patch, but inside a loop you can't simply restore the status: you need to take a decision (stop the loop) or store the interruption and restore the status later. When we can, it's better to take care of the interruption explicitly by stopping our process and/or rethrowing an exception to the caller. In the case above, maybe we should throw a runtime exception, as in "Master not active after 30 seconds"? See as well ExceptionUtils (it's new).
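The loop concern in the comment above can be reproduced in miniature. Once the interrupt status is restored, the next sleep in the loop throws immediately, so merely restoring the status makes the loop spin and flood the log; one way out, as suggested, is to rethrow and leave the loop. The method and message names below are illustrative, not the actual HBase code:

```java
// Sketch: inside a retry loop, restore the interrupt status AND stop the loop
// (here by rethrowing), instead of restoring the status and looping again.
public class LoopInterrupt {
    static void waitUntilActive(int maxAttempts) {
        for (int attempts = 0; attempts < maxAttempts; attempts++) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve status for callers
                // Take a decision instead of spinning: leave the loop.
                throw new RuntimeException("Interrupted while waiting for master", e);
            }
        }
    }

    public static void main(String[] args) {
        Thread.currentThread().interrupt(); // simulate an interrupt arriving
        try {
            waitUntilActive(1000);
        } catch (RuntimeException e) {
            // The loop exited on the very first interrupted sleep, no log flood.
            System.out.println(e.getMessage());
        }
    }
}
```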
[jira] [Created] (HBASE-10498) Add new APIs to load balancer interface
rajeshbabu created HBASE-10498:

Summary: Add new APIs to load balancer interface
Key: HBASE-10498
URL: https://issues.apache.org/jira/browse/HBASE-10498
Project: HBase
Issue Type: Improvement
Components: Balancer
Reporter: rajeshbabu
Assignee: rajeshbabu
Fix For: 0.99.0

If a custom load balancer is required to maintain region and corresponding server locations, we can capture this information when we run any balancer algorithm before assignment (like random, retain). But during master startup we will not call any balancer algorithm if a region is already assigned. During split also, we open the child regions first on the RS and then notify the master through zookeeper, so split regions' information cannot be captured into the balancer. Since the balancer has access to the master, we can get the information from the online regions or the region plan data structures in the AM, but in some use cases we cannot rely on this information (mainly to maintain colocation of two tables' regions). So it's better to add some APIs to the load balancer to notify it when a *region is online or offline*. These APIs help a lot to maintain *regions colocation through a custom load balancer*, which is very important for secondary indexing.
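The proposed notification hooks can be sketched as follows. The method and type names here are illustrative (the real LoadBalancer interface works with HRegionInfo and ServerName, not Strings); the point is that online/offline callbacks let a custom balancer track region locations even for assignments it never chose, e.g. startup retains or splits reported via zookeeper:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed balancer callbacks.
interface RegionListener {
    void regionOnline(String region, String server);
    void regionOffline(String region);
}

// A custom balancer could implement the callbacks to keep an up-to-date
// region -> server map, e.g. to colocate index-table regions with their
// data-table regions (the secondary-indexing use case from the issue).
public class ColocationTracker implements RegionListener {
    private final Map<String, String> locations = new HashMap<>();

    @Override public void regionOnline(String region, String server) {
        locations.put(region, server);
    }
    @Override public void regionOffline(String region) {
        locations.remove(region);
    }
    String serverOf(String region) { return locations.get(region); }

    public static void main(String[] args) {
        ColocationTracker t = new ColocationTracker();
        t.regionOnline("usertable,aaa", "rs1");       // assignment the balancer never chose
        t.regionOnline("index_usertable,aaa", "rs1"); // colocated index region
        t.regionOffline("usertable,aaa");
        System.out.println(t.serverOf("index_usertable,aaa") + " " + t.serverOf("usertable,aaa"));
    }
}
```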
[jira] [Updated] (HBASE-8332) Add truncate as HMaster method
[ https://issues.apache.org/jira/browse/HBASE-8332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-8332:

Attachment: HBASE-8332-v3.patch

Add truncate as HMaster method

Key: HBASE-8332
URL: https://issues.apache.org/jira/browse/HBASE-8332
Project: HBase
Issue Type: Improvement
Components: master
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Minor
Attachments: HBASE-8332-v0.patch, HBASE-8332-v2.patch, HBASE-8332-v3.patch, HBASE-8332.draft.patch

Currently truncate and truncate_preserve are only shell functions, implemented as deleteTable() + createTable(). With ACLs, the user running truncate must have the rights to create a table, and only globally granted users can create tables. Add truncate() and truncatePreserve() to HBaseAdmin/HMaster with their own ACL check.
https://reviews.apache.org/r/15835/
[jira] [Updated] (HBASE-8332) Add truncate as HMaster method
[ https://issues.apache.org/jira/browse/HBASE-8332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-8332:

Attachment: (was: HBASE-8332-v3.patch)
[jira] [Updated] (HBASE-8332) Add truncate as HMaster method
[ https://issues.apache.org/jira/browse/HBASE-8332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-8332:

Attachment: HBASE-8332-v3.patch
[jira] [Created] (HBASE-10499) In write heavy scenario one of the regions does not get flushed causing RegionTooBusyException
ramkrishna.s.vasudevan created HBASE-10499:

Summary: In write heavy scenario one of the regions does not get flushed causing RegionTooBusyException
Key: HBASE-10499
URL: https://issues.apache.org/jira/browse/HBASE-10499
Project: HBase
Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
Fix For: 0.98.0, 0.98.1

I got this while testing the 0.98 RC, but I am not sure if it is specific to this version; it doesn't seem so to me. It is also somewhat similar to HBASE-5312 and HBASE-5568.
Using 10 threads I do writes to 4 RS using YCSB. The table created has 200 regions. In one of the runs with 0.98 server and 0.98 client I hit this problem: the number of hlogs kept growing and the system requested flushes for that many regions. One by one everything was flushed except one, and that one region remained unflushed. The ripple effect of this on the client side:
{code}
com.yahoo.ycsb.DBException: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 54 actions: RegionTooBusyException: 54 times,
    at com.yahoo.ycsb.db.HBaseClient.cleanup(HBaseClient.java:245)
    at com.yahoo.ycsb.DBWrapper.cleanup(DBWrapper.java:73)
    at com.yahoo.ycsb.ClientThread.run(Client.java:307)
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 54 actions: RegionTooBusyException: 54 times,
    at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:187)
    at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$500(AsyncProcess.java:171)
    at org.apache.hadoop.hbase.client.AsyncProcess.getErrors(AsyncProcess.java:897)
    at org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:961)
    at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1225)
    at com.yahoo.ycsb.db.HBaseClient.cleanup(HBaseClient.java:232)
    ... 2 more
{code}
On one of the RS:
{code}
2014-02-11 08:45:58,714 INFO [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=38, maxlogs=32; forcing flush of 23 regions(s): 97d8ae2f78910cc5ded5fbb1ddad8492, d396b8a1da05c871edcb68a15608fdf2, 01a68742a1be3a9705d574ad68fec1d7, 1250381046301e7465b6cf398759378e, 127c133f47d0419bd5ab66675aff76d4, 9f01c5d25ddc6675f750968873721253, 29c055b5690839c2fa357cd8e871741e, ca4e33e3eb0d5f8314ff9a870fc43463, acfc6ae756e193b58d956cb71ccf0aa3, 187ea304069bc2a3c825bc10a59c7e84, 0ea411edc32d5c924d04bf126fa52d1e, e2f9331fc7208b1b230a24045f3c869e, d9309ca864055eddf766a330352efc7a, 1a71bdf457288d449050141b5ff00c69, 0ba9089db28e977f86a27f90bbab9717, fdbb3242d3b673bbe4790a47bc30576f, bbadaa1f0e62d8a8650080b824187850, b1a5de30d8603bd5d9022e09c574501b, cc6a9fabe44347ed65e7c325faa72030, 313b17dbff2497f5041b57fe13fa651e, 6b788c498503ddd3e1433a4cd3fb4e39, 3d71274fe4f815882e9626e1cfa050d1, acc43e4b42c1a041078774f4f20a3ff5
..
2014-02-11 08:47:49,580 INFO [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=53, maxlogs=32; forcing flush of 2 regions(s): fdbb3242d3b673bbe4790a47bc30576f, 6b788c498503ddd3e1433a4cd3fb4e39
{code}
{code}
2014-02-11 09:42:44,237 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user3654,1392107806977.fdbb3242d3b673bbe4790a47bc30576f. after a delay of 16689
2014-02-11 09:42:44,237 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user6264,1392107806983.6b788c498503ddd3e1433a4cd3fb4e39. after a delay of 15868
2014-02-11 09:42:54,238 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user3654,1392107806977.fdbb3242d3b673bbe4790a47bc30576f. after a delay of 20847
2014-02-11 09:42:54,238 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user6264,1392107806983.6b788c498503ddd3e1433a4cd3fb4e39. after a delay of 20099
2014-02-11 09:43:04,238 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user3654,1392107806977.fdbb3242d3b673bbe4790a47bc30576f. after a delay of 8677
{code}
{code}
2014-02-11 10:31:21,020 INFO [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=54, maxlogs=32; forcing flush of 1 regions(s): fdbb3242d3b673bbe4790a47bc30576f
{code}
I restarted another RS and there were region movements involving other regions, but this region stayed with the RS that has this issue. One important observation is that in
[jira] [Commented] (HBASE-10499) In write heavy scenario one of the regions does not get flushed causing RegionTooBusyException
[ https://issues.apache.org/jira/browse/HBASE-10499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897779#comment-13897779 ]

ramkrishna.s.vasudevan commented on HBASE-10499:

Am not sure if this could come up in 0.96 and trunk also. I feel it is possible in 0.96, but with trunk (the recent HLog disruptor changes) I am not sure. It may also be possible in 0.94. I don't have any solution in hand except adding log messages for the case where memstoreSize could be zero. Will check more on this.
The ripple effect of this on the client side {code} com.yahoo.ycsb.DBException: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 54 actions: RegionTooBusyException: 54 times, at com.yahoo.ycsb.db.HBaseClient.cleanup(HBaseClient.java:245) at com.yahoo.ycsb.DBWrapper.cleanup(DBWrapper.java:73) at com.yahoo.ycsb.ClientThread.run(Client.java:307) Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 54 actions: RegionTooBusyException: 54 times, at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:187) at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$500(AsyncProcess.java:171) at org.apache.hadoop.hbase.client.AsyncProcess.getErrors(AsyncProcess.java:897) at org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:961) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1225) at com.yahoo.ycsb.db.HBaseClient.cleanup(HBaseClient.java:232) ... 2 more {code} On one of the RS {code} 2014-02-11 08:45:58,714 INFO [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=38, maxlogs=32; forcing flush of 23 regions(s): 97d8ae2f78910cc5ded5fbb1ddad8492, d396b8a1da05c871edcb68a15608fdf2, 01a68742a1be3a9705d574ad68fec1d7, 1250381046301e7465b6cf398759378e, 127c133f47d0419bd5ab66675aff76d4, 9f01c5d25ddc6675f750968873721253, 29c055b5690839c2fa357cd8e871741e, ca4e33e3eb0d5f8314ff9a870fc43463, acfc6ae756e193b58d956cb71ccf0aa3, 187ea304069bc2a3c825bc10a59c7e84, 0ea411edc32d5c924d04bf126fa52d1e, e2f9331fc7208b1b230a24045f3c869e, d9309ca864055eddf766a330352efc7a, 1a71bdf457288d449050141b5ff00c69, 0ba9089db28e977f86a27f90bbab9717, fdbb3242d3b673bbe4790a47bc30576f, bbadaa1f0e62d8a8650080b824187850, b1a5de30d8603bd5d9022e09c574501b, cc6a9fabe44347ed65e7c325faa72030, 313b17dbff2497f5041b57fe13fa651e, 6b788c498503ddd3e1433a4cd3fb4e39, 3d71274fe4f815882e9626e1cfa050d1, acc43e4b42c1a041078774f4f20a3ff5 .. 
2014-02-11 08:47:49,580 INFO [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=53, maxlogs=32; forcing flush of 2 regions(s): fdbb3242d3b673bbe4790a47bc30576f, 6b788c498503ddd3e1433a4cd3fb4e39 {code} {code} 2014-02-11 09:42:44,237 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user3654,1392107806977.fdbb3242d3b673bbe4790a47bc30576f. after a delay of 16689 2014-02-11 09:42:44,237 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user6264,1392107806983.6b788c498503ddd3e1433a4cd3fb4e39. after a delay of 15868 2014-02-11 09:42:54,238 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user3654,1392107806977.fdbb3242d3b673bbe4790a47bc30576f. after a delay of 20847 2014-02-11 09:42:54,238 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user6264,1392107806983.6b788c498503ddd3e1433a4cd3fb4e39. after a delay
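The "Too many hlogs" messages above come from the WAL roller asking regions that still pin old log files to flush so those logs can be archived. A toy model of that selection logic (the class, the map layout, and MAX_LOGS are illustrative, not HBase's actual FSHLog code) helps explain why the same two region hashes keep reappearing: a region whose oldest unflushed edit lives in one of the oldest WALs must flush before those WALs can go away.

```java
import java.util.*;

// Toy model of the "Too many hlogs: forcing flush" selection seen in the
// quoted FSHLog messages. Names here are illustrative, not HBase classes.
public class WalPressureSketch {
    static final int MAX_LOGS = 32;

    // Given, for each region, the index of the oldest WAL still holding its
    // unflushed edits, return the regions that must flush before the
    // (walCount - MAX_LOGS) oldest WALs can be archived.
    static List<String> regionsToFlush(Map<String, Integer> oldestWalForRegion, int walCount) {
        if (walCount <= MAX_LOGS) return Collections.emptyList();
        int cutoff = walCount - MAX_LOGS; // WAL indices 0..cutoff-1 must go
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : oldestWalForRegion.entrySet()) {
            // This region pins one of the oldest WALs -> force a flush.
            if (e.getValue() < cutoff) result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> regions = new LinkedHashMap<>();
        regions.put("fdbb3242", 0);  // pinned to the very first WAL -> must flush
        regions.put("6b788c49", 1);  // also pins an old WAL
        regions.put("97d8ae2f", 40); // only recent edits -> safe
        System.out.println(regionsToFlush(regions, 38)); // prints [fdbb3242, 6b788c49]
    }
}
```

In the bug report, the requested flush for one such region never actually happens, so the log count keeps climbing (38, 53, 54 ...) with the same region named every time.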
[jira] [Updated] (HBASE-10499) In write heavy scenario one of the regions does not get flushed causing RegionTooBusyException
[ https://issues.apache.org/jira/browse/HBASE-10499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-10499: --- Affects Version/s: 0.98.0
[jira] [Commented] (HBASE-10499) In write heavy scenario one of the regions does not get flushed causing RegionTooBusyException
[ https://issues.apache.org/jira/browse/HBASE-10499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897780#comment-13897780 ] ramkrishna.s.vasudevan commented on HBASE-10499: I have logs and thread dumps taken during this time. If needed, I can attach them here.
[jira] [Updated] (HBASE-10499) In write heavy scenario one of the regions does not get flushed causing RegionTooBusyException
[ https://issues.apache.org/jira/browse/HBASE-10499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-10499: --- Fix Version/s: (was: 0.98.0)
[jira] [Commented] (HBASE-10499) In write heavy scenario one of the regions does not get flushed causing RegionTooBusyException
[ https://issues.apache.org/jira/browse/HBASE-10499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897787#comment-13897787 ] ramkrishna.s.vasudevan commented on HBASE-10499: In HBASE-5568 and HBASE-5312 there were multiple flushes on the same region and regions were being split. Nothing like that happens here: the region in question has never been flushed even once, and no splits or compactions have occurred on it.
[jira] [Commented] (HBASE-8332) Add truncate as HMaster method
[ https://issues.apache.org/jira/browse/HBASE-8332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897799#comment-13897799 ] Hadoop QA commented on HBASE-8332: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12628180/HBASE-8332-v3.patch against trunk revision . ATTACHMENT ID: 12628180 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 14 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. 
The patch introduces the following lines longer than 100: +private TruncateTableRequest(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); } +private TruncateTableResponse(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); } +private EnableTableRequest(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); } +private EnableTableResponse(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); } +private DisableTableRequest(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); } +private DisableTableResponse(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); } +private ModifyTableRequest(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); } +.preTruncateTable(ObserverContext.createAndPrepare(CP_ENV, null), TEST_TABLE.getTableName()); {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8660//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8660//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8660//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8660//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8660//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8660//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8660//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8660//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8660//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8660//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8660//console This message is automatically generated. Add truncate as HMaster method -- Key: HBASE-8332 URL: https://issues.apache.org/jira/browse/HBASE-8332 Project: HBase Issue Type: Improvement Components: master Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Attachments: HBASE-8332-v0.patch, HBASE-8332-v2.patch, HBASE-8332-v3.patch, HBASE-8332.draft.patch Currently truncate and truncate_preserve are only shell functions, and implemented as deleteTable() + createTable(). 
With ACLs, the user running truncate must have the right to create a table, and only globally granted users can create tables. Add truncate() and truncatePreserve() to HBaseAdmin/HMaster with their own ACL check.
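The shell-side implementation the description refers to is essentially disable + delete + re-create. A minimal sketch of that sequence against a toy in-memory catalog (TableCatalog, Perm, and the permission model here are illustrative, not HBase's actual AccessController) shows why the caller currently needs a global CREATE right, which is the motivation for a dedicated master-side truncate with its own ACL check:

```java
import java.util.*;

// Toy in-memory catalog; TableCatalog/Perm are illustrative, not HBase API.
public class TruncateSketch {
    enum Perm { ADMIN, CREATE }

    static class TableCatalog {
        final Map<String, List<String>> tables = new HashMap<>(); // name -> rows
        final Set<Perm> callerPerms;
        TableCatalog(Set<Perm> perms) { this.callerPerms = perms; }

        void deleteTable(String name) {
            require(Perm.ADMIN);
            tables.remove(name);
        }
        void createTable(String name) {
            require(Perm.CREATE); // the global right truncate should not need
            tables.put(name, new ArrayList<>());
        }
        // truncate implemented as deleteTable() + createTable(), like the shell.
        void truncate(String name) {
            deleteTable(name);
            createTable(name);
        }
        private void require(Perm p) {
            if (!callerPerms.contains(p))
                throw new SecurityException("missing " + p);
        }
    }

    public static void main(String[] args) {
        TableCatalog adminOnly = new TableCatalog(EnumSet.of(Perm.ADMIN));
        adminOnly.tables.put("t1", new ArrayList<>(List.of("row1")));
        try {
            adminOnly.truncate("t1"); // delete succeeds, create is denied
        } catch (SecurityException e) {
            System.out.println("truncate failed: " + e.getMessage());
        }
    }
}
```

Note the failure mode: the table is already gone when the create step is denied, another reason to make truncate a single master operation rather than two client-side calls.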
[jira] [Commented] (HBASE-8332) Add truncate as HMaster method
[ https://issues.apache.org/jira/browse/HBASE-8332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897809#comment-13897809 ] Hadoop QA commented on HBASE-8332: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12628184/HBASE-8332-v3.patch against trunk revision . ATTACHMENT ID: 12628184 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 14 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. 
The patch introduces the following lines longer than 100: +private TruncateTableRequest(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); } +private TruncateTableResponse(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); } +private EnableTableRequest(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); } +private EnableTableResponse(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); } +private DisableTableRequest(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); } +private DisableTableResponse(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); } +private ModifyTableRequest(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); } +.preTruncateTable(ObserverContext.createAndPrepare(CP_ENV, null), TEST_TABLE.getTableName()); {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8661//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8661//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8661//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8661//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8661//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8661//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8661//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8661//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8661//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8661//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8661//console This message is automatically generated. Add truncate as HMaster method -- Key: HBASE-8332 URL: https://issues.apache.org/jira/browse/HBASE-8332 Project: HBase Issue Type: Improvement Components: master Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Attachments: HBASE-8332-v0.patch, HBASE-8332-v2.patch, HBASE-8332-v3.patch, HBASE-8332.draft.patch https://reviews.apache.org/r/15835/
[jira] [Commented] (HBASE-10486) ProtobufUtil Append Increment deserialization lost cell level timestamp
[ https://issues.apache.org/jira/browse/HBASE-10486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897814#comment-13897814 ] Hudson commented on HBASE-10486: SUCCESS: Integrated in hbase-0.96-hadoop2 #199 (See [https://builds.apache.org/job/hbase-0.96-hadoop2/199/]) HBASE-10486: ProtobufUtil Append Increment deserialization lost cell level timestamp (jeffreyz: rev 1566962) * /hbase/branches/0.96/hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java * /hbase/branches/0.96/hbase-server/src/test/java/org/apache/hadoop/hbase/protobuf/TestProtobufUtil.java ProtobufUtil Append Increment deserialization lost cell level timestamp - Key: HBASE-10486 URL: https://issues.apache.org/jira/browse/HBASE-10486 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1 Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.98.1, 0.99.0 Attachments: hbase-10486-v2.patch, hbase-10486.patch When deserializing an Append or Increment, we use the wrong timestamp value in the trunk/0.98 code and discard the value in the 0.96 code base. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
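The gist of the fix is to keep a cell's own timestamp when it was set, instead of overwriting it during deserialization. A minimal sketch of that idea (resolveTimestamp and the LATEST_TIMESTAMP sentinel here are illustrative, not the actual ProtobufUtil code):

```java
// Illustrative model of preserving a cell-level timestamp on deserialization.
public class TimestampSketch {
    static final long LATEST_TIMESTAMP = Long.MAX_VALUE; // sentinel: "unset"

    // Returns the timestamp for a deserialized cell: keep the cell's own
    // timestamp when it was explicitly set, otherwise fall back to the
    // operation-level (Append/Increment) timestamp.
    static long resolveTimestamp(long cellTs, long opTs) {
        return cellTs != LATEST_TIMESTAMP ? cellTs : opTs;
    }

    public static void main(String[] args) {
        // Cell carried its own ts -> must survive the round trip.
        System.out.println(resolveTimestamp(1392107806983L, 5L)); // 1392107806983
        // Cell ts unset -> use the operation-level ts.
        System.out.println(resolveTimestamp(LATEST_TIMESTAMP, 5L)); // 5
    }
}
```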
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897820#comment-13897820 ] Lukas Nalezenec commented on HBASE-10413: - One more thing: there is some versioning in class TableSplit (methods write/read). Don't we need to increment it? (I am just asking.) Tablesplit.getLength returns 0 -- Key: HBASE-10413 URL: https://issues.apache.org/jira/browse/HBASE-10413 Project: HBase Issue Type: Bug Components: Client, mapreduce Affects Versions: 0.96.1.1 Reporter: Lukas Nalezenec Assignee: Lukas Nalezenec Fix For: 0.98.1, 0.99.0 Attachments: 10413-7.patch, HBASE-10413-2.patch, HBASE-10413-3.patch, HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, HBASE-10413.patch InputSplits should be sorted by length but TableSplit does not contain a real getLength implementation: @Override public long getLength() { // Not clear how to obtain this... seems to be used only for sorting splits return 0; } This is causing us problems with scheduling - we have jobs that are supposed to finish in a limited time, but they often get stuck in the last mapper working on a large region. Can we implement this method? What is the best way? We were thinking about estimating the size from the size of the files on HDFS: get a Scanner from the TableSplit, use startRow, stopRow and column families to find the corresponding region, then compute the HDFS size for that region and column family. Update: This ticket was about a production issue - I talked with the guy who worked on this and he said our production issue was probably not directly caused by getLength() returning 0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
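The scheduling problem described above comes from how MapReduce uses getLength(): splits are sorted by length so the largest ones are scheduled first, and a constant 0 makes that ordering arbitrary. A minimal, self-contained sketch of the effect (the Split class here is a hypothetical stand-in, not HBase's TableSplit):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for an InputSplit: only the length matters here.
class Split {
    final String region;
    final long length; // estimated bytes, e.g. from the region's HDFS store files

    Split(String region, long length) {
        this.region = region;
        this.length = length;
    }
}

public class SplitSortDemo {
    public static void main(String[] args) {
        List<Split> splits = new ArrayList<>();
        splits.add(new Split("regionA", 0L)); // what getLength()==0 reports for everything
        splits.add(new Split("regionB", 512L * 1024 * 1024));
        splits.add(new Split("regionC", 64L * 1024 * 1024));

        // MapReduce schedules the biggest splits first; with all lengths 0 the
        // order is arbitrary and a huge region can land in the last mapper wave.
        splits.sort(Comparator.comparingLong((Split s) -> s.length).reversed());

        System.out.println(splits.get(0).region); // regionB: largest starts first
    }
}
```

With real length estimates the 512 MB region is scheduled first instead of possibly last.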
[jira] [Updated] (HBASE-10497) Add standard handling for swallowed InterruptedException thrown by Thread.sleep under HBase-Client/HBase-Server folders systematically
[ https://issues.apache.org/jira/browse/HBASE-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Honghua updated HBASE-10497: - Attachment: HBASE-10497-trunk_v2.patch Patch attached per [~nkeywal]'s review feedback, and thanks [~nkeywal] again:-) Add standard handling for swallowed InterruptedException thrown by Thread.sleep under HBase-Client/HBase-Server folders systematically -- Key: HBASE-10497 URL: https://issues.apache.org/jira/browse/HBASE-10497 Project: HBase Issue Type: Improvement Components: Client, regionserver Reporter: Feng Honghua Assignee: Feng Honghua Priority: Minor Attachments: HBASE-10497-trunk_v1.patch, HBASE-10497-trunk_v2.patch There are many places where InterruptedException thrown by Thread.sleep are swallowed silently (which are neither declared in the caller method's throws clause nor rethrown immediately) under HBase-Client/HBase-Server folders. It'd be better to add standard 'log and call currentThread.interrupt' for such cases. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10497) Add standard handling for swallowed InterruptedException thrown by Thread.sleep under HBase-Client/HBase-Server folders systematically
[ https://issues.apache.org/jira/browse/HBASE-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897827#comment-13897827 ] Feng Honghua commented on HBASE-10497: -- Thanks [~nkeywal] for the prompt review! bq.This one is likely wrong...As it's inside a loop, we're likely to loop again, then the sleep will be interrupted immediately as we restored the interruption status, then we will log again = we will flood the logs Good catch, my carelessness. To keep the same semantics (in terms of how/when we handle the exception), I use the code block below for non-cancelable tasks such as while/for loops, to prevent a later sleep from immediately throwing InterruptedException due to the re-interrupt issued by an earlier iteration's catch:
{code}
boolean interrupted = false;
try {
  while (...) {
    try {
      ...
      Thread.sleep(...);
      ...
    } catch (InterruptedException e) {
      interrupted = true;
    }
  }
} finally {
  if (interrupted) {
    LOG.warn(...);
    Thread.currentThread().interrupt();
  }
}
{code}
bq.Because we took care of the interruption already (may be wrongly however), so the thread is not interrupted any more: the -1 means: we stop. It's the same for the one with the split worker likely. As you said, we do take care of the interruption already, but since they are within a Runnable and a Tool respectively, it is still meaningful to re-interrupt the current thread so that code higher up the call stack can know about the interrupt, right? At least it does no harm to re-interrupt here. 
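The deferred-re-interrupt pattern above can be exercised with a small self-contained demo (a hypothetical worker, not HBase code): the loop swallows the interrupt while it runs, so later sleeps are not broken immediately, and the thread's interrupt status is restored exactly once after the loop exits.

```java
public class DeferredInterruptDemo {
    public static void main(String[] args) throws Exception {
        Thread worker = new Thread(() -> {
            boolean interrupted = false;
            try {
                for (int i = 0; i < 3; i++) {
                    try {
                        Thread.sleep(50); // simulated non-cancelable work
                    } catch (InterruptedException e) {
                        // Swallow for now; re-interrupting here would make the
                        // next iteration's sleep throw immediately and flood logs.
                        interrupted = true;
                    }
                }
            } finally {
                if (interrupted) {
                    // Restore the interrupt status once, for callers up the stack.
                    Thread.currentThread().interrupt();
                }
            }
            System.out.println("interrupted=" + Thread.currentThread().isInterrupted());
        });
        worker.start();
        Thread.sleep(10);
        worker.interrupt(); // lands during the first sleep
        worker.join();
    }
}
```

All three loop iterations complete despite the interrupt, and the flag is set again only at the end.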
Add standard handling for swallowed InterruptedException thrown by Thread.sleep under HBase-Client/HBase-Server folders systematically -- Key: HBASE-10497 URL: https://issues.apache.org/jira/browse/HBASE-10497 Project: HBase Issue Type: Improvement Components: Client, regionserver Reporter: Feng Honghua Assignee: Feng Honghua Priority: Minor Attachments: HBASE-10497-trunk_v1.patch There are many places where InterruptedException thrown by Thread.sleep are swallowed silently (which are neither declared in the caller method's throws clause nor rethrown immediately) under HBase-Client/HBase-Server folders. It'd be better to add standard 'log and call currentThread.interrupt' for such cases. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10487) Avoid allocating new KeyValue and according bytes-copying for appended kvs which don't have existing values
[ https://issues.apache.org/jira/browse/HBASE-10487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10487: --- Fix Version/s: 0.99.0 Avoid allocating new KeyValue and according bytes-copying for appended kvs which don't have existing values --- Key: HBASE-10487 URL: https://issues.apache.org/jira/browse/HBASE-10487 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Feng Honghua Assignee: Feng Honghua Fix For: 0.99.0 Attachments: HBASE-10487-trunk_v1.patch In HRegion.append, new KeyValues are allocated and the corresponding bytes copied regardless of whether existing kvs are present for the appended cells. We can improve this by avoiding the allocation of a new KeyValue and the byte copying for kvs which don't have existing (old) values, by reusing the passed-in kv and only updating its timestamp to 'now' (its original timestamp is latest, so it can be updated). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
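A toy model of the proposed optimization (the Kv class is a simplified stand-in, not HBase's KeyValue): when no old value exists for the appended row, the incoming kv is reused with a refreshed timestamp; only when an old value exists is a merged kv allocated and bytes copied.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified cell; not HBase's KeyValue.
class Kv {
    final String row;
    byte[] value;
    long timestamp;

    Kv(String row, byte[] value, long timestamp) {
        this.row = row;
        this.value = value;
        this.timestamp = timestamp;
    }
}

public class AppendReuseDemo {
    static final Map<String, Kv> store = new HashMap<>();

    // Append semantics: concatenate with any existing value for the row.
    static Kv append(Kv incoming, long now) {
        Kv existing = store.get(incoming.row);
        if (existing == null) {
            // No old value: reuse the passed-in kv, only refresh its timestamp.
            incoming.timestamp = now;
            store.put(incoming.row, incoming);
            return incoming;
        }
        // Old value present: allocate a merged kv and copy both byte arrays.
        byte[] merged = new byte[existing.value.length + incoming.value.length];
        System.arraycopy(existing.value, 0, merged, 0, existing.value.length);
        System.arraycopy(incoming.value, 0, merged, existing.value.length, incoming.value.length);
        Kv result = new Kv(incoming.row, merged, now);
        store.put(incoming.row, result);
        return result;
    }

    public static void main(String[] args) {
        Kv first = new Kv("r1", "ab".getBytes(), Long.MAX_VALUE);
        Kv out1 = append(first, 100L);
        System.out.println(out1 == first);          // reused: no allocation or copy
        Kv out2 = append(new Kv("r1", "cd".getBytes(), Long.MAX_VALUE), 200L);
        System.out.println(new String(out2.value)); // merged copy of both values
    }
}
```

The first append takes the fast path (same object back), the second pays for the copy because a prior value exists.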
[jira] [Commented] (HBASE-10499) In write heavy scenario one of the regions does not get flushed causing RegionTooBusyException
[ https://issues.apache.org/jira/browse/HBASE-10499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898030#comment-13898030 ] Ted Yu commented on HBASE-10499: bq. I have logs and thread dumps taken during this time Please attach them. In write heavy scenario one of the regions does not get flushed causing RegionTooBusyException -- Key: HBASE-10499 URL: https://issues.apache.org/jira/browse/HBASE-10499 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.98.1 I got this while testing the 0.98 RC, but I am not sure it is specific to this version; it doesn't seem so to me. It is also somewhat similar to HBASE-5312 and HBASE-5568. Using 10 threads I do writes to 4 RSs using YCSB. The table created has 200 regions. In one of the runs with a 0.98 server and 0.98 client I hit this problem: the hlog count grew and the system requested flushes for that many regions. One by one everything was flushed except one, and that one region remained unflushed. 
The ripple effect of this on the client side {code} com.yahoo.ycsb.DBException: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 54 actions: RegionTooBusyException: 54 times, at com.yahoo.ycsb.db.HBaseClient.cleanup(HBaseClient.java:245) at com.yahoo.ycsb.DBWrapper.cleanup(DBWrapper.java:73) at com.yahoo.ycsb.ClientThread.run(Client.java:307) Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 54 actions: RegionTooBusyException: 54 times, at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:187) at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$500(AsyncProcess.java:171) at org.apache.hadoop.hbase.client.AsyncProcess.getErrors(AsyncProcess.java:897) at org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:961) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1225) at com.yahoo.ycsb.db.HBaseClient.cleanup(HBaseClient.java:232) ... 2 more {code} On one of the RS {code} 2014-02-11 08:45:58,714 INFO [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=38, maxlogs=32; forcing flush of 23 regions(s): 97d8ae2f78910cc5ded5fbb1ddad8492, d396b8a1da05c871edcb68a15608fdf2, 01a68742a1be3a9705d574ad68fec1d7, 1250381046301e7465b6cf398759378e, 127c133f47d0419bd5ab66675aff76d4, 9f01c5d25ddc6675f750968873721253, 29c055b5690839c2fa357cd8e871741e, ca4e33e3eb0d5f8314ff9a870fc43463, acfc6ae756e193b58d956cb71ccf0aa3, 187ea304069bc2a3c825bc10a59c7e84, 0ea411edc32d5c924d04bf126fa52d1e, e2f9331fc7208b1b230a24045f3c869e, d9309ca864055eddf766a330352efc7a, 1a71bdf457288d449050141b5ff00c69, 0ba9089db28e977f86a27f90bbab9717, fdbb3242d3b673bbe4790a47bc30576f, bbadaa1f0e62d8a8650080b824187850, b1a5de30d8603bd5d9022e09c574501b, cc6a9fabe44347ed65e7c325faa72030, 313b17dbff2497f5041b57fe13fa651e, 6b788c498503ddd3e1433a4cd3fb4e39, 3d71274fe4f815882e9626e1cfa050d1, acc43e4b42c1a041078774f4f20a3ff5 .. 
2014-02-11 08:47:49,580 INFO [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=53, maxlogs=32; forcing flush of 2 regions(s): fdbb3242d3b673bbe4790a47bc30576f, 6b788c498503ddd3e1433a4cd3fb4e39 {code} {code} 2014-02-11 09:42:44,237 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user3654,1392107806977.fdbb3242d3b673bbe4790a47bc30576f. after a delay of 16689 2014-02-11 09:42:44,237 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user6264,1392107806983.6b788c498503ddd3e1433a4cd3fb4e39. after a delay of 15868 2014-02-11 09:42:54,238 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user3654,1392107806977.fdbb3242d3b673bbe4790a47bc30576f. after a delay of 20847 2014-02-11 09:42:54,238 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user6264,1392107806983.6b788c498503ddd3e1433a4cd3fb4e39. after a delay of 20099 2014-02-11 09:43:04,238 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user3654,1392107806977.fdbb3242d3b673bbe4790a47bc30576f.
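The "Too many hlogs" lines above reflect a simple policy that can be sketched as follows (a simplified model, not the FSHLog implementation): when the WAL count exceeds maxlogs, the server requests flushes for every region that still has unflushed edits in the oldest surplus WALs, so those files can be archived.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class WalRollDemo {
    public static void main(String[] args) {
        int maxLogs = 2; // cf. maxlogs=32 in the log excerpt
        // Oldest-first: WAL file -> regions with unflushed edits still in it.
        Map<String, Set<String>> wals = new LinkedHashMap<>();
        wals.put("wal-1", new LinkedHashSet<>(Arrays.asList("regionA", "regionB")));
        wals.put("wal-2", new LinkedHashSet<>(Arrays.asList("regionB")));
        wals.put("wal-3", new LinkedHashSet<>(Arrays.asList("regionC")));

        // Flush every region pinned by the oldest (size - maxLogs) WALs.
        List<String> toFlush = new ArrayList<>();
        int excess = wals.size() - maxLogs;
        for (Map.Entry<String, Set<String>> e : wals.entrySet()) {
            if (excess-- <= 0) break;
            for (String region : e.getValue()) {
                if (!toFlush.contains(region)) toFlush.add(region);
            }
        }
        // Once regionA and regionB flush, wal-1 holds no live edits and is archived.
        System.out.println(toFlush);
    }
}
```

In the reported bug the flush requests keep being issued for the same two regions but never take effect, so the WAL count keeps growing.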
[jira] [Updated] (HBASE-10498) Add new APIs to load balancer interface
[ https://issues.apache.org/jira/browse/HBASE-10498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Taylor updated HBASE-10498: - Tags: Phoenix Add new APIs to load balancer interface --- Key: HBASE-10498 URL: https://issues.apache.org/jira/browse/HBASE-10498 Project: HBase Issue Type: Improvement Components: Balancer Reporter: rajeshbabu Assignee: rajeshbabu Fix For: 0.99.0 If a custom load balancer needs to maintain regions and their corresponding server locations, we can capture this information when we run any balancer algorithm before assignment (like random, retain). But during master startup we will not call any balancer algorithm if a region is already assigned. During split, too, we open child regions first on the RS and then notify the master through zookeeper, so split region information cannot be captured by the balancer. Since the balancer has access to the master we can get the information from the online regions or region plan data structures in the AM, but in some use cases we cannot rely on this information (mainly to maintain colocation of two tables' regions). So it's better to add some APIs to the load balancer to notify it when a *region is online or offline*. These APIs help a lot to maintain *region colocation through a custom load balancer*, which is very important in secondary indexing. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
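A sketch of what the proposed notification APIs might look like (all names here are hypothetical, not a committed interface): a colocation-aware balancer implements online/offline callbacks to keep its region-to-server map current even for assignments that bypass balance(), such as master startup and post-split region opens.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical callback surface for the proposed balancer notifications.
interface RegionLocationObserver {
    void regionOnline(String regionName, String serverName);
    void regionOffline(String regionName);
}

// A custom balancer keeping a live region -> server map via the callbacks,
// so it can answer colocation questions for e.g. secondary index tables.
class ColocationTracker implements RegionLocationObserver {
    final Map<String, String> locations = new HashMap<>();

    public void regionOnline(String regionName, String serverName) {
        locations.put(regionName, serverName);
    }

    public void regionOffline(String regionName) {
        locations.remove(regionName);
    }

    boolean colocated(String regionA, String regionB) {
        String a = locations.get(regionA);
        return a != null && a.equals(locations.get(regionB));
    }
}

public class BalancerCallbackDemo {
    public static void main(String[] args) {
        ColocationTracker tracker = new ColocationTracker();
        tracker.regionOnline("userTable,a", "rs1");   // e.g. fired at master startup
        tracker.regionOnline("indexTable,a", "rs1");  // e.g. fired after a split open
        System.out.println(tracker.colocated("userTable,a", "indexTable,a"));
        tracker.regionOffline("indexTable,a");
        System.out.println(tracker.colocated("userTable,a", "indexTable,a"));
    }
}
```

The point of the issue is precisely that today no such callbacks exist, so the tracker would miss startup and split assignments.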
[jira] [Commented] (HBASE-10498) Add new APIs to load balancer interface
[ https://issues.apache.org/jira/browse/HBASE-10498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898035#comment-13898035 ] James Taylor commented on HBASE-10498: -- Can this work be done in 0.98? Add new APIs to load balancer interface --- Key: HBASE-10498 URL: https://issues.apache.org/jira/browse/HBASE-10498 Project: HBase Issue Type: Improvement Components: Balancer Reporter: rajeshbabu Assignee: rajeshbabu Fix For: 0.99.0 If a custom load balancer required to maintain region and corresponding server locations, we can capture this information when we run any balancer algorithm before assignment(like random,retain). But during master startup we will not call any balancer algorithm if a region already assinged During split also we open child regions first in RS and then notify to master through zookeeper. So split regions information cannot be captured into balancer. Since balancer has access to master we can get the information from online regions or region plan data structures in AM. But some use cases we cannot relay on this information(mainly to maintain colocation of two tables regions). So it's better to add some APIs to load balancer to notify balancer when *region is online or offline*. These APIs helps a lot to maintain *regions colocation through custom load balancer* which is very important in secondary indexing. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10499) In write heavy scenario one of the regions does not get flushed causing RegionTooBusyException
[ https://issues.apache.org/jira/browse/HBASE-10499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-10499: --- Attachment: hbase-root-regionserver-ip-10-93-128-92.zip t2.dump t1.dump In write heavy scenario one of the regions does not get flushed causing RegionTooBusyException -- Key: HBASE-10499 URL: https://issues.apache.org/jira/browse/HBASE-10499 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.98.1 Attachments: hbase-root-regionserver-ip-10-93-128-92.zip, t1.dump, t2.dump I got this while testing 0.98RC. But am not sure if it is specific to this version. Doesn't seem so to me. Also it is something similar to HBASE-5312 and HBASE-5568. Using 10 threads i do writes to 4 RS using YCSB. The table created has 200 regions. In one of the run with 0.98 server and 0.98 client I faced this problem like the hlogs became more and the system requested flushes for those many regions. One by one everything was flushed except one and that one thing remained unflushed. 
The ripple effect of this on the client side {code} com.yahoo.ycsb.DBException: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 54 actions: RegionTooBusyException: 54 times, at com.yahoo.ycsb.db.HBaseClient.cleanup(HBaseClient.java:245) at com.yahoo.ycsb.DBWrapper.cleanup(DBWrapper.java:73) at com.yahoo.ycsb.ClientThread.run(Client.java:307) Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 54 actions: RegionTooBusyException: 54 times, at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:187) at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$500(AsyncProcess.java:171) at org.apache.hadoop.hbase.client.AsyncProcess.getErrors(AsyncProcess.java:897) at org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:961) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1225) at com.yahoo.ycsb.db.HBaseClient.cleanup(HBaseClient.java:232) ... 2 more {code} On one of the RS {code} 2014-02-11 08:45:58,714 INFO [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=38, maxlogs=32; forcing flush of 23 regions(s): 97d8ae2f78910cc5ded5fbb1ddad8492, d396b8a1da05c871edcb68a15608fdf2, 01a68742a1be3a9705d574ad68fec1d7, 1250381046301e7465b6cf398759378e, 127c133f47d0419bd5ab66675aff76d4, 9f01c5d25ddc6675f750968873721253, 29c055b5690839c2fa357cd8e871741e, ca4e33e3eb0d5f8314ff9a870fc43463, acfc6ae756e193b58d956cb71ccf0aa3, 187ea304069bc2a3c825bc10a59c7e84, 0ea411edc32d5c924d04bf126fa52d1e, e2f9331fc7208b1b230a24045f3c869e, d9309ca864055eddf766a330352efc7a, 1a71bdf457288d449050141b5ff00c69, 0ba9089db28e977f86a27f90bbab9717, fdbb3242d3b673bbe4790a47bc30576f, bbadaa1f0e62d8a8650080b824187850, b1a5de30d8603bd5d9022e09c574501b, cc6a9fabe44347ed65e7c325faa72030, 313b17dbff2497f5041b57fe13fa651e, 6b788c498503ddd3e1433a4cd3fb4e39, 3d71274fe4f815882e9626e1cfa050d1, acc43e4b42c1a041078774f4f20a3ff5 .. 
2014-02-11 08:47:49,580 INFO [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=53, maxlogs=32; forcing flush of 2 regions(s): fdbb3242d3b673bbe4790a47bc30576f, 6b788c498503ddd3e1433a4cd3fb4e39 {code} {code} 2014-02-11 09:42:44,237 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user3654,1392107806977.fdbb3242d3b673bbe4790a47bc30576f. after a delay of 16689 2014-02-11 09:42:44,237 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user6264,1392107806983.6b788c498503ddd3e1433a4cd3fb4e39. after a delay of 15868 2014-02-11 09:42:54,238 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user3654,1392107806977.fdbb3242d3b673bbe4790a47bc30576f. after a delay of 20847 2014-02-11 09:42:54,238 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user6264,1392107806983.6b788c498503ddd3e1433a4cd3fb4e39. after a delay of 20099 2014-02-11 09:43:04,238 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898037#comment-13898037 ] Nick Dimiduk commented on HBASE-10413: -- bq. Can we add some formatting consistent with hbase coding standards ? Maybe String.format i dont know. I agree, this is difficult to read. Usually we use [StringUtils#humanReadableInt(long)|http://hadoop.apache.org/docs/r1.1.1/api/org/apache/hadoop/util/StringUtils.html#humanReadableInt(long)]. Tablesplit.getLength returns 0 -- Key: HBASE-10413 URL: https://issues.apache.org/jira/browse/HBASE-10413 Project: HBase Issue Type: Bug Components: Client, mapreduce Affects Versions: 0.96.1.1 Reporter: Lukas Nalezenec Assignee: Lukas Nalezenec Fix For: 0.98.1, 0.99.0 Attachments: 10413-7.patch, HBASE-10413-2.patch, HBASE-10413-3.patch, HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, HBASE-10413.patch InputSplits should be sorted by length but TableSplit does not contain real getLength implementation: @Override public long getLength() { // Not clear how to obtain this... seems to be used only for sorting splits return 0; } This is causing us problem with scheduling - we have got jobs that are supposed to finish in limited time but they get often stuck in last mapper working on large region. Can we implement this method ? What is the best way ? We were thinking about estimating size by size of files on HDFS. We would like to get Scanner from TableSplit, use startRow, stopRow and column families to get corresponding region than computing size of HDFS for given region and column family. Update: This ticket was about production issue - I talked with guy who worked on this and he said our production issue was probably not directly caused by getLength() returning 0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
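For illustration, a minimal stand-in formatter in the spirit of the suggestion (this is not Hadoop's StringUtils#humanReadableInt, whose exact output format differs):

```java
import java.util.Locale;

// Minimal human-readable byte formatter; a sketch, not Hadoop's implementation.
public class HumanBytesDemo {
    static String humanReadable(long bytes) {
        String[] units = {"B", "KB", "MB", "GB", "TB"};
        double value = bytes;
        int unit = 0;
        while (value >= 1024 && unit < units.length - 1) {
            value /= 1024;
            unit++;
        }
        // Locale.ROOT keeps the decimal point stable across environments.
        return String.format(Locale.ROOT, "%.1f %s", value, units[unit]);
    }

    public static void main(String[] args) {
        System.out.println(humanReadable(0));
        System.out.println(humanReadable(134217728L));              // 128 MB region
        System.out.println(humanReadable(5L * 1024 * 1024 * 1024)); // 5 GB region
    }
}
```

Printing split lengths this way in logs is far easier to scan than raw byte counts.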
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898038#comment-13898038 ] Nick Dimiduk commented on HBASE-10413: -- The TableSplit writable does not persist beyond the life of a mapreduce job. A single job will have the same version of the serialization code, so there's no versioning to increment. Tablesplit.getLength returns 0 -- Key: HBASE-10413 URL: https://issues.apache.org/jira/browse/HBASE-10413 Project: HBase Issue Type: Bug Components: Client, mapreduce Affects Versions: 0.96.1.1 Reporter: Lukas Nalezenec Assignee: Lukas Nalezenec Fix For: 0.98.1, 0.99.0 Attachments: 10413-7.patch, HBASE-10413-2.patch, HBASE-10413-3.patch, HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, HBASE-10413.patch InputSplits should be sorted by length but TableSplit does not contain real getLength implementation: @Override public long getLength() { // Not clear how to obtain this... seems to be used only for sorting splits return 0; } This is causing us problem with scheduling - we have got jobs that are supposed to finish in limited time but they get often stuck in last mapper working on large region. Can we implement this method ? What is the best way ? We were thinking about estimating size by size of files on HDFS. We would like to get Scanner from TableSplit, use startRow, stopRow and column families to get corresponding region than computing size of HDFS for given region and column family. Update: This ticket was about production issue - I talked with guy who worked on this and he said our production issue was probably not directly caused by getLength() returning 0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10413: --- Attachment: 10413.addendum Tablesplit.getLength returns 0 -- Key: HBASE-10413 URL: https://issues.apache.org/jira/browse/HBASE-10413 Project: HBase Issue Type: Bug Components: Client, mapreduce Affects Versions: 0.96.1.1 Reporter: Lukas Nalezenec Assignee: Lukas Nalezenec Fix For: 0.98.1, 0.99.0 Attachments: 10413-7.patch, 10413.addendum, HBASE-10413-2.patch, HBASE-10413-3.patch, HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, HBASE-10413.patch InputSplits should be sorted by length but TableSplit does not contain real getLength implementation: @Override public long getLength() { // Not clear how to obtain this... seems to be used only for sorting splits return 0; } This is causing us problem with scheduling - we have got jobs that are supposed to finish in limited time but they get often stuck in last mapper working on large region. Can we implement this method ? What is the best way ? We were thinking about estimating size by size of files on HDFS. We would like to get Scanner from TableSplit, use startRow, stopRow and column families to get corresponding region than computing size of HDFS for given region and column family. Update: This ticket was about production issue - I talked with guy who worked on this and he said our production issue was probably not directly caused by getLength() returning 0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10498) Add new APIs to load balancer interface
[ https://issues.apache.org/jira/browse/HBASE-10498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rajeshbabu updated HBASE-10498: --- Fix Version/s: 0.98.1 Add new APIs to load balancer interface --- Key: HBASE-10498 URL: https://issues.apache.org/jira/browse/HBASE-10498 Project: HBase Issue Type: Improvement Components: Balancer Reporter: rajeshbabu Assignee: rajeshbabu Fix For: 0.98.1, 0.99.0 If a custom load balancer required to maintain region and corresponding server locations, we can capture this information when we run any balancer algorithm before assignment(like random,retain). But during master startup we will not call any balancer algorithm if a region already assinged During split also we open child regions first in RS and then notify to master through zookeeper. So split regions information cannot be captured into balancer. Since balancer has access to master we can get the information from online regions or region plan data structures in AM. But some use cases we cannot relay on this information(mainly to maintain colocation of two tables regions). So it's better to add some APIs to load balancer to notify balancer when *region is online or offline*. These APIs helps a lot to maintain *regions colocation through custom load balancer* which is very important in secondary indexing. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10498) Add new APIs to load balancer interface
[ https://issues.apache.org/jira/browse/HBASE-10498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898059#comment-13898059 ] rajeshbabu commented on HBASE-10498: Added 0.98.1 to fix versions. Add new APIs to load balancer interface --- Key: HBASE-10498 URL: https://issues.apache.org/jira/browse/HBASE-10498 Project: HBase Issue Type: Improvement Components: Balancer Reporter: rajeshbabu Assignee: rajeshbabu Fix For: 0.98.1, 0.99.0 If a custom load balancer required to maintain region and corresponding server locations, we can capture this information when we run any balancer algorithm before assignment(like random,retain). But during master startup we will not call any balancer algorithm if a region already assinged During split also we open child regions first in RS and then notify to master through zookeeper. So split regions information cannot be captured into balancer. Since balancer has access to master we can get the information from online regions or region plan data structures in AM. But some use cases we cannot relay on this information(mainly to maintain colocation of two tables regions). So it's better to add some APIs to load balancer to notify balancer when *region is online or offline*. These APIs helps a lot to maintain *regions colocation through custom load balancer* which is very important in secondary indexing. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898065#comment-13898065 ] Ted Yu commented on HBASE-10413: Integrated addendum to 0.98 and trunk. Tablesplit.getLength returns 0 -- Key: HBASE-10413 URL: https://issues.apache.org/jira/browse/HBASE-10413 Project: HBase Issue Type: Bug Components: Client, mapreduce Affects Versions: 0.96.1.1 Reporter: Lukas Nalezenec Assignee: Lukas Nalezenec Fix For: 0.98.1, 0.99.0 Attachments: 10413-7.patch, 10413.addendum, HBASE-10413-2.patch, HBASE-10413-3.patch, HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, HBASE-10413.patch InputSplits should be sorted by length but TableSplit does not contain real getLength implementation: @Override public long getLength() { // Not clear how to obtain this... seems to be used only for sorting splits return 0; } This is causing us problem with scheduling - we have got jobs that are supposed to finish in limited time but they get often stuck in last mapper working on large region. Can we implement this method ? What is the best way ? We were thinking about estimating size by size of files on HDFS. We would like to get Scanner from TableSplit, use startRow, stopRow and column families to get corresponding region than computing size of HDFS for given region and column family. Update: This ticket was about production issue - I talked with guy who worked on this and he said our production issue was probably not directly caused by getLength() returning 0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10500) hbck and OOM when BucketCache is enabled
Nick Dimiduk created HBASE-10500: Summary: hbck and OOM when BucketCache is enabled Key: HBASE-10500 URL: https://issues.apache.org/jira/browse/HBASE-10500 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.98.0 Reporter: Nick Dimiduk Assignee: Nick Dimiduk Running {{hbck --repair}} when BucketCache is enabled in offheap mode can cause OOM. This is apparently because {{bin/hbase}} does not include $HBASE_REGIONSERVER_OPTS for hbck. It instantiates an HRegion instance as part of HDFSIntegrityFixer.handleHoleInRegionChain. That HRegion initializes its CacheConfig, which doesn't have the necessary Direct Memory. Possible solutions include: - disable blockcache in the config used by hbck when running its repairs - include HBASE_REGIONSERVER_OPTS in the HBaseFSCK startup arguments I'm leaning toward the former because it's possible that hbck is run on a host with the same hardware profile as the RS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10500) hbck and OOM when BucketCache is enabled
[ https://issues.apache.org/jira/browse/HBASE-10500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898087#comment-13898087 ] Nick Dimiduk commented on HBASE-10500: -- Here's the full stack trace: {noformat} Exception in thread main java.io.IOException: java.lang.OutOfMemoryError: Direct buffer memory at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionStores(HRegion.java:731) at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:638) at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:609) at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:595) at org.apache.hadoop.hbase.regionserver.HRegion.createHRegion(HRegion.java:4195) at org.apache.hadoop.hbase.regionserver.HRegion.createHRegion(HRegion.java:4154) at org.apache.hadoop.hbase.regionserver.HRegion.createHRegion(HRegion.java:4127) at org.apache.hadoop.hbase.regionserver.HRegion.createHRegion(HRegion.java:4205) at org.apache.hadoop.hbase.regionserver.HRegion.createHRegion(HRegion.java:4085) at org.apache.hadoop.hbase.util.HBaseFsckRepair.createHDFSRegionDir(HBaseFsckRepair.java:190) at org.apache.hadoop.hbase.util.HBaseFsck$TableInfo$HDFSIntegrityFixer.handleHoleInRegionChain(HBaseFsck.java:2312) at org.apache.hadoop.hbase.util.HBaseFsck$TableInfo.checkRegionChain(HBaseFsck.java:2492) at org.apache.hadoop.hbase.util.HBaseFsck.checkHdfsIntegrity(HBaseFsck.java:1226) at org.apache.hadoop.hbase.util.HBaseFsck.restoreHdfsIntegrity(HBaseFsck.java:741) at org.apache.hadoop.hbase.util.HBaseFsck.offlineHdfsIntegrityRepair(HBaseFsck.java:386) at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:475) at org.apache.hadoop.hbase.util.HBaseFsck.exec(HBaseFsck.java:4029) at org.apache.hadoop.hbase.util.HBaseFsck$HBaseFsckTool.run(HBaseFsck.java:3838) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at 
org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3826) Caused by: java.lang.OutOfMemoryError: Direct buffer memory at java.nio.Bits.reserveMemory(Bits.java:658) at java.nio.DirectByteBuffer.init(DirectByteBuffer.java:123) at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) at org.apache.hadoop.hbase.util.ByteBufferArray.init(ByteBufferArray.java:65) at org.apache.hadoop.hbase.io.hfile.bucket.ByteBufferIOEngine.init(ByteBufferIOEngine.java:44) at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getIOEngineFromName(BucketCache.java:270) at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.init(BucketCache.java:210) at org.apache.hadoop.hbase.io.hfile.CacheConfig.instantiateBlockCache(CacheConfig.java:399) at org.apache.hadoop.hbase.io.hfile.CacheConfig.init(CacheConfig.java:143) at org.apache.hadoop.hbase.regionserver.HStore.init(HStore.java:231) at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:3309) at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:702) at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:699) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} hbck and OOM when BucketCache is enabled Key: HBASE-10500 URL: https://issues.apache.org/jira/browse/HBASE-10500 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.98.0 Reporter: Nick Dimiduk Assignee: Nick Dimiduk Running {{hbck --repair}} when BucketCache is enabled in offheap mode can cause OOM. 
This is apparently because {{bin/hbase}} does not include $HBASE_REGIONSERVER_OPTS for hbck. It instantiates an HRegion instance as part of HDFSIntegrityFixer.handleHoleInRegionChain. That HRegion initializes its CacheConfig, which doesn't have the necessary Direct Memory. Possible solutions include: - disable blockcache in the config used by hbck when running its repairs - include HBASE_REGIONSERVER_OPTS in the HBaseFSCK startup arguments I'm leaning toward the former because it's possible that hbck is run on a host with the same hardware profile as the RS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
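The first proposed fix — disabling the block cache in the configuration hbck uses — can be approximated with a client-side override. This is a hedged sketch, not the actual patch: it assumes the standard block-cache sizing key is honored by the tool's CacheConfig, which is how the region server interprets it:

```xml
<!-- Hypothetical hbase-site.xml override for a client-side tool such as hbck:
     a cache size of 0 prevents CacheConfig from instantiating a block cache,
     so no direct memory is reserved for an offheap BucketCache. -->
<property>
  <name>hfile.block.cache.size</name>
  <value>0</value>
</property>
```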
[jira] [Updated] (HBASE-10500) hbck and OOM when BucketCache is enabled
[ https://issues.apache.org/jira/browse/HBASE-10500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-10500: - Description: Running {{hbck --repair}} when BucketCache is enabled in offheap mode can cause OOM. This is apparently because {{bin/hbase}} does not include $HBASE_REGIONSERVER_OPTS for hbck. It instantiates an HRegion instance as part of HDFSIntegrityFixer.handleHoleInRegionChain. That HRegion initializes its CacheConfig, which doesn't have the necessary Direct Memory. Possible solutions include: - disable blockcache in the config used by hbck when running its repairs - include HBASE_REGIONSERVER_OPTS in the HBaseFSCK startup arguments I'm leaning toward the former because it's possible that hbck is run on a host with different hardware profile as the RS. was: Running {{hbck --repair}} when BucketCache is enabled in offheap mode can cause OOM. This is apparently because {{bin/hbase}} does not include $HBASE_REGIONSERVER_OPTS for hbck. It instantiates an HRegion instance as part of HDFSIntegrityFixer.handleHoleInRegionChain. That HRegion initializes its CacheConfig, which doesn't have the necessary Direct Memory. Possible solutions include: - disable blockcache in the config used by hbck when running its repairs - include HBASE_REGIONSERVER_OPTS in the HBaseFSCK startup arguments I'm leaning toward the former because it's possible that hbck is run on a host with the same hardware profile as the RS. hbck and OOM when BucketCache is enabled Key: HBASE-10500 URL: https://issues.apache.org/jira/browse/HBASE-10500 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.98.0 Reporter: Nick Dimiduk Assignee: Nick Dimiduk Running {{hbck --repair}} when BucketCache is enabled in offheap mode can cause OOM. This is apparently because {{bin/hbase}} does not include $HBASE_REGIONSERVER_OPTS for hbck. It instantiates an HRegion instance as part of HDFSIntegrityFixer.handleHoleInRegionChain. 
That HRegion initializes its CacheConfig, which doesn't have the necessary Direct Memory. Possible solutions include: - disable blockcache in the config used by hbck when running its repairs - include HBASE_REGIONSERVER_OPTS in the HBaseFSCK startup arguments I'm leaning toward the former because it's possible that hbck is run on a host with different hardware profile as the RS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10501) Make IncreasingToUpperBoundRegionSplitPolicy configurable
Lars Hofhansl created HBASE-10501: - Summary: Make IncreasingToUpperBoundRegionSplitPolicy configurable Key: HBASE-10501 URL: https://issues.apache.org/jira/browse/HBASE-10501 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl
During some (admittedly artificial) load testing we found a large amount of split activity, which we tracked down to the IncreasingToUpperBoundRegionSplitPolicy. The current logic is this (from the comments): regions that are on this server that all are of the same table, squared, times the region flush size OR the maximum region split size, whichever is smaller. So with a flush size of 128mb and a max file size of 20gb, we'd need 13 regions of the same table on an RS to reach the max size. With a 10gb max file size it is still 9 regions of the same table. Considering that the number of regions an RS can carry is limited and there might be multiple tables, this should be more configurable. I think the squaring is smart and we do not need to change it. We could
* Make the start size configurable and default it to the flush size
* Add a multiplier for the initial size, i.e. start with n * flushSize
Of course one can override the default split policy, but these seem like simple tweaks. Or we could instead set the goal of how many regions of the same table would need to be present in order to reach the max size. In that case we'd start with maxSize/goal^2. So if the max size is 20gb and the goal is three, we'd start with 20g/9 = 2.2g for the initial region size. [~stack], I'm interested in your opinion.
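The region counts quoted above can be checked with a small standalone sketch of the bound (plain Java, not the actual policy class; the names are illustrative):

```java
// Sketch of the size bound described above: a region splits when its largest
// store exceeds min(flushSize * (same-table regions on this RS)^2, maxFileSize).
public class SplitBoundSketch {

    static long splitSize(long flushSize, long maxFileSize, int regionCount) {
        return Math.min(flushSize * regionCount * regionCount, maxFileSize);
    }

    // How many same-table regions must sit on one RS before the bound
    // reaches the configured maximum file size.
    static int regionsToReachMax(long flushSize, long maxFileSize) {
        int n = 1;
        while (splitSize(flushSize, maxFileSize, n) < maxFileSize) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        long gb = 1L << 30;
        // 128mb flush size, 20gb max file size -> 13 regions, as stated above.
        System.out.println(regionsToReachMax(128 * mb, 20 * gb)); // 13
        // 10gb max file size -> 9 regions.
        System.out.println(regionsToReachMax(128 * mb, 10 * gb)); // 9
    }
}
```

The goal-based alternative is just the same formula run backwards: starting at maxSize/goal^2 (20g/9 ≈ 2.2g for a goal of three) makes the bound hit maxFileSize after exactly `goal` regions.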
[jira] [Created] (HBASE-10502) [89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel.
Liyin Tang created HBASE-10502: -- Summary: [89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel. Key: HBASE-10502 URL: https://issues.apache.org/jira/browse/HBASE-10502 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Fix For: 0.89-fb
ParallelScanner is a utility class for the HBase client to perform multiple scan requests in parallel. It requires all the scan requests to have the same caching size, for simplicity. This class provides 3 very basic functionalities:
* The initialize function initializes all the ResultScanners by calling {@link HTable#getScanner(Scan)} in parallel for each scan request.
* The next function calls the corresponding {@link ResultScanner#next(int numRows)} for each scan request in parallel, and then returns all the results together as a list. Also, if the result list is empty, it indicates there is no data left in any of the scanners, and the user can call {@link #close()} afterwards.
* The close function closes all the scanners and shuts down the thread pool.
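The next/close contract described above can be sketched with plain java.util.concurrent primitives. This is not the 0.89-fb implementation — `Scanner` is a stand-in interface for ResultScanner and every name is illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelScannerSketch {
    /** Stand-in for ResultScanner#next(int numRows). */
    public interface Scanner {
        List<String> next(int numRows);
    }

    private final ExecutorService pool;
    private final List<Scanner> scanners;

    public ParallelScannerSketch(List<Scanner> scanners, int threads) {
        this.scanners = scanners;
        this.pool = Executors.newFixedThreadPool(threads);
    }

    /** Calls next(numRows) on every scanner in parallel and merges the
     *  batches; an empty merged list means every scanner is exhausted. */
    public List<String> next(int numRows) {
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (Scanner s : scanners) {
                tasks.add(() -> s.next(numRows));
            }
            List<String> merged = new ArrayList<>();
            for (Future<List<String>> f : pool.invokeAll(tasks)) {
                merged.addAll(f.get());
            }
            return merged;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** Shuts down the thread pool (the real utility also closes each scanner). */
    public void close() {
        pool.shutdown();
    }
}
```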
[jira] [Commented] (HBASE-10502) [89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel.
[ https://issues.apache.org/jira/browse/HBASE-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898111#comment-13898111 ] Lars Hofhansl commented on HBASE-10502: --- see also HBASE-9272
[89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel. Key: HBASE-10502 URL: https://issues.apache.org/jira/browse/HBASE-10502 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Fix For: 0.89-fb
ParallelScanner is a utility class for the HBase client to perform multiple scan requests in parallel. It requires all the scan requests to have the same caching size, for simplicity. This class provides 3 very basic functionalities:
* The initialize function initializes all the ResultScanners by calling {@link HTable#getScanner(Scan)} in parallel for each scan request.
* The next function calls the corresponding {@link ResultScanner#next(int numRows)} for each scan request in parallel, and then returns all the results together as a list. Also, if the result list is empty, it indicates there is no data left in any of the scanners, and the user can call {@link #close()} afterwards.
* The close function closes all the scanners and shuts down the thread pool.
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898127#comment-13898127 ] Nick Dimiduk commented on HBASE-10413: -- Thanks Ted.
Tablesplit.getLength returns 0 -- Key: HBASE-10413 URL: https://issues.apache.org/jira/browse/HBASE-10413 Project: HBase Issue Type: Bug Components: Client, mapreduce Affects Versions: 0.96.1.1 Reporter: Lukas Nalezenec Assignee: Lukas Nalezenec Fix For: 0.98.1, 0.99.0 Attachments: 10413-7.patch, 10413.addendum, HBASE-10413-2.patch, HBASE-10413-3.patch, HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, HBASE-10413.patch
InputSplits should be sorted by length but TableSplit does not contain a real getLength implementation:
@Override
public long getLength() {
  // Not clear how to obtain this... seems to be used only for sorting splits
  return 0;
}
This is causing us problems with scheduling - we have jobs that are supposed to finish in a limited time but they often get stuck in the last mapper working on a large region. Can we implement this method? What is the best way? We were thinking about estimating the size by the size of the files on HDFS. We would like to get a Scanner from the TableSplit, use startRow, stopRow and the column families to get the corresponding region, then compute the HDFS size for the given region and column family.
Update: This ticket was about a production issue - I talked with the guy who worked on this and he said our production issue was probably not directly caused by getLength() returning 0.
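A small standalone illustration of why a real getLength() matters (plain Java, not HBase code): MapReduce sorts input splits by their reported length so the biggest ones are scheduled first, and all-zero lengths make that ordering meaningless:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SplitOrderSketch {
    // Minimal stand-in for an InputSplit: a region name plus the HDFS bytes
    // estimated for it (e.g. summed store-file sizes for the split's
    // start/stop-row range and column families, as proposed above).
    static class Split {
        final String region;
        final long length;
        Split(String region, long length) {
            this.region = region;
            this.length = length;
        }
    }

    // Largest splits first, so a large region is started early instead of
    // becoming the straggling last mapper.
    static List<Split> scheduleOrder(List<Split> splits) {
        List<Split> sorted = new ArrayList<>(splits);
        sorted.sort(Comparator.comparingLong((Split s) -> s.length).reversed());
        return sorted;
    }
}
```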
[jira] [Commented] (HBASE-10502) [89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel.
[ https://issues.apache.org/jira/browse/HBASE-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898128#comment-13898128 ] Liyin Tang commented on HBASE-10502: By skimming through HBASE-9272, the semantics seem to be a little different. In this case, the client actually wants to construct multiple scan requests, while HBASE-9272 is about performing a single scan request in parallel.
[89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel. Key: HBASE-10502 URL: https://issues.apache.org/jira/browse/HBASE-10502 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Fix For: 0.89-fb
ParallelScanner is a utility class for the HBase client to perform multiple scan requests in parallel. It requires all the scan requests to have the same caching size, for simplicity. This class provides 3 very basic functionalities:
* The initialize function initializes all the ResultScanners by calling {@link HTable#getScanner(Scan)} in parallel for each scan request.
* The next function calls the corresponding {@link ResultScanner#next(int numRows)} for each scan request in parallel, and then returns all the results together as a list. Also, if the result list is empty, it indicates there is no data left in any of the scanners, and the user can call {@link #close()} afterwards.
* The close function closes all the scanners and shuts down the thread pool.
[jira] [Updated] (HBASE-10501) Make IncreasingToUpperBoundRegionSplitPolicy configurable
[ https://issues.apache.org/jira/browse/HBASE-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-10501: -- Description:
During some (admittedly artificial) load testing we found a large amount of split activity, which we tracked down to the IncreasingToUpperBoundRegionSplitPolicy. The current logic is this (from the comments): regions that are on this server that all are of the same table, squared, times the region flush size OR the maximum region split size, whichever is smaller. So with a flush size of 128mb and a max file size of 20gb, we'd need 13 regions of the same table on an RS to reach the max size. With a 10gb max file size it is still 9 regions of the same table. Considering that the number of regions an RS can carry is limited and there might be multiple tables, this should be more configurable. I think the squaring is smart and we do not need to change it. We could
* Make the start size configurable and default it to the flush size
* Add a multiplier for the initial size, i.e. start with n * flushSize
* Also change the default to start with 2*flush size
Of course one can override the default split policy, but these seem like simple tweaks. Or we could instead set the goal of how many regions of the same table would need to be present in order to reach the max size. In that case we'd start with maxSize/goal^2. So if the max size is 20gb and the goal is three, we'd start with 20g/9 = 2.2g for the initial region size. [~stack], I'm especially interested in your opinion.
was: During some (admittedly) artificial load testing we found a large amount split activity, which we tracked down the IncreasingToUpperBoundRegionSplitPolicy. The current logic is this (from the comment) regions that are on this server that all are of the same table, squared, times the region flush size OR the maximum region split size, whichever is smaller So with a flush size of 128mb and max file size of 20gb, we'd need 13 region of the same table on an RS to reach the max size. With 10gb file sized it is still 9 regions of the same table. Considering that the number of regions that an RS can carry is limited and might be multiple tables, this should be more configurable. I think the squaring is smart and we do not need to change it. We could * Make the start size configurable and default it to the flush size * Add multiplier for the initial size, i.e. start with n * flushSize Of course one can override the default split policy, but these seem like simple tweaks. Or we could instead set the goal of how many regions of the same table would need to be present in order to reach the max size. In that case we'd start with maxSize/goal^2. So if max size is 20gb and the goal is three we'd start with 20g/9 = 2.2g for the initial region size. [~stack], I'm interested in your opinion.
Make IncreasingToUpperBoundRegionSplitPolicy configurable - Key: HBASE-10501 URL: https://issues.apache.org/jira/browse/HBASE-10501 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl
During some (admittedly artificial) load testing we found a large amount of split activity, which we tracked down to the IncreasingToUpperBoundRegionSplitPolicy. The current logic is this (from the comments): regions that are on this server that all are of the same table, squared, times the region flush size OR the maximum region split size, whichever is smaller. So with a flush size of 128mb and a max file size of 20gb, we'd need 13 regions of the same table on an RS to reach the max size. With a 10gb max file size it is still 9 regions of the same table.
Considering that the number of regions an RS can carry is limited and there might be multiple tables, this should be more configurable. I think the squaring is smart and we do not need to change it. We could
* Make the start size configurable and default it to the flush size
* Add a multiplier for the initial size, i.e. start with n * flushSize
* Also change the default to start with 2*flush size
Of course one can override the default split policy, but these seem like simple tweaks. Or we could instead set the goal of how many regions of the same table would need to be present in order to reach the max size. In that case we'd start with maxSize/goal^2. So if the max size is 20gb and the goal is three, we'd start with 20g/9 = 2.2g for the initial region size. [~stack], I'm especially interested in your opinion.
[jira] [Commented] (HBASE-10502) [89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel.
[ https://issues.apache.org/jira/browse/HBASE-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898132#comment-13898132 ] Liyin Tang commented on HBASE-10502: Actually HBASE-9272 + HBASE-10502 is quite effective for optimizing join queries. Assuming a join query such as Table A joins Table B based on row key / some prefix, HBASE-9272 is useful for issuing the initial scan in parallel to retrieve all the join keys; based on the join keys, multiple scan queries for Table B can then be constructed and submitted in parallel via HBASE-10502.
[89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel. Key: HBASE-10502 URL: https://issues.apache.org/jira/browse/HBASE-10502 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Fix For: 0.89-fb
ParallelScanner is a utility class for the HBase client to perform multiple scan requests in parallel. It requires all the scan requests to have the same caching size, for simplicity. This class provides 3 very basic functionalities:
* The initialize function initializes all the ResultScanners by calling {@link HTable#getScanner(Scan)} in parallel for each scan request.
* The next function calls the corresponding {@link ResultScanner#next(int numRows)} for each scan request in parallel, and then returns all the results together as a list. Also, if the result list is empty, it indicates there is no data left in any of the scanners, and the user can call {@link #close()} afterwards.
* The close function closes all the scanners and shuts down the thread pool.
[jira] [Commented] (HBASE-10502) [89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel.
[ https://issues.apache.org/jira/browse/HBASE-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898138#comment-13898138 ] Liyin Tang commented on HBASE-10502: In addition, the API of HBASE-10502 seems more flexible (to me), because if there is a single scan request spanning multiple region boundaries, the hbase client is always able to split this scan request into multiple region-local scan requests and then submit them to HBASE-10502 for parallel execution.
[89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel. Key: HBASE-10502 URL: https://issues.apache.org/jira/browse/HBASE-10502 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Fix For: 0.89-fb
ParallelScanner is a utility class for the HBase client to perform multiple scan requests in parallel. It requires all the scan requests to have the same caching size, for simplicity. This class provides 3 very basic functionalities:
* The initialize function initializes all the ResultScanners by calling {@link HTable#getScanner(Scan)} in parallel for each scan request.
* The next function calls the corresponding {@link ResultScanner#next(int numRows)} for each scan request in parallel, and then returns all the results together as a list. Also, if the result list is empty, it indicates there is no data left in any of the scanners, and the user can call {@link #close()} afterwards.
* The close function closes all the scanners and shuts down the thread pool.
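The splitting step described in that comment — cutting one [startRow, stopRow) scan at region boundaries — can be sketched in plain Java. Keys are modeled as strings for illustration; a real client would use byte[] region start keys looked up from the meta table:

```java
import java.util.ArrayList;
import java.util.List;

public class ScanRangeSplitter {
    // Splits [startRow, stopRow) at the given sorted region start keys,
    // producing one region-local [start, stop) range per crossed boundary.
    // Each resulting range can then be handed to a parallel-scan utility.
    static List<String[]> split(String startRow, String stopRow, List<String> regionStartKeys) {
        List<String[]> ranges = new ArrayList<>();
        String cur = startRow;
        for (String boundary : regionStartKeys) {
            // Only boundaries strictly inside the requested range cut it.
            if (boundary.compareTo(cur) > 0 && boundary.compareTo(stopRow) < 0) {
                ranges.add(new String[] { cur, boundary });
                cur = boundary;
            }
        }
        ranges.add(new String[] { cur, stopRow });
        return ranges;
    }
}
```

For example, a scan of [b, z) over regions starting at a, m, and t yields the three region-local ranges [b, m), [m, t), and [t, z).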
[jira] [Commented] (HBASE-10500) hbck and OOM when BucketCache is enabled
[ https://issues.apache.org/jira/browse/HBASE-10500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898157#comment-13898157 ] Nick Dimiduk commented on HBASE-10500: -- Looks like the same kind of issue crops up with LoadIncrementalHFiles:
{noformat}
2014-02-11 18:14:30,021 ERROR [main] mapreduce.LoadIncrementalHFiles: Unexpected execution exception during splitting
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Direct buffer memory
	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
	at java.util.concurrent.FutureTask.get(FutureTask.java:111)
	at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplitPhase(LoadIncrementalHFiles.java:407)
	at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:288)
	at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:822)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
	at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:827)
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
	at java.nio.Bits.reserveMemory(Bits.java:658)
	at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
	at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
	at org.apache.hadoop.hbase.util.ByteBufferArray.<init>(ByteBufferArray.java:65)
	at org.apache.hadoop.hbase.io.hfile.bucket.ByteBufferIOEngine.<init>(ByteBufferIOEngine.java:44)
	at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getIOEngineFromName(BucketCache.java:270)
	at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.<init>(BucketCache.java:210)
	at org.apache.hadoop.hbase.io.hfile.CacheConfig.instantiateBlockCache(CacheConfig.java:399)
	at org.apache.hadoop.hbase.io.hfile.CacheConfig.<init>(CacheConfig.java:166)
	at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplit(LoadIncrementalHFiles.java:476)
	at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:397)
	at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:395)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:722)
{noformat}
hbck and OOM when BucketCache is enabled Key: HBASE-10500 URL: https://issues.apache.org/jira/browse/HBASE-10500 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.98.0 Reporter: Nick Dimiduk Assignee: Nick Dimiduk
Running {{hbck --repair}} when BucketCache is enabled in offheap mode can cause OOM. This is apparently because {{bin/hbase}} does not include $HBASE_REGIONSERVER_OPTS for hbck. It instantiates an HRegion instance as part of HDFSIntegrityFixer.handleHoleInRegionChain. That HRegion initializes its CacheConfig, which doesn't have the necessary Direct Memory. Possible solutions include: - disable blockcache in the config used by hbck when running its repairs - include HBASE_REGIONSERVER_OPTS in the HBaseFSCK startup arguments I'm leaning toward the former because it's possible that hbck is run on a host with a different hardware profile than the RS.
[jira] [Commented] (HBASE-10486) ProtobufUtil Append Increment deserialization lost cell level timestamp
[ https://issues.apache.org/jira/browse/HBASE-10486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898164#comment-13898164 ] Enis Soztutar commented on HBASE-10486: --- Jeffrey, gentle reminder to set the appropriate fix versions once you commit the patch to the branch(es).
ProtobufUtil Append Increment deserialization lost cell level timestamp - Key: HBASE-10486 URL: https://issues.apache.org/jira/browse/HBASE-10486 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1 Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.96.2, 0.98.1, 0.99.0 Attachments: hbase-10486-v2.patch, hbase-10486.patch
When we deserialize Append/Increment, we use the wrong timestamp value during deserialization in the trunk/0.98 code and discard the value in the 0.96 code base.
[jira] [Updated] (HBASE-10486) ProtobufUtil Append Increment deserialization lost cell level timestamp
[ https://issues.apache.org/jira/browse/HBASE-10486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-10486: -- Fix Version/s: 0.96.2
ProtobufUtil Append Increment deserialization lost cell level timestamp - Key: HBASE-10486 URL: https://issues.apache.org/jira/browse/HBASE-10486 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1 Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.96.2, 0.98.1, 0.99.0 Attachments: hbase-10486-v2.patch, hbase-10486.patch
When we deserialize Append/Increment, we use the wrong timestamp value during deserialization in the trunk/0.98 code and discard the value in the 0.96 code base.
[jira] [Commented] (HBASE-10498) Add new APIs to load balancer interface
[ https://issues.apache.org/jira/browse/HBASE-10498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898181#comment-13898181 ] Enis Soztutar commented on HBASE-10498: --- Is this for doing all the placement decisions through the LB interfaces? I think it makes sense.
Add new APIs to load balancer interface --- Key: HBASE-10498 URL: https://issues.apache.org/jira/browse/HBASE-10498 Project: HBase Issue Type: Improvement Components: Balancer Reporter: rajeshbabu Assignee: rajeshbabu Fix For: 0.98.1, 0.99.0
If a custom load balancer is required to maintain region and corresponding server locations, we can capture this information when we run any balancer algorithm before assignment (like random, retain). But during master startup we will not call any balancer algorithm if a region is already assigned. During a split we also open the child regions first in the RS and then notify the master through zookeeper, so the split regions' information cannot be captured by the balancer. Since the balancer has access to the master, we can get the information from the online regions or the region plan data structures in the AM. But in some use cases we cannot rely on this information (mainly to maintain colocation of two tables' regions). So it's better to add some APIs to the load balancer to notify it when a *region is online or offline*. These APIs help a lot to maintain *region colocation through a custom load balancer*, which is very important in secondary indexing.
[jira] [Commented] (HBASE-10498) Add new APIs to load balancer interface
[ https://issues.apache.org/jira/browse/HBASE-10498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898190#comment-13898190 ] rajeshbabu commented on HBASE-10498: bq. Is this for doing all the placement decisions through the LB interfaces? Yes [~enis].
Add new APIs to load balancer interface --- Key: HBASE-10498 URL: https://issues.apache.org/jira/browse/HBASE-10498 Project: HBase Issue Type: Improvement Components: Balancer Reporter: rajeshbabu Assignee: rajeshbabu Fix For: 0.98.1, 0.99.0
If a custom load balancer is required to maintain region and corresponding server locations, we can capture this information when we run any balancer algorithm before assignment (like random, retain). But during master startup we will not call any balancer algorithm if a region is already assigned. During a split we also open the child regions first in the RS and then notify the master through zookeeper, so the split regions' information cannot be captured by the balancer. Since the balancer has access to the master, we can get the information from the online regions or the region plan data structures in the AM. But in some use cases we cannot rely on this information (mainly to maintain colocation of two tables' regions). So it's better to add some APIs to the load balancer to notify it when a *region is online or offline*. These APIs help a lot to maintain *region colocation through a custom load balancer*, which is very important in secondary indexing.
[jira] [Created] (HBASE-10503) [0.89-fb] Add metrics to track compaction hook progress
Adela Maznikar created HBASE-10503: -- Summary: [0.89-fb] Add metrics to track compaction hook progress Key: HBASE-10503 URL: https://issues.apache.org/jira/browse/HBASE-10503 Project: HBase Issue Type: Improvement Components: Compaction Affects Versions: 0.89-fb Reporter: Adela Maznikar Assignee: Adela Maznikar Priority: Minor Add a metric to track how many KVs we have converted with the compaction hook, and bytes that we have saved during the process. This will help us to see when there are no new KVs converted and give us a good signal when to disable it (all KVs are converted). Related JIRA: HBASE-7099
[jira] [Assigned] (HBASE-7849) Provide administrative limits around bulkloads of files into a single region
[ https://issues.apache.org/jira/browse/HBASE-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang reassigned HBASE-7849: -- Assignee: Jimmy Xiang
Provide administrative limits around bulkloads of files into a single region Key: HBASE-7849 URL: https://issues.apache.org/jira/browse/HBASE-7849 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Harsh J Assignee: Jimmy Xiang
Given the current mechanism, it is possible for users to flood a single region with 1k+ store files via the bulkload API and basically cause the region to become a flying dutchman - never getting assigned successfully again. Ideally, an administrative limit could solve this. If the bulkload RPC call can check whether the region already has X store files, then it can reject the request to add another and throw a failure at the client with an appropriate message. This may be an intrusive change, but it seems necessary for closing the gap between devs and ops in managing HBase clusters. It would especially prevent abuse in the form of unaware devs not pre-splitting tables before bulkloading things in. Currently, this leads to ops pain, as the devs think HBase has gone non-functional and begin complaining.
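The proposed guard is simple; here is a hedged sketch of the check described above (the method name and limit are hypothetical — the eventual patch may implement it quite differently):

```java
public class BulkLoadGuard {
    // Reject a bulkload that would push a region past the administrative
    // store-file limit, so a single region cannot be flooded with files.
    static boolean acceptBulkLoad(int currentStoreFiles, int incomingFiles, int maxStoreFiles) {
        return currentStoreFiles + incomingFiles <= maxStoreFiles;
    }

    public static void main(String[] args) {
        // 990 existing files + 20 incoming against a limit of 1000: rejected,
        // with an error returned to the client instead of a hung region.
        System.out.println(acceptBulkLoad(990, 20, 1000)); // false
    }
}
```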
[jira] [Updated] (HBASE-10500) hbck and OOM when BucketCache is enabled
[ https://issues.apache.org/jira/browse/HBASE-10500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-10500: - Attachment: HBASE-10500.00.patch Here's a simple patch for HadoopQA that disables the blockcache for these tools. hbck and OOM when BucketCache is enabled Key: HBASE-10500 URL: https://issues.apache.org/jira/browse/HBASE-10500 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.98.0 Reporter: Nick Dimiduk Assignee: Nick Dimiduk Attachments: HBASE-10500.00.patch Running {{hbck --repair}} when BucketCache is enabled in offheap mode can cause OOM. This is apparently because {{bin/hbase}} does not include $HBASE_REGIONSERVER_OPTS for hbck. It instantiates an HRegion instance as part of HDFSIntegrityFixer.handleHoleInRegionChain. That HRegion initializes its CacheConfig, which doesn't have the necessary Direct Memory. Possible solutions include: - disable blockcache in the config used by hbck when running its repairs - include HBASE_REGIONSERVER_OPTS in the HBaseFSCK startup arguments I'm leaning toward the former because it's possible that hbck is run on a host with different hardware profile as the RS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10500) hbck and OOM when BucketCache is enabled
[ https://issues.apache.org/jira/browse/HBASE-10500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-10500: - Affects Version/s: 0.99.0 0.96.0 Status: Patch Available (was: Open) hbck and OOM when BucketCache is enabled Key: HBASE-10500 URL: https://issues.apache.org/jira/browse/HBASE-10500 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.96.0, 0.98.0, 0.99.0 Reporter: Nick Dimiduk Assignee: Nick Dimiduk Attachments: HBASE-10500.00.patch Running {{hbck --repair}} when BucketCache is enabled in offheap mode can cause OOM. This is apparently because {{bin/hbase}} does not include $HBASE_REGIONSERVER_OPTS for hbck. It instantiates an HRegion instance as part of HDFSIntegrityFixer.handleHoleInRegionChain. That HRegion initializes its CacheConfig, which doesn't have the necessary Direct Memory. Possible solutions include: - disable blockcache in the config used by hbck when running its repairs - include HBASE_REGIONSERVER_OPTS in the HBaseFSCK startup arguments I'm leaning toward the former because it's possible that hbck is run on a host with different hardware profile as the RS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898263#comment-13898263 ] Hudson commented on HBASE-10413: SUCCESS: Integrated in HBase-TRUNK #4909 (See [https://builds.apache.org/job/HBase-TRUNK/4909/]) HBASE-10413 addendum makes split length readable (tedyu: rev 1567232) * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java Tablesplit.getLength returns 0 -- Key: HBASE-10413 URL: https://issues.apache.org/jira/browse/HBASE-10413 Project: HBase Issue Type: Bug Components: Client, mapreduce Affects Versions: 0.96.1.1 Reporter: Lukas Nalezenec Assignee: Lukas Nalezenec Fix For: 0.98.1, 0.99.0 Attachments: 10413-7.patch, 10413.addendum, HBASE-10413-2.patch, HBASE-10413-3.patch, HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, HBASE-10413.patch InputSplits should be sorted by length but TableSplit does not contain real getLength implementation: @Override public long getLength() { // Not clear how to obtain this... seems to be used only for sorting splits return 0; } This is causing us problem with scheduling - we have got jobs that are supposed to finish in limited time but they get often stuck in last mapper working on large region. Can we implement this method ? What is the best way ? We were thinking about estimating size by size of files on HDFS. We would like to get Scanner from TableSplit, use startRow, stopRow and column families to get corresponding region than computing size of HDFS for given region and column family. Update: This ticket was about production issue - I talked with guy who worked on this and he said our production issue was probably not directly caused by getLength() returning 0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
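The estimation approach described in the ticket (size a split by the HDFS store files of its region and column family) can be sketched as below. This is a self-contained illustration with hypothetical names, not the HBase implementation: the HDFS listing is modeled as a map from file path to length, where a real version would walk FileSystem.listStatus() under the region/family directory.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of "estimate TableSplit.getLength() from HDFS store-file sizes".
// Paths are strings of the form "<regionDir>/<family>/<file>".
public class SplitLengthEstimator {
    private final Map<String, Long> fileLengths = new TreeMap<>();

    void addStoreFile(String path, long length) {
        fileLengths.put(path, length);
    }

    // Sum the sizes of all store files under the given region/family prefix;
    // this sum would be stored in the TableSplit and returned by getLength().
    long estimateSplitLength(String regionDir, String family) {
        String prefix = regionDir + "/" + family + "/";
        long total = 0;
        for (Map.Entry<String, Long> e : fileLengths.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                total += e.getValue();
            }
        }
        return total;
    }
}
```

A non-zero getLength() lets the MapReduce scheduler sort splits largest-first, so the large region the reporter mentions starts early instead of becoming the straggler mapper.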
[jira] [Commented] (HBASE-10492) open daughter regions can unpredictably take long time
[ https://issues.apache.org/jira/browse/HBASE-10492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898262#comment-13898262 ] Jerry He commented on HBASE-10492: -- The machines are 24 CPU 48G memory with Red Hat Enterprise Linux Server release 6.4 (Santiago) 2.6.32-358.el6.x86_64 IBM JDK 6 5 region servers (each with datanode and task tracker). The load MR job with loading of data. I have been trying to reproduce the long delay in opening the daughter regions. With 'org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 200' I have seen delays up to 6 mins. See the log below (from 2014-02-11 02:35:52 to 2014-02-11 02:41:14 at the end) {code} 2014-02-11 02:35:52,473 WARN org.apache.hadoop.hbase.regionserver.HRegionFileSystem: .regioninfo file not found for region: 10a421ac8075a42cbcb53bdc393c8e8c 2014-02-11 02:35:52,479 WARN org.apache.hadoop.hbase.regionserver.HRegionFileSystem: .regioninfo file not found for region: 5ff07e59d13c99ca14408807a6e61722 2014-02-11 02:35:52,589 INFO org.apache.hadoop.hbase.regionserver.compactions.CompactionConfiguration: size [4194304, 9223372036854775807); files [3, 10); ratio 1.20; off-peak ratio 5.00; throttle point 2684354560; delete expired; major period 0, major jitter 0.50 2014-02-11 02:35:52,596 INFO org.apache.hadoop.hbase.regionserver.compactions.CompactionConfiguration: size [4194304, 9223372036854775807); files [3, 10); ratio 1.20; off-peak ratio 5.00; throttle point 2684354560; delete expired; major period 0, major jitter 0.50 2014-02-11 02:35:55,458 INFO org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher: Flushed, sequenceid=4289924, memsize=256.6 M, hasBloomFilter=true, into tmp file gpfs:/hbase/data/default/TestTable/ed4d9fb392ae52c1a406a221defc6b00/.tmp/9e2cb318b0114248b9c62948cf47ac5b 2014-02-11 02:36:37,894 INFO org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher: Flushed, sequenceid=4289926, memsize=153.1 M, hasBloomFilter=true, into tmp file 
gpfs:/hbase/data/default/TestTable/110cc21c77569d595f7717b8c75fbf66/.tmp/4e55d6ba4b5644838163101f2ba20fdb 2014-02-11 02:36:53,067 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Rolled WAL /hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392114789609 with entries=416, filesize=578.7 M; new WAL /hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392114958416 2014-02-11 02:36:53,067 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: moving old hlog file /hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112795409 whose highest sequenceid is 4285071 to /hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112795409 2014-02-11 02:36:53,162 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: moving old hlog file /hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112818204 whose highest sequenceid is 4285169 to /hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112818204 2014-02-11 02:36:53,210 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: moving old hlog file /hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112839023 whose highest sequenceid is 4285266 to /hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112839023 2014-02-11 02:37:13,297 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: moving old hlog file /hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112862511 whose highest sequenceid is 4285362 to /hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112862511 2014-02-11 02:37:13,326 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: moving old hlog file /hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112871587 whose 
highest sequenceid is 4285453 to /hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112871587 2014-02-11 02:37:13,383 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: moving old hlog file /hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112877894 whose highest sequenceid is 4285546 to /hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112877894 2014-02-11 02:37:33,474 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: moving old hlog file /hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112891408 whose highest sequenceid is 4285641 to /hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112891408 2014-02-11 02:37:33,481 INFO org.apache.hadoop.hbase.regionserver.HStore: Added
[jira] [Commented] (HBASE-10500) hbck and OOM when BucketCache is enabled
[ https://issues.apache.org/jira/browse/HBASE-10500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898295#comment-13898295 ] Jonathan Hsieh commented on HBASE-10500: lgtm.. +1 hbck and OOM when BucketCache is enabled Key: HBASE-10500 URL: https://issues.apache.org/jira/browse/HBASE-10500 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.98.0, 0.96.0, 0.99.0 Reporter: Nick Dimiduk Assignee: Nick Dimiduk Attachments: HBASE-10500.00.patch Running {{hbck --repair}} when BucketCache is enabled in offheap mode can cause OOM. This is apparently because {{bin/hbase}} does not include $HBASE_REGIONSERVER_OPTS for hbck. It instantiates an HRegion instance as part of HDFSIntegrityFixer.handleHoleInRegionChain. That HRegion initializes its CacheConfig, which doesn't have the necessary Direct Memory. Possible solutions include: - disable blockcache in the config used by hbck when running its repairs - include HBASE_REGIONSERVER_OPTS in the HBaseFSCK startup arguments I'm leaning toward the former because it's possible that hbck is run on a host with different hardware profile as the RS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10490) Simplify RpcClient code
[ https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898298#comment-13898298 ] Devaraj Das commented on HBASE-10490: - I can't say for sure if in HBase anyone configures infinite timeout (rpcTimeout = 0) on the sockets but the pingery would have protected the client if it wanted to wait for a while in the situations where the server is busy. So if the rpcTimeout is passed as zero, the socket timeout is set to the ping interval. That means the client won't retry when the timeout happens. It'll just send a ping to figure out whether the server is still alive. If so, then it'll continue to wait (as opposed to resending the request). But I agree that if no one uses rpcTimeout = 0, we could remove the ping stuff. Simplify RpcClient code --- Key: HBASE-10490 URL: https://issues.apache.org/jira/browse/HBASE-10490 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.99.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 0.99.0 Attachments: 10490.v1.patch The code is complex. Here is a set of proposed changes, for trunk: 1) remove PingInputStream. if rpcTimeout 0 it just rethrows the exception. I expect that we always have a rpcTimeout. So we can remove the code. 2) remove the sendPing: instead, just close the connection if it's not used for a while, instead of trying to ping the server. 3) remove maxIddle time: to avoid the confusion if someone has overwritten the conf. 4) remove shouldCloseConnection: it was more or less synchronized with closeException. Having a single variable instead of two avoids the synchro 5) remove lastActivity: instead of trying to have an exact timeout, just kill the connection after some time. lastActivity could be set to wrong values if the server was slow to answer. 6) hopefully, a better management of the exception; we don't use the close exception of someone else as an input for another one. Same goes for interruption. 
I may have something wrong in the code. I will review it myself again. Feedback welcome, especially on the ping removal: I hope I got all the use cases. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
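Point 2 of the proposal (drop sendPing; just close connections that have sat unused) amounts to tracking a last-use timestamp per connection and periodically reaping stale ones. A minimal sketch with hypothetical names, using plain Java in place of the real RpcClient internals:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of "close idle connections instead of pinging the server".
// Each connection records when it was last used; a periodic sweep closes
// anything idle longer than maxIdleMillis.
public class IdleConnectionReaper {
    private final Map<String, Long> lastUsed = new ConcurrentHashMap<>();
    private final long maxIdleMillis;

    IdleConnectionReaper(long maxIdleMillis) {
        this.maxIdleMillis = maxIdleMillis;
    }

    // Called on every send/receive for the connection.
    void touch(String connectionId, long nowMillis) {
        lastUsed.put(connectionId, nowMillis);
    }

    // Returns the number of connections closed; called from a chore thread.
    int sweep(long nowMillis) {
        int closed = 0;
        Iterator<Map.Entry<String, Long>> it = lastUsed.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (nowMillis - e.getValue() > maxIdleMillis) {
                it.remove(); // the real code would also close the socket here
                closed++;
            }
        }
        return closed;
    }
}
```

Note this is exactly where Devaraj's concern bites: if "used" only means bytes on the wire, a connection waiting on a slow in-flight RPC looks idle, so a real implementation would also have to count outstanding calls before reaping.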
[jira] [Commented] (HBASE-10490) Simplify RpcClient code
[ https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898304#comment-13898304 ] Devaraj Das commented on HBASE-10490: - rpcTimeout in my last comment refers to the configured value of hbase.rpc.timeout. Simplify RpcClient code --- Key: HBASE-10490 URL: https://issues.apache.org/jira/browse/HBASE-10490 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.99.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 0.99.0 Attachments: 10490.v1.patch The code is complex. Here is a set of proposed changes, for trunk: 1) remove PingInputStream. if rpcTimeout 0 it just rethrows the exception. I expect that we always have a rpcTimeout. So we can remove the code. 2) remove the sendPing: instead, just close the connection if it's not used for a while, instead of trying to ping the server. 3) remove maxIddle time: to avoid the confusion if someone has overwritten the conf. 4) remove shouldCloseConnection: it was more or less synchronized with closeException. Having a single variable instead of two avoids the synchro 5) remove lastActivity: instead of trying to have an exact timeout, just kill the connection after some time. lastActivity could be set to wrong values if the server was slow to answer. 6) hopefully, a better management of the exception; we don't use the close exception of someone else as an input for another one. Same goes for interruption. I may have something wrong in the code. I will review it myself again. Feedback welcome, especially on the ping removal: I hope I got all the use cases. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10493) InclusiveStopFilter#filterKeyValue() should perform filtering on row key
[ https://issues.apache.org/jira/browse/HBASE-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898344#comment-13898344 ] Lars Hofhansl commented on HBASE-10493: --- +1 InclusiveStopFilter#filterKeyValue() should perform filtering on row key Key: HBASE-10493 URL: https://issues.apache.org/jira/browse/HBASE-10493 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 0.98.1, 0.99.0 Attachments: 10493-v1.txt, 10493-v2.txt InclusiveStopFilter inherits filterKeyValue() from FilterBase which always returns ReturnCode.INCLUDE InclusiveStopFilter#filterKeyValue() should be consistent with filtering on row key. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
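The fix is to make filterKeyValue consult the stop row rather than inherit FilterBase's unconditional INCLUDE. The core of it is an unsigned lexicographic comparison of the cell's row against the stop row; a self-contained sketch of that predicate (using java.util.Arrays.compareUnsigned as a stand-in for HBase's Bytes.compareTo, and a boolean in place of the filter's done flag):

```java
import java.util.Arrays;

// Toy model of InclusiveStopFilter's row check: include cells only while
// their row sorts at or before the stop row (unsigned lexicographic order).
public class InclusiveStopCheck {
    private final byte[] stopRow;
    private boolean done = false;

    InclusiveStopCheck(byte[] stopRow) {
        this.stopRow = stopRow;
    }

    // True if a cell on this row should be included; flips 'done' once a row
    // past the stop row is seen, mirroring filterKeyValue + filterAllRemaining.
    boolean includeRow(byte[] row) {
        if (done) {
            return false;
        }
        if (Arrays.compareUnsigned(row, stopRow) > 0) {
            done = true;
            return false;
        }
        return true;
    }
}
```

The "inclusive" part is the `> 0` test: the stop row itself still compares equal and is returned, only strictly greater rows are filtered.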
[jira] [Updated] (HBASE-10493) InclusiveStopFilter#filterKeyValue() should perform filtering on row key
[ https://issues.apache.org/jira/browse/HBASE-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10493: --- Hadoop Flags: Reviewed Integrated to 0.98 and trunk. Thanks Lars for the reviews. InclusiveStopFilter#filterKeyValue() should perform filtering on row key Key: HBASE-10493 URL: https://issues.apache.org/jira/browse/HBASE-10493 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 0.98.1, 0.99.0 Attachments: 10493-v1.txt, 10493-v2.txt InclusiveStopFilter inherits filterKeyValue() from FilterBase which always returns ReturnCode.INCLUDE InclusiveStopFilter#filterKeyValue() should be consistent with filtering on row key. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898361#comment-13898361 ] Hudson commented on HBASE-10413: SUCCESS: Integrated in HBase-0.98 #148 (See [https://builds.apache.org/job/HBase-0.98/148/]) HBASE-10413 addendum makes split length readable (tedyu: rev 1567230) * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java Tablesplit.getLength returns 0 -- Key: HBASE-10413 URL: https://issues.apache.org/jira/browse/HBASE-10413 Project: HBase Issue Type: Bug Components: Client, mapreduce Affects Versions: 0.96.1.1 Reporter: Lukas Nalezenec Assignee: Lukas Nalezenec Fix For: 0.98.1, 0.99.0 Attachments: 10413-7.patch, 10413.addendum, HBASE-10413-2.patch, HBASE-10413-3.patch, HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, HBASE-10413.patch InputSplits should be sorted by length but TableSplit does not contain real getLength implementation: @Override public long getLength() { // Not clear how to obtain this... seems to be used only for sorting splits return 0; } This is causing us problem with scheduling - we have got jobs that are supposed to finish in limited time but they get often stuck in last mapper working on large region. Can we implement this method ? What is the best way ? We were thinking about estimating size by size of files on HDFS. We would like to get Scanner from TableSplit, use startRow, stopRow and column families to get corresponding region than computing size of HDFS for given region and column family. Update: This ticket was about production issue - I talked with guy who worked on this and he said our production issue was probably not directly caused by getLength() returning 0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10493) InclusiveStopFilter#filterKeyValue() should perform filtering on row key
[ https://issues.apache.org/jira/browse/HBASE-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-10493: -- Fix Version/s: 0.94.17 0.96.2 This is important correctness stuff that should be in 0.94 and 0.96 as well. InclusiveStopFilter#filterKeyValue() should perform filtering on row key Key: HBASE-10493 URL: https://issues.apache.org/jira/browse/HBASE-10493 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 0.96.2, 0.98.1, 0.99.0, 0.94.17 Attachments: 10493-v1.txt, 10493-v2.txt InclusiveStopFilter inherits filterKeyValue() from FilterBase which always returns ReturnCode.INCLUDE InclusiveStopFilter#filterKeyValue() should be consistent with filtering on row key. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10493) InclusiveStopFilter#filterKeyValue() should perform filtering on row key
[ https://issues.apache.org/jira/browse/HBASE-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898378#comment-13898378 ] stack commented on HBASE-10493: --- Agree InclusiveStopFilter#filterKeyValue() should perform filtering on row key Key: HBASE-10493 URL: https://issues.apache.org/jira/browse/HBASE-10493 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 0.96.2, 0.98.1, 0.99.0, 0.94.17 Attachments: 10493-v1.txt, 10493-v2.txt InclusiveStopFilter inherits filterKeyValue() from FilterBase which always returns ReturnCode.INCLUDE InclusiveStopFilter#filterKeyValue() should be consistent with filtering on row key. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-7849) Provide administrative limits around bulkloads of files into a single region
[ https://issues.apache.org/jira/browse/HBASE-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7849: --- Status: Patch Available (was: Open) Provide administrative limits around bulkloads of files into a single region Key: HBASE-7849 URL: https://issues.apache.org/jira/browse/HBASE-7849 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Harsh J Assignee: Jimmy Xiang Attachments: hbase-7849.patch Given the current mechanism, it is possible for users to flood a single region with 1k+ store files via the bulkload API and basically cause the region to become a flying dutchman - never getting assigned successfully again. Ideally, an administrative limit could solve this. If the bulkload RPC call can check if the region already has X store files, then it can reject the request to add another and throw a failure at the client with an appropriate message. This may be an intrusive change, but seems necessary in perfecting the gap between devs and ops in managing a HBase clusters. This would especially prevent abuse in form of unaware devs not pre-splitting tables before bulkloading things in. Currently, this leads to ops pain, as the devs think HBase has gone non-functional and begin complaining. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
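The administrative limit Harsh proposes boils down to a pre-check in the bulkload RPC path: if the target store already holds the configured maximum number of files, reject the request with a clear error instead of letting the region accumulate 1k+ store files. A minimal sketch with hypothetical names; the actual patch (and how the limit is configured or counted, per store vs per region) may differ:

```java
// Toy model of an administrative cap on bulkloaded files per store.
// A real implementation would read maxStoreFiles from configuration and
// run this check inside the region server's bulkload handler.
public class BulkLoadLimiter {
    private final int maxStoreFiles;

    BulkLoadLimiter(int maxStoreFiles) {
        this.maxStoreFiles = maxStoreFiles;
    }

    // Accept the request only if the resulting file count stays at or
    // below the cap; otherwise the client gets a failure it can act on,
    // e.g. by pre-splitting the table before reloading.
    boolean canAccept(int currentStoreFiles, int incomingFiles) {
        return currentStoreFiles + incomingFiles <= maxStoreFiles;
    }
}
```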
[jira] [Updated] (HBASE-7849) Provide administrative limits around bulkloads of files into a single region
[ https://issues.apache.org/jira/browse/HBASE-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7849: --- Attachment: hbase-7849.patch Provide administrative limits around bulkloads of files into a single region Key: HBASE-7849 URL: https://issues.apache.org/jira/browse/HBASE-7849 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Harsh J Assignee: Jimmy Xiang Attachments: hbase-7849.patch Given the current mechanism, it is possible for users to flood a single region with 1k+ store files via the bulkload API and basically cause the region to become a flying dutchman - never getting assigned successfully again. Ideally, an administrative limit could solve this. If the bulkload RPC call can check if the region already has X store files, then it can reject the request to add another and throw a failure at the client with an appropriate message. This may be an intrusive change, but seems necessary in perfecting the gap between devs and ops in managing a HBase clusters. This would especially prevent abuse in form of unaware devs not pre-splitting tables before bulkloading things in. Currently, this leads to ops pain, as the devs think HBase has gone non-functional and begin complaining. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10493) InclusiveStopFilter#filterKeyValue() should perform filtering on row key
[ https://issues.apache.org/jira/browse/HBASE-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10493: --- Priority: Major (was: Minor) InclusiveStopFilter#filterKeyValue() should perform filtering on row key Key: HBASE-10493 URL: https://issues.apache.org/jira/browse/HBASE-10493 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.96.2, 0.98.1, 0.99.0, 0.94.17 Attachments: 10493-v1.txt, 10493-v2.txt InclusiveStopFilter inherits filterKeyValue() from FilterBase which always returns ReturnCode.INCLUDE InclusiveStopFilter#filterKeyValue() should be consistent with filtering on row key. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898382#comment-13898382 ] Hudson commented on HBASE-10413: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #136 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/136/]) HBASE-10413 addendum makes split length readable (tedyu: rev 1567230) * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java Tablesplit.getLength returns 0 -- Key: HBASE-10413 URL: https://issues.apache.org/jira/browse/HBASE-10413 Project: HBase Issue Type: Bug Components: Client, mapreduce Affects Versions: 0.96.1.1 Reporter: Lukas Nalezenec Assignee: Lukas Nalezenec Fix For: 0.98.1, 0.99.0 Attachments: 10413-7.patch, 10413.addendum, HBASE-10413-2.patch, HBASE-10413-3.patch, HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, HBASE-10413.patch InputSplits should be sorted by length but TableSplit does not contain real getLength implementation: @Override public long getLength() { // Not clear how to obtain this... seems to be used only for sorting splits return 0; } This is causing us problem with scheduling - we have got jobs that are supposed to finish in limited time but they get often stuck in last mapper working on large region. Can we implement this method ? What is the best way ? We were thinking about estimating size by size of files on HDFS. We would like to get Scanner from TableSplit, use startRow, stopRow and column families to get corresponding region than computing size of HDFS for given region and column family. Update: This ticket was about production issue - I talked with guy who worked on this and he said our production issue was probably not directly caused by getLength() returning 0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-7849) Provide administrative limits around bulkloads of files into a single region
[ https://issues.apache.org/jira/browse/HBASE-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898386#comment-13898386 ] stack commented on HBASE-7849: -- +1 Provide administrative limits around bulkloads of files into a single region Key: HBASE-7849 URL: https://issues.apache.org/jira/browse/HBASE-7849 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Harsh J Assignee: Jimmy Xiang Attachments: hbase-7849.patch Given the current mechanism, it is possible for users to flood a single region with 1k+ store files via the bulkload API and basically cause the region to become a flying dutchman - never getting assigned successfully again. Ideally, an administrative limit could solve this. If the bulkload RPC call can check if the region already has X store files, then it can reject the request to add another and throw a failure at the client with an appropriate message. This may be an intrusive change, but seems necessary in perfecting the gap between devs and ops in managing a HBase clusters. This would especially prevent abuse in form of unaware devs not pre-splitting tables before bulkloading things in. Currently, this leads to ops pain, as the devs think HBase has gone non-functional and begin complaining. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10493) InclusiveStopFilter#filterKeyValue() should perform filtering on row key
[ https://issues.apache.org/jira/browse/HBASE-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10493: --- Status: Open (was: Patch Available) Patch for 0.94 coming InclusiveStopFilter#filterKeyValue() should perform filtering on row key Key: HBASE-10493 URL: https://issues.apache.org/jira/browse/HBASE-10493 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.96.2, 0.98.1, 0.99.0, 0.94.17 Attachments: 10493-v1.txt, 10493-v2.txt InclusiveStopFilter inherits filterKeyValue() from FilterBase which always returns ReturnCode.INCLUDE InclusiveStopFilter#filterKeyValue() should be consistent with filtering on row key. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10490) Simplify RpcClient code
[ https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898396#comment-13898396 ] stack commented on HBASE-10490: --- bq. meanwhile the server sees the connection as idle and closes it If server is taking timeout to reply, the client will be gone anyways? If request is taking tens of seconds, we should kill it? If this is expected, up the socket timeout? bq. we can remove the ping on the client without changing anything in the server: We could. I like purging ping altogether (unless I'm wrong above) since it puts a particular shape on how we process the incoming requests (look for the special -1 length indicator and short circuit if a ping) and would like this cleaned up so easier putting in another request handling (e.g. async). bq. But I agree that if no one uses rpcTimeout = 0, we could remove the ping stuff. Lets beat anyone who has their rpcTimeout to 0. Smile (That said, I have vague recollection that rpcTimeout==0 was how we defaulted at one time so let me go beat myself in the past) Simplify RpcClient code --- Key: HBASE-10490 URL: https://issues.apache.org/jira/browse/HBASE-10490 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.99.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 0.99.0 Attachments: 10490.v1.patch The code is complex. Here is a set of proposed changes, for trunk: 1) remove PingInputStream. if rpcTimeout 0 it just rethrows the exception. I expect that we always have a rpcTimeout. So we can remove the code. 2) remove the sendPing: instead, just close the connection if it's not used for a while, instead of trying to ping the server. 3) remove maxIddle time: to avoid the confusion if someone has overwritten the conf. 4) remove shouldCloseConnection: it was more or less synchronized with closeException. 
Having a single variable instead of two avoids the synchro 5) remove lastActivity: instead of trying to have an exact timeout, just kill the connection after some time. lastActivity could be set to wrong values if the server was slow to answer. 6) hopefully, a better management of the exception; we don't use the close exception of someone else as an input for another one. Same goes for interruption. I may have something wrong in the code. I will review it myself again. Feedback welcome, especially on the ping removal: I hope I got all the use cases. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10500) hbck and OOM when BucketCache is enabled
[ https://issues.apache.org/jira/browse/HBASE-10500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898398#comment-13898398 ] stack commented on HBASE-10500: --- +1 hbck and OOM when BucketCache is enabled Key: HBASE-10500 URL: https://issues.apache.org/jira/browse/HBASE-10500 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.98.0, 0.96.0, 0.99.0 Reporter: Nick Dimiduk Assignee: Nick Dimiduk Attachments: HBASE-10500.00.patch Running {{hbck --repair}} when BucketCache is enabled in offheap mode can cause OOM. This is apparently because {{bin/hbase}} does not include $HBASE_REGIONSERVER_OPTS for hbck. It instantiates an HRegion instance as part of HDFSIntegrityFixer.handleHoleInRegionChain. That HRegion initializes its CacheConfig, which doesn't have the necessary Direct Memory. Possible solutions include: - disable blockcache in the config used by hbck when running its repairs - include HBASE_REGIONSERVER_OPTS in the HBaseFSCK startup arguments I'm leaning toward the former because it's possible that hbck is run on a host with different hardware profile as the RS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10492) open daughter regions can unpredictably take long time
[ https://issues.apache.org/jira/browse/HBASE-10492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898410#comment-13898410 ] stack commented on HBASE-10492: --- 6 minutes is eons, a body blow. If you run the Oracle JVM, does it exhibit the same latencies? The IBM JDK taking this long to schedule a thread is a problem, a problem for us if we are to run well on the IBM JDK. We should dig in and figure out if it is something about the way this particular thread is scheduled, or if it is the case that any thread can be swapped out for pauses of this magnitude. open daughter regions can unpredictably take long time -- Key: HBASE-10492 URL: https://issues.apache.org/jira/browse/HBASE-10492 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.0 Reporter: Jerry He During stress testing I have seen the client getting RetriesExhaustedWithDetailsException: Failed 748 actions: NotServingRegionException On the master log, 2014-02-08 20:43 is the timestamp from OFFLINE to SPLITTING_NEW, 2014-02-08 21:41 is the timestamp from SPLITTING_NEW to OPEN. 
The corresponding time period on the region server log is:
{code}
2014-02-08 20:44:12,662 WARN org.apache.hadoop.hbase.regionserver.HRegionFileSystem: .regioninfo file not found for region: 010c1981882d1a59201af5e2dc589d44
2014-02-08 20:44:12,666 WARN org.apache.hadoop.hbase.regionserver.HRegionFileSystem: .regioninfo file not found for region: c2eb9b7971ca7f3fed3da86df5b788e7
{code}
There were no INFO entries related to these two regions until the following (at the end, note: Split took 57mins, 16sec):
{code}
2014-02-08 21:41:14,029 INFO org.apache.hadoop.hbase.regionserver.HRegion: Onlined c2eb9b7971ca7f3fed3da86df5b788e7; next sequenceid=213355
2014-02-08 21:41:14,031 INFO org.apache.hadoop.hbase.regionserver.HRegion: Onlined 010c1981882d1a59201af5e2dc589d44; next sequenceid=213354
2014-02-08 21:41:14,032 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Post open deploy tasks for region=tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.
2014-02-08 21:41:14,054 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Updated row tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7. with server=hdtest208.svl.ibm.com,60020,1391887547473
2014-02-08 21:41:14,054 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Finished post open deploy task for tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.
2014-02-08 21:41:14,054 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Post open deploy tasks for region=tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.
2014-02-08 21:41:14,059 INFO org.apache.hadoop.hbase.regionserver.HStore: Completed compaction of 10 file(s) in cf of tpch_hb_1000_2.lineitem,^\x01\x8B\xE7(\x80\x01\x80\x93\xFD\x01\x01\x80\x00\x00\x00\xB5\x0E\xCC'\x01\x80\x00\x00\x03,1391918508561.1fbcfc0a792435dfd73ec5b0ef5c953c. into 451be6df8c604993ae540b808d9cfa08(size=72.8 M), total size for store is 2.4 G. This selection was in queue for 0sec, and took 1mins, 40sec to execute.
2014-02-08 21:41:14,059 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: Completed compaction: Request = regionName=tpch_hb_1000_2.lineitem,^\x01\x8B\xE7(\x80\x01\x80\x93\xFD\x01\x01\x80\x00\x00\x00\xB5\x0E\xCC'\x01\x80\x00\x00\x03,1391918508561.1fbcfc0a792435dfd73ec5b0ef5c953c., storeName=cf, fileCount=10, fileSize=94.1 M, priority=9883, time=1391924373278861000; duration=1mins, 40sec
2014-02-08 21:41:14,059 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting compaction on cf in region tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.
2014-02-08 21:41:14,059 INFO org.apache.hadoop.hbase.regionserver.HStore: Starting compaction of 10 file(s) in cf of tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7. into tmpdir=gpfs:/hbase/data/default/tpch_hb_1000_2.lineitem/c2eb9b7971ca7f3fed3da86df5b788e7/.tmp, totalSize=709.7 M
2014-02-08 21:41:14,066 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Updated row tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44. with server=hdtest208.svl.ibm.com,60020,1391887547473
2014-02-08 21:41:14,066 INFO
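The hour-long gap in the log above is exactly the scheduling question raised in the comment. A minimal, self-contained probe for checking whether a JVM is waking sleeping threads late could look like this; it is illustrative only (not HBase code) and simply measures how far past the requested sleep time a thread actually resumes:

```java
public class SchedulingProbe {
    /** Returns the worst observed wake-up overshoot in milliseconds over the given rounds. */
    public static long worstOvershootMillis(int rounds, long sleepMillis) {
        long worst = 0;
        for (int i = 0; i < rounds; i++) {
            long start = System.nanoTime();
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag and stop probing
                break;
            }
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            worst = Math.max(worst, elapsedMillis - sleepMillis);
        }
        return worst;
    }

    public static void main(String[] args) {
        // On a healthy JVM/OS this prints a small number; multi-second values
        // would point at the scheduler rather than at HBase itself.
        System.out.println("worst overshoot: " + worstOvershootMillis(20, 10) + " ms");
    }
}
```

Running such a probe under load on both the IBM JDK and the Oracle JVM would help separate a scheduler-wide pause from something specific to the split thread.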
[jira] [Commented] (HBASE-10490) Simplify RpcClient code
[ https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898417#comment-13898417 ] Devaraj Das commented on HBASE-10490: - bq. Lets beat anyone who has their rpcTimeout to 0. If it's as simple as beating them up, +1 (though I would advise you to not beat yourself up just yet, Stack [smile]). Applications could break because a timeout of 0 won't be supported any more (maybe log a big warning if we detect this in the RPC client). And if there is agreement, this should be one of the things we stop supporting in the upcoming 1.0. Simplify RpcClient code --- Key: HBASE-10490 URL: https://issues.apache.org/jira/browse/HBASE-10490 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.99.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 0.99.0 Attachments: 10490.v1.patch The code is complex. Here is a set of proposed changes, for trunk: 1) remove PingInputStream. if rpcTimeout 0 it just rethrows the exception. I expect that we always have a rpcTimeout. So we can remove the code. 2) remove the sendPing: instead, just close the connection if it's not used for a while, instead of trying to ping the server. 3) remove maxIdle time: to avoid the confusion if someone has overwritten the conf. 4) remove shouldCloseConnection: it was more or less synchronized with closeException. Having a single variable instead of two avoids the synchro. 5) remove lastActivity: instead of trying to have an exact timeout, just kill the connection after some time. lastActivity could be set to wrong values if the server was slow to answer. 6) hopefully, a better management of the exceptions; we don't use the close exception of someone else as an input for another one. Same goes for interruption. I may have something wrong in the code. I will review it myself again. Feedback welcome, especially on the ping removal: I hope I got all the use cases.
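Item 2 of the proposal replaces active pinging with closing connections that have sat unused too long. A rough standalone sketch of the bookkeeping involved; the class and field names here (IdleTracker, maxIdleMillis) are hypothetical, not the actual RpcClient internals:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: track each connection's last-use time; a periodic sweep can then
// close any connection idle longer than the threshold instead of pinging it.
public class IdleTracker {
    private final long maxIdleMillis;
    private final Map<String, Long> lastUsed = new ConcurrentHashMap<>();

    public IdleTracker(long maxIdleMillis) {
        this.maxIdleMillis = maxIdleMillis;
    }

    /** Record that a connection was just used (call sent or response received). */
    public void touch(String connectionId) {
        lastUsed.put(connectionId, System.currentTimeMillis());
    }

    /** True if the connection should be closed rather than kept alive. */
    public boolean shouldClose(String connectionId, long now) {
        Long last = lastUsed.get(connectionId);
        return last != null && now - last > maxIdleMillis;
    }
}
```

Note Devaraj's concern above still applies to any such scheme: a connection with a long-running in-flight call must count as "in use", or the server/client may close it under a slow RPC.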
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10493) InclusiveStopFilter#filterKeyValue() should perform filtering on row key
[ https://issues.apache.org/jira/browse/HBASE-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10493: --- Attachment: 10493-0.94.txt InclusiveStopFilter#filterKeyValue() should perform filtering on row key Key: HBASE-10493 URL: https://issues.apache.org/jira/browse/HBASE-10493 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.96.2, 0.98.1, 0.99.0, 0.94.17 Attachments: 10493-0.94.txt, 10493-v1.txt, 10493-v2.txt InclusiveStopFilter inherits filterKeyValue() from FilterBase, which always returns ReturnCode.INCLUDE. InclusiveStopFilter#filterKeyValue() should be consistent with filtering on row key.
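The fix described above amounts to comparing each cell's row key against the configured stop row instead of inheriting FilterBase's unconditional INCLUDE. A simplified standalone model of that check (plain byte[] rows, unsigned lexicographic order as HBase uses for row keys; this is not the actual Filter API):

```java
public class InclusiveStopCheck {
    // HBase compares row-key bytes as unsigned values, lexicographically,
    // with a shorter prefix sorting before a longer key.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) return cmp;
        }
        return a.length - b.length;
    }

    /** True if a cell in this row should be included: row <= stopRow (stop row inclusive). */
    public static boolean include(byte[] row, byte[] stopRow) {
        return compareUnsigned(row, stopRow) <= 0;
    }
}
```

In the real filter, a cell in a row past the stop row would map to a non-INCLUDE ReturnCode, making filterKeyValue() consistent with the filter's row-level behavior.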
[jira] [Resolved] (HBASE-10493) InclusiveStopFilter#filterKeyValue() should perform filtering on row key
[ https://issues.apache.org/jira/browse/HBASE-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved HBASE-10493. Resolution: Fixed
[jira] [Commented] (HBASE-10498) Add new APIs to load balancer interface
[ https://issues.apache.org/jira/browse/HBASE-10498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898420#comment-13898420 ] stack commented on HBASE-10498: --- Can you not add a new attribute for the Stochastic LB to consider -- colocation -- and weight it above others rather than add API? Or it may be the case that colocating regions cross-cuts how SLB works currently (it having a single region focus). Add new APIs to load balancer interface --- Key: HBASE-10498 URL: https://issues.apache.org/jira/browse/HBASE-10498 Project: HBase Issue Type: Improvement Components: Balancer Reporter: rajeshbabu Assignee: rajeshbabu Fix For: 0.98.1, 0.99.0 If a custom load balancer is required to maintain region and corresponding server locations, we can capture this information when we run any balancer algorithm before assignment (like random, retain). But during master startup we will not call any balancer algorithm if a region is already assigned. During split we also open child regions first in the RS and then notify the master through ZooKeeper, so split region information cannot be captured into the balancer. Since the balancer has access to the master we can get the information from online regions or region plan data structures in AM, but in some use cases we cannot rely on this information (mainly to maintain colocation of two tables' regions). So it's better to add some APIs to the load balancer to notify it when a *region is online or offline*. These APIs help a lot to maintain *regions colocation through custom load balancer*, which is very important in secondary indexing.
[jira] [Commented] (HBASE-10493) InclusiveStopFilter#filterKeyValue() should perform filtering on row key
[ https://issues.apache.org/jira/browse/HBASE-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898421#comment-13898421 ] Lars Hofhansl commented on HBASE-10493: --- +1 on 0.94. I assume the trunk patch applies to 0.96. I'll handle HBASE-10485 after this has gone in.
[jira] [Commented] (HBASE-9360) Enable 0.94 - 0.96 replication to minimize upgrade down time
[ https://issues.apache.org/jira/browse/HBASE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898431#comment-13898431 ] Francis Liu commented on HBASE-9360: Sorry, late to the party. [~saint@gmail.com] mentioned this. We have a different approach: we instead extended the replication source/sink to use a thrift client/server to ship/receive the edits. We plan on using it for 0.94-0.96 replication as well as for encrypting replication communication. We currently have 0.94-0.94; the next step is 0.94-0.96. If people are interested I can try and share the code once we have things stable, though 0.94-0.96 might be a bit later. Enable 0.94 - 0.96 replication to minimize upgrade down time - Key: HBASE-9360 URL: https://issues.apache.org/jira/browse/HBASE-9360 Project: HBase Issue Type: Brainstorming Components: migration Affects Versions: 0.98.0, 0.96.0 Reporter: Jeffrey Zhong As we know, 0.96 is a singularity release; as of today a 0.94 hbase user has to do an in-place upgrade: make corresponding client changes, recompile client application code, fully shut down the existing 0.94 hbase cluster, deploy the 0.96 binary, run the upgrade script and then start the upgraded cluster. You can imagine the downtime will be extended if something goes wrong in between. To minimize the downtime, another possible way is to set up a secondary 0.96 cluster and then set up replication between the existing 0.94 cluster and the new 0.96 slave cluster. Once the 0.96 cluster is synced, a user can switch the traffic to the 0.96 cluster and decommission the old one.
The ideal steps will be: 1) Set up a 0.96 cluster 2) Set up replication between a running 0.94 cluster and the newly created 0.96 cluster 3) Wait till they're in sync in replication 4) Start duplicated writes to both the 0.94 and 0.96 clusters (could stop replication now) 5) Forward read traffic to the slave 0.96 cluster 6) After a certain period, stop writes to the original 0.94 cluster if everything is good, completing the upgrade To get us there, there are two tasks: 1) Enable replication from 0.94 to 0.96 I've run the idea by [~jdcryans], [~devaraj] and [~ndimiduk]. Currently it seems the best approach is to build a very similar service, or build on top of https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep, with support for three commands: replicateLogEntries, multi and delete. Inside the three commands, we just pass the corresponding requests down to the destination 0.96 cluster as a bridge. The reason to support multi and delete is for CopyTable to copy data from a 0.94 cluster to a 0.96 one. The other approach is to provide limited support of the 0.94 RPC protocol in 0.96. An issue with this is that a 0.94 client needs to talk to ZooKeeper first before it can connect to a 0.96 region server. Therefore, we would need a fake ZooKeeper setup in front of the 0.96 cluster for a 0.94 client to connect to. It may also pollute the 0.96 code base with 0.94 RPC code. 2) To support writes to a 0.96 cluster and a 0.94 one at the same time, we need to load both hbase clients into one single JVM using different class loaders. Let me know if you think this is worth doing and any better approach we could take. Thanks!
[jira] [Commented] (HBASE-10490) Simplify RpcClient code
[ https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898435#comment-13898435 ] Lars Hofhansl commented on HBASE-10490: --- +1 on removing the pingery. Just this discussion shows how convoluted it has become. Even with rpcTimeout = 0, would clients actually break? Wouldn't they just reconnect? (I might be confused)
[jira] [Updated] (HBASE-9507) Promote methods of WALActionsListener to WALObserver
[ https://issues.apache.org/jira/browse/HBASE-9507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-9507: Fix Version/s: 0.99.0 Promote methods of WALActionsListener to WALObserver Key: HBASE-9507 URL: https://issues.apache.org/jira/browse/HBASE-9507 Project: HBase Issue Type: Brainstorming Components: Coprocessors, wal Reporter: Nick Dimiduk Priority: Minor Fix For: 0.99.0 The interface exposed by WALObserver is quite minimal. To implement anything of significance based on WAL events, WALActionsListener (at a minimum) is required. This is demonstrated by the implementation of the replication feature (not currently possible with coprocessors) and the corresponding interface exploitation that is the [Side-Effect Processor|https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep]. Consider promoting the interface of WALActionsListener into WALObserver. This goes a long way toward being able to refactor replication into a coprocessor. It also removes the duplicate code path for listeners, because they're already available via the coprocessor hook.
[jira] [Commented] (HBASE-10490) Simplify RpcClient code
[ https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898447#comment-13898447 ] Devaraj Das commented on HBASE-10490: - bq. Even with rpcTimeout = 0, would clients actually break? Wouldn't they just reconnect? Most likely, they will. My point was that there is some change in semantics. Clients might be handling them well enough already.
[jira] [Commented] (HBASE-10493) InclusiveStopFilter#filterKeyValue() should perform filtering on row key
[ https://issues.apache.org/jira/browse/HBASE-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898452#comment-13898452 ] Ted Yu commented on HBASE-10493: Integrated to 0.94 and 0.96
[jira] [Updated] (HBASE-10361) Enable/AlterTable support for region replicas
[ https://issues.apache.org/jira/browse/HBASE-10361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated HBASE-10361: Attachment: 10361-1.txt Patch that assumes HBASE-10350's patch on jira (10350-3.txt). Enable/AlterTable support for region replicas - Key: HBASE-10361 URL: https://issues.apache.org/jira/browse/HBASE-10361 Project: HBase Issue Type: Sub-task Components: master Reporter: Enis Soztutar Assignee: Devaraj Das Fix For: 0.99.0 Attachments: 10361-1.txt Add support for region replicas in master operations enable table and modify table.
[jira] [Commented] (HBASE-10490) Simplify RpcClient code
[ https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898468#comment-13898468 ] stack commented on HBASE-10490: --- If someone wants timeout=0, and they want to avoid a beating, they could do timeout=Long.MAX_VALUE? I can't think of a place where timeout=0 would make any sense. Good stuff.
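One way to realize the suggestion above: treat a configured timeout of 0 as unsupported and normalize it to a large finite value, with a loud warning as Devaraj proposed earlier in the thread. A hedged sketch; the class name and logging are hypothetical, not the actual RpcClient behavior:

```java
public class TimeoutNormalizer {
    /**
     * Map a legacy "infinite" timeout (0 or negative) to an effectively
     * infinite but finite value, warning the user about the semantic change.
     */
    public static long normalize(long configuredTimeoutMillis) {
        if (configuredTimeoutMillis <= 0) {
            System.err.println(
                "WARN: rpc timeout <= 0 is no longer supported; using Long.MAX_VALUE");
            return Long.MAX_VALUE;
        }
        return configuredTimeoutMillis;
    }
}
```

This preserves old configs without keeping the ping machinery around solely for the timeout=0 case.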
[jira] [Updated] (HBASE-10500) Some tools OOM when BucketCache is enabled
[ https://issues.apache.org/jira/browse/HBASE-10500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-10500: - Component/s: (was: hbck) HFile Description: Running {{hbck --repair}} or {{LoadIncrementalHFiles}} when BucketCache is enabled in offheap mode can cause OOM. This is apparently because {{bin/hbase}} does not include $HBASE_REGIONSERVER_OPTS for these tools. This results in HRegion or HFileReaders initialized with a CacheConfig that doesn't have the necessary Direct Memory. Possible solutions include: - disable blockcache in the config used by hbck when running its repairs - include HBASE_REGIONSERVER_OPTS in the HBaseFSCK startup arguments I'm leaning toward the former because it's possible that hbck is run on a host with a different hardware profile than the RS. was: Running {{hbck --repair}} when BucketCache is enabled in offheap mode can cause OOM. This is apparently because {{bin/hbase}} does not include $HBASE_REGIONSERVER_OPTS for hbck. It instantiates an HRegion instance as part of HDFSIntegrityFixer.handleHoleInRegionChain. That HRegion initializes its CacheConfig, which doesn't have the necessary Direct Memory. Possible solutions include: - disable blockcache in the config used by hbck when running its repairs - include HBASE_REGIONSERVER_OPTS in the HBaseFSCK startup arguments I'm leaning toward the former because it's possible that hbck is run on a host with a different hardware profile than the RS. Summary: Some tools OOM when BucketCache is enabled (was: hbck and OOM when BucketCache is enabled) Some tools OOM when BucketCache is enabled -- Key: HBASE-10500 URL: https://issues.apache.org/jira/browse/HBASE-10500 Project: HBase Issue Type: Bug Components: HFile Affects Versions: 0.98.0, 0.96.0, 0.99.0 Reporter: Nick Dimiduk Assignee: Nick Dimiduk Attachments: HBASE-10500.00.patch
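The first proposed fix, disabling the block cache in the tool's private copy of the configuration, could look roughly like this. The config is modeled here as a plain map rather than a Hadoop Configuration object; the key names are the standard HBase cache settings, but treat the exact wiring as an assumption, not the shape of the actual patch:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: before a tool such as hbck opens regions or HFile readers, override
// the cache settings in its own copy of the config so no BucketCache (and
// hence no direct memory) is ever allocated on the tool's JVM.
public class ToolCacheConfig {
    public static Map<String, String> disableBlockCache(Map<String, String> conf) {
        Map<String, String> copy = new HashMap<>(conf); // leave the caller's config untouched
        copy.put("hfile.block.cache.size", "0");        // no on-heap LRU block cache
        copy.put("hbase.bucketcache.ioengine", "");     // no bucket cache / direct memory
        return copy;
    }
}
```

Keeping the override in the tool's copy, rather than the shared config, matches the observation above that the tool host may have a different hardware profile than the region servers.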
[jira] [Updated] (HBASE-10500) Some tools OOM when BucketCache is enabled
[ https://issues.apache.org/jira/browse/HBASE-10500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-10500: - Attachment: HBASE-10500.01.patch Patch moves conf management into the constructors so that programmatic use is also corrected. Without it, IntegrationTestImportTsv and IntegrationTestBulkLoad fail. Also removes the apparently redundant config from LoadIncrementalHFiles. If you were kind enough to provide a +1 earlier, note that this patch is a little more invasive.
[jira] [Commented] (HBASE-7849) Provide administrative limits around bulkloads of files into a single region
[ https://issues.apache.org/jira/browse/HBASE-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898483#comment-13898483 ] Hadoop QA commented on HBASE-7849: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12628345/hbase-7849.patch against trunk revision . ATTACHMENT ID: 12628345 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:red}-1 hadoop1.0{color}. The patch failed to compile against the hadoop 1.0 profile. Here is snippet of errors: {code}[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile (default-testCompile) on project hbase-server: Compilation failure: Compilation failure: [ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFiles.java:[50,29] cannot find symbol [ERROR] symbol : class GenericTestUtils [ERROR] location: package org.apache.hadoop.test [ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFiles.java:[384,6] cannot find symbol [ERROR] symbol : variable GenericTestUtils -- org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile (default-testCompile) on project hbase-server: Compilation failure at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:213) at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153) at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145) at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84) at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59) -- Caused by: org.apache.maven.plugin.CompilationFailureException: Compilation failure at org.apache.maven.plugin.AbstractCompilerMojo.execute(AbstractCompilerMojo.java:729) at org.apache.maven.plugin.TestCompilerMojo.execute(TestCompilerMojo.java:161) at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:101) at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:209) ... 19 more{code} Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8663//console This message is automatically generated. Provide administrative limits around bulkloads of files into a single region Key: HBASE-7849 URL: https://issues.apache.org/jira/browse/HBASE-7849 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Harsh J Assignee: Jimmy Xiang Attachments: hbase-7849.patch Given the current mechanism, it is possible for users to flood a single region with 1k+ store files via the bulkload API and basically cause the region to become a flying dutchman - never getting assigned successfully again. Ideally, an administrative limit could solve this. If the bulkload RPC call can check whether the region already has X store files, then it can reject the request to add another and throw a failure at the client with an appropriate message. This may be an intrusive change, but it seems necessary in closing the gap between devs and ops in managing HBase clusters. It would especially prevent abuse in the form of unaware devs not pre-splitting tables before bulkloading things in. Currently, this leads to ops pain, as the devs think HBase has gone non-functional and begin complaining.
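The administrative limit proposed in the description reduces to a simple pre-check on the bulkload path: reject the request when the store already holds too many files. A minimal illustrative sketch; the method name and threshold are hypothetical, and the real patch would wire this into the region server's bulkload RPC:

```java
// Sketch of the pre-check: before accepting a bulkload into a region's store,
// ask whether admitting the new files would push the store past the
// administrator-configured limit; if so, the RPC would fail with a clear message.
public class BulkLoadLimit {
    /** True if adding newFiles would push the store past maxStoreFiles. */
    public static boolean wouldExceed(int currentStoreFiles, int newFiles, int maxStoreFiles) {
        return currentStoreFiles + newFiles > maxStoreFiles;
    }
}
```

Failing fast at the RPC boundary gives the client an actionable error (pre-split the table, compact, or raise the limit) instead of silently producing an unassignable region.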
[jira] [Commented] (HBASE-10493) InclusiveStopFilter#filterKeyValue() should perform filtering on row key
[ https://issues.apache.org/jira/browse/HBASE-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898493#comment-13898493 ] stack commented on HBASE-10493: --- Thanks [~ted_yu]
[jira] [Commented] (HBASE-10500) Some tools OOM when BucketCache is enabled
[ https://issues.apache.org/jira/browse/HBASE-10500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898491#comment-13898491 ] stack commented on HBASE-10500: --- Yeah, probably better. I can't think of a case where a tool would need to go offheap. If one does, let's deal with it then. Meantime, get these tools usable again when offheap is enabled.
[jira] [Commented] (HBASE-10500) Some tools OOM when BucketCache is enabled
[ https://issues.apache.org/jira/browse/HBASE-10500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898492#comment-13898492 ] Hadoop QA commented on HBASE-10500: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12628314/HBASE-10500.00.patch against trunk revision . ATTACHMENT ID: 12628314 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:red}-1 core tests{color}. The patch failed these unit tests: {color:red}-1 core zombie tests{color}. 
There are 1 zombie test(s):

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8662//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8662//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8662//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8662//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8662//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8662//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8662//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8662//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8662//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8662//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8662//console

This message is automatically generated.
[jira] [Commented] (HBASE-10485) PrefixFilter#filterKeyValue() should perform filtering on row key
[ https://issues.apache.org/jira/browse/HBASE-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898498#comment-13898498 ]

Lars Hofhansl commented on HBASE-10485:
---

Committed to 0.96 after HBASE-10493 (also made AlwaysNextColFilter private).

PrefixFilter#filterKeyValue() should perform filtering on row key
---

             Key: HBASE-10485
             URL: https://issues.apache.org/jira/browse/HBASE-10485
         Project: HBase
      Issue Type: Bug
        Reporter: Ted Yu
        Assignee: Ted Yu
         Fix For: 0.96.2, 0.98.1, 0.99.0, 0.94.17
     Attachments: 10485-0.94-v2.txt, 10485-0.94.txt, 10485-trunk-v2.txt, 10485-trunk.addendum, 10485-v1.txt

Niels reported an issue under the thread 'Trouble writing custom filter for use in FilterList' where his custom filter, used in a FilterList along with PrefixFilter, produced unexpected results. His test can be found here: https://github.com/nielsbasjes/HBase-filter-problem

This happens because PrefixFilter#filterKeyValue() falls through to FilterBase#filterKeyValue(), which returns ReturnCode.INCLUDE. When FilterList.Operator.MUST_PASS_ONE is specified, FilterList#filterKeyValue() therefore returns ReturnCode.INCLUDE even when the row key prefix doesn't match, while the other filter's filterKeyValue() returns ReturnCode.NEXT_COL.
[jira] [Updated] (HBASE-10485) PrefixFilter#filterKeyValue() should perform filtering on row key
[ https://issues.apache.org/jira/browse/HBASE-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-10485:
---
    Resolution: Fixed
        Status: Resolved (was: Patch Available)

And committed to 0.94 as well. Thanks Ted.
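The MUST_PASS_ONE interaction behind this bug can be illustrated with a small, self-contained simulation. The ReturnCode enum and the lambda "filters" below are simplified stand-ins for the real org.apache.hadoop.hbase.filter classes, not the actual API; they only model the combination logic described in the report.

```java
import java.util.List;
import java.util.function.Function;

public class MustPassOneDemo {
    enum ReturnCode { INCLUDE, NEXT_COL }

    // MUST_PASS_ONE semantics: a cell passes if any one filter includes it.
    static ReturnCode mustPassOne(List<Function<String, ReturnCode>> filters, String rowKey) {
        for (Function<String, ReturnCode> f : filters) {
            if (f.apply(rowKey) == ReturnCode.INCLUDE) return ReturnCode.INCLUDE;
        }
        return ReturnCode.NEXT_COL;
    }

    // Stand-in for a custom filter (like Niels') that rejects this cell.
    static final Function<String, ReturnCode> OTHER = row -> ReturnCode.NEXT_COL;

    // Buggy PrefixFilter: filterKeyValue() inherits FilterBase's unconditional
    // INCLUDE; the prefix check lives elsewhere and never reaches FilterList.
    static ReturnCode buggyResult(String rowKey) {
        Function<String, ReturnCode> buggyPrefix = row -> ReturnCode.INCLUDE;
        return mustPassOne(List.of(buggyPrefix, OTHER), rowKey);
    }

    // Fixed PrefixFilter: filterKeyValue() itself rejects rows outside the prefix.
    static ReturnCode fixedResult(String rowKey) {
        Function<String, ReturnCode> fixedPrefix =
            row -> row.startsWith("abc") ? ReturnCode.INCLUDE : ReturnCode.NEXT_COL;
        return mustPassOne(List.of(fixedPrefix, OTHER), rowKey);
    }

    public static void main(String[] args) {
        // Row "zzz" does not match the "abc" prefix, yet the buggy filter
        // lets it through under MUST_PASS_ONE -- the unexpected result above.
        System.out.println(buggyResult("zzz"));  // INCLUDE
        System.out.println(fixedResult("zzz"));  // NEXT_COL
        System.out.println(fixedResult("abc1")); // INCLUDE
    }
}
```

This is why the fix moves the prefix comparison into filterKeyValue() itself: under MUST_PASS_ONE, any filter that answers INCLUDE unconditionally makes the whole list include the cell, regardless of what the sibling filters return.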
[jira] [Created] (HBASE-10504) Define Replication Interface
stack created HBASE-10504:
---

         Summary: Define Replication Interface
             Key: HBASE-10504
             URL: https://issues.apache.org/jira/browse/HBASE-10504
         Project: HBase
      Issue Type: Task
        Reporter: stack
        Assignee: stack
         Fix For: 0.99.0

HBase has replication. Fellas have been hijacking the replication apis to do all kinds of perverse stuff like indexing hbase content (hbase-indexer https://github.com/NGDATA/hbase-indexer) and our [~toffer] just showed up w/ overrides that replicate via an alternate channel (over a secure thrift channel between dcs over on HBASE-9360). This issue is about surfacing these APIs as public, with guarantees to downstreamers similar to those we have on our public client-facing APIs (so we don't break them for downstreamers).

Any input [~phunt] or [~gabriel.reid] or [~toffer]? Thanks.
[jira] [Commented] (HBASE-10504) Define Replication Interface
[ https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898504#comment-13898504 ]

stack commented on HBASE-10504:
---

Is HBASE-9507 related?
[jira] [Commented] (HBASE-10504) Define Replication Interface
[ https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898507#comment-13898507 ]

Nick Dimiduk commented on HBASE-10504:
---

bq. Is HBASE-9507 related?

I think so, yes, but that's based on code study and on catching changes along the 0.94 line that broke the above tool.
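To make the request concrete: none of the names below exist in HBase at the time of this discussion (defining such an interface is precisely what this issue asks for). This is a purely hypothetical, self-contained sketch of the kind of stable contract a downstreamer like hbase-indexer could implement instead of overriding replication internals:

```java
import java.util.ArrayList;
import java.util.List;

public class ReplicationSinkDemo {
    /** Hypothetical public contract: receive replicated edits, however the peer consumes them. */
    interface ReplicationSink {
        /** Apply a batch of replicated edits; return true once the batch is fully handled. */
        boolean replicate(List<String> edits);
    }

    /** Toy sink that "indexes" edits in memory instead of shipping them to a remote cluster. */
    static class IndexingSink implements ReplicationSink {
        final List<String> indexed = new ArrayList<>();

        @Override
        public boolean replicate(List<String> edits) {
            indexed.addAll(edits);
            return true;
        }
    }

    public static void main(String[] args) {
        IndexingSink sink = new IndexingSink();
        sink.replicate(List.of("row1/cf:q=a", "row2/cf:q=b"));
        System.out.println(sink.indexed.size()); // 2
    }
}
```

The design point is that replication internals would call only the narrow replicate() contract, so indexers, thrift-channel shippers, and the stock cluster-to-cluster path all become interchangeable implementations that survive internal refactorings.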