[jira] [Commented] (HBASE-10169) Batch coprocessor

2013-12-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13848909#comment-13848909
 ] 

Hadoop QA commented on HBASE-10169:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12618869/Batch%20Coprocessor%20Design%20Document.docx
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8175//console

This message is automatically generated.

 Batch coprocessor
 -

 Key: HBASE-10169
 URL: https://issues.apache.org/jira/browse/HBASE-10169
 Project: HBase
  Issue Type: Sub-task
  Components: Coprocessors
Affects Versions: 0.99.0
Reporter: Jingcheng Du
Assignee: Jingcheng Du
 Attachments: Batch Coprocessor Design Document.docx, HBASE-10169.patch


 This is designed to improve the coprocessor invocation in the client side. 
 Currently the coprocessor invocation is to send a call to each region. If 
 there’s one region server, and 100 regions are located in this server, each 
 coprocessor invocation will send 100 calls, each call uses a single thread in 
 the client side. The threads will run out soon when the coprocessor 
 invocations are heavy. 
 In this design, all the calls to the same region server will be grouped into 
 one in a single coprocessor invocation. This call will be spread into each 
 region in the server side, and the results will be merged ahead in the server 
 side before being returned to the client.
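 Illustration only (not taken from the attached design document or patch): a minimal, self-contained sketch of the client-side grouping described above. Region and server names are plain Strings here, and the per-server coprocessor RPC and the server-side merge are only indicated in comments.
{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the grouping idea only; these are not HBase types.
public class GroupCallsByServer {
  /** @param regionToServer map of region name to the server currently hosting it */
  static Map<String, List<String>> groupByServer(Map<String, String> regionToServer) {
    Map<String, List<String>> callsPerServer = new HashMap<String, List<String>>();
    for (Map.Entry<String, String> e : regionToServer.entrySet()) {
      List<String> regions = callsPerServer.get(e.getValue());
      if (regions == null) {
        regions = new ArrayList<String>();
        callsPerServer.put(e.getValue(), regions);
      }
      regions.add(e.getKey());
    }
    // The client then sends one coprocessor call per server, listing the regions
    // it should be spread over; the server merges the per-region results before
    // replying, so the client holds one thread per server instead of one per region.
    return callsPerServer;
  }
}
{code}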



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10137) GeneralBulkAssigner with retain assignment plan can be used in EnableTableHandler to bulk assign the regions

2013-12-16 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849025#comment-13849025
 ] 

ramkrishna.s.vasudevan commented on HBASE-10137:


As this is fixed in Trunk this change should be fine.  
+1 for being only on trunk.
One question is 
bq. In case of any failures retrying assignments and wait in 
GeneralBulkAssigner#waitUntilDone.
Yes, this would happen now.
But if a region server goes down after the disable and before the enable, and we 
go with retainAssignment, how would this work?
META may have the dead RS entry but the ServerManager won't be giving out the old 
ones.  Will this be handled here?

 GeneralBulkAssigner with retain assignment plan can be used in 
 EnableTableHandler to bulk assign the regions
 

 Key: HBASE-10137
 URL: https://issues.apache.org/jira/browse/HBASE-10137
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Affects Versions: 0.96.0, 0.94.14
Reporter: rajeshbabu
Assignee: rajeshbabu
 Fix For: 0.98.0, 0.96.2, 0.99.0

 Attachments: HBASE-10137.patch


 Currently in BulkEnabler we are assigning one region at a time; instead we can 
 use GeneralBulkAssigner to bulk assign multiple regions at a time.
 {code}
   for (HRegionInfo region : regions) {
     if (assignmentManager.getRegionStates().isRegionInTransition(region)) {
       continue;
     }
     final HRegionInfo hri = region;
     pool.execute(Trace.wrap("BulkEnabler.populatePool", new Runnable() {
       public void run() {
         assignmentManager.assign(hri, true);
       }
     }));
   }
 {code}
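 For contrast, a self-contained sketch of the bulk shape being proposed: one task per server covering many regions, plus a wait-until-done, roughly what GeneralBulkAssigner with a retained plan and GeneralBulkAssigner#waitUntilDone provide. The Assigner interface and the String region/server names are stand-ins, not HBase types.
{code}
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BulkAssignSketch {
  interface Assigner { void assign(String server, List<String> regions); }

  /** Submit one bulk task per server and wait for all of them, instead of one task per region. */
  static boolean bulkAssign(Map<String, List<String>> plan, final Assigner assigner,
      long timeoutMs) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, plan.size()));
    try {
      for (final Map.Entry<String, List<String>> e : plan.entrySet()) {
        pool.execute(new Runnable() {
          public void run() {
            // one call covers every region planned for this server
            assigner.assign(e.getKey(), e.getValue());
          }
        });
      }
    } finally {
      pool.shutdown();
    }
    // Block until every per-server task has finished, or the timeout expires.
    return pool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
  }
}
{code}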



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10137) GeneralBulkAssigner with retain assignment plan can be used in EnableTableHandler to bulk assign the regions

2013-12-16 Thread rajeshbabu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849040#comment-13849040
 ] 

rajeshbabu commented on HBASE-10137:


@Ram,
bq. But after disabled had happened before enabling, if a REgion server goes 
down and going with retainAssignment? how would this work? META may have the 
dead RS entry but the servermanager won't be giving the old ones. Will this be 
handled here?
If a region server is dead and no new RS started in the same host, then 
randomly one server from online servers(list from server manager) will be 
selected for the region. Its fine only.
{code}
    if (localServers.isEmpty()) {
      // No servers on the new cluster match up with this hostname,
      // assign randomly.
      ServerName randomServer = servers.get(RANDOM.nextInt(servers.size()));
      assignments.get(randomServer).add(region);
      numRandomAssignments++;
    }
{code}

 GeneralBulkAssigner with retain assignment plan can be used in 
 EnableTableHandler to bulk assign the regions
 

 Key: HBASE-10137
 URL: https://issues.apache.org/jira/browse/HBASE-10137
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Affects Versions: 0.96.0, 0.94.14
Reporter: rajeshbabu
Assignee: rajeshbabu
 Fix For: 0.98.0, 0.96.2, 0.99.0

 Attachments: HBASE-10137.patch


 Currently in BulkEnabler we are assigning one region at a time; instead we can 
 use GeneralBulkAssigner to bulk assign multiple regions at a time.
 {code}
   for (HRegionInfo region : regions) {
     if (assignmentManager.getRegionStates().isRegionInTransition(region)) {
       continue;
     }
     final HRegionInfo hri = region;
     pool.execute(Trace.wrap("BulkEnabler.populatePool", new Runnable() {
       public void run() {
         assignmentManager.assign(hri, true);
       }
     }));
   }
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10137) GeneralBulkAssigner with retain assignment plan can be used in EnableTableHandler to bulk assign the regions

2013-12-16 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849045#comment-13849045
 ] 

ramkrishna.s.vasudevan commented on HBASE-10137:


+1 on commit then.  
[~jxiang]
Want to take a look before commit?

 GeneralBulkAssigner with retain assignment plan can be used in 
 EnableTableHandler to bulk assign the regions
 

 Key: HBASE-10137
 URL: https://issues.apache.org/jira/browse/HBASE-10137
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Affects Versions: 0.96.0, 0.94.14
Reporter: rajeshbabu
Assignee: rajeshbabu
 Fix For: 0.98.0, 0.96.2, 0.99.0

 Attachments: HBASE-10137.patch


 Currently in BulkEnabler we are assigning one region at a time; instead we can 
 use GeneralBulkAssigner to bulk assign multiple regions at a time.
 {code}
   for (HRegionInfo region : regions) {
     if (assignmentManager.getRegionStates().isRegionInTransition(region)) {
       continue;
     }
     final HRegionInfo hri = region;
     pool.execute(Trace.wrap("BulkEnabler.populatePool", new Runnable() {
       public void run() {
         assignmentManager.assign(hri, true);
       }
     }));
   }
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


Re: HBase upgrade error

2013-12-16 Thread Ted Yu
The exception happened in protobuf.

Can you upgrade Hadoop to release 2.2?

Cheers

On Dec 15, 2013, at 11:58 PM, hzwangxx whwangxingx...@163.com wrote:

 Hi,
 I want to upgrade HBase from 0.94 to 0.96. Following 
 http://hbase.apache.org/book/upgrade0.96.html, I shut down the 0.94 cluster and ran 
 bin/hbase upgrade -check, and encountered the following problem:
 
 13/12/16 15:43:08 INFO util.HFileV1Detector: Target dir is: 
 hdfs://node01:9000/hbase
 13/12/16 15:43:08 ERROR util.HFileV1Detector: 
 java.lang.UnsupportedOperationException: This is supposed to be overridden by 
 subclasses.
 13/12/16 15:43:08 WARN migration.UpgradeTo96: There are some HFileV1, or 
 corrupt files (files with incorrect major version).
 
 and running bin/hbase upgrade -execute prints the following:
 13/12/16 15:45:56 INFO zookeeper.ClientCnxn: EventThread shut down
 13/12/16 15:45:56 INFO zookeeper.ZooKeeper: Session: 0x142f975e49e0003 closed
 13/12/16 15:45:56 INFO migration.UpgradeTo96: Starting Namespace upgrade
 13/12/16 15:45:57 WARN conf.Configuration: fs.default.name is deprecated. 
 Instead, use fs.defaultFS
 Exception in thread "main" java.lang.UnsupportedOperationException: This is 
 supposed to be overridden by subclasses.
at 
 com.google.protobuf.GeneratedMessage.getUnknownFields(GeneratedMessage.java:180)
at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetListingRequestProto.getSerializedSize(ClientNamenodeProtocolProtos.java:18005)
at 
 com.google.protobuf.AbstractMessageLite.toByteString(AbstractMessageLite.java:49)
at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.constructRpcRequest(ProtobufRpcEngine.java:149)
at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:193)
at sun.proxy.$Proxy10.getListing(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at sun.proxy.$Proxy10.getListing(Unknown Source)
at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:440)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1526)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1509)
at 
 org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:407)
at org.apache.hadoop.hbase.util.FSUtils.getVersion(FSUtils.java:473)
at 
 org.apache.hadoop.hbase.migration.NamespaceUpgrade.verifyNSUpgrade(NamespaceUpgrade.java:488)
at 
 org.apache.hadoop.hbase.migration.NamespaceUpgrade.upgradeTableDirs(NamespaceUpgrade.java:127)
at 
 org.apache.hadoop.hbase.migration.NamespaceUpgrade.run(NamespaceUpgrade.java:502)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at 
 org.apache.hadoop.hbase.migration.UpgradeTo96.executeTool(UpgradeTo96.java:222)
at 
 org.apache.hadoop.hbase.migration.UpgradeTo96.executeUpgrade(UpgradeTo96.java:212)
at 
 org.apache.hadoop.hbase.migration.UpgradeTo96.run(UpgradeTo96.java:134)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at 
 org.apache.hadoop.hbase.migration.UpgradeTo96.main(UpgradeTo96.java:258)
 
 my hadoop version is: hadoop-2.0.0-cdh4.2.1 and hbase version is 
 0.94.0-cdh4.2.1.
 
 Waiting for your help. Thanks~
 Best Wishes!
 hzwangxx


[jira] [Commented] (HBASE-10087) Store should be locked during a memstore snapshot

2013-12-16 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849215#comment-13849215
 ] 

Nicolas Liochon commented on HBASE-10087:
-

Sorry for the delay.

Theoretically, we have an issue:
{code}
@Override
public void prepare() {
  memstore.snapshot();
  this.snapshot = memstore.getSnapshot();
  this.snapshotTimeRangeTracker = memstore.getSnapshotTimeRangeTracker(); 
}
{code}
w/o a lock, we could have an inconsistency between the snapshot and the 
snapshotTimeRangeTracker.
We could likewise have an inconsistency inside memstore.snapshot().

But it seems it cannot happen, because: 
Store#snapshot is called only in the tests.
Store#prepare is called with the write lock on the HRegion.
The functions that modify the store also take the read lock on the HRegion, so 
we can't be inconsistent in practice.

That said, adding the lock in prepare() would be more consistent imho. I would 
propose to add it on trunk.
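A minimal sketch of what taking the lock in prepare() could look like, with stand-in types rather than the real HStore/MemStore classes; the lock field is an assumption of this sketch, not taken from the patch.
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch only: the point is taking the store's write lock so snapshot() and the
// two getters are observed as one consistent unit.
class PrepareWithLockSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private Object snapshot;
  private Object snapshotTimeRangeTracker;
  private final MemStoreLike memstore = new MemStoreLike();

  static class MemStoreLike {
    void snapshot() { /* move the current kvset aside */ }
    Object getSnapshot() { return new Object(); }
    Object getSnapshotTimeRangeTracker() { return new Object(); }
  }

  public void prepare() {
    lock.writeLock().lock();   // guard the three calls as one unit
    try {
      memstore.snapshot();
      this.snapshot = memstore.getSnapshot();
      this.snapshotTimeRangeTracker = memstore.getSnapshotTimeRangeTracker();
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}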




 Store should be locked during a memstore snapshot
 -

 Key: HBASE-10087
 URL: https://issues.apache.org/jira/browse/HBASE-10087
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.0, 0.96.1, 0.94.14
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.98.0, 0.96.1, 0.94.15

 Attachments: 10079.v1.patch


 regression from HBASE-9963, found while looking at HBASE-10079.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Resolved] (HBASE-10172) hbase upgrade -check error

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-10172.
---

Resolution: Won't Fix

What [~himan...@cloudera.com] says.  Check your protobuf versions in your 
classpath.  It looks like the first pb lib encountered is not 2.5.x as we 
expect (perhaps you have an old pb lib still in your cp?).
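One quick way to see which protobuf jar wins on the classpath (this assumes protobuf-java is on the compile classpath; the version line may print null if the jar's manifest does not record it):
{code}
// Prints something like .../protobuf-java-2.5.0.jar if the expected version is first.
public class WhichProtobuf {
  public static void main(String[] args) {
    Class<?> c = com.google.protobuf.Message.class;
    System.out.println(c.getProtectionDomain().getCodeSource().getLocation());
    System.out.println(c.getPackage().getImplementationVersion());
  }
}
{code}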

 hbase upgrade -check error
 --

 Key: HBASE-10172
 URL: https://issues.apache.org/jira/browse/HBASE-10172
 Project: HBase
  Issue Type: Bug
Reporter: wang xiyi

 Hi, I want to upgrade hbase from 0.94 to 0.96. According to 
 http://hbase.apache.org/book/upgrade0.96.html, I encountered a problem as 
 follows:
 {code}
 Exception in thread "main" java.lang.UnsupportedOperationException: This is 
 supposed to be overridden by subclasses.
 at 
 com.google.protobuf.GeneratedMessage.getUnknownFields(GeneratedMessage.java:180)
 at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetListingRequestProto.getSerializedSize(ClientNamenodeProtocolProtos.java:18005)
 .
 {code}
 hadoop version: hadoop-2.0.0-cdh4.2.1 
 hbase version: hbase-0.94.0-cdh4.2.1
 Thanks~



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-10174) Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94

2013-12-16 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-10174:
---

Attachment: 9667-0.94.patch

 Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94
 -

 Key: HBASE-10174
 URL: https://issues.apache.org/jira/browse/HBASE-10174
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: 9667-0.94.patch


 On the user mailing list, under the thread 'Guava 15', Kristoffer Sjögren 
 reported a NoClassDefFoundError when he used Guava 15.
 The issue has been fixed in 0.96+ by HBASE-9667.
 This JIRA ports the fix to the 0.94 branch.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (HBASE-10174) Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94

2013-12-16 Thread Ted Yu (JIRA)
Ted Yu created HBASE-10174:
--

 Summary: Back port HBASE-9667 'NullOutputStream removed from Guava 
15' to 0.94
 Key: HBASE-10174
 URL: https://issues.apache.org/jira/browse/HBASE-10174
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: 9667-0.94.patch

On the user mailing list, under the thread 'Guava 15', Kristoffer Sjögren reported 
a NoClassDefFoundError when he used Guava 15.

The issue has been fixed in 0.96+ by HBASE-9667.

This JIRA ports the fix to the 0.94 branch.
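For reference, the class Guava 15 removed is trivial to stand in for with a local discard stream; this is only an illustration of the idea, not necessarily what 9667-0.94.patch does:
{code}
import java.io.IOException;
import java.io.OutputStream;

/** An OutputStream that discards everything, like the NullOutputStream Guava 15 dropped. */
public final class DiscardingOutputStream extends OutputStream {
  @Override public void write(int b) throws IOException { /* drop the byte */ }
  @Override public void write(byte[] b, int off, int len) throws IOException { /* drop the bytes */ }
}
{code}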



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10137) GeneralBulkAssigner with retain assignment plan can be used in EnableTableHandler to bulk assign the regions

2013-12-16 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849334#comment-13849334
 ] 

Jimmy Xiang commented on HBASE-10137:
-

I was thinking about the same too, just haven't gotten a chance to do it. Good 
stuff. +1. Should we remove the BulkEnabler class if it is not used any more?

 GeneralBulkAssigner with retain assignment plan can be used in 
 EnableTableHandler to bulk assign the regions
 

 Key: HBASE-10137
 URL: https://issues.apache.org/jira/browse/HBASE-10137
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Affects Versions: 0.96.0, 0.94.14
Reporter: rajeshbabu
Assignee: rajeshbabu
 Fix For: 0.98.0, 0.96.2, 0.99.0

 Attachments: HBASE-10137.patch


 Currently in BulkEnabler we are assigning one region at a time; instead we can 
 use GeneralBulkAssigner to bulk assign multiple regions at a time.
 {code}
   for (HRegionInfo region : regions) {
     if (assignmentManager.getRegionStates().isRegionInTransition(region)) {
       continue;
     }
     final HRegionInfo hri = region;
     pool.execute(Trace.wrap("BulkEnabler.populatePool", new Runnable() {
       public void run() {
         assignmentManager.assign(hri, true);
       }
     }));
   }
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10163) Example Thrift DemoClient is flaky

2013-12-16 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849369#comment-13849369
 ] 

Sergey Shelukhin commented on HBASE-10163:
--

Maybe add explicit timestamps?
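A small illustration of the explicit-timestamp idea, shown with the plain Java client rather than the Thrift DemoClient itself (the row/column names are made up): pin the delete and the later put to distinct timestamps so the put cannot be eclipsed. Needs hbase-client (0.94/0.96 era API) on the classpath.
{code}
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class ExplicitTimestamps {
  public static void main(String[] args) {
    byte[] row = Bytes.toBytes("row6");
    byte[] fam = Bytes.toBytes("entry");
    byte[] qual = Bytes.toBytes("num");

    long t = System.currentTimeMillis();
    Delete d = new Delete(row);
    d.deleteColumns(fam, qual, t);                // delete versions up to timestamp t

    Put p = new Put(row);
    p.add(fam, qual, t + 1, Bytes.toBytes("6"));  // put strictly after the delete

    System.out.println(d + " then " + p);         // with a real HTable: table.delete(d); table.put(p);
  }
}
{code}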

 Example Thrift DemoClient is flaky
 --

 Key: HBASE-10163
 URL: https://issues.apache.org/jira/browse/HBASE-10163
 Project: HBase
  Issue Type: Bug
  Components: Thrift
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Priority: Trivial
 Fix For: 0.98.0, 0.96.2, 0.99.0

 Attachments: hbase-10163_v1.patch


 The Thrift DemoClient under hbase-examples sometimes fails because an earlier 
 delete eclipses later puts with the same timestamp.
 {code}
 row: 6, cols: unused: = DELETE_ME;
 row: 6, cols: entry:num = -1;
 row: 6, cols: entry:num = 6; entry:sqr = 36;
 row: 6, cols: entry:num = 6; entry:sqr = 36;
 row: 6, values: 6; -1;
 row: 5, cols: unused: = DELETE_ME;
 row: 5, values:
 FATAL: wrong # of versions
 {code}
 Similar to the sleep we already have, we can add another small sleep between 
 the delete and the subsequent put.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10138) incorrect or confusing test value is used in block caches

2013-12-16 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849384#comment-13849384
 ] 

Sergey Shelukhin commented on HBASE-10138:
--

[~lhofhansl] [~ndimiduk] there's no clear component owner for this, you guys 
want to review?

 incorrect or confusing test value is used in block caches
 -

 Key: HBASE-10138
 URL: https://issues.apache.org/jira/browse/HBASE-10138
 Project: HBase
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-10138.patch


 DEFAULT_BLOCKSIZE_SMALL is described as:
 {code}
   // Make default block size for StoreFiles 8k while testing.  TODO: FIX!
   // Need to make it 8k for testing.
   public static final int DEFAULT_BLOCKSIZE_SMALL = 8 * 1024;
 {code}
 This value is used on the production path in CacheConfig through HStore/HRegion, 
 and is passed to various cache objects.
 We should change it to the actual block size, or, if this is somehow by design, 
 at least clarify it and remove the comment.
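 A tiny illustration of the "actual block size" direction: the block size configured on the column family is already available via HColumnDescriptor#getBlocksize() (64 KB by default), so cache-related sizing could use that instead of the 8 KB test constant. Sketch only, not the attached patch.
{code}
import org.apache.hadoop.hbase.HColumnDescriptor;

public class FamilyBlockSize {
  public static void main(String[] args) {
    HColumnDescriptor family = new HColumnDescriptor("fa1");
    // Prints 65536 unless BLOCKSIZE was set on the family.
    System.out.println("configured block size = " + family.getBlocksize());
  }
}
{code}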



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10137) GeneralBulkAssigner with retain assignment plan can be used in EnableTableHandler to bulk assign the regions

2013-12-16 Thread rajeshbabu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849386#comment-13849386
 ] 

rajeshbabu commented on HBASE-10137:


Thanks [~jxiang] for the review.
With the patch it won't be used any more. I will remove it and upload a new patch.


 GeneralBulkAssigner with retain assignment plan can be used in 
 EnableTableHandler to bulk assign the regions
 

 Key: HBASE-10137
 URL: https://issues.apache.org/jira/browse/HBASE-10137
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Affects Versions: 0.96.0, 0.94.14
Reporter: rajeshbabu
Assignee: rajeshbabu
 Fix For: 0.98.0, 0.96.2, 0.99.0

 Attachments: HBASE-10137.patch


 Currently in BulkEnabler we are assigning one region at a time; instead we can 
 use GeneralBulkAssigner to bulk assign multiple regions at a time.
 {code}
   for (HRegionInfo region : regions) {
     if (assignmentManager.getRegionStates().isRegionInTransition(region)) {
       continue;
     }
     final HRegionInfo hri = region;
     pool.execute(Trace.wrap("BulkEnabler.populatePool", new Runnable() {
       public void run() {
         assignmentManager.assign(hri, true);
       }
     }));
   }
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10136) Alter table conflicts with concurrent snapshot attempt on that table

2013-12-16 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849395#comment-13849395
 ] 

Sergey Shelukhin commented on HBASE-10136:
--

Agree with Jon, it looks like an implementation detail to me.

 Alter table conflicts with concurrent snapshot attempt on that table
 

 Key: HBASE-10136
 URL: https://issues.apache.org/jira/browse/HBASE-10136
 Project: HBase
  Issue Type: Bug
  Components: snapshots
Affects Versions: 0.96.0, 0.98.1, 0.99.0
Reporter: Aleksandr Shulman
Assignee: Matteo Bertozzi
  Labels: online_schema_change

 Expected behavior:
 A user can issue a request for a snapshot of a table while that table is 
 undergoing an online schema change and expect that snapshot request to 
 complete correctly. Also, the same is true if a user issues an online schema 
 change request while a snapshot attempt is ongoing.
 Observed behavior:
 Snapshot attempts time out when there is an ongoing online schema change 
 because the region is closed and opened during the snapshot. 
 As a side-note, I would expect that the attempt should fail quickly as 
 opposed to timing out. 
 Further, what I have seen is that subsequent attempts to snapshot the table 
 fail because of some state/cleanup issues. This is also concerning.
 Immediate error:
 {code}type=FLUSH }' is still in progress!
 2013-12-11 15:58:32,883 DEBUG [Thread-385] client.HBaseAdmin(2696): (#11) 
 Sleeping: 1ms while waiting for snapshot completion.
 2013-12-11 15:58:42,884 DEBUG [Thread-385] client.HBaseAdmin(2704): Getting 
 current status of snapshot from master...
 2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] 
 master.HMaster(2891): Checking to see if snapshot from request:{ ss=snapshot0 
 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH } is done
 2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] 
 snapshot.SnapshotManager(374): Snapshoting '{ ss=snapshot0 
 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH }' is still in 
 progress!
 Snapshot failure occurred
 org.apache.hadoop.hbase.snapshot.SnapshotCreationException: Snapshot 
 'snapshot0' wasn't completed in expectedTime:6 ms
   at 
 org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2713)
   at 
 org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2638)
   at 
 org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2602)
   at 
 org.apache.hadoop.hbase.client.TestAdmin$BackgroundSnapshotThread.run(TestAdmin.java:1974){code}
 Likely root cause of error:
 {code}Exception in SnapshotSubprocedurePool
 java.util.concurrent.ExecutionException: 
 org.apache.hadoop.hbase.NotServingRegionException: 
 changeSchemaDuringSnapshot1386806258640,,1386806258720.ea776db51749e39c956d771a7d17a0f3.
  is closing
   at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
   at java.util.concurrent.FutureTask.get(FutureTask.java:83)
   at 
 org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:314)
   at 
 org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:118)
   at 
 org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:137)
   at 
 org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:181)
   at 
 org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:1)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 Caused by: org.apache.hadoop.hbase.NotServingRegionException: 
 changeSchemaDuringSnapshot1386806258640,,1386806258720.ea776db51749e39c956d771a7d17a0f3.
  is closing
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5327)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5289)
   at 
 org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:79)
   at 
 org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:1)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at 

[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-12-16 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849396#comment-13849396
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

That's an interesting one. Given that snapshots by default have no guarantees 
wrt consistent writes between regions (or do they?), it seems like a snapshot 
should get the latest schema in case of a concurrent alter. Is there any 
consideration (other than the, arguably, implementation issue of not recovering 
from the close-open) that would prevent that? For consistent snapshots, 
presumably the schema can be snapshotted first; I am assuming they don't stop 
the world and just take a seqId/mvcc/ts or something, so the newer values with 
the new schema will just not exist.

 Generic framework for Master-coordinated tasks
 --

 Key: HBASE-5487
 URL: https://issues.apache.org/jira/browse/HBASE-5487
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver, Zookeeper
Affects Versions: 0.94.0
Reporter: Mubarak Seyed
Assignee: Sergey Shelukhin
Priority: Critical
 Attachments: Entity management in Master - part 1.pdf, Entity 
 management in Master - part 1.pdf, Is the FATE of Assignment Manager 
 FATE.pdf, Region management in Master.pdf, Region management in Master5.docx, 
 hbckMasterV2-long.pdf, hbckMasterV2b-long.pdf


 Need a framework to execute master-coordinated tasks in a fault-tolerant 
 manner. 
 Master-coordinated tasks such as online-schema change and delete-range 
 (deleting region(s) based on start/end key) can make use of this framework.
 The advantages of framework are
 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for 
 master-coordinated tasks
 2. Ability to abstract the common functions across Master - ZK and RS - ZK
 3. Easy to plug in new master-coordinated tasks without adding code to core 
 components



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-12-16 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849403#comment-13849403
 ] 

Jonathan Hsieh commented on HBASE-5487:
---

The problem isn't that you would get snapshots with inconsistent schemas if the 
two operations were issued concurrently.  It is that the open is async and 
outside the table write lock, which means the snapshot would fail because the 
region may not have been opened yet.

This is a particular case where we would want the open routines to act 
synchronously with table alters and split daughter region opens (both open 
before the table lock is released and the snapshot can happen).



 Generic framework for Master-coordinated tasks
 --

 Key: HBASE-5487
 URL: https://issues.apache.org/jira/browse/HBASE-5487
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver, Zookeeper
Affects Versions: 0.94.0
Reporter: Mubarak Seyed
Assignee: Sergey Shelukhin
Priority: Critical
 Attachments: Entity management in Master - part 1.pdf, Entity 
 management in Master - part 1.pdf, Is the FATE of Assignment Manager 
 FATE.pdf, Region management in Master.pdf, Region management in Master5.docx, 
 hbckMasterV2-long.pdf, hbckMasterV2b-long.pdf


 Need a framework to execute master-coordinated tasks in a fault-tolerant 
 manner. 
 Master-coordinated tasks such as online-schema change and delete-range 
 (deleting region(s) based on start/end key) can make use of this framework.
 The advantages of framework are
 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for 
 master-coordinated tasks
 2. Ability to abstract the common functions across Master - ZK and RS - ZK
 3. Easy to plug in new master-coordinated tasks without adding code to core 
 components



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10137) GeneralBulkAssigner with retain assignment plan can be used in EnableTableHandler to bulk assign the regions

2013-12-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849406#comment-13849406
 ] 

stack commented on HBASE-10137:
---

Should we rename GeneralBulkEnable as BulkEnabler rather than just remove 
BulkEnabler (if a 'GeneralBulkEnabler', folks will go looking for a 
SpecialBulkEnablers, or SergeantBulkEnabler...)?

 GeneralBulkAssigner with retain assignment plan can be used in 
 EnableTableHandler to bulk assign the regions
 

 Key: HBASE-10137
 URL: https://issues.apache.org/jira/browse/HBASE-10137
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Affects Versions: 0.96.0, 0.94.14
Reporter: rajeshbabu
Assignee: rajeshbabu
 Fix For: 0.98.0, 0.96.2, 0.99.0

 Attachments: HBASE-10137.patch


 Currently in BulkEnabler we are assigning one region at a time; instead we can 
 use GeneralBulkAssigner to bulk assign multiple regions at a time.
 {code}
   for (HRegionInfo region : regions) {
     if (assignmentManager.getRegionStates().isRegionInTransition(region)) {
       continue;
     }
     final HRegionInfo hri = region;
     pool.execute(Trace.wrap("BulkEnabler.populatePool", new Runnable() {
       public void run() {
         assignmentManager.assign(hri, true);
       }
     }));
   }
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10136) Alter table conflicts with concurrent snapshot attempt on that table

2013-12-16 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849407#comment-13849407
 ] 

Matteo Bertozzi commented on HBASE-10136:
-

As I pointed out in my comment above, we can fix this case by case, but the main 
problems are still there.
If you implement a new handler you have to keep in mind the rules to make 
everything work.
In this case for example, the end of the handler is not the end of the 
operation, so the lock is released early.
Also in this case, the master call uses handler.process() instead of the 
executor pool to make the client operation synchronous.
In the delete table case the last operation must be the removal of the table 
descriptor, otherwise the client call will not be synchronous.
...and so on with other implementation details.

I've pointed at the new master design in order to discuss this set of rules and 
make them part of the design.
We must be able to know when an operation ends, and not just guess based on the 
result state of an operation. And this is a must for both the server side (e.g. 
releasing the lock) and the client side (e.g. sync operations).

 Alter table conflicts with concurrent snapshot attempt on that table
 

 Key: HBASE-10136
 URL: https://issues.apache.org/jira/browse/HBASE-10136
 Project: HBase
  Issue Type: Bug
  Components: snapshots
Affects Versions: 0.96.0, 0.98.1, 0.99.0
Reporter: Aleksandr Shulman
Assignee: Matteo Bertozzi
  Labels: online_schema_change

 Expected behavior:
 A user can issue a request for a snapshot of a table while that table is 
 undergoing an online schema change and expect that snapshot request to 
 complete correctly. Also, the same is true if a user issues an online schema 
 change request while a snapshot attempt is ongoing.
 Observed behavior:
 Snapshot attempts time out when there is an ongoing online schema change 
 because the region is closed and opened during the snapshot. 
 As a side-note, I would expect that the attempt should fail quickly as 
 opposed to timing out. 
 Further, what I have seen is that subsequent attempts to snapshot the table 
 fail because of some state/cleanup issues. This is also concerning.
 Immediate error:
 {code}type=FLUSH }' is still in progress!
 2013-12-11 15:58:32,883 DEBUG [Thread-385] client.HBaseAdmin(2696): (#11) 
 Sleeping: 1ms while waiting for snapshot completion.
 2013-12-11 15:58:42,884 DEBUG [Thread-385] client.HBaseAdmin(2704): Getting 
 current status of snapshot from master...
 2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] 
 master.HMaster(2891): Checking to see if snapshot from request:{ ss=snapshot0 
 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH } is done
 2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] 
 snapshot.SnapshotManager(374): Snapshoting '{ ss=snapshot0 
 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH }' is still in 
 progress!
 Snapshot failure occurred
 org.apache.hadoop.hbase.snapshot.SnapshotCreationException: Snapshot 
 'snapshot0' wasn't completed in expectedTime:6 ms
   at 
 org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2713)
   at 
 org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2638)
   at 
 org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2602)
   at 
 org.apache.hadoop.hbase.client.TestAdmin$BackgroundSnapshotThread.run(TestAdmin.java:1974){code}
 Likely root cause of error:
 {code}Exception in SnapshotSubprocedurePool
 java.util.concurrent.ExecutionException: 
 org.apache.hadoop.hbase.NotServingRegionException: 
 changeSchemaDuringSnapshot1386806258640,,1386806258720.ea776db51749e39c956d771a7d17a0f3.
  is closing
   at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
   at java.util.concurrent.FutureTask.get(FutureTask.java:83)
   at 
 org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:314)
   at 
 org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:118)
   at 
 org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:137)
   at 
 org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:181)
   at 
 org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:1)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 

[jira] [Commented] (HBASE-10136) Alter table conflicts with concurrent snapshot attempt on that table

2013-12-16 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849412#comment-13849412
 ] 

Jonathan Hsieh commented on HBASE-10136:


I agree with needing rules -- the invariant I think we need here is that if an 
operation starts with a region in the open state and is supposed to complete 
with the region in the open state, it must be open (or a suitable replacement 
must be open).

Currently I only see open/close/open conflicts (splits, alters, likely merges).  
Can we get away with just fixing those three operations so that their respective 
table locks are held until the opens complete?  Is the wait until handler 
completion needed for any other operations?





 Alter table conflicts with concurrent snapshot attempt on that table
 

 Key: HBASE-10136
 URL: https://issues.apache.org/jira/browse/HBASE-10136
 Project: HBase
  Issue Type: Bug
  Components: snapshots
Affects Versions: 0.96.0, 0.98.1, 0.99.0
Reporter: Aleksandr Shulman
Assignee: Matteo Bertozzi
  Labels: online_schema_change

 Expected behavior:
 A user can issue a request for a snapshot of a table while that table is 
 undergoing an online schema change and expect that snapshot request to 
 complete correctly. Also, the same is true if a user issues an online schema 
 change request while a snapshot attempt is ongoing.
 Observed behavior:
 Snapshot attempts time out when there is an ongoing online schema change 
 because the region is closed and opened during the snapshot. 
 As a side-note, I would expect that the attempt should fail quickly as 
 opposed to timing out. 
 Further, what I have seen is that subsequent attempts to snapshot the table 
 fail because of some state/cleanup issues. This is also concerning.
 Immediate error:
 {code}type=FLUSH }' is still in progress!
 2013-12-11 15:58:32,883 DEBUG [Thread-385] client.HBaseAdmin(2696): (#11) 
 Sleeping: 1ms while waiting for snapshot completion.
 2013-12-11 15:58:42,884 DEBUG [Thread-385] client.HBaseAdmin(2704): Getting 
 current status of snapshot from master...
 2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] 
 master.HMaster(2891): Checking to see if snapshot from request:{ ss=snapshot0 
 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH } is done
 2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] 
 snapshot.SnapshotManager(374): Snapshoting '{ ss=snapshot0 
 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH }' is still in 
 progress!
 Snapshot failure occurred
 org.apache.hadoop.hbase.snapshot.SnapshotCreationException: Snapshot 
 'snapshot0' wasn't completed in expectedTime:6 ms
   at 
 org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2713)
   at 
 org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2638)
   at 
 org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2602)
   at 
 org.apache.hadoop.hbase.client.TestAdmin$BackgroundSnapshotThread.run(TestAdmin.java:1974){code}
 Likely root cause of error:
 {code}Exception in SnapshotSubprocedurePool
 java.util.concurrent.ExecutionException: 
 org.apache.hadoop.hbase.NotServingRegionException: 
 changeSchemaDuringSnapshot1386806258640,,1386806258720.ea776db51749e39c956d771a7d17a0f3.
  is closing
   at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
   at java.util.concurrent.FutureTask.get(FutureTask.java:83)
   at 
 org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:314)
   at 
 org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:118)
   at 
 org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:137)
   at 
 org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:181)
   at 
 org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:1)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 Caused by: org.apache.hadoop.hbase.NotServingRegionException: 
 changeSchemaDuringSnapshot1386806258640,,1386806258720.ea776db51749e39c956d771a7d17a0f3.
  is closing
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5327)
   at 
 

[jira] [Commented] (HBASE-9927) ReplicationLogCleaner#stop() calls HConnectionManager#deleteConnection() unnecessarily

2013-12-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849413#comment-13849413
 ] 

Hudson commented on HBASE-9927:
---

SUCCESS: Integrated in HBase-0.94 #1228 (See 
[https://builds.apache.org/job/HBase-0.94/1228/])
HBASE-9927 ReplicationLogCleaner#stop() calls 
HConnectionManager#deleteConnection() unnecessarily (tedyu: rev 1551273)
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/master/ReplicationLogCleaner.java


 ReplicationLogCleaner#stop() calls HConnectionManager#deleteConnection() 
 unnecessarily
 --

 Key: HBASE-9927
 URL: https://issues.apache.org/jira/browse/HBASE-9927
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Fix For: 0.94.15

 Attachments: 9927.txt


 When inspecting the log, I found the following:
 {code}
 2013-11-08 18:23:48,472 ERROR [M:0;kiyo:42380.oldLogCleaner] 
 client.HConnectionManager(468): Connection not found in the list, can't 
 delete it (connection key=HConnectionKey{properties={hbase.rpc.timeout=6, 
 hbase.zookeeper.property.clientPort=59832, hbase.client.pause=100, 
 zookeeper.znode.parent=/hbase, hbase.client.retries.number=350, 
 hbase.zookeeper.quorum=localhost}, username='zy'}). May be the key was 
 modified?
 java.lang.Exception
 at 
 org.apache.hadoop.hbase.client.HConnectionManager.deleteConnection(HConnectionManager.java:468)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager.deleteConnection(HConnectionManager.java:404)
 at 
 org.apache.hadoop.hbase.replication.master.ReplicationLogCleaner.stop(ReplicationLogCleaner.java:141)
 at 
 org.apache.hadoop.hbase.master.cleaner.CleanerChore.cleanup(CleanerChore.java:276)
 {code}
 The call to HConnectionManager#deleteConnection() is not needed.
 Here is related code which has a comment for this effect:
 {code}
 // Not sure why we're deleting a connection that we never acquired or used
 HConnectionManager.deleteConnection(this.getConf());
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-8701) distributedLogReplay need to apply wal edits in the receiving order of those edits

2013-12-16 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-8701:
-

Attachment: hbase-8701-tag.patch

Ported the old patch to use tags. 

1) During KeyValue comparison, after the rowkey, colfam/qual, timestamp and type 
are compared (same key), the log sequence number will be used if there is one; 
otherwise mvcc will be used. So the extra cost of fetching the log sequence 
number from the tag is trivial, because the comparison normally terminates much 
earlier, before reaching the log sequence number or mvcc. 

2) Only edits being replayed during recovery will be tagged with their original 
log sequence number, therefore the extra tag storage overhead is insignificant, 
considering that recovery doesn't happen often and tagging is only applied to 
unflushed edits.
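A stand-in sketch of the comparison order described in 1): only after row, column, timestamp and type compare equal does the original log sequence number (carried in a tag during replay) or, failing that, mvcc break the tie. CellLike is a made-up type; this is not the real KVComparator.
{code}
import java.util.Comparator;

public class ReplayOrderSketch {
  static class CellLike {
    String row, famQual; long ts; byte type;
    long seqIdFromTag;   // 0 = no tag (not a replayed edit)
    long mvcc;
    CellLike(String row, String famQual, long ts, byte type, long seqIdFromTag, long mvcc) {
      this.row = row; this.famQual = famQual; this.ts = ts; this.type = type;
      this.seqIdFromTag = seqIdFromTag; this.mvcc = mvcc;
    }
  }

  static final Comparator<CellLike> REPLAY_ORDER = new Comparator<CellLike>() {
    public int compare(CellLike a, CellLike b) {
      int c = a.row.compareTo(b.row);
      if (c != 0) return c;
      c = a.famQual.compareTo(b.famQual);
      if (c != 0) return c;
      c = Long.compare(b.ts, a.ts);          // newer timestamps sort first
      if (c != 0) return c;
      c = Byte.compare(a.type, b.type);
      if (c != 0) return c;
      // Same key: prefer the original log sequence number when both carry one.
      if (a.seqIdFromTag != 0 && b.seqIdFromTag != 0) {
        return Long.compare(b.seqIdFromTag, a.seqIdFromTag);
      }
      return Long.compare(b.mvcc, a.mvcc);   // otherwise fall back to mvcc
    }
  };
}
{code}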


 distributedLogReplay need to apply wal edits in the receiving order of those 
 edits
 --

 Key: HBASE-8701
 URL: https://issues.apache.org/jira/browse/HBASE-8701
 Project: HBase
  Issue Type: Bug
  Components: MTTR
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 0.98.0

 Attachments: 8701-v3.txt, hbase-8701-tag.patch, hbase-8701-v4.patch, 
 hbase-8701-v5.patch, hbase-8701-v6.patch, hbase-8701-v7.patch, 
 hbase-8701-v8.patch


 This issue happens in distributedLogReplay mode when recovering multiple puts 
 of the same key + version (timestamp). After replay, the value of the key is 
 nondeterministic.
 h5. The original concern situation raised from [~eclark]:
 For all edits the rowkey is the same.
 There's a log with: [ A (ts = 0), B (ts = 0) ]
 Replay the first half of the log.
 A user puts in C (ts = 0)
 Memstore has to flush
 A new Hfile will be created with [ C, A ] and MaxSequenceId = C's seqid.
 Replay the rest of the Log.
 Flush
 The issue will happen in similar situation like Put(key, t=T) in WAL1 and 
 Put(key,t=T) in WAL2
 h5. Below is the option(proposed by Ted) I'd like to use:
 a) During replay, we pass original wal sequence number of each edit to the 
 receiving RS
 b) In receiving RS, we store negative original sequence number of wal edits 
 into mvcc field of KVs of wal edits
 c) Add handling of negative MVCC in KVScannerComparator and KVComparator   
 d) In receiving RS, write original sequence number into an optional field of 
 wal file for chained RS failure situation 
 e) When opening a region, we add a safety bumper(a large number) in order for 
 the new sequence number of a newly opened region not to collide with old 
 sequence numbers. 
 In the future, when we stores sequence number along with KVs, we can adjust 
 the above solution a little bit by avoiding to overload MVCC field.
 h5. The other alternative options are listed below for references:
 Option one
 a) disallow writes during recovery
 b) during replay, we pass original wal sequence ids
 c) hold flush till all wals of a recovering region are replayed. Memstore 
 should hold because we only recover unflushed wal edits. For edits with same 
 key + version, whichever with larger sequence Id wins.
 Option two
 a) During replay, we pass original wal sequence ids
 b) for each wal edit, we store each edit's original sequence id along with 
 its key. 
 c) during scanning, we use the original sequence id if it's present otherwise 
 its store file sequence Id
 d) compaction can just leave put with max sequence id
 Please let me know if you have better ideas.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-10174) Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94

2013-12-16 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-10174:
---

Fix Version/s: 0.94.15

 Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94
 -

 Key: HBASE-10174
 URL: https://issues.apache.org/jira/browse/HBASE-10174
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Ted Yu
 Fix For: 0.94.15

 Attachments: 9667-0.94.patch


 On the user mailing list, under the thread 'Guava 15', Kristoffer Sjögren 
 reported a NoClassDefFoundError when he used Guava 15.
 The issue has been fixed in 0.96+ by HBASE-9667.
 This JIRA ports the fix to the 0.94 branch.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10174) Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94

2013-12-16 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849431#comment-13849431
 ] 

Ted Yu commented on HBASE-10174:


Test suite passed with patch:
{code}
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 1:14:33.284s
[INFO] Finished at: Mon Dec 16 18:23:47 UTC 2013
[INFO] Final Memory: 36M/642M
{code}

 Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94
 -

 Key: HBASE-10174
 URL: https://issues.apache.org/jira/browse/HBASE-10174
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Ted Yu
 Fix For: 0.94.15

 Attachments: 9667-0.94.patch


 On the user mailing list, under the thread 'Guava 15', Kristoffer Sjögren 
 reported a NoClassDefFoundError when he used Guava 15.
 The issue has been fixed in 0.96+ by HBASE-9667.
 This JIRA ports the fix to the 0.94 branch.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10142) TestLogRolling#testLogRollOnDatanodeDeath test failure

2013-12-16 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849429#comment-13849429
 ] 

Andrew Purtell commented on HBASE-10142:


This doesn't manifest on reasonably endowed VMs. However, we see two instances 
of different problems with this test on builds.apache.org.

From the Hadoop 2 based build 
https://builds.apache.org/job/hbase-0.98/14/testReport/junit/org.apache.hadoop.hbase.regionserver.wal/TestLogRolling/testLogRollOnDatanodeDeath/:
{noformat}
java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor60.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.hbase.regionserver.wal.FSHLog.getLogReplication(FSHLog.java:1391)
at 
org.apache.hadoop.hbase.regionserver.wal.TestLogRolling.testLogRollOnDatanodeDeath(TestLogRolling.java:400)
[...]
Caused by: java.nio.channels.ClosedChannelException
at 
org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1317)
at 
org.apache.hadoop.hdfs.DFSOutputStream.getCurrentBlockReplication(DFSOutputStream.java:1762)
at 
org.apache.hadoop.hdfs.DFSOutputStream.getNumCurrentReplicas(DFSOutputStream.java:1751)
{noformat}

And from the Hadoop 1 based build 
https://builds.apache.org/job/hbase-0.98-on-hadoop-1.1/10/testReport/junit/org.apache.hadoop.hbase.regionserver.wal/TestLogRolling/testLogRollOnDatanodeDeath/:
{noformat}
java.lang.AssertionError: Missing datanode should've triggered a log roll
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at 
org.apache.hadoop.hbase.regionserver.wal.TestLogRolling.testLogRollOnDatanodeDeath(TestLogRolling.java:388)
{noformat}

So I am strongly inclined to disable this test as a flapper.

 TestLogRolling#testLogRollOnDatanodeDeath test failure
 --

 Key: HBASE-10142
 URL: https://issues.apache.org/jira/browse/HBASE-10142
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.99.0
Reporter: Andrew Purtell
 Fix For: 0.98.0


 This is a demanding unit test, which fails fairly often as software versions 
 (JVM, Hadoop) and system load change. Currently when testing 0.98 branch I 
 see this failure:
 {noformat}
 Failed tests:   
 testLogRollOnDatanodeDeath(org.apache.hadoop.hbase.regionserver.wal.TestLogRolling):
  LowReplication Roller should've been disabled, current replication=1
 {noformat} 
 Could be a timing issue after the recent switch to Hadoop 2 as default 
 build/test profile. Let's see if more leniency makes sense and if it can 
 stabilize the test before disabling it.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10142) TestLogRolling#testLogRollOnDatanodeDeath test failure

2013-12-16 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849435#comment-13849435
 ] 

Andrew Purtell commented on HBASE-10142:


Could not reproduce on a m3.4xlarge build box running Amazon Linux (RHEL 
derived) and Oracle JDK 7u21 after 100 iterations.

 TestLogRolling#testLogRollOnDatanodeDeath test failure
 --

 Key: HBASE-10142
 URL: https://issues.apache.org/jira/browse/HBASE-10142
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.99.0
Reporter: Andrew Purtell
 Fix For: 0.98.0


 This is a demanding unit test, which fails fairly often as software versions 
 (JVM, Hadoop) and system load change. Currently when testing 0.98 branch I 
 see this failure:
 {noformat}
 Failed tests:   
 testLogRollOnDatanodeDeath(org.apache.hadoop.hbase.regionserver.wal.TestLogRolling):
  LowReplication Roller should've been disabled, current replication=1
 {noformat} 
 Could be a timing issue after the recent switch to Hadoop 2 as default 
 build/test profile. Let's see if more leniency makes sense and if it can 
 stabilize the test before disabling it.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-10087) Store should be locked during a memstore snapshot

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-10087:
--

Fix Version/s: (was: 0.96.1)
   0.99.0

 Store should be locked during a memstore snapshot
 -

 Key: HBASE-10087
 URL: https://issues.apache.org/jira/browse/HBASE-10087
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.0, 0.96.1, 0.94.14
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.98.0, 0.94.15, 0.99.0

 Attachments: 10079.v1.patch


 regression from HBASE-9963, found while looking at HBASE-10079.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-9942) hbase Scanner specifications accepting wrong specifier and then after scan using correct specifier returning unexpected result

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9942:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 hbase Scanner specifications accepting wrong specifier and then after scan 
 using correct specifier returning unexpected result 
 ---

 Key: HBASE-9942
 URL: https://issues.apache.org/jira/browse/HBASE-9942
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.96.0, 0.94.13
Reporter: Deepak Sharma
Priority: Minor
 Fix For: 0.99.0


 check the given scenario:
 1. log in to the hbase client -- ./hbase shell
 2. create table 'tab1':
 hbase(main):001:0> create 'tab1', 'fa1'
 3. put some 10 rows (row1 to row10) in table 'tab1'
 4. run the scan for table 'tab1' as follows:
 hbase(main):013:0> scan 'tab1', { STARTROW => 'row4', STOPROW => 'row9' }
 ROW      COLUMN+CELL
  row4    column=fa1:col1, timestamp=1384164182738, value=value1
  row5    column=fa1:col1, timestamp=1384164188396, value=value1
  row6    column=fa1:col1, timestamp=1384164192395, value=value1
  row7    column=fa1:col1, timestamp=1384164197693, value=value1
  row8    column=fa1:col1, timestamp=1384164203237, value=value1
 5 row(s) in 0.0540 seconds
 so the result was as expected: rows from 'row4' to 'row8' are displayed
 5. then run the scan using the wrong specifier ('=' instead of '=>') and get a 
 wrong result:
 hbase(main):014:0> scan 'tab1', { STARTROW = 'row4', STOPROW = 'row9' }
 ROW      COLUMN+CELL
  row1    column=fa1:col1, timestamp=1384164167838, value=value1
  row10   column=fa1:col1, timestamp=1384164212615, value=value1
  row2    column=fa1:col1, timestamp=1384164175337, value=value1
  row3    column=fa1:col1, timestamp=1384164179068, value=value1
  row4    column=fa1:col1, timestamp=1384164182738, value=value1
  row5    column=fa1:col1, timestamp=1384164188396, value=value1
  row6    column=fa1:col1, timestamp=1384164192395, value=value1
  row7    column=fa1:col1, timestamp=1384164197693, value=value1
  row8    column=fa1:col1, timestamp=1384164203237, value=value1
  row9    column=fa1:col1, timestamp=1384164208375, value=value1
 10 row(s) in 0.0390 seconds
 6. now perform the correct scan query with the correct specifier (using '=>' as 
 the specifier):
 hbase(main):015:0> scan 'tab1', { STARTROW => 'row4', STOPROW => 'row9' }
 ROW      COLUMN+CELL
  row1    column=fa1:col1, timestamp=1384164167838, value=value1
  row10   column=fa1:col1, timestamp=1384164212615, value=value1
  row2

[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9969:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 Improve KeyValueHeap using loser tree
 -

 Key: HBASE-9969
 URL: https://issues.apache.org/jira/browse/HBASE-9969
 Project: HBase
  Issue Type: Improvement
  Components: Performance, regionserver
Reporter: Chao Shi
Assignee: Chao Shi
 Fix For: 0.98.1, 0.99.0

 Attachments: 9969-0.94.txt, KeyValueHeapBenchmark_v1.ods, 
 KeyValueHeapBenchmark_v2.ods, hbase-9969-pq-v1.patch, hbase-9969-pq-v2.patch, 
 hbase-9969-v2.patch, hbase-9969-v3.patch, hbase-9969.patch, hbase-9969.patch, 
 kvheap-benchmark.png, kvheap-benchmark.txt


 A LoserTree is a better data structure than a binary heap here. It saves half of the 
 comparisons on each next(), though the time complexity stays O(logN).
 Currently a scan or get will go through two KeyValueHeaps: one merges KVs 
 read from multiple HFiles in a single store, the other merges results 
 from multiple stores. This patch should improve both cases whenever CPU 
 is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811).
 All of the optimization work is done in KeyValueHeap and does not change its 
 public interfaces. The new code looks cleaner and simpler to understand.
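 For context, a minimal sketch (plain Java, illustrative names only -- not the patch itself) 
 of the kind of k-way merge KeyValueHeap performs today on top of a binary heap 
 (java.util.PriorityQueue); the patch keeps the same interface but swaps the heap for a 
 loser tree so that each next() needs roughly half the comparisons:
 {code}
 import java.util.*;

 // Illustrative k-way merge of sorted iterators, standing in for what KeyValueHeap
 // does over its scanners; the proposal replaces the PriorityQueue with a loser tree.
 final class HeapMerge<T> implements Iterator<T> {
   private final PriorityQueue<PeekingIt<T>> heap;

   HeapMerge(List<Iterator<T>> inputs, Comparator<T> cmp) {
     heap = new PriorityQueue<>(Math.max(1, inputs.size()),
         (a, b) -> cmp.compare(a.head, b.head));
     for (Iterator<T> it : inputs) {
       if (it.hasNext()) heap.add(new PeekingIt<>(it));
     }
   }

   @Override public boolean hasNext() { return !heap.isEmpty(); }

   @Override public T next() {
     PeekingIt<T> top = heap.poll();      // O(log k) sift on every call
     T result = top.head;
     if (top.advance()) heap.add(top);    // re-insert the source with its new head
     return result;
   }

   private static final class PeekingIt<T> {
     final Iterator<T> it;
     T head;
     PeekingIt(Iterator<T> it) { this.it = it; this.head = it.next(); }
     boolean advance() { if (it.hasNext()) { head = it.next(); return true; } return false; }
   }
 }
 {code}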



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-10018) Change the location prefetch

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-10018:
--

Fix Version/s: (was: 0.96.1)
   0.99.0

 Change the location prefetch
 

 Key: HBASE-10018
 URL: https://issues.apache.org/jira/browse/HBASE-10018
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.96.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
Priority: Minor
 Fix For: 0.98.0, 0.99.0


 Issues with prefetching are:
 - we do two calls to meta: one for the exact row, one for the prefetch 
 - it's done in a lock
 - we take the next 10 regions. Why 10, why the next 10?
 - is it useful if the table has 100K regions?
 Options are:
 - just remove it
 - replace it with a reverse scan: this would save a call
  



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-8329) Limit compaction speed

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-8329:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 Limit compaction speed
 --

 Key: HBASE-8329
 URL: https://issues.apache.org/jira/browse/HBASE-8329
 Project: HBase
  Issue Type: Improvement
  Components: Compaction
Reporter: binlijin
Assignee: binlijin
 Fix For: 0.99.0

 Attachments: HBASE-8329-2-trunk.patch, HBASE-8329-3-trunk.patch, 
 HBASE-8329-4-trunk.patch, HBASE-8329-5-trunk.patch, HBASE-8329-6-trunk.patch, 
 HBASE-8329-7-trunk.patch, HBASE-8329-8-trunk.patch, HBASE-8329-trunk.patch


 There is no speed or resource limit for compaction. I think we should add this 
 feature, especially for request bursts.
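 As an illustration only (the class name and the fixed one-second window are assumptions, 
 not the eventual HBase design), a limiter like the following could be called from the 
 compaction write loop to cap the bytes written per second:
 {code}
 // Illustrative bytes-per-second throttle for compaction writes; not HBase code.
 final class CompactionThrottle {
   private final long bytesPerSecond;                       // configured budget, e.g. 20 MB/s
   private long windowStartMs = System.currentTimeMillis();
   private long bytesInWindow = 0;

   CompactionThrottle(long bytesPerSecond) {
     this.bytesPerSecond = bytesPerSecond;
   }

   /** Call after writing 'bytes'; sleeps if the current one-second window is over budget. */
   synchronized void throttle(long bytes) throws InterruptedException {
     bytesInWindow += bytes;
     long elapsed = System.currentTimeMillis() - windowStartMs;
     if (elapsed >= 1000) {                                  // a new window has started
       windowStartMs = System.currentTimeMillis();
       bytesInWindow = bytes;
     } else if (bytesInWindow > bytesPerSecond) {            // over budget: sleep out the window
       Thread.sleep(1000 - elapsed);
       windowStartMs = System.currentTimeMillis();
       bytesInWindow = 0;
     }
   }
 }
 {code}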



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-9151) HBCK cannot fix when meta server znode deleted, this can happen if all region servers stopped and there are no logs to split.

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9151:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 HBCK cannot fix when meta server znode deleted, this can happen if all region 
 servers stopped and there are no logs to split.
 -

 Key: HBASE-9151
 URL: https://issues.apache.org/jira/browse/HBASE-9151
 Project: HBase
  Issue Type: Bug
  Components: hbck
Reporter: rajeshbabu
Assignee: rajeshbabu
 Fix For: 0.98.0, 0.99.0


 When the meta server znode is deleted and meta is in FAILED_OPEN state, hbck 
 cannot fix it. This scenario can occur when all region servers are stopped by the stop 
 command and no RS is started within 10 secs (with default configurations). 
 {code}
   public void assignMeta() throws KeeperException {
 MetaRegionTracker.deleteMetaLocation(this.watcher);
 assign(HRegionInfo.FIRST_META_REGIONINFO, true);
   }
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-8859) truncate_preserve should get table split keys as it is instead of converting them to string type and then again to bytes

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-8859:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 truncate_preserve should get table split keys as it is instead of converting 
 them to string type and then again to bytes
 

 Key: HBASE-8859
 URL: https://issues.apache.org/jira/browse/HBASE-8859
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.95.1
Reporter: rajeshbabu
Assignee: rajeshbabu
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-8859-Test_to_reproduce.patch, 
 HBASE-8859_trunk.patch, HBASE-8859_trunk_2.patch


 If we take int, long or double bytes as split keys, then we are not recreating the 
 table with the same split keys: converting them to strings and back to 
 bytes gives different split keys, and sometimes an IllegalArgument 
 exception because the converted split keys collide. Instead we can get the split 
 keys directly from HTable and pass them while creating the table.
 {code}
   h_table = org.apache.hadoop.hbase.client.HTable.new(conf, table_name)
   splits = h_table.getRegionLocations().keys().map{|i| i.getStartKey} 
 :byte
   splits = org.apache.hadoop.hbase.util.Bytes.toByteArrays(splits)
 {code}
 {code}
 Truncating 'emp3' table (it may take a while):
  - Disabling table...
  - Dropping table...
  - Creating table with region boundaries...
 ERROR: java.lang.IllegalArgumentException: All split keys must be unique, 
 found duplicate: B\x11S\xEF\xBF\xBD\xEF\xBF\xBD\xEF\xBF\xBD\x00\x00, 
 B\x11S\xEF\xBF\xBD\xEF\xBF\xBD\xEF\xBF\xBD\x00\x00
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-8803) region_mover.rb should move multiple regions at a time

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-8803:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 region_mover.rb should move multiple regions at a time
 --

 Key: HBASE-8803
 URL: https://issues.apache.org/jira/browse/HBASE-8803
 Project: HBase
  Issue Type: Bug
  Components: Usability
Affects Versions: 0.98.0, 0.94.8, 0.95.1
Reporter: Jean-Marc Spaggiari
Assignee: Jean-Marc Spaggiari
 Fix For: 0.94.15, 0.99.0

 Attachments: 8803v5.txt, HBASE-8803-v0-trunk.patch, 
 HBASE-8803-v1-0.94.patch, HBASE-8803-v1-trunk.patch, 
 HBASE-8803-v2-0.94.patch, HBASE-8803-v2-0.94.patch, HBASE-8803-v3-0.94.patch, 
 HBASE-8803-v4-0.94.patch, HBASE-8803-v4-trunk.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 When there are many regions in a cluster, rolling_restart can take hours 
 because region_mover moves the regions one by one.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-9738) Delete table and loadbalancer interference

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9738:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 Delete table and loadbalancer interference
 --

 Key: HBASE-9738
 URL: https://issues.apache.org/jira/browse/HBASE-9738
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Priority: Critical
 Fix For: 0.99.0


 I have noticed that when the balancer is computing a plan for region moves, 
 and a delete table is issued, there is some interference.
 1. At time t1, user deleted the table.
 2. This led to the master updating the meta table to remove the line for the 
 regioninfo for a region f2a9e2e9d70894c03f54ee5902bebee6.
 {noformat}
 2013-10-04 08:42:52,495 INFO  [MASTER_TABLE_OPERATIONS-hor15n05:6-0] 
 catalog.MetaEditor: Deleted [{ENCODED = f2a9e2e9d70894c03f54ee5902bebee6, 
 NAME = 'usertable,,1380876170581.f2a9e2e9d70894c03f54ee5902bebee6.', 
 STARTKEY = '', ENDKEY = ''}]
 {noformat}
 3. However around the same time, the balancer kicked in, and reassigned the 
 region and made it online somewhere. It didn't check (nor did anyone 
 else) that the table was indeed deleted.
 {noformat}
 2013-10-04 08:42:53,215 INFO  
 [hor15n05.gq1.ygridcore.net,6,1380869262259-BalancerChore] 
 master.HMaster: balance 
 hri=usertable,,1380876170581.f2a9e2e9d70894c03f54ee5902bebee6., 
 src=hor15n09.gq1.ygridcore.net,60020,1380869263722, 
 dest=hor15n11.gq1.ygridcore.net,60020,1380869263682
 {noformat}
 .
 {noformat}
 2013-10-04 08:42:53,592 INFO  [AM.ZK.Worker-pool2-t829] master.RegionStates: 
 Onlined f2a9e2e9d70894c03f54ee5902bebee6 on 
 hor15n11.gq1.ygridcore.net,60020,1380869263682
 {noformat}
 4. Henceforth, all the drop tables started giving warnings like
 {noformat}
 2013-10-04 08:45:17,587 INFO  [RpcServer.handler=8,port=6] 
 master.HMaster: Client=hrt_qa//68.142.246.151 delete usertable
 2013-10-04 08:45:17,631 DEBUG [RpcServer.handler=8,port=6] 
 lock.ZKInterProcessLockBase: Acquired a lock for 
 /hbase/table-lock/usertable/write-master:600
 2013-10-04 08:45:17,637 WARN  [RpcServer.handler=8,port=6] 
 catalog.MetaReader: No serialized HRegionInfo in 
 keyvalues={usertable,,1380876170581.f2a9e2e9d70894c03f54ee5902bebee6./info:seqnumDuringOpen/1380876173509/Put/vlen=8/mvcc=0,
  
 usertable,,1380876170581.f2a9e2e9d70894c03f54ee5902bebee6./info:server/1380876173509/Put/vlen=32/mvcc=0,
  
 usertable,,1380876170581.f2a9e2e9d70894c03f54ee5902bebee6./info:serverstartcode/1380876173509/Put/vlen=8/mvcc=0}
 {noformat}
 5. The create of the same table also fails since there is still state 
 (reincarnated, maybe) about the table in the master.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-5583) Master restart on create table with splitkeys does not recreate table with all the splitkey regions

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5583:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 Master restart on create table with splitkeys does not recreate table with 
 all the splitkey regions
 ---

 Key: HBASE-5583
 URL: https://issues.apache.org/jira/browse/HBASE-5583
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.99.0

 Attachments: HBASE-5583_new_1.patch, HBASE-5583_new_1_review.patch, 
 HBASE-5583_new_2.patch, HBASE-5583_new_4_WIP.patch, 
 HBASE-5583_new_5_WIP_using_tableznode.patch


 - Create table using splitkeys
 - Master goes down before all regions are added to meta
 - On master restart the table is again enabled but with fewer 
 regions than specified in the splitkeys
 Anyway, the client will get an exception if it had called a synchronous create table.  But 
 a table-exists check will say the table exists. 
 Is this scenario to be handled by the client only, or can we have some mechanism 
 on the master side for this? Pls suggest.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-8495) Change ownership of the directory to bulk load

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-8495:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 Change ownership of the directory to bulk load
 --

 Key: HBASE-8495
 URL: https://issues.apache.org/jira/browse/HBASE-8495
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Affects Versions: 0.94.7, 0.95.0
Reporter: Matteo Bertozzi
Priority: Trivial
 Fix For: 0.99.0


 To bulk load something you need to change the ownership of the data directory 
 to allow the hbase user to read and move the files; also, in the split case 
 you must run the LoadIncrementalHFiles tool as the hbase user, since 
 internally some _tmp directories are created to add the split reference 
 files.
 In a secure cluster, the SecureBulkLoadEndPoint will take care of this 
 problem by doing a chmod 777 on the directory to bulk load.
 NOTE that a chown is not possible since you must be a super user to change 
 the ownership, a change group may be possible but the user must be in the 
 hbase group... and anyway it will require a chmod to allow the group to 
 perform the move.
 {code}
 Caused by: org.apache.hadoop.security.AccessControlException: Permission 
 denied: user=hbase, access=WRITE, inode=/test/cf:th30z:supergroup:drwxr-xr-x
   at 
 org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:205)
 Caused by: 
 org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): 
 java.io.IOException: Exception in rename
   at 
 org.apache.hadoop.hbase.regionserver.HRegionFileSystem.rename(HRegionFileSystem.java:928)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionFileSystem.commitStoreFile(HRegionFileSystem.java:340)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionFileSystem.bulkLoadStoreFile(HRegionFileSystem.java:414)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-8942:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 DFS errors during a read operation (get/scan), may cause write outliers
 ---

 Key: HBASE-8942
 URL: https://issues.apache.org/jira/browse/HBASE-8942
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89-fb, 0.95.2
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Minor
 Fix For: 0.89-fb, 0.99.0

 Attachments: 8942.094.txt, 8942.096.txt, HBase-8942.txt


 This is a similar issue to the one discussed in HBASE-8228.
 1) A scanner holds the Store.ReadLock() while opening the store files ... 
 encounters errors, and thus takes a long time to finish.
 2) A flush completes in the meanwhile. It needs the write lock to 
 commit() and update scanners, and hence ends up waiting.
 3+) All Puts (and also Gets) to the CF, which need a read lock, will 
 have to wait for 1) and 2) to complete, thus blocking updates to the system 
 for the duration of the DFS timeout.
 Fix:
  Open store files outside the read lock. getScanners() already tries to do 
 this optimisation. However, Store.getScanner(), which calls this function 
 through the StoreScanner constructor, redundantly tries to grab the readLock, 
 causing the readLock to be held while the storeFiles are being opened and 
 seeked.
  We should get rid of the readLock() in Store.getScanner(). This is not 
 required. The constructor for StoreScanner calls getScanners(xxx, xxx, xxx), 
 which has the required locking already.
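 A schematic sketch of that fix (illustrative names, not the real Store/StoreScanner code): 
 do the slow opening and seeking of files before taking the shared lock, and keep only the 
 cheap bookkeeping under it:
 {code}
 import java.io.IOException;
 import java.util.Collections;
 import java.util.List;
 import java.util.concurrent.locks.ReentrantReadWriteLock;

 // Illustrative only: DFS hiccups while opening/seeking files no longer block
 // flush commits (which need the write lock) or other readers queued behind them.
 final class StoreSketch {
   interface FileScanner {}

   private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

   List<FileScanner> getScanners(byte[] startKey) throws IOException {
     // Slow part (may hit DFS timeouts) happens outside the lock.
     List<FileScanner> scanners = openAndSeekFiles(startKey);
     lock.readLock().lock();
     try {
       registerScanners(scanners);   // only the bookkeeping runs under the lock
       return scanners;
     } finally {
       lock.readLock().unlock();
     }
   }

   private List<FileScanner> openAndSeekFiles(byte[] startKey) throws IOException {
     return Collections.emptyList(); // placeholder for the real file-opening work
   }

   private void registerScanners(List<FileScanner> scanners) {}
 }
 {code}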



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-6970) hbase-deamon.sh creates/updates pid file even when that start failed.

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6970:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 hbase-deamon.sh creates/updates pid file even when that start failed.
 -

 Key: HBASE-6970
 URL: https://issues.apache.org/jira/browse/HBASE-6970
 Project: HBase
  Issue Type: Bug
  Components: Usability
Reporter: Lars Hofhansl
 Fix For: 0.99.0


 We just ran into a strange issue where we could neither start nor stop services 
 with hbase-daemon.sh.
 The problem is this:
 {code}
 nohup nice -n $HBASE_NICENESS $HBASE_HOME/bin/hbase \
 --config ${HBASE_CONF_DIR} \
 $command "$@" $startStop > "$logout" 2>&1 < /dev/null &
 echo $! > $pid
 {code}
 So the pid file is created or updated even when the start of the service 
 failed. The next stop command will then fail, because the pid file has the 
 wrong pid in it.
 Edit: Spelling and more spelling errors.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-9484) Backport 8534 Fix coverage for org.apache.hadoop.hbase.mapreduce to 0.96

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9484:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 Backport 8534 Fix coverage for org.apache.hadoop.hbase.mapreduce to 0.96
 --

 Key: HBASE-9484
 URL: https://issues.apache.org/jira/browse/HBASE-9484
 Project: HBase
  Issue Type: Test
  Components: mapreduce, test
Reporter: Nick Dimiduk
Priority: Minor
 Fix For: 0.99.0

 Attachments: 
 0001-HBASE-9484-backport-8534-Fix-coverage-for-org.apache.patch






--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-9047) Tool to handle finishing replication when the cluster is offline

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9047:
-

Fix Version/s: (was: 0.96.1)

 Tool to handle finishing replication when the cluster is offline
 

 Key: HBASE-9047
 URL: https://issues.apache.org/jira/browse/HBASE-9047
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.96.0
Reporter: Jean-Daniel Cryans
Assignee: Demai Ni
 Fix For: 0.98.0, 0.94.15, 0.99.0

 Attachments: HBASE-9047-0.94-v1.patch, HBASE-9047-0.94.9-v0.PATCH, 
 HBASE-9047-trunk-v0.patch, HBASE-9047-trunk-v1.patch, 
 HBASE-9047-trunk-v2.patch, HBASE-9047-trunk-v3.patch, 
 HBASE-9047-trunk-v4.patch, HBASE-9047-trunk-v4.patch, 
 HBASE-9047-trunk-v5.patch, HBASE-9047-trunk-v6.patch, 
 HBASE-9047-trunk-v7.patch, HBASE-9047-trunk-v7.patch


 We're having a discussion on the mailing list about replicating the data on a 
 cluster that was shut down in an offline fashion. The motivation could be 
 that you don't want to bring HBase back up but still need that data on the 
 slave.
 So I have this idea of a tool that would be running on the master cluster 
 while it is down, although it could also run at any time. Basically it would 
 be able to read the replication state of each master region server, finish 
 replicating what's missing to all the slaves, and then clear that state in 
 zookeeper.
 The code that handles replication does most of that already, see 
 ReplicationSourceManager and ReplicationSource. Basically when 
 ReplicationSourceManager.init() is called, it will check all the queues in ZK 
 and try to grab those that aren't attached to a region server. If the whole 
 cluster is down, it will grab all of them.
 The beautiful thing here is that you could start that tool on all your 
 machines and the load will be spread out, but that might not be a big concern 
 if replication wasn't lagging since it would take a few seconds to finish 
 replicating the missing data for each region server.
 I'm guessing when starting ReplicationSourceManager you'd give it a fake 
 region server ID, and you'd tell it not to start its own source.
 FWIW the main difference in how replication is handled between Apache's HBase 
 and Facebook's is that the latter is always done separately of HBase itself. 
 This jira isn't about doing that.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-9473) Change UI to list 'system tables' rather than 'catalog tables'.

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9473:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 Change UI to list 'system tables' rather than 'catalog tables'.
 ---

 Key: HBASE-9473
 URL: https://issues.apache.org/jira/browse/HBASE-9473
 Project: HBase
  Issue Type: Bug
  Components: UI
Reporter: stack
Assignee: stack
 Fix For: 0.99.0

 Attachments: 9473.txt


 Minor, one-line, bit of polishing.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-6416) hbck dies on NPE when a region folder exists but the table does not

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6416:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 hbck dies on NPE when a region folder exists but the table does not
 ---

 Key: HBASE-6416
 URL: https://issues.apache.org/jira/browse/HBASE-6416
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Daniel Cryans
 Fix For: 0.99.0

 Attachments: hbase-6416-v1.patch, hbase-6416.patch


 This is what I'm getting for leftover data that has no .regioninfo
 First:
 {quote}
 12/07/17 23:13:37 WARN util.HBaseFsck: Failed to read .regioninfo file for 
 region null
 java.io.FileNotFoundException: File does not exist: 
 /hbase/stumble_info_urlid_user/bd5f6cfed674389b4d7b8c1be227cb46/.regioninfo
   at 
 org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1822)
   at 
 org.apache.hadoop.hdfs.DFSClient$DFSInputStream.init(DFSClient.java:1813)
   at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:544)
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:187)
   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:456)
   at 
 org.apache.hadoop.hbase.util.HBaseFsck.loadHdfsRegioninfo(HBaseFsck.java:611)
   at 
 org.apache.hadoop.hbase.util.HBaseFsck.access$2200(HBaseFsck.java:140)
   at 
 org.apache.hadoop.hbase.util.HBaseFsck$WorkItemHdfsRegionInfo.call(HBaseFsck.java:2882)
   at 
 org.apache.hadoop.hbase.util.HBaseFsck$WorkItemHdfsRegionInfo.call(HBaseFsck.java:2866)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
   at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 {quote}
 Then it hangs on:
 {quote}
 12/07/17 23:13:39 INFO util.HBaseFsck: Attempting to handle orphan hdfs dir: 
 hdfs://sfor3s24:10101/hbase/stumble_info_urlid_user/bd5f6cfed674389b4d7b8c1be227cb46
 12/07/17 23:13:39 INFO util.HBaseFsck: checking orphan for table null
 Exception in thread main java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.util.HBaseFsck$TableInfo.access$100(HBaseFsck.java:1634)
   at 
 org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphan(HBaseFsck.java:435)
   at 
 org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphans(HBaseFsck.java:408)
   at 
 org.apache.hadoop.hbase.util.HBaseFsck.restoreHdfsIntegrity(HBaseFsck.java:529)
   at 
 org.apache.hadoop.hbase.util.HBaseFsck.offlineHdfsIntegrityRepair(HBaseFsck.java:313)
   at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:386)
   at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3227)
 {quote}
 The NPE is sent by:
 {code}
 Preconditions.checkNotNull("Table " + tableName + "' not present!", 
 tableInfo);
 {code}
 I wonder why the condition checking was added if we don't handle it... In any 
 case hbck dies, but it hangs because there are some non-daemon threads hanging around.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-9591) [replication] getting Current list of sinks is out of date all the time when a source is recovered

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9591:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 [replication] getting Current list of sinks is out of date all the time 
 when a source is recovered
 

 Key: HBASE-9591
 URL: https://issues.apache.org/jira/browse/HBASE-9591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: Jean-Daniel Cryans
Priority: Minor
 Fix For: 0.99.0


 I tried killing a region server when the slave cluster was down, from that 
 point on my log was filled with:
 {noformat}
 2013-09-20 00:31:03,942 INFO  [regionserver60020.replicationSource,1] 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager: 
 Current list of sinks is out of date, updating
 2013-09-20 00:31:04,226 INFO  
 [ReplicationExecutor-0.replicationSource,1-jdec2hbase0403-4,60020,1379636329634]
  org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager: 
 Current list of sinks is out of date, updating
 {noformat}
 The first log line is from the normal source, the second is the recovered 
 one. When we try to replicate, we call 
 replicationSinkMgr.getReplicationSink() and if the list of machines was 
 refreshed since the last time then we call chooseSinks() which in turn 
 refreshes the list of sinks and resets our lastUpdateToPeers. The next source 
 will notice the change, and will call chooseSinks() too. The first source is 
 coming for another round, sees the list was refreshed, calls chooseSinks() 
 again. It happens forever until the recovered queue is gone.
 We could have all the sources going to the same cluster share a thread-safe 
 ReplicationSinkManager. We could also manage the same cluster separately for 
 each source. Or even easier, if the list we get in chooseSinks() is the same 
 we had before, consider it a noop.
 What do you think [~gabriel.reid]?
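 A tiny sketch of that last option (illustrative names, not the real ReplicationSinkManager 
 fields): treat an unchanged slave server list as a no-op:
 {code}
 import java.util.ArrayList;
 import java.util.List;

 // Illustrative only: reshuffle sinks (and log) only when the slave list actually changed.
 final class SinkListSketch {
   private List<String> lastSlaveServers = new ArrayList<>();

   synchronized void chooseSinks(List<String> slaveServers) {
     if (slaveServers.equals(lastSlaveServers)) {
       return;                                   // same list as before: no-op, no log spam
     }
     lastSlaveServers = new ArrayList<>(slaveServers);
     // ... pick the random subset of sinks and reset lastUpdateToPeers here ...
   }
 }
 {code}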



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-7245) Recovery on failed snapshot restore

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-7245:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 Recovery on failed snapshot restore
 ---

 Key: HBASE-7245
 URL: https://issues.apache.org/jira/browse/HBASE-7245
 Project: HBase
  Issue Type: Bug
  Components: Client, master, regionserver, snapshots, Zookeeper
Reporter: Jonathan Hsieh
Assignee: Matteo Bertozzi
 Fix For: 0.99.0


 Restore will do updates to the file system and to meta.  it seems that an 
 inopportune failure before meta is completely updated could result in an 
 inconsistent state that would require hbck to fix.
 We should define what the semantics are for recovering from this.  Some 
 suggestions:
 1) Fail Forward (see some log saying restore's meta edits not completed, then 
 gather information necessary to build it all from fs, and complete meta 
 edits.).
 2) Fail backwards (see some log saying restore's meta edits not completed, 
 delete incomplete snapshot region entries from meta.)  
 I think I prefer 1 -- if two processes somehow end up updating (say the 
 original master didn't die, and a new one started up) they would be 
 idempotent.  If we used 2, we could still have a race and still be in a bad 
 place.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-8713) [hadoop1] Log spam each time the WAL is rolled

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-8713:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 [hadoop1] Log spam each time the WAL is rolled
 --

 Key: HBASE-8713
 URL: https://issues.apache.org/jira/browse/HBASE-8713
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.95.1
Reporter: Jean-Daniel Cryans
 Fix For: 0.99.0


 Testing 0.95.1 RC1, I see these 2 lines every time the WAL is rolled:
 {noformat}
 2013-06-07 17:19:33,182 INFO  [RS_CLOSE_REGION-ip-10-20-46-44:50653-1] 
 util.FSUtils: FileSystem doesn't support getDefaultReplication
 2013-06-07 17:19:33,182 INFO  [RS_CLOSE_REGION-ip-10-20-46-44:50653-1] 
 util.FSUtils: FileSystem doesn't support getDefaultBlockSize
 {noformat}
 It only happens on hadoop1.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-9866) Support the mode where REST server authorizes proxy users

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9866:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 Support the mode where REST server authorizes proxy users
 -

 Key: HBASE-9866
 URL: https://issues.apache.org/jira/browse/HBASE-9866
 Project: HBase
  Issue Type: Improvement
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.99.0

 Attachments: 9866-1.txt, 9866-2.txt, 9866-3.txt, 9866-4.txt


 In one use case, someone was trying to authorize with the REST server as a 
 proxy user. That mode is not supported today. 
 The curl request would be something like (assuming SPNEGO auth) - 
 {noformat}
 curl -i --negotiate -u : http://HOST:PORT/version/cluster?doas=USER
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-8593) Type support in ImportTSV tool

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-8593:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 Type support in ImportTSV tool
 --

 Key: HBASE-8593
 URL: https://issues.apache.org/jira/browse/HBASE-8593
 Project: HBase
  Issue Type: Sub-task
  Components: mapreduce
Reporter: Anoop Sam John
Assignee: rajeshbabu
 Fix For: 0.99.0

 Attachments: HBASE-8593.patch, HBASE-8593_v2.patch, 
 HBASE-8593_v4.patch, ReportMapper.java


 Now the ImportTSV tool treats all the table columns as being of type String. It 
 converts the input data into bytes considering its type to be String. Sometimes 
 a user will need a type of, say, int/float to be added to the table by using 
 this tool.
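 For example (a hypothetical snippet, only to show why the encoding matters to readers 
 that expect numeric bytes):
 {code}
 import org.apache.hadoop.hbase.util.Bytes;

 // "42" stored as a String vs. as numeric types: the cell values differ, so a
 // reader calling Bytes.toInt() cannot consume data loaded with the String-only encoding.
 public class TypedEncodingExample {
   public static void main(String[] args) {
     byte[] asString = Bytes.toBytes("42");    // 2 bytes: '4', '2'
     byte[] asInt    = Bytes.toBytes(42);      // 4 bytes, big-endian int
     byte[] asFloat  = Bytes.toBytes(42.0f);   // 4 bytes, IEEE-754 float bits
     System.out.println(asString.length + " " + asInt.length + " " + asFloat.length);
   }
 }
 {code}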



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-9343) Implement stateless scanner for Stargate

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9343:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 Implement stateless scanner for Stargate
 

 Key: HBASE-9343
 URL: https://issues.apache.org/jira/browse/HBASE-9343
 Project: HBase
  Issue Type: Improvement
  Components: REST
Affects Versions: 0.94.11
Reporter: Vandana Ayyalasomayajula
Assignee: Vandana Ayyalasomayajula
Priority: Minor
 Fix For: 0.98.1, 0.99.0

 Attachments: HBASE-9343_94.00.patch, HBASE-9343_94.01.patch, 
 HBASE-9343_trunk.00.patch, HBASE-9343_trunk.01.patch, 
 HBASE-9343_trunk.01.patch, HBASE-9343_trunk.02.patch


 The current scanner implementation stores state and hence is not 
 very suitable for REST server failure scenarios. This JIRA proposes to 
 implement a stateless scanner. In the first version of the patch, a new 
 resource class ScanResource has been added and all the scan parameters will 
 be specified as query params. 
 The following are the scan parameters
 startrow -  The start row for the scan.
 endrow - The end row for the scan.
 columns - The columns to scan. 
 starttime, endtime - To only retrieve columns within a specific range of 
 version timestamps, both start and end time must be specified.
 maxversions  - To limit the number of versions of each column to be returned.
 batchsize - To limit the maximum number of values returned for each call to 
 next().
 limit - The number of rows to return in the scan operation.
  More on start row, end row and limit parameters.
 1. If start row, end row and limit not specified, then the whole table will 
 be scanned.
 2. If start row and limit (say N) is specified, then the scan operation will 
 return N rows from the start row specified.
 3. If only limit parameter is specified, then the scan operation will return 
 N rows from the start of the table.
 4. If limit and end row are specified, then the scan operation will return N 
 rows from the start of the table till the end row. If the end row is 
 reached before N rows ( say M and M < N ), then M rows will be returned to 
 the user.
 5. If start row, end row and limit (say N ) are specified and N < the number 
 of rows between start row and end row, then N rows from the start row
 will be returned to the user. If N > the number of rows between start row and 
 end row (say M), then M rows will be returned to the
 user.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-9889) Make sure we clean up scannerReadPoints upon any exceptions

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9889:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 Make sure we clean up scannerReadPoints upon any exceptions
 ---

 Key: HBASE-9889
 URL: https://issues.apache.org/jira/browse/HBASE-9889
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 0.89-fb, 0.94.12, 0.96.0
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Minor
 Fix For: 0.99.0

 Attachments: hbase-9889.diff


 If there is an exception in the creation of the RegionScanner (for example, an 
 exception while opening store files) the scannerReadPoints entry is not cleaned 
 up.
 Having an unused old entry in the scannerReadPoints means that flushes and 
 compactions cannot garbage-collect older versions.
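 The shape of the fix, sketched with illustrative names (not the exact HRegion code): if 
 building the scanner throws, remove the read point that was just registered:
 {code}
 import java.io.IOException;
 import java.util.concurrent.ConcurrentHashMap;

 // Illustrative only: register the read point, and remove it again if constructing
 // the scanner fails part-way (e.g. opening store files throws).
 final class ReadPointSketch {
   private final ConcurrentHashMap<Long, Long> scannerReadPoints = new ConcurrentHashMap<>();

   Object newScanner(long scannerId, long readPoint) throws IOException {
     scannerReadPoints.put(scannerId, readPoint);
     boolean succeeded = false;
     try {
       Object scanner = openStoreScanners(readPoint);   // may throw IOException
       succeeded = true;
       return scanner;
     } finally {
       if (!succeeded) {
         scannerReadPoints.remove(scannerId);           // don't pin old versions forever
       }
     }
   }

   private Object openStoreScanners(long readPoint) throws IOException {
     return new Object();                               // placeholder for the real work
   }
 }
 {code}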



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-9527) Review all old api that takes a table name as a byte array and ensure none can pass ns + tablename

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9527:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 Review all old api that takes a table name as a byte array and ensure none 
 can pass ns + tablename
 --

 Key: HBASE-9527
 URL: https://issues.apache.org/jira/browse/HBASE-9527
 Project: HBase
  Issue Type: Bug
Reporter: stack
Priority: Critical
 Fix For: 0.98.0, 0.99.0


 Go over all old APIs that take a table name and ensure that it is not 
 possible to pass in a byte array that is a namespace + tablename; instead 
 throw an exception.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-8028) Append, Increment: Adding rollback support

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-8028:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 Append, Increment: Adding rollback support
 --

 Key: HBASE-8028
 URL: https://issues.apache.org/jira/browse/HBASE-8028
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.5
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Fix For: 0.99.0

 Attachments: HBase-8028-v1.patch, HBase-8028-v2.patch, 
 HBase-8028-with-Increments-v1.patch, HBase-8028-with-Increments-v2.patch


 In case there is an exception while doing the log-sync, the memstore is not 
 rolled back, while the mvcc is _always_ forwarded to the writeentry created at 
 the beginning of the operation. This may lead to scanners seeing results 
 which are not synced to the fs.
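 Schematically (an illustrative sketch, not the actual HRegion append/increment path), the 
 intended ordering and rollback look like this:
 {code}
 import java.io.IOException;

 // Illustrative only: apply to the memstore, sync the WAL, and roll the memstore
 // back if the sync throws, so scanners never see results that are not on the fs.
 final class RollbackSketch {
   interface Memstore { long apply(); void rollback(long addedSize); }
   interface Wal { void sync() throws IOException; }

   private final Memstore memstore;
   private final Wal wal;

   RollbackSketch(Memstore memstore, Wal wal) { this.memstore = memstore; this.wal = wal; }

   void appendOrIncrement() throws IOException {
     long addedSize = memstore.apply();   // 1. memstore is written first (0.94+ ordering)
     try {
       wal.sync();                        // 2. then the WAL is synced
     } catch (IOException e) {
       memstore.rollback(addedSize);      // 3. on failure, undo the memstore edit
       throw e;                           //    (and only advance mvcc after a successful sync)
     }
   }
 }
 {code}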



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-7840) Enhance the java it framework to start & stop a distributed hbase & hadoop cluster

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-7840:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 Enhance the java it framework to start & stop a distributed hbase & hadoop 
 cluster 
 ---

 Key: HBASE-7840
 URL: https://issues.apache.org/jira/browse/HBASE-7840
 Project: HBase
  Issue Type: New Feature
  Components: test
Affects Versions: 0.95.2
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
Priority: Minor
 Fix For: 0.99.0

 Attachments: 7840.v1.patch, 7840.v3.patch, 7840.v4.patch


 Needs are to use a development version of HBase & HDFS 1 & 2.
 Ideally, should be nicely backportable to 0.94 to allow comparisons and 
 regression tests between versions.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-5617) Provide coprocessor hooks in put flow while rollbackMemstore.

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5617:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 Provide coprocessor hooks in put flow while rollbackMemstore.
 -

 Key: HBASE-5617
 URL: https://issues.apache.org/jira/browse/HBASE-5617
 Project: HBase
  Issue Type: Improvement
  Components: Coprocessors
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.99.0

 Attachments: HBASE-5617_1.patch, HBASE-5617_2.patch


 With coprocessor hooks, while a put happens we have the provision to create new 
 puts to other tables or regions.  These puts can be done with writeToWal as 
 false.
 In 0.94 and above the puts are first written to the memstore and then to the WAL.  If 
 there is any failure in the WAL append or sync, the memstore is rolled back.  
 Now the problem is that if the put that happens in the main flow fails, there 
 is no way to roll back the 
 puts that happened in the prePut.
 We can add coprocessor hooks like pre/postRollBackMemStore.  Is any one 
 hook enough here?



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-9431) Set 'hbase.bulkload.retries.number' to 10 as HBASE-8450 claims

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9431:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 Set  'hbase.bulkload.retries.number' to 10 as HBASE-8450 claims
 ---

 Key: HBASE-9431
 URL: https://issues.apache.org/jira/browse/HBASE-9431
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.99.0

 Attachments: 9431.txt


 HBASE-8450 claims  'hbase.bulkload.retries.number' is set to 10 when it's 
 still 0 ([~jeffreyz] noticed).  Fix.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-5839) Backup master not starting up due to Bind Exception while starting HttpServer

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5839:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 Backup master not starting up due to Bind Exception while starting HttpServer
 -

 Key: HBASE-5839
 URL: https://issues.apache.org/jira/browse/HBASE-5839
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
 Fix For: 0.99.0


 The backup master tries to bind to the info port 60010.
 This is done once the backup master becomes active.  Even before that, the 
 Data Xceiver threads (IPC handlers) are started, and they are started on 
 random ports.  If 60010 is already in use, then when the standby master comes up 
 it fails due to a bind exception.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-6562) Fake KVs are sometimes passed to filters

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6562:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 Fake KVs are sometimes passed to filters
 

 Key: HBASE-6562
 URL: https://issues.apache.org/jira/browse/HBASE-6562
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 0.99.0

 Attachments: 6562-0.94-v1.txt, 6562-0.96-v1.txt, 6562-v2.txt, 
 6562-v3.txt, 6562-v4.txt, 6562-v5.txt, 6562.txt, minimalTest.java


 In internal tests at Salesforce we found that fake row keys are sometimes 
 passed to filters (Filter.filterRowKey(...) specifically).
 The KVs are eventually filtered by the StoreScanner/ScanQueryMatcher, but the 
 row key is passed to filterRowKey in RegionScannerImpl *before* that happens.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-8533) HBaseAdmin does not ride over cluster restart

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-8533:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 HBaseAdmin does not ride over cluster restart
 -

 Key: HBASE-8533
 URL: https://issues.apache.org/jira/browse/HBASE-8533
 Project: HBase
  Issue Type: Improvement
  Components: Client
Affects Versions: 0.98.0, 0.95.0
Reporter: Julian Zhou
Assignee: Julian Zhou
Priority: Minor
 Fix For: 0.99.0

 Attachments: 8533-0.95-v1.patch, 8533-trunk-v1.patch, 
 hbase-8533-trunk-v0.patch


 For the RESTful servlet (org.apache.hadoop.hbase.rest.Main (0.94), 
 org.apache.hadoop.hbase.rest.RESTServer (trunk)) on Jetty, we need to first 
 explicitly start the service (% ./bin/hbase-daemon.sh start rest -p 8000 ) 
 for applications to use. Here is a scenario: sometimes the HBase cluster is 
 stopped/started for maintenance, but REST is a separate standalone process, 
 which binds the HBaseAdmin in its constructor.
 An HBase stop/start causes this binding to be lost for the existing REST servlet. The REST 
 servlet still exists, retrying on the old bound HBaseAdmin until, a long time 
 later, an Unavailable is surfaced via an IOException caught in, 
 for example, RootResource.
 Could we pair the HBase service with the HBase REST service via some 
 start/stop options? There seems to be no reason to keep the REST servlet 
 process after HBase has stopped, and when HBase restarts, the original REST service 
 cannot resume and bind to the new HBase service via its old HBaseAdmin reference.
 So may we stop REST when HBase is stopped; or, even if HBase was killed by 
 accident, restarting HBase with the rest option could detect the old REST process, 
 kill it and start a new one to bind?
 From this point of view, an application relying on the REST API in the previous scenario 
 could immediately detect the problem when setting up the HTTP connection session instead 
 of wasting a long time failing back from an IOException with Unavailable from the 
 REST servlet.
 Putting the current options from the discussion history here, from Andrew, Stack and 
 Jean-Daniel:
 1) create an HBaseAdmin on demand in the REST servlet instead of keeping a 
 singleton instance (see the sketch below; another possible enhancement for the HBase client: 
 automatic reconnection of an open HBaseAdmin handle after a cluster bounce?);
 2) pair the REST webapp with the hbase webui so REST is always on with the 
 HBase service;
 3) add an option for the REST service (such as HBASE_MANAGES_REST) in 
 hbase-env.sh; when HBASE_MANAGES_REST is set to true, the scripts will start/stop the 
 REST server.
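 Option 1 in a few lines -- a minimal sketch assuming the 0.96-era client API, with the 
 REST resource plumbing omitted:
 {code}
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hbase.HBaseConfiguration;
 import org.apache.hadoop.hbase.client.HBaseAdmin;

 // Minimal sketch: build an HBaseAdmin on demand for each REST call and close it
 // afterwards, so a cluster restart never leaves the servlet holding a stale handle.
 public class OnDemandAdminExample {
   private final Configuration conf = HBaseConfiguration.create();

   public String clusterVersion() throws java.io.IOException {
     HBaseAdmin admin = new HBaseAdmin(conf);
     try {
       return admin.getClusterStatus().getHBaseVersion();
     } finally {
       admin.close();                     // never cache the handle across requests
     }
   }
 }
 {code}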



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-9513) Why is PE#RandomSeekScanTest way slower in 0.96 than in 0.94?

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9513:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 Why is PE#RandomSeekScanTest way slower in 0.96 than in 0.94?
 -

 Key: HBASE-9513
 URL: https://issues.apache.org/jira/browse/HBASE-9513
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.98.0, 0.99.0


 Our JMS reported this on the 0.96.0RC0 thread.  Our Matteo found something similar on 
 an offline thread.  What's up here?



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-9409) Backport 'HBASE-9391 Compilation problem in AccessController with JDK 6 (Andrew Purtell)'

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9409:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 Backport 'HBASE-9391  Compilation problem in AccessController with JDK 6 
 (Andrew Purtell)'
 --

 Key: HBASE-9409
 URL: https://issues.apache.org/jira/browse/HBASE-9409
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
 Fix For: 0.99.0


 Issue to add fix to next 0.96.0RC or to 0.96.1



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-7108) Don't use legal family name for system folder at region level

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-7108:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 Don't use legal family name for system folder at region level
 -

 Key: HBASE-7108
 URL: https://issues.apache.org/jira/browse/HBASE-7108
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.2, 0.94.2, 0.95.2
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Fix For: 0.99.0

 Attachments: HBASE-7108-v0.patch


 CHANGED, was: Don't allow recovered.edits as legal family name
 Region directories can contain folders called recovered.edits, related to log 
 splitting.
 But there's nothing that prevents a user from creating a family with that name...
 HLog.RECOVERED_EDITS_DIR = "recovered.edits";
 HRegion.MERGEDIR = "merges"; // fixed with HBASE-6158
 SplitTransaction.SPLITDIR = "splits"; // fixed with HBASE-6158



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-9445) Snapshots should create column family dirs for empty regions

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9445:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 Snapshots should create column family dirs for empty regions
 

 Key: HBASE-9445
 URL: https://issues.apache.org/jira/browse/HBASE-9445
 Project: HBase
  Issue Type: Bug
  Components: snapshots
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.98.0, 0.99.0

 Attachments: hbase-9445_v1.patch, hbase-9445_v2.patch, 
 hbase-9445_v3.patch


 Currently, taking a snapshot will not create the family directory under a 
 region if the family does not have any files in it. 
 Subsequent verification fails because of this. There is some logic in the 
 SnapshotTestingUtils.confirmSnapshotValid() to deal with empty family 
 directories, but I think we should create the family directories regardless 
 of whether there are any hfiles referencing them. 
 {code}
 2013-09-05 11:07:21,566 DEBUG [Thread-208] util.FSUtils(1687): |-data/
 2013-09-05 11:07:21,567 DEBUG [Thread-208] util.FSUtils(1687): |default/
 2013-09-05 11:07:21,568 DEBUG [Thread-208] util.FSUtils(1687): |---test/
 2013-09-05 11:07:21,569 DEBUG [Thread-208] util.FSUtils(1687): 
 |--.tabledesc/
 2013-09-05 11:07:21,570 DEBUG [Thread-208] util.FSUtils(1690): 
 |-.tableinfo.01
 2013-09-05 11:07:21,570 DEBUG [Thread-208] util.FSUtils(1687): 
 |--.tmp/
 2013-09-05 11:07:21,571 DEBUG [Thread-208] util.FSUtils(1687): 
 |--accd6e55887057888de758df44dacda7/
 2013-09-05 11:07:21,572 DEBUG [Thread-208] util.FSUtils(1690): 
 |-.regioninfo
 2013-09-05 11:07:21,572 DEBUG [Thread-208] util.FSUtils(1687): 
 |-fam/
 2013-09-05 11:07:21,555 DEBUG [Thread-208] util.FSUtils(1687): 
 |-.hbase-snapshot/
 2013-09-05 11:07:21,556 DEBUG [Thread-208] util.FSUtils(1687): |.tmp/
 2013-09-05 11:07:21,557 DEBUG [Thread-208] util.FSUtils(1687): 
 |offlineTableSnapshot/
 2013-09-05 11:07:21,558 DEBUG [Thread-208] util.FSUtils(1690): 
 |---.snapshotinfo
 2013-09-05 11:07:21,558 DEBUG [Thread-208] util.FSUtils(1687): 
 |---.tabledesc/
 2013-09-05 11:07:21,558 DEBUG [Thread-208] util.FSUtils(1690): 
 |--.tableinfo.01
 2013-09-05 11:07:21,559 DEBUG [Thread-208] util.FSUtils(1687): |---.tmp/
 2013-09-05 11:07:21,559 DEBUG [Thread-208] util.FSUtils(1687): 
 |---accd6e55887057888de758df44dacda7/
 2013-09-05 11:07:21,560 DEBUG [Thread-208] util.FSUtils(1690): 
 |--.regioninfo
 {code}
 I think this is important for 0.96.0. 



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-8642) [Snapshot] List and delete snapshot by table

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-8642:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 [Snapshot] List and delete snapshot by table
 

 Key: HBASE-8642
 URL: https://issues.apache.org/jira/browse/HBASE-8642
 Project: HBase
  Issue Type: Improvement
  Components: snapshots
Affects Versions: 0.98.0, 0.95.0, 0.95.1, 0.95.2
Reporter: Julian Zhou
Assignee: Julian Zhou
Priority: Minor
 Fix For: 0.99.0

 Attachments: 8642-trunk-0.95-v0.patch, 8642-trunk-0.95-v1.patch, 
 8642-trunk-0.95-v2.patch


 Support list and delete snapshot by table name.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-9892) Add info port to ServerName to support multi instances in a node

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9892:
-

Fix Version/s: (was: 0.96.1)

 Add info port to ServerName to support multi instances in a node
 

 Key: HBASE-9892
 URL: https://issues.apache.org/jira/browse/HBASE-9892
 Project: HBase
  Issue Type: Improvement
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Minor
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-9892-0.94-v1.diff, HBASE-9892-0.94-v2.diff, 
 HBASE-9892-0.94-v3.diff, HBASE-9892-0.94-v4.diff, HBASE-9892-0.94-v5.diff, 
 HBASE-9892-trunk-v1.diff, HBASE-9892-trunk-v1.patch, 
 HBASE-9892-trunk-v1.patch, HBASE-9892-trunk-v2.patch, HBASE-9892-v5.txt


 The full GC time of a regionserver with a big heap ( 30G ) usually cannot be 
 kept within 30s. At the same time, servers with 64G memory are common. 
 So we try to deploy multiple rs instances (2-3) in a single node, and the heap of 
 each rs is about 20G ~ 24G.
 Most things work fine, except the hbase web ui. The master gets the RS 
 info port from conf, which is not suitable for this situation of multiple rs 
 instances in a node. So we add the info port to ServerName.
 a. at startup, the rs reports its info port to the HMaster.
 b. For the root region, the rs writes the servername with info port to the zookeeper 
 root-region-server node.
 c. For meta regions, the rs writes the servername with info port to the root region. 
 d. For user regions, the rs writes the servername with info port to the meta regions. 
 So the hmaster and client can get the info port from the servername.
 To test this feature, I changed the rs num from 1 to 3 in standalone mode, so 
 we can test it in standalone mode.
 I think Hoya (hbase on yarn) will encounter the same problem.  Does anyone know 
 how Hoya handles this problem?
 PS: There are different formats for the servername in the zk node and the meta table; I 
 think we need to unify them and refactor the code.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-7579) HTableDescriptor equals method fails if results are returned in a different order

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-7579:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 HTableDescriptor equals method fails if results are returned in a different 
 order
 -

 Key: HBASE-7579
 URL: https://issues.apache.org/jira/browse/HBASE-7579
 Project: HBase
  Issue Type: Bug
  Components: Admin
Affects Versions: 0.94.6, 0.95.0
Reporter: Aleksandr Shulman
Assignee: Aleksandr Shulman
Priority: Minor
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-7579-0.94.patch, HBASE-7579-v1.patch, 
 HBASE-7579-v2.patch, HBASE-7579-v3.patch, HBASE-7579-v4.patch, 
 HBASE-7579-v5.patch


 HTableDescriptor's compareTo function compares a set of HColumnDescriptors 
 against another set of HColumnDescriptors. It iterates through both, relying 
 on the fact that they will be in the same order.
 In my testing, I may have seen this issue come up, so I decided to fix it.
 It's a straightforward fix. I convert the sets into a hashset for O(1) 
 lookups (at least in theory), then I check that all items in the first set 
 are found in the second.
 Since the sizes are the same, we know that if all elements showed up in the 
 second set, then they must be equal.
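 The described comparison, reduced to its essence (plain strings stand in for the 
 HColumnDescriptors; it assumes no duplicates, which holds for a set of families):
 {code}
 import java.util.Arrays;
 import java.util.HashSet;
 import java.util.List;
 import java.util.Set;

 // Sketch of the order-independent check: sizes must match and every element of the
 // first collection must be found in a HashSet built from the second.
 public class SetEqualsExample {
   static boolean sameFamilies(List<String> a, List<String> b) {
     if (a.size() != b.size()) return false;
     Set<String> lookup = new HashSet<>(b);          // O(1) expected lookups
     for (String fam : a) {
       if (!lookup.contains(fam)) return false;
     }
     return true;
   }

   public static void main(String[] args) {
     // true: same families, different order
     System.out.println(sameFamilies(Arrays.asList("fa1", "fa2"), Arrays.asList("fa2", "fa1")));
   }
 }
 {code}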



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-8091) Goraci test might rewrite data in case of map task failures, which eclipses data loss problems

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-8091:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 Goraci test might rewrite data in case of map task failures, which eclipses 
 data loss problems
 --

 Key: HBASE-8091
 URL: https://issues.apache.org/jira/browse/HBASE-8091
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.99.0


 As discussed in HBASE-8031, goraci map tasks will rewrite the same data if the 
 map task fails. Under some conditions this might overwrite 
 data that would otherwise have been reported as lost. 



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-9542) Have Get and MultiGet do cellblocks, currently they are pb all the time

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9542:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 Have Get and MultiGet do cellblocks, currently they are pb all the time
 ---

 Key: HBASE-9542
 URL: https://issues.apache.org/jira/browse/HBASE-9542
 Project: HBase
  Issue Type: Improvement
Reporter: stack
Priority: Critical
 Fix For: 0.99.0


 Probably better if we cellblock Gets and MultiGets rather than pb the results.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-9292) Syncer fails but we won't go down

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9292:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 Syncer fails but we won't go down
 -

 Key: HBASE-9292
 URL: https://issues.apache.org/jira/browse/HBASE-9292
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.95.2
 Environment: hadoop-2.1.0-beta and tip of 0.95 branch
Reporter: stack
 Fix For: 0.99.0


 Running some simple loading tests, I ran into the following on 
 hadoop-2.1.0-beta.
 {code}
 2013-08-20 16:51:56,310 DEBUG [regionserver60020.logRoller] 
 regionserver.LogRoller: HLog roll requested
 2013-08-20 16:51:56,314 DEBUG [regionserver60020.logRoller] wal.FSHLog: 
 cleanupCurrentWriter  waiting for transactions to get synced  total 655761 
 synced till here 655750
 2013-08-20 16:51:56,360 INFO  [regionserver60020.logRoller] wal.FSHLog: 
 Rolled WAL 
 /hbase/WALs/a2434.halxg.cloudera.com,60020,1377031955847/a2434.halxg.cloudera.com%2C60020%2C1377031955847.1377042714402
  with entries=985, filesize=122.5 M; new WAL 
 /hbase/WALs/a2434.halxg.cloudera.com,60020,1377031955847/a2434.halxg.cloudera.com%2C60020%2C1377031955847.1377042716311
 2013-08-20 16:51:56,378 WARN  [Thread-4788] hdfs.DFSClient: DataStreamer 
 Exception
 org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
 /hbase/WALs/a2434.halxg.cloudera.com,60020,1377031955847/a2434.halxg.cloudera.com%2C60020%2C1377031955847.1377042716311
  could only be replicated to 0 nodes instead of minReplication (=1).  There 
 are 5 datanode(s) running and no node(s) are excluded in this operation.
 at 
 org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2458)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:525)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
 at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2036)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2034)
 at org.apache.hadoop.ipc.Client.call(Client.java:1347)
 at org.apache.hadoop.ipc.Client.call(Client.java:1300)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
 at $Proxy13.addBlock(Unknown Source)
 at sun.reflect.GeneratedMethodAccessor39.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:188)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
 at $Proxy13.addBlock(Unknown Source)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
 at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:266)
 at $Proxy14.addBlock(Unknown Source)
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1220)
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1073)
 ...
 {code}
 Thereafter the server is up but useless and can't go down because it just 
 keeps doing this:
 {code}
 2013-08-20 16:51:56,380 FATAL [RpcServer.handler=3,port=60020] wal.FSHLog: 
 Could not sync. Requesting roll of hlog
 org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
 

[jira] [Updated] (HBASE-8529) checkOpen is missing from multi, mutate, get and multiGet etc.

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-8529:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 checkOpen is missing from multi, mutate, get and multiGet etc.
 --

 Key: HBASE-8529
 URL: https://issues.apache.org/jira/browse/HBASE-8529
 Project: HBase
  Issue Type: Bug
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
Priority: Minor
 Fix For: 0.98.0, 0.99.0


 I saw we have checkOpen in all those functions in 0.94 while they're missing 
 from trunk. Does anyone know why?
 For multi and mutate, if we don't call checkOpen we could flood our logs with a 
 bunch of 'DFSOutputStream is closed' errors when we sync WAL.
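 A minimal sketch of the kind of guard being discussed, assuming a simple stopped 
 flag (the names and signatures here are illustrative, not the actual region server 
 RPC code):
 {code}
 // Illustrative fragment: fail fast at the top of multi/mutate/get/multiGet
 // instead of failing later with noisy WAL/DFS errors.
 private volatile boolean stopped = false;

 private void checkOpen() throws IOException {
   if (stopped) {
     throw new IOException("Region server is stopping; rejecting request");
   }
 }

 public void multi(Object request) throws IOException {
   checkOpen();   // guard first, before touching any region or the WAL
   // ... existing multi handling ...
 }
 {code}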



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-7115) [shell] Provide a way to register custom filters with the Filter Language Parser

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-7115:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 [shell] Provide a way to register custom filters with the Filter Language 
 Parser
 

 Key: HBASE-7115
 URL: https://issues.apache.org/jira/browse/HBASE-7115
 Project: HBase
  Issue Type: Improvement
  Components: Filters, shell
Affects Versions: 0.95.2
Reporter: Aditya Kishore
Assignee: Aditya Kishore
 Fix For: 0.99.0

 Attachments: HBASE-7115_trunk.patch, HBASE-7115_trunk.patch, 
 HBASE-7115_trunk_v2.patch


 HBASE-5428 added this capability to thrift interface but the configuration 
 parameter name is thrift specific.
 This patch introduces a more generic parameter hbase.user.filters using 
 which the user defined custom filters can be specified in the configuration 
 and loaded in any client that needs to use the filter language parser.
 The patch then uses this new parameter to register any user specified filters 
 while invoking the HBase shell.
 Example usage: Let's say I have written a couple of custom filters with class 
 names *{{org.apache.hadoop.hbase.filter.custom.SuperDuperFilter}}* and 
 *{{org.apache.hadoop.hbase.filter.custom.SilverBulletFilter}}* and I want to 
 use them from HBase shell using the filter language.
 To do that, I would add the following configuration to {{hbase-site.xml}}
 {panel}{{<property>}}
 {{  <name>hbase.user.filters</name>}}
 {{  <value>}}*{{SuperDuperFilter}}*{{:org.apache.hadoop.hbase.filter.custom.SuperDuperFilter,}}*{{SilverBulletFilter}}*{{:org.apache.hadoop.hbase.filter.custom.SilverBulletFilter</value>}}
 {{</property>}}{panel}
 Once this is configured, I can launch HBase shell and use these filters in my 
 {{get}} or {{scan}} just the way I would use a built-in filter.
 {code}
 hbase(main):001:0> scan 't', {FILTER => SuperDuperFilter(true) AND 
 SilverBulletFilter(42)}
 ROW  COLUMN+CELL
  status  column=cf:a, 
 timestamp=30438552, value=world_peace
 1 row(s) in 0. seconds
 {code}
 To use this feature in any client, the client needs to make the following 
 function call as part of its initialization.
 {code}
 ParseFilter.registerUserFilters(configuration);
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-12-16 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849452#comment-13849452
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

IMHO this, in case of opens, promotes not being fault tolerant. In large 
clusters you cannot get around servers failing and regions closing and 
reopening. Snapshot should just be able to ride over that. Splits are more 
interesting.
Esp. if snapshots are used more (MR over snapshots), it may be nonviable to 
prevent splits and other operations for the duration of every snapshot, alter, 
...

 Generic framework for Master-coordinated tasks
 --

 Key: HBASE-5487
 URL: https://issues.apache.org/jira/browse/HBASE-5487
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver, Zookeeper
Affects Versions: 0.94.0
Reporter: Mubarak Seyed
Assignee: Sergey Shelukhin
Priority: Critical
 Attachments: Entity management in Master - part 1.pdf, Entity 
 management in Master - part 1.pdf, Is the FATE of Assignment Manager 
 FATE.pdf, Region management in Master.pdf, Region management in Master5.docx, 
 hbckMasterV2-long.pdf, hbckMasterV2b-long.pdf


 Need a framework to execute master-coordinated tasks in a fault-tolerant 
 manner. 
 Master-coordinated tasks such as online-scheme change and delete-range 
 (deleting region(s) based on start/end key) can make use of this framework.
 The advantages of framework are
 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for 
 master-coordinated tasks
 2. Ability to abstract the common functions across Master - ZK and RS - ZK
 3. Easy to plugin new master-coordinated tasks without adding code to core 
 components



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10136) Alter table conflicts with concurrent snapshot attempt on that table

2013-12-16 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849583#comment-13849583
 ] 

Sergey Shelukhin commented on HBASE-10136:
--

If a server fails the region will not stay open... I don't think it's a good 
idea to rely on that. Locking would work as a temporary fix I guess, for this 
particular interaction. But why cannot snapshot handle the general case of 
regions becoming unavailable? It's not like close-open takes time like recovery 
does during alter table.

 Alter table conflicts with concurrent snapshot attempt on that table
 

 Key: HBASE-10136
 URL: https://issues.apache.org/jira/browse/HBASE-10136
 Project: HBase
  Issue Type: Bug
  Components: snapshots
Affects Versions: 0.96.0, 0.98.1, 0.99.0
Reporter: Aleksandr Shulman
Assignee: Matteo Bertozzi
  Labels: online_schema_change

 Expected behavior:
 A user can issue a request for a snapshot of a table while that table is 
 undergoing an online schema change and expect that snapshot request to 
 complete correctly. Also, the same is true if a user issues an online schema 
 change request while a snapshot attempt is ongoing.
 Observed behavior:
 Snapshot attempts time out when there is an ongoing online schema change 
 because the region is closed and opened during the snapshot. 
 As a side-note, I would expect that the attempt should fail quickly as 
 opposed to timing out. 
 Further, what I have seen is that subsequent attempts to snapshot the table 
 fail because of some state/cleanup issues. This is also concerning.
 Immediate error:
 {code}type=FLUSH }' is still in progress!
 2013-12-11 15:58:32,883 DEBUG [Thread-385] client.HBaseAdmin(2696): (#11) 
 Sleeping: 1ms while waiting for snapshot completion.
 2013-12-11 15:58:42,884 DEBUG [Thread-385] client.HBaseAdmin(2704): Getting 
 current status of snapshot from master...
 2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] 
 master.HMaster(2891): Checking to see if snapshot from request:{ ss=snapshot0 
 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH } is done
 2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] 
 snapshot.SnapshotManager(374): Snapshoting '{ ss=snapshot0 
 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH }' is still in 
 progress!
 Snapshot failure occurred
 org.apache.hadoop.hbase.snapshot.SnapshotCreationException: Snapshot 
 'snapshot0' wasn't completed in expectedTime:6 ms
   at 
 org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2713)
   at 
 org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2638)
   at 
 org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2602)
   at 
 org.apache.hadoop.hbase.client.TestAdmin$BackgroundSnapshotThread.run(TestAdmin.java:1974){code}
 Likely root cause of error:
 {code}Exception in SnapshotSubprocedurePool
 java.util.concurrent.ExecutionException: 
 org.apache.hadoop.hbase.NotServingRegionException: 
 changeSchemaDuringSnapshot1386806258640,,1386806258720.ea776db51749e39c956d771a7d17a0f3.
  is closing
   at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
   at java.util.concurrent.FutureTask.get(FutureTask.java:83)
   at 
 org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:314)
   at 
 org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:118)
   at 
 org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:137)
   at 
 org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:181)
   at 
 org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:1)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 Caused by: org.apache.hadoop.hbase.NotServingRegionException: 
 changeSchemaDuringSnapshot1386806258640,,1386806258720.ea776db51749e39c956d771a7d17a0f3.
  is closing
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5327)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5289)
   at 
 org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:79)
 

[jira] [Commented] (HBASE-9047) Tool to handle finishing replication when the cluster is offline

2013-12-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849587#comment-13849587
 ] 

stack commented on HBASE-9047:
--

[~lhofhansl] You going to commit or would you like me too?

 Tool to handle finishing replication when the cluster is offline
 

 Key: HBASE-9047
 URL: https://issues.apache.org/jira/browse/HBASE-9047
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.96.0
Reporter: Jean-Daniel Cryans
Assignee: Demai Ni
 Fix For: 0.98.0, 0.94.15, 0.99.0

 Attachments: HBASE-9047-0.94-v1.patch, HBASE-9047-0.94.9-v0.PATCH, 
 HBASE-9047-trunk-v0.patch, HBASE-9047-trunk-v1.patch, 
 HBASE-9047-trunk-v2.patch, HBASE-9047-trunk-v3.patch, 
 HBASE-9047-trunk-v4.patch, HBASE-9047-trunk-v4.patch, 
 HBASE-9047-trunk-v5.patch, HBASE-9047-trunk-v6.patch, 
 HBASE-9047-trunk-v7.patch, HBASE-9047-trunk-v7.patch


 We're having a discussion on the mailing list about replicating the data on a 
 cluster that was shut down in an offline fashion. The motivation could be 
 that you don't want to bring HBase back up but still need that data on the 
 slave.
 So I have this idea of a tool that would be running on the master cluster 
 while it is down, although it could also run at any time. Basically it would 
 be able to read the replication state of each master region server, finish 
 replicating what's missing to all the slaves, and then clear that state in 
 zookeeper.
 The code that handles replication does most of that already, see 
 ReplicationSourceManager and ReplicationSource. Basically when 
 ReplicationSourceManager.init() is called, it will check all the queues in ZK 
 and try to grab those that aren't attached to a region server. If the whole 
 cluster is down, it will grab all of them.
 The beautiful thing here is that you could start that tool on all your 
 machines and the load will be spread out, but that might not be a big concern 
 if replication wasn't lagging since it would take a few seconds to finish 
 replicating the missing data for each region server.
 I'm guessing when starting ReplicationSourceManager you'd give it a fake 
 region server ID, and you'd tell it not to start its own source.
 FWIW the main difference in how replication is handled between Apache's HBase 
 and Facebook's is that the latter is always done separately from HBase itself. 
 This jira isn't about doing that.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-9484) Backport 8534 Fix coverage for org.apache.hadoop.hbase.mapreduce to 0.96

2013-12-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849590#comment-13849590
 ] 

stack commented on HBASE-9484:
--

[~ndimiduk] Looks like this issue should be resolved?  It was committed to 0.96?

 Backport 8534 Fix coverage for org.apache.hadoop.hbase.mapreduce to 0.96
 --

 Key: HBASE-9484
 URL: https://issues.apache.org/jira/browse/HBASE-9484
 Project: HBase
  Issue Type: Test
  Components: mapreduce, test
Reporter: Nick Dimiduk
Priority: Minor
 Fix For: 0.99.0

 Attachments: 
 0001-HBASE-9484-backport-8534-Fix-coverage-for-org.apache.patch






--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (HBASE-10175) 2-thread ChaosMonkey steps on its own toes

2013-12-16 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HBASE-10175:


 Summary: 2-thread ChaosMonkey steps on its own toes
 Key: HBASE-10175
 URL: https://issues.apache.org/jira/browse/HBASE-10175
 Project: HBase
  Issue Type: Improvement
  Components: test
Reporter: Sergey Shelukhin
Priority: Minor


ChaosMonkey with one destructive thread and one volatility (flush-compact-split-etc.) 
thread steps on its own toes and logs a lot of exceptions.
A simple solution would be to catch most (or all) of these, like 
NotServingRegionException, and log less (not a full callstack, for example; it's 
not very useful anyway); see the sketch below.
A more complicated/complementary one would be to keep track of which regions the 
destructive thread affects and use other regions for the volatile one.
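A minimal sketch of the "catch and log less" option (the admin/LOG names are 
illustrative, not the actual ChaosMonkey action classes):
{code}
try {
  admin.flush(tableName);   // or compact/split/move, etc.
} catch (NotServingRegionException e) {
  // Expected when the destructive thread has just killed or moved the region:
  // one log line is enough, the full stack trace adds no information.
  LOG.info("Skipping action, region not serving: " + e.getMessage());
}
{code}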



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Resolved] (HBASE-10143) Clean up dead local stores in FSUtils

2013-12-16 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark resolved HBASE-10143.
---

   Resolution: Fixed
Fix Version/s: 0.99.0
   0.96.2
   0.98.0

 Clean up dead local stores in FSUtils
 -

 Key: HBASE-10143
 URL: https://issues.apache.org/jira/browse/HBASE-10143
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 0.98.0, 0.96.0, 0.99.0
Reporter: Elliott Clark
Assignee: Elliott Clark
Priority: Minor
 Fix For: 0.98.0, 0.96.2, 0.99.0

 Attachments: HBASE-10143-0.patch






--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-8701) distributedLogReplay need to apply wal edits in the receiving order of those edits

2013-12-16 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849591#comment-13849591
 ] 

Ted Yu commented on HBASE-8701:
---

{code}
-  return Longs.compare(right.getMvccVersion(), left.getMvccVersion());
+  long leftChangeSeqNum = getReplaySeqNum(left);
+  if (leftChangeSeqNum < 0) {
+    leftChangeSeqNum = left.getMvccVersion();
+  }
+  long RightChangeSeqNum = getReplaySeqNum(right);
+  if (RightChangeSeqNum < 0) {
+    RightChangeSeqNum = right.getMvccVersion();
{code}
What would happen if one Cell has sequence Id but the other cell doesn't have 
sequence Id ?

Can you put the patch on review board ?

Thanks

 distributedLogReplay need to apply wal edits in the receiving order of those 
 edits
 --

 Key: HBASE-8701
 URL: https://issues.apache.org/jira/browse/HBASE-8701
 Project: HBase
  Issue Type: Bug
  Components: MTTR
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 0.98.0

 Attachments: 8701-v3.txt, hbase-8701-tag.patch, hbase-8701-v4.patch, 
 hbase-8701-v5.patch, hbase-8701-v6.patch, hbase-8701-v7.patch, 
 hbase-8701-v8.patch


 This issue happens in distributedLogReplay mode when recovering multiple puts 
 of the same key + version (timestamp). After replay, the value of the key is 
 nondeterministic.
 h5. The original concern was raised by [~eclark]:
 For all edits the rowkey is the same.
 There's a log with: [ A (ts = 0), B (ts = 0) ]
 Replay the first half of the log.
 A user puts in C (ts = 0)
 Memstore has to flush
 A new Hfile will be created with [ C, A ] and MaxSequenceId = C's seqid.
 Replay the rest of the Log.
 Flush
 The issue will happen in similar situation like Put(key, t=T) in WAL1 and 
 Put(key,t=T) in WAL2
 h5. Below is the option(proposed by Ted) I'd like to use:
 a) During replay, we pass original wal sequence number of each edit to the 
 receiving RS
 b) In receiving RS, we store negative original sequence number of wal edits 
 into mvcc field of KVs of wal edits
 c) Add handling of negative MVCC in KVScannerComparator and KVComparator   
 d) In receiving RS, write original sequence number into an optional field of 
 wal file for chained RS failure situation 
 e) When opening a region, we add a safety bumper(a large number) in order for 
 the new sequence number of a newly opened region not to collide with old 
 sequence numbers. 
 In the future, when we store sequence numbers along with KVs, we can adjust 
 the above solution a little bit by avoiding overloading the MVCC field.
 h5. The other alternative options are listed below for references:
 Option one
 a) disallow writes during recovery
 b) during replay, we pass original wal sequence ids
 c) hold flush till all wals of a recovering region are replayed. Memstore 
 should hold because we only recover unflushed wal edits. For edits with same 
 key + version, whichever with larger sequence Id wins.
 Option two
 a) During replay, we pass original wal sequence ids
 b) for each wal edit, we store each edit's original sequence id along with 
 its key. 
 c) during scanning, we use the original sequence id if it's present otherwise 
 its store file sequence Id
 d) compaction can just leave put with max sequence id
 Please let me know if you have better ideas.
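 A minimal sketch of the comparator fallback this implies (getReplaySeqNum() here is 
 a hypothetical helper that returns a negative value when no original sequence number 
 is attached to the cell):
 {code}
 // Illustrative only: prefer the original WAL sequence number carried with the
 // edit; fall back to the mvcc version when no replay sequence number is present.
 long leftSeq = getReplaySeqNum(left);
 if (leftSeq < 0) {
   leftSeq = left.getMvccVersion();
 }
 long rightSeq = getReplaySeqNum(right);
 if (rightSeq < 0) {
   rightSeq = right.getMvccVersion();
 }
 // Larger (newer) sequence numbers sort first, matching Longs.compare(right, left).
 return Longs.compare(rightSeq, leftSeq);
 {code}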



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10136) Alter table conflicts with concurrent snapshot attempt on that table

2013-12-16 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849592#comment-13849592
 ] 

Matteo Bertozzi commented on HBASE-10136:
-

[~sershe] we're not talking about snapshots here. Currently snapshots are built 
to fail if a region is moving or is down, and this is by design. If you want to 
talk about how to fix that, open another jira.

The problem here is the TableEventHandler and when the table lock is released: 
for example, if you call modifyTable() twice, or you have a split concurrent 
with modifyTable(), you don't get the behavior we expect from the table lock, 
which is that an operation on the table is locked out until the other one is 
completed.

The other problem, not completely related, that I'm pointing out is that since 
we have this async completion, the client call is not synchronous.

 Alter table conflicts with concurrent snapshot attempt on that table
 

 Key: HBASE-10136
 URL: https://issues.apache.org/jira/browse/HBASE-10136
 Project: HBase
  Issue Type: Bug
  Components: snapshots
Affects Versions: 0.96.0, 0.98.1, 0.99.0
Reporter: Aleksandr Shulman
Assignee: Matteo Bertozzi
  Labels: online_schema_change

 Expected behavior:
 A user can issue a request for a snapshot of a table while that table is 
 undergoing an online schema change and expect that snapshot request to 
 complete correctly. Also, the same is true if a user issues an online schema 
 change request while a snapshot attempt is ongoing.
 Observed behavior:
 Snapshot attempts time out when there is an ongoing online schema change 
 because the region is closed and opened during the snapshot. 
 As a side-note, I would expect that the attempt should fail quickly as 
 opposed to timing out. 
 Further, what I have seen is that subsequent attempts to snapshot the table 
 fail because of some state/cleanup issues. This is also concerning.
 Immediate error:
 {code}type=FLUSH }' is still in progress!
 2013-12-11 15:58:32,883 DEBUG [Thread-385] client.HBaseAdmin(2696): (#11) 
 Sleeping: 1ms while waiting for snapshot completion.
 2013-12-11 15:58:42,884 DEBUG [Thread-385] client.HBaseAdmin(2704): Getting 
 current status of snapshot from master...
 2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] 
 master.HMaster(2891): Checking to see if snapshot from request:{ ss=snapshot0 
 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH } is done
 2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] 
 snapshot.SnapshotManager(374): Snapshoting '{ ss=snapshot0 
 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH }' is still in 
 progress!
 Snapshot failure occurred
 org.apache.hadoop.hbase.snapshot.SnapshotCreationException: Snapshot 
 'snapshot0' wasn't completed in expectedTime:6 ms
   at 
 org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2713)
   at 
 org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2638)
   at 
 org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2602)
   at 
 org.apache.hadoop.hbase.client.TestAdmin$BackgroundSnapshotThread.run(TestAdmin.java:1974){code}
 Likely root cause of error:
 {code}Exception in SnapshotSubprocedurePool
 java.util.concurrent.ExecutionException: 
 org.apache.hadoop.hbase.NotServingRegionException: 
 changeSchemaDuringSnapshot1386806258640,,1386806258720.ea776db51749e39c956d771a7d17a0f3.
  is closing
   at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
   at java.util.concurrent.FutureTask.get(FutureTask.java:83)
   at 
 org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:314)
   at 
 org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:118)
   at 
 org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:137)
   at 
 org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:181)
   at 
 org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:1)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 Caused by: org.apache.hadoop.hbase.NotServingRegionException: 
 changeSchemaDuringSnapshot1386806258640,,1386806258720.ea776db51749e39c956d771a7d17a0f3.
  is 

[jira] [Commented] (HBASE-9892) Add info port to ServerName to support multi instances in a node

2013-12-16 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849602#comment-13849602
 ] 

Enis Soztutar commented on HBASE-9892:
--

Go for it!

 Add info port to ServerName to support multi instances in a node
 

 Key: HBASE-9892
 URL: https://issues.apache.org/jira/browse/HBASE-9892
 Project: HBase
  Issue Type: Improvement
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Minor
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-9892-0.94-v1.diff, HBASE-9892-0.94-v2.diff, 
 HBASE-9892-0.94-v3.diff, HBASE-9892-0.94-v4.diff, HBASE-9892-0.94-v5.diff, 
 HBASE-9892-trunk-v1.diff, HBASE-9892-trunk-v1.patch, 
 HBASE-9892-trunk-v1.patch, HBASE-9892-trunk-v2.patch, HBASE-9892-v5.txt


 The full GC time of a regionserver with a big heap (>30G) usually cannot be 
 kept within 30s. At the same time, servers with 64G of memory are the norm. 
 So we try to deploy multiple RS instances (2-3) on a single node, with the heap 
 of each RS at about 20G ~ 24G.
 Most things work fine, except the hbase web ui. The master gets the RS 
 info port from conf, which is not suitable for this situation of multiple RS 
 instances on a node. So we add the info port to ServerName.
 a. At startup, the RS reports its info port to HMaster.
 b. For the root region, the RS writes the servername with info port to the 
 zookeeper root-region-server node.
 c. For meta regions, the RS writes the servername with info port to the root region.
 d. For user regions, the RS writes the servername with info port to meta regions.
 So hmaster and clients can get the info port from the servername.
 To test this feature, I changed the rs num from 1 to 3 in standalone mode, so 
 it can be tested in standalone mode.
 I think Hoya (hbase on yarn) will encounter the same problem. Does anyone know 
 how Hoya handles this problem?
 PS: There are different formats for the servername in the zk node and meta table; 
 I think we need to unify them and refactor the code.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-9399) Up the memstore flush size

2013-12-16 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849603#comment-13849603
 ] 

Elliott Clark commented on HBASE-9399:
--

I finally got to try this and upping the memstore flush size to 512mb gave us 
about a 1% perf gain when running our IT tests.  Nothing huge so we can take it 
or leave it.

 Up the memstore flush size
 --

 Key: HBASE-9399
 URL: https://issues.apache.org/jira/browse/HBASE-9399
 Project: HBase
  Issue Type: Task
  Components: regionserver
Affects Versions: 0.98.0, 0.96.0
Reporter: Elliott Clark
Assignee: Elliott Clark
 Fix For: 0.98.0


 As heap sizes get bigger we are still recommending that users keep their 
 number of regions to a minimum.  This leads to lots of unused memstore 
 memory.
 For example, I have a region server with 48 gigs of ram.  30 gigs are there 
 for the region server.  With current defaults, the global memstore size 
 reserved is 8 gigs.
 The per-region memstore size is 128mb right now.  That means I need 80 
 regions actively taking writes to reach the global memstore size.  That 
 number is way out of line with what our split policies currently give users.  
 They are given far fewer regions by default.
 We should up the hbase.hregion.memstore.flush.size size.  Ideally we should 
 auto-tune everything.  But until then I think something like 512mb would help 
 a lot with our write throughput on clusters that don't have several hundred 
 regions per RS.
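 A minimal sketch of trying the larger value, assuming it is set through the usual 
 Configuration API (hbase-site.xml works equally well); 512mb is the number 
 suggested above:
 {code}
 Configuration conf = HBaseConfiguration.create();
 // Bump the per-region flush threshold from the 128mb default to 512mb.
 conf.setLong("hbase.hregion.memstore.flush.size", 512L * 1024 * 1024);
 {code}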



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-10146) Bump HTrace version to 2.04

2013-12-16 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-10146:
--

   Resolution: Fixed
Fix Version/s: 0.99.0
   0.96.2
   0.98.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

 Bump HTrace version to 2.04
 ---

 Key: HBASE-10146
 URL: https://issues.apache.org/jira/browse/HBASE-10146
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.96.1, 0.99.0
Reporter: Elliott Clark
Assignee: Elliott Clark
 Fix For: 0.98.0, 0.96.2, 0.99.0

 Attachments: HBASE-10146-0.patch


 2.04 has been released with a bug fix for what happens when htrace fails.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (HBASE-10176) Canary#sniff() should close the HTable instance

2013-12-16 Thread Ted Yu (JIRA)
Ted Yu created HBASE-10176:
--

 Summary: Canary#sniff() should close the HTable instance
 Key: HBASE-10176
 URL: https://issues.apache.org/jira/browse/HBASE-10176
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


{code}
  table = new HTable(admin.getConfiguration(), tableDesc.getName());
{code}
HTable instance should be closed by the end of the method.
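A minimal sketch of the suggested fix, using try/finally around the names from the 
snippet above:
{code}
HTable table = new HTable(admin.getConfiguration(), tableDesc.getName());
try {
  // ... sniff the regions of this table ...
} finally {
  table.close();   // release the resources held by the HTable instance
}
{code}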



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (HBASE-10177) Fix the netty dependency issue

2013-12-16 Thread Gaurav Menghani (JIRA)
Gaurav Menghani created HBASE-10177:
---

 Summary: Fix the netty dependency issue
 Key: HBASE-10177
 URL: https://issues.apache.org/jira/browse/HBASE-10177
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89-fb
Reporter: Gaurav Menghani
 Fix For: 0.89-fb


The netty developers changed their group id from org.jboss.netty to io.netty. 
As a result, the zookeeper and hadoop dependencies pull in the older netty 
(3.2.2) while the swift-related dependencies pull in the newer netty (3.7.0). 
We then get ClassNotFoundExceptions when the older 3.2.2 jar is picked up in 
place of 3.7.0. 



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-9927) ReplicationLogCleaner#stop() calls HConnectionManager#deleteConnection() unnecessarily

2013-12-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849607#comment-13849607
 ] 

Hudson commented on HBASE-9927:
---

FAILURE: Integrated in HBase-0.94-security #361 (See 
[https://builds.apache.org/job/HBase-0.94-security/361/])
HBASE-9927 ReplicationLogCleaner#stop() calls 
HConnectionManager#deleteConnection() unnecessarily (tedyu: rev 1551273)
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/master/ReplicationLogCleaner.java


 ReplicationLogCleaner#stop() calls HConnectionManager#deleteConnection() 
 unnecessarily
 --

 Key: HBASE-9927
 URL: https://issues.apache.org/jira/browse/HBASE-9927
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Fix For: 0.94.15

 Attachments: 9927.txt


 When inspecting the log, I found the following:
 {code}
 2013-11-08 18:23:48,472 ERROR [M:0;kiyo:42380.oldLogCleaner] 
 client.HConnectionManager(468): Connection not found in the list, can't 
 delete it (connection key=HConnectionKey{properties={hbase.rpc.timeout=6, 
 hbase.zookeeper.property.clientPort=59832, hbase.client.pause=100, 
 zookeeper.znode.parent=/hbase, hbase.client.retries.number=350, 
 hbase.zookeeper.quorum=localhost}, username='zy'}). May be the key was 
 modified?
 java.lang.Exception
 at 
 org.apache.hadoop.hbase.client.HConnectionManager.deleteConnection(HConnectionManager.java:468)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager.deleteConnection(HConnectionManager.java:404)
 at 
 org.apache.hadoop.hbase.replication.master.ReplicationLogCleaner.stop(ReplicationLogCleaner.java:141)
 at 
 org.apache.hadoop.hbase.master.cleaner.CleanerChore.cleanup(CleanerChore.java:276)
 {code}
 The call to HConnectionManager#deleteConnection() is not needed.
 Here is related code which has a comment for this effect:
 {code}
 // Not sure why we're deleting a connection that we never acquired or used
 HConnectionManager.deleteConnection(this.getConf());
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (HBASE-10178) Potential null object dereference in TablePermission#equals()

2013-12-16 Thread Ted Yu (JIRA)
Ted Yu created HBASE-10178:
--

 Summary: Potential null object dereference in 
TablePermission#equals()
 Key: HBASE-10178
 URL: https://issues.apache.org/jira/browse/HBASE-10178
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu


At line 326:
{code}
((namespace == null && other.getNamespace() == null) ||
 namespace.equals(other.getNamespace()))
{code}
If namespace is null but other.getNamespace() is not null, we would dereference a 
null object.
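A minimal sketch of a null-safe form of that clause (namespaceEquals() is a 
hypothetical helper, not part of the current code):
{code}
// Both null, or both non-null and equal; never dereferences a null namespace.
private static boolean namespaceEquals(String ns, String otherNs) {
  return (ns == null) ? (otherNs == null) : ns.equals(otherNs);
}
{code}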



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10044) test-patch.sh should accept documents by known file extensions

2013-12-16 Thread Jesse Yates (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849617#comment-13849617
 ] 

Jesse Yates commented on HBASE-10044:
-

seems reasonable to me. thx ted

 test-patch.sh should accept documents by known file extensions
 --

 Key: HBASE-10044
 URL: https://issues.apache.org/jira/browse/HBASE-10044
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: Ted Yu
 Fix For: 0.99.0

 Attachments: 10044-v1.txt, 10044-v2.txt, 10044-v3.txt


 Currently only htm[l] files are filtered out when test-patch.sh looks for 
 patch attachment.
 In the email thread, 'Extensions for patches accepted by QA bot', consensus 
 was to accept the following file extensions only:
 .patch
 .txt
 .diff



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10174) Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94

2013-12-16 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849621#comment-13849621
 ] 

Lars Hofhansl commented on HBASE-10174:
---

Nit:
{code}
-  private final Map<String, Counter> counts;
+  private final ConcurrentMap<String, Counter> counts = new 
ConcurrentHashMap<String, Counter>();
{code}
Declaration can remain Map.

When exactly does that become an issue? In the pom we require Guava 11.0.2.

 Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94
 -

 Key: HBASE-10174
 URL: https://issues.apache.org/jira/browse/HBASE-10174
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Ted Yu
 Fix For: 0.94.15

 Attachments: 9667-0.94.patch


 On user mailing list under the thread 'Guava 15', Kristoffer Sjögren reported 
 NoClassDefFoundError when he used Guava 15.
 The issue has been fixed in 0.96 + by HBASE-9667
 This JIRA ports the fix to 0.94 branch



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-9047) Tool to handle finishing replication when the cluster is offline

2013-12-16 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849627#comment-13849627
 ] 

Lars Hofhansl commented on HBASE-9047:
--

I'll do it today.

 Tool to handle finishing replication when the cluster is offline
 

 Key: HBASE-9047
 URL: https://issues.apache.org/jira/browse/HBASE-9047
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.96.0
Reporter: Jean-Daniel Cryans
Assignee: Demai Ni
 Fix For: 0.98.0, 0.94.15, 0.99.0

 Attachments: HBASE-9047-0.94-v1.patch, HBASE-9047-0.94.9-v0.PATCH, 
 HBASE-9047-trunk-v0.patch, HBASE-9047-trunk-v1.patch, 
 HBASE-9047-trunk-v2.patch, HBASE-9047-trunk-v3.patch, 
 HBASE-9047-trunk-v4.patch, HBASE-9047-trunk-v4.patch, 
 HBASE-9047-trunk-v5.patch, HBASE-9047-trunk-v6.patch, 
 HBASE-9047-trunk-v7.patch, HBASE-9047-trunk-v7.patch


 We're having a discussion on the mailing list about replicating the data on a 
 cluster that was shut down in an offline fashion. The motivation could be 
 that you don't want to bring HBase back up but still need that data on the 
 slave.
 So I have this idea of a tool that would be running on the master cluster 
 while it is down, although it could also run at any time. Basically it would 
 be able to read the replication state of each master region server, finish 
 replicating what's missing to all the slaves, and then clear that state in 
 zookeeper.
 The code that handles replication does most of that already, see 
 ReplicationSourceManager and ReplicationSource. Basically when 
 ReplicationSourceManager.init() is called, it will check all the queues in ZK 
 and try to grab those that aren't attached to a region server. If the whole 
 cluster is down, it will grab all of them.
 The beautiful thing here is that you could start that tool on all your 
 machines and the load will be spread out, but that might not be a big concern 
 if replication wasn't lagging since it would take a few seconds to finish 
 replicating the missing data for each region server.
 I'm guessing when starting ReplicationSourceManager you'd give it a fake 
 region server ID, and you'd tell it not to start its own source.
 FWIW the main difference in how replication is handled between Apache's HBase 
 and Facebook's is that the latter is always done separately from HBase itself. 
 This jira isn't about doing that.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-8701) distributedLogReplay need to apply wal edits in the receiving order of those edits

2013-12-16 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-8701:
-

Attachment: hbase-8701-tag-v1.patch

[~te...@apache.org] A good point. I've updated the patch and moved it to review 
board(https://reviews.apache.org/r/16304/). Thanks.

 distributedLogReplay need to apply wal edits in the receiving order of those 
 edits
 --

 Key: HBASE-8701
 URL: https://issues.apache.org/jira/browse/HBASE-8701
 Project: HBase
  Issue Type: Bug
  Components: MTTR
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 0.98.0

 Attachments: 8701-v3.txt, hbase-8701-tag-v1.patch, 
 hbase-8701-tag.patch, hbase-8701-v4.patch, hbase-8701-v5.patch, 
 hbase-8701-v6.patch, hbase-8701-v7.patch, hbase-8701-v8.patch


 This issue happens in distributedLogReplay mode when recovering multiple puts 
 of the same key + version (timestamp). After replay, the value of the key is 
 nondeterministic.
 h5. The original concern was raised by [~eclark]:
 For all edits the rowkey is the same.
 There's a log with: [ A (ts = 0), B (ts = 0) ]
 Replay the first half of the log.
 A user puts in C (ts = 0)
 Memstore has to flush
 A new Hfile will be created with [ C, A ] and MaxSequenceId = C's seqid.
 Replay the rest of the Log.
 Flush
 The issue will happen in similar situation like Put(key, t=T) in WAL1 and 
 Put(key,t=T) in WAL2
 h5. Below is the option(proposed by Ted) I'd like to use:
 a) During replay, we pass original wal sequence number of each edit to the 
 receiving RS
 b) In receiving RS, we store negative original sequence number of wal edits 
 into mvcc field of KVs of wal edits
 c) Add handling of negative MVCC in KVScannerComparator and KVComparator   
 d) In receiving RS, write original sequence number into an optional field of 
 wal file for chained RS failure situation 
 e) When opening a region, we add a safety bumper(a large number) in order for 
 the new sequence number of a newly opened region not to collide with old 
 sequence numbers. 
 In the future, when we store sequence numbers along with KVs, we can adjust 
 the above solution a little bit by avoiding overloading the MVCC field.
 h5. The other alternative options are listed below for references:
 Option one
 a) disallow writes during recovery
 b) during replay, we pass original wal sequence ids
 c) hold flush till all wals of a recovering region are replayed. Memstore 
 should hold because we only recover unflushed wal edits. For edits with same 
 key + version, whichever with larger sequence Id wins.
 Option two
 a) During replay, we pass original wal sequence ids
 b) for each wal edit, we store each edit's original sequence id along with 
 its key. 
 c) during scanning, we use the original sequence id if it's present otherwise 
 its store file sequence Id
 d) compaction can just leave put with max sequence id
 Please let me know if you have better ideas.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-9399) Up the memstore flush size

2013-12-16 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849635#comment-13849635
 ] 

Lars Hofhansl commented on HBASE-9399:
--

I am starting to benchmark the memstore. What I found so far is that (not too 
surprisingly) a lot of CPU time during an insert is spent in managing the CSLS. 
Making that larger should have minimal impact (if any). We might get better IO, 
since we're flushing larger initial files, but even that should be negligible 
unless we're IO bound on write.

For reads I almost want to flush sooner so that the data gets into the more 
scan friendly block format.


 Up the memstore flush size
 --

 Key: HBASE-9399
 URL: https://issues.apache.org/jira/browse/HBASE-9399
 Project: HBase
  Issue Type: Task
  Components: regionserver
Affects Versions: 0.98.0, 0.96.0
Reporter: Elliott Clark
Assignee: Elliott Clark
 Fix For: 0.98.0


 As heap sizes get bigger we are still recommending that users keep their 
 number of regions to a minimum.  This leads to lots of unused memstore 
 memory.
 For example, I have a region server with 48 gigs of ram.  30 gigs are there 
 for the region server.  With current defaults, the global memstore size 
 reserved is 8 gigs.
 The per-region memstore size is 128mb right now.  That means I need 80 
 regions actively taking writes to reach the global memstore size.  That 
 number is way out of line with what our split policies currently give users.  
 They are given far fewer regions by default.
 We should up the hbase.hregion.memstore.flush.size size.  Ideally we should 
 auto-tune everything.  But until then I think something like 512mb would help 
 a lot with our write throughput on clusters that don't have several hundred 
 regions per RS.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-10174) Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94

2013-12-16 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-10174:
---

Attachment: 10174-v2.txt

 Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94
 -

 Key: HBASE-10174
 URL: https://issues.apache.org/jira/browse/HBASE-10174
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Ted Yu
 Fix For: 0.94.15

 Attachments: 10174-v2.txt, 9667-0.94.patch


 On user mailing list under the thread 'Guava 15', Kristoffer Sjögren reported 
 NoClassDefFoundError when he used Guava 15.
 The issue has been fixed in 0.96 + by HBASE-9667
 This JIRA ports the fix to 0.94 branch



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10174) Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94

2013-12-16 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849641#comment-13849641
 ] 

Ted Yu commented on HBASE-10174:


Patch v2 addresses the comment on using Map.

The issue would surface if a user wants to upgrade to Guava 15 - e.g. if some 
user code uses Guava 15 and he/she wants to run a single release of Guava in 
the deployment.

 Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94
 -

 Key: HBASE-10174
 URL: https://issues.apache.org/jira/browse/HBASE-10174
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Ted Yu
 Fix For: 0.94.15

 Attachments: 10174-v2.txt, 9667-0.94.patch


 On user mailing list under the thread 'Guava 15', Kristoffer Sjögren reported 
 NoClassDefFoundError when he used Guava 15.
 The issue has been fixed in 0.96 + by HBASE-9667
 This JIRA ports the fix to 0.94 branch



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-9892) Add info port to ServerName to support multi instances in a node

2013-12-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849648#comment-13849648
 ] 

stack commented on HBASE-9892:
--

[~liushaohui] One thought I had before commit is what happens for say the case 
where it is an existing cluster and the znode has empty data?  Will we use the 
info port from the configuration?  What if we do a rolling restart when the 
znode has no data in it?  Who writes the znode data? Will new servers be able 
to work if the znode has no data in it?   Thanks.

 Add info port to ServerName to support multi instances in a node
 

 Key: HBASE-9892
 URL: https://issues.apache.org/jira/browse/HBASE-9892
 Project: HBase
  Issue Type: Improvement
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Minor
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-9892-0.94-v1.diff, HBASE-9892-0.94-v2.diff, 
 HBASE-9892-0.94-v3.diff, HBASE-9892-0.94-v4.diff, HBASE-9892-0.94-v5.diff, 
 HBASE-9892-trunk-v1.diff, HBASE-9892-trunk-v1.patch, 
 HBASE-9892-trunk-v1.patch, HBASE-9892-trunk-v2.patch, HBASE-9892-v5.txt


 The full GC time of a regionserver with a big heap (>30G) usually cannot be 
 kept within 30s. At the same time, servers with 64G of memory are the norm. 
 So we try to deploy multiple RS instances (2-3) on a single node, with the heap 
 of each RS at about 20G ~ 24G.
 Most things work fine, except the hbase web ui. The master gets the RS 
 info port from conf, which is not suitable for this situation of multiple RS 
 instances on a node. So we add the info port to ServerName.
 a. At startup, the RS reports its info port to HMaster.
 b. For the root region, the RS writes the servername with info port to the 
 zookeeper root-region-server node.
 c. For meta regions, the RS writes the servername with info port to the root region.
 d. For user regions, the RS writes the servername with info port to meta regions.
 So hmaster and clients can get the info port from the servername.
 To test this feature, I changed the rs num from 1 to 3 in standalone mode, so 
 it can be tested in standalone mode.
 I think Hoya (hbase on yarn) will encounter the same problem. Does anyone know 
 how Hoya handles this problem?
 PS: There are different formats for the servername in the zk node and meta table; 
 I think we need to unify them and refactor the code.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10174) Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94

2013-12-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849652#comment-13849652
 ] 

stack commented on HBASE-10174:
---

So, a client wants to do G15 so we hack on hbase to accommodate?  For sure this 
is the only issue when we jump G11 to G15?  We've tried on h1 and h2 cluster 
deploys?  No conflicts?

 Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94
 -

 Key: HBASE-10174
 URL: https://issues.apache.org/jira/browse/HBASE-10174
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Ted Yu
 Fix For: 0.94.15

 Attachments: 10174-v2.txt, 9667-0.94.patch


 On user mailing list under the thread 'Guava 15', Kristoffer Sjögren reported 
 NoClassDefFoundError when he used Guava 15.
 The issue has been fixed in 0.96 + by HBASE-9667
 This JIRA ports the fix to 0.94 branch



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-8701) distributedLogReplay need to apply wal edits in the receiving order of those edits

2013-12-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849651#comment-13849651
 ] 

Hadoop QA commented on HBASE-8701:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12618942/hbase-8701-tag.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8176//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8176//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8176//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8176//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8176//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8176//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8176//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8176//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8176//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8176//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8176//console

This message is automatically generated.

 distributedLogReplay need to apply wal edits in the receiving order of those 
 edits
 --

 Key: HBASE-8701
 URL: https://issues.apache.org/jira/browse/HBASE-8701
 Project: HBase
  Issue Type: Bug
  Components: MTTR
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 0.98.0

 Attachments: 8701-v3.txt, hbase-8701-tag-v1.patch, 
 hbase-8701-tag.patch, hbase-8701-v4.patch, hbase-8701-v5.patch, 
 hbase-8701-v6.patch, hbase-8701-v7.patch, hbase-8701-v8.patch


 This issue happens in distributedLogReplay mode when recovering multiple puts 
 of the same key + version (timestamp). After replay, the value of the key is 
 nondeterministic.
 h5. The original concern was raised by [~eclark]:
 For all edits the rowkey is the same.
 There's a log with: [ A (ts = 0), B (ts = 0) ]
 Replay the first half of the log.
 A user puts in C (ts = 0)
 Memstore has to flush
 A new Hfile will be created with [ C, A ] and MaxSequenceId = C's seqid.
 Replay the rest of the Log.
 Flush
 The issue will happen in similar situation like Put(key, t=T) in WAL1 and 
 Put(key,t=T) in WAL2
 h5. Below is the option(proposed by Ted) I'd like to use:
 a) During replay, we pass original wal sequence number of each edit to the 
 receiving RS
 b) In receiving RS, we store negative original sequence number of wal edits 
 into mvcc field of KVs of wal edits
 c) Add handling of negative MVCC in KVScannerComparator and KVComparator   
 d) In receiving RS, write original sequence number into an optional field of 
 wal file for chained RS failure situation 
 e) When opening a region, we add a safety bumper(a large number) in order for 
 the new sequence number of a newly opened region not to collide with old 
 sequence numbers. 
 In the 

[jira] [Created] (HBASE-10179) HRegionServer underreports readRequestCounts by 1 under certain conditions

2013-12-16 Thread Perry Trolard (JIRA)
Perry Trolard created HBASE-10179:
-

 Summary: HRegionServer underreports readRequestCounts by 1 under 
certain conditions
 Key: HBASE-10179
 URL: https://issues.apache.org/jira/browse/HBASE-10179
 Project: HBase
  Issue Type: Bug
  Components: metrics
Affects Versions: 0.94.6, 0.99.0
Reporter: Perry Trolard
Priority: Minor


In HRegionServer.scan(), if

 (a) the number of results returned, n, is greater than zero
 (b) but less than the size of the batch (nbRows)
 (c) and the size in bytes is smaller than the max size (maxScannerResultSize)

then the readRequestCount will be reported as n - 1 rather than n. (This is 
because the for-loop counter i is used to update the readRequestCount, and if 
the scan runs out of rows before reaching max rows or size, the code `break`s 
out of the loop and i is not incremented for the final time.)

To reproduce, create a test table and open its details page in the web UI. 
Insert a single row, then note the current request count, c. Scan the table, 
returning 1 row; the request count will still be c, whereas it should be c + 1.

I have a patch against TRUNK I can submit. At Splice Machine we're running 
0.94,  I'd be happy to submit a patch against that as well. 




--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-10179) HRegionServer underreports readRequestCounts by 1 under certain conditions

2013-12-16 Thread Perry Trolard (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Perry Trolard updated HBASE-10179:
--

Fix Version/s: 0.99.0
   Status: Patch Available  (was: Open)

 HRegionServer underreports readRequestCounts by 1 under certain conditions
 --

 Key: HBASE-10179
 URL: https://issues.apache.org/jira/browse/HBASE-10179
 Project: HBase
  Issue Type: Bug
  Components: metrics
Affects Versions: 0.94.6, 0.99.0
Reporter: Perry Trolard
Priority: Minor
 Fix For: 0.99.0


 In HRegionServer.scan(), if
  (a) the number of results returned, n, is greater than zero
  (b) but less than the size of the batch (nbRows)
  (c) and the size in bytes is smaller than the max size (maxScannerResultSize)
 then the readRequestCount will be reported as n - 1 rather than n. (This is 
 because the for-loop counter i is used to update the readRequestCount, and if 
 the scan runs out of rows before reaching max rows or size, the code `break`s 
 out of the loop and i is not incremented for the final time.)
 To reproduce, create a test table and open its details page in the web UI. 
 Insert a single row, then note the current request count, c. Scan the table, 
 returning 1 row; the request count will still be c, whereas it should be c + 
 1.
 I have a patch against TRUNK I can submit. At Splice Machine we're running 
 0.94, so I'd be happy to submit a patch against that as well. 



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-10179) HRegionServer underreports readRequestCounts by 1 under certain conditions

2013-12-16 Thread Perry Trolard (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Perry Trolard updated HBASE-10179:
--

Attachment: 10179-against-trunk.diff

 HRegionServer underreports readRequestCounts by 1 under certain conditions
 --

 Key: HBASE-10179
 URL: https://issues.apache.org/jira/browse/HBASE-10179
 Project: HBase
  Issue Type: Bug
  Components: metrics
Affects Versions: 0.94.6, 0.99.0
Reporter: Perry Trolard
Priority: Minor
 Fix For: 0.99.0

 Attachments: 10179-against-trunk.diff


 In HRegionServer.scan(), if
  (a) the number of results returned, n, is greater than zero
  (b) but less than the size of the batch (nbRows)
  (c) and the size in bytes is smaller than the max size (maxScannerResultSize)
 then the readRequestCount will be reported as n - 1 rather than n. (This is 
 because the for-loop counter i is used to update the readRequestCount, and if 
 the scan runs out of rows before reaching max rows or size, the code `break`s 
 out of the loop and i is not incremented for the final time.)
 To reproduce, create a test table and open its details page in the web UI. 
 Insert a single row, then note the current request count, c. Scan the table, 
 returning 1 row; the request count will still be c, whereas it should be c + 
 1.
 I have a patch against TRUNK I can submit. At Splice Machine we're running 
 0.94, so I'd be happy to submit a patch against that as well. 



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10174) Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94

2013-12-16 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849661#comment-13849661
 ] 

Lars Hofhansl commented on HBASE-10174:
---

It's certainly better to make HBase work with the various versions of Guava out 
there. 

I'm still confused, though. This would be an issue if the *client* used the 
Guava classes, right? Surely nobody would need to upgrade Guava on the HBase 
server (unless one has custom filters or coprocessors that use a newer Guava, 
in which case I'd say tough luck).

If we use Guava on the client, I'd agree that we not force a client application 
to a specific version of Guava just due to the way we're using it. The changed 
classes are server only?


 Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94
 -

 Key: HBASE-10174
 URL: https://issues.apache.org/jira/browse/HBASE-10174
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Ted Yu
 Fix For: 0.94.15

 Attachments: 10174-v2.txt, 9667-0.94.patch


 On the user mailing list, under the thread 'Guava 15', Kristoffer Sjögren 
 reported a NoClassDefFoundError when he used Guava 15.
 The issue has been fixed in 0.96+ by HBASE-9667.
 This JIRA ports the fix to the 0.94 branch.
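 The general shape of the fix is to stop depending on the class that Guava 
 removed. As a rough illustration only (not necessarily the exact change made 
 by HBASE-9667 or its 0.94 backport), a local no-op stream removes the version 
 sensitivity entirely:
{code}
import java.io.OutputStream;

// A drop-in stand-in for com.google.common.io.NullOutputStream, which was
// removed in Guava 15: every write is silently discarded.
public final class NullOutputStream extends OutputStream {
  @Override
  public void write(int b) {
    // discard
  }

  @Override
  public void write(byte[] b, int off, int len) {
    // discard
  }
}
{code}
 Code that only needs to measure a serialized size can then write to this 
 stream regardless of which Guava version is on the classpath.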



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-8329) Limit compaction speed

2013-12-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849660#comment-13849660
 ] 

Hadoop QA commented on HBASE-8329:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12593121/HBASE-8329-8-trunk.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified tests.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8178//console

This message is automatically generated.

 Limit compaction speed
 --

 Key: HBASE-8329
 URL: https://issues.apache.org/jira/browse/HBASE-8329
 Project: HBase
  Issue Type: Improvement
  Components: Compaction
Reporter: binlijin
Assignee: binlijin
 Fix For: 0.99.0

 Attachments: HBASE-8329-2-trunk.patch, HBASE-8329-3-trunk.patch, 
 HBASE-8329-4-trunk.patch, HBASE-8329-5-trunk.patch, HBASE-8329-6-trunk.patch, 
 HBASE-8329-7-trunk.patch, HBASE-8329-8-trunk.patch, HBASE-8329-trunk.patch


 There is no speed or resource limit for compaction. I think we should add this 
 feature, especially during request bursts.
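 As a rough sketch of what such a limit could look like (purely illustrative; 
 this is not the attached patch, and the names and the one-second window are 
 assumptions), a compaction writer could call a simple bytes-per-second 
 throttle after each block it writes:
{code}
// Coarse bytes-per-second throttle: sleeps out the remainder of the current
// one-second window once the byte budget for that window is spent.
public class CompactionThrottle {
  private final long bytesPerSecond;
  private long windowStart = System.currentTimeMillis();
  private long bytesInWindow = 0;

  public CompactionThrottle(long bytesPerSecond) {
    this.bytesPerSecond = bytesPerSecond;
  }

  // Call after writing 'bytes' of compacted output; blocks if we are
  // writing faster than the configured limit.
  public synchronized void throttle(long bytes) throws InterruptedException {
    bytesInWindow += bytes;
    long now = System.currentTimeMillis();
    long elapsed = now - windowStart;
    if (elapsed >= 1000) {                 // start a new one-second window
      windowStart = now;
      bytesInWindow = bytes;
      return;
    }
    if (bytesInWindow > bytesPerSecond) {  // budget exhausted: sleep it out
      Thread.sleep(1000 - elapsed);
      windowStart = System.currentTimeMillis();
      bytesInWindow = 0;
    }
  }
}
{code}
 A store could then be capped at, say, 20 MB/s of compaction output by 
 constructing the throttle with that limit and calling throttle() after every 
 flushed block.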



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-5617) Provide coprocessor hooks in put flow while rollbackMemstore.

2013-12-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849665#comment-13849665
 ] 

Hadoop QA commented on HBASE-5617:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12520451/HBASE-5617_2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified tests.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8179//console

This message is automatically generated.

 Provide coprocessor hooks in put flow while rollbackMemstore.
 -

 Key: HBASE-5617
 URL: https://issues.apache.org/jira/browse/HBASE-5617
 Project: HBase
  Issue Type: Improvement
  Components: Coprocessors
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.99.0

 Attachments: HBASE-5617_1.patch, HBASE-5617_2.patch


 With coprocessor hooks in the put flow we have the provision to create new 
 puts to other tables or regions.  These puts can be done with writeToWal as 
 false.
 In 0.94 and above the puts are first written to the memstore and then to the 
 WAL.  If the WAL append or sync fails, the memstore is rolled back.
 Now the problem is that if the put in the main flow fails, there is no way to 
 roll back the puts that happened in prePut.
 We can add coprocessor hooks like pre/postRollBackMemStore.  Is one hook 
 enough here?
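 A sketch of how the proposed hook might be used. The hook does not exist yet, 
 so everything below (the method names, the index-table writes) is assumed 
 purely for illustration:
{code}
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical observer: it issues secondary puts from prePut and, if the
// proposed postRollBackMemStore hook fires, compensates them again.
public class SidePutObserverSketch {
  // Secondary rows written for the current (not yet committed) primary put.
  private final Deque<String> pendingSideRows = new ArrayDeque<>();

  // Would run in prePut: remember what was written elsewhere so it can be undone.
  public void prePut(String primaryRow) {
    String sideRow = "index-" + primaryRow;   // hypothetical secondary write
    pendingSideRows.push(sideRow);
    // ... issue the secondary Put (writeToWal = false) to the index table ...
  }

  // Would run in the proposed postRollBackMemStore hook: the primary put
  // failed after the memstore write, so compensate the secondary writes.
  public void postRollBackMemStore() {
    while (!pendingSideRows.isEmpty()) {
      String sideRow = pendingSideRows.pop();
      // ... issue a Delete for sideRow against the index table ...
      System.out.println("compensating " + sideRow);
    }
  }
}
{code}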



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

