[jira] [Updated] (HBASE-16744) Procedure V2 - Lock procedures to allow clients to acquire locks on tables/namespaces/regions

2016-10-12 Thread Appy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Appy updated HBASE-16744:
-
Attachment: HBASE-16744.master.006.patch

> Procedure V2 - Lock procedures to allow clients to acquire locks on 
> tables/namespaces/regions
> -
>
> Key: HBASE-16744
> URL: https://issues.apache.org/jira/browse/HBASE-16744
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Appy
>Assignee: Appy
> Attachments: HBASE-16744.master.001.patch, 
> HBASE-16744.master.002.patch, HBASE-16744.master.003.patch, 
> HBASE-16744.master.004.patch, HBASE-16744.master.005.patch, 
> HBASE-16744.master.006.patch
>
>
> This will help us get rid of ZK locks.
> It will also be useful for external tools like hbck, a future backup manager, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16807) RegionServer will fail to report new active HMaster until HMaster/RegionServer failover

2016-10-12 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570846#comment-15570846
 ] 

Heng Chen commented on HBASE-16807:
---

Got it. +1 for it. 

> RegionServer will fail to report new active HMaster until 
> HMaster/RegionServer failover
> ---
>
> Key: HBASE-16807
> URL: https://issues.apache.org/jira/browse/HBASE-16807
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
> Fix For: 2.0.0
>
> Attachments: HBASE-16807.patch
>
>
> It's a little weird, but it happened in the production environment that a few 
> RegionServers missed the master znode creation notification on master failover. In 
> that case ZooKeeperNodeTracker will not refresh the cached data, and 
> MasterAddressTracker will keep returning the old active HMaster's details to the 
> region server on ServiceException.
> Though we re-create the region server stub on failure, we do so without refreshing 
> the MasterAddressTracker data.
> In HRegionServer.createRegionServerStatusStub():
> {code}
>   boolean refresh = false; // for the first time, use cached data
> RegionServerStatusService.BlockingInterface intf = null;
> boolean interrupted = false;
> try {
>   while (keepLooping()) {
> sn = this.masterAddressTracker.getMasterAddress(refresh);
> if (sn == null) {
>   if (!keepLooping()) {
> // give up with no connection.
> LOG.debug("No master found and cluster is stopped; bailing out");
> return null;
>   }
>   if (System.currentTimeMillis() > (previousLogTime + 1000)) {
> LOG.debug("No master found; retry");
> previousLogTime = System.currentTimeMillis();
>   }
>   refresh = true; // let's try pull it from ZK directly
>   if (sleep(200)) {
> interrupted = true;
>   }
>   continue;
> }
> {code}
> Here we refresh the node only when 'sn' is NULL; otherwise it uses the same 
> cached data. 
> So in the above case the RegionServer will never report to the active HMaster 
> successfully until an HMaster failover or a RegionServer restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16807) RegionServer will fail to report new active HMaster until HMaster/RegionServer failover

2016-10-12 Thread Pankaj Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570844#comment-15570844
 ] 

Pankaj Kumar commented on HBASE-16807:
--

On ServiceException, instead of using the cached data, it will refresh the master 
address znode from ZK and then re-create the RS stub.
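
A rough sketch of that idea (hypothetical method, field and log names; not the attached patch):

{code}
// Illustration only: when a report to the master fails with ServiceException,
// drop the stale stub and force a fresh read of the master znode instead of
// trusting MasterAddressTracker's cached address.
private void handleReportFailure(ServiceException se) {
  LOG.warn("Failed to report to master; refreshing master address from ZK", se);
  this.rssStub = null;  // discard the stub built against the old master address
  ServerName refreshed = this.masterAddressTracker.getMasterAddress(true); // refresh = true
  if (refreshed != null) {
    LOG.info("Active master is now " + refreshed + "; re-creating the status stub");
  }
  // the next createRegionServerStatusStub() call then connects to the refreshed address
}
{code}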

> RegionServer will fail to report new active HMaster until 
> HMaster/RegionServer failover
> ---
>
> Key: HBASE-16807
> URL: https://issues.apache.org/jira/browse/HBASE-16807
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
> Fix For: 2.0.0
>
> Attachments: HBASE-16807.patch
>
>
> It's a little weird, but it happened in the production environment that a few 
> RegionServers missed the master znode creation notification on master failover. In 
> that case ZooKeeperNodeTracker will not refresh the cached data, and 
> MasterAddressTracker will keep returning the old active HMaster's details to the 
> region server on ServiceException.
> Though we re-create the region server stub on failure, we do so without refreshing 
> the MasterAddressTracker data.
> In HRegionServer.createRegionServerStatusStub():
> {code}
>   boolean refresh = false; // for the first time, use cached data
> RegionServerStatusService.BlockingInterface intf = null;
> boolean interrupted = false;
> try {
>   while (keepLooping()) {
> sn = this.masterAddressTracker.getMasterAddress(refresh);
> if (sn == null) {
>   if (!keepLooping()) {
> // give up with no connection.
> LOG.debug("No master found and cluster is stopped; bailing out");
> return null;
>   }
>   if (System.currentTimeMillis() > (previousLogTime + 1000)) {
> LOG.debug("No master found; retry");
> previousLogTime = System.currentTimeMillis();
>   }
>   refresh = true; // let's try pull it from ZK directly
>   if (sleep(200)) {
> interrupted = true;
>   }
>   continue;
> }
> {code}
> Here we refresh the node only when 'sn' is NULL; otherwise it uses the same 
> cached data. 
> So in the above case the RegionServer will never report to the active HMaster 
> successfully until an HMaster failover or a RegionServer restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16816) HMaster.move() should throw exception if region to move is not online

2016-10-12 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570842#comment-15570842
 ] 

Ted Yu commented on HBASE-16816:


Do the above tests pass locally with the patch?

> HMaster.move() should throw exception if region to move is not online
> -
>
> Key: HBASE-16816
> URL: https://issues.apache.org/jira/browse/HBASE-16816
> Project: HBase
>  Issue Type: Bug
>  Components: Admin
>Affects Versions: 1.1.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Minor
> Fix For: 1.4.0
>
> Attachments: HBASE-16816-branch-1-v2.patch, 
> HBASE-16816-branch-1-v3.patch, HBASE-16816-branch-1.patch
>
>
> The move region function in HMaster only checks whether the region to move 
> exists
> {code}
> if (regionState == null) {
>   throw new 
> UnknownRegionException(Bytes.toStringBinary(encodedRegionName));
> }
> {code}
> It will not return anything if the region is split or in transition, i.e. not 
> movable. So the caller has no way to know whether the move region operation 
> failed.
> It is a problem for "region_move.rb": it only gives up moving a region if an 
> exception is thrown. Otherwise, it will wait until a timeout and retry. 
> Without an exception, it has no idea the region is not movable.
> {code}
> begin
>   admin.move(Bytes.toBytes(r.getEncodedName()), Bytes.toBytes(newServer))
> rescue java.lang.reflect.UndeclaredThrowableException,
> org.apache.hadoop.hbase.UnknownRegionException => e
>   $LOG.info("Exception moving "  + r.getEncodedName() +
> "; split/moved? Continuing: " + e)
>   return
> end
>  # Wait till its up on new server before moving on
> maxWaitInSeconds = admin.getConfiguration.getInt("hbase.move.wait.max", 
> 60)
> maxWait = Time.now + maxWaitInSeconds
> while Time.now < maxWait
>   same = isSameServer(admin, r, original)
>   break unless same
>   sleep 0.1
> end
>   end
> {code}
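> A minimal sketch (hypothetical, not the attached patch) of the kind of check being 
> discussed for HMaster.move(): besides the existing null check, reject regions that 
> are split or not open, so callers such as region_move.rb get an exception instead 
> of silently retrying until the timeout.
> {code}
> if (regionState == null) {
>   throw new UnknownRegionException(Bytes.toStringBinary(encodedRegionName));
> }
> if (regionState.isSplit() || !regionState.isOpened()) {
>   // Surface the failure so the caller can stop retrying immediately.
>   throw new DoNotRetryIOException("Region " + regionState.getRegion().getEncodedName()
>       + " is not online (state=" + regionState.getState() + ") and cannot be moved");
> }
> {code}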



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16698) Performance issue: handlers stuck waiting for CountDownLatch inside WALKey#getWriteEntry under high writing workload

2016-10-12 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570840#comment-15570840
 ] 

Heng Chen commented on HBASE-16698:
---

The numbers seem to be for 20 regions on one RS.  If you have time, please upload 
numbers for one region on one RS.  I am very interested in that.  As [~stack] said, 
setting it to off by default is good for me.

BTW, the patch LGTM. +1 for it.

> Performance issue: handlers stuck waiting for CountDownLatch inside 
> WALKey#getWriteEntry under high writing workload
> 
>
> Key: HBASE-16698
> URL: https://issues.apache.org/jira/browse/HBASE-16698
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Affects Versions: 1.1.6, 1.2.3
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0, 1.3.0, 1.4.0
>
> Attachments: HBASE-16698.branch-1.patch, HBASE-16698.patch, 
> HBASE-16698.v2.patch, hadoop0495.et2.jstack
>
>
> As titled, in our production environment we observed 98 out of 128 handlers 
> stuck waiting for the CountDownLatch {{seqNumAssignedLatch}} inside 
> {{WALKey#getWriteEntry}} under a high writing workload.
> After digging into the problem, we found that it is mainly caused by 
> advancing mvcc in the append logic. Below is some detailed analysis:
> Under the current branch-1 code logic, all batch puts call 
> {{WALKey#getWriteEntry}} after appending the edit to the WAL, and 
> {{seqNumAssignedLatch}} is only released when the corresponding append call is 
> handled by RingBufferEventHandler (see {{FSWALEntry#stampRegionSequenceId}}). 
> Because we currently use a single event handler for the ring buffer, the 
> append calls are handled one by one (actually a lot of our current logic 
> depends on this sequential handling), and this becomes a bottleneck 
> under a high writing workload.
> The worst part is that by default we only use one WAL per RS, so appends on 
> all regions are handled sequentially, which causes contention among 
> different regions...
> To fix this, we could take advantage of the "sequential appends" mechanism: 
> grab the WriteEntry before publishing the append onto the ring buffer and 
> use it as the sequence id; we only need to add a lock to make "grab 
> WriteEntry" and "append edit" a single transaction. This will still cause 
> contention inside a region but avoids contention between different regions. This 
> solution has already been verified in our online environment and proved to be 
> effective.
> Notice that for the master (2.0) branch, since we already changed the write 
> pipeline to sync before writing the memstore (HBASE-15158), this issue only 
> exists for the ASYNC_WAL write scenario.
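> A toy illustration of that locking idea (plain Java with made-up names, not the 
> actual WAL code): the sequence number is assigned before the entry is published to 
> the single-consumer queue, so producers never block on a latch waiting for the 
> consumer to stamp it.
> {code}
> import java.util.concurrent.ArrayBlockingQueue;
> import java.util.concurrent.BlockingQueue;
> import java.util.concurrent.atomic.AtomicLong;
>
> public class PreAssignedSeqIdDemo {
>   private final AtomicLong seq = new AtomicLong();
>   private final BlockingQueue<long[]> ringBuffer = new ArrayBlockingQueue<>(1024);
>   private final Object appendLock = new Object();
>
>   /** "Grab WriteEntry" and "append edit" done as a single step under a lock. */
>   long append(long payload) throws InterruptedException {
>     synchronized (appendLock) {
>       long seqId = seq.incrementAndGet();            // sequence id known up front, no latch
>       ringBuffer.put(new long[] { seqId, payload }); // hand off to the single handler
>       return seqId;
>     }
>   }
> }
> {code}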



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16814) FuzzyRowFilter causes remote call timeout

2016-10-12 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570833#comment-15570833
 ] 

ramkrishna.s.vasudevan commented on HBASE-16814:


Oh, you mean this is a client-side bug? But the scan is throwing a timeout 
exception. 

> FuzzyRowFilter causes remote call timeout
> -
>
> Key: HBASE-16814
> URL: https://issues.apache.org/jira/browse/HBASE-16814
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.2.2, 1.2.3
> Environment: LinuxMint 17.3 (=Ubuntu 14.04), Java 1.8
>Reporter: Hadi Kahraman
>
> FuzzyRowFilter causes ResultScanner.next to hang and time out. The same code 
> works well on HBase 1.2.1, 1.2.0 and 1.1.4.
> HBase server: Cloudera 5.7.0 (HBase 1.2.0) on 4 hosts, 1 master, 3 workers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16578) Mob data loss after mob compaction and normal compcation

2016-10-12 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570831#comment-15570831
 ] 

ramkrishna.s.vasudevan commented on HBASE-16578:


+1.

> Mob data loss after mob compaction and normal compcation
> 
>
> Key: HBASE-16578
> URL: https://issues.apache.org/jira/browse/HBASE-16578
> Project: HBase
>  Issue Type: Bug
>  Components: mob
>Affects Versions: 2.0.0
>Reporter: huaxiang sun
>Assignee: Jingcheng Du
> Attachments: HBASE-16578-V2.patch, HBASE-16578.patch, 
> TestMobCompaction.java, TestMobCompaction.java
>
>
> StoreFileScanners on MOB cells rely on the scannerOrder to find the latest 
> cells after mob compaction. The value of scannerOrder is assigned by the 
> order of the maxSeqId of each StoreFile, and this maxSeqId is set only after the 
> reader of the StoreFile is created.
> In {{Compactor.compact}}, the compacted store files are cloned and their 
> readers are not created. And in {{StoreFileScanner.getScannersForStoreFiles}} 
> the StoreFiles are sorted before the readers are created, and at that time the 
> maxSeqId of each file is -1 (the default value). This leads to a wrong scanner 
> order in the following normal compaction, so some older cells might be 
> chosen during the normal compaction.
> We need to create the readers either before the sorting in 
> {{StoreFileScanner.getScannersForStoreFiles}}, or just after 
> the store files are cloned in {{Compactor.compact}}.
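> A minimal sketch of the second option (hypothetical method names, not the attached 
> patch): make sure every cloned store file has an open reader, and therefore a real 
> maxSeqId, before anything sorts on it.
> {code}
> // Illustration only: open readers right after the store files are cloned in
> // Compactor.compact, so the later sort in getScannersForStoreFiles sees valid
> // sequence ids instead of the default -1.
> for (StoreFile file : clonedFilesToCompact) {
>   if (file.getReader() == null) {
>     file.createReader();   // loads file metadata, including the max sequence id
>   }
> }
> {code}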



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16807) RegionServer will fail to report new active HMaster until HMaster/RegionServer failover

2016-10-12 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570829#comment-15570829
 ] 

Heng Chen commented on HBASE-16807:
---

Your patch seems to just skip the cache without considering the ServiceException, right? 

> RegionServer will fail to report new active HMaster until 
> HMaster/RegionServer failover
> ---
>
> Key: HBASE-16807
> URL: https://issues.apache.org/jira/browse/HBASE-16807
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
> Fix For: 2.0.0
>
> Attachments: HBASE-16807.patch
>
>
> It's a little weird, but it happened in the production environment that a few 
> RegionServers missed the master znode creation notification on master failover. In 
> that case ZooKeeperNodeTracker will not refresh the cached data, and 
> MasterAddressTracker will keep returning the old active HMaster's details to the 
> region server on ServiceException.
> Though we re-create the region server stub on failure, we do so without refreshing 
> the MasterAddressTracker data.
> In HRegionServer.createRegionServerStatusStub():
> {code}
>   boolean refresh = false; // for the first time, use cached data
> RegionServerStatusService.BlockingInterface intf = null;
> boolean interrupted = false;
> try {
>   while (keepLooping()) {
> sn = this.masterAddressTracker.getMasterAddress(refresh);
> if (sn == null) {
>   if (!keepLooping()) {
> // give up with no connection.
> LOG.debug("No master found and cluster is stopped; bailing out");
> return null;
>   }
>   if (System.currentTimeMillis() > (previousLogTime + 1000)) {
> LOG.debug("No master found; retry");
> previousLogTime = System.currentTimeMillis();
>   }
>   refresh = true; // let's try pull it from ZK directly
>   if (sleep(200)) {
> interrupted = true;
>   }
>   continue;
> }
> {code}
> Here we refresh the node only when 'sn' is NULL; otherwise it uses the same 
> cached data. 
> So in the above case the RegionServer will never report to the active HMaster 
> successfully until an HMaster failover or a RegionServer restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16752) Upgrading from 1.2 to 1.3 can lead to replication failures due to difference in RPC size limit

2016-10-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570828#comment-15570828
 ] 

Hadoop QA commented on HBASE-16752:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
6s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
42s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
21s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
28s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
3s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 51s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
26m 43s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 
20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 53s 
{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 152m 35s 
{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
24s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 196m 35s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.regionserver.TestHRegion |
| Timed out junit tests | org.apache.hadoop.hbase.client.TestReplicasClient |
|   | org.apache.hadoop.hbase.client.TestMetaWithReplicas |
|   | org.apache.hadoop.hbase.client.TestEnableTable |
|   | org.apache.hadoop.hbase.client.TestMobRestoreSnapshotFromClient |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:7bda515 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12833011/HBASE-16752.V2.patch |
| JIRA Issue | HBASE-16752 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 2a64ca92b9dd 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 
20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 92ef234 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | 

[jira] [Commented] (HBASE-16653) Backport HBASE-11393 to all branches which support namespace

2016-10-12 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570822#comment-15570822
 ] 

Heng Chen commented on HBASE-16653:
---

Let me take a look at patch v3.  I need some time.

> Backport HBASE-11393 to all branches which support namespace
> 
>
> Key: HBASE-16653
> URL: https://issues.apache.org/jira/browse/HBASE-16653
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.4.0, 1.0.5, 1.3.1, 0.98.22, 1.1.7, 1.2.4
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
> Fix For: 1.4.0
>
> Attachments: HBASE-16653-branch-1-v1.patch, 
> HBASE-16653-branch-1-v2.patch, HBASE-16653-branch-1-v3.patch
>
>
> As HBASE-11386 mentioned, the parsing code for the replication table-cfs config 
> goes wrong when a table name contains a namespace, so we can only configure the 
> default namespace's tables in the peer. It is a bug for all branches which 
> support namespaces. HBASE-11393 resolved this by using a pb object, but it was 
> only merged to the master branch. Other branches still have this problem. I 
> think we should fix this bug in all branches which support namespaces.
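> To illustrate the ambiguity (the old-style table-cfs string overloads ':' as both 
> the namespace separator and the table/column-family separator; the values below are 
> made up for illustration):
> {noformat}
> tableA:cf1,cf2;tableB               # old format parses fine
> ns1:tableA:cf1,cf2;tableB           # "ns1:tableA:cf1" cannot be split reliably on ':'
> {noformat}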



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16792) Reuse KeyValue.KeyOnlyKeyValue in BufferedDataBlockEncoder.SeekerState

2016-10-12 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-16792:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks for the patch [~aoxiang]. Thanks for the review 
[~saint@gmail.com].

> Reuse KeyValue.KeyOnlyKeyValue in BufferedDataBlockEncoder.SeekerState
> --
>
> Key: HBASE-16792
> URL: https://issues.apache.org/jira/browse/HBASE-16792
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: binlijin
>Assignee: binlijin
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-16792_master.patch
>
>
> Currently every SeekerState#invalidate creates a fresh KeyValue.KeyOnlyKeyValue; 
> we should reuse it instead.
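> A tiny sketch of the reuse pattern (illustrative field and method names, not the 
> attached patch): keep one mutable KeyOnlyKeyValue per seeker state and re-point it 
> at the new key bytes instead of allocating a fresh instance on every invalidate().
> {code}
> // Illustration only.
> private final KeyValue.KeyOnlyKeyValue currentKey = new KeyValue.KeyOnlyKeyValue();
>
> void resetKey(byte[] keyBuffer, int keyOffset, int keyLength) {
>   // Reuse the existing wrapper rather than: currentKey = new KeyValue.KeyOnlyKeyValue(...)
>   currentKey.setKey(keyBuffer, keyOffset, keyLength);
> }
> {code}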



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16792) Reuse KeyValue.KeyOnlyKeyValue in BufferedDataBlockEncoder.SeekerState

2016-10-12 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570816#comment-15570816
 ] 

ramkrishna.s.vasudevan commented on HBASE-16792:


+1. LGTM. Shall commit this code to master.

> Reuse KeyValue.KeyOnlyKeyValue in BufferedDataBlockEncoder.SeekerState
> --
>
> Key: HBASE-16792
> URL: https://issues.apache.org/jira/browse/HBASE-16792
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: binlijin
>Assignee: binlijin
>Priority: Minor
> Attachments: HBASE-16792_master.patch
>
>
> Currently every SeekerState#invalidate creates a fresh KeyValue.KeyOnlyKeyValue; 
> we should reuse it instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16810) HBase Balancer throws ArrayIndexOutOfBoundsException when regionservers are in /hbase/draining znode and unloaded

2016-10-12 Thread David Pope (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Pope updated HBASE-16810:
---
Attachment: HBASE-16810.branch-1.v2.patch

Added MockNoopMasterServices.

> HBase Balancer throws ArrayIndexOutOfBoundsException when regionservers are 
> in /hbase/draining znode and unloaded
> -
>
> Key: HBASE-16810
> URL: https://issues.apache.org/jira/browse/HBASE-16810
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 2.0.0, 1.3.0
>Reporter: Ashu Pachauri
>Assignee: David Pope
> Fix For: 1.3.0
>
> Attachments: HBASE-16810.branch-1.patch, 
> HBASE-16810.branch-1.v2.patch, HBASE-16810.patch, master.patch
>
>
> 1. Add a regionserver znode under /hbase/draining znode.
> 2. Use RegionMover to unload all regions from the regionserver.
> 3. Run balancer.
> {code}
> 16/09/21 14:17:33 ERROR ipc.RpcServer: Unexpected throwable object
> java.lang.ArrayIndexOutOfBoundsException: 75
>   at 
> org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer$Cluster.getLocalityOfRegion(BaseLoadBalancer.java:867)
>   at 
> org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer$LocalityCostFunction.cost(StochasticLoadBalancer.java:1186)
>   at 
> org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer.computeCost(StochasticLoadBalancer.java:521)
>   at 
> org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer.balanceCluster(StochasticLoadBalancer.java:309)
>   at 
> org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer.balanceCluster(StochasticLoadBalancer.java:264)
>   at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:1339)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.balance(MasterRpcServices.java:442)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:58555)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2268)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16810) HBase Balancer throws ArrayIndexOutOfBoundsException when regionservers are in /hbase/draining znode and unloaded

2016-10-12 Thread David Pope (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Pope updated HBASE-16810:
---
Attachment: (was: HBASE-16810.branch-1.v2.patch)

> HBase Balancer throws ArrayIndexOutOfBoundsException when regionservers are 
> in /hbase/draining znode and unloaded
> -
>
> Key: HBASE-16810
> URL: https://issues.apache.org/jira/browse/HBASE-16810
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 2.0.0, 1.3.0
>Reporter: Ashu Pachauri
>Assignee: David Pope
> Fix For: 1.3.0
>
> Attachments: HBASE-16810.branch-1.patch, HBASE-16810.patch, 
> master.patch
>
>
> 1. Add a regionserver znode under /hbase/draining znode.
> 2. Use RegionMover to unload all regions from the regionserver.
> 3. Run balancer.
> {code}
> 16/09/21 14:17:33 ERROR ipc.RpcServer: Unexpected throwable object
> java.lang.ArrayIndexOutOfBoundsException: 75
>   at 
> org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer$Cluster.getLocalityOfRegion(BaseLoadBalancer.java:867)
>   at 
> org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer$LocalityCostFunction.cost(StochasticLoadBalancer.java:1186)
>   at 
> org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer.computeCost(StochasticLoadBalancer.java:521)
>   at 
> org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer.balanceCluster(StochasticLoadBalancer.java:309)
>   at 
> org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer.balanceCluster(StochasticLoadBalancer.java:264)
>   at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:1339)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.balance(MasterRpcServices.java:442)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:58555)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2268)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16810) HBase Balancer throws ArrayIndexOutOfBoundsException when regionservers are in /hbase/draining znode and unloaded

2016-10-12 Thread David Pope (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Pope updated HBASE-16810:
---
Attachment: HBASE-16810.branch-1.v2.patch

Added MockNoopMasterServices.

> HBase Balancer throws ArrayIndexOutOfBoundsException when regionservers are 
> in /hbase/draining znode and unloaded
> -
>
> Key: HBASE-16810
> URL: https://issues.apache.org/jira/browse/HBASE-16810
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 2.0.0, 1.3.0
>Reporter: Ashu Pachauri
>Assignee: David Pope
> Fix For: 1.3.0
>
> Attachments: HBASE-16810.branch-1.patch, 
> HBASE-16810.branch-1.v2.patch, HBASE-16810.patch, master.patch
>
>
> 1. Add a regionserver znode under /hbase/draining znode.
> 2. Use RegionMover to unload all regions from the regionserver.
> 3. Run balancer.
> {code}
> 16/09/21 14:17:33 ERROR ipc.RpcServer: Unexpected throwable object
> java.lang.ArrayIndexOutOfBoundsException: 75
>   at 
> org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer$Cluster.getLocalityOfRegion(BaseLoadBalancer.java:867)
>   at 
> org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer$LocalityCostFunction.cost(StochasticLoadBalancer.java:1186)
>   at 
> org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer.computeCost(StochasticLoadBalancer.java:521)
>   at 
> org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer.balanceCluster(StochasticLoadBalancer.java:309)
>   at 
> org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer.balanceCluster(StochasticLoadBalancer.java:264)
>   at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:1339)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.balance(MasterRpcServices.java:442)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:58555)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2268)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15638) Shade protobuf

2016-10-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570791#comment-15570791
 ] 

Hudson commented on HBASE-15638:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #1776 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/1776/])
HBASE-16808 Included the generated, shaded java files from HBASE-15638 (stack: 
rev c6289f197685012eeb63b880f151499a47485e2d)
* (edit) hbase-assembly/src/main/assembly/src.xml


> Shade protobuf
> --
>
> Key: HBASE-15638
> URL: https://issues.apache.org/jira/browse/HBASE-15638
> Project: HBase
>  Issue Type: Bug
>  Components: Protobufs
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: 15638v2.patch, HBASE-15638.master.001.patch, 
> HBASE-15638.master.002.patch, HBASE-15638.master.003 (1).patch, 
> HBASE-15638.master.003 (1).patch, HBASE-15638.master.003 (1).patch, 
> HBASE-15638.master.003.patch, HBASE-15638.master.003.patch, 
> HBASE-15638.master.004.patch, HBASE-15638.master.005.patch, 
> HBASE-15638.master.006.patch, HBASE-15638.master.007.patch, 
> HBASE-15638.master.007.patch, HBASE-15638.master.008.patch, 
> HBASE-15638.master.009.patch, as.far.as.server.patch
>
>
> We need to change our protobuf. Currently it is pb2.5.0. As is, protobufs 
> expect all buffers to be on-heap byte arrays. It does not have facility for 
> dealing in ByteBuffers and off-heap ByteBuffers in particular. This fact 
> frustrates the off-heaping-of-the-write-path project as 
> marshalling/unmarshalling of protobufs involves a copy on-heap first.
> So, we need to patch our protobuf so it supports off-heap ByteBuffers. To 
> ensure we pick up the patched protobuf always, we need to relocate/shade our 
> protobuf and adjust all protobuf references accordingly.
> Given as we have protobufs in our public facing API, Coprocessor Endpoints -- 
> which use protobuf Service to describe new API -- a blind relocation/shading 
> of com.google.protobuf.* will break our API for CoProcessor EndPoints (CPEP) 
> in particular. For example, in the Table Interface, to invoke a method on a 
> registered CPEP, we have:
> {code}
> <T extends com.google.protobuf.Service, R> Map<byte[], R> coprocessorService(
>     Class<T> service, byte[] startKey, byte[] endKey, 
>     org.apache.hadoop.hbase.client.coprocessor.Batch.Call<T, R> callable)
>   throws com.google.protobuf.ServiceException, Throwable
> {code}
> This issue is how we intend to shade protobuf for hbase-2.0.0 while 
> preserving our API as is so CPEPs continue to work on the new hbase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16813) Procedure v2 - Move ProcedureEvent to hbase-procedure module

2016-10-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570792#comment-15570792
 ] 

Hudson commented on HBASE-16813:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #1776 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/1776/])
HBASE-16813 Procedure v2 - Move ProcedureEvent to hbase-procedure module 
(matteo.bertozzi: rev 92ef234486537b4325641ce47f6fde26d9432710)
* (add) 
hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/AbstractProcedureScheduler.java
* (edit) 
hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureExecutor.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestMasterProcedureEvents.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestMasterProcedureScheduler.java
* (edit) 
hbase-procedure/src/test/java/org/apache/hadoop/hbase/procedure2/TestYieldProcedures.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureScheduler.java
* (delete) 
hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureSimpleRunQueue.java
* (add) 
hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureEvent.java
* (edit) 
hbase-procedure/src/test/java/org/apache/hadoop/hbase/procedure2/TestProcedureSuspended.java
* (add) 
hbase-procedure/src/test/java/org/apache/hadoop/hbase/procedure2/TestProcedureSchedulerConcurrency.java
* (add) 
hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureEventQueue.java
* (delete) 
hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureRunnableSet.java
* (add) 
hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/SimpleProcedureScheduler.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestMasterProcedureSchedulerConcurrency.java
* (add) 
hbase-procedure/src/test/java/org/apache/hadoop/hbase/procedure2/TestProcedureEvents.java
* (add) 
hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureScheduler.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureEnv.java
* (edit) 
hbase-procedure/src/test/java/org/apache/hadoop/hbase/procedure2/ProcedureTestingUtility.java


> Procedure v2 - Move ProcedureEvent to hbase-procedure module
> 
>
> Key: HBASE-16813
> URL: https://issues.apache.org/jira/browse/HBASE-16813
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Affects Versions: 2.0.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
> Fix For: 2.0.0
>
> Attachments: HBASE-16813-v0.patch
>
>
> ProcedureEvent was added in MasterProcedureScheduler, but it is generic 
> enough to move to hbase-procedure module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16808) Included the generated, shaded java files from "HBASE-15638 Shade protobuf" in our src assembly

2016-10-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570790#comment-15570790
 ] 

Hudson commented on HBASE-16808:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #1776 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/1776/])
HBASE-16808 Included the generated, shaded java files from HBASE-15638 (stack: 
rev c6289f197685012eeb63b880f151499a47485e2d)
* (edit) hbase-assembly/src/main/assembly/src.xml


> Included the generated, shaded java files from "HBASE-15638 Shade protobuf" 
> in our src assembly
> ---
>
> Key: HBASE-16808
> URL: https://issues.apache.org/jira/browse/HBASE-16808
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-16808.master.001.patch
>
>
> [~Apache9] found that I forgot to include the generated, shaded files in our 
> src tgz. Let me fix. This is a follow-on to HBASE-16793



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16746) The log of “region close” should differ from “region move”

2016-10-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570793#comment-15570793
 ] 

Hudson commented on HBASE-16746:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #1776 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/1776/])
HBASE-16746 The log of “region close” should differ from “region move” (stack: 
rev dfb2a800c43b30c326dbdcec673387f7033f2092)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java


> The log of “region close” should differ from “region move”
> --
>
> Key: HBASE-16746
> URL: https://issues.apache.org/jira/browse/HBASE-16746
> Project: HBase
>  Issue Type: Bug
>Reporter: ChiaPing Tsai
>Assignee: ChiaPing Tsai
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-16746.v0.patch
>
>
> If we disable some tables, we will see the following in the region server log.
> {noformat}
> Close 90ed2fe1748644c6faecdec3651335d4, moving to null
> {noformat}
> The message “moving to null” is a bit confusing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16816) HMaster.move() should throw exception if region to move is not online

2016-10-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570779#comment-15570779
 ] 

Hadoop QA commented on HBASE-16816:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
4s {color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s 
{color} | {color:green} branch-1 passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s 
{color} | {color:green} branch-1 passed with JDK v1.7.0_80 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
57s {color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} branch-1 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 6s 
{color} | {color:red} hbase-server in branch-1 has 1 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} branch-1 passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s 
{color} | {color:green} branch-1 passed with JDK v1.7.0_80 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
53s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed with JDK v1.7.0_80 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
59s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
18m 53s {color} | {color:green} The patch does not cause any errors with Hadoop 
2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed with JDK v1.7.0_80 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 83m 47s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 118m 27s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Timed out junit tests | org.apache.hadoop.hbase.client.TestReplicasClient |
|   | org.apache.hadoop.hbase.client.TestMetaWithReplicas |
|   | org.apache.hadoop.hbase.TestHColumnDescriptorDefaultVersions |
|   | org.apache.hadoop.hbase.regionserver.TestClusterId |
|   | org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClient |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:b2c5d84 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12833020/HBASE-16816-branch-1-v3.patch
 |
| JIRA Issue | HBASE-16816 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle 

[jira] [Commented] (HBASE-16489) Configuration parsing

2016-10-12 Thread Xiaobing Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570768#comment-15570768
 ] 

Xiaobing Zhou commented on HBASE-16489:
---

[~sudeeps] thanks for the work. HDFS-9537 and HDFS-9538 have done the work of 
loading configuration; you might want to refer to them and reuse that work if 
possible. HDFS-9632, HDFS-9791, HDFS-10787 and HDFS-10611 are the follow-up 
work and fixes.

> Configuration parsing
> -
>
> Key: HBASE-16489
> URL: https://issues.apache.org/jira/browse/HBASE-16489
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Sudeep Sunthankar
>Assignee: Sudeep Sunthankar
> Attachments: HBASE-16489.HBASE-14850.v1.patch
>
>
> Reading hbase-site.xml is required to read various properties, viz. 
> zookeeper-quorum, client retries, etc.  We can use either the Apache Xerces or 
> Boost libraries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16822) Enable restore-snapshot and clone-snapshot to use externally specified snapshot location

2016-10-12 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570758#comment-15570758
 ] 

Jerry He commented on HBASE-16822:
--

In the Backup/Restore feature under discussion, this ticket can help too, by 
providing an alternative to bulk loading during restore.  Hopefully we can 
restore/clone the full backup, which is an exported snapshot.

> Enable restore-snapshot and clone-snapshot to use externally specified snapshot 
> location 
> 
>
> Key: HBASE-16822
> URL: https://issues.apache.org/jira/browse/HBASE-16822
> Project: HBase
>  Issue Type: Improvement
>Reporter: Jerry He
>
> Currently restore-snapshot and clone-snapshot only work with snapshots 
> that are under the hbase root.dir.
> In combination with export-snapshot, this means a snapshot needs to be 
> exported out to another hbase root.dir, or back and forth eventually to an 
> hbase root.dir.
> There are a few issues with this approach.
> We've known that export-snapshot has a limitation dealing with secure 
> clusters, where the external user needs read access to the hbase root.dir 
> data, bypassing table ACL checks.
> The second problem is when we try to use or bring back the exported snapshot 
> for restore/clone.  The snapshots have to be in the target hbase root.dir, and 
> we need write permission to get them in there.
> Again we will have a permission problem.
> This ticket tries to deal with the second problem: clone and restore from 
> exported snapshots.  The exported snapshots can be on the same cluster, but 
> the user may not have write permission to move them to the hbase root.dir.
> We should have a solution that allows clone/restore of a snapshot from an 
> external path that keeps snapshot backups, and also does it with security 
> permissions in mind.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-10-12 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570746#comment-15570746
 ] 

Heng Chen commented on HBASE-16664:
---

The failed test cases for v7 seem unrelated.  If you are not in a hurry, I will 
commit it later today.  Otherwise, you could ask [~Apache9] to commit it.  
Thanks for your work, [~yangzhe1991]!  :)

> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: 1.patch, HBASE-16664-branch-1-v1.patch, 
> HBASE-16664-branch-1-v1.patch, HBASE-16664-branch-1-v2.patch, 
> HBASE-16664-branch-1.1-v1.patch, HBASE-16664-branch-1.2-v1.patch, 
> HBASE-16664-branch-1.3-v1.patch, HBASE-16664-branch-1.3-v2.patch, 
> HBASE-16664-v1.patch, HBASE-16664-v2.patch, HBASE-16664-v3.patch, 
> HBASE-16664-v4.patch, HBASE-16664-v5.patch, HBASE-16664-v6.patch, 
> HBASE-16664-v7.patch, testhcm.patch
>
>
> Rpc/operation timeout logic in AsyncProcess is broken. And Table's 
> set*Timeout does not take effect in its AP or BufferedMutator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16822) Enable restore-snapshot and clone-snapshot to use externally specified snapshot location

2016-10-12 Thread Jerry He (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerry He updated HBASE-16822:
-
Description: 
Currently restore-snapshot and clone-snapshot only work with snapshots that are 
under the hbase root.dir.

In combination with export-snapshot, this means a snapshot needs to be 
exported out to another hbase root.dir, or back and forth eventually to an hbase 
root.dir.

There are a few issues with this approach.

We've known that export-snapshot has a limitation dealing with secure clusters, 
where the external user needs read access to the hbase root.dir data, 
bypassing table ACL checks.

The second problem is when we try to use or bring back the exported snapshot 
for restore/clone.  The snapshots have to be in the target hbase root.dir, and we 
need write permission to get them in there.

Again we will have a permission problem.

This ticket tries to deal with the second problem: clone and restore from 
exported snapshots.  The exported snapshots can be on the same cluster, but the 
user may not have write permission to move them to the hbase root.dir.

We should have a solution that allows clone/restore of a snapshot from an external 
path that keeps snapshot backups, and also does it with security permissions in 
mind.



  was:
Currently restore-snapshot and clone-snapshot only work with the snapshots that 
are under hbase root.dir.

In combination with export-snapshot, this means the snapshot needs to be 
exported out to another hbase root.dir, or back and forth eventually to a hbase 
root.dir.

There are a few issues with the approach.

We've know that export-snapshot has a limitation dealing with secure cluster, 
where the external user needs to have read access to hbase root.dir data, 
by-passing table ACL check.

The second problem is when we try to use or bring back the exported snapshot 
for restore/clone.  They have to in the target hbase root.dir, and needs write 
permission to get it in there.

Again we will have permission problem.

This ticket tries to deal with the second problem, clone and restore from a 
exported snapshot.  The exported snapshots can be on the same cluster but the 
user may not have write permission to move it to hbase.root.dir.

We should have a solution that allow clone/restore snapshot from a external 
path that keeps snapshot backups. And also do it with security permission in 
mind.




> Enable restore-snapshot and clone-snapshot to use externally specified snapshot 
> location 
> 
>
> Key: HBASE-16822
> URL: https://issues.apache.org/jira/browse/HBASE-16822
> Project: HBase
>  Issue Type: Improvement
>Reporter: Jerry He
>
> Currently restore-snapshot and clone-snapshot only work with snapshots 
> that are under the hbase root.dir.
> In combination with export-snapshot, this means a snapshot needs to be 
> exported out to another hbase root.dir, or back and forth eventually to an 
> hbase root.dir.
> There are a few issues with this approach.
> We've known that export-snapshot has a limitation dealing with secure 
> clusters, where the external user needs read access to the hbase root.dir 
> data, bypassing table ACL checks.
> The second problem is when we try to use or bring back the exported snapshot 
> for restore/clone.  The snapshots have to be in the target hbase root.dir, and 
> we need write permission to get them in there.
> Again we will have a permission problem.
> This ticket tries to deal with the second problem: clone and restore from 
> exported snapshots.  The exported snapshots can be on the same cluster, but 
> the user may not have write permission to move them to the hbase root.dir.
> We should have a solution that allows clone/restore of a snapshot from an 
> external path that keeps snapshot backups, and also does it with security 
> permissions in mind.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16578) Mob data loss after mob compaction and normal compcation

2016-10-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570718#comment-15570718
 ] 

Hadoop QA commented on HBASE-16578:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
8s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
45s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
38s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
26m 33s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 45s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
13s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 115m 21s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Timed out junit tests | org.apache.hadoop.hbase.constraint.TestConstraint |
|   | org.apache.hadoop.hbase.filter.TestFuzzyRowAndColumnRangeFilter |
|   | org.apache.hadoop.hbase.TestNamespace |
|   | 
org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithDeletes |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:7bda515 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12833016/HBASE-16578-V2.patch |
| JIRA Issue | HBASE-16578 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 9a34a7b31a73 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 
17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 92ef234 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/3973/artifact/patchprocess/patch-unit-hbase-server.txt
 |
| unit test logs |  
https://builds.apache.org/job/PreCommit-HBASE-Build/3973/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/3973/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/3973/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-16505) Save deadline in RpcCallContext according to request's timeout

2016-10-12 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570712#comment-15570712
 ] 

Phil Yang commented on HBASE-16505:
---

Thanks all for reviewing, thanks [~stack] for committing.

> Save deadline in RpcCallContext according to request's timeout
> --
>
> Key: HBASE-16505
> URL: https://issues.apache.org/jira/browse/HBASE-16505
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Phil Yang
>Assignee: Phil Yang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16505-branch-1-v1.patch, 
> HBASE-16505-branch-1-v2.patch, HBASE-16505-branch-1-v3.patch, 
> HBASE-16505-v1.patch, HBASE-16505-v10.patch, HBASE-16505-v10.patch, 
> HBASE-16505-v11.patch, HBASE-16505-v11.patch, HBASE-16505-v12.patch, 
> HBASE-16505-v13.patch, HBASE-16505-v2.patch, HBASE-16505-v3.patch, 
> HBASE-16505-v4.patch, HBASE-16505-v5.patch, HBASE-16505-v6.patch, 
> HBASE-16505-v7.patch, HBASE-16505-v8.patch, HBASE-16505-v9.patch
>
>
> If we want to know the correct setting of timeout in read/write path, we need 
> add a new parameter in operation-methods of Region.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-16822) Enable restore-snapshot and clone-snapshot to use externally specified snapshot location

2016-10-12 Thread Jerry He (JIRA)
Jerry He created HBASE-16822:


 Summary: Enable restore-snapshot and clone-snapshot to use 
externally specified snapshot location 
 Key: HBASE-16822
 URL: https://issues.apache.org/jira/browse/HBASE-16822
 Project: HBase
  Issue Type: Improvement
Reporter: Jerry He


Currently restore-snapshot and clone-snapshot only work with the snapshots that 
are under hbase root.dir.

In combination with export-snapshot, this means the snapshot needs to be 
exported out to another hbase root.dir, or back and forth eventually to a hbase 
root.dir.

There are a few issues with the approach.

We know that export-snapshot has a limitation when dealing with secure clusters, 
where the external user needs to have read access to hbase root.dir data, 
bypassing the table ACL check.

The second problem is when we try to use or bring back the exported snapshot 
for restore/clone. It has to be in the target hbase root.dir, and we need write 
permission to get it in there.

Again we will have a permission problem.

This ticket tries to deal with the second problem: clone and restore from an 
exported snapshot. The exported snapshot can be on the same cluster, but the 
user may not have write permission to move it into hbase.root.dir.

We should have a solution that allows clone/restore of a snapshot from an 
external path that keeps snapshot backups, and does so with security 
permissions in mind.
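
For reference, a minimal sketch of the current client-side workflow this description refers to, using the existing Admin API (which only works for snapshots under the cluster's hbase.rootdir today); the snapshot and table names are placeholders, and the external-location variant proposed here would need an additional path argument and is not shown:

{code}
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CloneFromSnapshot {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      // Works only for snapshots already stored under the cluster's hbase.rootdir.
      admin.cloneSnapshot("my_snapshot", TableName.valueOf("restored_table"));
      // admin.restoreSnapshot("my_snapshot");  // target table must be disabled first
    }
  }
}
{code}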





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16414) Improve performance for RPC encryption with Apache Common Crypto

2016-10-12 Thread Colin Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Ma updated HBASE-16414:
-
Attachment: HBASE-16414.008.patch

> Improve performance for RPC encryption with Apache Common Crypto
> 
>
> Key: HBASE-16414
> URL: https://issues.apache.org/jira/browse/HBASE-16414
> Project: HBase
>  Issue Type: Improvement
>  Components: IPC/RPC
>Affects Versions: 2.0.0
>Reporter: Colin Ma
>Assignee: Colin Ma
> Attachments: HBASE-16414.001.patch, HBASE-16414.002.patch, 
> HBASE-16414.003.patch, HBASE-16414.004.patch, HBASE-16414.005.patch, 
> HBASE-16414.006.patch, HBASE-16414.007.patch, HBASE-16414.008.patch, 
> HbaseRpcEncryptionWithCrypoto.docx
>
>
> HBase RPC encryption is enabled by setting “hbase.rpc.protection” to 
> "privacy". With token authentication, it uses the DIGEST-MD5 mechanism for 
> secure authentication and data protection. DIGEST-MD5 uses DES, 3DES or RC4 
> for encryption, which is very slow, especially for Scan, and becomes the 
> bottleneck of RPC throughput.
> Apache Commons Crypto is a cryptographic library optimized with AES-NI. It 
> provides a Java API at both the cipher level and the Java stream level, so 
> developers can implement high-performance AES encryption/decryption with 
> minimal code and effort. Compared with the current implementation of 
> org.apache.hadoop.hbase.io.crypto.aes.AES, Crypto supports both a JCE Cipher 
> and an OpenSSL Cipher, the latter offering better performance than the JCE 
> Cipher. Users can configure the cipher type; the default is the JCE Cipher.
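
As a point of reference, a minimal sketch of turning on the existing RPC encryption the description mentions (client-side configuration only); any key for selecting the Commons Crypto cipher type would be introduced by this patch and is deliberately not shown:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class RpcPrivacyConfig {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // "privacy" maps to SASL auth-conf, i.e. authentication plus encryption of RPC payloads.
    conf.set("hbase.rpc.protection", "privacy");
    // ... build a Connection with this conf and run scans/gets as usual
    System.out.println(conf.get("hbase.rpc.protection"));
  }
}
{code}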



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16578) Mob data loss after mob compaction and normal compaction

2016-10-12 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570596#comment-15570596
 ] 

Ted Yu commented on HBASE-16578:


[~mbertozzi]:
Do you want to take a look?

Thanks

> Mob data loss after mob compaction and normal compaction
> 
>
> Key: HBASE-16578
> URL: https://issues.apache.org/jira/browse/HBASE-16578
> Project: HBase
>  Issue Type: Bug
>  Components: mob
>Affects Versions: 2.0.0
>Reporter: huaxiang sun
>Assignee: Jingcheng Du
> Attachments: HBASE-16578-V2.patch, HBASE-16578.patch, 
> TestMobCompaction.java, TestMobCompaction.java
>
>
> StoreFileScanners on MOB cells rely on the scannerOrder to find the latest 
> cells after mob compaction. The value of scannerOrder is assigned by the 
> order of maxSeqId of the StoreFiles, and this maxSeqId is set only after the 
> reader of the StoreFile is created.
> In {{Compactor.compact}}, the compacted store files are cloned and their 
> readers are not created. And in {{StoreFileScanner.getScannersForStoreFiles}} 
> the StoreFiles are sorted before the readers are created, and at that time the 
> maxSeqId for each file is -1 (the default value). This leads to chaos among the 
> scanners in the following normal compaction: some older cells might be 
> chosen during the normal compaction.
> We need to create readers either before the sorting in the method 
> {{StoreFileScanner.getScannersForStoreFiles}}, or create readers just after 
> the store files are cloned in {{Compactor.compact}}.
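
A generic, self-contained illustration of the point above (plain Java, not the actual HBase internals): if the sort key only becomes valid after an "open the reader" step, the open must happen before the sort.

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class FileRef {
  long maxSeqId = -1;        // -1 until "opened", mirroring the default described above
  final long realSeqId;
  FileRef(long id) { this.realSeqId = id; }
  void open() { this.maxSeqId = realSeqId; }   // reader creation sets the real maxSeqId
}

public class SortAfterOpen {
  public static void main(String[] args) {
    List<FileRef> files = new ArrayList<>(Arrays.asList(new FileRef(7), new FileRef(3), new FileRef(9)));
    files.forEach(FileRef::open);                                        // open readers first ...
    files.sort(Comparator.comparingLong((FileRef f) -> f.maxSeqId));     // ... then the ordering is meaningful
    files.forEach(f -> System.out.println(f.maxSeqId));                  // prints 3, 7, 9
  }
}
{code}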



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16716) OfflineMetaRepair leaves empty directory inside /hbase/WALs which remains forever

2016-10-12 Thread Pankaj Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570594#comment-15570594
 ] 

Pankaj Kumar commented on HBASE-16716:
--

The test case failure is not related; it passes locally.

The findbugs warning is also not related: 
{noformat}
Return value of java.util.concurrent.CountDownLatch.await(long, TimeUnit) 
ignored in 
org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper()
{noformat}
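
For illustration only (generic java.util.concurrent usage, not the HBase code in question): the pattern findbugs is flagging, and how checking the boolean returned by the timed await() addresses it.

{code}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class AwaitReturnValue {
  public static void main(String[] args) throws InterruptedException {
    CountDownLatch latch = new CountDownLatch(1);
    // await(timeout, unit) returns false if the latch did not reach zero in time;
    // ignoring that return value is what triggers the findbugs warning.
    if (!latch.await(100, TimeUnit.MILLISECONDS)) {
      System.out.println("Timed out waiting for latch; handle or log this explicitly");
    }
  }
}
{code}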

> OfflineMetaRepair leaves empty directory inside /hbase/WALs which remains 
> forever
> -
>
> Key: HBASE-16716
> URL: https://issues.apache.org/jira/browse/HBASE-16716
> Project: HBase
>  Issue Type: Bug
>  Components: hbck
>Affects Versions: 2.0.0
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-16716-V2.patch, HBASE-16716-branch-1.patch, 
> HBASE-16716.patch
>
>
> OfflineMetaRepair rebuilds the Meta table. While creating the meta region it creates 
> its own WAL (inside /hbase/WALs/hbck-meta-recovery-) which will 
> be closed and archived after rebuilding Meta. 
> {noformat}
> hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair
> >> /hbase/WALs/hbck-meta-recovery-
> {noformat}
> It doesn't clear the empty dir; the empty directory should be removed after 
> success.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-16786) Procedure V2 - Move ZK-lock's uses to Procedure framework locks (LockProcedure)

2016-10-12 Thread Jingcheng Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570583#comment-15570583
 ] 

Jingcheng Du edited comment on HBASE-16786 at 10/13/16 2:28 AM:


Hi [~appy].
The lock in MOB is there to avoid making deleted cells live again.
If we always retain the delete markers in major compactions, the locks are not 
needed anymore. What do you think of this approach? [~jmhsieh], [~mbertozzi].


was (Author: jingcheng...@intel.com):
Hi [~appy].
The locks is trying to avoid making deleted cells live again.
If we always retain the delete markers in major compaction, the locks are not 
needed anymore. How do you think about this approach? [~jmhsieh], [~mbertozzi].

> Procedure V2 - Move ZK-lock's uses to Procedure framework locks 
> (LockProcedure)
> ---
>
> Key: HBASE-16786
> URL: https://issues.apache.org/jira/browse/HBASE-16786
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Appy
>Assignee: Appy
> Attachments: HBASE-16786.master.001.patch, 
> HBASE-16786.master.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16786) Procedure V2 - Move ZK-lock's uses to Procedure framework locks (LockProcedure)

2016-10-12 Thread Jingcheng Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570583#comment-15570583
 ] 

Jingcheng Du commented on HBASE-16786:
--

Hi [~appy].
The locks are there to avoid making deleted cells live again.
If we always retain the delete markers in major compactions, the locks are not 
needed anymore. What do you think of this approach? [~jmhsieh], [~mbertozzi].
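
As a rough illustration of "always retain the delete markers", one existing per-family knob that points in that direction is KEEP_DELETED_CELLS; this is only an approximation of the idea, not the actual proposal or the MOB lock removal itself, and the threshold value is illustrative:

{code}
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.KeepDeletedCells;

public class MobFamilyWithKeepDeletes {
  public static void main(String[] args) {
    HColumnDescriptor family = new HColumnDescriptor("mob_cf");
    family.setMobEnabled(true);
    family.setMobThreshold(100 * 1024L);                 // 100 KB mob threshold, illustrative
    family.setKeepDeletedCells(KeepDeletedCells.TRUE);   // retain deleted cells/markers across compactions
    System.out.println(family);
  }
}
{code}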

> Procedure V2 - Move ZK-lock's uses to Procedure framework locks 
> (LockProcedure)
> ---
>
> Key: HBASE-16786
> URL: https://issues.apache.org/jira/browse/HBASE-16786
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Appy
>Assignee: Appy
> Attachments: HBASE-16786.master.001.patch, 
> HBASE-16786.master.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot

2016-10-12 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570534#comment-15570534
 ] 

Vladimir Rodionov edited comment on HBASE-7912 at 10/13/16 2:15 AM:


Oops, may be a bug.





was (Author: vrodionov):
[~stack]
You can file JIRA, of course, but I think it will be closed with standard 
suggestion to ask community for help on how to configure HBase cluster. From 
your log file, we  see that LocalFileSystem is used instead of hdfs, that looks 
like your own configuration problem.  If you have cluster, go to any node to 
HBASE_HOME/bin and run backup command. If it fails, then you will not be able 
to run any hbase command, including 'shell'. Can you run "hbase shell" and 
connect to cluster?




> HBase Backup/Restore Based on HBase Snapshot
> 
>
> Key: HBASE-7912
> URL: https://issues.apache.org/jira/browse/HBASE-7912
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Richard Ding
>Assignee: Vladimir Rodionov
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: Backup-and-Restore-Apache_19Sep2016.pdf, 
> Backup-and-Restore-Apache_9Sep2016.pdf, HBaseBackupAndRestore - v0.8.pdf, 
> HBaseBackupAndRestore -0.91.pdf, HBaseBackupAndRestore-v0.9.pdf, 
> HBaseBackupAndRestore.pdf, HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, 
> HBaseBackupRestore-Jira-7912-DesignDoc-v2.pdf, 
> HBaseBackupRestore-Jira-7912-v4.pdf, HBaseBackupRestore-Jira-7912-v5 .pdf, 
> HBaseBackupRestore-Jira-7912-v6.pdf, HBase_BackupRestore-Jira-7912-CLI-v1.pdf
>
>
> Finally, we completed the implementation of our backup/restore solution, and 
> would like to share it with the community through this jira. 
> We leverage the existing hbase snapshot feature and provide a general 
> solution for common users. Our full backup uses snapshot to capture 
> metadata locally and exportsnapshot to move data to another cluster; 
> the incremental backup uses an offline WALPlayer to back up HLogs; we also 
> leverage distributed log roll and flush to improve performance, plus other 
> added values such as convert, merge, progress report, and CLI commands, so 
> that a common user can back up hbase data without in-depth knowledge of hbase. 
>  Our solution also contains some usability features for enterprise users. 
> The detailed design document and CLI commands will be attached to this jira. We 
> plan to use 10~12 subtasks to share each of the following features, and 
> document the detailed implementation in the subtasks: 
> * *Full Backup* : provide local and remote backup/restore for a list of tables
> * *offline-WALPlayer* to convert HLog to HFiles offline (for incremental 
> backup)
> * *distributed* Logroll and distributed flush 
> * Backup *Manifest* and history
> * *Incremental* backup: to build on top of full backup as daily/weekly backup 
> * *Convert*  incremental backup WAL files into hfiles
> * *Merge* several backup images into one(like merge weekly into monthly)
> * *add and remove* table to and from Backup image
> * *Cancel* a backup process
> * backup progress *status*
> * full backup based on *existing snapshot*
> *-*
> *Below is the original description, to keep here as the history for the 
> design and discussion back in 2013*
> There have been attempts in the past to come up with a viable HBase 
> backup/restore solution (e.g., HBASE-4618).  Recently, there are many 
> advancements and new features in HBase, for example, FileLink, Snapshot, and 
> Distributed Barrier Procedure. This is a proposal for a backup/restore 
> solution that utilizes these new features to achieve better performance and 
> consistency. 
>  
> A common practice of backup and restore in database is to first take full 
> baseline backup, and then periodically take incremental backup that capture 
> the changes since the full baseline backup. HBase cluster can store massive 
> amount data.  Combination of full backups with incremental backups has 
> tremendous benefit for HBase as well.  The following is a typical scenario 
> for full and incremental backup.
> # The user takes a full backup of a table or a set of tables in HBase. 
> # The user schedules periodical incremental backups to capture the changes 
> from the full backup, or from last incremental backup.
> # The user needs to restore table data to a past point of time.
> # The full backup is restored to the table(s) or to different table name(s).  
> Then the incremental backups that are up to the desired point in time are 
> applied on top of the full backup. 
> We would support the following key features and capabilities.
> * Full backup uses HBase snapshot to capture HFiles.
> * Use HBase WALs to capture 

[jira] [Updated] (HBASE-16816) HMaster.move() should throw exception if region to move is not online

2016-10-12 Thread Allan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang updated HBASE-16816:
---
Attachment: HBASE-16816-branch-1-v3.patch

Use TEST_UTIL.waitUntilNoRegionsInTransition instead of sleeping.
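
A minimal sketch of what that change looks like in a test, assuming the test class's usual HBaseTestingUtility field (here TEST_UTIL) and an existing Admin handle; the 60s timeout is illustrative:

{code}
import org.apache.hadoop.hbase.HBaseTestingUtility;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.util.Bytes;

public class MoveRegionWithoutSleep {
  private static final HBaseTestingUtility TEST_UTIL = new HBaseTestingUtility();

  static void moveAndWait(Admin admin, byte[] encodedRegionName, String destServerName)
      throws Exception {
    admin.move(encodedRegionName, Bytes.toBytes(destServerName));
    // Wait for assignment to settle instead of a fixed Thread.sleep().
    TEST_UTIL.waitUntilNoRegionsInTransition(60000);
  }
}
{code}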

> HMaster.move() should throw exception if region to move is not online
> -
>
> Key: HBASE-16816
> URL: https://issues.apache.org/jira/browse/HBASE-16816
> Project: HBase
>  Issue Type: Bug
>  Components: Admin
>Affects Versions: 1.1.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Minor
> Attachments: HBASE-16816-branch-1-v2.patch, 
> HBASE-16816-branch-1-v3.patch, HBASE-16816-branch-1.patch
>
>
> The move region function in HMaster only checks whether the region to move 
> exists
> {code}
> if (regionState == null) {
>   throw new 
> UnknownRegionException(Bytes.toStringBinary(encodedRegionName));
> }
> {code}
> It will not return anything if the region is split or in transition, i.e. not 
> movable, so the caller has no way to know whether the move region operation 
> failed.
> It is a problem for "region_move.rb". It only gives up moving a region if an 
> exception is thrown. Otherwise, it will wait until a timeout and retry. 
> Without an exception, it has no idea the region is not movable.
> {code}
> begin
>   admin.move(Bytes.toBytes(r.getEncodedName()), Bytes.toBytes(newServer))
> rescue java.lang.reflect.UndeclaredThrowableException,
> org.apache.hadoop.hbase.UnknownRegionException => e
>   $LOG.info("Exception moving "  + r.getEncodedName() +
> "; split/moved? Continuing: " + e)
>   return
> end
>  # Wait till its up on new server before moving on
> maxWaitInSeconds = admin.getConfiguration.getInt("hbase.move.wait.max", 
> 60)
> maxWait = Time.now + maxWaitInSeconds
> while Time.now < maxWait
>   same = isSameServer(admin, r, original)
>   break unless same
>   sleep 0.1
> end
>   end
> {code}
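
For context, a sketch of the kind of additional check the summary asks for inside HMaster.move(); this is illustrative of the proposed behavior, not the committed fix, and the RegionState method names are assumptions:

{code}
// Hypothetical continuation of the existing null check shown above:
if (regionState == null) {
  throw new UnknownRegionException(Bytes.toStringBinary(encodedRegionName));
}
if (!regionState.isOpened()) {  // assumption: split or in-transition regions are not "opened"
  throw new HBaseIOException("Region " + regionState.getRegion().getEncodedName()
      + " is not online (state=" + regionState.getState() + "), cannot move");
}
{code}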



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16812) Cleanup deprecated compact() function

2016-10-12 Thread Jingcheng Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570560#comment-15570560
 ] 

Jingcheng Du commented on HBASE-16812:
--

bq. I tried to find the locks for CompactionTool and ExpiredMobFileCleaner but 
couldn't find it. Can you please point them to me, maybe github links. Thanks.
I reviewed the code in CompactionTool; it uses a dummy HStore. Yes, there is no 
lock for it. I can add one.
For ExpiredMobFileCleaner, there are two entry points: one is the 
{{ExpiredMobFileCleanerChore}}, where a read lock is used. The other is the tool; 
yes, we don't have locks for it there either. I can add one. Sorry for 
misunderstanding your words.
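
A generic illustration of the read-lock pattern being discussed, using plain java.util.concurrent rather than the actual HBase table-lock API (so the names here are not HBase's):

{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CleanerReadLockSketch {
  private final ReentrantReadWriteLock tableLock = new ReentrantReadWriteLock();

  void cleanExpiredMobFiles() {
    tableLock.readLock().lock();   // shared with other readers, excluded by an exclusive writer
    try {
      // scan the mob directory and delete expired mob files here
    } finally {
      tableLock.readLock().unlock();
    }
  }

  public static void main(String[] args) {
    new CleanerReadLockSketch().cleanExpiredMobFiles();
  }
}
{code}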

> Cleanup deprecated compact() function
> -
>
> Key: HBASE-16812
> URL: https://issues.apache.org/jira/browse/HBASE-16812
> Project: HBase
>  Issue Type: Task
>Reporter: Appy
>Assignee: Appy
>Priority: Minor
> Attachments: HBASE-16812.master.001.patch
>
>
> compact(CompactionContext compaction, CompactionThroughputController 
> throughputController) is [deprecated in 1.2.0 
> release|https://github.com/apache/hbase/blob/rel/1.2.0/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java#L222].
> Store.java is also marked limited private.
> Context: I was cleaning up zk table lock which is also used in that method's 
> [override|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HMobStore.java#L460]
>  in HMobStore.
> This method isn't being called from anywhere except CompactionTool (which 
> creates HStore object, not HMobStore object).
> [~jingcheng...@intel.com] Can you PTAL and help me understand what's going on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot

2016-10-12 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570534#comment-15570534
 ] 

Vladimir Rodionov edited comment on HBASE-7912 at 10/13/16 2:12 AM:


[~stack]
You can file JIRA, of course, but I think it will be closed with standard 
suggestion to ask community for help on how to configure HBase cluster. From 
your log file, we  see that LocalFileSystem is used instead of hdfs, that looks 
like your own configuration problem.  If you have cluster, go to any node to 
HBASE_HOME/bin and run backup command. If it fails, then you will not be able 
to run any hbase command, including 'shell'. Can you run "hbase shell" and 
connect to cluster?





was (Author: vrodionov):
[~stack]
You can file JIRA, of course, but I think it will be closed with standard 
suggestion to ask community for help on how to
configure HBase cluster. From your log file, we clearly see that 
LocalFileSystem is used instead of hdfs, that is clearly, your own 
configuration issue. 

If you have cluster, go to any node to 
HBASE_HOME/bin

and run backup command

If it fails, then you will not be able to run any hbase command, including 
'shell'. Can you run "hbase shell" and connect to cluster?




> HBase Backup/Restore Based on HBase Snapshot
> 
>
> Key: HBASE-7912
> URL: https://issues.apache.org/jira/browse/HBASE-7912
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Richard Ding
>Assignee: Vladimir Rodionov
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: Backup-and-Restore-Apache_19Sep2016.pdf, 
> Backup-and-Restore-Apache_9Sep2016.pdf, HBaseBackupAndRestore - v0.8.pdf, 
> HBaseBackupAndRestore -0.91.pdf, HBaseBackupAndRestore-v0.9.pdf, 
> HBaseBackupAndRestore.pdf, HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, 
> HBaseBackupRestore-Jira-7912-DesignDoc-v2.pdf, 
> HBaseBackupRestore-Jira-7912-v4.pdf, HBaseBackupRestore-Jira-7912-v5 .pdf, 
> HBaseBackupRestore-Jira-7912-v6.pdf, HBase_BackupRestore-Jira-7912-CLI-v1.pdf
>
>
> Finally, we completed the implementation of our backup/restore solution, and 
> would like to share with community through this jira. 
> We are leveraging existing hbase snapshot feature, and provide a general 
> solution to common users. Our full backup is using snapshot to capture 
> metadata locally and using exportsnapshot to move data to another cluster; 
> the incremental backup is using offline-WALplayer to backup HLogs; we also 
> leverage global distribution rolllog and flush to improve performance; other 
> added-on values such as convert, merge, progress report, and CLI commands. So 
> that a common user can backup hbase data without in-depth knowledge of hbase. 
>  Our solution also contains some usability features for enterprise users. 
> The detail design document and CLI command will be attached in this jira. We 
> plan to use 10~12 subtasks to share each of the following features, and 
> document the detail implement in the subtasks: 
> * *Full Backup* : provide local and remote back/restore for a list of tables
> * *offline-WALPlayer* to convert HLog to HFiles offline (for incremental 
> backup)
> * *distributed* Logroll and distributed flush 
> * Backup *Manifest* and history
> * *Incremental* backup: to build on top of full backup as daily/weekly backup 
> * *Convert*  incremental backup WAL files into hfiles
> * *Merge* several backup images into one(like merge weekly into monthly)
> * *add and remove* table to and from Backup image
> * *Cancel* a backup process
> * backup progress *status*
> * full backup based on *existing snapshot*
> *-*
> *Below is the original description, to keep here as the history for the 
> design and discussion back in 2013*
> There have been attempts in the past to come up with a viable HBase 
> backup/restore solution (e.g., HBASE-4618).  Recently, there are many 
> advancements and new features in HBase, for example, FileLink, Snapshot, and 
> Distributed Barrier Procedure. This is a proposal for a backup/restore 
> solution that utilizes these new features to achieve better performance and 
> consistency. 
>  
> A common practice of backup and restore in database is to first take full 
> baseline backup, and then periodically take incremental backup that capture 
> the changes since the full baseline backup. HBase cluster can store massive 
> amount data.  Combination of full backups with incremental backups has 
> tremendous benefit for HBase as well.  The following is a typical scenario 
> for full and incremental backup.
> # The user takes a full backup of a table or a set of tables in HBase. 
> # The user schedules periodical incremental backups to capture 

[jira] [Comment Edited] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot

2016-10-12 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570534#comment-15570534
 ] 

Vladimir Rodionov edited comment on HBASE-7912 at 10/13/16 2:11 AM:


[~stack]
You can file JIRA, of course, but I think it will be closed with standard 
suggestion to ask community for help on how to
configure HBase cluster. From your log file, we clearly see that 
LocalFileSystem is used instead of hdfs, that is clearly, your own 
configuration issue. 

If you have cluster, go to any node to 
HBASE_HOME/bin

and run backup command

If it fails, then you will not be able to run any hbase command, including 
'shell'. Can you run "hbase shell" and connect to cluster?





was (Author: vrodionov):
[~stack]
You can file JIRA, of course, but I think it will be closed with standard 
suggestion to ask community for help on how to
configure HBase cluster. From your log file, we clearly see that 
LocalFileSystem is used instead of hdfs, that is clearly, your own 
configuration issue. 

If you have cluster, go to any node to 
HBASE_HOME/bin

and run backup command

If it fails, then you will not be able to run any hbase command, including 
'shell'. 




> HBase Backup/Restore Based on HBase Snapshot
> 
>
> Key: HBASE-7912
> URL: https://issues.apache.org/jira/browse/HBASE-7912
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Richard Ding
>Assignee: Vladimir Rodionov
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: Backup-and-Restore-Apache_19Sep2016.pdf, 
> Backup-and-Restore-Apache_9Sep2016.pdf, HBaseBackupAndRestore - v0.8.pdf, 
> HBaseBackupAndRestore -0.91.pdf, HBaseBackupAndRestore-v0.9.pdf, 
> HBaseBackupAndRestore.pdf, HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, 
> HBaseBackupRestore-Jira-7912-DesignDoc-v2.pdf, 
> HBaseBackupRestore-Jira-7912-v4.pdf, HBaseBackupRestore-Jira-7912-v5 .pdf, 
> HBaseBackupRestore-Jira-7912-v6.pdf, HBase_BackupRestore-Jira-7912-CLI-v1.pdf
>
>
> Finally, we completed the implementation of our backup/restore solution, and 
> would like to share with community through this jira. 
> We are leveraging existing hbase snapshot feature, and provide a general 
> solution to common users. Our full backup is using snapshot to capture 
> metadata locally and using exportsnapshot to move data to another cluster; 
> the incremental backup is using offline-WALplayer to backup HLogs; we also 
> leverage global distribution rolllog and flush to improve performance; other 
> added-on values such as convert, merge, progress report, and CLI commands. So 
> that a common user can backup hbase data without in-depth knowledge of hbase. 
>  Our solution also contains some usability features for enterprise users. 
> The detail design document and CLI command will be attached in this jira. We 
> plan to use 10~12 subtasks to share each of the following features, and 
> document the detail implement in the subtasks: 
> * *Full Backup* : provide local and remote back/restore for a list of tables
> * *offline-WALPlayer* to convert HLog to HFiles offline (for incremental 
> backup)
> * *distributed* Logroll and distributed flush 
> * Backup *Manifest* and history
> * *Incremental* backup: to build on top of full backup as daily/weekly backup 
> * *Convert*  incremental backup WAL files into hfiles
> * *Merge* several backup images into one(like merge weekly into monthly)
> * *add and remove* table to and from Backup image
> * *Cancel* a backup process
> * backup progress *status*
> * full backup based on *existing snapshot*
> *-*
> *Below is the original description, to keep here as the history for the 
> design and discussion back in 2013*
> There have been attempts in the past to come up with a viable HBase 
> backup/restore solution (e.g., HBASE-4618).  Recently, there are many 
> advancements and new features in HBase, for example, FileLink, Snapshot, and 
> Distributed Barrier Procedure. This is a proposal for a backup/restore 
> solution that utilizes these new features to achieve better performance and 
> consistency. 
>  
> A common practice of backup and restore in database is to first take full 
> baseline backup, and then periodically take incremental backup that capture 
> the changes since the full baseline backup. HBase cluster can store massive 
> amount data.  Combination of full backups with incremental backups has 
> tremendous benefit for HBase as well.  The following is a typical scenario 
> for full and incremental backup.
> # The user takes a full backup of a table or a set of tables in HBase. 
> # The user schedules periodical incremental backups to capture the changes 
> from the full backup, or 

[jira] [Comment Edited] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot

2016-10-12 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570534#comment-15570534
 ] 

Vladimir Rodionov edited comment on HBASE-7912 at 10/13/16 2:00 AM:


[~stack]
You can file JIRA, of course, but I think it will be closed with standard 
suggestion to ask community for help on how to
configure HBase cluster. From your log file, we clearly see that 
LocalFileSystem is used instead of hdfs, that is clearly, your own 
configuration issue. 

If you have cluster, go to any node to 
HBASE_HOME/bin

and run backup command

If it fails, then you will not be able to run any hbase command, including 
'shell'. 





was (Author: vrodionov):
[~stack]
You can file JIRA, of course, but I think it will be closed with standard 
suggestion to ask community for help on how to
configure HBase cluster. From your log file, we clearly see that 
LocalFileSystem is used instead of hdfs, that is clearly, your own 
configuration issue. 

If you have cluster, go to any node to 
HBASE_HOME/bin

and run backup command

If it fails, then you are will not be able to run any hbase command, including 
'shell'. 




> HBase Backup/Restore Based on HBase Snapshot
> 
>
> Key: HBASE-7912
> URL: https://issues.apache.org/jira/browse/HBASE-7912
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Richard Ding
>Assignee: Vladimir Rodionov
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: Backup-and-Restore-Apache_19Sep2016.pdf, 
> Backup-and-Restore-Apache_9Sep2016.pdf, HBaseBackupAndRestore - v0.8.pdf, 
> HBaseBackupAndRestore -0.91.pdf, HBaseBackupAndRestore-v0.9.pdf, 
> HBaseBackupAndRestore.pdf, HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, 
> HBaseBackupRestore-Jira-7912-DesignDoc-v2.pdf, 
> HBaseBackupRestore-Jira-7912-v4.pdf, HBaseBackupRestore-Jira-7912-v5 .pdf, 
> HBaseBackupRestore-Jira-7912-v6.pdf, HBase_BackupRestore-Jira-7912-CLI-v1.pdf
>
>
> Finally, we completed the implementation of our backup/restore solution, and 
> would like to share with community through this jira. 
> We are leveraging existing hbase snapshot feature, and provide a general 
> solution to common users. Our full backup is using snapshot to capture 
> metadata locally and using exportsnapshot to move data to another cluster; 
> the incremental backup is using offline-WALplayer to backup HLogs; we also 
> leverage global distribution rolllog and flush to improve performance; other 
> added-on values such as convert, merge, progress report, and CLI commands. So 
> that a common user can backup hbase data without in-depth knowledge of hbase. 
>  Our solution also contains some usability features for enterprise users. 
> The detail design document and CLI command will be attached in this jira. We 
> plan to use 10~12 subtasks to share each of the following features, and 
> document the detail implement in the subtasks: 
> * *Full Backup* : provide local and remote back/restore for a list of tables
> * *offline-WALPlayer* to convert HLog to HFiles offline (for incremental 
> backup)
> * *distributed* Logroll and distributed flush 
> * Backup *Manifest* and history
> * *Incremental* backup: to build on top of full backup as daily/weekly backup 
> * *Convert*  incremental backup WAL files into hfiles
> * *Merge* several backup images into one(like merge weekly into monthly)
> * *add and remove* table to and from Backup image
> * *Cancel* a backup process
> * backup progress *status*
> * full backup based on *existing snapshot*
> *-*
> *Below is the original description, to keep here as the history for the 
> design and discussion back in 2013*
> There have been attempts in the past to come up with a viable HBase 
> backup/restore solution (e.g., HBASE-4618).  Recently, there are many 
> advancements and new features in HBase, for example, FileLink, Snapshot, and 
> Distributed Barrier Procedure. This is a proposal for a backup/restore 
> solution that utilizes these new features to achieve better performance and 
> consistency. 
>  
> A common practice of backup and restore in database is to first take full 
> baseline backup, and then periodically take incremental backup that capture 
> the changes since the full baseline backup. HBase cluster can store massive 
> amount data.  Combination of full backups with incremental backups has 
> tremendous benefit for HBase as well.  The following is a typical scenario 
> for full and incremental backup.
> # The user takes a full backup of a table or a set of tables in HBase. 
> # The user schedules periodical incremental backups to capture the changes 
> from the full backup, or from last incremental backup.
> # The user needs 

[jira] [Created] (HBASE-16821) Enhance LoadIncrementalHFiles to convey missing hfiles if any

2016-10-12 Thread Ted Yu (JIRA)
Ted Yu created HBASE-16821:
--

 Summary: Enhance LoadIncrementalHFiles to convey missing hfiles if 
any
 Key: HBASE-16821
 URL: https://issues.apache.org/jira/browse/HBASE-16821
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu


When the map parameter of the run() method is not null:
{code}
  public int run(String dirPath, Map map, TableName 
tableName) throws Exception{
{code}
the caller knows the exact files to be bulk loaded.

This issue is to enhance the run() API so that when certain hfiles turn out to 
be missing, the return value should indicate the missing files.
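
A hypothetical sketch of one shape the enhancement could take; the class and method names here are assumptions for illustration, not the final API:

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical result object a future run() variant could return or populate. */
public class BulkLoadOutcome {
  private final List<String> missingHFiles = new ArrayList<>();

  public void addMissing(String hfilePath) { missingHFiles.add(hfilePath); }
  public boolean hasMissing() { return !missingHFiles.isEmpty(); }
  public List<String> getMissingHFiles() { return Collections.unmodifiableList(missingHFiles); }
}
{code}

The idea being that LoadIncrementalHFiles#run(...) could surface such an object (or populate one passed in by the caller) instead of only returning an int, so the caller learns exactly which requested hfiles were absent.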



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16749) HBase root pom.xml contains repo from people.apache.org/~garyh

2016-10-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570549#comment-15570549
 ] 

Hudson commented on HBASE-16749:


SUCCESS: Integrated in Jenkins build HBase-1.2-JDK7 #46 (See 
[https://builds.apache.org/job/HBase-1.2-JDK7/46/])
HBASE-16749 Remove defunct repo from people.apache.org (garyh: rev 
211559e6d012b52b3e0fc8ecbd1a71856c9ae06c)
* (edit) pom.xml


> HBase root pom.xml contains repo from people.apache.org/~garyh
> --
>
> Key: HBASE-16749
> URL: https://issues.apache.org/jira/browse/HBASE-16749
> Project: HBase
>  Issue Type: Task
>Reporter: Wangda Tan
>Assignee: Gary Helmling
> Fix For: 1.2.5, 1.1.8
>
> Attachments: HBASE-16749.branch-1.1.patch, 
> HBASE-16749.branch-1.2.patch, HBASE-16749.branch-1.2.patch
>
>
> This is found when I was building Hadoop/YARN:
> {code}
> [INFO] 
> 
> [INFO] Building Apache Hadoop YARN Timeline Service 3.0.0-alpha2-SNAPSHOT
> [INFO] 
> 
> Downloading: 
> http://conjars.org/repo/org/apache/hadoop/hadoop-client/3.0.0-alpha2-SNAPSHOT/maven-metadata.xml
> ...
> Downloading: 
> http://people.apache.org/~garyh/mvn/org/apache/hadoop/hadoop-client/3.0.0-alpha2-SNAPSHOT/maven-metadata.xml
> ...
> Downloading: 
> http://people.apache.org/~garyh/mvn/org/apache/hadoop/hadoop-client/3.0.0-alpha2-SNAPSHOT/maven-metadata.xml
> {code}
> [~te...@apache.org] mentioned:
> bq. Among hbase releases (1.1.x, 1.2.y), root pom.xml contains 
> "ghelmling.testing" repo. 
> This private repo sometimes extends the Hadoop build time a lot. It might be 
> better to use ASF snapshot repo first before downloading from private repo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot

2016-10-12 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570534#comment-15570534
 ] 

Vladimir Rodionov edited comment on HBASE-7912 at 10/13/16 1:58 AM:


[~stack]
You can file JIRA, of course, but I think it will be closed with standard 
suggestion to ask community for help on how to
configure HBase cluster. From your log file, we clearly see that 
LocalFileSystem is used instead of hdfs, that is clearly, your own 
configuration issue. 

If you have cluster, go to any node to 
HBASE_HOME/bin

and run backup command

If it fails, then you will not be able to run any hbase command, including 
'shell'. 





was (Author: vrodionov):
[~stack]
You can file JIRA, of course, but I think it will be closed with standard 
suggestion to ask community for help on how to
configure HBase cluster. From your log file, we clearly see that 
LocalFileSystem is used instead of hdfs, that is clearly, your own 
configuration issue. 

If you have cluster, go to any node to 
HBASE_HOME/bin

and run backup command

If it fails, then you are not able to run any hbase command, including 'shell'. 




> HBase Backup/Restore Based on HBase Snapshot
> 
>
> Key: HBASE-7912
> URL: https://issues.apache.org/jira/browse/HBASE-7912
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Richard Ding
>Assignee: Vladimir Rodionov
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: Backup-and-Restore-Apache_19Sep2016.pdf, 
> Backup-and-Restore-Apache_9Sep2016.pdf, HBaseBackupAndRestore - v0.8.pdf, 
> HBaseBackupAndRestore -0.91.pdf, HBaseBackupAndRestore-v0.9.pdf, 
> HBaseBackupAndRestore.pdf, HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, 
> HBaseBackupRestore-Jira-7912-DesignDoc-v2.pdf, 
> HBaseBackupRestore-Jira-7912-v4.pdf, HBaseBackupRestore-Jira-7912-v5 .pdf, 
> HBaseBackupRestore-Jira-7912-v6.pdf, HBase_BackupRestore-Jira-7912-CLI-v1.pdf
>
>
> Finally, we completed the implementation of our backup/restore solution, and 
> would like to share with community through this jira. 
> We are leveraging existing hbase snapshot feature, and provide a general 
> solution to common users. Our full backup is using snapshot to capture 
> metadata locally and using exportsnapshot to move data to another cluster; 
> the incremental backup is using offline-WALplayer to backup HLogs; we also 
> leverage global distribution rolllog and flush to improve performance; other 
> added-on values such as convert, merge, progress report, and CLI commands. So 
> that a common user can backup hbase data without in-depth knowledge of hbase. 
>  Our solution also contains some usability features for enterprise users. 
> The detail design document and CLI command will be attached in this jira. We 
> plan to use 10~12 subtasks to share each of the following features, and 
> document the detail implement in the subtasks: 
> * *Full Backup* : provide local and remote back/restore for a list of tables
> * *offline-WALPlayer* to convert HLog to HFiles offline (for incremental 
> backup)
> * *distributed* Logroll and distributed flush 
> * Backup *Manifest* and history
> * *Incremental* backup: to build on top of full backup as daily/weekly backup 
> * *Convert*  incremental backup WAL files into hfiles
> * *Merge* several backup images into one(like merge weekly into monthly)
> * *add and remove* table to and from Backup image
> * *Cancel* a backup process
> * backup progress *status*
> * full backup based on *existing snapshot*
> *-*
> *Below is the original description, to keep here as the history for the 
> design and discussion back in 2013*
> There have been attempts in the past to come up with a viable HBase 
> backup/restore solution (e.g., HBASE-4618).  Recently, there are many 
> advancements and new features in HBase, for example, FileLink, Snapshot, and 
> Distributed Barrier Procedure. This is a proposal for a backup/restore 
> solution that utilizes these new features to achieve better performance and 
> consistency. 
>  
> A common practice of backup and restore in database is to first take full 
> baseline backup, and then periodically take incremental backup that capture 
> the changes since the full baseline backup. HBase cluster can store massive 
> amount data.  Combination of full backups with incremental backups has 
> tremendous benefit for HBase as well.  The following is a typical scenario 
> for full and incremental backup.
> # The user takes a full backup of a table or a set of tables in HBase. 
> # The user schedules periodical incremental backups to capture the changes 
> from the full backup, or from last incremental backup.
> # The user needs to 

[jira] [Comment Edited] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot

2016-10-12 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570534#comment-15570534
 ] 

Vladimir Rodionov edited comment on HBASE-7912 at 10/13/16 1:56 AM:


[~stack]
You can file JIRA, of course, but I think it will be closed with standard 
suggestion to ask community for help on how to
configure HBase cluster. From your log file, we clearly see that 
LocalFileSystem is used instead of hdfs, that is clearly, your own 
configuration issue. 

If you have cluster, go to any node to 
HBASE_HOME/bin

and run backup command

If it fails, then you are not able to run any hbase command, including 'shell'. 





was (Author: vrodionov):
[~stack]
You can file JIRA, of course, but I think it will be closed with standard 
suggestion to ask community for help on how to
configure HBase cluster. From your log file, we clearly see that 
LocalFileSystem is used instead of hdfs, that is clearly, your own 
configuration issue. 




> HBase Backup/Restore Based on HBase Snapshot
> 
>
> Key: HBASE-7912
> URL: https://issues.apache.org/jira/browse/HBASE-7912
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Richard Ding
>Assignee: Vladimir Rodionov
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: Backup-and-Restore-Apache_19Sep2016.pdf, 
> Backup-and-Restore-Apache_9Sep2016.pdf, HBaseBackupAndRestore - v0.8.pdf, 
> HBaseBackupAndRestore -0.91.pdf, HBaseBackupAndRestore-v0.9.pdf, 
> HBaseBackupAndRestore.pdf, HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, 
> HBaseBackupRestore-Jira-7912-DesignDoc-v2.pdf, 
> HBaseBackupRestore-Jira-7912-v4.pdf, HBaseBackupRestore-Jira-7912-v5 .pdf, 
> HBaseBackupRestore-Jira-7912-v6.pdf, HBase_BackupRestore-Jira-7912-CLI-v1.pdf
>
>
> Finally, we completed the implementation of our backup/restore solution, and 
> would like to share with community through this jira. 
> We are leveraging existing hbase snapshot feature, and provide a general 
> solution to common users. Our full backup is using snapshot to capture 
> metadata locally and using exportsnapshot to move data to another cluster; 
> the incremental backup is using offline-WALplayer to backup HLogs; we also 
> leverage global distribution rolllog and flush to improve performance; other 
> added-on values such as convert, merge, progress report, and CLI commands. So 
> that a common user can backup hbase data without in-depth knowledge of hbase. 
>  Our solution also contains some usability features for enterprise users. 
> The detail design document and CLI command will be attached in this jira. We 
> plan to use 10~12 subtasks to share each of the following features, and 
> document the detail implement in the subtasks: 
> * *Full Backup* : provide local and remote back/restore for a list of tables
> * *offline-WALPlayer* to convert HLog to HFiles offline (for incremental 
> backup)
> * *distributed* Logroll and distributed flush 
> * Backup *Manifest* and history
> * *Incremental* backup: to build on top of full backup as daily/weekly backup 
> * *Convert*  incremental backup WAL files into hfiles
> * *Merge* several backup images into one(like merge weekly into monthly)
> * *add and remove* table to and from Backup image
> * *Cancel* a backup process
> * backup progress *status*
> * full backup based on *existing snapshot*
> *-*
> *Below is the original description, to keep here as the history for the 
> design and discussion back in 2013*
> There have been attempts in the past to come up with a viable HBase 
> backup/restore solution (e.g., HBASE-4618).  Recently, there are many 
> advancements and new features in HBase, for example, FileLink, Snapshot, and 
> Distributed Barrier Procedure. This is a proposal for a backup/restore 
> solution that utilizes these new features to achieve better performance and 
> consistency. 
>  
> A common practice of backup and restore in database is to first take full 
> baseline backup, and then periodically take incremental backup that capture 
> the changes since the full baseline backup. HBase cluster can store massive 
> amount data.  Combination of full backups with incremental backups has 
> tremendous benefit for HBase as well.  The following is a typical scenario 
> for full and incremental backup.
> # The user takes a full backup of a table or a set of tables in HBase. 
> # The user schedules periodical incremental backups to capture the changes 
> from the full backup, or from last incremental backup.
> # The user needs to restore table data to a past point of time.
> # The full backup is restored to the table(s) or to different table name(s).  
> Then the incremental backups that are up to 

[jira] [Commented] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot

2016-10-12 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570534#comment-15570534
 ] 

Vladimir Rodionov commented on HBASE-7912:
--

[~stack]
You can file a JIRA, of course, but I think it will be closed with the standard 
suggestion to ask the community for help on how to configure an HBase cluster. 
From your log file, we clearly see that LocalFileSystem is used instead of hdfs, 
which is clearly your own configuration issue. 




> HBase Backup/Restore Based on HBase Snapshot
> 
>
> Key: HBASE-7912
> URL: https://issues.apache.org/jira/browse/HBASE-7912
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Richard Ding
>Assignee: Vladimir Rodionov
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: Backup-and-Restore-Apache_19Sep2016.pdf, 
> Backup-and-Restore-Apache_9Sep2016.pdf, HBaseBackupAndRestore - v0.8.pdf, 
> HBaseBackupAndRestore -0.91.pdf, HBaseBackupAndRestore-v0.9.pdf, 
> HBaseBackupAndRestore.pdf, HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, 
> HBaseBackupRestore-Jira-7912-DesignDoc-v2.pdf, 
> HBaseBackupRestore-Jira-7912-v4.pdf, HBaseBackupRestore-Jira-7912-v5 .pdf, 
> HBaseBackupRestore-Jira-7912-v6.pdf, HBase_BackupRestore-Jira-7912-CLI-v1.pdf
>
>
> Finally, we completed the implementation of our backup/restore solution, and 
> would like to share with community through this jira. 
> We are leveraging existing hbase snapshot feature, and provide a general 
> solution to common users. Our full backup is using snapshot to capture 
> metadata locally and using exportsnapshot to move data to another cluster; 
> the incremental backup is using offline-WALplayer to backup HLogs; we also 
> leverage global distribution rolllog and flush to improve performance; other 
> added-on values such as convert, merge, progress report, and CLI commands. So 
> that a common user can backup hbase data without in-depth knowledge of hbase. 
>  Our solution also contains some usability features for enterprise users. 
> The detail design document and CLI command will be attached in this jira. We 
> plan to use 10~12 subtasks to share each of the following features, and 
> document the detail implement in the subtasks: 
> * *Full Backup* : provide local and remote back/restore for a list of tables
> * *offline-WALPlayer* to convert HLog to HFiles offline (for incremental 
> backup)
> * *distributed* Logroll and distributed flush 
> * Backup *Manifest* and history
> * *Incremental* backup: to build on top of full backup as daily/weekly backup 
> * *Convert*  incremental backup WAL files into hfiles
> * *Merge* several backup images into one(like merge weekly into monthly)
> * *add and remove* table to and from Backup image
> * *Cancel* a backup process
> * backup progress *status*
> * full backup based on *existing snapshot*
> *-*
> *Below is the original description, to keep here as the history for the 
> design and discussion back in 2013*
> There have been attempts in the past to come up with a viable HBase 
> backup/restore solution (e.g., HBASE-4618).  Recently, there are many 
> advancements and new features in HBase, for example, FileLink, Snapshot, and 
> Distributed Barrier Procedure. This is a proposal for a backup/restore 
> solution that utilizes these new features to achieve better performance and 
> consistency. 
>  
> A common practice of backup and restore in database is to first take full 
> baseline backup, and then periodically take incremental backup that capture 
> the changes since the full baseline backup. HBase cluster can store massive 
> amount data.  Combination of full backups with incremental backups has 
> tremendous benefit for HBase as well.  The following is a typical scenario 
> for full and incremental backup.
> # The user takes a full backup of a table or a set of tables in HBase. 
> # The user schedules periodical incremental backups to capture the changes 
> from the full backup, or from last incremental backup.
> # The user needs to restore table data to a past point of time.
> # The full backup is restored to the table(s) or to different table name(s).  
> Then the incremental backups that are up to the desired point in time are 
> applied on top of the full backup. 
> We would support the following key features and capabilities.
> * Full backup uses HBase snapshot to capture HFiles.
> * Use HBase WALs to capture incremental changes, but we use bulk load of 
> HFiles for fast incremental restore.
> * Support single table or a set of tables, and column family level backup and 
> restore.
> * Restore to different table names.
> * Support adding additional tables or CF to backup set without interruption 
> of incremental 

[jira] [Commented] (HBASE-16578) Mob data loss after mob compaction and normal compaction

2016-10-12 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570525#comment-15570525
 ] 

Ted Yu commented on HBASE-16578:


+1 if tests pass.

> Mob data loss after mob compaction and normal compaction
> 
>
> Key: HBASE-16578
> URL: https://issues.apache.org/jira/browse/HBASE-16578
> Project: HBase
>  Issue Type: Bug
>  Components: mob
>Affects Versions: 2.0.0
>Reporter: huaxiang sun
>Assignee: Jingcheng Du
> Attachments: HBASE-16578-V2.patch, HBASE-16578.patch, 
> TestMobCompaction.java, TestMobCompaction.java
>
>
> StoreFileScanners on MOB cells rely on the scannerOrder to find the latest 
> cells after mob compaction. The value of scannerOrder is assigned by the 
> order of maxSeqId of the StoreFiles, and this maxSeqId is set only after the 
> reader of the StoreFile is created.
> In {{Compactor.compact}}, the compacted store files are cloned and their 
> readers are not created. And in {{StoreFileScanner.getScannersForStoreFiles}} 
> the StoreFiles are sorted before the readers are created, and at that time the 
> maxSeqId for each file is -1 (the default value). This leads to chaos among the 
> scanners in the following normal compaction: some older cells might be 
> chosen during the normal compaction.
> We need to create readers either before the sorting in the method 
> {{StoreFileScanner.getScannersForStoreFiles}}, or create readers just after 
> the store files are cloned in {{Compactor.compact}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16653) Backport HBASE-11393 to all branches which support namespace

2016-10-12 Thread Guanghao Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570524#comment-15570524
 ] 

Guanghao Zhang commented on HBASE-16653:


ReplicationPeers, ReplicationPeersZKImpl and ReplicationPeerZKImpl are 
InterfaceAudience.Private, so I kept those changes in the v3 patch. But the 
ReplicationPeerConfig class is InterfaceAudience.Public, and the v3 patch adds two 
methods to it. Does this have compatibility issues?
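
To make the compatibility question concrete, a rough illustration (the method names are assumptions based on the patch description, not a guaranteed API): purely additive methods on a Public class keep existing callers compiling and running, whereas removing or changing existing signatures would not.

{code}
import java.util.List;
import java.util.Map;

/** Hypothetical additive change to a public config class, for illustration only. */
public class ReplicationPeerConfigSketch {
  private Map<String, List<String>> tableCFsMap;   // table name -> column families, null means all

  // New getter/setter pair: existing callers are unaffected by their addition.
  public Map<String, List<String>> getTableCFsMap() { return tableCFsMap; }
  public ReplicationPeerConfigSketch setTableCFsMap(Map<String, List<String>> tableCFsMap) {
    this.tableCFsMap = tableCFsMap;
    return this;
  }
}
{code}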

> Backport HBASE-11393 to all branches which support namespace
> 
>
> Key: HBASE-16653
> URL: https://issues.apache.org/jira/browse/HBASE-16653
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.4.0, 1.0.5, 1.3.1, 0.98.22, 1.1.7, 1.2.4
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
> Fix For: 1.4.0
>
> Attachments: HBASE-16653-branch-1-v1.patch, 
> HBASE-16653-branch-1-v2.patch, HBASE-16653-branch-1-v3.patch
>
>
> As HBASE-11386 mentioned, the parsing code for the replication table-cfs config 
> goes wrong when a table name contains a namespace, so we can only configure the 
> default namespace's tables in the peer. It is a bug for all branches which 
> support namespaces. HBASE-11393 resolved this by using a pb object, but it was 
> only merged to the master branch; other branches still have this problem. I 
> think we should fix this bug in all branches which support namespaces.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16816) HMaster.move() should throw exception if region to move is not online

2016-10-12 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570522#comment-15570522
 ] 

Ted Yu commented on HBASE-16816:


Can you detect region online without sleeping ?
{code}
160  //wait region online
161  Thread.sleep(1000);
{code}
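
A minimal sketch of one way to avoid the fixed sleep, assuming the test's usual HBaseTestingUtility instance (TEST_UTIL) and the table's TableName; the 60s bound is illustrative:

{code}
// e.g. inside the test, after triggering the assignment:
TEST_UTIL.waitUntilAllRegionsAssigned(tableName);   // blocks until the table's regions show up in meta
// or, with an explicit bound:
TEST_UTIL.waitTableAvailable(tableName, 60000);
{code}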

> HMaster.move() should throw exception if region to move is not online
> -
>
> Key: HBASE-16816
> URL: https://issues.apache.org/jira/browse/HBASE-16816
> Project: HBase
>  Issue Type: Bug
>  Components: Admin
>Affects Versions: 1.1.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Minor
> Attachments: HBASE-16816-branch-1-v2.patch, HBASE-16816-branch-1.patch
>
>
> The move region function in HMaster only checks whether the region to move 
> exists
> {code}
> if (regionState == null) {
>   throw new 
> UnknownRegionException(Bytes.toStringBinary(encodedRegionName));
> }
> {code}
> It will not return anything if the region is split or in transition, i.e. not 
> movable, so the caller has no way to know whether the move region operation 
> failed.
> It is a problem for "region_move.rb". It only gives up moving a region if an 
> exception is thrown. Otherwise, it will wait until a timeout and retry. 
> Without an exception, it has no idea the region is not movable.
> {code}
> begin
>   admin.move(Bytes.toBytes(r.getEncodedName()), Bytes.toBytes(newServer))
> rescue java.lang.reflect.UndeclaredThrowableException,
> org.apache.hadoop.hbase.UnknownRegionException => e
>   $LOG.info("Exception moving "  + r.getEncodedName() +
> "; split/moved? Continuing: " + e)
>   return
> end
>  # Wait till its up on new server before moving on
> maxWaitInSeconds = admin.getConfiguration.getInt("hbase.move.wait.max", 
> 60)
> maxWait = Time.now + maxWaitInSeconds
> while Time.now < maxWait
>   same = isSameServer(admin, r, original)
>   break unless same
>   sleep 0.1
> end
>   end
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16812) Cleanup deprecated compact() function

2016-10-12 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570519#comment-15570519
 ] 

Appy commented on HBASE-16812:
--

I tried to find the locks for CompactionTool and ExpiredMobFileCleaner but 
couldn't find them. Can you please point me to them, maybe with GitHub links? Thanks.


> Cleanup deprecated compact() function
> -
>
> Key: HBASE-16812
> URL: https://issues.apache.org/jira/browse/HBASE-16812
> Project: HBase
>  Issue Type: Task
>Reporter: Appy
>Assignee: Appy
>Priority: Minor
> Attachments: HBASE-16812.master.001.patch
>
>
> compact(CompactionContext compaction, CompactionThroughputController 
> throughputController) is [deprecated in 1.2.0 
> release|https://github.com/apache/hbase/blob/rel/1.2.0/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java#L222].
> Store.java is also marked limited private.
> Context: I was cleaning up zk table lock which is also used in that method's 
> [override|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HMobStore.java#L460]
>  in HMobStore.
> This method isn't being called from anywhere except CompactionTool (which 
> creates HStore object, not HMobStore object).
> [~jingcheng...@intel.com] Can you PTAL and help me understand what's going on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16578) Mob data loss after mob compaction and normal compaction

2016-10-12 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HBASE-16578:
-
Attachment: HBASE-16578-V2.patch

Uploaded a new patch V2 to address Ted's comments.

> Mob data loss after mob compaction and normal compaction
> 
>
> Key: HBASE-16578
> URL: https://issues.apache.org/jira/browse/HBASE-16578
> Project: HBase
>  Issue Type: Bug
>  Components: mob
>Affects Versions: 2.0.0
>Reporter: huaxiang sun
>Assignee: Jingcheng Du
> Attachments: HBASE-16578-V2.patch, HBASE-16578.patch, 
> TestMobCompaction.java, TestMobCompaction.java
>
>
> StoreFileScanners on MOB cells rely on the scannerOrder to find the latest 
> cells after mob compaction. The value of scannerOrder is assigned by the 
> order of the maxSeqId of the StoreFile, and this maxSeqId is set only after the 
> reader of the StoreFile is created.
> In {{Compactor.compact}}, the compacted store files are cloned and their 
> readers are not created. And in {{StoreFileScanner.getScannersForStoreFiles}} 
> the StoreFiles are sorted before the readers are created and at that time the 
> maxSeqId for each file is -1 (the default value). This will lead to chaos 
> in scanners in the following normal compaction. Some older cells might be 
> chosen during the normal compaction.
> We need to create readers either before the sorting in the method 
> {{StoreFileScanner.getScannersForStoreFiles}}, or create readers just after 
> the store files are cloned in {{Compactor.compact}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16816) HMaster.move() should throw exception if region to move is not online

2016-10-12 Thread Allan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang updated HBASE-16816:
---
Attachment: HBASE-16816-branch-1-v2.patch

Fix the TestWarmupRegion UT failure.

> HMaster.move() should throw exception if region to move is not online
> -
>
> Key: HBASE-16816
> URL: https://issues.apache.org/jira/browse/HBASE-16816
> Project: HBase
>  Issue Type: Bug
>  Components: Admin
>Affects Versions: 1.1.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Minor
> Attachments: HBASE-16816-branch-1-v2.patch, HBASE-16816-branch-1.patch
>
>
> The move region function in HMaster only checks whether the region to move 
> exists
> {code}
> if (regionState == null) {
>   throw new 
> UnknownRegionException(Bytes.toStringBinary(encodedRegionName));
> }
> {code}
> It will not return anything if the region is split or in transition, which is 
> not movable. So the caller has no way to know whether the move region operation 
> failed.
> It is a problem for "region_move.rb". It only gives up moving a region if an 
> exception is thrown. Otherwise, it will wait until a timeout and retry. 
> Without an exception, it has no way to know that the region is not movable.
> {code}
> begin
>   admin.move(Bytes.toBytes(r.getEncodedName()), Bytes.toBytes(newServer))
> rescue java.lang.reflect.UndeclaredThrowableException,
> org.apache.hadoop.hbase.UnknownRegionException => e
>   $LOG.info("Exception moving "  + r.getEncodedName() +
> "; split/moved? Continuing: " + e)
>   return
> end
>  # Wait till its up on new server before moving on
> maxWaitInSeconds = admin.getConfiguration.getInt("hbase.move.wait.max", 
> 60)
> maxWait = Time.now + maxWaitInSeconds
> while Time.now < maxWait
>   same = isSameServer(admin, r, original)
>   break unless same
>   sleep 0.1
> end
>   end
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16815) Low scan ratio in RPC queue tuning triggers divide by zero exception

2016-10-12 Thread Guanghao Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570507#comment-15570507
 ] 

Guanghao Zhang commented on HBASE-16815:


If the queue count is 0, maybe it is better to skip startHandler()?
{noformat}
2016-10-12 12:50:27,311 DEBUG [main] ipc.RWQueueRpcExecutor: FifoRWQ.default 
writeQueues=7 writeHandlers=15 readQueues=8 readHandlers=14
{noformat}
I think we can also change the log level to INFO and log the scan queue count and 
scan handler count.
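A zero-aware guard could look roughly like the following (a sketch of the idea only; the 
method and variable names are illustrative, not the actual RpcExecutor/RWQueueRpcExecutor 
code):
{code}
// skip spinning up handlers for an empty queue group; with numScanQueues == 0
// the "index % numScanQueues" computation would otherwise divide by zero
if (numScanQueues > 0) {
  startScanHandlers(numScanHandlers, numScanQueues);
} else {
  LOG.info("scan.ratio rounded down to 0 scan queues; scan requests share the read queues");
}
{code}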

> Low scan ratio in RPC queue tuning triggers divide by zero exception
> 
>
> Key: HBASE-16815
> URL: https://issues.apache.org/jira/browse/HBASE-16815
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, rpc
>Affects Versions: 2.0.0, 1.3.0
>Reporter: Lars George
>
> Trying the following settings:
> {noformat}
> 
>   hbase.ipc.server.callqueue.handler.factor
>   0.5
> 
> 
>   hbase.ipc.server.callqueue.read.ratio
>   0.5
> 
> 
>   hbase.ipc.server.callqueue.scan.ratio
>   0.1
> 
> {noformat}
> With 30 default handlers, this means 15 queues. Further, it means 7 write 
> queues and 8 read queues. 10% of the read queues is {{0.8}}, which is then floored to 
> {{0}}. The debug log confirms it, as the ternary check omits the scan 
> details when they are zero:
> {noformat}
> 2016-10-12 12:50:27,305 INFO  [main] ipc.SimpleRpcScheduler: Using fifo as 
> user call queue, count=15
> 2016-10-12 12:50:27,311 DEBUG [main] ipc.RWQueueRpcExecutor: FifoRWQ.default 
> writeQueues=7 writeHandlers=15 readQueues=8 readHandlers=14
> {noformat}
> But the code in {{RWQueueRpcExecutor}} calls {{RpcExecutor.startHandler()}} 
> nevertheless and that does this:
> {code}
> for (int i = 0; i < numHandlers; i++) {
>   final int index = qindex + (i % qsize);
>   String name = "RpcServer." + threadPrefix + ".handler=" + 
> handlers.size() + ",queue=" +
>   index + ",port=" + port;
> {code}
> The modulo then triggers:
> {noformat}
> 2016-10-12 11:41:22,810 ERROR [main] master.HMasterCommandLine: Master exiting
> java.lang.RuntimeException: Failed construction of Master: class 
> org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster
> at 
> org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:145)
> at 
> org.apache.hadoop.hbase.LocalHBaseCluster.addMaster(LocalHBaseCluster.java:220)
> at 
> org.apache.hadoop.hbase.LocalHBaseCluster.(LocalHBaseCluster.java:155)
> at 
> org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:222)
> at 
> org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:137)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at 
> org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
> at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2524)
> Caused by: java.lang.ArithmeticException: / by zero
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.startHandlers(RpcExecutor.java:125)
> at 
> org.apache.hadoop.hbase.ipc.RWQueueRpcExecutor.startHandlers(RWQueueRpcExecutor.java:178)
> at org.apache.hadoop.hbase.ipc.RpcExecutor.start(RpcExecutor.java:78)
> at 
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.start(SimpleRpcScheduler.java:272)
> at org.apache.hadoop.hbase.ipc.RpcServer.start(RpcServer.java:2212)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.start(RSRpcServices.java:1143)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.(HRegionServer.java:615)
> at org.apache.hadoop.hbase.master.HMaster.(HMaster.java:396)
> at 
> org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.(HMasterCommandLine.java:312)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at 
> org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:140)
> ... 7 more
> {noformat}
> That causes the server to not even start. I would suggest we either skip the 
> {{startHandler()}} call altogether, or make it zero-aware.
> Another possible option is to reserve at least _one_ scan handler/queue when 
> the scan ratio is greater than zero, but only if there is more than one read 
> handler/queue to begin with. Otherwise the scan handler/queue should be zero 
> and share the one read handler/queue.
> Makes sense?



--
This message was sent by Atlassian JIRA

[jira] [Updated] (HBASE-16786) Procedure V2 - Move ZK-lock's uses to Procedure framework locks (LockProcedure)

2016-10-12 Thread Appy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Appy updated HBASE-16786:
-
Attachment: HBASE-16786.master.002.patch

> Procedure V2 - Move ZK-lock's uses to Procedure framework locks 
> (LockProcedure)
> ---
>
> Key: HBASE-16786
> URL: https://issues.apache.org/jira/browse/HBASE-16786
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Appy
>Assignee: Appy
> Attachments: HBASE-16786.master.001.patch, 
> HBASE-16786.master.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16812) Cleanup deprecated compact() function

2016-10-12 Thread Jingcheng Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570482#comment-15570482
 ] 

Jingcheng Du commented on HBASE-16812:
--

Thanks a lot [~appy]. I can take this JIRA to clean up the deprecated compact() 
method.
bq. If compaction tool triggers compact, then no locks are being taken.
Actually the lock is taken in CompactionTool, because we do this in 
{{DefaultMobStoreCompactor.compact}}.
Right now we don't have mob compaction in CompactionTool; how about adding it in 
another JIRA, and only doing the cleanup in this one?
bq. If ExpiredMobFileCleaner is triggered from command line, then no locks are 
taken.
We take the read lock here too, to synchronize this operation with mob and major 
compactions.

> Cleanup deprecated compact() function
> -
>
> Key: HBASE-16812
> URL: https://issues.apache.org/jira/browse/HBASE-16812
> Project: HBase
>  Issue Type: Task
>Reporter: Appy
>Assignee: Appy
>Priority: Minor
> Attachments: HBASE-16812.master.001.patch
>
>
> compact(CompactionContext compaction, CompactionThroughputController 
> throughputController) is [deprecated in 1.2.0 
> release|https://github.com/apache/hbase/blob/rel/1.2.0/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java#L222].
> Store.java is also marked limited private.
> Context: I was cleaning up zk table lock which is also used in that method's 
> [override|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HMobStore.java#L460]
>  in HMobStore.
> This method isn't being called from anywhere except CompactionTool (which 
> creates HStore object, not HMobStore object).
> [~jingcheng...@intel.com] Can you PTAL and help me understand what's going on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16752) Upgrading from 1.2 to 1.3 can lead to replication failures due to difference in RPC size limit

2016-10-12 Thread Ashu Pachauri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashu Pachauri updated HBASE-16752:
--
Attachment: HBASE-16752.V2.patch

V2: Close the connection after making sure we send the response for the 
offending RPC to the client. Also added javadoc for 
RequestTooBigException.

> Upgrading from 1.2 to 1.3 can lead to replication failures due to difference 
> in RPC size limit
> --
>
> Key: HBASE-16752
> URL: https://issues.apache.org/jira/browse/HBASE-16752
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, rpc
>Affects Versions: 2.0.0, 1.3.0
>Reporter: Ashu Pachauri
>Assignee: Ashu Pachauri
> Attachments: HBASE-16752.V1.patch, HBASE-16752.V2.patch
>
>
> In HBase 1.2, we don't limit the size of a single RPC, but in 1.3 we limit it by 
> default to 256 MB.  This means that during upgrade scenarios (or when the source 
> is on 1.2 and the peer is already on 1.3), it's possible to encounter a situation where 
> we try to send an rpc with size greater than 256 MB because we never unroll a 
> WALEdit while sending replication traffic.
> RpcServer throws the underlying exception locally, but closes the connection 
> without returning the underlying error to the client, and the client only sees a 
> "Broken pipe" error.
> I am not sure what is the proper fix here (or if one is needed) to make sure 
> this does not happen, but we should return the underlying exception to the 
> RpcClient, because without it, it can be difficult to diagnose the problem, 
> especially for someone new to HBase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16578) Mob data loss after mob compaction and normal compaction

2016-10-12 Thread Jingcheng Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570444#comment-15570444
 ] 

Jingcheng Du commented on HBASE-16578:
--

Thanks a lot for the review [~te...@apache.org]! I will upload a patch to address 
this.
bq. The original test is superseded by the new test ?
Yes, the new test supersedes the old one.

> Mob data loss after mob compaction and normal compaction
> 
>
> Key: HBASE-16578
> URL: https://issues.apache.org/jira/browse/HBASE-16578
> Project: HBase
>  Issue Type: Bug
>  Components: mob
>Affects Versions: 2.0.0
>Reporter: huaxiang sun
>Assignee: Jingcheng Du
> Attachments: HBASE-16578.patch, TestMobCompaction.java, 
> TestMobCompaction.java
>
>
> StoreFileScanners on MOB cells rely on the scannerOrder to find the latest 
> cells after mob compaction. The value of scannerOrder is assigned by the 
> order of the maxSeqId of the StoreFile, and this maxSeqId is set only after the 
> reader of the StoreFile is created.
> In {{Compactor.compact}}, the compacted store files are cloned and their 
> readers are not created. And in {{StoreFileScanner.getScannersForStoreFiles}} 
> the StoreFiles are sorted before the readers are created and at that time the 
> maxSeqId for each file is -1 (the default value). This will lead to chaos 
> in scanners in the following normal compaction. Some older cells might be 
> chosen during the normal compaction.
> We need to create readers either before the sorting in the method 
> {{StoreFileScanner.getScannersForStoreFiles}}, or create readers just after 
> the store files are cloned in {{Compactor.compact}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16721) Concurrency issue in WAL unflushed seqId tracking

2016-10-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570404#comment-15570404
 ] 

Hadoop QA commented on HBASE-16721:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 2m 39s 
{color} | {color:red} root in branch-1.1 failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 6s 
{color} | {color:red} hbase-server in branch-1.1 failed with JDK v1.8.0_101. 
{color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 3m 45s 
{color} | {color:red} hbase-server in branch-1.1 failed with JDK v1.7.0_80. 
{color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
9s {color} | {color:green} branch-1.1 passed {color} |
| {color:red}-1{color} | {color:red} mvneclipse {color} | {color:red} 0m 14s 
{color} | {color:red} hbase-server in branch-1.1 failed. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 6s 
{color} | {color:red} hbase-server in branch-1.1 failed. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 12s 
{color} | {color:red} hbase-server in branch-1.1 failed with JDK v1.8.0_101. 
{color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 6s 
{color} | {color:red} hbase-server in branch-1.1 failed with JDK v1.7.0_80. 
{color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 6s 
{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 5s 
{color} | {color:red} hbase-server in the patch failed with JDK v1.8.0_101. 
{color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 5s {color} 
| {color:red} hbase-server in the patch failed with JDK v1.8.0_101. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 6s 
{color} | {color:red} hbase-server in the patch failed with JDK v1.7.0_80. 
{color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 6s {color} 
| {color:red} hbase-server in the patch failed with JDK v1.7.0_80. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
7s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} mvneclipse {color} | {color:red} 0m 6s 
{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
11m 55s {color} | {color:green} The patch does not cause any errors with Hadoop 
2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:red}-1{color} | {color:red} hbaseprotoc {color} | {color:red} 0m 7s 
{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 8s 
{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 6s 
{color} | {color:red} hbase-server in the patch failed with JDK v1.8.0_101. 
{color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 7s 
{color} | {color:red} hbase-server in the patch failed with JDK v1.7.0_80. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 8s {color} | 
{color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
40s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 21m 28s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:35e2245 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12833008/hbase-16721_addendum2.branch-1.1.patch
 |
| JIRA Issue | HBASE-16721 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 8f73a74263de 3.13.0-92-generic 

[jira] [Commented] (HBASE-16752) Upgrading from 1.2 to 1.3 can lead to replication failures due to difference in RPC size limit

2016-10-12 Thread Ashu Pachauri (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570394#comment-15570394
 ] 

Ashu Pachauri commented on HBASE-16752:
---

[~ghelmling] Yeah, you raise a valid point; we need to address the remaining 
part of the request on the channel in some way. Reading the garbage off the 
network just to discard the data sounds like overkill. Even now we close the 
connection; with this change we'll have better feedback to the client even if 
we do close the connection. So I'll just modify the code to close the 
connection.

> Upgrading from 1.2 to 1.3 can lead to replication failures due to difference 
> in RPC size limit
> --
>
> Key: HBASE-16752
> URL: https://issues.apache.org/jira/browse/HBASE-16752
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, rpc
>Affects Versions: 2.0.0, 1.3.0
>Reporter: Ashu Pachauri
>Assignee: Ashu Pachauri
> Attachments: HBASE-16752.V1.patch
>
>
> In HBase 1.2, we don't limit the size of a single RPC, but in 1.3 we limit it by 
> default to 256 MB.  This means that during upgrade scenarios (or when the source 
> is on 1.2 and the peer is already on 1.3), it's possible to encounter a situation where 
> we try to send an rpc with size greater than 256 MB because we never unroll a 
> WALEdit while sending replication traffic.
> RpcServer throws the underlying exception locally, but closes the connection 
> without returning the underlying error to the client, and the client only sees a 
> "Broken pipe" error.
> I am not sure what is the proper fix here (or if one is needed) to make sure 
> this does not happen, but we should return the underlying exception to the 
> RpcClient, because without it, it can be difficult to diagnose the problem, 
> especially for someone new to HBase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache

2016-10-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570386#comment-15570386
 ] 

stack commented on HBASE-15560:
---

Thanks Ben. Let me try it here. Should be able to in the next day or so.

> TinyLFU-based BlockCache
> 
>
> Key: HBASE-15560
> URL: https://issues.apache.org/jira/browse/HBASE-15560
> Project: HBase
>  Issue Type: Improvement
>  Components: BlockCache
>Affects Versions: 2.0.0
>Reporter: Ben Manes
>Assignee: Ben Manes
> Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, 
> HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, 
> tinylfu.patch
>
>
> LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and 
> recency of the working set. It achieves concurrency by using an O( n ) 
> background thread to prioritize the entries and evict. Accessing an entry is 
> O(1) by a hash table lookup, recording its logical access time, and setting a 
> frequency flag. A write is performed in O(1) time by updating the hash table 
> and triggering an async eviction thread. This provides ideal concurrency and 
> minimizes the latencies by penalizing the thread instead of the caller. 
> However the policy does not age the frequencies and may not be resilient to 
> various workload patterns.
> W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the 
> frequency in a counting sketch, ages periodically by halving the counters, 
> and orders entries by SLRU. An entry is discarded by comparing the frequency 
> of the new arrival (candidate) to the SLRU's victim, and keeping the one with 
> the highest frequency. This allows the operations to be performed in O(1) 
> time and, through the use of a compact sketch, a much larger history is 
> retained beyond the current working set. In a variety of real world traces 
> the policy had [near optimal hit 
> rates|https://github.com/ben-manes/caffeine/wiki/Efficiency].
> Concurrency is achieved by buffering and replaying the operations, similar to 
> a write-ahead log. A read is recorded into a striped ring buffer and writes 
> to a queue. The operations are applied in batches under a try-lock by an 
> asynchronous thread, thereby tracking the usage pattern without incurring high 
> latencies 
> ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]).
> In YCSB benchmarks the results were inconclusive. For a large cache (99% hit 
> rates) the two caches have near identical throughput and latencies with 
> LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a 
> 1-4% hit rate improvement and therefore lower latencies. The lackluster 
> result is because a synthetic Zipfian distribution is used, on which SLRU 
> performs optimally. In a more varied, real-world workload we'd expect to see 
> improvements by being able to make smarter predictions.
> The provided patch implements BlockCache using the 
> [Caffeine|https://github.com/ben-manes/caffeine] caching library (see 
> HighScalability 
> [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]).
> Edward Bortnikov and Eshcar Hillel have graciously provided guidance for 
> evaluating this patch ([github 
> branch|https://github.com/ben-manes/hbase/tree/tinylfu]).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot

2016-10-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570379#comment-15570379
 ] 

stack commented on HBASE-7912:
--

{code}
2016-10-12 17:19:49,538 DEBUG [AsyncRpcChannel-pool2-t13] ipc.AsyncRpcChannel: 
Use SIMPLE authentication for service ClientService, sasl=false
2016-10-12 17:19:49,554 ERROR [main] impl.FullTableBackupClient: Unexpected 
BackupException : Wrong FS: hdfs://ve0524.halxg.cloudera.com:8020/hbase/WALs, 
expected: file:///
java.lang.IllegalArgumentException: Wrong FS: 
hdfs://ve0524.halxg.cloudera.com:8020/hbase/WALs, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:647)
at 
org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:82)
at 
org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:425)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1555)
at org.apache.hadoop.fs.FileSystem$4.(FileSystem.java:1712)
at 
org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1711)
at 
org.apache.hadoop.fs.ChecksumFileSystem.listLocatedStatus(ChecksumFileSystem.java:589)
at org.apache.hadoop.fs.FileSystem$6.(FileSystem.java:1787)
at org.apache.hadoop.fs.FileSystem.listFiles(FileSystem.java:1783)
at 
org.apache.hadoop.hbase.backup.util.BackupClientUtil.getFiles(BackupClientUtil.java:161)
at 
org.apache.hadoop.hbase.backup.util.BackupServerUtil.getWALFilesOlderThan(BackupServerUtil.java:381)
at 
org.apache.hadoop.hbase.backup.impl.FullTableBackupClient.execute(FullTableBackupClient.java:492)
at 
org.apache.hadoop.hbase.backup.impl.HBaseBackupAdmin.backupTables(HBaseBackupAdmin.java:532)
at 
org.apache.hadoop.hbase.backup.impl.BackupCommands$CreateCommand.execute(BackupCommands.java:225)
at 
org.apache.hadoop.hbase.backup.BackupDriver.parseAndRun(BackupDriver.java:114)
at 
org.apache.hadoop.hbase.backup.BackupDriver.doWork(BackupDriver.java:135)
at 
org.apache.hadoop.hbase.backup.BackupDriver.run(BackupDriver.java:171)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at 
org.apache.hadoop.hbase.backup.BackupDriver.main(BackupDriver.java:140)
2016-10-12 17:19:49,557 ERROR [main] impl.FullTableBackupClient: 
BackupId=backup_1476317988066,startts=1476317988637,failedts=1476317989557,failedphase=null,failedmessage=Wrong
 FS: hdfs://ve0524.halxg.cloudera.com:8020/hbase/WALs, expected: file:///
2016-10-12 17:19:49,559 DEBUG [AsyncRpcChannel-pool2-t14] ipc.AsyncRpcChannel: 
Use SIMPLE authentication for service ClientService, sasl=false
2016-10-12 17:19:49,656 DEBUG [main] ipc.AsyncRpcClient: Stopping async HBase 
RPC client
{code}

I pass --config pointing at my conf dir, which points at my hdfs and hbase. 
If I don't pass --config, it'll use all defaults and not find the clusters 
(my config dir includes a symlink to hdfs-site.xml).

Yeah, I've seen similar mismatch issues in the past in my own code (your google 
pointer is for a code writer, not for a 'user' like me).  I can bang my head 
and try 'fixing' it, but I am trying to convey a 'user's' experience following 
instructions and tool usage. What is a little odd here is that the complaint is 
from the backup tool, not about the arg I'm passing (it must not be reading it 
immediately... because it doesn't matter whether I pass a file:/// or hdfs:/// scheme 
for the backup location).

Let me know what you want me to try... 

This is a straight cluster deploy, a Hadoop 2.7.3 build. Everything generally checks 
out.

> HBase Backup/Restore Based on HBase Snapshot
> 
>
> Key: HBASE-7912
> URL: https://issues.apache.org/jira/browse/HBASE-7912
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Richard Ding
>Assignee: Vladimir Rodionov
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: Backup-and-Restore-Apache_19Sep2016.pdf, 
> Backup-and-Restore-Apache_9Sep2016.pdf, HBaseBackupAndRestore - v0.8.pdf, 
> HBaseBackupAndRestore -0.91.pdf, HBaseBackupAndRestore-v0.9.pdf, 
> HBaseBackupAndRestore.pdf, HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, 
> HBaseBackupRestore-Jira-7912-DesignDoc-v2.pdf, 
> HBaseBackupRestore-Jira-7912-v4.pdf, HBaseBackupRestore-Jira-7912-v5 .pdf, 
> HBaseBackupRestore-Jira-7912-v6.pdf, HBase_BackupRestore-Jira-7912-CLI-v1.pdf
>
>
> Finally, we completed the implementation of our backup/restore solution, and 
> would like to share with community through this jira. 
> We are leveraging existing hbase snapshot feature, and provide a general 
> solution to common users. Our full backup is using snapshot to capture 
> metadata locally and using exportsnapshot to move data to another 

[jira] [Updated] (HBASE-16813) Procedure v2 - Move ProcedureEvent to hbase-procedure module

2016-10-12 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-16813:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Procedure v2 - Move ProcedureEvent to hbase-procedure module
> 
>
> Key: HBASE-16813
> URL: https://issues.apache.org/jira/browse/HBASE-16813
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Affects Versions: 2.0.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
> Fix For: 2.0.0
>
> Attachments: HBASE-16813-v0.patch
>
>
> ProcedureEvent was added in MasterProcedureScheduler, but it is generic 
> enough to move to hbase-procedure module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache

2016-10-12 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570364#comment-15570364
 ] 

Ben Manes commented on HBASE-15560:
---

YCSB workload B states that each record is 1 KB, so that is about 100 MB (97 MiB). 
That probably introduces some misses due to Java object overhead. Since 
LruBlockCache uses a high watermark and evicts down to a low watermark, it could be 
aggressively under-utilizing the capacity. So a higher hit rate might be 
understandable, in addition to the workload pattern's characteristics.

> TinyLFU-based BlockCache
> 
>
> Key: HBASE-15560
> URL: https://issues.apache.org/jira/browse/HBASE-15560
> Project: HBase
>  Issue Type: Improvement
>  Components: BlockCache
>Affects Versions: 2.0.0
>Reporter: Ben Manes
>Assignee: Ben Manes
> Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, 
> HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, 
> tinylfu.patch
>
>
> LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and 
> recency of the working set. It achieves concurrency by using an O( n ) 
> background thread to prioritize the entries and evict. Accessing an entry is 
> O(1) by a hash table lookup, recording its logical access time, and setting a 
> frequency flag. A write is performed in O(1) time by updating the hash table 
> and triggering an async eviction thread. This provides ideal concurrency and 
> minimizes the latencies by penalizing the thread instead of the caller. 
> However the policy does not age the frequencies and may not be resilient to 
> various workload patterns.
> W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the 
> frequency in a counting sketch, ages periodically by halving the counters, 
> and orders entries by SLRU. An entry is discarded by comparing the frequency 
> of the new arrival (candidate) to the SLRU's victim, and keeping the one with 
> the highest frequency. This allows the operations to be performed in O(1) 
> time and, though the use of a compact sketch, a much larger history is 
> retained beyond the current working set. In a variety of real world traces 
> the policy had [near optimal hit 
> rates|https://github.com/ben-manes/caffeine/wiki/Efficiency].
> Concurrency is achieved by buffering and replaying the operations, similar to 
> a write-ahead log. A read is recorded into a striped ring buffer and writes 
> to a queue. The operations are applied in batches under a try-lock by an 
> asynchronous thread, thereby tracking the usage pattern without incurring high 
> latencies 
> ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]).
> In YCSB benchmarks the results were inconclusive. For a large cache (99% hit 
> rates) the two caches have near identical throughput and latencies with 
> LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a 
> 1-4% hit rate improvement and therefore lower latencies. The lackluster 
> result is because a synthetic Zipfian distribution is used, on which SLRU 
> performs optimally. In a more varied, real-world workload we'd expect to see 
> improvements by being able to make smarter predictions.
> The provided patch implements BlockCache using the 
> [Caffeine|https://github.com/ben-manes/caffeine] caching library (see 
> HighScalability 
> [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]).
> Edward Bortnikov and Eshcar Hillel have graciously provided guidance for 
> evaluating this patch ([github 
> branch|https://github.com/ben-manes/hbase/tree/tinylfu]).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16812) Cleanup deprecated compact() function

2016-10-12 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570355#comment-15570355
 ] 

Appy commented on HBASE-16812:
--

So, looking around, we take locks from:
- mob compaction chore in master
- sweeper chore in master
- compaction from regions

If the compaction tool triggers compact, then no locks are being taken.
(In fact, it's not even triggering mob compaction. [~jingcheng...@intel.com], 
can you please take on this jira and fix that.)

If ExpiredMobFileCleaner is triggered from the command line, then no locks are 
taken.
Shouldn't we also take locks when these things are triggered from the tools?


> Cleanup deprecated compact() function
> -
>
> Key: HBASE-16812
> URL: https://issues.apache.org/jira/browse/HBASE-16812
> Project: HBase
>  Issue Type: Task
>Reporter: Appy
>Assignee: Appy
>Priority: Minor
> Attachments: HBASE-16812.master.001.patch
>
>
> compact(CompactionContext compaction, CompactionThroughputController 
> throughputController) is [deprecated in 1.2.0 
> release|https://github.com/apache/hbase/blob/rel/1.2.0/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java#L222].
> Store.java is also marked limited private.
> Context: I was cleaning up zk table lock which is also used in that method's 
> [override|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HMobStore.java#L460]
>  in HMobStore.
> This method isn't being called from anywhere except CompactionTool (which 
> creates HStore object, not HMobStore object).
> [~jingcheng...@intel.com] Can you PTAL and help me understand what's going on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16752) Upgrading from 1.2 to 1.3 can lead to replication failures due to difference in RPC size limit

2016-10-12 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570353#comment-15570353
 ] 

Gary Helmling commented on HBASE-16752:
---

Please add javadoc for RequestTooBigException.

It occurs to me that we also need to do something with the remaining request 
data (the body) that is sitting in the socket channel.  Either we'll need to 
read it off into something like a garbage buffer and throw it away, or else 
maybe we should actually close the connection.  If we do close the connection, 
then does this change still buy us something?
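The "garbage buffer" option would look roughly like this (a generic NIO sketch, assuming 
the remaining length of the oversized request body is known; it is not the actual 
RpcServer code):
{code}
// drain and discard the rest of the oversized request so the channel can be
// reused for the next call; "remaining" is the advertised request length minus
// the bytes already consumed, "channel" is the client's SocketChannel
java.nio.ByteBuffer scratch = java.nio.ByteBuffer.allocate(64 * 1024);
while (remaining > 0) {
  scratch.clear();
  scratch.limit((int) Math.min(scratch.capacity(), remaining));
  int n = channel.read(scratch); // bytes read here are simply thrown away
  if (n < 0) {
    break; // peer closed the connection; nothing left to discard
  }
  remaining -= n;
}
{code}
Whether that extra I/O is worth it versus simply closing the connection is exactly the 
trade-off in question.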

> Upgrading from 1.2 to 1.3 can lead to replication failures due to difference 
> in RPC size limit
> --
>
> Key: HBASE-16752
> URL: https://issues.apache.org/jira/browse/HBASE-16752
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, rpc
>Affects Versions: 2.0.0, 1.3.0
>Reporter: Ashu Pachauri
>Assignee: Ashu Pachauri
> Attachments: HBASE-16752.V1.patch
>
>
> In HBase 1.2, we don't limit the size of a single RPC, but in 1.3 we limit it by 
> default to 256 MB.  This means that during upgrade scenarios (or when the source 
> is on 1.2 and the peer is already on 1.3), it's possible to encounter a situation where 
> we try to send an rpc with size greater than 256 MB because we never unroll a 
> WALEdit while sending replication traffic.
> RpcServer throws the underlying exception locally, but closes the connection 
> without returning the underlying error to the client, and the client only sees a 
> "Broken pipe" error.
> I am not sure what is the proper fix here (or if one is needed) to make sure 
> this does not happen, but we should return the underlying exception to the 
> RpcClient, because without it, it can be difficult to diagnose the problem, 
> especially for someone new to HBase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16721) Concurrency issue in WAL unflushed seqId tracking

2016-10-12 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-16721:
--
Status: Patch Available  (was: Reopened)

> Concurrency issue in WAL unflushed seqId tracking
> -
>
> Key: HBASE-16721
> URL: https://issues.apache.org/jira/browse/HBASE-16721
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 1.2.0, 1.1.0, 1.0.0
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
>Priority: Critical
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.4, 1.1.8
>
> Attachments: hbase-16721_addendum.patch, 
> hbase-16721_addendum2.branch-1.1.patch, hbase-16721_v1.branch-1.patch, 
> hbase-16721_v2.branch-1.patch, hbase-16721_v2.master.patch
>
>
> I'm inspecting an interesting case where in a production cluster, some 
> regionservers end up accumulating hundreds of WAL files, even with force 
> flushes going on due to max logs. This happened multiple times on the 
> cluster, but not on other clusters. The cluster has periodic memstore flusher 
> disabled, however, this still does not explain why the force flush of regions 
> due to max limit is not working. I think the periodic memstore flusher just 
> masks the underlying problem, which is why we do not see this in other 
> clusters. 
> The problem starts like this: 
> {code}
> 2016-09-21 17:49:18,272 INFO  [regionserver//10.2.0.55:16020.logRoller] 
> wal.FSHLog: Too many wals: logs=33, maxlogs=32; forcing flush of 1 
> regions(s): d4cf39dc40ea79f5da4d0cf66d03cb1f
> 2016-09-21 17:49:18,273 WARN  [regionserver//10.2.0.55:16020.logRoller] 
> regionserver.LogRoller: Failed to schedule flush of 
> d4cf39dc40ea79f5da4d0cf66d03cb1f, region=null, requester=null
> {code}
> then, it continues until the RS is restarted: 
> {code}
> 2016-09-23 17:43:49,356 INFO  [regionserver//10.2.0.55:16020.logRoller] 
> wal.FSHLog: Too many wals: logs=721, maxlogs=32; forcing flush of 1 
> regions(s): d4cf39dc40ea79f5da4d0cf66d03cb1f
> 2016-09-23 17:43:49,357 WARN  [regionserver//10.2.0.55:16020.logRoller] 
> regionserver.LogRoller: Failed to schedule flush of 
> d4cf39dc40ea79f5da4d0cf66d03cb1f, region=null, requester=null
> {code}
> The problem is that region {{d4cf39dc40ea79f5da4d0cf66d03cb1f}} is already 
> split some time ago, and was able to flush its data and split without any 
> problems. However, the FSHLog still thinks that there is some unflushed data 
> for this region. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16721) Concurrency issue in WAL unflushed seqId tracking

2016-10-12 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-16721:
--
Attachment: hbase-16721_addendum2.branch-1.1.patch

This addendum is needed for 1.1 without a proper fix for HBASE-16820. It is 
basically undoing the HRegion changes, but adding a 
{{waitForPreviousTransactionsComplete()}} call at the beginning instead. We end 
up doing two mvcc transactions, but at the end of the flush, we are guaranteed 
that the mvcc read point has advanced to the flush sequence id. 

> Concurrency issue in WAL unflushed seqId tracking
> -
>
> Key: HBASE-16721
> URL: https://issues.apache.org/jira/browse/HBASE-16721
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 1.0.0, 1.1.0, 1.2.0
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
>Priority: Critical
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.4, 1.1.8
>
> Attachments: hbase-16721_addendum.patch, 
> hbase-16721_addendum2.branch-1.1.patch, hbase-16721_v1.branch-1.patch, 
> hbase-16721_v2.branch-1.patch, hbase-16721_v2.master.patch
>
>
> I'm inspecting an interesting case where in a production cluster, some 
> regionservers end up accumulating hundreds of WAL files, even with force 
> flushes going on due to max logs. This happened multiple times on the 
> cluster, but not on other clusters. The cluster has periodic memstore flusher 
> disabled, however, this still does not explain why the force flush of regions 
> due to max limit is not working. I think the periodic memstore flusher just 
> masks the underlying problem, which is why we do not see this in other 
> clusters. 
> The problem starts like this: 
> {code}
> 2016-09-21 17:49:18,272 INFO  [regionserver//10.2.0.55:16020.logRoller] 
> wal.FSHLog: Too many wals: logs=33, maxlogs=32; forcing flush of 1 
> regions(s): d4cf39dc40ea79f5da4d0cf66d03cb1f
> 2016-09-21 17:49:18,273 WARN  [regionserver//10.2.0.55:16020.logRoller] 
> regionserver.LogRoller: Failed to schedule flush of 
> d4cf39dc40ea79f5da4d0cf66d03cb1f, region=null, requester=null
> {code}
> then, it continues until the RS is restarted: 
> {code}
> 2016-09-23 17:43:49,356 INFO  [regionserver//10.2.0.55:16020.logRoller] 
> wal.FSHLog: Too many wals: logs=721, maxlogs=32; forcing flush of 1 
> regions(s): d4cf39dc40ea79f5da4d0cf66d03cb1f
> 2016-09-23 17:43:49,357 WARN  [regionserver//10.2.0.55:16020.logRoller] 
> regionserver.LogRoller: Failed to schedule flush of 
> d4cf39dc40ea79f5da4d0cf66d03cb1f, region=null, requester=null
> {code}
> The problem is that region {{d4cf39dc40ea79f5da4d0cf66d03cb1f}} is already 
> split some time ago, and was able to flush its data and split without any 
> problems. However, the FSHLog still thinks that there is some unflushed data 
> for this region. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16641) QA tests for hbase-client skip the second part.

2016-10-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570333#comment-15570333
 ] 

stack commented on HBASE-16641:
---

[~chenheng] Missed this. Does HBASE-16785 fix it?

> QA tests for hbase-client skip the second part.
> ---
>
> Key: HBASE-16641
> URL: https://issues.apache.org/jira/browse/HBASE-16641
> Project: HBase
>  Issue Type: Bug
>Reporter: Heng Chen
>
> See 
> https://builds.apache.org/job/PreCommit-HBASE-Build/3547/artifact/patchprocess/patch-unit-hbase-client.txt
> {code}
> [INFO] --- maven-surefire-plugin:2.18.1:test (secondPartTestsExecution) @ 
> hbase-client ---
> [INFO] Tests are skipped.
> {code}
> The first part passed fine, but the second part is skipped. 
> Notice hbase-client/pom.xml 
> {code}
>  
>   
> secondPartTestsExecution
> test
> 
>   test
> 
> 
>   true
> 
>   
> 
> {code}
> If I change 'skip' to false, the second part is triggered. But 
> this configuration has existed for a long time; was the cmd line on the build box 
> updated recently? 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HBASE-16721) Concurrency issue in WAL unflushed seqId tracking

2016-10-12 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar reopened HBASE-16721:
---

Reopening, since we have to do a quick addendum for 1.1. See HBASE-16820. 

> Concurrency issue in WAL unflushed seqId tracking
> -
>
> Key: HBASE-16721
> URL: https://issues.apache.org/jira/browse/HBASE-16721
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 1.0.0, 1.1.0, 1.2.0
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
>Priority: Critical
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.4, 1.1.8
>
> Attachments: hbase-16721_addendum.patch, 
> hbase-16721_v1.branch-1.patch, hbase-16721_v2.branch-1.patch, 
> hbase-16721_v2.master.patch
>
>
> I'm inspecting an interesting case where in a production cluster, some 
> regionservers end up accumulating hundreds of WAL files, even with force 
> flushes going on due to max logs. This happened multiple times on the 
> cluster, but not on other clusters. The cluster has periodic memstore flusher 
> disabled, however, this still does not explain why the force flush of regions 
> due to max limit is not working. I think the periodic memstore flusher just 
> masks the underlying problem, which is why we do not see this in other 
> clusters. 
> The problem starts like this: 
> {code}
> 2016-09-21 17:49:18,272 INFO  [regionserver//10.2.0.55:16020.logRoller] 
> wal.FSHLog: Too many wals: logs=33, maxlogs=32; forcing flush of 1 
> regions(s): d4cf39dc40ea79f5da4d0cf66d03cb1f
> 2016-09-21 17:49:18,273 WARN  [regionserver//10.2.0.55:16020.logRoller] 
> regionserver.LogRoller: Failed to schedule flush of 
> d4cf39dc40ea79f5da4d0cf66d03cb1f, region=null, requester=null
> {code}
> then, it continues until the RS is restarted: 
> {code}
> 2016-09-23 17:43:49,356 INFO  [regionserver//10.2.0.55:16020.logRoller] 
> wal.FSHLog: Too many wals: logs=721, maxlogs=32; forcing flush of 1 
> regions(s): d4cf39dc40ea79f5da4d0cf66d03cb1f
> 2016-09-23 17:43:49,357 WARN  [regionserver//10.2.0.55:16020.logRoller] 
> regionserver.LogRoller: Failed to schedule flush of 
> d4cf39dc40ea79f5da4d0cf66d03cb1f, region=null, requester=null
> {code}
> The problem is that region {{d4cf39dc40ea79f5da4d0cf66d03cb1f}} is already 
> split some time ago, and was able to flush its data and split without any 
> problems. However, the FSHLog still thinks that there is some unflushed data 
> for this region. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16749) HBase root pom.xml contains repo from people.apache.org/~garyh

2016-10-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570331#comment-15570331
 ] 

Hudson commented on HBASE-16749:


FAILURE: Integrated in Jenkins build HBase-1.1-JDK7 #1795 (See 
[https://builds.apache.org/job/HBase-1.1-JDK7/1795/])
HBASE-16749 Remove defunct repo from people.apache.org (garyh: rev 
edac95201492a135e9908d94597741b44ab6496f)
* (edit) pom.xml


> HBase root pom.xml contains repo from people.apache.org/~garyh
> --
>
> Key: HBASE-16749
> URL: https://issues.apache.org/jira/browse/HBASE-16749
> Project: HBase
>  Issue Type: Task
>Reporter: Wangda Tan
>Assignee: Gary Helmling
> Fix For: 1.2.5, 1.1.8
>
> Attachments: HBASE-16749.branch-1.1.patch, 
> HBASE-16749.branch-1.2.patch, HBASE-16749.branch-1.2.patch
>
>
> This is found when I was building Hadoop/YARN:
> {code}
> [INFO] 
> 
> [INFO] Building Apache Hadoop YARN Timeline Service 3.0.0-alpha2-SNAPSHOT
> [INFO] 
> 
> Downloading: 
> http://conjars.org/repo/org/apache/hadoop/hadoop-client/3.0.0-alpha2-SNAPSHOT/maven-metadata.xml
> ...
> Downloading: 
> http://people.apache.org/~garyh/mvn/org/apache/hadoop/hadoop-client/3.0.0-alpha2-SNAPSHOT/maven-metadata.xml
> ...
> Downloading: 
> http://people.apache.org/~garyh/mvn/org/apache/hadoop/hadoop-client/3.0.0-alpha2-SNAPSHOT/maven-metadata.xml
> {code}
> [~te...@apache.org] mentioned:
> bq. Among hbase releases (1.1.x, 1.2.y), root pom.xml contains 
> "ghelmling.testing" repo. 
> This private repo sometimes extends the Hadoop build time a lot. It might be 
> better to use ASF snapshot repo first before downloading from private repo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot

2016-10-12 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570273#comment-15570273
 ] 

Vladimir Rodionov edited comment on HBASE-7912 at 10/13/16 12:11 AM:
-

{quote}
It failed with java.lang.IllegalArgumentException Wrong FS: 
hdfs://ve0524.halxg.cloudera.com:8020/hbase/WALs, expected: file:/// and then 
later I got an NPE:
{quote}

Can you post the full stack trace, [~saint@gmail.com]? It seems something is 
wrong with your hbase conf file.

Google: "java.lang.IllegalArgumentException Wrong FS: hdfs:" 
https://www.google.com/webhp?sourceid=chrome-instant=1C5CHFA_enUS548US548=1=2=UTF-8#q=java.lang.IllegalArgumentException+Wrong+FS%3A+hdfs%3A

Next time, can you try the recommended form:
{code}
$ ./hbase/bin/hbase backup create full 
hdfs://ve0524.halxg.cloudera.com:8020/backu
{code}

hbase picks up its config from the CLASSPATH; other options, such as --config, may 
require you to put all the hadoop xml files on the command CLASSPATH manually.
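For example, something along these lines (illustrative only, with made-up paths and a 
made-up backup destination; the point is that hdfs-site.xml has to be visible on the 
client classpath):
{code}
# make the HDFS client config visible to the hbase launcher, e.g. by symlinking
# it into the HBase conf dir, then run the backup command
ln -s /etc/hadoop/conf/hdfs-site.xml /etc/hbase/conf/hdfs-site.xml
export HBASE_CONF_DIR=/etc/hbase/conf
./hbase/bin/hbase backup create full hdfs://namenode.example.com:8020/backups
{code}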


was (Author: vrodionov):
{quote}
It failed with java.lang.IllegalArgumentException Wrong FS: 
hdfs://ve0524.halxg.cloudera.com:8020/hbase/WALs, expected: file:/// and then 
later I got an NPE:
{quote}

Can you post full stack trace, [~saint@gmail.com]? It seems something is 
wrong with your hbase conf file.

Google: "java.lang.IllegalArgumentException Wrong FS: hdfs:" 
https://www.google.com/webhp?sourceid=chrome-instant=1C5CHFA_enUS548US548=1=2=UTF-8#q=java.lang.IllegalArgumentException+Wrong+FS%3A+hdfs%3A

Next time, can try recommended:
{code}
$ ./hbase/bin/hbase backup create full 
hdfs://ve0524.halxg.cloudera.com:8020/backu
{code}

hbase pick up config from CLASSPATH, all other options, such as --config may 
require you putting all hadoop xml files into command CLASSPATH manually.

> HBase Backup/Restore Based on HBase Snapshot
> 
>
> Key: HBASE-7912
> URL: https://issues.apache.org/jira/browse/HBASE-7912
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Richard Ding
>Assignee: Vladimir Rodionov
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: Backup-and-Restore-Apache_19Sep2016.pdf, 
> Backup-and-Restore-Apache_9Sep2016.pdf, HBaseBackupAndRestore - v0.8.pdf, 
> HBaseBackupAndRestore -0.91.pdf, HBaseBackupAndRestore-v0.9.pdf, 
> HBaseBackupAndRestore.pdf, HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, 
> HBaseBackupRestore-Jira-7912-DesignDoc-v2.pdf, 
> HBaseBackupRestore-Jira-7912-v4.pdf, HBaseBackupRestore-Jira-7912-v5 .pdf, 
> HBaseBackupRestore-Jira-7912-v6.pdf, HBase_BackupRestore-Jira-7912-CLI-v1.pdf
>
>
> Finally, we completed the implementation of our backup/restore solution, and 
> would like to share with community through this jira. 
> We are leveraging existing hbase snapshot feature, and provide a general 
> solution to common users. Our full backup is using snapshot to capture 
> metadata locally and using exportsnapshot to move data to another cluster; 
> the incremental backup is using offline-WALplayer to backup HLogs; we also 
> leverage global distribution rolllog and flush to improve performance; other 
> added-on values such as convert, merge, progress report, and CLI commands. So 
> that a common user can backup hbase data without in-depth knowledge of hbase. 
>  Our solution also contains some usability features for enterprise users. 
> The detail design document and CLI command will be attached in this jira. We 
> plan to use 10~12 subtasks to share each of the following features, and 
> document the detail implement in the subtasks: 
> * *Full Backup* : provide local and remote back/restore for a list of tables
> * *offline-WALPlayer* to convert HLog to HFiles offline (for incremental 
> backup)
> * *distributed* Logroll and distributed flush 
> * Backup *Manifest* and history
> * *Incremental* backup: to build on top of full backup as daily/weekly backup 
> * *Convert*  incremental backup WAL files into hfiles
> * *Merge* several backup images into one(like merge weekly into monthly)
> * *add and remove* table to and from Backup image
> * *Cancel* a backup process
> * backup progress *status*
> * full backup based on *existing snapshot*
> *-*
> *Below is the original description, to keep here as the history for the 
> design and discussion back in 2013*
> There have been attempts in the past to come up with a viable HBase 
> backup/restore solution (e.g., HBASE-4618).  Recently, there are many 
> advancements and new features in HBase, for example, FileLink, Snapshot, and 
> Distributed Barrier Procedure. This is a proposal for a backup/restore 
> solution that utilizes these new features to achieve better performance and 
> consistency. 
>  
> A 

[jira] [Assigned] (HBASE-16820) BulkLoad mvcc visibility only works accidentally

2016-10-12 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar reassigned HBASE-16820:
-

Assignee: Enis Soztutar

> BulkLoad mvcc visibility only works accidentally 
> -
>
> Key: HBASE-16820
> URL: https://issues.apache.org/jira/browse/HBASE-16820
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
>
> [~sergey.soldatov] has been debugging an issue with a 1.1 code base where the 
> commit for HBASE-16721 broke the bulk load visibility. After bulk load, the 
> bulk load files are not visible because the sequence id assigned to the bulk 
> load is not advanced in mvcc. 
> Debugging further, we have noticed that bulk load behavior is wrong, but it 
> works "accidentally" in all code bases (but broken in 1.1 after HBASE-16721). 
> Let me explain: 
>  - BL request can optionally request a flush before hand (this should be the 
> default) which causes the flush to happen with some sequenceId. The flush 
> sequence id is one past all the cells' sequenceids. This flush sequence id is 
> returned as a result to the flush operation. 
>  - BL then uses this particular sequenceId to mark the files, but itself does 
> not get a new sequenceid of its own, or advance the mvcc number. 
>  - BL completes WITHOUT making sure that the sequence id is visible. 
>  - BL itself though writes entries to the WAL for the BL event, which in 1.2 
> code bases goes through the whole mvcc + seqId paths, which makes sure that 
> earlier sequenceIds (the flush sequenceId) are visible via mvcc. 
> The problem with 1.1 is that the WAL entries only get sequence ids, but do 
> not touch mvcc. With the patch for HBASE-16721, we have made it so that the 
> flushedSequenceId is not used in mvcc as the highest read point (although all 
> the data is still visible).
> BL relying on the flush sequence id is wrong for two reasons: 
>  - BL files are loaded with the flush sequence id from the memstore. This 
> particular sequence id is used twice for two different things and ends up 
> being the sequence id for flushed file as well as BL'ed files. 
>  - BL should make sure that it gets a new sequence id and that sequence id is 
> visible before returning the results. 
> [~ndimiduk] FYI. 
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-16820) BulkLoad mvcc visibility only works accidentally

2016-10-12 Thread Enis Soztutar (JIRA)
Enis Soztutar created HBASE-16820:
-

 Summary: BulkLoad mvcc visibility only works accidentally 
 Key: HBASE-16820
 URL: https://issues.apache.org/jira/browse/HBASE-16820
 Project: HBase
  Issue Type: Bug
Reporter: Enis Soztutar


[~sergey.soldatov] has been debugging an issue with a 1.1 code base where the 
commit for HBASE-16721 broke the bulk load visibility. After a bulk load, the 
bulk-loaded files are not visible because the sequence id assigned to the bulk 
load is not advanced in mvcc. 

Debugging further, we have noticed that the bulk load behavior is wrong, but it 
works "accidentally" in all code bases (but is broken in 1.1 after HBASE-16721). 
Let me explain: 
 - A BL request can optionally request a flush beforehand (this should be the 
default), which causes the flush to happen with some sequenceId. The flush 
sequence id is one past all the cells' sequence ids. This flush sequence id is 
returned as the result of the flush operation. 
 - BL then uses this particular sequenceId to mark the files, but does not get 
a new sequence id of its own, nor does it advance the mvcc number. 
 - BL completes WITHOUT making sure that the sequence id is visible. 
 - BL itself, though, writes entries to the WAL for the BL event, which in 1.2 
code bases go through the whole mvcc + seqId paths, which makes sure that 
earlier sequence ids (the flush sequenceId) are visible via mvcc. 

The problem with 1.1 is that the WAL entries only get sequence ids, but do not 
touch mvcc. With the patch for HBASE-16721, we have made it so that the 
flushedSequenceId is not used in mvcc as the highest read point (although all 
the data is still visible).

BL relying on the flush sequence id is wrong for two reasons: 
 - BL files are loaded with the flush sequence id from the memstore. This 
particular sequence id is used twice for two different things and ends up being 
the sequence id for the flushed file as well as the BL'ed files. 
 - BL should make sure that it gets a new sequence id and that sequence id is 
visible before returning the results. 

[~ndimiduk] FYI. 
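
To make the sequence-id / read-point interaction above concrete, here is a minimal, 
self-contained sketch in Java. It is not HBase's internal code; the class and the 
numbers are made up purely to illustrate why a bulk-loaded file stamped with the 
flush sequence id stays invisible when the mvcc read point is never advanced past it.

{code}
// Illustrative model only -- not HBase internals. Assumes a reader skips any
// store file whose sequence id is greater than the current mvcc read point.
public class BulkLoadVisibilitySketch {
  public static void main(String[] args) {
    long flushSeqId = 10;      // sequence id handed out for the flush
    long mvccReadPoint = 9;    // after HBASE-16721 (1.1), the read point is not
                               // advanced to the flush sequence id

    // Buggy bulk load: reuse the flush sequence id, never touch mvcc.
    long bulkLoadedFileSeqId = flushSeqId;
    System.out.println("buggy BL visible: " + (bulkLoadedFileSeqId <= mvccReadPoint)); // false

    // What the description argues BL should do: take a sequence id of its own
    // and advance the read point past it before returning to the client.
    long blOwnSeqId = 11;
    mvccReadPoint = blOwnSeqId;
    System.out.println("fixed BL visible: " + (blOwnSeqId <= mvccReadPoint)); // true
  }
}
{code}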
 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot

2016-10-12 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570273#comment-15570273
 ] 

Vladimir Rodionov edited comment on HBASE-7912 at 10/13/16 12:05 AM:
-

{quote}
It failed with java.lang.IllegalArgumentException Wrong FS: 
hdfs://ve0524.halxg.cloudera.com:8020/hbase/WALs, expected: file:/// and then 
later I got an NPE:
{quote}

Can you post full stack trace, [~saint@gmail.com]? It seems something is 
wrong with your hbase conf file.

Google: "java.lang.IllegalArgumentException Wrong FS: hdfs:" 
https://www.google.com/webhp?sourceid=chrome-instant=1C5CHFA_enUS548US548=1=2=UTF-8#q=java.lang.IllegalArgumentException+Wrong+FS%3A+hdfs%3A

Next time, you can try the recommended form:
{code}
$ ./hbase/bin/hbase backup create full 
hdfs://ve0524.halxg.cloudera.com:8020/backu
{code}

hbase picks up its config from the CLASSPATH; other options, such as --config, may 
require you to put all the hadoop xml files onto the command CLASSPATH manually.
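
For reference, the "Wrong FS" error is the standard Hadoop FileSystem scheme check 
failing when the client-side configuration resolves to the local filesystem. A 
minimal sketch (the hostname and path are made up) that typically reproduces the 
same exception with plain Hadoop APIs:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WrongFsSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // What a core-site.xml/hbase-site.xml missing from the CLASSPATH effectively
    // gives you: the default filesystem is the local one.
    conf.set("fs.defaultFS", "file:///");
    FileSystem fs = FileSystem.get(conf);   // LocalFileSystem

    // Handing it an hdfs:// path typically triggers:
    // java.lang.IllegalArgumentException: Wrong FS: hdfs://..., expected: file:///
    fs.open(new Path("hdfs://namenode.example.com:8020/hbase/WALs/somefile"));
  }
}
{code}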


was (Author: vrodionov):
{quote}
It failed with java.lang.IllegalArgumentException Wrong FS: 
hdfs://ve0524.halxg.cloudera.com:8020/hbase/WALs, expected: file:/// and then 
later I got an NPE:
{quote}

Can you post full stack trace, [~saint@gmail.com]? It seems something is 
wrong with your hbase conf file.

Google: "java.lang.IllegalArgumentException Wrong FS: hdfs:" 
https://www.google.com/webhp?sourceid=chrome-instant=1C5CHFA_enUS548US548=1=2=UTF-8#q=java.lang.IllegalArgumentException+Wrong+FS%3A+hdfs%3A

> HBase Backup/Restore Based on HBase Snapshot
> 
>
> Key: HBASE-7912
> URL: https://issues.apache.org/jira/browse/HBASE-7912
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Richard Ding
>Assignee: Vladimir Rodionov
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: Backup-and-Restore-Apache_19Sep2016.pdf, 
> Backup-and-Restore-Apache_9Sep2016.pdf, HBaseBackupAndRestore - v0.8.pdf, 
> HBaseBackupAndRestore -0.91.pdf, HBaseBackupAndRestore-v0.9.pdf, 
> HBaseBackupAndRestore.pdf, HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, 
> HBaseBackupRestore-Jira-7912-DesignDoc-v2.pdf, 
> HBaseBackupRestore-Jira-7912-v4.pdf, HBaseBackupRestore-Jira-7912-v5 .pdf, 
> HBaseBackupRestore-Jira-7912-v6.pdf, HBase_BackupRestore-Jira-7912-CLI-v1.pdf
>
>
> Finally, we completed the implementation of our backup/restore solution, and 
> would like to share it with the community through this jira. 
> We are leveraging the existing hbase snapshot feature and providing a general 
> solution for common users. Our full backup uses snapshots to capture metadata 
> locally and exportsnapshot to move data to another cluster; the incremental 
> backup uses an offline WALPlayer to back up HLogs; we also leverage 
> distributed log roll and flush to improve performance, plus added-on value 
> such as convert, merge, progress reporting, and CLI commands, so that a 
> common user can back up hbase data without in-depth knowledge of hbase. 
> Our solution also contains some usability features for enterprise users. 
> The detailed design document and CLI commands will be attached in this jira. 
> We plan to use 10~12 subtasks to share each of the following features, and 
> document the detailed implementation in the subtasks: 
> * *Full Backup*: provide local and remote backup/restore for a list of tables
> * *offline-WALPlayer* to convert HLogs to HFiles offline (for incremental 
> backup)
> * *distributed* log roll and distributed flush 
> * Backup *Manifest* and history
> * *Incremental* backup: builds on top of a full backup as a daily/weekly backup 
> * *Convert* incremental backup WAL files into hfiles
> * *Merge* several backup images into one (like merging weekly into monthly)
> * *add and remove* tables to and from a Backup image
> * *Cancel* a backup process
> * backup progress *status*
> * full backup based on an *existing snapshot*
> *-*
> *Below is the original description, kept here as the history of the 
> design and discussion back in 2013*
> There have been attempts in the past to come up with a viable HBase 
> backup/restore solution (e.g., HBASE-4618).  Recently, there have been many 
> advancements and new features in HBase, for example, FileLink, Snapshot, and 
> Distributed Barrier Procedure. This is a proposal for a backup/restore 
> solution that utilizes these new features to achieve better performance and 
> consistency. 
>  
> A common practice of backup and restore in databases is to first take a full 
> baseline backup, and then periodically take incremental backups that capture 
> the changes since the full baseline backup. An HBase cluster can store a 
> massive amount of data. The combination of full backups with incremental 

[jira] [Commented] (HBASE-15968) MVCC-sensitive semantics of versions

2016-10-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570300#comment-15570300
 ] 

stack commented on HBASE-15968:
---

So, let me test that there is no degradation in basic perf with YCSB.

What other tests do we need? We can use PE to write many versions on top of 
each other. What do you need to test?

What is the serial replicator issue?

> MVCC-sensitive semantics of versions
> 
>
> Key: HBASE-15968
> URL: https://issues.apache.org/jira/browse/HBASE-15968
> Project: HBase
>  Issue Type: New Feature
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: HBASE-15968-v1.patch, HBASE-15968-v2.patch, 
> HBASE-15968-v3.patch, HBASE-15968-v4.patch
>
>
> In the HBase book, we have a section in Versions called "Current Limitations"; see 
> http://hbase.apache.org/book.html#_current_limitations
> {quote}
> 28.3. Current Limitations
> 28.3.1. Deletes mask Puts
> Deletes mask puts, even puts that happened after the delete was entered. See 
> HBASE-2256. Remember that a delete writes a tombstone, which only disappears 
> after the next major compaction has run. Suppose you do a delete of 
> everything ⇐ T. After this you do a new put with a timestamp ⇐ T. This put, 
> even if it happened after the delete, will be masked by the delete tombstone. 
> Performing the put will not fail, but when you do a get you will notice the 
> put did have no effect. It will start working again after the major 
> compaction has run. These issues should not be a problem if you use 
> always-increasing versions for new puts to a row. But they can occur even if 
> you do not care about time: just do delete and put immediately after each 
> other, and there is some chance they happen within the same millisecond.
> 28.3.2. Major compactions change query results
> …​create three cell versions at t1, t2 and t3, with a maximum-versions 
> setting of 2. So when getting all versions, only the values at t2 and t3 will 
> be returned. But if you delete the version at t2 or t3, the one at t1 will 
> appear again. Obviously, once a major compaction has run, such behavior will 
> not be the case anymore…​ (See Garbage Collection in Bending time in HBase.)
> {quote}
> These limitations result from the current implementation of multi-versions: 
> we only consider the timestamp, no matter when it comes; we do not remove an 
> old version immediately even if there are enough newer versions. 
> So we can get stronger semantics for versions with two guarantees:
> 1. A Delete will not mask a Put that comes after it.
> 2. If a version is masked by a sufficient number of higher versions (VERSIONS 
> in the cf's conf), it will never be seen any more.
> Some examples for understanding:
> (delete t<=3 means use Delete.addColumns to delete all versions whose ts is 
> not greater than 3, and delete t3 means use Delete.addColumn to delete the 
> version whose ts=3)
> case 1: put t2 -> put t3 -> delete t<=3 -> put t1, and we will get t1 because 
> the put is after delete.
> case 2: maxversion=2, put t1 -> put t2 -> put t3 -> delete t3, and we will 
> always get t2 no matter if there is a major compaction, because t1 is masked 
> when we put t3 so t1 will never be seen.
> case 3: maxversion=2, put t1 -> put t2 -> put t3 -> delete t2 -> delete t3, 
> and we will get nothing.
> case 4: maxversion=3, put t1 -> put t2 -> put t3 -> delete t2 -> delete t3, 
> and we will get t1 because it is not masked.
> case 5: maxversion=2, put t1 -> put t2 -> put t3 -> delete t3 -> put t1, and 
> we can get t3+t1 because when we put t1 the second time it is the 2nd latest 
> version and it can be read.
> case 6: maxversion=2, put t3 -> put t2 -> put t1, and we will get t3+t2 just 
> like what we get now; ts is still the key of versions.
> Different VERSIONS settings may produce different results even when the size 
> of the result is smaller than VERSIONS (see cases 3 and 4). So 
> Get/Scan.setMaxVersions will be handled at the end, after we read the correct 
> data according to the CF's VERSIONS setting.
> The semantics are different from current HBase, and we may need more logic 
> to support the new semantics, so it is configurable and disabled by default.
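
For readers wanting to try case 1 against a cluster, a minimal sketch with the 
standard client API is below. The table, row, and column names are made up; with 
the feature disabled (the proposed default), the put at t1 stays masked as 
described in 28.3.1, while the new semantics say it should be returned.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class Case1Sketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("t"))) {
      byte[] row = Bytes.toBytes("r"), cf = Bytes.toBytes("f"), q = Bytes.toBytes("q");

      table.put(new Put(row).addColumn(cf, q, 2L, Bytes.toBytes("v2")));  // put t2
      table.put(new Put(row).addColumn(cf, q, 3L, Bytes.toBytes("v3")));  // put t3
      table.delete(new Delete(row).addColumns(cf, q, 3L));                // delete t<=3
      table.put(new Put(row).addColumn(cf, q, 1L, Bytes.toBytes("v1")));  // put t1, after the delete

      Result r = table.get(new Get(row));
      // Today (and with the new feature disabled) this prints "no value": the
      // later put at t1 is masked by the tombstone until a major compaction.
      // Under the proposed MVCC-sensitive semantics, case 1 says t1 is returned.
      System.out.println(r.isEmpty() ? "no value" : Bytes.toString(r.getValue(cf, q)));
    }
  }
}
{code}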



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot

2016-10-12 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570273#comment-15570273
 ] 

Vladimir Rodionov edited comment on HBASE-7912 at 10/13/16 12:02 AM:
-

{quote}
It failed with java.lang.IllegalArgumentException Wrong FS: 
hdfs://ve0524.halxg.cloudera.com:8020/hbase/WALs, expected: file:/// and then 
later I got an NPE:
{quote}

Can you post full stack trace, [~saint@gmail.com]? It seems something is 
wrong with your hbase conf file.

Google: "java.lang.IllegalArgumentException Wrong FS: hdfs:" 
https://www.google.com/webhp?sourceid=chrome-instant=1C5CHFA_enUS548US548=1=2=UTF-8#q=java.lang.IllegalArgumentException+Wrong+FS%3A+hdfs%3A


was (Author: vrodionov):
{quote}
It failed with java.lang.IllegalArgumentException Wrong FS: 
hdfs://ve0524.halxg.cloudera.com:8020/hbase/WALs, expected: file:/// and then 
later I got an NPE:
{quote}

Can you post full stack trace, [~saint@gmail.com]? It seems something is 
wrong with your hbase conf file.

> HBase Backup/Restore Based on HBase Snapshot
> 
>
> Key: HBASE-7912
> URL: https://issues.apache.org/jira/browse/HBASE-7912
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Richard Ding
>Assignee: Vladimir Rodionov
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: Backup-and-Restore-Apache_19Sep2016.pdf, 
> Backup-and-Restore-Apache_9Sep2016.pdf, HBaseBackupAndRestore - v0.8.pdf, 
> HBaseBackupAndRestore -0.91.pdf, HBaseBackupAndRestore-v0.9.pdf, 
> HBaseBackupAndRestore.pdf, HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, 
> HBaseBackupRestore-Jira-7912-DesignDoc-v2.pdf, 
> HBaseBackupRestore-Jira-7912-v4.pdf, HBaseBackupRestore-Jira-7912-v5 .pdf, 
> HBaseBackupRestore-Jira-7912-v6.pdf, HBase_BackupRestore-Jira-7912-CLI-v1.pdf
>
>
> Finally, we completed the implementation of our backup/restore solution, and 
> would like to share it with the community through this jira. 
> We are leveraging the existing hbase snapshot feature and providing a general 
> solution for common users. Our full backup uses snapshots to capture metadata 
> locally and exportsnapshot to move data to another cluster; the incremental 
> backup uses an offline WALPlayer to back up HLogs; we also leverage 
> distributed log roll and flush to improve performance, plus added-on value 
> such as convert, merge, progress reporting, and CLI commands, so that a 
> common user can back up hbase data without in-depth knowledge of hbase. 
> Our solution also contains some usability features for enterprise users. 
> The detailed design document and CLI commands will be attached in this jira. 
> We plan to use 10~12 subtasks to share each of the following features, and 
> document the detailed implementation in the subtasks: 
> * *Full Backup*: provide local and remote backup/restore for a list of tables
> * *offline-WALPlayer* to convert HLogs to HFiles offline (for incremental 
> backup)
> * *distributed* log roll and distributed flush 
> * Backup *Manifest* and history
> * *Incremental* backup: builds on top of a full backup as a daily/weekly backup 
> * *Convert* incremental backup WAL files into hfiles
> * *Merge* several backup images into one (like merging weekly into monthly)
> * *add and remove* tables to and from a Backup image
> * *Cancel* a backup process
> * backup progress *status*
> * full backup based on an *existing snapshot*
> *-*
> *Below is the original description, kept here as the history of the 
> design and discussion back in 2013*
> There have been attempts in the past to come up with a viable HBase 
> backup/restore solution (e.g., HBASE-4618).  Recently, there have been many 
> advancements and new features in HBase, for example, FileLink, Snapshot, and 
> Distributed Barrier Procedure. This is a proposal for a backup/restore 
> solution that utilizes these new features to achieve better performance and 
> consistency. 
>  
> A common practice of backup and restore in databases is to first take a full 
> baseline backup, and then periodically take incremental backups that capture 
> the changes since the full baseline backup. An HBase cluster can store a 
> massive amount of data. The combination of full backups with incremental 
> backups has tremendous benefit for HBase as well. The following is a typical 
> scenario for full and incremental backup.
> # The user takes a full backup of a table or a set of tables in HBase. 
> # The user schedules periodic incremental backups to capture the changes 
> from the full backup, or from the last incremental backup.
> # The user needs to restore table data to a past point in time.
> # The full backup is restored to the table(s) or to different table name(s).  
> Then 

[jira] [Commented] (HBASE-16752) Upgrading from 1.2 to 1.3 can lead to replication failures due to difference in RPC size limit

2016-10-12 Thread Ashu Pachauri (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570299#comment-15570299
 ] 

Ashu Pachauri commented on HBASE-16752:
---

Ran the timed-out tests locally and also had a look at the test code; the failures 
seem unrelated to the patch.

> Upgrading from 1.2 to 1.3 can lead to replication failures due to difference 
> in RPC size limit
> --
>
> Key: HBASE-16752
> URL: https://issues.apache.org/jira/browse/HBASE-16752
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, rpc
>Affects Versions: 2.0.0, 1.3.0
>Reporter: Ashu Pachauri
>Assignee: Ashu Pachauri
> Attachments: HBASE-16752.V1.patch
>
>
> In HBase 1.2, we don't limit the size of a single RPC, but in 1.3 we limit it 
> by default to 256 MB.  This means that during upgrade scenarios (or when the 
> source is on 1.2 and the peer is already on 1.3), it's possible to encounter a 
> situation where we try to send an rpc with a size greater than 256 MB because 
> we never unroll a WALEdit while sending replication traffic.
> RpcServer throws the underlying exception locally, but closes the connection 
> without returning the underlying error to the client, and the client only sees 
> a "Broken pipe" error.
> I am not sure what the proper fix is here (or if one is needed) to make sure 
> this does not happen, but we should return the underlying exception to the 
> RpcClient, because without it, it can be difficult to diagnose the problem, 
> especially for someone new to HBase.
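
As a side note, one generic way to keep a single replication RPC under the peer's 
limit is to cap the accumulated serialized size of a batch before shipping it. The 
sketch below is only an illustration of that idea, not the actual ReplicationSource 
code, and it does not help when one WALEdit alone exceeds the limit:

{code}
import java.util.ArrayList;
import java.util.List;

public class BatchBySizeSketch {
  /** Split serialized entries into batches whose summed size stays under maxBatchBytes. */
  static List<List<byte[]>> batchBySize(List<byte[]> entries, long maxBatchBytes) {
    List<List<byte[]>> batches = new ArrayList<>();
    List<byte[]> current = new ArrayList<>();
    long currentBytes = 0;
    for (byte[] e : entries) {
      if (!current.isEmpty() && currentBytes + e.length > maxBatchBytes) {
        batches.add(current);            // close the current batch before it overflows
        current = new ArrayList<>();
        currentBytes = 0;
      }
      current.add(e);                    // an entry bigger than the cap still ships alone
      currentBytes += e.length;
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }
}
{code}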



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot

2016-10-12 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570273#comment-15570273
 ] 

Vladimir Rodionov edited comment on HBASE-7912 at 10/12/16 11:59 PM:
-

{quote}
It failed with java.lang.IllegalArgumentException Wrong FS: 
hdfs://ve0524.halxg.cloudera.com:8020/hbase/WALs, expected: file:/// and then 
later I got an NPE:
{quote}

Can you post full stack trace, [~saint@gmail.com]? It seems something is 
wrong with your hbase conf file.


was (Author: vrodionov):
{quote}
It failed with java.lang.IllegalArgumentException Wrong FS: 
hdfs://ve0524.halxg.cloudera.com:8020/hbase/WALs, expected: file:/// and then 
later I got an NPE:
{quote}

Can you post full stack trace?

> HBase Backup/Restore Based on HBase Snapshot
> 
>
> Key: HBASE-7912
> URL: https://issues.apache.org/jira/browse/HBASE-7912
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Richard Ding
>Assignee: Vladimir Rodionov
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: Backup-and-Restore-Apache_19Sep2016.pdf, 
> Backup-and-Restore-Apache_9Sep2016.pdf, HBaseBackupAndRestore - v0.8.pdf, 
> HBaseBackupAndRestore -0.91.pdf, HBaseBackupAndRestore-v0.9.pdf, 
> HBaseBackupAndRestore.pdf, HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, 
> HBaseBackupRestore-Jira-7912-DesignDoc-v2.pdf, 
> HBaseBackupRestore-Jira-7912-v4.pdf, HBaseBackupRestore-Jira-7912-v5 .pdf, 
> HBaseBackupRestore-Jira-7912-v6.pdf, HBase_BackupRestore-Jira-7912-CLI-v1.pdf
>
>
> Finally, we completed the implementation of our backup/restore solution, and 
> would like to share it with the community through this jira. 
> We are leveraging the existing hbase snapshot feature and providing a general 
> solution for common users. Our full backup uses snapshots to capture metadata 
> locally and exportsnapshot to move data to another cluster; the incremental 
> backup uses an offline WALPlayer to back up HLogs; we also leverage 
> distributed log roll and flush to improve performance, plus added-on value 
> such as convert, merge, progress reporting, and CLI commands, so that a 
> common user can back up hbase data without in-depth knowledge of hbase. 
> Our solution also contains some usability features for enterprise users. 
> The detailed design document and CLI commands will be attached in this jira. 
> We plan to use 10~12 subtasks to share each of the following features, and 
> document the detailed implementation in the subtasks: 
> * *Full Backup*: provide local and remote backup/restore for a list of tables
> * *offline-WALPlayer* to convert HLogs to HFiles offline (for incremental 
> backup)
> * *distributed* log roll and distributed flush 
> * Backup *Manifest* and history
> * *Incremental* backup: builds on top of a full backup as a daily/weekly backup 
> * *Convert* incremental backup WAL files into hfiles
> * *Merge* several backup images into one (like merging weekly into monthly)
> * *add and remove* tables to and from a Backup image
> * *Cancel* a backup process
> * backup progress *status*
> * full backup based on an *existing snapshot*
> *-*
> *Below is the original description, kept here as the history of the 
> design and discussion back in 2013*
> There have been attempts in the past to come up with a viable HBase 
> backup/restore solution (e.g., HBASE-4618).  Recently, there have been many 
> advancements and new features in HBase, for example, FileLink, Snapshot, and 
> Distributed Barrier Procedure. This is a proposal for a backup/restore 
> solution that utilizes these new features to achieve better performance and 
> consistency. 
>  
> A common practice of backup and restore in databases is to first take a full 
> baseline backup, and then periodically take incremental backups that capture 
> the changes since the full baseline backup. An HBase cluster can store a 
> massive amount of data. The combination of full backups with incremental 
> backups has tremendous benefit for HBase as well. The following is a typical 
> scenario for full and incremental backup.
> # The user takes a full backup of a table or a set of tables in HBase. 
> # The user schedules periodic incremental backups to capture the changes 
> from the full backup, or from the last incremental backup.
> # The user needs to restore table data to a past point in time.
> # The full backup is restored to the table(s) or to different table name(s).  
> Then the incremental backups that are up to the desired point in time are 
> applied on top of the full backup. 
> We would support the following key features and capabilities.
> * Full backup uses HBase snapshot to capture HFiles.
> * Use HBase WALs to capture incremental changes, but we 

[jira] [Assigned] (HBASE-16755) Don't flush everything when under global memstore pressure

2016-10-12 Thread Ashu Pachauri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashu Pachauri reassigned HBASE-16755:
-

Assignee: Ashu Pachauri

> Don't flush everything when under global memstore pressure
> --
>
> Key: HBASE-16755
> URL: https://issues.apache.org/jira/browse/HBASE-16755
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Ashu Pachauri
>Assignee: Ashu Pachauri
>
> When the global memstore reaches the low water mark, we pick the best flushable 
> region and flush all of its column families. This is a suboptimal approach in 
> the sense that it leads to an unnecessarily high file creation rate and IO 
> amplification due to compactions. We should still try to honor the underlying 
> FlushPolicy.
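
A hedged sketch of the alternative being suggested: pick only the column families 
that are actually large, and fall back to flushing everything when none qualifies. 
This is illustrative only, with made-up method names, not the actual FlushPolicy API:

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SelectiveFlushSketch {
  /**
   * Return the families worth flushing: those whose memstore size exceeds the
   * per-family threshold. If none qualifies, fall back to flushing all of them.
   */
  static List<String> familiesToFlush(Map<String, Long> memstoreSizeByFamily,
                                      long perFamilyThresholdBytes) {
    List<String> selected = new ArrayList<>();
    for (Map.Entry<String, Long> e : memstoreSizeByFamily.entrySet()) {
      if (e.getValue() >= perFamilyThresholdBytes) {
        selected.add(e.getKey());
      }
    }
    return selected.isEmpty()
        ? new ArrayList<>(memstoreSizeByFamily.keySet())  // nothing large: flush all
        : selected;                                       // otherwise flush only the big ones
  }
}
{code}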



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16755) Don't flush everything when under global memstore pressure

2016-10-12 Thread Ashu Pachauri (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570289#comment-15570289
 ] 

Ashu Pachauri commented on HBASE-16755:
---

Picking this up.

> Don't flush everything when under global memstore pressure
> --
>
> Key: HBASE-16755
> URL: https://issues.apache.org/jira/browse/HBASE-16755
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Ashu Pachauri
>Assignee: Ashu Pachauri
>
> When the global memstore reaches the low water mark, we pick the best flushable 
> region and flush all of its column families. This is a suboptimal approach in 
> the sense that it leads to an unnecessarily high file creation rate and IO 
> amplification due to compactions. We should still try to honor the underlying 
> FlushPolicy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16735) Procedure v2 - Fix yield while holding locks

2016-10-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570285#comment-15570285
 ] 

Hudson commented on HBASE-16735:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #1775 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/1775/])
HBASE-16735 Procedure v2 - Fix yield while holding locks (matteo.bertozzi: rev 
e868d9586f738cc35b07818e6d796f12a4ccdbe4)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/TableProcedureInterface.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureScheduler.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestMasterProcedureScheduler.java


> Procedure v2 - Fix yield while holding locks
> 
>
> Key: HBASE-16735
> URL: https://issues.apache.org/jira/browse/HBASE-16735
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Affects Versions: 2.0.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
> Fix For: 2.0.0
>
> Attachments: HBASE-16735-v0.patch
>
>
> Sched fix for proc holding locks. the proc was added back to the queue, but 
> the queue was not re-added to the runq



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16505) Save deadline in RpcCallContext according to request's timeout

2016-10-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570284#comment-15570284
 ] 

Hudson commented on HBASE-16505:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #1775 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/1775/])
HBASE-16505 Pass deadline to HRegion operations (stack: rev 
b8173a548c6bcc86cd4ca77b40bcabe3f0bd85fd)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/CallRunner.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcCallContext.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java


> Save deadline in RpcCallContext according to request's timeout
> --
>
> Key: HBASE-16505
> URL: https://issues.apache.org/jira/browse/HBASE-16505
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Phil Yang
>Assignee: Phil Yang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16505-branch-1-v1.patch, 
> HBASE-16505-branch-1-v2.patch, HBASE-16505-branch-1-v3.patch, 
> HBASE-16505-v1.patch, HBASE-16505-v10.patch, HBASE-16505-v10.patch, 
> HBASE-16505-v11.patch, HBASE-16505-v11.patch, HBASE-16505-v12.patch, 
> HBASE-16505-v13.patch, HBASE-16505-v2.patch, HBASE-16505-v3.patch, 
> HBASE-16505-v4.patch, HBASE-16505-v5.patch, HBASE-16505-v6.patch, 
> HBASE-16505-v7.patch, HBASE-16505-v8.patch, HBASE-16505-v9.patch
>
>
> If we want to know the correct setting of timeout in read/write path, we need 
> add a new parameter in operation-methods of Region.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16810) HBase Balancer throws ArrayIndexOutOfBoundsException when regionservers are in /hbase/draining znode and unloaded

2016-10-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570286#comment-15570286
 ] 

Hudson commented on HBASE-16810:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #1775 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/1775/])
HBASE-16810 HBase Balancer throws ArrayIndexOutOfBoundsException when (tedyu: 
rev f4bdab8bace03c036ba0fe893cbd357b2e2f0526)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/BalancerTestBase.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestStochasticLoadBalancer.java


> HBase Balancer throws ArrayIndexOutOfBoundsException when regionservers are 
> in /hbase/draining znode and unloaded
> -
>
> Key: HBASE-16810
> URL: https://issues.apache.org/jira/browse/HBASE-16810
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 2.0.0, 1.3.0
>Reporter: Ashu Pachauri
>Assignee: David Pope
> Fix For: 1.3.0
>
> Attachments: HBASE-16810.branch-1.patch, HBASE-16810.patch, 
> master.patch
>
>
> 1. Add a regionserver znode under /hbase/draining znode.
> 2. Use RegionMover to unload all regions from the regionserver.
> 3. Run balancer.
> {code}
> 16/09/21 14:17:33 ERROR ipc.RpcServer: Unexpected throwable object
> java.lang.ArrayIndexOutOfBoundsException: 75
>   at 
> org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer$Cluster.getLocalityOfRegion(BaseLoadBalancer.java:867)
>   at 
> org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer$LocalityCostFunction.cost(StochasticLoadBalancer.java:1186)
>   at 
> org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer.computeCost(StochasticLoadBalancer.java:521)
>   at 
> org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer.balanceCluster(StochasticLoadBalancer.java:309)
>   at 
> org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer.balanceCluster(StochasticLoadBalancer.java:264)
>   at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:1339)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.balance(MasterRpcServices.java:442)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:58555)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2268)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16703) Explore object pooling of SeekerState

2016-10-12 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-16703:
--
Component/s: Performance

> Explore object pooling of SeekerState
> -
>
> Key: HBASE-16703
> URL: https://issues.apache.org/jira/browse/HBASE-16703
> Project: HBase
>  Issue Type: Task
>  Components: Performance
>Reporter: Andrew Purtell
>Assignee: ramkrishna.s.vasudevan
>
> In read workloads, 35% of the allocation pressure produced by servicing RPC 
> requests, when block encoding is enabled, comes from the 
> BufferedDataBlockEncoder$SeekerState constructor, where we allocate two byte 
> arrays of INITIAL_KEY_BUFFER_SIZE in length. There's an opportunity for 
> object pooling of SeekerState here. Subsequent code checks whether those byte 
> arrays are sized sufficiently to handle the incoming data to copy. The arrays 
> will be resized if needed. 
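
A small sketch of the kind of reuse being proposed: per-thread scratch buffers that 
grow on demand instead of being reallocated for every SeekerState. Names and the 
initial size are assumptions for illustration, not the actual HBase change:

{code}
import java.util.Arrays;

public class KeyBufferPoolSketch {
  private static final int INITIAL_KEY_BUFFER_SIZE = 512;   // placeholder value

  // One reusable, growable scratch buffer per handler thread.
  private static final ThreadLocal<byte[]> KEY_BUFFER =
      ThreadLocal.withInitial(() -> new byte[INITIAL_KEY_BUFFER_SIZE]);

  /** Get a buffer of at least the requested size, reusing the old one when possible. */
  static byte[] getKeyBuffer(int neededLength) {
    byte[] buf = KEY_BUFFER.get();
    if (buf.length < neededLength) {
      buf = Arrays.copyOf(buf, Math.max(neededLength, buf.length * 2));
      KEY_BUFFER.set(buf);
    }
    return buf;
  }
}
{code}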



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16731) Inconsistent results from the Get/Scan if we use the empty FilterList

2016-10-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570288#comment-15570288
 ] 

Hudson commented on HBASE-16731:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #1775 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/1775/])
HBASE-16731 Inconsistent results from the Get/Scan if we use the empty (stack: 
rev a46134bffc21da91853c4288375e9d4bd00a177d)
* (edit) hbase-protocol/src/main/protobuf/Client.proto
* (edit) hbase-protocol-shaded/src/main/protobuf/Client.proto
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* (edit) 
hbase-protocol-shaded/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/generated/ClientProtos.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/ProtobufUtil.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/protobuf/TestProtobufUtil.java
* (edit) hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestScan.java
* (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/client/Get.java
* (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/client/Query.java
* (edit) 
hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
* (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/client/Scan.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java


> Inconsistent results from the Get/Scan if we use the empty FilterList
> -
>
> Key: HBASE-16731
> URL: https://issues.apache.org/jira/browse/HBASE-16731
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: ChiaPing Tsai
>Assignee: ChiaPing Tsai
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-16731.v0.patch, HBASE-16731.v1.patch, 
> HBASE-16731.v2.patch, HBASE-16731.v3.patch
>
>
> RSRpcServices#get() converts the Get to a Scan without calling 
> scan#setLoadColumnFamiliesOnDemand. This causes the results retrieved from 
> Get and Scan to differ if we use the empty filter: Scan doesn't 
> return any data but Get does.
> see [HBASE-16729 |https://issues.apache.org/jira/browse/HBASE-16729]
> Any comments? Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15134) Add visibility into Flush and Compaction queues

2016-10-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570281#comment-15570281
 ] 

stack commented on HBASE-15134:
---

+1

Let me try it.

> Add visibility into Flush and Compaction queues
> ---
>
> Key: HBASE-15134
> URL: https://issues.apache.org/jira/browse/HBASE-15134
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction, metrics, regionserver
>Reporter: Elliott Clark
>Assignee: Abhishek Singh Chouhan
> Fix For: 2.0.0
>
> Attachments: HBASE-15134.master.001.patch, 
> HBASE-15134.master.002.patch, HBASE-15134.patch, HBASE-15134.patch
>
>
> During busy spurts we can see regionservers build up large queues for 
> compaction. It's really hard to tell if the server is queueing a lot of 
> compactions for the same region, lots of compactions for lots of regions, or 
> just falling behind.
> For flushes it's much the same. There can be flushes in the queue that aren't 
> being run because of delayed flushes. There's no way to know from the metrics 
> how many flushes there are for each region, or how many are delayed.
> We should either add more metrics around this (num per region, max per 
> region, min per region) or add a UI page that lists the compactions 
> and flushes.
> Or both.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-15134) Add visibility into Flush and Compaction queues

2016-10-12 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-15134:
--
Fix Version/s: 2.0.0

> Add visibility into Flush and Compaction queues
> ---
>
> Key: HBASE-15134
> URL: https://issues.apache.org/jira/browse/HBASE-15134
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction, metrics, regionserver
>Reporter: Elliott Clark
>Assignee: Abhishek Singh Chouhan
> Fix For: 2.0.0
>
> Attachments: HBASE-15134.master.001.patch, 
> HBASE-15134.master.002.patch, HBASE-15134.patch, HBASE-15134.patch
>
>
> During busy spurts we can see regionservers build up large queues for 
> compaction. It's really hard to tell if the server is queueing a lot of 
> compactions for the same region, lots of compactions for lots of regions, or 
> just falling behind.
> For flushes it's much the same. There can be flushes in the queue that aren't 
> being run because of delayed flushes. There's no way to know from the metrics 
> how many flushes there are for each region, or how many are delayed.
> We should either add more metrics around this (num per region, max per 
> region, min per region) or add a UI page that lists the compactions 
> and flushes.
> Or both.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16710) Add ZStandard Codec to Compression.java

2016-10-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570276#comment-15570276
 ] 

stack commented on HBASE-16710:
---

What does Hadoop have [~churromorales]?

> Add ZStandard Codec to Compression.java
> ---
>
> Key: HBASE-16710
> URL: https://issues.apache.org/jira/browse/HBASE-16710
> Project: HBase
>  Issue Type: Task
>Affects Versions: 2.0.0
>Reporter: churro morales
>Assignee: churro morales
>Priority: Minor
> Attachments: HBASE-16710-0.98.patch, HBASE-16710-1.2.patch, 
> HBASE-16710.patch
>
>
> HADOOP-13578 is adding the ZStandardCodec to hadoop.  This is a placeholder 
> to ensure it gets added to hbase once that change lands upstream.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot

2016-10-12 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570273#comment-15570273
 ] 

Vladimir Rodionov commented on HBASE-7912:
--

{quote}
It failed with java.lang.IllegalArgumentException Wrong FS: 
hdfs://ve0524.halxg.cloudera.com:8020/hbase/WALs, expected: file:/// and then 
later I got an NPE:
{quote}

Can you post full stack trace?

> HBase Backup/Restore Based on HBase Snapshot
> 
>
> Key: HBASE-7912
> URL: https://issues.apache.org/jira/browse/HBASE-7912
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Richard Ding
>Assignee: Vladimir Rodionov
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: Backup-and-Restore-Apache_19Sep2016.pdf, 
> Backup-and-Restore-Apache_9Sep2016.pdf, HBaseBackupAndRestore - v0.8.pdf, 
> HBaseBackupAndRestore -0.91.pdf, HBaseBackupAndRestore-v0.9.pdf, 
> HBaseBackupAndRestore.pdf, HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, 
> HBaseBackupRestore-Jira-7912-DesignDoc-v2.pdf, 
> HBaseBackupRestore-Jira-7912-v4.pdf, HBaseBackupRestore-Jira-7912-v5 .pdf, 
> HBaseBackupRestore-Jira-7912-v6.pdf, HBase_BackupRestore-Jira-7912-CLI-v1.pdf
>
>
> Finally, we completed the implementation of our backup/restore solution, and 
> would like to share it with the community through this jira. 
> We are leveraging the existing hbase snapshot feature and providing a general 
> solution for common users. Our full backup uses snapshots to capture metadata 
> locally and exportsnapshot to move data to another cluster; the incremental 
> backup uses an offline WALPlayer to back up HLogs; we also leverage 
> distributed log roll and flush to improve performance, plus added-on value 
> such as convert, merge, progress reporting, and CLI commands, so that a 
> common user can back up hbase data without in-depth knowledge of hbase. 
> Our solution also contains some usability features for enterprise users. 
> The detailed design document and CLI commands will be attached in this jira. 
> We plan to use 10~12 subtasks to share each of the following features, and 
> document the detailed implementation in the subtasks: 
> * *Full Backup*: provide local and remote backup/restore for a list of tables
> * *offline-WALPlayer* to convert HLogs to HFiles offline (for incremental 
> backup)
> * *distributed* log roll and distributed flush 
> * Backup *Manifest* and history
> * *Incremental* backup: builds on top of a full backup as a daily/weekly backup 
> * *Convert* incremental backup WAL files into hfiles
> * *Merge* several backup images into one (like merging weekly into monthly)
> * *add and remove* tables to and from a Backup image
> * *Cancel* a backup process
> * backup progress *status*
> * full backup based on an *existing snapshot*
> *-*
> *Below is the original description, kept here as the history of the 
> design and discussion back in 2013*
> There have been attempts in the past to come up with a viable HBase 
> backup/restore solution (e.g., HBASE-4618).  Recently, there have been many 
> advancements and new features in HBase, for example, FileLink, Snapshot, and 
> Distributed Barrier Procedure. This is a proposal for a backup/restore 
> solution that utilizes these new features to achieve better performance and 
> consistency. 
>  
> A common practice of backup and restore in databases is to first take a full 
> baseline backup, and then periodically take incremental backups that capture 
> the changes since the full baseline backup. An HBase cluster can store a 
> massive amount of data. The combination of full backups with incremental 
> backups has tremendous benefit for HBase as well. The following is a typical 
> scenario for full and incremental backup.
> # The user takes a full backup of a table or a set of tables in HBase. 
> # The user schedules periodic incremental backups to capture the changes 
> from the full backup, or from the last incremental backup.
> # The user needs to restore table data to a past point in time.
> # The full backup is restored to the table(s) or to different table name(s).  
> Then the incremental backups that are up to the desired point in time are 
> applied on top of the full backup. 
> We would support the following key features and capabilities.
> * Full backup uses HBase snapshot to capture HFiles.
> * Use HBase WALs to capture incremental changes, but we use bulk load of 
> HFiles for fast incremental restore.
> * Support a single table or a set of tables, and column-family-level backup 
> and restore.
> * Restore to different table names.
> * Support adding additional tables or CFs to the backup set without 
> interrupting the incremental backup schedule.
> * Support rollup/combining of incremental backups into longer 

[jira] [Commented] (HBASE-16812) Cleanup deprecated compact() function

2016-10-12 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570270#comment-15570270
 ] 

Appy commented on HBASE-16812:
--

So if it's used only for major and mob compaction, can we use a simple lock 
(now that the sweeper job is gone)?

> Cleanup deprecated compact() function
> -
>
> Key: HBASE-16812
> URL: https://issues.apache.org/jira/browse/HBASE-16812
> Project: HBase
>  Issue Type: Task
>Reporter: Appy
>Assignee: Appy
>Priority: Minor
> Attachments: HBASE-16812.master.001.patch
>
>
> compact(CompactionContext compaction, CompactionThroughputController 
> throughputController) is [deprecated in the 1.2.0 
> release|https://github.com/apache/hbase/blob/rel/1.2.0/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java#L222].
> Store.java is also marked limited private.
> Context: I was cleaning up the zk table lock, which is also used in that method's 
> [override|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HMobStore.java#L460]
>  in HMobStore.
> This method isn't being called from anywhere except CompactionTool (which 
> creates an HStore object, not an HMobStore object).
> [~jingcheng...@intel.com] Can you PTAL and help me understand what's going on?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16813) Procedure v2 - Move ProcedureEvent to hbase-procedure module

2016-10-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570234#comment-15570234
 ] 

Hadoop QA commented on HBASE-16813:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 8 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
11s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
22s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
11s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
54s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 44s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
28m 28s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
52s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 15s 
{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 94m 4s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
24s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 140m 12s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Timed out junit tests | 
org.apache.hadoop.hbase.master.procedure.TestProcedureAdmin |
|   | 
org.apache.hadoop.hbase.replication.regionserver.TestTableBasedReplicationSourceManagerImpl
 |
|   | org.apache.hadoop.hbase.master.procedure.TestModifyColumnFamilyProcedure |
|   | org.apache.hadoop.hbase.master.procedure.TestDeleteColumnFamilyProcedure |
|   | org.apache.hadoop.hbase.master.procedure.TestAddColumnFamilyProcedure |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:7bda515 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12832978/HBASE-16813-v0.patch |
| JIRA Issue | HBASE-16813 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux f2a49fb0bd0a 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / f4bdab8 |
| 

[jira] [Commented] (HBASE-16813) Procedure v2 - Move ProcedureEvent to hbase-procedure module

2016-10-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570215#comment-15570215
 ] 

Hadoop QA commented on HBASE-16813:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 8 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 22s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
16s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
22s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 4s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
58s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
27m 32s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 0s 
{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 87m 17s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
27s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 131m 45s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Timed out junit tests | org.apache.hadoop.hbase.client.TestReplicasClient |
|   | org.apache.hadoop.hbase.TestClusterBootOrder |
|   | org.apache.hadoop.hbase.client.TestMobCloneSnapshotFromClient |
|   | org.apache.hadoop.hbase.TestPartialResultsFromClientSide |
|   | org.apache.hadoop.hbase.client.TestMobSnapshotCloneIndependence |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:7bda515 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12832978/HBASE-16813-v0.patch |
| JIRA Issue | HBASE-16813 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux e21878a44145 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 
20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / f4bdab8 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | 

[jira] [Commented] (HBASE-16749) HBase root pom.xml contains repo from people.apache.org/~garyh

2016-10-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570211#comment-15570211
 ] 

Hudson commented on HBASE-16749:


FAILURE: Integrated in Jenkins build HBase-1.1-JDK8 #1879 (See 
[https://builds.apache.org/job/HBase-1.1-JDK8/1879/])
HBASE-16749 Remove defunct repo from people.apache.org (garyh: rev 
edac95201492a135e9908d94597741b44ab6496f)
* (edit) pom.xml


> HBase root pom.xml contains repo from people.apache.org/~garyh
> --
>
> Key: HBASE-16749
> URL: https://issues.apache.org/jira/browse/HBASE-16749
> Project: HBase
>  Issue Type: Task
>Reporter: Wangda Tan
>Assignee: Gary Helmling
> Fix For: 1.2.5, 1.1.8
>
> Attachments: HBASE-16749.branch-1.1.patch, 
> HBASE-16749.branch-1.2.patch, HBASE-16749.branch-1.2.patch
>
>
> This is found when I was building Hadoop/YARN:
> {code}
> [INFO] 
> 
> [INFO] Building Apache Hadoop YARN Timeline Service 3.0.0-alpha2-SNAPSHOT
> [INFO] 
> 
> Downloading: 
> http://conjars.org/repo/org/apache/hadoop/hadoop-client/3.0.0-alpha2-SNAPSHOT/maven-metadata.xml
> ...
> Downloading: 
> http://people.apache.org/~garyh/mvn/org/apache/hadoop/hadoop-client/3.0.0-alpha2-SNAPSHOT/maven-metadata.xml
> ...
> Downloading: 
> http://people.apache.org/~garyh/mvn/org/apache/hadoop/hadoop-client/3.0.0-alpha2-SNAPSHOT/maven-metadata.xml
> {code}
> [~te...@apache.org] mentioned:
> bq. Among hbase releases (1.1.x, 1.2.y), root pom.xml contains 
> "ghelmling.testing" repo. 
> This private repo sometimes extends the Hadoop build time a lot. It might be 
> better to use ASF snapshot repo first before downloading from private repo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15921) Add first AsyncTable impl and create TableImpl based on it

2016-10-12 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570206#comment-15570206
 ] 

Duo Zhang commented on HBASE-15921:
---

Ah, I have removed the watcher-related code in the new patch as we do not need 
it right now. And if we want to do async access to zk then curator is much 
simpler than our zk stuff, as the client only needs to read a small set of data 
on zk. Now the client depends on ZooKeeperWatcher, MasterAddressTracker, and 
maybe MetaTableAccessor. And for every class we only use a very small part; it 
is a pain to modify such big classes...

And for now, as we use neither async access nor watchers, I can remove the curator 
dependency. We can continue this discussion in a new jira.

Thanks.
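
As a rough, illustrative sketch of what an async read of the master znode could 
look like with Curator (the connect string, znode path, and class name below are 
placeholders, not code from the patch):

{code}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class MasterZnodeReadSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical connect string and znode path; a real client would derive both from its configuration.
    CuratorFramework client = CuratorFrameworkFactory.newClient(
        "localhost:2181", new ExponentialBackoffRetry(1000, 3));
    client.start();

    // Asynchronous read: the callback runs on a Curator background thread
    // once the data has been fetched, so the caller never blocks.
    client.getData().inBackground((c, event) -> {
      byte[] data = event.getData();
      System.out.println("Read " + (data == null ? 0 : data.length)
          + " bytes from " + event.getPath());
    }).forPath("/hbase/master");

    Thread.sleep(2000);  // give the background callback time to fire in this toy example
    client.close();
  }
}
{code}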

> Add first AsyncTable impl and create TableImpl based on it
> --
>
> Key: HBASE-15921
> URL: https://issues.apache.org/jira/browse/HBASE-15921
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Jurriaan Mous
>Assignee: Duo Zhang
> Fix For: 2.0.0
>
> Attachments: HBASE-15921-v2.patch, HBASE-15921-v3.patch, 
> HBASE-15921-v4.patch, HBASE-15921-v5.patch, HBASE-15921-v6.patch, 
> HBASE-15921-v7.patch, HBASE-15921-v8.patch, HBASE-15921.demo.patch, 
> HBASE-15921.patch, HBASE-15921.v1.patch
>
>
> First we create an AsyncTable interface with implementation without the Scan 
> functionality. Those will land in a separate patch since they need a refactor 
> of existing scans.
> Also added is a new TableImpl to replace HTable. It uses the AsyncTableImpl 
> internally and should be a bit faster because it does jump through less hoops 
> to do ProtoBuf transportation. This way we can run all existing tests on the 
> AsyncTableImpl to guarantee its quality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache

2016-10-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570204#comment-15570204
 ] 

stack commented on HBASE-15560:
---

Did it all fit in memory when you ran this test [~ben.manes]? Or were there 
cache misses? It looks like TinyLFU did better in your test? Thanks.

> TinyLFU-based BlockCache
> 
>
> Key: HBASE-15560
> URL: https://issues.apache.org/jira/browse/HBASE-15560
> Project: HBase
>  Issue Type: Improvement
>  Components: BlockCache
>Affects Versions: 2.0.0
>Reporter: Ben Manes
>Assignee: Ben Manes
> Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, 
> HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, 
> tinylfu.patch
>
>
> LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and 
> recency of the working set. It achieves concurrency by using an O( n ) 
> background thread to prioritize the entries and evict. Accessing an entry is 
> O(1) by a hash table lookup, recording its logical access time, and setting a 
> frequency flag. A write is performed in O(1) time by updating the hash table 
> and triggering an async eviction thread. This provides ideal concurrency and 
> minimizes the latencies by penalizing the thread instead of the caller. 
> However the policy does not age the frequencies and may not be resilient to 
> various workload patterns.
> W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the 
> frequency in a counting sketch, ages periodically by halving the counters, 
> and orders entries by SLRU. An entry is discarded by comparing the frequency 
> of the new arrival (candidate) to the SLRU's victim, and keeping the one with 
> the highest frequency. This allows the operations to be performed in O(1) 
> time and, through the use of a compact sketch, a much larger history is 
> retained beyond the current working set. In a variety of real-world traces 
> the policy had [near optimal hit 
> rates|https://github.com/ben-manes/caffeine/wiki/Efficiency].
> Concurrency is achieved by buffering and replaying the operations, similar to 
> a write-ahead log. A read is recorded into a striped ring buffer and writes 
> to a queue. The operations are applied in batches under a try-lock by an 
> asynchronous thread, thereby track the usage pattern without incurring high 
> latencies 
> ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]).
> In YCSB benchmarks the results were inconclusive. For a large cache (99% hit 
> rates) the two caches have near identical throughput and latencies with 
> LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a 
> 1-4% hit rate improvement and therefore lower latencies. The lackluster 
> result is because a synthetic Zipfian distribution is used, on which SLRU 
> performs optimally. In a more varied, real-world workload we'd expect to see 
> improvements by being able to make smarter predictions.
> The provided patch implements BlockCache using the 
> [Caffeine|https://github.com/ben-manes/caffeine] caching library (see 
> HighScalability 
> [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]).
> Edward Bortnikov and Eshcar Hillel have graciously provided guidance for 
> evaluating this patch ([github 
> branch|https://github.com/ben-manes/hbase/tree/tinylfu]).
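
For readers unfamiliar with the library, here is a minimal, illustrative sketch of 
building a size-bounded Caffeine cache (Caffeine applies the W-TinyLFU admission and 
eviction policy described above once a maximum size or weight is set); the key type, 
weights, and names are made up for the example and are not the proposed BlockCache wiring:

{code}
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

public class CaffeineSketch {
  public static void main(String[] args) {
    // Bounding the cache by weight enables Caffeine's W-TinyLFU admission and eviction.
    Cache<String, byte[]> blockCache = Caffeine.newBuilder()
        .maximumWeight(1024L * 1024 * 1024)                    // ~1 GB of cached blocks
        .weigher((String key, byte[] block) -> block.length)   // weigh entries by block size
        .recordStats()                                         // hit/miss counters for reporting
        .build();

    blockCache.put("block-1", new byte[64 * 1024]);            // cache a 64 KB block
    byte[] block = blockCache.getIfPresent("block-1");         // O(1) lookup
    System.out.println(block == null ? "miss" : "hit, " + blockCache.stats());
  }
}
{code}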



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15536) Make AsyncFSWAL as our default WAL

2016-10-12 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570176#comment-15570176
 ] 

Duo Zhang commented on HBASE-15536:
---

At the RS, we will fail the request if the WAL edit is too large.

And a simple split does not work: if we fail in the middle, we have no 
information to know whether there are more pieces for this edit, so we either 
lose data or cannot guarantee atomicity.
We need to introduce chunk semantics for WAL edits if we want to do things like 
this. This would also have the benefit that a larger write does not block other 
write operations as much.

Let me dig more. Thanks.
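
As a rough illustration of the kind of server-side guard being discussed (the 
property name and 16MB default come from the proposal in this thread, not from an 
existing HBase config, and the exact classes/packages differ across branches), a 
sketch could look like this:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.DoNotRetryIOException;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;

public final class WalEditSizeGuard {
  private WalEditSizeGuard() {}

  /** Rejects an edit whose estimated serialized size exceeds the proposed limit. */
  static void checkSize(Configuration conf, WALEdit edit) throws DoNotRetryIOException {
    // Hypothetical property name from the proposal above; 16MB as the suggested default.
    long max = conf.getLong("max.wal.edit.size", 16L * 1024 * 1024);
    long size = 0;
    for (Cell cell : edit.getCells()) {
      size += CellUtil.estimatedSerializedSizeOf(cell);
    }
    if (size > max) {
      throw new DoNotRetryIOException("WALEdit of " + size
          + " bytes exceeds max.wal.edit.size=" + max);
    }
  }
}
{code}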

> Make AsyncFSWAL as our default WAL
> --
>
> Key: HBASE-15536
> URL: https://issues.apache.org/jira/browse/HBASE-15536
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-15536-v1.patch, HBASE-15536-v2.patch, 
> HBASE-15536-v3.patch, HBASE-15536-v4.patch, HBASE-15536-v5.patch, 
> HBASE-15536.patch
>
>
> As it should be predicated on passing basic cluster ITBLL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16746) The log of “region close” should differ from “region move”

2016-10-12 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-16746:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master branch (doesn't cherry-pick clean to branch-1). Thanks 
[~chia7712]

> The log of “region close” should differ from “region move”
> --
>
> Key: HBASE-16746
> URL: https://issues.apache.org/jira/browse/HBASE-16746
> Project: HBase
>  Issue Type: Bug
>Reporter: ChiaPing Tsai
>Assignee: ChiaPing Tsai
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-16746.v0.patch
>
>
> If we disable some tables, we will see the following in the region server log.
> {noformat}
> Close 90ed2fe1748644c6faecdec3651335d4, moving to null
> {noformat}
> The message “moving to null” is a bit confusing.
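
A tiny, hypothetical sketch of the sort of change this implies (variable and method 
names are illustrative, not the actual patch):

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class CloseLogSketch {
  private static final Log LOG = LogFactory.getLog(CloseLogSketch.class);

  static void logClose(String encodedRegionName, Object destination) {
    // Only mention a destination when the region is actually being moved.
    if (destination == null) {
      LOG.info("Close " + encodedRegionName);
    } else {
      LOG.info("Close " + encodedRegionName + ", moving to " + destination);
    }
  }
}
{code}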



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-16819) findHangingTests.py should recognize tests with same name from different modules

2016-10-12 Thread Ted Yu (JIRA)
Ted Yu created HBASE-16819:
--

 Summary: findHangingTests.py should recognize tests with same name 
from different modules
 Key: HBASE-16819
 URL: https://issues.apache.org/jira/browse/HBASE-16819
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor


I was using findHangingTests.py on one of the 1.4 Jenkins builds:
{code}
ERROR! Multiple tests with same name 'TestTableMapReduceUtil'. Might get wrong 
results for this test.
ERROR! No test 'TestTableMapReduceUtil' found in hanging_tests. Might get wrong 
results for this test.
ERROR! Multiple tests with same name 'TestHTablePool'. Might get wrong results 
for this test.
ERROR! No test 'TestHTablePool' found in hanging_tests. Might get wrong results 
for this test.
{code}
There are mapred.TestTableMapReduceUtil and mapreduce.TestTableMapReduceUtil, 
which confused findHangingTests.py.

Similar issue with tests whose name contains $:
{code}
Running org.apache.hadoop.hbase.client.TestHTablePool$TestHTableReusablePool
Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.273 sec - in 
org.apache.hadoop.hbase.client.TestHTablePool$TestHTableReusablePool
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15536) Make AsyncFSWAL as our default WAL

2016-10-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570156#comment-15570156
 ] 

stack commented on HBASE-15536:
---

bq. We can introduce a new config named 'max.wal.edit.size' which is used to 
limit the size of one atomic operation and set the default value less than 
16MB? This could make it possible to make AsyncFSWAL as our default WAL.

Where would we do the check? On the client side?

We would have to disassemble the WALEdit to break it into 16MB pieces? Each 
edit would have the same sequenceid?





> Make AsyncFSWAL as our default WAL
> --
>
> Key: HBASE-15536
> URL: https://issues.apache.org/jira/browse/HBASE-15536
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-15536-v1.patch, HBASE-15536-v2.patch, 
> HBASE-15536-v3.patch, HBASE-15536-v4.patch, HBASE-15536-v5.patch, 
> HBASE-15536.patch
>
>
> As it should be predicated on passing basic cluster ITBLL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-15536) Make AsyncFSWAL as our default WAL

2016-10-12 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-15536:
--
Priority: Critical  (was: Major)

> Make AsyncFSWAL as our default WAL
> --
>
> Key: HBASE-15536
> URL: https://issues.apache.org/jira/browse/HBASE-15536
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-15536-v1.patch, HBASE-15536-v2.patch, 
> HBASE-15536-v3.patch, HBASE-15536-v4.patch, HBASE-15536-v5.patch, 
> HBASE-15536.patch
>
>
> As it should be predicated on passing basic cluster ITBLL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16698) Performance issue: handlers stuck waiting for CountDownLatch inside WALKey#getWriteEntry under high writing workload

2016-10-12 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-16698:
--
Fix Version/s: 1.4.0
   1.3.0
   2.0.0

> Performance issue: handlers stuck waiting for CountDownLatch inside 
> WALKey#getWriteEntry under high writing workload
> 
>
> Key: HBASE-16698
> URL: https://issues.apache.org/jira/browse/HBASE-16698
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Affects Versions: 1.1.6, 1.2.3
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0, 1.3.0, 1.4.0
>
> Attachments: HBASE-16698.branch-1.patch, HBASE-16698.patch, 
> HBASE-16698.v2.patch, hadoop0495.et2.jstack
>
>
> As titled, on our production environment we observed 98 out of 128 handlers 
> get stuck waiting for the CountDownLatch {{seqNumAssignedLatch}} inside 
> {{WALKey#getWriteEntry}} under a high writing workload.
> After digging into the problem, we found that the problem is mainly caused by 
> advancing mvcc in the append logic. Below is some detailed analysis:
> Under current branch-1 code logic, all batch puts will call 
> {{WALKey#getWriteEntry}} after appending edit to WAL, and 
> {{seqNumAssignedLatch}} is only released when the relative append call is 
> handled by RingBufferEventHandler (see {{FSWALEntry#stampRegionSequenceId}}). 
> Because currently we're using a single event handler for the ringbuffer, the 
> append calls are handled one by one (actually lots of our current logic 
> depends on this sequential handling), and this becomes a bottleneck 
> under a high writing workload.
> The worst part is that by default we only use one WAL per RS, so appends on 
> all regions are dealt with sequentially, which causes contention among 
> different regions...
> To fix this, we could also make use of the "sequential appends" mechanism, 
> that we could grab the WriteEntry before publishing append onto ringbuffer 
> and use it as sequence id, only that we need to add a lock to make "grab 
> WriteEntry" and "append edit" a transaction. This will still cause contention 
> inside a region but could avoid contention between different regions. This 
> solution is already verified in our online environment and proved to be 
> effective.
> Notice that for master (2.0) branch since we already change the write 
> pipeline to sync before writing memstore (HBASE-15158), this issue only 
> exists for the ASYNC_WAL writes scenario.
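
To make the proposed pattern concrete, here is a generic, illustrative sketch of 
"assign the sequence id under a lock before publishing to the buffer" using plain 
java.util.concurrent types; it is not HBase's actual WAL/ring-buffer code, just the 
shape of the idea:

{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative only: the writer learns its sequence id up front instead of
// blocking on a latch that a single consumer thread releases later.
class SequencedAppender {
  private final AtomicLong nextSeqId = new AtomicLong();
  private final ReentrantLock appendLock = new ReentrantLock();
  private final BlockingQueue<long[]> buffer = new LinkedBlockingQueue<>();

  long append(long editSize) throws InterruptedException {
    appendLock.lock();           // makes "grab seq id" + "publish edit" one atomic step
    try {
      long seqId = nextSeqId.incrementAndGet();
      buffer.put(new long[] { seqId, editSize });
      return seqId;              // caller proceeds without waiting on a latch
    } finally {
      appendLock.unlock();
    }
  }
}
{code}

The lock limits contention to writers within the same appender (per region), which is 
the trade-off described above: contention inside a region, but not across regions.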



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16698) Performance issue: handlers stuck waiting for CountDownLatch inside WALKey#getWriteEntry under high writing workload

2016-10-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570145#comment-15570145
 ] 

stack commented on HBASE-16698:
---

So, [~carp84], you are running w/ this in production? Should I apply this to master 
and branch-1 for hbase-1.4, and to branch-1.3 and branch-1.2 but with this 
option set to off by default? 

> Performance issue: handlers stuck waiting for CountDownLatch inside 
> WALKey#getWriteEntry under high writing workload
> 
>
> Key: HBASE-16698
> URL: https://issues.apache.org/jira/browse/HBASE-16698
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Affects Versions: 1.1.6, 1.2.3
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0, 1.3.0, 1.4.0
>
> Attachments: HBASE-16698.branch-1.patch, HBASE-16698.patch, 
> HBASE-16698.v2.patch, hadoop0495.et2.jstack
>
>
> As titled, on our production environment we observed 98 out of 128 handlers 
> get stuck waiting for the CountDownLatch {{seqNumAssignedLatch}} inside 
> {{WALKey#getWriteEntry}} under a high writing workload.
> After digging into the problem, we found that the problem is mainly caused by 
> advancing mvcc in the append logic. Below is some detailed analysis:
> Under current branch-1 code logic, all batch puts will call 
> {{WALKey#getWriteEntry}} after appending edit to WAL, and 
> {{seqNumAssignedLatch}} is only released when the relative append call is 
> handled by RingBufferEventHandler (see {{FSWALEntry#stampRegionSequenceId}}). 
> Because currently we're using a single event handler for the ringbuffer, the 
> append calls are handled one by one (actually lots of our current logic 
> depends on this sequential handling), and this becomes a bottleneck 
> under a high writing workload.
> The worst part is that by default we only use one WAL per RS, so appends on 
> all regions are dealt with sequentially, which causes contention among 
> different regions...
> To fix this, we could also make use of the "sequential appends" mechanism, 
> that we could grab the WriteEntry before publishing append onto ringbuffer 
> and use it as sequence id, only that we need to add a lock to make "grab 
> WriteEntry" and "append edit" a transaction. This will still cause contention 
> inside a region but could avoid contention between different regions. This 
> solution is already verified in our online environment and proved to be 
> effective.
> Notice that for master (2.0) branch since we already change the write 
> pipeline to sync before writing memstore (HBASE-15158), this issue only 
> exists for the ASYNC_WAL writes scenario.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16785) We are not running all tests

2016-10-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570103#comment-15570103
 ] 

stack commented on HBASE-16785:
---

Thanks [~mbertozzi]. Done.

> We are not running all tests
> 
>
> Key: HBASE-16785
> URL: https://issues.apache.org/jira/browse/HBASE-16785
> Project: HBase
>  Issue Type: Bug
>  Components: build, test
>Reporter: stack
>Assignee: stack
> Attachments: HBASE-16785.master.001.patch, 
> HBASE-16785.master.002.patch
>
>
> Noticed by [~mbertozzi]
> We have some modules where we tried to 'skip' the running of the second part 
> of tests -- mediums and larges. That might have made sense once, when the 
> module was originally added and there may have been just a few small tests 
> to run, but as time goes by the module accumulates more tests, and in a few 
> cases we've added mediums and larges but have not removed the 'skip' config.
> Matteo noticed this happened in hbase-procedure.
> In hbase-client, there is at least a medium test that is being skipped.
> Let me try purging this trick everywhere. It doesn't seem to save us anything 
> going by build time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

