[jira] [Commented] (HBASE-18221) Switch from pread to stream should happen under HStore's reentrant lock

2017-06-20 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057019#comment-16057019
 ] 

ramkrishna.s.vasudevan commented on HBASE-18221:


Findbugs warnings seem to be unrelated to this patch. 

> Switch from pread to stream should happen under HStore's reentrant lock
> ---
>
> Key: HBASE-18221
> URL: https://issues.apache.org/jira/browse/HBASE-18221
> Project: HBase
>  Issue Type: Sub-task
>  Components: Scanners
>Affects Versions: 2.0.0, 3.0.0, 2.0.0-alpha-1
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0, 3.0.0, 2.0.0-alpha-2
>
> Attachments: HBASE-18221_2_fortestcasefailure.patch, 
> HBASE-18221_2.patch, HBASE-18221_2.patch
>
>
> Found this while debugging HBASE-18186. When we try to reopen the scanners on 
> the storefiles while switching over from pread to stream, we do not 
> use the HStore's reentrant lock to get the current storefiles from the 
> StoreFileManager. All the scan APIs are guarded by that lock and we must do the 
> same here; otherwise the CompactedHfileDischarger may cause race issues with 
> the HStore's data structures, like here:
> {code}
> 2017-06-14 18:16:17,223 WARN  [RpcServer.default.FPBQ.Fifo.handler=23,queue=1,port=16020] regionserver.StoreScanner: failed to switch to stream read
> java.lang.NullPointerException
> at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getScannersForStoreFiles(StoreFileScanner.java:133)
> at org.apache.hadoop.hbase.regionserver.HStore.getScanners(HStore.java:1221)
> at org.apache.hadoop.hbase.regionserver.StoreScanner.trySwitchToStreamRead(StoreScanner.java:997)
> at org.apache.hadoop.hbase.regionserver.StoreScanner.shipped(StoreScanner.java:1134)
> at org.apache.hadoop.hbase.regionserver.KeyValueHeap.shipped(KeyValueHeap.java:445)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.shipped(HRegion.java:6459)
> at org.apache.hadoop.hbase.regionserver.RSRpcServices$RegionScannerShippedCallBack.run(RSRpcServices.java:339)
> at org.apache.hadoop.hbase.ipc.ServerCall.setResponse(ServerCall.java:252)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:166)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:278)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:258)
> {code}
> I have a working patch fixing this problem. Will do some more testing and try 
> to upload the patch after I write a test case for this. 
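The pattern described above — readers taking the store's reentrant read lock before snapshotting the store-file list, while the discharger takes the write lock to swap it — can be sketched in plain Java. The class and method names below are illustrative only, not HBase's actual API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch: readers take the store's read lock before
// snapshotting the current file list, so a concurrent "discharger"
// (which takes the write lock to replace the list) cannot race them.
class StoreSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private List<String> storeFiles = new ArrayList<>(List.of("hfile-1", "hfile-2"));

  // Analogous to HStore.getScanners(): snapshot the files under the read lock.
  List<String> getScannersSnapshot() {
    lock.readLock().lock();
    try {
      return new ArrayList<>(storeFiles);   // copy while the list is stable
    } finally {
      lock.readLock().unlock();
    }
  }

  // Analogous to the compacted-file discharger: swap under the write lock.
  void replaceFiles(List<String> next) {
    lock.writeLock().lock();
    try {
      storeFiles = new ArrayList<>(next);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

Without the read lock, the snapshot could observe the list mid-swap and hand a null or discharged file to the new stream scanner, which matches the NPE in the stack trace above.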



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-20 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057017#comment-16057017
 ] 

ramkrishna.s.vasudevan commented on HBASE-17125:


I was following this discussion previously and lost track. I have seen this 
problem arising consistently. In all the different JIRAs related to this, the 
fix inside the core always had some side effects one way or the other.

For the Visibility labels and ACL comment from Anoop: I think if we remove the 
logic in VisibilityLabelFilter's filterKV to get the max number of versions, 
just add the new filter at the end in VisibilityController, and set 
scan.setMaxVersions(), all the visibility tests should still pass. Can you try 
it out once [~zghaobac]?

I think adding a javadoc and a new filter for the user, for the case where he 
really wants multiple versions to be returned even while filtering, means he 
can try the new filter provided (if it covers all the cases), while the 
core-side changes stay minimal and, most importantly, bug free.
IMHO - I am fine with adding a new filter and exposing it to those users who 
need this filter + version-specific usage.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch
>
>
> Assume a column's max versions is 3, and we write 4 versions of this column. 
> The oldest version is not removed immediately, but from the user's view the 
> oldest version is gone. When the user queries with a filter, if the filter 
> skips a newer version, the oldest version will be seen again. But after the 
> region is compacted, the oldest version will never be seen. So it is weird for 
> the user: the query gets inconsistent results before and after region 
> compaction.
> The reason is the matchColumn method of UserScanQueryMatcher. It first checks 
> the cell by filter, then checks the number of versions needed. So if the 
> filter skips the new version, the oldest version will be seen again while it 
> is not yet removed.
> Had a discussion offline with [~Apache9] and [~fenghh]; now we have two 
> solutions for this problem. The first idea is to check the number of versions 
> first, then check the cell by filter. As the javadoc of setFilter says, the 
> filter is called after all tests for ttl, column match, deletes and max 
> versions have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem: if a column's max versions is 5 and the 
> user's query only needs 3 versions, we first check the version count, then 
> check the cell by filter, so the number of cells in the result may be less 
> than 3 even though there are still 2 versions that were never read.
> So the second idea has three steps:
> 1. Check against the max versions of this column.
> 2. Check the kv by filter.
> 3. Check the number of versions the user needs.
> But this will make the ScanQueryMatcher more complicated, and it will break 
> the javadoc of Query.setFilter.
> We don't have a final solution for this problem yet. Suggestions are welcome.
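The ordering problem in matchColumn can be seen with a toy model in plain Java (this is not HBase's actual ScanQueryMatcher code). Cells are integers, newest first; `maxVersions` is the column's retention; a stale 4th version survives until compaction. Whether the filter runs before or after the version count changes what the user sees:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy model of the ordering bug: with cells [4, 3, 2, 1] (4 = newest,
// 1 = a stale extra version awaiting compaction), maxVersions = 3 and a
// filter that skips the newest version, filtering BEFORE counting lets
// the stale version 1 reach the user; counting before filtering does not.
class VersionFilterSketch {
  static List<Integer> scan(List<Integer> cells, int maxVersions,
                            Predicate<Integer> filterAccepts, boolean filterFirst) {
    List<Integer> out = new ArrayList<>();
    int seen = 0;                                   // versions counted so far
    for (Integer cell : cells) {
      if (filterFirst) {
        if (!filterAccepts.test(cell)) continue;    // filter before counting
        if (++seen > maxVersions) break;
        out.add(cell);
      } else {
        if (++seen > maxVersions) break;            // count before filtering
        if (filterAccepts.test(cell)) out.add(cell);
      }
    }
    return out;
  }
}
```

With the filter-first order the result includes the stale version 1; after compaction removes it, the same query returns fewer cells, which is exactly the before/after inconsistency described above.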



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18161) Incremental Load support for Multiple-Table HFileOutputFormat

2017-06-20 Thread Densel Santhmayor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Densel Santhmayor updated HBASE-18161:
--
Attachment: MultiHFileOutputFormatSupport_HBASE_18161_v8.patch

> Incremental Load support for Multiple-Table HFileOutputFormat
> -
>
> Key: HBASE-18161
> URL: https://issues.apache.org/jira/browse/HBASE-18161
> Project: HBase
>  Issue Type: New Feature
>Reporter: Densel Santhmayor
>Priority: Minor
> Attachments: MultiHFileOutputFormatSupport_HBASE_18161.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v2.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v3.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v4.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v5.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v6.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v7.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v8.patch
>
>
> h2. Introduction
> MapReduce currently supports the ability to write HBase records in bulk to 
> HFiles for a single table. The file(s) can then be loaded onto the relevant 
> RegionServers with reasonable latency. This feature is useful for making a 
> large set of data available for queries at the same time, and it provides a 
> way to efficiently process very large input into HBase without 
> affecting query latencies.
> There is, however, no support to write variations of the same record key to 
> HFiles belonging to multiple HBase tables from within the same MapReduce job. 
>  
> h2. Goal
> The goal of this JIRA is to extend HFileOutputFormat2 to support writing to 
> HFiles for different tables within the same MapReduce job while keeping 
> single-table HFile features backwards-compatible. 
> For our use case, we needed to write a record key to a smaller HBase table 
> for quicker access, and the same record key with a date appended to a larger 
> table for longer term storage with chronological access. Each of these tables 
> would have different TTL and other settings to support their respective 
> access patterns. We also needed to be able to bulk write records to multiple 
> tables with different subsets of very large input as efficiently as possible. 
> Rather than run the MapReduce job multiple times (one for each table or 
> record structure), it would be useful to be able to parse the input a single 
> time and write to multiple tables simultaneously.
> Additionally, we'd like to maintain backwards compatibility with the existing 
> heavily-used HFileOutputFormat2 interface to allow benefits such as locality 
> sensitivity (that was introduced long after we implemented support for 
> multiple tables) to support both single table and multi table hfile writes. 
> h2. Proposal
> * Backwards compatibility for existing single table support in 
> HFileOutputFormat2 will be maintained and in this case, mappers will need to 
> emit the table rowkey as before. However, a new class - 
> MultiHFileOutputFormat - will provide a helper function to generate a rowkey 
> for mappers that prefixes the desired tablename to the existing rowkey as 
> well as provides configureIncrementalLoad support for multiple tables.
> * HFileOutputFormat2 will be updated in the following way:
> ** configureIncrementalLoad will now accept multiple table descriptor and 
> region locator pairs, analogous to the single pair currently accepted by 
> HFileOutputFormat2. 
> ** Compression, block size, bloom type and data block settings per column 
> family that are set in the Configuration object will now be indexed and 
> retrieved by table name AND column family.
> ** getRegionStartKeys will now support multiple regionlocators and calculate 
> split points and therefore partitions collectively for all tables. Similarly, 
> now the eventual number of Reducers will be equal to the total number of 
> partitions across all tables. 
> ** The RecordWriter class will be able to process rowkeys either with or 
> without the tablename prepended depending on how configureIncrementalLoad was 
> configured with MultiHFileOutputFormat or HFileOutputFormat2.
> * The use of MultiHFileOutputFormat will write the output into HFiles which 
> will match the output format of HFileOutputFormat2. However, while the 
> default use case will keep the existing directory structure with column 
> family name as the directory and HFiles within that directory, in the case of 
> MultiHFileOutputFormat, it will output HFiles in the output directory with 
> the following relative paths: 
> {noformat}
>  --table1 
>--family1 
>  --HFiles 
>  --table2 
>--family1 
>--family2 
>  --HFiles
> {noformat}
> This aims to be a comprehensive solution to the original tickets - HBASE-3727 
> and HBASE-16261. Thanks to [~clayb] for his 
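The rowkey-prefix helper described in the proposal could look roughly like the sketch below. The class name, method names, and the separator byte are assumptions for illustration; the patch's actual MultiHFileOutputFormat API may differ:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative sketch of the proposed composite rowkey: the helper prefixes
// the table name (plus an assumed zero-byte separator) to the mapper's
// rowkey, and the record writer splits the composite key back apart to
// route the record to the right table's HFiles.
class CompositeKeySketch {
  static final byte SEP = 0;  // assumed delimiter, not the patch's actual choice

  static byte[] combine(String tableName, byte[] rowKey) {
    byte[] t = tableName.getBytes(StandardCharsets.UTF_8);
    byte[] out = new byte[t.length + 1 + rowKey.length];
    System.arraycopy(t, 0, out, 0, t.length);
    out[t.length] = SEP;
    System.arraycopy(rowKey, 0, out, t.length + 1, rowKey.length);
    return out;
  }

  static String tableOf(byte[] composite) {
    return new String(composite, 0, indexOfSep(composite), StandardCharsets.UTF_8);
  }

  static byte[] rowOf(byte[] composite) {
    int i = indexOfSep(composite);
    return Arrays.copyOfRange(composite, i + 1, composite.length);
  }

  private static int indexOfSep(byte[] b) {
    for (int i = 0; i < b.length; i++) if (b[i] == SEP) return i;
    throw new IllegalArgumentException("no separator in composite key");
  }
}
```

A record writer built this way can accept both plain rowkeys (single-table, backwards-compatible path) and prefixed ones, matching the proposal's two configureIncrementalLoad entry points.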

[jira] [Commented] (HBASE-18119) Improve HFile readability and modify ChecksumUtil log level

2017-06-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057002#comment-16057002
 ] 

Hadoop QA commented on HBASE-18119:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 14m 52s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 
6s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
56s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
21s {color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 47s 
{color} | {color:red} hbase-server in master has 12 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
58s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 51s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
52s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
33m 49s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 8s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 137m 59s 
{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
45s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 204m 20s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure |
| Timed out junit tests | org.apache.hadoop.hbase.backup.TestFullBackupSet |
|   | org.apache.hadoop.hbase.backup.TestFullRestore |
|   | org.apache.hadoop.hbase.backup.TestBackupDelete |
|   | org.apache.hadoop.hbase.backup.TestBackupStatusProgress |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.10.1 Server=1.10.1 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12873775/HBASE-18119-v4.patch |
| JIRA Issue | HBASE-18119 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 33e30d203234 3.19.0-25-generic #26~14.04.1-Ubuntu SMP Fri Jul 
24 21:16:20 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 5b485d1 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| findbugs | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7265/artifact/patchprocess/branch-findbugs-hbase-server-warnings.html
 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7265/artifact/patchprocess/patch-unit-hbase-server.txt
 |
| unit test logs |  
https://builds.apache.org/job/PreCommit-HBASE-Build/7265/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results | 

[jira] [Commented] (HBASE-18167) OfflineMetaRepair tool may cause HMaster abort always

2017-06-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056993#comment-16056993
 ] 

Hadoop QA commented on HBASE-18167:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 9s {color} 
| {color:red} HBASE-18167 does not apply to branch-1.3. Rebase required? Wrong 
Branch? See https://yetus.apache.org/documentation/0.3.0/precommit-patchnames 
for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12873798/HBASE-18167.branch-1.3.V2.patch
 |
| JIRA Issue | HBASE-18167 |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7270/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> OfflineMetaRepair tool may cause HMaster abort always
> -
>
> Key: HBASE-18167
> URL: https://issues.apache.org/jira/browse/HBASE-18167
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.4.0, 1.3.1, 1.3.2
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
> Fix For: 1.4.0, 1.3.2
>
> Attachments: HBASE-18167.branch-1.3.V2.patch, 
> HBASE-18167-branch-1.patch, HBASE-18167-branch-1-V2.patch
>
>
> In the production environment, we met a weird scenario where some meta table 
> HFile blocks were missing for some reason.
> To recover the environment we tried to rebuild meta using the 
> OfflineMetaRepair tool and restart the cluster, but HMaster couldn't finish 
> its initialization. It always timed out because the namespace table region was 
> never assigned.
> Steps to reproduce
> ==
> 1. Assign the meta table region to HMaster (it can be on any RS, just to 
> reproduce the scenario)
> {noformat}
> <property>
>   <name>hbase.balancer.tablesOnMaster</name>
>   <value>hbase:meta</value>
> </property>
> {noformat}
> 2. Start HMaster and RegionServer
> 3. Create two namespaces, say "ns1" & "ns2"
> 4. Create two tables, "ns1:t1" & "ns2:t1"
> 5. Flush 'hbase:meta'
> 6. Stop HMaster (graceful shutdown)
> 7. Kill -9 the RegionServer (abnormal shutdown)
> 8. Run OfflineMetaRepair as follows:
> {noformat}
>   hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair -fix
> {noformat}
> 9. Restart HMaster and RegionServer
> 10. HMaster will never be able to finish its initialization and will always 
> abort with the below message:
> {code}
> 2017-06-06 15:11:07,582 FATAL [Hostname:16000.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown.
> java.io.IOException: Timedout 12ms waiting for namespace table to be assigned
> at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:98)
> at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1054)
> at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:848)
> at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:199)
> at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1871)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Root cause
> ==
> 1. During HMaster start-up, the AM assumes it is a failover scenario based on 
> the existing old WAL files, so SSH/SCP will split the WAL files and assign the 
> regions they held.
> 2. During SSH/SCP it retrieves the regions held by the server from meta/the 
> AM's in-memory state, but meta only had the "regioninfo" entry (as it was 
> already rebuilt by OfflineMetaRepair). So an empty region list is returned and 
> no assignment is triggered.
> 3. HMaster, which is waiting for the namespace table to be assigned, will 
> always time out and abort.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18167) OfflineMetaRepair tool may cause HMaster abort always

2017-06-20 Thread Pankaj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Kumar updated HBASE-18167:
-
Attachment: (was: HBASE-18167.V2.branch-1.3.patch)

> OfflineMetaRepair tool may cause HMaster abort always
> -
>
> Key: HBASE-18167
> URL: https://issues.apache.org/jira/browse/HBASE-18167
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.4.0, 1.3.1, 1.3.2
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
> Fix For: 1.4.0, 1.3.2
>
> Attachments: HBASE-18167.branch-1.3.V2.patch, 
> HBASE-18167-branch-1.patch, HBASE-18167-branch-1-V2.patch
>
>
> In the production environment, we met a weird scenario where some meta table 
> HFile blocks were missing for some reason.
> To recover the environment we tried to rebuild meta using the 
> OfflineMetaRepair tool and restart the cluster, but HMaster couldn't finish 
> its initialization. It always timed out because the namespace table region was 
> never assigned.
> Steps to reproduce
> ==
> 1. Assign the meta table region to HMaster (it can be on any RS, just to 
> reproduce the scenario)
> {noformat}
> <property>
>   <name>hbase.balancer.tablesOnMaster</name>
>   <value>hbase:meta</value>
> </property>
> {noformat}
> 2. Start HMaster and RegionServer
> 3. Create two namespaces, say "ns1" & "ns2"
> 4. Create two tables, "ns1:t1" & "ns2:t1"
> 5. Flush 'hbase:meta'
> 6. Stop HMaster (graceful shutdown)
> 7. Kill -9 the RegionServer (abnormal shutdown)
> 8. Run OfflineMetaRepair as follows:
> {noformat}
>   hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair -fix
> {noformat}
> 9. Restart HMaster and RegionServer
> 10. HMaster will never be able to finish its initialization and will always 
> abort with the below message:
> {code}
> 2017-06-06 15:11:07,582 FATAL [Hostname:16000.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown.
> java.io.IOException: Timedout 12ms waiting for namespace table to be assigned
> at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:98)
> at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1054)
> at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:848)
> at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:199)
> at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1871)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Root cause
> ==
> 1. During HMaster start-up, the AM assumes it is a failover scenario based on 
> the existing old WAL files, so SSH/SCP will split the WAL files and assign the 
> regions they held.
> 2. During SSH/SCP it retrieves the regions held by the server from meta/the 
> AM's in-memory state, but meta only had the "regioninfo" entry (as it was 
> already rebuilt by OfflineMetaRepair). So an empty region list is returned and 
> no assignment is triggered.
> 3. HMaster, which is waiting for the namespace table to be assigned, will 
> always time out and abort.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18167) OfflineMetaRepair tool may cause HMaster abort always

2017-06-20 Thread Pankaj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Kumar updated HBASE-18167:
-
Attachment: HBASE-18167.branch-1.3.V2.patch

> OfflineMetaRepair tool may cause HMaster abort always
> -
>
> Key: HBASE-18167
> URL: https://issues.apache.org/jira/browse/HBASE-18167
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.4.0, 1.3.1, 1.3.2
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
> Fix For: 1.4.0, 1.3.2
>
> Attachments: HBASE-18167.branch-1.3.V2.patch, 
> HBASE-18167-branch-1.patch, HBASE-18167-branch-1-V2.patch, 
> HBASE-18167.V2.branch-1.3.patch
>
>
> In the production environment, we met a weird scenario where some meta table 
> HFile blocks were missing for some reason.
> To recover the environment we tried to rebuild meta using the 
> OfflineMetaRepair tool and restart the cluster, but HMaster couldn't finish 
> its initialization. It always timed out because the namespace table region was 
> never assigned.
> Steps to reproduce
> ==
> 1. Assign the meta table region to HMaster (it can be on any RS, just to 
> reproduce the scenario)
> {noformat}
> <property>
>   <name>hbase.balancer.tablesOnMaster</name>
>   <value>hbase:meta</value>
> </property>
> {noformat}
> 2. Start HMaster and RegionServer
> 3. Create two namespaces, say "ns1" & "ns2"
> 4. Create two tables, "ns1:t1" & "ns2:t1"
> 5. Flush 'hbase:meta'
> 6. Stop HMaster (graceful shutdown)
> 7. Kill -9 the RegionServer (abnormal shutdown)
> 8. Run OfflineMetaRepair as follows:
> {noformat}
>   hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair -fix
> {noformat}
> 9. Restart HMaster and RegionServer
> 10. HMaster will never be able to finish its initialization and will always 
> abort with the below message:
> {code}
> 2017-06-06 15:11:07,582 FATAL [Hostname:16000.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown.
> java.io.IOException: Timedout 12ms waiting for namespace table to be assigned
> at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:98)
> at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1054)
> at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:848)
> at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:199)
> at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1871)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Root cause
> ==
> 1. During HMaster start-up, the AM assumes it is a failover scenario based on 
> the existing old WAL files, so SSH/SCP will split the WAL files and assign the 
> regions they held.
> 2. During SSH/SCP it retrieves the regions held by the server from meta/the 
> AM's in-memory state, but meta only had the "regioninfo" entry (as it was 
> already rebuilt by OfflineMetaRepair). So an empty region list is returned and 
> no assignment is triggered.
> 3. HMaster, which is waiting for the namespace table to be assigned, will 
> always time out and abort.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HBASE-18250) Failed to move working directory snapshot { ss=_UPGRADING_TABLE_SYSTEM.MUTEX table=SYSTEM.MUTEX type=DISABLED }

2017-06-20 Thread vishal (JIRA)
vishal created HBASE-18250:
--

 Summary: Failed to move working directory snapshot { 
ss=_UPGRADING_TABLE_SYSTEM.MUTEX table=SYSTEM.MUTEX type=DISABLED }
 Key: HBASE-18250
 URL: https://issues.apache.org/jira/browse/HBASE-18250
 Project: HBase
  Issue Type: Bug
 Environment: HBase version : 1.2.6
Phoenix Version : 4.10.0-HBase-1.2.0
Reporter: vishal


While creating a schema or table I am getting this error:
Failed to move working directory snapshot { ss=_UPGRADING_TABLE_SYSTEM.MUTEX 
table=SYSTEM.MUTEX type=DISABLED }
Because of this error the schema or table is not created.

java-program:
--
{code:java}
Properties connectionProps = new Properties();
connectionProps.put("phoenix.schema.isNamespaceMappingEnabled", "true");
connectionProps.put("phoenix.schema.mapSystemTablesToNamespace", "true");
connection = 
DriverManager.getConnection("jdbc:phoenix:localhost",connectionProps);
statement = connection.createStatement();
statement.executeUpdate("CREATE SCHEMA MYSCHEMA");
{code}

hdfs-site.xml

{code:xml}
<property>
  <name>phoenix.schema.isNamespaceMappingEnabled</name>
  <value>true</value>
</property>
<property>
  <name>phoenix.schema.mapSystemTablesToNamespace</name>
  <value>true</value>
</property>
{code}

Please help me.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18180) Possible connection leak while closing BufferedMutator in TableOutputFormat

2017-06-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056984#comment-16056984
 ] 

Hudson commented on HBASE-18180:


SUCCESS: Integrated in Jenkins build HBase-1.3-JDK7 #185 (See 
[https://builds.apache.org/job/HBase-1.3-JDK7/185/])
HBASE-18180 Possible connection leak while closing BufferedMutator in (tedyu: 
rev 98cd0de41075961f728543617e9785573f813a1a)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/mapred/TableOutputFormat.java


> Possible connection leak while closing BufferedMutator in TableOutputFormat
> ---
>
> Key: HBASE-18180
> URL: https://issues.apache.org/jira/browse/HBASE-18180
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 1.4.0, 1.3.1, 1.3.2
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
> Fix For: 3.0.0, 1.4.0, 1.3.2, 2.0.0-alpha-2
>
> Attachments: HBASE-18180-branch-1.3.patch, 
> HBASE-18180-branch-1.patch, HBASE-18180.patch
>
>
> In TableOutputFormat, the connection will not be released when 
> "mutator.close()" throws an exception.
> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
> {code}
> public void close(TaskAttemptContext context)
> throws IOException {
>   mutator.close();
>   connection.close();
> }
> {code}
> org.apache.hadoop.hbase.mapred.TableOutputFormat
> {code}
> public void close(Reporter reporter) throws IOException {
>   this.m_mutator.close();
>   if (connection != null) {
> connection.close();
> connection = null;
>   }
> }
> {code}
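One shape such a fix can take is to close the connection in a finally block, so it is released even when mutator.close() throws. The sketch below models the pattern with plain java.io.Closeable; it is not the committed patch:

```java
import java.io.Closeable;
import java.io.IOException;

// Sketch of the leak and one fix shape: with plain sequential closes, an
// exception from the first close() skips the second resource entirely.
// Closing the mutator in try and the connection in finally releases both.
class CloseSketch {
  static boolean connectionClosed;  // observable effect for the demo

  static void close(Closeable mutator, Closeable connection) throws IOException {
    try {
      mutator.close();          // may throw, e.g. on a failed flush
    } finally {
      connection.close();       // still runs if mutator.close() threw
    }
  }
}
```

The same effect can be had with try-with-resources (declaring the connection before the mutator), which additionally records the second exception as suppressed instead of losing it.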



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18180) Possible connection leak while closing BufferedMutator in TableOutputFormat

2017-06-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056982#comment-16056982
 ] 

Hudson commented on HBASE-18180:


SUCCESS: Integrated in Jenkins build HBase-1.3-JDK8 #199 (See 
[https://builds.apache.org/job/HBase-1.3-JDK8/199/])
HBASE-18180 Possible connection leak while closing BufferedMutator in (tedyu: 
rev 98cd0de41075961f728543617e9785573f813a1a)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/mapred/TableOutputFormat.java


> Possible connection leak while closing BufferedMutator in TableOutputFormat
> ---
>
> Key: HBASE-18180
> URL: https://issues.apache.org/jira/browse/HBASE-18180
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 1.4.0, 1.3.1, 1.3.2
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
> Fix For: 3.0.0, 1.4.0, 1.3.2, 2.0.0-alpha-2
>
> Attachments: HBASE-18180-branch-1.3.patch, 
> HBASE-18180-branch-1.patch, HBASE-18180.patch
>
>
> In TableOutputFormat, the connection will not be released when 
> "mutator.close()" throws an exception.
> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
> {code}
> public void close(TaskAttemptContext context)
> throws IOException {
>   mutator.close();
>   connection.close();
> }
> {code}
> org.apache.hadoop.hbase.mapred.TableOutputFormat
> {code}
> public void close(Reporter reporter) throws IOException {
>   this.m_mutator.close();
>   if (connection != null) {
> connection.close();
> connection = null;
>   }
> }
> {code}





[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-20 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056975#comment-16056975
 ] 

Anoop Sam John commented on HBASE-17125:


Agree, Duo. We say in the javadoc when we will call the filter, but if we can 
solve this issue in our own code, that is great. Your concern about added 
complexity is also valid. My request would be: let's try that approach as well 
(I don't mind who does it :-)) and see how complex it is and whether there is 
any perf impact.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch
>
>
> Assume a column's max versions is 3 and we write 4 versions of this column. 
> The oldest version is not removed immediately, but from the user's view it 
> is gone. When a user queries with a filter, if the filter skips a newer 
> version, the oldest version is seen again; after the region is compacted, 
> the oldest version is never seen. This is confusing for the user: the query 
> gets inconsistent results before and after region compaction.
> The reason is the matchColumn method of UserScanQueryMatcher. It first 
> checks the cell against the filter, then checks the number of versions 
> needed. So if the filter skips a newer version, the oldest version is seen 
> again as long as it has not been removed.
> After an offline discussion with [~Apache9] and [~fenghh], we have two 
> solutions for this problem. The first idea is to check the number of 
> versions first, then check the cell against the filter. As the javadoc of 
> setFilter says, the filter is called after all tests for ttl, column match, 
> deletes and max versions have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem: if a column's max versions is 5 and the 
> user query needs only 3 versions, checking the version count first and the 
> filter second means the result may contain fewer than 3 cells, even though 
> 2 of the allowed versions are never read.
> So the second idea has three steps:
> 1. check against the max versions of this column
> 2. check the kv against the filter
> 3. check the number of versions the user needs
> But this makes the ScanQueryMatcher more complicated, and it still breaks 
> the javadoc of Query.setFilter.
> We don't have a final solution for this problem yet. Suggestions are welcome.
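The two matching orders discussed above can be sketched with a toy model (illustrative only; the real logic lives in UserScanQueryMatcher and is more involved). Versions are newest-first, max versions is 3 so "v1" survives only until compaction, and the filter skips the newest version "v4":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy model of the two matching orders. Cells are version strings, newest first.
class MatcherOrder {
    // Current (buggy) order: apply the filter first, then take up to
    // maxVersions of the survivors -- this can resurface a dead version.
    static List<String> filterThenCount(List<String> cells, Predicate<String> filter, int maxVersions) {
        List<String> out = new ArrayList<>();
        for (String c : cells) {
            if (filter.test(c) && out.size() < maxVersions) {
                out.add(c);
            }
        }
        return out;
    }

    // First proposed order: honor maxVersions first, then apply the filter.
    static List<String> countThenFilter(List<String> cells, Predicate<String> filter, int maxVersions) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(cells.size(), maxVersions); i++) {
            if (filter.test(cells.get(i))) {
                out.add(cells.get(i));
            }
        }
        return out;
    }
}
```

With `["v4","v3","v2","v1"]` before compaction and a filter skipping "v4", filter-then-count returns the dead "v1" before compaction but not after, while count-then-filter gives the same answer in both cases.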





[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-20 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056973#comment-16056973
 ] 

Anoop Sam John commented on HBASE-17125:


Ted gave 2 reasons why I fear the approach of asking users to use a new 
Filter. I have not fully checked his suggestion with respect to the code and 
possible side effects, but IMHO we should try it out. If we can do it without 
much perf penalty, why not! If the code itself can handle this, that is always 
better, and users will be much happier than if they have to remember and use 
another filter. Let's be open to all possible ways; my request would be to 
please look at the possible ways in which we can solve it on our own. All 
agree here that the present situation is a bug and really bad behavior from 
the system: we never give back deterministic results (it all depends on other 
factors, like whether compaction has run).

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch
>
>
> Assume a column's max versions is 3 and we write 4 versions of this column. 
> The oldest version is not removed immediately, but from the user's view it 
> is gone. When a user queries with a filter, if the filter skips a newer 
> version, the oldest version is seen again; after the region is compacted, 
> the oldest version is never seen. This is confusing for the user: the query 
> gets inconsistent results before and after region compaction.
> The reason is the matchColumn method of UserScanQueryMatcher. It first 
> checks the cell against the filter, then checks the number of versions 
> needed. So if the filter skips a newer version, the oldest version is seen 
> again as long as it has not been removed.
> After an offline discussion with [~Apache9] and [~fenghh], we have two 
> solutions for this problem. The first idea is to check the number of 
> versions first, then check the cell against the filter. As the javadoc of 
> setFilter says, the filter is called after all tests for ttl, column match, 
> deletes and max versions have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem: if a column's max versions is 5 and the 
> user query needs only 3 versions, checking the version count first and the 
> filter second means the result may contain fewer than 3 cells, even though 
> 2 of the allowed versions are never read.
> So the second idea has three steps:
> 1. check against the max versions of this column
> 2. check the kv against the filter
> 3. check the number of versions the user needs
> But this makes the ScanQueryMatcher more complicated, and it still breaks 
> the javadoc of Query.setFilter.
> We don't have a final solution for this problem yet. Suggestions are welcome.





[jira] [Updated] (HBASE-18226) Disable reverse DNS lookup at HMaster and use the hostname provided by RegionServer

2017-06-20 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-18226:
---
Issue Type: New Feature  (was: Bug)

> Disable reverse DNS lookup at HMaster and use the hostname provided by 
> RegionServer
> ---
>
> Key: HBASE-18226
> URL: https://issues.apache.org/jira/browse/HBASE-18226
> Project: HBase
>  Issue Type: New Feature
>Reporter: Duo Xu
>Assignee: Duo Xu
> Attachments: HBASE-18226.001.patch, HBASE-18226.002.patch, 
> HBASE-18226.003.patch, HBASE-18226.004.patch, HBASE-18226.005.patch, 
> HBASE-18226.006.patch
>
>
> Description updated:
> In some unusual network environments, forward DNS lookup works while reverse 
> DNS lookup may not work properly.
> This JIRA proposes that HMaster use the hostname passed from the RS, instead 
> of doing a reverse DNS lookup, to tell the RS which hostname to use during 
> reportForDuty(). This has already been implemented by HBASE-12954, which 
> added the "useThisHostnameInstead" field to RegionServerStatusProtos.
> Currently "useThisHostnameInstead" is optional, and by default the RS passes 
> only its port, server start code and current server time to HMaster during 
> reportForDuty(). To use this field, users currently need to specify 
> "hbase.regionserver.hostname" in every regionserver node's hbase-site.xml. 
> This causes trouble in:
> 1. deployments managed by management tools like Ambari, which maintain the 
> same copy of hbase-site.xml across all nodes;
> 2. HBASE-12954 targets multihomed hosts, where users want to set the 
> hostname manually for each node. In the other (non-multihomed) cases, I just 
> want the RS to use the hostname returned by the node, set it in 
> useThisHostnameInstead, and pass it to HMaster during reportForDuty().
> I would like to introduce a setting such that, when it is set to true, 
> "useThisHostnameInstead" is set to the hostname the RS gets from the node. 
> HMaster then skips the reverse DNS lookup because it sees the 
> "useThisHostnameInstead" field set in the request.
> Is "hbase.regionserver.hostname.reported.to.master" a good name?
> 
> Regarding the hostname returned by the RS node, I read the source code again 
> (including hadoop-common dns.java). By default the RS gets its hostname by 
> calling InetAddress.getLocalHost().getCanonicalHostName(). If users specify 
> "hbase.regionserver.dns.interface" or "hbase.regionserver.dns.nameserver", 
> or some underlying system configuration changes (e.g. modifying 
> /etc/nsswitch.conf), it may read from DNS or other sources before checking 
> the /etc/hosts file.
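For illustration, a minimal sketch of the default RS-side hostname source mentioned above (the exact wiring of useThisHostnameInstead in the patch may differ):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Sketch: absent hbase.regionserver.dns.* overrides, the RS-side name comes
// from the JDK's local-host lookup rather than a reverse DNS query done by
// HMaster. This is what the RS could report in useThisHostnameInstead.
class LocalHostname {
    static String reportableHostname() {
        try {
            return InetAddress.getLocalHost().getCanonicalHostName();
        } catch (UnknownHostException e) {
            // Fall back to a loopback name if the host cannot resolve itself.
            return "localhost";
        }
    }
}
```

Note that getCanonicalHostName() itself consults the platform resolver order (e.g. /etc/nsswitch.conf), which is the caveat raised at the end of the description.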





[jira] [Commented] (HBASE-18226) Disable reverse DNS lookup at HMaster and use the hostname provided by RegionServer

2017-06-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056965#comment-16056965
 ] 

Ted Yu commented on HBASE-18226:


TestMasterProcedureWalLease is flaky.

+1 from me.

> Disable reverse DNS lookup at HMaster and use the hostname provided by 
> RegionServer
> ---
>
> Key: HBASE-18226
> URL: https://issues.apache.org/jira/browse/HBASE-18226
> Project: HBase
>  Issue Type: Bug
>Reporter: Duo Xu
>Assignee: Duo Xu
> Attachments: HBASE-18226.001.patch, HBASE-18226.002.patch, 
> HBASE-18226.003.patch, HBASE-18226.004.patch, HBASE-18226.005.patch, 
> HBASE-18226.006.patch
>
>
> Description updated:
> In some unusual network environments, forward DNS lookup works while reverse 
> DNS lookup may not work properly.
> This JIRA proposes that HMaster use the hostname passed from the RS, instead 
> of doing a reverse DNS lookup, to tell the RS which hostname to use during 
> reportForDuty(). This has already been implemented by HBASE-12954, which 
> added the "useThisHostnameInstead" field to RegionServerStatusProtos.
> Currently "useThisHostnameInstead" is optional, and by default the RS passes 
> only its port, server start code and current server time to HMaster during 
> reportForDuty(). To use this field, users currently need to specify 
> "hbase.regionserver.hostname" in every regionserver node's hbase-site.xml. 
> This causes trouble in:
> 1. deployments managed by management tools like Ambari, which maintain the 
> same copy of hbase-site.xml across all nodes;
> 2. HBASE-12954 targets multihomed hosts, where users want to set the 
> hostname manually for each node. In the other (non-multihomed) cases, I just 
> want the RS to use the hostname returned by the node, set it in 
> useThisHostnameInstead, and pass it to HMaster during reportForDuty().
> I would like to introduce a setting such that, when it is set to true, 
> "useThisHostnameInstead" is set to the hostname the RS gets from the node. 
> HMaster then skips the reverse DNS lookup because it sees the 
> "useThisHostnameInstead" field set in the request.
> Is "hbase.regionserver.hostname.reported.to.master" a good name?
> 
> Regarding the hostname returned by the RS node, I read the source code again 
> (including hadoop-common dns.java). By default the RS gets its hostname by 
> calling InetAddress.getLocalHost().getCanonicalHostName(). If users specify 
> "hbase.regionserver.dns.interface" or "hbase.regionserver.dns.nameserver", 
> or some underlying system configuration changes (e.g. modifying 
> /etc/nsswitch.conf), it may read from DNS or other sources before checking 
> the /etc/hosts file.





[jira] [Updated] (HBASE-18226) Disable reverse DNS lookup at HMaster and use the hostname provided by RegionServer

2017-06-20 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-18226:
---
Summary: Disable reverse DNS lookup at HMaster and use the hostname 
provided by RegionServer  (was: Disable reverse DNS lookup at HMaster and use 
default hostname provided by RegionServer)

> Disable reverse DNS lookup at HMaster and use the hostname provided by 
> RegionServer
> ---
>
> Key: HBASE-18226
> URL: https://issues.apache.org/jira/browse/HBASE-18226
> Project: HBase
>  Issue Type: Bug
>Reporter: Duo Xu
>Assignee: Duo Xu
> Attachments: HBASE-18226.001.patch, HBASE-18226.002.patch, 
> HBASE-18226.003.patch, HBASE-18226.004.patch, HBASE-18226.005.patch, 
> HBASE-18226.006.patch
>
>
> Description updated:
> In some unusual network environments, forward DNS lookup works while reverse 
> DNS lookup may not work properly.
> This JIRA proposes that HMaster use the hostname passed from the RS, instead 
> of doing a reverse DNS lookup, to tell the RS which hostname to use during 
> reportForDuty(). This has already been implemented by HBASE-12954, which 
> added the "useThisHostnameInstead" field to RegionServerStatusProtos.
> Currently "useThisHostnameInstead" is optional, and by default the RS passes 
> only its port, server start code and current server time to HMaster during 
> reportForDuty(). To use this field, users currently need to specify 
> "hbase.regionserver.hostname" in every regionserver node's hbase-site.xml. 
> This causes trouble in:
> 1. deployments managed by management tools like Ambari, which maintain the 
> same copy of hbase-site.xml across all nodes;
> 2. HBASE-12954 targets multihomed hosts, where users want to set the 
> hostname manually for each node. In the other (non-multihomed) cases, I just 
> want the RS to use the hostname returned by the node, set it in 
> useThisHostnameInstead, and pass it to HMaster during reportForDuty().
> I would like to introduce a setting such that, when it is set to true, 
> "useThisHostnameInstead" is set to the hostname the RS gets from the node. 
> HMaster then skips the reverse DNS lookup because it sees the 
> "useThisHostnameInstead" field set in the request.
> Is "hbase.regionserver.hostname.reported.to.master" a good name?
> 
> Regarding the hostname returned by the RS node, I read the source code again 
> (including hadoop-common dns.java). By default the RS gets its hostname by 
> calling InetAddress.getLocalHost().getCanonicalHostName(). If users specify 
> "hbase.regionserver.dns.interface" or "hbase.regionserver.dns.nameserver", 
> or some underlying system configuration changes (e.g. modifying 
> /etc/nsswitch.conf), it may read from DNS or other sources before checking 
> the /etc/hosts file.





[jira] [Commented] (HBASE-18248) Warn if monitored task has been tied up beyond a configurable threshold

2017-06-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056962#comment-16056962
 ] 

Hadoop QA commented on HBASE-18248:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
26s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
46s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 47s 
{color} | {color:red} hbase-server in master has 12 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
28m 49s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 121m 49s 
{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 165m 11s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hbase.master.procedure.TestMasterProcedureWalLease |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12873770/HBASE-18248.patch |
| JIRA Issue | HBASE-18248 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux a3b92bd6f800 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh
 |
| git revision | master / 5b485d1 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| findbugs | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7264/artifact/patchprocess/branch-findbugs-hbase-server-warnings.html
 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7264/artifact/patchprocess/patch-unit-hbase-server.txt
 |
| unit test logs |  
https://builds.apache.org/job/PreCommit-HBASE-Build/7264/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7264/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7264/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.

[jira] [Commented] (HBASE-18226) Disable reverse DNS lookup at HMaster and use default hostname provided by RegionServer

2017-06-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056961#comment-16056961
 ] 

Hadoop QA commented on HBASE-18226:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 12m 14s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 23s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
15s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 9s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
1s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
36s {color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 31s 
{color} | {color:red} hbase-common in master has 2 extant Findbugs warnings. 
{color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 32s 
{color} | {color:red} hbase-common in master has 2 extant Findbugs warnings. 
{color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 27s 
{color} | {color:red} hbase-server in master has 12 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
59s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 15s 
{color} | {color:green} hbase-common generated 0 new + 26 unchanged - 26 fixed 
= 26 total (was 52) {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s 
{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
29m 1s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
28s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 5s 
{color} | {color:green} hbase-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 121m 37s 
{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
30s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 184m 14s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hbase.master.procedure.TestMasterProcedureWalLease |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12873763/HBASE-18226.006.patch 
|
| JIRA Issue | HBASE-18226 |
| Optional Tests |  asflicense  javac  

[jira] [Commented] (HBASE-18167) OfflineMetaRepair tool may cause HMaster abort always

2017-06-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056957#comment-16056957
 ] 

Ted Yu commented on HBASE-18167:


Please name your patch: HBASE-18167.branch-1.3.V2.patch or simply 
HBASE-18167.branch-1.3.patch

> OfflineMetaRepair tool may cause HMaster abort always
> -
>
> Key: HBASE-18167
> URL: https://issues.apache.org/jira/browse/HBASE-18167
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.4.0, 1.3.1, 1.3.2
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
> Fix For: 1.4.0, 1.3.2
>
> Attachments: HBASE-18167-branch-1.patch, 
> HBASE-18167-branch-1-V2.patch, HBASE-18167.V2.branch-1.3.patch
>
>
> In a production environment, we hit a weird scenario where some meta table 
> HFile blocks were missing for some reason.
> To recover the environment we rebuilt meta using the OfflineMetaRepair tool 
> and restarted the cluster, but HMaster could not finish its initialization. 
> It always timed out because the namespace table region was never assigned.
> Steps to reproduce
> ==
> 1. Assign the meta table region to HMaster (it can be on any RS; this is 
> just to reproduce the scenario)
> {noformat}
> <property>
>   <name>hbase.balancer.tablesOnMaster</name>
>   <value>hbase:meta</value>
> </property>
> {noformat}
> 2. Start HMaster and RegionServer
> 3. Create two namespaces, say "ns1" & "ns2"
> 4. Create two tables, "ns1:t1" & "ns2:t1"
> 5. Flush 'hbase:meta'
> 6. Stop HMaster (graceful shutdown)
> 7. Kill -9 the RegionServer (abnormal shutdown)
> 8. Run OfflineMetaRepair as follows:
> {noformat}
>   hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair -fix
> {noformat}
> 9. Restart HMaster and RegionServer
> 10. HMaster will never finish its initialization and always aborts with the 
> message below:
> {code}
> 2017-06-06 15:11:07,582 FATAL [Hostname:16000.activeMasterManager] 
> master.HMaster: Unhandled exception. Starting shutdown.
> java.io.IOException: Timedout 12ms waiting for namespace table to be 
> assigned
> at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:98)
> at 
> org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1054)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:848)
> at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:199)
> at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1871)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Root cause
> ==
> 1. During HMaster startup the AM assumes a failover scenario based on the 
> existing old WAL files, so SSH/SCP will split the WAL files and assign the 
> regions the dead server was holding.
> 2. During SSH/SCP it retrieves the regions held by the server from meta/AM's 
> in-memory state, but meta only had the "regioninfo" entry (it was already 
> rebuilt by OfflineMetaRepair). So an empty region list is returned, and no 
> assignment is triggered.
> 3. HMaster, which is waiting for the namespace table to be assigned, times 
> out and always aborts.
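The root cause above can be modelled with a trivial sketch (a toy stand-in, not the real AssignmentManager/SSH code): with an empty region list from the rebuilt meta, zero assignments are scheduled, so the namespace-table wait can only end in the timeout shown in the log.

```java
import java.util.List;

// Toy model: SSH/SCP enqueues one assign per region found in meta for the
// dead server; after OfflineMetaRepair that list is empty.
class FailoverSketch {
    static int assignmentsScheduled(List<String> regionsOnDeadServer) {
        // Real code would enqueue an assign procedure per region here.
        return regionsOnDeadServer.size();
    }

    // The namespace region can only come online if some assignment was scheduled.
    static boolean namespaceCanBeAssigned(List<String> regionsOnDeadServer) {
        return assignmentsScheduled(regionsOnDeadServer) > 0;
    }
}
```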





[jira] [Commented] (HBASE-18167) OfflineMetaRepair tool may cause HMaster abort always

2017-06-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056947#comment-16056947
 ] 

Hadoop QA commented on HBASE-18167:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 2m 40s {color} 
| {color:red} HBASE-18167 does not apply to branch-1.3. Rebase required? Wrong 
Branch? See https://yetus.apache.org/documentation/0.3.0/precommit-patchnames 
for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12873792/HBASE-18167.V2.branch-1.3.patch
 |
| JIRA Issue | HBASE-18167 |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7269/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> OfflineMetaRepair tool may cause HMaster abort always
> -
>
> Key: HBASE-18167
> URL: https://issues.apache.org/jira/browse/HBASE-18167
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.4.0, 1.3.1, 1.3.2
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
> Fix For: 1.4.0, 1.3.2
>
> Attachments: HBASE-18167-branch-1.patch, 
> HBASE-18167-branch-1-V2.patch, HBASE-18167.V2.branch-1.3.patch
>
>
> In a production environment, we hit a weird scenario where some meta table 
> HFile blocks were missing for some reason.
> To recover the environment we rebuilt meta using the OfflineMetaRepair tool 
> and restarted the cluster, but HMaster could not finish its initialization. 
> It always timed out because the namespace table region was never assigned.
> Steps to reproduce
> ==
> 1. Assign the meta table region to HMaster (it can be on any RS; this is 
> just to reproduce the scenario)
> {noformat}
> <property>
>   <name>hbase.balancer.tablesOnMaster</name>
>   <value>hbase:meta</value>
> </property>
> {noformat}
> 2. Start HMaster and RegionServer
> 3. Create two namespaces, say "ns1" & "ns2"
> 4. Create two tables, "ns1:t1" & "ns2:t1"
> 5. Flush 'hbase:meta'
> 6. Stop HMaster (graceful shutdown)
> 7. Kill -9 the RegionServer (abnormal shutdown)
> 8. Run OfflineMetaRepair as follows:
> {noformat}
>   hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair -fix
> {noformat}
> 9. Restart HMaster and RegionServer
> 10. HMaster will never finish its initialization and always aborts with the 
> message below:
> {code}
> 2017-06-06 15:11:07,582 FATAL [Hostname:16000.activeMasterManager] 
> master.HMaster: Unhandled exception. Starting shutdown.
> java.io.IOException: Timedout 12ms waiting for namespace table to be 
> assigned
> at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:98)
> at 
> org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1054)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:848)
> at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:199)
> at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1871)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Root cause
> ==
> 1. During HMaster startup, the AM assumes it is a failover scenario based on 
> the existing old WAL files, so SSH/SCP will split the WAL files and assign 
> the regions they hold.
> 2. During SSH/SCP it retrieves the regions held by each server from meta/the 
> AM's in-memory state, but meta only had "regioninfo" entries (as it was 
> already rebuilt by OfflineMetaRepair). So an empty region list is returned 
> and no assignment is triggered.
> 3. HMaster, which is waiting for the namespace table to be assigned, will 
> time out and always abort.
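The timeout described in the root cause can be reduced to a simple timed wait. The sketch below uses hypothetical names (the real TableNamespaceManager.start blocks on assignment of the namespace region through HBase internals): if nothing ever assigns the region, the wait deterministically ends in an IOException and master shutdown.

```java
public class NamespaceWaitDemo {
    // Hypothetical stand-in for TableNamespaceManager's wait: poll until the
    // namespace region is assigned or the timeout elapses.
    static void waitForNamespaceAssigned(java.util.function.BooleanSupplier assigned,
                                         long timeoutMs) throws java.io.IOException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (assigned.getAsBoolean()) {
                return;  // region assigned: initialization can proceed
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new java.io.IOException("interrupted", e);
            }
        }
        throw new java.io.IOException(
            "Timedout " + timeoutMs + "ms waiting for namespace table to be assigned");
    }

    public static void main(String[] args) throws Exception {
        try {
            // SSH/SCP got an empty region list from the rebuilt meta, so the
            // namespace region is never assigned: "assigned" is always false.
            waitForNamespaceAssigned(() -> false, 50);
        } catch (java.io.IOException e) {
            System.out.println("FATAL: " + e.getMessage());
        }
    }
}
```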



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18160) Fix incorrect logic in FilterList.filterKeyValue

2017-06-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056941#comment-16056941
 ] 

Ted Yu commented on HBASE-18160:


I already gave +1 above.

> Fix incorrect logic in FilterList.filterKeyValue
> -
>
> Key: HBASE-18160
> URL: https://issues.apache.org/jira/browse/HBASE-18160
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Zheng Hu
> Attachments: HBASE-18160.branch-1.1.v1.patch, 
> HBASE-18160.branch-1.v1.patch, HBASE-18160.v1.patch, HBASE-18160.v2.patch, 
> HBASE-18160.v2.patch
>
>
> As HBASE-17678 said, there are two problems in the FilterList.filterKeyValue 
> implementation:
> 1. FilterList did not consider the INCLUDE_AND_SEEK_NEXT_ROW case (it seems 
> INCLUDE_AND_SEEK_NEXT_ROW is a newly added case, and the dev forgot to 
> consider FilterList). So if a user returns INCLUDE_AND_SEEK_NEXT_ROW from his 
> own Filter wrapped by a FilterList, it'll throw an 
> IllegalStateException("Received code is not valid.").
> 2. For a FilterList with MUST_PASS_ONE, if filter-A in the list returns 
> INCLUDE and filter-B returns INCLUDE_AND_NEXT_COL, the FilterList will 
> finally return INCLUDE_AND_NEXT_COL. According to the minimal step rule, 
> this is incorrect (a filter list with MUST_PASS_ONE should choose the 
> minimal step among the filters in the list; let's call it the Minimal Step 
> Rule).
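The minimal step rule described above can be sketched as follows. This is a toy model, not the real HBase code: the enum is a simplified stand-in for Filter.ReturnCode (the real enum has more values such as SKIP and NEXT_ROW), ordered so that a smaller ordinal means a smaller seek step.

```java
import java.util.Arrays;
import java.util.List;

public class MinimalStepRule {
    // Simplified stand-in for Filter.ReturnCode, ordered from the smallest
    // seek step (INCLUDE) to the largest (INCLUDE_AND_SEEK_NEXT_ROW).
    enum ReturnCode { INCLUDE, INCLUDE_AND_NEXT_COL, INCLUDE_AND_SEEK_NEXT_ROW }

    // For MUST_PASS_ONE, pick the minimal step among the codes returned by
    // the filters in the list, instead of whichever code happened to come last.
    static ReturnCode mergeMustPassOne(List<ReturnCode> codes) {
        ReturnCode min = codes.get(0);
        for (ReturnCode rc : codes) {
            if (rc.ordinal() < min.ordinal()) {
                min = rc;
            }
        }
        return min;
    }

    public static void main(String[] args) {
        // filter-A returns INCLUDE, filter-B returns INCLUDE_AND_NEXT_COL:
        // the combined list must return INCLUDE, the smaller step.
        System.out.println(mergeMustPassOne(
                Arrays.asList(ReturnCode.INCLUDE, ReturnCode.INCLUDE_AND_NEXT_COL)));
    }
}
```

Taking the larger step (INCLUDE_AND_NEXT_COL) would skip cells that filter-A still wanted to see, which is exactly the bug described in point 2.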



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18167) OfflineMetaRepair tool may cause HMaster abort always

2017-06-20 Thread Pankaj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Kumar updated HBASE-18167:
-
Attachment: HBASE-18167.V2.branch-1.3.patch

Reattaching branch-1.3 patch with proper name. 

> OfflineMetaRepair tool may cause HMaster abort always
> -
>
> Key: HBASE-18167
> URL: https://issues.apache.org/jira/browse/HBASE-18167
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.4.0, 1.3.1, 1.3.2
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
> Fix For: 1.4.0, 1.3.2
>
> Attachments: HBASE-18167-branch-1.patch, 
> HBASE-18167-branch-1-V2.patch, HBASE-18167.V2.branch-1.3.patch
>
>
> In the production environment, we hit a weird scenario where some meta table 
> HFile blocks were missing for some reason.
> To recover the environment we tried to rebuild the meta using the 
> OfflineMetaRepair tool and restart the cluster, but HMaster couldn't finish 
> its initialization. It always timed out because the namespace table region 
> was never assigned.
> Steps to reproduce
> ==
> 1. Assign the meta table region to HMaster (it can be on any RS, just to 
> reproduce the scenario):
> {noformat}
> <property>
>   <name>hbase.balancer.tablesOnMaster</name>
>   <value>hbase:meta</value>
> </property>
> {noformat}
> 2. Start HMaster and RegionServer
> 3. Create two namespaces, say "ns1" & "ns2"
> 4. Create two tables, "ns1:t1" & "ns2:t1"
> 5. Flush 'hbase:meta'
> 6. Stop HMaster (graceful shutdown)
> 7. Kill -9 the RegionServer (abnormal shutdown)
> 8. Run OfflineMetaRepair as follows:
> {noformat}
>   hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair -fix
> {noformat}
> 9. Restart HMaster and RegionServer
> 10. HMaster will never be able to finish its initialization and will always 
> abort with the message below:
> {code}
> 2017-06-06 15:11:07,582 FATAL [Hostname:16000.activeMasterManager] 
> master.HMaster: Unhandled exception. Starting shutdown.
> java.io.IOException: Timedout 12ms waiting for namespace table to be 
> assigned
> at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:98)
> at 
> org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1054)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:848)
> at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:199)
> at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1871)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Root cause
> ==
> 1. During HMaster startup, the AM assumes it is a failover scenario based on 
> the existing old WAL files, so SSH/SCP will split the WAL files and assign 
> the regions they hold.
> 2. During SSH/SCP it retrieves the regions held by each server from meta/the 
> AM's in-memory state, but meta only had "regioninfo" entries (as it was 
> already rebuilt by OfflineMetaRepair). So an empty region list is returned 
> and no assignment is triggered.
> 3. HMaster, which is waiting for the namespace table to be assigned, will 
> time out and always abort.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-20 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056932#comment-16056932
 ] 

Duo Zhang commented on HBASE-17125:
---

And I need to say again, we have never broken the javadoc of setMaxVersions. If 
you do not use a filter, the behavior is correct. If you use a filter, then 
this is what the javadoc of setFilter says: the filter is tested last.

{quote}
  /**
   * Apply the specified server-side filter when performing the Query.
   * Only {@link Filter#filterKeyValue(org.apache.hadoop.hbase.Cell)} is called 
AFTER all tests
   * for ttl, column match, deletes and max versions have been run.
   * @param filter filter to run on the server
   * @return this for invocation chaining
   */
{quote}

Filters always introduce behaviors which are not intuitive because they are too 
flexible. For example, PageFilter may still return more rows than configured. 
You need to know the details of HBase if you want to use filters correctly. 
That's why I want to fix the problem in a simpler way rather than a complicated 
one. It does not make any big difference to users: normal users just do not 
care, and advanced users must know the implementation details.

Thanks.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch
>
>
> Assume a column's max versions is 3, and we write 4 versions of this column. 
> The oldest version isn't removed immediately, but from the user's view it is 
> gone. When a user queries with a filter, if the filter skips a new version, 
> the oldest version will be seen again. After the region is compacted, 
> however, the oldest version will never be seen. This is weird for the user: 
> the query gets inconsistent results before and after region compaction.
> The reason is the matchColumn method of UserScanQueryMatcher. It first checks 
> the cell against the filter, then checks the number of versions needed. So if 
> the filter skips a new version, the oldest version will be seen again while 
> it has not yet been removed.
> After an offline discussion with [~Apache9] and [~fenghh], we now have two 
> solutions for this problem. The first idea is to check the number of versions 
> first, then check the cell against the filter. As the comment of setFilter 
> says, the filter is called after all tests for ttl, column match, deletes and 
> max versions have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem: if a column's max versions is 5 and the 
> user's query needs only 3 versions, we first check the version count, then 
> check the cell against the filter. So the result may contain fewer than 3 
> cells, even though 2 more versions exist that are never read.
> So the second idea has three steps:
> 1. check against the max versions of this column
> 2. check the kv against the filter
> 3. check the number of versions the user needs
> But this will make the ScanQueryMatcher more complicated, and it will break 
> the javadoc of Query.setFilter.
> We don't have a final solution for this problem yet. Suggestions are welcome.
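The ordering problem being debated can be shown with a toy model. The code below is not the real UserScanQueryMatcher: cells are just version numbers (newest first) and the "filter" is a predicate, but it reproduces the two behaviors discussed above, with 4 writes to a column whose max versions is 3 and a filter that skips the newest version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class MatchOrderDemo {
    // Cells for one column, newest first: version 1 was the 4th-newest write,
    // so with max versions = 3 the user should logically never see it again.
    static final int[] CELLS = {4, 3, 2, 1};

    // Filter-first (the current matchColumn order): the stale version 1
    // becomes visible again because the filter skipped version 4.
    static List<Integer> filterFirst(IntPredicate filter, int maxVersions) {
        List<Integer> out = new ArrayList<>();
        int kept = 0;
        for (int v : CELLS) {
            if (!filter.test(v)) continue;       // filter runs first
            if (++kept > maxVersions) break;     // version check runs second
            out.add(v);
        }
        return out;
    }

    // Versions-first (the "first idea"): no stale version can leak, but the
    // result may hold fewer cells than the user asked for.
    static List<Integer> versionsFirst(IntPredicate filter, int maxVersions) {
        List<Integer> out = new ArrayList<>();
        int seen = 0;
        for (int v : CELLS) {
            if (++seen > maxVersions) break;     // version check runs first
            if (filter.test(v)) out.add(v);      // filter runs second
        }
        return out;
    }

    public static void main(String[] args) {
        IntPredicate skipVersion4 = v -> v != 4; // filter that skips a new version
        System.out.println(filterFirst(skipVersion4, 3));   // stale version 1 leaks
        System.out.println(versionsFirst(skipVersion4, 3)); // only 2 cells returned
    }
}
```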



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18167) OfflineMetaRepair tool may cause HMaster abort always

2017-06-20 Thread Pankaj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Kumar updated HBASE-18167:
-
Attachment: (was: HBASE-18167-branch-1.3-V2.patch)

> OfflineMetaRepair tool may cause HMaster abort always
> -
>
> Key: HBASE-18167
> URL: https://issues.apache.org/jira/browse/HBASE-18167
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.4.0, 1.3.1, 1.3.2
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
> Fix For: 1.4.0, 1.3.2
>
> Attachments: HBASE-18167-branch-1.patch, HBASE-18167-branch-1-V2.patch
>
>
> In the production environment, we hit a weird scenario where some meta table 
> HFile blocks were missing for some reason.
> To recover the environment we tried to rebuild the meta using the 
> OfflineMetaRepair tool and restart the cluster, but HMaster couldn't finish 
> its initialization. It always timed out because the namespace table region 
> was never assigned.
> Steps to reproduce
> ==
> 1. Assign the meta table region to HMaster (it can be on any RS, just to 
> reproduce the scenario):
> {noformat}
> <property>
>   <name>hbase.balancer.tablesOnMaster</name>
>   <value>hbase:meta</value>
> </property>
> {noformat}
> 2. Start HMaster and RegionServer
> 3. Create two namespaces, say "ns1" & "ns2"
> 4. Create two tables, "ns1:t1" & "ns2:t1"
> 5. Flush 'hbase:meta'
> 6. Stop HMaster (graceful shutdown)
> 7. Kill -9 the RegionServer (abnormal shutdown)
> 8. Run OfflineMetaRepair as follows:
> {noformat}
>   hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair -fix
> {noformat}
> 9. Restart HMaster and RegionServer
> 10. HMaster will never be able to finish its initialization and will always 
> abort with the message below:
> {code}
> 2017-06-06 15:11:07,582 FATAL [Hostname:16000.activeMasterManager] 
> master.HMaster: Unhandled exception. Starting shutdown.
> java.io.IOException: Timedout 12ms waiting for namespace table to be 
> assigned
> at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:98)
> at 
> org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1054)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:848)
> at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:199)
> at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1871)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Root cause
> ==
> 1. During HMaster startup, the AM assumes it is a failover scenario based on 
> the existing old WAL files, so SSH/SCP will split the WAL files and assign 
> the regions they hold.
> 2. During SSH/SCP it retrieves the regions held by each server from meta/the 
> AM's in-memory state, but meta only had "regioninfo" entries (as it was 
> already rebuilt by OfflineMetaRepair). So an empty region list is returned 
> and no assignment is triggered.
> 3. HMaster, which is waiting for the namespace table to be assigned, will 
> time out and always abort.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18161) Incremental Load support for Multiple-Table HFileOutputFormat

2017-06-20 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056929#comment-16056929
 ] 

Jerry He commented on HBASE-18161:
--

Thanks for the contribution!

I am going through the patch. Here are a couple of general questions.

The major difference between this patch and the original HBASE-3727 and 
HBASE-16261 is the approach of using the concatenation of the table name and 
the original rowkey as the output key. This approach seems to allow you to use 
the existing TotalOrderPartitioner without change. What other advantages do 
you see, feature-wise or performance-wise, compared to outputting the table 
name only as the key?

You chose a fixed ':' as the separator because it is not allowed in any table 
name? You will not be able to make assumptions about the rowkeys.
Does it all work well with the partitioner and sorter under this design, e.g. 
without ordering issues? It does seem to be ok.
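The key scheme under discussion can be sketched as below. The helper names are hypothetical and the actual patch's API may differ; the sketch assumes the separator byte never appears in the table name it prefixes, so the prefix can be split off unambiguously even when the rowkey itself contains that byte.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TablePrefixedKey {
    private static final byte SEPARATOR = ':';  // assumed absent from table names

    // Prefix the table name onto the rowkey, as a mapper would emit it.
    static byte[] combine(String tableName, byte[] rowKey) {
        byte[] table = tableName.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[table.length + 1 + rowKey.length];
        System.arraycopy(table, 0, out, 0, table.length);
        out[table.length] = SEPARATOR;
        System.arraycopy(rowKey, 0, out, table.length + 1, rowKey.length);
        return out;
    }

    // Split the combined key back into {table, rowkey} on the FIRST separator;
    // the rowkey may legally contain ':' bytes, so only the first one counts.
    static byte[][] split(byte[] combined) {
        int sep = 0;
        while (combined[sep] != SEPARATOR) sep++;
        return new byte[][] {
            Arrays.copyOfRange(combined, 0, sep),
            Arrays.copyOfRange(combined, sep + 1, combined.length)
        };
    }

    public static void main(String[] args) {
        byte[] combined = combine("mytable",
                "row-0001".getBytes(StandardCharsets.UTF_8));
        byte[][] parts = split(combined);
        System.out.println(new String(parts[0], StandardCharsets.UTF_8));
        System.out.println(new String(parts[1], StandardCharsets.UTF_8));
    }
}
```

Note the ordering consequence Jerry raises: sorting combined keys groups all of a table's rows together first (by table-name prefix) and only then by rowkey, which is why the existing TotalOrderPartitioner can be reused as long as split points are computed over the combined keys.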

> Incremental Load support for Multiple-Table HFileOutputFormat
> -
>
> Key: HBASE-18161
> URL: https://issues.apache.org/jira/browse/HBASE-18161
> Project: HBase
>  Issue Type: New Feature
>Reporter: Densel Santhmayor
>Priority: Minor
> Attachments: MultiHFileOutputFormatSupport_HBASE_18161.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v2.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v3.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v4.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v5.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v6.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v7.patch
>
>
> h2. Introduction
> MapReduce currently supports the ability to write HBase records in bulk to 
> HFiles for a single table. The file(s) can then be loaded by the relevant 
> RegionServers with reasonable latency. This feature is useful for making a 
> large set of data available for queries at the same time, and it provides a 
> way to efficiently process very large input into HBase without affecting 
> query latencies.
> There is, however, no support to write variations of the same record key to 
> HFiles belonging to multiple HBase tables from within the same MapReduce job. 
>  
> h2. Goal
> The goal of this JIRA is to extend HFileOutputFormat2 to support writing to 
> HFiles for different tables within the same MapReduce job while keeping 
> single-table HFile features backwards-compatible. 
> For our use case, we needed to write a record key to a smaller HBase table 
> for quicker access, and the same record key with a date appended to a larger 
> table for longer term storage with chronological access. Each of these tables 
> would have different TTL and other settings to support their respective 
> access patterns. We also needed to be able to bulk write records to multiple 
> tables with different subsets of very large input as efficiently as possible. 
> Rather than run the MapReduce job multiple times (one for each table or 
> record structure), it would be useful to be able to parse the input a single 
> time and write to multiple tables simultaneously.
> Additionally, we'd like to maintain backwards compatibility with the existing 
> heavily-used HFileOutputFormat2 interface, so that benefits such as locality 
> sensitivity (introduced long after we implemented support for multiple 
> tables) apply to both single-table and multi-table HFile writes. 
> h2. Proposal
> * Backwards compatibility for existing single table support in 
> HFileOutputFormat2 will be maintained and in this case, mappers will need to 
> emit the table rowkey as before. However, a new class - 
> MultiHFileOutputFormat - will provide a helper function to generate a rowkey 
> for mappers that prefixes the desired tablename to the existing rowkey as 
> well as provides configureIncrementalLoad support for multiple tables.
> * HFileOutputFormat2 will be updated in the following way:
> ** configureIncrementalLoad will now accept multiple table descriptor and 
> region locator pairs, analogous to the single pair currently accepted by 
> HFileOutputFormat2. 
> ** Compression, Block Size, Bloom Type and Datablock settings PER column 
> family that are set in the Configuration object are now indexed and retrieved 
> by tablename AND column family
> ** getRegionStartKeys will now support multiple regionlocators and calculate 
> split points and therefore partitions collectively for all tables. Similarly, 
> now the eventual number of Reducers will be equal to the total number of 
> partitions across all tables. 
> ** The RecordWriter class will be able to process rowkeys either with or 
> without the tablename prepended depending on how configureIncrementalLoad was 
> configured with MultiHFileOutputFormat or HFileOutputFormat2.
> * The use of MultiHFileOutputFormat will 

[jira] [Commented] (HBASE-18167) OfflineMetaRepair tool may cause HMaster abort always

2017-06-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056920#comment-16056920
 ] 

Hadoop QA commented on HBASE-18167:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 9s {color} 
| {color:red} HBASE-18167 does not apply to branch-1.3. Rebase required? Wrong 
Branch? See https://yetus.apache.org/documentation/0.3.0/precommit-patchnames 
for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12873791/HBASE-18167-branch-1.3-V2.patch
 |
| JIRA Issue | HBASE-18167 |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7268/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> OfflineMetaRepair tool may cause HMaster abort always
> -
>
> Key: HBASE-18167
> URL: https://issues.apache.org/jira/browse/HBASE-18167
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.4.0, 1.3.1, 1.3.2
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
> Fix For: 1.4.0, 1.3.2
>
> Attachments: HBASE-18167-branch-1.3-V2.patch, 
> HBASE-18167-branch-1.patch, HBASE-18167-branch-1-V2.patch
>
>
> In the production environment, we hit a weird scenario where some meta table 
> HFile blocks were missing for some reason.
> To recover the environment we tried to rebuild the meta using the 
> OfflineMetaRepair tool and restart the cluster, but HMaster couldn't finish 
> its initialization. It always timed out because the namespace table region 
> was never assigned.
> Steps to reproduce
> ==
> 1. Assign the meta table region to HMaster (it can be on any RS, just to 
> reproduce the scenario):
> {noformat}
> <property>
>   <name>hbase.balancer.tablesOnMaster</name>
>   <value>hbase:meta</value>
> </property>
> {noformat}
> 2. Start HMaster and RegionServer
> 3. Create two namespaces, say "ns1" & "ns2"
> 4. Create two tables, "ns1:t1" & "ns2:t1"
> 5. Flush 'hbase:meta'
> 6. Stop HMaster (graceful shutdown)
> 7. Kill -9 the RegionServer (abnormal shutdown)
> 8. Run OfflineMetaRepair as follows:
> {noformat}
>   hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair -fix
> {noformat}
> 9. Restart HMaster and RegionServer
> 10. HMaster will never be able to finish its initialization and will always 
> abort with the message below:
> {code}
> 2017-06-06 15:11:07,582 FATAL [Hostname:16000.activeMasterManager] 
> master.HMaster: Unhandled exception. Starting shutdown.
> java.io.IOException: Timedout 12ms waiting for namespace table to be 
> assigned
> at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:98)
> at 
> org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1054)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:848)
> at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:199)
> at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1871)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Root cause
> ==
> 1. During HMaster startup, the AM assumes it is a failover scenario based on 
> the existing old WAL files, so SSH/SCP will split the WAL files and assign 
> the regions they hold.
> 2. During SSH/SCP it retrieves the regions held by each server from meta/the 
> AM's in-memory state, but meta only had "regioninfo" entries (as it was 
> already rebuilt by OfflineMetaRepair). So an empty region list is returned 
> and no assignment is triggered.
> 3. HMaster, which is waiting for the namespace table to be assigned, will 
> time out and always abort.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18235) LoadBalancer.BOGUS_SERVER_NAME should not have a bogus hostname

2017-06-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056916#comment-16056916
 ] 

Hadoop QA commented on HBASE-18235:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
5s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 24s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
29s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
27s {color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 6m 23s 
{color} | {color:red} hbase-server in master has 12 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
40s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
28s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
59m 1s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
57s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 146m 50s 
{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 
11s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 237m 37s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.regionserver.TestRegionReplicaFailover |
|   | hadoop.hbase.regionserver.wal.TestLogRollingNoCluster |
|   | hadoop.hbase.regionserver.TestEncryptionKeyRotation |
|   | hadoop.hbase.regionserver.TestPerColumnFamilyFlush |
| Timed out junit tests | org.apache.hadoop.hbase.quotas.TestQuotaStatusRPCs |
|   | org.apache.hadoop.hbase.replication.regionserver.TestWALEntryStream |
|   | org.apache.hadoop.hbase.TestAcidGuarantees |
|   | org.apache.hadoop.hbase.quotas.TestQuotaObserverChoreRegionReports |
|   | org.apache.hadoop.hbase.replication.TestReplicationSmallTests |
|   | org.apache.hadoop.hbase.TestIOFencing |
|   | org.apache.hadoop.hbase.filter.TestFuzzyRowFilterEndToEnd |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.03.0-ce Server=17.03.0-ce Image:yetus/hbase:757bf37 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12873758/HBASE-18235.patch |
| JIRA Issue | HBASE-18235 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 519769a87da3 4.8.3-std-1 #1 SMP Fri Oct 21 11:15:43 UTC 2016 
x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | 

[jira] [Updated] (HBASE-18167) OfflineMetaRepair tool may cause HMaster abort always

2017-06-20 Thread Pankaj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Kumar updated HBASE-18167:
-
Fix Version/s: 1.3.2

> OfflineMetaRepair tool may cause HMaster abort always
> -
>
> Key: HBASE-18167
> URL: https://issues.apache.org/jira/browse/HBASE-18167
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.4.0, 1.3.1, 1.3.2
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
> Fix For: 1.4.0, 1.3.2
>
> Attachments: HBASE-18167-branch-1.3-V2.patch, 
> HBASE-18167-branch-1.patch, HBASE-18167-branch-1-V2.patch
>
>
> In the production environment, we hit a weird scenario where some meta table 
> HFile blocks were missing for some reason.
> To recover the environment we tried to rebuild the meta using the 
> OfflineMetaRepair tool and restart the cluster, but HMaster couldn't finish 
> its initialization. It always timed out because the namespace table region 
> was never assigned.
> Steps to reproduce
> ==
> 1. Assign the meta table region to HMaster (it can be on any RS, just to 
> reproduce the scenario):
> {noformat}
> <property>
>   <name>hbase.balancer.tablesOnMaster</name>
>   <value>hbase:meta</value>
> </property>
> {noformat}
> 2. Start HMaster and RegionServer
> 3. Create two namespaces, say "ns1" & "ns2"
> 4. Create two tables, "ns1:t1" & "ns2:t1"
> 5. Flush 'hbase:meta'
> 6. Stop HMaster (graceful shutdown)
> 7. Kill -9 the RegionServer (abnormal shutdown)
> 8. Run OfflineMetaRepair as follows:
> {noformat}
>   hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair -fix
> {noformat}
> 9. Restart HMaster and RegionServer
> 10. HMaster will never be able to finish its initialization and will always 
> abort with the message below:
> {code}
> 2017-06-06 15:11:07,582 FATAL [Hostname:16000.activeMasterManager] 
> master.HMaster: Unhandled exception. Starting shutdown.
> java.io.IOException: Timedout 12ms waiting for namespace table to be 
> assigned
> at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:98)
> at 
> org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1054)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:848)
> at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:199)
> at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1871)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Root cause
> ==
> 1. During HMaster startup, the AM assumes it is a failover scenario based on 
> the existing old WAL files, so SSH/SCP will split the WAL files and assign 
> the regions they hold.
> 2. During SSH/SCP it retrieves the regions held by each server from meta/the 
> AM's in-memory state, but meta only had "regioninfo" entries (as it was 
> already rebuilt by OfflineMetaRepair). So an empty region list is returned 
> and no assignment is triggered.
> 3. HMaster, which is waiting for the namespace table to be assigned, will 
> time out and always abort.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18167) OfflineMetaRepair tool may cause HMaster abort always

2017-06-20 Thread Pankaj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Kumar updated HBASE-18167:
-
Attachment: HBASE-18167-branch-1.3-V2.patch

> OfflineMetaRepair tool may cause HMaster abort always
> -
>
> Key: HBASE-18167
> URL: https://issues.apache.org/jira/browse/HBASE-18167
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.4.0, 1.3.1, 1.3.2
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
> Fix For: 1.4.0
>
> Attachments: HBASE-18167-branch-1.3-V2.patch, 
> HBASE-18167-branch-1.patch, HBASE-18167-branch-1-V2.patch
>
>
> In the production environment, we met a weird scenario where some meta table 
> HFile blocks were missing for some reason.
> To recover the environment we tried to rebuild meta using the 
> OfflineMetaRepair tool and restart the cluster, but HMaster couldn't finish 
> its initialization. It always timed out because the namespace table region 
> was never assigned.
> Steps to reproduce
> ==
> 1. Assign meta table region to HMaster (it can be on any RS, just to 
> reproduce the  scenario)
> {noformat}
>   <property>
>     <name>hbase.balancer.tablesOnMaster</name>
>     <value>hbase:meta</value>
>   </property>
> {noformat}
> 2. Start HMaster and RegionServer
> 3. Create two namespaces, say "ns1" & "ns2"
> 4. Create two tables, "ns1:t1" & "ns2:t1"
> 5. Flush 'hbase:meta'
> 6. Stop HMaster (graceful shutdown)
> 7. Kill -9 the RegionServer (abnormal shutdown)
> 8. Run OfflineMetaRepair as follows,
> {noformat}
>   hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair -fix
> {noformat}
> 9. Restart HMaster and RegionServer
> 10. HMaster will never be able to finish its initialization and will always 
> abort with the message below,
> {code}
> 2017-06-06 15:11:07,582 FATAL [Hostname:16000.activeMasterManager] 
> master.HMaster: Unhandled exception. Starting shutdown.
> java.io.IOException: Timedout 12ms waiting for namespace table to be 
> assigned
> at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:98)
> at 
> org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1054)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:848)
> at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:199)
> at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1871)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Root cause
> ==
> 1. During HMaster startup the AM assumes a failover scenario because of the 
> existing old WAL files, so SSH/SCP will split the WAL files and assign the 
> regions the dead server was holding.
> 2. During SSH/SCP it retrieves the regions the server was holding from 
> meta/the AM's in-memory state, but meta only had the "regioninfo" entry (it 
> was already rebuilt by OfflineMetaRepair), so an empty region list is 
> returned and no assignment is triggered.
> 3. HMaster, which is waiting for the namespace table to be assigned, will 
> always time out and abort.





[jira] [Commented] (HBASE-17678) FilterList with MUST_PASS_ONE may lead to redundant cells returned

2017-06-20 Thread Zheng Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056907#comment-16056907
 ] 

Zheng Hu commented on HBASE-17678:
--

[~busbey], if there are no other concerns, let's commit it to branch-1.*; I 
have no permission to do that. Can anybody help? Thanks.

> FilterList with MUST_PASS_ONE may lead to redundant cells returned
> --
>
> Key: HBASE-17678
> URL: https://issues.apache.org/jira/browse/HBASE-17678
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 2.0.0, 1.3.0, 1.2.1
> Environment: RedHat 7.x
>Reporter: Jason Tokayer
>Assignee: Zheng Hu
> Attachments: HBASE-17678.addendum.patch, HBASE-17678.addendum.patch, 
> HBASE-17678.branch-1.1.v1.patch, HBASE-17678.branch-1.1.v2.patch, 
> HBASE-17678.branch-1.1.v2.patch, HBASE-17678.branch-1.v1.patch, 
> HBASE-17678.branch-1.v1.patch, HBASE-17678.branch-1.v2.patch, 
> HBASE-17678.branch-1.v2.patch, HBASE-17678.v1.patch, 
> HBASE-17678.v1.rough.patch, HBASE-17678.v2.patch, HBASE-17678.v3.patch, 
> HBASE-17678.v4.patch, HBASE-17678.v4.patch, HBASE-17678.v5.patch, 
> HBASE-17678.v6.patch, HBASE-17678.v7.patch, HBASE-17678.v7.patch, 
> TestColumnPaginationFilterDemo.java
>
>
> When combining ColumnPaginationFilter with a single-element FilterList, 
> MUST_PASS_ONE and MUST_PASS_ALL give different results when there are 
> multiple cells with the same timestamp. This is unexpected since there is 
> only a single filter in the list, and I believe that MUST_PASS_ALL and 
> MUST_PASS_ONE should only affect how the filters are joined, not the 
> behavior of any individual filter. If this is not a bug, it would be nice if 
> the documentation were updated to explain this nuanced behavior.
> I know that a decision was made in an earlier HBase version to keep multiple 
> cells with the same timestamp. This is generally fine but presents an issue 
> when using the aforementioned filter combination.
> Steps to reproduce:
> In the shell create a table and insert some data:
> {code:none}
> create 'ns:tbl',{NAME => 'family',VERSIONS => 100}
> put 'ns:tbl','row','family:name','John',1
> put 'ns:tbl','row','family:name','Jane',1
> put 'ns:tbl','row','family:name','Gil',1
> put 'ns:tbl','row','family:name','Jane',1
> {code}
> Then, use a Scala client as:
> {code:none}
> import org.apache.hadoop.hbase.filter._
> import org.apache.hadoop.hbase.util.Bytes
> import org.apache.hadoop.hbase.client._
> import org.apache.hadoop.hbase.{CellUtil, HBaseConfiguration, TableName}
> import scala.collection.mutable._
> val config = HBaseConfiguration.create()
> config.set("hbase.zookeeper.quorum", "localhost")
> config.set("hbase.zookeeper.property.clientPort", "2181")
> val connection = ConnectionFactory.createConnection(config)
> val logicalOp = FilterList.Operator.MUST_PASS_ONE
> val limit = 1
> var resultsList = ListBuffer[String]()
> for (offset <- 0 to 20 by limit) {
>   val table = connection.getTable(TableName.valueOf("ns:tbl"))
>   val paginationFilter = new ColumnPaginationFilter(limit,offset)
>   val filterList: FilterList = new FilterList(logicalOp,paginationFilter)
>   println("@ filterList = "+filterList)
>   val results = table.get(new Get(Bytes.toBytes("row")).setFilter(filterList))
>   val cells = results.rawCells()
>   if (cells != null) {
>   for (cell <- cells) {
> val value = new String(CellUtil.cloneValue(cell))
> val qualifier = new String(CellUtil.cloneQualifier(cell))
> val family = new String(CellUtil.cloneFamily(cell))
> val result = "OFFSET = " + offset + ":" + family + "," + qualifier + "," + value + "," + cell.getTimestamp()
> resultsList.append(result)
>   }
>   }
> }
> resultsList.foreach(println)
> {code}
> Here are the results for different limit and logicalOp settings:
> {code:none}
> Limit = 1 & logicalOp = MUST_PASS_ALL:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1
> Limit = 1 & logicalOp = MUST_PASS_ONE:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1
> OFFSET = 1:family,name,Gil,1
> OFFSET = 2:family,name,Jane,1
> OFFSET = 3:family,name,John,1
> Limit = 2 & logicalOp = MUST_PASS_ALL:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1
> Limit = 2 & logicalOp = MUST_PASS_ONE:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1
> OFFSET = 2:family,name,Jane,1
> {code}
> So, it seems that MUST_PASS_ALL gives the expected behavior, but 
> MUST_PASS_ONE does not. 

[jira] [Commented] (HBASE-18160) Fix incorrect logic in FilterList.filterKeyValue

2017-06-20 Thread Zheng Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056901#comment-16056901
 ] 

Zheng Hu commented on HBASE-18160:
--

Ping [~anoop.hbase], [~tedyu] again. Thanks.

> Fix incorrect  logic in FilterList.filterKeyValue
> -
>
> Key: HBASE-18160
> URL: https://issues.apache.org/jira/browse/HBASE-18160
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Zheng Hu
> Attachments: HBASE-18160.branch-1.1.v1.patch, 
> HBASE-18160.branch-1.v1.patch, HBASE-18160.v1.patch, HBASE-18160.v2.patch, 
> HBASE-18160.v2.patch
>
>
> As HBASE-17678 said, there are two problems in the FilterList.filterKeyValue 
> implementation:
> 1. FilterList did not consider the INCLUDE_AND_SEEK_NEXT_ROW case (it seems 
> INCLUDE_AND_SEEK_NEXT_ROW is a newly added code, and the dev forgot to 
> consider FilterList). So if a user uses INCLUDE_AND_SEEK_NEXT_ROW in his own 
> Filter wrapped by a FilterList, it'll throw an 
> IllegalStateException("Received code is not valid.").
> 2. For a FilterList with MUST_PASS_ONE, if filter-A in the list returns 
> INCLUDE and filter-B returns INCLUDE_AND_NEXT_COL, the FilterList will 
> finally return INCLUDE_AND_NEXT_COL. According to the minimal step rule, 
> that's incorrect (a filter list with MUST_PASS_ONE should choose the minimal 
> step among the filters in the list; let's call it the Minimal Step Rule).
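The minimal step rule described above can be sketched in a few lines. This is an illustrative sketch, not HBase's actual FilterList code: the `Code` enum is a simplified stand-in for `Filter.ReturnCode`, declared here from smallest scanner step to largest so that the minimal step is simply the minimum ordinal.

```java
import java.util.List;

// Sketch of the "minimal step rule" for MUST_PASS_ONE (illustrative; the enum
// is a simplified stand-in for Filter.ReturnCode, ordered smallest step first).
public class MinimalStep {
    enum Code { INCLUDE, INCLUDE_AND_NEXT_COL, INCLUDE_AND_SEEK_NEXT_ROW }

    static Code mergeMustPassOne(List<Code> memberCodes) {
        Code min = memberCodes.get(0);
        for (Code c : memberCodes) {
            if (c.ordinal() < min.ordinal()) min = c; // keep the smallest step
        }
        return min;
    }

    public static void main(String[] args) {
        // filter-A says INCLUDE, filter-B says INCLUDE_AND_NEXT_COL:
        // per the rule, the list must return INCLUDE, the smaller step.
        System.out.println(mergeMustPassOne(
                List.of(Code.INCLUDE, Code.INCLUDE_AND_NEXT_COL)));
    }
}
```

With the INCLUDE/INCLUDE_AND_NEXT_COL example from the description, this returns INCLUDE rather than INCLUDE_AND_NEXT_COL.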





[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-20 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056895#comment-16056895
 ] 

Duo Zhang commented on HBASE-17125:
---

We have already discussed many times on how to implement it, either in our 
company or here. 

{quote}
 which doesn't add much more complexity
{quote}
Then why don't you implement it yourself? If you think it is easy, please 
implement it.

And your approach will lose the ability to control the versions passed to the 
filter. And it cannot be addressed by adding a new filter at the beginning, 
because you will always reset the version count if the filter list returns SKIP.

Thanks.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch
>
>
> Assume a column's max versions is 3 and we write 4 versions of the column. 
> The oldest version is not removed immediately, but from the user's point of 
> view it is already gone. When the user queries with a filter that skips a 
> newer version, the oldest version will be seen again; after the region is 
> compacted, it will never be seen. So the user gets inconsistent query 
> results before and after region compaction, which is weird.
> The reason is the matchColumn method of UserScanQueryMatcher: it first 
> checks the cell against the filter, then checks the number of versions 
> needed. So if the filter skips a newer version, the oldest version will be 
> seen again as long as it has not been removed.
> After an offline discussion with [~Apache9] and [~fenghh], we have two 
> solutions for this problem. The first idea is to check the number of 
> versions first, then check the cell against the filter. As the javadoc of 
> setFilter says, the filter is called after all tests for ttl, column match, 
> deletes and max versions have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem: if a column's max versions is 5 and the 
> user's query only needs 3 versions, we first check the version number and 
> then check the cell against the filter, so the result may contain fewer 
> than 3 cells even though there are 2 more versions that were never read.
> So the second idea has three steps:
> 1. check against the max versions of this column
> 2. check the kv against the filter
> 3. check the number of versions the user needs
> But this makes the ScanQueryMatcher more complicated, and it breaks the 
> javadoc of Query.setFilter.
> We don't have a final solution for this problem yet. Suggestions are welcome.
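The effect of the check ordering can be simulated with a small, self-contained sketch. This is illustrative only, not ScanQueryMatcher code: plain timestamps stand in for cells (newest first), a predicate stands in for the Filter, and the two methods model "filter first, then count versions" versus "count versions first, then filter".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Illustrative sketch of why check order matters. Stored timestamps, newest
// first; maxVersions = 3, so ts=1 is logically dead but not yet compacted.
// The filter skips the newest cell (ts=4).
public class CheckOrder {
    static List<Integer> filterThenCount(List<Integer> stored, int maxVersions,
                                         Predicate<Integer> filter) {
        List<Integer> out = new ArrayList<>();
        for (int ts : stored) {
            if (!filter.test(ts)) continue;            // filter runs first ...
            if (out.size() < maxVersions) out.add(ts); // ... version count second
        }
        return out;
    }

    static List<Integer> countThenFilter(List<Integer> stored, int maxVersions,
                                         Predicate<Integer> filter) {
        List<Integer> out = new ArrayList<>();
        int seen = 0;
        for (int ts : stored) {
            if (seen++ >= maxVersions) break;          // version count runs first ...
            if (filter.test(ts)) out.add(ts);          // ... filter second
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> stored = List.of(4, 3, 2, 1);
        Predicate<Integer> skipNewest = ts -> ts != 4;
        // Filtering first resurrects the dead ts=1: [3, 2, 1]
        System.out.println(filterThenCount(stored, 3, skipNewest));
        // Counting first never exposes it: [3, 2]
        System.out.println(countThenFilter(stored, 3, skipNewest));
    }
}
```

The first ordering returns the logically dead ts=1 (which disappears after compaction, hence the inconsistency), while the second never exposes it but may return fewer cells than requested.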





[jira] [Commented] (HBASE-18204) [C++] Cleanup outgoing RPCs on connection close

2017-06-20 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056886#comment-16056886
 ] 

Enis Soztutar commented on HBASE-18204:
---

I was testing this patch together with a couple of other patches. It seems that 
we are still not there yet in terms of handling regionserver failure: testing 
with 2 regionservers, 10 regions, and 500M of data being written, the client 
hangs when the server is killed:
{code}
I0621 02:36:04.355890  6477 simple-client.cc:110] Sent  9 Put requests in 
33961 ms.
I0621 02:36:08.138052  6477 simple-client.cc:110] Sent  10 Put requests in 
37743 ms.
W0621 02:36:09.939910  6485 HandlerContext-inl.h:186] readException reached end 
of pipeline
{code}

> [C++] Cleanup outgoing RPCs on connection close
> ---
>
> Key: HBASE-18204
> URL: https://issues.apache.org/jira/browse/HBASE-18204
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Attachments: hbase-18204_v1.patch
>
>
> Our client dispatcher maintains a per-server map of outgoing RPCs and their 
> promises.
> In case the server goes down or the TCP connection is closed, we should 
> complete the outgoing RPCs with exceptions so that higher-level waiters can 
> unblock and retry.





[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056871#comment-16056871
 ] 

Ted Yu commented on HBASE-17125:


The reasons Anoop and I don't favor SpecifiedNumVersionsColumnFilter are:

* it is not intuitive.
* users may use it incorrectly (consider nested FilterLists).

Guanghao has agreed to try the new approach, which doesn't add much more 
complexity on top of what is posted so far. Let's give him some time.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch
>
>
> Assume a column's max versions is 3 and we write 4 versions of the column. 
> The oldest version is not removed immediately, but from the user's point of 
> view it is already gone. When the user queries with a filter that skips a 
> newer version, the oldest version will be seen again; after the region is 
> compacted, it will never be seen. So the user gets inconsistent query 
> results before and after region compaction, which is weird.
> The reason is the matchColumn method of UserScanQueryMatcher: it first 
> checks the cell against the filter, then checks the number of versions 
> needed. So if the filter skips a newer version, the oldest version will be 
> seen again as long as it has not been removed.
> After an offline discussion with [~Apache9] and [~fenghh], we have two 
> solutions for this problem. The first idea is to check the number of 
> versions first, then check the cell against the filter. As the javadoc of 
> setFilter says, the filter is called after all tests for ttl, column match, 
> deletes and max versions have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem: if a column's max versions is 5 and the 
> user's query only needs 3 versions, we first check the version number and 
> then check the cell against the filter, so the result may contain fewer 
> than 3 cells even though there are 2 more versions that were never read.
> So the second idea has three steps:
> 1. check against the max versions of this column
> 2. check the kv against the filter
> 3. check the number of versions the user needs
> But this makes the ScanQueryMatcher more complicated, and it breaks the 
> javadoc of Query.setFilter.
> We don't have a final solution for this problem yet. Suggestions are welcome.





[jira] [Commented] (HBASE-15691) Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to branch-1

2017-06-20 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056865#comment-16056865
 ] 

chunhui shen commented on HBASE-15691:
--

{code:java}
+public synchronized void instantiateBucket(Bucket b) {
+private synchronized void removeBucket(Bucket b) {
+public synchronized IndexStatistics statistics() {
{code}
The synchronized methods are all off the client read path, so there should be 
no perf implications.
I haven't found any possibility of deadlock, because nothing tries to take 
another lock inside these methods or their callees. (It's easy to read through 
everything the methods do now :D)


+1 on the patch


> Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to 
> branch-1
> -
>
> Key: HBASE-15691
> URL: https://issues.apache.org/jira/browse/HBASE-15691
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 1.3.0
>Reporter: Andrew Purtell
>Assignee: Stephen Yuan Jiang
> Fix For: 1.3.2, 1.4.1, 1.5.0, 1.2.7
>
> Attachments: HBASE-15691-branch-1.patch, HBASE-15691.v2-branch-1.patch
>
>
> HBASE-10205 solves the following problem:
> "
> The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the 
> RAM queue containing entries to be cached. freeSpace() in turn calls 
> BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), 
> which iterates over 'bucketList'. At the same time another WriterThread might 
> call BucketAllocator.allocateBlock(), which may call 
> BucketSizeInfo.allocateBlock(), add a bucket to 'bucketList' and consequently 
> cause a ConcurrentModificationException. Calls to 
> BucketAllocator.allocateBlock() are synchronized, but calls to 
> BucketAllocator.getIndexStatistics() are not, which allows this race to occur.
> "
> However, for some unknown reason, HBASE-10205 was committed only to the 
> master (2.0 and beyond) and 0.98 branches. To preserve continuity we should 
> commit it to branch-1.
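The race described above is a classic fail-fast iterator failure. The sketch below is illustrative, not BucketAllocator code: iterating a plain list while it is modified throws ConcurrentModificationException. To keep the demo deterministic the modification happens in the same thread; in the bug it came from an unsynchronized concurrent WriterThread.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

// Illustrative reproduction of the failure mode: modifying a list while a
// fail-fast iterator is walking it throws ConcurrentModificationException.
public class CmeDemo {
    public static boolean raceDetected() {
        List<Integer> buckets = new ArrayList<>(List.of(1, 2, 3));
        try {
            for (Integer b : buckets) {
                // simulates allocateBlock() adding a bucket while
                // getIndexStatistics() iterates 'bucketList'
                buckets.add(b + 10);
            }
        } catch (ConcurrentModificationException e) {
            return true; // the iteration blew up, as in the reported stack trace
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(raceDetected());
    }
}
```

Synchronizing both the mutating path and the statistics path on the same monitor, as HBASE-10205 does, removes the race.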





[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-20 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056860#comment-16056860
 ] 

Duo Zhang commented on HBASE-17125:
---

And we already spoke about this at HBaseCon, and there were no big concerns. I 
think most users just do not care about it, as usually we only have one 
version, just like the user who asked on the mailing list.

Thanks.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch
>
>
> Assume a column's max versions is 3 and we write 4 versions of the column. 
> The oldest version is not removed immediately, but from the user's point of 
> view it is already gone. When the user queries with a filter that skips a 
> newer version, the oldest version will be seen again; after the region is 
> compacted, it will never be seen. So the user gets inconsistent query 
> results before and after region compaction, which is weird.
> The reason is the matchColumn method of UserScanQueryMatcher: it first 
> checks the cell against the filter, then checks the number of versions 
> needed. So if the filter skips a newer version, the oldest version will be 
> seen again as long as it has not been removed.
> After an offline discussion with [~Apache9] and [~fenghh], we have two 
> solutions for this problem. The first idea is to check the number of 
> versions first, then check the cell against the filter. As the javadoc of 
> setFilter says, the filter is called after all tests for ttl, column match, 
> deletes and max versions have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem: if a column's max versions is 5 and the 
> user's query only needs 3 versions, we first check the version number and 
> then check the cell against the filter, so the result may contain fewer 
> than 3 cells even though there are 2 more versions that were never read.
> So the second idea has three steps:
> 1. check against the max versions of this column
> 2. check the kv against the filter
> 3. check the number of versions the user needs
> But this makes the ScanQueryMatcher more complicated, and it breaks the 
> javadoc of Query.setFilter.
> We don't have a final solution for this problem yet. Suggestions are welcome.





[jira] [Commented] (HBASE-18180) Possible connection leak while closing BufferedMutator in TableOutputFormat

2017-06-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056856#comment-16056856
 ] 

Hudson commented on HBASE-18180:


SUCCESS: Integrated in Jenkins build HBase-1.3-IT #67 (See 
[https://builds.apache.org/job/HBase-1.3-IT/67/])
HBASE-18180 Possible connection leak while closing BufferedMutator in (tedyu: 
rev 98cd0de41075961f728543617e9785573f813a1a)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/mapred/TableOutputFormat.java


> Possible connection leak while closing BufferedMutator in TableOutputFormat
> ---
>
> Key: HBASE-18180
> URL: https://issues.apache.org/jira/browse/HBASE-18180
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 1.4.0, 1.3.1, 1.3.2
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
> Fix For: 3.0.0, 1.4.0, 1.3.2, 2.0.0-alpha-2
>
> Attachments: HBASE-18180-branch-1.3.patch, 
> HBASE-18180-branch-1.patch, HBASE-18180.patch
>
>
> In TableOutputFormat, the connection will not be released when 
> "mutator.close()" throws an exception.
> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
> {code}
> public void close(TaskAttemptContext context)
> throws IOException {
>   mutator.close();
>   connection.close();
> }
> {code}
> org.apache.hadoop.hbase.mapred.TableOutputFormat
> {code}
> public void close(Reporter reporter) throws IOException {
>   this.m_mutator.close();
>   if (connection != null) {
> connection.close();
> connection = null;
>   }
> }
> {code}
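One common way to guarantee release is a try/finally around the first close. This is a minimal self-contained sketch of that pattern under assumed names (`SafeClose`, `closeMutatorThenConnection`), not the actual patch: even when the mutator's close throws, the connection is still closed and the exception still propagates.

```java
import java.io.Closeable;
import java.io.IOException;

// Illustrative sketch: close two resources so that a failure closing the
// first cannot leak the second.
public class SafeClose {
    static boolean connectionClosed = false;

    static void closeMutatorThenConnection(Closeable mutator, Closeable connection)
            throws IOException {
        try {
            mutator.close();     // may throw
        } finally {
            connection.close();  // runs even if mutator.close() failed
        }
    }

    public static void main(String[] args) {
        Closeable failingMutator = () -> { throw new IOException("flush failed"); };
        Closeable connection = () -> connectionClosed = true;
        try {
            closeMutatorThenConnection(failingMutator, connection);
        } catch (IOException expected) {
            // the mutator's exception still propagates to the caller
        }
        System.out.println(connectionClosed); // the connection was not leaked
    }
}
```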





[jira] [Updated] (HBASE-18202) Trim down supplemental models file for unnecessary entries

2017-06-20 Thread Mike Drob (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob updated HBASE-18202:
--
Attachment: HBASE-18202.v2.patch

v2: Fix an accidental dependency reordering, and generate the patch using the 
patience diff algorithm so that it is easier to review.

> Trim down supplemental models file for unnecessary entries
> --
>
> Key: HBASE-18202
> URL: https://issues.apache.org/jira/browse/HBASE-18202
> Project: HBase
>  Issue Type: Task
>  Components: dependencies
>Reporter: Mike Drob
>Assignee: Mike Drob
> Attachments: HBASE-18202.patch, HBASE-18202.v2.patch
>
>
> With the more permissive "Apache License" check in HBASE-18033, we can remove 
> many entries from the supplemental-models.xml file. This issue is to track 
> that work separately.





[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-20 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056839#comment-16056839
 ] 

Duo Zhang commented on HBASE-17125:
---

{quote}
Am still not in favor of asking the user to configure some extra Filter to get 
an expected behave from the system
{quote}

I'd say again that the javadoc never guaranteed the current behavior, and it 
is no doubt broken semantics. And see my comment above: I think using another 
filter to address a problem introduced by a filter is the right direction; we 
should not put too much complexity into our core system.

And see my comment above, where a real user case shows that the current 
approach can solve his/her problem:

{quote}
Oh, it seems the user calls setMaxVersions(1). I believe the problem is that 
he/she found that the filter returned old values, so he/she used 
setMaxVersions(1) hoping it would solve the problem.
So it is clear that in this user's mind, setMaxVersions should be used to 
control the number of versions passed to the filter. This is exactly what we 
provide in the latest patch. With the patch in place, the user does not need to 
call setMaxVersions(1) anymore.
Thanks.
{quote}

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch
>
>
> Assume a column's max versions is 3 and we write 4 versions of the column. 
> The oldest version is not removed immediately, but from the user's point of 
> view it is already gone. When the user queries with a filter that skips a 
> newer version, the oldest version will be seen again; after the region is 
> compacted, it will never be seen. So the user gets inconsistent query 
> results before and after region compaction, which is weird.
> The reason is the matchColumn method of UserScanQueryMatcher: it first 
> checks the cell against the filter, then checks the number of versions 
> needed. So if the filter skips a newer version, the oldest version will be 
> seen again as long as it has not been removed.
> After an offline discussion with [~Apache9] and [~fenghh], we have two 
> solutions for this problem. The first idea is to check the number of 
> versions first, then check the cell against the filter. As the javadoc of 
> setFilter says, the filter is called after all tests for ttl, column match, 
> deletes and max versions have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem: if a column's max versions is 5 and the 
> user's query only needs 3 versions, we first check the version number and 
> then check the cell against the filter, so the result may contain fewer 
> than 3 cells even though there are 2 more versions that were never read.
> So the second idea has three steps:
> 1. check against the max versions of this column
> 2. check the kv against the filter
> 3. check the number of versions the user needs
> But this makes the ScanQueryMatcher more complicated, and it breaks the 
> javadoc of Query.setFilter.
> We don't have a final solution for this problem yet. Suggestions are welcome.





[jira] [Updated] (HBASE-18180) Possible connection leak while closing BufferedMutator in TableOutputFormat

2017-06-20 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-18180:
---
Fix Version/s: 1.3.2

> Possible connection leak while closing BufferedMutator in TableOutputFormat
> ---
>
> Key: HBASE-18180
> URL: https://issues.apache.org/jira/browse/HBASE-18180
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 1.4.0, 1.3.1, 1.3.2
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
> Fix For: 3.0.0, 1.4.0, 1.3.2, 2.0.0-alpha-2
>
> Attachments: HBASE-18180-branch-1.3.patch, 
> HBASE-18180-branch-1.patch, HBASE-18180.patch
>
>
> In TableOutputFormat, the connection will not be released when 
> "mutator.close()" throws an exception.
> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
> {code}
> public void close(TaskAttemptContext context)
> throws IOException {
>   mutator.close();
>   connection.close();
> }
> {code}
> org.apache.hadoop.hbase.mapred.TableOutputFormat
> {code}
> public void close(Reporter reporter) throws IOException {
>   this.m_mutator.close();
>   if (connection != null) {
> connection.close();
> connection = null;
>   }
> }
> {code}





[jira] [Comment Edited] (HBASE-18167) OfflineMetaRepair tool may cause HMaster abort always

2017-06-20 Thread Pankaj Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056822#comment-16056822
 ] 

Pankaj Kumar edited comment on HBASE-18167 at 6/21/17 1:57 AM:
---

Got it Ted, will attach patch for other branches as well.


was (Author: pankaj2461):
Got it Ted.

> OfflineMetaRepair tool may cause HMaster abort always
> -
>
> Key: HBASE-18167
> URL: https://issues.apache.org/jira/browse/HBASE-18167
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.4.0, 1.3.1, 1.3.2
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
> Fix For: 1.4.0
>
> Attachments: HBASE-18167-branch-1.patch, HBASE-18167-branch-1-V2.patch
>
>
> In the production environment, we hit a weird scenario where some meta table 
> HFile blocks had gone missing for some reason.
> To recover the environment we tried to rebuild meta using the 
> OfflineMetaRepair tool and restart the cluster, but HMaster couldn't finish 
> its initialization. It always timed out because the namespace table region was 
> never assigned.
> Steps to reproduce
> ==
> 1. Assign the meta table region to HMaster (it can be on any RS, just to 
> reproduce the scenario)
> {noformat}
> <property>
>   <name>hbase.balancer.tablesOnMaster</name>
>   <value>hbase:meta</value>
> </property>
> {noformat}
> 2. Start HMaster and RegionServer
> 3. Create two namespaces, say "ns1" & "ns2"
> 4. Create two tables, "ns1:t1" & "ns2:t1"
> 5. flush 'hbase:meta'
> 6. Stop HMaster (graceful shutdown)
> 7. Kill -9 RegionServer (abnormal shutdown)
> 8. Run OfflineMetaRepair as follows,
> {noformat}
>   hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair -fix
> {noformat}
> 9. Restart HMaster and RegionServer
> 10. HMaster will never be able to finish its initialization and will always 
> abort with the message below,
> {code}
> 2017-06-06 15:11:07,582 FATAL [Hostname:16000.activeMasterManager] 
> master.HMaster: Unhandled exception. Starting shutdown.
> java.io.IOException: Timedout 12ms waiting for namespace table to be 
> assigned
> at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:98)
> at 
> org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1054)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:848)
> at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:199)
> at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1871)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Root cause
> ==
> 1. During HMaster startup, the AssignmentManager assumes a failover scenario 
> based on the existing old WAL files, so SSH/SCP will split the WAL files and 
> assign the regions the dead server was holding.
> 2. During SSH/SCP it retrieves the regions the server was holding from 
> meta/AM's in-memory state, but meta only had the "regioninfo" entry (as it 
> was already rebuilt by OfflineMetaRepair). So an empty region list is 
> returned and no assignment is triggered.
> 3. HMaster, which is waiting for the namespace table to be assigned, will 
> always time out and abort.





[jira] [Updated] (HBASE-18249) Explicitly mark failed small test(s)

2017-06-20 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-18249:
---
Description: 
In some cases, I would remind the contributor to re-attach the patch if the 
failure of small test(s) prevented medium / large tests from running.
HBASE-18167 was a recent example where TestSimpleRpcScheduler failed.

We should explicitly mark failed small test(s) so that people know that a big 
portion of the unit tests hasn't been run.
This would reduce potential test failures in Jenkins builds when the committer 
doesn't notice that the failed test was a small test.

  was:
In some cases, I would remind the contributor to re-attach the patch if the 
failure of small test(s) prevented medium / large tests from running.
HBASE-18167 was a recent example.

We should explicitly mark failed small test(s) so that people know that a big 
portion of the unit tests hasn't been run.
This would reduce potential test failures in Jenkins builds when the committer 
doesn't notice that the failed test was a small test.


> Explicitly mark failed small test(s)
> 
>
> Key: HBASE-18249
> URL: https://issues.apache.org/jira/browse/HBASE-18249
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>
> In some cases, I would remind the contributor to re-attach the patch if the 
> failure of small test(s) prevented medium / large tests from running.
> HBASE-18167 was a recent example where TestSimpleRpcScheduler failed.
> We should explicitly mark failed small test(s) so that people know that a big 
> portion of the unit tests hasn't been run.
> This would reduce potential test failures in Jenkins builds when the 
> committer doesn't notice that the failed test was a small test.





[jira] [Created] (HBASE-18249) Explicitly mark failed small test(s)

2017-06-20 Thread Ted Yu (JIRA)
Ted Yu created HBASE-18249:
--

 Summary: Explicitly mark failed small test(s)
 Key: HBASE-18249
 URL: https://issues.apache.org/jira/browse/HBASE-18249
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu


In some cases, I would remind the contributor to re-attach the patch if the 
failure of small test(s) prevented medium / large tests from running.
HBASE-18167 was a recent example.

We should explicitly mark failed small test(s) so that people know that a big 
portion of the unit tests hasn't been run.
This would reduce potential test failures in Jenkins builds when the committer 
doesn't notice that the failed test was a small test.





[jira] [Commented] (HBASE-18180) Possible connection leak while closing BufferedMutator in TableOutputFormat

2017-06-20 Thread Pankaj Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056823#comment-16056823
 ] 

Pankaj Kumar commented on HBASE-18180:
--

Ping [~tedyu].

> Possible connection leak while closing BufferedMutator in TableOutputFormat
> ---
>
> Key: HBASE-18180
> URL: https://issues.apache.org/jira/browse/HBASE-18180
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 1.4.0, 1.3.1, 1.3.2
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
> Fix For: 3.0.0, 1.4.0, 2.0.0-alpha-2
>
> Attachments: HBASE-18180-branch-1.3.patch, 
> HBASE-18180-branch-1.patch, HBASE-18180.patch
>
>
> In TableOutputFormat, the connection will not be released when 
> "mutator.close()" throws an exception.
> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
> {code}
> public void close(TaskAttemptContext context)
> throws IOException {
>   mutator.close();
>   connection.close();
> }
> {code}
> org.apache.hadoop.hbase.mapred.TableOutputFormat
> {code}
> public void close(Reporter reporter) throws IOException {
>   this.m_mutator.close();
>   if (connection != null) {
> connection.close();
> connection = null;
>   }
> }
> {code}
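For comparison, a try-with-resources sketch (the classes below are stand-ins, not the actual HBase types) gives the same guarantee more compactly: resources are closed in reverse declaration order, and every declared resource is closed even when another one's close() throws.

```java
import java.io.Closeable;
import java.io.IOException;

// Stand-ins for Connection and BufferedMutator; only close() behavior matters here.
public class TwrSketch {
    static class Conn implements Closeable {
        boolean closed;

        @Override
        public void close() {
            closed = true;
        }
    }

    static class Mutator implements Closeable {
        @Override
        public void close() throws IOException {
            throw new IOException("mutator close failed");
        }
    }

    public static void main(String[] args) {
        Conn conn = new Conn();
        // Closed in reverse declaration order: the mutator's close() throws
        // first, but the connection is still closed afterwards.
        try (Conn c = conn; Mutator m = new Mutator()) {
            // write records...
        } catch (IOException e) {
            // primary exception from mutator.close()
        }
        System.out.println("connection closed: " + conn.closed); // true
    }
}
```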





[jira] [Commented] (HBASE-18167) OfflineMetaRepair tool may cause HMaster abort always

2017-06-20 Thread Pankaj Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056822#comment-16056822
 ] 

Pankaj Kumar commented on HBASE-18167:
--

Got it Ted.

> OfflineMetaRepair tool may cause HMaster abort always
> -
>
> Key: HBASE-18167
> URL: https://issues.apache.org/jira/browse/HBASE-18167
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.4.0, 1.3.1, 1.3.2
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
> Fix For: 1.4.0
>
> Attachments: HBASE-18167-branch-1.patch, HBASE-18167-branch-1-V2.patch
>
>
> In the production environment, we hit a weird scenario where some meta table 
> HFile blocks had gone missing for some reason.
> To recover the environment we tried to rebuild meta using the 
> OfflineMetaRepair tool and restart the cluster, but HMaster couldn't finish 
> its initialization. It always timed out because the namespace table region was 
> never assigned.
> Steps to reproduce
> ==
> 1. Assign the meta table region to HMaster (it can be on any RS, just to 
> reproduce the scenario)
> {noformat}
> <property>
>   <name>hbase.balancer.tablesOnMaster</name>
>   <value>hbase:meta</value>
> </property>
> {noformat}
> 2. Start HMaster and RegionServer
> 3. Create two namespaces, say "ns1" & "ns2"
> 4. Create two tables, "ns1:t1" & "ns2:t1"
> 5. flush 'hbase:meta'
> 6. Stop HMaster (graceful shutdown)
> 7. Kill -9 RegionServer (abnormal shutdown)
> 8. Run OfflineMetaRepair as follows,
> {noformat}
>   hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair -fix
> {noformat}
> 9. Restart HMaster and RegionServer
> 10. HMaster will never be able to finish its initialization and will always 
> abort with the message below,
> {code}
> 2017-06-06 15:11:07,582 FATAL [Hostname:16000.activeMasterManager] 
> master.HMaster: Unhandled exception. Starting shutdown.
> java.io.IOException: Timedout 12ms waiting for namespace table to be 
> assigned
> at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:98)
> at 
> org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1054)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:848)
> at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:199)
> at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1871)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Root cause
> ==
> 1. During HMaster startup, the AssignmentManager assumes a failover scenario 
> based on the existing old WAL files, so SSH/SCP will split the WAL files and 
> assign the regions the dead server was holding.
> 2. During SSH/SCP it retrieves the regions the server was holding from 
> meta/AM's in-memory state, but meta only had the "regioninfo" entry (as it 
> was already rebuilt by OfflineMetaRepair). So an empty region list is 
> returned and no assignment is triggered.
> 3. HMaster, which is waiting for the namespace table to be assigned, will 
> always time out and abort.





[jira] [Updated] (HBASE-18167) OfflineMetaRepair tool may cause HMaster abort always

2017-06-20 Thread Pankaj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Kumar updated HBASE-18167:
-
Attachment: HBASE-18167-branch-1-V2.patch

Reattaching the V2 patch to trigger QA.

> OfflineMetaRepair tool may cause HMaster abort always
> -
>
> Key: HBASE-18167
> URL: https://issues.apache.org/jira/browse/HBASE-18167
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.4.0, 1.3.1, 1.3.2
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
> Fix For: 1.4.0
>
> Attachments: HBASE-18167-branch-1.patch, HBASE-18167-branch-1-V2.patch
>
>
> In the production environment, we hit a weird scenario where some meta table 
> HFile blocks had gone missing for some reason.
> To recover the environment we tried to rebuild meta using the 
> OfflineMetaRepair tool and restart the cluster, but HMaster couldn't finish 
> its initialization. It always timed out because the namespace table region was 
> never assigned.
> Steps to reproduce
> ==
> 1. Assign the meta table region to HMaster (it can be on any RS, just to 
> reproduce the scenario)
> {noformat}
> <property>
>   <name>hbase.balancer.tablesOnMaster</name>
>   <value>hbase:meta</value>
> </property>
> {noformat}
> 2. Start HMaster and RegionServer
> 3. Create two namespaces, say "ns1" & "ns2"
> 4. Create two tables, "ns1:t1" & "ns2:t1"
> 5. flush 'hbase:meta'
> 6. Stop HMaster (graceful shutdown)
> 7. Kill -9 RegionServer (abnormal shutdown)
> 8. Run OfflineMetaRepair as follows,
> {noformat}
>   hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair -fix
> {noformat}
> 9. Restart HMaster and RegionServer
> 10. HMaster will never be able to finish its initialization and will always 
> abort with the message below,
> {code}
> 2017-06-06 15:11:07,582 FATAL [Hostname:16000.activeMasterManager] 
> master.HMaster: Unhandled exception. Starting shutdown.
> java.io.IOException: Timedout 12ms waiting for namespace table to be 
> assigned
> at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:98)
> at 
> org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1054)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:848)
> at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:199)
> at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1871)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Root cause
> ==
> 1. During HMaster startup, the AssignmentManager assumes a failover scenario 
> based on the existing old WAL files, so SSH/SCP will split the WAL files and 
> assign the regions the dead server was holding.
> 2. During SSH/SCP it retrieves the regions the server was holding from 
> meta/AM's in-memory state, but meta only had the "regioninfo" entry (as it 
> was already rebuilt by OfflineMetaRepair). So an empty region list is 
> returned and no assignment is triggered.
> 3. HMaster, which is waiting for the namespace table to be assigned, will 
> always time out and abort.





[jira] [Commented] (HBASE-18167) OfflineMetaRepair tool may cause HMaster abort always

2017-06-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056819#comment-16056819
 ] 

Ted Yu commented on HBASE-18167:


My point was that failed small test(s) prevented medium / large tests from 
running.

> OfflineMetaRepair tool may cause HMaster abort always
> -
>
> Key: HBASE-18167
> URL: https://issues.apache.org/jira/browse/HBASE-18167
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.4.0, 1.3.1, 1.3.2
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
> Fix For: 1.4.0
>
> Attachments: HBASE-18167-branch-1.patch, HBASE-18167-branch-1-V2.patch
>
>
> In the production environment, we hit a weird scenario where some meta table 
> HFile blocks had gone missing for some reason.
> To recover the environment we tried to rebuild meta using the 
> OfflineMetaRepair tool and restart the cluster, but HMaster couldn't finish 
> its initialization. It always timed out because the namespace table region was 
> never assigned.
> Steps to reproduce
> ==
> 1. Assign the meta table region to HMaster (it can be on any RS, just to 
> reproduce the scenario)
> {noformat}
> <property>
>   <name>hbase.balancer.tablesOnMaster</name>
>   <value>hbase:meta</value>
> </property>
> {noformat}
> 2. Start HMaster and RegionServer
> 3. Create two namespaces, say "ns1" & "ns2"
> 4. Create two tables, "ns1:t1" & "ns2:t1"
> 5. flush 'hbase:meta'
> 6. Stop HMaster (graceful shutdown)
> 7. Kill -9 RegionServer (abnormal shutdown)
> 8. Run OfflineMetaRepair as follows,
> {noformat}
>   hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair -fix
> {noformat}
> 9. Restart HMaster and RegionServer
> 10. HMaster will never be able to finish its initialization and will always 
> abort with the message below,
> {code}
> 2017-06-06 15:11:07,582 FATAL [Hostname:16000.activeMasterManager] 
> master.HMaster: Unhandled exception. Starting shutdown.
> java.io.IOException: Timedout 12ms waiting for namespace table to be 
> assigned
> at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:98)
> at 
> org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1054)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:848)
> at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:199)
> at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1871)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Root cause
> ==
> 1. During HMaster startup, the AssignmentManager assumes a failover scenario 
> based on the existing old WAL files, so SSH/SCP will split the WAL files and 
> assign the regions the dead server was holding.
> 2. During SSH/SCP it retrieves the regions the server was holding from 
> meta/AM's in-memory state, but meta only had the "regioninfo" entry (as it 
> was already rebuilt by OfflineMetaRepair). So an empty region list is 
> returned and no assignment is triggered.
> 3. HMaster, which is waiting for the namespace table to be assigned, will 
> always time out and abort.





[jira] [Updated] (HBASE-18167) OfflineMetaRepair tool may cause HMaster abort always

2017-06-20 Thread Pankaj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Kumar updated HBASE-18167:
-
Attachment: (was: HBASE-18167-branch-1-V2.patch)

> OfflineMetaRepair tool may cause HMaster abort always
> -
>
> Key: HBASE-18167
> URL: https://issues.apache.org/jira/browse/HBASE-18167
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.4.0, 1.3.1, 1.3.2
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
> Fix For: 1.4.0
>
> Attachments: HBASE-18167-branch-1.patch
>
>
> In the production environment, we hit a weird scenario where some meta table 
> HFile blocks had gone missing for some reason.
> To recover the environment we tried to rebuild meta using the 
> OfflineMetaRepair tool and restart the cluster, but HMaster couldn't finish 
> its initialization. It always timed out because the namespace table region was 
> never assigned.
> Steps to reproduce
> ==
> 1. Assign the meta table region to HMaster (it can be on any RS, just to 
> reproduce the scenario)
> {noformat}
> <property>
>   <name>hbase.balancer.tablesOnMaster</name>
>   <value>hbase:meta</value>
> </property>
> {noformat}
> 2. Start HMaster and RegionServer
> 3. Create two namespaces, say "ns1" & "ns2"
> 4. Create two tables, "ns1:t1" & "ns2:t1"
> 5. flush 'hbase:meta'
> 6. Stop HMaster (graceful shutdown)
> 7. Kill -9 RegionServer (abnormal shutdown)
> 8. Run OfflineMetaRepair as follows,
> {noformat}
>   hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair -fix
> {noformat}
> 9. Restart HMaster and RegionServer
> 10. HMaster will never be able to finish its initialization and will always 
> abort with the message below,
> {code}
> 2017-06-06 15:11:07,582 FATAL [Hostname:16000.activeMasterManager] 
> master.HMaster: Unhandled exception. Starting shutdown.
> java.io.IOException: Timedout 12ms waiting for namespace table to be 
> assigned
> at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:98)
> at 
> org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1054)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:848)
> at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:199)
> at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1871)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Root cause
> ==
> 1. During HMaster startup, the AssignmentManager assumes a failover scenario 
> based on the existing old WAL files, so SSH/SCP will split the WAL files and 
> assign the regions the dead server was holding.
> 2. During SSH/SCP it retrieves the regions the server was holding from 
> meta/AM's in-memory state, but meta only had the "regioninfo" entry (as it 
> was already rebuilt by OfflineMetaRepair). So an empty region list is 
> returned and no assignment is triggered.
> 3. HMaster, which is waiting for the namespace table to be assigned, will 
> always time out and abort.





[jira] [Commented] (HBASE-18167) OfflineMetaRepair tool may cause HMaster abort always

2017-06-20 Thread Pankaj Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056815#comment-16056815
 ] 

Pankaj Kumar commented on HBASE-18167:
--

Locally it is successful.

> OfflineMetaRepair tool may cause HMaster abort always
> -
>
> Key: HBASE-18167
> URL: https://issues.apache.org/jira/browse/HBASE-18167
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.4.0, 1.3.1, 1.3.2
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
> Fix For: 1.4.0
>
> Attachments: HBASE-18167-branch-1.patch, HBASE-18167-branch-1-V2.patch
>
>
> In the production environment, we hit a weird scenario where some meta table 
> HFile blocks had gone missing for some reason.
> To recover the environment we tried to rebuild meta using the 
> OfflineMetaRepair tool and restart the cluster, but HMaster couldn't finish 
> its initialization. It always timed out because the namespace table region was 
> never assigned.
> Steps to reproduce
> ==
> 1. Assign the meta table region to HMaster (it can be on any RS, just to 
> reproduce the scenario)
> {noformat}
> <property>
>   <name>hbase.balancer.tablesOnMaster</name>
>   <value>hbase:meta</value>
> </property>
> {noformat}
> 2. Start HMaster and RegionServer
> 3. Create two namespaces, say "ns1" & "ns2"
> 4. Create two tables, "ns1:t1" & "ns2:t1"
> 5. flush 'hbase:meta'
> 6. Stop HMaster (graceful shutdown)
> 7. Kill -9 RegionServer (abnormal shutdown)
> 8. Run OfflineMetaRepair as follows,
> {noformat}
>   hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair -fix
> {noformat}
> 9. Restart HMaster and RegionServer
> 10. HMaster will never be able to finish its initialization and will always 
> abort with the message below,
> {code}
> 2017-06-06 15:11:07,582 FATAL [Hostname:16000.activeMasterManager] 
> master.HMaster: Unhandled exception. Starting shutdown.
> java.io.IOException: Timedout 12ms waiting for namespace table to be 
> assigned
> at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:98)
> at 
> org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1054)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:848)
> at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:199)
> at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1871)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Root cause
> ==
> 1. During HMaster startup, the AssignmentManager assumes a failover scenario 
> based on the existing old WAL files, so SSH/SCP will split the WAL files and 
> assign the regions the dead server was holding.
> 2. During SSH/SCP it retrieves the regions the server was holding from 
> meta/AM's in-memory state, but meta only had the "regioninfo" entry (as it 
> was already rebuilt by OfflineMetaRepair). So an empty region list is 
> returned and no assignment is triggered.
> 3. HMaster, which is waiting for the namespace table to be assigned, will 
> always time out and abort.





[jira] [Comment Edited] (HBASE-18119) Improve HFile readability and modify ChecksumUtil log level

2017-06-20 Thread Qilin Cao (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056786#comment-16056786
 ] 

Qilin Cao edited comment on HBASE-18119 at 6/21/17 1:45 AM:


Hi [~zyork], I think my original test code (modify the key/value, then expect 
an exception) was wrong: the configuration is only created once, so even if the 
test case runs successfully, the modified key/value may affect the other tests 
in the class.


was (Author: qilin cao):
Hi [~zyork], I think my original test code (modify the key/value, then expect 
an exception) was wrong: the configuration is only created once, so if I modify 
the configuration key/value successfully, it may affect the other tests in the 
class.

> Improve HFile readability and modify ChecksumUtil log level
> ---
>
> Key: HBASE-18119
> URL: https://issues.apache.org/jira/browse/HBASE-18119
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Qilin Cao
>Assignee: Qilin Cao
>Priority: Minor
> Attachments: HBASE-18119-v1.patch, HBASE-18119-v2.patch, 
> HBASE-18119-v3.patch, HBASE-18119-v4.patch
>
>
> The HFile.checkHFileVersion method is confusing to read, so I changed the 
> if expression. At the same time, I changed ChecksumUtil's info-level log 
> to trace.





[jira] [Resolved] (HBASE-18210) Implement Table#checkAndDelete()

2017-06-20 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-18210.

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: HBASE-14850

Thanks for the review, Enis.

> Implement Table#checkAndDelete()
> 
>
> Key: HBASE-18210
> URL: https://issues.apache.org/jira/browse/HBASE-18210
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: HBASE-14850
>
> Attachments: 18210.v1.txt
>
>
> This issue is to implement Table#checkAndDelete() API.





[jira] [Updated] (HBASE-18119) Improve HFile readability and modify ChecksumUtil log level

2017-06-20 Thread Qilin Cao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qilin Cao updated HBASE-18119:
--
Attachment: HBASE-18119-v4.patch

> Improve HFile readability and modify ChecksumUtil log level
> ---
>
> Key: HBASE-18119
> URL: https://issues.apache.org/jira/browse/HBASE-18119
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Qilin Cao
>Assignee: Qilin Cao
>Priority: Minor
> Attachments: HBASE-18119-v1.patch, HBASE-18119-v2.patch, 
> HBASE-18119-v3.patch, HBASE-18119-v4.patch
>
>
> The HFile.checkHFileVersion method is confusing to read, so I changed the 
> if expression. At the same time, I changed ChecksumUtil's info-level log 
> to trace.





[jira] [Assigned] (HBASE-18244) org.apache.hadoop.hbase.client.rsgroup.TestShellRSGroups hangs/fails

2017-06-20 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang reassigned HBASE-18244:
--

Assignee: Stephen Yuan Jiang

> org.apache.hadoop.hbase.client.rsgroup.TestShellRSGroups hangs/fails
> 
>
> Key: HBASE-18244
> URL: https://issues.apache.org/jira/browse/HBASE-18244
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: Josh Elser
>Assignee: Stephen Yuan Jiang
> Fix For: 2.0.0
>
>
> Sometime in the past couple of weeks, TestShellRSGroups has started 
> timing-out/failing for me.
> It will get stuck on a call to moveTables()
> {noformat}
> "main" #1 prio=5 os_prio=31 tid=0x7ff012004800 nid=0x1703 in 
> Object.wait() [0x7020d000]
>java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:502)
> at 
> org.apache.hadoop.hbase.ipc.BlockingRpcCallback.get(BlockingRpcCallback.java:62)
> - locked <0x00078d1003f0> (a 
> org.apache.hadoop.hbase.ipc.BlockingRpcCallback)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:328)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$200(AbstractRpcClient.java:94)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:567)
> at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$BlockingStub.execMasterService(MasterProtos.java)
> at 
> org.apache.hadoop.hbase.client.ConnectionImplementation$3.execMasterService(ConnectionImplementation.java:1500)
> at 
> org.apache.hadoop.hbase.client.HBaseAdmin$67$1.rpcCall(HBaseAdmin.java:2991)
> at 
> org.apache.hadoop.hbase.client.HBaseAdmin$67$1.rpcCall(HBaseAdmin.java:2986)
> at 
> org.apache.hadoop.hbase.client.MasterCallable.call(MasterCallable.java:98)
> at 
> org.apache.hadoop.hbase.client.HBaseAdmin$67.callExecService(HBaseAdmin.java:2997)
> at 
> org.apache.hadoop.hbase.client.SyncCoprocessorRpcChannel.callBlockingMethod(SyncCoprocessorRpcChannel.java:69)
> at 
> org.apache.hadoop.hbase.protobuf.generated.RSGroupAdminProtos$RSGroupAdminService$BlockingStub.moveTables(RSGroupAdminProtos.java:13171)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminClient.moveTables(RSGroupAdminClient.java:117)
> {noformat}
> The server-side end of the RPC is waiting on a procedure to finish:
> {noformat}
> "RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=64242" #289 daemon 
> prio=5 os_prio=31 tid=0x7ff015b7c000 nid=0x1e603 waiting on condition 
> [0x7dbc9000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitFor(ProcedureSyncWait.java:184)
> at 
> org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitFor(ProcedureSyncWait.java:171)
> at 
> org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitForProcedureToComplete(ProcedureSyncWait.java:141)
> at 
> org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitForProcedureToCompleteIOE(ProcedureSyncWait.java:130)
> at 
> org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.submitAndWaitProcedure(ProcedureSyncWait.java:123)
> at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.unassign(AssignmentManager.java:478)
> at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.unassign(AssignmentManager.java:465)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.moveTables(RSGroupAdminServer.java:432)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint$RSGroupAdminServiceImpl.moveTables(RSGroupAdminEndpoint.java:174)
> at 
> org.apache.hadoop.hbase.protobuf.generated.RSGroupAdminProtos$RSGroupAdminService.callMethod(RSGroupAdminProtos.java:12786)
> at 
> org.apache.hadoop.hbase.master.MasterRpcServices.execMasterService(MasterRpcServices.java:673)
> at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:406)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:278)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:258)
>Locked ownable synchronizers:
> - None
> {noformat}
> I don't see anything else running in the thread dump, but I do 

[jira] [Updated] (HBASE-18244) org.apache.hadoop.hbase.client.rsgroup.TestShellRSGroups hangs/fails

2017-06-20 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18244:
---
Fix Version/s: (was: 3.0.0)
   2.0.0

> org.apache.hadoop.hbase.client.rsgroup.TestShellRSGroups hangs/fails
> 
>
> Key: HBASE-18244
> URL: https://issues.apache.org/jira/browse/HBASE-18244
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: Josh Elser
> Fix For: 2.0.0
>
>
> Sometime in the past couple of weeks, TestShellRSGroups has started 
> timing-out/failing for me.
> It will get stuck on a call to moveTables()
> {noformat}
> "main" #1 prio=5 os_prio=31 tid=0x7ff012004800 nid=0x1703 in 
> Object.wait() [0x7020d000]
>java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:502)
> at 
> org.apache.hadoop.hbase.ipc.BlockingRpcCallback.get(BlockingRpcCallback.java:62)
> - locked <0x00078d1003f0> (a 
> org.apache.hadoop.hbase.ipc.BlockingRpcCallback)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:328)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$200(AbstractRpcClient.java:94)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:567)
> at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$BlockingStub.execMasterService(MasterProtos.java)
> at 
> org.apache.hadoop.hbase.client.ConnectionImplementation$3.execMasterService(ConnectionImplementation.java:1500)
> at 
> org.apache.hadoop.hbase.client.HBaseAdmin$67$1.rpcCall(HBaseAdmin.java:2991)
> at 
> org.apache.hadoop.hbase.client.HBaseAdmin$67$1.rpcCall(HBaseAdmin.java:2986)
> at 
> org.apache.hadoop.hbase.client.MasterCallable.call(MasterCallable.java:98)
> at 
> org.apache.hadoop.hbase.client.HBaseAdmin$67.callExecService(HBaseAdmin.java:2997)
> at 
> org.apache.hadoop.hbase.client.SyncCoprocessorRpcChannel.callBlockingMethod(SyncCoprocessorRpcChannel.java:69)
> at 
> org.apache.hadoop.hbase.protobuf.generated.RSGroupAdminProtos$RSGroupAdminService$BlockingStub.moveTables(RSGroupAdminProtos.java:13171)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminClient.moveTables(RSGroupAdminClient.java:117)
> {noformat}
> The server-side end of the RPC is waiting on a procedure to finish:
> {noformat}
> "RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=64242" #289 daemon 
> prio=5 os_prio=31 tid=0x7ff015b7c000 nid=0x1e603 waiting on condition 
> [0x7dbc9000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitFor(ProcedureSyncWait.java:184)
> at 
> org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitFor(ProcedureSyncWait.java:171)
> at 
> org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitForProcedureToComplete(ProcedureSyncWait.java:141)
> at 
> org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitForProcedureToCompleteIOE(ProcedureSyncWait.java:130)
> at 
> org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.submitAndWaitProcedure(ProcedureSyncWait.java:123)
> at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.unassign(AssignmentManager.java:478)
> at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.unassign(AssignmentManager.java:465)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.moveTables(RSGroupAdminServer.java:432)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint$RSGroupAdminServiceImpl.moveTables(RSGroupAdminEndpoint.java:174)
> at 
> org.apache.hadoop.hbase.protobuf.generated.RSGroupAdminProtos$RSGroupAdminService.callMethod(RSGroupAdminProtos.java:12786)
> at 
> org.apache.hadoop.hbase.master.MasterRpcServices.execMasterService(MasterRpcServices.java:673)
> at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:406)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:278)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:258)
>Locked ownable synchronizers:
> - None
> {noformat}
> I don't see anything else running in the thread dump, but I do see that meta 
> was 

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056801#comment-16056801
 ] 

Ted Yu commented on HBASE-17125:


When I tested my tentative patch, TestImportTSVWithVisibilityLabels hung.

TestImportTSVWithVisibilityLabels#testBulkOutputWithTsvImporterTextMapperWithInvalidLabels
 seems to be the last subtest running before I killed the surefire process.

FYI

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch
>
>
> Assume a column's max versions is 3 and we write 4 versions of this column. 
> The oldest version is not removed immediately, but from the user's view it is 
> gone. When a user queries with a filter, if the filter skips a newer version, 
> the oldest version becomes visible again; after the region is compacted, the 
> oldest version is never seen. This is confusing for the user: the same query 
> returns inconsistent results before and after region compaction.
> The reason is the matchColumn method of UserScanQueryMatcher. It first checks 
> the cell against the filter, then checks the number of versions needed. So if 
> the filter skips a newer version, the not-yet-removed oldest version becomes 
> visible again.
> After an offline discussion with [~Apache9] and [~fenghh], we now have two 
> solutions for this problem. The first idea is to check the number of versions 
> first, then check the cell against the filter. Per the javadoc of setFilter, 
> the filter is called after all tests for ttl, column match, deletes, and max 
> versions have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem: if a column's max versions is 5 and the 
> user's query only needs 3 versions, checking the version count first and the 
> filter second may return fewer than 3 cells, even though 2 more versions that 
> were never read still exist.
> So the second idea has three steps:
> 1. check against the max versions of this column
> 2. check the kv against the filter
> 3. check the number of versions the user needs
> But this makes the ScanQueryMatcher more complicated, and it breaks the 
> javadoc of Query.setFilter.
> We don't have a final solution for this problem yet. Suggestions are welcome.
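To make the ordering problem concrete, here is a toy sketch in plain Java (not HBase code; the simplified matcher and all names are illustrative only). With 4 stored versions, max versions 3, and a filter that skips the newest version, the current filter-first order resurrects the logically deleted oldest version, while the proposed versions-first order matches the post-compaction view:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

// Hypothetical toy model of the matcher ordering issue; not HBase code.
public class MatchOrderDemo {
    // Stored versions (timestamps, newest first); max versions = 3,
    // so ts=1 is logically deleted but not yet compacted away.
    static final long[] STORED = {4, 3, 2, 1};
    static final int MAX_VERSIONS = 3;

    // Current behavior: filter runs first, version count second.
    static List<Long> filterThenCount(LongPredicate filterAccepts) {
        List<Long> out = new ArrayList<>();
        for (long ts : STORED) {
            if (!filterAccepts.test(ts)) continue; // filter skips the cell
            out.add(ts);
            if (out.size() == MAX_VERSIONS) break;
        }
        return out;
    }

    // First proposed fix: enforce max versions before running the filter.
    static List<Long> countThenFilter(LongPredicate filterAccepts) {
        List<Long> out = new ArrayList<>();
        int seen = 0;
        for (long ts : STORED) {
            if (++seen > MAX_VERSIONS) break; // ts=1 never reaches the filter
            if (filterAccepts.test(ts)) out.add(ts);
        }
        return out;
    }

    public static void main(String[] args) {
        LongPredicate skipNewest = ts -> ts != 4; // filter skips ts=4
        // Before compaction, the filter-first order resurrects ts=1:
        System.out.println(filterThenCount(skipNewest)); // [3, 2, 1]
        // Checking versions first matches the post-compaction view:
        System.out.println(countThenFilter(skipNewest)); // [3, 2]
    }
}
```

After compaction the stored array would be {4, 3, 2}, so both orders would return [3, 2]; the divergence of the first path is exactly the reported inconsistency.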



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18119) Improve HFile readability and modify ChecksumUtil log level

2017-06-20 Thread Qilin Cao (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056786#comment-16056786
 ] 

Qilin Cao commented on HBASE-18119:
---

Hi [~zyork], I think my original test code (modify the key/value, then expect 
an exception) was wrong: the configuration is only created once, so if I 
modify a configuration key/value successfully, it may affect the class's other 
tests.
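The isolation hazard described above can be illustrated with a small sketch, using plain java.util.Properties as a stand-in for the once-per-class shared Configuration (all names here are hypothetical):

```java
import java.util.Properties;

// Illustration of the shared-config hazard: a test class creates one shared
// config object, so tests must copy it before mutating rather than modify it
// in place. Properties stands in for the real Configuration type.
public class SharedConfigDemo {
    static Properties shared = new Properties();
    static { shared.setProperty("key", "original"); }

    // Safer pattern: each test works on its own copy of the shared config.
    static Properties copyOf(Properties src) {
        Properties copy = new Properties();
        copy.putAll(src);
        return copy;
    }

    public static void main(String[] args) {
        Properties local = copyOf(shared);
        local.setProperty("key", "mutated");
        // The shared instance is untouched, so later tests see the original:
        System.out.println(shared.getProperty("key")); // original
        System.out.println(local.getProperty("key"));  // mutated
    }
}
```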

> Improve HFile readability and modify ChecksumUtil log level
> ---
>
> Key: HBASE-18119
> URL: https://issues.apache.org/jira/browse/HBASE-18119
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Qilin Cao
>Assignee: Qilin Cao
>Priority: Minor
> Attachments: HBASE-18119-v1.patch, HBASE-18119-v2.patch, 
> HBASE-18119-v3.patch
>
>
> It is confused when I read the HFile.checkHFileVersion method, so I change 
> the if expression. Simultaneously, I change ChecksumUtil the info log level 
> to trace.





[jira] [Updated] (HBASE-18248) Warn if monitored task has been tied up beyond a configurable threshold

2017-06-20 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18248:
---
Status: Patch Available  (was: Open)

> Warn if monitored task has been tied up beyond a configurable threshold
> ---
>
> Key: HBASE-18248
> URL: https://issues.apache.org/jira/browse/HBASE-18248
> Project: HBase
>  Issue Type: Improvement
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Fix For: 2.0.0, 3.0.0, 1.4.0, 1.3.2
>
> Attachments: HBASE-18248-branch-1.3.patch, 
> HBASE-18248-branch-1.patch, HBASE-18248-branch-2.patch, HBASE-18248.patch
>
>
> Warn if monitored task has been tied up beyond a configurable threshold. We 
> especially want to do this for RPC tasks. Use a separate threshold for 
> warning about stuck RPC tasks versus other types of tasks.
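As a sketch of the proposed check (thresholds, field names, and the two-tier split below are illustrative, not necessarily what the patch implements), a periodic chore could compare each monitored task's age against a per-type threshold and warn when it is exceeded:

```java
// Hypothetical sketch: decide whether a monitored task has been tied up
// long enough to warrant a warning, with a stricter threshold for RPC tasks.
public class StuckTaskWarner {
    static final long RPC_WARN_MS = 30_000;    // stricter threshold for RPC tasks
    static final long OTHER_WARN_MS = 120_000; // general tasks may run longer

    // Returns true when the task's age exceeds the threshold for its type.
    static boolean shouldWarn(boolean isRpcTask, long startTimeMs, long nowMs) {
        long threshold = isRpcTask ? RPC_WARN_MS : OTHER_WARN_MS;
        return nowMs - startTimeMs > threshold;
    }

    public static void main(String[] args) {
        long now = 200_000;
        System.out.println(shouldWarn(true, 100_000, now));  // RPC task, tied up 100s
        System.out.println(shouldWarn(false, 100_000, now)); // general task, 100s
    }
}
```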





[jira] [Updated] (HBASE-18248) Warn if monitored task has been tied up beyond a configurable threshold

2017-06-20 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18248:
---
Attachment: HBASE-18248.patch

> Warn if monitored task has been tied up beyond a configurable threshold
> ---
>
> Key: HBASE-18248
> URL: https://issues.apache.org/jira/browse/HBASE-18248
> Project: HBase
>  Issue Type: Improvement
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Fix For: 2.0.0, 3.0.0, 1.4.0, 1.3.2
>
> Attachments: HBASE-18248-branch-1.3.patch, 
> HBASE-18248-branch-1.patch, HBASE-18248-branch-2.patch, HBASE-18248.patch
>
>
> Warn if monitored task has been tied up beyond a configurable threshold. We 
> especially want to do this for RPC tasks. Use a separate threshold for 
> warning about stuck RPC tasks versus other types of tasks.





[jira] [Updated] (HBASE-18248) Warn if monitored task has been tied up beyond a configurable threshold

2017-06-20 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18248:
---
Attachment: HBASE-18248-branch-2.patch

> Warn if monitored task has been tied up beyond a configurable threshold
> ---
>
> Key: HBASE-18248
> URL: https://issues.apache.org/jira/browse/HBASE-18248
> Project: HBase
>  Issue Type: Improvement
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Fix For: 2.0.0, 3.0.0, 1.4.0, 1.3.2
>
> Attachments: HBASE-18248-branch-1.3.patch, 
> HBASE-18248-branch-1.patch, HBASE-18248-branch-2.patch
>
>
> Warn if monitored task has been tied up beyond a configurable threshold. We 
> especially want to do this for RPC tasks. Use a separate threshold for 
> warning about stuck RPC tasks versus other types of tasks.





[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-20 Thread Guanghao Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056759#comment-16056759
 ] 

Guanghao Zhang commented on HBASE-17125:


bq. Here is snippet showing the concept of slack:
bq. IMHO we should address this issue on our own (Like what Ted is trying to 
suggest here).
Ok. Let me try it.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch
>
>
> Assume a column's max versions is 3 and we write 4 versions of this column. 
> The oldest version is not removed immediately, but from the user's view it is 
> gone. When a user queries with a filter, if the filter skips a newer version, 
> the oldest version becomes visible again; after the region is compacted, the 
> oldest version is never seen. This is confusing for the user: the same query 
> returns inconsistent results before and after region compaction.
> The reason is the matchColumn method of UserScanQueryMatcher. It first checks 
> the cell against the filter, then checks the number of versions needed. So if 
> the filter skips a newer version, the not-yet-removed oldest version becomes 
> visible again.
> After an offline discussion with [~Apache9] and [~fenghh], we now have two 
> solutions for this problem. The first idea is to check the number of versions 
> first, then check the cell against the filter. Per the javadoc of setFilter, 
> the filter is called after all tests for ttl, column match, deletes, and max 
> versions have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem: if a column's max versions is 5 and the 
> user's query only needs 3 versions, checking the version count first and the 
> filter second may return fewer than 3 cells, even though 2 more versions that 
> were never read still exist.
> So the second idea has three steps:
> 1. check against the max versions of this column
> 2. check the kv against the filter
> 3. check the number of versions the user needs
> But this makes the ScanQueryMatcher more complicated, and it breaks the 
> javadoc of Query.setFilter.
> We don't have a final solution for this problem yet. Suggestions are welcome.





[jira] [Updated] (HBASE-18248) Warn if monitored task has been tied up beyond a configurable threshold

2017-06-20 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18248:
---
Attachment: HBASE-18248-branch-1.patch
HBASE-18248-branch-1.3.patch

> Warn if monitored task has been tied up beyond a configurable threshold
> ---
>
> Key: HBASE-18248
> URL: https://issues.apache.org/jira/browse/HBASE-18248
> Project: HBase
>  Issue Type: Improvement
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Fix For: 2.0.0, 3.0.0, 1.4.0, 1.3.2
>
> Attachments: HBASE-18248-branch-1.3.patch, HBASE-18248-branch-1.patch
>
>
> Warn if monitored task has been tied up beyond a configurable threshold. We 
> especially want to do this for RPC tasks. Use a separate threshold for 
> warning about stuck RPC tasks versus other types of tasks.





[jira] [Commented] (HBASE-16616) Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry

2017-06-20 Thread Tomu Tsuruhara (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056756#comment-16056756
 ] 

Tomu Tsuruhara commented on HBASE-16616:


[~dvdreddy] You're right. The patch here is not enough; we also needed the 
patch in HBASE-16146 to resolve the issue. 


> Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry
> --
>
> Key: HBASE-16616
> URL: https://issues.apache.org/jira/browse/HBASE-16616
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Affects Versions: 1.2.2
>Reporter: Tomu Tsuruhara
>Assignee: Tomu Tsuruhara
> Fix For: 2.0.0, 1.4.0
>
> Attachments: 16616.branch-1.v2.txt, HBASE-16616.master.001.patch, 
> HBASE-16616.master.002.patch, ScreenShot 2016-09-09 14.17.53.png
>
>
> In our HBase 1.2.2 cluster, some regionservers showed a very bad 
> "QueueCallTime_99th_percentile", exceeding 10 seconds.
> Most RPC handler threads were stuck in the ThreadLocalMap.expungeStaleEntry 
> call at that time.
> {noformat}
> "PriorityRpcServer.handler=18,queue=0,port=16020" #322 daemon prio=5 
> os_prio=0 tid=0x7fd422062800 nid=0x19b89 runnable [0x7fcb8a821000]
>java.lang.Thread.State: RUNNABLE
> at 
> java.lang.ThreadLocal$ThreadLocalMap.expungeStaleEntry(ThreadLocal.java:617)
> at java.lang.ThreadLocal$ThreadLocalMap.remove(ThreadLocal.java:499)
> at 
> java.lang.ThreadLocal$ThreadLocalMap.access$200(ThreadLocal.java:298)
> at java.lang.ThreadLocal.remove(ThreadLocal.java:222)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryReleaseShared(ReentrantReadWriteLock.java:426)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.releaseShared(AbstractQueuedSynchronizer.java:1341)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.unlock(ReentrantReadWriteLock.java:881)
> at 
> com.yammer.metrics.stats.ExponentiallyDecayingSample.unlockForRegularUsage(ExponentiallyDecayingSample.java:196)
> at 
> com.yammer.metrics.stats.ExponentiallyDecayingSample.update(ExponentiallyDecayingSample.java:113)
> at 
> com.yammer.metrics.stats.ExponentiallyDecayingSample.update(ExponentiallyDecayingSample.java:81)
> at 
> org.apache.hadoop.metrics2.lib.MutableHistogram.add(MutableHistogram.java:81)
> at 
> org.apache.hadoop.metrics2.lib.MutableRangeHistogram.add(MutableRangeHistogram.java:59)
> at 
> org.apache.hadoop.hbase.ipc.MetricsHBaseServerSourceImpl.dequeuedCall(MetricsHBaseServerSourceImpl.java:194)
> at 
> org.apache.hadoop.hbase.ipc.MetricsHBaseServer.dequeuedCall(MetricsHBaseServer.java:76)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2192)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> We were using jdk 1.8.0_92 and here is a snippet from ThreadLocal.java.
> {code}
> 616:while (tab[h] != null)
> 617:h = nextIndex(h, len);
> {code}
> So I hypothesized that there were too many consecutive entries in the {{tab}} 
> array, and I did find them in the heap dump.
> !ScreenShot 2016-09-09 14.17.53.png|width=50%!
> Most of these entries pointed at instances of 
> {{org.apache.hadoop.hbase.util.Counter$1}}, which is equivalent to the 
> {{indexHolderThreadLocal}} instance variable in the {{Counter}} class.
> Because the {{RpcServer$Connection}} class creates a {{Counter}} instance 
> {{rpcCount}} for every connection, the RegionServer process can accumulate 
> many {{Counter#indexHolderThreadLocal}} instances when a client repeatedly 
> connects and closes. As a result, a ThreadLocalMap can end up with many 
> consecutive entries.
> Usually, since each entry is a {{WeakReference}}, these entries are collected 
> and removed by the garbage collector soon after the connection closes.
> But if a connection lives long enough to survive young GC, its entries aren't 
> collected until the old-gen collector runs. Furthermore, under G1GC they may 
> not be collected even by old-gen (mixed) GC if they sit in a region that 
> doesn't contain much garbage. We were in fact using G1GC when we encountered 
> this problem.
> We should remove the entry from the ThreadLocalMap by calling 
> ThreadLocal#remove explicitly.
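The proposed fix can be sketched as follows (illustrative names, not the actual Counter implementation): call ThreadLocal#remove on connection teardown, so the map entry is dropped immediately instead of lingering until GC clears the WeakReference:

```java
// Minimal sketch of the fix: explicitly remove the per-connection
// ThreadLocal entry when the connection closes, instead of waiting for the
// stale WeakReference entry to be expunged later.
public class ConnectionCounterDemo {
    // Stand-in for Counter#indexHolderThreadLocal: one mutable slot per thread.
    static final ThreadLocal<long[]> indexHolder =
        ThreadLocal.withInitial(() -> new long[1]);

    static void handleRequest() {
        indexHolder.get()[0]++; // bump this thread's counter slot
    }

    // Called from connection teardown: drops this thread's map entry
    // immediately, so stale entries cannot pile up in the ThreadLocalMap.
    static void closeConnection() {
        indexHolder.remove();
    }

    public static void main(String[] args) {
        handleRequest();
        handleRequest();
        System.out.println(indexHolder.get()[0]); // 2
        closeConnection();
        // After remove(), the next get() re-runs the initializer:
        System.out.println(indexHolder.get()[0]); // 0
    }
}
```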





[jira] [Commented] (HBASE-18210) Implement Table#checkAndDelete()

2017-06-20 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056752#comment-16056752
 ] 

Enis Soztutar commented on HBASE-18210:
---

+1. 

> Implement Table#checkAndDelete()
> 
>
> Key: HBASE-18210
> URL: https://issues.apache.org/jira/browse/HBASE-18210
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 18210.v1.txt
>
>
> This issue is to implement Table#checkAndDelete() API.





[jira] [Created] (HBASE-18248) Warn if monitored task has been tied up beyond a configurable threshold

2017-06-20 Thread Andrew Purtell (JIRA)
Andrew Purtell created HBASE-18248:
--

 Summary: Warn if monitored task has been tied up beyond a 
configurable threshold
 Key: HBASE-18248
 URL: https://issues.apache.org/jira/browse/HBASE-18248
 Project: HBase
  Issue Type: Improvement
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 2.0.0, 3.0.0, 1.4.0, 1.3.2


Warn if monitored task has been tied up beyond a configurable threshold. We 
especially want to do this for RPC tasks. Use a separate threshold for warning 
about stuck RPC tasks versus other types of tasks.





[jira] [Comment Edited] (HBASE-18226) Disable reverse DNS lookup at HMaster and use default hostname provided by RegionServer

2017-06-20 Thread Duo Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056750#comment-16056750
 ] 

Duo Xu edited comment on HBASE-18226 at 6/21/17 12:54 AM:
--

Thanks [~tedyu]!

Attach a new patch and trigger the build.


was (Author: onpduo):
Thanks [~tedyu]!

Attache a new patch and trigger the build.

> Disable reverse DNS lookup at HMaster and use default hostname provided by 
> RegionServer
> ---
>
> Key: HBASE-18226
> URL: https://issues.apache.org/jira/browse/HBASE-18226
> Project: HBase
>  Issue Type: Bug
>Reporter: Duo Xu
>Assignee: Duo Xu
> Attachments: HBASE-18226.001.patch, HBASE-18226.002.patch, 
> HBASE-18226.003.patch, HBASE-18226.004.patch, HBASE-18226.005.patch, 
> HBASE-18226.006.patch
>
>
> Description updated:
> In some unusual network environments, forward DNS lookup works while reverse 
> DNS lookup may not work properly.
> This JIRA makes HMaster use the hostname passed from the RS, instead of doing 
> a reverse DNS lookup, to tell the RS which hostname to use during 
> reportForDuty(). The mechanism already exists: HBASE-12954 added the 
> "useThisHostnameInstead" field to RegionServerStatusProtos.
> Currently "useThisHostnameInstead" is optional, and by default the RS only 
> passes port, server start code, and server current time to HMaster during 
> reportForDuty(). To use this field, users currently need to specify 
> "hbase.regionserver.hostname" in every regionserver node's hbase-site.xml. 
> This causes trouble in
> 1. deployments managed by tools like Ambari, which maintain the same copy of 
> hbase-site.xml across all nodes.
> 2. HBASE-12954 targets multihomed hosts, where users want to set the hostname 
> manually for each node. In the other (not multihomed) cases, I just want the 
> RS to use the hostname returned by the node, set it in 
> useThisHostnameInstead, and pass it to HMaster during reportForDuty().
> I would like to introduce a setting such that, when it is set to true, 
> "useThisHostnameInstead" is filled with the hostname the RS gets from the 
> node. HMaster then skips the reverse DNS lookup because it sees the 
> "useThisHostnameInstead" field set in the request.
> Is "hbase.regionserver.hostname.reported.to.master" a good name?
> 
> Regarding the hostname returned by the RS node, I read the source code again 
> (including hadoop-common dns.java). By default the RS gets its hostname by 
> calling InetAddress.getLocalHost().getCanonicalHostName(). If users specify 
> "hbase.regionserver.dns.interface" or "hbase.regionserver.dns.nameserver", or 
> some underlying system configuration changes (e.g. modifying 
> /etc/nsswitch.conf), it may read from DNS or other sources first instead of 
> checking the /etc/hosts file first.
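A minimal sketch of the proposed decision (the flag corresponds to the suggested "hbase.regionserver.hostname.reported.to.master" name from the discussion; all names and signatures here are illustrative, not the actual RegionServer code):

```java
import java.util.function.Supplier;

// Hypothetical sketch: decide what the RS puts in the
// useThisHostnameInstead request field during reportForDuty().
public class ReportHostnameDemo {
    static String hostnameToReport(boolean reportLocalHostname,
                                   String explicitOverride,
                                   Supplier<String> localHostname) {
        // An explicit hbase.regionserver.hostname still wins, as today.
        if (explicitOverride != null && !explicitOverride.isEmpty()) {
            return explicitOverride;
        }
        // New flag on: report the locally resolved hostname, so the master
        // can skip its reverse DNS lookup entirely.
        if (reportLocalHostname) {
            return localHostname.get();
        }
        return null; // field left unset; master falls back to reverse DNS
    }

    public static void main(String[] args) {
        // Stand-in for InetAddress.getLocalHost().getCanonicalHostName():
        Supplier<String> local = () -> "rs1.internal.example";
        System.out.println(hostnameToReport(false, "override.example", local));
        System.out.println(hostnameToReport(true, null, local));
        System.out.println(hostnameToReport(false, null, local));
    }
}
```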





[jira] [Updated] (HBASE-18226) Disable reverse DNS lookup at HMaster and use default hostname provided by RegionServer

2017-06-20 Thread Duo Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Xu updated HBASE-18226:
---
Attachment: HBASE-18226.006.patch

> Disable reverse DNS lookup at HMaster and use default hostname provided by 
> RegionServer
> ---
>
> Key: HBASE-18226
> URL: https://issues.apache.org/jira/browse/HBASE-18226
> Project: HBase
>  Issue Type: Bug
>Reporter: Duo Xu
>Assignee: Duo Xu
> Attachments: HBASE-18226.001.patch, HBASE-18226.002.patch, 
> HBASE-18226.003.patch, HBASE-18226.004.patch, HBASE-18226.005.patch, 
> HBASE-18226.006.patch
>
>
> Description updated:
> In some unusual network environments, forward DNS lookup works while reverse 
> DNS lookup may not work properly.
> This JIRA makes HMaster use the hostname passed from the RS, instead of doing 
> a reverse DNS lookup, to tell the RS which hostname to use during 
> reportForDuty(). The mechanism already exists: HBASE-12954 added the 
> "useThisHostnameInstead" field to RegionServerStatusProtos.
> Currently "useThisHostnameInstead" is optional, and by default the RS only 
> passes port, server start code, and server current time to HMaster during 
> reportForDuty(). To use this field, users currently need to specify 
> "hbase.regionserver.hostname" in every regionserver node's hbase-site.xml. 
> This causes trouble in
> 1. deployments managed by tools like Ambari, which maintain the same copy of 
> hbase-site.xml across all nodes.
> 2. HBASE-12954 targets multihomed hosts, where users want to set the hostname 
> manually for each node. In the other (not multihomed) cases, I just want the 
> RS to use the hostname returned by the node, set it in 
> useThisHostnameInstead, and pass it to HMaster during reportForDuty().
> I would like to introduce a setting such that, when it is set to true, 
> "useThisHostnameInstead" is filled with the hostname the RS gets from the 
> node. HMaster then skips the reverse DNS lookup because it sees the 
> "useThisHostnameInstead" field set in the request.
> Is "hbase.regionserver.hostname.reported.to.master" a good name?
> 
> Regarding the hostname returned by the RS node, I read the source code again 
> (including hadoop-common dns.java). By default the RS gets its hostname by 
> calling InetAddress.getLocalHost().getCanonicalHostName(). If users specify 
> "hbase.regionserver.dns.interface" or "hbase.regionserver.dns.nameserver", or 
> some underlying system configuration changes (e.g. modifying 
> /etc/nsswitch.conf), it may read from DNS or other sources first instead of 
> checking the /etc/hosts file first.





[jira] [Updated] (HBASE-18226) Disable reverse DNS lookup at HMaster and use default hostname provided by RegionServer

2017-06-20 Thread Duo Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Xu updated HBASE-18226:
---
Status: Patch Available  (was: Open)

Thanks [~tedyu]!

Attach a new patch and trigger the build.

> Disable reverse DNS lookup at HMaster and use default hostname provided by 
> RegionServer
> ---
>
> Key: HBASE-18226
> URL: https://issues.apache.org/jira/browse/HBASE-18226
> Project: HBase
>  Issue Type: Bug
>Reporter: Duo Xu
>Assignee: Duo Xu
> Attachments: HBASE-18226.001.patch, HBASE-18226.002.patch, 
> HBASE-18226.003.patch, HBASE-18226.004.patch, HBASE-18226.005.patch, 
> HBASE-18226.006.patch
>
>
> Description updated:
> In some unusual network environments, forward DNS lookup works while reverse 
> DNS lookup may not work properly.
> This JIRA makes HMaster use the hostname passed from the RS, instead of doing 
> a reverse DNS lookup, to tell the RS which hostname to use during 
> reportForDuty(). The mechanism already exists: HBASE-12954 added the 
> "useThisHostnameInstead" field to RegionServerStatusProtos.
> Currently "useThisHostnameInstead" is optional, and by default the RS only 
> passes port, server start code, and server current time to HMaster during 
> reportForDuty(). To use this field, users currently need to specify 
> "hbase.regionserver.hostname" in every regionserver node's hbase-site.xml. 
> This causes trouble in
> 1. deployments managed by tools like Ambari, which maintain the same copy of 
> hbase-site.xml across all nodes.
> 2. HBASE-12954 targets multihomed hosts, where users want to set the hostname 
> manually for each node. In the other (not multihomed) cases, I just want the 
> RS to use the hostname returned by the node, set it in 
> useThisHostnameInstead, and pass it to HMaster during reportForDuty().
> I would like to introduce a setting such that, when it is set to true, 
> "useThisHostnameInstead" is filled with the hostname the RS gets from the 
> node. HMaster then skips the reverse DNS lookup because it sees the 
> "useThisHostnameInstead" field set in the request.
> Is "hbase.regionserver.hostname.reported.to.master" a good name?
> 
> Regarding the hostname returned by the RS node, I read the source code again 
> (including hadoop-common dns.java). By default the RS gets its hostname by 
> calling InetAddress.getLocalHost().getCanonicalHostName(). If users specify 
> "hbase.regionserver.dns.interface" or "hbase.regionserver.dns.nameserver", or 
> some underlying system configuration changes (e.g. modifying 
> /etc/nsswitch.conf), it may read from DNS or other sources first instead of 
> checking the /etc/hosts file first.





[jira] [Updated] (HBASE-18226) Disable reverse DNS lookup at HMaster and use default hostname provided by RegionServer

2017-06-20 Thread Duo Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Xu updated HBASE-18226:
---
Status: Open  (was: Patch Available)

> Disable reverse DNS lookup at HMaster and use default hostname provided by 
> RegionServer
> ---
>
> Key: HBASE-18226
> URL: https://issues.apache.org/jira/browse/HBASE-18226
> Project: HBase
>  Issue Type: Bug
>Reporter: Duo Xu
>Assignee: Duo Xu
> Attachments: HBASE-18226.001.patch, HBASE-18226.002.patch, 
> HBASE-18226.003.patch, HBASE-18226.004.patch, HBASE-18226.005.patch
>
>
> Description updated:
> In some unusual network environments, forward DNS lookup works while reverse 
> DNS lookup may not work properly.
> This JIRA makes HMaster use the hostname passed from the RS, instead of doing 
> a reverse DNS lookup, to tell the RS which hostname to use during 
> reportForDuty(). The mechanism already exists: HBASE-12954 added the 
> "useThisHostnameInstead" field to RegionServerStatusProtos.
> Currently "useThisHostnameInstead" is optional, and by default the RS only 
> passes port, server start code, and server current time to HMaster during 
> reportForDuty(). To use this field, users currently need to specify 
> "hbase.regionserver.hostname" in every regionserver node's hbase-site.xml. 
> This causes trouble in
> 1. deployments managed by tools like Ambari, which maintain the same copy of 
> hbase-site.xml across all nodes.
> 2. HBASE-12954 targets multihomed hosts, where users want to set the hostname 
> manually for each node. In the other (not multihomed) cases, I just want the 
> RS to use the hostname returned by the node, set it in 
> useThisHostnameInstead, and pass it to HMaster during reportForDuty().
> I would like to introduce a setting such that, when it is set to true, 
> "useThisHostnameInstead" is filled with the hostname the RS gets from the 
> node. HMaster then skips the reverse DNS lookup because it sees the 
> "useThisHostnameInstead" field set in the request.
> Is "hbase.regionserver.hostname.reported.to.master" a good name?
> 
> Regarding the hostname returned by the RS node, I read the source code again 
> (including hadoop-common dns.java). By default the RS gets its hostname by 
> calling InetAddress.getLocalHost().getCanonicalHostName(). If users specify 
> "hbase.regionserver.dns.interface" or "hbase.regionserver.dns.nameserver", or 
> some underlying system configuration changes (e.g. modifying 
> /etc/nsswitch.conf), it may read from DNS or other sources first instead of 
> checking the /etc/hosts file first.





[jira] [Commented] (HBASE-18036) HBase 1.x : Data locality is not maintained after cluster restart or SSH

2017-06-20 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056744#comment-16056744
 ] 

Enis Soztutar commented on HBASE-18036:
---

bq. The potential change would be more involved than we have now in 1.x code 
base. I open HBASE-18246 to track it (FYI, stack).
Fair enough. Thanks. 

> HBase 1.x : Data locality is not maintained after cluster restart or SSH
> 
>
> Key: HBASE-18036
> URL: https://issues.apache.org/jira/browse/HBASE-18036
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 1.4.0, 1.3.1, 1.2.5, 1.1.10
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 1.4.0, 1.3.2, 1.1.11, 1.2.7
>
> Attachments: HBASE-18036.v0-branch-1.1.patch, 
> HBASE-18036.v0-branch-1.patch, HBASE-18036.v1-branch-1.1.patch, 
> HBASE-18036.v2-branch-1.1.patch
>
>
> After HBASE-2896 / HBASE-4402, we think data locality is maintained after 
> cluster restart. However, we have seen some complaints about data locality 
> loss when the cluster restarts (e.g. HBASE-17963).
> Examining the AssignmentManager#processDeadServersAndRegionsInTransition() 
> code, for a cluster start I expected to hit the following code path:
> {code}
> if (!failover) {
>   // Fresh cluster startup.
>   LOG.info("Clean cluster startup. Assigning user regions");
>   assignAllUserRegions(allRegions);
> }
> {code}
> where assignAllUserRegions would use the retainAssignment() call in the 
> LoadBalancer; however, from the master log, we usually hit the failover code path:
> {code}
> // If we found user regions out on cluster, its a failover.
> if (failover) {
>   LOG.info("Found regions out on cluster or in RIT; presuming failover");
>   // Process list of dead servers and regions in RIT.
>   // See HBASE-4580 for more information.
>   processDeadServersAndRecoverLostRegions(deadServers);
> }
> {code}
> where processDeadServersAndRecoverLostRegions() would put dead servers in SSH, 
> and SSH uses roundRobinAssignment() in the LoadBalancer. That is why we lose 
> locality more often than we retain it during cluster restart.
> Note: the code I was looking at is close to branch-1 and branch-1.1.
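The decision the description walks through can be sketched as a single branch; this is an illustrative simplification, not the actual AssignmentManager code (the class and method names below are made up):

```java
public class StartupPathSketch {
    // If any user regions are found out on the cluster (or in RIT), the
    // master presumes failover: dead servers go through SSH, which assigns
    // with roundRobinAssignment() and so loses locality. Only a clean
    // startup takes assignAllUserRegions(), which uses retainAssignment()
    // to put regions back on their previous hosts.
    static String assignmentStrategy(boolean failover) {
        return failover ? "roundRobinAssignment" : "retainAssignment";
    }

    public static void main(String[] args) {
        System.out.println(assignmentStrategy(true));  // prints "roundRobinAssignment"
        System.out.println(assignmentStrategy(false)); // prints "retainAssignment"
    }
}
```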





[jira] [Commented] (HBASE-18023) Log multi-* requests for more than threshold number of rows

2017-06-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056742#comment-16056742
 ] 

Hadoop QA commented on HBASE-18023:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 30s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 24s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
52s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 4s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
48s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
34s {color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 45s 
{color} | {color:red} hbase-common in master has 2 extant Findbugs warnings. 
{color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 50s 
{color} | {color:red} hbase-server in master has 12 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 10s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 10s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
35m 11s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 4s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 36s 
{color} | {color:green} hbase-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 30m 42s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 28s 
{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 90m 13s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.TestCheckTestClasses |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.10.1 Server=1.10.1 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12873757/HBASE-18023.master.002.patch
 |
| JIRA Issue | HBASE-18023 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  xml  |
| uname | Linux 9ef592f24b2e 3.19.0-25-generic #26~14.04.1-Ubuntu SMP Fri Jul 
24 21:16:20 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 5b485d1 |
| Default 

[jira] [Commented] (HBASE-18226) Disable reverse DNS lookup at HMaster and use default hostname provided by RegionServer

2017-06-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056737#comment-16056737
 ] 

Ted Yu commented on HBASE-18226:


Changing status is not enough to trigger QA.
Each attachment is associated with an Id. The Id for most recent attachment has 
been used.

You need to attach again.

> Disable reverse DNS lookup at HMaster and use default hostname provided by 
> RegionServer
> ---
>
> Key: HBASE-18226
> URL: https://issues.apache.org/jira/browse/HBASE-18226
> Project: HBase
>  Issue Type: Bug
>Reporter: Duo Xu
>Assignee: Duo Xu
> Attachments: HBASE-18226.001.patch, HBASE-18226.002.patch, 
> HBASE-18226.003.patch, HBASE-18226.004.patch, HBASE-18226.005.patch
>
>
> Description updated:
> In some unusual network environment, forward DNS lookup is supported while 
> reverse DNS lookup may not work properly.
> This JIRA is to address that the HMaster uses the hostname passed from the RS 
> instead of doing a reverse DNS lookup to tell the RS which hostname to use 
> during reportForDuty(). This has already been implemented by HBASE-12954, 
> which added the "useThisHostnameInstead" field to RegionServerStatusProtos.
> Currently "useThisHostnameInstead" is optional, and by default the RS only 
> passes port, server start code, and server current time info to the HMaster 
> during reportForDuty(). To use this field, users currently need to specify 
> "hbase.regionserver.hostname" in every regionserver node's hbase-site.xml. 
> This causes trouble in:
> 1. deployments managed by management tools like Ambari, which maintain the 
> same copy of hbase-site.xml across all the nodes.
> 2. HBASE-12954 targets multihomed hosts, where users want to manually set 
> the hostname value for each node. In the other (non-multihomed) cases, I 
> just want the RS to use the hostname returned by the node, set it in 
> useThisHostnameInstead, and pass it to the HMaster during reportForDuty().
> I would like to introduce a setting such that, when it is set to true, 
> "useThisHostnameInstead" is set to the hostname the RS gets from the node. 
> The HMaster will then skip the reverse DNS lookup because it sees the 
> "useThisHostnameInstead" field set in the request.
> "hbase.regionserver.hostname.reported.to.master", is it a good name?
> 
> Regarding the hostname returned by the RS node, I read the source code again 
> (including hadoop-common dns.java). By default the RS gets the hostname by 
> calling InetAddress.getLocalHost().getCanonicalHostName(). If users specify 
> "hbase.regionserver.dns.interface" or "hbase.regionserver.dns.nameserver", or 
> some underlying system configuration changes (e.g. modifying 
> /etc/nsswitch.conf), the lookup may consult DNS or other sources first 
> instead of checking the /etc/hosts file first.





[jira] [Commented] (HBASE-17752) Update reporting RPCs/Shell commands to break out space utilization by snapshot

2017-06-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056728#comment-16056728
 ] 

Hudson commented on HBASE-17752:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #3231 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/3231/])
HBASE-17752 Shell command to list snapshot sizes WRT quotas (elserj: rev 
5b485d14cd6a5bce1bef8329122166153162037e)
* (add) hbase-shell/src/main/ruby/shell/commands/list_snapshot_sizes.rb
* (edit) hbase-shell/src/test/ruby/hbase/quotas_test.rb
* (edit) hbase-shell/src/main/ruby/hbase/quotas.rb
* (edit) hbase-shell/src/test/ruby/tests_runner.rb
* (edit) hbase-shell/src/main/ruby/shell.rb
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/quotas/TableQuotaSnapshotStore.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/quotas/QuotaTableUtil.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/quotas/TestSpaceQuotasWithSnapshots.java


> Update reporting RPCs/Shell commands to break out space utilization by 
> snapshot
> ---
>
> Key: HBASE-17752
> URL: https://issues.apache.org/jira/browse/HBASE-17752
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Josh Elser
>Assignee: Josh Elser
> Fix For: 3.0.0
>
> Attachments: HBASE-17752.001.patch, HBASE-17752.002.patch, 
> HBASE-17752.003.patch, HBASE-17752.004.patch
>
>
> For administrators running HBase with space quotas, it is useful to provide a 
> breakdown of a table's utilization. For example, it may be non-intuitive 
> that a table's utilization is primarily made up of snapshots. We should 
> provide a new command or modify existing commands so that an admin can see 
> the utilization for a table/ns:
> e.g.
> {noformat}
> table1:   17GB
>   resident:   10GB
>   snapshot_a: 5GB
>   snapshot_b: 2GB
> {noformat}





[jira] [Commented] (HBASE-18223) Track the effort to improve/bug fix read replica feature

2017-06-20 Thread huaxiang sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056721#comment-16056721
 ] 

huaxiang sun commented on HBASE-18223:
--

Hbck enhancement to work with read replica

> Track the effort to improve/bug fix read replica feature
> 
>
> Key: HBASE-18223
> URL: https://issues.apache.org/jira/browse/HBASE-18223
> Project: HBase
>  Issue Type: Task
>  Components: Client
>Affects Versions: 2.0.0
>Reporter: huaxiang sun
>
> During hbasecon 2017, a group of people met and agreed to collaborate on the 
> effort to improve/bug-fix the read replica feature so that users can enable 
> it in their clusters. This jira is created to track jiras known to be 
> related to the read replica feature.





[jira] [Updated] (HBASE-18247) Hbck to fix the case that replica region shows as key in the meta table

2017-06-20 Thread huaxiang sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huaxiang sun updated HBASE-18247:
-
Description: 
Recently we ran into a case with read replicas where the replica region shows 
up as a key in the meta table (this is not supposed to happen; we are still 
working on why it showed up there).

However, hbck always reported the error against the primary region. Please see 
the error attached.

{code}
The entry in meta table
test,92b0201b,1492546349354_0001.c3e6f235fe7caef75f8b0fb92a012da3. 
column=info:regioninfo, timestamp=1494958820573, value={ENCODED => 
c3e6f235fe7caef75f8b0fb92a012da3, NAME => 
'test,92b0201b,1492546349354_0001.c3e6f235fe7caef75f8b0fb92a012da3.', STARTKEY 
=> '92b0201b', ENDKEY => '92f1a952', REPLICA_ID => 1}

ERROR: Region { meta => 
test,92b0201b,1492546349354.d2c637715f31a072f174e70d407fb458., hdfs => null, 
deployed => , replicaId => 0 } found in META, but not in HDFS or deployed on 
any region server.
{code}

Tracing the code, the following line does not consider the case where the 
replicaId in regionInfo could be non-default.

https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java#L985

If it is changed to get the replicaId from regionInfo, then hbck should be able 
to fix this via "-fixMeta".

{code}
diff --git 
a/hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java 
b/hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java
index 9eb5111..1649e53 100644
--- a/hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java
+++ b/hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java
@@ -982,7 +982,7 @@ public class MetaTableAccessor {
 List<HRegionLocation> locations = new ArrayList<>(1);
 NavigableMap<byte[], NavigableMap<byte[], byte[]>> familyMap = r.getNoVersionMap();
 
-locations.add(getRegionLocation(r, regionInfo, 0));
+locations.add(getRegionLocation(r, regionInfo, regionInfo.getReplicaId()));
 
 NavigableMap<byte[], byte[]> infoMap = familyMap.get(getCatalogFamily());
 if (infoMap == null) return new RegionLocations(locations);
{code}
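The replica id is recoverable from the region name itself, which is why hardcoding 0 loses information. A rough sketch of that parsing, assuming (as in the meta entry above) that a non-default replica id appears as a hex suffix between '_' and the encoded-name dot; this is illustrative, not the real RegionInfo parser:

```java
public class ReplicaIdSketch {
    // Pull the replica id out of a region name like the meta row key in the
    // example above. Assumes the replica id is the hex suffix after the
    // last '_' before the encoded-name dot; names without such a suffix
    // are treated as the default (primary) replica.
    static int replicaIdOf(String regionName) {
        int dot = regionName.indexOf('.');
        String prefix = dot >= 0 ? regionName.substring(0, dot) : regionName;
        int underscore = prefix.lastIndexOf('_');
        if (underscore < 0) {
            return 0; // default replica
        }
        return Integer.parseInt(prefix.substring(underscore + 1), 16);
    }

    public static void main(String[] args) {
        String replica =
            "test,92b0201b,1492546349354_0001.c3e6f235fe7caef75f8b0fb92a012da3.";
        String primary =
            "test,92b0201b,1492546349354.d2c637715f31a072f174e70d407fb458.";
        System.out.println(replicaIdOf(replica)); // prints 1
        System.out.println(replicaIdOf(primary)); // prints 0
    }
}
```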

  was:
Recently we ran into a case with read replicas where the replica region shows 
up as a key in the meta table (this is not supposed to happen; we are still 
working on why it showed up there).

However, hbck always reported the error against the primary region. Please see 
the error attached.

{code}
test,92b0201b,1492546349354_0001.c3e6f235fe7caef75f8b0fb92a012da3. 
column=info:regioninfo, timestamp=1494958820573, value={ENCODED => 
c3e6f235fe7caef75f8b0fb92a012da3, NAME => 
'test,92b0201b,1492546349354_0001.c3e6f235fe7caef75f8b0fb92a012da3.', STARTKEY 
=> '92b0201b', ENDKEY => '92f1a952', REPLICA_ID => 1}

ERROR: Region { meta => 
test,92b0201b,1492546349354.d2c637715f31a072f174e70d407fb458., hdfs => null, 
deployed => , replicaId => 0 } found in META, but not in HDFS or deployed on 
any region server.
{code}

Tracing the code, the following line does not consider the case where the 
replicaId in regionInfo could be non-default.

https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java#L985

If it is changed to get the replicaId from regionInfo, then hbck should be able 
to fix this via "-fixMeta".

{code}
diff --git 
a/hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java 
b/hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java
index 9eb5111..1649e53 100644
--- a/hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java
+++ b/hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java
@@ -982,7 +982,7 @@ public class MetaTableAccessor {
 List<HRegionLocation> locations = new ArrayList<>(1);
 NavigableMap<byte[], NavigableMap<byte[], byte[]>> familyMap = r.getNoVersionMap();
 
-locations.add(getRegionLocation(r, regionInfo, 0));
+locations.add(getRegionLocation(r, regionInfo, regionInfo.getReplicaId()));
 
 NavigableMap<byte[], byte[]> infoMap = familyMap.get(getCatalogFamily());
 if (infoMap == null) return new RegionLocations(locations);
{code}


> Hbck to fix the case that replica region shows as key in the meta table
> ---
>
> Key: HBASE-18247
> URL: https://issues.apache.org/jira/browse/HBASE-18247
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha-1
>Reporter: huaxiang sun
>Assignee: huaxiang sun
>Priority: Minor
>
> Recently we ran into a case with read replicas where the replica region shows 
> up as a key in the meta table (this is not supposed to happen; we are still 
> working on why it showed up there).
> However, hbck always reported the error against the primary region. 

[jira] [Created] (HBASE-18247) Hbck to fix the case that replica region shows as key in the meta table

2017-06-20 Thread huaxiang sun (JIRA)
huaxiang sun created HBASE-18247:


 Summary: Hbck to fix the case that replica region shows as key in 
the meta table
 Key: HBASE-18247
 URL: https://issues.apache.org/jira/browse/HBASE-18247
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0-alpha-1
Reporter: huaxiang sun
Assignee: huaxiang sun
Priority: Minor


Recently we ran into a case with read replicas where the replica region shows 
up as a key in the meta table (this is not supposed to happen; we are still 
working on why it showed up there).

However, hbck always reported the error against the primary region. Please see 
the error attached.

{code}
test,92b0201b,1492546349354_0001.c3e6f235fe7caef75f8b0fb92a012da3. 
column=info:regioninfo, timestamp=1494958820573, value={ENCODED => 
c3e6f235fe7caef75f8b0fb92a012da3, NAME => 
'test,92b0201b,1492546349354_0001.c3e6f235fe7caef75f8b0fb92a012da3.', STARTKEY 
=> '92b0201b', ENDKEY => '92f1a952', REPLICA_ID => 1}

ERROR: Region { meta => 
test,92b0201b,1492546349354.d2c637715f31a072f174e70d407fb458., hdfs => null, 
deployed => , replicaId => 0 } found in META, but not in HDFS or deployed on 
any region server.
{code}

Tracing the code, the following line does not consider the case where the 
replicaId in regionInfo could be non-default.

https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java#L985

If it is changed to get the replicaId from regionInfo, then hbck should be able 
to fix this via "-fixMeta".

{code}
diff --git 
a/hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java 
b/hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java
index 9eb5111..1649e53 100644
--- a/hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java
+++ b/hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java
@@ -982,7 +982,7 @@ public class MetaTableAccessor {
 List<HRegionLocation> locations = new ArrayList<>(1);
 NavigableMap<byte[], NavigableMap<byte[], byte[]>> familyMap = r.getNoVersionMap();
 
-locations.add(getRegionLocation(r, regionInfo, 0));
+locations.add(getRegionLocation(r, regionInfo, regionInfo.getReplicaId()));
 
 NavigableMap<byte[], byte[]> infoMap = familyMap.get(getCatalogFamily());
 if (infoMap == null) return new RegionLocations(locations);
{code}





[jira] [Commented] (HBASE-15691) Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to branch-1

2017-06-20 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056708#comment-16056708
 ] 

Stephen Yuan Jiang commented on HBASE-15691:


[~zjushch], you reviewed the original patch in HBASE-10205.  Could you help 
review the V2 patch?

> Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to 
> branch-1
> -
>
> Key: HBASE-15691
> URL: https://issues.apache.org/jira/browse/HBASE-15691
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 1.3.0
>Reporter: Andrew Purtell
>Assignee: Stephen Yuan Jiang
> Fix For: 1.3.2, 1.4.1, 1.5.0, 1.2.7
>
> Attachments: HBASE-15691-branch-1.patch, HBASE-15691.v2-branch-1.patch
>
>
> HBASE-10205 solves the following problem:
> "
> The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the 
> RAM queue containing entries to be cached. freeSpace() in turn calls 
> BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), 
> which iterates over 'bucketList'. At the same time another WriterThread might 
> call BucketAllocator.allocateBlock(), which may call 
> BucketSizeInfo.allocateBlock(), add a bucket to 'bucketList' and consequently 
> cause a ConcurrentModificationException. Calls to 
> BucketAllocator.allocateBlock() are synchronized, but calls to 
> BucketAllocator.getIndexStatistics() are not, which allows this race to occur.
> "
> However, for some unknown reason, HBASE-10205 was committed only to master 
> (2.0 and beyond) and the 0.98 branches. To preserve continuity we should 
> commit it to branch-1.
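The failure mode is the standard fail-fast iterator behavior; below is a minimal single-threaded stand-in for the two-thread race (the real race is between getIndexStatistics() iterating 'bucketList' and allocateBlock() mutating it, which this sketch does not reproduce):

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

public class CmeSketch {
    // Iterating a plain ArrayList while it is structurally modified trips
    // the fail-fast iterator, the same exception class the race above
    // produces across threads.
    static boolean modificationDuringIterationThrows() {
        List<Integer> bucketList = new ArrayList<>(List.of(1, 2, 3));
        try {
            for (Integer bucket : bucketList) {
                bucketList.add(bucket + 10); // mutation mid-iteration
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(modificationDuringIterationThrows()); // prints "true"
    }
}
```

Synchronizing getIndexStatistics() on the same lock as allocateBlock(), as the HBASE-10205 patch does, removes this window.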





[jira] [Commented] (HBASE-18240) Add hbase-auxillary, a project with hbase utility including an hbase-shaded-thirdparty module with guava, netty, etc.

2017-06-20 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056706#comment-16056706
 ] 

stack commented on HBASE-18240:
---

I pushed the first commit to the new repo. See 
https://git-wip-us.apache.org/repos/asf?p=hbase-thirdparty.git It builds an 
hbase-thirdparty-shaded jar. I pushed up a SNAPSHOT here:  
https://repository.apache.org/content/groups/snapshots/org/apache/hbase/thirdparty/hbase-thirdparty-shaded/

Would appreciate review of what I've posted. Currently it shades guava 22.0, 
and that is it. See https://git-wip-us.apache.org/repos/asf?p=hbase-thirdparty.git

Let me now work on a patch that makes hbase use the shaded guava.

> Add hbase-auxillary, a project with hbase utility including an 
> hbase-shaded-thirdparty module with guava, netty, etc.
> -
>
> Key: HBASE-18240
> URL: https://issues.apache.org/jira/browse/HBASE-18240
> Project: HBase
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: stack
>Assignee: stack
> Fix For: 2.0.0
>
> Attachments: HBASE-18240.master.001.patch, hbase-auxillary.tgz
>
>
> This issue is about adding a new related project to host hbase auxillary 
> utility. In this new project, the first thing we'd add is a module to host 
> shaded versions of third party libraries.
> This task comes of discussion held here 
> http://apache-hbase.679495.n3.nabble.com/DISCUSS-More-Shading-td4083025.html 
> where one conclusion of the DISCUSSION was "... pushing this part forward 
> with some code is the next logical step. Seems to be consensus about taking 
> our known internal dependencies and performing this shade magic."





[jira] [Updated] (HBASE-15691) Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to branch-1

2017-06-20 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-15691:
---
Description: 
HBASE-10205 solves the following problem:
"
The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the 
RAM queue containing entries to be cached. freeSpace() in turn calls 
BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), which 
iterates over 'bucketList'. At the same time another WriterThread might call 
BucketAllocator.allocateBlock(), which may call BucketSizeInfo.allocateBlock(), 
add a bucket to 'bucketList' and consequently cause a 
ConcurrentModificationException. Calls to BucketAllocator.allocateBlock() are 
synchronized, but calls to BucketAllocator.getIndexStatistics() are not, which 
allows this race to occur.
"

However, for some unknown reason, HBASE-10205 was committed only to master (2.0 
and beyond) and the 0.98 branches. To preserve continuity we should commit it 
to branch-1.

  was:HBASE-10205 was committed to trunk and 0.98 branches only. To preserve 
continuity we should commit it to branch-1. The change requires more than 
nontrivial fixups so I will attach a backport of the change from trunk to 
current branch-1 here. 


> Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to 
> branch-1
> -
>
> Key: HBASE-15691
> URL: https://issues.apache.org/jira/browse/HBASE-15691
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 1.3.0
>Reporter: Andrew Purtell
>Assignee: Stephen Yuan Jiang
> Fix For: 1.3.2, 1.4.1, 1.5.0, 1.2.7
>
> Attachments: HBASE-15691-branch-1.patch, HBASE-15691.v2-branch-1.patch
>
>
> HBASE-10205 solves the following problem:
> "
> The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the 
> RAM queue containing entries to be cached. freeSpace() in turn calls 
> BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), 
> which iterates over 'bucketList'. At the same time another WriterThread might 
> call BucketAllocator.allocateBlock(), which may call 
> BucketSizeInfo.allocateBlock(), add a bucket to 'bucketList' and consequently 
> cause a ConcurrentModificationException. Calls to 
> BucketAllocator.allocateBlock() are synchronized, but calls to 
> BucketAllocator.getIndexStatistics() are not, which allows this race to occur.
> "
> However, for some unknown reason, HBASE-10205 was committed only to master 
> (2.0 and beyond) and the 0.98 branches. To preserve continuity we should 
> commit it to branch-1.





[jira] [Commented] (HBASE-18244) org.apache.hadoop.hbase.client.rsgroup.TestShellRSGroups hangs/fails

2017-06-20 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056702#comment-16056702
 ] 

Thiruvel Thirumoolan commented on HBASE-18244:
--

[~elserj], The RSGroups tests were disabled as part of AMv2 changes, tracked at 
HBASE-18110. My observation is the same as yours. cc [~syuanjiang]

> org.apache.hadoop.hbase.client.rsgroup.TestShellRSGroups hangs/fails
> 
>
> Key: HBASE-18244
> URL: https://issues.apache.org/jira/browse/HBASE-18244
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: Josh Elser
> Fix For: 3.0.0
>
>
> Sometime in the past couple of weeks, TestShellRSGroups has started 
> timing-out/failing for me.
> It will get stuck on a call to moveTables()
> {noformat}
> "main" #1 prio=5 os_prio=31 tid=0x7ff012004800 nid=0x1703 in 
> Object.wait() [0x7020d000]
>java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:502)
> at 
> org.apache.hadoop.hbase.ipc.BlockingRpcCallback.get(BlockingRpcCallback.java:62)
> - locked <0x00078d1003f0> (a 
> org.apache.hadoop.hbase.ipc.BlockingRpcCallback)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:328)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$200(AbstractRpcClient.java:94)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:567)
> at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$BlockingStub.execMasterService(MasterProtos.java)
> at 
> org.apache.hadoop.hbase.client.ConnectionImplementation$3.execMasterService(ConnectionImplementation.java:1500)
> at 
> org.apache.hadoop.hbase.client.HBaseAdmin$67$1.rpcCall(HBaseAdmin.java:2991)
> at 
> org.apache.hadoop.hbase.client.HBaseAdmin$67$1.rpcCall(HBaseAdmin.java:2986)
> at 
> org.apache.hadoop.hbase.client.MasterCallable.call(MasterCallable.java:98)
> at 
> org.apache.hadoop.hbase.client.HBaseAdmin$67.callExecService(HBaseAdmin.java:2997)
> at 
> org.apache.hadoop.hbase.client.SyncCoprocessorRpcChannel.callBlockingMethod(SyncCoprocessorRpcChannel.java:69)
> at 
> org.apache.hadoop.hbase.protobuf.generated.RSGroupAdminProtos$RSGroupAdminService$BlockingStub.moveTables(RSGroupAdminProtos.java:13171)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminClient.moveTables(RSGroupAdminClient.java:117)
> {noformat}
> The server-side end of the RPC is waiting on a procedure to finish:
> {noformat}
> "RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=64242" #289 daemon 
> prio=5 os_prio=31 tid=0x7ff015b7c000 nid=0x1e603 waiting on condition 
> [0x7dbc9000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitFor(ProcedureSyncWait.java:184)
> at 
> org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitFor(ProcedureSyncWait.java:171)
> at 
> org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitForProcedureToComplete(ProcedureSyncWait.java:141)
> at 
> org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitForProcedureToCompleteIOE(ProcedureSyncWait.java:130)
> at 
> org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.submitAndWaitProcedure(ProcedureSyncWait.java:123)
> at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.unassign(AssignmentManager.java:478)
> at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.unassign(AssignmentManager.java:465)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.moveTables(RSGroupAdminServer.java:432)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint$RSGroupAdminServiceImpl.moveTables(RSGroupAdminEndpoint.java:174)
> at 
> org.apache.hadoop.hbase.protobuf.generated.RSGroupAdminProtos$RSGroupAdminService.callMethod(RSGroupAdminProtos.java:12786)
> at 
> org.apache.hadoop.hbase.master.MasterRpcServices.execMasterService(MasterRpcServices.java:673)
> at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:406)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:278)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:258)
>Locked ownable 

[jira] [Commented] (HBASE-18036) HBase 1.x : Data locality is not maintained after cluster restart or SSH

2017-06-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056685#comment-16056685
 ] 

Hudson commented on HBASE-18036:


SUCCESS: Integrated in Jenkins build HBase-1.3-JDK8 #198 (See 
[https://builds.apache.org/job/HBase-1.3-JDK8/198/])
HBASE-18036 Data locality is not maintained after cluster restart or SSH 
(syuanjiangdev: rev 2fb68f5046a5c5dd54070148a80882ece5c9b8a1)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java


> HBase 1.x : Data locality is not maintained after cluster restart or SSH
> 
>
> Key: HBASE-18036
> URL: https://issues.apache.org/jira/browse/HBASE-18036
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 1.4.0, 1.3.1, 1.2.5, 1.1.10
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 1.4.0, 1.3.2, 1.1.11, 1.2.7
>
> Attachments: HBASE-18036.v0-branch-1.1.patch, 
> HBASE-18036.v0-branch-1.patch, HBASE-18036.v1-branch-1.1.patch, 
> HBASE-18036.v2-branch-1.1.patch
>
>
> After HBASE-2896 / HBASE-4402, we expect data locality to be maintained after 
> a cluster restart.  However, we have seen some complaints about data locality 
> loss when clusters restart (e.g., HBASE-17963).  
> Examining the AssignmentManager#processDeadServersAndRegionsInTransition() 
> code, for cluster start I expected to hit the following code path:
> {code}
> if (!failover) {
>   // Fresh cluster startup.
>   LOG.info("Clean cluster startup. Assigning user regions");
>   assignAllUserRegions(allRegions);
> }
> {code}
> where assignAllUserRegions() would use the retainAssignment() call in the 
> LoadBalancer; however, from the master log, we usually hit the failover code path:
> {code}
> // If we found user regions out on cluster, its a failover.
> if (failover) {
>   LOG.info("Found regions out on cluster or in RIT; presuming failover");
>   // Process list of dead servers and regions in RIT.
>   // See HBASE-4580 for more information.
>   processDeadServersAndRecoverLostRegions(deadServers);
> }
> {code}
> where processDeadServersAndRecoverLostRegions() puts dead servers into SSH, 
> and SSH uses roundRobinAssignment() in the LoadBalancer.  That is why we would 
> see locality lost more often than retained during a cluster restart.
> Note: the code I was looking at is close to branch-1 and branch-1.1.
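The contrast between the two quoted paths can be sketched in miniature (hypothetical names and types, not HBase's actual LoadBalancer API): retain-style assignment sends each region back to its last known host when that host is still live, and only falls back to round-robin for regions whose old host is gone.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of retain-style assignment (hypothetical names, not the real LoadBalancer). */
public class RetainVsRoundRobin {
  /**
   * Reassigns each region to its previous host when that host is still live,
   * falling back to round-robin for regions whose previous host is gone.
   */
  public static Map<String, String> retainAssignment(
      Map<String, String> lastHostByRegion, List<String> liveServers) {
    Map<String, String> plan = new LinkedHashMap<>();
    int rr = 0; // round-robin cursor for regions that lost their old host
    for (Map.Entry<String, String> e : lastHostByRegion.entrySet()) {
      if (liveServers.contains(e.getValue())) {
        plan.put(e.getKey(), e.getValue()); // locality retained
      } else {
        plan.put(e.getKey(), liveServers.get(rr++ % liveServers.size()));
      }
    }
    return plan;
  }
}
```

A pure roundRobinAssignment() is the else-branch applied to every region, which is why the failover path loses locality for regions whose server actually came back.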



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18036) HBase 1.x : Data locality is not maintained after cluster restart or SSH

2017-06-20 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18036:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> HBase 1.x : Data locality is not maintained after cluster restart or SSH
> 
>
> Key: HBASE-18036
> URL: https://issues.apache.org/jira/browse/HBASE-18036
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 1.4.0, 1.3.1, 1.2.5, 1.1.10
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 1.4.0, 1.3.2, 1.1.11, 1.2.7
>
> Attachments: HBASE-18036.v0-branch-1.1.patch, 
> HBASE-18036.v0-branch-1.patch, HBASE-18036.v1-branch-1.1.patch, 
> HBASE-18036.v2-branch-1.1.patch
>
>


[jira] [Updated] (HBASE-18036) HBase 1.x : Data locality is not maintained after cluster restart or SSH

2017-06-20 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18036:
---
Summary: HBase 1.x : Data locality is not maintained after cluster restart 
or SSH  (was: Data locality is not maintained after cluster restart or SSH)

> HBase 1.x : Data locality is not maintained after cluster restart or SSH
> 
>
> Key: HBASE-18036
> URL: https://issues.apache.org/jira/browse/HBASE-18036
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 1.4.0, 1.3.1, 1.2.5, 1.1.10
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 1.4.0, 1.3.2, 1.1.11, 1.2.7
>
> Attachments: HBASE-18036.v0-branch-1.1.patch, 
> HBASE-18036.v0-branch-1.patch, HBASE-18036.v1-branch-1.1.patch, 
> HBASE-18036.v2-branch-1.1.patch
>
>


[jira] [Updated] (HBASE-18036) Data locality is not maintained after cluster restart or SSH

2017-06-20 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18036:
---
Fix Version/s: 1.4.0

> Data locality is not maintained after cluster restart or SSH
> 
>
> Key: HBASE-18036
> URL: https://issues.apache.org/jira/browse/HBASE-18036
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 1.4.0, 1.3.1, 1.2.5, 1.1.10
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 1.4.0, 1.3.2, 1.1.11, 1.2.7
>
> Attachments: HBASE-18036.v0-branch-1.1.patch, 
> HBASE-18036.v0-branch-1.patch, HBASE-18036.v1-branch-1.1.patch, 
> HBASE-18036.v2-branch-1.1.patch
>
>


[jira] [Commented] (HBASE-18036) Data locality is not maintained after cluster restart or SSH

2017-06-20 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056681#comment-16056681
 ] 

Stephen Yuan Jiang commented on HBASE-18036:


[~enis], with the Proc-V2 AM, the current change is no longer applicable.  
Currently, with the initial commit of the new AM, SSH calls 
AM.createAssignProcedures() with forceNewPlan=true.  Even if forceNewPlan is 
false, when we compare the existing plan's ServerName, it will not be equal to 
the dead server's due to the timestamp change (ServerName is 
hostname+port+timestamp), and hence a new plan/server would be used for the 
region assignment.  So locality is not guaranteed to be retained.  The fix 
would be more involved than what we have now in the 1.x code base.  I opened 
HBASE-18246 to track it (FYI, [~stack]).  
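The ServerName comparison described above can be illustrated with a toy model (this is not HBase's real ServerName class; the names are invented): full equality includes the start code, so a restarted server never matches its pre-restart identity, whereas an address-only comparison would.

```java
/** Toy model of ServerName identity (not HBase's real class). */
public class ServerNameSketch {
  public final String host;
  public final int port;
  public final long startCode; // assigned at RS startup, changes on restart

  public ServerNameSketch(String host, int port, long startCode) {
    this.host = host;
    this.port = port;
    this.startCode = startCode;
  }

  /** Full equality, as used when matching a region's existing assignment plan. */
  public boolean fullEquals(ServerNameSketch o) {
    return host.equals(o.host) && port == o.port && startCode == o.startCode;
  }

  /** Address-only comparison that survives a restart of the same host:port. */
  public boolean sameAddress(ServerNameSketch o) {
    return host.equals(o.host) && port == o.port;
  }
}
```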

> Data locality is not maintained after cluster restart or SSH
> 
>
> Key: HBASE-18036
> URL: https://issues.apache.org/jira/browse/HBASE-18036
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 1.4.0, 1.3.1, 1.2.5, 1.1.10
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 1.3.2, 1.1.11, 1.2.7
>
> Attachments: HBASE-18036.v0-branch-1.1.patch, 
> HBASE-18036.v0-branch-1.patch, HBASE-18036.v1-branch-1.1.patch, 
> HBASE-18036.v2-branch-1.1.patch
>
>


[jira] [Updated] (HBASE-18246) Proc-V2 AM: Maintain Data locality in ServerCrashProcedure

2017-06-20 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18246:
---
Summary: Proc-V2 AM: Maintain Data locality in ServerCrashProcedure  (was: 
Maintain Data locality in ServerCrashProcedure)

> Proc-V2 AM: Maintain Data locality in ServerCrashProcedure
> --
>
> Key: HBASE-18246
> URL: https://issues.apache.org/jira/browse/HBASE-18246
> Project: HBase
>  Issue Type: Sub-task
>  Components: Region Assignment
>Affects Versions: 2.0.0
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 2.0.0
>
>
> Before HBASE-18036, SSH would use round-robin to re-distribute regions during 
> processing.  Round-robin assignment would lose data locality.  HBASE-18036 
> retains data locality if the dead region server has already restarted by the 
> time the dead RS is processed.  
> With the Proc-V2 based AM, the change of HBASE-18036 in Apache HBase 1.x 
> releases is no longer possible.  We need to implement the same logic under 
> the Proc-V2 based AM.





[jira] [Created] (HBASE-18246) Maintain Data locality in ServerCrashProcedure

2017-06-20 Thread Stephen Yuan Jiang (JIRA)
Stephen Yuan Jiang created HBASE-18246:
--

 Summary: Maintain Data locality in ServerCrashProcedure
 Key: HBASE-18246
 URL: https://issues.apache.org/jira/browse/HBASE-18246
 Project: HBase
  Issue Type: Sub-task
  Components: Region Assignment
Affects Versions: 2.0.0
Reporter: Stephen Yuan Jiang
Assignee: Stephen Yuan Jiang


Before HBASE-18036, SSH would use round-robin to re-distribute regions during 
processing.  Round-robin assignment would lose data locality.  HBASE-18036 
retains data locality if the dead region server has already restarted by the 
time the dead RS is processed.  

With the Proc-V2 based AM, the change of HBASE-18036 in Apache HBase 1.x 
releases is no longer possible.  We need to implement the same logic under the 
Proc-V2 based AM.





[jira] [Updated] (HBASE-18235) LoadBalancer.BOGUS_SERVER_NAME should not have a bogus hostname

2017-06-20 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18235:
---
Attachment: HBASE-18235.patch

So we need this? Patch attached

> LoadBalancer.BOGUS_SERVER_NAME should not have a bogus hostname
> ---
>
> Key: HBASE-18235
> URL: https://issues.apache.org/jira/browse/HBASE-18235
> Project: HBase
>  Issue Type: Bug
>Reporter: Francis Liu
>Assignee: Francis Liu
> Attachments: HBASE-18235.patch
>
>
> The original patch used localhost to have assignment fail fast, avoiding 
> misleading DNS exceptions, delays due to DNS lookups, etc. 
> Was wondering what the reason was for changing it?





[jira] [Updated] (HBASE-18235) LoadBalancer.BOGUS_SERVER_NAME should not have a bogus hostname

2017-06-20 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18235:
---
Status: Patch Available  (was: Open)

> LoadBalancer.BOGUS_SERVER_NAME should not have a bogus hostname
> ---
>
> Key: HBASE-18235
> URL: https://issues.apache.org/jira/browse/HBASE-18235
> Project: HBase
>  Issue Type: Bug
>Reporter: Francis Liu
>Assignee: Francis Liu
> Attachments: HBASE-18235.patch
>
>


[jira] [Updated] (HBASE-18023) Log multi-* requests for more than threshold number of rows

2017-06-20 Thread David Harju (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Harju updated HBASE-18023:

Status: Patch Available  (was: In Progress)

Implemented comment suggestions

> Log multi-* requests for more than threshold number of rows
> ---
>
> Key: HBASE-18023
> URL: https://issues.apache.org/jira/browse/HBASE-18023
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Clay B.
>Assignee: David Harju
>Priority: Minor
> Attachments: HBASE-18023.master.001.patch, 
> HBASE-18023.master.002.patch
>
>
> Today, if a user happens to do something like a large multi-put, they can get 
> through request throttling (e.g. it is one request) but still crash a region 
> server with a garbage storm. We have seen regionservers hit this issue and it 
> is silent and deadly. The RS will report nothing more than a mysterious 
> garbage collection and exit out.
> Ideally, we could report a large multi-* request before starting it, in case 
> it happens to be deadly. Knowing the client, user and how many rows are 
> affected would be a good start to tracking down painful users.
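A minimal sketch of the check being proposed (the method, threshold, and field names here are invented for illustration and are not the patch's actual code): before servicing a multi-* request, compare its row count against a configured threshold and log the client and user when it is exceeded.

```java
import java.util.logging.Logger;

/** Hypothetical sketch of logging oversized multi-* requests (names invented). */
public class LargeBatchLogger {
  private static final Logger LOG = Logger.getLogger(LargeBatchLogger.class.getName());

  /** Logs and returns true when the batch exceeds the threshold. */
  public static boolean warnIfLarge(String client, String user, int rowCount, int threshold) {
    if (rowCount <= threshold) {
      return false;
    }
    LOG.warning("Large batch operation: client=" + client + " user=" + user
        + " rows=" + rowCount + " threshold=" + threshold);
    return true;
  }
}
```

Logging before the operation starts means the evidence survives even if the request then triggers a fatal garbage-collection storm.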





[jira] [Updated] (HBASE-18023) Log multi-* requests for more than threshold number of rows

2017-06-20 Thread David Harju (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Harju updated HBASE-18023:

Attachment: HBASE-18023.master.002.patch

> Log multi-* requests for more than threshold number of rows
> ---
>
> Key: HBASE-18023
> URL: https://issues.apache.org/jira/browse/HBASE-18023
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Clay B.
>Assignee: David Harju
>Priority: Minor
> Attachments: HBASE-18023.master.001.patch, 
> HBASE-18023.master.002.patch
>
>


[jira] [Commented] (HBASE-18036) Data locality is not maintained after cluster restart or SSH

2017-06-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056638#comment-16056638
 ] 

Hudson commented on HBASE-18036:


FAILURE: Integrated in Jenkins build HBase-1.4 #780 (See 
[https://builds.apache.org/job/HBase-1.4/780/])
HBASE-18036 Data locality is not maintained after cluster restart or SSH 
(syuanjiangdev: rev 532e0dda16f3c5034aa337201bf6d733cc0a1c7b)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java


> Data locality is not maintained after cluster restart or SSH
> 
>
> Key: HBASE-18036
> URL: https://issues.apache.org/jira/browse/HBASE-18036
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 1.4.0, 1.3.1, 1.2.5, 1.1.10
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 1.3.2, 1.1.11, 1.2.7
>
> Attachments: HBASE-18036.v0-branch-1.1.patch, 
> HBASE-18036.v0-branch-1.patch, HBASE-18036.v1-branch-1.1.patch, 
> HBASE-18036.v2-branch-1.1.patch
>
>


[jira] [Updated] (HBASE-18245) Handle failed server in RpcClient

2017-06-20 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-18245:
---
Description: 
This task is to add support for failed server in RpcClient::GetConnection().

FailedServers Java class would be ported to C++.

  was:
This task is to add support for failed server in RpcClient#GetConnection().

FailedServers Java class would be ported to C++.


> Handle failed server in RpcClient
> -
>
> Key: HBASE-18245
> URL: https://issues.apache.org/jira/browse/HBASE-18245
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ted Yu
>
> This task is to add support for failed server in RpcClient::GetConnection().
> FailedServers Java class would be ported to C++.





[jira] [Updated] (HBASE-18234) Revisit the async admin api

2017-06-20 Thread Guanghao Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-18234:
---
Attachment: HBASE-18234.master.006.patch

> Revisit the async admin api
> ---
>
> Key: HBASE-18234
> URL: https://issues.apache.org/jira/browse/HBASE-18234
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: HBASE-18234.master.001.patch, 
> HBASE-18234.master.002.patch, HBASE-18234.master.003.patch, 
> HBASE-18234.master.004.patch, HBASE-18234.master.005.patch, 
> HBASE-18234.master.006.patch, HBASE-18234.master.006.patch
>
>
> 1. Update the balance method name. 
> balancer -> balance
> setBalancerRunning -> setBalancerOn
> isBalancerEnabled -> isBalancerOn
> 2. Use HRegionLocation instead of Pair
> 3. Remove the closeRegionWithEncodedRegionName method, because every other API 
> can handle both a full region name and an encoded region name, so a separate 
> method for encoded names is unnecessary.
> 4. Unify the region name parameter's type to byte[]; the region name may be a 
> full name or an encoded name.
> 5. Unify the server name parameter's type to ServerName. Some APIs accept 
> null for the server name, so use Optional instead.
> 6. Unify the table name parameter's type to TableName.
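Point 5 above can be illustrated with a hypothetical signature (this is not the real AsyncAdmin interface; the method and parameter names are invented): an empty Optional replaces a nullable server-name parameter, making the "let the balancer pick" case explicit at the call site.

```java
import java.util.Optional;

/** Hypothetical signature illustrating Optional over a nullable server name. */
public class AdminApiSketch {
  /** An empty destination means "let the balancer pick a server". */
  public static String describeMove(byte[] regionName, Optional<String> destServer) {
    return "move " + new String(regionName)
        + destServer.map(s -> " to " + s).orElse(" to balancer-chosen server");
  }
}
```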





[jira] [Commented] (HBASE-18245) Handle failed server in RpcClient

2017-06-20 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056587#comment-16056587
 ] 

Enis Soztutar commented on HBASE-18245:
---

Indeed, we need to track recently failed servers in order to fail the RPCs 
immediately as opposed to blocking on the socket. 
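A minimal sketch of such a cache (invented names, not the actual FailedServers class): record the failure time per address and treat the server as failed only within an expiry window, so callers can skip the connect attempt entirely.

```java
import java.util.concurrent.ConcurrentHashMap;

/** FailedServers-style cache sketch: fail RPCs fast to recently failed addresses. */
public class FailedServerCache {
  private final long expiryMillis;
  private final ConcurrentHashMap<String, Long> failedAt = new ConcurrentHashMap<>();

  public FailedServerCache(long expiryMillis) {
    this.expiryMillis = expiryMillis;
  }

  /** Records a connection failure to the given address at the given time. */
  public void addFailure(String address, long nowMillis) {
    failedAt.put(address, nowMillis);
  }

  /** True if the address failed within the expiry window; expired entries are evicted. */
  public boolean isFailed(String address, long nowMillis) {
    Long t = failedAt.get(address);
    if (t == null) {
      return false;
    }
    if (nowMillis - t > expiryMillis) {
      failedAt.remove(address); // window elapsed; allow a fresh connect attempt
      return false;
    }
    return true;
  }
}
```

The expiry window is the design knob: too short and clients keep blocking on a dead socket, too long and a recovered server is shunned unnecessarily.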

> Handle failed server in RpcClient
> -
>
> Key: HBASE-18245
> URL: https://issues.apache.org/jira/browse/HBASE-18245
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ted Yu
>
> This task is to add support for failed server in RpcClient#GetConnection().
> FailedServers Java class would be ported to C++.





[jira] [Commented] (HBASE-18086) Create native client which creates load on selected cluster

2017-06-20 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056584#comment-16056584
 ] 

Enis Soztutar commented on HBASE-18086:
---

bq. If we don't handle RetriesExhaustedException, how do we perform validation 
for what is submitted for write(s) ?
If the client gets REE, it should fail the test. We are testing client-level 
retries with this test; if for whatever reason we get REE, it is a failure 
condition. 
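The policy described above can be sketched as follows (the exception and method names are illustrative, not the native client's actual API): any retries-exhausted error that reaches the load driver marks the test as failed rather than being swallowed and retried further.

```java
/** Illustrative policy: a retries-exhausted error fails the load test. */
public class LoadTestPolicy {
  /** Stand-in for the client's retries-exhausted error (name invented). */
  public static class RetriesExhausted extends RuntimeException {}

  /** Runs one operation; returns false (test failure) when retries were exhausted. */
  public static boolean runOp(Runnable op) {
    try {
      op.run();
      return true;
    } catch (RetriesExhausted e) {
      return false;
    }
  }
}
```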

> Create native client which creates load on selected cluster
> ---
>
> Key: HBASE-18086
> URL: https://issues.apache.org/jira/browse/HBASE-18086
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 18086.v1.txt
>
>
> This task is to create a client which uses multiple threads to conduct Puts 
> followed by Gets against selected cluster.
> Default is to run the tool against local cluster.
> This would give us some idea of the characteristics of the native client in 
> terms of handling high load.





[jira] [Commented] (HBASE-18133) Low-latency space quota size reports

2017-06-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056581#comment-16056581
 ] 

Hadoop QA commented on HBASE-18133:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 43s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 12 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 31s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 
17s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 24s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
39s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
47s {color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 3m 8s 
{color} | {color:red} hbase-server in master has 12 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 6s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 6s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
33m 34s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 3m 28s 
{color} | {color:red} hbase-server generated 1 new + 12 unchanged - 0 fixed = 
13 total (was 12) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s 
{color} | {color:green} hbase-hadoop-compat in the patch passed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s 
{color} | {color:green} hbase-hadoop2-compat generated 0 new + 1 unchanged - 1 
fixed = 1 total (was 2) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s 
{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 21s 
{color} | {color:green} hbase-hadoop-compat in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s 
{color} | {color:green} hbase-hadoop2-compat in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 20m 22s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
32s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 79m 8s {color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hbase-server |
|  |  Nullcheck of HRegion.rsServices at line 2764 of value previously 
dereferenced in 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(WAL, 
MonitoredTask, HRegion$PrepareFlushResult, Collection)  At HRegion.java:2764 of 
value previously dereferenced in 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(WAL, 
MonitoredTask, HRegion$PrepareFlushResult, Collection)  At 

[jira] [Commented] (HBASE-18036) Data locality is not maintained after cluster restart or SSH

2017-06-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056573#comment-16056573
 ] 

Hudson commented on HBASE-18036:


SUCCESS: Integrated in Jenkins build HBase-1.2-JDK7 #154 (See 
[https://builds.apache.org/job/HBase-1.2-JDK7/154/])
HBASE-18036 Data locality is not maintained after cluster restart or SSH 
(syuanjiangdev: rev 3f9ba2f247ef0fb7cebf35a4501bd7cfa36197bc)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java


> Data locality is not maintained after cluster restart or SSH
> 
>
> Key: HBASE-18036
> URL: https://issues.apache.org/jira/browse/HBASE-18036
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 1.4.0, 1.3.1, 1.2.5, 1.1.10
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 1.3.2, 1.1.11, 1.2.7
>
> Attachments: HBASE-18036.v0-branch-1.1.patch, 
> HBASE-18036.v0-branch-1.patch, HBASE-18036.v1-branch-1.1.patch, 
> HBASE-18036.v2-branch-1.1.patch
>
>
> After HBASE-2896 / HBASE-4402, we think data locality is maintained after 
> cluster restart.  However, we have seen some complaints about data locality 
> loss when the cluster restarts (e.g. HBASE-17963).  
> Examining the AssignmentManager#processDeadServersAndRegionsInTransition() 
> code,  for cluster start, I expected to hit the following code path:
> {code}
> if (!failover) {
>   // Fresh cluster startup.
>   LOG.info("Clean cluster startup. Assigning user regions");
>   assignAllUserRegions(allRegions);
> }
> {code}
> where assignAllUserRegions would use the retainAssignment() call in 
> LoadBalancer; however, from the master log, we usually hit the failover code 
> path:
> {code}
> // If we found user regions out on cluster, its a failover.
> if (failover) {
>   LOG.info("Found regions out on cluster or in RIT; presuming failover");
>   // Process list of dead servers and regions in RIT.
>   // See HBASE-4580 for more information.
>   processDeadServersAndRecoverLostRegions(deadServers);
> }
> {code}
> where processDeadServersAndRecoverLostRegions() would put dead servers in SSH 
> and SSH uses roundRobinAssignment() in LoadBalancer.  That is why we lose 
> locality more often than we retain it during cluster restart.
> Note: the code I was looking at is close to branch-1 and branch-1.1.
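The difference between the two assignment paths can be sketched with a toy example (class and method names here are illustrative, not the actual LoadBalancer API): a retain-style assignment keeps each region on its previous server when that server came back, preserving block locality, and only falls back to an arbitrary live server otherwise, whereas round-robin ignores history entirely.

```java
import java.util.*;

// Toy sketch of retain-style region assignment (illustrative names only).
public class RetainVsRoundRobin {
    static Map<String, String> retainAssignment(Map<String, String> previous,
                                                List<String> liveServers) {
        Map<String, String> plan = new LinkedHashMap<>();
        int next = 0;
        for (Map.Entry<String, String> e : previous.entrySet()) {
            if (liveServers.contains(e.getValue())) {
                // Old server is back: keep the region there to preserve locality.
                plan.put(e.getKey(), e.getValue());
            } else {
                // Old server is gone: fall back to some live server.
                plan.put(e.getKey(), liveServers.get(next++ % liveServers.size()));
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        Map<String, String> prev = new LinkedHashMap<>();
        prev.put("region-A", "rs1");
        prev.put("region-B", "rs2");
        List<String> live = Arrays.asList("rs1", "rs3"); // rs2 did not come back
        Map<String, String> plan = retainAssignment(prev, live);
        System.out.println(plan.get("region-A")); // retained on rs1
        System.out.println(plan.get("region-B")); // reassigned to a live server
    }
}
```

Hitting the SSH path instead means roundRobinAssignment() is used for every region of the "dead" server, which is why locality is lost even on a clean restart.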



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18119) Improve HFile readability and modify ChecksumUtil log level

2017-06-20 Thread Zach York (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056576#comment-16056576
 ] 

Zach York commented on HBASE-18119:
---

{quote}
+  @Test
+  public void testCheckHFileVersionForWrongConfiged() {
+    Configuration conf = TEST_UTIL.getConfiguration();
+    int version = conf.getInt(HFile.FORMAT_VERSION_KEY, HFile.MAX_FORMAT_VERSION);
+    conf.setInt(HFile.FORMAT_VERSION_KEY, HFile.MIN_FORMAT_VERSION);
+    try {
+      HFile.checkHFileVersion(conf);
+      fail();
+    } catch (IllegalArgumentException e) {
+      conf.setInt(HFile.FORMAT_VERSION_KEY, version);
+      assertTrue(String.valueOf(version).equals(conf.get(HFile.FORMAT_VERSION_KEY)));
+    }
+  }
{quote}

I think we can simplify this test case to cover only the failure case, since 
your other test above should already cover that functionality:


{code:java}
  @Test(expected=IllegalArgumentException.class)
  public void testCheckHFileVersionNotEqualToMaxVersion() {
Configuration conf = TEST_UTIL.getConfiguration();
conf.setInt(HFile.FORMAT_VERSION_KEY, HFile.MIN_FORMAT_VERSION);
HFile.checkHFileVersion(conf);
  }
{code}


Also:
bq. +  public void testCheckHFileVersionForCorrectConfiged(){
ForCorrectConfiged -> ForCorrectConfiguration, or even something like 
testCheckHFileVersionEqualToMaxVersion.

Also, please include a space between '()' and '{'.

> Improve HFile readability and modify ChecksumUtil log level
> ---
>
> Key: HBASE-18119
> URL: https://issues.apache.org/jira/browse/HBASE-18119
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Qilin Cao
>Assignee: Qilin Cao
>Priority: Minor
> Attachments: HBASE-18119-v1.patch, HBASE-18119-v2.patch, 
> HBASE-18119-v3.patch
>
>
> It was confusing when I read the HFile.checkHFileVersion method, so I changed 
> the if expression. I also changed the ChecksumUtil log level from info to 
> trace.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18202) Trim down supplemental models file for unnecessary entries

2017-06-20 Thread Mike Drob (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob updated HBASE-18202:
--
Attachment: HBASE-18202.patch

Patch tested locally on master against hadoop-2 profile and hadoop-3 profile 
with 3.0.0-alpha2 and -alpha3

Should commit HBASE-16351 first.

> Trim down supplemental models file for unnecessary entries
> --
>
> Key: HBASE-18202
> URL: https://issues.apache.org/jira/browse/HBASE-18202
> Project: HBase
>  Issue Type: Task
>  Components: dependencies
>Reporter: Mike Drob
> Attachments: HBASE-18202.patch
>
>
> With the more permissive "Apache License" check in HBASE-18033, we can remove 
> many entries from the supplemental-models.xml file. This issue is to track 
> that work separately.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18202) Trim down supplemental models file for unnecessary entries

2017-06-20 Thread Mike Drob (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob updated HBASE-18202:
--
Assignee: Mike Drob
  Status: Patch Available  (was: Open)

> Trim down supplemental models file for unnecessary entries
> --
>
> Key: HBASE-18202
> URL: https://issues.apache.org/jira/browse/HBASE-18202
> Project: HBase
>  Issue Type: Task
>  Components: dependencies
>Reporter: Mike Drob
>Assignee: Mike Drob
> Attachments: HBASE-18202.patch
>
>
> With the more permissive "Apache License" check in HBASE-18033, we can remove 
> many entries from the supplemental-models.xml file. This issue is to track 
> that work separately.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18036) Data locality is not maintained after cluster restart or SSH

2017-06-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056556#comment-16056556
 ] 

Hudson commented on HBASE-18036:


SUCCESS: Integrated in Jenkins build HBase-1.2-JDK8 #150 (See 
[https://builds.apache.org/job/HBase-1.2-JDK8/150/])
HBASE-18036 Data locality is not maintained after cluster restart or SSH 
(syuanjiangdev: rev 3f9ba2f247ef0fb7cebf35a4501bd7cfa36197bc)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java


> Data locality is not maintained after cluster restart or SSH
> 
>
> Key: HBASE-18036
> URL: https://issues.apache.org/jira/browse/HBASE-18036
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 1.4.0, 1.3.1, 1.2.5, 1.1.10
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 1.3.2, 1.1.11, 1.2.7
>
> Attachments: HBASE-18036.v0-branch-1.1.patch, 
> HBASE-18036.v0-branch-1.patch, HBASE-18036.v1-branch-1.1.patch, 
> HBASE-18036.v2-branch-1.1.patch
>
>
> After HBASE-2896 / HBASE-4402, we think data locality is maintained after 
> cluster restart.  However, we have seen some complaints about data locality 
> loss when the cluster restarts (e.g. HBASE-17963).  
> Examining the AssignmentManager#processDeadServersAndRegionsInTransition() 
> code,  for cluster start, I expected to hit the following code path:
> {code}
> if (!failover) {
>   // Fresh cluster startup.
>   LOG.info("Clean cluster startup. Assigning user regions");
>   assignAllUserRegions(allRegions);
> }
> {code}
> where assignAllUserRegions would use the retainAssignment() call in 
> LoadBalancer; however, from the master log, we usually hit the failover code 
> path:
> {code}
> // If we found user regions out on cluster, its a failover.
> if (failover) {
>   LOG.info("Found regions out on cluster or in RIT; presuming failover");
>   // Process list of dead servers and regions in RIT.
>   // See HBASE-4580 for more information.
>   processDeadServersAndRecoverLostRegions(deadServers);
> }
> {code}
> where processDeadServersAndRecoverLostRegions() would put dead servers in SSH 
> and SSH uses roundRobinAssignment() in LoadBalancer.  That is why we lose 
> locality more often than we retain it during cluster restart.
> Note: the code I was looking at is close to branch-1 and branch-1.1.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

