[jira] [Commented] (HBASE-18161) MultiHFileOutputFormat - comprehensive incremental load support

2017-06-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047448#comment-16047448
 ] 

Hadoop QA commented on HBASE-18161:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
12s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
49s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
51s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
28m 2s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
57s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 131m 32s 
{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
22s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 172m 41s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.mapreduce.TestCopyTable |
|   | hadoop.hbase.mapreduce.TestHFileOutputFormat2 |
|   | hadoop.hbase.coprocessor.TestCoprocessorMetrics |
| Timed out junit tests | 
org.apache.hadoop.hbase.master.assignment.TestSplitTableRegionProcedure |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12872793/MultiHFileOutputFormatSupport_HBASE_18161_v4.patch
 |
| JIRA Issue | HBASE-18161 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 8eed62fef11c 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 
15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 384e308 |
| Default Java | 1.8.0_131 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7175/artifact/patchprocess/patch-unit-hbase-server.txt
 |
| unit test logs |  
https://builds.apache.org/job/PreCommit-HBASE-Build/7175/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7175/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7175/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.

[jira] [Commented] (HBASE-18128) compaction marker could be skipped

2017-06-12 Thread Allan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047400#comment-16047400
 ] 

Allan Yang commented on HBASE-18128:


I think it won't cause any trouble if we don't replay the compaction marker when 
a region opens. This situation only happens to the latest compaction, and 
replaying all the compaction markers just for this rare case is, IMHO, too much.

> compaction marker could be skipped 
> ---
>
> Key: HBASE-18128
> URL: https://issues.apache.org/jira/browse/HBASE-18128
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction, regionserver
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
> Attachments: HBASE-18128-master.patch, HBASE-18128-master-v2.patch, 
> HBASE-18128-master-v3.patch, TestCompactionMarker.java
>
>
> The sequence for a compaction is as follows:
> 1. Compaction writes new files under region/.tmp directory (compaction output)
> 2. Compaction atomically moves the temporary file under region directory
> 3. Compaction appends a WAL edit containing the compaction input and output 
> files. Forces sync on WAL.
> 4. Compaction deletes the input files from the region directory.
> But if a flush happens between 3 and 4 and the regionserver then crashes, the 
> compaction marker will be skipped when splitting the log, because the sequence 
> id of the compaction marker is smaller than lastFlushedSequenceId.
> {code}
> if (lastFlushedSequenceId >= entry.getKey().getLogSeqNum()) {
>   editsSkipped++;
>   continue;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-17678) FilterList with MUST_PASS_ONE may lead to redundant cells returned

2017-06-12 Thread Zheng Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-17678:
-
Attachment: HBASE-17678.branch-1.v2.patch
HBASE-17678.branch-1.1.v2.patch

Trigger Hadoop QA again.

> FilterList with MUST_PASS_ONE may lead to redundant cells returned
> --
>
> Key: HBASE-17678
> URL: https://issues.apache.org/jira/browse/HBASE-17678
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 2.0.0, 1.3.0, 1.2.1
> Environment: RedHat 7.x
>Reporter: Jason Tokayer
>Assignee: Zheng Hu
> Attachments: HBASE-17678.addendum.patch, HBASE-17678.addendum.patch, 
> HBASE-17678.branch-1.1.v1.patch, HBASE-17678.branch-1.1.v2.patch, 
> HBASE-17678.branch-1.1.v2.patch, HBASE-17678.branch-1.v1.patch, 
> HBASE-17678.branch-1.v1.patch, HBASE-17678.branch-1.v2.patch, 
> HBASE-17678.branch-1.v2.patch, HBASE-17678.v1.patch, 
> HBASE-17678.v1.rough.patch, HBASE-17678.v2.patch, HBASE-17678.v3.patch, 
> HBASE-17678.v4.patch, HBASE-17678.v4.patch, HBASE-17678.v5.patch, 
> HBASE-17678.v6.patch, HBASE-17678.v7.patch, HBASE-17678.v7.patch, 
> TestColumnPaginationFilterDemo.java
>
>
> When combining ColumnPaginationFilter with a single-element filterList, 
> MUST_PASS_ONE and MUST_PASS_ALL give different results when there are 
> multiple cells with the same timestamp. This is unexpected since there is 
> only a single filter in the list, and I would expect MUST_PASS_ALL and 
> MUST_PASS_ONE to only affect the behavior of the joined filter, not the 
> behavior of any one of the individual filters. If this is not a bug, it 
> would be nice if the documentation were updated to explain this nuanced 
> behavior.
> I know that there was a decision made in an earlier HBase version to keep 
> multiple cells with the same timestamp. This is generally fine but presents 
> an issue when using the aforementioned filter combination.
> Steps to reproduce:
> In the shell create a table and insert some data:
> {code:none}
> create 'ns:tbl',{NAME => 'family',VERSIONS => 100}
> put 'ns:tbl','row','family:name','John',1
> put 'ns:tbl','row','family:name','Jane',1
> put 'ns:tbl','row','family:name','Gil',1
> put 'ns:tbl','row','family:name','Jane',1
> {code}
> Then, use a Scala client as:
> {code:none}
> import org.apache.hadoop.hbase.filter._
> import org.apache.hadoop.hbase.util.Bytes
> import org.apache.hadoop.hbase.client._
> import org.apache.hadoop.hbase.{CellUtil, HBaseConfiguration, TableName}
> import scala.collection.mutable._
> val config = HBaseConfiguration.create()
> config.set("hbase.zookeeper.quorum", "localhost")
> config.set("hbase.zookeeper.property.clientPort", "2181")
> val connection = ConnectionFactory.createConnection(config)
> val logicalOp = FilterList.Operator.MUST_PASS_ONE
> val limit = 1
> var resultsList = ListBuffer[String]()
> for (offset <- 0 to 20 by limit) {
>   val table = connection.getTable(TableName.valueOf("ns:tbl"))
>   val paginationFilter = new ColumnPaginationFilter(limit,offset)
>   val filterList: FilterList = new FilterList(logicalOp,paginationFilter)
>   println("@ filterList = "+filterList)
>   val results = table.get(new Get(Bytes.toBytes("row")).setFilter(filterList))
>   val cells = results.rawCells()
>   if (cells != null) {
>   for (cell <- cells) {
> val value = new String(CellUtil.cloneValue(cell))
> val qualifier = new String(CellUtil.cloneQualifier(cell))
> val family = new String(CellUtil.cloneFamily(cell))
> val result = "OFFSET = "+offset+":"+family + "," + qualifier 
> + "," + value + "," + cell.getTimestamp()
> resultsList.append(result)
>   }
>   }
> }
> resultsList.foreach(println)
> {code}
> Here are the results for different limit and logicalOp settings:
> {code:none}
> Limit = 1 & logicalOp = MUST_PASS_ALL:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1
> Limit = 1 & logicalOp = MUST_PASS_ONE:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1
> OFFSET = 1:family,name,Gil,1
> OFFSET = 2:family,name,Jane,1
> OFFSET = 3:family,name,John,1
> Limit = 2 & logicalOp = MUST_PASS_ALL:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1
> Limit = 2 & logicalOp = MUST_PASS_ONE:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1
> OFFSET = 2:family,name,Jane,1
> {code}
> So, it seems that MUST_PASS_ALL gives the expected behavior, but 
> MUST_PASS_ONE does not. Furthermore, MUST_PASS_ONE seems to give only a 
> single 

[jira] [Commented] (HBASE-18137) Replication gets stuck for empty WALs

2017-06-12 Thread Vincent Poon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047378#comment-16047378
 ] 

Vincent Poon commented on HBASE-18137:
--

[~anoop.hbase] Currently other replication configs follow the 
"replication.source.*" pattern.

> Replication gets stuck for empty WALs
> -
>
> Key: HBASE-18137
> URL: https://issues.apache.org/jira/browse/HBASE-18137
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.1
>Reporter: Ashu Pachauri
>Assignee: Vincent Poon
>Priority: Critical
> Fix For: 2.0.0, 1.4.0, 1.3.2, 1.2.7
>
> Attachments: HBASE-18137.branch-1.3.v1.patch, 
> HBASE-18137.branch-1.3.v2.patch, HBASE-18137.branch-1.3.v3.patch, 
> HBASE-18137.branch-1.v1.patch, HBASE-18137.branch-1.v2.patch, 
> HBASE-18137.master.v1.patch
>
>
> Replication assumes that only the last WAL of a recovered queue can be empty. 
> But intermittent DFS issues may cause empty WALs to be created (without the 
> PWAL magic) and a WAL roll to happen without a regionserver crash. This 
> will cause recovered queues to have empty WALs in the middle, which causes 
> replication to get stuck:
> {code}
> TRACE regionserver.ReplicationSource: Opening log 
> WARN regionserver.ReplicationSource: - Got: 
> java.io.EOFException
>   at java.io.DataInputStream.readFully(DataInputStream.java:197)
>   at java.io.DataInputStream.readFully(DataInputStream.java:169)
>   at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1915)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1880)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.&lt;init&gt;(SequenceFile.java:1829)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.&lt;init&gt;(SequenceFile.java:1843)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.&lt;init&gt;(SequenceFileLogReader.java:70)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.reset(SequenceFileLogReader.java:168)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.initReader(SequenceFileLogReader.java:177)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:66)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:312)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:276)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:264)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:423)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationWALReaderManager.openReader(ReplicationWALReaderManager.java:70)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.openReader(ReplicationSource.java:830)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.run(ReplicationSource.java:572)
> {code}
> The WAL in question was completely empty but there were other WALs in the 
> recovered queue which were newer and non-empty.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18128) compaction marker could be skipped

2017-06-12 Thread Jingyun Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingyun Tian updated HBASE-18128:
-
Attachment: HBASE-18128-master-v3.patch
TestCompactionMarker.java

[~tedyu] I didn't remove the condition; instead I check whether the cell is a 
compaction marker. If it is, I set lastFlushedSequenceId to Long.MAX_VALUE so 
that the compaction marker is not skipped.

{code}
if (lastFlushedSequenceId >= seqId) {
  editsSkipped++;
  continue;
}
{code}
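
A minimal sketch of the intent (never skip a compaction marker), assuming a 
helper like WALEdit.isCompactionMarker exists; this is illustrative, not the 
exact patch:

{code}
// Illustrative sketch only: ensure compaction markers always get replayed,
// even when their sequence id is <= lastFlushedSequenceId.
// WALEdit.isCompactionMarker is an assumed helper name.
boolean isCompactionMarker = CellUtil.matchingFamily(cell, WALEdit.METAFAMILY)
    && WALEdit.isCompactionMarker(cell);
if (lastFlushedSequenceId >= seqId && !isCompactionMarker) {
  editsSkipped++;
  continue;
}
{code}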

Patch updated, please check it out.
Thanks

> compaction marker could be skipped 
> ---
>
> Key: HBASE-18128
> URL: https://issues.apache.org/jira/browse/HBASE-18128
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction, regionserver
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
> Attachments: HBASE-18128-master.patch, HBASE-18128-master-v2.patch, 
> HBASE-18128-master-v3.patch, TestCompactionMarker.java
>
>
> The sequence for a compaction is as follows:
> 1. Compaction writes new files under region/.tmp directory (compaction output)
> 2. Compaction atomically moves the temporary file under region directory
> 3. Compaction appends a WAL edit containing the compaction input and output 
> files. Forces sync on WAL.
> 4. Compaction deletes the input files from the region directory.
> But if a flush happens between 3 and 4 and the regionserver then crashes, the 
> compaction marker will be skipped when splitting the log, because the sequence 
> id of the compaction marker is smaller than lastFlushedSequenceId.
> {code}
> if (lastFlushedSequenceId >= entry.getKey().getLogSeqNum()) {
>   editsSkipped++;
>   continue;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18128) compaction marker could be skipped

2017-06-12 Thread Jingyun Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingyun Tian updated HBASE-18128:
-
Attachment: (was: TestCompactionMarker.java)

> compaction marker could be skipped 
> ---
>
> Key: HBASE-18128
> URL: https://issues.apache.org/jira/browse/HBASE-18128
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction, regionserver
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
> Attachments: HBASE-18128-master.patch, HBASE-18128-master-v2.patch
>
>
> The sequence for a compaction is as follows:
> 1. Compaction writes new files under region/.tmp directory (compaction output)
> 2. Compaction atomically moves the temporary file under region directory
> 3. Compaction appends a WAL edit containing the compaction input and output 
> files. Forces sync on WAL.
> 4. Compaction deletes the input files from the region directory.
> But if a flush happens between 3 and 4 and the regionserver then crashes, the 
> compaction marker will be skipped when splitting the log, because the sequence 
> id of the compaction marker is smaller than lastFlushedSequenceId.
> {code}
> if (lastFlushedSequenceId >= entry.getKey().getLogSeqNum()) {
>   editsSkipped++;
>   continue;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18161) MultiHFileOutputFormat - comprehensive incremental load support

2017-06-12 Thread Densel Santhmayor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Densel Santhmayor updated HBASE-18161:
--
Attachment: MultiHFileOutputFormatSupport_HBASE_18161_v3.patch

> MultiHFileOutputFormat - comprehensive incremental load support
> ---
>
> Key: HBASE-18161
> URL: https://issues.apache.org/jira/browse/HBASE-18161
> Project: HBase
>  Issue Type: New Feature
>Reporter: Densel Santhmayor
>Priority: Minor
> Attachments: MultiHFileOutputFormatSupport_HBASE_18161.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v2.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v3.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v4.patch
>
>
> h2. Introduction
> MapReduce currently supports the ability to write HBase records in bulk to 
> HFiles for a single table. The file(s) can then be uploaded to the relevant 
> RegionServers with reasonable latency. This feature is useful for making a 
> large set of data available for queries at the same time, and it provides a 
> way to efficiently process very large input into HBase without affecting 
> query latencies.
> There is, however, no support to write variations of the same record key to 
> HFiles belonging to multiple HBase tables from within the same MapReduce job. 
>  
> h2. Goal
> The goal of this JIRA is to extend HFileOutputFormat2 to support writing to 
> HFiles for different tables within the same MapReduce job while keeping 
> single-table HFile features backwards-compatible.
> For our use case, we needed to write a record key to a smaller HBase table 
> for quicker access, and the same record key with a date appended to a larger 
> table for longer term storage with chronological access. Each of these tables 
> would have different TTL and other settings to support their respective 
> access patterns. We also needed to be able to bulk write records to multiple 
> tables with different subsets of very large input as efficiently as possible. 
> Rather than run the MapReduce job multiple times (one for each table or 
> record structure), it would be useful to be able to parse the input a single 
> time and write to multiple tables simultaneously.
> Additionally, we'd like to maintain backwards compatibility with the existing 
> heavily-used HFileOutputFormat2 interface so that benefits such as locality 
> sensitivity (introduced long after we implemented support for multiple 
> tables) apply to both single-table and multi-table HFile writes.
> h2. Proposal
> * Backwards compatibility for existing single table support in 
> HFileOutputFormat2 will be maintained and in this case, mappers will need to 
> emit the table rowkey as before. However, a new class - 
> MultiHFileOutputFormat - will provide a helper function to generate a rowkey 
> for mappers that prefixes the desired tablename to the existing rowkey as 
> well as provides configureIncrementalLoad support for multiple tables.
> * HFileOutputFormat2 will be updated in the following way:
> ** configureIncrementalLoad will now accept multiple table descriptor and 
> region locator pairs, analogous to the single pair currently accepted by 
> HFileOutputFormat2. 
> ** Compression, Block Size, Bloom Type and Datablock settings PER column 
> family that are set in the Configuration object are now indexed and retrieved 
> by tablename AND column family.
> ** getRegionStartKeys will now support multiple regionlocators and calculate 
> split points and therefore partitions collectively for all tables. Similarly, 
> now the eventual number of Reducers will be equal to the total number of 
> partitions across all tables. 
> ** The RecordWriter class will be able to process rowkeys either with or 
> without the tablename prepended depending on how configureIncrementalLoad was 
> configured with MultiHFileOutputFormat or HFileOutputFormat2.
> * The use of MultiHFileOutputFormat will write the output into HFiles which 
> will match the output format of HFileOutputFormat2. However, while the 
> default use case will keep the existing directory structure with column 
> family name as the directory and HFiles within that directory, in the case of 
> MultiHFileOutputFormat, it will output HFiles in the output directory with 
> the following relative paths: 
> {noformat}
>  --table1 
>--family1 
>  --HFiles 
>  --table2 
>--family1 
>--family2 
>  --HFiles
> {noformat}
> This aims to be a comprehensive solution to the original tickets - HBASE-3727 
> and HBASE-16261. Thanks to [~clayb] for his support.
> The patch will be attached shortly.
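
As a rough illustration of the rowkey-prefix convention in the proposal above, 
here is a hedged sketch; the helper name and delimiter are assumptions, not 
the patch's actual API:

{code}
import org.apache.hadoop.hbase.util.Bytes;

// Illustrative sketch only: prefix the destination table name to the row key
// so one MapReduce job can route records to per-table HFile directories.
public final class TableRowKeyUtil {
  private static final byte SEPARATOR = (byte) 0; // assumed delimiter

  /** Returns tableName + separator + rowKey for a multi-table RecordWriter. */
  public static byte[] prefixTable(byte[] tableName, byte[] rowKey) {
    return Bytes.add(tableName, new byte[] { SEPARATOR }, rowKey);
  }
}
{code}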



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18161) MultiHFileOutputFormat - comprehensive incremental load support

2017-06-12 Thread Densel Santhmayor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Densel Santhmayor updated HBASE-18161:
--
Attachment: MultiHFileOutputFormatSupport_HBASE_18161_v4.patch

> MultiHFileOutputFormat - comprehensive incremental load support
> ---
>
> Key: HBASE-18161
> URL: https://issues.apache.org/jira/browse/HBASE-18161
> Project: HBase
>  Issue Type: New Feature
>Reporter: Densel Santhmayor
>Priority: Minor
> Attachments: MultiHFileOutputFormatSupport_HBASE_18161.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v2.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v3.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v4.patch
>
>
> h2. Introduction
> MapReduce currently supports the ability to write HBase records in bulk to 
> HFiles for a single table. The file(s) can then be uploaded to the relevant 
> RegionServers with reasonable latency. This feature is useful for making a 
> large set of data available for queries at the same time, and it provides a 
> way to efficiently process very large input into HBase without affecting 
> query latencies.
> There is, however, no support to write variations of the same record key to 
> HFiles belonging to multiple HBase tables from within the same MapReduce job. 
>  
> h2. Goal
> The goal of this JIRA is to extend HFileOutputFormat2 to support writing to 
> HFiles for different tables within the same MapReduce job while keeping 
> single-table HFile features backwards-compatible.
> For our use case, we needed to write a record key to a smaller HBase table 
> for quicker access, and the same record key with a date appended to a larger 
> table for longer term storage with chronological access. Each of these tables 
> would have different TTL and other settings to support their respective 
> access patterns. We also needed to be able to bulk write records to multiple 
> tables with different subsets of very large input as efficiently as possible. 
> Rather than run the MapReduce job multiple times (one for each table or 
> record structure), it would be useful to be able to parse the input a single 
> time and write to multiple tables simultaneously.
> Additionally, we'd like to maintain backwards compatibility with the existing 
> heavily-used HFileOutputFormat2 interface so that benefits such as locality 
> sensitivity (introduced long after we implemented support for multiple 
> tables) apply to both single-table and multi-table HFile writes.
> h2. Proposal
> * Backwards compatibility for existing single table support in 
> HFileOutputFormat2 will be maintained and in this case, mappers will need to 
> emit the table rowkey as before. However, a new class - 
> MultiHFileOutputFormat - will provide a helper function to generate a rowkey 
> for mappers that prefixes the desired tablename to the existing rowkey as 
> well as provides configureIncrementalLoad support for multiple tables.
> * HFileOutputFormat2 will be updated in the following way:
> ** configureIncrementalLoad will now accept multiple table descriptor and 
> region locator pairs, analogous to the single pair currently accepted by 
> HFileOutputFormat2. 
> ** Compression, Block Size, Bloom Type and Datablock settings PER column 
> family that are set in the Configuration object are now indexed and retrieved 
> by tablename AND column family.
> ** getRegionStartKeys will now support multiple regionlocators and calculate 
> split points and therefore partitions collectively for all tables. Similarly, 
> now the eventual number of Reducers will be equal to the total number of 
> partitions across all tables. 
> ** The RecordWriter class will be able to process rowkeys either with or 
> without the tablename prepended depending on how configureIncrementalLoad was 
> configured with MultiHFileOutputFormat or HFileOutputFormat2.
> * The use of MultiHFileOutputFormat will write the output into HFiles which 
> will match the output format of HFileOutputFormat2. However, while the 
> default use case will keep the existing directory structure with column 
> family name as the directory and HFiles within that directory, in the case of 
> MultiHFileOutputFormat, it will output HFiles in the output directory with 
> the following relative paths: 
> {noformat}
>  --table1 
>--family1 
>  --HFiles 
>  --table2 
>--family1 
>--family2 
>  --HFiles
> {noformat}
> This aims to be a comprehensive solution to the original tickets - HBASE-3727 
> and HBASE-16261. Thanks to [~clayb] for his support.
> The patch will be attached shortly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18137) Replication gets stuck for empty WALs

2017-06-12 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047307#comment-16047307
 ] 

Anoop Sam John commented on HBASE-18137:


bq. A new config "replication.source.eof.autorecovery"
Normally our configs are prefixed with 'hbase.', right? Any reason it's 
different here, or was it just missed? Maybe the current replication-area 
configs are this way; I did not check. Just asking, as I noticed it in the 
Release Notes.

> Replication gets stuck for empty WALs
> -
>
> Key: HBASE-18137
> URL: https://issues.apache.org/jira/browse/HBASE-18137
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.1
>Reporter: Ashu Pachauri
>Assignee: Vincent Poon
>Priority: Critical
> Fix For: 2.0.0, 1.4.0, 1.3.2, 1.2.7
>
> Attachments: HBASE-18137.branch-1.3.v1.patch, 
> HBASE-18137.branch-1.3.v2.patch, HBASE-18137.branch-1.3.v3.patch, 
> HBASE-18137.branch-1.v1.patch, HBASE-18137.branch-1.v2.patch, 
> HBASE-18137.master.v1.patch
>
>
> Replication assumes that only the last WAL of a recovered queue can be empty. 
> But intermittent DFS issues may cause empty WALs to be created (without the 
> PWAL magic) and a WAL roll to happen without a regionserver crash. This 
> will cause recovered queues to have empty WALs in the middle, which causes 
> replication to get stuck:
> {code}
> TRACE regionserver.ReplicationSource: Opening log 
> WARN regionserver.ReplicationSource: - Got: 
> java.io.EOFException
>   at java.io.DataInputStream.readFully(DataInputStream.java:197)
>   at java.io.DataInputStream.readFully(DataInputStream.java:169)
>   at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1915)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1880)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.&lt;init&gt;(SequenceFile.java:1829)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.&lt;init&gt;(SequenceFile.java:1843)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.&lt;init&gt;(SequenceFileLogReader.java:70)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.reset(SequenceFileLogReader.java:168)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.initReader(SequenceFileLogReader.java:177)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:66)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:312)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:276)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:264)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:423)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationWALReaderManager.openReader(ReplicationWALReaderManager.java:70)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.openReader(ReplicationSource.java:830)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.run(ReplicationSource.java:572)
> {code}
> The WAL in question was completely empty but there were other WALs in the 
> recovered queue which were newer and non-empty.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-12 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047235#comment-16047235
 ] 

Kahlil Oppenheimer commented on HBASE-18164:


Ok great! Just let me know if there's anything else additional you'd like me to 
do.

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say, on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference from the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.
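
To make the incremental idea above concrete, here is a toy sketch (the names 
and the unnormalized cost form are assumptions, not the patch's 
implementation): a region move adjusts only that region's term of the running 
cost.

{code}
// Toy sketch: O(1) cost update per proposed move, using a locality cache
// computed once per balancer run, instead of a full O(regions * servers) pass.
public final class IncrementalLocalityCost {
  private final float[][] locality; // locality[region][server], precomputed
  private double cost;              // running sum of (1 - locality) terms

  public IncrementalLocalityCost(float[][] locality, double initialCost) {
    this.locality = locality;
    this.cost = initialCost;
  }

  /** Adjust the running cost when `region` moves from `from` to `to`. */
  public double onRegionMoved(int region, int from, int to) {
    cost += (1 - locality[region][to]) - (1 - locality[region][from]);
    return cost;
  }
}
{code}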



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-12 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047224#comment-16047224
 ] 

Sean Busbey commented on HBASE-18164:
-

yeah none of those look related. might need to figure out if the flaky list is 
working, but that shouldn't be a blocker here.

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say, on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference from the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047203#comment-16047203
 ] 

Hadoop QA commented on HBASE-18164:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 
15s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
56s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
18s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
14s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
49s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
33m 6s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 114m 57s 
{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 163m 53s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.coprocessor.TestCoprocessorMetrics |
|   | hadoop.hbase.master.procedure.TestMasterProcedureWalLease |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12872092/HBASE-18164-02.patch |
| JIRA Issue | HBASE-18164 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 8ba63419e329 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 
15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 384e308 |
| Default Java | 1.8.0_131 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7174/artifact/patchprocess/patch-unit-hbase-server.txt
 |
| unit test logs |  
https://builds.apache.org/job/PreCommit-HBASE-Build/7174/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7174/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7174/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.




[jira] [Updated] (HBASE-18137) Replication gets stuck for empty WALs

2017-06-12 Thread Vincent Poon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent Poon updated HBASE-18137:
-
Release Note: 0-length WAL files can potentially cause the replication 
queue to get stuck.  A new config "replication.source.eof.autorecovery" has 
been added; if set to true (default is false), a 0-length WAL file will be 
skipped after 1) the max number of retries has been hit, and 2) there are more 
WAL files in the queue.
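
For reference, a hedged example of enabling this programmatically; the key and 
default come from the note above, while the surrounding code is illustrative:

{code}
// Illustrative only: enable skipping of stuck 0-length WALs as described above.
Configuration conf = HBaseConfiguration.create();
conf.setBoolean("replication.source.eof.autorecovery", true); // default: false
{code}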

> Replication gets stuck for empty WALs
> -
>
> Key: HBASE-18137
> URL: https://issues.apache.org/jira/browse/HBASE-18137
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.1
>Reporter: Ashu Pachauri
>Assignee: Vincent Poon
>Priority: Critical
> Fix For: 2.0.0, 1.4.0, 1.3.2, 1.2.7
>
> Attachments: HBASE-18137.branch-1.3.v1.patch, 
> HBASE-18137.branch-1.3.v2.patch, HBASE-18137.branch-1.3.v3.patch, 
> HBASE-18137.branch-1.v1.patch, HBASE-18137.branch-1.v2.patch, 
> HBASE-18137.master.v1.patch
>
>
> Replication assumes that only the last WAL of a recovered queue can be empty. 
> But intermittent DFS issues may cause empty WALs to be created (without the 
> PWAL magic) and a WAL roll to happen without a regionserver crash. This 
> will cause recovered queues to have empty WALs in the middle, which causes 
> replication to get stuck:
> {code}
> TRACE regionserver.ReplicationSource: Opening log 
> WARN regionserver.ReplicationSource: - Got: 
> java.io.EOFException
>   at java.io.DataInputStream.readFully(DataInputStream.java:197)
>   at java.io.DataInputStream.readFully(DataInputStream.java:169)
>   at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1915)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1880)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.&lt;init&gt;(SequenceFile.java:1829)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.&lt;init&gt;(SequenceFile.java:1843)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.&lt;init&gt;(SequenceFileLogReader.java:70)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.reset(SequenceFileLogReader.java:168)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.initReader(SequenceFileLogReader.java:177)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:66)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:312)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:276)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:264)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:423)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationWALReaderManager.openReader(ReplicationWALReaderManager.java:70)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.openReader(ReplicationSource.java:830)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.run(ReplicationSource.java:572)
> {code}
> The WAL in question was completely empty but there were other WALs in the 
> recovered queue which were newer and non-empty.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-12 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047148#comment-16047148
 ] 

Kahlil Oppenheimer commented on HBASE-18164:


Thanks [~busbey]! The failures don't seem related to my changes, but I'm happy 
to investigate if they fail again on this next run.

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say, on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference from the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-17988) get-active-master.rb and draining_servers.rb no longer work

2017-06-12 Thread Chinmay Kulkarni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinmay Kulkarni updated HBASE-17988:
-
Fix Version/s: (was: 1.2.7)
   (was: 1.3.2)
   (was: 1.4.0)
   (was: 1.1.2)
   3.0.0

> get-active-master.rb and draining_servers.rb no longer work
> ---
>
> Key: HBASE-17988
> URL: https://issues.apache.org/jira/browse/HBASE-17988
> Project: HBase
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 2.0.0
>Reporter: Mike Drob
>Assignee: Chinmay Kulkarni
>Priority: Critical
> Fix For: 2.0.0, 3.0.0
>
> Attachments: HBASE-17988.002.patch, HBASE-17988.patch
>
>
> The scripts {{bin/get-active-master.rb}} and {{bin/draining_servers.rb}} no 
> longer work on current master branch. Here is an example error message:
> {noformat}
> $ bin/hbase-jruby bin/get-active-master.rb 
> NoMethodError: undefined method `masterAddressZNode' for 
> #
>at bin/get-active-master.rb:35
> {noformat}
> My initial probing suggests that this is likely due to movement that happened 
> in HBASE-16690. Perhaps instead of reworking the ruby, there is similar Java 
> functionality already existing somewhere.
> Putting priority at critical since it's impossible to know whether users rely 
> on the scripts.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HBASE-18161) MultiHFileOutputFormat - comprehensive incremental load support

2017-06-12 Thread Densel Santhmayor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047030#comment-16047030
 ] 

Densel Santhmayor edited comment on HBASE-18161 at 6/12/17 8:33 PM:


The link is https://reviews.apache.org/r/60027/


was (Author: denselm):
I had to choose hbase-git since the group "hbase" was giving me an error when 
posting a git diff. 

The link is https://reviews.apache.org/r/60027/

> MultiHFileOutputFormat - comprehensive incremental load support
> ---
>
> Key: HBASE-18161
> URL: https://issues.apache.org/jira/browse/HBASE-18161
> Project: HBase
>  Issue Type: New Feature
>Reporter: Densel Santhmayor
>Priority: Minor
> Attachments: MultiHFileOutputFormatSupport_HBASE_18161.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v2.patch
>
>
> h2. Introduction
> MapReduce currently supports the ability to write HBase records in bulk to 
> HFiles for a single table. The file(s) can then be uploaded to the relevant 
> RegionServers with reasonable latency. This feature is useful for making a 
> large set of data available for queries at the same time, and it provides a 
> way to efficiently process very large input into HBase without affecting 
> query latencies.
> There is, however, no support to write variations of the same record key to 
> HFiles belonging to multiple HBase tables from within the same MapReduce job. 
>  
> h2. Goal
> The goal of this JIRA is to extend HFileOutputFormat2 to support writing to 
> HFiles for different tables within the same MapReduce job while keeping 
> single-table HFile features backwards-compatible.
> For our use case, we needed to write a record key to a smaller HBase table 
> for quicker access, and the same record key with a date appended to a larger 
> table for longer term storage with chronological access. Each of these tables 
> would have different TTL and other settings to support their respective 
> access patterns. We also needed to be able to bulk write records to multiple 
> tables with different subsets of very large input as efficiently as possible. 
> Rather than run the MapReduce job multiple times (one for each table or 
> record structure), it would be useful to be able to parse the input a single 
> time and write to multiple tables simultaneously.
> Additionally, we'd like to maintain backwards compatibility with the existing 
> heavily-used HFileOutputFormat2 interface so that benefits such as locality 
> sensitivity (introduced long after we implemented support for multiple 
> tables) apply to both single-table and multi-table HFile writes.
> h2. Proposal
> * Backwards compatibility for existing single table support in 
> HFileOutputFormat2 will be maintained and in this case, mappers will need to 
> emit the table rowkey as before. However, a new class - 
> MultiHFileOutputFormat - will provide a helper function to generate a rowkey 
> for mappers that prefixes the desired tablename to the existing rowkey as 
> well as provides configureIncrementalLoad support for multiple tables.
> * HFileOutputFormat2 will be updated in the following way:
> ** configureIncrementalLoad will now accept multiple table descriptor and 
> region locator pairs, analogous to the single pair currently accepted by 
> HFileOutputFormat2. 
> ** Compression, Block Size, Bloom Type and Datablock settings PER column 
> family that are set in the Configuration object are now indexed and retrieved 
> by tablename AND column family.
> ** getRegionStartKeys will now support multiple regionlocators and calculate 
> split points and therefore partitions collectively for all tables. Similarly, 
> now the eventual number of Reducers will be equal to the total number of 
> partitions across all tables. 
> ** The RecordWriter class will be able to process rowkeys either with or 
> without the tablename prepended depending on how configureIncrementalLoad was 
> configured with MultiHFileOutputFormat or HFileOutputFormat2.
> * The use of MultiHFileOutputFormat will write the output into HFiles which 
> will match the output format of HFileOutputFormat2. However, while the 
> default use case will keep the existing directory structure with column 
> family name as the directory and HFiles within that directory, in the case of 
> MultiHFileOutputFormat, it will output HFiles in the output directory with 
> the following relative paths: 
> {noformat}
>  --table1 
>--family1 
>  --HFiles 
>  --table2 
>--family1 
>--family2 
>  --HFiles
> {noformat}
> This aims to be a comprehensive solution to the original tickets - HBASE-3727 
> and HBASE-16261. Thanks to [~clayb] for his support.
> The patch will be attached shortly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-12 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047032#comment-16047032
 ] 

Sean Busbey commented on HBASE-18164:
-

I reran the QA build to see if some of those failures are maybe flakys

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say, on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference from the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18161) MultiHFileOutputFormat - comprehensive incremental load support

2017-06-12 Thread Densel Santhmayor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047030#comment-16047030
 ] 

Densel Santhmayor commented on HBASE-18161:
---

I had to choose hbase-git since the group "hbase" was giving me an error when 
posting a git diff. 

The link is https://reviews.apache.org/r/60027/

> MultiHFileOutputFormat - comprehensive incremental load support
> ---
>
> Key: HBASE-18161
> URL: https://issues.apache.org/jira/browse/HBASE-18161
> Project: HBase
>  Issue Type: New Feature
>Reporter: Densel Santhmayor
>Priority: Minor
> Attachments: MultiHFileOutputFormatSupport_HBASE_18161.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v2.patch
>
>
> h2. Introduction
> MapReduce currently supports the ability to write HBase records in bulk to 
> HFiles for a single table. The file(s) can then be uploaded to the relevant 
> RegionServers with reasonable latency. This feature is useful for making a 
> large set of data available for queries at the same time, and it provides a 
> way to efficiently process very large input into HBase without affecting 
> query latencies.
> There is, however, no support to write variations of the same record key to 
> HFiles belonging to multiple HBase tables from within the same MapReduce job. 
>  
> h2. Goal
> The goal of this JIRA is to extend HFileOutputFormat2 to support writing to 
> HFiles for different tables within the same MapReduce job while keeping 
> single-table HFile features backwards-compatible. 
> For our use case, we needed to write a record key to a smaller HBase table 
> for quicker access, and the same record key with a date appended to a larger 
> table for longer term storage with chronological access. Each of these tables 
> would have different TTL and other settings to support their respective 
> access patterns. We also needed to be able to bulk write records to multiple 
> tables with different subsets of very large input as efficiently as possible. 
> Rather than run the MapReduce job multiple times (one for each table or 
> record structure), it would be useful to be able to parse the input a single 
> time and write to multiple tables simultaneously.
> Additionally, we'd like to maintain backwards compatibility with the existing 
> heavily-used HFileOutputFormat2 interface so that benefits such as locality 
> sensitivity (introduced long after we implemented support for multiple 
> tables) apply to both single-table and multi-table HFile writes. 
> h2. Proposal
> * Backwards compatibility for existing single table support in 
> HFileOutputFormat2 will be maintained and in this case, mappers will need to 
> emit the table rowkey as before. However, a new class - 
> MultiHFileOutputFormat - will provide a helper function to generate a rowkey 
> for mappers that prefixes the desired tablename to the existing rowkey as 
> well as provides configureIncrementalLoad support for multiple tables.
> * HFileOutputFormat2 will be updated in the following way:
> ** configureIncrementalLoad will now accept multiple table descriptor and 
> region locator pairs, analogous to the single pair currently accepted by 
> HFileOutputFormat2. 
> ** Compression, Block Size, Bloom Type and Datablock settings PER column 
> family that are set in the Configuration object are now indexed and retrieved 
> by tablename AND column family.
> ** getRegionStartKeys will now support multiple regionlocators and calculate 
> split points and therefore partitions collectively for all tables. Similarly, 
> now the eventual number of Reducers will be equal to the total number of 
> partitions across all tables. 
> ** The RecordWriter class will be able to process rowkeys either with or 
> without the tablename prepended depending on how configureIncrementalLoad was 
> configured with MultiHFileOutputFormat or HFileOutputFormat2.
> * The use of MultiHFileOutputFormat will write the output into HFiles which 
> will match the output format of HFileOutputFormat2. However, while the 
> default use case will keep the existing directory structure with column 
> family name as the directory and HFiles within that directory, in the case of 
> MultiHFileOutputFormat, it will output HFiles in the output directory with 
> the following relative paths: 
> {noformat}
>  --table1 
>--family1 
>  --HFiles 
>  --table2 
>--family1 
>--family2 
>  --HFiles
> {noformat}
> This aims to be a comprehensive solution to the original tickets - HBASE-3727 
> and HBASE-16261. Thanks to [~clayb] for his support.
> The patch will be attached shortly.
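
To make the proposal concrete, here is a rough sketch of what a job author's mapper might look like under this design. MultiHFileOutputFormat.createCompositeKey and its signature are illustrative guesses based on the prose above, not the actual patch API:

{code}
// Illustrative sketch only - helper names/signatures are assumptions.
import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TwoTableMapper
    extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
  private static final byte[] FAMILY = Bytes.toBytes("family1");
  private static final byte[] QUALIFIER = Bytes.toBytes("q");

  @Override
  protected void map(LongWritable offset, Text line, Context ctx)
      throws IOException, InterruptedException {
    byte[] row = Bytes.toBytes(line.toString());
    byte[] datedRow = Bytes.add(row, Bytes.toBytes("-20170612"));

    // Hypothetical helper from the proposal: prefix the target table name
    // onto the rowkey so the RecordWriter can route each cell to that
    // table's HFile directory (table1/family1/..., table2/family1/...).
    ImmutableBytesWritable smallTableKey =
        MultiHFileOutputFormat.createCompositeKey(Bytes.toBytes("table1"), row);
    ImmutableBytesWritable bigTableKey =
        MultiHFileOutputFormat.createCompositeKey(Bytes.toBytes("table2"),
            datedRow);

    ctx.write(smallTableKey, new Put(row).addColumn(FAMILY, QUALIFIER, row));
    ctx.write(bigTableKey,
        new Put(datedRow).addColumn(FAMILY, QUALIFIER, row));
  }
}
{code}

Job setup would then presumably call the multi-table configureIncrementalLoad overload with the list of table descriptor / region locator pairs described in the proposal.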



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18161) MultiHFileOutputFormat - comprehensive incremental load support

2017-06-12 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047019#comment-16047019
 ] 

Ted Yu commented on HBASE-18161:


Can you post the link to the review board request?

Please select hbase as the group before publishing.

> MultiHFileOutputFormat - comprehensive incremental load support
> ---
>
> Key: HBASE-18161
> URL: https://issues.apache.org/jira/browse/HBASE-18161
> Project: HBase
>  Issue Type: New Feature
>Reporter: Densel Santhmayor
>Priority: Minor
> Attachments: MultiHFileOutputFormatSupport_HBASE_18161.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v2.patch
>
>
> h2. Introduction
> MapReduce currently supports the ability to write HBase records in bulk to 
> HFiles for a single table. The file(s) can then be uploaded to the relevant 
> RegionServers with reasonable latency. This feature is useful for making a 
> large set of data available for queries at the same time, and it provides a 
> way to efficiently process very large input into HBase without affecting 
> query latencies.
> There is, however, no support to write variations of the same record key to 
> HFiles belonging to multiple HBase tables from within the same MapReduce job. 
>  
> h2. Goal
> The goal of this JIRA is to extend HFileOutputFormat2 to support writing to 
> HFiles for different tables within the same MapReduce job while keeping 
> single-table HFile features backwards-compatible. 
> For our use case, we needed to write a record key to a smaller HBase table 
> for quicker access, and the same record key with a date appended to a larger 
> table for longer term storage with chronological access. Each of these tables 
> would have different TTL and other settings to support their respective 
> access patterns. We also needed to be able to bulk write records to multiple 
> tables with different subsets of very large input as efficiently as possible. 
> Rather than run the MapReduce job multiple times (one for each table or 
> record structure), it would be useful to be able to parse the input a single 
> time and write to multiple tables simultaneously.
> Additionally, we'd like to maintain backwards compatibility with the existing 
> heavily-used HFileOutputFormat2 interface so that benefits such as locality 
> sensitivity (introduced long after we implemented support for multiple 
> tables) apply to both single-table and multi-table HFile writes. 
> h2. Proposal
> * Backwards compatibility for existing single table support in 
> HFileOutputFormat2 will be maintained and in this case, mappers will need to 
> emit the table rowkey as before. However, a new class - 
> MultiHFileOutputFormat - will provide a helper function to generate a rowkey 
> for mappers that prefixes the desired tablename to the existing rowkey as 
> well as provides configureIncrementalLoad support for multiple tables.
> * HFileOutputFormat2 will be updated in the following way:
> ** configureIncrementalLoad will now accept multiple table descriptor and 
> region locator pairs, analogous to the single pair currently accepted by 
> HFileOutputFormat2. 
> ** Compression, Block Size, Bloom Type and Datablock settings PER column 
> family that are set in the Configuration object are now indexed and retrieved 
> by tablename AND column family.
> ** getRegionStartKeys will now support multiple regionlocators and calculate 
> split points and therefore partitions collectively for all tables. Similarly, 
> now the eventual number of Reducers will be equal to the total number of 
> partitions across all tables. 
> ** The RecordWriter class will be able to process rowkeys either with or 
> without the tablename prepended depending on how configureIncrementalLoad was 
> configured with MultiHFileOutputFormat or HFileOutputFormat2.
> * The use of MultiHFileOutputFormat will write the output into HFiles which 
> will match the output format of HFileOutputFormat2. However, while the 
> default use case will keep the existing directory structure with column 
> family name as the directory and HFiles within that directory, in the case of 
> MultiHFileOutputFormat, it will output HFiles in the output directory with 
> the following relative paths: 
> {noformat}
>  --table1 
>--family1 
>  --HFiles 
>  --table2 
>--family1 
>--family2 
>  --HFiles
> {noformat}
> This aims to be a comprehensive solution to the original tickets - HBASE-3727 
> and HBASE-16261. Thanks to [~clayb] for his support.
> The patch will be attached shortly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18209) Make httpclient / httpcore compile time dependency

2017-06-12 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047012#comment-16047012
 ] 

Ted Yu commented on HBASE-18209:


TestCoprocessorMetrics is a flaky test - not related to the patch.

> Make httpclient / httpcore compile time dependency
> --
>
> Key: HBASE-18209
> URL: https://issues.apache.org/jira/browse/HBASE-18209
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 18209.v1.txt
>
>
> We need httpclient & httpcore jars to be present when rootdir is placed on 
> s3(a).
> Attempts to move to the fully shaded amazon-SDK JAR caused problems of their 
> own (according to [~steve_l]).
> Here are the versions we should use:
> 4.5.2
> 4.4.4
> Currently they are declared as test dependencies.
> This JIRA is to move to a compile-time dependency so that the corresponding 
> jars are bundled in the lib directory.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18161) MultiHFileOutputFormat - comprehensive incremental load support

2017-06-12 Thread Densel Santhmayor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16046947#comment-16046947
 ] 

Densel Santhmayor commented on HBASE-18161:
---

Rookie mistake: I forgot to rebase on master after my latest changes. Fixed now 
and posting on review board.

> MultiHFileOutputFormat - comprehensive incremental load support
> ---
>
> Key: HBASE-18161
> URL: https://issues.apache.org/jira/browse/HBASE-18161
> Project: HBase
>  Issue Type: New Feature
>Reporter: Densel Santhmayor
>Priority: Minor
> Attachments: MultiHFileOutputFormatSupport_HBASE_18161.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v2.patch
>
>
> h2. Introduction
> MapReduce currently supports the ability to write HBase records in bulk to 
> HFiles for a single table. The file(s) can then be uploaded to the relevant 
> RegionServers with reasonable latency. This feature is useful for making a 
> large set of data available for queries at the same time, and it provides a 
> way to efficiently process very large input into HBase without affecting 
> query latencies.
> There is, however, no support to write variations of the same record key to 
> HFiles belonging to multiple HBase tables from within the same MapReduce job. 
>  
> h2. Goal
> The goal of this JIRA is to extend HFileOutputFormat2 to support writing to 
> HFiles for different tables within the same MapReduce job while keeping 
> single-table HFile features backwards-compatible. 
> For our use case, we needed to write a record key to a smaller HBase table 
> for quicker access, and the same record key with a date appended to a larger 
> table for longer term storage with chronological access. Each of these tables 
> would have different TTL and other settings to support their respective 
> access patterns. We also needed to be able to bulk write records to multiple 
> tables with different subsets of very large input as efficiently as possible. 
> Rather than run the MapReduce job multiple times (one for each table or 
> record structure), it would be useful to be able to parse the input a single 
> time and write to multiple tables simultaneously.
> Additionally, we'd like to maintain backwards compatibility with the existing 
> heavily-used HFileOutputFormat2 interface so that benefits such as locality 
> sensitivity (introduced long after we implemented support for multiple 
> tables) apply to both single-table and multi-table HFile writes. 
> h2. Proposal
> * Backwards compatibility for existing single table support in 
> HFileOutputFormat2 will be maintained and in this case, mappers will need to 
> emit the table rowkey as before. However, a new class - 
> MultiHFileOutputFormat - will provide a helper function to generate a rowkey 
> for mappers that prefixes the desired tablename to the existing rowkey as 
> well as provides configureIncrementalLoad support for multiple tables.
> * HFileOutputFormat2 will be updated in the following way:
> ** configureIncrementalLoad will now accept multiple table descriptor and 
> region locator pairs, analogous to the single pair currently accepted by 
> HFileOutputFormat2. 
> ** Compression, Block Size, Bloom Type and Datablock settings PER column 
> family that are set in the Configuration object are now indexed and retrieved 
> by tablename AND column family.
> ** getRegionStartKeys will now support multiple regionlocators and calculate 
> split points and therefore partitions collectively for all tables. Similarly, 
> now the eventual number of Reducers will be equal to the total number of 
> partitions across all tables. 
> ** The RecordWriter class will be able to process rowkeys either with or 
> without the tablename prepended depending on how configureIncrementalLoad was 
> configured with MultiHFileOutputFormat or HFileOutputFormat2.
> * The use of MultiHFileOutputFormat will write the output into HFiles which 
> will match the output format of HFileOutputFormat2. However, while the 
> default use case will keep the existing directory structure with column 
> family name as the directory and HFiles within that directory, in the case of 
> MultiHFileOutputFormat, it will output HFiles in the output directory with 
> the following relative paths: 
> {noformat}
>  --table1 
>--family1 
>  --HFiles 
>  --table2 
>--family1 
>--family2 
>  --HFiles
> {noformat}
> This aims to be a comprehensive solution to the original tickets - HBASE-3727 
> and HBASE-16261. Thanks to [~clayb] for his support.
> The patch will be attached shortly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18209) Make httpclient / httpcore compile time dependency

2017-06-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16046926#comment-16046926
 ] 

Hadoop QA commented on HBASE-18209:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 2s 
{color} | {color:blue} The patch file was not named according to hbase's naming 
conventions. Please see 
https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for 
instructions. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
39s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
28m 25s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 120m 37s 
{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 156m 51s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Timed out junit tests | 
org.apache.hadoop.hbase.coprocessor.TestCoprocessorMetrics |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12872731/18209.v1.txt |
| JIRA Issue | HBASE-18209 |
| Optional Tests |  asflicense  javac  javadoc  unit  xml  compile  |
| uname | Linux 1a8ac26175ef 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 
15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 384e308 |
| Default Java | 1.8.0_131 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7172/artifact/patchprocess/patch-unit-hbase-server.txt
 |
| unit test logs |  
https://builds.apache.org/job/PreCommit-HBASE-Build/7172/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7172/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7172/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Make httpclient / httpcore compile time dependency
> --
>
> Key: HBASE-18209
> URL: https://issues.apache.org/jira/browse/HBASE-18209
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 18209.v1.txt
>
>
> We need httpclient & httpcore jars to be present when rootdir is placed on 
> s3(a).
> Attempts to move to the fully shaded amazon-SDK JAR caused problems of their 
> own (according to [~steve_l]).
> Here are the versions we should use:
> 4.5.2
> 4.4.4
> Currently they are declared as test dependencies.
> This JIRA is to move to a compile-time dependency so that the corresponding 
> jars are bundled in the lib directory.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-18161) MultiHFileOutputFormat - comprehensive incremental load support

2017-06-12 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16046894#comment-16046894
 ] 

Ted Yu commented on HBASE-18161:


There is a conflict in the following file:
hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat2.java

Please put the next patch on the review board - it is quite big.

> MultiHFileOutputFormat - comprehensive incremental load support
> ---
>
> Key: HBASE-18161
> URL: https://issues.apache.org/jira/browse/HBASE-18161
> Project: HBase
>  Issue Type: New Feature
>Reporter: Densel Santhmayor
>Priority: Minor
> Attachments: MultiHFileOutputFormatSupport_HBASE_18161.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v2.patch
>
>
> h2. Introduction
> MapReduce currently supports the ability to write HBase records in bulk to 
> HFiles for a single table. The file(s) can then be uploaded to the relevant 
> RegionServers with reasonable latency. This feature is useful for making a 
> large set of data available for queries at the same time, and it provides a 
> way to efficiently process very large input into HBase without affecting 
> query latencies.
> There is, however, no support to write variations of the same record key to 
> HFiles belonging to multiple HBase tables from within the same MapReduce job. 
>  
> h2. Goal
> The goal of this JIRA is to extend HFileOutputFormat2 to support writing to 
> HFiles for different tables within the same MapReduce job while keeping 
> single-table HFile features backwards-compatible. 
> For our use case, we needed to write a record key to a smaller HBase table 
> for quicker access, and the same record key with a date appended to a larger 
> table for longer term storage with chronological access. Each of these tables 
> would have different TTL and other settings to support their respective 
> access patterns. We also needed to be able to bulk write records to multiple 
> tables with different subsets of very large input as efficiently as possible. 
> Rather than run the MapReduce job multiple times (one for each table or 
> record structure), it would be useful to be able to parse the input a single 
> time and write to multiple tables simultaneously.
> Additionally, we'd like to maintain backwards compatibility with the existing 
> heavily-used HFileOutputFormat2 interface so that benefits such as locality 
> sensitivity (introduced long after we implemented support for multiple 
> tables) apply to both single-table and multi-table HFile writes. 
> h2. Proposal
> * Backwards compatibility for existing single table support in 
> HFileOutputFormat2 will be maintained and in this case, mappers will need to 
> emit the table rowkey as before. However, a new class - 
> MultiHFileOutputFormat - will provide a helper function to generate a rowkey 
> for mappers that prefixes the desired tablename to the existing rowkey as 
> well as provides configureIncrementalLoad support for multiple tables.
> * HFileOutputFormat2 will be updated in the following way:
> ** configureIncrementalLoad will now accept multiple table descriptor and 
> region locator pairs, analogous to the single pair currently accepted by 
> HFileOutputFormat2. 
> ** Compression, Block Size, Bloom Type and Datablock settings PER column 
> family that are set in the Configuration object are now indexed and retrieved 
> by tablename AND column family.
> ** getRegionStartKeys will now support multiple regionlocators and calculate 
> split points and therefore partitions collectively for all tables. Similarly, 
> now the eventual number of Reducers will be equal to the total number of 
> partitions across all tables. 
> ** The RecordWriter class will be able to process rowkeys either with or 
> without the tablename prepended depending on how configureIncrementalLoad was 
> configured with MultiHFileOutputFormat or HFileOutputFormat2.
> * The use of MultiHFileOutputFormat will write the output into HFiles which 
> will match the output format of HFileOutputFormat2. However, while the 
> default use case will keep the existing directory structure with column 
> family name as the directory and HFiles within that directory, in the case of 
> MultiHFileOutputFormat, it will output HFiles in the output directory with 
> the following relative paths: 
> {noformat}
>  --table1 
>--family1 
>  --HFiles 
>  --table2 
>--family1 
>--family2 
>  --HFiles
> {noformat}
> This aims to be a comprehensive solution to the original tickets - HBASE-3727 
> and HBASE-16261. Thanks to [~clayb] for his support.
> The patch will be attached shortly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18180) Possible connection leak while closing BufferedMutator in TableOutputFormat

2017-06-12 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16046857#comment-16046857
 ] 

Ted Yu commented on HBASE-18180:


lgtm

Better to resubmit for a QA run.

> Possible connection leak while closing BufferedMutator in TableOutputFormat
> ---
>
> Key: HBASE-18180
> URL: https://issues.apache.org/jira/browse/HBASE-18180
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 1.4.0, 1.3.1, 1.3.2
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
> Fix For: 3.0.0, 1.4.0
>
> Attachments: HBASE-18180-branch-1.patch, HBASE-18180.patch
>
>
> In TableOutputFormat, the connection will not be released when 
> "mutator.close()" throws an exception.
> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
> {code}
> public void close(TaskAttemptContext context)
> throws IOException {
>   mutator.close();
>   connection.close();
> }
> {code}
> org.apache.hadoop.hbase.mapred.TableOutputFormat
> {code}
> public void close(Reporter reporter) throws IOException {
>   this.m_mutator.close();
>   if (connection != null) {
> connection.close();
> connection = null;
>   }
> }
> {code}
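
A minimal sketch of one way to plug the leak, assuming nothing about the attached patch's exact shape: move the connection cleanup into a finally block so it runs even when mutator.close() throws.

{code}
public void close(TaskAttemptContext context) throws IOException {
  try {
    mutator.close();
  } finally {
    // Executes even if mutator.close() threw, so the connection is
    // released on the error path as well.
    if (connection != null) {
      connection.close();
      connection = null;
    }
  }
}
{code}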



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18180) Possible connection leak while closing BufferedMutator in TableOutputFormat

2017-06-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16046817#comment-16046817
 ] 

Hadoop QA commented on HBASE-18180:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 31s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
34s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 28s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
36s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
30s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
20s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 18s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
58s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
66m 37s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 213m 1s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
55s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 314m 0s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hbase.security.access.TestCoprocessorWhitelistMasterObserver |
| Timed out junit tests | 
org.apache.hadoop.hbase.client.TestAsyncSnapshotAdminApi |
|   | org.apache.hadoop.hbase.coprocessor.TestCoprocessorMetrics |
|   | org.apache.hadoop.hbase.replication.regionserver.TestWALEntryStream |
|   | org.apache.hadoop.hbase.quotas.TestSpaceQuotas |
|   | org.apache.hadoop.hbase.quotas.TestQuotaObserverChoreWithMiniCluster |
|   | org.apache.hadoop.hbase.client.TestReplicaWithCluster |
|   | org.apache.hadoop.hbase.snapshot.TestMobRestoreFlushSnapshotFromClient |
|   | org.apache.hadoop.hbase.replication.TestReplicationSmallTests |
|   | org.apache.hadoop.hbase.client.TestAsyncTableScan |
|   | org.apache.hadoop.hbase.client.TestBlockEvictionFromClient |
|   | org.apache.hadoop.hbase.filter.TestFuzzyRowFilterEndToEnd |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.03.0-ce Server=17.03.0-ce Image:yetus/hbase:757bf37 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12872687/HBASE-18180.patch |
| JIRA Issue | HBASE-18180 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 5648e5ea0e03 4.8.3-std-1 #1 SMP Fri Oct 21 11:15:43 UTC 2016 
x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (HBASE-18161) MultiHFileOutputFormat - comprehensive incremental load support

2017-06-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16046766#comment-16046766
 ] 

Hadoop QA commented on HBASE-18161:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s {color} 
| {color:red} HBASE-18161 does not apply to master. Rebase required? Wrong 
Branch? See https://yetus.apache.org/documentation/0.3.0/precommit-patchnames 
for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12872734/MultiHFileOutputFormatSupport_HBASE_18161_v2.patch
 |
| JIRA Issue | HBASE-18161 |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7173/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> MultiHFileOutputFormat - comprehensive incremental load support
> ---
>
> Key: HBASE-18161
> URL: https://issues.apache.org/jira/browse/HBASE-18161
> Project: HBase
>  Issue Type: New Feature
>Reporter: Densel Santhmayor
>Priority: Minor
> Attachments: MultiHFileOutputFormatSupport_HBASE_18161.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v2.patch
>
>
> h2. Introduction
> MapReduce currently supports the ability to write HBase records in bulk to 
> HFiles for a single table. The file(s) can then be uploaded to the relevant 
> RegionServers with reasonable latency. This feature is useful for making a 
> large set of data available for queries at the same time, and it provides a 
> way to efficiently process very large input into HBase without affecting 
> query latencies.
> There is, however, no support to write variations of the same record key to 
> HFiles belonging to multiple HBase tables from within the same MapReduce job. 
>  
> h2. Goal
> The goal of this JIRA is to extend HFileOutputFormat2 to support writing to 
> HFiles for different tables within the same MapReduce job while keeping 
> single-table HFile features backwards-compatible. 
> For our use case, we needed to write a record key to a smaller HBase table 
> for quicker access, and the same record key with a date appended to a larger 
> table for longer term storage with chronological access. Each of these tables 
> would have different TTL and other settings to support their respective 
> access patterns. We also needed to be able to bulk write records to multiple 
> tables with different subsets of very large input as efficiently as possible. 
> Rather than run the MapReduce job multiple times (one for each table or 
> record structure), it would be useful to be able to parse the input a single 
> time and write to multiple tables simultaneously.
> Additionally, we'd like to maintain backwards compatibility with the existing 
> heavily-used HFileOutputFormat2 interface so that benefits such as locality 
> sensitivity (introduced long after we implemented support for multiple 
> tables) apply to both single-table and multi-table HFile writes. 
> h2. Proposal
> * Backwards compatibility for existing single table support in 
> HFileOutputFormat2 will be maintained and in this case, mappers will need to 
> emit the table rowkey as before. However, a new class - 
> MultiHFileOutputFormat - will provide a helper function to generate a rowkey 
> for mappers that prefixes the desired tablename to the existing rowkey as 
> well as provides configureIncrementalLoad support for multiple tables.
> * HFileOutputFormat2 will be updated in the following way:
> ** configureIncrementalLoad will now accept multiple table descriptor and 
> region locator pairs, analogous to the single pair currently accepted by 
> HFileOutputFormat2. 
> ** Compression, Block Size, Bloom Type and Datablock settings PER column 
> family that are set in the Configuration object are now indexed and retrieved 
> by tablename AND column family.
> ** getRegionStartKeys will now support multiple regionlocators and calculate 
> split points and therefore partitions collectively for all tables. Similarly, 
> now the eventual number of Reducers will be equal to the total number of 
> partitions across all tables. 
> ** The RecordWriter class will be able to process rowkeys either with or 
> without the tablename prepended depending on how configureIncrementalLoad was 
> configured with MultiHFileOutputFormat or HFileOutputFormat2.
> * The use of MultiHFileOutputFormat will write the output into HFiles which 
> will match the output format of HFileOutputFormat2. However, while the 
> default use case will keep the existing directory structure with column 
> family name as the directory and HFiles within that directory, in the case of 
> MultiHFileOutputFormat, it will output HFiles in the output directory with 
> the following relative paths: 
> {noformat}
>  --table1 
>--family1 
>  --HFiles 
>  --table2 
>--family1 
>--family2 
>  --HFiles
> {noformat}
> This aims to be a comprehensive solution to the original tickets - HBASE-3727 
> and HBASE-16261. Thanks to [~clayb] for his support.
> The patch will be attached shortly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Created] (HBASE-18210) Implement Table#checkAndDelete()

2017-06-12 Thread Ted Yu (JIRA)
Ted Yu created HBASE-18210:
--

 Summary: Implement Table#checkAndDelete()
 Key: HBASE-18210
 URL: https://issues.apache.org/jira/browse/HBASE-18210
 Project: HBase
  Issue Type: Sub-task
Reporter: Ted Yu


This issue is to implement Table#checkAndDelete() API.
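
For reference, the blocking client already exposes this shape of API; a usage sketch of the existing Table#checkAndDelete (the sub-task's own implementation surface may differ):

{code}
import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CheckAndDeleteExample {
  public static void main(String[] args) throws IOException {
    try (Connection conn = ConnectionFactory.createConnection();
         Table table = conn.getTable(TableName.valueOf("t1"))) {
      byte[] row = Bytes.toBytes("row1");
      // Atomic check-and-delete: the Delete is applied only if cf:q
      // currently holds the expected value.
      boolean applied = table.checkAndDelete(row, Bytes.toBytes("cf"),
          Bytes.toBytes("q"), Bytes.toBytes("expected"), new Delete(row));
      System.out.println("deleted: " + applied);
    }
  }
}
{code}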



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18161) MultiHFileOutputFormat - comprehensive incremental load support

2017-06-12 Thread Densel Santhmayor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Densel Santhmayor updated HBASE-18161:
--
Attachment: MultiHFileOutputFormatSupport_HBASE_18161_v2.patch

All tests fixed for the default HFileOutputFormat as well as for the code 
written to test the multi-table HFileOutputFormat2.

> MultiHFileOutputFormat - comprehensive incremental load support
> ---
>
> Key: HBASE-18161
> URL: https://issues.apache.org/jira/browse/HBASE-18161
> Project: HBase
>  Issue Type: New Feature
>Reporter: Densel Santhmayor
>Priority: Minor
> Attachments: MultiHFileOutputFormatSupport_HBASE_18161.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v2.patch
>
>
> h2. Introduction
> MapReduce currently supports the ability to write HBase records in bulk to 
> HFiles for a single table. The file(s) can then be uploaded to the relevant 
> RegionServers with reasonable latency. This feature is useful for making a 
> large set of data available for queries at the same time, and it provides a 
> way to efficiently process very large input into HBase without affecting 
> query latencies.
> There is, however, no support to write variations of the same record key to 
> HFiles belonging to multiple HBase tables from within the same MapReduce job. 
>  
> h2. Goal
> The goal of this JIRA is to extend HFileOutputFormat2 to support writing to 
> HFiles for different tables within the same MapReduce job while keeping 
> single-table HFile features backwards-compatible. 
> For our use case, we needed to write a record key to a smaller HBase table 
> for quicker access, and the same record key with a date appended to a larger 
> table for longer term storage with chronological access. Each of these tables 
> would have different TTL and other settings to support their respective 
> access patterns. We also needed to be able to bulk write records to multiple 
> tables with different subsets of very large input as efficiently as possible. 
> Rather than run the MapReduce job multiple times (one for each table or 
> record structure), it would be useful to be able to parse the input a single 
> time and write to multiple tables simultaneously.
> Additionally, we'd like to maintain backwards compatibility with the existing 
> heavily-used HFileOutputFormat2 interface so that benefits such as locality 
> sensitivity (introduced long after we implemented support for multiple 
> tables) apply to both single-table and multi-table HFile writes. 
> h2. Proposal
> * Backwards compatibility for existing single table support in 
> HFileOutputFormat2 will be maintained and in this case, mappers will need to 
> emit the table rowkey as before. However, a new class - 
> MultiHFileOutputFormat - will provide a helper function to generate a rowkey 
> for mappers that prefixes the desired tablename to the existing rowkey as 
> well as provides configureIncrementalLoad support for multiple tables.
> * HFileOutputFormat2 will be updated in the following way:
> ** configureIncrementalLoad will now accept multiple table descriptor and 
> region locator pairs, analogous to the single pair currently accepted by 
> HFileOutputFormat2. 
> ** Compression, Block Size, Bloom Type and Datablock settings PER column 
> family that are set in the Configuration object are now indexed and retrieved 
> by tablename AND column family.
> ** getRegionStartKeys will now support multiple regionlocators and calculate 
> split points and therefore partitions collectively for all tables. Similarly, 
> now the eventual number of Reducers will be equal to the total number of 
> partitions across all tables. 
> ** The RecordWriter class will be able to process rowkeys either with or 
> without the tablename prepended depending on how configureIncrementalLoad was 
> configured with MultiHFileOutputFormat or HFileOutputFormat2.
> * The use of MultiHFileOutputFormat will write the output into HFiles which 
> will match the output format of HFileOutputFormat2. However, while the 
> default use case will keep the existing directory structure with column 
> family name as the directory and HFiles within that directory, in the case of 
> MultiHFileOutputFormat, it will output HFiles in the output directory with 
> the following relative paths: 
> {noformat}
>  --table1 
>--family1 
>  --HFiles 
>  --table2 
>--family1 
>--family2 
>  --HFiles
> {noformat}
> This aims to be a comprehensive solution to the original tickets - HBASE-3727 
> and HBASE-16261. Thanks to [~clayb] for his support.
> The patch will be attached shortly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18209) Make httpclient / httpcore compile time dependency

2017-06-12 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-18209:
---
Attachment: 18209.v1.txt

> Make httpclient / httpcore compile time dependency
> --
>
> Key: HBASE-18209
> URL: https://issues.apache.org/jira/browse/HBASE-18209
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 18209.v1.txt
>
>
> We need httpclient & httpcore jars to be present when rootdir is placed on 
> s3(a).
> Attempts to move to the fully shaded amazon-SDK JAR caused problems of their 
> own (according to [~steve_l]).
> Here are the versions we should use:
> 4.5.2
> 4.4.4
> Currently they are declared as test dependencies.
> This JIRA is to move to a compile-time dependency so that the corresponding 
> jars are bundled in the lib directory.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18209) Make httpclient / httpcore compile time dependency

2017-06-12 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-18209:
---
Status: Patch Available  (was: Open)

> Make httpclient / httpcore compile time dependency
> --
>
> Key: HBASE-18209
> URL: https://issues.apache.org/jira/browse/HBASE-18209
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 18209.v1.txt
>
>
> We need httpclient & httpcore jars to be present when rootdir is placed on 
> s3(a).
> Attempts to move to the fully shaded amazon-SDK JAR caused problems of their 
> own (according to [~steve_l]).
> Here are the versions we should use:
> 4.5.2
> 4.4.4
> Currently they are declared as test dependencies.
> This JIRA is to move to a compile-time dependency so that the corresponding 
> jars are bundled in the lib directory.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18209) Make httpclient / httpcore compile time dependency

2017-06-12 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-18209:
---
Description: 
We need httpclient & httpcore jars to be present when rootdir is placed on 
s3(a).
Attempts to move to the fully shaded amazon-SDK JAR caused problems of their 
own (according to [~steve_l]).

Here are the versions we should use:

4.5.2
4.4.4

Currently they are declared as test dependencies.

This JIRA is to move to a compile-time dependency so that the corresponding 
jars are bundled in the lib directory.

  was:
We need httpclient & httpcore jars to be present when rootdir is placed on 
s3(a).
Attempts to move to the fully shaded amazon-SDK JAR caused problems of their own.

Here are the versions we should use:

4.5.2
4.4.4

Currently they are declared as test dependencies.

This JIRA is to move to a compile-time dependency so that the corresponding jars 
are bundled in the lib directory.


> Make httpclient / httpcore compile time dependency
> --
>
> Key: HBASE-18209
> URL: https://issues.apache.org/jira/browse/HBASE-18209
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>
> We need httpclient & httpcore jars to be present when rootdir is placed on 
> s3(a).
> Attempts to move to the fully shaded amazon-SDK JAR caused problems of their 
> own (according to [~steve_l]).
> Here are the versions we should use:
> 4.5.2
> 4.4.4
> Currently they are declared as test dependencies.
> This JIRA is to move to a compile-time dependency so that the corresponding 
> jars are bundled in the lib directory.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HBASE-18209) Make httpclient / httpcore compile time dependency

2017-06-12 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reassigned HBASE-18209:
--

Assignee: Ted Yu

> Make httpclient / httpcore compile time dependency
> --
>
> Key: HBASE-18209
> URL: https://issues.apache.org/jira/browse/HBASE-18209
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>
> We need httpclient & httpcore jars to be present when rootdir is placed on 
> s3(a).
> Attempts to move to the fully shaded amazon-SDK JAR caused problems of their 
> own.
> Here are the versions we should use:
> 4.5.2
> 4.4.4
> Currently they are declared as test dependencies.
> This JIRA is to move to a compile-time dependency so that the corresponding 
> jars are bundled in the lib directory.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HBASE-18209) Make httpclient / httpcore compile time dependency

2017-06-12 Thread Ted Yu (JIRA)
Ted Yu created HBASE-18209:
--

 Summary: Make httpclient / httpcore compile time dependency
 Key: HBASE-18209
 URL: https://issues.apache.org/jira/browse/HBASE-18209
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu


We need httpclient & httpcore jars to be present when rootdir is placed on 
s3(a).
Attempts to move to the fully shaded amazon-SDK JAR caused problems of their own.

Here are the versions we should use:

4.5.2
4.4.4

Currently they are declared as test dependencies.

This JIRA is to move to a compile-time dependency so that the corresponding jars 
are bundled in the lib directory.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18180) Possible connection leak while closing BufferedMutator in TableOutputFormat

2017-06-12 Thread Pankaj Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16046571#comment-16046571
 ] 

Pankaj Kumar commented on HBASE-18180:
--

Thanks [~sachinjain024] for looking into this issue. This JIRA is to address 
the problem below:
{noformat}
   connection will not be released in case when "mutator.close()" throws 
exception.
{noformat}

> Possible connection leak while closing BufferedMutator in TableOutputFormat
> ---
>
> Key: HBASE-18180
> URL: https://issues.apache.org/jira/browse/HBASE-18180
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 1.4.0, 1.3.1, 1.3.2
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
> Fix For: 3.0.0, 1.4.0
>
> Attachments: HBASE-18180-branch-1.patch, HBASE-18180.patch
>
>
> In TableOutputFormat, the connection will not be released when 
> "mutator.close()" throws an exception.
> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
> {code}
> public void close(TaskAttemptContext context)
> throws IOException {
>   mutator.close();
>   connection.close();
> }
> {code}
> org.apache.hadoop.hbase.mapred.TableOutputFormat
> {code}
> public void close(Reporter reporter) throws IOException {
>   this.m_mutator.close();
>   if (connection != null) {
> connection.close();
> connection = null;
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18180) Possible connection leak while closing BufferedMutator in TableOutputFormat

2017-06-12 Thread Sachin Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16046499#comment-16046499
 ] 

Sachin Jain commented on HBASE-18180:
-

[~pankaj2461] Have a look at TableOutputFormat inside 
org.apache.hadoop.hbase.mapreduce. The connection leak problem has already been 
addressed in this package. Code sample for reference:

{code}
@Override
public void close(TaskAttemptContext context)
throws IOException {
  mutator.close();
  connection.close();
}
{code}

> Possible connection leak while closing BufferedMutator in TableOutputFormat
> ---
>
> Key: HBASE-18180
> URL: https://issues.apache.org/jira/browse/HBASE-18180
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 1.4.0, 1.3.1, 1.3.2
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
> Fix For: 3.0.0, 1.4.0
>
> Attachments: HBASE-18180-branch-1.patch, HBASE-18180.patch
>
>
> In TableOutputFormat, the connection will not be released when 
> "mutator.close()" throws an exception.
> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
> {code}
> public void close(TaskAttemptContext context)
> throws IOException {
>   mutator.close();
>   connection.close();
> }
> {code}
> org.apache.hadoop.hbase.mapred.TableOutputFormat
> {code}
> public void close(Reporter reporter) throws IOException {
>   this.m_mutator.close();
>   if (connection != null) {
> connection.close();
> connection = null;
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18180) Possible connection leak while closing BufferedMutator in TableOutputFormat

2017-06-12 Thread Pankaj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Kumar updated HBASE-18180:
-
Fix Version/s: 1.4.0
   3.0.0
   Status: Patch Available  (was: Open)

> Possible connection leak while closing BufferedMutator in TableOutputFormat
> ---
>
> Key: HBASE-18180
> URL: https://issues.apache.org/jira/browse/HBASE-18180
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 1.3.1, 1.4.0, 1.3.2
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
> Fix For: 3.0.0, 1.4.0
>
> Attachments: HBASE-18180-branch-1.patch, HBASE-18180.patch
>
>
> In TableOutputFormat, the connection will not be released when 
> "mutator.close()" throws an exception.
> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
> {code}
> public void close(TaskAttemptContext context)
> throws IOException {
>   mutator.close();
>   connection.close();
> }
> {code}
> org.apache.hadoop.hbase.mapred.TableOutputFormat
> {code}
> public void close(Reporter reporter) throws IOException {
>   this.m_mutator.close();
>   if (connection != null) {
> connection.close();
> connection = null;
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18180) Possible connection leak while closing BufferedMutator in TableOutputFormat

2017-06-12 Thread Pankaj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Kumar updated HBASE-18180:
-
Attachment: HBASE-18180.patch
HBASE-18180-branch-1.patch

Simple patch, please review. 

> Possible connection leak while closing BufferedMutator in TableOutputFormat
> ---
>
> Key: HBASE-18180
> URL: https://issues.apache.org/jira/browse/HBASE-18180
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 1.4.0, 1.3.1, 1.3.2
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
> Attachments: HBASE-18180-branch-1.patch, HBASE-18180.patch
>
>
> In TableOutputFormat, the connection will not be released when 
> "mutator.close()" throws an exception.
> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
> {code}
> public void close(TaskAttemptContext context)
> throws IOException {
>   mutator.close();
>   connection.close();
> }
> {code}
> org.apache.hadoop.hbase.mapred.TableOutputFormat
> {code}
> public void close(Reporter reporter) throws IOException {
>   this.m_mutator.close();
>   if (connection != null) {
> connection.close();
> connection = null;
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18023) Log multi-* requests for more than threshold number of rows

2017-06-12 Thread Clay B. (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16046444#comment-16046444
 ] 

Clay B. commented on HBASE-18023:
-

Hi [~dharju], thanks for asking. I would suspect this is very specific to a 
cluster's usage and the specific pathology one is seeking out. In our case I 
think it happened to be tens of thousands of writes. But I would imagine that, 
like `hbase.ipc.warn.response.time` or `hbase.ipc.warn.response.size`, this 
could be tuned and left with a generic (or non-warning) default?

> Log multi-* requests for more than threshold number of rows
> ---
>
> Key: HBASE-18023
> URL: https://issues.apache.org/jira/browse/HBASE-18023
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Clay B.
>Assignee: Josh Elser
>Priority: Minor
>
> Today, if a user happens to do something like a large multi-put, they can get 
> through request throttling (e.g. it is one request) but still crash a region 
> server with a garbage storm. We have seen regionservers hit this issue and it 
> is silent and deadly. The RS will report nothing more than a mysterious 
> garbage collection and exit out.
> Ideally, we could report a large multi-* request before starting it, in case 
> it happens to be deadly. Knowing the client, user and how many rows are 
> affected would be a good start to tracking down painful users.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HBASE-18089) TestScannerHeartbeatMessages fails in branch-1

2017-06-12 Thread Xiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiang Li reassigned HBASE-18089:


Assignee: Xiang Li

> TestScannerHeartbeatMessages fails in branch-1
> --
>
> Key: HBASE-18089
> URL: https://issues.apache.org/jira/browse/HBASE-18089
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Xiang Li
> Attachments: test-heartbeat-6860.out
>
>
> From 
> https://builds.apache.org/job/PreCommit-HBASE-Build/6860/artifact/patchprocess/patch-unit-hbase-server.txt
>  :
> {code}
> testScannerHeartbeatMessages(org.apache.hadoop.hbase.regionserver.TestScannerHeartbeatMessages)
>   Time elapsed: 2.376 sec  <<< FAILURE!
> java.lang.AssertionError: Heartbeats messages are disabled, an exception 
> should be thrown. If an exception  is not thrown, the test case is not 
> testing the importance of heartbeat messages
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.hadoop.hbase.regionserver.TestScannerHeartbeatMessages.testImportanceOfHeartbeats(TestScannerHeartbeatMessages.java:237)
>   at 
> org.apache.hadoop.hbase.regionserver.TestScannerHeartbeatMessages.testScannerHeartbeatMessages(TestScannerHeartbeatMessages.java:207)
> {code}
> Similar test failure can be observed in 
> https://builds.apache.org/job/PreCommit-HBASE-Build/6852/artifact/patchprocess/patch-unit-hbase-server.txt



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)