[jira] [Commented] (HBASE-18160) Fix incorrect logic in FilterList.filterKeyValue

2017-06-21 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058805#comment-16058805
 ] 

Anoop Sam John commented on HBASE-18160:


Another side note, again unrelated to this issue, but something we should try 
to fix.
{code}
  private Cell referenceCell = null;

  /**
   * When filtering a given Cell in {@link #filterKeyValue(Cell)},
   * this stores the transformed Cell to be returned by {@link #transformCell(Cell)}.
   *
   * Individual filters transformation are applied only when the filter includes the Cell.
   * Transformations are composed in the order specified by {@link #filters}.
   */
  private Cell transformedCell = null;
{code}
referenceCell is kept until the next call to transformCell(). A previously set 
cell is never used after an RPC response has been shipped (on reaching the 
batch size or the caching row count), so there is no functional problem. But we 
have to reset it to null after the check in transformCell(), or else it keeps 
a reference to the cell even after the cell has been shipped in the RPC. That 
reference keeps the backing HFile block byte[] alive and prevents it from being 
garbage collected. A one-line reset avoids this possible issue.
The same applies to transformedCell.
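
A minimal sketch of the reset suggested above, under stated assumptions: 
SketchCell and FilterListSketch are hypothetical stand-ins (not the real Cell 
or FilterList), and the reference check is simplified to an identity 
comparison. It only illustrates dropping both references once transformCell() 
has handed back its result.

```java
// Hypothetical stand-in for a cell; only object identity matters in this sketch.
interface SketchCell {}

// Illustrative only: mirrors the two fields quoted above, not the real FilterList.
class FilterListSketch {
  private SketchCell referenceCell = null;
  private SketchCell transformedCell = null;

  // Stands in for the filtering step: remember the cell and its transformed form.
  void remember(SketchCell original, SketchCell transformed) {
    this.referenceCell = original;
    this.transformedCell = transformed;
  }

  SketchCell transformCell(SketchCell c) {
    // Simplified check: the caller must pass the cell that was just filtered.
    if (c != referenceCell) {
      throw new IllegalStateException("Cell does not match the reference cell");
    }
    SketchCell result = transformedCell;
    // The suggested reset: drop both references so that the backing block
    // is no longer reachable from this object after the cell is returned.
    referenceCell = null;
    transformedCell = null;
    return result;
  }

  // True while this object still pins a cell.
  boolean holdsReferences() {
    return referenceCell != null || transformedCell != null;
  }
}
```

After one remember()/transformCell() round trip, holdsReferences() is false, 
which is the property the comment asks for.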

> Fix incorrect  logic in FilterList.filterKeyValue
> -
>
> Key: HBASE-18160
> URL: https://issues.apache.org/jira/browse/HBASE-18160
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Zheng Hu
> Attachments: HBASE-18160.branch-1.1.v1.patch, 
> HBASE-18160.branch-1.v1.patch, HBASE-18160.v1.patch, HBASE-18160.v2.patch, 
> HBASE-18160.v2.patch
>
>
> As HBASE-17678 said, there are two problems in the FilterList.filterKeyValue 
> implementation: 
> 1.  FilterList does not consider the INCLUDE_AND_SEEK_NEXT_ROW case (it seems 
> INCLUDE_AND_SEEK_NEXT_ROW is a newly added code, and the developer forgot to 
> handle it in FilterList). So if a user returns INCLUDE_AND_SEEK_NEXT_ROW from 
> a custom Filter that is wrapped by a FilterList, it throws an 
> IllegalStateException("Received code is not valid."). 
> 2.  For a FilterList with MUST_PASS_ONE, if filter-A in the list returns 
> INCLUDE and filter-B in the list returns INCLUDE_AND_NEXT_COL, the 
> FilterList finally returns INCLUDE_AND_NEXT_COL. According to the 
> minimal step rule, this is incorrect. (A filter list with MUST_PASS_ONE 
> should choose the minimal step among the filters in the list. Let's call it 
> the Minimal Step Rule.)
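
The Minimal Step Rule described above can be sketched as follows. This is an 
illustration under assumptions, not the patch itself: IncludeCode is a 
hypothetical enum holding only the three include-type codes named in the 
description, declared from the smallest scanner step to the largest, and 
MinimalStepSketch.merge is a made-up helper name.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical subset of return codes, declared from smallest step to largest.
enum IncludeCode {
  INCLUDE,                    // include the cell, move to the next cell
  INCLUDE_AND_NEXT_COL,       // include the cell, skip to the next column
  INCLUDE_AND_SEEK_NEXT_ROW   // include the cell, skip to the next row
}

class MinimalStepSketch {
  // For MUST_PASS_ONE, keep the code that advances the scanner the least,
  // so that no filter in the list misses a cell it might still want.
  // Enum natural ordering follows declaration order, so min() is the smallest step.
  static IncludeCode merge(List<IncludeCode> codes) {
    return Collections.min(codes); // throws NoSuchElementException on an empty list
  }
}
```

With filter-A returning INCLUDE and filter-B returning INCLUDE_AND_NEXT_COL, 
merge yields INCLUDE, which is the result the description says the list 
should return.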



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18160) Fix incorrect logic in FilterList.filterKeyValue

2017-06-21 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058801#comment-16058801
 ] 

Anoop Sam John commented on HBASE-18160:


Quick question
{code}
if (operator == Operator.MUST_PASS_ALL) {
  if (filter.filterAllRemaining()) {
    return ReturnCode.NEXT_ROW;
  }
  ReturnCode code = filter.filterKeyValue(c);
  switch (code) {
  // Override INCLUDE and continue to evaluate.
  case INCLUDE_AND_NEXT_COL:
    rc = ReturnCode.INCLUDE_AND_NEXT_COL; // FindBugs SF_SWITCH_FALLTHROUGH
  case INCLUDE:
    transformed = filter.transformCell(transformed);
    continue;
  case SEEK_NEXT_USING_HINT:
    seekHintFilter = filter;
    return code;
  default:
    return code;
  }
}
{code}
The MUST_PASS_ALL branch does not handle INCLUDE_AND_SEEK_NEXT_ROW either.

bq. seekHintFilter = filter;
When more than one filter in the list returns SEEK_NEXT_USING_HINT with a seek 
hint, we consider only one of them. Functionally that may be OK, but the 
optimized approach would be to return the highest key among all the returned 
hints. This is not related to this issue; I mention it only because I am 
looking at this area of the code after a long time. We can do it in a separate 
issue if needed.
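
A rough sketch of the "highest hint" idea, with loud assumptions: hints are 
modelled here as plain byte[] keys compared in unsigned lexicographic order, 
whereas the real code deals with Cells and a cell comparator. SeekHintSketch 
and highestHint are hypothetical names. Arrays.compareUnsigned requires 
Java 9 or later.

```java
import java.util.Arrays;
import java.util.List;

class SeekHintSketch {
  // Returns the largest hint in unsigned lexicographic byte order,
  // or null when no filter supplied a hint.
  static byte[] highestHint(List<byte[]> hints) {
    byte[] best = null;
    for (byte[] hint : hints) {
      if (hint == null) {
        continue; // this filter did not provide a hint
      }
      if (best == null || Arrays.compareUnsigned(hint, best) > 0) {
        best = hint;
      }
    }
    return best;
  }
}
```

The reasoning behind picking the maximum for MUST_PASS_ALL is that every 
filter has to accept a cell, so no cell before the furthest hint can pass all 
of them.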



[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-21 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058800#comment-16058800
 ] 

Duo Zhang commented on HBASE-17125:
---

But most users do not use filters at all, so I do not think it is a good 
idea to add the word 'filter' to the method name.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a column's max versions is 3, and we write 4 versions of this column. 
> The oldest version is not removed immediately, but from the user's view it 
> is gone. When the user queries with a filter and the filter skips a newer 
> version, the oldest version becomes visible again. After the region is 
> compacted, the oldest version is never seen again. This is confusing for the 
> user: the query returns inconsistent results before and after region 
> compaction.
> The cause is the matchColumn method of UserScanQueryMatcher. It first checks 
> the cell against the filter, then checks the number of versions needed. So if 
> the filter skips the newest version, the oldest version is seen again as long 
> as it has not been removed.
> After an offline discussion with [~Apache9] and [~fenghh], we have two 
> candidate solutions for this problem. The first idea is to check the number 
> of versions first, then check the cell against the filter. This matches the 
> comment on setFilter, which says the filter is called after all tests for 
> ttl, column match, deletes and max versions have been run.
> {code}
>   /**
>    * Apply the specified server-side filter when performing the Query.
>    * Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>    * for ttl, column match, deletes and max versions have been run.
>    * @param filter filter to run on the server
>    * @return this for invocation chaining
>    */
>   public Query setFilter(Filter filter) {
>     this.filter = filter;
>     return this;
>   }
> {code}
> But this idea has another problem. If a column's max versions is 5 and the 
> user's query needs only 3 versions, the matcher first checks the version 
> count and then checks the cell against the filter, so the result may contain 
> fewer than 3 cells even though there are 2 more versions that are never read.
> So the second idea has three steps:
> 1. check against the max versions of this column
> 2. check the kv against the filter
> 3. check against the number of versions the user needs.
> But this makes ScanQueryMatcher more complicated, and it breaks the javadoc 
> of Query.setFilter.
> We do not have a final solution for this problem yet. Suggestions are welcome.
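
The three-step order from the second idea can be sketched as below. This is 
only an illustration, not the ScanQueryMatcher code: VersionCheckSketch and 
select are hypothetical names, and the input is assumed to be the versions of 
a single column, newest first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class VersionCheckSketch {
  // versionsNewestFirst: all stored versions of one column, newest first.
  // columnMaxVersions:   the column family's max versions setting.
  // filter:              returns true when the filter includes the version.
  // requestedVersions:   how many versions the user's query asks for.
  static <T> List<T> select(List<T> versionsNewestFirst, int columnMaxVersions,
      Predicate<T> filter, int requestedVersions) {
    List<T> result = new ArrayList<>();
    int seen = 0;
    for (T version : versionsNewestFirst) {
      // Step 1: versions beyond the column's max versions are logically gone.
      // Step 3: stop once the user has the number of versions asked for.
      if (seen >= columnMaxVersions || result.size() >= requestedVersions) {
        break;
      }
      seen++;
      // Step 2: apply the filter only to versions that survived step 1.
      if (filter.test(version)) {
        result.add(version);
      }
    }
    return result;
  }
}
```

For the example in the description (max versions 3, versions 4, 3, 2, 1 
written, filter skipping version 4, 3 versions requested), this returns 
versions 3 and 2; the oldest version 1 stays hidden both before and after 
compaction.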





[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-21 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058792#comment-16058792
 ] 

Phil Yang commented on HBASE-17125:
---

I think setVersions may still confuse us and users. How about calling it 
setMaxVersionsAfterFilters? I also think we should add some comments here to 
tell users how we handle the column family's VERSIONS, filters and 
Scan#setMaxVersions.



[jira] [Commented] (HBASE-18161) Incremental Load support for Multiple-Table HFileOutputFormat

2017-06-21 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058790#comment-16058790
 ] 

Jerry He commented on HBASE-18161:
--

Thanks for the updated explanation, [~denselm] (the big edited one that you 
extended after my last comment); it shows that you have put a lot of thought 
into this. I will look at the patch again while thinking about your points.  
Let's let it sit here a little longer so that people can chime in if 
they want. Thanks.

> Incremental Load support for Multiple-Table HFileOutputFormat
> -
>
> Key: HBASE-18161
> URL: https://issues.apache.org/jira/browse/HBASE-18161
> Project: HBase
>  Issue Type: New Feature
>Reporter: Densel Santhmayor
>Priority: Minor
> Attachments: MultiHFileOutputFormatSupport_HBASE_18161.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v2.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v3.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v4.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v5.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v6.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v7.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v8.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v9.patch
>
>
> h2. Introduction
> MapReduce currently supports the ability to write HBase records in bulk to 
> HFiles for a single table. The file(s) can then be uploaded to the relevant 
> RegionServers information with reasonable latency. This feature is useful to 
> make a large set of data available for queries at the same time as well as 
> provides a way to efficiently process very large input into HBase without 
> affecting query latencies.
> There is, however, no support to write variations of the same record key to 
> HFiles belonging to multiple HBase tables from within the same MapReduce job. 
>  
> h2. Goal
> The goal of this JIRA is to extend HFileOutputFormat2 to support writing to 
> HFiles for different tables within the same MapReduce job while keeping 
> single-table HFile features backwards-compatible. 
> For our use case, we needed to write a record key to a smaller HBase table 
> for quicker access, and the same record key with a date appended to a larger 
> table for longer term storage with chronological access. Each of these tables 
> would have different TTL and other settings to support their respective 
> access patterns. We also needed to be able to bulk write records to multiple 
> tables with different subsets of very large input as efficiently as possible. 
> Rather than run the MapReduce job multiple times (one for each table or 
> record structure), it would be useful to be able to parse the input a single 
> time and write to multiple tables simultaneously.
> Additionally, we'd like to maintain backwards compatibility with the existing 
> heavily-used HFileOutputFormat2 interface to allow benefits such as locality 
> sensitivity (that was introduced long after we implemented support for 
> multiple tables) to support both single table and multi table hfile writes. 
> h2. Proposal
> * Backwards compatibility for existing single table support in 
> HFileOutputFormat2 will be maintained and in this case, mappers will need to 
> emit the table rowkey as before. However, a new class - 
> MultiHFileOutputFormat - will provide a helper function to generate a rowkey 
> for mappers that prefixes the desired tablename to the existing rowkey as 
> well as provides configureIncrementalLoad support for multiple tables.
> * HFileOutputFormat2 will be updated in the following way:
> ** configureIncrementalLoad will now accept multiple table descriptor and 
> region locator pairs, analogous to the single pair currently accepted by 
> HFileOutputFormat2. 
> ** Compression, Block Size, Bloom Type and Datablock settings PER column 
> family that are set in the Configuration object are now indexed and retrieved 
> by tablename AND column family
> ** getRegionStartKeys will now support multiple regionlocators and calculate 
> split points and therefore partitions collectively for all tables. Similarly, 
> now the eventual number of Reducers will be equal to the total number of 
> partitions across all tables. 
> ** The RecordWriter class will be able to process rowkeys either with or 
> without the tablename prepended depending on how configureIncrementalLoad was 
> configured with MultiHFileOutputFormat or HFileOutputFormat2.
> * The use of MultiHFileOutputFormat will write the output into HFiles which 
> will match the output format of HFileOutputFormat2. However, while the 
> default use case will keep the existing directory structure with column 
> family name as the directory and HFiles within that directory, in the case of 
> MultiHFileOutputFormat, it will output HFiles in the 

[jira] [Commented] (HBASE-15691) Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to branch-1

2017-06-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058783#comment-16058783
 ] 

Hudson commented on HBASE-15691:


SUCCESS: Integrated in Jenkins build HBase-1.3-JDK7 #187 (See 
[https://builds.apache.org/job/HBase-1.3-JDK7/187/])
HBASE-15691 ConcurrentModificationException in BucketAllocator (syuanjiangdev: 
rev 8653245e7d20610c616f114cc4eac30f8a8bcb48)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java


> Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to 
> branch-1
> -
>
> Key: HBASE-15691
> URL: https://issues.apache.org/jira/browse/HBASE-15691
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 1.3.0
>Reporter: Andrew Purtell
>Assignee: Stephen Yuan Jiang
> Fix For: 1.4.0, 1.3.2, 1.2.7, 1.1.12
>
> Attachments: HBASE-15691-branch-1.patch, 
> HBASE-15691.v2-branch-1.patch, HBASE-15691.v3-branch-1.patch
>
>
> HBASE-10205 solves the following problem:
> "
> The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the 
> RAM queue containing entries to be cached. freeSpace() in turn calls 
> BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), 
> which iterates over 'bucketList'. At the same time another WriterThread might 
> call BucketAllocator.allocateBlock(), which may call 
> BucketSizeInfo.allocateBlock(), add a bucket to 'bucketList' and consequently 
> cause a ConcurrentModificationException. Calls to 
> BucketAllocator.allocateBlock() are synchronized, but calls to 
> BucketAllocator.getIndexStatistics() are not, which allows this race to occur.
> "
> However, for some unknown reason, HBASE-10205 was committed only to the 
> master (2.0 and beyond) and 0.98 branches. To preserve continuity we should 
> commit it to branch-1.
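
One common way to close a race of the shape described above is to guard the 
iterating method with the same lock as the mutating method. The sketch below 
assumes that approach; whether the actual patch does exactly this is not 
stated in this message. BucketListSketch and its methods are hypothetical 
names, not the BucketAllocator API.

```java
import java.util.ArrayList;
import java.util.List;

class BucketListSketch {
  // Stands in for 'bucketList'; a plain ArrayList is not safe for concurrent use.
  private final List<Integer> bucketList = new ArrayList<>();

  // Mutation path: already synchronized in the description above.
  synchronized void allocateBlock(int size) {
    bucketList.add(size);
  }

  // Statistics path: synchronizing it as well means the iteration can no
  // longer overlap with an add(), so no ConcurrentModificationException.
  synchronized long totalSize() {
    long sum = 0;
    for (int size : bucketList) {
      sum += size;
    }
    return sum;
  }

  // Runs one writer and one reader thread concurrently and returns the final total.
  static long demo() {
    BucketListSketch sketch = new BucketListSketch();
    Thread writer = new Thread(() -> {
      for (int i = 0; i < 10_000; i++) {
        sketch.allocateBlock(1);
      }
    });
    Thread reader = new Thread(() -> {
      for (int i = 0; i < 1_000; i++) {
        sketch.totalSize();
      }
    });
    writer.start();
    reader.start();
    try {
      writer.join();
      reader.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return sketch.totalSize();
  }
}
```

The trade-off of this choice is that statistics calls now contend with 
allocations for the same monitor.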





[jira] [Commented] (HBASE-18160) Fix incorrect logic in FilterList.filterKeyValue

2017-06-21 Thread Allan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058782#comment-16058782
 ] 

Allan Yang commented on HBASE-18160:


I'm thinking of another optimization for a FilterList with MUST_PASS_ONE. If 
all filters in the FilterList return NEXT_ROW/NEXT_COL, can we return 
NEXT_ROW/NEXT_COL?



[jira] [Commented] (HBASE-18197) import.java, job output is printing two times.

2017-06-21 Thread Chia-Ping Tsai (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058780#comment-16058780
 ] 

Chia-Ping Tsai commented on HBASE-18197:


The build error is due to HBASE-18159.

> import.java, job output is printing two times.
> --
>
> Key: HBASE-18197
> URL: https://issues.apache.org/jira/browse/HBASE-18197
> Project: HBase
>  Issue Type: Bug
>  Components: hbase
>Affects Versions: 1.0.2, 1.2.0, 1.4.0
>Reporter: Chandra Sekhar
>Assignee: Jan Hentschel
>Priority: Trivial
> Attachments: HBASE-18197.branch-1.0.001.patch, 
> HBASE-18197.branch-1.2.001.patch
>
>
> import.java: the job output is printed twice.
> {quote}
> After the job has completed, job.waitForCompletion(true) is called a second 
> time.
> {quote}
> {code}
> Job job = createSubmittableJob(conf, otherArgs);
> boolean isJobSuccessful = job.waitForCompletion(true);
> if(isJobSuccessful){
>   // Flush all the regions of the table
>   flushRegionsIfNecessary(conf);
> }
> long inputRecords = job.getCounters().findCounter(TaskCounter.MAP_INPUT_RECORDS).getValue();
> long outputRecords = job.getCounters().findCounter(TaskCounter.MAP_OUTPUT_RECORDS).getValue();
> if (outputRecords < inputRecords) {
>   System.err.println("Warning, not all records were imported (maybe filtered out).");
>   if (outputRecords == 0) {
>     System.err.println("If the data was exported from HBase 0.94 "+
>         "consider using -Dhbase.import.version=0.94.");
>   }
> }
> System.exit(job.waitForCompletion(true) ? 0 : 1);
> {code}
> {code}
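
A minimal sketch of one possible fix for the double call in the quoted code: 
reuse the result of the first waitForCompletion call for the exit status. 
FakeJob is a hypothetical stand-in that only counts calls; it is not the 
MapReduce Job class, and whether the attached patches take this exact form is 
not shown in this message.

```java
// Hypothetical stand-in for a job; it only records how often it was waited on.
class FakeJob {
  int calls = 0;
  private final boolean succeeds;

  FakeJob(boolean succeeds) {
    this.succeeds = succeeds;
  }

  boolean waitForCompletion(boolean verbose) {
    calls++;
    return succeeds;
  }
}

class ImportExitSketch {
  // Returns the process exit code instead of calling System.exit directly,
  // so the sketch can be exercised in isolation.
  static int run(FakeJob job) {
    boolean isJobSuccessful = job.waitForCompletion(true);
    // ... flush regions and compare counters here, as in the quoted code ...
    // Reuse the stored result; do not wait on the job a second time.
    return isJobSuccessful ? 0 : 1;
  }
}
```

With this shape the job is waited on exactly once, so its completion output 
would be reported once.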





[jira] [Commented] (HBASE-18119) Improve HFile readability and modify ChecksumUtil log level

2017-06-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058772#comment-16058772
 ] 

Hadoop QA commented on HBASE-18119:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 1s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 41s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 21s {color} | {color:red} hbase-server in master has 10 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 26m 16s {color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 37s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 107m 9s {color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 146m 44s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.master.procedure.TestMasterProcedureWalLease |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12873992/HBASE-18119-v5.patch |
| JIRA Issue | HBASE-18119 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 9e739d97ab1f 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 3489a1b |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| findbugs | https://builds.apache.org/job/PreCommit-HBASE-Build/7288/artifact/patchprocess/branch-findbugs-hbase-server-warnings.html |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/7288/artifact/patchprocess/patch-unit-hbase-server.txt |
| unit test logs |  https://builds.apache.org/job/PreCommit-HBASE-Build/7288/artifact/patchprocess/patch-unit-hbase-server.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/7288/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/7288/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This 

[jira] [Commented] (HBASE-18160) Fix incorrect logic in FilterList.filterKeyValue

2017-06-21 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058764#comment-16058764
 ] 

Anoop Sam John commented on HBASE-18160:


Sorry for the delay. Checking it now.



[jira] [Commented] (HBASE-18241) Change client.Table and client.Admin to not use HTableDescriptor

2017-06-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058759#comment-16058759
 ] 

stack commented on HBASE-18241:
---

We should add Table#getDescriptor and deprecate Table#getTableDescriptor... 
ditto for Admin?

[~ping] FYI
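
The add-and-deprecate pattern suggested above could look roughly like this. 
All type names here (DescriptorSketch, LegacyDescriptorSketch, TableSketch) 
are hypothetical stand-ins chosen for illustration; they are not the HBase 
client types, and the real signatures may differ.

```java
// Hypothetical stand-in for the new, preferred descriptor type.
interface DescriptorSketch {
  String tableName();
}

// Hypothetical stand-in for the deprecated descriptor type.
final class LegacyDescriptorSketch implements DescriptorSketch {
  private final String name;

  LegacyDescriptorSketch(String name) {
    this.name = name;
  }

  @Override
  public String tableName() {
    return name;
  }
}

interface TableSketch {
  // New accessor that returns the preferred type.
  DescriptorSketch getDescriptor();

  /**
   * Old accessor, kept so existing callers still compile.
   * @deprecated use {@link #getDescriptor()} instead.
   */
  @Deprecated
  default LegacyDescriptorSketch getTableDescriptor() {
    // Bridge: build the legacy type from the new accessor's result.
    return new LegacyDescriptorSketch(getDescriptor().tableName());
  }
}
```

Existing callers keep working through the deprecated default method while new 
code moves to getDescriptor(), which lets the old method be removed in a later 
major release.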

> Change client.Table and client.Admin to not use HTableDescriptor
> 
>
> Key: HBASE-18241
> URL: https://issues.apache.org/jira/browse/HBASE-18241
> Project: HBase
>  Issue Type: Bug
>Reporter: Biju Nair
>Priority: Critical
> Fix For: 3.0.0, 2.0.0-alpha-2
>
>
> {{HTableDescriptor}} is deprecated and scheduled to be removed in 3.0. But 
> [client.Table|https://github.com/apache/hbase/blob/a66d491892514fd4a188d6ca87d6260d8ae46184/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Table.java#L69]
>  and 
> [client.Admin|https://github.com/apache/hbase/blob/a66d491892514fd4a188d6ca87d6260d8ae46184/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Admin.java#L198]
>  method {{getTableDescriptor}} returns {{HTableDescriptor}}.





[jira] [Updated] (HBASE-18241) Change client.Table and client.Admin to not use HTableDescriptor

2017-06-21 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-18241:
--
Priority: Critical  (was: Minor)



[jira] [Commented] (HBASE-15691) Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to branch-1

2017-06-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058756#comment-16058756
 ] 

Hudson commented on HBASE-15691:


SUCCESS: Integrated in Jenkins build HBase-1.2-JDK7 #156 (See 
[https://builds.apache.org/job/HBase-1.2-JDK7/156/])
HBASE-15691 ConcurrentModificationException in BucketAllocator (syuanjiangdev: 
rev 1b15b7825bb390a59dd57527efd4d013c753de5a)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java


> Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to 
> branch-1
> -
>
> Key: HBASE-15691
> URL: https://issues.apache.org/jira/browse/HBASE-15691
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 1.3.0
>Reporter: Andrew Purtell
>Assignee: Stephen Yuan Jiang
> Fix For: 1.4.0, 1.3.2, 1.2.7, 1.1.12
>
> Attachments: HBASE-15691-branch-1.patch, 
> HBASE-15691.v2-branch-1.patch, HBASE-15691.v3-branch-1.patch
>
>
> HBASE-10205 solves the following problem:
> "
> The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the 
> RAM queue containing entries to be cached. freeSpace() in turn calls 
> BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), 
> which iterates over 'bucketList'. At the same time another WriterThread might 
> call BucketAllocator.allocateBlock(), which may call 
> BucketSizeInfo.allocateBlock(), add a bucket to 'bucketList' and consequently 
> cause a ConcurrentModificationException. Calls to 
> BucketAllocator.allocateBlock() are synchronized, but calls to 
> BucketAllocator.getIndexStatistics() are not, which allows this race to occur.
> "
> However, for some unknown reason, HBASE-10205 was committed only to the master 
> (2.0 and beyond) and 0.98 branches. To preserve continuity we should 
> commit it to branch-1.
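
The race described above can be illustrated with a minimal sketch. It assumes the simplest possible remedy, taking the same monitor for the statistics read as for allocation; whether the committed patch does exactly this is not shown in this thread. SketchAllocator, totalSize and demo are hypothetical names, not HBase classes or methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an allocator that keeps a list of buckets.
// This is NOT HBase's BucketAllocator; it only mirrors the shape of the race.
final class SketchAllocator {
    private final List<Integer> bucketList = new ArrayList<>();

    // Writers add buckets while holding the instance lock.
    synchronized void allocateBlock(int size) {
        bucketList.add(size);
    }

    // Readers take the same lock, so iteration never overlaps with add().
    // Without 'synchronized' here, the for-each loop could throw
    // ConcurrentModificationException while a writer is adding.
    synchronized long totalSize() {
        long total = 0;
        for (int size : bucketList) {
            total += size;
        }
        return total;
    }

    // Runs a writer and a reader concurrently and returns the final total.
    static long demo(int blocks) {
        SketchAllocator allocator = new SketchAllocator();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < blocks; i++) {
                allocator.allocateBlock(1);
            }
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < blocks; i++) {
                allocator.totalSize();
            }
        });
        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return allocator.totalSize();
    }
}
```

Locking both paths on one monitor is the bluntest option; a copy-on-write list or a read/write lock would be alternatives if the statistics call turned out to be hot.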





[jira] [Commented] (HBASE-18254) ServerCrashProcedure checks and waits for meta initialized, instead should check and wait for meta loaded

2017-06-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058751#comment-16058751
 ] 

stack commented on HBASE-18254:
---

It would have taken forever to find this breakage w/o the unit test.

> ServerCrashProcedure checks and waits for meta initialized, instead should 
> check and wait for meta loaded
> -
>
> Key: HBASE-18254
> URL: https://issues.apache.org/jira/browse/HBASE-18254
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Affects Versions: 2.0.0-alpha-1
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
> Fix For: 2.0.0
>
> Attachments: HBASE-18254.master.001.patch
>
>
> After enabling test 
> hbase.master.procedure.TestServerCrashProcedure#testRecoveryAndDoubleExecutionOnRsWithMeta,
>  this bug is found in ServerCrashProcedure.





[jira] [Commented] (HBASE-18254) ServerCrashProcedure checks and waits for meta initialized, instead should check and wait for meta loaded

2017-06-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058750#comment-16058750
 ] 

stack commented on HBASE-18254:
---

Oh, I love unit tests.

> ServerCrashProcedure checks and waits for meta initialized, instead should 
> check and wait for meta loaded
> -
>
> Key: HBASE-18254
> URL: https://issues.apache.org/jira/browse/HBASE-18254
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Affects Versions: 2.0.0-alpha-1
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
> Fix For: 2.0.0
>
> Attachments: HBASE-18254.master.001.patch
>
>
> After enabling test 
> hbase.master.procedure.TestServerCrashProcedure#testRecoveryAndDoubleExecutionOnRsWithMeta,
>  this bug is found in ServerCrashProcedure.





[jira] [Updated] (HBASE-18254) ServerCrashProcedure checks and waits for meta initialized, instead should check and wait for meta loaded

2017-06-21 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-18254:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: (was: 2.0.0-alpha-2)
   2.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks for the nice fix and for re-enabling a useful test 
[~uagashe]

> ServerCrashProcedure checks and waits for meta initialized, instead should 
> check and wait for meta loaded
> -
>
> Key: HBASE-18254
> URL: https://issues.apache.org/jira/browse/HBASE-18254
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Affects Versions: 2.0.0-alpha-1
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
> Fix For: 2.0.0
>
> Attachments: HBASE-18254.master.001.patch
>
>
> After enabling test 
> hbase.master.procedure.TestServerCrashProcedure#testRecoveryAndDoubleExecutionOnRsWithMeta,
>  this bug is found in ServerCrashProcedure.





[jira] [Commented] (HBASE-15691) Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to branch-1

2017-06-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058747#comment-16058747
 ] 

Hudson commented on HBASE-15691:


SUCCESS: Integrated in Jenkins build HBase-1.3-JDK8 #201 (See 
[https://builds.apache.org/job/HBase-1.3-JDK8/201/])
HBASE-15691 ConcurrentModificationException in BucketAllocator (syuanjiangdev: 
rev 8653245e7d20610c616f114cc4eac30f8a8bcb48)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java


> Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to 
> branch-1
> -
>
> Key: HBASE-15691
> URL: https://issues.apache.org/jira/browse/HBASE-15691
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 1.3.0
>Reporter: Andrew Purtell
>Assignee: Stephen Yuan Jiang
> Fix For: 1.4.0, 1.3.2, 1.2.7, 1.1.12
>
> Attachments: HBASE-15691-branch-1.patch, 
> HBASE-15691.v2-branch-1.patch, HBASE-15691.v3-branch-1.patch
>
>
> HBASE-10205 solves the following problem:
> "
> The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the 
> RAM queue containing entries to be cached. freeSpace() in turn calls 
> BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), 
> which iterates over 'bucketList'. At the same time another WriterThread might 
> call BucketAllocator.allocateBlock(), which may call 
> BucketSizeInfo.allocateBlock(), add a bucket to 'bucketList' and consequently 
> cause a ConcurrentModificationException. Calls to 
> BucketAllocator.allocateBlock() are synchronized, but calls to 
> BucketAllocator.getIndexStatistics() are not, which allows this race to occur.
> "
> However, for some unknown reason, HBASE-10205 was committed only to the master 
> (2.0 and beyond) and 0.98 branches. To preserve continuity we should 
> commit it to branch-1.





[jira] [Resolved] (HBASE-18258) Take down hbasecon2017 logo from hbase.apache.org and put up hbaseconasia2017 instead.

2017-06-21 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-18258.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Pushed to master.

> Take down hbasecon2017 logo from hbase.apache.org and put up hbaseconasia2017 
> instead.
> --
>
> Key: HBASE-18258
> URL: https://issues.apache.org/jira/browse/HBASE-18258
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Assignee: stack
> Fix For: 3.0.0
>
>
> Change logo on home page.





[jira] [Created] (HBASE-18258) Take down hbasecon2017 logo from hbase.apache.org and put up hbaseconasia2017 instead.

2017-06-21 Thread stack (JIRA)
stack created HBASE-18258:
-

 Summary: Take down hbasecon2017 logo from hbase.apache.org and put 
up hbaseconasia2017 instead.
 Key: HBASE-18258
 URL: https://issues.apache.org/jira/browse/HBASE-18258
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack


Change logo on home page.





[jira] [Commented] (HBASE-18235) LoadBalancer.BOGUS_SERVER_NAME should not have a bogus hostname

2017-06-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058740#comment-16058740
 ] 

Hudson commented on HBASE-18235:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #3238 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/3238/])
HBASE-18235 LoadBalancer.BOGUS_SERVER_NAME should not have a bogus (apurtell: 
rev 3489a1b82144d2481e0b08f8d898c0e2a5c24623)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/LoadBalancer.java


> LoadBalancer.BOGUS_SERVER_NAME should not have a bogus hostname
> ---
>
> Key: HBASE-18235
> URL: https://issues.apache.org/jira/browse/HBASE-18235
> Project: HBase
>  Issue Type: Bug
>Reporter: Francis Liu
>Assignee: Francis Liu
> Fix For: 2.0.0, 3.0.0
>
> Attachments: HBASE-18235.patch
>
>
> The original patch used localhost to have assignment fail fast, avoiding 
> misleading DNS exceptions, delays due to DNS lookup, etc. 
> Was wondering what the reason was for changing it?





[jira] [Commented] (HBASE-15691) Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to branch-1

2017-06-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058718#comment-16058718
 ] 

Hudson commented on HBASE-15691:


SUCCESS: Integrated in Jenkins build HBase-1.2-JDK8 #152 (See 
[https://builds.apache.org/job/HBase-1.2-JDK8/152/])
HBASE-15691 ConcurrentModificationException in BucketAllocator (syuanjiangdev: 
rev 1b15b7825bb390a59dd57527efd4d013c753de5a)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java


> Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to 
> branch-1
> -
>
> Key: HBASE-15691
> URL: https://issues.apache.org/jira/browse/HBASE-15691
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 1.3.0
>Reporter: Andrew Purtell
>Assignee: Stephen Yuan Jiang
> Fix For: 1.4.0, 1.3.2, 1.2.7, 1.1.12
>
> Attachments: HBASE-15691-branch-1.patch, 
> HBASE-15691.v2-branch-1.patch, HBASE-15691.v3-branch-1.patch
>
>
> HBASE-10205 solves the following problem:
> "
> The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the 
> RAM queue containing entries to be cached. freeSpace() in turn calls 
> BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), 
> which iterates over 'bucketList'. At the same time another WriterThread might 
> call BucketAllocator.allocateBlock(), which may call 
> BucketSizeInfo.allocateBlock(), add a bucket to 'bucketList' and consequently 
> cause a ConcurrentModificationException. Calls to 
> BucketAllocator.allocateBlock() are synchronized, but calls to 
> BucketAllocator.getIndexStatistics() are not, which allows this race to occur.
> "
> However, for some unknown reason, HBASE-10205 was committed only to the master 
> (2.0 and beyond) and 0.98 branches. To preserve continuity we should 
> commit it to branch-1.





[jira] [Commented] (HBASE-18254) ServerCrashProcedure checks and waits for meta initialized, instead should check and wait for meta loaded

2017-06-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058714#comment-16058714
 ] 

stack commented on HBASE-18254:
---

I was worried that waiting on hbase:meta to be loaded would create a case where, 
if the crashed server was carrying hbase:meta, we'd never start the SCP 
(because it wouldn't start until meta was up... but it's supposed to put up 
hbase:meta). As it happens, HMaster recovers hbase:meta specially, outside of 
SCP, before it will let SCPs run. See this section in Master:

// get a list for previously failed RS which need log splitting work
// we recover hbase:meta region servers inside master initialization and
// handle other failed servers in SSH in order to start up master node ASAP
MasterMetaBootstrap metaBootstrap = createMetaBootstrap(this, status);
metaBootstrap.splitMetaLogsBeforeAssignment();

... So, this patch is good IMO. Let me commit.

> ServerCrashProcedure checks and waits for meta initialized, instead should 
> check and wait for meta loaded
> -
>
> Key: HBASE-18254
> URL: https://issues.apache.org/jira/browse/HBASE-18254
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Affects Versions: 2.0.0-alpha-1
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
> Fix For: 2.0.0-alpha-2
>
> Attachments: HBASE-18254.master.001.patch
>
>
> After enabling test 
> hbase.master.procedure.TestServerCrashProcedure#testRecoveryAndDoubleExecutionOnRsWithMeta,
>  this bug is found in ServerCrashProcedure.





[jira] [Commented] (HBASE-18161) Incremental Load support for Multiple-Table HFileOutputFormat

2017-06-21 Thread Densel Santhmayor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058712#comment-16058712
 ] 

Densel Santhmayor commented on HBASE-18161:
---

Changed separator and rebased on top of master

> Incremental Load support for Multiple-Table HFileOutputFormat
> -
>
> Key: HBASE-18161
> URL: https://issues.apache.org/jira/browse/HBASE-18161
> Project: HBase
>  Issue Type: New Feature
>Reporter: Densel Santhmayor
>Priority: Minor
> Attachments: MultiHFileOutputFormatSupport_HBASE_18161.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v2.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v3.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v4.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v5.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v6.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v7.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v8.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v9.patch
>
>
> h2. Introduction
> MapReduce currently supports the ability to write HBase records in bulk to 
> HFiles for a single table. The file(s) can then be uploaded to the relevant 
> RegionServers with reasonable latency. This feature is useful for 
> making a large set of data available for queries at the same time, and it 
> provides a way to efficiently load very large input into HBase without 
> affecting query latencies.
> There is, however, no support to write variations of the same record key to 
> HFiles belonging to multiple HBase tables from within the same MapReduce job. 
>  
> h2. Goal
> The goal of this JIRA is to extend HFileOutputFormat2 to support writing to 
> HFiles for different tables within the same MapReduce job while keeping 
> single-table HFile features backwards-compatible. 
> For our use case, we needed to write a record key to a smaller HBase table 
> for quicker access, and the same record key with a date appended to a larger 
> table for longer term storage with chronological access. Each of these tables 
> would have different TTL and other settings to support their respective 
> access patterns. We also needed to be able to bulk write records to multiple 
> tables with different subsets of very large input as efficiently as possible. 
> Rather than run the MapReduce job multiple times (one for each table or 
> record structure), it would be useful to be able to parse the input a single 
> time and write to multiple tables simultaneously.
> Additionally, we'd like to maintain backwards compatibility with the existing 
> heavily-used HFileOutputFormat2 interface to allow benefits such as locality 
> sensitivity (that was introduced long after we implemented support for 
> multiple tables) to support both single table and multi table hfile writes. 
> h2. Proposal
> * Backwards compatibility for existing single table support in 
> HFileOutputFormat2 will be maintained and in this case, mappers will need to 
> emit the table rowkey as before. However, a new class - 
> MultiHFileOutputFormat - will provide a helper function to generate a rowkey 
> for mappers that prefixes the desired tablename to the existing rowkey as 
> well as provides configureIncrementalLoad support for multiple tables.
> * HFileOutputFormat2 will be updated in the following way:
> ** configureIncrementalLoad will now accept multiple table descriptor and 
> region locator pairs, analogous to the single pair currently accepted by 
> HFileOutputFormat2. 
> ** Compression, Block Size, Bloom Type and Datablock settings PER column 
> family that are set in the Configuration object are now indexed and retrieved 
> by tablename AND column family
> ** getRegionStartKeys will now support multiple regionlocators and calculate 
> split points and therefore partitions collectively for all tables. Similarly, 
> now the eventual number of Reducers will be equal to the total number of 
> partitions across all tables. 
> ** The RecordWriter class will be able to process rowkeys either with or 
> without the tablename prepended depending on how configureIncrementalLoad was 
> configured with MultiHFileOutputFormat or HFileOutputFormat2.
> * The use of MultiHFileOutputFormat will write the output into HFiles which 
> will match the output format of HFileOutputFormat2. However, while the 
> default use case will keep the existing directory structure with column 
> family name as the directory and HFiles within that directory, in the case of 
> MultiHFileOutputFormat, it will output HFiles in the output directory with 
> the following relative paths: 
> {noformat}
>  --table1 
>--family1 
>  --HFiles 
>  --table2 
>--family1 
>--family2 
>  --HFiles
> {noformat}
> This aims to be a comprehensive solution to the 
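
The rowkey helper mentioned in the proposal can be pictured with a small sketch. The separator byte, the class name and the method names below are all hypothetical; the separator the patch actually uses is not shown in this thread. The sketch also assumes the separator never occurs in a table name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper showing how a table name could be prefixed to a rowkey
// so that one MapReduce job can route records to several tables.
final class TableRowKeySketch {
    // Assumed separator; it must not appear in any table name.
    private static final byte SEPARATOR = (byte) ';';

    // Builds tableName + SEPARATOR + rowKey.
    static byte[] prefix(String tableName, byte[] rowKey) {
        byte[] table = tableName.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[table.length + 1 + rowKey.length];
        System.arraycopy(table, 0, out, 0, table.length);
        out[table.length] = SEPARATOR;
        System.arraycopy(rowKey, 0, out, table.length + 1, rowKey.length);
        return out;
    }

    // Index of the first separator, i.e. the end of the table name.
    private static int separatorIndex(byte[] combined) {
        for (int i = 0; i < combined.length; i++) {
            if (combined[i] == SEPARATOR) {
                return i;
            }
        }
        throw new IllegalArgumentException("no table name prefix found");
    }

    // Recovers the table name from a combined key.
    static String tableOf(byte[] combined) {
        return new String(combined, 0, separatorIndex(combined), StandardCharsets.UTF_8);
    }

    // Recovers the original rowkey from a combined key.
    static byte[] rowOf(byte[] combined) {
        return Arrays.copyOfRange(combined, separatorIndex(combined) + 1, combined.length);
    }
}
```

Because the table name comes first, sorting combined keys groups all records of one table together, which fits the idea of computing partitions collectively across tables.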

[jira] [Updated] (HBASE-18161) Incremental Load support for Multiple-Table HFileOutputFormat

2017-06-21 Thread Densel Santhmayor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Densel Santhmayor updated HBASE-18161:
--
Attachment: MultiHFileOutputFormatSupport_HBASE_18161_v9.patch

> Incremental Load support for Multiple-Table HFileOutputFormat
> -
>
> Key: HBASE-18161
> URL: https://issues.apache.org/jira/browse/HBASE-18161
> Project: HBase
>  Issue Type: New Feature
>Reporter: Densel Santhmayor
>Priority: Minor
> Attachments: MultiHFileOutputFormatSupport_HBASE_18161.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v2.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v3.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v4.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v5.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v6.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v7.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v8.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v9.patch
>
>
> h2. Introduction
> MapReduce currently supports the ability to write HBase records in bulk to 
> HFiles for a single table. The file(s) can then be uploaded to the relevant 
> RegionServers with reasonable latency. This feature is useful for 
> making a large set of data available for queries at the same time, and it 
> provides a way to efficiently load very large input into HBase without 
> affecting query latencies.
> There is, however, no support to write variations of the same record key to 
> HFiles belonging to multiple HBase tables from within the same MapReduce job. 
>  
> h2. Goal
> The goal of this JIRA is to extend HFileOutputFormat2 to support writing to 
> HFiles for different tables within the same MapReduce job while keeping 
> single-table HFile features backwards-compatible. 
> For our use case, we needed to write a record key to a smaller HBase table 
> for quicker access, and the same record key with a date appended to a larger 
> table for longer term storage with chronological access. Each of these tables 
> would have different TTL and other settings to support their respective 
> access patterns. We also needed to be able to bulk write records to multiple 
> tables with different subsets of very large input as efficiently as possible. 
> Rather than run the MapReduce job multiple times (one for each table or 
> record structure), it would be useful to be able to parse the input a single 
> time and write to multiple tables simultaneously.
> Additionally, we'd like to maintain backwards compatibility with the existing 
> heavily-used HFileOutputFormat2 interface to allow benefits such as locality 
> sensitivity (that was introduced long after we implemented support for 
> multiple tables) to support both single table and multi table hfile writes. 
> h2. Proposal
> * Backwards compatibility for existing single table support in 
> HFileOutputFormat2 will be maintained and in this case, mappers will need to 
> emit the table rowkey as before. However, a new class - 
> MultiHFileOutputFormat - will provide a helper function to generate a rowkey 
> for mappers that prefixes the desired tablename to the existing rowkey as 
> well as provides configureIncrementalLoad support for multiple tables.
> * HFileOutputFormat2 will be updated in the following way:
> ** configureIncrementalLoad will now accept multiple table descriptor and 
> region locator pairs, analogous to the single pair currently accepted by 
> HFileOutputFormat2. 
> ** Compression, Block Size, Bloom Type and Datablock settings PER column 
> family that are set in the Configuration object are now indexed and retrieved 
> by tablename AND column family
> ** getRegionStartKeys will now support multiple regionlocators and calculate 
> split points and therefore partitions collectively for all tables. Similarly, 
> now the eventual number of Reducers will be equal to the total number of 
> partitions across all tables. 
> ** The RecordWriter class will be able to process rowkeys either with or 
> without the tablename prepended depending on how configureIncrementalLoad was 
> configured with MultiHFileOutputFormat or HFileOutputFormat2.
> * The use of MultiHFileOutputFormat will write the output into HFiles which 
> will match the output format of HFileOutputFormat2. However, while the 
> default use case will keep the existing directory structure with column 
> family name as the directory and HFiles within that directory, in the case of 
> MultiHFileOutputFormat, it will output HFiles in the output directory with 
> the following relative paths: 
> {noformat}
>  --table1 
>--family1 
>  --HFiles 
>  --table2 
>--family1 
>--family2 
>  --HFiles
> {noformat}
> This aims to be a comprehensive solution to the original tickets - 

[jira] [Commented] (HBASE-18255) Time-Delayed HBase Performance Degradation with Java 7

2017-06-21 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058710#comment-16058710
 ] 

ramkrishna.s.vasudevan commented on HBASE-18255:


Great read. Thanks for the info. Thanks for the patch too. Should we 
specifically check if the JDK is 7 and then apply the fix? Just asking.

> Time-Delayed HBase Performance Degradation with Java 7
> --
>
> Key: HBASE-18255
> URL: https://issues.apache.org/jira/browse/HBASE-18255
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1, 1.2.6, 1.1.11
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 1.3.2, 1.2.7, 1.1.12
>
> Attachments: HBASE-18255-branch-1.x.v1.patch
>
>
> A good summary of the issue and the resolution provided can be found in this 
> article:
> https://community.hortonworks.com/articles/105802/time-delayed-hbase-performance-degradation-with-ja.html
> In a few words, due to an internal JVM 7 bug (which has been addressed only in 
> Java 8), the HotSpot code cache can become full, after which ALL JIT 
> compilations are suspended indefinitely.  The default value for the code cache 
> size in JVM 7 is quite low: 48MB. It is recommended to increase this value to 
> at least 256MB (the default in JVM 8).
> This BUG affects only 1.x 
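
As a rough illustration of the JDK 7 check asked about in the comment, the sketch below tests the runtime's specification version and reads the code cache limit through the standard java.lang.management API. CodeCacheCheck and its methods are hypothetical names, and the pool names matched ("Code Cache", "CodeCache", "CodeHeap") are HotSpot-specific assumptions that may differ on other JVMs. The limit itself is raised with the HotSpot option -XX:ReservedCodeCacheSize (for example 256m), which the description does not name explicitly.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical diagnostic helper; not part of HBase or of the attached patch.
final class CodeCacheCheck {
    // A Java 7 runtime reports "1.7" as its java.specification.version.
    static boolean isJava7(String specVersion) {
        return "1.7".equals(specVersion);
    }

    // Sums the configured maximum of the code cache pool(s) in bytes.
    // Returns -1 if no matching pool with a defined maximum is found.
    static long codeCacheMaxBytes() {
        long total = 0;
        boolean found = false;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            // Assumed HotSpot pool names; newer JVMs split the cache into several heaps.
            if (name.contains("Code Cache") || name.contains("CodeCache")
                    || name.contains("CodeHeap")) {
                MemoryUsage usage = pool.getUsage();
                if (usage != null && usage.getMax() > 0) {
                    total += usage.getMax();
                    found = true;
                }
            }
        }
        return found ? total : -1;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.specification.version");
        System.out.println("Java 7 runtime: " + isJava7(version));
        System.out.println("Code cache max bytes: " + codeCacheMaxBytes());
    }
}
```

A startup script could use the same version test to decide whether to add the larger code cache setting only on Java 7.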





[jira] [Commented] (HBASE-18023) Log multi-* requests for more than threshold number of rows

2017-06-21 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058704#comment-16058704
 ] 

Josh Elser commented on HBASE-18023:


+1 Thanks for the updates. One final thing -- looks like I missed the changes 
to 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
 that are without consequence.

Let's race: I'll commit this tomorrow when I get a chance. If you can get a v4 
up before then, I'll use that. Otherwise, I'll just commit your patch, 
reverting the changes to TestHRegion.java before pushing ;)

> Log multi-* requests for more than threshold number of rows
> ---
>
> Key: HBASE-18023
> URL: https://issues.apache.org/jira/browse/HBASE-18023
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Clay B.
>Assignee: David Harju
>Priority: Minor
> Attachments: HBASE-18023.master.001.patch, 
> HBASE-18023.master.002.patch, HBASE-18023.master.003.patch
>
>
> Today, if a user happens to do something like a large multi-put, they can get 
> through request throttling (e.g. it is one request) but still crash a region 
> server with a garbage storm. We have seen regionservers hit this issue and it 
> is silent and deadly. The RS will report nothing more than a mysterious 
> garbage collection and exit out.
> Ideally, we could report a large multi-* request before starting it, in case 
> it happens to be deadly. Knowing the client, user and how many rows are 
> affected would be a good start to tracking down painful users.
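
The idea in the description, reporting a large multi-* request before it runs, could look roughly like the following. This is only a sketch under assumptions: MultiRequestLogger, the threshold value and the message format are hypothetical and are not taken from the attached patches.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-execution check for batch requests; not HBase code.
final class MultiRequestLogger {
    private final int rowThreshold;
    // Collected in memory here; a real server would write to its log instead.
    private final List<String> warnings = new ArrayList<>();

    MultiRequestLogger(int rowThreshold) {
        this.rowThreshold = rowThreshold;
    }

    // Called before the request is executed, so the warning survives even if
    // the request later brings the server down.
    // Returns true when the request exceeded the threshold.
    boolean checkBeforeExecute(String client, String user, int rowCount) {
        if (rowCount <= rowThreshold) {
            return false;
        }
        warnings.add("Large multi request: client=" + client
                + " user=" + user + " rows=" + rowCount);
        return true;
    }

    List<String> warnings() {
        return warnings;
    }
}
```

Recording the client, user and row count matches the three pieces of information the description asks for.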





[jira] [Comment Edited] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-21 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058699#comment-16058699
 ] 

Ted Yu edited comment on HBASE-17125 at 6/22/17 4:21 AM:
-

Based on patch v11 and the snippet I posted earlier, I have an aggregate patch 
which passes all Filter tests and the visibility test.
The complexity of the aggregate patch is on par with patch v11 (sans the 
SpecifiedNumVersionsColumnFilter).

Running thru whole test suite now.



was (Author: yuzhih...@gmail.com):
Based on patch v11 and the snippet I posted earlier, I have an aggregate patch 
which passes all Filter tests and the visibility test.
The complexity of the aggregate patch is on par with patch v11 (sans the 
SpecificNumberVersionsFilter).

Running thru whole test suite now.


> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a column's max versions is 3, and we write 4 versions of this column. 
> The oldest version is not removed immediately, but from the user's view the 
> oldest version is gone. When a user queries with a filter and the filter skips 
> a newer version, the oldest version becomes visible again. After the region is 
> compacted, however, the oldest version is never seen again. This is confusing 
> for the user: the query returns inconsistent results before and after region 
> compaction.
> The cause is the matchColumn method of UserScanQueryMatcher. It first checks 
> the cell against the filter, then checks the number of versions needed. So if 
> the filter skips the newest version, the oldest version is seen again as long 
> as it has not been removed.
> After an offline discussion with [~Apache9] and [~fenghh], we have two 
> solutions for this problem. The first idea is to check the number of versions 
> first, then check the cell against the filter. This matches the comment on 
> setFilter: the filter is called after all tests for ttl, column match, deletes 
> and max versions have been run.
> {code}
>   /**
>    * Apply the specified server-side filter when performing the Query.
>    * Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>    * for ttl, column match, deletes and max versions have been run.
>    * @param filter filter to run on the server
>    * @return this for invocation chaining
>    */
>   public Query setFilter(Filter filter) {
>     this.filter = filter;
>     return this;
>   }
> {code}
> But this idea has another problem. If a column's max versions is 5 and the 
> user's query needs only 3 versions, it first checks the version count, then 
> checks the cell against the filter. So the result may contain fewer than 3 
> cells, even though there are 2 more versions that are never read.
> So the second idea has three steps.
> 1. check against the max versions of this column
> 2. check the kv against the filter
> 3. check the number of versions the user needs.
> But this makes the ScanQueryMatcher more complicated, and it breaks 
> the javadoc of Query.setFilter.
> We don't have a final solution for this problem yet. Suggestions are welcome.
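
The orderings discussed in the description can be compared with a toy model. The sketch below represents the versions of one column as timestamps sorted newest first and is not the real UserScanQueryMatcher; VersionOrderSketch and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

// Toy model of the check orderings; timestamps are listed newest first.
final class VersionOrderSketch {
    // Current behaviour per the description: filter first, then count versions.
    static List<Long> filterThenVersions(List<Long> newestFirst,
                                         LongPredicate include, int maxVersions) {
        List<Long> out = new ArrayList<>();
        for (long ts : newestFirst) {
            if (!include.test(ts)) {
                continue;                 // skipped cells do not count as versions
            }
            if (out.size() == maxVersions) {
                break;
            }
            out.add(ts);
        }
        return out;
    }

    // First idea: count versions first, then apply the filter.
    static List<Long> versionsThenFilter(List<Long> newestFirst,
                                         LongPredicate include, int maxVersions) {
        List<Long> out = new ArrayList<>();
        int seen = 0;
        for (long ts : newestFirst) {
            if (seen++ == maxVersions) {
                break;                    // every cell counts, filtered or not
            }
            if (include.test(ts)) {
                out.add(ts);
            }
        }
        return out;
    }

    // Second idea: column max versions, then filter, then requested versions.
    static List<Long> threeStep(List<Long> newestFirst, LongPredicate include,
                                int columnMaxVersions, int requestedVersions) {
        List<Long> out = new ArrayList<>();
        int seen = 0;
        for (long ts : newestFirst) {
            if (seen++ == columnMaxVersions) {
                break;                    // step 1
            }
            if (!include.test(ts)) {
                continue;                 // step 2
            }
            if (out.size() == requestedVersions) {
                break;                    // step 3
            }
            out.add(ts);
        }
        return out;
    }
}
```

With stored versions 4, 3, 2, 1, a column max of 3 and a filter that skips 4, filterThenVersions yields 3, 2, 1 (the expired version 1 reappears) while versionsThenFilter yields 3, 2, which is what a scan after compaction would return. With versions 5 to 1, a column max of 5, 3 requested versions and a filter that skips 5, versionsThenFilter limited to 3 yields only 4, 3, whereas threeStep yields 4, 3, 2.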





[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-21 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058699#comment-16058699
 ] 

Ted Yu commented on HBASE-17125:


Based on patch v11 and the snippet I posted earlier, I have an aggregate patch 
which passes all Filter tests and the visibility test.
The complexity of the aggregate patch is on par with patch v11 (sans the 
SpecificNumberVersionsFilter).

Running thru whole test suite now.


> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a column's max versions is 3, and we write 4 versions of this column. 
> The oldest version is not removed immediately, but from the user's view the 
> oldest version is gone. When a user queries with a filter and the filter skips 
> a newer version, the oldest version becomes visible again. After the region is 
> compacted, however, the oldest version is never seen again. This is confusing 
> for the user: the query returns inconsistent results before and after region 
> compaction.
> The cause is the matchColumn method of UserScanQueryMatcher. It first checks 
> the cell against the filter, then checks the number of versions needed. So if 
> the filter skips the newest version, the oldest version is seen again as long 
> as it has not been removed.
> After an offline discussion with [~Apache9] and [~fenghh], we have two 
> solutions for this problem. The first idea is to check the number of versions 
> first, then check the cell against the filter. This matches the comment on 
> setFilter: the filter is called after all tests for ttl, column match, deletes 
> and max versions have been run.
> {code}
>   /**
>    * Apply the specified server-side filter when performing the Query.
>    * Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>    * for ttl, column match, deletes and max versions have been run.
>    * @param filter filter to run on the server
>    * @return this for invocation chaining
>    */
>   public Query setFilter(Filter filter) {
>     this.filter = filter;
>     return this;
>   }
> {code}
> But this idea has another problem: if a column's max versions is 5 and the 
> user query only needs 3 versions, it first checks the number of versions, then 
> checks the cell against the filter. So the number of cells in the result may 
> be less than 3, even though there are 2 more versions that are never read.
> So the second idea has three steps:
> 1. check against the max versions of this column
> 2. check the kv against the filter
> 3. check the number of versions the user needs.
> But this will make the ScanQueryMatcher more complicated, and it will break 
> the javadoc of Query.setFilter.
> We don't have a final solution for this problem yet. Suggestions are welcome.
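
The ordering problem in the description can be reproduced with a small stand-alone simulation. This is only a sketch under stated assumptions: the class and method names are hypothetical, cells are reduced to timestamps listed newest first, and none of it is the actual UserScanQueryMatcher code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

// Hypothetical simulation; timestamps stand in for cell versions, newest first.
class VersionCheckOrder {

  // Filter first, then count versions (the order the description attributes to matchColumn).
  static List<Long> filterThenVersions(long[] newestFirst, int maxVersions, LongPredicate filter) {
    List<Long> out = new ArrayList<>();
    for (long ts : newestFirst) {
      if (!filter.test(ts)) {
        continue; // skipped cells do not count towards maxVersions
      }
      if (out.size() == maxVersions) {
        break;
      }
      out.add(ts);
    }
    return out;
  }

  // First idea: count versions first, then apply the filter.
  static List<Long> versionsThenFilter(long[] newestFirst, int maxVersions, LongPredicate filter) {
    List<Long> out = new ArrayList<>();
    int seen = 0;
    for (long ts : newestFirst) {
      if (seen == maxVersions) {
        break;
      }
      seen++; // every cell counts, whether or not the filter keeps it
      if (filter.test(ts)) {
        out.add(ts);
      }
    }
    return out;
  }

  // Second idea: column max versions, then filter, then the number of versions requested.
  static List<Long> threeStep(long[] newestFirst, int columnMaxVersions, int requested,
      LongPredicate filter) {
    List<Long> out = new ArrayList<>();
    int seen = 0;
    for (long ts : newestFirst) {
      if (seen == columnMaxVersions || out.size() == requested) {
        break;
      }
      seen++;
      if (filter.test(ts)) {
        out.add(ts);
      }
    }
    return out;
  }
}
```

With versions 4, 3, 2, 1, a max of 3 and a filter that skips version 4, the filter-first order returns 3, 2, 1 (the supposedly expired version 1 reappears), while the versions-first order returns 3, 2. With five versions, a column max of 5, three requested versions and a filter that skips 5 and 4, the versions-first order returns only version 3, while the three-step order returns 3, 2, 1.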





[jira] [Commented] (HBASE-18248) Warn if monitored task has been tied up beyond a configurable threshold

2017-06-21 Thread Allan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058697#comment-16058697
 ] 

Allan Yang commented on HBASE-18248:


Just a minor suggestion: why do we need to pass a conf every time?
{code} 
   /**
    * Get singleton instance.
    * TODO this would be better off scoped to a single daemon
    */
-  public static synchronized TaskMonitor get() {
+  public static synchronized TaskMonitor get(Configuration conf) {
     if (instance == null) {
-      instance = new TaskMonitor();
+      instance = new TaskMonitor(conf);
     }
     return instance;
   }
{code}
TaskMonitor is a singleton. Could we create the configuration when constructing 
TaskMonitor, so we don't need to pass a conf every time?
{code}
TaskMonitor() {
  Configuration conf = HBaseConfiguration.create();
  ..
}
{code}
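
A minimal, self-contained sketch of that suggestion, for illustration only. `Conf` is a hypothetical stand-in for the Hadoop/HBase Configuration class, `TaskMonitorSketch` is not the real TaskMonitor, and the key name `monitor.warn.threshold.ms` is invented here; the actual patch may look quite different.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a configuration object; not the real HBase class.
class Conf {
  private final Map<String, String> values = new HashMap<>();

  static Conf create() {
    return new Conf();
  }

  long getLong(String key, long defaultValue) {
    String v = values.get(key);
    return v == null ? defaultValue : Long.parseLong(v);
  }
}

// Singleton that builds its own configuration once, so callers use get() without arguments.
class TaskMonitorSketch {
  private static TaskMonitorSketch instance;
  private final long warnThresholdMs;

  private TaskMonitorSketch() {
    Conf conf = Conf.create();
    // The key name is illustrative only.
    this.warnThresholdMs = conf.getLong("monitor.warn.threshold.ms", 10_000L);
  }

  static synchronized TaskMonitorSketch get() {
    if (instance == null) {
      instance = new TaskMonitorSketch();
    }
    return instance;
  }

  long getWarnThresholdMs() {
    return warnThresholdMs;
  }
}
```

One trade-off to keep in mind with this shape: callers (tests in particular) can no longer hand in a custom configuration, which may be why the patch passes one explicitly.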

> Warn if monitored task has been tied up beyond a configurable threshold
> ---
>
> Key: HBASE-18248
> URL: https://issues.apache.org/jira/browse/HBASE-18248
> Project: HBase
>  Issue Type: Improvement
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Fix For: 2.0.0, 3.0.0, 1.4.0, 1.3.2
>
> Attachments: HBASE-18248-branch-1.3.patch, 
> HBASE-18248-branch-1.patch, HBASE-18248-branch-2.patch, HBASE-18248.patch
>
>
> Warn if monitored task has been tied up beyond a configurable threshold. We 
> especially want to do this for RPC tasks. Use a separate threshold for 
> warning about stuck RPC tasks versus other types of tasks.





[jira] [Commented] (HBASE-18255) Time-Delayed HBase Performance Degradation with Java 7

2017-06-21 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058696#comment-16058696
 ] 

Josh Elser commented on HBASE-18255:


+1 from me. This should prevent the problem on Java 7 and doesn't change the 
semantics on Java 8.

Thanks for putting the patch together, Vlad!

> Time-Delayed HBase Performance Degradation with Java 7
> --
>
> Key: HBASE-18255
> URL: https://issues.apache.org/jira/browse/HBASE-18255
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1, 1.2.6, 1.1.11
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 1.3.2, 1.2.7, 1.1.12
>
> Attachments: HBASE-18255-branch-1.x.v1.patch
>
>
> A good summary of the issue and the proposed resolution can be found in this 
> article:
> https://community.hortonworks.com/articles/105802/time-delayed-hbase-performance-degradation-with-ja.html
> In short, due to an internal JVM 7 bug (which has been addressed only in 
> Java 8), the HotSpot code cache can become full, after which ALL JIT 
> compilations are suspended indefinitely. The default code cache size in 
> JVM 7 is quite low: 48MB. It is recommended to increase this value to at 
> least 256MB (default in JVM 8).
> This bug affects only 1.x.
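
On HotSpot the code cache size is normally raised with the `-XX:ReservedCodeCacheSize` option (for example `-XX:ReservedCodeCacheSize=256m`, matching the value recommended above). To see how full the code cache is on a running JVM, something like the following sketch may help. The class name is hypothetical, and the pool names it matches ("Code Cache" on older HotSpot, "CodeHeap ..." segments on newer ones) are an assumption that may not hold on other JVMs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper; reports code cache usage through the standard memory pool MXBeans.
class CodeCacheReport {

  // Assumed pool names; other JVMs may label the code cache differently.
  static boolean isCodeCachePool(String name) {
    return name.contains("Code Cache") || name.contains("CodeCache") || name.contains("CodeHeap");
  }

  // Sums the bytes currently used by all pools that look like code cache pools.
  static long usedBytes() {
    long used = 0;
    for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
      MemoryUsage usage = pool.getUsage(); // may be null if the pool is no longer valid
      if (usage != null && isCodeCachePool(pool.getName())) {
        used += usage.getUsed();
      }
    }
    return used;
  }

  public static void main(String[] args) {
    System.out.println("code cache used bytes: " + usedBytes());
  }
}
```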





[jira] [Commented] (HBASE-18161) Incremental Load support for Multiple-Table HFileOutputFormat

2017-06-21 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058674#comment-16058674
 ] 

Ted Yu commented on HBASE-18161:


bq. adding a default namespace is more inefficient

Right. Assuming there are many (composite) row keys, not embedding the default 
namespace would be a saving. Please go with a different separator.

> Incremental Load support for Multiple-Table HFileOutputFormat
> -
>
> Key: HBASE-18161
> URL: https://issues.apache.org/jira/browse/HBASE-18161
> Project: HBase
>  Issue Type: New Feature
>Reporter: Densel Santhmayor
>Priority: Minor
> Attachments: MultiHFileOutputFormatSupport_HBASE_18161.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v2.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v3.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v4.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v5.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v6.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v7.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v8.patch
>
>
> h2. Introduction
> MapReduce currently supports the ability to write HBase records in bulk to 
> HFiles for a single table. The file(s) can then be uploaded to the relevant 
> RegionServers information with reasonable latency. This feature is useful to 
> make a large set of data available for queries at the same time as well as 
> provides a way to efficiently process very large input into HBase without 
> affecting query latencies.
> There is, however, no support to write variations of the same record key to 
> HFiles belonging to multiple HBase tables from within the same MapReduce job. 
>  
> h2. Goal
> The goal of this JIRA is to extend HFileOutputFormat2 to support writing to 
> HFiles for different tables within the same MapReduce job while keeping 
> single-table HFile features backwards-compatible. 
> For our use case, we needed to write a record key to a smaller HBase table 
> for quicker access, and the same record key with a date appended to a larger 
> table for longer term storage with chronological access. Each of these tables 
> would have different TTL and other settings to support their respective 
> access patterns. We also needed to be able to bulk write records to multiple 
> tables with different subsets of very large input as efficiently as possible. 
> Rather than run the MapReduce job multiple times (one for each table or 
> record structure), it would be useful to be able to parse the input a single 
> time and write to multiple tables simultaneously.
> Additionally, we'd like to maintain backwards compatibility with the existing 
> heavily-used HFileOutputFormat2 interface to allow benefits such as locality 
> sensitivity (that was introduced long after we implemented support for 
> multiple tables) to support both single table and multi table hfile writes. 
> h2. Proposal
> * Backwards compatibility for existing single table support in 
> HFileOutputFormat2 will be maintained and in this case, mappers will need to 
> emit the table rowkey as before. However, a new class - 
> MultiHFileOutputFormat - will provide a helper function to generate a rowkey 
> for mappers that prefixes the desired tablename to the existing rowkey as 
> well as provides configureIncrementalLoad support for multiple tables.
> * HFileOutputFormat2 will be updated in the following way:
> ** configureIncrementalLoad will now accept multiple table descriptor and 
> region locator pairs, analogous to the single pair currently accepted by 
> HFileOutputFormat2. 
> ** Compression, Block Size, Bloom Type and Datablock settings PER column 
> family that are set in the Configuration object are now indexed and retrieved 
> by tablename AND column family
> ** getRegionStartKeys will now support multiple regionlocators and calculate 
> split points and therefore partitions collectively for all tables. Similarly, 
> now the eventual number of Reducers will be equal to the total number of 
> partitions across all tables. 
> ** The RecordWriter class will be able to process rowkeys either with or 
> without the tablename prepended depending on how configureIncrementalLoad was 
> configured with MultiHFileOutputFormat or HFileOutputFormat2.
> * The use of MultiHFileOutputFormat will write the output into HFiles which 
> will match the output format of HFileOutputFormat2. However, while the 
> default use case will keep the existing directory structure with column 
> family name as the directory and HFiles within that directory, in the case of 
> MultiHFileOutputFormat, it will output HFiles in the output directory with 
> the following relative paths: 
> {noformat}
>  --table1 
>--family1 
>  --HFiles 
>  --table2 
>--family1 
>--family2 
>  --HFiles

[jira] [Commented] (HBASE-18161) Incremental Load support for Multiple-Table HFileOutputFormat

2017-06-21 Thread Densel Santhmayor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058672#comment-16058672
 ] 

Densel Santhmayor commented on HBASE-18161:
---

Excellent point. I hadn't thought of that. We can solve this in 2 ways:

1. I can update the current createKey api to add 'default' as the namespace in 
the composite key going forward (with the same colon separator), thus 
eliminating the ambiguity when namespaces are finally supported. 
2. I can use a different separator to separate tablename and rowkey so that we 
can use the colon to separate namespace and tablename in the future as you 
pointed out earlier

1 seems to have less ambiguity but for millions of keys, adding a default 
namespace is more inefficient and probably unnecessary. I'm happy to go with 2, 
unless you have anything else to add?
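
To illustrate option 2, here is a rough sketch. The class name, the choice of ';' as the tablename/rowkey separator, and the use of strings instead of byte arrays are all assumptions made for readability; the sketch also assumes that neither namespace nor table names may contain the chosen separator.

```java
// Hypothetical sketch of option 2: a dedicated separator between table name and row key.
class SeparatorSketch {
  static final char KEY_SEP = ';';          // illustrative choice, not taken from the patch
  static final char NAMESPACE_SEP = ':';    // existing namespace/tablename delimiter
  static final String DEFAULT_NAMESPACE = "default";

  // Returns {namespace, tablename, rowkey}; the namespace falls back to "default" when omitted.
  static String[] parse(String compositeKey) {
    int k = compositeKey.indexOf(KEY_SEP);
    if (k < 0) {
      throw new IllegalArgumentException("missing key separator in " + compositeKey);
    }
    String tablePart = compositeKey.substring(0, k);
    String rowKey = compositeKey.substring(k + 1); // may itself contain ':' or ';'
    int n = tablePart.indexOf(NAMESPACE_SEP);
    String namespace = n < 0 ? DEFAULT_NAMESPACE : tablePart.substring(0, n);
    String table = n < 0 ? tablePart : tablePart.substring(n + 1);
    return new String[] { namespace, table, rowKey };
  }
}
```

Keys written today without a namespace ("b;c") and future keys with one ("a:b;c:d") then parse without ambiguity, and no default namespace has to be stored in every key.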


[jira] [Commented] (HBASE-18161) Incremental Load support for Multiple-Table HFileOutputFormat

2017-06-21 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058663#comment-16058663
 ] 

Ted Yu commented on HBASE-18161:


bq. it will be namespace 'a', tablename 'b' and rowkey 'c' always

Should we allow a future version to recognize composite row keys serialized with 
the current patch, where the table name is in the 'default' namespace and the 
namespace is therefore omitted?


[jira] [Commented] (HBASE-17908) Upgrade guava

2017-06-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058661#comment-16058661
 ] 

stack commented on HBASE-17908:
---

We might have to do this for 2.0 if we can't have downstreamers swallow the 
changed ReplicationEndpoint interface.

> Upgrade guava
> -
>
> Key: HBASE-17908
> URL: https://issues.apache.org/jira/browse/HBASE-17908
> Project: HBase
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Balazs Meszaros
>Assignee: stack
>Priority: Critical
> Fix For: 2.0.0
>
>
> Currently we are using guava 12.0.1, but the latest version is 21.0. 
> Upgrading guava is always a hassle because it is not always backward 
> compatible with itself.
> Currently I think there are two approaches:
> 1. Upgrade guava to the newest version (21.0) and shade it.
> 2. Upgrade guava to a version which does not break our builds (15.0).
> If we can update it, some dependencies should be removed: 
> commons-collections, commons-codec, ...





[jira] [Created] (HBASE-18257) Cover exposed guava Service Interface in ReplicationEndpoint

2017-06-21 Thread stack (JIRA)
stack created HBASE-18257:
-

 Summary: Cover exposed guava Service Interface in 
ReplicationEndpoint
 Key: HBASE-18257
 URL: https://issues.apache.org/jira/browse/HBASE-18257
 Project: HBase
  Issue Type: Task
Reporter: stack


ReplicationEndpoint implements guava Service. It should instead implement an 
hbase Service class that covers over guava; let's not have third-party 
interfaces/classes exposed as part of our API.





[jira] [Commented] (HBASE-18023) Log multi-* requests for more than threshold number of rows

2017-06-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058659#comment-16058659
 ] 

Hadoop QA commented on HBASE-18023:
---

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 31m 25s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| 0 | mvndep | 0m 12s | Maven dependency ordering for branch |
| +1 | mvninstall | 6m 37s | master passed |
| +1 | compile | 1m 46s | master passed |
| +1 | checkstyle | 1m 13s | master passed |
| +1 | mvneclipse | 0m 42s | master passed |
| -1 | findbugs | 1m 1s | hbase-common in master has 2 extant Findbugs warnings. |
| -1 | findbugs | 5m 9s | hbase-server in master has 10 extant Findbugs warnings. |
| +1 | javadoc | 1m 21s | master passed |
| 0 | mvndep | 0m 25s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 2s | the patch passed |
| +1 | compile | 1m 47s | the patch passed |
| +1 | javac | 1m 47s | the patch passed |
| +1 | checkstyle | 1m 10s | the patch passed |
| +1 | mvneclipse | 0m 40s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 2s | The patch has no ill-formed XML file. |
| +1 | hadoopcheck | 57m 39s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. |
| +1 | findbugs | 7m 17s | the patch passed |
| +1 | javadoc | 1m 35s | the patch passed |
| +1 | unit | 3m 10s | hbase-common in the patch passed. |
| +1 | unit | 188m 39s | hbase-server in the patch passed. |
| +1 | asflicense | 0m 46s | The patch does not generate ASF License warnings. |
| | | 315m 31s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.03.0-ce Server=17.03.0-ce Image:yetus/hbase:757bf37 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12873955/HBASE-18023.master.003.patch |
| JIRA Issue | HBASE-18023 |
| Optional Tests | asflicense javac javadoc unit xml findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux dc8c0df2a642 4.8.3-std-1 #1 SMP Fri Oct 21 11:15:43 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 3489a1b |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| findbugs | 

[jira] [Assigned] (HBASE-17908) Upgrade guava

2017-06-21 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reassigned HBASE-17908:
-

Assignee: stack
Hadoop Flags: Incompatible change
Release Note: 
Move to a relocated guava 22.0.

Incompatible change. ReplicationEndpoint and its subclasses extend guava Service, 
which changed pretty radically between 12.0 and 22.0. The change is kosher because 
implementations are marked audience private. Still, this will likely cause 
grief for the likes of the downstream lily indexer.



[jira] [Comment Edited] (HBASE-18161) Incremental Load support for Multiple-Table HFileOutputFormat

2017-06-21 Thread Densel Santhmayor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058651#comment-16058651
 ] 

Densel Santhmayor edited comment on HBASE-18161 at 6/22/17 3:14 AM:


This code is from the link I posted a bit above:
{noformat}
  // A non-capture group so that this can be embedded.
  // regex is a bit more complicated to support nuance of tables
  // in default namespace
  //Allows only letters, digits and '_'
  public static final String VALID_NAMESPACE_REGEX =
      "(?:[_\\p{Digit}\\p{IsAlphabetic}]+)";
  //Allows only letters, digits, '_', '-' and '.'
  public static final String VALID_TABLE_QUALIFIER_REGEX =
      "(?:[_\\p{Digit}\\p{IsAlphabetic}][-_.\\p{Digit}\\p{IsAlphabetic}]*)";
  //Concatenation of NAMESPACE_REGEX and TABLE_QUALIFIER_REGEX,
  //with NAMESPACE_DELIM as delimiter
  public static final String VALID_USER_TABLE_REGEX =
      "(?:(?:(?:"+VALID_NAMESPACE_REGEX+"\\"+NAMESPACE_DELIM+")?)" +
         "(?:"+VALID_TABLE_QUALIFIER_REGEX+"))";
{noformat}

The colon is used as a delimiter between namespace and tablename since neither 
of them can include a colon in its name.

So for composite row key 'a:b:c', in the current patch, it will always be 
tablename 'a' and rowkey 'b:c'. If the composite rowkey includes namespace 
support, then it will be namespace 'a', tablename 'b' and rowkey 'c' always. In 
the case of composite rowkey 'a:b:c:d', it should again always be namespace 
'a', tablename 'b', and rowkey 'c:d'. Does that sound correct to you? 
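
To make the two readings concrete, here is a minimal sketch. It is not the patch's code: the class and method names are hypothetical, and row keys are shown as strings rather than byte arrays for readability.

```java
// Hypothetical sketch of the two ways to read a colon-delimited composite key.
class CompositeKeySketch {
  static final char DELIM = ':';

  // Current-patch reading: everything before the first colon is the table name.
  // Returns {tablename, rowkey}.
  static String[] parseTableAndRow(String key) {
    int i = key.indexOf(DELIM);
    if (i < 0) {
      throw new IllegalArgumentException("no delimiter in " + key);
    }
    return new String[] { key.substring(0, i), key.substring(i + 1) };
  }

  // Namespace-aware reading: the first two colon-separated fields are namespace and table name.
  // Returns {namespace, tablename, rowkey}.
  static String[] parseNamespaceTableAndRow(String key) {
    String[] parts = key.split(String.valueOf(DELIM), 3); // at most three fields
    if (parts.length < 3) {
      throw new IllegalArgumentException("expected namespace:table:rowkey but got " + key);
    }
    return parts;
  }
}
```

The same bytes 'a:b:c' yield table 'a' with rowkey 'b:c' under the first reading and namespace 'a', table 'b', rowkey 'c' under the second, so a reader has to know which convention the writer used.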





> Incremental Load support for Multiple-Table HFileOutputFormat
> -
>
> Key: HBASE-18161
> URL: https://issues.apache.org/jira/browse/HBASE-18161
> Project: HBase
>  Issue Type: New Feature
>Reporter: Densel Santhmayor
>Priority: Minor
> Attachments: MultiHFileOutputFormatSupport_HBASE_18161.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v2.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v3.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v4.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v5.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v6.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v7.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v8.patch
>
>
> h2. Introduction
> MapReduce currently supports the ability to write HBase records in bulk to 
> HFiles for a single table. The file(s) can then be uploaded to the relevant 
> RegionServers information with reasonable latency. This feature is useful to 
> make a large set of data available for queries at the same time as well as 
> provides a way to efficiently process very large input into HBase without 
> affecting query latencies.
> There is, however, no support to write variations of the same record key to 
> HFiles belonging to multiple HBase tables from within the same MapReduce job. 
>  
> h2. Goal
> The goal of this JIRA is to extend HFileOutputFormat2 to support writing to 
> HFiles for different tables within the same MapReduce job while keeping 
> single-table HFile features backwards-compatible. 
> For our use case, we needed to write a record key to a smaller HBase table 
> for quicker access, and the same record key with a date appended to a larger 
> table for longer term storage with chronological access. Each of these tables 
> would have different TTL and other settings to support their respective 
> access 

[jira] [Commented] (HBASE-18161) Incremental Load support for Multiple-Table HFileOutputFormat

2017-06-21 Thread Densel Santhmayor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058651#comment-16058651
 ] 

Densel Santhmayor commented on HBASE-18161:
---

This code is from the link I posted a bit above:

  // A non-capture group so that this can be embedded.
  // regex is a bit more complicated to support nuance of tables
  // in default namespace
  //Allows only letters, digits and '_'
  public static final String VALID_NAMESPACE_REGEX =
      "(?:[_\\p{Digit}\\p{IsAlphabetic}]+)";
  //Allows only letters, digits, '_', '-' and '.'
  public static final String VALID_TABLE_QUALIFIER_REGEX =
      "(?:[_\\p{Digit}\\p{IsAlphabetic}][-_.\\p{Digit}\\p{IsAlphabetic}]*)";
  //Concatenation of NAMESPACE_REGEX and TABLE_QUALIFIER_REGEX,
  //with NAMESPACE_DELIM as delimiter
  public static final String VALID_USER_TABLE_REGEX =
      "(?:(?:(?:"+VALID_NAMESPACE_REGEX+"\\"+NAMESPACE_DELIM+")?)" +
         "(?:"+VALID_TABLE_QUALIFIER_REGEX+"))";

The colon is used as the delimiter between namespace and tablename, since 
neither of them can include a colon in its name.
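
To make the point about the colon concrete, here is a small standalone check. 
It copies the three regex constants quoted above into a hypothetical class 
(TableNameRegexSketch and its helper methods are made up for illustration and 
are not part of HBase), and it assumes NAMESPACE_DELIM is ':' as described in 
this comment.

```java
import java.util.regex.Pattern;

// Illustrative only: the regex strings are copied from the snippet quoted
// in this comment; the class and method names are hypothetical.
final class TableNameRegexSketch {
  // Assumption: the namespace delimiter is ':' as stated in the comment.
  static final char NAMESPACE_DELIM = ':';

  static final String VALID_NAMESPACE_REGEX =
      "(?:[_\\p{Digit}\\p{IsAlphabetic}]+)";
  static final String VALID_TABLE_QUALIFIER_REGEX =
      "(?:[_\\p{Digit}\\p{IsAlphabetic}][-_.\\p{Digit}\\p{IsAlphabetic}]*)";
  static final String VALID_USER_TABLE_REGEX =
      "(?:(?:(?:" + VALID_NAMESPACE_REGEX + "\\" + NAMESPACE_DELIM + ")?)" +
      "(?:" + VALID_TABLE_QUALIFIER_REGEX + "))";

  // Pattern.matches requires the whole input to match the regex.
  static boolean isValidNamespace(String s) {
    return Pattern.matches(VALID_NAMESPACE_REGEX, s);
  }

  static boolean isValidQualifier(String s) {
    return Pattern.matches(VALID_TABLE_QUALIFIER_REGEX, s);
  }

  static boolean isValidUserTable(String s) {
    return Pattern.matches(VALID_USER_TABLE_REGEX, s);
  }
}
```

With these helpers, "ns:tbl" and "tbl" are accepted as user table names, while 
"a:b:c" is rejected because neither the namespace part nor the qualifier part 
can absorb the extra colon.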

So for composite row key 'a:b:c', in the current patch, it will always be 
tablename 'a' and rowkey 'b:c'. If the composite rowkey includes namespace 
support, then it will be namespace 'a', tablename 'b' and rowkey 'c' always. In 
the case of composite rowkey 'a:b:c:d', it should again always be namespace 
'a', tablename 'b', and rowkey 'c:d'. Does that sound correct to you? 
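
A rough sketch of the parsing rule described above, not the code from the 
patch. It uses String keys for readability (real HBase row keys are byte[]), 
and the class and method names are hypothetical. The idea is simply to split 
on the first colon (table only) or on the first two colons (namespace plus 
table), leaving any later colons inside the rowkey.

```java
// Illustrative only: hypothetical helper, String keys instead of byte[].
final class CompositeKeySketch {
  // Layout assumed for the current patch: "<table>:<rowkey>".
  // Limit 2 means only the first ':' is treated as a separator.
  static String[] splitTableAndRow(String compositeKey) {
    return compositeKey.split(":", 2);
  }

  // Layout assumed once namespaces are supported:
  // "<namespace>:<table>:<rowkey>".
  // Limit 3 means only the first two ':' are treated as separators.
  static String[] splitNamespaceTableAndRow(String compositeKey) {
    return compositeKey.split(":", 3);
  }
}
```

For 'a:b:c' the first method yields table 'a' and rowkey 'b:c'; for 'a:b:c:d' 
the second yields namespace 'a', table 'b' and rowkey 'c:d'. This works only 
because a single layout is fixed up front; the key itself does not say which 
layout it uses.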



[jira] [Commented] (HBASE-17908) Upgrade guava

2017-06-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058650#comment-16058650
 ] 

stack commented on HBASE-17908:
---

Working on a patch that depends on the new hbase-thirdparty project, which 
bundles up a couple of problematic libs into a fat jar and relocates them so 
they sit under the org.apache.hbase.thirdparty.shaded package prefix. The patch 
is mostly changing imports plus some fixup for the differences between guava 
12.0 and 22.0, the currently shipping version. Of note: ReplicationEndpoint 
implements guava Service, which has changed a good bit in 22.0. That makes this 
a 'breaking' change, though all RE implementations are marked audience Private. 
This change will break stuff like the Lily indexer. Need to talk to them about 
it. Could depend on guava 12.0 since all internal refs will be to the relocated 
guava 22.0, but that'd be awkward to message (use the internal guava everywhere 
though the 'natural-seeming' com.google.guava is on your classpath). Let's not 
do this unless we really have to. Can do it in a separate issue.

> Upgrade guava
> -
>
> Key: HBASE-17908
> URL: https://issues.apache.org/jira/browse/HBASE-17908
> Project: HBase
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Balazs Meszaros
>Priority: Critical
> Fix For: 2.0.0
>
>
> Currently we are using guava 12.0.1, but the latest version is 21.0. 
> Upgrading guava is always a hassle because it is not always backward 
> compatible with itself.
> Currently I think there are two approaches:
> 1. Upgrade guava to the newest version (21.0) and shade it.
> 2. Upgrade guava to a version which does not break our builds (15.0).
> If we can update it, some dependencies should be removed: 
> commons-collections, commons-codec, ...



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18235) LoadBalancer.BOGUS_SERVER_NAME should not have a bogus hostname

2017-06-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058648#comment-16058648
 ] 

Hudson commented on HBASE-18235:


FAILURE: Integrated in Jenkins build HBase-2.0 #86 (See 
[https://builds.apache.org/job/HBase-2.0/86/])
HBASE-18235 LoadBalancer.BOGUS_SERVER_NAME should not have a bogus (apurtell: 
rev ff5d497310c5551698acdd8592f4a3e484855d8d)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/LoadBalancer.java


> LoadBalancer.BOGUS_SERVER_NAME should not have a bogus hostname
> ---
>
> Key: HBASE-18235
> URL: https://issues.apache.org/jira/browse/HBASE-18235
> Project: HBase
>  Issue Type: Bug
>Reporter: Francis Liu
>Assignee: Francis Liu
> Fix For: 2.0.0, 3.0.0
>
> Attachments: HBASE-18235.patch
>
>
> The original patch used localhost to have assignment fail fast, avoiding 
> misleading DNS exceptions, delays due to DNS lookup, etc. 
> Was wondering what the reason was for changing it?





[jira] [Commented] (HBASE-18161) Incremental Load support for Multiple-Table HFileOutputFormat

2017-06-21 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058642#comment-16058642
 ] 

Ted Yu commented on HBASE-18161:


If colon is the separator, with composite row key: 'a:b:c', does it represent 
table 'a' and original row key of 'b:c', or does it represent table 'a:b' and 
original row key of 'c' ?

> Incremental Load support for Multiple-Table HFileOutputFormat
> -
>
> Key: HBASE-18161
> URL: https://issues.apache.org/jira/browse/HBASE-18161
> Project: HBase
>  Issue Type: New Feature
>Reporter: Densel Santhmayor
>Priority: Minor
> Attachments: MultiHFileOutputFormatSupport_HBASE_18161.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v2.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v3.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v4.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v5.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v6.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v7.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v8.patch
>
>
> h2. Introduction
> MapReduce currently supports the ability to write HBase records in bulk to 
> HFiles for a single table. The file(s) can then be uploaded to the relevant 
> RegionServers information with reasonable latency. This feature is useful to 
> make a large set of data available for queries at the same time as well as 
> provides a way to efficiently process very large input into HBase without 
> affecting query latencies.
> There is, however, no support to write variations of the same record key to 
> HFiles belonging to multiple HBase tables from within the same MapReduce job. 
>  
> h2. Goal
> The goal of this JIRA is to extend HFileOutputFormat2 to support writing to 
> HFiles for different tables within the same MapReduce job while keeping 
> single-table HFile features backwards-compatible. 
> For our use case, we needed to write a record key to a smaller HBase table 
> for quicker access, and the same record key with a date appended to a larger 
> table for longer term storage with chronological access. Each of these tables 
> would have different TTL and other settings to support their respective 
> access patterns. We also needed to be able to bulk write records to multiple 
> tables with different subsets of very large input as efficiently as possible. 
> Rather than run the MapReduce job multiple times (one for each table or 
> record structure), it would be useful to be able to parse the input a single 
> time and write to multiple tables simultaneously.
> Additionally, we'd like to maintain backwards compatibility with the existing 
> heavily-used HFileOutputFormat2 interface to allow benefits such as locality 
> sensitivity (that was introduced long after we implemented support for 
> multiple tables) to support both single table and multi table hfile writes. 
> h2. Proposal
> * Backwards compatibility for existing single table support in 
> HFileOutputFormat2 will be maintained and in this case, mappers will need to 
> emit the table rowkey as before. However, a new class - 
> MultiHFileOutputFormat - will provide a helper function to generate a rowkey 
> for mappers that prefixes the desired tablename to the existing rowkey as 
> well as provides configureIncrementalLoad support for multiple tables.
> * HFileOutputFormat2 will be updated in the following way:
> ** configureIncrementalLoad will now accept multiple table descriptor and 
> region locator pairs, analogous to the single pair currently accepted by 
> HFileOutputFormat2. 
> ** Compression, Block Size, Bloom Type and Datablock settings PER column 
> family that are set in the Configuration object are now indexed and retrieved 
> by tablename AND column family
> ** getRegionStartKeys will now support multiple regionlocators and calculate 
> split points and therefore partitions collectively for all tables. Similarly, 
> now the eventual number of Reducers will be equal to the total number of 
> partitions across all tables. 
> ** The RecordWriter class will be able to process rowkeys either with or 
> without the tablename prepended depending on how configureIncrementalLoad was 
> configured with MultiHFileOutputFormat or HFileOutputFormat2.
> * The use of MultiHFileOutputFormat will write the output into HFiles which 
> will match the output format of HFileOutputFormat2. However, while the 
> default use case will keep the existing directory structure with column 
> family name as the directory and HFiles within that directory, in the case of 
> MultiHFileOutputFormat, it will output HFiles in the output directory with 
> the following relative paths: 
> {noformat}
>  --table1 
>--family1 
>  --HFiles 
>  --table2 
>--family1 
>--family2 
>  --HFiles
> 

[jira] [Updated] (HBASE-18119) Improve HFile readability and modify ChecksumUtil log level

2017-06-21 Thread Qilin Cao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qilin Cao updated HBASE-18119:
--
Attachment: HBASE-18119-v5.patch

> Improve HFile readability and modify ChecksumUtil log level
> ---
>
> Key: HBASE-18119
> URL: https://issues.apache.org/jira/browse/HBASE-18119
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Qilin Cao
>Assignee: Qilin Cao
>Priority: Minor
> Attachments: HBASE-18119-v1.patch, HBASE-18119-v3.patch, 
> HBASE-18119-v4.patch, HBASE-18119-v5.patch
>
>
> The HFile.checkHFileVersion method is confusing to read, so I changed the 
> if expression. I also changed the ChecksumUtil log level from info to 
> trace.





[jira] [Updated] (HBASE-18119) Improve HFile readability and modify ChecksumUtil log level

2017-06-21 Thread Qilin Cao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qilin Cao updated HBASE-18119:
--
Attachment: (was: HBASE-18119-v5.patch)

> Improve HFile readability and modify ChecksumUtil log level
> ---
>
> Key: HBASE-18119
> URL: https://issues.apache.org/jira/browse/HBASE-18119
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Qilin Cao
>Assignee: Qilin Cao
>Priority: Minor
> Attachments: HBASE-18119-v1.patch, HBASE-18119-v3.patch, 
> HBASE-18119-v4.patch
>
>
> The HFile.checkHFileVersion method is confusing to read, so I changed the 
> if expression. I also changed the ChecksumUtil log level from info to 
> trace.





[jira] [Commented] (HBASE-18161) Incremental Load support for Multiple-Table HFileOutputFormat

2017-06-21 Thread Densel Santhmayor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058637#comment-16058637
 ] 

Densel Santhmayor commented on HBASE-18161:
---

1. Still not clear on this. Why can't we use the same separator for namespace 
to tablename and tablename to rowkey? Why does it have to be different? The 
code to separate them out won't change all that much, right?

2. If I understand you correctly, you'd like the createKey api to support 
namespaces now so that the public interface doesn't have to change in the 
future, correct? If that's the case, yes I can change the createKey api to 
support namespaces now even though I will provide support for namespaces in the 
compression/blocksize etc encodings in the future. Does that work?
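
To show what a namespace-aware createKey could look like, here is a minimal 
sketch. It is not the signature from the patch: the class name, the parameter 
list and the choice of always writing the namespace are assumptions made for 
illustration, with ':' as the separator discussed in this thread.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: a hypothetical createKey that always writes
// "<namespace>:<table>:<rowkey>" as bytes.
final class CreateKeySketch {
  private static final byte DELIM = (byte) ':';

  static byte[] createKey(String namespace, String table, byte[] rowKey) {
    byte[] ns = namespace.getBytes(StandardCharsets.UTF_8);
    byte[] tbl = table.getBytes(StandardCharsets.UTF_8);
    // Two extra bytes for the two delimiters.
    byte[] out = new byte[ns.length + 1 + tbl.length + 1 + rowKey.length];
    int pos = 0;
    System.arraycopy(ns, 0, out, pos, ns.length);
    pos += ns.length;
    out[pos++] = DELIM;
    System.arraycopy(tbl, 0, out, pos, tbl.length);
    pos += tbl.length;
    out[pos++] = DELIM;
    // The original rowkey is appended untouched, colons included.
    System.arraycopy(rowKey, 0, out, pos, rowKey.length);
    return out;
  }
}
```

Writing the namespace unconditionally means the public method would not have 
to change when namespace support lands in the per-table settings later.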


[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-21 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058630#comment-16058630
 ] 

Anoop Sam John commented on HBASE-17125:


That makes sense, Duo. We already have a FilterWrapper being used to wrap 
user-side filters. Please check how this idea turns out. Thanks a lot, 
Guanghao, for your perseverance. Appreciate it!

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a column's max versions is 3, and we write 4 versions of this column. 
> The oldest version is not removed immediately, but from the user's view the 
> oldest version is gone. When a user queries with a filter and the filter 
> skips a newer version, the oldest version becomes visible again. After the 
> region is compacted, the oldest version is never seen again. This is 
> confusing for the user: the query returns inconsistent results before and 
> after region compaction.
> The cause is the matchColumn method of UserScanQueryMatcher. It first checks 
> the cell against the filter, then checks the number of versions needed. So 
> if the filter skips a newer version, the oldest version is seen again as 
> long as it has not been removed.
> After an offline discussion with [~Apache9] and [~fenghh], we have two 
> solutions for this problem. The first idea is to check the number of 
> versions first, then check the cell against the filter. As the comment of 
> setFilter says, the filter is called after all tests for ttl, column match, 
> deletes and max versions have been run.
> {code}
>   /**
>    * Apply the specified server-side filter when performing the Query.
>    * Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>    * for ttl, column match, deletes and max versions have been run.
>    * @param filter filter to run on the server
>    * @return this for invocation chaining
>    */
>   public Query setFilter(Filter filter) {
>     this.filter = filter;
>     return this;
>   }
> {code}
> But this idea has another problem. If a column's max versions is 5 and the 
> user's query only needs 3 versions, it first checks the version count, then 
> checks the cell against the filter. So the number of cells in the result may 
> be less than 3, even though there are 2 more versions that are never read.
> So the second idea has three steps:
> 1. check against the max versions of this column
> 2. check the kv against the filter
> 3. check against the number of versions the user needs.
> But this will make the ScanQueryMatcher more complicated, and it will break 
> the javadoc of Query.setFilter.
> We don't have a final solution for this problem yet. Suggestions are welcome.
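
A minimal sketch of the ordering problem described in the issue above. This is 
a plain-Java simulation, not the UserScanQueryMatcher code: versions are 
modelled as timestamps listed newest first, the filter is a simple predicate, 
and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

// Illustrative only: simulates the three check orders from the description.
final class VersionFilterOrderSketch {
  // Reported behaviour: filter first, then count included versions.
  static List<Long> filterThenVersions(List<Long> newestFirst, int maxVersions,
      LongPredicate filter) {
    List<Long> out = new ArrayList<>();
    for (long ts : newestFirst) {
      if (out.size() >= maxVersions) break;
      if (filter.test(ts)) out.add(ts);
    }
    return out;
  }

  // First idea: count versions first, then apply the filter.
  static List<Long> versionsThenFilter(List<Long> newestFirst, int maxVersions,
      LongPredicate filter) {
    List<Long> out = new ArrayList<>();
    int seen = 0;
    for (long ts : newestFirst) {
      if (seen++ >= maxVersions) break;
      if (filter.test(ts)) out.add(ts);
    }
    return out;
  }

  // Second idea: column max versions, then filter, then requested versions.
  static List<Long> threeStep(List<Long> newestFirst, int columnMaxVersions,
      int requestedVersions, LongPredicate filter) {
    List<Long> out = new ArrayList<>();
    int seen = 0;
    for (long ts : newestFirst) {
      if (seen++ >= columnMaxVersions) break;      // step 1
      if (!filter.test(ts)) continue;              // step 2
      if (out.size() >= requestedVersions) break;  // step 3
      out.add(ts);
    }
    return out;
  }
}
```

With versions [4, 3, 2, 1], max versions 3 and a filter that skips version 4, 
filterThenVersions returns [3, 2, 1] before compaction but [3, 2] once version 
1 has been compacted away, while versionsThenFilter returns [3, 2] in both 
cases. With versions [5, 4, 3, 2, 1], column max versions 5, 3 requested 
versions and a filter that skips version 5, versionsThenFilter with a limit of 
3 returns only [4, 3], whereas threeStep returns [4, 3, 2].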





[jira] [Commented] (HBASE-18161) Incremental Load support for Multiple-Table HFileOutputFormat

2017-06-21 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058624#comment-16058624
 ] 

Ted Yu commented on HBASE-18161:


bq. looking to use different separators for namespace separation from tablename 
vs tablename separation from rowkey? 

Yes.
Since there may be the same table name in different namespaces, it would be 
good practice to write both namespace and table name.

bq. we can mandate a namespace is necessary when creating the key

We should be prepared when the above happens.



[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-21 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058623#comment-16058623
 ] 

Anoop Sam John commented on HBASE-17125:


bq.I think we can append a SpecificNumberVersionsFilter at the last if filter 
and versions are both present when initialize a scan at RS side.
You mean the HBase code itself adds it, rather than asking the user to do this, 
right? I am fine with any approach (whichever turns out to be the best in 
everyone's view) that can help solve this bug within the HBase code itself, 
rather than asking the user to do some extra work. Only if this is really not 
possible should we, as a last step, ask the user to do some extra things.  
Will look at the patches and approaches. Thanks guys.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a column's max versions is 3, and we write 4 versions of this column. 
> The oldest version is not removed immediately, but from the user's view it 
> is already gone. When a user queries with a filter and the filter skips a 
> newer version, the oldest version becomes visible again. After the region is 
> compacted, the oldest version is never seen again. This is confusing for 
> users: the query returns inconsistent results before and after region 
> compaction.
> The cause is the matchColumn method of UserScanQueryMatcher. It first checks 
> the cell against the filter, then checks the number of versions needed. So 
> if the filter skips a newer version, the oldest version is seen again as 
> long as it has not been removed.
> After an offline discussion with [~Apache9] and [~fenghh], we have two 
> candidate solutions. The first idea is to check the number of versions 
> first, then check the cell against the filter. This matches the comment on 
> setFilter, which says the filter is called after all tests for ttl, column 
> match, deletes and max versions have been run.
> {code}
>   /**
>    * Apply the specified server-side filter when performing the Query.
>    * Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>    * for ttl, column match, deletes and max versions have been run.
>    * @param filter filter to run on the server
>    * @return this for invocation chaining
>    */
>   public Query setFilter(Filter filter) {
>     this.filter = filter;
>     return this;
>   }
> {code}
> But this idea has another problem. Suppose a column's max versions is 5 and 
> the user's query needs only 3 versions. The matcher first checks the version 
> count, then checks the cell against the filter, so the result may contain 
> fewer than 3 cells even though there are 2 more versions that are never 
> read.
> So the second idea has three steps:
> 1. check against the max versions of this column
> 2. check the kv against the filter
> 3. check against the number of versions the user needs.
> But this makes ScanQueryMatcher more complicated, and it breaks the javadoc 
> of Query.setFilter.
> We don't have a final solution for this problem yet. Suggestions are welcome.
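
The ordering problem in the description above can be illustrated with a small toy model in plain Java. This is only a sketch under stated assumptions: timestamps stand in for cells of one column (newest first), and the class and method names are hypothetical. None of it is the actual UserScanQueryMatcher code or any of the attached patches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.LongPredicate;

/** Toy model only: each long is the timestamp of one version of a single column. */
final class VersionFilterOrderSketch {

    /** Behavior described in the issue: apply the filter first, then count versions. */
    static List<Long> filterThenVersions(List<Long> newestFirst, LongPredicate filter, int wanted) {
        List<Long> out = new ArrayList<>();
        for (long ts : newestFirst) {
            if (out.size() == wanted) break;      // enough versions returned
            if (filter.test(ts)) out.add(ts);     // skipped cells do not count as versions
        }
        return out;
    }

    /** First idea: count versions first, then apply the filter. */
    static List<Long> versionsThenFilter(List<Long> newestFirst, LongPredicate filter, int wanted) {
        List<Long> out = new ArrayList<>();
        int seen = 0;
        for (long ts : newestFirst) {
            if (seen == wanted) break;            // version budget used up before filtering
            seen++;
            if (filter.test(ts)) out.add(ts);
        }
        return out;
    }

    /** Second idea: (1) column max versions, (2) filter, (3) versions the user asked for. */
    static List<Long> threeStep(List<Long> newestFirst, int columnMaxVersions,
                                LongPredicate filter, int wanted) {
        List<Long> out = new ArrayList<>();
        int seen = 0;
        for (long ts : newestFirst) {
            if (seen == columnMaxVersions || out.size() == wanted) break;
            seen++;
            if (filter.test(ts)) out.add(ts);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Long> beforeCompaction = Arrays.asList(4L, 3L, 2L, 1L); // max versions 3, 4 written
        List<Long> afterCompaction = Arrays.asList(4L, 3L, 2L);      // oldest version removed
        LongPredicate skipNewest = ts -> ts != 4L;

        System.out.println(filterThenVersions(beforeCompaction, skipNewest, 3)); // [3, 2, 1]
        System.out.println(filterThenVersions(afterCompaction, skipNewest, 3));  // [3, 2]
        System.out.println(threeStep(beforeCompaction, 3, skipNewest, 3));       // [3, 2]

        // Max versions 5, user wants 3, filter skips the newest version.
        List<Long> five = Arrays.asList(5L, 4L, 3L, 2L, 1L);
        LongPredicate skipFive = ts -> ts != 5L;
        System.out.println(versionsThenFilter(five, skipFive, 3)); // [4, 3]
        System.out.println(threeStep(five, 5, skipFive, 3));       // [4, 3, 2]
    }
}
```

In this model the filter-first order returns different results before and after compaction, the versions-first order can return fewer cells than requested, and the three-step order avoids both at the cost of an extra check.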



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Issue Comment Deleted] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-21 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-17125:
---
Comment: was deleted

(was: bq. I think we can append a SpecificNumberVersionsFilter at the last if 
filter and versions are both present when initialize a scan at RS side.
You mean the HBase code itself adds it, rather than asking the user to do this, 
right? I am fine with any approach (whichever all of us agree is best) that 
solves this bug in the HBase code itself, rather than asking the user to do 
some extra work. Only if that is really not possible should we, as a last 
resort, ask the user to do some extra things.  
Will look at the patches and approaches. Thanks, guys.)



[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-21 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058621#comment-16058621
 ] 

Duo Zhang commented on HBASE-17125:
---

Yeah I mean do it by ourselves, not by users.



[jira] [Commented] (HBASE-18119) Improve HFile readability and modify ChecksumUtil log level

2017-06-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058619#comment-16058619
 ] 

Hadoop QA commented on HBASE-18119:
---

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 15s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 3m 18s | master passed |
| +1 | compile | 0m 36s | master passed |
| +1 | checkstyle | 0m 42s | master passed |
| +1 | mvneclipse | 0m 14s | master passed |
| -1 | findbugs | 2m 23s | hbase-server in master has 10 extant Findbugs warnings. |
| +1 | javadoc | 0m 25s | master passed |
| -1 | mvninstall | 0m 37s | hbase-server in the patch failed. |
| -1 | compile | 0m 36s | hbase-server in the patch failed. |
| -1 | javac | 0m 36s | hbase-server in the patch failed. |
| +1 | checkstyle | 0m 42s | the patch passed |
| +1 | mvneclipse | 0m 13s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| -1 | hadoopcheck | 1m 18s | The patch causes 16 errors with Hadoop v2.6.1. |
| -1 | hadoopcheck | 2m 32s | The patch causes 16 errors with Hadoop v2.6.2. |
| -1 | hadoopcheck | 3m 46s | The patch causes 16 errors with Hadoop v2.6.3. |
| -1 | hadoopcheck | 5m 2s | The patch causes 16 errors with Hadoop v2.6.4. |
| -1 | hadoopcheck | 6m 16s | The patch causes 16 errors with Hadoop v2.6.5. |
| -1 | hadoopcheck | 7m 32s | The patch causes 16 errors with Hadoop v2.7.1. |
| -1 | hadoopcheck | 8m 46s | The patch causes 16 errors with Hadoop v2.7.2. |
| -1 | hadoopcheck | 10m 0s | The patch causes 16 errors with Hadoop v2.7.3. |
| -1 | hadoopcheck | 11m 15s | The patch causes 16 errors with Hadoop v3.0.0-alpha3. |
| -1 | findbugs | 0m 18s | hbase-server in the patch failed. |
| +1 | javadoc | 0m 26s | the patch passed |
| -1 | unit | 0m 35s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 9s | The patch does not generate ASF License warnings. |
| | | 22m 56s | |

|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12873984/HBASE-18119-v5.patch |
| JIRA Issue | HBASE-18119 |
| Optional Tests |  asflicense  

[jira] [Comment Edited] (HBASE-18161) Incremental Load support for Multiple-Table HFileOutputFormat

2017-06-21 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058591#comment-16058591
 ] 

Ted Yu edited comment on HBASE-18161 at 6/22/17 2:24 AM:
-

w.r.t. namespace support, we should lay a solid foundation in this JIRA.
Using colon as separator would make supporting namespace hard.

How about using semicolon as separator ?

Edit: corrected typo


was (Author: yuzhih...@gmail.com):
w.r.t. namespace support, we should lay a solid foundation in this JIRA.
Using semicolon as separator would make supporting namespace hard.

How about using semicolon as separator ?

> Incremental Load support for Multiple-Table HFileOutputFormat
> -
>
> Key: HBASE-18161
> URL: https://issues.apache.org/jira/browse/HBASE-18161
> Project: HBase
> Issue Type: New Feature
> Reporter: Densel Santhmayor
> Priority: Minor
> Attachments: MultiHFileOutputFormatSupport_HBASE_18161.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v2.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v3.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v4.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v5.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v6.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v7.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v8.patch
>
>
> h2. Introduction
> MapReduce currently supports the ability to write HBase records in bulk to 
> HFiles for a single table. The file(s) can then be uploaded to the relevant 
> RegionServers with reasonable latency. This feature is useful for making a 
> large set of data available for queries at the same time, and it provides a 
> way to efficiently load very large input into HBase without affecting query 
> latencies.
> There is, however, no support to write variations of the same record key to 
> HFiles belonging to multiple HBase tables from within the same MapReduce job. 
>  
> h2. Goal
> The goal of this JIRA is to extend HFileOutputFormat2 to support writing to 
> HFiles for different tables within the same MapReduce job while keeping 
> single-table HFile features backwards-compatible. 
> table for longer term storage with chronological access. Each of these tables 
> would have different TTL and other settings to support their respective 
> access patterns. We also needed to be able to bulk write records to multiple 
> tables with different subsets of very large input as efficiently as possible. 
> Rather than run the MapReduce job multiple times (one for each table or 
> record structure), it would be useful to be able to parse the input a single 
> time and write to multiple tables simultaneously.
> Additionally, we'd like to maintain backwards compatibility with the existing 
> heavily-used HFileOutputFormat2 interface to allow benefits such as locality 
> sensitivity (that was introduced long after we implemented support for 
> multiple tables) to support both single table and multi table hfile writes. 
> h2. Proposal
> * Backwards compatibility for existing single table support in 
> HFileOutputFormat2 will be maintained and in this case, mappers will need to 
> emit the table rowkey as before. However, a new class - 
> MultiHFileOutputFormat - will provide a helper function to generate a rowkey 
> for mappers that prefixes the desired tablename to the existing rowkey as 
> well as provides configureIncrementalLoad support for multiple tables.
> * HFileOutputFormat2 will be updated in the following way:
> ** configureIncrementalLoad will now accept multiple table descriptor and 
> region locator pairs, analogous to the single pair currently accepted by 
> HFileOutputFormat2. 
> ** Compression, Block Size, Bloom Type and Datablock settings PER column 
> family that are set in the Configuration object are now indexed and retrieved 
> by tablename AND column family
> ** getRegionStartKeys will now support multiple regionlocators and calculate 
> split points and therefore partitions collectively for all tables. Similarly, 
> now the eventual number of Reducers will be equal to the total number of 
> partitions across all tables. 
> ** The RecordWriter class will be able to process rowkeys either with or 
> without the tablename prepended depending on how configureIncrementalLoad was 
> configured with MultiHFileOutputFormat or HFileOutputFormat2.
> * The use of MultiHFileOutputFormat will write the output into HFiles which 
> will match the output format of HFileOutputFormat2. However, while the 
> default use case will keep the existing directory structure with column 
> family name as the directory and HFiles within that directory, 

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-21 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058620#comment-16058620
 ] 

Anoop Sam John commented on HBASE-17125:


bq. I think we can append a SpecificNumberVersionsFilter at the last if filter 
and versions are both present when initialize a scan at RS side.
You mean the HBase code itself adds it, rather than asking the user to do this, 
right? I am fine with any approach (whichever all of us agree is best) that 
solves this bug in the HBase code itself, rather than asking the user to do 
some extra work. Only if that is really not possible should we, as a last 
resort, ask the user to do some extra things.  
Will look at the patches and approaches. Thanks, guys.



[jira] [Commented] (HBASE-18161) Incremental Load support for Multiple-Table HFileOutputFormat

2017-06-21 Thread Densel Santhmayor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058616#comment-16058616
 ] 

Densel Santhmayor commented on HBASE-18161:
---

I am more than happy to lay the right foundation for future namespace support. 
I just didn't think that implementing complete support for namespaces should be 
done in this ticket. Since the separator is for internal-use, we should be able 
to change it at any time in the future as well. However, to address your 
concern, I'd like to understand more. 

Did you mean "Using *colon* as separator would make supporting namespace 
hard."? 

I'm not sure why this would be the case. I was looking at 
https://hbase.apache.org/apidocs/src-html/org/apache/hadoop/hbase/TableName.html#line.63
 and I saw the line
public final static char NAMESPACE_DELIM = ':';
So this seems to only be a concern if you are looking to use different 
separators for namespace separation from tablename vs tablename separation from 
rowkey? 
I'm not sure why we can't use the same delimiter to separate the namespace from 
the tablename as well as the tablename from the rowkey, since when we do 
implement namespace support, we can mandate that a namespace is necessary when 
creating the key, just like we mandate now that the tablename is required when 
creating the key for multitable support.
Also, namespaces/tablenames according to that source code can only have 
letters, digits and '_', so there is no chance of this delimiter ending up in 
the namespace (or tablename).
2. I'm more than happy to use a semicolon as separator in the current patch if 
you or others think that's the right way to go, just trying to understand 
whether there is a technical barrier or not. 
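
To make the separator question concrete, here is a minimal sketch of a composite key built as tablename + separator + rowkey and parsed back by splitting at the first separator. The class and its methods are hypothetical and are not taken from HFileOutputFormat2 or any attached patch. Splitting at the first separator is also an assumption, made so that a rowkey may itself contain the separator byte.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of HBase. */
final class TableRowKeySketch {

    /** Builds tableName + separator + rowKey as one byte array. */
    static byte[] compose(String tableName, byte separator, byte[] rowKey) {
        byte[] table = tableName.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[table.length + 1 + rowKey.length];
        System.arraycopy(table, 0, out, 0, table.length);
        out[table.length] = separator;
        System.arraycopy(rowKey, 0, out, table.length + 1, rowKey.length);
        return out;
    }

    /** Index of the first separator byte; the rowkey may contain later ones. */
    private static int firstIndexOf(byte[] key, byte separator) {
        for (int i = 0; i < key.length; i++) {
            if (key[i] == separator) return i;
        }
        throw new IllegalArgumentException("composite key has no separator");
    }

    /** Everything before the first separator is treated as the table name. */
    static String tableOf(byte[] composite, byte separator) {
        return new String(composite, 0, firstIndexOf(composite, separator), StandardCharsets.UTF_8);
    }

    /** Everything after the first separator is treated as the rowkey. */
    static byte[] rowOf(byte[] composite, byte separator) {
        return Arrays.copyOfRange(composite, firstIndexOf(composite, separator) + 1, composite.length);
    }

    public static void main(String[] args) {
        byte[] row = "row;1".getBytes(StandardCharsets.UTF_8);

        // Semicolon separator: a namespace-qualified name survives the round trip.
        byte[] semi = compose("ns:t1", (byte) ';', row);
        System.out.println(tableOf(semi, (byte) ';'));                                   // ns:t1
        System.out.println(new String(rowOf(semi, (byte) ';'), StandardCharsets.UTF_8)); // row;1

        // Colon separator: a first-separator split stops at the namespace delimiter.
        byte[] colon = compose("ns:t1", (byte) ':', row);
        System.out.println(tableOf(colon, (byte) ':'));                                  // ns
    }
}
```

With a colon, this naive parser returns only the namespace. It would work only if a namespace were always present and the parser split at the second colon, which is the mandate described in the comment above.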


[jira] [Commented] (HBASE-18226) Disable reverse DNS lookup at HMaster and use the hostname provided by RegionServer

2017-06-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058613#comment-16058613
 ] 

Hudson commented on HBASE-18226:


FAILURE: Integrated in Jenkins build HBase-1.4 #785 (See 
[https://builds.apache.org/job/HBase-1.4/785/])
HBASE-18226 Disable reverse DNS lookup at HMaster and use the hostname (tedyu: 
rev 940f4107b0ac02592f6536cae8601132bb760e72)
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerHostname.java
* (edit) hbase-common/src/main/resources/hbase-default.xml
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java


> Disable reverse DNS lookup at HMaster and use the hostname provided by 
> RegionServer
> ---
>
> Key: HBASE-18226
> URL: https://issues.apache.org/jira/browse/HBASE-18226
> Project: HBase
> Issue Type: New Feature
> Reporter: Duo Xu
> Assignee: Duo Xu
> Fix For: 3.0.0, 1.4.0, 2.0.0-alpha-2
>
> Attachments: 18226.branch-1.txt, HBASE-18226.001.patch, 
> HBASE-18226.002.patch, HBASE-18226.003.patch, HBASE-18226.004.patch, 
> HBASE-18226.005.patch, HBASE-18226.006.patch, HBASE-18226-branch-1.patch
>
>
> Description updated:
> In some unusual network environments, forward DNS lookup is supported while 
> reverse DNS lookup may not work properly.
> This JIRA is to make HMaster use the hostname passed from the RS instead of 
> doing a reverse DNS lookup to tell the RS which hostname to use during 
> reportForDuty(). This has already been implemented by HBASE-12954 by adding 
> a "useThisHostnameInstead" field in RegionServerStatusProtos.
> Currently "useThisHostnameInstead" is optional, and the RS by default only 
> passes port, server start code and server current time info to HMaster during 
> RS reportForDuty(). In order to use this field, users currently need to 
> specify "hbase.regionserver.hostname" in every regionserver node's 
> hbase-site.xml. This causes some trouble:
> 1. Some deployments are managed by management tools like Ambari, which 
> maintain the same copy of hbase-site.xml across all the nodes.
> 2. HBASE-12954 targets multihomed hosts, where users want to manually set 
> the hostname value for each node. In the other cases (not multihomed), I 
> just want the RS to use the hostname returned by the node, set it in 
> useThisHostnameInstead and pass it to HMaster during reportForDuty().
> I would like to introduce a setting such that, if it is set to true, 
> "useThisHostnameInstead" will be set to the hostname the RS gets from the 
> node. Then HMaster will skip the reverse DNS lookup because it sees that the 
> "useThisHostnameInstead" field is set in the request.
> "hbase.regionserver.hostname.reported.to.master", is it a good name?
> 
> Regarding the hostname returned by the RS node, I read the source code again 
> (including hadoop-common dns.java). By default RS gets hostname by calling 
> InetAddress.getLocalHost().getCanonicalHostName(). If users specify 
> "hbase.regionserver.dns.interface" or "hbase.regionserver.dns.nameserver" or 
> some underlying system configuration changes (eg. modifying 
> /etc/nsswitch.conf), it may first read from DNS or other sources instead of 
> first checking /etc/hosts file.
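
The decision described above can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the committed HRegionServer change. The class and method names are hypothetical, a Map stands in for the HBase Configuration, and the boolean key is the name floated in the description, which may differ from the name that was finally committed.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of how an RS could pick the value for useThisHostnameInstead. */
final class ReportedHostnameSketch {

    static final String EXPLICIT_HOSTNAME = "hbase.regionserver.hostname";
    // Name proposed in the description; treat it as an assumption, not the final key.
    static final String REPORT_OWN_HOSTNAME = "hbase.regionserver.hostname.reported.to.master";

    /**
     * Returns the hostname to send to the master, or empty when the field is
     * left unset and the master falls back to a reverse DNS lookup.
     */
    static Optional<String> useThisHostnameInstead(Map<String, String> conf, String localHostname) {
        String explicit = conf.get(EXPLICIT_HOSTNAME);
        if (explicit != null && !explicit.isEmpty()) {
            return Optional.of(explicit);          // per-node override (HBASE-12954)
        }
        if (Boolean.parseBoolean(conf.get(REPORT_OWN_HOSTNAME))) {
            return Optional.of(localHostname);     // same hbase-site.xml on every node
        }
        return Optional.empty();
    }

    /** Default lookup mentioned in the description; the fallback value is an assumption. */
    static String localHostname() {
        try {
            return InetAddress.getLocalHost().getCanonicalHostName();
        } catch (UnknownHostException e) {
            return "localhost";
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(useThisHostnameInstead(conf, localHostname()).isPresent()); // false

        conf.put(REPORT_OWN_HOSTNAME, "true");
        System.out.println(useThisHostnameInstead(conf, localHostname()).isPresent()); // true
    }
}
```

The boolean flag lets a single hbase-site.xml be shared across nodes, which is the Ambari case in the description, while the explicit per-node hostname still takes precedence for multihomed hosts.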





[jira] [Commented] (HBASE-16351) do dependency license check via enforcer plugin

2017-06-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058606#comment-16058606
 ] 

Hudson commented on HBASE-16351:


SUCCESS: Integrated in Jenkins build HBase-1.1-JDK8 #1967 (See 
[https://builds.apache.org/job/HBase-1.1-JDK8/1967/])
HBASE-16351 Improve error reporting for license failures (busbey: rev 
106471bfb29bf3a7bcb74a9e53ed84443d1ba2c3)
* (edit) hbase-resource-bundle/src/main/resources/META-INF/LICENSE.vm
* (edit) pom.xml
* (edit) hbase-resource-bundle/src/main/resources/META-INF/NOTICE.vm


> do dependency license check via enforcer plugin
> ---
>
> Key: HBASE-16351
> URL: https://issues.apache.org/jira/browse/HBASE-16351
> Project: HBase
> Issue Type: Improvement
> Components: build, dependencies
> Reporter: Sean Busbey
> Assignee: Mike Drob
> Fix For: 3.0.0, 1.4.0, 1.3.2, 1.2.7, 2.0.0-alpha-2, 1.1.12
>
> Attachments: HBASE-16351.patch, HBASE-16351.v2.patch, 
> HBASE-16351.v3.patch
>
>
> As a stop-gap measure we've made our velocity template fail things when we 
> attempt to bundle a cat-x dependency (see HBASE-16318). Unfortunately, the 
> error messages in this case are non-obvious and digging to find the culprit 
> in a partially rendered LICENSE file leaves a lot to be desired.
> The maven enforcer plugin should allow us to fail more gracefully, with 
> output given on the maven console.





[jira] [Commented] (HBASE-15691) Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to branch-1

2017-06-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058607#comment-16058607
 ] 

Hudson commented on HBASE-15691:


SUCCESS: Integrated in Jenkins build HBase-1.1-JDK8 #1967 (See 
[https://builds.apache.org/job/HBase-1.1-JDK8/1967/])
HBASE-15691 ConcurrentModificationException in BucketAllocator (syuanjiangdev: 
rev 24d94e857ef8897b90eed63397ea8ba9560963f3)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java


> Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to 
> branch-1
> -
>
> Key: HBASE-15691
> URL: https://issues.apache.org/jira/browse/HBASE-15691
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 1.3.0
>Reporter: Andrew Purtell
>Assignee: Stephen Yuan Jiang
> Fix For: 1.4.0, 1.3.2, 1.2.7, 1.1.12
>
> Attachments: HBASE-15691-branch-1.patch, 
> HBASE-15691.v2-branch-1.patch, HBASE-15691.v3-branch-1.patch
>
>
> HBASE-10205 solves the following problem:
> "
> The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the 
> RAM queue containing entries to be cached. freeSpace() in turn calls 
> BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), 
> which iterates over 'bucketList'. At the same time another WriterThread might 
> call BucketAllocator.allocateBlock(), which may call 
> BucketSizeInfo.allocateBlock(), add a bucket to 'bucketList' and consequently 
> cause a ConcurrentModificationException. Calls to 
> BucketAllocator.allocateBlock() are synchronized, but calls to 
> BucketAllocator.getIndexStatistics() are not, which allows this race to occur.
> "
> However, for some unknown reason, HBASE-10205 was committed only to the master 
> (2.0 and beyond) and 0.98 branches. To preserve continuity we should 
> commit it to branch-1.
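
The race described above can be sketched in a few lines. This is only an illustration under stated assumptions: `BucketListSketch`, `totalSize()` and `runDemo()` are hypothetical names, not HBase classes or methods, and taking the same lock on the reading path is just one way to close such a race, not necessarily what the HBASE-10205 patch does.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the allocator; not HBase's BucketAllocator. */
class BucketListSketch {
    private final List<Integer> bucketList = new ArrayList<>();

    /** Mirrors the already-synchronized allocation path: may append a bucket. */
    public synchronized void allocateBlock(int size) {
        bucketList.add(size);
    }

    /** Iterates under the same lock, so no writer can modify the list mid-iteration. */
    public synchronized long totalSize() {
        long total = 0;
        for (int size : bucketList) {
            total += size;
        }
        return total;
    }

    /** Runs one writer and one reader concurrently and returns the final total. */
    public static long runDemo(int blocks) {
        final BucketListSketch allocator = new BucketListSketch();
        final Throwable[] failure = new Throwable[1];
        Thread writer = new Thread(() -> {
            for (int i = 0; i < blocks; i++) {
                allocator.allocateBlock(1);
            }
        });
        Thread reader = new Thread(() -> {
            try {
                for (int i = 0; i < blocks; i++) {
                    allocator.totalSize();
                }
            } catch (Throwable t) {
                failure[0] = t; // would capture a ConcurrentModificationException
            }
        });
        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        if (failure[0] != null) {
            throw new IllegalStateException(failure[0]);
        }
        return allocator.totalSize();
    }

    public static void main(String[] args) {
        System.out.println(runDemo(10_000));
    }
}
```

If `synchronized` were removed from `totalSize()`, the reader could iterate the `ArrayList` while the writer appends to it, which is the situation the quoted description attributes to getIndexStatistics().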





[jira] [Commented] (HBASE-15691) Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to branch-1

2017-06-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058604#comment-16058604
 ] 

Hudson commented on HBASE-15691:


SUCCESS: Integrated in Jenkins build HBase-1.1-JDK7 #1884 (See 
[https://builds.apache.org/job/HBase-1.1-JDK7/1884/])
HBASE-15691 ConcurrentModificationException in BucketAllocator (syuanjiangdev: 
rev 24d94e857ef8897b90eed63397ea8ba9560963f3)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java


> Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to 
> branch-1
> -
>
> Key: HBASE-15691
> URL: https://issues.apache.org/jira/browse/HBASE-15691
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 1.3.0
>Reporter: Andrew Purtell
>Assignee: Stephen Yuan Jiang
> Fix For: 1.4.0, 1.3.2, 1.2.7, 1.1.12
>
> Attachments: HBASE-15691-branch-1.patch, 
> HBASE-15691.v2-branch-1.patch, HBASE-15691.v3-branch-1.patch
>
>
> HBASE-10205 solves the following problem:
> "
> The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the 
> RAM queue containing entries to be cached. freeSpace() in turn calls 
> BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), 
> which iterates over 'bucketList'. At the same time another WriterThread might 
> call BucketAllocator.allocateBlock(), which may call 
> BucketSizeInfo.allocateBlock(), add a bucket to 'bucketList' and consequently 
> cause a ConcurrentModificationException. Calls to 
> BucketAllocator.allocateBlock() are synchronized, but calls to 
> BucketAllocator.getIndexStatistics() are not, which allows this race to occur.
> "
> However, for some unknown reason, HBASE-10205 was committed only to the master 
> (2.0 and beyond) and 0.98 branches. To preserve continuity we should 
> commit it to branch-1.





[jira] [Commented] (HBASE-16351) do dependency license check via enforcer plugin

2017-06-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058603#comment-16058603
 ] 

Hudson commented on HBASE-16351:


SUCCESS: Integrated in Jenkins build HBase-1.1-JDK7 #1884 (See 
[https://builds.apache.org/job/HBase-1.1-JDK7/1884/])
HBASE-16351 Improve error reporting for license failures (busbey: rev 
106471bfb29bf3a7bcb74a9e53ed84443d1ba2c3)
* (edit) hbase-resource-bundle/src/main/resources/META-INF/NOTICE.vm
* (edit) hbase-resource-bundle/src/main/resources/META-INF/LICENSE.vm
* (edit) pom.xml


> do dependency license check via enforcer plugin
> ---
>
> Key: HBASE-16351
> URL: https://issues.apache.org/jira/browse/HBASE-16351
> Project: HBase
>  Issue Type: Improvement
>  Components: build, dependencies
>Reporter: Sean Busbey
>Assignee: Mike Drob
> Fix For: 3.0.0, 1.4.0, 1.3.2, 1.2.7, 2.0.0-alpha-2, 1.1.12
>
> Attachments: HBASE-16351.patch, HBASE-16351.v2.patch, 
> HBASE-16351.v3.patch
>
>
> As a stop-gap measure we've made our velocity template fail things when we 
> attempt to bundle a cat-x dependency (see HBASE-16318). Unfortunately, the 
> error messages in this case are non-obvious and digging to find the culprit 
> in a partially rendered LICENSE file leaves a lot to be desired.
> The maven enforcer plugin should allow us to fail more gracefully, with 
> output given on the maven console.





[jira] [Updated] (HBASE-18119) Improve HFile readability and modify ChecksumUtil log level

2017-06-21 Thread Qilin Cao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qilin Cao updated HBASE-18119:
--
Attachment: (was: HBASE-18119-v2.patch)

> Improve HFile readability and modify ChecksumUtil log level
> ---
>
> Key: HBASE-18119
> URL: https://issues.apache.org/jira/browse/HBASE-18119
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Qilin Cao
>Assignee: Qilin Cao
>Priority: Minor
> Attachments: HBASE-18119-v1.patch, HBASE-18119-v3.patch, 
> HBASE-18119-v4.patch, HBASE-18119-v5.patch
>
>
> The HFile.checkHFileVersion method was confusing to read, so I changed 
> its if expression. I also changed the ChecksumUtil log level from info 
> to trace.





[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-21 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058594#comment-16058594
 ] 

Duo Zhang commented on HBASE-17125:
---

I think we can append a SpecificNumberVersionsFilter as the last filter if filter and 
versions are both present when initializing a scan on the RS side. This way we do not 
need to modify the logic of ColumnTracker; that code is already complicated 
enough, so let's not add new stuff to it. I still think a problem introduced by 
filters should also be addressed by a filter.

Also, the name 'setVersions' does not make sense; change it to 'readAllVersions'? 
And how do you deal with raw scans?

Thanks.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a column's max versions is 3 and we write 4 versions of this column. 
> The oldest version is not removed immediately, but from the user's view it 
> is gone. When a user queries with a filter and the filter skips 
> a newer version, the oldest version becomes visible again. After the region 
> is compacted, the oldest version is never seen again. This is confusing for 
> users: the query returns inconsistent results before and after region 
> compaction.
> The cause is the matchColumn method of UserScanQueryMatcher. It first checks the 
> cell against the filter, then checks the number of versions needed. So if the filter 
> skips a newer version, the oldest version is seen again as long as it has 
> not been removed.
> After an offline discussion with [~Apache9] and [~fenghh], we have two 
> solutions for this problem. The first idea is to check the number of versions 
> first, then check the cell against the filter. As the comment on setFilter says, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>    * Apply the specified server-side filter when performing the Query.
>    * Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>    * for ttl, column match, deletes and max versions have been run.
>    * @param filter filter to run on the server
>    * @return this for invocation chaining
>    */
>   public Query setFilter(Filter filter) {
>     this.filter = filter;
>     return this;
>   }
> {code}
> But this idea has another problem. If a column's max versions is 5 and the 
> user's query needs only 3 versions, it first checks the version count, then 
> checks the cell against the filter. So the result may contain fewer than 3 cells, 
> even though there are 2 more versions that are never read.
> So the second idea has three steps.
> 1. Check against the max versions of this column.
> 2. Check the kv against the filter.
> 3. Check against the number of versions the user needs.
> But this makes ScanQueryMatcher more complicated, and it breaks 
> the javadoc of Query.setFilter.
> We don't have a final solution for this problem yet. Suggestions are welcome.
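
The difference between the two orderings in the description can be reproduced with a small simulation. This is a minimal sketch, not the UserScanQueryMatcher code: `VersionOrderSketch` and its two methods are hypothetical names, cells are reduced to integer timestamps listed newest first, and the filter is a plain predicate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical simulation of the two check orders; not HBase code. */
class VersionOrderSketch {

    /** Filter first, then count versions: skipped cells do not use up a version slot. */
    public static List<Integer> filterThenCount(int[] newestFirst, int maxVersions,
                                                IntPredicate filter) {
        List<Integer> out = new ArrayList<>();
        for (int ts : newestFirst) {
            if (!filter.test(ts)) {
                continue;
            }
            if (out.size() == maxVersions) {
                break;
            }
            out.add(ts);
        }
        return out;
    }

    /** Count versions first, then filter: only the newest maxVersions cells are considered. */
    public static List<Integer> countThenFilter(int[] newestFirst, int maxVersions,
                                                IntPredicate filter) {
        List<Integer> out = new ArrayList<>();
        int seen = 0;
        for (int ts : newestFirst) {
            if (seen == maxVersions) {
                break;
            }
            seen++;
            if (filter.test(ts)) {
                out.add(ts);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] cells = {4, 3, 2, 1};             // four versions written, max versions is 3
        IntPredicate skipNewest = ts -> ts != 4; // the filter skips the newest version
        System.out.println(filterThenCount(cells, 3, skipNewest)); // [3, 2, 1]
        System.out.println(countThenFilter(cells, 3, skipNewest)); // [3, 2]
    }
}
```

With the filter applied first, timestamp 1 (the version the user considers gone) reappears until compaction removes it. With the version count applied first, it stays hidden, at the cost of possibly returning fewer cells than requested.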





[jira] [Commented] (HBASE-18161) Incremental Load support for Multiple-Table HFileOutputFormat

2017-06-21 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058591#comment-16058591
 ] 

Ted Yu commented on HBASE-18161:


w.r.t. namespace support, we should lay a solid foundation in this JIRA.
Using colon as separator would make supporting namespace hard.

How about using semicolon as separator ?
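
To illustrate the separator concern: HBase writes namespaced table names as namespace:table, so a separator that also appears inside the table name makes the prefix ambiguous. The sketch below is hypothetical (`SeparatorSketch` and `splitTableAndRow` are not part of the patch) and uses strings for readability, whereas real row keys are byte arrays that may themselves contain the separator.

```java
/** Hypothetical helper showing how a table-name prefix is split off a combined key. */
class SeparatorSketch {

    /** Splits at the first occurrence of the separator: [tableName, rowKey]. */
    public static String[] splitTableAndRow(String combinedKey, char separator) {
        int idx = combinedKey.indexOf(separator);
        if (idx < 0) {
            throw new IllegalArgumentException("no separator in key: " + combinedKey);
        }
        return new String[] {
            combinedKey.substring(0, idx),
            combinedKey.substring(idx + 1)
        };
    }

    public static void main(String[] args) {
        // Colon collides with the namespace delimiter: the table name is cut short.
        String[] colon = splitTableAndRow("ns1:table1:row42", ':');
        System.out.println(colon[0] + " | " + colon[1]); // ns1 | table1:row42

        // Semicolon does not appear in the namespaced table name, so the split is clean.
        String[] semicolon = splitTableAndRow("ns1:table1;row42", ';');
        System.out.println(semicolon[0] + " | " + semicolon[1]); // ns1:table1 | row42
    }
}
```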

> Incremental Load support for Multiple-Table HFileOutputFormat
> -
>
> Key: HBASE-18161
> URL: https://issues.apache.org/jira/browse/HBASE-18161
> Project: HBase
>  Issue Type: New Feature
>Reporter: Densel Santhmayor
>Priority: Minor
> Attachments: MultiHFileOutputFormatSupport_HBASE_18161.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v2.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v3.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v4.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v5.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v6.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v7.patch, 
> MultiHFileOutputFormatSupport_HBASE_18161_v8.patch
>
>
> h2. Introduction
> MapReduce currently supports writing HBase records in bulk to 
> HFiles for a single table. The file(s) can then be uploaded to the relevant 
> RegionServers with reasonable latency. This feature is useful for 
> making a large set of data available for queries all at once, and it 
> provides a way to efficiently load very large input into HBase without 
> affecting query latencies.
> There is, however, no support to write variations of the same record key to 
> HFiles belonging to multiple HBase tables from within the same MapReduce job. 
>  
> h2. Goal
> The goal of this JIRA is to extend HFileOutputFormat2 to support writing to 
> HFiles for different tables within the same MapReduce job while keeping single-table 
> HFile features backwards-compatible. 
> For our use case, we needed to write a record key to a smaller HBase table 
> for quicker access, and the same record key with a date appended to a larger 
> table for longer term storage with chronological access. Each of these tables 
> would have different TTL and other settings to support their respective 
> access patterns. We also needed to be able to bulk write records to multiple 
> tables with different subsets of very large input as efficiently as possible. 
> Rather than run the MapReduce job multiple times (one for each table or 
> record structure), it would be useful to be able to parse the input a single 
> time and write to multiple tables simultaneously.
> Additionally, we'd like to maintain backwards compatibility with the existing 
> heavily-used HFileOutputFormat2 interface to allow benefits such as locality 
> sensitivity (that was introduced long after we implemented support for 
> multiple tables) to support both single table and multi table hfile writes. 
> h2. Proposal
> * Backwards compatibility for existing single table support in 
> HFileOutputFormat2 will be maintained and in this case, mappers will need to 
> emit the table rowkey as before. However, a new class - 
> MultiHFileOutputFormat - will provide a helper function to generate a rowkey 
> for mappers that prefixes the desired tablename to the existing rowkey as 
> well as provides configureIncrementalLoad support for multiple tables.
> * HFileOutputFormat2 will be updated in the following way:
> ** configureIncrementalLoad will now accept multiple table descriptor and 
> region locator pairs, analogous to the single pair currently accepted by 
> HFileOutputFormat2. 
> ** Compression, Block Size, Bloom Type and Datablock settings PER column 
> family that are set in the Configuration object are now indexed and retrieved 
> by tablename AND column family
> ** getRegionStartKeys will now support multiple regionlocators and calculate 
> split points and therefore partitions collectively for all tables. Similarly, 
> now the eventual number of Reducers will be equal to the total number of 
> partitions across all tables. 
> ** The RecordWriter class will be able to process rowkeys either with or 
> without the tablename prepended depending on how configureIncrementalLoad was 
> configured with MultiHFileOutputFormat or HFileOutputFormat2.
> * The use of MultiHFileOutputFormat will write the output into HFiles which 
> will match the output format of HFileOutputFormat2. However, while the 
> default use case will keep the existing directory structure with column 
> family name as the directory and HFiles within that directory, in the case of 
> MultiHFileOutputFormat, it will output HFiles in the output directory with 
> the following relative paths: 
> {noformat}
>  --table1 
>--family1 
>  --HFiles 
>  --table2 
>--family1 
>--family2 
>  --HFiles
> 

[jira] [Commented] (HBASE-16351) do dependency license check via enforcer plugin

2017-06-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058587#comment-16058587
 ] 

Hudson commented on HBASE-16351:


SUCCESS: Integrated in Jenkins build HBase-1.2-JDK7 #155 (See 
[https://builds.apache.org/job/HBase-1.2-JDK7/155/])
HBASE-16351 Improve error reporting for license failures (busbey: rev 
64f41f1b50be9e75c25dd6078cc9a33850878aa9)
* (edit) hbase-resource-bundle/src/main/resources/META-INF/LICENSE.vm
* (edit) hbase-resource-bundle/src/main/resources/META-INF/NOTICE.vm
* (edit) pom.xml


> do dependency license check via enforcer plugin
> ---
>
> Key: HBASE-16351
> URL: https://issues.apache.org/jira/browse/HBASE-16351
> Project: HBase
>  Issue Type: Improvement
>  Components: build, dependencies
>Reporter: Sean Busbey
>Assignee: Mike Drob
> Fix For: 3.0.0, 1.4.0, 1.3.2, 1.2.7, 2.0.0-alpha-2, 1.1.12
>
> Attachments: HBASE-16351.patch, HBASE-16351.v2.patch, 
> HBASE-16351.v3.patch
>
>
> As a stop-gap measure we've made our velocity template fail things when we 
> attempt to bundle a cat-x dependency (see HBASE-16318). Unfortunately, the 
> error messages in this case are non-obvious and digging to find the culprit 
> in a partially rendered LICENSE file leaves a lot to be desired.
> The maven enforcer plugin should allow us to fail more gracefully, with 
> output given on the maven console.





[jira] [Commented] (HBASE-15691) Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to branch-1

2017-06-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058583#comment-16058583
 ] 

Hudson commented on HBASE-15691:


SUCCESS: Integrated in Jenkins build HBase-1.2-IT #889 (See 
[https://builds.apache.org/job/HBase-1.2-IT/889/])
HBASE-15691 ConcurrentModificationException in BucketAllocator (syuanjiangdev: 
rev 1b15b7825bb390a59dd57527efd4d013c753de5a)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java


> Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to 
> branch-1
> -
>
> Key: HBASE-15691
> URL: https://issues.apache.org/jira/browse/HBASE-15691
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 1.3.0
>Reporter: Andrew Purtell
>Assignee: Stephen Yuan Jiang
> Fix For: 1.4.0, 1.3.2, 1.2.7, 1.1.12
>
> Attachments: HBASE-15691-branch-1.patch, 
> HBASE-15691.v2-branch-1.patch, HBASE-15691.v3-branch-1.patch
>
>
> HBASE-10205 solves the following problem:
> "
> The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the 
> RAM queue containing entries to be cached. freeSpace() in turn calls 
> BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), 
> which iterates over 'bucketList'. At the same time another WriterThread might 
> call BucketAllocator.allocateBlock(), which may call 
> BucketSizeInfo.allocateBlock(), add a bucket to 'bucketList' and consequently 
> cause a ConcurrentModificationException. Calls to 
> BucketAllocator.allocateBlock() are synchronized, but calls to 
> BucketAllocator.getIndexStatistics() are not, which allows this race to occur.
> "
> However, for some unknown reason, HBASE-10205 was committed only to the master 
> (2.0 and beyond) and 0.98 branches. To preserve continuity we should 
> commit it to branch-1.





[jira] [Commented] (HBASE-18226) Disable reverse DNS lookup at HMaster and use the hostname provided by RegionServer

2017-06-21 Thread Duo Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058580#comment-16058580
 ] 

Duo Xu commented on HBASE-18226:


[~tedyu]

Thanks! I just attached a branch-1 patch at the same time :)

> Disable reverse DNS lookup at HMaster and use the hostname provided by 
> RegionServer
> ---
>
> Key: HBASE-18226
> URL: https://issues.apache.org/jira/browse/HBASE-18226
> Project: HBase
>  Issue Type: New Feature
>Reporter: Duo Xu
>Assignee: Duo Xu
> Fix For: 3.0.0, 1.4.0, 2.0.0-alpha-2
>
> Attachments: 18226.branch-1.txt, HBASE-18226.001.patch, 
> HBASE-18226.002.patch, HBASE-18226.003.patch, HBASE-18226.004.patch, 
> HBASE-18226.005.patch, HBASE-18226.006.patch, HBASE-18226-branch-1.patch
>
>
> Description updated:
> In some unusual network environments, forward DNS lookup is supported while 
> reverse DNS lookup may not work properly.
> This JIRA proposes that HMaster use the hostname passed from the RS, instead 
> of doing a reverse DNS lookup, to tell the RS which hostname to use during 
> reportForDuty(). This has already been implemented by HBASE-12954 by adding the 
> "useThisHostnameInstead" field in RegionServerStatusProtos.
> Currently "useThisHostnameInstead" is optional and RS by default only passes 
> port, server start code and server current time info to HMaster during RS 
> reportForDuty(). In order to use this field, users currently need to specify 
> "hbase.regionserver.hostname" on every regionserver node's hbase-site.xml. 
> This causes some trouble in
> 1. some deployments managed by some management tools like Ambari, which 
> maintains the same copy of hbase-site.xml across all the nodes.
> 2. HBASE-12954 targets multihomed hosts, where users want to manually 
> set the hostname value for each node. In the other cases (not multihomed), I 
> just want the RS to use the hostname returned by the node, set it in 
> useThisHostnameInstead and pass it to HMaster during reportForDuty().
> I would like to introduce a setting that if the setting is set to true, 
> "useThisHostnameInstead" will be set to the hostname RS gets from the node. 
> Then HMaster will skip reverse DNS lookup because it sees 
> "useThisHostnameInstead" field is set in the request.
> "hbase.regionserver.hostname.reported.to.master", is it a good name?
> 
> Regarding the hostname returned by the RS node, I read the source code again 
> (including hadoop-common dns.java). By default RS gets hostname by calling 
> InetAddress.getLocalHost().getCanonicalHostName(). If users specify 
> "hbase.regionserver.dns.interface" or "hbase.regionserver.dns.nameserver" or 
> some underlying system configuration changes (eg. modifying 
> /etc/nsswitch.conf), it may first read from DNS or other sources instead of 
> first checking /etc/hosts file.
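
The proposed behaviour can be sketched as a small decision function. This is an illustration only: `HostnameChoiceSketch` and `hostnameToReport` are hypothetical names, and letting an explicitly configured hostname take precedence over the new flag is an assumption, not something the description states. The only real API used is `InetAddress.getLocalHost().getCanonicalHostName()`, which the description names as the default lookup.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.function.Supplier;

/** Hypothetical sketch of which hostname, if any, the RS would send to HMaster. */
class HostnameChoiceSketch {

    /**
     * @param configuredHostname value of "hbase.regionserver.hostname", or null if unset
     * @param reportLocalHostname the proposed boolean setting
     * @param localHostname how the node resolves its own hostname
     * @return the value for useThisHostnameInstead, or null to leave the field unset
     */
    public static String hostnameToReport(String configuredHostname,
                                          boolean reportLocalHostname,
                                          Supplier<String> localHostname) {
        if (configuredHostname != null && !configuredHostname.isEmpty()) {
            return configuredHostname; // assumed precedence: explicit per-node value wins
        }
        if (reportLocalHostname) {
            return localHostname.get();
        }
        return null; // field unset: HMaster falls back to reverse DNS lookup
    }

    public static void main(String[] args) {
        Supplier<String> local = () -> {
            try {
                return InetAddress.getLocalHost().getCanonicalHostName();
            } catch (UnknownHostException e) {
                return "localhost"; // fallback for this demo only
            }
        };
        System.out.println(hostnameToReport(null, true, local));
    }
}
```

Passing the lookup as a `Supplier` keeps the decision logic testable without touching DNS or /etc/hosts.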





[jira] [Updated] (HBASE-18226) Disable reverse DNS lookup at HMaster and use the hostname provided by RegionServer

2017-06-21 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-18226:
---
Fix Version/s: 1.4.0

> Disable reverse DNS lookup at HMaster and use the hostname provided by 
> RegionServer
> ---
>
> Key: HBASE-18226
> URL: https://issues.apache.org/jira/browse/HBASE-18226
> Project: HBase
>  Issue Type: New Feature
>Reporter: Duo Xu
>Assignee: Duo Xu
> Fix For: 3.0.0, 1.4.0, 2.0.0-alpha-2
>
> Attachments: 18226.branch-1.txt, HBASE-18226.001.patch, 
> HBASE-18226.002.patch, HBASE-18226.003.patch, HBASE-18226.004.patch, 
> HBASE-18226.005.patch, HBASE-18226.006.patch, HBASE-18226-branch-1.patch
>
>
> Description updated:
> In some unusual network environments, forward DNS lookup is supported while 
> reverse DNS lookup may not work properly.
> This JIRA proposes that HMaster use the hostname passed from the RS, instead 
> of doing a reverse DNS lookup, to tell the RS which hostname to use during 
> reportForDuty(). This has already been implemented by HBASE-12954 by adding the 
> "useThisHostnameInstead" field in RegionServerStatusProtos.
> Currently "useThisHostnameInstead" is optional and RS by default only passes 
> port, server start code and server current time info to HMaster during RS 
> reportForDuty(). In order to use this field, users currently need to specify 
> "hbase.regionserver.hostname" on every regionserver node's hbase-site.xml. 
> This causes some trouble in
> 1. some deployments managed by some management tools like Ambari, which 
> maintains the same copy of hbase-site.xml across all the nodes.
> 2. HBASE-12954 targets multihomed hosts, where users want to manually 
> set the hostname value for each node. In the other cases (not multihomed), I 
> just want the RS to use the hostname returned by the node, set it in 
> useThisHostnameInstead and pass it to HMaster during reportForDuty().
> I would like to introduce a setting that if the setting is set to true, 
> "useThisHostnameInstead" will be set to the hostname RS gets from the node. 
> Then HMaster will skip reverse DNS lookup because it sees 
> "useThisHostnameInstead" field is set in the request.
> "hbase.regionserver.hostname.reported.to.master", is it a good name?
> 
> Regarding the hostname returned by the RS node, I read the source code again 
> (including hadoop-common dns.java). By default RS gets hostname by calling 
> InetAddress.getLocalHost().getCanonicalHostName(). If users specify 
> "hbase.regionserver.dns.interface" or "hbase.regionserver.dns.nameserver" or 
> some underlying system configuration changes (eg. modifying 
> /etc/nsswitch.conf), it may first read from DNS or other sources instead of 
> first checking /etc/hosts file.





[jira] [Updated] (HBASE-18226) Disable reverse DNS lookup at HMaster and use the hostname provided by RegionServer

2017-06-21 Thread Duo Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Xu updated HBASE-18226:
---
Attachment: HBASE-18226-branch-1.patch

Attach a patch for branch-1.

The local build succeeded and the tests in 
"org.apache.hadoop.hbase.regionserver.TestRegionServerHostname" passed.

> Disable reverse DNS lookup at HMaster and use the hostname provided by 
> RegionServer
> ---
>
> Key: HBASE-18226
> URL: https://issues.apache.org/jira/browse/HBASE-18226
> Project: HBase
>  Issue Type: New Feature
>Reporter: Duo Xu
>Assignee: Duo Xu
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: 18226.branch-1.txt, HBASE-18226.001.patch, 
> HBASE-18226.002.patch, HBASE-18226.003.patch, HBASE-18226.004.patch, 
> HBASE-18226.005.patch, HBASE-18226.006.patch, HBASE-18226-branch-1.patch
>
>
> Description updated:
> In some unusual network environments, forward DNS lookup is supported while 
> reverse DNS lookup may not work properly.
> This JIRA proposes that HMaster use the hostname passed from the RS, instead 
> of doing a reverse DNS lookup, to tell the RS which hostname to use during 
> reportForDuty(). This has already been implemented by HBASE-12954 by adding the 
> "useThisHostnameInstead" field in RegionServerStatusProtos.
> Currently "useThisHostnameInstead" is optional and RS by default only passes 
> port, server start code and server current time info to HMaster during RS 
> reportForDuty(). In order to use this field, users currently need to specify 
> "hbase.regionserver.hostname" on every regionserver node's hbase-site.xml. 
> This causes some trouble in
> 1. some deployments managed by some management tools like Ambari, which 
> maintains the same copy of hbase-site.xml across all the nodes.
> 2. HBASE-12954 targets multihomed hosts, where users want to manually 
> set the hostname value for each node. In the other cases (not multihomed), I 
> just want the RS to use the hostname returned by the node, set it in 
> useThisHostnameInstead and pass it to HMaster during reportForDuty().
> I would like to introduce a setting that if the setting is set to true, 
> "useThisHostnameInstead" will be set to the hostname RS gets from the node. 
> Then HMaster will skip reverse DNS lookup because it sees 
> "useThisHostnameInstead" field is set in the request.
> "hbase.regionserver.hostname.reported.to.master", is it a good name?
> 
> Regarding the hostname returned by the RS node, I read the source code again 
> (including hadoop-common dns.java). By default RS gets hostname by calling 
> InetAddress.getLocalHost().getCanonicalHostName(). If users specify 
> "hbase.regionserver.dns.interface" or "hbase.regionserver.dns.nameserver" or 
> some underlying system configuration changes (eg. modifying 
> /etc/nsswitch.conf), it may first read from DNS or other sources instead of 
> first checking /etc/hosts file.





[jira] [Updated] (HBASE-18226) Disable reverse DNS lookup at HMaster and use the hostname provided by RegionServer

2017-06-21 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-18226:
---
Attachment: 18226.branch-1.txt

What I committed to branch-1

> Disable reverse DNS lookup at HMaster and use the hostname provided by 
> RegionServer
> ---
>
> Key: HBASE-18226
> URL: https://issues.apache.org/jira/browse/HBASE-18226
> Project: HBase
>  Issue Type: New Feature
>Reporter: Duo Xu
>Assignee: Duo Xu
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: 18226.branch-1.txt, HBASE-18226.001.patch, 
> HBASE-18226.002.patch, HBASE-18226.003.patch, HBASE-18226.004.patch, 
> HBASE-18226.005.patch, HBASE-18226.006.patch, HBASE-18226-branch-1.patch
>
>
> Description updated:
> In some unusual network environments, forward DNS lookup is supported while 
> reverse DNS lookup may not work properly.
> This JIRA proposes that HMaster use the hostname passed from the RS, instead 
> of doing a reverse DNS lookup, to tell the RS which hostname to use during 
> reportForDuty(). This has already been implemented by HBASE-12954 by adding the 
> "useThisHostnameInstead" field in RegionServerStatusProtos.
> Currently "useThisHostnameInstead" is optional and RS by default only passes 
> port, server start code and server current time info to HMaster during RS 
> reportForDuty(). In order to use this field, users currently need to specify 
> "hbase.regionserver.hostname" on every regionserver node's hbase-site.xml. 
> This causes some trouble in
> 1. some deployments managed by some management tools like Ambari, which 
> maintains the same copy of hbase-site.xml across all the nodes.
> 2. HBASE-12954 targets multihomed hosts, where users want to manually 
> set the hostname value for each node. In the other cases (not multihomed), I 
> just want the RS to use the hostname returned by the node, set it in 
> useThisHostnameInstead and pass it to HMaster during reportForDuty().
> I would like to introduce a setting that if the setting is set to true, 
> "useThisHostnameInstead" will be set to the hostname RS gets from the node. 
> Then HMaster will skip reverse DNS lookup because it sees 
> "useThisHostnameInstead" field is set in the request.
> "hbase.regionserver.hostname.reported.to.master", is it a good name?
> 
> Regarding the hostname returned by the RS node, I read the source code again 
> (including hadoop-common dns.java). By default RS gets hostname by calling 
> InetAddress.getLocalHost().getCanonicalHostName(). If users specify 
> "hbase.regionserver.dns.interface" or "hbase.regionserver.dns.nameserver" or 
> some underlying system configuration changes (eg. modifying 
> /etc/nsswitch.conf), it may first read from DNS or other sources instead of 
> first checking /etc/hosts file.





[jira] [Commented] (HBASE-18119) Improve HFile readability and modify ChecksumUtil log level

2017-06-21 Thread Qilin Cao (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058565#comment-16058565
 ] 

Qilin Cao commented on HBASE-18119:
---

OK, I recreated a patch.

> Improve HFile readability and modify ChecksumUtil log level
> ---
>
> Key: HBASE-18119
> URL: https://issues.apache.org/jira/browse/HBASE-18119
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Qilin Cao
>Assignee: Qilin Cao
>Priority: Minor
> Attachments: HBASE-18119-v1.patch, HBASE-18119-v2.patch, 
> HBASE-18119-v3.patch, HBASE-18119-v4.patch, HBASE-18119-v5.patch
>
>
> I found the HFile.checkHFileVersion method confusing to read, so I changed 
> the if expression. I also changed the ChecksumUtil log level from info to 
> trace.





[jira] [Commented] (HBASE-18255) Time-Delayed HBase Performance Degradation with Java 7

2017-06-21 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058564#comment-16058564
 ] 

Vladimir Rodionov commented on HBASE-18255:
---

cc: [~devaraj], [~elserj], [~enis]

> Time-Delayed HBase Performance Degradation with Java 7
> --
>
> Key: HBASE-18255
> URL: https://issues.apache.org/jira/browse/HBASE-18255
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1, 1.2.6, 1.1.11
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 1.3.2, 1.2.7, 1.1.12
>
> Attachments: HBASE-18255-branch-1.x.v1.patch
>
>
> A good summary of the issue and its resolution can be found in this 
> article:
> https://community.hortonworks.com/articles/105802/time-delayed-hbase-performance-degradation-with-ja.html
> In short, due to an internal JVM 7 bug (addressed only in Java 8), the 
> HotSpot code cache can become full, after which ALL JIT compilation is 
> suspended indefinitely. The default code cache size in JVM 7 is quite low: 
> 48MB. It is recommended to increase this value to at least 256MB (the 
> default in JVM 8).
> This bug affects only 1.x.
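
To make the code cache point concrete: on HotSpot the size is raised with the JVM option -XX:ReservedCodeCacheSize=256m, and current usage can be read through the standard java.lang.management API. The following is only a minimal sketch of such a check, not part of the attached patch. The class and method names are hypothetical, and the pool-name matching assumes HotSpot's naming ("Code Cache" on Java 7/8, "CodeHeap ..." on later versions).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper, for illustration only.
final class CodeCacheCheck {

    /**
     * Sums the bytes in use across the JIT code pools. HotSpot names the pool
     * "Code Cache" on Java 7/8 and splits it into "CodeHeap ..." pools on
     * newer versions; a JVM that exposes neither yields 0.
     */
    static long codeCacheUsedBytes() {
        long used = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            if (name.contains("Code Cache") || name.contains("CodeHeap")) {
                MemoryUsage usage = pool.getUsage();
                if (usage != null) { // getUsage() may return null for an invalid pool
                    used += usage.getUsed();
                }
            }
        }
        return used;
    }
}
```

Logging CodeCacheCheck.codeCacheUsedBytes() periodically would show whether usage is approaching the configured limit before JIT compilation stops.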





[jira] [Updated] (HBASE-18119) Improve HFile readability and modify ChecksumUtil log level

2017-06-21 Thread Qilin Cao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qilin Cao updated HBASE-18119:
--
Attachment: HBASE-18119-v5.patch

> Improve HFile readability and modify ChecksumUtil log level
> ---
>
> Key: HBASE-18119
> URL: https://issues.apache.org/jira/browse/HBASE-18119
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Qilin Cao
>Assignee: Qilin Cao
>Priority: Minor
> Attachments: HBASE-18119-v1.patch, HBASE-18119-v2.patch, 
> HBASE-18119-v3.patch, HBASE-18119-v4.patch, HBASE-18119-v5.patch
>
>
> I found the HFile.checkHFileVersion method confusing to read, so I changed 
> the if expression. I also changed the ChecksumUtil log level from info to 
> trace.





[jira] [Commented] (HBASE-16148) Hybrid Logical Clocks(placeholder for running tests)

2017-06-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058559#comment-16058559
 ] 

Hadoop QA commented on HBASE-16148:
---

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 13s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 29 new or modified test files. |
| 0 | mvndep | 0m 13s | Maven dependency ordering for branch |
| +1 | mvninstall | 2m 54s | master passed |
| +1 | compile | 1m 51s | master passed |
| +1 | checkstyle | 1m 32s | master passed |
| +1 | mvneclipse | 0m 48s | master passed |
| -1 | findbugs | 0m 32s | hbase-common in master has 2 extant Findbugs warnings. |
| -1 | findbugs | 2m 1s | hbase-protocol-shaded in master has 27 extant Findbugs warnings. |
| -1 | findbugs | 0m 46s | hbase-client in master has 4 extant Findbugs warnings. |
| -1 | findbugs | 2m 17s | hbase-server in master has 10 extant Findbugs warnings. |
| +1 | javadoc | 1m 11s | master passed |
| 0 | mvndep | 0m 12s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 54s | the patch passed |
| +1 | compile | 1m 51s | the patch passed |
| +1 | cc | 1m 51s | the patch passed |
| +1 | javac | 1m 51s | the patch passed |
| +1 | checkstyle | 1m 31s | the patch passed |
| +1 | mvneclipse | 0m 43s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 22 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| +1 | hadoopcheck | 26m 33s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. |
| +1 | hbaseprotoc | 1m 22s | the patch passed |
| -1 | findbugs | 2m 35s | hbase-server generated 1 new + 10 unchanged - 0 fixed = 11 total (was 10) |
| +1 | javadoc | 1m 10s | the patch passed |
| +1 | unit | 1m 57s | hbase-common in the patch passed. |
| +1 | unit | 0m 27s | hbase-protocol-shaded in the patch passed. |
| +1 | unit | 0m 16s | hbase-protocol in the patch passed. |
| +1 | unit | 2m 15s | hbase-client in the patch passed. |
| -1 | unit | 110m 22s | hbase-server in the patch failed. |
| +1 |

[jira] [Commented] (HBASE-16351) do dependency license check via enforcer plugin

2017-06-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058556#comment-16058556
 ] 

Hudson commented on HBASE-16351:


SUCCESS: Integrated in Jenkins build HBase-1.2-JDK8 #151 (See 
[https://builds.apache.org/job/HBase-1.2-JDK8/151/])
HBASE-16351 Improve error reporting for license failures (busbey: rev 
64f41f1b50be9e75c25dd6078cc9a33850878aa9)
* (edit) hbase-resource-bundle/src/main/resources/META-INF/LICENSE.vm
* (edit) hbase-resource-bundle/src/main/resources/META-INF/NOTICE.vm
* (edit) pom.xml


> do dependency license check via enforcer plugin
> ---
>
> Key: HBASE-16351
> URL: https://issues.apache.org/jira/browse/HBASE-16351
> Project: HBase
>  Issue Type: Improvement
>  Components: build, dependencies
>Reporter: Sean Busbey
>Assignee: Mike Drob
> Fix For: 3.0.0, 1.4.0, 1.3.2, 1.2.7, 2.0.0-alpha-2, 1.1.12
>
> Attachments: HBASE-16351.patch, HBASE-16351.v2.patch, 
> HBASE-16351.v3.patch
>
>
> As a stop-gap measure we've made our velocity template fail things when we 
> attempt to bundle a cat-x dependency (see HBASE-16318). Unfortunately, the 
> error messages in this case are non-obvious and digging to find the culprit 
> in a partially rendered LICENSE file leaves a lot to be desired.
> The maven enforcer plugin should allow us to fail more gracefully, with 
> output given on the maven console.





[jira] [Commented] (HBASE-16351) do dependency license check via enforcer plugin

2017-06-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058554#comment-16058554
 ] 

Hudson commented on HBASE-16351:


SUCCESS: Integrated in Jenkins build HBase-1.3-JDK8 #200 (See 
[https://builds.apache.org/job/HBase-1.3-JDK8/200/])
HBASE-16351 Improve error reporting for license failures (busbey: rev 
5e144fc01ce65ea73a3c0027a8500a9da76d53f4)
* (edit) hbase-resource-bundle/src/main/resources/META-INF/LICENSE.vm
* (edit) hbase-resource-bundle/src/main/resources/META-INF/NOTICE.vm
* (edit) pom.xml


> do dependency license check via enforcer plugin
> ---
>
> Key: HBASE-16351
> URL: https://issues.apache.org/jira/browse/HBASE-16351
> Project: HBase
>  Issue Type: Improvement
>  Components: build, dependencies
>Reporter: Sean Busbey
>Assignee: Mike Drob
> Fix For: 3.0.0, 1.4.0, 1.3.2, 1.2.7, 2.0.0-alpha-2, 1.1.12
>
> Attachments: HBASE-16351.patch, HBASE-16351.v2.patch, 
> HBASE-16351.v3.patch
>
>
> As a stop-gap measure we've made our velocity template fail things when we 
> attempt to bundle a cat-x dependency (see HBASE-16318). Unfortunately, the 
> error messages in this case are non-obvious and digging to find the culprit 
> in a partially rendered LICENSE file leaves a lot to be desired.
> The maven enforcer plugin should allow us to fail more gracefully, with 
> output given on the maven console.





[jira] [Commented] (HBASE-15691) Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to branch-1

2017-06-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058551#comment-16058551
 ] 

Hudson commented on HBASE-15691:


SUCCESS: Integrated in Jenkins build HBase-1.3-IT #69 (See 
[https://builds.apache.org/job/HBase-1.3-IT/69/])
HBASE-15691 ConcurrentModificationException in BucketAllocator (syuanjiangdev: 
rev 8653245e7d20610c616f114cc4eac30f8a8bcb48)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java


> Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to 
> branch-1
> -
>
> Key: HBASE-15691
> URL: https://issues.apache.org/jira/browse/HBASE-15691
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 1.3.0
>Reporter: Andrew Purtell
>Assignee: Stephen Yuan Jiang
> Fix For: 1.4.0, 1.3.2, 1.2.7, 1.1.12
>
> Attachments: HBASE-15691-branch-1.patch, 
> HBASE-15691.v2-branch-1.patch, HBASE-15691.v3-branch-1.patch
>
>
> HBASE-10205 solves the following problem:
> "
> The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the 
> RAM queue containing entries to be cached. freeSpace() in turn calls 
> BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), 
> which iterates over 'bucketList'. At the same time another WriterThread might 
> call BucketAllocator.allocateBlock(), which may call 
> BucketSizeInfo.allocateBlock(), add a bucket to 'bucketList' and consequently 
> cause a ConcurrentModificationException. Calls to 
> BucketAllocator.allocateBlock() are synchronized, but calls to 
> BucketAllocator.getIndexStatistics() are not, which allows this race to occur.
> "
> However, for some unknown reason, HBASE-10205 was committed only to the 
> master (2.0 and beyond) and 0.98 branches. To preserve continuity we should 
> commit it to branch-1.
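
The race quoted above can be illustrated with a small stand-alone sketch. It is not the HBASE-10205 patch and does not use the real BucketAllocator classes: the class below is hypothetical and only borrows the method names from the description. It shows one way to close such a race, namely guarding the iterating reader with the same monitor as the writer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the allocator, for illustration only.
final class BucketListSketch {
    // Stand-in for 'bucketList'; each entry is a hypothetical bucket size.
    private final List<Integer> bucketList = new ArrayList<>();

    /** Writer path: adds a bucket while holding the object monitor. */
    synchronized void allocateBlock(int bucketSize) {
        bucketList.add(bucketSize);
    }

    /** Reader path: iterates under the same monitor, so it never overlaps an add. */
    synchronized long getIndexStatistics() {
        long total = 0;
        for (int size : bucketList) {
            total += size;
        }
        return total;
    }

    /** Runs one writer thread against a polling reader and returns the final total. */
    static long demo() {
        final BucketListSketch sketch = new BucketListSketch();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) {
                sketch.allocateBlock(1);
            }
        });
        writer.start();
        for (int i = 0; i < 1_000; i++) {
            // Without 'synchronized' on the reader, this loop could throw
            // ConcurrentModificationException while the writer is adding.
            sketch.getIndexStatistics();
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sketch.getIndexStatistics();
    }
}
```

Dropping the synchronized keyword from getIndexStatistics() reproduces the unsynchronized reader described in the quote, where the iteration may fail intermittently.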





[jira] [Commented] (HBASE-18254) ServerCrashProcedure checks and waits for meta initialized, instead should check and wait for meta loaded

2017-06-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058549#comment-16058549
 ] 

Hadoop QA commented on HBASE-18254:
---

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 17m 26s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 3m 22s | master passed |
| +1 | compile | 0m 34s | master passed |
| +1 | checkstyle | 0m 45s | master passed |
| +1 | mvneclipse | 0m 14s | master passed |
| -1 | findbugs | 2m 22s | hbase-server in master has 10 extant Findbugs warnings. |
| +1 | javadoc | 0m 25s | master passed |
| +1 | mvninstall | 0m 44s | the patch passed |
| +1 | compile | 0m 38s | the patch passed |
| +1 | javac | 0m 38s | the patch passed |
| +1 | checkstyle | 0m 43s | the patch passed |
| +1 | mvneclipse | 0m 13s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 27m 50s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. |
| +1 | findbugs | 2m 26s | the patch passed |
| +1 | javadoc | 0m 26s | the patch passed |
| +1 | unit | 106m 35s | hbase-server in the patch passed. |
| +1 | asflicense | 0m 17s | The patch does not generate ASF License warnings. |
| | | 165m 16s | |

|| Subsystem || Report/Notes ||
| Docker | Client=1.13.1 Server=1.13.1 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12873956/HBASE-18254.master.001.patch |
| JIRA Issue | HBASE-18254 |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 76719201554c 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 3489a1b |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| findbugs | https://builds.apache.org/job/PreCommit-HBASE-Build/7283/artifact/patchprocess/branch-findbugs-hbase-server-warnings.html |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/7283/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/7283/console |
| Powered by | Apache Yetus 0.3.0 http://yetus.apache.org |


This message was automatically generated.



> ServerCrashProcedure checks and waits for meta initialized, instead should 
> check and wait for meta loaded
> -
>
> Key: HBASE-18254
> URL: 

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058546#comment-16058546
 ] 

Hadoop QA commented on HBASE-17125:
---

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 18m 48s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 5 new or modified test files. |
| 0 | mvndep | 0m 11s | Maven dependency ordering for branch |
| +1 | mvninstall | 4m 6s | master passed |
| +1 | compile | 1m 2s | master passed |
| +1 | checkstyle | 0m 50s | master passed |
| +1 | mvneclipse | 0m 27s | master passed |
| -1 | findbugs | 0m 58s | hbase-client in master has 4 extant Findbugs warnings. |
| -1 | findbugs | 2m 59s | hbase-server in master has 10 extant Findbugs warnings. |
| +1 | javadoc | 0m 52s | master passed |
| 0 | mvndep | 0m 14s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 9s | the patch passed |
| +1 | compile | 1m 3s | the patch passed |
| +1 | javac | 1m 3s | the patch passed |
| +1 | checkstyle | 0m 49s | the patch passed |
| +1 | mvneclipse | 0m 26s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 32m 58s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. |
| +1 | findbugs | 5m 6s | the patch passed |
| -1 | javadoc | 0m 19s | hbase-client generated 5 new + 1 unchanged - 0 fixed = 6 total (was 1) |
| +1 | unit | 2m 46s | hbase-client in the patch passed. |
| -1 | unit | 22m 42s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 19s | The patch does not generate ASF License warnings. |
| | | 99m 12s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.regionserver.TestKeepDeletes |
|   | hadoop.hbase.regionserver.querymatcher.TestUserScanQueryMatcher |

|| Subsystem || Report/Notes ||
| Docker | Client=1.13.1 Server=1.13.1 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12873971/HBASE-17125.master.checkReturnedVersions.patch |
| JIRA Issue | HBASE-17125 |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 0c417fc24366 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 3489a1b |
| Default Java 

[jira] [Commented] (HBASE-18226) Disable reverse DNS lookup at HMaster and use the hostname provided by RegionServer

2017-06-21 Thread Duo Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058545#comment-16058545
 ] 

Duo Xu commented on HBASE-18226:


[~syuanjiang]

Do I need to submit a new patch for branch-1?

> Disable reverse DNS lookup at HMaster and use the hostname provided by 
> RegionServer
> ---
>
> Key: HBASE-18226
> URL: https://issues.apache.org/jira/browse/HBASE-18226
> Project: HBase
>  Issue Type: New Feature
>Reporter: Duo Xu
>Assignee: Duo Xu
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: HBASE-18226.001.patch, HBASE-18226.002.patch, 
> HBASE-18226.003.patch, HBASE-18226.004.patch, HBASE-18226.005.patch, 
> HBASE-18226.006.patch
>
>
> Description updated:
> In some unusual network environments, forward DNS lookup is supported while 
> reverse DNS lookup may not work properly.
> This JIRA makes HMaster use the hostname passed from the RS, instead of 
> doing a reverse DNS lookup, to tell the RS which hostname to use during 
> reportForDuty(). HBASE-12954 already implemented the mechanism by adding the 
> "useThisHostnameInstead" field to RegionServerStatusProtos.
> Currently "useThisHostnameInstead" is optional, and by default the RS passes 
> only its port, server start code and current server time to HMaster during 
> reportForDuty(). To use this field, users currently need to specify 
> "hbase.regionserver.hostname" in hbase-site.xml on every regionserver node. 
> This causes trouble in two cases:
> 1. Deployments managed by tools such as Ambari, which maintain the same 
> copy of hbase-site.xml across all the nodes.
> 2. HBASE-12954 targets multihomed hosts, where users want to manually 
> set the hostname value for each node. In the other cases (not multihomed), I 
> just want the RS to use the hostname returned by the node, set it in 
> useThisHostnameInstead, and pass it to HMaster during reportForDuty().
> I would like to introduce a setting which, when set to true, sets 
> "useThisHostnameInstead" to the hostname the RS gets from the node. 
> HMaster will then skip the reverse DNS lookup because it sees that the 
> "useThisHostnameInstead" field is set in the request.
> Is "hbase.regionserver.hostname.reported.to.master" a good name?
> 
> Regarding the hostname returned by the RS node, I read the source code again 
> (including hadoop-common dns.java). By default the RS gets its hostname by 
> calling InetAddress.getLocalHost().getCanonicalHostName(). If users specify 
> "hbase.regionserver.dns.interface" or "hbase.regionserver.dns.nameserver", or 
> if some underlying system configuration changes (e.g. modifying 
> /etc/nsswitch.conf), the lookup may consult DNS or other sources before 
> checking the /etc/hosts file.
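
The selection logic proposed in the description can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the attached patches. The class and method names are hypothetical. The precedence of an explicitly configured hostname over the new flag is an assumption, and so is the choice to return null when nothing should be reported. Only InetAddress.getLocalHost().getCanonicalHostName() is taken from the description.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, for illustration only.
final class ReportedHostnameSketch {

    /** Returns the node's canonical hostname, or null if it cannot be resolved. */
    static String localCanonicalHostname() {
        try {
            return InetAddress.getLocalHost().getCanonicalHostName();
        } catch (UnknownHostException e) {
            return null; // assumption: leave the field unset on failure
        }
    }

    /**
     * Picks the value for "useThisHostnameInstead".
     *
     * @param configuredHostname value of "hbase.regionserver.hostname", may be null or empty
     * @param reportLocalHostname the proposed boolean setting
     * @return the hostname to report, or null to leave the field unset so the
     *         master falls back to its own lookup
     */
    static String hostnameToReport(String configuredHostname, boolean reportLocalHostname) {
        if (configuredHostname != null && !configuredHostname.isEmpty()) {
            return configuredHostname; // assumption: explicit config wins
        }
        return reportLocalHostname ? localCanonicalHostname() : null;
    }
}
```

With this shape, a single shared hbase-site.xml only needs the boolean setting, since each node resolves its own name at startup.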





[jira] [Updated] (HBASE-18230) Generated LICENSE file includes unsubstituted Velocity variables

2017-06-21 Thread Mike Drob (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob updated HBASE-18230:
--
Attachment: HBASE-18230.addendum.patch

[~busbey] - I missed an instance here. Should I fix it as an addendum here, or 
do you want a new JIRA for it?

> Generated LICENSE file includes unsubstituted Velocity variables
> 
>
> Key: HBASE-18230
> URL: https://issues.apache.org/jira/browse/HBASE-18230
> Project: HBase
>  Issue Type: Bug
>  Components: build
>Affects Versions: 2.0.0-alpha-1
>Reporter: Mike Drob
>Assignee: Mike Drob
> Fix For: 3.0.0, 1.4.0, 1.3.2, 1.2.7, 2.0.0-alpha-2, 1.1.12
>
> Attachments: HBASE-18230.addendum.patch, HBASE-18230.patch
>
>
> From the release vote:
> {quote}
> we have a ton of places where we have velocity variables instead of
> copyright years, but IIRC that's a problem on branch-1 right now too.
> {quote}
> This is referring to lines like these:
> {noformat}
>   * javax.annotation-api, ${dep.licenses[0].comments}
>   * javax.servlet-api, ${dep.licenses[0].comments}
>   * jetty-schemas, ${dep.licenses[0].comments}
> {noformat}
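
A quick way to spot lines like the ones quoted above is to scan the generated text for leftover ${...} references. The sketch below is a hypothetical check, not part of HBASE-18230.patch. The class name and the regular expression are assumptions, and the expression only covers the ${...} form shown in the quote.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical checker, for illustration only.
final class UnsubstitutedVariableScan {
    // Matches a literal "${", then anything up to the next "}".
    private static final Pattern VELOCITY_REF = Pattern.compile("\\$\\{[^}]+\\}");

    /** Returns every ${...} reference still present in the rendered text. */
    static List<String> findUnsubstituted(String renderedText) {
        List<String> hits = new ArrayList<>();
        Matcher matcher = VELOCITY_REF.matcher(renderedText);
        while (matcher.find()) {
            hits.add(matcher.group());
        }
        return hits;
    }
}
```

Running findUnsubstituted over the rendered LICENSE content and failing when the list is non-empty would catch a regression of this kind early.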





[jira] [Commented] (HBASE-18226) Disable reverse DNS lookup at HMaster and use the hostname provided by RegionServer

2017-06-21 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058541#comment-16058541
 ] 

Stephen Yuan Jiang commented on HBASE-18226:


[~onpduo], would you mind porting this to branch-1 as well?

> Disable reverse DNS lookup at HMaster and use the hostname provided by 
> RegionServer
> ---
>
> Key: HBASE-18226
> URL: https://issues.apache.org/jira/browse/HBASE-18226
> Project: HBase
>  Issue Type: New Feature
>Reporter: Duo Xu
>Assignee: Duo Xu
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: HBASE-18226.001.patch, HBASE-18226.002.patch, 
> HBASE-18226.003.patch, HBASE-18226.004.patch, HBASE-18226.005.patch, 
> HBASE-18226.006.patch
>
>
> Description updated:
> In some unusual network environments, forward DNS lookup is supported while 
> reverse DNS lookup may not work properly.
> This JIRA makes HMaster use the hostname passed from the RS, instead of 
> doing a reverse DNS lookup, to tell the RS which hostname to use during 
> reportForDuty(). HBASE-12954 already implemented the mechanism by adding the 
> "useThisHostnameInstead" field to RegionServerStatusProtos.
> Currently "useThisHostnameInstead" is optional, and by default the RS passes 
> only its port, server start code and current server time to HMaster during 
> reportForDuty(). To use this field, users currently need to specify 
> "hbase.regionserver.hostname" in hbase-site.xml on every regionserver node. 
> This causes trouble in two cases:
> 1. Deployments managed by tools such as Ambari, which maintain the same 
> copy of hbase-site.xml across all the nodes.
> 2. HBASE-12954 targets multihomed hosts, where users want to manually 
> set the hostname value for each node. In the other cases (not multihomed), I 
> just want the RS to use the hostname returned by the node, set it in 
> useThisHostnameInstead, and pass it to HMaster during reportForDuty().
> I would like to introduce a setting which, when set to true, sets 
> "useThisHostnameInstead" to the hostname the RS gets from the node. 
> HMaster will then skip the reverse DNS lookup because it sees that the 
> "useThisHostnameInstead" field is set in the request.
> Is "hbase.regionserver.hostname.reported.to.master" a good name?
> 
> Regarding the hostname returned by the RS node, I read the source code again 
> (including hadoop-common dns.java). By default the RS gets its hostname by 
> calling InetAddress.getLocalHost().getCanonicalHostName(). If users specify 
> "hbase.regionserver.dns.interface" or "hbase.regionserver.dns.nameserver", or 
> if some underlying system configuration changes (e.g. modifying 
> /etc/nsswitch.conf), the lookup may consult DNS or other sources before 
> checking the /etc/hosts file.





[jira] [Created] (HBASE-18256) Option to parallelize restoreStoreFile in RestoreSnapshotHelper#cloneRegion

2017-06-21 Thread huaxiang sun (JIRA)
huaxiang sun created HBASE-18256:


 Summary: Option to parallelize restoreStoreFile in 
RestoreSnapshotHelper#cloneRegion
 Key: HBASE-18256
 URL: https://issues.apache.org/jira/browse/HBASE-18256
 Project: HBase
  Issue Type: Improvement
  Components: mob
Affects Versions: 2.0.0-alpha-1
Reporter: huaxiang sun
Assignee: huaxiang sun
Priority: Minor


In one MOB use case, we found that cloneSnapshot of a MOB-enabled table took a 
very long time. The reason is that the MOB region contains a large number of 
MOB files, so cloning it takes a long time. To speed up cloneSnapshot, we will 
provide an option to parallelize restoreStoreFile in 
RestoreSnapshotHelper#cloneRegion, so MOB can take advantage of it. For non-MOB 
tables, the current behavior is kept.
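
As a rough illustration of the idea, the sketch below fans a per-file restore step out over a fixed thread pool and keeps the sequential path when only one thread is requested. It is not the planned implementation. The class, method and parameter names are hypothetical, and the Consumer stands in for whatever restoreStoreFile actually does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical helper, for illustration only.
final class ParallelRestoreSketch {

    /**
     * Applies restoreOne to every store file and returns how many were restored.
     * With threads <= 1 the files are processed in order on the calling thread,
     * which mirrors the existing sequential behavior.
     */
    static int restoreAll(List<String> storeFiles, Consumer<String> restoreOne, int threads) {
        if (threads <= 1) {
            for (String file : storeFiles) {
                restoreOne.accept(file);
            }
            return storeFiles.size();
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String file : storeFiles) {
                pending.add(pool.submit(() -> restoreOne.accept(file)));
            }
            for (Future<?> task : pending) {
                task.get(); // wait for completion and surface any failure
            }
            return pending.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while restoring store files", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("restoring a store file failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Making the thread count a parameter lets non-MOB callers pass 1 and keep today's behavior, while MOB regions with many files can opt into a larger pool.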





[jira] [Commented] (HBASE-16351) do dependency license check via enforcer plugin

2017-06-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058509#comment-16058509
 ] 

Hudson commented on HBASE-16351:


FAILURE: Integrated in Jenkins build HBase-2.0 #85 (See 
[https://builds.apache.org/job/HBase-2.0/85/])
HBASE-16351 Improve error reporting for license failures (busbey: rev 
be559f1eb89ac22fac7303916f809d7451d8dae1)
* (edit) hbase-resource-bundle/src/main/resources/META-INF/LICENSE.vm
* (edit) hbase-resource-bundle/src/main/resources/META-INF/NOTICE.vm
* (edit) pom.xml


> do dependency license check via enforcer plugin
> ---
>
> Key: HBASE-16351
> URL: https://issues.apache.org/jira/browse/HBASE-16351
> Project: HBase
>  Issue Type: Improvement
>  Components: build, dependencies
>Reporter: Sean Busbey
>Assignee: Mike Drob
> Fix For: 3.0.0, 1.4.0, 1.3.2, 1.2.7, 2.0.0-alpha-2, 1.1.12
>
> Attachments: HBASE-16351.patch, HBASE-16351.v2.patch, 
> HBASE-16351.v3.patch
>
>
> As a stop-gap measure we've made our velocity template fail things when we 
> attempt to bundle a cat-x dependency (see HBASE-16318). Unfortunately, the 
> error messages in this case are non-obvious and digging to find the culprit 
> in a partially rendered LICENSE file leaves a lot to be desired.
> The maven enforcer plugin should allow us to fail more gracefully, with 
> output given on the maven console.





[jira] [Commented] (HBASE-15691) Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to branch-1

2017-06-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058499#comment-16058499
 ] 

Hudson commented on HBASE-15691:


FAILURE: Integrated in Jenkins build HBase-1.4 #784 (See 
[https://builds.apache.org/job/HBase-1.4/784/])
HBASE-15691 ConcurrentModificationException in BucketAllocator (syuanjiangdev: 
rev 105c5c36e622c435d0e952722ce12bddbf08c22f)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java


> Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to 
> branch-1
> -
>
> Key: HBASE-15691
> URL: https://issues.apache.org/jira/browse/HBASE-15691
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 1.3.0
>Reporter: Andrew Purtell
>Assignee: Stephen Yuan Jiang
> Fix For: 1.4.0, 1.3.2, 1.2.7, 1.1.12
>
> Attachments: HBASE-15691-branch-1.patch, 
> HBASE-15691.v2-branch-1.patch, HBASE-15691.v3-branch-1.patch
>
>
> HBASE-10205 solves the following problem:
> "
> The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the 
> RAM queue containing entries to be cached. freeSpace() in turn calls 
> BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), 
> which iterates over 'bucketList'. At the same time another WriterThread might 
> call BucketAllocator.allocateBlock(), which may call 
> BucketSizeInfo.allocateBlock(), add a bucket to 'bucketList' and consequently 
> cause a ConcurrentModificationException. Calls to 
> BucketAllocator.allocateBlock() are synchronized, but calls to 
> BucketAllocator.getIndexStatistics() are not, which allows this race to occur.
> "
> However, for some unknown reason, HBASE-10205 was committed only to the master 
> (2.0 and beyond) and 0.98 branches. To preserve continuity we should 
> commit it to branch-1.
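
The race quoted above can be illustrated with a small sketch. This is one common 
way to close such a race (guarding the iterating reader with the same lock as 
the writer) and is not necessarily what the HBASE-10205 patch does. The class 
BucketListSketch and its methods are hypothetical stand-ins, not HBase code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

class BucketListSketch {
  private final List<Integer> bucketList = new ArrayList<>();

  // Writer path: adds a bucket while holding the instance lock.
  synchronized void allocateBucket(int size) {
    bucketList.add(size);
  }

  // Reader path: iterates under the same lock, so a concurrent add cannot
  // invalidate the iterator. Without `synchronized` here, this loop could
  // throw ConcurrentModificationException.
  synchronized long totalSize() {
    long total = 0;
    for (int size : bucketList) {
      total += size;
    }
    return total;
  }

  // Runs one writer and one reader concurrently and returns the final total.
  static long runDemo() {
    BucketListSketch sketch = new BucketListSketch();
    AtomicReference<Throwable> failure = new AtomicReference<>();
    Thread writer = new Thread(() -> {
      for (int i = 0; i < 1000; i++) {
        sketch.allocateBucket(1);
      }
    });
    Thread reader = new Thread(() -> {
      try {
        for (int i = 0; i < 1000; i++) {
          sketch.totalSize();
        }
      } catch (Throwable t) {
        failure.set(t);
      }
    });
    writer.start();
    reader.start();
    try {
      writer.join();
      reader.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    if (failure.get() != null) {
      throw new IllegalStateException(failure.get());
    }
    return sketch.totalSize();
  }
}
```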





[jira] [Updated] (HBASE-18255) Time-Delayed HBase Performance Degradation with Java 7

2017-06-21 Thread Vladimir Rodionov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Rodionov updated HBASE-18255:
--
Fix Version/s: 1.1.12
   1.2.7
   1.3.2

> Time-Delayed HBase Performance Degradation with Java 7
> --
>
> Key: HBASE-18255
> URL: https://issues.apache.org/jira/browse/HBASE-18255
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1, 1.2.6, 1.1.11
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 1.3.2, 1.2.7, 1.1.12
>
> Attachments: HBASE-18255-branch-1.x.v1.patch
>
>
> A good summary of the issue and the proposed resolution can be found in this 
> article:
> https://community.hortonworks.com/articles/105802/time-delayed-hbase-performance-degradation-with-ja.html
> In short, due to an internal JVM 7 bug (addressed only in Java 8), the 
> HotSpot code cache can become full, after which ALL JIT compilations are 
> suspended indefinitely. The default code cache size in JVM 7 is quite low: 
> 48MB. It is recommended to increase this value to at least 256MB (default 
> in JVM 8).
> This bug affects only 1.x.





[jira] [Updated] (HBASE-18255) Time-Delayed HBase Performance Degradation with Java 7

2017-06-21 Thread Vladimir Rodionov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Rodionov updated HBASE-18255:
--
Affects Version/s: 1.1.12
   1.2.7
   1.3.2
   1.4.0

> Time-Delayed HBase Performance Degradation with Java 7
> --
>
> Key: HBASE-18255
> URL: https://issues.apache.org/jira/browse/HBASE-18255
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1, 1.2.6, 1.1.11
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Attachments: HBASE-18255-branch-1.x.v1.patch
>
>
> A good summary of the issue and the proposed resolution can be found in this 
> article:
> https://community.hortonworks.com/articles/105802/time-delayed-hbase-performance-degradation-with-ja.html
> In short, due to an internal JVM 7 bug (addressed only in Java 8), the 
> HotSpot code cache can become full, after which ALL JIT compilations are 
> suspended indefinitely. The default code cache size in JVM 7 is quite low: 
> 48MB. It is recommended to increase this value to at least 256MB (default 
> in JVM 8).
> This bug affects only 1.x.





[jira] [Updated] (HBASE-18255) Time-Delayed HBase Performance Degradation with Java 7

2017-06-21 Thread Vladimir Rodionov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Rodionov updated HBASE-18255:
--
Affects Version/s: (was: 1.1.12)
   (was: 1.2.7)
   (was: 1.3.2)
   (was: 1.4.0)
   1.3.1
   1.2.6
   1.1.11

> Time-Delayed HBase Performance Degradation with Java 7
> --
>
> Key: HBASE-18255
> URL: https://issues.apache.org/jira/browse/HBASE-18255
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1, 1.2.6, 1.1.11
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Attachments: HBASE-18255-branch-1.x.v1.patch
>
>
> A good summary of the issue and the proposed resolution can be found in this 
> article:
> https://community.hortonworks.com/articles/105802/time-delayed-hbase-performance-degradation-with-ja.html
> In short, due to an internal JVM 7 bug (addressed only in Java 8), the 
> HotSpot code cache can become full, after which ALL JIT compilations are 
> suspended indefinitely. The default code cache size in JVM 7 is quite low: 
> 48MB. It is recommended to increase this value to at least 256MB (default 
> in JVM 8).
> This bug affects only 1.x.





[jira] [Commented] (HBASE-16351) do dependency license check via enforcer plugin

2017-06-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058496#comment-16058496
 ] 

Hudson commented on HBASE-16351:


FAILURE: Integrated in Jenkins build HBase-1.3-JDK7 #186 (See 
[https://builds.apache.org/job/HBase-1.3-JDK7/186/])
HBASE-16351 Improve error reporting for license failures (busbey: rev 
5e144fc01ce65ea73a3c0027a8500a9da76d53f4)
* (edit) pom.xml
* (edit) hbase-resource-bundle/src/main/resources/META-INF/NOTICE.vm
* (edit) hbase-resource-bundle/src/main/resources/META-INF/LICENSE.vm


> do dependency license check via enforcer plugin
> ---
>
> Key: HBASE-16351
> URL: https://issues.apache.org/jira/browse/HBASE-16351
> Project: HBase
>  Issue Type: Improvement
>  Components: build, dependencies
>Reporter: Sean Busbey
>Assignee: Mike Drob
> Fix For: 3.0.0, 1.4.0, 1.3.2, 1.2.7, 2.0.0-alpha-2, 1.1.12
>
> Attachments: HBASE-16351.patch, HBASE-16351.v2.patch, 
> HBASE-16351.v3.patch
>
>
> As a stop-gap measure we've made our velocity template fail things when we 
> attempt to bundle a cat-x dependency (see HBASE-16318). Unfortunately, the 
> error messages in this case are non-obvious and digging to find the culprit 
> in a partially rendered LICENSE file leaves a lot to be desired.
> The maven enforcer plugin should allow us to fail more gracefully, with 
> output given on the maven console.





[jira] [Updated] (HBASE-18255) Time-Delayed HBase Performance Degradation with Java 7

2017-06-21 Thread Vladimir Rodionov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Rodionov updated HBASE-18255:
--
Attachment: HBASE-18255-branch-1.x.v1.patch

A simple patch is attached.

> Time-Delayed HBase Performance Degradation with Java 7
> --
>
> Key: HBASE-18255
> URL: https://issues.apache.org/jira/browse/HBASE-18255
> Project: HBase
>  Issue Type: Bug
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Attachments: HBASE-18255-branch-1.x.v1.patch
>
>
> A good summary of the issue and the proposed resolution can be found in this 
> article:
> https://community.hortonworks.com/articles/105802/time-delayed-hbase-performance-degradation-with-ja.html
> In short, due to an internal JVM 7 bug (addressed only in Java 8), the 
> HotSpot code cache can become full, after which ALL JIT compilations are 
> suspended indefinitely. The default code cache size in JVM 7 is quite low: 
> 48MB. It is recommended to increase this value to at least 256MB (default 
> in JVM 8).
> This bug affects only 1.x.





[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-21 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058476#comment-16058476
 ] 

Ted Yu commented on HBASE-17125:


Can you put the latest patch on Review Board?

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a column's max versions is 3, and we write 4 versions of this column. 
> The oldest version is not removed immediately, but from the user's view it 
> is gone. When a user queries with a filter and the filter skips a newer 
> version, the oldest version becomes visible again. After the region is 
> compacted, the oldest version is never seen again. This is confusing for 
> users: the query returns inconsistent results before and after region 
> compaction.
> The cause is the matchColumn method of UserScanQueryMatcher. It first checks 
> the cell against the filter, then checks the number of versions needed. So 
> if the filter skips a newer version, the oldest version is returned as long 
> as it has not yet been removed.
> After an offline discussion with [~Apache9] and [~fenghh], we have two 
> candidate solutions. The first idea is to check the number of versions 
> first, then check the cell against the filter. This matches the comment on 
> setFilter, which says the filter is called after all tests for ttl, column 
> match, deletes and max versions have been run.
> {code}
>   /**
>    * Apply the specified server-side filter when performing the Query.
>    * Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>    * for ttl, column match, deletes and max versions have been run.
>    * @param filter filter to run on the server
>    * @return this for invocation chaining
>    */
>   public Query setFilter(Filter filter) {
>     this.filter = filter;
>     return this;
>   }
> {code}
> But this idea has another problem. If a column's max versions is 5 and the 
> user's query needs only 3 versions, the version count is checked first and 
> the filter second, so the result may contain fewer than 3 cells even though 
> there are 2 more versions that are never read.
> So the second idea has three steps:
> 1. check against the max versions of this column
> 2. check the kv against the filter
> 3. check the number of versions the user needs.
> But this makes the ScanQueryMatcher more complicated, and it breaks the 
> javadoc of Query.setFilter.
> We don't have a final solution for this problem yet. Suggestions are welcome.





[jira] [Updated] (HBASE-18255) Time-Delayed HBase Performance Degradation with Java 7

2017-06-21 Thread Vladimir Rodionov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Rodionov updated HBASE-18255:
--
Description: 
A good summary of the issue and the proposed resolution can be found in this 
article:
https://community.hortonworks.com/articles/105802/time-delayed-hbase-performance-degradation-with-ja.html

In short, due to an internal JVM 7 bug (addressed only in Java 8), the HotSpot 
code cache can become full, after which ALL JIT compilations are suspended 
indefinitely. The default code cache size in JVM 7 is quite low: 48MB. It is 
recommended to increase this value to at least 256MB (default in JVM 8).

This bug affects only 1.x.

  was:
A good summary of the issue and the proposed resolution can be found in this 
article:
https://community.hortonworks.com/articles/105802/time-delayed-hbase-performance-degradation-with-ja.html

In short, due to an internal JVM 7 bug (addressed only in Java 8), the HotSpot 
code cache can become full, after which ALL JIT compilations are suspended 
indefinitely. The default code cache size in JVM 7 is quite low: 48MB. It is 
recommended to increase this value to at least 256MB (default in JVM 8).


> Time-Delayed HBase Performance Degradation with Java 7
> --
>
> Key: HBASE-18255
> URL: https://issues.apache.org/jira/browse/HBASE-18255
> Project: HBase
>  Issue Type: Bug
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>
> A good summary of the issue and the proposed resolution can be found in this 
> article:
> https://community.hortonworks.com/articles/105802/time-delayed-hbase-performance-degradation-with-ja.html
> In short, due to an internal JVM 7 bug (addressed only in Java 8), the 
> HotSpot code cache can become full, after which ALL JIT compilations are 
> suspended indefinitely. The default code cache size in JVM 7 is quite low: 
> 48MB. It is recommended to increase this value to at least 256MB (default 
> in JVM 8).
> This bug affects only 1.x.





[jira] [Created] (HBASE-18255) Time-Delayed HBase Performance Degradation with Java 7

2017-06-21 Thread Vladimir Rodionov (JIRA)
Vladimir Rodionov created HBASE-18255:
-

 Summary: Time-Delayed HBase Performance Degradation with Java 7
 Key: HBASE-18255
 URL: https://issues.apache.org/jira/browse/HBASE-18255
 Project: HBase
  Issue Type: Bug
Reporter: Vladimir Rodionov
Assignee: Vladimir Rodionov


A good summary of the issue and the proposed resolution can be found in this 
article:
https://community.hortonworks.com/articles/105802/time-delayed-hbase-performance-degradation-with-ja.html

In short, due to an internal JVM 7 bug (addressed only in Java 8), the HotSpot 
code cache can become full, after which ALL JIT compilations are suspended 
indefinitely. The default code cache size in JVM 7 is quite low: 48MB. It is 
recommended to increase this value to at least 256MB (default in JVM 8).
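
On HotSpot the code cache limit is set with the -XX:ReservedCodeCacheSize flag 
(for example -XX:ReservedCodeCacheSize=256m). As a rough way to check what a 
running JVM actually got, the sketch below sums the maximum size of the 
non-heap memory pools whose name contains "code". Matching on the pool name is 
an assumption: pool names are JVM-specific, and the class name CodeCacheCheck 
is made up for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

class CodeCacheCheck {
  /**
   * Sums the max size in bytes of non-heap pools that look like a JIT code
   * cache. Returns -1 if no such pool reports a defined limit.
   */
  static long codeCacheMaxBytes() {
    long total = 0;
    boolean found = false;
    for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
      String name = pool.getName().toLowerCase();
      if (pool.getType() != MemoryType.NON_HEAP || !name.contains("code")) {
        continue;
      }
      MemoryUsage usage = pool.getUsage();
      if (usage == null) {
        continue; // pool is no longer valid
      }
      long max = usage.getMax(); // -1 when the limit is undefined
      if (max > 0) {
        total += max;
        found = true;
      }
    }
    return found ? total : -1;
  }

  public static void main(String[] args) {
    long max = codeCacheMaxBytes();
    System.out.println(max < 0
        ? "code cache limit unknown"
        : "code cache max = " + (max >> 20) + " MB");
  }
}
```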





[jira] [Commented] (HBASE-16351) do dependency license check via enforcer plugin

2017-06-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058467#comment-16058467
 ] 

Hudson commented on HBASE-16351:


SUCCESS: Integrated in Jenkins build HBase-1.2-IT #888 (See 
[https://builds.apache.org/job/HBase-1.2-IT/888/])
HBASE-16351 Improve error reporting for license failures (busbey: rev 
64f41f1b50be9e75c25dd6078cc9a33850878aa9)
* (edit) pom.xml
* (edit) hbase-resource-bundle/src/main/resources/META-INF/NOTICE.vm
* (edit) hbase-resource-bundle/src/main/resources/META-INF/LICENSE.vm


> do dependency license check via enforcer plugin
> ---
>
> Key: HBASE-16351
> URL: https://issues.apache.org/jira/browse/HBASE-16351
> Project: HBase
>  Issue Type: Improvement
>  Components: build, dependencies
>Reporter: Sean Busbey
>Assignee: Mike Drob
> Fix For: 3.0.0, 1.4.0, 1.3.2, 1.2.7, 2.0.0-alpha-2, 1.1.12
>
> Attachments: HBASE-16351.patch, HBASE-16351.v2.patch, 
> HBASE-16351.v3.patch
>
>
> As a stop-gap measure we've made our velocity template fail things when we 
> attempt to bundle a cat-x dependency (see HBASE-16318). Unfortunately, the 
> error messages in this case are non-obvious and digging to find the culprit 
> in a partially rendered LICENSE file leaves a lot to be desired.
> The maven enforcer plugin should allow us to fail more gracefully, with 
> output given on the maven console.





[jira] [Updated] (HBASE-18041) Add pylintrc file to HBase

2017-06-21 Thread Alex Leblang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Leblang updated HBASE-18041:
-
Attachment: diff-patch-pylint.txt
diff-patch-pylint-noRC.txt

Here is a run with the pylintrc file; it reports 790 issues. I've also 
attached a run without the RC file; without a pylintrc file there are 
15600 issues.

> Add pylintrc file to HBase
> --
>
> Key: HBASE-18041
> URL: https://issues.apache.org/jira/browse/HBASE-18041
> Project: HBase
>  Issue Type: Improvement
>  Components: community
>Reporter: Alex Leblang
>Assignee: Alex Leblang
> Attachments: diff-patch-pylint-noRC.txt, diff-patch-pylint.txt, 
> HBASE-18041.branch-1.2.001.patch
>
>
> Yetus runs all commits with python files through a linter. I think that the 
> HBase community should add a pylintrc file to actively choose the project's 
> python style instead of just relying on yetus defaults.
> As an argument for this, the yetus project itself doesn't even use the 
> default python linter for its own commits.





[jira] [Commented] (HBASE-16351) do dependency license check via enforcer plugin

2017-06-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058441#comment-16058441
 ] 

Hudson commented on HBASE-16351:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #3237 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/3237/])
HBASE-16351 Improve error reporting for license failures (busbey: rev 
d8e0163facdb8003b71fb1173ca86340317c8fdc)
* (edit) hbase-resource-bundle/src/main/resources/META-INF/NOTICE.vm
* (edit) pom.xml
* (edit) hbase-resource-bundle/src/main/resources/META-INF/LICENSE.vm


> do dependency license check via enforcer plugin
> ---
>
> Key: HBASE-16351
> URL: https://issues.apache.org/jira/browse/HBASE-16351
> Project: HBase
>  Issue Type: Improvement
>  Components: build, dependencies
>Reporter: Sean Busbey
>Assignee: Mike Drob
> Fix For: 3.0.0, 1.4.0, 1.3.2, 1.2.7, 2.0.0-alpha-2, 1.1.12
>
> Attachments: HBASE-16351.patch, HBASE-16351.v2.patch, 
> HBASE-16351.v3.patch
>
>
> As a stop-gap measure we've made our velocity template fail things when we 
> attempt to bundle a cat-x dependency (see HBASE-16318). Unfortunately, the 
> error messages in this case are non-obvious and digging to find the culprit 
> in a partially rendered LICENSE file leaves a lot to be desired.
> The maven enforcer plugin should allow us to fail more gracefully, with 
> output given on the maven console.





[jira] [Updated] (HBASE-18245) Handle failed server in RpcClient

2017-06-21 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-18245:
---
Attachment: 18245.v2.txt

Patch v2 tries to hook up FailedServers.

ConnectionException hasn't been checked in yet - there is more work to be done.
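
For readers unfamiliar with the idea, here is a simplified sketch of 
failed-server tracking: an address is marked as failed for a fixed timeout, and 
a connection attempt can check the mark before dialing. It is written in Java 
for illustration even though this sub-task targets the C++ client. The class 
FailedServersSketch, its methods, and the explicit nowMs parameter are 
assumptions and do not mirror the real FailedServers class or the patch.

```java
import java.util.HashMap;
import java.util.Map;

class FailedServersSketch {
  // Address -> time (ms) at which the failure mark expires.
  private final Map<String, Long> expiryByAddress = new HashMap<>();
  private final long recheckTimeoutMs;

  FailedServersSketch(long recheckTimeoutMs) {
    this.recheckTimeoutMs = recheckTimeoutMs;
  }

  // Records that a connection to `address` failed at time `nowMs`.
  synchronized void markFailed(String address, long nowMs) {
    expiryByAddress.put(address, nowMs + recheckTimeoutMs);
  }

  // True while the failure mark is still fresh; expired marks are dropped.
  synchronized boolean isFailed(String address, long nowMs) {
    Long expiry = expiryByAddress.get(address);
    if (expiry == null) {
      return false;
    }
    if (nowMs >= expiry) {
      expiryByAddress.remove(address);
      return false;
    }
    return true;
  }
}
```

Passing the clock value in explicitly keeps the sketch deterministic; a real 
client would more likely read a monotonic clock internally.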

> Handle failed server in RpcClient
> -
>
> Key: HBASE-18245
> URL: https://issues.apache.org/jira/browse/HBASE-18245
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ted Yu
> Attachments: 18245.v1.txt, 18245.v2.txt
>
>
> This task is to add support for failed server in RpcClient::GetConnection().
> FailedServers Java class would be ported to C++.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-21 Thread Guanghao Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058430#comment-16058430
 ] 

Guanghao Zhang commented on HBASE-17125:


bq. If the filter decide to skip a version, then reduce the returned count in 
ColumnTracker.
This approach is too tricky and bug-prone, so I uploaded a new patch 
(checkReturnedVersions.patch) that uses the second idea from the description.
It matches a column in three steps:
1. check the column family's max versions.
2. check by filter.
3. check the returned versions (this can be set by the user).

About the setFilter() javadoc: it says "called AFTER all tests
for ttl, column match, deletes and max versions have been run." After talking 
with [~yangzhe1991] and [~Apache9], we think "max versions" is easy to 
misunderstand, because the column family has a max versions config and the 
user can also set a max versions on the scan. So in the new patch I updated 
the javadoc of setFilter(). The new javadoc is "called AFTER all tests for 
ttl, column match, deletes and column family's max versions have been run". I 
also added a new method setVersions() for scan, which specifies how many 
versions are returned to the user, and marked setMaxVersions() as @deprecated. 
Thanks.
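
The three-step order described above can be sketched as follows. This is a toy 
model over the timestamps of a single column, not the ScanQueryMatcher code; 
the class VersionMatchSketch and the match method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

class VersionMatchSketch {
  /**
   * Returns the versions a scan would see for one column.
   * timestampsNewestFirst: all stored versions, newest first.
   * familyMaxVersions: the column family's max versions setting.
   * filter: returns true for versions the filter includes.
   * requestedVersions: how many versions the user asked for.
   */
  static List<Long> match(List<Long> timestampsNewestFirst, int familyMaxVersions,
      LongPredicate filter, int requestedVersions) {
    List<Long> result = new ArrayList<>();
    int seen = 0;
    for (long ts : timestampsNewestFirst) {
      // Step 1: versions beyond the family's max are logically gone.
      if (++seen > familyMaxVersions) {
        break;
      }
      // Step 2: apply the filter to versions that are still visible.
      if (!filter.test(ts)) {
        continue;
      }
      // Step 3: stop once the user has the number of versions requested.
      result.add(ts);
      if (result.size() >= requestedVersions) {
        break;
      }
    }
    return result;
  }
}
```

With versions 4, 3, 2, 1, a family max of 3, a filter that skips version 4, 
and 3 requested versions, this returns versions 3 and 2; version 1 stays 
hidden whether or not a compaction has run. With versions 5 to 1, a family max 
of 5, a filter that skips version 5, and 3 requested versions, it returns 4, 3 
and 2, so the user still gets three cells.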

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a column's max versions is 3, and we write 4 versions of this column. 
> The oldest version is not removed immediately, but from the user's view it 
> is gone. When a user queries with a filter and the filter skips a newer 
> version, the oldest version becomes visible again. After the region is 
> compacted, the oldest version is never seen again. This is confusing for 
> users: the query returns inconsistent results before and after region 
> compaction.
> The cause is the matchColumn method of UserScanQueryMatcher. It first checks 
> the cell against the filter, then checks the number of versions needed. So 
> if the filter skips a newer version, the oldest version is returned as long 
> as it has not yet been removed.
> After an offline discussion with [~Apache9] and [~fenghh], we have two 
> candidate solutions. The first idea is to check the number of versions 
> first, then check the cell against the filter. This matches the comment on 
> setFilter, which says the filter is called after all tests for ttl, column 
> match, deletes and max versions have been run.
> {code}
>   /**
>    * Apply the specified server-side filter when performing the Query.
>    * Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>    * for ttl, column match, deletes and max versions have been run.
>    * @param filter filter to run on the server
>    * @return this for invocation chaining
>    */
>   public Query setFilter(Filter filter) {
>     this.filter = filter;
>     return this;
>   }
> {code}
> But this idea has another problem. If a column's max versions is 5 and the 
> user's query needs only 3 versions, the version count is checked first and 
> the filter second, so the result may contain fewer than 3 cells even though 
> there are 2 more versions that are never read.
> So the second idea has three steps:
> 1. check against the max versions of this column
> 2. check the kv against the filter
> 3. check the number of versions the user needs.
> But this makes the ScanQueryMatcher more complicated, and it breaks the 
> javadoc of Query.setFilter.
> We don't have a final solution for this problem yet. Suggestions are welcome.





[jira] [Updated] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-21 Thread Guanghao Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-17125:
---
Attachment: HBASE-17125.master.checkReturnedVersions.patch

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a column's max versions is 3, and we write 4 versions of this column. 
> The oldest version is not removed immediately, but from the user's view it 
> is gone. When a user queries with a filter and the filter skips a newer 
> version, the oldest version becomes visible again. After the region is 
> compacted, the oldest version is never seen again. This is confusing for 
> users: the query returns inconsistent results before and after region 
> compaction.
> The cause is the matchColumn method of UserScanQueryMatcher. It first checks 
> the cell against the filter, then checks the number of versions needed. So 
> if the filter skips a newer version, the oldest version is returned as long 
> as it has not yet been removed.
> After an offline discussion with [~Apache9] and [~fenghh], we have two 
> candidate solutions. The first idea is to check the number of versions 
> first, then check the cell against the filter. This matches the comment on 
> setFilter, which says the filter is called after all tests for ttl, column 
> match, deletes and max versions have been run.
> {code}
>   /**
>    * Apply the specified server-side filter when performing the Query.
>    * Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>    * for ttl, column match, deletes and max versions have been run.
>    * @param filter filter to run on the server
>    * @return this for invocation chaining
>    */
>   public Query setFilter(Filter filter) {
>     this.filter = filter;
>     return this;
>   }
> {code}
> But this idea has another problem. If a column's max versions is 5 and the 
> user's query needs only 3 versions, the version count is checked first and 
> the filter second, so the result may contain fewer than 3 cells even though 
> there are 2 more versions that are never read.
> So the second idea has three steps:
> 1. check against the max versions of this column
> 2. check the kv against the filter
> 3. check the number of versions the user needs.
> But this makes the ScanQueryMatcher more complicated, and it breaks the 
> javadoc of Query.setFilter.
> We don't have a final solution for this problem yet. Suggestions are welcome.





[jira] [Updated] (HBASE-15691) Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to branch-1

2017-06-21 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-15691:
---
Fix Version/s: (was: 1.5.0)
   (was: 1.4.1)
   1.1.12
   1.4.0

> Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to 
> branch-1
> -
>
> Key: HBASE-15691
> URL: https://issues.apache.org/jira/browse/HBASE-15691
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 1.3.0
>Reporter: Andrew Purtell
>Assignee: Stephen Yuan Jiang
> Fix For: 1.4.0, 1.3.2, 1.2.7, 1.1.12
>
> Attachments: HBASE-15691-branch-1.patch, 
> HBASE-15691.v2-branch-1.patch, HBASE-15691.v3-branch-1.patch
>
>
> HBASE-10205 solves the following problem:
> "
> The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the 
> RAM queue containing entries to be cached. freeSpace() in turn calls 
> BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), 
> which iterates over 'bucketList'. At the same time another WriterThread might 
> call BucketAllocator.allocateBlock(), which may call 
> BucketSizeInfo.allocateBlock(), add a bucket to 'bucketList' and consequently 
> cause a ConcurrentModificationException. Calls to 
> BucketAllocator.allocateBlock() are synchronized, but calls to 
> BucketAllocator.getIndexStatistics() are not, which allows this race to occur.
> "
> However, for some unknown reason, HBASE-10205 was committed only to the master 
> (2.0 and beyond) and 0.98 branches. To preserve continuity we should 
> commit it to branch-1.





[jira] [Updated] (HBASE-15691) Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to branch-1

2017-06-21 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-15691:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to 
> branch-1
> -
>
> Key: HBASE-15691
> URL: https://issues.apache.org/jira/browse/HBASE-15691
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 1.3.0
> Reporter: Andrew Purtell
> Assignee: Stephen Yuan Jiang
> Fix For: 1.4.0, 1.3.2, 1.2.7, 1.1.12
>
> Attachments: HBASE-15691-branch-1.patch, 
> HBASE-15691.v2-branch-1.patch, HBASE-15691.v3-branch-1.patch
>
>
> HBASE-10205 solves the following problem:
> "
> The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the 
> RAM queue containing entries to be cached. freeSpace() in turn calls 
> BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), 
> which iterates over 'bucketList'. At the same time another WriterThread might 
> call BucketAllocator.allocateBlock(), which may call 
> BucketSizeInfo.allocateBlock(), add a bucket to 'bucketList' and consequently 
> cause a ConcurrentModificationException. Calls to 
> BucketAllocator.allocateBlock() are synchronized, but calls to 
> BucketAllocator.getIndexStatistics() are not, which allows this race to occur.
> "
> However, for some unknown reason, HBASE-10205 was committed only to the master 
> (2.0 and beyond) and 0.98 branches. To preserve continuity we should 
> commit it to branch-1.





[jira] [Commented] (HBASE-16351) do dependency license check via enforcer plugin

2017-06-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058354#comment-16058354
 ] 

Hudson commented on HBASE-16351:


SUCCESS: Integrated in Jenkins build HBase-1.3-IT #68 (See 
[https://builds.apache.org/job/HBase-1.3-IT/68/])
HBASE-16351 Improve error reporting for license failures (busbey: rev 
5e144fc01ce65ea73a3c0027a8500a9da76d53f4)
* (edit) hbase-resource-bundle/src/main/resources/META-INF/LICENSE.vm
* (edit) pom.xml
* (edit) hbase-resource-bundle/src/main/resources/META-INF/NOTICE.vm


> do dependency license check via enforcer plugin
> ---
>
> Key: HBASE-16351
> URL: https://issues.apache.org/jira/browse/HBASE-16351
> Project: HBase
> Issue Type: Improvement
> Components: build, dependencies
> Reporter: Sean Busbey
> Assignee: Mike Drob
> Fix For: 3.0.0, 1.4.0, 1.3.2, 1.2.7, 2.0.0-alpha-2, 1.1.12
>
> Attachments: HBASE-16351.patch, HBASE-16351.v2.patch, 
> HBASE-16351.v3.patch
>
>
> As a stop-gap measure we've made our velocity template fail things when we 
> attempt to bundle a cat-x dependency (see HBASE-16318). Unfortunately, the 
> error messages in this case are non-obvious and digging to find the culprit 
> in a partially rendered LICENSE file leaves a lot to be desired.
> The maven enforcer plugin should allow us to fail more gracefully, with 
> output given on the maven console.




