Re: NOTICE: Creating hbase-2 branch tomorrow evening, Tuesday 6th

2017-06-05 Thread Stack
Thanks Duo. Let me review... Let's get it in.
St.Ack

On Mon, Jun 5, 2017 at 7:04 PM, 张铎(Duo Zhang)  wrote:

> HBASE-18038 please; the patch has been ready for several weeks, and it is an
> interface change.
>
> 2017-06-06 3:57 GMT+08:00 Stack :
>
> > On Mon, Jun 5, 2017 at 11:47 AM, Francis Liu  wrote:
> >
> > > Hi Stack,
> > > Is it possible to get any potential interface changes for split meta
> > > before we branch?
> > >
> >
> >
> > If they are ready now, sure. Otherwise, let them come in after we branch.
> > Branch is closed to features. Place-holders that will ease roll-in of a
> > facility down the road will be allowed (especially when requested by a
> > long-time user, of course).
> >
> > St.Ack
> >
> >
> >
> > > Thanks, Francis
> > >
> > >
> > >
> > > On Monday, June 5, 2017 8:52 AM, Stack  wrote:
> > >
> > >
> > > I was going to cut branch-2 tomorrow unless there are objections.
> > >
> > > Intent is to push out our first hbase-2.0.0-alpha soon after, hopefully by
> > > the end of this week.
> > >
> > > Please see the tail of the thread '[DISCUSS] No regions on Master node in
> > > 2.0' for our intent regarding cluster layout.
> > >
> > > Thanks,
> > > St.Ack
> > >
> > >
> > >
> > >
> >
>


Re: NOTICE: Creating hbase-2 branch tomorrow evening, Tuesday 6th

2017-06-05 Thread Anoop John
Thanks for the reminder about HBASE-18038, Duo. Yes, better we get that in. Will
look at the patch today.

-Anoop-

On Tue, Jun 6, 2017 at 7:34 AM, 张铎(Duo Zhang)  wrote:
> HBASE-18038 please; the patch has been ready for several weeks, and it is an
> interface change.
>
> 2017-06-06 3:57 GMT+08:00 Stack :
>
>> On Mon, Jun 5, 2017 at 11:47 AM, Francis Liu  wrote:
>>
>> > Hi Stack,
>> > Is it possible to get any potential interface changes for split meta
>> > before we branch?
>> >
>>
>>
>> If they are ready now, sure. Otherwise, let them come in after we branch.
>> Branch is closed to features. Place-holders that will ease roll-in of a
>> facility down the road will be allowed (especially when requested by a
>> long-time user, of course).
>>
>> St.Ack
>>
>>
>>
>> > Thanks, Francis
>> >
>> >
>> >
>> > On Monday, June 5, 2017 8:52 AM, Stack  wrote:
>> >
>> >
>> > I was going to cut branch-2 tomorrow unless there are objections.
>> >
>> > Intent is to push out our first hbase-2.0.0-alpha soon after, hopefully by
>> > the end of this week.
>> >
>> > Please see the tail of the thread '[DISCUSS] No regions on Master node in
>> > 2.0' for our intent regarding cluster layout.
>> >
>> > Thanks,
>> > St.Ack
>> >
>> >
>> >
>> >
>>


Re: NOTICE: Creating hbase-2 branch tomorrow evening, Tuesday 6th

2017-06-05 Thread Duo Zhang
HBASE-18038 please; the patch has been ready for several weeks, and it is an
interface change.

2017-06-06 3:57 GMT+08:00 Stack :

> On Mon, Jun 5, 2017 at 11:47 AM, Francis Liu  wrote:
>
> > Hi Stack,
> > Is it possible to get any potential interface changes for split meta
> > before we branch?
> >
>
>
> If they are ready now, sure. Otherwise, let them come in after we branch.
> Branch is closed to features. Place-holders that will ease roll-in of a
> facility down the road will be allowed (especially when requested by a
> long-time user, of course).
>
> St.Ack
>
>
>
> > Thanks, Francis
> >
> >
> >
> > On Monday, June 5, 2017 8:52 AM, Stack  wrote:
> >
> >
> > I was going to cut branch-2 tomorrow unless there are objections.
> >
> > Intent is to push out our first hbase-2.0.0-alpha soon after, hopefully by
> > the end of this week.
> >
> > Please see the tail of the thread '[DISCUSS] No regions on Master node in
> > 2.0' for our intent regarding cluster layout.
> >
> > Thanks,
> > St.Ack
> >
> >
> >
> >
>


[jira] [Created] (HBASE-18166) [AMv2] We are splitting already-split files

2017-06-05 Thread stack (JIRA)
stack created HBASE-18166:
-

 Summary: [AMv2] We are splitting already-split files
 Key: HBASE-18166
 URL: https://issues.apache.org/jira/browse/HBASE-18166
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Affects Versions: 2.0.0
Reporter: stack
Assignee: stack
 Fix For: 2.0.0


Interesting issue. The commit below adds a lag in cleaning up files after a 
compaction in case of ongoing Scanners (for read replicas/offheap).

HBASE-14970 Backport HBASE-13082 and its sub-jira to branch-1 - recommit (Ram)

What the lag means is that, now that split is run from the HMaster in the master 
branch, when the Master gets a listing of the files to split, it can pick up 
files that are due for archiving but have not been archived yet. When it 
does, it goes ahead and splits them, making references of references.

It's a mess.

A while back I added a call that asks the Region whether it is splittable. The 
Master calls this from SplitTableRegionProcedure during preparation. If the 
RegionServer asked for the split, this is somewhat redundant, given the RS 
already checks whether it still has any references; if it does, it waits before 
asking for a split. But if a user/client asks, then this isSplittable over RPC 
comes in handy.

I was thinking that isSplittable could return the list of files.

Or, easier: given we know a region is splittable by the time we go to split the 
files, I think master-side we can just skip any references found, presuming 
they are ready-for-archive.

Will be back with a patch. Want to test on a cluster first. (The side effect is 
that regions go offline because the file at the end of the reference-to-a-reference 
is removed, and so the open fails.)
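The "skip any references master-side" option above could look roughly like the following. This is a minimal sketch under stated assumptions, not the eventual patch: it assumes a reference file can be recognized by its name, `<hfileName>.<parentEncodedRegionName>`, and the regex is a simplification rather than HBase's own StoreFileInfo pattern. `SplitFileSelector`, `looksLikeReference` and `filesToSplit` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SplitFileSelector {
  // Assumption: a reference is named "<hfileName>.<parentEncodedRegionName>".
  // Simplified pattern for illustration; not HBase's own StoreFileInfo regex.
  private static final Pattern REFERENCE_NAME =
      Pattern.compile("^[0-9a-f]+(?:_SeqId_[0-9]+_)?\\..+$");

  static boolean looksLikeReference(String fileName) {
    return REFERENCE_NAME.matcher(fileName).matches();
  }

  /** Keeps plain HFiles; skips references, which are presumed ready-for-archive. */
  static List<String> filesToSplit(List<String> storeFileNames) {
    List<String> toSplit = new ArrayList<>();
    for (String name : storeFileNames) {
      if (looksLikeReference(name)) {
        // Splitting this file would create a reference to a reference.
        continue;
      }
      toSplit.add(name);
    }
    return toSplit;
  }

  public static void main(String[] args) {
    List<String> listed = List.of(
        "3f2a9c0d1e",                  // plain HFile
        "a1b2c3d4e5.0123456789abcdef", // reference left over from an earlier split
        "77aa88bb99");                 // plain HFile
    System.out.println(filesToSplit(listed)); // [3f2a9c0d1e, 77aa88bb99]
  }
}
```

Filtering by name keeps the check cheap on the Master side; the trade-off is that it relies on the presumption from the issue, namely that isSplittable has already confirmed the region no longer needs those references.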





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (HBASE-18126) Increment class

2017-06-05 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-18126.

  Resolution: Fixed
Hadoop Flags: Reviewed

> Increment class
> ---
>
> Key: HBASE-18126
> URL: https://issues.apache.org/jira/browse/HBASE-18126
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 18126.v6.txt, 18126.v7.txt, 18126.v8.txt
>
>
> These Increment objects are used by the Table implementation to perform 
> increment operations.





[jira] [Created] (HBASE-18165) Predicate based deletion during major compactions

2017-06-05 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-18165:
-

 Summary: Predicate based deletion during major compactions
 Key: HBASE-18165
 URL: https://issues.apache.org/jira/browse/HBASE-18165
 Project: HBase
  Issue Type: Brainstorming
Reporter: Lars Hofhansl


In many cases it is expensive to place a delete per version, column, or family.
HBase should have a way to specify a predicate and remove all Cells matching the 
predicate during the next compactions (major and minor).

Nothing more concrete. The tricky part would be to know when it is safe to 
remove the predicate, i.e. when we can be sure that all Cells matching the 
predicate actually have been removed.

Could potentially use HBASE-12859 for that.
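To make the idea concrete, here is a hypothetical sketch (none of these types are HBase APIs). It assumes a predicate, like a delete marker, only covers cells with timestamps at or before its registration time, and that timestamps are server-assigned with no bulk loads. Under those assumptions it also gives one conservative answer to the "when is it safe to remove the predicate" question; it is not a proposal for how HBase should track this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class PredicateCompaction {
  record Cell(String row, String qualifier, long ts) {}

  /** A registered delete predicate covering cells written at or before registeredAt. */
  record DeletePredicate(long registeredAt, Predicate<Cell> matches) {}

  /** Rewrites cells as a compaction would, dropping every cell a predicate covers. */
  static List<Cell> compact(List<Cell> input, List<DeletePredicate> predicates) {
    List<Cell> output = new ArrayList<>();
    for (Cell cell : input) {
      boolean deleted = false;
      for (DeletePredicate p : predicates) {
        if (cell.ts() <= p.registeredAt() && p.matches().test(cell)) {
          deleted = true;
          break;
        }
      }
      if (!deleted) {
        output.add(cell);
      }
    }
    return output;
  }

  /**
   * Conservative retirement check (assumes server-assigned timestamps, no bulk loads):
   * a flush that started after registration moved every covered memstore cell into a
   * store file, and a major compaction that started after that flush rewrote all of
   * those files through compact(), so no covered cell can remain.
   */
  static boolean safeToRemove(DeletePredicate p, long flushStartedAt, long majorCompactionStartedAt) {
    return flushStartedAt > p.registeredAt() && majorCompactionStartedAt >= flushStartedAt;
  }

  public static void main(String[] args) {
    DeletePredicate dropRow1 = new DeletePredicate(30, c -> c.row().equals("r1"));
    List<Cell> cells = List.of(new Cell("r1", "q", 10), new Cell("r2", "q", 20), new Cell("r1", "q", 40));
    System.out.println(compact(cells, List.of(dropRow1))); // r1@10 is dropped; r1@40 was written later
    System.out.println(safeToRemove(dropRow1, 35, 40));
  }
}
```

The timestamp cutoff mirrors how a delete marker masks only older cells; dropping it would make the predicate also delete matching data written after it was registered.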





Re: NOTICE: Creating hbase-2 branch tomorrow evening, Tuesday 6th

2017-06-05 Thread Stack
On Mon, Jun 5, 2017 at 11:47 AM, Francis Liu  wrote:

> Hi Stack,
> Is it possible to get any potential interface changes for split meta
> before we branch?
>


If they are ready now, sure. Otherwise, let them come in after we branch.
Branch is closed to features. Place-holders that will ease roll-in of a
facility down the road will be allowed (especially when requested by a
long-time user, of course).

St.Ack



> Thanks, Francis
>
>
>
> On Monday, June 5, 2017 8:52 AM, Stack  wrote:
>
>
> I was going to cut branch-2 tomorrow unless there are objections.
>
> Intent is to push out our first hbase-2.0.0-alpha soon after, hopefully by
> the end of this week.
>
> Please see the tail of the thread '[DISCUSS] No regions on Master node in
> 2.0' for our intent regarding cluster layout.
>
> Thanks,
> St.Ack
>
>
>
>


[jira] [Created] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-05 Thread Kahlil Oppenheimer (JIRA)
Kahlil Oppenheimer created HBASE-18164:
--

 Summary: Much faster locality cost function and candidate generator
 Key: HBASE-18164
 URL: https://issues.apache.org/jira/browse/HBASE-18164
 Project: HBase
  Issue Type: Improvement
  Components: Balancer
Reporter: Kahlil Oppenheimer
Assignee: Kahlil Oppenheimer
Priority: Minor


We noticed that the stochastic load balancer was not scaling well with 
cluster size. That is to say, on our smaller clusters (~17 tables, ~12 
region servers, ~5k regions), the balancer considers ~100,000 cluster 
configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
clusters (~82 tables, ~160 region servers, ~13k regions).

As a result, our bigger clusters are not able to converge on balance as 
quickly for things like table skew and region load, because the balancer does 
not have enough time to "think".

We have re-written the locality cost function to be incremental, meaning it 
only recomputes cost based on the most recent region move proposed by the 
balancer, rather than recomputing the cost across all regions/servers every 
iteration.

Further, we also cache the locality of every region on every server at the 
beginning of the balancer's execution for both the LocalityBasedCostFunction 
and the LocalityCandidateGenerator to reference. This way, they need not 
collect all HDFS blocks of every region at each iteration of the balancer.
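The incremental approach described above can be sketched as follows. This is a hypothetical illustration, not the patch: it assumes the cost is one minus the average locality of each region on its hosting server (HBase's LocalityBasedCostFunction may normalize differently), and `locality[region][server]` stands in for the cache built at the start of the balancer run.

```java
public class IncrementalLocalityCost {
  private final double[][] locality; // locality[region][server] in [0,1], cached once per run
  private final int[] serverOf;      // current proposed assignment: region -> server
  private double totalLocality;      // running sum of locality[r][serverOf[r]]

  IncrementalLocalityCost(double[][] locality, int[] assignment) {
    this.locality = locality;
    this.serverOf = assignment.clone();
    this.totalLocality = recomputeTotal();
  }

  /** Full O(regions) pass: what a non-incremental cost function repeats every iteration. */
  double recomputeTotal() {
    double sum = 0;
    for (int r = 0; r < serverOf.length; r++) {
      sum += locality[r][serverOf[r]];
    }
    return sum;
  }

  /** O(1) update when the balancer proposes moving a single region. */
  void moveRegion(int region, int toServer) {
    totalLocality += locality[region][toServer] - locality[region][serverOf[region]];
    serverOf[region] = toServer;
  }

  /** 0 when every region is fully local to its server, 1 when none of its blocks are. */
  double cost() {
    return serverOf.length == 0 ? 0.0 : 1.0 - totalLocality / serverOf.length;
  }

  double totalLocality() {
    return totalLocality;
  }

  public static void main(String[] args) {
    double[][] locality = {
        {1.0, 0.0},
        {0.5, 0.5},
        {0.2, 0.8},
    };
    IncrementalLocalityCost c = new IncrementalLocalityCost(locality, new int[] {0, 0, 0});
    System.out.printf("before: %.3f%n", c.cost());
    c.moveRegion(2, 1); // move region 2 to the server holding 80% of its blocks
    System.out.printf("after:  %.3f%n", c.cost());
  }
}
```

Running it prints `before: 0.433` and `after:  0.233`. Each proposed move now costs a constant amount of work instead of a pass over all regions, which is where the extra configurations per balancer run come from.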

The changes have been running in all 6 of our production clusters and all 4 QA 
clusters without issue. The speed improvements we noticed are massive. Our big 
clusters now consider 20x more cluster configurations.

We are currently preparing a patch for submission.





[jira] [Created] (HBASE-18163) rubocop and ruby-lint are not available

2017-06-05 Thread Mike Drob (JIRA)
Mike Drob created HBASE-18163:
-

 Summary: rubocop and ruby-lint are not available
 Key: HBASE-18163
 URL: https://issues.apache.org/jira/browse/HBASE-18163
 Project: HBase
  Issue Type: Bug
  Components: build
Reporter: Mike Drob


From the Yetus output:
{noformat}
executable for 'rubocop' was not specified.
executable for 'Ruby-lint' was not specified.
{noformat}

which results in the following during the build:
| {color:blue}0{color} | {color:blue} rubocop {color} | {color:blue} 0m 11s {color} | {color:blue} rubocop was not available. {color} |
| {color:blue}0{color} | {color:blue} ruby-lint {color} | {color:blue} 0m 11s {color} | {color:blue} Ruby-lint was not available. {color} |

We should make those available.





[jira] [Created] (HBASE-18162) Backport 'Update jruby to a newer version' to branch-1

2017-06-05 Thread Sean Busbey (JIRA)
Sean Busbey created HBASE-18162:
---

 Summary: Backport 'Update jruby to a newer version' to branch-1
 Key: HBASE-18162
 URL: https://issues.apache.org/jira/browse/HBASE-18162
 Project: HBase
  Issue Type: Task
  Components: dependencies, shell
Affects Versions: 1.4.0
Reporter: Sean Busbey
 Fix For: 1.4.0


Work on HBASE-16196 ran into shell failures while attempting a branch-1 
backport.

Start from either the attempted backport or the master branch version. Work out 
what's different in branch-1 that causes TestShell to fail.





NOTICE: Creating hbase-2 branch tomorrow evening, Tuesday 6th

2017-06-05 Thread Stack
I was going to cut branch-2 tomorrow unless there are objections.

Intent is to push out our first hbase-2.0.0-alpha soon after, hopefully by
the end of this week.

Please see the tail of the thread '[DISCUSS] No regions on Master node in
2.0' for our intent regarding cluster layout.

Thanks,
St.Ack


[jira] [Created] (HBASE-18161) MultiHFileOutputFormat - comprehensive incremental load support

2017-06-05 Thread Densel Santhmayor (JIRA)
Densel Santhmayor created HBASE-18161:
-

 Summary: MultiHFileOutputFormat - comprehensive incremental load support
 Key: HBASE-18161
 URL: https://issues.apache.org/jira/browse/HBASE-18161
 Project: HBase
  Issue Type: New Feature
Reporter: Densel Santhmayor
Priority: Minor


h2. Introduction

MapReduce currently supports the ability to write HBase records in bulk to 
HFiles for a single table. The file(s) can then be uploaded to the relevant 
RegionServers with reasonable latency. This feature is useful for making a 
large set of data available for queries at the same time, and it also 
provides a way to efficiently process very large input into HBase without 
affecting query latencies.

There is, however, no support for writing variations of the same record key to 
HFiles belonging to multiple HBase tables from within the same MapReduce job.

h2. Goal

The goal of this JIRA is to extend HFileOutputFormat2 to support writing to 
HFiles for different tables within the same MapReduce job while keeping 
single-table HFile features backwards-compatible.

For our use case, we needed to write a record key to a smaller HBase table for 
quicker access, and the same record key with a date appended to a larger table 
for longer term storage with chronological access. Each of these tables would 
have different TTL and other settings to support their respective access 
patterns. We also needed to be able to bulk write records to multiple tables 
with different subsets of very large input as efficiently as possible. Rather 
than run the MapReduce job multiple times (one for each table or record 
structure), it would be useful to be able to parse the input a single time and 
write to multiple tables simultaneously.

Additionally, we'd like to maintain backwards compatibility with the existing 
heavily-used HFileOutputFormat2 interface, so that benefits such as locality 
sensitivity (which was introduced long after we implemented support for multiple 
tables) apply to both single-table and multi-table HFile writes.

h2. Proposal
* Backwards compatibility for existing single-table support in 
HFileOutputFormat2 will be maintained; in this case, mappers will need to 
emit the table rowkey as before. However, a new class, MultiHFileOutputFormat, 
will provide a helper function for mappers that generates a rowkey by prefixing 
the desired tablename to the existing rowkey, and will also provide 
configureIncrementalLoad support for multiple tables.
* HFileOutputFormat2 will be updated in the following way:
** configureIncrementalLoad will now accept multiple table descriptor and 
region locator pairs, analogous to the single pair currently accepted by 
HFileOutputFormat2. 
** Compression, Block Size, Bloom Type and Datablock settings PER column family 
that are set in the Configuration object are now indexed and retrieved by 
tablename AND column family
** getRegionStartKeys will now support multiple regionlocators and calculate 
split points and therefore partitions collectively for all tables. Similarly, 
now the eventual number of Reducers will be equal to the total number of 
partitions across all tables. 
** The RecordWriter class will be able to process rowkeys either with or 
without the tablename prepended, depending on whether configureIncrementalLoad 
was called through MultiHFileOutputFormat or HFileOutputFormat2.
* Using MultiHFileOutputFormat will write the output into HFiles that match 
the output format of HFileOutputFormat2. However, while the default use case 
keeps the existing directory structure (column family name as the directory, 
with HFiles within that directory), MultiHFileOutputFormat will output HFiles 
in the output directory with the following relative paths:
{noformat}
 --table1
   --family1
     --HFiles
 --table2
   --family1
   --family2
     --HFiles
{noformat}
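The two core ideas of the proposal, a table-prefixed rowkey for mappers and split points computed collectively across tables, can be sketched as below. This is an illustration under assumptions, not the patch's API: the `;` separator, the table names and every method name are hypothetical. Because HBase table names cannot contain `;`, the first `;` unambiguously ends the prefix.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class MultiTableKeys {
  // Assumed separator between table name and row key.
  private static final byte SEPARATOR = ';';

  static byte[] bytes(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  /** Prefixes the table name to a row key: "<table>;<rowKey>". */
  static byte[] compositeKey(String table, byte[] rowKey) {
    byte[] t = bytes(table);
    byte[] out = new byte[t.length + 1 + rowKey.length];
    System.arraycopy(t, 0, out, 0, t.length);
    out[t.length] = SEPARATOR;
    System.arraycopy(rowKey, 0, out, t.length + 1, rowKey.length);
    return out;
  }

  /** Inverse of compositeKey: returns {tableName, rowKey}. */
  static byte[][] splitCompositeKey(byte[] key) {
    for (int i = 0; i < key.length; i++) {
      if (key[i] == SEPARATOR) {
        return new byte[][] {
            Arrays.copyOfRange(key, 0, i), Arrays.copyOfRange(key, i + 1, key.length)};
      }
    }
    throw new IllegalArgumentException("key has no table prefix");
  }

  /**
   * Region start keys of all tables, prefixed and sorted in unsigned byte order
   * (HBase's row-key order). Each entry starts one partition, so the reducer
   * count equals the total number of regions across all tables.
   */
  static List<byte[]> collectivePartitionStarts(Map<String, List<byte[]>> startKeysByTable) {
    List<byte[]> starts = new ArrayList<>();
    startKeysByTable.forEach((table, keys) -> {
      for (byte[] k : keys) {
        starts.add(compositeKey(table, k));
      }
    });
    starts.sort(Arrays::compareUnsigned);
    return starts;
  }

  public static void main(String[] args) {
    Map<String, List<byte[]>> startKeys = Map.of(
        "recent", List.of(new byte[0]),
        "history", List.of(new byte[0], bytes("2017-01")));
    List<byte[]> starts = collectivePartitionStarts(startKeys);
    System.out.println("reducers: " + starts.size()); // reducers: 3
    for (byte[] s : starts) {
      System.out.println(new String(s, StandardCharsets.UTF_8));
    }
  }
}
```

Sorting the prefixed keys keeps each table's partitions contiguous, so a single total-order partitioner can route every composite key to the reducer that owns its table's region.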

This aims to be a comprehensive solution to the original tickets - HBASE-3727 
and HBASE-16261. Thanks to [~clayb] for his support.

The patch will be attached shortly.





[jira] [Created] (HBASE-18160) Fix incorrect logic in FilterList.filterKeyValue

2017-06-05 Thread Zheng Hu (JIRA)
Zheng Hu created HBASE-18160:


 Summary: Fix incorrect logic in FilterList.filterKeyValue
 Key: HBASE-18160
 URL: https://issues.apache.org/jira/browse/HBASE-18160
 Project: HBase
  Issue Type: Bug
Reporter: Zheng Hu
Assignee: Zheng Hu


As noted in HBASE-17678, there are two problems in the FilterList.filterKeyValue 
implementation: 

1.  FilterList did not consider the INCLUDE_AND_SEEK_NEXT_ROW case (it seems 
INCLUDE_AND_SEEK_NEXT_ROW is a newly added case, and the developer forgot to 
handle it in FilterList). So if a user returns INCLUDE_AND_SEEK_NEXT_ROW from 
their own Filter and that Filter is wrapped by a FilterList, it will throw an 
IllegalStateException("Received code is not valid."). 

2.  For a FilterList with MUST_PASS_ONE, if filter-A in the list returns 
INCLUDE and filter-B returns INCLUDE_AND_NEXT_COL, the FilterList will 
ultimately return INCLUDE_AND_NEXT_COL. According to the minimal step rule, 
this is incorrect. (A FilterList with MUST_PASS_ONE should choose the minimal 
step among the filters in the list. Let's call it: The Minimal Step Rule.)
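The Minimal Step Rule can be illustrated with a simplified model. This is a hypothetical sketch, not HBase's Filter.ReturnCode or FilterList code: each return code is modeled as a pair (include the cell?, how far to advance), and hint-based codes such as SEEK_NEXT_USING_HINT are left out. Under MUST_PASS_ONE the cell is included if any filter includes it, and the scan advances only by the smallest step any filter asked for.

```java
public class MustPassOneMerge {
  /** How far the scanner may advance, ordered from smallest to largest step. */
  enum Step { CELL, COLUMN, ROW }

  /** Simplified return codes, each modeled as (include?, step). */
  enum Code {
    SKIP(false, Step.CELL),
    NEXT_COL(false, Step.COLUMN),
    NEXT_ROW(false, Step.ROW),
    INCLUDE(true, Step.CELL),
    INCLUDE_AND_NEXT_COL(true, Step.COLUMN),
    INCLUDE_AND_SEEK_NEXT_ROW(true, Step.ROW);

    final boolean include;
    final Step step;

    Code(boolean include, Step step) {
      this.include = include;
      this.step = step;
    }

    static Code of(boolean include, Step step) {
      for (Code c : values()) {
        if (c.include == include && c.step == step) {
          return c;
        }
      }
      throw new AssertionError("every (include, step) pair has a code");
    }
  }

  /** MUST_PASS_ONE: include if any filter includes; advance by the minimal step. */
  static Code merge(Code... codes) {
    boolean include = false;
    Step step = Step.ROW;
    for (Code c : codes) {
      include |= c.include;
      if (c.step.compareTo(step) < 0) {
        step = c.step;
      }
    }
    return Code.of(include, step);
  }

  public static void main(String[] args) {
    // The case from item 2: the expected result is INCLUDE, not INCLUDE_AND_NEXT_COL.
    System.out.println(merge(Code.INCLUDE, Code.INCLUDE_AND_NEXT_COL)); // INCLUDE
    // Item 1's code merges cleanly instead of being rejected.
    System.out.println(merge(Code.NEXT_COL, Code.INCLUDE_AND_SEEK_NEXT_ROW)); // INCLUDE_AND_NEXT_COL
  }
}
```

Splitting each code into "include" and "step" makes the rule a simple OR plus a minimum, so a newly added code like INCLUDE_AND_SEEK_NEXT_ROW only needs its pair defined rather than a new branch in the merge logic.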







[jira] [Resolved] (HBASE-11223) Limit the actions number of a call in the batch

2017-06-05 Thread Chia-Ping Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved HBASE-11223.

Resolution: Duplicate

We have already implemented the limit in the client (see HBASE-17408) and 
server (see HBASE-14946).

> Limit the actions number of a call in the batch 
> 
>
> Key: HBASE-11223
> URL: https://issues.apache.org/jira/browse/HBASE-11223
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.99.0
>Reporter: Liu Shaohui
>Assignee: Liu Shaohui
>
> A huge batch operation can crash a RegionServer because of GC pressure.
> Extreme code like this:
> {code}
> final List<Delete> deletes = new ArrayList<>();
> final long rows = 400;
> for (long i = 0; i < rows; ++i) {
>   deletes.add(new Delete(Bytes.toBytes(i)));
> }
> // all the deletes are submitted as one batch
> table.delete(deletes);
> {code}
> We should limit the number of actions in a single batch call. 
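The basic idea of capping actions per call can be shown with a small client-side sketch. It is only an illustration: it is not how HBASE-17408 or HBASE-14946 implement the limit, and `BatchChunker`, `chunk` and `maxActionsPerCall` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchChunker {
  /** Splits one large batch into sub-batches of at most maxActionsPerCall actions. */
  static <T> List<List<T>> chunk(List<T> actions, int maxActionsPerCall) {
    if (maxActionsPerCall <= 0) {
      throw new IllegalArgumentException("maxActionsPerCall must be > 0");
    }
    List<List<T>> calls = new ArrayList<>();
    for (int from = 0; from < actions.size(); from += maxActionsPerCall) {
      int to = Math.min(from + maxActionsPerCall, actions.size());
      // Copy so each call owns its sub-list instead of a view of the original.
      calls.add(new ArrayList<>(actions.subList(from, to)));
    }
    return calls;
  }

  public static void main(String[] args) {
    List<Long> rows = new ArrayList<>();
    for (long i = 0; i < 400; i++) {
      rows.add(i);
    }
    List<List<Long>> calls = chunk(rows, 100);
    System.out.println(calls.size()); // 4
  }
}
```

Bounding each call bounds how much a single request can make the server hold in memory at once, which is the GC concern raised in the description.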





[jira] [Resolved] (HBASE-9764) htable AutoFlush is hardcoded as false in PerformanceEvaluation

2017-06-05 Thread Chia-Ping Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved HBASE-9764.
---
Resolution: Won't Fix

PE doesn't use the AutoFlush flag because HTable is deprecated.

> htable AutoFlush is hardcoded as false in PerformanceEvaluation
> ---
>
> Key: HBASE-9764
> URL: https://issues.apache.org/jira/browse/HBASE-9764
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, test
>Reporter: Liu Shaohui
>Assignee: Liu Shaohui
>Priority: Minor
> Attachments: HBASE-9764-0.94-v1.patch
>
>
> In PerformanceEvaluation, the HTable AutoFlush option is hardcoded as false:
> {code:title=PerformanceEvaluation.java|borderStyle=solid}
> void testSetup() throws IOException {
>   this.admin = new HBaseAdmin(conf);
>   this.table = new HTable(conf, tableName);
>   this.table.setAutoFlush(false);
>   this.table.setScannerCaching(30);
> }
> {code}
> This makes the write performance numbers unrealistic. 
> Should we add an autoflush option in PerformanceEvaluation?
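To illustrate why a hardcoded setAutoFlush(false) flatters the write numbers, here is a hypothetical model (not HBase client code) that counts round trips. As a simplification, the buffer capacity is counted in puts, whereas the real client write buffer is sized in bytes.

```java
public class AutoFlushModel {
  private final boolean autoFlush;
  private final int bufferCapacity; // simplification: counted in puts, not bytes
  private int buffered;
  private int rpcs;

  AutoFlushModel(boolean autoFlush, int bufferCapacity) {
    this.autoFlush = autoFlush;
    this.bufferCapacity = bufferCapacity;
  }

  void put() {
    buffered++;
    // With autoflush every put is sent immediately; otherwise wait for a full buffer.
    if (autoFlush || buffered >= bufferCapacity) {
      flush();
    }
  }

  void flush() {
    if (buffered > 0) {
      rpcs++; // one round trip carries everything buffered so far
      buffered = 0;
    }
  }

  /** Flushes any remainder, as closing the table would, and returns the round trips made. */
  int close() {
    flush();
    return rpcs;
  }

  static int roundTrips(int puts, boolean autoFlush, int bufferCapacity) {
    AutoFlushModel m = new AutoFlushModel(autoFlush, bufferCapacity);
    for (int i = 0; i < puts; i++) {
      m.put();
    }
    return m.close();
  }

  public static void main(String[] args) {
    System.out.println(roundTrips(10, true, 4));  // 10
    System.out.println(roundTrips(10, false, 4)); // 3
  }
}
```

With autoflush off, ten puts cost three round trips instead of ten, so a benchmark that never exposes the flag measures buffered throughput only, not per-put write latency.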


