[jira] [Resolved] (HBASE-16785) We are not running all tests

2017-01-27 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-16785.
---
Resolution: Fixed

Re-resolving after pushing the addendum.

> We are not running all tests
> 
>
> Key: HBASE-16785
> URL: https://issues.apache.org/jira/browse/HBASE-16785
> Project: HBase
>  Issue Type: Bug
>  Components: build, test
>Reporter: stack
>Assignee: stack
> Fix For: 2.0.0
>
> Attachments: 16785.addendum.patch, HBASE-16785.master.001.patch, 
> HBASE-16785.master.002 (1).patch, HBASE-16785.master.002 (2).patch, 
> HBASE-16785.master.002 (2).patch, HBASE-16785.master.002.patch, 
> HBASE-16785.master.002.patch, HBASE-16785.master.003.patch
>
>
> Noticed by [~mbertozzi]
> We have some modules where we tried to 'skip' the running of the second part 
> of tests -- the mediums and larges. That might have made sense when the module 
> was originally added and there were only a few small tests to run, but as time 
> goes by and a module accumulates more tests, in a few cases we've added 
> mediums and larges without removing the 'skip' config.
> Matteo noticed this happened in hbase-procedure.
> In hbase-client, there is at least a medium test that is being skipped.
> Let me try purging this trick everywhere. It doesn't seem to save us anything 
> going by build time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HBASE-16785) We are not running all tests

2017-01-27 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reopened HBASE-16785:
---

Reopening to add an addendum that disables 
TestHRegionServerBulkLoadWithOldSecureEndpoint... it has been failing since this 
patch went in and enabled it. Needs looking at...

> We are not running all tests
> 
>
> Key: HBASE-16785
> URL: https://issues.apache.org/jira/browse/HBASE-16785
> Project: HBase
>  Issue Type: Bug
>  Components: build, test
>Reporter: stack
>Assignee: stack
> Fix For: 2.0.0
>
> Attachments: HBASE-16785.master.001.patch, HBASE-16785.master.002 
> (1).patch, HBASE-16785.master.002 (2).patch, HBASE-16785.master.002 
> (2).patch, HBASE-16785.master.002.patch, HBASE-16785.master.002.patch, 
> HBASE-16785.master.003.patch
>
>
> Noticed by [~mbertozzi]
> We have some modules where we tried to 'skip' the running of the second part 
> of tests -- the mediums and larges. That might have made sense when the module 
> was originally added and there were only a few small tests to run, but as time 
> goes by and a module accumulates more tests, in a few cases we've added 
> mediums and larges without removing the 'skip' config.
> Matteo noticed this happened in hbase-procedure.
> In hbase-client, there is at least a medium test that is being skipped.
> Let me try purging this trick everywhere. It doesn't seem to save us anything 
> going by build time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Incremental import from HBase to Hive

2017-01-27 Thread Chetan Khatri
Sure. There are several applications that talk to HBase and populate data. Now
I want to load data incrementally from HBase, do transformations such as
data-quality filters, and save the result to Hive.

Incremental load means I want to run this job weekly while making sure we do
not get duplication at the Hive level.

Thanks.

On Sat, Jan 28, 2017 at 1:00 AM, Josh Elser  wrote:

> (-cc dev)
>
> Might you be able to be more specific in the context of your question?
>
> What kind of requirements do you have?
>
>
> Chetan Khatri wrote:
>
>> Hello Community,
>>
>> I am working with HBase 1.2.4. What would be the best approach to do an
>> incremental load from HBase to Hive?
>>
>> Thanks.
>>
>>


[jira] [Created] (HBASE-17565) StochasticLoadBalancer may incorrectly skip balancing due to skewed multiplier sum

2017-01-27 Thread Ted Yu (JIRA)
Ted Yu created HBASE-17565:
--

 Summary: StochasticLoadBalancer may incorrectly skip balancing due 
to skewed multiplier sum
 Key: HBASE-17565
 URL: https://issues.apache.org/jira/browse/HBASE-17565
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu


I was investigating why a 6-node cluster kept skipping balancing requests.

Here were the region counts on the servers:
449, 448, 447, 449, 453, 0
{code}
2017-01-26 22:04:47,145 INFO  [RpcServer.deafult.FPBQ.Fifo.handler=1,queue=0,port=16000] balancer.StochasticLoadBalancer: Skipping load balancing because balanced cluster; total cost is 127.0171157050385, sum multiplier is 111087.0 min cost which need balance is 0.05
{code}

The big multiplier sum caught my eye. Here is what additional debug logging 
showed:

{code}
2017-01-27 23:25:31,749 DEBUG [RpcServer.deafult.FPBQ.Fifo.handler=9,queue=0,port=16000] balancer.StochasticLoadBalancer: class org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer$RegionReplicaHostCostFunction with multiplier 10.0
2017-01-27 23:25:31,749 DEBUG [RpcServer.deafult.FPBQ.Fifo.handler=9,queue=0,port=16000] balancer.StochasticLoadBalancer: class org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer$RegionReplicaRackCostFunction with multiplier 1.0
{code}
Note, however, that no table in the cluster used read replicas.

I can think of two ways of fixing this situation:

1. If there are no read replicas in the cluster, ignore the multipliers for the 
above two functions.
2. When the cost() returned by a CostFunction is 0 (or very close to 0.0), 
ignore its multiplier (sketched below).
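
For illustration only, a minimal Java sketch of option 2, assuming cost functions expose a multiplier and a cost normalized to [0, 1]; the CostFunction interface and all names below are hypothetical stand-ins, not the actual StochasticLoadBalancer code:

{code}
import java.util.List;

/** Illustrative sketch of option 2 above, not actual HBase code. */
public class BalancerCostSketch {
  /** Hypothetical stand-in for the balancer's cost functions. */
  interface CostFunction {
    double getMultiplier();
    double cost();   // assumed normalized to [0, 1]
  }

  /** True if the weighted average cost says the cluster still needs balancing. */
  static boolean needsBalance(List<CostFunction> functions, double minCostNeedBalance) {
    final double epsilon = 0.0001;
    double totalCost = 0.0;
    double sumMultiplier = 0.0;
    for (CostFunction fn : functions) {
      double multiplier = fn.getMultiplier();
      double cost = fn.cost();
      // Skip disabled functions and functions whose cost is effectively zero,
      // so unused multipliers (e.g. replica costs on a cluster without read
      // replicas) cannot inflate the divisor and hide real skew.
      if (multiplier <= 0 || cost < epsilon) {
        continue;
      }
      totalCost += multiplier * cost;
      sumMultiplier += multiplier;
    }
    return sumMultiplier > 0 && (totalCost / sumMultiplier) > minCostNeedBalance;
  }
}
{code}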



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-17564) Fix remaining calls to deprecated methods of Admin and HBaseAdmin

2017-01-27 Thread Jan Hentschel (JIRA)
Jan Hentschel created HBASE-17564:
-

 Summary: Fix remaining calls to deprecated methods of Admin and 
HBaseAdmin
 Key: HBASE-17564
 URL: https://issues.apache.org/jira/browse/HBASE-17564
 Project: HBase
  Issue Type: Improvement
Reporter: Jan Hentschel
Assignee: Jan Hentschel
Priority: Trivial


Fix the remaining calls to deprecated methods of the *Admin* interface and the 
*HBaseAdmin* class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-17563) Foreach and switch in RootDocProcessor and StabilityOptions

2017-01-27 Thread Jan Hentschel (JIRA)
Jan Hentschel created HBASE-17563:
-

 Summary: Foreach and switch in RootDocProcessor and 
StabilityOptions
 Key: HBASE-17563
 URL: https://issues.apache.org/jira/browse/HBASE-17563
 Project: HBase
  Issue Type: Improvement
Reporter: Jan Hentschel
Assignee: Jan Hentschel
Priority: Trivial


To make the code in *RootDocProcessor* and *StabilityOptions* more readable, 
change the existing for-loops and if-statements to foreach-loops and 
switch-statements.
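
For illustration only, a generic before/after sketch of the kind of change described; the enum and methods below are made up, not the actual RootDocProcessor or StabilityOptions code:

{code}
import java.util.List;

/** Generic before/after sketch; not the actual doclet code. */
public class LoopSwitchSketch {
  enum Stability { STABLE, EVOLVING, UNSTABLE }

  // Before: index-based for-loop plus an if-statement.
  static int countStableBefore(List<Stability> annotations) {
    int count = 0;
    for (int i = 0; i < annotations.size(); i++) {
      Stability s = annotations.get(i);
      if (s == Stability.STABLE) {
        count++;
      }
    }
    return count;
  }

  // After: foreach plus switch -- same behavior, easier to read and extend.
  static int countStableAfter(List<Stability> annotations) {
    int count = 0;
    for (Stability s : annotations) {
      switch (s) {
        case STABLE:
          count++;
          break;
        default:
          break;
      }
    }
    return count;
  }
}
{code}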



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] Re: Replication resiliency

2017-01-27 Thread Andrew Purtell
There is an old JIRA somewhere to use Error Prone 
(https://github.com/google/error-prone) as a framework for implementing static 
code analysis checks like that. FWIW.


> On Jan 27, 2017, at 1:03 PM, Sean Busbey  wrote:
> 
> Josh, probably worth checking if a grep or something else we can do in
> precommit could catch this.
> 
> 
> 
>> On Fri, Jan 27, 2017 at 1:26 PM, Josh Elser  wrote:
>> Cool.
>> 
>> Let me open an issue to scan the codebase to see if we can find any
>> instances where we are starting threads which aren't using the UEH.
>> 
>> 
>> Andrew Purtell wrote:
>>> 
>>> Agreed, let's abort with an abundance of caution. That is the _least_ that
>>> should be done when a thread dies unexpectedly. Maybe we can improve
>>> resiliency for specific cases later.
>>> 
>>> 
>>> On Jan 26, 2017, at 5:53 PM, Enis Söztutar  wrote:
>>> 
> Do we have worker threads that we can't safely continue without
 
 indefinitely? Can we solve the general problem of "unhandled exception
 in threads cause a RS Abort"?
 We have this already in the code base. We are injecting an
 UncaughtExceptionhandler (which is calling Abortable.abort) to almost all
 of the HRegionServer service threads (see HRS.startServiceThreads). But
 I've also seen cases where some important thread got unmarked. I think it
 is good idea to revisit that and make sure that all the threads are
 injected with the UEH.
 
 The replication source threads are started on demand, that is why the UEH
 is not injected I think. But agreed that we should do the safe route
 here,
 and abort the regionserver.
 
 Enis
 
> On Thu, Jan 26, 2017 at 2:19 PM, Josh Elser  wrote:
> 
> +1 If any "worker" thread can't safely/reasonably retry some unexpected
> exception without a reasonable expectation of self-healing, tank the RS.
> 
> Having those threads die but not the RS could go uncaught for indefinite
> period of time.
> 
> 
> Sean Busbey wrote:
> 
>> I've noticed a few other places where we can lose a worker thread and
>> the RegionServer happily continues. One notable example is the worker
>> threads that handle syncs for the WAL. I'm generally a
>> fail-fast-and-loud advocate, so I like aborting when things look
>> wonky. I've also had to deal with a lot of pain around silent and thus
>> hard to see replication failures, so strong signals that the
>> replication system is in a bad way sound good to me atm.
>> 
>> Do we have worker threads that we can't safely continue without
>> indefinitely? Can we solve the general problem of "unhandled exception
>> in threads cause a RS Abort"?
>> 
>> As mentioned on the jira, I do worry a bit about cluster stability and
>> cascading failures, given the ability to have user-provided endpoints
>> in the replication process. Ultimately, I don't see that as different
>> than all the other places coprocessors can put the cluster at risk.
>> 
>>> On Thu, Jan 26, 2017 at 2:48 PM, Sean Busbey
>>> wrote:
>>> 
>>> (edited subject to ensure folks filtering for DISCUSS see this)
>>> 
>>> 
>>> 
>>> On Thu, Jan 26, 2017 at 1:58 PM, Gary Helmling
>>> wrote:
>>> 
 Over in HBASE-17381 there has been some discussion around whether an
 unhandled exception in a ReplicationSourceWorkerThread should trigger
 a
 regionserver abort.
 
 The current behavior in the case of an unexpected exception in
 ReplicationSourceWorkerThread.run() is to log a message and simply
 let
 the
 thread die, allowing replication for this source to back up.
 
 I've seen this happen in an OOME scenario, which seems like a clear
 case
 where we would be better off aborting the regionserver.
 
 However, in the case of any other unexpected exceptions out of the
 run()
 method, how do we want to handle this?
 
 I'm of the general opinion that where we would be better off aborting
 on
 all unexpected exceptions, as it means that:
 a) we have some missing error handling
 b) failing fast raises visibility and makes it easier to add any
 error
 handling that should be there
 c) silently stopping up replication creates problems that are
 difficult
 for
 our users to identify operationally and hard to troubleshoot.
 
 However, the current behavior has been there for quite a while, and
 maybe
 there are other situations or concerns I'm not seeing which would
 justify
 having regionserver stability over replication stability.
 
 What are folks thoughts on this?  Should the regionserver abort on all
 unexpected exceptions in the run method or should we more narrowly focus
 this on OOME's?

Re: Reminder: Please use git am when committing patches

2017-01-27 Thread Sean Busbey
I'm a strong +1 for getting as many of our committers as possible over
to using git am or some way of making sure the contributor shows up as
the author. I agree with the previous statements about incentivizing
participation by making sure contributors get credit in places they'll
commonly look (e.g. github  and other open source contribution
aggregators). Additionally, as a PMC member it is much easier to try
to build an idea of contributions to the repo over time if I can scan
through git author and sign-off tags than if I have to try to pick out
bespoke commit message details.



On Thu, Jan 26, 2017 at 8:45 PM, Apekshit Sharma  wrote:
> Yes, many people who contribute to open source like their work to show up
> in their git profile. I do, for one.
> Past thread on this discussion:
> http://search-hadoop.com/m/HBase/YGbb14CnYDA6GX1/threaded
>
> On Thu, Jan 26, 2017 at 6:01 PM, 张铎(Duo Zhang) 
> wrote:
>
>> I think the point here is that github will not count the name in the commit
>> message as a contributor to the project?
>>
>> Enis Söztutar 于2017年1月27日 周五09:41写道:
>>
>> > Yep, we have been indeed using the msg (author name) historically. In
>> cases
>> > where author info is there, the guideline is to use git am --signoff.
>> >
>> > I don't think we should require git commit --author, as long as there is
>> > attribution in the commit msg. But we can do --author as an option.
>> >
>> > Enis
>> >
>> > On Thu, Jan 26, 2017 at 4:29 PM, 张铎(Duo Zhang) 
>> > wrote:
>> >
>> > > See here, our committer guide says that the commit message format
>> should
>> > be
>> > > HBASE-XXX XXX (the actual author).
>> > >
>> > > http://hbase.apache.org/book.html#_commit_message_format
>> > >
>> > > I think the rule was set up when we were still on svn, but we did not
>> > > change it.
>> > >
>> > > I'm a big +1 on always using 'git am' if possible, and setting the
>> > > author explicitly when using 'git commit'. One more thing: do not
>> > > forget to add --signoff :)
>> > >
>> > > Mind opening an issue to modify the above section in the hbase book, Appy?
>> > >
>> > > Thanks.
>> > >
>> > > 2017-01-27 4:12 GMT+08:00 Chetan Khatri :
>> > >
>> > > > Thanks, Appy, for the explanation; I am very new to open source
>> > > > contribution.
>> > > >
>> > > > On Fri, Jan 27, 2017 at 1:25 AM, Apekshit Sharma 
>> > > > wrote:
>> > > >
>> > > > > Hi devs,
>> > > > >
>> > > > > A recent question by a new contributor (thread: Git Pull Request to
>> > > > > HBase) and a look at the history of commits make me think it is worth
>> > > > > reiterating that committers should use *git am* while committing
>> > > > > patches so that whoever did the actual work gets appropriate credit
>> > > > > in git history.
>> > > > > In case the uploaded patch is an unformatted patch (without an author
>> > > > > tag), please use
>> > > > > *git commit --author=.*
>> > > > > As a community, I don't think we should be lax about making sure the
>> > > > > original author gets appropriate credit.
>> > > > >
>> > > > > Thanks
>> > > > > -- Appy
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
>
> --
>
> -- Appy


Re: [DISCUSS] Re: Replication resiliency

2017-01-27 Thread Sean Busbey
Josh, probably worth checking if a grep or something else we can do in
precommit could catch this.



On Fri, Jan 27, 2017 at 1:26 PM, Josh Elser  wrote:
> Cool.
>
> Let me open an issue to scan the codebase to see if we can find any
> instances where we are starting threads which aren't using the UEH.
>
>
> Andrew Purtell wrote:
>>
>> Agreed, let's abort with an abundance of caution. That is the _least_ that
>> should be done when a thread dies unexpectedly. Maybe we can improve
>> resiliency for specific cases later.
>>
>>
>> On Jan 26, 2017, at 5:53 PM, Enis Söztutar  wrote:
>>
 Do we have worker threads that we can't safely continue without
>>>
>>> indefinitely? Can we solve the general problem of "unhandled exception
>>> in threads cause a RS Abort"?
>>> We have this already in the code base. We are injecting an
>>> UncaughtExceptionhandler (which is calling Abortable.abort) to almost all
>>> of the HRegionServer service threads (see HRS.startServiceThreads). But
>>> I've also seen cases where some important thread got unmarked. I think it
>>> is good idea to revisit that and make sure that all the threads are
>>> injected with the UEH.
>>>
>>> The replication source threads are started on demand, that is why the UEH
>>> is not injected I think. But agreed that we should do the safe route
>>> here,
>>> and abort the regionserver.
>>>
>>> Enis
>>>
 On Thu, Jan 26, 2017 at 2:19 PM, Josh Elser  wrote:

 +1 If any "worker" thread can't safely/reasonably retry some unexpected
 exception without a reasonable expectation of self-healing, tank the RS.

 Having those threads die but not the RS could go uncaught for indefinite
 period of time.


 Sean Busbey wrote:

> I've noticed a few other places where we can lose a worker thread and
> the RegionServer happily continues. One notable example is the worker
> threads that handle syncs for the WAL. I'm generally a
> fail-fast-and-loud advocate, so I like aborting when things look
> wonky. I've also had to deal with a lot of pain around silent and thus
> hard to see replication failures, so strong signals that the
> replication system is in a bad way sound good to me atm.
>
> Do we have worker threads that we can't safely continue without
> indefinitely? Can we solve the general problem of "unhandled exception
> in threads cause a RS Abort"?
>
> As mentioned on the jira, I do worry a bit about cluster stability and
> cascading failures, given the ability to have user-provided endpoints
> in the replication process. Ultimately, I don't see that as different
> than all the other places coprocessors can put the cluster at risk.
>
>> On Thu, Jan 26, 2017 at 2:48 PM, Sean Busbey
>> wrote:
>>
>> (edited subject to ensure folks filtering for DISCUSS see this)
>>
>>
>>
>> On Thu, Jan 26, 2017 at 1:58 PM, Gary Helmling
>> wrote:
>>
>>> Over in HBASE-17381 there has been some discussion around whether an
>>> unhandled exception in a ReplicationSourceWorkerThread should trigger
>>> a
>>> regionserver abort.
>>>
>>> The current behavior in the case of an unexpected exception in
>>> ReplicationSourceWorkerThread.run() is to log a message and simply
>>> let
>>> the
>>> thread die, allowing replication for this source to back up.
>>>
>>> I've seen this happen in an OOME scenario, which seems like a clear
>>> case
>>> where we would be better off aborting the regionserver.
>>>
>>> However, in the case of any other unexpected exceptions out of the
>>> run()
>>> method, how do we want to handle this?
>>>
>>> I'm of the general opinion that where we would be better off aborting
>>> on
>>> all unexpected exceptions, as it means that:
>>> a) we have some missing error handling
>>> b) failing fast raises visibility and makes it easier to add any
>>> error
>>> handling that should be there
>>> c) silently stopping up replication creates problems that are
>>> difficult
>>> for
>>> our users to identify operationally and hard to troubleshoot.
>>>
>>> However, the current behavior has been there for quite a while, and
>>> maybe
>>> there are other situations or concerns I'm not seeing which would
>>> justify
>>> having regionserver stability over replication stability.
>>>
>>> What are folks thoughts on this?  Should the regionserver abort on
>>> all
>>> unexpected exceptions in the run method or should we more narrowly
>>> focus
>>> this on OOME's?
>>>
>


[jira] [Resolved] (HBASE-14477) Compaction improvements: Date tiered compaction policy

2017-01-27 Thread Vladimir Rodionov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Rodionov resolved HBASE-14477.
---
Resolution: Duplicate

Duplicate of HBASE-15181

> Compaction improvements: Date tiered compaction policy
> --
>
> Key: HBASE-14477
> URL: https://issues.apache.org/jira/browse/HBASE-14477
> Project: HBase
>  Issue Type: New Feature
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
>
> For immutable and mostly immutable data, the current SizeTiered-based 
> compaction policy is not efficient.
> # There is no need to compact all files into one, because the data is (mostly) 
> immutable and we do not need to collect garbage (the performance reasons will 
> be discussed later).
> # Size-tiered compaction is not suitable for applications where the most recent 
> data is the most important, and it prevents efficient caching of this data.
> The idea is pretty similar to DateTieredCompaction in Cassandra:
> http://www.datastax.com/dev/blog/datetieredcompactionstrategy
> http://www.datastax.com/dev/blog/dtcs-notes-from-the-field
> From Cassandra own blog:
> {quote}
> Since DTCS can be used with any table, it is important to know when it is a 
> good idea, and when it is not. I’ll try to explain the spectrum and 
> trade-offs here:
> 1. Perfect Fit: Time Series Fact Data, Deletes by Default TTL: When you 
> ingest fact data that is ordered in time, with no deletes or overwrites. This 
> is the standard “time series” use case.
> 2. OK Fit: Time-Ordered, with limited updates across whole data set, or only 
> updates to recent data: When you ingest data that is (mostly) ordered in 
> time, but revise or delete a very small proportion of the overall data across 
> the whole timeline.
> 3. Not a Good Fit: many partial row updates or deletions over time: When you 
> need to partially revise or delete fields for rows that you read together. 
> Also, when you revise or delete rows within clustered reads.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] Re: Replication resiliency

2017-01-27 Thread Gary Helmling
Thanks for the feedback, all.  Makes sense to me.  I'll follow up in the
issue to use the same UEH to abort, or at least the same abort handling as in
other cases (the current UEH used for replication source worker threads only
logs).
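
For illustration only (this is not the actual patch), a minimal sketch of
wiring an aborting handler onto a worker thread via the existing Abortable
interface; the helper class and method names below are made up:

import org.apache.hadoop.hbase.Abortable;

/** Hypothetical helper, for illustration only. */
public class AbortingUehSketch {
  static Thread startWorker(Runnable task, String name, Abortable server) {
    Thread t = new Thread(task, name);
    t.setDaemon(true);
    // If the worker dies with an unexpected exception, abort the whole
    // regionserver instead of only logging and silently losing the thread.
    t.setUncaughtExceptionHandler((thread, throwable) ->
        server.abort("Uncaught exception in " + thread.getName(), throwable));
    t.start();
    return t;
  }
}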


On Fri, Jan 27, 2017 at 11:27 AM Josh Elser  wrote:

Cool.

Let me open an issue to scan the codebase to see if we can find any
instances where we are starting threads which aren't using the UEH.

Andrew Purtell wrote:
> Agreed, let's abort with an abundance of caution. That is the _least_
that should be done when a thread dies unexpectedly. Maybe we can improve
resiliency for specific cases later.
>
>
> On Jan 26, 2017, at 5:53 PM, Enis Söztutar  wrote:
>
>>> Do we have worker threads that we can't safely continue without
>> indefinitely? Can we solve the general problem of "unhandled exception
>> in threads cause a RS Abort"?
>> We have this already in the code base. We are injecting an
>> UncaughtExceptionhandler (which is calling Abortable.abort) to almost all
>> of the HRegionServer service threads (see HRS.startServiceThreads). But
>> I've also seen cases where some important thread got unmarked. I think it
>> is good idea to revisit that and make sure that all the threads are
>> injected with the UEH.
>>
>> The replication source threads are started on demand, that is why the UEH
>> is not injected I think. But agreed that we should do the safe route
here,
>> and abort the regionserver.
>>
>> Enis
>>
>>> On Thu, Jan 26, 2017 at 2:19 PM, Josh Elser  wrote:
>>>
>>> +1 If any "worker" thread can't safely/reasonably retry some unexpected
>>> exception without a reasonable expectation of self-healing, tank the RS.
>>>
>>> Having those threads die but not the RS could go uncaught for indefinite
>>> period of time.
>>>
>>>
>>> Sean Busbey wrote:
>>>
 I've noticed a few other places where we can lose a worker thread and
 the RegionServer happily continues. One notable example is the worker
 threads that handle syncs for the WAL. I'm generally a
 fail-fast-and-loud advocate, so I like aborting when things look
 wonky. I've also had to deal with a lot of pain around silent and thus
 hard to see replication failures, so strong signals that the
 replication system is in a bad way sound good to me atm.

 Do we have worker threads that we can't safely continue without
 indefinitely? Can we solve the general problem of "unhandled exception
 in threads cause a RS Abort"?

 As mentioned on the jira, I do worry a bit about cluster stability and
 cascading failures, given the ability to have user-provided endpoints
 in the replication process. Ultimately, I don't see that as different
 than all the other places coprocessors can put the cluster at risk.

> On Thu, Jan 26, 2017 at 2:48 PM, Sean Busbey
 wrote:
>
> (edited subject to ensure folks filtering for DISCUSS see this)
>
>
>
> On Thu, Jan 26, 2017 at 1:58 PM, Gary Helmling
> wrote:
>
>> Over in HBASE-17381 there has been some discussion around whether an
>> unhandled exception in a ReplicationSourceWorkerThread should
trigger a
>> regionserver abort.
>>
>> The current behavior in the case of an unexpected exception in
>> ReplicationSourceWorkerThread.run() is to log a message and simply
let
>> the
>> thread die, allowing replication for this source to back up.
>>
>> I've seen this happen in an OOME scenario, which seems like a clear
case
>> where we would be better off aborting the regionserver.
>>
>> However, in the case of any other unexpected exceptions out of the
run()
>> method, how do we want to handle this?
>>
>> I'm of the general opinion that where we would be better off
aborting on
>> all unexpected exceptions, as it means that:
>> a) we have some missing error handling
>> b) failing fast raises visibility and makes it easier to add any
error
>> handling that should be there
>> c) silently stopping up replication creates problems that are
difficult
>> for
>> our users to identify operationally and hard to troubleshoot.
>>
>> However, the current behavior has been there for quite a while, and
>> maybe
>> there are other situations or concerns I'm not seeing which would
>> justify
>> having regionserver stability over replication stability.
>>
>> What are folks thoughts on this?  Should the regionserver abort on
all
>> unexpected exceptions in the run method or should we more narrowly
focus
>> this on OOME's?
>>


[jira] [Created] (HBASE-17562) Remove documentation for coprocessor execution times after HBASE-14205

2017-01-27 Thread Enis Soztutar (JIRA)
Enis Soztutar created HBASE-17562:
-

 Summary: Remove documentation for coprocessor execution times 
after HBASE-14205
 Key: HBASE-17562
 URL: https://issues.apache.org/jira/browse/HBASE-17562
 Project: HBase
  Issue Type: Sub-task
Reporter: Enis Soztutar


Thanks, [~manniche], for reporting. I opened up a subtask. Feel free to pick it 
up if you want to patch it yourself. Otherwise, I can do a quick patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-17561) table status page should escape table name

2017-01-27 Thread Sean Busbey (JIRA)
Sean Busbey created HBASE-17561:
---

 Summary: table status page should escape table name
 Key: HBASE-17561
 URL: https://issues.apache.org/jira/browse/HBASE-17561
 Project: HBase
  Issue Type: Bug
  Components: master, UI
Reporter: Sean Busbey
Assignee: Sean Busbey


We write out table names to an HTML document without escaping HTML entities.

e.g. in this case the name even comes directly from the request:
{code}
<% if ( !readOnly && action != null ) { %>
  <title>HBase Master: <%= master.getServerName() %></title>
<% } else { %>
  <title>Table: <%= fqtn %></title>
<% } %>
{code}

in 
https://github.com/apache/hbase/blob/master/hbase-server/src/main/resources/hbase-webapps/master/table.jsp
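
For illustration only, a minimal sketch of the kind of fix this implies, assuming a commons-lang style HTML escaper is available on the classpath; the helper class below is hypothetical, not the committed patch:

{code}
import org.apache.commons.lang.StringEscapeUtils;

/** Minimal sketch, not the committed fix: escape user-influenced strings before emitting HTML. */
public class EscapeSketch {
  static String safeTableTitle(String fqtn) {
    // escapeHtml turns <, >, &, and " into entities so a crafted table name
    // cannot inject markup into the status page.
    return "Table: " + StringEscapeUtils.escapeHtml(fqtn);
  }
}
{code}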



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-17560) HMaster redirect should sanity check user input

2017-01-27 Thread Sean Busbey (JIRA)
Sean Busbey created HBASE-17560:
---

 Summary: HMaster redirect should sanity check user input
 Key: HBASE-17560
 URL: https://issues.apache.org/jira/browse/HBASE-17560
 Project: HBase
  Issue Type: Bug
  Components: master, security, UI
Reporter: Sean Busbey


We should do some sanity checking on the user-provided data before we blindly 
pass it to a redirect.

i.e.

{code}
  public static class RedirectServlet extends HttpServlet {
private static final long serialVersionUID = 2894774810058302472L;
private static int regionServerInfoPort;

@Override
public void doGet(HttpServletRequest request,
HttpServletResponse response) throws ServletException, IOException {
  String redirectUrl = request.getScheme() + "://"
+ request.getServerName() + ":" + regionServerInfoPort
+ request.getRequestURI();
  response.sendRedirect(redirectUrl);
}
  }
{code}

e.g.

* Are we redirecting to a server that is ours?
* Did we validate the path/query string? (see the sketch below)
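
For illustration only, a sketch of the kind of sanity checks meant above; this is a hypothetical servlet, not the committed fix, and the exact checks would depend on what this issue decides:

{code}
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

/** Illustrative sketch only -- not the committed fix for this issue. */
public class CheckedRedirectServlet extends HttpServlet {
  private static final long serialVersionUID = 1L;
  private static int regionServerInfoPort;   // assumed set elsewhere, as in the original

  @Override
  public void doGet(HttpServletRequest request, HttpServletResponse response)
      throws ServletException, IOException {
    String uri = request.getRequestURI();
    // Basic sanity checks before building the redirect target: require a
    // configured info port and a plain absolute path (no scheme smuggling,
    // no header-splitting characters).
    if (regionServerInfoPort <= 0 || uri == null || !uri.startsWith("/")
        || uri.contains("://") || uri.contains("\r") || uri.contains("\n")) {
      response.sendError(HttpServletResponse.SC_BAD_REQUEST);
      return;
    }
    String redirectUrl = request.getScheme() + "://" + request.getServerName()
        + ":" + regionServerInfoPort + uri;
    response.sendRedirect(redirectUrl);
  }
}
{code}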



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-17559) Verify service threads are using the UncaughtExceptionHandler

2017-01-27 Thread Josh Elser (JIRA)
Josh Elser created HBASE-17559:
--

 Summary: Verify service threads are using the 
UncaughtExceptionHandler
 Key: HBASE-17559
 URL: https://issues.apache.org/jira/browse/HBASE-17559
 Project: HBase
  Issue Type: Task
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 2.0.0, 1.4.0


(Context: 
https://lists.apache.org/thread.html/47c9a0f7193eaf0546ce241cfe093885366f5177ed867e18b45d77b9@%3Cdev.hbase.apache.org%3E)

We should take a once-over on the threads we start in the RegionServer/Master 
to make sure that they're using the UncaughtExceptionHandler to prevent the 
case where the process keeps running but one of our threads has died.

Such a situation may result in operators not realizing that their HBase 
instance is not actually running as expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] Re: Replication resiliency

2017-01-27 Thread Josh Elser

Cool.

Let me open an issue to scan the codebase to see if we can find any 
instances where we are starting threads which aren't using the UEH.


Andrew Purtell wrote:

Agreed, let's abort with an abundance of caution. That is the _least_ that 
should be done when a thread dies unexpectedly. Maybe we can improve resiliency 
for specific cases later.


On Jan 26, 2017, at 5:53 PM, Enis Söztutar  wrote:


Do we have worker threads that we can't safely continue without

indefinitely? Can we solve the general problem of "unhandled exception
in threads cause a RS Abort"?
We have this already in the code base. We are injecting an
UncaughtExceptionhandler (which is calling Abortable.abort) to almost all
of the HRegionServer service threads (see HRS.startServiceThreads). But
I've also seen cases where some important thread got unmarked. I think it
is good idea to revisit that and make sure that all the threads are
injected with the UEH.

The replication source threads are started on demand, that is why the UEH
is not injected I think. But agreed that we should do the safe route here,
and abort the regionserver.

Enis


On Thu, Jan 26, 2017 at 2:19 PM, Josh Elser  wrote:

+1 If any "worker" thread can't safely/reasonably retry some unexpected
exception without a reasonable expectation of self-healing, tank the RS.

Having those threads die but not the RS could go uncaught for indefinite
period of time.


Sean Busbey wrote:


I've noticed a few other places where we can lose a worker thread and
the RegionServer happily continues. One notable example is the worker
threads that handle syncs for the WAL. I'm generally a
fail-fast-and-loud advocate, so I like aborting when things look
wonky. I've also had to deal with a lot of pain around silent and thus
hard to see replication failures, so strong signals that the
replication system is in a bad way sound good to me atm.

Do we have worker threads that we can't safely continue without
indefinitely? Can we solve the general problem of "unhandled exception
in threads cause a RS Abort"?

As mentioned on the jira, I do worry a bit about cluster stability and
cascading failures, given the ability to have user-provided endpoints
in the replication process. Ultimately, I don't see that as different
than all the other places coprocessors can put the cluster at risk.


On Thu, Jan 26, 2017 at 2:48 PM, Sean Busbey   wrote:

(edited subject to ensure folks filtering for DISCUSS see this)



On Thu, Jan 26, 2017 at 1:58 PM, Gary Helmling
wrote:


Over in HBASE-17381 there has been some discussion around whether an
unhandled exception in a ReplicationSourceWorkerThread should trigger a
regionserver abort.

The current behavior in the case of an unexpected exception in
ReplicationSourceWorkerThread.run() is to log a message and simply let
the
thread die, allowing replication for this source to back up.

I've seen this happen in an OOME scenario, which seems like a clear case
where we would be better off aborting the regionserver.

However, in the case of any other unexpected exceptions out of the run()
method, how do we want to handle this?

I'm of the general opinion that where we would be better off aborting on
all unexpected exceptions, as it means that:
a) we have some missing error handling
b) failing fast raises visibility and makes it easier to add any error
handling that should be there
c) silently stopping up replication creates problems that are difficult
for
our users to identify operationally and hard to troubleshoot.

However, the current behavior has been there for quite a while, and
maybe
there are other situations or concerns I'm not seeing which would
justify
having regionserver stability over replication stability.

What are folks thoughts on this?  Should the regionserver abort on all
unexpected exceptions in the run method or should we more narrowly focus
this on OOME's?



[jira] [Created] (HBASE-17558) ZK dumping jsp should escape html

2017-01-27 Thread Sean Busbey (JIRA)
Sean Busbey created HBASE-17558:
---

 Summary: ZK dumping jsp should escape html 
 Key: HBASE-17558
 URL: https://issues.apache.org/jira/browse/HBASE-17558
 Project: HBase
  Issue Type: Bug
  Components: security, UI
Reporter: Sean Busbey
Priority: Minor


Right now the ZK status page in the master dumps data from ZK using ZKUtil 
without doing any processing to e.g. escape HTML entities.

i.e.:

{code}
ZooKeeper Dump

<%= ZKUtil.dump(watcher).trim() %>
{code}

current url: 
https://github.com/apache/hbase/blob/master/hbase-server/src/main/resources/hbase-webapps/master/zk.jsp#L83



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Incremental import from HBase to Hive

2017-01-27 Thread Chetan Khatri
Hello Community,

I am working with HBase 1.2.4. What would be the best approach to do an
incremental load from HBase to Hive?

Thanks.


[jira] [Created] (HBASE-17557) HRegionServer#reportRegionSizesForQuotas() should respond to UnsupportedOperationException

2017-01-27 Thread Ted Yu (JIRA)
Ted Yu created HBASE-17557:
--

 Summary: HRegionServer#reportRegionSizesForQuotas() should respond 
to UnsupportedOperationException
 Key: HBASE-17557
 URL: https://issues.apache.org/jira/browse/HBASE-17557
 Project: HBase
  Issue Type: Sub-task
Reporter: Ted Yu
Assignee: Ted Yu


When the master doesn't support quotas, you see the following repeatedly in the 
region server log:
{code}
2017-01-27 17:24:27,389 DEBUG [cn011.x.com,16020,1485468203653_ChoreService_1] 
regionserver.HRegionServer: Failed to report region sizes to Master. This will 
be retried.
org.apache.hadoop.hbase.DoNotRetryIOException: /23:16000 is unable to read call 
parameter from client 21; java.lang.UnsupportedOperationException: 
ReportRegionSpaceUse
at sun.reflect.GeneratedConstructorAccessor8.newInstance(Unknown Source)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at 
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
at 
org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:334)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.reportRegionSizesForQuotas(HRegionServer.java:1211)
at 
org.apache.hadoop.hbase.quotas.FileSystemUtilizationChore.reportRegionSizesToMaster(FileSystemUtilizationChore.java:170)
at 
org.apache.hadoop.hbase.quotas.FileSystemUtilizationChore.chore(FileSystemUtilizationChore.java:129)
at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:185)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: 
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.DoNotRetryIOException):
 /23:16000 is unable to read call parameter from client 21; java.lang.UnsupportedOperationException: ReportRegionSpaceUse
at 
org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1225)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
at 
org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.reportRegionSpaceUse(RegionServerStatusProtos.java:10919)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.reportRegionSizesForQuotas(HRegionServer.java:1209)
{code}
HRegionServer.reportRegionSizesForQuotas() should respond to 
UnsupportedOperationException and stop retrying.
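
For illustration only, a minimal sketch of one way the chore could react, assuming we can key off the DoNotRetryIOException message seen above; the class, flag, and reporter interface below are hypothetical, not the committed fix:

{code}
import org.apache.hadoop.hbase.DoNotRetryIOException;

/** Minimal sketch, not the committed fix: detect the unsupported call and stop. */
public class QuotaReportSketch {
  private volatile boolean quotaReportingDisabled = false;   // hypothetical flag

  /** Hypothetical stand-in for the report-region-sizes RPC. */
  interface RegionSizeReporter {
    void report() throws DoNotRetryIOException;
  }

  void reportRegionSizesForQuotas(RegionSizeReporter reporter) {
    if (quotaReportingDisabled) {
      return;   // no point hammering a master without quota support
    }
    try {
      reporter.report();
    } catch (DoNotRetryIOException e) {
      // The exception wraps the remote UnsupportedOperationException message,
      // as in the log above; treat that as "stop retrying this chore".
      if (e.getMessage() != null
          && e.getMessage().contains("UnsupportedOperationException")) {
        quotaReportingDisabled = true;
      }
    }
  }
}
{code}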



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Successful: HBase Generate Website

2017-01-27 Thread Apache Jenkins Server
Build status: Successful

If successful, the website and docs have been generated. To update the live 
site, follow the instructions below. If failed, skip to the bottom of this 
email.

Use the following commands to download the patch and apply it to a clean branch 
based on origin/asf-site. If you prefer to keep the hbase-site repo around 
permanently, you can skip the clone step.

  git clone https://git-wip-us.apache.org/repos/asf/hbase-site.git

  cd hbase-site
  wget -O- https://builds.apache.org/job/hbase_generate_website/472/artifact/website.patch.zip | funzip > 92fc4c0cc8efddae74662ac26c9b821402dd6394.patch
  git fetch
  git checkout -b asf-site-92fc4c0cc8efddae74662ac26c9b821402dd6394 origin/asf-site
  git am --whitespace=fix 92fc4c0cc8efddae74662ac26c9b821402dd6394.patch

At this point, you can preview the changes by opening index.html or any of the 
other HTML pages in your local 
asf-site-92fc4c0cc8efddae74662ac26c9b821402dd6394 branch.

There are lots of spurious changes, such as timestamps and CSS styles in 
tables, so a generic git diff is not very useful. To see a list of files that 
have been added, deleted, renamed, changed type, or are otherwise interesting, 
use the following command:

  git diff --name-status --diff-filter=ADCRTXUB origin/asf-site

To see only files that had 100 or more lines changed:

  git diff --stat origin/asf-site | grep -E '[1-9][0-9]{2,}'

When you are satisfied, publish your changes to origin/asf-site using these 
commands:

  git commit --allow-empty -m "Empty commit" # to work around a current ASF INFRA bug
  git push origin asf-site-92fc4c0cc8efddae74662ac26c9b821402dd6394:asf-site
  git checkout asf-site
  git branch -D asf-site-92fc4c0cc8efddae74662ac26c9b821402dd6394

Changes take a couple of minutes to be propagated. You can verify whether they 
have been propagated by looking at the Last Published date at the bottom of 
http://hbase.apache.org/. It should match the date in the index.html on the 
asf-site branch in Git.

As a courtesy, reply-all to this email to let other committers know you pushed 
the site.



If failed, see https://builds.apache.org/job/hbase_generate_website/472/console

[jira] [Reopened] (HBASE-17526) Procedure v2 - cleanup isSuspended from MasterProcedureScheduler#Queue

2017-01-27 Thread Sean Busbey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey reopened HBASE-17526:
-

Reopening. This commit introduced a findbugs warning that is now pinging every 
patch submission that touches hbase-server. Please include an addendum that 
cleans it up.

One example:

https://builds.apache.org/job/PreCommit-HBASE-Build/5449/artifact/patchprocess/branch-findbugs-hbase-server-warnings.html

Please check on precommit results before pushing.

> Procedure v2 - cleanup isSuspended from MasterProcedureScheduler#Queue
> --
>
> Key: HBASE-17526
> URL: https://issues.apache.org/jira/browse/HBASE-17526
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Appy
>Assignee: Appy
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17526.master.001.patch
>
>
> Queue#setSuspended() is not used anywhere, probably because when a queue 
> waits/wakes on an event, it gets removed from or added back to the fairq.
> Removing state, functions, and uses of isSuspended()



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-17556) The client will not invalidate stale region caches

2017-01-27 Thread Marcin Januszkiewicz (JIRA)
Marcin Januszkiewicz created HBASE-17556:


 Summary: The client will not invalidate stale region caches
 Key: HBASE-17556
 URL: https://issues.apache.org/jira/browse/HBASE-17556
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.98.24, 1.0.0, 2.0.0
Reporter: Marcin Januszkiewicz
Priority: Critical


We noticed in our application that sometimes an operation on a table fails with 
an exception, and all subsequent operations that touch the same region also fail 
until the application is restarted.

It seems that when a merge or split happens on a region that is already in the 
client's cache, and the client is configured to retry operations, there is no 
way for the client to detect this. In RpcRetryingCaller#callWithRetries, if a 
call fails with NotServingRegionException, the cache is cleared only if the 
retry parameter is equal to 1. This means that call will fail but the following 
calls will succeed.

RpcRetryingCaller#callWithoutRetries contains the comment "It would be nice to 
clear the location cache here". Additionally, the stale cache will cause this 
call to fail, even though the data is available.
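
For illustration only, a retry-loop sketch of the behavior the report argues for: drop the cached region location on every NotServingRegionException, not only when the configured retry count is 1. The RegionCall interface and its methods are hypothetical stand-ins, not the actual RpcRetryingCaller code:

{code}
import java.io.IOException;
import org.apache.hadoop.hbase.NotServingRegionException;

/** Illustrative retry-loop sketch, not the actual RpcRetryingCaller code. */
public class RetryCacheSketch {
  /** Hypothetical stand-in for a single-region call plus its cached location. */
  interface RegionCall<T> {
    T call() throws IOException;
    void clearCachedLocation();
  }

  static <T> T callWithRetries(RegionCall<T> call, int maxAttempts) throws IOException {
    IOException last = null;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        return call.call();
      } catch (NotServingRegionException e) {
        // The region moved, split, or merged: the cached location is stale,
        // so invalidate it before the next attempt regardless of the
        // configured retry count.
        call.clearCachedLocation();
        last = e;
      }
    }
    throw last != null ? last : new IOException("no attempts made");
  }
}
{code}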





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-17555) Change calls to deprecated getHBaseAdmin to getAdmin

2017-01-27 Thread Jan Hentschel (JIRA)
Jan Hentschel created HBASE-17555:
-

 Summary: Change calls to deprecated getHBaseAdmin to getAdmin
 Key: HBASE-17555
 URL: https://issues.apache.org/jira/browse/HBASE-17555
 Project: HBase
  Issue Type: Improvement
Reporter: Jan Hentschel
Assignee: Jan Hentschel
Priority: Minor


*HBaseTestingUtil.getHBaseAdmin* is deprecated and was replaced with 
*getAdmin*. Change the calls to *getHBaseAdmin* to *getAdmin* where possible.
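
For illustration only, a minimal before/after sketch of the substitution, assuming the test-utility class in question is HBaseTestingUtility and that getAdmin() throws IOException:

{code}
import java.io.IOException;
import org.apache.hadoop.hbase.HBaseTestingUtility;
import org.apache.hadoop.hbase.client.Admin;

/** Before/after sketch of the substitution described above. */
public class GetAdminSketch {
  static Admin adminFor(HBaseTestingUtility util) throws IOException {
    // Before (deprecated): HBaseAdmin admin = util.getHBaseAdmin();
    // After:
    return util.getAdmin();
  }
}
{code}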



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)