Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-02-26 Thread Mike Drob
dev@ on BCC to prevent additional replies

Please start a new thread for this so it gets the appropriate visibility.
Beta 1 had been out for a while.


Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-02-26 Thread Jean-Marc Spaggiari
Hum. "Broke" my cluster again...

2018-02-26 13:54:44,053 WARN  [ProcExecWrkr-14]
assignment.RegionTransitionProcedure: Retryable error trying to transition:
pid=409, ppid=344, state=RUNNABLE:REGION_TRANSITION_DISPATCH;
UnassignProcedure table=page_crc, region=6d459de812e7ff0a3aff9a6285979a4c,
server=node3.distparser.com,16020,1519665621427; rit=OPENING, location=
node3.distparser.com,16020,1519665621427
org.apache.hadoop.hbase.exceptions.UnexpectedStateException: Expected
[SPLITTING, SPLIT, MERGING, OPEN, CLOSING] so could move to CLOSING but
current state=OPENING
at org.apache.hadoop.hbase.master.assignment.RegionStates$RegionStateNode.transitionState(RegionStates.java:155)
at org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1530)
at org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:179)
at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:309)
at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:85)
at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1456)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1225)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1735)

Is there an easy way to recover from that? Can I just drop the procedure WAL,
or do I have to wipe the table again and transfer it back from the source? :-/

JMS
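[Editor's note: for readers hitting the same WARN, the failing check is a region state-machine guard: a region may only move to CLOSING from a small set of states, and OPENING is not one of them. A minimal sketch of that kind of guard, using the state names from the log above (an illustration only, not the actual HBase `RegionStates` code):

```java
import java.util.EnumSet;

public class TransitionCheck {
    enum State { OFFLINE, OPENING, OPEN, CLOSING, SPLITTING, SPLIT, MERGING }

    // Move to `target` only if the current state is in `allowedFrom`;
    // otherwise raise the same kind of unexpected-state error seen in the log.
    static State transition(State current, State target, EnumSet<State> allowedFrom) {
        if (!allowedFrom.contains(current)) {
            throw new IllegalStateException("Expected " + allowedFrom
                + " so could move to " + target + " but current state=" + current);
        }
        return target;
    }

    public static void main(String[] args) {
        // The set named in the log message: states from which CLOSING is legal.
        EnumSet<State> okForClosing = EnumSet.of(
            State.SPLITTING, State.SPLIT, State.MERGING, State.OPEN, State.CLOSING);
        // OPEN -> CLOSING is allowed...
        System.out.println(transition(State.OPEN, State.CLOSING, okForClosing));
        // ...but OPENING -> CLOSING is not, matching the WARN above.
        try {
            transition(State.OPENING, State.CLOSING, okForClosing);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The procedure retries because the region is expected to eventually leave OPENING; the log entry is the retryable-error path, not a crash.]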


Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-11 Thread Stack
Thanks JMS.
S


Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-11 Thread Jean-Marc Spaggiari
Opened HBASE-19767 
and HBASE-19768.
Regarding the issue creating the log writer: it fails even when the DN is
already declared dead on the NN side...


Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-10 Thread Apekshit Sharma
On Wed, Jan 10, 2018 at 11:25 AM, Zach York 
wrote:

> What is the expectation for flaky tests? I was going to post some test
> failures, but saw that they were included in the excludes for flaky tests.
>
> I understand we might be okay with having flaky tests for this beta-1 (and
> obviously for dev), but I would assume that we want consistent test results
> for the official 2.0.0 release.
>

Yeah, that's the goal, but sadly there aren't many hands on deck working on
that, so it doesn't seem within reach.


> Do we have JIRAs created for all the flaky tests so that we can start
> fixing them before the beta-2/official RCs get put up?
>

Whenever I start working on one, I search for it in JIRA first in case
someone's already working on it; if not, I create a new one (treating JIRA
as a lock to avoid redundant work).
Creating just the JIRAs doesn't really help until someone takes them, so
most just remain open. But maybe it's a chicken-and-egg problem? Might be
good to create JIRAs for a few simple ones to see if people start
contributing on this front?


> I'd be happy to help try to track down the root causes of the flakiness and
> try to fix these problematic tests.
>
Any help here would be great!
Here's a personal thank you :
http://calmcoolcollective.net/wp-content/uploads/2016/08/chocolatechip.jpg
:)



Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-10 Thread Zach York
What is the expectation for flaky tests? I was going to post some test
failures, but saw that they were included in the excludes for flaky tests.

I understand we might be okay with having flaky tests for this beta-1 (and
obviously for dev), but I would assume that we want consistent test results
for the official 2.0.0 release.
Do we have JIRAs created for all the flaky tests so that we can start
fixing them before the beta-2/official RCs get put up?
I'd be happy to help try to track down the root causes of the flakiness and
try to fix these problematic tests.

Thanks,
Zach


Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-10 Thread Stack
Put up a JIRA and dump this stuff in it, JMS. Sounds like we need a bit more
test coverage at least. Thanks sir.
St.Ack

On Wed, Jan 10, 2018 at 2:52 AM, Jean-Marc Spaggiari <
[email protected]> wrote:

> The DN was dead since December 31st... I really hope the DN figured that
> :-/
>
> I will retry with making sure that the NN is aware the local DN is dead,
> and see. I let you know.
>
> Thanks,
>
> JMS

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-10 Thread Jean-Marc Spaggiari
The DN had been dead since December 31st... I really hoped the NN had
figured that out :-/

I will retry after making sure that the NN is aware the local DN is dead,
and see. I'll let you know.

Thanks,

JMS

2018-01-10 5:50 GMT-05:00 张铎(Duo Zhang) :

> The problem may be that the DN is dead, but the NN does not know and keeps
> telling the RS that it should try to connect to it. And for the new
> AsyncFSWAL, we need to connect to all 3 DNs successfully before
> writing actual data to it, so the RS gets stuck...
>
> This may be a problem.
>

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-10 Thread Duo Zhang
The problem may be that the DN is dead, but the NN does not know and keeps
telling the RS that it should try to connect to it. And for the new
AsyncFSWAL, we need to connect to all 3 DNs successfully before
writing actual data to it, so the RS gets stuck...

This may be a problem.
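The failure mode described above, where the WAL writer setup is all-or-nothing, can be sketched as a toy model. This is not HBase's actual API, just an illustration of the fail-fast behavior: every replica in the pipeline must connect before any data is written, and the NameNode keeps handing back the same stale pipeline containing the dead DN, so each retry fails until the region server aborts (matching the "failed, retry = 6 ... after retrying 10 time(s)" lines in the logs). All names and structure here are hypothetical.

```python
class DeadDataNode(Exception):
    """Raised when a simulated connection to a dead datanode is refused."""
    pass

def connect(dn, dead):
    # Simulate opening a connection to a single datanode.
    if dn in dead:
        raise DeadDataNode(dn)
    return "conn->" + dn

def create_wal_writer(pipeline, dead, retries=10):
    """Fail-fast setup: one refused connection fails the whole attempt."""
    for _ in range(retries):
        try:
            # AsyncFSWAL-style behavior: ALL replicas must connect up front.
            return [connect(dn, dead) for dn in pipeline]
        except DeadDataNode:
            continue  # retry, but the NN keeps returning the dead DN
    raise IOError(
        "Failed to create wal log writer after retrying %d time(s)" % retries)

# With a healthy pipeline the writer is created; with one dead DN it never is.
print(create_wal_writer(["node2", "node5"], dead=set()))
try:
    create_wal_writer(["node8", "node2", "node5"], dead={"node8"})
except IOError as e:
    print(e)
```

In this toy model, removing the dead DN from the pipeline (i.e. the NN learning the DN is gone) is the only way the retries can ever succeed, which is consistent with the suggestion to make sure the NN knows the local DN is dead.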

2018-01-10 18:40 GMT+08:00 Jean-Marc Spaggiari :

> You're correct. It was dead. I thought HBase would be able to survive that.
> Likewise, if the DN dies after the RS has started, the RS will fail to close nicely :(
>

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-10 Thread Jean-Marc Spaggiari
You're correct. It was dead. I thought HBase would be able to survive that.
Likewise, if the DN dies after the RS has started, the RS will fail to close nicely :(

2018-01-10 5:38 GMT-05:00 张铎(Duo Zhang) :

> Connection refused? Have you checked the status of the datanode on node8?
>

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-10 Thread Duo Zhang
Connection refused? Have you checked the status of the datanode on node8?


Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-10 Thread Jean-Marc Spaggiari
Oh, interesting! If the local DN is dead, HBase cannot start... I would
have expected it to just use HDFS and any other node... That's why my
HBase was not able to start. Likewise, if the DN dies, HBase will not be
able to stop. Should we not be able to survive one DN failure?

JM
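For contrast, a pipeline policy that survives a single dead DN would look roughly like the sketch below. This is a toy model, not HDFS or HBase code, but it is in the spirit of classic HDFS output-stream pipeline recovery, which drops or replaces an unreachable datanode and keeps writing to the replicas that are still live rather than failing the writer outright. The function name and `min_live` threshold are hypothetical.

```python
def open_pipeline(replicas, dead, min_live=1):
    """Degrade to the live subset of replicas instead of failing outright."""
    live = [dn for dn in replicas if dn not in dead]
    if len(live) < min_live:
        # Only give up when no usable replica remains at all.
        raise IOError("no live replicas available for the WAL")
    return live  # keep writing to whatever is still reachable

# One dead DN: the writer continues on the remaining two replicas.
print(open_pipeline(["node8", "node2", "node5"], dead={"node8"}))
```

Under such a policy the region server on node8 would have kept its WAL on the two remote replicas instead of aborting, which is essentially the behavior being asked about here.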


Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-10 Thread Jean-Marc Spaggiari
I know this one sank, but I'm still running it on my cluster, so here is a
new issue I just got.

Any idea what this can be? I see this on only one of my nodes...

2018-01-10 05:22:55,786 WARN  [regionserver/node8.com/192.168.23.2:16020]
wal.AsyncFSWAL: create wal log writer hdfs://
node2.com:8020/hbase/WALs/node8.com,16020,1515579724994/node8.com%2C16020%2C1515579724994.1515579743134
failed, retry = 6
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
syscall:getsockopt(..) failed: Connexion refusée: /192.168.23.2:50010
at
org.apache.hbase.thirdparty.io.netty.channel.unix.Socket.finishConnect(..)(Unknown
Source)
Caused by:
org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeConnectException:
syscall:getsockopt(..) failed: Connexion refusée
... 1 more


From the same node, if I ls while the RS is starting, I can see the
related directory:


hbase@node8:~/hbase-2.0.0-beta-1/logs$ /home/hadoop/hadoop-2.7.5/bin/hdfs
dfs -ls /hbase/WALs/
Found 35 items
...
drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
node1.com,16020,1515579724884
drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
node3.com,16020,1515579738916
drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
node4.com,16020,1515579717193
drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
node5.com,16020,1515579724586
drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
node6.com,16020,1515579724999
drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
node7.com,16020,1515579725681
drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:23 /hbase/WALs/
node8.com,16020,1515579724994



and after the RS tries many times and fails the directory is gone:
hbase@node8:~/hbase-2.0.0-beta-1/logs$ /home/hadoop/hadoop-2.7.5/bin/hdfs
dfs -ls /hbase/WALs/
Found 34 items
...
drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
node1.com,16020,1515579724884
drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
node3.com,16020,1515579738916
drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
node4.com,16020,1515579717193
drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
node5.com,16020,1515579724586
drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
node6.com,16020,1515579724999
drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
node7.com,16020,1515579725681




2018-01-10 05:23:46,177 ERROR [regionserver/node8.com/192.168.23.2:16020]
regionserver.HRegionServer: * ABORTING region server
node8.com,16020,1515579724994:
Unhandled: Failed to create wal log writer hdfs://
node2.com:8020/hbase/WALs/node8.com,16020,1515579724994/node8.com%2C16020%2C1515579724994.1515579743134
after retrying 10 time(s) *
java.io.IOException: Failed to create wal log writer hdfs://
node2.com:8020/hbase/WALs/node8.com,16020,1515579724994/node8.com%2C16020%2C1515579724994.1515579743134
after retrying 10 time(s)
at
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:663)
at
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:130)
at
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:766)
at
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:504)
at
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.(AsyncFSWAL.java:264)
at
org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(AsyncFSWALProvider.java:69)
at
org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(AsyncFSWALProvider.java:44)
at
org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:139)
at
org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:55)
at org.apache.hadoop.hbase.wal.WALFactory.getWAL(WALFactory.java:244)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.getWAL(HRegionServer.java:2123)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.buildServerLoad(HRegionServer.java:1315)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:1196)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1008)
at java.lang.Thread.run(Thread.java:748)
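
The "Failed to create wal log writer ... after retrying 10 time(s)" abort above comes from a bounded-retry loop around WAL writer creation. A minimal, self-contained sketch of that retry pattern (class and method names such as `withRetries` are hypothetical illustrations, not HBase's actual implementation):

```java
import java.io.IOException;

public class RetryingWriterFactory {

    interface WriterSupplier<T> {
        T create() throws IOException;
    }

    // Try the supplier up to maxRetries times; if every attempt fails,
    // give up with an IOException carrying the last cause.
    static <T> T withRetries(WriterSupplier<T> supplier, int maxRetries)
            throws IOException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                return supplier.create();
            } catch (IOException e) {
                last = e; // e.g. a transient DataNode/pipeline failure
            }
        }
        throw new IOException("Failed to create wal log writer after retrying "
            + maxRetries + " time(s)", last);
    }

    public static void main(String[] args) throws IOException {
        final int[] attempts = {0};
        // Simulated writer creation that fails twice, then succeeds.
        String writer = withRetries(() -> {
            if (++attempts[0] < 3) {
                throw new IOException("transient failure");
            }
            return "writer";
        }, 10);
        System.out.println("created " + writer + " on attempt " + attempts[0]);
    }
}
```

In the log above all ten attempts failed, so the region server deliberately aborts rather than run without a write-ahead log.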


...


2018-01-10 05:23:46,324 INFO  [regionserver/node8.com/192.168.23.2:16020]
regionserver.HRegionServer: regionserver/node8.com/192.168.23.2:16020
exiting
2018-01-10 05:23:46,324 ERROR [main] regionserver.HRegionServerCommandLine:
Region server exiting
java.lang.RuntimeException: HRegionServer Aborted
at
org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:66)
at
org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:85)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at
org.apache.hadoop.hbase.util.ServerCommandLine

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-09 Thread Stack
On Tue, Jan 9, 2018 at 10:07 AM, Andrew Purtell  wrote:

> I just vetoed the RC because TestMemstoreLABWithoutPool always fails for
> me. It was the same with the last RC too. My Java is Oracle Java 8u144
> running on x64 Linux (Ubuntu xenial). Let me know if you need me to provide
> the test output.
>
>
Ok. I can't make it fail. I'm going to disable it and file an issue where
we can work on figuring out what is different here.

Thanks A,

St.Ack



>
> On Tue, Jan 9, 2018 at 9:31 AM, Stack  wrote:
>
> > I put up a new RC JMS. It still has flakies (though Duo fixed
> > TestFromClientSide...). Was thinking that we could release beta-1 though
> it
> > has flakies. We'll keep working on cutting these down as we approach GA.
> > St.Ack
> >
> > On Sun, Jan 7, 2018 at 10:02 PM, Stack  wrote:
> >
> > > On Sun, Jan 7, 2018 at 3:14 AM, Jean-Marc Spaggiari <
> > > [email protected]> wrote:
> > >
> > >> Ok, thanks Stack. I will keep it running all day long until I get a
> > >> successful one. Is that useful that I report all the failed? Or just a
> > >> wast
> > >> of time? Here is the last failed:
> > >>
> > >> [INFO] Results:
> > >> [INFO]
> > >> [ERROR] Failures:
> > >> [ERROR]   TestFromClientSide.testCheckAndDeleteWithCompareOp:4982
> > >> expected: but was:
> > >> [ERROR] Errors:
> > >> [ERROR]   TestDLSAsyncFSWAL>AbstractTestDLS.testThreeRSAbort:401 »
> > >> TableNotFound Region ...
> > >> [INFO]
> > >> [ERROR] Tests run: 3585, Failures: 1, Errors: 1, Skipped: 44
> > >> [INFO]
> > >>
> > >>
> > >>
> > > Thanks for bringing up flakies. If we look at the nightlies' run, we
> can
> > > get the current list. Probably no harm if all tests pass once in a
> while
> > > (smile).
> > >
> > > Looking at your findings, TestFromClientSide.
> > testCheckAndDeleteWithCompareOp
> > > looks to be new to beta-1. Its a cranky one. I'm looking at it. Might
> > punt
> > > to beta-2 if can't figure it by tomorrow. HBASE-19731.
> > >
> > > TestDLSAsyncFSWAL is a flakey that unfortunately passes locally.
> > >
> > > Let me see what others we have...
> > >
> > > S
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >> JMS
> > >>
> > >> 2018-01-07 1:55 GMT-05:00 Apekshit Sharma :
> > >>
> > >> > bq. Don't you think we have enough branches already mighty Appy?
> > >> > Yeah we do...sigh.
> > >> >
> > >> >
> > >> > idk about that. But don't we need a *patch* branch branch-2.0 (just
> > like
> > >> > branch-1.4) where we "make backwards-compatible bug fixes" and a
> > *minor*
> > >> > branch branch-2 where we "add functionality in a
> backwards-compatible
> > >> > manner".
> > >> > Quotes are from http://hbase.apache.org/book.h
> > >> tml#hbase.versioning.post10.
> > >> > I stumbled on this issue when thinking about backporting
> > >> > https://issues.apache.org/jira/browse/HBASE-17436 for 2.1.
> > >> >
> > >> > -- Appy
> > >> >
> > >> >
> > >> > On Sat, Jan 6, 2018 at 4:11 PM, stack  wrote:
> > >> >
> > >> > > It is not you.  There are a bunch of flies we need to fix. This
> > >> latter is
> > >> > > for sure flakey.  Let me take a look. Thanks, JMS.
> > >> > >
> > >> > > S
> > >> > >
> > >> > > On Jan 6, 2018 5:57 PM, "Jean-Marc Spaggiari" <
> > >> [email protected]>
> > >> > > wrote:
> > >> > >
> > >> > > I might not doing the right magic to get that run If someone
> is
> > >> able
> > >> > to
> > >> > > get all the tests pass, can you please share the command you run?
> > >> > >
> > >> > > Thanks,
> > >> > >
> > >> > > JMS
> > >> > >
> > >> > >
> > >> > > [INFO] Results:
> > >> > > [INFO]
> > >> > > [ERROR] Failures:
> > >> > > [ERROR]   TestFromClientSide.testCheckAndDeleteWithCompareOp:4982
> > >> > > expected: but was:
> > >> > > [ERROR]
> > >> > > org.apache.hadoop.hbase.master.assignment.TestMergeTableRegi
> > >> onsProcedure
> > >> > > .testMergeRegionsConcurrently(org.apache.hadoop.hbase.
> master.assig
> > >> > > nment.TestMergeTableRegionsProcedure)
> > >> > > [ERROR]   Run 1:
> > >> > > TestMergeTableRegionsProcedure.setup:111->resetProcExecutorT
> > >> estingKillFl
> > >> > > ag:138
> > >> > > expected executor to be running
> > >> > > [ERROR]   Run 2:
> > >> > > TestMergeTableRegionsProcedure.tearDown:128->
> > >> > > resetProcExecutorTestingKillFl
> > >> > > ag:138
> > >> > > expected executor to be running
> > >> > > [INFO]
> > >> > > [ERROR]
> > >> > > org.apache.hadoop.hbase.master.assignment.TestMergeTableRegi
> > >> onsProcedure
> > >> > > .testMergeTwoRegions(org.apache.hadoop.hbase.master.
> assignment.Tes
> > >> > > tMergeTableRegionsProcedure)
> > >> > > [ERROR]   Run 1:
> > >> > > TestMergeTableRegionsProcedure.setup:111->resetProcExecutorT
> > >> estingKillFl
> > >> > > ag:138
> > >> > > expected executor to be running
> > >> > > [ERROR]   Run 2:
> > >> > > TestMergeTableRegionsProcedure.tearDown:128->
> > >> > > resetProcExecutorTestingKillFl
> > >> > > ag:138
> > >> > > expected executor to be running
> > >> > > [INFO]
> > >> > > [ERROR]
> > >> > > org.apache.hadoop.hbase.master.assignment.TestMe

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-09 Thread Andrew Purtell
I just vetoed the RC because TestMemstoreLABWithoutPool always fails for
me. It was the same with the last RC too. My Java is Oracle Java 8u144
running on x64 Linux (Ubuntu xenial). Let me know if you need me to provide
the test output.


On Tue, Jan 9, 2018 at 9:31 AM, Stack  wrote:

> I put up a new RC JMS. It still has flakies (though Duo fixed
> TestFromClientSide...). Was thinking that we could release beta-1 though it
> has flakies. We'll keep working on cutting these down as we approach GA.
> St.Ack
>
> On Sun, Jan 7, 2018 at 10:02 PM, Stack  wrote:
>
> > On Sun, Jan 7, 2018 at 3:14 AM, Jean-Marc Spaggiari <
> > [email protected]> wrote:
> >
> >> Ok, thanks Stack. I will keep it running all day long until I get a
> >> successful one. Is that useful that I report all the failed? Or just a
> >> wast
> >> of time? Here is the last failed:
> >>
> >> [INFO] Results:
> >> [INFO]
> >> [ERROR] Failures:
> >> [ERROR]   TestFromClientSide.testCheckAndDeleteWithCompareOp:4982
> >> expected: but was:
> >> [ERROR] Errors:
> >> [ERROR]   TestDLSAsyncFSWAL>AbstractTestDLS.testThreeRSAbort:401 »
> >> TableNotFound Region ...
> >> [INFO]
> >> [ERROR] Tests run: 3585, Failures: 1, Errors: 1, Skipped: 44
> >> [INFO]
> >>
> >>
> >>
> > Thanks for bringing up flakies. If we look at the nightlies' run, we can
> > get the current list. Probably no harm if all tests pass once in a while
> > (smile).
> >
> > Looking at your findings, TestFromClientSide.
> testCheckAndDeleteWithCompareOp
> > looks to be new to beta-1. Its a cranky one. I'm looking at it. Might
> punt
> > to beta-2 if can't figure it by tomorrow. HBASE-19731.
> >
> > TestDLSAsyncFSWAL is a flakey that unfortunately passes locally.
> >
> > Let me see what others we have...
> >
> > S
> >
> >
> >
> >
> >
> >
> >
> >
> >> JMS
> >>
> >> 2018-01-07 1:55 GMT-05:00 Apekshit Sharma :
> >>
> >> > bq. Don't you think we have enough branches already mighty Appy?
> >> > Yeah we do...sigh.
> >> >
> >> >
> >> > idk about that. But don't we need a *patch* branch branch-2.0 (just
> like
> >> > branch-1.4) where we "make backwards-compatible bug fixes" and a
> *minor*
> >> > branch branch-2 where we "add functionality in a backwards-compatible
> >> > manner".
> >> > Quotes are from http://hbase.apache.org/book.h
> >> tml#hbase.versioning.post10.
> >> > I stumbled on this issue when thinking about backporting
> >> > https://issues.apache.org/jira/browse/HBASE-17436 for 2.1.
> >> >
> >> > -- Appy
> >> >
> >> >
> >> > On Sat, Jan 6, 2018 at 4:11 PM, stack  wrote:
> >> >
> >> > > It is not you.  There are a bunch of flies we need to fix. This
> >> latter is
> >> > > for sure flakey.  Let me take a look. Thanks, JMS.
> >> > >
> >> > > S
> >> > >
> >> > > On Jan 6, 2018 5:57 PM, "Jean-Marc Spaggiari" <
> >> [email protected]>
> >> > > wrote:
> >> > >
> >> > > I might not doing the right magic to get that run If someone is
> >> able
> >> > to
> >> > > get all the tests pass, can you please share the command you run?
> >> > >
> >> > > Thanks,
> >> > >
> >> > > JMS
> >> > >
> >> > >
> >> > > [INFO] Results:
> >> > > [INFO]
> >> > > [ERROR] Failures:
> >> > > [ERROR]   TestFromClientSide.testCheckAndDeleteWithCompareOp:4982
> >> > > expected: but was:
> >> > > [ERROR]
> >> > > org.apache.hadoop.hbase.master.assignment.TestMergeTableRegi
> >> onsProcedure
> >> > > .testMergeRegionsConcurrently(org.apache.hadoop.hbase.master.assig
> >> > > nment.TestMergeTableRegionsProcedure)
> >> > > [ERROR]   Run 1:
> >> > > TestMergeTableRegionsProcedure.setup:111->resetProcExecutorT
> >> estingKillFl
> >> > > ag:138
> >> > > expected executor to be running
> >> > > [ERROR]   Run 2:
> >> > > TestMergeTableRegionsProcedure.tearDown:128->
> >> > > resetProcExecutorTestingKillFl
> >> > > ag:138
> >> > > expected executor to be running
> >> > > [INFO]
> >> > > [ERROR]
> >> > > org.apache.hadoop.hbase.master.assignment.TestMergeTableRegi
> >> onsProcedure
> >> > > .testMergeTwoRegions(org.apache.hadoop.hbase.master.assignment.Tes
> >> > > tMergeTableRegionsProcedure)
> >> > > [ERROR]   Run 1:
> >> > > TestMergeTableRegionsProcedure.setup:111->resetProcExecutorT
> >> estingKillFl
> >> > > ag:138
> >> > > expected executor to be running
> >> > > [ERROR]   Run 2:
> >> > > TestMergeTableRegionsProcedure.tearDown:128->
> >> > > resetProcExecutorTestingKillFl
> >> > > ag:138
> >> > > expected executor to be running
> >> > > [INFO]
> >> > > [ERROR]
> >> > > org.apache.hadoop.hbase.master.assignment.TestMergeTableRegi
> >> onsProcedure
> >> > .
> >> > > testRecoveryAndDoubleExecution(org.apache.hadoop.hbase.master.ass
> >> > > ignment.TestMergeTableRegionsProcedure)
> >> > > [ERROR]   Run 1:
> >> > > TestMergeTableRegionsProcedure.setup:111->resetProcExecutorT
> >> estingKillFl
> >> > > ag:138
> >> > > expected executor to be running
> >> > > [ERROR]   Run 2:
> >> > > TestMergeTableRegionsProcedure.tearDown:128->
> >> > > resetProcExecutorTestingKillFl
> >> > > ag:138
> >> > > expected executor to 

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-09 Thread Stack
I put up a new RC JMS. It still has flakies (though Duo fixed
TestFromClientSide...). Was thinking that we could release beta-1 though it
has flakies. We'll keep working on cutting these down as we approach GA.
St.Ack

On Sun, Jan 7, 2018 at 10:02 PM, Stack  wrote:

> On Sun, Jan 7, 2018 at 3:14 AM, Jean-Marc Spaggiari <
> [email protected]> wrote:
>
>> Ok, thanks Stack. I will keep it running all day long until I get a
>> successful one. Is that useful that I report all the failed? Or just a
>> wast
>> of time? Here is the last failed:
>>
>> [INFO] Results:
>> [INFO]
>> [ERROR] Failures:
>> [ERROR]   TestFromClientSide.testCheckAndDeleteWithCompareOp:4982
>> expected: but was:
>> [ERROR] Errors:
>> [ERROR]   TestDLSAsyncFSWAL>AbstractTestDLS.testThreeRSAbort:401 »
>> TableNotFound Region ...
>> [INFO]
>> [ERROR] Tests run: 3585, Failures: 1, Errors: 1, Skipped: 44
>> [INFO]
>>
>>
>>
> Thanks for bringing up flakies. If we look at the nightlies' run, we can
> get the current list. Probably no harm if all tests pass once in a while
> (smile).
>
> Looking at your findings, TestFromClientSide.testCheckAndDeleteWithCompareOp
> looks to be new to beta-1. Its a cranky one. I'm looking at it. Might punt
> to beta-2 if can't figure it by tomorrow. HBASE-19731.
>
> TestDLSAsyncFSWAL is a flakey that unfortunately passes locally.
>
> Let me see what others we have...
>
> S
>
>
>
>
>
>
>
>
>> JMS
>>
>> 2018-01-07 1:55 GMT-05:00 Apekshit Sharma :
>>
>> > bq. Don't you think we have enough branches already mighty Appy?
>> > Yeah we do...sigh.
>> >
>> >
>> > idk about that. But don't we need a *patch* branch branch-2.0 (just like
>> > branch-1.4) where we "make backwards-compatible bug fixes" and a *minor*
>> > branch branch-2 where we "add functionality in a backwards-compatible
>> > manner".
>> > Quotes are from http://hbase.apache.org/book.h
>> tml#hbase.versioning.post10.
>> > I stumbled on this issue when thinking about backporting
>> > https://issues.apache.org/jira/browse/HBASE-17436 for 2.1.
>> >
>> > -- Appy
>> >
>> >
>> > On Sat, Jan 6, 2018 at 4:11 PM, stack  wrote:
>> >
>> > > It is not you.  There are a bunch of flies we need to fix. This
>> latter is
>> > > for sure flakey.  Let me take a look. Thanks, JMS.
>> > >
>> > > S
>> > >
>> > > On Jan 6, 2018 5:57 PM, "Jean-Marc Spaggiari" <
>> [email protected]>
>> > > wrote:
>> > >
>> > > I might not doing the right magic to get that run If someone is
>> able
>> > to
>> > > get all the tests pass, can you please share the command you run?
>> > >
>> > > Thanks,
>> > >
>> > > JMS
>> > >
>> > >
>> > > [INFO] Results:
>> > > [INFO]
>> > > [ERROR] Failures:
>> > > [ERROR]   TestFromClientSide.testCheckAndDeleteWithCompareOp:4982
>> > > expected: but was:
>> > > [ERROR]
>> > > org.apache.hadoop.hbase.master.assignment.TestMergeTableRegi
>> onsProcedure
>> > > .testMergeRegionsConcurrently(org.apache.hadoop.hbase.master.assig
>> > > nment.TestMergeTableRegionsProcedure)
>> > > [ERROR]   Run 1:
>> > > TestMergeTableRegionsProcedure.setup:111->resetProcExecutorT
>> estingKillFl
>> > > ag:138
>> > > expected executor to be running
>> > > [ERROR]   Run 2:
>> > > TestMergeTableRegionsProcedure.tearDown:128->
>> > > resetProcExecutorTestingKillFl
>> > > ag:138
>> > > expected executor to be running
>> > > [INFO]
>> > > [ERROR]
>> > > org.apache.hadoop.hbase.master.assignment.TestMergeTableRegi
>> onsProcedure
>> > > .testMergeTwoRegions(org.apache.hadoop.hbase.master.assignment.Tes
>> > > tMergeTableRegionsProcedure)
>> > > [ERROR]   Run 1:
>> > > TestMergeTableRegionsProcedure.setup:111->resetProcExecutorT
>> estingKillFl
>> > > ag:138
>> > > expected executor to be running
>> > > [ERROR]   Run 2:
>> > > TestMergeTableRegionsProcedure.tearDown:128->
>> > > resetProcExecutorTestingKillFl
>> > > ag:138
>> > > expected executor to be running
>> > > [INFO]
>> > > [ERROR]
>> > > org.apache.hadoop.hbase.master.assignment.TestMergeTableRegi
>> onsProcedure
>> > .
>> > > testRecoveryAndDoubleExecution(org.apache.hadoop.hbase.master.ass
>> > > ignment.TestMergeTableRegionsProcedure)
>> > > [ERROR]   Run 1:
>> > > TestMergeTableRegionsProcedure.setup:111->resetProcExecutorT
>> estingKillFl
>> > > ag:138
>> > > expected executor to be running
>> > > [ERROR]   Run 2:
>> > > TestMergeTableRegionsProcedure.tearDown:128->
>> > > resetProcExecutorTestingKillFl
>> > > ag:138
>> > > expected executor to be running
>> > > [INFO]
>> > > [ERROR]
>> > > org.apache.hadoop.hbase.master.assignment.TestMergeTableRegi
>> onsProcedure
>> > .
>> > > testRollbackAndDoubleExecution(org.apache.hadoop.hbase.master.ass
>> > > ignment.TestMergeTableRegionsProcedure)
>> > > [ERROR]   Run 1:
>> > > TestMergeTableRegionsProcedure.testRollbackAndDoubleExecution:272
>> > > expected: but was:
>> > > [ERROR]   Run 2:
>> > > TestMergeTableRegionsProcedure.tearDown:128->
>> > > resetProcExecutorTestingKillFl
>> > > ag:138
>> > > expected executor to be running
>> > > [INFO]
>> > > [ERROR]   TestSnapsh

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-07 Thread Stack
On Sun, Jan 7, 2018 at 3:14 AM, Jean-Marc Spaggiari  wrote:

> Ok, thanks Stack. I will keep it running all day long until I get a
> successful one. Is it useful that I report all the failures? Or just a waste
> of time? Here is the last failed run:
>
> [INFO] Results:
> [INFO]
> [ERROR] Failures:
> [ERROR]   TestFromClientSide.testCheckAndDeleteWithCompareOp:4982
> expected: but was:
> [ERROR] Errors:
> [ERROR]   TestDLSAsyncFSWAL>AbstractTestDLS.testThreeRSAbort:401 »
> TableNotFound Region ...
> [INFO]
> [ERROR] Tests run: 3585, Failures: 1, Errors: 1, Skipped: 44
> [INFO]
>
>
>
Thanks for bringing up flakies. If we look at the nightlies' run, we can
get the current list. Probably no harm if all tests pass once in a while
(smile).

Looking at your findings, TestFromClientSide.testCheckAndDeleteWithCompareOp
looks to be new to beta-1. It's a cranky one. I'm looking at it. Might punt
to beta-2 if I can't figure it out by tomorrow. HBASE-19731.

TestDLSAsyncFSWAL is a flakey that unfortunately passes locally.

Let me see what others we have...

S








> JMS
>
> 2018-01-07 1:55 GMT-05:00 Apekshit Sharma :
>
> > bq. Don't you think we have enough branches already mighty Appy?
> > Yeah we do...sigh.
> >
> >
> > idk about that. But don't we need a *patch* branch branch-2.0 (just like
> > branch-1.4) where we "make backwards-compatible bug fixes" and a *minor*
> > branch branch-2 where we "add functionality in a backwards-compatible
> > manner".
> > Quotes are from http://hbase.apache.org/book.
> html#hbase.versioning.post10.
> > I stumbled on this issue when thinking about backporting
> > https://issues.apache.org/jira/browse/HBASE-17436 for 2.1.
> >
> > -- Appy
> >
> >
> > On Sat, Jan 6, 2018 at 4:11 PM, stack  wrote:
> >
> > > It is not you.  There are a bunch of flies we need to fix. This latter
> is
> > > for sure flakey.  Let me take a look. Thanks, JMS.
> > >
> > > S
> > >
> > > On Jan 6, 2018 5:57 PM, "Jean-Marc Spaggiari"  >
> > > wrote:
> > >
> > > I might not doing the right magic to get that run If someone is
> able
> > to
> > > get all the tests pass, can you please share the command you run?
> > >
> > > Thanks,
> > >
> > > JMS
> > >
> > >
> > > [INFO] Results:
> > > [INFO]
> > > [ERROR] Failures:
> > > [ERROR]   TestFromClientSide.testCheckAndDeleteWithCompareOp:4982
> > > expected: but was:
> > > [ERROR]
> > > org.apache.hadoop.hbase.master.assignment.
> TestMergeTableRegionsProcedure
> > > .testMergeRegionsConcurrently(org.apache.hadoop.hbase.master.assig
> > > nment.TestMergeTableRegionsProcedure)
> > > [ERROR]   Run 1:
> > > TestMergeTableRegionsProcedure.setup:111->
> resetProcExecutorTestingKillFl
> > > ag:138
> > > expected executor to be running
> > > [ERROR]   Run 2:
> > > TestMergeTableRegionsProcedure.tearDown:128->
> > > resetProcExecutorTestingKillFl
> > > ag:138
> > > expected executor to be running
> > > [INFO]
> > > [ERROR]
> > > org.apache.hadoop.hbase.master.assignment.
> TestMergeTableRegionsProcedure
> > > .testMergeTwoRegions(org.apache.hadoop.hbase.master.assignment.Tes
> > > tMergeTableRegionsProcedure)
> > > [ERROR]   Run 1:
> > > TestMergeTableRegionsProcedure.setup:111->
> resetProcExecutorTestingKillFl
> > > ag:138
> > > expected executor to be running
> > > [ERROR]   Run 2:
> > > TestMergeTableRegionsProcedure.tearDown:128->
> > > resetProcExecutorTestingKillFl
> > > ag:138
> > > expected executor to be running
> > > [INFO]
> > > [ERROR]
> > > org.apache.hadoop.hbase.master.assignment.
> TestMergeTableRegionsProcedure
> > .
> > > testRecoveryAndDoubleExecution(org.apache.hadoop.hbase.master.ass
> > > ignment.TestMergeTableRegionsProcedure)
> > > [ERROR]   Run 1:
> > > TestMergeTableRegionsProcedure.setup:111->
> resetProcExecutorTestingKillFl
> > > ag:138
> > > expected executor to be running
> > > [ERROR]   Run 2:
> > > TestMergeTableRegionsProcedure.tearDown:128->
> > > resetProcExecutorTestingKillFl
> > > ag:138
> > > expected executor to be running
> > > [INFO]
> > > [ERROR]
> > > org.apache.hadoop.hbase.master.assignment.
> TestMergeTableRegionsProcedure
> > .
> > > testRollbackAndDoubleExecution(org.apache.hadoop.hbase.master.ass
> > > ignment.TestMergeTableRegionsProcedure)
> > > [ERROR]   Run 1:
> > > TestMergeTableRegionsProcedure.testRollbackAndDoubleExecution:272
> > > expected: but was:
> > > [ERROR]   Run 2:
> > > TestMergeTableRegionsProcedure.tearDown:128->
> > > resetProcExecutorTestingKillFl
> > > ag:138
> > > expected executor to be running
> > > [INFO]
> > > [ERROR]   TestSnapshotQuotaObserverChore.testSnapshotSize:276 Waiting
> > > timed
> > > out after [30 000] msec
> > > [ERROR]
> > >  TestHRegionWithInMemoryFlush>TestHRegion.testWritesWhileScanning:3813
> > > expected null, but was: > NotServingRegionException:
> > > testWritesWhileScanning,,1515277468063.468265483817cb6da632026ba5b306
> f6.
> > > is
> > > closing>
> > > [ERROR] Errors:
> > > [ERROR]   TestDLSAsyncFSWAL>AbstractTestDLS.testThreeRSAbort:401 »
> > > TableNotFound testThr...
> > > [ERROR]

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-07 Thread Duo Zhang
Hope we still have time to get the procedure-based replication peer
modification (HBASE-19397) in before cutting branch-2.0... :(

2018-01-07 21:51 GMT+08:00 Stack :

> On Sun, Jan 7, 2018 at 12:55 AM, Apekshit Sharma 
> wrote:
>
> > bq. Don't you think we have enough branches already mighty Appy?
> > Yeah we do...sigh.
> >
> >
> > idk about that. But don't we need a *patch* branch branch-2.0 (just like
> > branch-1.4) where we "make backwards-compatible bug fixes" and a *minor*
> > branch branch-2 where we "add functionality in a backwards-compatible
> > manner".
> > Quotes are from http://hbase.apache.org/book.
> html#hbase.versioning.post10.
> > I stumbled on this issue when thinking about backporting
> > https://issues.apache.org/jira/browse/HBASE-17436 for 2.1.
> >
> >
> Yes. Was trying to put this off as long as possible -- to avoid yet more
> banches that we need to backport to -- but looks like its coming time.
> St.Ack
>
>
>
> > -- Appy
> >
> >
> > On Sat, Jan 6, 2018 at 4:11 PM, stack  wrote:
> >
> > > It is not you.  There are a bunch of flies we need to fix. This latter
> is
> > > for sure flakey.  Let me take a look. Thanks, JMS.
> > >
> > > S
> > >
> > > On Jan 6, 2018 5:57 PM, "Jean-Marc Spaggiari"  >
> > > wrote:
> > >
> > > I might not doing the right magic to get that run If someone is
> able
> > to
> > > get all the tests pass, can you please share the command you run?
> > >
> > > Thanks,
> > >
> > > JMS
> > >
> > >
> > > [INFO] Results:
> > > [INFO]
> > > [ERROR] Failures:
> > > [ERROR]   TestFromClientSide.testCheckAndDeleteWithCompareOp:4982
> > > expected: but was:
> > > [ERROR]
> > > org.apache.hadoop.hbase.master.assignment.
> TestMergeTableRegionsProcedure
> > > .testMergeRegionsConcurrently(org.apache.hadoop.hbase.master.assig
> > > nment.TestMergeTableRegionsProcedure)
> > > [ERROR]   Run 1:
> > > TestMergeTableRegionsProcedure.setup:111->
> resetProcExecutorTestingKillFl
> > > ag:138
> > > expected executor to be running
> > > [ERROR]   Run 2:
> > > TestMergeTableRegionsProcedure.tearDown:128->
> > > resetProcExecutorTestingKillFl
> > > ag:138
> > > expected executor to be running
> > > [INFO]
> > > [ERROR]
> > > org.apache.hadoop.hbase.master.assignment.
> TestMergeTableRegionsProcedure
> > > .testMergeTwoRegions(org.apache.hadoop.hbase.master.assignment.Tes
> > > tMergeTableRegionsProcedure)
> > > [ERROR]   Run 1:
> > > TestMergeTableRegionsProcedure.setup:111->
> resetProcExecutorTestingKillFl
> > > ag:138
> > > expected executor to be running
> > > [ERROR]   Run 2:
> > > TestMergeTableRegionsProcedure.tearDown:128->
> > > resetProcExecutorTestingKillFl
> > > ag:138
> > > expected executor to be running
> > > [INFO]
> > > [ERROR]
> > > org.apache.hadoop.hbase.master.assignment.
> TestMergeTableRegionsProcedure
> > .
> > > testRecoveryAndDoubleExecution(org.apache.hadoop.hbase.master.ass
> > > ignment.TestMergeTableRegionsProcedure)
> > > [ERROR]   Run 1:
> > > TestMergeTableRegionsProcedure.setup:111->
> resetProcExecutorTestingKillFl
> > > ag:138
> > > expected executor to be running
> > > [ERROR]   Run 2:
> > > TestMergeTableRegionsProcedure.tearDown:128->
> > > resetProcExecutorTestingKillFl
> > > ag:138
> > > expected executor to be running
> > > [INFO]
> > > [ERROR]
> > > org.apache.hadoop.hbase.master.assignment.
> TestMergeTableRegionsProcedure
> > .
> > > testRollbackAndDoubleExecution(org.apache.hadoop.hbase.master.ass
> > > ignment.TestMergeTableRegionsProcedure)
> > > [ERROR]   Run 1:
> > > TestMergeTableRegionsProcedure.testRollbackAndDoubleExecution:272
> > > expected: but was:
> > > [ERROR]   Run 2:
> > > TestMergeTableRegionsProcedure.tearDown:128->
> > > resetProcExecutorTestingKillFl
> > > ag:138
> > > expected executor to be running
> > > [INFO]
> > > [ERROR]   TestSnapshotQuotaObserverChore.testSnapshotSize:276 Waiting
> > > timed
> > > out after [30 000] msec
> > > [ERROR]
> > >  TestHRegionWithInMemoryFlush>TestHRegion.testWritesWhileScanning:3813
> > > expected null, but was: > NotServingRegionException:
> > > testWritesWhileScanning,,1515277468063.468265483817cb6da632026ba5b306
> f6.
> > > is
> > > closing>
> > > [ERROR] Errors:
> > > [ERROR]   TestDLSAsyncFSWAL>AbstractTestDLS.testThreeRSAbort:401 »
> > > TableNotFound testThr...
> > > [ERROR]
> > > org.apache.hadoop.hbase.master.balancer.TestRegionsOnMasterOptions.
> > > testRegionsOnAllServers(org.apache.hadoop.hbase.master.balancer.
> > > TestRegionsOnMasterOptions)
> > > [ERROR]   Run 1:
> > > TestRegionsOnMasterOptions.testRegionsOnAllServers:94->
> > > checkBalance:207->Object.wait:-2
> > > » TestTimedOut
> > > [ERROR]   Run 2: TestRegionsOnMasterOptions.testRegionsOnAllServers »
> > > Appears to be stuck in t...
> > > [INFO]
> > > [INFO]
> > > [ERROR] Tests run: 3604, Failures: 7, Errors: 2, Skipped: 44
> > > [INFO]
> > >
> > >
> > > 2018-01-06 15:52 GMT-05:00 Jean-Marc Spaggiari <
> [email protected]
> > >:
> > >
> > > > Deleted the class to get all the tests running. Was running on t

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-07 Thread Stack
On Sun, Jan 7, 2018 at 12:55 AM, Apekshit Sharma  wrote:

> bq. Don't you think we have enough branches already mighty Appy?
> Yeah we do...sigh.
>
>
> idk about that. But don't we need a *patch* branch branch-2.0 (just like
> branch-1.4) where we "make backwards-compatible bug fixes" and a *minor*
> branch branch-2 where we "add functionality in a backwards-compatible
> manner".
> Quotes are from http://hbase.apache.org/book.html#hbase.versioning.post10.
> I stumbled on this issue when thinking about backporting
> https://issues.apache.org/jira/browse/HBASE-17436 for 2.1.
>
>
Yes. Was trying to put this off as long as possible -- to avoid yet more
branches that we need to backport to -- but it looks like it's coming time.
St.Ack



> -- Appy
>
>
> On Sat, Jan 6, 2018 at 4:11 PM, stack  wrote:
>
> > It is not you.  There are a bunch of flakies we need to fix. This latter is
> > for sure flakey.  Let me take a look. Thanks, JMS.
> >
> > S
> >
> > On Jan 6, 2018 5:57 PM, "Jean-Marc Spaggiari" 
> > wrote:
> >
> > I might not be doing the right magic to get that run... If someone is able
> to
> > get all the tests pass, can you please share the command you run?
> >
> > Thanks,
> >
> > JMS
> >
> >
> > [INFO] Results:
> > [INFO]
> > [ERROR] Failures:
> > [ERROR]   TestFromClientSide.testCheckAndDeleteWithCompareOp:4982
> > expected: but was:
> > [ERROR]
> > org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure
> > .testMergeRegionsConcurrently(org.apache.hadoop.hbase.master.assig
> > nment.TestMergeTableRegionsProcedure)
> > [ERROR]   Run 1:
> > TestMergeTableRegionsProcedure.setup:111->resetProcExecutorTestingKillFl
> > ag:138
> > expected executor to be running
> > [ERROR]   Run 2:
> > TestMergeTableRegionsProcedure.tearDown:128->
> > resetProcExecutorTestingKillFl
> > ag:138
> > expected executor to be running
> > [INFO]
> > [ERROR]
> > org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure
> > .testMergeTwoRegions(org.apache.hadoop.hbase.master.assignment.Tes
> > tMergeTableRegionsProcedure)
> > [ERROR]   Run 1:
> > TestMergeTableRegionsProcedure.setup:111->resetProcExecutorTestingKillFl
> > ag:138
> > expected executor to be running
> > [ERROR]   Run 2:
> > TestMergeTableRegionsProcedure.tearDown:128->
> > resetProcExecutorTestingKillFl
> > ag:138
> > expected executor to be running
> > [INFO]
> > [ERROR]
> > org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure
> .
> > testRecoveryAndDoubleExecution(org.apache.hadoop.hbase.master.ass
> > ignment.TestMergeTableRegionsProcedure)
> > [ERROR]   Run 1:
> > TestMergeTableRegionsProcedure.setup:111->resetProcExecutorTestingKillFl
> > ag:138
> > expected executor to be running
> > [ERROR]   Run 2:
> > TestMergeTableRegionsProcedure.tearDown:128->
> > resetProcExecutorTestingKillFl
> > ag:138
> > expected executor to be running
> > [INFO]
> > [ERROR]
> > org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure
> .
> > testRollbackAndDoubleExecution(org.apache.hadoop.hbase.master.ass
> > ignment.TestMergeTableRegionsProcedure)
> > [ERROR]   Run 1:
> > TestMergeTableRegionsProcedure.testRollbackAndDoubleExecution:272
> > expected: but was:
> > [ERROR]   Run 2:
> > TestMergeTableRegionsProcedure.tearDown:128->
> > resetProcExecutorTestingKillFl
> > ag:138
> > expected executor to be running
> > [INFO]
> > [ERROR]   TestSnapshotQuotaObserverChore.testSnapshotSize:276 Waiting
> > timed
> > out after [30 000] msec
> > [ERROR]
> >  TestHRegionWithInMemoryFlush>TestHRegion.testWritesWhileScanning:3813
> > expected null, but was: NotServingRegionException:
> > testWritesWhileScanning,,1515277468063.468265483817cb6da632026ba5b306f6.
> > is
> > closing>
> > [ERROR] Errors:
> > [ERROR]   TestDLSAsyncFSWAL>AbstractTestDLS.testThreeRSAbort:401 »
> > TableNotFound testThr...
> > [ERROR]
> > org.apache.hadoop.hbase.master.balancer.TestRegionsOnMasterOptions.
> > testRegionsOnAllServers(org.apache.hadoop.hbase.master.balancer.
> > TestRegionsOnMasterOptions)
> > [ERROR]   Run 1:
> > TestRegionsOnMasterOptions.testRegionsOnAllServers:94->
> > checkBalance:207->Object.wait:-2
> > » TestTimedOut
> > [ERROR]   Run 2: TestRegionsOnMasterOptions.testRegionsOnAllServers »
> > Appears to be stuck in t...
> > [INFO]
> > [INFO]
> > [ERROR] Tests run: 3604, Failures: 7, Errors: 2, Skipped: 44
> > [INFO]
> >
> >
> > 2018-01-06 15:52 GMT-05:00 Jean-Marc Spaggiari  >:
> >
> > > Deleted the class to get all the tests running. Was running on the RC1
> > > from the tar.
> > >
> > > I know get those one failing.
> > >
> > > [ERROR] Failures:
> > > [ERROR]   TestFavoredStochasticLoadBalancer.test2FavoredNodesDead:352
> > > Balancer did not run
> > > [ERROR]   TestRegionMergeTransactionOnCluster.testCleanMergeReference:
> > 284
> > > hdfs://localhost:45311/user/jmspaggi/test-data/7c269e83-
> > > 5982-449e-8cf8-6bab4c7c/data/default/testCleanMergeReference/
> > > f1bdc6441b090dbacb391c74eaf0d1d0
> > > [ERROR] Errors:
> > > [ERROR]   Tes

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-07 Thread Jean-Marc Spaggiari
Excellent! Thanks again! Starting again with the tests...

JMS
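
For reference, the `-fn` ("fail never") flag explained in the quoted reply below tells Maven to keep building and testing the remaining modules after a failure instead of stopping; combined with the `test.exclude.pattern` property used by HBase's `runAllTests` profile, it yields a full-suite run that skips known flakies. An illustrative invocation (the exclusion list here is abbreviated, and the command assumes an HBase source checkout):

```shell
# Run the whole test suite, skip two known-flaky tests, and keep going
# past failures so one red test doesn't stop the run.
mvn -PrunAllTests \
    -Dtest.exclude.pattern=**/master.assignment.TestMergeTableRegionsProcedure.java,**/master.TestDLSAsyncFSWAL.java \
    -fn \
    clean test
```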

2018-01-07 8:04 GMT-05:00 张铎(Duo Zhang) :

> The last '-fn' option in the mvn command does that magic for you.
>
> 2018-01-07 19:03 GMT+08:00 Jean-Marc Spaggiari :
>
> > So that's the way! Super. Thanks 张铎. Last, is there a way to keep going
> > with the remaining tests even if we get a failure on a test?
> >
> > JMS
> >
> > 2018-01-07 5:56 GMT-05:00 张铎(Duo Zhang) :
> >
> > > You can try to copy the command line from the pre commit job where we
> > will
> > > bypass the flakey tests...
> > >
> > > This is the command I use to run UTs
> > >
> > > mvn -PrunAllTests
> > > -Dtest.exclude.pattern=**/master.assignment.TestMergeTableRegionsProcedure.java,**/master.balancer.TestRegionsOnMasterOptions.java,**/regionserver.TestDefaultCompactSelection.java,**/client.TestMultiParallel.java,**/regionserver.TestRegionMergeTransactionOnCluster.java,**/master.TestAssignmentManagerMetrics.java,**/snapshot.TestExportSnapshot.java,**/master.TestDLSAsyncFSWAL.java,**/master.balancer.TestStochasticLoadBalancer2.java,**/master.assignment.TestAssignmentManager.java,**/client.TestAsyncTableGetMultiThreaded.java,**/master.balancer.TestFavoredStochasticLoadBalancer.java,**/master.TestDLSFSHLog.java,**/trace.TestHTraceHooks.java,**/client.TestMultiRespectsLimits.java,**/client.TestBlockEvictionFromClient.java,**/mob.compactions.TestMobCompactor.java,**/regionserver.TestRegionServerReadRequestMetrics.java,**/client.TestTableSnapshotScanner.java,**/quotas.TestQuotaStatusRPCs.java,**/replication.TestReplicationSmallTests.java,**/master.assignment.TestSplitTableRegionProcedure.java,**/replication.TestReplicationKillSlaveRS.java,**/quotas.TestSnapshotQuotaObserverChore.java,**/quotas.TestQuotaThrottle.java,**/client.TestReplicasClient.java,**/TestZooKeeper.java,**/master.TestRestartCluster.java,**/client.locking.TestEntityLocks.java,**/client.TestMobSnapshotCloneIndependence.java,**/regionserver.TestMemstoreLABWithoutPool.java,**/client.TestMetaWithReplicas.java,**/regionserver.wal.TestAsyncLogRolling.java,**/snapshot.TestSecureExportSnapshot.java,**/TestIOFencing.java,**/master.TestMetaShutdownHandler.java,**/client.TestSizeFailures.java,**/regionserver.TestFSErrorsExposed.java,**/master.TestSplitLogManager.java,**/master.cleaner.TestHFileCleaner.java,**/TestFromClientSide**
> > > -Dsurefire.secondPartForkCount=1 clean test -fn
> > >
> > > The TestFromClientSide is not reported by the flakey tests detector but,
> > > same as you, I found that it fails all the time, so I also exclude it.
> > >
> > > Hope this helps.
> > >
> > > 2018-01-07 17:14 GMT+08:00 Jean-Marc Spaggiari <
> [email protected]
> > >:
> > >
> > > > Ok, thanks Stack. I will keep it running all day long until I get a
> > > > successful one. Is it useful for me to report all the failures, or just
> > > > a waste of time? Here is the last failed one:
> > > >
> > > > [INFO] Results:
> > > > [INFO]
> > > > [ERROR] Failures:
> > > > [ERROR]   TestFromClientSide.testCheckAndDeleteWithCompareOp:4982
> > > > expected: but was:
> > > > [ERROR] Errors:
> > > > [ERROR]   TestDLSAsyncFSWAL>AbstractTestDLS.testThreeRSAbort:401 »
> > > > TableNotFound Region ...
> > > > [INFO]
> > > > [ERROR] Tests run: 3585, Failures: 1, Errors: 1, Skipped: 44
> > > > [INFO]
> > > >
> > > >
> > > > JMS
> > > >
> > > > 2018-01-07 1:55 GMT-05:00 Apekshit Sharma :
> > > >
> > > > > bq. Don't you think we have enough branches already mighty Appy?
> > > > > Yeah we do...sigh.
> > > > >
> > > > >
> > > > > idk about that. But don't we need a *patch* branch branch-2.0 (just
> > > like
> > > > > branch-1.4) where we "make backwards-compatible bug fixes" and a
> > > *minor*
> > > > > branch branch-2 where we "add functionality in a
> backwards-compatible
> > > > > manner".
> > > > > Quotes are from http://hbase.apache.org/book.
> > > > html#hbase.versioning.post10.
> > > > > I stumbled on this issue when thinking about backporting
> > > > > https://issues.apache.org/jira/browse/HBASE-17436 for 2.1.
> > > > >
> > > > > -- Appy
> > > > >
> > > > >
> > > > > On Sat, Jan 6, 2018 at 4:11 PM, stack  wrote:
> > > > >
> > > > > > It is not you.  There are a bunch of flakies we need to fix. This
> > > > > > last one is for sure flakey.  Let me take a look. Thanks, JMS.
> > > > > >
> > > > > > S
> > > > > >
> > > > > > On Jan 6, 2018 5:57 PM, "Jean-Marc Spaggiari" <
> > > [email protected]
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > I might not be doing the right magic to get that run. If someone is
> > > > > > able to get all the tests to pass, can you please share the command you run?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > JMS
> > > > > >
> > > > > >
> > > > > > [INFO] Resu

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-07 Thread Duo Zhang
The last '-fn' option in the mvn command does that magic for you.
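For readers unfamiliar with the flag: `-fn` is Maven's `--fail-never` mode, which records test failures but keeps the build going instead of stopping at the first failing module. A rough plain-shell emulation of that behavior (the suite names here are invented):

```shell
# Hypothetical sketch of what `mvn ... test -fn` (--fail-never) does:
# note each failure but keep running the remaining suites rather than
# aborting on the first one. Suite names are examples only.
run_suite() { [ "$1" = "good" ]; }   # fake suite: passes only if named "good"
failures=0
for suite in bad good; do
  if ! run_suite "$suite"; then
    failures=$((failures+1))         # record the failure, keep going (-fn)
  fi
done
echo "suites run: 2, failures: $failures"
```

With a plain `mvn test`, the first failing suite would have stopped the run; with `-fn` you still get the full failure report at the end.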

2018-01-07 19:03 GMT+08:00 Jean-Marc Spaggiari :

> So that's the way! Super. Thanks 张铎. Last, is there a way to keep going
> with the remaining tests even if we get a failure on a test?
>
> JMS
>
> 2018-01-07 5:56 GMT-05:00 张铎(Duo Zhang) :
>
> > You can try to copy the command line from the pre commit job where we
> will
> > bypass the flakey tests...
> >
> > This is the command I use to run UTs
> >
> > mvn -PrunAllTests
> > -Dtest.exclude.pattern=**/master.assignment.TestMergeTableRegionsProcedure.java,**/master.balancer.TestRegionsOnMasterOptions.java,**/regionserver.TestDefaultCompactSelection.java,**/client.TestMultiParallel.java,**/regionserver.TestRegionMergeTransactionOnCluster.java,**/master.TestAssignmentManagerMetrics.java,**/snapshot.TestExportSnapshot.java,**/master.TestDLSAsyncFSWAL.java,**/master.balancer.TestStochasticLoadBalancer2.java,**/master.assignment.TestAssignmentManager.java,**/client.TestAsyncTableGetMultiThreaded.java,**/master.balancer.TestFavoredStochasticLoadBalancer.java,**/master.TestDLSFSHLog.java,**/trace.TestHTraceHooks.java,**/client.TestMultiRespectsLimits.java,**/client.TestBlockEvictionFromClient.java,**/mob.compactions.TestMobCompactor.java,**/regionserver.TestRegionServerReadRequestMetrics.java,**/client.TestTableSnapshotScanner.java,**/quotas.TestQuotaStatusRPCs.java,**/replication.TestReplicationSmallTests.java,**/master.assignment.TestSplitTableRegionProcedure.java,**/replication.TestReplicationKillSlaveRS.java,**/quotas.TestSnapshotQuotaObserverChore.java,**/quotas.TestQuotaThrottle.java,**/client.TestReplicasClient.java,**/TestZooKeeper.java,**/master.TestRestartCluster.java,**/client.locking.TestEntityLocks.java,**/client.TestMobSnapshotCloneIndependence.java,**/regionserver.TestMemstoreLABWithoutPool.java,**/client.TestMetaWithReplicas.java,**/regionserver.wal.TestAsyncLogRolling.java,**/snapshot.TestSecureExportSnapshot.java,**/TestIOFencing.java,**/master.TestMetaShutdownHandler.java,**/client.TestSizeFailures.java,**/regionserver.TestFSErrorsExposed.java,**/master.TestSplitLogManager.java,**/master.cleaner.TestHFileCleaner.java,**/TestFromClientSide**
> > -Dsurefire.secondPartForkCount=1 clean test -fn
> >
> > The TestFromClientSide is not reported by the flakey tests detector but,
> > same as you, I found that it fails all the time, so I also exclude it.
> >
> > Hope this helps.
> >
> > 2018-01-07 17:14 GMT+08:00 Jean-Marc Spaggiari  >:
> >
> > > Ok, thanks Stack. I will keep it running all day long until I get a
> > > successful one. Is it useful for me to report all the failures, or just
> > > a waste of time? Here is the last failed one:
> > >
> > > [INFO] Results:
> > > [INFO]
> > > [ERROR] Failures:
> > > [ERROR]   TestFromClientSide.testCheckAndDeleteWithCompareOp:4982
> > > expected: but was:
> > > [ERROR] Errors:
> > > [ERROR]   TestDLSAsyncFSWAL>AbstractTestDLS.testThreeRSAbort:401 »
> > > TableNotFound Region ...
> > > [INFO]
> > > [ERROR] Tests run: 3585, Failures: 1, Errors: 1, Skipped: 44
> > > [INFO]
> > >
> > >
> > > JMS
> > >
> > > 2018-01-07 1:55 GMT-05:00 Apekshit Sharma :
> > >
> > > > bq. Don't you think we have enough branches already mighty Appy?
> > > > Yeah we do...sigh.
> > > >
> > > >
> > > > idk about that. But don't we need a *patch* branch branch-2.0 (just
> > like
> > > > branch-1.4) where we "make backwards-compatible bug fixes" and a
> > *minor*
> > > > branch branch-2 where we "add functionality in a backwards-compatible
> > > > manner".
> > > > Quotes are from http://hbase.apache.org/book.
> > > html#hbase.versioning.post10.
> > > > I stumbled on this issue when thinking about backporting
> > > > https://issues.apache.org/jira/browse/HBASE-17436 for 2.1.
> > > >
> > > > -- Appy
> > > >
> > > >
> > > > On Sat, Jan 6, 2018 at 4:11 PM, stack  wrote:
> > > >
> > > > > It is not you.  There are a bunch of flakies we need to fix. This
> > > > > last one is for sure flakey.  Let me take a look. Thanks, JMS.
> > > > >
> > > > > S
> > > > >
> > > > > On Jan 6, 2018 5:57 PM, "Jean-Marc Spaggiari" <
> > [email protected]
> > > >
> > > > > wrote:
> > > > >
> > > > > I might not be doing the right magic to get that run. If someone is
> > > > > able to get all the tests to pass, can you please share the command you run?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > JMS
> > > > >
> > > > >
> > > > > [INFO] Results:
> > > > > [INFO]
> > > > > [ERROR] Failures:
> > > > > [ERROR]   TestFromClientSide.testCheckAndDeleteWithCompareOp:4982
> > > > > expected: but was:
> > > > > [ERROR]
> > > > > org.apache.hadoop.hbase.master.assignment.
> > > TestMergeTableRegionsProcedure
> > > > > .testMergeRegionsConcurrently(org.apache.hadoop.hbase.master.assig
> > > > > nment.TestMergeTableRegionsProcedure)
> > > > > [

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-07 Thread Jean-Marc Spaggiari
So that's the way! Super. Thanks 张铎. Last, is there a way to keep going
with the remaining tests even if we get a failure on a test?

JMS

2018-01-07 5:56 GMT-05:00 张铎(Duo Zhang) :

> You can try to copy the command line from the pre commit job where we will
> bypass the flakey tests...
>
> This is the command I use to run UTs
>
> mvn -PrunAllTests
> -Dtest.exclude.pattern=**/master.assignment.TestMergeTableRegionsProcedure.java,**/master.balancer.TestRegionsOnMasterOptions.java,**/regionserver.TestDefaultCompactSelection.java,**/client.TestMultiParallel.java,**/regionserver.TestRegionMergeTransactionOnCluster.java,**/master.TestAssignmentManagerMetrics.java,**/snapshot.TestExportSnapshot.java,**/master.TestDLSAsyncFSWAL.java,**/master.balancer.TestStochasticLoadBalancer2.java,**/master.assignment.TestAssignmentManager.java,**/client.TestAsyncTableGetMultiThreaded.java,**/master.balancer.TestFavoredStochasticLoadBalancer.java,**/master.TestDLSFSHLog.java,**/trace.TestHTraceHooks.java,**/client.TestMultiRespectsLimits.java,**/client.TestBlockEvictionFromClient.java,**/mob.compactions.TestMobCompactor.java,**/regionserver.TestRegionServerReadRequestMetrics.java,**/client.TestTableSnapshotScanner.java,**/quotas.TestQuotaStatusRPCs.java,**/replication.TestReplicationSmallTests.java,**/master.assignment.TestSplitTableRegionProcedure.java,**/replication.TestReplicationKillSlaveRS.java,**/quotas.TestSnapshotQuotaObserverChore.java,**/quotas.TestQuotaThrottle.java,**/client.TestReplicasClient.java,**/TestZooKeeper.java,**/master.TestRestartCluster.java,**/client.locking.TestEntityLocks.java,**/client.TestMobSnapshotCloneIndependence.java,**/regionserver.TestMemstoreLABWithoutPool.java,**/client.TestMetaWithReplicas.java,**/regionserver.wal.TestAsyncLogRolling.java,**/snapshot.TestSecureExportSnapshot.java,**/TestIOFencing.java,**/master.TestMetaShutdownHandler.java,**/client.TestSizeFailures.java,**/regionserver.TestFSErrorsExposed.java,**/master.TestSplitLogManager.java,**/master.cleaner.TestHFileCleaner.java,**/TestFromClientSide**
> -Dsurefire.secondPartForkCount=1 clean test -fn
>
> The TestFromClientSide is not reported by the flakey tests detector but,
> same as you, I found that it fails all the time, so I also exclude it.
>
> Hope this helps.
>
> 2018-01-07 17:14 GMT+08:00 Jean-Marc Spaggiari :
>
> > Ok, thanks Stack. I will keep it running all day long until I get a
> > successful one. Is it useful for me to report all the failures, or just
> > a waste of time? Here is the last failed one:
> >
> > [INFO] Results:
> > [INFO]
> > [ERROR] Failures:
> > [ERROR]   TestFromClientSide.testCheckAndDeleteWithCompareOp:4982
> > expected: but was:
> > [ERROR] Errors:
> > [ERROR]   TestDLSAsyncFSWAL>AbstractTestDLS.testThreeRSAbort:401 »
> > TableNotFound Region ...
> > [INFO]
> > [ERROR] Tests run: 3585, Failures: 1, Errors: 1, Skipped: 44
> > [INFO]
> >
> >
> > JMS
> >
> > 2018-01-07 1:55 GMT-05:00 Apekshit Sharma :
> >
> > > bq. Don't you think we have enough branches already mighty Appy?
> > > Yeah we do...sigh.
> > >
> > >
> > > idk about that. But don't we need a *patch* branch branch-2.0 (just
> like
> > > branch-1.4) where we "make backwards-compatible bug fixes" and a
> *minor*
> > > branch branch-2 where we "add functionality in a backwards-compatible
> > > manner".
> > > Quotes are from http://hbase.apache.org/book.
> > html#hbase.versioning.post10.
> > > I stumbled on this issue when thinking about backporting
> > > https://issues.apache.org/jira/browse/HBASE-17436 for 2.1.
> > >
> > > -- Appy
> > >
> > >
> > > On Sat, Jan 6, 2018 at 4:11 PM, stack  wrote:
> > >
> > > > It is not you.  There are a bunch of flakies we need to fix. This
> > > > last one is for sure flakey.  Let me take a look. Thanks, JMS.
> > > >
> > > > S
> > > >
> > > > On Jan 6, 2018 5:57 PM, "Jean-Marc Spaggiari" <
> [email protected]
> > >
> > > > wrote:
> > > >
> > > > I might not be doing the right magic to get that run. If someone is
> > > > able to get all the tests to pass, can you please share the command you run?
> > > >
> > > > Thanks,
> > > >
> > > > JMS
> > > >
> > > >
> > > > [INFO] Results:
> > > > [INFO]
> > > > [ERROR] Failures:
> > > > [ERROR]   TestFromClientSide.testCheckAndDeleteWithCompareOp:4982
> > > > expected: but was:
> > > > [ERROR]
> > > > org.apache.hadoop.hbase.master.assignment.
> > TestMergeTableRegionsProcedure
> > > > .testMergeRegionsConcurrently(org.apache.hadoop.hbase.master.assig
> > > > nment.TestMergeTableRegionsProcedure)
> > > > [ERROR]   Run 1:
> > > > TestMergeTableRegionsProcedure.setup:111->
> > resetProcExecutorTestingKillFl
> > > > ag:138
> > > > expected executor to be running
> > > > [ERROR]   Run 2:
> > > > TestMergeTableRegionsProcedure.tearDown:128->
> > > > resetProcExecutorTestingKillFl
> > > > ag:138
> > > > expected executor to be running
> > > > [INFO]
> > > > [ERROR]
> > > > org.apache.

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-07 Thread Duo Zhang
You can try to copy the command line from the pre commit job where we will
bypass the flakey tests...

This is the command I use to run UTs

mvn -PrunAllTests
-Dtest.exclude.pattern=**/master.assignment.TestMergeTableRegionsProcedure.java,**/master.balancer.TestRegionsOnMasterOptions.java,**/regionserver.TestDefaultCompactSelection.java,**/client.TestMultiParallel.java,**/regionserver.TestRegionMergeTransactionOnCluster.java,**/master.TestAssignmentManagerMetrics.java,**/snapshot.TestExportSnapshot.java,**/master.TestDLSAsyncFSWAL.java,**/master.balancer.TestStochasticLoadBalancer2.java,**/master.assignment.TestAssignmentManager.java,**/client.TestAsyncTableGetMultiThreaded.java,**/master.balancer.TestFavoredStochasticLoadBalancer.java,**/master.TestDLSFSHLog.java,**/trace.TestHTraceHooks.java,**/client.TestMultiRespectsLimits.java,**/client.TestBlockEvictionFromClient.java,**/mob.compactions.TestMobCompactor.java,**/regionserver.TestRegionServerReadRequestMetrics.java,**/client.TestTableSnapshotScanner.java,**/quotas.TestQuotaStatusRPCs.java,**/replication.TestReplicationSmallTests.java,**/master.assignment.TestSplitTableRegionProcedure.java,**/replication.TestReplicationKillSlaveRS.java,**/quotas.TestSnapshotQuotaObserverChore.java,**/quotas.TestQuotaThrottle.java,**/client.TestReplicasClient.java,**/TestZooKeeper.java,**/master.TestRestartCluster.java,**/client.locking.TestEntityLocks.java,**/client.TestMobSnapshotCloneIndependence.java,**/regionserver.TestMemstoreLABWithoutPool.java,**/client.TestMetaWithReplicas.java,**/regionserver.wal.TestAsyncLogRolling.java,**/snapshot.TestSecureExportSnapshot.java,**/TestIOFencing.java,**/master.TestMetaShutdownHandler.java,**/client.TestSizeFailures.java,**/regionserver.TestFSErrorsExposed.java,**/master.TestSplitLogManager.java,**/master.cleaner.TestHFileCleaner.java,**/TestFromClientSide**
-Dsurefire.secondPartForkCount=1 clean test -fn

The TestFromClientSide is not reported by the flakey tests detector but,
same as you, I found that it fails all the time, so I also exclude it.

Hope this helps.
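As a side note, the long -Dtest.exclude.pattern value in the command above does not have to be edited by hand; it can be generated from a plain one-class-per-line list. A small sketch (the file name flaky.txt and the two class names are examples only, not the project's actual flaky list):

```shell
# Build a surefire exclude pattern of the form **/pkg.Class.java,...
# (the shape used in the mvn command above) from a one-class-per-line
# list. File name and contents are illustrative only.
printf '%s\n' master.TestDLSAsyncFSWAL client.TestSizeFailures > flaky.txt
pattern=$(sed 's#^#**/#; s#$#.java#' flaky.txt | paste -sd, -)
echo "$pattern"
# -> **/master.TestDLSAsyncFSWAL.java,**/client.TestSizeFailures.java
```

The result can then be passed as `mvn -PrunAllTests -Dtest.exclude.pattern="$pattern" clean test -fn`.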

2018-01-07 17:14 GMT+08:00 Jean-Marc Spaggiari :

> Ok, thanks Stack. I will keep it running all day long until I get a
> successful one. Is it useful for me to report all the failures, or just
> a waste of time? Here is the last failed one:
>
> [INFO] Results:
> [INFO]
> [ERROR] Failures:
> [ERROR]   TestFromClientSide.testCheckAndDeleteWithCompareOp:4982
> expected: but was:
> [ERROR] Errors:
> [ERROR]   TestDLSAsyncFSWAL>AbstractTestDLS.testThreeRSAbort:401 »
> TableNotFound Region ...
> [INFO]
> [ERROR] Tests run: 3585, Failures: 1, Errors: 1, Skipped: 44
> [INFO]
>
>
> JMS
>
> 2018-01-07 1:55 GMT-05:00 Apekshit Sharma :
>
> > bq. Don't you think we have enough branches already mighty Appy?
> > Yeah we do...sigh.
> >
> >
> > idk about that. But don't we need a *patch* branch branch-2.0 (just like
> > branch-1.4) where we "make backwards-compatible bug fixes" and a *minor*
> > branch branch-2 where we "add functionality in a backwards-compatible
> > manner".
> > Quotes are from http://hbase.apache.org/book.
> html#hbase.versioning.post10.
> > I stumbled on this issue when thinking about backporting
> > https://issues.apache.org/jira/browse/HBASE-17436 for 2.1.
> >
> > -- Appy
> >
> >
> > On Sat, Jan 6, 2018 at 4:11 PM, stack  wrote:
> >
> > > It is not you.  There are a bunch of flakies we need to fix. This last
> > > one is for sure flakey.  Let me take a look. Thanks, JMS.
> > >
> > > S
> > >
> > > On Jan 6, 2018 5:57 PM, "Jean-Marc Spaggiari"  >
> > > wrote:
> > >
> > > I might not be doing the right magic to get that run. If someone is
> > > able to get all the tests to pass, can you please share the command you run?
> > >
> > > Thanks,
> > >
> > > JMS
> > >
> > >
> > > [INFO] Results:
> > > [INFO]
> > > [ERROR] Failures:
> > > [ERROR]   TestFromClientSide.testCheckAndDeleteWithCompareOp:4982
> > > expected: but was:
> > > [ERROR]
> > > org.apache.hadoop.hbase.master.assignment.
> TestMergeTableRegionsProcedure
> > > .testMergeRegionsConcurrently(org.apache.hadoop.hbase.master.assig
> > > nment.TestMergeTableRegionsProcedure)
> > > [ERROR]   Run 1:
> > > TestMergeTableRegionsProcedure.setup:111->
> resetProcExecutorTestingKillFl
> > > ag:138
> > > expected executor to be running
> > > [ERROR]   Run 2:
> > > TestMergeTableRegionsProcedure.tearDown:128->
> > > resetProcExecutorTestingKillFl
> > > ag:138
> > > expected executor to be running
> > > [INFO]
> > > [ERROR]
> > > org.apache.hadoop.hbase.master.assignment.
> TestMergeTableRegionsProcedure
> > > .testMergeTwoRegions(org.apache.hadoop.hbase.master.assignment.Tes
> > > tMergeTableRegionsProcedure)
> > > [ERROR]   Run 1:
> > > TestMergeTableRegionsProcedure.setup:111->
> resetProcExecutorTestingKillFl
> > > ag:138
> > > expected executor to be running
> > > [ERROR]   Run 2:
> > > TestMergeTableRegionsProcedure.tearDown:128->
> > > resetProcExecutorTestingKillFl
> > > ag:138
> > > expected executor to be runnin

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-07 Thread Jean-Marc Spaggiari
Ok, thanks Stack. I will keep it running all day long until I get a
successful one. Is it useful for me to report all the failures, or just
a waste of time? Here is the last failed one:

[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR]   TestFromClientSide.testCheckAndDeleteWithCompareOp:4982
expected: but was:
[ERROR] Errors:
[ERROR]   TestDLSAsyncFSWAL>AbstractTestDLS.testThreeRSAbort:401 »
TableNotFound Region ...
[INFO]
[ERROR] Tests run: 3585, Failures: 1, Errors: 1, Skipped: 44
[INFO]


JMS

2018-01-07 1:55 GMT-05:00 Apekshit Sharma :

> bq. Don't you think we have enough branches already mighty Appy?
> Yeah we do...sigh.
>
>
> idk about that. But don't we need a *patch* branch branch-2.0 (just like
> branch-1.4) where we "make backwards-compatible bug fixes" and a *minor*
> branch branch-2 where we "add functionality in a backwards-compatible
> manner".
> Quotes are from http://hbase.apache.org/book.html#hbase.versioning.post10.
> I stumbled on this issue when thinking about backporting
> https://issues.apache.org/jira/browse/HBASE-17436 for 2.1.
>
> -- Appy
>
>
> On Sat, Jan 6, 2018 at 4:11 PM, stack  wrote:
>
> > It is not you.  There are a bunch of flakies we need to fix. This last one
> > is for sure flakey.  Let me take a look. Thanks, JMS.
> >
> > S
> >
> > On Jan 6, 2018 5:57 PM, "Jean-Marc Spaggiari" 
> > wrote:
> >
> > I might not be doing the right magic to get that run. If someone is able
> > to get all the tests to pass, can you please share the command you run?
> >
> > Thanks,
> >
> > JMS
> >
> >
> > [INFO] Results:
> > [INFO]
> > [ERROR] Failures:
> > [ERROR]   TestFromClientSide.testCheckAndDeleteWithCompareOp:4982
> > expected: but was:
> > [ERROR]
> > org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure
> > .testMergeRegionsConcurrently(org.apache.hadoop.hbase.master.assig
> > nment.TestMergeTableRegionsProcedure)
> > [ERROR]   Run 1:
> > TestMergeTableRegionsProcedure.setup:111->resetProcExecutorTestingKillFl
> > ag:138
> > expected executor to be running
> > [ERROR]   Run 2:
> > TestMergeTableRegionsProcedure.tearDown:128->
> > resetProcExecutorTestingKillFl
> > ag:138
> > expected executor to be running
> > [INFO]
> > [ERROR]
> > org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure
> > .testMergeTwoRegions(org.apache.hadoop.hbase.master.assignment.Tes
> > tMergeTableRegionsProcedure)
> > [ERROR]   Run 1:
> > TestMergeTableRegionsProcedure.setup:111->resetProcExecutorTestingKillFl
> > ag:138
> > expected executor to be running
> > [ERROR]   Run 2:
> > TestMergeTableRegionsProcedure.tearDown:128->
> > resetProcExecutorTestingKillFl
> > ag:138
> > expected executor to be running
> > [INFO]
> > [ERROR]
> > org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure
> .
> > testRecoveryAndDoubleExecution(org.apache.hadoop.hbase.master.ass
> > ignment.TestMergeTableRegionsProcedure)
> > [ERROR]   Run 1:
> > TestMergeTableRegionsProcedure.setup:111->resetProcExecutorTestingKillFl
> > ag:138
> > expected executor to be running
> > [ERROR]   Run 2:
> > TestMergeTableRegionsProcedure.tearDown:128->
> > resetProcExecutorTestingKillFl
> > ag:138
> > expected executor to be running
> > [INFO]
> > [ERROR]
> > org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure
> .
> > testRollbackAndDoubleExecution(org.apache.hadoop.hbase.master.ass
> > ignment.TestMergeTableRegionsProcedure)
> > [ERROR]   Run 1:
> > TestMergeTableRegionsProcedure.testRollbackAndDoubleExecution:272
> > expected: but was:
> > [ERROR]   Run 2:
> > TestMergeTableRegionsProcedure.tearDown:128->
> > resetProcExecutorTestingKillFl
> > ag:138
> > expected executor to be running
> > [INFO]
> > [ERROR]   TestSnapshotQuotaObserverChore.testSnapshotSize:276 Waiting
> > timed
> > out after [30 000] msec
> > [ERROR]
> >  TestHRegionWithInMemoryFlush>TestHRegion.testWritesWhileScanning:3813
> > expected null, but was: NotServingRegionException:
> > testWritesWhileScanning,,1515277468063.468265483817cb6da632026ba5b306f6.
> > is
> > closing>
> > [ERROR] Errors:
> > [ERROR]   TestDLSAsyncFSWAL>AbstractTestDLS.testThreeRSAbort:401 »
> > TableNotFound testThr...
> > [ERROR]
> > org.apache.hadoop.hbase.master.balancer.TestRegionsOnMasterOptions.
> > testRegionsOnAllServers(org.apache.hadoop.hbase.master.balancer.
> > TestRegionsOnMasterOptions)
> > [ERROR]   Run 1:
> > TestRegionsOnMasterOptions.testRegionsOnAllServers:94->
> > checkBalance:207->Object.wait:-2
> > » TestTimedOut
> > [ERROR]   Run 2: TestRegionsOnMasterOptions.testRegionsOnAllServers »
> > Appears to be stuck in t...
> > [INFO]
> > [INFO]
> > [ERROR] Tests run: 3604, Failures: 7, Errors: 2, Skipped: 44
> > [INFO]
> >
> >
> > 2018-01-06 15:52 GMT-05:00 Jean-Marc Spaggiari  >:
> >
> > > Deleted the class to get all the tests running. Was running on the RC1
> > > from the tar.
> > >
> > > I now get these ones failing.
> > >
> > > [ERROR] Failures:
> > > [ERROR]   TestFavoredStochasticLoadBalancer.test2FavoredNodesDead:352
>

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-06 Thread Apekshit Sharma
bq. Don't you think we have enough branches already mighty Appy?
Yeah we do...sigh.


idk about that. But don't we need a *patch* branch branch-2.0 (just like
branch-1.4) where we "make backwards-compatible bug fixes" and a *minor*
branch branch-2 where we "add functionality in a backwards-compatible
manner".
Quotes are from http://hbase.apache.org/book.html#hbase.versioning.post10.
I stumbled on this issue when thinking about backporting
https://issues.apache.org/jira/browse/HBASE-17436 for 2.1.

-- Appy


On Sat, Jan 6, 2018 at 4:11 PM, stack  wrote:

> It is not you.  There are a bunch of flakies we need to fix. This last one
> is for sure flakey.  Let me take a look. Thanks, JMS.
>
> S
>
> On Jan 6, 2018 5:57 PM, "Jean-Marc Spaggiari" 
> wrote:
>
> I might not be doing the right magic to get that run. If someone is able to
> get all the tests to pass, can you please share the command you run?
>
> Thanks,
>
> JMS
>
>
> [INFO] Results:
> [INFO]
> [ERROR] Failures:
> [ERROR]   TestFromClientSide.testCheckAndDeleteWithCompareOp:4982
> expected: but was:
> [ERROR]
> org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure
> .testMergeRegionsConcurrently(org.apache.hadoop.hbase.master.assig
> nment.TestMergeTableRegionsProcedure)
> [ERROR]   Run 1:
> TestMergeTableRegionsProcedure.setup:111->resetProcExecutorTestingKillFl
> ag:138
> expected executor to be running
> [ERROR]   Run 2:
> TestMergeTableRegionsProcedure.tearDown:128->
> resetProcExecutorTestingKillFl
> ag:138
> expected executor to be running
> [INFO]
> [ERROR]
> org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure
> .testMergeTwoRegions(org.apache.hadoop.hbase.master.assignment.Tes
> tMergeTableRegionsProcedure)
> [ERROR]   Run 1:
> TestMergeTableRegionsProcedure.setup:111->resetProcExecutorTestingKillFl
> ag:138
> expected executor to be running
> [ERROR]   Run 2:
> TestMergeTableRegionsProcedure.tearDown:128->
> resetProcExecutorTestingKillFl
> ag:138
> expected executor to be running
> [INFO]
> [ERROR]
> org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.
> testRecoveryAndDoubleExecution(org.apache.hadoop.hbase.master.ass
> ignment.TestMergeTableRegionsProcedure)
> [ERROR]   Run 1:
> TestMergeTableRegionsProcedure.setup:111->resetProcExecutorTestingKillFl
> ag:138
> expected executor to be running
> [ERROR]   Run 2:
> TestMergeTableRegionsProcedure.tearDown:128->
> resetProcExecutorTestingKillFl
> ag:138
> expected executor to be running
> [INFO]
> [ERROR]
> org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.
> testRollbackAndDoubleExecution(org.apache.hadoop.hbase.master.ass
> ignment.TestMergeTableRegionsProcedure)
> [ERROR]   Run 1:
> TestMergeTableRegionsProcedure.testRollbackAndDoubleExecution:272
> expected: but was:
> [ERROR]   Run 2:
> TestMergeTableRegionsProcedure.tearDown:128->
> resetProcExecutorTestingKillFl
> ag:138
> expected executor to be running
> [INFO]
> [ERROR]   TestSnapshotQuotaObserverChore.testSnapshotSize:276 Waiting
> timed
> out after [30 000] msec
> [ERROR]
>  TestHRegionWithInMemoryFlush>TestHRegion.testWritesWhileScanning:3813
> expected null, but was: testWritesWhileScanning,,1515277468063.468265483817cb6da632026ba5b306f6.
> is
> closing>
> [ERROR] Errors:
> [ERROR]   TestDLSAsyncFSWAL>AbstractTestDLS.testThreeRSAbort:401 »
> TableNotFound testThr...
> [ERROR]
> org.apache.hadoop.hbase.master.balancer.TestRegionsOnMasterOptions.
> testRegionsOnAllServers(org.apache.hadoop.hbase.master.balancer.
> TestRegionsOnMasterOptions)
> [ERROR]   Run 1:
> TestRegionsOnMasterOptions.testRegionsOnAllServers:94->
> checkBalance:207->Object.wait:-2
> » TestTimedOut
> [ERROR]   Run 2: TestRegionsOnMasterOptions.testRegionsOnAllServers »
> Appears to be stuck in t...
> [INFO]
> [INFO]
> [ERROR] Tests run: 3604, Failures: 7, Errors: 2, Skipped: 44
> [INFO]
>
>
> 2018-01-06 15:52 GMT-05:00 Jean-Marc Spaggiari :
>
> > Deleted the class to get all the tests running. Was running on the RC1
> > from the tar.
> >
> > I now get these ones failing.
> >
> > [ERROR] Failures:
> > [ERROR]   TestFavoredStochasticLoadBalancer.test2FavoredNodesDead:352
> > Balancer did not run
> > [ERROR]   TestRegionMergeTransactionOnCluster.testCleanMergeReference:
> 284
> > hdfs://localhost:45311/user/jmspaggi/test-data/7c269e83-
> > 5982-449e-8cf8-6bab4c7c/data/default/testCleanMergeReference/
> > f1bdc6441b090dbacb391c74eaf0d1d0
> > [ERROR] Errors:
> > [ERROR]   TestDLSAsyncFSWAL>AbstractTestDLS.testThreeRSAbort:401 »
> > TableNotFound Region ...
> > [INFO]
> > [ERROR] Tests run: 3604, Failures: 2, Errors: 1, Skipped: 44
> >
> >
> > I have not been able to get all the tests to pass locally for a while :(
> >
> > JM
> >
> > 2018-01-06 15:05 GMT-05:00 Ted Yu :
> >
> >> Looks like you didn't include HBASE-19666 which would be in the next RC.
> >>
> >> On Sat, Jan 6, 2018 at 10:52 AM, Jean-Marc Spaggiari <
> >> [email protected]> wrote:
> >>
> >> > Trying with a different comma

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-06 Thread stack
It is not you.  There are a bunch of flakies we need to fix. This last one is
for sure flakey.

S

On Jan 6, 2018 5:57 PM, "Jean-Marc Spaggiari" 
wrote:

I might not be doing the right magic to get that run. If someone is able to
get all the tests to pass, can you please share the command you run?

Thanks,

JMS


[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR]   TestFromClientSide.testCheckAndDeleteWithCompareOp:4982
expected: but was:
[ERROR]
org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure
.testMergeRegionsConcurrently(org.apache.hadoop.hbase.master.assig
nment.TestMergeTableRegionsProcedure)
[ERROR]   Run 1:
TestMergeTableRegionsProcedure.setup:111->resetProcExecutorTestingKillFl
ag:138
expected executor to be running
[ERROR]   Run 2:
TestMergeTableRegionsProcedure.tearDown:128->resetProcExecutorTestingKillFl
ag:138
expected executor to be running
[INFO]
[ERROR]
org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure
.testMergeTwoRegions(org.apache.hadoop.hbase.master.assignment.Tes
tMergeTableRegionsProcedure)
[ERROR]   Run 1:
TestMergeTableRegionsProcedure.setup:111->resetProcExecutorTestingKillFl
ag:138
expected executor to be running
[ERROR]   Run 2:
TestMergeTableRegionsProcedure.tearDown:128->resetProcExecutorTestingKillFl
ag:138
expected executor to be running
[INFO]
[ERROR]
org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.
testRecoveryAndDoubleExecution(org.apache.hadoop.hbase.master.ass
ignment.TestMergeTableRegionsProcedure)
[ERROR]   Run 1:
TestMergeTableRegionsProcedure.setup:111->resetProcExecutorTestingKillFl
ag:138
expected executor to be running
[ERROR]   Run 2:
TestMergeTableRegionsProcedure.tearDown:128->resetProcExecutorTestingKillFl
ag:138
expected executor to be running
[INFO]
[ERROR]
org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.
testRollbackAndDoubleExecution(org.apache.hadoop.hbase.master.ass
ignment.TestMergeTableRegionsProcedure)
[ERROR]   Run 1:
TestMergeTableRegionsProcedure.testRollbackAndDoubleExecution:272
expected: but was:
[ERROR]   Run 2:
TestMergeTableRegionsProcedure.tearDown:128->resetProcExecutorTestingKillFl
ag:138
expected executor to be running
[INFO]
[ERROR]   TestSnapshotQuotaObserverChore.testSnapshotSize:276 Waiting timed
out after [30 000] msec
[ERROR]
 TestHRegionWithInMemoryFlush>TestHRegion.testWritesWhileScanning:3813
expected null, but was:
[ERROR] Errors:
[ERROR]   TestDLSAsyncFSWAL>AbstractTestDLS.testThreeRSAbort:401 »
TableNotFound testThr...
[ERROR]
org.apache.hadoop.hbase.master.balancer.TestRegionsOnMasterOptions.
testRegionsOnAllServers(org.apache.hadoop.hbase.master.balancer.
TestRegionsOnMasterOptions)
[ERROR]   Run 1:
TestRegionsOnMasterOptions.testRegionsOnAllServers:94->
checkBalance:207->Object.wait:-2
» TestTimedOut
[ERROR]   Run 2: TestRegionsOnMasterOptions.testRegionsOnAllServers »
Appears to be stuck in t...
[INFO]
[INFO]
[ERROR] Tests run: 3604, Failures: 7, Errors: 2, Skipped: 44
[INFO]


2018-01-06 15:52 GMT-05:00 Jean-Marc Spaggiari :

> Deleted the class to get all the tests running. Was running on the RC1
> from the tar.
>
> I now get these ones failing.
>
> [ERROR] Failures:
> [ERROR]   TestFavoredStochasticLoadBalancer.test2FavoredNodesDead:352
> Balancer did not run
> [ERROR]   TestRegionMergeTransactionOnCluster.testCleanMergeReference:284
> hdfs://localhost:45311/user/jmspaggi/test-data/7c269e83-
> 5982-449e-8cf8-6bab4c7c/data/default/testCleanMergeReference/
> f1bdc6441b090dbacb391c74eaf0d1d0
> [ERROR] Errors:
> [ERROR]   TestDLSAsyncFSWAL>AbstractTestDLS.testThreeRSAbort:401 »
> TableNotFound Region ...
> [INFO]
> [ERROR] Tests run: 3604, Failures: 2, Errors: 1, Skipped: 44
>
>
> I have not been able to get all the tests to pass locally for a while :(
>
> JM
>
> 2018-01-06 15:05 GMT-05:00 Ted Yu :
>
>> Looks like you didn't include HBASE-19666 which would be in the next RC.
>>
>> On Sat, Jan 6, 2018 at 10:52 AM, Jean-Marc Spaggiari <
>> [email protected]> wrote:
>>
>> > Trying with a different command line (mvn test -P runAllTests
>> > -Dsurefire.secondPartThreadCount=12 -Dtest.build.data.basedirector
>> y=/ram4g
>> > ) I get all of those failing.  How are you able to get everything to
>> > pass???
>> >
>> > [INFO] Results:
>> > [INFO]
>> > [ERROR] Failures:
>> > [ERROR]   TestDefaultCompactSelection.testCompactionRatio:74->TestCom
>> > pactionPolicy.compactEquals:182->TestCompactionPolicy.compactEquals:201
>> > expected:<[[4, 2, 1]]> but was:<[[]]>
>> > [ERROR]   TestDefaultCompactSelection.testStuckStoreCompaction:145->T
>> > estCompactionPolicy.compactEquals:182->TestCompactionPolicy.
>> compactEquals:201
>> > expected:<[[]30, 30, 30]> but was:<[[99, 30, ]30, 30, 30]>
>> > [INFO]
>> > [ERROR] Tests run: 1235, Failures: 2, Errors: 0, Skipped: 4
>> >
>> > Second run:
>> > [INFO] Results:
>> > [INFO]
>> > [ERROR] Failures:
>> > [ERROR]   TestDefaultCompactSelection.testCompactionRatio:74->
>> > Te

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-06 Thread Jean-Marc Spaggiari
I might not be doing the right magic to get that to run. If someone is able
to get all the tests to pass, can you please share the command you run?

Thanks,

JMS


[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR]   TestFromClientSide.testCheckAndDeleteWithCompareOp:4982
expected: but was:
[ERROR]
org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.testMergeRegionsConcurrently(org.apache.hadoop.hbase.master.assig
nment.TestMergeTableRegionsProcedure)
[ERROR]   Run 1:
TestMergeTableRegionsProcedure.setup:111->resetProcExecutorTestingKillFlag:138
expected executor to be running
[ERROR]   Run 2:
TestMergeTableRegionsProcedure.tearDown:128->resetProcExecutorTestingKillFlag:138
expected executor to be running
[INFO]
[ERROR]
org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.testMergeTwoRegions(org.apache.hadoop.hbase.master.assignment.Tes
tMergeTableRegionsProcedure)
[ERROR]   Run 1:
TestMergeTableRegionsProcedure.setup:111->resetProcExecutorTestingKillFlag:138
expected executor to be running
[ERROR]   Run 2:
TestMergeTableRegionsProcedure.tearDown:128->resetProcExecutorTestingKillFlag:138
expected executor to be running
[INFO]
[ERROR]
org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.testRecoveryAndDoubleExecution(org.apache.hadoop.hbase.master.ass
ignment.TestMergeTableRegionsProcedure)
[ERROR]   Run 1:
TestMergeTableRegionsProcedure.setup:111->resetProcExecutorTestingKillFlag:138
expected executor to be running
[ERROR]   Run 2:
TestMergeTableRegionsProcedure.tearDown:128->resetProcExecutorTestingKillFlag:138
expected executor to be running
[INFO]
[ERROR]
org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.testRollbackAndDoubleExecution(org.apache.hadoop.hbase.master.ass
ignment.TestMergeTableRegionsProcedure)
[ERROR]   Run 1:
TestMergeTableRegionsProcedure.testRollbackAndDoubleExecution:272
expected: but was:
[ERROR]   Run 2:
TestMergeTableRegionsProcedure.tearDown:128->resetProcExecutorTestingKillFlag:138
expected executor to be running
[INFO]
[ERROR]   TestSnapshotQuotaObserverChore.testSnapshotSize:276 Waiting timed
out after [30 000] msec
[ERROR]
 TestHRegionWithInMemoryFlush>TestHRegion.testWritesWhileScanning:3813
expected null, but was:
[ERROR] Errors:
[ERROR]   TestDLSAsyncFSWAL>AbstractTestDLS.testThreeRSAbort:401 »
TableNotFound testThr...
[ERROR]
org.apache.hadoop.hbase.master.balancer.TestRegionsOnMasterOptions.testRegionsOnAllServers(org.apache.hadoop.hbase.master.balancer.TestRegionsOnMasterOptions)
[ERROR]   Run 1:
TestRegionsOnMasterOptions.testRegionsOnAllServers:94->checkBalance:207->Object.wait:-2
» TestTimedOut
[ERROR]   Run 2: TestRegionsOnMasterOptions.testRegionsOnAllServers »
Appears to be stuck in t...
[INFO]
[INFO]
[ERROR] Tests run: 3604, Failures: 7, Errors: 2, Skipped: 44
[INFO]


2018-01-06 15:52 GMT-05:00 Jean-Marc Spaggiari :

> Deleted the class to get all the tests running. Was running on the RC1
> from the tar.
>
> I now get these ones failing.
>
> [ERROR] Failures:
> [ERROR]   TestFavoredStochasticLoadBalancer.test2FavoredNodesDead:352
> Balancer did not run
> [ERROR]   TestRegionMergeTransactionOnCluster.testCleanMergeReference:284
> hdfs://localhost:45311/user/jmspaggi/test-data/7c269e83-
> 5982-449e-8cf8-6bab4c7c/data/default/testCleanMergeReference/
> f1bdc6441b090dbacb391c74eaf0d1d0
> [ERROR] Errors:
> [ERROR]   TestDLSAsyncFSWAL>AbstractTestDLS.testThreeRSAbort:401 »
> TableNotFound Region ...
> [INFO]
> [ERROR] Tests run: 3604, Failures: 2, Errors: 1, Skipped: 44
>
>
> I have not been able to get all the tests to pass locally for a while :(
>
> JM
>
> 2018-01-06 15:05 GMT-05:00 Ted Yu :
>
>> Looks like you didn't include HBASE-19666 which would be in the next RC.
>>
>> On Sat, Jan 6, 2018 at 10:52 AM, Jean-Marc Spaggiari <
>> [email protected]> wrote:
>>
>> > Trying with a different command line (mvn test -P runAllTests
>> > -Dsurefire.secondPartThreadCount=12 -Dtest.build.data.basedirector
>> y=/ram4g
>> > ) I get all of those failing.  How are you able to get everything to
>> > pass???
>> >
>> > [INFO] Results:
>> > [INFO]
>> > [ERROR] Failures:
>> > [ERROR]   TestDefaultCompactSelection.testCompactionRatio:74->TestCom
>> > pactionPolicy.compactEquals:182->TestCompactionPolicy.compactEquals:201
>> > expected:<[[4, 2, 1]]> but was:<[[]]>
>> > [ERROR]   TestDefaultCompactSelection.testStuckStoreCompaction:145->T
>> > estCompactionPolicy.compactEquals:182->TestCompactionPolicy.
>> compactEquals:201
>> > expected:<[[]30, 30, 30]> but was:<[[99, 30, ]30, 30, 30]>
>> > [INFO]
>> > [ERROR] Tests run: 1235, Failures: 2, Errors: 0, Skipped: 4
>> >
>> > Second run:
>> > [INFO] Results:
>> > [INFO]
>> > [ERROR] Failures:
>> > [ERROR]   TestDefaultCompactSelection.testCompactionRatio:74->
>> > TestCompactionPolicy.compactEquals:182->TestCompactionPolicy
>> .compactEquals:201
>> > expected:<[[4, 2, 1]]> but was:<[[]]>
>> > [ERROR]   TestDefaultCompactSelection.testStuckStoreCompaction:145->

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-06 Thread Jean-Marc Spaggiari
Deleted the class to get all the tests running. Was running on the RC1 from
the tar.

I now get these ones failing.

[ERROR] Failures:
[ERROR]   TestFavoredStochasticLoadBalancer.test2FavoredNodesDead:352
Balancer did not run
[ERROR]   TestRegionMergeTransactionOnCluster.testCleanMergeReference:284
hdfs://localhost:45311/user/jmspaggi/test-data/7c269e83-5982-449e-8cf8-6bab4c7c/data/default/testCleanMergeReference/f1bdc6441b090dbacb391c74eaf0d1d0
[ERROR] Errors:
[ERROR]   TestDLSAsyncFSWAL>AbstractTestDLS.testThreeRSAbort:401 »
TableNotFound Region ...
[INFO]
[ERROR] Tests run: 3604, Failures: 2, Errors: 1, Skipped: 44


I have not been able to get all the tests to pass locally for a while :(

JM

2018-01-06 15:05 GMT-05:00 Ted Yu :

> Looks like you didn't include HBASE-19666 which would be in the next RC.
>
> On Sat, Jan 6, 2018 at 10:52 AM, Jean-Marc Spaggiari <
> [email protected]> wrote:
>
> > Trying with a different command line (mvn test -P runAllTests
> > -Dsurefire.secondPartThreadCount=12 -Dtest.build.data.basedirector
> y=/ram4g
> > ) I get all of those failing.  How are you able to get everything to
> > pass???
> >
> > [INFO] Results:
> > [INFO]
> > [ERROR] Failures:
> > [ERROR]   TestDefaultCompactSelection.testCompactionRatio:74->TestCom
> > pactionPolicy.compactEquals:182->TestCompactionPolicy.compactEquals:201
> > expected:<[[4, 2, 1]]> but was:<[[]]>
> > [ERROR]   TestDefaultCompactSelection.testStuckStoreCompaction:145->T
> > estCompactionPolicy.compactEquals:182->TestCompactionPolicy.
> compactEquals:201
> > expected:<[[]30, 30, 30]> but was:<[[99, 30, ]30, 30, 30]>
> > [INFO]
> > [ERROR] Tests run: 1235, Failures: 2, Errors: 0, Skipped: 4
> >
> > Second run:
> > [INFO] Results:
> > [INFO]
> > [ERROR] Failures:
> > [ERROR]   TestDefaultCompactSelection.testCompactionRatio:74->
> > TestCompactionPolicy.compactEquals:182->TestCompactionPolicy
> .compactEquals:201
> > expected:<[[4, 2, 1]]> but was:<[[]]>
> > [ERROR]   TestDefaultCompactSelection.testStuckStoreCompaction:145->
> > TestCompactionPolicy.compactEquals:182->TestCompactionPolicy
> .compactEquals:201
> > expected:<[[]30, 30, 30]> but was:<[[99, 30, ]30, 30, 30]>
> > [INFO]
> > [ERROR] Tests run: 1235, Failures: 2, Errors: 0, Skipped: 4
> >
> > Then again:
> >
> > [INFO] Results:
> > [INFO]
> > [ERROR] Failures:
> > [ERROR]   TestDefaultCompactSelection.testCompactionRatio:74->
> > TestCompactionPolicy.compactEquals:182->TestCompactionPolicy
> .compactEquals:201
> > expected:<[[4, 2, 1]]> but was:<[[]]>
> > [ERROR]   TestDefaultCompactSelection.testStuckStoreCompaction:145->
> > TestCompactionPolicy.compactEquals:182->TestCompactionPolicy
> .compactEquals:201
> > expected:<[[]30, 30, 30]> but was:<[[99, 30, ]30, 30, 30]>
> > [INFO]
> > [ERROR] Tests run: 1235, Failures: 2, Errors: 0, Skipped: 4
> > [INFO]
> > [INFO] 
> > 
> > [INFO] Reactor Summary:
> >
> >
> > Sounds like it's always the exact same result. Do I have a way to exclude
> > this TestCompactionPolicy test from the run?
> >
> > Here are more details from the last failure:
> > 
> > ---
> > Test set: org.apache.hadoop.hbase.regionserver.TestDefaultCompactSelec
> tion
> > 
> > ---
> > Tests run: 4, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 1.323 s
> > <<< FAILURE! - in org.apache.hadoop.hbase.regionserver.
> > TestDefaultCompactSelection
> > testStuckStoreCompaction(org.apache.hadoop.hbase.regionserve
> r.TestDefaultCompactSelection)
> > Time elapsed: 1.047 s  <<< FAILURE!
> > org.junit.ComparisonFailure: expected:<[[]30, 30, 30]> but was:<[[99, 30,
> > ]30, 30, 30]>
> > at org.apache.hadoop.hbase.regionserver.
> > TestDefaultCompactSelection.testStuckStoreCompaction(
> > TestDefaultCompactSelection.java:145)
> >
> > testCompactionRatio(org.apache.hadoop.hbase.regionserver.Tes
> tDefaultCompactSelection)
> > Time elapsed: 0.096 s  <<< FAILURE!
> > org.junit.ComparisonFailure: expected:<[[4, 2, 1]]> but was:<[[]]>
> > at org.apache.hadoop.hbase.regionserver.
> > TestDefaultCompactSelection.testCompactionRatio(
> > TestDefaultCompactSelection.java:74)
> >
> >
> > 2018-01-06 12:53:53,240 WARN  [StoreOpener-22ce1d683ba4b6b93
> 73a3c541ebab2a2-1]
> > util.CommonFSUtils(536): FileSystem doesn't support setStoragePolicy;
> > HDFS-6584, HDFS-9345 not available. This is normal and expected on
> earlier
> > Hadoop versions.
> > java.lang.NoSuchMethodException: org.apache.hadoop.fs.LocalFileSystem.
> > setStoragePolicy(org.apache.hadoop.fs.Path, java.lang.String)
> > at java.lang.Class.getDeclaredMethod(Class.java:2130)
> > at org.apache.hadoop.hbase.util.CommonFSUtils.
> > invokeSetStoragePolicy(CommonFSUtils.java:528)
> > at org.apache.hadoop.hbase.util.CommonFSUtils.setStoragePolicy(
> > CommonFSUtils.java:5

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-06 Thread Ted Yu
Looks like you didn't include HBASE-19666, which would be in the next RC.

On Sat, Jan 6, 2018 at 10:52 AM, Jean-Marc Spaggiari <
[email protected]> wrote:

> Trying with a different command line (mvn test -P runAllTests
> -Dsurefire.secondPartThreadCount=12 -Dtest.build.data.basedirectory=/ram4g
> ) I get all of those failing.  How are you able to get everything to pass???
>
> [INFO] Results:
> [INFO]
> [ERROR] Failures:
> [ERROR]   TestDefaultCompactSelection.testCompactionRatio:74->TestCom
> pactionPolicy.compactEquals:182->TestCompactionPolicy.compactEquals:201
> expected:<[[4, 2, 1]]> but was:<[[]]>
> [ERROR]   TestDefaultCompactSelection.testStuckStoreCompaction:145->T
> estCompactionPolicy.compactEquals:182->TestCompactionPolicy.compactEquals:201
> expected:<[[]30, 30, 30]> but was:<[[99, 30, ]30, 30, 30]>
> [INFO]
> [ERROR] Tests run: 1235, Failures: 2, Errors: 0, Skipped: 4
>
> Second run:
> [INFO] Results:
> [INFO]
> [ERROR] Failures:
> [ERROR]   TestDefaultCompactSelection.testCompactionRatio:74->
> TestCompactionPolicy.compactEquals:182->TestCompactionPolicy.compactEquals:201
> expected:<[[4, 2, 1]]> but was:<[[]]>
> [ERROR]   TestDefaultCompactSelection.testStuckStoreCompaction:145->
> TestCompactionPolicy.compactEquals:182->TestCompactionPolicy.compactEquals:201
> expected:<[[]30, 30, 30]> but was:<[[99, 30, ]30, 30, 30]>
> [INFO]
> [ERROR] Tests run: 1235, Failures: 2, Errors: 0, Skipped: 4
>
> Then again:
>
> [INFO] Results:
> [INFO]
> [ERROR] Failures:
> [ERROR]   TestDefaultCompactSelection.testCompactionRatio:74->
> TestCompactionPolicy.compactEquals:182->TestCompactionPolicy.compactEquals:201
> expected:<[[4, 2, 1]]> but was:<[[]]>
> [ERROR]   TestDefaultCompactSelection.testStuckStoreCompaction:145->
> TestCompactionPolicy.compactEquals:182->TestCompactionPolicy.compactEquals:201
> expected:<[[]30, 30, 30]> but was:<[[99, 30, ]30, 30, 30]>
> [INFO]
> [ERROR] Tests run: 1235, Failures: 2, Errors: 0, Skipped: 4
> [INFO]
> [INFO] 
> 
> [INFO] Reactor Summary:
>
>
> Sounds like it's always the exact same result. Do I have a way to exclude
> this TestCompactionPolicy test from the run?
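[Editorial note: one common way to do this with Maven Surefire is the `!`
exclusion syntax for `-Dtest`. This is a hedged sketch, not verified against
this RC's pom: the `!` syntax requires Surefire 2.19+, and the property names
here (`-Dtest`, `-DfailIfNoTests`) are standard Surefire properties, not
HBase-specific flags.]

```shell
# Sketch: exclude the failing compaction-selection test class while running
# the rest of the suite. Surefire 2.19+ understands '!' exclusions in -Dtest;
# -DfailIfNoTests=false keeps modules with no matching tests from failing.
MVN_ARGS="test -P runAllTests -Dtest=!TestDefaultCompactSelection -DfailIfNoTests=false"

# Print the full command here instead of invoking mvn.
echo "mvn $MVN_ARGS"
```

Alternatively, the exclusion can be made permanent by adding the class to the
surefire plugin's `<excludes>` list in the pom, at the cost of silently
skipping it for everyone.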
>
> Here are more details from the last failure:
> 
> ---
> Test set: org.apache.hadoop.hbase.regionserver.TestDefaultCompactSelection
> 
> ---
> Tests run: 4, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 1.323 s
> <<< FAILURE! - in org.apache.hadoop.hbase.regionserver.
> TestDefaultCompactSelection
> testStuckStoreCompaction(org.apache.hadoop.hbase.regionserver.TestDefaultCompactSelection)
> Time elapsed: 1.047 s  <<< FAILURE!
> org.junit.ComparisonFailure: expected:<[[]30, 30, 30]> but was:<[[99, 30,
> ]30, 30, 30]>
> at org.apache.hadoop.hbase.regionserver.
> TestDefaultCompactSelection.testStuckStoreCompaction(
> TestDefaultCompactSelection.java:145)
>
> testCompactionRatio(org.apache.hadoop.hbase.regionserver.TestDefaultCompactSelection)
> Time elapsed: 0.096 s  <<< FAILURE!
> org.junit.ComparisonFailure: expected:<[[4, 2, 1]]> but was:<[[]]>
> at org.apache.hadoop.hbase.regionserver.
> TestDefaultCompactSelection.testCompactionRatio(
> TestDefaultCompactSelection.java:74)
>
>
> 2018-01-06 12:53:53,240 WARN  [StoreOpener-22ce1d683ba4b6b9373a3c541ebab2a2-1]
> util.CommonFSUtils(536): FileSystem doesn't support setStoragePolicy;
> HDFS-6584, HDFS-9345 not available. This is normal and expected on earlier
> Hadoop versions.
> java.lang.NoSuchMethodException: org.apache.hadoop.fs.LocalFileSystem.
> setStoragePolicy(org.apache.hadoop.fs.Path, java.lang.String)
> at java.lang.Class.getDeclaredMethod(Class.java:2130)
> at org.apache.hadoop.hbase.util.CommonFSUtils.
> invokeSetStoragePolicy(CommonFSUtils.java:528)
> at org.apache.hadoop.hbase.util.CommonFSUtils.setStoragePolicy(
> CommonFSUtils.java:518)
> at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.
> setStoragePolicy(HRegionFileSystem.java:193)
> at org.apache.hadoop.hbase.regionserver.HStore.(
> HStore.java:250)
> at org.apache.hadoop.hbase.regionserver.HRegion.
> instantiateHStore(HRegion.java:5497)
> at org.apache.hadoop.hbase.regionserver.HRegion$1.call(
> HRegion.java:1002)
> at org.apache.hadoop.hbase.regionserver.HRegion$1.call(
> HRegion.java:999)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.
> call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
>
>
>
> 2018-01-06 12:

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-06 Thread Mike Drob
I can reproduce the issue locally. IIRC it is related to the version of Java
being used, but we can discuss in more detail on the JIRA.

https://issues.apache.org/jira/browse/HBASE-19721

Thanks, JMS!

On Sat, Jan 6, 2018 at 6:42 AM, Jean-Marc Spaggiari  wrote:

> How are you guys able to get the tests running?
>
> For me it keeps failing on TestReversedScannerCallable.
>
> I tried many times, always fails in the same place. I'm running on a 4GB
> tmpfs. Details are below. Am I doing something wrong?
>
> JM
>
>
>
> ./dev-support/hbasetests.sh runAllTests
>
>
>
> [INFO] Running org.apache.hadoop.hbase.client.TestOperation
> [INFO]
> [INFO] Results:
> [INFO]
> [ERROR] Errors:
> [ERROR]   TestReversedScannerCallable.unnecessary Mockito stubbings »
> UnnecessaryStubbing
> [INFO]
> [ERROR] Tests run: 245, Failures: 0, Errors: 1, Skipped: 8
> [INFO]
> [INFO]
> 
> [INFO] Reactor Summary:
> [INFO]
> [INFO] Apache HBase ... SUCCESS [
> 1.409 s]
> [INFO] Apache HBase - Checkstyle .. SUCCESS [
> 1.295 s]
> [INFO] Apache HBase - Build Support ... SUCCESS [
> 0.038 s]
> [INFO] Apache HBase - Error Prone Rules ... SUCCESS [
> 1.069 s]
> [INFO] Apache HBase - Annotations . SUCCESS [
> 1.450 s]
> [INFO] Apache HBase - Build Configuration . SUCCESS [
> 0.073 s]
> [INFO] Apache HBase - Shaded Protocol . SUCCESS [
> 14.292 s]
> [INFO] Apache HBase - Common .. SUCCESS [01:51
> min]
> [INFO] Apache HBase - Metrics API . SUCCESS [
> 2.878 s]
> [INFO] Apache HBase - Hadoop Compatibility  SUCCESS [
> 12.216 s]
> [INFO] Apache HBase - Metrics Implementation .. SUCCESS [
> 7.206 s]
> [INFO] Apache HBase - Hadoop Two Compatibility  SUCCESS [
> 12.440 s]
> [INFO] Apache HBase - Protocol  SUCCESS [
> 0.074 s]
> [INFO] Apache HBase - Client .. FAILURE [02:10
> min]
> [INFO] Apache HBase - Zookeeper ... SKIPPED
> [INFO] Apache HBase - Replication . SKIPPED
>
>
>
>
>
> 
> ---
> Test set: org.apache.hadoop.hbase.client.TestReversedScannerCallable
> 
> ---
> Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.515 s <<<
> FAILURE! - in org.apache.hadoop.hbase.client.TestReversedScannerCallable
> unnecessary Mockito
> stubbings(org.apache.hadoop.hbase.client.TestReversedScannerCallable)
> Time
> elapsed: 0.014 s  <<< ERROR!
> org.mockito.exceptions.misusing.UnnecessaryStubbingException:
>
> Unnecessary stubbings detected in test class: TestReversedScannerCallable
> Clean & maintainable test code requires zero unnecessary code.
> Following stubbings are unnecessary (click to navigate to relevant line of
> code):
>   1. -> at
> org.apache.hadoop.hbase.client.TestReversedScannerCallable.setUp(
> TestReversedScannerCallable.java:66)
>   2. -> at
> org.apache.hadoop.hbase.client.TestReversedScannerCallable.setUp(
> TestReversedScannerCallable.java:68)
> Please remove unnecessary stubbings. More info: javadoc for
> UnnecessaryStubbingException class.
>
>
> 2018-01-06 0:44 GMT-05:00 stack :
>
> > On Jan 5, 2018 4:44 PM, "Apekshit Sharma"  wrote:
> >
> > bq. Care needs to be exercised backporting. Bug fixes only please. If in
> > doubt, ping me, the RM, please. Thanks.
> > In that case, shouldn't we branch out branch-2.0? We can then do normal
> > backports to branch-2 and only bug fixes to branch-2.0.
> >
> >
> >
> > Don't you think we have enough branches already mighty Appy?
> >
> > No new features on branch-2? New features are in master/3.0.0 only?
> >
> > S
> >
> >
> >
> >
> >
> >
> > On Fri, Jan 5, 2018 at 9:48 AM, Andrew Purtell 
> > wrote:
> >
> > > TestMemstoreLABWithoutPool is a flake, not a consistent fail.
> > >
> > >
> > > On Fri, Jan 5, 2018 at 7:18 AM, Stack  wrote:
> > >
> > > > On Thu, Jan 4, 2018 at 2:24 PM, Andrew Purtell 
> > > > wrote:
> > > >
> > > > > This one is probably my fault:
> > > > >
> > > > > TestDefaultCompactSelection
> > > > >
> > > > > HBASE-19406
> > > > >
> > > > >
> > > > Balazs fixed it above, HBASE-19666
> > > >
> > > >
> > > >
> > > > > It can easily be reverted. The failure of interest
> > > > > is TestMemstoreLABWithoutPool.testLABChunkQueueWithMultipleMSLABs.
> > > > >
> > > > >
> > > > This seems fine. Passes in nightly
> > > > https://builds.apache.org/view/H-L/view/HBase/job/HBase%
> > > > 20Nightly/job/branch-2/171/testReport/org.apache.hadoop.
> > > > hbase.regionserver/TestMemstoreLABWithoutPool/
> > > > and locally against the tag. It fails consistently for you Andrew?

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-06 Thread Jean-Marc Spaggiari
How are you guys able to get the tests running?

For me it keeps failing on TestReversedScannerCallable.

I tried many times, always fails in the same place. I'm running on a 4GB
tmpfs. Details are below. Am I doing something wrong?

JM



./dev-support/hbasetests.sh runAllTests



[INFO] Running org.apache.hadoop.hbase.client.TestOperation
[INFO]
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR]   TestReversedScannerCallable.unnecessary Mockito stubbings »
UnnecessaryStubbing
[INFO]
[ERROR] Tests run: 245, Failures: 0, Errors: 1, Skipped: 8
[INFO]
[INFO]

[INFO] Reactor Summary:
[INFO]
[INFO] Apache HBase ... SUCCESS [
1.409 s]
[INFO] Apache HBase - Checkstyle .. SUCCESS [
1.295 s]
[INFO] Apache HBase - Build Support ... SUCCESS [
0.038 s]
[INFO] Apache HBase - Error Prone Rules ... SUCCESS [
1.069 s]
[INFO] Apache HBase - Annotations . SUCCESS [
1.450 s]
[INFO] Apache HBase - Build Configuration . SUCCESS [
0.073 s]
[INFO] Apache HBase - Shaded Protocol . SUCCESS [
14.292 s]
[INFO] Apache HBase - Common .. SUCCESS [01:51
min]
[INFO] Apache HBase - Metrics API . SUCCESS [
2.878 s]
[INFO] Apache HBase - Hadoop Compatibility  SUCCESS [
12.216 s]
[INFO] Apache HBase - Metrics Implementation .. SUCCESS [
7.206 s]
[INFO] Apache HBase - Hadoop Two Compatibility  SUCCESS [
12.440 s]
[INFO] Apache HBase - Protocol  SUCCESS [
0.074 s]
[INFO] Apache HBase - Client .. FAILURE [02:10
min]
[INFO] Apache HBase - Zookeeper ... SKIPPED
[INFO] Apache HBase - Replication . SKIPPED





---
Test set: org.apache.hadoop.hbase.client.TestReversedScannerCallable
---
Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.515 s <<<
FAILURE! - in org.apache.hadoop.hbase.client.TestReversedScannerCallable
unnecessary Mockito
stubbings(org.apache.hadoop.hbase.client.TestReversedScannerCallable)  Time
elapsed: 0.014 s  <<< ERROR!
org.mockito.exceptions.misusing.UnnecessaryStubbingException:

Unnecessary stubbings detected in test class: TestReversedScannerCallable
Clean & maintainable test code requires zero unnecessary code.
Following stubbings are unnecessary (click to navigate to relevant line of
code):
  1. -> at
org.apache.hadoop.hbase.client.TestReversedScannerCallable.setUp(TestReversedScannerCallable.java:66)
  2. -> at
org.apache.hadoop.hbase.client.TestReversedScannerCallable.setUp(TestReversedScannerCallable.java:68)
Please remove unnecessary stubbings. More info: javadoc for
UnnecessaryStubbingException class.


2018-01-06 0:44 GMT-05:00 stack :

> On Jan 5, 2018 4:44 PM, "Apekshit Sharma"  wrote:
>
> bq. Care needs to be exercised backporting. Bug fixes only please. If in
> doubt, ping me, the RM, please. Thanks.
> In that case, shouldn't we branch out branch-2.0? We can then do normal
> backports to branch-2 and only bug fixes to branch-2.0.
>
>
>
> Don't you think we have enough branches already mighty Appy?
>
> No new features on branch-2? New features are in master/3.0.0 only?
>
> S
>
>
>
>
>
>
> On Fri, Jan 5, 2018 at 9:48 AM, Andrew Purtell 
> wrote:
>
> > TestMemstoreLABWithoutPool is a flake, not a consistent fail.
> >
> >
> > On Fri, Jan 5, 2018 at 7:18 AM, Stack  wrote:
> >
> > > On Thu, Jan 4, 2018 at 2:24 PM, Andrew Purtell 
> > > wrote:
> > >
> > > > This one is probably my fault:
> > > >
> > > > TestDefaultCompactSelection
> > > >
> > > > HBASE-19406
> > > >
> > > >
> > > Balazs fixed it above, HBASE-19666
> > >
> > >
> > >
> > > > It can easily be reverted. The failure of interest
> > > > is TestMemstoreLABWithoutPool.testLABChunkQueueWithMultipleMSLABs.
> > > >
> > > >
> > > This seems fine. Passes in nightly
> > > https://builds.apache.org/view/H-L/view/HBase/job/HBase%
> > > 20Nightly/job/branch-2/171/testReport/org.apache.hadoop.
> > > hbase.regionserver/TestMemstoreLABWithoutPool/
> > > and locally against the tag. It fails consistently for you Andrew?
> > >
> > >
> > > > > Should all unit tests pass on a beta? I think so, at least if the
> > > > failures
> > > > > are 100% repeatable.
> > > > >
> > > >
> > >
> > > This is fair. Let me squash this RC and roll another.
> > >
> > > Will put it up in a few hours.
> > >
> > > Thanks,
> > > S
> > >
> > >
> > >
> > > > > -0
> > > > >
> > > > > Checked sums and signatures: ok
> > > > > RAT check: ok
> > > > > Built from source: ok (8u144)
> > > > > Ran unit tests: some failures (8u144)
> > > > >
> > > > > [ERROR]   TestDefaultCompactSelection.testCompactionRatio:74->
> > > > > TestComp

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-05 Thread stack
On Jan 5, 2018 4:44 PM, "Apekshit Sharma"  wrote:

bq. Care needs to be exercised backporting. Bug fixes only please. If in
doubt, ping me, the RM, please. Thanks.
In that case, shouldn't we branch out branch-2.0? We can then do normal
backports to branch-2 and only bug fixes to branch-2.0.



Don't you think we have enough branches already, mighty Appy?

No new features on branch-2? New features are in master/3.0.0 only?

S






On Fri, Jan 5, 2018 at 9:48 AM, Andrew Purtell  wrote:

> TestMemstoreLABWithoutPool is a flake, not a consistent fail.
>
>
> On Fri, Jan 5, 2018 at 7:18 AM, Stack  wrote:
>
> > On Thu, Jan 4, 2018 at 2:24 PM, Andrew Purtell 
> > wrote:
> >
> > > This one is probably my fault:
> > >
> > > TestDefaultCompactSelection
> > >
> > > HBASE-19406
> > >
> > >
> > Balazs fixed it above, HBASE-19666
> >
> >
> >
> > > It can easily be reverted. The failure of interest
> > > is TestMemstoreLABWithoutPool.testLABChunkQueueWithMultipleMSLABs.
> > >
> > >
> > This seems fine. Passes in nightly
> > https://builds.apache.org/view/H-L/view/HBase/job/HBase%
> > 20Nightly/job/branch-2/171/testReport/org.apache.hadoop.
> > hbase.regionserver/TestMemstoreLABWithoutPool/
> > and locally against the tag. It fails consistently for you Andrew?
> >
> >
> > > > Should all unit tests pass on a beta? I think so, at least if the
> > > failures
> > > > are 100% repeatable.
> > > >
> > >
> >
> > This is fair. Let me squash this RC and roll another.
> >
> > Will put it up in a few hours.
> >
> > Thanks,
> > S
> >
> >
> >
> > > > -0
> > > >
> > > > Checked sums and signatures: ok
> > > > RAT check: ok
> > > > Built from source: ok (8u144)
> > > > Ran unit tests: some failures (8u144)
> > > >
> > > > [ERROR]   TestDefaultCompactSelection.testCompactionRatio:74->
> > > > TestCompactionPolicy.compactEquals:182->TestCompactionPolicy.
> > > compactEquals:201
> > > > expected:<[[4, 2, 1]]> but was:<[[]]>
> > > >
> > > > [ERROR]   TestDefaultCompactSelection.testStuckStoreCompaction:145->
> > > > TestCompactionPolicy.compactEquals:182->TestCompactionPolicy.
> > > compactEquals:201
> > > > expected:<[[]30, 30, 30]> but was:<[[99, 30, ]30, 30, 30]>
> > > >
> > > > [ERROR]   TestMemstoreLABWithoutPool.testLABChunkQueueWithMultipleM
> > > SLABs:143
> > > > All the chunks must have been cleared
> > > >
> > > >
> > > >
> > > > On Fri, Dec 29, 2017 at 10:15 AM, Stack  wrote:
> > > >
> > > >> The first release candidate for HBase 2.0.0-beta-1 is up at:
> > > >>
> > > >>  https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.0-beta-1-RC0/
> > > >>
> > > >> Maven artifacts are available from a staging directory here:
> > > >>
> > > >>  https://repository.apache.org/content/repositories/orgapachehbase-1188
> > > >>
> > > >> All was signed with my key at 8ACC93D2 [1]
> > > >>
> > > >> I tagged the RC as 2.0.0-beta-1-RC0
> > > >> (0907563eb72697b394b8b960fe54887d6ff304fd)
> > > >>
> > > >> hbase-2.0.0-beta-1 is our first beta release. It includes all that
> > > >> was in
> > > >> previous alphas (new assignment manager, offheap read/write path,
> > > >> in-memory
> > > >> compactions, etc.). The APIs and feature-set are sealed.
> > > >>
> > > >> hbase-2.0.0-beta-1 is a not-for-production preview of hbase-2.0.0. It
> > > >> is meant for devs and downstreamers to test drive and flag us if we
> > > >> messed up on anything ahead of our rolling GAs. We are particularly
> > > >> interested in hearing from Coprocessor developers.
> > > >>
> > > >> The list of features addressed in 2.0.0 so far can be found here
> > > >> [3]. There are thousands. The list of ~2k+ fixes in 2.0.0 exclusively
> > > >> can be found here [4] (My JIRA JQL foo is a bit dodgy -- forgive me
> > > >> if there are mistakes).
> > > >>
> > > >> I've updated our overview doc. on the state of 2.0.0 [6]. We'll do
> > > >> one more beta before we put up our first 2.0.0 Release Candidate by
> > > >> the end of January, 2.0.0-beta-2. Its focus will be making it so
> > > >> users can do a rolling upgrade on to hbase-2.x from hbase-1.x (and
> > > >> any bug fixes found running beta-1). Here is the list of what we have
> > > >> targeted so far for beta-2 [5]. Check it out.
> > > >>
> > > >> One known issue is that the User API has not been properly filtered
> > > >> so it shows more than just InterfaceAudience Public content
> > > >> (HBASE-19663, to be fixed by beta-2).
> > > >>
> > > >> Please take this beta for a spin. Please vote on whether it is ok to
> > > >> put out this RC as our first beta (note CHANGES has not yet been
> > > >> updated). Let the VOTE be open for 72 hours (Monday).
> > > >>
> > > >> Thanks,
> > > >> Your 2.0.0 Release Manager
> > > >>
> > > >> 1. http://pgp.mit.edu/pks/lookup?op=get&search=0x9816C7FC8ACC93D2
> > > >> 3. https://goo.gl/scYjJr
> > > >> 4. https://goo.gl/dFFT8b
> > > >> 5. https://issues.apache.org/jira/projects/HBASE/versions/12340862
> > > >> 6. https://docs.google.com/docume

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-05 Thread Apekshit Sharma
bq. Care needs to be exercised backporting. Bug fixes only please. If in
doubt, ping me, the RM, please. Thanks.
In that case, shouldn't we branch out branch-2.0? We can then do normal
backports to branch-2 and only bug fixes to branch-2.0.
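
The proposed split can be sketched with plain git (a toy, hypothetical repo; only the branch names come from the thread). The idea: cut branch-2.0 from branch-2, after which branch-2 keeps taking normal backports and branch-2.0 takes bug fixes only.

```shell
# Toy repository standing in for the hbase repo (names illustrative).
git init -q demo-hbase && cd demo-hbase
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "base"
git branch branch-2                      # the ongoing 2.x development line
git checkout -q -b branch-2.0 branch-2   # cut the release line from branch-2
git rev-parse --abbrev-ref HEAD          # confirms we are now on branch-2.0
# A bug fix would then be cherry-picked to both branch-2 and branch-2.0;
# feature backports stop at branch-2.
```
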

On Fri, Jan 5, 2018 at 9:48 AM, Andrew Purtell  wrote:

> TestMemstoreLABWithoutPool is a flake, not a consistent fail.
>
>
> On Fri, Jan 5, 2018 at 7:18 AM, Stack  wrote:
>
> > On Thu, Jan 4, 2018 at 2:24 PM, Andrew Purtell 
> > wrote:
> >
> > > This one is probably my fault:
> > >
> > > TestDefaultCompactSelection
> > >
> > > HBASE-19406
> > >
> > >
> > Balazs fixed it above, HBASE-19666
> >
> >
> >
> > > It can easily be reverted. The failure of interest
> > > is TestMemstoreLABWithoutPool.testLABChunkQueueWithMultipleMSLABs.
> > >
> > >
> > This seems fine. Passes in nightly
> > https://builds.apache.org/view/H-L/view/HBase/job/HBase%
> > 20Nightly/job/branch-2/171/testReport/org.apache.hadoop.
> > hbase.regionserver/TestMemstoreLABWithoutPool/
> > and locally against the tag. It fails consistently for you Andrew?
> >
> >
> > > > Should all unit tests pass on a beta? I think so, at least if the
> > > failures
> > > > are 100% repeatable.
> > > >
> > >
> >
> > This is fair. Let me squash this RC and roll another.
> >
> > Will put it up in a few hours.
> >
> > Thanks,
> > S
> >
> >
> >
> > > > -0
> > > >
> > > > Checked sums and signatures: ok
> > > > RAT check: ok
> > > > Built from source: ok (8u144)
> > > > Ran unit tests: some failures (8u144)
> > > >
> > > > [ERROR]   TestDefaultCompactSelection.testCompactionRatio:74->
> > > > TestCompactionPolicy.compactEquals:182->TestCompactionPolicy.
> > > compactEquals:201
> > > > expected:<[[4, 2, 1]]> but was:<[[]]>
> > > >
> > > > [ERROR]   TestDefaultCompactSelection.testStuckStoreCompaction:145->
> > > > TestCompactionPolicy.compactEquals:182->TestCompactionPolicy.
> > > compactEquals:201
> > > > expected:<[[]30, 30, 30]> but was:<[[99, 30, ]30, 30, 30]>
> > > >
> > > > [ERROR]   TestMemstoreLABWithoutPool.testLABChunkQueueWithMultipleM
> > > SLABs:143
> > > > All the chunks must have been cleared
> > > >
> > > >
> > > >
> > > > On Fri, Dec 29, 2017 at 10:15 AM, Stack  wrote:
> > > >
> > > >> The first release candidate for HBase 2.0.0-beta-1 is up at:
> > > >>
> > > >>  https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.0-
> beta-1-RC0/
> > > >>
> > > >> Maven artifacts are available from a staging directory here:
> > > >>
> > > >>  https://repository.apache.org/content/repositories/
> > orgapachehbase-1188
> > > >>
> > > >> All was signed with my key at 8ACC93D2 [1]
> > > >>
> > > >> I tagged the RC as 2.0.0-beta-1-RC0
> > > >> (0907563eb72697b394b8b960fe54887d6ff304fd)
> > > >>
> > > >> hbase-2.0.0-beta-1 is our first beta release. It includes all that
> was
> > > in
> > > >> previous alphas (new assignment manager, offheap read/write path,
> > > >> in-memory
> > > >> compactions, etc.). The APIs and feature-set are sealed.
> > > >>
> > > >> hbase-2.0.0-beta-1 is a not-for-production preview of hbase-2.0.0.
> It
> > is
> > > >> meant for devs and downstreamers to test drive and flag us if we
> > messed
> > > up
> > > >> on anything ahead of our rolling GAs. We are particularly interested
> in
> > > >> hearing from Coprocessor developers.
> > > >>
> > > >> The list of features addressed in 2.0.0 so far can be found here
> [3].
> > > >> There
> > > >> are thousands. The list of ~2k+ fixes in 2.0.0 exclusively can be
> > found
> > > >> here [4] (My JIRA JQL foo is a bit dodgy -- forgive me if mistakes).
> > > >>
> > > >> I've updated our overview doc. on the state of 2.0.0 [6]. We'll do
> one
> > > >> more
> > > >> beta before we put up our first 2.0.0 Release Candidate by the end
> of
> > > >> January, 2.0.0-beta-2. Its focus will be making it so users can do a
> > > >> rolling upgrade on to hbase-2.x from hbase-1.x (and any bug fixes
> > found
> > > >> running beta-1). Here is the list of what we have targeted so far
> for
> > > >> beta-2 [5]. Check it out.
> > > >>
> > > >> One known issue is that the User API has not been properly filtered
> so
> > it
> > > >> shows more than just InterfaceAudience Public content (HBASE-19663,
> to
> > > be
> > > >> fixed by beta-2).
> > > >>
> > > >> Please take this beta for a spin. Please vote on whether it is OK to
> put
> > > out
> > > >> this RC as our first beta (Note CHANGES has not yet been updated).
> Let
> > > the
> > > >> VOTE be open for 72 hours (Monday)
> > > >>
> > > >> Thanks,
> > > >> Your 2.0.0 Release Manager
> > > >>
> > > >> 1. http://pgp.mit.edu/pks/lookup?op=get&search=0x9816C7FC8ACC93D2
> > > >> 3. https://goo.gl/scYjJr
> > > >> 4. https://goo.gl/dFFT8b
> > > >> 5. https://issues.apache.org/jira/projects/HBASE/versions/12340862
> > > >> 6. https://docs.google.com/document/d/1WCsVlnHjJeKUcl7wHwqb4z9iEu_
> > > >> ktczrlKHK8N4SZzs/
> > > >>  > > ktczrlKHK8N4SZzs/>
> > > >>
> > > >
> > > >
> > > >

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-05 Thread Andrew Purtell
TestMemstoreLABWithoutPool is a flake, not a consistent fail.


On Fri, Jan 5, 2018 at 7:18 AM, Stack  wrote:

> On Thu, Jan 4, 2018 at 2:24 PM, Andrew Purtell 
> wrote:
>
> > This one is probably my fault:
> >
> > TestDefaultCompactSelection
> >
> > HBASE-19406
> >
> >
> Balazs fixed it above, HBASE-19666
>
>
>
> > It can easily be reverted. The failure of interest
> > is TestMemstoreLABWithoutPool.testLABChunkQueueWithMultipleMSLABs.
> >
> >
> This seems fine. Passes in nightly
> https://builds.apache.org/view/H-L/view/HBase/job/HBase%
> 20Nightly/job/branch-2/171/testReport/org.apache.hadoop.
> hbase.regionserver/TestMemstoreLABWithoutPool/
> and locally against the tag. It fails consistently for you Andrew?
>
>
> > > Should all unit tests pass on a beta? I think so, at least if the
> > failures
> > > are 100% repeatable.
> > >
> >
>
> This is fair. Let me squash this RC and roll another.
>
> Will put it up in a few hours.
>
> Thanks,
> S
>
>
>
> > > -0
> > >
> > > Checked sums and signatures: ok
> > > RAT check: ok
> > > Built from source: ok (8u144)
> > > Ran unit tests: some failures (8u144)
> > >
> > > [ERROR]   TestDefaultCompactSelection.testCompactionRatio:74->
> > > TestCompactionPolicy.compactEquals:182->TestCompactionPolicy.
> > compactEquals:201
> > > expected:<[[4, 2, 1]]> but was:<[[]]>
> > >
> > > [ERROR]   TestDefaultCompactSelection.testStuckStoreCompaction:145->
> > > TestCompactionPolicy.compactEquals:182->TestCompactionPolicy.
> > compactEquals:201
> > > expected:<[[]30, 30, 30]> but was:<[[99, 30, ]30, 30, 30]>
> > >
> > > [ERROR]   TestMemstoreLABWithoutPool.testLABChunkQueueWithMultipleM
> > SLABs:143
> > > All the chunks must have been cleared
> > >
> > >
> > >
> > > On Fri, Dec 29, 2017 at 10:15 AM, Stack  wrote:
> > >
> > >> The first release candidate for HBase 2.0.0-beta-1 is up at:
> > >>
> > >>  https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.0-beta-1-RC0/
> > >>
> > >> Maven artifacts are available from a staging directory here:
> > >>
> > >>  https://repository.apache.org/content/repositories/
> orgapachehbase-1188
> > >>
> > >> All was signed with my key at 8ACC93D2 [1]
> > >>
> > >> I tagged the RC as 2.0.0-beta-1-RC0
> > >> (0907563eb72697b394b8b960fe54887d6ff304fd)
> > >>
> > >> hbase-2.0.0-beta-1 is our first beta release. It includes all that was
> > in
> > >> previous alphas (new assignment manager, offheap read/write path,
> > >> in-memory
> > >> compactions, etc.). The APIs and feature-set are sealed.
> > >>
> > >> hbase-2.0.0-beta-1 is a not-for-production preview of hbase-2.0.0. It
> is
> > >> meant for devs and downstreamers to test drive and flag us if we
> messed
> > up
> > >> on anything ahead of our rolling GAs. We are particularly interested in
> > >> hearing from Coprocessor developers.
> > >>
> > >> The list of features addressed in 2.0.0 so far can be found here [3].
> > >> There
> > >> are thousands. The list of ~2k+ fixes in 2.0.0 exclusively can be
> found
> > >> here [4] (My JIRA JQL foo is a bit dodgy -- forgive me if mistakes).
> > >>
> > >> I've updated our overview doc. on the state of 2.0.0 [6]. We'll do one
> > >> more
> > >> beta before we put up our first 2.0.0 Release Candidate by the end of
> > >> January, 2.0.0-beta-2. Its focus will be making it so users can do a
> > >> rolling upgrade on to hbase-2.x from hbase-1.x (and any bug fixes
> found
> > >> running beta-1). Here is the list of what we have targeted so far for
> > >> beta-2 [5]. Check it out.
> > >>
> > >> One known issue is that the User API has not been properly filtered so
> it
> > >> shows more than just InterfaceAudience Public content (HBASE-19663, to
> > be
> > >> fixed by beta-2).
> > >>
> > >> Please take this beta for a spin. Please vote on whether it is OK to put
> > out
> > >> this RC as our first beta (Note CHANGES has not yet been updated). Let
> > the
> > >> VOTE be open for 72 hours (Monday)
> > >>
> > >> Thanks,
> > >> Your 2.0.0 Release Manager
> > >>
> > >> 1. http://pgp.mit.edu/pks/lookup?op=get&search=0x9816C7FC8ACC93D2
> > >> 3. https://goo.gl/scYjJr
> > >> 4. https://goo.gl/dFFT8b
> > >> 5. https://issues.apache.org/jira/projects/HBASE/versions/12340862
> > >> 6. https://docs.google.com/document/d/1WCsVlnHjJeKUcl7wHwqb4z9iEu_
> > >> ktczrlKHK8N4SZzs/
> > >>  > ktczrlKHK8N4SZzs/>
> > >>
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Andrew
> > >
> > > Words like orphans lost among the crosstalk, meaning torn from truth's
> > > decrepit hands
> > >- A23, Crosstalk
> > >
> >
> >
> >
> > --
> > Best regards,
> > Andrew
> >
> > Words like orphans lost among the crosstalk, meaning torn from truth's
> > decrepit hands
> >- A23, Crosstalk
> >
>



-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk


Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-05 Thread Stack
On Thu, Jan 4, 2018 at 2:24 PM, Andrew Purtell  wrote:

> This one is probably my fault:
>
> TestDefaultCompactSelection
>
> HBASE-19406
>
>
Balazs fixed it above, HBASE-19666



> It can easily be reverted. The failure of interest
> is TestMemstoreLABWithoutPool.testLABChunkQueueWithMultipleMSLABs.
>
>
This seems fine. Passes in nightly
https://builds.apache.org/view/H-L/view/HBase/job/HBase%20Nightly/job/branch-2/171/testReport/org.apache.hadoop.hbase.regionserver/TestMemstoreLABWithoutPool/
and locally against the tag. It fails consistently for you Andrew?


> > Should all unit tests pass on a beta? I think so, at least if the
> failures
> > are 100% repeatable.
> >
>

This is fair. Let me squash this RC and roll another.

Will put it up in a few hours.

Thanks,
S



> > -0
> >
> > Checked sums and signatures: ok
> > RAT check: ok
> > Built from source: ok (8u144)
> > Ran unit tests: some failures (8u144)
> >
> > [ERROR]   TestDefaultCompactSelection.testCompactionRatio:74->
> > TestCompactionPolicy.compactEquals:182->TestCompactionPolicy.
> compactEquals:201
> > expected:<[[4, 2, 1]]> but was:<[[]]>
> >
> > [ERROR]   TestDefaultCompactSelection.testStuckStoreCompaction:145->
> > TestCompactionPolicy.compactEquals:182->TestCompactionPolicy.
> compactEquals:201
> > expected:<[[]30, 30, 30]> but was:<[[99, 30, ]30, 30, 30]>
> >
> > [ERROR]   TestMemstoreLABWithoutPool.testLABChunkQueueWithMultipleM
> SLABs:143
> > All the chunks must have been cleared
> >
> >
> >
> > On Fri, Dec 29, 2017 at 10:15 AM, Stack  wrote:
> >
> >> The first release candidate for HBase 2.0.0-beta-1 is up at:
> >>
> >>  https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.0-beta-1-RC0/
> >>
> >> Maven artifacts are available from a staging directory here:
> >>
> >>  https://repository.apache.org/content/repositories/orgapachehbase-1188
> >>
> >> All was signed with my key at 8ACC93D2 [1]
> >>
> >> I tagged the RC as 2.0.0-beta-1-RC0
> >> (0907563eb72697b394b8b960fe54887d6ff304fd)
> >>
> >> hbase-2.0.0-beta-1 is our first beta release. It includes all that was
> in
> >> previous alphas (new assignment manager, offheap read/write path,
> >> in-memory
> >> compactions, etc.). The APIs and feature-set are sealed.
> >>
> >> hbase-2.0.0-beta-1 is a not-for-production preview of hbase-2.0.0. It is
> >> meant for devs and downstreamers to test drive and flag us if we messed
> up
> >> on anything ahead of our rolling GAs. We are particularly interested in
> >> hearing from Coprocessor developers.
> >>
> >> The list of features addressed in 2.0.0 so far can be found here [3].
> >> There
> >> are thousands. The list of ~2k+ fixes in 2.0.0 exclusively can be found
> >> here [4] (My JIRA JQL foo is a bit dodgy -- forgive me if mistakes).
> >>
> >> I've updated our overview doc. on the state of 2.0.0 [6]. We'll do one
> >> more
> >> beta before we put up our first 2.0.0 Release Candidate by the end of
> >> January, 2.0.0-beta-2. Its focus will be making it so users can do a
> >> rolling upgrade on to hbase-2.x from hbase-1.x (and any bug fixes found
> >> running beta-1). Here is the list of what we have targeted so far for
> >> beta-2 [5]. Check it out.
> >>
> >> One known issue is that the User API has not been properly filtered so it
> >> shows more than just InterfaceAudience Public content (HBASE-19663, to
> be
> >> fixed by beta-2).
> >>
> >> Please take this beta for a spin. Please vote on whether it is OK to put
> out
> >> this RC as our first beta (Note CHANGES has not yet been updated). Let
> the
> >> VOTE be open for 72 hours (Monday)
> >>
> >> Thanks,
> >> Your 2.0.0 Release Manager
> >>
> >> 1. http://pgp.mit.edu/pks/lookup?op=get&search=0x9816C7FC8ACC93D2
> >> 3. https://goo.gl/scYjJr
> >> 4. https://goo.gl/dFFT8b
> >> 5. https://issues.apache.org/jira/projects/HBASE/versions/12340862
> >> 6. https://docs.google.com/document/d/1WCsVlnHjJeKUcl7wHwqb4z9iEu_
> >> ktczrlKHK8N4SZzs/
> >>  ktczrlKHK8N4SZzs/>
> >>
> >
> >
> >
> > --
> > Best regards,
> > Andrew
> >
> > Words like orphans lost among the crosstalk, meaning torn from truth's
> > decrepit hands
> >- A23, Crosstalk
> >
>
>
>
> --
> Best regards,
> Andrew
>
> Words like orphans lost among the crosstalk, meaning torn from truth's
> decrepit hands
>- A23, Crosstalk
>


Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-05 Thread Stack
On Thu, Jan 4, 2018 at 12:39 PM, Jean-Marc Spaggiari <
[email protected]> wrote:

> If I re-run from the original cluster, now that I have snappy enabled, it
> works. But if it helps I can easily remove snappy libs, transfer from
> source, re-run and capture all the logs. It's an easy step. Just confirm
> and I will do it.
>
> Apart from that, everything else seems to run correctly. I ran some
> RowCounters, merged regions, compactions, splits, alters, etc. Have not
> found anything else. Still +1 ;)
>
>
Sounds like disabling SNAPPY damaged the cluster. It shouldn't. Let me try
it myself. If I can't repro the damage I'll come looking for you.
Thanks JMS,
S



> JMS
>
> 2018-01-03 22:26 GMT-05:00 Stack :
>
> > +1 from me.
> > S
> >
> > On Fri, Dec 29, 2017 at 12:15 PM, Stack  wrote:
> >
> > > The first release candidate for HBase 2.0.0-beta-1 is up at:
> > >
> > >  https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.0-beta-1-RC0/
> > >
> > > Maven artifacts are available from a staging directory here:
> > >
> > >  https://repository.apache.org/content/repositories/
> orgapachehbase-1188
> > >
> > > All was signed with my key at 8ACC93D2 [1]
> > >
> > > I tagged the RC as 2.0.0-beta-1-RC0 (0907563eb72697b394b8b960fe5488
> > > 7d6ff304fd)
> > >
> > > hbase-2.0.0-beta-1 is our first beta release. It includes all that was
> in
> > > previous alphas (new assignment manager, offheap read/write path,
> > in-memory
> > > compactions, etc.). The APIs and feature-set are sealed.
> > >
> > > hbase-2.0.0-beta-1 is a not-for-production preview of hbase-2.0.0. It
> is
> > > meant for devs and downstreamers to test drive and flag us if we messed
> > up
> > > on anything ahead of our rolling GAs. We are particularly interested in
> > > hearing from Coprocessor developers.
> > >
> > > The list of features addressed in 2.0.0 so far can be found here [3].
> > > There are thousands. The list of ~2k+ fixes in 2.0.0 exclusively can be
> > > found here [4] (My JIRA JQL foo is a bit dodgy -- forgive me if
> > mistakes).
> > >
> > > I've updated our overview doc. on the state of 2.0.0 [6]. We'll do one
> > > more beta before we put up our first 2.0.0 Release Candidate by the end
> > of
> > > January, 2.0.0-beta-2. Its focus will be making it so users can do a
> > > rolling upgrade on to hbase-2.x from hbase-1.x (and any bug fixes found
> > > running beta-1). Here is the list of what we have targeted so far for
> > > beta-2 [5]. Check it out.
> > >
> > > One known issue is that the User API has not been properly filtered so
> it
> > > shows more than just InterfaceAudience Public content (HBASE-19663, to
> be
> > > fixed by beta-2).
> > >
> > > Please take this beta for a spin. Please vote on whether it is OK to put
> out
> > > this RC as our first beta (Note CHANGES has not yet been updated). Let
> > the
> > > VOTE be open for 72 hours (Monday)
> > >
> > > Thanks,
> > > Your 2.0.0 Release Manager
> > >
> > > 1. http://pgp.mit.edu/pks/lookup?op=get&search=0x9816C7FC8ACC93D2
> > > 3. https://goo.gl/scYjJr
> > > 4. https://goo.gl/dFFT8b
> > > 5. https://issues.apache.org/jira/projects/HBASE/versions/12340862
> > > 6. https://docs.google.com/document/d/1WCsVlnHjJeKUcl7wHwqb4
> > > z9iEu_ktczrlKHK8N4SZzs/
> > >
> >
>


Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-04 Thread Andrew Purtell
This one is probably my fault:

TestDefaultCompactSelection

HBASE-19406

It can easily be reverted. The failure of interest
is TestMemstoreLABWithoutPool.testLABChunkQueueWithMultipleMSLABs.


On Thu, Jan 4, 2018 at 12:22 PM, Andrew Purtell  wrote:

> Should all unit tests pass on a beta? I think so, at least if the failures
> are 100% repeatable.
>
> -0
>
> Checked sums and signatures: ok
> RAT check: ok
> Built from source: ok (8u144)
> Ran unit tests: some failures (8u144)
>
> [ERROR]   TestDefaultCompactSelection.testCompactionRatio:74->
> TestCompactionPolicy.compactEquals:182->TestCompactionPolicy.compactEquals:201
> expected:<[[4, 2, 1]]> but was:<[[]]>
>
> [ERROR]   TestDefaultCompactSelection.testStuckStoreCompaction:145->
> TestCompactionPolicy.compactEquals:182->TestCompactionPolicy.compactEquals:201
> expected:<[[]30, 30, 30]> but was:<[[99, 30, ]30, 30, 30]>
>
> [ERROR]   TestMemstoreLABWithoutPool.testLABChunkQueueWithMultipleMSLABs:143
> All the chunks must have been cleared
>
>
>
> On Fri, Dec 29, 2017 at 10:15 AM, Stack  wrote:
>
>> The first release candidate for HBase 2.0.0-beta-1 is up at:
>>
>>  https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.0-beta-1-RC0/
>>
>> Maven artifacts are available from a staging directory here:
>>
>>  https://repository.apache.org/content/repositories/orgapachehbase-1188
>>
>> All was signed with my key at 8ACC93D2 [1]
>>
>> I tagged the RC as 2.0.0-beta-1-RC0
>> (0907563eb72697b394b8b960fe54887d6ff304fd)
>>
>> hbase-2.0.0-beta-1 is our first beta release. It includes all that was in
>> previous alphas (new assignment manager, offheap read/write path,
>> in-memory
>> compactions, etc.). The APIs and feature-set are sealed.
>>
>> hbase-2.0.0-beta-1 is a not-for-production preview of hbase-2.0.0. It is
>> meant for devs and downstreamers to test drive and flag us if we messed up
> >> on anything ahead of our rolling GAs. We are particularly interested in
>> hearing from Coprocessor developers.
>>
>> The list of features addressed in 2.0.0 so far can be found here [3].
>> There
>> are thousands. The list of ~2k+ fixes in 2.0.0 exclusively can be found
>> here [4] (My JIRA JQL foo is a bit dodgy -- forgive me if mistakes).
>>
>> I've updated our overview doc. on the state of 2.0.0 [6]. We'll do one
>> more
>> beta before we put up our first 2.0.0 Release Candidate by the end of
>> January, 2.0.0-beta-2. Its focus will be making it so users can do a
>> rolling upgrade on to hbase-2.x from hbase-1.x (and any bug fixes found
>> running beta-1). Here is the list of what we have targeted so far for
>> beta-2 [5]. Check it out.
>>
> >> One known issue is that the User API has not been properly filtered so it
>> shows more than just InterfaceAudience Public content (HBASE-19663, to be
>> fixed by beta-2).
>>
> >> Please take this beta for a spin. Please vote on whether it is OK to put
>> this RC as our first beta (Note CHANGES has not yet been updated). Let the
>> VOTE be open for 72 hours (Monday)
>>
>> Thanks,
>> Your 2.0.0 Release Manager
>>
>> 1. http://pgp.mit.edu/pks/lookup?op=get&search=0x9816C7FC8ACC93D2
>> 3. https://goo.gl/scYjJr
>> 4. https://goo.gl/dFFT8b
>> 5. https://issues.apache.org/jira/projects/HBASE/versions/12340862
>> 6. https://docs.google.com/document/d/1WCsVlnHjJeKUcl7wHwqb4z9iEu_
>> ktczrlKHK8N4SZzs/
>> 
>>
>
>
>
> --
> Best regards,
> Andrew
>
> Words like orphans lost among the crosstalk, meaning torn from truth's
> decrepit hands
>- A23, Crosstalk
>



-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk


Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-04 Thread Andrew Purtell
Should all unit tests pass on a beta? I think so, at least if the failures
are 100% repeatable.

-0

Checked sums and signatures: ok
RAT check: ok
Built from source: ok (8u144)
Ran unit tests: some failures (8u144)

[ERROR]
 
TestDefaultCompactSelection.testCompactionRatio:74->TestCompactionPolicy.compactEquals:182->TestCompactionPolicy.compactEquals:201
expected:<[[4, 2, 1]]> but was:<[[]]>

[ERROR]
 
TestDefaultCompactSelection.testStuckStoreCompaction:145->TestCompactionPolicy.compactEquals:182->TestCompactionPolicy.compactEquals:201
expected:<[[]30, 30, 30]> but was:<[[99, 30, ]30, 30, 30]>

[ERROR]
 TestMemstoreLABWithoutPool.testLABChunkQueueWithMultipleMSLABs:143 All the
chunks must have been cleared
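
The checklist above (sums, signatures, RAT, build, unit tests) is the usual RC drill. A minimal sketch of how the checksum step might be scripted is below; the filename is a stand-in created on the spot, not the actual artifact name, and the signature/build steps are left as comments since they need the real tarball and key:

```shell
# Stand-in file so the checksum round-trip is demonstrable end to end.
echo "demo" > hbase-2.0.0-beta-1-bin.tar.gz
sha512sum hbase-2.0.0-beta-1-bin.tar.gz > hbase-2.0.0-beta-1-bin.tar.gz.sha512
# Verify: prints "hbase-2.0.0-beta-1-bin.tar.gz: OK" and exits 0 on success.
sha512sum -c hbase-2.0.0-beta-1-bin.tar.gz.sha512
# Against the real RC you would additionally run (illustrative):
#   gpg --recv-keys 8ACC93D2
#   gpg --verify hbase-2.0.0-beta-1-bin.tar.gz.asc hbase-2.0.0-beta-1-bin.tar.gz
#   mvn apache-rat:check && mvn clean install -DskipTests && mvn test
```
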



On Fri, Dec 29, 2017 at 10:15 AM, Stack  wrote:

> The first release candidate for HBase 2.0.0-beta-1 is up at:
>
>  https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.0-beta-1-RC0/
>
> Maven artifacts are available from a staging directory here:
>
>  https://repository.apache.org/content/repositories/orgapachehbase-1188
>
> All was signed with my key at 8ACC93D2 [1]
>
> I tagged the RC as 2.0.0-beta-1-RC0
> (0907563eb72697b394b8b960fe54887d6ff304fd)
>
> hbase-2.0.0-beta-1 is our first beta release. It includes all that was in
> previous alphas (new assignment manager, offheap read/write path, in-memory
> compactions, etc.). The APIs and feature-set are sealed.
>
> hbase-2.0.0-beta-1 is a not-for-production preview of hbase-2.0.0. It is
> meant for devs and downstreamers to test drive and flag us if we messed up
> on anything ahead of our rolling GAs. We are particularly interested in
> hearing from Coprocessor developers.
>
> The list of features addressed in 2.0.0 so far can be found here [3]. There
> are thousands. The list of ~2k+ fixes in 2.0.0 exclusively can be found
> here [4] (My JIRA JQL foo is a bit dodgy -- forgive me if mistakes).
>
> I've updated our overview doc. on the state of 2.0.0 [6]. We'll do one more
> beta before we put up our first 2.0.0 Release Candidate by the end of
> January, 2.0.0-beta-2. Its focus will be making it so users can do a
> rolling upgrade on to hbase-2.x from hbase-1.x (and any bug fixes found
> running beta-1). Here is the list of what we have targeted so far for
> beta-2 [5]. Check it out.
>
> One known issue is that the User API has not been properly filtered so it
> shows more than just InterfaceAudience Public content (HBASE-19663, to be
> fixed by beta-2).
>
> Please take this beta for a spin. Please vote on whether it is OK to put out
> this RC as our first beta (Note CHANGES has not yet been updated). Let the
> VOTE be open for 72 hours (Monday)
>
> Thanks,
> Your 2.0.0 Release Manager
>
> 1. http://pgp.mit.edu/pks/lookup?op=get&search=0x9816C7FC8ACC93D2
> 3. https://goo.gl/scYjJr
> 4. https://goo.gl/dFFT8b
> 5. https://issues.apache.org/jira/projects/HBASE/versions/12340862
> 6. https://docs.google.com/document/d/1WCsVlnHjJeKUcl7wHwqb4z9iEu_
> ktczrlKHK8N4SZzs/
>



-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk


Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-04 Thread Jean-Marc Spaggiari
If I re-run from the original cluster, now that I have snappy enabled, it
works. But if it helps I can easily remove snappy libs, transfer from
source, re-run and capture all the logs. It's an easy step. Just confirm
and I will do it.

Apart from that, everything else seems to run correctly. I ran some
RowCounters, merged regions, compactions, splits, alters, etc. Have not
found anything else. Still +1 ;)

JMS

2018-01-03 22:26 GMT-05:00 Stack :

> +1 from me.
> S
>
> On Fri, Dec 29, 2017 at 12:15 PM, Stack  wrote:
>
> > The first release candidate for HBase 2.0.0-beta-1 is up at:
> >
> >  https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.0-beta-1-RC0/
> >
> > Maven artifacts are available from a staging directory here:
> >
> >  https://repository.apache.org/content/repositories/orgapachehbase-1188
> >
> > All was signed with my key at 8ACC93D2 [1]
> >
> > I tagged the RC as 2.0.0-beta-1-RC0 (0907563eb72697b394b8b960fe5488
> > 7d6ff304fd)
> >
> > hbase-2.0.0-beta-1 is our first beta release. It includes all that was in
> > previous alphas (new assignment manager, offheap read/write path,
> in-memory
> > compactions, etc.). The APIs and feature-set are sealed.
> >
> > hbase-2.0.0-beta-1 is a not-for-production preview of hbase-2.0.0. It is
> > meant for devs and downstreamers to test drive and flag us if we messed
> up
> > on anything ahead of our rolling GAs. We are particularly interested in
> > hearing from Coprocessor developers.
> >
> > The list of features addressed in 2.0.0 so far can be found here [3].
> > There are thousands. The list of ~2k+ fixes in 2.0.0 exclusively can be
> > found here [4] (My JIRA JQL foo is a bit dodgy -- forgive me if
> mistakes).
> >
> > I've updated our overview doc. on the state of 2.0.0 [6]. We'll do one
> > more beta before we put up our first 2.0.0 Release Candidate by the end
> of
> > January, 2.0.0-beta-2. Its focus will be making it so users can do a
> > rolling upgrade on to hbase-2.x from hbase-1.x (and any bug fixes found
> > running beta-1). Here is the list of what we have targeted so far for
> > beta-2 [5]. Check it out.
> >
> > One known issue is that the User API has not been properly filtered so it
> > shows more than just InterfaceAudience Public content (HBASE-19663, to be
> > fixed by beta-2).
> >
> > Please take this beta for a spin. Please vote on whether it is OK to put out
> > this RC as our first beta (Note CHANGES has not yet been updated). Let
> the
> > VOTE be open for 72 hours (Monday)
> >
> > Thanks,
> > Your 2.0.0 Release Manager
> >
> > 1. http://pgp.mit.edu/pks/lookup?op=get&search=0x9816C7FC8ACC93D2
> > 3. https://goo.gl/scYjJr
> > 4. https://goo.gl/dFFT8b
> > 5. https://issues.apache.org/jira/projects/HBASE/versions/12340862
> > 6. https://docs.google.com/document/d/1WCsVlnHjJeKUcl7wHwqb4
> > z9iEu_ktczrlKHK8N4SZzs/
> >
>


Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-03 Thread Stack
+1 from me.
S

On Fri, Dec 29, 2017 at 12:15 PM, Stack  wrote:

> The first release candidate for HBase 2.0.0-beta-1 is up at:
>
>  https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.0-beta-1-RC0/
>
> Maven artifacts are available from a staging directory here:
>
>  https://repository.apache.org/content/repositories/orgapachehbase-1188
>
> All was signed with my key at 8ACC93D2 [1]
>
> I tagged the RC as 2.0.0-beta-1-RC0 (0907563eb72697b394b8b960fe5488
> 7d6ff304fd)
>
> hbase-2.0.0-beta-1 is our first beta release. It includes all that was in
> previous alphas (new assignment manager, offheap read/write path, in-memory
> compactions, etc.). The APIs and feature-set are sealed.
>
> hbase-2.0.0-beta-1 is a not-for-production preview of hbase-2.0.0. It is
> meant for devs and downstreamers to test drive and flag us if we messed up
> on anything ahead of our rolling GAs. We are particularly interested in
> hearing from Coprocessor developers.
>
> The list of features addressed in 2.0.0 so far can be found here [3].
> There are thousands. The list of ~2k+ fixes in 2.0.0 exclusively can be
> found here [4] (My JIRA JQL foo is a bit dodgy -- forgive me if mistakes).
>
> I've updated our overview doc. on the state of 2.0.0 [6]. We'll do one
> more beta before we put up our first 2.0.0 Release Candidate by the end of
> January, 2.0.0-beta-2. Its focus will be making it so users can do a
> rolling upgrade on to hbase-2.x from hbase-1.x (and any bug fixes found
> running beta-1). Here is the list of what we have targeted so far for
> beta-2 [5]. Check it out.
>
> One known issue is that the User API has not been properly filtered so it
> shows more than just InterfaceAudience Public content (HBASE-19663, to be
> fixed by beta-2).
>
> Please take this beta for a spin. Please vote on whether it is OK to put out
> this RC as our first beta (Note CHANGES has not yet been updated). Let the
> VOTE be open for 72 hours (Monday)
>
> Thanks,
> Your 2.0.0 Release Manager
>
> 1. http://pgp.mit.edu/pks/lookup?op=get&search=0x9816C7FC8ACC93D2
> 3. https://goo.gl/scYjJr
> 4. https://goo.gl/dFFT8b
> 5. https://issues.apache.org/jira/projects/HBASE/versions/12340862
> 6. https://docs.google.com/document/d/1WCsVlnHjJeKUcl7wHwqb4
> z9iEu_ktczrlKHK8N4SZzs/
>


Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-03 Thread Stack
On Sun, Dec 31, 2017 at 7:23 AM, Jean-Marc Spaggiari <
[email protected]> wrote:

> Nothing bad that I can see. Here is a region server log:
> https://pastebin.com/0r76Y6ap
>
>
Good one JMS. This log has "nothing" about why we decide to close the
Region post successful open (if it was a Region w/ old hfiles or a native
compression lib to which we had no access, I'd have thought we'd have
failed the open before this point). I'm supposing it's the Master asking
it to close. The log should make this clearer if that is what is going on
(HBASE-19701). Unfortunately the Master log is from a later period, so I
cannot correlate the RS-side opens/closes (do you have the Master log from
around 2017-12-31 09:54:21,058?).

Looking at the Master log, link posted below, it has trouble opening log #30.
It is finding incomplete edits, e.g.:

2017-12-31 10:11:38,130 ERROR [node2:6.masterManager]
procedure2.ProcedureExecutor: Corrupt pid=243, ppid=242,
state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure table=email,
region=c07e50c3a15e8ab20cbd9514b333b67d, server=node4.com,16020,
1514693339685

We've seen this before (HBASE-18152). This is probably the root of the
strangeness we see here. I'd be interested in earlier logs JMS if you have
them sir. If the Master is failing to read this last log, it is going to be
working w/ an incomplete state. In particular, the regions at the point of
issue were in OPENING state, so when the Master comes up it waits on the
RS to report a successful OPEN (or FAIL), but at this stage in the game it
seems that is never going to happen, so we see


2017-12-31 10:12:52,611 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node8.com,16020,151469206, table=work_proposed,
region=5b4b9a7b4e58da39a2072fdcb512df2f

...in Master log.

Do you have older logs that I can look at? A particular sequence of events
put us in this state. In the past, we've been able to determine where the
hole is and we've been able to plug it.

Maybe we could rerun your loading from the beginning but w/ DEBUG enabled
in case INFO-level does not reveal enough info?

Thanks JMS,
S




> Disabling the table makes the regions leave the transition mode. I'm trying
> to disable all tables one by one (because it gets stuck after each disable)
> and will see if re-enabling them helps...
>
> On the master side, I now have errors all over:
> 2017-12-31 10:06:26,511 WARN  [ProcExecWrkr-89]
> assignment.RegionTransitionProcedure: Retryable error trying to
> transition:
> pid=511, ppid=398, state=RUNNABLE:REGION_TRANSITION_DISPATCH;
> UnassignProcedure table=work_proposed,
> region=d0a58b76ad9376b12b3e763660049d3d, server=node3.com,16020,1514693
> 337210;
> rit=OPENING, location=node3.com,16020,1514693337210
> org.apache.hadoop.hbase.exceptions.UnexpectedStateException: Expected
> [SPLITTING, SPLIT, MERGING, OPEN, CLOSING] so could move to CLOSING but
> current state=OPENING
> at
> org.apache.hadoop.hbase.master.assignment.RegionStates$Regio
> nStateNode.transitionState(RegionStates.java:155)
> at
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.
> markRegionAsClosing(AssignmentManager.java:1530)
> at
> org.apache.hadoop.hbase.master.assignment.UnassignProcedure.
> updateTransition(UnassignProcedure.java:179)
> at
> org.apache.hadoop.hbase.master.assignment.RegionTransitionPr
> ocedure.execute(RegionTransitionProcedure.java:309)
> at
> org.apache.hadoop.hbase.master.assignment.RegionTransitionPr
> ocedure.execute(RegionTransitionProcedure.java:85)
> at
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execPro
> cedure(ProcedureExecutor.java:1456)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execute
> Procedure(ProcedureExecutor.java:1225)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$
> 800(ProcedureExecutor.java:78)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerT
> hread.run(ProcedureExecutor.java:1735)
>
> Non-stop showing on the logs. Probably because I disabled the table.
>




> Restarting HBase to see if it clears that up a bit...
>
> After restart there isn't any
> org.apache.hadoop.hbase.exceptions.UnexpectedStateException on the logs.
> Only INFO level. And nothing bad. But still, regions are stuck in
> transition even for the disabled tables.
>
> Master logs are here. I removed some sections because they always say the same
> thing, for each and every single region: https://pastebin.com/K6SQ7DXP
>
> JMS
>
> 2017-12-31 9:58 GMT-05:00 stack :
>
> > There is nothing further up in the master log from regionservers or on
> > regionservers side on open?
> >
> > Thanks,
> > S
> >
> > On Dec 31, 2017 8:37 AM, "stack"  wrote:
> >
> > > Good questions.  If you disable snappy does it work?  If you start over
> > > fresh does it work?  It should be picking up native libs.  Make an
> issue
> > > please jms.  Thanks for 

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-03 Thread stack
Ok. Let's see if there are other issues out there. Hopefully we can find
something more substantial than a failing unit test as reason for sinking
an rc.

Thanks balazs,
S

On Jan 3, 2018 8:53 AM, "Balazs Meszaros" 
wrote:

> Only that one failed.
>
> Balazs
>
> On Wed, Jan 3, 2018 at 3:48 PM, stack  wrote:
>
> > Thanks balazs. Did you find other failing tests or just this one?
> >
> > S
> >
> > On Jan 3, 2018 5:33 AM, "Balazs Meszaros" 
> > wrote:
> >
> > > My observations:
> > >
> > > - signatures and checksums are ok
> > > - shell worked
> > > - load test tool with read/write also worked
> > > - when I built it, TestDefaultCompactSelection failed. The fix is
> already
> > > here: HBASE-19666 
> > >
> > > On Mon, Jan 1, 2018 at 11:30 PM, stack  wrote:
> > >
> > > > Yes. Of course. Need your input lads.
> > > > S
> > > >
> > > > On Jan 1, 2018 3:15 PM, "Andrew Purtell" 
> > > wrote:
> > > >
> > > > > Seconded. I’ll be back later this week. Can try it out then?
> > > > >
> > > > >
> > > > > > On Jan 1, 2018, at 12:13 PM, Mike Drob  wrote:
> > > > > >
> > > > > > Is an extension here a reasonable ask? Putting the vote up right
> > > before
> > > > > > what is a long New Year weekend for many folks doesn't give a lot
> > of
> > > > > > opportunity for thorough review.
> > > > > >
> > > > > > Mike
> > > > > >
> > > > > >> On Mon, Jan 1, 2018 at 1:30 PM, stack 
> > wrote:
> > > > > >>
> > > > > >> This is great stuff jms.  Thank you.  Away from computer at mo
> but
> > > > will
> > > > > dig
> > > > > >> in.
> > > > > >>
> > > > > >> Is it possible old files left over written with old hbase with
> old
> > > > hfile
> > > > > >> version? Can you see on source?  They should have been updated
> by a
> > > > > >> compaction if a long time idle, I agree.
> > > > > >>
> > > > > >> Yeah. If region assign fails, and goes into assignable state, we
> > > need
> > > > > >> intervention. We've been shutting down all the ways in which
> this
> > > > could
> > > > > >> happen but you seem to have stumbled on a new one. I will take a
> > > look
> > > > at
> > > > > >> your logs.
> > > > > >>
> > > > > >> What you going to vote?  Does it basically work?
> > > > > >>
> > > > > >> Thanks again for the try out.
> > > > > >> S
> > > > > >>
> > > > > >> On Dec 31, 2017 12:43 PM, "Jean-Marc Spaggiari" <
> > > > > [email protected]>
> > > > > >> wrote:
> > > > > >>
> > > > > >> Sorry to spam the list :(
> > > > > >>
> > > > > >> Another interesting thing.
> > > > > >>
> > > > > >> Now most of my tables are online. For a few I'm getting this:
> > > > > >> Caused by: java.lang.IllegalArgumentException: Invalid HFile
> > > version:
> > > > > >> major=2, minor=1: expected at least major=2 and minor=3
> > > > > >>at
> > > > > >> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.
> > checkFileVersion(
> > > > > >> HFileReaderImpl.java:332)
> > > > > >>at
> > > > > >> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.(
> > > > > >> HFileReaderImpl.java:199)
> > > > > >>at org.apache.hadoop.hbase.io.
> > hfile.HFile.openReader(HFile.
> > > > > >> java:538)
> > > > > >>... 13 more
> > > > > >>
> > > > > >> What is interesting is that I'm not doing anything on the source
> > > > cluster
> > > > > for
> > > > > >> weeks/months. So all tables are all major compacted the same
> way.
> > I
> > > > will
> > > > > >> major compact them all under HFiles v3 format and retry.
> > > > > >>
> > > > > >> 2017-12-31 13:33 GMT-05:00 Jean-Marc Spaggiari <
> > > > [email protected]
> > > > > >:
> > > > > >>
> > > > > >>> Ok. With a brand new DestCP from source cluster, regions are
> > > getting
> > > > > >>> assigned correctly. So it sounds like if they get stuck initially
> for
> > > any
> > > > > >>> reason, then even if the reason is fixed they can not get
> > assigned
> > > > > >> anymore
> > > > > >>> again. Will keep playing.
> > > > > >>>
> > > > > >>> I kept the previous /hbase just in case we need something from
> > it.
> > > > > >>>
> > > > > >>> Thanks,
> > > > > >>>
> > > > > >>> JMS
> > > > > >>>
> > > > > >>> 2017-12-31 10:23 GMT-05:00 Jean-Marc Spaggiari <
> > > > > [email protected]
> > > > > >>> :
> > > > > >>>
> > > > >  Nothing bad that I can see. Here is a region server log:
> > > > >  https://pastebin.com/0r76Y6ap
> > > > > 
> > > > >  Disabling the table makes the regions leave the transition
> mode.
> > > I'm
> > > > >  trying to disable all tables one by one (because it gets stuck
> > > after
> > > > > each
> > > > >  disable) and will see if re-enabling them helps...
> > > > > 
> > > > >  On the master side, I now have errors all over:
> > > > >  2017-12-31 10:06:26,511 WARN  [ProcExecWrkr-89]
> > > > >  assignment.RegionTransitionProcedure: Retryable error trying
> to
> > > > >  transition: pid=511, ppid=398, state=RUNNABLE:REGION_
> > > > > >> TRANSITION_DISPATCH;
> > > > >  UnassignProcedure table=work_pr

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-03 Thread Balazs Meszaros
Only that one failed.

Balazs

On Wed, Jan 3, 2018 at 3:48 PM, stack  wrote:

> Thanks balazs. Did you find other failing tests or just this one?
>
> S
>

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-03 Thread stack
Thanks balazs. Did you find other failing tests or just this one?

S

On Jan 3, 2018 5:33 AM, "Balazs Meszaros" 
wrote:

> My observations:
>
> - signatures and checksums are ok
> - shell worked
> - load test tool with read/write also worked
> - when I built it, TestDefaultCompactSelection failed. The fix is already
> here: HBASE-19666 
>

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-03 Thread Balazs Meszaros
My observations:

- signatures and checksums are ok
- shell worked
- load test tool with read/write also worked
- when I built it, TestDefaultCompactSelection failed. The fix is already
here: HBASE-19666 
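For anyone else checking the RC, the signature/checksum step can be sketched like so (file names are hypothetical stand-ins so the commands run as-is; in practice point them at the downloaded hbase-2.0.0-beta-1 artifacts and their .sha512/.asc companions from the dist area):

```shell
# Sketch of the RC artifact checksum check. A stand-in file is created
# here so the commands are runnable end to end.
set -e
artifact=/tmp/hbase-rc-demo.tar.gz
printf 'demo' > "$artifact"

# Checksum file would normally be downloaded alongside the tarball.
sha512sum "$artifact" > "$artifact.sha512"
sha512sum -c "$artifact.sha512"

# The signature check would be: gpg --verify "$artifact.asc" "$artifact"
```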


Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-01 Thread stack
Yes. Of course. Need your input lads.
S


Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-01 Thread Andrew Purtell
Seconded. I’ll be back later this week. Can try it out then? 


> On Jan 1, 2018, at 12:13 PM, Mike Drob  wrote:
> 
> Is an extension here a reasonable ask? Putting the vote up right before
> what is a long New Year weekend for many folks doesn't give a lot of
> opportunity for thorough review.
> 
> Mike
> 
>> On Mon, Jan 1, 2018 at 1:30 PM, stack  wrote:
>> 
>> This is great stuff jms.  Thank you.  Away from computer at mo but will dig
>> in.
>> 
>> Is it possible old files left over written with old hbase with old hfile
>> version? Can you see on source?  They should have but updated by a
>> compaction if a long time idle, I agree.
>> 
>> Yeah. If region assign fails, and goes into assignable state, we need
>> intervention. We've been shutting down all the ways in which this could
>> happen but you seem to have stumbled on a new one. I will take a look at
>> your logs.
>> 
>> What you going to vote?  Does it basically work?
>> 
>> Thanks again for the try out.
>> S
>> 
>> On Dec 31, 2017 12:43 PM, "Jean-Marc Spaggiari" 
>> wrote:
>> 
>> Sorry to spam the list :(
>> 
>> Another interesting thing.
>> 
>> Now most of my tablesare online. For few I'm getting this:
>> Caused by: java.lang.IllegalArgumentException: Invalid HFile version:
>> major=2, minor=1: expected at least major=2 and minor=3
>>at
>> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.checkFileVersion(
>> HFileReaderImpl.java:332)
>>at
>> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.(
>> HFileReaderImpl.java:199)
>>at org.apache.hadoop.hbase.io.hfile.HFile.openReader(HFile.
>> java:538)
>>... 13 more
>> 
>> What is interesting is tat I'm not doing anything on the source cluster for
>> weeks/months. So all tables are all major compacted the same way. I will
>> major compact them all under HFiles v3 format and retry.
>> 
>> 2017-12-31 13:33 GMT-05:00 Jean-Marc Spaggiari :
>> 
>>> Ok. With a brand new DestCP from source cluster, regions are getting
>>> assigned correctly. So sound like if they get stuck initially for any
>>> reason, then even if the reason is fixed they can not get assigned
>> anymore
>>> again. Will keep playing.
>>> 
>>> I kept the previous /hbase just in case we need something from it.
>>> 
>>> Thanks,
>>> 
>>> JMS
>>> 
>>> 2017-12-31 10:23 GMT-05:00 Jean-Marc Spaggiari >> :
>>> 
 Nothing bad that I can see. Here is a region server log:
 https://pastebin.com/0r76Y6ap
 
 Disabling the table makes the regions leave the transition mode. I'm
 trying to disable all tables one by one (because it get stuck after each
 disable) and will see if re-enabling them helps...
 
 On the master side, I now have errors all over:
 2017-12-31 10:06:26,511 WARN  [ProcExecWrkr-89]
 assignment.RegionTransitionProcedure: Retryable error trying to
 transition: pid=511, ppid=398, state=RUNNABLE:REGION_
>> TRANSITION_DISPATCH;
 UnassignProcedure table=work_proposed, region=
>> d0a58b76ad9376b12b3e763660049d3d,
 server=node3.com,16020,1514693337210; rit=OPENING, location=node3.com
 ,16020,1514693337210
 org.apache.hadoop.hbase.exceptions.UnexpectedStateException: Expected
 [SPLITTING, SPLIT, MERGING, OPEN, CLOSING] so could move to CLOSING but
 current state=OPENING
 at org.apache.hadoop.hbase.master.assignment.RegionStates$Regio
 nStateNode.transitionState(RegionStates.java:155)
 at org.apache.hadoop.hbase.master.assignment.AssignmentManager.
 markRegionAsClosing(AssignmentManager.java:1530)
 at org.apache.hadoop.hbase.master.assignment.UnassignProcedure.
 updateTransition(UnassignProcedure.java:179)
 at org.apache.hadoop.hbase.master.assignment.RegionTransitionPr
 ocedure.execute(RegionTransitionProcedure.java:309)
 at org.apache.hadoop.hbase.master.assignment.RegionTransitionPr
 ocedure.execute(RegionTransitionProcedure.java:85)
 at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Proce
 dure.java:845)
 at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execPro
 cedure(ProcedureExecutor.java:1456)
 at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execute
 Procedure(ProcedureExecutor.java:1225)
 at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$
 800(ProcedureExecutor.java:78)
 at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerT
 hread.run(ProcedureExecutor.java:1735)
 
 Non-stop showing on the logs. Probably because I disabled the table.
 Restarting HBase so see if it clears that a but...
 
 After the restart there isn't any
 org.apache.hadoop.hbase.exceptions.UnexpectedStateException in the logs.
 Only INFO level, and nothing bad. But still, regions are stuck in
 transition even for the disabled tables.
 
 Master logs are here. I removed some sections because it always says the
 same thing, for each and every single region: https://pastebin.com/K6SQ7DXP
 
 JMS
 
 2017

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-01 Thread Mike Drob
Is an extension here a reasonable ask? Putting the vote up right before
what is a long New Year weekend for many folks doesn't give a lot of
opportunity for thorough review.

Mike

On Mon, Jan 1, 2018 at 1:30 PM, stack  wrote:

> This is great stuff jms.  Thank you.  Away from computer at mo but will dig
> in.
>
> Is it possible old files were left over, written by the old hbase with the old
> hfile version? Can you see on the source? They should have been updated by a
> compaction if idle a long time, I agree.
>
> Yeah. If region assign fails, and goes into assignable state, we need
> intervention. We've been shutting down all the ways in which this could
> happen but you seem to have stumbled on a new one. I will take a look at
> your logs.
>
> What are you going to vote?  Does it basically work?
>
> Thanks again for the try out.
> S
>
> On Dec 31, 2017 12:43 PM, "Jean-Marc Spaggiari" 
> wrote:
>
> Sorry to spam the list :(
>
> Another interesting thing.
>
> Now most of my tables are online. For a few I'm getting this:
> Caused by: java.lang.IllegalArgumentException: Invalid HFile version:
> major=2, minor=1: expected at least major=2 and minor=3
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.checkFileVersion(
> HFileReaderImpl.java:332)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.(
> HFileReaderImpl.java:199)
> at org.apache.hadoop.hbase.io.hfile.HFile.openReader(HFile.
> java:538)
> ... 13 more
>
> What is interesting is that I'm not doing anything on the source cluster for
> weeks/months. So all tables are major compacted the same way. I will
> major compact them all under HFiles v3 format and retry.
>
> 2017-12-31 13:33 GMT-05:00 Jean-Marc Spaggiari :
>
> > Ok. With a brand new DistCP from the source cluster, regions are getting
> > assigned correctly. So it sounds like if they get stuck initially for any
> > reason, then even if the reason is fixed they cannot get assigned again.
> > Will keep playing.
> >
> > I kept the previous /hbase just in case we need something from it.
> >
> > Thanks,
> >
> > JMS
> >
> > 2017-12-31 10:23 GMT-05:00 Jean-Marc Spaggiari  >:
> >
> >> Nothing bad that I can see. Here is a region server log:
> >> https://pastebin.com/0r76Y6ap
> >>
> >> Disabling the table makes the regions leave the transition mode. I'm
> >> trying to disable all tables one by one (because it gets stuck after each
> >> disable) and will see if re-enabling them helps...
> >>
> >> On the master side, I now have errors all over:
> >> 2017-12-31 10:06:26,511 WARN  [ProcExecWrkr-89]
> >> assignment.RegionTransitionProcedure: Retryable error trying to
> >> transition: pid=511, ppid=398, state=RUNNABLE:REGION_TRANSITION_DISPATCH;
> >> UnassignProcedure table=work_proposed, region=d0a58b76ad9376b12b3e763660049d3d,
> >> server=node3.com,16020,1514693337210; rit=OPENING, location=node3.com,16020,1514693337210
> >> org.apache.hadoop.hbase.exceptions.UnexpectedStateException: Expected
> >> [SPLITTING, SPLIT, MERGING, OPEN, CLOSING] so could move to CLOSING but
> >> current state=OPENING
> >> at org.apache.hadoop.hbase.master.assignment.RegionStates$RegionStateNode.transitionState(RegionStates.java:155)
> >> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1530)
> >> at org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:179)
> >> at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:309)
> >> at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:85)
> >> at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
> >> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1456)
> >> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1225)
> >> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
> >> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1735)
> >>
> >> These errors show non-stop in the logs, probably because I disabled the
> >> table. Restarting HBase to see if it clears that up a bit...
> >>
> >> After the restart there isn't any
> >> org.apache.hadoop.hbase.exceptions.UnexpectedStateException in the logs.
> >> Only INFO level, and nothing bad. But still, regions are stuck in
> >> transition even for the disabled tables.
> >>
> >> Master logs are here. I removed some sections because it always says the
> >> same thing, for each and every single region: https://pastebin.com/K6SQ7DXP
> >>
> >> JMS
> >>
> >> 2017-12-31 9:58 GMT-05:00 stack :
> >>
> >>> There is nothing further up in the master log from regionservers or on
> >>> regionservers side on open?
> >>>
> >>> Thanks,
> >>> S
> >>>
> >>> On Dec 31, 201

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-01 Thread stack
This is great stuff jms.  Thank you.  Away from computer at mo but will dig
in.

Is it possible old files were left over, written by the old hbase with the old
hfile version? Can you see on the source? They should have been updated by a
compaction if idle a long time, I agree.

Yeah. If region assign fails, and goes into assignable state, we need
intervention. We've been shutting down all the ways in which this could
happen but you seem to have stumbled on a new one. I will take a look at
your logs.

What are you going to vote?  Does it basically work?

Thanks again for the try out.
S

On Dec 31, 2017 12:43 PM, "Jean-Marc Spaggiari" 
wrote:

Sorry to spam the list :(

Another interesting thing.

Now most of my tables are online. For a few I'm getting this:
Caused by: java.lang.IllegalArgumentException: Invalid HFile version:
major=2, minor=1: expected at least major=2 and minor=3
at
org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.checkFileVersion(
HFileReaderImpl.java:332)
at
org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.(
HFileReaderImpl.java:199)
at org.apache.hadoop.hbase.io.hfile.HFile.openReader(HFile.java:538)
... 13 more

What is interesting is that I'm not doing anything on the source cluster for
weeks/months. So all tables are major compacted the same way. I will
major compact them all under HFiles v3 format and retry.

2017-12-31 13:33 GMT-05:00 Jean-Marc Spaggiari :

> Ok. With a brand new DistCP from the source cluster, regions are getting
> assigned correctly. So it sounds like if they get stuck initially for any
> reason, then even if the reason is fixed they cannot get assigned again.
> Will keep playing.
>
> I kept the previous /hbase just in case we need something from it.
>
> Thanks,
>
> JMS
>
> 2017-12-31 10:23 GMT-05:00 Jean-Marc Spaggiari :
>
>> Nothing bad that I can see. Here is a region server log:
>> https://pastebin.com/0r76Y6ap
>>
>> Disabling the table makes the regions leave the transition mode. I'm
>> trying to disable all tables one by one (because it gets stuck after each
>> disable) and will see if re-enabling them helps...
>>
>> On the master side, I now have errors all over:
>> 2017-12-31 10:06:26,511 WARN  [ProcExecWrkr-89]
>> assignment.RegionTransitionProcedure: Retryable error trying to
>> transition: pid=511, ppid=398, state=RUNNABLE:REGION_TRANSITION_DISPATCH;
>> UnassignProcedure table=work_proposed, region=d0a58b76ad9376b12b3e763660049d3d,
>> server=node3.com,16020,1514693337210; rit=OPENING, location=node3.com,16020,1514693337210
>> org.apache.hadoop.hbase.exceptions.UnexpectedStateException: Expected
>> [SPLITTING, SPLIT, MERGING, OPEN, CLOSING] so could move to CLOSING but
>> current state=OPENING
>> at org.apache.hadoop.hbase.master.assignment.RegionStates$RegionStateNode.transitionState(RegionStates.java:155)
>> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1530)
>> at org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:179)
>> at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:309)
>> at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:85)
>> at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
>> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1456)
>> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1225)
>> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
>> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1735)
>>
>> These errors show non-stop in the logs, probably because I disabled the
>> table. Restarting HBase to see if it clears that up a bit...
>>
>> After the restart there isn't any
>> org.apache.hadoop.hbase.exceptions.UnexpectedStateException in the logs.
>> Only INFO level, and nothing bad. But still, regions are stuck in
>> transition even for the disabled tables.
>>
>> Master logs are here. I removed some sections because it always says the
>> same thing, for each and every single region: https://pastebin.com/K6SQ7DXP
>>
>> JMS
>>
>> 2017-12-31 9:58 GMT-05:00 stack :
>>
>>> There is nothing further up in the master log from regionservers or on
>>> regionservers side on open?
>>>
>>> Thanks,
>>> S
>>>
>>> On Dec 31, 2017 8:37 AM, "stack"  wrote:
>>>
>>> > Good questions.  If you disable snappy does it work?  If you start
over
>>> > fresh does it work?  It should be picking up native libs.  Make an
>>> issue
>>> > please jms.  Thanks for giving it a go.
>>> >
>>> > S
>>> >
>>> > On Dec 30, 2017 11:49 PM, "Jean-Marc Spaggiari" <
>>> [email protected]>
>>> > wrote:
>>> >
>>> >> Hi Stack,
>>> >>
>>> >> I just tried to give it a try... Wipe out all HDFS content and code,

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2017-12-31 Thread Jean-Marc Spaggiari
Sorry to spam the list :(

Another interesting thing.

Now most of my tables are online. For a few I'm getting this:
Caused by: java.lang.IllegalArgumentException: Invalid HFile version:
major=2, minor=1: expected at least major=2 and minor=3
at
org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.checkFileVersion(HFileReaderImpl.java:332)
at
org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.(HFileReaderImpl.java:199)
at org.apache.hadoop.hbase.io.hfile.HFile.openReader(HFile.java:538)
... 13 more

What is interesting is that I'm not doing anything on the source cluster for
weeks/months. So all tables are major compacted the same way. I will
major compact them all under HFiles v3 format and retry.
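
The version gate that produces this error can be sketched as follows. This is a hypothetical, simplified stand-in for the check in HFileReaderImpl.checkFileVersion, using only the minimum version numbers quoted in the message above.

```java
// Hypothetical sketch of an HFile format-version gate: a reader requiring at
// least major=2, minor=3 rejects older v2.1 files until a major compaction
// rewrites them in the newer format.
class HFileVersionGate {
    static final int MIN_MAJOR = 2;
    static final int MIN_MINOR = 3;

    /** True if a file with the given format version can be opened by this reader. */
    static boolean isReadable(int major, int minor) {
        return major > MIN_MAJOR || (major == MIN_MAJOR && minor >= MIN_MINOR);
    }

    /** Mirrors the IllegalArgumentException seen in the log for unreadable files. */
    static void checkFileVersion(int major, int minor) {
        if (!isReadable(major, minor)) {
            throw new IllegalArgumentException("Invalid HFile version: major=" + major
                + ", minor=" + minor + ": expected at least major=" + MIN_MAJOR
                + " and minor=" + MIN_MINOR);
        }
    }
}
```

Under this sketch the v2.1 files above fail the gate, which is consistent with JMS's plan: a major compaction rewrites every store file, bringing them up to the version the new reader accepts.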

2017-12-31 13:33 GMT-05:00 Jean-Marc Spaggiari :

> Ok. With a brand new DistCP from the source cluster, regions are getting
> assigned correctly. So it sounds like if they get stuck initially for any
> reason, then even if the reason is fixed they cannot get assigned again.
> Will keep playing.
>
> I kept the previous /hbase just in case we need something from it.
>
> Thanks,
>
> JMS
>
> 2017-12-31 10:23 GMT-05:00 Jean-Marc Spaggiari :
>
>> Nothing bad that I can see. Here is a region server log:
>> https://pastebin.com/0r76Y6ap
>>
>> Disabling the table makes the regions leave the transition mode. I'm
>> trying to disable all tables one by one (because it gets stuck after each
>> disable) and will see if re-enabling them helps...
>>
>> On the master side, I now have errors all over:
>> 2017-12-31 10:06:26,511 WARN  [ProcExecWrkr-89]
>> assignment.RegionTransitionProcedure: Retryable error trying to
>> transition: pid=511, ppid=398, state=RUNNABLE:REGION_TRANSITION_DISPATCH;
>> UnassignProcedure table=work_proposed, 
>> region=d0a58b76ad9376b12b3e763660049d3d,
>> server=node3.com,16020,1514693337210; rit=OPENING, location=node3.com
>> ,16020,1514693337210
>> org.apache.hadoop.hbase.exceptions.UnexpectedStateException: Expected
>> [SPLITTING, SPLIT, MERGING, OPEN, CLOSING] so could move to CLOSING but
>> current state=OPENING
>> at org.apache.hadoop.hbase.master.assignment.RegionStates$RegionStateNode.transitionState(RegionStates.java:155)
>> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1530)
>> at org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:179)
>> at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:309)
>> at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:85)
>> at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
>> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1456)
>> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1225)
>> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
>> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1735)
>>
>> These errors show non-stop in the logs, probably because I disabled the
>> table. Restarting HBase to see if it clears that up a bit...
>>
>> After the restart there isn't any
>> org.apache.hadoop.hbase.exceptions.UnexpectedStateException in the logs.
>> Only INFO level, and nothing bad. But still, regions are stuck in
>> transition even for the disabled tables.
>>
>> Master logs are here. I removed some sections because it always says the
>> same thing, for each and every single region: https://pastebin.com/K6SQ7DXP
>>
>> JMS
>>
>> 2017-12-31 9:58 GMT-05:00 stack :
>>
>>> There is nothing further up in the master log from regionservers or on
>>> regionservers side on open?
>>>
>>> Thanks,
>>> S
>>>
>>> On Dec 31, 2017 8:37 AM, "stack"  wrote:
>>>
>>> > Good questions.  If you disable snappy does it work?  If you start over
>>> > fresh does it work?  It should be picking up native libs.  Make an
>>> issue
>>> > please jms.  Thanks for giving it a go.
>>> >
>>> > S
>>> >
>>> > On Dec 30, 2017 11:49 PM, "Jean-Marc Spaggiari" <
>>> [email protected]>
>>> > wrote:
>>> >
>>> >> Hi Stack,
>>> >>
>>> >> I just tried to give it a try... Wipe out all HDFS content and code,
>>> all
>>> >> HBase content and code, and all ZK. Re-build a brand new cluster with
>>> 7
>>> >> physical worker nodes. I'm able to get HBase start, how-ever I'm not
>>> able
>>> >> to get my regions online.
>>> >>
>>> >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
>>> >> assignment.AssignmentManager: TODO Handle stuck in transition:
>>> >> rit=OPENING,
>>> >> location=node8.16020,151469206, table=pageMini,
>>> >> region=a778eb67898dfd378e426f2e7700faea
>>> >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
>>> >> assignment.AssignmentManager: TODO Handle stuck in transition:
>>> >> rit=OPENING,
>>> >> location=

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2017-12-31 Thread Jean-Marc Spaggiari
Ok. With a brand new DistCP from the source cluster, regions are getting
assigned correctly. So it sounds like if they get stuck initially for any
reason, then even if the reason is fixed they cannot get assigned again.
Will keep playing.

I kept the previous /hbase just in case we need something from it.

Thanks,

JMS

2017-12-31 10:23 GMT-05:00 Jean-Marc Spaggiari :

> Nothing bad that I can see. Here is a region server log:
> https://pastebin.com/0r76Y6ap
>
> Disabling the table makes the regions leave the transition mode. I'm
> trying to disable all tables one by one (because it gets stuck after each
> disable) and will see if re-enabling them helps...
>
> On the master side, I now have errors all over:
> 2017-12-31 10:06:26,511 WARN  [ProcExecWrkr-89]
> assignment.RegionTransitionProcedure: Retryable error trying to
> transition: pid=511, ppid=398, state=RUNNABLE:REGION_TRANSITION_DISPATCH;
> UnassignProcedure table=work_proposed, 
> region=d0a58b76ad9376b12b3e763660049d3d,
> server=node3.com,16020,1514693337210; rit=OPENING, location=node3.com
> ,16020,1514693337210
> org.apache.hadoop.hbase.exceptions.UnexpectedStateException: Expected
> [SPLITTING, SPLIT, MERGING, OPEN, CLOSING] so could move to CLOSING but
> current state=OPENING
> at org.apache.hadoop.hbase.master.assignment.RegionStates$RegionStateNode.transitionState(RegionStates.java:155)
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1530)
> at org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:179)
> at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:309)
> at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:85)
> at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1456)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1225)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1735)
>
> These errors show non-stop in the logs, probably because I disabled the
> table. Restarting HBase to see if it clears that up a bit...
>
> After the restart there isn't any
> org.apache.hadoop.hbase.exceptions.UnexpectedStateException in the logs.
> Only INFO level, and nothing bad. But still, regions are stuck in
> transition even for the disabled tables.
>
> Master logs are here. I removed some sections because it always says the
> same thing, for each and every single region: https://pastebin.com/K6SQ7DXP
>
> JMS
>
> 2017-12-31 9:58 GMT-05:00 stack :
>
>> There is nothing further up in the master log from regionservers or on
>> regionservers side on open?
>>
>> Thanks,
>> S
>>
>> On Dec 31, 2017 8:37 AM, "stack"  wrote:
>>
>> > Good questions.  If you disable snappy does it work?  If you start over
>> > fresh does it work?  It should be picking up native libs.  Make an issue
>> > please jms.  Thanks for giving it a go.
>> >
>> > S
>> >
>> > On Dec 30, 2017 11:49 PM, "Jean-Marc Spaggiari" <
>> [email protected]>
>> > wrote:
>> >
>> >> Hi Stack,
>> >>
>> >> I just tried to give it a try... Wipe out all HDFS content and code,
>> all
>> >> HBase content and code, and all ZK. Re-build a brand new cluster with 7
>> >> physical worker nodes. I'm able to get HBase start, how-ever I'm not
>> able
>> >> to get my regions online.
>> >>
>> >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
>> >> assignment.AssignmentManager: TODO Handle stuck in transition:
>> >> rit=OPENING,
>> >> location=node8.16020,151469206, table=pageMini,
>> >> region=a778eb67898dfd378e426f2e7700faea
>> >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
>> >> assignment.AssignmentManager: TODO Handle stuck in transition:
>> >> rit=OPENING,
>> >> location=node6.16020,1514693336563, table=work_proposed,
>> >> region=4a1d86197ace3f4c8b1c8de28dbe1d34
>> >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
>> >> assignment.AssignmentManager: TODO Handle stuck in transition:
>> >> rit=OPENING,
>> >> location=node1.16020,1514693336898, table=page_crc,
>> >> region=86b3912a09a5676b6851636ed22c2abc
>> >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
>> >> assignment.AssignmentManager: TODO Handle stuck in transition:
>> >> rit=OPENING,
>> >> location=node7.16020,1514693337406, table=pageAvro,
>> >> region=391784c43c87bdea6df05f96accad0ff
>> >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
>> >> assignment.AssignmentManager: TODO Handle stuck in transition:
>> >> rit=OPENING,
>> >> location=node8.16020,151469206, table=page,
>> >> region=5850d782a3beea18872769bf8fd70fc7
>> >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
>> >> assignment.Assign

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2017-12-31 Thread Jean-Marc Spaggiari
Nothing bad that I can see. Here is a region server log:
https://pastebin.com/0r76Y6ap

Disabling the table makes the regions leave the transition mode. I'm trying
to disable all tables one by one (because it gets stuck after each disable)
and will see if re-enabling them helps...

On the master side, I now have errors all over:
2017-12-31 10:06:26,511 WARN  [ProcExecWrkr-89]
assignment.RegionTransitionProcedure: Retryable error trying to transition:
pid=511, ppid=398, state=RUNNABLE:REGION_TRANSITION_DISPATCH;
UnassignProcedure table=work_proposed,
region=d0a58b76ad9376b12b3e763660049d3d, server=node3.com,16020,1514693337210;
rit=OPENING, location=node3.com,16020,1514693337210
org.apache.hadoop.hbase.exceptions.UnexpectedStateException: Expected
[SPLITTING, SPLIT, MERGING, OPEN, CLOSING] so could move to CLOSING but
current state=OPENING
at org.apache.hadoop.hbase.master.assignment.RegionStates$RegionStateNode.transitionState(RegionStates.java:155)
at org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1530)
at org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:179)
at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:309)
at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:85)
at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1456)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1225)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1735)

These errors show non-stop in the logs, probably because I disabled the table.
Restarting HBase to see if it clears that up a bit...

After the restart there isn't any
org.apache.hadoop.hbase.exceptions.UnexpectedStateException in the logs.
Only INFO level, and nothing bad. But still, regions are stuck in
transition even for the disabled tables.

Master logs are here. I removed some sections because it always says the same
thing, for each and every single region: https://pastebin.com/K6SQ7DXP

JMS

2017-12-31 9:58 GMT-05:00 stack :

> There is nothing further up in the master log from regionservers or on
> regionservers side on open?
>
> Thanks,
> S
>
> On Dec 31, 2017 8:37 AM, "stack"  wrote:
>
> > Good questions.  If you disable snappy does it work?  If you start over
> > fresh does it work?  It should be picking up native libs.  Make an issue
> > please jms.  Thanks for giving it a go.
> >
> > S
> >
> > On Dec 30, 2017 11:49 PM, "Jean-Marc Spaggiari"  >
> > wrote:
> >
> >> Hi Stack,
> >>
> >> I just tried to give it a try... Wipe out all HDFS content and code, all
> >> HBase content and code, and all ZK. Re-build a brand new cluster with 7
> >> physical worker nodes. I'm able to get HBase start, how-ever I'm not
> able
> >> to get my regions online.
> >>
> >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> >> assignment.AssignmentManager: TODO Handle stuck in transition:
> >> rit=OPENING,
> >> location=node8.16020,151469206, table=pageMini,
> >> region=a778eb67898dfd378e426f2e7700faea
> >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> >> assignment.AssignmentManager: TODO Handle stuck in transition:
> >> rit=OPENING,
> >> location=node6.16020,1514693336563, table=work_proposed,
> >> region=4a1d86197ace3f4c8b1c8de28dbe1d34
> >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> >> assignment.AssignmentManager: TODO Handle stuck in transition:
> >> rit=OPENING,
> >> location=node1.16020,1514693336898, table=page_crc,
> >> region=86b3912a09a5676b6851636ed22c2abc
> >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> >> assignment.AssignmentManager: TODO Handle stuck in transition:
> >> rit=OPENING,
> >> location=node7.16020,1514693337406, table=pageAvro,
> >> region=391784c43c87bdea6df05f96accad0ff
> >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> >> assignment.AssignmentManager: TODO Handle stuck in transition:
> >> rit=OPENING,
> >> location=node8.16020,151469206, table=page,
> >> region=5850d782a3beea18872769bf8fd70fc7
> >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> >> assignment.AssignmentManager: TODO Handle stuck in transition:
> >> rit=OPENING,
> >> location=node5.16020,1514693330961, table=work_proposed,
> >> region=1d892c9b54b66f802b82c2f9fe847f1f
> >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> >> assignment.AssignmentManager: TODO Handle stuck in transition:
> >> rit=OPENING,
> >> location=node5.16020,1514693330961, table=pageAvro,
> >> region=e9de2c68cc01883e959d7953a4251687
> >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> >> assignment.AssignmentManager: TODO Handle stuck in transition:
> >> rit=OPENING,
> >> location=node3.

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2017-12-31 Thread Jean-Marc Spaggiari
> Good questions.  If you disable snappy does it work?
See below. I don't think it's related to snappy anymore.

> If you start over fresh does it work?
DistCP in progress. Will let you know in 4 hours...

> It should be picking up native libs.  Make an issue please jms.  Thanks
for giving it a go.
Native was my bad. So no issue here, except maybe on documentation ;)


Ok. Some progress here. I'm able to get Snappy working fine on both the HDFS
and HBase sides.

hbase@node2:~/hbase-2.0.0-beta-1$ bin/hbase
org.apache.hadoop.hbase.util.CompressionTest hdfs://node2/tmp/empty.txt
snappy
Linux-amd64-64
2017-12-31 02:36:51,745 INFO  [main] metrics.MetricRegistries: Loaded
MetricRegistries class
org.apache.hadoop.hbase.metrics.impl.MetricRegistriesImpl
2017-12-31 02:36:51,874 INFO  [main] hfile.CacheConfig: Created
cacheConfig: CacheConfig:disabled
2017-12-31 02:36:52,122 INFO  [main] compress.CodecPool: Got brand-new
compressor [.snappy]
2017-12-31 02:36:52,142 INFO  [main] compress.CodecPool: Got brand-new
compressor [.snappy]
2017-12-31 02:36:52,647 INFO  [main] hfile.CacheConfig: Created
cacheConfig: CacheConfig:disabled
2017-12-31 02:36:52,758 INFO  [main] compress.CodecPool: Got brand-new
decompressor [.snappy]
SUCCESS

But my regions are still not able to open, with still no information on the
RS side. So I don't think it's because of Snappy anymore... I kept it running
overnight and it was still in the same state this morning. All my snappy
tables are not deployed, but some of my non-snappy tables are not either. And
some small tables are. All single-region tables are deployed correctly. All
multi-region tables are stuck. Interesting, but I don't really think there is
a pattern here. I tried running the disable command on tables with regions in
transition but the command never returns.

Last, when looking at the master web UI while HBase is starting, I got the
error below.

I will continue to play with that today to try to get it to work. I will try
to open JIRAs for whatever I think is failing.

HTTP ERROR 500

Problem accessing /master-status. Reason:

Server Error

Caused by:

java.lang.NullPointerException
at org.apache.hadoop.hbase.master.HMaster.isInMaintenanceMode(HMaster.java:2738)
at org.apache.hadoop.hbase.master.HMaster.isBalancerOn(HMaster.java:3257)
at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.renderNoFlush(MasterStatusTmplImpl.java:249)
at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.renderNoFlush(MasterStatusTmpl.java:387)
at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.render(MasterStatusTmpl.java:378)
at org.apache.hadoop.hbase.master.MasterStatusServlet.doGet(MasterStatusServlet.java:81)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
at org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:112)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.apache.hadoop.hbase.http.ClickjackingPreventionFilter.doFilter(ClickjackingPreventionFilter.java:48)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.apache.hadoop.hbase.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1371)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.apache.hadoop.hbase.http.NoCacheFilter.doFilter(NoCacheFilter.java:49)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.apache.hadoop.hbase.http.NoCacheFilter.doFilter(NoCacheFilter.java:49)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at org
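
The NPE at HMaster.isInMaintenanceMode while the UI renders during startup suggests the status page touches state that is only set once master initialization finishes. A hedged sketch of a null-safe guard for such a flag (hypothetical illustration, not the actual HBase fix):

```java
// Hypothetical illustration of a null-safe maintenance-mode check: treat
// "not yet initialized" as "not in maintenance mode" instead of throwing
// a NullPointerException while the master web UI renders during startup.
class MaintenanceModeStatus {
    private Boolean maintenanceMode; // null until master initialization completes

    /** Safe to call at any point in the master lifecycle. */
    boolean isInMaintenanceMode() {
        return maintenanceMode != null && maintenanceMode;
    }

    /** Called once startup determines the actual maintenance-mode setting. */
    void markInitialized(boolean inMaintenanceMode) {
        this.maintenanceMode = inMaintenanceMode;
    }
}
```

With a guard like this, hitting /master-status before initialization completes would render with the balancer shown as off rather than returning HTTP 500.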

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2017-12-31 Thread stack
There is nothing further up in the master log from regionservers or on
regionservers side on open?

Thanks,
S

On Dec 31, 2017 8:37 AM, "stack"  wrote:

> Good questions.  If you disable snappy does it work?  If you start over
> fresh does it work?  It should be picking up native libs.  Make an issue
> please jms.  Thanks for giving it a go.
>
> S
>
> On Dec 30, 2017 11:49 PM, "Jean-Marc Spaggiari" 
> wrote:
>
>> Hi Stack,
>>
>> I just tried to give it a try... Wipe out all HDFS content and code, all
>> HBase content and code, and all ZK. Re-build a brand new cluster with 7
>> physical worker nodes. I'm able to get HBase start, how-ever I'm not able
>> to get my regions online.
>>

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2017-12-31 Thread stack
Good questions. If you disable snappy, does it work? If you start over
fresh, does it work? It should be picking up the native libs. Make an issue
please, JMS. Thanks for giving it a go.

S

On Dec 30, 2017 11:49 PM, "Jean-Marc Spaggiari" 
wrote:

> Hi Stack,
>
> I just gave it a try... I wiped out all HDFS content and code, all
> HBase content and code, and all ZK data, then rebuilt a brand-new cluster
> with 7 physical worker nodes. I'm able to get HBase to start; however, I'm
> not able to get my regions online.
>

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2017-12-30 Thread Jean-Marc Spaggiari
I forgot to say that I distcp'ed the entire /hbase folder from another
HBase 1.3 cluster ;) That's why there is data here.

2017-12-31 0:48 GMT-05:00 Jean-Marc Spaggiari :

> Hi Stack,
>
> I just gave it a try... I wiped out all HDFS content and code, all
> HBase content and code, and all ZK data, then rebuilt a brand-new cluster
> with 7 physical worker nodes. I'm able to get HBase to start; however, I'm
> not able to get my regions online.
>

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2017-12-30 Thread Jean-Marc Spaggiari
Hi Stack,

I just gave it a try... I wiped out all HDFS content and code, all
HBase content and code, and all ZK data, then rebuilt a brand-new cluster
with 7 physical worker nodes. I'm able to get HBase to start; however, I'm
not able to get my regions online.

2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node8.16020,151469206, table=pageMini,
region=a778eb67898dfd378e426f2e7700faea
2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node6.16020,1514693336563, table=work_proposed,
region=4a1d86197ace3f4c8b1c8de28dbe1d34
2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node1.16020,1514693336898, table=page_crc,
region=86b3912a09a5676b6851636ed22c2abc
2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node7.16020,1514693337406, table=pageAvro,
region=391784c43c87bdea6df05f96accad0ff
2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node8.16020,151469206, table=page,
region=5850d782a3beea18872769bf8fd70fc7
2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node5.16020,1514693330961, table=work_proposed,
region=1d892c9b54b66f802b82c2f9fe847f1f
2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node5.16020,1514693330961, table=pageAvro,
region=e9de2c68cc01883e959d7953a4251687
2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node3.16020,1514693337210, table=page,
region=e2e5fc1c262273893f10e92f24817d1b
2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node3.16020,1514693337210, table=page,
region=89c443c09f10bd1584b1bb86a637e1a8
2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node5.16020,1514693330961, table=page,
region=8ca93e9285233ca7b31992f194056bc1
2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node4.16020,1514693339685, table=work_proposed,
region=9afcf06c4d0d21d7e04b0223edcfc40a
2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node6.16020,1514693336563, table=page,
region=3457b3237c576eecd550eccee3f584cd
2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node1.16020,1514693336898, table=page,
region=dd5fb1dbd41945a9ccbc110b8d4a51b5
2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node7.16020,1514693337406, table=work_proposed,
region=480bb37af54d9fa57c727da9e8a33578
2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node8.16020,151469206, table=page_crc,
region=56b18d470a569c5474ea084f0d995726
2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node6.16020,1514693336563, table=page_duplicate,
region=e744a9af161de965c70c7d1a08b07660
2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node1.16020,1514693336898, table=page_proposed,
region=1c75e53308acac6313db4be63c2b48fe
2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node8.16020,151469206, table=work_proposed,
region=45a25ba85f6341a177db7b15554259f9
2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node3.16020,1514693337210, table=work_proposed,
region=d0a58b76ad9376b12b3e763660049d3d
2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node3.16020,1514693337210, table=page,
region=599a4b7b21b1d93fa232ebbbef37a31b
2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node1.16020,1514693336898, table=page_proposed,
region=55c07269cc907b8e8875c2a1c4ec27d5
2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node5.16020,1514693330961, table=page_crc,
region=fa3a
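
As an aside, the ProcExecTimeout output above repeats one WARN line per
region in transition, which gets hard to read on a large cluster. A quick
way to see which tables are affected is to count the warnings per table.
This is a hypothetical helper, not part of HBase; the regex simply matches
the format of the WARN lines quoted in this thread:

```python
import re
from collections import Counter

# Count AssignmentManager "stuck in transition" warnings per table, so a
# long master log can be read at a glance. The regex matches the WARN
# format quoted in this thread (an assumption about the exact layout).
def summarize_stuck_regions(log_text):
    """Return a Counter mapping table name -> number of stuck-region warnings."""
    tables = re.findall(
        r"Handle stuck in transition:.*?table=(\w+)", log_text, re.DOTALL
    )
    return Counter(tables)

# Sample entries taken from the log above (backslashes join the wrapped lines).
sample = """\
2017-12-31 00:42:03,187 WARN  [ProcExecTimeout] \
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING, \
location=node8.16020,151469206, table=pageMini, \
region=a778eb67898dfd378e426f2e7700faea
2017-12-31 00:42:03,187 WARN  [ProcExecTimeout] \
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING, \
location=node3.16020,1514693337210, table=page, \
region=e2e5fc1c262273893f10e92f24817d1b
2017-12-31 00:42:03,187 WARN  [ProcExecTimeout] \
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING, \
location=node3.16020,1514693337210, table=page, \
region=89c443c09f10bd1584b1bb86a637e1a8
"""

print(summarize_stuck_regions(sample))
```

Run against the full master log, this shows in one line how many regions of
each table are stuck, instead of scrolling through hundreds of warnings.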