Re: Flaky dashboard for current branch-2

2018-01-12 Thread Zach York
Thanks for the explanation Appy!

bq. I think we can actually update the script to send a mail to dev@ when it
encounters these 100% failing tests. Waana try? :)

That would be cool, shame people into fixing tests :) I can try to take a
look into that.



On Fri, Jan 12, 2018 at 5:01 PM, Ted Yu  wrote:

> There is more than one reason.
>
> Sometimes QA reported tests in a module failed.
> When artifact/patchprocess/patch-unit-hbase-server.txt is checked, there
> were more than one occurrence of the following :
>
> https://pastebin.com/WBewfj3Q
>
> It is hard to decipher what was behind the crash.
> Finding hanging test currently is not automated.
>
> Also note the following at the beginning of the test run:
>
> https://pastebin.com/sK6ebk84
>
> FYI
>
> On Fri, Jan 12, 2018 at 4:35 PM, 张铎(Duo Zhang) 
> wrote:
>
> > Why a 100% failure test can not be detected with pre commit check?
> >
> > Ted Yu 于2018年1月13日 周六07:44写道:
> >
> > > As we get closer and closer to beta release, it is important to have as
> > few
> > > flaky tests as possible.
> > >
> > > bq. we can actually update the script to send a mail to dev@
> > >
> > > A post to the JIRA which caused the 100% failing test would be better.
> > > The committer would notice the post and take corresponding action.
> > >
> > > Cheers
> > >
> > > On Fri, Jan 12, 2018 at 3:35 PM, Apekshit Sharma 
> > > wrote:
> > >
> > > > >   Is Nightly now using a list of flakes?
> > > > Dashboard job was flaky yesterday, so didn't start using it. Looks
> like
> > > > it's working fine now. Let me exclude flakies from nightly job.
> > > >
> > > > > Just took a look at the dashboard. Does this capture only failed
> runs
> > > or
> > > > all
> > > > runs?
> > > > Sorry the question isn't clear. Runs of what?
> > > > Here's an attempt to answer it in best way i can understand - it
> looks
> > at
> > > > last X (x=6 now) runs of nightly branch-2 to collect failing,
> hanging,
> > > and
> > > > timedout tests.
> > > >
> > > > > I see that the following tests have failed 100% of the time for the
> > > last
> > > > 30
> > > > > runs [1]. If this captures all runs, this isn't truly flaky, but
> > > rather a
> > > > > legitimate failure, right?
> > > > > Maybe this tool is used to see all test failures, but if not, I
> feel
> > > like
> > > > > we could/should remove a test from the flaky tests/excludes if it
> > fails
> > > > > consistently so we can fix the root cause
> > > >
> > > > Has come up a lot of times before. Yes, you're right 100% failure =
> > > > legitimate failure.
> > > > 
> > > > We as a community suck at tracking nightly runs for failing tests and
> > > > fixing them, otherwise we wouldn't have ~40 bad test, right!
> > > > In fact, we suck at fixing tests even when it's presented in a nice
> > clean
> > > > list (this dashboard). We just don't prioritize tests in our work.
> > > > The general attitude is, tests are failing...meh..what's new, have
> been
> > > > failing for years. Instead of - Oh, one test failed, find the cause
> and
> > > > revert it!
> > > > So the real thing to change here is attitude of the community towards
> > > > tests. I am +1 for anything that'll promote/support that change.
> > > > 
> > > > I think we can actually update the script to send a mail to dev@
> when
> > it
> > > > encounters these 100% failing tests. Waana try? :)
> > > >
> > > > -- Appy
> > > >
> > > >
> > > >
> > > >
> > > > On Fri, Jan 12, 2018 at 11:29 AM, Zach York <
> > > zyork.contribut...@gmail.com>
> > > > wrote:
> > > >
> > > > > Just took a look at the dashboard. Does this capture only failed
> runs
> > > or
> > > > > all runs?
> > > > >
> > > > > I see that the following tests have failed 100% of the time for the
> > > last
> > > > 30
> > > > > runs [1]. If this captures all runs, this isn't truly flaky, but
> > > rather a
> > > > > legitimate failure, right?
> > > > > Maybe this tool is used to see all test failures, but if not, I
> feel
> > > like
> > > > > we could/should remove a test from the flaky tests/excludes if it
> > fails
> > > > > consistently so we can fix the root cause.
> > > > >
> > > > > [1]
> > > > > master.balancer.TestRegionsOnMasterOptions
> > > > > client.TestMultiParallel
> > > > > regionserver.TestRegionServerReadRequestMetrics
> > > > >
> > > > > Thanks,
> > > > > Zach
> > > > >
> > > > > On Fri, Jan 12, 2018 at 8:19 AM, Stack  wrote:
> > > > >
> > > > > > Dashboard doesn't capture timed out tests, right Appy?
> > > > > > Thanks,
> > > > > > S
> > > > > >
> > > > > > On Thu, Jan 11, 2018 at 6:10 PM, Apekshit Sharma <
> > a...@cloudera.com>
> > > > > > wrote:
> > > > > >
> > > > > > > https://builds.apache.org/job/HBase-Find-Flaky-Tests-
> > > > > > > branch2.0/lastSuccessfulBuild/artifact/dashboard.html
> > > > > > >
> > > > > > > @stack: when you branch out branch-2.0, let me know, i'll
> update
> > > the
> > > > > jobs
> > > > > > > to 

Re: Flaky dashboard for current branch-2

2018-01-12 Thread Ted Yu
There is more than one reason.

Sometimes QA reported tests in a module failed.
When artifact/patchprocess/patch-unit-hbase-server.txt is checked, there
were more than one occurrence of the following :

https://pastebin.com/WBewfj3Q

It is hard to decipher what was behind the crash.
Finding hanging test currently is not automated.

Also note the following at the beginning of the test run:

https://pastebin.com/sK6ebk84

FYI

On Fri, Jan 12, 2018 at 4:35 PM, 张铎(Duo Zhang) 
wrote:

> Why a 100% failure test can not be detected with pre commit check?
>
> Ted Yu 于2018年1月13日 周六07:44写道:
>
> > As we get closer and closer to beta release, it is important to have as
> few
> > flaky tests as possible.
> >
> > bq. we can actually update the script to send a mail to dev@
> >
> > A post to the JIRA which caused the 100% failing test would be better.
> > The committer would notice the post and take corresponding action.
> >
> > Cheers
> >
> > On Fri, Jan 12, 2018 at 3:35 PM, Apekshit Sharma 
> > wrote:
> >
> > > >   Is Nightly now using a list of flakes?
> > > Dashboard job was flaky yesterday, so didn't start using it. Looks like
> > > it's working fine now. Let me exclude flakies from nightly job.
> > >
> > > > Just took a look at the dashboard. Does this capture only failed runs
> > or
> > > all
> > > runs?
> > > Sorry the question isn't clear. Runs of what?
> > > Here's an attempt to answer it in best way i can understand - it looks
> at
> > > last X (x=6 now) runs of nightly branch-2 to collect failing, hanging,
> > and
> > > timedout tests.
> > >
> > > > I see that the following tests have failed 100% of the time for the
> > last
> > > 30
> > > > runs [1]. If this captures all runs, this isn't truly flaky, but
> > rather a
> > > > legitimate failure, right?
> > > > Maybe this tool is used to see all test failures, but if not, I feel
> > like
> > > > we could/should remove a test from the flaky tests/excludes if it
> fails
> > > > consistently so we can fix the root cause
> > >
> > > Has come up a lot of times before. Yes, you're right 100% failure =
> > > legitimate failure.
> > > 
> > > We as a community suck at tracking nightly runs for failing tests and
> > > fixing them, otherwise we wouldn't have ~40 bad test, right!
> > > In fact, we suck at fixing tests even when it's presented in a nice
> clean
> > > list (this dashboard). We just don't prioritize tests in our work.
> > > The general attitude is, tests are failing...meh..what's new, have been
> > > failing for years. Instead of - Oh, one test failed, find the cause and
> > > revert it!
> > > So the real thing to change here is attitude of the community towards
> > > tests. I am +1 for anything that'll promote/support that change.
> > > 
> > > I think we can actually update the script to send a mail to dev@ when
> it
> > > encounters these 100% failing tests. Waana try? :)
> > >
> > > -- Appy
> > >
> > >
> > >
> > >
> > > On Fri, Jan 12, 2018 at 11:29 AM, Zach York <
> > zyork.contribut...@gmail.com>
> > > wrote:
> > >
> > > > Just took a look at the dashboard. Does this capture only failed runs
> > or
> > > > all runs?
> > > >
> > > > I see that the following tests have failed 100% of the time for the
> > last
> > > 30
> > > > runs [1]. If this captures all runs, this isn't truly flaky, but
> > rather a
> > > > legitimate failure, right?
> > > > Maybe this tool is used to see all test failures, but if not, I feel
> > like
> > > > we could/should remove a test from the flaky tests/excludes if it
> fails
> > > > consistently so we can fix the root cause.
> > > >
> > > > [1]
> > > > master.balancer.TestRegionsOnMasterOptions
> > > > client.TestMultiParallel
> > > > regionserver.TestRegionServerReadRequestMetrics
> > > >
> > > > Thanks,
> > > > Zach
> > > >
> > > > On Fri, Jan 12, 2018 at 8:19 AM, Stack  wrote:
> > > >
> > > > > Dashboard doesn't capture timed out tests, right Appy?
> > > > > Thanks,
> > > > > S
> > > > >
> > > > > On Thu, Jan 11, 2018 at 6:10 PM, Apekshit Sharma <
> a...@cloudera.com>
> > > > > wrote:
> > > > >
> > > > > > https://builds.apache.org/job/HBase-Find-Flaky-Tests-
> > > > > > branch2.0/lastSuccessfulBuild/artifact/dashboard.html
> > > > > >
> > > > > > @stack: when you branch out branch-2.0, let me know, i'll update
> > the
> > > > jobs
> > > > > > to point to that branch so that it's helpful in release. Once
> > release
> > > > is
> > > > > > done, i'll move them back to "branch-2".
> > > > > >
> > > > > >
> > > > > > -- Appy
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > -- Appy
> > >
> >
>


Re: Flaky dashboard for current branch-2

2018-01-12 Thread Apekshit Sharma
bq. Why a 100% failure test can not be detected with pre commit check?
Precommit runs only those tests which are in the modules being changed. If
a change breaks downstream modules, it can lead to such a scenario.

-- Appy

On Fri, Jan 12, 2018 at 4:35 PM, 张铎(Duo Zhang) 
wrote:

> Why a 100% failure test can not be detected with pre commit check?
>
> Ted Yu 于2018年1月13日 周六07:44写道:
>
> > As we get closer and closer to beta release, it is important to have as
> few
> > flaky tests as possible.
> >
> > bq. we can actually update the script to send a mail to dev@
> >
> > A post to the JIRA which caused the 100% failing test would be better.
> > The committer would notice the post and take corresponding action.
> >
> > Cheers
> >
> > On Fri, Jan 12, 2018 at 3:35 PM, Apekshit Sharma 
> > wrote:
> >
> > > >   Is Nightly now using a list of flakes?
> > > Dashboard job was flaky yesterday, so didn't start using it. Looks like
> > > it's working fine now. Let me exclude flakies from nightly job.
> > >
> > > > Just took a look at the dashboard. Does this capture only failed runs
> > or
> > > all
> > > runs?
> > > Sorry the question isn't clear. Runs of what?
> > > Here's an attempt to answer it in best way i can understand - it looks
> at
> > > last X (x=6 now) runs of nightly branch-2 to collect failing, hanging,
> > and
> > > timedout tests.
> > >
> > > > I see that the following tests have failed 100% of the time for the
> > last
> > > 30
> > > > runs [1]. If this captures all runs, this isn't truly flaky, but
> > rather a
> > > > legitimate failure, right?
> > > > Maybe this tool is used to see all test failures, but if not, I feel
> > like
> > > > we could/should remove a test from the flaky tests/excludes if it
> fails
> > > > consistently so we can fix the root cause
> > >
> > > Has come up a lot of times before. Yes, you're right 100% failure =
> > > legitimate failure.
> > > 
> > > We as a community suck at tracking nightly runs for failing tests and
> > > fixing them, otherwise we wouldn't have ~40 bad test, right!
> > > In fact, we suck at fixing tests even when it's presented in a nice
> clean
> > > list (this dashboard). We just don't prioritize tests in our work.
> > > The general attitude is, tests are failing...meh..what's new, have been
> > > failing for years. Instead of - Oh, one test failed, find the cause and
> > > revert it!
> > > So the real thing to change here is attitude of the community towards
> > > tests. I am +1 for anything that'll promote/support that change.
> > > 
> > > I think we can actually update the script to send a mail to dev@ when
> it
> > > encounters these 100% failing tests. Waana try? :)
> > >
> > > -- Appy
> > >
> > >
> > >
> > >
> > > On Fri, Jan 12, 2018 at 11:29 AM, Zach York <
> > zyork.contribut...@gmail.com>
> > > wrote:
> > >
> > > > Just took a look at the dashboard. Does this capture only failed runs
> > or
> > > > all runs?
> > > >
> > > > I see that the following tests have failed 100% of the time for the
> > last
> > > 30
> > > > runs [1]. If this captures all runs, this isn't truly flaky, but
> > rather a
> > > > legitimate failure, right?
> > > > Maybe this tool is used to see all test failures, but if not, I feel
> > like
> > > > we could/should remove a test from the flaky tests/excludes if it
> fails
> > > > consistently so we can fix the root cause.
> > > >
> > > > [1]
> > > > master.balancer.TestRegionsOnMasterOptions
> > > > client.TestMultiParallel
> > > > regionserver.TestRegionServerReadRequestMetrics
> > > >
> > > > Thanks,
> > > > Zach
> > > >
> > > > On Fri, Jan 12, 2018 at 8:19 AM, Stack  wrote:
> > > >
> > > > > Dashboard doesn't capture timed out tests, right Appy?
> > > > > Thanks,
> > > > > S
> > > > >
> > > > > On Thu, Jan 11, 2018 at 6:10 PM, Apekshit Sharma <
> a...@cloudera.com>
> > > > > wrote:
> > > > >
> > > > > > https://builds.apache.org/job/HBase-Find-Flaky-Tests-
> > > > > > branch2.0/lastSuccessfulBuild/artifact/dashboard.html
> > > > > >
> > > > > > @stack: when you branch out branch-2.0, let me know, i'll update
> > the
> > > > jobs
> > > > > > to point to that branch so that it's helpful in release. Once
> > release
> > > > is
> > > > > > done, i'll move them back to "branch-2".
> > > > > >
> > > > > >
> > > > > > -- Appy
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > -- Appy
> > >
> >
>



-- 

-- Appy


Re: Flaky dashboard for current branch-2

2018-01-12 Thread Duo Zhang
Why a 100% failure test can not be detected with pre commit check?

Ted Yu 于2018年1月13日 周六07:44写道:

> As we get closer and closer to beta release, it is important to have as few
> flaky tests as possible.
>
> bq. we can actually update the script to send a mail to dev@
>
> A post to the JIRA which caused the 100% failing test would be better.
> The committer would notice the post and take corresponding action.
>
> Cheers
>
> On Fri, Jan 12, 2018 at 3:35 PM, Apekshit Sharma 
> wrote:
>
> > >   Is Nightly now using a list of flakes?
> > Dashboard job was flaky yesterday, so didn't start using it. Looks like
> > it's working fine now. Let me exclude flakies from nightly job.
> >
> > > Just took a look at the dashboard. Does this capture only failed runs
> or
> > all
> > runs?
> > Sorry the question isn't clear. Runs of what?
> > Here's an attempt to answer it in best way i can understand - it looks at
> > last X (x=6 now) runs of nightly branch-2 to collect failing, hanging,
> and
> > timedout tests.
> >
> > > I see that the following tests have failed 100% of the time for the
> last
> > 30
> > > runs [1]. If this captures all runs, this isn't truly flaky, but
> rather a
> > > legitimate failure, right?
> > > Maybe this tool is used to see all test failures, but if not, I feel
> like
> > > we could/should remove a test from the flaky tests/excludes if it fails
> > > consistently so we can fix the root cause
> >
> > Has come up a lot of times before. Yes, you're right 100% failure =
> > legitimate failure.
> > 
> > We as a community suck at tracking nightly runs for failing tests and
> > fixing them, otherwise we wouldn't have ~40 bad test, right!
> > In fact, we suck at fixing tests even when it's presented in a nice clean
> > list (this dashboard). We just don't prioritize tests in our work.
> > The general attitude is, tests are failing...meh..what's new, have been
> > failing for years. Instead of - Oh, one test failed, find the cause and
> > revert it!
> > So the real thing to change here is attitude of the community towards
> > tests. I am +1 for anything that'll promote/support that change.
> > 
> > I think we can actually update the script to send a mail to dev@ when it
> > encounters these 100% failing tests. Waana try? :)
> >
> > -- Appy
> >
> >
> >
> >
> > On Fri, Jan 12, 2018 at 11:29 AM, Zach York <
> zyork.contribut...@gmail.com>
> > wrote:
> >
> > > Just took a look at the dashboard. Does this capture only failed runs
> or
> > > all runs?
> > >
> > > I see that the following tests have failed 100% of the time for the
> last
> > 30
> > > runs [1]. If this captures all runs, this isn't truly flaky, but
> rather a
> > > legitimate failure, right?
> > > Maybe this tool is used to see all test failures, but if not, I feel
> like
> > > we could/should remove a test from the flaky tests/excludes if it fails
> > > consistently so we can fix the root cause.
> > >
> > > [1]
> > > master.balancer.TestRegionsOnMasterOptions
> > > client.TestMultiParallel
> > > regionserver.TestRegionServerReadRequestMetrics
> > >
> > > Thanks,
> > > Zach
> > >
> > > On Fri, Jan 12, 2018 at 8:19 AM, Stack  wrote:
> > >
> > > > Dashboard doesn't capture timed out tests, right Appy?
> > > > Thanks,
> > > > S
> > > >
> > > > On Thu, Jan 11, 2018 at 6:10 PM, Apekshit Sharma 
> > > > wrote:
> > > >
> > > > > https://builds.apache.org/job/HBase-Find-Flaky-Tests-
> > > > > branch2.0/lastSuccessfulBuild/artifact/dashboard.html
> > > > >
> > > > > @stack: when you branch out branch-2.0, let me know, i'll update
> the
> > > jobs
> > > > > to point to that branch so that it's helpful in release. Once
> release
> > > is
> > > > > done, i'll move them back to "branch-2".
> > > > >
> > > > >
> > > > > -- Appy
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> >
> > -- Appy
> >
>


Re: Flaky dashboard for current branch-2

2018-01-12 Thread Ted Yu
As we get closer and closer to beta release, it is important to have as few
flaky tests as possible.

bq. we can actually update the script to send a mail to dev@

A post to the JIRA which caused the 100% failing test would be better.
The committer would notice the post and take corresponding action.

Cheers

On Fri, Jan 12, 2018 at 3:35 PM, Apekshit Sharma  wrote:

> >   Is Nightly now using a list of flakes?
> Dashboard job was flaky yesterday, so didn't start using it. Looks like
> it's working fine now. Let me exclude flakies from nightly job.
>
> > Just took a look at the dashboard. Does this capture only failed runs or
> all
> runs?
> Sorry the question isn't clear. Runs of what?
> Here's an attempt to answer it in best way i can understand - it looks at
> last X (x=6 now) runs of nightly branch-2 to collect failing, hanging, and
> timedout tests.
>
> > I see that the following tests have failed 100% of the time for the last
> 30
> > runs [1]. If this captures all runs, this isn't truly flaky, but rather a
> > legitimate failure, right?
> > Maybe this tool is used to see all test failures, but if not, I feel like
> > we could/should remove a test from the flaky tests/excludes if it fails
> > consistently so we can fix the root cause
>
> Has come up a lot of times before. Yes, you're right 100% failure =
> legitimate failure.
> 
> We as a community suck at tracking nightly runs for failing tests and
> fixing them, otherwise we wouldn't have ~40 bad test, right!
> In fact, we suck at fixing tests even when it's presented in a nice clean
> list (this dashboard). We just don't prioritize tests in our work.
> The general attitude is, tests are failing...meh..what's new, have been
> failing for years. Instead of - Oh, one test failed, find the cause and
> revert it!
> So the real thing to change here is attitude of the community towards
> tests. I am +1 for anything that'll promote/support that change.
> 
> I think we can actually update the script to send a mail to dev@ when it
> encounters these 100% failing tests. Waana try? :)
>
> -- Appy
>
>
>
>
> On Fri, Jan 12, 2018 at 11:29 AM, Zach York 
> wrote:
>
> > Just took a look at the dashboard. Does this capture only failed runs or
> > all runs?
> >
> > I see that the following tests have failed 100% of the time for the last
> 30
> > runs [1]. If this captures all runs, this isn't truly flaky, but rather a
> > legitimate failure, right?
> > Maybe this tool is used to see all test failures, but if not, I feel like
> > we could/should remove a test from the flaky tests/excludes if it fails
> > consistently so we can fix the root cause.
> >
> > [1]
> > master.balancer.TestRegionsOnMasterOptions
> > client.TestMultiParallel
> > regionserver.TestRegionServerReadRequestMetrics
> >
> > Thanks,
> > Zach
> >
> > On Fri, Jan 12, 2018 at 8:19 AM, Stack  wrote:
> >
> > > Dashboard doesn't capture timed out tests, right Appy?
> > > Thanks,
> > > S
> > >
> > > On Thu, Jan 11, 2018 at 6:10 PM, Apekshit Sharma 
> > > wrote:
> > >
> > > > https://builds.apache.org/job/HBase-Find-Flaky-Tests-
> > > > branch2.0/lastSuccessfulBuild/artifact/dashboard.html
> > > >
> > > > @stack: when you branch out branch-2.0, let me know, i'll update the
> > jobs
> > > > to point to that branch so that it's helpful in release. Once release
> > is
> > > > done, i'll move them back to "branch-2".
> > > >
> > > >
> > > > -- Appy
> > > >
> > >
> >
>
>
>
> --
>
> -- Appy
>


Re: Flaky dashboard for current branch-2

2018-01-12 Thread Apekshit Sharma
>   Is Nightly now using a list of flakes?
Dashboard job was flaky yesterday, so didn't start using it. Looks like
it's working fine now. Let me exclude flakies from nightly job.

> Just took a look at the dashboard. Does this capture only failed runs or all
runs?
Sorry the question isn't clear. Runs of what?
Here's an attempt to answer it in best way i can understand - it looks at
last X (x=6 now) runs of nightly branch-2 to collect failing, hanging, and
timedout tests.

> I see that the following tests have failed 100% of the time for the last
30
> runs [1]. If this captures all runs, this isn't truly flaky, but rather a
> legitimate failure, right?
> Maybe this tool is used to see all test failures, but if not, I feel like
> we could/should remove a test from the flaky tests/excludes if it fails
> consistently so we can fix the root cause

Has come up a lot of times before. Yes, you're right 100% failure =
legitimate failure.

We as a community suck at tracking nightly runs for failing tests and
fixing them, otherwise we wouldn't have ~40 bad test, right!
In fact, we suck at fixing tests even when it's presented in a nice clean
list (this dashboard). We just don't prioritize tests in our work.
The general attitude is, tests are failing...meh..what's new, have been
failing for years. Instead of - Oh, one test failed, find the cause and
revert it!
So the real thing to change here is attitude of the community towards
tests. I am +1 for anything that'll promote/support that change.

I think we can actually update the script to send a mail to dev@ when it
encounters these 100% failing tests. Waana try? :)

-- Appy




On Fri, Jan 12, 2018 at 11:29 AM, Zach York 
wrote:

> Just took a look at the dashboard. Does this capture only failed runs or
> all runs?
>
> I see that the following tests have failed 100% of the time for the last 30
> runs [1]. If this captures all runs, this isn't truly flaky, but rather a
> legitimate failure, right?
> Maybe this tool is used to see all test failures, but if not, I feel like
> we could/should remove a test from the flaky tests/excludes if it fails
> consistently so we can fix the root cause.
>
> [1]
> master.balancer.TestRegionsOnMasterOptions
> client.TestMultiParallel
> regionserver.TestRegionServerReadRequestMetrics
>
> Thanks,
> Zach
>
> On Fri, Jan 12, 2018 at 8:19 AM, Stack  wrote:
>
> > Dashboard doesn't capture timed out tests, right Appy?
> > Thanks,
> > S
> >
> > On Thu, Jan 11, 2018 at 6:10 PM, Apekshit Sharma 
> > wrote:
> >
> > > https://builds.apache.org/job/HBase-Find-Flaky-Tests-
> > > branch2.0/lastSuccessfulBuild/artifact/dashboard.html
> > >
> > > @stack: when you branch out branch-2.0, let me know, i'll update the
> jobs
> > > to point to that branch so that it's helpful in release. Once release
> is
> > > done, i'll move them back to "branch-2".
> > >
> > >
> > > -- Appy
> > >
> >
>



-- 

-- Appy


Re: Flaky dashboard for current branch-2

2018-01-12 Thread Zach York
Just took a look at the dashboard. Does this capture only failed runs or
all runs?

I see that the following tests have failed 100% of the time for the last 30
runs [1]. If this captures all runs, this isn't truly flaky, but rather a
legitimate failure, right?
Maybe this tool is used to see all test failures, but if not, I feel like
we could/should remove a test from the flaky tests/excludes if it fails
consistently so we can fix the root cause.

[1]
master.balancer.TestRegionsOnMasterOptions
client.TestMultiParallel
regionserver.TestRegionServerReadRequestMetrics

Thanks,
Zach

On Fri, Jan 12, 2018 at 8:19 AM, Stack  wrote:

> Dashboard doesn't capture timed out tests, right Appy?
> Thanks,
> S
>
> On Thu, Jan 11, 2018 at 6:10 PM, Apekshit Sharma 
> wrote:
>
> > https://builds.apache.org/job/HBase-Find-Flaky-Tests-
> > branch2.0/lastSuccessfulBuild/artifact/dashboard.html
> >
> > @stack: when you branch out branch-2.0, let me know, i'll update the jobs
> > to point to that branch so that it's helpful in release. Once release is
> > done, i'll move them back to "branch-2".
> >
> >
> > -- Appy
> >
>


Re: Flaky dashboard for current branch-2

2018-01-12 Thread Stack
Dashboard doesn't capture timed out tests, right Appy?
Thanks,
S

On Thu, Jan 11, 2018 at 6:10 PM, Apekshit Sharma  wrote:

> https://builds.apache.org/job/HBase-Find-Flaky-Tests-
> branch2.0/lastSuccessfulBuild/artifact/dashboard.html
>
> @stack: when you branch out branch-2.0, let me know, i'll update the jobs
> to point to that branch so that it's helpful in release. Once release is
> done, i'll move them back to "branch-2".
>
>
> -- Appy
>


Re: Flaky dashboard for current branch-2

2018-01-12 Thread Balazs Meszaros
Nice job! Thanks Appy!

On Fri, Jan 12, 2018 at 6:09 AM, Stack  wrote:

> Thanks Appy. Looks beautiful. Is Nightly now using a list of flakes?
> Thanks.
> S
>
> On Thu, Jan 11, 2018 at 6:10 PM, Apekshit Sharma 
> wrote:
>
> > https://builds.apache.org/job/HBase-Find-Flaky-Tests-
> > branch2.0/lastSuccessfulBuild/artifact/dashboard.html
> >
> > @stack: when you branch out branch-2.0, let me know, i'll update the jobs
> > to point to that branch so that it's helpful in release. Once release is
> > done, i'll move them back to "branch-2".
> >
> >
> > -- Appy
> >
>


Re: Flaky dashboard for current branch-2

2018-01-11 Thread Stack
Thanks Appy. Looks beautiful. Is Nightly now using a list of flakes? Thanks.
S

On Thu, Jan 11, 2018 at 6:10 PM, Apekshit Sharma  wrote:

> https://builds.apache.org/job/HBase-Find-Flaky-Tests-
> branch2.0/lastSuccessfulBuild/artifact/dashboard.html
>
> @stack: when you branch out branch-2.0, let me know, i'll update the jobs
> to point to that branch so that it's helpful in release. Once release is
> done, i'll move them back to "branch-2".
>
>
> -- Appy
>