Re: reverting test-breaking changes

2018-04-06 Thread Vihang Karajgaonkar
The TestNegativeCliDriver failures are concerning. We have had practically no
negative test coverage for the last couple of weeks. I think we should
treat this as a blocker for the Hive 3.0.0 release. Thoughts?

On Thu, Apr 5, 2018 at 9:50 PM, Prasanth Jayachandran <
pjayachand...@hortonworks.com> wrote:

> It may be because of a legitimate memory issue/leak. An alternative is to
> decrease the batch size of the negative cli driver on the ptest cluster, but
> then we wouldn't know if there is an actual memory issue.
>
> Thanks
> Prasanth
>
>
>
> On Thu, Apr 5, 2018 at 1:36 PM -0700, Vineet Garg wrote:
>
>
> TestNegativeCliDriver tests are still failing with
> java.lang.OutOfMemoryError: GC overhead limit exceeded.
> Can we increase the amount of memory for tests?
>
> Vineet
>
> On Mar 5, 2018, at 11:35 AM, Sergey Shelukhin wrote:
>
> On a semi-related note, I noticed recently that negative tests seem to OOM
> in setup from time to time.
> Can we increase the amount of memory for the tests a little bit, and/or
> maybe add the dump on OOM to them, saved to test logs directory, so we
> could investigate?
>
> On 18/3/5, 11:07, "Vineet Garg" wrote:
>
> +1 for nightly build. We could generate reports to identify both frequent
> and sporadic test failures plus other interesting bits like average build
> time, yetus failures etc. It'll also help narrow down the culprit
> commit(s) range to one day.
> If you guys decide to go ahead with this I would like to help.
>
> Vineet
>
> On Mar 5, 2018, at 8:50 AM, Sahil Takiar wrote:
>
> Wow that HBase UI looks super useful. +1 to having something like that.
>
> If not, +1 to having a proper nightly build, it would help devs identify
> which commits break which tests. I find using git-bisect can take a long
> time to run, and can be difficult to use (e.g. finding a known good
> commit
> isn't always easy).
>
> On Mon, Mar 5, 2018 at 9:03 AM, Peter Vary wrote:
>
> Without a nightly build and with this many flaky tests it is very hard to
> identify the breaking commits. We can use something like bisect and multiple
> test runs.
>
> There is a more elegant way to do this with nightly test runs:
> https://issues.apache.org/jira/browse/HBASE-15917
> https://builds.apache.org/job/HBASE-Find-Flaky-Tests/lastSuccessfulBuild/artifact/dashboard.html
>
> This also helps to identify the flaky tests, and creates a continuously
> updated list of them.
>
> On Feb 23, 2018, at 6:55 PM, Sahil Takiar
> wrote:
>
> +1
>
> Does anyone have suggestions about how to efficiently identify which
> commit
> is breaking a test? Is it just git-bisect or is there an easier way?
> Hive
> QA isn't always that helpful, it will say a test is failing for the
> past
> "x" builds, but that doesn't help much since Hive QA isn't a nightly
> build.
>
> On Thu, Feb 22, 2018 at 10:31 AM, Vihang Karajgaonkar <
> vih...@cloudera.com>
> wrote:
>
> +1
> Commenting on JIRA and giving a 24hr heads-up (excluding weekends)
> would be
> good.
>
> On Thu, Feb 22, 2018 at 10:19 AM, Alan Gates
> wrote:
>
> +1.
>
> Alan.
>
> On Thu, Feb 22, 2018 at 8:25 AM, Thejas Nair
> wrote:
>
> +1
> I agree, this makes sense. The number of failures keeps increasing.
> A 24 hour heads up in either case before revert would be good.
>
>
> On Thu, Feb 22, 2018 at 2:45 AM, Peter Vary
> wrote:
>
> I agree with Zoltan. The continuously breaking tests make it very hard to
> spot real issues.
> Any thoughts on doing it automatically?
> Any thoughts on doing it automatically?
>
> On Feb 22, 2018, at 10:47 AM, Zoltan Haindrich
> wrote:
>
> Hello,
>
> In the last couple of weeks the number of broken tests has started to go
> up... and even though I run bisect etc. from time to time, sometimes people
> don't react to my comments/tickets.
>
> Because keeping this many failing tests around makes it easier for a new one
> to slip in, I think reverting the patch that introduced the test failures
> would also help in some cases.
>
> I think it would help a lot in preventing further test breaks to revert the
> patch if either of the following conditions is met:
>
> C1) the notification/comment pointing out that the patch indeed broke a test
> has gone unanswered for at least 24 hours.
>
> C2) the patch has been in for 7 days but the test failure is still not
> addressed (note that in this case there might be a conversation about fixing
> it, but enabling other people to work in a cleaner environment is more
> important than a single patch - and if it can't be fixed in 7 days, well, it
> might not get fixed in a month).
>
> I would also like to note that I've seen a few tickets picked up by people
> who were not involved in creating the original change - and although the
> intention was good, they might miss the context of the original patch and may
> "fix" the tests in the wrong way: accept a q.out which is inappropriate or
> ignore the test...

Re: reverting test-breaking changes

2018-04-05 Thread Prasanth Jayachandran
It may be because of a legitimate memory issue/leak. An alternative is to decrease
the batch size of the negative cli driver on the ptest cluster, but then we
wouldn't know if there is an actual memory issue.

Thanks
Prasanth



On Thu, Apr 5, 2018 at 1:36 PM -0700, Vineet Garg wrote:


TestNegativeCli driver tests are still failing with java.lang.OutOfMemoryError: 
GC overhead limit exceeded error.
Can we increase the amount of memory for tests?

Vineet

On Mar 5, 2018, at 11:35 AM, Sergey Shelukhin > wrote:

On a semi-related note, I noticed recently that negative tests seem to OOM
in setup from time to time.
Can we increase the amount of memory for the tests a little bit, and/or
maybe add the dump on OOM to them, saved to test logs directory, so we
could investigate?

On 18/3/5, 11:07, "Vineet Garg" > wrote:

+1 for nightly build. We could generate reports to identify both frequent
and sporadic test failures plus other interesting bits like average build
time, yetus failures etc. It'll also help narrow down the culprit
commit(s) range to one day.
If you guys decide to go ahead with this I would like to help.

Vineet

On Mar 5, 2018, at 8:50 AM, Sahil Takiar > wrote:

Wow that HBase UI looks super useful. +1 to having something like that.

If not, +1 to having a proper nightly build, it would help devs identify
which commits break which tests. I find using git-bisect can take a long
time to run, and can be difficult to use (e.g. finding a known good
commit
isn't always easy).

On Mon, Mar 5, 2018 at 9:03 AM, Peter Vary > wrote:

Without a nightly build and with this many flaky tests it is very hard
to
identify the braking commits. We can use something like bisect and
multiple
test runs.

There is a more elegant way to do this with nightly test runs:
https://issues.apache.org/jira/browse/HBASE-15917 <
https://issues.apache.org/jira/browse/HBASE-15917>
https://builds.apache.org/job/HBASE-Find-Flaky-Tests/
lastSuccessfulBuild/artifact/dashboard.html

This also helps to identify the flaky tests, and creates a continuos,
updated list of them.

On Feb 23, 2018, at 6:55 PM, Sahil Takiar
wrote:

+1

Does anyone have suggestions about how to efficiently identify which
commit
is breaking a test? Is it just git-bisect or is there an easier way?
Hive
QA isn't always that helpful, it will say a test is failing for the
past
"x" builds, but that doesn't help much since Hive QA isn't a nightly
build.

On Thu, Feb 22, 2018 at 10:31 AM, Vihang Karajgaonkar <
vih...@cloudera.com>
wrote:

+1
Commenting on JIRA and giving a 24hr heads-up (excluding weekends)
would be
good.

On Thu, Feb 22, 2018 at 10:19 AM, Alan Gates
wrote:

+1.

Alan.

On Thu, Feb 22, 2018 at 8:25 AM, Thejas Nair
wrote:

+1
I agree, this makes sense. The number of failures keeps increasing.
A 24 hour heads up in either case before revert would be good.


On Thu, Feb 22, 2018 at 2:45 AM, Peter Vary
wrote:

I agree with Zoltan. The continuously braking tests make it very
hard
to
spot real issues.
Any thoughts on doing it automatically?

On Feb 22, 2018, at 10:47 AM, Zoltan Haindrich
wrote:

*

Hello,

*
*

**

In the last couple weeks the number of broken tests have started
to
go
up...and even tho I run bisect/etc from time to time ; sometimes
people
don't react to my comments/tickets/etc.

Because keeping this many failing tests makes it easier for a new
one
to
slip in...I think reverting the patch introducing the test
failures
would
also help in some case.

I think it would help a lot to prevent further test breaks to
revert
the
patch if any of the following conditions is met:

*
*

C1) if the notification/comment about the fact that the patch
indeed
broken a test somehow have been unanswered for at least 24 hours.

C2) if the patch is in for 7 days; but the test failure is still
not
addressed (note that in this case there might be a conversation
about
fixing it...but in this case ; to enable other people to work in a
cleaner
environment is more important than a single patch - and if it
can't
be
fixed in 7 days...well it might not get fixed in a month).

*
*

I would like to also note that I've seen a few tickets which have
been
picked up by people who were not involved in creating the original
change -
and although the intention was good, they might miss the context
of
the
original patch and may "fix" the tests in the wrong way: accept a
q.out
which is inappropriate or ignore the test...

*
*

would it be ok to implement this from now on? because it makes my
efforts practically useless if people are not reacting...

*
*

note: just to be on the same page - this is only about running a
single
test which falls on its own - I feel that flaky tests are an
entirely
different topic.

*
*

cheers,

Zoltan

**
*








--
Sahil Takiar
Software Engineer
takiar.sa...@gmail.com | (510) 673-0309




--
Sahil Takiar
Software Engineer
takiar.sa...@gmail.com | (510) 673-0309






Re: reverting test-breaking changes

2018-04-05 Thread Vineet Garg
TestNegativeCliDriver tests are still failing with java.lang.OutOfMemoryError:
GC overhead limit exceeded.
Can we increase the amount of memory for the tests?

Vineet
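
For reference, a minimal local experiment along these lines - a sketch only,
assuming the tests run through Maven Surefire and that the forked JVM heap can
be raised via argLine (the test name and module path below are assumptions):

  # Hypothetical invocation: bump the test JVM heap for one driver run.
  cd itests/qtest
  mvn test -Dtest=TestNegativeCliDriver -DargLine="-Xmx4g"

If the OOM still shows up with a noticeably larger heap, that would point at a
real leak rather than an undersized setting.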

On Mar 5, 2018, at 11:35 AM, Sergey Shelukhin 
> wrote:

On a semi-related note, I noticed recently that negative tests seem to OOM
in setup from time to time.
Can we increase the amount of memory for the tests a little bit, and/or
maybe add the dump on OOM to them, saved to test logs directory, so we
could investigate?

On 18/3/5, 11:07, "Vineet Garg" 
> wrote:

+1 for nightly build. We could generate reports to identify both frequent
and sporadic test failures plus other interesting bits like average build
time, yetus failures etc. It’ll also help narrow down the culprit
commit(s) range to one day.
If you guys decide to go ahead with this I would like to help.

Vineet

On Mar 5, 2018, at 8:50 AM, Sahil Takiar 
> wrote:

Wow that HBase UI looks super useful. +1 to having something like that.

If not, +1 to having a proper nightly build, it would help devs identify
which commits break which tests. I find using git-bisect can take a long
time to run, and can be difficult to use (e.g. finding a known good
commit
isn't always easy).

On Mon, Mar 5, 2018 at 9:03 AM, Peter Vary 
> wrote:

Without a nightly build and with this many flaky tests it is very hard
to
identify the braking commits. We can use something like bisect and
multiple
test runs.

There is a more elegant way to do this with nightly test runs:
https://issues.apache.org/jira/browse/HBASE-15917 <
https://issues.apache.org/jira/browse/HBASE-15917>
https://builds.apache.org/job/HBASE-Find-Flaky-Tests/
lastSuccessfulBuild/artifact/dashboard.html 

This also helps to identify the flaky tests, and creates a continuos,
updated list of them.

On Feb 23, 2018, at 6:55 PM, Sahil Takiar 
wrote:

+1

Does anyone have suggestions about how to efficiently identify which
commit
is breaking a test? Is it just git-bisect or is there an easier way?
Hive
QA isn't always that helpful, it will say a test is failing for the
past
"x" builds, but that doesn't help much since Hive QA isn't a nightly
build.

On Thu, Feb 22, 2018 at 10:31 AM, Vihang Karajgaonkar <
vih...@cloudera.com>
wrote:

+1
Commenting on JIRA and giving a 24hr heads-up (excluding weekends)
would be
good.

On Thu, Feb 22, 2018 at 10:19 AM, Alan Gates 
wrote:

+1.

Alan.

On Thu, Feb 22, 2018 at 8:25 AM, Thejas Nair 
wrote:

+1
I agree, this makes sense. The number of failures keeps increasing.
A 24 hour heads up in either case before revert would be good.


On Thu, Feb 22, 2018 at 2:45 AM, Peter Vary 
wrote:

I agree with Zoltan. The continuously braking tests make it very
hard
to
spot real issues.
Any thoughts on doing it automatically?

On Feb 22, 2018, at 10:47 AM, Zoltan Haindrich 
wrote:

*

Hello,

*
*

**

In the last couple weeks the number of broken tests have started
to
go
up...and even tho I run bisect/etc from time to time ; sometimes
people
don’t react to my comments/tickets/etc.

Because keeping this many failing tests makes it easier for a new
one
to
slip in...I think reverting the patch introducing the test
failures
would
also help in some case.

I think it would help a lot to prevent further test breaks to
revert
the
patch if any of the following conditions is met:

*
*

C1) if the notification/comment about the fact that the patch
indeed
broken a test somehow have been unanswered for at least 24 hours.

C2) if the patch is in for 7 days; but the test failure is still
not
addressed (note that in this case there might be a conversation
about
fixing it...but in this case ; to enable other people to work in a
cleaner
environment is more important than a single patch - and if it
can't
be
fixed in 7 days...well it might not get fixed in a month).

*
*

I would like to also note that I've seen a few tickets which have
been
picked up by people who were not involved in creating the original
change -
and although the intention was good, they might miss the context
of
the
original patch and may "fix" the tests in the wrong way: accept a
q.out
which is inappropriate or ignore the test...

*
*

would it be ok to implement this from now on? because it makes my
efforts practically useless if people are not reacting…

*
*

note: just to be on the same page - this is only about running a
single
test which falls on its own - I feel that flaky tests are an
entirely
different topic.

*
*

cheers,

Zoltan

**
*








--
Sahil Takiar
Software Engineer
takiar.sa...@gmail.com | (510) 673-0309




--
Sahil Takiar
Software Engineer

Re: reverting test-breaking changes

2018-03-05 Thread Sergey Shelukhin
On a semi-related note, I noticed recently that negative tests seem to OOM
in setup from time to time.
Can we increase the amount of memory for the tests a little bit, and/or
maybe add a heap dump on OOM for them, saved to the test logs directory, so we
could investigate?
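
One way to get that dump without touching much else - sketched here under the
assumption that extra flags can be passed to the forked Surefire JVM via
argLine, and that ./target/tmp is a directory the test-log collection picks up:

  # Hypothetical extra JVM flags for the test fork: write a heap dump on OOM
  # into a directory that is (assumed to be) archived with the test logs.
  mvn test -Dtest=TestNegativeCliDriver \
    -DargLine="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./target/tmp"

The resulting .hprof file could then be opened in a heap analyzer to see what
is actually retained during setup.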

On 18/3/5, 11:07, "Vineet Garg"  wrote:

>+1 for nightly build. We could generate reports to identify both frequent
>and sporadic test failures plus other interesting bits like average build
>time, yetus failures etc. It’ll also help narrow down the culprit
>commit(s) range to one day.
>If you guys decide to go ahead with this I would like to help.
>
>Vineet
>
>> On Mar 5, 2018, at 8:50 AM, Sahil Takiar  wrote:
>> 
>> Wow that HBase UI looks super useful. +1 to having something like that.
>> 
>> If not, +1 to having a proper nightly build, it would help devs identify
>> which commits break which tests. I find using git-bisect can take a long
>> time to run, and can be difficult to use (e.g. finding a known good
>>commit
>> isn't always easy).
>> 
>> On Mon, Mar 5, 2018 at 9:03 AM, Peter Vary  wrote:
>> 
>>> Without a nightly build and with this many flaky tests it is very hard
>>>to
>>> identify the braking commits. We can use something like bisect and
>>>multiple
>>> test runs.
>>> 
>>> There is a more elegant way to do this with nightly test runs:
>>> https://issues.apache.org/jira/browse/HBASE-15917 <
>>> https://issues.apache.org/jira/browse/HBASE-15917>
>>> https://builds.apache.org/job/HBASE-Find-Flaky-Tests/
>>> lastSuccessfulBuild/artifact/dashboard.html >> job/HBASE-Find-Flaky-Tests/lastSuccessfulBuild/artifact/dashboard.html>
>>> 
>>> This also helps to identify the flaky tests, and creates a continuos,
>>> updated list of them.
>>> 
 On Feb 23, 2018, at 6:55 PM, Sahil Takiar 
>>> wrote:
 
 +1
 
 Does anyone have suggestions about how to efficiently identify which
>>> commit
 is breaking a test? Is it just git-bisect or is there an easier way?
Hive
 QA isn't always that helpful, it will say a test is failing for the
past
 "x" builds, but that doesn't help much since Hive QA isn't a nightly
>>> build.
 
 On Thu, Feb 22, 2018 at 10:31 AM, Vihang Karajgaonkar <
>>> vih...@cloudera.com>
 wrote:
 
> +1
> Commenting on JIRA and giving a 24hr heads-up (excluding weekends)
>>> would be
> good.
> 
> On Thu, Feb 22, 2018 at 10:19 AM, Alan Gates 
>>> wrote:
> 
>> +1.
>> 
>> Alan.
>> 
>> On Thu, Feb 22, 2018 at 8:25 AM, Thejas Nair 
>> wrote:
>> 
>>> +1
>>> I agree, this makes sense. The number of failures keeps increasing.
>>> A 24 hour heads up in either case before revert would be good.
>>> 
>>> 
>>> On Thu, Feb 22, 2018 at 2:45 AM, Peter Vary 
> wrote:
>>> 
 I agree with Zoltan. The continuously braking tests make it very
hard
>> to
 spot real issues.
 Any thoughts on doing it automatically?
 
> On Feb 22, 2018, at 10:47 AM, Zoltan Haindrich 
> wrote:
> 
> *
> 
> Hello,
> 
> *
> *
> 
> **
> 
> In the last couple weeks the number of broken tests have started
>to
>> go
 up...and even tho I run bisect/etc from time to time ; sometimes
> people
 don’t react to my comments/tickets/etc.
> 
> Because keeping this many failing tests makes it easier for a new
> one
>>> to
 slip in...I think reverting the patch introducing the test
failures
>> would
 also help in some case.
> 
> I think it would help a lot to prevent further test breaks to
> revert
>>> the
 patch if any of the following conditions is met:
> 
> *
> *
> 
> C1) if the notification/comment about the fact that the patch
> indeed
 broken a test somehow have been unanswered for at least 24 hours.
> 
> C2) if the patch is in for 7 days; but the test failure is still
> not
 addressed (note that in this case there might be a conversation
about
 fixing it...but in this case ; to enable other people to work in a
>>> cleaner
 environment is more important than a single patch - and if it
can't
> be
 fixed in 7 days...well it might not get fixed in a month).
> 
> *
> *
> 
> I would like to also note that I've seen a few tickets which have
>> been
 picked up by people who were not involved in creating the original
>>> change -
 and although the intention was good, they might miss the context
of
> the
 

Re: reverting test-breaking changes

2018-03-05 Thread Vineet Garg
+1 for nightly build. We could generate reports to identify both frequent and 
sporadic test failures plus other interesting bits like average build time, 
yetus failures etc. It’ll also help narrow down the culprit commit(s) range to 
one day.
If you guys decide to go ahead with this I would like to help.

Vineet

> On Mar 5, 2018, at 8:50 AM, Sahil Takiar  wrote:
> 
> Wow that HBase UI looks super useful. +1 to having something like that.
> 
> If not, +1 to having a proper nightly build, it would help devs identify
> which commits break which tests. I find using git-bisect can take a long
> time to run, and can be difficult to use (e.g. finding a known good commit
> isn't always easy).
> 
> On Mon, Mar 5, 2018 at 9:03 AM, Peter Vary  wrote:
> 
>> Without a nightly build and with this many flaky tests it is very hard to
>> identify the braking commits. We can use something like bisect and multiple
>> test runs.
>> 
>> There is a more elegant way to do this with nightly test runs:
>> https://issues.apache.org/jira/browse/HBASE-15917 <
>> https://issues.apache.org/jira/browse/HBASE-15917>
>> https://builds.apache.org/job/HBASE-Find-Flaky-Tests/
>> lastSuccessfulBuild/artifact/dashboard.html > job/HBASE-Find-Flaky-Tests/lastSuccessfulBuild/artifact/dashboard.html>
>> 
>> This also helps to identify the flaky tests, and creates a continuos,
>> updated list of them.
>> 
>>> On Feb 23, 2018, at 6:55 PM, Sahil Takiar 
>> wrote:
>>> 
>>> +1
>>> 
>>> Does anyone have suggestions about how to efficiently identify which
>> commit
>>> is breaking a test? Is it just git-bisect or is there an easier way? Hive
>>> QA isn't always that helpful, it will say a test is failing for the past
>>> "x" builds, but that doesn't help much since Hive QA isn't a nightly
>> build.
>>> 
>>> On Thu, Feb 22, 2018 at 10:31 AM, Vihang Karajgaonkar <
>> vih...@cloudera.com>
>>> wrote:
>>> 
 +1
 Commenting on JIRA and giving a 24hr heads-up (excluding weekends)
>> would be
 good.
 
 On Thu, Feb 22, 2018 at 10:19 AM, Alan Gates 
>> wrote:
 
> +1.
> 
> Alan.
> 
> On Thu, Feb 22, 2018 at 8:25 AM, Thejas Nair 
> wrote:
> 
>> +1
>> I agree, this makes sense. The number of failures keeps increasing.
>> A 24 hour heads up in either case before revert would be good.
>> 
>> 
>> On Thu, Feb 22, 2018 at 2:45 AM, Peter Vary 
 wrote:
>> 
>>> I agree with Zoltan. The continuously braking tests make it very hard
> to
>>> spot real issues.
>>> Any thoughts on doing it automatically?
>>> 
 On Feb 22, 2018, at 10:47 AM, Zoltan Haindrich 
 wrote:
 
 *
 
 Hello,
 
 *
 *
 
 **
 
 In the last couple weeks the number of broken tests have started to
> go
>>> up...and even tho I run bisect/etc from time to time ; sometimes
 people
>>> don’t react to my comments/tickets/etc.
 
 Because keeping this many failing tests makes it easier for a new
 one
>> to
>>> slip in...I think reverting the patch introducing the test failures
> would
>>> also help in some case.
 
 I think it would help a lot to prevent further test breaks to
 revert
>> the
>>> patch if any of the following conditions is met:
 
 *
 *
 
 C1) if the notification/comment about the fact that the patch
 indeed
>>> broken a test somehow have been unanswered for at least 24 hours.
 
 C2) if the patch is in for 7 days; but the test failure is still
 not
>>> addressed (note that in this case there might be a conversation about
>>> fixing it...but in this case ; to enable other people to work in a
>> cleaner
>>> environment is more important than a single patch - and if it can't
 be
>>> fixed in 7 days...well it might not get fixed in a month).
 
 *
 *
 
 I would like to also note that I've seen a few tickets which have
> been
>>> picked up by people who were not involved in creating the original
>> change -
>>> and although the intention was good, they might miss the context of
 the
>>> original patch and may "fix" the tests in the wrong way: accept a
 q.out
>>> which is inappropriate or ignore the test...
 
 *
 *
 
 would it be ok to implement this from now on? because it makes my
>>> efforts practically useless if people are not reacting…
 
 *
 *
 
 note: just to be on the same page - this is only about running a
> single
>>> test which falls on its own - I feel that flaky tests are an entirely
>>> 

Re: reverting test-breaking changes

2018-03-05 Thread Sahil Takiar
Wow that HBase UI looks super useful. +1 to having something like that.

If not, +1 to having a proper nightly build; it would help devs identify
which commits break which tests. I find using git-bisect can take a long
time to run, and it can be difficult to use (e.g. finding a known good commit
isn't always easy).
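
For what it's worth, once a known good commit has been found, the search can at
least run unattended - a rough sketch, with the test name and module path as
placeholders:

  # Let git drive the search; a non-zero exit from the command marks a commit bad.
  git bisect start
  git bisect bad HEAD
  git bisect good <last-known-good-commit>   # finding this is the hard part
  git bisect run mvn -q test -Dtest=TestNegativeCliDriver -pl itests/qtest
  git bisect reset

The slow part (a full build and test run per step) is unavoidable, but at least
it doesn't need babysitting.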

On Mon, Mar 5, 2018 at 9:03 AM, Peter Vary  wrote:

> Without a nightly build and with this many flaky tests it is very hard to
> identify the braking commits. We can use something like bisect and multiple
> test runs.
>
> There is a more elegant way to do this with nightly test runs:
> https://issues.apache.org/jira/browse/HBASE-15917 <
> https://issues.apache.org/jira/browse/HBASE-15917>
> https://builds.apache.org/job/HBASE-Find-Flaky-Tests/
> lastSuccessfulBuild/artifact/dashboard.html  job/HBASE-Find-Flaky-Tests/lastSuccessfulBuild/artifact/dashboard.html>
>
> This also helps to identify the flaky tests, and creates a continuos,
> updated list of them.
>
> > On Feb 23, 2018, at 6:55 PM, Sahil Takiar 
> wrote:
> >
> > +1
> >
> > Does anyone have suggestions about how to efficiently identify which
> commit
> > is breaking a test? Is it just git-bisect or is there an easier way? Hive
> > QA isn't always that helpful, it will say a test is failing for the past
> > "x" builds, but that doesn't help much since Hive QA isn't a nightly
> build.
> >
> > On Thu, Feb 22, 2018 at 10:31 AM, Vihang Karajgaonkar <
> vih...@cloudera.com>
> > wrote:
> >
> >> +1
> >> Commenting on JIRA and giving a 24hr heads-up (excluding weekends)
> would be
> >> good.
> >>
> >> On Thu, Feb 22, 2018 at 10:19 AM, Alan Gates 
> wrote:
> >>
> >>> +1.
> >>>
> >>> Alan.
> >>>
> >>> On Thu, Feb 22, 2018 at 8:25 AM, Thejas Nair 
> >>> wrote:
> >>>
>  +1
>  I agree, this makes sense. The number of failures keeps increasing.
>  A 24 hour heads up in either case before revert would be good.
> 
> 
>  On Thu, Feb 22, 2018 at 2:45 AM, Peter Vary 
> >> wrote:
> 
> > I agree with Zoltan. The continuously braking tests make it very hard
> >>> to
> > spot real issues.
> > Any thoughts on doing it automatically?
> >
> >> On Feb 22, 2018, at 10:47 AM, Zoltan Haindrich 
> >> wrote:
> >>
> >> *
> >>
> >> Hello,
> >>
> >> *
> >> *
> >>
> >> **
> >>
> >> In the last couple weeks the number of broken tests have started to
> >>> go
> > up...and even tho I run bisect/etc from time to time ; sometimes
> >> people
> > don’t react to my comments/tickets/etc.
> >>
> >> Because keeping this many failing tests makes it easier for a new
> >> one
>  to
> > slip in...I think reverting the patch introducing the test failures
> >>> would
> > also help in some case.
> >>
> >> I think it would help a lot to prevent further test breaks to
> >> revert
>  the
> > patch if any of the following conditions is met:
> >>
> >> *
> >> *
> >>
> >> C1) if the notification/comment about the fact that the patch
> >> indeed
> > broken a test somehow have been unanswered for at least 24 hours.
> >>
> >> C2) if the patch is in for 7 days; but the test failure is still
> >> not
> > addressed (note that in this case there might be a conversation about
> > fixing it...but in this case ; to enable other people to work in a
>  cleaner
> > environment is more important than a single patch - and if it can't
> >> be
> > fixed in 7 days...well it might not get fixed in a month).
> >>
> >> *
> >> *
> >>
> >> I would like to also note that I've seen a few tickets which have
> >>> been
> > picked up by people who were not involved in creating the original
>  change -
> > and although the intention was good, they might miss the context of
> >> the
> > original patch and may "fix" the tests in the wrong way: accept a
> >> q.out
> > which is inappropriate or ignore the test...
> >>
> >> *
> >> *
> >>
> >> would it be ok to implement this from now on? because it makes my
> > efforts practically useless if people are not reacting…
> >>
> >> *
> >> *
> >>
> >> note: just to be on the same page - this is only about running a
> >>> single
> > test which falls on its own - I feel that flaky tests are an entirely
> > different topic.
> >>
> >> *
> >> *
> >>
> >> cheers,
> >>
> >> Zoltan
> >>
> >> **
> >> *
> >
> >
> 
> >>>
> >>
> >
> >
> >
> > --
> > Sahil Takiar
> > Software Engineer
> > takiar.sa...@gmail.com | (510) 673-0309
>
>


-- 
Sahil Takiar
Software Engineer
takiar.sa...@gmail.com | (510) 673-0309


Re: reverting test-breaking changes

2018-03-05 Thread Peter Vary
Without a nightly build and with this many flaky tests it is very hard to
identify the breaking commits. We can use something like bisect and multiple
test runs.

There is a more elegant way to do this with nightly test runs:
https://issues.apache.org/jira/browse/HBASE-15917
https://builds.apache.org/job/HBASE-Find-Flaky-Tests/lastSuccessfulBuild/artifact/dashboard.html

This also helps to identify the flaky tests, and creates a continuously
updated list of them.
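
Until something like that exists for Hive, even a crude aggregation over
archived runs would go a long way - for example, assuming each nightly run
keeps its Surefire XML reports under a per-run directory (the layout below is
an assumption):

  # Count how often each test class reports a failure across archived runs.
  grep -rl '<failure' nightly-runs/ --include='TEST-*.xml' \
    | sed 's#.*/TEST-##; s#\.xml$##' \
    | sort | uniq -c | sort -rn | head -20

A test that fails in only some runs with no related commit in between is a
flaky-test candidate; one that starts failing in every run after a given date
points at a breaking commit.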

> On Feb 23, 2018, at 6:55 PM, Sahil Takiar  wrote:
> 
> +1
> 
> Does anyone have suggestions about how to efficiently identify which commit
> is breaking a test? Is it just git-bisect or is there an easier way? Hive
> QA isn't always that helpful, it will say a test is failing for the past
> "x" builds, but that doesn't help much since Hive QA isn't a nightly build.
> 
> On Thu, Feb 22, 2018 at 10:31 AM, Vihang Karajgaonkar 
> wrote:
> 
>> +1
>> Commenting on JIRA and giving a 24hr heads-up (excluding weekends) would be
>> good.
>> 
>> On Thu, Feb 22, 2018 at 10:19 AM, Alan Gates  wrote:
>> 
>>> +1.
>>> 
>>> Alan.
>>> 
>>> On Thu, Feb 22, 2018 at 8:25 AM, Thejas Nair 
>>> wrote:
>>> 
 +1
 I agree, this makes sense. The number of failures keeps increasing.
 A 24 hour heads up in either case before revert would be good.
 
 
 On Thu, Feb 22, 2018 at 2:45 AM, Peter Vary 
>> wrote:
 
> I agree with Zoltan. The continuously braking tests make it very hard
>>> to
> spot real issues.
> Any thoughts on doing it automatically?
> 
>> On Feb 22, 2018, at 10:47 AM, Zoltan Haindrich 
>> wrote:
>> 
>> *
>> 
>> Hello,
>> 
>> *
>> *
>> 
>> **
>> 
>> In the last couple weeks the number of broken tests have started to
>>> go
> up...and even tho I run bisect/etc from time to time ; sometimes
>> people
> don’t react to my comments/tickets/etc.
>> 
>> Because keeping this many failing tests makes it easier for a new
>> one
 to
> slip in...I think reverting the patch introducing the test failures
>>> would
> also help in some case.
>> 
>> I think it would help a lot to prevent further test breaks to
>> revert
 the
> patch if any of the following conditions is met:
>> 
>> *
>> *
>> 
>> C1) if the notification/comment about the fact that the patch
>> indeed
> broken a test somehow have been unanswered for at least 24 hours.
>> 
>> C2) if the patch is in for 7 days; but the test failure is still
>> not
> addressed (note that in this case there might be a conversation about
> fixing it...but in this case ; to enable other people to work in a
 cleaner
> environment is more important than a single patch - and if it can't
>> be
> fixed in 7 days...well it might not get fixed in a month).
>> 
>> *
>> *
>> 
>> I would like to also note that I've seen a few tickets which have
>>> been
> picked up by people who were not involved in creating the original
 change -
> and although the intention was good, they might miss the context of
>> the
> original patch and may "fix" the tests in the wrong way: accept a
>> q.out
> which is inappropriate or ignore the test...
>> 
>> *
>> *
>> 
>> would it be ok to implement this from now on? because it makes my
> efforts practically useless if people are not reacting…
>> 
>> *
>> *
>> 
>> note: just to be on the same page - this is only about running a
>>> single
> test which falls on its own - I feel that flaky tests are an entirely
> different topic.
>> 
>> *
>> *
>> 
>> cheers,
>> 
>> Zoltan
>> 
>> **
>> *
> 
> 
 
>>> 
>> 
> 
> 
> 
> -- 
> Sahil Takiar
> Software Engineer
> takiar.sa...@gmail.com | (510) 673-0309



Re: reverting test-breaking changes

2018-02-23 Thread Sahil Takiar
+1

Does anyone have suggestions about how to efficiently identify which commit
is breaking a test? Is it just git-bisect, or is there an easier way? Hive
QA isn't always that helpful: it will say a test has been failing for the past
"x" builds, but that doesn't help much since Hive QA isn't a nightly build.

On Thu, Feb 22, 2018 at 10:31 AM, Vihang Karajgaonkar 
wrote:

> +1
> Commenting on JIRA and giving a 24hr heads-up (excluding weekends) would be
> good.
>
> On Thu, Feb 22, 2018 at 10:19 AM, Alan Gates  wrote:
>
> > +1.
> >
> > Alan.
> >
> > On Thu, Feb 22, 2018 at 8:25 AM, Thejas Nair 
> > wrote:
> >
> > > +1
> > > I agree, this makes sense. The number of failures keeps increasing.
> > > A 24 hour heads up in either case before revert would be good.
> > >
> > >
> > > On Thu, Feb 22, 2018 at 2:45 AM, Peter Vary 
> wrote:
> > >
> > > > I agree with Zoltan. The continuously braking tests make it very hard
> > to
> > > > spot real issues.
> > > > Any thoughts on doing it automatically?
> > > >
> > > > > On Feb 22, 2018, at 10:47 AM, Zoltan Haindrich 
> wrote:
> > > > >
> > > > > *
> > > > >
> > > > > Hello,
> > > > >
> > > > > *
> > > > > *
> > > > >
> > > > > **
> > > > >
> > > > > In the last couple weeks the number of broken tests have started to
> > go
> > > > up...and even tho I run bisect/etc from time to time ; sometimes
> people
> > > > don’t react to my comments/tickets/etc.
> > > > >
> > > > > Because keeping this many failing tests makes it easier for a new
> one
> > > to
> > > > slip in...I think reverting the patch introducing the test failures
> > would
> > > > also help in some case.
> > > > >
> > > > > I think it would help a lot to prevent further test breaks to
> revert
> > > the
> > > > patch if any of the following conditions is met:
> > > > >
> > > > > *
> > > > > *
> > > > >
> > > > > C1) if the notification/comment about the fact that the patch
> indeed
> > > > broken a test somehow have been unanswered for at least 24 hours.
> > > > >
> > > > > C2) if the patch is in for 7 days; but the test failure is still
> not
> > > > addressed (note that in this case there might be a conversation about
> > > > fixing it...but in this case ; to enable other people to work in a
> > > cleaner
> > > > environment is more important than a single patch - and if it can't
> be
> > > > fixed in 7 days...well it might not get fixed in a month).
> > > > >
> > > > > *
> > > > > *
> > > > >
> > > > > I would like to also note that I've seen a few tickets which have
> > been
> > > > picked up by people who were not involved in creating the original
> > > change -
> > > > and although the intention was good, they might miss the context of
> the
> > > > original patch and may "fix" the tests in the wrong way: accept a
> q.out
> > > > which is inappropriate or ignore the test...
> > > > >
> > > > > *
> > > > > *
> > > > >
> > > > > would it be ok to implement this from now on? because it makes my
> > > > efforts practically useless if people are not reacting…
> > > > >
> > > > > *
> > > > > *
> > > > >
> > > > > note: just to be on the same page - this is only about running a
> > single
> > > > test which falls on its own - I feel that flaky tests are an entirely
> > > > different topic.
> > > > >
> > > > > *
> > > > > *
> > > > >
> > > > > cheers,
> > > > >
> > > > > Zoltan
> > > > >
> > > > > **
> > > > > *
> > > >
> > > >
> > >
> >
>



-- 
Sahil Takiar
Software Engineer
takiar.sa...@gmail.com | (510) 673-0309


Re: reverting test-breaking changes

2018-02-22 Thread Vihang Karajgaonkar
+1
Commenting on JIRA and giving a 24hr heads-up (excluding weekends) would be
good.

On Thu, Feb 22, 2018 at 10:19 AM, Alan Gates  wrote:

> +1.
>
> Alan.
>
> On Thu, Feb 22, 2018 at 8:25 AM, Thejas Nair 
> wrote:
>
> > +1
> > I agree, this makes sense. The number of failures keeps increasing.
> > A 24 hour heads up in either case before revert would be good.
> >
> >
> > On Thu, Feb 22, 2018 at 2:45 AM, Peter Vary  wrote:
> >
> > > I agree with Zoltan. The continuously braking tests make it very hard
> to
> > > spot real issues.
> > > Any thoughts on doing it automatically?
> > >
> > > > On Feb 22, 2018, at 10:47 AM, Zoltan Haindrich  wrote:
> > > >
> > > > *
> > > >
> > > > Hello,
> > > >
> > > > *
> > > > *
> > > >
> > > > **
> > > >
> > > > In the last couple weeks the number of broken tests have started to
> go
> > > up...and even tho I run bisect/etc from time to time ; sometimes people
> > > don’t react to my comments/tickets/etc.
> > > >
> > > > Because keeping this many failing tests makes it easier for a new one
> > to
> > > slip in...I think reverting the patch introducing the test failures
> would
> > > also help in some case.
> > > >
> > > > I think it would help a lot to prevent further test breaks to revert
> > the
> > > patch if any of the following conditions is met:
> > > >
> > > > *
> > > > *
> > > >
> > > > C1) if the notification/comment about the fact that the patch indeed
> > > broken a test somehow have been unanswered for at least 24 hours.
> > > >
> > > > C2) if the patch is in for 7 days; but the test failure is still not
> > > addressed (note that in this case there might be a conversation about
> > > fixing it...but in this case ; to enable other people to work in a
> > cleaner
> > > environment is more important than a single patch - and if it can't be
> > > fixed in 7 days...well it might not get fixed in a month).
> > > >
> > > > *
> > > > *
> > > >
> > > > I would like to also note that I've seen a few tickets which have
> been
> > > picked up by people who were not involved in creating the original
> > change -
> > > and although the intention was good, they might miss the context of the
> > > original patch and may "fix" the tests in the wrong way: accept a q.out
> > > which is inappropriate or ignore the test...
> > > >
> > > > *
> > > > *
> > > >
> > > > would it be ok to implement this from now on? because it makes my
> > > efforts practically useless if people are not reacting…
> > > >
> > > > *
> > > > *
> > > >
> > > > note: just to be on the same page - this is only about running a
> single
> > > test which falls on its own - I feel that flaky tests are an entirely
> > > different topic.
> > > >
> > > > *
> > > > *
> > > >
> > > > cheers,
> > > >
> > > > Zoltan
> > > >
> > > > **
> > > > *
> > >
> > >
> >
>


Re: reverting test-breaking changes

2018-02-22 Thread Alan Gates
+1.

Alan.

On Thu, Feb 22, 2018 at 8:25 AM, Thejas Nair  wrote:

> +1
> I agree, this makes sense. The number of failures keeps increasing.
> A 24 hour heads up in either case before revert would be good.
>
>
> On Thu, Feb 22, 2018 at 2:45 AM, Peter Vary  wrote:
>
> > I agree with Zoltan. The continuously braking tests make it very hard to
> > spot real issues.
> > Any thoughts on doing it automatically?
> >
> > > On Feb 22, 2018, at 10:47 AM, Zoltan Haindrich  wrote:
> > >
> > > *
> > >
> > > Hello,
> > >
> > > *
> > > *
> > >
> > > **
> > >
> > > In the last couple weeks the number of broken tests have started to go
> > up...and even tho I run bisect/etc from time to time ; sometimes people
> > don’t react to my comments/tickets/etc.
> > >
> > > Because keeping this many failing tests makes it easier for a new one
> to
> > slip in...I think reverting the patch introducing the test failures would
> > also help in some case.
> > >
> > > I think it would help a lot to prevent further test breaks to revert
> the
> > patch if any of the following conditions is met:
> > >
> > > *
> > > *
> > >
> > > C1) if the notification/comment about the fact that the patch indeed
> > broken a test somehow have been unanswered for at least 24 hours.
> > >
> > > C2) if the patch is in for 7 days; but the test failure is still not
> > addressed (note that in this case there might be a conversation about
> > fixing it...but in this case ; to enable other people to work in a
> cleaner
> > environment is more important than a single patch - and if it can't be
> > fixed in 7 days...well it might not get fixed in a month).
> > >
> > > *
> > > *
> > >
> > > I would like to also note that I've seen a few tickets which have been
> > picked up by people who were not involved in creating the original
> change -
> > and although the intention was good, they might miss the context of the
> > original patch and may "fix" the tests in the wrong way: accept a q.out
> > which is inappropriate or ignore the test...
> > >
> > > *
> > > *
> > >
> > > would it be ok to implement this from now on? because it makes my
> > efforts practically useless if people are not reacting…
> > >
> > > *
> > > *
> > >
> > > note: just to be on the same page - this is only about running a single
> > test which falls on its own - I feel that flaky tests are an entirely
> > different topic.
> > >
> > > *
> > > *
> > >
> > > cheers,
> > >
> > > Zoltan
> > >
> > > **
> > > *
> >
> >
>


Re: reverting test-breaking changes

2018-02-22 Thread Thejas Nair
+1
I agree, this makes sense. The number of failures keeps increasing.
A 24 hour heads up in either case before revert would be good.


On Thu, Feb 22, 2018 at 2:45 AM, Peter Vary  wrote:

> I agree with Zoltan. The continuously braking tests make it very hard to
> spot real issues.
> Any thoughts on doing it automatically?
>
> > On Feb 22, 2018, at 10:47 AM, Zoltan Haindrich  wrote:
> >
> > *
> >
> > Hello,
> >
> > *
> > *
> >
> > **
> >
> > In the last couple weeks the number of broken tests have started to go
> up...and even tho I run bisect/etc from time to time ; sometimes people
> don’t react to my comments/tickets/etc.
> >
> > Because keeping this many failing tests makes it easier for a new one to
> slip in...I think reverting the patch introducing the test failures would
> also help in some case.
> >
> > I think it would help a lot to prevent further test breaks to revert the
> patch if any of the following conditions is met:
> >
> > *
> > *
> >
> > C1) if the notification/comment about the fact that the patch indeed
> broken a test somehow have been unanswered for at least 24 hours.
> >
> > C2) if the patch is in for 7 days; but the test failure is still not
> addressed (note that in this case there might be a conversation about
> fixing it...but in this case ; to enable other people to work in a cleaner
> environment is more important than a single patch - and if it can't be
> fixed in 7 days...well it might not get fixed in a month).
> >
> > *
> > *
> >
> > I would like to also note that I've seen a few tickets which have been
> picked up by people who were not involved in creating the original change -
> and although the intention was good, they might miss the context of the
> original patch and may "fix" the tests in the wrong way: accept a q.out
> which is inappropriate or ignore the test...
> >
> > *
> > *
> >
> > would it be ok to implement this from now on? because it makes my
> efforts practically useless if people are not reacting…
> >
> > *
> > *
> >
> > note: just to be on the same page - this is only about running a single
> test which falls on its own - I feel that flaky tests are an entirely
> different topic.
> >
> > *
> > *
> >
> > cheers,
> >
> > Zoltan
> >
> > **
> > *
>
>


Re: reverting test-breaking changes

2018-02-22 Thread Peter Vary
I agree with Zoltan. The continuously breaking tests make it very hard to spot
real issues.
Any thoughts on doing it automatically?

> On Feb 22, 2018, at 10:47 AM, Zoltan Haindrich  wrote:
> 
> Hello,
>
> In the last couple of weeks the number of broken tests has started to go up...
> and even though I run bisect etc. from time to time, sometimes people don't
> react to my comments/tickets.
>
> Because keeping this many failing tests around makes it easier for a new one to
> slip in, I think reverting the patch that introduced the test failures would
> also help in some cases.
>
> I think it would help a lot in preventing further test breaks to revert the
> patch if either of the following conditions is met:
>
> C1) the notification/comment pointing out that the patch indeed broke a test
> has gone unanswered for at least 24 hours.
>
> C2) the patch has been in for 7 days but the test failure is still not
> addressed (note that in this case there might be a conversation about fixing
> it, but enabling other people to work in a cleaner environment is more
> important than a single patch - and if it can't be fixed in 7 days, well, it
> might not get fixed in a month).
>
> I would also like to note that I've seen a few tickets picked up by people who
> were not involved in creating the original change - and although the intention
> was good, they might miss the context of the original patch and may "fix" the
> tests in the wrong way: accept a q.out which is inappropriate or ignore the
> test...
>
> Would it be OK to implement this from now on? Because it makes my efforts
> practically useless if people are not reacting…
>
> Note: just to be on the same page - this is only about a single test that
> fails on its own - I feel that flaky tests are an entirely different topic.
>
> cheers,
>
> Zoltan