Re: Some tests started hanging recently

2020-06-19 Thread Jagat Singh
Hello Zoltan,

Thank you for this.

So, I ran a few tests under itests to try to replicate the issue.

There are 10 files inside itests which use Tez in one form or the other. I
ran the tests for all of them.

All of them finished; the longest one ran for 8 minutes.

Below are the timings:

| Test name                         | Duration |
| TestMmCompactorOnTez              | 3.41 min |
| TestAcidOnTez                     |          |
| TestCrudCompactorOnTez            | 4.00 min |
| TestBeeLineWithArgs               | 2.59 min |
| TestCopyUtils                     | 49 s     |
| TestTriggersTezSessionPoolManager | 22 s     |
| TestTriggersWorkloadManager       | 21 s     |
| TestTriggersNoTezSessionPool      | 1.50 min |
| TestTezPerfConstraintsCliDriver   | 8 min    |

I am not sure what is causing the issue seen on the build server.

Regards,

Jagat Singh



Re: Some tests started hanging recently

2020-06-19 Thread Zoltan Haindrich

Hey Jagat!

On 6/19/20 3:19 AM, Jagat Singh wrote:
> I was not expecting to hear this for my first PR :(

No worries - this could happen; I think we've bumped into some nasty
concurrency bug...

> I will also try to re-run the tests locally on my system and report back to
> you.


Thank you. It took a while, but the flaky checker got stuck after 8 runs of one of the tests, while the other job (with the Tez update patch reverted) finished
successfully after running it 100 times.

http://130.211.9.232/job/hive-flaky-check/51/
http://130.211.9.232/job/hive-flaky-check/52/

I'm going to revert the Tez 0.9.2 upgrade for now.

cheers,
Zoltan









Re: Some tests started hanging recently

2020-06-18 Thread Jagat Singh
Hello Zoltan,

I was not expecting to hear this for my first PR :(

I will also try to re-run the tests locally on my system and report back to
you.

Thanks,

Jagat Singh



Some tests started hanging recently

2020-06-18 Thread Zoltan Haindrich

Hey all,

Since yesterday, some tests have started to hang, most frequently TestCrudCompactorOnTez or TestMmCompactorOnTez, but I've seen a replication test hang as well, so I don't think it's
limited to those 2 tests.


I was not able to figure out what has caused this; my current guess is that
the Tez 0.9.2 upgrade has somehow caused it.
To validate this guess, I've started the flaky checker with and without that
patch, from the current state...
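
The with/without comparison boils down to re-running a test many times and treating a run that never finishes as a failure. A minimal sketch of that idea, assuming a per-run timeout stands in for "hung"; this is not the actual flaky-checker job, and the function name and default values are made up for illustration:

```python
import subprocess
import sys


def run_repeatedly(cmd, runs=100, timeout_s=1800):
    """Run cmd up to `runs` times; stop at the first failure or hang.

    Returns (completed_runs, outcome), where outcome is "ok", "failed" or "hung".
    """
    for completed in range(runs):
        try:
            # subprocess.run kills the direct child process when the timeout
            # expires, then raises TimeoutExpired. Grandchild processes
            # (for example forked test JVMs) may need separate cleanup.
            result = subprocess.run(cmd, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            return completed, "hung"
        if result.returncode != 0:
            return completed, "failed"
    return runs, "ok"


if __name__ == "__main__":
    # Harmless demo command; replace it with the real test invocation.
    print(run_repeatedly([sys.executable, "-c", "pass"], runs=3, timeout_s=30))
```

For a real check, the command could be something like `["mvn", "test", "-Dtest=TestCrudCompactorOnTez"]` run from the relevant itests module; the exact module and profiles are assumptions that depend on the local checkout.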

I've collected some jstacks from the containers that have been running for more than 20 hours:

https://termbin.com/z1eoc
https://termbin.com/2m0j
https://termbin.com/027t
https://termbin.com/1dbe
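
For anyone who wants a quick first look at dumps like these, here is a rough sketch that counts thread states in a jstack dump saved as text and lists the BLOCKED threads. It assumes the usual HotSpot layout (a quoted thread name on the header line, followed by a `java.lang.Thread.State:` line); the function name is hypothetical, and it says nothing about what the dumps above actually contain:

```python
import re
from collections import Counter

# Thread header lines start with the quoted thread name.
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
# The state line follows the header, indented.
THREAD_STATE = re.compile(r'^\s+java\.lang\.Thread\.State: (?P<state>[A-Z_]+)')


def summarize_jstack(dump_text):
    """Return (state counts, names of BLOCKED threads, deadlock reported?)."""
    states = Counter()
    blocked = []
    current = None
    for line in dump_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = header.group("name")
            continue
        state = THREAD_STATE.match(line)
        if state and current is not None:
            states[state.group("state")] += 1
            if state.group("state") == "BLOCKED":
                blocked.append(current)
            current = None  # count each thread only once
    # jstack prints a "Java-level deadlock" section when it detects one.
    deadlock = "Java-level deadlock" in dump_text
    return states, blocked, deadlock
```

Many BLOCKED threads, or a reported deadlock, would be a hint to look at the corresponding stack frames first; a hang can also show up as threads in WAITING, so the counts are only a starting point.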

cheers,
Zoltan