Re: Some tests started hanging recently
Hello Zoltan,

Thank you for this.

So, I ran a few tests under itests to replicate the issue. There are 10 files inside itests which use Tez in one form or the other. I ran the tests for all of them. All of them finished, and the longest one ran for 8 minutes.

Below are the timings for all:

| Testname                          | Duration |
| TestMmCompactorOnTez              | 3.41 min |
| TestAcidOnTez                     |          |
| TestCrudCompactorOnTez            | 4.00 min |
| TestBeeLineWithArgs               | 2.59 min |
| TestCopyUtils                     | 49 s     |
| TestTriggersTezSessionPoolManager | 22 s     |
| TestTriggersWorkloadManager       | 21 s     |
| TestTriggersNoTezSessionPool      | 1.50 min |
| TestTezPerfConstraintsCliDriver   | 8 min    |

I am not sure what is causing the issue seen on the build server.

Regards,
Jagat Singh

On Fri, 19 Jun 2020 at 00:30, Zoltan Haindrich wrote:
> Hey all,
>
> Since yesterday some tests started to hang - most frequently
> TestCrudCompactorOnTez or TestMmCompactorOnTez but I've seen a replication
> test as well - so I don't think it's limited to those 2 tests.
>
> I was not able to figure out what has caused this - my current guess is
> that somehow the tez 0.9.2 upgrade has caused it.
> To validate this guess I've started the flaky checker with and without
> that patch from the current state...
>
> I've collected some jstacks from the containers running for more than 20
> hours
>
> https://termbin.com/z1eoc
> https://termbin.com/2m0j
> https://termbin.com/027t
> https://termbin.com/1dbe
>
> cheers,
> Zoltan
Re: Some tests started hanging recently
Hey Jagat!

On 6/19/20 3:19 AM, Jagat Singh wrote:
> I was not expecting to hear this for my first PR :(

No worries - this could happen; I think we've bumped into some nasty concurrency bug...

> I will also try to re-run the tests locally on my system and report back to you.

Thank you. It took a while, but the flaky checker got stuck after running one of the tests 8 times - while the other run (with the tez update patch reverted) finished successfully after running it 100 times.

http://130.211.9.232/job/hive-flaky-check/51/
http://130.211.9.232/job/hive-flaky-check/52/

I'm going to revert tez 0.9.2 for now.

cheers,
Zoltan

> Thanks,
> Jagat Singh
>
> On Fri, 19 Jun 2020 at 00:30, Zoltan Haindrich wrote:
>> Hey all,
>>
>> Since yesterday some tests started to hang - most frequently
>> TestCrudCompactorOnTez or TestMmCompactorOnTez but I've seen a replication
>> test as well - so I don't think it's limited to those 2 tests.
>>
>> I was not able to figure out what has caused this - my current guess is
>> that somehow the tez 0.9.2 upgrade has caused it.
>> To validate this guess I've started the flaky checker with and without
>> that patch from the current state...
>>
>> I've collected some jstacks from the containers running for more than 20
>> hours
>>
>> https://termbin.com/z1eoc
>> https://termbin.com/2m0j
>> https://termbin.com/027t
>> https://termbin.com/1dbe
>>
>> cheers,
>> Zoltan
Re: Some tests started hanging recently
Hello Zoltan,

I was not expecting to hear this for my first PR :(

I will also try to re-run the tests locally on my system and report back to you.

Thanks,
Jagat Singh

On Fri, 19 Jun 2020 at 00:30, Zoltan Haindrich wrote:
> Hey all,
>
> Since yesterday some tests started to hang - most frequently
> TestCrudCompactorOnTez or TestMmCompactorOnTez but I've seen a replication
> test as well - so I don't think it's limited to those 2 tests.
>
> I was not able to figure out what has caused this - my current guess is
> that somehow the tez 0.9.2 upgrade has caused it.
> To validate this guess I've started the flaky checker with and without
> that patch from the current state...
>
> I've collected some jstacks from the containers running for more than 20
> hours
>
> https://termbin.com/z1eoc
> https://termbin.com/2m0j
> https://termbin.com/027t
> https://termbin.com/1dbe
>
> cheers,
> Zoltan
Some tests started hanging recently
Hey all,

Since yesterday some tests started to hang - most frequently TestCrudCompactorOnTez or TestMmCompactorOnTez but I've seen a replication test as well - so I don't think it's limited to those 2 tests.

I was not able to figure out what has caused this - my current guess is that somehow the tez 0.9.2 upgrade has caused it.
To validate this guess I've started the flaky checker with and without that patch from the current state...

I've collected some jstacks from the containers running for more than 20 hours:

https://termbin.com/z1eoc
https://termbin.com/2m0j
https://termbin.com/027t
https://termbin.com/1dbe

cheers,
Zoltan