Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13290 )

Change subject: Allow running backend tests sharded and in parallel
......................................................................


Patch Set 2:

(2 comments)

I think this is a good approach.

In the unified binary world, you end up with two sharding approaches. One is 
that you tell CMake the number of shards and it splits up the tests 
appropriately. The other is that you do multiple ADD_BE_TEST calls with 
different gtest_filter values and set them to RUN_SERIAL false. The CMake route 
is more automatic and probably produces more uniform shards. The other route 
produces a more coherent set of tests in a single shard (and produces an 
executable script that runs a single shard), but requires more manual effort.

CMake has some functionality to auto-detect google tests by running -list_tests 
on the gtest executable. I'm not sure how this maps to parallelism. Something 
to explore in the future now that the toolchain has a modern CMake.

http://gerrit.cloudera.org:8080/#/c/13290/2/be/CMakeLists.txt
File be/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/13290/2/be/CMakeLists.txt@674
PS2, Line 674: "GTEST_OUTPUT=xml:$ENV{IMPALA_BE_TEST_LOGS_DIR}/${TARGET}
> Ack
One thing that I found with the unified binary is that gtest_filter causes 
gtest to output JUnitXML that contains the tests that didn't run with 
status="notrun". I suspect sharding does something similar. If it does, then we 
might need to pass the JUnitXML through bin/junitxml_prune_notrun.py. Some 
consumers of JUnitXML like Jenkins don't understand status="notrun" and this 
can throw off the list and count of tests run at the Jenkins level. A good test 
would be to run this on Jenkins and compare the number of tests seen through 
JUnitXML before and after (though I think we changed that slightly with the 
expr-test change).

One option is to modify bin/gen-backend-test-script.py to be able to generate 
scripts that represent the shards. It would take in the actual executable 
(rather than assuming the unified test executable) and then extra args to 
specify the shard number and total shards. Not sure if it is worth the 
complication, but it gets you the "notrun" fix for free. I'm not sure any 
developer would run one of the shard scripts directly rather than using 
gtest_filter or ctest.


http://gerrit.cloudera.org:8080/#/c/13290/2/be/CMakeLists.txt@675
PS2, Line 675: RUN_SERIAL
> right. That's a difference from the way that one of the old "shard and para
I think this is the right thing to do. We want parallelism to extend across 
test boundaries.



--
To view, visit http://gerrit.cloudera.org:8080/13290
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d44475e3f5a45c806f00d5003eeadf59683b009
Gerrit-Change-Number: 13290
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Fri, 17 May 2019 20:17:46 +0000
Gerrit-HasComments: Yes

Reply via email to