Yikun commented on a change in pull request #32867:
URL: https://github.com/apache/spark/pull/32867#discussion_r654522707
##########
File path: dev/sparktestsupport/modules.py
##########
@@ -58,14 +95,19 @@ def __init__(self, name, dependencies, source_file_regexes,
build_profile_flags=
is not explicitly changed.
:param should_run_r_tests: If true, changes in this module will
trigger all R tests.
:param should_run_build_tests: If true, changes in this module will
trigger build tests.
+ :param python_test_paths: A set of Python test paths to be discovered.
+ :param python_excluded_tests: A set of Python tests to exclude.
"""
self.name = name
self.dependencies = dependencies
self.source_file_prefixes = source_file_regexes
self.sbt_test_goals = sbt_test_goals
self.build_profile_flags = build_profile_flags
self.environ = environ or {}
- self.python_test_goals = python_test_goals
+ discovered_goals = _discover_python_unittests(python_test_paths)
+ # Final goals = manually specified goals + discovered goals - excluded goals
+ all_goals = set(python_test_goals) | set(discovered_goals)
+ self.python_test_goals = sorted(list(set(all_goals) - set(python_excluded_tests)))
Review comment:
Yes, great suggestion, it makes the module implementation cleaner.
If we want to **only** discover the slow tests:
```
python_test_goals = [
    ...
] + _discover_python_unittests("pyspark/tests", discover_slow=True)
```
otherwise slow tests will be skipped.
I'll complete it in my next commit. : )
##########
File path: dev/sparktestsupport/modules.py
##########
@@ -488,12 +495,10 @@ def __hash__(self):
python_test_goals=[
# doctests
"pyspark.streaming.util",
Review comment:
Frankly, not yet, but I'd like to start doing some investigation. This unittest PR was started after I forgot, many times, to add new unittests to the module. ^_^
And doctest discovery is a bit special: a doctest module is not a subclass of a common base class (unlike a unittest), so there appear to be 2 potential solutions:
1. Flag based. Add a flag like `_doc_test = True` to every test we want to run.
2. Special-name based. Discover which modules contain a `_doctest` method (we would need to rename `_test` to `_doctest` to make the doctest entry point unique).
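Solution 1 (flag based) could look roughly like this. This is a sketch: the `_doc_test` marker attribute comes from the comment above, while the helper name and the sample modules are made up for illustration:

```python
# Sketch of flag-based doctest discovery: scan a list of modules for a
# _doc_test = True marker and collect the names of those that opt in.
import types


def discover_doctest_goals(modules):
    """Return sorted names of modules that set _doc_test = True."""
    return sorted(m.__name__ for m in modules if getattr(m, "_doc_test", False))


# Two fake modules standing in for real pyspark modules:
util = types.ModuleType("pyspark.streaming.util")
util._doc_test = True  # this module opts in to doctest collection
listener = types.ModuleType("pyspark.streaming.listener")  # no marker
```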
##########
File path: dev/sparktestsupport/modules.py
##########
@@ -488,12 +495,10 @@ def __hash__(self):
python_test_goals=[
# doctests
"pyspark.streaming.util",
Review comment:
BTW, do you think the footer approach (executing doctests or unittests in `__main__`, with the code repeated in many files) is good enough for the Python code? Have we considered re-architecting the testing framework (or is there already some work on that)?
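The "footer" pattern referred to here is the small `__main__` block each module ends with to run its own doctests. A minimal self-contained illustration (not actual PySpark code; the function and its docstring are made up):

```python
# The per-file "footer": a module-level _test() that runs this module's
# doctests, invoked from an if __name__ == "__main__" guard. This block
# gets repeated near-verbatim across many files, which is the duplication
# the comment asks about.
import doctest


def split_words(line):
    """Split a line into words.

    >>> split_words("a b c")
    ['a', 'b', 'c']
    """
    return line.split()


def _test():
    failures, _attempted = doctest.testmod(verbose=False)
    if failures:
        raise SystemExit(1)


if __name__ == "__main__":
    _test()
```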
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]