HyukjinKwon commented on a change in pull request #32867:
URL: https://github.com/apache/spark/pull/32867#discussion_r654349836
##########
File path: dev/sparktestsupport/modules.py
##########
@@ -19,10 +19,48 @@
import itertools
import os
import re
+import unittest
+
+from sparktestsupport import SPARK_HOME
all_modules = []
+def _discover_python_unittests(paths):
+ """
+ Discover the Python modules which contain unittests under the given paths.
+
+ Such as:
+ ['pyspark/tests'], it will return the set of module names under the path of pyspark/tests, like
+ {'pyspark.tests.test_appsubmit', 'pyspark.tests.test_broadcast', ...}
+
+ :param paths: paths of modules to be discovered.
+ :return: A set of complete test module names discovered under the paths
Review comment:
We should use the numpydoc docstring style here. Feel free to make another PR to
fix other docstrings later.
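For reference, a minimal sketch of what the docstring above could look like in numpydoc style (descriptions taken from the diff; adjust the wording as needed):
```python
def _discover_python_unittests(paths):
    """
    Discover the Python test modules under the given paths.

    Parameters
    ----------
    paths : list of str
        Paths of the modules to be discovered, e.g. ['pyspark/tests'].

    Returns
    -------
    set
        A set of complete test module names discovered under the paths, e.g.
        {'pyspark.tests.test_appsubmit', 'pyspark.tests.test_broadcast', ...}.
    """
```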
##########
File path: dev/sparktestsupport/modules.py
##########
@@ -35,10 +73,9 @@ class Module(object):
def __init__(self, name, dependencies, source_file_regexes, build_profile_flags=(),
environ=None, sbt_test_goals=(), python_test_goals=(),
excluded_python_implementations=(), test_tags=(), should_run_r_tests=False,
- should_run_build_tests=False):
+ should_run_build_tests=False, python_test_paths=(), python_excluded_tests=()):
"""
Define a new module.
-
Review comment:
I think we should keep this newline between description and parameter
descriptions.
##########
File path: dev/sparktestsupport/modules.py
##########
@@ -58,14 +95,19 @@ def __init__(self, name, dependencies, source_file_regexes, build_profile_flags=
is not explicitly changed.
:param should_run_r_tests: If true, changes in this module will trigger all R tests.
:param should_run_build_tests: If true, changes in this module will trigger build tests.
+ :param python_test_paths: A set of python test paths to be discovered
+ :param python_excluded_tests: A set of excluded Python tests
"""
self.name = name
self.dependencies = dependencies
self.source_file_prefixes = source_file_regexes
self.sbt_test_goals = sbt_test_goals
self.build_profile_flags = build_profile_flags
self.environ = environ or {}
- self.python_test_goals = python_test_goals
+ discovered_goals = _discover_python_unittests(python_test_paths)
+ # Final goals = Manually specified goals + Discovered goals - Excluded goals
+ all_goals = set(python_test_goals) | set(discovered_goals)
+ self.python_test_goals = sorted(list(set(all_goals) - set(python_excluded_tests)))
Review comment:
@Yikun what about adding a private variable to these slow test modules, and
detecting them by checking whether that variable is defined?
For example,
```diff
diff --git a/python/pyspark/pandas/tests/test_dataframe.py b/python/pyspark/pandas/tests/test_dataframe.py
index e54b7835c22..22f2b5da93e 100644
--- a/python/pyspark/pandas/tests/test_dataframe.py
+++ b/python/pyspark/pandas/tests/test_dataframe.py
@@ -50,6 +50,10 @@ from pyspark.testing.sqlutils import SQLTestUtils
from pyspark.pandas.utils import name_like_string
+# This is used in run-tests.py blah blah. see modules.py blah blah
+_slow_test = True
+
+
class DataFrameTest(PandasOnSparkTestCase, SQLTestUtils):
@property
def pdf(self):
```
and then we can filter like:
```python
is_slow_test = hasattr(module, "_slow_test")
```
In this way, we can remove the `pyspark_pandas_slow_unittests` definition here too.
Also, can we remove `python_test_paths` and `python_excluded_tests` here
too?
We could run `_discover_python_unittests` and directly assign to
`python_test_goals` I guess. For example,
```python
python_test_goals = [
...
] + _discover_python_unittests("pyspark/tests")
```
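To make the idea concrete, here is a rough, untested sketch of how `_discover_python_unittests` could combine `unittest.TestLoader.discover` with the hypothetical `_slow_test` marker suggested above (the `slow=` parameter and the marker are assumptions for illustration, and error handling for unimportable modules is omitted):
```python
import importlib
import unittest

from sparktestsupport import SPARK_HOME  # same import the diff above adds


def _discover_python_unittests(path, slow=False):
    """Return sorted test module names under python/<path>, split by the `_slow_test` marker."""
    loader = unittest.TestLoader()
    suite = loader.discover(
        start_dir="%s/python/%s" % (SPARK_HOME, path),
        pattern="test_*.py",
        top_level_dir="%s/python" % SPARK_HOME,
    )
    goals = set()
    for module_suite in suite:            # one suite per discovered test file
        for class_suite in module_suite:  # one suite per TestCase class
            for test in class_suite:      # individual test methods
                goals.add(test.__class__.__module__)
    # Slow test files would define the (hypothetical) module-level `_slow_test` marker.
    return sorted(
        name for name in goals
        if hasattr(importlib.import_module(name), "_slow_test") == slow
    )
```
With something like this, the slow pandas-on-Spark module could pass `slow=True` while the regular modules use the default, instead of keeping separate `python_test_paths` / `python_excluded_tests` arguments.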
##########
File path: dev/sparktestsupport/modules.py
##########
@@ -488,12 +495,10 @@ def __hash__(self):
python_test_goals=[
# doctests
"pyspark.streaming.util",
Review comment:
BTW, do you plan to do it for doctests later? Doing it in a separate PR
is fine. I'm just curious about your plan.
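If you do extend this to doctests later, one possible (purely illustrative) approach is to keep only the modules whose docstrings actually contain doctest examples, e.g. via `doctest.DocTestFinder` and `pkgutil.walk_packages`; the helper below is just a sketch and assumes every walked module is importable:
```python
import doctest
import importlib
import pkgutil

import pyspark.streaming


def _discover_doctest_goals(package):
    """Return module names in `package` whose docstrings contain doctest examples."""
    finder = doctest.DocTestFinder(exclude_empty=True)
    goals = []
    for info in pkgutil.walk_packages(package.__path__, package.__name__ + "."):
        module = importlib.import_module(info.name)
        if any(test.examples for test in finder.find(module)):
            goals.append(info.name)
    return sorted(goals)


# e.g. _discover_doctest_goals(pyspark.streaming) would include "pyspark.streaming.util"
# only if that module actually defines doctest examples.
```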