fedimser commented on PR #53157: URL: https://github.com/apache/spark/pull/53157#issuecomment-3598572185
## Bug: This PR breaks `run-tests` with "No module named pyspark.__main__"

This commit causes all PySpark tests to fail immediately with:

```
/home/dmytro.fedoriaka/venv2/bin/python: No module named pyspark.__main__; 'pyspark' is a package and cannot be directly executed
```

### Root Cause

The new `module_exists()` function validates test module names using `importlib.util.find_spec()`, but this happens **before** PySpark is added to `PYTHONPATH`. The validation fails because PySpark modules aren't importable yet at argument parsing time.

### Reproduce

```bash
python/run-tests --python-executables python3 --testnames pyspark.sql.tests.test_arrow
```

### Verification

Git bisect confirms this commit (7191a14d25e) introduced the regression:

- Commit before (c6e8dbe2319): ✅ Tests pass
- This commit (7191a14d25e): ❌ Tests fail immediately
- Current master: ❌ Still broken

### Fix Needed

The `module_exists()` validation needs to either:

1. Defer validation until after PySpark is in `PYTHONPATH`, or
2. Skip validation for `pyspark.*` modules, or
3. Temporarily add PySpark to `sys.path` before validation
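
### Sketch of option 3

A minimal sketch of the third option, not the actual patch. It assumes `module_exists()` can accept the directories to search. The `search_paths` parameter and the `extra_sys_path` helper are hypothetical names introduced here for illustration, and the real function in `run-tests` may have a different signature.

```python
import contextlib
import importlib
import importlib.util
import sys


@contextlib.contextmanager
def extra_sys_path(*paths):
    """Temporarily prepend paths to sys.path and restore it afterwards."""
    saved = list(sys.path)
    sys.path[:0] = [str(p) for p in paths]
    try:
        yield
    finally:
        sys.path[:] = saved


def module_exists(name, search_paths=()):
    """Return True if `name` can be located, searching `search_paths` first.

    `search_paths` is a hypothetical parameter: the caller would pass the
    directory that contains the `pyspark` package.
    """
    with extra_sys_path(*search_paths):
        # Drop stale finder caches so newly added directories are scanned.
        importlib.invalidate_caches()
        try:
            return importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # For dotted names find_spec imports the parent package first;
            # a missing parent surfaces as ModuleNotFoundError.
            return False
```

One caveat with this approach: for dotted names, `importlib.util.find_spec()` imports the parent packages, so validating `pyspark.sql.tests.test_arrow` would import `pyspark.sql.tests` in the `run-tests` process itself. If that side effect is unwanted, deferring validation (option 1) may be the safer choice.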
