fedimser commented on PR #53157:
URL: https://github.com/apache/spark/pull/53157#issuecomment-3598572185

   ## Bug: This PR breaks `run-tests` with "No module named pyspark.__main__"
   
   This commit causes all PySpark tests to fail immediately with:
   ```
   /home/dmytro.fedoriaka/venv2/bin/python: No module named pyspark.__main__; 'pyspark' is a package and cannot be directly executed
   ```
   
   ### Root Cause
   The new `module_exists()` function validates test module names using 
`importlib.util.find_spec()`, but this happens **before** PySpark is added to 
`PYTHONPATH`. The validation fails because PySpark modules aren't importable 
yet at argument parsing time.
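
A minimal sketch of this failure mode, not the PR's actual code: the `module_exists()` below is a hypothetical stand-in for the real function, and `fakespark_demo` is a throwaway package invented for illustration. It shows that the same `importlib.util.find_spec()` check gives different answers depending on whether the package root is on `sys.path` when the check runs.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def module_exists(name):
    """Hypothetical stand-in for the PR's check; the real one may differ."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # For dotted names, find_spec imports the parent packages first
        # and raises if one of them cannot be found.
        return False


def demo():
    """Check the same module name before and after its root is importable."""
    with tempfile.TemporaryDirectory() as root:
        # Create fakespark_demo/tests/test_x.py (invented package layout).
        top = os.path.join(root, "fakespark_demo")
        tests = os.path.join(top, "tests")
        os.makedirs(tests)
        for directory in (top, tests):
            open(os.path.join(directory, "__init__.py"), "w").close()
        open(os.path.join(tests, "test_x.py"), "w").close()

        name = "fakespark_demo.tests.test_x"
        before = module_exists(name)  # package root not on sys.path yet

        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            after = module_exists(name)  # package root now importable
        finally:
            # Undo the side effects of the demo.
            sys.path.remove(root)
            for mod in [m for m in sys.modules if m.startswith("fakespark_demo")]:
                del sys.modules[mod]
        return before, after


if __name__ == "__main__":
    print(demo())
```

The check itself is the same in both calls; only the state of `sys.path` at the time it runs differs, which matches the ordering problem described above.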
   
   ### Reproduce
   ```bash
   python/run-tests --python-executables python3 --testnames pyspark.sql.tests.test_arrow
   ```
   
   ### Verification
   Git bisect confirms this commit (7191a14d25e) introduced the regression:
   - Commit before (c6e8dbe2319): ✅ Tests pass
   - This commit (7191a14d25e): ❌ Tests fail immediately
   - Current master: ❌ Still broken
   
   ### Fix Needed
   The `module_exists()` validation needs to do one of the following:
   1. Defer validation until after PySpark is in `PYTHONPATH`, or
   2. Skip validation for `pyspark.*` modules, or
   3. Temporarily add PySpark to `sys.path` before validation
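
As a rough illustration of option 3, the sketch below wraps the check in a context manager that adds a directory to `sys.path` only for the duration of the lookup. The names `temporary_sys_path` and the `extra_path` parameter are hypothetical and not part of the existing `run-tests` script; how the PySpark source directory is located is left to the caller.

```python
import contextlib
import importlib
import importlib.util
import sys


@contextlib.contextmanager
def temporary_sys_path(path):
    """Put `path` at the front of sys.path, then remove it again on exit."""
    sys.path.insert(0, path)
    importlib.invalidate_caches()
    try:
        yield
    finally:
        try:
            sys.path.remove(path)
        except ValueError:
            pass  # already removed by someone else


def module_exists(name, extra_path=None):
    """Hypothetical variant of the check that can search an extra directory."""

    def _lookup():
        try:
            return importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised when a parent package of a dotted name is missing.
            return False

    if extra_path is None:
        return _lookup()
    with temporary_sys_path(extra_path):
        return _lookup()
```

One caveat with this approach: for dotted names, `find_spec()` imports the parent packages, so those stay in `sys.modules` even after the path entry is removed. Deferring validation (option 1) avoids that side effect.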
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]