gaogaotiantian commented on PR #53042: URL: https://github.com/apache/spark/pull/53042#issuecomment-3529611833
I think there's an ambiguity in the word "coverage". The "coverage" in my title refers to a very specific action: running the Python tests under the `coveragepy` library and reporting the line coverage number [here](https://app.codecov.io/gh/apache/spark). Currently we do this once per day, but only for the `pyspark` tests, which yields a coverage rate of about 76%. If you look at the report, we are missing a lot in `pyspark/pandas`:

<img width="1264" height="45" alt="image" src="https://github.com/user-attachments/assets/98d3bd19-4724-4c22-8b50-3993074dd67e" />

The reason is that this specific daily run does not execute the pandas tests under `coveragepy`, so we have no numbers for them. "Coverage" could also mean the more general question of whether we test the pandas code at all, and yes, we do have that from all the PRs. I did not mean that we never test that part.

Once again, to avoid confusion: I did not propose adding more tests on every commit to master. I proposed adding the pandas runs once per day, only in the `build_coverage` run.
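For readers unfamiliar with the mechanism being discussed, here is a minimal sketch of what "running tests under `coveragepy`" means. This is not Spark's actual CI configuration; the `classify` function is a made-up stand-in for test code, and only the coverage-measurement calls reflect the real `coverage` library API.

```python
# Minimal sketch: measuring line coverage with coveragepy (the `coverage`
# package). Assumes `pip install coverage`; `classify` is a hypothetical
# example, not Spark code.
import coverage

cov = coverage.Coverage()
cov.start()

def classify(x):
    # The negative branch below is never exercised by the call further down,
    # so this file will show partial line coverage -- analogous to pandas
    # code paths that exist but are never run under coveragepy.
    if x >= 0:
        return "non-negative"
    return "negative"

classify(5)

cov.stop()
cov.save()

# report() prints a per-file table and returns the total coverage percentage.
total = cov.report()
print(f"total line coverage: {total:.0f}%")
```

On the command line the equivalent is typically `python -m coverage run -m pytest ...` followed by `python -m coverage report`, which is the kind of invocation a daily coverage job wraps around the test suite.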
