gaogaotiantian commented on PR #53042: URL: https://github.com/apache/spark/pull/53042#issuecomment-3529611833
I think there's an ambiguity in the word "coverage". The "coverage" in my title refers to a very specific action: running the Python tests under the `coveragepy` library and reporting the line coverage number [here](https://app.codecov.io/gh/apache/spark). Currently we do this once per day, but only for the `pyspark` tests, which yields a coverage rate of about 76%. If you look at the report, we are missing a lot in `pyspark/pandas`:

<img width="1264" height="45" alt="image" src="https://github.com/user-attachments/assets/98d3bd19-4724-4c22-8b50-3993074dd67e" />

The reason is that this specific daily run does not execute the pandas tests under `coveragepy`, so we have no numbers for them. "Coverage" could also mean the more general question of whether we test the pandas code at all, and yes, we do have that from all the PRs. I did not mean that we never test that part.

Once again, to avoid confusion: I did not propose adding more tests on every commit to master. I proposed adding the pandas runs once per day, only in the `build_coverage` run.
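For readers unfamiliar with the mechanism being discussed, here is a minimal sketch of what "running tests under `coveragepy`" means. This is not Spark's actual CI configuration; the `classify` function is a made-up stand-in for test code, and only the coverage-measurement calls reflect the real `coverage` library API.

```python
# Minimal sketch: measuring line coverage with coveragepy (the `coverage`
# package). Assumes `pip install coverage`; `classify` is a hypothetical
# example, not Spark code.
import coverage

cov = coverage.Coverage()
cov.start()

def classify(x):
    # The negative branch below is never exercised by the call further down,
    # so this file will show partial line coverage -- analogous to pandas
    # code paths that exist but are never run under coveragepy.
    if x >= 0:
        return "non-negative"
    return "negative"

classify(5)

cov.stop()
cov.save()

# report() prints a per-file table and returns the total coverage percentage.
total = cov.report()
print(f"total line coverage: {total:.0f}%")
```

On the command line the equivalent is typically `python -m coverage run -m pytest ...` followed by `python -m coverage report`, which is the kind of invocation a daily coverage job wraps around the test suite.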
