HyukjinKwon opened a new pull request, #44842: URL: https://github.com/apache/spark/pull/44842
### What changes were proposed in this pull request?

This PR cleans up the obsolete code in the PySpark coverage script.

### Why are the changes needed?

We used to use `coverage_daemon.py` to let Python workers track coverage on the worker side (e.g., coverage within Python UDFs), added in https://github.com/apache/spark/pull/20204. However, it no longer works; in fact, it stopped working multiple years ago. Replacing the Python worker itself was a somewhat hacky workaround, so we should remove it first and then find a proper approach. This should also deflake the scheduled jobs and speed up the build.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually tested via:

```bash
./run-tests-with-coverage --python-executables=python3 --testname="pyspark.sql.functions.builtin"
```

```
Finished test(python3): pyspark.sql.tests.test_functions (87s)
Tests passed in 87 seconds
Combining collected coverage data under /Users/hyukjin.kwon/workspace/forked/spark/python/test_coverage/coverage_data
Combined data file test_coverage/coverage_data/coverage.C02CV6VVMD6R.71607.501653
Combined data file test_coverage/coverage_data/coverage.C02CV6VVMD6R.71798.177503
Combined data file test_coverage/coverage_data/coverage.C02CV6VVMD6R.71417.646740
Skipping duplicate data test_coverage/coverage_data/coverage.C02CV6VVMD6R.71419.320617
Skipping duplicate data test_coverage/coverage_data/coverage.C02CV6VVMD6R.71418.130736
Skipping duplicate data test_coverage/coverage_data/coverage.C02CV6VVMD6R.71415.781423
Skipping duplicate data test_coverage/coverage_data/coverage.C02CV6VVMD6R.71416.272012
Combined data file test_coverage/coverage_data/coverage.C02CV6VVMD6R.71799.843181
Combined data file test_coverage/coverage_data/coverage.C02CV6VVMD6R.71421.946328
Combined data file test_coverage/coverage_data/coverage.C02CV6VVMD6R.71411.225487
Creating XML report file at python/coverage.xml
Wrote XML report to
coverage.xml
Reporting the coverage data at /Users/hyukjin.kwon/workspace/forked/spark/python/test_coverage/coverage_data/coverage
Name                                    Stmts   Miss Branch BrPart  Cover
-------------------------------------------------------------------------
pyspark/__init__.py                        48      7     10      3    76%
pyspark/_globals.py                        16      3      4      2    75%
pyspark/accumulators.py                   123     38     26      5    66%
pyspark/broadcast.py                      121     79     40      3    33%
pyspark/conf.py                            99     33     50      5    64%
pyspark/context.py                        451    216    151     26    51%
pyspark/errors/__init__.py                  3      0      0      0   100%
pyspark/errors/error_classes.py             3      0      0      0   100%
pyspark/errors/exceptions/__init__.py       0      0      0      0   100%
pyspark/errors/exceptions/base.py          91     15     24      4    83%
pyspark/errors/exceptions/captured.py     168     81     57     17    48%
pyspark/errors/utils.py                    34      8      6      2    70%
pyspark/files.py                           34     15     12      3    57%
pyspark/find_spark_home.py                 30     24     12      2    19%
pyspark/java_gateway.py                   114     31     30     12    69%
pyspark/join.py                            66     58     58      0     6%
pyspark/profiler.py                       244    182     92      3    22%
pyspark/rdd.py                           1064    741    378      9    27%
pyspark/rddsampler.py                      68     50     32      0    18%
pyspark/resource/__init__.py                5      0      0      0   100%
pyspark/resource/information.py            11      4      4      0    73%
pyspark/resource/profile.py               110     82     58      1    27%
pyspark/resource/requests.py              139     90     70      0    35%
pyspark/resultiterable.py                  14      6      2      1    56%
pyspark/serializers.py                    349    185     90     13    43%
pyspark/shuffle.py                        397    322    180      1    13%
pyspark/sql/__init__.py                    14      0      0      0   100%
pyspark/sql/catalog.py                    203    127     66      2    30%
pyspark/sql/column.py                     268     78     64     12    67%
pyspark/sql/conf.py                        40     16     10      3    58%
pyspark/sql/context.py                    170     95     58      2    47%
pyspark/sql/dataframe.py                  900    475    459     40    45%
pyspark/sql/functions/__init__.py           3      0      0      0   100%
pyspark/sql/functions/builtin.py         1741    542   1126     26    76%
pyspark/sql/functions/partitioning.py      41     19     18      3    59%
pyspark/sql/group.py                       81     30     32      3    65%
pyspark/sql/observation.py                 54     37     22      1    26%
pyspark/sql/pandas/__init__.py              1      0      0      0   100%
pyspark/sql/pandas/conversion.py          277    249    156      2     8%
pyspark/sql/pandas/functions.py            67     49     34      0    18%
pyspark/sql/pandas/group_ops.py            89     65     22      2    25%
pyspark/sql/pandas/map_ops.py              37     27     10      2    26%
pyspark/sql/pandas/serializers.py         381    323    172      0    10%
pyspark/sql/pandas/typehints.py            41     32     26      1    15%
pyspark/sql/pandas/types.py               407    383    326      1     3%
pyspark/sql/pandas/utils.py                29     11     10      5    59%
pyspark/sql/profiler.py                    80     47     54      1    39%
pyspark/sql/readwriter.py                 362    253    146      7    27%
pyspark/sql/session.py                    469    206    228     22    56%
pyspark/sql/sql_formatter.py               41     26     16      1    28%
pyspark/sql/streaming/__init__.py           4      0      0      0   100%
pyspark/sql/streaming/listener.py         400    200    186      1    61%
pyspark/sql/streaming/query.py            102     63     40      1    39%
pyspark/sql/streaming/readwriter.py       268    207    118      2    21%
pyspark/sql/streaming/state.py            100     68     44      0    29%
pyspark/sql/tests/__init__.py               0      0      0      0   100%
pyspark/sql/tests/test_functions.py       646      2    244      7    99%
pyspark/sql/types.py                     1013    355    528     74    62%
pyspark/sql/udf.py                        240    132     90     20    42%
pyspark/sql/udtf.py                       152     98     52      2    33%
pyspark/sql/utils.py                      160     83     54     10    45%
pyspark/sql/window.py                      89     23     56      5    77%
pyspark/statcounter.py                     79     58     20      0    21%
pyspark/status.py                          36     13      6      0    55%
pyspark/storagelevel.py                    41      9      0      0    78%
pyspark/taskcontext.py                    111     63     40      1    40%
pyspark/testing/__init__.py                 2      0      0      0   100%
pyspark/testing/sqlutils.py               149     44     52      1    75%
pyspark/testing/utils.py                  312    238    162      2    17%
pyspark/traceback_utils.py                 38      4     14      6    81%
pyspark/util.py                           153    120     56      2    18%
pyspark/version.py                          1      0      0      0   100%
...
```

### Was this patch authored or co-authored using generative AI tooling?

No.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
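For context on the mechanism being removed: `coverage_daemon.py` wrapped the Python worker so that line execution inside UDFs could be recorded per worker process. The idea of recording which lines a function body executes can be illustrated with the standard-library `trace` module. This is a minimal, hypothetical sketch of the concept only, not Spark's actual tooling (which relies on `coverage.py`); `doubled_if_positive` is a made-up stand-in for a UDF body.

```python
import trace

def doubled_if_positive(x):
    # Stand-in for the body of a Python UDF whose coverage we want to observe.
    if x > 0:
        return x * 2
    return -x

# count=True records how many times each line runs; trace=False suppresses
# the per-line execution printout.
tracer = trace.Trace(count=True, trace=False)
result = tracer.runfunc(doubled_if_positive, 5)
print(result)  # 10

# counts maps (filename, line_number) -> execution count. The `return -x`
# branch was never taken for this input, so its line does not appear.
executed_lines = sorted(line for (_fname, line) in tracer.results().counts)
print(executed_lines)
```

In the real setup, each worker process would save its counts to a separate data file (the `coverage.<host>.<pid>.<random>` files in the log above), which `coverage combine` later merges into a single report.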
