[GitHub] [spark] ryan-johnson-databricks commented on pull request #37573: [SPARK-40141][CORE] Remove unnecessary TaskContext addTaskXxxListener overloads

GitBox Mon, 22 Aug 2022 08:48:24 -0700


ryan-johnson-databricks commented on PR #37573:
URL: https://github.com/apache/spark/pull/37573#issuecomment-1222544502


   > For Spark's own internal development, I'm wondering whether this change 
will introduce source-compatibility concerns in our own patch backports: If I 
write a bugfix patch using the new lambda syntax then cherry-pick that patch to 
older branches then I'll run into compile failures. Of course, _the option_ to 
be compatible exists but a developer might forget to use it (esp. since IDEs 
are likely to suggest a replacement to the lambda syntax).
   
   I'm not sure that concern is very troublesome?
   1. The task listener code is _very_ slow-moving -- code involving a task 
listener tends to not change after being added, other than a half dozen 
refactors that move/indent it. Full analysis below.
   1. In the last four years, 11 new listeners were added by a half dozen code 
authors. Each case would have required the author to figure out that `[Unit]` 
was needed. Meanwhile, there has only been _ONE_ bugfix "backport" since 
`[Unit]` was added in 2018 -- and that one backport was for one of the 11 
new-code changes, from master back to a recent branch cut, to fix a regression 
found during release testing. So today the new dev cost of keeping `[Unit]` is 
10x higher than the backport cost of removing it.
   1. it's pretty easy to add `[Unit]` back in if needed?
   
   (***) By "_very_ slow moving" I mean:
   * There are no prod uses of `addTaskFailureListener` in the spark code base 
today -- which probably explains why nobody realized it needed to become 
polymorphic before now.
   * Out of 45 prod files that use `addTaskCompletionListener[Unit]`, 28 have 
no changes since the Aug 2018 refactor that added `[Unit]` as part of 
scala-2.12 effort. Of the 17 remaining files (full list below) only one change 
was a bug fix that would have needed a backport. That change added a new 
listener (2022-03-09) and merged to master less than a week before spark-3.3 
branch cut (2022-03-15); a perf regression was during release testing and fixed 
by revert (2022-04-26).
   
   List of recent changes to `addTaskCompletionListener[Unit]`:
     * d3d22928d4f in Jun 2022, ArrowConverters.scala (adding a null-check to 
the task context a listener gets added to, as part of a refactor)
     * 20ffbf7b308 in Apr 2022, DataSourceRDD.scala (refactor-induced 
indentation change)
     * 6b5a1f9df28 in Apr 2022, ShuffledHashJoinExec.scala (revert a Mar 2022 
change to code first added in Aug 2020)
     * 8714eefe6f9 in Aug 2021, Columnar.scala (refactor-induced indentation 
change)
     * 3257a30e539 in Jun 2021, RocksDB.scala (implement new RocksDB connector)
     * cc9a158712d in Jun 2021, SortMergeJoinExec.scala (added a new completion 
listener to update a spill size metric)
     * f11950f08f6 in Mar 2021, ExternalSorter.scala (refactor that moved 
existing code to a new location and also reformatted it)
     * d871b54a4e9 in Jan 2021, ObjectAggregationIterator.scala (added a new 
completion listener for capturing metrics)
     * 21413b7dd4e in Nov 2020, streaming/state/package.scala (new code)
     * 713124d5e32 in Aug 2020, InMemoryRelation.scala (refactor that moved an 
existing completion listener)
     * d7b268ab326 in Dec 2019, CollectMetricsExec.scala (added a new 
completion listener that collects metrics)
     * 05988b256e8 in Sep 2019, BaseArrowPythonRunner.scala (refactor that 
moved existing code to a new file, otherwise unchanged)
     * 3663dbe5418 in Jul 2019, AvroPartitionReaderFactory.scala (new code, 
avro DSv2 support)
     * 23ebd389b5c in Jun 2019, ParquetPartitionReaderFactory.scala (new code, 
parquet DSv2 support)
     * d50603a37c4 in Apr 2019, TextPartitionReaderFactory.scala (new code, 
text DSv2 support)
     * 8126d09fb5b in Feb 2019, ArrowRunner.scala (new code)
     * 1280bfd7564 in Jan 2019, PipedRDD.scala (bug fix that required adding a 
new completion listener)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [spark] ryan-johnson-databricks commented on pull request #37573: [SPARK-40141][CORE] Remove unnecessary TaskContext addTaskXxxListener overloads

Reply via email to