eejbyfeldt commented on PR #38427: URL: https://github.com/apache/spark/pull/38427#issuecomment-1300650411
> OK, I think we can't accept that much perf degradation. If there's a simple way to refactor the code to make both faster, that seems OK. Ideally we avoid separate code branches for 2.12 vs 2.13, unless it's simple and important here.

I think two options have been discussed.

The first is separate code branches for 2.12 and 2.13 that convert the mutable collections to `Seq`. For 2.12 this would be a no-op, since `Seq` is an alias for `scala.collection.Seq`. For 2.13 we would copy the data into an `ArraySeq`, since in 2.13 `Seq` is an alias for `scala.collection.immutable.Seq`. The gain, I think, is that on 2.13 we would use an immutable collection instead of a `scala.collection.Seq` that might point to a mutable collection.

The other option is to change the code to explicitly use `scala.collection.Seq` (`scala.collection.IndexedSeq` would also work) instead of `Seq`, and to remove the explicit `toSeq` calls. That would have the same meaning and performance as the current 2.12 code.

@srowen Which approach do you think is preferable?
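To make the trade-off concrete, here is a minimal sketch assuming Scala 2.13. The object and helper names (`SeqOptionsSketch`, `toImmutableSeq`, `total`) are hypothetical and do not come from the Spark code base. It contrasts copying a mutable buffer into an immutable `ArraySeq` (the 2.13 branch of the first option) with accepting `scala.collection.Seq` directly (the second option).

```scala
import scala.collection.mutable.ArrayBuffer
import scala.collection.immutable.ArraySeq

// Minimal sketch, assuming Scala 2.13. All names here are hypothetical.
object SeqOptionsSketch {

  // Option 1 (2.13 branch): copy the mutable data into an immutable ArraySeq.
  // A 2.12 branch of this helper would just return its input unchanged,
  // because on 2.12 `Seq` already means scala.collection.Seq.
  def toImmutableSeq(xs: ArrayBuffer[Int]): Seq[Int] =
    ArraySeq.from(xs)

  // Option 2: accept scala.collection.Seq explicitly, so callers can pass a
  // mutable buffer without any copy (same semantics as `Seq` on 2.12).
  def total(xs: scala.collection.Seq[Int]): Int = xs.sum

  def main(args: Array[String]): Unit = {
    val buf = ArrayBuffer(1, 2, 3)

    val copied = toImmutableSeq(buf)             // independent snapshot
    val shared: scala.collection.Seq[Int] = buf  // no copy, still mutable underneath

    buf += 4
    println(copied.sum)    // 6: the snapshot does not see the append
    println(total(shared)) // 10: the shared view does
  }
}
```

The copy buys a stable snapshot at the cost of one allocation per conversion; the `scala.collection.Seq` signature avoids that cost, but the callee can observe later mutations of the underlying buffer.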
