zhengruifeng edited a comment on issue #26547: [SPARK-29914][ML] ML models attach metadata in `transform`/`transformSchema`
URL: https://github.com/apache/spark/pull/26547#issuecomment-556837046

@srowen

> do any of these take non-trivial extra time to compute and update?

There should be no non-trivial cost in updating the schema: the logic is simple (similar operations such as `withColumns` are widely used) and should not noticeably affect fit/transform.

> does adding them help anything else optimize its operation?

Some downstream implementations in the pipeline will use the metadata if it is provided; otherwise they need to trigger a job, such as a `first` job to get the vector size, or a whole pass over the data to get numClasses. Providing more inferable metadata will help minimize the computation cost of the whole pipeline.

Thanks for reviewing.
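
To illustrate the second answer, here is a minimal toy sketch of the "use metadata if present, otherwise run a job" pattern. It is plain Python, not Spark code: `ToyColumn`, `vector_size`, `attach_size`, and the `numAttrs` key are hypothetical names chosen for illustration, and the "job" is only simulated by a counter.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyColumn:
    """Hypothetical stand-in for a vector column; not a Spark class."""
    rows: List[List[float]]
    metadata: Dict[str, int] = field(default_factory=dict)
    jobs_triggered: int = 0  # counts simulated data scans

    def first(self) -> List[float]:
        # Simulates a `first`-style job: it has to touch the data.
        self.jobs_triggered += 1
        return self.rows[0]


def vector_size(col: ToyColumn) -> int:
    """Return the vector size, preferring metadata over a data scan."""
    size = col.metadata.get("numAttrs")
    if size is not None:
        return size          # cheap: answered from metadata alone
    return len(col.first())  # fallback: must run a (simulated) job


def attach_size(col: ToyColumn, size: int) -> ToyColumn:
    """What an upstream stage could do: record the size it already knows."""
    return ToyColumn(col.rows, {**col.metadata, "numAttrs": size})


# Usage: the same data, with and without metadata attached upstream.
bare = ToyColumn(rows=[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
tagged = attach_size(bare, 3)
vector_size(bare)    # falls back to first(); bare.jobs_triggered becomes 1
vector_size(tagged)  # read from metadata; tagged.jobs_triggered stays 0
```

In this toy model the upstream stage pays only for a dictionary update, while each downstream consumer without metadata pays for a data access, which is the trade-off the comment describes.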
