zhengruifeng edited a comment on issue #26547: [SPARK-29914][ML] ML models 
attach metadata in `transform`/`transformSchema`
URL: https://github.com/apache/spark/pull/26547#issuecomment-556837046
 
 
   @srowen 
   
   > do any of these take non-trivial extra time to compute and update?
   
   There should be no non-trivial cost in updating the schema: the logic is 
simple (similar operations such as `withColumns` are widely used) and should not 
noticeably affect `fit`/`transform`.
   
   > does adding them help anything else optimize its operation?
   
   Some downstream implementations in the pipeline will use the metadata if it is 
provided; otherwise they need to trigger a job, such as a `first` job to get the 
vector size, or a full pass over the data to get numClasses. Providing more 
inferable metadata helps minimize the computation cost of the whole pipeline.
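
   A toy model of the two points above, in plain Scala and deliberately NOT Spark's API: every name here (`Field`, `ToyDataset`, `attachSize`, `vectorSize`) is hypothetical. It assumes a column's size can be stored as schema metadata (in Spark ML the analogous information is carried as ML attribute metadata on the column's `StructField`). Attaching the metadata rewrites only the schema, and a downstream stage reads it when present or falls back to a `first`-style job otherwise.

```scala
object MetadataSketch {
  // Hypothetical schema field: a column name plus optional metadata.
  final case class Field(name: String, metadata: Map[String, Int] = Map.empty)

  // Hypothetical dataset: schema plus rows; `jobs` counts passes over the rows.
  final class ToyDataset(
      val schema: Vector[Field],
      val rows: Vector[Map[String, Vector[Double]]]) {
    var jobs: Int = 0

    // Stand-in for a `first` job: touching the data increments `jobs`.
    def first(col: String): Vector[Double] = {
      jobs += 1
      rows.head(col)
    }
  }

  // Analogue of attaching metadata in transformSchema: only the schema is
  // rewritten, so the cost scales with the number of columns, not rows.
  def attachSize(schema: Vector[Field], col: String, size: Int): Vector[Field] =
    schema.map { f =>
      if (f.name == col) f.copy(metadata = f.metadata + ("size" -> size)) else f
    }

  // Downstream stage: use the metadata if provided, otherwise trigger a job.
  def vectorSize(ds: ToyDataset, col: String): Int =
    ds.schema.find(_.name == col).flatMap(_.metadata.get("size")) match {
      case Some(n) => n
      case None    => ds.first(col).length
    }

  def main(args: Array[String]): Unit = {
    val rows = Vector(Map("features" -> Vector(1.0, 2.0, 3.0)))

    val bare = new ToyDataset(Vector(Field("features")), rows)
    println(s"without metadata: size=${vectorSize(bare, "features")}, jobs=${bare.jobs}")

    val annotated = new ToyDataset(attachSize(bare.schema, "features", 3), rows)
    println(s"with metadata: size=${vectorSize(annotated, "features")}, jobs=${annotated.jobs}")
  }
}
```

   Both calls report the same size; only the dataset without metadata has to touch its rows to find it.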
   
   Thanks for reviewing.
   
   
