[ https://issues.apache.org/jira/browse/SPARK-28499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zhengruifeng updated SPARK-28499:
---------------------------------
    Description:
The current implementation of MinMaxScaler has a few small places that can be optimized:
1. Avoid calling param getters inside the UDF.
   If I remember correctly, there were some tickets and PRs about this: calling a param getter in a UDF or map function significantly slows down the computation.
2. For a constant dimension, the transformed value is also a constant, so it can be precomputed.
3. For a regular dimension (the i-th), the value is updated by
       values(i) = (values(i) - minArray(i)) / range(i) * scale + $(min)
   Here we can precompute scale / range, so that a division can be skipped.

  was:
The current implementation of MinMaxScaler has a few small places that can be optimized:
1. Avoid calling param getters inside the UDF.
   If I remember correctly, there were some tickets and PRs about this: calling a param getter in a UDF or map function significantly slows down the computation.
2. For a constant dimension, the transformed value is also a constant, so it can be precomputed.
3. For a regular dimension (the i-th), the value is updated by
       values(i) = (values(i) - minArray(i)) / range(i) * scale + $(min)
   Here we can precompute range * scale, so that a division can be skipped.

> Optimize MinMaxScaler
> ---------------------
>
>                 Key: SPARK-28499
>                 URL: https://issues.apache.org/jira/browse/SPARK-28499
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 3.0.0
>            Reporter: zhengruifeng
>            Priority: Minor
>
> The current implementation of MinMaxScaler has a few small places that can be optimized:
> 1. Avoid calling param getters inside the UDF.
>    If I remember correctly, there were some tickets and PRs about this: calling a
>    param getter in a UDF or map function significantly slows down the
>    computation.
> 2. For a constant dimension, the transformed value is also a constant, so it
>    can be precomputed.
> 3. For a regular dimension (the i-th), the value is updated by
>        values(i) = (values(i) - minArray(i)) / range(i) * scale + $(min)
>    Here we can precompute scale / range, so that a division can be skipped.
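
The three points in the description can be sketched in plain Scala without Spark. This is only an illustration under stated assumptions, not the actual patch: buildTransform, lower, upper, factors and constantOutput are hypothetical names, and the output for a constant dimension is assumed to be the midpoint 0.5 * (max + min), which is what the MinMaxScaler documentation describes.

```scala
object MinMaxSketch {
  // Hypothetical helper (not Spark's API): builds a per-row transform from
  // per-feature statistics and the target range [lower, upper].
  def buildTransform(
      minArray: Array[Double],
      maxArray: Array[Double],
      lower: Double,
      upper: Double): Array[Double] => Array[Double] = {
    require(minArray.length == maxArray.length, "statistics must have equal length")
    val n = minArray.length

    // Point 1: read the "params" once, outside the per-row function.
    val scale = upper - lower

    // Point 2: a constant dimension (max == min) always maps to the same value.
    // Assumption: that value is the midpoint of the target range.
    val constantOutput = (lower + upper) / 2.0

    // Point 3: precompute scale / range per feature.
    // A factor of 0.0 marks a constant dimension (range == 0).
    val factors = Array.tabulate(n) { i =>
      val range = maxArray(i) - minArray(i)
      if (range != 0.0) scale / range else 0.0
    }

    // The returned function captures only local vals, so each row costs
    // one subtraction, one multiplication and one addition per feature.
    (values: Array[Double]) => {
      require(values.length == n, "row has the wrong number of features")
      Array.tabulate(n) { i =>
        if (factors(i) != 0.0) (values(i) - minArray(i)) * factors(i) + lower
        else constantOutput
      }
    }
  }
}
```

For example, with minArray = Array(0.0, 5.0), maxArray = Array(8.0, 5.0) and a target range of [0, 1], the second dimension is constant and every row gets the precomputed value there, while the first dimension is scaled by the precomputed factor 1.0 / 8.0.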