srowen commented on a change in pull request #17864: [SPARK-20604][ML] Allow imputer to handle numeric types
URL: https://github.com/apache/spark/pull/17864#discussion_r309389129
##########
File path: mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala
##########
@@ -84,9 +84,15 @@ private[feature] trait ImputerParams extends Params with HasInputCols with HasOu
  * :: Experimental ::
  * Imputation estimator for completing missing values, either using the mean or the median
  * of the columns in which the missing values are located. The input columns should be of
- * DoubleType or FloatType. Currently Imputer does not support categorical features
+ * numeric type. Currently Imputer does not support categorical features
  * (SPARK-15041) and possibly creates incorrect values for a categorical feature.
  *
+ * Note that the input columns are converted to Double data type internally to compute
+ * the mean/median value and impute the missing values, which are then casted back to

Review comment:
I wouldn't put all of this implementation detail into the docs. I would however note that in the case of integer types and mean imputation, the mean will be cast (truncated) to an integer type. That is, your example is a good one.
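
To make the truncation point concrete, here is a minimal plain-Scala sketch of mean imputation on an integer column. It does not use Spark and is not the Imputer implementation: the object `IntMeanImputationSketch`, the method `imputeWithTruncatedMean`, and the use of `None` for a missing value are all hypothetical, chosen only to show what happens when a Double mean is cast back to an integer type.

```scala
// Hypothetical helper, not Spark's Imputer: shows mean imputation on an
// integer column where the mean is computed as a Double and cast back to Int.
object IntMeanImputationSketch {
  // None stands in for a missing value (an assumption of this sketch).
  def imputeWithTruncatedMean(values: Seq[Option[Int]]): Seq[Int] = {
    val present = values.flatten
    require(present.nonEmpty, "need at least one non-missing value")
    // Compute the mean in Double precision.
    val mean: Double = present.map(_.toDouble).sum / present.size
    // Double.toInt drops the fractional part (truncation toward zero).
    val surrogate: Int = mean.toInt
    // Replace each missing entry with the truncated mean.
    values.map(_.getOrElse(surrogate))
  }
}
```

For a column holding 1, 2, and one missing value, the mean is 1.5, so this sketch fills the gap with 1 rather than 1.5 or 2. That is the behavior worth calling out in the docs.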
