srowen commented on a change in pull request #17864: [SPARK-20604][ML] Allow imputer to handle numeric types
URL: https://github.com/apache/spark/pull/17864#discussion_r309389129
 
 

 ##########
 File path: mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala
 ##########
 @@ -84,9 +84,15 @@ private[feature] trait ImputerParams extends Params with HasInputCols with HasOu
   * :: Experimental ::
   * Imputation estimator for completing missing values, either using the mean or the median
   * of the columns in which the missing values are located. The input columns should be of
 - * DoubleType or FloatType. Currently Imputer does not support categorical features
 + * numeric type. Currently Imputer does not support categorical features
   * (SPARK-15041) and possibly creates incorrect values for a categorical feature.
   *
 + * Note that the input columns are converted to Double data type internally to compute
 + * the mean/median value and impute the missing values, which are then casted back to
 
 Review comment:
   I wouldn't put all of this implementation detail into the docs. I would, however, note that in the case of integer types and mean imputation, the mean will be cast (truncated) to an integer type. That is, your example is a good one.
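
   To illustrate the truncation mentioned in the comment, here is a minimal sketch in plain Scala. It does not use Spark or the actual Imputer code; the object name `TruncationSketch` and the sample values are hypothetical, and `toInt` stands in for the cast back to an integer type, on the assumption that the cast drops the fractional part as the comment describes.

```scala
// Hypothetical, Spark-free sketch: mean imputation on an integer column.
object TruncationSketch {
  // One integer column; None marks a missing value (made-up sample data).
  val column: Seq[Option[Int]] = Seq(Some(1), Some(2), None, Some(4), Some(4))

  // The mean is computed in Double over the values that are present.
  val present: Seq[Double] = column.flatten.map(_.toDouble)
  val mean: Double = present.sum / present.size

  // Casting the mean back to Int discards the fractional part (assumed here
  // to mirror the cast back to the original integer column type).
  val surrogate: Int = mean.toInt

  // Missing entries are filled with the truncated mean.
  val imputed: Seq[Int] = column.map(_.getOrElse(surrogate))

  def main(args: Array[String]): Unit = {
    println(s"mean = $mean, surrogate = $surrogate, imputed = $imputed")
  }
}
```

   With these sample values the mean is 2.75, so the missing entry is filled with 2 rather than 2.75 or a rounded 3, which is the behavior the comment suggests documenting.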
