GitHub user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/12762#discussion_r61539327
--- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala
---
@@ -53,24 +53,43 @@ import org.apache.spark.util.random.XORShiftRandom
*/
private[recommendation] trait ALSModelParams extends Params with HasPredictionCol {
/**
- * Param for the column name for user ids.
+ * Param for the column name for user ids. Ids must be integers. Other
+ * numeric types are supported for this column, but will be cast to integers as long as they
+ * fall within the integer value range.
* Default: "user"
* @group param
*/
- val userCol = new Param[String](this, "userCol", "column name for user ids")
+ val userCol = new Param[String](this, "userCol", "column name for user ids. Must be within " +
+ "the integer value range.")
/** @group getParam */
def getUserCol: String = $(userCol)
/**
- * Param for the column name for item ids.
+ * Param for the column name for item ids. Ids must be integers. Other
+ * numeric types are supported for this column, but will be cast to integers as long as they
--- End diff --
Personally, as per the related JIRA, I would actually like to support Int,
Long and String for ids in ALS (with appropriate warnings about the performance
impact of Long/String ids). For the vast majority of use cases, I believe the
user-friendliness of supporting String in particular outweighs the performance
impact. Users who need performance at scale can stick to Int.
But for now, since only Int ids are supported in the DataFrame API, some
validation is better than nothing. For this PR, I am actually slightly more in
favor of supporting only Int or Long for the id columns. The real-world
occurrence of a Double or another more esoteric numeric type for the id column
is, IMO, highly unlikely, and in that case I would say it is OK to require
users to do the cast explicitly themselves.
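To make the "Int or Long only" option concrete, here is a minimal sketch of the kind of id-column type check it implies. It is plain Scala so it runs without Spark. The `IdColumnType` ADT is a hypothetical stand-in for Spark SQL types such as `IntegerType` and `LongType`. `IdTypeCheck.validateIdType` is an illustrative name, not an existing Spark method.

```scala
// Hypothetical stand-ins for Spark SQL column types, so the sketch runs without Spark.
sealed trait IdColumnType
case object IntIdType extends IdColumnType
case object LongIdType extends IdColumnType
case object DoubleIdType extends IdColumnType
case object StringIdType extends IdColumnType

object IdTypeCheck {
  /** Accepts Int or Long id columns; any other type must be cast by the user first. */
  def validateIdType(colName: String, colType: IdColumnType): Unit = colType match {
    case IntIdType | LongIdType => () // accepted at the schema level
    case other =>
      throw new IllegalArgumentException(
        s"Column $colName must be of type Int or Long but was $other. " +
          "Please cast it to an integer type explicitly.")
  }
}
```

Rejecting Double and String at the schema level keeps the rule simple. Users with unusual id types get an immediate, explicit error rather than a silent conversion.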
So we could support Longs (within the Integer range) as a simpler alternative
here. It would just require updating the type checks in `transformSchema` and
the tests.
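The "within the Integer range" condition matters because a plain `Long.toInt` silently wraps out-of-range values. Here is a minimal sketch of a range-checked conversion. It is plain Scala, and `IdRangeCheck.toIntId` is a hypothetical helper, not Spark's actual implementation.

```scala
object IdRangeCheck {
  /** Converts a Long id to Int, failing loudly instead of silently wrapping. */
  def toIntId(id: Long): Int = {
    // Without this check, e.g. 2147483648L.toInt would wrap to Int.MinValue.
    if (id < Int.MinValue || id > Int.MaxValue)
      throw new IllegalArgumentException(
        s"ALS only supports ids within the integer range, but got $id.")
    id.toInt
  }
}
```

A schema-level check can only confirm that the column is Int or Long. Whether each Long value actually fits in the Int range can only be verified on the data itself.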
@jkbradley @srowen @holdenk thoughts?