Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12762#discussion_r61539327
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala ---
    @@ -53,24 +53,43 @@ import org.apache.spark.util.random.XORShiftRandom
      */
     private[recommendation] trait ALSModelParams extends Params with HasPredictionCol {
       /**
    -   * Param for the column name for user ids.
    +   * Param for the column name for user ids. Ids must be integers. Other
    +   * numeric types are supported for this column, but will be cast to integers as long as they
    +   * fall within the integer value range.
        * Default: "user"
        * @group param
        */
    -  val userCol = new Param[String](this, "userCol", "column name for user ids")
    +  val userCol = new Param[String](this, "userCol", "column name for user ids. Must be within " +
    +    "the integer value range.")
     
       /** @group getParam */
       def getUserCol: String = $(userCol)
     
       /**
    -   * Param for the column name for item ids.
    +   * Param for the column name for item ids. Ids must be integers. Other
    +   * numeric types are supported for this column, but will be cast to integers as long as they
    --- End diff --
    
    Personally, as per the related JIRA, I would actually like to support Int, Long and String ids in ALS (with appropriate warnings about the performance impact of Long/String ids). For the vast majority of use cases, I believe the user-friendliness of supporting String in particular outweighs the performance impact. Users who need performance at scale can stick to Int.
    
    But for now, since only Int ids are supported in the DF API, some validation is better than nothing. I am actually slightly more in favor of supporting only Int or Long for the id columns in this PR, since a Double or other more esoteric numeric type for the id column is, IMO, highly unlikely in the real world, and in that case requiring users to do the cast explicitly themselves is OK, I would say.
    
    So we can support Longs (within Integer range) as a simpler alternative here. It would just require updating the type checks in `transformSchema` and the tests.
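
To make the "Longs within Integer range" idea concrete, here is a minimal plain-Scala sketch of the kind of check involved. It is only an illustration under stated assumptions: `IdCast` and `checkedIdCast` are hypothetical names, not part of Spark or of this PR, and the real change would live in the schema type checks rather than in a standalone helper.

```scala
// Hypothetical helper (not Spark's implementation): accept Int or Long ids,
// narrowing a Long to Int only when it fits in the integer value range.
object IdCast {
  def checkedIdCast(id: Any): Either[String, Int] = id match {
    case null => Left("id is null")
    // Int ids are accepted as-is.
    case i: Int => Right(i)
    // Long ids are accepted only when no information is lost by narrowing.
    case l: Long if l >= Int.MinValue && l <= Int.MaxValue => Right(l.toInt)
    case l: Long => Left(s"id $l is outside the integer value range")
    // Any other type (Double, String, ...) must be cast explicitly by the user.
    case other =>
      Left(s"unsupported id type ${other.getClass.getName}; cast to Int or Long first")
  }
}
```

With this sketch, `IdCast.checkedIdCast(42L)` yields a `Right`, while a Long such as `3000000000L` or a Double such as `1.5` yields a `Left` carrying an error message.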
    
    @jkbradley @srowen @holdenk thoughts?


