Kevin Ushey created SPARK-17833:
-----------------------------------

             Summary: 'monotonicallyIncreasingId()' should probably be deterministic
                 Key: SPARK-17833
                 URL: https://issues.apache.org/jira/browse/SPARK-17833
             Project: Spark
          Issue Type: Bug
            Reporter: Kevin Ushey
            Priority: Minor

Right now, it's (IMHO) too easy to shoot yourself in the foot with 'monotonicallyIncreasingId()': it's natural to expect the generated numbers to function as a 'stable' primary key, and then to go on and use that key in e.g. joins, even though the IDs are not guaranteed to be the same if the table is recomputed.

Is there any reason why this function can't be made deterministic? Or could a deterministic analogue of this function be added (e.g. 'withPrimaryKey(columnName = ...)')?

One workaround is to cache / persist the table immediately after calling 'monotonicallyIncreasingId()'; it's also possible that the documentation should spell that out loud and clear.
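
To illustrate the hazard, here is a minimal pure-Python model; it is not Spark code. It assumes the layout described in the Spark documentation, where the partition index goes in the upper bits and the record number within the partition in the lower 33 bits. The helper 'assign_ids' and the sample rows are hypothetical names used only for this sketch.

```python
# Pure-Python model (NOT Spark code) of partition-dependent ID assignment.
# Assumption: id = (partition index << 33) + record number within the partition.
RECORD_BITS = 33


def assign_ids(partitions):
    """Return {row: id} for rows laid out as a list of partitions (hypothetical helper)."""
    ids = {}
    for p_idx, rows in enumerate(partitions):
        for pos, row in enumerate(rows):
            ids[row] = (p_idx << RECORD_BITS) + pos
    return ids


# The same four rows, laid out differently across two evaluations of the table.
first_eval = assign_ids([["a", "b"], ["c", "d"]])
second_eval = assign_ids([["a", "c"], ["b", "d"]])

# Rows whose ID differs between the two evaluations.
changed = sorted(r for r in first_eval if first_eval[r] != second_eval[r])
print(changed)

# Materializing the IDs once (the aim of the cache / persist workaround)
# gives later lookups and joins a single, fixed mapping to refer to.
snapshot = dict(first_eval)
```

In this model, any row that lands in a different partition or position on re-evaluation gets a different ID, which is why a join on a recomputed key can silently mismatch. In Spark itself the layout is chosen by the engine, so the sketch only shows why a recomputation may not reproduce the same IDs.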