Kevin Ushey created SPARK-17833:
-----------------------------------

             Summary: 'monotonicallyIncreasingId()' should probably be deterministic
                 Key: SPARK-17833
                 URL: https://issues.apache.org/jira/browse/SPARK-17833
             Project: Spark
          Issue Type: Bug
            Reporter: Kevin Ushey
            Priority: Minor


Right now, it's (IMHO) too easy to shoot yourself in the foot with 
'monotonicallyIncreasingId()': it's natural to expect the generated numbers to 
act as a 'stable' primary key and then to use that key in e.g. joins, but the 
values are not guaranteed to stay the same when the column is recomputed.
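
To illustrate one way the generated numbers can shift, here is a minimal 
pure-Python sketch (no Spark required). It assumes the layout described in the 
Spark API docs, i.e. the partition ID in the upper 31 bits and the record 
number within the partition in the lower 33 bits. The helper 'monotonic_ids' 
and the sample rows are hypothetical and only model that layout; this is not 
Spark's actual implementation.

```python
# Toy model of the documented ID layout (an assumption taken from the API docs):
# id = (partition index << 33) + position of the row within its partition.

def monotonic_ids(partitions):
    """Return {row: id} for a list of partitions, each given as a list of rows."""
    ids = {}
    for partition_index, rows in enumerate(partitions):
        for offset, row in enumerate(rows):
            ids[row] = (partition_index << 33) + offset
    return ids

# The same four rows, split across two partitions in two different ways,
# as might happen if the upstream data were re-evaluated.
first_run = monotonic_ids([["a", "b"], ["c", "d"]])
second_run = monotonic_ids([["a"], ["b", "c", "d"]])

# Rows whose "primary key" differs between the two evaluations.
changed = sorted(row for row in first_run if first_run[row] != second_run[row])

print(first_run)
print(second_run)
print(changed)
```

Under this model, any row whose partition or position within its partition 
changes between evaluations receives a different ID, which would be enough to 
break a later join on that column.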

Is there any reason why this function can't be made deterministic? Or, could a 
deterministic analogue of this function be added (e.g. 
'withPrimaryKey(columnName = ...)')?

A workaround is to cache / persist the table immediately after calling 
'monotonicallyIncreasingId()'; it's also possible that the documentation should 
spell that out explicitly.



