[ https://issues.apache.org/jira/browse/SPARK-14241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiangrui Meng resolved SPARK-14241.
-----------------------------------
    Resolution: Fixed

> Output of monotonically_increasing_id lacks stable relation with rows of DataFrame
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-14241
>                 URL: https://issues.apache.org/jira/browse/SPARK-14241
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, Spark Core
>    Affects Versions: 1.6.0, 1.6.1
>            Reporter: Paul Shearer
>             Fix For: 2.0.0
>
>
> If you use monotonically_increasing_id() to append a column of IDs to a
> DataFrame, the IDs do not have a stable, deterministic relationship to the
> rows they are appended to. A given ID value can land on different rows
> depending on what happens in the task graph:
> http://stackoverflow.com/questions/35705038/how-do-i-add-an-persistent-column-of-row-ids-to-spark-dataframe/35706321#35706321
> From a user perspective this behavior is very unexpected, and many things one
> would normally like to do with an ID column are in fact only possible under
> very narrow circumstances. The function should either be made deterministic,
> or there should be a prominent warning note in the API docs regarding its
> behavior.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
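Why the IDs can move between rows becomes clearer from how they are built. The Spark API docs for monotonically_increasing_id() describe the current implementation as putting the partition ID in the upper 31 bits and the record number within each partition in the lower 33 bits. The sketch below is a minimal pure-Python model of that layout, not Spark's code; the helper names generated_id and assign_ids and the example partition layouts are hypothetical, chosen only to show that an ID depends on where a row physically sits rather than on the row itself.

```python
# Hypothetical model of the ID layout documented for monotonically_increasing_id():
# partition ID in the upper 31 bits, record number within the partition in the
# lower 33 bits. Pure Python, no Spark required; this is not Spark's implementation.

RECORD_BITS = 33  # low bits reserved for the per-partition record number


def generated_id(partition_id: int, record_number: int) -> int:
    """Return the 64-bit ID a row would receive under this layout."""
    return (partition_id << RECORD_BITS) + record_number


def assign_ids(partitions):
    """Assign IDs to rows, given a list of partitions (each a list of rows).

    The ID depends only on the row's partition index and its position inside
    that partition, so re-partitioning or re-ordering the same rows changes
    which ID each row receives.
    """
    ids = {}
    for pid, rows in enumerate(partitions):
        for rec, row in enumerate(rows):
            ids[row] = generated_id(pid, rec)
    return ids


if __name__ == "__main__":
    # The same four rows under two different physical layouts,
    # e.g. as they might be arranged before and after a shuffle.
    layout_1 = [["a", "b"], ["c", "d"]]
    layout_2 = [["c", "a"], ["d", "b"]]
    print(assign_ids(layout_1))
    print(assign_ids(layout_2))
```

Under this model, row "a" receives ID 0 in the first layout and ID 1 in the second, while row "c" moves from 8589934592 (that is, 1 << 33) to 0. Nothing about the rows changed; only their placement did, which matches the behavior the reporter describes when the task graph recomputes or redistributes the data.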