Matei Zaharia created SPARK-1770:
------------------------------------

             Summary: repartition and coalesce(shuffle=true) put objects with the same key in the same bucket
                 Key: SPARK-1770
                 URL: https://issues.apache.org/jira/browse/SPARK-1770
             Project: Spark
          Issue Type: Bug
    Affects Versions: 0.9.0, 1.0.0, 0.9.1
            Reporter: Matei Zaharia
            Priority: Blocker

This is bad when you have many identical objects: they all share a key, so they all land in the same bucket. We should assign each one a random key instead.
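
To illustrate the skew, here is a minimal standalone sketch in plain Python. It is not Spark code and does not reflect Spark's actual implementation; the helper names `bucket_by_hash` and `bucket_by_random_key`, the bucket count of 8, and the seed are all assumptions made up for this example. It compares keying each object by itself against keying each object by a random integer.

```python
import random
from collections import Counter


def bucket_by_hash(items, num_buckets):
    """Key each item by itself, so equal items always share a bucket."""
    return Counter(hash(item) % num_buckets for item in items)


def bucket_by_random_key(items, num_buckets, seed=0):
    """Key each item by a random integer that ignores the item's value."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return Counter(rng.randrange(num_buckets) for _ in items)


# Many identical objects, as in the case described above.
items = ["same"] * 10_000

by_hash = bucket_by_hash(items, 8)
by_random = bucket_by_random_key(items, 8)

# Bucket sizes under each scheme, smallest first.
print("keyed by object:    ", sorted(by_hash.values()))
print("keyed by random int:", sorted(by_random.values()))
```

With the object as the key, every element falls into a single bucket and the rest stay empty. With a random key, the elements spread roughly evenly across all buckets, which is the behavior the proposed fix is after.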