Let me add some detail. The problem is that groupByKey does not divide our sequence data into groups correctly and produces wrong key/value pairs. The shuffle stage might not be executing correctly, and I don't know what causes this.
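To make the expectation concrete, here is a minimal sketch in plain Python (not Spark code) of the property I assume should hold: grouping by key should give the same groups regardless of the partition count. The names `partition_of`, `group_by_key` and the sample records are hypothetical, and the CRC32-based partitioner is only a stand-in for a hash partitioner.

```python
import zlib
from collections import defaultdict


def partition_of(key: str, num_partitions: int) -> int:
    # Stand-in for a hash partitioner (assumption: any deterministic hash works
    # for this illustration; CRC32 is used because Python's hash() is salted).
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def group_by_key(records, num_partitions):
    # "Shuffle": route every (key, value) record to the partition chosen by its key.
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for key, value in records:
        partitions[partition_of(key, num_partitions)][key].append(value)

    # Merge the per-partition groups. Each key lives in exactly one partition,
    # so no key is overwritten here. Values are sorted to ignore arrival order.
    grouped = {}
    for part in partitions:
        for key, values in part.items():
            grouped[key] = sorted(values)
    return grouped


# Hypothetical sample records, shaped like (user id, "tag,weight").
records = [
    ("user_a", "tag1,1.00"),
    ("user_b", "tag2,1.00"),
    ("user_a", "tag3,1.00"),
    ("user_c", "tag4,1.00"),
    ("user_b", "tag5,1.00"),
]

result_100 = group_by_key(records, 100)
result_300 = group_by_key(records, 300)

# The partition count changes where a key is processed, not what it is grouped with.
print(result_100 == result_300)
```

Running the same kind of comparison on the real job output, per user, would show which keys differ between the two partition counts.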
The key type is String and the value type is TagsWritable. Take one user's data as an example. When the partition number is 300, the value for this user is 2700000102,1.00;130098967f,1.00;2700000027,1.00;2700000001,1.00. When the partition number is 100, the value for this user is 2800002133,1.00;150098921f,1.00; I guess the wrong value is another user's value. The data may be getting mismatched during the shuffle stage.

View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Different-partition-number-of-GroupByKey-leads-different-result-tp24989p24990.html