Re: Different partition number of GroupByKey leads different result

Devin Huang Fri, 09 Oct 2015 02:41:29 -0700

Let me add some details.

The problem is that GroupByKey does not divide our sequence data into groups
correctly and produces wrong key/value pairs. The shuffle stage might not be
executing correctly, and I don’t know what causes this.



The type of key is String, and the type of value is TagsWritable.

Take one user’s data as an example.

When the partition number is 300, the value for this user is
2700000102,1.00;130098967f,1.00;2700000027,1.00;2700000001,1.00.
When the partition number is 100, the value for this user is
2800002133,1.00;150098921f,1.00;

My guess is that the wrong value belongs to another user. The data may be
mismatched during the shuffle stage.
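
To make the expectation concrete, here is a minimal sketch in plain Python
(not Spark code) of a hash-partitioned group-by-key. The user ids "user_a"
and "user_b", the helper group_by_key, and the use of crc32 as the hash are
all assumptions made for illustration; only the tag strings come from the
values quoted above, and assigning the second set to another user reflects
my guess, not a confirmed fact. The point is that each key is routed to
exactly one partition, so the grouped values per key should not depend on
the partition number.

```python
from collections import defaultdict
import zlib


def group_by_key(records, num_partitions):
    """Simulate a hash-partitioned group-by-key over (key, value) pairs."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for key, value in records:
        # crc32 is a stable stand-in for the partitioner's hash (assumption).
        index = zlib.crc32(key.encode("utf-8")) % num_partitions
        partitions[index][key].append(value)
    # Every key lives in exactly one partition, so merging cannot collide.
    grouped = {}
    for part in partitions:
        for key, values in part.items():
            grouped[key] = sorted(values)
    return grouped


# Hypothetical user ids; the tag strings are the ones quoted above.
records = [
    ("user_a", "2700000102,1.00"),
    ("user_b", "2800002133,1.00"),
    ("user_a", "130098967f,1.00"),
    ("user_b", "150098921f,1.00"),
    ("user_a", "2700000027,1.00"),
    ("user_a", "2700000001,1.00"),
]

result_300 = group_by_key(records, 300)
result_100 = group_by_key(records, 100)
```

In this sketch result_300 and result_100 are equal, which is the behaviour I
would expect from the real job as well.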







--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Different-partition-number-of-GroupByKey-leads-different-result-tp24989p24990.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
