First guess: your key class does not implement hashCode/equals. (A minimal sketch of that failure mode follows the quoted message below.)

On Fri, Oct 9, 2015 at 10:05 AM, Devin Huang <hos...@163.com> wrote:
> Hi everyone,
>
> I have run into trouble these last few days, and I don't know whether it
> is a bug in Spark. When I use groupByKey on our SequenceFile data, I find
> that different partition numbers lead to different results, and the same
> goes for reduceByKey. I think the problem happens in the shuffle stage. I
> have read the source code, but still can't find the answer.
>
> This is the main code:
>
> val rdd = sc.sequenceFile[UserWritable, TagsWritable](input,
>   classOf[UserWritable], classOf[TagsWritable])
> val combinedRdd = rdd.map(s => (s._1.getuserid(), s._2))
>   .groupByKey(num)
>   .filter(_._1 == uid)
>
> num is the number of partitions and uid is a filter id for result
> comparison.
> TagsWritable implements WritableComparable<TagsWritable> and Serializable.
>
> When I used groupByKey on a text file, the result was right.
>
> Thanks,
> Devin Huang
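To make the guess concrete, here is a minimal, self-contained sketch of the
failure mode, assuming the key produced by getuserid() is a custom class
(the real UserWritable/key types are not shown in the thread, so the UserId
class below is hypothetical):

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical stand-in for whatever getuserid() returns.
    class UserId(val id: Long) extends Serializable {
      // Both overrides are required: the shuffle routes each key to a
      // partition by hashCode, and groupByKey merges values within a
      // partition using equals.
      override def hashCode(): Int = id.hashCode
      override def equals(o: Any): Boolean = o match {
        case that: UserId => id == that.id
        case _            => false
      }
    }

    object Repro {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("repro").setMaster("local[2]"))
        // Two distinct instances carrying the same logical id.
        val data = sc.parallelize(
          Seq((new UserId(1L), "a"), (new UserId(1L), "b")))
        // With the overrides above, this prints 1 group for every num.
        // Delete them and the two instances fall back to identity-based
        // hashCode/equals: they are treated as different keys, and the
        // partition each one lands in is arbitrary from run to run.
        for (num <- Seq(1, 2, 4))
          println(s"num=$num -> ${data.groupByKey(num).count()} groups")
        sc.stop()
      }
    }

Running a sketch like this with and without the overrides on your actual
key class is a quick way to check whether hashCode/equals is the culprit
before digging into the shuffle code.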