Hi Sean, I didn't override hashCode. But the problem is that Array[T].toSet works while Array[T].distinct doesn't. If it were because I didn't override hashCode, then toSet shouldn't work either, right? I also tried Array[T].distinct outside the RDD, and it works fine there too, returning the same result as Array[T].toSet.
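One plausible explanation, sketched below under stated assumptions: in the Scala standard library, `distinct` on an array is backed by a hash set, so elements are compared by `hashCode` first and by `equals` only when the hashes match. `toSet` on a small input (at most four distinct elements) instead builds the specialized `Set1` to `Set4` classes, which compare elements with `==` alone. If `T` overrides only `equals`, two equal objects usually keep different identity-based hash codes, so `distinct` keeps both while a small `toSet` collapses them. That outside the RDD the duplicates were the same object reference, while records deserialized on a worker are separate instances, is only a guess. The class names `BrokenItem` and `FixedItem` are hypothetical.

```scala
// Hypothetical element type: overrides equals but NOT hashCode,
// so it keeps the identity-based hashCode inherited from AnyRef.
class BrokenItem(val key: String) {
  override def equals(other: Any): Boolean = other match {
    case that: BrokenItem => key == that.key
    case _                => false
  }
}

// Same type, with hashCode derived from the same field equals uses.
class FixedItem(val key: String) {
  override def equals(other: Any): Boolean = other match {
    case that: FixedItem => key == that.key
    case _               => false
  }
  override def hashCode(): Int = key.hashCode
}

object HashCodeDemo {
  // Two separate instances that are equal according to equals,
  // as happens when records are deserialized on a worker.
  val broken = Array(new BrokenItem("a"), new BrokenItem("a"))
  val fixed  = Array(new FixedItem("a"), new FixedItem("a"))

  // distinct uses a hash set: with identity hash codes the two equal
  // items almost always land in different slots, so both are kept.
  val brokenDistinct: Array[BrokenItem] = broken.distinct
  // toSet on <= 4 elements builds Set1..Set4, which only use ==,
  // so the duplicate is dropped even without a matching hashCode.
  val brokenSet: Set[BrokenItem] = broken.toSet

  // With hashCode consistent with equals, both approaches agree.
  val fixedDistinct: Array[FixedItem] = fixed.distinct
  val fixedSet: Set[FixedItem] = fixed.toSet

  def main(args: Array[String]): Unit = {
    println(s"broken: distinct=${brokenDistinct.length}, toSet=${brokenSet.size}")
    println(s"fixed:  distinct=${fixedDistinct.length}, toSet=${fixedSet.size}")
  }
}
```

For arrays with more than four distinct values, `toSet` switches to a hash-based set and can show the same problem, so overriding `hashCode` consistently with `equals` is the fix in both cases.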
Thanks!
Anny

On Tue, Apr 7, 2015 at 2:31 PM, Sean Owen <so...@cloudera.com> wrote:
> Did you override hashCode too?
>
> On Apr 7, 2015 2:39 PM, "anny9699" <anny9...@gmail.com> wrote:
>
>> Hi,
>>
>> I have a question about Array[T].distinct on a customized class T. My data
>> is like RDD[(String, Array[T])], in which T is a class I wrote myself.
>> There are some duplicates in each Array[T], so I want to remove them. I
>> overrode the equals() method in T and used
>>
>>     val dataNoDuplicates = dataDuplicates.map{case(id, arr) => (id, arr.distinct)}
>>
>> to remove duplicates inside the RDD. However, this doesn't work: I did
>> some further tests using
>>
>>     val dataNoDuplicates = dataDuplicates.map{case(id, arr) =>
>>       val uniqArr = arr.distinct
>>       if(uniqArr.length > 1) println(uniqArr.head == uniqArr.last)
>>       (id, uniqArr)
>>     }
>>
>> and from the worker stdout I could see that it always prints true.
>> I then tried removing duplicates using Array[T].toSet instead of
>> Array[T].distinct, and it works!
>>
>> Could anybody explain why Array[T].toSet and Array[T].distinct behave
>> differently here? And why is Array[T].distinct not working?
>>
>> Thanks a lot!
>> Anny
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Array-T-distinct-doesn-t-work-inside-RDD-tp22412.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
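An alternative to hand-writing both methods, sketched here as an assumption rather than what the thread settled on: declaring `T` as a Scala case class generates `equals` and `hashCode` from the same fields, so they always agree. The sketch below runs the quoted `map{case(id, arr) => (id, arr.distinct)}` pipeline on a local `Seq` standing in for the RDD, so it needs no Spark dependency; `Record` and its fields are hypothetical.

```scala
// Hypothetical stand-in for the custom class T: a case class gets
// equals and hashCode generated from the same fields, so they agree.
case class Record(name: String, score: Int)

object PipelineDemo {
  // Local stand-in for RDD[(String, Array[T])]; with Spark the same
  // lambda would be passed to rdd.map.
  val dataDuplicates: Seq[(String, Array[Record])] = Seq(
    "id1" -> Array(Record("x", 1), Record("x", 1), Record("y", 2)),
    "id2" -> Array(Record("z", 3), Record("z", 3))
  )

  // Separate but equal Record instances now share a hashCode,
  // so distinct removes them; first-occurrence order is preserved.
  val dataNoDuplicates: Seq[(String, Array[Record])] =
    dataDuplicates.map { case (id, arr) => (id, arr.distinct) }

  def main(args: Array[String]): Unit =
    dataNoDuplicates.foreach { case (id, arr) =>
      println(s"$id -> ${arr.mkString(", ")}")
    }
}
```

Case classes are also `Serializable` by default, which matters when instances are shipped to Spark workers.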