Hi Sean,

I didn't override hashCode. But the problem is that Array[T].toSet works
while Array[T].distinct doesn't. If the cause were the missing hashCode
override, then toSet shouldn't work either, right? I also tried
Array[T].distinct outside the RDD, and it works fine there, returning
the same result as Array[T].toSet.
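
For context, a minimal sketch of the equals/hashCode contract in question,
assuming a hypothetical class Item in place of T (the class name and its key
field are illustrative, not taken from the original code). As far as I know,
in the Scala 2.x standard library distinct tracks already-seen elements in a
hash-based set, whereas an immutable Set of up to four elements compares them
with == only, which may be why toSet appears to work on short arrays without
a hashCode override. With both methods overridden consistently, the two calls
should agree:

```scala
// Hypothetical stand-in for T; the name and the key field are illustrative only.
class Item(val key: String) {
  // Two items are equal when their keys match.
  override def equals(other: Any): Boolean = other match {
    case that: Item => this.key == that.key
    case _          => false
  }

  // Must agree with equals: equal items produce equal hash codes.
  override def hashCode: Int = key.hashCode
}

object DistinctDemo {
  def main(args: Array[String]): Unit = {
    val arr = Array(new Item("a"), new Item("b"), new Item("a"))

    // distinct keeps the first occurrence of each element, in order.
    val viaDistinct = arr.distinct
    // toSet collapses equal elements as well.
    val viaSet = arr.toSet

    println(viaDistinct.length) // 2
    println(viaSet.size)        // 2
  }
}
```

Declaring Item as a case class instead would generate a matching equals and
hashCode pair automatically.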

Thanks!
Anny

On Tue, Apr 7, 2015 at 2:31 PM, Sean Owen <so...@cloudera.com> wrote:

> Did you override hashCode too?
> On Apr 7, 2015 2:39 PM, "anny9699" <anny9...@gmail.com> wrote:
>
>> Hi,
>>
>> I have a question about Array[T].distinct on a custom class T. My data
>> is an RDD[(String, Array[T])] in which T is a class I wrote myself.
>> There are some duplicates in each Array[T], so I want to remove them. I
>> overrode the equals() method in T and used
>>
>> val dataNoDuplicates = dataDuplicates.map { case (id, arr) => (id, arr.distinct) }
>>
>> to remove duplicates inside the RDD. However, this doesn't work, as I
>> found from a further test using
>>
>> val dataNoDuplicates = dataDuplicates.map { case (id, arr) =>
>>   val uniqArr = arr.distinct
>>   if (uniqArr.length > 1) println(uniqArr.head == uniqArr.last)
>>   (id, uniqArr)
>> }
>>
>> From the worker stdout I could see that it always prints "true", meaning
>> the first and last elements of the supposedly distinct array are still
>> equal. I then tried removing duplicates with Array[T].toSet instead of
>> Array[T].distinct, and that works!
>>
>> Could anybody explain why Array[T].toSet and Array[T].distinct behave
>> differently here? And why is Array[T].distinct not working?
>>
>> Thanks a lot!
>> Anny
