Hi Jeff,

Yes, you are absolutely right.
It is because the RecordReader reuses the Writable instance. I did not
anticipate this, since it worked for text files.

Thank you so much for tracking this down.
Your answer is accepted!
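For future readers of the thread, the pitfall can be illustrated without Spark at all. Below is a minimal sketch in plain Python (the class and function names are made up for illustration, standing in for Hadoop's Text/RecordReader): a "reader" reuses one mutable record object, so caching the raw references keeps N pointers to a single object and every cached entry shows the last record read. Copying each value as it is read (the analogue of Text.toString() inside a map before cache()) avoids this.

```python
class ReusedText:
    """Stand-in for Hadoop's Text: one mutable, reused buffer."""
    def __init__(self):
        self.value = ""

    def set(self, v):
        self.value = v


def read_records(lines):
    rec = ReusedText()   # a single shared instance, like a reused Writable
    for line in lines:
        rec.set(line)
        yield rec        # yields the SAME object for every record


# Caching raw references: three pointers to one object,
# which by now holds the last record read.
cached = list(read_records(["a", "b", "c"]))
print([r.value for r in cached])   # ['c', 'c', 'c']

# Fix: copy each record as it is read (like .map(_.toString) in Spark).
copied = [rec.value for rec in read_records(["a", "b", "c"])]
print(copied)                      # ['a', 'b', 'c']
```

The same reasoning explains why the text-file path was unaffected here: the values were materialized as new strings per record, rather than cached as references to one shared buffer.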


Best,
Thamme



--
*Thamme Gowda N. *
Grad Student at usc.edu
Twitter: @thammegowda
Website : http://scf.usc.edu/~tnarayan/

On Tue, Mar 22, 2016 at 9:00 PM, Jeff Zhang <[email protected]> wrote:

> Zhan's reply on stackoverflow is correct.
>
>
>
> Please refer to the comment on sequenceFile:
>
> /**
>  * Get an RDD for a Hadoop SequenceFile with given key and value types.
>  *
>  * '''Note:''' Because Hadoop's RecordReader class re-uses the same Writable
>  * object for each record, directly caching the returned RDD or directly
>  * passing it to an aggregation or shuffle operation will create many
>  * references to the same object. If you plan to directly cache, sort, or
>  * aggregate Hadoop writable objects, you should first copy them using a
>  * map function.
>  */
>
>
>
> On Wed, Mar 23, 2016 at 11:58 AM, Jeff Zhang <[email protected]> wrote:
>
>> I think I found the root cause: you can use Text.toString() to solve this
>> issue. Because the Text object is shared, the last record is displayed
>> multiple times.
>>
>> On Wed, Mar 23, 2016 at 11:37 AM, Jeff Zhang <[email protected]> wrote:
>>
>>> Looks like a Spark bug. I can reproduce it with a sequence file, but it
>>> works for a text file.
>>>
>>> On Wed, Mar 23, 2016 at 10:56 AM, Thamme Gowda N. <[email protected]>
>>> wrote:
>>>
>>>> Hi spark experts,
>>>>
>>>> I am facing issues with cached RDDs. I noticed that a few entries
>>>> get duplicated n times when the RDD is cached.
>>>>
>>>> I asked a question on Stackoverflow with my code snippet to reproduce
>>>> it.
>>>>
>>>> I would really appreciate it if you could visit
>>>> http://stackoverflow.com/q/36168827/1506477
>>>> and answer my question or share your comments.
>>>>
>>>> Or at the least confirm that it is a bug.
>>>>
>>>> Thanks in advance for your help!
>>>>
>>>> --
>>>> Thamme
>>>>
>>>
>>>
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
>>>
>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>
>
>
> --
> Best Regards
>
> Jeff Zhang
>
