There aren't any guarantees on the order in which partitions are combined by
the 'saveAsTextFile' method. Generally the output is written in
per-partition blocks, but there's no notion of ordering across partitions. If
order matters to you, you can do a sortByKey when you load the data.
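
For example, a minimal sketch from the Spark shell (the input path and the
way the key is derived are hypothetical, not taken from your code):

val pairs = sc.textFile("hdfs://.../input")
  .map(line => (line.split("\t")(0), line)) // key each record; adjust to your format
pairs.sortByKey()                           // impose a global order on the keys
  .map(_._2)                                // drop the temporary key again
  .saveAsTextFile("hdfs://.../sorted-output")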

Can you provide a reproducible example of the behavior you're seeing (say
from the spark shell)? It's difficult to provide guidance based on the code
you sent.
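
For instance, something self-contained along these lines (the data and path
here are made up purely for illustration) would show whether the saved output
really differs from what the tasks print:

val rdd = sc.parallelize(Seq(("k1", "v1"), ("k2", "v2")))
val mapped = rdd.flatMap { case (k, v) => List((k, v + "-out")) }
mapped.collect().foreach(println)           // what the tasks produce
mapped.saveAsTextFile("/tmp/flatmap-check") // compare against the saved part files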


On Thu, Jan 30, 2014 at 10:24 AM, Archit Thakur
<[email protected]> wrote:

> Yes, I do that. But if I go to my worker node and check the list it has
> printed:
>
>
>
> MyRdd.flatmap(func(_))
> MyRdd.saveAsTextFile(..)
>
> func(Tuple2[Key, Value]): List[Tuple2[MyCustomKey, MyCustomValue]] = {
>
> //
>
> println(list)
> list
> }
>
>
>
> The records differ (only the counts match).
>
>
> On Thu, Jan 30, 2014 at 11:48 PM, Evan R. Sparks <[email protected]> wrote:
>
>> Actually - looking at your use case, you may simply be saving the
>> original RDD. Doing something like:
>>
>> val newRdd = MyRdd.flatMap(func)
>> newRdd.saveAsTextFile(...)
>>
>> may solve your issue.
>>
>>
>> On Thu, Jan 30, 2014 at 10:17 AM, Evan R. Sparks 
>> <[email protected]> wrote:
>>
>>> Could it be that you have the same records that you get back from
>>> flatMap, just in a different order?
>>>
>>>
>>> On Thu, Jan 30, 2014 at 1:05 AM, Archit Thakur <
>>> [email protected]> wrote:
>>>
>>>> Needless to say, it works fine with int/string (primitive) types.
>>>>
>>>>
>>>> On Wed, Jan 29, 2014 at 2:04 PM, Archit Thakur <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am facing a general problem with the flatMap operation on an RDD.
>>>>>
>>>>> I am doing
>>>>>
>>>>> MyRdd.flatmap(func(_))
>>>>> MyRdd.saveAsTextFile(..)
>>>>>
>>>>> func(Tuple2[Key, Value]): List[Tuple2[MyCustomKey, MyCustomValue]] = {
>>>>>
>>>>> //
>>>>>
>>>>> println(list)
>>>>> list
>>>>> }
>>>>>
>>>>> Now if I check the list in the worker logs against the text file it
>>>>> has created, they differ.
>>>>>
>>>>> Only the number of records is the same; the actual records in the file
>>>>> differ from the ones in the logs.
>>>>>
>>>>> Does Spark modify keys/values in between? What other operations does
>>>>> it perform on the Key or Value?
>>>>>
>>>>> Thanks and Regards,
>>>>> Archit Thakur.
>>>>>
>>>>>
>>>>
>>>
>>
>
