>>>> it's stored on HDFS.
>>>>
>>>>
>>>>
>>>> Sent with Good (www.good.com)
>>>>
>>>>
>>>> -Original Message-
>>>> *From: *Michal Michalski [michal.michal...@boxever.com]
anelin, Ilya
>>> wrote:
>>>
>>>> If you're reading a file one line at a time, then you should simply use
>>>> Hadoop's Java FileSystem class to read the file with a BufferedInputStream.
>>>> I don't think you need an RDD here.
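A minimal sketch of that suggestion in Scala (the language of the spark-shell snippets in this thread), assuming hadoop-common is on the classpath. The object name SequentialLineReader, the method foreachLine and the example path are hypothetical. It swaps the BufferedInputStream for a BufferedReader over the Hadoop input stream, since BufferedReader provides readLine; lines come back in file order, numbered from 0, with no RDD involved.

```scala
import java.io.{BufferedReader, InputStreamReader}
import java.nio.charset.StandardCharsets

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical helper (not part of Hadoop or Spark): reads a file sequentially,
// one line at a time, on a single machine.
object SequentialLineReader {
  // Calls handle(lineNumber, line) for every line, in file order; numbering starts at 0.
  def foreachLine(pathStr: String, conf: Configuration = new Configuration())(
      handle: (Long, String) => Unit): Unit = {
    val path = new Path(pathStr)
    // Picks the FileSystem implementation (HDFS, local, ...) from the path's scheme and conf.
    val fs = path.getFileSystem(conf)
    val reader = new BufferedReader(
      new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))
    try {
      var lineNo = 0L
      var line = reader.readLine()
      while (line != null) {
        handle(lineNo, line)
        lineNo += 1
        line = reader.readLine()
      }
    } finally reader.close()
  }
}
```

For example, SequentialLineReader.foreachLine("hdfs:///user/me/mytextFile") { (i, line) => println(s"$i\t$line") } would print each line with its position (the path is a placeholder). Because this runs on one machine, it gives up Spark's parallelism in exchange for a guaranteed order.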
---
>> *From: *Michal Michalski [michal.michal...@boxever.com]
>> *Sent: *Friday, April 24, 2015 11:18 AM Eastern Standard Time
>> *To: *Ganelin, Ilya
>> *Cc: *Spico Florin; user
>> *Subject: *Re: Does HadoopRDD.zipWithIndex method preserve the order of
>> the input d
>>
>>
>> -Original Message-
>> *From: *Michal Michalski [michal.michal...@boxever.com]
>> *Sent: *Friday, April 24, 2015 11:04 AM Eastern Standard Time
>> *To: *Ganelin, Ilya
>> *Cc: *Spico Florin; user
>> *Subject: *
>>
>>
>>
>> -Original Message-----
>> *From: *Michal Michalski [michal.michal...@boxever.com]
>> *Sent: *Friday, April 24, 2015 10:41 AM Eastern Standard Time
>> *To: *Spico Florin
>> *Cc: *user
>> *Subject: *Re: Does HadoopRDD.zipWi
...@boxever.com<mailto:michal.michal...@boxever.com>]
Sent: Friday, April 24, 2015 11:04 AM Eastern Standard Time
To: Ganelin, Ilya
Cc: Spico Florin; user
Subject: Re: Does HadoopRDD.zipWithIndex method preserve the order of the input
data from Hadoop?
The problem I'm facing is that I need t
< 50).sortBy(_._1)
, 2015 10:41 AM Eastern Standard Time
To: Spico Florin
Cc: user
Subject: Re: Does HadoopRDD.zipWithIndex method preserve the order of the input
data from Hadoop?
Of course after you do it, you probably want to call repartition(somevalue) on
your RDD to "get your parallelism back".
Another issue is that HadoopRDD (which sc.textFile uses) might split input
files, and even if it doesn't split them, it doesn't guarantee that part-file
numbers go to the corresponding partition numbers in the RDD. E.g., part-0
could go to partition 27.
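If the order matters, one way to avoid depending on how part files and splits land in partitions is to carry an explicit ordering key with every record and sort on it. The plain-Scala sketch below (no Spark) models that idea under stated assumptions: each record is assumed to carry its source file name and the byte offset of its line. Hadoop's TextInputFormat does key lines by byte offset, but attaching the file name is an extra assumption, and the Record type, the RestoreOrder object and the sample data are hypothetical.

```scala
// Hypothetical record: where a line came from and its byte offset within that file.
final case class Record(file: String, offset: Long, line: String)

object RestoreOrder {
  // Hypothetical partitions as they might be handed back: part files and splits out of order.
  val partitions: Seq[Seq[Record]] = Seq(
    Seq(Record("part-00001", 0L, "c"), Record("part-00001", 2L, "d")),
    Seq(Record("part-00000", 2L, "b")),
    Seq(Record("part-00000", 0L, "a"))
  )

  // Flattening keeps whatever partition order we were given, which is not file order.
  val asEncountered: Seq[String] = partitions.flatten.map(_.line)

  // Sorting on (file, offset) recovers file order regardless of how records were partitioned.
  val inFileOrder: Seq[String] =
    partitions.flatten.sortBy(r => (r.file, r.offset)).map(_.line)

  def main(args: Array[String]): Unit = {
    println(asEncountered) // List(c, d, b, a)
    println(inFileOrder)   // List(a, b, c, d)
  }
}
```

Sorting file names as plain strings only matches numeric order when names are zero-padded (part-00000, part-00001, ...). On an RDD the equivalent would be a sortBy on the same key.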
On Apr 24, 2015 7:41 AM, "Michal Michalski" wrote:
The order of elements in an RDD is in general not guaranteed unless
you sort. You shouldn't expect to encounter the partitions of an RDD
in any particular order.
In practice, you probably find the partitions come up in the order
Hadoop presents them in this case. And within a partition, in this
ca
Kind regards,
Michał Michalski,
michal.michal...@boxever.com
On 24 April 2015 at 15:28, Michal Michalski
wrote:
I did a quick test as I was curious about it too. I created a file with
numbers from 0 to 999, in order, line by line. Then I did:
scala> val numbers = sc.textFile("./numbers.txt")
scala> val zipped = numbers.zipWithUniqueId
scala> zipped.foreach(i => println(i))
Expected result if the order was
zipWithIndex will preserve whatever order is already in your val lines.
I am not sure whether val lines = sc.textFile("hdfs://mytextFile") maintains
the order, but if that line does, the next one will maintain it for sure.
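To illustrate how the numbering in the zipWithUniqueId test above differs from zipWithIndex, here is a plain-Scala model (no Spark) of the two schemes as the Spark API docs describe them: zipWithIndex numbers items by partition index and then by position within the partition, while zipWithUniqueId gives the items in partition k the ids k, n+k, 2n+k, ... for n partitions. The partition contents are hypothetical, and the model assumes the partitions already arrive in file order, which, per this thread, is what you tend to see in practice rather than a guarantee.

```scala
object IndexSchemes {
  // Hypothetical: seven lines ("0".."6") spread over three partitions, in file order.
  val partitions: Vector[Vector[String]] =
    Vector(Vector("0", "1", "2"), Vector("3", "4"), Vector("5", "6"))

  // zipWithIndex model: offset each partition by the total size of the partitions before it.
  def zipWithIndexModel[A](parts: Vector[Vector[A]]): Vector[(A, Long)] = {
    val offsets = parts.map(_.size.toLong).scanLeft(0L)(_ + _)
    parts.zipWithIndex.flatMap { case (part, k) =>
      part.zipWithIndex.map { case (a, i) => (a, offsets(k) + i) }
    }
  }

  // zipWithUniqueId model: item i of partition k gets id i * n + k (n = number of partitions).
  def zipWithUniqueIdModel[A](parts: Vector[Vector[A]]): Vector[(A, Long)] = {
    val n = parts.size.toLong
    parts.zipWithIndex.flatMap { case (part, k) =>
      part.zipWithIndex.map { case (a, i) => (a, i * n + k) }
    }
  }

  def main(args: Array[String]): Unit = {
    println(zipWithIndexModel(partitions).map(_._2))    // Vector(0, 1, 2, 3, 4, 5, 6)
    println(zipWithUniqueIdModel(partitions).map(_._2)) // Vector(0, 3, 6, 1, 4, 2, 5)
  }
}
```

In this model, zipWithUniqueId ids are unique but not consecutive in line order, whereas zipWithIndex ids line up with line numbers only as long as the partitions themselves come back in file order.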
On 24 April 2015 at 18:35, Spico Florin wrote:
Hello!
I know that HadoopRDD partitions are built based on the number of splits
in HDFS. I'm wondering whether these partitions preserve the initial order of
the data in the file.
As an example, if I have an HDFS file (myTextFile) that has these splits:
split 0-> line 1, ..., line k
split 1->line k+1,..., li