Re: Does HadoopRDD.zipWithIndex method preserve the order of the input data from Hadoop?

2015-04-24 Thread Jeetendra Gangele
>>>> it's stored on HDFS. >>>> >>>> >>>> >>>> Sent with Good (www.good.com) >>>> >>>> >>>> -Original Message- >>>> *From: *Michal Michalski [michal.michal...@boxever.com] >>>>

Re: Does HadoopRDD.zipWithIndex method preserve the order of the input data from Hadoop?

2015-04-24 Thread Michal Michalski
anelin, Ilya >>> wrote: >>> >>>> If you're reading a file line by line then you should simply use Java's >>>> Hadoop FileSystem class to read the file with a BufferedInputStream. I don't >>>> think you need an RDD here. >>>

Re: Does HadoopRDD.zipWithIndex method preserve the order of the input data from Hadoop?

2015-04-24 Thread Jeetendra Gangele
--- >> *From: *Michal Michalski [michal.michal...@boxever.com] >> *Sent: *Friday, April 24, 2015 11:18 AM Eastern Standard Time >> *To: *Ganelin, Ilya >> *Cc: *Spico Florin; user >> *Subject: *Re: Does HadoopRDD.zipWithIndex method preserve the order of >> the input d

Re: Does HadoopRDD.zipWithIndex method preserve the order of the input data from Hadoop?

2015-04-24 Thread Michal Michalski
with Good (www.good.com) >> >> >> -Original Message- >> *From: *Michal Michalski [michal.michal...@boxever.com] >> *Sent: *Friday, April 24, 2015 11:04 AM Eastern Standard Time >> *To: *Ganelin, Ilya >> *Cc: *Spico Florin; user >> *Subject: *

Re: Does HadoopRDD.zipWithIndex method preserve the order of the input data from Hadoop?

2015-04-24 Thread Jeetendra Gangele
essage- > *From: *Michal Michalski [michal.michal...@boxever.com] > *Sent: *Friday, April 24, 2015 11:18 AM Eastern Standard Time > *To: *Ganelin, Ilya > *Cc: *Spico Florin; user > *Subject: *Re: Does HadoopRDD.zipWithIndex method preserve the order of > the input data from Ha

RE: Does HadoopRDD.zipWithIndex method preserve the order of the input data from Hadoop?

2015-04-24 Thread Ganelin, Ilya
ssage- From: Michal Michalski [michal.michal...@boxever.com<mailto:michal.michal...@boxever.com>] Sent: Friday, April 24, 2015 11:18 AM Eastern Standard Time To: Ganelin, Ilya Cc: Spico Florin; user Subject: Re: Does HadoopRDD.zipWithIndex method preserve the order of the input data from

Re: Does HadoopRDD.zipWithIndex method preserve the order of the input data from Hadoop?

2015-04-24 Thread Michal Michalski
ww.good.com) >> >> >> >> -Original Message----- >> *From: *Michal Michalski [michal.michal...@boxever.com] >> *Sent: *Friday, April 24, 2015 10:41 AM Eastern Standard Time >> *To: *Spico Florin >> *Cc: *user >> *Subject: *Re: Does HadoopRDD.zipWi

RE: Does HadoopRDD.zipWithIndex method preserve the order of the input data from Hadoop?

2015-04-24 Thread Ganelin, Ilya
...@boxever.com<mailto:michal.michal...@boxever.com>] Sent: Friday, April 24, 2015 11:04 AM Eastern Standard Time To: Ganelin, Ilya Cc: Spico Florin; user Subject: Re: Does HadoopRDD.zipWithIndex method preserve the order of the input data from Hadoop? The problem I'm facing is that I need t

Re: Does HadoopRDD.zipWithIndex method preserve the order of the input data from Hadoop?

2015-04-24 Thread Michal Michalski
nt with Good (www.good.com) >> >> >> >> -Original Message- >> *From: *Michal Michalski [michal.michal...@boxever.com] >> *Sent: *Friday, April 24, 2015 10:41 AM Eastern Standard Time >> *To: *Spico Florin >> *Cc: *user >> *Subject:

Re: Does HadoopRDD.zipWithIndex method preserve the order of the input data from Hadoop?

2015-04-24 Thread Michal Michalski
< 50).sortBy(_._1) > > > > Sent with Good (www.good.com) > > > > -Original Message- > *From: *Michal Michalski [michal.michal...@boxever.com] > *Sent: *Friday, April 24, 2015 10:41 AM Eastern Standard Time > *To: *Spico Florin > *Cc: *user > *Subject: *Re: Does Had

RE: Does HadoopRDD.zipWithIndex method preserve the order of the input data from Hadoop?

2015-04-24 Thread Ganelin, Ilya
, 2015 10:41 AM Eastern Standard Time To: Spico Florin Cc: user Subject: Re: Does HadoopRDD.zipWithIndex method preserve the order of the input data from Hadoop? Of course after you do it, you probably want to call repartition(somevalue) on your RDD to "get your parallelism back".

Re: Does HadoopRDD.zipWithIndex method preserve the order of the input data from Hadoop?

2015-04-24 Thread Imran Rashid
Another issue is that HadoopRDD (which sc.textFile uses) might split input files, and even if it doesn't split them, it doesn't guarantee that part-file numbers map to the corresponding partition numbers in the RDD. E.g. part-0 could go to partition 27. On Apr 24, 2015 7:41 AM, "Michal Michalski" wrote

Re: Does HadoopRDD.zipWithIndex method preserve the order of the input data from Hadoop?

2015-04-24 Thread Sean Owen
The order of elements in an RDD is in general not guaranteed unless you sort. You shouldn't expect to encounter the partitions of an RDD in any particular order. In practice, you probably find the partitions come up in the order Hadoop presents them in this case. And within a partition, in this ca
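The point about sorting can be illustrated without a cluster. The following is a minimal sketch in plain Scala (no Spark), assuming each line has already been paired with its position while the order was still known. The names SortByIndexModel and restoreOrder are hypothetical, and the shuffle merely stands in for partitions being encountered in an arbitrary order.

```scala
import scala.util.Random

object SortByIndexModel {
  // Restore the original sequence by sorting on the explicit index,
  // instead of relying on the order in which elements happen to arrive.
  def restoreOrder[A](indexed: Seq[(Long, A)]): Seq[A] =
    indexed.sortBy(_._1).map(_._2)

  def main(args: Array[String]): Unit = {
    val lines = (0 until 10).map(i => s"line $i")
    // Attach the position while the order is still known.
    val indexed = lines.zipWithIndex.map { case (line, i) => (i.toLong, line) }
    // Stand-in for elements arriving in an arbitrary order.
    val shuffled = new Random(42).shuffle(indexed)
    // Keep the first five lines, then sort explicitly by index.
    val firstFive = restoreOrder(shuffled.filter(_._1 < 5))
    println(firstFive.mkString(", "))
  }
}
```

On a real RDD the same shape would be a filter on the index followed by sortBy on it. Whether the index itself reflects file order is the open question of this thread.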

Re: Does HadoopRDD.zipWithIndex method preserve the order of the input data from Hadoop?

2015-04-24 Thread Michal Michalski
Of course after you do it, you probably want to call repartition(somevalue) on your RDD to "get your parallelism back". Kind regards, Michał Michalski, michal.michal...@boxever.com On 24 April 2015 at 15:28, Michal Michalski wrote: > I did a quick test as I was curious about it too. I created a

Re: Does HadoopRDD.zipWithIndex method preserve the order of the input data from Hadoop?

2015-04-24 Thread Michal Michalski
I did a quick test as I was curious about it too. I created a file with numbers from 0 to 999, in order, line by line. Then I did: scala> val numbers = sc.textFile("./numbers.txt") scala> val zipped = numbers.zipWithUniqueId scala> zipped.foreach(i => println(i)) Expected result if the order was
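Note that the test above calls zipWithUniqueId, not zipWithIndex, and the two number elements differently. A minimal sketch in plain Scala (no Spark) of the numbering schemes as described in the RDD API documentation: zipWithIndex assigns consecutive indices partition by partition, while zipWithUniqueId gives the items of the k-th of n partitions the ids k, n+k, 2n+k, and so on. The object name, function names and the three-partition layout are assumptions made for illustration only.

```scala
object ZipIdModel {
  // Model of zipWithIndex: consecutive indices, partition by partition.
  def zipWithIndexModel[A](partitions: Seq[Seq[A]]): Seq[Seq[(A, Long)]] = {
    // Starting offset of a partition = total size of all earlier partitions.
    val offsets = partitions.map(_.size.toLong).scanLeft(0L)(_ + _)
    partitions.zip(offsets).map { case (part, start) =>
      part.zipWithIndex.map { case (x, i) => (x, start + i) }
    }
  }

  // Model of zipWithUniqueId: item i of partition k gets id i * n + k,
  // where n is the number of partitions.
  def zipWithUniqueIdModel[A](partitions: Seq[Seq[A]]): Seq[Seq[(A, Long)]] = {
    val n = partitions.size.toLong
    partitions.zipWithIndex.map { case (part, k) =>
      part.zipWithIndex.map { case (x, i) => (x, i * n + k) }
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical layout: six elements spread over three partitions.
    val partitions = Seq(Seq("a", "b", "c"), Seq("d", "e"), Seq("f"))
    println(zipWithIndexModel(partitions).flatten)
    println(zipWithUniqueIdModel(partitions).flatten)
  }
}
```

With more than one partition, the ids from zipWithUniqueId are unique but do not correspond to line positions, so they cannot be used on their own to check whether file order was preserved. Separately, foreach runs on the executors, so the order of its printed output says little about element order within the RDD.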

Re: Does HadoopRDD.zipWithIndex method preserve the order of the input data from Hadoop?

2015-04-24 Thread Jeetendra Gangele
zipWithIndex will preserve whatever order is there in your val lines. I am not sure about the "val lines=sc.textFile("hdfs://mytextFile") "; if this line maintains the order, the next will maintain it for sure. On 24 April 2015 at 18:35, Spico Florin wrote: > Hello! > I know that HadoopRDD partiti

Does HadoopRDD.zipWithIndex method preserve the order of the input data from Hadoop?

2015-04-24 Thread Spico Florin
Hello! I know that HadoopRDD partitions are built based on the number of splits in HDFS. I'm wondering if these partitions preserve the initial order of the data in the file. As an example, if I have an HDFS file (myTextFile) that has these splits: split 0 -> line 1, ..., line k split 1 -> line k+1,..., li