Hello! I know that HadoopRDD partitions are built based on the number of splits in HDFS. I'm wondering whether these partitions preserve the original order of the data in the file. As an example, suppose I have an HDFS file (myTextFile) that has these splits:
split 0 -> line 1, ..., line k
split 1 -> line k+1, ..., line k+n
split 2 -> line k+n+1, ..., line k+n+m

and the code

val lines = sc.textFile("hdfs://mytextFile")
lines.zipWithIndex()

will the order of the lines be preserved, i.e. (line 1, zipIndex 1), ..., (line k, zipIndex k), and so on?

I found this question on Stack Overflow (http://stackoverflow.com/questions/26046410/how-can-i-obtain-an-element-position-in-sparks-rdd) whose answer intrigued me:

"Essentially, RDD's zipWithIndex() method seems to do this, but it won't preserve the original ordering of the data the RDD was created from"

Can you please confirm whether this is the correct answer?

Thanks.
Florin