> Now, you can use an RDD operation to run this function on each partition:
>
> >>> r1 = r.mapPartitions(f)
>
> Now you have the missing values local to each partition, and you can write
> them out to a file.
>
> On Mon, Sep 19, 2016 at 4:39 PM, Sudhindra Magadi
> wrote:
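A sketch of what the per-partition function passed to `mapPartitions` could look like in Java (the code later in this thread is Java), assuming records arrive sorted by sequence id within each partition; the class and method names here are illustrative, not from the original code:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative per-partition gap finder: given an iterator of sequence ids
// sorted in ascending order, return every id missing between consecutive pairs.
public class LocalGaps {
    public static List<Long> missingInPartition(Iterator<Long> ids) {
        List<Long> missing = new ArrayList<>();
        if (!ids.hasNext()) {
            return missing;
        }
        long prev = ids.next();
        while (ids.hasNext()) {
            long cur = ids.next();
            for (long m = prev + 1; m < cur; m++) {
                missing.add(m); // every id strictly between prev and cur is absent
            }
            prev = cur;
        }
        return missing;
    }
}
```

Note that this only finds gaps inside a partition; a gap that falls between the last id of one partition and the first id of the next is invisible to both partitions, so the partition boundaries (min/max per partition) have to be compared separately.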
That is correct.
On Mon, Sep 19, 2016 at 12:09 PM, ayan guha wrote:
> Ok, so if you see
>
> 1,3,4,6.
>
> Will you say 2,5 are missing?
>
> On Mon, Sep 19, 2016 at 4:15 PM, Sudhindra Magadi
> wrote:
>
Each record will have a sequence id; no duplicates.
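The definition agreed above (ids 1, 3, 4, 6 imply 2 and 5 are missing) can be written down directly: a missing id is any id in [min, max] that was never observed. A single-machine sketch of that definition, with illustrative names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Illustrative definition of "missing": every id in [min, max] not observed.
public class MissingIds {
    public static List<Long> missing(Iterable<Long> observed) {
        SortedSet<Long> seen = new TreeSet<>();
        for (Long id : observed) {
            seen.add(id);
        }
        List<Long> out = new ArrayList<>();
        if (seen.isEmpty()) {
            return out;
        }
        for (long i = seen.first(); i <= seen.last(); i++) {
            if (!seen.contains(i)) {
                out.add(i); // in range but never observed
            }
        }
        return out;
    }
}
```

Holding a billion ids in one in-memory set obviously will not scale, which is exactly why the thread moves to a distributed, per-partition approach.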
On Mon, Sep 19, 2016 at 11:42 AM, ayan guha wrote:
> And how do you define missing sequence? Can you give an example?
>
> On Mon, Sep 19, 2016 at 3:48 PM, Sudhindra Magadi
> wrote:
>
Hi Jörn,
We have a file with a billion records. We want to find whether any sequence
ids are missing and, if so, which ones.
Thanks
Sudhindra
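With distinct ids, whether anything is missing at all can be decided from three cheap aggregates (count, min, max) before computing the actual gaps; in Spark each of these is a single action over the RDD. A sketch, with illustrative names:

```java
// Illustrative existence check: with no duplicates, the range [min, max]
// is complete exactly when it contains count ids, i.e. count == max - min + 1.
// In Spark, count/min/max are cheap single-pass aggregations.
public class GapCheck {
    public static boolean hasMissing(long count, long min, long max) {
        return count != (max - min + 1);
    }
}
```

This answers "are there missing sequences?" in one pass over a billion records; only if it returns true do you need the heavier per-partition scan to find which ids are missing.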
On Mon, Sep 19, 2016 at 11:12 AM, Jörn Franke wrote:
> I am not sure what you try to achieve here. Can you please tell us what
> the goal of the program is?
Hi, I have coded something like this; please tell me how bad it is.
package Spark.spark;

import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
// Note: for RDD transformations, Spark's Java API expects
// org.apache.spark.api.java.function.Function rather than java.util.function.Function.
import org.apache.spark.api.java.function.Function;
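The posted program is cut off after the imports, so here is a hedged sketch of the one step the mapPartitions approach still needs: per-partition results only cover gaps inside a partition, so the driver must also compare each partition's max with the next partition's min (assuming the data is range-partitioned and sorted; all names below are illustrative, not from the original code):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative driver-side merge: each partition reports (min, max, localMissing);
// gaps that fall between consecutive partitions are recovered by comparing boundaries.
public class GlobalGaps {
    public static class PartitionSummary {
        public final long min;
        public final long max;
        public final List<Long> localMissing;

        public PartitionSummary(long min, long max, List<Long> localMissing) {
            this.min = min;
            this.max = max;
            this.localMissing = localMissing;
        }
    }

    // Summaries must be ordered by range (partition 0 holds the smallest ids, etc.).
    public static List<Long> merge(List<PartitionSummary> summaries) {
        List<Long> missing = new ArrayList<>();
        for (int i = 0; i < summaries.size(); i++) {
            missing.addAll(summaries.get(i).localMissing);
            if (i + 1 < summaries.size()) {
                // ids strictly between this partition's max and the next one's min
                for (long m = summaries.get(i).max + 1; m < summaries.get(i + 1).min; m++) {
                    missing.add(m);
                }
            }
        }
        return missing;
    }
}
```

The per-partition summaries are small (two longs plus the local gap list), so collecting them to the driver is cheap even when the underlying file has a billion records.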