Ok, so if you see

1, 3, 4, 6, ...

will you say that 2 and 5 are missing?
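
If that is what you mean, a distributed set difference would find the gaps
without any global counter: build the full expected range from min(id) to
max(id) and subtract the ids that actually occur. A rough, untested sketch
(assuming Spark 1.6, ids that fit in a long, and "path" as a placeholder for
your input):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SQLContext;
    import static org.apache.spark.sql.functions.*;

    public class FindGaps {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("find-gaps");
            JavaSparkContext sc = new JavaSparkContext(conf);
            SQLContext sqlContext = new SQLContext(sc);

            // One column of longs named "id" (assumes at least one record).
            DataFrame ids = sqlContext.read().json("path")
                    .select(col("id").cast("long").alias("id"));

            // Bounds of the sequence: only two longs reach the driver.
            Row bounds = ids.agg(min("id"), max("id")).first();
            long lo = bounds.getLong(0);
            long hi = bounds.getLong(1);

            // Expected range minus observed ids = the missing ids.
            // except() is a distributed set difference, so the billion
            // rows are never collected to the driver.
            DataFrame missing = sqlContext.range(lo, hi + 1).except(ids);
            missing.show();
        }
    }

For 1, 3, 4, 6 that prints 2 and 5.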

On Mon, Sep 19, 2016 at 4:15 PM, Sudhindra Magadi <smag...@gmail.com> wrote:

> Each of the records has a sequence id; there are no duplicates.
>
> On Mon, Sep 19, 2016 at 11:42 AM, ayan guha <guha.a...@gmail.com> wrote:
>
>> And how do you define a missing sequence? Can you give an example?
>>
>> On Mon, Sep 19, 2016 at 3:48 PM, Sudhindra Magadi <smag...@gmail.com>
>> wrote:
>>
>>> Hi Jörn,
>>>  We have a file with a billion records. We want to find out whether any
>>> sequence ids are missing, and if so, which ones.
>>> Thanks
>>> Sudhindra
>>>
>>> On Mon, Sep 19, 2016 at 11:12 AM, Jörn Franke <jornfra...@gmail.com>
>>> wrote:
>>>
>>>> I am not sure what you are trying to achieve here. Can you please tell
>>>> us what the goal of the program is, maybe with some example data?
>>>>
>>>> Besides this, I suspect it will fail as soon as it runs outside a
>>>> single-node scenario, because of the reference to the global counter
>>>> variable: each executor JVM gets its own copy of the static field, so
>>>> the counts diverge on a cluster.
>>>>
>>>> It is also unclear why you collect the data to the driver only to
>>>> parallelize it again.
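>>>>
>>>> If the goal really is gap detection, one way around both issues is to
>>>> derive the gaps from the ids themselves rather than from a shared
>>>> counter: every value x+1 that is not itself an id marks the start of a
>>>> gap. A rough, untested sketch of just that step, assuming the ids have
>>>> already been parsed into a JavaRDD<Long> called ids:
>>>>
>>>>     JavaRDD<Long> gapStarts = ids
>>>>             .map(new org.apache.spark.api.java.function.Function<Long, Long>() {
>>>>                 public Long call(Long x) {
>>>>                     return x + 1; // candidate successor of each id
>>>>                 }
>>>>             })
>>>>             .subtract(ids); // distributed set difference, no driver state
>>>>     // gapStarts holds the first missing value of every gap, plus
>>>>     // max(id)+1, which can be filtered out in one extra pass.
>>>>
>>>> Because nothing is shared between tasks, this behaves the same on a
>>>> cluster as on local[2].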
>>>>
>>>> On 18 Sep 2016, at 14:26, sudhindra <smag...@gmail.com> wrote:
>>>>
>>>> Hi, I have coded something like this; please tell me how bad it is.
>>>>
>>>> package Spark.spark;
>>>>
>>>> import java.util.List;
>>>>
>>>> import org.apache.spark.SparkConf;
>>>> import org.apache.spark.api.java.JavaRDD;
>>>> import org.apache.spark.api.java.JavaSparkContext;
>>>> import org.apache.spark.sql.DataFrame;
>>>> import org.apache.spark.sql.Row;
>>>> import org.apache.spark.sql.SQLContext;
>>>>
>>>> public class App
>>>> {
>>>>     // Driver-side mutable state: the mapper below reads and writes this
>>>>     // field, which only works while everything runs in a single JVM.
>>>>     static long counter = 1;
>>>>
>>>>     public static void main(String[] args)
>>>>     {
>>>>         SparkConf conf = new SparkConf()
>>>>                 .setAppName("sorter")
>>>>                 .setMaster("local[2]")
>>>>                 .set("spark.executor.memory", "1g");
>>>>         JavaSparkContext sc = new JavaSparkContext(conf);
>>>>         SQLContext sqlContext = new SQLContext(sc);
>>>>
>>>>         DataFrame df = sqlContext.read().json("path");
>>>>         DataFrame sortedDF = df.sort("id");
>>>>
>>>>         // Pull every record to the driver, then re-distribute it.
>>>>         JavaRDD<Row> distData = sc.parallelize(sortedDF.collectAsList());
>>>>
>>>>         List<String> missingNumbers = distData.map(
>>>>                 new org.apache.spark.api.java.function.Function<Row, String>() {
>>>>             public String call(Row arg0) throws Exception {
>>>>                 long id = Long.parseLong(arg0.getString(0));
>>>>                 if (counter != id) {
>>>>                     // Record every value between the expected counter
>>>>                     // and the id actually seen.
>>>>                     StringBuffer misses = new StringBuffer();
>>>>                     for (long n = counter; n != id; n++) {
>>>>                         misses.append(n).append(' ');
>>>>                     }
>>>>                     counter = id + 1;
>>>>                     return misses.toString();
>>>>                 }
>>>>                 counter++;
>>>>                 return null;
>>>>             }
>>>>         }).collect();
>>>>
>>>>         for (String name : missingNumbers) {
>>>>             if (name != null) {
>>>>                 System.out.println(name);
>>>>             }
>>>>         }
>>>>     }
>>>> }
>>>>
>>>
>>>
>>> --
>>> Thanks & Regards
>>> Sudhindra S Magadi
>>>
>>
>>
>>
>> --
>> Best Regards,
>> Ayan Guha
>>
>
>
>
> --
> Thanks & Regards
> Sudhindra S Magadi
>



-- 
Best Regards,
Ayan Guha
