And how do you define a missing sequence? Can you give an example?
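If "missing sequence" means gaps in an otherwise consecutive run of integer ids, note that the gap check itself needs no shared counter: each pair of adjacent ids can be tested independently. A minimal plain-Java sketch (class and method names here are hypothetical; in Spark the same logic could run per sorted partition, or you could first compare count() against max(id) - min(id) + 1 to detect whether any gap exists at all):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MissingSeq {
    // Given ids sorted ascending, return every value that is absent
    // between consecutive entries -- the "missing sequence" values.
    static List<Long> findMissing(List<Long> sortedIds) {
        List<Long> missing = new ArrayList<>();
        for (int i = 1; i < sortedIds.size(); i++) {
            for (long v = sortedIds.get(i - 1) + 1; v < sortedIds.get(i); v++) {
                missing.add(v);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // ids 3, 4 and 6 are absent from the run 1..7
        System.out.println(findMissing(Arrays.asList(1L, 2L, 5L, 7L)));
    }
}
```

Because each adjacent pair is checked on its own, this parallelizes with no global state; only the ids at partition boundaries would need a second pass.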

On Mon, Sep 19, 2016 at 3:48 PM, Sudhindra Magadi <smag...@gmail.com> wrote:

> Hi Jörn,
>  We have a file with a billion records. We want to find if there are any
> missing sequences here. If so, what are they?
> Thanks
> Sudhindra
>
> On Mon, Sep 19, 2016 at 11:12 AM, Jörn Franke <jornfra...@gmail.com>
> wrote:
>
>> I am not sure what you are trying to achieve here. Can you please tell us
>> what the goal of the program is, maybe with some example data?
>>
>> Besides this, I have the feeling that it will fail once it is not run in a
>> single-node scenario, due to the reference to the global counter variable.
>>
>> It is also unclear why you collect the data first only to parallelize it again.
>>
>> On 18 Sep 2016, at 14:26, sudhindra <smag...@gmail.com> wrote:
>>
>> Hi, I have coded something like this; please tell me how bad it is.
>>
>> package Spark.spark;
>>
>> import java.util.List;
>>
>> import org.apache.spark.SparkConf;
>> import org.apache.spark.api.java.JavaRDD;
>> import org.apache.spark.api.java.JavaSparkContext;
>> import org.apache.spark.sql.DataFrame;
>> import org.apache.spark.sql.Row;
>> import org.apache.spark.sql.SQLContext;
>>
>> public class App
>> {
>>     static long counter = 1;
>>
>>     public static void main(String[] args)
>>     {
>>         SparkConf conf = new SparkConf().setAppName("sorter")
>>                 .setMaster("local[2]").set("spark.executor.memory", "1g");
>>         JavaSparkContext sc = new JavaSparkContext(conf);
>>
>>         SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);
>>
>>         DataFrame df = sqlContext.read().json("path");
>>         DataFrame sortedDF = df.sort("id");
>>         //df.show();
>>         //sortedDF.printSchema();
>>
>>         System.out.println(sortedDF.collectAsList().toString());
>>         JavaRDD<Row> distData = sc.parallelize(sortedDF.collectAsList());
>>
>>         List<String> missingNumbers = distData.map(
>>                 new org.apache.spark.api.java.function.Function<Row, String>() {
>>             public String call(Row arg0) throws Exception {
>>                 if (counter != new Integer(arg0.getString(0)).intValue())
>>                 {
>>                     StringBuffer misses = new StringBuffer();
>>                     long newCounter = counter;
>>                     while (newCounter != new Integer(arg0.getString(0)).intValue())
>>                     {
>>                         misses.append(new Integer((int) counter).toString());
>>                         newCounter++;
>>                     }
>>                     counter = new Integer(arg0.getString(0)).intValue() + 1;
>>                     return misses.toString();
>>                 }
>>                 counter++;
>>                 return null;
>>             }
>>         }).collect();
>>
>>         for (String name : missingNumbers) {
>>             System.out.println(name);
>>         }
>>     }
>> }
>>
>>
>>
>>
>> --
>> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/filling-missing-values-in-a-sequence-tp5708p27748.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>
>
>
> --
> Thanks & Regards
> Sudhindra S Magadi
>



-- 
Best Regards,
Ayan Guha
