Hi Jörn,
We have a file with a billion records. We want to find out whether there are any
missing sequence numbers in it and, if so, which ones.
Thanks
Sudhindra
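For reference, the core check can be sketched on a single machine as one pass over the ids in ascending order. This is a minimal sketch and not the code from this thread: the class MissingIds, the method missingRanges, and the assumption that the sequence should start at a known first value (1 here, matching the counter in the quoted code) are all hypothetical. Reporting inclusive ranges rather than individual numbers keeps the output small when a gap is large.

```java
import java.util.ArrayList;
import java.util.List;

/** Finds gaps in an ascending sequence of ids (single-machine sketch). */
public class MissingIds {

    /**
     * Returns inclusive [from, to] ranges of ids missing from sortedIds,
     * assuming the sequence should start at expectedFirst and that
     * sortedIds is ascending (duplicates are tolerated).
     */
    public static List<long[]> missingRanges(long[] sortedIds, long expectedFirst) {
        List<long[]> gaps = new ArrayList<>();
        long expected = expectedFirst; // next id we expect to see
        for (long id : sortedIds) {
            if (id > expected) {
                // Everything between the expected id and this id is missing.
                gaps.add(new long[] {expected, id - 1});
            }
            // Duplicates must not move the expectation backwards.
            expected = Math.max(expected, id + 1);
        }
        return gaps;
    }

    public static void main(String[] args) {
        long[] ids = {1, 2, 5, 6, 6, 9};
        for (long[] g : missingRanges(ids, 1)) {
            System.out.println(g[0] + "-" + g[1]); // prints 3-4, then 7-8
        }
    }
}
```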

On Mon, Sep 19, 2016 at 11:12 AM, Jörn Franke <jornfra...@gmail.com> wrote:

> I am not sure what you are trying to achieve here. Can you please tell us
> what the goal of the program is, perhaps with some example data?
>
> Besides this, I have the feeling that it will fail once it is no longer run
> on a single node, because of the reference to the global (static) counter
> variable: each executor JVM gets its own copy, so the updates are not shared.
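One common way to avoid a shared counter is a two-phase search: each partition of the globally sorted ids finds the gaps inside itself using only local data, and a small second phase compares each partition's last id with the next partition's first id. The sketch below only simulates partitions with plain Java arrays; the names PartitionedGaps, Summary, summarize and merge are hypothetical. In Spark this would roughly correspond to sortBy followed by mapPartitions for phase 1 and collecting just the per-partition summaries for phase 2, but that wiring is an assumption, not tested code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Two-phase gap search without a shared counter (a simulation: each long[]
 * stands in for one partition of the globally sorted ids, in sort order).
 */
public class PartitionedGaps {

    /** What phase 1 reports for one partition: its boundary ids and inner gaps. */
    public static final class Summary {
        public final long first;
        public final long last;
        public final List<long[]> gaps;

        Summary(long first, long last, List<long[]> gaps) {
            this.first = first;
            this.last = last;
            this.gaps = gaps;
        }
    }

    /** Phase 1: reads only this partition, so partitions can run in parallel. */
    public static Summary summarize(long[] sortedPartition) {
        if (sortedPartition.length == 0) {
            return null; // empty partitions contribute nothing
        }
        List<long[]> gaps = new ArrayList<>();
        for (int i = 1; i < sortedPartition.length; i++) {
            long prev = sortedPartition[i - 1];
            long cur = sortedPartition[i];
            if (cur > prev + 1) {
                gaps.add(new long[] {prev + 1, cur - 1}); // inclusive range
            }
        }
        return new Summary(sortedPartition[0], sortedPartition[sortedPartition.length - 1], gaps);
    }

    /** Phase 2: stitch the partitions; only first/last ids cross a boundary. */
    public static List<long[]> merge(List<Summary> summaries, long expectedFirst) {
        List<long[]> all = new ArrayList<>();
        long expected = expectedFirst; // next id expected after the previous partition
        for (Summary s : summaries) {
            if (s == null) {
                continue; // skip empty partitions
            }
            if (s.first > expected) {
                all.add(new long[] {expected, s.first - 1}); // gap between partitions
            }
            all.addAll(s.gaps);
            expected = Math.max(expected, s.last + 1);
        }
        return all;
    }

    public static void main(String[] args) {
        List<Summary> parts = Arrays.asList(
                summarize(new long[] {1, 2, 3}),
                summarize(new long[] {5, 6}),
                summarize(new long[] {6, 9, 10}));
        for (long[] g : merge(parts, 1)) {
            System.out.println(g[0] + "-" + g[1]); // prints 4-4, then 7-8
        }
    }
}
```

Only two ids per partition cross the network in phase 2, so the per-row state that the static counter tried to carry is no longer needed.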
>
> It is also unclear why you collect the data first, only to parallelize it
> again.
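Whether anything is missing at all can be answered without moving any rows to the driver: if the ids are expected to cover min..max, the number of missing ids is (max - min + 1) - distinctCount. If the sequence must start at 1, use 1 instead of min. In Spark SQL the three inputs could come from a single aggregation using min, max and countDistinct from org.apache.spark.sql.functions; the plain-Java sketch below only shows the arithmetic, and MissingCount and missingCount are hypothetical names. A result of 0 means there is nothing to list, so the more expensive gap search can be skipped.

```java
import java.util.Arrays;

public class MissingCount {

    /**
     * Number of ids missing from [low, high], given how many distinct ids
     * were actually seen in that range.
     */
    public static long missingCount(long low, long high, long distinctCount) {
        return (high - low + 1) - distinctCount;
    }

    public static void main(String[] args) {
        long[] ids = {1, 2, 5, 6, 6, 9};
        // In Spark these three values could come from one aggregation instead.
        long min = Arrays.stream(ids).min().getAsLong();
        long max = Arrays.stream(ids).max().getAsLong();
        long distinct = Arrays.stream(ids).distinct().count();
        System.out.println(missingCount(min, max, distinct)); // prints 4 (3, 4, 7, 8)
    }
}
```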
>
> On 18 Sep 2016, at 14:26, sudhindra <smag...@gmail.com> wrote:
>
> Hi, I have coded something like this; please tell me how bad it is.
>
> package Spark.spark;
>
> import java.util.List;
>
> import org.apache.spark.SparkConf;
> import org.apache.spark.api.java.JavaRDD;
> import org.apache.spark.api.java.JavaSparkContext;
> import org.apache.spark.api.java.function.Function;
> import org.apache.spark.sql.DataFrame;
> import org.apache.spark.sql.Row;
> import org.apache.spark.sql.SQLContext;
>
> public class App
> {
>    // Static state is per JVM: this only behaves in local mode, and only
>    // because the data is processed below as a single partition.
>    static long counter = 1;
>
>    public static void main(String[] args)
>    {
>        SparkConf conf = new SparkConf()
>                .setAppName("sorter")
>                .setMaster("local[2]")
>                .set("spark.executor.memory", "1g");
>        JavaSparkContext sc = new JavaSparkContext(conf);
>        SQLContext sqlContext = new SQLContext(sc);
>
>        DataFrame df = sqlContext.read().json("path");
>        // Keep only the id, as a number, so "10" sorts after "9";
>        // rows with a null or non-numeric id are dropped.
>        DataFrame ids = df.select(df.col("id").cast("long").as("id")).na().drop();
>        DataFrame sortedDF = ids.sort("id");
>
>        // Brings every row to the driver, then redistributes it as ONE
>        // partition so that a single task visits the rows in order.
>        List<Row> rows = sortedDF.collectAsList();
>        JavaRDD<Row> distData = sc.parallelize(rows, 1);
>
>        List<String> missingNumbers = distData.map(new Function<Row, String>() {
>            public String call(Row row) throws Exception {
>                long id = row.getLong(0);
>                if (counter < id) {
>                    // Every value from counter up to id - 1 is missing.
>                    StringBuilder misses = new StringBuilder();
>                    for (long missing = counter; missing < id; missing++) {
>                        misses.append(missing).append(' ');
>                    }
>                    counter = id + 1;
>                    return misses.toString().trim();
>                }
>                // No gap before this row (duplicates do not move counter back).
>                counter = Math.max(counter, id + 1);
>                return null;
>            }
>        }).filter(new Function<String, Boolean>() {
>            public Boolean call(String misses) {
>                return misses != null; // keep only rows that found a gap
>            }
>        }).collect();
>
>        for (String name : missingNumbers) {
>            System.out.println(name);
>        }
>    }
> }
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/filling-missing-values-in-a-sequence-tp5708p27748.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.


-- 
Thanks & Regards
Sudhindra S Magadi
