That is correct.
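For a billion rows, the static counter in the program quoted below only works while everything runs in a single task, so instead of walking the sorted rows one by one you could turn this into a set difference: generate the expected sequence between the observed min and max ids on the cluster and subtract the ids that actually occur. A rough, untested sketch, not the original program - it assumes Spark 1.4+ (for SQLContext.range), that the JSON records have an "id" field that can be cast to long, and that the sequence is supposed to be contiguous between the observed min and max; "path" is a placeholder:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
import static org.apache.spark.sql.functions.*;

public class MissingIds
{
    public static void main(String[] args)
    {
        // master and executor memory are left to spark-submit
        SparkConf conf = new SparkConf().setAppName("missing-ids");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        // the sample program reads the ids as strings, so cast them to long
        DataFrame ids = sqlContext.read().json("path")
                .select(col("id").cast("long").alias("id"));

        // observed bounds of the sequence
        Row bounds = ids.agg(min("id"), max("id")).first();
        long lo = bounds.getLong(0);
        long hi = bounds.getLong(1);

        // expected sequence lo..hi, generated on the cluster,
        // minus the ids that are actually present
        DataFrame expected = sqlContext.range(lo, hi + 1);
        DataFrame missing = expected.except(ids);

        // print a sample; for a real run you would write the result out instead,
        // e.g. missing.write().json("missing-ids")
        missing.orderBy("id").show(100);

        sc.stop();
    }
}

Nothing is collected to the driver and no state is shared between tasks, so it should behave the same on one node or many.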
On Mon, Sep 19, 2016 at 12:09 PM, ayan guha <guha.a...@gmail.com> wrote:

> Ok, so if you see
>
> 1,3,4,6.....
>
> will you say 2,5 are missing?
>
> On Mon, Sep 19, 2016 at 4:15 PM, Sudhindra Magadi <smag...@gmail.com> wrote:
>
>> Each of the records will have a sequence id. No duplicates.
>>
>> On Mon, Sep 19, 2016 at 11:42 AM, ayan guha <guha.a...@gmail.com> wrote:
>>
>>> And how do you define a missing sequence? Can you give an example?
>>>
>>> On Mon, Sep 19, 2016 at 3:48 PM, Sudhindra Magadi <smag...@gmail.com> wrote:
>>>
>>>> Hi Jorn,
>>>> We have a file with a billion records. We want to find whether any
>>>> sequence ids are missing, and if so, which ones.
>>>> Thanks
>>>> Sudhindra
>>>>
>>>> On Mon, Sep 19, 2016 at 11:12 AM, Jörn Franke <jornfra...@gmail.com> wrote:
>>>>
>>>>> I am not sure what you are trying to achieve here. Can you please tell us
>>>>> what the goal of the program is, maybe with some example data?
>>>>>
>>>>> Besides this, I have the feeling that it will fail once it is not used
>>>>> in a single-node scenario, due to the reference to the global counter
>>>>> variable.
>>>>>
>>>>> It is also unclear why you collect the data first only to parallelize it
>>>>> again.
>>>>>
>>>>> On 18 Sep 2016, at 14:26, sudhindra <smag...@gmail.com> wrote:
>>>>>
>>>>> Hi, I have coded something like this; please tell me how bad it is.
>>>>>
>>>>> package Spark.spark;
>>>>>
>>>>> import java.util.List;
>>>>> import java.util.function.Function;
>>>>>
>>>>> import org.apache.spark.SparkConf;
>>>>> import org.apache.spark.SparkContext;
>>>>> import org.apache.spark.api.java.JavaRDD;
>>>>> import org.apache.spark.api.java.JavaSparkContext;
>>>>> import org.apache.spark.sql.DataFrame;
>>>>> import org.apache.spark.sql.Dataset;
>>>>> import org.apache.spark.sql.Row;
>>>>> import org.apache.spark.sql.SQLContext;
>>>>>
>>>>> public class App
>>>>> {
>>>>>     static long counter = 1;
>>>>>
>>>>>     public static void main(String[] args)
>>>>>     {
>>>>>         SparkConf conf = new SparkConf().setAppName("sorter")
>>>>>                 .setMaster("local[2]")
>>>>>                 .set("spark.executor.memory", "1g");
>>>>>         JavaSparkContext sc = new JavaSparkContext(conf);
>>>>>
>>>>>         SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);
>>>>>
>>>>>         DataFrame df = sqlContext.read().json("path");
>>>>>         DataFrame sortedDF = df.sort("id");
>>>>>         //df.show();
>>>>>         //sortedDF.printSchema();
>>>>>
>>>>>         System.out.println(sortedDF.collectAsList().toString());
>>>>>         JavaRDD<Row> distData = sc.parallelize(sortedDF.collectAsList());
>>>>>
>>>>>         List<String> missingNumbers = distData.map(new
>>>>>                 org.apache.spark.api.java.function.Function<Row, String>() {
>>>>>
>>>>>             public String call(Row arg0) throws Exception {
>>>>>                 if (counter != new Integer(arg0.getString(0)).intValue())
>>>>>                 {
>>>>>                     StringBuffer misses = new StringBuffer();
>>>>>                     long newCounter = counter;
>>>>>                     while (newCounter != new Integer(arg0.getString(0)).intValue())
>>>>>                     {
>>>>>                         misses.append(new String(new Integer((int) counter).toString()));
>>>>>                         newCounter++;
>>>>>                     }
>>>>>                     counter = new Integer(arg0.getString(0)).intValue() + 1;
>>>>>                     return misses.toString();
>>>>>                 }
>>>>>                 counter++;
>>>>>                 return null;
>>>>>             }
>>>>>         }).collect();
>>>>>
>>>>>         for (String name : missingNumbers) {
>>>>>             System.out.println(name);
>>>>>         }
>>>>>     }
>>>>> }
>>>>>
>>>>
>>>> --
>>>> Thanks & Regards
>>>> Sudhindra S Magadi
>>>>
>>>
>>> --
>>> Best Regards,
>>> Ayan Guha
>>>
>>
>> --
>> Thanks & Regards
>> Sudhindra S Magadi
>>
>
> --
> Best Regards,
> Ayan Guha
>

--
Thanks & Regards
Sudhindra S Magadi