Re: filling missing values in a sequence

2016-09-19 Thread Sudhindra Magadi
th"); DataFrame sortedDF = df.sort("id"); //df.show(); //sortedDF.printSchema(); System.out.println(sortedDF.collectAsList().toString()); Ja

Re: filling missing values in a sequence

2016-09-18 Thread ayan guha
ng()); JavaRDD<Row> distData = sc.parallelize(sortedDF.collectAsList()); List<String> missingNumbers = distData.map(new org.apache.spark.api.java.function.Function<Row, String>() {

Re: filling missing values in a sequence

2016-09-18 Thread Sudhindra Magadi
List<String> missingNumbers = distData.map(new org.apache.spark.api.java.function.Function<Row, String>() { public String call(Row arg0) throws Exception { // TODO Auto-ge

Re: filling missing values in a sequence

2016-09-18 Thread ayan guha
{ StringBuffer misses = new StringBuffer(); long newCounter=counter; while(newCounter!=new Integer(arg0.getString(0)).intValue())

Re: filling missing values in a sequence

2016-09-18 Thread Sudhindra Magadi
long newCounter=counter; while(newCounter!=new Integer(arg0.getString(0)).intValue()) { misses.append(new String(new Integer((int) count

Re: filling missing values in a sequence

2016-09-18 Thread ayan guha
{ misses.append(new String(new Integer((int) counter).toString()) ); newCounter++; } counter=new Integer(arg0.getString(0)).intValue()+1; return m

Re: filling missing values in a sequence

2016-09-18 Thread Sudhindra Magadi
; } counter=new Integer(arg0.getString(0)).intValue()+1; return misses.toString(); } counter++; return null; } }).co

Re: filling missing values in a sequence

2016-09-18 Thread Jörn Franke
(0)).intValue()+1; return misses.toString(); } counter++; return null; } }).collect();

Re: filling missing values in a sequence

2016-09-18 Thread sudhindra
t(); for (String name: missingNumbers) { System.out.println(name); } } } -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/filling-missing-values-in-
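The Java fragments in this thread walk sorted ids, compare each against a running counter, and collect the numbers that were skipped. A minimal plain-Java sketch of that idea follows, without Spark. The class and method names (`MissingIds`, `missingBetween`) are hypothetical. It assumes the ids are strings that parse as integers (as `arg0.getString(0)` suggests) and that counting starts at the first id, which the truncated snippets do not confirm. The running value is kept in a local variable rather than in a field updated from inside the map function.

```java
import java.util.ArrayList;
import java.util.List;

class MissingIds {
    // Returns the integers skipped between consecutive ids, as strings.
    // Assumption: sortedIds is in ascending numeric order and every entry parses as a long.
    static List<String> missingBetween(List<String> sortedIds) {
        List<String> missing = new ArrayList<>();
        if (sortedIds.isEmpty()) {
            return missing;
        }
        // Assumption: counting starts at the first id (the thread's start value is not shown).
        long expected = Long.parseLong(sortedIds.get(0));
        for (String id : sortedIds) {
            long current = Long.parseLong(id);
            // Every value from expected up to current - 1 was skipped.
            while (expected < current) {
                missing.add(Long.toString(expected));
                expected++;
            }
            expected = current + 1;
        }
        return missing;
    }
}
```

For example, `MissingIds.missingBetween(Arrays.asList("1", "2", "4", "7"))` yields the strings 3, 5 and 6.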

Re: filling missing values in a sequence

2014-05-20 Thread Mohit Jaggi
Xiangrui, thanks for the pointer. I think it should work. For now I cooked up my own version, which is similar but built on top of the Spark core APIs. I would suggest moving the sliding window RDD to the core Spark library. It seems quite general to me, and a cursory look at the code indicates nothing specific to

Re: filling missing values in a sequence

2014-05-19 Thread Xiangrui Meng
Actually there is a sliding method implemented in mllib.rdd.RDDFunctions. Since this is not for general use cases, we didn't include it in spark-core. You can take a look at the implementation there and see whether it fits. -Xiangrui On Mon, May 19, 2014 at 10:06 PM, Mohit Jaggi wrote: > Thanks S
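To illustrate what a sliding operation produces, here is a small plain-Java sketch over an in-memory list. It is not the MLlib implementation and does not deal with partition boundaries. The names `SlidingWindows` and `sliding` are hypothetical, and returning only full windows is an assumption made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

class SlidingWindows {
    // Returns every full window of `size` consecutive elements, in order.
    // Inputs shorter than `size` produce no windows (an assumption of this sketch).
    static <T> List<List<T>> sliding(List<T> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("window size must be positive");
        }
        List<List<T>> windows = new ArrayList<>();
        for (int start = 0; start + size <= items.size(); start++) {
            // Copy the sublist so each window is independent of the input list.
            windows.add(new ArrayList<>(items.subList(start, start + size)));
        }
        return windows;
    }
}
```

With a window size of 2, the list (1, 2, 3, 5) gives the pairs (1, 2), (2, 3) and (3, 5), which is the shape needed to compare each element with its successor.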

Re: filling missing values in a sequence

2014-05-19 Thread Mohit Jaggi
Thanks Sean. Yes, your solution works :-) I did oversimplify my real problem, which has other parameters that go along with the sequence. On Fri, May 16, 2014 at 3:03 AM, Sean Owen wrote: > Not sure if this is feasible, but this literally does what I think you > are describing: > > sc.paralleli

Re: filling missing values in a sequence

2014-05-16 Thread Sean Owen
Not sure if this is feasible, but this literally does what I think you are describing: sc.parallelize(rdd1.first to rdd1.last) On Tue, May 13, 2014 at 4:56 PM, Mohit Jaggi wrote: > Hi, > I am trying to find a way to fill in missing values in an RDD. The RDD is a > sorted sequence. > For example,
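The same "first to last" idea can be sketched in plain Java on a local list. This is only an illustration, not Spark code. `RangeFill` and `fillByRange` are hypothetical names, and the sketch assumes a sorted list of integers with nothing else attached to each element.

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class RangeFill {
    // Rebuilds the full sequence from the smallest to the largest value.
    // Assumption: `sorted` is in ascending order, so first and last are the bounds.
    static List<Integer> fillByRange(List<Integer> sorted) {
        if (sorted.isEmpty()) {
            return Collections.emptyList();
        }
        int first = sorted.get(0);
        int last = sorted.get(sorted.size() - 1);
        // rangeClosed includes both endpoints.
        return IntStream.rangeClosed(first, last).boxed().collect(Collectors.toList());
    }
}
```

Because only the two endpoints are used, anything carried alongside the original elements would be lost, so this fits the simplified problem only.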

Re: filling missing values in a sequence

2014-05-16 Thread bgawalt
over the hack of using .filter to remove the first element (how do you want to handle ties, for instance?), as well as the possible fragility of zipping. --Brian -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/filling-missing-values-in-a-sequence-tp5708p5846.h

filling missing values in a sequence

2014-05-15 Thread Mohit Jaggi
Hi, I am trying to find a way to fill in missing values in an RDD. The RDD is a sorted sequence. For example, (1, 2, 3, 5, 8, 11, ...) I need to fill in the missing numbers and get (1,2,3,4,5,6,7,8,9,10,11) One way to do this is to "slide and zip" rdd1 = sc.parallelize(List(1, 2, 3, 5, 8, 11, ...
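The "slide and zip" idea described above can be sketched in plain Java on a local list: pair each element with its successor and emit the integers in between. This is an illustration under stated assumptions, not the poster's Spark code. `GapFill` and `fillGaps` are hypothetical names, and the input is assumed to be sorted in ascending order.

```java
import java.util.ArrayList;
import java.util.List;

class GapFill {
    // Pairs each element with its successor and emits every integer from the
    // left element up to, but excluding, the right one.
    static List<Integer> fillGaps(List<Integer> sorted) {
        List<Integer> filled = new ArrayList<>();
        for (int i = 0; i + 1 < sorted.size(); i++) {
            int left = sorted.get(i);
            int right = sorted.get(i + 1);
            for (int v = left; v < right; v++) {
                filled.add(v);
            }
        }
        // The last element has no successor, so append it explicitly.
        if (!sorted.isEmpty()) {
            filled.add(sorted.get(sorted.size() - 1));
        }
        return filled;
    }
}
```

Applied to (1, 2, 3, 5, 8, 11), this returns 1 through 11. In a distributed setting the pairing step also has to cover elements that sit on either side of a partition boundary, which this local sketch does not address.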