DataFrame sortedDF = df.sort("id");
//df.show();
//sortedDF.printSchema();

System.out.println(sortedDF.collectAsList().toString());

JavaRDD<Row> distData = sc.parallelize(sortedDF.collectAsList());

// counter is assumed to be a class-level int field holding the first expected id
List<String> missingNumbers = distData.map(new org.apache.spark.api.java.function.Function<Row, String>() {
    public String call(Row arg0) throws Exception {
        StringBuffer misses = new StringBuffer();
        long newCounter = counter;
        // emit every value between the running counter and this row's id
        while (newCounter != new Integer(arg0.getString(0)).intValue()) {
            misses.append(new Integer((int) newCounter).toString());
            newCounter++;
        }
        counter = new Integer(arg0.getString(0)).intValue() + 1;
        return misses.toString();
    }
}).collect();

for (String name : missingNumbers) {
    System.out.println(name);
}
}
}
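The mapper above accumulates the gap between a running counter and each row's id. The same gap-finding logic can be sketched without Spark at all; this is a hypothetical `missing` helper (not part of the thread's code) that applies the idea pairwise to a plain sorted list:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FillMissing {
    // Given a sorted list of ints, return the values absent between
    // consecutive elements -- the same gaps the mapper above collects
    // with its running counter.
    static List<Integer> missing(List<Integer> sorted) {
        List<Integer> misses = new ArrayList<>();
        for (int i = 1; i < sorted.size(); i++) {
            for (int v = sorted.get(i - 1) + 1; v < sorted.get(i); v++) {
                misses.add(v);
            }
        }
        return misses;
    }

    public static void main(String[] args) {
        System.out.println(missing(Arrays.asList(1, 2, 3, 5, 8, 11)));
        // [4, 6, 7, 9, 10]
    }
}
```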
Xiangrui,
Thanks for the pointer. I think it should work... for now I cooked up my
own, which is similar but built on top of the Spark core APIs. I would suggest
moving the sliding window RDD to the core Spark library. It seems quite general
to me, and a cursory look at the code indicates nothing specific to MLlib.
Actually, there is a sliding method implemented in
mllib.rdd.RDDFunctions. Since it is not for general use cases, we
didn't include it in spark-core. You can take a look at the
implementation there and see whether it fits. -Xiangrui
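A `sliding(2)` over the sorted RDD yields consecutive pairs, and each pair expands to a run of values. A plain-Java sketch of what each window contributes (a local `List` stands in for the RDD here, and the `FillBySliding`/`fill` names are illustrative, not from the thread):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FillBySliding {
    // Each length-2 window (a, b) expands to the run a, a+1, ..., b-1;
    // appending the final element closes the sequence.
    static List<Integer> fill(List<Integer> sorted) {
        List<Integer> out = new ArrayList<>();
        for (int i = 1; i < sorted.size(); i++) {
            for (int v = sorted.get(i - 1); v < sorted.get(i); v++) {
                out.add(v);
            }
        }
        out.add(sorted.get(sorted.size() - 1));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fill(Arrays.asList(1, 2, 3, 5, 8, 11)));
        // [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
    }
}
```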
On Mon, May 19, 2014 at 10:06 PM, Mohit Jaggi wrote:
Thanks Sean. Yes, your solution works :-) I did oversimplify my real
problem, which has other parameters that go along with the sequence.
On Fri, May 16, 2014 at 3:03 AM, Sean Owen wrote:
Not sure if this is feasible, but this literally does what I think you
are describing:
sc.parallelize(rdd1.first to rdd1.last)
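Sean's one-liner simply rebuilds the whole range from the endpoints, ignoring the original elements. A local Java equivalent, using `IntStream.rangeClosed` in place of Scala's `to` (class and method names here are illustrative):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class FillByRange {
    // Regenerate the entire range from the endpoints alone -- the
    // elements between first and last are never consulted.
    static List<Integer> fill(int first, int last) {
        return IntStream.rangeClosed(first, last)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(fill(1, 11));
        // [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
    }
}
```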
On Tue, May 13, 2014 at 4:56 PM, Mohit Jaggi wrote:
> Hi,
> I am trying to find a way to fill in missing values in an RDD. The RDD is a
> sorted sequence.
> For example, (1, 2, 3, 5, 8, 11, ...)
...over the hack of using .filter to remove the first element (how do you
want to handle ties, for instance?), as well as the possible fragility of
zipping.
--Brian
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/filling-missing-values-in-a-sequence-tp5708p5846.h
Hi,
I am trying to find a way to fill in missing values in an RDD. The RDD is a
sorted sequence.
For example, (1, 2, 3, 5, 8, 11, ...)
I need to fill in the missing numbers and get (1,2,3,4,5,6,7,8,9,10,11)
One way to do this is to "slide and zip":
rdd1 = sc.parallelize(List(1, 2, 3, 5, 8, 11, ...))