How to Fill Sparse Data With the Previous Non-Empty Value in a Spark Dataset

2017-06-28 Thread Carlo Allocca

Dear All,

I am trying to propagate the last valid observation (i.e. the last non-null
value) forward to fill the null values in a dataset.

Below is my partial solution:

Dataset<Row> tmp800 = tmp700.select("uuid", "eventTime", "Washer_rinseCycles");

// Window over each device (uuid), ordered by event time.
WindowSpec wspec = Window.partitionBy(tmp800.col("uuid"))
                         .orderBy(tmp800.col("eventTime"));

// Value of Washer_rinseCycles in the previous row (which may itself be null).
Column c1 = org.apache.spark.sql.functions.lag(tmp800.col("Washer_rinseCycles"), 1)
                                          .over(wspec);

Dataset<Row> tmp900 = tmp800.withColumn("Washer_rinseCyclesFilled",
    when(tmp800.col("Washer_rinseCycles").isNull(), c1)
        .otherwise(tmp800.col("Washer_rinseCycles")));
However, it does not solve the entire problem: lag(col, 1) returns the value of
the row immediately before the current row even when that value is itself NULL,
so runs of consecutive nulls are only partially filled (see the table below).

Does Spark have a method similar to Pandas' "backfill"/"ffill" for the DataFrame?

Is it possible to do this using the Spark API? How?

Many Thanks in advance.
Best Regards,
Carlo

[inline image omitted: table of example results]
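For reference, a common way to express this forward fill in Spark 2.x is
org.apache.spark.sql.functions.last(col, true) (last value, ignoring nulls) over
a window bounded with rowsBetween(Window.unboundedPreceding(), Window.currentRow()),
instead of lag. The plain-Java sketch below (the class and method names are
illustrative, not Spark API) mimics what that running last-non-null computes over
a single ordered partition:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the semantics of functions.last(col, /*ignoreNulls=*/true)
// over a running window (unboundedPreceding .. currentRow): each null is
// replaced by the most recent non-null value seen so far in the partition.
public class ForwardFillSketch {

    // Replace each null with the last non-null value seen before it;
    // nulls at the start of the sequence stay null, because there is
    // no earlier valid observation to carry forward.
    public static List<Integer> forwardFill(List<Integer> values) {
        List<Integer> filled = new ArrayList<>();
        Integer last = null;            // last non-null value seen so far
        for (Integer v : values) {
            if (v != null) {
                last = v;               // remember the new valid observation
            }
            filled.add(last);           // stays null until the first valid value
        }
        return filled;
    }

    public static void main(String[] args) {
        List<Integer> in = Arrays.asList(null, 4, null, null, 5, null);
        System.out.println(forwardFill(in)); // [null, 4, 4, 4, 5, 5]
    }
}
```

Unlike lag(col, 1), this handles runs of consecutive nulls, because the carried
value is the last *non-null* observation rather than the previous row's value.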


How to Fill Sparse Data With the Previous Non-Empty Value in a Spark Dataset

2017-06-25 Thread Carlo Allocca
Dear All,

I need to apply a dataset transformation that replaces null values with the
previous non-null value. For example, I want to go

from:

 id | col1
----+------
  1 | null
  1 | null
  2 | 4
  2 | null
  2 | null
  3 | 5
  3 | null
  3 | null

to:

 id | col1
----+------
  1 | null
  1 | null
  2 | 4
  2 | 4
  2 | 4
  3 | 5
  3 | 5
  3 | 5

I am using Spark SQL 2 and the Dataset API.

I searched on Google but only found solutions in the context of relational
databases, e.g.
https://blog.jooq.org/2015/12/17/how-to-fill-sparse-data-with-the-previous-non-empty-value-in-sql/

Please, any help on how to implement this in Spark? I understand that I should
use Window functions and lag, but I cannot put them together.
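To make the intended transformation concrete, here is a minimal plain-Java
sketch (all class, method, and variable names are hypothetical, not Spark API)
of the per-partition forward fill shown in the tables above: within each id,
each null takes the last non-null value already seen for that id:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the from/to transformation above: within each id
// group (standing in for Window.partitionBy(id)), replace null col1 values
// with the previous non-null value in input order.
public class FillByPartition {

    // ids.get(i) and col1.get(i) together form one row; rows are assumed
    // already ordered. Returns the filled col1 column.
    public static List<Integer> fill(List<Integer> ids, List<Integer> col1) {
        List<Integer> out = new ArrayList<>();
        Map<Integer, Integer> lastSeen = new HashMap<>(); // last non-null per id
        for (int i = 0; i < ids.size(); i++) {
            Integer v = col1.get(i);
            if (v != null) {
                lastSeen.put(ids.get(i), v);  // new valid observation for this id
            }
            out.add(lastSeen.get(ids.get(i))); // null until the id's first value
        }
        return out;
    }

    public static void main(String[] args) {
        // The example data from the message: id 1 has no valid value at all,
        // so its nulls remain null; ids 2 and 3 carry 4 and 5 forward.
        List<Integer> ids  = Arrays.asList(1, 1, 2, 2, 2, 3, 3, 3);
        List<Integer> col1 = Arrays.asList(null, null, 4, null, null, 5, null, null);
        System.out.println(fill(ids, col1)); // [null, null, 4, 4, 4, 5, 5, 5]
    }
}
```

In Spark itself, the same result should be obtainable by computing
functions.last(col("col1"), true) over a window partitioned by id and bounded
from the start of the partition to the current row.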


Thank you in advance for your help.

Best Regards,
Carlo