Re: DF blank value fill

2021-05-21 Thread ayan guha
Hi

You can do something like this:

SELECT MainKey, SubKey,
  case when val1 is null then newval1 else val1 end val1,
  case when val2 is null then newval2 else val2 end val2,
  case when val3 is null then newval3 else val3 end val3
 from (select MainKey, SubKey, val1, val2, val3,
 first_value(val1) over (partition by MainKey, SubKey order
by val1 nulls last) newval1,
 first_value(val2) over (partition by MainKey, SubKey order
by val2 nulls last) newval2,
 first_value(val3) over (partition by MainKey, SubKey order
by val3 nulls last) newval3
from table) x

On Fri, May 21, 2021 at 9:29 PM Bode, Meikel, NMA-CFD <
meikel.b...@bertelsmann.de> wrote:

> Hi all,
>
> My df looks as follows:
>
> Situation:
> MainKey, SubKey, Val1, Val2, Val3, …
> 1, 2, a, null, c
> 1, 2, null, null, c
> 1, 3, null, b, null
> 1, 3, a, null, c
>
> Desired outcome:
> 1, 2, a, b, c
> 1, 2, a, b, c
> 1, 3, a, b, c
> 1, 3, a, b, c
>
> How could I populate/synchronize empty cells of all records with the same
> combination of MainKey and SubKey with the respective value of other rows
> with the same key combination?
>
> A certain value, if not null, of a column is guaranteed to be unique within
> the df. If a column exists, then there is at least one row with a non-null
> value.
>
> I am using pyspark.
>
> Thanks for any hint,
>
> Best
>
> Meikel
>


-- 
Best Regards,
Ayan Guha

