Hi everyone, I'm kind of stuck on a problem and was hoping for some pointers or help :) I have tried different things but couldn't achieve the desired result.
I want to *create a single row from multiple rows if those rows are continuous* (based on time, i.e. if the next row's time is within 2 minutes of the previous row's time).

What I have is this df (after filtering and grouping):

+--------------------+---+-----+
|                time|val|group|
+--------------------+---+-----+
| 2017-01-01 00:00:00| 41|    1|
| 2017-01-01 00:01:00| 42|    1|
| 2017-01-01 00:02:00| 41|    1|
| 2017-01-01 00:15:00| 50|    1|
| 2017-01-01 00:18:00| 49|    1|
| 2017-01-01 00:19:00| 51|    1|
| 2017-01-01 00:20:00| 30|    1|
+--------------------+---+-----+

from which I want to compute another df:

+--------------------+--------------------+-----+
|          start time|            end time|group|
+--------------------+--------------------+-----+
| 2017-01-01 00:00:00| 2017-01-01 00:02:00|    1|
| 2017-01-01 00:15:00| 2017-01-01 00:15:00|    1|
| 2017-01-01 00:18:00| 2017-01-01 00:20:00|    1|
+--------------------+--------------------+-----+

How do I achieve this? A UDAF applied with withColumn only aggregates within a single row, so it doesn't seem to help here. I am using Spark 2.1.0 on Zeppelin with pyspark.
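To pin down exactly what I mean by "continuous", here is the rule written out in plain Python (no Spark, just to illustrate; `sessionize` and `max_gap` are names I made up for this sketch, not from any library):

```python
from datetime import datetime, timedelta

def sessionize(rows, max_gap=timedelta(minutes=2)):
    """Collapse time-sorted (time, val) rows into (start, end) spans where
    each row is at most max_gap after the previous row in the same span."""
    sessions = []
    start = prev = None
    for t, _val in rows:
        # A gap larger than max_gap closes the current span and opens a new one.
        if prev is None or t - prev > max_gap:
            if start is not None:
                sessions.append((start, prev))
            start = t
        prev = t
    if start is not None:
        sessions.append((start, prev))
    return sessions

rows = [(datetime(2017, 1, 1, 0, m), v) for m, v in
        [(0, 41), (1, 42), (2, 41), (15, 50), (18, 49), (19, 51), (20, 30)]]
for s, e in sessionize(rows):
    print(s, "->", e)
```

On the sample data above this produces exactly the three (start, end) spans I want; what I'm missing is how to express the same gap-based grouping over a DataFrame in pyspark.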