You should look into window functions for Spark SQL. Debabrata Ghosh <[email protected]> wrote on Mon, 12 Feb 2018 at 13:10:
> Hi,
> Greetings!
>
> I need an efficient way in PySpark to compare the current row with the
> previous row on all of its attributes. My intent is to leverage Spark's
> distributed framework as far as possible so that I can achieve good speed.
> Can anyone suggest a suitable algorithm / command? Here is a snapshot of
> the underlying data I need to compare:
>
> [image: Inline image 1]
>
> Thanks in advance!
>
> D
