Adding a new column which contains a column value of the previous row.

Vibhath Ileperuma Sat, 06 Mar 2021 22:09:35 -0800

Hi all,

I have a set of CSV files, stored in an S3 bucket, which contain 'x_val' and
'y_val' columns.
Ex:
file1.csv

x_val,y_val
1,2
2,5
3,2

file2.csv

x_val,y_val
4,8
5,3
6,5

I need to:

   1. List the CSV files in alphabetical order.
   2. Add a new column 'prev_y_val':
      1. It should contain the value of the 'y_val' column of the previous
         row.
      2. In the first row of a file, it should contain the value of the
         'y_val' column in the last row of the previous file (only the first
         row of the first file can be null). Even if the NiFi instance is
         killed while processing a file, it should still write the value of
         the first row correctly.

Ex:
file1.csv

x_val,y_val,prev_y_val
1,2,
2,5,2
3,2,5

file2.csv

x_val,y_val,prev_y_val
4,8,2
5,3,8
6,5,3
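As a point of reference, here is a minimal Python sketch of the transformation itself, independent of NiFi. It assumes each file fits in memory, has a header row, and is handled one at a time in name order. The function names `add_prev_column` and `process_in_order` are hypothetical. The only value carried from one file to the next is the last `y_val` of the previous file.

```python
import csv
import io
from typing import Dict, Optional, Tuple


def add_prev_column(csv_text: str, carry: Optional[str]) -> Tuple[str, Optional[str]]:
    """Append a 'prev_y_val' column to one CSV file.

    carry is the last y_val of the previous file (None for the very first file).
    Returns the rewritten CSV text and this file's last y_val (the next carry).
    """
    # skipinitialspace tolerates headers written as "x_val, y_val"
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(reader.fieldnames) + ["prev_y_val"],
                            lineterminator="\n")
    writer.writeheader()
    prev = carry
    for row in reader:
        # Empty cell only for the first row of the first file
        row["prev_y_val"] = "" if prev is None else prev
        writer.writerow(row)
        prev = row["y_val"]
    return out.getvalue(), prev


def process_in_order(files: Dict[str, str]) -> Dict[str, str]:
    """Process files in alphabetical order, chaining the last y_val across files."""
    results: Dict[str, str] = {}
    carry: Optional[str] = None
    for name in sorted(files):
        results[name], carry = add_prev_column(files[name], carry)
    return results
```

Applied to the two sample inputs, this should produce the two expected outputs shown. Because the carry is held only in memory, however, this sketch alone does not meet the restart requirement.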


I would be grateful if you could suggest a way to implement this logic. If a
custom processor is required, could you please suggest best practices for
managing state between flow files?
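For the restart requirement, one possible option is to persist two values after each file completes: the name of the last fully processed file and that file's last `y_val`. Only update them after the output for the file has been emitted. If the process dies mid-file, the stored values still describe the previous file, so on restart the interrupted file is reprocessed from its first row with the correct carry.

The sketch below keeps these values in a JSON checkpoint file. The checkpoint path, its JSON layout, the `outputs` dict (a stand-in for the real sink, e.g. writing back to S3), and all function names are hypothetical. The transform is deliberately naive: it assumes `y_val` is the second column and that there are no quoted fields. In a custom processor, the same two values could presumably live in NiFi's StateManager (via `ProcessContext.getStateManager()`, with `Scope.LOCAL` or `Scope.CLUSTER`) instead of a file. Check the NiFi Developer Guide for how state updates interact with session commits.

```python
import json
import os
import tempfile
from typing import Dict, Optional, Tuple


def add_prev_column(csv_text: str, carry: Optional[str]) -> Tuple[str, Optional[str]]:
    """Naive transform: assumes y_val is the 2nd column and no quoted fields."""
    lines = csv_text.strip().splitlines()
    out = [lines[0].replace(" ", "") + ",prev_y_val"]
    prev = carry
    for line in lines[1:]:
        out.append(line + "," + ("" if prev is None else prev))
        prev = line.split(",")[1].strip()
    return "\n".join(out) + "\n", prev


def load_state(path: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (last_file, last_y_val) from the checkpoint, or (None, None) if absent."""
    if not os.path.exists(path):
        return None, None
    with open(path) as f:
        state = json.load(f)
    return state["last_file"], state["last_y_val"]


def save_state(path: str, last_file: str, last_y_val: Optional[str]) -> None:
    """Write to a temp file, then atomically rename it over the checkpoint."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "w") as f:
        json.dump({"last_file": last_file, "last_y_val": last_y_val}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def process_pending(files: Dict[str, str], state_path: str, outputs: Dict[str, str]) -> None:
    """Process files not yet covered by the checkpoint, in alphabetical order."""
    last_file, carry = load_state(state_path)
    for name in sorted(files):
        if last_file is not None and name <= last_file:
            continue  # already completed before a restart
        result, new_carry = add_prev_column(files[name], carry)
        outputs[name] = result                    # 1. emit output (stand-in sink)
        save_state(state_path, name, new_carry)   # 2. then commit the new state
        carry = new_carry
```

The important design choice is the order of steps 1 and 2. If the process dies between them, the file is reprocessed with the same carry and produces identical output. That is safer than committing state first and risking a file whose output was never written.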

Thanks & Regards

*Vibhath Ileperuma*
