Hi all,
I have a set of CSV files, each containing 'x_val' and 'y_val' columns,
stored in an S3 bucket.
Ex:
*file1.csv*
*x_val, y_val*
*1,2*
*2,5*
*3,2*
*file2.csv*
*x_val, y_val*
*4,8*
*5,3*
*6,5*
I need to:
1. List the CSV files in alphabetical order.
2. Add a new column 'prev_y_val':
   1. It should contain the value of the 'y_val' column of the previous
      row.
   2. In the first row of a file, it should contain the value of the
      'y_val' column in the last row of the previous file (only the first
      row of the first file can be null). Even if the NiFi instance is
      killed while processing a file, it should still be able to write the
      value of the first row correctly after a restart.
Ex:
*file1.csv*
*x_val, y_val,prev_y_val*
*1,2,*
*2,5,2*
*3,2,5*
*file2.csv*
*x_val, y_val,prev_y_val*
*4,8,2*
*5,3,8*
*6,5,3*
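To make the requirement concrete, the logic can be sketched in plain Python. This is not NiFi code, only an illustration of the behaviour I am after. The function names `add_prev_y_val` and `process_all` are hypothetical, and the `state` dictionary is an assumption standing in for whatever durable store would hold the last 'y_val' of each completed file.

```python
import csv
import io


def add_prev_y_val(csv_text, carry_in):
    """Add 'prev_y_val' to one file.

    carry_in is the last 'y_val' of the previous file ("" for the first file).
    Returns (new_csv_text, last_y_val_of_this_file).
    """
    # skipinitialspace handles headers written as "x_val, y_val"
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["x_val", "y_val", "prev_y_val"])
    prev = carry_in
    for row in reader:
        writer.writerow([row["x_val"], row["y_val"], prev])
        prev = row["y_val"]
    return out.getvalue(), prev


def process_all(files, state):
    """Process files in alphabetical order.

    files: dict of file name -> CSV text (stand-in for the S3 listing).
    state: dict of file name -> last 'y_val', assumed to be persisted durably.
    A file's entry is written only after that file has been fully processed,
    so a file interrupted halfway is simply reprocessed with the same carry-in.
    Returns the outputs of the files processed in this run.
    """
    outputs = {}
    carry = ""
    for name in sorted(files):
        if name in state:
            # Completed in an earlier run: reuse its saved last value.
            carry = state[name]
            continue
        outputs[name], carry = add_prev_y_val(files[name], carry)
        state[name] = carry
    return outputs


files = {
    "file2.csv": "x_val, y_val\n4,8\n5,3\n6,5\n",
    "file1.csv": "x_val, y_val\n1,2\n2,5\n3,2\n",
}
state = {}
results = process_all(files, state)
print(results["file2.csv"])
```

The point of the sketch is that the carry-over value for a file depends only on the saved result of the file before it, so a restart in the middle of file2.csv would still read 2 from the entry saved for file1.csv.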
I would be grateful if you could suggest a way to implement this logic. If
a custom processor is required, could you please suggest best practices for
managing state between two flow files?
Thanks & Regards
*Vibhath Ileperuma*