Hi all,

I'm a new user in the Flink community. The tool sounds great for loading files with millions of rows into a pgsql database for a new project.

I've read the docs and examples, but I can't find a proper use case of CSV loading into pgsql.
The file I want to load doesn't follow the same structure as the table: I have to drop some columns and also build a JSON string from several others before loading into pgsql.

I plan to use the Flink 1.5 Java API and a batch process.
Is the DataSet class able to strip some columns out of the records I load, or do I have to iterate over each record to delete the columns?
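
For illustration, here is roughly what I have in mind for the column stripping, assuming the CsvReader's includeFields mask does what I think; the path, field mask and column types are just placeholders:

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple3;

public class CsvColumnStripSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Keep only the 1st, 2nd and 4th CSV columns: the mask has one
        // character per column, "1" = keep, "0" = drop.
        DataSet<Tuple3<String, String, Integer>> records = env
                .readCsvFile("file:///path/to/input.csv")
                .includeFields("11010")
                .types(String.class, String.class, Integer.class);

        records.print();
    }
}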

Same question for building a JSON string from several columns of the same record,
e.g. json_column = {"field1":col1, "field2":col2, ...}?
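
Something like the following map() is what I'm picturing for that step; the field names and input columns are invented, and plain string concatenation stands in for a real JSON library:

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;

public class JsonColumnSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Placeholder records standing in for the columns kept from the CSV.
        DataSet<Tuple3<String, String, Integer>> records =
                env.fromElements(Tuple3.of("a", "b", 1), Tuple3.of("c", "d", 2));

        // Collapse the 2nd and 3rd columns into one JSON string, keep the 1st.
        DataSet<Tuple2<String, String>> withJson = records.map(
                new MapFunction<Tuple3<String, String, Integer>, Tuple2<String, String>>() {
                    @Override
                    public Tuple2<String, String> map(Tuple3<String, String, Integer> r) {
                        String json = "{\"field1\":\"" + r.f1 + "\",\"field2\":" + r.f2 + "}";
                        return Tuple2.of(r.f0, json);
                    }
                });

        withJson.print();
    }
}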

I work with files of about 20 million records, and iterating over each record sounds pretty inefficient.
Can someone tell me whether this is possible, or whether I have to change my approach?
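
In case it helps to see the whole target, the pgsql write step I'm picturing at the end looks roughly like this; it assumes the flink-jdbc module's JDBCOutputFormat, and the connection settings, table and column names are placeholders:

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.jdbc.JDBCOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.typeutils.RowTypeInfo;
import org.apache.flink.types.Row;

public class PgsqlLoadSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Placeholder for the already-transformed records (key + JSON column).
        DataSet<Tuple2<String, String>> withJson =
                env.fromElements(Tuple2.of("a", "{\"field1\":\"b\"}"));

        // JDBCOutputFormat consumes Rows, so convert the tuples first.
        DataSet<Row> rows = withJson
                .map(new MapFunction<Tuple2<String, String>, Row>() {
                    @Override
                    public Row map(Tuple2<String, String> t) {
                        return Row.of(t.f0, t.f1);
                    }
                })
                .returns(new RowTypeInfo(BasicTypeInfo.STRING_TYPE_INFO,
                        BasicTypeInfo.STRING_TYPE_INFO));

        rows.output(JDBCOutputFormat.buildJDBCOutputFormat()
                .setDrivername("org.postgresql.Driver")
                .setDBUrl("jdbc:postgresql://localhost:5432/mydb")
                .setUsername("user")
                .setPassword("secret")
                .setQuery("INSERT INTO my_table (col1, json_column) VALUES (?, ?)")
                .setBatchInterval(5000)
                .finish());

        env.execute("CSV to pgsql sketch");
    }
}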


Thanks in advance, all the best

François Lacombe
