Re: Filter columns of a csv file with Flink

2018-07-11 Thread françois lacombe
Ok Hequn, I'll open 2 Jira for this issue, and maybe propose a draft of CsvTableSource class handling avro schemas FLINK-9813 and FLINK-9814 Thank you for your answers and best regards François 2018-07-11 8:11 GMT+02:00 Hequn Cheng : > Hi francois, > > > Is there any plan to give avro schemas

Re: Filter columns of a csv file with Flink

2018-07-10 Thread Hequn Cheng
Hi francois, > Is there any plan to give avro schemas a better role in Flink in further versions? Haven't heard about avro for csv. You can open a jira for it. Maybe also contribute to flink :-) On Tue, Jul 10, 2018 at 11:32 PM, françois lacombe < francois.laco...@dcbrain.com> wrote: > Hi Hequn

Re: Filter columns of a csv file with Flink

2018-07-10 Thread françois lacombe
Hi Hequn, 2018-07-10 3:47 GMT+02:00 Hequn Cheng : > Maybe I misunderstand you. So you don't want to skip the whole file? > Yes I do By skipping the whole file I mean "throw an Exception to stop the process and inform user that file is invalid for a given reason" and not "the process goes fully ri

Re: Filter columns of a csv file with Flink

2018-07-09 Thread Hequn Cheng
Hi francois, > I guess that first step doesn't require AvroInputFormat The first step requires an AvroInputFormat because the source needs AvroInputFormat to read avro data if data match schema. > the second would be more efficient with an AvroInputFormat I think the second step doesn't need an A

Re: Filter columns of a csv file with Flink

2018-07-09 Thread françois lacombe
Hi Hequn, As CsvTableSource sounds to be optimized for csv parsing I won't question it too much. Your second point sounds really better. I can extend the CsvTableSource with extra Avro schema conflating capabilities. Then if the csv file header doesn't match the avro schema specification, then it

Re: Filter columns of a csv file with Flink

2018-07-06 Thread Hequn Cheng
Hi francois, > I see that CsvTableSource allows to define csv fields. Then, will it check if columns actually exists in the file and throw Exception if not ? Currently, CsvTableSource doesn't support Avro. CsvTableSource uses fieldDelim and rowDelim to parse data. But there is a workaround: read e

Re: Filter columns of a csv file with Flink

2018-07-06 Thread françois lacombe
Hi Hequn, The Table-API is really great. I will use and certainly love it to solve the issues I mentioned before One subsequent question regarding Table-API : I've got my csv files and avro schemas that describe them. As my users can send erroneous files, inconsistent with schemas, I want to chec

Re: Filter columns of a csv file with Flink

2018-07-06 Thread Hequn Cheng
Hi francois, If I understand correctly, you can use sql or table-api to solve you problem. As you want to project part of columns from source, a columnar storage like parquet/orc would be efficient. Currently, ORC table source is supported in flink, you can find more details here[1]. Also, there a