Ok Hequn,
I'll open two JIRA issues for this, and maybe propose a draft of a
CsvTableSource class handling Avro schemas:
FLINK-9813 and FLINK-9814.
Thank you for your answers, and best regards,
François
2018-07-11 8:11 GMT+02:00 Hequn Cheng :
Hi francois,
> Is there any plan to give avro schemas a better role in Flink in future
versions?
I haven't heard about Avro support for CSV. You can open a JIRA for it. Maybe
also contribute it to Flink :-)
On Tue, Jul 10, 2018 at 11:32 PM, françois lacombe <
francois.laco...@dcbrain.com> wrote:
Hi Hequn,
2018-07-10 3:47 GMT+02:00 Hequn Cheng :
> Maybe I misunderstand you. So you don't want to skip the whole file?
>
Yes, I do.
By skipping the whole file I mean "throw an Exception to stop the process
and inform the user that the file is invalid for a given reason", and not "the
process goes fully right".
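The "reject the whole file" behaviour described here can be sketched in plain Java: validate the CSV header before any row is processed, and throw so the job stops with a clear message. The class and method names below are hypothetical, not Flink API:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: fail fast on an invalid file instead of
// silently processing it. Not Flink code.
public class CsvHeaderCheck {

    /** Throws IllegalArgumentException when the header does not match. */
    public static void requireHeader(String headerLine, List<String> expected, String fieldDelim) {
        List<String> actual = Arrays.asList(headerLine.split(Pattern.quote(fieldDelim), -1));
        if (!actual.equals(expected)) {
            throw new IllegalArgumentException(
                "Invalid file: header " + actual + " does not match expected columns " + expected);
        }
    }

    public static void main(String[] args) {
        // Valid header: passes silently.
        requireHeader("id;name;price", Arrays.asList("id", "name", "price"), ";");
        // Invalid header: the whole file is rejected before any row is read.
        try {
            requireHeader("id;label", Arrays.asList("id", "name", "price"), ";");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```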
Hi francois,
> I guess that first step doesn't require AvroInputFormat
The first step requires an AvroInputFormat, because the source needs
AvroInputFormat to read Avro data if the data match the schema.
> the second would be more efficient with an AvroInputFormat
I think the second step doesn't need an AvroInputFormat.
Hi Hequn,
As CsvTableSource sounds optimized for csv parsing, I won't question
it too much.
Your second point sounds much better.
I can extend the CsvTableSource with extra Avro schema validation
capabilities. Then, if the csv file header doesn't match the Avro schema
specification, it can throw an Exception.
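The header-versus-schema comparison proposed here can be illustrated without Flink or the Avro library. The sketch below assumes the Avro record's field names were already extracted from the schema; everything else is hypothetical, stdlib-only Java:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of checking a CSV header against an Avro schema's
// field names (assumed to be extracted beforehand). Not Flink or Avro API.
public class SchemaConflation {

    /** Returns the schema fields that are missing from the CSV header. */
    public static List<String> missingColumns(List<String> headerCols, List<String> schemaFields) {
        List<String> missing = new ArrayList<>(schemaFields);
        missing.removeAll(headerCols);
        return missing;
    }

    public static void main(String[] args) {
        // "price" is declared in the schema but absent from the header.
        System.out.println(missingColumns(
            Arrays.asList("id", "name"),
            Arrays.asList("id", "name", "price"))); // [price]
    }
}
```

A real extension would raise an exception when the returned list is non-empty, matching the fail-fast behaviour discussed earlier in the thread.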
Hi francois,
> I see that CsvTableSource allows defining csv fields. Then, will it
check whether the columns actually exist in the file and throw an Exception if not?
Currently, CsvTableSource doesn't support Avro. CsvTableSource
uses fieldDelim and rowDelim to parse data. But there is a workaround: read
e
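The fieldDelim/rowDelim parsing Hequn mentions can be shown conceptually with a stand-alone sketch: split records on the row delimiter, then split each record on the field delimiter. This is an illustration of the idea only, not Flink's actual implementation:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Stand-alone illustration of delimiter-based CSV parsing
// (fieldDelim between columns, rowDelim between records). Not Flink code.
public class DelimParser {

    public static List<List<String>> parse(String data, String rowDelim, String fieldDelim) {
        List<List<String>> rows = new ArrayList<>();
        for (String row : data.split(Pattern.quote(rowDelim), -1)) {
            if (row.isEmpty()) continue; // skip the trailing empty record
            rows.add(Arrays.asList(row.split(Pattern.quote(fieldDelim), -1)));
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(parse("1;a\n2;b\n", "\n", ";")); // [[1, a], [2, b]]
    }
}
```

Note that a pure delimiter split has no notion of a schema, which is why the header check against the Avro field names has to happen as a separate step.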
Hi Hequn,
The Table-API is really great.
I will use it, and certainly love it, to solve the issues I mentioned before.
One subsequent question regarding the Table-API:
I've got my csv files and the avro schemas that describe them.
As my users can send erroneous files, inconsistent with the schemas, I want to
check them before processing.
Hi francois,
If I understand correctly, you can use SQL or the Table-API to solve your
problem.
As you want to project a subset of columns from the source, a columnar storage
format like parquet/orc would be efficient. Currently, an ORC table source is
supported in Flink; you can find more details here[1]. Also, there a
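The projection idea behind Hequn's columnar-storage suggestion can be illustrated outside Flink: only the requested columns are materialized, which is what makes Parquet/ORC reads cheap for wide tables. The helper below is hypothetical, not Flink's ORC or Parquet reader:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of column projection: keep only the
// requested column indices from each row. Not Flink code.
public class Projection {

    public static List<List<String>> project(List<List<String>> rows, int... columns) {
        List<List<String>> out = new ArrayList<>();
        for (List<String> row : rows) {
            List<String> projected = new ArrayList<>();
            for (int c : columns) projected.add(row.get(c));
            out.add(projected);
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
            Arrays.asList("1", "alice", "10.5"),
            Arrays.asList("2", "bob", "7.2"));
        // Keep only the first and third columns.
        System.out.println(project(rows, 0, 2)); // [[1, 10.5], [2, 7.2]]
    }
}
```

A columnar format does this at the storage layer, skipping the unprojected columns entirely instead of reading and discarding them.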