Hi Avi,

I'm not entirely sure I understand the question. Let's say you have sources A, B, and C, all with different schemas, but all of them containing an id field. You could use the ParquetMapInputFormat, which provides each record as a map, and then filter with a plain map lookup.
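Roughly something like this (an untested sketch against Flink 1.12; the path and object name are made up, and since ParquetMapInputFormat's constructor still wants a Parquet MessageType, I'm reading it from the file footer here rather than hand-writing it):

  import org.apache.flink.api.common.io.InputFormat
  import org.apache.flink.api.scala._
  import org.apache.flink.core.fs.{Path => FlinkPath}
  import org.apache.flink.formats.parquet.ParquetMapInputFormat
  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.{Path => HadoopPath}
  import org.apache.parquet.hadoop.ParquetFileReader
  import org.apache.parquet.hadoop.util.HadoopInputFile

  object FilterParquetById {
    def main(args: Array[String]): Unit = {
      val env = ExecutionEnvironment.getExecutionEnvironment
      val file = "hdfs:///data/source-a/part-0.parquet" // made-up path

      // Read the schema from the Parquet footer so we don't have to
      // hard-code a MessageType per source.
      val reader = ParquetFileReader.open(
        HadoopInputFile.fromPath(new HadoopPath(file), new Configuration()))
      val schema =
        try reader.getFooter.getFileMetaData.getSchema
        finally reader.close()

      // Each record comes back as a java.util.Map, so the filter is a
      // plain key lookup and no converter has to be implemented. The cast
      // is needed because the format uses a raw Map type on the Java side.
      val format = new ParquetMapInputFormat(new FlinkPath(file), schema)
        .asInstanceOf[InputFormat[java.util.Map[String, AnyRef], _]]

      env.createInput(format)
        .filter(m => m.get("id") != "123") // assumes id is read as a String
        .print()
    }
  }

The map lookup avoids implementing a converter, but to write the records back "as is" you would still need a writer schema per source.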
However, I'm not sure how you want to write these records with different schemas into the same Parquet file. Maybe you just want to extract the common fields of A, B, and C? Then you could also use the Table API and declare only the fields that are common, along the lines of the sketch below. Or do you have sinks A, B, and C and actually 3 separate topologies?
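A rough sketch (again untested; table name and path are invented, and this assumes Flink 1.11+ with the filesystem connector and the Parquet format):

  import org.apache.flink.table.api.{EnvironmentSettings, TableEnvironment}

  object FilterWithTableApi {
    def main(args: Array[String]): Unit = {
      val tEnv = TableEnvironment.create(
        EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build())

      // Declare only the fields that A, B, and C have in common.
      tEnv.executeSql(
        """CREATE TABLE raw_events (
          |  id STRING
          |) WITH (
          |  'connector' = 'filesystem',
          |  'path' = 'hdfs:///data/events',
          |  'format' = 'parquet'
          |)""".stripMargin)

      // Equivalent of the Spark one-liner in your mail.
      tEnv.executeSql("SELECT * FROM raw_events WHERE id <> '123'").print()
    }
  }

Since the Parquet format resolves columns by name, the extra fields in the individual files are simply not read. Note, though, that only the declared columns survive, so this does not write the records "as is" either.

On Wed, Mar 10, 2021 at 10:50 AM Avi Levi <a...@theneura.com> wrote:

> Hi all,
> I am trying to filter lines from parquet files. The problem is that they
> have different schemas; however, the field that I am using to filter
> exists in all schemas.
> In Spark this is quite straightforward:
>
> *val filtered = rawsDF.filter(col("id") != "123")*
>
> I tried to do it in Flink by extending the ParquetInputFormat, but in
> this case I need the schema (message type) and have to implement the
> convert method, which I want to avoid since I do not want to convert the
> line (I want to write it as is to another parquet file).
>
> Any ideas?
>
> Cheers
> Avi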