Hi Kamal,

I think it depends on the data volume and the usage scenario. The bulk
writer rolls to a new file on every checkpoint, so if your checkpoint
interval is long and enough data arrives in between, more than one row
group may be written within a single checkpoint cycle, and configuring the
row group size makes sense. However, if checkpoints are frequent or the
data volume is small, each file ends up with a single row group that never
reaches the configured size, so there is not much need to tune it.
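
To make that reasoning concrete, here is a rough back-of-the-envelope
sketch in plain Java (no Flink or Parquet dependencies). The class name,
method name, and the throughput figures are made up for illustration. It
assumes a file is rolled on every checkpoint and that a row group is closed
once the buffered data reaches the configured size or the file is closed.
Real numbers will differ, since Parquet sizes row groups from its own
estimate of the buffered data, and the volume is per writer, per file.

```java
class RowGroupEstimate {
    // 128 MB, the Parquet default row group size quoted in the original question.
    static final long DEFAULT_ROW_GROUP_BYTES = 128L * 1024 * 1024;

    /**
     * Rough estimate of how many row groups land in one part file.
     * Assumption: one file per checkpoint interval, and a row group is
     * flushed when it reaches rowGroupBytes or when the file is closed.
     */
    static long estimateRowGroups(long bytesPerSecond,
                                  long checkpointIntervalSeconds,
                                  long rowGroupBytes) {
        long bytesPerCheckpoint = bytesPerSecond * checkpointIntervalSeconds;
        if (bytesPerCheckpoint <= 0) {
            return 0; // nothing written, so no row group
        }
        // Ceiling division: the last, partially filled row group still counts.
        return (bytesPerCheckpoint + rowGroupBytes - 1) / rowGroupBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        // Hypothetical: 1 MB/s with a 60 s checkpoint interval.
        System.out.println(estimateRowGroups(1 * mb, 60, DEFAULT_ROW_GROUP_BYTES));
        // Hypothetical: 10 MB/s with a 600 s checkpoint interval.
        System.out.println(estimateRowGroups(10 * mb, 600, DEFAULT_ROW_GROUP_BYTES));
    }
}
```

In the first case the file holds a single row group well below 128 MB, so
the setting has little effect. In the second case the file spans many row
groups, and the configured size directly shapes the file layout.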


Best,
Feng

On Sat, Aug 19, 2023 at 7:00 PM Kamal Mittal via user <user@flink.apache.org>
wrote:

> Hello Community,
>
>
>
> Could you please share your input on the query below?
>
>
>
> Rgds,
>
> Kamal
>
>
>
> *From:* Kamal Mittal via user <user@flink.apache.org>
> *Sent:* 18 August 2023 08:04 AM
> *To:* user@flink.apache.org
> *Subject:* RE: Flink AVRO to Parquet writer - Row group size/Page size
>
>
>
> Hello Community,
>
>
>
> Please share your views on the query below.
>
>
>
> Rgds,
>
> Kamal
>
>
>
> *From:* Kamal Mittal <kamal.mit...@ericsson.com>
> *Sent:* 17 August 2023 08:01 AM
> *To:* Kamal Mittal <kamal.mit...@ericsson.com>; user@flink.apache.org
> *Subject:* RE: Flink AVRO to Parquet writer - Row group size/Page size
>
>
>
> Hello Community,
>
>
>
> Please share your views on the query below.
>
>
>
> Rgds,
>
> Kamal
>
>
>
> *From:* Kamal Mittal via user <user@flink.apache.org>
> *Sent:* 16 August 2023 04:35 PM
> *To:* user@flink.apache.org
> *Subject:* Flink AVRO to Parquet writer - Row group size/Page size
>
>
>
> Hello,
>
>
>
> For Parquet, the default row group size is 128 MB and the default page
> size is 1 MB, but the Flink bulk writer used with the file sink creates
> files based on the checkpointing interval only.
>
>
>
> So is there any significance to the configured row group size and page
> size for the Flink Parquet bulk writer? How does Flink use these two
> values together with the checkpointing interval?
>
>
>
> Rgds,
>
> Kamal
>
