Yes. Because of the error below, the Flink bulk writer never closes the part file and keeps creating new part files continuously. Does Flink not handle exceptions like the one below?

From: Feng Jin <jinfeng1...@gmail.com>
Sent: 20 September 2023 05:54 PM
To: Kamal Mittal <kamal.mit...@ericsson.com>
Cc: user@flink.apache.org
Subject: Re: About Flink parquet format

Hi,

I tested it on my side and got the same error. This appears to be a limitation of Parquet.

```
java.lang.IllegalArgumentException: maxCapacityHint can't be less than initialSlabSize 64 1
    at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:57) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
    at org.apache.parquet.bytes.CapacityByteArrayOutputStream.<init>(CapacityByteArrayOutputStream.java:153) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
    at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridEncoder.<init>(RunLengthBitPackingHybridEncoder.jav
```

So I think the minimum page size that can currently be set for Parquet is 64 B.

Best,
Feng

On Tue, Sep 19, 2023 at 6:06 PM Kamal Mittal <kamal.mit...@ericsson.com> wrote:

Hello,

With a page size of 1 byte, I encountered the exception ‘maxCapacityHint can't be less than initialSlabSize %d %d’. It comes from the class CapacityByteArrayOutputStream, which is contained in the parquet-common library.

Rgds,
Kamal

From: Feng Jin <jinfeng1...@gmail.com>
Sent: 19 September 2023 01:01 PM
To: Kamal Mittal <kamal.mit...@ericsson.com>
Cc: user@flink.apache.org
Subject: Re: About Flink parquet format

Hi Kamal,

What exception did you encounter? I have tested it locally and it works fine.

Best,
Feng

On Mon, Sep 18, 2023 at 11:04 AM Kamal Mittal <kamal.mit...@ericsson.com> wrote:

Hello,

Checkpointing is enabled and works fine if the configured Parquet page size is at least 64 bytes; otherwise an exception is thrown at the back end. This looks like an issue that is not handled by the file sink bulk writer?

Rgds,
Kamal

From: Feng Jin <jinfeng1...@gmail.com>
Sent: 15 September 2023 04:14 PM
To: Kamal Mittal <kamal.mit...@ericsson.com>
Cc: user@flink.apache.org
Subject: Re: About Flink parquet format

Hi Kamal,

Check whether checkpointing is enabled for the task and triggered correctly. By default, writing Parquet files rolls a new file when checkpointing.

Best,
Feng

On Thu, Sep 14, 2023 at 7:27 PM Kamal Mittal via user <user@flink.apache.org> wrote:

Hello,

I tried creating Parquet files with the file sink bulk writer. If the Parquet page size is configured as low as 1 byte (an allowed configuration), Flink keeps creating multiple ‘in-progress’ files that contain only ‘PAR1’ and never closes them.

I would like to know why the files are not closed and multiple ‘in-progress’ part files are created, or why no error is reported, if one applies.

Rgds,
Kamal
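
One possible workaround for the behavior discussed in this thread is to validate the page size before the Parquet writer is built, so that a value that is too small is rejected at job setup. The following is only a minimal sketch in plain Java. It assumes the 64-byte floor seen in the stack trace above (observed with flink-sql-parquet 1.17.1, and possibly different in other versions). `PageSizeGuard` and its methods are hypothetical names, not part of Flink or Parquet.

```java
/**
 * Hypothetical helper (not a Flink or Parquet API): rejects Parquet page
 * sizes below the floor reported in this thread before a writer is built.
 */
final class PageSizeGuard {

    // Assumption: 64 is the initialSlabSize value printed in the exception
    // quoted in this thread; other Parquet versions may use a different value.
    static final int MIN_PAGE_SIZE_BYTES = 64;

    private PageSizeGuard() {
        // Static utility class, not meant to be instantiated.
    }

    /** Returns true if the page size is at or above the assumed minimum. */
    static boolean isSafePageSize(int pageSizeBytes) {
        return pageSizeBytes >= MIN_PAGE_SIZE_BYTES;
    }

    /** Returns the page size unchanged, or throws if it is below the assumed minimum. */
    static int requireSafePageSize(int pageSizeBytes) {
        if (!isSafePageSize(pageSizeBytes)) {
            throw new IllegalArgumentException(
                "Parquet page size " + pageSizeBytes
                    + " B is below the assumed minimum of "
                    + MIN_PAGE_SIZE_BYTES + " B");
        }
        return pageSizeBytes;
    }
}
```

A job could call `PageSizeGuard.requireSafePageSize(configuredPageSize)` on its configured value before passing it to the Parquet writer configuration. With a page size of 1, the call throws an `IllegalArgumentException` immediately, rather than the error surfacing only inside the bulk writer.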