Re: About Flink parquet format

2023-09-24 Thread Feng Jin
Hi Kamal

Indeed, Flink does not handle this exception. When it occurs, the Flink job
fails and then keeps restarting internally, creating new files on each
restart attempt.

Personally, I think this logic could be optimized: when this exception
occurs, the file that hit the exception should be deleted before the Flink
job exits, to avoid leaving too many unnecessary files behind.


Best,
Feng



RE: About Flink parquet format

2023-09-24 Thread Kamal Mittal via user
Hello,

Could you please share why Flink is not able to handle this exception and keeps
creating files continuously without closing them?

Rgds,
Kamal



RE: About Flink parquet format

2023-09-20 Thread Kamal Mittal via user
Yes.

Due to the error below, the Flink bulk writer never closes the part file and keeps
creating new part files continuously. Is Flink not handling exceptions like the
one below?

From: Feng Jin 
Sent: 20 September 2023 05:54 PM
To: Kamal Mittal 
Cc: user@flink.apache.org
Subject: Re: About Flink parquet format

Hi

I tested it on my side and also got the same error. This should be a limitation 
of Parquet.

```
java.lang.IllegalArgumentException: maxCapacityHint can't be less than 
initialSlabSize 64 1
at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:57) 
~[flink-sql-parquet-1.17.1.jar:1.17.1]
at 
org.apache.parquet.bytes.CapacityByteArrayOutputStream.(CapacityByteArrayOutputStream.java:153)
 ~[flink-sql-parquet-1.17.1.jar:1.17.1]
at 
org.apache.parquet.column.values.rle.RunLengthBitPackingHybridEncoder.(RunLengthBitPackingHybridEncoder.jav
```


So I think the current minimum page size that can be set for parquet is 64B.

Best,
Feng


On Tue, Sep 19, 2023 at 6:06 PM Kamal Mittal 
mailto:kamal.mit...@ericsson.com>> wrote:
Hello,

If given page size as 1 byte then encountered exception as  - ‘maxCapacityHint 
can't be less than initialSlabSize %d %d’.

This is coming from class CapacityByteArrayOutputStream and contained in 
parquet-common library.

Rgds,
Kamal

From: Feng Jin mailto:jinfeng1...@gmail.com>>
Sent: 19 September 2023 01:01 PM
To: Kamal Mittal mailto:kamal.mit...@ericsson.com>>
Cc: user@flink.apache.org<mailto:user@flink.apache.org>
Subject: Re: About Flink parquet format

Hi Kamal

What exception did you encounter? I have tested it locally and it works fine.


Best,
Feng


On Mon, Sep 18, 2023 at 11:04 AM Kamal Mittal 
mailto:kamal.mit...@ericsson.com>> wrote:
Hello,

Checkpointing is enabled and works fine if configured parquet page size is at 
least 64 bytes as otherwise there is exception thrown at back-end.

Looks to be an issue which is not handled by file sink bulk writer?

Rgds,
Kamal

From: Feng Jin mailto:jinfeng1...@gmail.com>>
Sent: 15 September 2023 04:14 PM
To: Kamal Mittal mailto:kamal.mit...@ericsson.com>>
Cc: user@flink.apache.org<mailto:user@flink.apache.org>
Subject: Re: About Flink parquet format

Hi Kamal

Check if the checkpoint of the task is enabled and triggered correctly. By 
default, write parquet files will roll a new file when checkpointing.


Best,
Feng

On Thu, Sep 14, 2023 at 7:27 PM Kamal Mittal via user 
mailto:user@flink.apache.org>> wrote:
Hello,

Tried parquet file creation with file sink bulk writer.

If configured parquet page size as low as 1 byte (allowed configuration) then 
flink keeps on creating multiple ‘in-progress’ state files and with content 
only as ‘PAR1’ and never closed the file.

I want to know what is the reason of not closing the file and creating multiple 
‘in-progress’ part files or why no error is given if applicable?

Rgds,
Kamal


Re: About Flink parquet format

2023-09-20 Thread Feng Jin
Hi

I tested it on my side and also got the same error. This should be a
limitation of Parquet.

```
java.lang.IllegalArgumentException: maxCapacityHint can't be less than initialSlabSize 64 1
    at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:57) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
    at org.apache.parquet.bytes.CapacityByteArrayOutputStream.<init>(CapacityByteArrayOutputStream.java:153) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
    at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridEncoder.<init>(RunLengthBitPackingHybridEncoder.jav
```


So I think the minimum page size that can currently be set for Parquet is 64 bytes.
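
For reference, here is a minimal sketch of one way to keep the page size explicit
(at or above that 64-byte floor) when building the bulk writer yourself. It assumes
an Avro GenericRecord schema and the flink-parquet / parquet-avro dependencies; the
class name, schema, and output path are illustrative and not taken from this thread:

```
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.ParquetWriterFactory;
import org.apache.parquet.avro.AvroParquetWriter;

public class ParquetPageSizeExample {

    public static FileSink<GenericRecord> buildSink(Schema schema, String outputDir) {
        // Build the Parquet writer directly so the page size is explicit.
        // Anything below 64 bytes trips the CapacityByteArrayOutputStream
        // precondition shown above, so stay at 64 bytes or (more realistically)
        // near the Parquet default of 1 MB.
        ParquetWriterFactory<GenericRecord> factory = new ParquetWriterFactory<>(
                out -> AvroParquetWriter.<GenericRecord>builder(out)
                        .withSchema(schema)
                        .withPageSize(64 * 1024) // 64 KB; must be >= 64 bytes
                        .build());

        return FileSink.forBulkFormat(new Path(outputDir), factory).build();
    }
}
```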

Best,
Feng




RE: About Flink parquet format

2023-09-19 Thread Kamal Mittal via user
Hello,

If the page size is set as low as 1 byte, the following exception is encountered:
‘maxCapacityHint can't be less than initialSlabSize %d %d’.

This is thrown from the class CapacityByteArrayOutputStream in the
parquet-common library.
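
For context, the two numbers in that message appear to map to initialSlabSize = 64
and maxCapacityHint = the configured page size, so any page size below 64 bytes
fails the precondition. A small stand-alone sketch of that check, paraphrased from
the exception text rather than copied from the parquet-common source:

```
public class PageSizeCheck {

    // Mirrors the precondition reported in the exception: the capacity hint
    // (driven by the configured page size) must be at least the 64-byte
    // initial slab size.
    static void checkPageSize(int pageSizeBytes) {
        final int initialSlabSize = 64; // value shown in the exception message
        if (pageSizeBytes < initialSlabSize) {
            throw new IllegalArgumentException(String.format(
                    "maxCapacityHint can't be less than initialSlabSize %d %d",
                    initialSlabSize, pageSizeBytes));
        }
    }

    public static void main(String[] args) {
        checkPageSize(1); // a 1-byte page size reproduces the failure
    }
}
```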

Rgds,
Kamal



Re: About Flink parquet format

2023-09-19 Thread Feng Jin
Hi Kamal

What exception did you encounter? I have tested it locally and it works
fine.


Best,
Feng




RE: About Flink parquet format

2023-09-17 Thread Kamal Mittal via user
Hello,

Checkpointing is enabled and works fine if the configured Parquet page size is at
least 64 bytes; otherwise an exception is thrown in the back end.

Does this look like an issue that is not handled by the file sink bulk writer?

Rgds,
Kamal



Re: About Flink parquet format

2023-09-15 Thread Feng Jin
Hi Kamal

Check whether checkpointing is enabled for the task and triggered correctly. By
default, the Parquet bulk writer rolls over to a new part file on each checkpoint.
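
A minimal sketch of that setup, assuming a DataStream job writing Avro
GenericRecords; the schema, output path, and placeholder source are illustrative
and not taken from this thread:

```
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.AvroParquetWriters;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.OnCheckpointRollingPolicy;

public class ParquetFileSinkJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Bulk formats can only roll part files on checkpoint, so without
        // checkpointing the files stay "in-progress" and are never finalized.
        env.enableCheckpointing(60_000); // checkpoint every 60 s

        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Rec\","
                        + "\"fields\":[{\"name\":\"msg\",\"type\":\"string\"}]}");

        DataStream<GenericRecord> records = buildSource(env, schema);

        FileSink<GenericRecord> sink = FileSink
                .forBulkFormat(new Path("/tmp/parquet-out"),
                        AvroParquetWriters.forGenericRecord(schema))
                // Explicit here, but this is already the default for bulk formats.
                .withRollingPolicy(OnCheckpointRollingPolicy.build())
                .build();

        records.sinkTo(sink);
        env.execute("parquet-file-sink");
    }

    private static DataStream<GenericRecord> buildSource(
            StreamExecutionEnvironment env, Schema schema) {
        // Placeholder: replace with a real source that produces GenericRecords.
        throw new UnsupportedOperationException("provide a real source");
    }
}
```

Since bulk formats already roll on checkpoint by default, the key thing to verify is
that checkpoints are actually enabled and completing.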


Best,
Feng



About Flink parquet format

2023-09-14 Thread Kamal Mittal via user
Hello,

I tried Parquet file creation with the file sink bulk writer.

If the Parquet page size is configured as low as 1 byte (an allowed configuration),
Flink keeps creating multiple 'in-progress' part files whose only content is 'PAR1',
and never closes them.

I want to know the reason for not closing the file and for creating multiple
'in-progress' part files, and why no error is reported, if applicable.

Rgds,
Kamal