JingGe commented on pull request #17520:
URL: https://github.com/apache/flink/pull/17520#issuecomment-962947461
@tsreaper
> No I haven't. But I can think of one problem with this: some records
may be large, for example JSON strings containing tens of thousands of
characters
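The concern about large records can be made concrete with a small sketch (plain Java, not Flink code; the class name and the record sizes are illustrative assumptions): when a reader buffers a batch of records by count, every record of the batch lives on the heap at once, so large JSON strings multiply quickly.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch, not Flink source. Buffering a fixed *count* of records
// per batch bounds nothing when individual records are large: a Java String
// costs roughly 2 bytes per char on the heap (UTF-16, ignoring headers).
public class BatchSizeSketch {

    // Rough heap estimate for a buffered batch of string records.
    static long estimatedHeapBytes(List<String> batch) {
        long total = 0;
        for (String record : batch) {
            total += 2L * record.length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> batch = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            batch.add("x".repeat(50_000)); // stand-in for a large JSON record
        }
        // 100 records x 50,000 chars x 2 bytes = 10,000,000 bytes (~10 MB)
        System.out.println(estimatedHeapBytes(batch));
    }
}
```

A per-batch byte budget, rather than a record count, would be the natural way to cap this.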
JingGe commented on pull request #17520:
URL: https://github.com/apache/flink/pull/17520#issuecomment-961072361
@tsreaper many thanks for your effort and for sharing the benchmark data.
The option of using BulkFormat + ArrayList is almost the same as using
JingGe commented on pull request #17520:
URL: https://github.com/apache/flink/pull/17520#issuecomment-948391521
@tsreaper
> Are you suggesting an avro `StreamFormat` which produces one avro block,
instead of one Flink row data, at a time? If yes, we'll need another operator
after the
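To illustrate the block-at-a-time idea in the quote above (plain Java, all names hypothetical, not the Flink API): if the format emits whole blocks, i.e. lists of rows, some downstream step has to flatten each block back into individual records before the rest of the pipeline can consume them.

```java
import java.util.List;
import java.util.stream.Collectors;

// Conceptual sketch: a format emitting Avro-style *blocks* (List of rows)
// needs a flattening step to recover the per-record stream.
public class BlockFlattenSketch {

    // The hypothetical "another operator": turns blocks back into rows.
    static List<String> flatten(List<List<String>> blocks) {
        return blocks.stream()
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> blocks = List.of(List.of("r1", "r2"), List.of("r3"));
        System.out.println(flatten(blocks)); // [r1, r2, r3]
    }
}
```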
JingGe commented on pull request #17520:
URL: https://github.com/apache/flink/pull/17520#issuecomment-948313648
> @JingGe
>
> > For point 1, the uncompressed data size should be controlled by
`StreamFormat.FETCH_IO_SIZE`. It might not be very precise to control the heap
size, since
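The imprecision mentioned in the quote can be sketched with standard-library compression (plain Java using `java.util.zip`, not Flink code): a limit on bytes fetched from IO bounds the *compressed* input, but highly compressible data can expand by orders of magnitude after decompression, so such a limit only loosely bounds heap usage.

```java
import java.util.zip.Deflater;

// Conceptual sketch: compare compressed vs. uncompressed size to show why
// an IO fetch-size cap is a weak proxy for a heap-size cap.
public class CompressionRatioSketch {

    // Deflate the input fully and return the compressed byte count.
    static int compressedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[input.length + 64]; // large enough for one pass
        int n = deflater.deflate(buf);
        deflater.end();
        return n;
    }

    public static void main(String[] args) {
        byte[] repetitive = new byte[1 << 20]; // 1 MiB of zeros: extreme but illustrative
        int compressed = compressedSize(repetitive);
        // A fetch limit sized for `compressed` bytes admits ~1 MiB onto the heap.
        System.out.println("uncompressed=" + repetitive.length
                + " compressed=" + compressed);
    }
}
```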
JingGe commented on pull request #17520:
URL: https://github.com/apache/flink/pull/17520#issuecomment-947822402
> @slinkydeveloper There are four reasons why I did not choose
`StreamFormat`.
>
> 1. The biggest concern is that `StreamFormatAdapter.Reader#readBatch`
stores all
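The behavior being described can be sketched as follows (an assumed simplification in plain Java, not the actual `StreamFormatAdapter` source): a readBatch-style adapter eagerly drains a per-record reader into an `ArrayList`, so every record of the fetch is retained on the heap until the batch is consumed.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Conceptual sketch of the concern: batch assembly that materializes all
// records of a fetch at once instead of streaming them through.
public class ReadBatchSketch {

    // Minimal stand-in for a per-record reader; returns null at end of batch.
    interface RecordReader<T> {
        T read();
    }

    static <T> List<T> readBatch(RecordReader<T> reader) {
        List<T> batch = new ArrayList<>();
        T record;
        while ((record = reader.read()) != null) {
            batch.add(record); // all records held until the batch is handed off
        }
        return batch;
    }

    public static void main(String[] args) {
        Iterator<String> source = List.of("a", "b", "c").iterator();
        List<String> batch = readBatch(() -> source.hasNext() ? source.next() : null);
        System.out.println(batch); // [a, b, c]
    }
}
```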