[GitHub] [flink] JingGe commented on pull request #17520: [FLINK-24565][avro] Port avro file format factory to BulkReaderFormatFactory

2021-11-08 Thread GitBox
JingGe commented on pull request #17520: URL: https://github.com/apache/flink/pull/17520#issuecomment-962947461 @tsreaper > No I haven't. But I can come up with one problem about this: Some records may be large, for example json strings containing tens of thousands of characters

[GitHub] [flink] JingGe commented on pull request #17520: [FLINK-24565][avro] Port avro file format factory to BulkReaderFormatFactory

2021-11-04 Thread GitBox
JingGe commented on pull request #17520: URL: https://github.com/apache/flink/pull/17520#issuecomment-961072361 @tsreaper many thanks for your effort and for sharing the benchmark data. The option of using BulkFormat + ArrayList is almost the same as using

[GitHub] [flink] JingGe commented on pull request #17520: [FLINK-24565][avro] Port avro file format factory to BulkReaderFormatFactory

2021-10-21 Thread GitBox
JingGe commented on pull request #17520: URL: https://github.com/apache/flink/pull/17520#issuecomment-948391521 @tsreaper > Are you suggesting an avro `StreamFormat` which produces an avro block, instead of a Flink row data, at a time? If yes we'll need another operator after the

[GitHub] [flink] JingGe commented on pull request #17520: [FLINK-24565][avro] Port avro file format factory to BulkReaderFormatFactory

2021-10-21 Thread GitBox
JingGe commented on pull request #17520: URL: https://github.com/apache/flink/pull/17520#issuecomment-948313648 > @JingGe > > > For point 1, the uncompressed data size should be controlled by `StreamFormat.FETCH_IO_SIZE`. It might not be very precise to control the heap size, since

[GitHub] [flink] JingGe commented on pull request #17520: [FLINK-24565][avro] Port avro file format factory to BulkReaderFormatFactory

2021-10-20 Thread GitBox
JingGe commented on pull request #17520: URL: https://github.com/apache/flink/pull/17520#issuecomment-947822402 > @slinkydeveloper There are four reasons why I did not choose `StreamFormat`. > > 1. The biggest concern is that `StreamFormatAdapter.Reader#readBatch` stores all