[GitHub] [flink] tsreaper edited a comment on pull request #17520: [FLINK-24565][avro] Port avro file format factory to BulkReaderFormatFactory

2021-11-08 Thread GitBox
tsreaper edited a comment on pull request #17520: URL: https://github.com/apache/flink/pull/17520#issuecomment-961764970 @JingGe > Have you tried to control the number of records each batchRead() will fetch instead of fetch all records of the current block in one shot? No I

[GitHub] [flink] tsreaper edited a comment on pull request #17520: [FLINK-24565][avro] Port avro file format factory to BulkReaderFormatFactory

2021-11-05 Thread GitBox
tsreaper edited a comment on pull request #17520: URL: https://github.com/apache/flink/pull/17520#issuecomment-961764970 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [flink] tsreaper edited a comment on pull request #17520: [FLINK-24565][avro] Port avro file format factory to BulkReaderFormatFactory

2021-11-05 Thread GitBox
tsreaper edited a comment on pull request #17520: URL: https://github.com/apache/flink/pull/17520#issuecomment-961764970 @JingGe > Have you tried to control the number of records each batchRead() will fetch instead of fetch all records of the current block in one shot? No I

[GitHub] [flink] tsreaper edited a comment on pull request #17520: [FLINK-24565][avro] Port avro file format factory to BulkReaderFormatFactory

2021-11-03 Thread GitBox
tsreaper edited a comment on pull request #17520: URL: https://github.com/apache/flink/pull/17520#issuecomment-959002265

[GitHub] [flink] tsreaper edited a comment on pull request #17520: [FLINK-24565][avro] Port avro file format factory to BulkReaderFormatFactory

2021-11-03 Thread GitBox
tsreaper edited a comment on pull request #17520: URL: https://github.com/apache/flink/pull/17520#issuecomment-959002265 @slinkydeveloper @fapaul @JingGe I've done some benchmarking on a testing yarn cluster. * Test data: The [Kaggle flight delay

[GitHub] [flink] tsreaper edited a comment on pull request #17520: [FLINK-24565][avro] Port avro file format factory to BulkReaderFormatFactory

2021-10-21 Thread GitBox
tsreaper edited a comment on pull request #17520: URL: https://github.com/apache/flink/pull/17520#issuecomment-948333809 @JingGe > "record" is the abstract concept, it does not mean the record in avro. Are you suggesting an avro `StreamFormat` which produces an avro block,

[GitHub] [flink] tsreaper edited a comment on pull request #17520: [FLINK-24565][avro] Port avro file format factory to BulkReaderFormatFactory

2021-10-20 Thread GitBox
tsreaper edited a comment on pull request #17520: URL: https://github.com/apache/flink/pull/17520#issuecomment-948202218 @JingGe > For point 1, the uncompressed data size should be controlled by `StreamFormat.FETCH_IO_SIZE`. It might not be very precise to control the heap size,

[GitHub] [flink] tsreaper edited a comment on pull request #17520: [FLINK-24565][avro] Port avro file format factory to BulkReaderFormatFactory

2021-10-19 Thread GitBox
tsreaper edited a comment on pull request #17520: URL: https://github.com/apache/flink/pull/17520#issuecomment-947280429 @slinkydeveloper There are four reasons why I did not choose `StreamFormat`. 1. The biggest concern is that `StreamFormatAdapter.Reader#readBatch` stores all results
