Hi -

It appears as though the 'min_bundle_size' parameter is ignored.  Can
someone verify?  I didn't see an issue in JIRA for this.

I'm following the docs here:
<https://beam.apache.org/releases/pydoc/2.30.0/apache_beam.io.parquetio.html>

Example code:

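import apache_beam as beam

# p is the beam.Pipeline; pq_file is the path to the Parquet file.
mbs = 10000000 * 10  # 100,000,000 bytes = 100 MB
(
    p
    | 'ReadParquetBatched' >> beam.io.ReadFromParquetBatched(
        pq_file, min_bundle_size=mbs)
    | beam.Map(lambda table: print(table.shape))
)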

I've experimented with various 'min_bundle_size' values, and they all
produce tables of the same shape: 100 rows, 4 cols.
