pitrou opened a new pull request #7098:
URL: https://github.com/apache/arrow/pull/7098


   The AWS SDK creates an auto-growing StringStream by default, entailing 
multiple memory copies when transferring large data blocks (because of 
resizes).  Instead, this PR writes directly into the target data area.
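
The idea can be sketched with the standard library alone. The block below is a minimal illustration, not the code in this PR and not an AWS SDK API: `FixedBufferStreamBuf` and `WriteInto` are hypothetical names, and the `std::string` payload stands in for the bytes of a download. It shows a stream buffer whose put area is a caller-owned, preallocated region, so data written through the stream lands in its final location and is never copied during a resize.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical helper (not an AWS SDK or Arrow class): a streambuf whose put
// area is a caller-owned, fixed-size region. Nothing is ever reallocated.
class FixedBufferStreamBuf : public std::streambuf {
 public:
  FixedBufferStreamBuf(char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    // Point the put area at the target region: [data, data + size).
    setp(data, data + size);
  }

  // Number of bytes written into the target region so far.
  std::size_t bytes_written() const {
    return static_cast<std::size_t>(pptr() - pbase());
  }

  // overflow() is intentionally not overridden: the default reports failure,
  // so a write past the end sets badbit instead of growing the buffer.
};

// Hypothetical stand-in for a download: streams `payload` into `target`
// through an ostream and returns how many bytes landed there.
std::size_t WriteInto(std::vector<char>& target, const std::string& payload) {
  FixedBufferStreamBuf buf(target.data(), target.size());
  std::ostream out(&buf);
  out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
  return buf.bytes_written();
}
```

Calling `WriteInto` with a target sized to the expected payload fills it in place. A payload larger than the target is truncated at the target's size and leaves the stream in a failed state, which a caller could check before trusting the data.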
   
   Low-level benchmarks with a local Minio server:
   * before:
   ```
   -----------------------------------------------------------------------------------------------------
   Benchmark                                           Time             CPU   Iterations UserCounters...
   -----------------------------------------------------------------------------------------------------
   MinioFixture/ReadAll500Mib/real_time        434528630 ns    431461370 ns            2 bytes_per_second=1.1237G/s items_per_second=2.30134/s
   MinioFixture/ReadChunked500Mib/real_time    419380389 ns    339293384 ns            2 bytes_per_second=1.16429G/s items_per_second=2.38447/s
   MinioFixture/ReadCoalesced500Mib/real_time  258812283 ns       470149 ns            3 bytes_per_second=1.88662G/s items_per_second=3.8638/s
   ```
   * after:
   ```
   MinioFixture/ReadAll500Mib/real_time        194620947 ns    161227337 ns            4 bytes_per_second=2.50888G/s items_per_second=5.13819/s
   MinioFixture/ReadChunked500Mib/real_time    276437393 ns    183030215 ns            3 bytes_per_second=1.76634G/s items_per_second=3.61746/s
   MinioFixture/ReadCoalesced500Mib/real_time   86693750 ns       448568 ns            6 bytes_per_second=5.63225G/s items_per_second=11.5349/s
   ```
   
   Parquet read benchmarks from a local Minio server show speedups from 1.1x to 
1.9x.
   



