[GitHub] [arrow] pitrou opened a new pull request #7098: ARROW-8692: [C++] Avoid memory copies when downloading from S3

GitBox Mon, 04 May 2020 07:24:40 -0700


pitrou opened a new pull request #7098:
URL: https://github.com/apache/arrow/pull/7098



   The AWS SDK creates an auto-growing StringStream by default, entailing 
multiple memory copies when transferring large data blocks (because the 
stream's buffer is resized as it grows).  Instead, write directly into the 
target data area.
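
To illustrate the idea only (this is not the code in this PR), here is a minimal standard-library sketch of a stream whose put area is a caller-owned, preallocated block, so that bytes written to the stream land in the target buffer without any intermediate growing buffer. The names `FixedBufferStreambuf`, `WriteBody` and `DemoDirectWrite` are hypothetical, and `WriteBody` merely stands in for whatever code writes a response body into a `std::ostream`.

```cpp
#include <cstddef>
#include <cstring>
#include <ostream>
#include <streambuf>
#include <vector>

// Hypothetical helper: a stream buffer writing into a fixed, caller-owned
// memory area. It never allocates and never resizes.
class FixedBufferStreambuf : public std::streambuf {
 public:
  FixedBufferStreambuf(char* data, std::size_t size) {
    // The put area is exactly the caller's buffer.
    setp(data, data + size);
  }

  // Number of bytes written into the target area so far.
  std::size_t bytes_written() const {
    return static_cast<std::size_t>(pptr() - pbase());
  }

  // overflow() is deliberately not overridden: the default returns EOF,
  // so writing past the end of the area fails instead of growing a buffer.
};

// Hypothetical stand-in for a producer that writes a body to a stream.
// Returns the number of bytes accepted, or 0 if the stream went bad.
std::size_t WriteBody(std::ostream& out, const char* src, std::size_t n) {
  out.write(src, static_cast<std::streamsize>(n));
  return out.good() ? n : 0;
}

// Writes a short payload straight into `target` and checks it arrived there.
bool DemoDirectWrite() {
  std::vector<char> target(16, '\0');
  FixedBufferStreambuf buf(target.data(), target.size());
  std::ostream out(&buf);
  const char payload[] = "hello";
  if (WriteBody(out, payload, 5) != 5) return false;
  return buf.bytes_written() == 5 &&
         std::memcmp(target.data(), payload, 5) == 0;
}
```

The caller has to know the transfer size up front (for a ranged read, the requested length) so that the target area can be allocated once. A write that exceeds the area fails and sets the stream's badbit rather than triggering a reallocation.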
   
   Low-level benchmarks with a local Minio server:
   * before:
   ```
   -----------------------------------------------------------------------------------------------------
   Benchmark                                           Time             CPU   Iterations UserCounters...
   -----------------------------------------------------------------------------------------------------
   MinioFixture/ReadAll500Mib/real_time        434528630 ns    431461370 ns            2 bytes_per_second=1.1237G/s items_per_second=2.30134/s
   MinioFixture/ReadChunked500Mib/real_time    419380389 ns    339293384 ns            2 bytes_per_second=1.16429G/s items_per_second=2.38447/s
   MinioFixture/ReadCoalesced500Mib/real_time  258812283 ns       470149 ns            3 bytes_per_second=1.88662G/s items_per_second=3.8638/s
   ```
   * after:
   ```
   MinioFixture/ReadAll500Mib/real_time        194620947 ns    161227337 ns            4 bytes_per_second=2.50888G/s items_per_second=5.13819/s
   MinioFixture/ReadChunked500Mib/real_time    276437393 ns    183030215 ns            3 bytes_per_second=1.76634G/s items_per_second=3.61746/s
   MinioFixture/ReadCoalesced500Mib/real_time   86693750 ns       448568 ns            6 bytes_per_second=5.63225G/s items_per_second=11.5349/s
   ```
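
For readers who want the ratios rather than the raw timings, the per-benchmark speedup is the real time before divided by the real time after. The small sketch below derives them from the figures quoted above; the function and constant names are hypothetical and only serve this illustration.

```cpp
#include <cstdint>

// Speedup of one benchmark: wall-clock time before / wall-clock time after.
constexpr double Speedup(std::int64_t before_ns, std::int64_t after_ns) {
  return static_cast<double>(before_ns) / static_cast<double>(after_ns);
}

// Real-time figures in nanoseconds, copied from the benchmark output above.
constexpr double kReadAllSpeedup = Speedup(434528630, 194620947);
constexpr double kReadChunkedSpeedup = Speedup(419380389, 276437393);
constexpr double kReadCoalescedSpeedup = Speedup(258812283, 86693750);
```

This works out to roughly 2.2x for ReadAll, 1.5x for ReadChunked and 3.0x for ReadCoalesced.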
   
   Parquet read benchmarks from a local Minio server show speedups from 1.1x to 
1.9x.
   


