[GitHub] [jena] afs commented on pull request #1800: ByteBufferLib, RecordBuffer: use bulk get/put APIs

via GitHub Fri, 24 Mar 2023 07:36:05 -0700


afs commented on PR #1800:
URL: https://github.com/apache/jena/pull/1800#issuecomment-1482907354


   Generally, the ByteBufferLib changes are an improvement.
   
   Adding the RecordBuffer changes causes some slowdown compared to just the 
ByteBufferLib changes.
   This is probably because the optimizer can already avoid the bounds checking 
on `ByteBuffer.get` in RecordBuffer, and because RecordBuffer operates on small 
data items, so the allocation cost of the array is significant.
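
To make the trade-off concrete, here is a minimal sketch of the two copy styles being compared. It is not the code from the PR: the class and method names are hypothetical, and the temporary array in the bulk variant is an assumption based on the "allocation cost of the array" remark above.

```java
import java.nio.ByteBuffer;

public class BulkCopySketch {
    // Per-byte copy with absolute get/put. Each call is bounds-checked
    // unless the JIT manages to eliminate the check.
    public static void copyPerByte(ByteBuffer src, int srcIdx,
                                   ByteBuffer dst, int dstIdx, int length) {
        for (int i = 0; i < length; i++)
            dst.put(dstIdx + i, src.get(srcIdx + i));
    }

    // Bulk copy through a temporary array (assumed shape, see lead-in).
    // duplicate() shares the bytes but has its own position, so the
    // callers' buffers keep their position and limit.
    public static void copyBulk(ByteBuffer src, int srcIdx,
                                ByteBuffer dst, int dstIdx, int length) {
        byte[] tmp = new byte[length];   // one allocation per call
        ByteBuffer s = src.duplicate();
        s.position(srcIdx);
        s.get(tmp, 0, length);           // bulk relative get
        ByteBuffer d = dst.duplicate();
        d.position(dstIdx);
        d.put(tmp, 0, length);           // bulk relative put
    }

    public static void main(String[] args) {
        ByteBuffer src = ByteBuffer.allocate(32);
        for (int i = 0; i < 32; i++)
            src.put(i, (byte) i);
        ByteBuffer a = ByteBuffer.allocate(32);
        ByteBuffer b = ByteBuffer.allocate(32);
        copyPerByte(src, 4, a, 2, 8);
        copyBulk(src, 4, b, 2, 8);
        System.out.println("same result: " + java.util.Arrays.equals(a.array(), b.array()));
    }
}
```

For a copy of only a few bytes, the array allocation in `copyBulk` can outweigh what the bulk call saves, which is one plausible reading of the RecordBuffer numbers.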
   
   Setup: three builds are compared: the current Jena development codebase 
("main", same as 4.7.0 for TDB2), main with the ByteBufferLib changes, and 
main with both the ByteBufferLib and RecordBuffer changes.
   
   Writing to NVMe M2 SSD.
   Reading from data file nt.gz on HDD.
   It is slower to put the data on the same SSD as the output database - the 
parser is faster than all the figures shown and is not the limiting factor.
   
   The test is loading BSBM data with tdb2.tdbloader, running each of the 3 
loaders (basic, phased, parallel) twice in one JVM (6 runs per JVM).
   The results are from the second run of each loader, so there is no class 
loading and some JIT optimization will already have happened.
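
The "run twice, report the second" approach can be sketched as below. This is only an illustration of the timing method: `SecondRunTimer` and the placeholder workload are hypothetical and do not invoke tdb2.tdbloader.

```java
public class SecondRunTimer {
    // Runs the task `runs` times in this JVM and returns each elapsed time in seconds.
    public static double[] timeRuns(Runnable task, int runs) {
        double[] seconds = new double[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            seconds[i] = (System.nanoTime() - start) / 1e9;
        }
        return seconds;
    }

    // The figure to report: the last run, after class loading and some JIT warm-up.
    public static double reported(double[] seconds) {
        return seconds[seconds.length - 1];
    }

    public static void main(String[] args) {
        // Placeholder workload standing in for one loader run (hypothetical).
        Runnable placeholderLoad = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++)
                sum += i;
            if (sum < 0)
                throw new IllegalStateException("unexpected overflow");
        };
        double[] t = timeRuns(placeholderLoad, 2);
        System.out.printf("first run: %.6f s, reported (second) run: %.6f s%n",
                          t[0], reported(t));
    }
}
```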
   
   Data sizes: 1m, 5m, 25m, 50m, and 100m triples.
   
   "basic" is approximately what a bulk load into a running Fuseki server is 
doing.
   
   <details>
   
     <summary>Time data for bulk loading</summary>
   
   ```
   1 million triples
   main:
   Basic:    7.842 seconds : Triples = 1,000,312 : Rate = 127,558 /s
   Phased:   3.897 seconds : Triples = 1,000,312 : Rate = 256,688 /s
   Parallel: 2.793 seconds : Triples = 1,000,312 : Rate = 358,150 /s
   
   ByteBuffer:
   Basic:    6.292 seconds : Triples = 1,000,312 : Rate = 158,982 /s
   Phased:   3.262 seconds : Triples = 1,000,312 : Rate = 306,656 /s
   Parallel: 2.754 seconds : Triples = 1,000,312 : Rate = 363,221 /s
   
   ByteBuffer and Record Buffer:
   Basic:    6.704 seconds : Triples = 1,000,312 : Rate = 149,211 /s
   Phased:   3.415 seconds : Triples = 1,000,312 : Rate = 292,917 /s
   Parallel: 2.720 seconds : Triples = 1,000,312 : Rate = 367,762 /s
   
   ByteBuffer faster than ByteBuffer+RecordBuffer
   
   Comparing existing and byte buffer changes:
   Basic faster
   Phased faster
   Parallel unaffected
   
   ------------------------
   
   5m
   main:
   Basic:    39.224 seconds : Triples = 5,000,599 : Rate = 127,488 /s
   Phased:   19.032 seconds : Triples = 5,000,599 : Rate = 262,747 /s
   Parallel: 13.148 seconds : Triples = 5,000,599 : Rate = 380,332 /s
   
   ByteBuffer:
   Basic:    33.248 seconds : Triples = 5,000,599 : Rate = 150,403 /s
   Phased:   16.070 seconds : Triples = 5,000,599 : Rate = 311,176 /s
   Parallel: 13.233 seconds : Triples = 5,000,599 : Rate = 377,889 /s
   
   ByteBuffer and Record Buffer:
   Basic:    34.614 seconds : Triples = 5,000,599 : Rate = 144,468 /s
   Phased:   16.635 seconds : Triples = 5,000,599 : Rate = 300,607 /s
   Parallel: 13.307 seconds : Triples = 5,000,599 : Rate = 375,787 /s
   
   ByteBuffer+RecordBuffer slower than just ByteBuffer
   
   Comparing existing and byte buffer changes:
   Basic faster
   Phased faster
   Parallel slightly slower
   
   ------------------------
   
   25m
   main:
   Basic:    141.306 seconds : Triples = 24,997,044 : Rate = 176,900 /s
   Phased:    75.321 seconds : Triples = 24,997,044 : Rate = 331,874 /s
   Parallel:  63.707 seconds : Triples = 24,997,044 : Rate = 392,375 /s
   
   ByteBuffer:
   Basic:    139.969 seconds : Triples = 24,997,044 : Rate = 178,590 /s
   Phased:    75.875 seconds : Triples = 24,997,044 : Rate = 329,450 /s
   Parallel:  64.162 seconds : Triples = 24,997,044 : Rate = 389,593 /s
   
   ByteBuffer and record buffer:
   Basic:    150.354 seconds : Triples = 24,997,044 : Rate = 166,255 /s
   Phased:    79.437 seconds : Triples = 24,997,044 : Rate = 314,678 /s
   Parallel:  65.110 seconds : Triples = 24,997,044 : Rate = 383,920 /s
   
   ByteBuffer+RecordBuffer slowest (could be GC related)
   
   Comparing existing and byte buffer changes:
   Basic faster
   Phased about the same
   Parallel slightly slower
   
   ------------------------
   
   50m
   main:
   Basic:    428.839 seconds : Triples = 50,005,630 : Rate = 116,607 /s
   Phased:   194.982 seconds : Triples = 50,005,630 : Rate = 256,463 /s
   Parallel: 132.348 seconds : Triples = 50,005,630 : Rate = 377,834 /s
   
   ByteBuffer:
   Basic:    357.125 seconds : Triples = 50,005,630 : Rate = 140,023 /s
   Phased:   164.474 seconds : Triples = 50,005,630 : Rate = 304,034 /s
   Parallel: 129.619 seconds : Triples = 50,005,630 : Rate = 385,789 /s
   
   ByteBuffer and record buffer:
   Basic:    379.890 seconds : Triples = 50,005,630 : Rate = 131,632 /s
   Phased:   171.425 seconds : Triples = 50,005,630 : Rate = 291,706 /s
   Parallel: 132.076 seconds : Triples = 50,005,630 : Rate = 378,613 /s
   
   ByteBuffer+RecordBuffer slower than just ByteBuffer
   
   Comparing existing and byte buffer changes:
   Basic faster
   Phased faster
   Parallel faster
   
   ------------------------
   
   100m:
   
   main:
   Basic:     875.759 seconds : Triples = 100,000,748 : Rate = 114,188 /s
   Phased:    390.673 seconds : Triples = 100,000,748 : Rate = 255,970 /s
   Parallel:  268.516 seconds : Triples = 100,000,748 : Rate = 372,420 /s
   
   ByteBuffer:
   Basic:     729.727 seconds : Triples = 100,000,748 : Rate = 137,039 /s
   Phased:    334.163 seconds : Triples = 100,000,748 : Rate = 299,257 /s
   Parallel:  257.353 seconds : Triples = 100,000,748 : Rate = 388,574 /s
   
   ByteBuffer and Record Buffer:
   Basic:     766.970 seconds : Triples = 100,000,748 : Rate = 130,384 /s
   Phased:    345.600 seconds : Triples = 100,000,748 : Rate = 289,354 /s
   Parallel:  258.614 seconds : Triples = 100,000,748 : Rate = 386,680 /s
   
   ByteBuffer+RecordBuffer slower than just ByteBuffer
   
   Comparing existing and byte buffer changes:
   Basic faster
   Phased faster
   Parallel slightly faster
   ```
   
   </details>
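
For reading the figures above, the rate is triples divided by seconds, and two builds can be compared by the relative change in elapsed time. A small sketch using the 100m "Basic" rows (the class and method names are hypothetical; the inputs are copied from the log):

```java
public class LoadFigures {
    // Load rate in triples per second.
    public static double rate(long triples, double seconds) {
        return triples / seconds;
    }

    // Percentage of elapsed time saved relative to the baseline
    // (positive means the changed build is faster).
    public static double percentTimeSaved(double baselineSeconds, double changedSeconds) {
        return 100.0 * (baselineSeconds - changedSeconds) / baselineSeconds;
    }

    public static void main(String[] args) {
        long triples = 100_000_748L;
        double mainBasic = 875.759;        // main, Basic, 100m
        double byteBufferBasic = 729.727;  // ByteBuffer, Basic, 100m
        System.out.printf("main rate: %.0f /s%n", rate(triples, mainBasic));
        System.out.printf("ByteBuffer rate: %.0f /s%n", rate(triples, byteBufferBasic));
        System.out.printf("time saved: %.1f%%%n",
                          percentTimeSaved(mainBasic, byteBufferBasic));
    }
}
```

On these inputs the ByteBufferLib build takes roughly 17% less time than main for the basic loader at 100m triples.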
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

