kris...@gmail.com wrote:
> Full_Name: Kristopher William Zyp
> Version: LMDB 0.9.23
> OS: Windows
> URL: https://github.com/kriszyp/node-lmdb/commit/7ff525ae57684a163d32af74a0ab9332b7fc4ce9
> Submission from: (NULL) (71.199.6.148)
>
>
> We have seen very poor performance on the sync of commits on large databases
> in Windows. On databases with 2GB of data, in writemap mode, the sync of even
> small commits is consistently well over 100ms (without writemap it is faster,
> but still slow). It is expected that a sync should take some time while
> waiting for disk confirmation of the writes, but more concerning is that
> these sync operations (in writemap mode) are instead dominated by nearly 100%
> system CPU utilization, so operations that require only sub-millisecond
> b-tree updates are then dominated by very large amounts of system CPU cycles
> during the sync phase.
>
> I think that the fundamental problem is that FlushViewOfFile seems to be an
> O(n) operation where n is the size of the file (or map). I presume that
> Windows is scanning the entire map/file for dirty pages to flush, I'm
> guessing because it doesn't have an internal index of all the dirty pages for
> every file/map-view in the OS disk cache. Therefore, this turns into an
> extremely expensive, CPU-bound operation to find the dirty pages for a large
> file and initiate their writes, which, of course, is contrary to the whole
> goal of a scalable database system. FlushFileBuffers is also relatively slow.
> We have attempted to batch as many operations into a single transaction as
> possible, but this is still a very large overhead.
>
> The Windows docs for FlushFileBuffers themselves warn about the
> inefficiencies of this function
> (https://docs.microsoft.com/en-us/windows/desktop/api/fileapi/nf-fileapi-flushfilebuffers).
> Which also points to the solution: it is much faster to write out the dirty
> pages with WriteFile through a sync file handle (FILE_FLAG_WRITE_THROUGH).
>
> The associated patch
> (https://github.com/kriszyp/node-lmdb/commit/7ff525ae57684a163d32af74a0ab9332b7fc4ce9)
> is my attempt at implementing this solution for Windows. Fortunately, with
> the design of LMDB, this is relatively straightforward. LMDB already supports
> writing out dirty pages with WriteFile calls. I added a write-through handle
> for sending these writes directly to disk. I then made that file handle
> overlapped/asynchronous, so all the writes for a commit can be started in
> overlapped mode and (at least theoretically) transfer in parallel to the
> drive, and then used GetOverlappedResult to wait for their completion. So
> basically mdb_page_flush becomes the sync. I extended the writing of dirty
> pages through WriteFile to writemap mode as well (for writing meta too), so
> that WriteFile with write-through can be used to flush the data without ever
> needing to call FlushViewOfFile or FlushFileBuffers. I also implemented
> support for write gathering in writemap mode, where contiguous file positions
> imply contiguous memory (by tracking the starting position with wdp and
> writing contiguous pages in single operations). Sorting of the dirty list is
> maintained even in writemap mode for this purpose.
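
The write-gathering step described above can be sketched roughly as follows. This is a minimal illustration in C, not the code from the patch: `page_run` and `gather_runs` are hypothetical names, and the sketch assumes a sorted dirty list of single-page entries (multi-page overflow pages are ignored).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type for illustration; not one of LMDB's own structures. */
typedef struct {
    size_t first_pgno;  /* page number of the first page in the run */
    size_t npages;      /* number of contiguous pages in the run */
} page_run;

/*
 * Coalesce a sorted list of dirty page numbers into contiguous runs.
 * `runs` must have room for `n` entries (the worst case: no two pages
 * are adjacent). Returns the number of runs written.
 */
size_t gather_runs(const size_t *pgnos, size_t n, page_run *runs)
{
    size_t nruns = 0;

    for (size_t i = 0; i < n; i++) {
        /* The dirty list is assumed sorted with no duplicates. */
        assert(i == 0 || pgnos[i] > pgnos[i - 1]);

        if (nruns > 0 &&
            runs[nruns - 1].first_pgno + runs[nruns - 1].npages == pgnos[i]) {
            /* This page directly follows the current run: extend it. */
            runs[nruns - 1].npages++;
        } else {
            /* Gap in page numbers: start a new run. */
            runs[nruns].first_pgno = pgnos[i];
            runs[nruns].npages = 1;
            nruns++;
        }
    }
    return nruns;
}
```

Under these assumptions, each run would correspond to a single write of `npages * page_size` bytes at file offset `first_pgno * page_size`, with the source buffer at the same offset into the map in writemap mode, rather than one write per page.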

What is the point of using writemap mode if you still need to use WriteFile
on every individual page?

> The performance benefits of this patch, in my testing, are considerable.
> Writing out/syncing transactions is typically over 5x faster in writemap
> mode, and 2x faster in standard mode. And perhaps more importantly
> (especially in environments with many threads/processes), the efficiency
> benefits are even larger, particularly in writemap mode, where there can be
> a 50-100x reduction in system CPU usage by using this patch. This brings
> Windows performance with sync'ed transactions in LMDB back into the range of
> "lightning" performance :).

What is the performance difference between your patch using writemap, and
just not using writemap in the first place?

-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/