kris...@gmail.com wrote:
> Full_Name: Kristopher William Zyp
> Version: LMDB 0.9.23
> OS: Windows
> URL: https://github.com/kriszyp/node-lmdb/commit/7ff525ae57684a163d32af74a0ab9332b7fc4ce9
> Submission from: (NULL) (71.199.6.148)
> 
> 
> We have seen very poor performance when syncing commits on large databases in
> Windows. On databases with 2GB of data, in writemap mode, the sync of even
> small commits consistently takes well over 100ms (without writemap it is
> faster, but still slow). It is expected that a sync should take some time
> while waiting for disk confirmation of the writes, but more concerning is that
> these sync operations (in writemap mode) are instead dominated by nearly 100%
> system CPU utilization, so operations that require only sub-millisecond b-tree
> updates are then dominated by very large amounts of system CPU time during
> the sync phase.
> 
> I think the fundamental problem is that FlushViewOfFile seems to be an O(n)
> operation, where n is the size of the file (or map). I presume that Windows is
> scanning the entire map/file for dirty pages to flush, I'm guessing because it
> doesn't have an internal index of the dirty pages for every file/map view in
> the OS disk cache. Therefore, this turns into an extremely expensive, CPU-bound
> operation to find the dirty pages of a large file and initiate their writes,
> which, of course, is contrary to the whole goal of a scalable database system.
> FlushFileBuffers is also relatively slow. We have attempted to batch as many
> operations into a single transaction as possible, but this is still a very
> large overhead.
> 
> The Windows docs for FlushFileBuffers themselves warn about the inefficiency
> of this function
> (https://docs.microsoft.com/en-us/windows/desktop/api/fileapi/nf-fileapi-flushfilebuffers),
> and they also point to the solution: it is much faster to write out the dirty
> pages with WriteFile through a write-through file handle
> (FILE_FLAG_WRITE_THROUGH).
> 
> The associated patch
> (https://github.com/kriszyp/node-lmdb/commit/7ff525ae57684a163d32af74a0ab9332b7fc4ce9)
> is my attempt at implementing this solution for Windows. Fortunately, with the
> design of LMDB, this is relatively straightforward. LMDB already supports
> writing out dirty pages with WriteFile calls. I added a write-through handle
> for sending these writes directly to disk. I then opened that file handle for
> overlapped (asynchronous) I/O, so all the writes for a commit can be started
> in overlapped mode and (at least theoretically) transfer to the drive in
> parallel, and then used GetOverlappedResult to wait for their completion. So
> basically mdb_page_flush becomes the sync. I extended the writing of dirty
> pages through WriteFile to writemap mode as well (for writing meta too), so
> that WriteFile with write-through can be used to flush the data without ever
> needing to call FlushViewOfFile or FlushFileBuffers. I also implemented write
> gathering in writemap mode, where contiguous file positions imply contiguous
> memory (by tracking the starting position with wdp and writing contiguous
> pages in a single operation). The dirty list is kept sorted even in writemap
> mode for this purpose.

What is the point of using writemap mode if you still need to use WriteFile
on every individual page?

> The performance benefits of this patch, in my testing, are considerable.
> Writing out/syncing transactions is typically over 5x faster in writemap mode,
> and 2x faster in standard mode. Perhaps more importantly (especially in
> environments with many threads/processes), the efficiency gains are even
> larger, particularly in writemap mode, where this patch can reduce system CPU
> usage by 50-100x. This brings Windows performance with synced transactions in
> LMDB back into the range of "lightning" performance :).

What is the performance difference between your patch using writemap, and just
not using writemap in the first place?

-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/