Re: [Pvfs2-developers] CoalescingLowWatermark setting

Sam Lang Thu, 21 Sep 2006 09:17:37 -0700


Pete Wyckoff wrote:

I've been debugging why the metadata server calls fdatasync() five
times during a single create operation.  (IO server separate and
not considered here.)

In fs.conf, I had these StorageHints settings

    TroveSyncMeta no
    TroveSyncData no
    CoalescingHighWatermark infinity
    CoalescingLowWatermark 1

(defaults from pvfs2-genconfig with trovesync off).

The logic in dbpf-sync.c goes like this:

    if (!metadata_sync)
        ++coalesce_count
        if (high_watermark > 0 && coalesce_count >= high_watermark)
            coalesce_count = 0
            sync
        if (num_pending_TROVE_SYNC_operations < low_watermark)
            coalesce_count = 0
            sync

No matter how low the low watermark, any trove operation marked as
TROVE_SYNC will cause a full sync.  Changing
"CoalescingLowWatermark" to 0 fixed that: no syncs.

Do I understand this correctly?  Is setting the low WM to zero what
was intended?  Any non-zero value of low WM will always cause an
immediate sync after every TROVE_SYNC operation.  Was this planned?

The intended behavior with TroveSyncMeta=no is to allow trove operations
marked as TROVE_SYNC to be completed immediately. It does this by
moving the operation to the completion queue immediately as the first
statement inside the if(!metadata_sync) block. This allows the server
thread to push through those operations (go to the next state actions,
return responses, etc.) without waiting for the sync. That said, the
code you are referring to will behave in the way you described under
'low load' conditions. If there are no other operations in the dbpf op
queue marked TROVE_SYNC (or less than whatever LWM is set to) when that
second check is made, we sync. By setting the LWM to 0, you're
essentially saying that you don't want to ever sync under low load
conditions.

I'd like to have it sync every 5--10 ops, or from a timeout.  Is
there some sort of idea that these TROVE_SYNC operations are so
special that they must run immediately, every time?

Syncing on every operation should only happen under low load, and
other than delaying other operations that get posted during that sync,
there shouldn't be any performance difference compared to not syncing
at all. That's the idea anyway. Once more operations are queued
(meaning they're not getting serviced immediately), the per-operation
sync doesn't happen.


The five syncing MD operations in a create, for those keeping score,
are:

    create dspace_create (sync)
    setattr metafile distribution (sync)
    setattr dspace_setattr (sync)
    crdirent write_directory_entry (sync)
    crdirent dspace_setattr (sync)

If you look at this though, it's only doing one sync per request per database:


request 1:     create dspace_create (dspace sync)
request 2:     setattr metafile distribution (keyval sync)
request 2:     setattr dspace_setattr (dspace sync)
request 3:     crdirent write_directory_entry (keyval sync)
request 3:     crdirent dspace_setattr (dspace sync)

That's a lot of sync on both dspace and keyval dbs.  The total sync
time adds 45 ms to the overall operation on a SATA disk.

I agree, but at present we don't group requests, so there's no way to
tell the trove layer that an operation doesn't need to be synced
because another is coming right behind it. We've talked about methods
and techniques to fix this, but as I see it, there is information loss
from client to server, and then further from the server state machines
to the trove layer. Murali has been suggesting that we do transactions
over an entire PVFS system interface call, which would only require two
syncs (one for each db), but that means distributed transactions. :-)
Julian's request-id work might be useful to us in figuring out whether
to wait for a sync, esp. for the create case. I'm not sure the behavior
would be much different from what we have now though; the design of the
sync coalescing code is really meant to perform well...err, better (sync
less frequently) under high-load conditions, since under low-load
conditions it really shouldn't matter that you're syncing every time.

Just curious, you mentioned 5 calls to fdatasync() in a single create.
That _should not_ happen, and is a bug if it does. It's the db->sync
call that we make 5 times (potentially, depending on parameters and
load). Are you seeing fdatasync() for metadata operations? Also, have
you seen a big drop in metadata performance?


Let me know.

Thanks,

-sam


                -- Pete
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
