Hi Julian,
Thanks for sending out your thoughts and ideas. Even though we've
talked about most of this offline, I'm just going to summarize what I
said in case others want to comment.
-sam
On Jun 10, 2006, at 1:57 PM, Julian Martin Kunkel wrote:
Hi,
I looked around the implementation of the data sync mode a bit.
Currently PINT_flow_setinfo is called, which sets the sync mode for
each write operation of a flow. That means if 100 MByte are
transferred in 256 KByte blocks, a sync happens for every block,
which ends up in quite a lot of syncs.
Maybe it would be nice if the client could specify in the IO request
(PVFS_servreq_io) whether the data should be synced, instead of
setting it per filesystem. Maybe the kernel interface could take
advantage of this to save sync operations, or it could be useful
elsewhere? Of course, this value could default to the filesystem's
TroveSyncData option.
In MPI there is an explicit sync via MPI_File_sync; maybe we could
rely on this for MPI apps?
This also requires an additional flag to be added to the parameters
of PVFS_sys_io. The flag would specify whether to sync or not (or
could be extended for other uses). This saves a roundtrip between
client and server because the flag can be sent along with the IO
request (as Julian proposes), instead of doing a separate flush
operation.
When I was looking at the performance of small-io, the overall cost
of doing an extra roundtrip was negligible once the IO request sizes
were larger (~ 32K IIRC), so the benefit here may not be that great,
and modifying the system interface may not make it worthwhile.
At the same time, in the use case where clients want to specify a
data sync on a per IO request basis, allowing the server to know at
the beginning of an IO operation that it needs to be synced may help
improve the sync coalescing behavior, because it gives the server
more time to determine if multiple IO ops can be synced together.
Independent of these questions, Rob mentioned that the sync policy
maybe should be changed, too: for example, to sync the data only at
the end of the flow, and to coalesce data syncs the way metadata
syncs are coalesced.
This is a good idea. In fact it sounds like we can just change the
'TroveSyncData on' semantics to sync at the end of the entire IO op
instead of for each trove write call that the flow makes. In other
words, we don't need to provide the user with a config option to
sync for each trove write.
I think maybe the coalescing of sync operations should be handled by
the trove module, because it knows which coalescing method is best
for the implementation. Or should this be handled by an upper layer
(e.g. job)?
I would put it in the dbpf layer. The queuing of operations is
handled there (both metadata and io), so you can do your policy stuff
most easily from there. The trove layer just acts as a wrapper for
the underlying implementation, and the job layer is used by the
server thread for testing completion. Since the request scheduler
allows write ops on the same handle to be scheduled immediately, you
should be able to manage everything in dbpf.
If an I/O scheduler is added to the Trove layer, maybe small write
requests could be combined, like in ROMIO. Also, the policy might
depend on the server's I/O load and pending I/O jobs.
The problem with doing this on the server is that it's hard to know in
advance that many small IO operations are being done together, unless
they're all sitting in the queue waiting to be serviced. I like the
pvfs2 stance of encouraging client-side data-sieving since in many
cases clients aren't acting independently (if that is the pvfs2
stance, perhaps I'm projecting :-)). In our discussion yesterday
RobL pointed out that the disk scheduler should be doing some amount
of read-ahead, so assuming that the disk operations are the expensive
part, doing many lio_listio calls instead of coalescing them into one
call may not actually matter.
I will take care of the modifications and evaluate possible policies
if nobody else is currently working on these issues.
Thanks,
Julian
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers