Re: [PVFS2-developers] Question about flow handling

Sam Lang Mon, 19 Dec 2005 05:26:56 -0800


Hi Julian,


I've included comments inline.

-sam

On Dec 17, 2005, at 9:35 AM, Julian Martin Kunkel wrote:

Hi,
I am trying to document the handling of some operations.
However, the IO handling with flow is a bit complicated ;-)
I have tried to summarize only the important steps of the IO process
(including the states of the state machines) and would be very happy if you
could have a look and give me hints if something is wrong or if I forgot an
important state.
For me, the messages sent between client and server and the trove operations
are especially important.

C means client state, S means server state.

pvfs2_client_io_sm
C1) Get file attributes and size using pvfs2_client_getattr_sm
C2) Find target datafiles using the distribution function
C3) Send a message (PVFS_SERV_IO) to every server participating in the IO
        operation to initiate the flow

Looks good so far. I've just recently committed some changes to the client
IO state machine that check the size of the IO to be done. If the size fits
within the transport layer's limit for an unexpected message (16K for TCP),
then instead of starting a flow to each server, the IO is packed into the
request (for writes) or the response (for reads). All of this is done in a
separate "small IO" operation and state machine.
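As a rough illustration of that decision, here is a minimal sketch. The names
(choose_io_path, header_bytes) are hypothetical and are not PVFS2 identifiers;
only the 16K TCP limit comes from the text above, and the header overhead
parameter is an assumption added so the comparison can account for it.

```python
# Hypothetical sketch of the "small IO" decision; not the actual client code.
TCP_UNEXPECTED_LIMIT = 16 * 1024  # 16K unexpected-message limit quoted for TCP


def choose_io_path(io_size, unexpected_limit=TCP_UNEXPECTED_LIMIT, header_bytes=0):
    """Return "small_io" if the payload fits in one unexpected message,
    otherwise "flow".

    header_bytes is an assumed allowance for the request/response header
    that shares the message with the payload.
    """
    if io_size + header_bytes <= unexpected_limit:
        return "small_io"  # pack data into the request (write) or response (read)
    return "flow"          # start a flow to each participating server


if __name__ == "__main__":
    for size in (4096, 16 * 1024, 16 * 1024 + 1):
        print(size, choose_io_path(size))
```

With a nonzero header_bytes the cutoff for the payload moves below 16K
accordingly.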

        S: pvfs2_io_sm
        S1) prelude_sm
        S2) Send a positive acknowledgement if permissions allow access, or a
                negative one if an error occurs
C4) If we get a positive ACK from a server, start the flow for that server.
        S3) Set up a flow (job_flow)
                Post the flow, probe for the flow protocol that handles the
                specified transfer type, and call flowproto_post for that
                protocol.
                In our case flowproto-multiqueue initializes several buffers
                (currently 8), does a setup depending on the two endpoints of
                the flow, and runs the appropriate callback function to start
                the flow.
                Now a flow will be established between client and server,
                which transfers at most 256 KByte of data per message.
                
                If the operation is a write (SRC=BMI TARGET=TROVE):
                trove_write_callback_fn initializes the BMI recv connection,
                is called when a trove_write is done, and updates a
                performance counter. Currently only one buffer is used at a
                time.
                bmi_recv_callback_fn is called when BMI receives data and
                calls trove_bstream_write_list.
                
                A read operation starts bmi_send_callback_fn for every
                buffer, which initiates a communication, updates a
                performance counter, and also calls trove_bstream_read_list
                to read data. The trove_read_callback_fn is executed when a
                trove read is completed and starts a BMI send operation for
                the data read.
                        
        S4) Flow ends: send an ack to the client if it was a write operation.
C4) Client stays in this state until the transmission is completed or a
        transfer error occurred during the flow; on error, retry the IO in
        step 3.
C5) Analyze whether the transfer was successful and the amount of data
        transferred using the distribution function, or whether an IO error
        occurred.
C6) For a read request it can be necessary to get the sizes of all datafiles
        to detect the correct file size read; this happens when a hole is
        within the requested file area.

With the small IO changes I also committed some changes to the way we zero
the memory regions where holes exist. Previously we were zeroing the entire
memory region at the beginning of the IO request. The changes I've made
determine the actual regions during this analyze-results phase and zero only
those regions.
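To make the difference concrete, here is a minimal sketch of zeroing only the
hole regions. It is not the actual client code: hole_regions, zero_holes, and
the (offset, length) extent list are hypothetical, and the sketch assumes the
whole request lies before EOF, so every uncovered gap is a hole.

```python
# Hypothetical sketch: zero only the gaps in a read buffer, not the whole buffer.

def hole_regions(req_offset, req_length, data_extents):
    """Return (offset, length) gaps of the request not covered by data_extents.

    data_extents is an assumed list of (logical_offset, length) pairs
    describing the bytes that were actually returned by the servers.
    """
    holes = []
    cursor = req_offset
    end = req_offset + req_length
    for off, length in sorted(data_extents):
        start = max(off, req_offset)
        stop = min(off + length, end)
        if stop <= start:
            continue  # extent lies outside the request
        if start > cursor:
            holes.append((cursor, start - cursor))
        cursor = max(cursor, stop)
    if cursor < end:
        holes.append((cursor, end - cursor))
    return holes


def zero_holes(buf, req_offset, holes):
    """Overwrite only the hole regions of buf (a bytearray) with zeros."""
    for off, length in holes:
        i = off - req_offset
        buf[i:i + length] = bytes(length)


if __name__ == "__main__":
    buf = bytearray(b"\xff" * 100)
    holes = hole_regions(0, 100, [(0, 10), (50, 20)])
    zero_holes(buf, 0, holes)
    print(holes)
```

The old behaviour corresponds to zeroing all of buf up front; here the cost is
proportional to the size of the holes only.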

Note: The value of the performance counters is processed and stored by a
different state machine. It can be used to analyze the transferred data within
a period, for example by the karma tool.
Another question: Why is it necessary to get the sizes of the datafiles when
reading a hole?

Remember that C2 was to find the target datafiles, so we are operating on
potentially just a subset of the datafiles. Using the sizes we have from that
subset, the analyze step looks for a size (mapped to the logical domain) that
is past the end of the file request. If we find one, we know the request is
not past EOF, so the total file size ends at the end of the request. If we
don't find one, we have to get the other datafile sizes not in the subset and
check those for a size that is past the end of the file request.
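Here is a small sketch of that check. It assumes a plain round-robin striping
distribution with a fixed strip size, which is a simplification and not the
actual distribution code; logical_size and needs_other_sizes are hypothetical
names.

```python
# Hypothetical sketch of the EOF check, assuming simple round-robin striping.

def logical_size(df_index, physical_size, num_datafiles, strip_size):
    """Map a datafile's physical size to the logical file size it implies."""
    if physical_size == 0:
        return 0
    last = physical_size - 1            # last physical byte in this datafile
    stripe, within = divmod(last, strip_size)
    logical_last = (stripe * num_datafiles + df_index) * strip_size + within
    return logical_last + 1


def needs_other_sizes(subset_sizes, num_datafiles, strip_size, request_end):
    """Return True if no datafile in the subset reaches the end of the request.

    subset_sizes maps datafile index -> physical size, for the datafiles the
    request actually touched. request_end is the logical offset just past the
    last requested byte.
    """
    return not any(
        logical_size(i, size, num_datafiles, strip_size) >= request_end
        for i, size in subset_sizes.items()
    )


if __name__ == "__main__":
    # 4 datafiles, 64-byte strips; only datafile 1 was part of the request.
    print(needs_other_sizes({1: 64}, 4, 64, request_end=100))
    print(needs_other_sizes({1: 64}, 4, 64, request_end=200))
```

When needs_other_sizes returns True, the remaining datafile sizes have to be
fetched and run through the same comparison.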


Does that clarify the problem?

-sam

Thanks a lot for your help,
Julian
_______________________________________________
PVFS2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers

