Hi,
I am trying to document how some operations are handled.
However, the IO handling with flow is a bit complicated ;-)
I have tried to summarize only the important steps of the IO process (including
the states of the state machines) and would be very happy if you could have a
look and give me hints if something is wrong or if I forgot an important state.
For me, the messages sent between client and server and the trove operations
are especially important.

C means client state, S means server state.

pvfs2_client_io_sm
C1) Get file attributes and size using pvfs2_client_getattr_sm
C2) Find target datafiles using the distribution function
C3) Send a message (PVFS_SERV_IO) to every server participating in the IO
        operation to initiate the flow
        S: pvfs2_io_sm
        S1) prelude_sm
        S2) Send a positive acknowledgement if the permissions allow access,
                or a negative one if an error occurs
C4) If we get a positive ACK from a server, start the flow for that server.
        S3) Set up a flow (job_flow)
                Post the flow, probe for the flow protocol that handles the
                specified transfer type, and call flowproto_post for that
                protocol.
                In our case this is flowproto-multiqueue, which initializes
                several buffers (currently 8), does a setup depending on the
                two endpoints of the flow, and runs the appropriate callback
                function to start the flow.
                Now a flow is established between client and server, which
                transfers at most 256 KByte of data per message.
                
                If the operation is a write (SRC=BMI, TARGET=TROVE):
                trove_write_callback_fn initializes the bmi recv connection; it
                is also called when a trove write is done, and it updates a
                performance counter. Currently only one buffer is used at a
                time.
                bmi_recv_callback_fn is called when bmi receives data and calls
                trove_bstream_write_list.
                
                A read operation starts bmi_send_callback_fn for every buffer.
                It initiates a communication, updates a performance counter,
                and calls trove_bstream_read_list to read data.
                trove_read_callback_fn is executed when a trove read is
                completed and starts a bmi send operation for the data read.
                        
        S4) Flow ends: send an ACK to the client if it was a write operation.
C4) The client stays in this state until the transmission is completed. If a
        transfer error occurred during the flow, it retries the IO from step C3.
C5) Determine whether the transfer was successful, and the amount of data
        transferred, using the distribution function, or whether an IO error
        occurred.
C6) For a read request it can be necessary to get the sizes of all datafiles
        to detect the correct file size for the read; this happens when a hole
        lies within the requested file area.
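
To make the numbers in S3 concrete, here is a small sketch of how a transfer
would split into messages, assuming the flow simply cuts the data into chunks
of at most 256 KByte. The names FLOW_BUFFER_SIZE, FLOW_BUFFER_COUNT,
flow_message_count, flow_message_size and flow_max_in_flight are made up for
this illustration and are not PVFS2 identifiers; only the values 8 and
256 KByte come from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from step S3; the macro names are hypothetical. */
#define FLOW_BUFFER_SIZE  ((uint64_t)256 * 1024) /* max bytes per message */
#define FLOW_BUFFER_COUNT ((uint64_t)8)          /* buffers per flow */

/* Number of messages needed to move `total` bytes (rounds up). */
uint64_t flow_message_count(uint64_t total)
{
    return (total + FLOW_BUFFER_SIZE - 1) / FLOW_BUFFER_SIZE;
}

/* Size in bytes of message number `index` (0-based) of a transfer. */
uint64_t flow_message_size(uint64_t total, uint64_t index)
{
    assert(index < flow_message_count(total));
    uint64_t remaining = total - index * FLOW_BUFFER_SIZE;
    return remaining < FLOW_BUFFER_SIZE ? remaining : FLOW_BUFFER_SIZE;
}

/* Upper bound on bytes in flight when `active_buffers` buffers are in use. */
uint64_t flow_max_in_flight(uint64_t active_buffers)
{
    assert(active_buffers <= FLOW_BUFFER_COUNT);
    return active_buffers * FLOW_BUFFER_SIZE;
}
```

Under these assumptions a 1 MByte transfer needs flow_message_count(1048576),
that is 4 messages. A write that uses only one buffer at a time has at most
256 KByte in flight, while a read using all 8 buffers could have up to 2 MByte.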

Note: The values of the performance counters are processed and stored by a
different state machine. They can be used to analyze the data transferred
within a period, for example by the karma tool.

Another question: Why is it necessary to get the sizes of the datafiles when
reading a hole?
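
To make the question (and steps C2 and C6) concrete, this is how I picture the
mapping between the logical file and the datafiles. It is only a sketch and
assumes plain round-robin striping; the struct stripe_layout and all function
names are hypothetical and not taken from the PVFS2 sources, and the real
distribution function may work differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout description: plain round-robin striping. */
struct stripe_layout {
    uint64_t strip_size;    /* bytes per strip on one datafile */
    uint32_t num_datafiles; /* number of datafiles of the file */
};

/* C2: index of the datafile that holds the logical byte `offset`. */
uint32_t datafile_for_offset(const struct stripe_layout *l, uint64_t offset)
{
    assert(l->strip_size > 0 && l->num_datafiles > 0);
    return (uint32_t)((offset / l->strip_size) % l->num_datafiles);
}

/* C2: number of distinct datafiles touched by [offset, offset + size). */
uint32_t datafiles_touched(const struct stripe_layout *l,
                           uint64_t offset, uint64_t size)
{
    assert(l->strip_size > 0 && l->num_datafiles > 0);
    if (size == 0)
        return 0;
    uint64_t first = offset / l->strip_size;
    uint64_t last = (offset + size - 1) / l->strip_size;
    uint64_t strips = last - first + 1;
    return strips >= l->num_datafiles ? l->num_datafiles : (uint32_t)strips;
}

/* Logical offset one past the last byte stored in datafile `idx`,
 * given that the datafile has a physical size of `dsize` bytes. */
uint64_t logical_end_of_datafile(const struct stripe_layout *l,
                                 uint32_t idx, uint64_t dsize)
{
    assert(l->strip_size > 0 && idx < l->num_datafiles);
    if (dsize == 0)
        return 0;
    uint64_t last = dsize - 1;                 /* last physical byte */
    uint64_t local_strip = last / l->strip_size;
    uint64_t within = last % l->strip_size;
    uint64_t global_strip = local_strip * l->num_datafiles + idx;
    return global_strip * l->strip_size + within + 1;
}

/* C6: logical file size is the largest logical end over all datafiles. */
uint64_t logical_file_size(const struct stripe_layout *l,
                           const uint64_t *dsizes)
{
    uint64_t size = 0;
    for (uint32_t i = 0; i < l->num_datafiles; i++) {
        uint64_t end = logical_end_of_datafile(l, i, dsizes[i]);
        if (end > size)
            size = end;
    }
    return size;
}
```

With an example strip size of 64 KByte, 4 datafiles, and datafile sizes of
{65536, 0, 100, 0} bytes, logical_file_size gives 131172, although datafile 1
is empty. If this picture is right, a single datafile that returns fewer bytes
than requested cannot tell by itself whether the missing range is a hole or
lies beyond the end of the file.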

Thanks a lot for your help,
Julian
