The current kernel defines PAGE_COPY without the write bit. This causes
intermittent data inconsistency for user-level network drivers such as
InfiniBand, Quadrics and Myrinet. The problem was also mentioned by Costin
Iancu in the paper "HUNTing the Overlap" (PACT'05).
    An example of this behaviour is the following sequence:
        1) register a memory space BUFF for receiving messages,
        2) receive a message,
        3) call mprotect(...PROT_NONE...) and then
           mprotect(...PROT_READ|PROT_WRITE) on BUFF,
        4) write into BUFF,
        5) receive again.
    The data seen after the second receive may not be the data sent by the
peer machine, but the data the host itself wrote in step 4.
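
    The host-side part of this sequence can be sketched in C as follows. This
is only an illustration, not the code that triggered the problem: the function
name replay_sequence is made up, the registration and receive steps are
placeholders (no real NIC API is called), and a memset stands in for the data
the NIC would deliver. Without a registered NIC buffer the sketch cannot
reproduce the inconsistency; it only shows the order of the calls.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Replays the host-side steps of the sequence on an anonymous buffer.
 * Returns 0 on success, -1 if any call fails or the data is unexpected.
 */
int replay_sequence(size_t len)
{
    char *buff = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buff == MAP_FAILED)
        return -1;

    /* Step 1 (placeholder): register buff with the NIC here. */

    /* Step 2 (placeholder): memset stands in for the first received message. */
    memset(buff, 0xAA, len);

    /* Step 3: drop all access, then restore read/write access. */
    if (mprotect(buff, len, PROT_NONE) != 0 ||
        mprotect(buff, len, PROT_READ | PROT_WRITE) != 0) {
        munmap(buff, len);
        return -1;
    }

    /* Step 4: the host writes into the buffer. */
    buff[0] = 0x55;

    /* Step 5 (placeholder): the second receive would happen here. */

    /* The host sees its own write and the untouched earlier contents. */
    int ok = (unsigned char)buff[0] == 0x55 &&
             (len < 2 || (unsigned char)buff[1] == 0xAA);

    munmap(buff, len);
    return ok ? 0 : -1;
}
```

    Calling replay_sequence(4096) from a small main() is enough to exercise it.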

     The reason is:
     1) The user-level network driver locks the physical pages when the
memory space is registered;
     2) The two calls to mprotect change the ptes in the space to PAGE_COPY,
so a write to any page in the space causes a page fault;
     3) The page fault handler calls do_wp_page, which, if the page is
locked, allocates a new page and installs it in the pte, i.e. COW
(Copy-On-Write). So the physical page seen by the host is no longer the one
seen by the NIC.
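
     The three points above can be illustrated with a toy user-space model.
This is not kernel code: toy_page, toy_pte, toy_host_write, toy_nic_dma and
toy_demo are hypothetical names, and the logic is a deliberately simplified
stand-in for what do_wp_page does. The model only shows how a copy on a write
fault leaves the host and the NIC looking at different pages.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 64

struct toy_page { bool locked; char data[TOY_PAGE_SIZE]; };
struct toy_pte  { struct toy_page *page; bool writable; };

/* Host write through the pte; a non-writable pte takes a "write fault". */
static int toy_host_write(struct toy_pte *pte, const char *msg)
{
    if (!pte->writable) {
        if (pte->page->locked) {
            /* Locked page: copy it and point the pte at the copy (COW). */
            struct toy_page *copy = malloc(sizeof(*copy));
            if (copy == NULL)
                return -1;
            memcpy(copy->data, pte->page->data, TOY_PAGE_SIZE);
            copy->locked = false;
            pte->page = copy;
        }
        pte->writable = true;
    }
    strncpy(pte->page->data, msg, TOY_PAGE_SIZE - 1);
    pte->page->data[TOY_PAGE_SIZE - 1] = '\0';
    return 0;
}

/* The NIC always writes to the page it was given at registration time. */
static void toy_nic_dma(struct toy_page *registered, const char *msg)
{
    strncpy(registered->data, msg, TOY_PAGE_SIZE - 1);
    registered->data[TOY_PAGE_SIZE - 1] = '\0';
}

/*
 * Returns 1 if the host misses the NIC's data, 0 if it sees it,
 * and -1 on allocation failure.
 */
int toy_demo(bool pte_writable_after_mprotect)
{
    struct toy_page *nic_page = calloc(1, sizeof(*nic_page));
    if (nic_page == NULL)
        return -1;
    nic_page->locked = true;                 /* 1) locked at registration */

    /* 2) state of the pte after the two mprotect calls */
    struct toy_pte pte = { nic_page, pte_writable_after_mprotect };

    if (toy_host_write(&pte, "host data") != 0) {   /* 3) write fault path */
        free(nic_page);
        return -1;
    }
    toy_nic_dma(nic_page, "peer data");      /* second receive */

    int stale = strcmp(pte.page->data, "peer data") != 0;

    if (pte.page != nic_page)
        free(pte.page);
    free(nic_page);
    return stale;
}
```

     In this model toy_demo(false) corresponds to the current behaviour (the
pte is not writable, the write faults, and the host ends up on a copy), while
toy_demo(true) corresponds to a pte that already has the write bit, so no copy
is made and the host and the NIC keep sharing one page.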

     Adding PAGE_RW to PAGE_COPY will resolve this problem.
     In my opinion, the reason for the absence of RW is to save memory by
mapping all pages that are only read to ZERO_PAGE. But are there really
programs that perform many reads from a memory space without ever
initializing it?

___________________________________________________
_      Yingchao Zhou                              _
_      ICT, CAS                                   _
_      (86)010-62601009                           _
___________________________________________________



_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general