Dear all,

I am a computer science student at IIIT Hyderabad, India. I am interested in contributing to OpenAFS and have applied to GSoC 2010 for an OpenAFS project. I think this would be a good starting point for me to work with the community. I previously participated in GSoC 2009 with the Globus Alliance as my mentoring organization. I am also working on a research project at my university to improve read access and execution performance in distributed file systems (DFS).
I am interested in the Collaborative Caching project listed on the ideas page. The proposal I have submitted is as follows.

The project aims to develop a system that uses collaborative caching techniques to improve read access in OpenAFS. It is based on two observations.

First, in a cluster environment, many clients need the same datasets, i.e. many nodes on the network execute on the same data. Currently, each client contacts the server individually to fetch that data, which increases the load on the server unnecessarily. For very large files the problem is magnified considerably.

Second, local network bandwidth is usually high, often in the Gbps range. In a cluster, many clients share the same location and therefore have fast interconnects between them, while the server may be reachable only through a slower link. In this situation, fetching data from another client can be much faster than fetching it from the server itself.

Instead of each client contacting the server individually, a collaborative caching technique can be employed: once a client has fetched some data from the server, subsequent requests for that data can be served by this client. This reduces both the load and the bandwidth consumption at the server, and it also speeds up data access whenever the link between the requesting client and the server is slower than its links to other clients.

Initially, we can start with a fixed list of peers configured at each client; the client would collaborate only with the peers on this list. Next, we would add peer discovery, which can be done through the fileserver: the fileserver can be modified to keep access logs of its clients, and when a client requests some data, the clients recorded in these logs as having fetched the same data can be returned to the requesting client as candidate peers.
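As a rough illustration of the two peer-selection phases, here is a minimal Python sketch. Everything in it is hypothetical: `AccessLog`, `record_fetch`, `peers_for` and `choose_sources` are not OpenAFS functions, string file IDs stand in for real AFS file identifiers, and a real implementation would live in the C fileserver and cache manager. It assumes the fileserver keeps a small, bounded list of the clients that most recently fetched each file and returns them (most recent first, excluding the requester) as candidate peers.

```python
from collections import defaultdict, deque


class AccessLog:
    """Hypothetical fileserver-side log of which clients recently fetched each file."""

    def __init__(self, max_entries_per_file=8):
        self.max_entries = max_entries_per_file
        # file_id -> deque of client addresses, oldest first, most recent last
        self._log = defaultdict(deque)

    def record_fetch(self, file_id, client):
        """Record that `client` fetched `file_id`, keeping only the newest entries."""
        entries = self._log[file_id]
        if client in entries:
            entries.remove(client)  # move a repeat fetcher to the most-recent position
        entries.append(client)
        while len(entries) > self.max_entries:
            entries.popleft()  # evict the oldest fetcher

    def peers_for(self, file_id, requester, limit=3):
        """Return up to `limit` recent fetchers of `file_id`, newest first, excluding the requester."""
        recent = reversed(self._log.get(file_id, ()))
        return [c for c in recent if c != requester][:limit]


def choose_sources(file_id, requester, static_peers, log):
    """Ordered list of places to try: fixed peers (phase 1), then discovered peers (phase 2).

    The fileserver is always last, as the fallback if no peer has the data.
    """
    sources = list(static_peers)
    for peer in log.peers_for(file_id, requester):
        if peer not in sources:
            sources.append(peer)
    sources.append("fileserver")
    return sources
```

For example, with `max_entries_per_file=2`, after clients `10.0.0.5`, `10.0.0.7` and `10.0.0.9` fetch the same file in that order, the oldest entry is evicted, so a request from `10.0.0.9` with a fixed peer `10.0.0.2` would try `10.0.0.2`, then `10.0.0.7`, then the fileserver. Bounding the log per file keeps fileserver memory predictable; the open questions of authorizing peer-to-peer fetches and keeping peer data consistent with callbacks are not addressed by this sketch.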
Access control is also needed: we must decide how a fileserver can authorize a client to fetch data from another client. In addition, in OpenAFS the fileserver notifies a client through a callback when a file it has cached is modified. We have to handle the case where a client is fetching data from another client and receives a callback in the middle of the transfer. To handle this, the call that the client uses to get the data's hash from the fileserver could also establish a callback guarantee. That way, all clients would be notified by the fileserver, regardless of where they got their data from.

I received a reply from Mr. Jeffrey Altman asking me to contact the community to refine the proposal. He suggested that it would be useful to discuss the internal workings of the AFS cache manager and the interactions between the cache manager and the fileserver (CM-FS) so that I can refine my proposal. Also, please suggest a small project I could work on over the next few days to demonstrate my abilities and strengthen my application to OpenAFS.

Link to the project proposal on the GSoC portal: http://socghop.appspot.com/gsoc/student_proposal/private/google/gsoc2010/shrutijain/t127083915309

Thank you.

Best regards,
Shruti
