I took a look at some distributed file systems and went a little deeper into Hadoop and its HDFS, for instance. I don't really need full POSIX compliance, but having a nested structure is important. As far as I know, there are ways to simulate this on Swift; is that correct?
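
To make the question concrete, here is a minimal sketch of what I understand by "simulating" nesting: objects live in a flat namespace, their names may contain "/", and a container listing can be narrowed with a prefix and a delimiter. The code only imitates that listing behaviour on a plain Python list; list_level and the object names are made up for illustration, and nothing here talks to a real Swift cluster.

```python
def list_level(names, prefix="", delimiter="/"):
    """Return the entries directly under `prefix`, like one directory level.

    Names that continue past the next delimiter are collapsed into a single
    pseudo-directory entry that ends with the delimiter.
    """
    entries = set()
    for name in names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        # If a delimiter remains, report the pseudo-directory, not the object.
        entries.add(prefix + head + sep)
    return sorted(entries)


# Hypothetical object names stored in one flat container.
objects = [
    "genomes/human/chr1.fa",
    "genomes/human/chr2.fa",
    "genomes/mouse/chr1.fa",
    "readme.txt",
]

top = list_level(objects)                              # top-level view
human = list_level(objects, prefix="genomes/human/")   # one "folder"
```

If this matches how Swift behaves, walking the "tree" would just mean repeating the listing with each returned pseudo-directory as the next prefix.
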
The problem I see in using something like Hadoop is the single point of failure, not because I need almost 100% availability, but because the people who will access the data do not belong to the same organization. They will be researchers from different institutions who may want to deploy a local server with a subset of the data to improve their productivity, but the size of the data set makes it impractical to just copy everything.

The plan is for the interface to the system to show which files are stored locally and which are not, so that everyone gets access to everything, almost like a peer-to-peer system in which they download from the closest source and then store the file for their own use. At first, I thought of implementing something by hand, but using an already mature solution makes a lot more sense.

So, is this plausible or am I trying to use the wrong tools?

thanks, again

Thiago Moraes - EnC 07 - UFSCar

2011/8/14 Todd Deshane <[email protected]>

> On Sun, Aug 14, 2011 at 4:10 AM, Thiago Moraes
> <[email protected]> wrote:
> > Hey guys,
> >
> > I'm new on the list and I'm currently considering Openstack to solve a data
> > distribution problem. Right now, there's a server which contains very large
> > files (usual files have 30GB or even more). This server is accessed by LAN
> > and over the internet but, of course, it's difficult to do this without
> > local connection.
> >
> > My idea to solve this problem is to deploy new servers on the places which
> > access data more often in an such a way that they get a local copy of the
> > most accessed part of data by then. In my head, I consider that there will
> > be N different clouds, one at my location and the others spread on another
> > networks. Then, these new clouds would download and store parts of the data
> > (entire files) so that they can be accessed through their own LAN.
> >
>
> It sounds like you are looking for the functionality that Zones (aim
> to?) provide.
>
> Take a look at:
> http://wiki.openstack.org/MultiClusterZones
>
> > Is Openstack suitable in this environment? Anyone would recommend another
> > solution?
> >
>
> Have you also looked at SheepDog, Hadoop or HC2? All of these seem to
> have some OpenStack integration points as well.
>
> Some links to look into:
> http://wiki.openstack.org/SheepdogSupport
> http://doubleclix.wordpress.com/2011/03/17/hadoop-2-0-openstack-pbj/
> http://www.quora.com/What-features-differentiate-HDFS-and-OpenStack-Object-Storage
>
> Hope that helps.
>
> Thanks,
> Todd
>
> > PS: I know the file size limitations of 5GB. I just need that all parts of a
> > file to be in the same local area network so that a blazingly fast Internet
> > connection is not required all the time.
> >
> > thanks,
> >
> > Thiago Moraes - EnC 07 - UFSCar
> >
> > _______________________________________________
> > Mailing list: https://launchpad.net/~openstack
> > Post to : [email protected]
> > Unsubscribe : https://launchpad.net/~openstack
> > More help : https://help.launchpad.net/ListHelp
>
> --
> Todd Deshane
> http://www.linkedin.com/in/deshantm
> http://www.xen.org/products/cloudxen.html
> http://runningxen.com/
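
PS: in case it helps the discussion, below is a rough sketch of the local-first behaviour I described in my reply: every site sees the whole catalog, each file is marked as local or remote, and a missing file is fetched from the closest site that has it and then kept locally. Everything in it is an assumption made up for illustration (the Site class, the integer "distance", the annotate and fetch helpers, the file names); it is plain Python with no real transfer and no OpenStack calls.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    distance: int                       # made-up cost of reaching this site
    files: set = field(default_factory=set)


def annotate(catalog, local):
    """Map every file name in the catalog to 'local' or 'remote'."""
    return {f: ("local" if f in local.files else "remote")
            for f in sorted(catalog)}


def fetch(filename, local, remotes):
    """Return the name of the site that served the file, caching it locally."""
    if filename in local.files:
        return local.name
    holders = [s for s in remotes if filename in s.files]
    if not holders:
        raise FileNotFoundError(filename)
    source = min(holders, key=lambda s: s.distance)   # closest copy wins
    local.files.add(filename)           # keep a copy for later local access
    return source.name


# Hypothetical deployment: one main server, one partner site, and "here".
main = Site("main", distance=10, files={"a.dat", "b.dat", "c.dat"})
partner = Site("partner", distance=3, files={"b.dat"})
here = Site("here", distance=0, files={"a.dat"})

catalog = main.files | partner.files | here.files
view = annotate(catalog, here)          # what the interface would display
served_by = fetch("b.dat", here, [main, partner])
```

In this toy run the first request for b.dat is served by the partner site because it is closer than the main server, and any later request is answered locally.
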

