I just took a look and I'm not sure the design point works for us. It targets large files (a few tens of MB to a few tens of GB, according to their verbiage). We have Lustre, which is also an excellent distributed filesystem but with that same design point, and of course HDFS. Btw, it is now called CloudStore (you'd better trademark Cloudstone quickly, before someone steals that name!)

And considering that they've integrated Hadoop, Hypertable, etc., I suspect they're targeting it as a replacement for HDFS.

Shanti

On 02/04/09 10:01 AM, William Sobel wrote:

Has anyone investigated KFS? If we're focused on a single node, I think developing our own file manager is a good idea. If we're looking at 1MM-user scale, KFS or something similar would be more realistic from both a replication and a storage perspective. - Will

On Feb 4, 2009, at 9:57 AM, William Sobel wrote:


Are there any existing filestores that attempt to solve this problem? I would think someone would have been working on this. -Will

On Jan 20, 2009, at 10:16 AM, Akara Sucharitakul wrote:

Sorry for being vague in the last email. Yes, the events need to be distributed the same way. It is a little trickier. Let me think about the distribution algorithm there and I'll get back.

-Akara

Shanti Subramanyam wrote:
That sounds reasonable. You mention users but not events. How will the event files be handled? Also, as we keep adding files, we need to ensure that the ratio of files to dirs is maintained.

Shanti

Akara Sucharitakul wrote:
100 is kind of low. That will mean 10,000 directories for 1 million loaded users. I don't like it when we see 10,000 entries on ls. Gets unwieldy again.

I'd limit to 1,000. That seems reasonable for directories to handle. But the limit here is really a million users with two-level directories. If we increase the stored:concurrent user ratio to 100:1, we'll be stuck at 10,000 concurrent users. I think Olio has the potential to scale much higher.

Coming down, I really think we should do a 3-layer directory (or 2-layer subdirectory). With a limit of 1000 files/users per directory you can actually go up to 10^9 stored users and 10^7 concurrent users, and even if we at some point decide on a user ratio of 1000:1, we are still up to 10^6 concurrent users. That's probably as big a rig as anybody will want to test.
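The arithmetic above can be checked with a small sketch. The function names are hypothetical and not part of Olio; "layers" here counts every level including the leaf entries, which is an assumption about how the thread counts them.

```python
def max_stored_users(entries_per_dir: int, layers: int) -> int:
    """Upper bound on stored users when each layer holds at most entries_per_dir entries.

    Assumption: `layers` includes the leaf layer, so a "3-layer directory"
    is 1000 top-level dirs x 1000 subdirs x 1000 users.
    """
    return entries_per_dir ** layers


def max_concurrent_users(entries_per_dir: int, layers: int, stored_per_concurrent: int) -> int:
    """Concurrent users supported for a given stored:concurrent ratio (e.g. 100 for 100:1)."""
    return max_stored_users(entries_per_dir, layers) // stored_per_concurrent


if __name__ == "__main__":
    # Two layers, 100:1 ratio: the ceiling described as too low.
    print(max_concurrent_users(1000, 2, 100))
    # Three layers at 100:1 and at 1000:1.
    print(max_concurrent_users(1000, 3, 100))
    print(max_concurrent_users(1000, 3, 1000))
```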

In summary, I'd go for a 2-layer subdirectory scheme, limiting each directory to no more than 1000 entries.
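One way such a layout could be computed is sketched below. This is only an illustration, not the scheme Olio adopted: the function name, the zero-padded directory names, 0-based integer user IDs, and one entry per user in the leaf directory are all assumptions.

```python
import posixpath

ENTRIES_PER_DIR = 1000  # per-directory limit proposed in the thread


def user_dir(root: str, user_id: int) -> str:
    """Map a 0-based user ID to root/<top>/<sub> (hypothetical layout).

    Each leaf directory holds up to 1000 users, each top-level directory
    holds up to 1000 leaf directories, and root holds up to 1000 top-level
    directories, for 10^9 users in total.
    """
    if not 0 <= user_id < ENTRIES_PER_DIR ** 3:
        raise ValueError("user_id out of range for a 2-layer subdirectory layout")
    bucket = user_id // ENTRIES_PER_DIR          # index of the leaf directory
    top, sub = divmod(bucket, ENTRIES_PER_DIR)   # split into the two directory levels
    return posixpath.join(root, f"{top:03d}", f"{sub:03d}")


if __name__ == "__main__":
    print(user_dir("/filestore", 1234567))  # /filestore/001/234
```

Because consecutive IDs fill one leaf directory before moving to the next, directories fill evenly as users are loaded in order. Whether event files can use the same mapping is left open in the thread.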

-Akara

Shanti Subramanyam wrote:
Currently, the filestore is a single flat layer, with all files, both pre-loaded and those generated during a run, going into the same directory. As the number of users is scaled, this directory becomes unwieldy. I'm sure some of you have tried to do an 'ls /filestore' and then cursed yourselves for doing it!

How about using one directory for every 100 users or so? Something similar for events? Not sure how we'll deal with the events and files added during a run, though.

Thoughts?

Shanti



