I just took a look and I'm not sure the design point works for us. It
targets large files (a few tens of MB to a few tens of GB, according to
their verbiage). We have Lustre, which is also an excellent distributed
filesystem with that same design point, and of course HDFS.
Btw, it is now called CloudStore (you'd better trademark Cloudstone
quickly, before someone steals that name!)
And considering that they've integrated with Hadoop, Hypertable, etc., I
suspect they're targeting it as a replacement for HDFS.
Shanti
On 02/ 4/09 10:01 AM, William Sobel wrote:
Has anyone investigated KFS? If we're focused on a single node, I think
developing our own file manager is a good idea; if we're looking at
1MM-user scale, KFS or something similar would be more realistic from
both a replication and a storage perspective. - Will
On Feb 4, 2009, at 9:57 AM, William Sobel wrote:
Are there any existing filestores that attempt to solve this problem?
I would think someone would have been working on this. -Will
On Jan 20, 2009, at 10:16 AM, Akara Sucharitakul wrote:
Sorry to be vague on the last email. Yes, the events need to be
distributed the same way. It is a little more tricky. Let me think
about the distribution algorithm there and I'll get back.
-Akara
Shanti Subramanyam wrote:
That sounds reasonable. You mention users but not events. How will
the event files be handled? Also, as we keep adding files, we need
to ensure that the ratio of files per directory is maintained.
Shanti
Akara Sucharitakul wrote:
100 is kind of low. That will mean 10,000 directories for 1 million
loaded users. I don't like it when we see 10,000 entries on ls.
Gets unwieldy again.
I'd limit it to 1,000 per directory, which seems reasonable for
directories to handle. But that caps us at a million users
(1,000 x 1,000) with two-level directories. If we increase the
stored:concurrent user ratio to 100:1, we'll be stuck at 10,000
concurrent users. I think Olio has the potential to scale much higher.
Coming down to it, I really think we should do a 3-layer directory
(that is, 2 layers of subdirectories). With a limit of 1,000
files/users per directory you can actually go up to 10^9 stored users
and, at 100:1, 10^7 concurrent users - and even if we at some point
decide the user ratio should be 1000:1, we are still up to 10^6
concurrent users. That's probably as big a rig as anybody will want
to test.
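
For anyone who wants to check the numbers, here is a rough sketch of
the arithmetic. The function names are made up for illustration, and it
assumes one entry per user and the same entry limit at every layer.

```python
def capacity(entries_per_dir: int, layers: int) -> int:
    """Most leaf entries that fit when each layer holds at most entries_per_dir."""
    return entries_per_dir ** layers


def max_concurrent(stored_users: int, stored_per_concurrent: int) -> int:
    """Concurrent users supported at a given stored:concurrent ratio."""
    return stored_users // stored_per_concurrent


def dirs_needed(users: int, users_per_dir: int) -> int:
    """Directories needed in a single flat layer of per-user-group directories."""
    return -(-users // users_per_dir)  # ceiling division


if __name__ == "__main__":
    # 100 users per directory, 1 million loaded users
    print(dirs_needed(1_000_000, 100))
    # two layers vs. three layers at 1,000 entries each
    for layers in (2, 3):
        stored = capacity(1000, layers)
        print(layers, stored, max_concurrent(stored, 100),
              max_concurrent(stored, 1000))
```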
In summary, I'd go for 2 layers of subdirectories, limiting each
directory to no more than 1,000 entries.
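
One possible way to map a user to its directory under that scheme is
sketched below. This is only an illustration, not the actual
distribution algorithm: it assumes sequential integer user IDs starting
at 0, a root of /filestore, zero-padded directory names, and a
hypothetical helper called user_dir. If each user has several files,
the leaf directories will hold more than 1,000 entries unless the
per-directory user count is lowered accordingly.

```python
from pathlib import PurePosixPath

ENTRIES_PER_DIR = 1000  # limit proposed in this thread


def user_dir(user_id: int, root: str = "/filestore") -> PurePosixPath:
    """Return the 2-layer subdirectory for a user (hypothetical layout).

    The top layer selects a block of 1,000,000 users and the second
    layer a block of 1,000 users within it.
    """
    if not 0 <= user_id < ENTRIES_PER_DIR ** 3:
        raise ValueError("user_id outside the 10^9 range of this layout")
    top, rest = divmod(user_id, ENTRIES_PER_DIR ** 2)
    mid = rest // ENTRIES_PER_DIR
    return PurePosixPath(root) / f"{top:03d}" / f"{mid:03d}"


if __name__ == "__main__":
    for uid in (0, 999, 1000, 1234567):
        print(uid, user_dir(uid))
```

Because the mapping is pure arithmetic on the ID, files added during a
run land in a predictable directory without any lookup table. Events
could use the same function with their own root, though that is also
an assumption.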
-Akara
Shanti Subramanyam wrote:
Currently, the filestore is a single flat layer, with all files
(both pre-loaded and those generated during a run) going into the
same directory. As the number of users is scaled, this directory
becomes unwieldy - I'm sure some of you have tried to do an
'ls /filestore' and then cursed yourselves for doing it!
How about using one directory for every 100 users or so?
Something similar for events? Not sure how we'll deal with the
events and files added during a run, though.
Thoughts?
Shanti