Hi Maarteen,
Can you elaborate on your use case of ZooKeeper? We currently don't have
any symlinks feature in ZooKeeper. The only way to do it for you would be a
client-side hash/lookup table that buckets data to different ZooKeeper
servers.
Or you could also store this hash/lookup table in
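A minimal sketch of the client-side lookup table idea, assuming the table maps path prefixes to ensemble connect strings and is consulted before every ZooKeeper call. The table contents, host names, and the function name ensemble_for_path are hypothetical; a real client would pass the returned connect string to its ZooKeeper client library.

```python
# Hypothetical client-side lookup table: path prefix -> ZooKeeper ensemble
# connect string. Host names are placeholders, not real deployments.
LOOKUP_TABLE = {
    "/users/a": "zk-a1:2181,zk-a2:2181,zk-a3:2181",
    "/users/b": "zk-b1:2181,zk-b2:2181,zk-b3:2181",
    "/": "zk-default1:2181,zk-default2:2181,zk-default3:2181",
}


def ensemble_for_path(path: str) -> str:
    """Return the connect string for the longest table prefix matching path."""
    best = None
    for prefix in LOOKUP_TABLE:
        # Match whole path components so "/users/ab" does not match "/users/a".
        if prefix == "/" or path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError(f"no ensemble configured for {path}")
    return LOOKUP_TABLE[best]
```

Longest-prefix matching lets one busy subtree be moved to its own ensemble by adding a single table entry, while everything else keeps falling through to the default.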
Hi Mahadev,
My use is mapping a flat object store (like S3) to a filesystem and
opening it up via WebDAV. So ZooKeeper mirrors the filesystem (each node
corresponds to a collection or a file), is used for locking, and
provides the pointer to the actual data object in, e.g., S3.
A symlink
So ZK is going to act like a file metadata store, and the number of files
might grow very large.
To me, 5 billion files sounds like a large number, and this seems to imply
ZK storage of 50-500GB. If you assume 8GB usable space per machine, a fully
scaled system would require 6-60 ZK
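The sizing figures above can be checked with a few lines of arithmetic. The per-file size is not stated in the thread; it is only what the quoted totals imply (50GB and 500GB spread over 5 billion files). Dividing the totals by 8GB gives the number of 8GB units needed; since every server in a ZooKeeper ensemble holds a full in-memory copy of the data tree, each unit would have to be a separate ensemble rather than one more server in the same ensemble. The helper name sizing is hypothetical.

```python
def sizing(total_gb, files=5_000_000_000, usable_gb_per_machine=8):
    """Return (implied bytes per file, number of 8GB units) for a total size."""
    bytes_per_file = total_gb * 10**9 / files
    units = total_gb / usable_gb_per_machine
    return bytes_per_file, units


for total_gb in (50, 500):  # the 50-500GB range quoted in the thread
    per_file, units = sizing(total_gb)
    print(f"{total_gb}GB: {per_file:.0f} bytes/file, {units:.2f} x 8GB")
```

This gives 10 to 100 bytes per file and 6.25 to 62.5 units of 8GB, which the thread rounds to 6-60.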
Ted,
Thanks for thinking along with me. Your line of thought is what I
originally had in mind, but I have some boundary conditions that I think
make things subtly different. I am curious what you think.
First, I think your numbers are right. Even so, every multiple of that
number
I think it only mostly disappears. If a user puts 1K files up and is placed
on a ZK cluster with 30K free slots, then everything is good. But if that
user adds 40K files, you have to split or migrate that user. I think the
easy answer is to have more than one location to look for a user's files.
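One way to have more than one location per user, sketched in memory without any real ZooKeeper calls: each user gets an ordered list of clusters, new files go to the newest location until it is full and only then spill to the cluster with the most free slots, and reads check every location on the list. Cluster, UserDirectory, the capacity field, and the spill policy are all illustrative assumptions, with capacities shrunk to a handful of nodes instead of the 30K/40K in the example above.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare clusters by identity, not by contents
class Cluster:
    """Stand-in for one ZooKeeper ensemble with a fixed node budget (hypothetical)."""
    name: str
    capacity: int
    nodes: dict = field(default_factory=dict)  # path -> pointer to the data object, e.g. in S3

    def free(self) -> int:
        return self.capacity - len(self.nodes)


class UserDirectory:
    """Maps each user to an ordered list of clusters holding that user's files."""

    def __init__(self, clusters):
        self.clusters = clusters
        self.locations = {}  # user -> list of Cluster, oldest first

    def put(self, user, path, pointer):
        locs = self.locations.setdefault(user, [])
        # Overwrite in place if the file already lives in one of the user's clusters.
        for c in locs:
            if path in c.nodes:
                c.nodes[path] = pointer
                return c
        # New file: use the newest location if it has room, otherwise spill over.
        if not locs or locs[-1].free() == 0:
            candidates = [c for c in self.clusters if c.free() > 0 and c not in locs]
            if not candidates:
                raise RuntimeError("all clusters are full")
            locs.append(max(candidates, key=Cluster.free))
        locs[-1].nodes[path] = pointer
        return locs[-1]

    def get(self, user, path):
        # Reads must check every location recorded for the user.
        for c in self.locations.get(user, []):
            if path in c.nodes:
                return c.nodes[path]
        return None
```

The cost lands on the read side: a lookup for a missing file touches every cluster on the user's list, so it pays to keep the lists short, for example by migrating a user once the list grows.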
Depending on your application, it might be good to simply hash the node name
to decide which ZK cluster to put it on.
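A sketch of hashing the node name to pick a ZooKeeper cluster. It uses MD5 from Python's hashlib only as a stable, well-distributed hash (not for security); the built-in hash() is unsuitable because string hashing is randomized per process. The ensemble list and function names are hypothetical.

```python
import hashlib

# Hypothetical ensemble connect strings; replace with real ones.
ENSEMBLES = [
    "zk0:2181",
    "zk1:2181",
    "zk2:2181",
]


def ensemble_index(path: str, n: int) -> int:
    """Map a node path to one of n ensembles using a stable hash."""
    digest = hashlib.md5(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n


def ensemble_for(path: str) -> str:
    return ENSEMBLES[ensemble_index(path, len(ENSEMBLES))]
```

Two caveats: plain modulo moves most nodes when a cluster is added, and hashing the full path scatters the children of one collection across clusters, so listing a WebDAV collection would have to query every cluster. Hashing the parent path instead keeps siblings together at the cost of less even balance.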
Also, a scalable key-value store like Voldemort or Cassandra might be more
appropriate for your application. Unless you need the hard-core guarantees
of ZK, they can be better
Depending on what a user needs to see, you can also have parallel structures
and select a cluster based on user number.
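A sketch of selecting a cluster by user number, assuming each cluster owns a contiguous range of user numbers and each user's parallel structure lives entirely on that cluster. The ranges, connect strings, and ensemble_for_user are hypothetical.

```python
import bisect

# Hypothetical partitioning: UPPER_BOUNDS[i] is the exclusive upper bound of
# the user numbers served by ENSEMBLES[i].
UPPER_BOUNDS = [100_000, 200_000, 300_000]
ENSEMBLES = ["zk-users-0:2181", "zk-users-1:2181", "zk-users-2:2181"]


def ensemble_for_user(user_number: int) -> str:
    # bisect_right finds the first range whose upper bound exceeds user_number.
    i = bisect.bisect_right(UPPER_BOUNDS, user_number)
    if i == len(ENSEMBLES):
        raise ValueError(f"user {user_number} is outside all configured ranges")
    return ENSEMBLES[i]
```

Range assignment lets a new cluster take over the next range as users sign up, without moving existing users; a user who outgrows their cluster still has to be split or migrated as discussed above.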
Your insistence on guarantees is worrisome, though. As much as I like ZK, I
like getting rid of hard consistency requirements even more. As I tend to
put it, the cost of NOW