Re: node symlinks

2010-07-26 Thread Mahadev Konar
Hi Maarten, Can you elaborate on your use case for ZooKeeper? We currently don't have a symlink feature in ZooKeeper. The only way for you to do it would be a client-side hash/lookup table that buckets data to different ZooKeeper servers. Or you could also store this hash/lookup table in

Re: node symlinks

2010-07-26 Thread Maarten Koopmans
Hi Mahadev, My use is mapping a flat object store (like S3) to a filesystem and opening it up via WebDAV. So ZooKeeper mirrors the filesystem (each node corresponds to a collection or a file), is used for locking, and provides the pointer to the actual data object in e.g. S3. A symlink

Re: node symlinks

2010-07-26 Thread Ted Dunning
So ZK is going to act like a file meta-data store and the number of files might scale to a very large number. For me, 5 billion files sounds like a large number and this seems to imply ZK storage of 50-500GB. If you assume 8GB usable space per machine, a fully scaled system would require 6-60 ZK
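
The arithmetic behind these figures can be checked with a short sketch. It uses only the numbers quoted in the message (5 billion files, 50-500GB total, 8GB usable per machine); the per-file size of 10-100 bytes is derived from those numbers rather than stated in the thread, and the decimal interpretation of "GB" is an assumption.

```python
# Back-of-envelope sizing from the figures quoted in the message.
# Assumption: 1 GB = 10**9 bytes.
FILES = 5_000_000_000
USABLE_BYTES_PER_MACHINE = 8 * 10**9  # 8 GB usable per machine


def machines_needed(total_bytes, per_machine=USABLE_BYTES_PER_MACHINE):
    """Return the (fractional) number of machines needed to hold total_bytes."""
    return total_bytes / per_machine


for total_gb in (50, 500):
    total_bytes = total_gb * 10**9
    bytes_per_file = total_bytes / FILES
    print(f"{total_gb} GB total: {bytes_per_file:.0f} bytes/file, "
          f"{machines_needed(total_bytes):.2f} machines")
```

The results, 6.25 and 62.5 machines, round to the "6-60" range given in the message.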

Re: node symlinks

2010-07-26 Thread Maarten Koopmans
Ted, Thanks for thinking along with me. Your line of thought is what I originally had in mind, but I have some boundary conditions that I think make things subtly different. I am curious as to what you think. First, I think your numbers are right. Even so, every multiple of that number

Re: node symlinks

2010-07-26 Thread Ted Dunning
I think it only mostly disappears. If a user puts 1K files up and is placed on a ZK cluster with 30K free slots then everything is good. But if that user adds 40K files, you have to split or migrate that user. I think that the easy answer is to have more than one location to look for a user's files.
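
One possible reading of "more than one location to look for a user's files" is an ordered list of clusters per user, searched in turn. The sketch below is only an illustration of that idea: plain dictionaries stand in for ZK clusters, and the names `find_file`, `locations`, `stores`, and the sample paths are all hypothetical and do not come from the thread.

```python
def find_file(user, path, locations, stores):
    """Return (cluster, value) from the first of the user's clusters holding path.

    locations maps a user to an ordered list of cluster names;
    stores maps a cluster name to a dict standing in for that cluster's nodes.
    Returns None if no cluster holds the path.
    """
    for cluster in locations.get(user, []):
        value = stores[cluster].get(path)
        if value is not None:
            return cluster, value
    return None


# Hypothetical data: user u42 outgrew zk-1, so newer files live on zk-2.
stores = {
    "zk-1": {"/u42/a.txt": "s3://bucket/obj-1"},
    "zk-2": {"/u42/b.txt": "s3://bucket/obj-2"},
}
locations = {"u42": ["zk-1", "zk-2"]}

print(find_file("u42", "/u42/b.txt", locations, stores))
```

With this layout a user who outgrows a cluster gets a second location appended instead of being migrated, at the cost of extra lookups for files on later clusters.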

Re: node symlinks

2010-07-24 Thread Ted Dunning
Depending on your application, it might be good to simply hash the node name to decide which ZK cluster to put it on. Also, a scalable key-value store like Voldemort or Cassandra might be more appropriate for your application. Unless you need the hard-core guarantees of ZK, they can be better
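
Hashing the node name to pick a cluster could look roughly like the sketch below. The cluster addresses and the function name `cluster_for` are hypothetical, and MD5 is used here only as a stable, non-cryptographic bucketing hash; the thread does not name a specific hash function.

```python
import hashlib

# Hypothetical cluster connection strings; not from the thread.
CLUSTERS = ["zk-a:2181", "zk-b:2181", "zk-c:2181"]


def cluster_for(node_name, clusters=CLUSTERS):
    """Map a node name to one cluster using a hash that is stable across runs."""
    digest = hashlib.md5(node_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(clusters)
    return clusters[index]


print(cluster_for("/users/u42/report.txt"))
```

Python's built-in `hash()` is avoided because string hashes are randomized per process by default, so different clients would disagree on the mapping. Note that with simple modulo bucketing, changing the number of clusters remaps most names.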

Re: node symlinks

2010-07-24 Thread Ted Dunning
Depending on what a user needs to see, you can also have parallel structures and select a cluster based on user number. Your insistence on guarantees is worrisome, though. As much as I like ZK, I like getting rid of hard consistency requirements even more. As I tend to put it, the cost of NOW