Hi Erik,

You bring up some really great points and possible solutions. Comments inline:
On Wed, Mar 23, 2011 at 01:02:55AM +0000, Erik Carlin wrote:
> I assume the lowest zone (zone D) is responsible for assigning the id?

Yes, the zone that actually contains the running VM should assign the fully
qualified name.

> Does that mean there are now 4 URIs for the same exact resource (I'm
> assuming a numeric server id here for a moment):
>
> http://zoned.dfw.servers.rackspace.com/v1.1/123/servers/12345 (this would
> be non-public)
> http://dfw.servers.rackspace.com/v1.1/123/servers/12345
> http://servers.osprovider.com/v1.1/456/servers/12345
> http://servers.myos.com/v1.1/789/servers/12345

Well, this is four ways of accessing the resource if 12345 is actually a
globally unique ID (more on that later). Let's not confuse API endpoints with
fully-qualified resource names. A resource name is used in more places than
APIs (billing, logging, other API versions/protocols, etc.). There could be
thousands of places where you can use a resource ID, but there is only one ID
for a given resource.

> I assume then the user is only returned the URI from the high level zone
> they are hitting (http://servers.myos.com/v1.1/789/servers/12345 in this
> example)? If so, that means the high level zone defines everything in the
> URI except the actual server ID, which is assigned by the low level zone.
> Would users ever get returned a "downstream" URI they could hit directly?

Perhaps, but for the simple case probably not. If we use DNS names for
resources, and DNS services are fully integrated into Nova, you could
potentially get the most specific endpoint or use SRV records as Justin
suggests. In any case, the user of the high-level API gets back a resource
record which they can use again with this API or any other that can route to
the final zone. Just as it doesn't matter which DNS server I query when I
want an IP for openstack.org, they all return the same IP(s).

> Pure numeric ids will not work in a federated model at scale. If you have
> registered zone prefixes/suffixes, you will limit the total zone count
> based on the number of digits you preallocate and need a registration
> process to ensure uniqueness. How many zones is enough?

Agreed. I'm a bit confused, though, because in the next paragraph you mention
using a UUID. A UUID is a pure numeric ID, just large enough that it is not
likely to conflict. Depending on the implementation it could be random,
time-based, MAC address based, or a combination. In any case the meaning of
the bits varies, so you can't count on structure; you're left with a simple
128-bit number. The only difference from what we have now is that it's twice
the size and not sequentially assigned.

> You could use UUID. If the above flow is accurate, I can only see how you
> create collisions in your OWN OS deployment. For example, if I
> purposefully create a UUID collision in servers.myos.com (that I run) with
> dfw.servers.rackspace.com (that Rackspace runs), it would only affect me
> since the collision would only be seen in the servers.myos.com namespace.
> Maybe I'm missing something, but I don't see how you could inject a
> collision ID downstream - you can just shoot yourself in your own foot.

Let's say you have api.rackspace.com (global aggregation zone),
rack1.dfw.rackspace.com (real zone running instances), and
bursty.customer.com (private zone). Bursty is a Rackspace customer and wants
to leverage their private resources alongside the public cloud, so they add
bursty.customer.com as a private zone for their Rackspace account. The
api.rackspace.com server now gets a terminate request for <id x> and needs to
know where to route the request. If we have a flat global namespace for
instances (such as UUIDs), rack1.dfw.rackspace.com and bursty.customer.com
could both have servers for <id x> (most likely from bursty spoofing the ID).
Now api.rackspace.com doesn't know whom to forward the request to.
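To make the UUID point concrete, here's a small illustrative Python sketch
(the zone name passed to uuid5 is just an example): whichever generation
scheme produced it, a UUID is nothing more than a 128-bit integer, so a
parent zone can't derive any routing structure from the bits.

```python
import uuid

# Two common generation schemes: random (version 4) and
# name-based SHA-1 (version 5).
random_id = uuid.uuid4()
name_id = uuid.uuid5(uuid.NAMESPACE_DNS, "rack1.dfw.rackspace.com")

# However the bits were chosen, each UUID fits in 128 bits;
# there is no routable structure a parent zone can rely on.
for u in (random_id, name_id):
    assert 0 <= u.int < 2 ** 128

print(name_id.version)  # 5
```

Even the version bits only tell you how the ID was generated, not which zone
owns it, which is exactly the routing ambiguity described above.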
If we provide some structure to the IDs, such as DNS names, we not only solve
this namespacing problem but also get a much more efficient routing
mechanism. I no longer need to cache every UUID for every peer zone; I can
just map *.bursty.customer.com to the bursty.customer.com zone. We may still
need to cache the list of instances for quick 'list my instances' queries, so
this may not be as important.

> Eric Day, please jump in here if I am off. AFAICT, same applies to dns
> (which I will discuss more below). I could just make my server ID dns
> namespace collide with rackspace, but it would still only affect me in my
> own URI namespace.

In my previous email I mentioned that if we do use DNS names, we'll need zone
name (which is also a DNS name) verification. For example, when the customer
adds bursty.customer.com to api.rackspace.com for peering, api.rackspace.com
will not add the zone or attempt to discover resources until the authenticity
of the zone is verified. The most obvious method is an SSL cert check against
the child zone's API server using a cert from a trusted CA, but there are of
course other ways to verify this. Once the zone is verified, all resource
names matching *.bursty.customer.com will go to that zone.

> This is obviously redundant with the Rackspace URI since we are
> representing Rackspace and the region twice, e.g.
> http://dfw.servers.rackspace.com/v1.1/12345/servers/rax:dfw:1:6789.
>
> This option also means we need a mechanism for registering unique
> prefixes. We could use the same one we are proposing for API extensions,
> or, as Eric pointed out, use dns, but that would REALLY get redundant,
> e.g.
> http://dfw.servers.rackspace.com/v1.1/12345/servers/6789.dfw.servers.rackspace.com.

This is the same as your rax:dfw:1:6789 example above, just abbreviated. It
also introduces a non-standard scheme (well, almost a URN), and we lose the
benefits of DNS entries.
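As a rough sketch of the suffix-routing idea (the zone names, the ZONES
table, and the route() helper are all hypothetical, not actual Nova code), a
parent zone only needs a small table mapping DNS suffixes to verified child
zones rather than a cache of every UUID:

```python
# Hypothetical routing table: DNS suffix -> child zone endpoint.
# Entries would only be added after the zone's name is verified
# (e.g. via the SSL cert check described above).
ZONES = {
    "dfw.rackspace.com": "rack1.dfw.rackspace.com",
    "bursty.customer.com": "bursty.customer.com",
}

def route(resource_name):
    """Return the child zone responsible for a resource name, or None."""
    best = None
    for suffix, zone in ZONES.items():
        # Match the suffix exactly or on a label boundary, and prefer
        # the longest suffix so more specific zones win.
        if resource_name == suffix or resource_name.endswith("." + suffix):
            if best is None or len(suffix) > len(best[0]):
                best = (suffix, zone)
    return best[1] if best else None

print(route("6789.dfw.rackspace.com"))      # rack1.dfw.rackspace.com
print(route("5678.bursty.customer.com"))    # bursty.customer.com
print(route("5678.elsewhere.example.com"))  # None
```

Because bursty's names must fall under the *.bursty.customer.com suffix, a
spoofed ID can never shadow a resource in another zone, which is the
namespacing property argued for above.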
> The fundamental problem I see here is URI is intended to be the universal
> resource identifier but since zone federation will create multiple URIs
> for the same resource, the server id now has to be ANOTHER universal
> resource identifier.

I can see your point. As I said above, I think we need to treat API endpoints
differently from resource IDs. We're already going to have many URIs, at the
very least one for every API version. I think making this URI/resource ID
split now future-proofs us, as it doesn't lock us into a single API pattern.
Who knows what other endpoints and protocols we'll support in the future!

> Another issue is whether you want transparency or opaqueness when you are
> federating. If you hit http://servers.myos.com, create two servers, and
> the ids that come back are (assuming using dns as server ids for a moment):
>
> http://servers.myos.com/v1.1/12345/servers/5678.servers.myos.com
>
> http://servers.myos.com/v1.1/12345/servers/6789.dfw.servers.rackspace.com
>
> It will be obvious in which deployment the servers live. This will
> effectively prevent whitelabel federating. UUID would be more opaque.

I see two options here:

* Reseller APIs can keep a translation database if they like. I understand
  that this could get nasty, since the API layer needs to translate any place
  a name appears in requests and responses.

* Allow multiple zone names (DNS names) for a single zone, where the name
  used for a given resource is determined by the requester. A reseller proxy
  would make a request to an external server using a reseller-specific name,
  and the resource returned would be under that name. It's another
  application of aliases or virtual hosts.

-Eric

_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp