Re: [Openstack] Instance IDs and Multiple Zones

2011-03-24 Thread Chris Behrens
It's early here, but I think it's closer to 200 zones? :)

On Mar 24, 2011, at 5:16 AM, Ed Leafe  wrote:

> On Mar 23, 2011, at 9:41 PM, Justin Santa Barbara wrote:
> 
>> The type of a server @id in CloudServers is xsd:int, which is a 32-bit 
>> signed integer:
>> http://docs.rackspacecloud.com/servers/api/v1.0/xsd/server.xsd
>> 
>> So if you have 1 billion integers per zone, you only get 2 zones.  You can 
>> have 4 if you're willing to go negative, but surely it's too early in the 
>> campaign.
> 
> Yes, you're correct. That always trips me up: why would anyone pick a 
> signed integer for a PK?
> 
> OK, so I'll slice the ranges down to the current Rackspace practice of 10 
> million. That will allow for around 2000 zones.
> 
> 
> 
> -- Ed Leafe
> 
> 
> 
> 
> ___
> Mailing list: https://launchpad.net/~openstack
> Post to : openstack@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp


Confidentiality Notice: This e-mail message (including any attached or
embedded documents) is intended for the exclusive and confidential use of the
individual or entity to which this message is addressed, and unless otherwise
expressly indicated, is confidential and privileged information of Rackspace. 
Any dissemination, distribution or copying of the enclosed material is 
prohibited.
If you receive this transmission in error, please notify us immediately by 
e-mail
at ab...@rackspace.com, and delete the original message. 
Your cooperation is appreciated.




Re: [Openstack] Instance IDs and Multiple Zones

2011-03-24 Thread Ewan Mellor
> If we were to go with UUIDs and using XenServer, I should be able to use
> the uuid that it generates upon VM creation. I would almost ask your above
> question for XenServer then. When I terminate and launch a VM on the same
> machine, I should be able to give it the same uuid that I was just using,
> but I can't. Maybe I can and I'm making it harder on myself :)

Yes, it would be great if you could use the XenServer-generated UUID.  The 
reason this doesn't work is because OpenStack is outside of the XenServer 
design envelope, and is orchestrating on top of it.  If you were using 
XenServer pools, and only starting, stopping, and migrating VMs within the 
pool, then you could use our UUID, because the VM retains its identity for the 
lifetime of all those operations.   It's the fact that OpenStack moves beyond 
that model that breaks this.  For OpenStack, it might be a VM move, but for 
XenServer it's a VM copy + VM destroy + VM start for a completely different VM; 
we lose track of the identity at that point.

Ewan.




Re: [Openstack] Instance IDs and Multiple Zones

2011-03-24 Thread Ed Leafe
On Mar 23, 2011, at 9:41 PM, Justin Santa Barbara wrote:

> The type of a server @id in CloudServers is xsd:int, which is a 32-bit signed 
> integer:
> http://docs.rackspacecloud.com/servers/api/v1.0/xsd/server.xsd
> 
> So if you have 1 billion integers per zone, you only get 2 zones.  You can 
> have 4 if you're willing to go negative, but surely it's too early in the 
> campaign.

Yes, you're correct. That always trips me up: why would anyone pick a 
signed integer for a PK?

OK, so I'll slice the ranges down to the current Rackspace practice of 
10 million. That will allow for around 2000 zones.
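
A quick check of the numbers in this exchange, as a sketch only. The helper
name zones_available is made up for illustration and is not part of Nova. It
counts how many whole ranges of a given size fit in a signed integer of a
given width. With 10 million IDs per zone, a 32-bit signed integer leaves room
for 214 zones rather than 2000.

```python
def zones_available(bits, ids_per_zone, allow_negative=False):
    """Count whole ranges of ids_per_zone that fit in a signed integer."""
    # A signed integer of `bits` bits has 2**(bits - 1) non-negative values.
    usable = 2 ** (bits - 1)
    if allow_negative:
        # Using the negative half doubles the usable ID space.
        usable *= 2
    return usable // ids_per_zone


print(zones_available(32, 1_000_000_000))                       # 2
print(zones_available(32, 1_000_000_000, allow_negative=True))  # 4
print(zones_available(32, 10_000_000))                          # 214
```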



-- Ed Leafe






Re: [Openstack] Instance IDs and Multiple Zones

2011-03-24 Thread Sandy Walsh
+1

Great discussion, and not anything that should be blocking the distributed 
scheduler. 

-S


From: Eric Day [e...@oddments.org]

Ok. :)  The original statement felt like it was written with negative
connotations, and I just wanted to say I think it's all been positive.

-Eric




Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Eric Day
Ok. :)  The original statement felt like it was written with negative
connotations, and I just wanted to say I think it's all been positive.

-Eric

On Wed, Mar 23, 2011 at 10:09:50PM -0400, Ed Leafe wrote:
> On Mar 23, 2011, at 9:54 PM, Eric Day wrote:
> 
> > I don't think anyone is arguing, all the discussion has been very
> > healthy IMHO.
> 
> 
>   Of course we are arguing - presenting evidence for a particular 
> position in an effort to persuade is argument. The arguments have not become 
> heated or personal, if that's what you meant.
> 
>   Differing ideas and opposing POVs are wonderful, IMO. Groupthink is 
> what should be avoided as unhealthy.
> 
> 
> 
> -- Ed Leafe
> 
> 



Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Ed Leafe
On Mar 23, 2011, at 9:54 PM, Eric Day wrote:

> I don't think anyone is arguing, all the discussion has been very
> healthy IMHO.


Of course we are arguing - presenting evidence for a particular 
position in an effort to persuade is argument. The arguments have not become 
heated or personal, if that's what you meant.

Differing ideas and opposing POVs are wonderful, IMO. Groupthink is 
what should be avoided as unhealthy.



-- Ed Leafe






Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Eric Day
On Wed, Mar 23, 2011 at 09:15:38PM -0400, Ed Leafe wrote:
> On Mar 23, 2011, at 8:59 PM, Eric Day wrote:
> 
> > May I ask what is the point of doing this if it won't make cactus and
> > we're just going to replace it in a month or two? I think we all agree
> > that 64-bit integer IDs are insufficient for multi-zone deployments,
> > so no one will be deploying this until we sort it out and come up
> > with a better ID.
> 
> 
>   Because this is just one part of the process of creating a distributed 
> scheduler. The process for selecting a host for a new instance won't depend 
> on the type of PK used for that instance in a db table.

Sure, selecting a host for new instances doesn't depend on solving
the unique ID issue either. You can still work on this without the
partitioning, no?

It's only once the instance is created, and we need to list and route
subsequent requests, that the uniqueness issue (and ID type) comes
up. So I'm asking: why bother implementing partitioning that will be
thrown away when we could finish working through a more robust
approach and start on that instead?

>   The only reason I brought it up was that Sandy pointed out this 
> uniqueness requirement, and we felt it would be a good idea to ask the list 
> if they had any good ideas about alternatives to range partitions.

Yeah, and this is an important issue we need to solve soon for the
multi-zone work.

> I prefaced my initial post with a disclaimer that I wasn't looking to 
> re-argue things that had already been discussed and agreed to, but I guess 
> most people missed that part. :)

I don't think anyone is arguing, all the discussion has been very
healthy IMHO. We've also not previously discussed or decided anything
at the multi-zone level either (at least nothing I was aware of),
only IDs within a zone.

I think all this is new and useful discussion. :)

-Eric



Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Justin Santa Barbara
>
>  So I'm going to implement a partition of 1 billion integers per zone,
> which should allow for approximately 1 billion zones, given a 64 bit integer
> for the PK. This should be workable for now, and after the design summit,
> when we've come to a consensus on changing the API to accept something other
> than integer identifiers, it should not be too difficult to retrofit.
>

The type of a server @id in CloudServers is xsd:int, which is a 32-bit
signed integer:
http://docs.rackspacecloud.com/servers/api/v1.0/xsd/server.xsd

So if you have 1 billion integers per zone, you only get 2 zones.  You can
have 4 if you're willing to go negative, but surely it's too early in the
campaign. 

I think the only way long-term we're going to have CloudServers
v1.0 compatibility is by having a proxy that bridges between legacy APIs
(EC2 and CS) and future APIs (OpenStack).  I'm guessing that proxy will have
to be stateful to implement mappings of server IDs etc.  Yes, this sucks.
 But at some stage you have to say "you know, maybe 640KB wasn't enough, and
we have to make some changes"

How about this as a solution: use ranges as you suggest, but let the
starting points for the zone-ids that child-zones draw from be
customer-configured.  We're pushing the problem onto the end-user, but they
probably know best anyway, and we don't really expect anyone to use
sub-zones in anger anyway until Diablo or later, right?

Justin


Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Eric Day
Hi Sandy,

On Thu, Mar 24, 2011 at 01:01:18AM +, Sandy Walsh wrote:
> > From: Eric Day [e...@oddments.org]
> > Within that zone Nova will prevent collisions, but if things are really 
> > broken (accident or on purpose) and it starts returning duplicate 
> > resource IDs, peer zones can choose to just use one/none. We can document 
> > the behavior as undefined.
> 
> I'm not sure that's a good thing ... the use case I was thinking of is the 
> customer using two providers:
> 
> The customer won't be happy that sometimes he gets status on Instance 
> 10,000,000,001 from Provider-A and sometimes from Provider-B. Or none at all.
> 
> If we append the DNS name of the provider, we bust RS 1.0 compatibility. 

I think this is fine. RS 1.0, just like the EC2 API, was not designed
with federation in mind. We should not try to jump through hoops to
force it if we have the luxury of defining the next API version and
supporting it more elegantly there.

As for backwards compatibility for RS 1.0/EC2, those APIs could
depend on a global mapping server for non-bursting zones to translate
nova-internal IDs (id.zone) to what they need (integer, etc.), but this
should not be a core component of Nova since it goes against our design
tenets. It should be deprecated (along with the APIs) and shut down
in a timely manner once the new API and tools are available. Managing
resources in bursting zones would only be available through the new API
(along with other new features), so there will be plenty of incentive
for clients to change.
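
As a rough sketch of what such a mapping service might do (the class and
method names here are hypothetical, and an in-memory dict stands in for
whatever persistent store a real deployment would need), zone-qualified
internal IDs can be handed a legacy 32-bit integer on first use and
translated back on later requests:

```python
class LegacyIdMapper:
    """Maps zone-qualified IDs such as '42.nova.example.com' to 32-bit ints."""

    MAX_INT32 = 2 ** 31 - 1  # upper bound of xsd:int

    def __init__(self):
        self._to_legacy = {}
        self._to_internal = {}
        self._next = 1

    def legacy_id(self, internal_id):
        """Return the integer for internal_id, assigning one on first use."""
        if internal_id not in self._to_legacy:
            if self._next > self.MAX_INT32:
                raise OverflowError("legacy 32-bit ID space exhausted")
            self._to_legacy[internal_id] = self._next
            self._to_internal[self._next] = internal_id
            self._next += 1
        return self._to_legacy[internal_id]

    def internal_id(self, legacy_id):
        """Translate a legacy integer back to the zone-qualified ID."""
        return self._to_internal[legacy_id]


mapper = LegacyIdMapper()
first = mapper.legacy_id("42.nova.example.com")
second = mapper.legacy_id("42.other.example.org")
print(first, second)              # 1 2
print(mapper.internal_id(first))  # 42.nova.example.com
```

The state lives entirely in the mapper, which is why this kind of component
sits outside the core: losing the table loses the legacy IDs.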

> Perhaps you can walk me through how you see the Cert check helping here 
> (assuming no prefix on id)?
> Or are we assuming that bursting is a RS x.0 API feature and things will 
> change then?

Yeah, the cert check verifies that the zone nova.example.com can
return resource IDs named *.nova.example.com; all others should be
ignored. The IDs need the zone name suffix for this to make sense.
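
A minimal sketch of the filtering half of this idea, assuming the certificate
validation has already happened elsewhere (for example in the TLS layer) and
produced a trusted zone name. The function names and the "<local-id>.<zone>"
layout are assumptions for illustration, not an agreed format:

```python
def id_belongs_to_zone(resource_id, zone_name):
    """True if resource_id looks like '<local-id>.<zone_name>'."""
    suffix = "." + zone_name.lower()
    rid = resource_id.lower()
    # Require a non-empty local part in front of the zone suffix.
    return rid.endswith(suffix) and len(rid) > len(suffix)


def filter_peer_results(resource_ids, verified_zone):
    """Keep only the IDs that fall inside the verified zone's namespace."""
    return [rid for rid in resource_ids if id_belongs_to_zone(rid, verified_zone)]


returned = [
    "42.nova.example.com",
    "42.evil.example.org",
    "7.east.nova.example.com",
    "42.notnova.example.com",
]
print(filter_peer_results(returned, "nova.example.com"))
# ['42.nova.example.com', '7.east.nova.example.com']
```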

-Eric



Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Ed Leafe
On Mar 23, 2011, at 8:59 PM, Eric Day wrote:

> May I ask what is the point of doing this if it won't make cactus and
> we're just going to replace it in a month or two? I think we all agree
> that 64-bit integer IDs are insufficient for multi-zone deployments,
> so no one will be deploying this until we sort it out and come up
> with a better ID.


Because this is just one part of the process of creating a distributed 
scheduler. The process for selecting a host for a new instance won't depend on 
the type of PK used for that instance in a db table.

The only reason I brought it up was that Sandy pointed out this 
uniqueness requirement, and we felt it would be a good idea to ask the list if 
they had any good ideas about alternatives to range partitions. I prefaced my 
initial post with a disclaimer that I wasn't looking to re-argue things that 
had already been discussed and agreed to, but I guess most people missed that 
part. :)


-- Ed Leafe






Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Sandy Walsh
> From: Eric Day [e...@oddments.org]
> > On Thu, Mar 24, 2011 at 12:23:42AM +, Sandy Walsh wrote:
> > Regardless of how we delineate it or which ID scheme we use, we have no way 
> > of detecting collisions.
> Why not? Some schemes such as the ID.DNS name + ssl cert check I
> mentioned before allow us to verify the authenticity of a namespace
> before it is used. No other peer could register a zone with that
> name unless the cert checks out. 

Hmm, yeah, you're right, the SSL cert approach should work for validating 
unique zone names. Funny, myself and pvo were talking about that route 
yesterday. 

But will it help us with the duplicates problem? ...

> Within that zone Nova will prevent collisions, but if things are really 
> broken (accident or on purpose) and it starts returning duplicate resource 
> IDs, peer zones can choose to just use one/none. We can document the 
> behavior as undefined.

I'm not sure that's a good thing ... the use case I was thinking of is the 
customer using two providers:

The customer has his own Openstack deployment (range 0-1B) and outsources to 
Provider-A and Provider-B.
Sadly, Pro-A and Pro-B both use the default ID ranges for service providers 
(let's say 10-11B). 
The customer starts provisioning instances to both provider zones evenly ... 
pow, duplicates.

The customer won't be happy that sometimes he gets status on Instance 
10,000,000,001 from Provider-A and sometimes from Provider-B. Or none at all.
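
The scenario can be reproduced in a few lines. This is only an illustration:
the provider names, the provision helper, and the starting value are taken
from or invented for the example above, not from any real deployment.

```python
PROVIDER_DEFAULT_START = 10_000_000_000  # "let's say 10-11B"


def provision(start, count):
    """Return `count` sequential instance IDs, the first being start + 1."""
    return list(range(start + 1, start + 1 + count))


provider_a = provision(PROVIDER_DEFAULT_START, 3)
provider_b = provision(PROVIDER_DEFAULT_START, 3)

# Bare integer IDs from the two providers overlap completely.
duplicates = sorted(set(provider_a) & set(provider_b))
print(duplicates)  # [10000000001, 10000000002, 10000000003]

# Qualifying each ID with its provider keeps all six instances distinct.
qualified = {("provider-a", i) for i in provider_a}
qualified |= {("provider-b", i) for i in provider_b}
print(len(qualified))  # 6
```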

If we append the DNS name of the provider, we bust RS 1.0 compatibility. 

Perhaps you can walk me through how you see the Cert check helping here 
(assuming no prefix on id)?
Or are we assuming that bursting is a RS x.0 API feature and things will change 
then?

-S



Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Eric Day
Hi Ed,

May I ask what is the point of doing this if it won't make cactus and
we're just going to replace it in a month or two? I think we all agree
that 64-bit integer IDs are insufficient for multi-zone deployments,
so no one will be deploying this until we sort it out and come up
with a better ID.

Since we've reached MP freeze, and none of this is going to make it
into cactus, it seems to make the most sense to finish flushing this
out on the ML (I think we're close), discuss at the summit if needed,
and implement it once we have a consensus on a more robust solution.

-Eric

On Wed, Mar 23, 2011 at 08:24:29PM -0400, Ed Leafe wrote:
>   OK, time for everyone to step back and take a deep breath.
> 
>   There are many implications of the earlier design decision to use 
> integer PKs for database entries. Most who have responded here, myself 
> included, have indicated that they would prefer that this be changed to 
> either a string value comprised of several meaningful bits of information, or 
> a UUID approach, or some combination of things that would address various 
> things in the operation of a zoned design. I think that this will make an 
> excellent discussion at next month's design summit!
> 
>   But the reality is that this needs to be developed now, under the 
> current design of integer PKs. Please note that the only concern here is how 
> to reconcile the Rackspace API requirement of globally unique instance IDs 
> with the current design of generating PKs in local databases at the compute 
> node level. To my understanding, there is no other alternative than 
> partitioning the available integer range across zones, so that each zone 
> generates its instance PKs starting from a different number, and spaced far 
> enough apart that they will never overlap.
> 
>   In the first post of this thread, I proposed a simple partitioning 
> system: allocating a range of integers for each zone, and asked for feedback 
> as to what people would think would be a reasonable estimate for the maximum 
> number of instances a zone would ever need to create. Most shared my distaste 
> for this sort of partitioning system, but no one offered an alternative that 
> would be workable given the current constraints. So I'm going to implement a 
> partition of 1 billion integers per zone, which should allow for 
> approximately 1 billion zones, given a 64 bit integer for the PK. This should 
> be workable for now, and after the design summit, when we've come to a 
> consensus on changing the API to accept something other than integer 
> identifiers, it should not be too difficult to retrofit.
> 
>   Unless someone has a better idea... ;-)
> 
> 
> -- Ed Leafe
> 
> 
> 
> 


Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Eric Day
On Thu, Mar 24, 2011 at 12:23:42AM +, Sandy Walsh wrote:
> From: Eric Day [e...@oddments.org]
> > Do we want this namespace per zone, deployment, resource owner, or some 
> > other dimension?
> 
> Good question. We can prevent collisions at the zone level and within a 
> deployment (single provider / multi-zone). But hybrid clusters are a 
> different matter. Regardless of how we delineate it or which ID scheme we 
> use, we have no way of detecting collisions.

Why not? Some schemes such as the ID.DNS name + ssl cert check I
mentioned before allow us to verify the authenticity of a namespace
before it is used. No other peer could register a zone with that
name unless the cert checks out. Within that zone Nova will prevent
collisions, but if things are really broken (accident or on purpose)
and it starts returning duplicate resource IDs, peer zones can choose
to just use one/none. We can document the behavior as undefined.

So, sure, you can still have duplicates within a zone (or other
namespace), but at least it's self-contained, and others peering with
it don't need to concern themselves with it or worry about spoofing
attacks within their own namespace.

> In the top-level zones of hybrid installations, all instances.get(id) calls 
> issued would have to assume they could get back more than one instance. Ugly, 
> but perhaps this is just the nature of the problem?

If we define the API for that call to only return a single instance,
it is up to the child zone to choose which one to send. If it tries
to return an array for a single ID, it would just be a protocol error
and fail.
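
A sketch of how a parent zone might enforce that contract on a child zone's
reply. The payload shape (a dict with an "id" key) and the names
ProtocolError and parse_get_response are assumptions for illustration only:

```python
class ProtocolError(Exception):
    """Raised when a child zone's reply breaks the single-instance contract."""


def parse_get_response(payload):
    """Accept exactly one instance record; reject sequences and junk."""
    if isinstance(payload, (list, tuple)):
        raise ProtocolError("expected one instance, got a sequence")
    if not isinstance(payload, dict) or "id" not in payload:
        raise ProtocolError("malformed instance record")
    return payload


print(parse_get_response({"id": 10000000001, "status": "ACTIVE"})["status"])
# ACTIVE

try:
    parse_get_response([{"id": 10000000001}, {"id": 10000000001}])
except ProtocolError as exc:
    print("rejected:", exc)
```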

-Eric



Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Sandy Walsh
(sorry Eric, meant to send to the list)
-S

From: Eric Day [e...@oddments.org]
> Do we want this namespace per zone, deployment, resource owner, or some other 
> dimension?

Good question. We can prevent collisions at the zone level and within a 
deployment (single provider / multi-zone). But hybrid clusters are a different 
matter. Regardless of how we delineate it or which ID scheme we use, we have no 
way of detecting collisions.

In the top-level zones of hybrid installations, all instances.get(id) calls 
issued would have to assume they could get back more than one instance. Ugly, 
but perhaps this is just the nature of the problem?

This includes for 64-bit integer, 1-billion per zone approaches ... but so be 
it.

Let's just get something working.

-S




Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Ed Leafe
OK, time for everyone to step back and take a deep breath.

There are many implications of the earlier design decision to use 
integer PKs for database entries. Most who have responded here, myself 
included, have indicated that they would prefer that this be changed to either 
a string value comprised of several meaningful bits of information, or a UUID 
approach, or some combination of things that would address various things in 
the operation of a zoned design. I think that this will make an excellent 
discussion at next month's design summit!

But the reality is that this needs to be developed now, under the 
current design of integer PKs. Please note that the only concern here is how to 
reconcile the Rackspace API requirement of globally unique instance IDs with 
the current design of generating PKs in local databases at the compute node 
level. To my understanding, there is no other alternative than partitioning the 
available integer range across zones, so that each zone generates its instance 
PKs starting from a different number, and spaced far enough apart that they 
will never overlap.

In the first post of this thread, I proposed a simple partitioning 
system: allocating a range of integers for each zone, and asked for feedback as 
to what people would think would be a reasonable estimate for the maximum 
number of instances a zone would ever need to create. Most shared my distaste 
for this sort of partitioning system, but no one offered an alternative that 
would be workable given the current constraints. So I'm going to implement a 
partition of 1 billion integers per zone, which should allow for approximately 
1 billion zones, given a 64 bit integer for the PK. This should be workable for 
now, and after the design summit, when we've come to a consensus on changing 
the API to accept something other than integer identifiers, it should not be 
too difficult to retrofit.
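
A minimal sketch of the range-partitioning idea, to make the proposal
concrete. Everything here is hypothetical: the helper names are invented, and
a simple iterator stands in for the database sequence each zone would
actually use to generate PKs.

```python
IDS_PER_ZONE = 1_000_000_000  # range size proposed above


def zone_range(zone_index, ids_per_zone=IDS_PER_ZONE):
    """Return the block of instance IDs reserved for one zone."""
    start = zone_index * ids_per_zone
    return range(start, start + ids_per_zone)


def zone_for_id(instance_id, ids_per_zone=IDS_PER_ZONE):
    """Recover the owning zone from an instance ID by integer division."""
    return instance_id // ids_per_zone


class ZoneIdAllocator:
    """Hands out sequential IDs from a single zone's reserved range."""

    def __init__(self, zone_index, ids_per_zone=IDS_PER_ZONE):
        self._ids = iter(zone_range(zone_index, ids_per_zone))

    def next_id(self):
        try:
            return next(self._ids)
        except StopIteration:
            raise RuntimeError("zone range exhausted") from None


zone3 = ZoneIdAllocator(3)
first = zone3.next_id()
print(first)               # 3000000000
print(zone_for_id(first))  # 3
```

Because every zone starts from a different offset, any zone can tell from the
integer alone which zone owns an instance, without a central lookup.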

Unless someone has a better idea... ;-)


-- Ed Leafe






Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Eric Day
On Wed, Mar 23, 2011 at 11:09:01PM +, Sandy Walsh wrote:
> From: Ewan Mellor [ewan.mel...@eu.citrix.com]
> > To your point about the boundary of preservation of ID, that's a good 
> > question.  If you ignore the security / trust issues, then the obvious 
> > answer is that IDs should be globally, infinitely, permanently unique.  
> > That's what UUIDs are for.  We can generate these randomly without any need 
> > for a central authority, and with no fear of collisions.  It would 
> > certainly be nice if my VM can leave my SoftLayer DC and arrive in my 
> > Rackspace DC and when it comes back I still know that it's the same VM.  
> > That's the OpenStack dream, right?
> 
> Hmm, I may have been swayed against UNC. Routing and caching can still be 
> layered on a UUID without having to parse it.

"If you ignore the security / trust issues..." but we can't ignore
them, so UUIDs alone are not sufficient. Do we want this namespace per
zone, deployment, resource owner, or some other dimension?

I see the cases against per-zone with RHEL licensing, but pvo does
give an acceptable workaround. Besides that, I guess I don't see the
value in permanent instances. Tools, billing, etc. should work with a
changing working set. Having said that, I'd be ok with any of those as
namespace boundaries (although auth/owner gets nasty with federation),
as long as we have *something*.

-Eric



Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Paul Voccio
Thanks for the clarification. I wasn't sure if you were actually
contradicting yourself as it seemed such an odd thing for you to do. : )

More below!


>
>I certainly didn't intend for those statements to be contradictory.  I
>don't think that they are.

Thanks for the clarification. I wasn't sure if you were actually
contradicting yourself as it seemed such an odd thing for you to do. : )


>
>My view is that identity should be preserved as long as it's possible to
>do so.  A VM that moves around, gets resized, gets rebooted, etc, should
>have the same identity.
>
>By "identity" I mean that other pieces of software should be able to tell
>that it's the same thing.  A billing system should be able to say "that's
>the same VM that I saw before".  For example, if I charge my customers
>for a month of usage, even if they only run the VM for a part of that
>month, then my billing system needs to be able to say "that VM has moved
>from here to here, but it's actually the same VM, so I'm charging for one
>month, not two".  This is the current charging scheme for RHEL instances
>hosted on Rackspace Cloud
>(http://www.rackspace.com/cloud/blog/2010/08/31/red-hat-license-fee-for-rackspace-cloud-servers-changing-from-hourly-to-monthly/),
>not just a corner-case example.

I can speak to this particular example: it only charges you for the maximum
number of Red Hat VMs you run during the month. With the caveat that this is
how it was explained to me, please consider this scenario:

Launch 2, 
Terminate 2
Launch 5
Terminate 2
Launch 3 
Terminate All

You get billed for the hours plus 6 RHEL licenses, since that was your
peak. In your example above, if you terminated and then started another
instance, that's really 2 instances, with only one active at any time. If
you launched one with cloned data from the other and both are active
at the same time, it's really an additional instance and the operator can
bill accordingly. I don't suppose this really matters for the point you're
making, and I'll concede that.
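
The peak in that scenario can be checked with a short sketch. The event
format and the function name are invented here for illustration; this is not
how any billing system is known to work.

```python
def peak_concurrent(events):
    """Return the highest number of instances running at once."""
    running = 0
    peak = 0
    for action, count in events:
        if action == "launch":
            running += count
        elif action == "terminate":
            running = max(running - count, 0)
        elif action == "terminate_all":
            running = 0
        else:
            raise ValueError(f"unknown action: {action}")
        peak = max(peak, running)
    return peak


scenario = [
    ("launch", 2),
    ("terminate", 2),
    ("launch", 5),
    ("terminate", 2),
    ("launch", 3),
    ("terminate_all", None),
]
print(peak_concurrent(scenario))  # 6
```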

>
>You can invent similar arguments for penetration detection systems ("that
>VM is acting the way that it used to") or any other system for enforcing
>policy.
>
>If you are using some kind of location- or path-based identifier for that
>VM, then client software has to be notified of and keep track of all the
>movement of the VM.  If you have a unique identifier, then clients don't
>have to do any of this.
>
>My point about the UI was that we shouldn't worry about how complex these
>IDs should be.  We should make sure that bits of software can talk to
>each other correctly and simply, and base our ID scheme on those needs.
>Once we've figured out what ID scheme we're using, it's _trivial_ for a
>UI or CLI to turn those ugly IDs into "Paul's Apache server" and "Ewan's
>build machine".
I would agree with this.

>
>To your point about the boundary of preservation of ID, that's a good
>question.  If you ignore the security / trust issues, then the obvious
>answer is that IDs should be globally, infinitely, permanently unique.
>That's what UUIDs are for.  We can generate these randomly without any
>need for a central authority, and with no fear of collisions.  It would
>certainly be nice if my VM can leave my SoftLayer DC and arrive in my
>Rackspace DC and when it comes back I still know that it's the same VM.
>That's the OpenStack dream, right?

Is it the ID that matters or the data inside the VM? I think it's really
about the data. Consistent IDs would be nice though.

>
>I'm willing to accept that that's difficult to achieve, and I'd
>compromise on identity only being preserved within an ownership/trust
>boundary.  I really don't see why I should lose track of my VM when it
>moves from one zone to another within a given provider though.

If we were to go with UUIDs and using XenServer, I should be able to use
the uuid that it generates upon VM creation. I would almost ask your above
question for XenServer then. When I terminate and launch a VM on the same
machine, I should be able to give it the same uuid that I was just using,
but I can't. Maybe I can and I'm making it harder on myself :)

pvo


>
>Ewan.
>





Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Sandy Walsh
From: Ewan Mellor [ewan.mel...@eu.citrix.com]

> For example, if I charge my customers for a month of usage, even if they only 
> run the VM for a part of that month, then my billing system needs to be able 
> to say "that VM has moved from here to here, but it's actually the same VM, 
> so I'm charging for one month, not two".  This is the current charging scheme 
> for RHEL instances hosted on Rackspace Cloud 
> (http://www.rackspace.com/cloud/blog/2010/08/31/red-hat-license-fee-for-rackspace-cloud-servers-changing-from-hourly-to-monthly/),
>  not just a corner-case example.

Good use cases.

> To your point about the boundary of preservation of ID, that's a good 
> question.  If you ignore the security / trust issues, then the obvious answer 
> is that IDs should be globally, infinitely, permanently unique.  That's what 
> UUIDs are for.  We can generate these randomly without any need for a central 
> authority, and with no fear of collisions.  It would certainly be nice if my 
> VM can leave my SoftLayer DC and arrive in my Rackspace DC and when it comes 
> back I still know that it's the same VM.  That's the OpenStack dream, right?

Hmm, I may have been swayed against UNC. Routing and caching can still be 
layered on a UUID without having to parse it.

-S




Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Ewan Mellor
> -Original Message-
> From: Paul Voccio [mailto:paul.voc...@rackspace.com]
> Sent: 23 March 2011 22:19
> To: Ewan Mellor; Justin Santa Barbara; Eric Day
> Cc: openstack@lists.launchpad.net
> Subject: Re: [Openstack] Instance IDs and Multiple Zones
>
> >I don't agree at all.  There are many good reasons to preserve the
> >identity of a VM even when its IP or location changes.  Billing, for
> >example.  Access control.  Intrusion detection.
> >
> >Just because I move a VM from one place to another, why would I expect
> >its identity to change?
> >
> >
> 
> Where do we put the boundary on the preservation of id? Within the same
> deployment? Within the same zone topology? I'm not quite following the
> billing aspect. If you shut one down and start another that is a
> problem
> for billing?
> 
> You stated earlier today:
> "We have to accept that, on the scales we care about, any unique ID is
> going to be incomprehensible to a human.  Rely on your presentation
> layer,
> that's what it's there for!"
> 
> Is this really different? If the id changes, should the user care if it is
> presented in the same way with the same data? Am I missing something?

I certainly didn't intend for those statements to be contradictory.  I don't 
think that they are.

My view is that identity should be preserved as long as it's possible to do so. 
 A VM that moves around, gets resized, gets rebooted, etc, should have the same 
identity.

By "identity" I mean that other pieces of software should be able to tell that 
it's the same thing.  A billing system should be able to say "that's the same 
VM that I saw before".  For example, if I charge my customers for a month of 
usage, even if they only run the VM for a part of that month, then my billing 
system needs to be able to say "that VM has moved from here to here, but it's 
actually the same VM, so I'm charging for one month, not two".  This is the 
current charging scheme for RHEL instances hosted on Rackspace Cloud 
(http://www.rackspace.com/cloud/blog/2010/08/31/red-hat-license-fee-for-rackspace-cloud-servers-changing-from-hourly-to-monthly/),
 not just a corner-case example.

You can invent similar arguments for penetration detection systems ("that VM is 
acting the way that it used to") or any other system for enforcing policy.

If you are using some kind of location- or path-based identifier for that VM, 
then client software has to be notified of and keep track of all the movement 
of the VM.  If you have a unique identifier, then clients don't have to do any 
of this.
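As an illustration of the billing case, here is a minimal Python sketch. The `UsageEvent` record, the path format, and `billable_months` are hypothetical names made up for this example, not anything in nova or in a real billing system. It counts billable months for one VM that moves between zones, once keyed on a stable UUID and once keyed on a location-based path.

```python
import uuid
from collections import namedtuple

# Hypothetical usage record: a stable UUID, a location-based path, and a month.
UsageEvent = namedtuple("UsageEvent", ["vm_uuid", "path", "month"])


def billable_months(events, key):
    """Count one billable month per distinct (identifier, month) pair."""
    return len({(key(event), event.month) for event in events})


vm = str(uuid.uuid4())
events = [
    UsageEvent(vm, "zone-a/inst1", "2011-03"),
    UsageEvent(vm, "zone-b/inst7", "2011-03"),  # same VM after a move
]

# Keyed on the stable UUID, the move is invisible to billing.
by_uuid = billable_months(events, key=lambda e: e.vm_uuid)
# Keyed on the path, the move looks like a second VM unless billing is told.
by_path = billable_months(events, key=lambda e: e.path)
```

With the UUID key the customer is charged for one month; with the path key the billing system would have to be notified of the move to avoid charging for two.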

My point about the UI was that we shouldn't worry about how complex these IDs 
should be.  We should make sure that bits of software can talk to each other 
correctly and simply, and base our ID scheme on those needs.  Once we've 
figured out what ID scheme we're using, it's _trivial_ for a UI or CLI to turn 
those ugly IDs into "Paul's Apache server" and "Ewan's build machine".

To your point about the boundary of preservation of ID, that's a good question. 
 If you ignore the security / trust issues, then the obvious answer is that IDs 
should be globally, infinitely, permanently unique.  That's what UUIDs are for. 
 We can generate these randomly without any need for a central authority, and 
with no fear of collisions.  It would certainly be nice if my VM can leave my 
SoftLayer DC and arrive in my Rackspace DC and when it comes back I still know 
that it's the same VM.  That's the OpenStack dream, right?

I'm willing to accept that that's difficult to achieve, and I'd compromise on 
identity only being preserved within an ownership/trust boundary.  I really 
don't see why I should lose track of my VM when it moves from one zone to 
another within a given provider though.

Ewan.




Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Paul Voccio



>
>I don't agree at all.  There are many good reasons to preserve the
>identity of a VM even when its IP or location changes.  Billing, for
>example.  Access control.  Intrusion detection.
>
>Just because I move a VM from one place to another, why would I expect
>its identity to change?
>
>

Where do we put the boundary on the preservation of id? Within the same
deployment? Within the same zone topology? I'm not quite following the
billing aspect. If you shut one down and start another that is a problem
for billing? 

You stated earlier today:
"We have to accept that, on the scales we care about, any unique ID is
going to be incomprehensible to a human.  Rely on your presentation layer,
that's what it's there for!"

Is this really different? If the id changes, should the user care if it is
presented in the same way with the same data? Am I missing something?

pvo
>





Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Ewan Mellor
> From: openstack-bounces+ewan.mellor=citrix@lists.launchpad.net 
> [mailto:openstack-
> bounces+ewan.mellor=citrix@lists.launchpad.net] On Behalf Of Justin Santa 
> Barbara
> Sent: 23 March 2011 19:22
> To: Eric Day
> Cc: openstack@lists.launchpad.net
> Subject: Re: [Openstack] Instance IDs and Multiple Zones
>
> > Migrations outside of a zone would require a new
> > instance ID, but this should be fine, since other things would also
> > change (such as IP, available volumes, ...). A cross-zone migration
> > will be more of a copy+delete than a proper move.

> +1 on this.  If the IP is changing, there's little point in trying to keep 
> the ID the same.  Great 
> point.

I don't agree at all.  There are many good reasons to preserve the identity of 
a VM even when its IP or location changes.  Billing, for example.  Access 
control.  Intrusion detection.

Just because I move a VM from one place to another, why would I expect its 
identity to change?

Ewan.




Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Eric Day
On Wed, Mar 23, 2011 at 07:40:20PM +, Ed Leafe wrote:
> > Migrations outside of a zone would require a new
> > instance ID, but this should be fine, since other things would also
> > change (such as IP, available volumes, ...). 
> 
>   That's probably true in the Rackspace use case, as zones would most 
> likely be physically separate hardware, but nothing about zones makes that 
> mandatory. 

It does currently; I wasn't speaking specifically to Rackspace's
use case. Right now some network and volume code are not aware of
cross-zone issues, and instead assume they are the authority for things
like configured IP ranges. We can certainly change this, and if we
do want to allow proper instance migrations between zones, we would
need to allow instance IDs to change. I don't see the importance of
enabling cross-zone migrations (backup+restore seems sufficient). I
may be wrong, but if we did enable this functionality in the future,
I don't see a reason not to allow resource IDs to change.

Sounds like a design summit topic if folks feel we should support this.

-Eric



Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Ed Leafe
On Mar 23, 2011, at 3:00 PM, Eric Day wrote:

> Migrations outside of a zone would require a new
> instance ID, but this should be fine, since other things would also
> change (such as IP, available volumes, ...). 

That's probably true in the Rackspace use case, as zones would most 
likely be physically separate hardware, but nothing about zones makes that 
mandatory. 


-- Ed Leafe




Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Justin Santa Barbara
>
> Migrations outside of a zone would require a new
> instance ID, but this should be fine, since other things would also
> change (such as IP, available volumes, ...). A cross-zone migration
> will be more of a copy+delete than a proper move.


+1 on this.  If the IP is changing, there's little point in trying to keep
the ID the same.  Great point.


Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Justin Santa Barbara
Indeed, migrations are a major fly in the ointment for any strategy for
meaningful naming (i.e. anything that doesn't use a central DB).  It's not
clear to me with cross-zone migrations that we (a) can keep the same ID and
(b) want to keep the same ID even if we could...

We could look at tricks like storing a 'redirect' pointer after a migration.
 I think it all boils down to our use-case for migrations:

   - If it's the cloud provider that notices that one machine is overloaded
   / failing / whatever and wants to move a VM to another host in the same
   zone, that should keep the same ID because it should be transparent to the
   user, and I think this shouldn't be a problem, because each zone has a DB.
   - If the cloud provider has a problem affecting an entire zone (e.g. 5
   more minutes of UPS power), can they live migrate those machines to another
   zone?  This could be problematic, and is where redirect pointers might have
   to come in.  So the 'parent zone' would have to know that 'childzone1' is
   down and those machines are potentially now scattered amongst other zones.
   - If it's the user that wants to migrate a machine e.g. because they are
   fed up with the AWS datacenter and want to go to a competitor, then we don't
   necessarily have to guarantee the same ID.  This presumes this is indeed a
   supported scenario; it may be that even if we make this easy, we don't
   support _live_ migration here.
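
A rough Python sketch of the 'redirect' pointer idea, under the assumption that each zone keeps its own table of instances plus a table of forwarding entries. The `ZoneDb` class and the `migrate` and `locate` helpers are hypothetical names for illustration only; nothing like this exists in nova today.

```python
class ZoneDb:
    """Hypothetical per-zone database: local instances plus redirect pointers."""

    def __init__(self, name):
        self.name = name
        self.instances = {}  # instance_id -> instance record
        self.redirects = {}  # instance_id -> ZoneDb the instance moved to


def migrate(instance_id, src, dst):
    """Move the record to dst and leave a redirect pointer behind in src."""
    dst.instances[instance_id] = src.instances.pop(instance_id)
    src.redirects[instance_id] = dst


def locate(instance_id, zone, max_hops=8):
    """Follow redirect pointers from zone until the instance is found."""
    for _ in range(max_hops):
        if instance_id in zone.instances:
            return zone
        if instance_id not in zone.redirects:
            return None
        zone = zone.redirects[instance_id]
    raise RuntimeError("redirect chain too long")


zone1, zone2, zone3 = ZoneDb("childzone1"), ZoneDb("childzone2"), ZoneDb("childzone3")
zone1.instances["inst-1"] = {"name": "example"}
migrate("inst-1", zone1, zone2)
migrate("inst-1", zone2, zone3)
found = locate("inst-1", zone1)
```

Note that this only works while the source zone is still reachable. In the failing-zone case above, the pointer would have to live in the parent zone instead.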

Do we have any scoping on the use cases for migrations?

In my suggestion, I wasn't really talking about an external registry (other
than DNS to locate the top-level zones). I don't think my proposed solution
addresses the issue of changing IDs in cross-zone migrations.

Looks like we might need an extended session for this at the Design Summit
:-)

Justin


Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Eric Day
I asked for further details in IRC, which started a discussion
there. To sum up, most folks agree migrations within a zone won't
require a new instance ID. Nothing changes except the compute host
it's running on. Migrations outside of a zone would require a new
instance ID, but this should be fine, since other things would also
change (such as IP, available volumes, ...). A cross-zone migration
will be more of a copy+delete than a proper move.

-Eric

On Wed, Mar 23, 2011 at 06:14:31PM +, Sandy Walsh wrote:
> Pvo brought up a good use case for naming a little while ago: Migrations.
> 
> If we use the instance id (assume UNC) to provide hints to the target zone, 
> this means the instance id would need to change should the instance move 
> locations. That's a no-no by everyone's measure. 
> 
> So, now I'm thinking more about Justin's comment about an external registry.
> 
> Perhaps a glance-like entry with metadata that can change?
> 



Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Sandy Walsh
Pvo brought up a good use case for naming a little while ago: Migrations.

If we use the instance id (assume UNC) to provide hints to the target zone, 
this means the instance id would need to change should the instance move 
locations. That's a no-no by everyone's measure. 

So, now I'm thinking more about Justin's comment about an external registry.

Perhaps a glance-like entry with metadata that can change?
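
To make the registry idea a little more concrete, here is a minimal Python sketch. `InstanceRegistry` and its methods are hypothetical and are not a Glance API; the assumption is simply that the instance ID stays fixed while the zone is ordinary metadata that can be updated after a migration.

```python
import uuid


class InstanceRegistry:
    """Hypothetical external registry: fixed ID, mutable metadata."""

    def __init__(self):
        self._entries = {}

    def register(self, zone, **metadata):
        """Create an entry and return its location-independent ID."""
        instance_id = str(uuid.uuid4())
        self._entries[instance_id] = dict(metadata, zone=zone)
        return instance_id

    def move(self, instance_id, new_zone):
        """Record a migration by updating metadata; the ID does not change."""
        self._entries[instance_id]["zone"] = new_zone

    def lookup(self, instance_id):
        """Return a copy of the current metadata for routing or display."""
        return dict(self._entries[instance_id])


registry = InstanceRegistry()
instance_id = registry.register("zone1", name="sleepy")
registry.move(instance_id, "zone2")
entry = registry.lookup(instance_id)
```

The cost is an extra lookup on every request and a registry that all zones have to be able to reach.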



Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Eric Day
Hi Ed,

On Wed, Mar 23, 2011 at 08:15:54AM -0400, Ed Leafe wrote:
> On Mar 23, 2011, at 1:55 AM, Eric Day wrote:
> 
> > If we provide some structure to the IDs, such as DNS names, we not only
> > solve this namespacing problem but we also get a much more efficient
> > routing mechanism.
> 
> 
>   When I read things like this, the DBA in me winces a little. Meaningful 
> PKs, compound PKs - they always end up being a Very Bad Thing. If you want to 
> add efficient DNS routing, that could be added as additional data about an 
> instance that is periodically updated up the zone structure along with the 
> other capability information, but until now we've passed on that as a 
> premature optimization. That was one of the major arguments in favor of the 
> global DB design.

We're talking about a number of partitioning schemes, reserved bits,
URNs, URIs, etc. Because of the namespace issue I believe we will
need some structure to our resource names.

> > Let's say you have api.rackspace.com (global aggregation zone),
> > rack1.dfw.rackspace.com (real zone running instances), and
> > bursty.customer.com (private zone). Bursty is a rackspace customer
> > and they want to leverage their private resources alongside the
> > public cloud, so they add bursty.customer.com as a private zone
> > for their Rackspace account. The api.rackspace.com server now gets
> > a terminate request for  and it needs to know where to route
> > the request. If we have a global namespace for instances (such as
> > UUIDs), rack1.dfw.rackspace.com and bursty.customer.com could both
> > have servers for  (most likely from bursty spoofing the ID). Now
> > api.rackspace.com doesn't know who to forward the request to.
> 
>   Even if this scenario were to happen, and nova tried to delete an 
> instance with a spoofed ID that did *not* belong to Bursty, it would fail due 
> to improper auth. Otherwise, even without zones/uuids/whatever, I could send 
> termination requests to the API with random IDs and delete any machines with 
> those IDs, whether I had rights to them or not. 

This implies the resource is now uniquely identified along with auth
credentials, which means the resource name cannot stand alone. If
we do have collisions due to spoofing, we're going to see ambiguity
issues crop up in other systems that don't have the auth context. I
strongly believe we need unique resource names that stand on our own
and don't depend on any other component such as auth.

>   In the current zone design, a request to terminate  would not be 
> handled by the outermost zone, since it wouldn't have instances, so it would 
> be forwarded to each child zone. This would repeat down the zone hierarchy 
> until either there were no more child zones, or a zone found that it had an 
> instance with that ID. In the Bursty example, two zones would find an 
> instance with that ID; one would fail due to auth, and the one owned by 
> Bursty would be terminated as requested. The only way more than one instance 
> would terminate would be if Bursty spoofed their own IDs, which would be 
> their problem, not ours.

I think the "In the current zone design" is my main concern. This
discussion takes into account how things need to work in the
near future, not just now. We've punted on routing for now and are
simply sending the request to every zone, but this won't work in the
long run. If we had a large public cloud with hundreds of zones
and thousands of bursting zones, things would get prohibitively
expensive. It's not that they won't function; the response time
may simply be unreasonable.

-Eric



Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Chris Behrens
You have a fundamental misunderstanding of my fundamental understanding of how 
inter-zone communication works. :)  I understand how it works.  I'm asking 
about an admin API that has privileges for actions for all VMs.  As an ISP, I 
want to disable a particular VM because it's being 'bad'.  If someone has 
injected a collision, I would be sending an action to more than 1 VM, not only 
the intended target.  I don't see how collisions can be made to work at all.

And yes, we're talking about spoofing (or really, purposefully colliding a 
known UUID).  I haven't seen any mention of anything else (although I may have 
missed it).  I'm certainly not worried about machine-generated UUIDs 
colliding, myself.

But what we're also talking about here is efficient routing.  Is it necessary?  
No.  Would it scale?  Yes.  A zone name or ID needs to be part of the 
identifier.  I prefer the DNS name idea, although prefixing UUIDs or reserving 
bits in a UUID could also work.
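
One possible shape for the "zone name as part of the identifier" option, as a Python sketch. The `zone/uuid` format and the helper names are assumptions for illustration, not an agreed scheme. The point is only that the receiving zone can pick a single target without asking every child zone.

```python
import uuid


def make_instance_id(zone_name):
    """Build a hypothetical '<zone DNS name>/<uuid4>' identifier."""
    return "%s/%s" % (zone_name, uuid.uuid4())


def zone_for(instance_id):
    """Return the zone part of the identifier, which is where to route."""
    zone_name, _, local_id = instance_id.partition("/")
    if not zone_name or not local_id:
        raise ValueError("expected '<zone>/<uuid>', got %r" % (instance_id,))
    return zone_name


instance_id = make_instance_id("rack1.dfw.rackspace.com")
target = zone_for(instance_id)
```

A spoofed UUID in another zone no longer collides under this format, because the zone name differs. The trade-off raised elsewhere in the thread still applies: the ID would have to change if the instance moved to a different zone.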

- Chris

On Mar 23, 2011, at 9:01 AM, Ed Leafe wrote:

> On Mar 23, 2011, at 11:28 AM, Chris Behrens wrote:
> 
>> How would the admin API know which ID to work with if there are collisions?  
>> Eric's point is that we'd not know where to route the request.
> 
> 
>   This reflects a fundamental misunderstanding of the way inter-zone 
> communication works. There is no direct routing. Instead, a zone "knows" 
> about its instances and its child zones. If the zone receives a request for 
> some action involving a particular instance, it checks if it has that 
> instance among its compute nodes; if not, it forwards the request to each of 
> its child zones. That is repeated until the leaf zones are reached, and most 
> of those will respond with something akin to a 404, indicating that they 
> didn't handle the request. The zone that does have the requested instance, 
> though, will carry out the action and return the result of that action.
> 
>   The child zone responses are then aggregated. If all indicate 404, the 
> zone returns the same. If one child responds that it has handled the request, 
> that response is returned. This repeats back up the zone tree until the zone 
> that originally received the request has heard from all of its child zones 
> (or they timed out). 
> 
>   If there were to be a collision (i.e., two leaf nodes handling the 
> request), there are only two possibilities: either the authenticated user has 
> rights to those nodes, or they do not. If they do not, nothing will happen 
> beyond an authorization failure message. If they do have rights to both 
> instances, then the action will happen to both instances. Since the context 
> of this discussion is deliberate spoofing, my response would be "serves them 
> right". :)
> 
>   So it seems that spoofing should have no effect, assuming that our 
> authentication/authorization system is sound. If it isn't, then we have 
> bigger issues than just ID spoofing, since I could write a program to send 
> API delete requests for random instance IDs - no spoofing required.
> 
>   Without spoofing, let's be realistic: the chance of duplicate uuid 
> values colliding is much, much smaller than the chance of a meteorite 
> smashing into our data centers. From Wikipedia: "In other words, only after 
> generating 1 billion UUIDs every second for the next 100 years, the 
> probability of creating just one duplicate would be about 50%". I believe 
> that that is well beyond our scalability goals, so we can effectively ignore 
> the impact of non-spoofed collisions.
> 
> 
> -- Ed Leafe
> 
> 
> 




Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Ed Leafe
On Mar 23, 2011, at 11:28 AM, Chris Behrens wrote:

> How would the admin API know which ID to work with if there are collisions?  
> Eric's point is that we'd not know where to route the request.


This reflects a fundamental misunderstanding of the way inter-zone 
communication works. There is no direct routing. Instead, a zone "knows" about 
its instances and its child zones. If the zone receives a request for some 
action involving a particular instance, it checks if it has that instance among 
its compute nodes; if not, it forwards the request to each of its child zones. 
That is repeated until the leaf zones are reached, and most of those will 
respond with something akin to a 404, indicating that they didn't handle the 
request. The zone that does have the requested instance, though, will carry out 
the action and return the result of that action.

The child zone responses are then aggregated. If all indicate 404, the 
zone returns the same. If one child responds that it has handled the request, 
that response is returned. This repeats back up the zone tree until the zone 
that originally received the request has heard from all of its child zones (or 
they timed out). 

If there were to be a collision (i.e., two leaf nodes handling the 
request), there are only two possibilities: either the authenticated user has 
rights to those nodes, or they do not. If they do not, nothing will happen 
beyond an authorization failure message. If they do have rights to both 
instances, then the action will happen to both instances. Since the context of 
this discussion is deliberate spoofing, my response would be "serves them 
right". :)
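
The forwarding and aggregation described above can be sketched in a few lines of Python. This is a simplified model, not the nova code: the `Zone` class, the per-instance owner check standing in for auth, and the 200/403 status values are all assumptions, and an empty result list plays the role of the aggregated 404.

```python
class Zone:
    """Hypothetical zone: knows its own instances and its child zones."""

    def __init__(self, name, children=(), instances=None):
        self.name = name
        self.children = list(children)
        self.instances = dict(instances or {})  # instance_id -> owner

    def terminate(self, instance_id, user):
        """Return (zone_name, status) for every zone that held the ID."""
        if instance_id in self.instances:
            if self.instances[instance_id] != user:
                return [(self.name, 403)]  # authorization failure, no action
            del self.instances[instance_id]
            return [(self.name, 200)]
        results = []
        for child in self.children:  # not held here: forward to each child
            results.extend(child.terminate(instance_id, user))
        return results  # an empty list stands in for "all children said 404"


rack1 = Zone("rack1.dfw.rackspace.com", instances={"dup-id": "alice"})
bursty = Zone("bursty.customer.com", instances={"dup-id": "bursty"})
api = Zone("api.rackspace.com", children=[rack1, bursty])

# Bursty asks to terminate an ID that collides with someone else's instance.
results = api.terminate("dup-id", user="bursty")
```

In this model the collision costs one wasted auth failure and leaves the other customer's instance untouched. Every leaf zone is still contacted on every request.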

So it seems that spoofing should have no effect, assuming that our 
authentication/authorization system is sound. If it isn't, then we have bigger 
issues than just ID spoofing, since I could write a program to send API delete 
requests for random instance IDs - no spoofing required.

Without spoofing, let's be realistic: the chance of duplicate uuid 
values colliding is much, much smaller than the chance of a meteorite smashing 
into our data centers. From Wikipedia: "In other words, only after generating 1 
billion UUIDs every second for the next 100 years, the probability of creating 
just one duplicate would be about 50%". I believe that that is well beyond our 
scalability goals, so we can effectively ignore the impact of non-spoofed 
collisions.
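
For anyone who wants to check the order of magnitude, here is a small Python sketch using the standard birthday-bound approximation, assuming random (version 4) UUIDs with 122 random bits. The function name and the two sample workloads are made up for illustration.

```python
import math

UUID4_RANDOM_BITS = 122  # a version-4 UUID carries 122 random bits


def collision_probability(n):
    """Birthday approximation: p ~= 1 - exp(-n^2 / (2 * 2^122))."""
    return -math.expm1(-(n * n) / 2.0 ** (UUID4_RANDOM_BITS + 1))


seconds_per_century = 100 * 365.25 * 24 * 3600
# One billion UUIDs per second for 100 years, as in the quote.
p_century = collision_probability(1e9 * seconds_per_century)
# Ten million instances, the per-zone range size mentioned in this thread.
p_ten_million = collision_probability(10 ** 7)
```

Under this approximation the century figure comes out a bit above one half, in the same ballpark as the quoted number, and the ten-million case is vanishingly small.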


-- Ed Leafe






Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Chris Behrens
How would the admin API know which ID to work with if there are collisions?  
Eric's point is that we'd not know where to route the request.


On Mar 23, 2011, at 5:15 AM, Ed Leafe wrote:

> On Mar 23, 2011, at 1:55 AM, Eric Day wrote:
> 
>> If we provide some structure to the IDs, such as DNS names, we not only
>> solve this namespacing problem but we also get a much more efficient
>> routing mechanism.
> 
> 
>   When I read things like this, the DBA in me winces a little. Meaningful 
> PKs, compound PKs - they always end up being a Very Bad Thing. If you want to 
> add efficient DNS routing, that could be added as additional data about an 
> instance that is periodically updated up the zone structure along with the 
> other capability information, but until now we've passed on that as a 
> premature optimization. That was one of the major arguments in favor of the 
> global DB design.
> 
>> Let's say you have api.rackspace.com (global aggregation zone),
>> rack1.dfw.rackspace.com (real zone running instances), and
>> bursty.customer.com (private zone). Bursty is a rackspace customer
>> and they want to leverage their private resources alongside the
>> public cloud, so they add bursty.customer.com as a private zone
>> for their Rackspace account. The api.rackspace.com server now gets
>> a terminate request for  and it needs to know where to route
>> the request. If we have a global namespace for instances (such as
>> UUIDs), rack1.dfw.rackspace.com and bursty.customer.com could both
>> have servers for  (most likely from bursty spoofing the ID). Now
>> api.rackspace.com doesn't know who to forward the request to.
> 
>   Even if this scenario were to happen, and nova tried to delete an 
> instance with a spoofed ID that did *not* belong to Bursty, it would fail due 
> to improper auth. Otherwise, even without zones/uuids/whatever, I could send 
> termination requests to the API with random IDs and delete any machines with 
> those IDs, whether I had rights to them or not. 
> 
>   In the current zone design, a request to terminate  would not be 
> handled by the outermost zone, since it wouldn't have instances, so it would 
> be forwarded to each child zone. This would repeat down the zone hierarchy 
> until either there were no more child zones, or a zone found that it had an 
> instance with that ID. In the Bursty example, two zones would find an 
> instance with that ID; one would fail due to auth, and the one owned by 
> Bursty would be terminated as requested. The only way more than one instance 
> would terminate would be if Bursty spoofed their own IDs, which would be 
> their problem, not ours.
> 
> 
> -- Ed Leafe
> 
> 
> 
> 




Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Ed Leafe
On Mar 23, 2011, at 8:46 AM, Ewan Mellor wrote:

> We have to accept that, on the scales we care about, any unique ID is going 
> to be incomprehensible to a human.  Rely on your presentation layer, that's 
> what it's there for!


+1


-- Ed Leafe






Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Ewan Mellor
We shouldn't keep tainting this argument with concerns about whether the IDs 
are readable or not.  We have UIs and CLIs to make things readable for humans.

We have to accept that, on the scales we care about, any unique ID is going to 
be incomprehensible to a human.  Rely on your presentation layer, that's what 
it's there for!

Ewan.

> -Original Message-
> From: openstack-bounces+ewan.mellor=citrix@lists.launchpad.net
> [mailto:openstack-bounces+ewan.mellor=citrix@lists.launchpad.net]
> On Behalf Of Sandy Walsh
> Sent: 23 March 2011 12:30
> To: openstack@lists.launchpad.net
> Subject: Re: [Openstack] Instance IDs and Multiple Zones
> 
> Good conversation guys. Certainly something we need to get settled
> sooner rather than later.
> 
> On naming:
> 
> No matter how we shake it out (prefixes, mac address, time, etc), we're
> essentially fabricating our own form of UUID ... trying to pick some
> unique qualifier(s) to avoid collisions.
> 
> I think the real driver is making something that is as-short-as-
> possible and mnemonic enough that a user could look at it and say "yup,
> that's mine". Personally, I find UUIDs to be ugly monsters and think
> URNs are better for providing a mnemonic for remembering names.
> 
> Given: "6373-ba62-9847-feab-b72a-00dd" vs.
> "rax:ord:zone3:rack2:cust29:inst383" ... give me a URN anytime.
> However, this does pose security risks by exposing internal layouts.
> 
> We currently allow a user supplied friendly name but under-the-hood use
> the instance ID. Since customers use different auth credentials their
> instances live in different Projects and there is no conflict.
> Duplicate names are allowed across customers (even within customers?)
> Downside is there are no hints for routing from names.
> 
> On bursting:
> 
> Currently, the Instance ID is fabricated in the zone where the create()
> call was handled. This Instance ID is treated like a Reservation #
> which is returned to the user for later follow-up (since provisioning
> can take a while).
> 
> The way I currently envision bursting with zones is that the commercial
> zones would be the leaf zones in a deployment. That is, instances would
> be provisioned locally first (depending on Server Best Match) due to
> their low weight scores and ultimately "burst" through the bottom of
> the zone tree to the commercial cloud.
> 
> I think this works well. If I have a hybrid cloud and issue 'nova list'
> I would see something like:
> 
> "sleepy" - com:myco:development:inst1
> "dopey" - com:myco:development:inst2
> "blinky" - com:myco:development:inst3
> "inky" - rax:ord:zone3:rack2:cust293:inst393
> "pinky" - rax:ord:zone2:rack34:cust293:inst8746
> "clyde" - bobscloud:basement:shelf2:cust9:inst8
> 
> and get a good idea of what's what.
> 
> 
> 



Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Sandy Walsh
Good conversation guys. Certainly something we need to get settled sooner 
rather than later.

On naming:

No matter how we shake it out (prefixes, mac address, time, etc), we're 
essentially fabricating our own form of UUID ... trying to pick some unique 
qualifier(s) to avoid collisions. 

I think the real driver is making something that is as-short-as-possible and 
mnemonic enough that a user could look at it and say "yup, that's mine". 
Personally, I find UUIDs to be ugly monsters and think URNs are better for 
providing a mnemonic for remembering names. 

Given: "6373-ba62-9847-feab-b72a-00dd" vs. "rax:ord:zone3:rack2:cust29:inst383" 
... give me a URN anytime. However, this does pose security risks by exposing 
internal layouts. 

We currently allow a user supplied friendly name but under-the-hood use the 
instance ID. Since customers use different auth credentials their instances 
live in different Projects and there is no conflict. Duplicate names are 
allowed across customers (even within customers?). The downside is that there 
are no hints for routing from names.

On bursting: 

Currently, the Instance ID is fabricated in the zone where the create() call 
was handled. This Instance ID is treated like a Reservation # which is returned 
to the user for later follow-up (since provisioning can take a while).

The way I currently envision bursting with zones is that the commercial zones 
would be the leaf zones in a deployment. That is, instances would be 
provisioned locally first (depending on Server Best Match) due to their low 
weight scores and ultimately "burst" through the bottom of the zone tree to the 
commercial cloud. 

I think this works well. If I have a hybrid cloud and issue 'nova list' I would 
see something like:

"sleepy" - com:myco:development:inst1
"dopey" - com:myco:development:inst2
"blinky" - com:myco:development:inst3
"inky" - rax:ord:zone3:rack2:cust293:inst393
"pinky" - rax:ord:zone2:rack34:cust293:inst8746
"clyde" - bobscloud:basement:shelf2:cust9:inst8

and get a good idea of what's what.
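
As a rough illustration of why such names are easy to read and route on, a colon-delimited ID can be split into its parts. This is only a sketch: the layout (location fields first, instance last) is an assumption drawn from the example listing above, and `split_instance_name` is a hypothetical helper, not part of nova.

```python
def split_instance_name(name):
    """Split a colon-delimited instance name into location and instance.

    Assumption (not from nova): the last field names the instance and
    every field before it describes where the instance lives.
    """
    parts = name.split(":")
    if len(parts) < 2 or not all(parts):
        raise ValueError("expected 'location:...:instance', got %r" % name)
    return {"location": parts[:-1], "instance": parts[-1]}


listing = {
    "sleepy": "com:myco:development:inst1",
    "inky": "rax:ord:zone3:rack2:cust293:inst393",
    "clyde": "bobscloud:basement:shelf2:cust9:inst8",
}
for friendly, name in sorted(listing.items()):
    info = split_instance_name(name)
    # The first location field is enough to tell local from burst capacity.
    print(friendly, info["location"][0], info["instance"])
```

The same readability is what exposes the internal layout, which is the security concern raised above.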





Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Ed Leafe
On Mar 23, 2011, at 1:55 AM, Eric Day wrote:

> If we provide some structure to the IDs, such as DNS names, we not only
> solve this namespacing problem but we also get a much more efficient
> routing mechanism.


When I read things like this, the DBA in me winces a little. Meaningful 
PKs, compound PKs - they always end up being a Very Bad Thing. If you want to 
add efficient DNS routing, that could be added as additional data about an 
instance that is periodically updated up the zone structure along with the 
other capability information, but until now we've passed on that as a premature 
optimization. That was one of the major arguments in favor of the global DB 
design.

> Let's say you have api.rackspace.com (global aggregation zone),
> rack1.dfw.rackspace.com (real zone running instances), and
> bursty.customer.com (private zone). Bursty is a rackspace customer
> and they want to leverage their private resources alongside the
> public cloud, so they add bursty.customer.com as a private zone
> for their Rackspace account. The api.rackspace.com server now gets
> a terminate request for  and it needs to know where to route
> the request. If we have a global namespace for instances (such as
> UUIDs), rack1.dfw.rackspace.com and bursty.customer.com could both
> have servers for  (most likely from bursty spoofing the ID). Now
> api.rackspace.com doesn't know who to forward the request to.

Even if this scenario were to happen, and nova tried to delete an 
instance with a spoofed ID that did *not* belong to Bursty, it would fail due 
to improper auth. Otherwise, even without zones/uuids/whatever, I could send 
termination requests to the API with random IDs and delete any machines with 
those IDs, whether I had rights to them or not. 

In the current zone design, a request to terminate  would not be 
handled by the outermost zone, since it wouldn't have instances, so it would be 
forwarded to each child zone. This would repeat down the zone hierarchy until 
either there were no more child zones, or a zone found that it had an instance 
with that ID. In the Bursty example, two zones would find an instance with that 
ID; one would fail due to auth, and the one owned by Bursty would be terminated 
as requested. The only way more than one instance would terminate would be if 
Bursty spoofed their own IDs, which would be their problem, not ours.
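
The forwarding described above can be sketched roughly as follows. This is a toy model, not nova code: the `Zone` class, its `instances` mapping of ID to owner, and the zone and owner names are all assumptions made for illustration.

```python
class Zone:
    """Toy zone: holds its own instances and a list of child zones."""

    def __init__(self, name, instances=None, children=None):
        self.name = name
        # instance_id -> owner; hypothetical stand-in for a zone's records
        self.instances = dict(instances or {})
        self.children = list(children or [])

    def terminate(self, instance_id, requester):
        """Return the names of zones where an instance was terminated."""
        if instance_id in self.instances:
            if self.instances[instance_id] != requester:
                return []  # auth failure: the instance is left alone
            del self.instances[instance_id]
            return [self.name]
        # This zone has no such instance, so forward to every child zone.
        terminated = []
        for child in self.children:
            terminated.extend(child.terminate(instance_id, requester))
        return terminated


rack1 = Zone("rack1.dfw.rackspace.com", {"abc": "someone-else"})
bursty = Zone("bursty.customer.com", {"abc": "bursty"})
api = Zone("api.rackspace.com", children=[rack1, bursty])
print(api.terminate("abc", "bursty"))
```

With a spoofed duplicate ID, only the instance the requester owns is removed; the other zone rejects the request on auth.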


-- Ed Leafe






Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Eric Day
Hi Erik,

You bring up some really great points and possible solutions. Comments
inline:

On Wed, Mar 23, 2011 at 01:02:55AM +, Erik Carlin wrote:
> I assume the lowest zone (zone D) is responsible for assigning the id?

Yes, the zone that actually contains the running VM should be the
fully qualified name.

> Does that mean there are now 4 URIs for the same exact resource (I'm
> assuming a numeric server id here for a moment):
> 
> http://zoned.dfw.servers.rackspace.com/v1.1/123/servers/12345 (this would
> be non-public)
> http://dfw.servers.rackspace.com/v1.1/123/servers/12345
> http://servers.osprovider.com/v1.1/456/servers/12345
> http://servers.myos.com/v1.1/789/servers/12345

Well, this is four ways of accessing the resource if 12345 is actually
a globally unique ID (more on that later). Let's not confuse API
endpoints with fully-qualified resource names. A resource name
is used in more places than APIs (billing, logging, other API
versions/protocols, etc.). There could be thousands of places that
you can use a resource ID, but only one ID for a given resource.

> I assume then the user is only returned the URI from the high level zone
> they are hitting (http://servers.myos.com/v1.1/789/servers/12345 in this
> example)?  If so, that means the high level zone defines everything in the
> URI except the actual server ID which is assigned by the low level zone.
>  Would users ever get returned a "downstream" URI they could hit directly?

Perhaps, but for the simple case probably not. If we use DNS names
for resources, and DNS services are fully integrated into Nova, you
could potentially get the most specific endpoint or use SRV records
as Justin suggests. In any case, the user of the high-level API gets
back a resource record which they can use again with this API or any
other that can route to the final zone. Just like it doesn't matter
which DNS server I query when I want an IP, they all return the same
IP(s) for openstack.org.

> Pure numeric ids will not work in a federated model at scale.  If you have
> registered zone prefixes/suffixes, you will limit the total zone count
> based on the number of digits you preallocate and need a registration
> process to ensure uniqueness.  How many zones is enough?

Agreed. I'm a bit confused though because in the next paragraph you
mention using a UUID. A UUID is a pure numeric ID, just large enough
that it is not likely to conflict. Depending on implementation this
could be random, time-based, MAC address based, or a combination. In
any case the meaning of the bits varies so you can't count on
structure, so you're left with a simple 128-bit number. The only
difference from what we have now is that it's twice the size and not
sequentially assigned.
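
To make the "just a 128-bit number" point concrete, here is a small sketch using Python's standard `uuid` module. The choice of versions 1 and 4 is only illustrative; nothing in this thread fixes which variant would be used.

```python
import uuid

random_id = uuid.uuid4()  # version 4: random bits
time_id = uuid.uuid1()    # version 1: timestamp plus a node identifier

for label, value in (("random", random_id), ("time-based", time_id)):
    # .int exposes the UUID as a plain integer below 2**128; apart from
    # the version/variant bits, callers cannot rely on any structure.
    print(label, value, value.version, value.int < 2 ** 128)
```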

> You could use UUID.  If the above flow is accurate, I can only see how you
> create collisions in your OWN OS deployment.  For example, if I
> purposefully create a UUID collision in servers.myos.com (that I run) with
> dfw.servers.rackspace.com (that Rackspace runs), it would only affect me
> since the collision would only be seen in the servers.myos.com namespace.
> Maybe I'm missing something, but I don't see how you could inject a
> collision ID downstream - you can just shoot yourself in your own foot.

Let's say you have api.rackspace.com (global aggregation zone),
rack1.dfw.rackspace.com (real zone running instances), and
bursty.customer.com (private zone). Bursty is a rackspace customer
and they want to leverage their private resources alongside the
public cloud, so they add bursty.customer.com as a private zone
for their Rackspace account. The api.rackspace.com server now gets
a terminate request for  and it needs to know where to route
the request. If we have a global namespace for instances (such as
UUIDs), rack1.dfw.rackspace.com and bursty.customer.com could both
have servers for  (most likely from bursty spoofing the ID). Now
api.rackspace.com doesn't know who to forward the request to.

If we provide some structure to the IDs, such as DNS names, we not only
solve this namespacing problem but we also get a much more efficient
routing mechanism. I no longer need to cache every UUID for every peer
zone, I can just map *.bursty.customer.com to the bursty.customer.com
zone. We may still need to cache the list of instances though for quick
'list my instances' queries, so this may not be as important.
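
A minimal sketch of that suffix mapping, assuming instance names are DNS-style names that end in the owning zone's name. The `route_by_suffix` helper and the `i-1234` label are hypothetical.

```python
def route_by_suffix(instance_name, zones):
    """Return the zone whose DNS name is the longest suffix of the name.

    Returns None when no known zone matches, so the caller can reject
    the request instead of guessing.
    """
    best = None
    for zone in zones:
        if instance_name == zone or instance_name.endswith("." + zone):
            if best is None or len(zone) > len(best):
                best = zone
    return best


zones = ["rack1.dfw.rackspace.com", "bursty.customer.com"]
print(route_by_suffix("i-1234.bursty.customer.com", zones))
```

No per-instance cache is needed for routing here; the zone list alone decides where a request goes.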

> Eric Day, please jump in here if I am off.  AFAICT, same applies to dns
> (which I will discuss more below).  I could just make my server ID dns
> namespace collide with rackspace, but it would still only affect me in my
> own URI namespace.

In my previous email I mentioned if we do use DNS names we'll need
zone name (which is also a DNS name) verification. For example,
when the customer adds bursty.customer.com to api.rackspace.com for
peering, api.rackspace.com will not add the zone or attempt to discover
resources until the authenticity of this zone is verified. The most
obvious method is SSL cert 

Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Paul Voccio
Ed, 

I spoke with Jorge earlier today and this is still treated as the instance
id. That instance can fail or succeed, but the id of what you call to
retrieve that status never changes.

pvo

On 3/22/11 2:55 PM, "Ed Leafe"  wrote:

>On Mar 22, 2011, at 3:20 PM, Paul Voccio wrote:
>
>> This means the api isn't
>> asynchronous if it has to wait for the zone to create the id. Page 46
>> of the API Spec states the following:
>> 
>> "Note that when creating a server only the server ID and the admin
>> password are guaranteed to be returned in the request object. Additional
>> attributes may be retrieved by performing subsequent GETs on the
>> server."
>> 
>> 
>> This creates a problem with the bursting if Z1 calls to Z2, which is a
>> public cloud, which has to wait for Z3-X to find out where it is going
>> to be placed. How would this work?
>
>
>I thought this had been changed to return a reservation ID, which
>would then be used to get information about the instance once it had been
>created. That would allow the API to return immediately without having to
>wait for a host to be selected, an instance to be created, networking to
>be configured, etc.
>
>
>-- Ed Leafe
>




Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Justin Santa Barbara
> Pure numeric ids will not work in a federated model at scale.
>

Agreed


> Maybe I'm missing something, but I don't see how you could inject a
> collision ID downstream - you can just shoot yourself in your own foot.


I think that you can get away with it only in simple hierarchical
structures.  Suppose cloud users are combining multiple public clouds into
their own 'megaclouds'.  If I'm an evil public cloud operator, I can start
handing out UUIDs that match any UUIDs I can discover on the Rackspace
cloud, and anyone that has constructed a cloud that combines my cloud and
Rackspace would have collisions.  Users wouldn't easily know who to blame
either.

> The other option apart from UUID is a globally unique string prefix.  If
> Rackspace had 3 global API endpoints (ord, dfw, lon) each with 5 zones,
> the ID would need to be something like rax:dfw:1:12345 (I would actually
> want to hash the zone id "1" portion with something unique per customer so
> people couldn't coordinate info about zones and target attacks, etc.).
> This is obviously redundant with the Rackspace URI since we are
> representing Rackspace and the region twice, e.g.
> http://dfw.servers.rackspace.com/v1.1/12345/servers/rax:dfw:1:6789.
>

I am in favor of this option, but with a few tweaks:

1) We use DNS, rather than inventing and administering our own scheme
2) I think the server ID looks like
dfw.rackspace.com/servers/a282-a6-cj7aks89.  It's not necessarily a valid
HTTP endpoint, because there's a mapping to a protocol request
3) The client maps it by "filling in" the http/https protocol (or whatever
protocol it is e.g. direct queuing), and it fills in v1.1 because that's the
dialect it speaks.
4) Part of the mapping could be to map from a DNS name to an endpoint,
perhaps using _srv records (I'm sure I'm mangling all the terminology here)
5) This also allows a form of discovery ... if I tell my cloud controller I
want to use rackspace.com, it can then look up the _srv record, find the
endpoint (e.g. openstack.rackspace.com), then do a zone listing request and
find child zones etc.  If I ask my monitoring system to monitor
"rackspace.com/servers/a6cj7aks89", it knows how to map that to an openstack
endpoint.  Auth is another story of course.
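
Steps 2 and 3 above could look something like the sketch below. It is only an illustration: `to_request_url` is a hypothetical helper, the resulting URL shape is an assumption, and the SRV lookup from step 4 is left out entirely.

```python
def to_request_url(resource_id, scheme="https", api_version="v1.1"):
    """Turn a protocol-neutral ID like 'host/servers/xyz' into a URL.

    The client supplies the scheme and the API dialect it speaks; the
    ID itself carries only the DNS name and the resource path.
    """
    host, sep, path = resource_id.partition("/")
    if not (host and sep and path):
        raise ValueError("expected 'host/path', got %r" % resource_id)
    return "%s://%s/%s/%s" % (scheme, host, api_version, path)


print(to_request_url("dfw.rackspace.com/servers/a282-a6-cj7aks89"))
```

A client speaking a different protocol or API version would map the same ID differently, which is the point of keeping the ID separate from the endpoint.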

> Using strings also means people could make ids whatever they want as long
> as they obeyed the prefix/suffix.  So one provider could be
> rax:dfw:1:12345 and another could be osprovider:8F792#@*jsn.  That is
> technically not a big deal, but there is something to be said for
> consistency and simplicity.


True.  We could restrict the character set to A-Z,0-9 and a few other 'safe
characters' if this is a real problem.  We probably should eliminate
difficult-to-encode characters anyway, whether encoding means umlauts or
url-encoding.


> The fundamental problem I see here is URI is intended to be the universal
> resource identifier but since zone federation will create multiple URIs
> for the same resource, the server id now has to be ANOTHER universal
> resource identifier.
>

I think the server ID should be the unique identifier, and is more important
than the REST representation.  I think we should avoid remapping the URI
unless we have to... (more later)

> It will be obvious in which deployment the servers live.  This will
> effectively prevent whitelabel federating.  UUID would be more opaque.
>

Whitelabel federation for reselling an underlying provider can easily be
done by rewriting strings: id.replace("rackspace.com",
"a.justinsbcloud.com").replace("citrix.com", "b.justinsbcloud.com").  I
suspect the same approach would work for internal implementation zones
also.  The truly dedicated will discover the underlying structure whatever
scheme you put in place.
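
A slightly fuller sketch of that rewriting, in both directions, might look like this. The `REWRITES` table and the helper names are assumptions; plain substring replacement is used only because that is what the one-liner above does, and a real reseller would presumably want to match on whole DNS labels.

```python
# upstream provider domain -> reseller alias (illustrative values only)
REWRITES = {
    "rackspace.com": "a.justinsbcloud.com",
    "citrix.com": "b.justinsbcloud.com",
}


def to_public(resource_id, rewrites=REWRITES):
    """Hide the upstream provider before handing an ID to a customer."""
    for upstream, alias in rewrites.items():
        resource_id = resource_id.replace(upstream, alias)
    return resource_id


def to_upstream(resource_id, rewrites=REWRITES):
    """Restore the upstream provider before forwarding a request."""
    for upstream, alias in rewrites.items():
        resource_id = resource_id.replace(alias, upstream)
    return resource_id


public = to_public("dfw.rackspace.com/servers/a282-a6-cj7aks89")
print(public)
print(to_upstream(public))
```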

> Would users ever get returned a "downstream" URI they could hit directly?
>

So now finally I think I can answer this (with my opinion)... Users should
usually get the downstream URI.  Just like DNS, they can either use that URI
directly, or - preferably - use a local openstack endpoint, which acts a bit
like a local DNS resolver.  Your local openstack "proxy" could also do
things like authentication, so - for example - I authenticate to my local
proxy, and it then signs my request before forwarding it.  This could also
deal with the billing issue - the proxy can do charge-back and enforce
internal spending limits and policies, and the public clouds can then bill
the organization in aggregate.

If you need the proxy to sign your requests, then you _can't_ use the
downstream URI directly, which is a great control technique.

Some clouds will want to use zones for internal operational reasons, and
will want to keep the inner zones secret.  So there, we need something like
NAT: the front-end zone translates between "public" and "private" IDs as
they travel in and out.  How that translation works is
deployment-dependent... they could have a mapping database, or could try to
figure out a function which is aware of their internal structure to do thi

Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Erik Carlin
Good discussion.  I need to understand a bit more about how cross org
boundary bursting is envisioned to work before assessing the implications
on server id format.

Say a user hits the http://servers.myos.com api on zone A, which then
calls out to http://servers.osprovider.com api in zone B, which calls out
to http://dfw.servers.rackspace.com zone C, which calls down to
http://zoned.dfw.servers.rackspace.com zone D (which would not be a public
endpoint).  

[We'll exclude authN and the network implications for now :->]

I assume the lowest zone (zone D) is responsible for assigning the id?

Does that mean there are now 4 URIs for the same exact resource (I'm
assuming a numeric server id here for a moment):

http://zoned.dfw.servers.rackspace.com/v1.1/123/servers/12345 (this would
be non-public)
http://dfw.servers.rackspace.com/v1.1/123/servers/12345
http://servers.osprovider.com/v1.1/456/servers/12345
http://servers.myos.com/v1.1/789/servers/12345

I assume then the user is only returned the URI from the high level zone
they are hitting (http://servers.myos.com/v1.1/789/servers/12345 in this
example)?  If so, that means the high level zone defines everything in the
URI except the actual server ID which is assigned by the low level zone.
 Would users ever get returned a "downstream" URI they could hit directly?

Pure numeric ids will not work in a federated model at scale.  If you have
registered zone prefixes/suffixes, you will limit the total zone count
based on the number of digits you preallocate and need a registration
process to ensure uniqueness.  How many zones is enough?
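
The limit is easy to put numbers on. The sketch below uses figures mentioned elsewhere in this thread (a signed 32-bit server ID, and ranges of 1 billion or 10 million IDs per zone); `max_zones` is just a hypothetical helper for the arithmetic.

```python
def max_zones(id_bits, ids_per_zone, signed=True):
    """How many fixed-size ID ranges fit into an integer ID space."""
    usable_bits = id_bits - 1 if signed else id_bits
    return (2 ** usable_bits) // ids_per_zone


print(max_zones(32, 1000000000))                # 2
print(max_zones(32, 1000000000, signed=False))  # 4
print(max_zones(32, 10000000))                  # 214
```

Whatever range size is picked, the zone count is capped up front, and someone still has to hand out the ranges.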

You could use UUID.  If the above flow is accurate, I can only see how you
create collisions in your OWN OS deployment.  For example, if I
purposefully create a UUID collision in servers.myos.com (that I run) with
dfw.servers.rackspace.com (that Rackspace runs), it would only affect me
since the collision would only be seen in the servers.myos.com namespace.
Maybe I'm missing something, but I don't see how you could inject a
collision ID downstream - you can just shoot yourself in your own foot.
Eric Day, please jump in here if I am off.  AFAICT, same applies to dns
(which I will discuss more below).  I could just make my server ID dns
namespace collide with rackspace, but it would still only affect me in my
own URI namespace.

The other option apart from UUID is a globally unique string prefix.  If
Rackspace had 3 global API endpoints (ord, dfw, lon) each with 5 zones,
the ID would need to be something like rax:dfw:1:12345 (I would actually
want to hash the zone id "1" portion with something unique per customer so
people couldn't coordinate info about zones and target attacks, etc.).
This is obviously redundant with the Rackspace URI since we are
representing Rackspace and the region twice, e.g.
http://dfw.servers.rackspace.com/v1.1/12345/servers/rax:dfw:1:6789.

This option also means we need a mechanism for registering unique
prefixes.  We could use the same one we are proposing for API extensions,
or, as Eric pointed out, use dns, but that would REALLY get redundant,
e.g.
http://dfw.servers.rackspace.com/v1.1/12345/servers/6789.dfw.servers.rackspace.com.

Using strings also means people could make ids whatever they want as long
as they obeyed the prefix/suffix.  So one provider could be
rax:dfw:1:12345 and another could be osprovider:8F792#@*jsn.  That is
technically not a big deal, but there is something to be said for
consistency and simplicity.


The fundamental problem I see here is URI is intended to be the universal
resource identifier but since zone federation will create multiple URIs
for the same resource, the server id now has to be ANOTHER universal
resource identifier.

Another issue is whether you want transparency or opaqueness when you are
federating.  If you hit http://servers.myos.com, create two servers, and
the ids that come back are (assuming using dns as server ids for a moment):

http://servers.myos.com/v1.1/12345/servers/5678.servers.myos.com

http://servers.myos.com/v1.1/12345/servers/6789.dfw.servers.rackspace.com

It will be obvious in which deployment the servers live.  This will
effectively prevent whitelabel federating.  UUID would be more opaque.

Given all of the above, I think I lean towards UUID.

Would love to hear more thought and dialog on this.

Erik  



On 3/22/11 3:49 PM, "Eric Day"  wrote:

>See my previous response to Justin's email as to why UUIDs alone are
>not sufficient.
>
>-Eric
>
>On Tue, Mar 22, 2011 at 04:06:14PM -0400, Brian Schott wrote:
>> +1
>> Sounds like some IPV6 discussions back when the standards were being
>> debated.  We could debate bit-allocation forever.  Why can't we use
>> UUIDs?
>> 
>> http://tools.ietf.org/html/rfc4122
>> 
>> """
>> 2.  Motivation
>> 
>> 
>>One of the main reasons for using UUIDs is that no centralized
>>authority is required to administer them (although one format uses
>>IEEE 802 node identifiers, others do not).  As a result, generation

Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Eric Day
See my previous response to Justin's email as to why UUIDs alone are
not sufficient.

-Eric

On Tue, Mar 22, 2011 at 04:06:14PM -0400, Brian Schott wrote:
> +1
> Sounds like some IPV6 discussions back when the standards were being debated. 
>  We could debate bit-allocation forever.  Why can't we use UUIDs?
> 
> http://tools.ietf.org/html/rfc4122
> 
> """
> 2.  Motivation
> 
> 
>One of the main reasons for using UUIDs is that no centralized
>authority is required to administer them (although one format uses
>IEEE 802 node identifiers, others do not).  As a result, generation
>on demand can be completely automated, and used for a variety of
>purposes.  The UUID generation algorithm described here supports very
>high allocation rates of up to 10 million per second per machine if
>necessary, so that they could even be used as transaction IDs.
> 
>UUIDs are of a fixed size (128 bits) which is reasonably small
>compared to other alternatives.  This lends itself well to sorting,
>ordering, and hashing of all sorts, storing in databases, simple
>allocation, and ease of programming in general.
> 
>Since UUIDs are unique and persistent, they make excellent Uniform
>Resource Names.  The unique ability to generate a new UUID without a
>registration process allows for UUIDs to be one of the URNs with the
>lowest minting cost.
> 
> """
> 
> Brian Schott
> bfsch...@gmail.com
> 
> 
> 
> On Mar 22, 2011, at 2:53 PM, Jay Pipes wrote:
> 
> > I know you don't want to resurrect a past discussion. But, UUIDs are
> > designed to solve this kind of problem, frankly. The decision to go
> > with integer IDs is a poor one, and will be negatively affecting the
> > scalability and architecture of our systems well into the future.
> > 
> > I'd love to see a discussion around moving away from internal integer
> > identifiers and towards UUID internal identifiers at the next summit.
> > 
> > Just my 2 cents,
> > -jay
> > 
> 
> 



Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Brian Schott
I remember reading this a while ago.  Not saying we have to do this.  This is 
probably why zones are independent and ids are not unique across zones in EC2.  

This could be handled in the ec2 api service for compatibility.  We could just 
XOR the top half and the bottom half of a UUID and get a unique hash that just 
the EC2 API needs to keep track of.  The only important thing is that the USER 
doesn't get id collisions.
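
For what it's worth, the XOR fold would look something like the sketch below. Two caveats, both assumptions on my part rather than anything settled in this thread: folding 128 bits gives a 64-bit value, which is still wider than a 32-bit ID, and folding cannot guarantee uniqueness, so the EC2 layer would still have to detect and resolve collisions. `fold_uuid` is a hypothetical helper.

```python
import uuid

MASK_64 = (1 << 64) - 1


def fold_uuid(value):
    """XOR the top 64 bits of a UUID with its bottom 64 bits."""
    high = value.int >> 64
    low = value.int & MASK_64
    return high ^ low


example = uuid.UUID(int=(0xFFFF << 64) | 0x00FF)
print(hex(fold_uuid(example)))  # 0xff00
print(hex(fold_uuid(uuid.uuid4())))
```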

---

http://www.jackofallclouds.com/2009/09/anatomy-of-an-amazon-ec2-resource-id/

Anatomy of a Resource ID

So how were the numbers above calculated? To find out, let’s decompose an EC2 
resource ID. After comparing hundreds of IDs, this opaque identifier turned out 
to be a little more transparent than you’d expect.

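[Figure: an example resource ID split into its Type, Superseries Marker, Series Marker, and Inner ID fields]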
Type

The most trivial of the fields, the type is one of the following values, 
depending on the resource type:

• i – instance
• r – reservation
• vol – EBS volume
• snap – EBS snapshot
• ami – Amazon machine image
• aki – Amazon kernel image
• ari – Amazon ramdisk image

Inner ID

The Inner ID is a 16-bit counter of resources allocated. Each time a resource 
is requested, the Inner ID increments by one. For instance and reservation IDs, 
it increments by two (i.e., these Inner IDs are always even). Instead of 
counting from 0000-FFFF as you’d expect, the Inner ID uses the following cycle:

• 4000-7FFF
• 0000-3FFF
• C000-FFFF
• 8000-BFFF

(This cycle can be easily normalized by XORing with 4000.) When the Inner ID 
has exhausted its space, a new series begins (see below) and the cycle restarts.

Series Marker

For a given resource type, there is one active 8-bit Series ID. This Series ID, 
however, is not embedded directly into the resource ID. Instead, it is XORed to 
the leftmost 8 bits of the Inner ID. The result, which I call the Series 
Marker, is embedded in the ID to the left of the Inner ID.

For example, on the resource ID above the Series ID would be e5 = a7 XOR 42.

Series IDs usually decrement by one each time the Inner ID completes a cycle. I 
say “usually” because while this is the most common behavior, from time to time 
Series IDs seem to jump around in a pattern which is yet to be explained.

UPDATE (Oct 7th 2009): RightScale contributed the missing piece: to normalize a 
series ID, XOR with E5 – this irons out the “jumps” I noticed perfectly.

Superseries Marker

For a given resource type, there is one active 8-bit Superseries ID. Like the 
Series ID, the Superseries ID is not embedded directly into the resource ID. 
Instead, it is XORed to the rightmost 8 bits of the Inner ID. The result – the 
Superseries Marker – is the leftmost byte of the resource ID.

For example, on the resource ID above the Superseries ID would be 69 = 31 XOR 
58.

The Superseries ID changes so rarely that originally I had assumed it was some 
kind of checksum. This would have been odd as it limits the total available IDs 
to 2^24 = 16.8 million. Up to very recently, the Superseries ID for all resource 
types – instances, images, volumes, snapshots, etc. – was 69 in the us-east-1 
region (for eu-west-1 the Superseries ID is 74). These days, new instances use 
the Superseries ID 68. This subtle change, unnoticed by the industry, may hint 
at an astonishing achievement: 8.4 million instances launched since EC2's 
debut! (Instance IDs are even so 8.4M = 16.8M / 2.)

UPDATE (Oct 7th 2009): RightScale suggested to normalize the Superseries ID by 
XORing with 69. In this technique, the superseries ID for us-east-1 was 0, and 
the recent change incremented it to 1.
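
The decomposition described in the article can be sketched as below. The field order (Superseries Marker, Series Marker, then the 16-bit Inner ID) follows the article's description. The sample ID `i-31a74258` is not quoted from the article: it is reconstructed from the two XOR examples (a7 XOR 42 and 31 XOR 58), so treat it as an assumption. `decode_resource_id` is a hypothetical helper.

```python
def decode_resource_id(resource_id):
    """Decompose an 8-hex-digit EC2-style ID as the article describes."""
    rtype, _, hex_part = resource_id.rpartition("-")
    value = int(hex_part, 16)
    superseries_marker = (value >> 24) & 0xFF
    series_marker = (value >> 16) & 0xFF
    inner_id = value & 0xFFFF
    # Series ID is XORed with the leftmost byte of the Inner ID,
    # Superseries ID with the rightmost byte.
    series_id = series_marker ^ (inner_id >> 8)
    superseries_id = superseries_marker ^ (inner_id & 0xFF)
    return {
        "type": rtype,
        "inner_id": inner_id,
        "series_id": series_id,
        "superseries_id": superseries_id,
        # Normalizations suggested in the article and its updates.
        "inner_normalized": inner_id ^ 0x4000,
        "series_normalized": series_id ^ 0xE5,
        "superseries_normalized": superseries_id ^ 0x69,
    }


fields = decode_resource_id("i-31a74258")
print({key: (hex(val) if isinstance(val, int) else val)
       for key, val in fields.items()})
```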

Brian Schott
bfsch...@gmail.com



On Mar 22, 2011, at 3:44 PM, Vishvananda Ishaya wrote:

> The main issue that drove integers is backwards compatibility to the ec2_api 
> and existing ec2 toolsets.  People seemed very opposed to the idea of having 
> two separate ids in the database, one for ec2 and one for the underlying 
> system.  If we want to move to another id scheme that doesn't fit in a 32 bit 
> integer we have to provide a way for ec2 style ids to be assigned to 
> instances, perhaps through a central authority that hands out unique ids.
> 
> Vish
> 
> On Mar 22, 2011, at 12:30 PM, Justin Santa Barbara wrote:
> 
>> The API spec doesn't seem to preclude us from doing a fully-synchronous 
>> method if we want to (it just reserves the option to do an async 
>> implementation).  Obviously we should make scheduling fast, but I think 
>> we're fine doing synchronous scheduling.  It's still probably going to be 
>> much faster than CloudServers on a bad day anyway :-)
>> 
>> Anyone have a link to where we chose to go with integer IDs?  I'd like to 
>> understand why, because presumably we had a good reason.
>> 
>> However, if we don't have documentation of the decision, then I vote that it 
>> never happened, and instance ids are strings.  We've always been at war with 
>> Eastasia, and all ids have always been strings.
>> 
>> Justi

Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Jay Pipes
I'd like to point out that while I agree with moving to strings (and
using UUIDs as internal identifiers), this is a discussion for the
summit, and under no circumstances should anyone get the impression
that a change to the identifier will be occurring for Cactus!

-jay

On Tue, Mar 22, 2011 at 4:06 PM, Justin Santa Barbara
 wrote:
> Let's take a leadership position here and go with strings; we're not
> breaking Amazon's API.  AWS will have to make the same changes when they
> reach our scale and ambition :-)
> We should also start engaging with client tools, because we're never going
> to be 100% EC2 compatible.  At the least, our endpoints will be different.
> I think we should discuss this at the Design Summit, and then make an effort
> on this front as part of Diablo.
>
>
> On Tue, Mar 22, 2011 at 12:58 PM, Vishvananda Ishaya 
> wrote:
>>
>> Yes, that is what they say.  Unfortunately all of the ec2 tools expect the
>> current format that they are using to various degrees.
>> Some just need the proper prefix (euca2ools)
>> Others need the prefix + hex (elasticfox, iirc)
>> Others allow a string but limit it to 11 chars, etc.
>> So to keep compatibility we are stuck mimicking amazon's string version
>> for now.
>> Vish
>> On Mar 22, 2011, at 12:51 PM, Justin Santa Barbara wrote:
>>
>> EC2 uses xsd:string for their instance id.  I can't find any additional
>> guarantees.
>> Here's a (second hand) quote from Amazon:
>>
>> http://serverfault.com/questions/58401/is-the-amazon-ec2-instance-id-unique-forever
>> "Instance ids are unique. You'll never receive a duplicate id. However,
>> the current format of the instance id is an implementation detail that is
>> subject to change. If you use the instance id as a string, you should be
>> fine."
>> So, strings it is then? :-)
>>
>>
>>
>> On Tue, Mar 22, 2011 at 12:44 PM, Vishvananda Ishaya
>>  wrote:
>>>
>>> The main issue that drove integers is backwards compatibility to the
>>> ec2_api and existing ec2 toolsets.  People seemed very opposed to the idea
>>> of having two separate ids in the database, one for ec2 and one for the
>>> underlying system.  If we want to move to another id scheme that doesn't fit
>>> in a 32 bit integer we have to provide a way for ec2 style ids to be
>>> assigned to instances, perhaps through a central authority that hands out
>>> unique ids.
>>>
>>> Vish
>>> On Mar 22, 2011, at 12:30 PM, Justin Santa Barbara wrote:
>>>
>>> The API spec doesn't seem to preclude us from doing a fully-synchronous
>>> method if we want to (it just reserves the option to do an async
>>> implementation).  Obviously we should make scheduling fast, but I think
>>> we're fine doing synchronous scheduling.  It's still probably going to be
>>> much faster than CloudServers on a bad day anyway :-)
>>> Anyone have a link to where we chose to go with integer IDs?  I'd like to
>>> understand why, because presumably we had a good reason.
>>> However, if we don't have documentation of the decision, then I vote that
>>> it never happened, and instance ids are strings.  We've always been at war
>>> with Eastasia, and all ids have always been strings.
>>>
>>> Justin
>>>
>>>
>>>
>>> On Tue, Mar 22, 2011 at 12:20 PM, Paul Voccio 
>>> wrote:

 I agree with the sentiment that integers aren't the way to go long term.
 The current spec of the api does introduce some interesting problems to
 this discussion. All can be solved. The spec calls for the api to return
 an id and a password upon instance creation. This means the api isn't
 asynchronous if it has to wait for the zone to create the id. Page 46
 of the API Spec states the following:

 "Note that when creating a server only the server ID and the admin
 password are guaranteed to be returned in the request object. Additional
 attributes may be retrieved by performing subsequent GETs on the
 server."



 This creates a problem with the bursting if Z1 calls to Z2, which is a
 public cloud, which has to wait for Z3-X to find out where it is going to be
 placed. How would this work?

 pvo

 On 3/22/11 1:39 PM, "Chris Behrens"  wrote:

 >
 >I think Dragon got it right.  We need a zone identifier prefix on the
 >IDs.  I think we need to get away from numbers.  I don't see any reason
 >why they need to be numbers.  But, even if they did, you can pick very
 >large numbers and reserve some bits for zone ID.
 >
 >- Chris
 >
 >
 >On Mar 22, 2011, at 10:48 AM, Justin Santa Barbara wrote:
 >
 >> I think _if_ we want to stick with straight numbers, the following
 >> are
 >>the 'traditional' choices:
 >>
 >> 1) "Skipping" - so zone1 would allocate numbers 1,3,5, zone2 numbers
 >>2,4,6.  Requires that you know in advance how many zones there are.
 >> 2) Prefixing - so zone0 would get 0xxx, zone1 1xx.
 >> 3) Central allocation - each zone would request 

Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Vishvananda Ishaya
Seems reasonable

+1 to design summit discussion

Vish

On Mar 22, 2011, at 1:06 PM, Justin Santa Barbara wrote:

> Let's take a leadership position here and go with strings; we're not breaking 
> Amazon's API.  AWS will have to make the same changes when they reach our 
> scale and ambition :-)
> 
> We should also start engaging with client tools, because we're never going to 
> be 100% EC2 compatible.  At the least, our endpoints will be different.
> 
> I think we should discuss this at the Design Summit, and then make an effort 
> on this front as part of Diablo.
> 
> 
> 
> On Tue, Mar 22, 2011 at 12:58 PM, Vishvananda Ishaya  
> wrote:
> Yes, that is what they say.  Unfortunately all of the ec2 tools expect the 
> current format that they are using to various degrees.
> 
> Some just need the proper prefix (euca2ools)
> Others need the prefix + hex (elasticfox, iirc)
> Others allow a string but limit it to 11 chars, etc.
> 
> So to keep compatibility we are stuck mimicking amazon's string version for 
> now.
> 
> Vish
> 
> On Mar 22, 2011, at 12:51 PM, Justin Santa Barbara wrote:
> 
>> EC2 uses xsd:string for their instance id.  I can't find any additional 
>> guarantees.
>> 
>> Here's a (second hand) quote from Amazon:
>> 
>> http://serverfault.com/questions/58401/is-the-amazon-ec2-instance-id-unique-forever
>> "Instance ids are unique. You'll never receive a duplicate id. However, the 
>> current format of the instance id is an implementation detail that is 
>> subject to change. If you use the instance id as a string, you should be 
>> fine."
>> 
>> So, strings it is then? :-)
>> 
>> 
>> 
>> On Tue, Mar 22, 2011 at 12:44 PM, Vishvananda Ishaya  
>> wrote:
>> The main issue that drove integers is backwards compatibility to the ec2_api 
>> and existing ec2 toolsets.  People seemed very opposed to the idea of having 
>> two separate ids in the database, one for ec2 and one for the underlying 
>> system.  If we want to move to another id scheme that doesn't fit in a 32 
>> bit integer we have to provide a way for ec2 style ids to be assigned to 
>> instances, perhaps through a central authority that hands out unique ids.
>> 
>> Vish
>> 
>> On Mar 22, 2011, at 12:30 PM, Justin Santa Barbara wrote:
>> 
>>> The API spec doesn't seem to preclude us from doing a fully-synchronous 
>>> method if we want to (it just reserves the option to do an async 
>>> implementation).  Obviously we should make scheduling fast, but I think 
>>> we're fine doing synchronous scheduling.  It's still probably going to be 
>>> much faster than CloudServers on a bad day anyway :-)
>>> 
>>> Anyone have a link to where we chose to go with integer IDs?  I'd like to 
>>> understand why, because presumably we had a good reason.
>>> 
>>> However, if we don't have documentation of the decision, then I vote that 
>>> it never happened, and instance ids are strings.  We've always been at war 
>>> with Eastasia, and all ids have always been strings.
>>> 
>>> Justin
>>> 
>>> 
>>> 
>>> 
>>> On Tue, Mar 22, 2011 at 12:20 PM, Paul Voccio  
>>> wrote:
>>> I agree with the sentiment that integers aren't the way to go long term.
>>> The current spec of the api does introduce some interesting problems to
>>> this discussion. All can be solved. The spec calls for the api to return
>>> an id and a password upon instance creation. This means the api isn't
>>> asynchronous if it has to wait for the zone to create the id. Page 46
>>> of the API Spec states the following:
>>> 
>>> "Note that when creating a server only the server ID and the admin
>>> password are guaranteed to be returned in the request object. Additional
>>> attributes may be retrieved by performing subsequent GETs on the server."
>>> 
>>> 
>>> 
>>> This creates a problem with the bursting if Z1 calls to Z2, which is a
>>> public cloud, which has to wait for Z3-X to find out where it is going to be
>>> placed. How would this work?
>>> 
>>> pvo
>>> 
>>> On 3/22/11 1:39 PM, "Chris Behrens"  wrote:
>>> 
>>> >
>>> >I think Dragon got it right.  We need a zone identifier prefix on the
>>> >IDs.  I think we need to get away from numbers.  I don't see any reason
>>> >why they need to be numbers.  But, even if they did, you can pick very
>>> >large numbers and reserve some bits for zone ID.
>>> >
>>> >- Chris
>>> >
>>> >
>>> >On Mar 22, 2011, at 10:48 AM, Justin Santa Barbara wrote:
>>> >
>>> >> I think _if_ we want to stick with straight numbers, the following are
>>> >>the 'traditional' choices:
>>> >>
>>> >> 1) "Skipping" - so zone1 would allocate numbers 1,3,5, zone2 numbers
>>> >>2,4,6.  Requires that you know in advance how many zones there are.
>>> >> 2) Prefixing - so zone0 would get 0xxx, zone1 1xx.
>>> >> 3) Central allocation - each zone would request an ID from a central
>>> >>pool.  This might not be a bad thing, if you do want to have a quick
>>> >>lookup table of ID -> zone.  Doesn't work if the zones aren't under the
>>> >>same administrative control.
>>> >> 4) Block al

Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Brian Schott
+1
Sounds like some IPv6 discussions back when the standards were being debated.  
We could debate bit-allocation forever.  Why can't we use UUIDs?

http://tools.ietf.org/html/rfc4122

"""
2.  Motivation


   One of the main reasons for using UUIDs is that no centralized
   authority is required to administer them (although one format uses
   IEEE 802 node identifiers, others do not).  As a result, generation
   on demand can be completely automated, and used for a variety of
   purposes.  The UUID generation algorithm described here supports very
   high allocation rates of up to 10 million per second per machine if
   necessary, so that they could even be used as transaction IDs.

   UUIDs are of a fixed size (128 bits) which is reasonably small
   compared to other alternatives.  This lends itself well to sorting,
   ordering, and hashing of all sorts, storing in databases, simple
   allocation, and ease of programming in general.

   Since UUIDs are unique and persistent, they make excellent Uniform
   Resource Names.  The unique ability to generate a new UUID without a
   registration process allows for UUIDs to be one of the URNs with the
   lowest minting cost.

"""
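
To make the "no centralized authority" point concrete, here is a minimal sketch using Python's standard uuid module. It uses random (version 4) UUIDs, which is my choice for the example and not something the thread has settled on; the function name is hypothetical.

```python
import uuid


def new_instance_uuid() -> str:
    """Generate an ID locally, with no call to a shared database or allocator."""
    return str(uuid.uuid4())


# Two zones could each run this independently and never coordinate.
zone_a_id = new_instance_uuid()
zone_b_id = new_instance_uuid()

print(zone_a_id)
print(zone_b_id)
print(len(zone_a_id))             # 36 characters in the canonical text form
print(uuid.UUID(zone_a_id).int.bit_length() <= 128)  # True: a 128-bit number
```

The canonical string form is 36 characters, which is worth keeping in mind for any client that limits ID length.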

Brian Schott
bfsch...@gmail.com



On Mar 22, 2011, at 2:53 PM, Jay Pipes wrote:

> I know you don't want to resurrect a past discussion. But, UUIDs are
> designed to solve these kinds of problems, frankly. The decision to go
> with integer IDs is a poor one, and will be negatively affecting the
> scalability and architecture of our systems well into the future.
> 
> I'd love to see a discussion around moving away from internal integer
> identifiers and towards UUID internal identifiers at the next summit.
> 
> Just my 2 cents,
> -jay
> 
> ___
> Mailing list: https://launchpad.net/~openstack
> Post to : openstack@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp




Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Justin Santa Barbara
Let's take a leadership position here and go with strings; we're not
breaking Amazon's API.  AWS will have to make the same changes when they
reach our scale and ambition :-)

We should also start engaging with client tools, because we're never going
to be 100% EC2 compatible.  At the least, our endpoints will be different.

I think we should discuss this at the Design Summit, and then make an effort
on this front as part of Diablo.



On Tue, Mar 22, 2011 at 12:58 PM, Vishvananda Ishaya
wrote:

> Yes, that is what they say.  Unfortunately all of the ec2 tools expect the
> current format that they are using to various degrees.
>
> Some just need the proper prefix (euca2ools)
> Others need the prefix + hex (elasticfox, iirc)
> Others allow a string but limit it to 11 chars, etc.
>
> So to keep compatibility we are stuck mimicking amazon's string version for
> now.
>
> Vish
>
> On Mar 22, 2011, at 12:51 PM, Justin Santa Barbara wrote:
>
> EC2 uses xsd:string for their instance id.  I can't find any additional
> guarantees.
>
> Here's a (second hand) quote from Amazon:
>
>
> http://serverfault.com/questions/58401/is-the-amazon-ec2-instance-id-unique-forever
> "Instance ids are unique. You'll never receive a duplicate id. However, the
> current format of the instance id is an implementation detail that is
> subject to change. If you use the instance id as a string, you should be
> fine."
>
> So, strings it is then? :-)
>
>
>
> On Tue, Mar 22, 2011 at 12:44 PM, Vishvananda Ishaya <
> vishvana...@gmail.com> wrote:
>
>> The main issue that drove integers is backwards compatibility to the
>> ec2_api and existing ec2 toolsets.  People seemed very opposed to the idea
>> of having two separate ids in the database, one for ec2 and one for the
>> underlying system.  If we want to move to another id scheme that doesn't fit
>> in a 32 bit integer we have to provide a way for ec2 style ids to be
>> assigned to instances, perhaps through a central authority that hands out
>> unique ids.
>>
>> Vish
>>
>> On Mar 22, 2011, at 12:30 PM, Justin Santa Barbara wrote:
>>
>> The API spec doesn't seem to preclude us from doing a fully-synchronous
>> method if we want to (it just reserves the option to do an async
>> implementation).  Obviously we should make scheduling fast, but I think
>> we're fine doing synchronous scheduling.  It's still probably going to be
>> much faster than CloudServers on a bad day anyway :-)
>>
>> Anyone have a link to where we chose to go with integer IDs?  I'd like to
>> understand why, because presumably we had a good reason.
>>
>> However, if we don't have documentation of the decision, then I vote that
>> it never happened, and instance ids are strings.  We've always been at war
>> with Eastasia, and all ids have always been strings.
>>
>> Justin
>>
>>
>>
>>
>> On Tue, Mar 22, 2011 at 12:20 PM, Paul Voccio 
>> wrote:
>>
>>> I agree with the sentiment that integers aren't the way to go long term.
>>> The current spec of the api does introduce some interesting problems to
>>> this discussion. All can be solved. The spec calls for the api to return
>>> an id and a password upon instance creation. This means the api isn't
>>> asynchronous if it has to wait for the zone to create the id. Page 46
>>> of the API Spec states the following:
>>>
>>> "Note that when creating a server only the server ID and the admin
>>> password are guaranteed to be returned in the request object. Additional
>>> attributes may be retrieved by performing subsequent GETs on the server."
>>>
>>>
>>>
>>> This creates a problem with the bursting if Z1 calls to Z2, which is a
>>> public cloud, which has to wait for Z3-X to find out where it is going to be
>>> placed. How would this work?
>>>
>>> pvo
>>>
>>> On 3/22/11 1:39 PM, "Chris Behrens"  wrote:
>>>
>>> >
>>> >I think Dragon got it right.  We need a zone identifier prefix on the
>>> >IDs.  I think we need to get away from numbers.  I don't see any reason
>>> >why they need to be numbers.  But, even if they did, you can pick very
>>> >large numbers and reserve some bits for zone ID.
>>> >
>>> >- Chris
>>> >
>>> >
>>> >On Mar 22, 2011, at 10:48 AM, Justin Santa Barbara wrote:
>>> >
>>> >> I think _if_ we want to stick with straight numbers, the following are
>>> >>the 'traditional' choices:
>>> >>
>>> >> 1) "Skipping" - so zone1 would allocate numbers 1,3,5, zone2 numbers
>>> >>2,4,6.  Requires that you know in advance how many zones there are.
>>> >> 2) Prefixing - so zone0 would get 0xxx, zone1 1xx.
>>> >> 3) Central allocation - each zone would request an ID from a central
>>> >>pool.  This might not be a bad thing, if you do want to have a quick
>>> >>lookup table of ID -> zone.  Doesn't work if the zones aren't under the
>>> >>same administrative control.
>>> >> 4) Block allocation - a refinement of #3, where you get a bunch of IDs.
>>> >> Effectively amortizes the cost of the RPC.  Probably not worth the
>>> >>effort here.
>>> >>
>>> >> (If you want central 

Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Vishvananda Ishaya
Yes, that is what they say.  Unfortunately all of the ec2 tools expect the 
current format that they are using to various degrees.

Some just need the proper prefix (euca2ools)
Others need the prefix + hex (elasticfox, iirc)
Others allow a string but limit it to 11 chars, etc.

So to keep compatibility we are stuck mimicking amazon's string version for now.
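
As an illustration of what "mimicking" could look like, here is a rough sketch that renders a 32-bit integer as a prefix plus eight hex digits and parses it back. The exact layout ("i-" followed by eight lowercase hex digits) and the function names are assumptions for the example; the only constraints taken from the list above are "prefix", "prefix + hex" and the 11-character limit.

```python
EC2_PREFIX = "i-"      # assumed instance prefix
MAX_EC2_ID_LEN = 11    # the length limit some tools are said to enforce


def to_ec2_id(internal_id: int) -> str:
    """Render an unsigned 32-bit integer ID as prefix + 8 hex digits."""
    if not 0 <= internal_id <= 0xFFFFFFFF:
        raise ValueError("internal_id does not fit in 32 bits")
    return "%s%08x" % (EC2_PREFIX, internal_id)


def from_ec2_id(ec2_id: str) -> int:
    """Recover the integer ID from its prefix + hex form."""
    if not ec2_id.startswith(EC2_PREFIX):
        raise ValueError("missing %r prefix" % EC2_PREFIX)
    return int(ec2_id[len(EC2_PREFIX):], 16)


print(to_ec2_id(10))                  # i-0000000a
print(from_ec2_id("i-0000000a"))      # 10
print(len(to_ec2_id(0xFFFFFFFF)))     # 10, inside the 11-character limit
```

Anything wider than 32 bits would overflow the eight hex digits, which is the constraint being discussed in this thread.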

Vish

On Mar 22, 2011, at 12:51 PM, Justin Santa Barbara wrote:

> EC2 uses xsd:string for their instance id.  I can't find any additional 
> guarantees.
> 
> Here's a (second hand) quote from Amazon:
> 
> http://serverfault.com/questions/58401/is-the-amazon-ec2-instance-id-unique-forever
> "Instance ids are unique. You'll never receive a duplicate id. However, the 
> current format of the instance id is an implementation detail that is subject 
> to change. If you use the instance id as a string, you should be fine."
> 
> So, strings it is then? :-)
> 
> 
> 
> On Tue, Mar 22, 2011 at 12:44 PM, Vishvananda Ishaya  
> wrote:
> The main issue that drove integers is backwards compatibility to the ec2_api 
> and existing ec2 toolsets.  People seemed very opposed to the idea of having 
> two separate ids in the database, one for ec2 and one for the underlying 
> system.  If we want to move to another id scheme that doesn't fit in a 32 bit 
> integer we have to provide a way for ec2 style ids to be assigned to 
> instances, perhaps through a central authority that hands out unique ids.
> 
> Vish
> 
> On Mar 22, 2011, at 12:30 PM, Justin Santa Barbara wrote:
> 
>> The API spec doesn't seem to preclude us from doing a fully-synchronous 
>> method if we want to (it just reserves the option to do an async 
>> implementation).  Obviously we should make scheduling fast, but I think 
>> we're fine doing synchronous scheduling.  It's still probably going to be 
>> much faster than CloudServers on a bad day anyway :-)
>> 
>> Anyone have a link to where we chose to go with integer IDs?  I'd like to 
>> understand why, because presumably we had a good reason.
>> 
>> However, if we don't have documentation of the decision, then I vote that it 
>> never happened, and instance ids are strings.  We've always been at war with 
>> Eastasia, and all ids have always been strings.
>> 
>> Justin
>> 
>> 
>> 
>> 
>> On Tue, Mar 22, 2011 at 12:20 PM, Paul Voccio  
>> wrote:
>> I agree with the sentiment that integers aren't the way to go long term.
>> The current spec of the api does introduce some interesting problems to
>> this discussion. All can be solved. The spec calls for the api to return
>> an id and a password upon instance creation. This means the api isn't
>> asynchronous if it has to wait for the zone to create the id. Page 46
>> of the API Spec states the following:
>> 
>> "Note that when creating a server only the server ID and the admin
>> password are guaranteed to be returned in the request object. Additional
>> attributes may be retrieved by performing subsequent GETs on the server."
>> 
>> 
>> 
>> This creates a problem with the bursting if Z1 calls to Z2, which is a
>> public cloud, which has to wait for Z3-X to find out where it is going to be
>> placed. How would this work?
>> 
>> pvo
>> 
>> On 3/22/11 1:39 PM, "Chris Behrens"  wrote:
>> 
>> >
>> >I think Dragon got it right.  We need a zone identifier prefix on the
>> >IDs.  I think we need to get away from numbers.  I don't see any reason
>> >why they need to be numbers.  But, even if they did, you can pick very
>> >large numbers and reserve some bits for zone ID.
>> >
>> >- Chris
>> >
>> >
>> >On Mar 22, 2011, at 10:48 AM, Justin Santa Barbara wrote:
>> >
>> >> I think _if_ we want to stick with straight numbers, the following are
>> >>the 'traditional' choices:
>> >>
>> >> 1) "Skipping" - so zone1 would allocate numbers 1,3,5, zone2 numbers
>> >>2,4,6.  Requires that you know in advance how many zones there are.
>> >> 2) Prefixing - so zone0 would get 0xxx, zone1 1xx.
>> >> 3) Central allocation - each zone would request an ID from a central
>> >>pool.  This might not be a bad thing, if you do want to have a quick
>> >>lookup table of ID -> zone.  Doesn't work if the zones aren't under the
>> >>same administrative control.
>> >> 4) Block allocation - a refinement of #3, where you get a bunch of IDs.
>> >> Effectively amortizes the cost of the RPC.  Probably not worth the
>> >>effort here.
>> >>
>> >> (If you want central allocation without a shared database, that's also
>> >>possible, but requires some trickier protocols.)
>> >>
>> >> However, I agree with Monsyne: numeric IDs have got to go.  Suppose I'm
>> >>a customer of Rackspace CloudServers once it is running on OpenStack,
>> >>and I also have a private cloud that the new Rackspace Cloud Business
>> >>unit has built for me.  I like both, and then I want to do cloud
>> >>bursting in between them, by putting an aggregating zone in front of
>> >>them.  I think at that stage, we're screwed unless we figure this out
>> >>now.  And this scenario

Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Jay Pipes
On Tue, Mar 22, 2011 at 3:29 PM, Ed Leafe  wrote:
> On Mar 22, 2011, at 2:53 PM, Jay Pipes wrote:
>
>> I know you don't want to resurrect a past discussion. But, UUIDs are
>> designed to solve these kinds of problems, frankly. The decision to go
>> with integer IDs is a poor one, and will be negatively affecting the
>> scalability and architecture of our systems well into the future.
>>
>> I'd love to see a discussion around moving away from internal integer
>> identifiers and towards UUID internal identifiers at the next summit.
>
>        So would I. For the benefit of those of us who were not involved in 
> these prior discussions, can you (or anyone else) remember the objections to 
> string IDs, or, conversely, the reasons in favor of integer IDs?

I don't know who made the decision in the first place, and I only
remember IRC discussions, nothing on the mailing list or in a design
summit.

-jay



Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Mark Washenberger
> However, if we don't have documentation of the decision, then I vote that it
> never happened, and instance ids are strings.  We've always been at war with
> Eastasia, and all ids have always been strings.

This approach might help us in fixing some of the nastier bits of the openstack 
api images resource, as well.

"Justin Santa Barbara"  said:

> The API spec doesn't seem to preclude us from doing a fully-synchronous
> method if we want to (it just reserves the option to do an async
> implementation).  Obviously we should make scheduling fast, but I think
> we're fine doing synchronous scheduling.  It's still probably going to be
> much faster than CloudServers on a bad day anyway :-)
> 
> Anyone have a link to where we chose to go with integer IDs?  I'd like to
> understand why, because presumably we had a good reason.
> 
> However, if we don't have documentation of the decision, then I vote that it
> never happened, and instance ids are strings.  We've always been at war with
> Eastasia, and all ids have always been strings.
> 
> Justin
> 
> 
> 
> 
> On Tue, Mar 22, 2011 at 12:20 PM, Paul Voccio 
> wrote:
> 
>> I agree with the sentiment that integers aren't the way to go long term.
>> The current spec of the api does introduce some interesting problems to
>> this discussion. All can be solved. The spec calls for the api to return
>> an id and a password upon instance creation. This means the api isn't
>> asynchronous if it has to wait for the zone to create the id. Page 46
>> of the API Spec states the following:
>>
>> "Note that when creating a server only the server ID and the admin
>> password are guaranteed to be returned in the request object. Additional
>> attributes may be retrieved by performing subsequent GETs on the server."
>>
>>
>>
>> This creates a problem with the bursting if Z1 calls to Z2, which is a
>> public cloud, which has to wait for Z3-X to find out where it is going to be
>> placed. How would this work?
>>
>> pvo
>>
>> On 3/22/11 1:39 PM, "Chris Behrens"  wrote:
>>
>> >
>> >I think Dragon got it right.  We need a zone identifier prefix on the
>> >IDs.  I think we need to get away from numbers.  I don't see any reason
>> >why they need to be numbers.  But, even if they did, you can pick very
>> >large numbers and reserve some bits for zone ID.
>> >
>> >- Chris
>> >
>> >
>> >On Mar 22, 2011, at 10:48 AM, Justin Santa Barbara wrote:
>> >
>> >> I think _if_ we want to stick with straight numbers, the following are
>> >>the 'traditional' choices:
>> >>
>> >> 1) "Skipping" - so zone1 would allocate numbers 1,3,5, zone2 numbers
>> >>2,4,6.  Requires that you know in advance how many zones there are.
>> >> 2) Prefixing - so zone0 would get 0xxx, zone1 1xx.
>> >> 3) Central allocation - each zone would request an ID from a central
>> >>pool.  This might not be a bad thing, if you do want to have a quick
>> >>lookup table of ID -> zone.  Doesn't work if the zones aren't under the
>> >>same administrative control.
>> >> 4) Block allocation - a refinement of #3, where you get a bunch of IDs.
>> >> Effectively amortizes the cost of the RPC.  Probably not worth the
>> >>effort here.
>> >>
>> >> (If you want central allocation without a shared database, that's also
>> >>possible, but requires some trickier protocols.)
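
A rough sketch of options 1 and 2 from the list above, with option 2 done in binary (the "reserve some bits for zone ID" variant) instead of decimal prefixes. The 8/23 bit split and all names are hypothetical; they are chosen only so the result stays non-negative in a signed 32-bit integer.

```python
import itertools

# Hypothetical bit split: 8 zone bits + 23 local bits = 31 bits, so every
# ID stays non-negative in a signed 32-bit integer.
ZONE_BITS = 8
LOCAL_BITS = 23


def skipping_ids(zone_number: int, zone_count: int):
    """Option 1: zone 1 of 2 yields 1, 3, 5, ...; zone 2 yields 2, 4, 6, ..."""
    if not 1 <= zone_number <= zone_count:
        raise ValueError("zone_number must be in 1..zone_count")
    return itertools.count(zone_number, zone_count)


def prefixed_id(zone: int, local_id: int) -> int:
    """Option 2 in binary form: the zone occupies the top bits of the ID."""
    if not 0 <= zone < 2 ** ZONE_BITS:
        raise ValueError("zone does not fit in ZONE_BITS")
    if not 0 <= local_id < 2 ** LOCAL_BITS:
        raise ValueError("local_id does not fit in LOCAL_BITS")
    return (zone << LOCAL_BITS) | local_id


def zone_of(instance_id: int) -> int:
    """Recover the zone from a prefixed ID without any lookup table."""
    return instance_id >> LOCAL_BITS


print(list(itertools.islice(skipping_ids(1, 2), 3)))  # [1, 3, 5]
print(list(itertools.islice(skipping_ids(2, 2), 3)))  # [2, 4, 6]
print(zone_of(prefixed_id(3, 42)))                    # 3
```

Both schemes fix the number of zones (or zone bits) up front, which is the drawback noted for option 1.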
>> >>
>> >> However, I agree with Monsyne: numeric IDs have got to go.  Suppose I'm
>> >>a customer of Rackspace CloudServers once it is running on OpenStack,
>> >>and I also have a private cloud that the new Rackspace Cloud Business
>> >>unit has built for me.  I like both, and then I want to do cloud
>> >>bursting in between them, by putting an aggregating zone in front of
>> >>them.  I think at that stage, we're screwed unless we figure this out
>> >>now.  And this scenario only has one provider (Rackspace) involved!
>> >>
>> >> We can square the circle however - if we want numbers, let's use UUIDs
>> >>- they're 128 bit numbers, and won't in practice collide.  I'd still
>> >>prefer strings though...
>> >>
>> >> Justin
>> >>
>> >>
>> >>
>> >> On Tue, Mar 22, 2011 at 9:40 AM, Ed Leafe  wrote:
>> >>I want to get some input from all of you on what you think is
>> >>the best way to approach this problem: the RS API requires that every
>> >>instance have a unique ID, and we are currently creating these IDs by
>> >>use of an auto-increment field in the instances table. The introduction
>> >>of zones complicates this, as each zone has its own database.
>> >>
>> >>The two obvious solutions are a) a single, shared database and
>> >>b) using a UUID instead of an integer for the ID. Both of these
>> >>approaches have been discussed and rejected, so let's not bring them
>> >>back up now.
>> >>
>> >>Given intege

Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Ed Leafe
On Mar 22, 2011, at 3:20 PM, Paul Voccio wrote:

> This means the api isn't
> asynchronous if it has to wait for the zone to create the id. Page 46
> of the API Spec states the following:
> 
> "Note that when creating a server only the server ID and the admin
> password are guaranteed to be returned in the request object. Additional
> attributes may be retrieved by performing subsequent GETs on the server."
> 
> 
> This creates a problem with the bursting if Z1 calls to Z2, which is a
> public cloud, which has to wait for Z3-X to find out where it is going to be
> placed. How would this work?


I thought this had been changed to return a reservation ID, which would 
then be used to get information about the instance once it had been created. 
That would allow the API to return immediately without having to wait for a 
host to be selected, an instance to be created, networking to be configured, 
etc.
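
To sketch what I mean (purely illustrative; the class, method names and status strings are invented here and are not taken from the spec or from nova): the create call hands back a reservation ID at once, and the real instance ID is attached later.

```python
import uuid


class ReservationTable:
    """In-memory stand-in for whatever would track pending create requests."""

    def __init__(self):
        self._reservations = {}

    def create_server(self, name: str) -> str:
        """Record the request and return a reservation ID without scheduling."""
        reservation_id = str(uuid.uuid4())
        self._reservations[reservation_id] = {
            "name": name,
            "status": "PENDING",
            "instance_id": None,
        }
        return reservation_id

    def mark_created(self, reservation_id: str, instance_id: str) -> None:
        """Called later, once some zone has actually built the instance."""
        entry = self._reservations[reservation_id]
        entry["status"] = "ACTIVE"
        entry["instance_id"] = instance_id

    def lookup(self, reservation_id: str) -> dict:
        """Return a copy of what is currently known about the reservation."""
        return dict(self._reservations[reservation_id])


table = ReservationTable()
rid = table.create_server("web1")       # returns immediately
print(table.lookup(rid)["status"])      # PENDING
table.mark_created(rid, "zone3-0001")   # some child zone finishes the build
print(table.lookup(rid)["status"])      # ACTIVE
```

The caller polls with the reservation ID, so no zone has to pick the final instance ID before the API responds.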


-- Ed Leafe




Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Justin Santa Barbara
EC2 uses xsd:string for their instance id.  I can't find any additional
guarantees.

Here's a (second hand) quote from Amazon:

http://serverfault.com/questions/58401/is-the-amazon-ec2-instance-id-unique-forever
"Instance ids are unique. You'll never receive a duplicate id. However, the
current format of the instance id is an implementation detail that is
subject to change. If you use the instance id as a string, you should be
fine."

So, strings it is then? :-)



On Tue, Mar 22, 2011 at 12:44 PM, Vishvananda Ishaya
wrote:

> The main issue that drove integers is backwards compatibility to the
> ec2_api and existing ec2 toolsets.  People seemed very opposed to the idea
> of having two separate ids in the database, one for ec2 and one for the
> underlying system.  If we want to move to another id scheme that doesn't fit
> in a 32 bit integer we have to provide a way for ec2 style ids to be
> assigned to instances, perhaps through a central authority that hands out
> unique ids.
>
> Vish
>
> On Mar 22, 2011, at 12:30 PM, Justin Santa Barbara wrote:
>
> The API spec doesn't seem to preclude us from doing a fully-synchronous
> method if we want to (it just reserves the option to do an async
> implementation).  Obviously we should make scheduling fast, but I think
> we're fine doing synchronous scheduling.  It's still probably going to be
> much faster than CloudServers on a bad day anyway :-)
>
> Anyone have a link to where we chose to go with integer IDs?  I'd like to
> understand why, because presumably we had a good reason.
>
> However, if we don't have documentation of the decision, then I vote that
> it never happened, and instance ids are strings.  We've always been at war
> with Eastasia, and all ids have always been strings.
>
> Justin
>
>
>
>
> On Tue, Mar 22, 2011 at 12:20 PM, Paul Voccio 
> wrote:
>
>> I agree with the sentiment that integers aren't the way to go long term.
>> The current spec of the api does introduce some interesting problems to
>> this discussion. All can be solved. The spec calls for the api to return
>> an id and a password upon instance creation. This means the api isn't
>> asynchronous if it has to wait for the zone to create the id. Page 46
>> of the API Spec states the following:
>>
>> "Note that when creating a server only the server ID and the admin
>> password are guaranteed to be returned in the request object. Additional
>> attributes may be retrieved by performing subsequent GETs on the server."
>>
>>
>>
>> This creates a problem with the bursting if Z1 calls to Z2, which is a
>> public cloud, which has to wait for Z3-X to find out where it is going to be
>> placed. How would this work?
>>
>> pvo
>>
>> On 3/22/11 1:39 PM, "Chris Behrens"  wrote:
>>
>> >
>> >I think Dragon got it right.  We need a zone identifier prefix on the
>> >IDs.  I think we need to get away from numbers.  I don't see any reason
>> >why they need to be numbers.  But, even if they did, you can pick very
>> >large numbers and reserve some bits for zone ID.
>> >
>> >- Chris
>> >
>> >
>> >On Mar 22, 2011, at 10:48 AM, Justin Santa Barbara wrote:
>> >
>> >> I think _if_ we want to stick with straight numbers, the following are
>> >>the 'traditional' choices:
>> >>
>> >> 1) "Skipping" - so zone1 would allocate numbers 1,3,5, zone2 numbers
>> >>2,4,6.  Requires that you know in advance how many zones there are.
>> >> 2) Prefixing - so zone0 would get 0xxx, zone1 1xx.
>> >> 3) Central allocation - each zone would request an ID from a central
>> >>pool.  This might not be a bad thing, if you do want to have a quick
>> >>lookup table of ID -> zone.  Doesn't work if the zones aren't under the
>> >>same administrative control.
>> >> 4) Block allocation - a refinement of #3, where you get a bunch of IDs.
>> >> Effectively amortizes the cost of the RPC.  Probably not worth the
>> >>effort here.
>> >>
>> >> (If you want central allocation without a shared database, that's also
>> >>possible, but requires some trickier protocols.)
>> >>
>> >> However, I agree with Monsyne: numeric IDs have got to go.  Suppose I'm
>> >>a customer of Rackspace CloudServers once it is running on OpenStack,
>> >>and I also have a private cloud that the new Rackspace Cloud Business
>> >>unit has built for me.  I like both, and then I want to do cloud
>> >>bursting in between them, by putting an aggregating zone in front of
>> >>them.  I think at that stage, we're screwed unless we figure this out
>> >>now.  And this scenario only has one provider (Rackspace) involved!
>> >>
>> >> We can square the circle however - if we want numbers, let's use UUIDs
>> >>- they're 128 bit numbers, and won't in practice collide.  I'd still
>> >>prefer strings though...
>> >>
>> >> Justin
>> >>
>> >>
>> >>
>> >> On Tue, Mar 22, 2011 at 9:40 AM, Ed Leafe  wrote:
>> >>I want to get some input from all of you on what you think is
>> >>the best way to approach this problem: the RS API requires that every
>> >>instance have a unique ID, a

Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Paul Voccio
With this, are we saying EC2API wouldn't be able to use the child zones in the 
same way as the OSAPI?

From: Vishvananda Ishaya <vishvana...@gmail.com>
Date: Tue, 22 Mar 2011 12:44:21 -0700
To: Justin Santa Barbara <jus...@fathomdb.com>
Cc: Paul Voccio <paul.voc...@rackspace.com>,
"openstack@lists.launchpad.net" <openstack@lists.launchpad.net>,
Chris Behrens <chris.behr...@rackspace.com>
Subject: Re: [Openstack] Instance IDs and Multiple Zones

The main issue that drove integers is backwards compatibility to the ec2_api 
and existing ec2 toolsets.  People seemed very opposed to the idea of having 
two separate ids in the database, one for ec2 and one for the underlying 
system.  If we want to move to another id scheme that doesn't fit in a 32 bit 
integer we have to provide a way for ec2 style ids to be assigned to instances, 
perhaps through a central authority that hands out unique ids.

Vish

On Mar 22, 2011, at 12:30 PM, Justin Santa Barbara wrote:

The API spec doesn't seem to preclude us from doing a fully-synchronous method 
if we want to (it just reserves the option to do an async implementation).  
Obviously we should make scheduling fast, but I think we're fine doing 
synchronous scheduling.  It's still probably going to be much faster than 
CloudServers on a bad day anyway :-)

Anyone have a link to where we chose to go with integer IDs?  I'd like to 
understand why, because presumably we had a good reason.

However, if we don't have documentation of the decision, then I vote that it 
never happened, and instance ids are strings.  We've always been at war with 
Eastasia, and all ids have always been strings.

Justin




On Tue, Mar 22, 2011 at 12:20 PM, Paul Voccio <paul.voc...@rackspace.com> wrote:
I agree with the sentiment that integers aren't the way to go long term.
The current spec of the api does introduce some interesting problems to
this discussion. All can be solved. The spec calls for the api to return
an id and a password upon instance creation. This means the api isn't
asynchronous if it has to wait for the zone to create the id. Page 46
of the API Spec states the following:

"Note that when creating a server only the server ID and the admin
password are guaranteed to be returned in the request object. Additional
attributes may be retrieved by performing subsequent GETs on the server."



This creates a problem with the bursting if Z1 calls to Z2, which is a
public cloud, which has to wait for Z3-X to find out where it is going to be
placed. How would this work?

pvo

On 3/22/11 1:39 PM, "Chris Behrens" <chris.behr...@rackspace.com> wrote:

>
>I think Dragon got it right.  We need a zone identifier prefix on the
>IDs.  I think we need to get away from numbers.  I don't see any reason
>why they need to be numbers.  But, even if they did, you can pick very
>large numbers and reserve some bits for zone ID.
>
>- Chris
>
>
>On Mar 22, 2011, at 10:48 AM, Justin Santa Barbara wrote:
>
>> I think _if_ we want to stick with straight numbers, the following are
>>the 'traditional' choices:
>>
>> 1) "Skipping" - so zone1 would allocate numbers 1,3,5, zone2 numbers
>>2,4,6.  Requires that you know in advance how many zones there are.
>> 2) Prefixing - so zone0 would get 0xxx, zone1 1xx.
>> 3) Central allocation - each zone would request an ID from a central
>>pool.  This might not be a bad thing, if you do want to have a quick
>>lookup table of ID -> zone.  Doesn't work if the zones aren't under the
>>same administrative control.
>> 4) Block allocation - a refinement of #3, where you get a bunch of IDs.
>> Effectively amortizes the cost of the RPC.  Probably not worth the
>>effort here.
>>
>> (If you want central allocation without a shared database, that's also
>>possible, but requires some trickier protocols.)
>>
>> However, I agree with Monsyne: numeric IDs have got to go.  Suppose I'm
>>a customer of Rackspace CloudServers once it is running on OpenStack,
>>and I also have a private cloud that the new Rackspace Cloud Business
>>unit has built for me.  I like both, and then I want to do cloud
>>bursting in between them, by putting an aggregating zone in front of
>>them.  I think at that stage, we're screwed unless we figure this out
>>now.  And this scenario only has one provider (Rackspace) involved!
>>
>> We can square the circle however - if we want numbers, let's use UUIDs
>>- they're 128 bit numbers, and won't in practice collide.  I'd still
>>prefer strings though...
>>
>> Jus

Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Vishvananda Ishaya
The main issue that drove integers is backwards compatibility to the ec2_api 
and existing ec2 toolsets.  People seemed very opposed to the idea of having 
two separate ids in the database, one for ec2 and one for the underlying 
system.  If we want to move to another id scheme that doesn't fit in a 32 bit 
integer we have to provide a way for ec2 style ids to be assigned to instances, 
perhaps through a central authority that hands out unique ids.
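
One way to picture the central authority mentioned above is a small allocator that hands out 32-bit numbers and remembers which internal ID each one belongs to. This is only a sketch: the class name is hypothetical, the mapping lives in memory, and the "i-" plus eight hex digits format is an assumption about what existing ec2 tools expect.

```python
import itertools


class Ec2IdAuthority:
    """Hypothetical allocator mapping any internal ID to a 32-bit ec2-style ID."""

    MAX_ID = 2**32 - 1

    def __init__(self):
        self._counter = itertools.count(1)
        self._by_internal = {}

    def ec2_id_for(self, internal_id):
        """Return the ec2-style ID for internal_id, allocating one if needed."""
        if internal_id not in self._by_internal:
            number = next(self._counter)
            if number > self.MAX_ID:
                raise OverflowError("32-bit ec2 ID space exhausted")
            self._by_internal[internal_id] = number
        return "i-%08x" % self._by_internal[internal_id]
```

With this split the underlying system can use any ID scheme it likes, at the cost of keeping the second identifier that people objected to.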

Vish

On Mar 22, 2011, at 12:30 PM, Justin Santa Barbara wrote:

> The API spec doesn't seem to preclude us from doing a fully-synchronous 
> method if we want to (it just reserves the option to do an async 
> implementation).  Obviously we should make scheduling fast, but I think we're 
> fine doing synchronous scheduling.  It's still probably going to be much 
> faster than CloudServers on a bad day anyway :-)
> 
> Anyone have a link to where we chose to go with integer IDs?  I'd like to 
> understand why, because presumably we had a good reason.
> 
> However, if we don't have documentation of the decision, then I vote that it 
> never happened, and instance ids are strings.  We've always been at war with 
> Eastasia, and all ids have always been strings.
> 
> Justin
> 
> 
> 
> 
> On Tue, Mar 22, 2011 at 12:20 PM, Paul Voccio  
> wrote:
> I agree with the sentiment that integers aren't the way to go long term.
> The current spec of the api does introduce some interesting problems to
> this discussion. All can be solved. The spec calls for the api to return
> an id and a password upon instance creation. This means the api isn't
> asynchronous if it has to wait for the zone to create the id. Page 46
> of the API Spec states the following:
> 
> "Note that when creating a server only the server ID and the admin
> password are guaranteed to be returned in the request object. Additional
> attributes may be retrieved by performing subsequent GETs on the server."
> 
> 
> 
> This creates a problem with the bursting if Z1 calls to Z2, which is a
> public cloud, which has to wait for Z3-X to find out where it is going to be
> placed. How would this work?
> 
> pvo
> 
> On 3/22/11 1:39 PM, "Chris Behrens"  wrote:
> 
> >
> >I think Dragon got it right.  We need a zone identifier prefix on the
> >IDs.  I think we need to get away from numbers.  I don't see any reason
> >why they need to be numbers.  But, even if they did, you can pick very
> >large numbers and reserve some bits for zone ID.
> >
> >- Chris
> >
> >
> >On Mar 22, 2011, at 10:48 AM, Justin Santa Barbara wrote:
> >
> >> I think _if_ we want to stick with straight numbers, the following are
> >>the 'traditional' choices:
> >>
> >> 1) "Skipping" - so zone1 would allocate numbers 1,3,5, zone2 numbers
> >>2,4,6.  Requires that you know in advance how many zones there are.
> >> 2) Prefixing - so zone0 would get 0xxx, zone1 1xx.
> >> 3) Central allocation - each zone would request an ID from a central
> >>pool.  This might not be a bad thing, if you do want to have a quick
> >>lookup table of ID -> zone.  Doesn't work if the zones aren't under the
> >>same administrative control.
> >> 4) Block allocation - a refinement of #3, where you get a bunch of IDs.
> >> Effectively amortizes the cost of the RPC.  Probably not worth the
> >>effort here.
> >>
> >> (If you want central allocation without a shared database, that's also
> >>possible, but requires some trickier protocols.)
> >>
> >> However, I agree with Monsyne: numeric IDs have got to go.  Suppose I'm
> >>a customer of Rackspace CloudServers once it is running on OpenStack,
> >>and I also have a private cloud that the new Rackspace Cloud Business
> >>unit has built for me.  I like both, and then I want to do cloud
> >>bursting in between them, by putting an aggregating zone in front of
> >>them.  I think at that stage, we're screwed unless we figure this out
> >>now.  And this scenario only has one provider (Rackspace) involved!
> >>
> >> We can square the circle however - if we want numbers, let's use UUIDs
> >>- they're 128 bit numbers, and won't in practice collide.  I'd still
> >>prefer strings though...
> >>
> >> Justin
> >>
> >>
> >>
> >> On Tue, Mar 22, 2011 at 9:40 AM, Ed Leafe  wrote:
> >>I want to get some input from all of you on what you think is
> >>the best way to approach this problem: the RS API requires that every
> >>instance have a unique ID, and we are currently creating these IDs by
> >>use of an auto-increment field in the instances table. The introduction
> >>of zones complicates this, as each zone has its own database.
> >>
> >>The two obvious solutions are a) a single, shared database and
> >>b) using a UUID instead of an integer for the ID. Both of these
> >>approaches have been discussed and rejected, so let's not bring them
> >>back up now.
> >>
> >>Given integer IDs and separate databases, the only obvious
> >>choice is partitioning the numeric space so that each zone starts its
> >>auto-incrementing at

Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Justin Santa Barbara
The API spec doesn't seem to preclude us from doing a fully-synchronous
method if we want to (it just reserves the option to do an async
implementation).  Obviously we should make scheduling fast, but I think
we're fine doing synchronous scheduling.  It's still probably going to be
much faster than CloudServers on a bad day anyway :-)

Anyone have a link to where we chose to go with integer IDs?  I'd like to
understand why, because presumably we had a good reason.

However, if we don't have documentation of the decision, then I vote that it
never happened, and instance ids are strings.  We've always been at war with
Eastasia, and all ids have always been strings.

Justin




On Tue, Mar 22, 2011 at 12:20 PM, Paul Voccio wrote:

> I agree with the sentiment that integers aren't the way to go long term.
> The current spec of the api does introduce some interesting problems to
> this discussion. All can be solved. The spec calls for the api to return
> an id and a password upon instance creation. This means the api isn't
> asynchronous if it has to wait for the zone to create the id. Page 46
> of the API Spec states the following:
>
> "Note that when creating a server only the server ID and the admin
> password are guaranteed to be returned in the request object. Additional
> attributes may be retrieved by performing subsequent GETs on the server."
>
>
>
> This creates a problem with the bursting if Z1 calls to Z2, which is a
> public cloud, which has to wait for Z3-X to find out where it is going to be
> placed. How would this work?
>
> pvo
>
> On 3/22/11 1:39 PM, "Chris Behrens"  wrote:
>
> >
> >I think Dragon got it right.  We need a zone identifier prefix on the
> >IDs.  I think we need to get away from numbers.  I don't see any reason
> >why they need to be numbers.  But, even if they did, you can pick very
> >large numbers and reserve some bits for zone ID.
> >
> >- Chris
> >
> >
> >On Mar 22, 2011, at 10:48 AM, Justin Santa Barbara wrote:
> >
> >> I think _if_ we want to stick with straight numbers, the following are
> >>the 'traditional' choices:
> >>
> >> 1) "Skipping" - so zone1 would allocate numbers 1,3,5, zone2 numbers
> >>2,4,6.  Requires that you know in advance how many zones there are.
> >> 2) Prefixing - so zone0 would get 0xxx, zone1 1xx.
> >> 3) Central allocation - each zone would request an ID from a central
> >>pool.  This might not be a bad thing, if you do want to have a quick
> >>lookup table of ID -> zone.  Doesn't work if the zones aren't under the
> >>same administrative control.
> >> 4) Block allocation - a refinement of #3, where you get a bunch of IDs.
> >> Effectively amortizes the cost of the RPC.  Probably not worth the
> >>effort here.
> >>
> >> (If you want central allocation without a shared database, that's also
> >>possible, but requires some trickier protocols.)
> >>
> >> However, I agree with Monsyne: numeric IDs have got to go.  Suppose I'm
> >>a customer of Rackspace CloudServers once it is running on OpenStack,
> >>and I also have a private cloud that the new Rackspace Cloud Business
> >>unit has built for me.  I like both, and then I want to do cloud
> >>bursting in between them, by putting an aggregating zone in front of
> >>them.  I think at that stage, we're screwed unless we figure this out
> >>now.  And this scenario only has one provider (Rackspace) involved!
> >>
> >> We can square the circle however - if we want numbers, let's use UUIDs
> >>- they're 128 bit numbers, and won't in practice collide.  I'd still
> >>prefer strings though...
> >>
> >> Justin
> >>
> >>
> >>
> >> On Tue, Mar 22, 2011 at 9:40 AM, Ed Leafe  wrote:
> >>I want to get some input from all of you on what you think is
> >>the best way to approach this problem: the RS API requires that every
> >>instance have a unique ID, and we are currently creating these IDs by
> >>use of an auto-increment field in the instances table. The introduction
> >>of zones complicates this, as each zone has its own database.
> >>
> >>The two obvious solutions are a) a single, shared database and
> >>b) using a UUID instead of an integer for the ID. Both of these
> >>approaches have been discussed and rejected, so let's not bring them
> >>back up now.
> >>
> >>Given integer IDs and separate databases, the only obvious
> >>choice is partitioning the numeric space so that each zone starts its
> >>auto-incrementing at a different point, with enough room between
> >>starting ranges to ensure that they would never overlap. This would
> >>require some assumptions be made about the maximum number of instances
> >>that would ever be created in a single zone in order to determine how
> >>much numeric space that zone would need. I'm looking to get some
> >>feedback on what would seem to be reasonable guesses to these partition
> >>sizes.
> >>
> >>The other concern is more aesthetic than technical: we can make
> >>the numeric spaces big enough to avoid overlap, but then we'll have

Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Ed Leafe
On Mar 22, 2011, at 2:53 PM, Jay Pipes wrote:

> I know you don't want to resurrect a past discussion. But, UUIDs are
> designed to solve these kinds of problems, frankly. The decision to go
> with integer IDs is a poor one, and will negatively affect the
> scalability and architecture of our systems well into the future.
> 
> I'd love to see a discussion around moving away from internal integer
> identifiers and towards UUID internal identifiers at the next summit.


So would I. For the benefit of those of us who were not involved in 
these prior discussions, can you (or anyone else) remember the objections to 
string IDs, or, conversely, the reasons in favor of integer IDs?



-- Ed Leafe






Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Paul Voccio
I agree with the sentiment that integers aren't the way to go long term.
The current spec of the api does introduce some interesting problems to
this discussion. All can be solved. The spec calls for the api to return
an id and a password upon instance creation. This means the api isn't
asynchronous if it has to wait for the zone to create the id. Page 46
of the API Spec states the following:

"Note that when creating a server only the server ID and the admin
password are guaranteed to be returned in the request object. Additional
attributes may be retrieved by performing subsequent GETs on the server."



This creates a problem with the bursting if Z1 calls to Z2, which is a
public cloud, which has to wait for Z3-X to find out where it is going to be
placed. How would this work?

pvo

On 3/22/11 1:39 PM, "Chris Behrens"  wrote:

>
>I think Dragon got it right.  We need a zone identifier prefix on the
>IDs.  I think we need to get away from numbers.  I don't see any reason
>why they need to be numbers.  But, even if they did, you can pick very
>large numbers and reserve some bits for zone ID.
>
>- Chris
>
>
>On Mar 22, 2011, at 10:48 AM, Justin Santa Barbara wrote:
>
>> I think _if_ we want to stick with straight numbers, the following are
>>the 'traditional' choices:
>> 
>> 1) "Skipping" - so zone1 would allocate numbers 1,3,5, zone2 numbers
>>2,4,6.  Requires that you know in advance how many zones there are.
>> 2) Prefixing - so zone0 would get 0xxx, zone1 1xx.
>> 3) Central allocation - each zone would request an ID from a central
>>pool.  This might not be a bad thing, if you do want to have a quick
>>lookup table of ID -> zone.  Doesn't work if the zones aren't under the
>>same administrative control.
>> 4) Block allocation - a refinement of #3, where you get a bunch of IDs.
>> Effectively amortizes the cost of the RPC.  Probably not worth the
>>effort here.
>> 
>> (If you want central allocation without a shared database, that's also
>>possible, but requires some trickier protocols.)
>> 
>> However, I agree with Monsyne: numeric IDs have got to go.  Suppose I'm
>>a customer of Rackspace CloudServers once it is running on OpenStack,
>>and I also have a private cloud that the new Rackspace Cloud Business
>>unit has built for me.  I like both, and then I want to do cloud
>>bursting in between them, by putting an aggregating zone in front of
>>them.  I think at that stage, we're screwed unless we figure this out
>>now.  And this scenario only has one provider (Rackspace) involved!
>> 
>> We can square the circle however - if we want numbers, let's use UUIDs
>>- they're 128 bit numbers, and won't in practice collide.  I'd still
>>prefer strings though...
>> 
>> Justin
>> 
>> 
>> 
>> On Tue, Mar 22, 2011 at 9:40 AM, Ed Leafe  wrote:
>>I want to get some input from all of you on what you think is
>>the best way to approach this problem: the RS API requires that every
>>instance have a unique ID, and we are currently creating these IDs by
>>use of an auto-increment field in the instances table. The introduction
>>of zones complicates this, as each zone has its own database.
>> 
>>The two obvious solutions are a) a single, shared database and
>>b) using a UUID instead of an integer for the ID. Both of these
>>approaches have been discussed and rejected, so let's not bring them
>>back up now.
>> 
>>Given integer IDs and separate databases, the only obvious
>>choice is partitioning the numeric space so that each zone starts its
>>auto-incrementing at a different point, with enough room between
>>starting ranges to ensure that they would never overlap. This would
>>require some assumptions be made about the maximum number of instances
>>that would ever be created in a single zone in order to determine how
>>much numeric space that zone would need. I'm looking to get some
>>feedback on what would seem to be reasonable guesses to these partition
>>sizes.
>> 
>>The other concern is more aesthetic than technical: we can make
>>the numeric spaces big enough to avoid overlap, but then we'll have very
>>large ID values; e.g., 10 or more digits for an instance. Computers
>>won't care, but people might, so I thought I'd at least bring up this
>>potential objection.
>> 
>> 
>> 
>> -- Ed Leafe
>> 
>> 
>> 
>> 

Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Jay Pipes
I know you don't want to resurrect a past discussion. But, UUIDs are
designed to solve these kinds of problems, frankly. The decision to go
with integer IDs is a poor one, and will negatively affect the
scalability and architecture of our systems well into the future.

I'd love to see a discussion around moving away from internal integer
identifiers and towards UUID internal identifiers at the next summit.
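
To illustrate why UUIDs suit this problem, here is a short sketch in which two zones generate IDs with no shared database or allocator between them. It uses Python's standard uuid module; the function name and the two lists standing in for zones are assumptions for illustration only.

```python
import uuid


def new_instance_id():
    """Generate an ID locally; no other zone or central service is consulted."""
    return str(uuid.uuid4())


# Two "zones" allocating independently of each other.
zone_a_ids = [new_instance_id() for _ in range(1000)]
zone_b_ids = [new_instance_id() for _ in range(1000)]

# Random (version 4) UUIDs carry 122 random bits, so overlap is not
# expected in practice even though nothing coordinates the two zones.
all_unique = len(set(zone_a_ids + zone_b_ids)) == 2000
```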

Just my 2 cents,
-jay



Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Chris Behrens
Without fixing the integer ID problem, I'd vote for reserving some bits for 
zone ID.  I don't like the idea of assigning ranges to zones.  But I think the 
right answer is to fix the integer ID problem. 
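
A rough sketch of what reserving bits for the zone ID could look like while staying inside a positive signed 32-bit integer. The 8/23 split is an arbitrary assumption chosen for illustration, and the function names are hypothetical.

```python
ZONE_BITS = 8   # assumed: room for 256 zones
SEQ_BITS = 23   # the rest of the 31 usable bits: about 8.4 million IDs per zone


def pack_id(zone_id, seq):
    """Combine a zone number and a per-zone sequence number into one integer."""
    if not 0 <= zone_id < (1 << ZONE_BITS):
        raise ValueError("zone_id out of range")
    if not 0 <= seq < (1 << SEQ_BITS):
        raise ValueError("seq out of range")
    return (zone_id << SEQ_BITS) | seq


def unpack_id(instance_id):
    """Recover (zone_id, seq) from a packed ID."""
    return instance_id >> SEQ_BITS, instance_id & ((1 << SEQ_BITS) - 1)
```

The zone can be read straight off the ID with a shift, but the split has to be fixed up front: more bits for zones means fewer instances per zone, and the reverse.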

On Mar 22, 2011, at 11:28 AM, Ed Leafe wrote:

> On Mar 22, 2011, at 1:45 PM, Jon Slenk wrote:
> 
>> if the schema cannot be changed (which might be worth reconsidering
>> since it seems to be a bit of a root cause of trouble) then maybe you
>> have to reserve the last 4 or 5 digits of the id to be the zone id,
>> and then autoincrement on top of that? on the assumption that there
>> would be a limit of  or 9 zones ever.
> 
>   Just to be clear: I would not have been in favor of using integer IDs. 
> However, this was discussed and settled before I was actively involved in the 
> OpenStack code, so I didn't want to have this devolve into a resurrection of 
> what had already been decided. If someone wants to restart that discussion, 
> I'd certainly be interested, but that's not what I'm looking for in this 
> thread.
> 
>   The question before us is: given integer IDs, what is the best way to 
> handle the added complexity of multiple zones?
> 
> 
> -- Ed Leafe
> 
> 
> 




Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Chris Behrens

I think Dragon got it right.  We need a zone identifier prefix on the IDs.  I 
think we need to get away from numbers.  I don't see any reason why they need 
to be numbers.  But, even if they did, you can pick very large numbers and 
reserve some bits for zone ID.
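
If IDs do not need to be numbers, a zone prefix can be as simple as a string that joins the zone name to the zone's own auto-increment value. The sketch below assumes a ":" separator and a DNS-style zone name; both are illustrative choices, not something agreed in this thread.

```python
def make_id(zone, local_id):
    """Build a globally unique ID from a zone name and a zone-local integer."""
    return "%s:%d" % (zone, local_id)


def parse_id(instance_id):
    """Split an ID back into (zone, local_id)."""
    zone, _, local = instance_id.rpartition(":")
    if not zone or not local.isdigit():
        raise ValueError("not a zone-prefixed ID: %r" % instance_id)
    return zone, int(local)
```

An aggregating zone can route a request by looking at the prefix alone, without any lookup table.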

- Chris


On Mar 22, 2011, at 10:48 AM, Justin Santa Barbara wrote:

> I think _if_ we want to stick with straight numbers, the following are the 
> 'traditional' choices:
> 
> 1) "Skipping" - so zone1 would allocate numbers 1,3,5, zone2 numbers 2,4,6.  
> Requires that you know in advance how many zones there are.
> 2) Prefixing - so zone0 would get 0xxx, zone1 1xx.
> 3) Central allocation - each zone would request an ID from a central pool.  
> This might not be a bad thing, if you do want to have a quick lookup table of 
> ID -> zone.  Doesn't work if the zones aren't under the same administrative 
> control.
> 4) Block allocation - a refinement of #3, where you get a bunch of IDs.  
> Effectively amortizes the cost of the RPC.  Probably not worth the effort 
> here.
> 
> (If you want central allocation without a shared database, that's also 
> possible, but requires some trickier protocols.)
> 
> However, I agree with Monsyne: numeric IDs have got to go.  Suppose I'm a 
> customer of Rackspace CloudServers once it is running on OpenStack, and I 
> also have a private cloud that the new Rackspace Cloud Business unit has 
> built for me.  I like both, and then I want to do cloud bursting in between 
> them, by putting an aggregating zone in front of them.  I think at that 
> stage, we're screwed unless we figure this out now.  And this scenario only 
> has one provider (Rackspace) involved!
> 
> We can square the circle however - if we want numbers, let's use UUIDs - 
> they're 128 bit numbers, and won't in practice collide.  I'd still prefer 
> strings though...
> 
> Justin
> 
> 
> 
> On Tue, Mar 22, 2011 at 9:40 AM, Ed Leafe  wrote:
>I want to get some input from all of you on what you think is the best 
> way to approach this problem: the RS API requires that every instance have a 
> unique ID, and we are currently creating these IDs by use of an 
> auto-increment field in the instances table. The introduction of zones 
> complicates this, as each zone has its own database.
> 
>The two obvious solutions are a) a single, shared database and b) 
> using a UUID instead of an integer for the ID. Both of these approaches have 
> been discussed and rejected, so let's not bring them back up now.
> 
>Given integer IDs and separate databases, the only obvious choice is 
> partitioning the numeric space so that each zone starts its auto-incrementing 
> at a different point, with enough room between starting ranges to ensure that 
> they would never overlap. This would require some assumptions be made about 
> the maximum number of instances that would ever be created in a single zone 
> in order to determine how much numeric space that zone would need. I'm 
> looking to get some feedback on what would seem to be reasonable guesses to 
> these partition sizes.
> 
>The other concern is more aesthetic than technical: we can make the 
> numeric spaces big enough to avoid overlap, but then we'll have very large ID 
> values; e.g., 10 or more digits for an instance. Computers won't care, but 
> people might, so I thought I'd at least bring up this potential objection.
> 
> 
> 
> -- Ed Leafe
> 
> 
> 
> 




Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Justin Santa Barbara
Totally agree with Eric.

Two questions that I think can help us move forward:


   1. Is the decision to stick with integers still valid?  Can someone who
   was there give us the reason for the decision?  Is it documented anywhere?
   2. If "we must have integers" means that we get 128 bit 'random'
   integers, do we still want integers?
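
For a sense of scale on the second question, this short sketch compares a 128-bit UUID, read as an integer, with the signed 32-bit limit. It uses only the standard uuid module; the variable names are illustrative.

```python
import uuid

INT32_MAX = 2**31 - 1              # largest value a signed 32-bit field can hold

value = uuid.uuid4().int           # the same UUID, viewed as a plain integer
fits_in_32_bits = value <= INT32_MAX

# A 128-bit integer can run to 39 decimal digits, against 10 for INT32_MAX.
max_digits_128 = len(str(2**128 - 1))
max_digits_32 = len(str(INT32_MAX))
```

So "integers" in this sense would still mean changing every schema, client and tool that assumes a 32-bit ID.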



Justin





On Tue, Mar 22, 2011 at 11:25 AM, Eric Day  wrote:

> On Tue, Mar 22, 2011 at 12:40:21PM -0400, Ed Leafe wrote:
> >   The two obvious solutions are a) a single, shared database and b)
> using a UUID instead of an integer for the ID. Both of these approaches have
> been discussed and rejected, so let's not bring them back up now.
>
> We shouldn't dismiss previous ideas just because we've not chosen
> them in the past, but let's not have the same discussion.
>
> >   Given integer IDs and separate databases, the only obvious choice
> is partitioning the numeric space so that each zone starts its
> auto-incrementing at a different point, with enough room between starting
> ranges to ensure that they would never overlap. This would require some
> assumptions be made about the maximum number of instances that would ever be
> created in a single zone in order to determine how much numeric space that
> zone would need. I'm looking to get some feedback on what would seem to be
> reasonable guesses to these partition sizes.
>
> I think we need:
>
> * No central authority such as a globally shared DB. This also
>  means not partitioning some set and handing them out to zones as
>  offset (this is just another form of a shared DB).
>
> * Ability to seamlessly join existing zones without chance of namespace
>  collisions for peering and bursting. This means a globally unique
>  zone naming scheme, and for this I'll reiterate the idea of using
>  DNS names for zones.
>
> If we want to stick with a single DB per zone, as it looks like we
> are, this can simply be the auto-increment value from the instance
> table and the zone as: ..
>
> >   The other concern is more aesthetic than technical: we can make the
> numeric spaces big enough to avoid overlap, but then we'll have very large
> ID values; e.g., 10 or more digits for an instance. Computers won't care,
> but people might, so I thought I'd at least bring up this potential
> objection.
>
> I'm not concerned with aesthetic issues to be honest. We have
> copy/paste, DNS, and various techniques for presentation layers.
>
> -Eric
>


Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Eric Day
On Tue, Mar 22, 2011 at 10:48:09AM -0700, Justin Santa Barbara wrote:
>We can square the circle however - if we want numbers, let's use UUIDs -
>they're 128 bit numbers, and won't in practice collide.  I'd still prefer
>strings though...

If we use a number/uuid without a zone prefix, then they can
collide. What happens when I want to burst to my private cloud and
I've fixed my UUIDs to intentionally collide just to cause trouble?

Through peering and bursting we have potentially malicious users
for some deployments and we need to be sure resource ID spoofing and
poisoning is not possible. The simplest way is to have a namespace for
every zone, and the most obvious namespace is the zone name. We'll
of course need a mechanism to detect authenticity of zone names too
(signed certs, etc).

Oh, and all this discussion should not be limited to just instance
IDs, networks and volumes need to be globally addressed as well and
should follow the same mechanism.
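
A minimal sketch of the per-zone namespace idea, assuming the peer zone has
already been authenticated by some other means (signed certs, etc.) and that
resources are keyed by a (zone name, local ID) pair. The ZoneRegistry class
and its method names are hypothetical, for illustration only.

```python
class ZoneRegistry:
    """Tracks resources by (zone name, local ID); hypothetical helper."""

    def __init__(self):
        self._resources = {}  # (zone, local_id) -> resource description

    def register(self, authenticated_zone, claimed_zone, local_id, resource):
        # A zone may only create IDs inside its own namespace, so a peer
        # cannot forge an ID that shadows another zone's resource.
        if claimed_zone != authenticated_zone:
            raise PermissionError(
                f"{authenticated_zone} cannot claim IDs in {claimed_zone}"
            )
        key = (claimed_zone, local_id)
        if key in self._resources:
            raise KeyError(f"duplicate ID within zone: {key}")
        self._resources[key] = resource
        return key


registry = ZoneRegistry()
# The same local ID in two zones does not collide.
key_a = registry.register("a.example.com", "a.example.com", 1234, "instance")
key_b = registry.register("b.example.com", "b.example.com", 1234, "instance")
```

The same keying would apply to networks and volumes; only the authenticity
check on the zone name is left out of this sketch.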

-Eric



Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Ed Leafe
On Mar 22, 2011, at 1:45 PM, Jon Slenk wrote:

> if the schema cannot be changed (which might be worth reconsidering
> since it seems to be a bit of a root cause of trouble) then maybe you
> have to reserve the last 4 or 5 digits of the id to be the zone id,
> and then autoincrement on top of that? on the assumption that there
> would be a limit of  or 9 zones ever.

Just to be clear: I would not have been in favor of using integer IDs. 
However, this was discussed and settled before I was actively involved in the 
OpenStack code, so I didn't want to have this devolve into a resurrection of 
what had already been decided. If someone wants to restart that discussion, I'd 
certainly be interested, but that's not what I'm looking for in this thread.

The question before us is: given integer IDs, what is the best way to 
handle the added complexity of multiple zones?


-- Ed Leafe





Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Eric Day
On Tue, Mar 22, 2011 at 12:40:21PM -0400, Ed Leafe wrote:
>   The two obvious solutions are a) a single, shared database and b) using 
> a UUID instead of an integer for the ID. Both of these approaches have been 
> discussed and rejected, so let's not bring them back up now.

We shouldn't dismiss previous ideas just because we've not chosen
them in the past, but let's not have the same discussion.

>   Given integer IDs and separate databases, the only obvious choice is 
> partitioning the numeric space so that each zone starts its auto-incrementing 
> at a different point, with enough room between starting ranges to ensure that 
> they would never overlap. This would require some assumptions be made about 
> the maximum number of instances that would ever be created in a single zone 
> in order to determine how much numeric space that zone would need. I'm 
> looking to get some feedback on what would seem to be reasonable guesses to 
> these partition sizes.

I think we need:

* No central authority such as a globally shared DB. This also
  means not partitioning some set and handing them out to zones as
  offset (this is just another form of a shared DB).

* Ability to seamlessly join existing zones without chance of namespace
  collisions for peering and bursting. This means a globally unique
  zone naming scheme, and for this I'll reiterate the idea of using
  DNS names for zones.

If we want to stick with a single DB per zone, as it looks like we
are, this can simply be the auto-increment value from the instance
table and the zone as: ..

>   The other concern is more aesthetic than technical: we can make the 
> numeric spaces big enough to avoid overlap, but then we'll have very large ID 
> values; e.g., 10 or more digits for an instance. Computers won't care, but 
> people might, so I thought I'd at least bring up this potential objection.

I'm not concerned with aesthetic issues to be honest. We have
copy/paste, DNS, and various techniques for presentation layers.

-Eric



Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Monsyne Dragon
Also, I should note that there seem to be merges pending to make the 
v1.1 API use URLs as instance identifiers in API calls, rather than 
integer IDs.
I'm not sure how that affects v1.0 compatibility, but it is 
something to keep in mind.


--

--
-Monsyne Dragon






Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Justin Santa Barbara
I think _if_ we want to stick with straight numbers, the following are the
'traditional' choices:

1) "Skipping" - so zone1 would allocate numbers 1,3,5, zone2 numbers 2,4,6.
   Requires that you know in advance how many zones there are.
2) Prefixing - so zone0 would get 0xxx, zone1 1xxx.
3) Central allocation - each zone would request an ID from a central pool.
   This might not be a bad thing, if you do want to have a quick lookup table
   of ID -> zone.  Doesn't work if the zones aren't under the same
   administrative control.
4) Block allocation - a refinement of #3, where you get a bunch of IDs.
   Effectively amortizes the cost of the RPC.  Probably not worth the effort
   here.

(If you want central allocation without a shared database, that's also
possible, but requires some trickier protocols.)
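
The choices above can be sketched roughly as follows. This is an
illustration only: the function and class names are made up, the range size
is an arbitrary parameter, and the "central pool" is an in-process object
standing in for whatever service would actually hand out IDs.

```python
from itertools import count, islice


def skipping_ids(zone_index, zone_count):
    """Skipping: zone k of n (1-based) takes k, k+n, k+2n, ..."""
    if not 1 <= zone_index <= zone_count:
        raise ValueError("zone_index must be in 1..zone_count")
    return count(start=zone_index, step=zone_count)


def prefixed_ids(zone_index, range_size):
    """Prefixing: zone k (0-based) owns [k*range_size, (k+1)*range_size)."""
    start = zone_index * range_size
    return iter(range(start, start + range_size))


class BlockAllocator:
    """Block allocation: a central pool leasing contiguous ID blocks."""

    def __init__(self, block_size):
        self.block_size = block_size
        self._next = 1

    def lease(self):
        # One round trip to the pool buys block_size IDs.
        start = self._next
        self._next += self.block_size
        return range(start, start + self.block_size)


zone1 = list(islice(skipping_ids(1, 2), 3))  # [1, 3, 5]
zone2 = list(islice(skipping_ids(2, 2), 3))  # [2, 4, 6]
```

Skipping is the same interleaving that MySQL's auto_increment_increment and
auto_increment_offset settings provide, which is why it needs the zone count
fixed up front.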

However, I agree with Monsyne: numeric IDs have got to go.  Suppose I'm a
customer of Rackspace CloudServers once it is running on OpenStack, and I
also have a private cloud that the new Rackspace Cloud Business unit has
built for me.  I like both, and then I want to do cloud bursting in between
them, by putting an aggregating zone in front of them.  I think at that
stage, we're screwed unless we figure this out now.  And this scenario only
has one provider (Rackspace) involved!

We can square the circle however - if we want numbers, let's use UUIDs -
they're 128 bit numbers, and won't in practice collide.  I'd still prefer
strings though...
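
For what it's worth, a UUID really is just a 128-bit number, as this small
sketch with Python's standard uuid module shows. The helper name is made up;
the point is only that such a value cannot fit the existing 32-bit integer
column.

```python
import uuid

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit ID can hold


def new_instance_uuid():
    """Return a random (version 4) UUID and its integer form."""
    value = uuid.uuid4()
    return value, value.int


instance_uuid, as_number = new_instance_uuid()
print(instance_uuid, as_number.bit_length())

# The integer form round-trips back to the same UUID.
round_tripped = uuid.UUID(int=as_number)
```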

Justin



On Tue, Mar 22, 2011 at 9:40 AM, Ed Leafe  wrote:

>I want to get some input from all of you on what you think is the
> best way to approach this problem: the RS API requires that every instance
> have a unique ID, and we are currently creating these IDs by use of an
> auto-increment field in the instances table. The introduction of zones
> complicates this, as each zone has its own database.
>
>The two obvious solutions are a) a single, shared database and b)
> using a UUID instead of an integer for the ID. Both of these approaches have
> been discussed and rejected, so let's not bring them back up now.
>
>Given integer IDs and separate databases, the only obvious choice is
> partitioning the numeric space so that each zone starts its
> auto-incrementing at a different point, with enough room between starting
> ranges to ensure that they would never overlap. This would require some
> assumptions be made about the maximum number of instances that would ever be
> created in a single zone in order to determine how much numeric space that
> zone would need. I'm looking to get some feedback on what would seem to be
> reasonable guesses to these partition sizes.
>
>The other concern is more aesthetic than technical: we can make the
> numeric spaces big enough to avoid overlap, but then we'll have very large
> ID values; e.g., 10 or more digits for an instance. Computers won't care,
> but people might, so I thought I'd at least bring up this potential
> objection.
>
>
>
> -- Ed Leafe
>
>
>
>


Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Jon Slenk
On Tue, Mar 22, 2011 at 10:41 AM, Ed Leafe  wrote:
>        Well, since they are defined as: `id` int(11) NOT NULL AUTO_INCREMENT,
> I would say the chance of a stringish thing slipping in is pretty small. :)

if the schema cannot be changed (which might be worth reconsidering
since it seems to be a bit of a root cause of trouble) then maybe you
have to reserve the last 4 or 5 digits of the id to be the zone id,
and then autoincrement on top of that? on the assumption that there
would be a limit of  or 9 zones ever.

but really i'd hazard to suggest that it should somehow be 2 parts,
neither of which are super constrained: a zone part and an in-zone-id
part.

it could even be that the id is left as-is and is semantically
required to be joined with the zone name as a prefix before it is a
valid interzone id.
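
A rough sketch of the "reserve the last digits for the zone" idea, assuming
4 reserved digits and a signed 32-bit ID column. The constants and function
names are hypothetical; with 5 reserved digits the per-zone counter shrinks
by another factor of ten.

```python
INT32_MAX = 2**31 - 1
ZONE_DIGITS = 4                 # assumed number of reserved digits
ZONE_SPAN = 10 ** ZONE_DIGITS   # zone ids 0..9999


def encode_id(counter, zone_id):
    """Pack a per-zone counter and a zone id into one integer."""
    if not 0 <= zone_id < ZONE_SPAN:
        raise ValueError("zone_id does not fit in the reserved digits")
    if counter < 0:
        raise ValueError("counter must be non-negative")
    value = counter * ZONE_SPAN + zone_id
    if value > INT32_MAX:
        raise OverflowError("id no longer fits in a signed 32-bit column")
    return value


def decode_id(value):
    """Split a packed id back into (counter, zone_id)."""
    return divmod(value, ZONE_SPAN)


packed = encode_id(12, 7)       # 120007
unpacked = decode_id(packed)    # (12, 7)
```

Under these assumptions each zone gets only about 214,000 IDs before the
32-bit limit is hit, which is the cost of squeezing both parts into one
integer.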

sincerely.



Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Ed Leafe
On Mar 22, 2011, at 1:11 PM, Jon Slenk wrote:

> the IDs must be strictly numericalish numbers, with nothing smelling
> of something like a string in there, i take it?


Well, since they are defined as: `id` int(11) NOT NULL AUTO_INCREMENT,
I would say the chance of a stringish thing slipping in is pretty small. :)
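
As a back-of-the-envelope check on what that column allows: a MySQL INT is a
signed 32-bit value (the 11 is only a display width), so the usable positive
range tops out at 2147483647. The sketch below, with made-up helper names,
counts how many fixed-size ranges fit, using the range sizes mentioned
elsewhere in this thread.

```python
INT32_MAX = 2**31 - 1  # 2147483647, the largest positive INT value


def zones_available(ids_per_zone):
    """Number of full ranges of ids_per_zone that fit in 1..INT32_MAX."""
    return INT32_MAX // ids_per_zone


def zone_start(zone_index, ids_per_zone):
    """First auto-increment value for a zone (zone_index is 0-based)."""
    return zone_index * ids_per_zone + 1


for size in (10**9, 10**7):
    print(size, zones_available(size))
```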



-- Ed Leafe






Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Monsyne Dragon

On 3/22/11 11:40 AM, Ed Leafe wrote:

> I want to get some input from all of you on what you think is the best
> way to approach this problem: the RS API requires that every instance have a
> unique ID, and we are currently creating these IDs by use of an auto-increment
> field in the instances table. The introduction of zones complicates this, as
> each zone has its own database.
>
> The two obvious solutions are a) a single, shared database and b) using
> a UUID instead of an integer for the ID. Both of these approaches have been
> discussed and rejected, so let's not bring them back up now.
>
> Given integer IDs and separate databases, the only obvious choice is
> partitioning the numeric space so that each zone starts its auto-incrementing
> at a different point, with enough room between starting ranges to ensure that
> they would never overlap. This would require some assumptions be made about the
> maximum number of instances that would ever be created in a single zone in
> order to determine how much numeric space that zone would need. I'm looking to
> get some feedback on what would seem to be reasonable guesses to these
> partition sizes.
>
> The other concern is more aesthetic than technical: we can make the
> numeric spaces big enough to avoid overlap, but then we'll have very large ID
> values; e.g., 10 or more digits for an instance. Computers won't care, but
> people might, so I thought I'd at least bring up this potential objection.

Hmm. If you make your ID large enough, you are basically recreating 
option b) (UUIDs).  As for partitioning a numeric space, that does 
kind of throw a wrench in the idea of attaching a child zone that knows 
nothing of the parent (e.g., for bursting to a public cloud from a 
private one), unless the number is globally unique (see option b again).


IMHO, the instance id probably should be prefixed with some zone 
identifier.   Thus zone1:001234 does not conflict with zone2:001234.
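
A small sketch of what such a prefixed ID could look like in code, assuming
the "zone:number" shape of the example above with a zero-padded six-digit
local ID. The function names and the padding width are assumptions, not an
agreed format.

```python
def format_instance_id(zone, local_id, width=6):
    """Build a zone-prefixed ID such as 'zone1:001234'."""
    if ":" in zone or not zone:
        raise ValueError("zone name must be non-empty and contain no ':'")
    if local_id < 0:
        raise ValueError("local_id must be non-negative")
    return f"{zone}:{local_id:0{width}d}"


def parse_instance_id(text):
    """Split 'zone1:001234' into ('zone1', 1234)."""
    zone, sep, local = text.partition(":")
    if not sep or not zone or not local.isdigit():
        raise ValueError(f"malformed instance id: {text!r}")
    return zone, int(local)


first = format_instance_id("zone1", 1234)   # 'zone1:001234'
second = format_instance_id("zone2", 1234)  # 'zone2:001234'
```

The local part stays a plain auto-increment integer inside each zone; only
the cross-zone form becomes a string.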




> -- Ed Leafe







--

--
-Monsyne Dragon
work: 210-312-4190
mobile210-441-0965
google voice: 210-338-0336





Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Jon Slenk
the IDs must be strictly numericalish numbers, with nothing smelling
of something like a string in there, i take it?
