Re: [Openstack] [Nova] How common is user_data for instances?

2012-08-14 Thread Jay Pipes
On 08/13/2012 07:38 PM, Michael Still wrote:
 On 14/08/12 08:54, Jay Pipes wrote:
 
 I was *going* to create a random-data table with the same average row
 size as the instances table in Nova to see how long the migration would
 take, and then I realized something... The user_data column is already
 of column type MEDIUMTEXT, not TEXT:

 jpipes@uberbox:~$ mysql -uroot nova -e "DESC instances" | grep user_data
 user_data   mediumtext  YES NULL

 So the column can already store data up to 2^24 bytes long, or 16MB of
 data. So this might be a moot issue already? Do we expect user data to
 be more than 16MB?
 
 The bug reports truncation at 64kb. The last schema change I can see for
 that column is Essex version 82, which has:
 
 $ grep user_data *.py
 082_essex.py:Column('user_data', Text),
 
 http://docs.sqlalchemy.org/en/latest/dialects/mysql.html says that Text
 is MySQL TEXT type, for text up to 2^16 characters.
 
 Am I misunderstanding something here?

No, I read the exact same thing in the SQLAlchemy docs and was surprised
to see the column type was MEDIUMTEXT. But I assure you it is :) Just
run devstack and verify!

-jay

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] [Nova] How common is user_data for instances?

2012-08-13 Thread Dan Prince


- Original Message -
 From: Michael Still michael.st...@canonical.com
 To: openstack@lists.launchpad.net, openstack-operat...@lists.openstack.org
 Sent: Saturday, August 11, 2012 5:12:22 AM
 Subject: [Openstack] [Nova] How common is user_data for instances?
 
 Greetings.
 
 I'm seeking information about how common user_data is for instances
 in
 nova. Specifically for large deployments (rackspace and HP, here's
 looking at you). What sort of costs would be associated with changing
 the data type of the user_data column in the nova database?
 
 Bug 1035055 [1] requests that we allow user_data of more than 65,535
 bytes per instance. Note that this size is a base64 encoded version
 of
 the data, so that's only a bit under 50k of data. This is because the
 data is a sqlalchemy Text column.
 
 We could convert to a LongText column, which allows 2^32 worth of
 data,
 but I want to understand the cost to operators of that change some
 more.
 Is user_data really common? Do you think people would start uploading
 much bigger user_data? Do you care?

Nova has configurable quotas on most things so if we do increase the size of 
the DB column we should probably guard it in a configurable manner with quotas 
as well.

My preference would actually be that we go the other way though and not have to 
store user_data in the database at all. That unfortunately may not be possible 
since some images obtain user_data via the metadata service which needs a way 
to look it up. Other methods of injecting metadata via disk injection, agents 
and/or config drive however might not need it to be stored in the database, right?

As a simpler solution:

Would setting a reasonable limit (hopefully smaller) and returning a HTTP 400 
bad request if incoming requests exceed that limit be good enough to resolve 
this ticket? That way we don't have to increase the DB column at all and end 
users would be notified up front that user_data is too large (not silently 
truncated). The way I see it user_data is really for bootstrapping 
instances... we probably don't need it to be large enough to write an entire 
application, etc.
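
As a rough illustration of the up-front check described above, here is a minimal Python sketch. The function name, the 65,535-byte default, and the 202 success status are assumptions made for illustration only; this is not Nova's actual API code.

```python
import base64
from typing import Tuple

# Hypothetical default: the capacity of a MySQL TEXT column, in bytes.
MAX_USER_DATA_BYTES = 65535


def validate_user_data(encoded: str,
                       limit: int = MAX_USER_DATA_BYTES) -> Tuple[int, str]:
    """Return an (http_status, message) pair for a base64 user_data string."""
    # Reject oversized payloads before they reach the database, so nothing
    # is silently truncated.
    if len(encoded) > limit:
        return 400, "user_data is %d bytes; the limit is %d" % (len(encoded), limit)
    try:
        # validate=True rejects characters outside the base64 alphabet.
        base64.b64decode(encoded, validate=True)
    except ValueError:
        return 400, "user_data is not valid base64"
    # 202 is an illustrative success status, not a statement about Nova.
    return 202, "accepted"


script = base64.b64encode(b"#!/bin/sh\necho hello\n").decode("ascii")
print(validate_user_data(script))
print(validate_user_data("A" * 70000))
```

The limit is a parameter so that an operator could tune it the same way as other quotas.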


 
 Mikal
 
 1: https://bugs.launchpad.net/nova/+bug/1035055
 


Re: [Openstack] [Nova] How common is user_data for instances?

2012-08-13 Thread Stephen Gran
Hi,

I think user_data is probably reasonably common - most people who use,
e.g., cloud-init will use it (we do).

As the 64k limit is a MySQL limitation, and not a nova limitation, why
not just say, if you want more storage, use postgres (or similar)?  I
have no issue with making the size guarded in the application, with a
configurable limit, but the particular problem that started this off is
an implementation issue rather than a code issue.

Storing the user_data in some place like the database is fairly
important for making things like launch configs for autoscale groups
work.  I'd like to not make that harder to implement.

Cheers,

On Mon, 2012-08-13 at 09:12 -0400, Dan Prince wrote:
 
 - Original Message -
  From: Michael Still michael.st...@canonical.com
  To: openstack@lists.launchpad.net, openstack-operat...@lists.openstack.org
  Sent: Saturday, August 11, 2012 5:12:22 AM
  Subject: [Openstack] [Nova] How common is user_data for instances?
  
  Greetings.
  
  I'm seeking information about how common user_data is for instances
  in
  nova. Specifically for large deployments (rackspace and HP, here's
  looking at you). What sort of costs would be associated with changing
  the data type of the user_data column in the nova database?
  
  Bug 1035055 [1] requests that we allow user_data of more than 65,535
  bytes per instance. Note that this size is a base64 encoded version
  of
  the data, so that's only a bit under 50k of data. This is because the
  data is a sqlalchemy Text column.
  
  We could convert to a LongText column, which allows 2^32 worth of
  data,
  but I want to understand the cost to operators of that change some
  more.
  Is user_data really common? Do you think people would start uploading
  much bigger user_data? Do you care?
 
 Nova has configurable quotas on most things so if we do increase the size of 
 the DB column we should probably guard it in a configurable manner with 
 quotas as well.
 
 My preference would actually be that we go the other way though and not have 
 to store user_data in the database at all. That unfortunately may not be 
 possible since some images obtain user_data via the metadata service which 
 needs a way to look it up. Other methods of injecting metadata via disk 
 injection, agents and/or config drive however might not need it to be stored 
 in the database right?
 
 As a simpler solution:
 
 Would setting a reasonable limit (hopefully smaller) and returning a HTTP 400 
 bad request if incoming requests exceed that limit be good enough to resolve 
 this ticket? That way we don't have to increase the DB column at all and end 
 users would be notified up front that user_data is too large (not silently 
 truncated). The way I see it user_data is really for bootstrapping 
 instances... we probably don't need it to be large enough to write an entire 
 application, etc.
 
 
  
  Mikal
  
  1: https://bugs.launchpad.net/nova/+bug/1035055
  

-- 
Stephen Gran
Senior Systems Integrator - guardian.co.uk


Re: [Openstack] [Nova] How common is user_data for instances?

2012-08-13 Thread Jay Pipes
On 08/13/2012 09:12 AM, Dan Prince wrote:
 - Original Message -
 From: Michael Still michael.st...@canonical.com
 To: openstack@lists.launchpad.net, openstack-operat...@lists.openstack.org
 Sent: Saturday, August 11, 2012 5:12:22 AM
 Subject: [Openstack] [Nova] How common is user_data for instances?

 Greetings.

 I'm seeking information about how common user_data is for instances
 in
 nova. Specifically for large deployments (rackspace and HP, here's
 looking at you). What sort of costs would be associated with changing
 the data type of the user_data column in the nova database?

 Bug 1035055 [1] requests that we allow user_data of more than 65,535
 bytes per instance. Note that this size is a base64 encoded version
 of
 the data, so that's only a bit under 50k of data. This is because the
 data is a sqlalchemy Text column.

 We could convert to a LongText column, which allows 2^32 worth of
 data,
 but I want to understand the cost to operators of that change some
 more.
 Is user_data really common? Do you think people would start uploading
 much bigger user_data? Do you care?
 
 Nova has configurable quotas on most things so if we do increase the size of 
 the DB column we should probably guard it in a configurable manner with 
 quotas as well.
 
 My preference would actually be that we go the other way though and not have 
 to store user_data in the database at all. That unfortunately may not be 
 possible since some images obtain user_data via the metadata service which 
 needs a way to look it up. Other methods of injecting metadata via disk 
 injection, agents and/or config drive however might not need it to be stored 
 in the database right?

+1 When we can, let's not hobble ourselves to the EC2 API way of doing
things when we can have a more efficient and innovative solution.

 As a simpler solution:
 
 Would setting a reasonable limit (hopefully smaller) and returning a HTTP 400 
 bad request if incoming requests exceed that limit be good enough to resolve 
 this ticket? That way we don't have to increase the DB column at all and end 
 users would be notified up front that user_data is too large (not silently 
 truncated). The way I see it user_data is really for bootstrapping 
 instances... we probably don't need it to be large enough to write an entire 
 application, etc.

Seems reasonable to me.

-jay


 Mikal

 1: https://bugs.launchpad.net/nova/+bug/1035055



Re: [Openstack] [Nova] How common is user_data for instances?

2012-08-13 Thread Jay Pipes
On 08/13/2012 09:53 AM, Stephen Gran wrote:
 Hi,
 
 I think user_data is probably reasonably common - most people who use,
 e.g., cloud-init will use it (we do).
 
 As the 64k limit is a MySQL limitation, and not a nova limitation, why
 not just say, if you want more storage, use postgres (or similar)?  I
 have no issue with making the size guarded in the application, with a
 configurable limit, but the particular problem that started this off is
 an implementation issue rather than a code issue.

Or just set the column to the LONGTEXT type and both MySQL and
PostgreSQL will be just as happy.

 Storing the user_data in some place like the database is fairly
 important for making things like launch configs for autoscale groups
 work.  I'd like to not make that harder to implement.

Why is storing user_data in the database fairly important? You say above
you don't want an implementation issue to be misconceived as a code
issue -- and then go on to say that an implementation issue (storing
user_data in a database) isn't a code issue. I don't think you can have
it both ways. :)

Now, I totally buy the argument that there is a large existing
cloud-init userbase out there that relies on the EC2 Metadata API
service living on the hard-coded 169.254.169.254 address, and we
shouldn't do anything to mess up that experience. But I totally think
that config-drive or disk-injection is a better way to handle this stuff
-- and certainly doesn't force an implementation that has proven to be a
major performance and scaling bottleneck (the EC2 Metadata service).
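
For readers unfamiliar with the metadata path mentioned above, this is a minimal sketch of how a guest might fetch its user data from the EC2-style metadata service. The `/latest/user-data` path follows the EC2 convention; the helper names and the timeout are hypothetical, and the fetch only works from inside a running instance.

```python
import urllib.request

METADATA_HOST = "169.254.169.254"  # link-local address named in the thread


def user_data_url(version="latest"):
    """Build the EC2-style URL that serves an instance's user data."""
    return "http://%s/%s/user-data" % (METADATA_HOST, version)


def fetch_user_data(timeout=2.0):
    """Return the raw user data bytes; only reachable from inside a guest."""
    with urllib.request.urlopen(user_data_url(), timeout=timeout) as resp:
        return resp.read()


print(user_data_url())
```

Every such request is served by the metadata service, which has to look the data up somewhere. That is why the storage question and the scaling concern are linked.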

Best,
-jay

 Cheers,
 
 On Mon, 2012-08-13 at 09:12 -0400, Dan Prince wrote:

 - Original Message -
 From: Michael Still michael.st...@canonical.com
 To: openstack@lists.launchpad.net, openstack-operat...@lists.openstack.org
 Sent: Saturday, August 11, 2012 5:12:22 AM
 Subject: [Openstack] [Nova] How common is user_data for instances?

 Greetings.

 I'm seeking information about how common user_data is for instances
 in
 nova. Specifically for large deployments (rackspace and HP, here's
 looking at you). What sort of costs would be associated with changing
 the data type of the user_data column in the nova database?

 Bug 1035055 [1] requests that we allow user_data of more than 65,535
 bytes per instance. Note that this size is a base64 encoded version
 of
 the data, so that's only a bit under 50k of data. This is because the
 data is a sqlalchemy Text column.

 We could convert to a LongText column, which allows 2^32 worth of
 data,
 but I want to understand the cost to operators of that change some
 more.
 Is user_data really common? Do you think people would start uploading
 much bigger user_data? Do you care?

 Nova has configurable quotas on most things so if we do increase the size of 
 the DB column we should probably guard it in a configurable manner with 
 quotas as well.

 My preference would actually be that we go the other way though and not have 
 to store user_data in the database at all. That unfortunately may not be 
 possible since some images obtain user_data via the metadata service which 
 needs a way to look it up. Other methods of injecting metadata via disk 
 injection, agents and/or config drive however might not need it to be stored 
 in the database right?

 As a simpler solution:

 Would setting a reasonable limit (hopefully smaller) and returning a HTTP 
 400 bad request if incoming requests exceed that limit be good enough to 
 resolve this ticket? That way we don't have to increase the DB column at all 
 and end users would be notified up front that user_data is too large (not 
 silently truncated). The way I see it user_data is really for bootstrapping 
 instances... we probably don't need it to be large enough to write an entire 
 application, etc.



 Mikal

 1: https://bugs.launchpad.net/nova/+bug/1035055



Re: [Openstack] [Nova] How common is user_data for instances?

2012-08-13 Thread Michael Still
On 14/08/12 01:24, Jay Pipes wrote:

 Or just set the column to the LONGTEXT type and both MySQL and
 PostgreSQL will be just as happy.

This is what I was originally aiming at -- will large deployers be angry
if I change this column to longtext? Will the migration be a significant
problem for them?

Mikal



Re: [Openstack] [Nova] How common is user_data for instances?

2012-08-13 Thread Joshua Harlow
I'm pretty sure it's common since it's the main way to get data into
cloud-init.

-Josh

On 8/13/12 3:02 PM, Michael Still michael.st...@canonical.com wrote:

On 14/08/12 01:24, Jay Pipes wrote:

 Or just set the column to the LONGTEXT type and both MySQL and
 PostgreSQL will be just as happy.

This is what I was originally aiming at -- will large deployers be angry
if I change this column to longtext? Will the migration be a significant
problem for them?

Mikal



Re: [Openstack] [Nova] How common is user_data for instances?

2012-08-13 Thread Jay Pipes
On 08/13/2012 06:02 PM, Michael Still wrote:
 On 14/08/12 01:24, Jay Pipes wrote:
 
 Or just set the column to the LONGTEXT type and both MySQL and
 PostgreSQL will be just as happy.
 
 This is what I was originally aiming at -- will large deployers be angry
 if I change this column to longtext? Will the migration be a significant
 problem for them?

From the MySQL standpoint, the migration impact is negligible. It's
essentially changing the row pointer size from 2 bytes to 4 bytes and
rewriting data pages. For InnoDB tables, it's unlikely many rows would
even be moved, as InnoDB stores a good chunk of these types of rows in
its main data pages -- I think up to 4KB if I remember correctly -- so
unless the user data exceeded that size, I don't think the rows would
even need to move data pages...

I would guess that an ALTER TABLE that changes the column from a TEXT to
a LONGTEXT would likely take less than a minute for even a pretty big
(millions of rows in the instances table) database.
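
The schema change discussed here boils down to a single DDL statement. The sketch below only builds the MySQL statement as a string. The helper name is hypothetical, and a real Nova change would go through a numbered migration script rather than hand-written SQL.

```python
import re

# Conservative identifier check so the helper cannot emit injected SQL.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_TEXT_TYPES = ("TINYTEXT", "TEXT", "MEDIUMTEXT", "LONGTEXT")


def alter_text_column(table, column, new_type, nullable=True):
    """Build a MySQL ALTER TABLE ... MODIFY statement for a text column."""
    for ident in (table, column):
        if not _IDENT.match(ident):
            raise ValueError("unsafe identifier: %r" % ident)
    if new_type not in _TEXT_TYPES:
        raise ValueError("unsupported text type: %r" % new_type)
    null_sql = "NULL" if nullable else "NOT NULL"
    return "ALTER TABLE `%s` MODIFY `%s` %s %s" % (
        table, column, new_type, null_sql)


print(alter_text_column("instances", "user_data", "LONGTEXT"))
```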

I was *going* to create a random-data table with the same average row
size as the instances table in Nova to see how long the migration would
take, and then I realized something... The user_data column is already
of column type MEDIUMTEXT, not TEXT:

jpipes@uberbox:~$ mysql -uroot nova -e "DESC instances" | grep user_data
user_data   mediumtext  YES NULL

So the column can already store data up to 2^24 bytes long, or 16MB of
data. So this might be a moot issue already? Do we expect user data to
be more than 16MB?
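
To put the sizes in this thread side by side, here is a small Python sketch of the MySQL text types with their maximum lengths and length-prefix sizes, as given in MySQL's storage requirements documentation. The helper name is hypothetical.

```python
# (type name, maximum length in bytes, bytes used for the length prefix)
MYSQL_TEXT_TYPES = [
    ("TINYTEXT", 2**8 - 1, 1),
    ("TEXT", 2**16 - 1, 2),
    ("MEDIUMTEXT", 2**24 - 1, 3),
    ("LONGTEXT", 2**32 - 1, 4),
]


def smallest_text_type(n_bytes):
    """Return the name of the smallest MySQL text type that holds n_bytes."""
    for name, capacity, _prefix in MYSQL_TEXT_TYPES:
        if n_bytes <= capacity:
            return name
    raise ValueError("%d bytes exceeds LONGTEXT capacity" % n_bytes)


for name, capacity, prefix in MYSQL_TEXT_TYPES:
    print("%-10s max %10d bytes, %d-byte length prefix" % (name, capacity, prefix))
```

A payload of 65,536 bytes is the first size that no longer fits in TEXT and needs MEDIUMTEXT.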

-jay



Re: [Openstack] [Nova] How common is user_data for instances?

2012-08-13 Thread Michael Still
On 14/08/12 08:54, Jay Pipes wrote:

 I was *going* to create a random-data table with the same average row
 size as the instances table in Nova to see how long the migration would
 take, and then I realized something... The user_data column is already
 of column type MEDIUMTEXT, not TEXT:
 
 jpipes@uberbox:~$ mysql -uroot nova -e "DESC instances" | grep user_data
 user_data mediumtext  YES NULL
 
 So the column can already store data up to 2^24 bytes long, or 16MB of
 data. So this might be a moot issue already? Do we expect user data to
 be more than 16MB?

The bug reports truncation at 64kb. The last schema change I can see for
that column is Essex version 82, which has:

$ grep user_data *.py
082_essex.py:Column('user_data', Text),

http://docs.sqlalchemy.org/en/latest/dialects/mysql.html says that Text
is MySQL TEXT type, for text up to 2^16 characters.

Am I misunderstanding something here?

Mikal



[Openstack] [Nova] How common is user_data for instances?

2012-08-11 Thread Michael Still
Greetings.

I'm seeking information about how common user_data is for instances in
nova. Specifically for large deployments (rackspace and HP, here's
looking at you). What sort of costs would be associated with changing
the data type of the user_data column in the nova database?

Bug 1035055 [1] requests that we allow user_data of more than 65,535
bytes per instance. Note that this size is a base64 encoded version of
the data, so that's only a bit under 50k of data. This is because the
data is a sqlalchemy Text column.
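
The "bit under 50k" figure follows from base64 turning every 3 raw bytes into 4 encoded characters. A quick Python check, assuming standard padded base64 with no line breaks (the helper name is made up for this example):

```python
import base64

TEXT_MAX_BYTES = 2**16 - 1  # 65,535: capacity of a MySQL TEXT column


def max_raw_bytes(encoded_limit):
    """Largest raw payload whose padded base64 form fits in encoded_limit."""
    # Each complete group of 4 encoded characters carries 3 raw bytes.
    return (encoded_limit // 4) * 3


limit = max_raw_bytes(TEXT_MAX_BYTES)
print(limit)
print(len(base64.b64encode(b"x" * limit)))        # still fits in TEXT
print(len(base64.b64encode(b"x" * (limit + 1))))  # one byte more does not
```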

We could convert to a LongText column, which allows 2^32 bytes of data,
but I want to understand the cost to operators of that change some more.
Is user_data really common? Do you think people would start uploading
much bigger user_data? Do you care?

Mikal

1: https://bugs.launchpad.net/nova/+bug/1035055
