Re: [Openstack] [Nova] How common is user_data for instances?
On 08/13/2012 07:38 PM, Michael Still wrote:
> On 14/08/12 08:54, Jay Pipes wrote:
>> I was *going* to create a random-data table with the same average row size as the instances table in Nova to see how long the migration would take, and then I realized something...
>>
>> The user_data column is already of column type MEDIUMTEXT, not TEXT:
>>
>> jpipes@uberbox:~$ mysql -uroot nova -e "DESC instances" | grep user_data
>> user_data mediumtext YES NULL
>>
>> So the column can already store data up to 2^24 bytes long, or 16MB of data. So this might be a moot issue already? Do we expect user data to be more than 16MB?
>
> The bug reports truncation at 64kb. The last schema change I can see for that column is Essex version 82, which has:
>
> $ grep user_data *.py
> 082_essex.py:Column('user_data', Text),
>
> http://docs.sqlalchemy.org/en/latest/dialects/mysql.html says that Text is MySQL TEXT type, for text up to 2^16 characters.
>
> Am I misunderstanding something here?

No, I read the exact same thing in the SQLAlchemy docs and was surprised to see the column type was MEDIUMTEXT. But I assure you it is :) Just run devstack and verify!

-jay

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help : https://help.launchpad.net/ListHelp
Re: [Openstack] [Nova] How common is user_data for instances?
- Original Message -
> From: Michael Still michael.st...@canonical.com
> To: openstack@lists.launchpad.net, openstack-operat...@lists.openstack.org
> Sent: Saturday, August 11, 2012 5:12:22 AM
> Subject: [Openstack] [Nova] How common is user_data for instances?
>
> Greetings.
>
> I'm seeking information about how common user_data is for instances in nova. Specifically for large deployments (rackspace and HP, here's looking at you). What sort of costs would be associated with changing the data type of the user_data column in the nova database?
>
> Bug 1035055 [1] requests that we allow user_data of more than 65,535 bytes per instance. Note that this size is a base64 encoded version of the data, so that's only a bit under 50k of data. This is because the data is a sqlalchemy Text column.
>
> We could convert to a LongText column, which allows 2^32 worth of data, but I want to understand the cost to operators of that change some more. Is user_data really common? Do you think people would start uploading much bigger user_data? Do you care?

Nova has configurable quotas on most things, so if we do increase the size of the DB column we should probably guard it in a configurable manner with quotas as well.

My preference would actually be that we go the other way, though, and not have to store user_data in the database at all. That unfortunately may not be possible, since some images obtain user_data via the metadata service, which needs a way to look it up. Other methods of injecting metadata (disk injection, agents and/or config drive), however, might not need it to be stored in the database, right?

As a simpler solution: would setting a reasonable limit (hopefully smaller) and returning an HTTP 400 Bad Request if incoming requests exceed that limit be good enough to resolve this ticket? That way we don't have to increase the DB column at all, and end users would be notified up front that user_data is too large (not silently truncated). The way I see it, user_data is really for bootstrapping instances... we probably don't need it to be large enough to write an entire application, etc.

> Mikal
>
> 1: https://bugs.launchpad.net/nova/+bug/1035055
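The "reject up front instead of truncating silently" idea above can be sketched roughly as follows. This is only an illustration, not Nova code: the names `MAX_USER_DATA_BYTES`, `UserDataTooLarge` and `validate_user_data` are hypothetical, and the default limit simply reuses the 65,535-byte figure from the bug report.

```python
import base64

# Hypothetical default: the 65,535-byte figure quoted from the bug report.
MAX_USER_DATA_BYTES = 65535


class UserDataTooLarge(ValueError):
    """Hypothetical error that an API layer could translate into HTTP 400."""

    status_code = 400


def validate_user_data(encoded: str, max_bytes: int = MAX_USER_DATA_BYTES) -> str:
    """Return the encoded user_data unchanged, or raise if it is over the limit."""
    size = len(encoded.encode("utf-8"))
    if size > max_bytes:
        raise UserDataTooLarge(
            f"user_data is {size} bytes; the limit is {max_bytes} bytes"
        )
    return encoded


if __name__ == "__main__":
    small = base64.b64encode(b"#!/bin/sh\necho hello\n").decode("ascii")
    validate_user_data(small)  # accepted as-is
    try:
        validate_user_data("A" * 70000)
    except UserDataTooLarge as exc:
        print(exc.status_code, exc)
```

In a deployment the limit would presumably come from configuration or a quota rather than a module constant, as suggested in the message above.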
Re: [Openstack] [Nova] How common is user_data for instances?
Hi,

I think user_data is probably reasonably common - most people who use, e.g., cloud-init will use it (we do).

As the 64k limit is a MySQL limitation, and not a nova limitation, why not just say: if you want more storage, use postgres (or similar)? I have no issue with making the size guarded in the application, with a configurable limit, but the particular problem that started this off is an implementation issue rather than a code issue.

Storing the user_data in some place like the database is fairly important for making things like launch configs for autoscale groups work. I'd like to not make that harder to implement.

Cheers,
--
Stephen Gran
Senior Systems Integrator - guardian.co.uk
Re: [Openstack] [Nova] How common is user_data for instances?
On 08/13/2012 09:12 AM, Dan Prince wrote:
> My preference would actually be that we go the other way though and not have to store user_data in the database at all. That unfortunately may not be possible since some images obtain user_data via the metadata service which needs a way to look it up. Other methods of injecting metadata via disk injection, agents and/or config drive however might not need it to be stored in the database, right?

+1

Let's not hobble ourselves to the EC2 API way of doing things when we can have a more efficient and innovative solution.

> As a simpler solution: Would setting a reasonable limit (hopefully smaller) and returning an HTTP 400 bad request if incoming requests exceed that limit be good enough to resolve this ticket? That way we don't have to increase the DB column at all and end users would be notified up front that user_data is too large (not silently truncated). The way I see it user_data is really for bootstrapping instances... we probably don't need it to be large enough to write an entire application, etc.

Seems reasonable to me.

-jay
Re: [Openstack] [Nova] How common is user_data for instances?
On 08/13/2012 09:53 AM, Stephen Gran wrote:
> Hi,
>
> I think user_data is probably reasonably common - most people who use, e.g., cloud-init will use it (we do).
>
> As the 64k limit is a MySQL limitation, and not a nova limitation, why not just say, if you want more storage, use postgres (or similar)? I have no issue with making the size guarded in the application, with a configurable limit, but the particular problem that started this off is an implementation issue rather than a code issue.

Or just set the column to the LONGTEXT type and both MySQL and PostgreSQL will be just as happy.

> Storing the user_data in some place like the database is fairly important for making things like launch configs for autoscale groups work. I'd like to not make that harder to implement.

Why is storing user_data in the database fairly important? You say above you don't want an implementation issue to be misconceived as a code issue -- and then go on to say that an implementation issue (storing user_data in a database) isn't a code issue. I don't think you can have it both ways. :)

Now, I totally buy the argument that there is a large existing cloud-init userbase out there that relies on the EC2 Metadata API service living on the hard-coded 169.254.169.254 address, and we shouldn't do anything to mess up that experience. But I totally think that config-drive or disk-injection is a better way to handle this stuff -- and certainly doesn't force an implementation that has proven to be a major performance and scaling bottleneck (the EC2 Metadata service).

Best,
-jay
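For readers unfamiliar with the metadata path discussed above, here is a minimal sketch of how a guest might read its user data from the link-local address named in the thread. The `/latest/user-data` path follows the EC2 metadata convention; the function name and timeout are illustrative assumptions, and real clients such as cloud-init do considerably more (retries, multiple datasources).

```python
import urllib.request

# Link-local address named in the thread; the path follows the EC2
# metadata convention for retrieving user data.
METADATA_HOST = "169.254.169.254"
USER_DATA_URL = f"http://{METADATA_HOST}/latest/user-data"


def fetch_user_data(url: str = USER_DATA_URL, timeout: float = 2.0) -> bytes:
    """Fetch raw user data from the metadata service (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```

This only works from inside an instance that can reach the metadata service. Every such request has to be answered by looking up the instance's user_data, which is why the service needs the data stored somewhere it can query.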
Re: [Openstack] [Nova] How common is user_data for instances?
On 14/08/12 01:24, Jay Pipes wrote:
> Or just set the column to the LONGTEXT type and both MySQL and PostgreSQL will be just as happy.

This is what I was originally aiming at -- will large deployers be angry if I change this column to longtext? Will the migration be a significant problem for them?

Mikal
Re: [Openstack] [Nova] How common is user_data for instances?
I'm pretty sure it's common since it's the main way to get data into cloud-init.

-Josh

On 8/13/12 3:02 PM, Michael Still michael.st...@canonical.com wrote:
> On 14/08/12 01:24, Jay Pipes wrote:
>> Or just set the column to the LONGTEXT type and both MySQL and PostgreSQL will be just as happy.
>
> This is what I was originally aiming at -- will large deployers be angry if I change this column to longtext? Will the migration be a significant problem for them?
>
> Mikal
Re: [Openstack] [Nova] How common is user_data for instances?
On 08/13/2012 06:02 PM, Michael Still wrote:
> On 14/08/12 01:24, Jay Pipes wrote:
>> Or just set the column to the LONGTEXT type and both MySQL and PostgreSQL will be just as happy.
>
> This is what I was originally aiming at -- will large deployers be angry if I change this column to longtext? Will the migration be a significant problem for them?

From the MySQL standpoint, the migration impact is negligible. It's essentially changing the row pointer size from 2 bytes to 4 bytes and rewriting data pages. For InnoDB tables, it's unlikely many rows would even be moved, as InnoDB stores a good chunk of these types of rows in its main data pages -- I think up to 4KB if I remember correctly -- so unless the user data exceeded that size, I don't think the rows would even need to move data pages...

I would guess that an ALTER TABLE that changes the column from a TEXT to a LONGTEXT would likely take less than a minute for even a pretty big (millions of rows in the instances table) database.

I was *going* to create a random-data table with the same average row size as the instances table in Nova to see how long the migration would take, and then I realized something...

The user_data column is already of column type MEDIUMTEXT, not TEXT:

jpipes@uberbox:~$ mysql -uroot nova -e "DESC instances" | grep user_data
user_data mediumtext YES NULL

So the column can already store data up to 2^24 bytes long, or 16MB of data. So this might be a moot issue already? Do we expect user data to be more than 16MB?

-jay
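As a reference for the sizes compared in this thread, the sketch below lists the maximum byte lengths of MySQL's TEXT family and picks the smallest type that can hold a payload. The limits are MySQL's documented ones; the helper name `smallest_text_type` is made up for illustration.

```python
# Maximum byte lengths of MySQL's TEXT family. Each type stores its length
# in a 1-, 2-, 3- or 4-byte prefix, hence the 2^(8*n) - 1 limits.
MYSQL_TEXT_LIMITS = [
    ("TINYTEXT", 2**8 - 1),     # 255 bytes
    ("TEXT", 2**16 - 1),        # 65,535 bytes
    ("MEDIUMTEXT", 2**24 - 1),  # 16,777,215 bytes
    ("LONGTEXT", 2**32 - 1),    # 4,294,967,295 bytes
]


def smallest_text_type(payload_bytes: int) -> str:
    """Return the smallest MySQL TEXT type able to hold payload_bytes."""
    for name, limit in MYSQL_TEXT_LIMITS:
        if payload_bytes <= limit:
            return name
    raise ValueError(f"{payload_bytes} bytes exceeds LONGTEXT capacity")


if __name__ == "__main__":
    for size in (50_000, 65_536, 16 * 1024 * 1024):
        print(size, smallest_text_type(size))
```

A payload of exactly 65,535 bytes still fits in TEXT; one byte more needs MEDIUMTEXT, which is the type the DESC output above reports.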
Re: [Openstack] [Nova] How common is user_data for instances?
On 14/08/12 08:54, Jay Pipes wrote:
> I was *going* to create a random-data table with the same average row size as the instances table in Nova to see how long the migration would take, and then I realized something...
>
> The user_data column is already of column type MEDIUMTEXT, not TEXT:
>
> jpipes@uberbox:~$ mysql -uroot nova -e "DESC instances" | grep user_data
> user_data mediumtext YES NULL
>
> So the column can already store data up to 2^24 bytes long, or 16MB of data. So this might be a moot issue already? Do we expect user data to be more than 16MB?

The bug reports truncation at 64kb. The last schema change I can see for that column is Essex version 82, which has:

$ grep user_data *.py
082_essex.py:Column('user_data', Text),

http://docs.sqlalchemy.org/en/latest/dialects/mysql.html says that Text is MySQL TEXT type, for text up to 2^16 characters.

Am I misunderstanding something here?

Mikal
[Openstack] [Nova] How common is user_data for instances?
Greetings.

I'm seeking information about how common user_data is for instances in nova. Specifically for large deployments (Rackspace and HP, here's looking at you). What sort of costs would be associated with changing the data type of the user_data column in the nova database?

Bug 1035055 [1] requests that we allow user_data of more than 65,535 bytes per instance. Note that this size is a base64 encoded version of the data, so that's only a bit under 50k of data. This is because the data is a SQLAlchemy Text column.

We could convert to a LongText column, which allows up to 2^32 bytes of data, but I want to understand the cost to operators of that change some more. Is user_data really common? Do you think people would start uploading much bigger user_data? Do you care?

Mikal

1: https://bugs.launchpad.net/nova/+bug/1035055
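The "a bit under 50k" figure above follows from base64 arithmetic: every 3 raw bytes become 4 encoded characters. A small sketch of that calculation, assuming padded base64 without line breaks (the helper name `max_raw_bytes` is illustrative):

```python
import base64

TEXT_MAX_BYTES = 2**16 - 1  # 65,535: capacity of a MySQL TEXT column


def max_raw_bytes(encoded_limit: int) -> int:
    """Largest raw payload whose padded base64 form fits in encoded_limit."""
    # Base64 emits 4 output characters per 3 input bytes, padding the
    # final group, so only whole 4-character groups count.
    return (encoded_limit // 4) * 3


limit = max_raw_bytes(TEXT_MAX_BYTES)
print(limit)  # 49149

# Sanity check against the real encoder.
assert len(base64.b64encode(b"x" * limit)) <= TEXT_MAX_BYTES
assert len(base64.b64encode(b"x" * (limit + 1))) > TEXT_MAX_BYTES
```

So a 65,535-byte column holds at most 49,149 bytes of raw user data, roughly 48 KiB.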