Re: [Openstack] [NOVA] Snapshotting may require significant disk space (in /tmp). How to properly solve disk space issues?

2012-03-16 Thread Pádraig Brady
On 03/16/2012 04:11 PM, Jay Pipes wrote:
 Hi Stackers,
 
 So, in diagnosing a few things on TryStack yesterday, I ran into an 
 interesting problem with snapshotting that I'm hoping to get some advice on.
 
 == The Problem ==
 

 QEMU was unhelpfully returning a vague error message of error while writing.

That could be improved.
As an aside, since qemu-img mainly deals with large files,
it would be a prime candidate for calling fallocate(),
both to get a good layout for the files and to get immediate
feedback if there isn't enough space.
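
To illustrate the preallocation idea, here is a minimal Python sketch (not qemu-img code, which is C). It reserves the space up front so that a full disk is reported before any data is written. The helper name `preallocate` is made up for this example; `os.posix_fallocate` is the standard-library wrapper, available on Unix with Python 3.3 or later.

```python
import os


def preallocate(path, size):
    """Create `path` and reserve `size` bytes for it up front.

    Hypothetical helper for illustration; not part of qemu-img or Nova.
    """
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
    try:
        # Raises OSError (typically ENOSPC) right away if the space
        # cannot be reserved, instead of failing midway through a write.
        os.posix_fallocate(fd, 0, size)
    except OSError:
        # Do not leave a partially allocated file behind.
        os.unlink(path)
        raise
    finally:
        os.close(fd)
```

A caller would preallocate the expected output size and only then start writing, treating an OSError here as "not enough space".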

On a related note, I've a patch pending for after RC1
that should auto clean any of these partially written files:
https://review.openstack.org/#change,5442

 As it turns out, the base operating system we install on our compute nodes in 
 TryStack has a (very) small root partition

 == Possible Solutions ==
 
 So, there are a number of solutions that we can work on here, and I'm 
 wondering what the preference would be. Here are the solutions I have come up 
 with, along with a no-brainer improvement to Nova that would help in 
 diagnosing this problem:
 
 The no-brainer: Detect before attempting a snapshot that there is enough 
 space on a device to perform the operation, and if not, throw a useful error 
 message up the stack

The space can change while writing, so you could still hit the same error as above.
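
A minimal sketch of such an advisory pre-check, assuming Python 3 and a known size estimate. The names `InsufficientSpace` and `check_free_space` are hypothetical. Because free space can shrink after the check, the write itself still has to handle failure; the check only improves the error message in the common case.

```python
import shutil


class InsufficientSpace(Exception):
    """Hypothetical exception type used only in this sketch."""


def check_free_space(directory, required_bytes):
    """Raise InsufficientSpace if `directory` has fewer free bytes than needed.

    Advisory only: the result can be stale by the time the write starts.
    """
    free = shutil.disk_usage(directory).free
    if free < required_bytes:
        raise InsufficientSpace(
            "need %d bytes in %s, only %d free"
            % (required_bytes, directory, free))
    return free
```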

 
 Solutions to the disk space problem:
 
 (1) Silly Jay, change the damn size of the root partition in your PXE base OS 
 install!
 
 Now, I'm no expert in creating customized base disk images, but from looking 
 at the build_pxe_env.sh script in devstack [1], it seems pretty trivial to 
 change the ramdisk_size parameter in the startup options to something larger 
 than 2109600. We could do this and reimage the compute nodes one by one.
 
 (2) Make the location in which the snapshot is made configurable.
 
 Right now, as mentioned above, tempfile.mkdtemp() is used, which creates a 
 directory in the user's TMPDIR (typically /tmp, which is usually on the root 
 partition).
 
 We could add an option (--libvirt-snapshot-dir?) that would allow 
 nova-compute to override where that snapshot is built.
 
 (3) Change the user (running nova-compute) TMPDIR setting to something 
 different than /tmp on the root partition).

I'd lean towards (3).
That's something that depends on the environment (as you've nicely 
demonstrated),
and also for security reasons the admin should be able to set TMPDIR.
That's the standard way to do it, and it works already (hopefully).

cheers,
Pádraig.

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] [NOVA] Snapshotting may require significant disk space (in /tmp). How to properly solve disk space issues?

2012-03-16 Thread Justin Shepherd



On Mar 16, 2012, at 12:26, Pádraig Brady p...@draigbrady.com wrote:

 On 03/16/2012 04:11 PM, Jay Pipes wrote:
 Hi Stackers,
 
 So, in diagnosing a few things on TryStack yesterday, I ran into an 
 interesting problem with snapshotting that I'm hoping to get some advice on.
 
 == The Problem ==
 
 
 QEMU was unhelpfully returning a vague error message of error while 
 writing.
 
 That could be improved.
 As an aside, since qemu-img is mainly dealing with large files,
 it would be a prime candidate to call fallocate() from
 to get good layout for the files and immediate feedback
 if there isn't enough space.
 
 On a related note, I've a patch pending for after RC1
 that should auto clean any of these partially written files:
 https://review.openstack.org/#change,5442
 
 As it turns out, the base operating system we install on our compute nodes 
 in TryStack has a (very) small root partition
 
 == Possible Solutions ==
 
 So, there are a number of solutions that we can work on here, and I'm 
 wondering what the preference would be. Here are the solutions I have come 
 up with, along with a no-brainer improvement to Nova that would help in 
 diagnosing this problem:
 
 The no-brainer: Detect before attempting a snapshot that there is enough 
 space on a device to perform the operation, and if not, throw a useful error 
 message up the stack
 
 The space can change while writing, so you could still get the same error 
 above.
 
 
 Solutions to the disk space problem:
 
 (1) Silly Jay, change the damn size of the root partition in your PXE base 
 OS install!
 
 Now, I'm no expert in creating customized base disk images, but from looking 
 at the build_pxe_env.sh script in devstack [1], it seems pretty trivial to 
 change the ramdisk_size parameter in the startup options to something larger 
 than 2109600. We could do this and reimage the compute nodes one by one.
 
 (2) Make the location in which the snapshot is made configurable.
 
 Right now, as mentioned above, tempfile.mkdtemp() is used, which creates a 
 directory in the user's TMPDIR (typically /tmp, which is usually on the root 
 partition).
 
 We could add an option (--libvirt-snapshot-dir?) that would allow 
 nova-compute to override where that snapshot is built.
 
 (3) Change the user (running nova-compute) TMPDIR setting to something 
 different than /tmp on the root partition).
 
 I'd lean towards (3).
 That's something that depends on the environment (as you've nicely 
 demonstrated),
 and also for security reasons the admin should be able to set TMPDIR.
 That's the standard way to do it, and it works already (hopefully).

Actually, I would argue that the best way to accomplish this would be option #2. 
That way an admin/operator has control over the location without having to 
mess around with a user's environment variable.

 
 cheers,
 Pádraig.
 


Re: [Openstack] [NOVA] Snapshotting may require significant disk space (in /tmp). How to properly solve disk space issues?

2012-03-16 Thread Pádraig Brady
On 03/16/2012 11:57 PM, Justin Shepherd wrote:
 
 
 On Mar 16, 2012, at 12:26, Pádraig Brady p...@draigbrady.com wrote:
 
 On 03/16/2012 04:11 PM, Jay Pipes wrote:
 Hi Stackers,

 So, in diagnosing a few things on TryStack yesterday, I ran into an 
 interesting problem with snapshotting that I'm hoping to get some advice on.

 == The Problem ==


 QEMU was unhelpfully returning a vague error message of error while 
 writing.

 That could be improved.
 As an aside, since qemu-img is mainly dealing with large files,
 it would be a prime candidate to call fallocate() from
 to get good layout for the files and immediate feedback
 if there isn't enough space.

 On a related note, I've a patch pending for after RC1
 that should auto clean any of these partially written files:
 https://review.openstack.org/#change,5442

 As it turns out, the base operating system we install on our compute nodes 
 in TryStack has a (very) small root partition

 == Possible Solutions ==

 So, there are a number of solutions that we can work on here, and I'm 
 wondering what the preference would be. Here are the solutions I have come 
 up with, along with a no-brainer improvement to Nova that would help in 
 diagnosing this problem:

 The no-brainer: Detect before attempting a snapshot that there is enough 
 space on a device to perform the operation, and if not, throw a useful 
 error message up the stack

 The space can change while writing, so you could still get the same error 
 above.


 Solutions to the disk space problem:

 (1) Silly Jay, change the damn size of the root partition in your PXE base 
 OS install!

 Now, I'm no expert in creating customized base disk images, but from 
 looking at the build_pxe_env.sh script in devstack [1], it seems pretty 
 trivial to change the ramdisk_size parameter in the startup options to 
 something larger than 2109600. We could do this and reimage the compute 
 nodes one by one.

 (2) Make the location in which the snapshot is made configurable.

 Right now, as mentioned above, tempfile.mkdtemp() is used, which creates a 
 directory in the user's TMPDIR (typically /tmp, which is usually on the 
 root partition).

 We could add an option (--libvirt-snapshot-dir?) that would allow 
 nova-compute to override where that snapshot is built.

 (3) Change the user (running nova-compute) TMPDIR setting to something 
 different than /tmp on the root partition).

 I'd lean towards (3).
 That's something that depends on the environment (as you've nicely 
 demonstrated),
 and also for security reasons the admin should be able to set TMPDIR.
 That's the standard way to do it, and it works already (hopefully).
 
 Actually I would argue that the best way to accomplish this would be option 
 #2. That way an admin/operator has control over the location. Not 
 manipulating this by messing around with a users environment variable.

Well, one can set TMPDIR in the init script for the service.
That's a fairly standard mechanism.

(2) is good, though, if you would ever want to separate
--libvirt-snapshot-dir from $TMPDIR.

Now I can definitely see the need for changing TMPDIR from /tmp,
both for Jay's reasons and because /tmp is tmpfs by default on Debian, for example:
http://lists.debian.org/debian-devel/2011/11/msg00281.html
I'm not sure if you'd need to separate them?
Though I'm always biased towards avoiding new config variables.
I suppose one could argue you might want /tmp for small fast accesses,
and something large and separate for manipulating large files.

Now that I look at the existing nova uses of tmp dirs
to store/stage large images, I see existing config vars:

FLAGS.xenapi_sr_base_path  # Xen's default Storage Repository
FLAGS.image_decryption_dir # nova/image/s3.py

So if you were following that you would implement (2) with:

FLAGS.libvirt_snapshot_dir

There might be opportunity to merge all three to:

FLAGS.nova_image_staging_dir

cheers,
Pádraig.



Re: [Openstack] [NOVA] Snapshotting may require significant disk space (in /tmp). How to properly solve disk space issues?

2012-03-16 Thread Lorin Hochstein




On Mar 16, 2012, at 7:57 PM, Justin Shepherd wrote:

 
 
 Sent from my iPad
 
 On Mar 16, 2012, at 12:26, Pádraig Brady p...@draigbrady.com wrote:
 
 On 03/16/2012 04:11 PM, Jay Pipes wrote:
 Hi Stackers,
 
 So, in diagnosing a few things on TryStack yesterday, I ran into an 
 interesting problem with snapshotting that I'm hoping to get some advice on.
 
 == The Problem ==
 
 
 QEMU was unhelpfully returning a vague error message of error while 
 writing.
 

We ran into this problem in our Diablo deployment as well. It definitely needs 
a more informative error message.
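
As a rough sketch of what a more informative error could look like, the wrapper below runs a command and, on failure, reports the tool's stderr together with the free space in the output directory. The function name `run_with_space_hint` is invented for this example and is not Nova's `utils.execute`; it assumes Python 3.5 or later.

```python
import shutil
import subprocess


def run_with_space_hint(cmd, out_dir):
    """Run `cmd`; on failure, include stderr and free space in `out_dir`.

    Hypothetical wrapper for illustration only.
    """
    proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE, universal_newlines=True)
    if proc.returncode != 0:
        free = shutil.disk_usage(out_dir).free
        raise RuntimeError(
            "%s failed (exit %d): %s; %d bytes free in %s"
            % (cmd[0], proc.returncode, proc.stderr.strip(), free, out_dir))
    return proc.stdout
```

With a message like this, a nearly full output directory would be visible in the log line itself rather than requiring a login to the compute node.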

 
 == Possible Solutions ==
 
 So, there are a number of solutions that we can work on here, and I'm 
 wondering what the preference would be. Here are the solutions I have come 
 up with, along with a no-brainer improvement to Nova that would help in 
 diagnosing this problem:
 
 The no-brainer: Detect before attempting a snapshot that there is enough 
 space on a device to perform the operation, and if not, throw a useful 
 error message up the stack
 
 The space can change while writing, so you could still get the same error 
 above.
 
 
 Solutions to the disk space problem:
 
 (1) Silly Jay, change the damn size of the root partition in your PXE base 
 OS install!
 
 Now, I'm no expert in creating customized base disk images, but from 
 looking at the build_pxe_env.sh script in devstack [1], it seems pretty 
 trivial to change the ramdisk_size parameter in the startup options to 
 something larger than 2109600. We could do this and reimage the compute 
 nodes one by one.
 
 (2) Make the location in which the snapshot is made configurable.
 
 Right now, as mentioned above, tempfile.mkdtemp() is used, which creates a 
 directory in the user's TMPDIR (typically /tmp, which is usually on the 
 root partition).
 
 We could add an option (--libvirt-snapshot-dir?) that would allow 
 nova-compute to override where that snapshot is built.
 
 (3) Change the user (running nova-compute) TMPDIR setting to something 
 different than /tmp on the root partition).
 
 I'd lean towards (3).
 That's something that depends on the environment (as you've nicely 
 demonstrated),
 and also for security reasons the admin should be able to set TMPDIR.
 That's the standard way to do it, and it works already (hopefully).
 
 Actually I would argue that the best way to accomplish this would be option 
 #2. That way an admin/operator has control over the location. Not 
 manipulating this by messing around with a users environment variable.
 



I agree with Justin that option #2 is a better way to go. I'd also recommend 
that it default to something under /var/lib/nova/instances, since that 
directory should generally be mounted on a large partition by default.


Take care,

Lorin
--
Lorin Hochstein
Lead Architect - Cloud Services
Nimbis Services, Inc.
www.nimbisservices.com



Re: [Openstack] [NOVA] Snapshotting may require significant disk space (in /tmp). How to properly solve disk space issues?

2012-03-16 Thread Vishvananda Ishaya
Now that we have the tempdir context manager I was thinking something like:

diff --git a/nova/utils.py b/nova/utils.py
index e375f11..a3ac896 100644
--- a/nova/utils.py
+++ b/nova/utils.py
@@ -61,9 +61,11 @@ ISO_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"
 PERFECT_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"
 FLAGS = flags.FLAGS
 
-FLAGS.register_opt(
+FLAGS.register_opts([
     cfg.BoolOpt('disable_process_locking', default=False,
-                help='Whether to disable inter-process locks'))
+                help='Whether to disable inter-process locks'),
+    cfg.StrOpt('tempdir_location', default=None,
+               help='Path where tempdirs will be created')])
 
 
 def import_class(import_str):
@@ -1611,6 +1613,7 @@ def temporary_chown(path, owner_uid=None):
 
 @contextlib.contextmanager
 def tempdir(**kwargs):
+    kwargs['dir'] = kwargs.get('dir', FLAGS.tempdir_location)
     tmpdir = tempfile.mkdtemp(**kwargs)
     try:
         yield tmpdir

And get rid of the other flags that are used.
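
For anyone who wants to try the idea outside of Nova, here is a self-contained sketch of the same pattern. The module-level `TEMPDIR_LOCATION` is a stand-in for `FLAGS.tempdir_location`, and the cleanup in `finally` is an assumption about the part of `tempdir()` that the diff does not show.

```python
import contextlib
import shutil
import tempfile

# Stand-in for FLAGS.tempdir_location from the patch above.
# None means "let tempfile pick its default (TMPDIR, /tmp, ...)".
TEMPDIR_LOCATION = None


@contextlib.contextmanager
def tempdir(**kwargs):
    """Yield a fresh temporary directory and remove it afterwards."""
    # An explicit dir= from the caller wins over the configured location.
    kwargs.setdefault('dir', TEMPDIR_LOCATION)
    tmpdir = tempfile.mkdtemp(**kwargs)
    try:
        yield tmpdir
    finally:
        # Assumed cleanup step; errors are ignored in this sketch.
        shutil.rmtree(tmpdir, ignore_errors=True)
```

The snapshot code would then wrap its work in `with tempdir() as tmp:` and write the converted image under `tmp`.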

Vish


Re: [Openstack] [NOVA] Snapshotting may require significant disk space (in /tmp). How to properly solve disk space issues?

2012-03-16 Thread Justin Shepherd

On Mar 16, 2012, at 7:51 PM, Pádraig Brady wrote:

 On 03/16/2012 11:57 PM, Justin Shepherd wrote:
 
 
 On Mar 16, 2012, at 12:26, Pádraig Brady p...@draigbrady.com wrote:
 
 On 03/16/2012 04:11 PM, Jay Pipes wrote:
 Hi Stackers,
 
 So, in diagnosing a few things on TryStack yesterday, I ran into an 
 interesting problem with snapshotting that I'm hoping to get some advice 
 on.
 
 == The Problem ==
 
 
 QEMU was unhelpfully returning a vague error message of error while 
 writing.
 
 That could be improved.
 As an aside, since qemu-img is mainly dealing with large files,
 it would be a prime candidate to call fallocate() from
 to get good layout for the files and immediate feedback
 if there isn't enough space.
 
 On a related note, I've a patch pending for after RC1
 that should auto clean any of these partially written files:
 https://review.openstack.org/#change,5442
 
 As it turns out, the base operating system we install on our compute nodes 
 in TryStack has a (very) small root partition
 
 == Possible Solutions ==
 
 So, there are a number of solutions that we can work on here, and I'm 
 wondering what the preference would be. Here are the solutions I have come 
 up with, along with a no-brainer improvement to Nova that would help in 
 diagnosing this problem:
 
 The no-brainer: Detect before attempting a snapshot that there is enough 
 space on a device to perform the operation, and if not, throw a useful 
 error message up the stack
 
 The space can change while writing, so you could still get the same error 
 above.
 
 
 Solutions to the disk space problem:
 
 (1) Silly Jay, change the damn size of the root partition in your PXE base 
 OS install!
 
 Now, I'm no expert in creating customized base disk images, but from 
 looking at the build_pxe_env.sh script in devstack [1], it seems pretty 
 trivial to change the ramdisk_size parameter in the startup options to 
 something larger than 2109600. We could do this and reimage the compute 
 nodes one by one.
 
 (2) Make the location in which the snapshot is made configurable.
 
 Right now, as mentioned above, tempfile.mkdtemp() is used, which creates a 
 directory in the user's TMPDIR (typically /tmp, which is usually on the 
 root partition).
 
 We could add an option (--libvirt-snapshot-dir?) that would allow 
 nova-compute to override where that snapshot is built.
 
 (3) Change the user (running nova-compute) TMPDIR setting to something 
 different than /tmp on the root partition).
 
 I'd lean towards (3).
 That's something that depends on the environment (as you've nicely 
 demonstrated),
 and also for security reasons the admin should be able to set TMPDIR.
 That's the standard way to do it, and it works already (hopefully).
 
 Actually I would argue that the best way to accomplish this would be option 
 #2. That way an admin/operator has control over the location. Not 
 manipulating this by messing around with a users environment variable.
 
 Well one can set the TMPDIR in the init script for the service.
 That's a fairly standard mechanism.

While it is fairly standard practice, it makes me cry a little inside every 
time I have to start adding ENV vars to an init script because of a hard-coded 
value that was not exposed as a configuration option.

My $0.02 as an ops guy.

 
 (2) is good though if you would ever want to separate
 --libvirt-snapshot-dir from, $TMPDIR
 
 Now I can definitely see the need for changing TMPDIR from /tmp
 for Jay's reasons and /tmp being tmpfs by default on debian for example:
 http://lists.debian.org/debian-devel/2011/11/msg00281.html
 I'm not sure if you'd need to separate them?
 Though I'm always biased towards avoiding new config variables.
 I suppose one could argue you might want /tmp for small fast accesses,
 and something large and separate for manipulating large files.
 
 Now that I look at the existing nova uses of tmp dirs
 to store/stage large images, I see existing config vars:
 
 FLAGS.xenapi_sr_base_path  # xens default Storage Repo
 FLAGS.image_decryption_dir # nova/image/s3.py
 
 So if you were following that you would implement (2) with:
 
 FLAGS.libvirt_snapshot_dir
 
 There might be opportunity to merge all three to:
 
 FLAGS.nova_image_staging_dir
 
 cheers,
 Pádraig.




Re: [Openstack] [NOVA] Snapshotting may require significant disk space (in /tmp). How to properly solve disk space issues?

2012-03-16 Thread Justin Santa Barbara
We're creating a (huge) temp file, uploading it, and then deleting it.  So
really we should be streaming the snapshot directly to the destination
(Glance?).

Checking the code, we are writing it sequentially (particularly if we're
writing in raw):
https://github.com/qemu/QEMU/blob/master/qemu-img.c


But there's more...

 qemu-img --help
qemu-img version 1.0, Copyright (c) 2004-2008 Fabrice Bellard
...
Supported formats: vvfat vpc vmdk vdi *sheepdog* *rbd* raw host_cdrom
host_floppy host_device file qed qcow2 qcow parallels *nbd iscsi* dmg *tftp
ftps ftp https http* cow cloop bochs blkverify blkdebug


So it looks like we really want a Supported format: glance there
(particularly as there's already http support in block/curl.c) :-)  I guess
we could then even do crazy things like booting direct from glance?

Or, if we don't want to get back into C, we could at least optimize the
case where glance is backed by Ceph, and stream direct to a Ceph file, and
then hand that file to Glance.

Justin





On Fri, Mar 16, 2012 at 9:11 AM, Jay Pipes jaypi...@gmail.com wrote:

 Hi Stackers,

 So, in diagnosing a few things on TryStack yesterday, I ran into an
 interesting problem with snapshotting that I'm hoping to get some advice on.

 == The Problem ==

 The TryStack codebase is Diablo, however the code involved in this
 particular problem I believe is the same in Essex...

 The issue that was happening was a user was attempting to snapshot a tiny
 instance (512MB/1-core) through the dashboard. The dashboard returned and
 noted that a snapshot was created and was in Queued status.

 The snapshot never goes out of Queued status, and so I logged into the
 compute node that housed the instance in question to see if I could figure
 out what was going on.

 Grepping through the compute log, I found the following:

 (nova.rpc): TRACE: Traceback (most recent call last):
 (nova.rpc): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 628, in _process_data
 (nova.rpc): TRACE:     rval = node_func(context=ctxt, **node_args)
 (nova.rpc): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 100, in wrapped
 (nova.rpc): TRACE:     return f(*args, **kw)
 (nova.rpc): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 687, in snapshot_instance
 (nova.rpc): TRACE:     self.driver.snapshot(context, instance_ref, image_id)
 (nova.rpc): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 100, in wrapped
 (nova.rpc): TRACE:     return f(*args, **kw)
 (nova.rpc): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/connection.py", line 479, in snapshot
 (nova.rpc): TRACE:     utils.execute(*qemu_img_cmd)
 (nova.rpc): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/utils.py", line 190, in execute
 (nova.rpc): TRACE:     cmd=' '.join(cmd))
 (nova.rpc): TRACE: ProcessExecutionError: Unexpected error while running command.
 (nova.rpc): TRACE: Command: qemu-img convert -f qcow2 -O raw -s e7ba4fb5f6f04f99b07d1d222ada0219 /opt/openstack/nova/instances/instance-0548/disk /tmp/tmpIuOQo0/e7ba4fb5f6f04f99b07d1d222ada0219
 (nova.rpc): TRACE: Exit code: 1
 (nova.rpc): TRACE: Stdout: ''
 (nova.rpc): TRACE: Stderr: 'qemu-img: error while writing\n'

 QEMU was unhelpfully returning a vague error message of error while
 writing.

 It turned out, after speaking with a couple folks on IRC (thx vishy and
 rmk!) that the snapshot process (qemu-img convert ... above) is storing the
 output of the process (the snapshot) in a temporary directory created using
 tempfile.mkdtemp() in the nova/virt/libvirt/connection.py file.

 As it turns out, the base operating system we install on our compute nodes
 in TryStack has a (very) small root partition -- only 2GB in size (we use
 the devstack build_pxe_env.sh script to create the base Ubuntu image that
 is netbooted on the compute nodes).

 Looking at the free disk space on the compute node in question, the
 problem was apparent:

 root@freecloud102:/var/log/nova# df -h
 Filesystem      Size  Used Avail Use% Mounted on
 /dev/ram0       2.0G  1.4G  535M  73% /
 devtmpfs         48G  240K   48G   1% /dev
 none             48G     0   48G   0% /dev/shm
 none             48G  212K   48G   1% /var/run
 none             48G     0   48G   0% /var/lock
 /dev/md0        5.4T   93G  5.1T   2% /opt/openstack

 There simply isn't enough free space on the root partition (which is where
 /tmp is housed) for the snapshot to be created.

 == Possible Solutions ==

 So, there are a number of solutions that we can work on here, and I'm
 wondering what the preference would be. Here are the solutions I have come
 up with, along with a no-brainer improvement to Nova that would help in
 diagnosing this problem:

 The no-brainer: Detect before attempting a snapshot that there is enough
 space on a device to perform the operation, and if not, throw a 

Re: [Openstack] [NOVA] Snapshotting may require significant disk space (in /tmp). How to properly solve disk space issues?

2012-03-16 Thread Johannes Erdfelt
On Fri, Mar 16, 2012, Vishvananda Ishaya vishvana...@gmail.com wrote:
 Now that we have the tempdir context manager I was thinking something like:
 
 diff --git a/nova/utils.py b/nova/utils.py
 index e375f11..a3ac896 100644
 --- a/nova/utils.py
 +++ b/nova/utils.py
 @@ -61,9 +61,11 @@ ISO_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"
  PERFECT_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"
  FLAGS = flags.FLAGS
  
 -FLAGS.register_opt(
 +FLAGS.register_opts([
      cfg.BoolOpt('disable_process_locking', default=False,
 -                help='Whether to disable inter-process locks'))
 +                help='Whether to disable inter-process locks'),
 +    cfg.StrOpt('tempdir_location', default=None,
 +               help='Path where tempdirs will be created')])
  
  
  def import_class(import_str):
 @@ -1611,6 +1613,7 @@ def temporary_chown(path, owner_uid=None):
  
  @contextlib.contextmanager
  def tempdir(**kwargs):
 +    kwargs['dir'] = kwargs.get('dir', FLAGS.tempdir_location)
      tmpdir = tempfile.mkdtemp(**kwargs)
      try:
          yield tmpdir
 
 And get rid of the other flags that are used.

This may not be necessary. tempfile will use the TMPDIR, TEMP and TMP
environment variables to determine which directory to use as well.
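
A quick way to see this behaviour is the sketch below. The helper name `mkdtemp_under` is made up for the demonstration, and it changes TMPDIR inside the running process only to keep the example self-contained. In a real deployment the variable would be set before the service starts (for example in its init script), so no cache reset would be needed.

```python
import os
import tempfile


def mkdtemp_under(staging_dir):
    """Create a temporary directory under `staging_dir` via TMPDIR.

    Hypothetical helper for demonstration only.
    """
    os.environ['TMPDIR'] = staging_dir
    # tempfile caches its chosen directory in tempfile.tempdir;
    # clearing it forces the environment to be consulted again.
    tempfile.tempdir = None
    return tempfile.mkdtemp()
```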

JE

