Re: [openstack-dev] [nova] [glance] How to deal with aborted image read?
On 06/08/2015 06:26 PM, Robert Collins wrote: On 9 June 2015 at 07:53, Chris Friesen wrote: From what I understand, the iterator (in the glance-api process) normally breaks out of the while loop once the whole file has been read and the read() call returns an empty string. It's not clear to me how an error in the nova process (which causes it to stop reading the file from glance-api) will cause glance-api to break out of the while loop in ChunkedFile.__iter__(). AIUI the conclusion of our IRC investigation was: - with caching middleware, the fd is freed, just after ~4m. - without caching middleware, the fd is freed after ~90s. Correct. I went back and tried it out in a hardware lab and those are the results I got. Sorry for the noise (though it'd be nice to get rid of the 4-minute delay). I did a bit more digging and we've apparently got another similar issue reported where taking a snapshot fails because the filesystem fills up and the space never gets reclaimed even though the snapshot failed. I'll make sure I can reproduce that one before going further. Chris __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] [glance] How to deal with aborted image read?
On 9 June 2015 at 07:53, Chris Friesen wrote: > On 06/08/2015 12:30 PM, Robert Collins wrote: >> >> On 9 June 2015 at 03:50, Chris Friesen >> wrote: >>> >>> On 06/07/2015 04:22 PM, Robert Collins wrote: >> >> >>> Hi, original reporter here. >>> >>> There's no LB involved. The issue was noticed in a test lab that is >>> tight >>> on disk space. When an instance failed to boot the person using the lab >>> tried to delete some images to free up space, at which point it was >>> noticed >>> that space didn't actually free up. (For at least half an hour, exact >>> time >>> unknown.) >>> >>> I'm more of a nova guy, so could you elaborate a bit on the GC? Is >>> something going to delete the ChunkedFile object after a certain amount >>> of >>> inactivity? What triggers the GC to run? >> >> >> Ok, updating my theory... I'm not super familiar with glances >> innards, so I'm going to call out my assumptions. >> >> The ChunkedFile object is in the Nova process. It reads from a local >> file, so its the issue - and it has a handle onto the image because >> glance arranged for it to read from it. > > > As I understand it, the ChunkedFile object is in the glance-api process. > (At least, it's the glance-api process that is holding the open file > descriptor.) > > >> Anyhow - to answer your question: the iterator is only referenced by >> the for loop, nothing else *can* hold a reference to it (without nasty >> introspection hacks) and so as long as the iterator has an appropriate >> try:finally:, which the filesystem ChunkedFile one does- the file will >> be closed automatically. > > > From what I understand, the iterator (in the glance-api process) normally > breaks out of the while loop once the whole file has been read and the > read() call returns an empty string. > > It's not clear to me how an error in the nova process (which causes it to > stop reading the file from glance-api) will cause glance-api to break out > of the while loop in ChunkedFile.__iter__(). AIUI the conclusion of our IRC investigation was: - with caching middleware, the fd is freed, just after ~4m. - without caching middleware, the fd is freed after ~90s. -Rob -- Robert Collins Distinguished Technologist HP Converged Cloud __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] [glance] How to deal with aborted image read?
On 06/08/2015 12:30 PM, Robert Collins wrote: On 9 June 2015 at 03:50, Chris Friesen wrote: On 06/07/2015 04:22 PM, Robert Collins wrote: Hi, original reporter here. There's no LB involved. The issue was noticed in a test lab that is tight on disk space. When an instance failed to boot the person using the lab tried to delete some images to free up space, at which point it was noticed that space didn't actually free up. (For at least half an hour, exact time unknown.) I'm more of a nova guy, so could you elaborate a bit on the GC? Is something going to delete the ChunkedFile object after a certain amount of inactivity? What triggers the GC to run? Ok, updating my theory... I'm not super familiar with glances innards, so I'm going to call out my assumptions. The ChunkedFile object is in the Nova process. It reads from a local file, so its the issue - and it has a handle onto the image because glance arranged for it to read from it. As I understand it, the ChunkedFile object is in the glance-api process. (At least, it's the glance-api process that is holding the open file descriptor.) Anyhow - to answer your question: the iterator is only referenced by the for loop, nothing else *can* hold a reference to it (without nasty introspection hacks) and so as long as the iterator has an appropriate try:finally:, which the filesystem ChunkedFile one does- the file will be closed automatically. From what I understand, the iterator (in the glance-api process) normally breaks out of the while loop once the whole file has been read and the read() call returns an empty string. It's not clear to me how an error in the nova process (which causes it to stop reading the file from glance-api) will cause glance-api to break out of the while loop in ChunkedFile.__iter__(). Chris __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] [glance] How to deal with aborted image read?
On 9 June 2015 at 03:50, Chris Friesen wrote: > On 06/07/2015 04:22 PM, Robert Collins wrote: > Hi, original reporter here. > > There's no LB involved. The issue was noticed in a test lab that is tight > on disk space. When an instance failed to boot the person using the lab > tried to delete some images to free up space, at which point it was noticed > that space didn't actually free up. (For at least half an hour, exact time > unknown.) > > I'm more of a nova guy, so could you elaborate a bit on the GC? Is > something going to delete the ChunkedFile object after a certain amount of > inactivity? What triggers the GC to run? Ok, updating my theory... I'm not super familiar with glances innards, so I'm going to call out my assumptions. The ChunkedFile object is in the Nova process. It reads from a local file, so its the issue - and it has a handle onto the image because glance arranged for it to read from it. Nova runs out of space and throws in the middle of iterating the ChunkedFile. the code in https://review.openstack.org/#/c/188179/1/nova/image/glance.py (base) propogates that exception, hitting the finally: if close_file: data.close() on the way out. the image_chunks iterator, which is being iterated by the for loop, will be closed when the for loop is left because there are no other references to the iterator object. So the in-nova ChunkedFile object will have its close method called due to the close() in the finally block of the __iter__ method automatically when you except out of that exception. (demo below). Assumption: glance_store/_drivers/filesystem.py is the ChunkedFile in question; it might not be. If e.g. you are reading from the glance HTTP service, then it might be a different class. Anyhow - to answer your question: the iterator is only referenced by the for loop, nothing else *can* hold a reference to it (without nasty introspection hacks) and so as long as the iterator has an appropriate try:finally:, which the filesystem ChunkedFile one does- the file will be closed automatically. So things to verify: - it is filesystem.ChunkedFile that you're seeing when you reproduce it (within Nova). If not, what class is it :). If its an HTTP handle of some sort, then we just need to make sure it has an appropriate try:finally: within its generator. Assuming that is that the contract with glance client is 'and you get an iterator' not a 'file like object'. >>> class f: ... def __iter__(self): ... try: ... yield 1 ... 1/0 ... finally: ... print('yay') ... >>> for x in f(): ... pass ... yay Traceback (most recent call last): File "", line 1, in File "", line 5, in __iter__ ZeroDivisionError: integer division or modulo by zero >>> for x in f(): ... raise Exception() ... yay Traceback (most recent call last): File "", line 2, in Exception -- Robert Collins Distinguished Technologist HP Converged Cloud __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] [glance] How to deal with aborted image read?
On 06/07/2015 04:22 PM, Robert Collins wrote: On 6 June 2015 at 13:08, Ian Cordasco wrote: So the problem is with how we use ResponseSerializer and the ChunkedFile (https://git.openstack.org/cgit/openstack/glance/tree/glance/api/v2/image_d ata.py#n222). I think the problem we'll have is that webob provides nothing on a Response (https://webob.readthedocs.org/en/latest/modules/webob.html#response) to hook into so we can close the ChunkedFile. I wonder if we used the body_file attribute if webob would close the file when the response is closed (because I'm assuming that nova/glanceclient are closing the response with which it's downloading the data). But the maximum leak time is a single GC run, which we don't expect to be long, unless the server is super quiet (and if it is, the temporary leak is less like to be an issue, no?). I wonder if there's actually something else going on here. E.g. a broken LB in front of glance-api which is preventing the HTTP connection termination from being detected, and the thread is staying open-and-stalled. That would explain the symptoms just as well - and is deployer specific so could also explain the (perceived) trickiness in reproduction. Hi, original reporter here. There's no LB involved. The issue was noticed in a test lab that is tight on disk space. When an instance failed to boot the person using the lab tried to delete some images to free up space, at which point it was noticed that space didn't actually free up. (For at least half an hour, exact time unknown.) I'm more of a nova guy, so could you elaborate a bit on the GC? Is something going to delete the ChunkedFile object after a certain amount of inactivity? What triggers the GC to run? Thanks, Chris __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] [glance] How to deal with aborted image read?
On 6 June 2015 at 13:08, Ian Cordasco wrote: > > > On 6/5/15, 02:55, "Flavio Percoco" wrote: > >>On 04/06/15 11:46 -0600, Chris Friesen wrote: >>>On 06/04/2015 03:01 AM, Flavio Percoco wrote: On 03/06/15 16:46 -0600, Chris Friesen wrote: >We recently ran into an issue where nova couldn't write an image file >due to >lack of space and so just quit reading from glance. > >This caused glance to be stuck with an open file descriptor, which >meant that >the image consumed space even after it was deleted. > >I have a crude fix for nova at >"https://review.openstack.org/#/c/188179/"; >which basically continues to read the image even though it can't write >it. >That seems less than ideal for large images though. > >Is there a better way to do this? Is there a way for nova to indicate >to >glance that it's no longer interested in that image and glance can >close the >file? > >If I've followed this correctly, on the glance side I think the code in >question is ultimately >glance_store._drivers.filesystem.ChunkedFile.__iter__(). Actually, to be honest, I was quite confused by the email :P Correct me if I still didn't understand what you're asking. You ran out of space on the Nova side while downloading the image and there's a file descriptor leak somewhere either in that lovely (sarcasm) glance wrapper or in glanceclient. >>> >>>The first part is correct, but the file descriptor is actually held by >>>glance-api. >>> Just by reading your email and glancing your patch, I believe the bug might be in glanceclient but I'd need to five into this. The piece of code you'll need to look into is[0]. glance_store is just used server side. If that's what you meant - glance is keeping the request and the ChunkedFile around - then yes, glance_store is the place to look into. [0] https://github.com/openstack/python-glanceclient/blob/master/glanceclien t/v1/images.py#L152 >>> >>>I believe what's happening is that the ChunkedFile code opens the file >>>and creates the iterator. Nova then starts iterating through the >>>file. >>> >>>If nova (or any other user of glance) iterates all the way through the >>>file then the ChunkedFile code will hit the "finally" clause in >>>__iter__() and close the file descriptor. >>> >>>If nova starts iterating through the file and then stops (due to >>>running out of room, for example), the ChunkedFile.__iter__() routine >>>is left with an open file descriptor. At this point deleting the >>>image will not actually free up any space. >>> >>>I'm not a glance guy so I could be wrong about the code. The >>>externally-visible data are: >>>1) glance-api is holding an open file descriptor to a deleted image file >>>2) If I kill glance-api the disk space is freed up. >>>3) If I modify nova to always finish iterating through the file the >>>problem doesn't occur in the first place. >> >>Gotcha, thanks for explaining. I think the problem is that there might >>be a reference leak and therefore the FD is kept opened. Probably the >>request interruption is not getting to the driver. I've filed this >>bug[0] so we can look into it. >> >>[0] https://bugs.launchpad.net/glance-store/+bug/1462235 >> >>Flavio >> >>-- >>@flaper87 >>Flavio Percoco > > So the problem is with how we use ResponseSerializer and the ChunkedFile > (https://git.openstack.org/cgit/openstack/glance/tree/glance/api/v2/image_d > ata.py#n222). I think the problem we'll have is that webob provides > nothing on a Response > (https://webob.readthedocs.org/en/latest/modules/webob.html#response) to > hook into so we can close the ChunkedFile. > > I wonder if we used the body_file attribute if webob would close the file > when the response is closed (because I'm assuming that nova/glanceclient > are closing the response with which it's downloading the data). But the maximum leak time is a single GC run, which we don't expect to be long, unless the server is super quiet (and if it is, the temporary leak is less like to be an issue, no?). I wonder if there's actually something else going on here. E.g. a broken LB in front of glance-api which is preventing the HTTP connection termination from being detected, and the thread is staying open-and-stalled. That would explain the symptoms just as well - and is deployer specific so could also explain the (perceived) trickiness in reproduction. -Rob -- Robert Collins Distinguished Technologist HP Converged Cloud __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] [glance] How to deal with aborted image read?
On 6/5/15, 02:55, "Flavio Percoco" wrote: >On 04/06/15 11:46 -0600, Chris Friesen wrote: >>On 06/04/2015 03:01 AM, Flavio Percoco wrote: >>>On 03/06/15 16:46 -0600, Chris Friesen wrote: We recently ran into an issue where nova couldn't write an image file due to lack of space and so just quit reading from glance. This caused glance to be stuck with an open file descriptor, which meant that the image consumed space even after it was deleted. I have a crude fix for nova at "https://review.openstack.org/#/c/188179/"; which basically continues to read the image even though it can't write it. That seems less than ideal for large images though. Is there a better way to do this? Is there a way for nova to indicate to glance that it's no longer interested in that image and glance can close the file? If I've followed this correctly, on the glance side I think the code in question is ultimately glance_store._drivers.filesystem.ChunkedFile.__iter__(). >>> >>>Actually, to be honest, I was quite confused by the email :P >>> >>>Correct me if I still didn't understand what you're asking. >>> >>>You ran out of space on the Nova side while downloading the image and >>>there's a file descriptor leak somewhere either in that lovely (sarcasm) >>>glance wrapper or in glanceclient. >> >>The first part is correct, but the file descriptor is actually held by >>glance-api. >> >>>Just by reading your email and glancing your patch, I believe the bug >>>might be in glanceclient but I'd need to five into this. The piece of >>>code you'll need to look into is[0]. >>> >>>glance_store is just used server side. If that's what you meant - >>>glance is keeping the request and the ChunkedFile around - then yes, >>>glance_store is the place to look into. >>> >>>[0] >>>https://github.com/openstack/python-glanceclient/blob/master/glanceclien >>>t/v1/images.py#L152 >> >>I believe what's happening is that the ChunkedFile code opens the file >>and creates the iterator. Nova then starts iterating through the >>file. >> >>If nova (or any other user of glance) iterates all the way through the >>file then the ChunkedFile code will hit the "finally" clause in >>__iter__() and close the file descriptor. >> >>If nova starts iterating through the file and then stops (due to >>running out of room, for example), the ChunkedFile.__iter__() routine >>is left with an open file descriptor. At this point deleting the >>image will not actually free up any space. >> >>I'm not a glance guy so I could be wrong about the code. The >>externally-visible data are: >>1) glance-api is holding an open file descriptor to a deleted image file >>2) If I kill glance-api the disk space is freed up. >>3) If I modify nova to always finish iterating through the file the >>problem doesn't occur in the first place. > >Gotcha, thanks for explaining. I think the problem is that there might >be a reference leak and therefore the FD is kept opened. Probably the >request interruption is not getting to the driver. I've filed this >bug[0] so we can look into it. > >[0] https://bugs.launchpad.net/glance-store/+bug/1462235 > >Flavio > >-- >@flaper87 >Flavio Percoco So the problem is with how we use ResponseSerializer and the ChunkedFile (https://git.openstack.org/cgit/openstack/glance/tree/glance/api/v2/image_d ata.py#n222). I think the problem we'll have is that webob provides nothing on a Response (https://webob.readthedocs.org/en/latest/modules/webob.html#response) to hook into so we can close the ChunkedFile. I wonder if we used the body_file attribute if webob would close the file when the response is closed (because I'm assuming that nova/glanceclient are closing the response with which it's downloading the data). __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] [glance] How to deal with aborted image read?
On 04/06/15 11:46 -0600, Chris Friesen wrote: On 06/04/2015 03:01 AM, Flavio Percoco wrote: On 03/06/15 16:46 -0600, Chris Friesen wrote: We recently ran into an issue where nova couldn't write an image file due to lack of space and so just quit reading from glance. This caused glance to be stuck with an open file descriptor, which meant that the image consumed space even after it was deleted. I have a crude fix for nova at "https://review.openstack.org/#/c/188179/"; which basically continues to read the image even though it can't write it. That seems less than ideal for large images though. Is there a better way to do this? Is there a way for nova to indicate to glance that it's no longer interested in that image and glance can close the file? If I've followed this correctly, on the glance side I think the code in question is ultimately glance_store._drivers.filesystem.ChunkedFile.__iter__(). Actually, to be honest, I was quite confused by the email :P Correct me if I still didn't understand what you're asking. You ran out of space on the Nova side while downloading the image and there's a file descriptor leak somewhere either in that lovely (sarcasm) glance wrapper or in glanceclient. The first part is correct, but the file descriptor is actually held by glance-api. Just by reading your email and glancing your patch, I believe the bug might be in glanceclient but I'd need to five into this. The piece of code you'll need to look into is[0]. glance_store is just used server side. If that's what you meant - glance is keeping the request and the ChunkedFile around - then yes, glance_store is the place to look into. [0] https://github.com/openstack/python-glanceclient/blob/master/glanceclient/v1/images.py#L152 I believe what's happening is that the ChunkedFile code opens the file and creates the iterator. Nova then starts iterating through the file. If nova (or any other user of glance) iterates all the way through the file then the ChunkedFile code will hit the "finally" clause in __iter__() and close the file descriptor. If nova starts iterating through the file and then stops (due to running out of room, for example), the ChunkedFile.__iter__() routine is left with an open file descriptor. At this point deleting the image will not actually free up any space. I'm not a glance guy so I could be wrong about the code. The externally-visible data are: 1) glance-api is holding an open file descriptor to a deleted image file 2) If I kill glance-api the disk space is freed up. 3) If I modify nova to always finish iterating through the file the problem doesn't occur in the first place. Gotcha, thanks for explaining. I think the problem is that there might be a reference leak and therefore the FD is kept opened. Probably the request interruption is not getting to the driver. I've filed this bug[0] so we can look into it. [0] https://bugs.launchpad.net/glance-store/+bug/1462235 Flavio -- @flaper87 Flavio Percoco pgpqbO1TwobzI.pgp Description: PGP signature __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] [glance] How to deal with aborted image read?
On 06/04/2015 07:16 PM, Joshua Harlow wrote: Perhaps someone needs to use (or forgot to use) contextlib.closing? https://docs.python.org/2/library/contextlib.html#contextlib.closing Or contextlib2 also may be useful: http://contextlib2.readthedocs.org/en/latest/#contextlib2.ExitStack The complication is that the error happens in nova, but it's glance-api that holds the file descriptor open. So somehow the error has to be detected in nova, then the fact that nova is no longer interested in the file needs to be propagated through glanceclient via an HTTP call into glance-api so that it knows it can close the file descriptor. And I don't think that an API to do this exists currently. Chris __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] [glance] How to deal with aborted image read?
Perhaps someone needs to use (or forgot to use) contextlib.closing? https://docs.python.org/2/library/contextlib.html#contextlib.closing Or contextlib2 also may be useful: http://contextlib2.readthedocs.org/en/latest/#contextlib2.ExitStack -Josh Chris Friesen wrote: On 06/04/2015 03:01 AM, Flavio Percoco wrote: On 03/06/15 16:46 -0600, Chris Friesen wrote: We recently ran into an issue where nova couldn't write an image file due to lack of space and so just quit reading from glance. This caused glance to be stuck with an open file descriptor, which meant that the image consumed space even after it was deleted. I have a crude fix for nova at "https://review.openstack.org/#/c/188179/"; which basically continues to read the image even though it can't write it. That seems less than ideal for large images though. Is there a better way to do this? Is there a way for nova to indicate to glance that it's no longer interested in that image and glance can close the file? If I've followed this correctly, on the glance side I think the code in question is ultimately glance_store._drivers.filesystem.ChunkedFile.__iter__(). Actually, to be honest, I was quite confused by the email :P Correct me if I still didn't understand what you're asking. You ran out of space on the Nova side while downloading the image and there's a file descriptor leak somewhere either in that lovely (sarcasm) glance wrapper or in glanceclient. The first part is correct, but the file descriptor is actually held by glance-api. Just by reading your email and glancing your patch, I believe the bug might be in glanceclient but I'd need to five into this. The piece of code you'll need to look into is[0]. glance_store is just used server side. If that's what you meant - glance is keeping the request and the ChunkedFile around - then yes, glance_store is the place to look into. [0] https://github.com/openstack/python-glanceclient/blob/master/glanceclient/v1/images.py#L152 I believe what's happening is that the ChunkedFile code opens the file and creates the iterator. Nova then starts iterating through the file. If nova (or any other user of glance) iterates all the way through the file then the ChunkedFile code will hit the "finally" clause in __iter__() and close the file descriptor. If nova starts iterating through the file and then stops (due to running out of room, for example), the ChunkedFile.__iter__() routine is left with an open file descriptor. At this point deleting the image will not actually free up any space. I'm not a glance guy so I could be wrong about the code. The externally-visible data are: 1) glance-api is holding an open file descriptor to a deleted image file 2) If I kill glance-api the disk space is freed up. 3) If I modify nova to always finish iterating through the file the problem doesn't occur in the first place. Chris __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] [glance] How to deal with aborted image read?
On 06/04/2015 03:01 AM, Flavio Percoco wrote: On 03/06/15 16:46 -0600, Chris Friesen wrote: We recently ran into an issue where nova couldn't write an image file due to lack of space and so just quit reading from glance. This caused glance to be stuck with an open file descriptor, which meant that the image consumed space even after it was deleted. I have a crude fix for nova at "https://review.openstack.org/#/c/188179/"; which basically continues to read the image even though it can't write it. That seems less than ideal for large images though. Is there a better way to do this? Is there a way for nova to indicate to glance that it's no longer interested in that image and glance can close the file? If I've followed this correctly, on the glance side I think the code in question is ultimately glance_store._drivers.filesystem.ChunkedFile.__iter__(). Actually, to be honest, I was quite confused by the email :P Correct me if I still didn't understand what you're asking. You ran out of space on the Nova side while downloading the image and there's a file descriptor leak somewhere either in that lovely (sarcasm) glance wrapper or in glanceclient. The first part is correct, but the file descriptor is actually held by glance-api. Just by reading your email and glancing your patch, I believe the bug might be in glanceclient but I'd need to five into this. The piece of code you'll need to look into is[0]. glance_store is just used server side. If that's what you meant - glance is keeping the request and the ChunkedFile around - then yes, glance_store is the place to look into. [0] https://github.com/openstack/python-glanceclient/blob/master/glanceclient/v1/images.py#L152 I believe what's happening is that the ChunkedFile code opens the file and creates the iterator. Nova then starts iterating through the file. If nova (or any other user of glance) iterates all the way through the file then the ChunkedFile code will hit the "finally" clause in __iter__() and close the file descriptor. If nova starts iterating through the file and then stops (due to running out of room, for example), the ChunkedFile.__iter__() routine is left with an open file descriptor. At this point deleting the image will not actually free up any space. I'm not a glance guy so I could be wrong about the code. The externally-visible data are: 1) glance-api is holding an open file descriptor to a deleted image file 2) If I kill glance-api the disk space is freed up. 3) If I modify nova to always finish iterating through the file the problem doesn't occur in the first place. Chris __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] [glance] How to deal with aborted image read?
On 03/06/15 16:46 -0600, Chris Friesen wrote: We recently ran into an issue where nova couldn't write an image file due to lack of space and so just quit reading from glance. This caused glance to be stuck with an open file descriptor, which meant that the image consumed space even after it was deleted. I have a crude fix for nova at "https://review.openstack.org/#/c/188179/"; which basically continues to read the image even though it can't write it. That seems less than ideal for large images though. Is there a better way to do this? Is there a way for nova to indicate to glance that it's no longer interested in that image and glance can close the file? If I've followed this correctly, on the glance side I think the code in question is ultimately glance_store._drivers.filesystem.ChunkedFile.__iter__(). Actually, to be honest, I was quite confused by the email :P Correct me if I still didn't understand what you're asking. You ran out of space on the Nova side while downloading the image and there's a file descriptor leak somewhere either in that lovely (sarcasm) glance wrapper or in glanceclient. Just by reading your email and glancing your patch, I believe the bug might be in glanceclient but I'd need to five into this. The piece of code you'll need to look into is[0]. glance_store is just used server side. If that's what you meant - glance is keeping the request and the ChunkedFile around - then yes, glance_store is the place to look into. [0] https://github.com/openstack/python-glanceclient/blob/master/glanceclient/v1/images.py#L152 Cheers, Flavio -- @flaper87 Flavio Percoco pgp0DL1sr4LBf.pgp Description: PGP signature __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [nova] [glance] How to deal with aborted image read?
We recently ran into an issue where nova couldn't write an image file due to lack of space and so just quit reading from glance. This caused glance to be stuck with an open file descriptor, which meant that the image consumed space even after it was deleted. I have a crude fix for nova at "https://review.openstack.org/#/c/188179/"; which basically continues to read the image even though it can't write it. That seems less than ideal for large images though. Is there a better way to do this? Is there a way for nova to indicate to glance that it's no longer interested in that image and glance can close the file? If I've followed this correctly, on the glance side I think the code in question is ultimately glance_store._drivers.filesystem.ChunkedFile.__iter__(). Chris __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev