Re: [openstack-dev] creating a default for oslo config variables within a project?
On Wed, Dec 04 2013, Sean Dague wrote: Honestly, I'd love us to be clever and figure out a not dangerous way through this, even if unwise (where we can yell at the user in the LOGs loudly, and fail them in J if lock_dir=/tmp) that lets us progress through this while gracefully bringing configs into line. Correct me if I'm wrong, but I think the correct way to deal with that security problem is to use an atomic operation using open(2) with: open(pathname, O_CREAT | O_EXCL) or mkstemp(3). That should be doable in Python too. -- Julien Danjou # Free Software hacker # independent consultant # http://julien.danjou.info signature.asc Description: PGP signature ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] creating a default for oslo config variables within a project?
Excerpts from Julien Danjou's message of 2013-12-05 01:22:00 -0800: On Wed, Dec 04 2013, Sean Dague wrote: Honestly, I'd love us to be clever and figure out a not dangerous way through this, even if unwise (where we can yell at the user in the LOGs loudly, and fail them in J if lock_dir=/tmp) that lets us progress through this while gracefully bringing configs into line. Correct me if I'm wrong, but I think the correct way to deal with that security problem is to use an atomic operation using open(2) with: open(pathname, O_CREAT | O_EXCL) DOS by a malicious user creating it first is still trivial. or mkstemp(3). Can't use mkstemp as the point is this needs to be something shared between processes. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] creating a default for oslo config variables within a project?
On 12/03/2013 11:21 PM, Clint Byrum wrote: Excerpts from Sean Dague's message of 2013-12-03 16:05:47 -0800: On 12/03/2013 06:13 PM, Ben Nemec wrote: On 2013-12-03 17:09, Sean Dague wrote: On 12/03/2013 05:50 PM, Mark McLoughlin wrote: On Tue, 2013-12-03 at 16:23 -0600, Ben Nemec wrote: On 2013-12-03 15:56, Sean Dague wrote: This cinder patch - https://review.openstack.org/#/c/48935/ Is blocked on failing upgrade because the updated oslo lockutils won't function until there is a specific configuration variable added to the cinder.conf. That work around is proposed here - https://review.openstack.org/#/c/52070/3 However I think this is exactly the kind of forward breaks that we want to prevent with grenade, as cinder failing to function after a rolling upgrade because a config item wasn't added is exactly the kind of pain we are trying to prevent happening to ops. So the question is, how is this done correctly so that a default can be set in the cinder code for this value, and it not require a config change to work? You're absolutely correct, in principle - if the default value for lock_path worked for users before, we should absolutely continue to support it. I don't know that I have a good answer on how to handle this, but for context this change is the result of a nasty bug in lockutils that meant external locks were doing nothing if lock_path wasn't set. Basically it's something we should never have allowed in the first place. As far as setting this in code, it's important that all of the processes for a service are using the same value to avoid the same bad situation we were in before. For tests, we have a lockutils wrapper (https://github.com/openstack/oslo-incubator/blob/master/openstack/common/lockutils.py#L282) that sets an environment variable to address this, but that only works if all of the processes are going to be spawned from within the same wrapper, and I'm not sure how secure that is for production deployments since it puts all of the lock files in a temporary directory. Right, I don't think the previous default really worked - if you used the default, then external locking was broken. I suspect most distros do set a default - I see RDO has this in its default nova.conf: lock_path = /var/lib/nova/tmp So, yes - this is all terrible. IMHO, rather than raise an exception we should log a big fat warning about relying on the default and perhaps just treat the lock as an in-process lock in that case ... since that's essentially what it was before, right? So a default of lock_path = /tmp will work (FHS says that path has to be there), even if not optimal. Could we make it a default value like that instead of the current default which is null (and hence the problem). IIRC, my initial fix was something similar to that, but it got shot down because putting the lock files in a known world writeable location was a security issue. Although maybe if we put them in a subdirectory of /tmp and ensured that the permissions were such that only the user running the service could use that directory, it might be acceptable? We could still log a warning if we wanted. This seems like it would have implications for people running services on Windows too, but we can probably find a way to make that work if we decide on a solution. How is that a security issue? Are the lock files being written with some sensitive data in them and have g or o permissions on? The sticky bit (+t) on /tmp will prevent other users from deleting the file. Right, but it won't prevent users from creating a symlink with the same name. ln -s /var/lib/nova/instances/x/image.raw /tmp/well.known.location Now when you do with open('/tmp/well.known.location', 'w') as lockfile: lockfile.write('Stuff') Nova has just truncated the image file and written 'Stuff' to it. So that's the generic case (and the way people often write this). But the oslo lockutils code doesn't work that way. While it does open the file for write, it does not actually write, it's using fcntl to hold locks. That's taking a data structure on the fd in kernel memory (IIRC), so it correctly gives it up if the process crashes. I'm not saying there isn't some other possible security vulnerability here as well, but it's not jumping out at me. So I'd love to understand that, because if we can close that exposure we can provide a working default, plus a strong recommendation for how to do that *right*. I'd be totally happy with printing WARNING level at startup if lock_path = /tmp that this should be adjusted. The typical solution is to use a lock directory, /var/run/yourprogram, that has restrictive enough permissions setup for your program to have exclusive use of it, and is created by root at boot time. That is what the packages do now. It would be good if everybody agreed on a default, %(nova_home)/locks or something, but root still must set it up with the right
Re: [openstack-dev] creating a default for oslo config variables within a project?
On 2013-12-04 06:07, Sean Dague wrote: On 12/03/2013 11:21 PM, Clint Byrum wrote: Excerpts from Sean Dague's message of 2013-12-03 16:05:47 -0800: On 12/03/2013 06:13 PM, Ben Nemec wrote: On 2013-12-03 17:09, Sean Dague wrote: On 12/03/2013 05:50 PM, Mark McLoughlin wrote: On Tue, 2013-12-03 at 16:23 -0600, Ben Nemec wrote: On 2013-12-03 15:56, Sean Dague wrote: This cinder patch - https://review.openstack.org/#/c/48935/ Is blocked on failing upgrade because the updated oslo lockutils won't function until there is a specific configuration variable added to the cinder.conf. That work around is proposed here - https://review.openstack.org/#/c/52070/3 However I think this is exactly the kind of forward breaks that we want to prevent with grenade, as cinder failing to function after a rolling upgrade because a config item wasn't added is exactly the kind of pain we are trying to prevent happening to ops. So the question is, how is this done correctly so that a default can be set in the cinder code for this value, and it not require a config change to work? You're absolutely correct, in principle - if the default value for lock_path worked for users before, we should absolutely continue to support it. I don't know that I have a good answer on how to handle this, but for context this change is the result of a nasty bug in lockutils that meant external locks were doing nothing if lock_path wasn't set. Basically it's something we should never have allowed in the first place. As far as setting this in code, it's important that all of the processes for a service are using the same value to avoid the same bad situation we were in before. For tests, we have a lockutils wrapper (https://github.com/openstack/oslo-incubator/blob/master/openstack/common/lockutils.py#L282) that sets an environment variable to address this, but that only works if all of the processes are going to be spawned from within the same wrapper, and I'm not sure how secure that is for production deployments since it puts all of the lock files in a temporary directory. Right, I don't think the previous default really worked - if you used the default, then external locking was broken. I suspect most distros do set a default - I see RDO has this in its default nova.conf: lock_path = /var/lib/nova/tmp So, yes - this is all terrible. IMHO, rather than raise an exception we should log a big fat warning about relying on the default and perhaps just treat the lock as an in-process lock in that case ... since that's essentially what it was before, right? So a default of lock_path = /tmp will work (FHS says that path has to be there), even if not optimal. Could we make it a default value like that instead of the current default which is null (and hence the problem). IIRC, my initial fix was something similar to that, but it got shot down because putting the lock files in a known world writeable location was a security issue. Although maybe if we put them in a subdirectory of /tmp and ensured that the permissions were such that only the user running the service could use that directory, it might be acceptable? We could still log a warning if we wanted. This seems like it would have implications for people running services on Windows too, but we can probably find a way to make that work if we decide on a solution. How is that a security issue? Are the lock files being written with some sensitive data in them and have g or o permissions on? The sticky bit (+t) on /tmp will prevent other users from deleting the file. Right, but it won't prevent users from creating a symlink with the same name. ln -s /var/lib/nova/instances/x/image.raw /tmp/well.known.location Now when you do with open('/tmp/well.known.location', 'w') as lockfile: lockfile.write('Stuff') Nova has just truncated the image file and written 'Stuff' to it. So that's the generic case (and the way people often write this). But the oslo lockutils code doesn't work that way. While it does open the file for write, it does not actually write, it's using fcntl to hold locks. That's taking a data structure on the fd in kernel memory (IIRC), so it correctly gives it up if the process crashes. I'm not saying there isn't some other possible security vulnerability here as well, but it's not jumping out at me. So I'd love to understand that, because if we can close that exposure we can provide a working default, plus a strong recommendation for how to do that *right*. I'd be totally happy with printing WARNING level at startup if lock_path = /tmp that this should be adjusted. Full disclosure: I don't claim to be a security expert, so take my thoughts on this with a grain of salt. tldr: I still don't see a way to do this without breaking _something_. Unfortunately, while we don't actually write to the file, just opening it for write access truncates it. So there remains the issue Clint raised if someone
Re: [openstack-dev] creating a default for oslo config variables within a project?
On 12/04/2013 11:56 AM, Ben Nemec wrote: On 2013-12-04 06:07, Sean Dague wrote: On 12/03/2013 11:21 PM, Clint Byrum wrote: Excerpts from Sean Dague's message of 2013-12-03 16:05:47 -0800: On 12/03/2013 06:13 PM, Ben Nemec wrote: On 2013-12-03 17:09, Sean Dague wrote: On 12/03/2013 05:50 PM, Mark McLoughlin wrote: On Tue, 2013-12-03 at 16:23 -0600, Ben Nemec wrote: On 2013-12-03 15:56, Sean Dague wrote: This cinder patch - https://review.openstack.org/#/c/48935/ Is blocked on failing upgrade because the updated oslo lockutils won't function until there is a specific configuration variable added to the cinder.conf. That work around is proposed here - https://review.openstack.org/#/c/52070/3 However I think this is exactly the kind of forward breaks that we want to prevent with grenade, as cinder failing to function after a rolling upgrade because a config item wasn't added is exactly the kind of pain we are trying to prevent happening to ops. So the question is, how is this done correctly so that a default can be set in the cinder code for this value, and it not require a config change to work? You're absolutely correct, in principle - if the default value for lock_path worked for users before, we should absolutely continue to support it. I don't know that I have a good answer on how to handle this, but for context this change is the result of a nasty bug in lockutils that meant external locks were doing nothing if lock_path wasn't set. Basically it's something we should never have allowed in the first place. As far as setting this in code, it's important that all of the processes for a service are using the same value to avoid the same bad situation we were in before. For tests, we have a lockutils wrapper (https://github.com/openstack/oslo-incubator/blob/master/openstack/common/lockutils.py#L282) that sets an environment variable to address this, but that only works if all of the processes are going to be spawned from within the same wrapper, and I'm not sure how secure that is for production deployments since it puts all of the lock files in a temporary directory. Right, I don't think the previous default really worked - if you used the default, then external locking was broken. I suspect most distros do set a default - I see RDO has this in its default nova.conf: lock_path = /var/lib/nova/tmp So, yes - this is all terrible. IMHO, rather than raise an exception we should log a big fat warning about relying on the default and perhaps just treat the lock as an in-process lock in that case ... since that's essentially what it was before, right? So a default of lock_path = /tmp will work (FHS says that path has to be there), even if not optimal. Could we make it a default value like that instead of the current default which is null (and hence the problem). IIRC, my initial fix was something similar to that, but it got shot down because putting the lock files in a known world writeable location was a security issue. Although maybe if we put them in a subdirectory of /tmp and ensured that the permissions were such that only the user running the service could use that directory, it might be acceptable? We could still log a warning if we wanted. This seems like it would have implications for people running services on Windows too, but we can probably find a way to make that work if we decide on a solution. How is that a security issue? Are the lock files being written with some sensitive data in them and have g or o permissions on? The sticky bit (+t) on /tmp will prevent other users from deleting the file. Right, but it won't prevent users from creating a symlink with the same name. ln -s /var/lib/nova/instances/x/image.raw /tmp/well.known.location Now when you do with open('/tmp/well.known.location', 'w') as lockfile: lockfile.write('Stuff') Nova has just truncated the image file and written 'Stuff' to it. So that's the generic case (and the way people often write this). But the oslo lockutils code doesn't work that way. While it does open the file for write, it does not actually write, it's using fcntl to hold locks. That's taking a data structure on the fd in kernel memory (IIRC), so it correctly gives it up if the process crashes. I'm not saying there isn't some other possible security vulnerability here as well, but it's not jumping out at me. So I'd love to understand that, because if we can close that exposure we can provide a working default, plus a strong recommendation for how to do that *right*. I'd be totally happy with printing WARNING level at startup if lock_path = /tmp that this should be adjusted. Full disclosure: I don't claim to be a security expert, so take my thoughts on this with a grain of salt. tldr: I still don't see a way to do this without breaking _something_. Unfortunately, while we don't actually write to the file, just opening it for
Re: [openstack-dev] creating a default for oslo config variables within a project?
On 2013-12-04 12:51, Sean Dague wrote: On 12/04/2013 11:56 AM, Ben Nemec wrote: On 2013-12-04 06:07, Sean Dague wrote: On 12/03/2013 11:21 PM, Clint Byrum wrote: Excerpts from Sean Dague's message of 2013-12-03 16:05:47 -0800: On 12/03/2013 06:13 PM, Ben Nemec wrote: On 2013-12-03 17:09, Sean Dague wrote: On 12/03/2013 05:50 PM, Mark McLoughlin wrote: On Tue, 2013-12-03 at 16:23 -0600, Ben Nemec wrote: On 2013-12-03 15:56, Sean Dague wrote: This cinder patch - https://review.openstack.org/#/c/48935/ Is blocked on failing upgrade because the updated oslo lockutils won't function until there is a specific configuration variable added to the cinder.conf. That work around is proposed here - https://review.openstack.org/#/c/52070/3 However I think this is exactly the kind of forward breaks that we want to prevent with grenade, as cinder failing to function after a rolling upgrade because a config item wasn't added is exactly the kind of pain we are trying to prevent happening to ops. So the question is, how is this done correctly so that a default can be set in the cinder code for this value, and it not require a config change to work? You're absolutely correct, in principle - if the default value for lock_path worked for users before, we should absolutely continue to support it. I don't know that I have a good answer on how to handle this, but for context this change is the result of a nasty bug in lockutils that meant external locks were doing nothing if lock_path wasn't set. Basically it's something we should never have allowed in the first place. As far as setting this in code, it's important that all of the processes for a service are using the same value to avoid the same bad situation we were in before. For tests, we have a lockutils wrapper (https://github.com/openstack/oslo-incubator/blob/master/openstack/common/lockutils.py#L282) that sets an environment variable to address this, but that only works if all of the processes are going to be spawned from within the same wrapper, and I'm not sure how secure that is for production deployments since it puts all of the lock files in a temporary directory. Right, I don't think the previous default really worked - if you used the default, then external locking was broken. I suspect most distros do set a default - I see RDO has this in its default nova.conf: lock_path = /var/lib/nova/tmp So, yes - this is all terrible. IMHO, rather than raise an exception we should log a big fat warning about relying on the default and perhaps just treat the lock as an in-process lock in that case ... since that's essentially what it was before, right? So a default of lock_path = /tmp will work (FHS says that path has to be there), even if not optimal. Could we make it a default value like that instead of the current default which is null (and hence the problem). IIRC, my initial fix was something similar to that, but it got shot down because putting the lock files in a known world writeable location was a security issue. Although maybe if we put them in a subdirectory of /tmp and ensured that the permissions were such that only the user running the service could use that directory, it might be acceptable? We could still log a warning if we wanted. This seems like it would have implications for people running services on Windows too, but we can probably find a way to make that work if we decide on a solution. How is that a security issue? Are the lock files being written with some sensitive data in them and have g or o permissions on? The sticky bit (+t) on /tmp will prevent other users from deleting the file. Right, but it won't prevent users from creating a symlink with the same name. ln -s /var/lib/nova/instances/x/image.raw /tmp/well.known.location Now when you do with open('/tmp/well.known.location', 'w') as lockfile: lockfile.write('Stuff') Nova has just truncated the image file and written 'Stuff' to it. So that's the generic case (and the way people often write this). But the oslo lockutils code doesn't work that way. While it does open the file for write, it does not actually write, it's using fcntl to hold locks. That's taking a data structure on the fd in kernel memory (IIRC), so it correctly gives it up if the process crashes. I'm not saying there isn't some other possible security vulnerability here as well, but it's not jumping out at me. So I'd love to understand that, because if we can close that exposure we can provide a working default, plus a strong recommendation for how to do that *right*. I'd be totally happy with printing WARNING level at startup if lock_path = /tmp that this should be adjusted. Full disclosure: I don't claim to be a security expert, so take my thoughts on this with a grain of salt. tldr: I still don't see a way to do this without breaking _something_. Unfortunately, while we don't actually write to the file, just opening it for
Re: [openstack-dev] creating a default for oslo config variables within a project?
Excerpts from Sean Dague's message of 2013-12-04 10:51:16 -0800: On 12/04/2013 11:56 AM, Ben Nemec wrote: On 2013-12-04 06:07, Sean Dague wrote: On 12/03/2013 11:21 PM, Clint Byrum wrote: Excerpts from Sean Dague's message of 2013-12-03 16:05:47 -0800: On 12/03/2013 06:13 PM, Ben Nemec wrote: On 2013-12-03 17:09, Sean Dague wrote: On 12/03/2013 05:50 PM, Mark McLoughlin wrote: On Tue, 2013-12-03 at 16:23 -0600, Ben Nemec wrote: On 2013-12-03 15:56, Sean Dague wrote: This cinder patch - https://review.openstack.org/#/c/48935/ Is blocked on failing upgrade because the updated oslo lockutils won't function until there is a specific configuration variable added to the cinder.conf. That work around is proposed here - https://review.openstack.org/#/c/52070/3 However I think this is exactly the kind of forward breaks that we want to prevent with grenade, as cinder failing to function after a rolling upgrade because a config item wasn't added is exactly the kind of pain we are trying to prevent happening to ops. So the question is, how is this done correctly so that a default can be set in the cinder code for this value, and it not require a config change to work? You're absolutely correct, in principle - if the default value for lock_path worked for users before, we should absolutely continue to support it. I don't know that I have a good answer on how to handle this, but for context this change is the result of a nasty bug in lockutils that meant external locks were doing nothing if lock_path wasn't set. Basically it's something we should never have allowed in the first place. As far as setting this in code, it's important that all of the processes for a service are using the same value to avoid the same bad situation we were in before. For tests, we have a lockutils wrapper (https://github.com/openstack/oslo-incubator/blob/master/openstack/common/lockutils.py#L282) that sets an environment variable to address this, but that only works if all of the processes are going to be spawned from within the same wrapper, and I'm not sure how secure that is for production deployments since it puts all of the lock files in a temporary directory. Right, I don't think the previous default really worked - if you used the default, then external locking was broken. I suspect most distros do set a default - I see RDO has this in its default nova.conf: lock_path = /var/lib/nova/tmp So, yes - this is all terrible. IMHO, rather than raise an exception we should log a big fat warning about relying on the default and perhaps just treat the lock as an in-process lock in that case ... since that's essentially what it was before, right? So a default of lock_path = /tmp will work (FHS says that path has to be there), even if not optimal. Could we make it a default value like that instead of the current default which is null (and hence the problem). IIRC, my initial fix was something similar to that, but it got shot down because putting the lock files in a known world writeable location was a security issue. Although maybe if we put them in a subdirectory of /tmp and ensured that the permissions were such that only the user running the service could use that directory, it might be acceptable? We could still log a warning if we wanted. This seems like it would have implications for people running services on Windows too, but we can probably find a way to make that work if we decide on a solution. How is that a security issue? Are the lock files being written with some sensitive data in them and have g or o permissions on? The sticky bit (+t) on /tmp will prevent other users from deleting the file. Right, but it won't prevent users from creating a symlink with the same name. ln -s /var/lib/nova/instances/x/image.raw /tmp/well.known.location Now when you do with open('/tmp/well.known.location', 'w') as lockfile: lockfile.write('Stuff') Nova has just truncated the image file and written 'Stuff' to it. So that's the generic case (and the way people often write this). But the oslo lockutils code doesn't work that way. While it does open the file for write, it does not actually write, it's using fcntl to hold locks. That's taking a data structure on the fd in kernel memory (IIRC), so it correctly gives it up if the process crashes. I'm not saying there isn't some other possible security vulnerability here as well, but it's not jumping out at me. So I'd love to understand that, because if we can close that exposure we can provide a working default, plus a strong recommendation for how to do that *right*. I'd be totally happy with printing WARNING level at startup if lock_path = /tmp that this should be adjusted. Full disclosure: I don't claim to be a security expert, so take my
Re: [openstack-dev] creating a default for oslo config variables within a project?
Should we update oslo-incubator first to? default=os.environ.get(OSLO_LOCK_PATH) or tempfile.mkdtemp() -- dims On Tue, Dec 3, 2013 at 4:56 PM, Sean Dague s...@dague.net wrote: This cinder patch - https://review.openstack.org/#/c/48935/ Is blocked on failing upgrade because the updated oslo lockutils won't function until there is a specific configuration variable added to the cinder.conf. That work around is proposed here - https://review.openstack.org/#/c/52070/3 However I think this is exactly the kind of forward breaks that we want to prevent with grenade, as cinder failing to function after a rolling upgrade because a config item wasn't added is exactly the kind of pain we are trying to prevent happening to ops. So the question is, how is this done correctly so that a default can be set in the cinder code for this value, and it not require a config change to work? -Sean -- Sean Dague http://dague.net ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Davanum Srinivas :: http://davanum.wordpress.com ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] creating a default for oslo config variables within a project?
On 2013-12-03 15:56, Sean Dague wrote: This cinder patch - https://review.openstack.org/#/c/48935/ Is blocked on failing upgrade because the updated oslo lockutils won't function until there is a specific configuration variable added to the cinder.conf. That work around is proposed here - https://review.openstack.org/#/c/52070/3 However I think this is exactly the kind of forward breaks that we want to prevent with grenade, as cinder failing to function after a rolling upgrade because a config item wasn't added is exactly the kind of pain we are trying to prevent happening to ops. So the question is, how is this done correctly so that a default can be set in the cinder code for this value, and it not require a config change to work? I don't know that I have a good answer on how to handle this, but for context this change is the result of a nasty bug in lockutils that meant external locks were doing nothing if lock_path wasn't set. Basically it's something we should never have allowed in the first place. As far as setting this in code, it's important that all of the processes for a service are using the same value to avoid the same bad situation we were in before. For tests, we have a lockutils wrapper (https://github.com/openstack/oslo-incubator/blob/master/openstack/common/lockutils.py#L282) that sets an environment variable to address this, but that only works if all of the processes are going to be spawned from within the same wrapper, and I'm not sure how secure that is for production deployments since it puts all of the lock files in a temporary directory. -Ben ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] creating a default for oslo config variables within a project?
We can't do that. It used to behave that way, which led to https://bugs.launchpad.net/cinder/+bug/1065531 -Ben On 2013-12-03 16:22, Davanum Srinivas wrote: Should we update oslo-incubator first to? default=os.environ.get(OSLO_LOCK_PATH) or tempfile.mkdtemp() -- dims On Tue, Dec 3, 2013 at 4:56 PM, Sean Dague s...@dague.net wrote: This cinder patch - https://review.openstack.org/#/c/48935/ Is blocked on failing upgrade because the updated oslo lockutils won't function until there is a specific configuration variable added to the cinder.conf. That work around is proposed here - https://review.openstack.org/#/c/52070/3 However I think this is exactly the kind of forward breaks that we want to prevent with grenade, as cinder failing to function after a rolling upgrade because a config item wasn't added is exactly the kind of pain we are trying to prevent happening to ops. So the question is, how is this done correctly so that a default can be set in the cinder code for this value, and it not require a config change to work? -Sean -- Sean Dague http://dague.net ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] creating a default for oslo config variables within a project?
On 12/03/2013 05:50 PM, Mark McLoughlin wrote: On Tue, 2013-12-03 at 16:23 -0600, Ben Nemec wrote: On 2013-12-03 15:56, Sean Dague wrote: This cinder patch - https://review.openstack.org/#/c/48935/ Is blocked on failing upgrade because the updated oslo lockutils won't function until there is a specific configuration variable added to the cinder.conf. That work around is proposed here - https://review.openstack.org/#/c/52070/3 However I think this is exactly the kind of forward breaks that we want to prevent with grenade, as cinder failing to function after a rolling upgrade because a config item wasn't added is exactly the kind of pain we are trying to prevent happening to ops. So the question is, how is this done correctly so that a default can be set in the cinder code for this value, and it not require a config change to work? You're absolutely correct, in principle - if the default value for lock_path worked for users before, we should absolutely continue to support it. I don't know that I have a good answer on how to handle this, but for context this change is the result of a nasty bug in lockutils that meant external locks were doing nothing if lock_path wasn't set. Basically it's something we should never have allowed in the first place. As far as setting this in code, it's important that all of the processes for a service are using the same value to avoid the same bad situation we were in before. For tests, we have a lockutils wrapper (https://github.com/openstack/oslo-incubator/blob/master/openstack/common/lockutils.py#L282) that sets an environment variable to address this, but that only works if all of the processes are going to be spawned from within the same wrapper, and I'm not sure how secure that is for production deployments since it puts all of the lock files in a temporary directory. Right, I don't think the previous default really worked - if you used the default, then external locking was broken. I suspect most distros do set a default - I see RDO has this in its default nova.conf: lock_path = /var/lib/nova/tmp So, yes - this is all terrible. IMHO, rather than raise an exception we should log a big fat warning about relying on the default and perhaps just treat the lock as an in-process lock in that case ... since that's essentially what it was before, right? So a default of lock_path = /tmp will work (FHS says that path has to be there), even if not optimal. Could we make it a default value like that instead of the current default which is null (and hence the problem). -Sean -- Sean Dague http://dague.net ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] creating a default for oslo config variables within a project?
On 2013-12-03 17:09, Sean Dague wrote: On 12/03/2013 05:50 PM, Mark McLoughlin wrote: On Tue, 2013-12-03 at 16:23 -0600, Ben Nemec wrote: On 2013-12-03 15:56, Sean Dague wrote: This cinder patch - https://review.openstack.org/#/c/48935/ Is blocked on failing upgrade because the updated oslo lockutils won't function until there is a specific configuration variable added to the cinder.conf. That work around is proposed here - https://review.openstack.org/#/c/52070/3 However I think this is exactly the kind of forward breaks that we want to prevent with grenade, as cinder failing to function after a rolling upgrade because a config item wasn't added is exactly the kind of pain we are trying to prevent happening to ops. So the question is, how is this done correctly so that a default can be set in the cinder code for this value, and it not require a config change to work? You're absolutely correct, in principle - if the default value for lock_path worked for users before, we should absolutely continue to support it. I don't know that I have a good answer on how to handle this, but for context this change is the result of a nasty bug in lockutils that meant external locks were doing nothing if lock_path wasn't set. Basically it's something we should never have allowed in the first place. As far as setting this in code, it's important that all of the processes for a service are using the same value to avoid the same bad situation we were in before. For tests, we have a lockutils wrapper (https://github.com/openstack/oslo-incubator/blob/master/openstack/common/lockutils.py#L282) that sets an environment variable to address this, but that only works if all of the processes are going to be spawned from within the same wrapper, and I'm not sure how secure that is for production deployments since it puts all of the lock files in a temporary directory. Right, I don't think the previous default really worked - if you used the default, then external locking was broken. I suspect most distros do set a default - I see RDO has this in its default nova.conf: lock_path = /var/lib/nova/tmp So, yes - this is all terrible. IMHO, rather than raise an exception we should log a big fat warning about relying on the default and perhaps just treat the lock as an in-process lock in that case ... since that's essentially what it was before, right? So a default of lock_path = /tmp will work (FHS says that path has to be there), even if not optimal. Could we make it a default value like that instead of the current default which is null (and hence the problem). IIRC, my initial fix was something similar to that, but it got shot down because putting the lock files in a known world writeable location was a security issue. Although maybe if we put them in a subdirectory of /tmp and ensured that the permissions were such that only the user running the service could use that directory, it might be acceptable? We could still log a warning if we wanted. This seems like it would have implications for people running services on Windows too, but we can probably find a way to make that work if we decide on a solution. -Ben ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] creating a default for oslo config variables within a project?
On 12/03/2013 06:13 PM, Ben Nemec wrote: On 2013-12-03 17:09, Sean Dague wrote: On 12/03/2013 05:50 PM, Mark McLoughlin wrote: On Tue, 2013-12-03 at 16:23 -0600, Ben Nemec wrote: On 2013-12-03 15:56, Sean Dague wrote: This cinder patch - https://review.openstack.org/#/c/48935/ Is blocked on failing upgrade because the updated oslo lockutils won't function until there is a specific configuration variable added to the cinder.conf. That work around is proposed here - https://review.openstack.org/#/c/52070/3 However I think this is exactly the kind of forward breaks that we want to prevent with grenade, as cinder failing to function after a rolling upgrade because a config item wasn't added is exactly the kind of pain we are trying to prevent happening to ops. So the question is, how is this done correctly so that a default can be set in the cinder code for this value, and it not require a config change to work? You're absolutely correct, in principle - if the default value for lock_path worked for users before, we should absolutely continue to support it. I don't know that I have a good answer on how to handle this, but for context this change is the result of a nasty bug in lockutils that meant external locks were doing nothing if lock_path wasn't set. Basically it's something we should never have allowed in the first place. As far as setting this in code, it's important that all of the processes for a service are using the same value to avoid the same bad situation we were in before. For tests, we have a lockutils wrapper (https://github.com/openstack/oslo-incubator/blob/master/openstack/common/lockutils.py#L282) that sets an environment variable to address this, but that only works if all of the processes are going to be spawned from within the same wrapper, and I'm not sure how secure that is for production deployments since it puts all of the lock files in a temporary directory. Right, I don't think the previous default really worked - if you used the default, then external locking was broken. I suspect most distros do set a default - I see RDO has this in its default nova.conf: lock_path = /var/lib/nova/tmp So, yes - this is all terrible. IMHO, rather than raise an exception we should log a big fat warning about relying on the default and perhaps just treat the lock as an in-process lock in that case ... since that's essentially what it was before, right? So a default of lock_path = /tmp will work (FHS says that path has to be there), even if not optimal. Could we make it a default value like that instead of the current default which is null (and hence the problem). IIRC, my initial fix was something similar to that, but it got shot down because putting the lock files in a known world writeable location was a security issue. Although maybe if we put them in a subdirectory of /tmp and ensured that the permissions were such that only the user running the service could use that directory, it might be acceptable? We could still log a warning if we wanted. This seems like it would have implications for people running services on Windows too, but we can probably find a way to make that work if we decide on a solution. How is that a security issue? Are the lock files being written with some sensitive data in them and have g or o permissions on? The sticky bit (+t) on /tmp will prevent other users from deleting the file. (from the chmod man page): RESTRICTED DELETION FLAG OR STICKY BIT The restricted deletion flag or sticky bit is a single bit, whose interpretation depends on the file type. For directories, it prevents unprivi‐ leged users from removing or renaming a file in the directory unless they own the file or the directory; this is called the restricted deletion flag for the directory, and is commonly found on world-writable directories like /tmp. For regular files on some older systems, the bit saves the program's text image on the swap device so it will load more quickly when run; this is called the sticky bit. -Sean -- Sean Dague http://dague.net signature.asc Description: OpenPGP digital signature ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] creating a default for oslo config variables within a project?
Excerpts from Sean Dague's message of 2013-12-03 16:05:47 -0800: On 12/03/2013 06:13 PM, Ben Nemec wrote: On 2013-12-03 17:09, Sean Dague wrote: On 12/03/2013 05:50 PM, Mark McLoughlin wrote: On Tue, 2013-12-03 at 16:23 -0600, Ben Nemec wrote: On 2013-12-03 15:56, Sean Dague wrote: This cinder patch - https://review.openstack.org/#/c/48935/ Is blocked on failing upgrade because the updated oslo lockutils won't function until there is a specific configuration variable added to the cinder.conf. That work around is proposed here - https://review.openstack.org/#/c/52070/3 However I think this is exactly the kind of forward breaks that we want to prevent with grenade, as cinder failing to function after a rolling upgrade because a config item wasn't added is exactly the kind of pain we are trying to prevent happening to ops. So the question is, how is this done correctly so that a default can be set in the cinder code for this value, and it not require a config change to work? You're absolutely correct, in principle - if the default value for lock_path worked for users before, we should absolutely continue to support it. I don't know that I have a good answer on how to handle this, but for context this change is the result of a nasty bug in lockutils that meant external locks were doing nothing if lock_path wasn't set. Basically it's something we should never have allowed in the first place. As far as setting this in code, it's important that all of the processes for a service are using the same value to avoid the same bad situation we were in before. For tests, we have a lockutils wrapper (https://github.com/openstack/oslo-incubator/blob/master/openstack/common/lockutils.py#L282) that sets an environment variable to address this, but that only works if all of the processes are going to be spawned from within the same wrapper, and I'm not sure how secure that is for production deployments since it puts all of the lock files in a temporary directory. Right, I don't think the previous default really worked - if you used the default, then external locking was broken. I suspect most distros do set a default - I see RDO has this in its default nova.conf: lock_path = /var/lib/nova/tmp So, yes - this is all terrible. IMHO, rather than raise an exception we should log a big fat warning about relying on the default and perhaps just treat the lock as an in-process lock in that case ... since that's essentially what it was before, right? So a default of lock_path = /tmp will work (FHS says that path has to be there), even if not optimal. Could we make it a default value like that instead of the current default which is null (and hence the problem). IIRC, my initial fix was something similar to that, but it got shot down because putting the lock files in a known world writeable location was a security issue. Although maybe if we put them in a subdirectory of /tmp and ensured that the permissions were such that only the user running the service could use that directory, it might be acceptable? We could still log a warning if we wanted. This seems like it would have implications for people running services on Windows too, but we can probably find a way to make that work if we decide on a solution. How is that a security issue? Are the lock files being written with some sensitive data in them and have g or o permissions on? The sticky bit (+t) on /tmp will prevent other users from deleting the file. Right, but it won't prevent users from creating a symlink with the same name. ln -s /var/lib/nova/instances/x/image.raw /tmp/well.known.location Now when you do with open('/tmp/well.known.location', 'w') as lockfile: lockfile.write('Stuff') Nova has just truncated the image file and written 'Stuff' to it. The typical solution is to use a lock directory, /var/run/yourprogram, that has restrictive enough permissions setup for your program to have exclusive use of it, and is created by root at boot time. That is what the packages do now. It would be good if everybody agreed on a default, %(nova_home)/locks or something, but root still must set it up with the right permissions before the program can use it. IMO that is fine, your upgrade should include a change to your init script/systemd unit/upstart job which ensures that it exists before starting. We could abstract this away with a nova-manage subcommand that is intended to be run as root but can inspect the config file. That would allow for simpler documentation and the packagers would probably love you for it. :) ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev