On 12/04/2013 11:56 AM, Ben Nemec wrote: > On 2013-12-04 06:07, Sean Dague wrote: >> On 12/03/2013 11:21 PM, Clint Byrum wrote: >>> Excerpts from Sean Dague's message of 2013-12-03 16:05:47 -0800: >>>> On 12/03/2013 06:13 PM, Ben Nemec wrote: >>>>> On 2013-12-03 17:09, Sean Dague wrote: >>>>>> On 12/03/2013 05:50 PM, Mark McLoughlin wrote: >>>>>>> On Tue, 2013-12-03 at 16:23 -0600, Ben Nemec wrote: >>>>>>>> On 2013-12-03 15:56, Sean Dague wrote: >>>>>>>>> This cinder patch - https://review.openstack.org/#/c/48935/ >>>>>>>>> >>>>>>>>> Is blocked on failing upgrade because the updated oslo >>>>>>>>> lockutils won't >>>>>>>>> function until there is a specific configuration variable added >>>>>>>>> to the >>>>>>>>> cinder.conf. >>>>>>>>> >>>>>>>>> That work around is proposed here - >>>>>>>>> https://review.openstack.org/#/c/52070/3 >>>>>>>>> >>>>>>>>> However I think this is exactly the kind of forward breaks that we >>>>>>>>> want >>>>>>>>> to prevent with grenade, as cinder failing to function after a >>>>>>>>> rolling >>>>>>>>> upgrade because a config item wasn't added is exactly the kind >>>>>>>>> of pain >>>>>>>>> we are trying to prevent happening to ops. >>>>>>>>> >>>>>>>>> So the question is, how is this done correctly so that a default >>>>>>>>> can be >>>>>>>>> set in the cinder code for this value, and it not require a config >>>>>>>>> change to work? >>>>>>> >>>>>>> You're absolutely correct, in principle - if the default value for >>>>>>> lock_path worked for users before, we should absolutely continue to >>>>>>> support it. >>>>>>> >>>>>>>> I don't know that I have a good answer on how to handle this, >>>>>>>> but for >>>>>>>> context this change is the result of a nasty bug in lockutils that >>>>>>>> meant >>>>>>>> external locks were doing nothing if lock_path wasn't set. >>>>>>>> Basically >>>>>>>> it's something we should never have allowed in the first place. >>>>>>>> >>>>>>>> As far as setting this in code, it's important that all of the >>>>>>>> processes >>>>>>>> for a service are using the same value to avoid the same bad >>>>>>>> situation >>>>>>>> we were in before. For tests, we have a lockutils wrapper >>>>>>>> (https://github.com/openstack/oslo-incubator/blob/master/openstack/common/lockutils.py#L282) >>>>>>>> >>>>>>>> that sets an environment variable to address this, but that only >>>>>>>> works if all of the processes are going to be spawned from within >>>>>>>> the same wrapper, and I'm not sure how secure that is for >>>>>>>> production >>>>>>>> deployments since it puts all of the lock files in a temporary >>>>>>>> directory. >>>>>>> >>>>>>> Right, I don't think the previous default really "worked" - if >>>>>>> you used >>>>>>> the default, then external locking was broken. >>>>>>> >>>>>>> I suspect most distros do set a default - I see RDO has this in its >>>>>>> default nova.conf: >>>>>>> >>>>>>> lock_path = /var/lib/nova/tmp >>>>>>> >>>>>>> So, yes - this is all terrible. >>>>>>> >>>>>>> IMHO, rather than raise an exception we should log a big fat warning >>>>>>> about relying on the default and perhaps just treat the lock as an >>>>>>> in-process lock in that case ... since that's essentially what it >>>>>>> was >>>>>>> before, right? >>>>>> >>>>>> So a default of lock_path = /tmp will work (FHS says that path has >>>>>> to be >>>>>> there), even if not optimal. Could we make it a default value like >>>>>> that >>>>>> instead of the current default which is null (and hence the problem). >>>>> >>>>> IIRC, my initial fix was something similar to that, but it got shot >>>>> down >>>>> because putting the lock files in a known world writeable location >>>>> was a >>>>> security issue. >>>>> >>>>> Although maybe if we put them in a subdirectory of /tmp and ensured >>>>> that >>>>> the permissions were such that only the user running the service could >>>>> use that directory, it might be acceptable? We could still log a >>>>> warning if we wanted. >>>>> >>>>> This seems like it would have implications for people running services >>>>> on Windows too, but we can probably find a way to make that work if we >>>>> decide on a solution. >>>> >>>> How is that a security issue? Are the lock files being written with >>>> some >>>> sensitive data in them and have g or o permissions on? The sticky bit >>>> (+t) on /tmp will prevent other users from deleting the file. >>>> >>> >>> Right, but it won't prevent users from creating a symlink with the same >>> name. >>> >>> ln -s /var/lib/nova/instances/xxxxx/image.raw /tmp/well.known.location >>> >>> Now when you do >>> >>> with open('/tmp/well.known.location', 'w') as lockfile: >>> lockfile.write('Stuff') >>> >>> Nova has just truncated the image file and written 'Stuff' to it. >> >> So that's the generic case (and the way people often write this). But >> the oslo lockutils code doesn't work that way. While it does open the >> file for write, it does not actually write, it's using fcntl to hold >> locks. That's taking a data structure on the fd in kernel memory (IIRC), >> so it correctly gives it up if the process crashes. >> >> I'm not saying there isn't some other possible security vulnerability >> here as well, but it's not jumping out at me. So I'd love to understand >> that, because if we can close that exposure we can provide a working >> default, plus a strong recommendation for how to do that *right*. I'd be >> totally happy with printing WARNING level at startup if lock_path = /tmp >> that this should be adjusted. > > Full disclosure: I don't claim to be a security expert, so take my > thoughts on this with a grain of salt. > > tldr: I still don't see a way to do this without breaking _something_. > > Unfortunately, while we don't actually write to the file, just opening > it for write access truncates it. So there remains the issue Clint > raised if someone can make that file a link. I suppose we could check > for the bad things people might do, but I'm pretty sure that way lies > madness.
Fair point, more coffee before I make statements like that I guess :).
Follow up question: could we change it to open in "a" mode instead.
Still requires write access, but won't truncate.
> As a followup to my suggestion to have lockutils create something like
> /tmp/oslo-lock-path and make it accessible only to the owner, I think
> there's still a problem because a malicious user could create that
> directory ahead of time and when OpenStack is run for the first time it
> would happily use the existing directory and potentially overwrite any
> linked files in it. There might be ways to avoid that, but I suspect
> they would be racy (which I believe is why you're not supposed to use
> tmp paths that are known ahead of time). Granted, I'm not sure you
> should be installing to a system that already has malicious users on it,
> but I suspect security people would be less than thrilled with that answer.
>
> So the workable options that I see right now are to have it break on
> upgrade if lock_path is not set, or continue to break external locking
> by having it fall back to process locks when lock_path is not set, as
> Mark suggested. I'm not wild about either so I'd love to hear if anyone
> has a third option.
Honestly, I'd love us to be clever and figure out a not dangerous way
through this, even if unwise (where we can yell at the user in the LOGs
loudly, and fail them in J if lock_dir=/tmp) that lets us progress
through this while gracefully bringing configs into line.
-Sean
--
Sean Dague
http://dague.net
signature.asc
Description: OpenPGP digital signature
_______________________________________________ OpenStack-dev mailing list [email protected] http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
