On 11/30/2015 03:01 PM, Robert Collins wrote:
> On 1 December 2015 at 08:37, Ben Nemec <openst...@nemebean.com> wrote:
>> On 11/30/2015 12:42 PM, Joshua Harlow wrote:
>>> Hi all,
>>>
>>> I just wanted to bring up an issue, possible solution and get feedback
>>> on it from folks because it seems to be an on-going problem that shows
>>> up not when an application is initially deployed but as on-going
>>> operation and running of that application proceeds (i.e. after running for
>>> a period of time).
>>>
>>> The gist of the problem is the following:
>>>
>>> A <<pick your favorite openstack project>> has a need to ensure that no
>>> application on the same machine can manipulate a given resource on that
>>> same machine, so it uses the lock file pattern (acquire a *local* lock
>>> file for that resource, manipulate that resource, release that lock
>>> file) to do actions on that resource in a safe manner (note this does
>>> not ensure safety outside of that machine, lock files are *not*
>>> distributed locks).
>>>
>>> The api that we expose from oslo is typically accessed via the following:
>>>
>>> oslo_concurrency.lockutils.synchronized(name, lock_file_prefix=None,
>>> external=False, lock_path=None, semaphores=None, delay=0.01)
>>>
>>> or via its underlying library (that I extracted from oslo.concurrency
>>> and have improved to add more usefulness) @
>>> http://fasteners.readthedocs.org/
>>>
>>> The issue though for <<your favorite openstack project>> is that each of
>>> these projects now typically has a large number of lock files that exist
>>> or have existed and no easy way to determine when those lock files can
>>> be deleted (afaik no? periodic task exists in said projects to clean up
>>> lock files, or to delete them when they are no longer in use...)
>>> so what happens is bugs like https://bugs.launchpad.net/cinder/+bug/1432387
>>> appear and there is not a simple solution to clean lock files up (since
>>> oslo.concurrency is really not the right layer to know when a lock can
>>> or can not be deleted, only the application knows that...)
>>>
>>> So then we get a few creative solutions like the following:
>>>
>>> - https://review.openstack.org/#/c/241663/
>>> - https://review.openstack.org/#/c/239678/
>>> - (and others?)
>>>
>>> So I wanted to ask the question, how are people involved in <<your
>>> favorite openstack project>> cleaning up these files (are they at all?)
>>>
>>> Another idea that I have been proposing also is to use offset locks.
>>>
>>> This would allow for not creating X lock files, but create a *single*
>>> lock file per project and use offsets into it as the way to lock. For
>>> example nova could/would create a 1MB (or larger/smaller) *empty* file
>>> for locks, that would allow for 1,048,576 locks to be used at the same
>>> time, which honestly should be way more than enough, and then there
>>> would not need to be any lock cleanup at all... Is there any reason this
>>> wasn't initially done way back when this lock file code was created?
>>> (https://github.com/harlowja/fasteners/pull/10 adds this functionality
>>> to the underlying library if people want to look it over)
>>
>> I think the main reason was that even with a million locks available,
>> you'd have to find a way to hash the lock names to offsets in the file,
>> and a million isn't a very large collision space for that. Having two
>> differently named locks that hashed to the same offset would lead to
>> incredibly confusing bugs.
>>
>> We could switch to requiring the projects to provide the offsets instead
>> of hashing a string value, but that's just pushing the collision problem
>> off onto every project that uses us.
>>
>> So that's the problem as I understand it, but where does that leave us
>> for solutions?
>> First, there's
>> https://github.com/openstack/oslo.concurrency/blob/master/oslo_concurrency/lockutils.py#L151
>> which allows consumers to delete lock files when they're done with them.
>> Of course, in that case the onus is on the caller to make sure the lock
>> couldn't possibly be in use anymore.
>>
>> Second, is this actually a problem? Modern filesystems have absurdly
>> large limits on the number of files in a directory, so it's highly
>> unlikely we would ever exhaust that, and we're creating all zero byte
>> files so there shouldn't be a significant space impact either. In the
>> past I believe our recommendation has been to simply create a cleanup
>> job that runs on boot, before any of the OpenStack services start, that
>> deletes all of the lock files. At that point you know it's safe to
>> delete them, and it prevents your lock file directory from growing forever.
>
> Not that high - ext3 (still the default for nova ephemeral
> partitions!) has a limit of 64k in one directory.
>
> That said, I don't disagree - my thinking is that we should advise
> putting such files on a tmpfs.
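
For reference, the boot-time cleanup job described in the quoted text above could be as small as the following. This is only a sketch: `purge_lock_files` and the directory passed to it are hypothetical names, not something oslo.concurrency ships, and it is only safe under the assumption stated in the thread, namely that it runs before any service that takes these locks has started.

```python
import os


def purge_lock_files(lock_dir):
    """Delete every regular file directly inside lock_dir.

    Hypothetical helper: assumes no process can be holding any of these
    locks yet (for example, it runs at boot before the services start).
    Subdirectories and symlinks are left alone.
    Returns the sorted names of the files that were removed.
    """
    # Collect the names first so the directory is not modified mid-scan.
    with os.scandir(lock_dir) as entries:
        names = [e.name for e in entries if e.is_file(follow_symlinks=False)]
    for name in names:
        os.unlink(os.path.join(lock_dir, name))
    return sorted(names)
```

Run once per boot against whatever directory the service uses as its `lock_path`; on a tmpfs the directory starts out empty after a reboot anyway, so the job becomes a no-op there.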

So, I think the issue really is that the named external locks were
originally thought to be handling some pretty sensitive critical
sections. Both cinder / nova have fewer than 20 such named locks.

Cinder uses a parametrized version for all volume operations -
https://github.com/openstack/cinder/blob/7fb767f2d652f070a20fd70d92585d61e56f3a50/cinder/volume/manager.py#L143

Nova also does something similar in image cache
https://github.com/openstack/nova/blob/1734ce7101982dd95f8fab1ab4815bd258a33744/nova/virt/libvirt/imagecache.py#L169

I honestly didn't realize that the lock files weren't being deleted when
the lock is released, due to the implementation details. It seems like a
busy wait try_lock / sleep might be better here than the open-ended
blocking, as it would let us stay on top of the cleanup, at the cost of a
small amount of performance when contending for locks. But, honestly,
most of these aren't performance critical bits, they are there for
safety from corruption.

	-Sean

-- 
Sean Dague
http://dague.net

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
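
To make the try_lock / sleep idea in the reply above concrete, here is a minimal sketch of a polling lock that deletes its own file on release. It is not how oslo.concurrency or fasteners behave today. `polling_file_lock` and `_same_file` are hypothetical names, the code assumes a POSIX system with `fcntl.flock`, and the inode re-check after acquiring is an assumption about what is needed so that a waiter never ends up holding a lock on a file that has already been unlinked.

```python
import fcntl
import os
import time
from contextlib import contextmanager


def _same_file(fd, path):
    """Return True if path still names the inode that fd has open."""
    try:
        on_disk = os.stat(path)
    except FileNotFoundError:
        return False
    opened = os.fstat(fd)
    return (opened.st_dev, opened.st_ino) == (on_disk.st_dev, on_disk.st_ino)


@contextmanager
def polling_file_lock(path, delay=0.01, timeout=None):
    """Hypothetical try-lock/sleep lock that removes its file on release."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        # Reopen on every attempt so we never wait on a stale, deleted file.
        fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            # Someone else holds the lock: back off and try again.
            os.close(fd)
            if deadline is not None and time.monotonic() >= deadline:
                raise TimeoutError("could not lock %s" % path)
            time.sleep(delay)
            continue
        except OSError:
            os.close(fd)
            raise
        if _same_file(fd, path):
            break
        # We locked a file the previous holder already unlinked; retry.
        os.close(fd)
    try:
        yield
    finally:
        os.unlink(path)  # delete while still holding the lock
        os.close(fd)     # closing the descriptor releases the flock
```

Because each attempt reopens the path, a waiter never sits blocked on a descriptor for a file that has since been deleted, which is what makes deleting on release workable. The price is up to `delay` seconds of extra latency under contention, which matches the trade-off described above.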