Re: [PATCH -next] ashmem: Fix ashmem_shrink deadlock.
2013/5/17 Robert Love:
> On Thu, May 16, 2013 at 1:19 PM, Andrew Morton wrote:
>> On Thu, 16 May 2013 13:08:17 -0400 Robert Love wrote:
>>> This problem seems a rare proper use of mutex_trylock.
>>
>> Not really. The need for a trylock is often an indication that a
>> subsystem has a locking misdesign. That is indeed the case here.
>
> It is exactly the same as PF_MEMALLOC. We've got an effectively
> asynchronous event (shrinking) that can occur while you are holding
> locks requisite to that shrinking. Given that the shrinkage is best
> effort, a trylock actually communicates the intent pretty well: "If
> possible, grab this lock and shrink."
>
> I think the idiomatic fix is to introduce a GFP_SHMEM but that seems
> overkill. Lots of the GFP flags are really just preventing recursing
> into the shrinkage code and it seems ill-designed that we require
> developers to know where they might end up. But we can disagree. :)
>
>> Well, it's not exactly a ton of work, but adding a per-ashmem_area lock
>> to protect ->file would rather be putting lipstick on a pig. I suppose
>> we can put the trylock in there and run away, but it wouldn't hurt to
>> drop in a big fat comment somewhere explaining that the driver should be
>> migrated to a per-object locking scheme.
>
> Unfortunately I think ashmem_shrink would need to grab the per-object
> lock too; it needs to update the ranges. I'm sure we could re-design
> this but I don't think it is as easy as simply pushing the locking
> into the objects.
>
> Robert

Hi all,

I am wondering if this is fixed in the latest kernel? We are still
seeing this deadlock.

Best Regards,
Raul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH -next] ashmem: Fix ashmem_shrink deadlock.
On Thu, May 16, 2013 at 1:19 PM, Andrew Morton wrote:
> On Thu, 16 May 2013 13:08:17 -0400 Robert Love wrote:
>> This problem seems a rare proper use of mutex_trylock.
>
> Not really. The need for a trylock is often an indication that a
> subsystem has a locking misdesign. That is indeed the case here.

It is exactly the same as PF_MEMALLOC. We've got an effectively
asynchronous event (shrinking) that can occur while you are holding
locks requisite to that shrinking. Given that the shrinkage is best
effort, a trylock actually communicates the intent pretty well: "If
possible, grab this lock and shrink."

I think the idiomatic fix is to introduce a GFP_SHMEM but that seems
overkill. Lots of the GFP flags are really just preventing recursing
into the shrinkage code and it seems ill-designed that we require
developers to know where they might end up. But we can disagree. :)

> Well, it's not exactly a ton of work, but adding a per-ashmem_area lock
> to protect ->file would rather be putting lipstick on a pig. I suppose
> we can put the trylock in there and run away, but it wouldn't hurt to
> drop in a big fat comment somewhere explaining that the driver should be
> migrated to a per-object locking scheme.

Unfortunately I think ashmem_shrink would need to grab the per-object
lock too; it needs to update the ranges. I'm sure we could re-design
this but I don't think it is as easy as simply pushing the locking
into the objects.

Robert
Re: [PATCH -next] ashmem: Fix ashmem_shrink deadlock.
On Thu, 16 May 2013 13:08:17 -0400 Robert Love wrote:
> On Thu, May 16, 2013 at 12:45 PM, Andrew Morton wrote:
>> A better approach would be to add a new __GFP_NOSHRINKERS, but it's all
>> variations on a theme.
>
> I don't like this proposal, either. Many of the existing GFP flags
> already exist to prevent recursing into that flag's respective shrinker.
>
> This problem seems a rare proper use of mutex_trylock.

Not really. The need for a trylock is often an indication that a
subsystem has a locking misdesign. That is indeed the case here.

>> The mutex_trylock(ashmem_mutex) will actually have the best
>> performance, because it skips the least amount of memory reclaim
>> opportunities.
>
> Right.
>
>> But it still sucks! The real problem is that there exists a lock
>> called "ashmem_mutex", taken by both the high-level mmap() and by the
>> low-level shrinker. And taken by everything else too! The ashmem
>> locking is pretty crude...
>
> The locking is "crude" because I optimized for space, not time, and
> there was (and is) no indication we were suffering lock contention due
> to the global lock. I haven't thought through the implications of
> pushing locking into the ashmem_area and ashmem_range objects, but it
> does look like we'd end up often grabbing all of the locks ...
>
>> What is the mutex_lock() in ashmem_mmap() actually protecting? I don't
>> see much, apart from perhaps some incidental races around the contents
>> of the file's ashmem_area, and those could/should be protected by a
>> per-object lock, not a global one?
>
> ... but not, as you note, in ashmem_mmap. The main race there is
> around the allocation of asma->file. That could definitely be a lock
> local to ashmem_area. I'm OK if anyone wants to take that on but it
> seems a lot of work for a driver with an unclear future.

Well, it's not exactly a ton of work, but adding a per-ashmem_area lock
to protect ->file would rather be putting lipstick on a pig.

I suppose we can put the trylock in there and run away, but it wouldn't
hurt to drop in a big fat comment somewhere explaining that the driver
should be migrated to a per-object locking scheme.
Re: [PATCH -next] ashmem: Fix ashmem_shrink deadlock.
On Thu, May 16, 2013 at 12:45 PM, Andrew Morton wrote:
> A better approach would be to add a new __GFP_NOSHRINKERS, but it's all
> variations on a theme.

I don't like this proposal, either. Many of the existing GFP flags
already exist to prevent recursing into that flag's respective shrinker.

This problem seems a rare proper use of mutex_trylock.

> The mutex_trylock(ashmem_mutex) will actually have the best
> performance, because it skips the least amount of memory reclaim
> opportunities.

Right.

> But it still sucks! The real problem is that there exists a lock
> called "ashmem_mutex", taken by both the high-level mmap() and by the
> low-level shrinker. And taken by everything else too! The ashmem
> locking is pretty crude...

The locking is "crude" because I optimized for space, not time, and
there was (and is) no indication we were suffering lock contention due
to the global lock. I haven't thought through the implications of
pushing locking into the ashmem_area and ashmem_range objects, but it
does look like we'd end up often grabbing all of the locks ...

> What is the mutex_lock() in ashmem_mmap() actually protecting? I don't
> see much, apart from perhaps some incidental races around the contents
> of the file's ashmem_area, and those could/should be protected by a
> per-object lock, not a global one?

... but not, as you note, in ashmem_mmap. The main race there is
around the allocation of asma->file. That could definitely be a lock
local to ashmem_area. I'm OK if anyone wants to take that on but it
seems a lot of work for a driver with an unclear future.

Robert
Re: [PATCH -next] ashmem: Fix ashmem_shrink deadlock.
On Thu, 16 May 2013 09:44:49 -0400 Robert Love wrote:
> On Thu, May 16, 2013 at 4:15 AM, Raul Xiong wrote:
>> The issue happens in such sequence:
>> ashmem_mmap acquired ashmem_mutex --> ashmem_mutex:shmem_file_setup
>> called kmem_cache_alloc --> shrink due to low memory --> ashmem_shrink
>> tries to acquire the same ashmem_mutex -- it blocks here.
>>
>> I think this reports the bug clearly. Please have a look.
>
> There is no debate about the nature of the bug. Only the fix.
>
> My mutex_trylock patch fixes the problem. I prefer that solution.
>
> Andrew's suggestion of GFP_ATOMIC won't work as we'd have to propagate
> that down into shmem and elsewhere.

s/won't work/impractical/

A better approach would be to add a new __GFP_NOSHRINKERS, but it's all
variations on a theme.

> Using PF_MEMALLOC will work. You'd want to define something like:
>
> static int set_memalloc(void)
> {
> 	if (current->flags & PF_MEMALLOC)
> 		return 0;
> 	current->flags |= PF_MEMALLOC;
> 	return 1;
> }
>
> static void clear_memalloc(int memalloc)
> {
> 	if (memalloc)
> 		current->flags &= ~PF_MEMALLOC;
> }
>
> and then set/clear PF_MEMALLOC around every memory allocation and
> function that descends into a memory allocation. As said I prefer my
> solution but if someone wants to put together a patch with this
> approach, fine by me.

The mutex_trylock(ashmem_mutex) will actually have the best
performance, because it skips the least amount of memory reclaim
opportunities.

But it still sucks! The real problem is that there exists a lock
called "ashmem_mutex", taken by both the high-level mmap() and by the
low-level shrinker. And taken by everything else too! The ashmem
locking is pretty crude...

What is the mutex_lock() in ashmem_mmap() actually protecting? I don't
see much, apart from perhaps some incidental races around the contents
of the file's ashmem_area, and those could/should be protected by a
per-object lock, not a global one?
Re: [PATCH -next] ashmem: Fix ashmem_shrink deadlock.
On Thu, May 16, 2013 at 4:15 AM, Raul Xiong wrote:
> The issue happens in such sequence:
> ashmem_mmap acquired ashmem_mutex --> ashmem_mutex:shmem_file_setup
> called kmem_cache_alloc --> shrink due to low memory --> ashmem_shrink
> tries to acquire the same ashmem_mutex -- it blocks here.
>
> I think this reports the bug clearly. Please have a look.

There is no debate about the nature of the bug. Only the fix.

My mutex_trylock patch fixes the problem. I prefer that solution.

Andrew's suggestion of GFP_ATOMIC won't work as we'd have to propagate
that down into shmem and elsewhere.

Using PF_MEMALLOC will work. You'd want to define something like:

static int set_memalloc(void)
{
	if (current->flags & PF_MEMALLOC)
		return 0;
	current->flags |= PF_MEMALLOC;
	return 1;
}

static void clear_memalloc(int memalloc)
{
	if (memalloc)
		current->flags &= ~PF_MEMALLOC;
}

and then set/clear PF_MEMALLOC around every memory allocation and
function that descends into a memory allocation. As said I prefer my
solution but if someone wants to put together a patch with this
approach, fine by me.

Robert
Re: [PATCH -next] ashmem: Fix ashmem_shrink deadlock.
2013/5/14 Raul Xiong:
>
> 2013/5/14 Neil Zhang:
> > 2013/5/14 Greg Kroah-Hartman:
> > > On Wed, May 01, 2013 at 09:56:13AM -0400, Robert Love wrote:
> > > > Don't acquire ashmem_mutex in ashmem_shrink if we've somehow recursed
> > > > into the shrinker code from within ashmem. Just bail out, avoiding a
> > > > deadlock. This is fine, as ashmem cache pruning is advisory anyhow.
> > > >
> > > > Signed-off-by: Robert Love
> > > > ---
> > > >  drivers/staging/android/ashmem.c | 6 +-
> > > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > >
> > > Based on Andrew's review comments, I'll drop this from my queue and wait
> > > for a "better" fix for this.
> > >
> > > thanks,
> > >
> > > greg k-h
> >
> > We found the similar issue these days.
> > Add RaulXiong to paste the call stack.
> >
> > Best Regards,
> > Neil Zhang
>
> Hi all,
> I just encountered this deadlock during some stress test and it can be
> described clearly by the function stack below. Please check and suggest
> a formal fix for it.
>
> [] (__schedule) from []
> [] (schedule_preempt_disabled) from []
> [] (__mutex_lock_slowpath) from []
> [] (mutex_lock) from []
> [] (ashmem_shrink) from []
> [] (shrink_slab) from []
> [] (try_to_free_pages) from []
> [] (__alloc_pages_nodemask) from []
> [] (new_slab) from []
> [] (__slab_alloc.isra.46.constprop.52) from []
> [] (kmem_cache_alloc) from []
> [] (shmem_alloc_inode) from []
> [] (alloc_inode) from []
> [] (new_inode_pseudo) from []
> [] (new_inode) from []
> [] (shmem_get_inode) from []
> [] (shmem_file_setup) from []
> [] (ashmem_mmap) from []
> [] (mmap_region) from []
> [] (sys_mmap_pgoff) from []
>
> Thanks,
> Raul Xiong

Hi Andrew, Greg,

Any feedback?
The issue happens in such sequence:
ashmem_mmap acquired ashmem_mutex --> ashmem_mutex:shmem_file_setup
called kmem_cache_alloc --> shrink due to low memory --> ashmem_shrink
tries to acquire the same ashmem_mutex -- it blocks here.

I think this reports the bug clearly. Please have a look.

Thanks,
Raul Xiong
Re: [PATCH -next] ashmem: Fix ashmem_shrink deadlock.
2013/5/14 Neil Zhang:
> 2013/5/14 Greg Kroah-Hartman:
>> On Wed, May 01, 2013 at 09:56:13AM -0400, Robert Love wrote:
>>> Don't acquire ashmem_mutex in ashmem_shrink if we've somehow recursed
>>> into the shrinker code from within ashmem. Just bail out, avoiding a
>>> deadlock. This is fine, as ashmem cache pruning is advisory anyhow.
>>>
>>> Signed-off-by: Robert Love
>>> ---
>>>  drivers/staging/android/ashmem.c | 6 +-
>>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> Based on Andrew's review comments, I'll drop this from my queue and wait
>> for a "better" fix for this.
>>
>> thanks,
>>
>> greg k-h
>
> We found the similar issue these days.
> Add RaulXiong to paste the call stack.
>
> Best Regards,
> Neil Zhang

Hi all,
I just encountered this deadlock during some stress test and it can be
described clearly by the function stack below. Please check and suggest
a formal fix for it.
[] (__schedule) from []
[] (schedule_preempt_disabled) from []
[] (__mutex_lock_slowpath) from []
[] (mutex_lock) from []
[] (ashmem_shrink) from []
[] (shrink_slab) from []
[] (try_to_free_pages) from []
[] (__alloc_pages_nodemask) from []
[] (new_slab) from []
[] (__slab_alloc.isra.46.constprop.52) from []
[] (kmem_cache_alloc) from []
[] (shmem_alloc_inode) from []
[] (alloc_inode) from []
[] (new_inode_pseudo) from []
[] (new_inode) from []
[] (shmem_get_inode) from []
[] (shmem_file_setup) from []
[] (ashmem_mmap) from []
[] (mmap_region) from []
[] (sys_mmap_pgoff) from []

Thanks,
Raul Xiong
Re: [PATCH -next] ashmem: Fix ashmem_shrink deadlock.
2013/5/14 Greg Kroah-Hartman:
> On Wed, May 01, 2013 at 09:56:13AM -0400, Robert Love wrote:
>> Don't acquire ashmem_mutex in ashmem_shrink if we've somehow recursed
>> into the shrinker code from within ashmem. Just bail out, avoiding a
>> deadlock. This is fine, as ashmem cache pruning is advisory anyhow.
>>
>> Signed-off-by: Robert Love
>> ---
>>  drivers/staging/android/ashmem.c | 6 +-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> Based on Andrew's review comments, I'll drop this from my queue and wait
> for a "better" fix for this.
>
> thanks,
>
> greg k-h

We found the similar issue these days.
Add RaulXiong to paste the call stack.

Best Regards,
Neil Zhang
Re: [PATCH -next] ashmem: Fix ashmem_shrink deadlock.
On Wed, May 01, 2013 at 09:56:13AM -0400, Robert Love wrote:
> Don't acquire ashmem_mutex in ashmem_shrink if we've somehow recursed
> into the shrinker code from within ashmem. Just bail out, avoiding a
> deadlock. This is fine, as ashmem cache pruning is advisory anyhow.
>
> Signed-off-by: Robert Love
> ---
>  drivers/staging/android/ashmem.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)

Based on Andrew's review comments, I'll drop this from my queue and wait
for a "better" fix for this.

thanks,

greg k-h
Re: [PATCH -next] ashmem: Fix ashmem_shrink deadlock.
2013/5/14 Neil Zhang :
> 2013/5/14 Greg Kroah-Hartman :
>> On Wed, May 01, 2013 at 09:56:13AM -0400, Robert Love wrote:
>>> Don't acquire ashmem_mutex in ashmem_shrink if we've somehow recursed into the
>>> shrinker code from within ashmem. Just bail out, avoiding a deadlock. This is
>>> fine, as ashmem cache pruning is advisory anyhow.
>>>
>>> Signed-off-by: Robert Love
>>> ---
>>>  drivers/staging/android/ashmem.c | 6 +++++-
>>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> Based on Andrew's review comments, I'll drop this from my queue and wait
>> for a "better" fix for this.
>
> We have run into a similar issue recently. Adding RaulXiong so he can
> paste the call stack.
>
> Best Regards,
> Neil Zhang

Hi all,

I just hit this deadlock during a stress test; the function stack below
describes it clearly. Please take a look and suggest a proper fix for it.
[<c05d3370>] (__schedule) from [<c05d3818>]
[<c05d3818>] (schedule_preempt_disabled) from [<c05d2578>]
[<c05d2578>] (__mutex_lock_slowpath) from [<c05d263c>]
[<c05d263c>] (mutex_lock) from [<c0441dd8>]
[<c0441dd8>] (ashmem_shrink) from [<c01ae00c>]
[<c01ae00c>] (shrink_slab) from [<c01b0ec8>]
[<c01b0ec8>] (try_to_free_pages) from [<c01a65ec>]
[<c01a65ec>] (__alloc_pages_nodemask) from [<c01d0414>]
[<c01d0414>] (new_slab) from [<c05cf3a0>]
[<c05cf3a0>] (__slab_alloc.isra.46.constprop.52) from [<c01d08cc>]
[<c01d08cc>] (kmem_cache_alloc) from [<c01b1f6c>]
[<c01b1f6c>] (shmem_alloc_inode) from [<c01e8d18>]
[<c01e8d18>] (alloc_inode) from [<c01ea3c4>]
[<c01ea3c4>] (new_inode_pseudo) from [<c01ea404>]
[<c01ea404>] (new_inode) from [<c01b157c>]
[<c01b157c>] (shmem_get_inode) from [<c01b3eac>]
[<c01b3eac>] (shmem_file_setup) from [<c0441d1c>]
[<c0441d1c>] (ashmem_mmap) from [<c01c1908>]
[<c01c1908>] (mmap_region) from [<c01c1eac>]
[<c01c1eac>] (sys_mmap_pgoff) from [<c0112d80>]

Thanks,
Raul Xiong
Re: [PATCH -next] ashmem: Fix ashmem_shrink deadlock.
On Wed, 1 May 2013 09:56:13 -0400 Robert Love wrote:

> Don't acquire ashmem_mutex in ashmem_shrink if we've somehow recursed into the
> shrinker code from within ashmem. Just bail out, avoiding a deadlock. This is
> fine, as ashmem cache pruning is advisory anyhow.

Sorry, but I don't think "somehow" is an adequate description of a kernel
bug. The deadlock should be described with specificity, so that others can
understand and review the fix and perhaps suggest alternative
implementations.

Presumably someone is performing a memory allocation while holding
ashmem_mutex. A more idiomatic way of avoiding a call to direct reclaim in
these circumstances would be for the task to set its PF_MEMALLOC flag, or
to use GFP_ATOMIC. But without any details that's as far as I can go.

> --- a/drivers/staging/android/ashmem.c
> +++ b/drivers/staging/android/ashmem.c
> @@ -363,7 +363,11 @@ static int ashmem_shrink(struct shrinker *s, struct shrink_control *sc)
>  	if (!sc->nr_to_scan)
>  		return lru_count;
>
> -	mutex_lock(&ashmem_mutex);
> +	/* avoid recursing into this code from within ashmem itself */
> +	if (!mutex_trylock(&ashmem_mutex)) {
> +		return -1;
> +	}

This is rather hacky. It consumes more CPU than the above approaches, and
more stack. Worst of all, it obviously hasn't met checkpatch.pl ;)
Re: [PATCH -next] ashmem: Fix ashmem_shrink deadlock.
On Thu, May 02, 2013 at 11:22:18AM -0700, David Rientjes wrote:
> On Wed, 1 May 2013, David Rientjes wrote:
>
> > > Don't acquire ashmem_mutex in ashmem_shrink if we've somehow recursed
> > > into the shrinker code from within ashmem. Just bail out, avoiding a
> > > deadlock. This is fine, as ashmem cache pruning is advisory anyhow.
> > >
> > > Signed-off-by: Robert Love
> >
> > Any reason not to send this to sta...@vger.kernel.org if it fixes an
> > observable deadlock? (It's annotated to be applied to linux-next, but I
> > don't see any differences between it and Linus's tree.)
>
> This was sent separately to sta...@vger.kernel.org before being merged
> into Linus's tree. Greg, could this be queued up for 3.10 with a cc to
> sta...@vger.kernel.org?

Yes, I'll handle all of this properly, thanks.

greg k-h
Re: [PATCH -next] ashmem: Fix ashmem_shrink deadlock.
On Wed, 1 May 2013, David Rientjes wrote:

> > Don't acquire ashmem_mutex in ashmem_shrink if we've somehow recursed
> > into the shrinker code from within ashmem. Just bail out, avoiding a
> > deadlock. This is fine, as ashmem cache pruning is advisory anyhow.
> >
> > Signed-off-by: Robert Love
>
> Any reason not to send this to sta...@vger.kernel.org if it fixes an
> observable deadlock? (It's annotated to be applied to linux-next, but I
> don't see any differences between it and Linus's tree.)

This was sent separately to sta...@vger.kernel.org before being merged
into Linus's tree. Greg, could this be queued up for 3.10 with a cc to
sta...@vger.kernel.org?
Re: [PATCH -next] ashmem: Fix ashmem_shrink deadlock.
On Wed, 1 May 2013, Robert Love wrote:

> Don't acquire ashmem_mutex in ashmem_shrink if we've somehow recursed into the
> shrinker code from within ashmem. Just bail out, avoiding a deadlock. This is
> fine, as ashmem cache pruning is advisory anyhow.
>
> Signed-off-by: Robert Love

Any reason not to send this to sta...@vger.kernel.org if it fixes an
observable deadlock? (It's annotated to be applied to linux-next, but I
don't see any differences between it and Linus's tree.)