Re: [3.6 regression?] THP + migration/compaction livelock (I think)
Hi David, others

Results seem OK.

Recap: I have 2 6-core 64-bit Opterons and I make -j13. I do

 # echo always >/sys/kernel/mm/transparent_hugepage/enabled
 # while [ 1 ]
   do
    sleep 10
    date
    echo = vmstat
    egrep "(thp|compact)" /proc/vmstat
    echo = khugepaged stack
    cat /proc/501/stack
   done > /tmp/49361.$xxx
 # emerge icedtea

(where 501 = pidof khugepaged)

for $xxx = base = 3.6.6
and $xxx = test = 3.6.6 + diff you provided

I attach /tmp/49361.base.gz and /tmp/49361.test.gz

Note: with xxx=base, I could see

   PID USER  PR  NI  VIRT  RES SHR S  %CPU %MEM   TIME+ COMMAND
  8617 root  20   0 3620m  41m 10m S 988.3  0.5 6:19.06 javac
     1 root  20   0  4208  588 556 S   0.0  0.0 0:03.25 init

already during configure and I needed to kill -9 javac.

With xxx=test, I could see

   PID USER  PR  NI  VIRT  RES SHR S  %CPU %MEM   TIME+ COMMAND
  9275 root  20   0 2067m 474m 10m S 304.2  5.9 0:32.81 javac
   710 root   0 -20     0    0   0 S   0.3  0.0 0:01.07 kworker/0:1H

later when processing >700 java files.

Also note that with xxx=test compact_blocks_moved stays 0.

Hope this helps. Thanks, have a nice day.

On 2012 Nov 15, Marc Duponcheel wrote:
> Hi David
>
> Thanks for the changeset
>
> I will test 3.6.6 without this weekend.
>
> Have a nice day

--
 Marc Duponcheel
 Velodroomstraat 74 - 2600 Berchem - Belgium
 +32 (0)478 68.10.91 - m...@offline.be

49361.base.gz
Description: Binary data

49361.test.gz
Description: Binary data
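[Editor's note] For anyone eyeballing the two captures, the counters of interest can be pulled out mechanically. A small helper, sketched here under the assumption that the capture contains the plain `name value` lines the `egrep` in the loop above produces:

```shell
# Print the last sampled value of a vmstat counter from a capture log.
# Assumes "name value" lines, as emitted by the egrep in the loop above.
last_counter() {
    awk -v name="$1" '$1 == name { v = $2 } END { print v }' "$2"
}

# Self-contained demonstration against a tiny fake capture:
cat > /tmp/sample.log <<'EOF'
compact_blocks_moved 10
compact_stall 3
compact_blocks_moved 42
compact_stall 5
EOF
last_counter compact_blocks_moved /tmp/sample.log   # prints 42
```

Run against the gunzipped 49361.base and 49361.test logs, this gives the final value of each compact_* counter — e.g. compact_blocks_moved, which reportedly stays 0 on the test kernel.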
Re: [3.6 regression?] THP + migration/compaction livelock (I think)
Hi David

Thanks for the changeset

I will test 3.6.6 without this weekend.

Have a nice day

On 2012 Nov 14, David Rientjes wrote:
> On Wed, 14 Nov 2012, Marc Duponcheel wrote:
>
> > Hi all
> >
> > If someone can provide the patches (or teach me how to get them with
> > git (I apologise for not being git savvy)) then, this weekend, I can apply
> > them to 3.6.6 and compare before/after to check if they fix #49361.
> >
>
> I've backported all the commits that Mel quoted to 3.6.6 and appended them
> to this email as one big patch.  It should apply cleanly to your kernel.
>
> Now we are only missing these commits that weren't quoted:
>
>  - 1fb3f8ca0e92 ("mm: compaction: capture a suitable high-order page
>    immediately when it is made available"), and
>
>  - 83fde0f22872 ("mm: vmscan: scale number of pages reclaimed by
>    reclaim/compaction based on failures").
>
> Since your regression is easily reproducible, would it be possible to try
> to reproduce the issue FIRST with 3.6.6 and, if still present as it was in
> 3.6.2, then try reproducing it with the appended patch?
>
> You earlier reported that khugepaged was taking the second-most cpu time
> when this was happening, which initially pointed you to thp, so presumably
> this isn't a kswapd issue running at 100%.  If both 3.6.6 kernels fail
> (the one with and without the following patch), would it be possible to
> try Mel's suggestion of patching with
>
>  - https://lkml.org/lkml/2012/11/5/308 +
>    https://lkml.org/lkml/2012/11/12/113
>
> to see if it helps and, if not, reverting the latter and trying
>
>  - https://lkml.org/lkml/2012/11/5/308 +
>    https://lkml.org/lkml/2012/11/12/151
>
> as the final test?  This will certainly help us to find out what needs to
> be backported to 3.6 stable to prevent this issue for other users.
>
> Thanks!
> ---
>  include/linux/compaction.h      |  15 ++
>  include/linux/mmzone.h          |   6 +-
>  include/linux/pageblock-flags.h |  19 +-
>  mm/compaction.c                 | 450 ++++++++++++++++++++++-----------------
>  mm/internal.h                   |  16 +-
>  mm/page_alloc.c                 |  42 ++--
>  mm/vmscan.c                     |   8 +
>  7 files changed, 366 insertions(+), 190 deletions(-)
>
> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
> --- a/include/linux/compaction.h
> +++ b/include/linux/compaction.h
> @@ -24,6 +24,7 @@ extern unsigned long try_to_compact_pages(struct zonelist *zonelist,
> 			int order, gfp_t gfp_mask, nodemask_t *mask,
> 			bool sync, bool *contended);
>  extern int compact_pgdat(pg_data_t *pgdat, int order);
> +extern void reset_isolation_suitable(pg_data_t *pgdat);
>  extern unsigned long compaction_suitable(struct zone *zone, int order);
>
>  /* Do not skip compaction more than 64 times */
> @@ -61,6 +62,16 @@ static inline bool compaction_deferred(struct zone *zone, int order)
>  	return zone->compact_considered < defer_limit;
>  }
>
> +/* Returns true if restarting compaction after many failures */
> +static inline bool compaction_restarting(struct zone *zone, int order)
> +{
> +	if (order < zone->compact_order_failed)
> +		return false;
> +
> +	return zone->compact_defer_shift == COMPACT_MAX_DEFER_SHIFT &&
> +		zone->compact_considered >= 1UL << zone->compact_defer_shift;
> +}
> +
>  #else
>  static inline unsigned long try_to_compact_pages(struct zonelist *zonelist,
>  			int order, gfp_t gfp_mask, nodemask_t *nodemask,
> @@ -74,6 +85,10 @@ static inline int compact_pgdat(pg_data_t *pgdat, int order)
>  	return COMPACT_CONTINUE;
>  }
>
> +static inline void reset_isolation_suitable(pg_data_t *pgdat)
> +{
> +}
> +
>  static inline unsigned long compaction_suitable(struct zone *zone, int order)
>  {
>  	return COMPACT_SKIPPED;
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -369,8 +369,12 @@ struct zone {
>  	spinlock_t		lock;
>  	int			all_unreclaimable; /* All pages pinned */
>  #if defined CONFIG_COMPACTION || defined CONFIG_CMA
> -	/* pfn where the last incremental compaction isolated free pages */
> +	/* Set to true when the PG_migrate_skip bits should be cleared */
> +	bool			compact_blockskip_flush;
> +
> +	/* pfns where compaction scanners should start */
>  	unsigned long		compact_cached_free_pfn;
> +	unsigned long		compact_cached_migrate_pfn;
>  #endif
>  #ifdef CONFIG_MEMORY_HOTPLUG
>  	/* see spanned/present_pages for more description */
> diff --git a/include/linux/pageblock-flags.h b/include/linux/pageblock-flags.h
> --- a/include/linux/pageblock-flags.h
> +++ b/include/linux/pageblock-flags.h
> @@ -30,6 +30,9 @@ enum pageblock_bits {
>  	PB_migrate,
>  	PB_migrate_end = PB_migrate + 3 - 1,
>  			/* 3 bits required for migrate types */
> +#ifdef CONFIG_COMPACTION
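[Editor's note] David's "one big patch" is applied with patch(1) from the top of the 3.6.6 tree. A sketch of the mechanics on a disposable toy tree — the file names (big.patch, file.c) are purely illustrative, not from the thread:

```shell
# Demonstrate the patch -p1 mechanics on a toy tree; for the real thing,
# save the appended diff as big.patch and run the same two patch commands
# from the top of the 3.6.6 source tree.
rm -rf /tmp/toytree && mkdir -p /tmp/toytree && cd /tmp/toytree
printf 'old line\n' > file.c
cat > big.patch <<'EOF'
--- a/file.c
+++ b/file.c
@@ -1 +1 @@
-old line
+new line
EOF
patch -p1 --dry-run < big.patch    # check it applies cleanly first
patch -p1 < big.patch              # then apply for real
cat file.c                         # now reads: new line
```

The `-p1` strips the leading `a/` and `b/` path components that git-style diffs carry, which is why the commands are run from the tree's top level.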
Re: [3.6 regression?] THP + migration/compaction livelock (I think)
Hi all

If someone can provide the patches (or teach me how to get them with
git (I apologise for not being git savvy)) then, this weekend, I can apply
them to 3.6.6 and compare before/after to check if they fix #49361.

Thanks

On 2012 Nov 14, Mel Gorman wrote:
> On Tue, Nov 13, 2012 at 03:41:02PM -0800, David Rientjes wrote:
> > On Tue, 13 Nov 2012, Andy Lutomirski wrote:
> >
> > > It just happened again.
> > >
> > > $ grep -E "compact_|thp_" /proc/vmstat
> > > compact_blocks_moved 8332448774
> > > compact_pages_moved 21831286
> > > compact_pagemigrate_failed 211260
> > > compact_stall 13484
> > > compact_fail 6717
> > > compact_success 6755
> > > thp_fault_alloc 150665
> > > thp_fault_fallback 4270
> > > thp_collapse_alloc 19771
> > > thp_collapse_alloc_failed 2188
> > > thp_split 19600
> > >
> >
> > Two of the patches from the list provided at
> > http://marc.info/?l=linux-mm&m=135179005510688 are already in your 3.6.3
> > kernel:
> >
> > 	mm: compaction: abort compaction loop if lock is contended or run too long
> > 	mm: compaction: acquire the zone->lock as late as possible
> >
> > and all have not made it to the 3.6 stable kernel yet, so would it be
> > possible to try with 3.7-rc5 to see if it fixes the issue?  If so, it will
> > indicate that the entire series is a candidate to backport to 3.6.
>
> Thanks David once again.
>
> The full list of compaction-related patches I believe are necessary for
> this particular problem are
>
> e64c5237cf6ff474cb2f3f832f48f2b441dd9979 mm: compaction: abort compaction loop if lock is contended or run too long
> 3cc668f4e30fbd97b3c0574d8cac7a83903c9bc7 mm: compaction: move fatal signal check out of compact_checklock_irqsave
> 661c4cb9b829110cb68c18ea05a56be39f75a4d2 mm: compaction: update try_to_compact_pages() kerneldoc comment
> 2a1402aa044b55c2d30ab0ed9405693ef06fb07c mm: compaction: acquire the zone->lru_lock as late as possible
> f40d1e42bb988d2a26e8e111ea4c4c7bac819b7e mm: compaction: acquire the zone->lock as late as possible
> 753341a4b85ff337487b9959c71c529f522004f4 revert "mm: have order > 0 compaction start off where it left"
> bb13ffeb9f6bfeb301443994dfbf29f91117dfb3 mm: compaction: cache if a pageblock was scanned and no pages were isolated
> c89511ab2f8fe2b47585e60da8af7fd213ec877e mm: compaction: restart compaction from near where it left off
> 62997027ca5b3d4618198ed8b1aba40b61b1137b mm: compaction: clear PG_migrate_skip based on compaction and reclaim activity
> 0db63d7e25f96e2c6da925c002badf6f144ddf30 mm: compaction: correct the nr_strict va isolated check for CMA
>
> If we can get confirmation that these fix the problem in 3.6 kernels then
> I can backport them to -stable.  This fixes a problem where "many processes
> stall, all in an isolation-related function".  This started happening after
> lumpy reclaim was removed because we depended on that to aggressively
> reclaim with less compaction.  Now compaction is depended upon more.
>
> The full 3.7-rc5 kernel has a different problem on top of this and it's
> important the problems do not get conflated.  It has these fixes *but*
> GFP_NO_KSWAPD has been removed and there is a patch that scales reclaim
> with THP failures that is causing problems.  With them, kswapd can get
> stuck in a 100% loop where it is neither reclaiming nor reaching its exit
> conditions.  The correct fix would be to identify why this happens but I
> have not got around to it yet.  To test with 3.7-rc5 then apply either
>
> 1) https://lkml.org/lkml/2012/11/5/308
> 2) https://lkml.org/lkml/2012/11/12/113
>
> or
>
> 1) https://lkml.org/lkml/2012/11/5/308
> 3) https://lkml.org/lkml/2012/11/12/151
>
> on top of 3.7-rc5.  So it's a lot of work but there are three tests I'm
> interested in hearing about.  The results of each determine what happens
> in -stable or mainline:
>
> Test 1: 3.6 + the last of the commits above (should fix processes stuck in isolate)
> Test 2: 3.7-rc5 + (1+2) above (should fix kswapd stuck at 100%)
> Test 3: 3.7-rc5 + (1+3) above (should fix kswapd stuck at 100%, but better)
>
> Thanks.
>
> --
> Mel Gorman
> SUSE Labs

--
 Marc Duponcheel
 Velodroomstraat 74 - 2600 Berchem - Belgium
 +32 (0)478 68.10.91 - m...@offline.be
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
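[Editor's note] Mel's SHA list maps mechanically onto git commands. A sketch that only *prints* the export commands (so it is safe to run anywhere); piping the output to sh inside a mainline clone would build an mbox that `git am` or `patch -p1` can then apply to a v3.6.6 tree — conflicts would still need hand-resolving:

```shell
# Emit one `git format-patch` command per commit in Mel's list, oldest
# first.  The commands themselves assume a mainline kernel clone; the
# mbox file name is illustrative.
shas="
e64c5237cf6ff474cb2f3f832f48f2b441dd9979
3cc668f4e30fbd97b3c0574d8cac7a83903c9bc7
661c4cb9b829110cb68c18ea05a56be39f75a4d2
2a1402aa044b55c2d30ab0ed9405693ef06fb07c
f40d1e42bb988d2a26e8e111ea4c4c7bac819b7e
753341a4b85ff337487b9959c71c529f522004f4
bb13ffeb9f6bfeb301443994dfbf29f91117dfb3
c89511ab2f8fe2b47585e60da8af7fd213ec877e
62997027ca5b3d4618198ed8b1aba40b61b1137b
0db63d7e25f96e2c6da925c002badf6f144ddf30
"
for sha in $shas
do
    echo "git format-patch -1 --stdout $sha >> compaction-backport.mbox"
done
```

This is essentially what David did by hand for the big patch appended elsewhere in the thread.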
Re: [3.6 regression?] THP + migration/compaction livelock (I think)
On 2012 Nov 13, David Rientjes wrote:
> On Wed, 14 Nov 2012, Marc Duponcheel wrote:
>
> > Hi all, please let me know if there are patches you want me to try.
> >
> > FWIW time did not stand still and I run 3.6.6 now.
>
> Hmm, interesting since there are no core VM changes between 3.6.2, the
> kernel you ran into problems with, and 3.6.6.

Hi David

I have not tried yet to repro #49361 on 3.6.6, but, as you say, if there
are no core VM changes, I am confident I can do so just by doing

 # echo always > /sys/kernel/mm/transparent_hugepage/enabled

I am at your disposal to test further, and, if there are patches, to try
them out.

Note that I only once experienced a crash for which I could not find
relevant info in logs.  But the hanging processes issue could always be
reproduced consistently.

have a nice day

--
 Marc Duponcheel
 Velodroomstraat 74 - 2600 Berchem - Belgium
 +32 (0)478 68.10.91 - m...@offline.be
Re: [3.6 regression?] THP + migration/compaction livelock (I think)
Hi all, please let me know if there are patches you want me to try.

FWIW time did not stand still and I run 3.6.6 now.

On 2012 Nov 13, David Rientjes wrote:
> On Tue, 13 Nov 2012, Andy Lutomirski wrote:
>
> > >> $ grep -E "compact_|thp_" /proc/vmstat
> > >> compact_blocks_moved 8332448774
> > >> compact_pages_moved 21831286
> > >> compact_pagemigrate_failed 211260
> > >> compact_stall 13484
> > >> compact_fail 6717
> > >> compact_success 6755
> > >> thp_fault_alloc 150665
> > >> thp_fault_fallback 4270
> > >> thp_collapse_alloc 19771
> > >> thp_collapse_alloc_failed 2188
> > >> thp_split 19600
> > >>
> > >
> > > Two of the patches from the list provided at
> > > http://marc.info/?l=linux-mm&m=135179005510688 are already in your 3.6.3
> > > kernel:
> > >
> > > 	mm: compaction: abort compaction loop if lock is contended or run too long
> > > 	mm: compaction: acquire the zone->lock as late as possible
> > >
> > > and all have not made it to the 3.6 stable kernel yet, so would it be
> > > possible to try with 3.7-rc5 to see if it fixes the issue?  If so, it will
> > > indicate that the entire series is a candidate to backport to 3.6.
> >
> > I'll try later on.  The last time I tried to boot 3.7 on this box, it
> > failed impressively (presumably due to a localmodconfig bug, but I
> > haven't tracked it down yet).
> >
> > I'm also not sure how reliably I can reproduce this.
>
> The challenge goes out to Marc too since he reported this issue on 3.6.2
> but we haven't heard back yet on the success of the backport (although
> it's probably easier to try 3.7-rc5 since there are some conflicts to
> resolve).

--
 Marc Duponcheel
 Velodroomstraat 74 - 2600 Berchem - Belgium
 +32 (0)478 68.10.91 - m...@offline.be
Re: [PATCH] (was: [OOPS] dquot_transfer() - 2.4.0-test8)
Hi Martin

> well, was a little bit too pessimistic.  After some look at the code
> I'm pretty sure the obvious check will solve it - successfully tested
> on local UP box.

FYI: your patch made my 2 (quota enabled) boxes happy (they did not boot
2.4.0-test8 to completion)

Thanks!

--
 Marc Duponcheel.   [work] [EMAIL PROTECTED]
                    [home] [EMAIL PROTECTED]