On 8/7/07, Jan Kiszka <[EMAIL PROTECTED]> wrote: > Gilles Chanteperdrix wrote: > > On 8/7/07, Jan Kiszka <[EMAIL PROTECTED]> wrote: > >> Gilles Chanteperdrix wrote: > >>> On 8/7/07, Jan Kiszka <[EMAIL PROTECTED]> wrote: > >>>> Gilles Chanteperdrix wrote: > >>>>> Jan Kiszka wrote: > >>>>>> Hi all, > >>>>>> > >>>>>> we are getting a lot of > >>>>>> > >>>>>> BUG: sleeping function called from invalid context at > >>>>>> mm/page_alloc.c:1225 > >>>>>> in_atomic():1, irqs_disabled():0 > >>>>>> [<c010305d>] show_trace_log_lvl+0x1a/0x2f > >>>>>> [<c0103156>] show_trace+0x12/0x14 > >>>>>> [<c0103915>] dump_stack+0x16/0x18 > >>>>>> [<c010c4ab>] __might_sleep+0xcd/0xd3 > >>>>>> [<c0149488>] __alloc_pages+0x32/0x281 > >>>>>> [<c014fdd2>] copy_page_range+0x221/0x41e > >>>>>> [<c010ec18>] copy_process+0x9e1/0xfe2 > >>>>>> [<c010f415>] do_fork+0x99/0x176 > >>>>>> [<c0100e75>] sys_clone+0x33/0x39 > >>>>>> [<c0102aaf>] syscall_call+0x7/0xb > >>>>>> ======================= > >>>>>> > >>>>>> here due to a Xenomai program issuing system() calls. > >>>>>> > >>>>>> After once again dissecting the "nice" mm code (sigh...), the reason > >>>>>> turned out to be plain simple: > >>>>>> > >>>>>> copy_pte_range(...); > >>>>>> spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); > >>>>>> copy_one_pte(...); > >>>>>> if (is_cow_mapping(vm_flags)) > >>>>>> alloc_page_vma(GFP_HIGHUSER, ...); > >>>>>> __alloc_pages(...) > >>>>>> might_sleep_if(gfp_mask & __GFP_WAIT); > >>>>>> > >>>>>> And this is true due to #define GFP_HIGHUSER (__GFP_WAIT | ... > >>>>>> > >>>>>> So the bad news is that the COW code in likely all i-pipe versions is > >>>>>> broken. But the good new is that this might be easily fixable by > >>>>>> providing the right gfp_mask. GFP_ATOMIC? > >>>>> It does not look like a good solution, you are going to empty the > >>>>> GFP_ATOMIC pools. The proper solution would rather be to look at the > >>>>> real mm code (I mean not the one I wrote) and see how they cope with > >>>>> this issue. > >>>> Mmpf. What are the chances for a quick fix within the next days? We have > >>>> to consider alternatives right now here because the whole system is > >>>> meant for production purpose next week (C-ELROB '07). > >>>> > >>>> OK, I'm already finding myself inside the code :-/. What about this > >>>> approach: We try to alloc with GFP_ATOMIC. Once this fails, we break > >>>> out, drop all locks (just like it happens in case of need_resched()), > >>>> try to fill up the pool, and restart then. What would reliably make > >>>> Linux refill its atomic pool? > >>>> > >>>> Alternative approach: preallocate the required pages before entering the > >>>> loop in copy_pte_range. But that may require more code changes. > >>> I would say the real fix is to drop momentarily the spinlock(s?) for > >>> allocating. > >>> > >> Are you sure it's safe to drop locks in the (logical) middle of > >> copy_one_pte()? I can't tell yet from the few glances I took. It's just > >> my feeling that says "no" so far. > > > > There is certainly something possible, since the vanilla kernel > > actually works without these warning. > > Vanilla doesn't allocate pages from within copy_one_pte.
The fact that you are in a hurry should not be an excuse to propose a fix which is much worse than the bug itself. -- Gilles Chanteperdrix _______________________________________________ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core