On 05/05/15 08:00, Thomas Meyer wrote: > >> Am 04.05.2015 um 18:22 schrieb Anton Ivanov <anton.iva...@kot-begemot.co.uk>: >> >> Hi Thomas, > Hi Anton, > >> I read it and like it except idle sleep. See my comments inline. I >> killed that nanosleep quite on purpose :) > Okay. I think I understand now. > >> With what you did to the tramp the issue of userspace stalling >> should go away by the way (in theory). I think that it will still show a >> very HZ behavior, but IMHO that cannot be helped at this stage. > Do you mean the user space processes with above statement? What else could be > done with them if not trigger at HZ?
I do not think that you can do anything because you cannot keep the userspace timers sync-ed to the kernel space without risking races. Frankly, even if it just does HZ will be better than now. > >> A. >> >> >>> On 03/05/15 16:46, Thomas Meyer wrote: >>> Am Samstag, den 02.05.2015, 12:08 +0100 schrieb Anton Ivanov: >>> * This is a per-cpu array. A processor only modifies its entry and it >>> only >>> @@ -204,8 +205,16 @@ void arch_cpu_idle(void) >>> unsigned long long nsecs; >>> >>> cpu_tasks[current_thread_info()->cpu].pid = os_getpid(); >>> - nsecs = disable_timer(); >>> - idle_sleep(nsecs); >>> + >>> + //WHAT? >> The way idle sleep was originally implemented is by setting a nanosleep >> for nsecs until the next expected timer firing. After that the sleep >> routine used to generated a timer interrupt. >> >> It was killing two ducks with one shot - the original timer is based on >> VTALRM and virtual clock. So if you are idle enough to go into sleep you >> cannot rely on the signal from itimer (as this is dependent on your CPU >> usage) to wake you up on time. In fact, you are never going to wake up. >> >> Setting a nano-sleep dealt with both problems - idle sleeping and waking >> up on time as expressed in absolute wall clock. >> >> If your wall clock is no longer running off VIRTUAL and is now off >> MONOTONIC you do not need to do this. You just go to sleep without >> disabling the next timer interrupt, the next timer interrupt happens on >> schedule (as it is now off the right clock), you wake up from sleep. Two >> syscalls less, same effect. > Okay, I'll change that. In your original patch you wait maximum for one HZ > interval, why not wait forever os_idle_sleep()? I did not have a periodic timer. If you have a periodic timer in addition to the one-shot (as you have done) you can wait forever - the periodic will wake you up. >> >> >>> + /* there is no benefit whatsoever in disabling a pending >>> + * hrtimer and setting a nanowait for the same value instead >>> + * so we do timer disable + wait only for the tracing one here >>> + */ >>> + >>> + nsecs = os_timer_disable(); >>> + os_idle_sleep(nsecs); >>> + os_timer_set_interval(NULL); >> This is wrong as nsec is time to next timer firing. You just used two >> syscalls to achieve the same effect as leaving the timer lapse while >> going to sleep. See above. >> >>> local_irq_enable(); >>> } >>> >>> diff --git a/arch/um/kernel/skas/clone.c b/arch/um/kernel/skas/clone.c >>> index 289771d..c1cdc2e 100644 >>> --- a/arch/um/kernel/skas/clone.c >>> +++ b/arch/um/kernel/skas/clone.c >>> @@ -35,10 +35,11 @@ stub_clone_handler(void) >>> if (err) >>> goto out; >>> >>> - err = stub_syscall3(__NR_setitimer, ITIMER_VIRTUAL, >>> - (long) &data->timer, 0); >>> - if (err) >>> - goto out; >>> +// WHY? FIXME: Switch to timer_create, timer_settime needed?! >>> +// err = stub_syscall3(__NR_setitimer, ITIMER_VIRTUAL, >>> +// (long) &data->timer, 0); >>> +// if (err) >>> +// goto out; >> Beats me - if you do not do it, userspace stalls. Though I see you do it >> now in the tramp. I need to test that :) > I'm not sure about that yet. > I really need to figure out when above code is called. Sadly I seem to hit > some assertion in gdb and this kills my gdb session :-( > I'll look into the gdb code to understand what assert it is hitting. Did you > see similar problems while debugging the uml kernel executable? I ended up debugging like a caveman - with printk()s. GDB is unusable half of the time. > > I think start_userspace is one way of creating a new userspace process, maybe > there are other ways to reach above code from a clone call that will start a > new user space process?! That would be the only thing that would make > sense... If so an posix interval timer need to be created in above code. I never understood this part. Richard? > >>> +/** >>> + * userspace() - user space control loop >>> + * @regs: ? >>> + * >>> + * The main loop that traces and controls each spwaned userspace >>> + * process, i.e. >>> + */ >>> void userspace(struct uml_pt_regs *regs) >>> { >>> - struct itimerval timer; >>> - unsigned long long nsecs, now; >>> + unsigned long long nsecs; >>> int err, status, op, pid = userspace_pid[0]; >>> /* To prevent races if using_sysemu changes under us.*/ >>> int local_using_sysemu; >>> @@ -325,13 +371,11 @@ void userspace(struct uml_pt_regs *regs) >>> /* Handle any immediate reschedules or signals */ >>> interrupt_end(); >>> >>> - if (getitimer(ITIMER_VIRTUAL, &timer)) >>> - printk(UM_KERN_ERR "Failed to get itimer, errno = %d\n", errno); >>> - nsecs = timer.it_value.tv_sec * UM_NSEC_PER_SEC + >>> - timer.it_value.tv_usec * UM_NSEC_PER_USEC; >>> - nsecs += os_nsecs(); >>> - >>> while (1) { >>> + >>> + nsecs = os_timer_remain(NULL); >>> + nsecs += os_nsecs(); >>> + >>> /* >>> * This can legitimately fail if the process loads a >>> * bogus value into a segment register. It will >>> @@ -388,31 +432,27 @@ void userspace(struct uml_pt_regs *regs) >>> switch (sig) { >>> case SIGSEGV: >>> if (PTRACE_FULL_FAULTINFO) { >>> - get_skas_faultinfo(pid, >>> - ®s->faultinfo); >>> - (*sig_info[SIGSEGV])(SIGSEGV, (struct siginfo *)&si, >>> - regs); >>> + get_skas_faultinfo(pid,®s->faultinfo); >>> + (*sig_info[SIGSEGV])(SIGSEGV, (struct siginfo *)&si, >>> regs); >>> + } else { >>> + handle_segv(pid, regs); >>> } >>> - else handle_segv(pid, regs); >>> break; >>> case SIGTRAP + 0x80: >>> - handle_trap(pid, regs, local_using_sysemu); >>> + handle_trap(pid, regs, local_using_sysemu); >>> break; >>> case SIGTRAP: >>> relay_signal(SIGTRAP, (struct siginfo *)&si, regs); >>> break; >>> - case SIGVTALRM: >>> - now = os_nsecs(); >>> - if (now < nsecs) >>> + case SIGUSR2: >> We need to look into this one - it was here because VTALRM can produce a >> significant difference between the time the kernel one fires and the >> userspace fires. If the kernel one uses MONOTONIC to an exact time and >> the userspace uses MONOTONIC to pace it to firings from "interval" sets >> you end up with HZ scheduling and HZ timer precision instead of the >> scheduler and userspace timers as we know it in today's Linux. > Okay, yes I understand now. > > But currently the user space posix interval timers are not created > synchronised compared to the kernel interval timer. So there is very likely a > gap between the firing of all timers. > Creating all timer to fire in HZ rate at exactly the same time is the correct > thing to do in my opinion. What do you think? > > > Do we need to take care of CONFIG_NO_HZ_IDLE AND CONFIG_NO_HZ_FULL?! Need to read them in another architecture to compare. They are new (relative to the time the patch was done). A. > > With kind regards > Thomas > > ------------------------------------------------------------------------------ One dashboard for servers and applications across Physical-Virtual-Cloud Widest out-of-the-box monitoring support with 50+ applications Performance metrics, stats and reports that give you Actionable Insights Deep dive visibility with transaction tracing using APM Insight. http://ad.doubleclick.net/ddm/clk/290420510;117567292;y _______________________________________________ User-mode-linux-devel mailing list User-mode-linux-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel