Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback

Blue Swirl Thu, 27 May 2010 12:20:19 -0700

On Thu, May 27, 2010 at 7:08 PM, Jan Kiszka <jan.kis...@web.de> wrote:
> Blue Swirl wrote:
>> On Thu, May 27, 2010 at 6:31 PM, Jan Kiszka <jan.kis...@web.de> wrote:
>>> Blue Swirl wrote:
>>>> On Wed, May 26, 2010 at 11:26 PM, Paul Brook <p...@codesourcery.com> wrote:
>>>>>> At the other extreme, would it be possible to make the educated guests
>>>>>> aware of virtualization in the clock aspect as well: virtio-clock?
>>>>> The guest doesn't even need to be aware of virtualization. It just needs
>>>>> to be able to accommodate the lack of guaranteed realtime behavior.
>>>>>
>>>>> The fundamental problem here is that some guest operating systems assume
>>>>> that the hardware provides certain realtime guarantees with respect to
>>>>> execution of interrupt handlers.  In particular they assume that the CPU
>>>>> will always be able to complete execution of the timer IRQ handler before
>>>>> the periodic timer triggers again.  In most virtualized environments you
>>>>> have absolutely no guarantee of realtime response.
>>>>>
>>>>> With Linux guests this was solved a long time ago by the introduction of
>>>>> tickless kernels.  These separate the timekeeping from wakeup events, so
>>>>> it doesn't matter if several wakeup triggers end up getting merged (either
>>>>> at the hardware level or via top/bottom half guest IRQ handlers).
>>>>>
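
To illustrate the difference between tick counting and tickless-style
timekeeping described in the quoted paragraph above, here is a minimal
sketch. It is not taken from any real kernel: TICK_NS, TickClock,
CounterClock and both handlers are hypothetical names, and the
free-running counter value is simply passed in as an argument.

```c
#include <stdint.h>

/* Assumed period of the periodic timer: 1 ms (illustrative value). */
#define TICK_NS 1000000ULL

/* Tick-counting clock: time advances only when the IRQ handler runs. */
typedef struct {
    uint64_t now_ns;
} TickClock;

/* Each handler run adds exactly one tick, so merged ticks are lost. */
static void tick_clock_irq(TickClock *c)
{
    c->now_ns += TICK_NS;
}

/* Tickless-style clock: the IRQ is only a wakeup, time comes from
 * a free-running counter that keeps counting while IRQs are merged. */
typedef struct {
    uint64_t now_ns;
} CounterClock;

/* counter_ns is the current value of the (hypothetical) counter. */
static void counter_clock_irq(CounterClock *c, uint64_t counter_ns)
{
    c->now_ns = counter_ns;
}
```

If five tick periods elapse but the handler runs only three times, the
tick-counting clock has advanced by three periods while the
counter-based clock still reports five.
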
>>>>>
>>>>> It's worth mentioning that this problem also occurs on real hardware,
>>>>> typically due to lame hardware/drivers which end up masking interrupts or
>>>>> otherwise stalling the CPU for long periods of time.
>>>>>
>>>>>
>>>>> The PIT hack attempts to work around broken guests by adding artificial
>>>>> latency to the timer events, ensuring that the guest "sees" them all.
>>>>> Unfortunately guests vary on when it is safe for them to see the next
>>>>> timer event, and trying to observe this behavior involves potentially
>>>>> harmful heuristics and collusion between unrelated devices (e.g.
>>>>> interrupt controller and timer).
>>>>>
>>>>> In some cases we don't even do that, and just reschedule the event some
>>>>> arbitrarily small amount of time later. This assumes the guest does
>>>>> useful work in that time. In a single threaded environment this is
>>>>> probably true - qemu got enough CPU to inject the first interrupt, so
>>>>> will probably manage to execute some guest code before the end of its
>>>>> timeslice. In an environment where interrupt processing/delivery and
>>>>> execution of the guest code happen in different threads this becomes
>>>>> increasingly likely to fail.
>>>> So any voodoo around timer events is doomed to fail in some cases.
>>>> What's the amount of hacks that we want then? Is there any generic
>>> The aim of this patch is to reduce the amount of existing and upcoming
>>> hacks. It may still require some refinements, but I think we haven't
>>> found any smarter approach yet that fits existing use cases.
>>
>> I don't feel we have tried other possibilities hard enough.
>
> Well, seeing prototypes wouldn't be bad, also to run real load against
> them. But at least I'm currently clueless what to implement.


Perhaps now is not the time to rush to implement something, then, but
to brainstorm a clean solution.

>>
>>>> solution, like slowing down the guest system to the point where we can
>>>> guarantee the interrupt rate vs. CPU execution speed?
>>> That's generally a non-option in virtualized production environments.
>>> Specifically if the guest system lost interrupts due to host
>>> overcommitment, you do not want it to slow down even further.
>>
>> I meant that the guest time could be scaled down, for example 2s in
>> wall clock time would be presented to the guest as 1s.
>
> But that is precisely what already happens when the guest loses timer
> interrupts. There is no other time source for this kind of guest -
> often except for some external events generated by systems which you
> don't want to fall behind arbitrarily.
>
>> Then the amount
>> of CPU cycles between timer interrupts would increase and hopefully
>> the guest can keep up. If the guest sleeps, time base could be
>> accelerated to catch up with wall clock and then set back to 1:1 rate.
>
> Can't follow you ATM, sorry. What should be slowed down then? And how
> precisely?

I think vm_clock and everything that depends on vm_clock should be
slowed down. rtc_clock should also be tied to vm_clock in this mode,
not host_clock.
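
As a rough illustration of what a scaled guest time base could look
like, here is a minimal sketch. It is not QEMU code: ScaledClock,
scaled_clock_get_ns() and scaled_clock_set_rate() are hypothetical
names, and the rate is assumed to be a simple rational factor applied
to elapsed host time.

```c
#include <stdint.h>

/* Hypothetical scaled clock; none of these names are QEMU API. */
typedef struct {
    int64_t host_base_ns;   /* host time at the last rate change */
    int64_t guest_base_ns;  /* guest time at the last rate change */
    int64_t rate_num;       /* guest ns advanced ... */
    int64_t rate_den;       /* ... per this many host ns */
} ScaledClock;

/* Guest-visible time for a given host time. */
static int64_t scaled_clock_get_ns(const ScaledClock *c, int64_t host_now_ns)
{
    return c->guest_base_ns +
           (host_now_ns - c->host_base_ns) * c->rate_num / c->rate_den;
}

/* Change the rate; rebase first so guest time never jumps. */
static void scaled_clock_set_rate(ScaledClock *c, int64_t host_now_ns,
                                  int64_t num, int64_t den)
{
    c->guest_base_ns = scaled_clock_get_ns(c, host_now_ns);
    c->host_base_ns = host_now_ns;
    c->rate_num = num;
    c->rate_den = den;
}
```

With a rate of 1/2, 2s of wall clock time is presented to the guest as
1s. Switching to a rate of 2/1 for the next 1s of host time brings the
guest back in step with the wall clock, after which the rate can be
set back to 1/1.
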

>
> Jan
>
>>
>> Slowing down could be triggered by measuring the guest load (for
>> example, by checking for presence of halt instructions), if it's close
>> to 1, time would be slowed down. If the guest starts to issue halt
>> instructions because it's more idle, we can increase speed.
>>
>> If this approach worked, even APIC could be made ignorant about
>> coalescing voodoo so it should be a major cleanup.
>
>
>
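
The load-triggered slowdown from the quoted text above could be
sketched roughly as follows. This is only an assumption-laden
illustration: the thresholds, the ClockRate enum and choose_rate() are
hypothetical, and the busy/halted times are assumed to be measured
elsewhere (for example by accounting for time spent in halt).

```c
#include <stdint.h>

/* Hypothetical thresholds, chosen only for illustration. */
#define LOAD_HIGH_PCT 95   /* "close to 1": slow guest time down */
#define LOAD_LOW_PCT  50   /* idle enough to catch up */

typedef enum { RATE_SLOW, RATE_NORMAL, RATE_CATCHUP } ClockRate;

/*
 * busy_ns:   time the vCPU ran guest code in the sample window
 * halted_ns: time the vCPU spent halted in the same window
 * lag_ns:    how far guest time is behind the wall clock
 */
static ClockRate choose_rate(int64_t busy_ns, int64_t halted_ns,
                             int64_t lag_ns)
{
    int64_t total = busy_ns + halted_ns;
    int64_t load_pct = total > 0 ? busy_ns * 100 / total : 0;

    if (load_pct >= LOAD_HIGH_PCT) {
        return RATE_SLOW;       /* guest cannot keep up, stretch time */
    }
    if (lag_ns > 0 && load_pct <= LOAD_LOW_PCT) {
        return RATE_CATCHUP;    /* guest is idle, make up lost time */
    }
    return RATE_NORMAL;         /* 1:1 with the host; any lag persists */
}
```

In this sketch a moderately loaded guest that is behind the wall clock
stays at 1:1 and only catches up once it becomes idle enough, which is
one possible policy among several.
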
