On Sun, May 6, 2012 at 8:58 AM, Alexander Graf <ag...@suse.de> wrote:
>
>
> Am 06.05.2012 um 10:29 schrieb Blue Swirl <blauwir...@gmail.com>:
>
>> On Wed, May 2, 2012 at 2:38 PM, Artyom Tarasenko <atar4q...@gmail.com> wrote:
>>> On Tue, May 1, 2012 at 4:06 PM, Blue Swirl <blauwir...@gmail.com> wrote:
>>>> On Tue, May 1, 2012 at 13:54, Artyom Tarasenko <atar4q...@gmail.com> wrote:
>>>>> On Tue, May 1, 2012 at 11:25 AM, Blue Swirl <blauwir...@gmail.com> wrote:
>>>>>> On Mon, Apr 30, 2012 at 17:38, Artyom Tarasenko <atar4q...@gmail.com> 
>>>>>> wrote:
>>>>>>> On Mon, Apr 30, 2012 at 7:15 PM, Andreas Färber <afaer...@suse.de> 
>>>>>>> wrote:
>>>>>>>> Am 30.04.2012 18:39, schrieb Artyom Tarasenko:
>>>>>>>>> I tried to boot the QEMU Niagara machine with the firmware from the
>>>>>>>>> OpenSPARC T1 emulator ( www.opensparc.net/opensparc-t1/download.html ),
>>>>>>>>> and it dies very early.
>>>>>>>>> The reason: in translate.c
>>>>>>>>>
>>>>>>>>> #define hypervisor(dc) (dc->mem_idx == MMU_HYPV_IDX)
>>>>>>>>> #define supervisor(dc) (dc->mem_idx >= MMU_KERNEL_IDX)
>>>>>>>>>
>>>>>>>>> and the dc->mem_idx is initialized like this:
>>>>>>>>>
>>>>>>>>>     if (env1->tl > 0) {
>>>>>>>>>         return MMU_NUCLEUS_IDX;
>>>>>>>>>     } else if (cpu_hypervisor_mode(env1)) {
>>>>>>>>>         return MMU_HYPV_IDX;
>>>>>>>>>     } else if (cpu_supervisor_mode(env1)) {
>>>>>>>>>         return MMU_KERNEL_IDX;
>>>>>>>>>     } else {
>>>>>>>>>         return MMU_USER_IDX;
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>> This seems to be conceptually incorrect. After reset tl == MAXTL, but
>>>>>>>>> the super- and hyper-visor bits are still set, so both supervisor(dc)
>>>>>>>>> and hypervisor(dc) must return 1, which is impossible in the current
>>>>>>>>> implementation.
>>>>>>>>>
>>>>>>>>> What would be the proper way to fix it? Make mem_idx a bitmap, add two
>>>>>>>>> more variables to DisasContext, or ...?
>>>>>>>>>
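>>>>>>>>> (For illustration, the "two more variables" option might look roughly
>>>>>>>>> like this - field names invented, just a sketch:
>>>>>>>>>
>>>>>>>>>     /* in DisasContext, filled in at the start of translation */
>>>>>>>>>     bool supervisor;    /* PSTATE.PRIV or HPSTATE.HPRIV set */
>>>>>>>>>     bool hypervisor;    /* HPSTATE.HPRIV set */
>>>>>>>>>
>>>>>>>>>     #define supervisor(dc) ((dc)->supervisor)
>>>>>>>>>     #define hypervisor(dc) ((dc)->hypervisor)
>>>>>>>>>
>>>>>>>>> so that both can be 1 at the same time, independently of mem_idx.)
>>>>>>>>>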
>>>>>>>>> Some other findings/questions:
>>>>>>>>>
>>>>>>>>>     /* Sun4v generic Niagara machine */
>>>>>>>>>     {
>>>>>>>>>         .default_cpu_model = "Sun UltraSparc T1",
>>>>>>>>>         .console_serial_base = 0xfff0c2c000ULL,
>>>>>>>>>
>>>>>>>>> Where is this address coming from? The OpenSPARC Niagara machine has a
>>>>>>>>> "dumb serial" at 0x1f10000000ULL.
>>>>>>>>>
>>>>>>>>> And the biggest issue: UA2005 (as well as UA2007) describes a totally
>>>>>>>>> different format for an MMU TTE than the one the sun4u CPUs use.
>>>>>>>>> I think the best way to handle it would be splitting off the Niagara
>>>>>>>>> machine, and #defining the MMU bits differently for the sun4u and
>>>>>>>>> sun4v machines.
>>>>>>>>>
>>>>>>>>> Do we have cases in qemu where more than two (qemu-system-xxx and
>>>>>>>>> qemu-system-xxx64) binaries are produced?
>>>>>>>>> Would the name qemu-system-sun4v fit the naming convention?
>>>>>>>>
>>>>>>>> We have such a case for ppc (ppcemb) and it is kind of a maintenance
>>>>>>>> nightmare - I'm working towards getting rid of it with my QOM CPU work.
>>>>>>>> Better avoid it for sparc in the first place.
>>>>>>>>
>>>>>>>> Instead, you should add a callback function pointer to SPARCCPUClass
>>>>>>>> that you initialize based on the CPU model, so that it behaves
>>>>>>>> differently at runtime rather than at compile time.
>>>>>>>> Or if it's just about the class_init then after the Hard Freeze I can
>>>>>>>> start polishing my subclasses for sparc so that you can add a special
>>>>>>>> class_init for Niagara.
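>>>>>>>>
>>>>>>>> (To make that concrete, a rough sketch - the member name and signature
>>>>>>>> are invented, not actual code:
>>>>>>>>
>>>>>>>>     typedef struct SPARCCPUClass {
>>>>>>>>         CPUClass parent_class;
>>>>>>>>         /* set in class_init depending on the CPU model */
>>>>>>>>         int (*get_physical_address)(CPUSPARCState *env, uint64_t *physical,
>>>>>>>>                                     int *prot, target_ulong address,
>>>>>>>>                                     int rw, int mmu_idx);
>>>>>>>>     } SPARCCPUClass;
>>>>>>>>
>>>>>>>> so sun4u and sun4v models install different implementations at runtime
>>>>>>>> instead of being split at build time.)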
>>>>>>>
>>>>>>> But this would mean that the defines from
>>>>>>> #define TTE_NFO_BIT (1ULL << 60)
>>>>>>> to
>>>>>>> #define TTE_PGSIZE(tte)     (((tte) >> 61) & 3ULL)
>>>>>>>
>>>>>>> inclusive would need to be replaced with functions and variables?
>>>>>>> Sounds like a further performance regression for sun4u?
>>>>>>
>>>>>> There could be parallel definitions for sun4u (actually from
>>>>>> UltraSparc-III onwards the MMU is again different) and sun4v.
>>>>>>
>>>>>> At tlb_fill(), different implementations can be selected based on MMU
>>>>>> model. For ASI accesses, we can add conditional code but for higher
>>>>>> performance, some checks can be moved to translation time.
>>>>>
>>>>> Can be done, but what is the gain of having it runtime configurable?
>>>>
>>>> I was thinking of code like this in tlb_fill():
>>>>
>>>> switch (env->mmu_model) {
>>>> case MMU_US2:
>>>>   return tlb_fill_us2(..);
>>>> case MMU_US3:
>>>>   return tlb_fill_us3(..);
>>>> case MMU_US4:
>>>>   return tlb_fill_us4(..);
>>>> case MMU_T1:
>>>>   return tlb_fill_t1(..);
>>>> case MMU_T2:
>>>>   return tlb_fill_t2(..);
>>>> }
>>>>
>>>> The performance cost shouldn't be too high. Alternatively, a function
>>>> pointer could be set up.
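>>>>
>>>> A rough sketch of the function pointer variant (the field name is
>>>> invented, just to illustrate):
>>>>
>>>>     /* chosen once, e.g. at CPU init, based on env->mmu_model */
>>>>     env->tlb_fill = tlb_fill_us3;
>>>>
>>>>     /* the fault path then just does */
>>>>     env->tlb_fill(env, address, rw, mmu_idx, retaddr);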
>>>
>>> Actually I was more worried about get_physical_address_* than about the
>>> filling; there we would have to use variables instead of constants and
>>> functions instead of macros.
>>
>> Preferably entirely different functions with constants.
>>
>>>
>>>> Yes, we can always provide the register bank; older models just access
>>>> some of those.
>>>>
>>>>> cpu_change_pstate should probably have another parameter (new_GL)
>>>>> which is only valid for sun4v.
>>>>> And, depending on the trap type, env->htba has to be taken instead of
>>>>> env->tbr. To me it looks like in the end do_interrupt will have fewer
>>>>> common parts between sun4u and sun4v than specific ones.
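>>>>>
>>>>> For illustration only (the trap-routing check is invented, and the TL>0
>>>>> half of the trap table is ignored for brevity):
>>>>>
>>>>>     /* sun4v: extra GL argument; a sun4u build would ignore it */
>>>>>     cpu_change_pstate(env, new_pstate, new_gl);
>>>>>
>>>>>     /* pick the trap base register depending on where the trap goes */
>>>>>     env->pc = (trap_goes_to_hypervisor ? env->htba : env->tbr)
>>>>>               | (intno << 5);
>>>>>     env->npc = env->pc + 4;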
>>>>
>>>> Same as with tlb_fill(): a switch() or a function pointer. The functions
>>>> are different.
>>>>
>>>> This is unavoidable (unless maybe in the future the TLB handling can
>>>> be pushed partially higher so mmu_idx parameters can be eliminated)
>>>> and the performance cost is not great.
>>>
>>> So, altogether you'd still prefer run-time checks over having
>>> qemu-system-sun4v (or -sparc64v) ?
>>
>> Yes. Architectures are not meant to handle small issues like this.
>> Should performance become a problem, there are plenty of lower-hanging
>> fruits to start optimizing with.
>>
>> Even in this case, rather than the new-architecture solution, it could
>> be possible to build separate TLB handlers which directly call the
>> correct MMU functions without switches; these would be selected at
>> translation time or earlier. For the PPCEMB case, maybe the memory API
>> could be changed to handle different page sizes without loss of
>> performance, I don't know. Devices should not depend on
>> TARGET_PAGE_SIZE.
>
> It's not a matter of an API. The main problem is that the QEMU TLB has to be
> fine-grained enough to handle 1k faults, so it has to be in 1k steps in its
> current design.
>
> That'd hurt performance quite a bit. The softmmu already is a very big chunk
> of execution time on ppc and I really don't want that number to go up.

Yes, but that's not what I proposed. Now the translator arranges a
call to, for example, qemu_ld32u, which uses this fixed TLB page size.
Instead, it should generate calls to qemu_ld32u_ppcemb (which would
use 1k pages) or qemu_ld32u_ppc (4k?) as needed; combined with MMU_IDX
they could expand to qemu_ld32u_ppc_hypv etc. Obviously this would
need big changes everywhere, and the MMU_IDX change should be weighed
against the negative cache effects of having many separate, often-used
functions in the hot path.
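
Roughly, at translation time (helper names and target_page_bits() are
made up, just to sketch the idea):

    /* pick a load helper specialised for the target's softmmu page
       size instead of one generic qemu_ld32u */
    if (target_page_bits(env) == 10) {
        gen_helper_qemu_ld32u_1k(ret, cpu_env, addr);  /* ppcemb, 1k TLB steps */
    } else {
        gen_helper_qemu_ld32u_4k(ret, cpu_env, addr);  /* "classic" ppc, 4k */
    }

Each helper would then walk a softmmu TLB sized in its own page-size
steps.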

The memory API (or actually the related pieces in exec.c) hardcodes
TARGET_PAGE_SIZE assumptions in many places, but it might be possible
to make this dynamic.
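
For instance (sketch only), the compile-time constant could become a
per-target variable:

    /* instead of "#define TARGET_PAGE_BITS 12" */
    extern int target_page_bits;
    #define TARGET_PAGE_SIZE  (1 << target_page_bits)
    #define TARGET_PAGE_MASK  (~(TARGET_PAGE_SIZE - 1))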

>
>
> Alex
>
