On 06.05.2012 at 10:29, Blue Swirl <blauwir...@gmail.com> wrote:
> On Wed, May 2, 2012 at 2:38 PM, Artyom Tarasenko <atar4q...@gmail.com> wrote:
>> On Tue, May 1, 2012 at 4:06 PM, Blue Swirl <blauwir...@gmail.com> wrote:
>>> On Tue, May 1, 2012 at 13:54, Artyom Tarasenko <atar4q...@gmail.com> wrote:
>>>> On Tue, May 1, 2012 at 11:25 AM, Blue Swirl <blauwir...@gmail.com> wrote:
>>>>> On Mon, Apr 30, 2012 at 17:38, Artyom Tarasenko <atar4q...@gmail.com>
>>>>> wrote:
>>>>>> On Mon, Apr 30, 2012 at 7:15 PM, Andreas Färber <afaer...@suse.de> wrote:
>>>>>>> On 30.04.2012 18:39, Artyom Tarasenko wrote:
>>>>>>>> Tried to boot QEMU Niagara machine with the firmware from the
>>>>>>>> OpenSPARC T1 emulator ( www.opensparc.net/opensparc-t1/download.html )
>>>>>>>> , and it dies very early.
>>>>>>>> The reason: in translate.c
>>>>>>>>
>>>>>>>> #define hypervisor(dc) (dc->mem_idx == MMU_HYPV_IDX)
>>>>>>>> #define supervisor(dc) (dc->mem_idx >= MMU_KERNEL_IDX)
>>>>>>>>
>>>>>>>> and the dc->mem_idx is initialized like this:
>>>>>>>>
>>>>>>>>     if (env1->tl > 0) {
>>>>>>>>         return MMU_NUCLEUS_IDX;
>>>>>>>>     } else if (cpu_hypervisor_mode(env1)) {
>>>>>>>>         return MMU_HYPV_IDX;
>>>>>>>>     } else if (cpu_supervisor_mode(env1)) {
>>>>>>>>         return MMU_KERNEL_IDX;
>>>>>>>>     } else {
>>>>>>>>         return MMU_USER_IDX;
>>>>>>>>     }
>>>>>>>>
>>>>>>>> Which seems to be conceptually incorrect. After reset tl == MAXTL, but
>>>>>>>> still super- and hyper-visor bits are set, so both supervisor(dc) and
>>>>>>>> hypervisor(dc) must return 1 which is impossible in the current
>>>>>>>> implementation.
>>>>>>>>
>>>>>>>> What would be the proper way to fix it? Make mem_idx bitmap, add two
>>>>>>>> more variables to DisasContext, or ...?
>>>>>>>>
>>>>>>>> Some other findings/questions:
>>>>>>>>
>>>>>>>>     /* Sun4v generic Niagara machine */
>>>>>>>>     {
>>>>>>>>         .default_cpu_model = "Sun UltraSparc T1",
>>>>>>>>         .console_serial_base = 0xfff0c2c000ULL,
>>>>>>>>
>>>>>>>> Where is this address coming from? The OpenSPARC Niagara machine has a
>>>>>>>> "dumb serial" at 0x1f10000000ULL.
>>>>>>>>
>>>>>>>> And the biggest issue: UA2005 (as well as UA2007) describe a totally
>>>>>>>> different format for a MMU TTE entry than the one sun4u CPUs are using.
>>>>>>>> I think the best way to handle it would be splitting off Niagara
>>>>>>>> machine, and #defining MMU bits differently for sun4u and sun4v
>>>>>>>> machines.
>>>>>>>>
>>>>>>>> Do we have cases in qemu where more than two (qemu-system-xxx and
>>>>>>>> qemu-system-xxx64) binaries are produced?
>>>>>>>> Would the name qemu-system-sun4v fit the naming convention?
>>>>>>>
>>>>>>> We have such a case for ppc (ppcemb) and it is kind of a maintenance
>>>>>>> nightmare - I'm working towards getting rid of it with my QOM CPU work.
>>>>>>> Better avoid it for sparc in the first place.
>>>>>>>
>>>>>>> Instead, you should add a callback function pointer to SPARCCPUClass
>>>>>>> that you initialize based on CPU model so that it behaves differently at
>>>>>>> runtime rather than at compile time.
>>>>>>> Or if it's just about the class_init then after the Hard Freeze I can
>>>>>>> start polishing my subclasses for sparc so that you can add a special
>>>>>>> class_init for Niagara.
>>>>>>
>>>>>> But this would mean that the defines from
>>>>>> #define TTE_NFO_BIT (1ULL << 60)
>>>>>> to
>>>>>> #define TTE_PGSIZE(tte) (((tte) >> 61) & 3ULL)
>>>>>>
>>>>>> inclusive would need to be replaced with functions and variables?
>>>>>> Sounds like a further performance regression for sun4u?
>>>>>
>>>>> There could be parallel definitions for sun4u (actually UltraSparc-III
>>>>> onwards the MMU is again different) and sun4v.
>>>>>
>>>>> At tlb_fill(), different implementations can be selected based on MMU
>>>>> model. For ASI accesses, we can add conditional code but for higher
>>>>> performance, some checks can be moved to translation time.
>>>>
>>>> Can be done, but what is the gain of having it runtime configurable?
>>>
>>> I was thinking of code like this in:
>>>
>>>     switch (env->mmu_model) {
>>>     case MMU_US2:
>>>         return tlb_fill_us2(..);
>>>     case MMU_US3:
>>>         return tlb_fill_us3(..);
>>>     case MMU_US4:
>>>         return tlb_fill_us4(..);
>>>     case MMU_T1:
>>>         return tlb_fill_t1(..);
>>>     case MMU_T2:
>>>         return tlb_fill_t2(..);
>>>     }
>>>
>>> The performance cost shouldn't be too high. Alternatively a function
>>> pointer could be set up.
>>
>> Actually I was more worried about get_physical_address_* than filling,
>> there we would have to use variables instead of constants and
>> functions instead of macros.
>
> Preferably entirely different functions with constants.
>
>>
>>> Yes, we can always provide the register bank, older models just access
>>> some of those.
>>>
>>>> cpu_change_pstate should probably have another parameter (new_GL)
>>>> which is only valid for sun4v.
>>>> And, depending on a trap type, env->htba has to be taken instead of
>>>> env->tbr. To me it looks like at the end do_interrupt will have less
>>>> common parts between sun4u and sun4v than specific ones.
>>>
>>> Same as tlb_fill(), switch() or function pointer. The functions are
>>> different.
>>>
>>> This is unavoidable (unless maybe in the future the TLB handling can
>>> be pushed partially higher so mmu_idx parameters can be eliminated)
>>> and the performance cost is not great.
>>
>> So, altogether you'd still prefer run-time checks over having
>> qemu-system-sun4v (or -sparc64v) ?
>
> Yes. Architectures are not meant to handle small issues like this.
> Should performance become a problem, there is plenty of lower-hanging
> fruit to start optimizing.
>
> Even in this case, rather than the new architecture solution, it could
> be possible to build separate TLB handlers which directly call the
> correct MMU functions without switches and these would be selected at
> translation time or earlier.
> For the PPCEMB case, maybe the memory API
> could be changed to handle different page sizes without loss of
> performance, I don't know. Devices should not depend on
> TARGET_PAGE_SIZE.

It's not a matter of an API. The main problem is that the QEMU TLB has to
be fine-grained enough to handle 1k faults, so in its current design it has
to work in 1k steps. That would hurt performance quite a bit. The softmmu is
already a very big chunk of execution time on ppc, and I really don't want
that number to go up.

Alex
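
To put a rough number on the 1k-step point, here is a toy calculation. It is
only a sketch under my own assumptions, not QEMU code: tlb_entries_needed()
is a hypothetical helper, and it models nothing beyond "one TLB entry maps
2^page_bits bytes".

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a QEMU function): number of TLB entries needed
 * to cover the byte range [addr, addr + len) when every entry maps
 * 2^page_bits bytes.
 */
uint64_t tlb_entries_needed(uint64_t addr, uint64_t len, unsigned page_bits)
{
    /* Guard the shift width and reject empty ranges. */
    assert(page_bits < 64 && len > 0);

    uint64_t first = addr >> page_bits;             /* first page index */
    uint64_t last = (addr + len - 1) >> page_bits;  /* last page index */
    return last - first + 1;
}
```

With page_bits = 12 (4k pages) a 16 KiB buffer starting at address 0 needs 4
entries; with page_bits = 10 (1k steps) the same buffer needs 16, so the same
guest access pattern can miss and refill roughly four times as often.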