[Xen-ia64-devel] RE: [PATCH 04/15] ia64/pv_ops: introduce pv_info which describes some random info.
Rather than making these binary patches, why not make them fast syscalls and using a vdso page. Some of the priviledged instructions are simply reads and we could have that information in a read-only data page, so there is no need to do a context switch at all. Others could benefit from a fast system call that doesn't do a full context switch. The issue is we don't want to change Linux code a lot, otherwise it won't be accepted. If we use same logic with original Linux, but it is using indirect function call now, binary patching could help in performance. It would be nice if we could come up with a generic implementation for such a vdso style interface that could be shared between xen/kvm/lguest. Introducing a new mechanism to use it to replace Linux code to use them is another different story. Eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
[Xen-ia64-devel] RE: [PATCH 04/15] ia64/pv_ops: introduce pv_info which describes some random info.
Jes Sorensen wrote: Dong, Eddie wrote: Rather than making these binary patches, why not make them fast syscalls and using a vdso page. Some of the priviledged instructions are simply reads and we could have that information in a read-only data page, so there is no need to do a context switch at all. Others could benefit from a fast system call that doesn't do a full context switch. The issue is we don't want to change Linux code a lot, otherwise it won't be accepted. If we use same logic with original Linux, but it is using indirect function call now, binary patching could help in performance. Hi Eddie, Sorry but this is a wrong assumption. If the code is correct then there is no reason why it will not be accepted. It's far more important to avoid ugly clutter that makes the code hard to maintain. My understanding is that code such as IVT table are well tuned and you are really difficult to pursuade people to replace those privilege resource access instruction to use vdso or something equalvalent such as mov GRx=CRy. For those C code previlige resource access, like Isaku mentioned, we need to consider native too. Anyway binary patching is just an optimization that X86 used and there is no reason IA64 can't take. At least replacing indirect function with direct function call. Eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] pv_ops: imntrinsic pv_ops
Isaku Yamahata wrote: Hi Eddie. I commited some clean ups based on your patch. Could you please review it? It looks like you still prefer to use intermediate symbol paravirt_/ia64_native_xxx to wrap ia64_xxx. In some sense, when I saw many similar and bulk code in the patch, I feel dirty. I prefer we don't touch those gcc_intrinsic.h MACROs. If you don't like the way I wrap it, i.e. MACRO level ifdef /else. you may just simply provide a new head file for CONFIG_PARAVIRT which simply use pv_ops.name for all those same MACROs. such as in intrinsic.h: #ifdef CONFIG_PARAVIRT_GUEST #include paravirt_intrinsic.h #else gcc_intrinsic.h or intel_intrinsic.h. Eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
[Xen-ia64-devel] pv_ops progress ask for suggestion
Tony all: Recently we have completed the IVT.s pv_ops by using dual compile, and also many cleanups to simplify the changes to upstream code. All the C code touching privilege instruction is replaced with indirect function call (will be binary patched to use direct function call in future), and IVT table is dual compiled to minimize impact to native IVT table, but we get some dilemma in handling kernel/entry.S and also generic policy for other ASM files. In entry.S, there are around 17 privilege instructions, some of them must be paravirtualized including 2 cover instructions, and 1 RFI (this one is due to Xen hypervisor issue). There are other 15 privilege instructions (In Xen) such as CR access that could be paravirtualized for performance reason. Now we have 2 choices: Alt1: Dual compile entry.S like IVT.s (dual compile all ASM files if it needs virtualization) pros: Same policy with iVT, use same MACRO to replacement. cons: There are other ASM files such as sn/kernel/pio_phys.S need to be dual compiled too. And unlike IVT table, the memory occupied by dual compiled code won't be able to be freed easily since the size is not fixed. Also all future ASM code touch privilege instruction may need to be dual compiled too. Alt2: Use indirect call like C code for non IVT nor gate page code (dual compile only for IVT gate page which has fixed size and performance killer) Pros: flexible for future ASM code (just use same MACRO, no dual compile requirement). Cons: 2 sets of solution for ASM code, and also slightly performance lose due to indirect function call (future patching will convert it to direct function call, or in place code.) Any suggestions? Thanks, eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
[Xen-ia64-devel] cpu ops
In current approach, we have cpu ops like eoi/set_tpr/get_tpr,/set_itm /set_kr0/set_kr2.../set_kr7 etc. I think there is another simple alternative is to simply export setreg/getreg for cpu ops. The benefit of this could be: 1: Simple in pv_ops I/F 2: hypervisor neutral. Today we only virtualize around 15 AR/CR read/write, But future it may extends since different Hypervisor may do in different way. Do u like to see this one happen? thanks, eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] cpu ops
Isaku Yamahata wrote: On Mon, Apr 07, 2008 at 05:25:54PM +0800, Dong, Eddie wrote: In current approach, we have cpu ops like eoi/set_tpr/get_tpr,/set_itm /set_kr0/set_kr2.../set_kr7 etc. I think there is another simple alternative is to simply export setreg/getreg for cpu ops. The benefit of this could be: 1: Simple in pv_ops I/F 2: hypervisor neutral. Today we only virtualize around 15 AR/CR read/write, But future it may extends since different Hypervisor may do in different way. Sounds reasonable. I have a patch ready for this based on my previous removed paravirt_xxx/ia64_native_xxx version, I can rebase. Although it would result in big switch, presumably we can eliminate Yes, many switch case. May hurt performance before patching. runtime switch cost by specifically handling setreg/getreg which might complicate binay patch slightly. Yes, but eventually the performance will come back after patching. Eddie. ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
[Xen-ia64-devel] RE: pv_ops progress ask for suggestion
Isaku Yamahata wrote: On Mon, Apr 07, 2008 at 05:47:38PM +0800, Dong, Eddie wrote: Tony all: Recently we have completed the IVT.s pv_ops by using dual compile, and also many cleanups to simplify the changes to upstream code. All the C code touching privilege instruction is replaced with indirect function call (will be binary patched to use direct function call in future), and IVT table is dual compiled to minimize impact to native IVT table, but we get some dilemma in handling kernel/entry.S and also generic policy for other ASM files. In entry.S, there are around 17 privilege instructions, some of them must be paravirtualized including 2 cover instructions, and 1 RFI (this one is due to Xen hypervisor issue). There are other 15 privilege instructions (In Xen) such as CR access that could be paravirtualized for performance reason. Probably we can discusse well with the concrete patch. So I'll post the patches. (Creating the reviewable patch set may take a while though.) If it is 200 lines of patch, that is perfect. If it is a 2000+ lines of patch, I prefer a 200 lines of pseudo code. Now we have 2 choices: Alt1: Dual compile entry.S like IVT.s (dual compile all ASM files if it needs virtualization) pros: Same policy with iVT, use same MACRO to replacement. cons: There are other ASM files such as sn/kernel/pio_phys.S need to be dual compiled too. And unlike IVT table, the memory occupied by dual compiled code won't be able to be freed easily since the size is not fixed. Also all future ASM code touch privilege instruction may need to be dual compiled too. I suppose the more generalized problem is - The memory for unused pv code/data won't be executed/referenced so that it can be freed somehow. Is it worth while to do that? And how to do it? For IVT table (64K) gate page (1 page), it can be done except relocating those IP relative symbols. Looking at current xen code size it might be worth while, but not so big win. Agree in some level. Depend on how strictly we want the code to be perfect. This is not ia64 specific issues, and should be addressed in arch generic way. This hasn't been addressed even on x86. X86 doesn't use dual compile. Eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
[Xen-ia64-devel] pv_ops: intrinsic ops2
In current patch series, we have many definition: +#define ia64_itci ia64_native_itci +#define ia64_itcd ia64_native_itcd +#define ia64_itri ia64_native_itri +#define ia64_itrd ia64_native_itrd +#define ia64_tpa ia64_native_tpa +#define ia64_set_ibr ia64_native_set_ibr +#define ia64_set_pkr ia64_native_set_pkr +#define ia64_set_pmc ia64_native_set_pmc +#define ia64_set_pmd ia64_native_set_pmd +#define ia64_set_rria64_native_set_rr +#define ia64_get_cpuid ia64_native_get_cpuid +#define ia64_get_ibr ia64_native_get_ibr +#define ia64_get_pkr ia64_native_get_pkr +#define ia64_get_pmc ia64_native_get_pmc +#define ia64_get_pmd ia64_native_get_pmd Which comes from gcc_intrin.h such as: -#define ia64_itci(addr)asm volatile (itc.i %0;; :: r(addr) : memory) +#define ia64_native_itci(addr) asm volatile (itc.i %0;; :: r(addr) : memory) The question is we actually don't have xen_itci in the whole patch, should we remove the indirect reference of new ia64_itci--ia64_native_itci-old_ia64_itci. It is just identical and the change is redunadnt at least for this moment. Thanks, eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] pv_ops: imntrinsic pv_ops
Isaku Yamahata wrote: On Wed, Apr 02, 2008 at 01:51:28PM +0800, Dong, Eddie wrote: Current definition of intrinsic APIs seems to be too expansive, this one give alternative way to do simply and reduce some changes. If this applies, further simplification can be applied. Thx, eddie Interesting approach. If we can replace most of them, I'll apply. But half converted state is inconsistent. Can't it replace others? All of them can be done in this way. Defining those function by macro is a good idea. But, undef/redefine CONFIG_PARAVIRT looks ugly and defining conflicting name would be confusing. Ideally they should be in a seperate file, or at end of paravirt.c where #undef is clean. I guess your concern is removing bunch of #define ia64_xxx ... Not exactly. I just think it is cleaner and smaller in patch size. (And yes, I agree with you to clean them up.) So how about something like the following? in intrinsic.h #ifdef CONFIG_APRAVIRT #define IA64_INTRINSIC_API(name) paravirt. ## name Do u mean pv_cpu_ops. ## name ? #else #define IA64_INTRINSIC_API(name) ia64_native_ ## name #endif #define ia64_fc IA64_INTRINSIC(fc) ... and keep ia64_native_xxx definitions. I want to keep ia64_xxx definition since it is an unnecessary change. BTW, if we review at diff against original one, it looks better. This doesn't depend on the number of arguments. ??? It is always one parameter. Thanks, eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
[Xen-ia64-devel] pv_ops: imntrinsic pv_ops
Current definition of intrinsic APIs seems to be too expansive, this one give alternative way to do simply and reduce some changes. If this applies, further simplification can be applied. Thx, eddie Simplify intrinsic API handling. Signed-off-by: Yaozu (Eddie) Dong [EMAIL PROTECTED] diff --git a/arch/ia64/kernel/paravirt.c b/arch/ia64/kernel/paravirt.c index 4b01c44..6ce4f60 100644 --- a/arch/ia64/kernel/paravirt.c +++ b/arch/ia64/kernel/paravirt.c @@ -3,6 +3,7 @@ * * Copyright (c) 2008 Isaku Yamahata yamahata at valinux co jp *VA Linux Systems Japan K.K. + * Yaozu (Eddie) Dong [EMAIL PROTECTED] * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by @@ -53,29 +54,18 @@ struct pv_init_ops pv_init_ops; /* ia64_native_xxx are macros so that we have to make them real functions */ -static void -ia64_native_fc_func(unsigned long addr) -{ - ia64_native_fc(addr); -} - -static unsigned long -ia64_native_thash_func(unsigned long addr) -{ - return ia64_native_thash(addr); +#define NATIVE_INTRINSIC_API1(type, name, para1) \ +static type native_ ## name(unsigned long para1) { \ + return name(para1); \ } -static unsigned long -ia64_native_get_cpuid_func(int index) -{ - return ia64_native_get_cpuid(index); -} - -static unsigned long -ia64_native_get_pmd_func(int index) -{ - return ia64_native_get_pmd(index); -} +#undef CONFIG_PARAVIRT +NATIVE_INTRINSIC_API1(void, ia64_fc, addr) +NATIVE_INTRINSIC_API1(unsigned long, ia64_thash, addr) +NATIVE_INTRINSIC_API1(unsigned long, ia64_get_cpuid, addr) +NATIVE_INTRINSIC_API1(unsigned long, ia64_get_pmd, index) +NATIVE_INTRINSIC_API1(void, ia64_intrin_local_irq_restore, flags) +#define CONFIG_PARAVIRT static unsigned long ia64_native_get_eflag_func(void) @@ -217,17 +207,11 @@ ia64_native_get_psr_i_func(void) return ia64_native_get_psr_i(); } -static void -ia64_native_intrin_local_irq_restore_func(unsigned long flags) -{ - ia64_native_intrin_local_irq_restore(flags); -} - struct pv_cpu_ops pv_cpu_ops = { - .fc = ia64_native_fc_func, - .thash = ia64_native_thash_func, - .get_cpuid = ia64_native_get_cpuid_func, - .get_pmd= ia64_native_get_pmd_func, + .fc = native_ia64_fc, + .thash = native_ia64_thash, + .get_cpuid = native_ia64_get_cpuid, + .get_pmd= native_ia64_get_pmd, .get_eflag = ia64_native_get_eflag_func, .set_eflag = ia64_native_set_eflag_func, .get_psr= ia64_native_get_psr_func, @@ -252,7 +236,7 @@ struct pv_cpu_ops pv_cpu_ops = { .rsm_i = ia64_native_rsm_i_func, .get_psr_i = ia64_native_get_psr_i_func, .intrin_local_irq_restore - = ia64_native_intrin_local_irq_restore_func, + = native_ia64_intrin_local_irq_restore, }; /*** *** @@ -335,3 +319,5 @@ ia64_native_do_steal_accounting(unsigned long *new_itm) struct pv_time_ops pv_time_ops = { .do_steal_accounting = ia64_native_do_steal_accounting, }; + + diff --git a/include/asm-ia64/gcc_intrin.h b/include/asm-ia64/gcc_intrin.h index b9fa3f4..fda12e7 100644 --- a/include/asm-ia64/gcc_intrin.h +++ b/include/asm-ia64/gcc_intrin.h @@ -4,6 +4,7 @@ * * Copyright (C) 2002,2003 Jun Nakajima [EMAIL PROTECTED] * Copyright (C) 2002,2003 Suresh Siddha [EMAIL PROTECTED] + * Copyright (C) 2008 Yaozu (Eddie) Dong [EMAIL PROTECTED] */ #include linux/compiler.h @@ -28,6 +29,13 @@ extern void ia64_bad_param_for_getreg (void); register unsigned long ia64_r13 asm (r13) __used; #endif +#ifdef CONFIG_PARAVIRT +#define INTRINSIC_INS1(name, para1, ins) \ + pv_cpu_ops.name(para1) +#else +#define INTRINSIC_INS1(name, para1, ins) ins +#endif + #define ia64_native_setreg(regnum, val) \ ({ \ switch (regnum) { \ @@ -381,8 +389,8 @@ register unsigned long ia64_r13 asm (r13) __used; #define ia64_invala() asm volatile (invala ::: memory) -#define ia64_native_thash(addr) \ -({ \ +#define ia64_thash(addr) \ + INTRINSIC_INS1(thash, addr,{ \ __u64 ia64_intri_res; \ asm volatile (thash %0=%1 : =r(ia64_intri_res) : r (addr));\ ia64_intri_res; \ @@ -437,8 +445,8 @@ register unsigned long ia64_r13 asm (r13) __used; #define ia64_native_set_rr(index, val) \ asm volatile (mov rr[%0]=%1 :: r(index), r(val) : memory); -#define ia64_native_get_cpuid(index) \ -({ \ +#define ia64_get_cpuid(index) \ + INTRINSIC_INS1(get_cpuid, index, { \ __u64 ia64_intri_res; \ asm volatile (mov %0=cpuid[%r1] : =r(ia64_intri_res) : rO(index)); \ ia64_intri_res; \ @@ -473,8 +481,8 @@ register unsigned
[Xen-ia64-devel] pv_ops: file name
Isaku: When looking at the new files in kernel new directory, I am wondering if we can merge paravirt_core.c code into paravirt.c? The size of them are very small and meaning are similar, also X86 put in one file. Another file is paravirt_entry.c, will paravirt_patch.c be much accurate? Some for head file side. Thanks, eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] pv_ops: entry.S simplification
Isaku Yamahata wrote: On Fri, Mar 28, 2008 at 01:43:23PM +0800, Dong, Eddie wrote: Eventually those running_on_xen checks should be removed somehow. Are you just thinking that the multi compile with binary patching should be introduced after the first merge? Or do you have any idea other than the multi compile with binary patching? Dual compile every change may be not necessary for me. The reason for IVT is that code there is very critical and stakeholders won't change them to steal registers. They even don't want a single change without full hand of performance data + stress test. In entry.S, steal clobber register is easy. ia64_swtich_to(), ia64_leave_syscall() and ia64_leave_kernel() are also performance critical, aren't they? If we rate those performance critical items, I would vote IVT as 1st, and then followed by fast system call. The 3rd one can be this one. A handler of IVT is in the range of 20-50 instructions, a fast hypercall may be less than 100 instructions. For ia64_switch_to, the scheduler switch code is in levels of multiple handreds of instructions in my understanding. Putting indirect function call of pv_ops here just introduce 3-10 additional instructions. Is this what you concern? Thanks, eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
[Xen-ia64-devel] include/asm-ia64/xen/hypercall.h
It seems some APIs in that file is dead code, this one is to remove dead code or dom0 only code? Signed-off-by: Yaozu (Eddie) Dong [EMAIL PROTECTED] diff --git a/arch/ia64/xen/Makefile b/arch/ia64/xen/Makefile index 605b757..dc8fee6 100644 --- a/arch/ia64/xen/Makefile +++ b/arch/ia64/xen/Makefile @@ -5,7 +5,7 @@ KBUILD_AFLAGS += -D__IA64_ASM_PARAVIRTUALIZED_XEN obj-y := hypercall.o time.o xenivt.o xensetup.o xen_pv_ops.o irq_xen.o \ -hypervisor.o util.o xencomm.o xcom_hcall.o xcom_asm.o paravirt_xen.o +hypervisor.o util.o xencomm.o xcom_hcall.o paravirt_xen.o obj-y += ../kernel/ivt.o diff --git a/arch/ia64/xen/xcom_asm.S b/arch/ia64/xen/xcom_asm.S deleted file mode 100644 index 8747908..000 --- a/arch/ia64/xen/xcom_asm.S +++ /dev/null @@ -1,27 +0,0 @@ -/* - * xencomm suspend support - * Support routines for Xen - * - * Copyright (C) 2005 Dan Magenheimer [EMAIL PROTECTED] - */ -#include asm/asmmacro.h -#include xen/interface/xen.h - -/* - * Stub for suspend. - * Just force the stacked registers to be written in memory. - */ -GLOBAL_ENTRY(xencomm_arch_hypercall_suspend) - ;; - alloc r20=ar.pfs,0,0,6,0 - mov r2=__HYPERVISOR_sched_op - ;; - /* We don't want to deal with RSE. */ - flushrs - mov r33=r32 - mov r32=2 // SCHEDOP_shutdown - ;; - break 0x1000 - ;; - br.ret.sptk.many b0 -END(xencomm_arch_hypercall_suspend) diff --git a/arch/ia64/xen/xcom_hcall.c b/arch/ia64/xen/xcom_hcall.c index bfddbd7..4a89a74 100644 --- a/arch/ia64/xen/xcom_hcall.c +++ b/arch/ia64/xen/xcom_hcall.c @@ -401,17 +401,6 @@ xencomm_hypercall_memory_op(unsigned int cmd, void *arg) } EXPORT_SYMBOL_GPL(xencomm_hypercall_memory_op); -int -xencomm_hypercall_suspend(unsigned long srec) -{ - struct sched_shutdown arg; - - arg.reason = SHUTDOWN_suspend; - - return xencomm_arch_hypercall_suspend( - xencomm_map_no_alloc(arg, sizeof(arg))); -} - long xencomm_hypercall_vcpu_op(int cmd, int cpu, void *arg) { @@ -443,16 +432,3 @@ xencomm_hypercall_opt_feature(void *arg) xencomm_map_no_alloc(arg, sizeof(struct xen_ia64_opt_feature))); } - -int -xencomm_hypercall_fpswa_revision(unsigned int *revision) -{ - struct xencomm_handle *desc; - - desc = xencomm_map_no_alloc(revision, sizeof(*revision)); - if (desc == NULL) - return -EINVAL; - - return xencomm_arch_hypercall_fpswa_revision(desc); -} -EXPORT_SYMBOL_GPL(xencomm_hypercall_fpswa_revision); diff --git a/include/asm-ia64/xen/hypercall.h b/include/asm-ia64/xen/hypercall.h index 075b9e1..77dda9d 100644 --- a/include/asm-ia64/xen/hypercall.h +++ b/include/asm-ia64/xen/hypercall.h @@ -313,38 +313,7 @@ HYPERVISOR_unexpose_foreign_p2m(unsigned long gpfn, domid_t domid) } #endif -static inline int -xencomm_arch_hypercall_perfmon_op(unsigned long cmd, - struct xencomm_handle *arg, - unsigned long count) -{ - return _hypercall4(int, ia64_dom0vp_op, - IA64_DOM0VP_perfmon, cmd, arg, count); -} - -static inline int -xencomm_arch_hypercall_fpswa_revision(struct xencomm_handle *arg) -{ - return _hypercall2(int, ia64_dom0vp_op, - IA64_DOM0VP_fpswa_revision, arg); -} -static inline int -xencomm_arch_hypercall_ia64_debug_op(unsigned long cmd, -unsigned long domain, -struct xencomm_handle *arg) -{ - return _hypercall3(int, ia64_debug_op, cmd, domain, arg); -} - -static inline int -HYPERVISOR_add_io_space(unsigned long phys_base, - unsigned long sparse, - unsigned long space_number) -{ - return _hypercall4(int, ia64_dom0vp_op, IA64_DOM0VP_add_io_space, - phys_base, sparse, space_number); -} /* for balloon driver */ #define HYPERVISOR_update_va_mapping(va, new_val, flags) (0) @@ -355,16 +324,9 @@ HYPERVISOR_add_io_space(unsigned long phys_base, #define HYPERVISOR_callback_op xencomm_hypercall_callback_op #define HYPERVISOR_multicall xencomm_hypercall_multicall #define HYPERVISOR_xen_version xencomm_hypercall_xen_version -#define HYPERVISOR_console_io xencomm_hypercall_console_io -#define HYPERVISOR_hvm_op xencomm_hypercall_hvm_op #define HYPERVISOR_memory_op xencomm_hypercall_memory_op -#define HYPERVISOR_xenoprof_op xencomm_hypercall_xenoprof_op -#define HYPERVISOR_perfmon_op xencomm_hypercall_perfmon_op -#define HYPERVISOR_fpswa_revision xencomm_hypercall_fpswa_revision -#define HYPERVISOR_suspend xencomm_hypercall_suspend #define HYPERVISOR_vcpu_op xencomm_hypercall_vcpu_op #define HYPERVISOR_opt_feature xencomm_hypercall_opt_feature -#define HYPERVISOR_kexec_op xencomm_hypercall_kexec_op /* to compile gnttab_copy_grant_page() in drivers/xen/core/gnttab.c */ #define
RE: [Xen-ia64-devel] pv_ops: ministate.h typo fix
Isaku Yamahata wrote: On Thu, Mar 27, 2008 at 12:20:37PM +0800, Dong, Eddie wrote: - shuffle instructions of XEN_BSW_1 and xen DO_XEN_MIN(). Is this for producing better bundles? Please ellaborate on this. If so, I'll take as another patch. ??? Which code are u talking for? The following hunks. The instruction order was changed. What's the purpose? Yes, this is to make a compact bundle, otherwise IVT table size will be overflowed in dispatch_to_fault_handler. Thanks, eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] pv_ops: entry.S simplification
Isaku Yamahata wrote: Hi Eddie. I looked into entry.S closely. Unfortunately I found that ia64_leave_syscall() and ia64_leave_kernel() includes invirtualizable instructions, cover instruction with psr.ic = 0 so that those paravirtualization is inevitable. (ia64_switch_to() doesn't need paravirtualization though.) Yes there 2 kind of instructions we must modify, one is cover when psr.ic=0, another one is RFI which can;t be handled by Xen today. But I temply put running on xen for now, I am working on using indirect function call pv_ops now. Or do you mean there are still missed cover instruction? Does it really work? Probably just seeing login prompt test doesn't reveal the issues. thanks, I can login and do minimal ops, I didn't take stress test. But in my coding time, if a cover with PSR.ic=0 is missed, or RFI is missed, guest will soon die. Thanks, eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] Where to compile additional IVT.S
Isaku Yamahata wrote: arch/ia64/kernel/ivt.o is overwritten. Building again under arch/ia64/kernel would cause trouble. What do you think the following? ia64/pv_ops: complie paravirtualized assembly files into each pv dirs. compile ivt.S and switch_leave.S into each pv instanc dir. With this patch, arch/ia64/kernel/Makefile can be simpler than before. I already posted a similar one, and seems to be much simple. Any issues? Eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] pv_ops: move binary patching to later after CPU initialization
Isaku Yamahata wrote: I guess you just followed x86 way, but delaying until check_bug() is too late for IA64 case because of at least ia64_get_cpuid(). No. Binary patching is just optimization, while pv_ops hook is installed at very beginning. At this moment I'm not sure how late binary patching can be delayed, though. Presumably it is necessary to revise boot protocol. Any time as if SMC sequence is considered. Putting together with original Linux patching code will make things simple. Renaming xen_paravirt_patch() to xen_patch() seems reasonable, so I applied only the renaming part. I strongly suggest you check in entry.S patch first and then everything will be very simple. And your above concern can be solved too. Thanks, eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] pv_ops: entry.S simplification
Isaku Yamahata wrote: Oh, I misunderstood your patch. I thought it just revert entry.S to original state. But it paravirtualized conver and rfi with running_on_xen check. Now I'm convinced that your patch works. Only one comment on the patch itself is, #ifdef CONFIG_XEN is necessary for !CONFIG_XEN case. Then the left issue is 'if the patch is acceptable for the upstream'. The purpose of reducing the total patch size is eventually to make ia64/xen domU patches more acceptable for the upstream. However with the patch you reintroduced running_on_xen check which we have eliminted. That contradicts with the pv_ops principle. It's a trade off between the patch size and the patch cleanness. That is a temporary solution, I am working to use indirect function call. Eventually those running_on_xen checks should be removed somehow. Are you just thinking that the multi compile with binary patching should be introduced after the first merge? Or do you have any idea other than the multi compile with binary patching? Dual compile every change may be not necessary for me. The reason for IVT is that code there is very critical and stakeholders won't change them to steal registers. They even don't want a single change without full hand of performance data + stress test. In entry.S, steal clobber register is easy. Anyway it's linux-ia64 people that finally determines what way is better. To be honest I'm not sure which way is more acceptable. So let's discuss with linux-ia64. Yes, we can, but keeping that big patch is a problem for now. Also the switch_entry.S has a ugly code that must use binary patching to patch to Xen. Meanwhile like you mentioned yesterday, where to do binary patching is an issue, especially for native case. Current approach doesn;t consider native patching which is more important than xen patching for now. Eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
[Xen-ia64-devel] remove dead pv_irq_ops.init_IRQ_late
commit d9c6c77dbb20cd5cc9ffbbe8e2398eb737a83162 Author: root [EMAIL PROTECTED] Date: Wed Mar 26 14:14:31 2008 +0800 Remove paravirt_init_IRQ_late since it is never activated. Signed-off-by: Yaozu (Eddie) Dong [EMAIL PROTECTED] diff --git a/arch/ia64/kernel/irq_ia64.c b/arch/ia64/kernel/irq_ia64.c index 10d9a5d..ceafde9 100644 --- a/arch/ia64/kernel/irq_ia64.c +++ b/arch/ia64/kernel/irq_ia64.c @@ -665,7 +665,6 @@ init_IRQ (void) pfm_init_percpu(); #endif platform_irq_init(); - paravirt_init_IRQ_late(); } void diff --git a/include/asm-ia64/paravirt.h b/include/asm-ia64/paravirt.h index 3721eff..285f7ff 100644 --- a/include/asm-ia64/paravirt.h +++ b/include/asm-ia64/paravirt.h @@ -164,7 +164,6 @@ __iosapic_write(char __iomem *iosapic, unsigned int reg, u32 val) struct pv_irq_ops { void (*init_IRQ_early)(void); - void (*init_IRQ_late)(void); int (*assign_irq_vector)(int irq); void (*free_irq_vector)(int vector); @@ -184,13 +183,6 @@ paravirt_init_IRQ_early(void) pv_irq_ops.init_IRQ_early(); } -static inline void -paravirt_init_IRQ_late(void) -{ - if (pv_irq_ops.init_IRQ_late) - pv_irq_ops.init_IRQ_late(); -} - static inline int assign_irq_vector(int irq) { @@ -266,7 +258,6 @@ paravirt_do_steal_accounting(unsigned long *new_itm) #define paravirt_inst_patch_module(start, end) do { } while (0) #define paravirt_init_IRQ_early() do { } while (0) -#define paravirt_init_IRQ_late() do { } while (0) #define paravirt_init_missing_ticks_accounting(cpu)do { } while (0) #define paravirt_do_steal_accounting() 0 irq_ia64_clean1.patch Description: irq_ia64_clean1.patch ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
[Xen-ia64-devel] pv_ops: RFC: paravirt_init_IRQ_early
Currently, paravirt_init_IRQ_early is used to register IA64_IPI_RESCHEDULE/IA64_IPI_LOCAL_TLB_FLUSH for different hypervisor/native. It seems not strightforward from the name, how about something like: pv_irq_ops.register_ipi ? We can let include IA64_IPI_VECTOR register too. I am not native english, so I am not sure. Comments? Thanks, eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
[Xen-ia64-devel] pv_ops: move binary patching to later after CPU initialization
arch/ia64/kernel/paravirt.c |8 +++- arch/ia64/kernel/paravirt_core.c | 17 ++--- arch/ia64/kernel/paravirt_entry.c |3 ++- arch/ia64/kernel/setup.c |3 +++ arch/ia64/xen/paravirt_xen.c |8 +--- arch/ia64/xen/xen_pv_ops.c|4 arch/ia64/xen/xensetup.S | 10 -- include/asm-ia64/paravirt.h |1 + 8 files changed, 20 insertions(+), 34 deletions(-) So far it is still NULL for both native xen. Thanks, eddie Defer binary patching from beginning to later after initialization is done. Signed-off-by: Yaozu (Eddie) Dong [EMAIL PROTECTED] diff --git a/arch/ia64/kernel/paravirt.c b/arch/ia64/kernel/paravirt.c index 37bad82..b7340dd 100644 --- a/arch/ia64/kernel/paravirt.c +++ b/arch/ia64/kernel/paravirt.c @@ -39,12 +39,18 @@ struct pv_info pv_info = { .name = bare hardware }; +static void native_patch(void) +{ +} + /*** * pv_init_ops * initialization hooks. */ -struct pv_init_ops pv_init_ops; +struct pv_init_ops pv_init_ops = { + .patch = native_patch, +}; /*** * pv_cpu_ops diff --git a/arch/ia64/kernel/paravirt_core.c b/arch/ia64/kernel/paravirt_core.c index 6b7c70f..003ce1f 100644 --- a/arch/ia64/kernel/paravirt_core.c +++ b/arch/ia64/kernel/paravirt_core.c @@ -21,20 +21,7 @@ */ #include asm/paravirt_core.h - -/* - * flush_icache_range() can't be used here. - * we are here before cpu_init() which initializes - * ia64_i_cache_stride_shift. flush_icache_range() uses it. - */ -void __init_or_module -paravirt_flush_i_cache_range(const void *instr, unsigned long size) -{ - unsigned long i; - - for (i = 0; i size; i += sizeof(bundle_t)) - asm volatile (fc.i %0:: r(instr + i): memory); -} +#include asm/pgtable.h bundle_t* __init_or_module paravirt_get_bundle(unsigned long tag) @@ -162,7 +149,7 @@ paravirt_write_inst(unsigned long tag, cmp_inst_t inst) default: BUG(); } - paravirt_flush_i_cache_range(bundle, sizeof(*bundle)); + flush_icache_range((unsigned long)bundle, (unsigned long)(bundle+1)); } /* for debug */ diff --git a/arch/ia64/kernel/paravirt_entry.c b/arch/ia64/kernel/paravirt_entry.c index 708287a..857d2a1 100644 --- a/arch/ia64/kernel/paravirt_entry.c +++ b/arch/ia64/kernel/paravirt_entry.c @@ -20,6 +20,7 @@ #include asm/paravirt_core.h #include asm/paravirt_entry.h +#include asm/pgtable.h /* br.cond.sptk.many target25B1 */ typedef union inst_b1 { @@ -56,7 +57,7 @@ __paravirt_entry_apply(unsigned long tag, const void *target) inst.l = inst_b1.l; paravirt_write_inst(tag, inst); - paravirt_flush_i_cache_range(bundle, sizeof(*bundle)); + flush_icache_range((unsigned long)bundle, (unsigned long)(bundle+1)); } static void __init diff --git a/arch/ia64/kernel/setup.c b/arch/ia64/kernel/setup.c index 24561d3..6634ba7 100644 --- a/arch/ia64/kernel/setup.c +++ b/arch/ia64/kernel/setup.c @@ -987,6 +987,9 @@ cpu_init (void) void __init check_bugs (void) { +#ifdef CONFIG_PARAVIRT_GUEST +pv_init_ops.patch(); +#endif ia64_patch_mckinley_e9((unsigned long) __start___mckinley_e9_bundles, (unsigned long) __end___mckinley_e9_bundles); } diff --git a/arch/ia64/xen/paravirt_xen.c b/arch/ia64/xen/paravirt_xen.c index aa12cb5..969478e 100644 --- a/arch/ia64/xen/paravirt_xen.c +++ b/arch/ia64/xen/paravirt_xen.c @@ -28,7 +28,7 @@ const static struct paravirt_entry xen_entries[] __initdata = { }; void __init -xen_entry_patch(void) +xen_patch(void) { extern const struct paravirt_entry_patch __start_paravirt_entry[]; extern const struct paravirt_entry_patch __stop_paravirt_entry[]; @@ -39,12 +39,6 @@ xen_entry_patch(void) sizeof(xen_entries)/sizeof(xen_entries[0])); } -void __init -xen_paravirt_patch(void) -{ - xen_entry_patch(); -} - /* * Local variables: * mode: C diff --git a/arch/ia64/xen/xen_pv_ops.c b/arch/ia64/xen/xen_pv_ops.c index 3601b79..a2da7b2 100644 --- a/arch/ia64/xen/xen_pv_ops.c +++ b/arch/ia64/xen/xen_pv_ops.c @@ -38,6 +38,9 @@ #include irq_xen.h #include time.h +/* TODO: move xen_patch to this file */ +extern void xen_patch(void); + /*** * general info */ @@ -157,6 +160,7 @@ xen_post_smp_prepare_boot_cpu(void) static const struct pv_init_ops xen_init_ops __initdata = { .banner = xen_banner, + .patch = xen_patch, .reserve_memory = xen_reserve_memory, diff --git a/arch/ia64/xen/xensetup.S b/arch/ia64/xen/xensetup.S index cb3432b..0df93d8 100644 --- a/arch/ia64/xen/xensetup.S +++ b/arch/ia64/xen/xensetup.S @@ -45,16 +45,6 @@ GLOBAL_ENTRY(early_xen_setup) ;; #endif -#ifdef
[Xen-ia64-devel] RE: Xen common code across architecture
Jeremy Fitzhardinge wrote: Dong, Eddie wrote: Jeremy/Andrew: Isaku Yamahata, I and some other IA64/Xen community memebers are working together to enable pv_ops for IA64 Linux. This patch is a preparation to move common arch/x86/xen/events.c to drivers/xen (contents are identical) against mm tree, it is based on Yamahata's IA64/pv_ops patch serie. In case you want to have a brief view of whole pv_ops/IA64 patch serie, please refer to IA64 Linux mailinglist. How do you want to manage this work? I'm currently basing off Ingo+tglx's x86.git tree. Would you like me to track these kinds of common-code changes in my tree, while you maintain a separate ia64-specific tree? Hi, Jeremy: I didn't realized there is a xen pv_ops downstream tree, yes if you can take it first, that will be great! BTW, where is your latest tree? I got one from git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git, but it seems to be pretty old. commit ab9c232286c2b77be78441c2d8396500b045777e Merge: 8bd0983... 2855568... Author: Linus Torvalds [EMAIL PROTECTED] Date: Fri Oct 12 16:16:41 2007 -0700 Merge branch 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/jga Thanks, eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] pv_ops: ministate.h typo fix
Isaku Yamahata wrote: Hi Eddie. The attached patches does many things. Could you explain? - convert cover argument in SAVE_MIN_WITH_COVER(_R19) into COVER. This seems correct. I'll take this part. - convert __COVER argument into COVER. Using conflicting argument is a bad practice. This is what original Linux uses, I think we don't need to convert from COVER to __COVER which I guess you think it is as a cover instruction. - shuffle instructions of XEN_BSW_1 and xen DO_XEN_MIN(). Is this for producing better bundles? Please ellaborate on this. If so, I'll take as another patch. ??? Which code are u talking for? - churning header file inclusion. I need to rethink to do this with another mail you posted as where to compile. I'll answer it to that mail. I'm now inclined to move ia64/kernel/minstate.h under include/asm-ia64/native/. This is not in my patch, right? If you want to remove it back forth, can u do after my queueing patch is taken or modified since rebase is headache. Thanks, eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
[Xen-ia64-devel] pv_ops: hypercall.S cleanup
Most hypercall are identical in source code, using a common MACRO to define 0/1/2 parameter(s) hypercall is much simple. arch/ia64/xen/hypercall.S | 154 +- include/asm-ia64/xen/privop.h | 26 --- 2 files changed, 51 insertions(+), 129 deletions(-) Signed-off-by: Yaozu (Eddie) Dong [EMAIL PROTECTED] diff --git a/arch/ia64/xen/hypercall.S b/arch/ia64/xen/hypercall.S index 615dad9..ce7b015 100644 --- a/arch/ia64/xen/hypercall.S +++ b/arch/ia64/xen/hypercall.S @@ -2,79 +2,64 @@ * Support routines for Xen hypercalls * * Copyright (C) 2005 Dan Magenheimer [EMAIL PROTECTED] + * Copyright (C) 2008 Yaozu (Eddie) Dong [EMAIL PROTECTED] */ #include asm/asmmacro.h #include asm/intrinsics.h -GLOBAL_ENTRY(xen_get_psr) - XEN_HYPER_GET_PSR - br.ret.sptk.many rp - ;; -END(xen_get_psr) - -GLOBAL_ENTRY(xen_get_ivr) - XEN_HYPER_GET_IVR - br.ret.sptk.many rp - ;; -END(xen_get_ivr) - -GLOBAL_ENTRY(xen_get_tpr) - XEN_HYPER_GET_TPR - br.ret.sptk.many rp - ;; -END(xen_get_tpr) - -GLOBAL_ENTRY(xen_set_tpr) - mov r8=r32 - XEN_HYPER_SET_TPR - br.ret.sptk.many rp - ;; -END(xen_set_tpr) - -GLOBAL_ENTRY(xen_eoi) - mov r8=r32 - XEN_HYPER_EOI - br.ret.sptk.many rp - ;; -END(xen_eoi) - -GLOBAL_ENTRY(xen_thash) - mov r8=r32 - XEN_HYPER_THASH - br.ret.sptk.many rp - ;; -END(xen_thash) - -GLOBAL_ENTRY(xen_set_itm) - mov r8=r32 - XEN_HYPER_SET_ITM - br.ret.sptk.many rp - ;; -END(xen_set_itm) +/* + * Hypercalls without parameter. + */ +#define __HCALL0(name,hcall) \ + GLOBAL_ENTRY(name); \ + break hcall; \ + br.ret.sptk.many rp;\ + END(name) -GLOBAL_ENTRY(xen_ptcga) - mov r8=r32 - mov r9=r33 - XEN_HYPER_PTC_GA - br.ret.sptk.many rp - ;; -END(xen_ptcga) +/* + * Hypercalls with 1 parameter. + */ +#define __HCALL1(name,hcall) \ + GLOBAL_ENTRY(name); \ + mov r8=r32; \ + break hcall; \ + br.ret.sptk.many rp;\ + END(name) -GLOBAL_ENTRY(xen_get_rr) - mov r8=r32 - XEN_HYPER_GET_RR - br.ret.sptk.many rp - ;; -END(xen_get_rr) +/* + * Hypercalls with 2 parameters. + */ +#define __HCALL2(name,hcall) \ + GLOBAL_ENTRY(name); \ + mov r8=r32; \ + mov r9=r33; \ + break hcall; \ + br.ret.sptk.many rp;\ + END(name) + +__HCALL0(xen_get_psr, HYPERPRIVOP_GET_PSR) +__HCALL0(xen_get_ivr, HYPERPRIVOP_GET_IVR) +__HCALL0(xen_get_tpr, HYPERPRIVOP_GET_TPR) +__HCALL0(xen_hyper_ssm_i, HYPERPRIVOP_SSM_I) + +__HCALL1(xen_set_tpr, HYPERPRIVOP_SET_TPR) +__HCALL1(xen_eoi, HYPERPRIVOP_EOI) +__HCALL1(xen_thash, HYPERPRIVOP_THASH) +__HCALL1(xen_set_itm, HYPERPRIVOP_SET_ITM) +__HCALL1(xen_get_rr, HYPERPRIVOP_GET_RR) +__HCALL1(xen_fc, HYPERPRIVOP_FC) +__HCALL1(xen_get_cpuid, HYPERPRIVOP_GET_CPUID) +__HCALL1(xen_get_pmd, HYPERPRIVOP_GET_PMD) + +__HCALL2(xen_ptcga, HYPERPRIVOP_PTC_GA) +__HCALL2(xen_set_rr, HYPERPRIVOP_SET_RR) +__HCALL2(xen_set_kr, HYPERPRIVOP_SET_KR) -GLOBAL_ENTRY(xen_set_rr) - mov r8=r32 - mov r9=r33 - XEN_HYPER_SET_RR - br.ret.sptk.many rp - ;; -END(xen_set_rr) +#ifdef CONFIG_IA32_SUPPORT +__HCALL1(xen_get_eflag, HYPERPRIVOP_GET_EFLAG) +__HCALL1(xen_set_eflag, HYPERPRIVOP_SET_EFLAG) // refer SDM vol1 3.1.8 +#endif /* CONFIG_IA32_SUPPORT */ GLOBAL_ENTRY(xen_set_rr0_to_rr4) mov r8=r32 @@ -87,45 +72,6 @@ GLOBAL_ENTRY(xen_set_rr0_to_rr4) ;; END(xen_set_rr0_to_rr4) -GLOBAL_ENTRY(xen_set_kr) - mov r8=r32 - mov r9=r33 - XEN_HYPER_SET_KR - br.ret.sptk.many rp -END(xen_set_kr) - -GLOBAL_ENTRY(xen_fc) - mov r8=r32 - XEN_HYPER_FC - br.ret.sptk.many rp -END(xen_fc) - -GLOBAL_ENTRY(xen_get_cpuid) - mov r8=r32 - XEN_HYPER_GET_CPUID - br.ret.sptk.many rp -END(xen_get_cpuid) - -GLOBAL_ENTRY(xen_get_pmd) - mov r8=r32 - XEN_HYPER_GET_PMD - br.ret.sptk.many rp -END(xen_get_pmd) - -#ifdef CONFIG_IA32_SUPPORT -GLOBAL_ENTRY(xen_get_eflag) - XEN_HYPER_GET_EFLAG - br.ret.sptk.many rp -END(xen_get_eflag) - -// some bits aren't set if pl!=0, see SDM vol1 3.1.8 -GLOBAL_ENTRY(xen_set_eflag) - mov r8=r32 - XEN_HYPER_SET_EFLAG - br.ret.sptk.many rp -END(xen_set_eflag) -#endif /* CONFIG_IA32_SUPPORT */ - GLOBAL_ENTRY(xen_send_ipi) mov r14=r32 mov r15=r33 diff --git a/include/asm-ia64/xen/privop.h b/include/asm-ia64/xen/privop.h index 7657d37..e69380a 100644 --- a/include/asm-ia64/xen/privop.h +++ b/include/asm-ia64/xen/privop.h @@ -35,22 +35,8 @@ #define XEN_HYPER_ITC_Ibreak HYPERPRIVOP_ITC_I #define XEN_HYPER_SSM_I
[Xen-ia64-devel] RE: Xen common code across architecture
Dong, Eddie wrote: Jeremy/Andrew: Isaku Yamahata, I and some other IA64/Xen community memebers are working together to enable pv_ops for IA64 Linux. This patch is a preparation to move common arch/x86/xen/events.c to drivers/xen (contents are identical) against mm tree, it is based on Yamahata's IA64/pv_ops patch serie. In case you want to have a brief view of whole pv_ops/IA64 patch serie, please refer to IA64 Linux mailinglist. Thanks, Eddie Fix a typo. Merged one is attached too. Signed-off-by: Yaozu (Eddie) Dong [EMAIL PROTECTED] --- drivers/xen/events_old.c2008-03-25 14:31:40.503525471 +0800 +++ drivers/xen/events.c2008-03-25 14:19:39.841851430 +0800 @@ -37,7 +37,7 @@ #include xen/interface/xen.h #include xen/interface/event_channel.h -#include xen-ops.h +#include xen/xen-ops.h /* * This lock protects updates to the following mapping and reference-count typo Description: typo move_xenirq3.patch Description: move_xenirq3.patch ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
[Xen-ia64-devel] pv_ops: ministate.h typo fix
The MACRO parameter COVER in DO_SAVE_MIN won't be replaced by COVER macro in inst.h since it is already replaced when compiler extend SAVE_MIN_WITH_COVER macro etc. Thanks, eddie Fix DO_SAVE_MIN macro typo, and move some instructions to make bundle compact. Signed-off-by: Yaozu (Eddie) Dong [EMAIL PROTECTED] diff --git a/arch/ia64/kernel/ivt.S b/arch/ia64/kernel/ivt.S index d1cebe5..f2306ae 100644 --- a/arch/ia64/kernel/ivt.S +++ b/arch/ia64/kernel/ivt.S @@ -75,7 +75,6 @@ # define DBG_FAULT(i) #endif -#include inst_paravirt.h #include minstate.h #define FAULT(n) \ diff --git a/arch/ia64/kernel/minstate.h b/arch/ia64/kernel/minstate.h index 10a412c..9e18fb0 100644 --- a/arch/ia64/kernel/minstate.h +++ b/arch/ia64/kernel/minstate.h @@ -2,6 +2,7 @@ #include asm/cache.h #include entry.h +#include inst_paravirt.h #ifdef __IA64_ASM_PARAVIRTUALIZED_NATIVE /* @@ -29,7 +30,7 @@ * Note that psr.ic is NOT turned on by this macro. This is so that * we can pass interruption state as arguments to a handler. */ -#define DO_SAVE_MIN(__COVER,SAVE_IFS,EXTRA) \ +#define DO_SAVE_MIN(COVER,SAVE_IFS,EXTRA) \ mov r16=IA64_KR(CURRENT); /* M */ \ mov r27=ar.rsc; /* M */ \ mov r20=r1; /* A */ \ @@ -38,7 +39,7 @@ mov r26=ar.pfs; /* I */ \ MOV_FROM_IIP(r28); /* M */ \ mov r21=ar.fpsr;/* M */ \ - __COVER;/* B;; (or nothing) */ \ + COVER; /* B;; (or nothing) */ \ ;; \ adds r16=IA64_TASK_THREAD_ON_USTACK_OFFSET,r16; \ ;; \ @@ -194,6 +195,6 @@ st8 [r25]=r10; /* ar.ssd */\ ;; -#define SAVE_MIN_WITH_COVERDO_SAVE_MIN(cover, mov r30=cr.ifs,) -#define SAVE_MIN_WITH_COVER_R19DO_SAVE_MIN(cover, mov r30=cr.ifs, mov r15=r19) +#define SAVE_MIN_WITH_COVERDO_SAVE_MIN(COVER, mov r30=cr.ifs,) +#define SAVE_MIN_WITH_COVER_R19DO_SAVE_MIN(COVER, mov r30=cr.ifs, mov r15=r19) #define SAVE_MIN DO_SAVE_MIN( , mov r30=r0, ) diff --git a/arch/ia64/xen/xenivt.S b/arch/ia64/xen/xenivt.S index 2d509f2..17987af 100644 --- a/arch/ia64/xen/xenivt.S +++ b/arch/ia64/xen/xenivt.S @@ -13,9 +13,8 @@ #include asm/kregs.h #include asm/pgtable.h -#include asm/xen/inst.h -#include asm/xen/minstate.h #include ../kernel/minstate.h +#include asm/xen/minstate.h .section .text,ax GLOBAL_ENTRY(xen_event_callback) diff --git a/include/asm-ia64/xen/inst.h b/include/asm-ia64/xen/inst.h index a8fb2ac..1e92d02 100644 --- a/include/asm-ia64/xen/inst.h +++ b/include/asm-ia64/xen/inst.h @@ -414,10 +414,10 @@ movl r30 = XSI_B1NAT; \ ;; \ ld8 r30 = [r30];\ + mov r31 = 1;\ ;; \ mov ar.unat = r30; \ movl r30 = XSI_BANKNUM; \ - mov r31 = 1;\ ;; \ st4 [r30] = r31;\ movl r30 = XSI_BANK1_R16; \ diff --git a/include/asm-ia64/xen/minstate.h b/include/asm-ia64/xen/minstate.h index 67bbf79..7cdebc2 100644 --- a/include/asm-ia64/xen/minstate.h +++ b/include/asm-ia64/xen/minstate.h @@ -25,17 +25,16 @@ * Note that psr.ic is NOT turned on by this macro. This is so that * we can pass interruption state as arguments to a handler. */ -#define DO_SAVE_MIN(__COVER,SAVE_IFS,EXTRA) \ +#define DO_SAVE_MIN(COVER,SAVE_IFS,EXTRA) \ mov r16=IA64_KR(CURRENT); /* M */ \ mov r27=ar.rsc; /* M */ \ mov r20=r1; /* A */ \ mov r25=ar.unat;/* M */ \ MOV_FROM_IPSR(r29); /* M */ \ - mov r26=ar.pfs; /* I */ \ MOV_FROM_IIP(r28); /* M */ \ mov r21=ar.fpsr;/* M */ \ - __COVER;/* B;; (or nothing) */ \ - ;; \ + mov r26=ar.pfs; /* I */ \ + COVER; /* B;; (or nothing) */ \ adds r16=IA64_TASK_THREAD_ON_USTACK_OFFSET,r16; \ ;; \ ld1 r17=[r16]; /* load current-thread.on_ustack flag */ \ @@ -80,17 +79,17 @@ .mem.offset 8,0; st8.spill [r17]=r9,16; \ ;; \ .mem.offset 0,0; st8.spill [r16]=r10,24; \ + movl r8=XSI_PRECOVER_IFS; \ .mem.offset 8,0; st8.spill [r17]=r11,24; \ ;; \ /* xen special handling for possibly lazy cover */ \ /* XXX: SAVE_MIN case in dispatch_ia32_handler: mov r30=r0 */ \ - movl r8=XSI_PRECOVER_IFS; \ ;; \ ld8 r30=[r8]; \ - ;; \ +(pUStk)sub r18=r18,r22;/* r18=RSE.ndirty*8 */ \ st8 [r16]=r28,16; /* save cr.iip */ \ + ;; \ st8 [r17]=r30,16; /* save cr.ifs */
[Xen-ia64-devel] pv_ops: IVT.s replacement to cover all sensitive instructions
Replace all sensitive instructions in dual compile IVT.s. Now the total change against upstream is: 89 files changed, 8857 insertions(+), 1059 deletions(-) All in one patch file size is 11643 lines. If we can seperate those common file movement from arch/x86/xen to driver/xen, additional 1.2-1.4K lines can be saved. Thanks, eddie Signed-off-by: Yaozu (Eddie) Dong [EMAIL PROTECTED] diff --git a/arch/ia64/kernel/ivt.S b/arch/ia64/kernel/ivt.S index f2306ae..d516bf4 100644 --- a/arch/ia64/kernel/ivt.S +++ b/arch/ia64/kernel/ivt.S @@ -19,6 +19,7 @@ * Copyright (c) 2008 Isaku Yamahata yamahata at valinux co jp *VA Linux Systems Japan K.K. *pv_ops. + * Yaozu (Eddie) Dong [EMAIL PROTECTED] */ /* * This file defines the interruption vector table used by the CPU. @@ -338,7 +339,7 @@ ENTRY(alt_itlb_miss) DBG_FAULT(3) MOV_FROM_IFA(r16) // get address that caused the TLB miss movl r17=PAGE_KERNEL - MOV_FROM_IPSR(r21) + MOV_FROM_IPSR(p0,r21) movl r19=(((1 IA64_MAX_PHYS_BITS) - 1) ~0xfff) mov r31=pr ;; @@ -378,7 +379,7 @@ ENTRY(alt_dtlb_miss) movl r17=PAGE_KERNEL MOV_FROM_ISR(r20) movl r19=(((1 IA64_MAX_PHYS_BITS) - 1) ~0xfff) - MOV_FROM_IPSR(r21) + MOV_FROM_IPSR(p0,r21) mov r31=pr mov r24=PERCPU_ADDR ;; @@ -417,7 +418,7 @@ ENTRY(alt_dtlb_miss) dep r21=-1,r21,IA64_PSR_ED_BIT,1 ;; or r19=r19,r17 // insert PTE control bits into r19 -(p6) mov cr.ipsr=r21 + MOV_FROM_IPSR(p6,r21) ;; ITC_D(p7, r19, r18) // insert the TLB entry mov pr=r31,-1 @@ -618,9 +619,9 @@ ENTRY(iaccess_bit) /* * Erratum 10 (IFA may contain incorrect address) has NoFix status. */ - mov r17=cr.ipsr + MOV_FROM_IPSR(p0,r17) ;; - mov r18=cr.iip + MOV_FROM_IIP(r18) tbit.z p6,p0=r17,IA64_PSR_IS_BIT// IA64 instruction set? ;; (p6) mov r16=r18 // if so, use cr.iip instead of cr.ifa @@ -745,7 +746,7 @@ ENTRY(break_fault) */ DBG_FAULT(11) mov.m r16=IA64_KR(CURRENT) // M2 r16 - current task (12 cyc) - MOV_FROM_IPSR(r29) // M2 (12 cyc) + MOV_FROM_IPSR(p0,r29) // M2 (12 cyc) mov r31=pr // I0 (2 cyc) MOV_FROM_IIM(r17) // M2 (2 cyc) @@ -1057,11 +1058,11 @@ ENTRY(dispatch_illegal_op_fault) .prologue .body SAVE_MIN_WITH_COVER - ssm psr.ic | PSR_DEFAULT_BITS + SSM_PSR_IC_AND_DEFAULT_BITS(r3,r24) ;; srlz.i // guarantee that interruption collection is on ;; -(p15) ssm psr.i // restore psr.i + SSM_PSR_I(p15,r3) // restore psr.i adds r3=8,r2// set up second base pointer for SAVE_REST ;; alloc r14=ar.pfs,0,0,1,0// must be first in insn group @@ -1109,15 +1110,15 @@ ENTRY(non_syscall) // suitable spot... alloc r14=ar.pfs,0,0,2,0 - mov out0=cr.iim + MOV_FROM_IIM(out0) add out1=16,sp adds r3=8,r2// set up second base pointer for SAVE_REST - ssm psr.ic | PSR_DEFAULT_BITS + SSM_PSR_IC_AND_DEFAULT_BITS(r15,r24) ;; srlz.i // guarantee that interruption collection is on ;; -(p15) ssm psr.i // restore psr.i + SSM_PSR_I(p15,r15) // restore psr.i movl r15=ia64_leave_kernel ;; SAVE_REST @@ -1143,14 +1144,14 @@ ENTRY(dispatch_unaligned_handler) SAVE_MIN_WITH_COVER ;; alloc r14=ar.pfs,0,0,2,0// now it's safe (must be first in insn group!) - mov out0=cr.ifa + MOV_FROM_IFA(out0) adds out1=16,sp - ssm psr.ic | PSR_DEFAULT_BITS + SSM_PSR_IC_AND_DEFAULT_BITS(r3,r24) ;; srlz.i // guarantee that interruption collection is on ;; -(p15) ssm psr.i // restore psr.i + SSM_PSR_I(p15,r3) // restore psr.i adds r3=8,r2// set up second base pointer ;; SAVE_REST @@ -1182,17 +1183,17 @@ ENTRY(dispatch_to_fault_handler) */ SAVE_MIN_WITH_COVER_R19 alloc r14=ar.pfs,0,0,5,0 - mov out0=r15 MOV_FROM_ISR(out1) MOV_FROM_IFA(out2) MOV_FROM_IIM(out3) MOV_FROM_ITIR(out4) ;; - ssm psr.ic | PSR_DEFAULT_BITS + SSM_PSR_IC_AND_DEFAULT_BITS(r3, out0) + mov out0=r15 ;; srlz.i // guarantee that interruption collection is on ;; -(p15) ssm psr.i // restore psr.i +
RE: [Xen-ia64-devel] pv_ops: IVT.s replacement to cover all sensitiveinstructions
Akio Takebe wrote: Hi, Eddie diff --git a/arch/ia64/kernel/ivt.S b/arch/ia64/kernel/ivt.S index f2306ae..d516bf4 100644 --- a/arch/ia64/kernel/ivt.S +++ b/arch/ia64/kernel/ivt.S @@ -19,6 +19,7 @@ * Copyright (c) 2008 Isaku Yamahata yamahata at valinux co jp *VA Linux Systems Japan K.K. *pv_ops. + * Yaozu (Eddie) Dong [EMAIL PROTECTED] */ /* * This file defines the interruption vector table used by the CPU. @@ -338,7 +339,7 @@ ENTRY(alt_itlb_miss) DBG_FAULT(3) MOV_FROM_IFA(r16) // get address that caused the TLB miss movl r17=PAGE_KERNEL -MOV_FROM_IPSR(r21) +MOV_FROM_IPSR(p0,r21) Why do you specify p0 to the macro? Is it not neccessary to perform the mov? Originally it is just mov, but there is a place it needs a Pred, see alt_dtlb_miss in ivt.S. So the MACRO in inst.h is replaced with a pred, Or do u mean we keep seperate MACRO for with Pred?. Actually p0 is default pred for all(default) instruction, so post compile code is same. Thanks, eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] [Q] How to download pv_ops git tree
Akio Takebe wrote: Hi, Isaku and all I'd like to start pv_ops work. I'm not familiar with git command. I did git clone Isaku's git tree, but I cannot download it. (I can download Xiantao's kvm-ia64 git tree.) Did I have some mistakes? And can I use git protocol instead of http? # git clone http://people.valinux.co.jp/~yamahata/xen-ia64/linux-2.6-xen-ia6 4.git Initialized empty Git repository in /root/pv_ops/linux-2.6-xen-ia64/.git/ Cannot get remote repository information. Perhaps git-update-server-info needs to be run there? Best Regards, Akio Takebe I am not a git expert either :( But would you please check if the http_proxy is set correctly? Xiantao'tree support git port I think. We was ever suffer from this for couple days too :) Eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] Where to compile additional IVT.S
Dong, Eddie wrote: Alex/Isaku: Current the make file is to compile additional ivt.S at kernel/., another approach is to compile in xen/.. The later one has following benfit: 1: Easy to read for Makefile and easy to extend for more hypervisors. 2: Xen specific ministate.h can be in arch/ia64/xen/, like the one under arch/ia64/kernel. I am not a makefile expert, just use this example to explain idea, suggestion? thanks, eddie Here is the formal patch for this. Thanks, eddie Move 2nd compile of ivt.S to per hypervisor sub dir. Signed-off-by: Yaozu (Eddie) Dong [EMAIL PROTECTED] diff --git a/arch/ia64/kernel/Makefile b/arch/ia64/kernel/Makefile index 3e9a162..78ec040 100644 --- a/arch/ia64/kernel/Makefile +++ b/arch/ia64/kernel/Makefile @@ -80,16 +80,3 @@ $(obj)/gate-data.o: $(obj)/gate.so # AFLAGS_ivt.o += -D__IA64_ASM_PARAVIRTUALIZED_NATIVE -# xen multi compile -$(obj)/xen_%.o: $(src)/%.S FORCE - $(call if_changed_dep,as_o_S) - -# -# xenivt.o -# -obj-$(CONFIG_XEN) += xen_ivt.o -ifeq ($(CONFIG_XEN), y) -targets += xen_ivt.o -$(obj)/build-in.o: xen_ivt.o -endif -AFLAGS_xen_ivt.o += -D__IA64_ASM_PARAVIRTUALIZED_XEN diff --git a/arch/ia64/xen/Makefile b/arch/ia64/xen/Makefile index 87e29d2..605b757 100644 --- a/arch/ia64/xen/Makefile +++ b/arch/ia64/xen/Makefile @@ -2,7 +2,11 @@ # Makefile for Xen components # +KBUILD_AFLAGS += -D__IA64_ASM_PARAVIRTUALIZED_XEN + obj-y := hypercall.o time.o xenivt.o xensetup.o xen_pv_ops.o irq_xen.o \ hypervisor.o util.o xencomm.o xcom_hcall.o xcom_asm.o paravirt_xen.o +obj-y += ../kernel/ivt.o + obj-$(CONFIG_IA64_GENERIC) += machvec.o diff --git a/arch/ia64/xen/xenivt.S b/arch/ia64/xen/xenivt.S index c688aaa..2d509f2 100644 --- a/arch/ia64/xen/xenivt.S +++ b/arch/ia64/xen/xenivt.S @@ -13,7 +13,6 @@ #include asm/kregs.h #include asm/pgtable.h -#define __IA64_ASM_PARAVIRTUALIZED_XEN #include asm/xen/inst.h #include asm/xen/minstate.h #include ../kernel/minstate.h ivt_simplify.patch Description: ivt_simplify.patch ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
[Xen-ia64-devel] Xen common code across architecture
Jeremy all: Current xen kernel codes are in arch/x86/xen, but xen dynamic irqchip (events.c) are common for other architectures such as IA64. We are in progress with enabling pv_ops for IA64 now and want to reuse same code, do we need to move the code to some place common? suggestions? Thanks, eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
[Xen-ia64-devel] RE: simplify hw_irq.h
Either are fine. -Original Message- From: Isaku Yamahata [mailto:[EMAIL PROTECTED] Sent: 2008年3月19日 10:45 To: Dong, Eddie Cc: Alex Williamson; xen-ia64-devel@lists.xensource.com Subject: Re: simplify hw_irq.h Hi Eddie. Thank you for the patches. ia64_vector is for iosapic redirect vector which is 8bit width, isn't it? So just unconditionally replacing u8 with u16 seems unreasonable. How about the following? #ifndef CONFIG_PARAVIRT typedef u8 ia64_vector; #else typedef u16 ia64_vector; #endif On Tue, Mar 18, 2008 at 09:30:19PM +0800, Dong, Eddie wrote: This one should be safe and easy to be accepted to remove CONFIG_XEN. Signed-off-by: Yaozu (Eddie) Dong [EMAIL PROTECTED] diff --git a/include/asm-ia64/hw_irq.h b/include/asm-ia64/hw_irq.h index 80009cd..f670433 100644 --- a/include/asm-ia64/hw_irq.h +++ b/include/asm-ia64/hw_irq.h @@ -15,11 +15,7 @@ #include asm/ptrace.h #include asm/smp.h -#ifndef CONFIG_XEN -typedef u8 ia64_vector; -#else typedef u16 ia64_vector; -#endif /* * 0 special -- yamahata ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
[Xen-ia64-devel] RE: pv_ops: entry.S simplification
Followup patch to delete dead file then. Thanks, eddie entry32.patch Description: entry32.patch ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
[Xen-ia64-devel] Where to compile additional IVT.S
Alex/Isaku: Current the make file is to compile additional ivt.S at kernel/., another approach is to compile in xen/.. The later one has following benfit: 1: Easy to read for Makefile and easy to extend for more hypervisors. 2: Xen specific ministate.h can be in arch/ia64/xen/, like the one under arch/ia64/kernel. I am not a makefile expert, just use this example to explain idea, suggestion? thanks, eddie diff --git a/arch/ia64/kernel/Makefile b/arch/ia64/kernel/Makefile index 3e9a162..78ec040 100644 --- a/arch/ia64/kernel/Makefile +++ b/arch/ia64/kernel/Makefile @@ -80,16 +80,3 @@ $(obj)/gate-data.o: $(obj)/gate.so # AFLAGS_ivt.o += -D__IA64_ASM_PARAVIRTUALIZED_NATIVE -# xen multi compile -$(obj)/xen_%.o: $(src)/%.S FORCE - $(call if_changed_dep,as_o_S) - -# -# xenivt.o -# -obj-$(CONFIG_XEN) += xen_ivt.o -ifeq ($(CONFIG_XEN), y) -targets += xen_ivt.o -$(obj)/build-in.o: xen_ivt.o -endif -AFLAGS_xen_ivt.o += -D__IA64_ASM_PARAVIRTUALIZED_XEN diff --git a/arch/ia64/kernel/ivt.S b/arch/ia64/kernel/ivt.S index d1cebe5..e0c5ec8 100644 --- a/arch/ia64/kernel/ivt.S +++ b/arch/ia64/kernel/ivt.S @@ -75,8 +75,13 @@ # define DBG_FAULT(i) #endif -#include inst_paravirt.h +#ifdef __IA64_ASM_PARAVIRTUALIZED_XEN +#include asm/xen/inst.h #include minstate.h +#else +#include asm/native/inst.h +#endif +#include ../kernel/minstate.h #define FAULT(n) \ mov r31=pr; \ diff --git a/arch/ia64/xen/Makefile b/arch/ia64/xen/Makefile index 87e29d2..a6b5b9a 100644 --- a/arch/ia64/xen/Makefile +++ b/arch/ia64/xen/Makefile @@ -2,7 +2,14 @@ # Makefile for Xen components # +extra-y += xen-ivt.S +KBUILD_AFLAGS += -D__IA64_ASM_PARAVIRTUALIZED_XEN + obj-y := hypercall.o time.o xenivt.o xensetup.o xen_pv_ops.o irq_xen.o \ -hypervisor.o util.o xencomm.o xcom_hcall.o xcom_asm.o paravirt_xen.o +hypervisor.o util.o xencomm.o xcom_hcall.o xcom_asm.o \ +paravirt_xen.o xen-ivt.o + +$(obj)/xen-ivt.S: + cp $(obj)/../kernel/ivt.S $(obj)/xen-ivt.S obj-$(CONFIG_IA64_GENERIC) += machvec.o ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] RE: pv_ops polish: config option head file
Alex Williamson wrote: On Tue, 2008-03-18 at 11:36 +0800, Dong, Eddie wrote: I think CONFIG_XEN might become something like CONFIG_PARAVIRT_XEN, which will be dependent on CONFIG_PARAVIRT. There might also be CONFIG_PARAVIRT_LGUEST, CONFIG_PARAVIRT_KVM, etc... I think that Then a single image won't be able to run on both lguest/Xen/KVM. This is worse than running_on_xen dynamic condition check. Huh? I never said you couldn't enable more than one CONFIG_PARAVIRT_FOO flavor in the same binary. How about just simply use CONFIG_PARAVIRT ? ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
[Xen-ia64-devel] simplify hw_irq.h
Alex Williamson wrote: Hi Isaku, Here's some cleanup to arch/ia64/kernel/time.c. I removed time_resume() since it's not called from anywhere. I think this file still needs some work; any PV guest is going to need something like this, so it would be nice to isolate the Xen specific parts and have everything else in PARAVIRT_GUEST code instead of XEN. This might be an opportunity for another pv_ops structure. Maybe we should also create a is_paravirt_guest() macro to clearly distinguish Xen-isms from things we think apply to all PV guests. This should probably live in asm/paravirt.h and include asm/xen/hypervisor.h so we can just include one file and get both is_paravirt_guest() and is_running_on_xen(). Thanks, Alex Signed-off-by: Alex Williamson [EMAIL PROTECTED] --- time.c | 58 +++--- 1 file changed, 7 insertions(+), 51 deletions(-) diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c index 1bb0362..cae777e 100644 --- a/arch/ia64/kernel/time.c +++ b/arch/ia64/kernel/time.c @@ -31,10 +31,10 @@ #include asm/xen/hypervisor.h #ifdef CONFIG_XEN +#include asm/percpu.h #include linux/kernel_stat.h #include linux/posix-timers.h #include xen/interface/vcpu.h -#include asm/percpu.h #endif #include fsyscall_gtod_data.h @@ -283,7 +283,7 @@ __setup(nojitter, nojitter_setup); #ifdef CONFIG_XEN /* taken from i386/kernel/time-xen.c */ -static void init_missing_ticks_accounting(int cpu) +static void xen_init_missing_ticks_accounting(int cpu) { struct vcpu_register_runstate_memory_area area; struct vcpu_runstate_info *runstate = per_cpu(runstate, cpu); @@ -301,63 +301,19 @@ static void init_missing_ticks_accounting(int cpu) + runstate-time[RUNSTATE_offline]; } -static int xen_ia64_settimefoday_after_resume; +static int xen_ia64_settimeofday_after_resume; static int __init __xen_ia64_settimeofday_after_resume(char *str) { - xen_ia64_settimefoday_after_resume = 1; + xen_ia64_settimeofday_after_resume = 1; return 1; } -__setup(xen_ia64_settimefoday_after_resume, +__setup(xen_ia64_settimeofday_after_resume, __xen_ia64_settimeofday_after_resume); -/* Called after suspend, to resume time. */ -void -time_resume(void) -{ - unsigned int cpu; - - /* Just trigger a tick. */ - ia64_cpu_local_tick(); - - if (xen_ia64_settimefoday_after_resume) { - /* do_settimeofday() resets timer interplator */ - struct timespec xen_time; - int ret; - efi_gettimeofday(xen_time); - - ret = do_settimeofday(xen_time); - WARN_ON(ret); - } else { -#if 0 - /* adjust EFI time */ - struct timespec my_time = CURRENT_TIME; - struct timespec xen_time; - static timespec diff; - struct xen_domctl domctl; - int ret; - - efi_gettimeofday(xen_time); - diff = timespec_sub(xen_time, my_time); - domctl.cmd = XEN_DOMCTL_settimeoffset; - domctl.domain = DOMID_SELF; - domctl.u.settimeoffset.timeoffset_seconds = diff.tv_sec; - ret = HYPERVISOR_domctl_op(domctl); - WARN_ON(ret); -#endif - /* itc_clocksource remembers the last timer status in - * itc_jitter_data. Forget it */ - clocksource_resume(); - } - - for_each_online_cpu(cpu) - init_missing_ticks_accounting(cpu); - - touch_softlockup_watchdog(); -} #else -#define init_missing_ticks_accounting(cpu) do {} while (0) +#define xen_init_missing_ticks_accounting(cpu) do {} while (0) #endif void __devinit @@ -455,7 +411,7 @@ ia64_init_itm (void) clocksource_itc.rating = 50; if (is_running_on_xen()) - init_missing_ticks_accounting(smp_processor_id()); + xen_init_missing_ticks_accounting(smp_processor_id()); /* avoid softlock up message when cpu is unplug and plugged again. */ touch_softlockup_watchdog(); ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel This one should be safe and easy to be accepted to remove CONFIG_XEN. Signed-off-by: Yaozu (Eddie) Dong [EMAIL PROTECTED] diff --git a/include/asm-ia64/hw_irq.h b/include/asm-ia64/hw_irq.h index 80009cd..f670433 100644 --- a/include/asm-ia64/hw_irq.h +++ b/include/asm-ia64/hw_irq.h @@ -15,11 +15,7 @@ #include asm/ptrace.h #include asm/smp.h -#ifndef CONFIG_XEN -typedef u8 ia64_vector; -#else typedef u16 ia64_vector; -#endif /* * 0 special x1 Description: x1 ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com
[Xen-ia64-devel] (no subject)
Following CONFIG_XEN is kind of historic issue, with CONFIG_PARAVIRT, those code should be always enabled, so replacing with CONFIG_PARAVIRT makes more sense. Signed-off-by: Yaozu (Eddie) Dong [EMAIL PROTECTED] diff --git a/arch/ia64/kernel/Makefile b/arch/ia64/kernel/Makefile index a80dd3f..61643f8 100644 --- a/arch/ia64/kernel/Makefile +++ b/arch/ia64/kernel/Makefile @@ -91,8 +91,8 @@ $(obj)/xen_%.o: $(src)/%.S FORCE # # xenivt.o, xen_switch_leave.o # -obj-$(CONFIG_XEN) += xen_ivt.o xen_switch_leave.o -ifeq ($(CONFIG_XEN), y) +obj-$(CONFIG_PARAVIRT) += xen_ivt.o xen_switch_leave.o +ifeq ($(CONFIG_PARAVIRT), y) targets += xen_ivt.o xen_switch_leave.o $(obj)/build-in.o: xen_ivt.o xen_switch_leave.o endif diff --git a/arch/ia64/kernel/salinfo.c b/arch/ia64/kernel/salinfo.c index 91bc631..dd6b986 100644 --- a/arch/ia64/kernel/salinfo.c +++ b/arch/ia64/kernel/salinfo.c @@ -378,7 +378,7 @@ salinfo_log_open(struct inode *inode, struct file *file) data-open = 0; return -ENOMEM; } -#ifdef CONFIG_XEN +#ifdef CONFIG_PARAVIRT if (is_running_on_xen()) { ia64_mca_xencomm_t *entry; unsigned long flags; @@ -408,7 +408,7 @@ salinfo_log_release(struct inode *inode, struct file *file) struct salinfo_data *data = entry-data; if (data-state == STATE_NO_DATA) { -#ifdef CONFIG_XEN +#ifdef CONFIG_PARAVIRT if (is_running_on_xen()) { struct list_head *pos, *n; ia64_mca_xencomm_t *found_entry = NULL; diff --git a/include/asm-ia64/hw_irq.h b/include/asm-ia64/hw_irq.h diff --git a/include/asm-ia64/sal.h b/include/asm-ia64/sal.h index 2965112..8aeefd2 100644 --- a/include/asm-ia64/sal.h +++ b/include/asm-ia64/sal.h @@ -682,7 +682,7 @@ ia64_sal_clear_state_info (u64 sal_info_type) /* Get the processor and platform information logged by SAL with respect to the machine * state at the time of the MCAs, INITs, CMCs, or CPEs. */ -#ifdef CONFIG_XEN +#ifdef CONFIG_PARAVIRT static inline u64 ia64_sal_get_state_info_size (u64 sal_info_type); typedef struct ia64_mca_xencomm_t { void *record; @@ -697,7 +697,7 @@ static inline u64 ia64_sal_get_state_info (u64 sal_info_type, u64 *sal_info) { struct ia64_sal_retval isrv; -#ifdef CONFIG_XEN +#ifdef CONFIG_PARAVIRT if (is_running_on_xen()) { ia64_mca_xencomm_t *entry; struct xencomm_handle *desc = NULL; x2 Description: x2 ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] (no subject)
Yes, but running_on_xen is already there. Alex Williamson wrote: On Tue, 2008-03-18 at 21:51 +0800, Dong, Eddie wrote: Following CONFIG_XEN is kind of historic issue, with CONFIG_PARAVIRT, those code should be always enabled, so replacing with CONFIG_PARAVIRT makes more sense. I disagree, these are xen specific. Alex diff --git a/arch/ia64/kernel/Makefile b/arch/ia64/kernel/Makefile index a80dd3f..61643f8 100644 --- a/arch/ia64/kernel/Makefile +++ b/arch/ia64/kernel/Makefile @@ -91,8 +91,8 @@ $(obj)/xen_%.o: $(src)/%.S FORCE # # xenivt.o, xen_switch_leave.o # -obj-$(CONFIG_XEN) += xen_ivt.o xen_switch_leave.o -ifeq ($(CONFIG_XEN), y) +obj-$(CONFIG_PARAVIRT) += xen_ivt.o xen_switch_leave.o +ifeq ($(CONFIG_PARAVIRT), y) targets += xen_ivt.o xen_switch_leave.o $(obj)/build-in.o: xen_ivt.o xen_switch_leave.o endif diff --git a/arch/ia64/kernel/salinfo.c b/arch/ia64/kernel/salinfo.c index 91bc631..dd6b986 100644 --- a/arch/ia64/kernel/salinfo.c +++ b/arch/ia64/kernel/salinfo.c @@ -378,7 +378,7 @@ salinfo_log_open(struct inode *inode, struct file *file) data-open = 0; return -ENOMEM; } -#ifdef CONFIG_XEN +#ifdef CONFIG_PARAVIRT if (is_running_on_xen()) { ia64_mca_xencomm_t *entry; unsigned long flags; @@ -408,7 +408,7 @@ salinfo_log_release(struct inode *inode, struct file *file) struct salinfo_data *data = entry-data; if (data-state == STATE_NO_DATA) { -#ifdef CONFIG_XEN +#ifdef CONFIG_PARAVIRT if (is_running_on_xen()) { struct list_head *pos, *n; ia64_mca_xencomm_t *found_entry = NULL; diff --git a/include/asm-ia64/hw_irq.h b/include/asm-ia64/hw_irq.h diff --git a/include/asm-ia64/sal.h b/include/asm-ia64/sal.h index 2965112..8aeefd2 100644 --- a/include/asm-ia64/sal.h +++ b/include/asm-ia64/sal.h @@ -682,7 +682,7 @@ ia64_sal_clear_state_info (u64 sal_info_type) /* Get the processor and platform information logged by SAL with respect to the machine * state at the time of the MCAs, INITs, CMCs, or CPEs. */ -#ifdef CONFIG_XEN +#ifdef CONFIG_PARAVIRT static inline u64 ia64_sal_get_state_info_size (u64 sal_info_type); typedef struct ia64_mca_xencomm_t { void *record; @@ -697,7 +697,7 @@ static inline u64 ia64_sal_get_state_info (u64 sal_info_type, u64 *sal_info) { struct ia64_sal_retval isrv; -#ifdef CONFIG_XEN +#ifdef CONFIG_PARAVIRT if (is_running_on_xen()) { ia64_mca_xencomm_t *entry; struct xencomm_handle *desc = NULL; ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] RE: pv_ops polish: config option head file
How about just simply use CONFIG_PARAVIRT ? Then how do you specify that you want a kernel built with Xen support, but not KVM? Mmm, this is kind of what level of detail do we want user to choose. Given that RHEL want one image, so this sub-option is just for in house development even if multiple IA64 VMM really comes. We can argu for the usage model. Leaving some code for this is OK, but but at least for those who have running_on_xen condition already, we don;t need CONFIG_XEN, (rather CONFIG_PARAVIRT). Also for those Xen specific files, i.e. those xen wrapper code, we can treat whole directory as one, either compile it or skip it. Does this make sense? ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
[Xen-ia64-devel] remove CONFIG_XEN for those already embraced in xen directory
Xen specific directories are only compiled with Xen, keeping CONFIG_XEN in each file is redudant. diff --git a/arch/ia64/xen/xen_pv_ops.c b/arch/ia64/xen/xen_pv_ops.c index 93a5c64..0c978e8 100644 --- a/arch/ia64/xen/xen_pv_ops.c +++ b/arch/ia64/xen/xen_pv_ops.c @@ -210,10 +210,8 @@ static void __init xen_post_paging_init(void) { #ifdef notyet /* XXX: notyet dma api paravirtualization*/ -#ifdef CONFIG_XEN xen_contiguous_bitmap_init(max_pfn); #endif -#endif } static void __init diff --git a/arch/ia64/xen/xenpal.S b/arch/ia64/xen/xenpal.S index 0e05210..57dca95 100644 --- a/arch/ia64/xen/xenpal.S +++ b/arch/ia64/xen/xenpal.S @@ -13,9 +13,7 @@ #include asm/paravirt_nop.h GLOBAL_ENTRY(xen_pal_call_static) -#ifdef CONFIG_XEN BR_IF_NATIVE(native_pal_call_static, r22, p7) -#endif .prologue ASM_UNW_PRLG_RP|ASM_UNW_PRLG_PFS, ASM_UNW_PRLG_GRSAVE(5) alloc loc1 = ar.pfs,4,5,0,0 movl loc2 = pal_entry_point @@ -30,21 +28,16 @@ GLOBAL_ENTRY(xen_pal_call_static) mov loc4=ar.rsc // save RSE configuration ;; mov ar.rsc=0// put RSE in enforced lazy, LE mode -#ifdef CONFIG_XEN mov r9 = r8 XEN_HYPER_GET_PSR ;; mov loc3 = r8 mov r8 = r9 ;; -#else - mov loc3 = psr -#endif mov loc0 = rp .body mov r30 = in2 -#ifdef CONFIG_XEN // this is low priority for paravirtualization, but is called // from the idle loop so confuses privop counting movl r31=XSI_PSR_I_ADDR @@ -57,13 +50,6 @@ GLOBAL_ENTRY(xen_pal_call_static) mov r31 = in3 mov b7 = loc2 ;; -#else - mov r31 = in3 - mov b7 = loc2 - -(p7) rsm psr.i - ;; -#endif mov rp = r8 br.cond.sptk.many b7 1: mov psr.l = loc3 x3 Description: x3 ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] (no subject)
Alex Williamson wrote: On Tue, 2008-03-18 at 22:19 +0800, Dong, Eddie wrote: Yes, but running_on_xen is already there. Will it be there if we only have a kernel compiled with PV KVM support? Are we going to stub out any *_xen_* function/macro in that case? Well, firstly it is harmless, and then do u mean we should also add CONFIG_NATIVE when building with Xen or KVM? It is minor, so up to you and Isaku for each of them case by case. For me I persuade small patch size, and I checked X86 side only have 2 CONFIG_XEN in common file. Thx, eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
[Xen-ia64-devel] remove CONFIG_XEN_IA64_EXPOSE_P2M for now
CONFIG_XEN_IA64_EXPOSE_P2M could be dropped for 1st domU only patch to achieve small patch size, since it is a kind of performance patch. Thx, eddie x4 Description: x4 ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] RE: pv_ops polish: config option head file
Alex Williamson wrote: On Tue, 2008-03-18 at 22:18 +0800, Dong, Eddie wrote: How about just simply use CONFIG_PARAVIRT ? Then how do you specify that you want a kernel built with Xen support, but not KVM? Mmm, this is kind of what level of detail do we want user to choose. Given that RHEL want one image, so this sub-option is just for in house development even if multiple IA64 VMM really comes. We can argu for the usage model. Leaving some code for this is OK, but but at least for those who have running_on_xen condition already, we don;t need CONFIG_XEN, (rather CONFIG_PARAVIRT). Also for those Xen specific files, i.e. those xen wrapper code, we can treat whole directory as one, either compile it or skip it. Does this make sense? Hmm, I still disagree. The way the Kconfigs are structured now, we have: PARAVIRT_GUEST - PARAVIRT - XEN PARAVIRT_GUEST adds no code, but enables the other config options. XEN is dependent on PARAVIRT. IMHO, PARAVIRT should enable the pv_ops functionality, but not add the Xen specific code. You can imagine a Yes if full pv_ops is enabled, those kind issue will all go away like X86 side. Actually I compromised to your suggestion to leave some running_on_xen in code, though I still think majority of them should be pv_opsed. With full pv_ops, running_on_xen can disappear too. In this case, full pv_ops will solve CONFIG_XEN too, but since we may not have that much resource to complete it in short term, so I agree we may leave some CONFIG_XEN running_on_xen. For those directories dedicated for XEN, I don't think we need in code CONFIG_XEN any more. For those running_on_xen + CONFIG_XEN case, it is a coding style issue. Long time goal is to use full pv_ops, mca_xencomm_list is one of the candidate IMO. But for now leaveing runing_on_xen, or CONFIG_XEN is OK to me. Whether it needs double condition is up to your guys. PV KVM option or LGUEST option that wants PARAVIRT, but not XEN (or all of them together in one binary). I think which VMMs you want to support is a reasonable level of detail for someone configuring a kernel to select. The granularity also shows upstream that we've It is always a tradeoff. If LGUEST or KVM hypervisor will come soon, I bet full pv_ops will come soon too... thought about generic PV support and we're not just trying to dump a bunch of Xen-only code into the tree. In this case, those mca_xencomm_list is hard to say Xen only code, it could be abstracted as generic PV mechanism I think. But we just leave it to future. We don't need to be constantly concerned with RH's config, we need to look at the bigger picture for what's right in Linux. We can make sure RH's config has everything we want for a single binary later as long as we enable that possibility in what we're doing. Thanks, Alex Thanks, eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] RE: pv_ops polish: config option head file
Alex Williamson wrote: On Fri, 2008-03-14 at 17:19 +0800, Dong, Eddie wrote: Oh, either are OK, just make sure we are in same page. Pleae keep this here. But we need to make sure generic_defconfig can include Xen machine vector in current case. Some Makefile/source change is needed to include this, I think REDHAT use generic_defconfig. I don't think any distributions use defconfig directly. The RH kernel config is significantly different. I don't think we need to touch the defconfig for now. It's useful to have an example domU Sure, that is fine and can be a future minor task. config (I think I'm actually the one who requested it) as we put the pieces together. We need to keep that integration in mind, but we're a long way from that point. Thanks, Something I want to get clarified first, eventually with pv_ops patch series get in, RH eventually will only compile to get one image to run on different platforms including xen machine. In this way all the codes with CONFIG_XEN today must be either checked in as generic code, or pv_ops except for those dual compile codes such as IVT gate. In other word, CONFIG_XEN will disappear mostly. RIght? Xen machine vector also needs to be compiled when in CONFIG_IA64_GENERIC. Thanks, eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] pv_ops polish: remove fsys.S changes
Commit a07b265c618c279a84bac8f75f5acba1c1646200 is quit intrusive, it removed code from entry.S to a new file switch_entry.S and create 1000 lines of patch. At least we stay in original file, not? At least xen/ia64 needs to paravirtualize ia64_swtich_to, I did a scan on Alex's tree and find the diff between entry.S xenentry.S is very small, see the attachment, probably KR set need to by modified. For rest, what I saw is just a sensitive instruction replacement, we can do using indirect function call, or leave to future. ia64_leave_syscall and ia64_leave_kernel. The discussion for hand written assembly code indicates that single source and multiple compile. This is easy for IVT gate page, but for those normal APIs, I am conservative except indirect function call can't solve the problem. I think the clean way to do for those three functions is to split out them from entry.S. Cluttering out entry.S by ifdef would be uglry. Do you have better idea? Maybe my above finding is wrong, otherwise leave it for now is fine. Eddie entry_s_diff.patch Description: entry_s_diff.patch ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] RE: pv_ops polish: config option head file
Alex Williamson wrote: On Mon, 2008-03-17 at 14:07 +0800, Dong, Eddie wrote: Something I want to get clarified first, eventually with pv_ops patch series get in, RH eventually will only compile to get one image to run on different platforms including xen machine. In this way all the codes with CONFIG_XEN today must be either checked in as generic code, or pv_ops except for those dual compile codes such as IVT gate. In other word, CONFIG_XEN will disappear mostly. RIght? Xen machine vector also needs to be compiled when in CONFIG_IA64_GENERIC. I think CONFIG_XEN might become something like CONFIG_PARAVIRT_XEN, which will be dependent on CONFIG_PARAVIRT. There might also be CONFIG_PARAVIRT_LGUEST, CONFIG_PARAVIRT_KVM, etc... I think that Then a single image won't be able to run on both lguest/Xen/KVM. This is worse than running_on_xen dynamic condition check. would fit the typical Linux model of being able to selectively include features. We'll need to make sure distributions set these the way we Yes if the feature is alternative one. want and maybe add it to the defconfig once the code is stabilized. Agree. The Xen machine vector will need to be included in CONFIG_IA64_GENERIC, but it will also depend on CONFIG_PARAVIRT_XEN. Does that sound reasonable? Thanks, Yes. Eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
[Xen-ia64-devel] pv_ops: kernel/inst_native.h
Isaku/Alex: There is a new file called kernel/inst_native.h to define those pv MACROs for native. I would suggest we do following changes: 1: Move it to public head files such as include/asm-ia64 at least since some other files will use it too. 2: Further thinking is that how we generate those dual compile code? If we will use symbol link, then I would suggest we put it in include/asm-ia64/native. Comments? 3: How about this style if MACRO? #define MOV_FROM_CR(reg,crx) \ mov reg = cr.##crx Using this we can reduce the patch size avoid mistake. Thx, eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
[Xen-ia64-devel] pv_ops polish: config option head file
Isaku: Targeting the patchset or git tree http://people.valinux.co.jp/~yamahata/xen-ia64/linux-2.6-xen-ia64.git/, I got some questions: 1: I saw some config options such as: CONFIG_PARAVIRT CONFIG_PARAVIRT_ALT CONFIG_PARAVIRT_ENTRY CONFIG_PARAVIRT_NOP_B_PATCH CONFIG_PARAVIRT_GUEST I am not sure what is best, but seems we expose too much here, and X86 just have one CONFIG_PARAVIRT. I suggest we can go mainly using one especially we have strong reasons. 2: config file I saw you generated a new config file specifically for domU (xen_domu_wip_defconfig), I am wondering is it is what Redhat want. I think RH will only build one image for various machine including PV guest in one release. So I suggest we remove the new config file xen_domu_wip_defconfig, but put CONFIG_PARAVIRT into each existing config files. Comments? Thanks, eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
[Xen-ia64-devel] pv_ops polish: remove fsys.S changes
CONFIG_XEN needs to be gradually removed per discussion since pv_ops include this concept. Due to this, we defer the fsys.S changes for some time later to use indirect function call. Temporay undo it for now. Thanks, eddie commit a882270f415e717a3694f2762f348ab285fb55ce Author: root [EMAIL PROTECTED] Date: Fri Mar 14 16:18:49 2008 +0800 Undo performance optimization items temporary for fsys.S per discussion to make first patch to upstream simple. Signed-off-by: Yaozu (Eddie) Dong [EMAIL PROTECTED] diff --git a/arch/ia64/kernel/fsys.S b/arch/ia64/kernel/fsys.S index 7d97e37..4484197 100644 --- a/arch/ia64/kernel/fsys.S +++ b/arch/ia64/kernel/fsys.S @@ -570,34 +570,11 @@ ENTRY(fsys_fallback_syscall) adds r17=-1024,r15 movl r14=sys_call_table ;; -#ifdef CONFIG_XEN - movl r18=running_on_xen;; - ld4 r18=[r18];; - // p14 = running_on_xen - // p15 = !running_on_xen - cmp.ne p14,p15=r0,r18 - ;; -(p14) movl r18=XSI_PSR_I_ADDR;; -(p14) ld8 r18=[r18] -(p14) mov r29=1;; -(p14) st1 [r18]=r29 -(p15) rsm psr.i -#else rsm psr.i -#endif shladd r18=r17,3,r14 ;; ld8 r18=[r18] // load normal (heavy-weight) syscall entry-point -#ifdef CONFIG_XEN -(p14) mov r27=r8 -(p14) XEN_HYPER_GET_PSR - ;; -(p14) mov r29=r8 -(p14) mov r8=r27 -(p15) mov r29=psr // read psr (12 cyc load latency) -#else mov r29=psr // read psr (12 cyc load latency) -#endif mov r27=ar.rsc mov r21=ar.fpsr mov r26=ar.pfs @@ -709,25 +686,7 @@ GLOBAL_ENTRY(fsys_bubble_down) mov rp=r14 // I0 set the real return addr and r3=_TIF_SYSCALL_TRACEAUDIT,r3 // A ;; -#ifdef CONFIG_XEN - movl r14=running_on_xen;; - ld4 r14=[r14];; - // p14 = running_on_xen - // p15 = !running_on_xen - cmp.ne p14,p15=r0,r14 - ;; -(p14) movl r28=XSI_PSR_I_ADDR;; -(p14) ld8 r28=[r28];; -(p14) adds r28=-1,r28;; // event_pending -(p14) ld1 r14=[r28];; -(p14) cmp.ne.unc p13,p14=r14,r0;; -(p13) XEN_HYPER_SSM_I -(p14) adds r28=1,r28;;// event_mask -(p14) st1 [r28]=r0;; -(p15) ssm psr.i -#else ssm psr.i // M2 we're on kernel stacks now, reenable irqs -#endif cmp.eq p8,p0=r3,r0 // A (p10) br.cond.spnt.many ia64_ret_from_syscall // Breturn if bad call-frame or r15 is a NaT fsys_S_undo.patch Description: fsys_S_undo.patch ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
[Xen-ia64-devel] RE: pv_ops polish: config option head file
Isaku Yamahata wrote: On Fri, Mar 14, 2008 at 03:39:15PM +0800, Dong, Eddie wrote: Isaku: Targeting the patchset or git tree http://people.valinux.co.jp/~yamahata/xen-ia64/linux-2.6-xen-ia64.git/, I got some questions: Thank you for comments. 1: I saw some config options such as: CONFIG_PARAVIRT CONFIG_PARAVIRT_ALT CONFIG_PARAVIRT_ENTRY CONFIG_PARAVIRT_NOP_B_PATCH CONFIG_PARAVIRT_GUEST I am not sure what is best, but seems we expose too much here, and X86 just have one CONFIG_PARAVIRT. I suggest we can go mainly using one especially we have strong reasons. In fact I'm sorting them out right now as a part of pv_cpu_ops clean up. Great! They are just historical leftovers. Presumably we'll have only CONFIG_PARAVIRT and CONFIG_PARAVIRT_GUEST. (X86 has both CONFIG_PARAVIRT and CONFIG_PARAVIRT_GUEST. Please make sure.) Oh, Yes it is in latest tree now:) 2: config file I saw you generated a new config file specifically for domU (xen_domu_wip_defconfig), I am wondering is it is what Redhat want. I think RH will only build one image for various machine including PV guest in one release. So I suggest we remove the new config file xen_domu_wip_defconfig, but put CONFIG_PARAVIRT into each existing config files. I put the file there because others may want to know my config. I haven't intended to push the file to the upstream. I should have written so in the commit log message. Hmm, I can also remove it and put my config somewhere else. Either way is okay because the file is just for other's convenience. Which do you prefer, removing it or updating the commit log? Oh, either are OK, just make sure we are in same page. Pleae keep this here. But we need to make sure generic_defconfig can include Xen machine vector in current case. Some Makefile/source change is needed to include this, I think REDHAT use generic_defconfig. Thx, eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] pv_ops polish: remove fsys.S changes
Isaku Yamahata wrote: Hi Eddie. Thank you for your patch. the change is already isolated as the commit of d81f732b0d57371bfc220b1a1027ab18ea9a5265. So what we need to do is just dropping the change set. The same would apply to the gate page paravirtualization change set. I'll take care of it. Do you have any other changesets to be dropped for minimal domU? Commit a07b265c618c279a84bac8f75f5acba1c1646200 is quit intrusive, it removed code from entry.S to a new file switch_entry.S and create 1000 lines of patch. At least we stay in original file, not? Thanks, eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
[Xen-ia64-devel] pv_ops enable
Isaku/Alex: There are some some certain features that target dom0 or for some extended domU features such as EXEC support etc, I would suggest we drop it temporary after some basic domU code get in. If this is true, then patch like following can be dropped, also many other files are similar and can be dropped. diff --git a/arch/ia64/kernel/machine_kexec.c b/arch/ia64/kernel/machine_kexec.c index eaec78a..bf2e473 100644 --- a/arch/ia64/kernel/machine_kexec.c +++ b/arch/ia64/kernel/machine_kexec.c @@ -25,6 +25,9 @@ #include asm/meminit.h #include asm/processor.h #ifdef CONFIG_XEN +#ifdef notyet +#include xen/interface/kexec.h +#endif #include asm/kexec.h #endif @@ -131,7 +134,13 @@ void machine_kexec(struct kimage *image) for(;;); } #else /* CONFIG_XEN */ -/* notyet */ +#ifdef notyet +void machine_kexec_setup_load_arg(xen_kexec_image_t *xki, struct kimage *image) +{ + xki-reboot_code_buffer = + kexec_page_to_pfn(image-control_code_page) PAGE_SHIFT; +} +#endif #endif /* CONFIG_XEN */ void arch_crash_save_vmcoreinfo(void) Basically original CSET 226/227 in Alex's tree can temporary be removed from Alex's tree. Also in Alex's tree, I noticed most driver directory patches are dropped since it is dom0 feature, but I see intel-agp.c is imported, typo? Can we remove it now? commit 3a0f146c2b00f9b48dd3e23c4bdf16e5c1775259 Author: Isaku Yamahata [EMAIL PROTECTED] Date: Mon Jan 21 18:45:15 2008 +0900 ia64/xen: import patches under drivers Thanks, eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] Paravirt_ops/hybrid directions and next steps
Tristan: We are talking about pv_ops interface calling convention, not hypervisor API convention. It should not violate each other because we still have hypervisor wrapper which can do the convertion. One thing in my mind is that when we do pv_ops, we stand in hypervisor neutral position. Only when we implement xen hypervisor wrapper of pv_ops, we stand on Xen. But yes, since we use single source, dual compile to generate code in place. Actually those pv_cpu_asm_ops won't be used frequently, most of them are not used. So even we use this policy, it is very few place which may use a formal pv_ops for ASM code which imply the calling convention. All IVT/gate table/page doesn't have this issue. Thanks, eddie -Original Message- From: Tristan Gingold [mailto:[EMAIL PROTECTED] Sent: 2008年3月11日 17:24 To: Dong, Eddie Cc: Alex Williamson; xen-ia64-devel Subject: Re: [Xen-ia64-devel] Paravirt_ops/hybrid directions and next steps Hi, just a point about call convention: I don't think switching to PAL static convention is a good idea as it doesn't work well with xen hyperprivop because of banked registers. Tristan. ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] Paravirt_ops/hybrid directions and next steps
Alex Williamson wrote: On Mon, 2008-03-10 at 13:56 +0800, Dong, Eddie wrote: Alex all: I exchanged some ideas with Isaku to discuss the gap and status of pv_ops support in IA64, Isaku did a lot of work toward pv_ops since his previous forward backport patch. Great thanks to Isaku. The attached doc is a draft for some of the key gaps and current status. I think it is time for another cross major company meeting to discuss how we cooperate and how effectively go with pv_ops. Mostly Isaku and I are in same page for what IA64 pv_ops should look like now, though some details may have different understanding. Any ideas? Hi Eddie, Much thanks to you and Isaku for leading this effort. I'm open to another conference call, but maybe we can discuss some items here on the mailing list too. I saw that Isaku has created a wiki page on the Xen wiki and started a new git project on Gitorious.org. The wiki page seems like a good place to keep track of who is working on which chunk and the status. For the git side, I would suggest that the model might be that each developer has a project on gitorious.org and sends out patches or pull requests to have a single upstream reference. Isaku's tree seems to be a good focal point for now if he's willing to take on the task of accepting code from others. The 2.6.26 merge window will likely open before too long, so we also need to do some coordination with Tony Luck and the other Since we are unable to get whole solution (dom0) to upstream in near future since X86 didn't complete it yet. OSV are unable to build single image for all, so I think they may stay with current solution a little bit longer till X86 get solved. I am not that care about which version IA64 pv_ops will be in. As if Tony starts to take the patch, the rest will be easy. upstream developers. Are they going to be interested in putting in pieces at each upstream merge window, or should we build up a complete solution for domU support in Isaku's tree or Tony's testing Yes, we need to get clear message firstly. In the doc, I was assuming maintainer need to see whole patch, though he takes one slowly at beginning especially. branch first? We also need to be careful about submitting patch sets that stand on their own and are bisect-able. It's likely going to Agree, so at least we get xen-ia64/kvm-ia64 people buy in the patch first so that we can push together. take several kernel merge windows before we get full domU support, let alone dom0. In your slide set, you mention removing running_on_xen since it conflicts with pv_ops. I think this is a really good goal, but I have doubts about whether it's achievable. We're not likely to make a My assumption is that Linux maintainer won't expect to see running_on_xen, running_on_kvm, running_on_lguest, running_on_hybrid_xen, running_on_hybrid_kvm etc. It is too ugly. But if you mean we keep it temporary for now, I am fine. pv_ops to fit every corner case, and we may have to resort to an ugly direct test for xen. Let's try to avoid them, but we already have a few cases of checking machine vector names for this type of thing in other parts of the ia64 code. Thanks, Could you point me to the code that you feel pv_ops may be hard here? Alex Thanks, eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] Paravirt_ops/hybrid directions and next steps
I don't have any architecture specific examples off the top of my head, but how about skipping serial port detection on dom0? It's rather Xen specific and we haven't yet come up with a way to hide Xen's UART (ioport mmio) from dom0. KVM/Lguest wouldn't care about this, so it may not be worthy of a pv_op. Thanks, Hi, Alex. We can keep this detail in pv_ops enabling time to see if we can get a right abstract. I assume we will need 1-2 month to make it full pv_ops. Thanks, Eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] [PATCH] unify vtlb and vhpt
Limiting the entry to be not moved to VHPT head could solve this issue but again the code will be complicated. Sharing VTLB/VHPT memory could be simply used here, and the patch will be more smaller and simple IMO. My concept is just sharing vTLB/VHPT memory. As long as sharing the pool of collision chain, distinction of vTLB/VHPT can't be avoided I am not sure about the statement. Putting vTLB in physical VHPT side is mixing something, not only sharing. What I mean here is something like following pseudo code, (defenitely init code and many cleanup was not in this pseudo code). This way, we don't impact low level VHPT walk. and makes it clear in concept to distinguish vTLB VHPT. diff -r ff90abf572f2 xen/arch/ia64/vmx/vtlb.c --- a/xen/arch/ia64/vmx/vtlb.c Fri Jan 18 14:11:20 2008 -0700 +++ b/xen/arch/ia64/vmx/vtlb.c Tue Mar 04 02:18:33 2008 +0800 @@ -398,7 +398,9 @@ static thash_data_t *__alloc_chain(thash cch = cch_alloc(hcb); if (cch == NULL) { -thash_recycle_cch_all(hcb); + vcpu = container_of(hcb, vcpu, vtlb); +thash_recycle_cch_all(vcpu-vtlb); +thash_recycle_cch_all(vcpu-vhpt); cch = cch_alloc(hcb); } return cch; @@ -440,12 +442,13 @@ static void vtlb_insert(VCPU *v, u64 pte } cch = cch-next; } +vcpu = container_of(hcb, vcpu, vtlb); if (hash_table-len = MAX_CCN_DEPTH) { thash_recycle_cch(hcb, hash_table); -cch = cch_alloc(hcb); + cch = cch_alloc(vcpu-vhpt); } else { -cch = __alloc_chain(hcb); + cch = __alloc_chain(vcpu-vhpt); } cch-page_flags = pte; cch-itir = itir; ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] [PATCH] unify vtlb and vhpt
[EMAIL PROTECTED] wrote: Hi Kouya, to be honest I have mixed feelings about your patch. I like it but I don't really understand its purpose. See my comment. I still think it would be nice if the vTLB were TR-mapped. This is same with sharing vTLB/VHPT memory. Single TR or double TR (your case) can solve problem both. Quoting Kouya Shimura [EMAIL PROTECTED]: Dong, Eddie writes: This can be simply solved by increasing vTLB size, or use same memory with VHPT. The problem is, how much size is suitable? There is a trade off. The larger size consumes a time for ptc.e emulation and causes a serious slowdown for a Windows guest. How frequently does Windows issue PTC.E? In current situation, VHPT is 16MB, while vTLB is 32K, so I would think purging VHPT is dominant. Ok. Currently vTLB size is configurable but ordinary users can't understand what vTLB is. ??? This is not true except the user(developer) doesn't have virtualization concept. In my experience, I have trouble to explain what is host VHPT in VMM for a guest, but pretty easy to say the meaning of vTLB whose original meaning is guest TLB. The issue in today's Xen/IA64 is that so called vTLB is not equal to real guest TLB. (guest TLB = vTR + vTLB + something in VHPT + something in machine TLB) If you want to rename vTLB to something else, I will vote for Yes. A purpose of this patch is to make users free from setting vTLB size. This is same with sharing memory between VTLB VHPT. By merging vTLB and VHPT the user can't anymore set the size of the vTLB. This is obvious. But is your patch different from increasing vTLB size ? Did I miss a point ? I am not sure it is a good idea to remove vTLB size. On a real processor the TLB structure is fixed and defined. Yes, but probably this is ok since vTLB isn't equal to guest TLB :( Ideally guest TLB should have a fixed size. Sharing memory makes concept clear for me. I.e. VHPT is VHPT, while vTLB is those entries can't be put into VHPT. With this patch, if a VTLB entry in collision chain has to become head of VHPT table, it is really dilemma to put this to head or not. GP fault for reserved bit could be used here with performance penalty but it is really not good and it could happen again as if the VHPT entry head keeps for vTLB (TC could go away soon). Limiting the entry to be not moved to VHPT head could solve this issue but again the code will be complicated. Sharing VTLB/VHPT memory could be simply used here, and the patch will be more smaller and simple IMO. To tell the truth, I rewrote the vtlb_thash() function before. See. http://lists.xensource.com/archives/html/xen-ia64-devel/2007-08/msg00108 .html I think the algorithm is the same as HW. I did a reverse engineering on a Montecito processor. (I'm afraid Montvale use the different algorithm...) Could be in reality, I don't know :) But we still think it is different since we can;t guarante it is same :( This seems to be the same algorithm as the one for Madison. Cf Matthew Chapman pages. Tristan. Thanks, eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
[Xen-ia64-devel] RE: [PATCH 0/5] RFC: ia64/pv_ops: ia64 intrinsics paravirtualization
Isaku Yamahata wrote: Hi. Thank you for comments on asm code paravirtualization. Its direction is getting clear. Although it hasn't been finished yet, I'd like to start discussion on ia64 intrinsics paravirtualization. This patch set is just for discussion so that it is a subset of xen Linux/ia64 domU paravirtualization, not self complete. You can get the full patched tree by typing git clone http://people.valinux.co.jp/~yamahata/xen-ia64/linux-2.6-xen-ia64.git/ A paravirtualized guest wants to replace ia64 intrinsics, i.e. the operations defined in include/asm-ia64/gcc_instrin.h or include/asm-ia64/intel_instrin.h, with its own version. (At least xenLinux/ia64 does.) So we need a sort of interface to do so. I want to discuss on which direction to go for, please comment. This paravirtualization corresponds to the part of x86 pv_ops, Performance critical code written in C. They are basically indirect function call via pv_xxx_ops. For performance, each pv instance is allowed to binary patch in order to replace function call instruction with their predefined instructions in place. The ia64 intrinsics corresonds to this kind of interface. The discussion points so far are - binary patching should be mandatory or optional? The current patch requires binary patch, but some people think requiring binary patch for pv instances is a bad idea. I think by providing reasonable helper functions set, binary patch won't be burden for pv instances. - How differ from x86 pv_ops? Some people think that the very similarity to x86 pv_ops is important. I guess they're thinking so considering maintenance cost. Anyway ia64 is already different from x86, so such difference doesn't matter as long as ia64 paravirtualization interface is clean enough for maintenance. Note: the way can differ from one operation from another, but it might cause some inconsistency. The following ways are proposed so far. * Option 1: the current way The code would look like static inline unsigned long paravirt_get_cpuid(int index) { register __u64 ia64_intri_res asm (r8); register __u64 __index asm (r8) = index; asm volatile (paravirt_alt_inst(mov %0=cpuid[%r1], PARAVIRT_INST_GET_CPUID): =r(ia64_intri_res): 0O(__index)); return ia64_intri_res; } #define ia64_get_cpuid paravirt_get_cpuid note: Using r8 is derived from xen hypercall abi. We have to define which register should be used or can be clobbered. Pros: - in-place binary patch is possible. (We may want to pad with nop. How many?) - native case performance is good. - native case doesn't need any modification. Cons: - binary patch is required for pv instances. - Probably current implementation might be too xen-biased. Reviewing them would be necessary for hypervisor neutrality. * Option 2: direct branch The code would look like static inline unsigned long paravirt_get_cpuid(int index) { register __u64 ia64_intri_res asm (r8); register __u64 __index asm (r8) = index; register __u64 ret_addr asm (r9); asm volatile (paravirt_alt_inst( br.cond b0=native_get_cpuid, /* or brl.cond for fast hypercall */ PARAVIRT_INST_GET_CPUID): =r(ia64_intri_res), =r(ret_addr): 0O(__index) b0); return ia64_intri_res; } #define ia64_get_cpuid paravirt_get_cpuid note: Using r8 is derived from xen hypercall abi. We have to define which register should be used or can be clobbered. Pros: - in-place binary patch is possible. (We may want to pad with nop. How many?) - so that performance would be good for native case using it. Cons: - binary patch is required for pv instances. - native case needs binary patch for optimal performance. * Option 3: indirect branch The code would look like static inline unsigned long paravirt_get_cpuid(int index) { register __u64 ia64_intri_res asm (r8); register __u64 __index asm (r8) = index; register __u64 func asm (r9); asm volatile (paravirt_alt_inst( mov %1 = pv_cpu_ops add %1 = %1, PV_CPU_GET_CPUID_OFFSET ld8 %1 = [%1] mov b1 = %1 br.cond b0=b1 PARAVIRT_INST_GET_CPUID): =r(ia64_intri_res), =r(func):
[Xen-ia64-devel] RE: [kvm-ia64-devel] [PATCH 0/4] ia64/xen: paravirtualization ofhand written assembly code
Keith Owens wrote: Isaku Yamahata (on Mon, 25 Feb 2008 12:16:42 +0900) wrote: Hi. The patch I send before was too large so that it was dropped from the maling list. I'm sending again with smaller size. This patch set is the xen paravirtualization of hand written assenbly code. And I expect that much clean up is necessary before merge. We really need the feed back before starting actual clean up as Eddie already said before. Eddie discussed how to clean up and suggested several ways. 1: Dual IVT source code, dual IVT table. (The way this patch set adopted) 2: Same IVT source code, but dual/mulitple compile to generate dual/multiple IVT table using assembler macro. 3: Single IVT table, using indirect function call for pv_ops using branch/binary patching. At this moment my preference is the option 2. Please comment. A combination of options (2) and (3) would work. Have a single source file for the IVT, using conditional macros. Use that source file to build (at least) two copies of the IVT, for native and any virtualized Thanks, we are getting more comments now:) I would like to take this chance to go into a little bit more details now for sub-alternatives. For all of above, we need replace IVT source code like following example: @@ -102,7 +116,7 @@ * - the faulting virtual address uses unimplemented address bits * - the faulting virtual address has no valid page table mapping */ - mov r16=cr.ifa // get address that caused the TLB miss + _READ_IFA(r16, r24, r25) #ifdef CONFIG_HUGETLB_PAGE movl r18=PAGE_SHIFT mov r25=cr.itir For #2 (Dual compile, Dual IVT instance), now we have following sub-alternatives: A) Generate code in place like following: +#ifdef CONFIG_XEN +#define _READ_IFA(regr, clob1, clob2) \ + movl clob1=XSI_IFA;;\ + ld8 regr=[clob1];; +#endif +#ifdef CONFIG_NATIVE +#define _READ_IFA(regr, clob1, clob2) \ + mov regr=cr.ifa; +#endif In this approach, we don't do function call/jump, all the codes for different hypervisor are generated in place. To be more important, it doesn't require any fixed clobber registers, i.e. any registers found spare can be used as clob registers. If we go with this apporach, the coding effort is minized and current Xen code can be simply merged into this model. Cons: No explicit pv_asm_ops function table, diversity to X86's is bigger. B) Directly jump This model use function call (actually jump) in those primitive pv MACROs. +GLOBAL_ENTRY(xen_read_ifa) + mov b0=r24; + movl r25=XSI_IFA;; + ld8 r24=[r25];; + br.cond.sptk b0 +END(xen_read_ifa) +#ifdef CONFIG_XEN +#define _READ_IFA(regr, clob1, clob2) \ + movl r24=1f;\ + br.sptk.many xen_read_ifa;; \ +1: \ + mov regr=r24;; +#endif Pros: less code size generated in place, Cons: need clob registers and probably fixed clob registers. C) Indirect function call This model is mostly close to what pv_ops mean. Previous solution actually doesn't refer to the function table. possible for C ASM to share same pv_ops code with wrapper in C side, and could support single IVT table solution. Cons: Need more clobber registers and change IVT source code. +#define _READ_IFA(regr, clob1, clob2) \ + mov r24=_READ_IFA_OPS_INDEX;\ + movl r25=pv_cpu_asm_ops;; \ + add r25=r24,r25;; \ + ld8 r25=[r25]; \ + movl r24=1f;; \ + mov b0=r25;;\ + br.sptk.many b0;; \ +1: \ + mov regr=r24;; + Binary patching at boot ime can convert C to B or A, or convert B to A if certain condition is met such as clob registers code size. So run time performance degradation to native is minimized. The only difference is we get more nop ops in native IVT table (patching will convert those non-used code space to nop instructions, or maybe use a relative jump to skip those spare code). #A is easiest from effort point of view (no need to re-org mass IVT code), and #A doesn;t need binary patching. but the code quality may be not that good in current Xen such as: @@ -192,7 +235,17 @@ */ adds r24=__DIRTY_BITS_NO_ED|_PAGE_PL_0|_PAGE_AR_RW,r23 ;; +#ifdef CONFIG_XEN +(p7) mov r25=r8 +(p7) mov r8=r24 + ;; +(p7) XEN_HYPER_ITC_D + ;; +(p7) mov r8=r25 + ;; +#else (p7) itc.d r24 +#endif ;; #ifdef CONFIG_SMP #C(also #B) need massive IVT source code change to find clob registers. modes. The native copy of the IVT starts at label ia64_ivt in section .text.ivt, as it does now. Any IVT versions for virtualized mode are defined as __cpuinitdata, so they are discarded after boot, unless Looks like you prefer #A of above dual compiler option, right? If most people agree with this, we
[Xen-ia64-devel] RE: [kvm-ia64-devel] [PATCH 13/28] ia64/xen: introduce xenhypercall routines necessary for domU.
IA64 pv_ops frame work doesn't exist yet so that xen code does in order to boot on both native and xen for now. I expect those check will be eliminated during developing ia64 pv_ops. Qing He I am working on the pv_ops framework, hopefully we can get a draft soon :) Eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
[Xen-ia64-devel] RE: paravirt_ops support in IA64
Isaku Yamahata wrote: 2: Same IVT source code, but dual/mulitple compile to generate dual/multiple IVT table. I.e. we replace those primitive ops (sensitive instructions) with a MACRO which uses compile option for different hypervisor type. The pseudo code of the MACRO could be: (take read CR.IVR as example) AltA: #define ASM_READ_IVR /* read IVR to GR24 */ #ifdef XEN breg1 = return address brxen_readivr #else/* native mov GR24=CR.IVR; #endif Or AltB: #define ASM_READ_IVR /* read IVR to GR24 */ #ifdef XEN in place code of function xen_readivr #else/* native mov GR24=CR.IVR; #endif From maintenance effort point of view, it is minimized, but not exactly what X86 pv_ops look like. Both approach will cause code size issue, but altB is much worse in this area, while AltA need one additional BR clobber register Pros: - single code - hopefull less maintenance cost compared to #1 Cons: - requires restriction on register usage. And we need to define its convension. When modifying ivt.S in the future after converting ivt.S, those convesion must be kept in mind. - suboptimal for paravirtualized case compared to #1 case 3: Single IVT table, using indirect function call for pv_ops. This is more like X86 pv_ops, but we need to pay 2 additional BR clobber registers due to indirect function call, like following pseudo code: AltC: breg0 = pv_ops base breg0 += offset for this pv_ops breg1 = return address; br breg0. /* pv_ops clobbered breg0/breg1 */ For both #2 #3, we need to modify Linux IVT code to get clobber register for those MACROs, #3 need 2 br registers and 1-2 GR registers for the function body. #2A needs least clobber register, just 1-2 GR registers. #2B may also need clobber 1(or 2?) GR registers depending on the original instruction. Yes, clobber GR # is almost same for all Alts. Pros: - single code/binary - less maintenance cost Cons: - requires restriction on register usage. And we need to define its convension. When modifying ivt.S in the future after converting ivt.S, those convesion must be kept in mind. - more clobbered register (for AltC) - suboptimal even for native case. After binary patching, native side won't have impact. We can have in place patching, i..e. replace whole AltC code dynamically with mov GRx=CR.IVR;nop;nop... Presumably we can use binary patching technique to mitigate those overhead. Probably for native case, we can convert those branch with single instruction. For example we can make 'br breg0' into direct branch. If it is single IVT table, we don't know the target address of the function call. AltD(AltC'): breg1 = return address; br native_pv_ops_ops === binary patch at boot time ?? Are u talking about AltA? thanks, Eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
[Xen-ia64-devel] paravirt_ops support in IA64
Hi, Tony all: Recently Xen-IA64 community is considering to add paravirt_ops support to keep sync with X86 and reduce maintenance effort. With pv_ops, sensitive instructions or some high level primitive functionalities (such as MMU ops) are replaced with pv_ops which is a function table call whose exact function pointer is initialized at Linux startup time depending on different hypervisor (or native) runing underlayer. With this, we can reuse many code with X86 such as irqchip with X86, and similar dma support with X86, similar xenoprof/PMU profiling support etc. While CPU side pv_ops is quit different especially for those ASM code, since IA64 processor doesn;t have memory/stack ready at most IVT handler code. In X86, ASM side pv_ops can save clobber registers to stack and do function call, but IA64 can't due to unavailable of memory access. #define DISABLE_INTERRUPTS(clobbers) \ PARA_SITE(PARA_PATCH(pv_irq_ops, PV_IRQ_irq_disable), clobbers, \ pushl %eax; pushl %ecx; pushl %edx; \ call *%cs:pv_irq_ops+PV_IRQ_irq_disable; \ popl %edx; popl %ecx; popl %eax) \ One of the 1st biggest argument is how to support those ASM IVT handler code. Some ideas discussed include: 1: Dual IVT source code, dual IVT table. This is current Xen did, and probably are not warmly welcomed since it is not in upstream yet and have maintenance effort. 2: Same IVT source code, but dual/mulitple compile to generate dual/multiple IVT table. I.e. we replace those primitive ops (sensitive instructions) with a MACRO which uses compile option for different hypervisor type. The pseudo code of the MACRO could be: (take read CR.IVR as example) AltA: #define ASM_READ_IVR/* read IVR to GR24 */ #ifdef XEN breg1 = return address brxen_readivr #else /* native mov GR24=CR.IVR; #endif Or AltB: #define ASM_READ_IVR/* read IVR to GR24 */ #ifdef XEN in place code of function xen_readivr #else /* native mov GR24=CR.IVR; #endif From maintenance effort point of view, it is minimized, but not exactly what X86 pv_ops look like. Both approach will cause code size issue, but altB is much worse in this area, while AltA need one additional BR clobber register 3: Single IVT table, using indirect function call for pv_ops. This is more like X86 pv_ops, but we need to pay 2 additional BR clobber registers due to indirect function call, like following pseudo code: AltC: breg0 = pv_ops base breg0 += offset for this pv_ops breg1 = return address; br breg0. /* pv_ops clobbered breg0/breg1 */ For both #2 #3, we need to modify Linux IVT code to get clobber register for those MACROs, #3 need 2 br registers and 1-2 GR registers for the function body. #2A needs least clobber register, just 1-2 GR registers. In X86, there are another enhancement (dynamic patching) base on pv_ops. The purpose is to improve cpu predication by converting indriect function call to direct function call for both C ASM code. We may take similar approach some time later too. We really need advices from community before we jump into coding. CC some active members that I though may be interested in pv_ops since KVM-IA64 mailinglist doesn;t exist yet. Thanks a lot, Eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] Re: [Xen-devel] XenLinux/IA64 domU forward port
Isaku Yamahata wrote: Dong, Eddie wrote: I guess we are talking in different angle which hide the real issues. We have multiple alternaitves: 1: pv_ops 2: pv_ops + binary patching to convert those indirect function call to direct function call like in X86 3: pure binary patching For community, #1 need many effort like Jeremy spent in X86 side, it could last for 6-12 months, #2 is based on #1, the additional effort is very small, probably 2-4 weeks. #3 is not pv_ops, it may need 2-3 months effort. Per my understanding to previous Yamahata san's patch, it address part of #3 effort. I.e. #A of #3. What I want to suggest is #2. Hmm, by pv_ops you mean a set of functions which are grouped, right? My current implementation does #define ia64_fc(addr) paravirt_fc(addr) ... But do you want to make them indirect call? i.e. something like #define ia64_fc(addr) pv_ops-fc(addr) That is what X86 pv_ops did, such as following pv_ops for halt instruction in X86. static inline void halt(void) { PVOP_VCALL0(pv_irq_ops.safe_halt); } The key issue is current approach (putting native instruction by default) always need a binary patching. While pv_ops doesn't assume in that way. In X86, only only cli/sti/iret sti; sysexit is in patch site. Xen/X86 patch cli/sti from indirect call to direct call. Anyway patching or not is totally depend on hypervisor itself. With pv_ops, all those instruction both in A/B/C are already replaced by source level pv_ops code, so no binary patching is needed. The only patching needed in #2 is to convert indirect function call to direct function call for some hot APIs, for example X86 does for cli/sti. The majority of pv_ops are not patched. So basically #2 #3 approach is kind of conflict, and we probably need to decide which way to go earlier. It's not difficult to make #A of #3 to #A of #2. Yes, but the issue is pv_ops based patching is pure optional. But this patch makes it permanent. And the 3-5 cycles saved by patching is too small for huge C code function, which should be addressed in later stage. Looking at X86 code, only arch\x86\kernel/entry_32.S may be patched, such as sysenter_entry(). All C code are not. For IA64, it is IVT code if we take same policy with X86, i,e. #C is critical and may need patching. (Note here: patching or not even for #C is still optional) (At least for making the current implementation into #A of #2, but it requires more work and performance degrade.) However I don't see any advantage #A of #2 than #A of #3. We don't need #A at all. If it is necessary to call some other function for #A of #3, it is possible to rewrite instructions into something like mov reg = 1f br target25 (relocation is necessary) 1: So left issues are how many instructions (or bundles) should be reserved for each operations and what is their calling convention. Although currently I put instructions for native as default case, you can put the above sequence if you desire. The issue is we don;t have clobber registers here, that is what I say pv_ops for ASM is key challenge, and we need to change IVT code a lot to get clobber registers. That is why adding pv_ops support is a big challenge, but patching or not is not that difficult. Even we want to patch it, we need to get pv_ops code done and than do optimization. For those C based codewe can always use scratch registers. Given that #A of #2 is for performance critical path, so that not using usual stacked calling convension would be acceptable. As you already proposed, PAL static calling convention is a candidate. Not at all. For #A, it is already in C code and thus memory is available, C calling convention can be applied seamlessly. PAL like convention is only for IVT code. However I don't see any advantage to switch from the current convention (using r8, r9...) for #A at this moment. I don't oppose here either. I just leave the question here and let Linux guys to decide. In neutral, I won't let Xen specific implementation impacts pv_ops design since the later one is hypervisor neutral. But I can accept either. It is necessary to discuss with linux-ia64 people to see if it's acceptable or not. If we found it necessary to change the convention, it wouldn't be so difficult to do so. But it should be after discussion with linux-ia64. Not now. Actually I didn't opose binary patching, but my point is that we can't assume patching is a must for each hypervisor. Leaving the code to native by default will enforce this assumption. Also I think we should get pv_ops done first and then do optimization (patching), reverse sequence will just make more effort for whole community. Once we get pv_ops done, the framework used in this patch can be extended to that code base and we can decide which one need patching. Per my understanding to this patch, I think the 90% effort is forward
RE: [Xen-ia64-devel] Question about migration
The kernel guarantees applications only see time move forward, even across multiple CPUs. See: kernel/timer.c:time_interpolator_get_counter() We never return a time before last_cycle unless booted with the nojitter options. Echo from me too. I was told some time ago, the crystal used in IPF platform is usually expansive than other platforms and thus much more accurate. Normally the small difference won;t cause application see backward ITC value, but live migration per current Xen time virtualization policy is another story. It could be a headache :( Hopefully some time later, with Tukwila, we can live with hybrid virtualization, thus we got the problem solved by HW trapping application ITC read :) Or if some platform has high resolution platform time, we can restore physical ITC at VP switch time. (platform_time + per VP offset) Eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] paravirt_ops and its alternatives
Kouya Shimura wrote: Dong, Eddie writes: 3: irq chip paravirt_ops, xen irq chip or vSAPIC? Is xen irqchip really necessary? X86 side already pushed the xen irq chip into upstream, so I think it should be easy to do same thing in IPF side. In current PV implementation, an evtchn interrupt is injected and reflected directly to a guest OS. See reflect_event()@xen-unstable.hg/xen/arch/ia64/xen/faults.c and [EMAIL PROTECTED]/arch/ia64/xen/xenivt.c Yes, this is xen irq chip. It should have better performance than vSAPIC. We need to re-do this base on upstream xen riq chip code, + debug, it is more than vSAPIC, but love to see going in this direction since x86 already pushed it. There is no intermediate layer there. I think that the same mechanism can work in paravirt_ops. Perhaps I might misunderstand something. :-) Just term difference :) basically we are talking about same thing. thx, eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] paravirt_ops and its alternatives
Isaku Yamahata wrote: Dual compile could be a good approach. Another alternative will be X86 pv_ops like dynamic binary patching per compile time hints. The later method also uses macro for those different path, but this macro will generate a special section including the information for runtime patch base on pv_ops hook. (some kind of similar to Yamahata's binary patch method though it addresses C side code) With dynamic pv_ops patch, the native machine side will again see original single instruction + some nop. So I guess the performance lose will be very minor. Both approach is very similar and could be left to Linux community's favorite in future :) Actually already we adopted dual compilatin approach for gate page. See gate.S in xenLinux arch/ia64/kernel/gate.S and Makefile. I'm guessing that dual compiling approach is easier than binary patching approach because some modification of xenivt.S doesn't correspond to single instruction. Yes I agree that we can go for either way according to upstream favor. Yes, it is there already. When we implemented pv_ops, I would assume we define the APIs and would ask future kernel patches to follow too (not conflict those APIs). So we have to define clear clobber register in those MACRO, and then modify many original linux IVT code to provide those clobber register effectively. Current XenLinux provide one solution, but I saw 2 issues: 1: The coding style is not as good as original IVT code. For example: #ifdef CONFIG_XEN mov r24=r8 mov r8=r18 ;; (p10) XEN_HYPER_ITC_I ;; (p11) XEN_HYPER_ITC_D ;; mov r8=r24 ;; #else This kind of save/restore R8 in each replacement (MACRO) is kind of not well tuned. We probably need a big IVT code change to avoid frequent save/restore in each MACRO. This needs many effort. Of course taking shortcut before into upstream. 2: We are not using function pointer which pv_ops wants. But this one can be avoided if we use dual IVT. This is kind of very high level pv_ops (hypervisor provide whole IVT table), not normal pv_ops address (for low level instruction API). But anyway I love the idea too if the upstream guys like too Another problem I want to raise is about pv_ops calling convention. Unlike X86 where stack is mostly available, IPF ASM code such as IVT entrance doesn't invoke stack, so I think we have to define static registers only pv_ops stacked registers pv_ops like PAL. With respect to hypervisor ABI, we have already differentiate them. ia64 specific HYPERVIRVOPS as static registers convention and normal xen hypercall as stacked registers convention. Yes, hyperpriv is doing something similar, so I think people won't have much resist here. For most ASM code (ivt), it have to use static registers only pv_ops. We need to carefully define the clobber registers used and do manual modification to Linux IVT.s. Dual IVT table or binary patching is preferred for performance. Stacked register pv_ops could follow C convention and it is less performance critical, so probably no need to do dynamic patching. I'm guessing one important exception is masking/unmasking interrupts. i.e. ssm/rsm psr.i. Anyway we will see during our merge effort. If it is called in C, I won't say it is critical becuase it is slow path in native OS too. But some time later, we can add more after the Linux community takes it. This is the advantage of pv_ops when people argued about ABI level abstraction or API level abstraction at very beginning when Vmware raises their VMI spec. API approach can have on going improvement :) Thx, eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] paravirt_ops and its alternatives
Isaku Yamahata wrote: On Tue, Feb 05, 2008 at 10:17:10PM +0800, Dong, Eddie wrote: 1: The coding style is not as good as original IVT code. I have to agree with you here. For example: #ifdef CONFIG_XEN mov r24=r8 mov r8=r18 ;; (p10) XEN_HYPER_ITC_I ;; (p11) XEN_HYPER_ITC_D ;; mov r8=r24 ;; #else This kind of save/restore R8 in each replacement (MACRO) is kind of not well tuned. We probably need a big IVT code change to avoid frequent save/restore in each MACRO. This needs many effort. Of course taking shortcut before into upstream. Yes, such register value save/restore is suboptimal. Another issue from me is that why we use R8/R9 for In/Out parameter in Xen static hypercall. This raises us an issue to save/restore R8/R9 using bank 0 register. static PAL call doesn't use R8/R9, should we? Especially pv_ops itself is Xen neutral. I'm guessing such overhead is relatively small compared to the hyperprivops overhead which issues break instruction. Yes, the overhead is mostly un-observable, but mainly coding style or code quality concern. I assume Linux guys is much more paranoid in pursuing best. So presumably for reducing such overhead, it is necessary to replace those break instructions with fast hyperprivops using gate page. Such optimization would be the next step after upstream merge though. Yes, this could be future effort, actually this is not a pv_ops work, but xen wrapper work. Let me create another thread for compile time dual IVT table vs. single discussion. thx, eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
[Xen-ia64-devel] per hypervisor IVT table vs. global IVT table
All: If we use single IVT table, the pv_ops code will look like: ALT0: breg0 = pv_ops base breg0 += offset for this pv_ops breg1 = return address; br breg0. /* pv_ops clobbered breg0/breg1 */ That means we have to use 2 BR clobber register. Or we can use X86 hypercall page like technique to copy those hooks to a common page to avoid breg0. This make ALT0 same with following ALT1. If we use per hypervisor IVT table at compile time. We could do: ALT1: #define ASM_READ_IVR #if XEN breg1 = return address brpv_ops_api_readivr #endif When pv_ops_api_readivr is hooked, it do read_ivr_code. Or we can just do: ALT2: #define ASM_READ_IVR #ifdef XEN read_ivr_code #endif ALT1 is more like X86 pv_ops that some initialization code will hook, ALT2 can save an additional br register, and thus probably less change to Linux IVT code. In terms of former approach, binary patching can patch ALT1 code back to ALT2 solutuon to avoid the indirection call cost if we follow same approach with X86. Which ALT should we pursue first? thx, eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] paravirt_ops and its alternatives
Yang, Fred wrote: Alex Williamson wrote: On Mon, 2008-02-04 at 09:53 +0800, Dong, Eddie wrote: Yang, Fred wrote: Dong, Eddie wrote: Re-post it to warmup discussion in case people can't read PPT format, IVT is very performance sensitive for the native Linux, how about dual IVT tables alternative for CPU virtualization? It would need maintainance effort but it would be much cleaner forIA64 situation. -Fred Dual IVT table could be a night mare for Tony, I guess. But yes we need to have more active discussion to kick it off. Yes, two separate IVTs with 95+% of the code being the same would not be ideal. I think we should aim for a single ivt.S that gets compiled a couple times with different options, once for native and again for each virtualization option. It looks like more than half of the changes in xenivt.S could be easily converted to macros that could be switched by compile options. Perhaps a pattern will emerge for the rest. If it is not necessarily to stick with a single image and runtime to determine code path, multi-compile paths to generate different PV or native image then macros can possibly work.. -Fred Dual compile could be a good approach. Another alternative will be X86 pv_ops like dynamic binary patching per compile time hints. The later method also uses macro for those different path, but this macro will generate a special section including the information for runtime patch base on pv_ops hook. (some kind of similar to Yamahata's binary patch method though it addresses C side code) With dynamic pv_ops patch, the native machine side will again see original single instruction + some nop. So I guess the performance lose will be very minor. Both approach is very similar and could be left to Linux community's favorite in future :) Another problem I want to raise is about pv_ops calling convention. Unlike X86 where stack is mostly available, IPF ASM code such as IVT entrance doesn't invoke stack, so I think we have to define static registers only pv_ops stacked registers pv_ops like PAL. For most ASM code (ivt), it have to use static registers only pv_ops. We need to carefully define the clobber registers used and do manual modification to Linux IVT.s. Dual IVT table or binary patching is preferred for performance. Stacked register pv_ops could follow C convention and it is less performance critical, so probably no need to do dynamic patching. more comments are welcome:) Eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] RFC: Remove dead code?
Alex Williamson wrote: On Fri, 2008-02-01 at 13:16 +0800, Dong, Eddie wrote: The following code is not used anymore, and I think it is also a legacy issue when people are debugging on SKI simulator. Should we simplify it? thx, eddie Yes, I haven't run Xen on ski for a long time, and it's probably cruft that we don't want to push upstream. Please send a sign-off for the below and I'll apply it. Thanks, Alex Thanks, here it is. Eddie Remove ski simulator related stuff since it is for early Xen development stage, and no longer necessary for now. Signed-off-by: YaoZu (Eddie) Dong [EMAIL PROTECTED] diff -r 0e62beb4c36a arch/ia64/xen/Makefile --- a/arch/ia64/xen/MakefileFri Feb 01 09:33:32 2008 +0800 +++ b/arch/ia64/xen/MakefileFri Feb 01 12:59:24 2008 +0800 @@ -2,7 +2,7 @@ # Makefile for Xen components # -obj-y := hypercall.o xenivt.o xenentry.o xensetup.o xenpal.o xenhpski.o \ +obj-y := hypercall.o xenivt.o xenentry.o xensetup.o xenpal.o \ hypervisor.o util.o xencomm.o xcom_hcall.o \ xcom_privcmd.o xen_dma.o diff -r 0e62beb4c36a arch/ia64/xen/xenhpski.c --- a/arch/ia64/xen/xenhpski.c Fri Feb 01 09:33:32 2008 +0800 +++ /dev/null Thu Jan 01 00:00:00 1970 + @@ -1,19 +0,0 @@ -#include linux/kernel.h -#include asm/hypervisor.h - -int -running_on_sim(void) -{ - int i; - long cpuid[6]; - - for (i = 0; i 5; ++i) - cpuid[i] = xen_get_cpuid(i); - if ((cpuid[0] 0xff) != 'H') return 0; - if ((cpuid[3] 0xff) != 0x4) return 0; - if (((cpuid[3] 8) 0xff) != 0x0) return 0; - if (((cpuid[3] 16) 0xff) != 0x0) return 0; - if (((cpuid[3] 24) 0x7) != 0x7) return 0; - return 1; -} - pv_ops_cleanup2.patch Description: pv_ops_cleanup2.patch ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] paravirt_ops and its alternatives
Yang, Fred wrote: Dong, Eddie wrote: Re-post it to warmup discussion in case people can't read PPT format, IVT is very performance sensitive for the native Linux, how about dual IVT tables alternative for CPU virtualization? It would need maintainance effort but it would be much cleaner forIA64 situation. -Fred Dual IVT table could be a night mare for Tony, I guess. But yes we need to have more active discussion to kick it off. Tony: I think this discussion shouldn't exclude IA64 Linux community (at least for those active members), will u like us to post this kind of discussion to IA64 community? Or do u have list of people who are most interested? I want to kick off some high level discussion. thx, eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
[Xen-ia64-devel] RFC: Remove dead code?
The following code is not used anymore, and I think it is also a legacy issue when people are debugging on SKI simulator. Should we simplify it? thx, eddie diff -r 0e62beb4c36a arch/ia64/xen/Makefile --- a/arch/ia64/xen/MakefileFri Feb 01 09:33:32 2008 +0800 +++ b/arch/ia64/xen/MakefileFri Feb 01 12:59:24 2008 +0800 @@ -2,7 +2,7 @@ # Makefile for Xen components # -obj-y := hypercall.o xenivt.o xenentry.o xensetup.o xenpal.o xenhpski.o \ +obj-y := hypercall.o xenivt.o xenentry.o xensetup.o xenpal.o \ hypervisor.o util.o xencomm.o xcom_hcall.o \ xcom_privcmd.o xen_dma.o diff -r 0e62beb4c36a arch/ia64/xen/xenhpski.c --- a/arch/ia64/xen/xenhpski.c Fri Feb 01 09:33:32 2008 +0800 +++ /dev/null Thu Jan 01 00:00:00 1970 + @@ -1,19 +0,0 @@ -#include linux/kernel.h -#include asm/hypervisor.h - -int -running_on_sim(void) -{ - int i; - long cpuid[6]; - - for (i = 0; i 5; ++i) - cpuid[i] = xen_get_cpuid(i); - if ((cpuid[0] 0xff) != 'H') return 0; - if ((cpuid[3] 0xff) != 0x4) return 0; - if (((cpuid[3] 8) 0xff) != 0x0) return 0; - if (((cpuid[3] 16) 0xff) != 0x0) return 0; - if (((cpuid[3] 24) 0x7) != 0x7) return 0; - return 1; -} - ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] paravirt_ops and its alternatives
Isaku Yamahata wrote: On Thu, Jan 31, 2008 at 08:21:51AM +0800, Dong, Eddie wrote: Alex All: Hi Eddie. At first I'd like to make it clear. The goal is to merge xenLinux/ia64 modification into upstream kenrel. Hence reduce maintenane cost etc... And you want to dicuss how to do it. Is this correct? Mmm, Kind of mix. I want to know the gap, and then see how we can across the gap:) Now I'm forward poring domU code to 2.6.24. (In fact to 2.6.24-rc, but I'm going to rebase to 2.6.24 release). I haven't got it work yet. This work is still useful. If we decide to go with paravirt_ops eventually, we can refer what you have in 2.6.24 kernel, replacing one by one to paravirt_ops, and it will reduce debug effort in latest kernel. I'm planning to post it as a single jumbo patch once I get it work. To make our collaboration effective, we should have some kind of repository for that purpose. What kind of repository is best? Considering upstream merge, having our modification as patch queues might be easy. But should we also have git or hg repo to track our change? Yes, Alex will work on this I think. Here is a gap analysis for paravirt_ops, can you all comment? In summary we have 4 catagory of jobs: 1: CPU paravirt_ops including MMU timer interrupt 2: Xen hooks 3: irq chip paravirt_ops, xen irq chip or vSAPIC? 4: dma for driver domain My understanding is that the effort is almost similar for each part, while all various alternatives such as pre-virtualization, binary patching (privify) or even unmodified Linux as dom0 only save part of #1 effort, which means less than 25% effort saving. Do we really want a temporary solution for 25%- effort saving? So I would suggest we go with paravirt_ops which is the Linux community direction to avoid resource fragmentation. The writeup is very draft and I am planning to spend more time in investigation, comments are welcome. Probably as you know it, Linux/ia64 already has the machine vector frame work so that many basic functinality like dma api are called indiretly. So it would be wise to utilize machine vector at first and I fact we already defined xen machine vector which is due to Alex Williamson. If there were something unsuitable to machine vector, then we could introduce pv_ops. Anyway this is the only implementation details and how we call it. Conceptually they are same. About CPU virtualization. Last year I wrote the patch which does binary patching like x86 paravirt_alt. And I called it paravirt_alt patch. But I'm not sure about paravirtulized hand written assembly code. I'm afraid Linux people may dislike such code duplication. Yes it's possible to use binary patch technique somehow, however it is inevitable make the hand written assembly code less readable to some extent. Yes. One issue in my mind is that binary patching couldn't solve the high level virtualization issues in Xen today such as xen irqchip dma patch. It could work for domU but very difficult for dom0 or driver domain without these. X86 side has xen irqchip in Linux upstream today, so it should be ok for us to reuse it since XenLinux-IA64 is same with X86 in this area before paravirt_ops. Redhat guys are working on dma support, I think we can rely on them to push it upstream and then we implement IA64 specific things with same concept. To detect environment, mov from cpuid can't be used on PV case because it isn't privileged instructions. On VT-i environment cpuid can be hooked though. Current we check only priveleged level on which kenrel is runinng. Possibley more sophisticated way is necessary to allow another pv technology. Actually paravirt_ops version of X86 Linux doesn;t detect this. In stead, it put a special initial code in ELF Linux image and let the dom builder find it and start from that point to make sure it is on top of Xen hypervisor. For VMI stuff, there is a new code in Linux startup sequence to check if the VMI support ROM exist. For us, we can use same mechanism right now, Jeremy is working on dom0 detection support. Any way that is not a big issue. CC jeremy Keir in case they are not in this mailinglist. thx, eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
[Xen-ia64-devel] MINSTATE_PHYS?
I did a quick grep and find MINSTATE_PHYS is never defined in xenlinux. Xen mca code did. Anything missed? diff -r 71a415f9179b arch/ia64/xen/xenminstate.h --- a/arch/ia64/xen/xenminstate.h Fri Jan 18 14:20:59 2008 -0700 +++ b/arch/ia64/xen/xenminstate.h Thu Jan 31 15:08:42 2008 +0800 @@ -66,12 +66,6 @@ # define MINSTATE_GET_CURRENT(reg) mov reg=IA64_KR(CURRENT) # define MINSTATE_START_SAVE_MIN MINSTATE_START_SAVE_MIN_VIRT # define MINSTATE_END_SAVE_MIN MINSTATE_END_SAVE_MIN_VIRT -#endif - -#ifdef MINSTATE_PHYS -# define MINSTATE_GET_CURRENT(reg) mov reg=IA64_KR(CURRENT);; tpa reg=reg -# define MINSTATE_START_SAVE_MIN MINSTATE_START_SAVE_MIN_PHYS -# define MINSTATE_END_SAVE_MIN MINSTATE_END_SAVE_MIN_PHYS #endif /* ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] domU address space
Do we have checks when inserting guest TLB for PV dom? Seems not, If a guest insert a TLB with HV VA in domU, the TLB in machine side may be mis-used by HV. It should be able to be fixed :) Eddie Kouya Shimura wrote: In addition, It seems that PV domain can use an unimplemented VA address except xen area. Ideally xen should check it and reflect the unimplemented address fault to the guest. But it sounds overkill. Isaku Yamahata writes: On Thu, Jan 24, 2008 at 09:28:39AM +0800, Dong, Eddie wrote: Alex All: First of all, pls forgive me that I was out of Xen/IA64 for quit a long time, and I didn't fully catch up yet now. In the very beginning day of Xen/IA64, I remembered the address isolation between guest (domU) hypervisor is not solved though guest PAL can provide less number of VA bits, it just assume pv guest won't touch hypervisor address space, i.e. it will strictly follow PAL reported VA address bits. Is this solved now? Yes. (Possibly there might be bugs, though.) In paravirtualized domain case, PV domain is running under ring 2 (or ring 1 depending on the compile time configuration), and the xen area is proteceted by privileged level. In VTi domain case, it's protected by psr.vm = 1. -- yamahata ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] Time for hybrid virtualization?
[EMAIL PROTECTED] wrote: Quoting Dong, Eddie [EMAIL PROTECTED]: Not sure if anybody ever tried to run Xen/IA64 VMM in Xen/IA64 HVM guest? It may not be already there, but looks like not that far. I tried in the past, but it doesn't work out of the box. You of course can't run bare Xen within VTi, but a modified version can run. Why? The goal of hardware virtualization is to provide architecture equavalent virtualization. We may see the gap. It may be used to solve the compatability issues if we move root VMM to new VT-i dom0 based solution. But performances are not good. Yes, we need to see the gap too, but I think the additional degradation won't be big. Eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: single software TLB vs. multiple software TLBs (was RE: [Xen-ia64-devel] [PATCH] [Resend]Enable hash vtlb)
Alex: It becomes much clear now that multiple softsware TLB support is a must in functionality. For at least following reasons: 1: We should merge VTI and para domain vMMU code together to reduce future maintaince effort. In this case multiple software TLB support for MMIOs is a must for VTI domain and thus a must for Xen/IA64. 2: Para domain needs to capture guest MMIO access too in case a native driver in guest detect its existance. Definitely we should not crash system in that case, so multiple software TLB support is a must too. 3: Multiple software TLBs provides flexibility for guest huge TLB translations. Anthony's patch is ready to support all of above as a functionality ready solution, and so far I didn't see anybody against multiple software TLB support. Can u check in now as a build option? The performance difference in 1-2 percent should be a second level consideration for now. Thanks very much! Eddie Tristan Gingold wrote: Le Mercredi 12 Avril 2006 16:51, Dong, Eddie a écrit : Tristan Gingold wrote: At every tlb miss time, you can get guest translation from software TLB (not from VHPT). You actually don't need to care about VHPT entries no matter it is all there or nothing there. Ok, it is clear now. Tristan. ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] [PATCH][RFC][TAKE4] the P2M/VP patches
Congratulations! That is why Kevin and I advocated many times before to suggest p2m translation (p!=m) :-) Can we also share the free beer? Eddie Magenheimer, Dan (HP Labs Fort Collins) wrote: I was also able to get networking working with Isaku's patches and Alex's. Hooray! For the last eight months, I have gulped as I told people that Xen/ia64 doesn't support networking. No longer! Domo arigato, Yamahata-san! Free beer (or sake) for you at the next summit! Dan -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Williamson, Alex (Linux Kernel Dev) Sent: Monday, April 10, 2006 2:08 PM To: Isaku Yamahata Cc: xen-ia64-devel@lists.xensource.com Subject: Re: [Xen-ia64-devel] [PATCH][RFC][TAKE4] the P2M/VP patches On Mon, 2006-04-10 at 10:51 -0600, Alex Williamson wrote: On Fri, 2006-04-07 at 13:16 +0900, Isaku Yamahata wrote: 9512:f5d0a531cb58_dom0_vp_model_xen_part.patch I'm having trouble with the legacy VGA memory descriptor section of this patch. I managed to get my system booting with the patch below (2-way, w/ 1GB RAM). Networking works, yeah! The main changes here are that I removed the fabricated MDT entries describing the legacy VGA space, added EFI_ACPI_RECLAIM_MEMORY to the memory types mapped, and sorted the resulting memory descriptor table. I also included the hack to avoid calling assign_domain_mmio_page() for large MMIO ranges. Minor nit, we're still subtracting IA64_GRANULE_SIZE from the MDT entry for conventional memory, but we're not adding in the granule at the end of memory like we used to. I also had to make a change to the -xen kernel which is not shown here. The sal_cache_flush_check() appears to be causing us some trouble again with the P2M/VP patches (MCA'd on my system), so I commented out the call to in in arich/ia64/kernel/sal.c:ia64_sal_init(). Thanks, Alex -- Alex Williamson HP Linux Open Source Lab --- xen/xen/arch/ia64/xen/dom_fw.c 2006-04-10 13:17:31.0 -0600 +++ xen/xen/arch/ia64/xen/dom_fw.c 2006-04-10 13:15:21.0 -0600 @@ -10,6 +10,7 @@ #include asm/pgalloc.h #include linux/efi.h +#include linux/sort.h #include asm/io.h #include asm/pal.h #include asm/sal.h @@ -600,9 +601,14 @@ u64 end = start + (md-num_pages EFI_PAGE_SHIFT); if (md-type == EFI_MEMORY_MAPPED_IO || -md-type == EFI_MEMORY_MAPPED_IO_PORT_SPACE) +md-type == EFI_MEMORY_MAPPED_IO_PORT_SPACE) { + +if (md-type == EFI_MEMORY_MAPPED_IO +((md-num_pages EFI_PAGE_SHIFT) 0x1UL)) + return 0; + paddr = assign_domain_mmio_page(d, start, end - start); - else +} else paddr = assign_domain_mach_page(d, start, end - start); #else paddr = md-phys_addr; @@ -610,6 +616,7 @@ BUG_ON(md-type != EFI_RUNTIME_SERVICES_CODE md-type != EFI_RUNTIME_SERVICES_DATA + md-type != EFI_ACPI_RECLAIM_MEMORY md-type != EFI_MEMORY_MAPPED_IO md-type != EFI_MEMORY_MAPPED_IO_PORT_SPACE); @@ -626,6 +633,18 @@ return 0; } +static int +efi_mdt_cmp(const void *a, const void *b) +{ +const efi_memory_desc_t *x = a, *y = b; + +if (x-phys_addr y-phys_addr) +return 1; +if (x-phys_addr y-phys_addr) +return -1; +return 0; +} + static struct ia64_boot_param * dom_fw_init (struct domain *d, const char *args, int arglen, char *fw_mem, int fw_mem_size) { @@ -834,6 +853,7 @@ /* simulate 1MB free memory at physical address zero */ MAKE_MD(EFI_LOADER_DATA,EFI_MEMORY_WB,0*MB,1*MB, 0);//XXX #else +#if 0 //XXX dom0 should use VGA? #define VGA_RAM_START 0xb8000 #define VGA_RAM_END 0xc @@ -852,6 +872,7 @@ pcolour_map_end = pcolour_map + VGA_CMAPSZ * 8; MAKE_MD(EFI_LOADER_DATA, EFI_MEMORY_WB, 0 * MB, pvga_start, 1); MAKE_MD(EFI_LOADER_DATA, EFI_MEMORY_WB, pcolour_map_end, 1 * MB, 1); +#endif /* 0 */ #endif /* hypercall patches live here, masquerade as reserved PAL memory */ MAKE_MD(EFI_PAL_CODE,EFI_MEMORY_WB,HYPERCALL_START,HYPERCALL_END, 0); @@ -890,6 +911,8 @@ // for ACPI table. efi_memmap_walk_type(EFI_RUNTIME_SERVICES_DATA, dom_fw_dom0_passthrough, arg); +efi_memmap_walk_type(EFI_ACPI_RECLAIM_MEMORY, + dom_fw_dom0_passthrough, arg); efi_memmap_walk_type(EFI_MEMORY_MAPPED_IO, dom_fw_dom0_passthrough, arg); efi_memmap_walk_type(EFI_MEMORY_MAPPED_IO_PORT_SPACE, @@ -902,8 +925,10 @@ #ifndef CONFIG_XEN_IA64_DOM0_VP MAKE_MD(EFI_LOADER_DATA,EFI_MEMORY_WB,0*MB,1*MB, 1); #else +#if 0 MAKE_MD(EFI_LOADER_DATA,EFI_MEMORY_WB, 0 * MB,
[Xen-ia64-devel] vIRQ design brief
All: This is the draft design of the IRQ virtualization, comments are appreciated. Thx,eddie Xen/IA64 interrupt virtualization * Introduction This document targets xen/ia64 developers, providing an design overview of interrupt virtualization. How the guest IOSAPIC looks like and how the machine IOSAPIC is used in hypervisor. * Terminology (Not formal definition, just for better understanding) PIRQ: Physical IRQ generate by partitioned device, vector 0-255 in X86 VIRQ: Dynamic IRQ that is pure virtual. Vector 256-511 in X86 IPI: Inter processor IRQ VIPI: MMIO: Memory Mapped IO Event channel: * Background How Xen/X86 handle callback and event channel: In Xen environment, a para-guest registers its callback/safe callback entry to hypervisor for batch delivering of events to guest. When a guest has pending events(shared bitmap), the guest execution turn to the pre-registered callback function (evtchn_do_upcall) like an interrupt happens on native system. This control transfer can be disabled by another shared variable evtchn_upcall_mask. In this way guest software can disable upcall for some reason. Within evtchn_do_upcall, the events is dispatched. I.e. call do_IRQ() or evtchn_device_upcall(). Current IA64 approach for callback: Current Xen/IA64 is using a pesudo physical IRQ to indicate the active of events and do dispatch at that pseudo IRQ handle. Within Xen summit we all agree to implement the callback/safe fallback mechanism to avoid potentail bugs and Intel is working on that now. How X86 Xenlinux handle IRQs: Guest IRQ including PIRQ, VIRQ, IPI and interdomain communication channel are all bund with event channel. I.e. they all are carried by event channel. At intial time, the guest needs to initialize IO_APIC hardware base on knowledge presented by firmware. And eventually register a pure virtual pirq_type as hw_interrupt_type instead of ioapic_level_type and ioapic_edge_type. At run time, pirq_type works and do pure event channel based operation. for example, irq_desc-handler-ack (becomes ack_pirq) mask the corresponding event channel (no hypercall). irq_desc-handler-end (becomes end_pirq) unmask the corresponding event channel and may notify xen through hypercall (PHYSDEVOP_IRQ_UNMASK_NOTIFY) to call xen irq_desc-handler-end. The later one may signal EOI in hypervisor (In IO_APIC, it is unmask_IO_APIC_irq). Difference between pirq_type and ioapic_level_type/ioapic_edge_type: The initial time of this 2 type are similar, I.e. startup/shutdown, enable/disable are same, both may need to access machine resource. But the runtime service, i.e. ack/end, are quit different. pirq_type mainly access event channel related share memory for mask/unmask, but ioapic_level_type/ioapic_edge_type needs to access machine IOSAPIC resource for example: ack_edge_ioapic_irq and ack_edge_ioapic_vector need to mask APIC reource and ack APIC. Another difference is that with event channel approach, the hw_interrupt_type, i.e. pirq_type, works for both level and edge triggered IRQ. When Xen received PHYSDEVOP_IRQ_UNMASK_NOTIFY (comes from guest pirq_type.end): pirq_guest_unmask() if ( --irq_desc-action-in_flight == 0 ) { irq_desc-handler-end(); // EOI } Done; Machine IRQ delivery in Xen/X86 The code flow of xen IRQ delivery (IRQ belongs to guests) A machine IRQ happens - xen - do_IRQ() of xen. irq_desc-handler-ack(); // same with Linux, op real resource __do_IRQ_guest() for each bund guest { send_guest_pirq(); irq_desc-action-in_flight++; } Done; send_guest_pirq(): Set pending event channel bit (shared evtchn_pending) in target processor. In SMP system when the target processor is running, a machine IPI will be sent to (evtchn_notify). When xen return to guest Before restore_all_guest, if VCPUINFO_upcall_mask=0, i.e evtchan_upcall_mask = 0 and there is pending event channel, Xen will create a bounce_frame on guest that is similar with exception frame, the guest control then goes to callback entry. *Xen/IA64 IRQ virtualization design 1: Hypervisor owns machine IOSAPIC/LSAPIC exclusively. This makes IRQ sharing between driver domains much easier as there is no contention from domains. 2: Machine IRQ delivery in Xen/IA64: The basic logical is exactly same with Linux/IA64 An IRQ happens - IVT+0x3000 - ia64_handle_irq() while (IRQ exist) { vector=CR.IVR; mask IRQ using TPR; __do_IRQ(); unmask IRQ using TPR Issue CR.EOI } A slight difference is __do_IRQ. In linux it calls do_IRQ, because Xen merge do_IRQ and __do_IRQ together and use name __do_IRQ. --- Resue The do_IRQ do followings (API in Xen/arch/x86/irq.c), the code sequence is same and is much detail explained here:
RE: [Xen-ia64-devel] vIOSAPIC and IRQs delivery
Tristan Gingold wrote: To me, it's also likely to imagine a mixed style: multiple IOSAPICs provided by the platform with total irq lines less than number of interrupt sources. In that case, people may partition interrupt domains, but finally still with irq line sharing even within local domain if it's electronically wired together. :-) I really think this is pure theory. IRQ lines were shared when hardware was costly: IBM-PC of 1981. It is also less performant (we have to check all drivers) and less sure: what about badly written drivers or fault tolerance? Do you mean you change mind in supporting shared IRQ? Previously you agreed to support in Xen/IA64. Now IOSAPIC are cheap. I don't know any ia64 vendor sharing IRQs on PCI bus. Maybe I am wrong, but until now it is true. Furthermore, PCI-e has no more IRQ lines. IRQ are now in-band messages (MSI). Thus sharing IRQs is not the future too. Is not FUJITSU PRIMEQUEST an example as mentioned by Kenji Kaneshige? Tristan. Eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] Event channel vs current scheme speed [wasvIOSAPIC and IRQs delivery]
Tristan Gingold wrote: Le Jeudi 09 Mars 2006 21:02, Tian, Kevin a écrit : Anyway, good discussion by far though still some way to go for consensus. :-) Maybe we want to look at this from another way - fairness. [...] Regarding current model, there seems to be an issue about fairness between physical interrupts and xen events. Taking current 0xE9 for example, it's lower than timer but higher than all external device interrupts. This means xen events will always preempt device interrupts in this case, which is unfair and not what we want. To my understanding, this is also true for x86. With event channel, real physical IRQs use events 0-255, while Xen events use events 256-511. So what is the difference ? The difference is that with event channel, IRQ (PIRQ from 0-255 and VIRQ from 256-511) vector itself doesn't participate in prioritization, but event channel. There is a map between event channel and IRQ in evtchn.c. With event channel solution, all the guest physical IRQ is injected/reflected through event channel instead of vLSAPIC. Event channel is a must as VBD/VNIF and Control Panel is using it except you rewrite all of them, I think you will not think in that way.If callback itself is built on a pseudo IRQ (0xE9) in vLSAPIC, then all event channel has priority 0xE9 in VLSAPIC. That has problem as some event channel need higher priority but some lower comparing with other guest device IRQ. So the solution is to eliminate VLSAPIC and all guest PIRQ go with event channel. In the meantime, callback must go with a so called upcall function (callback function). Hope this answer your question. Tristan. Eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] [PATCH] [RESEND] domU destroy page ref counter
Really a good job! A minor suggestion for next in my mind is that we may add a simple COMPILE option in Makefile or some .h file to be able to choice 1/3 byte swap or 1/2 byte swap. People has some thoughts that 1/2 byte swap may have better hash locality. Eddie. ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] [PATCH] [RESEND] domU destroy page ref counter
Yes, I remembered your suggestion. That is why I suggest to enable a compile option so that somebody can start formal benchmark measurement :-) Eddie Magenheimer, Dan (HP Labs Fort Collins) wrote: Rid mangling change has been discussed many times on the list, most recently: http://lists.xensource.com/archives/html/xen-ia64-devel/2005-11/msg00282.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Xu, Anthony Sent: Monday, March 13, 2006 6:37 PM To: Dong, Eddie; Tian, Kevin; Akio Takebe; Masaki Kanno; xen-ia64-devel@lists.xensource.com Subject: RE: [Xen-ia64-devel] [PATCH] [RESEND] domU destroy page ref counter From: Dong, Eddie Sent: 2006年3月13日 22:12 A minor suggestion for next in my mind is that we may add a simple COMPILE option in Makefile or some .h file to be able to choice 1/3 byte swap or 1/2 byte swap. People has some thoughts that 1/2 byte swap may have better hash locality. Eddie. I second Eddie, I have some observations about this. Usually guest applications use almost the address space, the only different is rid. What I observed was if the lowest 17 bits of rid are same, the hash address is same. If we swap 1/3 byte, applications use the same address space but different rid may have the hash address in a majority of situations, which may make some collision chains very long. These are just some observations, I don't mean 1/2 byte swap is better than 1/3 byte swap.I think we need to add COMPILE option to get benchmark data first, and then make the decision. It's obviously not a big task but deserve to do. One thing we need to pay extra attention is the rid byte swap is done in assembly code in some fast_hyperpriops. Thanks, Anthony ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] Event channel vs current scheme speed [wasvIOSAPIC and IRQs delivery]
I agree the current model has implicit priorities. But I am a little bit skeptical how the priority argument. As far as I understand, in Xen or in Linux first asked is first priority. Yes, here we see 3 concerns about priority: 1: When multiple IRQs arrive at same time, higher priority one get earlier serviced in native. In xen event channel, higher priority IRQ can have higher priority event channel. So they are basically same. 2: A lower IRQ arrives first, then followed by an high priority IRQ. In this case, the situation is relative complicate. If the duration is big enough, yes the first asked is serviced. If the duration is small, then later one may preempt the first one. In virtual machine environment, the time is virtualized. So no matter which one is service first has no correctness issue (you can think the virtual duration maybe big maybe large). 3: A higher IRQ arrives first, then followed by an lower priority IRQ. In this case, higher IRQ must be serviced earlier than lower one. Xen event channel search the highest priority IRQ and do service. At service time, the callbreak is masked but the event can still be set. So a lower IRQ can't interrupt higher one. Semantics are guaranteed. All in all, above long context is just one factor that I view to choose the proper mechanism. :-) If you are that worried about priorities, we way find solution in the current scheme. I'd just like to understand why priorities are that important. These are all corner cases that we must consider as product, but at early development we can take shortcut like using pseudo IRQ for event channel here to let the whole project go ahead. And this is what we talked at xensummit, people (Dan, Ian, Keir, Jun) all have no object for potential issue concerns (for example mask/unmask support and priority issue) and agree to take next. PPC guy also uses pseudo physical IRQ for event channel as I remembered. Their community is much smaller than us now and their development is also lagger than IA64. This is why we need to clean up now as callback based event channel approach has already been in production stage. Making a new mechanism has high risk. BTW, even with this strong event channel mechanism in Xen, we sometime saw bugs in xen-devl such as a deadlock in a VMX SMP system like Xin rootcaused before new year. But anyway it is very few now. Eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] vIOSAPIC and IRQs delivery
Tristan: One more thing is that our proposal is to make out an idea design for IRQ virtualization. We don't want to spend plenty of time to argue each detail here. I have found several limitations in previous patch and I'd like to suggest us to work together to get the idea design. Let us complete the event channel based design first, then people can compare and comments. Will u contribute to that effort too? Then you can fight between your left hand and right hand, like letting a republican to debate for democracy or visa versa :-) The current solution (dom0 own IOSAPIC and event channel built on pseudo physical IRQ) can serve us for a while before a well considered solution comes out. BTW, please refer to xen-devl, Keir confirmed io_apic.c in X86 is only used for initialization time, it is no longer necessary at run time. See embedded too. thx,eddie Tristan Gingold wrote: The event channel model in some case will request real IOSAPIC operation base on the type it is bund to. The software stack layer is very clear: 1: guest PIRQ (top), 2: event channel (middle), 3: machine IRQ (bottom). BTW, event channel is a pure software design, there is no architecture dependency here. I don't wholy agree. The callback entry is written in assembly, and seems to have tricks. callback is one of the mechanism that event channel is carried on. Using pseudo physical IRQ like in current xen/ia64 is another alternative. No related to callback here. Then let us see where the previous patch need to improve. 1: IRQ sharing is not supported. This feature, especially for huge Iron like Itanium, is a must. I agree. However we won't reach this problem now as device drivers do not exist yet. we are designing this patch to solve the driver domain problem in xensummit. If won't reach the problem is the reason to not do that, why we need this change? let dom0 own IOSAPIC is pretty simple and robust. Remember our goal is: 1: Support driver domain 2: Drive domain may share IRQ lines. 2: Sharing machine IOSAPIC resource to multiple guest introduces many dangerous situation. Example: If DeviceA in DomX and Device B in DomY share IRQn, When domX handle DeviceA IRQ (IRQn), take the example of function in the patch like mask_irq: s1:spin_lock_irqsave(iosapic_lock, flags); s2:xen_iosapic_write () // write RTE to disable the IRQ in this line s3:spin_unlock_irqrestore(iosapic_lock, flags); Here is the domX is switched out at S3, and DeviceB fire an IRQ at that time. Due to the disable in RTE, DomY can never response to the IRQ till DomA get executed again and enable RTE. This doesn't make sense for me. Neither for me. However my patch do not allow this behavior: once an IRQ is allocated by a domain, it can't be modified by another one. Again I agree this is far from perfect and using an in_flight mechanism is better. No, I don't think in_flight can help on this as if domX is masking machine RTE. The point is all those real IOSAPIC resource should be owned by xen, no partitioning, no sharing. 3: Another major issue is that there is no easy way in future to add IRQ sharing support base on that patch. That is why I want to let hypervisor own IOSAPIC exclusively, and guest are purely based on software mechanism: Event channel. I don't think IRQ sharing requires event channel. This can also be done using current IRQ delivery. Don;t know what is IRQ delivery mean. 4: More new hypercall is introduced and more call to hypervisor. Only physdev_op hypercall is added, but it is also used in x86 to set up IOAPIC. You can't avoid it. Initial time is OK for no matter what approach, runtime is critical. I saw a lot of hypercall for RTE write. Additionnal calls to hypervisor are for reading or writting IVR, EOI and TPR. I really think this is fast using hyper-privop. The current ia64 model is well tested too and seems efficient too (according to Dan measures). Yes, Xen/IA64 can say having undergone some level of test although domU is still not that stable. Maybe because domU do not have pirqs :-) But vIOSAPIC is totally new for VMs and is not well tested. Whatever we do Xen will control IOSAPICs. For sure my patch is not well tested, but simple enough. We should not take this risk for a one month lifecycle patch. On the other hand, the event channel based approach is well tested in Xen with real deployment by customer. Correct but it won't drap and drop on ia64. No, all the code are xen common in para-guest side, you don't need to drap and drop. And even more, the patch based on event channel will be less than your previous patch. I.e. less modification to xenlinux. BTW, due to virtual driver support, all the xen event channel related files are already imported in xen/ia64. This evtchn.c file contain all guest IRQ virtualizaion code. You don't need to add new
RE: [Xen-ia64-devel] SMP-g design notes
Tian, Kevin wrote: Yes, emulation of ptc.ga will be more complex than other emulations. However simply talking about the claim above, IPI is a necessity IMO, if another vcpu is running on another LP at the emulation. Though we may add some lazy flush later. Agree! I would like to see xen implements this kind of IPI for global purge. And I'd like to see a whole approach design in near future for this issue so that we can comment :-) Eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] vIOSAPIC and IRQs delivery
Tristan Gingold wrote: Le Mardi 07 Mars 2006 00:34, Dong, Eddie a écrit : Magenheimer, Dan (HP Labs Fort Collins) wrote: Hi Tristan -- Do you have any more design information? I'm not very familiar with the x86 implementation but is it your intent for it to be (nearly) identical? What would be different? The difference is that should guest OS (para xen) still access the IOSAPIC MMIO port? If the guest OS keeps accessing the machine IOSAPIC MMIO address, multiple driver domain share same IRQ has potential problem. The design in my opnion is that hypervisor own the machine IOSAPIC resource exclusively including reading IVR and issuing CR.EOI. All the guest is working with a pure virtual IOSAPIC or virtual IO_APIC (actually doesn't matter for guest). [Note that IVR and CR.EOI are LSAPIC stuff.] So should we use a new term virtual IRQ or interrupt virtualization? Both LSAPIC and IOSAPIC need to be done in vIRQ. BTW, RTE is still accessed by para-guest in previous patch :-) Writing of RTE in machine resource from one domain will impact the correctness of other domain if they share same IRQ line. Would all hardware I/O interrupts require queueing by Xen in an event channel? This seems like it could be a potential high overhead performance issue. There are two things: * delivery of IRQs through event channel. I am not sure about performance impact (should be almost the same). I am sure about linux modification impact (new files added, interrupt low-level handling completly modified). I don't see too much Linux modifications here as most of these files are already in xen. You can find them if you compile a X86 Xen, see linux/arch/xen/kernel/** , all those event channel related file are there including the PIRQ dispatching. In some sense, the whole IOSAPIC.c file is no longer a must. * Use of callback for event channel (instead of an IRQ). I suppose it should be slightly faster. I suppose this is required (for speed reasons) if we deliver IRQs through event-channel. Mmm, I have different opnion here. With all guest physical IRQ queueing by Xen event channel through a bitmap that is shared in para-guest, the guest OS no longer needs to access IVR and EOI now, that means we don't need to trap into hypervisor. Checking the bitmap is defenitely higher performance than read IVR, in this way the performance is improved actually. I really think this is not that obvious due to hyper-privop and hyper-reflexion. This is basically the difference between hypercall and using share memory. Hard to say the amount but benefits is clear, although as this code is frequently accessed especially for driver domain where there are a lot of IRQs. Please start (maybe using some mails we have exchanged). I will complete if necessary. Yes, I have sent u some drafts. Eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] vIOSAPIC and IRQs delivery
Magenheimer, Dan (HP Labs Fort Collins) wrote: Hi Tristan -- Do you have any more design information? I'm not very familiar with the x86 implementation but is it your intent for it to be (nearly) identical? What would be different? The difference is that should guest OS (para xen) still access the IOSAPIC MMIO port? If the guest OS keeps accessing the machine IOSAPIC MMIO address, multiple driver domain share same IRQ has potential problem. The design in my opnion is that hypervisor own the machine IOSAPIC resource exclusively including reading IVR and issuing CR.EOI. All the guest is working with a pure virtual IOSAPIC or virtual IO_APIC (actually doesn't matter for guest). Would all hardware I/O interrupts require queueing by Xen in an event channel? This seems like it could be a potential high overhead performance issue. Mmm, I have different opnion here. With all guest physical IRQ queueing by Xen event channel through a bitmap that is shared in para-guest, the guest OS no longer needs to access IVR and EOI now, that means we don't need to trap into hypervisor. Checking the bitmap is defenitely higher performance than read IVR, in this way the performance is improved actually. In the meantime, we don't need to spend time to re-design the vIOSAPIC, it could be same with X86 vIO_APIC (90%). Definitely somebody need to write down the vIO_APIC design :-) Tristan or me can do that, Tristan? Perhaps a design document (or at least a few paragraphs) would be useful for the developers on the list. Yes. Thanks, Dan Eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] CONFIG_DOMAIN0_CONTIGUOUS in domain.c
Dan: I guess you misunderstand here. Definitely we need to fix this bug first for the path #undef can't work as pre-clean up patch. With this patch, everything can stay with same functionality. It is not necessary to combine it together with VP+DMA patches that makes things much complicate. Eddie -Original Message- From: Magenheimer, Dan (HP Labs Fort Collins) [mailto:[EMAIL PROTECTED] Sent: 2006年3月1日 7:36 To: Dong, Eddie; Tian, Kevin; Isaku Yamahata Cc: xen-ia64-devel@lists.xensource.com Subject: RE: [Xen-ia64-devel] CONFIG_DOMAIN0_CONTIGUOUS in domain.c This isn't a performance issue. I don't think domain0/U will function correctly with CONFIG_...CONTIGUOUS undef'd until all of Isaku's necessary VP+DMA changes (in Xen, Xenlinux, Xen drivers, and possibly tools) are complete. -Original Message- From: Dong, Eddie [mailto:[EMAIL PROTECTED] Sent: Tuesday, February 28, 2006 4:19 PM To: Magenheimer, Dan (HP Labs Fort Collins); Tian, Kevin; Isaku Yamahata Cc: xen-ia64-devel@lists.xensource.com Subject: RE: [Xen-ia64-devel] CONFIG_DOMAIN0_CONTIGUOUS in domain.c Magenheimer, Dan (HP Labs Fort Collins) wrote: to VP. HOWEVER... it may be possible and desirable for much of Isaku's work to support both VP and P==M. For non-I/O code, CONFIG_DOMAIN0_CONTIGUOUS could be used (or possibly renamed) to select VP or P==M at compile-time, at least until the conversion to VP+DMA is complete. This would allow at least some of Isaku's As if eventually we will remove this code, putting an compile option now is OK IMO. But I think the default one should be #undefed by some pre-cleanip patch now so that people can find issues earlier if there have. #undef this one can support no matter p==m or p!=m, while #define this can only support p==m. Yes maybe we will see 0.5% performance degradation with #undef, but this is a functionality must as we all go toward p!=m :-( After the whole p2m/VP patch comes out, we can then do more performance tuning :-) Eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] SMP guest: first boot
Congratulations too! Now more SMP host issues can be found, right? Bravo Tristan! thx,eddie Alex Williamson wrote: On Mon, 2006-02-27 at 14:01 +0100, Tristan Gingold wrote: Hi, this is my first dom0 boot using 2 cpus: Nice work Tristan! ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] CONFIG_DOMAIN0_CONTIGUOUS in domain.c
Isaku Yamahata wrote: Hi. I think that construct_dom0() is broken for CONFIG_DOMAIN0_CONTIGUOUS. I had to modify it heavily to boot dom0 with P2M/VP model. For example construct_dom0() ... memcpy(__va(pinitrd_start),initrd_start,initrd_len); This memcpy() assumes p==m. I thought that CONFIG_DOMAIN0_CONTIGUOUS option was introduced at the eary develpment stage and it remained just because no one removed it. However I don't know its history. Looks like true :-) So, I guess you will provide a fix patch to remove those stuff as a pre-cleanup, am I right? I support this as somebody else can do more test base on this. thx,eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] [PATCH] SMP_HOST: Alloc vhpt from domheap
First of all, Anthony's previous patch is good enough to check in. This memory used for VHPT is allocated from dom heap but doesn't belong to any specific domain. In future per VP VHPT options, yes we should account this memory to the domain memory size. Secondly, yes as Alex pointed out, we may redesign the Xenheap size, but it looks like we can defer this to some time later. Something in X86 side like m2p table may be needed in IA64 side in future but not sure. If we need this one, it should be in xenheap that is a relative big memory chunk. On the other side, the xenheap is translated by a single TR to save the treasure TLB resource, so we can only choose among 16MB, 64MB and 256MB supported by IA64 architecture. Probably 16MB is too small :-) Eddie Xu, Anthony wrote: From: Isaku Yamahata Sent: 2006年2月27日 13:18 struct domain-max_pages is used for two purposes currently. a) to account pages allocated for a domain. (by xen/common/page_alloc.c) b) maximal pseudo physical address. (e.g. lookup_domain_mpa() in xen/arch/ia64/domain.c and others) This patch breaks b). Somethings needs to be adjusted. Maybe it is needed to add a new member to struct arch_domain for b) and to compensate max_pages at domain construction. Good catch! Domain-max_pages should be the number of memory pages allocated to domain, for instance, if a domain has 512M memory, the domain-max_pages should be 512M/16K. VHPTs are allocated from domheap, but not from designated domain due to the first parameter is NULL, so domain-max_pages and domain-tot_pages will not be impacted. Seems not break a and b. Yes, you can use two variables with each representing domain's memory pages and pages used by this domain separately, the later includes the former. How do you think about accounting pages which is used for struct arch_domain-mm? Please see pgtable_quick_list_alloc() in xen/arch/ia64/xen/xenmis.c. It's the same issue with above, it is better that P2M table is allocated from domheap with the first parameter NULL instead of xenheap, since you are doing P2M task; you can fix this in the same time. Thanks. ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
[Xen-ia64-devel] CONFIG_DOMAIN0_CONTIGUOUS in domain.c
All: I am not sure if somebody has tested the pathwith CONFIG_DOMAIN0_CONTIGUOUSin xen/arch/ia64/xen/domain.c disabled, as this is a must for p2m/vp coming patches, please echo if yes?If no, then it is an urgent task now toeither remove the code or disable the configuration by default. Thanks,eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] SMP guest and itc
Alex: Can't exposing an formula cooperatively between domain hypervisor solve the prpblem? E.g: guest_ITC=host_ITC * fact + offset;If the host ITC is accurate enough. Or guest_ITC = host_high_precise_timer_count * fat + offset; if the hyprevisor want to use other platform high precise timer like HPET. Implementing those one in xenlinux is quit easy IMO. One more thing for time virtualization for us is to implement other timer resource virtualization like ACPI timer, HPET that may be used optionally. Eddie Alex Williamson wrote: On Tue, 2006-02-14 at 09:26 +0800, Dong, Eddie wrote: Base on my understanding, the ITC drift between different processor after fixup done in Linux or Xen today is less than 100ns. So I think that is not a big issue as if we guarantee the guest doesn't see backward time. (As VP migration usually take longer than 100ns) Hi Eddie, This is why I suggest pre-sync'd ITC are probably sufficient for now. However, we will need to support systems where the ITC between processors drift. In such a case, we either need to expose a better time source to the guest (perhaps a paravirtualized time interpolator) or fabricate ITC values for the guest which make the ITCs appear synchronized. The latter feels like it could be a bottleneck. For the gettimeofday() concern, I don't agree. Because even we support full virtualization, an paravirtualized guest can still get guest ITC quickly by exposing the formula to guest or accessing share memory (X86 use share memory). Have a look at kernel/timer.c:time_interpolator_get_counter(). ITCs cannot be perfectly synchronized, therefore all ITC time interpolators have jitter. We ensure that we never see time go backwards by keeping track of the last cycle count we returned. Storing this cycle counter requires a cmpxchg. As soon as we get multiple CPUs doing gettimeofday(), we get contention in the cmpxchg. Multiple CPUs doing gettimeofday() simultaneously can potentially cause us to read the ITC many, many, many times. If each read causes a trap into xen, the performance implications could be severe. This is why larger systems provide HPETs or other similar platform time sources. The ITC based time interpolator does not scale well to large SMP systems. Anyway, gettimeofday is not frequently accessed, so it is not a big cake:-) I disagree and note that Linux/ia64 implements fsys_gettimeofday to avoid even entering C code to make a fast gettimeofday call. Thanks, Alex ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] IOSAPIC virtualisation
Tristan Gingold wrote: Le Vendredi 03 Février 2006 17:13, Alex Williamson a écrit : On Fri, 2006-02-03 at 09:33 +0100, Tristan Gingold wrote: [...] I agree that we can't hit this problem right now, but it's easy to fix and would be one less thing we might miss when we do enable driver domains. It looks the block of code to mask the vector could be copied identically into the section to unmask the vector with the appropriate s/mask_vec/unmask_vec and setting of the rte values. I guess it keeps catching my eye because the mask and unmask are not symmetric. Thanks, Hi, I have slightly modified the patch so that it looks almost symmetric. Thanks, Tristan. Tristan: Great work! And sorry I don't find time to go through all. A quick question is that why we need to do cpu_wake() immediately after IRQ injection? In Xen design, this API is mainly used for VP pause/unpause and manual ops that is reasonable to disturb/bypass the scheduler decision. This disturb is heavily costed as the scheduler triggered in the next time tick will go back to its normal decision tree that probably means preemption of dom0 quantum. What X86 did is to wait for the scheduler to take the decision. I know the original code also do in this way, but it is not an architecture requirement. Rather it is a shortcut in previous implementation, and I think it is time to revise now. +xen_reflect_interrupt_to_domains (ia64_vector vector) +{ + struct iosapic_intr_info *info = iosapic_intr_info[vector]; + struct iosapic_rte_info *rte; + int res = 1; + + list_for_each_entry(rte, info-rtes, rte_list) { + if (rte-vcpu != NULL) { + if (rte-vcpu == VCPU_XEN) + res = 0; + else { + /* printf (Send %d to vcpu as %d\n, + vector, rte-vec); */ + /* FIXME: vcpus should be really + interrupted. This should currently works + because only domain0 receive interrupts and + domain0 runs on CPU#0, which receives all + the interrupts... */ + vcpu_pend_interrupt(rte-vcpu, rte-vcpu_vec); + vcpu_wake(rte-vcpu); + } Another minor comments are: 1: +#define VCPU_XEN ((struct vcpu *)1)looks stranger for me. Further more, I'd like to put a bit in RTE indicating ownership of IRQ, anything else you considered? 2: Similar with #1, checking IRQ vector (if (vector == IA64_TIMER_VECTOR || vector == IA64_IPI_VECTOR)) in following code is too hardcode. Today we only have 2 IRQs in hypervisor, but actually we need more such as platform management interrupt like Alex mentioned previously for hotplug, thermal sensor IRQ. So we don't want to see a long list of check here. My suggestion is to adopt similar mechanism with X86. I.e. like __do_IRQ_guest in arch/x86/irq.c, the detail implementation can be architecture dependant like x86 use desc-status IRQ_GUEST but we may not. Anyway, keep the capability that a machine IRQ may be bound to multiple guest like X86 did today is better and it is not so difficult. you may also be able to reuse some code there :-) xen_do_IRQ(ia64_vector vector) { - if (vector != IA64_TIMER_VECTOR vector != IA64_IPI_VECTOR) { - extern void vcpu_pend_interrupt(void *, int); + struct vcpu *vcpu; + ia64_vector v; + + /* Do not reflect special interrupts. */ + if (vector == IA64_TIMER_VECTOR || vector == IA64_IPI_VECTOR) + return 0; + Thx,eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] IOSAPIC virtualisation
Tristan Gingold wrote: Yes I have just copied from the original code. However, we should also take IPI into consideration (unless we go directly to event channel). Can you explain more on IPI stuff? I am not in the context. Anyway, keep the capability that a machine IRQ may be bound to multiple guest like X86 did today is better and it is not so difficult. you may also be able to reuse some code there :-) To be added on my TODO list, since we can't trigger such a case or test it now. Mmm, I would suggest we come out a full solution and hold this patch for a while. Your previous patch let hypervisor own IOAPIC, but it is still not Xen solution. Sharing physical IRQ by mulitple driver domain is a normal case for level triggered IRQ. To be more important, the X86 solution to handle physical IRQ is pretty clean and beautiful, why not resue the code? Keir, please correct me if I made mistake in understanding the X86 IOAPIC virtualization policy. Yes, we don't meet this situation now because driver domain is not there yet, but are not we implementing this patch to solve future driver domain issue? If this is supported, then other issues like how to indicating the ownership of IRQ disappear too. Thx,eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] [PATCH] Make VTIdomain boot again
Dan and all: This is an bug related to the task event channel callback/failsafe callback that we decided to do in Xen summit but not started yet. If the task is completed, then same mechanism with X86 should be implemented, and the soft-IRQ only happens when returning to guest like X86. (see arch/x86/x86_32/entry.s API: ret_from_intr, test_all_events only happens when returning to guest). I believe there are others issues but not found yet due to our very beginning's walkaround in IA64. That usually means saving time at beginning (shortcut) take us more in later :-( Eddie Xu, Anthony wrote: Hi All, Since the merge from xen-unstable, there is a small window between bvt_do_schedule and context_switch in function __enter_schedule, where interrupt is enabled. See below scenario: 1. VTI domain accesses legacy IO, VMM gets control, sets VTI-domain into blocked status and calls __enter_schedule to yield scheduler and wait QEMU in domain0 to handle IO request. 2. There is a timer interrupt in above window, and this timer interrupt triggers schedule timer, then in irq_exit function, VMM will do soft_irq, which in turn will invoke __enter_schedule, thus __enter_schedule is reentered in VMM, which is not correct. So the root cause is __enter_schedule is reentered. The correct way is, soft_irq should be done just before VMM returning to guest just like in native linux soft-irq is done just before linux returning to application. But in current implementation soft-irq is done in irq-exit function. The reason why xenU can boot is, xenU is always runable, so it will not be deleted from runqueue, though __enter_schedule is reentered, no issue appear, as for VTIdomain, when it does IO operation, it will be set into blocked status and be deleted from runqueue, which will crash the whole system. This patch is just a workaround, it makes sure in irq_exit only when VMM is not in nested interrupted situation, soft_irq is done. I strongly suggest soft-irq be done in the path of ia64_leave_kernel just like native linux kernel. Any comment? Thanks, -Anthony ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] Meeting Summary taken from Xen-ia64 Next StepsDiscussion during Xen Summit
Thanks for the summary and one more thing: We are all agreed the mechanism to send event channel should use callback/failsafe eventually instead of using a pseudo physical IRQ, as the latter one has some potential issues. Actually same issues happen in PPC side too. Thanks Dan, Thanks Ian and Keir for these great results. Eddie Yang, Fred wrote: Xen-ia64 Community members, Following is the agreement/summary taken from Xen/ia64 Next Steps Discussion session held during Xen Summit at January 17, 2006. The items in http://www.xensource.com/files/xs0106_ia64_nextsteps_disc.pdf are fully discussed The Work session attendees had agreed following actions in order to move Xen/ia64 to the next stage, and the efforts will be started immediately. 1. Physical Memory support for Domain0 * PPC port has the similar P2M issue as Xen-ia64 * Group agreed P2M is the route to take, the detail implementation can be between P2M VP approaches to change XenLinux as less as possible * To merge P2M into mainline code may cause Xen-ia64-unstable to be buggy or unstable for a period of time. Since this is a must feature to go, we should merge the code and get community to work together to get system stablized * Fujitsu has been looking into this effort and will contribute this effort * To Enable P2M for Domain0 is must for Xen-ia64 and should be done early to enable VBD/VNIF driver domain to come 2. Memory enhancement for page reference count * this can possibly cause stability issue and affecting domain destory * This item is a must for Xen-ia64 3. Virtual Interrupt Controller to let Xen owns physical IOSAPIC * This can help to address SMP guest for para-domains as well as a must for driver domain * a must item for xen-ia64 4. VTLB/VHPT SMP Support * To support next step SMP guest support, hash VTLB and same VHPT model should be adapted * A patch to extend hash VTLB/VHPT to hookup for para-virtualized Domain should be added Code should try to be built with option to able to pull original VHPT model back per future performance tunning needs * A must item for Xen/ia64 to get to SMP guest support 5. Reboot/Destroy Domains * A must item after page reference count is done 6. Hypercalls * Can be adapted per P2M and future needs 7. Timer Virtualization * This is defineitely worth to do to help the performance This summary lists major effort to overhaul overall Xen/ia64 infrastructure, we welcome developers to contribute into these efforts -Fred Yang ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] [PATCH] implemented vcpu_ptc_l()
Isaku Yamahata wrote: On Tue, Dec 06, 2005 at 01:31:03PM +0800, Dong, Eddie wrote: I disabled CONFIG_SMP manually because ski doesn't support smp. (And I enabled some configs related to ski.) para linux with CONFIG_SMP=n uses ptc.l. That makes sense. Should I enable CONFIG_SMP? If so, does following patch make sense? I think ptc doesn't need to purge TR, and also 1-entry TLB needs to be purged. I mean vcpu-arch.dtlb and vcpu-arch.itlb. Thanks,eddie ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel] [PATCH] implemented vcpu_ptc_l()
Yamahata san: I think the para linux does not use ptc, does it on ski? 2 comments: a: Any reason to purge TR? b: I believe there is a single SW TLB entry in non-VTI implementation, if you do that, you'd better to check this too. Eddie -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Isaku Yamahata Sent: 2005年12月6日 11:21 To: xen-ia64-devel@lists.xensource.com Subject: [Xen-ia64-devel] [PATCH] implemented vcpu_ptc_l() Hi. I implemented vcpu_ptc_l() which is needed to boot dom0 on ski. Is there any reason why it hasn't been implemented? I didn't see difficulties to implement it. Do I miss anything? Signed-off-by Isaku Yamahata [EMAIL PROTECTED] -- diff -r c4a86ad93e49 xen/arch/ia64/linux-xen/tlb.c --- a/xen/arch/ia64/linux-xen/tlb.c Thu Dec 1 18:21:59 2005 +0900 +++ b/xen/arch/ia64/linux-xen/tlb.c Tue Dec 6 12:13:48 2005 +0900 @@ -110,6 +110,15 @@ } void +ia64_local_tlb_purge (unsigned long start, unsigned long end, unsigned long nbits) +{ + do { + ia64_ptcl(start, (nbits 2)); + start += (1UL nbits); + } while (start end); +} + +void local_flush_tlb_all (void) { unsigned long i, j, flags, count0, count1, stride0, stride1, addr; diff -r c4a86ad93e49 xen/arch/ia64/xen/vcpu.c --- a/xen/arch/ia64/xen/vcpu.c Thu Dec 1 18:21:59 2005 +0900 +++ b/xen/arch/ia64/xen/vcpu.c Tue Dec 6 12:13:48 2005 +0900 @@ -1827,8 +1827,20 @@ IA64FAULT vcpu_ptc_l(VCPU *vcpu, UINT64 vadr, UINT64 addr_range) { - printk(vcpu_ptc_l: called, not implemented yet\n); - return IA64_ILLOP_FAULT; + extern void ia64_local_tlb_purge (unsigned long start, unsigned long end, unsigned long nbits); + + //XXX FIXME: validate not flushing Xen addresses + if (IS_VMM_ADDRESS(vadr)) { + return IA64_ILLOP_FAULT; + } + +#ifdef VHPT_GLOBAL + vhpt_flush_address(vadr, addr_range); +#endif + ia64_local_tlb_purge(vadr, vadr + addr_range, PAGE_SHIFT); + vcpu_purge_tr_entry(PSCBX(vcpu,dtlb)); + vcpu_purge_tr_entry(PSCBX(vcpu,itlb)); + return IA64_NO_FAULT; } // At privlvl=0, fc performs no access rights or protection key checks, while -- yamahata ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel