Re: [patch 4/6] Rename the parainstructions symbols to be consistent with the others

2007-04-03 Thread Rusty Russell
On Tue, 2007-04-03 at 18:06 -0700, Jeremy Fitzhardinge wrote:
> plain text document attachment (fix-parainstructions-name.patch)
> The other symbols used to delineate the alt-instructions sections have
> the form __foo/__foo_end.  Rename parainstructions to match.

OK, I guess this is an area where the kernel has its own standard.

(__start_<section> and __stop_<section> are the symbols automatically
inserted by ld if the section name is straight alpha-numeric.  It's
actually pretty cool for code where you want section boundaries without
writing a linker script).
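
A rough user-space sketch of the ld behaviour described above, assuming GCC
or Clang with GNU ld (or a compatible linker) on an ELF target. The section
name demo_sites, struct demo_site and the helper functions are made up for
this illustration and are not kernel names. Because the section name is a
valid C identifier, the linker provides the two boundary symbols itself:

```c
#include <stddef.h>

/* Hypothetical record type; stands in for something like struct paravirt_patch. */
struct demo_site {
	const char *name;
	int id;
};

/*
 * Place one record in the section "demo_sites".  The name has no leading
 * dot and is a valid C identifier, which is what makes ld provide the
 * __start_/__stop_ symbols for it.
 */
#define DEMO_SITE(n, i)						\
	static struct demo_site demo_site_##n			\
	__attribute__((used, section("demo_sites"))) = { #n, (i) }

DEMO_SITE(alpha, 1);
DEMO_SITE(beta, 2);
DEMO_SITE(gamma, 3);

/* Provided by the linker; no linker script is involved. */
extern struct demo_site __start_demo_sites[];
extern struct demo_site __stop_demo_sites[];

/* Number of records the linker collected into the section. */
static size_t demo_site_count(void)
{
	return (size_t)(__stop_demo_sites - __start_demo_sites);
}

/* Sum of all ids, walking the section like an ordinary array. */
static int demo_site_id_sum(void)
{
	const struct demo_site *s;
	int sum = 0;

	for (s = __start_demo_sites; s < __stop_demo_sites; s++)
		sum += s->id;
	return sum;
}
```

The kernel's .parainstructions section starts with a dot, so it does not get
these automatic symbols and the linker script has to define the boundaries
by hand, whichever naming convention is used.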

Cheers,
Rusty.


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[patch 3/6] Remove smp_alt_instructions

2007-04-03 Thread Jeremy Fitzhardinge
The .smp_altinstructions section and its corresponding symbols are
completely unused, so remove them.

Also, remove stray #ifdef __KERNEL__ in asm-i386/alternative.h

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>

---
 arch/i386/kernel/alternative.c |   38 ++
 arch/i386/kernel/vmlinux.lds.S |   11 ---
 include/asm-i386/alternative.h |6 +-
 3 files changed, 3 insertions(+), 52 deletions(-)

===
--- a/arch/i386/kernel/alternative.c
+++ b/arch/i386/kernel/alternative.c
@@ -132,10 +132,7 @@ static void nop_out(void *insns, unsigne
 }
 
 extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
-extern struct alt_instr __smp_alt_instructions[], __smp_alt_instructions_end[];
 extern u8 *__smp_locks[], *__smp_locks_end[];
-
-extern u8 __smp_alt_begin[], __smp_alt_end[];
 
 /* Replace instructions with better alternatives for this CPU type.
This runs before SMP is initialized to avoid SMP problems with
@@ -170,29 +167,6 @@ void apply_alternatives(struct alt_instr
 }
 
 #ifdef CONFIG_SMP
-
-static void alternatives_smp_save(struct alt_instr *start, struct alt_instr *end)
-{
-   struct alt_instr *a;
-
-   DPRINTK("%s: alt table %p-%p\n", __FUNCTION__, start, end);
-   for (a = start; a < end; a++) {
-   memcpy(a->replacement + a->replacementlen,
-  a->instr,
-  a->instrlen);
-   }
-}
-
-static void alternatives_smp_apply(struct alt_instr *start, struct alt_instr *end)
-{
-   struct alt_instr *a;
-
-   for (a = start; a < end; a++) {
-   memcpy(a->instr,
-  a->replacement + a->replacementlen,
-  a->instrlen);
-   }
-}
 
 static void alternatives_smp_lock(u8 **start, u8 **end, u8 *text, u8 *text_end)
 {
@@ -319,8 +293,6 @@ void alternatives_smp_switch(int smp)
printk(KERN_INFO "SMP alternatives: switching to SMP code\n");
clear_bit(X86_FEATURE_UP, boot_cpu_data.x86_capability);
clear_bit(X86_FEATURE_UP, cpu_data[0].x86_capability);
-   alternatives_smp_apply(__smp_alt_instructions,
-  __smp_alt_instructions_end);
list_for_each_entry(mod, &smp_alt_modules, next)
alternatives_smp_lock(mod->locks, mod->locks_end,
  mod->text, mod->text_end);
@@ -328,8 +300,6 @@ void alternatives_smp_switch(int smp)
printk(KERN_INFO "SMP alternatives: switching to UP code\n");
set_bit(X86_FEATURE_UP, boot_cpu_data.x86_capability);
set_bit(X86_FEATURE_UP, cpu_data[0].x86_capability);
-   apply_alternatives(__smp_alt_instructions,
-  __smp_alt_instructions_end);
list_for_each_entry(mod, &smp_alt_modules, next)
alternatives_smp_unlock(mod->locks, mod->locks_end,
mod->text, mod->text_end);
@@ -384,17 +354,13 @@ void __init alternative_instructions(voi
printk(KERN_INFO "SMP alternatives: switching to UP code\n");
set_bit(X86_FEATURE_UP, boot_cpu_data.x86_capability);
set_bit(X86_FEATURE_UP, cpu_data[0].x86_capability);
-   apply_alternatives(__smp_alt_instructions,
-  __smp_alt_instructions_end);
alternatives_smp_unlock(__smp_locks, __smp_locks_end,
_text, _etext);
}
free_init_pages("SMP alternatives",
-   __pa_symbol(&__smp_alt_begin),
-   __pa_symbol(&__smp_alt_end));
+   __pa_symbol(&__smp_locks),
+   __pa_symbol(&__smp_locks_end));
} else {
-   alternatives_smp_save(__smp_alt_instructions,
- __smp_alt_instructions_end);
alternatives_smp_module_add(NULL, "core kernel",
__smp_locks, __smp_locks_end,
_text, _etext);
===
--- a/arch/i386/kernel/vmlinux.lds.S
+++ b/arch/i386/kernel/vmlinux.lds.S
@@ -116,21 +116,10 @@ SECTIONS
 
   /* might get freed after init */
   . = ALIGN(4096);
-  .smp_altinstructions : AT(ADDR(.smp_altinstructions) - LOAD_OFFSET) {
-   __smp_alt_begin = .;
-   __smp_alt_instructions = .;
-   *(.smp_altinstructions)
-   __smp_alt_instructions_end = .;
-  }
-  . = ALIGN(4);
   .smp_locks : AT(ADDR(.smp_locks) - LOAD_OFFSET) {
__smp_locks = .;
*(.smp_locks)
__smp_lo

[patch 6/6] Allow boot-time disable of paravirt_ops patching

2007-04-03 Thread Jeremy Fitzhardinge
Add "noreplace-paravirt" to disable paravirt_ops patching.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>

---
 Documentation/kernel-parameters.txt |3 +++
 arch/i386/kernel/alternative.c  |   13 +
 2 files changed, 16 insertions(+)

===
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -64,6 +64,7 @@ parameter is applicable:
GENERIC_TIME The generic timeofday code is enabled.
NFS Appropriate NFS support is enabled.
OSS OSS sound support is enabled.
+   PV_OPS  A paravirtualized kernel
PARIDE  The ParIDE subsystem is enabled.
PARISC  The PA-RISC architecture is enabled.
PCI PCI bus support is enabled.
@@ -1134,6 +1135,8 @@ and is between 256 and 4096 characters. 
nomca   [IA-64] Disable machine check abort handling
 
nomce   [IA-32] Machine Check Exception
+
+   noreplace-paravirt  [IA-32,PV_OPS] Don't patch paravirt_ops
 
noreplace-smp   [IA-32,SMP] Don't replace SMP instructions
with UP alternatives
===
--- a/arch/i386/kernel/alternative.c
+++ b/arch/i386/kernel/alternative.c
@@ -30,6 +30,16 @@ static int __init setup_noreplace_smp(ch
 }
 __setup("noreplace-smp", setup_noreplace_smp);
 
+#ifdef CONFIG_PARAVIRT
+static int noreplace_paravirt = 0;
+
+static int __init setup_noreplace_paravirt(char *str)
+{
+   noreplace_paravirt = 1;
+   return 1;
+}
+__setup("noreplace-paravirt", setup_noreplace_paravirt);
+#endif
 
 #define DPRINTK(fmt, args...) if (debug_alternative) \
printk(KERN_DEBUG fmt, args)
@@ -329,6 +339,9 @@ void apply_paravirt(struct paravirt_patc
 {
struct paravirt_patch *p;
 
+   if (noreplace_paravirt)
+   return;
+
for (p = start; p < end; p++) {
unsigned int used;
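
For readers unfamiliar with the pattern in this patch: a handler registered
for a command-line option sets a flag, and the patching routine returns early
when the flag is set. The following is a user-space sketch of that idea only.
The option table, parse_cmdline() and apply_patches() are hypothetical
stand-ins rather than kernel APIs, and option matching is simplified to exact
string comparison.

```c
#include <string.h>

static int noreplace_paravirt;	/* mirrors the flag added by the patch */

static int setup_noreplace_paravirt(char *str)
{
	(void)str;		/* the option takes no argument */
	noreplace_paravirt = 1;
	return 1;
}

/* Hypothetical stand-in for the table that __setup() entries end up in. */
struct boot_option {
	const char *name;
	int (*handler)(char *str);
};

static const struct boot_option boot_options[] = {
	{ "noreplace-paravirt", setup_noreplace_paravirt },
};

/* Walk a space-separated command line and call every matching handler. */
static void parse_cmdline(const char *cmdline)
{
	char buf[256];
	char *tok;
	size_t i;

	strncpy(buf, cmdline, sizeof(buf) - 1);
	buf[sizeof(buf) - 1] = '\0';

	for (tok = strtok(buf, " "); tok; tok = strtok(NULL, " "))
		for (i = 0; i < sizeof(boot_options) / sizeof(boot_options[0]); i++)
			if (strcmp(tok, boot_options[i].name) == 0)
				boot_options[i].handler(tok + strlen(tok));
}

/* Stand-in for apply_paravirt(): returns how many sites were patched. */
static int apply_patches(int nsites)
{
	if (noreplace_paravirt)
		return 0;	/* patching disabled at boot */
	return nsites;
}
```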
 



[patch 2/6] Remove noreplacement option

2007-04-03 Thread Jeremy Fitzhardinge
noreplacement is dangerous on modern systems because it will not replace the
context switch FNSAVE with the SSE-aware FXSAVE. Other places in the kernel
still assume SSE and use FXSAVE, so FPU state saved in one format ends up being
accessed in the other, which causes corruption.

Easiest way to avoid this is to remove the option. It was mostly for paranoia
reasons anyways and alternative()s have been stable for some time.

Thanks to Jeremy F. for reporting and helping debug it.

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 Documentation/x86_64/boot-options.txt |4 
 arch/i386/kernel/alternative.c|   21 ++---
 2 files changed, 2 insertions(+), 23 deletions(-)

===
--- a/Documentation/x86_64/boot-options.txt
+++ b/Documentation/x86_64/boot-options.txt
@@ -305,7 +305,3 @@ Debugging
stuck (default)
 
 Miscellaneous
-
-  noreplacement  Don't replace instructions with more appropriate ones
-for the CPU. This may be useful on asymmetric MP systems
-where some CPUs have less capabilities than others.
===
--- a/arch/i386/kernel/alternative.c
+++ b/arch/i386/kernel/alternative.c
@@ -5,15 +5,9 @@
 #include 
 #include 
 
-static int no_replacement= 0;
 static int smp_alt_once  = 0;
 static int debug_alternative = 0;
 
-static int __init noreplacement_setup(char *s)
-{
-   no_replacement = 1;
-   return 1;
-}
 static int __init bootonly(char *str)
 {
smp_alt_once = 1;
@@ -25,7 +19,6 @@ static int __init debug_alt(char *str)
return 1;
 }
 
-__setup("noreplacement", noreplacement_setup);
 __setup("smp-alt-boot", bootonly);
 __setup("debug-alternative", debug_alt);
 
@@ -252,9 +245,6 @@ void alternatives_smp_module_add(struct 
struct smp_alt_module *smp;
unsigned long flags;
 
-   if (no_replacement)
-   return;
-
if (smp_alt_once) {
if (boot_cpu_has(X86_FEATURE_UP))
alternatives_smp_unlock(locks, locks_end,
@@ -289,7 +279,7 @@ void alternatives_smp_module_del(struct 
struct smp_alt_module *item;
unsigned long flags;
 
-   if (no_replacement || smp_alt_once)
+   if (smp_alt_once)
return;
 
spin_lock_irqsave(&smp_alt, flags);
@@ -320,7 +310,7 @@ void alternatives_smp_switch(int smp)
return;
 #endif
 
-   if (no_replacement || smp_alt_once)
+   if (smp_alt_once)
return;
BUG_ON(!smp && (num_online_cpus() > 1));
 
@@ -374,13 +364,6 @@ void __init alternative_instructions(voi
 void __init alternative_instructions(void)
 {
unsigned long flags;
-   if (no_replacement) {
-   printk(KERN_INFO "(SMP-)alternatives turned off\n");
-   free_init_pages("SMP alternatives",
-   __pa_symbol(&__smp_alt_begin),
-   __pa_symbol(&__smp_alt_end));
-   return;
-   }
 
local_irq_save(flags);
apply_alternatives(__alt_instructions, __alt_instructions_end);



[patch 1/6] Re-enable VDSO by default with PARAVIRT

2007-04-03 Thread Jeremy Fitzhardinge
Everyone wants VDSO to be enabled by default.  COMPAT_VDSO still needs
a fix, but with luck that will turn up soon.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

---
 arch/i386/kernel/sysenter.c |4 
 1 file changed, 4 deletions(-)

===
--- a/arch/i386/kernel/sysenter.c
+++ b/arch/i386/kernel/sysenter.c
@@ -27,11 +27,7 @@
  * Should the kernel map a VDSO page into processes and pass its
  * address down to glibc upon exec()?
  */
-#ifdef CONFIG_PARAVIRT
-unsigned int __read_mostly vdso_enabled = 0;
-#else
 unsigned int __read_mostly vdso_enabled = 1;
-#endif
 
 EXPORT_SYMBOL_GPL(vdso_enabled);
 



[patch 4/6] Rename the parainstructions symbols to be consistent with the others

2007-04-03 Thread Jeremy Fitzhardinge
The other symbols used to delineate the alt-instructions sections have
the form __foo/__foo_end.  Rename parainstructions to match.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>

---
 arch/i386/kernel/alternative.c |6 +++---
 arch/i386/kernel/vmi.c |6 +++---
 arch/i386/kernel/vmlinux.lds.S |4 ++--
 include/asm-i386/alternative.h |4 ++--
 4 files changed, 10 insertions(+), 10 deletions(-)

===
--- a/arch/i386/kernel/alternative.c
+++ b/arch/i386/kernel/alternative.c
@@ -327,8 +327,8 @@ void apply_paravirt(struct paravirt_patc
/* Sync to be conservative, in case we patched following instructions */
sync_core();
 }
-extern struct paravirt_patch __start_parainstructions[],
-   __stop_parainstructions[];
+extern struct paravirt_patch __parainstructions[],
+   __parainstructions_end[];
 #endif /* CONFIG_PARAVIRT */
 
 void __init alternative_instructions(void)
@@ -367,6 +367,6 @@ void __init alternative_instructions(voi
alternatives_smp_switch(0);
}
 #endif
-   apply_paravirt(__start_parainstructions, __stop_parainstructions);
+   apply_paravirt(__parainstructions, __parainstructions_end);
local_irq_restore(flags);
 }
===
--- a/arch/i386/kernel/vmi.c
+++ b/arch/i386/kernel/vmi.c
@@ -72,8 +72,8 @@ static struct {
 } vmi_ops;
 
 /* XXX move this to alternative.h */
-extern struct paravirt_patch __start_parainstructions[],
-   __stop_parainstructions[];
+extern struct paravirt_patch __parainstructions[],
+   __parainstructions_end[];
 
 /*
  * VMI patching routines.
@@ -917,7 +917,7 @@ static inline int __init activate_vmi(vo
 * to do this before IRQs get reenabled.  Fortunately, it is
 * idempotent.
 */
-   apply_paravirt(__start_parainstructions, __stop_parainstructions);
+   apply_paravirt(__parainstructions, __parainstructions_end);
 
vmi_bringup();
 
===
--- a/arch/i386/kernel/vmlinux.lds.S
+++ b/arch/i386/kernel/vmlinux.lds.S
@@ -166,9 +166,9 @@ SECTIONS
   }
   . = ALIGN(4);
   .parainstructions : AT(ADDR(.parainstructions) - LOAD_OFFSET) {
-   __start_parainstructions = .;
+   __parainstructions = .;
*(.parainstructions)
-   __stop_parainstructions = .;
+   __parainstructions_end = .;
   }
   /* .exit.text is discard at runtime, not link time, to deal with references
  from .altinstructions and .eh_frame */
===
--- a/include/asm-i386/alternative.h
+++ b/include/asm-i386/alternative.h
@@ -121,8 +121,8 @@ static inline void
 static inline void
 apply_paravirt(struct paravirt_patch *start, struct paravirt_patch *end)
 {}
-#define __start_parainstructions NULL
-#define __stop_parainstructions NULL
+#define __parainstructions NULL
+#define __parainstructions_end NULL
 #endif
 
 #endif /* _I386_ALTERNATIVE_H */



[patch 5/6] Allow boot-time disable of SMP altinstructions

2007-04-03 Thread Jeremy Fitzhardinge
Add "noreplace-smp" to disable SMP instruction replacement.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
 
---
 Documentation/kernel-parameters.txt |6 ++
 arch/i386/kernel/alternative.c  |   23 +++
 2 files changed, 25 insertions(+), 4 deletions(-)

===
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1134,6 +1134,9 @@ and is between 256 and 4096 characters. 
nomca   [IA-64] Disable machine check abort handling
 
nomce   [IA-32] Machine Check Exception
+
+   noreplace-smp   [IA-32,SMP] Don't replace SMP instructions
+   with UP alternatives
 
noresidual  [PPC] Don't use residual data on PReP machines.
 
@@ -1540,6 +1543,9 @@ and is between 256 and 4096 characters. 
smart2= [HW]
Format: [,[,...,]]
 
+   smp-alt-once[IA-32,SMP] On a hotplug CPU system, only
+   attempt to substitute SMP alternatives once at boot.
+
snd-ad1816a=[HW,ALSA]
 
snd-ad1848= [HW,ALSA]
===
--- a/arch/i386/kernel/alternative.c
+++ b/arch/i386/kernel/alternative.c
@@ -5,6 +5,7 @@
 #include 
 #include 
 
+static int noreplace_smp = 0;
 static int smp_alt_once  = 0;
 static int debug_alternative = 0;
 
@@ -13,14 +14,22 @@ static int __init bootonly(char *str)
smp_alt_once = 1;
return 1;
 }
+__setup("smp-alt-boot", bootonly);
+
 static int __init debug_alt(char *str)
 {
debug_alternative = 1;
return 1;
 }
-
-__setup("smp-alt-boot", bootonly);
 __setup("debug-alternative", debug_alt);
+
+static int __init setup_noreplace_smp(char *str)
+{
+   noreplace_smp = 1;
+   return 1;
+}
+__setup("noreplace-smp", setup_noreplace_smp);
+
 
 #define DPRINTK(fmt, args...) if (debug_alternative) \
printk(KERN_DEBUG fmt, args)
@@ -185,6 +194,9 @@ static void alternatives_smp_unlock(u8 *
 {
u8 **ptr;
 
+   if (noreplace_smp)
+   return;
+
for (ptr = start; ptr < end; ptr++) {
if (*ptr < text)
continue;
@@ -218,6 +230,9 @@ void alternatives_smp_module_add(struct 
 {
struct smp_alt_module *smp;
unsigned long flags;
+
+   if (noreplace_smp)
+   return;
 
if (smp_alt_once) {
if (boot_cpu_has(X86_FEATURE_UP))
@@ -253,7 +268,7 @@ void alternatives_smp_module_del(struct 
struct smp_alt_module *item;
unsigned long flags;
 
-   if (smp_alt_once)
+   if (smp_alt_once || noreplace_smp)
return;
 
spin_lock_irqsave(&smp_alt, flags);
@@ -284,7 +299,7 @@ void alternatives_smp_switch(int smp)
return;
 #endif
 
-   if (smp_alt_once)
+   if (noreplace_smp || smp_alt_once)
return;
BUG_ON(!smp && (num_online_cpus() > 1));
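
A user-space sketch of what the guarded function does, for illustration only.
It assumes that the .smp_locks table holds pointers to lock prefix bytes in
the kernel text, and that "unlocking" overwrites the prefix (0xf0) with a
one-byte NOP (0x90). The diff shows only the table walk, so the byte values
written here are an assumption about the elided loop body. The names
smp_unlock() and smp_lock() are hypothetical.

```c
#include <stdint.h>

static int noreplace_smp;	/* set by "noreplace-smp" in the patch */

/* Replace each recorded lock prefix inside [text, text_end] with a NOP. */
static void smp_unlock(uint8_t **start, uint8_t **end,
		       uint8_t *text, uint8_t *text_end)
{
	uint8_t **ptr;

	if (noreplace_smp)
		return;		/* leave the SMP-safe code in place */

	for (ptr = start; ptr < end; ptr++) {
		if (*ptr < text || *ptr > text_end)
			continue;	/* entry belongs to another text range */
		**ptr = 0x90;		/* one-byte NOP */
	}
}

/* Put the lock prefixes back when switching from UP to SMP code. */
static void smp_lock(uint8_t **start, uint8_t **end,
		     uint8_t *text, uint8_t *text_end)
{
	uint8_t **ptr;

	for (ptr = start; ptr < end; ptr++) {
		if (*ptr < text || *ptr > text_end)
			continue;
		**ptr = 0xf0;		/* lock prefix */
	}
}
```

With noreplace_smp set, the table walk is skipped entirely, so the kernel
keeps running the locked instructions even on a uniprocessor machine.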
 



[patch 0/6] Various cleanups

2007-04-03 Thread Jeremy Fitzhardinge
Hi Andi,

Here's a little batch of cleanups:
 - re-enable VDSO when PARAVIRT is enabled
 - make the parainstructions symbols match the
   other altinstructions naming convention
 - add kernel command-line options to disable altinstructions for
   smp and pv_ops

Oh, and I'm mailing your noreplacement patch back at you, for no
particularly good reason.

J



Re: A set of "standard" virtual devices?

2007-04-03 Thread H. Peter Anvin
Arnd Bergmann wrote:
> On Wednesday 04 April 2007, H. Peter Anvin wrote:
>> Note that at least for PIO-based devices, there is nothing that says you 
>> can't implement PCI over another transport, if you wish.  It's really 
>> just a very simple RPC protocol.
> 
> The PIO aspect of PCI is simple, yes, except on architectures that don't
> have the concept of PIO or even uncached memory, but even that can
> be done by defining readl/writel/inl/outl/... as hcalls.
> 
> The tricky part about PCI is the device probing: everything about config
> space accesses, interrupt swizzling, bus/device/function numbers and
> base address registers becomes a pointless exercise when the other side
> is just faking it.

Configuration space access is platform-dependent.  It's only defined to 
work in a specific way on x86 platforms.

"Interrupt swizzling" is really totally independent of PCI.  ALL PCI 
really provides is up to four interrupts per device (not counting 
MSI/MSI-X) and an 8-bit writable field which the platform can choose to 
use to hold interrupt information.  That's all.  The rest is all 
platform information.

PCI enumeration is hardly complex.  Most of the stuff that doesn't apply 
to you you can generally ignore, as is done by other busses like 
HyperTransport when they emulate PCI.

That being said, on platforms which are PCI-centric, such as x86, this 
of course makes it a lot easier to produce virtual devices which work 
across hypervisors, since the device model of *any* operating system is 
set up to handle them.

-hpa


Re: A set of "standard" virtual devices?

2007-04-03 Thread Arnd Bergmann
On Wednesday 04 April 2007, H. Peter Anvin wrote:
> Note that at least for PIO-based devices, there is nothing that says you 
> can't implement PCI over another transport, if you wish.  It's really 
> just a very simple RPC protocol.

The PIO aspect of PCI is simple, yes, except on architectures that don't
have the concept of PIO or even uncached memory, but even that can
be done by defining readl/writel/inl/outl/... as hcalls.
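
To make the "accessors as hcalls" idea concrete, here is a minimal sketch. It
assumes a single hypothetical hypercall, hcall_pio(), that carries a small
request structure to the host, and it fakes the host side with an in-process
register array. None of these names come from an existing hypervisor
interface.

```c
#include <stdint.h>

enum pio_op { PIO_READ, PIO_WRITE };

/* One port access, packaged as a request to the host. */
struct pio_request {
	enum pio_op op;
	uint16_t port;
	uint32_t value;		/* input for writes, output for reads */
};

/* Fake host side: a tiny register file indexed by port number. */
#define FAKE_PORTS 16
static uint32_t fake_regs[FAKE_PORTS];

/* Hypothetical hypercall; a real one would trap into the hypervisor. */
static int hcall_pio(struct pio_request *req)
{
	if (req->port >= FAKE_PORTS)
		return -1;
	if (req->op == PIO_WRITE)
		fake_regs[req->port] = req->value;
	else
		req->value = fake_regs[req->port];
	return 0;
}

/* Guest-side accessors with the usual outl(value, port) argument order. */
static void virt_outl(uint32_t value, uint16_t port)
{
	struct pio_request req = { PIO_WRITE, port, value };

	(void)hcall_pio(&req);
}

static uint32_t virt_inl(uint16_t port)
{
	struct pio_request req = { PIO_READ, port, 0 };

	/* Reads of a nonexistent port return all ones, as on real buses. */
	if (hcall_pio(&req) != 0)
		return 0xffffffffu;
	return req.value;
}
```

Seen this way, each access is a small request/response exchange, which is
the RPC view of PIO mentioned in the quoted text.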

The tricky part about PCI is the device probing: everything about config
space accesses, interrupt swizzling, bus/device/function numbers and
base address registers becomes a pointless exercise when the other side
is just faking it.

> DMA is trickier, as it makes the data appear into the address space of 
> the guest in a way that is both device- and host-dependent (in the 
> presence of PCI domains, IOMMU etc.)  There may be reason to avoid DMA 
> for that reason.

Right, PCI DMA and virtualization don't mix. DMA in general is fine though,
as long as your devices (real or virtual) see the guest physical addresses
as a contiguous 64 bit range and have well-defined semantics about what
addresses are accessed in what way.
When you think of file read/write syscalls as DMA into user space, it's
a very clean concept. Async I/O somewhat less so, but still pretty good.

Arnd <><



Re: A set of "standard" virtual devices?

2007-04-03 Thread H. Peter Anvin
Arnd Bergmann wrote:
> 
> One interesting aspect of the PS3 hypervisor is that some of the
> low-speed interfaces are implemented as a virtual UART, meaning
> something that only has read and write operations and uses an
> interrupt for flow control. The implementation in 
> drivers/ps3/vuart.c is probably more complex than what we want
> as a generic transport mechanism, but simply having a bidirectional
> data stream sounds like an ideal abstraction for the "simple"
> case. Some more or less obvious users of this include:
> 
> - console
> - additional tty
> - random
> - slow network (using ppp)
> - printer
> - watchdog
> - hid (e.g. mouse)
> - system management (like ps3)
> - fast network (in combination with
>   shared memory segment)
> 
> The transport can be hypervisor specific, e.g. there could be
> a virtual PCI serial port on kvm, an hcall interface on the ps3
> and a virtual CTC on s390 (kidding), while all of them can have
> the same kind of hardware _behind_ the serial connection.
> 

Note that at least for PIO-based devices, there is nothing that says you 
can't implement PCI over another transport, if you wish.  It's really 
just a very simple RPC protocol.

DMA is trickier, as it makes the data appear into the address space of 
the guest in a way that is both device- and host-dependent (in the 
presence of PCI domains, IOMMU etc.)  There may be reason to avoid DMA 
for that reason.

-hpa


Re: A set of "standard" virtual devices?

2007-04-03 Thread Arnd Bergmann
On Tuesday 03 April 2007, Jeremy Fitzhardinge wrote:
> That said, something like USB is probably the best bet for this kind of
> low-performance device.  I think.  Not that I really know anything about
> USB.

USB has the disadvantage that it is more complex than PCI and requires
significantly more code to simulate on the host side.

On the plus side, I think it should be possible to implement a virtual
USB host on s390, which is not possible with PCI, but that again takes
a lot of work to implement.

One interesting aspect of the PS3 hypervisor is that some of the
low-speed interfaces are implemented as a virtual UART, meaning
something that only has read and write operations and uses an
interrupt for flow control. The implementation in 
drivers/ps3/vuart.c is probably more complex than what we want
as a generic transport mechanism, but simply having a bidirectional
data stream sounds like an ideal abstraction for the "simple"
case. Some more or less obvious users of this include:

- console
- additional tty
- random
- slow network (using ppp)
- printer
- watchdog
- hid (e.g. mouse)
- system management (like ps3)
- fast network (in combination with
  shared memory segment)

The transport can be hypervisor specific, e.g. there could be
a virtual PCI serial port on kvm, an hcall interface on the ps3
and a virtual CTC on s390 (kidding), while all of them can have
the same kind of hardware _behind_ the serial connection.
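
A minimal sketch of that split, assuming a hypothetical struct vstream_ops
that each hypervisor-specific transport would fill in. The loopback backend
below exists only so the example is self-contained; the interrupt-driven flow
control is left out, and none of these names come from drivers/ps3/vuart.c.

```c
#include <stddef.h>
#include <string.h>

struct vstream;

/* The only operations a transport has to provide. */
struct vstream_ops {
	long (*read)(struct vstream *vs, void *buf, size_t len);
	long (*write)(struct vstream *vs, const void *buf, size_t len);
};

struct vstream {
	const struct vstream_ops *ops;
	void *priv;		/* transport-private state */
};

/* Loopback "transport": whatever is written can be read back. */
struct loop_state {
	unsigned char data[64];
	size_t len;
};

static long loop_write(struct vstream *vs, const void *buf, size_t len)
{
	struct loop_state *st = vs->priv;
	size_t room = sizeof(st->data) - st->len;

	if (len > room)
		len = room;	/* short write when the buffer is full */
	memcpy(st->data + st->len, buf, len);
	st->len += len;
	return (long)len;
}

static long loop_read(struct vstream *vs, void *buf, size_t len)
{
	struct loop_state *st = vs->priv;

	if (len > st->len)
		len = st->len;
	memcpy(buf, st->data, len);
	memmove(st->data, st->data + len, st->len - len);
	st->len -= len;
	return (long)len;
}

static const struct vstream_ops loop_ops = {
	.read = loop_read,
	.write = loop_write,
};

/* Transport-agnostic user: a console-style write of a C string. */
static long vconsole_puts(struct vstream *vs, const char *s)
{
	return vs->ops->write(vs, s, strlen(s));
}
```

A PCI serial port, an hcall interface or any other transport would supply its
own vstream_ops, while users such as vconsole_puts() stay unchanged.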

Arnd <><



Re: A set of "standard" virtual devices?

2007-04-03 Thread H. Peter Anvin
Jeremy Fitzhardinge wrote:
> 
> So, what you're saying is:
> 
>1. assuming there's going to be a vast number of miscellaneous devices
>2. it would be best if there were one per device rather than one per
>   hypervisor per device
>3. so we'd have one linux device driver
> 
> But this implies that the work is just pushed off into all the
> hypervisors to support this new device over the generic interface;
> there's no overall reduction of code or complexity, other than making
> "wc" on the kernel source smaller.
> 

Sure there is, assuming you deal with heterogeneous clients.  I'm not 
sure Xen is (although that is, as far as I understand, being remedied), 
which might explain your different perspective.

Consider that this may not even be about Linux -- having these standard 
devices would enable, say, 'doze device drivers to be written and shared.

> That said, something like USB is probably the best bet for this kind of
> low-performance device.  I think.  Not that I really know anything about
> USB.

USB is evil in the extreme for this kind of stuff.  Although in theory 
you can have any HCI you want, in practice the ones that are implemented 
require a very complex framework for full compatibility.

-hpa


Re: A set of "standard" virtual devices?

2007-04-03 Thread Jeremy Fitzhardinge
H. Peter Anvin wrote:
> However, there are other things; console is one, or my original
> example, which was random number generation.  For those, the benefit
> of unification is proportionally greater, simply because the win of
> anything hypervisor-specific is much smaller. 

So, what you're saying is:

   1. assuming there's going to be a vast number of miscellaneous devices
   2. it would be best if there were one per device rather than one per
  hypervisor per device
   3. so we'd have one linux device driver

But this implies that the work is just pushed off into all the
hypervisors to support this new device over the generic interface;
there's no overall reduction of code or complexity, other than making
"wc" on the kernel source smaller.

That said, something like USB is probably the best bet for this kind of
low-performance device.  I think.  Not that I really know anything about
USB.

J


Re: A set of "standard" virtual devices?

2007-04-03 Thread Arnd Bergmann
On Tuesday 03 April 2007, Jeremy Fitzhardinge wrote:
> > Doing a SCSI driver has been tried before, with ibmvscsi. Not good.
> >   
> 
> OK, interesting.  People had proposed using SCSI as the interface, but I
> wasn't aware of any results from doing that.  How is it not good?
> 

SCSI is really overengineered for something as simple as a block interface.
A large part of the SCSI stack deals only with error handling, which
you don't want to burden the guests with at all, since most error conditions
can be handled fine by the host.
Another big aspect of SCSI is device enumeration and probing. Doing it
the SCSI way is particularly pointless. It's much simpler to have one
device with its own I/O interface at the hcall layer, and one interrupt
number for the block device, instead of faking the full hca/bus/dev/lun
hierarchy.

Arnd <><



Re: A set of "standard" virtual devices?

2007-04-03 Thread H. Peter Anvin
Jeremy Fitzhardinge wrote:
> 
> Yes, and that's the core of the Xen netfront.  But is there really much
> code which can be shared between different hypervisors?  When you get
> down to it, all the real code is hypervisor-specific stuff for setting
> up ringbuffers and dealing with interrupts.  Like all the other network
> drivers.
> 

One thing, Jeremy, which I think is a bit misleading here: you're 
focusing on big, performance-critical stuff.  Those things are going to 
be the ones where there is the most to win by implementing them in 
hypervisor-specific ways.  Although we can offer models for some 
hypervisors (and G-d knows there are enough implementations out there of 
virtual disk which are almost identical), they're clearly not going to 
be universal.

However, there are other things; console is one, or my original 
example, which was random number generation.  For those, the benefit of 
unification is proportionally greater, simply because the win of 
anything hypervisor-specific is much smaller.

-hpa


Re: A set of "standard" virtual devices?

2007-04-03 Thread Jeremy Fitzhardinge
Arnd Bergmann wrote:
> We already have device drivers for physical devices that can be attached
> to different buses. The EHCI USB is an example of a driver that can 
> be for instance PCI, OF or an on-chip device. Moreover, you can have an
> abstracted device behind it that does not need to know about the transport,
> like the SCSI disk driver does not care if it is talking to an ATA, 
> parallel SCSI or SAS chip, or even which controller that is.
>   

Yes, that kind of layering is useful when there's enough of an
abstraction gap to fit the layers into.  USB is particularly simple in
that way, since it can be made to travel nicely over any number of
transports.

> console is also the least problematic interface, you can do it over
> practically anything.
>   

Sure.  But it's interesting that there are savings to be had.

> Doing a SCSI driver has been tried before, with ibmvscsi. Not good.
>   

OK, interesting.  People had proposed using SCSI as the interface, but I
wasn't aware of any results from doing that.  How is it not good?

> The interesting question about block devices is how to handle concurrency
> and interrupt mitigation. An efficient interface should
>
> - have asynchronous notification, not sleep until the transfer is complete
> - allow multiple blocks to be in flight simultaneously, so the host can
>   reorder the requests if it is smart enough
> - give only a single interrupt when multiple transfers have completed
>   

Yes.  The Xen block interface is already pretty efficient in these respects.
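
The three properties in the quoted list can be sketched with a toy request
queue. This is purely an illustration, not the Xen block interface: struct
blk_queue and the three functions are hypothetical, and the "host" and
"guest" halves run in the same process.

```c
#include <stdint.h>

#define MAX_INFLIGHT 8

struct blk_request {
	uint64_t sector;
	int in_use;	/* slot holds a submitted request */
	int done;	/* host has finished the transfer */
};

struct blk_queue {
	struct blk_request req[MAX_INFLIGHT];
	unsigned int interrupts;	/* notifications sent to the guest */
};

/* Guest: queue a request and return at once; -1 means the queue is full. */
static int blk_submit(struct blk_queue *q, uint64_t sector)
{
	int i;

	for (i = 0; i < MAX_INFLIGHT; i++) {
		if (!q->req[i].in_use) {
			q->req[i].sector = sector;
			q->req[i].in_use = 1;
			q->req[i].done = 0;
			return i;
		}
	}
	return -1;
}

/*
 * Host: finish every pending request (walking backwards, to show that it
 * may reorder them) and raise a single interrupt for the whole batch.
 */
static int blk_host_complete_all(struct blk_queue *q)
{
	int i, n = 0;

	for (i = MAX_INFLIGHT - 1; i >= 0; i--) {
		if (q->req[i].in_use && !q->req[i].done) {
			q->req[i].done = 1;
			n++;
		}
	}
	if (n > 0)
		q->interrupts++;
	return n;
}

/* Guest interrupt handler: reap all completed requests in one pass. */
static int blk_reap(struct blk_queue *q)
{
	int i, n = 0;

	for (i = 0; i < MAX_INFLIGHT; i++) {
		if (q->req[i].in_use && q->req[i].done) {
			q->req[i].in_use = 0;
			q->req[i].done = 0;
			n++;
		}
	}
	return n;
}
```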

>> I'm not sure what similar common code could be extracted for network
>> devices.  I haven't looked into it all that closely.
>> 
>
> One way to do networking would be to simply provide a shared memory area
> that everyone can write to, then use a ring buffer and atomic operations
> to synchronize between the guests, and a method to send interrupts to the
> others for flow control.
>   

Yes, and that's the core of the Xen netfront.  But is there really much
code which can be shared between different hypervisors?  When you get
down to it, all the real code is hypervisor-specific stuff for setting
up ringbuffers and dealing with interrupts.  Like all the other network
drivers.
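
For reference, the hypervisor-independent part of such a scheme is small. The
sketch below is a single-producer, single-consumer ring using C11 atomics,
assuming the structure lives in memory shared by both sides. It is not the
Xen ring layout; struct ring, ring_put() and ring_get() are names invented
for this example, and the interrupt used for flow control is left out because
that is exactly the hypervisor-specific part.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8u	/* must be a power of two */

struct ring {
	_Atomic uint32_t head;	/* only the producer writes this */
	_Atomic uint32_t tail;	/* only the consumer writes this */
	uint32_t slot[RING_SIZE];
};

/* Producer: returns 1 on success, 0 if the ring is full. */
static int ring_put(struct ring *r, uint32_t v)
{
	uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail == RING_SIZE)
		return 0;
	r->slot[head % RING_SIZE] = v;
	/* Publish the slot contents before the new head becomes visible. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return 1;
}

/* Consumer: returns 1 and stores the value in *v, or 0 if the ring is empty. */
static int ring_get(struct ring *r, uint32_t *v)
{
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (head == tail)
		return 0;
	*v = r->slot[tail % RING_SIZE];
	/* Release the slot back to the producer. */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return 1;
}
```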

J


Re: A set of "standard" virtual devices?

2007-04-03 Thread Arnd Bergmann
On Tuesday 03 April 2007, Jeremy Fitzhardinge wrote:
> Arnd Bergmann wrote:
> > I think we need to separate two problems here:
> >
> > 1. Probing:
> > That's really what triggered the discussion, PCI probing is well-understood
> > and implemented on _most_ platforms, so there is some value in reusing it.
> > When you talk about 'very simple probing', I'm not sure what the most simple
> > approach could be. 
> 
> Is probing an interesting problem to consider on its own?  If there's
> some hypervisor-agnostic device driver in Linux, then obviously it needs
> some way to find the corresponding (virtual) hardware for it to talk
> to.  But that probing mechanism will depend on the actual interface
> structure, and is just one of the many problems that need to be solved. 
> There's no point in overloading PCI to probe for the device unless
> you're actually using PCI to talk to the device.

We already have device drivers for physical devices that can be attached
to different buses. The EHCI USB is an example of a driver that can 
be for instance PCI, OF or an on-chip device. Moreover, you can have an
abstracted device behind it that does not need to know about the transport,
like the SCSI disk driver does not care if it is talking to an ATA, 
parallel SCSI or SAS chip, or even which controller that is.

> Let me say up front that I'm skeptical that we can come up with a single
> bus-like abstraction which can be both a simple and efficient interface
> to all the virtual architectures.  I think a more fruitful path is to
> find what pieces of functionality can be made common, with the aim of
> having small, simple and self-contained hypervisor-specific backends.
> 
> I think this needs to be considered on a class by class basis.  This
> thread started with a discussion about entropy sources.  In theory you
> could implement it as simply as exposing a mmaped ringbuffer.  There are
> some extra complexities deriving from the security requirements though;
> for example, all the entropy needs to be kept strictly private to the
> domain that consumes it.
> 
> But beyond that, there are 3 other important classes of device:
> 
> * console
> * disk
> * networking
> 
> (There are obviously more, but these are the must-have.)
> 
> Console already provides us with a model to work on, in the form of
> hvc-console.  The hvc-console code itself has the bulk of the common
> console code, along with a set of very small hypervisor-specific
> backends. The Xen console implementation shrunk considerably when we
> switched to using it.

Console is also the least problematic interface; you can do it over
practically anything.
 
> If we could do the same thing with disk and net, I would be very happy.
> 
> For example, if we wanted to change the Xen frontend/backend disk
> interface, we could use SCSI as the basic protocol, and then convert
> blkfront into a relatively simple SCSI driver.  There would still be a
> Xen-specific piece, but it should be fairly small and have a clean
> interface.  Though the existing interface is a pretty simple
> shove-this-block-there affair.

Doing a SCSI driver has been tried before, with ibmvscsi. Not good.
The interesting question about block devices is how to handle concurrency
and interrupt mitigation. An efficient interface should

- have asynchronous notification, not sleep until the transfer is complete
- allow multiple blocks to be in flight simultaneously, so the host can
  reorder the requests if it is smart enough
- give only a single interrupt when multiple transfers have completed

Minor optimizations could be:
- give an interrupt early when some transfers are complete
- allow I/O barriers to be inserted in the stream
- allow marking blocks as more or less important (readahead vs. read)
- provide passthrough of SG_IO or similar for optical media
  (e.g. DVD writer)
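
The three core requirements above can be sketched in a few lines of
user-space C. This is only an illustration under stated assumptions, not
code from any existing driver: the names blk_queue, blk_submit,
blk_complete and blk_reap are hypothetical, and the "interrupt" is just a
flag and a counter. The guest submits requests without waiting, the host
may finish them in any order, and one notification covers every request
that has completed by the time the guest looks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BLK_SLOTS 8

enum blk_state { BLK_FREE, BLK_IN_FLIGHT, BLK_DONE };

struct blk_req {
	unsigned long sector;
	enum blk_state state;
};

struct blk_queue {
	struct blk_req slot[BLK_SLOTS];
	bool irq_pending;	/* one pending interrupt covers many completions */
	unsigned irq_count;	/* how often the host had to raise it */
};

/* Guest side: queue a request and return its tag at once, or -1 if full. */
static int blk_submit(struct blk_queue *q, unsigned long sector)
{
	assert(q != NULL);
	for (int i = 0; i < BLK_SLOTS; i++) {
		if (q->slot[i].state == BLK_FREE) {
			q->slot[i].sector = sector;
			q->slot[i].state = BLK_IN_FLIGHT;
			return i;
		}
	}
	return -1;
}

/* Host side: complete any in-flight request, in whatever order it likes. */
static void blk_complete(struct blk_queue *q, int tag)
{
	if (tag < 0 || tag >= BLK_SLOTS || q->slot[tag].state != BLK_IN_FLIGHT)
		return;
	q->slot[tag].state = BLK_DONE;
	if (!q->irq_pending) {	/* raise the interrupt only if none is pending */
		q->irq_pending = true;
		q->irq_count++;
	}
}

/* Guest interrupt handler: reap every finished request in one pass. */
static int blk_reap(struct blk_queue *q)
{
	int reaped = 0;

	q->irq_pending = false;
	for (int i = 0; i < BLK_SLOTS; i++) {
		if (q->slot[i].state == BLK_DONE) {
			q->slot[i].state = BLK_FREE;
			reaped++;
		}
	}
	return reaped;
}
```

Submitting three requests, completing two of them out of order and then
calling blk_reap() once raises a single interrupt and returns both
completions, while the third request stays in flight.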

> I'm not sure what similar common code could be extracted for network
> devices.  I haven't looked into it all that closely.

One way to do networking would be to simply provide a shared memory area
that everyone can write to, then use a ring buffer and atomic operations
to synchronize between the guests, and a method to send interrupts to the
others for flow control.

Arnd <><


Re: A set of "standard" virtual devices?

2007-04-03 Thread Jeremy Fitzhardinge
Arnd Bergmann wrote:
> I think we need to separate two problems here:
>
> 1. Probing:
> That's really what triggered the discussion, PCI probing is well-understood
> and implemented on _most_ platforms, so there is some value in reusing it.
> When you talk about 'very simple probing', I'm not sure what the most simple
> approach could be. 

Is probing an interesting problem to consider on its own?  If there's
some hypervisor-agnostic device driver in Linux, then obviously it needs
some way to find the corresponding (virtual) hardware for it to talk
to.  But that probing mechanism will depend on the actual interface
structure, and is just one of the many problems that need to be solved. 
There's no point in overloading PCI to probe for the device unless
you're actually using PCI to talk to the device.

> Ideas that have been implemented before include:
> a) have a limited set of device IDs (e.g. 65535 devices, or a hierarchic
>    tree), and try to access each one of them in order to find out if it's
>    there. We do that for PCI or CCW, for instance.
> b) Have an iterator in the hypervisor (or firmware), to return a handle to
>    the first, next or child of a device. We do that for open firmware.
> c) ask the hypervisor for an unused device of a given class, which needs to
>    be returned to the hypervisor when no longer used. This is how the PS3
>    hypervisor works, but it does not play well with the Linux driver model.
>   

Xen has xenbus, which is essentially a filesystem-like namespace which
can be walked to find the devices being exposed to a guest.  It is
fairly similar to OFW's device tree.

> 2. Device access:
> When talking to a virtual device, you want to have at least a way to give
> commands to it and a way to get interrupts back. Again, multiple ideas
> have been used in the past, and we should choose a subset:
>   

Let me say up front that I'm skeptical that we can come up with a single
bus-like abstraction which can be both a simple and efficient interface
to all the virtual architectures.  I think a more fruitful path is to
find what pieces of functionality can be made common, with the aim of
having small, simple and self-contained hypervisor-specific backends.

I think this needs to be considered on a class by class basis.  This
thread started with a discussion about entropy sources.  In theory you
could implement it as simply as exposing a mmaped ringbuffer.  There are
some extra complexities deriving from the security requirements though;
for example, all the entropy needs to be kept strictly private to the
domain that consumes it.

But beyond that, there are 3 other important classes of device:

* console
* disk
* networking

(There are obviously more, but these are the must-have.)

Console already provides us with a model to work on, in the form of
hvc-console.  The hvc-console code itself has the bulk of the common
console code, along with a set of very small hypervisor-specific
backends. The Xen console implementation shrunk considerably when we
switched to using it.
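
The split described here, shared console logic on top of a tiny
per-hypervisor backend, might look roughly like the following user-space
sketch. It is loosely inspired by the put_chars style of backend
operation but is not the hvc-console API: con_backend_ops, con_write and
the in-memory backend are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The small hypervisor-specific part: just move bytes. */
struct con_backend_ops {
	int (*put_chars)(void *priv, const char *buf, int count);
};

/* Common code shared by every backend: expand "\n" to "\r\n". */
static int con_write(const struct con_backend_ops *ops, void *priv,
		     const char *s)
{
	int total = 0;

	assert(ops != NULL && ops->put_chars != NULL);
	for (; *s; s++) {
		if (*s == '\n')
			total += ops->put_chars(priv, "\r", 1);
		total += ops->put_chars(priv, s, 1);
	}
	return total;
}

/* Example backend: append to a memory buffer in place of a hypercall. */
struct mem_con {
	char buf[64];
	size_t used;
};

static int mem_put_chars(void *priv, const char *buf, int count)
{
	struct mem_con *c = priv;
	int n = 0;

	while (n < count && c->used < sizeof(c->buf) - 1)
		c->buf[c->used++] = buf[n++];
	c->buf[c->used] = '\0';
	return n;
}

static const struct con_backend_ops mem_con_ops = {
	.put_chars = mem_put_chars,
};
```

A second hypervisor would only need its own put_chars function; the
newline handling in con_write() stays shared.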

If we could do the same thing with disk and net, I would be very happy.

For example, if we wanted to change the Xen frontend/backend disk
interface, we could use SCSI as the basic protocol, and then convert
blkfront into a relatively simple SCSI driver.  There would still be a
Xen-specific piece, but it should be fairly small and have a clean
interface.  Though the existing interface is a pretty simple
shove-this-block-there affair.

I'm not sure what similar common code could be extracted for network
devices.  I haven't looked into it all that closely.

J


Re: A set of "standard" virtual devices?

2007-04-03 Thread Arnd Bergmann
On Tuesday 03 April 2007, Cornelia Huck wrote:
> On s390, it would be more than strangeness. There's no implementation
> of PCI at all, someone would have to cook it up - and it wouldn't have
> any use beyond those special devices. Since there isn't any bus type
> that is available on *all* architectures, a generic "virtual" bus with
> very simple probing seems much saner...

I think we need to separate two problems here:

1. Probing:
That's really what triggered the discussion, PCI probing is well-understood
and implemented on _most_ platforms, so there is some value in reusing it.
When you talk about 'very simple probing', I'm not sure what the most simple
approach could be. Ideas that have been implemented before include:
a) have a limited set of device IDs (e.g. 65535 devices, or a hierarchic tree),
   and try to access each one of them in order to find out if it's there. We
   do that for PCI or CCW, for instance.
b) Have an iterator in the hypervisor (or firmware), to return a handle to
   the first, next or child of a device. We do that for open firmware.
c) ask the hypervisor for an unused device of a given class, which needs to
   be returned to the hypervisor when no longer used. This is how the PS3
   hypervisor works, but it does not play well with the Linux driver model.
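
Option b) can be illustrated with a small user-space sketch. The
hypervisor side is faked with a static table, and hv_first_device,
hv_next_device and probe_class are hypothetical names, not calls offered
by any real hypervisor or firmware. The point is only that the guest
never guesses at device IDs: it walks whatever the iterator hands back.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct hv_node {
	const char *class_name;
	int handle;
};

/* Stand-in for the device list a hypervisor or firmware would hold. */
static const struct hv_node hv_table[] = {
	{ "console", 1 }, { "disk", 2 }, { "net", 3 }, { "disk", 4 },
};
#define HV_COUNT (sizeof(hv_table) / sizeof(hv_table[0]))

/* Hypothetical iterator calls; NULL means there are no more devices. */
static const struct hv_node *hv_first_device(void)
{
	return &hv_table[0];
}

static const struct hv_node *hv_next_device(const struct hv_node *cur)
{
	size_t next;

	assert(cur != NULL);
	next = (size_t)(cur - hv_table) + 1;
	return next < HV_COUNT ? &hv_table[next] : NULL;
}

/* Guest side: walk the list and collect the handles of one class. */
static size_t probe_class(const char *class_name, int *handles, size_t max)
{
	size_t found = 0;
	const struct hv_node *n;

	for (n = hv_first_device(); n; n = hv_next_device(n)) {
		if (strcmp(n->class_name, class_name) == 0 && found < max)
			handles[found++] = n->handle;
	}
	return found;
}
```

Option a) would replace the iterator with a loop over the whole ID space,
poking each ID to see whether something answers.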

2. Device access:
When talking to a virtual device, you want to have at least a way to give
commands to it and a way to get interrupts back. Again, multiple ideas
have been used in the past, and we should choose a subset:
a) PCI-like: mmio using memory and/or I/O space BAR setup, interrupt
   numbers and DMA to guest physical addresses.
b) Channel-like: use an hcall to give commands to the hypervisor, passing
   down a device handle command code and data areas in guest physical space.
   Interrupts return the device handle or an OS-defined per-device value.
c) Minimalistic: Every device is mapped into the guest address space and
   can potentially be remapped into user space. The device memory can be
   shared between guests and/or with the host if that uses the same driver.
   The guest is able to signal the receiving end using an hcall and gets
   interrupts like in b)
d) UNIX-like: devices appear like file descriptors, the guest can do
   operations like read/write/sync/mmap, potentially ioctl on them to talk
   to the host.
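
To make option b) a little more concrete, here is a user-space sketch of
a channel-like call: the guest names a device by handle and passes a
command code plus a data area. Everything in it is an assumption made for
illustration. chan_hcall stands in for a real hypercall, and the two
in-memory devices stand in for whatever the hypervisor would back them
with.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum chan_cmd { CHAN_READ, CHAN_WRITE };

struct chan_dev {
	char data[16];
	size_t len;
};

#define CHAN_DEVS 2
static struct chan_dev chan_devs[CHAN_DEVS];	/* "hypervisor" state */

/* Hypothetical hypercall: returns bytes transferred, or -1 on error. */
static long chan_hcall(unsigned handle, enum chan_cmd cmd,
		       void *buf, size_t size)
{
	struct chan_dev *d;

	if (handle >= CHAN_DEVS)
		return -1;
	assert(buf != NULL);
	d = &chan_devs[handle];

	switch (cmd) {
	case CHAN_WRITE:
		if (size > sizeof(d->data))
			size = sizeof(d->data);
		memcpy(d->data, buf, size);
		d->len = size;
		return (long)size;
	case CHAN_READ:
		if (size > d->len)
			size = d->len;
		memcpy(buf, d->data, size);
		return (long)size;
	}
	return -1;
}
```

The guest never sees registers or BARs here, only a handle, which is what
distinguishes this style from the PCI-like option a).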

Arnd <><


Re: A set of "standard" virtual devices?

2007-04-03 Thread Cornelia Huck
On Tue, 3 Apr 2007 16:03:14 +0200,
Arnd Bergmann <[EMAIL PROTECTED]> wrote:

> > > struct virt_dev {
> > >   struct device dev;
> > >   struct virt_driver *driver;
> > >   struct virt_bus *bus;
> > >   struct pci_device_id id;
> > >   int irq;
> > > };
> > 
> > And that's where I have problems :) The notion of "irq" is far too
> > platform specific. I can bend my mind round using PCI-like ids for
> > > non-PCI virtualized devices, but an integer is far too small and too
> > > specific for a way to access the device.
> 
> Sorry, I've been working too long on the lesser architectures.
> IRQ numbers are evil indeed.
> However, I'm pretty sure that we need _some_ abstraction of an
> interrupt mechanism here. The easiest way is probably to have a
> callback function like
>   int (*irq_handler)(struct virt_dev*, unsigned long message);
> in the virt_dev.

Yes, something like
int (*handler) (struct virt_dev *, struct virt_interrupt_info *);
should cover the needed cases.
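
A user-space sketch of how such a callback could be wired up follows. The
thread does not define struct virt_interrupt_info, so its two fields are
pure assumptions, as are virt_deliver and demo_handler; struct virt_dev
is cut down to what the example needs.

```c
#include <assert.h>
#include <stddef.h>

struct virt_dev;

/* Hypothetical contents; the thread leaves this structure undefined. */
struct virt_interrupt_info {
	unsigned long message;	/* platform-defined payload */
	unsigned int source;	/* e.g. a subchannel or event number */
};

struct virt_dev {
	int (*handler)(struct virt_dev *, struct virt_interrupt_info *);
	unsigned long last_message;
};

/* Bus code: hand a platform event to the driver without any IRQ number. */
static int virt_deliver(struct virt_dev *dev, unsigned long message,
			unsigned int source)
{
	struct virt_interrupt_info info = {
		.message = message,
		.source = source,
	};

	assert(dev != NULL);
	if (!dev->handler)
		return -1;
	return dev->handler(dev, &info);
}

/* Driver code: record the payload of the most recent event. */
static int demo_handler(struct virt_dev *dev,
			struct virt_interrupt_info *info)
{
	dev->last_message = info->message;
	return 0;
}
```

Each bus implementation decides how to fill in the info structure, so the
driver does not depend on how the platform numbers its interrupts.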


Re: A set of "standard" virtual devices?

2007-04-03 Thread Adrian Bunk
On Tue, Apr 03, 2007 at 11:26:52AM +0200, Andi Kleen wrote:
> > 
> > On s390, it would be more than strangeness. There's no implementation
> > of PCI at all, someone would have to cook it up - and it wouldn't have
> > any use beyond those special devices. Since there isn't any bus type
> > that is available on *all* architectures, a generic "virtual" bus with
> > very simple probing seems much saner...
> 
> You just have to change all the distribution installers then. 
> Ok I suppose on s390 that's not that big an issue because there are not
> that many for s390. But for x86 there exist quite a lot. I suppose
> it's easier to change it in the kernel.

I don't get this point.

Compared to whatever will be done in the kernel, any change to a 
distribution installer should be trivial.

And a new release of a distribution with a new kernel might anyway 
usually require some updates to an installer.

> -Andi

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Re: A set of "standard" virtual devices?

2007-04-03 Thread Arnd Bergmann
On Tuesday 03 April 2007, Cornelia Huck wrote:
> On Tue, 3 Apr 2007 14:15:37 +0200, Arnd Bergmann <[EMAIL PROTECTED]> wrote:
> 
> That's OK for a virtualized architecture where the base architecture
> already supports PCI. But a traditional s390 OS would be as unhappy
> with a PCI device as with a device of a completely new type :)

Sure, that was my point from the start.

> There are several options for virtualized devices (and I don't know why
> they shouldn't coexist):
> 
> 1. Emulate a well-known device (like an e1000 network card on PCI or a
> model 3390 dasd on CCW). Existing operating systems can just use them,
> but it's a lot of work in the hypervisor.

Most hypervisors already do this, and it's an unrelated topic. 
What we're trying to achieve is to make sure not every hypervisor
and simulator has to introduce its own set of drivers.


> > struct virt_bus {
> > /* platform dependent */
> > long (*transfer)(struct virt_dev *dev, void *buffer,
> > unsigned long size, int type);
> > };
> 
> Should this embed a struct bus_type? Or reference a generic_virt_bus?

yes, that should embed the bus_type.

> > struct virt_dev {
> > struct device dev;
> > struct virt_driver *driver;
> > struct virt_bus *bus;
> > struct pci_device_id id;
> > int irq;
> > };
> 
> And that's where I have problems :) The notion of "irq" is far too
> platform specific. I can bend my mind round using PCI-like ids for
> non-PCI virtualized devices, but an integer is far too small and too
> specific for a way to access the device.

Sorry, I've been working too long on the lesser architectures.
IRQ numbers are evil indeed.
However, I'm pretty sure that we need _some_ abstraction of an
interrupt mechanism here. The easiest way is probably to have a
callback function like
int (*irq_handler)(struct virt_dev*, unsigned long message);
in the virt_dev.

Arnd <><


Re: A set of "standard" virtual devices?

2007-04-03 Thread Cornelia Huck
On Tue, 3 Apr 2007 14:15:37 +0200,
Arnd Bergmann <[EMAIL PROTECTED]> wrote:

> Right, but an interesting point is the question of what to do when running
> another operating system as a guest under Linux, e.g. with kvm.
> 
> Ideally, you'd want to use the same interface to announce the presence
> of the device, which can be done far more easily with PCI than using
> a new bus type that you'd need to implement for every OS, instead of
> just implementing the virtual PCI driver.

That's OK for a virtualized architecture where the base architecture
already supports PCI. But a traditional s390 OS would be as unhappy
with a PCI device as with a device of a completely new type :)

There are several options for virtualized devices (and I don't know why
they shouldn't coexist):

1. Emulate a well-known device (like an e1000 network card on PCI or a
model 3390 dasd on CCW). Existing operating systems can just use them,
but it's a lot of work in the hypervisor.

2. Create a virtual PCI device (or a virtual CCW device) with a new id.
Operating systems would need to write a new device driver, but they can
use a familiar infrastructure. That seems to be what most people are
talking about here.

3. Create a new bus which uses a new access method. This new method can
be made very simple, but requires support from the guest operating
system. That's what I was talking about :)

[Note: I'm not actually advocating an emulated ccw driver. There be
dragons.]

> Using a 16 bit number to identify a specific interface sounds like
> a good idea to me, if only for the reason that it is a widely used
> approach. The alternative would be to use an ascii string, like we
> have for open-firmware devices on powerpc or sparc.

OK, we could use common identifiers (and reserve it) for case 2 across
several busses. Like

#define PCI_VIRT_ID GENERIC_VIRT_ID
#define CCW_VIRT_DEVTYPE GENERIC_VIRT_ID

> I think in either way, we need to abstract the driver for the virtual
> device from the underlying bus infrastructure, which is hypervisor
> and/or platform dependent.

Yes, that sounds sane for case 3. We should just standardize the
interface.

> The abstraction could work roughly like this:
> 
> 
> ==
> virt_dev.h
> ==
> struct virt_driver { /* platform independent */
>   struct device_driver drv;
>   struct pci_device_id *ids; /* not necessarily PCI */
> };
> struct virt_bus {
>   /* platform dependent */
>   long (*transfer)(struct virt_dev *dev, void *buffer,
>   unsigned long size, int type);
> };

Should this embed a struct bus_type? Or reference a generic_virt_bus?

> struct virt_dev {
>   struct device dev;
>   struct virt_driver *driver;
>   struct virt_bus *bus;
>   struct pci_device_id id;
>   int irq;
> };

And that's where I have problems :) The notion of "irq" is far too
platform specific. I can bend my mind round using PCI-like ids for
non-PCI virtualized devices, but an integer is far too small and too
specific for a way to access the device.


Re: A set of "standard" virtual devices?

2007-04-03 Thread Arnd Bergmann
On Tuesday 03 April 2007, Cornelia Huck wrote:
> > 
> > I think that's true outside of s390, but a standardized virtual device
> > interface should be able to work there as well. Interestingly, the
> > s390 channel I/O also uses two 16 bit numbers to identify a device
> > (type and model), just like PCI or USB, so in that light, we might
> > be able to use the same number space for something entirely different
> > depending on the virtual bus.
> 
> Even if we used those ids for cu_type and dev_type, it would still be
> ugly IMO. It would be much cleaner to just define a very simple, easy
> to implement virtual bus without dragging implementation details for
> other types of devices around.

Right, but an interesting point is the question of what to do when running
another operating system as a guest under Linux, e.g. with kvm.

Ideally, you'd want to use the same interface to announce the presence
of the device, which can be done far more easily with PCI than using
a new bus type that you'd need to implement for every OS, instead of
just implementing the virtual PCI driver.

Using a 16 bit number to identify a specific interface sounds like
a good idea to me, if only for the reason that it is a widely used
approach. The alternative would be to use an ascii string, like we
have for open-firmware devices on powerpc or sparc.

I think in either way, we need to abstract the driver for the virtual
device from the underlying bus infrastructure, which is hypervisor
and/or platform dependent. The abstraction could work roughly like this:


==
virt_dev.h
==
struct virt_dev;

struct virt_driver { /* platform independent */
	struct device_driver drv;
	struct pci_device_id *ids; /* not necessarily PCI */
};
struct virt_bus {
	/* platform dependent */
	long (*transfer)(struct virt_dev *dev, void *buffer,
			 unsigned long size, int type);
};
struct virt_dev {
	struct device dev;
	struct virt_driver *driver;
	struct virt_bus *bus;
	struct pci_device_id id;
	int irq;
};
==
virt_example.c
==
static ssize_t virt_pipe_read(struct file *filp, char __user *buffer,
			      size_t len, loff_t *off)
{
	struct virt_dev *dev = filp->private_data;
	ssize_t ret = dev->bus->transfer(dev, buffer, len, READ);

	if (ret > 0)	/* only advance the offset on success */
		*off += ret;
	return ret;
}
static struct file_operations virt_pipe_fops = {
	.open = nonseekable_open,
	.read = virt_pipe_read,
};
static int virt_pipe_probe(struct device *dev)
{
	struct virt_dev *vdev = to_virt_dev(dev);
	struct miscdev *mdev = kmalloc(sizeof(*mdev), GFP_KERNEL);

	if (!mdev)
		return -ENOMEM;
	mdev->name = "virt_pipe";
	mdev->fops = &virt_pipe_fops;
	mdev->parent = dev;
	return register_miscdev(mdev);
}
static struct pci_device_id virt_pipe_id = {
	.vendor = PCI_VENDOR_LINUX, .device = 0x3456,
};
MODULE_DEVICE_TABLE(pci, virt_pipe_id);
static struct virt_driver virt_pipe_driver = {
	.drv = {
		.name = "virt_pipe",
		.probe = virt_pipe_probe,
	},
	.ids = &virt_pipe_id,
};
static int virt_pipe_init(void)
{
	return virt_driver_register(&virt_pipe_driver);
}
module_init(virt_pipe_init);
==
virt_devtree.c
==
static long virt_devtree_transfer(struct virt_dev *dev, void *buffer,
				  unsigned long size, int type)
{
	long ret;

	/* the hypervisor handle for this device lives in platform_data */
	switch (type) {
	case READ:
		ret = hcall(HV_READ, dev->dev.platform_data, buffer, size);
		break;
	case WRITE:
		ret = hcall(HV_WRITE, dev->dev.platform_data, buffer, size);
		break;
	default:
		BUG();
	}
	return ret;
}
static struct virt_bus virt_devtree_bus = {
	.transfer = virt_devtree_transfer,
};
static int virt_devtree_probe(struct of_device *ofdev,
			      struct of_device_id *match)
{
	struct virt_dev *vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);

	if (!vdev)
		return -ENOMEM;
	vdev->bus = &virt_devtree_bus;
	vdev->dev.parent = &ofdev->dev;
	vdev->id.vendor = PCI_VENDOR_LINUX;
	vdev->id.device = *of_get_property(ofdev, "virt_dev_id");
	vdev->irq = of_irq_parse_and_map(ofdev, 0);
	return device_register(&vdev->dev);
}
struct of_device_id virt_devtree_ids = {
	.compatible = "virt-dev",
};
static struct of_platform_driver virt_devtree_driver = {
	.probe = virt_devtree_probe,
	.match_table = &virt_devtree_ids,
};
==
virt_pci.c
==
static long virt_pci_transfer(struct virt_dev *dev, void *buffer,
			      unsigned long size, int type)
{
	struct virt_pci_regs __iomem *regs = dev->dev.platform_data;

	switch (type) {
	case READ:
		mmio_insb(regs->read_port, buffer, size);
		break;
	case WRITE:
		mmio_outsb(regs->write_port, buffer, size);
		break;
	defaul

Re: A set of "standard" virtual devices?

2007-04-03 Thread Cornelia Huck
On Tue, 3 Apr 2007 11:26:52 +0200,
Andi Kleen <[EMAIL PROTECTED]> wrote:

> > 
> > On s390, it would be more than strangeness. There's no implementation
> > of PCI at all, someone would have to cook it up - and it wouldn't have
> > any use beyond those special devices. Since there isn't any bus type
> > that is available on *all* architectures, a generic "virtual" bus with
> > very simple probing seems much saner...
> 
> You just have to change all the distribution installers then. 
> Ok I suppose on s390 that's not that big an issue because there are not
> that many for s390. But for x86 there exist quite a lot. I suppose
> it's easier to change it in the kernel.

Huh? I don't follow you here. Why should this be easier for s390 vs.
x86? (And since there seems to be a trend to use HAL as a device
discovery tool recently: A new bus type is easy enough to add there.)

And I really think we should have a clean design in the kernel instead
of trying to wedge virtual devices into a known system. Exposing
virtual devices (which may be handled totally differently) as PCI
devices just seems hackish to me.


Re: A set of "standard" virtual devices?

2007-04-03 Thread Cornelia Huck
On Tue, 3 Apr 2007 11:41:49 +0200,
Arnd Bergmann <[EMAIL PROTECTED]> wrote:

> On Tuesday 03 April 2007, H. Peter Anvin wrote:
> > However, one probably wants to think about what the heck one actually 
> > means with "virtualization" in the absence of a lot of this stuff.  PCI 
> > is probably the closest thing we have to a lowest common denominator for 
> > device detection.
> 
> I think that's true outside of s390, but a standardized virtual device
> interface should be able to work there as well. Interestingly, the
> s390 channel I/O also uses two 16 bit numbers to identify a device
> (type and model), just like PCI or USB, so in that light, we might
> be able to use the same number space for something entirely different
> depending on the virtual bus.

Even if we used those ids for cu_type and dev_type, it would still be
ugly IMO. It would be much cleaner to just define a very simple, easy
to implement virtual bus without dragging implementation details for
other types of devices around.



Re: A set of "standard" virtual devices?

2007-04-03 Thread Arnd Bergmann
On Tuesday 03 April 2007, H. Peter Anvin wrote:
> However, one probably wants to think about what the heck one actually 
> means with "virtualization" in the absence of a lot of this stuff.  PCI 
> is probably the closest thing we have to a lowest common denominator for 
> device detection.

I think that's true outside of s390, but a standardized virtual device
interface should be able to work there as well. Interestingly, the
s390 channel I/O also uses two 16 bit numbers to identify a device
(type and model), just like PCI or USB, so in that light, we might
be able to use the same number space for something entirely different
depending on the virtual bus.
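
A minimal sketch of what sharing such a number space could mean for
driver matching follows. The names virt_id, virt_match and VIRT_ANY_MODEL
are hypothetical, and so is the wildcard convention; the point is only
that one match routine over a pair of 16 bit numbers can serve several
buses, each of which interprets the pair in its own way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two 16 bit numbers; each bus decides what they mean. */
struct virt_id {
	uint16_t type;
	uint16_t model;
};

#define VIRT_ANY_MODEL 0xffff	/* hypothetical wildcard value */

/* Return the first table entry matching dev, or NULL if none does. */
static const struct virt_id *virt_match(const struct virt_id *table,
					size_t n, struct virt_id dev)
{
	assert(table != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (table[i].type != dev.type)
			continue;
		if (table[i].model == VIRT_ANY_MODEL ||
		    table[i].model == dev.model)
			return &table[i];
	}
	return NULL;
}
```

A driver would export a table of such pairs, and each virtual bus would
call the same routine with the pair it discovered for a device.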

Arnd <><



Re: A set of "standard" virtual devices?

2007-04-03 Thread Andi Kleen
> 
> On s390, it would be more than strangeness. There's no implementation
> of PCI at all, someone would have to cook it up - and it wouldn't have
> any use beyond those special devices. Since there isn't any bus type
> that is available on *all* architectures, a generic "virtual" bus with
> very simple probing seems much saner...

You just have to change all the distribution installers then. 
Ok I suppose on s390 that's not that big an issue because there are not
that many for s390. But for x86 there exist quite a lot. I suppose
it's easier to change it in the kernel.

-Andi


Re: A set of "standard" virtual devices?

2007-04-03 Thread Cornelia Huck
On Tue, 3 Apr 2007 10:30:36 +0200,
Andi Kleen <[EMAIL PROTECTED]> wrote:

> On Tuesday 03 April 2007 10:29:06 Christian Borntraeger wrote:
> > On Monday 02 April 2007 23:12, Andi Kleen wrote:
> > > 
> > > > How would that work in the case where virtualized guests don't have a
> > > > visible PCI bus, and the virtual environment doesn't pretend to emulate
> > > > a PCI bus?
> > > 
> > > > If they emulated one with the appropriate device 
> > > then distribution driver auto probing would just work transparently for
> > > them. 
> > 
> > > Still, that would only make sense for virtualized platforms that usually
> > > have a PCI bus. Thinking about seeing a PCI device on, let's say, s390 is
> > > strange.
> 
> If it gets the job done surely you can tolerate a little strangeness?

On s390, it would be more than strangeness. There's no implementation
of PCI at all, someone would have to cook it up - and it wouldn't have
any use beyond those special devices. Since there isn't any bus type
that is available on *all* architectures, a generic "virtual" bus with
very simple probing seems much saner...


Re: A set of "standard" virtual devices?

2007-04-03 Thread Andi Kleen
On Tuesday 03 April 2007 10:29:06 Christian Borntraeger wrote:
> On Monday 02 April 2007 23:12, Andi Kleen wrote:
> > 
> > > How would that work in the case where virtualized guests don't have a
> > > visible PCI bus, and the virtual environment doesn't pretend to emulate
> > > a PCI bus?
> > 
> > > If they emulated one with the appropriate device 
> > then distribution driver auto probing would just work transparently for
> > them. 
> 
> Still, that would only make sense for virtualized platforms that usually have 
> > a PCI bus. Thinking about seeing a PCI device on, let's say, s390 is strange.

If it gets the job done surely you can tolerate a little strangeness?

-Andi



Re: A set of "standard" virtual devices?

2007-04-03 Thread Christian Borntraeger
On Monday 02 April 2007 23:12, Andi Kleen wrote:
> 
> > How would that work in the case where virtualized guests don't have a
> > visible PCI bus, and the virtual environment doesn't pretend to emulate
> > a PCI bus?
> 
> > If they emulated one with the appropriate device 
> then distribution driver auto probing would just work transparently for
> them. 

Still, that would only make sense for virtualized platforms that usually have 
a PCI bus. Thinking about seeing a PCI device on, let's say, s390 is strange.

Christian