Re: [Qemu-devel] [PATCH] Guest mouse cursor drawing in SDL

2007-03-20 Thread Anthony Liguori

andrzej zaborowski wrote:

 (the pixel format of the cursor was the same as the pixel format of
DisplayState).


I'm not sure if we want to always use the same pixel format - for
example with VMware SVGA and SDL in 16 bit mode, the cursor pixel
format reported by guest Xorg was 8 bpp. This would mean two
conversions instead of one: first VMware SVGA would have to convert 8
- 16 bpp then SDL 16 - 1 bpp.


VNC expects the cursor data to be in the current pixel format.  The nice 
thing about this approach is you don't need to pass a bunch of pixel 
info (for instance, you don't need to pass the depth, endianness, etc.).


The Cirrus hardware cursor also has to be copied anyway and doing the 
conversion is really straight forward.



One question in my mind is what the alpha mask should look like.  All
that VNC (and SDL) can use is a 1-bit alpha depth.  That's all Cirrus
supports too.  VMware SVGA supports an 8-bit alpha channel though so it
may make sense to design the interface now to support that.


VMware SVGA without CAP_ALPHA_CURSOR also does only 1-bit alpha. Maybe
we should keep two masks - if the frontend (e.g SDL) supports only
1-bit alpha it would use only this basic mask - if the emulated VGA
supports only 1-bit alpha it would leave the other mask opaque.


I thought about that.  Right now with my patch, the cursor mask is 
always 1-bit deep.  I was thinking that perhaps the right thing to do is 
to make it always 8-bits deep and then convert in VNC.


Do you think using the SDL cursor is all that useful?  As soon as gtk 
widgets get involved, the cursor becomes ARGB so in practice, I'm not 
sure that it's all that helpful.


BTW, I've got the ALPHA_CURSOR working with VNC.. it's very sweet :-)

Regards,

Anthony Liguori


Regards,
Andrzej


___
Qemu-devel mailing list
Qemu-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/qemu-devel





___
Qemu-devel mailing list
Qemu-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/qemu-devel


Re: [Qemu-devel] qemu-cvs-2007-3-22 compile error (in cygwin)

2007-03-22 Thread Anthony Liguori

[EMAIL PROTECTED] wrote:

I have exactly the same error, using MinGW.

I commented out the faulty line in vl.c, and compiled again. Now, I have
following error :

C:/msys/home/Denis/qemu/hw/pckbd.c:376: undefined reference to `vmmouse_init'
make[1]: *** [qemu.exe] Error 1
make[1]: Leaving directory `/home/Denis/qemu/i386-softmmu'
make: *** [subdir-i386-softmmu] Error 2
  


You probably need to rerun configure.  I do not see this problem using a 
mingw cross compiler.


Regards,

Anthony Liguori



Selon Christian MICHON [EMAIL PROTECTED]:

  

use mingw to compile qemu for windows hosts, not cygwin.
if I recall well, SDL does not cope well on cygwin.
--
Christian


___
Qemu-devel mailing list
Qemu-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/qemu-devel







___
Qemu-devel mailing list
Qemu-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/qemu-devel

  




___
Qemu-devel mailing list
Qemu-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/qemu-devel


Re: [Qemu-devel] [RFC/experimental patch] qemu (x86_64 on x86_64 -no-kqemu) compiles with gcc4 and works

2007-03-24 Thread Anthony Liguori

Axel Zeuner wrote:

Hi,
  


Hi Axel,

By adding some GCC4 fixes on top of your patch, I was able to get qemu 
for i386 (on i386) to compile and run.  So far, I've only tested a win2k 
guest.


The big problem (which pbrook helped me with) was GCC4 freaking out over 
some stq's.  Splitting up the 64bit ops into 32bit ops seemed to address 
most of the problems.


The tricky thing I still can't figure out is how to get ASM_SOFTMMU 
working.  The problem is GLUE(st, SUFFIX) function.  First GCC cannot 
deal with the register pressure.  The problem I can't seem to fix though 
is that GCC sticks %1 in %esi because we're only using an r 
constraint, not a q constraint.  This results in the generation of 
%sib which is an invalid register.  However, refactoring the code to not 
require a q constraint doesn't seem to help either.


The attached patch is what I have so far.  Some help with people more 
familiar with gcc asm foo would be appreciated!


Regards,

Anthony Liguori

there were a lot of discussions about compiling qemu with gcc4 or higher. The 
summary of the discussions were, as I understood, that compiling qemu with 
gcc4 requires changing the code generation engine of the most of the 
supported targets. These changes require a lot of work and time.


How about splitting the current static code generation process further? 
Today gcc produces object code and dyngen adapts it for the purposes of qemu, 
i.e produces the generation function, patches in parameters ..:
gcc -c op.o op.c ;dyngen -o op.h ... op.o . 
The op_XXX functions generated by gcc may not contain more than one exit

and this exit must be at the end, no not intended jumps to external
functions may occur.

It is possible to split the transformation into the following steps: 
Generate assembly output from the C-Sources: gcc -S -o op-0.s op.c.
Convert the assembly output: cvtasm op.s op-0.s. 
Assemble the converted assembler sources: as -o op.o op.s. 
Use dyngen as before: dyngen -o op.h ... op.o. 
Nothing will change if cvtasm copies only the input to the output, i.e. this 
additional pass will not break existing code.


A full featured converter (cvtasm) has a lot of dependencies: it has to 
support all hosts (M) (with all assembler dialects M') and all targets N, 
i.e. in the worst case one would end with M'x N variants of it, or M x N if 
one supports only one assembler dialect per host.  It is clear, that the 
number of variants is one of the biggest disadvantages of such an approach.


Now I will focus on x86_64 host and x86_64-softmmu target.
cvtasm has to do the following tasks in this case:
0) convert repXXX; ret to ret only. (Not done yet, x86_64 only, but does not 
harm).
1) append to all functions, where the last instruction is not a return a ret 
instruction.
2) add a label to all functions with more than one return before the last  
return.
3) replace all returns not at the end of a function with an unconditional jump 
to the generated end label. Avoid touching op_exit_tb here.
4) check all jump instructions if they contain jumps to external labels,  
replace jumps to external labels with calls to the labels.


The task 0-2 are easy, task 3 may, task 4 is definitely target/host dependent, 
because there exist intentionally some jumps to external labels, i.e. outside 
of the function, for instance op_goto_tb. 
Please correct me, if I am wrong or something is not mentioned above. 

The attached cvtasm.c allows compiling op.c/op.s/op.o without any disabled 
optimisations in Makefile.target (patches for Makefile and Makefile.target are 
attached). The program itself definitely needs a rewrite, is not failsafe and 
produces to much output on stdout. 

The macro OP_GOTO_TB from exec-all.h in the general case contains two nice 
variables and label definitions to force a reference from a variable into the 
op_goto_tbXXX functions. Unfortunately gcc4 detects that these variables and 
lables are unused and suppresses their generation, as result dyngen does not 
generate two lines in op.h:
case INDEX_op_goto_tb0: 
	...

label_offsets[0] = 8 + (gen_code_ptr - gen_code_buf); // --
...
case INDEX_op_goto_tb1: 
	...
	label_offsets[1] = 8 + (gen_code_ptr - gen_code_buf); // -- 
	...

and qemu produces a SIGSEGV on the first jump from one buffer to the next.
I was not able to force gcc4 to generate the two variables, therefore I had to 
replace the general macro with a host dependent one for x86_64 similar to x86 
but using the indirect branch method. 
After the replacement qemu worked when compiled with gcc4.


I made my checks with the following compilers using Debian testing amd64: gcc 
version 3.4.6 (Debian 3.4.6-5) and gcc version 4.1.2 20061115 (prerelease) 
(Debian 4.1.1-21).


Please note: These patches work only for x86_64 hosts and x86_64 targets. They 
will break all other architectures. I did not check i386-softmmu. It works 
for me. 


I apologise for the size of the attachments.

Kind regards
Axel

Re: [Qemu-devel] Recursion in cpu_physical_memory_rw

2007-03-25 Thread Anthony Liguori

Herbert Xu wrote:

On Wed, Nov 15, 2006 at 12:57:24AM +, Paul Brook wrote:
It isn't always system memory. Some DMA controllers deliberately write to 
device FIFOs. There are also several devices which map areas of onboard RAM. 
At minimum you need to make those to use RAM mappings rather than MMIO.


I'm not suggesting that we change all existing users of cpu_physical_*
to a new interface that only accessed RAM.  However, for cases where it
is obvious that only system RAM is intended (e.g., rtl8139), it makes
sense to bypass MMIO handlers.

If a device is recursively writing to itself I'd take this as sign that the 
guest OS is already pretty screwed. I'm not sure what happens in this 
situation on real hardware, but I wouldn't be surprised if it caused similar 
effects by flooding the bus.


The scenario here is a compromised guest attempting to harm a host such
as Xen.


The only harm done to a host is that the process will take as much CPU 
as it can get.  This is really only a problem in Xen because the device 
model is in Domain-0.  Once the device model is in a different domain, 
it doesn't matter anymore as the normal scheduler parameters can be used 
to ensure that no other hosts are harmed.


Regards,

Anthony Liguori



Cheers,






[Qemu-devel] Re: using mmap?

2007-03-25 Thread Anthony Liguori

Mark Williamson wrote:

I'm also doubtful how much benefit it gave in practice. I'm sure it would
be good for synthetic CPU benchmarks. However using mmap significantly
increases the overhead of context switches/tlb misses.

To get good overall performance I suspect you're going to need closer
cooperation with the kernel than mmap gives you. If you really want to make
cross-emulation go fast I suggest working with the xen and/or kvm people to
integrate qemu dynamic translation into those products. In theory I'd guess
you should be able to plug it in as an alternative to the HVM code. I've no
idea how close that is to being practical.


http://wiki.xensource.com/xenwiki/HVM/V2E

The v2e stuff allows execution state to be extracted from the real CPU, 
plugged into QEmu for a bit for emulation, then transferred back to the real 
CPU again.  This is initially to be used for supporting Big Real Mode 
emulation on HVM platforms.  Later on it's planned to be used to accelerate 
IO emulation further.


Eventually this may provide a means to use QEmu's translator to execute kernel 
code whilst running user mode code under Xen.  It may be that this isn't as 
fast as other approaches, but it'd be a useful feature for Xen to have IMO.


I'd absolute like to get there.  The current Xen HVM code is a bit of a 
mess at the moment.  There are far too many assumptions about having 
VT/SVM hardware present.  However, I'd really like to get to a point 
where Xen could run unmodified guests on non VT/SVM hardware using qemu 
with user virtualization.


KVM is having similar trouble ATM with big real mode.  It would be very 
nice to also add a V2E like interface to KVM.


I'd also like to see the translator even used for paravirtual guests. 
This way we could still have a proper boot loader run without having to 
deal with paravirtualizing it.  It would be nice to use emulation to 
avoid paravirtualizing the non-performance critical bits.


Regards,

Anthony Liguori


Cheers,
Mark


Wouldn't this be a *significant*
performance enhancement for this setup which I'm sure is a common one?
Maybe this can be implemented for regular processes on the guest and
only use the softmmu for the kernel?  Would someone point me in the
right direction for technical information?  I have had to switch to
vmware free player until Qemu+KQEMU reaches a point of similar
performance, but I would really rather see Qemu advance further.

If you're using an accelerator (eg. kqemu or kvm) this is all irelevant as
most code isn't run by qemu, it's virtualized by the accelerator. qemu just
does the IO emulation.

Paul


___
Qemu-devel mailing list
Qemu-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/qemu-devel








Re: [Qemu-devel] [patch] factor out commonly used scancode translation table

2007-03-25 Thread Anthony Liguori

Bernhard Fischer wrote:

On Thu, Jan 04, 2007 at 06:10:30PM +, Thiemo Seufer wrote:
  

Bernhard Fischer wrote:


On Thu, Jan 04, 2007 at 12:52:56PM -0500, Jonathan Phenix wrote:
  

Bernhard Fischer wrote:


Hi,

The attached patch moves the x_keycode_to_pc_keycode LUT from sdl.c into
an x_keycode.c. This struct is also used by the GGI backend (that is not
yet merged ¹).

Comments?
 
  
How it is done right now, each time x_keycode.c is included, you will 
end up with an extra copy in the final executable. Perhaps that simply 
keeping the LUT in sdl.c but removing the 'static' keyword from it and 
creating a sdl.h file with the statement:


Yes, or create one public accessor func (_translate_keycode() or the
like). I don't have SDL installed, so only have the LUT once, but you're
of course right.

What's the preferred method? public LUT or public accessor?
  

Public accessor, I'd say. Keystroke processing isn't performance critical.



New patch with a public accessor is attached. Couldn't think of a better
name, please feel free to change it..
  


Is it worth evening worry about this if the GGD patch isn't going to 
eventually end up in CVS?


Why have the abstraction if it's not going to be used?  I'm not really 
sure I see the value in having GGD...


Regards,

Anthony Liguori


thanks,
  



--- ../qemu_trunk.orig/vl.h 2006-12-27 14:17:48.0 +0100
+++ vl.h2007-01-04 19:27:07.0 +0100
@@ -869,6 +873,9 @@ void cocoa_display_init(DisplayState *ds
 /* vnc.c */
 void vnc_display_init(DisplayState *ds, const char *display);
 
+/* x_keymap.c */

+extern uint8_t _translate_keycode(const int key);
+
 /* ide.c */
 #define MAX_DISKS 4
 
--- ../qemu_trunk.orig/sdl.c	2006-12-11 23:33:26.0 +0100

+++ sdl.c   2007-01-04 19:28:30.0 +0100
@@ -122,88 +122,6 @@ static uint8_t sdl_keyevent_to_keycode(c
 
 #else
 
-static const uint8_t x_keycode_to_pc_keycode[115] = {

-   0xc7,  /*  97  Home   */
-   0xc8,  /*  98  Up */
-   0xc9,  /*  99  PgUp   */
-   0xcb,  /* 100  Left   */
-   0x4c,/* 101  KP-5   */
-   0xcd,  /* 102  Right  */
-   0xcf,  /* 103  End*/
-   0xd0,  /* 104  Down   */
-   0xd1,  /* 105  PgDn   */
-   0xd2,  /* 106  Ins*/
-   0xd3,  /* 107  Del*/
-   0x9c,  /* 108  Enter  */
-   0x9d,  /* 109  Ctrl-R */
-   0x0,   /* 110  Pause  */
-   0xb7,  /* 111  Print  */
-   0xb5,  /* 112  Divide */
-   0xb8,  /* 113  Alt-R  */
-   0xc6,  /* 114  Break  */   
-   0x0, /* 115 */

-   0x0, /* 116 */
-   0x0, /* 117 */
-   0x0, /* 118 */
-   0x0, /* 119 */
-   0x0, /* 120 */
-   0x0, /* 121 */
-   0x0, /* 122 */
-   0x0, /* 123 */
-   0x0, /* 124 */
-   0x0, /* 125 */
-   0x0, /* 126 */
-   0x0, /* 127 */
-   0x0, /* 128 */
-   0x79, /* 129 Henkan */
-   0x0, /* 130 */
-   0x7b, /* 131 Muhenkan */
-   0x0, /* 132 */
-   0x7d, /* 133 Yen */
-   0x0, /* 134 */
-   0x0, /* 135 */
-   0x47, /* 136 KP_7 */
-   0x48, /* 137 KP_8 */
-   0x49, /* 138 KP_9 */
-   0x4b, /* 139 KP_4 */
-   0x4c, /* 140 KP_5 */
-   0x4d, /* 141 KP_6 */
-   0x4f, /* 142 KP_1 */
-   0x50, /* 143 KP_2 */
-   0x51, /* 144 KP_3 */
-   0x52, /* 145 KP_0 */
-   0x53, /* 146 KP_. */
-   0x47, /* 147 KP_HOME */
-   0x48, /* 148 KP_UP */
-   0x49, /* 149 KP_PgUp */
-   0x4b, /* 150 KP_Left */
-   0x4c, /* 151 KP_ */
-   0x4d, /* 152 KP_Right */
-   0x4f, /* 153 KP_End */
-   0x50, /* 154 KP_Down */
-   0x51, /* 155 KP_PgDn */
-   0x52, /* 156 KP_Ins */
-   0x53, /* 157 KP_Del */
-   0x0, /* 158 */
-   0x0, /* 159 */
-   0x0, /* 160 */
-   0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, /* 170 */
-   0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, /* 180 */
-   0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, /* 190 */
-   0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, /* 200 */
-   0x0, /* 201 */
-   0x0, /* 202 */
-   0x0, /* 203 */
-   0x0, /* 204 */
-   0x0, /* 205 */
-   0x0, /* 206 */
-   0x0, /* 207 */
-   0x70, /* 208 Hiragana_Katakana */
-   0x0, /* 209 */
-   0x0, /* 210 */
-   0x73, /* 211 backslash */
-};
-
 static uint8_t sdl_keyevent_to_keycode(const SDL_KeyboardEvent *ev)
 {
 int keycode;
@@ -216,7 +134,7 @@ static uint8_t sdl_keyevent_to_keycode(c
 keycode -= 8; /* just an offset */
 } else if (keycode  212) {
 /* use conversion table */
-keycode

Re: [Qemu-devel] QEMU: VNC

2007-03-25 Thread Anthony Liguori

Christopher Olsen wrote:

Anyone here know if there is a way to append to the VNC display header?
  


I'm not sure I understand what you're asking.  What is the VNC display 
header?


Regards,

Anthony Liguori


-Christopher


  






Re: [Qemu-devel] [PATCH] Support VNC PointerTypeChange psuedo-encoding

2007-03-25 Thread Anthony Liguori

Ramesh Dharan wrote:

[EMAIL PROTECTED] wrote:


This extension is documented at
http://tocm.wikidot.com/pointertypechange
  

VMware has a very similar extension for their remote console.  I
believe
that Ramesh Dharan (whom I've CCed) at some point implemented it in one
or more open source clients.  Perhaps some coordination here is
possible?



Hi Ramesh,

Sorry for the delayed response.  Believe it or not, your note just made 
it to qemu-devel (along with a month or so of backlogged mail).



Yeah I played implemented support for VMware's relative pointer extension, as
well as our grab metaphor, and some other stuff, on top of the RealVNC
4.1.1 codebase. I unfortunately never got the patch cleaned up enough for
submitting back to RealVNC however...

I have a work-in-progress document that defines our extensions but it's not
really ready for public consumption yet. I'm not sure of the context here,
but I assume QEMU is using VNC as the wire protocol for displaying
framebuffer contents remotely?


Yup.


 Are you writing a new VNC client from scratch
or building on one of the existing GPL'ed clients?
  


At the moment, we work out of the box with existing clients (albeit with 
reduced functionality).  I have my own VNC client (gtk-vnc).  The VNC 
extensions are also supported by virt-manager.


As they mature, I'd like to get them into the main VNC clients too.


The main limitations we've run into with using the VNC protocol for VM
interaction are:

(1) no support for keyboard scancodes (VNC uses higher level symbolic keys
which aren't necessarily 1:1 mappable back to their scancode equivalents in
all the scenarios we care about)
  


Yes, right now, we do not address this.


(2) no relative mouse support
  


We have an extension that addresses this.


(3) no alpha cursor support
  


There is a pseudo-encoding that provides alpha cursors with a 1-bit 
alpha channel.  An 8 bit alpha channel would be nice of course.



(4) no dynamic pixel depth/bpp change support
  


This is something I do want to fix.  This isn't just a server-side 
issue, there is also a race in the protocol when a client uses 
SetPixelFormat.  A proper notification of when the server switches pixel 
format after a SetPixelFormat would solve this problem.


(5) No notion of multiplexing displays on a single port. 
  


Right.  ATM, I don't really plan on addressing this.


(6) No client-server message negotiation.
  


We actually can do this.  I've reserved the 255 client message ID which 
I'm using to multiplex various other messages.  I currently use this to 
implement a shared memory pseudo-encoding for VNC.  The idea is that on 
localhost, the server can render directly to a shared memory segment 
that the client also maps.  If the one is smart about how this shared 
memory segment is chosen, it can be used as an XShmImage.



I implemented new client-server messages for the (1) and (2), and
server-client pseudorectangle extensions for (3) and (4). We deal with (5)
in a different way; our display multiplexing is handled at a higher level. We
ignored (6) since our clients currently only talk to our servers.

Looking at the linked site, it looks like you folks have already identified
and are planning to deal with (2), possibly (5) (via some games with the
display name?) and (6). I'd be particularly interested to hear more about
plans for how to address (5), it would be great if off-the-shelf/3rd-party
clients could talk to multiple VMs running on a host without needing to know
and use a separate the port for each VM. 


Our remote consoles/clients are heavily tied to our products and there aren't
any real plans to divest them or make them more generally useful in the near
future. However, that's likely to change someday, and anyway I'd be happy to
provide feedback on, and implement support for these extensions so that e.g.
a QEMU client could talk to a VMware instance, and in general get
standardization to the point where it's possible to build a single remote
client that could talk to the various VM implementations (QEMU, Xen, VMware,
etc.). 
  


I believe http://wiki.multimedia.cx/index.php?title=VMNC covers some of 
your extensions right?  Have you gotten the appropriate encodings 
reserved in the RFB spec?  I don't want to support any encodings that 
aren't properly registered so this would be an important first step.


I'm certainly happy to work toward a common set of VNC extensions to 
cover virtualization.  It's in everyones benefit to ensure that the 
largest number of clients Just Work out of the box.


Regards,

Anthony Liguori


--
Ramesh



  






Re: [Qemu-devel] QEMU/pc and scsi disks

2007-03-25 Thread Anthony Liguori

Joe Batt wrote:
I disagree.  /bin/sh makes a very flexible config file format that I 
use.  I use it on win32, Linux and Mac OS X.


The problem with only taking command line arguments is that the number 
and size of command line arguments are severely limited on certain 
platforms.  This is why almost every sufficiently aged/portable program 
supports either a config file or a method of taking command line options 
via a file.  The later is really just a particular format of a config file.


If you care about scripting QEMU, then just do something like:

qemu -config - EOF
hda=${myhda}
hdb=${myhdb}
EOF

Regards,

Anthony Liguori

I would prefer that you write another cross platform shell, than 
another config file.  At least that way I could use the same config 
tool for more than one application.


Everytime this comes up, do I have to disagree again, so that my voice 
is not lost?


Any yes, adding features that I do not use increases the complexity 
and decreases the stability of the features I do use, so it would 
effect me.  I have the same feeling about embedding VNC 
authentication, the samba server, etc.


Joe

On Mar 2, 2007, at 11:24 AM, Anthony Liguori wrote:


Paul Brook wrote:
There's also no reason to limit to 7 disks, and we should support 
scsi

cdroms.


The reason for 7 is the number of available id on the scsi bus.


For wide scsi it is 15.



I wouldn't bet on wide scsi working.
For PCI based systems you can add more host adapters to get more 
devices. I haven't actually tested this, but it should work.




I think most people agree that we need a config file.  I haven't seen 
any comments on my config file patch though.


So, any comments on that patch?  Any requirements on a format?

Regards,

Anthony Liguori


Paul


___
Qemu-devel mailing list
Qemu-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/qemu-devel






___
Qemu-devel mailing list
Qemu-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/qemu-devel











Re: [Qemu-devel] [PATCH] Support VNC PointerTypeChange psuedo-encoding

2007-03-25 Thread Anthony Liguori

Ramesh Dharan wrote:

I implemented new client-server messages for the (1) and (2), and
  


I should have read more carefully.  This means that you're not a 
compliant RFB server ATM.


I guess that means there isn't a point registering the pseudo-encodings 
you are currently using since you have to change the server/client 
anyway.  It's easier anyway since pseudo-encodings are supposed to be 
negative numbers and you're using positive numbers.


I'm happy to start looking at reimplementing your encodings using my 
pseudo-encoding range.  It should be real easy for ya'll to support 
these too.  I'll spend a little time in the next week enumerating these 
extensions and assigning proper values to them.


Do you have any documentation on your currently client messages?  Those 
aren't documented as part of the FFMPEG project.


Regards,

Anthony Liguori


server-client pseudorectangle extensions for (3) and (4). We deal with (5)
in a different way; our display multiplexing is handled at a higher level. We
ignored (6) since our clients currently only talk to our servers.

Looking at the linked site, it looks like you folks have already identified
and are planning to deal with (2), possibly (5) (via some games with the
display name?) and (6). I'd be particularly interested to hear more about
plans for how to address (5), it would be great if off-the-shelf/3rd-party
clients could talk to multiple VMs running on a host without needing to know
and use a separate the port for each VM. 


Our remote consoles/clients are heavily tied to our products and there aren't
any real plans to divest them or make them more generally useful in the near
future. However, that's likely to change someday, and anyway I'd be happy to
provide feedback on, and implement support for these extensions so that e.g.
a QEMU client could talk to a VMware instance, and in general get
standardization to the point where it's possible to build a single remote
client that could talk to the various VM implementations (QEMU, Xen, VMware,
etc.). 


--
Ramesh



  






Re: [Qemu-devel] Re: [PATCH] Reducing X communication bandwidth, take 2

2007-03-25 Thread Anthony Liguori

Brian Johnson wrote:

Paul Brook wrote:

Will this work also for the CL542x adaptor?  (Does that fall in the
category of vga?)  My current hack works for with/without -std-vga 
and I

think that's because it lives underneath both, in the connection to
SDL.

Each adapter will have to do it's own minimization but that's sort of
the write thing anyway IMHO.  How granular each update is really only
depends on the adapter.  For instance, the VMware adapter really
shouldn't need to do any minimization at all.


It would be nice if we could share the framebuffer blitting routines. 
We've currently got 3 different implementations (vga/cirrus, tcx and 
pl110) of basically the same framebuffer rendering routines.


Take a look at the video code in BasiliskII / SheepShaver, a 68k/PPC 
classic MacOS emulator written by Gwenolé Beauchesne:


http://gwenole.beauchesne.info/projects/sheepshaver/

It contains optimized code (source level) for blitting between various 
bit depths and endiannesses.  See 
SheepShaver-2.3/src/Unix/video_blit.{h,cpp} in the sources.


It also uses a technique called video on segfault (VOSF) to improve 
performance on platforms which support it:  


This is what QEMU already does.  In fact, it's an exceedingly common 
technique.


Regards,

Anthony Liguori

rather than testing each store to see if it modifies the framebuffer, 
it keeps the framebuffer write-protected (via mprotect(), or the 
equivalent on non-POSIX systems) and uses a SIGSEGV handler to catch 
stores to the buffer.  When a page receives a store, the handler 
unprotects the page and updates a bitmap of modified pages.  Every so 
often a display update thread wakes up, consults the bitmap, 
calculates the updated region, blits it to the screen (using the 
optimized blitters), and clears the bitmap.  See 
SheepShaver-2.3/src/Unix/video_vosf.h, 
SheepShaver-2.3/src/Unix/video_x.cpp, and other files.


The emulators also have alternative techniques for tracking update 
regions on systems for which VOSF is not supported.  But VOSF is 
almost always a big win.  And most modern OSes can support it without 
trouble.


The code is in C++ so it can't be dropped in directly, but Some of the 
techniques may be useful in qemu.



Brian J. Johnson









Re: [Qemu-devel] [PATCH] Implement Win32 locking in vl.c (create_pidfile).

2007-03-25 Thread Anthony Liguori

Carlos O'Donell wrote:

The following patch implements Win32 locking for vl.c (create_pidfile).
Builds for mingw32, and tested on a Windows Server 2003 host by running
two qemu's with the same -pidfile option.

When cross-compiling the use of --enable-mingw32 should effect AIOLIBS,
otherwise the build will use -lrt and this is not valid when compiling
with mingw32.
  


Instead of using an #ifdef, I think it would be better to have a win32 
specific file and a unix specific file that compiles conditionally 
depending on the host.


Sort of like and osdep-posix.c and and osdep-win32.c.  Both would 
implement a create_pidfile() function.  I think this general approach 
would help clean up a lot of the win32 code.


Thoughts?

Regards,

Anthony Liguori


Patch attached for both. Built on linux and tested on Windows Server
2003.

Comments? Please include me in the CC.

Cheers,
Carlos.
  






Re: [Qemu-devel] [RFC/experimental patch] qemu (x86_64 on x86_64 -no-kqemu) compiles with gcc4 and works

2007-03-25 Thread Anthony Liguori

Axel Zeuner wrote:

On Saturday 24 March 2007 21:15, Anthony Liguori wrote:
  

The tricky thing I still can't figure out is how to get ASM_SOFTMMU
working.  The problem is GLUE(st, SUFFIX) function.  First GCC cannot
deal with the register pressure.  The problem I can't seem to fix though
is that GCC sticks %1 in %esi because we're only using an r
constraint, not a q constraint.  This results in the generation of
%sib which is an invalid register.  However, refactoring the code to not
require a q constraint doesn't seem to help either.


Hi Anthony,
could you please try the attached patch for softmmu_header.h? Allows compiling 
with gcc4 and ASM_SOFTMMU.
  


That did the trick.  Could you explain what your changes did?

Regards,

Anthony Liguori


Kind regards
Axel
  






Re: [Qemu-devel] [RFC/experimental patch] qemu (x86_64 on x86_64 -no-kqemu) compiles with gcc4 and works

2007-03-25 Thread Anthony Liguori

Axel Zeuner wrote:

On Saturday 24 March 2007 21:15, Anthony Liguori wrote:
  

Axel Zeuner wrote:


Hi,
  

Hi Axel,

By adding some GCC4 fixes on top of your patch, I was able to get qemu
for i386 (on i386) to compile and run.  So far, I've only tested a win2k
guest.


Hi Anthony,

thank you for the test, I like to hear about your success. I have applied your 
patches, compiled and checked qemu-i386-softmmu on i386 without kqemu with 
FreeDos. It works also.


  

The big problem (which pbrook helped me with) was GCC4 freaking out over
some stq's.  Splitting up the 64bit ops into 32bit ops seemed to address
most of the problems.

The tricky thing I still can't figure out is how to get ASM_SOFTMMU
working.  The problem is GLUE(st, SUFFIX) function.  First GCC cannot
deal with the register pressure.  The problem I can't seem to fix though
is that GCC sticks %1 in %esi because we're only using an r
constraint, not a q constraint.  This results in the generation of
%sib which is an invalid register.  However, refactoring the code to not
require a q constraint doesn't seem to help either.

In the past I made some patches (not published yet) to speed up the helpers 
for 64 operations in target-i386/helper.c on x86_64 and i386 using gcc inline 
assembly.  x86_64 was really easy, but for i386 I had to use m and =m 
constraints and as less inputs and outputs as possible. 
  

The attached patch is what I have so far.  Some help with people more
familiar with gcc asm foo would be appreciated!



May I suggest some changes?  
I would like to try not to split the 64 bit accesses on hosts supporting it 
native, i.e. something like this:

===
--- cpu-all.h   (revision 16)
+++ cpu-all.h   (working copy)
@@ -339,7 +339,13 @@

 static inline void stq_le_p(void *ptr, uint64_t v)
 {
-*(uint64_t *)ptr = v;
+#if (HOST_LONG_BITS  64)
+uint8_t *p = ptr;
+stl_le_p(p, (uint32_t)v);
+stl_le_p(p + 4, v  32);
+#else
+*(uint64_t*)ptr = v;
+#endif
 }
  


Yes, I think the proper thing to do is to use a configure check for GCC 
version to determine whether or not to use the 32 bit or 64 version of 
stq_le_p.


There is already a function in cpu-all.h that does the 32 bit version.

Furthermore I think one should move helper_pshufw() from target-i386/helper2.c 
into target-i386/helper.c where all the other helper methods reside.   
  


I moved to helper2.c because AFAICT helper.c is compiled with the same 
sort of restrictions as op.c which leads to the compile failure.


Regards,

Anthony Liguori


Kind Regards
Axel

  

Regards,

Anthony Liguori




  






Re: [Qemu-devel] [PATCH] Guest mouse cursor drawing in SDL

2007-03-25 Thread Anthony Liguori

andrzej zaborowski wrote:

Hi, sorry for late reply.

On 21/03/07, Anthony Liguori [EMAIL PROTECTED] wrote:

Do you think using the SDL cursor is all that useful?  As soon as gtk
widgets get involved, the cursor becomes ARGB so in practice, I'm not
sure that it's all that helpful.


It's pretty cool that you have only one cursor and that it's either
in the VM or outside, and it's full hardware drawn. In addition
without it, I think you can't get cursor refreshes between the SDL
refresh callbacks. So with the patch the cursor movement is really
smooth and responsive, eliminating the SDL latency. I find it amazing
that it works :p


Hrm, interesting.  I actually want to rethink the mouse interface again 
as I've been thinking that even if an absolute mouse isn't available, 
provided that you know the location of the guest cursor, you can 
implement a smarter grab/ungrab.



I think VNC support is not a reason to drop this couple of lines in
sdl.c (cause there's no API changes related).



BTW, I've got the ALPHA_CURSOR working with VNC.. it's very sweet :-)


Awesome! Are there screenshots?


I could make them, but they just show a mouse in the window :-)  BTW, 
have you thought about how to integrate std-vga into the vmware vga 
device?  I took a look at it and it seems like it will be overly 
complicated based on the way VGA memory dirtying is done now.


Regards,

Anthony Liguori


Regards,
Andrew







Re: [Qemu-devel] QEMU + -std-vga + XFree86

2007-03-25 Thread Anthony Liguori

Kyle Hubert wrote:

Hi, I'm working with QEMU on an XFree86 machine. I was desirous of
having 1600x1200 working in the virtual machine, so I looked at using
the -std-vga option with the vesa X driver.


You probably have to increase your hsync and vrefresh ranges.

Regards,

Anthony Liguori


While initializing X, I get the following:

(II) VESA(0): Primary V_BIOS segment is: 0xc000
(II) VESA(0): VESA BIOS detected
(II) VESA(0): VESA VBE Version 0.0
(II) VESA(0): VESA VBE Total Mem: 0 kB
(II) VESA(0): VESA VBE OEM: ^A

The log continues on to say:

(II) VESA(0): Total Memory: 0 64KB banks (0kB)
(II) VESA(0): Generic Monitor: Using hsync range of 28.00-51.00 kHz
(II) VESA(0): Generic Monitor: Using vrefresh range of 43.00-60.00 Hz
(II) VESA(0): Not using mode 1600x1200 (no mode of this name)
(II) VESA(0): Not using mode 1280x1024 (no mode of this name)
(II) VESA(0): Not using mode 1024x768 (no mode of this name)
(II) VESA(0): Not using mode 800x600 (no mode of this name)
(II) VESA(0): Not using built-in mode 640x480 (insufficient memory
given virtual size)

I'm not positive this is my problem, but it looks like it isn't
reporting the vram size correctly, does this sound correct?

Thanks for the time.

-Kyle









Re: [Qemu-devel] [RFC/experimental patch] qemu (x86_64 on x86_64 -no-kqemu) compiles with gcc4 and works

2007-03-28 Thread Anthony Liguori

Axel Zeuner wrote:

Hi Anthony,

On Monday 26 March 2007 01:44, you wrote:
  

Axel Zeuner wrote:


On Saturday 24 March 2007 21:15, Anthony Liguori wrote:
  

The tricky thing I still can't figure out is how to get ASM_SOFTMMU
working.  The problem is GLUE(st, SUFFIX) function.  First GCC cannot
deal with the register pressure.  The problem I can't seem to fix though
is that GCC sticks %1 in %esi because we're only using an r
constraint, not a q constraint.  This results in the generation of
%sib which is an invalid register.  However, refactoring the code to not
require a q constraint doesn't seem to help either.


Hi Anthony,
could you please try the attached patch for softmmu_header.h? Allows
compiling with gcc4 and ASM_SOFTMMU.
  

That did the trick.  Could you explain what your changes did?



QEMU/i386 has only 3 three available registers if TARGET_I386 is selected 
because ebx,ebp,esi,edi are used by the environment and T0, T1, T3( AKA A0). 
This makes inline assembly really ugly. The called external C functions in  
ASM_SOFTMMU are REGPARM(1,2), i.e. require their first arguments in eax, edx.
  


Based on some feedback from Paul Brook, I wrote another patch that just 
disables the use of register variables for GCC4.  I think this is a 
considerably less hackish way to go about this.


The generated code won't be as nice of course but at least it works.  
The patch applies against your cvtasm patches.


Regards,

Anthony Liguori

In the two ld functions three registers (eax, edx, ecx) are required and 
destroyed because an external C function may be called. We relax the register 
pressure a little bit by forcing the return value (res) into eax , because 
the return value is returned in a destroyed register. Furthermore the called 
C function returns its value in eax anyway (call %7). 

The st functions are a little more tricky: we need three registers and the 
assembly code requires a reload of %0 (ptr) after the check if the external 
function must be called. In the external function the three remaining 
registers are destroyed. After the call a need also to reload of %1 (v) into 
register is needed, i.e. we need more registers. Register saving on the stack 
does not work, because there exist already 2 m constraints: if the code is 
compiled with -fomit-frame-pointers these are expressed as offsets relative 
to %esp, i.e X(%esp) and would become invalid after pushes onto the stack.


One solution was to force all inputs to the asm block onto the stack, thats 
what the replacement of the r constraints into m constraints do: they 
force a memory reference. Because i386 can not do direct memory memory moves 
one has to reload m(v) into ecx again, otherwise the generated assembler 
code is invalid.
It must be mentioned, that the generated code is a little bit slower than the 
original one.


Kind Regards 
Axel
  

Regards,

Anthony Liguori



Kind regards
Axel
  


  


diff -r d19a5903d749 softmmu_header.h
--- a/softmmu_header.h	Tue Mar 27 13:23:10 2007 -0500
+++ b/softmmu_header.h	Tue Mar 27 13:23:21 2007 -0500
@@ -240,9 +240,13 @@ static inline void glue(glue(st, SUFFIX)
   2:\n
   : 
   : r (ptr), 
+#ifdef USE_REGISTER_VARIABLES
 /* NOTE: 'q' would be needed as constraint, but we could not use it
with T1 ! */
   r (v), 
+#else
+		  q (v),
+#endif
   i ((CPU_TLB_SIZE - 1)  CPU_TLB_ENTRY_BITS), 
   i (TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS), 
   i (TARGET_PAGE_MASK | (DATA_SIZE - 1)),
diff -r d19a5903d749 target-i386/cpu.h
--- a/target-i386/cpu.h	Tue Mar 27 13:23:10 2007 -0500
+++ b/target-i386/cpu.h	Tue Mar 27 13:23:21 2007 -0500
@@ -26,6 +26,10 @@
 #define TARGET_LONG_BITS 64
 #else
 #define TARGET_LONG_BITS 32
+#endif
+
+#if TARGET_LONG_BITS = HOST_LONG_BITS  __GNUC__  4
+#define USE_REGISTER_VARIABLES
 #endif
 
 /* target supports implicit self modifying code */
@@ -424,7 +428,7 @@ typedef union {
 #endif
 
 typedef struct CPUX86State {
-#if TARGET_LONG_BITS  HOST_LONG_BITS
+#ifndef USE_REGISTER_VARIABLES
 /* temporaries if we cannot store them in host registers */
 target_ulong t0, t1, t2;
 #endif
diff -r d19a5903d749 target-i386/exec.h
--- a/target-i386/exec.h	Tue Mar 27 13:23:10 2007 -0500
+++ b/target-i386/exec.h	Tue Mar 27 13:23:21 2007 -0500
@@ -27,12 +27,16 @@
 #define TARGET_LONG_BITS 32
 #endif
 
+#if TARGET_LONG_BITS = HOST_LONG_BITS  __GNUC__  4
+#define USE_REGISTER_VARIABLES
+#endif
+
 #include cpu-defs.h
 
 /* at least 4 register variables are defined */
 register struct CPUX86State *env asm(AREG0);
 
-#if TARGET_LONG_BITS  HOST_LONG_BITS
+#ifndef USE_REGISTER_VARIABLES
 
 /* no registers can be used */
 #define T0 (env-t0)
@@ -88,7 +92,7 @@ register target_ulong EDI asm(AREG11);
 #define reg_EDI
 #endif
 
-#endif /* ! (TARGET_LONG_BITS  HOST_LONG_BITS) */
+#endif /* ! USE_REGISTER_VARIABLES */
 
 #define A0 T2
 


Re: [Qemu-devel] [RFC/experimental patch] qemu (x86_64 on x86_64 -no-kqemu) compiles with gcc4 and works

2007-03-29 Thread Anthony Liguori

Axel Zeuner wrote:

Hi Anthony,

On Thursday 29 March 2007 04:07, you wrote:
  

Axel Zeuner wrote:


Hi Anthony,

On Monday 26 March 2007 01:44, you wrote:
  

Axel Zeuner wrote:


On Saturday 24 March 2007 21:15, Anthony Liguori wrote:
  

The tricky thing I still can't figure out is how to get ASM_SOFTMMU
working.  The problem is GLUE(st, SUFFIX) function.  First GCC cannot
deal with the register pressure.  The problem I can't seem to fix
though is that GCC sticks %1 in %esi because we're only using an r
constraint, not a q constraint.  This results in the generation of
%sib which is an invalid register.  However, refactoring the code to
not require a q constraint doesn't seem to help either.


Hi Anthony,
could you please try the attached patch for softmmu_header.h? Allows
compiling with gcc4 and ASM_SOFTMMU.
  

That did the trick.  Could you explain what your changes did?


QEMU/i386 has only 3 three available registers if TARGET_I386 is selected
because ebx,ebp,esi,edi are used by the environment and T0, T1, T3( AKA
A0). This makes inline assembly really ugly. The called external C
functions in ASM_SOFTMMU are REGPARM(1,2), i.e. require their first
arguments in eax, edx.
  

Based on some feedback from Paul Brook, I wrote another patch that just
disables the use of register variables for GCC4.  I think this is a
considerably less hackish way to go about this.

The generated code won't be as nice of course but at least it works.
The patch applies against your cvtasm patches.

Looks good to me, sorry I had no time yet to test your patch. Did you check 
the performance impact of your changes? 
Perhaps it is possible to use register variables in dependence of the register 
count of the host processor.
  


Yes, I need to update the patch to include a  defined(__i386__) and 
also to add the proper guards to the other architectures.


Regards,

Anthony Liguori


Kind Regards
Axel

  






Re: [Qemu-devel] [PATCH] Support VNC PointerTypeChange psuedo-encoding

2007-03-29 Thread Anthony Liguori

Ramesh Dharan wrote:

Anthony, I have a detailed response to your earlier e-mail but I wanted to
handle this discussion separately. 

  

I implemented new client-server messages for the (1) and (2), and
  
I should have read more carefully.  This means that you're not a 
compliant RFB server ATM.



I'm not sure I follow you. We support standard VNC clients, and we don't
violate the protocol expectations of any standard VNC clients who talk to us.


However, if a client *happens* to send us these new client-server messages
(which don't exist in the standard spec), then we support them. 


We never send a message to a client that the client doesn't know how to
handle. 
  


Right, but how does your client determine that the server supports these 
new messages?


The proper way to use new client message types (which is now described 
in the RFB spec) is to advertise a new pseudo-encoding for that client 
message type and wait for the server to send the pseudo-encoding back to 
the client.  That lets the client know that it is safe to use the new 
client message type.  This is what I'm using for my shared memory encoding.


Otherwise, there's no way to write a client that works with the 
enhanced server and a normal VNC server.


I guess that means there isn't a point registering the 
pseudo-encodings 
you are currently using since you have to change the server/client 
anyway.  It's easier anyway since pseudo-encodings are supposed to be 
negative numbers and you're using positive numbers.



No, there's still a point. The display path and input path are essentially
independent. Better input handling requires extending the client-server path
(so the client can send new kinds of data other than the standard VNC
keyboard and pointer events). Unfortunately, extension of the protocol in
this direction was not baked into the original design. 


I think that basically we should actually extend the RFB protocol itself to
just have a server-client message, SetSupportedClientMessages, which works
exactly the way SetEncodings works today, i.e. the server can send down to
the client a list of the messages which it can handle, and clients should not
send message types to the server if the server doesn't support them. 
  


The mechanism I described above is what the current preferred method 
is.  If you want, we can bring the topic up with the VNC authors as 
AFAIK I'm the only person with a reserved client message type.  Of 
course, I think using a pseudo-encoding is a perfectly suitable way to 
address this problem.


Regards,

Anthony Liguori


--
Ramesh

  






Re: [Qemu-devel] [PATCH] Support VNC PointerTypeChange psuedo-encoding

2007-03-29 Thread Anthony Liguori
 it should start decoding pixels 
differently.


In practice, most VNC servers do not support arbitrary pixel formats too 
so it makes a lot of sense for server's to have this message available 
to them.


(5) No notion of multiplexing displays on a single port. 
  
  

Right.  ATM, I don't really plan on addressing this.



So, the old Connectix Virtual Server product (which was in beta when
Microsoft snapped them up) had a super slick way of dealing with this, that
we actually joked about doing but never seriously thought anyone would do.
They rendered an entire picker UI using VNC primitives. The UI would show
live thumbnails of each VM, and then when you click on one of the tiles,
you'd get redirected to that console. You couldn't go back out to the picker,
but still, it was pretty neat. 


I've since looked into doing the same thing, but the closest thing to any VNC
UI/widget rendering library that I've found is TclRFB (which, as the name
would imply, is written in Tcl). 


Anyway, what you really want is a way to say you've connected to a
multi-home server, here's a list of VMs available (with arbitrary or perhaps
structured metadata attached to each VM tag), which one do you want?. And
then have the ability for the client ot pick. But you can't really do this
sanely with just extensions, I think this is a good candidate for actually
extending the base protocol.
  


I have seen KVM switches that use VNC implement an extension that is 
basically, SetName.  They then use a client encoding to switch the KVM.  
This is definitely an interesting area to pursue.


The idea is that on localhost, the server can render directly to a shared 
memory segment that the client also maps.  If the one is smart about how 
this shared memory segment is chosen, it can be used as an XShmImage.



Yeah we did the exact same thing, with a slight twist. We get the server to
allocate the SHM segment, and the client just attaches to it. If you do it
this way, you don't need a new client-server message, the server just sends
the client the shm id if the client supports the SharedFB extension (that's
what we named it).
  


I chose to have the client allocate the memory since it knew the X 
server's line size which is required to allocate an XShmImage.  My first 
implementation had the server allocate the buffer but required the 
client to tell the server what the line size should be.



There's some complexity involved with how the server can actually figure out
whether the client is on the same machine (e.g. if the client is tunneling
over SSH, this is not easy to do). So in fact what we do is just if the
client supports SharedFBInfo and SharedFBRect, the server just sends along
the messages. The client then tries to map the segment described in
SharedFBInfo. If it fails, it just sends a new encoding set with those
encodings removed, and the server starts sending real pixel data instead. 
  


I've thought about this problem too.  My fear is that one could 
accidentally map a shared memory area that's actually valid.



Yes, this means that we always try and fail sharedfb so there's a couple
extra roundtrips before remote clients actually get real pixel data. This was
deemed acceptable (by me ;-) for our use cases since most networks are fast
enough. But if this is too hackish for your taste we could bake in something
better. 
  


I really haven't thought of a better solution myself.  That doesn't mean 
one exists though.


I believe http://wiki.multimedia.cx/index.php?title=VMNC 
covers some of 
your extensions right?  Have you gotten the appropriate encodings 
reserved in the RFB spec?  I don't want to support any encodings that 
aren't properly registered so this would be an important first step.



So once upon a time, we tried to register encodings, but we didn't get very
far. It sounds like perhaps the process has improved since then. I will try
to get the ball rolling on this again. 
  


I can help you with that if you want.  It wasn't a painful process at 
all for me.


I'm certainly happy to work toward a common set of VNC extensions to 
cover virtualization.  It's in everyones benefit to ensure that the 
largest number of clients Just Work out of the box.



Agreed. I look forward to hearing more of your thoughts on this subject.
  


Same here!  Thanks for the response.

Regards,

Anthony Liguori


--
Ramesh

  






Re: [Qemu-devel] [PATCH] Support VNC PointerTypeChange psuedo-encoding

2007-03-29 Thread Anthony Liguori

Ramesh Dharan wrote:
The proper way to use new client message types (which is now 
described 
in the RFB spec) is to advertise a new pseudo-encoding for 
that client 
message type and wait for the server to send the 
pseudo-encoding back to 
the client.  That lets the client know that it is safe to use the new 
client message type.  This is what I'm using for my shared 
memory encoding.


Otherwise, there's no way to write a client that works with the 
enhanced server and a normal VNC server.



Ok, yeah that makes sense. So yeah basically we would need to add new server
encodings for our client-server messages, and then you get the server to
send dummy ack messages for each one to say yes, I understand this message
type?
  


Yup.

The mechanism I described above is what the current preferred method 
is.  If you want, we can bring the topic up with the VNC authors as 
AFAIK I'm the only person with a reserved client message type.  Of 
course, I think using a pseudo-encoding is a perfectly 
suitable way to 
address this problem.



Yeah the only problem I see with it is that I don't see how the server can
dynamically *change* the set of client messages that it accepts?
  


To do it in general?  Yeah, I don't think there's a solution.  Of 
course, a SetServerEncodings would introduce a race.  What does the 
server do if it receives one of the new special client messages after 
sending the SetServerEncodings message (but before the client receives 
the SetServerEncodings message)?


I think it's easier to just add something to the psuedo-encoding to 
allow client messages that make sense to disable to be disabled on a 
case-by-case basis.


Regards,

Anthony Liguori


--
Ramesh

  






Re: [Qemu-devel] Patch: dyngen-exec.h for OpenBSD

2007-04-06 Thread Anthony Liguori

Paul Brook wrote:

On Thursday 05 April 2007 23:12, Todd T. Fries wrote:
  

Penned by Thiemo Seufer on 20070402 10:54.53, we have:
|   /* NOTE: standard headers should be used with special care at this
|  point because host CPU registers are used as global variables. Some
|  host headers do not allow that. */
|   #include stddef.h
|  -
|  +#ifdef __OpenBSD__
|  +#include sys/types.h
Hello? Portability?  sys/types.h defines these types portably.
Doing so the way this code does it, is not portable.



If you want  portability you should be including stdint.h (or inttypes.h for 
old, broken systems).
  


I thought I'd add that this isn't just portability.  stdint.h is what 
C99 mandates although as Paul mentions, some older systems used inttypes.h.


Regards,

Anthony Liguori


Why is it that qemu knows what the definition of these prototypes
are on all systems without consulting the header files.  I have a
better idea, lets let the header files define the prototypes.
Who would have though of that?



See the big NOTE: comment above. dyngen is inherently unportable.

Paul



  






Re: [Qemu-devel] Feature proposal: USB devices over TCP

2007-04-06 Thread Anthony Liguori

Eduardo Felipe wrote:

Hi everybody,

I think that a useful feature for QEMU would be to expose the USB 
interface through TCP.


Hi Eduardo,

I don't know all that much about USB, but I think the most useful way to 
do this would be to do something that's protocol compatible with USBIP.  
We could then tunnel this traffic over VNC and allow for exposing local 
USB devices to a remote VM.


Think of a virtual desktop being hosted on a server and exposed on a 
thin client.  If you could plug in your iPod and it would just work with 
the VM, that would be an exceedingly cool feature.


Are you familiar with USBIP?  If so, does this sound reasonable?

Regards,

Anthony Liguori

It would allow quick USB device development in high level languages 
without recompiling QEMU. We could have an instance of QEMU running 
all the time while we create our device and hot plug/unplug it 
whenever we want to.


This could also attract people interested in hardware emulation, but 
scared of learning QEMU internals just to create a simple new device.


I think USB is quite suited for this, as it is designed for pluggable 
external devices, but something similar could be made for serial and 
parallel devices too.


The attached patch is a quick hack derived from the VNC server just to 
show the idea, not intended for commiting. A dummy protocol is used 
for message interchange between server and client.


It adds the new command line option:
 -usbtcp port
It starts a socket listening on port for incoming connections. A 
sample USB mouse in python is also provided that moves the cursor in 
circles.


Would such a feature be of any interest for QEMU?

Regards,
Eduardo Felipe






Re: [Qemu-devel] Fixing ACPI for Vista -- Source code for bios.bin vs. bochs CVS?

2007-04-30 Thread Anthony Liguori

Paul Brook wrote:

On Monday 30 April 2007, Nakajima, Jun wrote:
  

We found a way to get 32-bit Vista up on KVM by fixing bios.bin. And I
think it should be helpful to qemu as well. However, I'm not sure if the
binary file has been created simply by applying bios.diff to the latest
bochs CVS tree (actually I doubt it).



What makes you think this won't work? Have you tried it?
  


This has worked for me in the past.

Regards,

Anthony Liguori


Paul



  






Re: [Qemu-devel] stale pid files?

2007-06-01 Thread Anthony Liguori

Axel Kittenberger wrote:

Hi list again, another suggestion.

I use the deamons pidfile feature to write pidfiles so I can controll 
the virtual machines with /etc/init.d/bla scripts, to e.g. auto spawn 
the guests when the server starts, auto/start/stop with runlevels and 
so on.. However I got following issue, the daemon wont start if there 
is a pidfile already (yes no problem for user to delete, but for auto 
spawning after unattended boot) . Since e.g. following situation: 
BANG! Powerout!  System goes up when power is available again. The 
init scripts want to start the qemu/kvm deamons, but they refuse 
because these see their pidfiles already from pre-powerout..


-pidfile behaves properly in CVS.  You may be able to find the patch on 
the ML that fixed it.


Regards,

Anthony Liguori

I have seen other daemons somehow locking their pidfiles while 
running, and when killed -9ed, or poweroutes and they start again, 
they see their pid-file is unlocked, thus stale, and just overwrite 
it... only notifing you with a message they did so.


Kind Regards,
Axel









Re: [Qemu-devel] [PATCH] ceph/rbd block driver for qemu-kvm (v4)

2010-10-07 Thread Anthony Liguori

On 10/07/2010 01:08 PM, Yehuda Sadeh Weinraub wrote:

On Thu, Oct 7, 2010 at 7:12 AM, Anthony Liguorianth...@codemonkey.ws  wrote:
   

On 08/03/2010 03:14 PM, Christian Brunner wrote:
 

+#include qemu-common.h
+#include qemu-error.h
+#includesys/types.h
+#includestdbool.h
+
+#includeqemu-common.h

   

This looks to be unnecessary.  Generally, system includes shouldn't be
required so all of these should go away except rado/librados.h
 

Removed.

   
 

+
+#include rbd_types.h
+#include module.h
+#include block_int.h
+
+#includestdio.h
+#includestdlib.h
+#includerados/librados.h
+
+#includesignal.h
+
+
+int eventfd(unsigned int initval, int flags);

   

This is not quite right.  Depending on eventfd is curious but in the very
least, you need to detect the presence of eventfd in configure and provide a
wrapper that redefines it as necessary.
 

Can fix that, though please see my later remarks.
   

+static int create_tmap_op(uint8_t op, const char *name, char **tmap_desc)
+{
+uint32_t len = strlen(name);
+/* total_len = encoding op + name + empty buffer */
+uint32_t total_len = 1 + (sizeof(uint32_t) + len) + sizeof(uint32_t);
+char *desc = NULL;

   

char is the wrong type to use here as it may be signed or unsigned.  That
can have weird effects with binary data when you're directly manipulating
it.
 

Well, I can change it to uint8_t, so that it matches the op type, but
that'll require adding some other castings. In any case, you usually
get such a weird behavior when you cast to types of different sizes
and have the sign bit padded which is not the case in here.

   
 

+
+desc = qemu_malloc(total_len);
+
+*tmap_desc = desc;
+
+*desc = op;
+desc++;
+memcpy(desc,len, sizeof(len));
+desc += sizeof(len);
+memcpy(desc, name, len);
+desc += len;
+len = 0;
+memcpy(desc,len, sizeof(len));
+desc += sizeof(len);

   

Shouldn't endianness be a concern?
 

Right. Fixed that.

   
 

+
+return desc - *tmap_desc;
+}
+
+static void free_tmap_op(char *tmap_desc)
+{
+qemu_free(tmap_desc);
+}
+
+static int rbd_register_image(rados_pool_t pool, const char *name)
+{
+char *tmap_desc;
+const char *dir = RBD_DIRECTORY;
+int ret;
+
+ret = create_tmap_op(CEPH_OSD_TMAP_SET, name,tmap_desc);
+if (ret0) {
+return ret;
+}
+
+ret = rados_tmap_update(pool, dir, tmap_desc, ret);
+free_tmap_op(tmap_desc);
+
+return ret;
+}

   

This ops are all synchronous?  IOW, rados_tmap_update() call blocks until
the operation is completed?
 

Yeah. And this is only called from the rbd_create() callback.

   

+header_snap += strlen(header_snap) + 1;
+if (header_snapend)
+error_report(bad header, snapshot list broken);

   

Missing curly braces here.
 

Fixed.

   

+if (strncmp(hbuf + 68, RBD_HEADER_VERSION, 8)) {
+error_report(Unknown image version %s, hbuf + 68);
+r = -EMEDIUMTYPE;
+goto failed;
+}
+
+RbdHeader1 *header;


   

Don't mix variable definitions with code.
 

Fixed.

   

+s-efd = eventfd(0, 0);
+if (s-efd0) {
+error_report(error opening eventfd);
+goto failed;
+}
+fcntl(s-efd, F_SETFL, O_NONBLOCK);
+qemu_aio_set_fd_handler(s-efd, rbd_aio_completion_cb, NULL,
+rbd_aio_flush_cb, NULL, s);

   

It looks like you just use the eventfd to signal aio completion callbacks.
  A better way to do this would be to schedule a bottom half.  eventfds are
Linux specific and specific to recent kernels.
 

Digging back why we introduced the eventfd, it was due to some issues
seen with do_savevm() hangs on qemu_aio_flush(). The reason seemed
that we had no fd associated with the block device, which seemed to
not work well with the qemu aio model. If that assumption is wrong,
we'd be happy to change it. In any case, there are other more portable
ways to generate fds, so if it's needed we can do that.
   


There's no fd at all?   How do you get notifications about an 
asynchronous event completion?


Regards,

Anthony Liguori



Re: [Qemu-devel] [PATCH] ceph/rbd block driver for qemu-kvm (v4)

2010-10-07 Thread Anthony Liguori

On 10/07/2010 01:41 PM, Yehuda Sadeh Weinraub wrote:

On Thu, Oct 7, 2010 at 11:38 AM, Anthony Liguorianth...@codemonkey.ws  wrote:
   

On 10/07/2010 01:08 PM, Yehuda Sadeh Weinraub wrote:
 

On Thu, Oct 7, 2010 at 7:12 AM, Anthony Liguorianth...@codemonkey.ws
  wrote:

   

On 08/03/2010 03:14 PM, Christian Brunner wrote:

 

+#include qemu-common.h
+#include qemu-error.h
+#includesys/types.h
+#includestdbool.h
+
+#includeqemu-common.h


   

This looks to be unnecessary.  Generally, system includes shouldn't be
required so all of these should go away except rado/librados.h

 

Removed.


   


 

+
+#include rbd_types.h
+#include module.h
+#include block_int.h
+
+#includestdio.h
+#includestdlib.h
+#includerados/librados.h
+
+#includesignal.h
+
+
+int eventfd(unsigned int initval, int flags);


   

This is not quite right.  Depending on eventfd is curious but in the very
least, you need to detect the presence of eventfd in configure and
provide a
wrapper that redefines it as necessary.

 

Can fix that, though please see my later remarks.

   

+static int create_tmap_op(uint8_t op, const char *name, char
**tmap_desc)
+{
+uint32_t len = strlen(name);
+/* total_len = encoding op + name + empty buffer */
+uint32_t total_len = 1 + (sizeof(uint32_t) + len) +
sizeof(uint32_t);
+char *desc = NULL;


   

char is the wrong type to use here as it may be signed or unsigned.  That
can have weird effects with binary data when you're directly manipulating
it.

 

Well, I can change it to uint8_t, so that it matches the op type, but
that'll require adding some other castings. In any case, you usually
get such a weird behavior when you cast to types of different sizes
and have the sign bit padded which is not the case in here.


   


 

+
+desc = qemu_malloc(total_len);
+
+*tmap_desc = desc;
+
+*desc = op;
+desc++;
+memcpy(desc,len, sizeof(len));
+desc += sizeof(len);
+memcpy(desc, name, len);
+desc += len;
+len = 0;
+memcpy(desc,len, sizeof(len));
+desc += sizeof(len);


   

Shouldn't endianness be a concern?

 

Right. Fixed that.


   


 

+
+return desc - *tmap_desc;
+}
+
+static void free_tmap_op(char *tmap_desc)
+{
+qemu_free(tmap_desc);
+}
+
+static int rbd_register_image(rados_pool_t pool, const char *name)
+{
+char *tmap_desc;
+const char *dir = RBD_DIRECTORY;
+int ret;
+
+ret = create_tmap_op(CEPH_OSD_TMAP_SET, name,tmap_desc);
+if (ret  0) {
+return ret;
+}
+
+ret = rados_tmap_update(pool, dir, tmap_desc, ret);
+free_tmap_op(tmap_desc);
+
+return ret;
+}


   

This ops are all synchronous?  IOW, rados_tmap_update() call blocks until
the operation is completed?

 

Yeah. And this is only called from the rbd_create() callback.


   

+header_snap += strlen(header_snap) + 1;
+if (header_snap  end)
+error_report(bad header, snapshot list broken);


   

Missing curly braces here.

 

Fixed.


   

+if (strncmp(hbuf + 68, RBD_HEADER_VERSION, 8)) {
+error_report(Unknown image version %s, hbuf + 68);
+r = -EMEDIUMTYPE;
+goto failed;
+}
+
+RbdHeader1 *header;



   

Don't mix variable definitions with code.

 

Fixed.


   

+s-efd = eventfd(0, 0);
+if (s-efd  0) {
+error_report(error opening eventfd);
+goto failed;
+}
+fcntl(s-efd, F_SETFL, O_NONBLOCK);
+qemu_aio_set_fd_handler(s-efd, rbd_aio_completion_cb, NULL,
+rbd_aio_flush_cb, NULL, s);


   

It looks like you just use the eventfd to signal aio completion
callbacks.
  A better way to do this would be to schedule a bottom half.  eventfds
are
Linux specific and specific to recent kernels.

 

Digging back why we introduced the eventfd, it was due to some issues
seen with do_savevm() hangs on qemu_aio_flush(). The reason seemed
that we had no fd associated with the block device, which seemed to
not work well with the qemu aio model. If that assumption is wrong,
we'd be happy to change it. In any case, there are other more portable
ways to generate fds, so if it's needed we can do that.

   

There's no fd at all?   How do you get notifications about an asynchronous
event completion?

Regards,

Anthony Liguori

 

(resending to list, sorry)

The fd is hidden deep under in librados. We get callback notifications
for events completion.
   


How is that possible?  Are the callbacks delivered in the context of a 
different thread?  If so, don't you need locking?


Regards,

Anthony Liguori


Thanks,
Yehuda
   





Re: [Qemu-devel] [PATCH 2/3] vnc: support password expire

2010-10-07 Thread Anthony Liguori

On 10/07/2010 06:15 AM, Gerd Hoffmann wrote:

This patch adds support for expiring passwords to vnc.  It adds a new
lifetime parameter to the vnc_display_password() function, which
specifies the number of seconds the new password will be valid.  Passing
zero as lifetime maintains current behavior (password never expires).

Signed-off-by: Gerd Hoffmannkra...@redhat.com
   


This has been posted before and I've never understood it.  Why can't a 
management tool just expire passwords on it's own?


How does password expiration help with security at all?

Regards,

Anthony Liguori


---
  console.h |2 +-
  monitor.c |3 +--
  ui/vnc.c  |   15 ++-
  ui/vnc.h  |1 +
  4 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/console.h b/console.h
index aafb031..24670e5 100644
--- a/console.h
+++ b/console.h
@@ -368,7 +368,7 @@ void cocoa_display_init(DisplayState *ds, int full_screen);
  void vnc_display_init(DisplayState *ds);
  void vnc_display_close(DisplayState *ds);
  int vnc_display_open(DisplayState *ds, const char *display);
-int vnc_display_password(DisplayState *ds, const char *password);
+int vnc_display_password(DisplayState *ds, const char *password, int lifetime);
  void do_info_vnc_print(Monitor *mon, const QObject *data);
  void do_info_vnc(Monitor *mon, QObject **ret_data);
  char *vnc_display_local_addr(DisplayState *ds);
diff --git a/monitor.c b/monitor.c
index fbb678d..d82eb9e 100644
--- a/monitor.c
+++ b/monitor.c
@@ -966,11 +966,10 @@ static int do_quit(Monitor *mon, const QDict *qdict, 
QObject **ret_data)

  static int change_vnc_password(const char *password)
  {
-if (vnc_display_password(NULL, password)  0) {
+if (vnc_display_password(NULL, password, 0)  0) {
  qerror_report(QERR_SET_PASSWD_FAILED);
  return -1;
  }
-
  return 0;
  }

diff --git a/ui/vnc.c b/ui/vnc.c
index 1ef0fc5..51aa9ca 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -2078,11 +2078,19 @@ static int protocol_client_auth_vnc(VncState *vs, 
uint8_t *data, size_t len)
  unsigned char response[VNC_AUTH_CHALLENGE_SIZE];
  int i, j, pwlen;
  unsigned char key[8];
+time_t now;

  if (!vs-vd-password || !vs-vd-password[0]) {
  VNC_DEBUG(No password configured on server);
  goto reject;
  }
+if (vs-vd-expires) {
+time(now);
+if (vs-vd-expires  now) {
+VNC_DEBUG(Password is expired);
+goto reject;
+}
+}

  memcpy(response, vs-challenge, VNC_AUTH_CHALLENGE_SIZE);

@@ -2474,7 +2482,7 @@ void vnc_display_close(DisplayState *ds)
  #endif
  }

-int vnc_display_password(DisplayState *ds, const char *password)
+int vnc_display_password(DisplayState *ds, const char *password, int lifetime)
  {
  VncDisplay *vs = ds ? (VncDisplay *)ds-opaque : vnc_display;

@@ -2492,6 +2500,11 @@ int vnc_display_password(DisplayState *ds, const char 
*password)
  if (vs-auth == VNC_AUTH_NONE) {
  vs-auth = VNC_AUTH_VNC;
  }
+if (lifetime) {
+vs-expires = time(NULL) + lifetime;
+} else {
+vs-expires = 0;
+}
  } else {
  vs-auth = VNC_AUTH_NONE;
  }
diff --git a/ui/vnc.h b/ui/vnc.h
index 9619b24..4f895be 100644
--- a/ui/vnc.h
+++ b/ui/vnc.h
@@ -120,6 +120,7 @@ struct VncDisplay

  char *display;
  char *password;
+time_t expires;
  int auth;
  bool lossy;
  #ifdef CONFIG_VNC_TLS
   





Re: [Qemu-devel] [PATCH 0/2] USB CCID device

2010-10-07 Thread Anthony Liguori

On 10/07/2010 02:39 AM, Alon Levy wrote:

- Anthony Liguorianth...@codemonkey.ws  wrote:

   

On 10/06/2010 03:55 AM, Gerd Hoffmann wrote:
 

On 10/06/10 02:28, Alon Levy wrote:
   
 

Does this work with live migration?  I can't see how it would.

   

No, it doesn't right now. It would require cooperation with the
 

client,
 

to tell it to reconnect to the target qemu (kind of like spice).
 

I think until we have this migration should have pretty much the
   

same
 

effect as a chardev disconnect, i.e. detach the usb device (which
   

the
 

guest will see as unplug).
   

Better yet, mark the guest as unmigrateable and let the management
tool
unplug the usb device before migration and replug it after migration.

It's the same principle behind device assignment.

 

Is there any way to also get a pre_migrate callback with 
register_device_unmigratable?
I'd like to send a VSC_Reconnect message, then the guest sees an unplug, then 
migration,
   


No.  The disconnect needs to happen in the management tooling layer.   
Same is true for any device doing hardware passthrough.


Automagic unplugging/plugging during migration is not a good idea 
universally so it's something that needs to happen as a policy at the 
management level.


Regards,

Anthony Liguori


then (no plug yet since the device is marked as auto_attach=0) client reconnects
(actually this happens before but to a paused machine waiting for migration), 
and then
causes attachement, same as a new machine.

   

Regards,

Anthony Liguori

 

Needs some code though, at minimum you'll have to xfer the connected
   
 

state from the migration source and have some bits in post_load()
which do attach/detach if needed.

cheers,
   Gerd

   





Re: [Qemu-devel] [PATCH 0/2] USB CCID device

2010-10-07 Thread Anthony Liguori

On 10/06/2010 05:12 PM, Alon Levy wrote:

Actually, both are possible - but the later is the interesting use case (the
former is mainly for debugging). To elaborate: the device is meant to allow
a hardware reader to be available to the guest while still being available to
the client (which is running on the computer with the real reader attached). So
the real card is what we are talking to. The other usage is to have a virtual
card, which would actually be more logical to put with the qemu device, but
It isn't my current focus (the focus being the real card, and the virtual card
being implemented in the client side is a testing measure).
   


Okay, that makes sense.


I'll do this.

A side note: I tried migrating a QSIMPLEQ today - not a fun experience. I'm
wondering if this is a result of a mentality that all devices should allocate
memory upfront that makes using / migrating dynamic linked lists a non-starter,
or just an oversight (no one wrote VMSTATE_QSIMPLEQ yet). As it stands migrating
a QSIMPLEQ (or any other list) is much easier the old way without using VMSTATE.
/rant.
   


Yeah, complex types are not at all easy to migrate today.  Something we 
need to tackle with vmstate2 eventually.


Regards,

Anthony Liguori


Regards,

Anthony Liguori
 





Re: [Qemu-devel] [PATCH] ceph/rbd block driver for qemu-kvm (v4)

2010-10-07 Thread Anthony Liguori

On 10/07/2010 03:47 PM, Yehuda Sadeh Weinraub wrote:

How is that possible?  Are the callbacks delivered in the context of a
different thread?  If so, don't you need locking?
 

Not sure I'm completely following you. The callbacks are delivered in
the context of a different thread, but won't run concurrently.


Concurrently to what?  How do you prevent them from running concurrently 
with qemu?


If you saw lock ups, I bet that's what it was from.

Regards,

Anthony Liguori


  Do you
see any specific concurrency issue? We can add some mutex protection
around at the aio callback, so that if librados turns multithreaded at
this point we're covered.


Thanks,
Yehuda

   





Re: [Qemu-devel] [PATCH] ceph/rbd block driver for qemu-kvm (v4)

2010-10-07 Thread Anthony Liguori

On 10/07/2010 04:49 PM, Yehuda Sadeh Weinraub wrote:

On Thu, Oct 7, 2010 at 2:04 PM, Anthony Liguorianth...@codemonkey.ws  wrote:
   

On 10/07/2010 03:47 PM, Yehuda Sadeh Weinraub wrote:
 

How is that possible?  Are the callbacks delivered in the context of a
different thread?  If so, don't you need locking?

 

Not sure I'm completely following you. The callbacks are delivered in
the context of a different thread, but won't run concurrently.
   

Concurrently to what?  How do you prevent them from running concurrently
with qemu?
 

There are two types of callbacks. The first is for rados aio
completions, and the second one is the one added later for the fd glue
layer.
   


This is a bad architecture for something like qemu.  You could create a 
pipe and use the pipe to signal to qemu.  Same principle as eventfd.  
Ideally, you would do this in the library itself.


Regards,

Anthony Liguori


The first callback, called by librados whenever aio completes, runs in
the context of a single librados thread:

+static void rbd_finish_aiocb(rados_completion_t c, RADOSCB *rcb)
+{
+RBDAIOCB *acb = rcb-acb;
rcb is per a single aio. Was created  before and will be destroyed
here, whereas acb is shared between a few aios, however, it was
generated before the first aio was created.

+int64_t r;
+uint64_t buf = 1;
+int i;
+
+acb-aiocnt--;

acb-aiocnt has been set before initiating all the aios, so it's ok to
touch it now. Same goes to all acb fields.

+r = rados_aio_get_return_value(c);
+rados_aio_release(c);
+if (acb-write) {
+if (r  0) {
+acb-ret = r;
+acb-error = 1;
+} else if (!acb-error) {
+acb-ret += rcb-segsize;
+}
+} else {
+if (r == -ENOENT) {
+memset(rcb-buf, 0, rcb-segsize);
+if (!acb-error) {
+acb-ret += rcb-segsize;
+}
+} else if (r  0) {
+acb-ret = r;
+acb-error = 1;
+} else if (r  rcb-segsize) {
+memset(rcb-buf + r, 0, rcb-segsize - r);
+if (!acb-error) {
+acb-ret += rcb-segsize;
+}
+} else if (!acb-error) {
+acb-ret += r;
+}
+}
+if (write(acb-s-efd,buf, sizeof(buf))  0)
This will wake up the io_read()

+error_report(failed writing to acb-s-efd\n);
+qemu_free(rcb);
+i = 0;
+if (!acb-aiocnt  acb-bh) {
+qemu_bh_schedule(acb-bh);
This is the only qemu related call in here, seems safe to call it.

+}
+}

The scheduled bh function will be called only after all aios that
relate to this specific aio set are done, so the following seems ok,
as there's no more acb references.
+static void rbd_aio_bh_cb(void *opaque)
+{
+RBDAIOCB *acb = opaque;
+uint64_t buf = 1;
+
+if (!acb-write) {
+qemu_iovec_from_buffer(acb-qiov, acb-bounce, acb-qiov-size);
+}
+qemu_vfree(acb-bounce);
+acb-common.cb(acb-common.opaque, (acb-ret  0 ? 0 : acb-ret));
+qemu_bh_delete(acb-bh);
+acb-bh = NULL;
+
+if (write(acb-s-efd,buf, sizeof(buf))  0)
+error_report(failed writing to acb-s-efd\n);
+qemu_aio_release(acb);
+}

Now, the second ones are the io_read(), in which we have our glue fd.
We send uint64 per each completed io

+static void rbd_aio_completion_cb(void *opaque)
+{
+BDRVRBDState *s = opaque;
+
+uint64_t val;
+ssize_t ret;
+
+do {
+if ((ret = read(s-efd,val, sizeof(val)))  0) {
+s-qemu_aio_count -= val;
There is an issue here with s-qemu_aio_count which needs to be
protected by a mutex. Other than that, it just reads from s-efd.

+   }
+} while (ret  0  errno == EINTR);
+
+return;
+}
+
+static int rbd_aio_flush_cb(void *opaque)
+{
+BDRVRBDState *s = opaque;
+
+return (s-qemu_aio_count  0);
Same here as with the previous one, needs a mutex around s-qemu_aio_count.

+}

   

If you saw lock ups, I bet that's what it was from.

 

As I explained before, before introducing the fd glue layer, the lack
of fd associated with our block device caused that there was no way
for qemu to check whether all aios were flushed or not, which didn't
work well when doing migration/savevm.

Thanks,
Yehuda
   





Re: [Qemu-devel] [PATCH] ceph/rbd block driver for qemu-kvm (v4)

2010-10-08 Thread Anthony Liguori

On 10/07/2010 05:45 PM, Sage Weil wrote:

On Thu, 7 Oct 2010, Anthony Liguori wrote:

   

On 10/07/2010 04:49 PM, Yehuda Sadeh Weinraub wrote:
 

On Thu, Oct 7, 2010 at 2:04 PM, Anthony Liguorianth...@codemonkey.ws
wrote:

   

On 10/07/2010 03:47 PM, Yehuda Sadeh Weinraub wrote:

 

How is that possible?  Are the callbacks delivered in the context of a
different thread?  If so, don't you need locking?


 

Not sure I'm completely following you. The callbacks are delivered in
the context of a different thread, but won't run concurrently.

   

Concurrently to what?  How do you prevent them from running concurrently
with qemu?

 

There are two types of callbacks. The first is for rados aio
completions, and the second one is the one added later for the fd glue
layer.

   

This is a bad architecture for something like qemu.  You could create a
pipe and use the pipe to signal to qemu.  Same principle as eventfd.
Ideally, you would do this in the library itself.
 

I'm sorry, I'm having a hard time understanding what it is you're
objecting to, or what you would prefer, as there are two different things
we're talking about here (callbacks and fd glue/pipes).  (Please bear with
me as I am not a qemu expert!)

The first is the aio completion.  You said a few messages back:

   

It looks like you just use the eventfd to signal aio completion
callbacks.  A better way to do this would be to schedule a bottom half.
 

This is what we're doing.  The librados makes a callback to rbd.c's
rbd_finish_aiocb(), which updates some internal rbd accounting and then
calls qemu_bh_schedule().  Is that part right?
   


No.  You're calling qemu_bh_schedule() in a separate thread in parallel 
to other operations.


That's absolutely not safe.


The second part is an fd (currently created via eventfd(), but I don't
think it matters where it comes from) that was later added because
qemu_aio_flush() wouldn't trigger when our aio's completed (and scheduled
the bottom halves).  This was proposed by Simone Gotti, who had problems
with live migration:

http://www.mail-archive.com/qemu-devel@nongnu.org/msg35516.html

Apparently calling the bottom half isn't sufficient to wake up a blocked
qemu_aio_flush()?  His solution was to create an eventfd() fd, write a
word to it in the aio completion callback (before we schedule the bh), and
add the necessary callbacks to make qemu_aio_flush() behave.

Is the problem simply that we should be using pipe(2) instead of
eventfd(2)?

So far I've heard that we should be scheduling the bottom halves (we are),
and we should be using a pipe to signal qemu (we're using an fd created by
eventfd(2)).
   


Your fundamental problem is your use of threads.  QEMU is single 
threaded.  You cannot call into QEMU code from another thread without 
introducing locking.  Any other solution is going to be intrinsically 
broken.


There are two possibilities to fix this:

1) You can change your library interface so that it doesn't generate 
callbacks via threads.  That would be my preference because I think it's 
a bad interface but it's your library so it's not really my choice :-)


2) You can limit the callbacks to doing nothing other than writing to a 
file descriptor.  You then read the file descriptor somewhere else in 
the normal QEMU code and you can use the file descriptor to get 
signals.  If you're passing data to callbacks, it's much harder because 
you're going to have to store that data somewhere and inevitably require 
locking.


The complexity of (2) is why I think thread-based callbacks is such a 
bad interface.


Regards,

Anthony Liguori


Thanks,
sage




   

Regards,

Anthony Liguori

 

The first callback, called by librados whenever aio completes, runs in
the context of a single librados thread:

+static void rbd_finish_aiocb(rados_completion_t c, RADOSCB *rcb)
+{
+RBDAIOCB *acb = rcb-acb;
rcb is per a single aio. Was created  before and will be destroyed
here, whereas acb is shared between a few aios, however, it was
generated before the first aio was created.

+int64_t r;
+uint64_t buf = 1;
+int i;
+
+acb-aiocnt--;

acb-aiocnt has been set before initiating all the aios, so it's ok to
touch it now. Same goes to all acb fields.

+r = rados_aio_get_return_value(c);
+rados_aio_release(c);
+if (acb-write) {
+if (r   0) {
+acb-ret = r;
+acb-error = 1;
+} else if (!acb-error) {
+acb-ret += rcb-segsize;
+}
+} else {
+if (r == -ENOENT) {
+memset(rcb-buf, 0, rcb-segsize);
+if (!acb-error) {
+acb-ret += rcb-segsize;
+}
+} else if (r   0) {
+acb-ret = r;
+acb-error = 1;
+} else if (r   rcb-segsize) {
+memset(rcb-buf + r, 0, rcb-segsize - r);
+if (!acb-error) {
+acb-ret += rcb-segsize

Re: [Qemu-devel] [PATCH] ceph/rbd block driver for qemu-kvm (v4)

2010-10-08 Thread Anthony Liguori

On 10/08/2010 10:50 AM, Yehuda Sadeh Weinraub wrote:

Oh, that makes it more clean. Considering that we did it for kvm, and
looking at the kvm qemu_bh_schedule() implementation, it does look
thread safe (there might be an issue though with canceling the bh
though, haven't looked at it, not really relevant).


It's definitely not thread safe.  Even though you can set the flag 
atomically (not guaranteed, but assume you can), we rely on the fact 
that we can check for pending BHs before entering sleep without having 
to worry about new BHs being scheduled in between the sleep and the 
check.  If you schedule a BH in a thread then you open yourself up to 
the race.


Regards,

Anthony Liguori




[Qemu-devel] Re: [PATCH v2 1/7] qcow2: Make get_bits_from_size() common

2010-10-08 Thread Anthony Liguori

On 10/08/2010 10:48 AM, Stefan Hajnoczi wrote:

The get_bits_from_size() calculates the log base-2 of a number.  This is
useful in bit manipulation code working with power-of-2s.

Currently used by qcow2 and needed by qed in a follow-on patch.

Signed-off-by: Stefan Hajnoczistefa...@linux.vnet.ibm.com
   


Acked-by: Anthony Liguori aligu...@us.ibm.com

Regards,

Anthony Liguori


---
  block/qcow2.c |   22 --
  cutils.c  |   18 ++
  qemu-common.h |1 +
  3 files changed, 19 insertions(+), 22 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index ee3481b..6e25812 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -794,28 +794,6 @@ static int qcow2_change_backing_file(BlockDriverState *bs,
  return qcow2_update_ext_header(bs, backing_file, backing_fmt);
  }

-static int get_bits_from_size(size_t size)
-{
-int res = 0;
-
-if (size == 0) {
-return -1;
-}
-
-while (size != 1) {
-/* Not a power of two */
-if (size  1) {
-return -1;
-}
-
-size= 1;
-res++;
-}
-
-return res;
-}
-
-
  static int preallocate(BlockDriverState *bs)
  {
  uint64_t nb_sectors;
diff --git a/cutils.c b/cutils.c
index 5883737..6c32198 100644
--- a/cutils.c
+++ b/cutils.c
@@ -283,3 +283,21 @@ int fcntl_setfl(int fd, int flag)
  }
  #endif

+/**
+ * Get the number of bits for a power of 2
+ *
+ * The following is true for powers of 2:
+ *   n == 1  get_bits_from_size(n)
+ */
+int get_bits_from_size(size_t size)
+{
+if (size == 0 || (size  (size - 1))) {
+return -1;
+}
+
+#if defined(_WIN32)  defined(__x86_64__)
+return __builtin_ctzll(size);
+#else
+return __builtin_ctzl(size);
+#endif
+}
diff --git a/qemu-common.h b/qemu-common.h
index 81aafa0..e0ca398 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -153,6 +153,7 @@ time_t mktimegm(struct tm *tm);
  int qemu_fls(int i);
  int qemu_fdatasync(int fd);
  int fcntl_setfl(int fd, int flag);
+int get_bits_from_size(size_t size);

  /* path.c */
  void init_paths(const char *prefix);
   





Re: [Qemu-devel] [PATCH] monitor: add usb_detach

2010-10-08 Thread Anthony Liguori
How is this different than usb_del?  Is it that it detaches it but does 
not delete the device?


Regards,

Anthony Liguori

On 10/05/2010 09:40 AM, Alon Levy wrote:

Signed-off-by: Alon Levyal...@redhat.com

---
  qemu-monitor.hx |   17 +
  sysemu.h|1 +
  vl.c|   41 +
  3 files changed, 59 insertions(+), 0 deletions(-)

diff --git a/qemu-monitor.hx b/qemu-monitor.hx
index 49bcd8d..9c792a9 100644
--- a/qemu-monitor.hx
+++ b/qemu-monitor.hx
@@ -711,6 +711,23 @@ command @code{info usb} to see the devices you can remove.
  ETEXI

  {
+.name   = usb_detach,
+.args_type  = devname:s,
+.params = device,
+.help   = detach USB device 'bus.addr',
+.mhandler.cmd = do_usb_detach,
+},
+
+STEXI
+...@item usb_detach @var{devname}
+...@findex usb_detach
+
+Detach the USB device @var{devname} from the QEMU virtual USB
+hub. @var{devname} has the syntax @code{bus.addr}. Use the monitor
+command @code{info usb} to see the devices you can detach.
+ETEXI
+
+{
  .name   = device_add,
  .args_type  = device:O,
  .params = driver[,prop=value][,...],
diff --git a/sysemu.h b/sysemu.h
index a1f6466..ac68863 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -183,6 +183,7 @@ extern struct soundhw soundhw[];

  void do_usb_add(Monitor *mon, const QDict *qdict);
  void do_usb_del(Monitor *mon, const QDict *qdict);
+void do_usb_detach(Monitor *mon, const QDict *qdict);
  void usb_info(Monitor *mon);

  void rtc_change_mon_event(struct tm *tm);
diff --git a/vl.c b/vl.c
index d352d18..6cfa009 100644
--- a/vl.c
+++ b/vl.c
@@ -891,6 +891,47 @@ void do_usb_del(Monitor *mon, const QDict *qdict)
  }
  }

+static USBDevice *usb_device_from_bus_dot_addr(const char *devname)
+{
+int bus_num, addr;
+const char *p;
+USBBus *bus;
+USBPort *port;
+
+if (!usb_enabled) {
+return NULL;
+}
+p = strchr(devname, '.');
+if (!p) {
+return NULL;
+}
+bus_num = strtoul(devname, NULL, 0);
+addr = strtoul(p + 1, NULL, 0);
+bus = usb_bus_find(bus_num);
+if (!bus) {
+return NULL;
+}
+QTAILQ_FOREACH(port,bus-used, next) {
+if (port-dev-addr == addr) {
+break;
+}
+}
+if (!port) {
+return NULL;
+}
+return port-dev;
+}
+
+void do_usb_detach(Monitor *mon, const QDict *qdict)
+{
+const char *devname = qdict_get_str(qdict, devname);
+USBDevice *dev = usb_device_from_bus_dot_addr(devname);
+
+if (dev == NULL || usb_device_detach(dev)  0) {
+error_report(could not detach USB device '%s', devname);
+}
+}
+
  /***/
  /* PCMCIA/Cardbus */

   





[Qemu-devel] [PATCH] net: provide a friendly message when a user passes a bad -net tap, fd=X

2010-10-08 Thread Anthony Liguori
A lot of people copy libvirt's command line from ps -ef and then wonder why the
VM isn't working correctly.  Let's be kind and tell them what they should do
instead.

Without this patch, if you run with an invalid -net tap,fd=X, the guest still
runs and we poll 100% on a bad file descriptor.  With this patch, you get:

 qemu: -net tap,fd=42: invalid fd= for tap network device.  If you're copying
 from libvirt, use `virsh dom2xml-to-native' instead
 qemu: -net tap,fd=42: Device 'tap' could not be initialized

Signed-off-by: Anthony Liguori aligu...@us.ibm.com

diff --git a/hw/usb-uhci.c b/hw/usb-uhci.c
index 1d83400..5f9f749 100644
--- a/hw/usb-uhci.c
+++ b/hw/usb-uhci.c
@@ -64,7 +64,7 @@
 #define UHCI_PORT_CSC   (1  1)
 #define UHCI_PORT_CCS   (1  0)
 
-#define FRAME_TIMER_FREQ 1000
+#define FRAME_TIMER_FREQ 500
 
 #define FRAME_MAX_LOOPS  100
 
@@ -1054,7 +1054,7 @@ static void uhci_frame_timer(void *opaque)
 UHCIState *s = opaque;
 
 /* prepare the timer for the next frame */
-s-expire_time += (get_ticks_per_sec() / FRAME_TIMER_FREQ);
+s-expire_time = qemu_get_clock(vm_clock) + (get_ticks_per_sec() / 
FRAME_TIMER_FREQ);
 
 if (!(s-cmd  UHCI_CMD_RS)) {
 /* Full stop */
diff --git a/net/tap.c b/net/tap.c
index 0147dab..45b1fb6 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -405,6 +405,8 @@ int net_init_tap(QemuOpts *opts, Monitor *mon, const char 
*name, VLANState *vlan
 int fd, vnet_hdr = 0;
 
 if (qemu_opt_get(opts, fd)) {
+int flags;
+
 if (qemu_opt_get(opts, ifname) ||
 qemu_opt_get(opts, script) ||
 qemu_opt_get(opts, downscript) ||
@@ -418,6 +420,11 @@ int net_init_tap(QemuOpts *opts, Monitor *mon, const char 
*name, VLANState *vlan
 return -1;
 }
 
+if (fcntl(fd, F_GETFL, flags) == -1) {
+error_report(invalid fd= for tap network device.  If you're 
copying from libvirt, use `virsh dom2xml-to-native' instead);
+return -1;
+}
+
 fcntl(fd, F_SETFL, O_NONBLOCK);
 
 vnet_hdr = tap_probe_vnet_hdr(fd);
-- 
1.7.0.4




Re: [Qemu-devel] [PATCH] net: provide a friendly message when a user passes a bad -net tap, fd=X

2010-10-08 Thread Anthony Liguori

On 10/08/2010 06:28 PM, Alexander Graf wrote:

On 09.10.2010, at 00:04, Anthony Liguori wrote:

   

A lot of people copy libvirt's command line from ps -ef and then wonder why the
VM isn't working correctly.  Let's be kind and tell them what they should do
instead.

Without this patch, if you run with an invalid -net tap,fd=X, the guest still
runs and we poll 100% on a bad file descriptor.  With this patch, you get:

qemu: -net tap,fd=42: invalid fd= for tap network device.  If you're copying
from libvirt, use `virsh dom2xml-to-native' instead
qemu: -net tap,fd=42: Device 'tap' could not be initialized

Signed-off-by: Anthony Liguorialigu...@us.ibm.com

diff --git a/hw/usb-uhci.c b/hw/usb-uhci.c
index 1d83400..5f9f749 100644
--- a/hw/usb-uhci.c
+++ b/hw/usb-uhci.c
@@ -64,7 +64,7 @@
#define UHCI_PORT_CSC   (1  1)
#define UHCI_PORT_CCS   (1  0)

-#define FRAME_TIMER_FREQ 1000
+#define FRAME_TIMER_FREQ 500

#define FRAME_MAX_LOOPS  100

@@ -1054,7 +1054,7 @@ static void uhci_frame_timer(void *opaque)
 UHCIState *s = opaque;

 /* prepare the timer for the next frame */
-s-expire_time += (get_ticks_per_sec() / FRAME_TIMER_FREQ);
+s-expire_time = qemu_get_clock(vm_clock) + (get_ticks_per_sec() / 
FRAME_TIMER_FREQ);
 

How exactly is this related?
   


So, didn't have a clean working directory.

Regards,

Anthony Liguori


Alex


   





[Qemu-devel] [PATCH] net: provide a friendly message when a user passes a bad -net tap, fd=X (v2)

2010-10-08 Thread Anthony Liguori
A lot of people copy libvirt's command line from ps -ef and then wonder why the
VM isn't working correctly.  Let's be kind and tell them what they should do
instead.

Without this patch, if you run with an invalid -net tap,fd=X, the guest still
runs and we poll 100% on a bad file descriptor.  With this patch, you get:

 qemu: -net tap,fd=42: invalid fd= for tap network device.  If you're copying
 from libvirt, use `virsh dom2xml-to-native' instead
 qemu: -net tap,fd=42: Device 'tap' could not be initialized

Signed-off-by: Anthony Liguori aligu...@us.ibm.com
---
v1 - v2
 - Remove garbage from unclean commit (spotted by Alex Graf)

diff --git a/net/tap.c b/net/tap.c
index 0147dab..45b1fb6 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -405,6 +405,8 @@ int net_init_tap(QemuOpts *opts, Monitor *mon, const char 
*name, VLANState *vlan
 int fd, vnet_hdr = 0;
 
 if (qemu_opt_get(opts, fd)) {
+int flags;
+
 if (qemu_opt_get(opts, ifname) ||
 qemu_opt_get(opts, script) ||
 qemu_opt_get(opts, downscript) ||
@@ -418,6 +420,11 @@ int net_init_tap(QemuOpts *opts, Monitor *mon, const char 
*name, VLANState *vlan
 return -1;
 }
 
+if (fcntl(fd, F_GETFL, flags) == -1) {
+error_report(invalid fd= for tap network device.  If you're 
copying from libvirt, use `virsh dom2xml-to-native' instead);
+return -1;
+}
+
 fcntl(fd, F_SETFL, O_NONBLOCK);
 
 vnet_hdr = tap_probe_vnet_hdr(fd);
-- 
1.7.0.4




Re: [Qemu-devel] Re: [PATCH v2 6/7] qed: Read/write support

2010-10-11 Thread Anthony Liguori

On 10/11/2010 08:10 AM, Avi Kivity wrote:

 On 10/11/2010 12:37 PM, Stefan Hajnoczi wrote:

On Sun, Oct 10, 2010 at 11:10:15AM +0200, Avi Kivity wrote:
   On 10/08/2010 05:48 PM, Stefan Hajnoczi wrote:
 This patch implements the read/write state machine.  Operations are
 fully asynchronous and multiple operations may be active at any time.
 
 Allocating writes lock tables to ensure metadata updates do not
 interfere with each other.  If two allocating writes need to 
update the
 same L2 table they will run sequentially.  If two allocating 
writes need

 to update different L2 tables they will run in parallel.
 

  Shouldn't there be a flush between an allocating write and an L2
  update?  Otherwise a reuse of a cluster can move logical sectors
  from one place to another, causing a data disclosure.

  Can be skipped if the new cluster is beyond the physical image size.

Currently clusters are never reused and new clusters are always beyond
physical image size.  The only exception I can think of is when the
image file size is not a multiple of the cluster size and we round down
to the start of cluster.

In an implementation that supports TRIM or otherwise reuses clusters
this is a cost.

 +
 +/*
 + * Table locking works as follows:
 + *
 + * Reads and non-allocating writes do not acquire locks because 
they do not

 + * modify tables and only see committed L2 cache entries.

  What about a non-allocating write that follows an allocating write?

  1 Guest writes to sector 0
  2 Host reads backing image (or supplies zeros), sectors 1-127
  3 Host writes sectors 0-127
  4 Guest writes sector 1
  5 Host writes sector 1

  There needs to be a barrier that prevents the host and the disk from
  reordering operations 3 and 5, or guest operation 4 is lost.  As far
  as the guest is concerned no overlapping writes were issued, so it
  isn't required to provide any barriers.

  (based on the comment only, haven't read the code)

There is no barrier between operations 3 and 5.  However, operation 5
only starts after operation 3 has completed because of table locking.
It is my understanding that *independent* requests may be reordered but
two writes to the *same* sector will not be reordered if write A
completes before write B is issued.

Imagine a test program that uses pwrite() to rewrite a counter many
times on disk.  When the program finishes it prints the counter
variable's last value.  This scenario is like operations 3 and 5 above.
If we read the counter back from disk it will be the final value, not
some intermediate value.  The writes will not be reordered.


Yes, all that is needed is a barrier in program order.  So, operation 
5 is an allocating write as long as 3 hasn't returned? (at which point 
it becomes a non-allocating write)?


Correct.  The table lock in 0 is held until the request completes fully 
(including the write out of all of the fill data--step 3) which means 5 
will not begin until 3 has completed


Regards,

Anthony Liguori




[Qemu-devel] Re: [PATCH v2 3/7] docs: Add QED image format specification

2010-10-11 Thread Anthony Liguori

On 10/11/2010 08:44 AM, Avi Kivity wrote:

 On 10/11/2010 03:42 PM, Stefan Hajnoczi wrote:


  A leak is acceptable (it won't grow; it's just an unused, incorrect
  freelist), but data corruption is not.

The alternative is for the freelist to be a non-compat feature bit.
That means older QEMU binaries cannot use a QED image that has enabled
the freelist.


For this one feature.  What about others?


A compat feature is one where the feature can be completely ignored 
(meaning that the QEMU does not have to understand the data format).


An example of a compat feature is copy-on-read.  It's merely a 
suggestion and there is no additional metadata.  If a QEMU doesn't 
understand it, it doesn't affect it's ability to read the image.


An example of a non-compat feature would be zero cluster entries.  Zero 
cluster entries are a special L2 table entry that indicates that a 
cluster's on-disk data is all zeros.  As long as there is at least 1 ZCE 
in the L2 tables, this feature bit must be set.  As soon as all of the 
ZCE bits are cleared, the feature bit can be unset.


An older QEMU will gracefully fail when presented with an image using 
ZCE bits.  An image with no ZCEs will work on older QEMUs.


Regards,

Anthony Liguori





Re: [Qemu-devel] Re: [PATCH v2 3/7] docs: Add QED image format specification

2010-10-11 Thread Anthony Liguori

On 10/11/2010 08:04 AM, Avi Kivity wrote:

 On 10/11/2010 12:09 PM, Stefan Hajnoczi wrote:

On Sun, Oct 10, 2010 at 11:20:09AM +0200, Avi Kivity wrote:
   On 10/08/2010 05:48 PM, Stefan Hajnoczi wrote:
 Signed-off-by: Stefan Hajnoczistefa...@linux.vnet.ibm.com
 ---
docs/specs/qed_spec.txt |   94 
+++

1 files changed, 94 insertions(+), 0 deletions(-)
 
 +Feature bits:
 +* QED_F_BACKING_FILE = 0x01.  The image uses a backing file.
 +* QED_F_NEED_CHECK = 0x02.  The image needs a consistency check 
before use.


  Great that QED_F_NEED_CHECK is now non-optional.  However, suppose
  we add a new structure (e.g. persistent freelist); it's now
  impossible to tell whether the structure is updated or not:

  1 new qemu opens image
  2 writes persistent freelist
  3 clears need_check
  4 shuts down
  5 old qemu opens image
  6 doesn't update persistent freelist
  7 clears need_check
  8 shuts down

  The image is now inconsistent, but has need_check cleared.

  We can address this by having a third feature bitmask that is
  autocleared by guests that don't recognize various bits; so the
  sequence becomes:

  1 new qemu opens image
  2 writes persistent freelist
  3 clears need_check
  4 sets persistent_freelist
  5 shuts down
  6 old qemu opens image
  7 clears persistent_freelist (and any other bits it doesn't 
recognize)

  8 doesn't update persistent freelist
  9 clears need_check
  10 shuts down

  The image is now consistent, since the persistent freelist has 
disappeared.


It is more complicated than just the feature bit.  The freelist requires
space in the image file.  Clearing the persistent_freelist bit leaks the
freelist.

We can solve this by using a compat feature bit and an autoclear feature
bit.  The autoclear bit specifies whether or not the freelist is valid
and the compat feature bit specifices whether or not the freelist
exists.

When the new qemu opens the image again it notices that the autoclear
bit is unset but the compat bit is set.  This means the freelist is
out-of-date and its space can be reclaimed.

I don't like the idea of doing these feature bit acrobatics because they
add complexity.  On the other hand your proposal ensures backward
compatibility in the case of an optional data structure that needs to
stay in sync with the image.  I'm just not 100% convinced it's worth it.


My scenario ends up with data corruption if we move to an old qemu and 
then back again, without any aborts.


A leak is acceptable (it won't grow; it's just an unused, incorrect 
freelist), but data corruption is not.


A leak is unacceptable.  It means an image can grow to an unbounded 
size.  If you are a server provider offering multitenancy, then a 
malicious guest can potentially grow the image beyond it's allotted size 
causing a Denial of Service attack against another tenant.


A freelist has to be a non-optional feature.  When the freelist bit is 
set, an older QEMU cannot read the image.  If the freelist is completed 
used, the freelist bit can be cleared and the image is then usable by 
older QEMUs.


Regards,

Anthony Liguori





Re: [Qemu-devel] Re: [PATCH v2 3/7] docs: Add QED image format specification

2010-10-11 Thread Anthony Liguori

On 10/11/2010 09:58 AM, Avi Kivity wrote:
A leak is unacceptable.  It means an image can grow to an unbounded 
size.  If you are a server provider offering multitenancy, then a 
malicious guest can potentially grow the image beyond it's allotted 
size causing a Denial of Service attack against another tenant.



This particular leak cannot grow, and is not controlled by the guest.


As the image gets moved from hypervisor to hypervisor, it can keep 
growing if given a chance to fill up the disk, then trim it all way.


In a mixed hypervisor environment, it just becomes a numbers game.

A freelist has to be a non-optional feature.  When the freelist bit 
is set, an older QEMU cannot read the image.  If the freelist is 
completed used, the freelist bit can be cleared and the image is then 
usable by older QEMUs.


Once we support TRIM (or detect zeros) we'll never have a clean freelist.


Zero detection doesn't add to the free list.

A potential solution here is to treat TRIM a little differently than 
we've been discussing.


When TRIM happens, don't immediately write an unallocated cluster entry 
for the L2.  Leave the L2 entry in-tact.  Don't actually write a UCE to 
the L2 until you actually allocate the block.


This implies a cost because you'll need to do metadata syncs to make 
this work.  However, that eliminates leakage.


Regards,

Anthony Liguori





[Qemu-devel] Re: [PATCH v2 3/7] docs: Add QED image format specification

2010-10-11 Thread Anthony Liguori

On 10/11/2010 10:24 AM, Avi Kivity wrote:

 On 10/11/2010 05:02 PM, Anthony Liguori wrote:

On 10/11/2010 08:44 AM, Avi Kivity wrote:

 On 10/11/2010 03:42 PM, Stefan Hajnoczi wrote:


  A leak is acceptable (it won't grow; it's just an unused, incorrect
  freelist), but data corruption is not.

The alternative is for the freelist to be a non-compat feature bit.
That means older QEMU binaries cannot use a QED image that has enabled
the freelist.


For this one feature.  What about others?


A compat feature is one where the feature can be completely ignored 
(meaning that the QEMU does not have to understand the data format).


An example of a compat feature is copy-on-read.  It's merely a 
suggestion and there is no additional metadata.  If a QEMU doesn't 
understand it, it doesn't affect it's ability to read the image.


An example of a non-compat feature would be zero cluster entries.  
Zero cluster entries are a special L2 table entry that indicates that 
a cluster's on-disk data is all zeros.  As long as there is at least 
1 ZCE in the L2 tables, this feature bit must be set.  As soon as all 
of the ZCE bits are cleared, the feature bit can be unset.


An older QEMU will gracefully fail when presented with an image using 
ZCE bits.  An image with no ZCEs will work on older QEMUs.




What's the motivation behind ZCE?


It's very useful for Copy-on-Read.  If the cluster in the backing file 
is unallocated, then when you do a copy-on-read, you don't want to write 
out a zero cluster since you'd expand the image to it's maximum size.


It's also useful for operations like compaction in the absence of TRIM.  
The common implementation on platforms like VMware is to open a file and 
write zeros to it until it fills up the filesystem.  You then delete the 
file.  The result is that any unallocated data on the disk is written as 
zero and combined with zero-detection in the image format, you can 
compact the image size by marking unallocated blocks as ZCE.


There is yet a third type of feature, one which is not strictly needed 
in order to use the image, but if used, must be kept synchronized.  An 
example is the freelist.  Another example is a directory index for a 
filesystem.  I can't think of another example which would be relevant 
to QED -- metadata checksums perhaps? -- we can always declare it a 
non-compatible feature, but of course, it reduces compatibility.


You're suggesting a feature that is not strictly needed, but that needs 
to be kept up to date.  If it can't be kept up to date, something needs 
to happen to remove it.  Let's call this a transient feature.


Most of the transient features can be removed given some bit of code.  
For instance, ZCE can be removed by writing out zero clusters or writing 
an unallocated L2 entry if there is no backing file.


I think we could add a qemu-img demote command or something like that 
that attempted to remove features when possible.  That doesn't give you 
instant compatibility but I'm doubtful that you can come up with a 
generic way to remove a feature from an image without knowing anything 
about the image.


Regards,

Anthony Liguori










Re: [Qemu-devel] Re: [PATCH v2 3/7] docs: Add QED image format specification

2010-10-11 Thread Anthony Liguori

On 10/11/2010 10:46 AM, Stefan Hajnoczi wrote:

On Mon, Oct 11, 2010 at 05:39:01PM +0200, Avi Kivity wrote:
   

  On 10/11/2010 05:30 PM, Stefan Hajnoczi wrote:
 

  It was discussed before, but I don't think we came to a conclusion. Are
  there any circumstances under which you don't want to set the
  QED_CF_BACKING_FORMAT flag?
 

I suggest the following:

QED_CF_BACKING_FORMAT_RAW = 0x1

When set, the backing file is a raw image and should not be probed for
its file format.  The default (unset) means that the backing image file
format may be probed.

Now the backing_fmt_{offset,size} are no longer necessary.
   

Should it not be an incompatible option?  If the backing disk starts
with a format magic, it will be probed by an older qemu,
incorrectly.
 

Agreed, it should be a non-compat feature bit.
   


If it's just raw or not raw, then I agree it should be non-compat.

I think we just need a feature bit then that indicates that the backing 
file is non-probeable which certainly simplifies the implementation.


QED_F_BACKING_FORMAT_NOPROBE maybe?

Regards,

Anthony Liguori


Stefan

   





Re: [Qemu-devel] Re: [PATCH v2 3/7] docs: Add QED image format specification

2010-10-11 Thread Anthony Liguori

On 10/11/2010 11:02 AM, Avi Kivity wrote:

 On 10/11/2010 05:49 PM, Anthony Liguori wrote:

On 10/11/2010 09:58 AM, Avi Kivity wrote:
A leak is unacceptable.  It means an image can grow to an unbounded 
size.  If you are a server provider offering multitenancy, then a 
malicious guest can potentially grow the image beyond it's allotted 
size causing a Denial of Service attack against another tenant.



This particular leak cannot grow, and is not controlled by the guest.


As the image gets moved from hypervisor to hypervisor, it can keep 
growing if given a chance to fill up the disk, then trim it all way.


In a mixed hypervisor environment, it just becomes a numbers game.


I don't see how it can grow.  Both the freelist and the clusters it 
points to consume space, which becomes a leak once you move it to a 
hypervisor that doesn't understand the freelist.  The older hypervisor 
then allocates new blocks.  As soon as it performs a metadata scan (if 
ever), the freelist is reclaimed.


Assume you don't ever do a metadata scan (which is really our design point).

If you move to a hypervisor that doesn't support it, then move to a 
hypervisor that does, you create a brand new freelist and start leaking 
more space.  This isn't a contrived scenario if you have a cloud 
environment with a mix of hosts.


You might not be able to get a ping-pong every time you provision, but 
with enough effort, you could create serious problems.


It's really an issue of correctness.  Making correctness trade-offs for 
the purpose of compatibility is a policy decision and not something we 
should bake into an image format.  If a tool feels strongly that it's a 
reasonable trade off to make, it can always fudge the feature bits itself.




A freelist has to be a non-optional feature.  When the freelist bit 
is set, an older QEMU cannot read the image.  If the freelist is 
completed used, the freelist bit can be cleared and the image is 
then usable by older QEMUs.


Once we support TRIM (or detect zeros) we'll never have a clean 
freelist.


Zero detection doesn't add to the free list.


Why not?  If a cluster is zero filled, you may drop it (assuming no 
backing image).


Sorry, I was thinking about the case of copy-on-read.  When you 
transition from UCE - ZCE, nothing gets added to the free list.  But if 
you go from allocated - ZCE, then you would add to the free list.




A potential solution here is to treat TRIM a little differently than 
we've been discussing.


When TRIM happens, don't immediately write an unallocated cluster 
entry for the L2.  Leave the L2 entry in-tact.  Don't actually write 
a UCE to the L2 until you actually allocate the block.


This implies a cost because you'll need to do metadata syncs to make 
this work.  However, that eliminates leakage.


The information is lost on shutdown; and you can have a large number 
of unallocated-in-waiting clusters (like a TRIM issued by mkfs, or a 
user expecting a visit from RIAA).


A slight twist on your proposal is to have an allocated-but-may-drop 
bit in a L2 entry.  TRIM or zero detection sets the bit (leaving the 
cluster number intact).  A following write to the cluster needs to 
clear the bit; if we reallocate the cluster we need to replace it with 
a ZCE.


Yeah, this is sort of what I was thinking.  You would still want a free 
list but it becomes totally optional because if it's lost, no data is 
leaked (assuming that the older version understands the bit).


I was suggesting that we store that bit in the free list though because 
that let's us support having older QEMUs with absolutely no knowledge 
still work.


This makes the freelist all L2 entries with the bit set; it may be 
less efficient than a custom data structure though.


We still want the freelist to avoid recreating it.  We also want to 
store the allocated-but-may-drop bit in the free list.


Regards,

Anthony Liguori





Re: [Qemu-devel] Re: [PATCH v2 3/7] docs: Add QED image format specification

2010-10-11 Thread Anthony Liguori

On 10/11/2010 11:18 AM, Anthony Liguori wrote:

On 10/11/2010 10:46 AM, Stefan Hajnoczi wrote:

On Mon, Oct 11, 2010 at 05:39:01PM +0200, Avi Kivity wrote:

  On 10/11/2010 05:30 PM, Stefan Hajnoczi wrote:
  It was discussed before, but I don't think we came to a 
conclusion. Are

  there any circumstances under which you don't want to set the
  QED_CF_BACKING_FORMAT flag?

I suggest the following:

QED_CF_BACKING_FORMAT_RAW = 0x1

When set, the backing file is a raw image and should not be probed for
its file format.  The default (unset) means that the backing image 
file

format may be probed.

Now the backing_fmt_{offset,size} are no longer necessary.

Should it not be an incompatible option?  If the backing disk starts
with a format magic, it will be probed by an older qemu,
incorrectly.

Agreed, it should be a non-compat feature bit.


If it's just raw or not raw, then I agree it should be non-compat.

I think we just need a feature bit then that indicates that the 
backing file is non-probeable which certainly simplifies the 
implementation.


QED_F_BACKING_FORMAT_NOPROBE maybe?


Er, thinking more, this is still a good idea but we still need 
QED_CF_BACKING_FORMAT because we specifically need to know when a 
protocol is specified.  Otherwise, we have no way of doing nbd as a 
backing file.


Regards,

Anthony Liguori


Regards,

Anthony Liguori


Stefan








Re: [Qemu-devel] [RFC] Passing boot order from qemu to seabios

2010-10-11 Thread Anthony Liguori

On 10/11/2010 10:52 AM, Stefan Hajnoczi wrote:

2010/10/11 Gleb Natapovg...@redhat.com:
   

On Mon, Oct 11, 2010 at 01:48:09PM +0100, Stefan Hajnoczi wrote:
 

On Mon, Oct 11, 2010 at 12:16 PM, Bernhard Kohlbernhard.k...@nsn.com  wrote:
   

Am 11.10.2010 12:18, schrieb ext Gleb Natapov:
 

Currently if VM is started with multiple disks it is almost impossible to
guess which one of them will be used as boot device especially if there
is a mix of ATA/virtio/SCSI devices. Essentially BIOS decides the order
and without looking into the code you can't tell what the order will
be (and in qemu-kvm if boot=on is used it brings even more havoc). We
should allow fine-grained control of boot order from qemu command line,
or as a minimum control what device will be used for booting.

To do that along with inventing syntax to specify boot order on qemu
command line we need to communicate boot order to seabios via fw_cfg
interface. For that we need to have a way to unambiguously specify a
disk from qemu to seabios.  PCI bus address is not enough since not all
devices are PCI (do we care about them?) and since one PCI device may
control more then one disk (ATA slave/master, SCSI LUNs). We can do what
EDD specification does. Describe disk as:
 bus type (isa/pci),
 address on a bus (16 bit base address for isa, b/s/f for pci)
 device type (ATA/SCSI/VIRTIO)
 device path (slave/master for ATA, LUN for SCSI, nothing for virtio)

Will it cover all use cased? Any other ideas?
   

I think this also applies to network booting via gPXE. Usually our VMs
have 4 NICs, mixed virtio-net and PCI pass-through. 2 of the NICs shall
be used for booting, even if there are hard disks or floppy disks
connected. This scenario is currently almost impossible to configure.
 

Here is a gPXE to support fw_cfg.  You can pass gPXE script files from
the host to gPXE inside the guest.  This means you can boot specific
NICs:
http://patchwork.ozlabs.org/patch/43777/

Just wanted to post the link because it is related to the gPXE side of
this discussion.

   

Don't we load gPXE for each NIC and seabios passes PCI device to boot from
when it invokes one of them?
 

SeaBIOS may do that but gPXE internally just probes all PCI devices.
It does not take advantage of the PCI bus/addr/fn that was passed to
the option ROM.  A gPXE instance will try booting from each available
NIC in sequence.
   


It still registers a BEV entry though, no?

Does it at least try to boot from the PCI bus/addr/fn of the selected 
BEV entry?


Regards,

Anthony Liguori


Stefan

   





Re: [SeaBIOS] [Qemu-devel] [RFC] Passing boot order from qemu to seabios

2010-10-11 Thread Anthony Liguori

On 10/11/2010 07:16 AM, Gleb Natapov wrote:

On Mon, Oct 11, 2010 at 02:07:14PM +0200, Gerd Hoffmann wrote:
   

   Hi,

 

Floppy? Yes, I think we do.
   

And *one* floppy controllers can actually have *two* drives
connected, although booting from 'b' doesn't work IIRC.

 

and since one PCI device may
control more then one disk (ATA slave/master, SCSI LUNs). We can do what
EDD specification does. Describe disk as:
 bus type (isa/pci),
 address on a bus (16 bit base address for isa, b/s/f for pci)
 device type (ATA/SCSI/VIRTIO)
 device path (slave/master for ATA, LUN for SCSI, nothing for virtio)
 

If we had a qdev ID for all devices (which I think we should have
anyway), would this work or is a string not really handy enough?
   

I think we'll need support for that in all drivers supporting boot
anyway, i.e. have virtio-blk-pci register a boot edd when configured
that way.  Question is how to configure this.  We could attach the
boot index to either the blockdev or the device, i.e.

   -blockdev foo,bootindex=1

or

   -device virtio-blk-pci,bootindex=1

The latter looks more useful to me, boot order is guest state imho,
also it might expand to PXE booting nicely, i.e.

   -device e1000,bootindex=2

 

Yes, boot order is a guest sate managed by BIOS on real HW.
   


It's not that simple.  On advanced platforms, the boot order can be 
stored outside of the BIOS.  For instance, the boot order is actually 
stored in the IMM on certain IBM platforms and the BIOS queries the IMM 
for the boot order.  This allows out-of-band management tools to alter 
the boot order.


This is more or less what we're looking for here.  The BIOS should be 
able to query and modify the boot order but this is something that 
ideally belongs in QEMU.



Which turns up the question how this plays with option roms.
seabios should be able to order at pci device level at least when
booting via (pci) option rom.  OK for nics.  Booting from a scsi
disk with id != 0 using the lsi rom is probably impossible though.

What about non-pci option roms?  The one used for -kernel for example?

 

-option-rom rom.bin,bootindex=3?

We can pass boot index along with option rom via fw_cfg interface.
   


If the option rom is just hijacking int19, then there is no meaningful 
order you can give it.


Regards,

Anthony Liguori


--
Gleb.

___
SeaBIOS mailing list
seab...@seabios.org
http://www.seabios.org/mailman/listinfo/seabios
   





Re: [Qemu-devel] [RFC] Passing boot order from qemu to seabios

2010-10-11 Thread Anthony Liguori

On 10/11/2010 07:07 AM, Gerd Hoffmann wrote:

  Hi,


Floppy? Yes, I think we do.


And *one* floppy controllers can actually have *two* drives connected, 
although booting from 'b' doesn't work IIRC.



and since one PCI device may
control more then one disk (ATA slave/master, SCSI LUNs). We can do 
what

EDD specification does. Describe disk as:
 bus type (isa/pci),
 address on a bus (16 bit base address for isa, b/s/f for pci)
 device type (ATA/SCSI/VIRTIO)
 device path (slave/master for ATA, LUN for SCSI, nothing for 
virtio)


If we had a qdev ID for all devices (which I think we should have
anyway), would this work or is a string not really handy enough?


I think we'll need support for that in all drivers supporting boot 
anyway, i.e. have virtio-blk-pci register a boot edd when configured 
that way.  Question is how to configure this.  We could attach the 
boot index to either the blockdev or the device, i.e.


  -blockdev foo,bootindex=1

or

  -device virtio-blk-pci,bootindex=1

The latter looks more useful to me, boot order is guest state imho, 
also it might expand to PXE booting nicely, i.e.


  -device e1000,bootindex=2

Which turns up the question how this plays with option roms.  seabios 
should be able to order at pci device level at least when booting via 
(pci) option rom.  OK for nics.  Booting from a scsi disk with id != 0 
using the lsi rom is probably impossible though.


What about non-pci option roms?  The one used for -kernel for example?


-kernel hijacks int19 so it cannot participate in any kind of boot 
order.  It's either present (and therefore the bootable disk) or not 
present.


Regards,

Anthony Liguori


cheers,
  Gerd






Re: [SeaBIOS] [Qemu-devel] [RFC] Passing boot order from qemu to seabios

2010-10-11 Thread Anthony Liguori

On 10/11/2010 02:59 PM, Gleb Natapov wrote:

No boot rom should do that. extboot wreaks havoc when it is used.
And since virtio is now supported by bios there is no reason to use it.
   


You don't really have a choice.  You could be doing hardware passthrough 
and the ROM on the card may hijack int19.



Whoever needs scsi boot should add it to seabios too.
   


I don't disagree.

I think the best thing to do is to let SeaBIOS create a boot order table 
that contains descriptive information and then advertise that to QEMU.


QEMU can then try to associate the list of bootable devices with it's 
own set of devices and select a preferred order that it can then give 
back to SeaBIOS.  SeaBIOS can then present that list to the user for 
additional refinement.


Regards,

Anthony Liguori


--
Gleb.
   





Re: [SeaBIOS] [Qemu-devel] [RFC] Passing boot order from qemu to seabios

2010-10-11 Thread Anthony Liguori

On 10/11/2010 03:36 PM, Gleb Natapov wrote:

On Mon, Oct 11, 2010 at 03:30:21PM -0500, Anthony Liguori wrote:
   

On 10/11/2010 02:59 PM, Gleb Natapov wrote:
 

No boot rom should do that. extboot wreaks havoc when it is used.
And since virtio is now supported by bios there is no reason to use it.
   

You don't really have a choice.  You could be doing hardware
passthrough and the ROM on the card may hijack int19.

 

Then this particular HW would be broken on real HW too and will not
respect BIOS settings. But the code we provide should work properly.

   

Whoever needs scsi boot should add it to seabios too.
   

I don't disagree.

I think the best thing to do is to let SeaBIOS create a boot order
table that contains descriptive information and then advertise that
to QEMU.

 

What for? Why this step is needed?

   

QEMU can then try to associate the list of bootable devices with
it's own set of devices and select a preferred order that it can
then give back to SeaBIOS.  SeaBIOS can then present that list to
the user for additional refinement.

 

Why not skip your first step and let QEMU create boot order list and
pass it into Seabios. If menu=on option is present user will be able to
override the default from Seabios.
   


Because SeaBIOS is definitive and QEMU is not.

We can ask SeaBIOS to boot from SCSI LUN 3 on PCI address X.Y.Z but that 
doesn't mean that it can figure out what that means.  If it can't, how 
do we communicate that to the user?  If SeaBIOS communicates its list to 
QEMU then we can at least display that list in the monitor in the same 
way that it's displayed to the guest.  That means that we can reorder in 
the monitor and potentially can persistent the boot device list in a 
more meaningful way.


Regards,

Anthony Liguori


--
Gleb.
   





Re: [Qemu-devel] [PULL] eeepro100, virtio, net, vhost fixes

2010-10-11 Thread Anthony Liguori

On 10/07/2010 09:54 AM, Michael S. Tsirkin wrote:

Here are some fixes, all over the place.

I am guessing the below fixes all make sense for the 0.13 branch, too.
The per-device notifier patch is the only one that does not
fix bugs, but it is needed for a bugfix patch on qemu-kvm stable
that depends on it.

The following changes since commit 358664cc6d1b5f7c36004be0179b36011b81c49d:

   console: Avoid dereferencing NULL active_console (2010-10-03 06:43:10 +)
   


Pulled.  Thanks.

Regards,

Anthony Liguori


are available in the git repository at:
   git://git.kernel.org/pub/scm/linux/kernel/git/mst/qemu.git for_anthony

Michael S. Tsirkin (5):
   net: delay freeing peer host device
   virtio: invoke set_status callback on reset
   virtio-net: unify vhost-net start/stop
   virtio: change set guest notifier to per-device
   vhost: error code

Stefan Weil (1):
   eepro100: Add support for multiple individual addresses (multiple IA)

  hw/eepro100.c   |   30 +-
  hw/vhost.c  |   54 ++---
  hw/virtio-net.c |   89 ++
  hw/virtio-pci.c |   29 +-
  hw/virtio.c |2 +
  hw/virtio.h |2 +-
  net.c   |   49 ++
  net.h   |1 +
  8 files changed, 168 insertions(+), 88 deletions(-)



   





[Qemu-devel] [RFT] qemu 0.13.0-rc3

2010-10-11 Thread Anthony Liguori
After suffering from a prolonged maintainer softlockup, I'm attempting 
to get 0.13.0 release process back on track.


I've tagged qemu-0.13.0-rc3 in git which only carries a few changes 
since 0.13.0-rc1.  Most notably, a series of updates from Kevin Wolf and 
Cam Macdonell's ivshmem device.


I think we're pretty good testing wise for the final release but I 
wanted to offer a 24-hour window for last minute fixes.  I'm only 
interested in the following:


1) Patches that are *tested* against the stable-0.13 branch that are 
already committed in master.  Please tell me explicitly that you've 
tested the patch and how you've tested it.


2) Pull requests from other maintainers with cherry-picked patches 
against stable-0.13 that have been tested.


If you don't already have something ready, it's probably not worth 
worrying about.  We can follow with 0.13.1 in a relatively short period 
of time.


So please get any requests to me before 6PM US Central time October 12th.

Post 0.13.0, as part of the 0.14 planning, we can discuss how to avoid 
future delay with releases.


Thanks!

Regards,

Anthony Liguori



Re: [Qemu-devel] qdev: Some ISA devices don't handle second instantiation gracefully

2010-10-12 Thread Anthony Liguori

On 10/12/2010 08:00 AM, Markus Armbruster wrote:

Markus Armbrusterarm...@redhat.com  writes:

   

When I try -device isa-applesmc -device isa-applesmc, I get

 WARNING: Using AppleSMC with invalid key
 qemu: hardware error: register_ioport_read: invalid opaque
 [...]

and a core dump.

I know nothing about this device.  Instantiating twice may well make no
sense.  But hw_error() is not a nice way to reject a command line
option.
 

Actually, ib700 and isa-debugcon fail the same way.

They call register_ioport_write(), which aborts via hw_error() when the
port is already in use.  This is okay for non-configurable parts of a
board emulation, but not okay for a qdev device, unless it has no_user
set.
   


It's definitely right to fail but I agree, it's better to propagate the 
error.



Related: when isa_init_irq() finds the requested IRQ already in use, it
fails with exit(1).  Maybe register_ioport_write()  friends should do
that as well.

Or maybe qdev device models should have an at most one flag.
   


I think the proper thing to do is remove all exit(1)s and propagate 
errors instead.


A simple approach would be to make register_ioport_{read,write}() return 
an int, then do a query-replace on the source tree to make all 
invocations of it simply check the return value and exit if it's non-zero.


Regards,

Anthony Liguori





Re: [Qemu-devel] Re: [PATCH v2 3/7] docs: Add QED image format specification

2010-10-12 Thread Anthony Liguori

On 10/12/2010 08:16 AM, Stefan Hajnoczi wrote:



Well, the protocol is currently encoded in the file name, separated by a
colon. Of course, we want to get rid of that, but we still don't know
what we want instead. It's completely unrelated to the backing file
format, though, it's about the format of the backing file name.
 

I agree with Kevin.  There's no need to have the ill-defined backing
format AFAICT.
   


Yeah, I've now convinced myself we don't need backing format name too.

Regards,

Anthony Liguori


Stefan
   





Re: [Qemu-devel] [PATCH 00/39] Make configure command line autoconf-compatible

2010-10-12 Thread Anthony Liguori

On 10/12/2010 08:32 AM, malc wrote:

On Tue, 12 Oct 2010, Paolo Bonzini wrote:

   

On 10/12/2010 03:22 PM, malc wrote:
 

   19 files changed, 3729 insertions(+), 456 deletions(-)
 

I don't think the explanation above justified net plus of 3273 lines of new
code.
   

The part you forgot to quote is:
 

Yes, failed to notice that, sorry.

   
 

  Makefile   |   54 +-
  Makefile.dis   |6 +-
  Makefile.hw|8 +-
  Makefile.objs  |4 +-
  Makefile.target|   32 +-
  Makefile.user  |6 +-
  bsd-user/main.c|   10 +-
  config.guess   | 1502 ++
  config.sub | 1731

  configure  |  757 ++-
  create_config  |5 +-
  darwin-user/machload.c |   12 +-
  darwin-user/main.c |6 +-
  darwin-user/syscall.c  |2 +-
  linux-user/main.c  |   10 +-
  pc-bios/optionrom/Makefile |8 +-
  rules.mak  |   10 +-
  tests/Makefile |   16 +-
  tests/cris/Makefile|6 +-
  19 files changed, 3729 insertions(+), 456 deletions(-)
   

All these lines are in config.guess and config.sub that are shared with all
projects using Autotools and do not really need to be considered.

It's 40 lines of new code, 33 of which are in configure's help message.
 


I think there's a fair bit of value in looking like autoconf but a 39 
patch series is a bit to digest at once.


I'll try to do a thorough review but I'd suggest keeping patch series 
down to about 20 patches whenever possible and do multiple rounds as 
necessary.


Regards,

Anthony Liguori


Paolo

 
   





Re: [Qemu-devel] qdev: Some ISA devices don't handle second instantiation gracefully

2010-10-12 Thread Anthony Liguori

On 10/12/2010 08:54 AM, Markus Armbruster wrote:



I think the proper thing to do is remove all exit(1)s and propagate
errors instead.
 

exit() is good enough during startup, i.e. -device.  It's wrong for hot
plug; anything to be used in a hot plug path must propagate errors.  We
could keep exiting in code that's only used by non-hotpluggable devices.
   


I'm not really suggesting that we move exit()s out of all devices right 
now.  Am just suggesting that we bump them up one level.



A simple approach would be to make register_ioport_{read,write}()
return an int, then do a query-replace on the source tree to make all
invocations of it simply check the return value and exit if it's
non-zero.
 

In similar cases, we've used a simple FOO_nofail() wrapper in places
that want to exit.

I'll see what I can do.
   


I prefer propagating the exit but am not deeply opposed to nofail().

Regards,

Anthony Liguori



[Qemu-devel] Re: [PATCH v2 6/7] qed: Read/write support

2010-10-12 Thread Anthony Liguori

On 10/12/2010 10:08 AM, Kevin Wolf wrote:

  Otherwise we might destroy data that isn't
even touched by the guest request in case of a crash.
   


The failure scenarios are either that the cluster is leaked in which 
case, the old version of the data is still present or the cluster is 
orphaned because the L2 entry is written, in which case the old version 
of the data is present.


Are you referring to a scenario where the cluster is partially written 
because the data is present in the write cache and the write cache isn't 
flushed on power failure?


Regards,

Anthony Liguori


Kevin
   





Re: [Qemu-devel] [PATCH 0/2] usb-ccid device (v2)

2010-10-12 Thread Anthony Liguori

On 10/12/2010 11:03 AM, Alon Levy wrote:

- Anthony Liguorianth...@codemonkey.ws  wrote:

   

On 10/12/2010 07:58 AM, Alon Levy wrote:
 

This patch adds a new device, it is described in full in the second
   

patch
 

intro and also in the documentation in docs. In brief it provides a
   

standard
 

smart card reader device.

The first patch is the configure change and docs.
The second patch contains the actual device, I couldn't figure out a
   

good
 

way to split it to ease review.

v2 changed:
   * all QSIMPLEQ turned into fixed sized rings
   * all allocated buffers turned into fixed size buffers
   * added migration support
   * added a message to tell client qemu has migrated to ip:port
* for lack of monitor commands ip:port are 0:0, which causes the
   

updated
 

 vscclient to connect to one port higher on the same host. will
   

add monitor
 

 commands in a separate patch. tested with current setup.

   

This is way too much magic to live within a device.  Devices manage
reconnecting themselves during migration.  When you create the
destination qemu instance, you specify what to connect to.

IOW,

On the source:

qemu -chardev tcp:localhost:1025,id=foo -usbdevice ccid,chardev=foo
...

On the destination:

qemu -chardev tcp:localhost:1026,id=foo -usbdevice ccid,chardev=foo
-incoming tcp:0.0.0.0:1024 ...

A connection happens when the device is created.

But now I'm even further confused then when I first reviewed it..  If

you're now supporting migration, does that mean that you're relying on

the daemon to emulate the device?

 

Let me try to clarify this. Nothing has changed since the last patch except
for what's in the notes, i.e. migration support. The device I'm adding is a
reader. The reader is just a pipe between smart cards and the guest operating
system. The smart card logic does live outside of this device, and is available
in the cac_card sources at http://cgit.freedesktop.org/~alon/cac_card/ (all of
this is in docs/usb_ccid.txt).

So when I speak of vscclient, I'm talking of an application that emulates a 
smart
card and initiates a tcp connection to qemu that connects to the usb-ccid 
device.
vscclient is also in the cac_card sources.
   


Okay, let me be clear.  We shouldn't be doing device emulation outside 
of QEMU's source tree--at least, not in this type of context.  External 
devices present a great deal of challenges and we shouldn't just 
approach it in an ad-hoc fashion.


I'm not opposed to passthrough although I'd prefer QEMU to talk to the 
device directly instead of going through a daemon.


There's a lot of delicate integration between QEMU and a device and if a 
device lives outside of QEMU, it makes it extremely difficult for us to 
influence changes to that device to support QEMU new features.



Regarding the method of reconnection: You are absolutely right that if I have 
qemu
connect to the remote instead of the other way around then I remove the need to 
inform
vscclient of the new address. But the way it stands requires the client to know 
the
address of the destination qemu. I have to inform it somehow. You are saying 
that
devices shouldn't know this information? ok, that's why I talked about monitor 
commands.
I come from the world of spice - in spice we use monitor commands for this.


And none of that is upstream.

Regards,

Anthony Liguori


  I could
change this to have qemu connect to vscclient, but I don't see the logic in 
general -
sometimes you do want to have a chardev that is listening (the fact that it is 
implemented
suggests someone found it useful), if you then migrate you have the same 
problem I'm
solving.

   

Regards,

Anthony Liguori

 

Alon Levy (2):
usb-ccid: add CCID device. add configure option.
usb-ccid: add CCID device (device itself)

   Makefile.objs  |1 +
   configure  |   12 +
   docs/usb-ccid.txt  |  115 +
   hw/usb-ccid.c  | 1376
   


 

   hw/vscard_common.h |  131 +
   5 files changed, 1635 insertions(+), 0 deletions(-)
   create mode 100644 docs/usb-ccid.txt
   create mode 100644 hw/usb-ccid.c
   create mode 100644 hw/vscard_common.h


   





[Qemu-devel] Re: [PATCH v2 6/7] qed: Read/write support

2010-10-12 Thread Anthony Liguori

On 10/12/2010 10:59 AM, Stefan Hajnoczi wrote:

On Tue, Oct 12, 2010 at 05:39:48PM +0200, Kevin Wolf wrote:
   

Am 12.10.2010 17:22, schrieb Anthony Liguori:
 

On 10/12/2010 10:08 AM, Kevin Wolf wrote:
   

   Otherwise we might destroy data that isn't
even touched by the guest request in case of a crash.

 

The failure scenarios are either that the cluster is leaked in which
case, the old version of the data is still present or the cluster is
orphaned because the L2 entry is written, in which case the old version
of the data is present.
   

Hm, how does the latter case work? Or rather, what do mean by orphaned?

 

Are you referring to a scenario where the cluster is partially written
because the data is present in the write cache and the write cache isn't
flushed on power failure?
   

The case I'm referring to is a COW. So let's assume a partial write to
an unallocated cluster, we then need to do a COW in pre/postfill. Then
we do a normal write and link the new cluster in the L2 table.

Assume that the write to the L2 table is already on the disk, but the
pre/postfill data isn't yet. At this point we have a bad state because
if we crash now we have lost the data that should have been copied from
the backing file.
 

In this case QED_F_NEED_CHECK is set and the invalid cluster offset
should be reset to zero on open.

However, I think we can get into a state where the pre/postfill data
isn't on the disk yet but another allocation has increased the file
size, making the unwritten cluster valid.  This fools consistency
check into thinking the data cluster (which was never written to on
disk) is valid.

Will think about this more tonight.
   


It's fairly simple to add a sync to this path.  It's probably worth 
checking the prefill/postfill for zeros and avoiding the write/sync if 
that's the case.  That should optimize the common cases of allocating 
new space within a file.


My intuition is that we can avoid the sync entirely but we'll need to 
think about it further.


Regards,

Anthony Liguori


Stefan
   





Re: [Qemu-devel] [PATCH 0/2] usb-ccid device (v2)

2010-10-12 Thread Anthony Liguori

On 10/12/2010 11:43 AM, Alon Levy wrote:

- Anthony Liguorianth...@codemonkey.ws  wrote:

   

On 10/12/2010 11:03 AM, Alon Levy wrote:
 

- Anthony Liguorianth...@codemonkey.ws   wrote:


   

On 10/12/2010 07:58 AM, Alon Levy wrote:

 

This patch adds a new device, it is described in full in the
   

second
 


   

patch

 

intro and also in the documentation in docs. In brief it provides
   

a
 


   

standard

 

smart card reader device.

The first patch is the configure change and docs.
The second patch contains the actual device, I couldn't figure out
   

a
 


   

good

 

way to split it to ease review.

v2 changed:
* all QSIMPLEQ turned into fixed sized rings
* all allocated buffers turned into fixed size buffers
* added migration support
* added a message to tell client qemu has migrated to ip:port
 * for lack of monitor commands ip:port are 0:0, which causes
   

the
 


   

updated

 

  vscclient to connect to one port higher on the same host.
   

will
 


   

add monitor

 

  commands in a separate patch. tested with current setup.


   

This is way too much magic to live within a device.  Devices
 

manage
 

reconnecting themselves during migration.  When you create the
destination qemu instance, you specify what to connect to.

IOW,

On the source:

qemu -chardev tcp:localhost:1025,id=foo -usbdevice
 

ccid,chardev=foo
 

...

On the destination:

qemu -chardev tcp:localhost:1026,id=foo -usbdevice
 

ccid,chardev=foo
 

-incoming tcp:0.0.0.0:1024 ...

A connection happens when the device is created.

But now I'm even further confused then when I first reviewed it..
 

If
 

you're now supporting migration, does that mean that you're relying
 

on
 

the daemon to emulate the device?


 

Let me try to clarify this. Nothing has changed since the last patch
   

except
 

for what's in the notes, i.e. migration support. The device I'm
   

adding is a
 

reader. The reader is just a pipe between smart cards and the guest
   

operating
 

system. The smart card logic does live outside of this device, and
   

is available
 

in the cac_card sources at
   

http://cgit.freedesktop.org/~alon/cac_card/ (all of
 

this is in docs/usb_ccid.txt).

So when I speak of vscclient, I'm talking of an application that
   

emulates a smart
 

card and initiates a tcp connection to qemu that connects to the
   

usb-ccid device.
 

vscclient is also in the cac_card sources.

   

Okay, let me be clear.  We shouldn't be doing device emulation outside

of QEMU's source tree--at least, not in this type of context.
External
devices present a great deal of challenges and we shouldn't just
approach it in an ad-hoc fashion.
 

There are two devices:
  smart card reader, aka CCID device - emulated inside qemu. Submitted.
  smart card - emulated outside of qemu. Fully available, open source separate 
project.
   


And how does the smart card state get migrated during migration?  How do 
you keep it synced with QEMU?


I don't understand the use-case behind this.  Is this so that a local 
physical smart card can be passed through to a guest from a Spice client 
and when migration happens, the QEMU instance connects back to the Spice 
client?  So the device is never actually migrated?


Regards,

Anthony Liguori




Re: [Qemu-devel] [PATCH 0/2] usb-ccid device (v2)

2010-10-12 Thread Anthony Liguori

On 10/12/2010 12:09 PM, Alon Levy wrote:

The smart card is not being migrated. It is running on the client machine,
which is not being migrated/shutdown (same as vncviewer isn't migrated).

   


Ok, let's look at this compared to another similar use-case: USB 
passthrough of a webcam device that's remoted using USB over IP.


In this model, you have a USB bus that's modelled as a bus and a 
device.  Within the USB bus, you have additional devices.  These are all 
qdev devices and they may be emulated or they may be implemented using 
passthrough.  While we don't do it today, USB over IP would be just 
another form of passthrough.


Migration is a rather interesting challenge in this model.  You've got a 
mix of client state and server state on the USB over IP connection.  You 
could marshal up the client state and as long as you reconnected to the 
same server on the destination, I guess it would be okay.


I think the problem with your current implementation is that you've 
completed skipped the bus modelling and you're also using the Device 
over IP connection to implement device emulation.


What I would suggest is that you model the bus/device relationship via 
qdev and move the Smart Card emulation into QEMU.  I would also suggest 
adding proper passthrough support in QEMU.  CCID over IP is a reasonable 
thing to have but I think you've got way too much outside of QEMU right 
now such that long term maintenance is going to be exceedingly difficult.


Regards,

Anthony Liguori


I don't understand the use-case behind this.  Is this so that a local

physical smart card can be passed through to a guest from a Spice
client
and when migration happens, the QEMU instance connects back to the
Spice
client?  So the device is never actually migrated?

 

The *smart card* is never migrated. The ccid device is. Here is the scenario:

Host A: qemu_a
qemu_a: guest

Host B: vscclient
  - physical reader

Host C: qemu_b -incoming ..

yes, we will use this for SPICE, but this is submitted
to qemu on the hopes and with testing ensuring it is perfectly usable as is
without using SPICE, otherwise I wouldn't have sent it upstream.

non-SPICE usage:

1. user on B runs vscclient (and presumably the user has some connection to the 
guest to use the smartcard device, i.e. vnc/ssh/spice, but that's not relevant).
2. vscclient connects via tcp to qemu_a.
3. qemu_a starts migrating to qemu_b.
  (qemu_b is alive at this point, can receive incoming tcp connections on 
chardev - otherwise a migration fails immediately anyway)
4. pre_load for usb-ccid sends a Reconnect message
5. vscclient gets the Reconnect message, closes socket to qemu_a, opens socket 
to qemu_b
6. from guest pov nothing happened (no device detach/attach).

I have to stress that the main problem the migration intends to solve is to 
avoid a detach/attach in the guest. Actual
operations on the smartcard could possibly fail as a result of the migration, 
and it would not be a real problem (i.e.
we could live without, but we can't leave with a lock of the guest screen as a 
result of a migration). Which is why I
consider the current code good enough. It is certainly not perfect (short 
packet issue), or tested enough.

The SPICE usage scenario is basically the same, just replace vscclient with 
spicec, and
we don't need the Reconnect message since SPICE takes care of this for us (we 
just get
a channel detach/attach event if we care).
1. user on B runs spicec
2. spicec connects to qemu via spice channel, smart card channel connects to 
usb-ccid device via spicevmc chardev (so it doesn't care it's spice or not).
3. qemu_a migrates
4. spicec disconnects from qemu_a and connects to qemu_b
5. from guest os pov nothing happens on the ccid usb device.

   

Regards,

Anthony Liguori
 





Re: [Qemu-devel] [RFT] qemu 0.13.0-rc3

2010-10-12 Thread Anthony Liguori

On 10/12/2010 04:34 PM, Juergen Lock wrote:

In article4cb38c82.1090...@linux.vnet.ibm.com  you write:
   

After suffering from a prolonged maintainer softlockup, I'm attempting
to get 0.13.0 release process back on track.

I've tagged qemu-0.13.0-rc3 in git which only carries a few changes
since 0.13.0-rc1.  Most notably, a series of updates from Kevin Wolf and
Cam Macdonell's ivshmem device.

I think we're pretty good testing wise for the final release but I
wanted to offer a 24-hour window for last minute fixes.  I'm only
interested in the following:

1) Patches that are *tested* against the stable-0.13 branch that are
already committed in master.  Please tell me explicitly that you've
tested the patch and how you've tested it.

[...]
 

First tests:

- vmmouse seems broken (or disabled?  I don't see it in `info qdm'...)
   


I don't think vmmouse is part of qdev yet.


- to test arm emulation I tried booting my old zaurus images (-M terrier)
   which failed (qd fix below), and in the process also tested -M n800
   w/o an image which failed with:

qemu-system-arm: Duplicate ID 'null' for chardev
Can't create serial device, empty char device

   and that I got fixed by cherry-picking this commit from master:

6a8aabd3c132188ee8e0e82ef4aba09f782cbe96

From: Stefan Weilw...@mail.berlios.de
Date: Sun, 8 Aug 2010 14:09:26 +0200
Subject: [PATCH] hw/omap: Fix default setup for OMAP UART devices
   


Sorry, is the patch below an additional fix?  If so, please submit 
against master.


Regards,

Anthony Liguori


==

And the zaurus patch, problem was the 2nd scoop's base address
(0x08800040) gets rounded down to start of page which causes its
io read/write callbacks to be passed addresses 0x40 higher than
the code expects:  (as witnessed by Bad register offset messages
and failure to attach the internal CF disk aka microdrive at least.)

Signed-off-by: Juergen Lockn...@jelal.kn-bremen.de

--- a/hw/zaurus.c
+++ b/hw/zaurus.c
@@ -70,6 +70,10 @@ static uint32_t scoop_readb(void *opaque
  {
  ScoopInfo *s = (ScoopInfo *) opaque;

+// XXX Workaround for base address (0x08800040 in this case)
+// rounded down to start of page
+addr= 0x3f;
+
  switch (addr) {
  case SCOOP_MCR:
  return s-mcr;
@@ -104,6 +108,10 @@ static void scoop_writeb(void *opaque, t
  ScoopInfo *s = (ScoopInfo *) opaque;
  value= 0x;

+// XXX Workaround for base address (0x08800040 in this case)
+// rounded down to start of page
+addr= 0x3f;
+
  switch (addr) {
  case SCOOP_MCR:
  s-mcr = value;

   





Re: [Qemu-devel] [PATCH] ceph/rbd block driver for qemu-kvm (v6)

2010-10-12 Thread Anthony Liguori
, RBD_SUFFIX);
   


sizeof(n)


+while (1) {
+qemu_free(outbuf);
+outbuf = qemu_malloc(len);
+
+r = rados_exec(s-header_pool, n, rbd, snap_list, NULL, 0,
+   outbuf, len);
+if (r  0) {
+error_report(rbd.snap_list execution failed failed: %s, 
strerror(-r));
+return r;
+}
+if (r != len)
+break;
+
+len *= 2;
+}
   


Is this really the only way to figure out how large the buffer should be??


+buf = outbuf;
+end = buf + len;
+
+if ((r = decode64(buf, end,snap_seq))  0)
+goto done_err;
+if ((r = decode32(buf, end,snap_count))  0)
+goto done_err;
+
+sn_tab = qemu_mallocz(snap_count * sizeof(QEMUSnapshotInfo));
+for (i = 0; i  snap_count; i++) {
+uint64_t id, image_size;
+char *snap_name;
+int name_len;
+
+if ((r = decode64(buf, end,id))  0)
+goto done_err;
+if ((r = decode64(buf, end,image_size))  0)
+goto done_err;
+if ((r = decode_str(buf, end,snap_name))  0)
+goto done_err;
+
+name_len = sizeof(sn_info-id_str) - 1;
+if (r  name_len)
+name_len = r;
+
+sn_info = sn_tab + i;
+pstrcpy(sn_info-id_str, name_len + 1, snap_name);
+pstrcpy(sn_info-name, name_len + 1, snap_name);
+qemu_free(snap_name);
+
+sn_info-vm_state_size = image_size;
+sn_info-date_sec = 0;
+sn_info-date_nsec = 0;
+sn_info-vm_clock_nsec = 0;
+}
+*psn_tab = sn_tab;
+qemu_free(outbuf);
+return snap_count;
+done_err:
+qemu_free(sn_tab);
+qemu_free(outbuf);
+return r;
+}
+
+static QEMUOptionParameter rbd_create_options[] = {
+{
+ .name = BLOCK_OPT_SIZE,
+ .type = OPT_SIZE,
+ .help = Virtual disk size
+},
+{
+ .name = BLOCK_OPT_CLUSTER_SIZE,
+ .type = OPT_SIZE,
+ .help = RBD object size
+},
+{NULL}
+};
+
+static BlockDriver bdrv_rbd = {
+.format_name= rbd,
+.instance_size  = sizeof(BDRVRBDState),
+.bdrv_file_open = rbd_open,
+.bdrv_close = rbd_close,
+.bdrv_create= rbd_create,
+.bdrv_get_info  = rbd_getinfo,
+.create_options = rbd_create_options,
+.bdrv_getlength = rbd_getlength,
+.protocol_name  = rbd,
+
+.bdrv_aio_readv = rbd_aio_readv,
+.bdrv_aio_writev= rbd_aio_writev,
+
+.bdrv_snapshot_create = rbd_snap_create,
+.bdrv_snapshot_list = rbd_snap_list,
+};
+
+static void bdrv_rbd_init(void)
+{
+bdrv_register(bdrv_rbd);
+}
+
+block_init(bdrv_rbd_init);
diff --git a/block/rbd_types.h b/block/rbd_types.h
new file mode 100644
index 000..c35d840
--- /dev/null
+++ b/block/rbd_types.h
@@ -0,0 +1,71 @@
+/*
+ * Ceph - scalable distributed file system
+ *
+ * Copyright (C) 2004-2010 Sage Weils...@newdream.net
+ *
+ * This is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License version 2.1, as published by the Free Software
+ * Foundation.  See file COPYING.
+ *
+ */
+
+#ifndef CEPH_RBD_TYPES_H
+#define CEPH_RBD_TYPES_H
+
+
+/*
+ * rbd image 'foo' consists of objects
+ *   foo.rbd  - image metadata
+ *   foo.
+ *   foo.0001
+ *   ...  - data
+ */
+
+#define RBD_SUFFIX  .rbd
+#define RBD_DIRECTORY   rbd_directory
+#define RBD_INFOrbd_info
+
+#define RBD_DEFAULT_OBJ_ORDER   22   /* 4MB */
+
+#define RBD_MAX_OBJ_NAME_SIZE   96
+#define RBD_MAX_BLOCK_NAME_SIZE 24
+#define RBD_MAX_SEG_NAME_SIZE   128
+
+#define RBD_COMP_NONE   0
+#define RBD_CRYPT_NONE  0
+
+#define RBD_HEADER_TEXT   Rados Block Device Image\n
+#define RBD_HEADER_SIGNATURERBD
+#define RBD_HEADER_VERSION  001.005
+
+struct rbd_info {
+uint64_t max_id;
+} __attribute__ ((packed));
+
+struct rbd_obj_snap_ondisk {
+uint64_t id;
+uint64_t image_size;
+} __attribute__((packed));
+
+struct rbd_obj_header_ondisk {
+char text[40];
+char block_name[RBD_MAX_BLOCK_NAME_SIZE];
+char signature[4];
+char version[8];
+struct {
+uint8_t order;
+uint8_t crypt_type;
+uint8_t comp_type;
+uint8_t unused;
+} __attribute__((packed)) options;
+uint64_t image_size;
+uint64_t snap_seq;
+uint32_t snap_count;
+uint32_t reserved;
+uint64_t snap_names_len;
+struct rbd_obj_snap_ondisk snaps[0];
+} __attribute__((packed));
+
+
+#endif
   


These types don't match QEMU CODING_STYLE.

Regards,

Anthony Liguori


diff --git a/configure b/configure
index af50607..5d8f620 100755
--- a/configure
+++ b/configure
@@ -325,6 +325,7 @@ cpu_emulation=yes
  check_utests=no
  user_pie=no
  zero_malloc=
+rbd=

  # OS specific
  if check_define __linux__ ; then
@@ -724,6 +725,10 @@ for opt do
;;
--*dir)
;;
+  --disable-rbd) rbd=no
+  ;;
+  --enable-rbd) rbd=yes

Re: [Qemu-devel] Re: [PATCH v2 6/7] qed: Read/write support

2010-10-13 Thread Anthony Liguori

On 10/13/2010 08:07 AM, Kevin Wolf wrote:

Am 13.10.2010 14:13, schrieb Stefan Hajnoczi:
   

We can avoid it when a backing image is not used.  Your idea to check
for zeroes in the backing image is neat too, it may well reduce the
common case even for backing images.
 

The additional requirement is that we're extending the file and not
reusing an old cluster. (And bdrv_has_zero_init() == true, but QED
doesn't work on host_devices anyway)
   


Yes, that's a good point.

BTW, I think we've decided that making it work on host_devices is not 
that bad.


We can add an additional feature called QED_F_PHYSICAL_SIZE.

This feature will add another field to the header that contains an 
offset immediately following the last cluster allocation.


During a metadata scan, we can accurately recreate this field so we only 
need to update this field whenever we clear the header dirty bit (which 
means during an fsync()).


That means we can maintain the physical size without introducing 
additional fsync()s in the allocation path.  Since we're already writing 
out the header anyway, the write operation is basically free too.


Regards,

Anthony Liguori


Kevin
   





Re: [Qemu-devel] Re: [PATCH v2 6/7] qed: Read/write support

2010-10-13 Thread Anthony Liguori



 
 That means we can maintain the physical size without introducing
 additional fsync()s in the allocation path.  Since we're already
 writing out the header anyway, the write operation is basically
 free too.

  I don't see how it is free.  It's an extra write.  The good news is
  that it's very easy to amortize.

We only need to update the header field on disk when we're already
updating the header, so it's not even an extra write operation.


Why would you ever update the header, apart from relocating L1 for 
some reason?


To update the L1/L2 tables clean bit.  That's what prevents a check in 
the normal case where you have a clean shutdown.


Regards,

Anthony Liguori





Re: [Qemu-devel] Re: [PATCH v2 6/7] qed: Read/write support

2010-10-13 Thread Anthony Liguori

On 10/13/2010 08:50 AM, Avi Kivity wrote:

 On 10/13/2010 03:24 PM, Anthony Liguori wrote:

On 10/13/2010 08:07 AM, Kevin Wolf wrote:

Am 13.10.2010 14:13, schrieb Stefan Hajnoczi:

We can avoid it when a backing image is not used.  Your idea to check
for zeroes in the backing image is neat too, it may well reduce the
common case even for backing images.

The additional requirement is that we're extending the file and not
reusing an old cluster. (And bdrv_has_zero_init() == true, but QED
doesn't work on host_devices anyway)


Yes, that's a good point.

BTW, I think we've decided that making it work on host_devices is not 
that bad.


We can add an additional feature called QED_F_PHYSICAL_SIZE.

This feature will add another field to the header that contains an 
offset immediately following the last cluster allocation.


During a metadata scan, we can accurately recreate this field so we 
only need to update this field whenever we clear the header dirty bit 
(which means during an fsync()).


If you make QED_F_PHYSICAL_SIZE an autoclear bit, you don't need the 
header dirty bit.


Yes, autoclear bits are essentially granular header dirty bits.

Regards,

Anthony Liguori





That means we can maintain the physical size without introducing 
additional fsync()s in the allocation path.  Since we're already 
writing out the header anyway, the write operation is basically free 
too.


I don't see how it is free.  It's an extra write.  The good news is 
that it's very easy to amortize.







Re: [Qemu-devel] Re: [PATCH v2 6/7] qed: Read/write support

2010-10-13 Thread Anthony Liguori

On 10/13/2010 09:07 AM, Stefan Hajnoczi wrote:



That means we can maintain the physical size without introducing
additional fsync()s in the allocation path.  Since we're already
writing out the header anyway, the write operation is basically
free too.
   

I don't see how it is free.  It's an extra write.  The good news is
that it's very easy to amortize.
 

We only need to update the header field on disk when we're already
updating the header, so it's not even an extra write operation.
   


Because we're already writing out the sector that contains that field in 
the header.


Regards,

Anthony Liguori


Stefan
   





Re: [Qemu-devel] Re: [PATCH v2 6/7] qed: Read/write support

2010-10-13 Thread Anthony Liguori

On 10/13/2010 09:16 AM, Avi Kivity wrote:

 On 10/13/2010 04:11 PM, Anthony Liguori wrote:
Why would you ever update the header, apart from relocating L1 for 
some reason?



To update the L1/L2 tables clean bit.  That's what prevents a check 
in the normal case where you have a clean shutdown.


I see - so you wouldn't update it every allocation, only when the disk 
has been quiet for a while.


Right, the current plan is to flush the header dirty bit on shutdown or 
whenever there is an explicit flush of the device.  Current that is 
caused by either a guest-initiated flush or a L1 update.  We also plan 
to add a timer-based flush such that a flush is scheduled for some 
period of time (like 5 minutes) after the dirty bit is set.


The end result should be that the only window for requiring a metadata 
scan is if a crash occurs within 5 minutes of a cluster allocation and 
an explicit flush has not occurred for some other reason.


Regards,

Anthony Liguori





Re: [Qemu-devel] Re: [PATCH v2 6/7] qed: Read/write support

2010-10-13 Thread Anthony Liguori

On 10/13/2010 10:08 AM, Avi Kivity wrote:

 On 10/13/2010 04:53 PM, Anthony Liguori wrote:

On 10/13/2010 09:16 AM, Avi Kivity wrote:

 On 10/13/2010 04:11 PM, Anthony Liguori wrote:
Why would you ever update the header, apart from relocating L1 for 
some reason?



To update the L1/L2 tables clean bit.  That's what prevents a check 
in the normal case where you have a clean shutdown.


I see - so you wouldn't update it every allocation, only when the 
disk has been quiet for a while.


Right, the current plan is to flush the header dirty bit on shutdown 
or whenever there is an explicit flush of the device.  Current that 
is caused by either a guest-initiated flush or a L1 update.


That does add an extra write (and a new write+flush later to mark the 
header dirty again when you start allocating).  I'd drop it and only 
use the timer.


in fact, it adds an extra flush too.  The sequence

1 L1 update
2 mark clean
3 flush

is unsafe since you can crash between 2 and 3, ad only 2 makes it.  So 
I'd do something like


You've got the order wrong.

1. L1 update
2. flush()
3. mark clean

If (3) doesn't make it to disk, that's okay.  It just causes an extra scan.


1 opportunistic flush (for whatever reason)
2 set timer
3 no intervening metadata changes
4 mark clean
5 no intervening metadata changes
6 mark dirty
7 flush
8 metadata changes


Not sure I see why we set the timer in step 2 as opposed to:

0 clear scheduled flush (if necessary)
1 opportunistic flush (for whatever reason)
2 mark clean
3 no intervening metadata changes
4 mark dirty
5 flush
6 schedule flush (in 5 minutes)
7 metadata changes

Which is now recorded at http://wiki.qemu.org/Features/QED/ScanAvoidance 
so we can keep track of this.


Regards,

Anthony Liguori









Re: [Qemu-devel] Hitting 29 NIC limit

2010-10-13 Thread Anthony Liguori

On 10/13/2010 05:32 PM, Anjali Kulkarni wrote:


Hi,

Using the legacy way of starting up NICs, I am hitting a limitation after 29
NICs ie no more than 29 are detected (that's because of the 32 PCI slot
limit on a single bus- 3 are already taken up)
I had initially increased the MAX_NICS to 48, just on my tree, to get to
more, but ofcource that wont work.
Is there any way to go beyond 29 NICs the legacy way?  What is the maximum
that can be supported by the qdev mothod?
   


I got up to 104 without trying very hard using the following script:

args=
for slot in 5 6 7 8 9 10 11 12 13 14 15 16 17; do
for fn in 0 1 2 3 4 5 6 7; do
args=$args -netdev user,id=eth${slot}_${fn}
args=$args -device 
virtio-net-pci,addr=${slot}.${fn},netdev=eth${slot}_${fn},multifunction=on,romfile=

done
done

x86_64-softmmu/qemu-system-x86_64 -hda ~/images/linux.img ${args} 
-enable-kvm


The key is to make the virtio-net devices multifunction and to fill out 
all 8 functions for each slot.


Regards,

Anthony Liguori


Thanks
Anjali


   





[Qemu-devel] [Bug 660366] Re: qemu-img convert -O qcow2 -o backing_file makes huge images

2010-10-14 Thread Anthony Liguori
** Changed in: qemu
   Importance: Undecided = Wishlist

** Changed in: qemu
   Status: New = Confirmed

-- 
qemu-img convert -O qcow2 -o backing_file makes huge images
https://bugs.launchpad.net/bugs/660366
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in QEMU: Confirmed

Bug description:
$ dd if=/dev/urandom bs=1M of=1.img count=4
4+0 records in
4+0 records out
4194304 bytes (4,2 MB) copied, 1,0413 s, 4,0 MB/s
$ qemu-img create -f qcow2 -b 1.img 2.img
Formatting '2.img', fmt=qcow2 size=4194304 backing_file='1.img' encryption=off 
cluster_size=0 
$ qemu-img convert -O qcow2 -o backing_file=1.img 2.img 3.img
$ du -h ?.img
4,1M1.img
144K2.img
4,3M3.img

The conversion result is bigger then the source!

It appears that -o backing_file is not applied to data (as expected). I.e. 
all data is put into the resulting image: both from source image and backing 
image.

Expected behavior is to put only data that is not present in backing_file.





Re: [Qemu-devel] Hitting 29 NIC limit

2010-10-14 Thread Anthony Liguori

On 10/14/2010 07:07 AM, Avi Kivity wrote:

 On 10/14/2010 12:54 AM, Anthony Liguori wrote:

On 10/13/2010 05:32 PM, Anjali Kulkarni wrote:


Hi,

Using the legacy way of starting up NICs, I am hitting a limitation 
after 29

NICs ie no more than 29 are detected (that's because of the 32 PCI slot
limit on a single bus- 3 are already taken up)
I had initially increased the MAX_NICS to 48, just on my tree, to 
get to

more, but ofcource that wont work.
Is there any way to go beyond 29 NICs the legacy way?  What is the 
maximum

that can be supported by the qdev mothod?


I got up to 104 without trying very hard using the following script:

args=
for slot in 5 6 7 8 9 10 11 12 13 14 15 16 17; do
for fn in 0 1 2 3 4 5 6 7; do
args=$args -netdev user,id=eth${slot}_${fn}
args=$args -device 
virtio-net-pci,addr=${slot}.${fn},netdev=eth${slot}_${fn},multifunction=on,romfile= 


done
done

x86_64-softmmu/qemu-system-x86_64 -hda ~/images/linux.img ${args} 
-enable-kvm


The key is to make the virtio-net devices multifunction and to fill 
out all 8 functions for each slot.


This is unlikely to work right wrt pci hotplug.


Yes.  Our hotplug design is based on devices..  This is wrong, it should 
be based on bus-level concepts (like PCI slots).


If we want to support a large number of interfaces, we need true 
multiport cards.


This magic here creates a multiport virtio-net card so I'm not really 
sure what you're suggesting.  It would certainly be nice to make this 
all more user friendly (and make hotplug work).


Regards,

Anthony Liguori


What's the motivation for such a huge number of interfaces?





Re: [Qemu-devel] Hitting 29 NIC limit

2010-10-14 Thread Anthony Liguori

On 10/14/2010 07:10 AM, Daniel P. Berrange wrote:

On Thu, Oct 14, 2010 at 02:07:17PM +0200, Avi Kivity wrote:
   

  On 10/14/2010 12:54 AM, Anthony Liguori wrote:
 

On 10/13/2010 05:32 PM, Anjali Kulkarni wrote:
   

Hi,

Using the legacy way of starting up NICs, I am hitting a limitation
after 29
NICs ie no more than 29 are detected (that's because of the 32 PCI slot
limit on a single bus- 3 are already taken up)
I had initially increased the MAX_NICS to 48, just on my tree, to get to
more, but ofcource that wont work.
Is there any way to go beyond 29 NICs the legacy way?  What is the
maximum
that can be supported by the qdev mothod?
 

I got up to 104 without trying very hard using the following script:

args=
for slot in 5 6 7 8 9 10 11 12 13 14 15 16 17; do
for fn in 0 1 2 3 4 5 6 7; do
args=$args -netdev user,id=eth${slot}_${fn}
args=$args -device
virtio-net-pci,addr=${slot}.${fn},netdev=eth${slot}_${fn},multifunction=on,romfile=
done
done

x86_64-softmmu/qemu-system-x86_64 -hda ~/images/linux.img ${args}
-enable-kvm

The key is to make the virtio-net devices multifunction and to fill
out all 8 functions for each slot.
   

This is unlikely to work right wrt pci hotplug.  If we want to support a
large number of interfaces, we need true multiport cards.
 

Or a PCI bridge to wire up more PCI buses, so we raise the max limit for
any type of device we emulate.
   


I've always been sceptical of this.  When physical systems have a large 
number of NICs, it's via multiple functions, not a bunch of PCI bridges.


With just a handful of 8-port NICs, you can exceed the current 
slot-based limit on physical hardware.  It's not an extremely common 
configuration but it does exist.


BTW, I don't think it's possible to hot-add physical functions.  I 
believe I know of a card that supports dynamic add of physical functions 
(pre-dating SR-IOV).


Regards,

Anthony Liguori


Daniel
   





Re: [Qemu-devel] Hitting 29 NIC limit

2010-10-14 Thread Anthony Liguori

On 10/14/2010 07:36 AM, Markus Armbruster wrote:

Avi Kivitya...@redhat.com  writes:

   

  On 10/14/2010 12:54 AM, Anthony Liguori wrote:
 

On 10/13/2010 05:32 PM, Anjali Kulkarni wrote:
   

Hi,

Using the legacy way of starting up NICs, I am hitting a limitation
after 29
NICs ie no more than 29 are detected (that's because of the 32 PCI slot
limit on a single bus- 3 are already taken up)
I had initially increased the MAX_NICS to 48, just on my tree, to get to
more, but ofcource that wont work.
Is there any way to go beyond 29 NICs the legacy way?  What is the
maximum
that can be supported by the qdev mothod?
 

I got up to 104 without trying very hard using the following script:

args=
for slot in 5 6 7 8 9 10 11 12 13 14 15 16 17; do
for fn in 0 1 2 3 4 5 6 7; do
 args=$args -netdev user,id=eth${slot}_${fn}
 args=$args -device
virtio-net-pci,addr=${slot}.${fn},netdev=eth${slot}_${fn},multifunction=on,romfile=
done
done

x86_64-softmmu/qemu-system-x86_64 -hda ~/images/linux.img ${args}
-enable-kvm

The key is to make the virtio-net devices multifunction and to fill
out all 8 functions for each slot.
   

I'm amazed that works.  Can't see how creating another qdev in the same
slot makes a proper multifunction device.
   


multifunction=on sets the multifunction bit for the PCI device.  Then 
it's a matter of setting the address to be a specific function.


Our default platform devices are actually multifunction.


This is unlikely to work right wrt pci hotplug.  If we want to support
a large number of interfaces, we need true multiport cards.
 

Indeed.  As far as I know, we can't hot plug multifunction PCI devices.
   


Yup.

Regards,

Anthony Liguori


What's the motivation for such a huge number of interfaces?
 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
   





Re: [Qemu-devel] [Bug 660366] Re: qemu-img convert -O qcow2 -o backing_file makes huge images

2010-10-14 Thread Anthony Liguori

On 10/14/2010 07:51 AM, Stefan Hajnoczi wrote:

On Thu, Oct 14, 2010 at 1:26 PM, Anthony Liguorianth...@codemonkey.ws  wrote:
   

** Changed in: qemu
   Importance: Undecided =  Wishlist

** Changed in: qemu
   Status: New =  Confirmed
 

Thanks for doing this Anthony.  Can I set the status myself next time
or do we have rules on who handles bugs?
   


I'm pretty sure anyone can do it.  If not, I'm certainly willing to give 
people extra rights on the project if they want to help with bug triage.


Regards,

Anthony Liguori


Stefan
   





Re: [Qemu-devel] Hitting 29 NIC limit

2010-10-14 Thread Anthony Liguori

On 10/14/2010 08:23 AM, Avi Kivity wrote:

 On 10/14/2010 02:54 PM, Anthony Liguori wrote:
The key is to make the virtio-net devices multifunction and to fill 
out all 8 functions for each slot.


This is unlikely to work right wrt pci hotplug.



Yes.  Our hotplug design is based on devices..  This is wrong, it 
should be based on bus-level concepts (like PCI slots).


If we want to support a large number of interfaces, we need true 
multiport cards.


This magic here creates a multiport virtio-net card so I'm not really 
sure what you're suggesting.  It would certainly be nice to make this 
all more user friendly (and make hotplug work).




The big issue is to fix hotplug.


Yes, but this is entirely independent of multifunction devices.

Today we shoe-horn hot remove into device_del.  Instead, we should have 
explicit bus-level interfaces for hot remove.


Regards,

Anthony Liguori

I don't see how we can make it user friendly, without making the 
ordinary case even more unfriendly.  Looks like we need yet another 
level of indirection here.







Re: [Qemu-devel] [PATCH 0/2] usb-ccid device (v2)

2010-10-14 Thread Anthony Liguori

On 10/14/2010 01:37 PM, Robert Relyea wrote:

Anthony Liguori wrote:

   

  And how does the smart card state get migrated during migration? How
  do you keep it synced with QEMU?

  I don't understand the use-case behind this. Is this so that a local
  physical smart card can be passed through to a guest from a Spice
  client and when migration happens, the QEMU instance connects back to
  the Spice client? So the device is never actually migrated?
   

A lot of this discussion has confused me until I realized we are talking
2 different models.

My current understanding is that qemu assumes that all devices are local
to the qemu instance (that is on the host). When you migrate you want to
connect to the new hardware on the new host, not feed back to some
general client. The only exception seems to be mouse and keyboard, where
qemu depends on some external protocol (vncclient or xdesktop or the x
protocol itself) to transport the mouse and keyboard events.

Our model has been that the smart card is local to the user/client --
like the mouse and keyboard. When you migrate qemu you do not migrate
the smart card itself, since it's still physically on your client
machine (like the mouse and keyboard), and needs to be managed by the
local drivers on that client machine (which knows how to talk to the
specific smart card installed there). So the daemon stays right where
it's at and connects to the new qemu instance as it comes up. This is
where I think I was confused about your migration question. I think you
are assuming that the smart card itself connects to new hardware on the
new host, meaning the daemon itself needs to move. If that is the
semantic you are trying to present, then you are quite right, it's
ludicrous to have the external daemon as part of the emulation.
   



Remote device passthrough is just a special case of passthrough.  It's 
got interesting characteristics in that unlike local device passthrough, 
if you preserve the connection to the remove device, migration is still 
possible.


However, remote device *emulation* is the thing that I'm concerned 
about.  Having a device emulated outside of QEMU means that it's not 
possible to participate in many of QEMU's features (like live migration, 
tracing, debugging, etc.).  Device creation is extremely complicated 
because you have to launch the external daemon and somehow configure that.


I have no objection to remote device passthrough but I don't think 
remote device emulation is right for QEMU today.


After talking to Alon in IRC, I think a better model for Spice would be 
to integrate the smart card emulation into QEMU and then develop a 
specific protocol for the smart card emulation to interface with the 
physical smart card.  This interface isn't really any different than the 
network interface or the block interface in QEMU today.


Regards,

Anthony Liguori


It now appears to me that qemu punts on this case, except for the
keyboard and mouse -- well maybe not punts, but simply doesn't support
any device that isn't on the host machine. If you look at the way qemu
handles the sound device, for instance. Normally you want the sound to
come out the speakers of the controlling console, not a random server
that's hosting the guest. However, straight qemu doesn't handle things
that way. The sound (if it comes out at all) comes out the server that
qemu is installed on. When you migrate qemu, the sound now comes out the
new server.

This probably isn't a problem since most of the time someone is using
the speaker, he's got the case where host == client. In that case it
makes perfect sense to put the emulator inside qemu. In the case where
we are running a hosted server service, it's highly unlikely anyone is
going to be using sound (or an attached webcam, etc.). In fact migration
for these devices are really a noop.

Smart cards are really like these devices. In fact more than a few
keyboards have built in smart card readers. The smart card model is I
want the smart card at the same location as my keyboard and mouse. I use
that set of smart cards to authenticate. The use case on machines
running with a server is that some customers have a requirement that you
need the smart card to log in and administer those machines. Those smart
cards are ones the operator carries with him, not ones that would sit on
some server farm. For their requirements, one needs a way to get back to
the local client.

As I said before, I don't think this requirement is unique. The only way
to handle it is to run code on the client machine. The devices that run
on that client are ones you don't migrate with qemu, but stay with the
client itself and reconnect to the new instance. I agree that having a
daemon for each devices will eventually become unweildy. It looks like
spice is the answer for this scenario. If you have devices other than
the mouse/keyboard/display that are located on the client == host, then
you should assume the need for spice and not use

Re: [Qemu-devel] Re: [PATCH] pc: e820 qemu_cfg tables need to be packed

2010-10-14 Thread Anthony Liguori

On 10/14/2010 02:44 PM, Jes Sorensen wrote:

On 10/14/10 20:33, Alex Williamson wrote:
   

We can't let the compiler define the alignment for qemu_cfg data.

Signed-off-by: Alex Williamsonalex.william...@redhat.com
---

0.13 stable candidate?
 

ACK I would say so.
   


fw_cfg interfaces are somewhat difficult to rationalize about for 
compatibility.


0.13.0 is tagged already so it's too late to pull it in there.  If we 
say we don't care about compatibility at the fw_cfg level, then it 
doesn't matter if we pull it into stable-0.13.  If we do care, then this 
is an ABI breaker.


I don't know that the answer is obvious to me.

Regards,

Anthony Liguori


Jes

   





Re: [Qemu-devel] Re: [PATCH] pc: e820 qemu_cfg tables need to be packed

2010-10-14 Thread Anthony Liguori

On 10/14/2010 02:58 PM, Alex Williamson wrote:

On Thu, 2010-10-14 at 14:48 -0500, Anthony Liguori wrote:
   

On 10/14/2010 02:44 PM, Jes Sorensen wrote:
 

On 10/14/10 20:33, Alex Williamson wrote:

   

We can't let the compiler define the alignment for qemu_cfg data.

Signed-off-by: Alex Williamsonalex.william...@redhat.com
---

0.13 stable candidate?

 

ACK I would say so.

   

fw_cfg interfaces are somewhat difficult to rationalize about for
compatibility.

0.13.0 is tagged already so it's too late to pull it in there.  If we
say we don't care about compatibility at the fw_cfg level, then it
doesn't matter if we pull it into stable-0.13.  If we do care, then this
is an ABI breaker.
 

If it works anywhere (I assume it works on 32bit), then it's only
because it happened to get the alignment right.  This just makes 64bit
hosts get it right too.  I don't see any compatibility issues,
non-packed + 64bit = broken.  Thanks,
   


Ok, I'll buy that argument :-)

Regards,

Anthony Liguori


Alex

   





Re: [Qemu-devel] Hitting 29 NIC limit

2010-10-14 Thread Anthony Liguori

On 10/14/2010 04:42 PM, Richard W.M. Jones wrote:

On Thu, Oct 14, 2010 at 01:10:47PM +0100, Daniel P. Berrange wrote:
   

Or a PCI bridge to wire up more PCI buses, so we raise the max limit for
any type of device we emulate.
 

Break the 29/30/31 virtio-blk limit ... please!
   


It was broken ages ago:

anth...@howler:~$ wc -l /proc/partitions; tail /proc/partitions
422 /proc/partitions
 251 1618  1 vdcx2
 251 1621 489951 vdcx5
 251 1632   10485760 vdcy
 251 16339992398 vdcy1
 251 1634  1 vdcy2
 251 1637 489951 vdcy5
 251 1648   10485760 vdcz
 251 16499992398 vdcz1
 251 1650  1 vdcz2
 251 1653 489951 vdcz5

This is what makes qdev so useful.

args=
for slot in 5 6 7 8 9 10 11 12 13 14 15 16 17; do
for fn in 0 1 2 3 4 5 6 7; do
args=$args -drive 
file=/home/anthony/images/linux.img,if=none,snapshot=on,id=disk${slot}_${fn}
args=$args -device 
virtio-blk-pci,addr=${slot}.${fn},drive=disk${slot}_${fn},multifunction=on

done
done

x86_64-softmmu/qemu-system-x86_64 -hda ~/images/linux.img ${args} 
-enable-kvm -serial stdio


Regards,

Anthony Liguori


Rich.

   





Re: [Qemu-devel] Hitting 29 NIC limit

2010-10-14 Thread Anthony Liguori

On 10/14/2010 05:00 PM, Anjali Kulkarni wrote:

Can you send me pointers to the qdev documentation? How can I use it? Will
it allow us to scale above the 32 PCI limit?
   


It's all below.  You just have to create a PCI device and mark the 
multifunction flag to on and then assign it a PCI address that includes 
a function number.  Then you can pack 8 virtio PCI devices into a single 
slot.


Regards,

Anthony Liguori


Anjali


On 10/14/10 2:57 PM, Anthony Liguorianth...@codemonkey.ws  wrote:

   

On 10/14/2010 04:42 PM, Richard W.M. Jones wrote:
 

On Thu, Oct 14, 2010 at 01:10:47PM +0100, Daniel P. Berrange wrote:

   

Or a PCI bridge to wire up more PCI buses, so we raise the max limit for
any type of device we emulate.

 

Break the 29/30/31 virtio-blk limit ... please!

   

It was broken ages ago:

anth...@howler:~$ wc -l /proc/partitions; tail /proc/partitions
422 /proc/partitions
   251 1618  1 vdcx2
   251 1621 489951 vdcx5
   251 1632   10485760 vdcy
   251 16339992398 vdcy1
   251 1634  1 vdcy2
   251 1637 489951 vdcy5
   251 1648   10485760 vdcz
   251 16499992398 vdcz1
   251 1650  1 vdcz2
   251 1653 489951 vdcz5

This is what makes qdev so useful.

args=
for slot in 5 6 7 8 9 10 11 12 13 14 15 16 17; do
for fn in 0 1 2 3 4 5 6 7; do
  args=$args -drive
file=/home/anthony/images/linux.img,if=none,snapshot=on,id=disk${slot}_${fn}
  args=$args -device
virtio-blk-pci,addr=${slot}.${fn},drive=disk${slot}_${fn},multifunction=on
done
done

x86_64-softmmu/qemu-system-x86_64 -hda ~/images/linux.img ${args}
-enable-kvm -serial stdio

Regards,

Anthony Liguori

 

Rich.


   

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
 
   





Re: [Qemu-devel] [PATCH 0/2] usb-ccid device (v2)

2010-10-14 Thread Anthony Liguori

On 10/14/2010 05:03 PM, Robert Relyea wrote:
Remote device passthrough is just a special case of passthrough.  
It's got interesting characteristics in that unlike local device 
passthrough, if you preserve the connection to the remove device, 
migration is still possible.


However, remote device *emulation* is the thing that I'm concerned 
about.  Having a device emulated outside of QEMU means that it's not 
possible to participate in many of QEMU's features (like live 
migration, tracing, debugging, etc.).  Device creation is extremely 
complicated because you have to launch the external daemon and 
somehow configure that.
There's always some emulation going on on the client side. The client 
side has the device drivers, so you are either emulating an actual 
device or you are emulating the abstraction you invent. Once you have 
the client side, you have to launch the external daemon anyway.


That's not a very convincing argument.

It's pretty simple really.  We don't want to split QEMU into a bunch of 
different daemons that all implement device emulation in slightly 
different ways.  The user complexity is enormous and the ability to 
manage the complexity because impossible because nothing is centralized.


It seems to me that the best way to go is to provide the native host 
== client support like other devices and allow the passthru. If you 
really need client support, just run spice (which is a single client 
daemon that handles everything).


Let's not confuse passthrough with implementing device emulation outside 
of QEMU.  They are two very different things.


I think a remote passthrough protocol who's sole purpose is to allow 
external device emulation is a bad idea for QEMU.




After talking to Alon in IRC, I think a better model for Spice would 
be to integrate the smart card emulation into QEMU and then develop a 
specific protocol for the smart card emulation to interface with the 
physical smart card.  This interface isn't really any different than 
the network interface or the block interface in QEMU today.
I seems to me that a second protocol is overkill. Having 2 protocols 
is a bit much to manage. We can do everything we need with the passthru.


How is external device emulation not overkill?  I don't see why two 
protocols are necessary.  You just need one.


My worry about creating any thing else is we may not have the 
flexibility to handle future cards. Smart cards themselves are 
programmable, so the interface for new cards are pretty dynamic.


My worry is that we're creating an impossible situation to maintain in 
the long term because device emulation is happening in 10 different 
places.  If there's a bug in your smart card emulation, a guest can now 
break into a Spice client.  Part of the advantage of keeping everything 
contained in a single place (QEMU) is that we can restrict QEMU from a 
security perspective via sVirt and other mechanisms.  Once you split 
apart device emulation, you break that security model.


Regards,

Anthony Liguori


bob


Regards,

Anthony Liguori


It now appears to me that qemu punts on this case, except for the
keyboard and mouse -- well maybe not punts, but simply doesn't support
any device that isn't on the host machine. If you look at the way qemu
handles the sound device, for instance. Normally you want the sound to
come out the speakers of the controlling console, not a random server
that's hosting the guest. However, straight qemu doesn't handle things
that way. The sound (if it comes out at all) comes out the server that
qemu is installed on. When you migrate qemu, the sound now comes out the
new server.

This probably isn't a problem since most of the time someone is using
the speaker, he's got the case where host == client. In that case it
makes perfect sense to put the emulator inside qemu. In the case where
we are running a hosted server service, it's highly unlikely anyone is
going to be using sound (or an attached webcam, etc.). In fact migration
for these devices are really a noop.

Smart cards are really like these devices. In fact more than a few
keyboards have built in smart card readers. The smart card model is I
want the smart card at the same location as my keyboard and mouse. I use
that set of smart cards to authenticate. The use case on machines
running with a server is that some customers have a requirement that you
need the smart card to log in and administer those machines. Those smart
cards are ones the operator carries with him, not ones that would sit on
some server farm. For their requirements, one needs a way to get back to
the local client.

As I said before, I don't think this requirement is unique. The only way
to handle it is to run code on the client machine. The devices that run
on that client are ones you don't migrate with qemu, but stay with the
client itself and reconnect to the new instance. I agree that having a
daemon for each devices will eventually become unweildy. It looks like
spice

Re: [Qemu-devel] Hitting 29 NIC limit

2010-10-14 Thread Anthony Liguori

On 10/14/2010 05:12 PM, Anjali Kulkarni wrote:

Thanks. Does this work for e1000 as well?
   


Haven't tried.  I don't know how various e1000 drivers would react.


Also, does it support pci hotplug?
   


No, but that's fixable down the road.

Regards,

Anthony Liguori


Anjali

On 10/14/10 3:09 PM, Anthony Liguorianth...@codemonkey.ws  wrote:

   

On 10/14/2010 05:00 PM, Anjali Kulkarni wrote:
 

Can you send me pointers to the qdev documentation? How can I use it? Will
it allow us to scale above the 32 PCI limit?

   

It's all below.  You just have to create a PCI device and mark the
multifunction flag to on and then assign it a PCI address that includes
a function number.  Then you can pack 8 virtio PCI devices into a single
slot.

Regards,

Anthony Liguori

 

Anjali


On 10/14/10 2:57 PM, Anthony Liguorianth...@codemonkey.ws   wrote:


   

On 10/14/2010 04:42 PM, Richard W.M. Jones wrote:

 

On Thu, Oct 14, 2010 at 01:10:47PM +0100, Daniel P. Berrange wrote:


   

Or a PCI bridge to wire up more PCI buses, so we raise the max limit for
any type of device we emulate.


 

Break the 29/30/31 virtio-blk limit ... please!


   

It was broken ages ago:

anth...@howler:~$ wc -l /proc/partitions; tail /proc/partitions
422 /proc/partitions
251 1618  1 vdcx2
251 1621 489951 vdcx5
251 1632   10485760 vdcy
251 16339992398 vdcy1
251 1634  1 vdcy2
251 1637 489951 vdcy5
251 1648   10485760 vdcz
251 16499992398 vdcz1
251 1650  1 vdcz2
251 1653 489951 vdcz5

This is what makes qdev so useful.

args=
for slot in 5 6 7 8 9 10 11 12 13 14 15 16 17; do
for fn in 0 1 2 3 4 5 6 7; do
   args=$args -drive

 

file=/home/anthony/images/linux.img,if=none,snapshot=on,id=disk${slot}_${fn}

   

   args=$args -device
virtio-blk-pci,addr=${slot}.${fn},drive=disk${slot}_${fn},multifunction=on
done
done

x86_64-softmmu/qemu-system-x86_64 -hda ~/images/linux.img ${args}
-enable-kvm -serial stdio

Regards,

Anthony Liguori


 

Rich.



   

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

 


   
 
   





[Qemu-devel] Re: Passing in additional info to guest OS and e1000 test suite?

2010-10-15 Thread Anthony Liguori

On 10/15/2010 01:49 PM, Anjali Kulkarni wrote:

Hi,

- If I want to pass in additional arguments to the guest OS while booting(in
particular which slot I want to map a nic to) - is there any way to do it?
Some kind of configuration file that I can pass in would also be ok for me.
- Is there a KVM/Qemu/e1000 test suite that is already available and I can
us?
   


I'm a little confused about what you're really asking.  Can you give a 
more illustrated example of what you're asking?  Are you using libvirt?



Thanks
Anjali

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
   





Re: [Qemu-devel] [PATCH 1/2] pci: Automatically patch PCI device id in PCI ROM

2010-10-15 Thread Anthony Liguori

On 10/15/2010 03:51 PM, Stefan Weil wrote:

PCI device with different device ids sometimes share
the same rom code. Only the device id and the checksum
differ in a boot rom for such devices.
   


BTW, SeaBIOS doesn't reject ROMs when they're loaded via rombar, only 
when they're loaded via romfile.


Maybe it's better to use fw_cfg to explicitly tell SeaBIOS to ignore the 
PCI device id in the rom header for a certain device?


Regards,

Anthony Liguori


The i825xx ethernet controller family is a typical example
which is implemented in hw/eepro100.c. It uses at least
3 different device ids, so normally 3 boot roms would be needed.

By automatically patching the device id (and the checksum)
in qemu, all emulated family members can share the same
boot rom.

Cc: Markus Armbrusterarm...@redhat.com
Cc: Michael S. Tsirkinm...@redhat.com
Signed-off-by: Stefan Weilw...@mail.berlios.de
---
  hw/pci.c |   53 +
  1 files changed, 53 insertions(+), 0 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 1280d4d..c1f8218 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1797,6 +1797,57 @@ static void pci_map_option_rom(PCIDevice *pdev, int 
region_num, pcibus_t addr, p
  cpu_register_physical_memory(addr, size, pdev-rom_offset);
  }

+/* Patch the PCI device id in a PCI rom image if necessary.
+   This is needed for an option rom which is used for more than one device. */
+static void pci_patch_device_id(PCIDevice *pdev, uint8_t *ptr, int size)
+{
+uint16_t vendor_id;
+uint16_t device_id;
+uint16_t rom_vendor_id;
+uint16_t rom_device_id;
+uint16_t rom_magic;
+uint16_t pcir_offset;
+
+/* Words in rom data are little endian (like in PCI configuration),
+   so they can be read / written with pci_get_word / pci_set_word. */
+
+/* Only a valid rom will be patched. */
+rom_magic = pci_get_word(ptr);
+if (rom_magic != 0xaa55) {
+PCI_DPRINTF(Bad ROM magic %04x\n, rom_magic);
+return;
+}
+pcir_offset = pci_get_word(ptr + 0x18);
+if (pcir_offset + 8= size || memcmp(ptr + pcir_offset, PCIR, 4)) {
+PCI_DPRINTF(Bad PCIR offset 0x%x or signature\n, pcir_offset);
+return;
+}
+
+vendor_id = pci_get_word(pdev-config + PCI_VENDOR_ID);
+device_id = pci_get_word(pdev-config + PCI_DEVICE_ID);
+rom_vendor_id = pci_get_word(ptr + pcir_offset + 4);
+rom_device_id = pci_get_word(ptr + pcir_offset + 6);
+
+/* Don't patch a rom with wrong vendor id (might be changed if needed). */
+if (vendor_id != rom_vendor_id) {
+return;
+}
+
+PCI_DPRINTF(ROM id %04x%04x / PCI id %04x%04x\n,
+vendor_id, device_id, rom_vendor_id, rom_device_id);
+
+if (device_id != rom_device_id) {
+/* Patch device id and checksum (at offset 6 for etherboot roms). */
+uint8_t checksum;
+checksum = ptr[6];
+checksum += (uint8_t)rom_device_id + (uint8_t)(rom_device_id  8);
+checksum -= (uint8_t)device_id + (uint8_t)(device_id  8);
+PCI_DPRINTF(ROM checksum %02x / %02x\n, ptr[6], checksum);
+ptr[6] = checksum;
+pci_set_word(ptr + pcir_offset + 6, device_id);
+}
+}
+
  /* Add an option rom for the device */
  static int pci_add_option_rom(PCIDevice *pdev)
  {
@@ -1849,6 +1900,8 @@ static int pci_add_option_rom(PCIDevice *pdev)
  load_image(path, ptr);
  qemu_free(path);

+pci_patch_device_id(pdev, ptr, size);
+
  pci_register_bar(pdev, PCI_ROM_SLOT, size,
   0, pci_map_option_rom);

   





Re: [Qemu-devel] Changelog of qemu-0.13.0.tar.gz ?

2010-10-18 Thread Anthony Liguori

On 10/17/2010 11:58 PM, Sergei Steshenko wrote:

Hello,

though there is already

http://download.savannah.gnu.org/releases/qemu/qemu-0.13.0.tar.gz

available, I don't see its changelog on

http://wiki.qemu.org/Index.html
.

Is it expected to be this way ?
   


I haven't sent the announce yet (which will be coming very soon).

Regards,

Anthony Liguori


Thanks,
   Sergei.





   





Re: [Qemu-devel] LP#584139

2010-10-18 Thread Anthony Liguori

On 10/18/2010 06:34 AM, Michael Tokarev wrote:

Can we fix this trivial bug please?
See:
  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=578846
  https://bugs.launchpad.net/qemu/+bug/584139

I switched qemu-kvm in debian to use qemu-keymaps package
(separately packaged keymaps), but it re-introduces
debian#578846.

Thanks!
   


Can you send a patch to the mailing list with a Signed-off-by and an 
explanation of the change?


Regards,

Anthony Liguori


/mjt

   





[Qemu-devel] [ANNOUNCE] Release 0.13.0 of QEMU

2010-10-18 Thread Anthony Liguori

The QEMU team is pleased to announce the availability of the 0.13.0 release.

This release consists of over 2,500 commits from 145 contributors.

Some major features were added in this release including:

 - vhost-net: kernel-accelerating network backend for virtio devices 
(using KVM)

 - qmp: significant improvements covering most monitor commands
 - vnc: introduction of new encodings that dramatically improve 
bandwidth (part of GSoC project)
 - ivshmem: new shared memory device allowing multiple guests to share 
a memory region

 - mips: introduction of fulong mini-pc
 - virtio-9p: introduction of a paravirtual file system passthrough 
mechanism

 - hpet: many enhancements
 - target-s390: support for s390 usermode emulation
 - many more features and bug fixes

It can be downloaded from Savannah at:

http://download.savannah.gnu.org/releases/qemu/qemu-0.13.0.tar.gz

For detailed Changelogs, please consult the revision history in git.

On behalf of the QEMU team, I'd like to thank everyone who contributed
to make this release happen!

A special note about QMP support in 0.13.0.  QMP is still considered 
experimental in 0.13.0.  There are no plans to change the protocol in an 
incompatible way but there are likely to be missing features.


Regards,

Anthony Liguori



[Qemu-devel] Re: [PATCH 1/3] Make kvm64 the default cpu model when kvm_enabled()

2010-10-18 Thread Anthony Liguori

On 10/18/2010 03:22 AM, Roedel, Joerg wrote:

(Sorry for the late reply)

On Thu, Oct 07, 2010 at 08:48:06AM -0400, Anthony Liguori wrote:
   

On 10/07/2010 03:42 AM, Roedel, Joerg wrote:
 

On Wed, Oct 06, 2010 at 03:24:59PM -0400, Anthony Liguori wrote:

   

+qemu_compat_version = machine-compat_version;
+
if (display_type == DT_NOGRAPHIC) {
if (default_parallel)
add_device_config(DEV_PARALLEL, null);
--
1.7.0.4


 

Looks fine to me, given CPUs are not in qdev. Anthony?


   

The idea is fine, but why not just add the default CPU to the machine
description?

 

If I remember correctly the reason was that the machine description was
not accessible in the cpuid initialization path because it is a function
local variable.
   

Not tested at all but I think the attached patch addresses it in a
pretty nice way.

There's a couple ways you could support your patch on top of this.  You
could add a kvm_cpu_model to the machine structure that gets defaulted
too if kvm_enabled().  You could also introduce a new KVM machine type
that gets defaulted to if no explicit machine is specified.
 

I had something similar in mind but then I realized that we need at
least a cpu_model and a cpu_model_kvm to distinguish between the TCG and
the KVM case.
   


I would think that having different default machines for KVM and TCG 
would be a better solution.



Further the QEMUMachine data structure is used for all architectures in
QEMU and the model-names only make sense for x86.


SPARC uses cpu_model too FWIW.  I believe Blue Swirl has even discussed 
using a feature-format similar to how x86 does it for SPARC CPUs.


Regards,

Anthony Liguori


  So I decided for the
comapt-version way (which doesn't mean I object against this one ;-) )

Joerg

   

 From d2370c88cef4b07d48ba3c4804e35ae2db8db7c0 Mon Sep 17 00:00:00 2001
From: Anthony Liguorialigu...@us.ibm.com
Date: Thu, 7 Oct 2010 07:43:42 -0500
Subject: [PATCH] machine: make default cpu model part of machine structure

Signed-off-by: Anthony Liguorialigu...@us.ibm.com

diff --git a/hw/boards.h b/hw/boards.h
index 6f0f0d7..8c6ef27 100644
--- a/hw/boards.h
+++ b/hw/boards.h
@@ -16,6 +16,7 @@ typedef struct QEMUMachine {
  const char *name;
  const char *alias;
  const char *desc;
+const char *cpu_model;
  QEMUMachineInitFunc *init;
  int use_scsi;
  int max_cpus;
diff --git a/hw/pc.c b/hw/pc.c
index 69b13bf..0826107 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -866,14 +866,6 @@ void pc_cpus_init(const char *cpu_model)
  int i;

  /* init CPUs */
-if (cpu_model == NULL) {
-#ifdef TARGET_X86_64
-cpu_model = qemu64;
-#else
-cpu_model = qemu32;
-#endif
-}
-
  for(i = 0; i  smp_cpus; i++) {
  pc_new_cpu(cpu_model);
  }
diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index 12359a7..919b4d6 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -204,17 +204,22 @@ static void pc_init_isa(ram_addr_t ram_size,
  const char *initrd_filename,
  const char *cpu_model)
  {
-if (cpu_model == NULL)
-cpu_model = 486;
  pc_init1(ram_size, boot_device,
   kernel_filename, kernel_cmdline,
   initrd_filename, cpu_model, 0);
  }

+#ifdef TARGET_X86_64
+#define DEF_CPU_MODEL qemu64
+#else
+#define DEF_CPU_MODEL qemu32
+#endif
+
  static QEMUMachine pc_machine = {
  .name = pc-0.13,
  .alias = pc,
  .desc = Standard PC,
+.cpu_model = DEF_CPU_MODEL,
  .init = pc_init_pci,
  .max_cpus = 255,
  .is_default = 1,
@@ -223,6 +228,7 @@ static QEMUMachine pc_machine = {
  static QEMUMachine pc_machine_v0_12 = {
  .name = pc-0.12,
  .desc = Standard PC,
+.cpu_model = DEF_CPU_MODEL,
  .init = pc_init_pci,
  .max_cpus = 255,
  .compat_props = (GlobalProperty[]) {
@@ -242,6 +248,7 @@ static QEMUMachine pc_machine_v0_12 = {
  static QEMUMachine pc_machine_v0_11 = {
  .name = pc-0.11,
  .desc = Standard PC, qemu 0.11,
+.cpu_model = DEF_CPU_MODEL,
  .init = pc_init_pci,
  .max_cpus = 255,
  .compat_props = (GlobalProperty[]) {
@@ -277,6 +284,7 @@ static QEMUMachine pc_machine_v0_11 = {
  static QEMUMachine pc_machine_v0_10 = {
  .name = pc-0.10,
  .desc = Standard PC, qemu 0.10,
+.cpu_model = DEF_CPU_MODEL,
  .init = pc_init_pci,
  .max_cpus = 255,
  .compat_props = (GlobalProperty[]) {
@@ -324,6 +332,7 @@ static QEMUMachine pc_machine_v0_10 = {
  static QEMUMachine isapc_machine = {
  .name = isapc,
  .desc = ISA-only PC,
+.cpu_model = 486,
  .init = pc_init_isa,
  .max_cpus = 1,
  };
diff --git a/vl.c b/vl.c
index df414ef..3a55cc8 100644
--- a/vl.c
+++ b/vl.c
@@ -2904,6 +2904,10 @@ int main(int argc, char **argv, char **envp)
  }
  qemu_add_globals();

+if (cpu_model == NULL) {
+cpu_model = machine-cpu_model;
+}
+
  machine-init

Re: [Qemu-devel] [PATCH 1/2] pci: Automatically patch PCI device id in PCI ROM

2010-10-18 Thread Anthony Liguori

On 10/18/2010 05:09 AM, Gerd Hoffmann wrote:

On 10/15/10 23:05, Anthony Liguori wrote:

On 10/15/2010 03:51 PM, Stefan Weil wrote:

PCI device with different device ids sometimes share
the same rom code. Only the device id and the checksum
differ in a boot rom for such devices.


BTW, SeaBIOS doesn't reject ROMs when they're loaded via rombar, only
when they're loaded via romfile.


SeaBIOS rejects them when loaded from the rom bar and doesn't reject 
them when loaded via fw_cfg.


What I meant was, rombar=0 in qdev.  Sometimes my fingers don't work the 
same way my brain does :-)


Using the rom bar is the prefered way though, fw_cfg is only there for 
compatibility with older versions.



Maybe it's better to use fw_cfg to explicitly tell SeaBIOS to ignore the
PCI device id in the rom header for a certain device?


Patching the rom is fine IMHO.  Why create + use a separate 
communication path when we can use a much simpler approach?


How does this interact with PCI device passthrough?

We clearly can't patch in that case whereas if we had a hint to SeaBIOS, 
it would still work.


Regards,

Anthony Liguori


cheers,
  Gerd






Re: [Qemu-devel] Re: [PATCH 1/2] pci: Automatically patch PCI vendor id and device id in PCI ROM

2010-10-18 Thread Anthony Liguori

On 10/18/2010 12:58 PM, Michael S. Tsirkin wrote:

On Mon, Oct 18, 2010 at 07:55:11PM +0200, Stefan Weil wrote:
   

PCI devices with different vendor or device ids sometimes share
the same rom code. Only the ids and the checksum
differs in a boot rom for such devices.

The i825xx ethernet controller family is a typical example
which is implemented in hw/eepro100.c. It uses at least
3 different device ids, so normally 3 boot roms would be needed.

By automatically patching vendor id and device id (and the checksum)
in qemu, all emulated family members can share the same boot rom.

VGA bios roms are another example with different vendor and device ids.

v2:

* Patch also the vendor id (and remove the sanity check for vendor id).

Cc: Gerd Hoffmannkra...@redhat.com
Cc: Markus Armbrusterarm...@redhat.com
Cc: Michael S. Tsirkinm...@redhat.com
Signed-off-by: Stefan Weilw...@mail.berlios.de
---
  hw/pci.c |   58 ++
  1 files changed, 58 insertions(+), 0 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 1280d4d..139eb24 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1797,6 +1797,62 @@ static void pci_map_option_rom(PCIDevice *pdev, int 
region_num, pcibus_t addr, p
  cpu_register_physical_memory(addr, size, pdev-rom_offset);
  }

+/* Patch the PCI vendor and device ids in a PCI rom image if necessary.
+   This is needed for an option rom which is used for more than one device. */
+static void pci_patch_ids(PCIDevice *pdev, uint8_t *ptr, int size)
 

let's return an error code on malformed roms so management can detect errors?
   


A bad/missing PnP header does not mean it's an invalid ROM.

Regards,

Anthony Liguori





Re: [Qemu-devel] [PATCH 1/2] pci: Automatically patch PCI vendor id and device id in PCI ROM

2010-10-18 Thread Anthony Liguori

On 10/18/2010 12:55 PM, Stefan Weil wrote:

PCI devices with different vendor or device ids sometimes share
the same rom code. Only the ids and the checksum
differs in a boot rom for such devices.

The i825xx ethernet controller family is a typical example
which is implemented in hw/eepro100.c. It uses at least
3 different device ids, so normally 3 boot roms would be needed.

By automatically patching vendor id and device id (and the checksum)
in qemu, all emulated family members can share the same boot rom.

VGA bios roms are another example with different vendor and device ids.

v2:

* Patch also the vendor id (and remove the sanity check for vendor id).

Cc: Gerd Hoffmannkra...@redhat.com
Cc: Markus Armbrusterarm...@redhat.com
Cc: Michael S. Tsirkinm...@redhat.com
Signed-off-by: Stefan Weilw...@mail.berlios.de
   


I get very nervous about patching a ROM.  Who's to say that the ROM 
doesn't somehow depend on the contents of its header?  Maybe it has an 
internal CRC built into it or something like that.


Regards,

Anthony Liguori


---
  hw/pci.c |   58 ++
  1 files changed, 58 insertions(+), 0 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 1280d4d..139eb24 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1797,6 +1797,62 @@ static void pci_map_option_rom(PCIDevice *pdev, int 
region_num, pcibus_t addr, p
  cpu_register_physical_memory(addr, size, pdev-rom_offset);
  }

+/* Patch the PCI vendor and device ids in a PCI rom image if necessary.
+   This is needed for an option rom which is used for more than one device. */
+static void pci_patch_ids(PCIDevice *pdev, uint8_t *ptr, int size)
+{
+uint16_t vendor_id;
+uint16_t device_id;
+uint16_t rom_vendor_id;
+uint16_t rom_device_id;
+uint16_t rom_magic;
+uint16_t pcir_offset;
+uint8_t checksum;
+
+/* Words in rom data are little endian (like in PCI configuration),
+   so they can be read / written with pci_get_word / pci_set_word. */
+
+/* Only a valid rom will be patched. */
+rom_magic = pci_get_word(ptr);
+if (rom_magic != 0xaa55) {
+PCI_DPRINTF(Bad ROM magic %04x\n, rom_magic);
+return;
+}
+pcir_offset = pci_get_word(ptr + 0x18);
+if (pcir_offset + 8= size || memcmp(ptr + pcir_offset, PCIR, 4)) {
+PCI_DPRINTF(Bad PCIR offset 0x%x or signature\n, pcir_offset);
+return;
+}
+
+vendor_id = pci_get_word(pdev-config + PCI_VENDOR_ID);
+device_id = pci_get_word(pdev-config + PCI_DEVICE_ID);
+rom_vendor_id = pci_get_word(ptr + pcir_offset + 4);
+rom_device_id = pci_get_word(ptr + pcir_offset + 6);
+
+PCI_DPRINTF(ROM id %04x%04x / PCI id %04x%04x\n,
+vendor_id, device_id, rom_vendor_id, rom_device_id);
+
+checksum = ptr[6];
+
+if (vendor_id != rom_vendor_id) {
+/* Patch vendor id and checksum (at offset 6 for etherboot roms). */
+checksum += (uint8_t)rom_vendor_id + (uint8_t)(rom_vendor_id  8);
+checksum -= (uint8_t)vendor_id + (uint8_t)(vendor_id  8);
+PCI_DPRINTF(ROM checksum %02x / %02x\n, ptr[6], checksum);
+ptr[6] = checksum;
+pci_set_word(ptr + pcir_offset + 4, vendor_id);
+}
+
+if (device_id != rom_device_id) {
+/* Patch device id and checksum (at offset 6 for etherboot roms). */
+checksum += (uint8_t)rom_device_id + (uint8_t)(rom_device_id  8);
+checksum -= (uint8_t)device_id + (uint8_t)(device_id  8);
+PCI_DPRINTF(ROM checksum %02x / %02x\n, ptr[6], checksum);
+ptr[6] = checksum;
+pci_set_word(ptr + pcir_offset + 6, device_id);
+}
+}
+
  /* Add an option rom for the device */
  static int pci_add_option_rom(PCIDevice *pdev)
  {
@@ -1849,6 +1905,8 @@ static int pci_add_option_rom(PCIDevice *pdev)
  load_image(path, ptr);
  qemu_free(path);

+pci_patch_ids(pdev, ptr, size);
+
  pci_register_bar(pdev, PCI_ROM_SLOT, size,
   0, pci_map_option_rom);

   





Re: [Qemu-devel] [PATCH 1/2] pci: Automatically patch PCI vendor id and device id in PCI ROM

2010-10-18 Thread Anthony Liguori

On 10/18/2010 01:44 PM, Anthony Liguori wrote:

On 10/18/2010 12:55 PM, Stefan Weil wrote:

PCI devices with different vendor or device ids sometimes share
the same rom code. Only the ids and the checksum
differs in a boot rom for such devices.

The i825xx ethernet controller family is a typical example
which is implemented in hw/eepro100.c. It uses at least
3 different device ids, so normally 3 boot roms would be needed.

By automatically patching vendor id and device id (and the checksum)
in qemu, all emulated family members can share the same boot rom.

VGA bios roms are another example with different vendor and device ids.

v2:

* Patch also the vendor id (and remove the sanity check for vendor id).

Cc: Gerd Hoffmannkra...@redhat.com
Cc: Markus Armbrusterarm...@redhat.com
Cc: Michael S. Tsirkinm...@redhat.com
Signed-off-by: Stefan Weilw...@mail.berlios.de


I get very nervous about patching a ROM.  Who's to say that the ROM 
doesn't somehow depend on the contents of its header?  Maybe it has an 
internal CRC built into it or something like that.


As part of PMM, ROMs typically reduce their size by decompressing and 
removing code or something of that nature and then rewrite themselves in 
scratch RAM.  The BIOS then copies the resulting ROM (using the ROM size 
in the base header as an indication of how much to copy) into the option 
ROM space.


So the likelihood of depending on the contents of the header seems 
non-trivial to me.


Regards,

Anthony Liguori


Regards,

Anthony Liguori


---
  hw/pci.c |   58 
++

  1 files changed, 58 insertions(+), 0 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 1280d4d..139eb24 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1797,6 +1797,62 @@ static void pci_map_option_rom(PCIDevice 
*pdev, int region_num, pcibus_t addr, p

  cpu_register_physical_memory(addr, size, pdev-rom_offset);
  }

+/* Patch the PCI vendor and device ids in a PCI rom image if necessary.
+   This is needed for an option rom which is used for more than one 
device. */

+static void pci_patch_ids(PCIDevice *pdev, uint8_t *ptr, int size)
+{
+uint16_t vendor_id;
+uint16_t device_id;
+uint16_t rom_vendor_id;
+uint16_t rom_device_id;
+uint16_t rom_magic;
+uint16_t pcir_offset;
+uint8_t checksum;
+
+/* Words in rom data are little endian (like in PCI configuration),
+   so they can be read / written with pci_get_word / 
pci_set_word. */

+
+/* Only a valid rom will be patched. */
+rom_magic = pci_get_word(ptr);
+if (rom_magic != 0xaa55) {
+PCI_DPRINTF(Bad ROM magic %04x\n, rom_magic);
+return;
+}
+pcir_offset = pci_get_word(ptr + 0x18);
+if (pcir_offset + 8= size || memcmp(ptr + pcir_offset, PCIR, 
4)) {
+PCI_DPRINTF(Bad PCIR offset 0x%x or signature\n, 
pcir_offset);

+return;
+}
+
+vendor_id = pci_get_word(pdev-config + PCI_VENDOR_ID);
+device_id = pci_get_word(pdev-config + PCI_DEVICE_ID);
+rom_vendor_id = pci_get_word(ptr + pcir_offset + 4);
+rom_device_id = pci_get_word(ptr + pcir_offset + 6);
+
+PCI_DPRINTF(ROM id %04x%04x / PCI id %04x%04x\n,
+vendor_id, device_id, rom_vendor_id, rom_device_id);
+
+checksum = ptr[6];
+
+if (vendor_id != rom_vendor_id) {
+/* Patch vendor id and checksum (at offset 6 for etherboot 
roms). */
+checksum += (uint8_t)rom_vendor_id + 
(uint8_t)(rom_vendor_id  8);

+checksum -= (uint8_t)vendor_id + (uint8_t)(vendor_id  8);
+PCI_DPRINTF(ROM checksum %02x / %02x\n, ptr[6], checksum);
+ptr[6] = checksum;
+pci_set_word(ptr + pcir_offset + 4, vendor_id);
+}
+
+if (device_id != rom_device_id) {
+/* Patch device id and checksum (at offset 6 for etherboot 
roms). */
+checksum += (uint8_t)rom_device_id + 
(uint8_t)(rom_device_id  8);

+checksum -= (uint8_t)device_id + (uint8_t)(device_id  8);
+PCI_DPRINTF(ROM checksum %02x / %02x\n, ptr[6], checksum);
+ptr[6] = checksum;
+pci_set_word(ptr + pcir_offset + 6, device_id);
+}
+}
+
  /* Add an option rom for the device */
  static int pci_add_option_rom(PCIDevice *pdev)
  {
@@ -1849,6 +1905,8 @@ static int pci_add_option_rom(PCIDevice *pdev)
  load_image(path, ptr);
  qemu_free(path);

+pci_patch_ids(pdev, ptr, size);
+
  pci_register_bar(pdev, PCI_ROM_SLOT, size,
   0, pci_map_option_rom);









Re: Testing of russian keymap (was Re: [Qemu-devel] [PATCH] fix '/' and '|' on russian keymap)

2010-10-18 Thread Anthony Liguori

On 10/18/2010 12:30 PM, Oleg Sadov wrote:

I don't understand reasons for such locale-default keyboard settings for
qemu too, but may be it's useful for someone...
   


-k only exists to deal with crappy VNC clients.

If you use a good VNC client (like vinagre or virt-viewer) then you 
don't have to use -k.


Regards,

Anthony Liguori


Thanks!

/mjt
 

Regards!
--Oleg


   





Re: [Qemu-devel] Re: [PATCH 1/2] pci: Automatically patch PCI vendor id and device id in PCI ROM

2010-10-18 Thread Anthony Liguori

On 10/18/2010 02:03 PM, Michael S. Tsirkin wrote:

On Mon, Oct 18, 2010 at 01:42:06PM -0500, Anthony Liguori wrote:
   

+/* Patch the PCI vendor and device ids in a PCI rom image if necessary.
+   This is needed for an option rom which is used for more than one device. */
+static void pci_patch_ids(PCIDevice *pdev, uint8_t *ptr, int size)
 

let's return an error code on malformed roms so management can detect errors?
   

A bad/missing PnP header does not mean it's an invalid ROM.
 

I don't see this as a generic capability - rather a specific
hack that helps reduce some duplication for eepro100 and friends.
As such, if we can't patch the id we know it's an invalid file.
   


This code is unconditional in the pci option rom loading path.

If it's restricted to a qdev property that's defaulted to enabled for 
the eepro cards, that would be a reasonable argument to make.


Regards,

Anthony Liguori





Re: [Qemu-devel] Re: [PATCH 1/2] pci: Automatically patch PCI vendor id and device id in PCI ROM

2010-10-18 Thread Anthony Liguori

On 10/18/2010 02:36 PM, Stefan Weil wrote:

Maybe a more perfect solution would only patch the preconfigured
rom files but not user configured files, but I don't think we
need this degree of perfection.


Generally speaking, patching third-party code is not something that we 
should get in the habit of doing unless we're very very sure that it's 
okay and we have as many checks in place as possible to avoid bad things 
from happening.


There are so many bad things that can happen.  If attempted to support 
attestation in QEMU and prepopulated a virtual TPM with checksums from 
the BIOS and ROMs, when the virtual BIOS attempts to measure itself if 
we've patched the ROM underneath of it, then the measurements will fail.


In the very least, if we go this route, it has to be an optional feature.

Regards,

Anthony Liguori


Regards,
Stefan






Re: [Qemu-devel] [PATCH 0/7] ATAPI CDROM passthrough v5

2010-10-18 Thread Anthony Liguori

On 10/18/2010 06:29 PM, Alexander Graf wrote:

A user will get a really nasty surprise if they think they can use a flag or 
rely on QEMU to prevent a VM from doing something nasty with a device.  If they 
have this feeling of security, they're likely to chmod the device to allow 
unprivileged users to access it.

But how a device handles ATAPI commands is totally up to the device.  If you 
issue the wrong sequence, I'm sure there are devices out there that totally 
hose themselves.  Are you absolutely confident that every ATAPI device out 
there is completely safe against hostile code provided that you simply prevent 
the FW update commands?  I'm certainly not.
 

Ping?
   


Who are you pinging?

Regards,

Anthony Liguori


Alex


   





Re: [Qemu-devel] Re: KVM call agenda for Oct 19

2010-10-19 Thread Anthony Liguori

On 10/19/2010 08:03 AM, Avi Kivity wrote:

 On 10/19/2010 02:58 PM, Dor Laor wrote:

On 10/19/2010 02:55 PM, Avi Kivity wrote:

On 10/19/2010 02:48 PM, Dor Laor wrote:

On 10/19/2010 04:11 AM, Chris Wright wrote:

* Juan Quintela (quint...@redhat.com) wrote:


Please send in any agenda items you are interested in covering.


- 0.13.X -stable handoff
- 0.14 planning
- threadlet work
- virtfs proposals



- Live snapshots
- We were asked to add this feature for external qcow2
images. Will simple approach of fsync + tracking each requested
backing file (it can be per vDisk) and re-open the new image would
be accepted?
- Integration with FS freeze for consistent guest app snapshot
Many apps do not sync their ram state to disk correctly or frequent
enough. Physical world backup software calls fs freeze on xfs and
VSS for windows to make the backup consistent.
In order to integrated this with live snapshots we need a guest
agent to trigger the guest fs freeze.
We can either have qemu communicate with the agent directly through
virtio-serial or have a mgmt daemon use virtio-serial to
communicate with the guest in addition to QMP messages about the
live snapshot state.
Preferences? The first solution complicates qemu while the second
complicates mgmt.


Third option, make the freeze path management - qemu - virtio-blk -
guest kernel - file systems. The advantage is that it's easy to
associate file systems with a block device this way.


OTH the userspace freeze path already exist and now you create 
another path. 


I guess we would still have a userspace daemon; instead of talking to 
virtio-serial it talks to virtio-blk.  So:


  management - qemu - virtio-blk - guest driver - kernel fs 
resolver - daemon - apps


Yuck.


Yeah, in Windows, I'm pretty sure the freeze API is a userspace 
concept.  Various apps can hook into it to serialize their state.


At the risk of stealing Mike's thunder, we've actually been working on a 
simple guest agent exactly for this type of task.  Mike's planning an 
RFC for later this week but for those that are interested the repo is at 
http://repo.or.cz/w/qemu/mdroth.git


Regards,

Anthony Liguori



What about FS that span over LVM with multiple drives? IDE/SCSI?


Good points.






<    4   5   6   7   8   9   10   11   12   13   >