Re: [Qemu-devel] [PATCH] Guest mouse cursor drawing in SDL
andrzej zaborowski wrote: (the pixel format of the cursor was the same as the pixel format of DisplayState). I'm not sure if we want to always use the same pixel format - for example with VMware SVGA and SDL in 16 bit mode, the cursor pixel format reported by guest Xorg was 8 bpp. This would mean two conversions instead of one: first VMware SVGA would have to convert 8 - 16 bpp then SDL 16 - 1 bpp. VNC expects the cursor data to be in the current pixel format. The nice thing about this approach is you don't need to pass a bunch of pixel info (for instance, you don't need to pass the depth, endianness, etc.). The Cirrus hardware cursor also has to be copied anyway and doing the conversion is really straight forward. One question in my mind is what the alpha mask should look like. All that VNC (and SDL) can use is a 1-bit alpha depth. That's all Cirrus supports too. VMware SVGA supports an 8-bit alpha channel though so it may make sense to design the interface now to support that. VMware SVGA without CAP_ALPHA_CURSOR also does only 1-bit alpha. Maybe we should keep two masks - if the frontend (e.g SDL) supports only 1-bit alpha it would use only this basic mask - if the emulated VGA supports only 1-bit alpha it would leave the other mask opaque. I thought about that. Right now with my patch, the cursor mask is always 1-bit deep. I was thinking that perhaps the right thing to do is to make it always 8-bits deep and then convert in VNC. Do you think using the SDL cursor is all that useful? As soon as gtk widgets get involved, the cursor becomes ARGB so in practice, I'm not sure that it's all that helpful. BTW, I've got the ALPHA_CURSOR working with VNC.. it's very sweet :-) Regards, Anthony Liguori Regards, Andrzej ___ Qemu-devel mailing list Qemu-devel@nongnu.org http://lists.nongnu.org/mailman/listinfo/qemu-devel ___ Qemu-devel mailing list Qemu-devel@nongnu.org http://lists.nongnu.org/mailman/listinfo/qemu-devel
Re: [Qemu-devel] qemu-cvs-2007-3-22 compile error (in cygwin)
[EMAIL PROTECTED] wrote: I have exactly the same error, using MinGW. I commented out the faulty line in vl.c, and compiled again. Now, I have following error : C:/msys/home/Denis/qemu/hw/pckbd.c:376: undefined reference to `vmmouse_init' make[1]: *** [qemu.exe] Error 1 make[1]: Leaving directory `/home/Denis/qemu/i386-softmmu' make: *** [subdir-i386-softmmu] Error 2 You probably need to rerun configure. I do not see this problem using a mingw cross compiler. Regards, Anthony Liguori Selon Christian MICHON [EMAIL PROTECTED]: use mingw to compile qemu for windows hosts, not cygwin. if I recall well, SDL does not cope well on cygwin. -- Christian ___ Qemu-devel mailing list Qemu-devel@nongnu.org http://lists.nongnu.org/mailman/listinfo/qemu-devel ___ Qemu-devel mailing list Qemu-devel@nongnu.org http://lists.nongnu.org/mailman/listinfo/qemu-devel ___ Qemu-devel mailing list Qemu-devel@nongnu.org http://lists.nongnu.org/mailman/listinfo/qemu-devel
Re: [Qemu-devel] [RFC/experimental patch] qemu (x86_64 on x86_64 -no-kqemu) compiles with gcc4 and works
Axel Zeuner wrote: Hi, Hi Axel, By adding some GCC4 fixes on top of your patch, I was able to get qemu for i386 (on i386) to compile and run. So far, I've only tested a win2k guest. The big problem (which pbrook helped me with) was GCC4 freaking out over some stq's. Splitting up the 64bit ops into 32bit ops seemed to address most of the problems. The tricky thing I still can't figure out is how to get ASM_SOFTMMU working. The problem is GLUE(st, SUFFIX) function. First GCC cannot deal with the register pressure. The problem I can't seem to fix though is that GCC sticks %1 in %esi because we're only using an r constraint, not a q constraint. This results in the generation of %sib which is an invalid register. However, refactoring the code to not require a q constraint doesn't seem to help either. The attached patch is what I have so far. Some help with people more familiar with gcc asm foo would be appreciated! Regards, Anthony Liguori there were a lot of discussions about compiling qemu with gcc4 or higher. The summary of the discussions were, as I understood, that compiling qemu with gcc4 requires changing the code generation engine of the most of the supported targets. These changes require a lot of work and time. How about splitting the current static code generation process further? Today gcc produces object code and dyngen adapts it for the purposes of qemu, i.e produces the generation function, patches in parameters ..: gcc -c op.o op.c ;dyngen -o op.h ... op.o . The op_XXX functions generated by gcc may not contain more than one exit and this exit must be at the end, no not intended jumps to external functions may occur. It is possible to split the transformation into the following steps: Generate assembly output from the C-Sources: gcc -S -o op-0.s op.c. Convert the assembly output: cvtasm op.s op-0.s. Assemble the converted assembler sources: as -o op.o op.s. Use dyngen as before: dyngen -o op.h ... op.o. Nothing will change if cvtasm copies only the input to the output, i.e. this additional pass will not break existing code. A full featured converter (cvtasm) has a lot of dependencies: it has to support all hosts (M) (with all assembler dialects M') and all targets N, i.e. in the worst case one would end with M'x N variants of it, or M x N if one supports only one assembler dialect per host. It is clear, that the number of variants is one of the biggest disadvantages of such an approach. Now I will focus on x86_64 host and x86_64-softmmu target. cvtasm has to do the following tasks in this case: 0) convert repXXX; ret to ret only. (Not done yet, x86_64 only, but does not harm). 1) append to all functions, where the last instruction is not a return a ret instruction. 2) add a label to all functions with more than one return before the last return. 3) replace all returns not at the end of a function with an unconditional jump to the generated end label. Avoid touching op_exit_tb here. 4) check all jump instructions if they contain jumps to external labels, replace jumps to external labels with calls to the labels. The task 0-2 are easy, task 3 may, task 4 is definitely target/host dependent, because there exist intentionally some jumps to external labels, i.e. outside of the function, for instance op_goto_tb. Please correct me, if I am wrong or something is not mentioned above. The attached cvtasm.c allows compiling op.c/op.s/op.o without any disabled optimisations in Makefile.target (patches for Makefile and Makefile.target are attached). The program itself definitely needs a rewrite, is not failsafe and produces to much output on stdout. The macro OP_GOTO_TB from exec-all.h in the general case contains two nice variables and label definitions to force a reference from a variable into the op_goto_tbXXX functions. Unfortunately gcc4 detects that these variables and lables are unused and suppresses their generation, as result dyngen does not generate two lines in op.h: case INDEX_op_goto_tb0: ... label_offsets[0] = 8 + (gen_code_ptr - gen_code_buf); // -- ... case INDEX_op_goto_tb1: ... label_offsets[1] = 8 + (gen_code_ptr - gen_code_buf); // -- ... and qemu produces a SIGSEGV on the first jump from one buffer to the next. I was not able to force gcc4 to generate the two variables, therefore I had to replace the general macro with a host dependent one for x86_64 similar to x86 but using the indirect branch method. After the replacement qemu worked when compiled with gcc4. I made my checks with the following compilers using Debian testing amd64: gcc version 3.4.6 (Debian 3.4.6-5) and gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21). Please note: These patches work only for x86_64 hosts and x86_64 targets. They will break all other architectures. I did not check i386-softmmu. It works for me. I apologise for the size of the attachments. Kind regards Axel
Re: [Qemu-devel] Recursion in cpu_physical_memory_rw
Herbert Xu wrote: On Wed, Nov 15, 2006 at 12:57:24AM +, Paul Brook wrote: It isn't always system memory. Some DMA controllers deliberately write to device FIFOs. There are also several devices which map areas of onboard RAM. At minimum you need to make those to use RAM mappings rather than MMIO. I'm not suggesting that we change all existing users of cpu_physical_* to a new interface that only accessed RAM. However, for cases where it is obvious that only system RAM is intended (e.g., rtl8139), it makes sense to bypass MMIO handlers. If a device is recursively writing to itself I'd take this as sign that the guest OS is already pretty screwed. I'm not sure what happens in this situation on real hardware, but I wouldn't be surprised if it caused similar effects by flooding the bus. The scenario here is a compromised guest attempting to harm a host such as Xen. The only harm done to a host is that the process will take as much CPU as it can get. This is really only a problem in Xen because the device model is in Domain-0. Once the device model is in a different domain, it doesn't matter anymore as the normal scheduler parameters can be used to ensure that no other hosts are harmed. Regards, Anthony Liguori Cheers,
[Qemu-devel] Re: using mmap?
Mark Williamson wrote: I'm also doubtful how much benefit it gave in practice. I'm sure it would be good for synthetic CPU benchmarks. However using mmap significantly increases the overhead of context switches/tlb misses. To get good overall performance I suspect you're going to need closer cooperation with the kernel than mmap gives you. If you really want to make cross-emulation go fast I suggest working with the xen and/or kvm people to integrate qemu dynamic translation into those products. In theory I'd guess you should be able to plug it in as an alternative to the HVM code. I've no idea how close that is to being practical. http://wiki.xensource.com/xenwiki/HVM/V2E The v2e stuff allows execution state to be extracted from the real CPU, plugged into QEmu for a bit for emulation, then transferred back to the real CPU again. This is initially to be used for supporting Big Real Mode emulation on HVM platforms. Later on it's planned to be used to accelerate IO emulation further. Eventually this may provide a means to use QEmu's translator to execute kernel code whilst running user mode code under Xen. It may be that this isn't as fast as other approaches, but it'd be a useful feature for Xen to have IMO. I'd absolute like to get there. The current Xen HVM code is a bit of a mess at the moment. There are far too many assumptions about having VT/SVM hardware present. However, I'd really like to get to a point where Xen could run unmodified guests on non VT/SVM hardware using qemu with user virtualization. KVM is having similar trouble ATM with big real mode. It would be very nice to also add a V2E like interface to KVM. I'd also like to see the translator even used for paravirtual guests. This way we could still have a proper boot loader run without having to deal with paravirtualizing it. It would be nice to use emulation to avoid paravirtualizing the non-performance critical bits. Regards, Anthony Liguori Cheers, Mark Wouldn't this be a *significant* performance enhancement for this setup which I'm sure is a common one? Maybe this can be implemented for regular processes on the guest and only use the softmmu for the kernel? Would someone point me in the right direction for technical information? I have had to switch to vmware free player until Qemu+KQEMU reaches a point of similar performance, but I would really rather see Qemu advance further. If you're using an accelerator (eg. kqemu or kvm) this is all irelevant as most code isn't run by qemu, it's virtualized by the accelerator. qemu just does the IO emulation. Paul ___ Qemu-devel mailing list Qemu-devel@nongnu.org http://lists.nongnu.org/mailman/listinfo/qemu-devel
Re: [Qemu-devel] [patch] factor out commonly used scancode translation table
Bernhard Fischer wrote: On Thu, Jan 04, 2007 at 06:10:30PM +, Thiemo Seufer wrote: Bernhard Fischer wrote: On Thu, Jan 04, 2007 at 12:52:56PM -0500, Jonathan Phenix wrote: Bernhard Fischer wrote: Hi, The attached patch moves the x_keycode_to_pc_keycode LUT from sdl.c into an x_keycode.c. This struct is also used by the GGI backend (that is not yet merged ¹). Comments? How it is done right now, each time x_keycode.c is included, you will end up with an extra copy in the final executable. Perhaps that simply keeping the LUT in sdl.c but removing the 'static' keyword from it and creating a sdl.h file with the statement: Yes, or create one public accessor func (_translate_keycode() or the like). I don't have SDL installed, so only have the LUT once, but you're of course right. What's the preferred method? public LUT or public accessor? Public accessor, I'd say. Keystroke processing isn't performance critical. New patch with a public accessor is attached. Couldn't think of a better name, please feel free to change it.. Is it worth evening worry about this if the GGD patch isn't going to eventually end up in CVS? Why have the abstraction if it's not going to be used? I'm not really sure I see the value in having GGD... Regards, Anthony Liguori thanks, --- ../qemu_trunk.orig/vl.h 2006-12-27 14:17:48.0 +0100 +++ vl.h2007-01-04 19:27:07.0 +0100 @@ -869,6 +873,9 @@ void cocoa_display_init(DisplayState *ds /* vnc.c */ void vnc_display_init(DisplayState *ds, const char *display); +/* x_keymap.c */ +extern uint8_t _translate_keycode(const int key); + /* ide.c */ #define MAX_DISKS 4 --- ../qemu_trunk.orig/sdl.c 2006-12-11 23:33:26.0 +0100 +++ sdl.c 2007-01-04 19:28:30.0 +0100 @@ -122,88 +122,6 @@ static uint8_t sdl_keyevent_to_keycode(c #else -static const uint8_t x_keycode_to_pc_keycode[115] = { - 0xc7, /* 97 Home */ - 0xc8, /* 98 Up */ - 0xc9, /* 99 PgUp */ - 0xcb, /* 100 Left */ - 0x4c,/* 101 KP-5 */ - 0xcd, /* 102 Right */ - 0xcf, /* 103 End*/ - 0xd0, /* 104 Down */ - 0xd1, /* 105 PgDn */ - 0xd2, /* 106 Ins*/ - 0xd3, /* 107 Del*/ - 0x9c, /* 108 Enter */ - 0x9d, /* 109 Ctrl-R */ - 0x0, /* 110 Pause */ - 0xb7, /* 111 Print */ - 0xb5, /* 112 Divide */ - 0xb8, /* 113 Alt-R */ - 0xc6, /* 114 Break */ - 0x0, /* 115 */ - 0x0, /* 116 */ - 0x0, /* 117 */ - 0x0, /* 118 */ - 0x0, /* 119 */ - 0x0, /* 120 */ - 0x0, /* 121 */ - 0x0, /* 122 */ - 0x0, /* 123 */ - 0x0, /* 124 */ - 0x0, /* 125 */ - 0x0, /* 126 */ - 0x0, /* 127 */ - 0x0, /* 128 */ - 0x79, /* 129 Henkan */ - 0x0, /* 130 */ - 0x7b, /* 131 Muhenkan */ - 0x0, /* 132 */ - 0x7d, /* 133 Yen */ - 0x0, /* 134 */ - 0x0, /* 135 */ - 0x47, /* 136 KP_7 */ - 0x48, /* 137 KP_8 */ - 0x49, /* 138 KP_9 */ - 0x4b, /* 139 KP_4 */ - 0x4c, /* 140 KP_5 */ - 0x4d, /* 141 KP_6 */ - 0x4f, /* 142 KP_1 */ - 0x50, /* 143 KP_2 */ - 0x51, /* 144 KP_3 */ - 0x52, /* 145 KP_0 */ - 0x53, /* 146 KP_. */ - 0x47, /* 147 KP_HOME */ - 0x48, /* 148 KP_UP */ - 0x49, /* 149 KP_PgUp */ - 0x4b, /* 150 KP_Left */ - 0x4c, /* 151 KP_ */ - 0x4d, /* 152 KP_Right */ - 0x4f, /* 153 KP_End */ - 0x50, /* 154 KP_Down */ - 0x51, /* 155 KP_PgDn */ - 0x52, /* 156 KP_Ins */ - 0x53, /* 157 KP_Del */ - 0x0, /* 158 */ - 0x0, /* 159 */ - 0x0, /* 160 */ - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, /* 170 */ - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, /* 180 */ - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, /* 190 */ - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, /* 200 */ - 0x0, /* 201 */ - 0x0, /* 202 */ - 0x0, /* 203 */ - 0x0, /* 204 */ - 0x0, /* 205 */ - 0x0, /* 206 */ - 0x0, /* 207 */ - 0x70, /* 208 Hiragana_Katakana */ - 0x0, /* 209 */ - 0x0, /* 210 */ - 0x73, /* 211 backslash */ -}; - static uint8_t sdl_keyevent_to_keycode(const SDL_KeyboardEvent *ev) { int keycode; @@ -216,7 +134,7 @@ static uint8_t sdl_keyevent_to_keycode(c keycode -= 8; /* just an offset */ } else if (keycode 212) { /* use conversion table */ -keycode
Re: [Qemu-devel] QEMU: VNC
Christopher Olsen wrote: Anyone here know if there is a way to append to the VNC display header? I'm not sure I understand what you're asking. What is the VNC display header? Regards, Anthony Liguori -Christopher
Re: [Qemu-devel] [PATCH] Support VNC PointerTypeChange psuedo-encoding
Ramesh Dharan wrote: [EMAIL PROTECTED] wrote: This extension is documented at http://tocm.wikidot.com/pointertypechange VMware has a very similar extension for their remote console. I believe that Ramesh Dharan (whom I've CCed) at some point implemented it in one or more open source clients. Perhaps some coordination here is possible? Hi Ramesh, Sorry for the delayed response. Believe it or not, your note just made it to qemu-devel (along with a month or so of backlogged mail). Yeah I played implemented support for VMware's relative pointer extension, as well as our grab metaphor, and some other stuff, on top of the RealVNC 4.1.1 codebase. I unfortunately never got the patch cleaned up enough for submitting back to RealVNC however... I have a work-in-progress document that defines our extensions but it's not really ready for public consumption yet. I'm not sure of the context here, but I assume QEMU is using VNC as the wire protocol for displaying framebuffer contents remotely? Yup. Are you writing a new VNC client from scratch or building on one of the existing GPL'ed clients? At the moment, we work out of the box with existing clients (albeit with reduced functionality). I have my own VNC client (gtk-vnc). The VNC extensions are also supported by virt-manager. As they mature, I'd like to get them into the main VNC clients too. The main limitations we've run into with using the VNC protocol for VM interaction are: (1) no support for keyboard scancodes (VNC uses higher level symbolic keys which aren't necessarily 1:1 mappable back to their scancode equivalents in all the scenarios we care about) Yes, right now, we do not address this. (2) no relative mouse support We have an extension that addresses this. (3) no alpha cursor support There is a pseudo-encoding that provides alpha cursors with a 1-bit alpha channel. An 8 bit alpha channel would be nice of course. (4) no dynamic pixel depth/bpp change support This is something I do want to fix. This isn't just a server-side issue, there is also a race in the protocol when a client uses SetPixelFormat. A proper notification of when the server switches pixel format after a SetPixelFormat would solve this problem. (5) No notion of multiplexing displays on a single port. Right. ATM, I don't really plan on addressing this. (6) No client-server message negotiation. We actually can do this. I've reserved the 255 client message ID which I'm using to multiplex various other messages. I currently use this to implement a shared memory pseudo-encoding for VNC. The idea is that on localhost, the server can render directly to a shared memory segment that the client also maps. If the one is smart about how this shared memory segment is chosen, it can be used as an XShmImage. I implemented new client-server messages for the (1) and (2), and server-client pseudorectangle extensions for (3) and (4). We deal with (5) in a different way; our display multiplexing is handled at a higher level. We ignored (6) since our clients currently only talk to our servers. Looking at the linked site, it looks like you folks have already identified and are planning to deal with (2), possibly (5) (via some games with the display name?) and (6). I'd be particularly interested to hear more about plans for how to address (5), it would be great if off-the-shelf/3rd-party clients could talk to multiple VMs running on a host without needing to know and use a separate the port for each VM. Our remote consoles/clients are heavily tied to our products and there aren't any real plans to divest them or make them more generally useful in the near future. However, that's likely to change someday, and anyway I'd be happy to provide feedback on, and implement support for these extensions so that e.g. a QEMU client could talk to a VMware instance, and in general get standardization to the point where it's possible to build a single remote client that could talk to the various VM implementations (QEMU, Xen, VMware, etc.). I believe http://wiki.multimedia.cx/index.php?title=VMNC covers some of your extensions right? Have you gotten the appropriate encodings reserved in the RFB spec? I don't want to support any encodings that aren't properly registered so this would be an important first step. I'm certainly happy to work toward a common set of VNC extensions to cover virtualization. It's in everyones benefit to ensure that the largest number of clients Just Work out of the box. Regards, Anthony Liguori -- Ramesh
Re: [Qemu-devel] QEMU/pc and scsi disks
Joe Batt wrote: I disagree. /bin/sh makes a very flexible config file format that I use. I use it on win32, Linux and Mac OS X. The problem with only taking command line arguments is that the number and size of command line arguments are severely limited on certain platforms. This is why almost every sufficiently aged/portable program supports either a config file or a method of taking command line options via a file. The later is really just a particular format of a config file. If you care about scripting QEMU, then just do something like: qemu -config - EOF hda=${myhda} hdb=${myhdb} EOF Regards, Anthony Liguori I would prefer that you write another cross platform shell, than another config file. At least that way I could use the same config tool for more than one application. Everytime this comes up, do I have to disagree again, so that my voice is not lost? Any yes, adding features that I do not use increases the complexity and decreases the stability of the features I do use, so it would effect me. I have the same feeling about embedding VNC authentication, the samba server, etc. Joe On Mar 2, 2007, at 11:24 AM, Anthony Liguori wrote: Paul Brook wrote: There's also no reason to limit to 7 disks, and we should support scsi cdroms. The reason for 7 is the number of available id on the scsi bus. For wide scsi it is 15. I wouldn't bet on wide scsi working. For PCI based systems you can add more host adapters to get more devices. I haven't actually tested this, but it should work. I think most people agree that we need a config file. I haven't seen any comments on my config file patch though. So, any comments on that patch? Any requirements on a format? Regards, Anthony Liguori Paul ___ Qemu-devel mailing list Qemu-devel@nongnu.org http://lists.nongnu.org/mailman/listinfo/qemu-devel ___ Qemu-devel mailing list Qemu-devel@nongnu.org http://lists.nongnu.org/mailman/listinfo/qemu-devel
Re: [Qemu-devel] [PATCH] Support VNC PointerTypeChange psuedo-encoding
Ramesh Dharan wrote: I implemented new client-server messages for the (1) and (2), and I should have read more carefully. This means that you're not a compliant RFB server ATM. I guess that means there isn't a point registering the pseudo-encodings you are currently using since you have to change the server/client anyway. It's easier anyway since pseudo-encodings are supposed to be negative numbers and you're using positive numbers. I'm happy to start looking at reimplementing your encodings using my pseudo-encoding range. It should be real easy for ya'll to support these too. I'll spend a little time in the next week enumerating these extensions and assigning proper values to them. Do you have any documentation on your currently client messages? Those aren't documented as part of the FFMPEG project. Regards, Anthony Liguori server-client pseudorectangle extensions for (3) and (4). We deal with (5) in a different way; our display multiplexing is handled at a higher level. We ignored (6) since our clients currently only talk to our servers. Looking at the linked site, it looks like you folks have already identified and are planning to deal with (2), possibly (5) (via some games with the display name?) and (6). I'd be particularly interested to hear more about plans for how to address (5), it would be great if off-the-shelf/3rd-party clients could talk to multiple VMs running on a host without needing to know and use a separate the port for each VM. Our remote consoles/clients are heavily tied to our products and there aren't any real plans to divest them or make them more generally useful in the near future. However, that's likely to change someday, and anyway I'd be happy to provide feedback on, and implement support for these extensions so that e.g. a QEMU client could talk to a VMware instance, and in general get standardization to the point where it's possible to build a single remote client that could talk to the various VM implementations (QEMU, Xen, VMware, etc.). -- Ramesh
Re: [Qemu-devel] Re: [PATCH] Reducing X communication bandwidth, take 2
Brian Johnson wrote: Paul Brook wrote: Will this work also for the CL542x adaptor? (Does that fall in the category of vga?) My current hack works for with/without -std-vga and I think that's because it lives underneath both, in the connection to SDL. Each adapter will have to do it's own minimization but that's sort of the write thing anyway IMHO. How granular each update is really only depends on the adapter. For instance, the VMware adapter really shouldn't need to do any minimization at all. It would be nice if we could share the framebuffer blitting routines. We've currently got 3 different implementations (vga/cirrus, tcx and pl110) of basically the same framebuffer rendering routines. Take a look at the video code in BasiliskII / SheepShaver, a 68k/PPC classic MacOS emulator written by Gwenolé Beauchesne: http://gwenole.beauchesne.info/projects/sheepshaver/ It contains optimized code (source level) for blitting between various bit depths and endiannesses. See SheepShaver-2.3/src/Unix/video_blit.{h,cpp} in the sources. It also uses a technique called video on segfault (VOSF) to improve performance on platforms which support it: This is what QEMU already does. In fact, it's an exceedingly common technique. Regards, Anthony Liguori rather than testing each store to see if it modifies the framebuffer, it keeps the framebuffer write-protected (via mprotect(), or the equivalent on non-POSIX systems) and uses a SIGSEGV handler to catch stores to the buffer. When a page receives a store, the handler unprotects the page and updates a bitmap of modified pages. Every so often a display update thread wakes up, consults the bitmap, calculates the updated region, blits it to the screen (using the optimized blitters), and clears the bitmap. See SheepShaver-2.3/src/Unix/video_vosf.h, SheepShaver-2.3/src/Unix/video_x.cpp, and other files. The emulators also have alternative techniques for tracking update regions on systems for which VOSF is not supported. But VOSF is almost always a big win. And most modern OSes can support it without trouble. The code is in C++ so it can't be dropped in directly, but Some of the techniques may be useful in qemu. Brian J. Johnson
Re: [Qemu-devel] [PATCH] Implement Win32 locking in vl.c (create_pidfile).
Carlos O'Donell wrote: The following patch implements Win32 locking for vl.c (create_pidfile). Builds for mingw32, and tested on a Windows Server 2003 host by running two qemu's with the same -pidfile option. When cross-compiling the use of --enable-mingw32 should effect AIOLIBS, otherwise the build will use -lrt and this is not valid when compiling with mingw32. Instead of using an #ifdef, I think it would be better to have a win32 specific file and a unix specific file that compiles conditionally depending on the host. Sort of like and osdep-posix.c and and osdep-win32.c. Both would implement a create_pidfile() function. I think this general approach would help clean up a lot of the win32 code. Thoughts? Regards, Anthony Liguori Patch attached for both. Built on linux and tested on Windows Server 2003. Comments? Please include me in the CC. Cheers, Carlos.
Re: [Qemu-devel] [RFC/experimental patch] qemu (x86_64 on x86_64 -no-kqemu) compiles with gcc4 and works
Axel Zeuner wrote: On Saturday 24 March 2007 21:15, Anthony Liguori wrote: The tricky thing I still can't figure out is how to get ASM_SOFTMMU working. The problem is GLUE(st, SUFFIX) function. First GCC cannot deal with the register pressure. The problem I can't seem to fix though is that GCC sticks %1 in %esi because we're only using an r constraint, not a q constraint. This results in the generation of %sib which is an invalid register. However, refactoring the code to not require a q constraint doesn't seem to help either. Hi Anthony, could you please try the attached patch for softmmu_header.h? Allows compiling with gcc4 and ASM_SOFTMMU. That did the trick. Could you explain what your changes did? Regards, Anthony Liguori Kind regards Axel
Re: [Qemu-devel] [RFC/experimental patch] qemu (x86_64 on x86_64 -no-kqemu) compiles with gcc4 and works
Axel Zeuner wrote: On Saturday 24 March 2007 21:15, Anthony Liguori wrote: Axel Zeuner wrote: Hi, Hi Axel, By adding some GCC4 fixes on top of your patch, I was able to get qemu for i386 (on i386) to compile and run. So far, I've only tested a win2k guest. Hi Anthony, thank you for the test, I like to hear about your success. I have applied your patches, compiled and checked qemu-i386-softmmu on i386 without kqemu with FreeDos. It works also. The big problem (which pbrook helped me with) was GCC4 freaking out over some stq's. Splitting up the 64bit ops into 32bit ops seemed to address most of the problems. The tricky thing I still can't figure out is how to get ASM_SOFTMMU working. The problem is GLUE(st, SUFFIX) function. First GCC cannot deal with the register pressure. The problem I can't seem to fix though is that GCC sticks %1 in %esi because we're only using an r constraint, not a q constraint. This results in the generation of %sib which is an invalid register. However, refactoring the code to not require a q constraint doesn't seem to help either. In the past I made some patches (not published yet) to speed up the helpers for 64 operations in target-i386/helper.c on x86_64 and i386 using gcc inline assembly. x86_64 was really easy, but for i386 I had to use m and =m constraints and as less inputs and outputs as possible. The attached patch is what I have so far. Some help with people more familiar with gcc asm foo would be appreciated! May I suggest some changes? I would like to try not to split the 64 bit accesses on hosts supporting it native, i.e. something like this: === --- cpu-all.h (revision 16) +++ cpu-all.h (working copy) @@ -339,7 +339,13 @@ static inline void stq_le_p(void *ptr, uint64_t v) { -*(uint64_t *)ptr = v; +#if (HOST_LONG_BITS 64) +uint8_t *p = ptr; +stl_le_p(p, (uint32_t)v); +stl_le_p(p + 4, v 32); +#else +*(uint64_t*)ptr = v; +#endif } Yes, I think the proper thing to do is to use a configure check for GCC version to determine whether or not to use the 32 bit or 64 version of stq_le_p. There is already a function in cpu-all.h that does the 32 bit version. Furthermore I think one should move helper_pshufw() from target-i386/helper2.c into target-i386/helper.c where all the other helper methods reside. I moved to helper2.c because AFAICT helper.c is compiled with the same sort of restrictions as op.c which leads to the compile failure. Regards, Anthony Liguori Kind Regards Axel Regards, Anthony Liguori
Re: [Qemu-devel] [PATCH] Guest mouse cursor drawing in SDL
andrzej zaborowski wrote: Hi, sorry for late reply. On 21/03/07, Anthony Liguori [EMAIL PROTECTED] wrote: Do you think using the SDL cursor is all that useful? As soon as gtk widgets get involved, the cursor becomes ARGB so in practice, I'm not sure that it's all that helpful. It's pretty cool that you have only one cursor and that it's either in the VM or outside, and it's full hardware drawn. In addition without it, I think you can't get cursor refreshes between the SDL refresh callbacks. So with the patch the cursor movement is really smooth and responsive, eliminating the SDL latency. I find it amazing that it works :p Hrm, interesting. I actually want to rethink the mouse interface again as I've been thinking that even if an absolute mouse isn't available, provided that you know the location of the guest cursor, you can implement a smarter grab/ungrab. I think VNC support is not a reason to drop this couple of lines in sdl.c (cause there's no API changes related). BTW, I've got the ALPHA_CURSOR working with VNC.. it's very sweet :-) Awesome! Are there screenshots? I could make them, but they just show a mouse in the window :-) BTW, have you thought about how to integrate std-vga into the vmware vga device? I took a look at it and it seems like it will be overly complicated based on the way VGA memory dirtying is done now. Regards, Anthony Liguori Regards, Andrew
Re: [Qemu-devel] QEMU + -std-vga + XFree86
Kyle Hubert wrote: Hi, I'm working with QEMU on an XFree86 machine. I was desirous of having 1600x1200 working in the virtual machine, so I looked at using the -std-vga option with the vesa X driver. You probably have to increase your hsync and vrefresh ranges. Regards, Anthony Liguori While initializing X, I get the following: (II) VESA(0): Primary V_BIOS segment is: 0xc000 (II) VESA(0): VESA BIOS detected (II) VESA(0): VESA VBE Version 0.0 (II) VESA(0): VESA VBE Total Mem: 0 kB (II) VESA(0): VESA VBE OEM: ^A The log continues on to say: (II) VESA(0): Total Memory: 0 64KB banks (0kB) (II) VESA(0): Generic Monitor: Using hsync range of 28.00-51.00 kHz (II) VESA(0): Generic Monitor: Using vrefresh range of 43.00-60.00 Hz (II) VESA(0): Not using mode 1600x1200 (no mode of this name) (II) VESA(0): Not using mode 1280x1024 (no mode of this name) (II) VESA(0): Not using mode 1024x768 (no mode of this name) (II) VESA(0): Not using mode 800x600 (no mode of this name) (II) VESA(0): Not using built-in mode 640x480 (insufficient memory given virtual size) I'm not positive this is my problem, but it looks like it isn't reporting the vram size correctly, does this sound correct? Thanks for the time. -Kyle
Re: [Qemu-devel] [RFC/experimental patch] qemu (x86_64 on x86_64 -no-kqemu) compiles with gcc4 and works
Axel Zeuner wrote: Hi Anthony, On Monday 26 March 2007 01:44, you wrote: Axel Zeuner wrote: On Saturday 24 March 2007 21:15, Anthony Liguori wrote: The tricky thing I still can't figure out is how to get ASM_SOFTMMU working. The problem is GLUE(st, SUFFIX) function. First GCC cannot deal with the register pressure. The problem I can't seem to fix though is that GCC sticks %1 in %esi because we're only using an r constraint, not a q constraint. This results in the generation of %sib which is an invalid register. However, refactoring the code to not require a q constraint doesn't seem to help either. Hi Anthony, could you please try the attached patch for softmmu_header.h? Allows compiling with gcc4 and ASM_SOFTMMU. That did the trick. Could you explain what your changes did? QEMU/i386 has only 3 three available registers if TARGET_I386 is selected because ebx,ebp,esi,edi are used by the environment and T0, T1, T3( AKA A0). This makes inline assembly really ugly. The called external C functions in ASM_SOFTMMU are REGPARM(1,2), i.e. require their first arguments in eax, edx. Based on some feedback from Paul Brook, I wrote another patch that just disables the use of register variables for GCC4. I think this is a considerably less hackish way to go about this. The generated code won't be as nice of course but at least it works. The patch applies against your cvtasm patches. Regards, Anthony Liguori In the two ld functions three registers (eax, edx, ecx) are required and destroyed because an external C function may be called. We relax the register pressure a little bit by forcing the return value (res) into eax , because the return value is returned in a destroyed register. Furthermore the called C function returns its value in eax anyway (call %7). The st functions are a little more tricky: we need three registers and the assembly code requires a reload of %0 (ptr) after the check if the external function must be called. In the external function the three remaining registers are destroyed. After the call a need also to reload of %1 (v) into register is needed, i.e. we need more registers. Register saving on the stack does not work, because there exist already 2 m constraints: if the code is compiled with -fomit-frame-pointers these are expressed as offsets relative to %esp, i.e X(%esp) and would become invalid after pushes onto the stack. One solution was to force all inputs to the asm block onto the stack, thats what the replacement of the r constraints into m constraints do: they force a memory reference. Because i386 can not do direct memory memory moves one has to reload m(v) into ecx again, otherwise the generated assembler code is invalid. It must be mentioned, that the generated code is a little bit slower than the original one. Kind Regards Axel Regards, Anthony Liguori Kind regards Axel diff -r d19a5903d749 softmmu_header.h --- a/softmmu_header.h Tue Mar 27 13:23:10 2007 -0500 +++ b/softmmu_header.h Tue Mar 27 13:23:21 2007 -0500 @@ -240,9 +240,13 @@ static inline void glue(glue(st, SUFFIX) 2:\n : : r (ptr), +#ifdef USE_REGISTER_VARIABLES /* NOTE: 'q' would be needed as constraint, but we could not use it with T1 ! */ r (v), +#else + q (v), +#endif i ((CPU_TLB_SIZE - 1) CPU_TLB_ENTRY_BITS), i (TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS), i (TARGET_PAGE_MASK | (DATA_SIZE - 1)), diff -r d19a5903d749 target-i386/cpu.h --- a/target-i386/cpu.h Tue Mar 27 13:23:10 2007 -0500 +++ b/target-i386/cpu.h Tue Mar 27 13:23:21 2007 -0500 @@ -26,6 +26,10 @@ #define TARGET_LONG_BITS 64 #else #define TARGET_LONG_BITS 32 +#endif + +#if TARGET_LONG_BITS = HOST_LONG_BITS __GNUC__ 4 +#define USE_REGISTER_VARIABLES #endif /* target supports implicit self modifying code */ @@ -424,7 +428,7 @@ typedef union { #endif typedef struct CPUX86State { -#if TARGET_LONG_BITS HOST_LONG_BITS +#ifndef USE_REGISTER_VARIABLES /* temporaries if we cannot store them in host registers */ target_ulong t0, t1, t2; #endif diff -r d19a5903d749 target-i386/exec.h --- a/target-i386/exec.h Tue Mar 27 13:23:10 2007 -0500 +++ b/target-i386/exec.h Tue Mar 27 13:23:21 2007 -0500 @@ -27,12 +27,16 @@ #define TARGET_LONG_BITS 32 #endif +#if TARGET_LONG_BITS = HOST_LONG_BITS __GNUC__ 4 +#define USE_REGISTER_VARIABLES +#endif + #include cpu-defs.h /* at least 4 register variables are defined */ register struct CPUX86State *env asm(AREG0); -#if TARGET_LONG_BITS HOST_LONG_BITS +#ifndef USE_REGISTER_VARIABLES /* no registers can be used */ #define T0 (env-t0) @@ -88,7 +92,7 @@ register target_ulong EDI asm(AREG11); #define reg_EDI #endif -#endif /* ! (TARGET_LONG_BITS HOST_LONG_BITS) */ +#endif /* ! USE_REGISTER_VARIABLES */ #define A0 T2
Re: [Qemu-devel] [RFC/experimental patch] qemu (x86_64 on x86_64 -no-kqemu) compiles with gcc4 and works
Axel Zeuner wrote: Hi Anthony, On Thursday 29 March 2007 04:07, you wrote: Axel Zeuner wrote: Hi Anthony, On Monday 26 March 2007 01:44, you wrote: Axel Zeuner wrote: On Saturday 24 March 2007 21:15, Anthony Liguori wrote: The tricky thing I still can't figure out is how to get ASM_SOFTMMU working. The problem is GLUE(st, SUFFIX) function. First GCC cannot deal with the register pressure. The problem I can't seem to fix though is that GCC sticks %1 in %esi because we're only using an r constraint, not a q constraint. This results in the generation of %sib which is an invalid register. However, refactoring the code to not require a q constraint doesn't seem to help either. Hi Anthony, could you please try the attached patch for softmmu_header.h? Allows compiling with gcc4 and ASM_SOFTMMU. That did the trick. Could you explain what your changes did? QEMU/i386 has only 3 three available registers if TARGET_I386 is selected because ebx,ebp,esi,edi are used by the environment and T0, T1, T3( AKA A0). This makes inline assembly really ugly. The called external C functions in ASM_SOFTMMU are REGPARM(1,2), i.e. require their first arguments in eax, edx. Based on some feedback from Paul Brook, I wrote another patch that just disables the use of register variables for GCC4. I think this is a considerably less hackish way to go about this. The generated code won't be as nice of course but at least it works. The patch applies against your cvtasm patches. Looks good to me, sorry I had no time yet to test your patch. Did you check the performance impact of your changes? Perhaps it is possible to use register variables in dependence of the register count of the host processor. Yes, I need to update the patch to include a defined(__i386__) and also to add the proper guards to the other architectures. Regards, Anthony Liguori Kind Regards Axel
Re: [Qemu-devel] [PATCH] Support VNC PointerTypeChange psuedo-encoding
Ramesh Dharan wrote: Anthony, I have a detailed response to your earlier e-mail but I wanted to handle this discussion separately. I implemented new client-server messages for the (1) and (2), and I should have read more carefully. This means that you're not a compliant RFB server ATM. I'm not sure I follow you. We support standard VNC clients, and we don't violate the protocol expectations of any standard VNC clients who talk to us. However, if a client *happens* to send us these new client-server messages (which don't exist in the standard spec), then we support them. We never send a message to a client that the client doesn't know how to handle. Right, but how does your client determine that the server supports these new messages? The proper way to use new client message types (which is now described in the RFB spec) is to advertise a new pseudo-encoding for that client message type and wait for the server to send the pseudo-encoding back to the client. That lets the client know that it is safe to use the new client message type. This is what I'm using for my shared memory encoding. Otherwise, there's no way to write a client that works with the enhanced server and a normal VNC server. I guess that means there isn't a point registering the pseudo-encodings you are currently using since you have to change the server/client anyway. It's easier anyway since pseudo-encodings are supposed to be negative numbers and you're using positive numbers. No, there's still a point. The display path and input path are essentially independent. Better input handling requires extending the client-server path (so the client can send new kinds of data other than the standard VNC keyboard and pointer events). Unfortunately, extension of the protocol in this direction was not baked into the original design. I think that basically we should actually extend the RFB protocol itself to just have a server-client message, SetSupportedClientMessages, which works exactly the way SetEncodings works today, i.e. the server can send down to the client a list of the messages which it can handle, and clients should not send message types to the server if the server doesn't support them. The mechanism I described above is what the current preferred method is. If you want, we can bring the topic up with the VNC authors as AFAIK I'm the only person with a reserved client message type. Of course, I think using a pseudo-encoding is a perfectly suitable way to address this problem. Regards, Anthony Liguori -- Ramesh
Re: [Qemu-devel] [PATCH] Support VNC PointerTypeChange psuedo-encoding
it should start decoding pixels differently. In practice, most VNC servers do not support arbitrary pixel formats too so it makes a lot of sense for server's to have this message available to them. (5) No notion of multiplexing displays on a single port. Right. ATM, I don't really plan on addressing this. So, the old Connectix Virtual Server product (which was in beta when Microsoft snapped them up) had a super slick way of dealing with this, that we actually joked about doing but never seriously thought anyone would do. They rendered an entire picker UI using VNC primitives. The UI would show live thumbnails of each VM, and then when you click on one of the tiles, you'd get redirected to that console. You couldn't go back out to the picker, but still, it was pretty neat. I've since looked into doing the same thing, but the closest thing to any VNC UI/widget rendering library that I've found is TclRFB (which, as the name would imply, is written in Tcl). Anyway, what you really want is a way to say you've connected to a multi-home server, here's a list of VMs available (with arbitrary or perhaps structured metadata attached to each VM tag), which one do you want?. And then have the ability for the client ot pick. But you can't really do this sanely with just extensions, I think this is a good candidate for actually extending the base protocol. I have seen KVM switches that use VNC implement an extension that is basically, SetName. They then use a client encoding to switch the KVM. This is definitely an interesting area to pursue. The idea is that on localhost, the server can render directly to a shared memory segment that the client also maps. If the one is smart about how this shared memory segment is chosen, it can be used as an XShmImage. Yeah we did the exact same thing, with a slight twist. We get the server to allocate the SHM segment, and the client just attaches to it. If you do it this way, you don't need a new client-server message, the server just sends the client the shm id if the client supports the SharedFB extension (that's what we named it). I chose to have the client allocate the memory since it knew the X server's line size which is required to allocate an XShmImage. My first implementation had the server allocate the buffer but required the client to tell the server what the line size should be. There's some complexity involved with how the server can actually figure out whether the client is on the same machine (e.g. if the client is tunneling over SSH, this is not easy to do). So in fact what we do is just if the client supports SharedFBInfo and SharedFBRect, the server just sends along the messages. The client then tries to map the segment described in SharedFBInfo. If it fails, it just sends a new encoding set with those encodings removed, and the server starts sending real pixel data instead. I've thought about this problem too. My fear is that one could accidentally map a shared memory area that's actually valid. Yes, this means that we always try and fail sharedfb so there's a couple extra roundtrips before remote clients actually get real pixel data. This was deemed acceptable (by me ;-) for our use cases since most networks are fast enough. But if this is too hackish for your taste we could bake in something better. I really haven't thought of a better solution myself. That doesn't mean one exists though. I believe http://wiki.multimedia.cx/index.php?title=VMNC covers some of your extensions right? Have you gotten the appropriate encodings reserved in the RFB spec? I don't want to support any encodings that aren't properly registered so this would be an important first step. So once upon a time, we tried to register encodings, but we didn't get very far. It sounds like perhaps the process has improved since then. I will try to get the ball rolling on this again. I can help you with that if you want. It wasn't a painful process at all for me. I'm certainly happy to work toward a common set of VNC extensions to cover virtualization. It's in everyones benefit to ensure that the largest number of clients Just Work out of the box. Agreed. I look forward to hearing more of your thoughts on this subject. Same here! Thanks for the response. Regards, Anthony Liguori -- Ramesh
Re: [Qemu-devel] [PATCH] Support VNC PointerTypeChange psuedo-encoding
Ramesh Dharan wrote: The proper way to use new client message types (which is now described in the RFB spec) is to advertise a new pseudo-encoding for that client message type and wait for the server to send the pseudo-encoding back to the client. That lets the client know that it is safe to use the new client message type. This is what I'm using for my shared memory encoding. Otherwise, there's no way to write a client that works with the enhanced server and a normal VNC server. Ok, yeah that makes sense. So yeah basically we would need to add new server encodings for our client-server messages, and then you get the server to send dummy ack messages for each one to say yes, I understand this message type? Yup. The mechanism I described above is what the current preferred method is. If you want, we can bring the topic up with the VNC authors as AFAIK I'm the only person with a reserved client message type. Of course, I think using a pseudo-encoding is a perfectly suitable way to address this problem. Yeah the only problem I see with it is that I don't see how the server can dynamically *change* the set of client messages that it accepts? To do it in general? Yeah, I don't think there's a solution. Of course, a SetServerEncodings would introduce a race. What does the server do if it receives one of the new special client messages after sending the SetServerEncodings message (but before the client receives the SetServerEncodings message)? I think it's easier to just add something to the psuedo-encoding to allow client messages that make sense to disable to be disabled on a case-by-case basis. Regards, Anthony Liguori -- Ramesh
Re: [Qemu-devel] Patch: dyngen-exec.h for OpenBSD
Paul Brook wrote: On Thursday 05 April 2007 23:12, Todd T. Fries wrote: Penned by Thiemo Seufer on 20070402 10:54.53, we have: | /* NOTE: standard headers should be used with special care at this | point because host CPU registers are used as global variables. Some | host headers do not allow that. */ | #include stddef.h | - | +#ifdef __OpenBSD__ | +#include sys/types.h Hello? Portability? sys/types.h defines these types portably. Doing so the way this code does it, is not portable. If you want portability you should be including stdint.h (or inttypes.h for old, broken systems). I thought I'd add that this isn't just portability. stdint.h is what C99 mandates although as Paul mentions, some older systems used inttypes.h. Regards, Anthony Liguori Why is it that qemu knows what the definition of these prototypes are on all systems without consulting the header files. I have a better idea, lets let the header files define the prototypes. Who would have though of that? See the big NOTE: comment above. dyngen is inherently unportable. Paul
Re: [Qemu-devel] Feature proposal: USB devices over TCP
Eduardo Felipe wrote: Hi everybody, I think that a useful feature for QEMU would be to expose the USB interface through TCP. Hi Eduardo, I don't know all that much about USB, but I think the most useful way to do this would be to do something that's protocol compatible with USBIP. We could then tunnel this traffic over VNC and allow for exposing local USB devices to a remote VM. Think of a virtual desktop being hosted on a server and exposed on a thin client. If you could plug in your iPod and it would just work with the VM, that would be an exceedingly cool feature. Are you familiar with USBIP? If so, does this sound reasonable? Regards, Anthony Liguori It would allow quick USB device development in high level languages without recompiling QEMU. We could have an instance of QEMU running all the time while we create our device and hot plug/unplug it whenever we want to. This could also attract people interested in hardware emulation, but scared of learning QEMU internals just to create a simple new device. I think USB is quite suited for this, as it is designed for pluggable external devices, but something similar could be made for serial and parallel devices too. The attached patch is a quick hack derived from the VNC server just to show the idea, not intended for commiting. A dummy protocol is used for message interchange between server and client. It adds the new command line option: -usbtcp port It starts a socket listening on port for incoming connections. A sample USB mouse in python is also provided that moves the cursor in circles. Would such a feature be of any interest for QEMU? Regards, Eduardo Felipe
Re: [Qemu-devel] Fixing ACPI for Vista -- Source code for bios.bin vs. bochs CVS?
Paul Brook wrote: On Monday 30 April 2007, Nakajima, Jun wrote: We found a way to get 32-bit Vista up on KVM by fixing bios.bin. And I think it should be helpful to qemu as well. However, I'm not sure if the binary file has been created simply by applying bios.diff to the latest bochs CVS tree (actually I doubt it). What makes you think this won't work? Have you tried it? This has worked for me in the past. Regards, Anthony Liguori Paul
Re: [Qemu-devel] stale pid files?
Axel Kittenberger wrote: Hi list again, another suggestion. I use the deamons pidfile feature to write pidfiles so I can controll the virtual machines with /etc/init.d/bla scripts, to e.g. auto spawn the guests when the server starts, auto/start/stop with runlevels and so on.. However I got following issue, the daemon wont start if there is a pidfile already (yes no problem for user to delete, but for auto spawning after unattended boot) . Since e.g. following situation: BANG! Powerout! System goes up when power is available again. The init scripts want to start the qemu/kvm deamons, but they refuse because these see their pidfiles already from pre-powerout.. -pidfile behaves properly in CVS. You may be able to find the patch on the ML that fixed it. Regards, Anthony Liguori I have seen other daemons somehow locking their pidfiles while running, and when killed -9ed, or poweroutes and they start again, they see their pid-file is unlocked, thus stale, and just overwrite it... only notifing you with a message they did so. Kind Regards, Axel
Re: [Qemu-devel] [PATCH] ceph/rbd block driver for qemu-kvm (v4)
On 10/07/2010 01:08 PM, Yehuda Sadeh Weinraub wrote: On Thu, Oct 7, 2010 at 7:12 AM, Anthony Liguorianth...@codemonkey.ws wrote: On 08/03/2010 03:14 PM, Christian Brunner wrote: +#include qemu-common.h +#include qemu-error.h +#includesys/types.h +#includestdbool.h + +#includeqemu-common.h This looks to be unnecessary. Generally, system includes shouldn't be required so all of these should go away except rado/librados.h Removed. + +#include rbd_types.h +#include module.h +#include block_int.h + +#includestdio.h +#includestdlib.h +#includerados/librados.h + +#includesignal.h + + +int eventfd(unsigned int initval, int flags); This is not quite right. Depending on eventfd is curious but in the very least, you need to detect the presence of eventfd in configure and provide a wrapper that redefines it as necessary. Can fix that, though please see my later remarks. +static int create_tmap_op(uint8_t op, const char *name, char **tmap_desc) +{ +uint32_t len = strlen(name); +/* total_len = encoding op + name + empty buffer */ +uint32_t total_len = 1 + (sizeof(uint32_t) + len) + sizeof(uint32_t); +char *desc = NULL; char is the wrong type to use here as it may be signed or unsigned. That can have weird effects with binary data when you're directly manipulating it. Well, I can change it to uint8_t, so that it matches the op type, but that'll require adding some other castings. In any case, you usually get such a weird behavior when you cast to types of different sizes and have the sign bit padded which is not the case in here. + +desc = qemu_malloc(total_len); + +*tmap_desc = desc; + +*desc = op; +desc++; +memcpy(desc,len, sizeof(len)); +desc += sizeof(len); +memcpy(desc, name, len); +desc += len; +len = 0; +memcpy(desc,len, sizeof(len)); +desc += sizeof(len); Shouldn't endianness be a concern? Right. Fixed that. + +return desc - *tmap_desc; +} + +static void free_tmap_op(char *tmap_desc) +{ +qemu_free(tmap_desc); +} + +static int rbd_register_image(rados_pool_t pool, const char *name) +{ +char *tmap_desc; +const char *dir = RBD_DIRECTORY; +int ret; + +ret = create_tmap_op(CEPH_OSD_TMAP_SET, name,tmap_desc); +if (ret0) { +return ret; +} + +ret = rados_tmap_update(pool, dir, tmap_desc, ret); +free_tmap_op(tmap_desc); + +return ret; +} This ops are all synchronous? IOW, rados_tmap_update() call blocks until the operation is completed? Yeah. And this is only called from the rbd_create() callback. +header_snap += strlen(header_snap) + 1; +if (header_snapend) +error_report(bad header, snapshot list broken); Missing curly braces here. Fixed. +if (strncmp(hbuf + 68, RBD_HEADER_VERSION, 8)) { +error_report(Unknown image version %s, hbuf + 68); +r = -EMEDIUMTYPE; +goto failed; +} + +RbdHeader1 *header; Don't mix variable definitions with code. Fixed. +s-efd = eventfd(0, 0); +if (s-efd0) { +error_report(error opening eventfd); +goto failed; +} +fcntl(s-efd, F_SETFL, O_NONBLOCK); +qemu_aio_set_fd_handler(s-efd, rbd_aio_completion_cb, NULL, +rbd_aio_flush_cb, NULL, s); It looks like you just use the eventfd to signal aio completion callbacks. A better way to do this would be to schedule a bottom half. eventfds are Linux specific and specific to recent kernels. Digging back why we introduced the eventfd, it was due to some issues seen with do_savevm() hangs on qemu_aio_flush(). The reason seemed that we had no fd associated with the block device, which seemed to not work well with the qemu aio model. If that assumption is wrong, we'd be happy to change it. In any case, there are other more portable ways to generate fds, so if it's needed we can do that. There's no fd at all? How do you get notifications about an asynchronous event completion? Regards, Anthony Liguori
Re: [Qemu-devel] [PATCH] ceph/rbd block driver for qemu-kvm (v4)
On 10/07/2010 01:41 PM, Yehuda Sadeh Weinraub wrote: On Thu, Oct 7, 2010 at 11:38 AM, Anthony Liguorianth...@codemonkey.ws wrote: On 10/07/2010 01:08 PM, Yehuda Sadeh Weinraub wrote: On Thu, Oct 7, 2010 at 7:12 AM, Anthony Liguorianth...@codemonkey.ws wrote: On 08/03/2010 03:14 PM, Christian Brunner wrote: +#include qemu-common.h +#include qemu-error.h +#includesys/types.h +#includestdbool.h + +#includeqemu-common.h This looks to be unnecessary. Generally, system includes shouldn't be required so all of these should go away except rado/librados.h Removed. + +#include rbd_types.h +#include module.h +#include block_int.h + +#includestdio.h +#includestdlib.h +#includerados/librados.h + +#includesignal.h + + +int eventfd(unsigned int initval, int flags); This is not quite right. Depending on eventfd is curious but in the very least, you need to detect the presence of eventfd in configure and provide a wrapper that redefines it as necessary. Can fix that, though please see my later remarks. +static int create_tmap_op(uint8_t op, const char *name, char **tmap_desc) +{ +uint32_t len = strlen(name); +/* total_len = encoding op + name + empty buffer */ +uint32_t total_len = 1 + (sizeof(uint32_t) + len) + sizeof(uint32_t); +char *desc = NULL; char is the wrong type to use here as it may be signed or unsigned. That can have weird effects with binary data when you're directly manipulating it. Well, I can change it to uint8_t, so that it matches the op type, but that'll require adding some other castings. In any case, you usually get such a weird behavior when you cast to types of different sizes and have the sign bit padded which is not the case in here. + +desc = qemu_malloc(total_len); + +*tmap_desc = desc; + +*desc = op; +desc++; +memcpy(desc,len, sizeof(len)); +desc += sizeof(len); +memcpy(desc, name, len); +desc += len; +len = 0; +memcpy(desc,len, sizeof(len)); +desc += sizeof(len); Shouldn't endianness be a concern? Right. Fixed that. + +return desc - *tmap_desc; +} + +static void free_tmap_op(char *tmap_desc) +{ +qemu_free(tmap_desc); +} + +static int rbd_register_image(rados_pool_t pool, const char *name) +{ +char *tmap_desc; +const char *dir = RBD_DIRECTORY; +int ret; + +ret = create_tmap_op(CEPH_OSD_TMAP_SET, name,tmap_desc); +if (ret 0) { +return ret; +} + +ret = rados_tmap_update(pool, dir, tmap_desc, ret); +free_tmap_op(tmap_desc); + +return ret; +} This ops are all synchronous? IOW, rados_tmap_update() call blocks until the operation is completed? Yeah. And this is only called from the rbd_create() callback. +header_snap += strlen(header_snap) + 1; +if (header_snap end) +error_report(bad header, snapshot list broken); Missing curly braces here. Fixed. +if (strncmp(hbuf + 68, RBD_HEADER_VERSION, 8)) { +error_report(Unknown image version %s, hbuf + 68); +r = -EMEDIUMTYPE; +goto failed; +} + +RbdHeader1 *header; Don't mix variable definitions with code. Fixed. +s-efd = eventfd(0, 0); +if (s-efd 0) { +error_report(error opening eventfd); +goto failed; +} +fcntl(s-efd, F_SETFL, O_NONBLOCK); +qemu_aio_set_fd_handler(s-efd, rbd_aio_completion_cb, NULL, +rbd_aio_flush_cb, NULL, s); It looks like you just use the eventfd to signal aio completion callbacks. A better way to do this would be to schedule a bottom half. eventfds are Linux specific and specific to recent kernels. Digging back why we introduced the eventfd, it was due to some issues seen with do_savevm() hangs on qemu_aio_flush(). The reason seemed that we had no fd associated with the block device, which seemed to not work well with the qemu aio model. If that assumption is wrong, we'd be happy to change it. In any case, there are other more portable ways to generate fds, so if it's needed we can do that. There's no fd at all? How do you get notifications about an asynchronous event completion? Regards, Anthony Liguori (resending to list, sorry) The fd is hidden deep under in librados. We get callback notifications for events completion. How is that possible? Are the callbacks delivered in the context of a different thread? If so, don't you need locking? Regards, Anthony Liguori Thanks, Yehuda
Re: [Qemu-devel] [PATCH 2/3] vnc: support password expire
On 10/07/2010 06:15 AM, Gerd Hoffmann wrote: This patch adds support for expiring passwords to vnc. It adds a new lifetime parameter to the vnc_display_password() function, which specifies the number of seconds the new password will be valid. Passing zero as lifetime maintains current behavior (password never expires). Signed-off-by: Gerd Hoffmannkra...@redhat.com This has been posted before and I've never understood it. Why can't a management tool just expire passwords on it's own? How does password expiration help with security at all? Regards, Anthony Liguori --- console.h |2 +- monitor.c |3 +-- ui/vnc.c | 15 ++- ui/vnc.h |1 + 4 files changed, 17 insertions(+), 4 deletions(-) diff --git a/console.h b/console.h index aafb031..24670e5 100644 --- a/console.h +++ b/console.h @@ -368,7 +368,7 @@ void cocoa_display_init(DisplayState *ds, int full_screen); void vnc_display_init(DisplayState *ds); void vnc_display_close(DisplayState *ds); int vnc_display_open(DisplayState *ds, const char *display); -int vnc_display_password(DisplayState *ds, const char *password); +int vnc_display_password(DisplayState *ds, const char *password, int lifetime); void do_info_vnc_print(Monitor *mon, const QObject *data); void do_info_vnc(Monitor *mon, QObject **ret_data); char *vnc_display_local_addr(DisplayState *ds); diff --git a/monitor.c b/monitor.c index fbb678d..d82eb9e 100644 --- a/monitor.c +++ b/monitor.c @@ -966,11 +966,10 @@ static int do_quit(Monitor *mon, const QDict *qdict, QObject **ret_data) static int change_vnc_password(const char *password) { -if (vnc_display_password(NULL, password) 0) { +if (vnc_display_password(NULL, password, 0) 0) { qerror_report(QERR_SET_PASSWD_FAILED); return -1; } - return 0; } diff --git a/ui/vnc.c b/ui/vnc.c index 1ef0fc5..51aa9ca 100644 --- a/ui/vnc.c +++ b/ui/vnc.c @@ -2078,11 +2078,19 @@ static int protocol_client_auth_vnc(VncState *vs, uint8_t *data, size_t len) unsigned char response[VNC_AUTH_CHALLENGE_SIZE]; int i, j, pwlen; unsigned char key[8]; +time_t now; if (!vs-vd-password || !vs-vd-password[0]) { VNC_DEBUG(No password configured on server); goto reject; } +if (vs-vd-expires) { +time(now); +if (vs-vd-expires now) { +VNC_DEBUG(Password is expired); +goto reject; +} +} memcpy(response, vs-challenge, VNC_AUTH_CHALLENGE_SIZE); @@ -2474,7 +2482,7 @@ void vnc_display_close(DisplayState *ds) #endif } -int vnc_display_password(DisplayState *ds, const char *password) +int vnc_display_password(DisplayState *ds, const char *password, int lifetime) { VncDisplay *vs = ds ? (VncDisplay *)ds-opaque : vnc_display; @@ -2492,6 +2500,11 @@ int vnc_display_password(DisplayState *ds, const char *password) if (vs-auth == VNC_AUTH_NONE) { vs-auth = VNC_AUTH_VNC; } +if (lifetime) { +vs-expires = time(NULL) + lifetime; +} else { +vs-expires = 0; +} } else { vs-auth = VNC_AUTH_NONE; } diff --git a/ui/vnc.h b/ui/vnc.h index 9619b24..4f895be 100644 --- a/ui/vnc.h +++ b/ui/vnc.h @@ -120,6 +120,7 @@ struct VncDisplay char *display; char *password; +time_t expires; int auth; bool lossy; #ifdef CONFIG_VNC_TLS
Re: [Qemu-devel] [PATCH 0/2] USB CCID device
On 10/07/2010 02:39 AM, Alon Levy wrote: - Anthony Liguorianth...@codemonkey.ws wrote: On 10/06/2010 03:55 AM, Gerd Hoffmann wrote: On 10/06/10 02:28, Alon Levy wrote: Does this work with live migration? I can't see how it would. No, it doesn't right now. It would require cooperation with the client, to tell it to reconnect to the target qemu (kind of like spice). I think until we have this migration should have pretty much the same effect as a chardev disconnect, i.e. detach the usb device (which the guest will see as unplug). Better yet, mark the guest as unmigrateable and let the management tool unplug the usb device before migration and replug it after migration. It's the same principle behind device assignment. Is there any way to also get a pre_migrate callback with register_device_unmigratable? I'd like to send a VSC_Reconnect message, then the guest sees an unplug, then migration, No. The disconnect needs to happen in the management tooling layer. Same is true for any device doing hardware passthrough. Automagic unplugging/plugging during migration is not a good idea universally so it's something that needs to happen as a policy at the management level. Regards, Anthony Liguori then (no plug yet since the device is marked as auto_attach=0) client reconnects (actually this happens before but to a paused machine waiting for migration), and then causes attachement, same as a new machine. Regards, Anthony Liguori Needs some code though, at minimum you'll have to xfer the connected state from the migration source and have some bits in post_load() which do attach/detach if needed. cheers, Gerd
Re: [Qemu-devel] [PATCH 0/2] USB CCID device
On 10/06/2010 05:12 PM, Alon Levy wrote: Actually, both are possible - but the later is the interesting use case (the former is mainly for debugging). To elaborate: the device is meant to allow a hardware reader to be available to the guest while still being available to the client (which is running on the computer with the real reader attached). So the real card is what we are talking to. The other usage is to have a virtual card, which would actually be more logical to put with the qemu device, but It isn't my current focus (the focus being the real card, and the virtual card being implemented in the client side is a testing measure). Okay, that makes sense. I'll do this. A side note: I tried migrating a QSIMPLEQ today - not a fun experience. I'm wondering if this is a result of a mentality that all devices should allocate memory upfront that makes using / migrating dynamic linked lists a non-starter, or just an oversight (no one wrote VMSTATE_QSIMPLEQ yet). As it stands migrating a QSIMPLEQ (or any other list) is much easier the old way without using VMSTATE. /rant. Yeah, complex types are not at all easy to migrate today. Something we need to tackle with vmstate2 eventually. Regards, Anthony Liguori Regards, Anthony Liguori
Re: [Qemu-devel] [PATCH] ceph/rbd block driver for qemu-kvm (v4)
On 10/07/2010 03:47 PM, Yehuda Sadeh Weinraub wrote: How is that possible? Are the callbacks delivered in the context of a different thread? If so, don't you need locking? Not sure I'm completely following you. The callbacks are delivered in the context of a different thread, but won't run concurrently. Concurrently to what? How do you prevent them from running concurrently with qemu? If you saw lock ups, I bet that's what it was from. Regards, Anthony Liguori Do you see any specific concurrency issue? We can add some mutex protection around at the aio callback, so that if librados turns multithreaded at this point we're covered. Thanks, Yehuda
Re: [Qemu-devel] [PATCH] ceph/rbd block driver for qemu-kvm (v4)
On 10/07/2010 04:49 PM, Yehuda Sadeh Weinraub wrote: On Thu, Oct 7, 2010 at 2:04 PM, Anthony Liguorianth...@codemonkey.ws wrote: On 10/07/2010 03:47 PM, Yehuda Sadeh Weinraub wrote: How is that possible? Are the callbacks delivered in the context of a different thread? If so, don't you need locking? Not sure I'm completely following you. The callbacks are delivered in the context of a different thread, but won't run concurrently. Concurrently to what? How do you prevent them from running concurrently with qemu? There are two types of callbacks. The first is for rados aio completions, and the second one is the one added later for the fd glue layer. This is a bad architecture for something like qemu. You could create a pipe and use the pipe to signal to qemu. Same principle as eventfd. Ideally, you would do this in the library itself. Regards, Anthony Liguori The first callback, called by librados whenever aio completes, runs in the context of a single librados thread: +static void rbd_finish_aiocb(rados_completion_t c, RADOSCB *rcb) +{ +RBDAIOCB *acb = rcb-acb; rcb is per a single aio. Was created before and will be destroyed here, whereas acb is shared between a few aios, however, it was generated before the first aio was created. +int64_t r; +uint64_t buf = 1; +int i; + +acb-aiocnt--; acb-aiocnt has been set before initiating all the aios, so it's ok to touch it now. Same goes to all acb fields. +r = rados_aio_get_return_value(c); +rados_aio_release(c); +if (acb-write) { +if (r 0) { +acb-ret = r; +acb-error = 1; +} else if (!acb-error) { +acb-ret += rcb-segsize; +} +} else { +if (r == -ENOENT) { +memset(rcb-buf, 0, rcb-segsize); +if (!acb-error) { +acb-ret += rcb-segsize; +} +} else if (r 0) { +acb-ret = r; +acb-error = 1; +} else if (r rcb-segsize) { +memset(rcb-buf + r, 0, rcb-segsize - r); +if (!acb-error) { +acb-ret += rcb-segsize; +} +} else if (!acb-error) { +acb-ret += r; +} +} +if (write(acb-s-efd,buf, sizeof(buf)) 0) This will wake up the io_read() +error_report(failed writing to acb-s-efd\n); +qemu_free(rcb); +i = 0; +if (!acb-aiocnt acb-bh) { +qemu_bh_schedule(acb-bh); This is the only qemu related call in here, seems safe to call it. +} +} The scheduled bh function will be called only after all aios that relate to this specific aio set are done, so the following seems ok, as there's no more acb references. +static void rbd_aio_bh_cb(void *opaque) +{ +RBDAIOCB *acb = opaque; +uint64_t buf = 1; + +if (!acb-write) { +qemu_iovec_from_buffer(acb-qiov, acb-bounce, acb-qiov-size); +} +qemu_vfree(acb-bounce); +acb-common.cb(acb-common.opaque, (acb-ret 0 ? 0 : acb-ret)); +qemu_bh_delete(acb-bh); +acb-bh = NULL; + +if (write(acb-s-efd,buf, sizeof(buf)) 0) +error_report(failed writing to acb-s-efd\n); +qemu_aio_release(acb); +} Now, the second ones are the io_read(), in which we have our glue fd. We send uint64 per each completed io +static void rbd_aio_completion_cb(void *opaque) +{ +BDRVRBDState *s = opaque; + +uint64_t val; +ssize_t ret; + +do { +if ((ret = read(s-efd,val, sizeof(val))) 0) { +s-qemu_aio_count -= val; There is an issue here with s-qemu_aio_count which needs to be protected by a mutex. Other than that, it just reads from s-efd. + } +} while (ret 0 errno == EINTR); + +return; +} + +static int rbd_aio_flush_cb(void *opaque) +{ +BDRVRBDState *s = opaque; + +return (s-qemu_aio_count 0); Same here as with the previous one, needs a mutex around s-qemu_aio_count. +} If you saw lock ups, I bet that's what it was from. As I explained before, before introducing the fd glue layer, the lack of fd associated with our block device caused that there was no way for qemu to check whether all aios were flushed or not, which didn't work well when doing migration/savevm. Thanks, Yehuda
Re: [Qemu-devel] [PATCH] ceph/rbd block driver for qemu-kvm (v4)
On 10/07/2010 05:45 PM, Sage Weil wrote: On Thu, 7 Oct 2010, Anthony Liguori wrote: On 10/07/2010 04:49 PM, Yehuda Sadeh Weinraub wrote: On Thu, Oct 7, 2010 at 2:04 PM, Anthony Liguorianth...@codemonkey.ws wrote: On 10/07/2010 03:47 PM, Yehuda Sadeh Weinraub wrote: How is that possible? Are the callbacks delivered in the context of a different thread? If so, don't you need locking? Not sure I'm completely following you. The callbacks are delivered in the context of a different thread, but won't run concurrently. Concurrently to what? How do you prevent them from running concurrently with qemu? There are two types of callbacks. The first is for rados aio completions, and the second one is the one added later for the fd glue layer. This is a bad architecture for something like qemu. You could create a pipe and use the pipe to signal to qemu. Same principle as eventfd. Ideally, you would do this in the library itself. I'm sorry, I'm having a hard time understanding what it is you're objecting to, or what you would prefer, as there are two different things we're talking about here (callbacks and fd glue/pipes). (Please bear with me as I am not a qemu expert!) The first is the aio completion. You said a few messages back: It looks like you just use the eventfd to signal aio completion callbacks. A better way to do this would be to schedule a bottom half. This is what we're doing. The librados makes a callback to rbd.c's rbd_finish_aiocb(), which updates some internal rbd accounting and then calls qemu_bh_schedule(). Is that part right? No. You're calling qemu_bh_schedule() in a separate thread in parallel to other operations. That's absolutely not safe. The second part is an fd (currently created via eventfd(), but I don't think it matters where it comes from) that was later added because qemu_aio_flush() wouldn't trigger when our aio's completed (and scheduled the bottom halves). This was proposed by Simone Gotti, who had problems with live migration: http://www.mail-archive.com/qemu-devel@nongnu.org/msg35516.html Apparently calling the bottom half isn't sufficient to wake up a blocked qemu_aio_flush()? His solution was to create an eventfd() fd, write a word to it in the aio completion callback (before we schedule the bh), and add the necessary callbacks to make qemu_aio_flush() behave. Is the problem simply that we should be using pipe(2) instead of eventfd(2)? So far I've heard that we should be scheduling the bottom halves (we are), and we should be using a pipe to signal qemu (we're using an fd created by eventfd(2)). Your fundamental problem is your use of threads. QEMU is single threaded. You cannot call into QEMU code from another thread without introducing locking. Any other solution is going to be intrinsically broken. There are two possibilities to fix this: 1) You can change your library interface so that it doesn't generate callbacks via threads. That would be my preference because I think it's a bad interface but it's your library so it's not really my choice :-) 2) You can limit the callbacks to doing nothing other than writing to a file descriptor. You then read the file descriptor somewhere else in the normal QEMU code and you can use the file descriptor to get signals. If you're passing data to callbacks, it's much harder because you're going to have to store that data somewhere and inevitably require locking. The complexity of (2) is why I think thread-based callbacks is such a bad interface. Regards, Anthony Liguori Thanks, sage Regards, Anthony Liguori The first callback, called by librados whenever aio completes, runs in the context of a single librados thread: +static void rbd_finish_aiocb(rados_completion_t c, RADOSCB *rcb) +{ +RBDAIOCB *acb = rcb-acb; rcb is per a single aio. Was created before and will be destroyed here, whereas acb is shared between a few aios, however, it was generated before the first aio was created. +int64_t r; +uint64_t buf = 1; +int i; + +acb-aiocnt--; acb-aiocnt has been set before initiating all the aios, so it's ok to touch it now. Same goes to all acb fields. +r = rados_aio_get_return_value(c); +rados_aio_release(c); +if (acb-write) { +if (r 0) { +acb-ret = r; +acb-error = 1; +} else if (!acb-error) { +acb-ret += rcb-segsize; +} +} else { +if (r == -ENOENT) { +memset(rcb-buf, 0, rcb-segsize); +if (!acb-error) { +acb-ret += rcb-segsize; +} +} else if (r 0) { +acb-ret = r; +acb-error = 1; +} else if (r rcb-segsize) { +memset(rcb-buf + r, 0, rcb-segsize - r); +if (!acb-error) { +acb-ret += rcb-segsize
Re: [Qemu-devel] [PATCH] ceph/rbd block driver for qemu-kvm (v4)
On 10/08/2010 10:50 AM, Yehuda Sadeh Weinraub wrote: Oh, that makes it more clean. Considering that we did it for kvm, and looking at the kvm qemu_bh_schedule() implementation, it does look thread safe (there might be an issue though with canceling the bh though, haven't looked at it, not really relevant). It's definitely not thread safe. Even though you can set the flag atomically (not guaranteed, but assume you can), we rely on the fact that we can check for pending BHs before entering sleep without having to worry about new BHs being scheduled in between the sleep and the check. If you schedule a BH in a thread then you open yourself up to the race. Regards, Anthony Liguori
[Qemu-devel] Re: [PATCH v2 1/7] qcow2: Make get_bits_from_size() common
On 10/08/2010 10:48 AM, Stefan Hajnoczi wrote: The get_bits_from_size() calculates the log base-2 of a number. This is useful in bit manipulation code working with power-of-2s. Currently used by qcow2 and needed by qed in a follow-on patch. Signed-off-by: Stefan Hajnoczistefa...@linux.vnet.ibm.com Acked-by: Anthony Liguori aligu...@us.ibm.com Regards, Anthony Liguori --- block/qcow2.c | 22 -- cutils.c | 18 ++ qemu-common.h |1 + 3 files changed, 19 insertions(+), 22 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index ee3481b..6e25812 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -794,28 +794,6 @@ static int qcow2_change_backing_file(BlockDriverState *bs, return qcow2_update_ext_header(bs, backing_file, backing_fmt); } -static int get_bits_from_size(size_t size) -{ -int res = 0; - -if (size == 0) { -return -1; -} - -while (size != 1) { -/* Not a power of two */ -if (size 1) { -return -1; -} - -size= 1; -res++; -} - -return res; -} - - static int preallocate(BlockDriverState *bs) { uint64_t nb_sectors; diff --git a/cutils.c b/cutils.c index 5883737..6c32198 100644 --- a/cutils.c +++ b/cutils.c @@ -283,3 +283,21 @@ int fcntl_setfl(int fd, int flag) } #endif +/** + * Get the number of bits for a power of 2 + * + * The following is true for powers of 2: + * n == 1 get_bits_from_size(n) + */ +int get_bits_from_size(size_t size) +{ +if (size == 0 || (size (size - 1))) { +return -1; +} + +#if defined(_WIN32) defined(__x86_64__) +return __builtin_ctzll(size); +#else +return __builtin_ctzl(size); +#endif +} diff --git a/qemu-common.h b/qemu-common.h index 81aafa0..e0ca398 100644 --- a/qemu-common.h +++ b/qemu-common.h @@ -153,6 +153,7 @@ time_t mktimegm(struct tm *tm); int qemu_fls(int i); int qemu_fdatasync(int fd); int fcntl_setfl(int fd, int flag); +int get_bits_from_size(size_t size); /* path.c */ void init_paths(const char *prefix);
Re: [Qemu-devel] [PATCH] monitor: add usb_detach
How is this different than usb_del? Is it that it detaches it but does not delete the device? Regards, Anthony Liguori On 10/05/2010 09:40 AM, Alon Levy wrote: Signed-off-by: Alon Levyal...@redhat.com --- qemu-monitor.hx | 17 + sysemu.h|1 + vl.c| 41 + 3 files changed, 59 insertions(+), 0 deletions(-) diff --git a/qemu-monitor.hx b/qemu-monitor.hx index 49bcd8d..9c792a9 100644 --- a/qemu-monitor.hx +++ b/qemu-monitor.hx @@ -711,6 +711,23 @@ command @code{info usb} to see the devices you can remove. ETEXI { +.name = usb_detach, +.args_type = devname:s, +.params = device, +.help = detach USB device 'bus.addr', +.mhandler.cmd = do_usb_detach, +}, + +STEXI +...@item usb_detach @var{devname} +...@findex usb_detach + +Detach the USB device @var{devname} from the QEMU virtual USB +hub. @var{devname} has the syntax @code{bus.addr}. Use the monitor +command @code{info usb} to see the devices you can detach. +ETEXI + +{ .name = device_add, .args_type = device:O, .params = driver[,prop=value][,...], diff --git a/sysemu.h b/sysemu.h index a1f6466..ac68863 100644 --- a/sysemu.h +++ b/sysemu.h @@ -183,6 +183,7 @@ extern struct soundhw soundhw[]; void do_usb_add(Monitor *mon, const QDict *qdict); void do_usb_del(Monitor *mon, const QDict *qdict); +void do_usb_detach(Monitor *mon, const QDict *qdict); void usb_info(Monitor *mon); void rtc_change_mon_event(struct tm *tm); diff --git a/vl.c b/vl.c index d352d18..6cfa009 100644 --- a/vl.c +++ b/vl.c @@ -891,6 +891,47 @@ void do_usb_del(Monitor *mon, const QDict *qdict) } } +static USBDevice *usb_device_from_bus_dot_addr(const char *devname) +{ +int bus_num, addr; +const char *p; +USBBus *bus; +USBPort *port; + +if (!usb_enabled) { +return NULL; +} +p = strchr(devname, '.'); +if (!p) { +return NULL; +} +bus_num = strtoul(devname, NULL, 0); +addr = strtoul(p + 1, NULL, 0); +bus = usb_bus_find(bus_num); +if (!bus) { +return NULL; +} +QTAILQ_FOREACH(port,bus-used, next) { +if (port-dev-addr == addr) { +break; +} +} +if (!port) { +return NULL; +} +return port-dev; +} + +void do_usb_detach(Monitor *mon, const QDict *qdict) +{ +const char *devname = qdict_get_str(qdict, devname); +USBDevice *dev = usb_device_from_bus_dot_addr(devname); + +if (dev == NULL || usb_device_detach(dev) 0) { +error_report(could not detach USB device '%s', devname); +} +} + /***/ /* PCMCIA/Cardbus */
[Qemu-devel] [PATCH] net: provide a friendly message when a user passes a bad -net tap, fd=X
A lot of people copy libvirt's command line from ps -ef and then wonder why the VM isn't working correctly. Let's be kind and tell them what they should do instead. Without this patch, if you run with an invalid -net tap,fd=X, the guest still runs and we poll 100% on a bad file descriptor. With this patch, you get: qemu: -net tap,fd=42: invalid fd= for tap network device. If you're copying from libvirt, use `virsh dom2xml-to-native' instead qemu: -net tap,fd=42: Device 'tap' could not be initialized Signed-off-by: Anthony Liguori aligu...@us.ibm.com diff --git a/hw/usb-uhci.c b/hw/usb-uhci.c index 1d83400..5f9f749 100644 --- a/hw/usb-uhci.c +++ b/hw/usb-uhci.c @@ -64,7 +64,7 @@ #define UHCI_PORT_CSC (1 1) #define UHCI_PORT_CCS (1 0) -#define FRAME_TIMER_FREQ 1000 +#define FRAME_TIMER_FREQ 500 #define FRAME_MAX_LOOPS 100 @@ -1054,7 +1054,7 @@ static void uhci_frame_timer(void *opaque) UHCIState *s = opaque; /* prepare the timer for the next frame */ -s-expire_time += (get_ticks_per_sec() / FRAME_TIMER_FREQ); +s-expire_time = qemu_get_clock(vm_clock) + (get_ticks_per_sec() / FRAME_TIMER_FREQ); if (!(s-cmd UHCI_CMD_RS)) { /* Full stop */ diff --git a/net/tap.c b/net/tap.c index 0147dab..45b1fb6 100644 --- a/net/tap.c +++ b/net/tap.c @@ -405,6 +405,8 @@ int net_init_tap(QemuOpts *opts, Monitor *mon, const char *name, VLANState *vlan int fd, vnet_hdr = 0; if (qemu_opt_get(opts, fd)) { +int flags; + if (qemu_opt_get(opts, ifname) || qemu_opt_get(opts, script) || qemu_opt_get(opts, downscript) || @@ -418,6 +420,11 @@ int net_init_tap(QemuOpts *opts, Monitor *mon, const char *name, VLANState *vlan return -1; } +if (fcntl(fd, F_GETFL, flags) == -1) { +error_report(invalid fd= for tap network device. If you're copying from libvirt, use `virsh dom2xml-to-native' instead); +return -1; +} + fcntl(fd, F_SETFL, O_NONBLOCK); vnet_hdr = tap_probe_vnet_hdr(fd); -- 1.7.0.4
Re: [Qemu-devel] [PATCH] net: provide a friendly message when a user passes a bad -net tap, fd=X
On 10/08/2010 06:28 PM, Alexander Graf wrote: On 09.10.2010, at 00:04, Anthony Liguori wrote: A lot of people copy libvirt's command line from ps -ef and then wonder why the VM isn't working correctly. Let's be kind and tell them what they should do instead. Without this patch, if you run with an invalid -net tap,fd=X, the guest still runs and we poll 100% on a bad file descriptor. With this patch, you get: qemu: -net tap,fd=42: invalid fd= for tap network device. If you're copying from libvirt, use `virsh dom2xml-to-native' instead qemu: -net tap,fd=42: Device 'tap' could not be initialized Signed-off-by: Anthony Liguorialigu...@us.ibm.com diff --git a/hw/usb-uhci.c b/hw/usb-uhci.c index 1d83400..5f9f749 100644 --- a/hw/usb-uhci.c +++ b/hw/usb-uhci.c @@ -64,7 +64,7 @@ #define UHCI_PORT_CSC (1 1) #define UHCI_PORT_CCS (1 0) -#define FRAME_TIMER_FREQ 1000 +#define FRAME_TIMER_FREQ 500 #define FRAME_MAX_LOOPS 100 @@ -1054,7 +1054,7 @@ static void uhci_frame_timer(void *opaque) UHCIState *s = opaque; /* prepare the timer for the next frame */ -s-expire_time += (get_ticks_per_sec() / FRAME_TIMER_FREQ); +s-expire_time = qemu_get_clock(vm_clock) + (get_ticks_per_sec() / FRAME_TIMER_FREQ); How exactly is this related? So, didn't have a clean working directory. Regards, Anthony Liguori Alex
[Qemu-devel] [PATCH] net: provide a friendly message when a user passes a bad -net tap, fd=X (v2)
A lot of people copy libvirt's command line from ps -ef and then wonder why the VM isn't working correctly. Let's be kind and tell them what they should do instead. Without this patch, if you run with an invalid -net tap,fd=X, the guest still runs and we poll 100% on a bad file descriptor. With this patch, you get: qemu: -net tap,fd=42: invalid fd= for tap network device. If you're copying from libvirt, use `virsh dom2xml-to-native' instead qemu: -net tap,fd=42: Device 'tap' could not be initialized Signed-off-by: Anthony Liguori aligu...@us.ibm.com --- v1 - v2 - Remove garbage from unclean commit (spotted by Alex Graf) diff --git a/net/tap.c b/net/tap.c index 0147dab..45b1fb6 100644 --- a/net/tap.c +++ b/net/tap.c @@ -405,6 +405,8 @@ int net_init_tap(QemuOpts *opts, Monitor *mon, const char *name, VLANState *vlan int fd, vnet_hdr = 0; if (qemu_opt_get(opts, fd)) { +int flags; + if (qemu_opt_get(opts, ifname) || qemu_opt_get(opts, script) || qemu_opt_get(opts, downscript) || @@ -418,6 +420,11 @@ int net_init_tap(QemuOpts *opts, Monitor *mon, const char *name, VLANState *vlan return -1; } +if (fcntl(fd, F_GETFL, flags) == -1) { +error_report(invalid fd= for tap network device. If you're copying from libvirt, use `virsh dom2xml-to-native' instead); +return -1; +} + fcntl(fd, F_SETFL, O_NONBLOCK); vnet_hdr = tap_probe_vnet_hdr(fd); -- 1.7.0.4
Re: [Qemu-devel] Re: [PATCH v2 6/7] qed: Read/write support
On 10/11/2010 08:10 AM, Avi Kivity wrote: On 10/11/2010 12:37 PM, Stefan Hajnoczi wrote: On Sun, Oct 10, 2010 at 11:10:15AM +0200, Avi Kivity wrote: On 10/08/2010 05:48 PM, Stefan Hajnoczi wrote: This patch implements the read/write state machine. Operations are fully asynchronous and multiple operations may be active at any time. Allocating writes lock tables to ensure metadata updates do not interfere with each other. If two allocating writes need to update the same L2 table they will run sequentially. If two allocating writes need to update different L2 tables they will run in parallel. Shouldn't there be a flush between an allocating write and an L2 update? Otherwise a reuse of a cluster can move logical sectors from one place to another, causing a data disclosure. Can be skipped if the new cluster is beyond the physical image size. Currently clusters are never reused and new clusters are always beyond physical image size. The only exception I can think of is when the image file size is not a multiple of the cluster size and we round down to the start of cluster. In an implementation that supports TRIM or otherwise reuses clusters this is a cost. + +/* + * Table locking works as follows: + * + * Reads and non-allocating writes do not acquire locks because they do not + * modify tables and only see committed L2 cache entries. What about a non-allocating write that follows an allocating write? 1 Guest writes to sector 0 2 Host reads backing image (or supplies zeros), sectors 1-127 3 Host writes sectors 0-127 4 Guest writes sector 1 5 Host writes sector 1 There needs to be a barrier that prevents the host and the disk from reordering operations 3 and 5, or guest operation 4 is lost. As far as the guest is concerned no overlapping writes were issued, so it isn't required to provide any barriers. (based on the comment only, haven't read the code) There is no barrier between operations 3 and 5. However, operation 5 only starts after operation 3 has completed because of table locking. It is my understanding that *independent* requests may be reordered but two writes to the *same* sector will not be reordered if write A completes before write B is issued. Imagine a test program that uses pwrite() to rewrite a counter many times on disk. When the program finishes it prints the counter variable's last value. This scenario is like operations 3 and 5 above. If we read the counter back from disk it will be the final value, not some intermediate value. The writes will not be reordered. Yes, all that is needed is a barrier in program order. So, operation 5 is an allocating write as long as 3 hasn't returned? (at which point it becomes a non-allocating write)? Correct. The table lock in 0 is held until the request completes fully (including the write out of all of the fill data--step 3) which means 5 will not begin until 3 has completed Regards, Anthony Liguori
[Qemu-devel] Re: [PATCH v2 3/7] docs: Add QED image format specification
On 10/11/2010 08:44 AM, Avi Kivity wrote: On 10/11/2010 03:42 PM, Stefan Hajnoczi wrote: A leak is acceptable (it won't grow; it's just an unused, incorrect freelist), but data corruption is not. The alternative is for the freelist to be a non-compat feature bit. That means older QEMU binaries cannot use a QED image that has enabled the freelist. For this one feature. What about others? A compat feature is one where the feature can be completely ignored (meaning that the QEMU does not have to understand the data format). An example of a compat feature is copy-on-read. It's merely a suggestion and there is no additional metadata. If a QEMU doesn't understand it, it doesn't affect it's ability to read the image. An example of a non-compat feature would be zero cluster entries. Zero cluster entries are a special L2 table entry that indicates that a cluster's on-disk data is all zeros. As long as there is at least 1 ZCE in the L2 tables, this feature bit must be set. As soon as all of the ZCE bits are cleared, the feature bit can be unset. An older QEMU will gracefully fail when presented with an image using ZCE bits. An image with no ZCEs will work on older QEMUs. Regards, Anthony Liguori
Re: [Qemu-devel] Re: [PATCH v2 3/7] docs: Add QED image format specification
On 10/11/2010 08:04 AM, Avi Kivity wrote: On 10/11/2010 12:09 PM, Stefan Hajnoczi wrote: On Sun, Oct 10, 2010 at 11:20:09AM +0200, Avi Kivity wrote: On 10/08/2010 05:48 PM, Stefan Hajnoczi wrote: Signed-off-by: Stefan Hajnoczistefa...@linux.vnet.ibm.com --- docs/specs/qed_spec.txt | 94 +++ 1 files changed, 94 insertions(+), 0 deletions(-) +Feature bits: +* QED_F_BACKING_FILE = 0x01. The image uses a backing file. +* QED_F_NEED_CHECK = 0x02. The image needs a consistency check before use. Great that QED_F_NEED_CHECK is now non-optional. However, suppose we add a new structure (e.g. persistent freelist); it's now impossible to tell whether the structure is updated or not: 1 new qemu opens image 2 writes persistent freelist 3 clears need_check 4 shuts down 5 old qemu opens image 6 doesn't update persistent freelist 7 clears need_check 8 shuts down The image is now inconsistent, but has need_check cleared. We can address this by having a third feature bitmask that is autocleared by guests that don't recognize various bits; so the sequence becomes: 1 new qemu opens image 2 writes persistent freelist 3 clears need_check 4 sets persistent_freelist 5 shuts down 6 old qemu opens image 7 clears persistent_freelist (and any other bits it doesn't recognize) 8 doesn't update persistent freelist 9 clears need_check 10 shuts down The image is now consistent, since the persistent freelist has disappeared. It is more complicated than just the feature bit. The freelist requires space in the image file. Clearing the persistent_freelist bit leaks the freelist. We can solve this by using a compat feature bit and an autoclear feature bit. The autoclear bit specifies whether or not the freelist is valid and the compat feature bit specifices whether or not the freelist exists. When the new qemu opens the image again it notices that the autoclear bit is unset but the compat bit is set. This means the freelist is out-of-date and its space can be reclaimed. I don't like the idea of doing these feature bit acrobatics because they add complexity. On the other hand your proposal ensures backward compatibility in the case of an optional data structure that needs to stay in sync with the image. I'm just not 100% convinced it's worth it. My scenario ends up with data corruption if we move to an old qemu and then back again, without any aborts. A leak is acceptable (it won't grow; it's just an unused, incorrect freelist), but data corruption is not. A leak is unacceptable. It means an image can grow to an unbounded size. If you are a server provider offering multitenancy, then a malicious guest can potentially grow the image beyond it's allotted size causing a Denial of Service attack against another tenant. A freelist has to be a non-optional feature. When the freelist bit is set, an older QEMU cannot read the image. If the freelist is completed used, the freelist bit can be cleared and the image is then usable by older QEMUs. Regards, Anthony Liguori
Re: [Qemu-devel] Re: [PATCH v2 3/7] docs: Add QED image format specification
On 10/11/2010 09:58 AM, Avi Kivity wrote: A leak is unacceptable. It means an image can grow to an unbounded size. If you are a server provider offering multitenancy, then a malicious guest can potentially grow the image beyond it's allotted size causing a Denial of Service attack against another tenant. This particular leak cannot grow, and is not controlled by the guest. As the image gets moved from hypervisor to hypervisor, it can keep growing if given a chance to fill up the disk, then trim it all way. In a mixed hypervisor environment, it just becomes a numbers game. A freelist has to be a non-optional feature. When the freelist bit is set, an older QEMU cannot read the image. If the freelist is completed used, the freelist bit can be cleared and the image is then usable by older QEMUs. Once we support TRIM (or detect zeros) we'll never have a clean freelist. Zero detection doesn't add to the free list. A potential solution here is to treat TRIM a little differently than we've been discussing. When TRIM happens, don't immediately write an unallocated cluster entry for the L2. Leave the L2 entry in-tact. Don't actually write a UCE to the L2 until you actually allocate the block. This implies a cost because you'll need to do metadata syncs to make this work. However, that eliminates leakage. Regards, Anthony Liguori
[Qemu-devel] Re: [PATCH v2 3/7] docs: Add QED image format specification
On 10/11/2010 10:24 AM, Avi Kivity wrote: On 10/11/2010 05:02 PM, Anthony Liguori wrote: On 10/11/2010 08:44 AM, Avi Kivity wrote: On 10/11/2010 03:42 PM, Stefan Hajnoczi wrote: A leak is acceptable (it won't grow; it's just an unused, incorrect freelist), but data corruption is not. The alternative is for the freelist to be a non-compat feature bit. That means older QEMU binaries cannot use a QED image that has enabled the freelist. For this one feature. What about others? A compat feature is one where the feature can be completely ignored (meaning that the QEMU does not have to understand the data format). An example of a compat feature is copy-on-read. It's merely a suggestion and there is no additional metadata. If a QEMU doesn't understand it, it doesn't affect it's ability to read the image. An example of a non-compat feature would be zero cluster entries. Zero cluster entries are a special L2 table entry that indicates that a cluster's on-disk data is all zeros. As long as there is at least 1 ZCE in the L2 tables, this feature bit must be set. As soon as all of the ZCE bits are cleared, the feature bit can be unset. An older QEMU will gracefully fail when presented with an image using ZCE bits. An image with no ZCEs will work on older QEMUs. What's the motivation behind ZCE? It's very useful for Copy-on-Read. If the cluster in the backing file is unallocated, then when you do a copy-on-read, you don't want to write out a zero cluster since you'd expand the image to it's maximum size. It's also useful for operations like compaction in the absence of TRIM. The common implementation on platforms like VMware is to open a file and write zeros to it until it fills up the filesystem. You then delete the file. The result is that any unallocated data on the disk is written as zero and combined with zero-detection in the image format, you can compact the image size by marking unallocated blocks as ZCE. There is yet a third type of feature, one which is not strictly needed in order to use the image, but if used, must be kept synchronized. An example is the freelist. Another example is a directory index for a filesystem. I can't think of another example which would be relevant to QED -- metadata checksums perhaps? -- we can always declare it a non-compatible feature, but of course, it reduces compatibility. You're suggesting a feature that is not strictly needed, but that needs to be kept up to date. If it can't be kept up to date, something needs to happen to remove it. Let's call this a transient feature. Most of the transient features can be removed given some bit of code. For instance, ZCE can be removed by writing out zero clusters or writing an unallocated L2 entry if there is no backing file. I think we could add a qemu-img demote command or something like that that attempted to remove features when possible. That doesn't give you instant compatibility but I'm doubtful that you can come up with a generic way to remove a feature from an image without knowing anything about the image. Regards, Anthony Liguori
Re: [Qemu-devel] Re: [PATCH v2 3/7] docs: Add QED image format specification
On 10/11/2010 10:46 AM, Stefan Hajnoczi wrote: On Mon, Oct 11, 2010 at 05:39:01PM +0200, Avi Kivity wrote: On 10/11/2010 05:30 PM, Stefan Hajnoczi wrote: It was discussed before, but I don't think we came to a conclusion. Are there any circumstances under which you don't want to set the QED_CF_BACKING_FORMAT flag? I suggest the following: QED_CF_BACKING_FORMAT_RAW = 0x1 When set, the backing file is a raw image and should not be probed for its file format. The default (unset) means that the backing image file format may be probed. Now the backing_fmt_{offset,size} are no longer necessary. Should it not be an incompatible option? If the backing disk starts with a format magic, it will be probed by an older qemu, incorrectly. Agreed, it should be a non-compat feature bit. If it's just raw or not raw, then I agree it should be non-compat. I think we just need a feature bit then that indicates that the backing file is non-probeable which certainly simplifies the implementation. QED_F_BACKING_FORMAT_NOPROBE maybe? Regards, Anthony Liguori Stefan
Re: [Qemu-devel] Re: [PATCH v2 3/7] docs: Add QED image format specification
On 10/11/2010 11:02 AM, Avi Kivity wrote: On 10/11/2010 05:49 PM, Anthony Liguori wrote: On 10/11/2010 09:58 AM, Avi Kivity wrote: A leak is unacceptable. It means an image can grow to an unbounded size. If you are a server provider offering multitenancy, then a malicious guest can potentially grow the image beyond it's allotted size causing a Denial of Service attack against another tenant. This particular leak cannot grow, and is not controlled by the guest. As the image gets moved from hypervisor to hypervisor, it can keep growing if given a chance to fill up the disk, then trim it all way. In a mixed hypervisor environment, it just becomes a numbers game. I don't see how it can grow. Both the freelist and the clusters it points to consume space, which becomes a leak once you move it to a hypervisor that doesn't understand the freelist. The older hypervisor then allocates new blocks. As soon as it performs a metadata scan (if ever), the freelist is reclaimed. Assume you don't ever do a metadata scan (which is really our design point). If you move to a hypervisor that doesn't support it, then move to a hypervisor that does, you create a brand new freelist and start leaking more space. This isn't a contrived scenario if you have a cloud environment with a mix of hosts. You might not be able to get a ping-pong every time you provision, but with enough effort, you could create serious problems. It's really an issue of correctness. Making correctness trade-offs for the purpose of compatibility is a policy decision and not something we should bake into an image format. If a tool feels strongly that it's a reasonable trade off to make, it can always fudge the feature bits itself. A freelist has to be a non-optional feature. When the freelist bit is set, an older QEMU cannot read the image. If the freelist is completed used, the freelist bit can be cleared and the image is then usable by older QEMUs. Once we support TRIM (or detect zeros) we'll never have a clean freelist. Zero detection doesn't add to the free list. Why not? If a cluster is zero filled, you may drop it (assuming no backing image). Sorry, I was thinking about the case of copy-on-read. When you transition from UCE - ZCE, nothing gets added to the free list. But if you go from allocated - ZCE, then you would add to the free list. A potential solution here is to treat TRIM a little differently than we've been discussing. When TRIM happens, don't immediately write an unallocated cluster entry for the L2. Leave the L2 entry in-tact. Don't actually write a UCE to the L2 until you actually allocate the block. This implies a cost because you'll need to do metadata syncs to make this work. However, that eliminates leakage. The information is lost on shutdown; and you can have a large number of unallocated-in-waiting clusters (like a TRIM issued by mkfs, or a user expecting a visit from RIAA). A slight twist on your proposal is to have an allocated-but-may-drop bit in a L2 entry. TRIM or zero detection sets the bit (leaving the cluster number intact). A following write to the cluster needs to clear the bit; if we reallocate the cluster we need to replace it with a ZCE. Yeah, this is sort of what I was thinking. You would still want a free list but it becomes totally optional because if it's lost, no data is leaked (assuming that the older version understands the bit). I was suggesting that we store that bit in the free list though because that let's us support having older QEMUs with absolutely no knowledge still work. This makes the freelist all L2 entries with the bit set; it may be less efficient than a custom data structure though. We still want the freelist to avoid recreating it. We also want to store the allocated-but-may-drop bit in the free list. Regards, Anthony Liguori
Re: [Qemu-devel] Re: [PATCH v2 3/7] docs: Add QED image format specification
On 10/11/2010 11:18 AM, Anthony Liguori wrote: On 10/11/2010 10:46 AM, Stefan Hajnoczi wrote: On Mon, Oct 11, 2010 at 05:39:01PM +0200, Avi Kivity wrote: On 10/11/2010 05:30 PM, Stefan Hajnoczi wrote: It was discussed before, but I don't think we came to a conclusion. Are there any circumstances under which you don't want to set the QED_CF_BACKING_FORMAT flag? I suggest the following: QED_CF_BACKING_FORMAT_RAW = 0x1 When set, the backing file is a raw image and should not be probed for its file format. The default (unset) means that the backing image file format may be probed. Now the backing_fmt_{offset,size} are no longer necessary. Should it not be an incompatible option? If the backing disk starts with a format magic, it will be probed by an older qemu, incorrectly. Agreed, it should be a non-compat feature bit. If it's just raw or not raw, then I agree it should be non-compat. I think we just need a feature bit then that indicates that the backing file is non-probeable which certainly simplifies the implementation. QED_F_BACKING_FORMAT_NOPROBE maybe? Er, thinking more, this is still a good idea but we still need QED_CF_BACKING_FORMAT because we specifically need to know when a protocol is specified. Otherwise, we have no way of doing nbd as a backing file. Regards, Anthony Liguori Regards, Anthony Liguori Stefan
Re: [Qemu-devel] [RFC] Passing boot order from qemu to seabios
On 10/11/2010 10:52 AM, Stefan Hajnoczi wrote: 2010/10/11 Gleb Natapovg...@redhat.com: On Mon, Oct 11, 2010 at 01:48:09PM +0100, Stefan Hajnoczi wrote: On Mon, Oct 11, 2010 at 12:16 PM, Bernhard Kohlbernhard.k...@nsn.com wrote: Am 11.10.2010 12:18, schrieb ext Gleb Natapov: Currently if VM is started with multiple disks it is almost impossible to guess which one of them will be used as boot device especially if there is a mix of ATA/virtio/SCSI devices. Essentially BIOS decides the order and without looking into the code you can't tell what the order will be (and in qemu-kvm if boot=on is used it brings even more havoc). We should allow fine-grained control of boot order from qemu command line, or as a minimum control what device will be used for booting. To do that along with inventing syntax to specify boot order on qemu command line we need to communicate boot order to seabios via fw_cfg interface. For that we need to have a way to unambiguously specify a disk from qemu to seabios. PCI bus address is not enough since not all devices are PCI (do we care about them?) and since one PCI device may control more then one disk (ATA slave/master, SCSI LUNs). We can do what EDD specification does. Describe disk as: bus type (isa/pci), address on a bus (16 bit base address for isa, b/s/f for pci) device type (ATA/SCSI/VIRTIO) device path (slave/master for ATA, LUN for SCSI, nothing for virtio) Will it cover all use cased? Any other ideas? I think this also applies to network booting via gPXE. Usually our VMs have 4 NICs, mixed virtio-net and PCI pass-through. 2 of the NICs shall be used for booting, even if there are hard disks or floppy disks connected. This scenario is currently almost impossible to configure. Here is a gPXE to support fw_cfg. You can pass gPXE script files from the host to gPXE inside the guest. This means you can boot specific NICs: http://patchwork.ozlabs.org/patch/43777/ Just wanted to post the link because it is related to the gPXE side of this discussion. Don't we load gPXE for each NIC and seabios passes PCI device to boot from when it invokes one of them? SeaBIOS may do that but gPXE internally just probes all PCI devices. It does not take advantage of the PCI bus/addr/fn that was passed to the option ROM. A gPXE instance will try booting from each available NIC in sequence. It still registers a BEV entry though, no? Does it at least try to boot from the PCI bus/addr/fn of the selected BEV entry? Regards, Anthony Liguori Stefan
Re: [SeaBIOS] [Qemu-devel] [RFC] Passing boot order from qemu to seabios
On 10/11/2010 07:16 AM, Gleb Natapov wrote: On Mon, Oct 11, 2010 at 02:07:14PM +0200, Gerd Hoffmann wrote: Hi, Floppy? Yes, I think we do. And *one* floppy controllers can actually have *two* drives connected, although booting from 'b' doesn't work IIRC. and since one PCI device may control more then one disk (ATA slave/master, SCSI LUNs). We can do what EDD specification does. Describe disk as: bus type (isa/pci), address on a bus (16 bit base address for isa, b/s/f for pci) device type (ATA/SCSI/VIRTIO) device path (slave/master for ATA, LUN for SCSI, nothing for virtio) If we had a qdev ID for all devices (which I think we should have anyway), would this work or is a string not really handy enough? I think we'll need support for that in all drivers supporting boot anyway, i.e. have virtio-blk-pci register a boot edd when configured that way. Question is how to configure this. We could attach the boot index to either the blockdev or the device, i.e. -blockdev foo,bootindex=1 or -device virtio-blk-pci,bootindex=1 The latter looks more useful to me, boot order is guest state imho, also it might expand to PXE booting nicely, i.e. -device e1000,bootindex=2 Yes, boot order is a guest sate managed by BIOS on real HW. It's not that simple. On advanced platforms, the boot order can be stored outside of the BIOS. For instance, the boot order is actually stored in the IMM on certain IBM platforms and the BIOS queries the IMM for the boot order. This allows out-of-band management tools to alter the boot order. This is more or less what we're looking for here. The BIOS should be able to query and modify the boot order but this is something that ideally belongs in QEMU. Which turns up the question how this plays with option roms. seabios should be able to order at pci device level at least when booting via (pci) option rom. OK for nics. Booting from a scsi disk with id != 0 using the lsi rom is probably impossible though. What about non-pci option roms? The one used for -kernel for example? -option-rom rom.bin,bootindex=3? We can pass boot index along with option rom via fw_cfg interface. If the option rom is just hijacking int19, then there is no meaningful order you can give it. Regards, Anthony Liguori -- Gleb. ___ SeaBIOS mailing list seab...@seabios.org http://www.seabios.org/mailman/listinfo/seabios
Re: [Qemu-devel] [RFC] Passing boot order from qemu to seabios
On 10/11/2010 07:07 AM, Gerd Hoffmann wrote: Hi, Floppy? Yes, I think we do. And *one* floppy controllers can actually have *two* drives connected, although booting from 'b' doesn't work IIRC. and since one PCI device may control more then one disk (ATA slave/master, SCSI LUNs). We can do what EDD specification does. Describe disk as: bus type (isa/pci), address on a bus (16 bit base address for isa, b/s/f for pci) device type (ATA/SCSI/VIRTIO) device path (slave/master for ATA, LUN for SCSI, nothing for virtio) If we had a qdev ID for all devices (which I think we should have anyway), would this work or is a string not really handy enough? I think we'll need support for that in all drivers supporting boot anyway, i.e. have virtio-blk-pci register a boot edd when configured that way. Question is how to configure this. We could attach the boot index to either the blockdev or the device, i.e. -blockdev foo,bootindex=1 or -device virtio-blk-pci,bootindex=1 The latter looks more useful to me, boot order is guest state imho, also it might expand to PXE booting nicely, i.e. -device e1000,bootindex=2 Which turns up the question how this plays with option roms. seabios should be able to order at pci device level at least when booting via (pci) option rom. OK for nics. Booting from a scsi disk with id != 0 using the lsi rom is probably impossible though. What about non-pci option roms? The one used for -kernel for example? -kernel hijacks int19 so it cannot participate in any kind of boot order. It's either present (and therefore the bootable disk) or not present. Regards, Anthony Liguori cheers, Gerd
Re: [SeaBIOS] [Qemu-devel] [RFC] Passing boot order from qemu to seabios
On 10/11/2010 02:59 PM, Gleb Natapov wrote: No boot rom should do that. extboot wreaks havoc when it is used. And since virtio is now supported by bios there is no reason to use it. You don't really have a choice. You could be doing hardware passthrough and the ROM on the card may hijack int19. Whoever needs scsi boot should add it to seabios too. I don't disagree. I think the best thing to do is to let SeaBIOS create a boot order table that contains descriptive information and then advertise that to QEMU. QEMU can then try to associate the list of bootable devices with it's own set of devices and select a preferred order that it can then give back to SeaBIOS. SeaBIOS can then present that list to the user for additional refinement. Regards, Anthony Liguori -- Gleb.
Re: [SeaBIOS] [Qemu-devel] [RFC] Passing boot order from qemu to seabios
On 10/11/2010 03:36 PM, Gleb Natapov wrote: On Mon, Oct 11, 2010 at 03:30:21PM -0500, Anthony Liguori wrote: On 10/11/2010 02:59 PM, Gleb Natapov wrote: No boot rom should do that. extboot wreaks havoc when it is used. And since virtio is now supported by bios there is no reason to use it. You don't really have a choice. You could be doing hardware passthrough and the ROM on the card may hijack int19. Then this particular HW would be broken on real HW too and will not respect BIOS settings. But the code we provide should work properly. Whoever needs scsi boot should add it to seabios too. I don't disagree. I think the best thing to do is to let SeaBIOS create a boot order table that contains descriptive information and then advertise that to QEMU. What for? Why this step is needed? QEMU can then try to associate the list of bootable devices with it's own set of devices and select a preferred order that it can then give back to SeaBIOS. SeaBIOS can then present that list to the user for additional refinement. Why not skip your first step and let QEMU create boot order list and pass it into Seabios. If menu=on option is present user will be able to override the default from Seabios. Because SeaBIOS is definitive and QEMU is not. We can ask SeaBIOS to boot from SCSI LUN 3 on PCI address X.Y.Z but that doesn't mean that it can figure out what that means. If it can't, how do we communicate that to the user? If SeaBIOS communicates its list to QEMU then we can at least display that list in the monitor in the same way that it's displayed to the guest. That means that we can reorder in the monitor and potentially can persistent the boot device list in a more meaningful way. Regards, Anthony Liguori -- Gleb.
Re: [Qemu-devel] [PULL] eeepro100, virtio, net, vhost fixes
On 10/07/2010 09:54 AM, Michael S. Tsirkin wrote: Here are some fixes, all over the place. I am guessing the below fixes all make sense for the 0.13 branch, too. The per-device notifier patch is the only one that does not fix bugs, but it is needed for a bugfix patch on qemu-kvm stable that depends on it. The following changes since commit 358664cc6d1b5f7c36004be0179b36011b81c49d: console: Avoid dereferencing NULL active_console (2010-10-03 06:43:10 +) Pulled. Thanks. Regards, Anthony Liguori are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/mst/qemu.git for_anthony Michael S. Tsirkin (5): net: delay freeing peer host device virtio: invoke set_status callback on reset virtio-net: unify vhost-net start/stop virtio: change set guest notifier to per-device vhost: error code Stefan Weil (1): eepro100: Add support for multiple individual addresses (multiple IA) hw/eepro100.c | 30 +- hw/vhost.c | 54 ++--- hw/virtio-net.c | 89 ++ hw/virtio-pci.c | 29 +- hw/virtio.c |2 + hw/virtio.h |2 +- net.c | 49 ++ net.h |1 + 8 files changed, 168 insertions(+), 88 deletions(-)
[Qemu-devel] [RFT] qemu 0.13.0-rc3
After suffering from a prolonged maintainer softlockup, I'm attempting to get 0.13.0 release process back on track. I've tagged qemu-0.13.0-rc3 in git which only carries a few changes since 0.13.0-rc1. Most notably, a series of updates from Kevin Wolf and Cam Macdonell's ivshmem device. I think we're pretty good testing wise for the final release but I wanted to offer a 24-hour window for last minute fixes. I'm only interested in the following: 1) Patches that are *tested* against the stable-0.13 branch that are already committed in master. Please tell me explicitly that you've tested the patch and how you've tested it. 2) Pull requests from other maintainers with cherry-picked patches against stable-0.13 that have been tested. If you don't already have something ready, it's probably not worth worrying about. We can follow with 0.13.1 in a relatively short period of time. So please get any requests to me before 6PM US Central time October 12th. Post 0.13.0, as part of the 0.14 planning, we can discuss how to avoid future delay with releases. Thanks! Regards, Anthony Liguori
Re: [Qemu-devel] qdev: Some ISA devices don't handle second instantiation gracefully
On 10/12/2010 08:00 AM, Markus Armbruster wrote: Markus Armbrusterarm...@redhat.com writes: When I try -device isa-applesmc -device isa-applesmc, I get WARNING: Using AppleSMC with invalid key qemu: hardware error: register_ioport_read: invalid opaque [...] and a core dump. I know nothing about this device. Instantiating twice may well make no sense. But hw_error() is not a nice way to reject a command line option. Actually, ib700 and isa-debugcon fail the same way. They call register_ioport_write(), which aborts via hw_error() when the port is already in use. This is okay for non-configurable parts of a board emulation, but not okay for a qdev device, unless it has no_user set. It's definitely right to fail but I agree, it's better to propagate the error. Related: when isa_init_irq() finds the requested IRQ already in use, it fails with exit(1). Maybe register_ioport_write() friends should do that as well. Or maybe qdev device models should have an at most one flag. I think the proper thing to do is remove all exit(1)s and propagate errors instead. A simple approach would be to make register_ioport_{read,write}() return an int, then do a query-replace on the source tree to make all invocations of it simply check the return value and exit if it's non-zero. Regards, Anthony Liguori
Re: [Qemu-devel] Re: [PATCH v2 3/7] docs: Add QED image format specification
On 10/12/2010 08:16 AM, Stefan Hajnoczi wrote: Well, the protocol is currently encoded in the file name, separated by a colon. Of course, we want to get rid of that, but we still don't know what we want instead. It's completely unrelated to the backing file format, though, it's about the format of the backing file name. I agree with Kevin. There's no need to have the ill-defined backing format AFAICT. Yeah, I've now convinced myself we don't need backing format name too. Regards, Anthony Liguori Stefan
Re: [Qemu-devel] [PATCH 00/39] Make configure command line autoconf-compatible
On 10/12/2010 08:32 AM, malc wrote: On Tue, 12 Oct 2010, Paolo Bonzini wrote: On 10/12/2010 03:22 PM, malc wrote: 19 files changed, 3729 insertions(+), 456 deletions(-) I don't think the explanation above justified net plus of 3273 lines of new code. The part you forgot to quote is: Yes, failed to notice that, sorry. Makefile | 54 +- Makefile.dis |6 +- Makefile.hw|8 +- Makefile.objs |4 +- Makefile.target| 32 +- Makefile.user |6 +- bsd-user/main.c| 10 +- config.guess | 1502 ++ config.sub | 1731 configure | 757 ++- create_config |5 +- darwin-user/machload.c | 12 +- darwin-user/main.c |6 +- darwin-user/syscall.c |2 +- linux-user/main.c | 10 +- pc-bios/optionrom/Makefile |8 +- rules.mak | 10 +- tests/Makefile | 16 +- tests/cris/Makefile|6 +- 19 files changed, 3729 insertions(+), 456 deletions(-) All these lines are in config.guess and config.sub that are shared with all projects using Autotools and do not really need to be considered. It's 40 lines of new code, 33 of which are in configure's help message. I think there's a fair bit of value in looking like autoconf but a 39 patch series is a bit to digest at once. I'll try to do a thorough review but I'd suggest keeping patch series down to about 20 patches whenever possible and do multiple rounds as necessary. Regards, Anthony Liguori Paolo
Re: [Qemu-devel] qdev: Some ISA devices don't handle second instantiation gracefully
On 10/12/2010 08:54 AM, Markus Armbruster wrote: I think the proper thing to do is remove all exit(1)s and propagate errors instead. exit() is good enough during startup, i.e. -device. It's wrong for hot plug; anything to be used in a hot plug path must propagate errors. We could keep exiting in code that's only used by non-hotpluggable devices. I'm not really suggesting that we move exit()s out of all devices right now. Am just suggesting that we bump them up one level. A simple approach would be to make register_ioport_{read,write}() return an int, then do a query-replace on the source tree to make all invocations of it simply check the return value and exit if it's non-zero. In similar cases, we've used a simple FOO_nofail() wrapper in places that want to exit. I'll see what I can do. I prefer propagating the exit but am not deeply opposed to nofail(). Regards, Anthony Liguori
[Qemu-devel] Re: [PATCH v2 6/7] qed: Read/write support
On 10/12/2010 10:08 AM, Kevin Wolf wrote: Otherwise we might destroy data that isn't even touched by the guest request in case of a crash. The failure scenarios are either that the cluster is leaked in which case, the old version of the data is still present or the cluster is orphaned because the L2 entry is written, in which case the old version of the data is present. Are you referring to a scenario where the cluster is partially written because the data is present in the write cache and the write cache isn't flushed on power failure? Regards, Anthony Liguori Kevin
Re: [Qemu-devel] [PATCH 0/2] usb-ccid device (v2)
On 10/12/2010 11:03 AM, Alon Levy wrote: - Anthony Liguorianth...@codemonkey.ws wrote: On 10/12/2010 07:58 AM, Alon Levy wrote: This patch adds a new device, it is described in full in the second patch intro and also in the documentation in docs. In brief it provides a standard smart card reader device. The first patch is the configure change and docs. The second patch contains the actual device, I couldn't figure out a good way to split it to ease review. v2 changed: * all QSIMPLEQ turned into fixed sized rings * all allocated buffers turned into fixed size buffers * added migration support * added a message to tell client qemu has migrated to ip:port * for lack of monitor commands ip:port are 0:0, which causes the updated vscclient to connect to one port higher on the same host. will add monitor commands in a separate patch. tested with current setup. This is way too much magic to live within a device. Devices manage reconnecting themselves during migration. When you create the destination qemu instance, you specify what to connect to. IOW, On the source: qemu -chardev tcp:localhost:1025,id=foo -usbdevice ccid,chardev=foo ... On the destination: qemu -chardev tcp:localhost:1026,id=foo -usbdevice ccid,chardev=foo -incoming tcp:0.0.0.0:1024 ... A connection happens when the device is created. But now I'm even further confused then when I first reviewed it.. If you're now supporting migration, does that mean that you're relying on the daemon to emulate the device? Let me try to clarify this. Nothing has changed since the last patch except for what's in the notes, i.e. migration support. The device I'm adding is a reader. The reader is just a pipe between smart cards and the guest operating system. The smart card logic does live outside of this device, and is available in the cac_card sources at http://cgit.freedesktop.org/~alon/cac_card/ (all of this is in docs/usb_ccid.txt). So when I speak of vscclient, I'm talking of an application that emulates a smart card and initiates a tcp connection to qemu that connects to the usb-ccid device. vscclient is also in the cac_card sources. Okay, let me be clear. We shouldn't be doing device emulation outside of QEMU's source tree--at least, not in this type of context. External devices present a great deal of challenges and we shouldn't just approach it in an ad-hoc fashion. I'm not opposed to passthrough although I'd prefer QEMU to talk to the device directly instead of going through a daemon. There's a lot of delicate integration between QEMU and a device and if a device lives outside of QEMU, it makes it extremely difficult for us to influence changes to that device to support QEMU new features. Regarding the method of reconnection: You are absolutely right that if I have qemu connect to the remote instead of the other way around then I remove the need to inform vscclient of the new address. But the way it stands requires the client to know the address of the destination qemu. I have to inform it somehow. You are saying that devices shouldn't know this information? ok, that's why I talked about monitor commands. I come from the world of spice - in spice we use monitor commands for this. And none of that is upstream. Regards, Anthony Liguori I could change this to have qemu connect to vscclient, but I don't see the logic in general - sometimes you do want to have a chardev that is listening (the fact that it is implemented suggests someone found it useful), if you then migrate you have the same problem I'm solving. Regards, Anthony Liguori Alon Levy (2): usb-ccid: add CCID device. add configure option. usb-ccid: add CCID device (device itself) Makefile.objs |1 + configure | 12 + docs/usb-ccid.txt | 115 + hw/usb-ccid.c | 1376 hw/vscard_common.h | 131 + 5 files changed, 1635 insertions(+), 0 deletions(-) create mode 100644 docs/usb-ccid.txt create mode 100644 hw/usb-ccid.c create mode 100644 hw/vscard_common.h
[Qemu-devel] Re: [PATCH v2 6/7] qed: Read/write support
On 10/12/2010 10:59 AM, Stefan Hajnoczi wrote: On Tue, Oct 12, 2010 at 05:39:48PM +0200, Kevin Wolf wrote: Am 12.10.2010 17:22, schrieb Anthony Liguori: On 10/12/2010 10:08 AM, Kevin Wolf wrote: Otherwise we might destroy data that isn't even touched by the guest request in case of a crash. The failure scenarios are either that the cluster is leaked in which case, the old version of the data is still present or the cluster is orphaned because the L2 entry is written, in which case the old version of the data is present. Hm, how does the latter case work? Or rather, what do mean by orphaned? Are you referring to a scenario where the cluster is partially written because the data is present in the write cache and the write cache isn't flushed on power failure? The case I'm referring to is a COW. So let's assume a partial write to an unallocated cluster, we then need to do a COW in pre/postfill. Then we do a normal write and link the new cluster in the L2 table. Assume that the write to the L2 table is already on the disk, but the pre/postfill data isn't yet. At this point we have a bad state because if we crash now we have lost the data that should have been copied from the backing file. In this case QED_F_NEED_CHECK is set and the invalid cluster offset should be reset to zero on open. However, I think we can get into a state where the pre/postfill data isn't on the disk yet but another allocation has increased the file size, making the unwritten cluster valid. This fools consistency check into thinking the data cluster (which was never written to on disk) is valid. Will think about this more tonight. It's fairly simple to add a sync to this path. It's probably worth checking the prefill/postfill for zeros and avoiding the write/sync if that's the case. That should optimize the common cases of allocating new space within a file. My intuition is that we can avoid the sync entirely but we'll need to think about it further. Regards, Anthony Liguori Stefan
Re: [Qemu-devel] [PATCH 0/2] usb-ccid device (v2)
On 10/12/2010 11:43 AM, Alon Levy wrote: - Anthony Liguorianth...@codemonkey.ws wrote: On 10/12/2010 11:03 AM, Alon Levy wrote: - Anthony Liguorianth...@codemonkey.ws wrote: On 10/12/2010 07:58 AM, Alon Levy wrote: This patch adds a new device, it is described in full in the second patch intro and also in the documentation in docs. In brief it provides a standard smart card reader device. The first patch is the configure change and docs. The second patch contains the actual device, I couldn't figure out a good way to split it to ease review. v2 changed: * all QSIMPLEQ turned into fixed sized rings * all allocated buffers turned into fixed size buffers * added migration support * added a message to tell client qemu has migrated to ip:port * for lack of monitor commands ip:port are 0:0, which causes the updated vscclient to connect to one port higher on the same host. will add monitor commands in a separate patch. tested with current setup. This is way too much magic to live within a device. Devices manage reconnecting themselves during migration. When you create the destination qemu instance, you specify what to connect to. IOW, On the source: qemu -chardev tcp:localhost:1025,id=foo -usbdevice ccid,chardev=foo ... On the destination: qemu -chardev tcp:localhost:1026,id=foo -usbdevice ccid,chardev=foo -incoming tcp:0.0.0.0:1024 ... A connection happens when the device is created. But now I'm even further confused then when I first reviewed it.. If you're now supporting migration, does that mean that you're relying on the daemon to emulate the device? Let me try to clarify this. Nothing has changed since the last patch except for what's in the notes, i.e. migration support. The device I'm adding is a reader. The reader is just a pipe between smart cards and the guest operating system. The smart card logic does live outside of this device, and is available in the cac_card sources at http://cgit.freedesktop.org/~alon/cac_card/ (all of this is in docs/usb_ccid.txt). So when I speak of vscclient, I'm talking of an application that emulates a smart card and initiates a tcp connection to qemu that connects to the usb-ccid device. vscclient is also in the cac_card sources. Okay, let me be clear. We shouldn't be doing device emulation outside of QEMU's source tree--at least, not in this type of context. External devices present a great deal of challenges and we shouldn't just approach it in an ad-hoc fashion. There are two devices: smart card reader, aka CCID device - emulated inside qemu. Submitted. smart card - emulated outside of qemu. Fully available, open source separate project. And how does the smart card state get migrated during migration? How do you keep it synced with QEMU? I don't understand the use-case behind this. Is this so that a local physical smart card can be passed through to a guest from a Spice client and when migration happens, the QEMU instance connects back to the Spice client? So the device is never actually migrated? Regards, Anthony Liguori
Re: [Qemu-devel] [PATCH 0/2] usb-ccid device (v2)
On 10/12/2010 12:09 PM, Alon Levy wrote: The smart card is not being migrated. It is running on the client machine, which is not being migrated/shutdown (same as vncviewer isn't migrated). Ok, let's look at this compared to another similar use-case: USB passthrough of a webcam device that's remoted using USB over IP. In this model, you have a USB bus that's modelled as a bus and a device. Within the USB bus, you have additional devices. These are all qdev devices and they may be emulated or they may be implemented using passthrough. While we don't do it today, USB over IP would be just another form of passthrough. Migration is a rather interesting challenge in this model. You've got a mix of client state and server state on the USB over IP connection. You could marshal up the client state and as long as you reconnected to the same server on the destination, I guess it would be okay. I think the problem with your current implementation is that you've completed skipped the bus modelling and you're also using the Device over IP connection to implement device emulation. What I would suggest is that you model the bus/device relationship via qdev and move the Smart Card emulation into QEMU. I would also suggest adding proper passthrough support in QEMU. CCID over IP is a reasonable thing to have but I think you've got way too much outside of QEMU right now such that long term maintenance is going to be exceedingly difficult. Regards, Anthony Liguori I don't understand the use-case behind this. Is this so that a local physical smart card can be passed through to a guest from a Spice client and when migration happens, the QEMU instance connects back to the Spice client? So the device is never actually migrated? The *smart card* is never migrated. The ccid device is. Here is the scenario: Host A: qemu_a qemu_a: guest Host B: vscclient - physical reader Host C: qemu_b -incoming .. yes, we will use this for SPICE, but this is submitted to qemu on the hopes and with testing ensuring it is perfectly usable as is without using SPICE, otherwise I wouldn't have sent it upstream. non-SPICE usage: 1. user on B runs vscclient (and presumably the user has some connection to the guest to use the smartcard device, i.e. vnc/ssh/spice, but that's not relevant). 2. vscclient connects via tcp to qemu_a. 3. qemu_a starts migrating to qemu_b. (qemu_b is alive at this point, can receive incoming tcp connections on chardev - otherwise a migration fails immediately anyway) 4. pre_load for usb-ccid sends a Reconnect message 5. vscclient gets the Reconnect message, closes socket to qemu_a, opens socket to qemu_b 6. from guest pov nothing happened (no device detach/attach). I have to stress that the main problem the migration intends to solve is to avoid a detach/attach in the guest. Actual operations on the smartcard could possibly fail as a result of the migration, and it would not be a real problem (i.e. we could live without, but we can't leave with a lock of the guest screen as a result of a migration). Which is why I consider the current code good enough. It is certainly not perfect (short packet issue), or tested enough. The SPICE usage scenario is basically the same, just replace vscclient with spicec, and we don't need the Reconnect message since SPICE takes care of this for us (we just get a channel detach/attach event if we care). 1. user on B runs spicec 2. spicec connects to qemu via spice channel, smart card channel connects to usb-ccid device via spicevmc chardev (so it doesn't care it's spice or not). 3. qemu_a migrates 4. spicec disconnects from qemu_a and connects to qemu_b 5. from guest os pov nothing happens on the ccid usb device. Regards, Anthony Liguori
Re: [Qemu-devel] [RFT] qemu 0.13.0-rc3
On 10/12/2010 04:34 PM, Juergen Lock wrote: In article4cb38c82.1090...@linux.vnet.ibm.com you write: After suffering from a prolonged maintainer softlockup, I'm attempting to get 0.13.0 release process back on track. I've tagged qemu-0.13.0-rc3 in git which only carries a few changes since 0.13.0-rc1. Most notably, a series of updates from Kevin Wolf and Cam Macdonell's ivshmem device. I think we're pretty good testing wise for the final release but I wanted to offer a 24-hour window for last minute fixes. I'm only interested in the following: 1) Patches that are *tested* against the stable-0.13 branch that are already committed in master. Please tell me explicitly that you've tested the patch and how you've tested it. [...] First tests: - vmmouse seems broken (or disabled? I don't see it in `info qdm'...) I don't think vmmouse is part of qdev yet. - to test arm emulation I tried booting my old zaurus images (-M terrier) which failed (qd fix below), and in the process also tested -M n800 w/o an image which failed with: qemu-system-arm: Duplicate ID 'null' for chardev Can't create serial device, empty char device and that I got fixed by cherry-picking this commit from master: 6a8aabd3c132188ee8e0e82ef4aba09f782cbe96 From: Stefan Weilw...@mail.berlios.de Date: Sun, 8 Aug 2010 14:09:26 +0200 Subject: [PATCH] hw/omap: Fix default setup for OMAP UART devices Sorry, is the patch below an additional fix? If so, please submit against master. Regards, Anthony Liguori == And the zaurus patch, problem was the 2nd scoop's base address (0x08800040) gets rounded down to start of page which causes its io read/write callbacks to be passed addresses 0x40 higher than the code expects: (as witnessed by Bad register offset messages and failure to attach the internal CF disk aka microdrive at least.) Signed-off-by: Juergen Lockn...@jelal.kn-bremen.de --- a/hw/zaurus.c +++ b/hw/zaurus.c @@ -70,6 +70,10 @@ static uint32_t scoop_readb(void *opaque { ScoopInfo *s = (ScoopInfo *) opaque; +// XXX Workaround for base address (0x08800040 in this case) +// rounded down to start of page +addr= 0x3f; + switch (addr) { case SCOOP_MCR: return s-mcr; @@ -104,6 +108,10 @@ static void scoop_writeb(void *opaque, t ScoopInfo *s = (ScoopInfo *) opaque; value= 0x; +// XXX Workaround for base address (0x08800040 in this case) +// rounded down to start of page +addr= 0x3f; + switch (addr) { case SCOOP_MCR: s-mcr = value;
Re: [Qemu-devel] [PATCH] ceph/rbd block driver for qemu-kvm (v6)
, RBD_SUFFIX); sizeof(n) +while (1) { +qemu_free(outbuf); +outbuf = qemu_malloc(len); + +r = rados_exec(s-header_pool, n, rbd, snap_list, NULL, 0, + outbuf, len); +if (r 0) { +error_report(rbd.snap_list execution failed failed: %s, strerror(-r)); +return r; +} +if (r != len) +break; + +len *= 2; +} Is this really the only way to figure out how large the buffer should be?? +buf = outbuf; +end = buf + len; + +if ((r = decode64(buf, end,snap_seq)) 0) +goto done_err; +if ((r = decode32(buf, end,snap_count)) 0) +goto done_err; + +sn_tab = qemu_mallocz(snap_count * sizeof(QEMUSnapshotInfo)); +for (i = 0; i snap_count; i++) { +uint64_t id, image_size; +char *snap_name; +int name_len; + +if ((r = decode64(buf, end,id)) 0) +goto done_err; +if ((r = decode64(buf, end,image_size)) 0) +goto done_err; +if ((r = decode_str(buf, end,snap_name)) 0) +goto done_err; + +name_len = sizeof(sn_info-id_str) - 1; +if (r name_len) +name_len = r; + +sn_info = sn_tab + i; +pstrcpy(sn_info-id_str, name_len + 1, snap_name); +pstrcpy(sn_info-name, name_len + 1, snap_name); +qemu_free(snap_name); + +sn_info-vm_state_size = image_size; +sn_info-date_sec = 0; +sn_info-date_nsec = 0; +sn_info-vm_clock_nsec = 0; +} +*psn_tab = sn_tab; +qemu_free(outbuf); +return snap_count; +done_err: +qemu_free(sn_tab); +qemu_free(outbuf); +return r; +} + +static QEMUOptionParameter rbd_create_options[] = { +{ + .name = BLOCK_OPT_SIZE, + .type = OPT_SIZE, + .help = Virtual disk size +}, +{ + .name = BLOCK_OPT_CLUSTER_SIZE, + .type = OPT_SIZE, + .help = RBD object size +}, +{NULL} +}; + +static BlockDriver bdrv_rbd = { +.format_name= rbd, +.instance_size = sizeof(BDRVRBDState), +.bdrv_file_open = rbd_open, +.bdrv_close = rbd_close, +.bdrv_create= rbd_create, +.bdrv_get_info = rbd_getinfo, +.create_options = rbd_create_options, +.bdrv_getlength = rbd_getlength, +.protocol_name = rbd, + +.bdrv_aio_readv = rbd_aio_readv, +.bdrv_aio_writev= rbd_aio_writev, + +.bdrv_snapshot_create = rbd_snap_create, +.bdrv_snapshot_list = rbd_snap_list, +}; + +static void bdrv_rbd_init(void) +{ +bdrv_register(bdrv_rbd); +} + +block_init(bdrv_rbd_init); diff --git a/block/rbd_types.h b/block/rbd_types.h new file mode 100644 index 000..c35d840 --- /dev/null +++ b/block/rbd_types.h @@ -0,0 +1,71 @@ +/* + * Ceph - scalable distributed file system + * + * Copyright (C) 2004-2010 Sage Weils...@newdream.net + * + * This is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License version 2.1, as published by the Free Software + * Foundation. See file COPYING. + * + */ + +#ifndef CEPH_RBD_TYPES_H +#define CEPH_RBD_TYPES_H + + +/* + * rbd image 'foo' consists of objects + * foo.rbd - image metadata + * foo. + * foo.0001 + * ... - data + */ + +#define RBD_SUFFIX .rbd +#define RBD_DIRECTORY rbd_directory +#define RBD_INFOrbd_info + +#define RBD_DEFAULT_OBJ_ORDER 22 /* 4MB */ + +#define RBD_MAX_OBJ_NAME_SIZE 96 +#define RBD_MAX_BLOCK_NAME_SIZE 24 +#define RBD_MAX_SEG_NAME_SIZE 128 + +#define RBD_COMP_NONE 0 +#define RBD_CRYPT_NONE 0 + +#define RBD_HEADER_TEXT Rados Block Device Image\n +#define RBD_HEADER_SIGNATURERBD +#define RBD_HEADER_VERSION 001.005 + +struct rbd_info { +uint64_t max_id; +} __attribute__ ((packed)); + +struct rbd_obj_snap_ondisk { +uint64_t id; +uint64_t image_size; +} __attribute__((packed)); + +struct rbd_obj_header_ondisk { +char text[40]; +char block_name[RBD_MAX_BLOCK_NAME_SIZE]; +char signature[4]; +char version[8]; +struct { +uint8_t order; +uint8_t crypt_type; +uint8_t comp_type; +uint8_t unused; +} __attribute__((packed)) options; +uint64_t image_size; +uint64_t snap_seq; +uint32_t snap_count; +uint32_t reserved; +uint64_t snap_names_len; +struct rbd_obj_snap_ondisk snaps[0]; +} __attribute__((packed)); + + +#endif These types don't match QEMU CODING_STYLE. Regards, Anthony Liguori diff --git a/configure b/configure index af50607..5d8f620 100755 --- a/configure +++ b/configure @@ -325,6 +325,7 @@ cpu_emulation=yes check_utests=no user_pie=no zero_malloc= +rbd= # OS specific if check_define __linux__ ; then @@ -724,6 +725,10 @@ for opt do ;; --*dir) ;; + --disable-rbd) rbd=no + ;; + --enable-rbd) rbd=yes
Re: [Qemu-devel] Re: [PATCH v2 6/7] qed: Read/write support
On 10/13/2010 08:07 AM, Kevin Wolf wrote: Am 13.10.2010 14:13, schrieb Stefan Hajnoczi: We can avoid it when a backing image is not used. Your idea to check for zeroes in the backing image is neat too, it may well reduce the common case even for backing images. The additional requirement is that we're extending the file and not reusing an old cluster. (And bdrv_has_zero_init() == true, but QED doesn't work on host_devices anyway) Yes, that's a good point. BTW, I think we've decided that making it work on host_devices is not that bad. We can add an additional feature called QED_F_PHYSICAL_SIZE. This feature will add another field to the header that contains an offset immediately following the last cluster allocation. During a metadata scan, we can accurately recreate this field so we only need to update this field whenever we clear the header dirty bit (which means during an fsync()). That means we can maintain the physical size without introducing additional fsync()s in the allocation path. Since we're already writing out the header anyway, the write operation is basically free too. Regards, Anthony Liguori Kevin
Re: [Qemu-devel] Re: [PATCH v2 6/7] qed: Read/write support
That means we can maintain the physical size without introducing additional fsync()s in the allocation path. Since we're already writing out the header anyway, the write operation is basically free too. I don't see how it is free. It's an extra write. The good news is that it's very easy to amortize. We only need to update the header field on disk when we're already updating the header, so it's not even an extra write operation. Why would you ever update the header, apart from relocating L1 for some reason? To update the L1/L2 tables clean bit. That's what prevents a check in the normal case where you have a clean shutdown. Regards, Anthony Liguori
Re: [Qemu-devel] Re: [PATCH v2 6/7] qed: Read/write support
On 10/13/2010 08:50 AM, Avi Kivity wrote: On 10/13/2010 03:24 PM, Anthony Liguori wrote: On 10/13/2010 08:07 AM, Kevin Wolf wrote: Am 13.10.2010 14:13, schrieb Stefan Hajnoczi: We can avoid it when a backing image is not used. Your idea to check for zeroes in the backing image is neat too, it may well reduce the common case even for backing images. The additional requirement is that we're extending the file and not reusing an old cluster. (And bdrv_has_zero_init() == true, but QED doesn't work on host_devices anyway) Yes, that's a good point. BTW, I think we've decided that making it work on host_devices is not that bad. We can add an additional feature called QED_F_PHYSICAL_SIZE. This feature will add another field to the header that contains an offset immediately following the last cluster allocation. During a metadata scan, we can accurately recreate this field so we only need to update this field whenever we clear the header dirty bit (which means during an fsync()). If you make QED_F_PHYSICAL_SIZE an autoclear bit, you don't need the header dirty bit. Yes, autoclear bits are essentially granular header dirty bits. Regards, Anthony Liguori That means we can maintain the physical size without introducing additional fsync()s in the allocation path. Since we're already writing out the header anyway, the write operation is basically free too. I don't see how it is free. It's an extra write. The good news is that it's very easy to amortize.
Re: [Qemu-devel] Re: [PATCH v2 6/7] qed: Read/write support
On 10/13/2010 09:07 AM, Stefan Hajnoczi wrote: That means we can maintain the physical size without introducing additional fsync()s in the allocation path. Since we're already writing out the header anyway, the write operation is basically free too. I don't see how it is free. It's an extra write. The good news is that it's very easy to amortize. We only need to update the header field on disk when we're already updating the header, so it's not even an extra write operation. Because we're already writing out the sector that contains that field in the header. Regards, Anthony Liguori Stefan
Re: [Qemu-devel] Re: [PATCH v2 6/7] qed: Read/write support
On 10/13/2010 09:16 AM, Avi Kivity wrote: On 10/13/2010 04:11 PM, Anthony Liguori wrote: Why would you ever update the header, apart from relocating L1 for some reason? To update the L1/L2 tables clean bit. That's what prevents a check in the normal case where you have a clean shutdown. I see - so you wouldn't update it every allocation, only when the disk has been quiet for a while. Right, the current plan is to flush the header dirty bit on shutdown or whenever there is an explicit flush of the device. Current that is caused by either a guest-initiated flush or a L1 update. We also plan to add a timer-based flush such that a flush is scheduled for some period of time (like 5 minutes) after the dirty bit is set. The end result should be that the only window for requiring a metadata scan is if a crash occurs within 5 minutes of a cluster allocation and an explicit flush has not occurred for some other reason. Regards, Anthony Liguori
Re: [Qemu-devel] Re: [PATCH v2 6/7] qed: Read/write support
On 10/13/2010 10:08 AM, Avi Kivity wrote: On 10/13/2010 04:53 PM, Anthony Liguori wrote: On 10/13/2010 09:16 AM, Avi Kivity wrote: On 10/13/2010 04:11 PM, Anthony Liguori wrote: Why would you ever update the header, apart from relocating L1 for some reason? To update the L1/L2 tables clean bit. That's what prevents a check in the normal case where you have a clean shutdown. I see - so you wouldn't update it every allocation, only when the disk has been quiet for a while. Right, the current plan is to flush the header dirty bit on shutdown or whenever there is an explicit flush of the device. Current that is caused by either a guest-initiated flush or a L1 update. That does add an extra write (and a new write+flush later to mark the header dirty again when you start allocating). I'd drop it and only use the timer. in fact, it adds an extra flush too. The sequence 1 L1 update 2 mark clean 3 flush is unsafe since you can crash between 2 and 3, ad only 2 makes it. So I'd do something like You've got the order wrong. 1. L1 update 2. flush() 3. mark clean If (3) doesn't make it to disk, that's okay. It just causes an extra scan. 1 opportunistic flush (for whatever reason) 2 set timer 3 no intervening metadata changes 4 mark clean 5 no intervening metadata changes 6 mark dirty 7 flush 8 metadata changes Not sure I see why we set the timer in step 2 as opposed to: 0 clear scheduled flush (if necessary) 1 opportunistic flush (for whatever reason) 2 mark clean 3 no intervening metadata changes 4 mark dirty 5 flush 6 schedule flush (in 5 minutes) 7 metadata changes Which is now recorded at http://wiki.qemu.org/Features/QED/ScanAvoidance so we can keep track of this. Regards, Anthony Liguori
Re: [Qemu-devel] Hitting 29 NIC limit
On 10/13/2010 05:32 PM, Anjali Kulkarni wrote: Hi, Using the legacy way of starting up NICs, I am hitting a limitation after 29 NICs ie no more than 29 are detected (that's because of the 32 PCI slot limit on a single bus- 3 are already taken up) I had initially increased the MAX_NICS to 48, just on my tree, to get to more, but ofcource that wont work. Is there any way to go beyond 29 NICs the legacy way? What is the maximum that can be supported by the qdev mothod? I got up to 104 without trying very hard using the following script: args= for slot in 5 6 7 8 9 10 11 12 13 14 15 16 17; do for fn in 0 1 2 3 4 5 6 7; do args=$args -netdev user,id=eth${slot}_${fn} args=$args -device virtio-net-pci,addr=${slot}.${fn},netdev=eth${slot}_${fn},multifunction=on,romfile= done done x86_64-softmmu/qemu-system-x86_64 -hda ~/images/linux.img ${args} -enable-kvm The key is to make the virtio-net devices multifunction and to fill out all 8 functions for each slot. Regards, Anthony Liguori Thanks Anjali
[Qemu-devel] [Bug 660366] Re: qemu-img convert -O qcow2 -o backing_file makes huge images
** Changed in: qemu Importance: Undecided = Wishlist ** Changed in: qemu Status: New = Confirmed -- qemu-img convert -O qcow2 -o backing_file makes huge images https://bugs.launchpad.net/bugs/660366 You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. Status in QEMU: Confirmed Bug description: $ dd if=/dev/urandom bs=1M of=1.img count=4 4+0 records in 4+0 records out 4194304 bytes (4,2 MB) copied, 1,0413 s, 4,0 MB/s $ qemu-img create -f qcow2 -b 1.img 2.img Formatting '2.img', fmt=qcow2 size=4194304 backing_file='1.img' encryption=off cluster_size=0 $ qemu-img convert -O qcow2 -o backing_file=1.img 2.img 3.img $ du -h ?.img 4,1M1.img 144K2.img 4,3M3.img The conversion result is bigger then the source! It appears that -o backing_file is not applied to data (as expected). I.e. all data is put into the resulting image: both from source image and backing image. Expected behavior is to put only data that is not present in backing_file.
Re: [Qemu-devel] Hitting 29 NIC limit
On 10/14/2010 07:07 AM, Avi Kivity wrote: On 10/14/2010 12:54 AM, Anthony Liguori wrote: On 10/13/2010 05:32 PM, Anjali Kulkarni wrote: Hi, Using the legacy way of starting up NICs, I am hitting a limitation after 29 NICs ie no more than 29 are detected (that's because of the 32 PCI slot limit on a single bus- 3 are already taken up) I had initially increased the MAX_NICS to 48, just on my tree, to get to more, but ofcource that wont work. Is there any way to go beyond 29 NICs the legacy way? What is the maximum that can be supported by the qdev mothod? I got up to 104 without trying very hard using the following script: args= for slot in 5 6 7 8 9 10 11 12 13 14 15 16 17; do for fn in 0 1 2 3 4 5 6 7; do args=$args -netdev user,id=eth${slot}_${fn} args=$args -device virtio-net-pci,addr=${slot}.${fn},netdev=eth${slot}_${fn},multifunction=on,romfile= done done x86_64-softmmu/qemu-system-x86_64 -hda ~/images/linux.img ${args} -enable-kvm The key is to make the virtio-net devices multifunction and to fill out all 8 functions for each slot. This is unlikely to work right wrt pci hotplug. Yes. Our hotplug design is based on devices.. This is wrong, it should be based on bus-level concepts (like PCI slots). If we want to support a large number of interfaces, we need true multiport cards. This magic here creates a multiport virtio-net card so I'm not really sure what you're suggesting. It would certainly be nice to make this all more user friendly (and make hotplug work). Regards, Anthony Liguori What's the motivation for such a huge number of interfaces?
Re: [Qemu-devel] Hitting 29 NIC limit
On 10/14/2010 07:10 AM, Daniel P. Berrange wrote: On Thu, Oct 14, 2010 at 02:07:17PM +0200, Avi Kivity wrote: On 10/14/2010 12:54 AM, Anthony Liguori wrote: On 10/13/2010 05:32 PM, Anjali Kulkarni wrote: Hi, Using the legacy way of starting up NICs, I am hitting a limitation after 29 NICs ie no more than 29 are detected (that's because of the 32 PCI slot limit on a single bus- 3 are already taken up) I had initially increased the MAX_NICS to 48, just on my tree, to get to more, but ofcource that wont work. Is there any way to go beyond 29 NICs the legacy way? What is the maximum that can be supported by the qdev mothod? I got up to 104 without trying very hard using the following script: args= for slot in 5 6 7 8 9 10 11 12 13 14 15 16 17; do for fn in 0 1 2 3 4 5 6 7; do args=$args -netdev user,id=eth${slot}_${fn} args=$args -device virtio-net-pci,addr=${slot}.${fn},netdev=eth${slot}_${fn},multifunction=on,romfile= done done x86_64-softmmu/qemu-system-x86_64 -hda ~/images/linux.img ${args} -enable-kvm The key is to make the virtio-net devices multifunction and to fill out all 8 functions for each slot. This is unlikely to work right wrt pci hotplug. If we want to support a large number of interfaces, we need true multiport cards. Or a PCI bridge to wire up more PCI buses, so we raise the max limit for any type of device we emulate. I've always been sceptical of this. When physical systems have a large number of NICs, it's via multiple functions, not a bunch of PCI bridges. With just a handful of 8-port NICs, you can exceed the current slot-based limit on physical hardware. It's not an extremely common configuration but it does exist. BTW, I don't think it's possible to hot-add physical functions. I believe I know of a card that supports dynamic add of physical functions (pre-dating SR-IOV). Regards, Anthony Liguori Daniel
Re: [Qemu-devel] Hitting 29 NIC limit
On 10/14/2010 07:36 AM, Markus Armbruster wrote: Avi Kivitya...@redhat.com writes: On 10/14/2010 12:54 AM, Anthony Liguori wrote: On 10/13/2010 05:32 PM, Anjali Kulkarni wrote: Hi, Using the legacy way of starting up NICs, I am hitting a limitation after 29 NICs ie no more than 29 are detected (that's because of the 32 PCI slot limit on a single bus- 3 are already taken up) I had initially increased the MAX_NICS to 48, just on my tree, to get to more, but ofcource that wont work. Is there any way to go beyond 29 NICs the legacy way? What is the maximum that can be supported by the qdev mothod? I got up to 104 without trying very hard using the following script: args= for slot in 5 6 7 8 9 10 11 12 13 14 15 16 17; do for fn in 0 1 2 3 4 5 6 7; do args=$args -netdev user,id=eth${slot}_${fn} args=$args -device virtio-net-pci,addr=${slot}.${fn},netdev=eth${slot}_${fn},multifunction=on,romfile= done done x86_64-softmmu/qemu-system-x86_64 -hda ~/images/linux.img ${args} -enable-kvm The key is to make the virtio-net devices multifunction and to fill out all 8 functions for each slot. I'm amazed that works. Can't see how creating another qdev in the same slot makes a proper multifunction device. multifunction=on sets the multifunction bit for the PCI device. Then it's a matter of setting the address to be a specific function. Our default platform devices are actually multifunction. This is unlikely to work right wrt pci hotplug. If we want to support a large number of interfaces, we need true multiport cards. Indeed. As far as I know, we can't hot plug multifunction PCI devices. Yup. Regards, Anthony Liguori What's the motivation for such a huge number of interfaces? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [Bug 660366] Re: qemu-img convert -O qcow2 -o backing_file makes huge images
On 10/14/2010 07:51 AM, Stefan Hajnoczi wrote: On Thu, Oct 14, 2010 at 1:26 PM, Anthony Liguorianth...@codemonkey.ws wrote: ** Changed in: qemu Importance: Undecided = Wishlist ** Changed in: qemu Status: New = Confirmed Thanks for doing this Anthony. Can I set the status myself next time or do we have rules on who handles bugs? I'm pretty sure anyone can do it. If not, I'm certainly willing to give people extra rights on the project if they want to help with bug triage. Regards, Anthony Liguori Stefan
Re: [Qemu-devel] Hitting 29 NIC limit
On 10/14/2010 08:23 AM, Avi Kivity wrote: On 10/14/2010 02:54 PM, Anthony Liguori wrote: The key is to make the virtio-net devices multifunction and to fill out all 8 functions for each slot. This is unlikely to work right wrt pci hotplug. Yes. Our hotplug design is based on devices.. This is wrong, it should be based on bus-level concepts (like PCI slots). If we want to support a large number of interfaces, we need true multiport cards. This magic here creates a multiport virtio-net card so I'm not really sure what you're suggesting. It would certainly be nice to make this all more user friendly (and make hotplug work). The big issue is to fix hotplug. Yes, but this is entirely independent of multifunction devices. Today we shoe-horn hot remove into device_del. Instead, we should have explicit bus-level interfaces for hot remove. Regards, Anthony Liguori I don't see how we can make it user friendly, without making the ordinary case even more unfriendly. Looks like we need yet another level of indirection here.
Re: [Qemu-devel] [PATCH 0/2] usb-ccid device (v2)
On 10/14/2010 01:37 PM, Robert Relyea wrote: Anthony Liguori wrote: And how does the smart card state get migrated during migration? How do you keep it synced with QEMU? I don't understand the use-case behind this. Is this so that a local physical smart card can be passed through to a guest from a Spice client and when migration happens, the QEMU instance connects back to the Spice client? So the device is never actually migrated? A lot of this discussion has confused me until I realized we are talking 2 different models. My current understanding is that qemu assumes that all devices are local to the qemu instance (that is on the host). When you migrate you want to connect to the new hardware on the new host, not feed back to some general client. The only exception seems to be mouse and keyboard, where qemu depends on some external protocol (vncclient or xdesktop or the x protocol itself) to transport the mouse and keyboard events. Our model has been that the smart card is local to the user/client -- like the mouse and keyboard. When you migrate qemu you do not migrate the smart card itself, since it's still physically on your client machine (like the mouse and keyboard), and needs to be managed by the local drivers on that client machine (which knows how to talk to the specific smart card installed there). So the daemon stays right where it's at and connects to the new qemu instance as it comes up. This is where I think I was confused about your migration question. I think you are assuming that the smart card itself connects to new hardware on the new host, meaning the daemon itself needs to move. If that is the semantic you are trying to present, then you are quite right, it's ludicrous to have the external daemon as part of the emulation. Remote device passthrough is just a special case of passthrough. It's got interesting characteristics in that unlike local device passthrough, if you preserve the connection to the remove device, migration is still possible. However, remote device *emulation* is the thing that I'm concerned about. Having a device emulated outside of QEMU means that it's not possible to participate in many of QEMU's features (like live migration, tracing, debugging, etc.). Device creation is extremely complicated because you have to launch the external daemon and somehow configure that. I have no objection to remote device passthrough but I don't think remote device emulation is right for QEMU today. After talking to Alon in IRC, I think a better model for Spice would be to integrate the smart card emulation into QEMU and then develop a specific protocol for the smart card emulation to interface with the physical smart card. This interface isn't really any different than the network interface or the block interface in QEMU today. Regards, Anthony Liguori It now appears to me that qemu punts on this case, except for the keyboard and mouse -- well maybe not punts, but simply doesn't support any device that isn't on the host machine. If you look at the way qemu handles the sound device, for instance. Normally you want the sound to come out the speakers of the controlling console, not a random server that's hosting the guest. However, straight qemu doesn't handle things that way. The sound (if it comes out at all) comes out the server that qemu is installed on. When you migrate qemu, the sound now comes out the new server. This probably isn't a problem since most of the time someone is using the speaker, he's got the case where host == client. In that case it makes perfect sense to put the emulator inside qemu. In the case where we are running a hosted server service, it's highly unlikely anyone is going to be using sound (or an attached webcam, etc.). In fact migration for these devices are really a noop. Smart cards are really like these devices. In fact more than a few keyboards have built in smart card readers. The smart card model is I want the smart card at the same location as my keyboard and mouse. I use that set of smart cards to authenticate. The use case on machines running with a server is that some customers have a requirement that you need the smart card to log in and administer those machines. Those smart cards are ones the operator carries with him, not ones that would sit on some server farm. For their requirements, one needs a way to get back to the local client. As I said before, I don't think this requirement is unique. The only way to handle it is to run code on the client machine. The devices that run on that client are ones you don't migrate with qemu, but stay with the client itself and reconnect to the new instance. I agree that having a daemon for each devices will eventually become unweildy. It looks like spice is the answer for this scenario. If you have devices other than the mouse/keyboard/display that are located on the client == host, then you should assume the need for spice and not use
Re: [Qemu-devel] Re: [PATCH] pc: e820 qemu_cfg tables need to be packed
On 10/14/2010 02:44 PM, Jes Sorensen wrote: On 10/14/10 20:33, Alex Williamson wrote: We can't let the compiler define the alignment for qemu_cfg data. Signed-off-by: Alex Williamsonalex.william...@redhat.com --- 0.13 stable candidate? ACK I would say so. fw_cfg interfaces are somewhat difficult to rationalize about for compatibility. 0.13.0 is tagged already so it's too late to pull it in there. If we say we don't care about compatibility at the fw_cfg level, then it doesn't matter if we pull it into stable-0.13. If we do care, then this is an ABI breaker. I don't know that the answer is obvious to me. Regards, Anthony Liguori Jes
Re: [Qemu-devel] Re: [PATCH] pc: e820 qemu_cfg tables need to be packed
On 10/14/2010 02:58 PM, Alex Williamson wrote: On Thu, 2010-10-14 at 14:48 -0500, Anthony Liguori wrote: On 10/14/2010 02:44 PM, Jes Sorensen wrote: On 10/14/10 20:33, Alex Williamson wrote: We can't let the compiler define the alignment for qemu_cfg data. Signed-off-by: Alex Williamsonalex.william...@redhat.com --- 0.13 stable candidate? ACK I would say so. fw_cfg interfaces are somewhat difficult to rationalize about for compatibility. 0.13.0 is tagged already so it's too late to pull it in there. If we say we don't care about compatibility at the fw_cfg level, then it doesn't matter if we pull it into stable-0.13. If we do care, then this is an ABI breaker. If it works anywhere (I assume it works on 32bit), then it's only because it happened to get the alignment right. This just makes 64bit hosts get it right too. I don't see any compatibility issues, non-packed + 64bit = broken. Thanks, Ok, I'll buy that argument :-) Regards, Anthony Liguori Alex
Re: [Qemu-devel] Hitting 29 NIC limit
On 10/14/2010 04:42 PM, Richard W.M. Jones wrote: On Thu, Oct 14, 2010 at 01:10:47PM +0100, Daniel P. Berrange wrote: Or a PCI bridge to wire up more PCI buses, so we raise the max limit for any type of device we emulate. Break the 29/30/31 virtio-blk limit ... please! It was broken ages ago: anth...@howler:~$ wc -l /proc/partitions; tail /proc/partitions 422 /proc/partitions 251 1618 1 vdcx2 251 1621 489951 vdcx5 251 1632 10485760 vdcy 251 16339992398 vdcy1 251 1634 1 vdcy2 251 1637 489951 vdcy5 251 1648 10485760 vdcz 251 16499992398 vdcz1 251 1650 1 vdcz2 251 1653 489951 vdcz5 This is what makes qdev so useful. args= for slot in 5 6 7 8 9 10 11 12 13 14 15 16 17; do for fn in 0 1 2 3 4 5 6 7; do args=$args -drive file=/home/anthony/images/linux.img,if=none,snapshot=on,id=disk${slot}_${fn} args=$args -device virtio-blk-pci,addr=${slot}.${fn},drive=disk${slot}_${fn},multifunction=on done done x86_64-softmmu/qemu-system-x86_64 -hda ~/images/linux.img ${args} -enable-kvm -serial stdio Regards, Anthony Liguori Rich.
Re: [Qemu-devel] Hitting 29 NIC limit
On 10/14/2010 05:00 PM, Anjali Kulkarni wrote: Can you send me pointers to the qdev documentation? How can I use it? Will it allow us to scale above the 32 PCI limit? It's all below. You just have to create a PCI device and mark the multifunction flag to on and then assign it a PCI address that includes a function number. Then you can pack 8 virtio PCI devices into a single slot. Regards, Anthony Liguori Anjali On 10/14/10 2:57 PM, Anthony Liguorianth...@codemonkey.ws wrote: On 10/14/2010 04:42 PM, Richard W.M. Jones wrote: On Thu, Oct 14, 2010 at 01:10:47PM +0100, Daniel P. Berrange wrote: Or a PCI bridge to wire up more PCI buses, so we raise the max limit for any type of device we emulate. Break the 29/30/31 virtio-blk limit ... please! It was broken ages ago: anth...@howler:~$ wc -l /proc/partitions; tail /proc/partitions 422 /proc/partitions 251 1618 1 vdcx2 251 1621 489951 vdcx5 251 1632 10485760 vdcy 251 16339992398 vdcy1 251 1634 1 vdcy2 251 1637 489951 vdcy5 251 1648 10485760 vdcz 251 16499992398 vdcz1 251 1650 1 vdcz2 251 1653 489951 vdcz5 This is what makes qdev so useful. args= for slot in 5 6 7 8 9 10 11 12 13 14 15 16 17; do for fn in 0 1 2 3 4 5 6 7; do args=$args -drive file=/home/anthony/images/linux.img,if=none,snapshot=on,id=disk${slot}_${fn} args=$args -device virtio-blk-pci,addr=${slot}.${fn},drive=disk${slot}_${fn},multifunction=on done done x86_64-softmmu/qemu-system-x86_64 -hda ~/images/linux.img ${args} -enable-kvm -serial stdio Regards, Anthony Liguori Rich. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 0/2] usb-ccid device (v2)
On 10/14/2010 05:03 PM, Robert Relyea wrote: Remote device passthrough is just a special case of passthrough. It's got interesting characteristics in that unlike local device passthrough, if you preserve the connection to the remove device, migration is still possible. However, remote device *emulation* is the thing that I'm concerned about. Having a device emulated outside of QEMU means that it's not possible to participate in many of QEMU's features (like live migration, tracing, debugging, etc.). Device creation is extremely complicated because you have to launch the external daemon and somehow configure that. There's always some emulation going on on the client side. The client side has the device drivers, so you are either emulating an actual device or you are emulating the abstraction you invent. Once you have the client side, you have to launch the external daemon anyway. That's not a very convincing argument. It's pretty simple really. We don't want to split QEMU into a bunch of different daemons that all implement device emulation in slightly different ways. The user complexity is enormous and the ability to manage the complexity because impossible because nothing is centralized. It seems to me that the best way to go is to provide the native host == client support like other devices and allow the passthru. If you really need client support, just run spice (which is a single client daemon that handles everything). Let's not confuse passthrough with implementing device emulation outside of QEMU. They are two very different things. I think a remote passthrough protocol who's sole purpose is to allow external device emulation is a bad idea for QEMU. After talking to Alon in IRC, I think a better model for Spice would be to integrate the smart card emulation into QEMU and then develop a specific protocol for the smart card emulation to interface with the physical smart card. This interface isn't really any different than the network interface or the block interface in QEMU today. I seems to me that a second protocol is overkill. Having 2 protocols is a bit much to manage. We can do everything we need with the passthru. How is external device emulation not overkill? I don't see why two protocols are necessary. You just need one. My worry about creating any thing else is we may not have the flexibility to handle future cards. Smart cards themselves are programmable, so the interface for new cards are pretty dynamic. My worry is that we're creating an impossible situation to maintain in the long term because device emulation is happening in 10 different places. If there's a bug in your smart card emulation, a guest can now break into a Spice client. Part of the advantage of keeping everything contained in a single place (QEMU) is that we can restrict QEMU from a security perspective via sVirt and other mechanisms. Once you split apart device emulation, you break that security model. Regards, Anthony Liguori bob Regards, Anthony Liguori It now appears to me that qemu punts on this case, except for the keyboard and mouse -- well maybe not punts, but simply doesn't support any device that isn't on the host machine. If you look at the way qemu handles the sound device, for instance. Normally you want the sound to come out the speakers of the controlling console, not a random server that's hosting the guest. However, straight qemu doesn't handle things that way. The sound (if it comes out at all) comes out the server that qemu is installed on. When you migrate qemu, the sound now comes out the new server. This probably isn't a problem since most of the time someone is using the speaker, he's got the case where host == client. In that case it makes perfect sense to put the emulator inside qemu. In the case where we are running a hosted server service, it's highly unlikely anyone is going to be using sound (or an attached webcam, etc.). In fact migration for these devices are really a noop. Smart cards are really like these devices. In fact more than a few keyboards have built in smart card readers. The smart card model is I want the smart card at the same location as my keyboard and mouse. I use that set of smart cards to authenticate. The use case on machines running with a server is that some customers have a requirement that you need the smart card to log in and administer those machines. Those smart cards are ones the operator carries with him, not ones that would sit on some server farm. For their requirements, one needs a way to get back to the local client. As I said before, I don't think this requirement is unique. The only way to handle it is to run code on the client machine. The devices that run on that client are ones you don't migrate with qemu, but stay with the client itself and reconnect to the new instance. I agree that having a daemon for each devices will eventually become unweildy. It looks like spice
Re: [Qemu-devel] Hitting 29 NIC limit
On 10/14/2010 05:12 PM, Anjali Kulkarni wrote: Thanks. Does this work for e1000 as well? Haven't tried. I don't know how various e1000 drivers would react. Also, does it support pci hotplug? No, but that's fixable down the road. Regards, Anthony Liguori Anjali On 10/14/10 3:09 PM, Anthony Liguorianth...@codemonkey.ws wrote: On 10/14/2010 05:00 PM, Anjali Kulkarni wrote: Can you send me pointers to the qdev documentation? How can I use it? Will it allow us to scale above the 32 PCI limit? It's all below. You just have to create a PCI device and mark the multifunction flag to on and then assign it a PCI address that includes a function number. Then you can pack 8 virtio PCI devices into a single slot. Regards, Anthony Liguori Anjali On 10/14/10 2:57 PM, Anthony Liguorianth...@codemonkey.ws wrote: On 10/14/2010 04:42 PM, Richard W.M. Jones wrote: On Thu, Oct 14, 2010 at 01:10:47PM +0100, Daniel P. Berrange wrote: Or a PCI bridge to wire up more PCI buses, so we raise the max limit for any type of device we emulate. Break the 29/30/31 virtio-blk limit ... please! It was broken ages ago: anth...@howler:~$ wc -l /proc/partitions; tail /proc/partitions 422 /proc/partitions 251 1618 1 vdcx2 251 1621 489951 vdcx5 251 1632 10485760 vdcy 251 16339992398 vdcy1 251 1634 1 vdcy2 251 1637 489951 vdcy5 251 1648 10485760 vdcz 251 16499992398 vdcz1 251 1650 1 vdcz2 251 1653 489951 vdcz5 This is what makes qdev so useful. args= for slot in 5 6 7 8 9 10 11 12 13 14 15 16 17; do for fn in 0 1 2 3 4 5 6 7; do args=$args -drive file=/home/anthony/images/linux.img,if=none,snapshot=on,id=disk${slot}_${fn} args=$args -device virtio-blk-pci,addr=${slot}.${fn},drive=disk${slot}_${fn},multifunction=on done done x86_64-softmmu/qemu-system-x86_64 -hda ~/images/linux.img ${args} -enable-kvm -serial stdio Regards, Anthony Liguori Rich. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Qemu-devel] Re: Passing in additional info to guest OS and e1000 test suite?
On 10/15/2010 01:49 PM, Anjali Kulkarni wrote: Hi, - If I want to pass in additional arguments to the guest OS while booting(in particular which slot I want to map a nic to) - is there any way to do it? Some kind of configuration file that I can pass in would also be ok for me. - Is there a KVM/Qemu/e1000 test suite that is already available and I can us? I'm a little confused about what you're really asking. Can you give a more illustrated example of what you're asking? Are you using libvirt? Thanks Anjali -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 1/2] pci: Automatically patch PCI device id in PCI ROM
On 10/15/2010 03:51 PM, Stefan Weil wrote: PCI device with different device ids sometimes share the same rom code. Only the device id and the checksum differ in a boot rom for such devices. BTW, SeaBIOS doesn't reject ROMs when they're loaded via rombar, only when they're loaded via romfile. Maybe it's better to use fw_cfg to explicitly tell SeaBIOS to ignore the PCI device id in the rom header for a certain device? Regards, Anthony Liguori The i825xx ethernet controller family is a typical example which is implemented in hw/eepro100.c. It uses at least 3 different device ids, so normally 3 boot roms would be needed. By automatically patching the device id (and the checksum) in qemu, all emulated family members can share the same boot rom. Cc: Markus Armbrusterarm...@redhat.com Cc: Michael S. Tsirkinm...@redhat.com Signed-off-by: Stefan Weilw...@mail.berlios.de --- hw/pci.c | 53 + 1 files changed, 53 insertions(+), 0 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index 1280d4d..c1f8218 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -1797,6 +1797,57 @@ static void pci_map_option_rom(PCIDevice *pdev, int region_num, pcibus_t addr, p cpu_register_physical_memory(addr, size, pdev-rom_offset); } +/* Patch the PCI device id in a PCI rom image if necessary. + This is needed for an option rom which is used for more than one device. */ +static void pci_patch_device_id(PCIDevice *pdev, uint8_t *ptr, int size) +{ +uint16_t vendor_id; +uint16_t device_id; +uint16_t rom_vendor_id; +uint16_t rom_device_id; +uint16_t rom_magic; +uint16_t pcir_offset; + +/* Words in rom data are little endian (like in PCI configuration), + so they can be read / written with pci_get_word / pci_set_word. */ + +/* Only a valid rom will be patched. */ +rom_magic = pci_get_word(ptr); +if (rom_magic != 0xaa55) { +PCI_DPRINTF(Bad ROM magic %04x\n, rom_magic); +return; +} +pcir_offset = pci_get_word(ptr + 0x18); +if (pcir_offset + 8= size || memcmp(ptr + pcir_offset, PCIR, 4)) { +PCI_DPRINTF(Bad PCIR offset 0x%x or signature\n, pcir_offset); +return; +} + +vendor_id = pci_get_word(pdev-config + PCI_VENDOR_ID); +device_id = pci_get_word(pdev-config + PCI_DEVICE_ID); +rom_vendor_id = pci_get_word(ptr + pcir_offset + 4); +rom_device_id = pci_get_word(ptr + pcir_offset + 6); + +/* Don't patch a rom with wrong vendor id (might be changed if needed). */ +if (vendor_id != rom_vendor_id) { +return; +} + +PCI_DPRINTF(ROM id %04x%04x / PCI id %04x%04x\n, +vendor_id, device_id, rom_vendor_id, rom_device_id); + +if (device_id != rom_device_id) { +/* Patch device id and checksum (at offset 6 for etherboot roms). */ +uint8_t checksum; +checksum = ptr[6]; +checksum += (uint8_t)rom_device_id + (uint8_t)(rom_device_id 8); +checksum -= (uint8_t)device_id + (uint8_t)(device_id 8); +PCI_DPRINTF(ROM checksum %02x / %02x\n, ptr[6], checksum); +ptr[6] = checksum; +pci_set_word(ptr + pcir_offset + 6, device_id); +} +} + /* Add an option rom for the device */ static int pci_add_option_rom(PCIDevice *pdev) { @@ -1849,6 +1900,8 @@ static int pci_add_option_rom(PCIDevice *pdev) load_image(path, ptr); qemu_free(path); +pci_patch_device_id(pdev, ptr, size); + pci_register_bar(pdev, PCI_ROM_SLOT, size, 0, pci_map_option_rom);
Re: [Qemu-devel] Changelog of qemu-0.13.0.tar.gz ?
On 10/17/2010 11:58 PM, Sergei Steshenko wrote: Hello, though there is already http://download.savannah.gnu.org/releases/qemu/qemu-0.13.0.tar.gz available, I don't see its changelog on http://wiki.qemu.org/Index.html . Is it expected to be this way ? I haven't sent the announce yet (which will be coming very soon). Regards, Anthony Liguori Thanks, Sergei.
Re: [Qemu-devel] LP#584139
On 10/18/2010 06:34 AM, Michael Tokarev wrote: Can we fix this trivial bug please? See: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=578846 https://bugs.launchpad.net/qemu/+bug/584139 I switched qemu-kvm in debian to use qemu-keymaps package (separately packaged keymaps), but it re-introduces debian#578846. Thanks! Can you send a patch to the mailing list with a Signed-off-by and an explanation of the change? Regards, Anthony Liguori /mjt
[Qemu-devel] [ANNOUNCE] Release 0.13.0 of QEMU
The QEMU team is pleased to announce the availability of the 0.13.0 release. This release consists of over 2,500 commits from 145 contributors. Some major features were added in this release including: - vhost-net: kernel-accelerating network backend for virtio devices (using KVM) - qmp: significant improvements covering most monitor commands - vnc: introduction of new encodings that dramatically improve bandwidth (part of GSoC project) - ivshmem: new shared memory device allowing multiple guests to share a memory region - mips: introduction of fulong mini-pc - virtio-9p: introduction of a paravirtual file system passthrough mechanism - hpet: many enhancements - target-s390: support for s390 usermode emulation - many more features and bug fixes It can be downloaded from Savannah at: http://download.savannah.gnu.org/releases/qemu/qemu-0.13.0.tar.gz For detailed Changelogs, please consult the revision history in git. On behalf of the QEMU team, I'd like to thank everyone who contributed to make this release happen! A special note about QMP support in 0.13.0. QMP is still considered experimental in 0.13.0. There are no plans to change the protocol in an incompatible way but there are likely to be missing features. Regards, Anthony Liguori
[Qemu-devel] Re: [PATCH 1/3] Make kvm64 the default cpu model when kvm_enabled()
On 10/18/2010 03:22 AM, Roedel, Joerg wrote: (Sorry for the late reply) On Thu, Oct 07, 2010 at 08:48:06AM -0400, Anthony Liguori wrote: On 10/07/2010 03:42 AM, Roedel, Joerg wrote: On Wed, Oct 06, 2010 at 03:24:59PM -0400, Anthony Liguori wrote: +qemu_compat_version = machine-compat_version; + if (display_type == DT_NOGRAPHIC) { if (default_parallel) add_device_config(DEV_PARALLEL, null); -- 1.7.0.4 Looks fine to me, given CPUs are not in qdev. Anthony? The idea is fine, but why not just add the default CPU to the machine description? If I remember correctly the reason was that the machine description was not accessible in the cpuid initialization path because it is a function local variable. Not tested at all but I think the attached patch addresses it in a pretty nice way. There's a couple ways you could support your patch on top of this. You could add a kvm_cpu_model to the machine structure that gets defaulted too if kvm_enabled(). You could also introduce a new KVM machine type that gets defaulted to if no explicit machine is specified. I had something similar in mind but then I realized that we need at least a cpu_model and a cpu_model_kvm to distinguish between the TCG and the KVM case. I would think that having different default machines for KVM and TCG would be a better solution. Further the QEMUMachine data structure is used for all architectures in QEMU and the model-names only make sense for x86. SPARC uses cpu_model too FWIW. I believe Blue Swirl has even discussed using a feature-format similar to how x86 does it for SPARC CPUs. Regards, Anthony Liguori So I decided for the comapt-version way (which doesn't mean I object against this one ;-) ) Joerg From d2370c88cef4b07d48ba3c4804e35ae2db8db7c0 Mon Sep 17 00:00:00 2001 From: Anthony Liguorialigu...@us.ibm.com Date: Thu, 7 Oct 2010 07:43:42 -0500 Subject: [PATCH] machine: make default cpu model part of machine structure Signed-off-by: Anthony Liguorialigu...@us.ibm.com diff --git a/hw/boards.h b/hw/boards.h index 6f0f0d7..8c6ef27 100644 --- a/hw/boards.h +++ b/hw/boards.h @@ -16,6 +16,7 @@ typedef struct QEMUMachine { const char *name; const char *alias; const char *desc; +const char *cpu_model; QEMUMachineInitFunc *init; int use_scsi; int max_cpus; diff --git a/hw/pc.c b/hw/pc.c index 69b13bf..0826107 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -866,14 +866,6 @@ void pc_cpus_init(const char *cpu_model) int i; /* init CPUs */ -if (cpu_model == NULL) { -#ifdef TARGET_X86_64 -cpu_model = qemu64; -#else -cpu_model = qemu32; -#endif -} - for(i = 0; i smp_cpus; i++) { pc_new_cpu(cpu_model); } diff --git a/hw/pc_piix.c b/hw/pc_piix.c index 12359a7..919b4d6 100644 --- a/hw/pc_piix.c +++ b/hw/pc_piix.c @@ -204,17 +204,22 @@ static void pc_init_isa(ram_addr_t ram_size, const char *initrd_filename, const char *cpu_model) { -if (cpu_model == NULL) -cpu_model = 486; pc_init1(ram_size, boot_device, kernel_filename, kernel_cmdline, initrd_filename, cpu_model, 0); } +#ifdef TARGET_X86_64 +#define DEF_CPU_MODEL qemu64 +#else +#define DEF_CPU_MODEL qemu32 +#endif + static QEMUMachine pc_machine = { .name = pc-0.13, .alias = pc, .desc = Standard PC, +.cpu_model = DEF_CPU_MODEL, .init = pc_init_pci, .max_cpus = 255, .is_default = 1, @@ -223,6 +228,7 @@ static QEMUMachine pc_machine = { static QEMUMachine pc_machine_v0_12 = { .name = pc-0.12, .desc = Standard PC, +.cpu_model = DEF_CPU_MODEL, .init = pc_init_pci, .max_cpus = 255, .compat_props = (GlobalProperty[]) { @@ -242,6 +248,7 @@ static QEMUMachine pc_machine_v0_12 = { static QEMUMachine pc_machine_v0_11 = { .name = pc-0.11, .desc = Standard PC, qemu 0.11, +.cpu_model = DEF_CPU_MODEL, .init = pc_init_pci, .max_cpus = 255, .compat_props = (GlobalProperty[]) { @@ -277,6 +284,7 @@ static QEMUMachine pc_machine_v0_11 = { static QEMUMachine pc_machine_v0_10 = { .name = pc-0.10, .desc = Standard PC, qemu 0.10, +.cpu_model = DEF_CPU_MODEL, .init = pc_init_pci, .max_cpus = 255, .compat_props = (GlobalProperty[]) { @@ -324,6 +332,7 @@ static QEMUMachine pc_machine_v0_10 = { static QEMUMachine isapc_machine = { .name = isapc, .desc = ISA-only PC, +.cpu_model = 486, .init = pc_init_isa, .max_cpus = 1, }; diff --git a/vl.c b/vl.c index df414ef..3a55cc8 100644 --- a/vl.c +++ b/vl.c @@ -2904,6 +2904,10 @@ int main(int argc, char **argv, char **envp) } qemu_add_globals(); +if (cpu_model == NULL) { +cpu_model = machine-cpu_model; +} + machine-init
Re: [Qemu-devel] [PATCH 1/2] pci: Automatically patch PCI device id in PCI ROM
On 10/18/2010 05:09 AM, Gerd Hoffmann wrote: On 10/15/10 23:05, Anthony Liguori wrote: On 10/15/2010 03:51 PM, Stefan Weil wrote: PCI device with different device ids sometimes share the same rom code. Only the device id and the checksum differ in a boot rom for such devices. BTW, SeaBIOS doesn't reject ROMs when they're loaded via rombar, only when they're loaded via romfile. SeaBIOS rejects them when loaded from the rom bar and doesn't reject them when loaded via fw_cfg. What I meant was, rombar=0 in qdev. Sometimes my fingers don't work the same way my brain does :-) Using the rom bar is the prefered way though, fw_cfg is only there for compatibility with older versions. Maybe it's better to use fw_cfg to explicitly tell SeaBIOS to ignore the PCI device id in the rom header for a certain device? Patching the rom is fine IMHO. Why create + use a separate communication path when we can use a much simpler approach? How does this interact with PCI device passthrough? We clearly can't patch in that case whereas if we had a hint to SeaBIOS, it would still work. Regards, Anthony Liguori cheers, Gerd
Re: [Qemu-devel] Re: [PATCH 1/2] pci: Automatically patch PCI vendor id and device id in PCI ROM
On 10/18/2010 12:58 PM, Michael S. Tsirkin wrote: On Mon, Oct 18, 2010 at 07:55:11PM +0200, Stefan Weil wrote: PCI devices with different vendor or device ids sometimes share the same rom code. Only the ids and the checksum differs in a boot rom for such devices. The i825xx ethernet controller family is a typical example which is implemented in hw/eepro100.c. It uses at least 3 different device ids, so normally 3 boot roms would be needed. By automatically patching vendor id and device id (and the checksum) in qemu, all emulated family members can share the same boot rom. VGA bios roms are another example with different vendor and device ids. v2: * Patch also the vendor id (and remove the sanity check for vendor id). Cc: Gerd Hoffmannkra...@redhat.com Cc: Markus Armbrusterarm...@redhat.com Cc: Michael S. Tsirkinm...@redhat.com Signed-off-by: Stefan Weilw...@mail.berlios.de --- hw/pci.c | 58 ++ 1 files changed, 58 insertions(+), 0 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index 1280d4d..139eb24 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -1797,6 +1797,62 @@ static void pci_map_option_rom(PCIDevice *pdev, int region_num, pcibus_t addr, p cpu_register_physical_memory(addr, size, pdev-rom_offset); } +/* Patch the PCI vendor and device ids in a PCI rom image if necessary. + This is needed for an option rom which is used for more than one device. */ +static void pci_patch_ids(PCIDevice *pdev, uint8_t *ptr, int size) let's return an error code on malformed roms so management can detect errors? A bad/missing PnP header does not mean it's an invalid ROM. Regards, Anthony Liguori
Re: [Qemu-devel] [PATCH 1/2] pci: Automatically patch PCI vendor id and device id in PCI ROM
On 10/18/2010 12:55 PM, Stefan Weil wrote: PCI devices with different vendor or device ids sometimes share the same rom code. Only the ids and the checksum differs in a boot rom for such devices. The i825xx ethernet controller family is a typical example which is implemented in hw/eepro100.c. It uses at least 3 different device ids, so normally 3 boot roms would be needed. By automatically patching vendor id and device id (and the checksum) in qemu, all emulated family members can share the same boot rom. VGA bios roms are another example with different vendor and device ids. v2: * Patch also the vendor id (and remove the sanity check for vendor id). Cc: Gerd Hoffmannkra...@redhat.com Cc: Markus Armbrusterarm...@redhat.com Cc: Michael S. Tsirkinm...@redhat.com Signed-off-by: Stefan Weilw...@mail.berlios.de I get very nervous about patching a ROM. Who's to say that the ROM doesn't somehow depend on the contents of its header? Maybe it has an internal CRC built into it or something like that. Regards, Anthony Liguori --- hw/pci.c | 58 ++ 1 files changed, 58 insertions(+), 0 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index 1280d4d..139eb24 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -1797,6 +1797,62 @@ static void pci_map_option_rom(PCIDevice *pdev, int region_num, pcibus_t addr, p cpu_register_physical_memory(addr, size, pdev-rom_offset); } +/* Patch the PCI vendor and device ids in a PCI rom image if necessary. + This is needed for an option rom which is used for more than one device. */ +static void pci_patch_ids(PCIDevice *pdev, uint8_t *ptr, int size) +{ +uint16_t vendor_id; +uint16_t device_id; +uint16_t rom_vendor_id; +uint16_t rom_device_id; +uint16_t rom_magic; +uint16_t pcir_offset; +uint8_t checksum; + +/* Words in rom data are little endian (like in PCI configuration), + so they can be read / written with pci_get_word / pci_set_word. */ + +/* Only a valid rom will be patched. */ +rom_magic = pci_get_word(ptr); +if (rom_magic != 0xaa55) { +PCI_DPRINTF(Bad ROM magic %04x\n, rom_magic); +return; +} +pcir_offset = pci_get_word(ptr + 0x18); +if (pcir_offset + 8= size || memcmp(ptr + pcir_offset, PCIR, 4)) { +PCI_DPRINTF(Bad PCIR offset 0x%x or signature\n, pcir_offset); +return; +} + +vendor_id = pci_get_word(pdev-config + PCI_VENDOR_ID); +device_id = pci_get_word(pdev-config + PCI_DEVICE_ID); +rom_vendor_id = pci_get_word(ptr + pcir_offset + 4); +rom_device_id = pci_get_word(ptr + pcir_offset + 6); + +PCI_DPRINTF(ROM id %04x%04x / PCI id %04x%04x\n, +vendor_id, device_id, rom_vendor_id, rom_device_id); + +checksum = ptr[6]; + +if (vendor_id != rom_vendor_id) { +/* Patch vendor id and checksum (at offset 6 for etherboot roms). */ +checksum += (uint8_t)rom_vendor_id + (uint8_t)(rom_vendor_id 8); +checksum -= (uint8_t)vendor_id + (uint8_t)(vendor_id 8); +PCI_DPRINTF(ROM checksum %02x / %02x\n, ptr[6], checksum); +ptr[6] = checksum; +pci_set_word(ptr + pcir_offset + 4, vendor_id); +} + +if (device_id != rom_device_id) { +/* Patch device id and checksum (at offset 6 for etherboot roms). */ +checksum += (uint8_t)rom_device_id + (uint8_t)(rom_device_id 8); +checksum -= (uint8_t)device_id + (uint8_t)(device_id 8); +PCI_DPRINTF(ROM checksum %02x / %02x\n, ptr[6], checksum); +ptr[6] = checksum; +pci_set_word(ptr + pcir_offset + 6, device_id); +} +} + /* Add an option rom for the device */ static int pci_add_option_rom(PCIDevice *pdev) { @@ -1849,6 +1905,8 @@ static int pci_add_option_rom(PCIDevice *pdev) load_image(path, ptr); qemu_free(path); +pci_patch_ids(pdev, ptr, size); + pci_register_bar(pdev, PCI_ROM_SLOT, size, 0, pci_map_option_rom);
Re: [Qemu-devel] [PATCH 1/2] pci: Automatically patch PCI vendor id and device id in PCI ROM
On 10/18/2010 01:44 PM, Anthony Liguori wrote: On 10/18/2010 12:55 PM, Stefan Weil wrote: PCI devices with different vendor or device ids sometimes share the same rom code. Only the ids and the checksum differs in a boot rom for such devices. The i825xx ethernet controller family is a typical example which is implemented in hw/eepro100.c. It uses at least 3 different device ids, so normally 3 boot roms would be needed. By automatically patching vendor id and device id (and the checksum) in qemu, all emulated family members can share the same boot rom. VGA bios roms are another example with different vendor and device ids. v2: * Patch also the vendor id (and remove the sanity check for vendor id). Cc: Gerd Hoffmannkra...@redhat.com Cc: Markus Armbrusterarm...@redhat.com Cc: Michael S. Tsirkinm...@redhat.com Signed-off-by: Stefan Weilw...@mail.berlios.de I get very nervous about patching a ROM. Who's to say that the ROM doesn't somehow depend on the contents of its header? Maybe it has an internal CRC built into it or something like that. As part of PMM, ROMs typically reduce their size by decompressing and removing code or something of that nature and then rewrite themselves in scratch RAM. The BIOS then copies the resulting ROM (using the ROM size in the base header as an indication of how much to copy) into the option ROM space. So the likelihood of depending on the contents of the header seems non-trivial to me. Regards, Anthony Liguori Regards, Anthony Liguori --- hw/pci.c | 58 ++ 1 files changed, 58 insertions(+), 0 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index 1280d4d..139eb24 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -1797,6 +1797,62 @@ static void pci_map_option_rom(PCIDevice *pdev, int region_num, pcibus_t addr, p cpu_register_physical_memory(addr, size, pdev-rom_offset); } +/* Patch the PCI vendor and device ids in a PCI rom image if necessary. + This is needed for an option rom which is used for more than one device. */ +static void pci_patch_ids(PCIDevice *pdev, uint8_t *ptr, int size) +{ +uint16_t vendor_id; +uint16_t device_id; +uint16_t rom_vendor_id; +uint16_t rom_device_id; +uint16_t rom_magic; +uint16_t pcir_offset; +uint8_t checksum; + +/* Words in rom data are little endian (like in PCI configuration), + so they can be read / written with pci_get_word / pci_set_word. */ + +/* Only a valid rom will be patched. */ +rom_magic = pci_get_word(ptr); +if (rom_magic != 0xaa55) { +PCI_DPRINTF(Bad ROM magic %04x\n, rom_magic); +return; +} +pcir_offset = pci_get_word(ptr + 0x18); +if (pcir_offset + 8= size || memcmp(ptr + pcir_offset, PCIR, 4)) { +PCI_DPRINTF(Bad PCIR offset 0x%x or signature\n, pcir_offset); +return; +} + +vendor_id = pci_get_word(pdev-config + PCI_VENDOR_ID); +device_id = pci_get_word(pdev-config + PCI_DEVICE_ID); +rom_vendor_id = pci_get_word(ptr + pcir_offset + 4); +rom_device_id = pci_get_word(ptr + pcir_offset + 6); + +PCI_DPRINTF(ROM id %04x%04x / PCI id %04x%04x\n, +vendor_id, device_id, rom_vendor_id, rom_device_id); + +checksum = ptr[6]; + +if (vendor_id != rom_vendor_id) { +/* Patch vendor id and checksum (at offset 6 for etherboot roms). */ +checksum += (uint8_t)rom_vendor_id + (uint8_t)(rom_vendor_id 8); +checksum -= (uint8_t)vendor_id + (uint8_t)(vendor_id 8); +PCI_DPRINTF(ROM checksum %02x / %02x\n, ptr[6], checksum); +ptr[6] = checksum; +pci_set_word(ptr + pcir_offset + 4, vendor_id); +} + +if (device_id != rom_device_id) { +/* Patch device id and checksum (at offset 6 for etherboot roms). */ +checksum += (uint8_t)rom_device_id + (uint8_t)(rom_device_id 8); +checksum -= (uint8_t)device_id + (uint8_t)(device_id 8); +PCI_DPRINTF(ROM checksum %02x / %02x\n, ptr[6], checksum); +ptr[6] = checksum; +pci_set_word(ptr + pcir_offset + 6, device_id); +} +} + /* Add an option rom for the device */ static int pci_add_option_rom(PCIDevice *pdev) { @@ -1849,6 +1905,8 @@ static int pci_add_option_rom(PCIDevice *pdev) load_image(path, ptr); qemu_free(path); +pci_patch_ids(pdev, ptr, size); + pci_register_bar(pdev, PCI_ROM_SLOT, size, 0, pci_map_option_rom);
Re: Testing of russian keymap (was Re: [Qemu-devel] [PATCH] fix '/' and '|' on russian keymap)
On 10/18/2010 12:30 PM, Oleg Sadov wrote: I don't understand reasons for such locale-default keyboard settings for qemu too, but may be it's useful for someone... -k only exists to deal with crappy VNC clients. If you use a good VNC client (like vinagre or virt-viewer) then you don't have to use -k. Regards, Anthony Liguori Thanks! /mjt Regards! --Oleg
Re: [Qemu-devel] Re: [PATCH 1/2] pci: Automatically patch PCI vendor id and device id in PCI ROM
On 10/18/2010 02:03 PM, Michael S. Tsirkin wrote: On Mon, Oct 18, 2010 at 01:42:06PM -0500, Anthony Liguori wrote: +/* Patch the PCI vendor and device ids in a PCI rom image if necessary. + This is needed for an option rom which is used for more than one device. */ +static void pci_patch_ids(PCIDevice *pdev, uint8_t *ptr, int size) let's return an error code on malformed roms so management can detect errors? A bad/missing PnP header does not mean it's an invalid ROM. I don't see this as a generic capability - rather a specific hack that helps reduce some duplication for eepro100 and friends. As such, if we can't patch the id we know it's an invalid file. This code is unconditional in the pci option rom loading path. If it's restricted to a qdev property that's defaulted to enabled for the eepro cards, that would be a reasonable argument to make. Regards, Anthony Liguori
Re: [Qemu-devel] Re: [PATCH 1/2] pci: Automatically patch PCI vendor id and device id in PCI ROM
On 10/18/2010 02:36 PM, Stefan Weil wrote: Maybe a more perfect solution would only patch the preconfigured rom files but not user configured files, but I don't think we need this degree of perfection. Generally speaking, patching third-party code is not something that we should get in the habit of doing unless we're very very sure that it's okay and we have as many checks in place as possible to avoid bad things from happening. There are so many bad things that can happen. If attempted to support attestation in QEMU and prepopulated a virtual TPM with checksums from the BIOS and ROMs, when the virtual BIOS attempts to measure itself if we've patched the ROM underneath of it, then the measurements will fail. In the very least, if we go this route, it has to be an optional feature. Regards, Anthony Liguori Regards, Stefan
Re: [Qemu-devel] [PATCH 0/7] ATAPI CDROM passthrough v5
On 10/18/2010 06:29 PM, Alexander Graf wrote: A user will get a really nasty surprise if they think they can use a flag or rely on QEMU to prevent a VM from doing something nasty with a device. If they have this feeling of security, they're likely to chmod the device to allow unprivileged users to access it. But how a device handles ATAPI commands is totally up to the device. If you issue the wrong sequence, I'm sure there are devices out there that totally hose themselves. Are you absolutely confident that every ATAPI device out there is completely safe against hostile code provided that you simply prevent the FW update commands? I'm certainly not. Ping? Who are you pinging? Regards, Anthony Liguori Alex
Re: [Qemu-devel] Re: KVM call agenda for Oct 19
On 10/19/2010 08:03 AM, Avi Kivity wrote: On 10/19/2010 02:58 PM, Dor Laor wrote: On 10/19/2010 02:55 PM, Avi Kivity wrote: On 10/19/2010 02:48 PM, Dor Laor wrote: On 10/19/2010 04:11 AM, Chris Wright wrote: * Juan Quintela (quint...@redhat.com) wrote: Please send in any agenda items you are interested in covering. - 0.13.X -stable handoff - 0.14 planning - threadlet work - virtfs proposals - Live snapshots - We were asked to add this feature for external qcow2 images. Will simple approach of fsync + tracking each requested backing file (it can be per vDisk) and re-open the new image would be accepted? - Integration with FS freeze for consistent guest app snapshot Many apps do not sync their ram state to disk correctly or frequent enough. Physical world backup software calls fs freeze on xfs and VSS for windows to make the backup consistent. In order to integrated this with live snapshots we need a guest agent to trigger the guest fs freeze. We can either have qemu communicate with the agent directly through virtio-serial or have a mgmt daemon use virtio-serial to communicate with the guest in addition to QMP messages about the live snapshot state. Preferences? The first solution complicates qemu while the second complicates mgmt. Third option, make the freeze path management - qemu - virtio-blk - guest kernel - file systems. The advantage is that it's easy to associate file systems with a block device this way. OTH the userspace freeze path already exist and now you create another path. I guess we would still have a userspace daemon; instead of talking to virtio-serial it talks to virtio-blk. So: management - qemu - virtio-blk - guest driver - kernel fs resolver - daemon - apps Yuck. Yeah, in Windows, I'm pretty sure the freeze API is a userspace concept. Various apps can hook into it to serialize their state. At the risk of stealing Mike's thunder, we've actually been working on a simple guest agent exactly for this type of task. Mike's planning an RFC for later this week but for those that are interested the repo is at http://repo.or.cz/w/qemu/mdroth.git Regards, Anthony Liguori What about FS that span over LVM with multiple drives? IDE/SCSI? Good points.