[Qemu-devel] Re: [PATCH] kvm: x86: Save/restore error_code

2010-12-10 Thread Jason Wang
Juan Quintela writes:
  Jason Wang jasow...@redhat.com wrote:
   Juan Quintela writes:
 Jason Wang jasow...@redhat.com wrote:
  The saving and restoring of error_code seems lost and convert the
  error_code to uint32_t.
 
  Signed-off-by: Jason Wang jasow...@redhat.com
  ---
   target-i386/cpu.h |4 ++--
   target-i386/machine.c |2 ++
   2 files changed, 4 insertions(+), 2 deletions(-)
 
 It should be a new subsection.  The test is if has_error_code != 0
 according to gleb.
 
 Later, Juan.
 
  
   Thanks for reminding, and maybe we can just use VMSTATE_UINT32_TEST() 
   which is
   simpler than subsection to do the check, isn't it?
  
  we need the subsection, that way we don't need to bump the section
  version.
  
  Later, Juan.
  

Have tried the subsection, but there's an issue I find:

When we use subsections with the structure who have an nested
VMStateDescription at the end of fields, the subsections could not be loaded
correctly becuase qemu always try to match the subsection with the nested one.

So when we use subsections for vmstate_cpu, the subsection always fail as it try
to load the cpu's subsection for vmstate_ymmh_reg which is an nested
VMStateDescription on the end.

Maybe we need to modify vmstate_subsection_load() to handle this condition.

Any thought about this?

  
  diff --git a/target-i386/cpu.h b/target-i386/cpu.h
  index 06e40f3..c990db9 100644
  --- a/target-i386/cpu.h
  +++ b/target-i386/cpu.h
  @@ -688,7 +688,7 @@ typedef struct CPUX86State {
   uint64_t pat;
   
   /* exception/interrupt handling */
  -int error_code;
  +uint32_t error_code;
   int exception_is_int;
   target_ulong exception_next_eip;
   target_ulong dr[8]; /* debug registers */
  @@ -933,7 +933,7 @@ uint64_t cpu_get_tsc(CPUX86State *env);
   #define cpu_list_id x86_cpu_list
   #define cpudef_setup x86_cpudef_setup
   
  -#define CPU_SAVE_VERSION 12
  +#define CPU_SAVE_VERSION 13
   
   /* MMU modes definitions */
   #define MMU_MODE0_SUFFIX _kernel
  diff --git a/target-i386/machine.c b/target-i386/machine.c
  index d78eceb..0e467da 100644
  --- a/target-i386/machine.c
  +++ b/target-i386/machine.c
  @@ -491,6 +491,8 @@ static const VMStateDescription vmstate_cpu = {
   VMSTATE_UINT64_V(xcr0, CPUState, 12),
   VMSTATE_UINT64_V(xstate_bv, CPUState, 12),
   VMSTATE_YMMH_REGS_VARS(ymmh_regs, CPUState, CPU_NB_REGS, 12),
  +
  +VMSTATE_UINT32_V(error_code, CPUState, 13),
   VMSTATE_END_OF_LIST()
   /* The above list is not sorted /wrt version numbers, watch 
   out! */
   },
  --
  To unsubscribe from this list: send the line unsubscribe kvm in
  the body of a message to majord...@vger.kernel.org
  More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Qemu-devel] Re: [PATCH V2] qemu, kvm: Enable user space NMI injection for kvm guest

2010-12-10 Thread Jan Kiszka
Am 10.12.2010 08:42, Lai Jiangshan wrote:
 
 Make use of the new KVM_NMI IOCTL to send NMIs into the KVM guest if the
 user space raised them. (example: qemu monitor's nmi command)
 
 Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com
 ---
 diff --git a/configure b/configure
 index 2917874..f6f9362 100755
 --- a/configure
 +++ b/configure
 @@ -1646,6 +1646,9 @@ if test $kvm != no ; then
  #if !defined(KVM_CAP_DESTROY_MEMORY_REGION_WORKS)
  #error Missing KVM capability KVM_CAP_DESTROY_MEMORY_REGION_WORKS
  #endif
 +#if !defined(KVM_CAP_USER_NMI)
 +#error Missing KVM capability KVM_CAP_USER_NMI
 +#endif
  int main(void) { return 0; }
  EOF
if test $kerneldir !=  ; then

That's what I meant.

We also have a runtime check for KVM_CAP_DESTROY_MEMORY_REGION_WORKS on
kvm init, but IMHO adding the same for KVM_CAP_USER_NMI would be
overkill. So...

 diff --git a/target-i386/kvm.c b/target-i386/kvm.c
 index 7dfc357..755f8c9 100644
 --- a/target-i386/kvm.c
 +++ b/target-i386/kvm.c
 @@ -1417,6 +1417,13 @@ int kvm_arch_get_registers(CPUState *env)
  
  int kvm_arch_pre_run(CPUState *env, struct kvm_run *run)
  {
 +/* Inject NMI */
 +if (env-interrupt_request  CPU_INTERRUPT_NMI) {
 +env-interrupt_request = ~CPU_INTERRUPT_NMI;
 +DPRINTF(injected NMI\n);
 +kvm_vcpu_ioctl(env, KVM_NMI);
 +}
 +
  /* Try to inject an interrupt if the guest can accept it */
  if (run-ready_for_interrupt_injection 
  (env-interrupt_request  CPU_INTERRUPT_HARD) 

Acked-by: Jan Kiszka jan.kis...@siemens.com

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



[Qemu-devel] Re: [PATCH] fix qruncom compilation problems

2010-12-10 Thread Paolo Bonzini

On 12/09/2010 06:29 PM, Stefano Bonifazi wrote:

how can one think that addresses around zero are free for a mapping??


Addresses around zero are always free, because if they weren't you 
couldn't detect NULL pointer dereferences reliably.


mmap-ing at zero thus is a tricky operation, because it removes the 
possibility to detect NULL pointer dereferences.  What's worse, such 
ability would be lost even for _kernel_ dereferences of NULL, thus 
opening a large security hole for privilege-escalation or kernel 
exploits.  So, mmap-ing addresses close to zero is restricted to root.


Paolo



[Qemu-devel] Re: [PATCH 09/13] ahci: add ahci emulation

2010-12-10 Thread Kevin Wolf
Am 09.12.2010 17:18, schrieb Alexander Graf:
 Kevin Wolf wrote:
 Am 09.12.2010 16:48, schrieb Alexander Graf:
   
 +static void ncq_cb(void *opaque, int ret)
 +{
 +NCQTransferState *ncq_tfs = (NCQTransferState *)opaque;
 +IDEState *ide_state;
 +
 +if (ret  0) {
 +/* XXX error */
 +}
 
 
 Missing error handling.
   
   
 Yes, that's what the XXX stands for :).
 

 I think Stefan wanted to tell us that he thinks this XXX should be
 addressed. I don't disagree, by the way. ;-)

   
 +static void process_ncq_command(AHCIState *s, int port, uint8_t *cmd_fis,
 +int slot, QEMUSGList *sg)
 +{
 +NCQFrame *ncq_fis = (NCQFrame*)cmd_fis;
 +uint8_t tag = ncq_fis-tag  3;
 +NCQTransferState *ncq_tfs = s-dev[port].ncq_tfs[tag];
 +
 +if (ncq_tfs-used) {
 +/* error - already in use */
 +fprintf(stderr, %s: tag %d already used\n, __FUNCTION__, tag);
 +return;
 +}
 +
 +ncq_tfs-used = 1;
 +ncq_tfs-drive = s-dev[port];
 +ncq_tfs-drive-cmd_fis = cmd_fis;
 +ncq_tfs-drive-cmd_fis_len = 0x20;
 +ncq_tfs-slot = slot;
 +ncq_tfs-lba = ((uint64_t)ncq_fis-lba5  40) |
 +   ((uint64_t)ncq_fis-lba4  32) |
 +   ((uint64_t)ncq_fis-lba3  24) |
 +   ((uint64_t)ncq_fis-lba2  16) |
 +   ((uint64_t)ncq_fis-lba1  8) |
 +   (uint64_t)ncq_fis-lba0;
 +
 +/* Note: We calculate the sector count, but don't currently rely on 
 it.
 + * The total size of the DMA buffer tells us the transfer size 
 instead. */
 +ncq_tfs-sector_count = ((uint16_t)ncq_fis-sector_count_high  8) |
 +ncq_fis-sector_count_low;
 +
 +DPRINTF(port, NCQ transfer LBA from %ld to %ld, drive max %ld\n,
 +ncq_tfs-lba, ncq_tfs-lba + ncq_tfs-sector_count - 2,
 +s-dev[port].port.ifs[0].nb_sectors - 1);
 +
 +ncq_tfs-sglist = *sg;
 +ncq_tfs-tag = tag;
 +
 +switch(ncq_fis-command) {
 +case READ_FPDMA_QUEUED:
 +DPRINTF(port, NCQ reading %d sectors from LBA %ld, tag 
 %d\n,
 +ncq_tfs-sector_count-1, ncq_tfs-lba, ncq_tfs-tag);
 +ncq_tfs-is_read = 1;
 +
 +/* XXX: The specification is unclear about whether the DMA 
 Setup
 + * FIS here should have the I bit set, but it suggest that 
 it should
 + * not. Linux works without this interrupt, so I disabled it.
 + * If someone knows if it is needed, please tell me, or fix 
 this. */
 +
 +/* ahci_trigger_irq(s,s-dev[port],PORT_IRQ_STAT_DSS); */
 +DPRINTF(port, tag %d aio read %ld\n, ncq_tfs-tag, 
 ncq_tfs-lba);
 +dma_bdrv_read(ncq_tfs-drive-port.ifs[0].bs, 
 ncq_tfs-sglist,
 +  ncq_tfs-lba, ncq_cb, ncq_tfs);
 +break;
 +case WRITE_FPDMA_QUEUED:
 +DPRINTF(port, NCQ writing %d sectors to LBA %ld, tag %d\n,
 +ncq_tfs-sector_count-1, ncq_tfs-lba, ncq_tfs-tag);
 +ncq_tfs-is_read = 0;
 +/* ahci_trigger_irq(s,s-dev[port],PORT_IRQ_STAT_DSS); */
 +DPRINTF(port, tag %d aio write %ld\n, ncq_tfs-tag, 
 ncq_tfs-lba);
 +dma_bdrv_write(ncq_tfs-drive-port.ifs[0].bs, 
 ncq_tfs-sglist,
 +   ncq_tfs-lba, ncq_cb, ncq_tfs);
 +break;
 +default:
 +hw_error(ahci: tried to process non-NCQ command as NCQ\n);
 
 
 Guest triggerable abort.
   
   
 Those happen. The guest can shoot itself in the foot. We have more of
 these in other places. Just check virtio.c and search for abort() :).
 

 They are bugs which should be fixed in virtio rather than being spread
 to new code.
   
 
 Not sure about that. Would you prefer a broken guest to abort so you can
 debug it or to have it spew your log files with error messages or to
 silently ignore errors and never find bugs?

If you need it for debugging, maybe have a DPRINTF_ERROR macro that
aborts with an error message if DEBUG is defined, and doesn't do
anything in normal builds? I'm still not sure if aborting is a good idea
there, but I think it would be acceptable.

For normal builds, the preferred behaviour is doing the same as real
hardware in such cases. And real hardware doesn't kill the machine when
the driver is doing stupid things, but rather sets an error bit in some
register or something.

Kevin



[Qemu-devel] [Bug 595117] Re: qemu-nbd slow and missing writeback cache option

2010-12-10 Thread Stephane Chazelas
For the record, there's more on that bug at
http://thread.gmane.org/gmane.linux.ubuntu.bugs.server/36923

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/595117

Title:
  qemu-nbd slow and missing writeback cache option

Status in QEMU:
  Invalid
Status in “qemu-kvm” package in Ubuntu:
  Expired

Bug description:
  Binary package hint: qemu-kvm

dpkg -l | grep qemu
ii  kvm  
1:84+dfsg-0ubuntu16+0.12.3+noroms+0ubuntu9dummy transitional 
pacakge from kvm to qemu-
ii  qemu 0.12.3+noroms-0ubuntu9 
   dummy transitional pacakge from qemu to qemu
ii  qemu-common  0.12.3+noroms-0ubuntu9 
   qemu common functionality (bios, documentati
ii  qemu-kvm 0.12.3+noroms-0ubuntu9 
   Full virtualization on i386 and amd64 hardwa
ii  qemu-kvm-extras  0.12.3+noroms-0ubuntu9 
   fast processor emulator binaries for non-x86
ii  qemu-launcher1.7.4-1ubuntu2 
   GTK+ front-end to QEMU computer emulator
ii  qemuctl  0.2-2  
   controlling GUI for qemu

lucid amd64.

qemu-nbd is a lot slower when writing to disk than say nbd-server.

It appears it is because by default the disk image it serves is open with 
O_SYNC. The --nocache option, unintuitively, makes matters a bit better because 
it causes the image to be open with O_DIRECT instead of O_SYNC.

The qemu code allows an image to be open without any of those flags, but 
unfortunately qemu-nbd doesn't have the option to do that (qemu doesn't allow 
the image to be open with both O_SYNC and O_DIRECT though).

The default of qemu-img (of using O_SYNC) is not very sensible because anyway, 
the client (the kernel) uses caches (write-back), (and qemu-nbd -d doesn't 
flush those by the way). So if for instance qemu-nbd is killed, regardless of 
whether qemu-nbd uses O_SYNC, O_DIRECT or not, the data in the image will not 
be consistent anyway, unless syncs are done by the client (like fsync on the 
nbd device or sync mount option), and with qemu-nbd's O_SYNC mode, those 
syncs will be extremely slow.

Attached is a patch that adds a --cache={off,none,writethrough,writeback} 
option to qemu-nbd.

--cache=off is the same as --nocache (that is use O_DIRECT), writethrough is 
using O_SYNC and is still the default so this patch doesn't change the 
functionality. writeback is none of those flags, so is the addition of this 
patch. The patch also does an fsync upon qemu-nbd -d to make sure data is 
flushed to the image before removing the nbd.

Consider this test scenario:

dd bs=1M count=100 of=a  /dev/null
qemu-nbd --cache=x -c /dev/nbd0 a
cp /dev/zero /dev/nbd0
time perl -MIO::Handle -e 'STDOUT-sync or die$!' 1 /dev/nbd0

With cache=writethrough (the default), it takes over 10 minutes to write those 
100MB worth of zeroes. Running a strace, we see the recvfrom and sentos delayed 
by each 1kb write(2)s to disk (10 to 30 ms per write).

With cache=off, it takes about 30 seconds.

With cache=writeback, it takes about 3 seconds, which is similar to the 
performance you get with nbd-server

Note that the cp command runs instantly as the data is buffered by the client 
(the kernel), and not sent to qemu-nbd until the fsync(2) is called.





Re: [Qemu-devel] [RFC][PATCH v5 00/21] virtagent: host/guest RPC communication agent

2010-12-10 Thread Stefan Hajnoczi
On Thu, Dec 9, 2010 at 8:45 PM, Michael Roth mdr...@linux.vnet.ibm.com wrote:
 On 12/08/2010 04:10 AM, Stefan Hajnoczi wrote:
 What concrete use-cases are there?
 * Reboot support on x86.  A QMP command can invoke guest-initiated
 reboot via virtagent.
 * ?


 * viewfile
 The ability to do a quick peek at guest stats via, say, /proc, is a use case
 that seems to be fairly generally desirable. It's what essentially started
 all this work, actually. That's also why I think a simple/limited viewfile
 RPC for relatively small text files is warranted regardless of whatever
 approach we end up taking for handling large/binary file transfers.

 * getdmesg
 Really useful for trouble-shooting things like soft-lockups.

 * ping
 Heartbeat monitoring has also been a fairly re-occurring approach to
 identifying potential problems in our cloud, and it's not even something
 we're capable of accomplishing in a production environment due to having
 limited network access to the guests. Being able to do it without relying on
 custom/network-based daemons would be pretty useful.

 * exec (planned)
 Internally and externally I've seen interest in guest-initiated snapshots,
 but that would tie into exec or some other, higher-level, RPC, which isn't
 yet well-defined.

 * copyfile (planned)
 Nothing solid, but I think it's generally desirable. Quick access to
 coredumps and such would be useful. Lots of discussion on how to implement
 this and I think we have some good potential approaches to adding this soon.

 * writefile (planned)
 Nothing solid. But guest activation is probably a big potential use case for
 this one. Managing various guest kernel params is another. Another would be
 deploying custom scripts to run via exec.

Perhaps getfile and putfile as names?

 Will virtagent be extensible by host administrators or end-users?  For
 example, can I drop in a custom command to collect statistics and
 invoke it across VMs on my hosts?  Do I need to recompile QEMU and/or
 the virtagent daemon?

 writefile + exec would probably be the best way to achieve this. I don't
 think there are any plans to make the supported set of RPCs
 pluggable/extendable.

Thanks for explaining this, I think this is an important aspect of
virtagent.  Users will want to take advantage of an always-available
host-guest management transport.  There should be a clear scope
defined for virtagent.

For example users might include backup, monitoring, and inventory
(making sure installed software is up-to-date) software.

I imagine just how flexible and easy to use will be controversial
because some people may feel users should use this channel into the VM
carefully and only in situations where other remote access methods are
not suitable (e.g. ssh, etc).

If there is a flexible interface from the start then new use-cases can
be implemented without modifying QEMU/virtagent source code and
updating guests.

Stefan



[Qemu-devel] Re: [PATCH v2 2/2] qemu, qmp: convert do_inject_nmi() to QObject, QError

2010-12-10 Thread Markus Armbruster
Lai Jiangshan la...@cn.fujitsu.com writes:

 Convert do_inject_nmi() to QObject, QError, we need to use it(via libvirt).

 changed from v1
 Add document.
 Add error handling when the cpu index is invalid.

 Signed-off-by:  Lai Jiangshan la...@cn.fujitsu.com
 ---
 diff --git a/hmp-commands.hx b/hmp-commands.hx
 index 23024ba..f86d9fe 100644
 --- a/hmp-commands.hx
 +++ b/hmp-commands.hx
 @@ -724,7 +724,8 @@ ETEXI
  .args_type  = cpu_index:i,
  .params = cpu,
  .help   = inject an NMI on the given CPU,
 -.mhandler.cmd = do_inject_nmi,
 +.user_print = monitor_user_noop,
 +.mhandler.cmd_new = do_inject_nmi,
  },
  #endif
  STEXI
 diff --git a/monitor.c b/monitor.c
 index ec31eac..f375eb3 100644
 --- a/monitor.c
 +++ b/monitor.c
 @@ -2119,7 +2119,7 @@ static void do_wav_capture(Monitor *mon, const QDict 
 *qdict)
  #endif
  
  #if defined(TARGET_I386)
 -static void do_inject_nmi(Monitor *mon, const QDict *qdict)
 +static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject 
 **ret_data)
  {
  CPUState *env;
  int cpu_index = qdict_get_int(qdict, cpu_index);
 @@ -2127,8 +2127,11 @@ static void do_inject_nmi(Monitor *mon, const QDict 
 *qdict)
  for (env = first_cpu; env != NULL; env = env-next_cpu)
  if (env-cpu_index == cpu_index) {
  cpu_interrupt(env, CPU_INTERRUPT_NMI);
 -break;
 +return 0;
  }
 +
 +qerror_report(QERR_INVALID_CPU_INDEX, cpu_index);
 +return -1;

do_cpu_set() reports invalud index like this:

qerror_report(QERR_INVALID_PARAMETER_VALUE, index,
  a CPU number);

What about sticking to that?

[...]



[Qemu-devel] Re: [PATCH v2 1/2] QError: new QERR_INVALID_CPU_INDEX

2010-12-10 Thread Luiz Capitulino
On Fri, 10 Dec 2010 14:36:01 +0800
Lai Jiangshan la...@cn.fujitsu.com wrote:

 
 Signed-off-by:  Lai Jiangshan la...@cn.fujitsu.com

As Markus said, we report this as an invalid parameter in do_cpu(), we can do
the same for inject-nmi.

 ---
 diff --git a/qerror.c b/qerror.c
 index ac2cdaf..f59fb58 100644
 --- a/qerror.c
 +++ b/qerror.c
 @@ -117,6 +117,10 @@ static const QErrorStringTable qerror_table[] = {
  .desc  = Invalid block format '%(name)',
  },
  {
 +.error_fmt = QERR_INVALID_CPU_INDEX,
 +.desc  = Invalid CPU index '%(cpu_index)',
 +},
 +{
  .error_fmt = QERR_INVALID_PARAMETER,
  .desc  = Invalid parameter '%(name)',
  },
 diff --git a/qerror.h b/qerror.h
 index 943a24b..9117dda 100644
 --- a/qerror.h
 +++ b/qerror.h
 @@ -102,6 +102,9 @@ QError *qobject_to_qerror(const QObject *obj);
  #define QERR_INVALID_BLOCK_FORMAT \
  { 'class': 'InvalidBlockFormat', 'data': { 'name': %s } }
  
 +#define QERR_INVALID_CPU_INDEX \
 +{ 'class': 'InvalidCPUIndex', 'data': { 'cpu_index': %d } }
 +
  #define QERR_INVALID_PARAMETER \
  { 'class': 'InvalidParameter', 'data': { 'name': %s } }
  
 




[Qemu-devel] Re: [PATCH v2 2/2] qemu, qmp: convert do_inject_nmi() to QObject, QError

2010-12-10 Thread Luiz Capitulino
On Fri, 10 Dec 2010 14:36:08 +0800
Lai Jiangshan la...@cn.fujitsu.com wrote:

 +SQMP
 +inject_nmi
 +--
 +
 +Inject an NMI on the given CPU (x86 only).
 +
 +Arguments:
 +
 +- cpu_index: the index of the CPU to be injected NMI (json-int)
 +
 +Example:
 +
 +- { execute: inject_nmi, arguments: { cpu_index: 0 } }
 +- { return: {} }
 +
 +EQMP
 +

Avi, Anthony, can you please review this? Do we expect some kind of ack from
the guest? Do we expect it respond in some way?

Also note that the current series defines only one error condition: invalid
cpu index. Can this fail in other ways?



Re: [Qemu-devel] [PATCH 0/3] add hotplug opt-out option for devices.

2010-12-10 Thread Markus Armbruster
Gerd Hoffmann kra...@redhat.com writes:

 I understand why you're adding this but this is one of those horrible
 abuses of qdev that we really need to avoid.

 There are two valid reasons why hotplug is not possible:

 1) Hotplugging is not supported by the *slot*.  This is something that
 needs to be exposes through ACPI. This is not a qdev property, but a
 property of a PCI slot.

 Well, yea, right.  Sort of.  The ACPI thing applies to some of the
 slots only.  But, yes, strictly speaking this is a slot not a device
 property in the case of PCI.  Problem is qemu doesn't really has an
 idea what a pci slot is ...

Needs fixing.  Can't see how we can get sane hot plug without a proper
representation of PCI slots in QEMU.

 It's very important that we do this correctly
 because Windows puts a little icon in the systray that advertises
 quick-removal of devices in slots that support hotplug.

 Indeed.

 2) The PCI device is soldered to the MB or is otherwise not part of a
 PCI slot. Again, this is part of the ACPI definition.

 (3) The qemu emulation can't handle hot-unplug.

 Since the PIIX3 lives in slot 1, our ACPI tables should not advertise
 slot 0 or slot 1 as supporting hotplug.

 They do currently.  Should be easily fixable.

 Hotplug information has no business being part of the core qdev
 structures. Hotplug is a PCI concept and the information needs to live
 at the PCI layer to be meaningfully.

 Wrong.  PCI certainly isn't the only bus which supports hotplug.  It
 *does* make sense to handle generic hotplug stuff at qdev level.

Could the proper place be qbus instead of qdev?

 An ideal interface would explicitly allow a user to mark a series of PCI
 slots as no supporting hotplug. It would be convenient in order to
 ensure that your virtio-net wasn't accidentally ejected by a click-happy
 Windows user.

 Indeed.  That one is a bit harder I suspect.  Can this be done without
 generating acpi tables dynamically?



Re: [Qemu-devel] [PATCH 10/11] config: Add header file for device config options

2010-12-10 Thread Alexander Graf

On 10.12.2010, at 13:37, Markus Armbruster wrote:

 Alexander Graf ag...@suse.de writes:
 
 On 21.11.2010, at 13:37, Blue Swirl wrote:
 
 On Fri, Nov 19, 2010 at 2:56 AM, Alexander Graf ag...@suse.de wrote:
 So far we have C preprocessor defines for target and host config
 options, but we're lacking any information on which devices are
 available.
 
 We do need that information at times though, for example in the
 ahci patch where we need to call a legacy init function depending
 on whether we have support compiled in or not.
 
 That does not seem right. Devices should not care about what other
 devices may exist. Perhaps stub style approach would be better.
 
 Well, for the -drive parameter we need to know what devices we can create 
 and I'd like to keep that code as close as possible to the actual device 
 code.
 
 Stupid question: why can't we just try to create, then check status to
 figure out whether it failed because the device isn't available?

That's what the later version of this patch that in between is ripped out of 
the patch set did :)


Alex




Re: [Qemu-devel] [PATCH 1/1] NBD isn't used by qemu-img, so don't link qemu-img against NBD objects

2010-12-10 Thread Markus Armbruster
Jes Sorensen jes.soren...@redhat.com writes:

 On 11/22/10 16:20, Anthony Liguori wrote:
 On 11/22/2010 09:10 AM, Jes Sorensen wrote:
 On 11/22/10 16:08, Anthony Liguori wrote:
 On 11/22/2010 08:58 AM, Jes Sorensen wrote:
 Right, the right solution is probably to create a block driver list
 argument for configure, similar to what we have for the sound drivers.

 --block-drv-whitelist=
  
 Any idea what the difference is between 'whitelist' and 'list' in this
 context?
 
 Everything is built, but only the formats that are in the whitelist are
 usable.

 Kinda defeats the purpose IMHO. It would be useful to be able to strip
 out the formats one doesn't want to get a slimmed down binary.

There are two different purposes here:

1. You want to remove support for a format from QEMU and all the tools.
That's what you have in mind, I think.

2. You want to distinguish between good for production and not so
good for production formats.  That's what the whitelist is for.
Formats need to be whitelisted to be usable with QEMU proper.  Tools
normally ignore the whitelist.  Lets you attempt offline format
conversion for non-whitelisted formats, which is useful.

Both are legitimate, in my opinion.



[Qemu-devel] [PATCH 0/5] virtio-serial: Trivial fixes, don't copy buffers to host

2010-12-10 Thread Amit Shah
Hi,

This patch series converts virtio-serial-bus to use the guest buffers
instead of copying over guest data to the host, as suggested by Paul.

In addition, there are some trivial fixes for the virtio-console and
virtio-serial code.


Amit Shah (5):
  virtio-console: Factor out common init between console and generic
ports
  virtio-console: Remove unnecessary braces
  virtio-serial: Simplify condition for a while loop
  virtio-serial: Don't copy over guest buffer to host
  virtio-serial: Error out if guest sends unexpected vq elements

 hw/virtio-console.c|   34 +++---
 hw/virtio-serial-bus.c |   44 +++-
 2 files changed, 50 insertions(+), 28 deletions(-)

-- 
1.7.3.2




[Qemu-devel] [PATCH 1/5] virtio-console: Factor out common init between console and generic ports

2010-12-10 Thread Amit Shah
The initialisation for generic ports and console ports is similar.
Factor out the parts that are the same in a different function that can
be called from each of the initfns.

Signed-off-by: Amit Shah amit.s...@redhat.com
---
 hw/virtio-console.c |   31 ++-
 1 files changed, 14 insertions(+), 17 deletions(-)

diff --git a/hw/virtio-console.c b/hw/virtio-console.c
index caea11f..d7fe68b 100644
--- a/hw/virtio-console.c
+++ b/hw/virtio-console.c
@@ -58,24 +58,28 @@ static void chr_event(void *opaque, int event)
 }
 }
 
-/* Virtio Console Ports */
-static int virtconsole_initfn(VirtIOSerialDevice *dev)
+static int generic_port_init(VirtConsole *vcon, VirtIOSerialDevice *dev)
 {
-VirtIOSerialPort *port = DO_UPCAST(VirtIOSerialPort, dev, dev-qdev);
-VirtConsole *vcon = DO_UPCAST(VirtConsole, port, port);
-
-port-info = dev-info;
-
-port-is_console = true;
+vcon-port.info = dev-info;
 
 if (vcon-chr) {
 qemu_chr_add_handlers(vcon-chr, chr_can_read, chr_read, chr_event,
   vcon);
-port-info-have_data = flush_buf;
+vcon-port.info-have_data = flush_buf;
 }
 return 0;
 }
 
+/* Virtio Console Ports */
+static int virtconsole_initfn(VirtIOSerialDevice *dev)
+{
+VirtIOSerialPort *port = DO_UPCAST(VirtIOSerialPort, dev, dev-qdev);
+VirtConsole *vcon = DO_UPCAST(VirtConsole, port, port);
+
+port-is_console = true;
+return generic_port_init(vcon, dev);
+}
+
 static int virtconsole_exitfn(VirtIOSerialDevice *dev)
 {
 VirtIOSerialPort *port = DO_UPCAST(VirtIOSerialPort, dev, dev-qdev);
@@ -115,14 +119,7 @@ static int virtserialport_initfn(VirtIOSerialDevice *dev)
 VirtIOSerialPort *port = DO_UPCAST(VirtIOSerialPort, dev, dev-qdev);
 VirtConsole *vcon = DO_UPCAST(VirtConsole, port, port);
 
-port-info = dev-info;
-
-if (vcon-chr) {
-qemu_chr_add_handlers(vcon-chr, chr_can_read, chr_read, chr_event,
-  vcon);
-port-info-have_data = flush_buf;
-}
-return 0;
+return generic_port_init(vcon, dev);
 }
 
 static VirtIOSerialPortInfo virtserialport_info = {
-- 
1.7.3.2




[Qemu-devel] [PATCH 2/5] virtio-console: Remove unnecessary braces

2010-12-10 Thread Amit Shah
Remove unnecessary braces around a case statement.

Signed-off-by: Amit Shah amit.s...@redhat.com
---
 hw/virtio-console.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/hw/virtio-console.c b/hw/virtio-console.c
index d7fe68b..d0b9354 100644
--- a/hw/virtio-console.c
+++ b/hw/virtio-console.c
@@ -48,10 +48,9 @@ static void chr_event(void *opaque, int event)
 VirtConsole *vcon = opaque;
 
 switch (event) {
-case CHR_EVENT_OPENED: {
+case CHR_EVENT_OPENED:
 virtio_serial_open(vcon-port);
 break;
-}
 case CHR_EVENT_CLOSED:
 virtio_serial_close(vcon-port);
 break;
-- 
1.7.3.2




[Qemu-devel] [PATCH 4/5] virtio-serial: Don't copy over guest buffer to host

2010-12-10 Thread Amit Shah
When the guest writes something to a host, we copied over the entire
buffer first into the host and then processed it.  Do away with that, it
could result in a malicious guest causing a DoS on the host.

Reported-by: Paul Brook p...@codesourcery.com
Signed-off-by: Amit Shah amit.s...@redhat.com
---
 hw/virtio-serial-bus.c |   20 
 1 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index ecf0056..3bbd915 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -125,17 +125,21 @@ static void do_flush_queued_data(VirtIOSerialPort *port, 
VirtQueue *vq,
 return;
 }
 while (virtqueue_pop(vq, elem)) {
-uint8_t *buf;
-size_t ret, buf_size;
+unsigned int i;
 
-if (!discard) {
-buf_size = iov_size(elem.out_sg, elem.out_num);
-buf = qemu_malloc(buf_size);
-ret = iov_to_buf(elem.out_sg, elem.out_num, buf, 0, buf_size);
+if (discard) {
+goto next;
+}
+for (i = 0; i  elem.out_num; i++) {
+size_t buf_size;
 
-port-info-have_data(port, buf, ret);
-qemu_free(buf);
+buf_size = elem.out_sg[i].iov_len;
+
+port-info-have_data(port,
+  elem.out_sg[i].iov_base,
+  buf_size);
 }
+next:
 virtqueue_push(vq, elem, 0);
 }
 virtio_notify(vdev, vq);
-- 
1.7.3.2




[Qemu-devel] [PATCH 3/5] virtio-serial: Simplify condition for a while loop

2010-12-10 Thread Amit Shah
Separate out a non-changing condition over the period of a loop into an
if statement before the loop.  This will be used later to re-work the
loop.

Signed-off-by: Amit Shah amit.s...@redhat.com
---
 hw/virtio-serial-bus.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 74ba5ec..ecf0056 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -121,7 +121,10 @@ static void do_flush_queued_data(VirtIOSerialPort *port, 
VirtQueue *vq,
 assert(port || discard);
 assert(virtio_queue_ready(vq));
 
-while ((discard || !port-throttled)  virtqueue_pop(vq, elem)) {
+if (!discard  port-throttled) {
+return;
+}
+while (virtqueue_pop(vq, elem)) {
 uint8_t *buf;
 size_t ret, buf_size;
 
-- 
1.7.3.2




[Qemu-devel] [PATCH 5/5] virtio-serial: Error out if guest sends unexpected vq elements

2010-12-10 Thread Amit Shah
Check if the guest really sent any items in the out_vq before using
them.  Similarly, check if there is a buffer to send data in before
writing.

Signed-off-by: Amit Shah amit.s...@redhat.com
---
 hw/virtio-serial-bus.c |   19 +++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 3bbd915..3a3032f 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -102,6 +102,11 @@ static size_t write_to_port(VirtIOSerialPort *port,
 break;
 }
 
+if (elem.in_num  1) {
+error_report(No buffer to send data in?);
+abort();
+}
+
 len = iov_from_buf(elem.in_sg, elem.in_num,
buf + offset, size - offset);
 offset += len;
@@ -127,6 +132,11 @@ static void do_flush_queued_data(VirtIOSerialPort *port, 
VirtQueue *vq,
 while (virtqueue_pop(vq, elem)) {
 unsigned int i;
 
+if (elem.out_num  1) {
+error_report(No data sent by guest?);
+abort();
+}
+
 if (discard) {
 goto next;
 }
@@ -169,6 +179,11 @@ static size_t send_control_msg(VirtIOSerialPort *port, 
void *buf, size_t len)
 return 0;
 }
 
+if (elem.in_num  1) {
+error_report(No buffer to send control data in?);
+abort();
+}
+
 cpkt = (struct virtio_console_control *)buf;
 stl_p(cpkt-id, port-id);
 memcpy(elem.in_sg[0].iov_base, buf, len);
@@ -386,6 +401,10 @@ static void control_out(VirtIODevice *vdev, VirtQueue *vq)
 while (virtqueue_pop(vq, elem)) {
 size_t cur_len, copied;
 
+if (elem.out_num  1) {
+error_report(No data sent in control packet);
+abort();
+}
 cur_len = iov_size(elem.out_sg, elem.out_num);
 /*
  * Allocate a new buf only if we didn't have one previously or
-- 
1.7.3.2




Re: [Qemu-devel] [PATCH 4/5] ide: add TRIM support

2010-12-10 Thread Christoph Hellwig
On Thu, Dec 02, 2010 at 03:07:49PM +0100, Kevin Wolf wrote:
 This looks wrong. Wouldn't werror=stop cause the request to be retried
 as a write when the VM is resumed?

Indeed.

 But having a copypaste error gives just about right reason to mention
 that after read and write this is the third almost unchanged copy of
 this code. Eventually we'll want to refactor this.

I've added a patch to refactor the DMA code to the next iteration
of the patch series.

 While we're at it, do you know why in the eot: case we set
 BM_STATUS_INT, but don't actually call ide_set_irq? From what I
 understand, those two should always be coupled, but I might be wrong.

No idea, sorry.




Re: [Qemu-devel] [PATCH 0/3] add hotplug opt-out option for devices.

2010-12-10 Thread Gerd Hoffmann

  Hi,


Wrong.  PCI certainly isn't the only bus which supports hotplug.  It
*does* make sense to handle generic hotplug stuff at qdev level.


Could the proper place be qbus instead of qdev?


No.  But PCI is the only bus where some devices are hot-pluggable and 
some are not.  On all other busses it is either all or no devices.  So 
moving to pci is an option (patches for that are on the list).


cheers,
  Gerd



[Qemu-devel] [PATCH] ide: Register vm change state handler once only

2010-12-10 Thread Stefan Hajnoczi
We register the vm change state handler in a PCI BAR map() function.
This function can be called multiple times throughout the lifetime of a
PCI IDE device.  This results in duplicate vm change state handlers
being register, none of which are ever unregistered.

Instead, register the vm change state handler in the device's init
function once and for all.

piix tested, cmd646 and via not tested.

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
 hw/ide/cmd646.c |   16 +---
 hw/ide/piix.c   |   32 +++-
 hw/ide/via.c|   32 +++-
 3 files changed, 55 insertions(+), 25 deletions(-)

diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
index dfe6091..23fc6e1 100644
--- a/hw/ide/cmd646.c
+++ b/hw/ide/cmd646.c
@@ -167,9 +167,6 @@ static void bmdma_map(PCIDevice *pci_dev, int region_num,
 
 for(i = 0;i  2; i++) {
 BMDMAState *bm = d-bmdma[i];
-d-bus[i].bmdma = bm;
-bm-bus = d-bus+i;
-qemu_add_vm_change_state_handler(ide_dma_restart_cb, bm);
 
 if (i == 0) {
 register_ioport_write(addr, 4, 1, bmdma_writeb_0, d);
@@ -228,6 +225,7 @@ static int pci_cmd646_ide_initfn(PCIDevice *dev)
 PCIIDEState *d = DO_UPCAST(PCIIDEState, dev, dev);
 uint8_t *pci_conf = d-dev.config;
 qemu_irq *irq;
+int i;
 
 pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_CMD);
 pci_config_set_device_id(pci_conf, PCI_DEVICE_ID_CMD_646);
@@ -253,10 +251,14 @@ static int pci_cmd646_ide_initfn(PCIDevice *dev)
 pci_conf[PCI_INTERRUPT_PIN] = 0x01; // interrupt on pin 1
 
 irq = qemu_allocate_irqs(cmd646_set_irq, d, 2);
-ide_bus_new(d-bus[0], d-dev.qdev);
-ide_bus_new(d-bus[1], d-dev.qdev);
-ide_init2(d-bus[0], irq[0]);
-ide_init2(d-bus[1], irq[1]);
+for (i = 0; i  2; i++) {
+ide_bus_new(d-bus[i], d-dev.qdev);
+ide_init2(d-bus[i], irq[i]);
+
+d-bus[i].bmdma = d-bmdma[i];
+bm-bus = d-bus[i];
+qemu_add_vm_change_state_handler(ide_dma_restart_cb, d-bmdma[i]);
+}
 
 vmstate_register(dev-qdev, 0, vmstate_ide_pci, d);
 qemu_register_reset(cmd646_reset, d);
diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index e02b89a..f2752e7 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -76,9 +76,6 @@ static void bmdma_map(PCIDevice *pci_dev, int region_num,
 
 for(i = 0;i  2; i++) {
 BMDMAState *bm = d-bmdma[i];
-d-bus[i].bmdma = bm;
-bm-bus = d-bus+i;
-qemu_add_vm_change_state_handler(ide_dma_restart_cb, bm);
 
 register_ioport_write(addr, 1, 1, bmdma_cmd_writeb, bm);
 
@@ -112,6 +109,28 @@ static void piix3_reset(void *opaque)
 pci_conf[0x20] = 0x01; /* BMIBA: 20-23h */
 }
 
+static void pci_piix_init_ports(PCIIDEState *d) {
+int i;
+struct {
+int iobase;
+int iobase2;
+int isairq;
+} port_info[] = {
+{0x1f0, 0x3f6, 14},
+{0x170, 0x376, 15},
+};
+
+for (i = 0; i  2; i++) {
+ide_bus_new(d-bus[i], d-dev.qdev);
+ide_init_ioport(d-bus[i], port_info[i].iobase, port_info[i].iobase2);
+ide_init2(d-bus[i], isa_reserve_irq(port_info[i].isairq));
+
+d-bus[i].bmdma = d-bmdma[i];
+d-bmdma[i].bus = d-bus[i];
+qemu_add_vm_change_state_handler(ide_dma_restart_cb, d-bmdma[i]);
+}
+}
+
 static int pci_piix_ide_initfn(PCIIDEState *d)
 {
 uint8_t *pci_conf = d-dev.config;
@@ -125,13 +144,8 @@ static int pci_piix_ide_initfn(PCIIDEState *d)
 
 vmstate_register(d-dev.qdev, 0, vmstate_ide_pci, d);
 
-ide_bus_new(d-bus[0], d-dev.qdev);
-ide_bus_new(d-bus[1], d-dev.qdev);
-ide_init_ioport(d-bus[0], 0x1f0, 0x3f6);
-ide_init_ioport(d-bus[1], 0x170, 0x376);
+pci_piix_init_ports(d);
 
-ide_init2(d-bus[0], isa_reserve_irq(14));
-ide_init2(d-bus[1], isa_reserve_irq(15));
 return 0;
 }
 
diff --git a/hw/ide/via.c b/hw/ide/via.c
index 66be0c4..839c571 100644
--- a/hw/ide/via.c
+++ b/hw/ide/via.c
@@ -78,9 +78,6 @@ static void bmdma_map(PCIDevice *pci_dev, int region_num,
 
 for(i = 0;i  2; i++) {
 BMDMAState *bm = d-bmdma[i];
-d-bus[i].bmdma = bm;
-bm-bus = d-bus+i;
-qemu_add_vm_change_state_handler(ide_dma_restart_cb, bm);
 
 register_ioport_write(addr, 1, 1, bmdma_cmd_writeb, bm);
 
@@ -135,6 +132,28 @@ static void via_reset(void *opaque)
 pci_set_long(pci_conf + 0xc0, 0x00020001);
 }
 
+static void vt82c686b_init_ports(PCIIDEState *d) {
+int i;
+struct {
+int iobase;
+int iobase2;
+int isairq;
+} port_info[] = {
+{0x1f0, 0x3f6, 14},
+{0x170, 0x376, 15},
+};
+
+for (i = 0; i  2; i++) {
+ide_bus_new(d-bus[i], d-dev.qdev);
+ide_init2(d-bus[i], isa_reserve_irq(port_info[i].isairq));
+ide_init_ioport(d-bus[i], port_info[i].iobase, port_info[i].iobase2);
+
+d-bus[i].bmdma = d-bmdma[i];
+d-bmdma[i].bus = d-bus[i];
+

[Qemu-devel] ]PATCH 0/7] add TRIM/UNMAP support, v3

2010-12-10 Thread Christoph Hellwig
This patchset adds support for the ATA TRIM and SCSI WRITE SAME with
unmap commands, which allow reclaiming free space from a backing image.

The user facing implementation is pretty complete, but not really
efficient because the underlying bdrv_discard implementation doesn't
use the aio implementation yet.  The reason for that is that the SCSI
layer doesn't really allow any asynchronous commands except for
READ/WRITE by design, and implementing the ATA TRIM command with it's
multiple ranges is rather painful, and combined with the SCSI limitation
I didn't bother yet.  The only backend support so far is the XFS hole
punching ioctl, but others can be added easily when they become
available.  A virtio implementation for a discard command would also
be pretty easy, but until we actually support a better backend then
a plain sparse file it's not worth using for production enviroments
anyway, but more for playing with the thin provisioning infrastructure,
or observing guest behaviour when TRIM / unmap is supported.

If the support is enabled and the backend doesn't support hole punching
the TRIM / WRITE SAME commands become no-ops so that migration from
hosts supporting or not supporting it works.

Version 3:
- refactor IDE dma support code
- proper brace obsfucation
- fix compile without xfs headers
- use bool instead of int for a one-byte flag

Version 2:
- replace tabs with spaces
- return -ENOMEDIUM from bdrv_discard if there's no driver
  assigned
- actually list the TP EVPD page as supported when querying
  for supported EVPD pages



[Qemu-devel] Re: [PATCH 5/5] virtio-serial: Error out if guest sends unexpected vq elements

2010-12-10 Thread Amit Shah
On (Fri) Dec 10 2010 [13:59:50], Paul Brook wrote:
  Check if the guest really sent any items in the out_vq before using
  them.  Similarly, check if there is a buffer to send data in before
  writing.
 
 Can this actually happen? If so why/how?
 Why does it need a special case in this device?

A malicious guest (ie, a guest with the virtio_console module suitably
modified) could send in buffers with the 'input' bit set instead of
output as expected or vice-versa.

 If this is guest triggerable then calling abort() is wrong.

It's either a guest bug or a malicious guest.  What action is
recommended?

Amit



[Qemu-devel] [PATCH 3/7] make dma_bdrv_io available to drivers

2010-12-10 Thread Christoph Hellwig
Make dma_bdrv_io available for drivers, and pass an explicit I/O function
instead of hardcoding bdrv_aio_readv/bdrv_aio_writev.  This is required
to implement non-READ/WRITE dma commands in the ide driver, e.g. the
upcoming TRIM support.

Signed-off-by: Christoph Hellwig h...@lst.de

Index: qemu/dma-helpers.c
===
--- qemu.orig/dma-helpers.c 2010-11-23 14:13:42.178253390 +0100
+++ qemu/dma-helpers.c  2010-11-23 14:14:45.186253110 +0100
@@ -47,6 +47,7 @@ typedef struct {
 target_phys_addr_t sg_cur_byte;
 QEMUIOVector iov;
 QEMUBH *bh;
+DMAIOFunc *io_func;
 } DMAAIOCB;
 
 static void dma_bdrv_cb(void *opaque, int ret);
@@ -116,13 +117,8 @@ static void dma_bdrv_cb(void *opaque, in
 return;
 }
 
-if (dbs-is_write) {
-dbs-acb = bdrv_aio_writev(dbs-bs, dbs-sector_num, dbs-iov,
-   dbs-iov.size / 512, dma_bdrv_cb, dbs);
-} else {
-dbs-acb = bdrv_aio_readv(dbs-bs, dbs-sector_num, dbs-iov,
-  dbs-iov.size / 512, dma_bdrv_cb, dbs);
-}
+dbs-acb = dbs-io_func(dbs-bs, dbs-sector_num, dbs-iov,
+dbs-iov.size / 512, dma_bdrv_cb, dbs);
 if (!dbs-acb) {
 dma_bdrv_unmap(dbs);
 qemu_iovec_destroy(dbs-iov);
@@ -144,12 +140,12 @@ static AIOPool dma_aio_pool = {
 .cancel = dma_aio_cancel,
 };
 
-static BlockDriverAIOCB *dma_bdrv_io(
+BlockDriverAIOCB *dma_bdrv_io(
 BlockDriverState *bs, QEMUSGList *sg, uint64_t sector_num,
-BlockDriverCompletionFunc *cb, void *opaque,
-int is_write)
+DMAIOFunc *io_func, BlockDriverCompletionFunc *cb,
+void *opaque, int is_write)
 {
-DMAAIOCB *dbs =  qemu_aio_get(dma_aio_pool, bs, cb, opaque);
+DMAAIOCB *dbs = qemu_aio_get(dma_aio_pool, bs, cb, opaque);
 
 dbs-acb = NULL;
 dbs-bs = bs;
@@ -158,6 +154,7 @@ static BlockDriverAIOCB *dma_bdrv_io(
 dbs-sg_cur_index = 0;
 dbs-sg_cur_byte = 0;
 dbs-is_write = is_write;
+dbs-io_func = io_func;
 dbs-bh = NULL;
 qemu_iovec_init(dbs-iov, sg-nsg);
 dma_bdrv_cb(dbs, 0);
@@ -173,12 +170,12 @@ BlockDriverAIOCB *dma_bdrv_read(BlockDri
 QEMUSGList *sg, uint64_t sector,
 void (*cb)(void *opaque, int ret), void 
*opaque)
 {
-return dma_bdrv_io(bs, sg, sector, cb, opaque, 0);
+return dma_bdrv_io(bs, sg, sector, bdrv_aio_readv, cb, opaque, 0);
 }
 
 BlockDriverAIOCB *dma_bdrv_write(BlockDriverState *bs,
  QEMUSGList *sg, uint64_t sector,
  void (*cb)(void *opaque, int ret), void 
*opaque)
 {
-return dma_bdrv_io(bs, sg, sector, cb, opaque, 1);
+return dma_bdrv_io(bs, sg, sector, bdrv_aio_writev, cb, opaque, 1);
 }
Index: qemu/dma.h
===
--- qemu.orig/dma.h 2010-11-23 14:13:42.185253809 +0100
+++ qemu/dma.h  2010-11-23 14:14:45.186253110 +0100
@@ -32,6 +32,14 @@ void qemu_sglist_add(QEMUSGList *qsg, ta
  target_phys_addr_t len);
 void qemu_sglist_destroy(QEMUSGList *qsg);
 
+typedef BlockDriverAIOCB *DMAIOFunc(BlockDriverState *bs, int64_t sector_num,
+ QEMUIOVector *iov, int nb_sectors,
+ BlockDriverCompletionFunc *cb, void *opaque);
+
+BlockDriverAIOCB *dma_bdrv_io(BlockDriverState *bs,
+  QEMUSGList *sg, uint64_t sector_num,
+  DMAIOFunc *io_func, BlockDriverCompletionFunc 
*cb,
+  void *opaque, int is_write);
 BlockDriverAIOCB *dma_bdrv_read(BlockDriverState *bs,
 QEMUSGList *sg, uint64_t sector,
 BlockDriverCompletionFunc *cb, void *opaque);



[Qemu-devel] [PATCH 2/7] scsi-disk: support WRITE SAME (16) with unmap bit

2010-12-10 Thread Christoph Hellwig
Support discards via the WRITE SAME command with the unmap bit set, and
tell the initiator about the support for it via the block limit and the
new thin provisioning EVPD pages.  Also fix the comment which incorrectly
describedthe block limits EVPD page.

Signed-off-by: Christoph Hellwig h...@lst.de

Index: qemu/hw/scsi-disk.c
===
--- qemu.orig/hw/scsi-disk.c2010-12-07 09:35:44.203009851 +0100
+++ qemu/hw/scsi-disk.c 2010-12-07 09:42:07.984255141 +0100
@@ -424,7 +424,8 @@ static int scsi_disk_emulate_inquiry(SCS
 outbuf[buflen++] = 0x80; // unit serial number
 outbuf[buflen++] = 0x83; // device identification
 if (bdrv_get_type_hint(s-bs) != BDRV_TYPE_CDROM) {
-outbuf[buflen++] = 0xb0; // block device characteristics
+outbuf[buflen++] = 0xb0; // block limits
+outbuf[buflen++] = 0xb2; // thin provisioning
 }
 outbuf[pages] = buflen - pages - 1; // number of pages
 break;
@@ -466,8 +467,10 @@ static int scsi_disk_emulate_inquiry(SCS
 buflen += id_len;
 break;
 }
-case 0xb0: /* block device characteristics */
+case 0xb0: /* block limits */
 {
+unsigned int unmap_sectors =
+s-qdev.conf.discard_granularity / s-qdev.blocksize;
 unsigned int min_io_size =
 s-qdev.conf.min_io_size / s-qdev.blocksize;
 unsigned int opt_io_size =
@@ -492,6 +495,21 @@ static int scsi_disk_emulate_inquiry(SCS
 outbuf[13] = (opt_io_size  16)  0xff;
 outbuf[14] = (opt_io_size  8)  0xff;
 outbuf[15] = opt_io_size  0xff;
+
+/* optimal unmap granularity */
+outbuf[28] = (unmap_sectors  24)  0xff;
+outbuf[29] = (unmap_sectors  16)  0xff;
+outbuf[30] = (unmap_sectors  8)  0xff;
+outbuf[31] = unmap_sectors  0xff;
+break;
+}
+case 0xb2: /* thin provisioning */
+{
+outbuf[3] = buflen = 8;
+outbuf[4] = 0;
+outbuf[5] = 0x40; /* write same with unmap supported */
+outbuf[6] = 0;
+outbuf[7] = 0;
 break;
 }
 default:
@@ -959,6 +977,11 @@ static int scsi_disk_emulate_command(SCS
 outbuf[11] = 0;
 outbuf[12] = 0;
 outbuf[13] = get_physical_block_exp(s-qdev.conf);
+
+/* set TPE bit if the format supports discard */
+if (s-qdev.conf.discard_granularity)
+outbuf[14] = 0x80;
+
 /* Protection, exponent and lowest lba field left blank. */
 buflen = req-cmd.xfer;
 break;
@@ -1123,6 +1146,34 @@ static int32_t scsi_send_command(SCSIDev
 goto illegal_lba;
 }
 break;
+case WRITE_SAME_16:
+len = r-req.cmd.xfer / d-blocksize;
+
+DPRINTF(WRITE SAME(16) (sector % PRId64 , count %d)\n,
+r-req.cmd.lba, len);
+
+if (r-req.cmd.lba  s-max_lba) {
+goto illegal_lba;
+}
+
+/*
+ * We only support WRITE SAME with the unmap bit set for now.
+ */
+if (!(buf[1]  0x8)) {
+goto fail;
+}
+
+rc = bdrv_discard(s-bs, r-req.cmd.lba * s-cluster_size,
+  len * s-cluster_size);
+if (rc  0) {
+/* XXX: better error code ?*/
+goto fail;
+}
+
+scsi_req_set_status(r, GOOD, NO_SENSE);
+scsi_req_complete(r-req);
+scsi_remove_request(r);
+return 0;
 default:
 DPRINTF(Unknown SCSI command (%2.2x)\n, buf[0]);
 fail:
Index: qemu/hw/scsi-defs.h
===
--- qemu.orig/hw/scsi-defs.h2010-12-07 09:35:44.219005032 +0100
+++ qemu/hw/scsi-defs.h 2010-12-07 09:38:35.726003986 +0100
@@ -84,6 +84,7 @@
 #define MODE_SENSE_10 0x5a
 #define PERSISTENT_RESERVE_IN 0x5e
 #define PERSISTENT_RESERVE_OUT 0x5f
+#define WRITE_SAME_16 0x93
 #define MAINTENANCE_IN0xa3
 #define MAINTENANCE_OUT   0xa4
 #define MOVE_MEDIUM   0xa5



[Qemu-devel] [PATCH 7/7] raw-posix: add discard support

2010-12-10 Thread Christoph Hellwig
Add support to discard blocks in a raw image residing on an XFS filesystem
by calling the XFS_IOC_UNRESVSP64 ioctl to punch holes.  Support for other
hole punching mechanisms can be added when they become available.

Signed-off-by: Christoph Hellwig h...@lst.de

Index: qemu/block/raw-posix.c
===
--- qemu.orig/block/raw-posix.c 2010-12-07 09:35:40.806004264 +0100
+++ qemu/block/raw-posix.c  2010-12-07 09:44:30.093290897 +0100
@@ -69,6 +69,10 @@
 #include sys/diskslice.h
 #endif
 
+#ifdef CONFIG_XFS
+#include xfs/xfs.h
+#endif
+
 //#define DEBUG_FLOPPY
 
 //#define DEBUG_BLOCK
@@ -120,6 +124,9 @@ typedef struct BDRVRawState {
 #endif
 uint8_t *aligned_buf;
 unsigned aligned_buf_size;
+#ifdef CONFIG_XFS
+bool is_xfs : 1;
+#endif
 } BDRVRawState;
 
 static int fd_open(BlockDriverState *bs);
@@ -196,6 +203,12 @@ static int raw_open_common(BlockDriverSt
 #endif
 }
 
+#ifdef CONFIG_XFS
+if (platform_test_xfs_fd(s-fd)) {
+s-is_xfs = 1;
+}
+#endif
+
 return 0;
 
 out_free_buf:
@@ -740,6 +753,36 @@ static int raw_flush(BlockDriverState *b
 return qemu_fdatasync(s-fd);
 }
 
+#ifdef CONFIG_XFS
+static int xfs_discard(BDRVRawState *s, int64_t sector_num, int nb_sectors)
+{
+struct xfs_flock64 fl;
+
+memset(fl, 0, sizeof(fl));
+fl.l_whence = SEEK_SET;
+fl.l_start = sector_num  9;
+fl.l_len = (int64_t)nb_sectors  9;
+
+if (xfsctl(NULL, s-fd, XFS_IOC_UNRESVSP64, fl)  0) {
+printf(cannot punch hole (%s)\n, strerror(errno));
+return -errno;
+}
+
+return 0;
+}
+#endif
+
+static int raw_discard(BlockDriverState *bs, int64_t sector_num, int 
nb_sectors)
+{
+#ifdef CONFIG_XFS
+BDRVRawState *s = bs-opaque;
+
+if (s-is_xfs)
+return xfs_discard(s, sector_num, nb_sectors);
+#endif
+
+return 0;
+}
 
 static QEMUOptionParameter raw_create_options[] = {
 {
@@ -761,6 +804,7 @@ static BlockDriver bdrv_file = {
 .bdrv_close = raw_close,
 .bdrv_create = raw_create,
 .bdrv_flush = raw_flush,
+.bdrv_discard = raw_discard,
 
 .bdrv_aio_readv = raw_aio_readv,
 .bdrv_aio_writev = raw_aio_writev,
Index: qemu/configure
===
--- qemu.orig/configure 2010-12-07 09:35:40.815261632 +0100
+++ qemu/configure  2010-12-07 09:44:52.764255834 +0100
@@ -288,6 +288,7 @@ xen=
 linux_aio=
 attr=
 vhost_net=
+xfs=
 
 gprof=no
 debug_tcg=no
@@ -1393,6 +1394,27 @@ EOF
 fi
 
 ##
+# xfsctl() probe, used for raw-posix
+if test $xfs != no ; then
+  cat  $TMPC  EOF
+#include xfs/xfs.h
+int main(void)
+{
+xfsctl(NULL, 0, 0, NULL);
+return 0;
+}
+EOF
+  if compile_prog   ; then
+xfs=yes
+  else
+if test $xfs = yes ; then
+  feature_not_found xfs
+fi
+xfs=no
+  fi
+fi
+
+##
 # vde libraries probe
 if test $vde != no ; then
   vde_libs=-lvdeplug
@@ -2354,6 +2376,7 @@ echo vhost-net support $vhost_net
 echo Trace backend $trace_backend
 echo Trace output file $trace_file-pid
 echo spice support $spice
+echo xfsctl support$xfs
 
 if test $sdl_too_old = yes; then
 echo - Your SDL version is too old - please upgrade to have SDL support
@@ -2499,6 +2522,9 @@ fi
 if test $uuid = yes ; then
   echo CONFIG_UUID=y  $config_host_mak
 fi
+if test $xfs = yes ; then
+  echo CONFIG_XFS=y  $config_host_mak
+fi
 qemu_version=`head $source_path/VERSION`
 echo VERSION=$qemu_version $config_host_mak
 echo PKGVERSION=$pkgversion $config_host_mak



[Qemu-devel] [PATCH 1/7] block: add discard support

2010-12-10 Thread Christoph Hellwig
Add a new bdrv_discard method to free blocks in a mapping image, and a new
drive property to set the granularity for these discard.  If no discard
granularity support is set discard support is disabled.

Signed-off-by: Christoph Hellwig h...@lst.de

Index: qemu/block.c
===
--- qemu.orig/block.c   2010-12-07 09:34:17.563010201 +0100
+++ qemu/block.c2010-12-07 09:37:44.734255486 +0100
@@ -1499,6 +1499,17 @@ int bdrv_has_zero_init(BlockDriverState
 return 1;
 }
 
+int bdrv_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors)
+{
+if (!bs-drv) {
+return -ENOMEDIUM;
+}
+if (!bs-drv-bdrv_discard) {
+return 0;
+}
+return bs-drv-bdrv_discard(bs, sector_num, nb_sectors);
+}
+
 /*
  * Returns true iff the specified sector is present in the disk image. Drivers
  * not implementing the functionality are assumed to not support backing files,
Index: qemu/block.h
===
--- qemu.orig/block.h   2010-12-07 09:34:17.571010061 +0100
+++ qemu/block.h2010-12-07 09:34:29.689022144 +0100
@@ -146,6 +146,7 @@ int bdrv_flush(BlockDriverState *bs);
 void bdrv_flush_all(void);
 void bdrv_close_all(void);
 
+int bdrv_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors);
 int bdrv_has_zero_init(BlockDriverState *bs);
 int bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
int *pnum);
Index: qemu/block_int.h
===
--- qemu.orig/block_int.h   2010-12-07 09:34:17.588010480 +0100
+++ qemu/block_int.h2010-12-07 09:34:29.693010131 +0100
@@ -73,6 +73,8 @@ struct BlockDriver {
 BlockDriverCompletionFunc *cb, void *opaque);
 BlockDriverAIOCB *(*bdrv_aio_flush)(BlockDriverState *bs,
 BlockDriverCompletionFunc *cb, void *opaque);
+int (*bdrv_discard)(BlockDriverState *bs, int64_t sector_num,
+int nb_sectors);
 
 int (*bdrv_aio_multiwrite)(BlockDriverState *bs, BlockRequest *reqs,
 int num_reqs);
@@ -227,6 +229,7 @@ typedef struct BlockConf {
 uint16_t logical_block_size;
 uint16_t min_io_size;
 uint32_t opt_io_size;
+uint32_t discard_granularity;
 } BlockConf;
 
 static inline unsigned int get_physical_block_exp(BlockConf *conf)
@@ -249,6 +252,8 @@ static inline unsigned int get_physical_
 DEFINE_PROP_UINT16(physical_block_size, _state,   \
_conf.physical_block_size, 512), \
 DEFINE_PROP_UINT16(min_io_size, _state, _conf.min_io_size, 0),  \
-DEFINE_PROP_UINT32(opt_io_size, _state, _conf.opt_io_size, 0)
+DEFINE_PROP_UINT32(opt_io_size, _state, _conf.opt_io_size, 0), \
+DEFINE_PROP_UINT32(discard_granularity, _state, \
+   _conf.discard_granularity, 0)
 
 #endif /* BLOCK_INT_H */
Index: qemu/block/raw.c
===
--- qemu.orig/block/raw.c   2010-12-07 09:34:17.596025496 +0100
+++ qemu/block/raw.c2010-12-07 09:34:29.695297321 +0100
@@ -65,6 +65,11 @@ static int raw_probe(const uint8_t *buf,
return 1; /* everything can be opened as raw image */
 }
 
+static int raw_discard(BlockDriverState *bs, int64_t sector_num, int 
nb_sectors)
+{
+return bdrv_discard(bs-file, sector_num, nb_sectors);
+}
+
 static int raw_is_inserted(BlockDriverState *bs)
 {
 return bdrv_is_inserted(bs-file);
@@ -130,6 +135,7 @@ static BlockDriver bdrv_raw = {
 .bdrv_aio_readv = raw_aio_readv,
 .bdrv_aio_writev= raw_aio_writev,
 .bdrv_aio_flush = raw_aio_flush,
+.bdrv_discard   = raw_discard,
 
 .bdrv_is_inserted   = raw_is_inserted,
 .bdrv_eject = raw_eject,



[Qemu-devel] Re: [PATCH] blockdev: check dinfo ptr before using

2010-12-10 Thread Kevin Wolf
Am 08.12.2010 17:05, schrieb Ryan Harper:
 If a user decides to punish a guest by revoking its block device via
 drive_del, and subsequently also attempts to remove the pci device
 backing it, and the device is using blockdev_auto_del() then we get a
 segfault when we attempt to access dinfo-auto_del.[1]
 
 The fix is to check if drive_get_by_blockdev() actually returns a valid
 dinfo pointer or not.
 
 1. (qemu) pci_add auto storage 
 file=images/test01.raw,if=virtio,id=block1,snapshot=on
(qemu) drive_del block1
(qemu) pci_del 5
*segfault*
 
 Signed-off-by: Ryan Harper ry...@us.ibm.com

Thanks, applied to the block branch.

Kevin



[Qemu-devel] [PATCH 5/7] ide: also reset io_buffer_index for writes

2010-12-10 Thread Christoph Hellwig
Currenly the code only resets the io_buffer_index field for reads,
but the code seems to expect this for all types of I/O.  I guess
we simply don't hit large enough transfers that would require this
often enough.

Signed-off-by: Christoph Hellwig h...@lst.de

Index: qemu/hw/ide/core.c
===
--- qemu.orig/hw/ide/core.c 2010-12-10 11:35:23.164005731 +0100
+++ qemu/hw/ide/core.c  2010-12-10 11:35:30.471253949 +0100
@@ -605,8 +605,7 @@ static void ide_dma_cb(void *opaque, int
 
 /* launch next transfer */
 n = s-nsector;
-if (s-is_read)
-s-io_buffer_index = 0;
+s-io_buffer_index = 0;
 s-io_buffer_size = n * 512;
 if (dma_buf_prepare(bm, s-is_read) == 0)
 goto eot;



[Qemu-devel] [PATCH 4/7] ide: factor dma handling helpers

2010-12-10 Thread Christoph Hellwig
The DMA I/O path is duplicated between read and write commands, and
support for the TRIM command would add another copy.  Factor the
code into common helpers using the s-is_read flag added for the
macio ATA controller, and the newly added dma_bdrv_io function.

Signed-off-by: Christoph Hellwig h...@lst.de

Index: qemu/hw/ide/core.c
===
--- qemu.orig/hw/ide/core.c 2010-12-10 11:17:57.328254648 +0100
+++ qemu/hw/ide/core.c  2010-12-10 11:35:23.164005731 +0100
@@ -61,7 +61,8 @@ static inline int media_is_cd(IDEState *
 return (media_present(s)  s-nb_sectors = CD_MAX_SECTORS);
 }
 
-static void ide_dma_start(IDEState *s, BlockDriverCompletionFunc *dma_cb);
+static void ide_dma_start(IDEState *s, BlockDriverCompletionFunc *dma_cb,
+   DMAIOFunc *io_func);
 static void ide_dma_restart(IDEState *s, int is_read);
 static void ide_atapi_cmd_read_dma_cb(void *opaque, int ret);
 static int ide_handle_rw_error(IDEState *s, int error, int op);
@@ -568,7 +569,7 @@ static int dma_buf_rw(BMDMAState *bm, in
 return 1;
 }
 
-static void ide_read_dma_cb(void *opaque, int ret)
+static void ide_dma_cb(void *opaque, int ret)
 {
 BMDMAState *bm = opaque;
 IDEState *s = bmdma_active_if(bm);
@@ -576,9 +577,12 @@ static void ide_read_dma_cb(void *opaque
 int64_t sector_num;
 
 if (ret  0) {
-if (ide_handle_rw_error(s, -ret,
-BM_STATUS_DMA_RETRY | BM_STATUS_RETRY_READ))
-{
+int op = BM_STATUS_DMA_RETRY;
+
+if (s-is_read)
+op |= BM_STATUS_RETRY_READ;
+
+if (ide_handle_rw_error(s, -ret, op)) {
 return;
 }
 }
@@ -586,7 +590,7 @@ static void ide_read_dma_cb(void *opaque
 n = s-io_buffer_size  9;
 sector_num = ide_get_sector(s);
 if (n  0) {
-dma_buf_commit(s, 1);
+dma_buf_commit(s, s-is_read);
 sector_num += n;
 ide_set_sector(s, sector_num);
 s-nsector -= n;
@@ -596,32 +600,39 @@ static void ide_read_dma_cb(void *opaque
 if (s-nsector == 0) {
 s-status = READY_STAT | SEEK_STAT;
 ide_set_irq(s-bus);
-eot:
-bm-status |= BM_STATUS_INT;
-ide_dma_set_inactive(bm);
-return;
+goto eot;
 }
 
 /* launch next transfer */
 n = s-nsector;
-s-io_buffer_index = 0;
+if (s-is_read)
+s-io_buffer_index = 0;
 s-io_buffer_size = n * 512;
-if (dma_buf_prepare(bm, 1) == 0)
+if (dma_buf_prepare(bm, s-is_read) == 0)
 goto eot;
+
 #ifdef DEBUG_AIO
-printf(aio_read: sector_num=% PRId64  n=%d\n, sector_num, n);
+printf(ide_dma_cb: read: %d sector_num=% PRId64  n=%d\n,
+   s-is_read, sector_num, n);
 #endif
-bm-aiocb = dma_bdrv_read(s-bs, s-sg, sector_num, ide_read_dma_cb, bm);
-ide_dma_submit_check(s, ide_read_dma_cb, bm);
+
+bm-aiocb = dma_bdrv_io(s-bs, s-sg, sector_num, bm-io_func,
+ide_dma_cb, bm, !s-is_read);
+ide_dma_submit_check(s, ide_dma_cb, bm);
+return;
+
+eot:
+bm-status |= BM_STATUS_INT;
+ide_dma_set_inactive(bm);
 }
 
-static void ide_sector_read_dma(IDEState *s)
+static void ide_sector_dma(IDEState *s, DMAIOFunc *io_func, bool is_read)
 {
 s-status = READY_STAT | SEEK_STAT | DRQ_STAT | BUSY_STAT;
 s-io_buffer_index = 0;
 s-io_buffer_size = 0;
-s-is_read = 1;
-ide_dma_start(s, ide_read_dma_cb);
+s-is_read = is_read;
+ide_dma_start(s, ide_dma_cb, io_func);
 }
 
 static void ide_sector_write_timer_cb(void *opaque)
@@ -714,58 +725,6 @@ void ide_dma_restart_cb(void *opaque, in
 }
 }
 
-static void ide_write_dma_cb(void *opaque, int ret)
-{
-BMDMAState *bm = opaque;
-IDEState *s = bmdma_active_if(bm);
-int n;
-int64_t sector_num;
-
-if (ret  0) {
-if (ide_handle_rw_error(s, -ret,  BM_STATUS_DMA_RETRY))
-return;
-}
-
-n = s-io_buffer_size  9;
-sector_num = ide_get_sector(s);
-if (n  0) {
-dma_buf_commit(s, 0);
-sector_num += n;
-ide_set_sector(s, sector_num);
-s-nsector -= n;
-}
-
-/* end of transfer ? */
-if (s-nsector == 0) {
-s-status = READY_STAT | SEEK_STAT;
-ide_set_irq(s-bus);
-eot:
-bm-status |= BM_STATUS_INT;
-ide_dma_set_inactive(bm);
-return;
-}
-
-n = s-nsector;
-s-io_buffer_size = n * 512;
-/* launch next transfer */
-if (dma_buf_prepare(bm, 0) == 0)
-goto eot;
-#ifdef DEBUG_AIO
-printf(aio_write: sector_num=% PRId64  n=%d\n, sector_num, n);
-#endif
-bm-aiocb = dma_bdrv_write(s-bs, s-sg, sector_num, ide_write_dma_cb, 
bm);
-ide_dma_submit_check(s, ide_write_dma_cb, bm);
-}
-
-static void ide_sector_write_dma(IDEState *s)
-{
-s-status = READY_STAT | SEEK_STAT | DRQ_STAT | BUSY_STAT;
-s-io_buffer_index = 0;
-s-io_buffer_size = 0;
-s-is_read = 0;
-ide_dma_start(s, ide_write_dma_cb);
-}
-
 void 

[Qemu-devel] [PATCH 6/7] ide: add TRIM support

2010-12-10 Thread Christoph Hellwig
Add support for the data set management command, and the TRIM sub function
in it.

Signed-off-by: Christoph Hellwig h...@lst.de

Index: qemu/hw/ide/core.c
===
--- qemu.orig/hw/ide/core.c 2010-12-10 11:35:30.471253949 +0100
+++ qemu/hw/ide/core.c  2010-12-10 11:35:33.430253739 +0100
@@ -146,6 +146,9 @@ static void ide_identify(IDEState *s)
 put_le16(p + 66, 120);
 put_le16(p + 67, 120);
 put_le16(p + 68, 120);
+if (dev  dev-conf.discard_granularity) {
+put_le16(p + 69, (1  14)); /* determinate TRIM behavior */
+}
 put_le16(p + 80, 0xf0); /* ata3 - ata6 supported */
 put_le16(p + 81, 0x16); /* conforms to ata5 */
 /* 14=NOP supported, 5=WCACHE supported, 0=SMART supported */
@@ -172,6 +175,9 @@ static void ide_identify(IDEState *s)
 dev = s-unit ? s-bus-slave : s-bus-master;
 if (dev  dev-conf.physical_block_size)
 put_le16(p + 106, 0x6000 | get_physical_block_exp(dev-conf));
+if (dev  dev-conf.discard_granularity) {
+put_le16(p + 169, 1); /* TRIM support */
+}
 
 memcpy(s-identify_data, p, sizeof(s-identify_data));
 s-identify_set = 1;
@@ -1746,6 +1752,72 @@ static void ide_clear_hob(IDEBus *bus)
 bus-ifs[1].select = ~(1  7);
 }
 
+typedef struct TrimAIOCB {
+BlockDriverAIOCB common;
+QEMUBH *bh;
+int ret;
+} TrimAIOCB;
+
+static void trim_aio_cancel(BlockDriverAIOCB *acb)
+{
+TrimAIOCB *iocb = container_of(acb, TrimAIOCB, common);
+
+qemu_bh_delete(iocb-bh);
+iocb-bh = NULL;
+qemu_aio_release(iocb);
+}
+
+static AIOPool trim_aio_pool = {
+.aiocb_size = sizeof(TrimAIOCB),
+.cancel = trim_aio_cancel,
+};
+
+static void ide_trim_bh_cb(void *opaque)
+{
+TrimAIOCB *iocb = opaque;
+
+iocb-common.cb(iocb-common.opaque, iocb-ret);
+
+qemu_bh_delete(iocb-bh);
+iocb-bh = NULL;
+
+qemu_aio_release(iocb);
+}
+
+static BlockDriverAIOCB *ide_issue_trim(BlockDriverState *bs,
+int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
+BlockDriverCompletionFunc *cb, void *opaque)
+{
+TrimAIOCB *iocb;
+int i, j, ret;
+
+iocb = qemu_aio_get(trim_aio_pool, bs, cb, opaque);
+iocb-bh = qemu_bh_new(ide_trim_bh_cb, iocb);
+iocb-ret = 0;
+
+for (j = 0; j  qiov-niov; j++) {
+uint64_t *buffer = qiov-iov[j].iov_base;
+
+for (i = 0; i  qiov-iov[j].iov_len / 8; i++) {
+/* 6-byte LBA + 2-byte range per entry */
+uint64_t entry = le64_to_cpu(buffer[i]);
+uint64_t sector = entry  0xULL;
+uint16_t count = entry  48;
+
+if (count == 0)
+break;
+
+ret = bdrv_discard(bs, sector * 512, count * 512);
+if (!iocb-ret)
+iocb-ret = ret;
+}
+}
+
+qemu_bh_schedule(iocb-bh);
+
+return iocb-common;
+}
+
 void ide_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 {
 IDEBus *bus = opaque;
@@ -1825,6 +1897,17 @@ void ide_ioport_write(void *opaque, uint
 break;
 
 switch(val) {
+case WIN_DSM:
+switch (s-feature) {
+case DSM_TRIM:
+if (!s-bs)
+   goto abort_cmd;
+ide_sector_dma(s, ide_issue_trim, 0);
+break;
+default:
+goto abort_cmd;
+}
+break;
 case WIN_IDENTIFY:
 if (s-bs  s-drive_kind != IDE_CD) {
 if (s-drive_kind != IDE_CFATA)
Index: qemu/hw/ide/internal.h
===
--- qemu.orig/hw/ide/internal.h 2010-12-10 11:33:41.444006150 +0100
+++ qemu/hw/ide/internal.h  2010-12-10 11:35:33.446003914 +0100
@@ -60,7 +60,11 @@ typedef struct BMDMAState BMDMAState;
  */
 #define CFA_REQ_EXT_ERROR_CODE 0x03 /* CFA Request Extended Error Code 
*/
 /*
- * 0x04-0x07 Reserved
+ *  0x04-0x05 Reserved
+ */
+#define WIN_DSM 0x06
+/*
+ *  0x07 Reserved
  */
 #define WIN_SRST   0x08 /* ATAPI soft reset command */
 #define WIN_DEVICE_RESET   0x08
@@ -188,6 +192,9 @@ typedef struct BMDMAState BMDMAState;
 
 #define IDE_DMA_BUF_SECTORS 256
 
+/* feature values for Data Set Management */
+#define DSM_TRIM0x01
+
 #if (IDE_DMA_BUF_SECTORS  MAX_MULT_SECTORS)
 #error IDE_DMA_BUF_SECTORS must be bigger or equal to MAX_MULT_SECTORS
 #endif
Index: qemu/hw/ide/qdev.c
===
--- qemu.orig/hw/ide/qdev.c 2010-12-10 11:24:19.324253880 +0100
+++ qemu/hw/ide/qdev.c  2010-12-10 11:35:33.450257860 +0100
@@ -110,6 +110,11 @@ static int ide_drive_initfn(IDEDevice *d
 const char *serial;
 DriveInfo *dinfo;
 
+if (dev-conf.discard_granularity  dev-conf.discard_granularity != 512) 
{
+error_report(discard_granularity 

[Qemu-devel] Re: [PATCH 4/5] virtio-serial: Don't copy over guest buffer to host

2010-12-10 Thread Paul Brook
 On (Fri) Dec 10 2010 [14:02:37], Paul Brook wrote:
   -if (!discard) {
   +if (discard) {
   +goto next;
   +}
   
   +next:
virtqueue_push(vq, elem, 0);
  
  Please don't do this.
 
 Could you elaborate?
 
 I can move the 'discard' check into the following 'for' loop, but since
 the value of discard doesn't change, I moved it outside.

You've replaced a perfectly good if block with a goto.

Paul



[Qemu-devel] [PATCH v2 4/4] virtio-serial: Don't copy over guest buffer to host

2010-12-10 Thread Amit Shah
When the guest writes something to a host, we copied over the entire
buffer first into the host and then processed it.  Do away with that, it
could result in a malicious guest causing a DoS on the host.

Reported-by: Paul Brook p...@codesourcery.com
Signed-off-by: Amit Shah amit.s...@redhat.com
---
 hw/virtio-serial-bus.c |   16 
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index ecf0056..a0886a2 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -125,16 +125,16 @@ static void do_flush_queued_data(VirtIOSerialPort *port, 
VirtQueue *vq,
 return;
 }
 while (virtqueue_pop(vq, elem)) {
-uint8_t *buf;
-size_t ret, buf_size;
+unsigned int i;
 
-if (!discard) {
-buf_size = iov_size(elem.out_sg, elem.out_num);
-buf = qemu_malloc(buf_size);
-ret = iov_to_buf(elem.out_sg, elem.out_num, buf, 0, buf_size);
+for (i = 0; !discard  i  elem.out_num; i++) {
+size_t buf_size;
 
-port-info-have_data(port, buf, ret);
-qemu_free(buf);
+buf_size = elem.out_sg[i].iov_len;
+
+port-info-have_data(port,
+  elem.out_sg[i].iov_base,
+  buf_size);
 }
 virtqueue_push(vq, elem, 0);
 }
-- 
1.7.3.2




[Qemu-devel] Re: [PATCH 4/5] virtio-serial: Don't copy over guest buffer to host

2010-12-10 Thread Amit Shah
On (Fri) Dec 10 2010 [15:17:18], Paul Brook wrote:
  On (Fri) Dec 10 2010 [14:02:37], Paul Brook wrote:
-if (!discard) {
+if (discard) {
+goto next;
+}

+next:
 virtqueue_push(vq, elem, 0);
   
   Please don't do this.
  
  Could you elaborate?
  
  I can move the 'discard' check into the following 'for' loop, but since
  the value of discard doesn't change, I moved it outside.
 
 You've replaced a perfectly good if block with a goto.

To keep the indentation levels low.

I've put it in the for loop for v2.

Amit



[Qemu-devel] Re: [PATCH] correct migrate_set_speed's args_type

2010-12-10 Thread Markus Armbruster
[Note cc: Dan, Avi]

Luiz Capitulino lcapitul...@redhat.com writes:

 On Tue, 23 Nov 2010 10:43:48 -0200
 Luiz Capitulino lcapitul...@redhat.com wrote:

 On Tue, 23 Nov 2010 13:41:26 +0800
 Wen Congyang we...@cn.fujitsu.com wrote:
 
  The args_type of migrate_set_speed in qmp-commands.hx is wrong.
  When we set migrate speed by json, qemu will be core dumped.
  
  Signed-off-by: Wen Congyang
 
 Nice catch.
 
 Was caused by 07de3e60b05 and hence affects master only. Could you please
 mention that in the commit log? Also, your email address is missing
 in the signed-off-by line.

 There's another problem there: we used to accept a json number but now we
 accept only a json integer.

 Markus, are you aware of this change?

I pointed out the incompatible change in my review[*] of the offending
patch:

This isn't backwards bug-compatible.

Before, a client could send any number.  Any fractional part was
ignored.

Now, the number must be an integer.  Other numbers are rejected.

I don't care myself, but others have argued most forcefully for keeping
QMP fully backward compatible from 0.13 on, so they might object.

Others did not object, so this went in.  Not released yet, so it's not
too late to object.


[*] http://lists.gnu.org/archive/html/qemu-devel/2010-09/msg01905.html



[Qemu-devel] Re: [PATCH] correct migrate_set_speed's args_type

2010-12-10 Thread Luiz Capitulino
On Fri, 10 Dec 2010 16:20:34 +0100
Markus Armbruster arm...@redhat.com wrote:

 [Note cc: Dan, Avi]
 
 Luiz Capitulino lcapitul...@redhat.com writes:
 
  On Tue, 23 Nov 2010 10:43:48 -0200
  Luiz Capitulino lcapitul...@redhat.com wrote:
 
  On Tue, 23 Nov 2010 13:41:26 +0800
  Wen Congyang we...@cn.fujitsu.com wrote:
  
   The args_type of migrate_set_speed in qmp-commands.hx is wrong.
   When we set migrate speed by json, qemu will be core dumped.
   
   Signed-off-by: Wen Congyang
  
  Nice catch.
  
  Was caused by 07de3e60b05 and hence affects master only. Could you please
  mention that in the commit log? Also, your email address is missing
  in the signed-off-by line.
 
  There's another problem there: we used to accept a json number but now we
  accept only a json integer.
 
  Markus, are you aware of this change?
 
 I pointed out the incompatible change in my review[*] of the offending
 patch:

Ok, I can remember now. I really think this is a small improvement and I
doubt there's any client out there depending on this.

 This isn't backwards bug-compatible.
 
 Before, a client could send any number.  Any fractional part was
 ignored.
 
 Now, the number must be an integer.  Other numbers are rejected.
 
 I don't care myself, but others have argued most forcefully for keeping
 QMP fully backward compatible from 0.13 on, so they might object.
 
 Others did not object, so this went in.  Not released yet, so it's not
 too late to object.
 
 
 [*] http://lists.gnu.org/archive/html/qemu-devel/2010-09/msg01905.html
 




[Qemu-devel] [PATCH v2 0/4] virtio-serial: Trivial fixes, don't copy buffers to host

2010-12-10 Thread Amit Shah
Hi,

This patch series converts virtio-serial-bus to use the guest buffers
instead of copying over guest data to the host, as suggested by Paul.

In addition, there are some trivial fixes for the virtio-console and
virtio-serial code.

v2:
 - drop the erroring out patch till we decide what's to be done
 - remove goto usage.

Amit Shah (4):
  virtio-console: Factor out common init between console and generic
ports
  virtio-console: Remove unnecessary braces
  virtio-serial: Simplify condition for a while loop
  virtio-serial: Don't copy over guest buffer to host

 hw/virtio-console.c|   34 +++---
 hw/virtio-serial-bus.c |   21 -
 2 files changed, 27 insertions(+), 28 deletions(-)

-- 
1.7.3.2




Re: [Qemu-devel] [PATCH] audio: reset timer when enabling capture mode

2010-12-10 Thread malc
On Thu, 9 Dec 2010, Michael Walle wrote:

 The audio timer also has to be reset when a capture device is enabled. This
 will ensure the timer to be started even if just capture devices are
 active.

It was done in 39deb1e496de81957167daebf5cf5d1fbd5e47c2


-- 
mailto:av1...@comtv.ru



Re: [Qemu-devel] [PATCH] noaudio: fix return value for read()

2010-12-10 Thread malc
On Thu, 9 Dec 2010, Michael Walle wrote:

 Read should return bytes instead of samples.

Thanks, applied.

[..snip..]

-- 
mailto:av1...@comtv.ru



[Qemu-devel] Re: [PATCH 1/6] qemu, kvm: Enable NMI support for user space irqchip

2010-12-10 Thread Lai Jiangshan
On 12/09/2010 03:25 PM, Jan Kiszka wrote:
 Am 09.12.2010 07:58, Lai Jiangshan wrote:

 Make use of the new KVM_NMI IOCTL to send NMIs into the KVM guest if the
 user space APIC emulation or some other source raised them.
 
 In that light, the subject is not absolutely correct.
 

[...]

 
 Actually, we already depend on KVM_CAP_DESTROY_MEMORY_REGION_WORKS which
 was introduced with 2.6.29 as well. I would suggest to simply extend the
 static configure check and avoid new #ifdefs in the code.
 
 Thanks for pushing this! Was obviously so trivial that it was forgotten...
 


Thanks. We want to inject nmi to kvm guest via qemu monitor,
that's this path's purpose.

But I can't get you means, what I should do to fix this patch?
Just remove the #ifdefs OR use kvm_check_extension(KVM_CAP_USER_NMI) ?


+static int can_user_nmi = -1; 
+
+if (can_user_nmi == -1)
+can_user_nmi = kvm_check_extension(kvm_state, KVM_CAP_USER_NMI);
+
+if (can_user_nmi  0  env-interrupt_request  CPU_INTERRUPT_NMI) {
+env-interrupt_request = ~CPU_INTERRUPT_NMI;
+DPRINTF(injected NMI\n);
+kvm_vcpu_ioctl(env, KVM_NMI);
+}

Thanks,
Lai




[Qemu-devel] Re: [PATCH] kvm: x86: Save/restore error_code

2010-12-10 Thread Juan Quintela
Jason Wang jasow...@redhat.com wrote:
 Juan Quintela writes:
   Jason Wang jasow...@redhat.com wrote:
The saving and restoring of error_code seems lost and convert the
error_code to uint32_t.
   
Signed-off-by: Jason Wang jasow...@redhat.com
---
 target-i386/cpu.h |4 ++--
 target-i386/machine.c |2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)
   
   It should be a new subsection.  The test is if has_error_code != 0
   according to gleb.
   
   Later, Juan.
   

 Thanks for reminding, and maybe we can just use VMSTATE_UINT32_TEST() which is
 simpler than subsection to do the check, isn't it?

we need the subsection, that way we don't need to bump the section
version.

Later, Juan.


diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 06e40f3..c990db9 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -688,7 +688,7 @@ typedef struct CPUX86State {
 uint64_t pat;
 
 /* exception/interrupt handling */
-int error_code;
+uint32_t error_code;
 int exception_is_int;
 target_ulong exception_next_eip;
 target_ulong dr[8]; /* debug registers */
@@ -933,7 +933,7 @@ uint64_t cpu_get_tsc(CPUX86State *env);
 #define cpu_list_id x86_cpu_list
 #define cpudef_setupx86_cpudef_setup
 
-#define CPU_SAVE_VERSION 12
+#define CPU_SAVE_VERSION 13
 
 /* MMU modes definitions */
 #define MMU_MODE0_SUFFIX _kernel
diff --git a/target-i386/machine.c b/target-i386/machine.c
index d78eceb..0e467da 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -491,6 +491,8 @@ static const VMStateDescription vmstate_cpu = {
 VMSTATE_UINT64_V(xcr0, CPUState, 12),
 VMSTATE_UINT64_V(xstate_bv, CPUState, 12),
 VMSTATE_YMMH_REGS_VARS(ymmh_regs, CPUState, CPU_NB_REGS, 12),
+
+VMSTATE_UINT32_V(error_code, CPUState, 13),
 VMSTATE_END_OF_LIST()
 /* The above list is not sorted /wrt version numbers, watch 
 out! */
 },



Re: [Qemu-devel] [PATCH 1/6] [RFC] Emulation of GRLIB GPTimer as defined in GRLIB IP Core User's Manual.

2010-12-10 Thread Edgar E. Iglesias
On Wed, Dec 08, 2010 at 10:39:43AM +0100, Fabien Chouteau wrote:
 On 12/08/2010 09:30 AM, Edgar E. Iglesias wrote:
  On Tue, Dec 07, 2010 at 10:55:33AM +0100, Fabien Chouteau wrote:
  On 12/06/2010 06:12 PM, Blue Swirl wrote:
  On Mon, Dec 6, 2010 at 9:26 AM, Fabien Chouteauchout...@adacore.com   
  wrote:
  +DeviceState *grlib_gptimer_create(target_phys_addr_t  base,
  +  uint32_tnr_timers,
  +  uint32_tfreq,
  +  qemu_irq   *cpu_irqs,
  +  int base_irq)
  This function belongs to leon3.c.
  I don't see why. GPTimer is a peripheral and you may want to use it in
  an other system.
  This might depend a bit on taste, but I agree with Blue that we shouldn't
  clutter the device models with this kind of instantiator helper calls.
  IMO it's better to put them higher up (e.g board code or similar).
 
 Do you mean like Xilinx devices where the instantiators are in-lined 
 functions in hw/xilinx.h?

Yes, that's one way. But if you only have a single board you can also just
put the function with the board code.

Cheers



[Qemu-devel] [Bug 543478] Re: qemus pmemsave doesn't accept / in filename

2010-12-10 Thread Launchpad Bug Tracker
[Expired for qemu-kvm (Ubuntu) because there has been no activity for 60
days.]

** Changed in: qemu-kvm (Ubuntu)
   Status: Incomplete = Expired

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/543478

Title:
  qemus pmemsave doesn't accept / in filename

Status in QEMU:
  Invalid
Status in “qemu-kvm” package in Ubuntu:
  Expired

Bug description:
  Binary package hint: qemu-kvm

Please see my conversation with qemu:

(qemu) pmemsave 
unexpected end of expression
(qemu) help pmemsave 
pmemsave addr size file -- save to disk physical memory dump starting at 'addr' 
of size 'size'
(qemu) pmemsave 0 512M /tmp/qemu.mem
pmemsave: extraneous characters at the end of line
(qemu) pmemsave 0 512 /tmp/qemu.mem
invalid char in expression
(qemu) pmemsave 0 512 /tmp/qemu
invalid char in expression
(qemu) pmemsave 0 512 qemu.mem
(qemu) pmemsave 0 512M qemu.mem
pmemsave: extraneous characters at the end of line



Let me comment on each one of those:
(qemu) pmemsave 
unexpected end of expression

I expected some sort of hint as to where to get more information. Maybe just a 
Type ``help pmemsave'' to get syntax information would be sufficient.


(qemu) help pmemsave 
pmemsave addr size file -- save to disk physical memory dump starting at 'addr' 
of size 'size'

Nice. But an example would be nice. My proposal: I.e.: pmemsave 0 1G 
/tmp/qemu.mem


(qemu) pmemsave 0 512M /tmp/qemu.mem
pmemsave: extraneous characters at the end of line

eh. Would be nice if it told me *which* character was extraneous and what 
extraneous means. My proposal: Couldn't parse character at position 23, 
please see help pmemsave for an example.


(qemu) pmemsave 0 512 /tmp/qemu.mem
invalid char in expression

Hm. Interesting. Again, would be nice if it printed me the offending character. 
My proposal: Could not parse character at position 23, please see help 
pmemsave for an example.


(qemu) pmemsave 0 512 /tmp/qemu
invalid char in expression

Now I got rid of almost everything but it still doesn't work.


(qemu) pmemsave 0 512 qemu.mem

aha! No slashes?! Seriously?

(qemu) pmemsave 0 512M qemu.mem
pmemsave: extraneous characters at the end of line

And no M or G modifiers? If I want to dump 2GB then I'd have to calculate 
the number in bytes and paste that long string. I expected qemu to be able to 
parse the K, M, G suffixes.

Also, I'm wondering why it doesn't offer to dump all memory.

ProblemType: Bug
Architecture: amd64
CurrentDmesg:
 [150870.676062] kvm_intel: Unknown symbol kvm_vcpu_on_spin
 [150947.222923] cron[24260]: segfault at 0 ip (null) sp 7fffe865eed8 error 
14 in cron[40+9000]
 [150947.224187] cron[24261]: segfault at 0 ip (null) sp 7fffe865eed8 error 
14 in cron[40+9000]
Date: Sun Mar 21 15:03:13 2010
DistroRelease: Ubuntu 9.10
KvmCmdLine:
 UIDPID  PPID  CSZ   RSS PSR STIME TTY  TIME CMD
 muelli   23807 23806 13 239738 228464 0 14:54 pts/800:01:22 /usr/bin/kvm 
-S -M pc -m 512 -smp 1 -name vanilla_ubuntu -monitor stdio -boot c -drive 
file=/home/muelli/qemu/ubuntu8.10/ubuntu.img,if=ide,index=0,boot=on -drive 
file=,if=ide,media=cdrom,index=2 -net 
nic,model=rtl8139,macaddr=f0:00:BA:12:34:56 -net 
user,hostfwd=tcp::2223-:22,smb=/tmp/share -serial pty -snapshot
MachineType: LENOVO 766636G
Package: kvm 1:84+dfsg-0ubuntu16+0.11.0+0ubuntu6.3
PccardctlIdent:
 Socket 0:
   no product info available
PccardctlStatus:
 Socket 0:
   no card
ProcCmdLine: root=/dev/mapper/cryptroot 
source=UUID=9c3d5596-27c6-4fd5-bfcd-fa8eef6f1230 ro vdso32=0 quiet splash  
crashkernel=384M-2G:64M,2G-:128M
ProcVersionSignature: Ubuntu 2.6.32-16.24-generic
SourcePackage: qemu-kvm
Uname: Linux 2.6.32-16-generic x86_64
dmi.bios.date: 03/12/2009
dmi.bios.vendor: LENOVO
dmi.bios.version: 7NETC0WW (2.20 )
dmi.board.name: 766636G
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: 
dmi:bvnLENOVO:bvr7NETC0WW(2.20):bd03/12/2009:svnLENOVO:pn766636G:pvrThinkPadX61:rvnLENOVO:rn766636G:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 766636G
dmi.product.version: ThinkPad X61
dmi.sys.vendor: LENOVO





[Qemu-devel] Re: [RFC][PATCH v5 07/21] virtagent: add va.getfile RPC

2010-12-10 Thread Jes Sorensen
On 12/09/10 22:04, Michael Roth wrote:
 On 12/09/2010 08:40 AM, Adam Litke wrote:
 Actually, a host-controlled interface would be both simpler to implement
 (on both ends) and would concentrate control in the host (which is what
 we probably want anyway).
 
 I also like the host-controlled mechanism. I think the difficulty is
 about the same either way though. Both would require an FD to be kept
 open, and some mechanism to read/seek in chunks and then send it back.

 I don't think this should replace the underlying RPC for viewfile
 however. I'd much rather it be an additional RPC, and we just put strict
 limits on the size of files viewfile/getfile can handle and make it
 clear that it is intended for quickly viewing text. I'll rename the
 getfile RPC to viewfile to make this a bit clearer as well.

I really think the viewfile stuff needs to go away, it is a rats trap
for things that can go wrong. It would really only be useful from the
human monitor, whereas any other application will be happy to implement
it at the higher level.

Having it in the human monitor you risk messing it up by viewing a
binary file and you end up having to kill you guest to fix it.

Putting artificial limits on it means someone will eventually try to
change those.

Lets keep this at the higher level, where it IMHO belongs.

Cheers,
Jes



Re: [Qemu-devel] [RFC][PATCH v5 00/21] virtagent: host/guest RPC communication agent

2010-12-10 Thread Stefan Hajnoczi
On Thu, Dec 9, 2010 at 9:03 PM, Anthony Liguori
aligu...@linux.vnet.ibm.com wrote:
 On 12/09/2010 02:45 PM, Michael Roth wrote:

 On 12/08/2010 04:10 AM, Stefan Hajnoczi wrote:

 On Fri, Dec 3, 2010 at 6:03 PM, Michael Rothmdr...@linux.vnet.ibm.com
  wrote:

 These patches apply to master, and can also be obtained from:
 git://repo.or.cz/qemu/mdroth.git virtagent_v5

 Why XML-RPC and not QMP?  When I skim through the patch series it
 seems like much of the work being done is very similar to QMP.


 It does, actually, more than I realized. In terms of data encapsulation I
 don't see why we couldn't use QMP/JSON for at least the current set of RPCs.
 XMLRPC does support a wider range of data types however, such as nestable
 arrays and structs, and set-precision numerical types like 32 and 64-bit
 ints/floats, which may prove useful for future lower level interfaces.

 1) XML-RPC is more widely supported than QMP (as there is zero support for
 QMP outside of QEMU and libvirt)

True, but the current design doesn't lend itself to custom agents at
the XML-RPC level.  Therefore using a standard protocol doesn't win
anything - it would have been more useful for QMP actually.

 2) The target of this work is for guest agents

Not sure what this means.

 3) QMP does not support bidirectional RPC messages.

Right.  QMP has commands, responses, and asynchronous events so the
client and server are two different roles (asymmetric).

I hadn't though through the details and this suggests shoehorning QMP
in as the agent protocol would be ugly.

 4) The RPC mechanism is a minor part of virt-agent so ultimately, it kind of
 doesn't matter.  The RPC messages themselves are what's important.

Getting the design and features of virtagent right is key.  But we
still need to maintain the code, that's why I ask about reusing what
we've got.

Stefan



Re: [Qemu-devel] [PATCH 10/11] config: Add header file for device config options

2010-12-10 Thread Markus Armbruster
Alexander Graf ag...@suse.de writes:

 On 21.11.2010, at 13:37, Blue Swirl wrote:

 On Fri, Nov 19, 2010 at 2:56 AM, Alexander Graf ag...@suse.de wrote:
 So far we have C preprocessor defines for target and host config
 options, but we're lacking any information on which devices are
 available.
 
 We do need that information at times though, for example in the
 ahci patch where we need to call a legacy init function depending
 on whether we have support compiled in or not.
 
 That does not seem right. Devices should not care about what other
 devices may exist. Perhaps stub style approach would be better.

 Well, for the -drive parameter we need to know what devices we can create and 
 I'd like to keep that code as close as possible to the actual device code.

Stupid question: why can't we just try to create, then check status to
figure out whether it failed because the device isn't available?

[...]



Re: [Qemu-devel] [PATCH 1/5] block: add discard support

2010-12-10 Thread Christoph Hellwig
On Thu, Dec 02, 2010 at 01:12:13PM +0100, Kevin Wolf wrote:
   DEFINE_PROP_UINT16(physical_block_size, _state,   \
  _conf.physical_block_size, 512), \
   DEFINE_PROP_UINT16(min_io_size, _state, _conf.min_io_size, 0),  \
  -DEFINE_PROP_UINT32(opt_io_size, _state, _conf.opt_io_size, 0)
  +DEFINE_PROP_UINT32(opt_io_size, _state, _conf.opt_io_size, 0), \
  +DEFINE_PROP_UINT32(discard_granularity, _state, \
  +   _conf.discard_granularity, 0)
 
 Is there no way to get this value automatically?
 
 At least for non-raw images (and it should be trivial to implement this
 in qcow2) I guess we'll want to set it to the cluster size of the image
 instead of requiring the user to set this value.

It's guest visible state, so it must not change due to migrations.  For
the current implementation all values for it work anyway - if it's
smaller than the block size we'll zero out the remainder of the block.




[Qemu-devel] Re: [PATCH 4/5] virtio-serial: Don't copy over guest buffer to host

2010-12-10 Thread Paul Brook
 -if (!discard) {
 +if (discard) {
 +goto next;
 +}

 +next:
  virtqueue_push(vq, elem, 0);

Please don't do this.

Paul



[Qemu-devel] RFC; usb redirection protocol

2010-12-10 Thread Hans de Goede

Hi All,

Here is what I have in mind as usb redirection protocol:

USB redirerection protocol (draft)
--

The protocol described in this document is meant for tunneling usb transfers
to a single usb device. Note: not an entire hub, only a single device.

The most significant use case for this is taking a usb device attached to
some machine a which acts as a client / viewer to a virtual machine v
hosted on another machine b, and make the usb device show up inside the
virtual machine as if it were attached directly to the virtual machine v.

The described protocol assumes a reliable ordered bidirectional transport is
available, for example a tcp socket.

Definitions:
vm-host:  machine running the virtual machine with the guest os accessing
  the usb device
usb-host: machine which has the usb device which appears inside the vm


Basic packet structure / communication
--

Each packet exchanged between the vm-host and the usb-host starts with a
usb_redir_header, followed by an optional command specific header
follow by optional additional data.

The usb_redir_header each packet starts with looks as follows:

struct usb_redir_header {
uint32_t command;
uint32_t length;
}

command: A command code, from the command enum
length:  length of the optional command specific header + the optional
 additional data. Can be 0.

There are 3 types of commands:

1) Connection setup commands
2) Synchroneous commands
3) Asynchroneous commands

Given that everything is done over a potentially slow transport in practice
the diferentiating between synchroneous and asynchroneous commands may seem
odd. The difference is how the usb-host will handle them once received. For
synchroneous commands the usb-host will hand the request over to the host os
and then *wait* for a response. This means that the vm-host is guaranteed to
get an immediate response. Where as for asynchroneous commands to usb-host
hands the request over to the host os with the request to let the usb-host
process know when the request is done.

Also note that all synchroneous commands should only be executed when no
asynchroneous commands are pending on the device / interface / endpoint
affected by the synchroneous command. Any pending commands will get
dropped, and any active iso streams / allocated bulk streams will get
stopped / free-ed.


Command list


Connection setup commands:
usb_redir_hello
usb_redir_report_descriptor

Synchroneous commands:
usb_redir_set_configuration
usb_redir_get_configuration
usb_redir_configuration_status
usb_redir_set_alt_setting
usb_redir_get_alt_setting
usb_redir_alt_setting_status
usb_redir_start_iso_stream
usb_redir_stop_iso_stream
usb_redir_iso_stream_status
usb_redir_alloc_bulk_streams
usb_redir_free_bulk_streams
usb_redir_bulk_streams_status

Asynchroneous commands:
usb_redir_control_packet
usb_redir_bulk_packet
usb_redir_iso_packet


usb_redir_hello
---

usb_redir_header.command: usb_redir_hello
usb_redir_header.length:  see description

struct usb_redir_hello_header {
char version[64];
uint32_t capabilities[0];
}

No command specific additional data.

A packet of this type is send by both sides as soon as a connection is
establised. This commands consists of:
version:   A free form 0 terminated version string, useful for logging
   should not be parsed! Suggested format: qemu 0.13,
   usb-redir-daemon 0.1, etc.
capabilities:  A variable length array for announcing capabilities.

The value of the length field depends on the size of the capabilities array.
If we cross the 32 capabilities count, it will go from 1 uint32_t to 2,
etc. the value is 64 + capabilities-array-size * sizeof(uint32_t).

Currently the following capabilities are defined:
usb_redir_cap_bulk_streams: USB 3 bulk streams are supported


usb_redir_report_descriptor
---

usb_redir_header.command: usb_redir_report_desciptor
usb_redir_header.length:  sizeof usb device descriptors

No command specific header.

The command specific additional data contains the entire descriptors for
the usb device.

A packet of this type is send by the usb-host directly after the hello packet
it contains the usb descriptor tables for the usb device.


usb_redir_set_configuration
---

usb_redir_header.command: usb_redir_set_configuration
usb_redir_header.length:  sizeof(usb_redir_set_configuration_header)

struct usb_redir_set_configuration_header {
uint8_t configuration;
}

No command specific additional data.

This command can be send by the vm-host to set (change) the active
configuration of the usb device.

usb_redir_get_configuration
---

usb_redir_header.command: usb_redir_get_configuration
usb_redir_header.length:  0

No command specific header.

No command specific additional data.

This command can be send by the vm-host to get (query) the active

[Qemu-devel] [Bug 687733] Re: Linux KSM not compiled in (MADV_MERGEABLE always undef)

2010-12-10 Thread Paul Brook
Not a qemu bug. madvise and (associated constants via sys/mman.h) are
supplied by glibc. You need to update your C library.

** Changed in: qemu
   Status: New = Invalid

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/687733

Title:
  Linux KSM not compiled in (MADV_MERGEABLE always undef)

Status in QEMU:
  Invalid

Bug description:
  Linux KSM support is not enabled because MADV_MERGEABLE remains undefined.
It seems that asm-generic/mman-common.h is not included. Maybe some kind of 
header dependency problem?

Adding 
#include asm-generic/mman-common.h
to exec.c of qemu-kvm-0.13.0 enables use of KSM and values change in 
/sys/kernel/mm/ksm/.

Tested under CentOS 5.5 with custom kernel 2.6.32.26 and OpenSUSE 11.2 with 
custom kernel 2.6.36.1, both x86_64 platform.
Please note that I configure with--kerneldir=/lib/modules/2.6.../build and even 
--extra-cflags=-I/lib/modules/2.6.../build/include.





[Qemu-devel] Re: [PATCH 4/5] virtio-serial: Don't copy over guest buffer to host

2010-12-10 Thread Amit Shah
On (Fri) Dec 10 2010 [14:02:37], Paul Brook wrote:
  -if (!discard) {
  +if (discard) {
  +goto next;
  +}
 
  +next:
   virtqueue_push(vq, elem, 0);
 
 Please don't do this.

Could you elaborate?

I can move the 'discard' check into the following 'for' loop, but since
the value of discard doesn't change, I moved it outside.

Amit



[Qemu-devel] [PATCH v2 1/4] virtio-console: Factor out common init between console and generic ports

2010-12-10 Thread Amit Shah
The initialisation for generic ports and console ports is similar.
Factor out the parts that are the same in a different function that can
be called from each of the initfns.

Signed-off-by: Amit Shah amit.s...@redhat.com
---
 hw/virtio-console.c |   31 ++-
 1 files changed, 14 insertions(+), 17 deletions(-)

diff --git a/hw/virtio-console.c b/hw/virtio-console.c
index caea11f..d7fe68b 100644
--- a/hw/virtio-console.c
+++ b/hw/virtio-console.c
@@ -58,24 +58,28 @@ static void chr_event(void *opaque, int event)
 }
 }
 
-/* Virtio Console Ports */
-static int virtconsole_initfn(VirtIOSerialDevice *dev)
+static int generic_port_init(VirtConsole *vcon, VirtIOSerialDevice *dev)
 {
-VirtIOSerialPort *port = DO_UPCAST(VirtIOSerialPort, dev, dev-qdev);
-VirtConsole *vcon = DO_UPCAST(VirtConsole, port, port);
-
-port-info = dev-info;
-
-port-is_console = true;
+vcon-port.info = dev-info;
 
 if (vcon-chr) {
 qemu_chr_add_handlers(vcon-chr, chr_can_read, chr_read, chr_event,
   vcon);
-port-info-have_data = flush_buf;
+vcon-port.info-have_data = flush_buf;
 }
 return 0;
 }
 
+/* Virtio Console Ports */
+static int virtconsole_initfn(VirtIOSerialDevice *dev)
+{
+VirtIOSerialPort *port = DO_UPCAST(VirtIOSerialPort, dev, dev-qdev);
+VirtConsole *vcon = DO_UPCAST(VirtConsole, port, port);
+
+port-is_console = true;
+return generic_port_init(vcon, dev);
+}
+
 static int virtconsole_exitfn(VirtIOSerialDevice *dev)
 {
 VirtIOSerialPort *port = DO_UPCAST(VirtIOSerialPort, dev, dev-qdev);
@@ -115,14 +119,7 @@ static int virtserialport_initfn(VirtIOSerialDevice *dev)
 VirtIOSerialPort *port = DO_UPCAST(VirtIOSerialPort, dev, dev-qdev);
 VirtConsole *vcon = DO_UPCAST(VirtConsole, port, port);
 
-port-info = dev-info;
-
-if (vcon-chr) {
-qemu_chr_add_handlers(vcon-chr, chr_can_read, chr_read, chr_event,
-  vcon);
-port-info-have_data = flush_buf;
-}
-return 0;
+return generic_port_init(vcon, dev);
 }
 
 static VirtIOSerialPortInfo virtserialport_info = {
-- 
1.7.3.2




[Qemu-devel] [PATCH v2 3/4] virtio-serial: Simplify condition for a while loop

2010-12-10 Thread Amit Shah
Separate out a non-changing condition over the period of a loop into an
if statement before the loop.  This will be used later to re-work the
loop.

Signed-off-by: Amit Shah amit.s...@redhat.com
---
 hw/virtio-serial-bus.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 74ba5ec..ecf0056 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -121,7 +121,10 @@ static void do_flush_queued_data(VirtIOSerialPort *port, 
VirtQueue *vq,
 assert(port || discard);
 assert(virtio_queue_ready(vq));
 
-while ((discard || !port-throttled)  virtqueue_pop(vq, elem)) {
+if (!discard  port-throttled) {
+return;
+}
+while (virtqueue_pop(vq, elem)) {
 uint8_t *buf;
 size_t ret, buf_size;
 
-- 
1.7.3.2




[Qemu-devel] Re: [PATCH 5/5] virtio-serial: Error out if guest sends unexpected vq elements

2010-12-10 Thread Amit Shah
On (Fri) Dec 10 2010 [15:17:58], Paul Brook wrote:
  On (Fri) Dec 10 2010 [13:59:50], Paul Brook wrote:
Check if the guest really sent any items in the out_vq before using
them.  Similarly, check if there is a buffer to send data in before
writing.
   
   Can this actually happen? If so why/how?
   Why does it need a special case in this device?
  
  A malicious guest (ie, a guest with the virtio_console module suitably
  modified) could send in buffers with the 'input' bit set instead of
  output as expected or vice-versa.
 
 So what? Who cares if they get it wrong?

Just let the error_report() be there and continue as if nothing
happened?

 It's entirely unclear whether this is actually an error. If a request has 
 zero 
 size then we just transfer zero bytes, exactly as requested.
 
 Even if you accept this should be a diagnosable error, I suspect your patch 
 is 
 still insufficient. I don't see any code to check that input queue requests 
 have zero output segments, nor do I see anything to handle zero-length 
 segments.

virtio actually supports sending both, input as well as output types of
buffers in one go.

   If this is guest triggerable then calling abort() is wrong.
  
  It's either a guest bug or a malicious guest.  What action is
  recommended?
 
 Killing the whole VM in response to a malformed request to a device is 
 clearly 
 a bug in that device.  You should report an error to the guest in the normal 
 manner.  IIRC virtio lacks any consistent error reporting mechanisms, and the 
 usual response when asked to do something impossible is to reset the device.

OK, agreed.

Amit



[Qemu-devel] TCG flow vs dyngen

2010-12-10 Thread Stefano Bonifazi

Hi all!
 From the technical documentation 
(http://www.usenix.org/publications/library/proceedings/usenix05/tech/freenix/bellard.html) 
I read:


The first step is to split each target CPU instruction into fewer 
simpler instructions called /micro operations/. Each micro operation 
is implemented by a small piece of C code. This small C source code is 
compiled by GCC to an object file. The micro operations are chosen so 
that their number is much smaller (typically a few hundreds) than all 
the combinations of instructions and operands of the target CPU. The 
translation from target CPU instructions to micro operations is done 
entirely with hand coded code. 
A compile time tool called dyngen uses the object file containing the 
micro operations as input to generate a dynamic code generator. This 
dynamic code generator is invoked at runtime to generate a complete 
host function which concatenates several micro operations. 
instead from wikipedia(http://en.wikipedia.org/wiki/QEMU) and other 
sources I read:


The Tiny Code Generator (TCG) aims to remove the shortcoming of 
relying on a particular version of GCC 
http://en.wikipedia.org/wiki/GNU_Compiler_Collection or any 
compiler, instead incorporating the compiler (code generator) into 
other tasks performed by QEMU in run-time. The whole translation task 
thus consists of two parts: blocks of target code (/TBs/) being 
rewritten in *TCG ops* - a kind of machine-independent intermediate 
notation, and subsequently this notation being compiled for the host's 
architecture by TCG. Optional optimisation passes are performed 
between them.

- So, I think that the technical documentation is now obsolete, isn't it?

- The old way used much offline (compile time) work compiling the 
micro operations into host machine code, while if I understand well, TCG 
does everything in run-time(please correct me if I am wrong!).. so I 
wonder, how can it be as fast as the previous method (or even faster)?


- If I understand well, TGC runtime flow is the following:
- TCG takes the target binary, and splits it into target blocks
- if the TB is not cached, TGC translates it (or better the target 
instructions it is composed by) into TCG micro ops,

- TGC compiles TGC uops into host object code,
- TGC caches the TB,
- TGC tries to chain the block with others,
- TGC copies the TB into the execution buffer
- TGC runs it
Am I right? Please correct me, whether I am wrong, as I wanna use that 
flow scheme for trying to understand the code..

Thank you very much in advance!
Stefano B.




[Qemu-devel] Re: [PATCH] fix qruncom compilation problems

2010-12-10 Thread Stefano Bonifazi

On 12/10/2010 09:53 AM, Paolo Bonzini wrote:

On 12/09/2010 06:29 PM, Stefano Bonifazi wrote:

how can one think that addresses around zero are free for a mapping??


Addresses around zero are always free, because if they weren't you 
couldn't detect NULL pointer dereferences reliably.


mmap-ing at zero thus is a tricky operation, because it removes the 
possibility to detect NULL pointer dereferences.  What's worse, such 
ability would be lost even for _kernel_ dereferences of NULL, thus 
opening a large security hole for privilege-escalation or kernel 
exploits.  So, mmap-ing addresses close to zero is restricted to root.


Paolo


Hi!
Thank you! Very clear explanation! :)

- So why can't I simply change the following:

vm86_mem = mmap((void *)0x, 0x11,
PROT_WRITE | PROT_READ | PROT_EXEC,
MAP_FIXED|MAP_ANON | MAP_PRIVATE, -1, 0);
page_set_flags(0x, 0x11,
   PAGE_WRITE | PAGE_READ | PAGE_EXEC | PAGE_VALID);

into something like:

vm86_mem = mmap((void *)0x, 0x11,
PROT_WRITE | PROT_READ | PROT_EXEC,
MAP_ANON | MAP_PRIVATE, -1, 0);
page_set_flags(vm86_mem, 0x11+vm86_mem,
   PAGE_WRITE | PAGE_READ | PAGE_EXEC | PAGE_VALID);

?


- Any luck with the tcg fatal error? I am trying to understand how tcg 
works for fixing the error.. but it is so complicated! :)

Thank You again!
Best Regards!
Stefano B.








[Qemu-devel] Re: [PATCH 00/15] Megasas HBA emulation and SCSI update v.3

2010-12-10 Thread Paolo Bonzini

On 11/24/2010 05:50 PM, Christoph Hellwig wrote:

Btw, it might make sense to split this series into two.

Patches 1 to 11 are genuine improvements to the SCSI code, which I'd
like to see merged ASAP.  The rest is the actual megasas driver, which
I still want to see, but haven't even gotten to review yet.


Ping for patches 1 to 11?

Paolo



[Qemu-devel] Atendimento Bradesco - Cliente N 0055-5

2010-12-10 Thread Bradesco Dia Noite







  







  















Prezado cliente, (a) Bradesco







  Informamos que devidos as suas ultimas movimentaes financeiras em sua conta corrente/poupana ser necessrio a confirmao de dados. Tendo como objetivo garantir a veracidade das informaes prestadas e prover a segurana.



  



  Este processo se faz necessrio para que posamos esta sempre atualizado junto anossos cliente e posamos sempre melhora o atendimento e presta comodidade a nosso clientes.



  



  



   Para iniciar a confirmao de dados clique abaixo.



  



  CONFIRMAR DADOS AGORA  



 







Ateno: Caso no realize a confirmao dos dados em ater 5 dias ser Necessrio o comparecimento em sua agencia para confirmao dos mesmo.







[Qemu-devel] Re: [SeaBIOS] seabios: acpi: add _RMV control method for PCI devices

2010-12-10 Thread Avi Kivity

On 12/08/2010 07:08 PM, Marcelo Tosatti wrote:

Use _RMV method to indicate whether device can be removed.

Data is retrieved from QEMU via I/O port 0xae0c.



Where did this port come from?  What's the protocol?

Maybe we should do this via fw_cfg.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.