Re: [PATCH v2] io_uring: fix short read slow path

2022-07-05 Thread Jens Axboe
On 7/5/22 7:28 AM, Stefan Hajnoczi wrote:
> On Fri, Jul 01, 2022 at 07:52:31AM +0900, Dominique Martinet wrote:
>> Stefano Garzarella wrote on Thu, Jun 30, 2022 at 05:49:21PM +0200:
>>>> so when we ask for more we issue an extra short read, making sure we go
>>>> through the two-short-reads path.
>>>> (Unfortunately I wasn't quite sure what to fiddle with to issue short
>>>> reads in the first place, I tried cutting one of the iovs short in
>>>> luring_do_submit() but I must not have been doing it properly as I ended
>>>> up with 0 return values which are handled by filling in with 0 (reads
>>>> after eof) and that didn't work well)
>>>
>>> Do you remember the kernel version where you first saw these problems?
>>
>> Since you're quoting my paragraph about testing two short reads, I've
>> never seen any that I know of; but there's also no reason these couldn't
>> happen.
>>
>> Single short reads have been happening for me with O_DIRECT (cache=none)
>> on btrfs for a while, but unfortunately I cannot remember which was the
>> first kernel I've seen this on -- I think rather than a kernel update it
>> was due to file manipulations that made the file eligible for short
>> reads in the first place (I started running deduplication on the backing
>> file)
>>
>> The oldest kernel I have installed right now is 5.16 and that can
>> reproduce it -- I'll give my laptop some work over the weekend to test
>> the still-maintained stable branches if that's useful.
> 
> Hi Dominique,
> Linux 5.16 contains commit 9d93a3f5a0c ("io_uring: punt short reads to
> async context"). The comment above QEMU's luring_resubmit_short_read()
> claims that short reads are a bug that was fixed by Linux commit
> 9d93a3f5a0c.
> 
> If the comment is inaccurate it needs to be fixed. Maybe short writes
> need to be handled too.
> 
> I have CCed Jens and the io_uring mailing list to clarify:
> 1. Are short IORING_OP_READV reads possible on files/block devices?
> 2. Are short IORING_OP_WRITEV writes possible on files/block devices?

In general we try very hard to avoid them, but if e.g. we get a short read
or write from blocking context (e.g. io-wq), then io_uring does return
that. There's really not much we can do there; it seems futile to retry
IO that was issued just as it would have been from a normal blocking
syscall and yet still came up short.

-- 
Jens Axboe




Re: io_uring possibly the culprit for qemu hang (linux-5.4.y)

2020-10-17 Thread Jens Axboe
On 10/17/20 8:29 AM, Ju Hyung Park wrote:
> Hi Jens.
> 
> On Sat, Oct 17, 2020 at 3:07 AM Jens Axboe  wrote:
>>
>> Would be great if you could try 5.4.71 and see if that helps for your
>> issue.
>>
> 
> Oh wow, yeah it did fix the issue.
> 
> I'm able to reliably turn off and start the VM multiple times in a row.
> Double checked by confirming QEMU is dynamically linked to liburing.so.1.
> 
> Looks like those 4 io_uring fixes helped.

Awesome, thanks for testing!

-- 
Jens Axboe




Re: io_uring possibly the culprit for qemu hang (linux-5.4.y)

2020-10-16 Thread Jens Axboe
On 10/16/20 12:04 PM, Ju Hyung Park wrote:
> A small update:
> 
> As per Stefano's suggestion, disabling io_uring support from QEMU from
> the configuration step did fix the problem and I'm no longer having
> hangs.
> 
> Looks like it __is__ an io_uring issue :(

Would be great if you could try 5.4.71 and see if that helps for your
issue.

-- 
Jens Axboe




Re: [Qemu-devel] virtio_blk: fix defaults for max_hw_sectors and max_segment_size

2014-11-26 Thread Jens Axboe
...

-- 
Jens Axboe




Re: [Qemu-devel] virtio_blk: fix defaults for max_hw_sectors and max_segment_size

2014-11-26 Thread Jens Axboe
On 11/26/2014 01:51 PM, Mike Snitzer wrote:
 On Wed, Nov 26 2014 at  2:48pm -0500,
 Jens Axboe ax...@kernel.dk wrote:
 
 On 11/21/2014 08:49 AM, Mike Snitzer wrote:
 On Fri, Nov 21 2014 at  4:54am -0500,
 Christoph Hellwig h...@infradead.org wrote:

 On Thu, Nov 20, 2014 at 02:00:59PM -0500, Mike Snitzer wrote:
 virtio_blk incorrectly established -1U as the default for these
 queue_limits.  Set these limits to sane default values to avoid crashing
 the kernel.  But the virtio-blk protocol should probably be extended to
 allow proper stacking of the disk's limits from the host.

 This change fixes a crash that was reported when virtio-blk was used to
 test linux-dm.git commit 604ea90641b4 (dm thin: adjust max_sectors_kb
 based on thinp blocksize) that will initially set max_sectors to
 max_hw_sectors and then rounddown to the first power-of-2 factor of the
 DM thin-pool's blocksize.  Basically that commit assumes drivers don't
 suck when establishing max_hw_sectors so it acted like a canary in the
 coal mine.

 Is that a crash in the host or guest?  What kind of mishandling did you
 see?  Unless the recent virtio standard changed anything the host
 should be able to handle our arbitrary limits, and even if it doesn't
 that something we need to hash out with qemu and the virtio standards
 folks.

 Some good news: this guest crash isn't an issue with recent kernels (so
 upstream, fedora 20, RHEL7, etc aren't impacted -- Jens feel free to
 drop my virtio_blk patch; even though some of its limits are clearly
 broken I'll defer to the virtio_blk developers on the best way forward
 -- sorry for the noise!).

 The BUG I saw only seems to impact RHEL6 kernels so far (note to self,
 actually _test_ on upstream before reporting a crash against upstream!)

 [root@RHEL-6 ~]# echo 1073741824 > /sys/block/vdc/queue/max_sectors_kb
 [root@RHEL-6 ~]# lvs

 Message from syslogd@RHEL-6 at Nov 21 15:32:15 ...
  kernel:Kernel panic - not syncing: Fatal exception

 Here is the RHEL6 guest crash, just for full disclosure:

 kernel BUG at fs/direct-io.c:696!
 invalid opcode:  [#1] SMP
 last sysfs file: /sys/devices/virtual/block/dm-4/dev
 CPU 0
 Modules linked in: nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipv6 ext2 
 dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio libcrc32c dm_mirror 
 dm_region_hash dm_log dm_mod microcode virtio_balloon i2c_piix4 i2c_core 
 virtio_net ext4 jbd2 mbcache virtio_blk virtio_pci virtio_ring virtio 
 pata_acpi ata_generic ata_piix [last unloaded: speedstep_lib]

 Pid: 1679, comm: lvs Not tainted 2.6.32 #6 Bochs Bochs
 RIP: 0010:[811ce336]  [811ce336] 
 __blockdev_direct_IO_newtrunc+0x986/0x1270
 RSP: 0018:88011a11ba48  EFLAGS: 00010287
 RAX:  RBX: 8801192fbd28 RCX: 1000
 RDX: ea0003b3d218 RSI: 88011aac4300 RDI: 880118572378
 RBP: 88011a11bbe8 R08:  R09: 
 R10:  R11:  R12: 8801192fbd00
 R13:  R14: 880118c3cac0 R15: 
 FS:  7fde78bc37a0() GS:88002820() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 012706f0 CR3: 00011a432000 CR4: 000407f0
 DR0:  DR1:  DR2: 
 DR3:  DR6: 0ff0 DR7: 0400
 Process lvs (pid: 1679, threadinfo 88011a11a000, task 8801185a4aa0)
 Stack:
  88011a11bb48 88011a11baa8 8801000c 88011a11bb18
 d   88011a11bdc8 88011a11beb8
 d 000c1a11baa8 880118c3cb98  18c3ccb8
 Call Trace:
  [811c9e90] ? blkdev_get_block+0x0/0x20
  [811cec97] __blockdev_direct_IO+0x77/0xe0
  [811c9e90] ? blkdev_get_block+0x0/0x20
  [811caf17] blkdev_direct_IO+0x57/0x60
  [811c9e90] ? blkdev_get_block+0x0/0x20
  [8112619b] generic_file_aio_read+0x6bb/0x700
  [811cba60] ? blkdev_get+0x10/0x20
  [811cba70] ? blkdev_open+0x0/0xc0
  [8118af4f] ? __dentry_open+0x23f/0x360
  [811ca2d1] blkdev_aio_read+0x51/0x80
  [8118dc6a] do_sync_read+0xfa/0x140
  [8109eaf0] ? autoremove_wake_function+0x0/0x40
  [811ca22c] ? block_ioctl+0x3c/0x40
  [811a34c2] ? vfs_ioctl+0x22/0xa0
  [811a3664] ? do_vfs_ioctl+0x84/0x580
  [8122cee6] ? security_file_permission+0x16/0x20
  [8118e625] vfs_read+0xb5/0x1a0
  [8118e761] sys_read+0x51/0x90
  [810e5aae] ? __audit_syscall_exit+0x25e/0x290
  [8100b072] system_call_fastpath+0x16/0x1b
 Code: fe ff ff c7 85 fc fe ff ff 00 00 00 00 48 89 95 10 ff ff ff 8b 95 34 
 ff ff ff e8 46 ac ff ff 3b 85 34 ff ff ff 0f 84 fc 02 00 00 0f 0b eb fe 
 8b 9d 34 ff ff ff 8b 85 30 ff ff ff 01 d8 85 c0 0f
 RIP  [811ce336] __blockdev_direct_IO_newtrunc+0x986/0x1270
  RSP 88011a11ba48
 ---[ end trace 73be5dcaf8939399 ]---

Re: [Qemu-devel] virtio_blk: fix defaults for max_hw_sectors and max_segment_size

2014-11-26 Thread Jens Axboe
On 11/26/2014 02:51 PM, Mike Snitzer wrote:
 On Wed, Nov 26 2014 at  3:54pm -0500,
 Jens Axboe ax...@kernel.dk wrote:
 
 On 11/26/2014 01:51 PM, Mike Snitzer wrote:
 On Wed, Nov 26 2014 at  2:48pm -0500,
 Jens Axboe ax...@kernel.dk wrote:


 That code isn't even in mainline, as far as I can tell...

 Right, it is old RHEL6 code.

 But I've yet to determine what changed upstream that enables this to
 just work with a really large max_sectors (I haven't been looking
 either).

 Kind of hard for the rest of us to say, since it's triggering a BUG in
 code we don't have :-)
 
 I never asked you or others to weigh in on old RHEL6 code.  Once I
 realized upstream worked even if max_sectors is _really_ high I said
 sorry for the noise.
 
 But while you're here, I wouldn't mind getting your take on virtio-blk
 setting max_hw_sectors to -1U.
 
 As I said in my original reply to mst: it only makes sense to set a
 really high initial upper bound like that in a driver if that driver
 goes on to stack an underlying device's limit.

-1U should just work, IMHO, there's no reason we should need to cap it
at some synthetic value. That said, it seems it should be one of those
parameters that should be negotiated up and set appropriately.

-- 
Jens Axboe




Re: [Qemu-devel] Linux multiqueue block layer thoughts

2013-11-28 Thread Jens Axboe
On Wed, Nov 27 2013, Stefan Hajnoczi wrote:
 I finally got around to reading the Linux multiqueue block layer paper
 and wanted to share some thoughts about how it relates to QEMU and
 dataplane/QContext:
 http://kernel.dk/blk-mq.pdf
 
 I think Jens has virtio-blk multiqueue patches.  So let's imagine that
 the virtio-blk device has multiple virtqueues.  (virtio-scsi is
 already multiqueue BTW.)
 
 The paper focusses on two queue mappings: 1 queue per core and 1 queue
 per node.  In both cases the idea is to keep the block I/O code path
 localized.  This makes block I/O scale as the number of CPUs
 increases.
 
 In QEMU we'd want to set up a mapping for the virtio-blk mq device:
 each guest vcpu or guest node has a virtio-blk virtqueue which is
 serviced by a dataplane/QContext thread.
 
 QEMU would then process requests across these queues in parallel,
 although currently BlockDriverState is not thread-safe.  At least for
 raw we should be able to submit requests in parallel from QEMU.
 
 Unfortunately there are some complications in the QEMU block layer:
 QEMU's own accounting, request tracking, and throttling features are
 global.  We'd need to eventually do something similar to the
 multiqueue block layer changes in the kernel to detangle this state.
 
 Doing multiqueue for image formats is much more challenging - we'd
 have to tackle thread-safety in qcow2 and friends.  For network block
 drivers like Gluster or NBD it's also not 100% clear what the best
 approach is.  But I think the target here is local SSDs that are
 capable of high IOPs together with an SMP guest.
 
 At the end of all this we'd arrive at the following architecture:
 1. Guest virtio device has multiple queues (1 per node or vcpu).
 2. QEMU has multiple dataplane/QContext threads that process virtqueue
 kicks, they are bound to host CPUs/nodes.
 3. Linux kernel has multiqueue block I/O.

I think that sounds very reasonable. Let me know if there's anything you
need help or advice with.

 Jens: when experimenting with multiqueue virtio-blk, how far did you
 modify QEMU to eliminate global request processing state from block.c?

I did very little scaling testing on virtio-blk, it was more a demo case
for conversion than anything else. So probably not of much use to what
you are looking for...

-- 
Jens Axboe




[Qemu-devel] Re: [PATCH RFC] virtio_blk: Use blk-iopoll for host-guest notify

2010-05-18 Thread Jens Axboe
On Tue, May 18 2010, Stefan Hajnoczi wrote:
 On Fri, May 14, 2010 at 05:30:56PM -0500, Brian Jackson wrote:
  Any preliminary numbers? latency, throughput, cpu use? What about comparing 
  different weights?
 
 I am running benchmarks and will report results when they are in.

I'm very interested as well, I have been hoping for some more adoption
of this. I have mptsas and mpt2sas patches pending as well.

I have not done a full and exhaustive weight analysis, so note me
down for wanting such an analysis on virtio_blk as well.

-- 
Jens Axboe




[Qemu-devel] Re: [PATCH] virtio-spec: document block CMD and FLUSH

2010-05-04 Thread Jens Axboe
On Tue, May 04 2010, Rusty Russell wrote:
 On Fri, 19 Feb 2010 08:52:20 am Michael S. Tsirkin wrote:
  I took a stab at documenting CMD and FLUSH request types in virtio
  block.  Christoph, could you look over this please?
  
  I note that the interface seems full of warts to me,
  this might be a first step to cleaning them.
 
 ISTR Christoph had withdrawn some patches in this area, and was waiting
 for him to resubmit?
 
 I've given up on figuring out the block device.  What seem to me to be sane
 semantics along the lines of memory barriers are foreign to disk people: they
 want (and depend on) flushing everywhere.
 
 For example, tdb transactions do not require a flush, they only require what
 I would call a barrier: that prior data be written out before any future data.
 Surely that would be more efficient in general than a flush!  In fact, TDB
 wants only writes to *that file* (and metadata) written out first; it has no
 ordering issues with other I/O on the same device.
 
 A generic I/O interface would allow you to specify "this request depends on
 these outstanding requests" and leave it at that.  It might have some sync
 flush command for dumb applications and OSes.  The userspace API might not
 be as precise and only allow such a barrier against all prior writes on
 this fd.
 
 ISTR someone mentioning a desire for such an API years ago, so CC'ing the
 usual I/O suspects...

It would be nice to have a fuller API for this, but the reality is
that only the flush approach is really workable. Even just strict
ordering of requests could only be supported on SCSI, and even there the
kernel still lacks proper guarantees on error handling to prevent
reordering.

-- 
Jens Axboe





Re: [Qemu-devel] cdrom disc type - is this patch correct? (unbreaks recent FreeBSD guest's -cdrom access)

2007-11-19 Thread Jens Axboe
 0x01
 #define MST_SEP_MUTE 0x02
 
 u_int16_t   max_read_speed; /* max raw data rate in bytes/1000 */
 u_int16_t   max_vol_levels; /* number of discrete volume levels */
 u_int16_t   buf_size;   /* internal buffer size in bytes/1024 */
 u_int16_t   cur_read_speed; /* current data rate in bytes/1000  */
 
 u_int8_treserved3;
 u_int8_tmisc;
 
 u_int16_t   max_write_speed;/* max raw data rate in bytes/1000 */
 u_int16_t   cur_write_speed;/* current data rate in bytes/1000  */
 u_int16_t   copy_protect_rev;
 u_int16_t   reserved4;
 };
 
 [...]
 
  and in
   
 http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/ata/atapi-cd.c?rev=1.193.2.1;content-type=text%2Fx-cvsweb-markup
 a check is done like this:
 
 [...]
 static int
 acd_geom_access(struct g_provider *pp, int dr, int dw, int de)
 {
 device_t dev = pp->geom->softc;
 struct acd_softc *cdp = device_get_ivars(dev);
 int timeout = 60, track;
 
 /* check for media present, waiting for loading medium just in case */
 while (timeout--) {
   if (!acd_mode_sense(dev, ATAPI_CDROM_CAP_PAGE,
    (caddr_t)&cdp->cap, sizeof(cdp->cap)) &&
    cdp->cap.page_code == ATAPI_CDROM_CAP_PAGE) {
    if ((cdp->cap.medium_type == MST_FMT_NONE) ||
    (cdp->cap.medium_type == MST_NO_DISC) ||
    (cdp->cap.medium_type == MST_DOOR_OPEN) ||
    (cdp->cap.medium_type == MST_FMT_ERROR))
   return EIO;
   else
   break;
   }
   pause("acdld", hz / 2);
 }
 [...]
 
  There have been reports of this also being broken on real hw tho,
  like,
  http://lists.freebsd.org/pipermail/freebsd-current/2007-November/079760.html
  so I'm not sure what to make of this...

Well if you ask me (I used to maintain the linux atapi driver), the
freebsd driver suffers from a classic case of 'but the specs says so!'
syndrome. In this case it's even ancient documentation. Drivers should
never try to be 100% spec oriented, they also need a bit of real life
sensibility. The code you quote right above this text is clearly too
anal.

-- 
Jens Axboe





Re: [Qemu-devel] cdrom disc type - is this patch correct? (unbreaks recent FreeBSD guest's -cdrom access)

2007-11-14 Thread Jens Axboe
On Tue, Nov 13 2007, Juergen Lock wrote:
 Hi!
 
  Yesterday I learned that FreeBSD 7.0-BETA2 guests will no longer
 read from the emulated cd drive, apparently because of this commit:
   
 http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/ata/atapi-cd.c.diff?r1=1.193;r2=1.193.2.1
 The following patch file added to the qemu-devel port fixes the issue
 for me, is it also correct?   (making the guest see a dvd in the drive
 when it is inserted, previously it saw the drive as empty.)
 
  The second hunk is already in qemu cvs so remove it if you want to
 test on that.  ISO used for testing:
   
 ftp://ftp.freebsd.org:/pub/FreeBSD/ISO-IMAGES-i386/7.0/7.0-BETA2-i386-disc1.iso
 (test by either selecting fixit-cdrom or by trying to install, just
 booting it will always work because that goes thru the bios.)
 
 Index: qemu/hw/ide.c
 @@ -1339,6 +1341,8 @@
  case 0x2a:
  cpu_to_ube16(buf[0], 28 + 6);
  buf[2] = 0x70;
 +if (bdrv_is_inserted(s->bs))
 +buf[2] = 0x40;

medium type code has been obsoleted since at least 1999. Looking back at
even older docs, 0x70 is 'door closed, no disc present'. 0x40 is a
reserved value though, I would not suggest using that. Given that
freebsd breaks, my suggested change would be the below - keep the 0x70 for
when no disc is really inserted, but don't set anything if there is.

diff --git a/hw/ide.c b/hw/ide.c
index 5f76c27..52d4c78 100644
--- a/hw/ide.c
+++ b/hw/ide.c
@@ -1344,7 +1344,10 @@ static void ide_atapi_cmd(IDEState *s)
 break;
 case 0x2a:
 cpu_to_ube16(buf[0], 28 + 6);
-buf[2] = 0x70;
+   if (!bdrv_is_inserted(s->bs))
+   buf[2] = 0x70;
+   else
+   buf[2] = 0;
 buf[3] = 0;
 buf[4] = 0;
 buf[5] = 0;

-- 
Jens Axboe





Re: [Qemu-devel] qemu-i386 segfaults running hello world.

2007-06-26 Thread Jens Axboe
On Sun, Jun 24 2007, Rob Landley wrote:
 On Saturday 23 June 2007 07:00:03 Jens Axboe wrote:
   I realize releases are a bit out of fashion, but is there any way to go
   through cvs to track down which checkin broke this stuff?  I can do it in
   git, mercurial, or subversion.  But cvs isn't really set up for this sort
   of thing...
 
  git clone git://git.kernel.dk/data/git/qemu.git
 
  and bisect on that then. It's a continued git import of the cvs repo,
  gets updated every night.
 
 Oh _cool_.  Any way to get a mention of that on the qemu web page?

I don't mind, it's been mentioned on some Japanese qemu-win page for
quite some time.

-- 
Jens Axboe





Re: [Qemu-devel] qemu-i386 segfaults running hello world.

2007-06-23 Thread Jens Axboe
On Sat, Jun 23 2007, Rob Landley wrote:
 On Friday 22 June 2007 18:31:20 Rob Landley wrote:
  Ok, it's a more fundamental problem:
 
  [EMAIL PROTECTED]:/sys$ qemu-i386
  Segmentation fault (core dumped)
 
  Nothing to do with the program it's trying to run, it segfaults with no
  arguments.
 
  Is anybody else seeing this?
 
  Rob
 
 So I'm vaguely suspecting that some of the dynamic linker magic this thing's 
 doing is contributing to the screw up (or at least the complexity of 
 debugging it), so I thought I'd statically link.
 
 If I ./configure --static the result doesn't build, it dies during linking.  
 Is this expected?  (Do I need to install .a versions of all the alsa and x11 
 libraries to make that work?)
 
 I realize releases are a bit out of fashion, but is there any way to go 
 through cvs to track down which checkin broke this stuff?  I can do it in 
 git, mercurial, or subversion.  But cvs isn't really set up for this sort of 
 thing...

git clone git://git.kernel.dk/data/git/qemu.git

and bisect on that then. It's a continued git import of the cvs repo,
gets updated every night.

-- 
Jens Axboe





Re: [Qemu-devel] qemu/hw ide.c

2007-02-19 Thread Jens Axboe
On Mon, Feb 19 2007, Thiemo Seufer wrote:
 Thiemo Seufer wrote:
 [snip]
 Why is nsector uint32_t to begin with?

Because nobody sent a patch to fix it, I figure.
   
   Actually I seem to recall it's because it's being overloaded for
   requests that are > 256 sectors. It would be a good cleanup to get rid
   of that and turn nsector into a proper uint8_t.
  
  It appears to use 16k bits in some cases. I won't fiddle with it myself
 
 16bits, or 64k, that is.

Yeah, it's for larger requests. It would be nice to track that elsewhere,
though. I'll take a look at it.

-- 
Jens Axboe



___
Qemu-devel mailing list
Qemu-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/qemu-devel


Re: [Qemu-devel] Ensuring data is written to disk

2006-08-01 Thread Jens Axboe
On Tue, Aug 01 2006, Jamie Lokier wrote:
  Of course, guessing the disk drive write buffer size and trying not to kill
  system I/O performance with all these writes is another question entirely
  ... sigh !!!
 
 If you just want to evict all data from the drive's cache, and don't
 actually have other data to write, there is a CACHEFLUSH command you
 can send to the drive which will be more dependable than writing as
 much data as the cache size.

Exactly, and this is what the OS fsync() should do once the drive has
acknowledged that the data has been written (to cache). At least
reiserfs w/barriers on Linux does this.

Random write tricks are worthless, as you cannot make any assumptions
about what the drive firmware will do.

-- 
Jens Axboe





Re: [Qemu-devel] Ensuring data is written to disk

2006-08-01 Thread Jens Axboe
On Tue, Aug 01 2006, Jamie Lokier wrote:
 Jens Axboe wrote:
  On Tue, Aug 01 2006, Jamie Lokier wrote:
   Of course, guessing the disk drive write buffer size and trying not to
   kill system I/O performance with all these writes is another question
   entirely ... sigh !!!
   
   If you just want to evict all data from the drive's cache, and don't
   actually have other data to write, there is a CACHEFLUSH command you
   can send to the drive which will be more dependable than writing as
   much data as the cache size.
  
  Exactly, and this is what the OS fsync() should do once the drive has
  acknowledged that the data has been written (to cache). At least
  reiserfs w/barriers on Linux does this.
 
 1. Are you sure this happens, w/ reiserfs on Linux, even if the disk
is an SATA or SCSI type that supports ordered tagged commands?  My
understanding is that barriers force an ordering between write
commands, and that CACHEFLUSH is used only with disks that don't have
more sophisticated write ordering commands.  Is the data still
committed to the disk platter before fsync() returns on those?

No SATA drive supports ordered tags, that is a SCSI-only property.
Barrier writes are a separate thing; probably reiser ties the two
together because it needs to know if the flush cache command works as
expected. Drives are funny sometimes...

For SATA you always need at least one cache flush (you need one if you
have the FUA/Forced Unit Access write available, you need two if not).

 2. Do you know if ext3 (in ordered mode) w/barriers on Linux does it too,
for in-place writes which don't modify the inode and therefore don't
have a journal entry?

I don't think that it does, however it may have changed. A quick grep
would seem to indicate that it has not changed.

 On Darwin, fsync() does not issue CACHEFLUSH to the drive.  Instead,
 it has an fcntl F_FULLFSYNC which does that, which is documented in
 Darwin's fsync() page as working with all Darwin's filesystems,
 provided the hardware honours CACHEFLUSH or the equivalent.

That seems somewhat strange to me, I'd much rather be able to say that
fsync() itself is safe. An added fcntl hack doesn't really help the
applications that already rely on the correct behaviour.

 From what little documentation I've found, on Linux it appears to be
 much less predictable.  It seems that some filesystems, with some
 kernel versions, and some mount options, on some types of disk, with
 some drive settings, will commit data to a platter before fsync()
 returns, and others won't.  And an application calling fsync() has no
 easy way to find out.  Have I got this wrong?

Nope, I'm afraid that is pretty much true... reiser and (it looks like,
just grepped) XFS have the best support for this. Unfortunately I don't think
the user can actually tell if the OS does the right thing, outside of
running a blktrace and verifying that it actually sends a flush cache
down the queue.

 ps. (An aside question): do you happen to know of a good patch which
 implements IDE barriers w/ ext3 on 2.4 kernels?  I found a patch by
 googling, but it seemed that the ext3 parts might not be finished, so
 I don't trust it.  I've found turning off the IDE write cache makes
 writes safe, but with a huge performance cost.

The hard part (the IDE code) can be grabbed from the SLES8 latest
kernels, I developed and tested the code there. That also has the ext3
bits, IIRC.

-- 
Jens Axboe





Re: [Qemu-devel] Re: [RFC][PATCH] make sure disk writes actually hit disk

2006-07-31 Thread Jens Axboe
On Fri, Jul 28 2006, Rik van Riel wrote:
 Anthony Liguori wrote:
 
 Right now Fabrice is working on rewriting the block API to be
 asynchronous.  There's been quite a lot of discussion about why using
 threads isn't a good idea for this
 
 Agreed, AIO is the way to go in the long run.
 
 With a proper async API, is there any reason why we would want this to be
 tunable?  I don't think there's much of a benefit of prematurely claiming
 a write is complete especially once the SCSI emulation can support
 multiple simultaneous requests.
 
 You're right.  This O_SYNC bandaid should probably stay in place
 to prevent data corruption, until the AIO framework is ready to
 be used.

O_SYNC is horrible, it'll totally kill performance. QEMU is basically
just a write cache enabled disk and it supports disk flushes as well. So
essentially it's the OS on top of QEMU that needs to take care for
flushing data out, like using barriers on the file system and
propagating fsync() properly down.

-- 
Jens Axboe





Re: [Qemu-devel] [RFC][PATCH] make sure disk writes actually hit disk

2006-07-31 Thread Jens Axboe
On Sat, Jul 29 2006, Paul Brook wrote:
  Easy to do with the fsync infrastructure, but probably not worth
  doing since people are working on the AIO I/O backend, which would
  allow multiple outstanding writes from a guest.  That, in turn,
  means I/O completion in the guest can be done when the data really
  hits disk, but without a performance impact.
 
 Not entirely true. That only works if you allow multiple guest IO
 requests in parallel, ie. some form of tagged command queueing. This
 requires either improving the SCSI emulation, or implementing SATA
 emulation. AFAIK parallel IDE doesn't support command queueing.

Parallel IDE does support queuing, but it never gained widespread
support and the standard is quite broken as well (which is probably
_why_ it never got much adoption). It was also quite suboptimal from a
CPU efficiency POV.

Besides, async completion in itself is not enough, QEMU still needs to
honor ordered writes (barriers) and cache flushes.

 My impression was that the initial AIO implementation is just
 straight serial async operation. IO wouldn't actually go any faster,
 it just means the guest can do something else while it's waiting.

Depends on the app, if the io workload is parallel then you should see a
nice speedup as well (as QEMU is then no longer the serializing
bottleneck).

-- 
Jens Axboe





Re: [Qemu-devel] [RFC][PATCH] make sure disk writes actually hit disk

2006-07-31 Thread Jens Axboe
On Sat, Jul 29 2006, Rik van Riel wrote:
 Fabrice Bellard wrote:
 Hi,
 
 Using O_SYNC for disk image access is not acceptable: QEMU relies on the 
 host OS to ensure that the data is written correctly.
 
 This means that write ordering is not preserved, and on a power
 failure any data written by qemu (or Xen fully virt) guests may
 not be preserved.
 
 Applications running on the host can count on fsync doing the
 right thing, meaning that if they call fsync, the data *will*
 have made it to disk.  Applications running inside a guest have
 no guarantees that their data is actually going to make it
 anywhere when fsync returns...

Then the guest OS is broken. Applications issuing an fsync() should
issue a flush (or write-through), the guest OS should propagate this
knowledge through its I/O stack and the QEMU hard drive should get
notified. If the guest OS isn't doing what it's supposed to, QEMU can't
help you. And, in fact, running your app on the same host OS with write
back caching would screw you as well. The timing window will probably be
larger with QEMU, but the problem is essentially the same.

-- 
Jens Axboe





Re: [Qemu-devel] [RFC][PATCH] make sure disk writes actually hit disk

2006-07-31 Thread Jens Axboe
On Mon, Jul 31 2006, Jonas Maebe wrote:
 
 On 31 jul 2006, at 09:08, Jens Axboe wrote:
 
 Applications running on the host can count on fsync doing the
 right thing, meaning that if they call fsync, the data *will*
 have made it to disk.  Applications running inside a guest have
 no guarantees that their data is actually going to make it
 anywhere when fsync returns...
 
 Then the guest OS is broken.
 
 The problem is that supposedly many OS'es are broken in this way. See
 http://lists.apple.com/archives/darwin-dev/2005/Feb/msg00072.html

Well, as others have written here as well, then their OS are broken on
real hardware as well.

I wouldn't be averse to a QEMU work-around, but O_SYNC is clearly not a
viable alternative! We could make QEMU behave more like a real hard
drive when it has aio support, flushing dirty cache out in a manner
more closely mimicking what a drive would do instead of relying on the
page cache writeout deciding to write it out.

-- 
Jens Axboe





Re: [Qemu-devel] [RFC][PATCH] make sure disk writes actually hit disk

2006-07-31 Thread Jens Axboe
On Mon, Jul 31 2006, andrzej zaborowski wrote:
 On 30/07/06, Jamie Lokier [EMAIL PROTECTED] wrote:
 Rik van Riel wrote:
  This may look like hair splitting, but so far I've lost a
  (test) postgresql database to this 3 times already.  Not getting
  the guest application's data to disk when the application calls
  fsync is a recipe for disaster.
 
 Exactly the same thing happens with real IDE disks if IDE write
 caching (on the drive itself) is enabled, which it is by default.  It
 is rarer, but it happens.
 
 The little difference with QEMU is that there are two caches above it:
 the host OS'es software cache and the IDE hardware cache. When a guest
 OS flushes its own software cache its precious data goes to the host's
 software cache while the guest thinks it's already in the IDE cache. This
 is of course of less importance because data in both caches (hard- and
 software) is lost when the power is cut off.

But the drive cache does not let the dirty data linger for as long as
the OS page/buffer cache does.

 IMHO what really makes IO unreliable in QEMU is that IO errors on the
 host are not reported to the guest by the IDE emulation and there's an
 exact place in hw/ide.c where they are arrogantly ignored.

Send a patch, I'm pretty sure nobody would disagree :-)

-- 
Jens Axboe





Re: [Qemu-devel] Re: high CPU load / async IO?

2006-07-26 Thread Jens Axboe
On Tue, Jul 25 2006, Fabrice Bellard wrote:
 Jens Axboe wrote:
 On Tue, Jul 25 2006, Sven Köhler wrote:
 
 So the current thread-based async dma patch is really just the
 wrong long term solution.  A more long term solution is likely in
 the works.  It requires quite a bit of code modification though.
 
 I see. So in other words:
 
 don't ask for simple async I/O now. The more complex and flexible
 solution will follow soon.
 
 Yes, hopefully really soon.
 
 So i will wait patiently :-)
 
 
 Is anyone actively working on this, or is it just speculation? I'd
 greatly prefer (and might write one, if no one is working on it and
 Fabrice would take it) a libaio version, since that'll surely perform
 the best on Linux. But a posixaio version might be saner, as that
 should work on other operating systems as well.
 
 Fabrice, can you let people know what you would prefer?
 
 I am working on an implementation and the first version will use the
 posix aio and possibly the Windows ReadFile/WriteFile overlapped I/Os.
 Anthony Liguori got a pre version of the code, but it is not
 commitable yet.

Sounds good, so at least it's on its way :-)
It's one of those big items left on the TODO, so it will be good to see
it go in. Then one should implement an ahci host controller for queued
command support next...

-- 
Jens Axboe





Re: [Qemu-devel] Re: high CPU load / async IO?

2006-07-26 Thread Jens Axboe
On Wed, Jul 26 2006, Paul Brook wrote:
  Sounds good, so at least it's on its way :-)
  It's one of those big items left on the TODO, so it will be good to
  see it go in. Then one should implement an ahci host controller for
  queued command support next...
 
 Or use the scsi emulation :-)

Ah, did not know that queueing was fully implemented there yet!

-- 
Jens Axboe





Re: [Qemu-devel] Re: high CPU load / async IO?

2006-07-26 Thread Jens Axboe
On Wed, Jul 26 2006, Paul Brook wrote:
 On Wednesday 26 July 2006 13:23, Jens Axboe wrote:
  On Wed, Jul 26 2006, Paul Brook wrote:
Sounds good, so at least it's on its way :-)
It's one of those big items left on the TODO, so it will be good to see
it go in. Then one should implement an ahci host controller for queued
command support next...
  
   Or use the scsi emulation :-)
 
  Ah, did not know that queueing was fully implemented there yet!
 
 It isn't, but it's nearer than the SATA emulation!

ahci wouldn't be too much work, but definitely more so than finishing
the scsi bits!

-- 
Jens Axboe





Re: [Qemu-devel] Re: high CPU load / async IO?

2006-07-26 Thread Jens Axboe
On Wed, Jul 26 2006, Sven Köhler wrote:
  Sounds good, so at least it's on its way :-)
  It's one of those big items left on the TODO, so it will be good to
  see it go in. Then one should implement an ahci host controller for
  queued command support next...
  Or use the scsi emulation :-)
  Ah, did not know that queueing was fully implemented there yet!
  It isn't, but it's nearer than the SATA emulation!
  
  ahci wouldn't be too much work, but definitely more so than finishing
  the scsi bits!
 
 That sounds great! I feel like my dreams are coming true.
 
 
 BTW: Fabrice said he will use POSIX AIO (I guess he means
 http://www.bullopensource.org/posix/ in the case of Linux, right?)

Well, I would assume that he would just use the glibc posix aio, which
is suboptimal, but at least the code can be reused. The bull project
looks like it's trying to mimic posix aio on top of linux aio, so (if
they got the details right) that should be faster. I didn't check their
sources, though. You should be able to use the bull stuff with qemu; it
would most likely just overload the glibc functions for posix aio.

 Which other OS do also support the POSIX AIO API?

No idea really, but I would guess any unixy OS out there.

-- 
Jens Axboe





Re: [Qemu-devel] Re: high CPU load / async IO?

2006-07-25 Thread Jens Axboe
On Tue, Jul 25 2006, Sven Köhler wrote:
  So the current thread-based async dma patch is really just the wrong long
  term solution.  A more long term solution is likely in the works.  It
  requires quite a bit of code modification though.
 
  I see. So in other words:
 
  don't ask for simple async I/O now. The more complex and flexible
  solution will follow soon.
  
  Yes, hopefully really soon.
 
 So i will wait patiently :-)

Is anyone actively working on this, or is it just speculation? I'd
greatly prefer (and might write one, if no one is working on it and
Fabrice would take it) a libaio version, since that'll surely perform
the best on Linux. But a posixaio version might be saner, as that should
work on other operating systems as well.

Fabrice, can you let people know what you would prefer?

-- 
Jens Axboe





Re: [Qemu-devel] Pentium D with guest Ubuntu 6.06 server kernel panic with kqemu

2006-07-07 Thread Jens Axboe
On Fri, Jul 07 2006, Joachim Henke wrote:
 Yes, this patch was included, but it doesn't solve that problem. As  
 this message [http://www.mail-archive.com/qemu-devel@nongnu.org/ 
 msg03972.html] states, the 'monitor' and the 'mwait' instructions  
 have not been added. But your guest OS assumes them to be present,  
 because your host cpu has the MONITOR flag set in CPUID.
 
 Jo.
 
 R. Armiento wrote:
 The error looks very similar to the one reported here:
   http://www.mail-archive.com/qemu-devel@nongnu.org/msg03964.html
 But I believe that reported issue should not appear in recent qemu,  
 since SSE3 is now emulated (right?). (At least the patch in the end  
 of that thread seems to already be included in the sources?)
 
 So, my hypothesis is that there is some other feature that appears  
 in my host CPUID, which the booting linux image tries to make use  
 of, but which qemu does not emulate.

Until that gets fixed up, you can boot with idle=halt.

-- 
Jens Axboe





Re: RE : [Qemu-devel] cvttps2dq, movdq2q, movq2dq incorrect behaviour

2006-06-20 Thread Jens Axboe
On Tue, Jun 20 2006, malc wrote:
 On Tue, 20 Jun 2006, Sylvain Petreolle wrote:
 
  --- Julian Seward [EMAIL PROTECTED] a écrit :
 
 The SSE2 instructions cvttps2dq, movdq2q, movq2dq do not behave
 correctly, as shown by the attached program.  It should print
 
   cvttps2dq_1 ... ok
   cvttps2dq_2 ... ok
   movdq2q_1 ... ok
   movq2dq_1 ... ok
 
 
 
 I tried your program on my linux station :
 CPU: AMD Athlon(tm) XP 1600+ stepping 02
 
 [EMAIL PROTECTED] qemu]$ gcc --version
 gcc (GCC) 4.1.1 20060525 (Red Hat 4.1.1-1)
 
 [EMAIL PROTECTED] qemu]$ gcc -msse2 sse2test.c -o sse2test
 [EMAIL PROTECTED] qemu]$ ./sse2test
 cvttps2dq_1 ... failed
 cvttps2dq_2 ... failed
 movdq2q_1 ... failed
 movq2dq_1 ... failed
 
 what am i doing wrong here ?
 
 Running it on a CPU without SSE2, if I'm allowed to venture a guess.

Doesn't work for me, either:

[EMAIL PROTECTED]:/home/axboe $ ./a
cvttps2dq_1 ... not ok
  result0.sd[0] = 0 (expected 12)
  result0.sd[1] = 0 (expected 56)
  result0.sd[2] = 0 (expected 43)
  result0.sd[3] = 0 (expected 87)
cvttps2dq_2 ... not ok
  result0.sd[0] = 0 (expected 12)
  result0.sd[1] = 0 (expected 56)
  result0.sd[2] = 0 (expected 43)
  result0.sd[3] = 0 (expected 87)
movdq2q_1 ... not ok
  result0.uq[0] = 240518168588 (expected 5124095577148911)
movq2dq_1 ... not ok
  result0.uq[0] = 0 (expected 5124095577148911)
  result0.uq[1] = 0 (expected 0)
[EMAIL PROTECTED]:/home/axboe $ ./a
Segmentation fault

Results vary between the two runs. Compiling without -O2 makes the last
two succeed, the others still fail. This CPU has SSE2.

-- 
Jens Axboe





Re: RE : [Qemu-devel] cvttps2dq, movdq2q, movq2dq incorrect behaviour

2006-06-20 Thread Jens Axboe
On Tue, Jun 20 2006, Jens Axboe wrote:
 On Tue, Jun 20 2006, malc wrote:
  On Tue, 20 Jun 2006, Sylvain Petreolle wrote:
  
   --- Julian Seward [EMAIL PROTECTED] a écrit :
  
  The SSE2 instructions cvttps2dq, movdq2q, movq2dq do not behave
  correctly, as shown by the attached program.  It should print
  
cvttps2dq_1 ... ok
cvttps2dq_2 ... ok
movdq2q_1 ... ok
movq2dq_1 ... ok
  
  
  
  I tried your program on my linux station :
  CPU: AMD Athlon(tm) XP 1600+ stepping 02
  
  [EMAIL PROTECTED] qemu]$ gcc --version
  gcc (GCC) 4.1.1 20060525 (Red Hat 4.1.1-1)
  
  [EMAIL PROTECTED] qemu]$ gcc -msse2 sse2test.c -o sse2test
  [EMAIL PROTECTED] qemu]$ ./sse2test
  cvttps2dq_1 ... failed
  cvttps2dq_2 ... failed
  movdq2q_1 ... failed
  movq2dq_1 ... failed
  
  what am i doing wrong here ?
  
  Running it on a CPU without SSE2, if I'm allowed to venture a guess.
 
 Doesn't work for me, either:
 
 [EMAIL PROTECTED]:/home/axboe $ ./a
 cvttps2dq_1 ... not ok
   result0.sd[0] = 0 (expected 12)
   result0.sd[1] = 0 (expected 56)
   result0.sd[2] = 0 (expected 43)
   result0.sd[3] = 0 (expected 87)
 cvttps2dq_2 ... not ok
   result0.sd[0] = 0 (expected 12)
   result0.sd[1] = 0 (expected 56)
   result0.sd[2] = 0 (expected 43)
   result0.sd[3] = 0 (expected 87)
 movdq2q_1 ... not ok
   result0.uq[0] = 240518168588 (expected 5124095577148911)
 movq2dq_1 ... not ok
   result0.uq[0] = 0 (expected 5124095577148911)
   result0.uq[1] = 0 (expected 0)
 [EMAIL PROTECTED]:/home/axboe $ ./a
 Segmentation fault
 
 Results vary between the two runs. Compiling without -O2 makes the last
 two succeed, the others still fail. This CPU has SSE2.

32-bit version works, as intended I guess.

-- 
Jens Axboe





Re: [Qemu-devel] kqemu version 1.3.0pre5

2006-03-28 Thread Jens Axboe
On Tue, Mar 28 2006, Ed Swierk wrote:
 I'm still getting a kernel panic running a Linux guest kernel with
 -kernel-qemu. I'm using kqemu-1.3.0pre5 and
 qemu-snapshot-2006-03-27_23.
 
 The guest kernel is a precompiled Fedora Core 4 kernel, version
 2.6.14-1.1656_FC4. It works fine with kqemu in non-kernel-kqemu mode.
 
 Any hints for how to track this problem down?

[snip]

 monitor/mwait feature present.
 using mwait in idle threads.

[snip]

 invalid operand:  [#1]
 Modules linked in:
 CPU:0
 EIP:0060:[c0101147]Not tainted VLI
 EFLAGS: 00010246   (2.6.14-1.1656_FC4)
 EIP is at mwait_idle+0x2f/0x41

I don't think qemu supports PNI, which includes the monitor/mwait
additions. I wonder why Linux detects that. You can probably get around
it for now by either passing idle=poll as a boot parameter, or by
compiling your kernel for plain i586, for instance.

-- 
Jens Axboe





Re: [Qemu-devel] smp support and ide lba48

2006-03-13 Thread Jens Axboe
On Mon, Mar 13 2006, Mario Goppold wrote:
 Am Samstag, 11. März 2006 13:31 schrieb Jens Axboe:
  On Fri, Mar 10 2006, Mario Goppold wrote:
   Hi,
  
   I try to install SuSE92-64 on an 400G HD but it fails:
  
   hda: max request size: 128KiB
   hda: cannot use LBA48 - full capacity 838860800 sectors (429496 MB)
   hda: 268435456 sectors (137438 MB) w/256KiB Cache, CHS=65535/16/63,
   (U)DMA hda:4hda: lost interrupt
   hda: lost interrupt
   ...
  
  
   If I switch to 32bit (in grub) it works. Here is my Env:
  
   Qemu: snapshot20060304 (gcc version 3.3.6)
   KQemu: kqemu-1.3.0pre3 (gcc version 4.0.2, SuSE10.0, 2.6.13-15.8-smp)
  
   qemu-img create test.img 400G
   qemu-system-x86_64 -m 512 -k de -localtime -smp 2 \
-net nic,vlan=0,macaddr=00:01:02:03:04:05 -net tap,vlan=0 \
-hda test.img -cdrom /dev/dvd -boot d
  
   If I reduce the image-size, it doesn't get better.
  
   Just now I tried it without -smp 2 and saw what I wanted: unknown partition
   table ...
  
   So my question is: is lba48 not smp safe, or is smp support broken (or
   incomplete)?
 
  lba48 support is not committed yet, read the linux message - it says it
  cannot use lba48, because the drive (qemu) doesn't support it. Find my
  latest posting on this list, it should get you going.
 
 Oh, I overlooked that the patches are not committed yet. Now I have adapted 
 the patches (2/3 and 3/3 of your mail from 4.1.2006) to snapshot_2006-03-12 
 and applied them, but with no success:
 
 hda: max request size: 128KiB
 hda: 838860800 sectors (429496 MB) w/256KiB Cache, CHS=52216/255/63, (U)DMA
 hda: lost interrupt
 hda: lost interrupt
 hda: lost interrupt
 hda: lost interrupt
  hda:4hda: dma_timer_expiry: dma status == 0x24
 hda: DMA interrupt recovery
 hda: lost interrupt
 ...

So now you see the full drive size and Linux can use it; however, there
seems to be an unrelated problem with interrupt delivery in smp mode. I
can't say what causes that; other 'devices' will likely show the same
problem.

-- 
Jens Axboe





Re: [Qemu-devel] smp support and ide lba48

2006-03-11 Thread Jens Axboe
On Fri, Mar 10 2006, Mario Goppold wrote:
 Hi,
 
 I try to install SuSE92-64 on an 400G HD but it fails:
 
 hda: max request size: 128KiB
 hda: cannot use LBA48 - full capacity 838860800 sectors (429496 MB)
 hda: 268435456 sectors (137438 MB) w/256KiB Cache, CHS=65535/16/63, (U)DMA
  hda:4hda: lost interrupt
 hda: lost interrupt
 ...
 
 
 If I switch to 32bit (in grub) it works. Here is my Env:
 
 Qemu: snapshot20060304 (gcc version 3.3.6)
 KQemu: kqemu-1.3.0pre3 (gcc version 4.0.2, SuSE10.0, 2.6.13-15.8-smp)
 
 qemu-img create test.img 400G
 qemu-system-x86_64 -m 512 -k de -localtime -smp 2 \
  -net nic,vlan=0,macaddr=00:01:02:03:04:05 -net tap,vlan=0 \
  -hda test.img -cdrom /dev/dvd -boot d
 
 If I reduce the image-size, it doesn't get better.
 
 Just now I tried it without -smp 2 and saw what I wanted: unknown partition 
 table ...
 
 So my question is: is lba48 not smp safe, or is smp support broken (or 
 incomplete)?

lba48 support is not committed yet; read the linux message - it says it
cannot use lba48 because the drive (qemu) doesn't support it. Find my
latest posting on this list, it should get you going.

-- 
Jens Axboe





Re: [Qemu-devel] [PATCH] Fix Harddisk initialization

2006-02-22 Thread Jens Axboe
On Tue, Feb 21 2006, Thiemo Seufer wrote:
 Hello All,
 
 this fixes hard disk initialization (s->nsector is initially 0x100, which
 is supposed to get handled as zero).
 
 
 Thiemo
 
 
 Index: qemu-work/hw/ide.c
 ===
 --- qemu-work.orig/hw/ide.c   2006-02-18 22:12:56.0 +
 +++ qemu-work/hw/ide.c2006-02-19 02:34:13.0 +
  @@ -1550,12 +1550,12 @@
           ide_set_irq(s);
           break;
       case WIN_SETMULT:
  -        if (s->nsector > MAX_MULT_SECTORS ||
  +        if ((s->nsector & 0xFF) > MAX_MULT_SECTORS ||
               s->nsector == 0 ||
               (s->nsector & (s->nsector - 1)) != 0) {
               ide_abort_command(s);
           } else {
  -            s->mult_sectors = s->nsector;
  +            s->mult_sectors = s->nsector & 0xFF;
               s->status = READY_STAT;
           }
           ide_set_irq(s);

I think the much better patch would be to fix qemu not to put 256
unconditionally in ->nsector if it is written as zero. It's really a
special case for only the read/write commands, not a general fixup.
I'd suggest adding an nsector_internal to fix this up internally in the
read/write path, so all registers correctly reflect what was actually
written by the OS.

-- 
Jens Axboe





Re: [Qemu-devel] [PATCH 2/3] ide lba48 support

2006-02-02 Thread Jens Axboe
On Wed, Feb 01 2006, Fabrice Bellard wrote:
 Jens Axboe wrote:
 Subject: [PATCH] Add lba48 support to ide
 From: Jens Axboe [EMAIL PROTECTED]
 Date: 1136376117 +0100
 
 Add lba48 support for the ide code. Read back of hob registers isn't
 there yet, though.
 
 Do you have a more recent patch ? In your latest patch, the lba48 field 
 is never reset and the nsector may be broken.

The lba48 setting did look a little odd; it should be corrected now. I
guess that is what would affect the nsector stuff, and it looks correct
to me now.

From nobody Mon Sep 17 00:00:00 2001
From: Jens Axboe [EMAIL PROTECTED]
Date: Thu Feb 2 10:51:20 2006 +0100
Subject: [PATCH] Add lba48 support to ide

Enables qemu to support ide disk images > 2^28 * 512 bytes.

---

 hw/ide.c |  157 ++
 1 files changed, 137 insertions(+), 20 deletions(-)

b67eb122b5646ddcfd13d45563bbe6aa5309e9c0
diff --git a/hw/ide.c b/hw/ide.c
index 50b8e63..01b10e1 100644
--- a/hw/ide.c
+++ b/hw/ide.c
@@ -307,14 +307,24 @@ typedef struct IDEState {
     /* ide regs */
     uint8_t feature;
     uint8_t error;
-    uint16_t nsector; /* 0 is 256 to ease computations */
+    uint32_t nsector;
     uint8_t sector;
     uint8_t lcyl;
     uint8_t hcyl;
+    /* other part of tf for lba48 support */
+    uint8_t hob_feature;
+    uint8_t hob_nsector;
+    uint8_t hob_sector;
+    uint8_t hob_lcyl;
+    uint8_t hob_hcyl;
+
     uint8_t select;
     uint8_t status;
+
     /* 0x3f6 command, only meaningful for drive 0 */
     uint8_t cmd;
+    /* set for lba48 access */
+    uint8_t lba48;
     /* depends on bit 4 in select, only meaningful for drive 0 */
     struct IDEState *cur_drive;
     BlockDriverState *bs;
@@ -462,13 +472,19 @@ static void ide_identify(IDEState *s)
     put_le16(p + 80, 0xf0); /* ata3 -> ata6 supported */
     put_le16(p + 81, 0x16); /* conforms to ata5 */
     put_le16(p + 82, (1 << 14));
-    put_le16(p + 83, (1 << 14));
+    /* 13=flush_cache_ext,12=flush_cache,10=lba48 */
+    put_le16(p + 83, (1 << 14) | (1 << 13) | (1 << 12) | (1 << 10));
     put_le16(p + 84, (1 << 14));
     put_le16(p + 85, (1 << 14));
-    put_le16(p + 86, 0);
+    /* 13=flush_cache_ext,12=flush_cache,10=lba48 */
+    put_le16(p + 86, (1 << 14) | (1 << 13) | (1 << 12) | (1 << 10));
     put_le16(p + 87, (1 << 14));
     put_le16(p + 88, 0x3f | (1 << 13)); /* udma5 set and supported */
     put_le16(p + 93, 1 | (1 << 14) | 0x2000);
+    put_le16(p + 100, s->nb_sectors);
+    put_le16(p + 101, s->nb_sectors >> 16);
+    put_le16(p + 102, s->nb_sectors >> 32);
+    put_le16(p + 103, s->nb_sectors >> 48);
 
     memcpy(s->identify_data, p, sizeof(s->identify_data));
     s->identify_set = 1;
@@ -572,12 +588,19 @@ static int64_t ide_get_sector(IDEState *
     int64_t sector_num;
     if (s->select & 0x40) {
         /* lba */
-        sector_num = ((s->select & 0x0f) << 24) | (s->hcyl << 16) |
-            (s->lcyl << 8) | s->sector;
+        if (!s->lba48) {
+            sector_num = ((s->select & 0x0f) << 24) | (s->hcyl << 16) |
+                (s->lcyl << 8) | s->sector;
+        } else {
+            sector_num = ((int64_t)s->hob_hcyl << 40) |
+                ((int64_t) s->hob_lcyl << 32) |
+                ((int64_t) s->hob_sector << 24) |
+                ((int64_t) s->hcyl << 16) |
+                ((int64_t) s->lcyl << 8) | s->sector;
+        }
     } else {
         sector_num = ((s->hcyl << 8) | s->lcyl) * s->heads * s->sectors +
-            (s->select & 0x0f) * s->sectors +
-            (s->sector - 1);
+            (s->select & 0x0f) * s->sectors + (s->sector - 1);
     }
     return sector_num;
 }
@@ -586,10 +609,19 @@ static void ide_set_sector(IDEState *s,
 {
     unsigned int cyl, r;
     if (s->select & 0x40) {
-        s->select = (s->select & 0xf0) | (sector_num >> 24);
-        s->hcyl = (sector_num >> 16);
-        s->lcyl = (sector_num >> 8);
-        s->sector = (sector_num);
+        if (!s->lba48) {
+            s->select = (s->select & 0xf0) | (sector_num >> 24);
+            s->hcyl = (sector_num >> 16);
+            s->lcyl = (sector_num >> 8);
+            s->sector = (sector_num);
+        } else {
+            s->sector = sector_num;
+            s->lcyl = sector_num >> 8;
+            s->hcyl = sector_num >> 16;
+            s->hob_sector = sector_num >> 24;
+            s->hob_lcyl = sector_num >> 32;
+            s->hob_hcyl = sector_num >> 40;
+        }
     } else {
         cyl = sector_num / (s->heads * s->sectors);
         r = sector_num % (s->heads * s->sectors);
@@ -1475,43 +1507,89 @@ static void cdrom_change_cb(void *opaque
     s->nb_sectors = nb_sectors;
 }
 
+static void ide_cmd_lba48_transform(IDEState *s, int lba48)
+{
+    s->lba48 = lba48;
+
+    /* handle the 'magic' 0 nsector count conversion here. to avoid
+     * fiddling with the rest of the read logic, we just store the
+     * full sector count in ->nsector and ignore ->hob_nsector from now
+     */
+    if (!s->lba48) {
+        if (!s->nsector)
+            s->nsector = 256;
+    } else {
+        if (!s->nsector && !s->hob_nsector)
+            s->nsector = 65536;
+        else
Re: [Qemu-devel] [PATCH 3/3] proper support of FLUSH_CACHE and FLUSH_CACHE_EXT

2006-01-05 Thread Jens Axboe
On Wed, Jan 04 2006, Johannes Schindelin wrote:
 Hi,
 
 On Wed, 4 Jan 2006, Jens Axboe wrote:
 
  1.0.GIT
 
 Using git for QEmu development? Welcome to the club. ;-)

Yes, I just imported the repo into git; cvs isn't really my cup of tea
and it isn't very handy for patch series. git isn't really tailored for
that either, but at least it allows me to just do a 'format-patch'
against the old master and get the patch series. And with a devel
branch, it's pretty easy to pull the new updates and rebase the devel
branch as needed.

Are you using a persistent git repo for qemu (ie continually importing
new changes)? I've considered setting one up :-)

 Regarding your patches: as far as I understand them, I like 'em.

Thanks!

-- 
Jens Axboe





Re: [Qemu-devel] [PATCH 3/3] proper support of FLUSH_CACHE and FLUSH_CACHE_EXT

2006-01-05 Thread Jens Axboe
On Thu, Jan 05 2006, Jens Axboe wrote:
 Are you using a persistent git repo for qemu (ie continually importing
 new changes)? I've considered setting one up :-)

I set up such a gateway; it should be updated every night from Fabrice's
cvs repository. The web interface is here:

http://brick.kernel.dk/git/?p=qemu.git;a=summary

and you can pull from the following git url:

git://brick.kernel.dk/data/git/cvsdata/qemu

I've added the 'ide' branch, with the patches posted here. If there's an
interest in this (the git repo, not the ide patches :), I can push it to
kernel.org as well.

-- 
Jens Axboe





[Qemu-devel] [PATCH 0/3] qemu ide updates

2006-01-04 Thread Jens Axboe
Hi,

Here's the set of 3 patches I currently have for the qemu ide/block
code.

1/3: The ide id updates
2/3: lba48 support
3/3: Proper support of the flush cache command

-- 
Jens Axboe





[Qemu-devel] [PATCH] ide id updates

2006-01-03 Thread Jens Axboe
(s->identify_data + 88, 0x3f);
+        break;
+    case 0x04: /* mdma mode */
+        put_le16(s->identify_data + 63, 0x07 | (1 << (val + 8)));
+        put_le16(s->identify_data + 88, 0x3f);
+        break;
+    case 0x08: /* udma mode */
+        put_le16(s->identify_data + 63, 0x07);
+        put_le16(s->identify_data + 88, 0x3f | (1 << (val + 8)));
+        break;
+    default:
+        goto abort_cmd;
+    }
+    s->status = READY_STAT | SEEK_STAT;
+    ide_set_irq(s);
+    break;
+}
 default:
     goto abort_cmd;
 }

-- 
Jens Axboe





Re: [Qemu-devel] [PATCH] lba48 support

2006-01-02 Thread Jens Axboe
On Fri, Dec 30 2005, Fabrice Bellard wrote:
 Jens Axboe wrote:
 Saw the posts on this the other day and had a few spare hours to play
 with this. Works for me, with and without DMA (didn't test mult mode,
 but that should work fine too).
 
 Test with caution though, it's changing the ide code so could eat your
 data if there's a bug there... Most clever OS's don't use lba48 even for
 lba48 capable drives, unless the device is > 2^28 sectors and the
 current request is past that (but they could be taking advantage of the
 larger transfer size possible, in which case lba48 will be used even for
 low sectors...).
 
 Thank you for the patch ! At least two details should be corrected 
 before I can apply it:
 
 1) Each duplicated IDE register acts as a 2 byte FIFO, so the logic you 
 added in the write function should be modified (the regs_written field 
 is not needed).
 
 2) The read back logic should be implemented (HOB bit in the device 
 control register).

Updated patch below. The read back logic doesn't work right now, since
we always set bits 5-7 (the obsolete bits) in the device select
register. But I've dropped the regs_written hack; the hob registers are
now (as intended) always the previous value. That makes it LIFO, which
I suppose is what you meant?


Index: hw/ide.c
===
RCS file: /sources/qemu/qemu/hw/ide.c,v
retrieving revision 1.38
diff -u -r1.38 ide.c
--- hw/ide.c6 Aug 2005 09:14:32 -   1.38
+++ hw/ide.c2 Jan 2006 12:58:15 -
@@ -305,14 +305,24 @@
     /* ide regs */
     uint8_t feature;
     uint8_t error;
-    uint16_t nsector; /* 0 is 256 to ease computations */
+    uint32_t nsector;
     uint8_t sector;
     uint8_t lcyl;
     uint8_t hcyl;
+    /* other part of tf for lba48 support */
+    uint8_t hob_feature;
+    uint8_t hob_nsector;
+    uint8_t hob_sector;
+    uint8_t hob_lcyl;
+    uint8_t hob_hcyl;
+
     uint8_t select;
     uint8_t status;
+
     /* 0x3f6 command, only meaningful for drive 0 */
     uint8_t cmd;
+    /* set for lba48 access */
+    uint8_t lba48;
     /* depends on bit 4 in select, only meaningful for drive 0 */
     struct IDEState *cur_drive;
     BlockDriverState *bs;
@@ -449,13 +459,17 @@
     put_le16(p + 61, s->nb_sectors >> 16);
     put_le16(p + 80, (1 << 1) | (1 << 2));
     put_le16(p + 82, (1 << 14));
-    put_le16(p + 83, (1 << 14));
+    put_le16(p + 83, (1 << 14) | (1 << 10)); /* lba48 supported */
     put_le16(p + 84, (1 << 14));
     put_le16(p + 85, (1 << 14));
-    put_le16(p + 86, 0);
+    put_le16(p + 86, (1 << 14) | (1 << 10)); /* lba48 supported */
     put_le16(p + 87, (1 << 14));
     put_le16(p + 88, 0x1f | (1 << 13));
     put_le16(p + 93, 1 | (1 << 14) | 0x2000 | 0x4000);
+    put_le16(p + 100, s->nb_sectors);
+    put_le16(p + 101, s->nb_sectors >> 16);
+    put_le16(p + 102, s->nb_sectors >> 32);
+    put_le16(p + 103, s->nb_sectors >> 48);
 }
 
 static void ide_atapi_identify(IDEState *s)
@@ -548,12 +562,18 @@
     int64_t sector_num;
     if (s->select & 0x40) {
         /* lba */
-        sector_num = ((s->select & 0x0f) << 24) | (s->hcyl << 16) |
-            (s->lcyl << 8) | s->sector;
+        if (!s->lba48) {
+            sector_num = ((s->select & 0x0f) << 24) | (s->hcyl << 16) |
+                (s->lcyl << 8) | s->sector;
+        } else {
+            sector_num = ((int64_t)s->hcyl << 40) |
+                ((int64_t) s->lcyl << 32) |
+                (s->sector << 24) | (s->hob_hcyl << 16) |
+                (s->hob_lcyl << 8) | s->hob_sector;
+        }
     } else {
         sector_num = ((s->hcyl << 8) | s->lcyl) * s->heads * s->sectors +
-            (s->select & 0x0f) * s->sectors +
-            (s->sector - 1);
+            (s->select & 0x0f) * s->sectors + (s->sector - 1);
     }
     return sector_num;
 }
@@ -562,10 +582,19 @@
 {
     unsigned int cyl, r;
     if (s->select & 0x40) {
-        s->select = (s->select & 0xf0) | (sector_num >> 24);
-        s->hcyl = (sector_num >> 16);
-        s->lcyl = (sector_num >> 8);
-        s->sector = (sector_num);
+        if (!s->lba48) {
+            s->select = (s->select & 0xf0) | (sector_num >> 24);
+            s->hcyl = (sector_num >> 16);
+            s->lcyl = (sector_num >> 8);
+            s->sector = (sector_num);
+        } else {
+            s->hob_sector = sector_num;
+            s->hob_lcyl = sector_num >> 8;
+            s->hob_hcyl = sector_num >> 16;
+            s->sector = sector_num >> 24;
+            s->lcyl = sector_num >> 32;
+            s->hcyl = sector_num >> 40;
+        }
     } else {
         cyl = sector_num / (s->heads * s->sectors);
         r = sector_num % (s->heads * s->sectors);
@@ -1451,43 +1480,65 @@
     s->nb_sectors = nb_sectors;
 }
 
+static void ide_clear_hob(IDEState *ide_if)
+{
+    /* any write clears HOB high bit of device control register */
+    ide_if[0].select &= ~(1 << 7);
+    ide_if[1].select &= ~(1 << 7);
+}
+
 static void ide_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 {
     IDEState *ide_if = opaque;
     IDEState *s;
-    int unit, n;
+    int unit, n, lba48_cmd = 0

Re: [Qemu-devel] [PATCH] lba48 support

2005-12-30 Thread Jens Axboe
On Fri, Dec 30 2005, Fabrice Bellard wrote:
 Jens Axboe wrote:
 Saw the posts on this the other day and had a few spare hours to play
 with this. Works for me, with and without DMA (didn't test mult mode,
 but that should work fine too).
 
 Test with caution though, it's changing the ide code so could eat your
 data if there's a bug there... Most clever OS's don't use lba48 even for
 lba48 capable drives, unless the device is > 2^28 sectors and the
 current request is past that (but they could be taking advantage of the
 larger transfer size possible, in which case lba48 will be used even for
 low sectors...).
 
 Thank you for the patch ! At least two details should be corrected 
 before I can apply it:
 
 1) Each duplicated IDE register acts as a 2 byte FIFO, so the logic you 
 added in the write function should be modified (the regs_written field 
 is not needed).

Perfect, I wasn't very fond of that approach either (it seemed fragile).

 2) The read back logic should be implemented (HOB bit in the device 
 control register).

Indeed. I'll get these things fixed up, won't be before Monday though.

-- 
Jens Axboe





[Qemu-devel] [PATCH] lba48 support

2005-12-29 Thread Jens Axboe
 {
+ide_if[0].hob_nsector = val;
+ide_if[1].hob_nsector = val;
+   }
 break;
 case 3:
-ide_if[0].sector = val;
-ide_if[1].sector = val;
+   if (!hob) {
+ide_if[0].sector = val;
+ide_if[1].sector = val;
+   } else {
+ide_if[0].hob_sector = val;
+ide_if[1].hob_sector = val;
+   }
 break;
 case 4:
-ide_if[0].lcyl = val;
-ide_if[1].lcyl = val;
+   if (!hob) {
+ide_if[0].lcyl = val;
+ide_if[1].lcyl = val;
+   } else {
+ide_if[0].hob_lcyl = val;
+ide_if[1].hob_lcyl = val;
+   }
 break;
 case 5:
-ide_if[0].hcyl = val;
-ide_if[1].hcyl = val;
+   if (!hob) {
+ide_if[0].hcyl = val;
+ide_if[1].hcyl = val;
+   } else {
+ide_if[0].hob_hcyl = val;
+ide_if[1].hob_hcyl = val;
+   }
 break;
 case 6:
 ide_if[0].select = (val  ~0x10) | 0xa0;
@@ -1501,10 +1559,34 @@
 #if defined(DEBUG_IDE)
 printf(ide: CMD=%02x\n, val);
 #endif
+   /* clear regs written when we see any command */
+   ide_if[0].regs_written = 0;
+
 s = ide_if-cur_drive;
 /* ignore commands to non existant slave */
 if (s != ide_if  !s-bs) 
 break;
+
+   s-lba48 = lba48_cmd;
+
+   /* handle the 'magic' 0 nsector count conversion here. to avoid
+* fiddling with the rest of the read logic, we just store the
+* full sector count in -nsector and ignore -hob_nsector from now
+*/
+   if (!s-lba48) {
+   if (!s-nsector)
+   s-nsector = 256;
+   } else {
+   if (!s-nsector  !s-hob_nsector)
+   s-nsector = 65536;
+   else {
+   int lo = s-hob_nsector;
+   int hi = s-nsector;
+
+   s-nsector = (hi  8) | lo;
+   }
+   }
+
 switch(val) {
 case WIN_IDENTIFY:
 if (s-bs  !s-is_cdrom) {
@@ -1536,12 +1618,16 @@
 }
 ide_set_irq(s);
 break;
+case WIN_VERIFY_EXT:
+   lba48_cmd = 1;
 case WIN_VERIFY:
 case WIN_VERIFY_ONCE:
 /* do sector number check ? */
 s-status = READY_STAT;
 ide_set_irq(s);
 break;
+   case WIN_READ_EXT:
+   lba48_cmd = 1;
 case WIN_READ:
 case WIN_READ_ONCE:
 if (!s-bs) 
@@ -1549,6 +1635,8 @@
 s-req_nb_sectors = 1;
 ide_sector_read(s);
 break;
+   case WIN_WRITE_EXT:
+   lba48_cmd = 1;
 case WIN_WRITE:
 case WIN_WRITE_ONCE:
 s-error = 0;
@@ -1556,12 +1644,16 @@
 s-req_nb_sectors = 1;
 ide_transfer_start(s, s-io_buffer, 512, ide_sector_write);
 break;
+   case WIN_MULTREAD_EXT:
+   lba48_cmd = 1;
 case WIN_MULTREAD:
 if (!s-mult_sectors)
 goto abort_cmd;
 s-req_nb_sectors = s-mult_sectors;
 ide_sector_read(s);
 break;
+case WIN_MULTWRITE_EXT:
+   lba48_cmd = 1;
 case WIN_MULTWRITE:
 if (!s-mult_sectors)
 goto abort_cmd;
@@ -1573,18 +1665,24 @@
 n = s-req_nb_sectors;
 ide_transfer_start(s, s-io_buffer, 512 * n, ide_sector_write);
 break;
+   case WIN_READDMA_EXT:
+   lba48_cmd = 1;
 case WIN_READDMA:
 case WIN_READDMA_ONCE:
 if (!s-bs) 
 goto abort_cmd;
 ide_sector_read_dma(s);
 break;
+   case WIN_WRITEDMA_EXT:
+   lba48_cmd = 1;
 case WIN_WRITEDMA:
 case WIN_WRITEDMA_ONCE:
 if (!s->bs)
 goto abort_cmd;
 ide_sector_write_dma(s);
 break;
+case WIN_READ_NATIVE_MAX_EXT:
+   lba48_cmd = 1;
 case WIN_READ_NATIVE_MAX:
 ide_set_sector(s, s->nb_sectors - 1);
 s->status = READY_STAT;
@@ -1615,6 +1713,7 @@
case WIN_STANDBYNOW1:
 case WIN_IDLEIMMEDIATE:
 case WIN_FLUSH_CACHE:
+case WIN_FLUSH_CACHE_EXT:
s->status = READY_STAT;
 ide_set_irq(s);
 break;

-- 
Jens Axboe



___
Qemu-devel mailing list
Qemu-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/qemu-devel


Re: [Qemu-devel] Audio cd's in guest OS

2005-11-07 Thread Jens Axboe
On Sat, Nov 05 2005, Oliver Gerlich wrote:
 
 Lars Roland schrieb:
  On 11/4/05, Mike Swanson [EMAIL PROTECTED] wrote:
  
 I've found on systems where traditional rippers don't work (eg,
 cdparanoia), CDFS has a greater chance of ripping the CDs (by default
 into WAV, but you can enable an option to rip it in the pure CDDA
 format if you want).
  
  
  Thanks - I should have known that someone had made a file system for
  this. However I still think it would be great to be able to pass the
  actual /dev/cdrom on to the guest OS, but I must admit that I have not
  grasped the complexity yet on doing this, so I am going to do some
  Qemu code reading before continuing - I am not even sure if it can be
  done in VMWare although I  seam to remember that Windows as a host OS
  running VMWare allows the guest access to a audio cdrom.
  
 
 Not sure how VMware does that; but actually I didn't even succeed
 accessing /dev/cdrom on the host when an audio cd is inserted:
 
 dd if=/dev/hdc of=/dev/null bs=2352 count=1
 dd: reading `/dev/hdc': Input/output error
 0+0 records in
 0+0 records out
 0 bytes transferred in 0.077570 seconds (0 bytes/sec)
 
 I used a blocksize of 2352 because I've read that's the size for audio
 cds... It didn't work with bs=1 either.

While the block size you gave is correct for cdda frames, you cannot
read them this way. The commands used for reading audio differ from
those for a data track, and the CDROM driver will always use the READ_10
command for io originating from the file system layer. You would also
need to put some effort into the page cache to allow non-power-of-2
block sizes for this to work. So it's not trivial :-)

For reading audio tracks, you can use either a pass-through command
mechanism like CDROM_SEND_PACKET or SG_IO, or the CDROMREADAUDIO ioctl,
which is the easiest to use since it doesn't require an understanding of
the command set.

-- 
Jens Axboe





Re: [Qemu-devel] Audio cd's in guest OS

2005-11-07 Thread Jens Axboe
On Sat, Nov 05 2005, Fabrice Bellard wrote:
 Lars Roland wrote:
 On 11/4/05, Mike Swanson [EMAIL PROTECTED] wrote:
 
 I've found on systems where traditional rippers don't work (eg,
 cdparanoia), CDFS has a greater chance of ripping the CDs (by default
 into WAV, but you can enable an option to rip it in the pure CDDA
 format if you want).
 
 
 Thanks - I should have known that someone had made a file system for
 this. However I still think it would be great to be able to pass the
 actual /dev/cdrom on to the guest OS, but I must admit that I have not
 grasped the complexity yet on doing this, so I am going to do some
 Qemu code reading before continuing - I am not even sure if it can be
 done in VMWare although I  seam to remember that Windows as a host OS
 running VMWare allows the guest access to a audio cdrom.
 
 QEMU does not currently support reading raw CD tracks, but it is 
 definitely possible to add it (along with play audio features and even 
 CD recording).

I actually implemented the commands needed for recording some months
ago, but never really wrapped it up and submitted it. If there's any
interest in this, I'll dust it off when I have some spare time.

-- 
Jens Axboe





Re: [Qemu-devel] [patch] non-blocking disk IO

2005-10-04 Thread Jens Axboe
On Tue, Oct 04 2005, Troy Benjegerdes wrote:
 What we want is to be able to have the guest OS request some DMA
 I/O operation, and have qemu be able to use AIO so that the actual disk
 hardware can dump the data directly in the pages the userspace process
 on the guest OS ends up wanting it in, avoiding several expensive memcopy
 and context switch operations.

That should be easy enough to do already, with or without the
nonblocking patch. Just make sure to open the files O_DIRECT and align
the io buffers and lengths. With a 2.6 host, you can usually get away
with 512-byte alignment; on 2.4 you may have to ensure 1k/4k alignment.

-- 
Jens Axboe





Re: [Qemu-devel] [patch] non-blocking disk IO

2005-10-03 Thread Jens Axboe
On Mon, Oct 03 2005, John Coiner wrote:
 
 Non-blocking disk IO now works for any type of disk image, not just 
 raw format. There is no longer any format-specific code in the patch:
 
 http://people.brandeis.edu/~jcoiner/qemu_idedma/qemu_dma_patch.html
 
 You might want this patch if:
  * you run a multitasking guest OS,
  * you access a disk sometimes, and
  * you wouldn't mind if QEMU ran a little faster.
 
 Why I have not got feedback in droves I do not understand ;)

Why not use aio for this instead? It seems like a better fit than
spawning a thread per block device. That would still require a thread
for handling completions, but you could easily use a single completion
thread for all devices, as it would not need to do any real work.

-- 
Jens Axboe


