We're all dancing around a very fundamental question here: what interface abstraction should the "raw" interface to a disk controller (and attached disks) present?
We're not going to allow userland to directly write device registers as a general practice. X11 is the glaring and horrible exception to that UNIX rule, because we've been unwilling to put a full graphics abstraction subsystem, with an appropriate userland API, into the kernel (too big! too ugly! no API agreement!), as we have done for disks (filesystems), network interfaces (protocol stacks), and serial devices (tty line disciplines). Nor does userland code handle device interrupts; that's the kernel's job.
We do generally allow userland to initiate DMA (through system calls) directly to and from userland memory. That's why the raw interface is generally faster than the block interface: no byte copying through the buffer cache. Oh, and you get to do I/O in chunks larger than the block interface is designed for, provided that the device (and driver) permits it.
Then there's the whole addressing question. Disk blocks used to be addressed by cylinder/head/sector numbers, and the driver translated between block numbers and c/h/s. Now modern disks do that translation for us, and when asked about c/h/s they even lie to us, either to hide their guts or to follow very old abstractions. And lately we're talking about disks with 4K native blocks rather than the traditional 512-byte blocks (though you've been able to format properly compliant SCSI disks to block sizes other than 512 bytes for decades).
However, even "blocks" are an abstraction: UNIX wants to address everything in bytes. Just look at read(2), write(2), and lseek(2): no mention of "blocks". Bytes are the fundamental (atomic) data and address unit of the system, and we translate them to everything else as required.
So, what abstraction should the raw interface to a "disk" present? It's going to have a translation from bytes to whatever unit the disk is addressed in, and the driver will handle manipulation of the device registers and handle interrupts.
Our memory allocators already tend to be conservative about alignment, but it would not be unreasonable for a device driver that knows, de facto, that a device requires aligned DMA addresses to check what's requested in read(2)/write(2) and return EINVAL as necessary (naturally, the device's man page should document all the reasons a driver will return an error). However, some warts are just easier to handle in the device driver than to leave for (less capable) userland code to deal with.
Another way to put the question: what is a disk? What are its fundamental properties, and how can we design a reasonable abstraction (which in most cases is probably not all that abstract) for userland code to deal with? As with all things, we have tradeoffs to make, and UNIX is pragmatic: a good solution today to today's problems is better than a perfect solution (which we've got to find some poor sod to implement!) tomorrow.
Erik <f...@netbsd.org>