** Changed in: lxc (Ubuntu)
       Status: Triaged => Fix Released

** Information type changed from Private Security to Public

You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to lxc in Ubuntu.

  PCI Device Access Through /proc/

Status in lxc package in Ubuntu:
  Fix Released

Bug description:
  PCI Device Control Region Access From Within Containers


  * From within a container, it is possible to access the control regions
  of devices attached to the host PCI bus through the /proc/bus/pci/
  interface. This is allowed because the CAP_SYS_RAWIO capability is enabled
  by default inside an LXC container. This proof of concept uses the
  vulnerability to talk to an AHCI device directly and ask a SATA drive to
  identify itself (although it could trivially be used to cause a
  denial of service on the drive instead). The choice of an AHCI drive is
  arbitrary; a different approach would be to go after other targets on the
  PCI bus, such as the network controller.

  * This proof of concept is meant to demonstrate the ability to circumvent
  container isolation by communicating with the underlying hardware directly.
  It is likely this could be leveraged into full access to the underlying
  hard disk. However, that exploitation would be quite complicated, and it
  is discussed in full later.


  * My test environment was a VMware Workstation system running Ubuntu.
  The primary disk was a SCSI disk, but I added a secondary 1GB SATA
  disk with no special settings (write caching was enabled by default).
  The PoC can talk to the disk whether or not it is mounted.

  * I created a default LXC environment using the instructions at

  * As the root user in the LXC container, I used lspci -vv to get the
  information about the target AHCI device:

  02:05.0 SATA controller: VMware Device 07e0 (prog-if 01 [AHCI 1.0])
        Subsystem: VMware Device 07e0
        Physical Slot: 37
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 64
        Interrupt: pin A routed to IRQ 72
        Region 5: Memory at fd5ee000 (32-bit, non-prefetchable) [size=4K]
        [virtual] Expansion ROM at e7b10000 [disabled] [size=64K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [48] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee00000  Data: 4024
        Capabilities: [60] SATA HBA v1.0 InCfgSpace
        Capabilities: [70] PCI Advanced Features
                AFCap: TP+ FLR+
                AFCtrl: FLR-
                AFStatus: TP-
        Kernel driver in use: ahci

  * Compile the attached tool (pciread.c) and run it within the
  container. In this example, the invocation was:

  # ./pciread -b 02 -d 05 -f 0 -a 0xfd5ee000 -p 1

  And I get the following output:

  bar: fd5ee000 bus: 02 device: 05 function: 0
  opened /proc/bus/pci/02/05.0
  mapping 1 pages of size: 4096
  AHCI 0001.0300 32 slots 30 ports 6 Gbps 0x3fffffff impl
  IRQs disabled
  Supports 64-bit addresses
  Supports native command queuing
  DMA buffer @ 0x35b6b000
  cmd list buffer @ 0x25f9c000
  FIS buffer @ 0x26ccf000
  cmd buffer @ 0xe7e3000
  p->ports_impl: 0x3fffffff
  --------- port 0 ---------
  command list base address: 0x0
  FIS base address: 0x14c29000
  interrupt status: 0x0
  interrupt enable: 0x7840007f
  command and status: 0x44016
  signature : 0x101 (SATA drive)
  tfd : 0x441
  status : 0x123
  errors : 0x0
  active : 0x0
  control : 0x320
  interrupt status before: 0x0
  start bit before: 0
  interrupt status after: 0x2
  Waiting for command completion
  Seems to have completed...
  Got response data in DMA buffer:

  0x7f5a27873000: 7a 42 ab 08 00 00 0f 00   00 00 00 00 3f 00 00 00  zB.......  .......
  0x7f5a27873010: 00 00 00 00 30 30 30 30   30 30 30 30 30 30 30 30  ....00000  0000000
  0x7f5a27873020: 30 30 30 30 30 30 31 30   00 00 40 00 00 00 30 30  00000010.  .....00
  0x7f5a27873030: 30 30 30 30 31 30 4d 56   61 77 65 72 56 20 72 69  000010MVa  werV.ri
  0x7f5a27873040: 75 74 6c 61 53 20 54 41   20 41 61 48 64 72 44 20  utlaS.TA.  AaHdrD.
  0x7f5a27873050: 69 72 65 76 20 20 20 20   20 20 20 20 20 20 ff 80  irev.....  .......
  0x7f5a27873060: 00 00 00 0f 01 40 00 02   00 00 07 00 ab 08 0f 00  .........  .......
  0x7f5a27873070: 3f 00 3b ff 1f 00 ff 01   00 00 20 00 00 00 07 00  .........  .......
  0x7f5a27873080: 03 00 78 00 78 00 78 00   78 00 00 00 00 00 00 00  ..x.x.x.x  .......
  0x7f5a27873090: 00 00 00 00 00 00 1f 00   06 01 00 00 00 00 00 00  .........  .......
  0x7f5a278730a0: 7e 00 18 00 08 40 08 74   00 41 08 40 80 34 00 41  .......t.  A...4.A

  * The hexdump output shows the ATA IDENTIFY command response sent back
  from the controller.

  * There are some assumptions the code makes. It assumes the drive it is
  going to talk to is the first device it finds in the AHCI port list
  that is actually active.

  * The code also does not cleanly restore the device after getting the
  response, so the mapped registers are left in an inconsistent state and
  the kernel cannot mount or otherwise use the device afterwards.

  # Explanation of PoC

  While reading the attached code is instructive, here is an overview of
  the methodology used:
  * Map the control region of the AHCI device into memory through the
  /proc/bus/pci/ interface using open(), mmap(), and ioctl().
  * Allocate several buffers, and determine their logical address using
  * Disable interrupts for the device.
  * Find the port the drive is attached to.
  * Set the FIS, Command, and Command List pointers on the device to the
  previously allocated buffers.
  * Create an H2D FIS (to tell the drive to identify itself), a command to
  wrap the FIS (telling the drive to use a DMA buffer we have allocated),
  and a command list structure containing the command.
  * Copy all of these to our previously allocated buffers, which the device
  now also has pointers to.
  * Flip the start bit on the device to cause it to process commands from
  the command list.
  * Sleep for a second, then spin loop until the drive has processed our
  command.
  * The drive has now executed our command (ATA_CMD_ID_ATA, which is the
  drive identification command) and written the result to a buffer we
  allocated. We print it out, and attempt (poorly) to restore the drive's
  state.

  # Full Impact

  As noted earlier, zeroing out the device's control regions renders the
  device inoperable until the host is restarted. In addition, a different
  PCI device may prove an easier target for an attacker.

  Continuing with the AHCI device scenario, it is very likely this
  vulnerability could be leveraged for arbitrary reads/writes to the
  underlying drive. Doing this is not at all straightforward, because
  you are not the kernel. In order to actually issue commands to the
  AHCI device, it is necessary to change several pointers in the AHCI
  device's memory. Because the kernel keeps its own cached copy of these
  pointers and their contents, requests the kernel makes to the device
  during this time will at best be incorrect, and at worst will cause a
  kernel crash/panic/oops.

  An exploitation scenario for full container escape would look roughly
  like this:

  * Create a situation where the disk is idle. This could be done in a
  number of ways:
        ** Engage the kernel in processor-intensive activity that does not
        require disk access (so nothing that will require using swap). This
        may be somewhat easier than expected due to CAP_SYS_NICE.
        ** Disable interrupts on the device. It's not clear to me whether
        the kernel would crash/panic/oops, or whether it would behave itself
        for short amounts of time with the device's interrupts disabled.
        ** Set the device's busy flags to tell the kernel the device is
        busy. However, the kernel is free to ignore these.
        ** Opportunistically wait until the drive is idle.
  * Issue a FIS to read the superblock of the disk into memory and parse it.
  * Repeat this process of reading inodes to navigate the inode tree until
  you have found the inode for the targeted lxc.container.config file.
  * Find the block which contains the piece of this file you wish to modify
  and read it into memory.
  * Modify it so that the configuration calls for the host root to be
  mounted inside the container RW.
  * Issue a FIS to overwrite the targeted block with the new modified block.
  * On container restart, you now have the true root mounted inside your
  container.

  Note that it may be necessary to perform each one of these drive
  operations individually, and then restore the drive to its original
  state in between, so as not to alarm the kernel. Reading more than a
  page of data is also potentially problematic, as you need to ensure
  your allocated buffer is contiguous in physical memory. However, this
  could likely be accomplished with something analogous to heap feng
  shui, or simply by reading multi-page targets in multiple requests
  of at most one page.

To manage notifications about this bug go to:

Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp
