On Wednesday, January 16, 2019 at 8:58:46 AM UTC-5, Nadav Har'El wrote:
>
>
> On Wed, Jan 16, 2019 at 7:21 AM Waldemar Kozaczuk <[email protected] 
> <javascript:>> wrote:
>
> This patch refactors virtio layer in order to support
> modern PCI and mmio devices. It is essentially a step 0
> in the list of tasks to make OSv boot on AWS firecracker.
>
>
> This patch introduces virtio_device class that abstracts
> virtio transport layer. That way virtio_driver class and
> its specializations can interact with host without being
> tied to a specific transport (PCI, mmio, etc).
>
> The virtio_pci_device, virtio_legacy_pci_device and 
> virtio_modern_pci_device
> classes define specializations of virtio_device for PCI transport.
> This patch effectively moves all PCI-related logic from virtio_driver
> to virtio_legacy_pci_device classi and effectively virtio drivers
> transport agnostic. Followup patches will provide implementations
> of virtio_modern_pci_device and future virtio_mmio_device.
>
>
> The interrupt_factory struct defines a mechanism used
> by virtio drivers to specify particulars of creating/registering
> proper interrupt logic for each transport.
>
>
> Thanks!
>
> This is a long patch which looks mostly good to me, but it has a lot of 
> things so I left a few comments below but I can't really tell if all
> the details are correct or not, so I would appreciate knowing more on how 
> you tested it. In particular you should see that in the
> normal case ("legacy" virtio in qemu) the network, disk, and random number 
> generator drivers continue to work as expected.
> Would be nice to also get to other edge cases (various old interrupt 
> types) but I'm not even sure how we ever reached those.
>
Yeah I am not sure how to make it shorter. 
I simply ran all the unit tests many times and they pass. I also tested 
couple of apps with virtio-scsi on. I guess I could run all unit tests with 
them as well. Same for older interrupt types - would removing "+x2apic" 
make it use old interrupt. I will also test non-virtio devices just in case.
Overall this patch should not change any existing functionality.
 
I must say that I never saw any failures on my Linux laptop but I did see 
occasional failures of some networking unit tests when running in nested 
virtualization - KVM off. But I think unit tests would occasionally fail in 
nested virtualization scenario before my patch as well. But I will double 
verify it.

Also this patch should not affect XEN or Vmware, should it? 

>
>
> Signed-off-by: Waldemar Kozaczuk <[email protected] <javascript:>>
> ---
>  Makefile                     |   1 +
>  drivers/device.hh            |   2 +
>  drivers/pci-generic.cc       |  25 +++-
>  drivers/virtio-assign.cc     |  40 ++-----
>  drivers/virtio-blk.cc        |  29 +++--
>  drivers/virtio-blk.hh        |   5 +-
>  drivers/virtio-device.hh     |  78 ++++++++++++
>  drivers/virtio-net.cc        |  38 +++---
>  drivers/virtio-net.hh        |   4 +-
>  drivers/virtio-pci-device.cc | 223 +++++++++++++++++++++++++++++++++++
>  drivers/virtio-pci-device.hh | 150 +++++++++++++++++++++++
>  drivers/virtio-rng.cc        |  16 ++-
>  drivers/virtio-rng.hh        |   3 +-
>  drivers/virtio-scsi.cc       |  32 ++---
>  drivers/virtio-scsi.hh       |   5 +-
>  drivers/virtio-vring.cc      |  15 +--
>  drivers/virtio-vring.hh      |   4 +-
>  drivers/virtio.cc            | 120 +++++--------------
>  drivers/virtio.hh            |  87 ++------------
>  drivers/vmw-pvscsi.hh        |   4 +-
>  20 files changed, 611 insertions(+), 270 deletions(-)
>  create mode 100644 drivers/virtio-device.hh
>  create mode 100644 drivers/virtio-pci-device.cc
>  create mode 100644 drivers/virtio-pci-device.hh
>
> diff --git a/Makefile b/Makefile
> index 68043485..7a88c56a 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -819,6 +819,7 @@ drivers += $(libtsm)
>  drivers += drivers/vga.o drivers/kbd.o drivers/isa-serial.o
>  drivers += arch/$(arch)/pvclock-abi.o
>  drivers += drivers/virtio.o
> +drivers += drivers/virtio-pci-device.o
>  drivers += drivers/virtio-vring.o
>  drivers += drivers/virtio-net.o
>  drivers += drivers/virtio-assign.o
> diff --git a/drivers/device.hh b/drivers/device.hh
> index 3882470a..2fbd0afd 100644
> --- a/drivers/device.hh
> +++ b/drivers/device.hh
> @@ -34,6 +34,8 @@ namespace hw {
>                      && _device_id == other._device_id;
>          }
>
> +        bool is_vendor(u16 vendor_id) { return _vendor_id == vendor_id; }
> +
>      private:
>          u16 _vendor_id;
>          u16 _device_id;
> diff --git a/drivers/pci-generic.cc b/drivers/pci-generic.cc
> index 61e7a0f5..821111c3 100644
> --- a/drivers/pci-generic.cc
> +++ b/drivers/pci-generic.cc
> @@ -16,6 +16,8 @@
>  #include "drivers/pci-function.hh"
>  #include "drivers/pci-bridge.hh"
>  #include "drivers/pci-device.hh"
> +#include "drivers/virtio.hh"
> +#include "drivers/virtio-pci-device.hh"
>
>  namespace pci {
>
> @@ -93,11 +95,24 @@ bool check_bus(u16 bus)
>                  break;
>              }
>
> -            if (!device_manager::instance()->register_device(dev)) {
> -                pci_e("Error: couldn't register device %02x:%02x.%x",
> -                        bus, slot, func);
> -                //TODO: Need to beautify it as multiple instances of the 
> device may exist
> -                delete dev;
> +            if (auto pci_dev = dynamic_cast<device*>(dev)) {
>
>
> This test is just for checking dev is not a bridge? Or something else?
>
Yep. 

>  
>
> +                if 
> (pci_dev->get_id().is_vendor(virtio::VIRTIO_VENDOR_ID)) {
> +                    auto virtio_dev = 
> virtio::create_virtio_device(pci_dev);
> +                    if 
> (!device_manager::instance()->register_device(virtio_dev)) {
> +                        pci_e("Error: couldn't register virtio pci device 
> %02x:%02x.%x",
> +                              bus, slot, func);
> +                        //TODO: Need to beautify it as multiple instances 
> of the device may exist
> +                        delete dev;
> +                    }
> +                }
>
>
> Shouldn't we have an else here? Perhaps reporting an unknown device, or 
> soemthing?
>
Sure. 

>
> +            }
> +            else { 
>
> +                if (!device_manager::instance()->register_device(dev)) {
> +                    pci_e("Error: couldn't register device %02x:%02x.%x",
> +                          bus, slot, func);
> +                    //TODO: Need to beautify it as multiple instances of 
> the device may exist
> +                    delete dev;
> +                }
>
>
> This code code (register_device() call, error handling, etc.) is 
> duplicated.  Would be nice not to, I think (perhaps by making this regular 
> code, not an "else", and having the other successful cases fall through to 
> this registration).
>
> Sure. 

>              }
>
>              // test for multiple functions
> diff --git a/drivers/virtio-assign.cc b/drivers/virtio-assign.cc
> index 8a128d5e..427811b6 100644
> --- a/drivers/virtio-assign.cc
> +++ b/drivers/virtio-assign.cc
> @@ -7,8 +7,10 @@
>
>
>  #include <osv/virtio-assign.hh>
> +#include <drivers/virtio-device.hh>
>  #include <drivers/virtio-net.hh>
>
> +
>  // Currently we support only one assigned virtio device, but more could
>  // easily be added later.
>  static osv::assigned_virtio *the_assigned_virtio_device = nullptr;
> @@ -23,8 +25,8 @@ assigned_virtio *assigned_virtio::get() {
>  class impl : public osv::assigned_virtio, public virtio::virtio_driver {
>  public:
>      static hw_driver* probe_net(hw_device* dev);
> -    explicit impl(pci::device& dev)
> -        : virtio_driver(dev)
> +    explicit impl(virtio::virtio_device& virtio_dev)
> +        : virtio_driver(virtio_dev)
>      {
>          assert(!the_assigned_virtio_device);
>          the_assigned_virtio_device = this;
> @@ -48,9 +50,8 @@ public:
>
>      virtual u32 queue_size(int queue) override
>      {
> -        virtio_conf_writew(virtio::VIRTIO_PCI_QUEUE_SEL, queue);
> -        return virtio_conf_readw(virtio::VIRTIO_PCI_QUEUE_NUM);
> -
> +        _dev.select_queue(queue);
> +        return _dev.get_queue_size();
>      }
>      virtual u32 init_features(u32 driver_features) override
>      {
> @@ -70,34 +71,13 @@ public:
>          auto *ht = &_hack_threads.back(); // assumes object won't move 
> later
>          handler = [ht] { ht->wake(); };
>
> -        // OSv's generic virtio driver has already set the device to 
> msix, and set
> -        // the VIRTIO_MSI_QUEUE_VECTOR of its queue to its number.
> -        assert(_dev.is_msix());
> -        virtio_conf_writew(virtio::VIRTIO_PCI_QUEUE_SEL, queue);
> -        assert(virtio_conf_readw(virtio::VIRTIO_MSI_QUEUE_VECTOR) == 
> queue);
> -        if (!_dev.is_msix_enabled()) {
> -            _dev.msix_enable();
> -        }
> -        auto vectors = _msi.request_vectors(1);
> -        assert(vectors.size() == 1);
> -        auto vec = vectors[0];
> -        // TODO: in _msi.easy_register() we also have code for moving the
> -        // interrupt's affinity to where the handling thread is. We should
> -        // probably do this here too.
> -        _msi.assign_isr(vec, handler);
> -        auto ok = _msi.setup_entry(queue, vec);
> -        assert(ok);
> -        vec->msix_unmask_entries();
> +        _dev.register_interrupt(queue,handler);
>
>
> Being very rusty with this code, I am not sure if these changes make sense 
> or not. Would have been
> really great if you could test whether "assigned virtio" still works. But 
> this will probably be difficult to
> test because assigned virtio (see 
> 86748d776b5fd7bcbb0af61da22ef148d40c66f0) is a feature we
> invented to tell OSv to ignore a virtio-net device and instead let the 
> application (namely, an old version
> of Seastar) read from the virtio queue directly.
>
> If we can't test it still works, I wonder if we should not, with deep 
> regrets, drop the "assigned virtio" feature :-(
>
I am very much in favor of you suggestion. Honestly I have no idea if my 
changes do not break this functionality. I was not even aware that it used 
to be used by seastar.
Unless somebody strongly disagrees I am more than happy to send a separate 
patch removing assigned virtio. This would also make this patch smaller and 
somewhat simpler. 
Anybody strongly disagrees? Shall I send a separate email to the group?

>  
>
>      }
>
>      virtual void set_queue_pfn(int queue, u64 phys) override
>      {
> -        virtio_conf_writew(virtio::VIRTIO_PCI_QUEUE_SEL, queue);
> -        // Tell host about pfn
> -        u64 pfn = phys >> virtio::VIRTIO_PCI_QUEUE_ADDR_SHIFT;
> -        // A bug in virtio's design... on large memory, this can actually 
> happen
> -        assert(pfn <= std::numeric_limits<u32>::max());
> -        virtio_conf_writel(virtio::VIRTIO_PCI_QUEUE_PFN, (u32)pfn);
> +        _dev.select_queue(queue);
> +        _dev.activate_queue(phys);
>      }
>
>      virtual void set_driver_ok() override
> @@ -107,7 +87,7 @@ public:
>
>      virtual void conf_read(void *buf, int length) override
>      {
> -        virtio_conf_read(virtio_pci_config_offset(), buf, length);
> +        virtio_conf_read(_dev.config_offset(), buf, length);
>      }
>
>  private:
> diff --git a/drivers/virtio-blk.cc b/drivers/virtio-blk.cc
> index a3f7898d..5f9f2028 100644
> --- a/drivers/virtio-blk.cc
> +++ b/drivers/virtio-blk.cc
> @@ -10,7 +10,6 @@
>
>  #include "drivers/virtio.hh"
>  #include "drivers/virtio-blk.hh"
> -#include "drivers/pci-device.hh"
>  #include <osv/interrupt.hh>
>
>  #include <osv/mempool.hh>
> @@ -100,7 +99,7 @@ struct driver blk_driver = {
>
>  bool blk::ack_irq()
>  {
> -    auto isr = virtio_conf_readb(VIRTIO_PCI_ISR);
> +    auto isr = _dev.ack_irq();
>
>
> Why do you call the method returning the isr "ack_irq"?
> (or, alternatively, why call its return value "isr"?)
>
I will revisit it. 

>  
>
>      auto queue = get_virt_queue(0);
>
>      if (isr) {
> @@ -112,8 +111,8 @@ bool blk::ack_irq()
>
>  }
>
> -blk::blk(pci::device& pci_dev)
> -    : virtio_driver(pci_dev), _ro(false)
> +blk::blk(virtio_device& virtio_dev)
> +    : virtio_driver(virtio_dev), _ro(false)
>  {
>
>      _driver_name = "virtio-blk";
> @@ -128,13 +127,19 @@ blk::blk(pci::device& pci_dev)
>              sched::thread::attr().name("virtio-blk"));
>      t->start();
>      auto queue = get_virt_queue(0);
> -    if (pci_dev.is_msix()) {
> -        _msi.easy_register({ { 0, [=] { queue->disable_interrupts(); }, t 
> } });
> -    } else {
> -        _irq.reset(new pci_interrupt(pci_dev,
> -                                     [=] { return ack_irq(); },
> -                                     [=] { t->wake(); }));
> -    }
> +
> +    interrupt_factory int_factory;
> +    int_factory.register_msi_bindings = [queue, t](interrupt_manager 
> &msi) {
> +        msi.easy_register( {{ 0, [=] { queue->disable_interrupts(); }, t 
> }});
> +    };
> +
> +    int_factory.create_pci_interrupt = [this,t](pci::device &pci_dev) {
> +        return new pci_interrupt(
> +            pci_dev,
> +            [=] { return this->ack_irq(); },
> +            [=] { t->wake(); });
> +    };
> +    _dev.register_interrupt(int_factory);
>
>      // Enable indirect descriptor
>      queue->set_use_indirect(true);
> @@ -164,7 +169,7 @@ blk::~blk()
>  void blk::read_config()
>  {
>      //read all of the block config (including size, mce, topology,..) in 
> one shot
> -    virtio_conf_read(virtio_pci_config_offset(), &_config, 
> sizeof(_config));
> +    virtio_conf_read(_dev.config_offset(), &_config, sizeof(_config));
>
>      trace_virtio_blk_read_config_capacity(_config.capacity);
>
> diff --git a/drivers/virtio-blk.hh b/drivers/virtio-blk.hh
> index 17cf4d18..8e814501 100644
> --- a/drivers/virtio-blk.hh
> +++ b/drivers/virtio-blk.hh
> @@ -8,7 +8,7 @@
>  #ifndef VIRTIO_BLK_DRIVER_H
>  #define VIRTIO_BLK_DRIVER_H
>  #include "drivers/virtio.hh"
> -#include "drivers/pci-device.hh"
> +#include "drivers/virtio-device.hh"
>  #include <osv/bio.h>
>
>  namespace virtio {
> @@ -118,7 +118,7 @@ public:
>          u8 status;
>      };
>
> -    explicit blk(pci::device& dev);
> +    explicit blk(virtio_device& dev);
>      virtual ~blk();
>
>      virtual std::string get_name() const { return _driver_name; }
> @@ -157,7 +157,6 @@ private:
>      bool _ro;
>      // This mutex protects parallel make_request invocations
>      mutex _lock;
> -    std::unique_ptr<pci_interrupt> _irq;
>  };
>
>  }
> diff --git a/drivers/virtio-device.hh b/drivers/virtio-device.hh
> new file mode 100644
> index 00000000..e6e29155
> --- /dev/null
> +++ b/drivers/virtio-device.hh
> @@ -0,0 +1,78 @@
> +/*
> + * Copyright (C) 2019 Waldemar Kozaczuk.
> + *
> + * This work is open source software, licensed under the terms of the
> + * BSD license as described in the LICENSE file in the top-level 
> directory.
> + */
> +
> +#ifndef VIRTIO_DEVICE_HH
> +#define VIRTIO_DEVICE_HH
> +
> +#include <osv/types.h>
> +#include <osv/pci.hh>
> +#include <osv/interrupt.hh>
> +#include <osv/msi.hh>
> +#include "virtio-vring.hh"
> +#include "device.hh"
> +
> +using namespace hw;
> +
> +#define VIRTIO_ALIGN(x,alignment) ((x + (alignment-1)) & ~(alignment-1))
> +
> +namespace virtio {
> +
> +// Allows virtio drivers specify how to instantiate interrupts or
> +// register msi bindings. The associated virtio-device will
> +// use adequate functor to create correct interrupt in order
> +// to register it.
> +struct interrupt_factory {
> +    std::function<void(interrupt_manager &)> register_msi_bindings = 
> nullptr;
> +    std::function<pci_interrupt *(pci::device &)> create_pci_interrupt = 
> nullptr;
> +    std::function<gsi_edge_interrupt *()> create_gsi_edge_interrupt = 
> nullptr;
> +};
> +
> +// Defines virtio transport abstraction used by virtio-driver
> +// to communicate with virtio device. The specializations of thise
>
>
> thise -> this
>
> +// include virtio pci device (legacy and modern) as well
> +// as virtio mmio device. This abstraction allows virtio driver
> +// not be tied to any specific transport (pci, mmio).
> +class virtio_device : public hw_device {
> +public:
> +    virtual ~virtio_device() {};
> +
> +    virtual void init() = 0;
> +
> +    virtual u8 ack_irq() = 0;
> +    virtual void register_interrupt(interrupt_factory irq_factory) = 0;
> +    virtual void register_interrupt(unsigned int queue, 
> std::function<void(void)> handler) = 0;
> +
> +    virtual void select_queue(int queue) = 0;
> +    virtual u16 get_queue_size() = 0;
> +    virtual void setup_queue(int queue) = 0;
> +    virtual void activate_queue(vring *queue) = 0;
> +    virtual void activate_queue(u64 phys) = 0;
> +    virtual void kick_queue(int queue) = 0;
> +
> +    virtual u64 get_available_features() = 0;
> +    virtual bool get_available_feature_bit(int bit) = 0;
> +
> +    virtual void set_enabled_features(u64 features) = 0;
> +    virtual void set_enabled_feature_bit(int bit, bool on) = 0;
> +    virtual u64 get_enabled_features() = 0;
> +    virtual bool get_enabled_feature_bit(int bit) = 0;
> +
> +    virtual u8 get_status() = 0;
> +    virtual void set_status(u8 status) = 0;
> +
> +    virtual u8 read_config(u32 offset) = 0;
> +    virtual void write_config(u32 offset, u8 byte) = 0;
> +    virtual int config_offset() = 0;
> +    virtual void dump_config() = 0;
> +
> +    virtual bool is_modern() = 0;
> +    virtual size_t get_vring_alignment() = 0;
> +};
> +
> +}
> +
> +#endif //VIRTIO_DEVICE_HH
> diff --git a/drivers/virtio-net.cc b/drivers/virtio-net.cc
> index b820c36c..3528613d 100644
> --- a/drivers/virtio-net.cc
> +++ b/drivers/virtio-net.cc
> @@ -10,7 +10,6 @@
>
>  #include "drivers/virtio.hh"
>  #include "drivers/virtio-net.hh"
> -#include "drivers/pci-device.hh"
>  #include <osv/interrupt.hh>
>
>  #include <osv/mempool.hh>
> @@ -217,7 +216,7 @@ void net::fill_qstats(const struct txq& txq, struct 
> if_data* out_data) const
>
>  bool net::ack_irq()
>  {
> -    auto isr = virtio_conf_readb(VIRTIO_PCI_ISR);
> +    auto isr = _dev.ack_irq();
>
>      if (isr) {
>          _rxq.vqueue->disable_interrupts();
> @@ -228,7 +227,7 @@ bool net::ack_irq()
>
>  }
>
> -net::net(pci::device& dev)
> +net::net(virtio_device& dev)
>      : virtio_driver(dev),
>        _rxq(get_virt_queue(0), [this] { this->receiver(); }),
>        _txq(this, get_virt_queue(1))
> @@ -290,16 +289,21 @@ net::net(pci::device& dev)
>
>      ether_ifattach(_ifn, _config.mac);
>
> -    if (dev.is_msix()) {
> -        _msi.easy_register({
> -            { 0, [&] { _rxq.vqueue->disable_interrupts(); }, poll_task },
> -            { 1, [&] { _txq.vqueue->disable_interrupts(); }, nullptr }
> -        });
> -    } else {
> -        _irq.reset(new pci_interrupt(dev,
> -                                     [=] { return this->ack_irq(); },
> -                                     [=] { poll_task->wake(); }));
> -    }
> +    interrupt_factory int_factory;
> +    int_factory.register_msi_bindings = 
> [this,poll_task](interrupt_manager &msi) {
> +       msi.easy_register({
> +           { 0, [&] { this->_rxq.vqueue->disable_interrupts(); }, 
> poll_task },
> +           { 1, [&] { this->_txq.vqueue->disable_interrupts(); }, nullptr 
> }
> +       });
> +    };
> +
> +    int_factory.create_pci_interrupt = [this,poll_task](pci::device 
> &pci_dev) {
> +        return new pci_interrupt(
> +            pci_dev,
> +            [=] { return this->ack_irq(); },
> +            [=] { poll_task->wake(); });
> +    };
> +    _dev.register_interrupt(int_factory);
>
> I am using here nested lambdas (lambdas that pass lambdas). My C++ 
experience is before C++11 so I wonder if there is no benign error 
especially around capturing by value vs reference. For example the outer 
lambda captures "this" and "poll_task" pointers by value - are we sure it 
does not affect the nested lambdas in some unexpected way?   

Also I am wondering what you think about this interrupt_factory struct as a 
way to register/create interrupts specific to each transport? Do you think 
there is a better way of doing it?

>      fill_rx_ring();
>
> @@ -324,7 +328,7 @@ net::~net()
>  void net::read_config()
>  {
>      //read all of the net config  in one shot
> -    virtio_conf_read(virtio_pci_config_offset(), &_config, 
> sizeof(_config));
> +    virtio_conf_read(_dev.config_offset(), &_config, sizeof(_config));
>
>      if (get_guest_feature_bit(VIRTIO_NET_F_MAC))
>          net_i("The mac addr of the device is %x:%x:%x:%x:%x:%x",
> @@ -856,12 +860,12 @@ u32 net::get_driver_features()
>
>  hw_driver* net::probe(hw_device* dev)
>  {
> -    if (auto pci_dev = dynamic_cast<pci::device*>(dev)) {
> -        if (pci_dev->get_id() == hw_device_id(VIRTIO_VENDOR_ID, 
> VIRTIO_NET_DEVICE_ID)) {
> +    if (auto virtio_dev = dynamic_cast<virtio_device*>(dev)) {
> +        if (virtio_dev->get_id() == hw_device_id(VIRTIO_VENDOR_ID, 
> VIRTIO_NET_DEVICE_ID)) {
>              if (opt_maxnic && maxnic-- <= 0) {
>                  return nullptr;
>              } else {
> -                return aligned_new<net>(*pci_dev);
> +                return aligned_new<net>(*virtio_dev);
>              }
>          }
>      }
> diff --git a/drivers/virtio-net.hh b/drivers/virtio-net.hh
> index ad323e69..e9f6dd0d 100644
> --- a/drivers/virtio-net.hh
> +++ b/drivers/virtio-net.hh
> @@ -204,7 +204,7 @@ public:
>              u16 virtqueue_pairs;
>      };
>
> -    explicit net(pci::device& dev);
> +    explicit net(virtio_device& dev);
>      virtual ~net();
>
>      virtual std::string get_name() const { return _driver_name; }
> @@ -269,8 +269,6 @@ private:
>
>      u32 _hdr_size;
>
> -    std::unique_ptr<pci_interrupt> _irq;
> -
>      struct rxq_stats {
>          u64 rx_packets; /* if_ipackets */
>          u64 rx_bytes;   /* if_ibytes */
> diff --git a/drivers/virtio-pci-device.cc b/drivers/virtio-pci-device.cc
> new file mode 100644
> index 00000000..a3ccfe26
> --- /dev/null
> +++ b/drivers/virtio-pci-device.cc
> @@ -0,0 +1,223 @@
> +/*
> + * Copyright (C) 2019 Waldemar Kozaczuk.
> + *
> + * This work is open source software, licensed under the terms of the
> + * BSD license as described in the LICENSE file in the top-level 
> directory.
> + */
> +
> +#include "drivers/virtio-pci-device.hh"
> +
> +namespace virtio {
> +
> +virtio_pci_device::virtio_pci_device(pci::device *dev)
> +    : virtio_device()
> +    , _dev(dev)
> +    , _msi(dev)
> +{
> +}
> +
> +virtio_pci_device::~virtio_pci_device() {
> +    delete _dev;
> +}
> +
> +virtio_legacy_pci_device::virtio_legacy_pci_device(pci::device *dev)
> +    : virtio_pci_device(dev)
> +{
> +}
> +
> +void virtio_legacy_pci_device::kick_queue(int queue)
> +{
> +    virtio_conf_writew(VIRTIO_PCI_QUEUE_NOTIFY, queue);
> +}
> +
> +void virtio_legacy_pci_device::setup_queue(int queue)
> +{
> +    if (_dev->is_msix()) {
> +        // Setup queue_id:entry_id 1:1 correlation...
> +        virtio_conf_writew(VIRTIO_MSI_QUEUE_VECTOR, queue);
> +        if (virtio_conf_readw(VIRTIO_MSI_QUEUE_VECTOR) != queue) {
> +            virtio_e("Setting MSIx entry for queue %d failed.", queue);
> +            return;
> +        }
> +    }
> +}
> +
> +void virtio_legacy_pci_device::activate_queue(vring *queue)
> +{
> +    // Tell host about pfn
> +    // TODO: Yak, this is a bug in the design, on large memory we'll have 
> PFNs > 32 bit
> +    // Dor to notify Rusty
> +    virtio_conf_writel(VIRTIO_PCI_QUEUE_PFN, (u32)(queue->get_paddr() >> 
> VIRTIO_PCI_QUEUE_ADDR_SHIFT));
> +}
> +
> +void virtio_legacy_pci_device::activate_queue(u64 phys)
> +{
> +    // Tell host about pfn
> +    u64 pfn = phys >> VIRTIO_PCI_QUEUE_ADDR_SHIFT;
> +    // A bug in virtio's design... on large memory, this can actually 
> happen
> +    assert(pfn <= std::numeric_limits<u32>::max());
> +    virtio_conf_writel(VIRTIO_PCI_QUEUE_PFN, (u32)pfn);
> +}
> +
> +void virtio_legacy_pci_device::select_queue(int queue)
> +{
> +    virtio_conf_writew(VIRTIO_PCI_QUEUE_SEL, queue);
> +}
> +
> +u16 virtio_legacy_pci_device::get_queue_size()
> +{
> +    return virtio_conf_readw(VIRTIO_PCI_QUEUE_NUM);
> +}
> +
> +bool virtio_legacy_pci_device::get_virtio_config_bit(u32 offset, int bit)
> +{
> +    return virtio_conf_readl(offset) & (1 << bit);
> +}
> +
> +void virtio_legacy_pci_device::set_virtio_config_bit(u32 offset, int bit, 
> bool on)
> +{
> +    u32 val = virtio_conf_readl(offset);
> +    u32 newval = ( val & ~(1 << bit) ) | ((int)(on)<<bit);
> +    virtio_conf_writel(offset, newval);
> +}
> +
> +u64 virtio_legacy_pci_device::get_available_features()
> +{
> +    return virtio_conf_readl(VIRTIO_PCI_HOST_FEATURES);
> +}
> +
> +bool virtio_legacy_pci_device::get_available_feature_bit(int bit)
> +{
> +    return get_virtio_config_bit(VIRTIO_PCI_HOST_FEATURES, bit);
> +}
> +
> +void virtio_legacy_pci_device::set_enabled_features(u64 features)
> +{
> +    virtio_conf_writel(VIRTIO_PCI_GUEST_FEATURES, (u32)features);
> +}
> +
> +void virtio_legacy_pci_device::set_enabled_feature_bit(int bit, bool on)
> +{
> +    set_virtio_config_bit(VIRTIO_PCI_GUEST_FEATURES, bit, on);
> +}
> +
> +u64 virtio_legacy_pci_device::get_enabled_features()
> +{
> +    return virtio_conf_readl(VIRTIO_PCI_GUEST_FEATURES);
> +}
> +
> +bool virtio_legacy_pci_device::get_enabled_feature_bit(int bit)
> +{
> +    return get_virtio_config_bit(VIRTIO_PCI_GUEST_FEATURES, bit);
> +}
> +
> +u8 virtio_legacy_pci_device::get_status()
> +{
> +    return virtio_conf_readb(VIRTIO_PCI_STATUS);
> +}
> +
> +void virtio_legacy_pci_device::set_status(u8 status)
> +{
> +    virtio_conf_writeb(VIRTIO_PCI_STATUS, status);
> +}
> +
> +u8 virtio_legacy_pci_device::read_config(u32 offset)
> +{
> +    return _bar1->readb(offset);
> +}
> +
> +void virtio_legacy_pci_device::write_config(u32 offset, u8 byte)
> +{
> +    _bar1->writeb(offset, byte);
> +}
> +
> +void virtio_legacy_pci_device::dump_config()
> +{
> +    u8 B, D, F;
> +    _dev->get_bdf(B, D, F);
> +
> +    _dev->dump_config();
> +    virtio_d("%s [%x:%x.%x] vid:id=%x:%x", get_name().c_str(),
> +             (u16)B, (u16)D, (u16)F,
> +             _dev.get_vendor_id(),
> +             _dev.get_device_id());
> +}
> +
> +void virtio_legacy_pci_device::init()
> +{
> +    bool status = parse_pci_config();
> +    assert(status);
> +
> +    _dev->set_bus_master(true);
> +
> +    _dev->msix_enable();
> +}
> +
> +void virtio_legacy_pci_device::register_interrupt(interrupt_factory 
> irq_factory)
> +{
> +    if (irq_factory.register_msi_bindings && _dev->is_msix()) {
> +        irq_factory.register_msi_bindings(_msi);
> +    } else {
> +        _irq.reset(irq_factory.create_pci_interrupt(*_dev));
> +    }
> +}
> +
> +void virtio_legacy_pci_device::register_interrupt(unsigned int queue, 
> std::function<void(void)> handler)
> +{
> +    // OSv's generic virtio driver has already set the device to msix, 
> and set
> +    // the VIRTIO_MSI_QUEUE_VECTOR of its queue to its number.
> +    assert(_dev->is_msix());
> +    virtio_conf_writew(virtio::VIRTIO_PCI_QUEUE_SEL, queue);
> +    assert(virtio_conf_readw(virtio::VIRTIO_MSI_QUEUE_VECTOR) == queue);
> +    if (!_dev->is_msix_enabled()) {
> +        _dev->msix_enable();
> +    }
> +    auto vectors = _msi.request_vectors(1);
> +    assert(vectors.size() == 1);
> +    auto vec = vectors[0];
> +    // TODO: in _msi.easy_register() we also have code for moving the
> +    // interrupt's affinity to where the handling thread is. We should
> +    // probably do this here too.
> +    _msi.assign_isr(vec, handler);
> +    auto ok = _msi.setup_entry(queue, vec);
> +    assert(ok);
> +    vec->msix_unmask_entries();
> +}
> +
> +u8 virtio_legacy_pci_device::ack_irq()
> +{
> +    return virtio_conf_readb(VIRTIO_PCI_ISR);
> +}
> +
> +bool virtio_legacy_pci_device::parse_pci_config()
> +{
> +    // Test whether bar1 is present
> +    _bar1 = _dev->get_bar(1);
> +    if (_bar1 == nullptr) {
> +        return false;
> +    }
> +
> +    // Check ABI version
> +    u8 rev = _dev->get_revision_id();
> +    if (rev != VIRTIO_PCI_ABI_VERSION) {
> +        virtio_e("Wrong virtio revision=%x", rev);
> +        return false;
> +    }
> +
> +    // Check device ID
> +    u16 dev_id = _dev->get_device_id();
> +    if ((dev_id < VIRTIO_PCI_ID_MIN) || (dev_id > VIRTIO_PCI_ID_MAX)) {
> +        virtio_e("Wrong virtio dev id %x", dev_id);
> +        return false;
> +    }
> +
> +    return true;
> +}
> +
> +virtio_device* create_virtio_device(pci::device *dev) {
> +    //TODO Read PCI device configuration to create instance
> +    // of virtio_modern_pci_device or virtio_legacy_pci_device
> +    return new virtio_legacy_pci_device(dev);
> +}
> +
> +}
> diff --git a/drivers/virtio-pci-device.hh b/drivers/virtio-pci-device.hh
> new file mode 100644
> index 00000000..76827fc8
> --- /dev/null
> +++ b/drivers/virtio-pci-device.hh
> @@ -0,0 +1,150 @@
> +/*
> + * Copyright (C) 2019 Waldemar Kozaczuk.
> + *
> + * This work is open source software, licensed under the terms of the
> + * BSD license as described in the LICENSE file in the top-level 
> directory.
> + */
> +
> +#ifndef VIRTIO_PCI_DEVICE_HH
> +#define VIRTIO_PCI_DEVICE_HH
> +
> +#include <osv/pci.hh>
> +#include <osv/interrupt.hh>
> +#include <osv/msi.hh>
> +
> +#include "virtio-device.hh"
> +
> +namespace virtio {
> +
> +enum VIRTIO_PCI_CONFIG {
> +    /* A 32-bit r/o bitmask of the features supported by the host */
> +    VIRTIO_PCI_HOST_FEATURES = 0,
> +    /* A 32-bit r/o bitmask of the features supported by the host */
> +    VIRTIO_PCI_HOST_FEATURES_HIGH = 1,
> +    /* A 32-bit r/w bitmask of features activated by the guest */
> +    VIRTIO_PCI_GUEST_FEATURES = 4,
> +    /* A 32-bit r/w PFN for the currently selected queue */
> +    VIRTIO_PCI_QUEUE_PFN = 8,
> +    /* A 16-bit r/o queue size for the currently selected queue */
> +    VIRTIO_PCI_QUEUE_NUM = 12,
> +    /* A 16-bit r/w queue selector */
> +    VIRTIO_PCI_QUEUE_SEL = 14,
> +    /* A 16-bit r/w queue notifier */
> +    VIRTIO_PCI_QUEUE_NOTIFY = 16,
> +    /* An 8-bit device status register.  */
> +    VIRTIO_PCI_STATUS = 18,
> +    /* An 8-bit r/o interrupt status register.  Reading the value will 
> return the
> +     * current contents of the ISR and will also clear it.  This is 
> effectively
> +     * a read-and-acknowledge. */
> +    VIRTIO_PCI_ISR = 19,
> +    /* The bit of the ISR which indicates a device configuration change. 
> */
> +    VIRTIO_PCI_ISR_CONFIG  = 0x2,
> +    /* MSI-X registers: only enabled if MSI-X is enabled. */
> +    /* A 16-bit vector for configuration changes. */
> +    VIRTIO_MSI_CONFIG_VECTOR = 20,
> +    /* A 16-bit vector for selected queue notifications. */
> +    VIRTIO_MSI_QUEUE_VECTOR = 22,
> +    /* Vector value used to disable MSI for queue */
> +    VIRTIO_MSI_NO_VECTOR = 0xffff,
> +    /* Virtio ABI version, this must match exactly */
> +    VIRTIO_PCI_ABI_VERSION = 0,
> +    /* How many bits to shift physical queue address written to QUEUE_PFN.
> +     * 12 is historical, and due to x86 page size. */
> +    VIRTIO_PCI_QUEUE_ADDR_SHIFT = 12,
> +    /* The alignment to use between consumer and producer parts of vring.
> +     * x86 pagesize again. */
> +    VIRTIO_PCI_VRING_ALIGN = 4096,
> +    VIRTIO_PCI_ID_MIN = 0x1000,
> +    VIRTIO_PCI_ID_MAX = 0x103f,
> +};
> +
> +class virtio_pci_device : public virtio_device {
> +public:
> +    explicit virtio_pci_device(pci::device *dev);
> +    ~virtio_pci_device();
> +
> +    virtual hw_device_id get_id() { return _dev->get_id(); }
> +    virtual void print() { _dev->print(); }
> +    virtual void reset() { _dev->reset(); }
> +
> +    bool is_attached()  { return _dev->is_attached(); }
> +    void set_attached() { _dev->set_attached(); }
> +
> +    size_t get_vring_alignment() { return VIRTIO_PCI_VRING_ALIGN; }
> +
> +protected:
> +    pci::device *_dev;
> +    interrupt_manager _msi;
> +    std::unique_ptr<pci_interrupt> _irq;
> +};
> +
> +class virtio_legacy_pci_device : public virtio_pci_device {
> +public:
> +    explicit virtio_legacy_pci_device(pci::device *dev);
> +    ~virtio_legacy_pci_device() {}
> +
> +    virtual void init();
> +    virtual void register_interrupt(interrupt_factory irq_factory);
> +    virtual void register_interrupt(unsigned int queue, 
> std::function<void(void)> handler);
> +
> +    virtual void select_queue(int queue);
> +    virtual u16 get_queue_size();
> +    virtual void setup_queue(int queue);
> +    virtual void activate_queue(vring *queue);
> +    virtual void activate_queue(u64 phys);
> +    virtual void kick_queue(int queue);
> +
> +    virtual u64 get_available_features();
> +    virtual bool get_available_feature_bit(int bit);
> +
> +    virtual void set_enabled_features(u64 features);
> +    virtual void set_enabled_feature_bit(int bit, bool on);
> +    virtual u64 get_enabled_features();
> +    virtual bool get_enabled_feature_bit(int bit);
> +
> +    virtual u8 get_status();
> +    virtual void set_status(u8 status);
> +
> +    virtual u8 read_config(u32 offset);
> +    virtual void write_config(u32 offset, u8 byte);
> +    virtual void dump_config();
> +    virtual u8 ack_irq();
> +
> +    // The remaining space is defined by each driver as the per-driver
> +    // configuration space
> +    virtual int config_offset() { return (_dev->is_msix_enabled())? 24 : 
> 20;}
> +
> +    virtual bool is_modern() { return false; };
> +private:
> +    bool parse_pci_config();
> +
> +    // Access the virtio conf address space set by pci bar 1
> +    bool get_virtio_config_bit(u32 offset, int bit);
> +    void set_virtio_config_bit(u32 offset, int bit, bool on);
> +
> +    // Access virtio config space
> +    u8 virtio_conf_readb(u32 offset) { return _bar1->readb(offset);};
> +    u16 virtio_conf_readw(u32 offset) { return _bar1->readw(offset);};
> +    u32 virtio_conf_readl(u32 offset) { return _bar1->readl(offset);};
> +    void virtio_conf_writeb(u32 offset, u8 val) { _bar1->writeb(offset, 
> val);};
> +    void virtio_conf_writew(u32 offset, u16 val) { _bar1->writew(offset, 
> val);};
> +    void virtio_conf_writel(u32 offset, u32 val) { _bar1->writel(offset, 
> val);};
> +
> +    pci::bar* _bar1;
> +};
> +
> +class virtio_modern_pci_device : public virtio_pci_device {
> +public:
> +    explicit virtio_modern_pci_device(pci::device *dev);
> +    ~virtio_modern_pci_device();
> +
> +    virtual bool is_modern() { return true; };
> +};
> +
> +// Creates instance of virtio_modern_pci_device or 
> virtio_legacy_pci_device
> +// by reading configuration from PCI device
> +virtio_device* create_virtio_device(pci::device *dev);
> +
> +}
> +
> +#endif //VIRTIO_PCI_DEVICE_HH
> diff --git a/drivers/virtio-rng.cc b/drivers/virtio-rng.cc
> index d506cb2a..1e13c3e4 100644
> --- a/drivers/virtio-rng.cc
> +++ b/drivers/virtio-rng.cc
> @@ -36,11 +36,19 @@ static int virtio_rng_read(void *buf, int size)
>  }
>
>  namespace virtio {
> -rng::rng(pci::device& pci_dev)
> -    : virtio_driver(pci_dev)
> -    , _irq(pci_dev, [&] { return ack_irq(); }, [&] { handle_irq(); })
> +rng::rng(virtio_device& dev)
> +    : virtio_driver(dev)
>      , _thread(sched::thread::make([&] { worker(); }, 
> sched::thread::attr().name("virtio-rng")))
>  {
> +    interrupt_factory int_factory;
> +    int_factory.create_pci_interrupt = [this](pci::device &pci_dev) {
> +        return new pci_interrupt(
> +            pci_dev,
> +            [=] { return this->ack_irq(); },
> +            [=] { this->handle_irq(); });
> +    };
> +    _dev.register_interrupt(int_factory);
> +
>      _queue = get_virt_queue(0);
>
>      add_dev_status(VIRTIO_CONFIG_S_DRIVER_OK);
> @@ -79,7 +87,7 @@ void rng::handle_irq()
>
>  bool rng::ack_irq()
>  {
> -    return virtio_conf_readb(VIRTIO_PCI_ISR);
> +    return _dev.ack_irq();
>  }
>
>  void rng::worker()
> diff --git a/drivers/virtio-rng.hh b/drivers/virtio-rng.hh
> index 1790665b..e704fea4 100644
> --- a/drivers/virtio-rng.hh
> +++ b/drivers/virtio-rng.hh
> @@ -26,7 +26,7 @@ public:
>          VIRTIO_RNG_DEVICE_ID = 0x1005,
>      };
>
> -    explicit rng(pci::device& dev);
> +    explicit rng(virtio_device& dev);
>      virtual ~rng();
>
>      virtual std::string get_name() const { return "virtio-rng"; }
> @@ -44,7 +44,6 @@ private:
>
>      static const size_t _pool_size = 64;
>      std::vector<char> _entropy;
> -    pci_interrupt _irq;
>      std::unique_ptr<sched::thread> _thread;
>      condvar _producer;
>      condvar _consumer;
> diff --git a/drivers/virtio-scsi.cc b/drivers/virtio-scsi.cc
> index f3f94bd8..b053ae7e 100644
> --- a/drivers/virtio-scsi.cc
> +++ b/drivers/virtio-scsi.cc
> @@ -12,7 +12,6 @@
>  #include <osv/mempool.hh>
>  #include <osv/sched.hh>
>  #include <osv/interrupt.hh>
> -#include "drivers/pci-device.hh"
>  #include "drivers/virtio.hh"
>  #include "drivers/virtio-scsi.hh"
>  #include "drivers/scsi-common.hh"
> @@ -129,7 +128,7 @@ void scsi::add_lun(u16 target, u16 lun)
>
>  bool scsi::ack_irq()
>  {
> -    auto isr = virtio_conf_readb(VIRTIO_PCI_ISR);
> +    auto isr = _dev.ack_irq();
>      auto queue = get_virt_queue(VIRTIO_SCSI_QUEUE_REQ);
>
>      if (isr) {
> @@ -141,7 +140,7 @@ bool scsi::ack_irq()
>
>  }
>
> -scsi::scsi(pci::device& dev)
> +scsi::scsi(virtio_device& dev)
>      : virtio_driver(dev)
>  {
>
> @@ -157,17 +156,22 @@ scsi::scsi(pci::device& dev)
>      t->start();
>      auto queue = get_virt_queue(VIRTIO_SCSI_QUEUE_REQ);
>
> -    if (dev.is_msix()) {
> -        _msi.easy_register({
> -                { VIRTIO_SCSI_QUEUE_CTRL, nullptr, nullptr },
> -                { VIRTIO_SCSI_QUEUE_EVT, nullptr, nullptr },
> -                { VIRTIO_SCSI_QUEUE_REQ, [=] { 
> queue->disable_interrupts(); }, t },
> +    interrupt_factory int_factory;
> +    int_factory.register_msi_bindings = [queue,t](interrupt_manager &msi) 
> {
> +        msi.easy_register({
> +          { VIRTIO_SCSI_QUEUE_CTRL, nullptr, nullptr },
> +          { VIRTIO_SCSI_QUEUE_EVT, nullptr, nullptr },
> +          { VIRTIO_SCSI_QUEUE_REQ, [=] { queue->disable_interrupts(); }, 
> t },
>          });
> -    } else {
> -        _irq.reset(new pci_interrupt(dev,
> -                                     [=] { return this->ack_irq(); },
> -                                     [=] { t->wake(); }));
> -    }
> +    };
> +
> +    int_factory.create_pci_interrupt = [this,t](pci::device &pci_dev) {
> +        return new pci_interrupt(
> +                pci_dev,
> +                [=] { return this->ack_irq(); },
> +                [=] { t->wake(); });
> +    };
> +    _dev.register_interrupt(int_factory);
>
>      // Enable indirect descriptor
>      queue->set_use_indirect(true);
> @@ -184,7 +188,7 @@ scsi::~scsi()
>
>  void scsi::read_config()
>  {
> -    virtio_conf_read(virtio_pci_config_offset(), &_config, 
> sizeof(_config));
> +    virtio_conf_read(_dev.config_offset(), &_config, sizeof(_config));
>      config.max_lun = _config.max_lun;
>      config.max_target = _config.max_target;
>  }
> diff --git a/drivers/virtio-scsi.hh b/drivers/virtio-scsi.hh
> index 205ccb40..30c3d9e0 100644
> --- a/drivers/virtio-scsi.hh
> +++ b/drivers/virtio-scsi.hh
> @@ -8,7 +8,6 @@
>  #ifndef VIRTIO_SCSI_DRIVER_H
>  #define VIRTIO_SCSI_DRIVER_H
>  #include "drivers/virtio.hh"
> -#include "drivers/pci-device.hh"
>  #include "drivers/scsi-common.hh"
>  #include <osv/bio.h>
>  #include <osv/types.h>
> @@ -145,7 +144,7 @@ public:
>      };
>
>
> -    scsi(pci::device& dev);
> +    scsi(virtio_device& dev);
>      ~scsi();
>
>      virtual std::string get_name() const { return _driver_name; }
> @@ -174,8 +173,6 @@ private:
>      std::string _driver_name;
>      scsi_config _config;
>
> -    std::unique_ptr<pci_interrupt> _irq;
> -
>      //maintains the virtio instance number for multiple drives
>      static int _instance;
>      int _id;
> diff --git a/drivers/virtio-vring.cc b/drivers/virtio-vring.cc
> index 5f89fb5d..3f0e7e25 100644
> --- a/drivers/virtio-vring.cc
> +++ b/drivers/virtio-vring.cc
> @@ -38,12 +38,13 @@ TRACEPOINT(trace_vring_get_buf_ret, "vring=%p 
> _avail_count %d", void*, int);
>
>  namespace virtio {
>
> -    vring::vring(virtio_driver* const dev, u16 num, u16 q_index)
> +    vring::vring(virtio_driver* const driver, u16 num, u16 q_index)
>      {
> -        _dev = dev;
> +        _driver = driver;
>          _q_index = q_index;
>          // Alloc enough pages for the vring...
> -        unsigned sz = VIRTIO_ALIGN(vring::get_size(num, 
> VIRTIO_PCI_VRING_ALIGN));
> +        size_t alignment = driver->get_vring_alignment();
> +        size_t sz = VIRTIO_ALIGN(vring::get_size(num, alignment), 
> alignment);
>          _vring_ptr = memory::alloc_phys_contiguous_aligned(sz, 4096);
>          memset(_vring_ptr, 0, sz);
>
> @@ -53,7 +54,7 @@ namespace virtio {
>          _desc = (vring_desc*)_vring_ptr;
>          _avail = (vring_avail*)(_vring_ptr + num * sizeof(vring_desc));
>          _used = (vring_used*)(((unsigned long)&_avail->_ring[num] +
> -                sizeof(u16) + VIRTIO_PCI_VRING_ALIGN - 1) & 
> ~(VIRTIO_PCI_VRING_ALIGN - 1));
> +                sizeof(u16) + alignment - 1) & ~(alignment - 1));
>
>          // initialize the next pointer within the available ring
>          for (int i = 0; i < num; i++) _desc[i]._next = i + 1;
> @@ -102,7 +103,7 @@ namespace virtio {
>      inline bool vring::use_indirect(int desc_needed)
>      {
>          return _use_indirect &&
> -               _dev->get_indirect_buf_cap() &&
> +               _driver->get_indirect_buf_cap() &&
>                 // don't let the posting fail due to low available buffers 
> number
>                 (desc_needed > _avail_count ||
>                 // no need to use indirect for a single descriptor
> @@ -280,7 +281,7 @@ namespace virtio {
>
> ...

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Reply via email to