Re: [PATCH v2] serial: 8250_pci: Add Amazon PCI serial device ID
On Mon, Nov 13, 2017 at 11:31:31AM -0800, Matt Wilson wrote: > From: Matt Wilson <m...@amazon.com> > > This device will be used in future Amazon EC2 instances as the primary > serial port (i.e., data sent to this port will be available via the > GetConsoleOuput [1] EC2 API). Ping? --msw > [1] > http://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_GetConsoleOutput.html > > Cc: sta...@vger.kernel.org > Signed-off-by: Matt Wilson <m...@amazon.com> > --- > drivers/tty/serial/8250/8250_pci.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/tty/serial/8250/8250_pci.c > b/drivers/tty/serial/8250/8250_pci.c > index 0c101a7..d4e7be8 100644 > --- a/drivers/tty/serial/8250/8250_pci.c > +++ b/drivers/tty/serial/8250/8250_pci.c > @@ -5137,6 +5137,9 @@ static const struct pci_device_id serial_pci_tbl[] = { > { PCI_DEVICE(0x1601, 0x0800), .driver_data = pbn_b0_4_125 }, > { PCI_DEVICE(0x1601, 0xa801), .driver_data = pbn_b0_4_125 }, > > + /* Amazon PCI serial device */ > + { PCI_DEVICE(0x1d0f, 0x8250), .driver_data = pbn_b0_1_115200 }, > + > /* >* These entries match devices with class COMMUNICATION_SERIAL, >* COMMUNICATION_MODEM or COMMUNICATION_MULTISERIAL
Re: [PATCH v2] serial: 8250_pci: Add Amazon PCI serial device ID
On Mon, Nov 13, 2017 at 11:31:31AM -0800, Matt Wilson wrote: > From: Matt Wilson > > This device will be used in future Amazon EC2 instances as the primary > serial port (i.e., data sent to this port will be available via the > GetConsoleOuput [1] EC2 API). Ping? --msw > [1] > http://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_GetConsoleOutput.html > > Cc: sta...@vger.kernel.org > Signed-off-by: Matt Wilson > --- > drivers/tty/serial/8250/8250_pci.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/tty/serial/8250/8250_pci.c > b/drivers/tty/serial/8250/8250_pci.c > index 0c101a7..d4e7be8 100644 > --- a/drivers/tty/serial/8250/8250_pci.c > +++ b/drivers/tty/serial/8250/8250_pci.c > @@ -5137,6 +5137,9 @@ static const struct pci_device_id serial_pci_tbl[] = { > { PCI_DEVICE(0x1601, 0x0800), .driver_data = pbn_b0_4_125 }, > { PCI_DEVICE(0x1601, 0xa801), .driver_data = pbn_b0_4_125 }, > > + /* Amazon PCI serial device */ > + { PCI_DEVICE(0x1d0f, 0x8250), .driver_data = pbn_b0_1_115200 }, > + > /* >* These entries match devices with class COMMUNICATION_SERIAL, >* COMMUNICATION_MODEM or COMMUNICATION_MULTISERIAL
[PATCH v2] serial: 8250_pci: Add Amazon PCI serial device ID
From: Matt Wilson <m...@amazon.com> This device will be used in future Amazon EC2 instances as the primary serial port (i.e., data sent to this port will be available via the GetConsoleOuput [1] EC2 API). [1] http://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_GetConsoleOutput.html Cc: sta...@vger.kernel.org Signed-off-by: Matt Wilson <m...@amazon.com> --- drivers/tty/serial/8250/8250_pci.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/tty/serial/8250/8250_pci.c b/drivers/tty/serial/8250/8250_pci.c index 0c101a7..d4e7be8 100644 --- a/drivers/tty/serial/8250/8250_pci.c +++ b/drivers/tty/serial/8250/8250_pci.c @@ -5137,6 +5137,9 @@ static const struct pci_device_id serial_pci_tbl[] = { { PCI_DEVICE(0x1601, 0x0800), .driver_data = pbn_b0_4_125 }, { PCI_DEVICE(0x1601, 0xa801), .driver_data = pbn_b0_4_125 }, + /* Amazon PCI serial device */ + { PCI_DEVICE(0x1d0f, 0x8250), .driver_data = pbn_b0_1_115200 }, + /* * These entries match devices with class COMMUNICATION_SERIAL, * COMMUNICATION_MODEM or COMMUNICATION_MULTISERIAL -- 2.7.4
[PATCH v2] serial: 8250_pci: Add Amazon PCI serial device ID
From: Matt Wilson This device will be used in future Amazon EC2 instances as the primary serial port (i.e., data sent to this port will be available via the GetConsoleOuput [1] EC2 API). [1] http://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_GetConsoleOutput.html Cc: sta...@vger.kernel.org Signed-off-by: Matt Wilson --- drivers/tty/serial/8250/8250_pci.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/tty/serial/8250/8250_pci.c b/drivers/tty/serial/8250/8250_pci.c index 0c101a7..d4e7be8 100644 --- a/drivers/tty/serial/8250/8250_pci.c +++ b/drivers/tty/serial/8250/8250_pci.c @@ -5137,6 +5137,9 @@ static const struct pci_device_id serial_pci_tbl[] = { { PCI_DEVICE(0x1601, 0x0800), .driver_data = pbn_b0_4_125 }, { PCI_DEVICE(0x1601, 0xa801), .driver_data = pbn_b0_4_125 }, + /* Amazon PCI serial device */ + { PCI_DEVICE(0x1d0f, 0x8250), .driver_data = pbn_b0_1_115200 }, + /* * These entries match devices with class COMMUNICATION_SERIAL, * COMMUNICATION_MODEM or COMMUNICATION_MULTISERIAL -- 2.7.4
[PATCH] serial: 8250_pci: Add Amazon PCI serial device ID
From: Matt Wilson <m...@amazon.com> Cc: sta...@vger.kernel.org Signed-off-by: Matt Wilson <m...@amazon.com> --- drivers/tty/serial/8250/8250_pci.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/tty/serial/8250/8250_pci.c b/drivers/tty/serial/8250/8250_pci.c index 0c101a7..d4e7be8 100644 --- a/drivers/tty/serial/8250/8250_pci.c +++ b/drivers/tty/serial/8250/8250_pci.c @@ -5137,6 +5137,9 @@ static const struct pci_device_id serial_pci_tbl[] = { { PCI_DEVICE(0x1601, 0x0800), .driver_data = pbn_b0_4_125 }, { PCI_DEVICE(0x1601, 0xa801), .driver_data = pbn_b0_4_125 }, + /* Amazon PCI serial device */ + { PCI_DEVICE(0x1d0f, 0x8250), .driver_data = pbn_b0_1_115200 }, + /* * These entries match devices with class COMMUNICATION_SERIAL, * COMMUNICATION_MODEM or COMMUNICATION_MULTISERIAL -- 2.7.4
[PATCH] serial: 8250_pci: Add Amazon PCI serial device ID
From: Matt Wilson Cc: sta...@vger.kernel.org Signed-off-by: Matt Wilson --- drivers/tty/serial/8250/8250_pci.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/tty/serial/8250/8250_pci.c b/drivers/tty/serial/8250/8250_pci.c index 0c101a7..d4e7be8 100644 --- a/drivers/tty/serial/8250/8250_pci.c +++ b/drivers/tty/serial/8250/8250_pci.c @@ -5137,6 +5137,9 @@ static const struct pci_device_id serial_pci_tbl[] = { { PCI_DEVICE(0x1601, 0x0800), .driver_data = pbn_b0_4_125 }, { PCI_DEVICE(0x1601, 0xa801), .driver_data = pbn_b0_4_125 }, + /* Amazon PCI serial device */ + { PCI_DEVICE(0x1d0f, 0x8250), .driver_data = pbn_b0_1_115200 }, + /* * These entries match devices with class COMMUNICATION_SERIAL, * COMMUNICATION_MODEM or COMMUNICATION_MULTISERIAL -- 2.7.4
Re: [PATCH V2 net 13/20] net/ena: change driver's default timeouts
On Sun, Dec 04, 2016 at 03:19:31PM +0200, Netanel Belgazal wrote: ... because? (they turned out to be too aggressive, I believe.) > Signed-off-by: Netanel Belgazal> --- > drivers/net/ethernet/amazon/ena/ena_com.c| 4 ++-- > drivers/net/ethernet/amazon/ena/ena_netdev.h | 7 --- > 2 files changed, 6 insertions(+), 5 deletions(-) > > diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c > b/drivers/net/ethernet/amazon/ena/ena_com.c > index 4147d6e..a550c8a 100644 > --- a/drivers/net/ethernet/amazon/ena/ena_com.c > +++ b/drivers/net/ethernet/amazon/ena/ena_com.c > @@ -36,9 +36,9 @@ > > /*/ > > /* Timeout in micro-sec */ > -#define ADMIN_CMD_TIMEOUT_US (100) > +#define ADMIN_CMD_TIMEOUT_US (300) > > -#define ENA_ASYNC_QUEUE_DEPTH 4 > +#define ENA_ASYNC_QUEUE_DEPTH 16 Why is this changed at the same time? > #define ENA_ADMIN_QUEUE_DEPTH 32 > > #define MIN_ENA_VER (((ENA_COMMON_SPEC_VERSION_MAJOR) << \ > diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h > b/drivers/net/ethernet/amazon/ena/ena_netdev.h > index c081fd3..ed42e07 100644 > --- a/drivers/net/ethernet/amazon/ena/ena_netdev.h > +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h > @@ -39,6 +39,7 @@ > #include > #include > #include > +#include This change seems unrelated. > #include "ena_com.h" > #include "ena_eth_com.h" > @@ -100,7 +101,7 @@ > /* Number of queues to check for missing queues per timer service */ > #define ENA_MONITORED_TX_QUEUES 4 > /* Max timeout packets before device reset */ > -#define MAX_NUM_OF_TIMEOUTED_PACKETS 32 > +#define MAX_NUM_OF_TIMEOUTED_PACKETS 128 > > #define ENA_TX_RING_IDX_NEXT(idx, ring_size) (((idx) + 1) & ((ring_size) - > 1)) > > @@ -116,9 +117,9 @@ > #define ENA_IO_IRQ_IDX(q)(ENA_IO_IRQ_FIRST_IDX + (q)) > > /* ENA device should send keep alive msg every 1 sec. > - * We wait for 3 sec just to be on the safe side. > + * We wait for 6 sec just to be on the safe side. > */ > -#define ENA_DEVICE_KALIVE_TIMEOUT(3 * HZ) > +#define ENA_DEVICE_KALIVE_TIMEOUT(6 * HZ) > > #define ENA_MMIO_DISABLE_REG_READBIT(0) >
Re: [PATCH V2 net 13/20] net/ena: change driver's default timeouts
On Sun, Dec 04, 2016 at 03:19:31PM +0200, Netanel Belgazal wrote: ... because? (they turned out to be too aggressive, I believe.) > Signed-off-by: Netanel Belgazal > --- > drivers/net/ethernet/amazon/ena/ena_com.c| 4 ++-- > drivers/net/ethernet/amazon/ena/ena_netdev.h | 7 --- > 2 files changed, 6 insertions(+), 5 deletions(-) > > diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c > b/drivers/net/ethernet/amazon/ena/ena_com.c > index 4147d6e..a550c8a 100644 > --- a/drivers/net/ethernet/amazon/ena/ena_com.c > +++ b/drivers/net/ethernet/amazon/ena/ena_com.c > @@ -36,9 +36,9 @@ > > /*/ > > /* Timeout in micro-sec */ > -#define ADMIN_CMD_TIMEOUT_US (100) > +#define ADMIN_CMD_TIMEOUT_US (300) > > -#define ENA_ASYNC_QUEUE_DEPTH 4 > +#define ENA_ASYNC_QUEUE_DEPTH 16 Why is this changed at the same time? > #define ENA_ADMIN_QUEUE_DEPTH 32 > > #define MIN_ENA_VER (((ENA_COMMON_SPEC_VERSION_MAJOR) << \ > diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h > b/drivers/net/ethernet/amazon/ena/ena_netdev.h > index c081fd3..ed42e07 100644 > --- a/drivers/net/ethernet/amazon/ena/ena_netdev.h > +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h > @@ -39,6 +39,7 @@ > #include > #include > #include > +#include This change seems unrelated. > #include "ena_com.h" > #include "ena_eth_com.h" > @@ -100,7 +101,7 @@ > /* Number of queues to check for missing queues per timer service */ > #define ENA_MONITORED_TX_QUEUES 4 > /* Max timeout packets before device reset */ > -#define MAX_NUM_OF_TIMEOUTED_PACKETS 32 > +#define MAX_NUM_OF_TIMEOUTED_PACKETS 128 > > #define ENA_TX_RING_IDX_NEXT(idx, ring_size) (((idx) + 1) & ((ring_size) - > 1)) > > @@ -116,9 +117,9 @@ > #define ENA_IO_IRQ_IDX(q)(ENA_IO_IRQ_FIRST_IDX + (q)) > > /* ENA device should send keep alive msg every 1 sec. > - * We wait for 3 sec just to be on the safe side. > + * We wait for 6 sec just to be on the safe side. > */ > -#define ENA_DEVICE_KALIVE_TIMEOUT(3 * HZ) > +#define ENA_DEVICE_KALIVE_TIMEOUT(6 * HZ) > > #define ENA_MMIO_DISABLE_REG_READBIT(0) >
Re: [PATCH V2 net 08/20] net/ena: add hardware hints capability to the driver
On Sun, Dec 04, 2016 at 03:19:26PM +0200, Netanel Belgazal wrote: > The ENA device can update the ena driver about the desire timeouts. > The hardware hints are transmitted as Asynchronous event to the driver. This is really a new feature, not a bugfix - correct? If it is a new feature, submit it separately. If the built-in defaults need to be changed, submit that as a bugfix. --msw > In case the device does not support this capability, the driver > will use its own defines. > > Signed-off-by: Netanel Belgazal> --- > drivers/net/ethernet/amazon/ena/ena_admin_defs.h | 31 + > drivers/net/ethernet/amazon/ena/ena_com.c| 41 --- > drivers/net/ethernet/amazon/ena/ena_com.h| 5 ++ > drivers/net/ethernet/amazon/ena/ena_ethtool.c| 1 - > drivers/net/ethernet/amazon/ena/ena_netdev.c | 86 > +++- > drivers/net/ethernet/amazon/ena/ena_netdev.h | 19 +- > drivers/net/ethernet/amazon/ena/ena_regs_defs.h | 2 + > 7 files changed, 157 insertions(+), 28 deletions(-) > > diff --git a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h > b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h > index 6d70bf5..35ae511 100644 > --- a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h > +++ b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h > @@ -70,6 +70,8 @@ enum ena_admin_aq_feature_id { > > ENA_ADMIN_MAX_QUEUES_NUM= 2, > > + ENA_ADMIN_HW_HINTS = 3, > + > ENA_ADMIN_RSS_HASH_FUNCTION = 10, > > ENA_ADMIN_STATELESS_OFFLOAD_CONFIG = 11, > @@ -749,6 +751,31 @@ struct ena_admin_feature_rss_ind_table { > struct ena_admin_rss_ind_table_entry inline_entry; > }; > > +/* When hint value is 0, driver should use it's own predefined value */ > +struct ena_admin_ena_hw_hints { > + /* value in ms */ > + u16 mmio_read_timeout; > + > + /* value in ms */ > + u16 driver_watchdog_timeout; > + > + /* Per packet tx completion timeout. value in ms */ > + u16 missing_tx_completion_timeout; > + > + u16 missed_tx_completion_count_threshold_to_reset; > + > + /* value in ms */ > + u16 admin_completion_tx_timeout; > + > + u16 netdev_wd_timeout; > + > + u16 max_tx_sgl_size; > + > + u16 max_rx_sgl_size; > + > + u16 reserved[8]; > +}; > + > struct ena_admin_get_feat_cmd { > struct ena_admin_aq_common_desc aq_common_descriptor; > > @@ -782,6 +809,8 @@ struct ena_admin_get_feat_resp { > struct ena_admin_feature_rss_ind_table ind_table; > > struct ena_admin_feature_intr_moder_desc intr_moderation; > + > + struct ena_admin_ena_hw_hints hw_hints; > } u; > }; > > @@ -857,6 +886,8 @@ enum ena_admin_aenq_notification_syndrom { > ENA_ADMIN_SUSPEND = 0, > > ENA_ADMIN_RESUME= 1, > + > + ENA_ADMIN_UPDATE_HINTS = 2, > }; > > struct ena_admin_aenq_entry { > diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c > b/drivers/net/ethernet/amazon/ena/ena_com.c > index 46aad3a..f1e4f04 100644 > --- a/drivers/net/ethernet/amazon/ena/ena_com.c > +++ b/drivers/net/ethernet/amazon/ena/ena_com.c > @@ -508,15 +508,13 @@ static int ena_com_comp_status_to_errno(u8 comp_status) > static int ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx > *comp_ctx, >struct ena_com_admin_queue > *admin_queue) > { > - unsigned long flags; > - u32 start_time; > + unsigned long flags, timeout; > int ret; > > - start_time = ((u32)jiffies_to_usecs(jiffies)); > + timeout = jiffies + usecs_to_jiffies(admin_queue->completion_timeout); > > while (comp_ctx->status == ENA_CMD_SUBMITTED) { > - if u32)jiffies_to_usecs(jiffies)) - start_time) > > - ADMIN_CMD_TIMEOUT_US) { > + if (time_is_before_jiffies(timeout)) { > pr_err("Wait for completion (polling) timeout\n"); > /* ENA didn't have any completion */ > spin_lock_irqsave(_queue->q_lock, flags); > @@ -560,7 +558,8 @@ static int > ena_com_wait_and_process_admin_cq_interrupts(struct ena_comp_ctx *com > int ret; > > wait_for_completion_timeout(_ctx->wait_event, > - usecs_to_jiffies(ADMIN_CMD_TIMEOUT_US)); > + usecs_to_jiffies( > + admin_queue->completion_timeout)); > > /* In case the command wasn't completed find out the root cause. >* There might be 2 kinds of errors > @@ -600,12 +599,14 @@ static u32 ena_com_reg_bar_read32(struct ena_com_dev > *ena_dev, u16 offset) > struct ena_com_mmio_read *mmio_read = _dev->mmio_read; > volatile struct ena_admin_ena_mmio_req_read_less_resp *read_resp = > mmio_read->read_resp; > - u32 mmio_read_reg, ret; > + u32
Re: [PATCH V2 net 08/20] net/ena: add hardware hints capability to the driver
On Sun, Dec 04, 2016 at 03:19:26PM +0200, Netanel Belgazal wrote: > The ENA device can update the ena driver about the desire timeouts. > The hardware hints are transmitted as Asynchronous event to the driver. This is really a new feature, not a bugfix - correct? If it is a new feature, submit it separately. If the built-in defaults need to be changed, submit that as a bugfix. --msw > In case the device does not support this capability, the driver > will use its own defines. > > Signed-off-by: Netanel Belgazal > --- > drivers/net/ethernet/amazon/ena/ena_admin_defs.h | 31 + > drivers/net/ethernet/amazon/ena/ena_com.c| 41 --- > drivers/net/ethernet/amazon/ena/ena_com.h| 5 ++ > drivers/net/ethernet/amazon/ena/ena_ethtool.c| 1 - > drivers/net/ethernet/amazon/ena/ena_netdev.c | 86 > +++- > drivers/net/ethernet/amazon/ena/ena_netdev.h | 19 +- > drivers/net/ethernet/amazon/ena/ena_regs_defs.h | 2 + > 7 files changed, 157 insertions(+), 28 deletions(-) > > diff --git a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h > b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h > index 6d70bf5..35ae511 100644 > --- a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h > +++ b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h > @@ -70,6 +70,8 @@ enum ena_admin_aq_feature_id { > > ENA_ADMIN_MAX_QUEUES_NUM= 2, > > + ENA_ADMIN_HW_HINTS = 3, > + > ENA_ADMIN_RSS_HASH_FUNCTION = 10, > > ENA_ADMIN_STATELESS_OFFLOAD_CONFIG = 11, > @@ -749,6 +751,31 @@ struct ena_admin_feature_rss_ind_table { > struct ena_admin_rss_ind_table_entry inline_entry; > }; > > +/* When hint value is 0, driver should use it's own predefined value */ > +struct ena_admin_ena_hw_hints { > + /* value in ms */ > + u16 mmio_read_timeout; > + > + /* value in ms */ > + u16 driver_watchdog_timeout; > + > + /* Per packet tx completion timeout. value in ms */ > + u16 missing_tx_completion_timeout; > + > + u16 missed_tx_completion_count_threshold_to_reset; > + > + /* value in ms */ > + u16 admin_completion_tx_timeout; > + > + u16 netdev_wd_timeout; > + > + u16 max_tx_sgl_size; > + > + u16 max_rx_sgl_size; > + > + u16 reserved[8]; > +}; > + > struct ena_admin_get_feat_cmd { > struct ena_admin_aq_common_desc aq_common_descriptor; > > @@ -782,6 +809,8 @@ struct ena_admin_get_feat_resp { > struct ena_admin_feature_rss_ind_table ind_table; > > struct ena_admin_feature_intr_moder_desc intr_moderation; > + > + struct ena_admin_ena_hw_hints hw_hints; > } u; > }; > > @@ -857,6 +886,8 @@ enum ena_admin_aenq_notification_syndrom { > ENA_ADMIN_SUSPEND = 0, > > ENA_ADMIN_RESUME= 1, > + > + ENA_ADMIN_UPDATE_HINTS = 2, > }; > > struct ena_admin_aenq_entry { > diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c > b/drivers/net/ethernet/amazon/ena/ena_com.c > index 46aad3a..f1e4f04 100644 > --- a/drivers/net/ethernet/amazon/ena/ena_com.c > +++ b/drivers/net/ethernet/amazon/ena/ena_com.c > @@ -508,15 +508,13 @@ static int ena_com_comp_status_to_errno(u8 comp_status) > static int ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx > *comp_ctx, >struct ena_com_admin_queue > *admin_queue) > { > - unsigned long flags; > - u32 start_time; > + unsigned long flags, timeout; > int ret; > > - start_time = ((u32)jiffies_to_usecs(jiffies)); > + timeout = jiffies + usecs_to_jiffies(admin_queue->completion_timeout); > > while (comp_ctx->status == ENA_CMD_SUBMITTED) { > - if u32)jiffies_to_usecs(jiffies)) - start_time) > > - ADMIN_CMD_TIMEOUT_US) { > + if (time_is_before_jiffies(timeout)) { > pr_err("Wait for completion (polling) timeout\n"); > /* ENA didn't have any completion */ > spin_lock_irqsave(_queue->q_lock, flags); > @@ -560,7 +558,8 @@ static int > ena_com_wait_and_process_admin_cq_interrupts(struct ena_comp_ctx *com > int ret; > > wait_for_completion_timeout(_ctx->wait_event, > - usecs_to_jiffies(ADMIN_CMD_TIMEOUT_US)); > + usecs_to_jiffies( > + admin_queue->completion_timeout)); > > /* In case the command wasn't completed find out the root cause. >* There might be 2 kinds of errors > @@ -600,12 +599,14 @@ static u32 ena_com_reg_bar_read32(struct ena_com_dev > *ena_dev, u16 offset) > struct ena_com_mmio_read *mmio_read = _dev->mmio_read; > volatile struct ena_admin_ena_mmio_req_read_less_resp *read_resp = > mmio_read->read_resp; > - u32 mmio_read_reg, ret; > + u32 mmio_read_reg, timeout, ret;
Re: [PATCH V2 net 06/20] net/ena: fix NULL dereference when removing the driver after device reset faild
On Sun, Dec 04, 2016 at 03:19:24PM +0200, Netanel Belgazal wrote: > If for some reason the device stop responding and the device reset failed > to recover the device, the mmio register read datastructure will not be > reinitialized. If for some reason the device stops responding, and the device reset fails to recover the device, the MMIO register read data structure will not be reinitialized. > On driver removal, the driver will also tries to reset the device > but this time the mmio data structure will be NULL. On driver removal, the driver will also try to reset the device, but this time the MMIO data structure will be NULL. > To solve this issue perform the device reset in the remove function only if > the device is runnig. To solve this issue, perform the device reset in the remove function only if the device is running. Do you have an example of the NULL pointer dereference that you can paste in? It can be helpful for those searching for a fix for a bug they've experienced. --msw > Signed-off-by: Netanel Belgazal> --- > drivers/net/ethernet/amazon/ena/ena_netdev.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c > b/drivers/net/ethernet/amazon/ena/ena_netdev.c > index 224302c..ad5f78f 100644 > --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c > +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c > @@ -2516,6 +2516,8 @@ static void ena_fw_reset_device(struct work_struct > *work) > err: > rtnl_unlock(); > > + clear_bit(ENA_FLAG_DEVICE_RUNNING, >flags); > + > dev_err(>dev, > "Reset attempt failed. Can not reset the device\n"); > } > @@ -3126,7 +3128,9 @@ static void ena_remove(struct pci_dev *pdev) > > cancel_work_sync(>resume_io_task); > > - ena_com_dev_reset(ena_dev); > + /* Reset the device only if the device is running. */ > + if (test_bit(ENA_FLAG_DEVICE_RUNNING, >flags)) > + ena_com_dev_reset(ena_dev); > > ena_free_mgmnt_irq(adapter); >
Re: [PATCH V2 net 06/20] net/ena: fix NULL dereference when removing the driver after device reset faild
On Sun, Dec 04, 2016 at 03:19:24PM +0200, Netanel Belgazal wrote: > If for some reason the device stop responding and the device reset failed > to recover the device, the mmio register read datastructure will not be > reinitialized. If for some reason the device stops responding, and the device reset fails to recover the device, the MMIO register read data structure will not be reinitialized. > On driver removal, the driver will also tries to reset the device > but this time the mmio data structure will be NULL. On driver removal, the driver will also try to reset the device, but this time the MMIO data structure will be NULL. > To solve this issue perform the device reset in the remove function only if > the device is runnig. To solve this issue, perform the device reset in the remove function only if the device is running. Do you have an example of the NULL pointer dereference that you can paste in? It can be helpful for those searching for a fix for a bug they've experienced. --msw > Signed-off-by: Netanel Belgazal > --- > drivers/net/ethernet/amazon/ena/ena_netdev.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c > b/drivers/net/ethernet/amazon/ena/ena_netdev.c > index 224302c..ad5f78f 100644 > --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c > +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c > @@ -2516,6 +2516,8 @@ static void ena_fw_reset_device(struct work_struct > *work) > err: > rtnl_unlock(); > > + clear_bit(ENA_FLAG_DEVICE_RUNNING, >flags); > + > dev_err(>dev, > "Reset attempt failed. Can not reset the device\n"); > } > @@ -3126,7 +3128,9 @@ static void ena_remove(struct pci_dev *pdev) > > cancel_work_sync(>resume_io_task); > > - ena_com_dev_reset(ena_dev); > + /* Reset the device only if the device is running. */ > + if (test_bit(ENA_FLAG_DEVICE_RUNNING, >flags)) > + ena_com_dev_reset(ena_dev); > > ena_free_mgmnt_irq(adapter); >
Re: [PATCH V2 net 07/20] net/ena: refactor ena_get_stats64 to be atomic context safe
On Sun, Dec 04, 2016 at 03:19:25PM +0200, Netanel Belgazal wrote: > ndo_get_stat64 can be called from atomic context. > However the current implementation sends an admin command to retrieve > the statistics from the device. > This admin commands uses sleep. Suggest some comment edits: ndo_get_stat64() can be called from atomic context, but the current implementation sends an admin command to retrieve the statistics from the device. This admin command can sleep. > Refactor the implementation of ena_get_stats64 to take the > {rx,tx}bytes/cnt from the driver's inner counters > and to take the rx drops counter > from the asynchronous keep alive (heart bit) event. This patch re-factors the implementation of ena_get_stats64() to use the {rx,tx}bytes/count from the driver's inner counters, and to obtain the rx drop counter from the asynchronous keep alive (heart bit) event. --msw > Signed-off-by: Netanel Belgazal> --- > drivers/net/ethernet/amazon/ena/ena_admin_defs.h | 8 > drivers/net/ethernet/amazon/ena/ena_netdev.c | 57 > +--- > drivers/net/ethernet/amazon/ena/ena_netdev.h | 1 + > 3 files changed, 51 insertions(+), 15 deletions(-) > > diff --git a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h > b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h > index f48c886..6d70bf5 100644 > --- a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h > +++ b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h > @@ -873,6 +873,14 @@ struct ena_admin_aenq_link_change_desc { > u32 flags; > }; > > +struct ena_admin_aenq_keep_alive_desc { > + struct ena_admin_aenq_common_desc aenq_common_desc; > + > + u32 rx_drops_low; > + > + u32 rx_drops_high; > +}; > + > struct ena_admin_ena_mmio_req_read_less_resp { > u16 req_id; > > diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c > b/drivers/net/ethernet/amazon/ena/ena_netdev.c > index ad5f78f..962ffb5 100644 > --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c > +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c > @@ -2176,28 +2176,46 @@ static struct rtnl_link_stats64 > *ena_get_stats64(struct net_device *netdev, >struct rtnl_link_stats64 > *stats) > { > struct ena_adapter *adapter = netdev_priv(netdev); > - struct ena_admin_basic_stats ena_stats; > - int rc; > + struct ena_ring *rx_ring, *tx_ring; > + unsigned int start; > + u64 rx_drops; > + int i; > > if (!test_bit(ENA_FLAG_DEV_UP, >flags)) > return NULL; > > - rc = ena_com_get_dev_basic_stats(adapter->ena_dev, _stats); > - if (rc) > - return NULL; > + for (i = 0; i < adapter->num_queues; i++) { > + u64 bytes, packets; > + > + tx_ring = >tx_ring[i]; > + > + do { > + start = u64_stats_fetch_begin_irq(_ring->syncp); > + packets = tx_ring->tx_stats.cnt; > + bytes = tx_ring->tx_stats.bytes; > + } while (u64_stats_fetch_retry_irq(_ring->syncp, start)); > + > + stats->tx_packets += packets; > + stats->tx_bytes += bytes; > > - stats->tx_bytes = ((u64)ena_stats.tx_bytes_high << 32) | > - ena_stats.tx_bytes_low; > - stats->rx_bytes = ((u64)ena_stats.rx_bytes_high << 32) | > - ena_stats.rx_bytes_low; > + rx_ring = >rx_ring[i]; > + > + do { > + start = u64_stats_fetch_begin_irq(_ring->syncp); > + packets = rx_ring->rx_stats.cnt; > + bytes = rx_ring->rx_stats.bytes; > + } while (u64_stats_fetch_retry_irq(_ring->syncp, start)); > > - stats->rx_packets = ((u64)ena_stats.rx_pkts_high << 32) | > - ena_stats.rx_pkts_low; > - stats->tx_packets = ((u64)ena_stats.tx_pkts_high << 32) | > - ena_stats.tx_pkts_low; > + stats->rx_packets += packets; > + stats->rx_bytes += bytes; > + } > + > + do { > + start = u64_stats_fetch_begin_irq(>syncp); > + rx_drops = adapter->dev_stats.rx_drops; > + } while (u64_stats_fetch_retry_irq(>syncp, start)); > > - stats->rx_dropped = ((u64)ena_stats.rx_drops_high << 32) | > - ena_stats.rx_drops_low; > + stats->rx_dropped = rx_drops; > > stats->multicast = 0; > stats->collisions = 0; > @@ -3221,8 +3239,17 @@ static void ena_keep_alive_wd(void *adapter_data, > struct ena_admin_aenq_entry *aenq_e) > { > struct ena_adapter *adapter = (struct ena_adapter *)adapter_data; > + struct ena_admin_aenq_keep_alive_desc *desc; > + u64 rx_drops; > > + desc = (struct ena_admin_aenq_keep_alive_desc *)aenq_e; > adapter->last_keep_alive_jiffies = jiffies; > + > + rx_drops = ((u64)desc->rx_drops_high << 32) | desc->rx_drops_low; > + > +
Re: [PATCH V2 net 07/20] net/ena: refactor ena_get_stats64 to be atomic context safe
On Sun, Dec 04, 2016 at 03:19:25PM +0200, Netanel Belgazal wrote: > ndo_get_stat64 can be called from atomic context. > However the current implementation sends an admin command to retrieve > the statistics from the device. > This admin commands uses sleep. Suggest some comment edits: ndo_get_stat64() can be called from atomic context, but the current implementation sends an admin command to retrieve the statistics from the device. This admin command can sleep. > Refactor the implementation of ena_get_stats64 to take the > {rx,tx}bytes/cnt from the driver's inner counters > and to take the rx drops counter > from the asynchronous keep alive (heart bit) event. This patch re-factors the implementation of ena_get_stats64() to use the {rx,tx}bytes/count from the driver's inner counters, and to obtain the rx drop counter from the asynchronous keep alive (heart bit) event. --msw > Signed-off-by: Netanel Belgazal > --- > drivers/net/ethernet/amazon/ena/ena_admin_defs.h | 8 > drivers/net/ethernet/amazon/ena/ena_netdev.c | 57 > +--- > drivers/net/ethernet/amazon/ena/ena_netdev.h | 1 + > 3 files changed, 51 insertions(+), 15 deletions(-) > > diff --git a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h > b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h > index f48c886..6d70bf5 100644 > --- a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h > +++ b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h > @@ -873,6 +873,14 @@ struct ena_admin_aenq_link_change_desc { > u32 flags; > }; > > +struct ena_admin_aenq_keep_alive_desc { > + struct ena_admin_aenq_common_desc aenq_common_desc; > + > + u32 rx_drops_low; > + > + u32 rx_drops_high; > +}; > + > struct ena_admin_ena_mmio_req_read_less_resp { > u16 req_id; > > diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c > b/drivers/net/ethernet/amazon/ena/ena_netdev.c > index ad5f78f..962ffb5 100644 > --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c > +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c > @@ -2176,28 +2176,46 @@ static struct rtnl_link_stats64 > *ena_get_stats64(struct net_device *netdev, >struct rtnl_link_stats64 > *stats) > { > struct ena_adapter *adapter = netdev_priv(netdev); > - struct ena_admin_basic_stats ena_stats; > - int rc; > + struct ena_ring *rx_ring, *tx_ring; > + unsigned int start; > + u64 rx_drops; > + int i; > > if (!test_bit(ENA_FLAG_DEV_UP, >flags)) > return NULL; > > - rc = ena_com_get_dev_basic_stats(adapter->ena_dev, _stats); > - if (rc) > - return NULL; > + for (i = 0; i < adapter->num_queues; i++) { > + u64 bytes, packets; > + > + tx_ring = >tx_ring[i]; > + > + do { > + start = u64_stats_fetch_begin_irq(_ring->syncp); > + packets = tx_ring->tx_stats.cnt; > + bytes = tx_ring->tx_stats.bytes; > + } while (u64_stats_fetch_retry_irq(_ring->syncp, start)); > + > + stats->tx_packets += packets; > + stats->tx_bytes += bytes; > > - stats->tx_bytes = ((u64)ena_stats.tx_bytes_high << 32) | > - ena_stats.tx_bytes_low; > - stats->rx_bytes = ((u64)ena_stats.rx_bytes_high << 32) | > - ena_stats.rx_bytes_low; > + rx_ring = >rx_ring[i]; > + > + do { > + start = u64_stats_fetch_begin_irq(_ring->syncp); > + packets = rx_ring->rx_stats.cnt; > + bytes = rx_ring->rx_stats.bytes; > + } while (u64_stats_fetch_retry_irq(_ring->syncp, start)); > > - stats->rx_packets = ((u64)ena_stats.rx_pkts_high << 32) | > - ena_stats.rx_pkts_low; > - stats->tx_packets = ((u64)ena_stats.tx_pkts_high << 32) | > - ena_stats.tx_pkts_low; > + stats->rx_packets += packets; > + stats->rx_bytes += bytes; > + } > + > + do { > + start = u64_stats_fetch_begin_irq(>syncp); > + rx_drops = adapter->dev_stats.rx_drops; > + } while (u64_stats_fetch_retry_irq(>syncp, start)); > > - stats->rx_dropped = ((u64)ena_stats.rx_drops_high << 32) | > - ena_stats.rx_drops_low; > + stats->rx_dropped = rx_drops; > > stats->multicast = 0; > stats->collisions = 0; > @@ -3221,8 +3239,17 @@ static void ena_keep_alive_wd(void *adapter_data, > struct ena_admin_aenq_entry *aenq_e) > { > struct ena_adapter *adapter = (struct ena_adapter *)adapter_data; > + struct ena_admin_aenq_keep_alive_desc *desc; > + u64 rx_drops; > > + desc = (struct ena_admin_aenq_keep_alive_desc *)aenq_e; > adapter->last_keep_alive_jiffies = jiffies; > + > + rx_drops = ((u64)desc->rx_drops_high << 32) | desc->rx_drops_low; > + > + u64_stats_update_begin(>syncp); > +
Re: [PATCH V2 net 05/20] net/ena: fix RSS default hash configuration
On Sun, Dec 04, 2016 at 03:19:23PM +0200, Netanel Belgazal wrote: > ENA default hash configure IPv4_frag hash twice instead of configure -> configures. You may want to include "erroneously". What is the consequence of this bug? > configure non ip packets. configuring non-IP packets. --msw > Signed-off-by: Netanel Belgazal> --- > drivers/net/ethernet/amazon/ena/ena_com.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c > b/drivers/net/ethernet/amazon/ena/ena_com.c > index 3066d9c..46aad3a 100644 > --- a/drivers/net/ethernet/amazon/ena/ena_com.c > +++ b/drivers/net/ethernet/amazon/ena/ena_com.c > @@ -2184,7 +2184,7 @@ int ena_com_set_default_hash_ctrl(struct ena_com_dev > *ena_dev) > hash_ctrl->selected_fields[ENA_ADMIN_RSS_IP4_FRAG].fields = > ENA_ADMIN_RSS_L3_SA | ENA_ADMIN_RSS_L3_DA; > > - hash_ctrl->selected_fields[ENA_ADMIN_RSS_IP4_FRAG].fields = > + hash_ctrl->selected_fields[ENA_ADMIN_RSS_NOT_IP].fields = > ENA_ADMIN_RSS_L2_DA | ENA_ADMIN_RSS_L2_SA; > > for (i = 0; i < ENA_ADMIN_RSS_PROTO_NUM; i++) {
Re: [PATCH V2 net 05/20] net/ena: fix RSS default hash configuration
On Sun, Dec 04, 2016 at 03:19:23PM +0200, Netanel Belgazal wrote: > ENA default hash configure IPv4_frag hash twice instead of configure -> configures. You may want to include "erroneously". What is the consequence of this bug? > configure non ip packets. configuring non-IP packets. --msw > Signed-off-by: Netanel Belgazal > --- > drivers/net/ethernet/amazon/ena/ena_com.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c > b/drivers/net/ethernet/amazon/ena/ena_com.c > index 3066d9c..46aad3a 100644 > --- a/drivers/net/ethernet/amazon/ena/ena_com.c > +++ b/drivers/net/ethernet/amazon/ena/ena_com.c > @@ -2184,7 +2184,7 @@ int ena_com_set_default_hash_ctrl(struct ena_com_dev > *ena_dev) > hash_ctrl->selected_fields[ENA_ADMIN_RSS_IP4_FRAG].fields = > ENA_ADMIN_RSS_L3_SA | ENA_ADMIN_RSS_L3_DA; > > - hash_ctrl->selected_fields[ENA_ADMIN_RSS_IP4_FRAG].fields = > + hash_ctrl->selected_fields[ENA_ADMIN_RSS_NOT_IP].fields = > ENA_ADMIN_RSS_L2_DA | ENA_ADMIN_RSS_L2_SA; > > for (i = 0; i < ENA_ADMIN_RSS_PROTO_NUM; i++) {
Re: [PATCH V2 net 04/20] net/ena: fix ethtool RSS flow configuration
On Sun, Dec 04, 2016 at 03:19:22PM +0200, Netanel Belgazal wrote: > ena_flow_data_to_flow_hash and ena_flow_hash_to_flow_type > treat the ena_flow_hash_to_flow_type enum as power of two values. > > Change the values of ena_admin_flow_hash_fields to be power of two values. Then I generally prefer BIT(0), BIT(1), BIT(2), etc. Also it would be helpful to include some comments about the consequences of the current state of the code. --msw > Signed-off-by: Netanel Belgazal> --- > drivers/net/ethernet/amazon/ena/ena_admin_defs.h | 12 ++-- > 1 file changed, 6 insertions(+), 6 deletions(-) > > diff --git a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h > b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h > index a46e749..f48c886 100644 > --- a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h > +++ b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h > @@ -631,22 +631,22 @@ enum ena_admin_flow_hash_proto { > /* RSS flow hash fields */ > enum ena_admin_flow_hash_fields { > /* Ethernet Dest Addr */ > - ENA_ADMIN_RSS_L2_DA = 0, > + ENA_ADMIN_RSS_L2_DA = 0x1, > > /* Ethernet Src Addr */ > - ENA_ADMIN_RSS_L2_SA = 1, > + ENA_ADMIN_RSS_L2_SA = 0x2, > > /* ipv4/6 Dest Addr */ > - ENA_ADMIN_RSS_L3_DA = 2, > + ENA_ADMIN_RSS_L3_DA = 0x4, > > /* ipv4/6 Src Addr */ > - ENA_ADMIN_RSS_L3_SA = 5, > + ENA_ADMIN_RSS_L3_SA = 0x8, > > /* tcp/udp Dest Port */ > - ENA_ADMIN_RSS_L4_DP = 6, > + ENA_ADMIN_RSS_L4_DP = 0x10, > > /* tcp/udp Src Port */ > - ENA_ADMIN_RSS_L4_SP = 7, > + ENA_ADMIN_RSS_L4_SP = 0x20, > }; > > struct ena_admin_proto_input {
Re: [PATCH V2 net 04/20] net/ena: fix ethtool RSS flow configuration
On Sun, Dec 04, 2016 at 03:19:22PM +0200, Netanel Belgazal wrote: > ena_flow_data_to_flow_hash and ena_flow_hash_to_flow_type > treat the ena_flow_hash_to_flow_type enum as power of two values. > > Change the values of ena_admin_flow_hash_fields to be power of two values. Then I generally prefer BIT(0), BIT(1), BIT(2), etc. Also it would be helpful to include some comments about the consequences of the current state of the code. --msw > Signed-off-by: Netanel Belgazal > --- > drivers/net/ethernet/amazon/ena/ena_admin_defs.h | 12 ++-- > 1 file changed, 6 insertions(+), 6 deletions(-) > > diff --git a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h > b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h > index a46e749..f48c886 100644 > --- a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h > +++ b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h > @@ -631,22 +631,22 @@ enum ena_admin_flow_hash_proto { > /* RSS flow hash fields */ > enum ena_admin_flow_hash_fields { > /* Ethernet Dest Addr */ > - ENA_ADMIN_RSS_L2_DA = 0, > + ENA_ADMIN_RSS_L2_DA = 0x1, > > /* Ethernet Src Addr */ > - ENA_ADMIN_RSS_L2_SA = 1, > + ENA_ADMIN_RSS_L2_SA = 0x2, > > /* ipv4/6 Dest Addr */ > - ENA_ADMIN_RSS_L3_DA = 2, > + ENA_ADMIN_RSS_L3_DA = 0x4, > > /* ipv4/6 Src Addr */ > - ENA_ADMIN_RSS_L3_SA = 5, > + ENA_ADMIN_RSS_L3_SA = 0x8, > > /* tcp/udp Dest Port */ > - ENA_ADMIN_RSS_L4_DP = 6, > + ENA_ADMIN_RSS_L4_DP = 0x10, > > /* tcp/udp Src Port */ > - ENA_ADMIN_RSS_L4_SP = 7, > + ENA_ADMIN_RSS_L4_SP = 0x20, > }; > > struct ena_admin_proto_input {
Re: [PATCH V2 net 03/20] net/ena: fix queues number calculation
On Sun, Dec 04, 2016 at 03:19:21PM +0200, Netanel Belgazal wrote: > The ENA driver tries to open a queue per vCPU. > To determine how many vCPUs the instance have it uses num_possible_cpus > while it should have use num_online_cpus instead. use () when referring to functions: num_possible_cpus(), num_online_cpus(). > Signed-off-by: Netanel Belgazal <neta...@annapurnalabs.com> Reviewed-by: Matt Wilson <m...@amazon.com> > --- > drivers/net/ethernet/amazon/ena/ena_netdev.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c > b/drivers/net/ethernet/amazon/ena/ena_netdev.c > index 397c9bc..224302c 100644 > --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c > +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c > @@ -2667,7 +2667,7 @@ static int ena_calc_io_queue_num(struct pci_dev *pdev, > io_sq_num = get_feat_ctx->max_queues.max_sq_num; > } > > - io_queue_num = min_t(int, num_possible_cpus(), ENA_MAX_NUM_IO_QUEUES); > + io_queue_num = min_t(int, num_online_cpus(), ENA_MAX_NUM_IO_QUEUES); > io_queue_num = min_t(int, io_queue_num, io_sq_num); > io_queue_num = min_t(int, io_queue_num, >get_feat_ctx->max_queues.max_cq_num);
Re: [PATCH V2 net 03/20] net/ena: fix queues number calculation
On Sun, Dec 04, 2016 at 03:19:21PM +0200, Netanel Belgazal wrote: > The ENA driver tries to open a queue per vCPU. > To determine how many vCPUs the instance have it uses num_possible_cpus > while it should have use num_online_cpus instead. use () when referring to functions: num_possible_cpus(), num_online_cpus(). > Signed-off-by: Netanel Belgazal Reviewed-by: Matt Wilson > --- > drivers/net/ethernet/amazon/ena/ena_netdev.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c > b/drivers/net/ethernet/amazon/ena/ena_netdev.c > index 397c9bc..224302c 100644 > --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c > +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c > @@ -2667,7 +2667,7 @@ static int ena_calc_io_queue_num(struct pci_dev *pdev, > io_sq_num = get_feat_ctx->max_queues.max_sq_num; > } > > - io_queue_num = min_t(int, num_possible_cpus(), ENA_MAX_NUM_IO_QUEUES); > + io_queue_num = min_t(int, num_online_cpus(), ENA_MAX_NUM_IO_QUEUES); > io_queue_num = min_t(int, io_queue_num, io_sq_num); > io_queue_num = min_t(int, io_queue_num, >get_feat_ctx->max_queues.max_cq_num);
Re: [PATCH V2 net 02/20] net/ena: fix error handling when probe fails
On Sun, Dec 04, 2016 at 03:19:20PM +0200, Netanel Belgazal wrote: > When driver fails in probe, it will release all resources, including > adapter. > In case of probe failure, ena_remove should not try to free the adapter > resources. Please word wrap your commit message around 75 columns. > Signed-off-by: Netanel Belgazal <neta...@annapurnalabs.com> Reviewed-by: Matt Wilson <m...@amazon.com> > --- > drivers/net/ethernet/amazon/ena/ena_netdev.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c > b/drivers/net/ethernet/amazon/ena/ena_netdev.c > index 33a760e..397c9bc 100644 > --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c > +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c > @@ -3054,6 +3054,7 @@ static int ena_probe(struct pci_dev *pdev, const struct > pci_device_id *ent) > err_free_region: > ena_release_bars(ena_dev, pdev); > err_free_ena_dev: > + pci_set_drvdata(pdev, NULL); > vfree(ena_dev); > err_disable_device: > pci_disable_device(pdev);
Re: [PATCH V2 net 02/20] net/ena: fix error handling when probe fails
On Sun, Dec 04, 2016 at 03:19:20PM +0200, Netanel Belgazal wrote: > When driver fails in probe, it will release all resources, including > adapter. > In case of probe failure, ena_remove should not try to free the adapter > resources. Please word wrap your commit message around 75 columns. > Signed-off-by: Netanel Belgazal Reviewed-by: Matt Wilson > --- > drivers/net/ethernet/amazon/ena/ena_netdev.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c > b/drivers/net/ethernet/amazon/ena/ena_netdev.c > index 33a760e..397c9bc 100644 > --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c > +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c > @@ -3054,6 +3054,7 @@ static int ena_probe(struct pci_dev *pdev, const struct > pci_device_id *ent) > err_free_region: > ena_release_bars(ena_dev, pdev); > err_free_ena_dev: > + pci_set_drvdata(pdev, NULL); > vfree(ena_dev); > err_disable_device: > pci_disable_device(pdev);
Re: [PATCH V2 net 01/20] net/ena: remove ntuple filter support from device feature list
On Sun, Dec 04, 2016 at 03:19:19PM +0200, Netanel Belgazal wrote: > Remove NETIF_F_NTUPLE from netdev->features. > The ENA device driver does not support ntuple filtering. > > Signed-off-by: Netanel Belgazal <neta...@annapurnalabs.com> Reviewed-by: Matt Wilson <m...@amazon.com> > --- > drivers/net/ethernet/amazon/ena/ena_netdev.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c > b/drivers/net/ethernet/amazon/ena/ena_netdev.c > index bfeaec5..33a760e 100644 > --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c > +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c > @@ -2729,7 +2729,6 @@ static void ena_set_dev_offloads(struct > ena_com_dev_get_features_ctx *feat, > netdev->features = > dev_features | > NETIF_F_SG | > - NETIF_F_NTUPLE | > NETIF_F_RXHASH | > NETIF_F_HIGHDMA; >
Re: [PATCH V2 net 01/20] net/ena: remove ntuple filter support from device feature list
On Sun, Dec 04, 2016 at 03:19:19PM +0200, Netanel Belgazal wrote: > Remove NETIF_F_NTUPLE from netdev->features. > The ENA device driver does not support ntuple filtering. > > Signed-off-by: Netanel Belgazal Reviewed-by: Matt Wilson > --- > drivers/net/ethernet/amazon/ena/ena_netdev.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c > b/drivers/net/ethernet/amazon/ena/ena_netdev.c > index bfeaec5..33a760e 100644 > --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c > +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c > @@ -2729,7 +2729,6 @@ static void ena_set_dev_offloads(struct > ena_com_dev_get_features_ctx *feat, > netdev->features = > dev_features | > NETIF_F_SG | > - NETIF_F_NTUPLE | > NETIF_F_RXHASH | > NETIF_F_HIGHDMA; >
Re: [PATCH V2 net 00/20] Increase ENA driver version to 1.1.2
On Sun, Dec 04, 2016 at 09:37:43PM -0500, David Miller wrote: > > It is not appropriate to submit so many patches at one time. Indeed, https://www.kernel.org/doc/Documentation/SubmittingPatches recommends submitting no more than 15 or so at once. > Please keep your patch series to no more than about a dozen > at a time. How about 15 from SubmittingPatches? The first 15 in the series are all important bugfixes. Should Netanel resubmit a series with just the bugfixes and a new cover letter? Or are you willing to consider the first 15 of this series as posted? > Also, group your changes logically and tie an appropriately > descriptive cover letter. > > "Increase driver version to X.Y.Z" tells the reader absolutely > nothing. Someone reading that Subject line in the GIT logs > will have no idea what the overall purpose of the patch series > is and what it accomplishes. You're right, the cover letter subject needs to be better. There is only one commit submitted with the subject "increase driver version to 1.1.2." - Patch 20/20. It is logically like: commit b8b2372de9cc00d5ed667c7b8db29b6cfbf037f5 Author: Manish ChopraDate: Wed Aug 3 04:02:04 2016 -0400 qlcnic: Update version to 5.3.65 Signed-off-by: Manish Chopra Signed-off-by: David S. Miller [...] commit ae33256c55d2fefcad8712e750b846461994a1af Author: Bimmy Pujari Date: Mon Jun 20 09:10:39 2016 -0700 i40e/i40evf-bump version to 1.6.11 Signed-off-by: Bimmy Pujari Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher [...] commit 5264cc63ba10ebfa0e54e3e641cce2656c7a60e8 Author: Jacob Keller Date: Tue Jun 7 16:09:02 2016 -0700 fm10k: bump version number Signed-off-by: Jacob Keller Tested-by: Krishneil Singh Signed-off-by: Jeff Kirsher [...] commit a58a3e68037647de78e3461194239a1104f76003 Author: Michael Chan Date: Fri Jul 1 18:46:20 2016 -0400 bnxt_en: Update firmware spec. to 1.3.0. And update driver version to 1.3.0. Signed-off-by: Michael Chan Signed-off-by: David S. Miller > You really need to describe the high level purpose of the patch set. > Is it adding a new feature? What is that feature? Why are you > adding that feature? How is that feature implemented? Why is > it implemented that way? The priority is to get bug fixes to the ENA driver in 4.9. Let's focus on the first 15. --msw
Re: [PATCH V2 net 00/20] Increase ENA driver version to 1.1.2
On Sun, Dec 04, 2016 at 09:37:43PM -0500, David Miller wrote: > > It is not appropriate to submit so many patches at one time. Indeed, https://www.kernel.org/doc/Documentation/SubmittingPatches recommends submitting no more than 15 or so at once. > Please keep your patch series to no more than about a dozen > at a time. How about 15 from SubmittingPatches? The first 15 in the series are all important bugfixes. Should Netanel resubmit a series with just the bugfixes and a new cover letter? Or are you willing to consider the first 15 of this series as posted? > Also, group your changes logically and tie an appropriately > descriptive cover letter. > > "Increase driver version to X.Y.Z" tells the reader absolutely > nothing. Someone reading that Subject line in the GIT logs > will have no idea what the overall purpose of the patch series > is and what it accomplishes. You're right, the cover letter subject needs to be better. There is only one commit submitted with the subject "increase driver version to 1.1.2." - Patch 20/20. It is logically like: commit b8b2372de9cc00d5ed667c7b8db29b6cfbf037f5 Author: Manish Chopra Date: Wed Aug 3 04:02:04 2016 -0400 qlcnic: Update version to 5.3.65 Signed-off-by: Manish Chopra Signed-off-by: David S. Miller [...] commit ae33256c55d2fefcad8712e750b846461994a1af Author: Bimmy Pujari Date: Mon Jun 20 09:10:39 2016 -0700 i40e/i40evf-bump version to 1.6.11 Signed-off-by: Bimmy Pujari Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher [...] commit 5264cc63ba10ebfa0e54e3e641cce2656c7a60e8 Author: Jacob Keller Date: Tue Jun 7 16:09:02 2016 -0700 fm10k: bump version number Signed-off-by: Jacob Keller Tested-by: Krishneil Singh Signed-off-by: Jeff Kirsher [...] commit a58a3e68037647de78e3461194239a1104f76003 Author: Michael Chan Date: Fri Jul 1 18:46:20 2016 -0400 bnxt_en: Update firmware spec. to 1.3.0. And update driver version to 1.3.0. Signed-off-by: Michael Chan Signed-off-by: David S. Miller > You really need to describe the high level purpose of the patch set. > Is it adding a new feature? What is that feature? Why are you > adding that feature? How is that feature implemented? Why is > it implemented that way? The priority is to get bug fixes to the ENA driver in 4.9. Let's focus on the first 15. --msw
Re: [PATCH net-next V3] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
On Thu, Jul 14, 2016 at 09:08:03AM -0700, Benjamin Poirier wrote: > On 2016/07/14 08:22, Matt Wilson wrote: > [...] > > > > Dave and Benjamin, > > > > Do you want to see the interrupt moderation extensions to ethtool and > > the sysfs nodes removed before this lands in net-next? Or should > > Netanel remove the sysfs bits until we can extend the ethtool > > interfaces to cover the parameters that ena uses? > > I couldn't say what's acceptable or not. A few other drivers (qlcnic, > sfc, ...) already have sysfs tunables. Maybe John, as the new ethtool > maintainer, can weight in too about the changes required to ethtool. We definitely want ethtool to handle all the settings, it's just a question of when. We also want to address and resolve all the great feedback so far, and since you originally raised the point about extending ethtool I wanted to see if you have any major objection. --msw
Re: [PATCH net-next V3] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
On Thu, Jul 14, 2016 at 09:08:03AM -0700, Benjamin Poirier wrote: > On 2016/07/14 08:22, Matt Wilson wrote: > [...] > > > > Dave and Benjamin, > > > > Do you want to see the interrupt moderation extensions to ethtool and > > the sysfs nodes removed before this lands in net-next? Or should > > Netanel remove the sysfs bits until we can extend the ethtool > > interfaces to cover the parameters that ena uses? > > I couldn't say what's acceptable or not. A few other drivers (qlcnic, > sfc, ...) already have sysfs tunables. Maybe John, as the new ethtool > maintainer, can weight in too about the changes required to ethtool. We definitely want ethtool to handle all the settings, it's just a question of when. We also want to address and resolve all the great feedback so far, and since you originally raised the point about extending ethtool I wanted to see if you have any major objection. --msw
Re: [PATCH net-next V3] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
On Thu, Jul 14, 2016 at 09:46:14AM +0300, Netanel Belgazal wrote: > This is a driver for the ENA family of networking devices. > > Signed-off-by: Netanel Belgazal> --- > > Notes: > Changes in v3: > - Fix compilation warning for 32bit systems. [kbuild test rebot] > - Replace netdev->num_tx_queues with adapter->num_queues in ethtool. > [b...@decadent.org.uk] > - Use get_link_ksettings instread of get_settings (deprecated ethtool > API). [b...@decadent.org.uk] > - Remove rna_nway_reset (ethtool). [b...@decadent.org.uk] > - Add missing break in switch condition. [b...@decadent.org.uk] > - Increase driver version to 1.0.2 > > Changes in v2: > - Increment driver version to 1.0.1 > - Reorder the initialization of the workqueus and the timer service > - Initialize last_keep_alive_jiffies in probe > - Remove ena_trc_* functions in favor of pr_* [da...@davemloft.net] > - Fix Tx interrupt control register value [rom...@fr.zoreil.com] > - Edit ena.txt to focus on technical details [rom...@fr.zoreil.com] > - Rename small_copy_len tunable to rx_copybreak > [benjamin.poir...@gmail.com] > - Change uint32_t to u32 [rom...@fr.zoreil.com] > - Return error in ena_com_set_hash_ctrl [rami.ro...@intel.com] > - Fix API documentation [rami.ro...@intel.com] > - Remove redundant check in ena_com_mem_addr_set [rom...@fr.zoreil.com] > - Add local variables for allocation methods to improve readability > [rom...@fr.zoreil.com] > - Replace dma_alloc_coherent and __GFP_ZERO flag with dma_zalloc_coherent > [rom...@fr.zoreil.com] > - Fix style when using dma_alloc_coherent [rom...@fr.zoreil.com] > - Tx xmit - In case of an error drop the packet and return NETDEV_TX_OK > [rom...@fr.zoreil.com] > - Reduce the number of parameters in ena_get_dev_stats > [rom...@fr.zoreil.com] > - Return ret and not 0 in ena_com_set_hash_ctrl [rami.ro...@intel.com] > - Do not initialize variables if the function doesn't read them before > the second write [rom...@fr.zoreil.com] > - Enums - Replace space with tabs to line the assign value > [rom...@fr.zoreil.com] > - Remove redundant comments [rom...@fr.zoreil.com] > > TODO: > - Add bytes variable to set_coalesce ethtool callback and use this > callback > instead of dedicated sysfs files. Dave and Benjamin, Do you want to see the interrupt moderation extensions to ethtool and the sysfs nodes removed before this lands in net-next? Or should Netanel remove the sysfs bits until we can extend the ethtool interfaces to cover the parameters that ena uses? --msw
Re: [PATCH net-next V3] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
On Thu, Jul 14, 2016 at 09:46:14AM +0300, Netanel Belgazal wrote: > This is a driver for the ENA family of networking devices. > > Signed-off-by: Netanel Belgazal > --- > > Notes: > Changes in v3: > - Fix compilation warning for 32bit systems. [kbuild test rebot] > - Replace netdev->num_tx_queues with adapter->num_queues in ethtool. > [b...@decadent.org.uk] > - Use get_link_ksettings instread of get_settings (deprecated ethtool > API). [b...@decadent.org.uk] > - Remove rna_nway_reset (ethtool). [b...@decadent.org.uk] > - Add missing break in switch condition. [b...@decadent.org.uk] > - Increase driver version to 1.0.2 > > Changes in v2: > - Increment driver version to 1.0.1 > - Reorder the initialization of the workqueus and the timer service > - Initialize last_keep_alive_jiffies in probe > - Remove ena_trc_* functions in favor of pr_* [da...@davemloft.net] > - Fix Tx interrupt control register value [rom...@fr.zoreil.com] > - Edit ena.txt to focus on technical details [rom...@fr.zoreil.com] > - Rename small_copy_len tunable to rx_copybreak > [benjamin.poir...@gmail.com] > - Change uint32_t to u32 [rom...@fr.zoreil.com] > - Return error in ena_com_set_hash_ctrl [rami.ro...@intel.com] > - Fix API documentation [rami.ro...@intel.com] > - Remove redundant check in ena_com_mem_addr_set [rom...@fr.zoreil.com] > - Add local variables for allocation methods to improve readability > [rom...@fr.zoreil.com] > - Replace dma_alloc_coherent and __GFP_ZERO flag with dma_zalloc_coherent > [rom...@fr.zoreil.com] > - Fix style when using dma_alloc_coherent [rom...@fr.zoreil.com] > - Tx xmit - In case of an error drop the packet and return NETDEV_TX_OK > [rom...@fr.zoreil.com] > - Reduce the number of parameters in ena_get_dev_stats > [rom...@fr.zoreil.com] > - Return ret and not 0 in ena_com_set_hash_ctrl [rami.ro...@intel.com] > - Do not initialize variables if the function doesn't read them before > the second write [rom...@fr.zoreil.com] > - Enums - Replace space with tabs to line the assign value > [rom...@fr.zoreil.com] > - Remove redundant comments [rom...@fr.zoreil.com] > > TODO: > - Add bytes variable to set_coalesce ethtool callback and use this > callback > instead of dedicated sysfs files. Dave and Benjamin, Do you want to see the interrupt moderation extensions to ethtool and the sysfs nodes removed before this lands in net-next? Or should Netanel remove the sysfs bits until we can extend the ethtool interfaces to cover the parameters that ena uses? --msw
Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
On Thu, Jun 16, 2016 at 07:06:52PM +0100, Ben Hutchings wrote: > On Thu, 2016-06-16 at 10:55 -0700, Benjamin Poirier wrote: > > On 2016/06/13 11:46, Netanel Belgazal wrote: > [...] > > > +static ssize_t ena_show_small_copy_len(struct device *dev, > > > + struct device_attribute *attr, char *buf) > > > +{ > > > + struct ena_adapter *adapter = dev_get_drvdata(dev); > > > + > > > + return sprintf(buf, "%d\n", adapter->small_copy_len); > > > +} > > > + > > > +static DEVICE_ATTR(small_copy_len, S_IRUGO | S_IWUSR, > > > ena_show_small_copy_len, > > > + ena_store_small_copy_len); > > > > This is what many other drivers call (rx_)copybreak. Perhaps it's time > > to add it to ethtool as well? > [...] > > There is the 'tunable' ethtool API for random parameters like > rx_copybreak. The ethtool utility doesn't support it yet, though. So I started implementing proper support for this in ethtool in a generic way (retrieving the string set, parsing generically, etc), but it seems that the implementation isn't quite as complete as one would like. You said [1] that the kernel returns the type of each tunable, but I don't see this implemented anywhere. The string set also includes "Unspec", which is of dubious value. I think that something tunable specific is going to need to be added apart from ETHTOOL_GSTRINGS to provide ethtool the needed type information. --msw [1] http://thread.gmane.org/gmane.linux.network/367058/focus=376701
Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
On Thu, Jun 16, 2016 at 07:06:52PM +0100, Ben Hutchings wrote: > On Thu, 2016-06-16 at 10:55 -0700, Benjamin Poirier wrote: > > On 2016/06/13 11:46, Netanel Belgazal wrote: > [...] > > > +static ssize_t ena_show_small_copy_len(struct device *dev, > > > + struct device_attribute *attr, char *buf) > > > +{ > > > + struct ena_adapter *adapter = dev_get_drvdata(dev); > > > + > > > + return sprintf(buf, "%d\n", adapter->small_copy_len); > > > +} > > > + > > > +static DEVICE_ATTR(small_copy_len, S_IRUGO | S_IWUSR, > > > ena_show_small_copy_len, > > > + ena_store_small_copy_len); > > > > This is what many other drivers call (rx_)copybreak. Perhaps it's time > > to add it to ethtool as well? > [...] > > There is the 'tunable' ethtool API for random parameters like > rx_copybreak. The ethtool utility doesn't support it yet, though. So I started implementing proper support for this in ethtool in a generic way (retrieving the string set, parsing generically, etc), but it seems that the implementation isn't quite as complete as one would like. You said [1] that the kernel returns the type of each tunable, but I don't see this implemented anywhere. The string set also includes "Unspec", which is of dubious value. I think that something tunable specific is going to need to be added apart from ETHTOOL_GSTRINGS to provide ethtool the needed type information. --msw [1] http://thread.gmane.org/gmane.linux.network/367058/focus=376701
Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
On Thu, Jun 16, 2016 at 10:46:23PM +0200, Francois Romieu wrote: > Netanel Belgazal: > [...] > > Very limited review below. I'll comment on the documentation (since I edited it heavily) but will leave some the other parts for Netanel to answer. > > diff --git a/Documentation/networking/ena.txt > > b/Documentation/networking/ena.txt > > new file mode 100644 > > index 000..528544f > > --- /dev/null > > +++ b/Documentation/networking/ena.txt > > @@ -0,0 +1,330 @@ > > +Linux kernel driver for Elastic Network Adapter (ENA) family: > > += > > + > > +Overview: > > += > > +The ENA driver provides a modern Ethernet device interface optimized > > +for high performance and low CPU overhead. > > Marketing boilerplate. How much of it will still be true in 6 months ? > 6 years ? > > Hint: stay technical. If in doubt, make it boring. :o) Hopefully it will still be true for quite a while. But you make a good point, the overview here should stick with the technical details. > [...] > > +ENA devices allow high speed and low overhead Ethernet traffic > > +processing by providing a dedicated Tx/Rx queue pair per host CPU, a > > +dedicated MSI-X interrupt vector per Tx/Rx queue pair, adaptive > > +interrupt moderation, and CPU cacheline optimized data placement. > > How many queues exactly ? It's negotiated with the adapter via the admin queue, and it will vary based on the capabilities of a particular adapter. See: rc = ena_com_get_feature(ena_dev, _resp, ENA_ADMIN_MAX_QUEUES_NUM); in ena_com.c > [...] > > +Memory Allocations: > > +=== > > +DMA Coherent buffers are allocated for the f`ollowing DMA rings: > > +- Tx submission ring (For regular mode; for LLQ mode it is allocated > > + using kzalloc) > > +- Tx completion ring > > +- Rx submission ring > > +- Rx completion ring > > +- Admin submission ring > > +- Admin completion ring > > +- AENQ ring > > Useless documentation (implementation detail). > > Nobody will maintain it when the driver changes. Please remove. Ack. > [...] > > +MTU: > > + > > +The driver supports an arbitrarily large MTU with a maximum that is > > +negotiated with the device. The driver configures MTU using the > > +SetFeature command (ENA_ADMIN_MTU property). The user can change MTU > > +via ifconfig(8) and ip(8). > > Please avoid advertising legacy ifconfig. Ack. > [...] > > diff --git a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h > > b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h > > new file mode 100644 > > index 000..8516e5e > > --- /dev/null > > +++ b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h > [...] > > +/* admin commands opcodes */ > > +enum ena_admin_aq_opcode { > > + /* create submission queue */ > > + ENA_ADMIN_CREATE_SQ = 1, > > + > > + /* destroy submission queue */ > > + ENA_ADMIN_DESTROY_SQ = 2, > > + > > + /* create completion queue */ > > + ENA_ADMIN_CREATE_CQ = 3, > > + > > + /* destroy completion queue */ > > + ENA_ADMIN_DESTROY_CQ = 4, > > + > > + /* get capabilities of particular feature */ > > + ENA_ADMIN_GET_FEATURE = 8, > > + > > + /* get capabilities of particular feature */ > > + ENA_ADMIN_SET_FEATURE = 9, > > + > > + /* get statistics */ > > + ENA_ADMIN_GET_STATS = 11, > > +}; > > The documentation above simply duplicates the member names -> useless > visual payload. > > Please replace space with tab before "=" to line things up. I'll leave this to Netanel. A good amount of this comes from a unified documentation / struct / register access generation script, and it might need some massaging. > [...] > > +/* Basic Statistics Command. */ > > +struct ena_admin_basic_stats { > > + /* word 0 : */ > > + u32 tx_bytes_low; > > + > > + /* word 1 : */ > > + u32 tx_bytes_high; > > + > > + /* word 2 : */ > > + u32 tx_pkts_low; > > + > > + /* word 3 : */ > > + u32 tx_pkts_high; > > + > > + /* word 4 : */ > > + u32 rx_bytes_low; > > + > > + /* word 5 : */ > > + u32 rx_bytes_high; > > + > > + /* word 6 : */ > > + u32 rx_pkts_low; > > + > > + /* word 7 : */ > > + u32 rx_pkts_high; > > + > > + /* word 8 : */ > > + u32 rx_drops_low; > > + > > + /* word 9 : */ > > + u32 rx_drops_high; > > +}; > > struct zorg { > u32 ...; > u32 ...; > u32 ...; > u32 ...; > > u32 ...; > u32 ...; > u32 ...; > u32 ...; > > u32 ...; > u32 ...; > u32 ...; > }; > > -> No comment needed to figure the offset. >Comment removed => no need to check if offset starts at word 0 or word 1. > > [...] > > diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c > > b/drivers/net/ethernet/amazon/ena/ena_com.c > > new file mode 100644 > > index 000..357e10f > > --- /dev/null > > +++ b/drivers/net/ethernet/amazon/ena/ena_com.c > [...] > > +static inline int
Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
On Thu, Jun 16, 2016 at 10:46:23PM +0200, Francois Romieu wrote: > Netanel Belgazal : > [...] > > Very limited review below. I'll comment on the documentation (since I edited it heavily) but will leave some the other parts for Netanel to answer. > > diff --git a/Documentation/networking/ena.txt > > b/Documentation/networking/ena.txt > > new file mode 100644 > > index 000..528544f > > --- /dev/null > > +++ b/Documentation/networking/ena.txt > > @@ -0,0 +1,330 @@ > > +Linux kernel driver for Elastic Network Adapter (ENA) family: > > += > > + > > +Overview: > > += > > +The ENA driver provides a modern Ethernet device interface optimized > > +for high performance and low CPU overhead. > > Marketing boilerplate. How much of it will still be true in 6 months ? > 6 years ? > > Hint: stay technical. If in doubt, make it boring. :o) Hopefully it will still be true for quite a while. But you make a good point, the overview here should stick with the technical details. > [...] > > +ENA devices allow high speed and low overhead Ethernet traffic > > +processing by providing a dedicated Tx/Rx queue pair per host CPU, a > > +dedicated MSI-X interrupt vector per Tx/Rx queue pair, adaptive > > +interrupt moderation, and CPU cacheline optimized data placement. > > How many queues exactly ? It's negotiated with the adapter via the admin queue, and it will vary based on the capabilities of a particular adapter. See: rc = ena_com_get_feature(ena_dev, _resp, ENA_ADMIN_MAX_QUEUES_NUM); in ena_com.c > [...] > > +Memory Allocations: > > +=== > > +DMA Coherent buffers are allocated for the f`ollowing DMA rings: > > +- Tx submission ring (For regular mode; for LLQ mode it is allocated > > + using kzalloc) > > +- Tx completion ring > > +- Rx submission ring > > +- Rx completion ring > > +- Admin submission ring > > +- Admin completion ring > > +- AENQ ring > > Useless documentation (implementation detail). > > Nobody will maintain it when the driver changes. Please remove. Ack. > [...] > > +MTU: > > + > > +The driver supports an arbitrarily large MTU with a maximum that is > > +negotiated with the device. The driver configures MTU using the > > +SetFeature command (ENA_ADMIN_MTU property). The user can change MTU > > +via ifconfig(8) and ip(8). > > Please avoid advertising legacy ifconfig. Ack. > [...] > > diff --git a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h > > b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h > > new file mode 100644 > > index 000..8516e5e > > --- /dev/null > > +++ b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h > [...] > > +/* admin commands opcodes */ > > +enum ena_admin_aq_opcode { > > + /* create submission queue */ > > + ENA_ADMIN_CREATE_SQ = 1, > > + > > + /* destroy submission queue */ > > + ENA_ADMIN_DESTROY_SQ = 2, > > + > > + /* create completion queue */ > > + ENA_ADMIN_CREATE_CQ = 3, > > + > > + /* destroy completion queue */ > > + ENA_ADMIN_DESTROY_CQ = 4, > > + > > + /* get capabilities of particular feature */ > > + ENA_ADMIN_GET_FEATURE = 8, > > + > > + /* get capabilities of particular feature */ > > + ENA_ADMIN_SET_FEATURE = 9, > > + > > + /* get statistics */ > > + ENA_ADMIN_GET_STATS = 11, > > +}; > > The documentation above simply duplicates the member names -> useless > visual payload. > > Please replace space with tab before "=" to line things up. I'll leave this to Netanel. A good amount of this comes from a unified documentation / struct / register access generation script, and it might need some massaging. > [...] > > +/* Basic Statistics Command. */ > > +struct ena_admin_basic_stats { > > + /* word 0 : */ > > + u32 tx_bytes_low; > > + > > + /* word 1 : */ > > + u32 tx_bytes_high; > > + > > + /* word 2 : */ > > + u32 tx_pkts_low; > > + > > + /* word 3 : */ > > + u32 tx_pkts_high; > > + > > + /* word 4 : */ > > + u32 rx_bytes_low; > > + > > + /* word 5 : */ > > + u32 rx_bytes_high; > > + > > + /* word 6 : */ > > + u32 rx_pkts_low; > > + > > + /* word 7 : */ > > + u32 rx_pkts_high; > > + > > + /* word 8 : */ > > + u32 rx_drops_low; > > + > > + /* word 9 : */ > > + u32 rx_drops_high; > > +}; > > struct zorg { > u32 ...; > u32 ...; > u32 ...; > u32 ...; > > u32 ...; > u32 ...; > u32 ...; > u32 ...; > > u32 ...; > u32 ...; > u32 ...; > }; > > -> No comment needed to figure the offset. >Comment removed => no need to check if offset starts at word 0 or word 1. > > [...] > > diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c > > b/drivers/net/ethernet/amazon/ena/ena_com.c > > new file mode 100644 > > index 000..357e10f > > --- /dev/null > > +++ b/drivers/net/ethernet/amazon/ena/ena_com.c > [...] > > +static inline int ena_com_mem_addr_set(struct ena_com_dev
Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
On Wed, Jun 15, 2016 at 09:32:35PM +0300, Netanel Belgazal wrote: > I removed HAVE_NETDEV_NAPI_LIST and fixed the ena print macro. > I'll wait a bit to see if there are additional comment before I'll send the > v2 patch. Also, please don't top post (reply inline) and send emails as plain text only. Have a look at [1] for how to set up a mailer like mutt with Gmail. --msw [1] https://www.kernel.org/doc/Documentation/email-clients.txt > On 15 June 2016 at 21:22, Matt Wilson <m...@amzn.com> wrote: > > > On Wed, Jun 15, 2016 at 11:07:17AM -0700, Matt Wilson wrote: > > > > > > You also had a few typos in the email addresses on the Cc: line that > > > I've corrected (d...@avemeloft.net -> da...@davemeloft.net, > > > > Argh, and of course I typo'ed the correction. -> da...@davemloft.net. > > > > --msw > >
Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
On Wed, Jun 15, 2016 at 09:32:35PM +0300, Netanel Belgazal wrote: > I removed HAVE_NETDEV_NAPI_LIST and fixed the ena print macro. > I'll wait a bit to see if there are additional comment before I'll send the > v2 patch. Also, please don't top post (reply inline) and send emails as plain text only. Have a look at [1] for how to set up a mailer like mutt with Gmail. --msw [1] https://www.kernel.org/doc/Documentation/email-clients.txt > On 15 June 2016 at 21:22, Matt Wilson wrote: > > > On Wed, Jun 15, 2016 at 11:07:17AM -0700, Matt Wilson wrote: > > > > > > You also had a few typos in the email addresses on the Cc: line that > > > I've corrected (d...@avemeloft.net -> da...@davemeloft.net, > > > > Argh, and of course I typo'ed the correction. -> da...@davemloft.net. > > > > --msw > >
Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
On Wed, Jun 15, 2016 at 11:07:17AM -0700, Matt Wilson wrote: > > You also had a few typos in the email addresses on the Cc: line that > I've corrected (d...@avemeloft.net -> da...@davemeloft.net, Argh, and of course I typo'ed the correction. -> da...@davemloft.net. --msw
Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
On Wed, Jun 15, 2016 at 11:07:17AM -0700, Matt Wilson wrote: > > You also had a few typos in the email addresses on the Cc: line that > I've corrected (d...@avemeloft.net -> da...@davemeloft.net, Argh, and of course I typo'ed the correction. -> da...@davemloft.net. --msw
Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
On Mon, Jun 13, 2016 at 11:46:13AM +0300, Netanel Belgazal wrote: > This is a driver for the forthcoming ENA family of networking devices. > > Signed-off-by: Netanel Belgazal[...] > + > +struct ena_napi { > + struct napi_struct napi cacheline_aligned; > + struct ena_ring *tx_ring; > + struct ena_ring *rx_ring; > +#ifndef HAVE_NETDEV_NAPI_LIST > + struct net_device poll_dev; > +#endif /* HAVE_NETDEV_NAPI_LIST */ > + u32 qid; > +}; Netanel, Sorry I missed this when I read through the patch earlier this week (though I pointed it out before...). This "#ifndef HAVE_NETDEV_NAPI_LIST" bit needs to be removed for the in-tree version. You also had a few typos in the email addresses on the Cc: line that I've corrected (d...@avemeloft.net -> da...@davemeloft.net, aligo...@amazon.com -> aligu...@amazon.com). Please make sure to correct in your v2 patch. --msw
Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
On Mon, Jun 13, 2016 at 11:46:13AM +0300, Netanel Belgazal wrote: > This is a driver for the forthcoming ENA family of networking devices. > > Signed-off-by: Netanel Belgazal [...] > + > +struct ena_napi { > + struct napi_struct napi cacheline_aligned; > + struct ena_ring *tx_ring; > + struct ena_ring *rx_ring; > +#ifndef HAVE_NETDEV_NAPI_LIST > + struct net_device poll_dev; > +#endif /* HAVE_NETDEV_NAPI_LIST */ > + u32 qid; > +}; Netanel, Sorry I missed this when I read through the patch earlier this week (though I pointed it out before...). This "#ifndef HAVE_NETDEV_NAPI_LIST" bit needs to be removed for the in-tree version. You also had a few typos in the email addresses on the Cc: line that I've corrected (d...@avemeloft.net -> da...@davemeloft.net, aligo...@amazon.com -> aligu...@amazon.com). Please make sure to correct in your v2 patch. --msw
Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
On Tue, Jun 14, 2016 at 11:27:09PM -0700, David Miller wrote: > From: Matt Wilson <m...@amzn.com> > Date: Tue, 14 Jun 2016 23:23:36 -0700 > > > Point taken, though existing drivers (even fairly popular ones) also > > aren't as clean as you might like. A quick look around... > > Existing drivers do undesirable things, film at 11... > > Yet are never a reason to accept such things in new drivers. I generally agree with this philosophy. > > Like many other network drivers, some of this is common code used for > > non-Linux systems, and that's why there is some overlap with Linux > > facilities. > > Again, never an excuse for such things. I suppose I was just happy to have the *majorly* objectionable parts cleaned up... > > Are there other things that jump out at you? > > I review hundreds of patches a day, I invested what I was able to > before moving on to other people's work. I wasn't asking you to do more, only if you had anything else you wanted to say before Netanel sends a v2. > Other developers must help review such a large driver submission, it > can't all be on me. And I'm certainly not saying it's all on you. I've been reviewing this with the team for quite a while to get it in pretty reasonable (IMHO) shape. --msw
Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
On Tue, Jun 14, 2016 at 11:27:09PM -0700, David Miller wrote: > From: Matt Wilson > Date: Tue, 14 Jun 2016 23:23:36 -0700 > > > Point taken, though existing drivers (even fairly popular ones) also > > aren't as clean as you might like. A quick look around... > > Existing drivers do undesirable things, film at 11... > > Yet are never a reason to accept such things in new drivers. I generally agree with this philosophy. > > Like many other network drivers, some of this is common code used for > > non-Linux systems, and that's why there is some overlap with Linux > > facilities. > > Again, never an excuse for such things. I suppose I was just happy to have the *majorly* objectionable parts cleaned up... > > Are there other things that jump out at you? > > I review hundreds of patches a day, I invested what I was able to > before moving on to other people's work. I wasn't asking you to do more, only if you had anything else you wanted to say before Netanel sends a v2. > Other developers must help review such a large driver submission, it > can't all be on me. And I'm certainly not saying it's all on you. I've been reviewing this with the team for quite a while to get it in pretty reasonable (IMHO) shape. --msw
Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
On Tue, Jun 14, 2016 at 10:25:16PM -0700, David Miller wrote: > From: Netanel Belgazal> Date: Mon, 13 Jun 2016 11:46:13 +0300 > > > +#define ena_trc_dbg(format, arg...) \ > > + pr_debug("[ENA_COM: %s] " format, __func__, ##arg) > > +#define ena_trc_info(format, arg...) \ > > + pr_info("[ENA_COM: %s] " format, __func__, ##arg) > > +#define ena_trc_warn(format, arg...) \ > > + pr_warn("[ENA_COM: %s] " format, __func__, ##arg) > > +#define ena_trc_err(format, arg...) \ > > + pr_err("[ENA_COM: %s] " format, __func__, ##arg) > > These custom tracing macros are quite inappropriate. > > We have the function tracer in the kernel when that is needed. So spitting > out __func__ all over the place is not something that should be found in > drivers these days. Point taken, though existing drivers (even fairly popular ones) also aren't as clean as you might like. A quick look around... msw@carbon:~/git/upstream/linux$ git grep -B1 '__func__' drivers/net/ethernet/ | grep -A1 '#define' drivers/net/ethernet/broadcom/bnx2x/bnx2x.h-#define BNX2X_ERROR(fmt, ...) \ drivers/net/ethernet/broadcom/bnx2x/bnx2x.h:pr_err("[%s:%d]" fmt, __func__, __LINE__, ##__VA_ARGS__) [...] drivers/net/ethernet/intel/ixgb/ixgb_osdep.h:#define ENTER() pr_debug("%s\n", __func__); Like many other network drivers, some of this is common code used for non-Linux systems, and that's why there is some overlap with Linux facilities. For example, here's the common ENA parts as it's situated in DPDK as a PMD: http://dpdk.org/browse/dpdk/tree/drivers/net/ena/base/ena_com.c When you compare to the DPDK version you can see that the common code has already been contextualized for Linux in this patch in anticipation of this type of feedback. (e.g., ENA_SPINLOCK_LOCK() -> spin_lock_irqsave(), etc., as that would obviously never fly). The Linux-specific bits (ena_netdev.c, ena_ethtool.c, etc.) don't make use of any of the overlapping functionality needed for the common code. > And one can modify pr_fmt do make pr_debug et al. have whatever prefix > one wants. Yup, that's an easy improvement. > I suspect there will be several rounds of review to weed out things > like this. You can preempt a lot of that by removing as much in your > driver that the kernel has existing facilities for. Are there other things that jump out at you? I felt like this was pretty good for an initial submission in terms of striking a balance between using a portable core while avoiding a lot of compatibility shims. --msw
Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
On Tue, Jun 14, 2016 at 10:25:16PM -0700, David Miller wrote: > From: Netanel Belgazal > Date: Mon, 13 Jun 2016 11:46:13 +0300 > > > +#define ena_trc_dbg(format, arg...) \ > > + pr_debug("[ENA_COM: %s] " format, __func__, ##arg) > > +#define ena_trc_info(format, arg...) \ > > + pr_info("[ENA_COM: %s] " format, __func__, ##arg) > > +#define ena_trc_warn(format, arg...) \ > > + pr_warn("[ENA_COM: %s] " format, __func__, ##arg) > > +#define ena_trc_err(format, arg...) \ > > + pr_err("[ENA_COM: %s] " format, __func__, ##arg) > > These custom tracing macros are quite inappropriate. > > We have the function tracer in the kernel when that is needed. So spitting > out __func__ all over the place is not something that should be found in > drivers these days. Point taken, though existing drivers (even fairly popular ones) also aren't as clean as you might like. A quick look around... msw@carbon:~/git/upstream/linux$ git grep -B1 '__func__' drivers/net/ethernet/ | grep -A1 '#define' drivers/net/ethernet/broadcom/bnx2x/bnx2x.h-#define BNX2X_ERROR(fmt, ...) \ drivers/net/ethernet/broadcom/bnx2x/bnx2x.h:pr_err("[%s:%d]" fmt, __func__, __LINE__, ##__VA_ARGS__) [...] drivers/net/ethernet/intel/ixgb/ixgb_osdep.h:#define ENTER() pr_debug("%s\n", __func__); Like many other network drivers, some of this is common code used for non-Linux systems, and that's why there is some overlap with Linux facilities. For example, here's the common ENA parts as it's situated in DPDK as a PMD: http://dpdk.org/browse/dpdk/tree/drivers/net/ena/base/ena_com.c When you compare to the DPDK version you can see that the common code has already been contextualized for Linux in this patch in anticipation of this type of feedback. (e.g., ENA_SPINLOCK_LOCK() -> spin_lock_irqsave(), etc., as that would obviously never fly). The Linux-specific bits (ena_netdev.c, ena_ethtool.c, etc.) don't make use of any of the overlapping functionality needed for the common code. > And one can modify pr_fmt do make pr_debug et al. have whatever prefix > one wants. Yup, that's an easy improvement. > I suspect there will be several rounds of review to weed out things > like this. You can preempt a lot of that by removing as much in your > driver that the kernel has existing facilities for. Are there other things that jump out at you? I felt like this was pretty good for an initial submission in terms of striking a balance between using a portable core while avoiding a lot of compatibility shims. --msw
Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
On Mon, Jun 13, 2016 at 11:46:13AM +0300, Netanel Belgazal wrote: > This is a driver for the forthcoming ENA family of networking devices. Reviewed-by: Matt Wilson <m...@amazon.com> > Signed-off-by: Netanel Belgazal <neta...@annapurnalabs.com> > --- > Documentation/networking/00-INDEX |2 + > Documentation/networking/ena.txt | 330 ++ > MAINTAINERS |9 + > drivers/net/ethernet/Kconfig |1 + > drivers/net/ethernet/Makefile |1 + > drivers/net/ethernet/amazon/Kconfig | 27 + > drivers/net/ethernet/amazon/Makefile |5 + > drivers/net/ethernet/amazon/ena/Makefile |9 + > drivers/net/ethernet/amazon/ena/ena_admin_defs.h | 1246 > drivers/net/ethernet/amazon/ena/ena_com.c | 2768 + > drivers/net/ethernet/amazon/ena/ena_com.h | 1051 +++ > drivers/net/ethernet/amazon/ena/ena_common_defs.h | 52 + > drivers/net/ethernet/amazon/ena/ena_eth_com.c | 508 > drivers/net/ethernet/amazon/ena/ena_eth_com.h | 160 + > drivers/net/ethernet/amazon/ena/ena_eth_io_defs.h | 460 +++ > drivers/net/ethernet/amazon/ena/ena_ethtool.c | 836 + > drivers/net/ethernet/amazon/ena/ena_netdev.c | 3362 > + > drivers/net/ethernet/amazon/ena/ena_netdev.h | 327 ++ > drivers/net/ethernet/amazon/ena/ena_pci_id_tbl.h | 67 + > drivers/net/ethernet/amazon/ena/ena_regs_defs.h | 133 + > drivers/net/ethernet/amazon/ena/ena_sysfs.c | 264 ++ > drivers/net/ethernet/amazon/ena/ena_sysfs.h | 55 + > 22 files changed, 11673 insertions(+) > create mode 100644 Documentation/networking/ena.txt > create mode 100644 drivers/net/ethernet/amazon/Kconfig > create mode 100644 drivers/net/ethernet/amazon/Makefile > create mode 100644 drivers/net/ethernet/amazon/ena/Makefile > create mode 100644 drivers/net/ethernet/amazon/ena/ena_admin_defs.h > create mode 100644 drivers/net/ethernet/amazon/ena/ena_com.c > create mode 100644 drivers/net/ethernet/amazon/ena/ena_com.h > create mode 100644 drivers/net/ethernet/amazon/ena/ena_common_defs.h > create mode 100644 drivers/net/ethernet/amazon/ena/ena_eth_com.c > create mode 100644 drivers/net/ethernet/amazon/ena/ena_eth_com.h > create mode 100644 drivers/net/ethernet/amazon/ena/ena_eth_io_defs.h > create mode 100644 drivers/net/ethernet/amazon/ena/ena_ethtool.c > create mode 100644 drivers/net/ethernet/amazon/ena/ena_netdev.c > create mode 100644 drivers/net/ethernet/amazon/ena/ena_netdev.h > create mode 100644 drivers/net/ethernet/amazon/ena/ena_pci_id_tbl.h > create mode 100644 drivers/net/ethernet/amazon/ena/ena_regs_defs.h > create mode 100644 drivers/net/ethernet/amazon/ena/ena_sysfs.c > create mode 100644 drivers/net/ethernet/amazon/ena/ena_sysfs.h [...] --msw
Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
On Mon, Jun 13, 2016 at 11:46:13AM +0300, Netanel Belgazal wrote: > This is a driver for the forthcoming ENA family of networking devices. Reviewed-by: Matt Wilson > Signed-off-by: Netanel Belgazal > --- > Documentation/networking/00-INDEX |2 + > Documentation/networking/ena.txt | 330 ++ > MAINTAINERS |9 + > drivers/net/ethernet/Kconfig |1 + > drivers/net/ethernet/Makefile |1 + > drivers/net/ethernet/amazon/Kconfig | 27 + > drivers/net/ethernet/amazon/Makefile |5 + > drivers/net/ethernet/amazon/ena/Makefile |9 + > drivers/net/ethernet/amazon/ena/ena_admin_defs.h | 1246 > drivers/net/ethernet/amazon/ena/ena_com.c | 2768 + > drivers/net/ethernet/amazon/ena/ena_com.h | 1051 +++ > drivers/net/ethernet/amazon/ena/ena_common_defs.h | 52 + > drivers/net/ethernet/amazon/ena/ena_eth_com.c | 508 > drivers/net/ethernet/amazon/ena/ena_eth_com.h | 160 + > drivers/net/ethernet/amazon/ena/ena_eth_io_defs.h | 460 +++ > drivers/net/ethernet/amazon/ena/ena_ethtool.c | 836 + > drivers/net/ethernet/amazon/ena/ena_netdev.c | 3362 > + > drivers/net/ethernet/amazon/ena/ena_netdev.h | 327 ++ > drivers/net/ethernet/amazon/ena/ena_pci_id_tbl.h | 67 + > drivers/net/ethernet/amazon/ena/ena_regs_defs.h | 133 + > drivers/net/ethernet/amazon/ena/ena_sysfs.c | 264 ++ > drivers/net/ethernet/amazon/ena/ena_sysfs.h | 55 + > 22 files changed, 11673 insertions(+) > create mode 100644 Documentation/networking/ena.txt > create mode 100644 drivers/net/ethernet/amazon/Kconfig > create mode 100644 drivers/net/ethernet/amazon/Makefile > create mode 100644 drivers/net/ethernet/amazon/ena/Makefile > create mode 100644 drivers/net/ethernet/amazon/ena/ena_admin_defs.h > create mode 100644 drivers/net/ethernet/amazon/ena/ena_com.c > create mode 100644 drivers/net/ethernet/amazon/ena/ena_com.h > create mode 100644 drivers/net/ethernet/amazon/ena/ena_common_defs.h > create mode 100644 drivers/net/ethernet/amazon/ena/ena_eth_com.c > create mode 100644 drivers/net/ethernet/amazon/ena/ena_eth_com.h > create mode 100644 drivers/net/ethernet/amazon/ena/ena_eth_io_defs.h > create mode 100644 drivers/net/ethernet/amazon/ena/ena_ethtool.c > create mode 100644 drivers/net/ethernet/amazon/ena/ena_netdev.c > create mode 100644 drivers/net/ethernet/amazon/ena/ena_netdev.h > create mode 100644 drivers/net/ethernet/amazon/ena/ena_pci_id_tbl.h > create mode 100644 drivers/net/ethernet/amazon/ena/ena_regs_defs.h > create mode 100644 drivers/net/ethernet/amazon/ena/ena_sysfs.c > create mode 100644 drivers/net/ethernet/amazon/ena/ena_sysfs.h [...] --msw
Re: [PATCH] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
On Tue, Mar 15, 2016 at 01:19:07PM -0500, Dan Williams wrote: > On Tue, 2016-03-15 at 12:50 +0200, Netanel Belgazal wrote: > > + > > +The ENA driver supports industry standard TCP/IP offload features > > such > > +as checksum offload and TCP transmit segmentation offload (TSO). > > + > > +Receive-side scaling (RSS) is supported for multi-core scaling. > > Any chance it also supports VXLAN offload? While there's room for tunnel offload fields in the TX/RX descriptors (e.g., see ena_eth_io_tx_desc), there is no support at this time. --msw
Re: [PATCH] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
On Tue, Mar 15, 2016 at 01:19:07PM -0500, Dan Williams wrote: > On Tue, 2016-03-15 at 12:50 +0200, Netanel Belgazal wrote: > > + > > +The ENA driver supports industry standard TCP/IP offload features > > such > > +as checksum offload and TCP transmit segmentation offload (TSO). > > + > > +Receive-side scaling (RSS) is supported for multi-core scaling. > > Any chance it also supports VXLAN offload? While there's room for tunnel offload fields in the TX/RX descriptors (e.g., see ena_eth_io_tx_desc), there is no support at this time. --msw
Re: [PATCH] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
On Tue, Mar 15, 2016 at 12:50:06PM +0200, Netanel Belgazal wrote: [...] > diff --git a/drivers/net/ethernet/amazon/Kconfig > b/drivers/net/ethernet/amazon/Kconfig > new file mode 100644 > index 000..bc4f240d > --- /dev/null > +++ b/drivers/net/ethernet/amazon/Kconfig > @@ -0,0 +1,27 @@ > +# > +# Amazon network device configuration > +# > + > +config NET_VENDOR_AMAZON > + bool "Amazon Devices" > + default y > + ---help--- > + If you have a network (Ethernet) device belonging to this class, say > Y. > + > + Note that the answer to this question doesn't directly affect the > + kernel: saying N will just cause the configurator to skip all > + the questions about amazon devices. If you say Y, you will be asked > + for your specific device in the following questions. > + > +if NET_VENDOR_AMAZON > + > +config ENA_ETHERNET > + tristate "Elastic Network Adapter (ENA) support" > + depends on (PCI_MSI && X86) || COMPILE_TEST Remove COMPILE_TEST so that the kbuild-all robot won't complain. I imagine that the drier should work on other systems that have MSI-X... > + ---help--- > + This driver supports Elastic Network Adapter (ENA) adapter" > + > + To compile this driver as a module, choose M here. > + The module will be called ena. > + > +endif #NET_VENDOR_AMAZON [...] > diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c > b/drivers/net/ethernet/amazon/ena/ena_netdev.c > new file mode 100644 > index 000..41d7265 > --- /dev/null > +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c [...] > +static int ena_init_rx_cpu_rmap(struct ena_adapter *adapter) > +{ > + u32 i; > + int rc; > + > + adapter->netdev->rx_cpu_rmap = alloc_irq_cpu_rmap(adapter->num_queues); > + if (!adapter->netdev->rx_cpu_rmap) > + return -ENOMEM; > + for (i = 0; i < adapter->num_queues; i++) { > + int irq_idx = ENA_IO_IRQ_IDX(i); > + > + rc = irq_cpu_rmap_add(adapter->netdev->rx_cpu_rmap, > + adapter->msix_entries[irq_idx].vector); > + if (rc) { > + free_irq_cpu_rmap(adapter->netdev->rx_cpu_rmap); > + adapter->netdev->rx_cpu_rmap = NULL; > + return rc; > + } > + } > + return 0; > +} This should be in #ifdef CONFIG_RSS_ACCEL (and elsewhere where netdev->rx_cpu_rmap is touched). --msw
Re: [PATCH] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
On Tue, Mar 15, 2016 at 12:50:06PM +0200, Netanel Belgazal wrote: [...] > diff --git a/drivers/net/ethernet/amazon/Kconfig > b/drivers/net/ethernet/amazon/Kconfig > new file mode 100644 > index 000..bc4f240d > --- /dev/null > +++ b/drivers/net/ethernet/amazon/Kconfig > @@ -0,0 +1,27 @@ > +# > +# Amazon network device configuration > +# > + > +config NET_VENDOR_AMAZON > + bool "Amazon Devices" > + default y > + ---help--- > + If you have a network (Ethernet) device belonging to this class, say > Y. > + > + Note that the answer to this question doesn't directly affect the > + kernel: saying N will just cause the configurator to skip all > + the questions about amazon devices. If you say Y, you will be asked > + for your specific device in the following questions. > + > +if NET_VENDOR_AMAZON > + > +config ENA_ETHERNET > + tristate "Elastic Network Adapter (ENA) support" > + depends on (PCI_MSI && X86) || COMPILE_TEST Remove COMPILE_TEST so that the kbuild-all robot won't complain. I imagine that the drier should work on other systems that have MSI-X... > + ---help--- > + This driver supports Elastic Network Adapter (ENA) adapter" > + > + To compile this driver as a module, choose M here. > + The module will be called ena. > + > +endif #NET_VENDOR_AMAZON [...] > diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c > b/drivers/net/ethernet/amazon/ena/ena_netdev.c > new file mode 100644 > index 000..41d7265 > --- /dev/null > +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c [...] > +static int ena_init_rx_cpu_rmap(struct ena_adapter *adapter) > +{ > + u32 i; > + int rc; > + > + adapter->netdev->rx_cpu_rmap = alloc_irq_cpu_rmap(adapter->num_queues); > + if (!adapter->netdev->rx_cpu_rmap) > + return -ENOMEM; > + for (i = 0; i < adapter->num_queues; i++) { > + int irq_idx = ENA_IO_IRQ_IDX(i); > + > + rc = irq_cpu_rmap_add(adapter->netdev->rx_cpu_rmap, > + adapter->msix_entries[irq_idx].vector); > + if (rc) { > + free_irq_cpu_rmap(adapter->netdev->rx_cpu_rmap); > + adapter->netdev->rx_cpu_rmap = NULL; > + return rc; > + } > + } > + return 0; > +} This should be in #ifdef CONFIG_RSS_ACCEL (and elsewhere where netdev->rx_cpu_rmap is touched). --msw
Re: [PATCH] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
Hi Netanel, Looks like the last round of internal review went to {lkml,netdev,etc}, so this should be considered a RFC patch. The description still needs work. On Tue, Mar 15, 2016 at 12:50:06PM +0200, Netanel Belgazal wrote: > This is a driver for the Amazon ethernet ENA family. ethernet -> Ethernet > The driver operates variety of ENA adapters through > feature negotiation with the adapter and upgradable commands set. > ENA driver handles PCI Physical and Virtual ENA functions. > > The ENA device is not yet released to public. > He is expected to be released soon. I would say "this is a driver for a device that will be available in the future." I wouldn't say "He is expected to released soon." > For the full specification of the device please refer to: > Obviously this is a placeholder. I think that the included documentation in ena.txt and in the code provides a reasonable theory of operation, and you don't necessarily need to provide a pointer here. If/when a full specification is available you can submit a patch to ena.txt that points to the canonical location. --msw > --- > Documentation/networking/00-INDEX |2 + > Documentation/networking/ena.txt | 330 +++ > MAINTAINERS |9 + > drivers/net/ethernet/Kconfig |1 + > drivers/net/ethernet/Makefile |1 + > drivers/net/ethernet/amazon/Kconfig | 27 + > drivers/net/ethernet/amazon/Makefile |5 + > drivers/net/ethernet/amazon/ena/Makefile |9 + > drivers/net/ethernet/amazon/ena/ena_admin_defs.h | 1310 + > drivers/net/ethernet/amazon/ena/ena_com.c | 2730 ++ > drivers/net/ethernet/amazon/ena/ena_com.h | 1040 +++ > drivers/net/ethernet/amazon/ena/ena_common_defs.h | 52 + > drivers/net/ethernet/amazon/ena/ena_eth_com.c | 502 > drivers/net/ethernet/amazon/ena/ena_eth_com.h | 146 + > drivers/net/ethernet/amazon/ena/ena_eth_io_defs.h | 509 > drivers/net/ethernet/amazon/ena/ena_ethtool.c | 837 ++ > drivers/net/ethernet/amazon/ena/ena_netdev.c | 3179 > + > drivers/net/ethernet/amazon/ena/ena_netdev.h | 317 ++ > drivers/net/ethernet/amazon/ena/ena_pci_id_tbl.h | 77 + > drivers/net/ethernet/amazon/ena/ena_regs_defs.h | 133 + > drivers/net/ethernet/amazon/ena/ena_sysfs.c | 272 ++ > drivers/net/ethernet/amazon/ena/ena_sysfs.h | 55 + [...]
Re: [PATCH] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
Hi Netanel, Looks like the last round of internal review went to {lkml,netdev,etc}, so this should be considered a RFC patch. The description still needs work. On Tue, Mar 15, 2016 at 12:50:06PM +0200, Netanel Belgazal wrote: > This is a driver for the Amazon ethernet ENA family. ethernet -> Ethernet > The driver operates variety of ENA adapters through > feature negotiation with the adapter and upgradable commands set. > ENA driver handles PCI Physical and Virtual ENA functions. > > The ENA device is not yet released to public. > He is expected to be released soon. I would say "this is a driver for a device that will be available in the future." I wouldn't say "He is expected to released soon." > For the full specification of the device please refer to: > Obviously this is a placeholder. I think that the included documentation in ena.txt and in the code provides a reasonable theory of operation, and you don't necessarily need to provide a pointer here. If/when a full specification is available you can submit a patch to ena.txt that points to the canonical location. --msw > --- > Documentation/networking/00-INDEX |2 + > Documentation/networking/ena.txt | 330 +++ > MAINTAINERS |9 + > drivers/net/ethernet/Kconfig |1 + > drivers/net/ethernet/Makefile |1 + > drivers/net/ethernet/amazon/Kconfig | 27 + > drivers/net/ethernet/amazon/Makefile |5 + > drivers/net/ethernet/amazon/ena/Makefile |9 + > drivers/net/ethernet/amazon/ena/ena_admin_defs.h | 1310 + > drivers/net/ethernet/amazon/ena/ena_com.c | 2730 ++ > drivers/net/ethernet/amazon/ena/ena_com.h | 1040 +++ > drivers/net/ethernet/amazon/ena/ena_common_defs.h | 52 + > drivers/net/ethernet/amazon/ena/ena_eth_com.c | 502 > drivers/net/ethernet/amazon/ena/ena_eth_com.h | 146 + > drivers/net/ethernet/amazon/ena/ena_eth_io_defs.h | 509 > drivers/net/ethernet/amazon/ena/ena_ethtool.c | 837 ++ > drivers/net/ethernet/amazon/ena/ena_netdev.c | 3179 > + > drivers/net/ethernet/amazon/ena/ena_netdev.h | 317 ++ > drivers/net/ethernet/amazon/ena/ena_pci_id_tbl.h | 77 + > drivers/net/ethernet/amazon/ena/ena_regs_defs.h | 133 + > drivers/net/ethernet/amazon/ena/ena_sysfs.c | 272 ++ > drivers/net/ethernet/amazon/ena/ena_sysfs.h | 55 + [...]
Re: [PATCH] virtio-pci: also bind to Amazon PCI vendor ID
61ba512-40e2-4cc9-843a-923143f3456c/pciidspec-11.doc > > > "Test will read and properly concatenate PCI IDs and verify uniqueness" > > > this is likely what you are running into: IDs must be unique, > > > so if you want to put your driver in Microsoft's database, > > > it must match a different set of IDs. > > > But you should not need to change the vendor ID to make them unique, > > > changing subsystem vendor ID will do. > > > > > > Did you try this? > > > > You cannot submit a modified kvm-guest-drivers.git for WHQL > > certification > > Sorry about being unclear, that github link was just an example to give > you the idea. The point is to change the subsystem ID *of the device*. > > > as the licensing is constructed in such a way as to > > prevent that. > > > > >> Red Hat has non-redistributable Windows drivers and Microsoft > > >> will not allow anyone else to WHQL certify drivers using that > > >> vendor ID. > > > > > > Don't see what Red Hat's windows drivers have to do with Linux really. > > > Amazon.com can do whatever it wants with its vendor ID, and if there is a > > > hypervisor with a different vendor ID that can use the virtio drivers, > > > this > > > patch is required. > > > The following would be a reasonable commit log in that case: > > > > > > "Amazon.com uses PV devices with vendor ID 0x1d0f that are otherwise > > > compatible with Linux virtio drivers. Add 0x1d0f ID to the list > > > to make Linux work with these devices." > > > Feel free to use :) > > > > > > > > > But I'd like to note that by doing this on the hypervisor side, > > > you lose the ability to run older Linux guests, > > > and create work for all distros who now need to update their > > > kernels to work with your ID, apparently for no good reason. > > > > > > So if this isn't out in the field yet, I would suggest examining > > > the alternative listed above. > > > > I am very happy to use any alternative mechanism that allows for > > virtio to be used with a wide variety of guests. > > I hope my idea above can work - it would be best for everyone. I would > try to explore it before giving up and changing the vendor ID. > As I said this is just friendly advice, and not a nak to the patch, > if your hypervisor has devices with your own ID, go ahead and > add the ID to driver. > > > Many other companies > > have also struggled with this and AFAIK no one has had much success. > > Question remains what the problem is. > Until we have an explicit report saying "we tried XYZ and > it failed" we are just shooting in the dark, aren't we? > > > > > OTOH if it *is* decided this is going to be out there in the field, > > > please add > > > the new devices to the PCI IDs list. > > > http://pci-ids.ucw.cz/ > > > Otherwise there's no way to be sure someone won't try to > > > use these IDs for something else. > > > > PCI-SIG assigns vendor IDs and 0x1d0f is assigned to Amazon. See > > https://www.pcisig.com/membership/vid_search/ > > > > Vendors self-manage device IDs and we have allocated 0x1000-0x103f to > > virtio devices. > > > > >> That makes it impossible to use virtio drivers with > > >> a Windows guest without changing the vendor ID. > > > > > > Hardly impossible: virtio drivers are available from a > > > variety of sources. > > > > Examples? > > > > Regards, > > > > Anthony Liguori > > Fedora has drivers. > Most other linux distros have drivers. > Maybe not as good as RHEL ones but they are there, and > their existance is the most likely reason no one bothered > re-writing drivers. Why not BSD license the drivers and get them into Windows Update? Citrix donated their drivers to the Xen Project and licensed them under the BSD license. It makes this process a lot easier... See win-* prefixed projects in https://github.com/xenserver, now moved to pvdrivers/win repositories in http://xenbits.xen.org/gitweb --msw > Which does not mean you shouldn't do it. Be my guest. > > My only point is commit log should not say "impossible to use". > > > > > > > > But this is IMO beside the point. > > > > > >> Cc: Matt Wilson > > >> Cc: Rusty Russell > > >> Cc: Michael Tsirkin > > >> Signed-off-by: Anthony Liguori > > > > > > I'd like to see the response/confirmation of the above, and/or the > > > commit log replaced before this patch is applied. > > > > > > Thanks! > > > > > >> --- > > >> drivers/virtio/virtio_pci.c |2 ++ > > >> 1 file changed, 2 insertions(+) > > >> > > >> diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c > > >> index 101db3f..9cbac33 100644 > > >> --- a/drivers/virtio/virtio_pci.c > > >> +++ b/drivers/virtio/virtio_pci.c > > >> @@ -93,6 +93,8 @@ struct virtio_pci_vq_info > > >> /* Qumranet donated their vendor ID for devices 0x1000 thru 0x10FF. */ > > >> static DEFINE_PCI_DEVICE_TABLE(virtio_pci_id_table) = { > > >> { PCI_DEVICE(0x1af4, PCI_ANY_ID) }, > > >> + /* Amazon.com vendor ID */ > > >> + { PCI_DEVICE(0x1d0f, PCI_ANY_ID) }, > > >> { 0 } > > >> }; > > >> -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] virtio-pci: also bind to Amazon PCI vendor ID
is to change the subsystem ID *of the device*. as the licensing is constructed in such a way as to prevent that. Red Hat has non-redistributable Windows drivers and Microsoft will not allow anyone else to WHQL certify drivers using that vendor ID. Don't see what Red Hat's windows drivers have to do with Linux really. Amazon.com can do whatever it wants with its vendor ID, and if there is a hypervisor with a different vendor ID that can use the virtio drivers, this patch is required. The following would be a reasonable commit log in that case: Amazon.com uses PV devices with vendor ID 0x1d0f that are otherwise compatible with Linux virtio drivers. Add 0x1d0f ID to the list to make Linux work with these devices. Feel free to use :) But I'd like to note that by doing this on the hypervisor side, you lose the ability to run older Linux guests, and create work for all distros who now need to update their kernels to work with your ID, apparently for no good reason. So if this isn't out in the field yet, I would suggest examining the alternative listed above. I am very happy to use any alternative mechanism that allows for virtio to be used with a wide variety of guests. I hope my idea above can work - it would be best for everyone. I would try to explore it before giving up and changing the vendor ID. As I said this is just friendly advice, and not a nak to the patch, if your hypervisor has devices with your own ID, go ahead and add the ID to driver. Many other companies have also struggled with this and AFAIK no one has had much success. Question remains what the problem is. Until we have an explicit report saying we tried XYZ and it failed we are just shooting in the dark, aren't we? OTOH if it *is* decided this is going to be out there in the field, please add the new devices to the PCI IDs list. http://pci-ids.ucw.cz/ Otherwise there's no way to be sure someone won't try to use these IDs for something else. PCI-SIG assigns vendor IDs and 0x1d0f is assigned to Amazon. See https://www.pcisig.com/membership/vid_search/ Vendors self-manage device IDs and we have allocated 0x1000-0x103f to virtio devices. That makes it impossible to use virtio drivers with a Windows guest without changing the vendor ID. Hardly impossible: virtio drivers are available from a variety of sources. Examples? Regards, Anthony Liguori Fedora has drivers. Most other linux distros have drivers. Maybe not as good as RHEL ones but they are there, and their existance is the most likely reason no one bothered re-writing drivers. Why not BSD license the drivers and get them into Windows Update? Citrix donated their drivers to the Xen Project and licensed them under the BSD license. It makes this process a lot easier... See win-* prefixed projects in https://github.com/xenserver, now moved to pvdrivers/win repositories in http://xenbits.xen.org/gitweb --msw Which does not mean you shouldn't do it. Be my guest. My only point is commit log should not say impossible to use. But this is IMO beside the point. Cc: Matt Wilson m...@amazon.com Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael Tsirkin m...@redhat.com Signed-off-by: Anthony Liguori aligu...@amazon.com I'd like to see the response/confirmation of the above, and/or the commit log replaced before this patch is applied. Thanks! --- drivers/virtio/virtio_pci.c |2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c index 101db3f..9cbac33 100644 --- a/drivers/virtio/virtio_pci.c +++ b/drivers/virtio/virtio_pci.c @@ -93,6 +93,8 @@ struct virtio_pci_vq_info /* Qumranet donated their vendor ID for devices 0x1000 thru 0x10FF. */ static DEFINE_PCI_DEVICE_TABLE(virtio_pci_id_table) = { { PCI_DEVICE(0x1af4, PCI_ANY_ID) }, + /* Amazon.com vendor ID */ + { PCI_DEVICE(0x1d0f, PCI_ANY_ID) }, { 0 } }; -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 2/2] xen/setup: Remap Xen Identity Mapped RAM
On Mon, Aug 11, 2014 at 10:34:01AM -0700, Matt Rushton wrote: > Instead of ballooning up and down dom0 memory this remaps the existing mfns > that were replaced by the identity map. The reason for this is that the > existing implementation ballooned memory up and and down which caused dom0 > to have discontiguous pages. In some cases this resulted in the use of bounce > buffers which reduced network I/O performance significantly. This change will > honor the existing order of the pages with the exception of some boundary > conditions. > > To do this we need to update both the Linux p2m table and the Xen m2p table. > Particular care must be taken when updating the p2m table since it's important > to limit table memory consumption and reuse the existing leaf pages which get > freed when an entire leaf page is set to the identity map. To implement this, > mapping updates are grouped into blocks with table entries getting cached > temporarily and then released. > > On my test system before: > Total pages: 2105014 > Total contiguous: 1640635 > > After: > Total pages: 2105014 > Total contiguous: 2098904 > > Signed-off-by: Matthew Rushton > --- Matt, Could you provide a summary of v2->v3 changes here? --msw > arch/x86/xen/p2m.c | 23 +--- > arch/x86/xen/p2m.h | 15 ++ > arch/x86/xen/setup.c | 370 > +++--- > 3 files changed, 314 insertions(+), 94 deletions(-) > create mode 100644 arch/x86/xen/p2m.h -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 2/2] xen/setup: Remap Xen Identity Mapped RAM
On Mon, Aug 11, 2014 at 10:34:01AM -0700, Matt Rushton wrote: Instead of ballooning up and down dom0 memory this remaps the existing mfns that were replaced by the identity map. The reason for this is that the existing implementation ballooned memory up and and down which caused dom0 to have discontiguous pages. In some cases this resulted in the use of bounce buffers which reduced network I/O performance significantly. This change will honor the existing order of the pages with the exception of some boundary conditions. To do this we need to update both the Linux p2m table and the Xen m2p table. Particular care must be taken when updating the p2m table since it's important to limit table memory consumption and reuse the existing leaf pages which get freed when an entire leaf page is set to the identity map. To implement this, mapping updates are grouped into blocks with table entries getting cached temporarily and then released. On my test system before: Total pages: 2105014 Total contiguous: 1640635 After: Total pages: 2105014 Total contiguous: 2098904 Signed-off-by: Matthew Rushton mrush...@amazon.com --- Matt, Could you provide a summary of v2-v3 changes here? --msw arch/x86/xen/p2m.c | 23 +--- arch/x86/xen/p2m.h | 15 ++ arch/x86/xen/setup.c | 370 +++--- 3 files changed, 314 insertions(+), 94 deletions(-) create mode 100644 arch/x86/xen/p2m.h -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 2/2] xen/setup: Remap Xen Identity Mapped RAM
On Fri, Aug 01, 2014 at 03:52:28PM +0100, David Vrabel wrote: > On 31/07/14 18:43, David Vrabel wrote: > > On 20/07/14 01:01, Matt Rushton wrote: > >> Instead of ballooning up and down dom0 memory this remaps the existing mfns > >> that were replaced by the identity map. The reason for this is that the > >> existing implementation ballooned memory up and and down which caused dom0 > >> to have discontiguous pages. In some cases this resulted in the use of > >> bounce > >> buffers which reduced network I/O performance significantly. This change > >> will > >> honor the existing order of the pages with the exception of some boundary > >> conditions. > >> > >> To do this we need to update both the Linux p2m table and the Xen m2p > >> table. > >> Particular care must be taken when updating the p2m table since it's > >> important > >> to limit table memory consumption and reuse the existing leaf pages which > >> get > >> freed when an entire leaf page is set to the identity map. To implement > >> this, > >> mapping updates are grouped into blocks with table entries getting cached > >> temporarily and then released. > >> > >> On my test system before: > >> Total pages: 2105014 > >> Total contiguous: 1640635 > >> > >> After: > >> Total pages: 2105014 > >> Total contiguous: 2098904 > > > > Applied to devel/for-linus-3.17 > > Unfortunately, this produces too many WARNINGs on some boxes or > with certain configurations. Hi David, Do you have more information about the systems or configurations that showed a problem? --msw -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 2/2] xen/setup: Remap Xen Identity Mapped RAM
On Fri, Aug 01, 2014 at 03:52:28PM +0100, David Vrabel wrote: On 31/07/14 18:43, David Vrabel wrote: On 20/07/14 01:01, Matt Rushton wrote: Instead of ballooning up and down dom0 memory this remaps the existing mfns that were replaced by the identity map. The reason for this is that the existing implementation ballooned memory up and and down which caused dom0 to have discontiguous pages. In some cases this resulted in the use of bounce buffers which reduced network I/O performance significantly. This change will honor the existing order of the pages with the exception of some boundary conditions. To do this we need to update both the Linux p2m table and the Xen m2p table. Particular care must be taken when updating the p2m table since it's important to limit table memory consumption and reuse the existing leaf pages which get freed when an entire leaf page is set to the identity map. To implement this, mapping updates are grouped into blocks with table entries getting cached temporarily and then released. On my test system before: Total pages: 2105014 Total contiguous: 1640635 After: Total pages: 2105014 Total contiguous: 2098904 Applied to devel/for-linus-3.17 Unfortunately, this produces too many WARNINGs on some boxes or with certain configurations. Hi David, Do you have more information about the systems or configurations that showed a problem? --msw -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Xen-devel] [PATCH v2 0/4] xen-blk: bug fixes
On Thu, Feb 06, 2014 at 11:20:04AM -0500, Konrad Rzeszutek Wilk wrote: > On Wed, Feb 05, 2014 at 08:57:30PM -0800, Matt Wilson wrote: > > On Tue, Feb 04, 2014 at 11:15:01AM -0400, Konrad Rzeszutek Wilk wrote: > > > On Tue, Feb 04, 2014 at 11:26:11AM +0100, Roger Pau Monne wrote: > > > > This series contain blkback bug fixes for memory leaks (patches 1 and > > > > 2) and a race (patch 3). Patch 4 removes blkif_request_segment_aligned > > > > since its memory layout is exactly the same as blkif_request_segment > > > > and should introduce no functional change. > > > > > > > > All patches should be backported to stable branches, although the last > > > > one is not a functional change it will still be nice to have it for > > > > code correctness. > > > > > > Matt and Matt, could you guys kindly take a look as well? Thank you! > > > > Matt R. did some testing today and set up additional tests to run > > overnight. He'll follow up after the overnight tests complete. > > Thank you! Just in case the various mailing list software ate Matt's messages, he sent the following: [PATCH v2 2/4] xen-blkback: fix memory leaks Tested-by: Matt Rushton Reviewed-by: Matt Rushton [PATCH v2 3/4] xen-blkback: fix shutdown race Tested-by: Matt Rushton Reviewed-by: Matt Rushton [PATCH v2 4/4] xen-blkif: drop struct blkif_request_segment_aligned Tested-by: Matt Rushton I've separately sent suggestions to Matt on how to setup his mailer to format messages per list etiquette and how to avoid breaking message threading. --msw -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Xen-devel] [PATCH v2 0/4] xen-blk: bug fixes
On Thu, Feb 06, 2014 at 11:20:04AM -0500, Konrad Rzeszutek Wilk wrote: On Wed, Feb 05, 2014 at 08:57:30PM -0800, Matt Wilson wrote: On Tue, Feb 04, 2014 at 11:15:01AM -0400, Konrad Rzeszutek Wilk wrote: On Tue, Feb 04, 2014 at 11:26:11AM +0100, Roger Pau Monne wrote: This series contain blkback bug fixes for memory leaks (patches 1 and 2) and a race (patch 3). Patch 4 removes blkif_request_segment_aligned since its memory layout is exactly the same as blkif_request_segment and should introduce no functional change. All patches should be backported to stable branches, although the last one is not a functional change it will still be nice to have it for code correctness. Matt and Matt, could you guys kindly take a look as well? Thank you! Matt R. did some testing today and set up additional tests to run overnight. He'll follow up after the overnight tests complete. Thank you! Just in case the various mailing list software ate Matt's messages, he sent the following: [PATCH v2 2/4] xen-blkback: fix memory leaks Tested-by: Matt Rushton mrush...@amazon.com Reviewed-by: Matt Rushton mrush...@amazon.com [PATCH v2 3/4] xen-blkback: fix shutdown race Tested-by: Matt Rushton mrush...@amazon.com Reviewed-by: Matt Rushton mrush...@amazon.com [PATCH v2 4/4] xen-blkif: drop struct blkif_request_segment_aligned Tested-by: Matt Rushton mrush...@amazon.com I've separately sent suggestions to Matt on how to setup his mailer to format messages per list etiquette and how to avoid breaking message threading. --msw -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Xen-devel] [PATCH v2 0/4] xen-blk: bug fixes
On Tue, Feb 04, 2014 at 11:15:01AM -0400, Konrad Rzeszutek Wilk wrote: > On Tue, Feb 04, 2014 at 11:26:11AM +0100, Roger Pau Monne wrote: > > This series contain blkback bug fixes for memory leaks (patches 1 and > > 2) and a race (patch 3). Patch 4 removes blkif_request_segment_aligned > > since its memory layout is exactly the same as blkif_request_segment > > and should introduce no functional change. > > > > All patches should be backported to stable branches, although the last > > one is not a functional change it will still be nice to have it for > > code correctness. > > Matt and Matt, could you guys kindly take a look as well? Thank you! Matt R. did some testing today and set up additional tests to run overnight. He'll follow up after the overnight tests complete. --msw -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Xen-devel] [PATCH v2 0/4] xen-blk: bug fixes
On Tue, Feb 04, 2014 at 11:15:01AM -0400, Konrad Rzeszutek Wilk wrote: On Tue, Feb 04, 2014 at 11:26:11AM +0100, Roger Pau Monne wrote: This series contain blkback bug fixes for memory leaks (patches 1 and 2) and a race (patch 3). Patch 4 removes blkif_request_segment_aligned since its memory layout is exactly the same as blkif_request_segment and should introduce no functional change. All patches should be backported to stable branches, although the last one is not a functional change it will still be nice to have it for code correctness. Matt and Matt, could you guys kindly take a look as well? Thank you! Matt R. did some testing today and set up additional tests to run overnight. He'll follow up after the overnight tests complete. --msw -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7] xen/grant-table: Avoid m2p_override during mapping
On Mon, Feb 03, 2014 at 01:24:58PM +, Zoltan Kiss wrote: > The grant mapping API does m2p_override unnecessarily: only gntdev needs it, > for blkback and future netback patches it just cause a lock contention, as > those pages never go to userspace. Therefore this series does the following: > - the original functions were renamed to __gnttab_[un]map_refs, with a new > parameter m2p_override > - based on m2p_override either they follow the original behaviour, or just set > the private flag and call set_phys_to_machine > - gnttab_[un]map_refs are now a wrapper to call __gnttab_[un]map_refs with > m2p_override false > - a new function gnttab_[un]map_refs_userspace provides the old behaviour > > It also removes a stray space from page.h and change ret to 0 if > XENFEAT_auto_translated_physmap, as that is the only possible return value > there. > > v2: > - move the storing of the old mfn in page->index to gnttab_map_refs > - move the function header update to a separate patch > > v3: > - a new approach to retain old behaviour where it needed > - squash the patches into one > > v4: > - move out the common bits from m2p* functions, and pass pfn/mfn as parameter > - clear page->private before doing anything with the page, so > m2p_find_override > won't race with this > > v5: > - change return value handling in __gnttab_[un]map_refs > - remove a stray space in page.h > - add detail why ret = 0 now at some places > > v6: > - don't pass pfn to m2p* functions, just get it locally > > v7: > - the previous version broke build on ARM, as there is no need for those p2m > changes. I've put them into arch specific functions, which are stubs on arm > > Signed-off-by: Zoltan Kiss > Suggested-by: David Vrabel You're still forgetting that this was originally proposed by Anthony Liguori . https://lkml.kernel.org/r/1384307336-5328-1-git-send-email-anth...@codemonkey.ws --msw -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Xen-devel] xen-blkback: bug fixes
On Tue, Jan 28, 2014 at 03:38:37PM -0400, Konrad Rzeszutek Wilk wrote: > On Tue, Jan 28, 2014 at 06:43:32PM +0100, Roger Pau Monne wrote: > > blkback bug fixes for memory leaks (patches 1 and 2) and a race > > (patch 3). > > They all look OK to me. I've stuck them in my 'stable/for-jens-3.14' > branch and are testing them now (hadn't pushed it yet). > > Matt and Matt, > > Could you take a look at the other two patches as well? Sure, though somehow you didn't address your message to us, so I didn't see it until today. Matt Rushton did some review and testing on an earlier version that came out fine. We'll give the final series a test since there was still a bit of rework. --msw > David, Boris, > > Are you OK with pushing those patches out to Jens Axboe if nobody > gives an NACK by Friday? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Xen-devel] xen-blkback: bug fixes
On Tue, Jan 28, 2014 at 03:38:37PM -0400, Konrad Rzeszutek Wilk wrote: On Tue, Jan 28, 2014 at 06:43:32PM +0100, Roger Pau Monne wrote: blkback bug fixes for memory leaks (patches 1 and 2) and a race (patch 3). They all look OK to me. I've stuck them in my 'stable/for-jens-3.14' branch and are testing them now (hadn't pushed it yet). Matt and Matt, Could you take a look at the other two patches as well? Sure, though somehow you didn't address your message to us, so I didn't see it until today. Matt Rushton did some review and testing on an earlier version that came out fine. We'll give the final series a test since there was still a bit of rework. --msw David, Boris, Are you OK with pushing those patches out to Jens Axboe if nobody gives an NACK by Friday? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7] xen/grant-table: Avoid m2p_override during mapping
On Mon, Feb 03, 2014 at 01:24:58PM +, Zoltan Kiss wrote: The grant mapping API does m2p_override unnecessarily: only gntdev needs it, for blkback and future netback patches it just cause a lock contention, as those pages never go to userspace. Therefore this series does the following: - the original functions were renamed to __gnttab_[un]map_refs, with a new parameter m2p_override - based on m2p_override either they follow the original behaviour, or just set the private flag and call set_phys_to_machine - gnttab_[un]map_refs are now a wrapper to call __gnttab_[un]map_refs with m2p_override false - a new function gnttab_[un]map_refs_userspace provides the old behaviour It also removes a stray space from page.h and change ret to 0 if XENFEAT_auto_translated_physmap, as that is the only possible return value there. v2: - move the storing of the old mfn in page-index to gnttab_map_refs - move the function header update to a separate patch v3: - a new approach to retain old behaviour where it needed - squash the patches into one v4: - move out the common bits from m2p* functions, and pass pfn/mfn as parameter - clear page-private before doing anything with the page, so m2p_find_override won't race with this v5: - change return value handling in __gnttab_[un]map_refs - remove a stray space in page.h - add detail why ret = 0 now at some places v6: - don't pass pfn to m2p* functions, just get it locally v7: - the previous version broke build on ARM, as there is no need for those p2m changes. I've put them into arch specific functions, which are stubs on arm Signed-off-by: Zoltan Kiss zoltan.k...@citrix.com Suggested-by: David Vrabel david.vra...@citrix.com You're still forgetting that this was originally proposed by Anthony Liguori aligu...@amazon.com. https://lkml.kernel.org/r/1384307336-5328-1-git-send-email-anth...@codemonkey.ws --msw -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] xen-blkback: fix memory leaks
On Mon, Jan 27, 2014 at 05:19:41PM +0100, Roger Pau Monné wrote: > On 27/01/14 17:09, Konrad Rzeszutek Wilk wrote: > > On Mon, Jan 27, 2014 at 11:13:41AM +0100, Roger Pau Monne wrote: > >> I've at least identified two possible memory leaks in blkback, both > >> related to the shutdown path of a VBD: > >> > >> - We don't wait for any pending purge work to finish before cleaning > >> the list of free_pages. The purge work will call put_free_pages and > >> thus we might end up with pages being added to the free_pages list > >> after we have emptied it. > >> - We don't wait for pending requests to end before cleaning persistent > >> grants and the list of free_pages. Again this can add pages to the > >> free_pages lists or persistent grants to the persistent_gnts > >> red-black tree. > >> > >> Also, add some checks in xen_blkif_free to make sure we are cleaning > >> everything. > >> > >> Signed-off-by: Roger Pau Monné > >> Cc: Konrad Rzeszutek Wilk > >> Cc: David Vrabel > >> Cc: Boris Ostrovsky > >> Cc: Matt Rushton > >> Cc: Matt Wilson > >> Cc: Ian Campbell > >> --- > >> This should be applied after the patch: > >> > >> xen-blkback: fix memory leak when persistent grants are used > > > > Could you respin the series with the issues below fixed and > > have said patch as part of the series. That way not only does > > it have your SoB on it but it makes it easier to apply the patch > > for lazy^H^H^Hbusy maintainers and makes it clear that you had > > tested both of them. > > > > Also, please CC Jens Axboe on these patches. > > Ack, will do once the comments below are sorted out. We'll keep an eye out for the resubmitted series and do some testing. Thanks for taking this the extra step. --msw > >> >From Matt Rushton & Matt Wilson and backported to stable. > >> > >> I've been able to create and destroy ~4000 guests while doing heavy IO > >> operations with this patch on a 512M Dom0 without problems. > >> --- > >> drivers/block/xen-blkback/blkback.c | 29 +++-- > >> drivers/block/xen-blkback/xenbus.c |9 + > >> 2 files changed, 28 insertions(+), 10 deletions(-) > >> > >> diff --git a/drivers/block/xen-blkback/blkback.c > >> b/drivers/block/xen-blkback/blkback.c > >> index 30ef7b3..19925b7 100644 > >> --- a/drivers/block/xen-blkback/blkback.c > >> +++ b/drivers/block/xen-blkback/blkback.c > >> @@ -169,6 +169,7 @@ static int dispatch_rw_block_io(struct xen_blkif > >> *blkif, > >>struct pending_req *pending_req); > >> static void make_response(struct xen_blkif *blkif, u64 id, > >> unsigned short op, int st); > >> +static void xen_blk_drain_io(struct xen_blkif *blkif, bool force); > >> > >> #define foreach_grant_safe(pos, n, rbtree, node) \ > >>for ((pos) = container_of(rb_first((rbtree)), typeof(*(pos)), node), \ > >> @@ -625,6 +626,12 @@ purge_gnt_list: > >>print_stats(blkif); > >>} > >> > >> + /* Drain pending IO */ > >> + xen_blk_drain_io(blkif, true); > >> + > >> + /* Drain pending purge work */ > >> + flush_work(>persistent_purge_work); > >> + > >>/* Free all persistent grant pages */ > >>if (!RB_EMPTY_ROOT(>persistent_gnts)) > >>free_persistent_gnts(blkif, >persistent_gnts, > >> @@ -930,7 +937,7 @@ static int dispatch_other_io(struct xen_blkif *blkif, > >>return -EIO; > >> } > >> > >> -static void xen_blk_drain_io(struct xen_blkif *blkif) > >> +static void xen_blk_drain_io(struct xen_blkif *blkif, bool force) > >> { > >>atomic_set(>drain, 1); > >>do { > >> @@ -943,7 +950,7 @@ static void xen_blk_drain_io(struct xen_blkif *blkif) > >> > >>if (!atomic_read(>drain)) > >>break; > >> - } while (!kthread_should_stop()); > >> + } while (!kthread_should_stop() || force); > >>atomic_set(>drain, 0); > >> } > >> > >> @@ -976,17 +983,19 @@ static void __end_block_io_op(struct pending_req > >> *pending_req, int error) > >> * the proper response on the ring. > >> */ > >>if (atomic_dec_and_test(_req->pendcnt)) { > >> - xen
Re: [PATCH] xen-blkback: fix memory leaks
On Mon, Jan 27, 2014 at 05:19:41PM +0100, Roger Pau Monné wrote: On 27/01/14 17:09, Konrad Rzeszutek Wilk wrote: On Mon, Jan 27, 2014 at 11:13:41AM +0100, Roger Pau Monne wrote: I've at least identified two possible memory leaks in blkback, both related to the shutdown path of a VBD: - We don't wait for any pending purge work to finish before cleaning the list of free_pages. The purge work will call put_free_pages and thus we might end up with pages being added to the free_pages list after we have emptied it. - We don't wait for pending requests to end before cleaning persistent grants and the list of free_pages. Again this can add pages to the free_pages lists or persistent grants to the persistent_gnts red-black tree. Also, add some checks in xen_blkif_free to make sure we are cleaning everything. Signed-off-by: Roger Pau Monné roger@citrix.com Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com Cc: David Vrabel david.vra...@citrix.com Cc: Boris Ostrovsky boris.ostrov...@oracle.com Cc: Matt Rushton mrush...@amazon.com Cc: Matt Wilson m...@amazon.com Cc: Ian Campbell ian.campb...@citrix.com --- This should be applied after the patch: xen-blkback: fix memory leak when persistent grants are used Could you respin the series with the issues below fixed and have said patch as part of the series. That way not only does it have your SoB on it but it makes it easier to apply the patch for lazy^H^H^Hbusy maintainers and makes it clear that you had tested both of them. Also, please CC Jens Axboe on these patches. Ack, will do once the comments below are sorted out. We'll keep an eye out for the resubmitted series and do some testing. Thanks for taking this the extra step. --msw From Matt Rushton Matt Wilson and backported to stable. I've been able to create and destroy ~4000 guests while doing heavy IO operations with this patch on a 512M Dom0 without problems. --- drivers/block/xen-blkback/blkback.c | 29 +++-- drivers/block/xen-blkback/xenbus.c |9 + 2 files changed, 28 insertions(+), 10 deletions(-) diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c index 30ef7b3..19925b7 100644 --- a/drivers/block/xen-blkback/blkback.c +++ b/drivers/block/xen-blkback/blkback.c @@ -169,6 +169,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif, struct pending_req *pending_req); static void make_response(struct xen_blkif *blkif, u64 id, unsigned short op, int st); +static void xen_blk_drain_io(struct xen_blkif *blkif, bool force); #define foreach_grant_safe(pos, n, rbtree, node) \ for ((pos) = container_of(rb_first((rbtree)), typeof(*(pos)), node), \ @@ -625,6 +626,12 @@ purge_gnt_list: print_stats(blkif); } + /* Drain pending IO */ + xen_blk_drain_io(blkif, true); + + /* Drain pending purge work */ + flush_work(blkif-persistent_purge_work); + /* Free all persistent grant pages */ if (!RB_EMPTY_ROOT(blkif-persistent_gnts)) free_persistent_gnts(blkif, blkif-persistent_gnts, @@ -930,7 +937,7 @@ static int dispatch_other_io(struct xen_blkif *blkif, return -EIO; } -static void xen_blk_drain_io(struct xen_blkif *blkif) +static void xen_blk_drain_io(struct xen_blkif *blkif, bool force) { atomic_set(blkif-drain, 1); do { @@ -943,7 +950,7 @@ static void xen_blk_drain_io(struct xen_blkif *blkif) if (!atomic_read(blkif-drain)) break; - } while (!kthread_should_stop()); + } while (!kthread_should_stop() || force); atomic_set(blkif-drain, 0); } @@ -976,17 +983,19 @@ static void __end_block_io_op(struct pending_req *pending_req, int error) * the proper response on the ring. */ if (atomic_dec_and_test(pending_req-pendcnt)) { - xen_blkbk_unmap(pending_req-blkif, + struct xen_blkif *blkif = pending_req-blkif; + + xen_blkbk_unmap(blkif, pending_req-segments, pending_req-nr_pages); - make_response(pending_req-blkif, pending_req-id, + make_response(blkif, pending_req-id, pending_req-operation, pending_req-status); - xen_blkif_put(pending_req-blkif); - if (atomic_read(pending_req-blkif-refcnt) = 2) { - if (atomic_read(pending_req-blkif-drain)) - complete(pending_req-blkif-drain_complete); + free_req(blkif, pending_req); + xen_blkif_put(blkif); + if (atomic_read(blkif-refcnt) = 2) { + if (atomic_read(blkif-drain)) + complete(blkif-drain_complete); } - free_req(pending_req-blkif
Re: [Xen-devel] [PATCH] xen-blkback: fix memory leak when persistent grants are used
On Fri, Jan 24, 2014 at 03:36:22PM +, Ian Campbell wrote: > On Fri, 2014-01-24 at 09:21 +, Ian Campbell wrote: > > On Thu, 2014-01-23 at 11:28 -0800, Matt Wilson wrote: > > > From: Matt Rushton > > > > > > Currently shrink_free_pagepool() is called before the pages used for > > > persistent grants are released via free_persistent_gnts(). This > > > results in a memory leak when a VBD that uses persistent grants is > > > torn down. > > > > This may well be the explanation for the memory leak I was observing on > > ARM last night. I'll give it a go and let you know. > > Results are a bit inconclusive unfortunately, it seems like I am seeing > some other leak too (or instead). > > Totally unscientifically it does seem to be leaking more slowly than > before, so perhaps this patch has helped, but nothing conclusive I'm > afraid. Testing here looks good. I don't know if perhaps something else is going on with ARM... > I don't think that quite qualifies for a Tested-by though, sorry. How about an Acked-by? ;-) --msw > Ian. > > > > > > Cc: Konrad Rzeszutek Wilk > > > Cc: "Roger Pau Monné" > > > Cc: Ian Campbell > > > Cc: David Vrabel > > > Cc: linux-kernel@vger.kernel.org > > > Cc: xen-de...@lists.xen.org > > > Cc: Anthony Liguori > > > Signed-off-by: Matt Rushton > > > Signed-off-by: Matt Wilson > > > --- > > > drivers/block/xen-blkback/blkback.c |6 +++--- > > > 1 file changed, 3 insertions(+), 3 deletions(-) > > > > > > diff --git a/drivers/block/xen-blkback/blkback.c > > > b/drivers/block/xen-blkback/blkback.c > > > index 6620b73..30ef7b3 100644 > > > --- a/drivers/block/xen-blkback/blkback.c > > > +++ b/drivers/block/xen-blkback/blkback.c > > > @@ -625,9 +625,6 @@ purge_gnt_list: > > > print_stats(blkif); > > > } > > > > > > - /* Since we are shutting down remove all pages from the buffer */ > > > - shrink_free_pagepool(blkif, 0 /* All */); > > > - > > > /* Free all persistent grant pages */ > > > if (!RB_EMPTY_ROOT(>persistent_gnts)) > > > free_persistent_gnts(blkif, >persistent_gnts, > > > @@ -636,6 +633,9 @@ purge_gnt_list: > > > BUG_ON(!RB_EMPTY_ROOT(>persistent_gnts)); > > > blkif->persistent_gnt_c = 0; > > > > > > + /* Since we are shutting down remove all pages from the buffer */ > > > + shrink_free_pagepool(blkif, 0 /* All */); > > > + > > > if (log_stats) > > > print_stats(blkif); > > > > > > > > > > > ___ > > Xen-devel mailing list > > xen-de...@lists.xen.org > > http://lists.xen.org/xen-devel > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Xen-devel] [PATCH] xen-blkback: fix memory leak when persistent grants are used
On Fri, Jan 24, 2014 at 03:36:22PM +, Ian Campbell wrote: On Fri, 2014-01-24 at 09:21 +, Ian Campbell wrote: On Thu, 2014-01-23 at 11:28 -0800, Matt Wilson wrote: From: Matt Rushton mrush...@amazon.com Currently shrink_free_pagepool() is called before the pages used for persistent grants are released via free_persistent_gnts(). This results in a memory leak when a VBD that uses persistent grants is torn down. This may well be the explanation for the memory leak I was observing on ARM last night. I'll give it a go and let you know. Results are a bit inconclusive unfortunately, it seems like I am seeing some other leak too (or instead). Totally unscientifically it does seem to be leaking more slowly than before, so perhaps this patch has helped, but nothing conclusive I'm afraid. Testing here looks good. I don't know if perhaps something else is going on with ARM... I don't think that quite qualifies for a Tested-by though, sorry. How about an Acked-by? ;-) --msw Ian. Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com Cc: Roger Pau Monné roger@citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: David Vrabel david.vra...@citrix.com Cc: linux-kernel@vger.kernel.org Cc: xen-de...@lists.xen.org Cc: Anthony Liguori aligu...@amazon.com Signed-off-by: Matt Rushton mrush...@amazon.com Signed-off-by: Matt Wilson m...@amazon.com --- drivers/block/xen-blkback/blkback.c |6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c index 6620b73..30ef7b3 100644 --- a/drivers/block/xen-blkback/blkback.c +++ b/drivers/block/xen-blkback/blkback.c @@ -625,9 +625,6 @@ purge_gnt_list: print_stats(blkif); } - /* Since we are shutting down remove all pages from the buffer */ - shrink_free_pagepool(blkif, 0 /* All */); - /* Free all persistent grant pages */ if (!RB_EMPTY_ROOT(blkif-persistent_gnts)) free_persistent_gnts(blkif, blkif-persistent_gnts, @@ -636,6 +633,9 @@ purge_gnt_list: BUG_ON(!RB_EMPTY_ROOT(blkif-persistent_gnts)); blkif-persistent_gnt_c = 0; + /* Since we are shutting down remove all pages from the buffer */ + shrink_free_pagepool(blkif, 0 /* All */); + if (log_stats) print_stats(blkif); ___ Xen-devel mailing list xen-de...@lists.xen.org http://lists.xen.org/xen-devel -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v6] xen/grant-table: Avoid m2p_override during mapping
On Thu, Jan 23, 2014 at 09:23:44PM +, Zoltan Kiss wrote: > The grant mapping API does m2p_override unnecessarily: only gntdev needs it, > for blkback and future netback patches it just cause a lock contention, as > those pages never go to userspace. Therefore this series does the following: > - the original functions were renamed to __gnttab_[un]map_refs, with a new > parameter m2p_override > - based on m2p_override either they follow the original behaviour, or just set > the private flag and call set_phys_to_machine > - gnttab_[un]map_refs are now a wrapper to call __gnttab_[un]map_refs with > m2p_override false > - a new function gnttab_[un]map_refs_userspace provides the old behaviour > > It also removes a stray space from page.h and change ret to 0 if > XENFEAT_auto_translated_physmap, as that is the only possible return value > there. > > v2: > - move the storing of the old mfn in page->index to gnttab_map_refs > - move the function header update to a separate patch > > v3: > - a new approach to retain old behaviour where it needed > - squash the patches into one > > v4: > - move out the common bits from m2p* functions, and pass pfn/mfn as parameter > - clear page->private before doing anything with the page, so > m2p_find_override > won't race with this > > v5: > - change return value handling in __gnttab_[un]map_refs > - remove a stray space in page.h > - add detail why ret = 0 now at some places > > v6: > - don't pass pfn to m2p* functions, just get it locally > > Signed-off-by: Zoltan Kiss > Suggested-by: David Vrabel Apologies for coming in late on this thread. I'm quite behind on xen-devel mail that isn't CC: to me. It seems to have been forgotten that Anthony and I proposed a similar change last November. https://lkml.kernel.org/r/1384307336-5328-1-git-send-email-anth...@codemonkey.ws Or am I misunderstanding the change? --msw -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] xen-blkback: fix memory leak when persistent grants are used
From: Matt Rushton Currently shrink_free_pagepool() is called before the pages used for persistent grants are released via free_persistent_gnts(). This results in a memory leak when a VBD that uses persistent grants is torn down. Cc: Konrad Rzeszutek Wilk Cc: "Roger Pau Monné" Cc: Ian Campbell Cc: David Vrabel Cc: linux-kernel@vger.kernel.org Cc: xen-de...@lists.xen.org Cc: Anthony Liguori Signed-off-by: Matt Rushton Signed-off-by: Matt Wilson --- drivers/block/xen-blkback/blkback.c |6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c index 6620b73..30ef7b3 100644 --- a/drivers/block/xen-blkback/blkback.c +++ b/drivers/block/xen-blkback/blkback.c @@ -625,9 +625,6 @@ purge_gnt_list: print_stats(blkif); } - /* Since we are shutting down remove all pages from the buffer */ - shrink_free_pagepool(blkif, 0 /* All */); - /* Free all persistent grant pages */ if (!RB_EMPTY_ROOT(>persistent_gnts)) free_persistent_gnts(blkif, >persistent_gnts, @@ -636,6 +633,9 @@ purge_gnt_list: BUG_ON(!RB_EMPTY_ROOT(>persistent_gnts)); blkif->persistent_gnt_c = 0; + /* Since we are shutting down remove all pages from the buffer */ + shrink_free_pagepool(blkif, 0 /* All */); + if (log_stats) print_stats(blkif); -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] xen-blkback: fix memory leak when persistent grants are used
From: Matt Rushton mrush...@amazon.com Currently shrink_free_pagepool() is called before the pages used for persistent grants are released via free_persistent_gnts(). This results in a memory leak when a VBD that uses persistent grants is torn down. Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com Cc: Roger Pau Monné roger@citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: David Vrabel david.vra...@citrix.com Cc: linux-kernel@vger.kernel.org Cc: xen-de...@lists.xen.org Cc: Anthony Liguori aligu...@amazon.com Signed-off-by: Matt Rushton mrush...@amazon.com Signed-off-by: Matt Wilson m...@amazon.com --- drivers/block/xen-blkback/blkback.c |6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c index 6620b73..30ef7b3 100644 --- a/drivers/block/xen-blkback/blkback.c +++ b/drivers/block/xen-blkback/blkback.c @@ -625,9 +625,6 @@ purge_gnt_list: print_stats(blkif); } - /* Since we are shutting down remove all pages from the buffer */ - shrink_free_pagepool(blkif, 0 /* All */); - /* Free all persistent grant pages */ if (!RB_EMPTY_ROOT(blkif-persistent_gnts)) free_persistent_gnts(blkif, blkif-persistent_gnts, @@ -636,6 +633,9 @@ purge_gnt_list: BUG_ON(!RB_EMPTY_ROOT(blkif-persistent_gnts)); blkif-persistent_gnt_c = 0; + /* Since we are shutting down remove all pages from the buffer */ + shrink_free_pagepool(blkif, 0 /* All */); + if (log_stats) print_stats(blkif); -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v6] xen/grant-table: Avoid m2p_override during mapping
On Thu, Jan 23, 2014 at 09:23:44PM +, Zoltan Kiss wrote: The grant mapping API does m2p_override unnecessarily: only gntdev needs it, for blkback and future netback patches it just cause a lock contention, as those pages never go to userspace. Therefore this series does the following: - the original functions were renamed to __gnttab_[un]map_refs, with a new parameter m2p_override - based on m2p_override either they follow the original behaviour, or just set the private flag and call set_phys_to_machine - gnttab_[un]map_refs are now a wrapper to call __gnttab_[un]map_refs with m2p_override false - a new function gnttab_[un]map_refs_userspace provides the old behaviour It also removes a stray space from page.h and change ret to 0 if XENFEAT_auto_translated_physmap, as that is the only possible return value there. v2: - move the storing of the old mfn in page-index to gnttab_map_refs - move the function header update to a separate patch v3: - a new approach to retain old behaviour where it needed - squash the patches into one v4: - move out the common bits from m2p* functions, and pass pfn/mfn as parameter - clear page-private before doing anything with the page, so m2p_find_override won't race with this v5: - change return value handling in __gnttab_[un]map_refs - remove a stray space in page.h - add detail why ret = 0 now at some places v6: - don't pass pfn to m2p* functions, just get it locally Signed-off-by: Zoltan Kiss zoltan.k...@citrix.com Suggested-by: David Vrabel david.vra...@citrix.com Apologies for coming in late on this thread. I'm quite behind on xen-devel mail that isn't CC: to me. It seems to have been forgotten that Anthony and I proposed a similar change last November. https://lkml.kernel.org/r/1384307336-5328-1-git-send-email-anth...@codemonkey.ws Or am I misunderstanding the change? --msw -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] xen/gnttab: leave lazy MMU mode in the case of a m2p override failure
From: Matt Wilson Commit f62805f1 introduced a bug where lazy MMU mode isn't exited if a m2p_add/remove_override call fails. Cc: Stefano Stabellini Cc: Konrad Rzeszutek Wilk Cc: Boris Ostrovsky Cc: David Vrabel Cc: Anthony Liguori Cc: xen-de...@lists.xenproject.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Matt Wilson --- drivers/xen/grant-table.c |6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c index 62ccf54..0283871 100644 --- a/drivers/xen/grant-table.c +++ b/drivers/xen/grant-table.c @@ -930,9 +930,10 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops, ret = m2p_add_override(mfn, pages[i], kmap_ops ? _ops[i] : NULL); if (ret) - return ret; + goto out; } + out: if (lazy) arch_leave_lazy_mmu_mode(); @@ -969,9 +970,10 @@ int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops, ret = m2p_remove_override(pages[i], kmap_ops ? _ops[i] : NULL); if (ret) - return ret; + goto out; } + out: if (lazy) arch_leave_lazy_mmu_mode(); -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] xen/gnttab: leave lazy MMU mode in the case of a m2p override failure
From: Matt Wilson m...@amazon.com Commit f62805f1 introduced a bug where lazy MMU mode isn't exited if a m2p_add/remove_override call fails. Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com Cc: Boris Ostrovsky boris.ostrov...@oracle.com Cc: David Vrabel david.vra...@citrix.com Cc: Anthony Liguori aligu...@amazon.com Cc: xen-de...@lists.xenproject.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Matt Wilson m...@amazon.com --- drivers/xen/grant-table.c |6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c index 62ccf54..0283871 100644 --- a/drivers/xen/grant-table.c +++ b/drivers/xen/grant-table.c @@ -930,9 +930,10 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops, ret = m2p_add_override(mfn, pages[i], kmap_ops ? kmap_ops[i] : NULL); if (ret) - return ret; + goto out; } + out: if (lazy) arch_leave_lazy_mmu_mode(); @@ -969,9 +970,10 @@ int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops, ret = m2p_remove_override(pages[i], kmap_ops ? kmap_ops[i] : NULL); if (ret) - return ret; + goto out; } + out: if (lazy) arch_leave_lazy_mmu_mode(); -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] xen-gnttab: do not add m2p override entries for blkback mappings
On Tue, Nov 12, 2013 at 05:48:56PM -0800, Anthony Liguori wrote: > From: Anthony Liguori > > Commit 5dc03639 switched blkback to also add m2p override entries > when mapping grant pages but history seems to have forgotten why > this is useful if it ever was. > > The blkback driver does not need m2p override entries to exist > and there is significant overhead due to the locking in the m2p > override table. We see about a 10% improvement in IOP rate with > this patch applied running FIO in the guest. > > See http://article.gmane.org/gmane.linux.kernel/1590932 for a full > analysis of current users. > > Signed-off-by: Anthony Liguori Reviewed-by: Matt Wilson One comment for discussion below. > --- > A backported version of this has been heavily tested but the testing > against the latest Linux tree is light so far. [...] > diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c > index c4d2298..081be8d 100644 > --- a/drivers/xen/grant-table.c > +++ b/drivers/xen/grant-table.c > @@ -881,7 +881,8 @@ EXPORT_SYMBOL_GPL(gnttab_batch_copy); > > int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops, > struct gnttab_map_grant_ref *kmap_ops, > - struct page **pages, unsigned int count) > + struct page **pages, unsigned int count, > + int override) > { > int i, ret; > bool lazy = false; > @@ -918,10 +919,29 @@ int gnttab_map_refs(struct gnttab_map_grant_ref > *map_ops, > } else { > mfn = PFN_DOWN(map_ops[i].dev_bus_addr); > } > - ret = m2p_add_override(mfn, pages[i], kmap_ops ? > -_ops[i] : NULL); > - if (ret) > - return ret; > + > + if (override) { > + ret = m2p_add_override(mfn, pages[i], kmap_ops ? > +_ops[i] : NULL); > + if (ret) > + return ret; The original code, which remains unmodified by this patch, is returning without calling arch_leave_lazy_mmu_mode() at the bottom of this function. > + } else { > + unsigned long pfn, old_mfn; > + > + pfn = page_to_pfn(pages[i]); > + old_mfn = pfn_to_mfn(pfn); > + > + /* Save previous MFN in page private*/ > + WARN_ON(PagePrivate(pages[i])); > + SetPagePrivate(pages[i]); > + set_page_private(pages[i], old_mfn); > + > + if (!set_phys_to_machine(pfn, FOREIGN_FRAME(mfn))) { > + ret = -ENOMEM; > + break; The new path doesn't repeat this error. I think that the override path needs to be changed to do the same in a separate patch. --msw > + } > + } > + > } > > if (lazy) arch_leave_lazy_mmu_mode(); return ret; } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] xen-gnttab: do not add m2p override entries for blkback mappings
On Tue, Nov 12, 2013 at 05:48:56PM -0800, Anthony Liguori wrote: From: Anthony Liguori aligu...@amazon.com Commit 5dc03639 switched blkback to also add m2p override entries when mapping grant pages but history seems to have forgotten why this is useful if it ever was. The blkback driver does not need m2p override entries to exist and there is significant overhead due to the locking in the m2p override table. We see about a 10% improvement in IOP rate with this patch applied running FIO in the guest. See http://article.gmane.org/gmane.linux.kernel/1590932 for a full analysis of current users. Signed-off-by: Anthony Liguori aligu...@amazon.com Reviewed-by: Matt Wilson m...@amazon.com One comment for discussion below. --- A backported version of this has been heavily tested but the testing against the latest Linux tree is light so far. [...] diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c index c4d2298..081be8d 100644 --- a/drivers/xen/grant-table.c +++ b/drivers/xen/grant-table.c @@ -881,7 +881,8 @@ EXPORT_SYMBOL_GPL(gnttab_batch_copy); int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops, struct gnttab_map_grant_ref *kmap_ops, - struct page **pages, unsigned int count) + struct page **pages, unsigned int count, + int override) { int i, ret; bool lazy = false; @@ -918,10 +919,29 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops, } else { mfn = PFN_DOWN(map_ops[i].dev_bus_addr); } - ret = m2p_add_override(mfn, pages[i], kmap_ops ? -kmap_ops[i] : NULL); - if (ret) - return ret; + + if (override) { + ret = m2p_add_override(mfn, pages[i], kmap_ops ? +kmap_ops[i] : NULL); + if (ret) + return ret; The original code, which remains unmodified by this patch, is returning without calling arch_leave_lazy_mmu_mode() at the bottom of this function. + } else { + unsigned long pfn, old_mfn; + + pfn = page_to_pfn(pages[i]); + old_mfn = pfn_to_mfn(pfn); + + /* Save previous MFN in page private*/ + WARN_ON(PagePrivate(pages[i])); + SetPagePrivate(pages[i]); + set_page_private(pages[i], old_mfn); + + if (!set_phys_to_machine(pfn, FOREIGN_FRAME(mfn))) { + ret = -ENOMEM; + break; The new path doesn't repeat this error. I think that the override path needs to be changed to do the same in a separate patch. --msw + } + } + } if (lazy) arch_leave_lazy_mmu_mode(); return ret; } -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Xen-devel] [PATCH] grant-table: don't set m2p override if kmap_ops is not set
On Wed, Nov 06, 2013 at 11:34:27AM +, David Vrabel wrote: [...] > > Matt, Anthony, I presume you have profiling results or performance data > that support this proposed change? Can you provide them? I've measured 10-20% performance improvement in configurations where: 1) dom0 has a moderate number of vCPUs doing blkback work 2) domU has 32 vCPUs 3) 24 configured VBDs without persistent grant support 4) some lock contention in grant table hypercalls has been alleviated More specific results are still in the works. > > It's perfectly fine to store a foreign pfn in the m2p table. The m2p > > override table is used by the grant device to allow a reverse lookup of > > the real mfn to a pfn even if it's foreign. > > > > blkback doesn't actually need this though. This was introduced in: > > > > commit 5dc03639cc903f887931831d69895facb5260f4b > > Author: Konrad Rzeszutek Wilk > > Date: Tue Mar 1 16:46:45 2011 -0500 > > > > xen/blkback: Utilize the M2P override mechanism for GNTMAP_host_map > > > > Purely as an optimization. In practice though due to lock contention it > > slows things down. > > The full changeset description for this change doesn't make sense to me. > > xen/blkback: Utilize the M2P override mechanism for GNTMAP_host_map > > Instead of doing copy grants lets do mapping grants using > the M2P(and P2M) override mechanism. > > As all it is doing is replacing set_phys_to_machine() calls with > m2p_add_override(). Indeed, since this had nothing to do with copying. We were confused also. Konrad? --msw -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Xen-devel] [PATCH] grant-table: don't set m2p override if kmap_ops is not set
On Wed, Nov 06, 2013 at 11:34:27AM +, David Vrabel wrote: [...] Matt, Anthony, I presume you have profiling results or performance data that support this proposed change? Can you provide them? I've measured 10-20% performance improvement in configurations where: 1) dom0 has a moderate number of vCPUs doing blkback work 2) domU has 32 vCPUs 3) 24 configured VBDs without persistent grant support 4) some lock contention in grant table hypercalls has been alleviated More specific results are still in the works. It's perfectly fine to store a foreign pfn in the m2p table. The m2p override table is used by the grant device to allow a reverse lookup of the real mfn to a pfn even if it's foreign. blkback doesn't actually need this though. This was introduced in: commit 5dc03639cc903f887931831d69895facb5260f4b Author: Konrad Rzeszutek Wilk konrad.w...@oracle.com Date: Tue Mar 1 16:46:45 2011 -0500 xen/blkback: Utilize the M2P override mechanism for GNTMAP_host_map Purely as an optimization. In practice though due to lock contention it slows things down. The full changeset description for this change doesn't make sense to me. xen/blkback: Utilize the M2P override mechanism for GNTMAP_host_map Instead of doing copy grants lets do mapping grants using the M2P(and P2M) override mechanism. As all it is doing is replacing set_phys_to_machine() calls with m2p_add_override(). Indeed, since this had nothing to do with copying. We were confused also. Konrad? --msw -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] (xen) stable/for-jens-3.13
On Tue, Nov 05, 2013 at 02:52:29PM -0800, Matt Wilson wrote: > On Tue, Nov 05, 2013 at 11:59:08AM -0500, Konrad Rzeszutek Wilk wrote: > > > > Hey Jens, > > > > Please git pull the following branch: > > > > git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git > > stable/for-jens-3.13 > > > > > > which has a fix to the Xen block frontend driver. For the backends that > > don't support persistent grants, we don't want to use the logic that would > > use copying (needed for persistent grants). That logic existed there > > initially > > but with the persistent grants code added in - it was removed (so in 3.10 > > time-frame). This adds part of it back. > > I just tested this patch as part of a backported series on top of a > 3.4.68 base. When this patch is included and a domU boots on a system > that does not support persistent grants the kernel panics on boot with > the trace below. Full boot log: http://paste.ubuntu.com/6367329/ > > It's possible we missed something in the backport series. I'll test > this patch on top of 3.11.4 to see how it fares. Hi Konrad, Looks like there's something amiss in the backport. Testing your git branch worked fine. Sorry for the false alarm. --msw -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] (xen) stable/for-jens-3.13
On Tue, Nov 05, 2013 at 11:59:08AM -0500, Konrad Rzeszutek Wilk wrote: > > Hey Jens, > > Please git pull the following branch: > > git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git > stable/for-jens-3.13 > > > which has a fix to the Xen block frontend driver. For the backends that > don't support persistent grants, we don't want to use the logic that would > use copying (needed for persistent grants). That logic existed there initially > but with the persistent grants code added in - it was removed (so in 3.10 > time-frame). This adds part of it back. I just tested this patch as part of a backported series on top of a 3.4.68 base. When this patch is included and a domU boots on a system that does not support persistent grants the kernel panics on boot with the trace below. Full boot log: http://paste.ubuntu.com/6367329/ It's possible we missed something in the backport series. I'll test this patch on top of 3.11.4 to see how it fares. general protection fault: [#1] SMP CPU 14 Modules linked in: aesni_intel cryptd aes_x86_64 aes_generic dm_mirror dm_region_hash dm_log dm_mod Pid: 2192, comm: blkid Not tainted 3.4.68-3.8.amzn1.x86_64 #1 Xen HVM domU RIP: 0010:[] [] get_grant+0x51/0x110 RSP: 0018:881d07b5d978 EFLAGS: 00010006 RAX: 025c68fd93b116ed RBX: 880e72fc6b50 RCX: RDX: 1e8134cf7b38b76c RSI: 01d1a05f RDI: 881d07b5da64 RBP: 881d07b5d998 R08: e200 R09: dead00100100 R10: dead00200200 R11: 8b9207578571 R12: 01d1a05f R13: 880e72efc000 R14: 881d07b5da64 R15: 881d1822b0b0 FS: 7fe197073760() GS:881d5bcc() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7fe196c4a3b0 CR3: 001d079c6000 CR4: 000407e0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Process blkid (pid: 2192, threadinfo 881d07b5c000, task 881d07ae1690) Stack: 0246 880e72963400 881d1822b0b0 881d07b5da98 812ddf60 0001 2000 00ff881d07b5d9e8 881d07b5dfd8 881d1822b000 880e72d560b0 Call Trace: [] do_blkif_request+0x410/0x7c0 [] ? submit_bh+0xf7/0x120 [] queue_unplugged+0x56/0xf0 [] blk_flush_plug_list+0x1c3/0x220 [] blk_finish_plug+0x18/0x50 [] __do_page_cache_readahead+0x1c1/0x250 [] force_page_cache_readahead+0x71/0xa0 [] page_cache_sync_readahead+0x43/0x50 [] generic_file_aio_read+0x4d8/0x760 [] ? do_last+0x470/0x8c0 [] blkdev_aio_read+0x58/0x80 [] do_sync_read+0xda/0x120 [] ? security_file_permission+0x93/0xb0 [] ? rw_verify_area+0x61/0xf0 [] vfs_read+0xb0/0x180 [] sys_read+0x4a/0x90 [] system_call_fastpath+0x16/0x1b Code: 48 8d 82 b0 12 00 00 48 39 c3 0f 84 ca 00 00 00 49 b9 00 01 10 00 00 00 ad de 48 8b 13 49 ba 00 02 20 00 00 00 ad de 48 8b 43 08 <48> 89 42 08 48 89 10 44 8b 5b f0 4c 89 0b 4c 89 53 08 45 85 db RIP [] get_grant+0x51/0x110 RSP ---[ end trace 3eba56aed2b98597 ]--- --msw > Please pull! > > P.S. > Sorry for the late pull! > > drivers/block/xen-blkfront.c | 125 > ++- > 1 file changed, 100 insertions(+), 25 deletions(-) > > Roger Pau Monne (1): > xen-blkfront: restore the non-persistent data path > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Xen-devel] [PATCH] grant-table: don't set m2p override if kmap_ops is not set
On Tue, Nov 05, 2013 at 02:09:53PM -0800, Anthony Liguori wrote: > On Tue, Nov 5, 2013 at 1:16 PM, Konrad Rzeszutek Wilk > wrote: > > On Tue, Nov 05, 2013 at 12:53:17PM -0800, Anthony Liguori wrote: > >> Matt Wilson writes: > >> > >> > On Tue, Nov 05, 2013 at 05:03:58PM +0100, Roger Pau Monné wrote: > >> >> On 05/11/13 16:08, Ian Campbell wrote: > >> >> > On Tue, 2013-11-05 at 16:01 +0100, Roger Pau Monné wrote: > >> >> >> On 05/11/13 15:56, Konrad Rzeszutek Wilk wrote: > >> >> >>> On Tue, Nov 05, 2013 at 03:47:08PM +0100, Roger Pau Monné wrote: > >> >> >>>> On 05/11/13 13:36, David Vrabel wrote: > >> >> >>>>> On 05/11/13 11:24, Roger Pau Monne wrote: > >> >> >>>>>> IMHO there's no reason to set a m2p override if the mapping is > >> >> >>>>>> done in > >> >> >>>>>> kernel space, so only set the m2p override when kmap_ops is set. > >> >> >>>>> > >> >> >>>>> Can you provide a more detailed reasoning about why this is safe? > >> >> >>>> > >> >> >>>> To tell the truth, I don't understand why we need to use the m2p > >> >> >>>> override for kernel space only mappings, my understanding is that > >> >> >>>> this > >> >> >>>> m2p override is needed for user space mappings only (where we > >> >> >>>> actually > >> >> >>>> end up doing two mappings, one in kernel space and one in user > >> >> >>>> space). > >> >> >>>> For kernel space I don't see why we need to do anything else than > >> >> >>>> setting the right p2m translation. > >> >> >>> > >> >> >>> We needed the m2p when doing DMA operations. As the driver would > >> >> >>> want the bus address (so p2m) and then when unmapping the DMA we > >> >> >>> only get the bus address - so we needed to do a m2p lookup. > >> >> >> > >> >> >> OK, we need a m2p (that we already have in machine_to_phys_mapping), > >> >> >> what I don't understand is why we need the m2p override. > >> >> > > >> >> > The m2p is a host global table. > >> >> > > >> >> > For a foreign page grant mapped into the current domain the m2p will > >> >> > give you the foreign (owner) domain's p from the m, not the local one. > >> >> > >> >> Yes, you are completely right, then I have to figure out why blkback > >> >> works fine with this patch applied (or at least it seems to work fine). > >> > > >> > blkback also works for me when testing a similar patch. I'm still > >> > confused. One thing with your proposed patch: I'm not sure that you're > >> > putting back the correct mfn. > >> > >> It's perfectly fine to store a foreign pfn in the m2p table. The m2p > >> override table is used by the grant device to allow a reverse lookup of > >> the real mfn to a pfn even if it's foreign. > >> > >> blkback doesn't actually need this though. This was introduced in: > >> > >> commit 5dc03639cc903f887931831d69895facb5260f4b > >> Author: Konrad Rzeszutek Wilk > >> Date: Tue Mar 1 16:46:45 2011 -0500 > >> > >> xen/blkback: Utilize the M2P override mechanism for GNTMAP_host_map > >> > >> Purely as an optimization. In practice though due to lock contention it > >> slows things down. > >> > >> I think an alternative would be to use a read/write lock instead of just > >> a spinlock since it's the read path that is the most hot. > > > > The m2p hash table can also be expanded to lower the contention. > > That was the first thing I tried :-) > > The issue isn't lookup cost as much as the number of acquisitions and > the cost of adding/removing entries. > > >> I haven't tested that yet though. > > > > Looking forward to your patches :-) > > I'm really just being thorough in suggesting the rwlock. I actually > think that not using the m2p override table is the right long term > fix. I tried a rwlock a while back. It didn't help much because we actually have a problem with concurrent adds/removes as much as lookups. --msw -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Xen-devel] [PATCH] grant-table: don't set m2p override if kmap_ops is not set
On Tue, Nov 05, 2013 at 05:03:58PM +0100, Roger Pau Monné wrote: > On 05/11/13 16:08, Ian Campbell wrote: > > On Tue, 2013-11-05 at 16:01 +0100, Roger Pau Monné wrote: > >> On 05/11/13 15:56, Konrad Rzeszutek Wilk wrote: > >>> On Tue, Nov 05, 2013 at 03:47:08PM +0100, Roger Pau Monné wrote: > On 05/11/13 13:36, David Vrabel wrote: > > On 05/11/13 11:24, Roger Pau Monne wrote: > >> IMHO there's no reason to set a m2p override if the mapping is done in > >> kernel space, so only set the m2p override when kmap_ops is set. > > > > Can you provide a more detailed reasoning about why this is safe? > > To tell the truth, I don't understand why we need to use the m2p > override for kernel space only mappings, my understanding is that this > m2p override is needed for user space mappings only (where we actually > end up doing two mappings, one in kernel space and one in user space). > For kernel space I don't see why we need to do anything else than > setting the right p2m translation. > >>> > >>> We needed the m2p when doing DMA operations. As the driver would > >>> want the bus address (so p2m) and then when unmapping the DMA we > >>> only get the bus address - so we needed to do a m2p lookup. > >> > >> OK, we need a m2p (that we already have in machine_to_phys_mapping), > >> what I don't understand is why we need the m2p override. > > > > The m2p is a host global table. > > > > For a foreign page grant mapped into the current domain the m2p will > > give you the foreign (owner) domain's p from the m, not the local one. > > Yes, you are completely right, then I have to figure out why blkback > works fine with this patch applied (or at least it seems to work fine). blkback also works for me when testing a similar patch. I'm still confused. One thing with your proposed patch: I'm not sure that you're putting back the correct mfn. Adding Anthony to the thread. --msw -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Xen-devel] [PATCH] grant-table: don't set m2p override if kmap_ops is not set
On Tue, Nov 05, 2013 at 05:03:58PM +0100, Roger Pau Monné wrote: On 05/11/13 16:08, Ian Campbell wrote: On Tue, 2013-11-05 at 16:01 +0100, Roger Pau Monné wrote: On 05/11/13 15:56, Konrad Rzeszutek Wilk wrote: On Tue, Nov 05, 2013 at 03:47:08PM +0100, Roger Pau Monné wrote: On 05/11/13 13:36, David Vrabel wrote: On 05/11/13 11:24, Roger Pau Monne wrote: IMHO there's no reason to set a m2p override if the mapping is done in kernel space, so only set the m2p override when kmap_ops is set. Can you provide a more detailed reasoning about why this is safe? To tell the truth, I don't understand why we need to use the m2p override for kernel space only mappings, my understanding is that this m2p override is needed for user space mappings only (where we actually end up doing two mappings, one in kernel space and one in user space). For kernel space I don't see why we need to do anything else than setting the right p2m translation. We needed the m2p when doing DMA operations. As the driver would want the bus address (so p2m) and then when unmapping the DMA we only get the bus address - so we needed to do a m2p lookup. OK, we need a m2p (that we already have in machine_to_phys_mapping), what I don't understand is why we need the m2p override. The m2p is a host global table. For a foreign page grant mapped into the current domain the m2p will give you the foreign (owner) domain's p from the m, not the local one. Yes, you are completely right, then I have to figure out why blkback works fine with this patch applied (or at least it seems to work fine). blkback also works for me when testing a similar patch. I'm still confused. One thing with your proposed patch: I'm not sure that you're putting back the correct mfn. Adding Anthony to the thread. --msw -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Xen-devel] [PATCH] grant-table: don't set m2p override if kmap_ops is not set
On Tue, Nov 05, 2013 at 02:09:53PM -0800, Anthony Liguori wrote: On Tue, Nov 5, 2013 at 1:16 PM, Konrad Rzeszutek Wilk konrad.w...@oracle.com wrote: On Tue, Nov 05, 2013 at 12:53:17PM -0800, Anthony Liguori wrote: Matt Wilson m...@linux.com writes: On Tue, Nov 05, 2013 at 05:03:58PM +0100, Roger Pau Monné wrote: On 05/11/13 16:08, Ian Campbell wrote: On Tue, 2013-11-05 at 16:01 +0100, Roger Pau Monné wrote: On 05/11/13 15:56, Konrad Rzeszutek Wilk wrote: On Tue, Nov 05, 2013 at 03:47:08PM +0100, Roger Pau Monné wrote: On 05/11/13 13:36, David Vrabel wrote: On 05/11/13 11:24, Roger Pau Monne wrote: IMHO there's no reason to set a m2p override if the mapping is done in kernel space, so only set the m2p override when kmap_ops is set. Can you provide a more detailed reasoning about why this is safe? To tell the truth, I don't understand why we need to use the m2p override for kernel space only mappings, my understanding is that this m2p override is needed for user space mappings only (where we actually end up doing two mappings, one in kernel space and one in user space). For kernel space I don't see why we need to do anything else than setting the right p2m translation. We needed the m2p when doing DMA operations. As the driver would want the bus address (so p2m) and then when unmapping the DMA we only get the bus address - so we needed to do a m2p lookup. OK, we need a m2p (that we already have in machine_to_phys_mapping), what I don't understand is why we need the m2p override. The m2p is a host global table. For a foreign page grant mapped into the current domain the m2p will give you the foreign (owner) domain's p from the m, not the local one. Yes, you are completely right, then I have to figure out why blkback works fine with this patch applied (or at least it seems to work fine). blkback also works for me when testing a similar patch. I'm still confused. One thing with your proposed patch: I'm not sure that you're putting back the correct mfn. It's perfectly fine to store a foreign pfn in the m2p table. The m2p override table is used by the grant device to allow a reverse lookup of the real mfn to a pfn even if it's foreign. blkback doesn't actually need this though. This was introduced in: commit 5dc03639cc903f887931831d69895facb5260f4b Author: Konrad Rzeszutek Wilk konrad.w...@oracle.com Date: Tue Mar 1 16:46:45 2011 -0500 xen/blkback: Utilize the M2P override mechanism for GNTMAP_host_map Purely as an optimization. In practice though due to lock contention it slows things down. I think an alternative would be to use a read/write lock instead of just a spinlock since it's the read path that is the most hot. The m2p hash table can also be expanded to lower the contention. That was the first thing I tried :-) The issue isn't lookup cost as much as the number of acquisitions and the cost of adding/removing entries. I haven't tested that yet though. Looking forward to your patches :-) I'm really just being thorough in suggesting the rwlock. I actually think that not using the m2p override table is the right long term fix. I tried a rwlock a while back. It didn't help much because we actually have a problem with concurrent adds/removes as much as lookups. --msw -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] (xen) stable/for-jens-3.13
On Tue, Nov 05, 2013 at 11:59:08AM -0500, Konrad Rzeszutek Wilk wrote: Hey Jens, Please git pull the following branch: git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/for-jens-3.13 which has a fix to the Xen block frontend driver. For the backends that don't support persistent grants, we don't want to use the logic that would use copying (needed for persistent grants). That logic existed there initially but with the persistent grants code added in - it was removed (so in 3.10 time-frame). This adds part of it back. I just tested this patch as part of a backported series on top of a 3.4.68 base. When this patch is included and a domU boots on a system that does not support persistent grants the kernel panics on boot with the trace below. Full boot log: http://paste.ubuntu.com/6367329/ It's possible we missed something in the backport series. I'll test this patch on top of 3.11.4 to see how it fares. general protection fault: [#1] SMP CPU 14 Modules linked in: aesni_intel cryptd aes_x86_64 aes_generic dm_mirror dm_region_hash dm_log dm_mod Pid: 2192, comm: blkid Not tainted 3.4.68-3.8.amzn1.x86_64 #1 Xen HVM domU RIP: 0010:[812dda71] [812dda71] get_grant+0x51/0x110 RSP: 0018:881d07b5d978 EFLAGS: 00010006 RAX: 025c68fd93b116ed RBX: 880e72fc6b50 RCX: RDX: 1e8134cf7b38b76c RSI: 01d1a05f RDI: 881d07b5da64 RBP: 881d07b5d998 R08: e200 R09: dead00100100 R10: dead00200200 R11: 8b9207578571 R12: 01d1a05f R13: 880e72efc000 R14: 881d07b5da64 R15: 881d1822b0b0 FS: 7fe197073760() GS:881d5bcc() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7fe196c4a3b0 CR3: 001d079c6000 CR4: 000407e0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Process blkid (pid: 2192, threadinfo 881d07b5c000, task 881d07ae1690) Stack: 0246 880e72963400 881d1822b0b0 881d07b5da98 812ddf60 0001 2000 00ff881d07b5d9e8 881d07b5dfd8 881d1822b000 880e72d560b0 Call Trace: [812ddf60] do_blkif_request+0x410/0x7c0 [811882b7] ? submit_bh+0xf7/0x120 [81212026] queue_unplugged+0x56/0xf0 [81216573] blk_flush_plug_list+0x1c3/0x220 [812165e8] blk_finish_plug+0x18/0x50 [8110ec11] __do_page_cache_readahead+0x1c1/0x250 [8110efb1] force_page_cache_readahead+0x71/0xa0 [8110f333] page_cache_sync_readahead+0x43/0x50 [81104f58] generic_file_aio_read+0x4d8/0x760 [811673e0] ? do_last+0x470/0x8c0 [8118ed98] blkdev_aio_read+0x58/0x80 [81158e1a] do_sync_read+0xda/0x120 [811e22f3] ? security_file_permission+0x93/0xb0 [811592a1] ? rw_verify_area+0x61/0xf0 [81159780] vfs_read+0xb0/0x180 [8115989a] sys_read+0x4a/0x90 [813f0979] system_call_fastpath+0x16/0x1b Code: 48 8d 82 b0 12 00 00 48 39 c3 0f 84 ca 00 00 00 49 b9 00 01 10 00 00 00 ad de 48 8b 13 49 ba 00 02 20 00 00 00 ad de 48 8b 43 08 48 89 42 08 48 89 10 44 8b 5b f0 4c 89 0b 4c 89 53 08 45 85 db RIP [812dda71] get_grant+0x51/0x110 RSP 881d07b5d978 ---[ end trace 3eba56aed2b98597 ]--- --msw Please pull! P.S. Sorry for the late pull! drivers/block/xen-blkfront.c | 125 ++- 1 file changed, 100 insertions(+), 25 deletions(-) Roger Pau Monne (1): xen-blkfront: restore the non-persistent data path -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] (xen) stable/for-jens-3.13
On Tue, Nov 05, 2013 at 02:52:29PM -0800, Matt Wilson wrote: On Tue, Nov 05, 2013 at 11:59:08AM -0500, Konrad Rzeszutek Wilk wrote: Hey Jens, Please git pull the following branch: git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/for-jens-3.13 which has a fix to the Xen block frontend driver. For the backends that don't support persistent grants, we don't want to use the logic that would use copying (needed for persistent grants). That logic existed there initially but with the persistent grants code added in - it was removed (so in 3.10 time-frame). This adds part of it back. I just tested this patch as part of a backported series on top of a 3.4.68 base. When this patch is included and a domU boots on a system that does not support persistent grants the kernel panics on boot with the trace below. Full boot log: http://paste.ubuntu.com/6367329/ It's possible we missed something in the backport series. I'll test this patch on top of 3.11.4 to see how it fares. Hi Konrad, Looks like there's something amiss in the backport. Testing your git branch worked fine. Sorry for the false alarm. --msw -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] hpet, allow user controlled mmap for user processes
On Fri, Mar 22, 2013 at 09:32:54AM -0400, Prarit Bhargava wrote: > The CONFIG_HPET_MMAP Kconfig option exposes the memory map of the HPET > registers to userspace. The Kconfig help points out that in some cases this > can be a security risk as some systems may erroneously configure the map such > that additional data is exposed to userspace. > > This is a problem for distributions -- some users want the MMAP functionality > but it comes with a significant security risk. In an effort to mitigate this > risk, and due to the low number of users of the MMAP functionality, I've > introduced a kernel parameter, hpet_mmap_enable, that is required in order > to actually have the HPET MMAP exposed. > > [v2]: Clemens suggested modifying the Kconfig help text and making the > default setting configurable. > [v3]: Fixed up Documentation and Kconfig entries, default now "Y" > [v4]: After testing, found that I need to modify CONFIG_HPET_MMAP_DEFAULT > usage > > Signed-off-by: Prarit Bhargava > Cc: Clemens Ladisch > --- > Documentation/kernel-parameters.txt |4 > drivers/char/Kconfig|9 +++-- > drivers/char/hpet.c | 25 +++-- > 3 files changed, 34 insertions(+), 4 deletions(-) It doesn't seem like this patch got picked up and seems like a good idea to me. Clemens, what do you think? Acked-by: Matt Wilson --msw -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] hpet, allow user controlled mmap for user processes
On Fri, Mar 22, 2013 at 09:32:54AM -0400, Prarit Bhargava wrote: The CONFIG_HPET_MMAP Kconfig option exposes the memory map of the HPET registers to userspace. The Kconfig help points out that in some cases this can be a security risk as some systems may erroneously configure the map such that additional data is exposed to userspace. This is a problem for distributions -- some users want the MMAP functionality but it comes with a significant security risk. In an effort to mitigate this risk, and due to the low number of users of the MMAP functionality, I've introduced a kernel parameter, hpet_mmap_enable, that is required in order to actually have the HPET MMAP exposed. [v2]: Clemens suggested modifying the Kconfig help text and making the default setting configurable. [v3]: Fixed up Documentation and Kconfig entries, default now Y [v4]: After testing, found that I need to modify CONFIG_HPET_MMAP_DEFAULT usage Signed-off-by: Prarit Bhargava pra...@redhat.com Cc: Clemens Ladisch clem...@ladisch.de --- Documentation/kernel-parameters.txt |4 drivers/char/Kconfig|9 +++-- drivers/char/hpet.c | 25 +++-- 3 files changed, 34 insertions(+), 4 deletions(-) It doesn't seem like this patch got picked up and seems like a good idea to me. Clemens, what do you think? Acked-by: Matt Wilson m...@amazon.com --msw -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/3] xen-blkfront: revoke foreign access for grants not mapped by the backend
On Wed, Jul 31, 2013 at 05:00:44PM +0200, Roger Pau Monne wrote: > There's no need to keep the foreign access in a grant if it is not > persistently mapped by the backend. This allows us to free grants that > are not mapped by the backend, thus preventing blkfront from hoarding > all grants. > > The main effect of this is that blkfront will only persistently map > the same grants as the backend, and it will always try to use grants > that are already mapped by the backend. Also the number of persistent > grants in blkfront is the same as in blkback (and is controlled by the > value in blkback). > > Signed-off-by: Roger Pau Monné Acked-by: Matt Wilson > Cc: Konrad Rzeszutek Wilk > Cc: David Vrabel > --- > drivers/block/xen-blkfront.c | 33 + > 1 files changed, 29 insertions(+), 4 deletions(-) > > diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c > index 187a437..bc9fc7d 100644 > --- a/drivers/block/xen-blkfront.c > +++ b/drivers/block/xen-blkfront.c > @@ -1016,13 +1016,38 @@ static void blkif_completion(struct blk_shadow *s, > struct blkfront_info *info, > } > /* Add the persistent grant into the list of free grants */ > for (i = 0; i < nseg; i++) { > - list_add(>grants_used[i]->node, >persistent_gnts); > - info->persistent_gnts_c++; > + if (gnttab_query_foreign_access(s->grants_used[i]->gref)) { > + /* > + * If the grant is still mapped by the backend (the > + * backend has chosen to make this grant persistent) > + * we add it at the head of the list, so it will be > + * reused first. > + */ > + list_add(>grants_used[i]->node, > >persistent_gnts); > + info->persistent_gnts_c++; > + } else { > + /* > + * If the grant is not mapped by the backend we end the > + * foreign access and add it to the tail of the list, > + * so it will not be picked again unless we run out of > + * persistent grants. > + */ > + gnttab_end_foreign_access(s->grants_used[i]->gref, 0, > 0UL); > + s->grants_used[i]->gref = GRANT_INVALID_REF; > + list_add_tail(>grants_used[i]->node, > >persistent_gnts); > + } > } > if (s->req.operation == BLKIF_OP_INDIRECT) { > for (i = 0; i < INDIRECT_GREFS(nseg); i++) { > - list_add(>indirect_grants[i]->node, > >persistent_gnts); > - info->persistent_gnts_c++; > + if > (gnttab_query_foreign_access(s->indirect_grants[i]->gref)) { > + list_add(>indirect_grants[i]->node, > >persistent_gnts); > + info->persistent_gnts_c++; > + } else { > + > gnttab_end_foreign_access(s->indirect_grants[i]->gref, 0, 0UL); > + s->indirect_grants[i]->gref = GRANT_INVALID_REF; > + list_add_tail(>indirect_grants[i]->node, > + >persistent_gnts); > + } > } > } > } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/3] xen-blkfront: revoke foreign access for grants not mapped by the backend
On Wed, Jul 31, 2013 at 05:00:44PM +0200, Roger Pau Monne wrote: There's no need to keep the foreign access in a grant if it is not persistently mapped by the backend. This allows us to free grants that are not mapped by the backend, thus preventing blkfront from hoarding all grants. The main effect of this is that blkfront will only persistently map the same grants as the backend, and it will always try to use grants that are already mapped by the backend. Also the number of persistent grants in blkfront is the same as in blkback (and is controlled by the value in blkback). Signed-off-by: Roger Pau Monné roger@citrix.com Acked-by: Matt Wilson m...@amazon.com Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com Cc: David Vrabel david.vra...@citrix.com --- drivers/block/xen-blkfront.c | 33 + 1 files changed, 29 insertions(+), 4 deletions(-) diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c index 187a437..bc9fc7d 100644 --- a/drivers/block/xen-blkfront.c +++ b/drivers/block/xen-blkfront.c @@ -1016,13 +1016,38 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info, } /* Add the persistent grant into the list of free grants */ for (i = 0; i nseg; i++) { - list_add(s-grants_used[i]-node, info-persistent_gnts); - info-persistent_gnts_c++; + if (gnttab_query_foreign_access(s-grants_used[i]-gref)) { + /* + * If the grant is still mapped by the backend (the + * backend has chosen to make this grant persistent) + * we add it at the head of the list, so it will be + * reused first. + */ + list_add(s-grants_used[i]-node, info-persistent_gnts); + info-persistent_gnts_c++; + } else { + /* + * If the grant is not mapped by the backend we end the + * foreign access and add it to the tail of the list, + * so it will not be picked again unless we run out of + * persistent grants. + */ + gnttab_end_foreign_access(s-grants_used[i]-gref, 0, 0UL); + s-grants_used[i]-gref = GRANT_INVALID_REF; + list_add_tail(s-grants_used[i]-node, info-persistent_gnts); + } } if (s-req.operation == BLKIF_OP_INDIRECT) { for (i = 0; i INDIRECT_GREFS(nseg); i++) { - list_add(s-indirect_grants[i]-node, info-persistent_gnts); - info-persistent_gnts_c++; + if (gnttab_query_foreign_access(s-indirect_grants[i]-gref)) { + list_add(s-indirect_grants[i]-node, info-persistent_gnts); + info-persistent_gnts_c++; + } else { + gnttab_end_foreign_access(s-indirect_grants[i]-gref, 0, 0UL); + s-indirect_grants[i]-gref = GRANT_INVALID_REF; + list_add_tail(s-indirect_grants[i]-node, + info-persistent_gnts); + } } } } -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] xen-gnt: prevent adding duplicate gnt callbacks
On Wed, Jul 31, 2013 at 05:00:42PM +0200, Roger Pau Monne wrote: > With the current implementation, the callback in the tail of the list > can be added twice, because the check done in > gnttab_request_free_callback is bogus, callback->next can be NULL if > it is the last callback in the list. If we add the same callback twice > we end up with an infinite loop, were callback == callback->next. > > Replace this check with a proper one that iterates over the list to > see if the callback has already been added. Acked-by: Matt Wilson > Signed-off-by: Roger Pau Monné > Cc: Konrad Rzeszutek Wilk > Cc: David Vrabel > --- > This patch should be backported to stable trees > --- > drivers/xen/grant-table.c | 13 +++-- > 1 files changed, 11 insertions(+), 2 deletions(-) > > diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c > index 04c1b2d..d5418c1 100644 > --- a/drivers/xen/grant-table.c > +++ b/drivers/xen/grant-table.c > @@ -729,9 +729,18 @@ void gnttab_request_free_callback(struct > gnttab_free_callback *callback, > void (*fn)(void *), void *arg, u16 count) > { > unsigned long flags; > + struct gnttab_free_callback *cb; > + > spin_lock_irqsave(_list_lock, flags); > - if (callback->next) > - goto out; > + > + /* Check if the callback is already on the list */ > + cb = gnttab_free_callback_list; > + while (cb) { > + if (cb == callback) > + goto out; > + cb = cb->next; > + } > + > callback->fn = fn; > callback->arg = arg; > callback->count = count; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] xen-gnt: prevent adding duplicate gnt callbacks
On Wed, Jul 31, 2013 at 05:00:42PM +0200, Roger Pau Monne wrote: With the current implementation, the callback in the tail of the list can be added twice, because the check done in gnttab_request_free_callback is bogus, callback-next can be NULL if it is the last callback in the list. If we add the same callback twice we end up with an infinite loop, were callback == callback-next. Replace this check with a proper one that iterates over the list to see if the callback has already been added. Acked-by: Matt Wilson m...@amazon.com Signed-off-by: Roger Pau Monné roger@citrix.com Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com Cc: David Vrabel david.vra...@citrix.com --- This patch should be backported to stable trees --- drivers/xen/grant-table.c | 13 +++-- 1 files changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c index 04c1b2d..d5418c1 100644 --- a/drivers/xen/grant-table.c +++ b/drivers/xen/grant-table.c @@ -729,9 +729,18 @@ void gnttab_request_free_callback(struct gnttab_free_callback *callback, void (*fn)(void *), void *arg, u16 count) { unsigned long flags; + struct gnttab_free_callback *cb; + spin_lock_irqsave(gnttab_list_lock, flags); - if (callback-next) - goto out; + + /* Check if the callback is already on the list */ + cb = gnttab_free_callback_list; + while (cb) { + if (cb == callback) + goto out; + cb = cb-next; + } + callback-fn = fn; callback-arg = arg; callback-count = count; -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] kexec/kdump implementation for Xen PV domU
On Mon, Jul 29, 2013 at 07:15:43PM +0200, Daniel Kiper wrote: > Hi all, > > Here I am sending as attachments patches enabling kexec/kdump > support in Xen PV domU. Only x84_64 architecture is supported. > There is no support for i386 but some code could be easily reused. > Here is a description of patches: [...] > - kexec-kernel-only_20121203.patch: this patch fixes timer > issue on Amazon EC2 machines. Hi Daniel, Do you know the cause of this issue? Does it have something to do with singleshot timer migration when offlining/onlining SMP CPUs? --msw -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] kexec/kdump implementation for Xen PV domU
On Mon, Jul 29, 2013 at 07:15:43PM +0200, Daniel Kiper wrote: Hi all, Here I am sending as attachments patches enabling kexec/kdump support in Xen PV domU. Only x84_64 architecture is supported. There is no support for i386 but some code could be easily reused. Here is a description of patches: [...] - kexec-kernel-only_20121203.patch: this patch fixes timer issue on Amazon EC2 machines. Hi Daniel, Do you know the cause of this issue? Does it have something to do with singleshot timer migration when offlining/onlining SMP CPUs? --msw -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Xen-devel] is kexec on Xen domU possible?
On Mon, Jul 22, 2013 at 11:33:15AM -0700, Greg KH wrote: > On Mon, Jul 22, 2013 at 11:24:46AM -0700, H. Peter Anvin wrote: > > On 07/22/2013 10:20 AM, Eric W. Biederman wrote: > > >>> > > >>> Also, in any virtualized environment the hypervisor can do a better job > > >>> for things like kdump, simply because it can provide two things that are > > >>> otherwise hard to do: > > >>> > > >>> 1. a known-good system state; > > >>> 2. a known-clean kdump image. > > >>> > > >>> As such, I do encourage the virtualization people to (also) develop > > >>> hypervisor-*aware* solutions for these kinds of things. > > >> > > >> In general I agree but if you could not change hypervisor > > >> and/or dom0 (e.g. you are using cloud providers which are > > >> stick to old versions of Xen) then you have no choice. > > > > > > Which tends to be where kexec on panic comes in most cases. Getting > > > platform vendors to do something sane tends to be a multi-year political > > > effort of dubious worth while just solving the problem locally actually > > > gets the problem solved for those who care. > > > > > > > It should not be a "one or the other" issue. > > I don't care about kdump, I care about kexec on domU for people who are > running on cloud providers with old versions of Xen so that they can > control what kernel they can boot, when they want to boot it. If kdump > works as well, that's just a bonus, but it's down on the list of things > for me to be concerned about. Many Xen-based cloud providers provide a mechanism for users to boot the kernels they want. For example you can use PV-GRUB on EC2 instances to boot a kernel that is stored within an AMI. For more info: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/UserProvidedkernels.html --msw -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Xen-devel] is kexec on Xen domU possible?
On Mon, Jul 22, 2013 at 11:33:15AM -0700, Greg KH wrote: On Mon, Jul 22, 2013 at 11:24:46AM -0700, H. Peter Anvin wrote: On 07/22/2013 10:20 AM, Eric W. Biederman wrote: Also, in any virtualized environment the hypervisor can do a better job for things like kdump, simply because it can provide two things that are otherwise hard to do: 1. a known-good system state; 2. a known-clean kdump image. As such, I do encourage the virtualization people to (also) develop hypervisor-*aware* solutions for these kinds of things. In general I agree but if you could not change hypervisor and/or dom0 (e.g. you are using cloud providers which are stick to old versions of Xen) then you have no choice. Which tends to be where kexec on panic comes in most cases. Getting platform vendors to do something sane tends to be a multi-year political effort of dubious worth while just solving the problem locally actually gets the problem solved for those who care. It should not be a one or the other issue. I don't care about kdump, I care about kexec on domU for people who are running on cloud providers with old versions of Xen so that they can control what kernel they can boot, when they want to boot it. If kdump works as well, that's just a bonus, but it's down on the list of things for me to be concerned about. Many Xen-based cloud providers provide a mechanism for users to boot the kernels they want. For example you can use PV-GRUB on EC2 instances to boot a kernel that is stored within an AMI. For more info: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/UserProvidedkernels.html --msw -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Xen-devel] [PATCH] xen: remove unused Kconfig parameter
On Wed, Jul 10, 2013 at 12:34:58AM +0200, Sander Eikelenboom wrote: > > Tuesday, July 9, 2013, 5:05:54 PM, you wrote: > > > On Tue, Jul 09, 2013 at 10:48:40AM -0400, Konrad Rzeszutek Wilk wrote: > >> Then that should be discussed on grub2 to remove said check and modify > >> the code so that it can properly work without regression. > > > Actually, the kernel patch removing that symbol should be applied so > > that grub2 breaks faster. One can't possibly rely on kernel internals > > for anything, as it is insanely insane (yep, the tautology is on purpose > > :-)). > > How insanely insane is it to be able to determine whether a certain > compiled kernel binary supports a certain function ? > > Grub does this in it's update script to prevent adding a xen + > kernel combination that has no chance of booting when dom0 support > has not been configured in the kernel. That doesn't seem to be a > unreasonable thought. > > Grepping the accompanied config file in /boot for the xen dom0 > Kconfig parameter seems the best possible effort grub can do at the > moment. I think this can be improved, even with the situation today. > Especially since the Kconfig parameter naming doesn't change that > often. > > If you know a better way for grub to determine if a certain function > for a kernel binary is supported then please elaborate .. Certainly. Parse the ELF notes that are present in a dom0-capable Linux kernel binary itself. $ readelf -n vmlinux Notes at offset 0x0069be88 with length 0x017c: Owner Data size Description Xen 0x0006 Unknown note type: (0x0006) Xen 0x0004 Unknown note type: (0x0007) Xen 0x0008 Unknown note type: (0x0005) Xen 0x0008 Unknown note type: (0x0003) Xen 0x0008 NT_VERSION (version) Xen 0x0008 NT_ARCH (architecture) Xen 0x002a Unknown note type: (0x000a) Xen 0x0004 Unknown note type: (0x0009) Xen 0x0008 Unknown note type: (0x0008) Xen 0x0010 Unknown note type: (0x000d) Xen 0x0004 Unknown note type: (0x000e) Xen 0x0008 Unknown note type: (0x000c) Xen 0x0008 Unknown note type: (0x0004) GNU 0x0014 NT_GNU_BUILD_ID (unique build ID bitstring) See arch/x86/xen/xen-head.S. There's a new note type (XEN_ELFNOTE_SUPPORTED_FEATURES) that we can use to make dom0 support explicit. http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/domain_build.c;hb=HEAD#l415 --msw -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Xen-devel] [PATCH] xen: remove unused Kconfig parameter
On Wed, Jul 10, 2013 at 12:34:58AM +0200, Sander Eikelenboom wrote: Tuesday, July 9, 2013, 5:05:54 PM, you wrote: On Tue, Jul 09, 2013 at 10:48:40AM -0400, Konrad Rzeszutek Wilk wrote: Then that should be discussed on grub2 to remove said check and modify the code so that it can properly work without regression. Actually, the kernel patch removing that symbol should be applied so that grub2 breaks faster. One can't possibly rely on kernel internals for anything, as it is insanely insane (yep, the tautology is on purpose :-)). How insanely insane is it to be able to determine whether a certain compiled kernel binary supports a certain function ? Grub does this in it's update script to prevent adding a xen + kernel combination that has no chance of booting when dom0 support has not been configured in the kernel. That doesn't seem to be a unreasonable thought. Grepping the accompanied config file in /boot for the xen dom0 Kconfig parameter seems the best possible effort grub can do at the moment. I think this can be improved, even with the situation today. Especially since the Kconfig parameter naming doesn't change that often. If you know a better way for grub to determine if a certain function for a kernel binary is supported then please elaborate .. Certainly. Parse the ELF notes that are present in a dom0-capable Linux kernel binary itself. $ readelf -n vmlinux Notes at offset 0x0069be88 with length 0x017c: Owner Data size Description Xen 0x0006 Unknown note type: (0x0006) Xen 0x0004 Unknown note type: (0x0007) Xen 0x0008 Unknown note type: (0x0005) Xen 0x0008 Unknown note type: (0x0003) Xen 0x0008 NT_VERSION (version) Xen 0x0008 NT_ARCH (architecture) Xen 0x002a Unknown note type: (0x000a) Xen 0x0004 Unknown note type: (0x0009) Xen 0x0008 Unknown note type: (0x0008) Xen 0x0010 Unknown note type: (0x000d) Xen 0x0004 Unknown note type: (0x000e) Xen 0x0008 Unknown note type: (0x000c) Xen 0x0008 Unknown note type: (0x0004) GNU 0x0014 NT_GNU_BUILD_ID (unique build ID bitstring) See arch/x86/xen/xen-head.S. There's a new note type (XEN_ELFNOTE_SUPPORTED_FEATURES) that we can use to make dom0 support explicit. http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/domain_build.c;hb=HEAD#l415 --msw -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/