Re: [PATCH v2] serial: 8250_pci: Add Amazon PCI serial device ID

2017-11-18 Thread Matt Wilson
On Mon, Nov 13, 2017 at 11:31:31AM -0800, Matt Wilson wrote:
> From: Matt Wilson <m...@amazon.com>
> 
> This device will be used in future Amazon EC2 instances as the primary
> serial port (i.e., data sent to this port will be available via the
> GetConsoleOuput [1] EC2 API).

Ping?

--msw

> [1] 
> http://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_GetConsoleOutput.html
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Matt Wilson <m...@amazon.com>
> ---
>  drivers/tty/serial/8250/8250_pci.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/tty/serial/8250/8250_pci.c 
> b/drivers/tty/serial/8250/8250_pci.c
> index 0c101a7..d4e7be8 100644
> --- a/drivers/tty/serial/8250/8250_pci.c
> +++ b/drivers/tty/serial/8250/8250_pci.c
> @@ -5137,6 +5137,9 @@ static const struct pci_device_id serial_pci_tbl[] = {
>   { PCI_DEVICE(0x1601, 0x0800), .driver_data = pbn_b0_4_125 },
>   { PCI_DEVICE(0x1601, 0xa801), .driver_data = pbn_b0_4_125 },
>  
> + /* Amazon PCI serial device */
> + { PCI_DEVICE(0x1d0f, 0x8250), .driver_data = pbn_b0_1_115200 },
> +
>   /*
>* These entries match devices with class COMMUNICATION_SERIAL,
>* COMMUNICATION_MODEM or COMMUNICATION_MULTISERIAL


Re: [PATCH v2] serial: 8250_pci: Add Amazon PCI serial device ID

2017-11-18 Thread Matt Wilson
On Mon, Nov 13, 2017 at 11:31:31AM -0800, Matt Wilson wrote:
> From: Matt Wilson 
> 
> This device will be used in future Amazon EC2 instances as the primary
> serial port (i.e., data sent to this port will be available via the
> GetConsoleOuput [1] EC2 API).

Ping?

--msw

> [1] 
> http://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_GetConsoleOutput.html
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Matt Wilson 
> ---
>  drivers/tty/serial/8250/8250_pci.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/tty/serial/8250/8250_pci.c 
> b/drivers/tty/serial/8250/8250_pci.c
> index 0c101a7..d4e7be8 100644
> --- a/drivers/tty/serial/8250/8250_pci.c
> +++ b/drivers/tty/serial/8250/8250_pci.c
> @@ -5137,6 +5137,9 @@ static const struct pci_device_id serial_pci_tbl[] = {
>   { PCI_DEVICE(0x1601, 0x0800), .driver_data = pbn_b0_4_125 },
>   { PCI_DEVICE(0x1601, 0xa801), .driver_data = pbn_b0_4_125 },
>  
> + /* Amazon PCI serial device */
> + { PCI_DEVICE(0x1d0f, 0x8250), .driver_data = pbn_b0_1_115200 },
> +
>   /*
>* These entries match devices with class COMMUNICATION_SERIAL,
>* COMMUNICATION_MODEM or COMMUNICATION_MULTISERIAL


[PATCH v2] serial: 8250_pci: Add Amazon PCI serial device ID

2017-11-13 Thread Matt Wilson
From: Matt Wilson <m...@amazon.com>

This device will be used in future Amazon EC2 instances as the primary
serial port (i.e., data sent to this port will be available via the
GetConsoleOuput [1] EC2 API).

[1] 
http://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_GetConsoleOutput.html

Cc: sta...@vger.kernel.org
Signed-off-by: Matt Wilson <m...@amazon.com>
---
 drivers/tty/serial/8250/8250_pci.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/tty/serial/8250/8250_pci.c 
b/drivers/tty/serial/8250/8250_pci.c
index 0c101a7..d4e7be8 100644
--- a/drivers/tty/serial/8250/8250_pci.c
+++ b/drivers/tty/serial/8250/8250_pci.c
@@ -5137,6 +5137,9 @@ static const struct pci_device_id serial_pci_tbl[] = {
{ PCI_DEVICE(0x1601, 0x0800), .driver_data = pbn_b0_4_125 },
{ PCI_DEVICE(0x1601, 0xa801), .driver_data = pbn_b0_4_125 },
 
+   /* Amazon PCI serial device */
+   { PCI_DEVICE(0x1d0f, 0x8250), .driver_data = pbn_b0_1_115200 },
+
/*
 * These entries match devices with class COMMUNICATION_SERIAL,
 * COMMUNICATION_MODEM or COMMUNICATION_MULTISERIAL
-- 
2.7.4



[PATCH v2] serial: 8250_pci: Add Amazon PCI serial device ID

2017-11-13 Thread Matt Wilson
From: Matt Wilson 

This device will be used in future Amazon EC2 instances as the primary
serial port (i.e., data sent to this port will be available via the
GetConsoleOuput [1] EC2 API).

[1] 
http://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_GetConsoleOutput.html

Cc: sta...@vger.kernel.org
Signed-off-by: Matt Wilson 
---
 drivers/tty/serial/8250/8250_pci.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/tty/serial/8250/8250_pci.c 
b/drivers/tty/serial/8250/8250_pci.c
index 0c101a7..d4e7be8 100644
--- a/drivers/tty/serial/8250/8250_pci.c
+++ b/drivers/tty/serial/8250/8250_pci.c
@@ -5137,6 +5137,9 @@ static const struct pci_device_id serial_pci_tbl[] = {
{ PCI_DEVICE(0x1601, 0x0800), .driver_data = pbn_b0_4_125 },
{ PCI_DEVICE(0x1601, 0xa801), .driver_data = pbn_b0_4_125 },
 
+   /* Amazon PCI serial device */
+   { PCI_DEVICE(0x1d0f, 0x8250), .driver_data = pbn_b0_1_115200 },
+
/*
 * These entries match devices with class COMMUNICATION_SERIAL,
 * COMMUNICATION_MODEM or COMMUNICATION_MULTISERIAL
-- 
2.7.4



[PATCH] serial: 8250_pci: Add Amazon PCI serial device ID

2017-11-12 Thread Matt Wilson
From: Matt Wilson <m...@amazon.com>

Cc: sta...@vger.kernel.org
Signed-off-by: Matt Wilson <m...@amazon.com>
---
 drivers/tty/serial/8250/8250_pci.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/tty/serial/8250/8250_pci.c 
b/drivers/tty/serial/8250/8250_pci.c
index 0c101a7..d4e7be8 100644
--- a/drivers/tty/serial/8250/8250_pci.c
+++ b/drivers/tty/serial/8250/8250_pci.c
@@ -5137,6 +5137,9 @@ static const struct pci_device_id serial_pci_tbl[] = {
{ PCI_DEVICE(0x1601, 0x0800), .driver_data = pbn_b0_4_125 },
{ PCI_DEVICE(0x1601, 0xa801), .driver_data = pbn_b0_4_125 },
 
+   /* Amazon PCI serial device */
+   { PCI_DEVICE(0x1d0f, 0x8250), .driver_data = pbn_b0_1_115200 },
+
/*
 * These entries match devices with class COMMUNICATION_SERIAL,
 * COMMUNICATION_MODEM or COMMUNICATION_MULTISERIAL
-- 
2.7.4



[PATCH] serial: 8250_pci: Add Amazon PCI serial device ID

2017-11-12 Thread Matt Wilson
From: Matt Wilson 

Cc: sta...@vger.kernel.org
Signed-off-by: Matt Wilson 
---
 drivers/tty/serial/8250/8250_pci.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/tty/serial/8250/8250_pci.c 
b/drivers/tty/serial/8250/8250_pci.c
index 0c101a7..d4e7be8 100644
--- a/drivers/tty/serial/8250/8250_pci.c
+++ b/drivers/tty/serial/8250/8250_pci.c
@@ -5137,6 +5137,9 @@ static const struct pci_device_id serial_pci_tbl[] = {
{ PCI_DEVICE(0x1601, 0x0800), .driver_data = pbn_b0_4_125 },
{ PCI_DEVICE(0x1601, 0xa801), .driver_data = pbn_b0_4_125 },
 
+   /* Amazon PCI serial device */
+   { PCI_DEVICE(0x1d0f, 0x8250), .driver_data = pbn_b0_1_115200 },
+
/*
 * These entries match devices with class COMMUNICATION_SERIAL,
 * COMMUNICATION_MODEM or COMMUNICATION_MULTISERIAL
-- 
2.7.4



Re: [PATCH V2 net 13/20] net/ena: change driver's default timeouts

2016-12-04 Thread Matt Wilson
On Sun, Dec 04, 2016 at 03:19:31PM +0200, Netanel Belgazal wrote:

... because? (they turned out to be too aggressive, I believe.)

> Signed-off-by: Netanel Belgazal 
> ---
>  drivers/net/ethernet/amazon/ena/ena_com.c| 4 ++--
>  drivers/net/ethernet/amazon/ena/ena_netdev.h | 7 ---
>  2 files changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c 
> b/drivers/net/ethernet/amazon/ena/ena_com.c
> index 4147d6e..a550c8a 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_com.c
> +++ b/drivers/net/ethernet/amazon/ena/ena_com.c
> @@ -36,9 +36,9 @@
>  
> /*/
>  
>  /* Timeout in micro-sec */
> -#define ADMIN_CMD_TIMEOUT_US (100)
> +#define ADMIN_CMD_TIMEOUT_US (300)
>  
> -#define ENA_ASYNC_QUEUE_DEPTH 4
> +#define ENA_ASYNC_QUEUE_DEPTH 16

Why is this changed at the same time?

>  #define ENA_ADMIN_QUEUE_DEPTH 32
>  
>  #define MIN_ENA_VER (((ENA_COMMON_SPEC_VERSION_MAJOR) << \
> diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h 
> b/drivers/net/ethernet/amazon/ena/ena_netdev.h
> index c081fd3..ed42e07 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_netdev.h
> +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h
> @@ -39,6 +39,7 @@
>  #include 
>  #include 
>  #include 
> +#include 

This change seems unrelated.

>  #include "ena_com.h"
>  #include "ena_eth_com.h"
> @@ -100,7 +101,7 @@
>  /* Number of queues to check for missing queues per timer service */
>  #define ENA_MONITORED_TX_QUEUES  4
>  /* Max timeout packets before device reset */
> -#define MAX_NUM_OF_TIMEOUTED_PACKETS 32
> +#define MAX_NUM_OF_TIMEOUTED_PACKETS 128
>  
>  #define ENA_TX_RING_IDX_NEXT(idx, ring_size) (((idx) + 1) & ((ring_size) - 
> 1))
>  
> @@ -116,9 +117,9 @@
>  #define ENA_IO_IRQ_IDX(q)(ENA_IO_IRQ_FIRST_IDX + (q))
>  
>  /* ENA device should send keep alive msg every 1 sec.
> - * We wait for 3 sec just to be on the safe side.
> + * We wait for 6 sec just to be on the safe side.
>   */
> -#define ENA_DEVICE_KALIVE_TIMEOUT(3 * HZ)
> +#define ENA_DEVICE_KALIVE_TIMEOUT(6 * HZ)
>  
>  #define ENA_MMIO_DISABLE_REG_READBIT(0)
>  


Re: [PATCH V2 net 13/20] net/ena: change driver's default timeouts

2016-12-04 Thread Matt Wilson
On Sun, Dec 04, 2016 at 03:19:31PM +0200, Netanel Belgazal wrote:

... because? (they turned out to be too aggressive, I believe.)

> Signed-off-by: Netanel Belgazal 
> ---
>  drivers/net/ethernet/amazon/ena/ena_com.c| 4 ++--
>  drivers/net/ethernet/amazon/ena/ena_netdev.h | 7 ---
>  2 files changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c 
> b/drivers/net/ethernet/amazon/ena/ena_com.c
> index 4147d6e..a550c8a 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_com.c
> +++ b/drivers/net/ethernet/amazon/ena/ena_com.c
> @@ -36,9 +36,9 @@
>  
> /*/
>  
>  /* Timeout in micro-sec */
> -#define ADMIN_CMD_TIMEOUT_US (100)
> +#define ADMIN_CMD_TIMEOUT_US (300)
>  
> -#define ENA_ASYNC_QUEUE_DEPTH 4
> +#define ENA_ASYNC_QUEUE_DEPTH 16

Why is this changed at the same time?

>  #define ENA_ADMIN_QUEUE_DEPTH 32
>  
>  #define MIN_ENA_VER (((ENA_COMMON_SPEC_VERSION_MAJOR) << \
> diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h 
> b/drivers/net/ethernet/amazon/ena/ena_netdev.h
> index c081fd3..ed42e07 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_netdev.h
> +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h
> @@ -39,6 +39,7 @@
>  #include 
>  #include 
>  #include 
> +#include 

This change seems unrelated.

>  #include "ena_com.h"
>  #include "ena_eth_com.h"
> @@ -100,7 +101,7 @@
>  /* Number of queues to check for missing queues per timer service */
>  #define ENA_MONITORED_TX_QUEUES  4
>  /* Max timeout packets before device reset */
> -#define MAX_NUM_OF_TIMEOUTED_PACKETS 32
> +#define MAX_NUM_OF_TIMEOUTED_PACKETS 128
>  
>  #define ENA_TX_RING_IDX_NEXT(idx, ring_size) (((idx) + 1) & ((ring_size) - 
> 1))
>  
> @@ -116,9 +117,9 @@
>  #define ENA_IO_IRQ_IDX(q)(ENA_IO_IRQ_FIRST_IDX + (q))
>  
>  /* ENA device should send keep alive msg every 1 sec.
> - * We wait for 3 sec just to be on the safe side.
> + * We wait for 6 sec just to be on the safe side.
>   */
> -#define ENA_DEVICE_KALIVE_TIMEOUT(3 * HZ)
> +#define ENA_DEVICE_KALIVE_TIMEOUT(6 * HZ)
>  
>  #define ENA_MMIO_DISABLE_REG_READBIT(0)
>  


Re: [PATCH V2 net 08/20] net/ena: add hardware hints capability to the driver

2016-12-04 Thread Matt Wilson
On Sun, Dec 04, 2016 at 03:19:26PM +0200, Netanel Belgazal wrote:
> The ENA device can update the ena driver about the desire timeouts.
> The hardware hints are transmitted as Asynchronous event to the driver.

This is really a new feature, not a bugfix - correct? If it is a new
feature, submit it separately. If the built-in defaults need to be
changed, submit that as a bugfix.

--msw

> In case the device does not support this capability, the driver
> will use its own defines.
> 
> Signed-off-by: Netanel Belgazal 
> ---
>  drivers/net/ethernet/amazon/ena/ena_admin_defs.h | 31 +
>  drivers/net/ethernet/amazon/ena/ena_com.c| 41 ---
>  drivers/net/ethernet/amazon/ena/ena_com.h|  5 ++
>  drivers/net/ethernet/amazon/ena/ena_ethtool.c|  1 -
>  drivers/net/ethernet/amazon/ena/ena_netdev.c | 86 
> +++-
>  drivers/net/ethernet/amazon/ena/ena_netdev.h | 19 +-
>  drivers/net/ethernet/amazon/ena/ena_regs_defs.h  |  2 +
>  7 files changed, 157 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h 
> b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
> index 6d70bf5..35ae511 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
> +++ b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
> @@ -70,6 +70,8 @@ enum ena_admin_aq_feature_id {
>  
>   ENA_ADMIN_MAX_QUEUES_NUM= 2,
>  
> + ENA_ADMIN_HW_HINTS  = 3,
> +
>   ENA_ADMIN_RSS_HASH_FUNCTION = 10,
>  
>   ENA_ADMIN_STATELESS_OFFLOAD_CONFIG  = 11,
> @@ -749,6 +751,31 @@ struct ena_admin_feature_rss_ind_table {
>   struct ena_admin_rss_ind_table_entry inline_entry;
>  };
>  
> +/* When hint value is 0, driver should use it's own predefined value */
> +struct ena_admin_ena_hw_hints {
> + /* value in ms */
> + u16 mmio_read_timeout;
> +
> + /* value in ms */
> + u16 driver_watchdog_timeout;
> +
> + /* Per packet tx completion timeout. value in ms */
> + u16 missing_tx_completion_timeout;
> +
> + u16 missed_tx_completion_count_threshold_to_reset;
> +
> + /* value in ms */
> + u16 admin_completion_tx_timeout;
> +
> + u16 netdev_wd_timeout;
> +
> + u16 max_tx_sgl_size;
> +
> + u16 max_rx_sgl_size;
> +
> + u16 reserved[8];
> +};
> +
>  struct ena_admin_get_feat_cmd {
>   struct ena_admin_aq_common_desc aq_common_descriptor;
>  
> @@ -782,6 +809,8 @@ struct ena_admin_get_feat_resp {
>   struct ena_admin_feature_rss_ind_table ind_table;
>  
>   struct ena_admin_feature_intr_moder_desc intr_moderation;
> +
> + struct ena_admin_ena_hw_hints hw_hints;
>   } u;
>  };
>  
> @@ -857,6 +886,8 @@ enum ena_admin_aenq_notification_syndrom {
>   ENA_ADMIN_SUSPEND   = 0,
>  
>   ENA_ADMIN_RESUME= 1,
> +
> + ENA_ADMIN_UPDATE_HINTS  = 2,
>  };
>  
>  struct ena_admin_aenq_entry {
> diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c 
> b/drivers/net/ethernet/amazon/ena/ena_com.c
> index 46aad3a..f1e4f04 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_com.c
> +++ b/drivers/net/ethernet/amazon/ena/ena_com.c
> @@ -508,15 +508,13 @@ static int ena_com_comp_status_to_errno(u8 comp_status)
>  static int ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx 
> *comp_ctx,
>struct ena_com_admin_queue 
> *admin_queue)
>  {
> - unsigned long flags;
> - u32 start_time;
> + unsigned long flags, timeout;
>   int ret;
>  
> - start_time = ((u32)jiffies_to_usecs(jiffies));
> + timeout = jiffies + usecs_to_jiffies(admin_queue->completion_timeout);
>  
>   while (comp_ctx->status == ENA_CMD_SUBMITTED) {
> - if u32)jiffies_to_usecs(jiffies)) - start_time) >
> - ADMIN_CMD_TIMEOUT_US) {
> + if (time_is_before_jiffies(timeout)) {
>   pr_err("Wait for completion (polling) timeout\n");
>   /* ENA didn't have any completion */
>   spin_lock_irqsave(_queue->q_lock, flags);
> @@ -560,7 +558,8 @@ static int 
> ena_com_wait_and_process_admin_cq_interrupts(struct ena_comp_ctx *com
>   int ret;
>  
>   wait_for_completion_timeout(_ctx->wait_event,
> - usecs_to_jiffies(ADMIN_CMD_TIMEOUT_US));
> + usecs_to_jiffies(
> + admin_queue->completion_timeout));
>  
>   /* In case the command wasn't completed find out the root cause.
>* There might be 2 kinds of errors
> @@ -600,12 +599,14 @@ static u32 ena_com_reg_bar_read32(struct ena_com_dev 
> *ena_dev, u16 offset)
>   struct ena_com_mmio_read *mmio_read = _dev->mmio_read;
>   volatile struct ena_admin_ena_mmio_req_read_less_resp *read_resp =
>   mmio_read->read_resp;
> - u32 mmio_read_reg, ret;
> + u32 

Re: [PATCH V2 net 08/20] net/ena: add hardware hints capability to the driver

2016-12-04 Thread Matt Wilson
On Sun, Dec 04, 2016 at 03:19:26PM +0200, Netanel Belgazal wrote:
> The ENA device can update the ena driver about the desire timeouts.
> The hardware hints are transmitted as Asynchronous event to the driver.

This is really a new feature, not a bugfix - correct? If it is a new
feature, submit it separately. If the built-in defaults need to be
changed, submit that as a bugfix.

--msw

> In case the device does not support this capability, the driver
> will use its own defines.
> 
> Signed-off-by: Netanel Belgazal 
> ---
>  drivers/net/ethernet/amazon/ena/ena_admin_defs.h | 31 +
>  drivers/net/ethernet/amazon/ena/ena_com.c| 41 ---
>  drivers/net/ethernet/amazon/ena/ena_com.h|  5 ++
>  drivers/net/ethernet/amazon/ena/ena_ethtool.c|  1 -
>  drivers/net/ethernet/amazon/ena/ena_netdev.c | 86 
> +++-
>  drivers/net/ethernet/amazon/ena/ena_netdev.h | 19 +-
>  drivers/net/ethernet/amazon/ena/ena_regs_defs.h  |  2 +
>  7 files changed, 157 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h 
> b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
> index 6d70bf5..35ae511 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
> +++ b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
> @@ -70,6 +70,8 @@ enum ena_admin_aq_feature_id {
>  
>   ENA_ADMIN_MAX_QUEUES_NUM= 2,
>  
> + ENA_ADMIN_HW_HINTS  = 3,
> +
>   ENA_ADMIN_RSS_HASH_FUNCTION = 10,
>  
>   ENA_ADMIN_STATELESS_OFFLOAD_CONFIG  = 11,
> @@ -749,6 +751,31 @@ struct ena_admin_feature_rss_ind_table {
>   struct ena_admin_rss_ind_table_entry inline_entry;
>  };
>  
> +/* When hint value is 0, driver should use it's own predefined value */
> +struct ena_admin_ena_hw_hints {
> + /* value in ms */
> + u16 mmio_read_timeout;
> +
> + /* value in ms */
> + u16 driver_watchdog_timeout;
> +
> + /* Per packet tx completion timeout. value in ms */
> + u16 missing_tx_completion_timeout;
> +
> + u16 missed_tx_completion_count_threshold_to_reset;
> +
> + /* value in ms */
> + u16 admin_completion_tx_timeout;
> +
> + u16 netdev_wd_timeout;
> +
> + u16 max_tx_sgl_size;
> +
> + u16 max_rx_sgl_size;
> +
> + u16 reserved[8];
> +};
> +
>  struct ena_admin_get_feat_cmd {
>   struct ena_admin_aq_common_desc aq_common_descriptor;
>  
> @@ -782,6 +809,8 @@ struct ena_admin_get_feat_resp {
>   struct ena_admin_feature_rss_ind_table ind_table;
>  
>   struct ena_admin_feature_intr_moder_desc intr_moderation;
> +
> + struct ena_admin_ena_hw_hints hw_hints;
>   } u;
>  };
>  
> @@ -857,6 +886,8 @@ enum ena_admin_aenq_notification_syndrom {
>   ENA_ADMIN_SUSPEND   = 0,
>  
>   ENA_ADMIN_RESUME= 1,
> +
> + ENA_ADMIN_UPDATE_HINTS  = 2,
>  };
>  
>  struct ena_admin_aenq_entry {
> diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c 
> b/drivers/net/ethernet/amazon/ena/ena_com.c
> index 46aad3a..f1e4f04 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_com.c
> +++ b/drivers/net/ethernet/amazon/ena/ena_com.c
> @@ -508,15 +508,13 @@ static int ena_com_comp_status_to_errno(u8 comp_status)
>  static int ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx 
> *comp_ctx,
>struct ena_com_admin_queue 
> *admin_queue)
>  {
> - unsigned long flags;
> - u32 start_time;
> + unsigned long flags, timeout;
>   int ret;
>  
> - start_time = ((u32)jiffies_to_usecs(jiffies));
> + timeout = jiffies + usecs_to_jiffies(admin_queue->completion_timeout);
>  
>   while (comp_ctx->status == ENA_CMD_SUBMITTED) {
> - if u32)jiffies_to_usecs(jiffies)) - start_time) >
> - ADMIN_CMD_TIMEOUT_US) {
> + if (time_is_before_jiffies(timeout)) {
>   pr_err("Wait for completion (polling) timeout\n");
>   /* ENA didn't have any completion */
>   spin_lock_irqsave(_queue->q_lock, flags);
> @@ -560,7 +558,8 @@ static int 
> ena_com_wait_and_process_admin_cq_interrupts(struct ena_comp_ctx *com
>   int ret;
>  
>   wait_for_completion_timeout(_ctx->wait_event,
> - usecs_to_jiffies(ADMIN_CMD_TIMEOUT_US));
> + usecs_to_jiffies(
> + admin_queue->completion_timeout));
>  
>   /* In case the command wasn't completed find out the root cause.
>* There might be 2 kinds of errors
> @@ -600,12 +599,14 @@ static u32 ena_com_reg_bar_read32(struct ena_com_dev 
> *ena_dev, u16 offset)
>   struct ena_com_mmio_read *mmio_read = _dev->mmio_read;
>   volatile struct ena_admin_ena_mmio_req_read_less_resp *read_resp =
>   mmio_read->read_resp;
> - u32 mmio_read_reg, ret;
> + u32 mmio_read_reg, timeout, ret;

Re: [PATCH V2 net 06/20] net/ena: fix NULL dereference when removing the driver after device reset faild

2016-12-04 Thread Matt Wilson
On Sun, Dec 04, 2016 at 03:19:24PM +0200, Netanel Belgazal wrote:
> If for some reason the device stop responding and the device reset failed
> to recover the device, the mmio register read datastructure will not be
> reinitialized.

If for some reason the device stops responding, and the device reset
fails to recover the device, the MMIO register read data structure
will not be reinitialized.

> On driver removal, the driver will also tries to reset the device
> but this time the mmio data structure will be NULL.

On driver removal, the driver will also try to reset the device, but
this time the MMIO data structure will be NULL.

> To solve this issue perform the device reset in the remove function only if
> the device is runnig.

To solve this issue, perform the device reset in the remove function
only if the device is running.

Do you have an example of the NULL pointer dereference that you can
paste in? It can be helpful for those searching for a fix for a bug
they've experienced.

--msw

> Signed-off-by: Netanel Belgazal 
> ---
>  drivers/net/ethernet/amazon/ena/ena_netdev.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c 
> b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> index 224302c..ad5f78f 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
> +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> @@ -2516,6 +2516,8 @@ static void ena_fw_reset_device(struct work_struct 
> *work)
>  err:
>   rtnl_unlock();
>  
> + clear_bit(ENA_FLAG_DEVICE_RUNNING, >flags);
> +
>   dev_err(>dev,
>   "Reset attempt failed. Can not reset the device\n");
>  }
> @@ -3126,7 +3128,9 @@ static void ena_remove(struct pci_dev *pdev)
>  
>   cancel_work_sync(>resume_io_task);
>  
> - ena_com_dev_reset(ena_dev);
> + /* Reset the device only if the device is running. */
> + if (test_bit(ENA_FLAG_DEVICE_RUNNING, >flags))
> + ena_com_dev_reset(ena_dev);
>  
>   ena_free_mgmnt_irq(adapter);
>  


Re: [PATCH V2 net 06/20] net/ena: fix NULL dereference when removing the driver after device reset faild

2016-12-04 Thread Matt Wilson
On Sun, Dec 04, 2016 at 03:19:24PM +0200, Netanel Belgazal wrote:
> If for some reason the device stop responding and the device reset failed
> to recover the device, the mmio register read datastructure will not be
> reinitialized.

If for some reason the device stops responding, and the device reset
fails to recover the device, the MMIO register read data structure
will not be reinitialized.

> On driver removal, the driver will also tries to reset the device
> but this time the mmio data structure will be NULL.

On driver removal, the driver will also try to reset the device, but
this time the MMIO data structure will be NULL.

> To solve this issue perform the device reset in the remove function only if
> the device is runnig.

To solve this issue, perform the device reset in the remove function
only if the device is running.

Do you have an example of the NULL pointer dereference that you can
paste in? It can be helpful for those searching for a fix for a bug
they've experienced.

--msw

> Signed-off-by: Netanel Belgazal 
> ---
>  drivers/net/ethernet/amazon/ena/ena_netdev.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c 
> b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> index 224302c..ad5f78f 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
> +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> @@ -2516,6 +2516,8 @@ static void ena_fw_reset_device(struct work_struct 
> *work)
>  err:
>   rtnl_unlock();
>  
> + clear_bit(ENA_FLAG_DEVICE_RUNNING, >flags);
> +
>   dev_err(>dev,
>   "Reset attempt failed. Can not reset the device\n");
>  }
> @@ -3126,7 +3128,9 @@ static void ena_remove(struct pci_dev *pdev)
>  
>   cancel_work_sync(>resume_io_task);
>  
> - ena_com_dev_reset(ena_dev);
> + /* Reset the device only if the device is running. */
> + if (test_bit(ENA_FLAG_DEVICE_RUNNING, >flags))
> + ena_com_dev_reset(ena_dev);
>  
>   ena_free_mgmnt_irq(adapter);
>  


Re: [PATCH V2 net 07/20] net/ena: refactor ena_get_stats64 to be atomic context safe

2016-12-04 Thread Matt Wilson
On Sun, Dec 04, 2016 at 03:19:25PM +0200, Netanel Belgazal wrote:
> ndo_get_stat64 can be called from atomic context.
> However the current implementation sends an admin command to retrieve
> the statistics from the device.
> This admin commands uses sleep.

Suggest some comment edits:

ndo_get_stat64() can be called from atomic context, but the current
implementation sends an admin command to retrieve the statistics from
the device. This admin command can sleep.

> Refactor the implementation of ena_get_stats64 to take the
> {rx,tx}bytes/cnt from the driver's inner counters
> and to take the rx drops counter
> from the asynchronous keep alive (heart bit) event.

This patch re-factors the implementation of ena_get_stats64() to use
the {rx,tx}bytes/count from the driver's inner counters, and to obtain
the rx drop counter from the asynchronous keep alive (heart bit)
event.

--msw

> Signed-off-by: Netanel Belgazal 
> ---
>  drivers/net/ethernet/amazon/ena/ena_admin_defs.h |  8 
>  drivers/net/ethernet/amazon/ena/ena_netdev.c | 57 
> +---
>  drivers/net/ethernet/amazon/ena/ena_netdev.h |  1 +
>  3 files changed, 51 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h 
> b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
> index f48c886..6d70bf5 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
> +++ b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
> @@ -873,6 +873,14 @@ struct ena_admin_aenq_link_change_desc {
>   u32 flags;
>  };
>  
> +struct ena_admin_aenq_keep_alive_desc {
> + struct ena_admin_aenq_common_desc aenq_common_desc;
> +
> + u32 rx_drops_low;
> +
> + u32 rx_drops_high;
> +};
> +
>  struct ena_admin_ena_mmio_req_read_less_resp {
>   u16 req_id;
>  
> diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c 
> b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> index ad5f78f..962ffb5 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
> +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> @@ -2176,28 +2176,46 @@ static struct rtnl_link_stats64 
> *ena_get_stats64(struct net_device *netdev,
>struct rtnl_link_stats64 
> *stats)
>  {
>   struct ena_adapter *adapter = netdev_priv(netdev);
> - struct ena_admin_basic_stats ena_stats;
> - int rc;
> + struct ena_ring *rx_ring, *tx_ring;
> + unsigned int start;
> + u64 rx_drops;
> + int i;
>  
>   if (!test_bit(ENA_FLAG_DEV_UP, >flags))
>   return NULL;
>  
> - rc = ena_com_get_dev_basic_stats(adapter->ena_dev, _stats);
> - if (rc)
> - return NULL;
> + for (i = 0; i < adapter->num_queues; i++) {
> + u64 bytes, packets;
> +
> + tx_ring = >tx_ring[i];
> +
> + do {
> + start = u64_stats_fetch_begin_irq(_ring->syncp);
> + packets = tx_ring->tx_stats.cnt;
> + bytes = tx_ring->tx_stats.bytes;
> + } while (u64_stats_fetch_retry_irq(_ring->syncp, start));
> +
> + stats->tx_packets += packets;
> + stats->tx_bytes += bytes;
>  
> - stats->tx_bytes = ((u64)ena_stats.tx_bytes_high << 32) |
> - ena_stats.tx_bytes_low;
> - stats->rx_bytes = ((u64)ena_stats.rx_bytes_high << 32) |
> - ena_stats.rx_bytes_low;
> + rx_ring = >rx_ring[i];
> +
> + do {
> + start = u64_stats_fetch_begin_irq(_ring->syncp);
> + packets = rx_ring->rx_stats.cnt;
> + bytes = rx_ring->rx_stats.bytes;
> + } while (u64_stats_fetch_retry_irq(_ring->syncp, start));
>  
> - stats->rx_packets = ((u64)ena_stats.rx_pkts_high << 32) |
> - ena_stats.rx_pkts_low;
> - stats->tx_packets = ((u64)ena_stats.tx_pkts_high << 32) |
> - ena_stats.tx_pkts_low;
> + stats->rx_packets += packets;
> + stats->rx_bytes += bytes;
> + }
> +
> + do {
> + start = u64_stats_fetch_begin_irq(>syncp);
> + rx_drops = adapter->dev_stats.rx_drops;
> + } while (u64_stats_fetch_retry_irq(>syncp, start));
>  
> - stats->rx_dropped = ((u64)ena_stats.rx_drops_high << 32) |
> - ena_stats.rx_drops_low;
> + stats->rx_dropped = rx_drops;
>  
>   stats->multicast = 0;
>   stats->collisions = 0;
> @@ -3221,8 +3239,17 @@ static void ena_keep_alive_wd(void *adapter_data,
> struct ena_admin_aenq_entry *aenq_e)
>  {
>   struct ena_adapter *adapter = (struct ena_adapter *)adapter_data;
> + struct ena_admin_aenq_keep_alive_desc *desc;
> + u64 rx_drops;
>  
> + desc = (struct ena_admin_aenq_keep_alive_desc *)aenq_e;
>   adapter->last_keep_alive_jiffies = jiffies;
> +
> + rx_drops = ((u64)desc->rx_drops_high << 32) | desc->rx_drops_low;
> +
> + 

Re: [PATCH V2 net 07/20] net/ena: refactor ena_get_stats64 to be atomic context safe

2016-12-04 Thread Matt Wilson
On Sun, Dec 04, 2016 at 03:19:25PM +0200, Netanel Belgazal wrote:
> ndo_get_stat64 can be called from atomic context.
> However the current implementation sends an admin command to retrieve
> the statistics from the device.
> This admin commands uses sleep.

Suggest some comment edits:

ndo_get_stat64() can be called from atomic context, but the current
implementation sends an admin command to retrieve the statistics from
the device. This admin command can sleep.

> Refactor the implementation of ena_get_stats64 to take the
> {rx,tx}bytes/cnt from the driver's inner counters
> and to take the rx drops counter
> from the asynchronous keep alive (heart bit) event.

This patch re-factors the implementation of ena_get_stats64() to use
the {rx,tx}bytes/count from the driver's inner counters, and to obtain
the rx drop counter from the asynchronous keep alive (heart bit)
event.

--msw

> Signed-off-by: Netanel Belgazal 
> ---
>  drivers/net/ethernet/amazon/ena/ena_admin_defs.h |  8 
>  drivers/net/ethernet/amazon/ena/ena_netdev.c | 57 
> +---
>  drivers/net/ethernet/amazon/ena/ena_netdev.h |  1 +
>  3 files changed, 51 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h 
> b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
> index f48c886..6d70bf5 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
> +++ b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
> @@ -873,6 +873,14 @@ struct ena_admin_aenq_link_change_desc {
>   u32 flags;
>  };
>  
> +struct ena_admin_aenq_keep_alive_desc {
> + struct ena_admin_aenq_common_desc aenq_common_desc;
> +
> + u32 rx_drops_low;
> +
> + u32 rx_drops_high;
> +};
> +
>  struct ena_admin_ena_mmio_req_read_less_resp {
>   u16 req_id;
>  
> diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c 
> b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> index ad5f78f..962ffb5 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
> +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> @@ -2176,28 +2176,46 @@ static struct rtnl_link_stats64 
> *ena_get_stats64(struct net_device *netdev,
>struct rtnl_link_stats64 
> *stats)
>  {
>   struct ena_adapter *adapter = netdev_priv(netdev);
> - struct ena_admin_basic_stats ena_stats;
> - int rc;
> + struct ena_ring *rx_ring, *tx_ring;
> + unsigned int start;
> + u64 rx_drops;
> + int i;
>  
>   if (!test_bit(ENA_FLAG_DEV_UP, >flags))
>   return NULL;
>  
> - rc = ena_com_get_dev_basic_stats(adapter->ena_dev, _stats);
> - if (rc)
> - return NULL;
> + for (i = 0; i < adapter->num_queues; i++) {
> + u64 bytes, packets;
> +
> + tx_ring = >tx_ring[i];
> +
> + do {
> + start = u64_stats_fetch_begin_irq(_ring->syncp);
> + packets = tx_ring->tx_stats.cnt;
> + bytes = tx_ring->tx_stats.bytes;
> + } while (u64_stats_fetch_retry_irq(_ring->syncp, start));
> +
> + stats->tx_packets += packets;
> + stats->tx_bytes += bytes;
>  
> - stats->tx_bytes = ((u64)ena_stats.tx_bytes_high << 32) |
> - ena_stats.tx_bytes_low;
> - stats->rx_bytes = ((u64)ena_stats.rx_bytes_high << 32) |
> - ena_stats.rx_bytes_low;
> + rx_ring = >rx_ring[i];
> +
> + do {
> + start = u64_stats_fetch_begin_irq(_ring->syncp);
> + packets = rx_ring->rx_stats.cnt;
> + bytes = rx_ring->rx_stats.bytes;
> + } while (u64_stats_fetch_retry_irq(_ring->syncp, start));
>  
> - stats->rx_packets = ((u64)ena_stats.rx_pkts_high << 32) |
> - ena_stats.rx_pkts_low;
> - stats->tx_packets = ((u64)ena_stats.tx_pkts_high << 32) |
> - ena_stats.tx_pkts_low;
> + stats->rx_packets += packets;
> + stats->rx_bytes += bytes;
> + }
> +
> + do {
> + start = u64_stats_fetch_begin_irq(>syncp);
> + rx_drops = adapter->dev_stats.rx_drops;
> + } while (u64_stats_fetch_retry_irq(>syncp, start));
>  
> - stats->rx_dropped = ((u64)ena_stats.rx_drops_high << 32) |
> - ena_stats.rx_drops_low;
> + stats->rx_dropped = rx_drops;
>  
>   stats->multicast = 0;
>   stats->collisions = 0;
> @@ -3221,8 +3239,17 @@ static void ena_keep_alive_wd(void *adapter_data,
> struct ena_admin_aenq_entry *aenq_e)
>  {
>   struct ena_adapter *adapter = (struct ena_adapter *)adapter_data;
> + struct ena_admin_aenq_keep_alive_desc *desc;
> + u64 rx_drops;
>  
> + desc = (struct ena_admin_aenq_keep_alive_desc *)aenq_e;
>   adapter->last_keep_alive_jiffies = jiffies;
> +
> + rx_drops = ((u64)desc->rx_drops_high << 32) | desc->rx_drops_low;
> +
> + u64_stats_update_begin(>syncp);
> + 

Re: [PATCH V2 net 05/20] net/ena: fix RSS default hash configuration

2016-12-04 Thread Matt Wilson
On Sun, Dec 04, 2016 at 03:19:23PM +0200, Netanel Belgazal wrote:
> ENA default hash configure IPv4_frag hash twice instead of

configure -> configures. You may want to include "erroneously". What
is the consequence of this bug?

> configure non ip packets.

configuring non-IP packets.

--msw

> Signed-off-by: Netanel Belgazal 
> ---
>  drivers/net/ethernet/amazon/ena/ena_com.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c 
> b/drivers/net/ethernet/amazon/ena/ena_com.c
> index 3066d9c..46aad3a 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_com.c
> +++ b/drivers/net/ethernet/amazon/ena/ena_com.c
> @@ -2184,7 +2184,7 @@ int ena_com_set_default_hash_ctrl(struct ena_com_dev 
> *ena_dev)
>   hash_ctrl->selected_fields[ENA_ADMIN_RSS_IP4_FRAG].fields =
>   ENA_ADMIN_RSS_L3_SA | ENA_ADMIN_RSS_L3_DA;
>  
> - hash_ctrl->selected_fields[ENA_ADMIN_RSS_IP4_FRAG].fields =
> + hash_ctrl->selected_fields[ENA_ADMIN_RSS_NOT_IP].fields =
>   ENA_ADMIN_RSS_L2_DA | ENA_ADMIN_RSS_L2_SA;
>  
>   for (i = 0; i < ENA_ADMIN_RSS_PROTO_NUM; i++) {


Re: [PATCH V2 net 05/20] net/ena: fix RSS default hash configuration

2016-12-04 Thread Matt Wilson
On Sun, Dec 04, 2016 at 03:19:23PM +0200, Netanel Belgazal wrote:
> ENA default hash configure IPv4_frag hash twice instead of

configure -> configures. You may want to include "erroneously". What
is the consequence of this bug?

> configure non ip packets.

configuring non-IP packets.

--msw

> Signed-off-by: Netanel Belgazal 
> ---
>  drivers/net/ethernet/amazon/ena/ena_com.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c 
> b/drivers/net/ethernet/amazon/ena/ena_com.c
> index 3066d9c..46aad3a 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_com.c
> +++ b/drivers/net/ethernet/amazon/ena/ena_com.c
> @@ -2184,7 +2184,7 @@ int ena_com_set_default_hash_ctrl(struct ena_com_dev 
> *ena_dev)
>   hash_ctrl->selected_fields[ENA_ADMIN_RSS_IP4_FRAG].fields =
>   ENA_ADMIN_RSS_L3_SA | ENA_ADMIN_RSS_L3_DA;
>  
> - hash_ctrl->selected_fields[ENA_ADMIN_RSS_IP4_FRAG].fields =
> + hash_ctrl->selected_fields[ENA_ADMIN_RSS_NOT_IP].fields =
>   ENA_ADMIN_RSS_L2_DA | ENA_ADMIN_RSS_L2_SA;
>  
>   for (i = 0; i < ENA_ADMIN_RSS_PROTO_NUM; i++) {


Re: [PATCH V2 net 04/20] net/ena: fix ethtool RSS flow configuration

2016-12-04 Thread Matt Wilson
On Sun, Dec 04, 2016 at 03:19:22PM +0200, Netanel Belgazal wrote:
> ena_flow_data_to_flow_hash and ena_flow_hash_to_flow_type
> treat the ena_flow_hash_to_flow_type enum as power of two values.
> 
> Change the values of ena_admin_flow_hash_fields to be power of two values.

Then I generally prefer BIT(0), BIT(1), BIT(2), etc.

Also it would be helpful to include some comments about the
consequences of the current state of the code.

--msw

> Signed-off-by: Netanel Belgazal 
> ---
>  drivers/net/ethernet/amazon/ena/ena_admin_defs.h | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h 
> b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
> index a46e749..f48c886 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
> +++ b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
> @@ -631,22 +631,22 @@ enum ena_admin_flow_hash_proto {
>  /* RSS flow hash fields */
>  enum ena_admin_flow_hash_fields {
>   /* Ethernet Dest Addr */
> - ENA_ADMIN_RSS_L2_DA = 0,
> + ENA_ADMIN_RSS_L2_DA = 0x1,
>  
>   /* Ethernet Src Addr */
> - ENA_ADMIN_RSS_L2_SA = 1,
> + ENA_ADMIN_RSS_L2_SA = 0x2,
>  
>   /* ipv4/6 Dest Addr */
> - ENA_ADMIN_RSS_L3_DA = 2,
> + ENA_ADMIN_RSS_L3_DA = 0x4,
>  
>   /* ipv4/6 Src Addr */
> - ENA_ADMIN_RSS_L3_SA = 5,
> + ENA_ADMIN_RSS_L3_SA = 0x8,
>  
>   /* tcp/udp Dest Port */
> - ENA_ADMIN_RSS_L4_DP = 6,
> + ENA_ADMIN_RSS_L4_DP = 0x10,
>  
>   /* tcp/udp Src Port */
> - ENA_ADMIN_RSS_L4_SP = 7,
> + ENA_ADMIN_RSS_L4_SP = 0x20,
>  };
>  
>  struct ena_admin_proto_input {


Re: [PATCH V2 net 04/20] net/ena: fix ethtool RSS flow configuration

2016-12-04 Thread Matt Wilson
On Sun, Dec 04, 2016 at 03:19:22PM +0200, Netanel Belgazal wrote:
> ena_flow_data_to_flow_hash and ena_flow_hash_to_flow_type
> treat the ena_flow_hash_to_flow_type enum as power of two values.
> 
> Change the values of ena_admin_flow_hash_fields to be power of two values.

Then I generally prefer BIT(0), BIT(1), BIT(2), etc.

Also it would be helpful to include some comments about the
consequences of the current state of the code.

--msw

> Signed-off-by: Netanel Belgazal 
> ---
>  drivers/net/ethernet/amazon/ena/ena_admin_defs.h | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h 
> b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
> index a46e749..f48c886 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
> +++ b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
> @@ -631,22 +631,22 @@ enum ena_admin_flow_hash_proto {
>  /* RSS flow hash fields */
>  enum ena_admin_flow_hash_fields {
>   /* Ethernet Dest Addr */
> - ENA_ADMIN_RSS_L2_DA = 0,
> + ENA_ADMIN_RSS_L2_DA = 0x1,
>  
>   /* Ethernet Src Addr */
> - ENA_ADMIN_RSS_L2_SA = 1,
> + ENA_ADMIN_RSS_L2_SA = 0x2,
>  
>   /* ipv4/6 Dest Addr */
> - ENA_ADMIN_RSS_L3_DA = 2,
> + ENA_ADMIN_RSS_L3_DA = 0x4,
>  
>   /* ipv4/6 Src Addr */
> - ENA_ADMIN_RSS_L3_SA = 5,
> + ENA_ADMIN_RSS_L3_SA = 0x8,
>  
>   /* tcp/udp Dest Port */
> - ENA_ADMIN_RSS_L4_DP = 6,
> + ENA_ADMIN_RSS_L4_DP = 0x10,
>  
>   /* tcp/udp Src Port */
> - ENA_ADMIN_RSS_L4_SP = 7,
> + ENA_ADMIN_RSS_L4_SP = 0x20,
>  };
>  
>  struct ena_admin_proto_input {


Re: [PATCH V2 net 03/20] net/ena: fix queues number calculation

2016-12-04 Thread Matt Wilson
On Sun, Dec 04, 2016 at 03:19:21PM +0200, Netanel Belgazal wrote:
> The ENA driver tries to open a queue per vCPU.
> To determine how many vCPUs the instance have it uses num_possible_cpus
> while it should have use num_online_cpus instead.

use () when referring to functions: num_possible_cpus(), num_online_cpus().

> Signed-off-by: Netanel Belgazal <neta...@annapurnalabs.com>

Reviewed-by: Matt Wilson <m...@amazon.com>

> ---
>  drivers/net/ethernet/amazon/ena/ena_netdev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c 
> b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> index 397c9bc..224302c 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
> +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> @@ -2667,7 +2667,7 @@ static int ena_calc_io_queue_num(struct pci_dev *pdev,
>   io_sq_num = get_feat_ctx->max_queues.max_sq_num;
>   }
>  
> - io_queue_num = min_t(int, num_possible_cpus(), ENA_MAX_NUM_IO_QUEUES);
> + io_queue_num = min_t(int, num_online_cpus(), ENA_MAX_NUM_IO_QUEUES);
>   io_queue_num = min_t(int, io_queue_num, io_sq_num);
>   io_queue_num = min_t(int, io_queue_num,
>get_feat_ctx->max_queues.max_cq_num);


Re: [PATCH V2 net 03/20] net/ena: fix queues number calculation

2016-12-04 Thread Matt Wilson
On Sun, Dec 04, 2016 at 03:19:21PM +0200, Netanel Belgazal wrote:
> The ENA driver tries to open a queue per vCPU.
> To determine how many vCPUs the instance have it uses num_possible_cpus
> while it should have use num_online_cpus instead.

use () when referring to functions: num_possible_cpus(), num_online_cpus().

> Signed-off-by: Netanel Belgazal 

Reviewed-by: Matt Wilson 

> ---
>  drivers/net/ethernet/amazon/ena/ena_netdev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c 
> b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> index 397c9bc..224302c 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
> +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> @@ -2667,7 +2667,7 @@ static int ena_calc_io_queue_num(struct pci_dev *pdev,
>   io_sq_num = get_feat_ctx->max_queues.max_sq_num;
>   }
>  
> - io_queue_num = min_t(int, num_possible_cpus(), ENA_MAX_NUM_IO_QUEUES);
> + io_queue_num = min_t(int, num_online_cpus(), ENA_MAX_NUM_IO_QUEUES);
>   io_queue_num = min_t(int, io_queue_num, io_sq_num);
>   io_queue_num = min_t(int, io_queue_num,
>get_feat_ctx->max_queues.max_cq_num);


Re: [PATCH V2 net 02/20] net/ena: fix error handling when probe fails

2016-12-04 Thread Matt Wilson
On Sun, Dec 04, 2016 at 03:19:20PM +0200, Netanel Belgazal wrote:
> When driver fails in probe, it will release all resources, including
> adapter.
> In case of probe failure, ena_remove should not try to free the adapter
> resources.

Please word wrap your commit message around 75 columns.

> Signed-off-by: Netanel Belgazal <neta...@annapurnalabs.com>

Reviewed-by: Matt Wilson <m...@amazon.com>

> ---
>  drivers/net/ethernet/amazon/ena/ena_netdev.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c 
> b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> index 33a760e..397c9bc 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
> +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> @@ -3054,6 +3054,7 @@ static int ena_probe(struct pci_dev *pdev, const struct 
> pci_device_id *ent)
>  err_free_region:
>   ena_release_bars(ena_dev, pdev);
>  err_free_ena_dev:
> + pci_set_drvdata(pdev, NULL);
>   vfree(ena_dev);
>  err_disable_device:
>   pci_disable_device(pdev);


Re: [PATCH V2 net 02/20] net/ena: fix error handling when probe fails

2016-12-04 Thread Matt Wilson
On Sun, Dec 04, 2016 at 03:19:20PM +0200, Netanel Belgazal wrote:
> When driver fails in probe, it will release all resources, including
> adapter.
> In case of probe failure, ena_remove should not try to free the adapter
> resources.

Please word wrap your commit message around 75 columns.

> Signed-off-by: Netanel Belgazal 

Reviewed-by: Matt Wilson 

> ---
>  drivers/net/ethernet/amazon/ena/ena_netdev.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c 
> b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> index 33a760e..397c9bc 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
> +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> @@ -3054,6 +3054,7 @@ static int ena_probe(struct pci_dev *pdev, const struct 
> pci_device_id *ent)
>  err_free_region:
>   ena_release_bars(ena_dev, pdev);
>  err_free_ena_dev:
> + pci_set_drvdata(pdev, NULL);
>   vfree(ena_dev);
>  err_disable_device:
>   pci_disable_device(pdev);


Re: [PATCH V2 net 01/20] net/ena: remove ntuple filter support from device feature list

2016-12-04 Thread Matt Wilson
On Sun, Dec 04, 2016 at 03:19:19PM +0200, Netanel Belgazal wrote:
> Remove NETIF_F_NTUPLE from netdev->features.
> The ENA device driver does not support ntuple filtering.
> 
> Signed-off-by: Netanel Belgazal <neta...@annapurnalabs.com>

Reviewed-by: Matt Wilson <m...@amazon.com>

> ---
>  drivers/net/ethernet/amazon/ena/ena_netdev.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c 
> b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> index bfeaec5..33a760e 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
> +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> @@ -2729,7 +2729,6 @@ static void ena_set_dev_offloads(struct 
> ena_com_dev_get_features_ctx *feat,
>   netdev->features =
>   dev_features |
>   NETIF_F_SG |
> - NETIF_F_NTUPLE |
>   NETIF_F_RXHASH |
>   NETIF_F_HIGHDMA;
>  


Re: [PATCH V2 net 01/20] net/ena: remove ntuple filter support from device feature list

2016-12-04 Thread Matt Wilson
On Sun, Dec 04, 2016 at 03:19:19PM +0200, Netanel Belgazal wrote:
> Remove NETIF_F_NTUPLE from netdev->features.
> The ENA device driver does not support ntuple filtering.
> 
> Signed-off-by: Netanel Belgazal 

Reviewed-by: Matt Wilson 

> ---
>  drivers/net/ethernet/amazon/ena/ena_netdev.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c 
> b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> index bfeaec5..33a760e 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
> +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> @@ -2729,7 +2729,6 @@ static void ena_set_dev_offloads(struct 
> ena_com_dev_get_features_ctx *feat,
>   netdev->features =
>   dev_features |
>   NETIF_F_SG |
> - NETIF_F_NTUPLE |
>   NETIF_F_RXHASH |
>   NETIF_F_HIGHDMA;
>  


Re: [PATCH V2 net 00/20] Increase ENA driver version to 1.1.2

2016-12-04 Thread Matt Wilson
On Sun, Dec 04, 2016 at 09:37:43PM -0500, David Miller wrote:
> 
> It is not appropriate to submit so many patches at one time.

Indeed, https://www.kernel.org/doc/Documentation/SubmittingPatches
recommends submitting no more than 15 or so at once.

> Please keep your patch series to no more than about a dozen
> at a time.

How about 15 from SubmittingPatches? The first 15 in the series are
all important bugfixes. Should Netanel resubmit a series with just the
bugfixes and a new cover letter? Or are you willing to consider the
first 15 of this series as posted?

> Also, group your changes logically and tie an appropriately
> descriptive cover letter.
> 
> "Increase driver version to X.Y.Z" tells the reader absolutely
> nothing.  Someone reading that Subject line in the GIT logs
> will have no idea what the overall purpose of the patch series
> is and what it accomplishes.

You're right, the cover letter subject needs to be better. There is
only one commit submitted with the subject "increase driver version to
1.1.2." - Patch 20/20. It is logically like:

commit b8b2372de9cc00d5ed667c7b8db29b6cfbf037f5
Author: Manish Chopra 
Date:   Wed Aug 3 04:02:04 2016 -0400

qlcnic: Update version to 5.3.65

Signed-off-by: Manish Chopra 
Signed-off-by: David S. Miller 

[...]

commit ae33256c55d2fefcad8712e750b846461994a1af
Author: Bimmy Pujari 
Date:   Mon Jun 20 09:10:39 2016 -0700

i40e/i40evf-bump version to 1.6.11

Signed-off-by: Bimmy Pujari 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 

[...]

commit 5264cc63ba10ebfa0e54e3e641cce2656c7a60e8
Author: Jacob Keller 
Date:   Tue Jun 7 16:09:02 2016 -0700

fm10k: bump version number

Signed-off-by: Jacob Keller 
Tested-by: Krishneil Singh 
Signed-off-by: Jeff Kirsher 

[...]

commit a58a3e68037647de78e3461194239a1104f76003
Author: Michael Chan 
Date:   Fri Jul 1 18:46:20 2016 -0400

bnxt_en: Update firmware spec. to 1.3.0.

And update driver version to 1.3.0.

Signed-off-by: Michael Chan 
Signed-off-by: David S. Miller 

> You really need to describe the high level purpose of the patch set.
> Is it adding a new feature?  What is that feature?  Why are you
> adding that feature?  How is that feature implemented?  Why is
> it implemented that way?

The priority is to get bug fixes to the ENA driver in 4.9. Let's focus
on the first 15.

--msw


Re: [PATCH V2 net 00/20] Increase ENA driver version to 1.1.2

2016-12-04 Thread Matt Wilson
On Sun, Dec 04, 2016 at 09:37:43PM -0500, David Miller wrote:
> 
> It is not appropriate to submit so many patches at one time.

Indeed, https://www.kernel.org/doc/Documentation/SubmittingPatches
recommends submitting no more than 15 or so at once.

> Please keep your patch series to no more than about a dozen
> at a time.

How about 15 from SubmittingPatches? The first 15 in the series are
all important bugfixes. Should Netanel resubmit a series with just the
bugfixes and a new cover letter? Or are you willing to consider the
first 15 of this series as posted?

> Also, group your changes logically and tie an appropriately
> descriptive cover letter.
> 
> "Increase driver version to X.Y.Z" tells the reader absolutely
> nothing.  Someone reading that Subject line in the GIT logs
> will have no idea what the overall purpose of the patch series
> is and what it accomplishes.

You're right, the cover letter subject needs to be better. There is
only one commit submitted with the subject "increase driver version to
1.1.2." - Patch 20/20. It is logically like:

commit b8b2372de9cc00d5ed667c7b8db29b6cfbf037f5
Author: Manish Chopra 
Date:   Wed Aug 3 04:02:04 2016 -0400

qlcnic: Update version to 5.3.65

Signed-off-by: Manish Chopra 
Signed-off-by: David S. Miller 

[...]

commit ae33256c55d2fefcad8712e750b846461994a1af
Author: Bimmy Pujari 
Date:   Mon Jun 20 09:10:39 2016 -0700

i40e/i40evf-bump version to 1.6.11

Signed-off-by: Bimmy Pujari 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 

[...]

commit 5264cc63ba10ebfa0e54e3e641cce2656c7a60e8
Author: Jacob Keller 
Date:   Tue Jun 7 16:09:02 2016 -0700

fm10k: bump version number

Signed-off-by: Jacob Keller 
Tested-by: Krishneil Singh 
Signed-off-by: Jeff Kirsher 

[...]

commit a58a3e68037647de78e3461194239a1104f76003
Author: Michael Chan 
Date:   Fri Jul 1 18:46:20 2016 -0400

bnxt_en: Update firmware spec. to 1.3.0.

And update driver version to 1.3.0.

Signed-off-by: Michael Chan 
Signed-off-by: David S. Miller 

> You really need to describe the high level purpose of the patch set.
> Is it adding a new feature?  What is that feature?  Why are you
> adding that feature?  How is that feature implemented?  Why is
> it implemented that way?

The priority is to get bug fixes to the ENA driver in 4.9. Let's focus
on the first 15.

--msw


Re: [PATCH net-next V3] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-07-14 Thread Matt Wilson
On Thu, Jul 14, 2016 at 09:08:03AM -0700, Benjamin Poirier wrote:
> On 2016/07/14 08:22, Matt Wilson wrote:
> [...]
> > 
> > Dave and Benjamin,
> > 
> > Do you want to see the interrupt moderation extensions to ethtool and
> > the sysfs nodes removed before this lands in net-next? Or should
> > Netanel remove the sysfs bits until we can extend the ethtool
> > interfaces to cover the parameters that ena uses?
> 
> I couldn't say what's acceptable or not. A few other drivers (qlcnic,
> sfc, ...) already have sysfs tunables. Maybe John, as the new ethtool
> maintainer, can weight in too about the changes required to ethtool.

We definitely want ethtool to handle all the settings, it's just a
question of when. We also want to address and resolve all the great
feedback so far, and since you originally raised the point about
extending ethtool I wanted to see if you have any major objection.

--msw


Re: [PATCH net-next V3] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-07-14 Thread Matt Wilson
On Thu, Jul 14, 2016 at 09:08:03AM -0700, Benjamin Poirier wrote:
> On 2016/07/14 08:22, Matt Wilson wrote:
> [...]
> > 
> > Dave and Benjamin,
> > 
> > Do you want to see the interrupt moderation extensions to ethtool and
> > the sysfs nodes removed before this lands in net-next? Or should
> > Netanel remove the sysfs bits until we can extend the ethtool
> > interfaces to cover the parameters that ena uses?
> 
> I couldn't say what's acceptable or not. A few other drivers (qlcnic,
> sfc, ...) already have sysfs tunables. Maybe John, as the new ethtool
> maintainer, can weight in too about the changes required to ethtool.

We definitely want ethtool to handle all the settings, it's just a
question of when. We also want to address and resolve all the great
feedback so far, and since you originally raised the point about
extending ethtool I wanted to see if you have any major objection.

--msw


Re: [PATCH net-next V3] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-07-14 Thread Matt Wilson
On Thu, Jul 14, 2016 at 09:46:14AM +0300, Netanel Belgazal wrote:
> This is a driver for the ENA family of networking devices.
> 
> Signed-off-by: Netanel Belgazal 
> ---
> 
> Notes:
> Changes in v3:
> - Fix compilation warning for 32bit systems. [kbuild test rebot]
> - Replace netdev->num_tx_queues with adapter->num_queues in ethtool. 
> [b...@decadent.org.uk]
> - Use get_link_ksettings instread of get_settings (deprecated ethtool 
> API). [b...@decadent.org.uk]
> - Remove rna_nway_reset (ethtool). [b...@decadent.org.uk]
> - Add missing break in switch condition. [b...@decadent.org.uk]
> - Increase driver version to 1.0.2
> 
> Changes in v2:
> - Increment driver version to 1.0.1
> - Reorder the initialization of the workqueus and the timer service
> - Initialize last_keep_alive_jiffies in probe
> - Remove ena_trc_* functions in favor of pr_* [da...@davemloft.net]
> - Fix Tx interrupt control register value [rom...@fr.zoreil.com]
> - Edit ena.txt to focus on technical details [rom...@fr.zoreil.com]
> - Rename small_copy_len tunable to rx_copybreak 
> [benjamin.poir...@gmail.com]
> - Change uint32_t to u32 [rom...@fr.zoreil.com]
> - Return error in ena_com_set_hash_ctrl [rami.ro...@intel.com]
> - Fix API documentation [rami.ro...@intel.com]
> - Remove redundant check in ena_com_mem_addr_set [rom...@fr.zoreil.com]
> - Add local variables for allocation methods to improve readability 
> [rom...@fr.zoreil.com]
> - Replace dma_alloc_coherent and __GFP_ZERO flag with dma_zalloc_coherent 
> [rom...@fr.zoreil.com]
> - Fix style when using dma_alloc_coherent [rom...@fr.zoreil.com]
> - Tx xmit - In case of an error drop the packet and return NETDEV_TX_OK 
> [rom...@fr.zoreil.com]
> - Reduce the number of parameters in ena_get_dev_stats 
> [rom...@fr.zoreil.com]
> - Return ret and not 0 in ena_com_set_hash_ctrl [rami.ro...@intel.com]
> - Do not initialize variables if the function doesn't read them before 
> the second write [rom...@fr.zoreil.com]
> - Enums - Replace space with tabs to line the assign value 
> [rom...@fr.zoreil.com]
> - Remove redundant comments [rom...@fr.zoreil.com]
> 
> TODO:
> - Add bytes variable to set_coalesce ethtool callback and use this 
> callback
>   instead of dedicated sysfs files.

Dave and Benjamin,

Do you want to see the interrupt moderation extensions to ethtool and
the sysfs nodes removed before this lands in net-next? Or should
Netanel remove the sysfs bits until we can extend the ethtool
interfaces to cover the parameters that ena uses?

--msw


Re: [PATCH net-next V3] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-07-14 Thread Matt Wilson
On Thu, Jul 14, 2016 at 09:46:14AM +0300, Netanel Belgazal wrote:
> This is a driver for the ENA family of networking devices.
> 
> Signed-off-by: Netanel Belgazal 
> ---
> 
> Notes:
> Changes in v3:
> - Fix compilation warning for 32bit systems. [kbuild test rebot]
> - Replace netdev->num_tx_queues with adapter->num_queues in ethtool. 
> [b...@decadent.org.uk]
> - Use get_link_ksettings instread of get_settings (deprecated ethtool 
> API). [b...@decadent.org.uk]
> - Remove rna_nway_reset (ethtool). [b...@decadent.org.uk]
> - Add missing break in switch condition. [b...@decadent.org.uk]
> - Increase driver version to 1.0.2
> 
> Changes in v2:
> - Increment driver version to 1.0.1
> - Reorder the initialization of the workqueus and the timer service
> - Initialize last_keep_alive_jiffies in probe
> - Remove ena_trc_* functions in favor of pr_* [da...@davemloft.net]
> - Fix Tx interrupt control register value [rom...@fr.zoreil.com]
> - Edit ena.txt to focus on technical details [rom...@fr.zoreil.com]
> - Rename small_copy_len tunable to rx_copybreak 
> [benjamin.poir...@gmail.com]
> - Change uint32_t to u32 [rom...@fr.zoreil.com]
> - Return error in ena_com_set_hash_ctrl [rami.ro...@intel.com]
> - Fix API documentation [rami.ro...@intel.com]
> - Remove redundant check in ena_com_mem_addr_set [rom...@fr.zoreil.com]
> - Add local variables for allocation methods to improve readability 
> [rom...@fr.zoreil.com]
> - Replace dma_alloc_coherent and __GFP_ZERO flag with dma_zalloc_coherent 
> [rom...@fr.zoreil.com]
> - Fix style when using dma_alloc_coherent [rom...@fr.zoreil.com]
> - Tx xmit - In case of an error drop the packet and return NETDEV_TX_OK 
> [rom...@fr.zoreil.com]
> - Reduce the number of parameters in ena_get_dev_stats 
> [rom...@fr.zoreil.com]
> - Return ret and not 0 in ena_com_set_hash_ctrl [rami.ro...@intel.com]
> - Do not initialize variables if the function doesn't read them before 
> the second write [rom...@fr.zoreil.com]
> - Enums - Replace space with tabs to line the assign value 
> [rom...@fr.zoreil.com]
> - Remove redundant comments [rom...@fr.zoreil.com]
> 
> TODO:
> - Add bytes variable to set_coalesce ethtool callback and use this 
> callback
>   instead of dedicated sysfs files.

Dave and Benjamin,

Do you want to see the interrupt moderation extensions to ethtool and
the sysfs nodes removed before this lands in net-next? Or should
Netanel remove the sysfs bits until we can extend the ethtool
interfaces to cover the parameters that ena uses?

--msw


Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-06-18 Thread Matt Wilson
On Thu, Jun 16, 2016 at 07:06:52PM +0100, Ben Hutchings wrote:
> On Thu, 2016-06-16 at 10:55 -0700, Benjamin Poirier wrote:
> > On 2016/06/13 11:46, Netanel Belgazal wrote:
> [...]
> > > +static ssize_t ena_show_small_copy_len(struct device *dev,
> > > +    struct device_attribute *attr, char *buf)
> > > +{
> > > + struct ena_adapter *adapter = dev_get_drvdata(dev);
> > > +
> > > + return sprintf(buf, "%d\n", adapter->small_copy_len);
> > > +}
> > > +
> > > +static DEVICE_ATTR(small_copy_len, S_IRUGO | S_IWUSR, 
> > > ena_show_small_copy_len,
> > > +    ena_store_small_copy_len);
> > 
> > This is what many other drivers call (rx_)copybreak. Perhaps it's time
> > to add it to ethtool as well?
> [...]
> 
> There is the 'tunable' ethtool API for random parameters like
> rx_copybreak.  The ethtool utility doesn't support it yet, though.

So I started implementing proper support for this in ethtool in a
generic way (retrieving the string set, parsing generically, etc), but
it seems that the implementation isn't quite as complete as one would
like. You said [1] that the kernel returns the type of each tunable,
but I don't see this implemented anywhere.

The string set also includes "Unspec", which is of dubious value.

I think that something tunable specific is going to need to be added
apart from ETHTOOL_GSTRINGS to provide ethtool the needed type
information.

--msw

[1] http://thread.gmane.org/gmane.linux.network/367058/focus=376701


Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-06-18 Thread Matt Wilson
On Thu, Jun 16, 2016 at 07:06:52PM +0100, Ben Hutchings wrote:
> On Thu, 2016-06-16 at 10:55 -0700, Benjamin Poirier wrote:
> > On 2016/06/13 11:46, Netanel Belgazal wrote:
> [...]
> > > +static ssize_t ena_show_small_copy_len(struct device *dev,
> > > +    struct device_attribute *attr, char *buf)
> > > +{
> > > + struct ena_adapter *adapter = dev_get_drvdata(dev);
> > > +
> > > + return sprintf(buf, "%d\n", adapter->small_copy_len);
> > > +}
> > > +
> > > +static DEVICE_ATTR(small_copy_len, S_IRUGO | S_IWUSR, 
> > > ena_show_small_copy_len,
> > > +    ena_store_small_copy_len);
> > 
> > This is what many other drivers call (rx_)copybreak. Perhaps it's time
> > to add it to ethtool as well?
> [...]
> 
> There is the 'tunable' ethtool API for random parameters like
> rx_copybreak.  The ethtool utility doesn't support it yet, though.

So I started implementing proper support for this in ethtool in a
generic way (retrieving the string set, parsing generically, etc), but
it seems that the implementation isn't quite as complete as one would
like. You said [1] that the kernel returns the type of each tunable,
but I don't see this implemented anywhere.

The string set also includes "Unspec", which is of dubious value.

I think that something tunable specific is going to need to be added
apart from ETHTOOL_GSTRINGS to provide ethtool the needed type
information.

--msw

[1] http://thread.gmane.org/gmane.linux.network/367058/focus=376701


Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-06-16 Thread Matt Wilson
On Thu, Jun 16, 2016 at 10:46:23PM +0200, Francois Romieu wrote:
> Netanel Belgazal  :
> [...]
> 
> Very limited review below.

I'll comment on the documentation (since I edited it heavily) but will
leave some the other parts for Netanel to answer.

> > diff --git a/Documentation/networking/ena.txt 
> > b/Documentation/networking/ena.txt
> > new file mode 100644
> > index 000..528544f
> > --- /dev/null
> > +++ b/Documentation/networking/ena.txt
> > @@ -0,0 +1,330 @@
> > +Linux kernel driver for Elastic Network Adapter (ENA) family:
> > +=
> > +
> > +Overview:
> > +=
> > +The ENA driver provides a modern Ethernet device interface optimized
> > +for high performance and low CPU overhead.
> 
> Marketing boilerplate. How much of it will still be true in 6 months ?
> 6 years ?
> 
> Hint: stay technical. If in doubt, make it boring. :o)

Hopefully it will still be true for quite a while. But you make a good
point, the overview here should stick with the technical details.

> [...]
> > +ENA devices allow high speed and low overhead Ethernet traffic
> > +processing by providing a dedicated Tx/Rx queue pair per host CPU, a
> > +dedicated MSI-X interrupt vector per Tx/Rx queue pair, adaptive
> > +interrupt moderation, and CPU cacheline optimized data placement.
> 
> How many queues exactly ?

It's negotiated with the adapter via the admin queue, and it will vary
based on the capabilities of a particular adapter. See:

rc = ena_com_get_feature(ena_dev, _resp,
 ENA_ADMIN_MAX_QUEUES_NUM);

in ena_com.c

> [...]
> > +Memory Allocations:
> > +===
> > +DMA Coherent buffers are allocated for the f`ollowing DMA rings:
> > +- Tx submission ring (For regular mode; for LLQ mode it is allocated
> > +  using kzalloc)
> > +- Tx completion ring
> > +- Rx submission ring
> > +- Rx completion ring
> > +- Admin submission ring
> > +- Admin completion ring
> > +- AENQ ring
> 
> Useless documentation (implementation detail).
> 
> Nobody will maintain it when the driver changes. Please remove.

Ack.

> [...]
> > +MTU:
> > +
> > +The driver supports an arbitrarily large MTU with a maximum that is
> > +negotiated with the device. The driver configures MTU using the
> > +SetFeature command (ENA_ADMIN_MTU property). The user can change MTU
> > +via ifconfig(8) and ip(8).
> 
> Please avoid advertising legacy ifconfig.

Ack.

> [...]
> > diff --git a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h 
> > b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
> > new file mode 100644
> > index 000..8516e5e
> > --- /dev/null
> > +++ b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
> [...]
> > +/* admin commands opcodes */
> > +enum ena_admin_aq_opcode {
> > +   /* create submission queue */
> > +   ENA_ADMIN_CREATE_SQ = 1,
> > +
> > +   /* destroy submission queue */
> > +   ENA_ADMIN_DESTROY_SQ = 2,
> > +
> > +   /* create completion queue */
> > +   ENA_ADMIN_CREATE_CQ = 3,
> > +
> > +   /* destroy completion queue */
> > +   ENA_ADMIN_DESTROY_CQ = 4,
> > +
> > +   /* get capabilities of particular feature */
> > +   ENA_ADMIN_GET_FEATURE = 8,
> > +
> > +   /* get capabilities of particular feature */
> > +   ENA_ADMIN_SET_FEATURE = 9,
> > +
> > +   /* get statistics */
> > +   ENA_ADMIN_GET_STATS = 11,
> > +};
> 
> The documentation above simply duplicates the member names -> useless
> visual payload.
> 
> Please replace space with tab before "=" to line things up.

I'll leave this to Netanel. A good amount of this comes from a unified
documentation / struct / register access generation script, and it
might need some massaging.

> [...]
> > +/* Basic Statistics Command. */
> > +struct ena_admin_basic_stats {
> > +   /* word 0 :  */
> > +   u32 tx_bytes_low;
> > +
> > +   /* word 1 :  */
> > +   u32 tx_bytes_high;
> > +
> > +   /* word 2 :  */
> > +   u32 tx_pkts_low;
> > +
> > +   /* word 3 :  */
> > +   u32 tx_pkts_high;
> > +
> > +   /* word 4 :  */
> > +   u32 rx_bytes_low;
> > +
> > +   /* word 5 :  */
> > +   u32 rx_bytes_high;
> > +
> > +   /* word 6 :  */
> > +   u32 rx_pkts_low;
> > +
> > +   /* word 7 :  */
> > +   u32 rx_pkts_high;
> > +
> > +   /* word 8 :  */
> > +   u32 rx_drops_low;
> > +
> > +   /* word 9 :  */
> > +   u32 rx_drops_high;
> > +};
> 
> struct zorg {
>   u32 ...;
>   u32 ...;
>   u32 ...;
>   u32 ...;
> 
>   u32 ...;
>   u32 ...;
>   u32 ...;
>   u32 ...;
> 
>   u32 ...;
>   u32 ...;
>   u32 ...;
> };
> 
> -> No comment needed to figure the offset.
>Comment removed => no need to check if offset starts at word 0 or word 1.
> 
> [...]
> > diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c 
> > b/drivers/net/ethernet/amazon/ena/ena_com.c
> > new file mode 100644
> > index 000..357e10f
> > --- /dev/null
> > +++ b/drivers/net/ethernet/amazon/ena/ena_com.c
> [...]
> > +static inline int 

Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-06-16 Thread Matt Wilson
On Thu, Jun 16, 2016 at 10:46:23PM +0200, Francois Romieu wrote:
> Netanel Belgazal  :
> [...]
> 
> Very limited review below.

I'll comment on the documentation (since I edited it heavily) but will
leave some the other parts for Netanel to answer.

> > diff --git a/Documentation/networking/ena.txt 
> > b/Documentation/networking/ena.txt
> > new file mode 100644
> > index 000..528544f
> > --- /dev/null
> > +++ b/Documentation/networking/ena.txt
> > @@ -0,0 +1,330 @@
> > +Linux kernel driver for Elastic Network Adapter (ENA) family:
> > +=
> > +
> > +Overview:
> > +=
> > +The ENA driver provides a modern Ethernet device interface optimized
> > +for high performance and low CPU overhead.
> 
> Marketing boilerplate. How much of it will still be true in 6 months ?
> 6 years ?
> 
> Hint: stay technical. If in doubt, make it boring. :o)

Hopefully it will still be true for quite a while. But you make a good
point, the overview here should stick with the technical details.

> [...]
> > +ENA devices allow high speed and low overhead Ethernet traffic
> > +processing by providing a dedicated Tx/Rx queue pair per host CPU, a
> > +dedicated MSI-X interrupt vector per Tx/Rx queue pair, adaptive
> > +interrupt moderation, and CPU cacheline optimized data placement.
> 
> How many queues exactly ?

It's negotiated with the adapter via the admin queue, and it will vary
based on the capabilities of a particular adapter. See:

rc = ena_com_get_feature(ena_dev, _resp,
 ENA_ADMIN_MAX_QUEUES_NUM);

in ena_com.c

> [...]
> > +Memory Allocations:
> > +===
> > +DMA Coherent buffers are allocated for the f`ollowing DMA rings:
> > +- Tx submission ring (For regular mode; for LLQ mode it is allocated
> > +  using kzalloc)
> > +- Tx completion ring
> > +- Rx submission ring
> > +- Rx completion ring
> > +- Admin submission ring
> > +- Admin completion ring
> > +- AENQ ring
> 
> Useless documentation (implementation detail).
> 
> Nobody will maintain it when the driver changes. Please remove.

Ack.

> [...]
> > +MTU:
> > +
> > +The driver supports an arbitrarily large MTU with a maximum that is
> > +negotiated with the device. The driver configures MTU using the
> > +SetFeature command (ENA_ADMIN_MTU property). The user can change MTU
> > +via ifconfig(8) and ip(8).
> 
> Please avoid advertising legacy ifconfig.

Ack.

> [...]
> > diff --git a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h 
> > b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
> > new file mode 100644
> > index 000..8516e5e
> > --- /dev/null
> > +++ b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
> [...]
> > +/* admin commands opcodes */
> > +enum ena_admin_aq_opcode {
> > +   /* create submission queue */
> > +   ENA_ADMIN_CREATE_SQ = 1,
> > +
> > +   /* destroy submission queue */
> > +   ENA_ADMIN_DESTROY_SQ = 2,
> > +
> > +   /* create completion queue */
> > +   ENA_ADMIN_CREATE_CQ = 3,
> > +
> > +   /* destroy completion queue */
> > +   ENA_ADMIN_DESTROY_CQ = 4,
> > +
> > +   /* get capabilities of particular feature */
> > +   ENA_ADMIN_GET_FEATURE = 8,
> > +
> > +   /* get capabilities of particular feature */
> > +   ENA_ADMIN_SET_FEATURE = 9,
> > +
> > +   /* get statistics */
> > +   ENA_ADMIN_GET_STATS = 11,
> > +};
> 
> The documentation above simply duplicates the member names -> useless
> visual payload.
> 
> Please replace space with tab before "=" to line things up.

I'll leave this to Netanel. A good amount of this comes from a unified
documentation / struct / register access generation script, and it
might need some massaging.

> [...]
> > +/* Basic Statistics Command. */
> > +struct ena_admin_basic_stats {
> > +   /* word 0 :  */
> > +   u32 tx_bytes_low;
> > +
> > +   /* word 1 :  */
> > +   u32 tx_bytes_high;
> > +
> > +   /* word 2 :  */
> > +   u32 tx_pkts_low;
> > +
> > +   /* word 3 :  */
> > +   u32 tx_pkts_high;
> > +
> > +   /* word 4 :  */
> > +   u32 rx_bytes_low;
> > +
> > +   /* word 5 :  */
> > +   u32 rx_bytes_high;
> > +
> > +   /* word 6 :  */
> > +   u32 rx_pkts_low;
> > +
> > +   /* word 7 :  */
> > +   u32 rx_pkts_high;
> > +
> > +   /* word 8 :  */
> > +   u32 rx_drops_low;
> > +
> > +   /* word 9 :  */
> > +   u32 rx_drops_high;
> > +};
> 
> struct zorg {
>   u32 ...;
>   u32 ...;
>   u32 ...;
>   u32 ...;
> 
>   u32 ...;
>   u32 ...;
>   u32 ...;
>   u32 ...;
> 
>   u32 ...;
>   u32 ...;
>   u32 ...;
> };
> 
> -> No comment needed to figure the offset.
>Comment removed => no need to check if offset starts at word 0 or word 1.
> 
> [...]
> > diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c 
> > b/drivers/net/ethernet/amazon/ena/ena_com.c
> > new file mode 100644
> > index 000..357e10f
> > --- /dev/null
> > +++ b/drivers/net/ethernet/amazon/ena/ena_com.c
> [...]
> > +static inline int ena_com_mem_addr_set(struct ena_com_dev 

Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-06-15 Thread Matt Wilson
On Wed, Jun 15, 2016 at 09:32:35PM +0300, Netanel Belgazal wrote:
> I removed HAVE_NETDEV_NAPI_LIST and fixed the ena print macro.
> I'll wait a bit to see if there are additional comment before I'll send the
> v2 patch.

Also, please don't top post (reply inline) and send emails as plain
text only. Have a look at [1] for how to set up a mailer like mutt
with Gmail.

--msw

[1] https://www.kernel.org/doc/Documentation/email-clients.txt

> On 15 June 2016 at 21:22, Matt Wilson <m...@amzn.com> wrote:
> 
> > On Wed, Jun 15, 2016 at 11:07:17AM -0700, Matt Wilson wrote:
> > >
> > > You also had a few typos in the email addresses on the Cc: line that
> > > I've corrected (d...@avemeloft.net -> da...@davemeloft.net,
> >
> > Argh, and of course I typo'ed the correction. -> da...@davemloft.net.
> >
> > --msw
> >


Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-06-15 Thread Matt Wilson
On Wed, Jun 15, 2016 at 09:32:35PM +0300, Netanel Belgazal wrote:
> I removed HAVE_NETDEV_NAPI_LIST and fixed the ena print macro.
> I'll wait a bit to see if there are additional comment before I'll send the
> v2 patch.

Also, please don't top post (reply inline) and send emails as plain
text only. Have a look at [1] for how to set up a mailer like mutt
with Gmail.

--msw

[1] https://www.kernel.org/doc/Documentation/email-clients.txt

> On 15 June 2016 at 21:22, Matt Wilson  wrote:
> 
> > On Wed, Jun 15, 2016 at 11:07:17AM -0700, Matt Wilson wrote:
> > >
> > > You also had a few typos in the email addresses on the Cc: line that
> > > I've corrected (d...@avemeloft.net -> da...@davemeloft.net,
> >
> > Argh, and of course I typo'ed the correction. -> da...@davemloft.net.
> >
> > --msw
> >


Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-06-15 Thread Matt Wilson
On Wed, Jun 15, 2016 at 11:07:17AM -0700, Matt Wilson wrote:
> 
> You also had a few typos in the email addresses on the Cc: line that
> I've corrected (d...@avemeloft.net -> da...@davemeloft.net,

Argh, and of course I typo'ed the correction. -> da...@davemloft.net.

--msw


Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-06-15 Thread Matt Wilson
On Wed, Jun 15, 2016 at 11:07:17AM -0700, Matt Wilson wrote:
> 
> You also had a few typos in the email addresses on the Cc: line that
> I've corrected (d...@avemeloft.net -> da...@davemeloft.net,

Argh, and of course I typo'ed the correction. -> da...@davemloft.net.

--msw


Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-06-15 Thread Matt Wilson
On Mon, Jun 13, 2016 at 11:46:13AM +0300, Netanel Belgazal wrote:
> This is a driver for the forthcoming ENA family of networking devices.
> 
> Signed-off-by: Netanel Belgazal 

[...]

> +
> +struct ena_napi {
> + struct napi_struct napi cacheline_aligned;
> + struct ena_ring *tx_ring;
> + struct ena_ring *rx_ring;
> +#ifndef HAVE_NETDEV_NAPI_LIST
> + struct net_device poll_dev;
> +#endif /* HAVE_NETDEV_NAPI_LIST */
> + u32 qid;
> +};

Netanel,

Sorry I missed this when I read through the patch earlier this week
(though I pointed it out before...). This "#ifndef
HAVE_NETDEV_NAPI_LIST" bit needs to be removed for the in-tree
version.

You also had a few typos in the email addresses on the Cc: line that
I've corrected (d...@avemeloft.net -> da...@davemeloft.net,
aligo...@amazon.com -> aligu...@amazon.com). Please make sure to
correct in your v2 patch.

--msw


Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-06-15 Thread Matt Wilson
On Mon, Jun 13, 2016 at 11:46:13AM +0300, Netanel Belgazal wrote:
> This is a driver for the forthcoming ENA family of networking devices.
> 
> Signed-off-by: Netanel Belgazal 

[...]

> +
> +struct ena_napi {
> + struct napi_struct napi cacheline_aligned;
> + struct ena_ring *tx_ring;
> + struct ena_ring *rx_ring;
> +#ifndef HAVE_NETDEV_NAPI_LIST
> + struct net_device poll_dev;
> +#endif /* HAVE_NETDEV_NAPI_LIST */
> + u32 qid;
> +};

Netanel,

Sorry I missed this when I read through the patch earlier this week
(though I pointed it out before...). This "#ifndef
HAVE_NETDEV_NAPI_LIST" bit needs to be removed for the in-tree
version.

You also had a few typos in the email addresses on the Cc: line that
I've corrected (d...@avemeloft.net -> da...@davemeloft.net,
aligo...@amazon.com -> aligu...@amazon.com). Please make sure to
correct in your v2 patch.

--msw


Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-06-15 Thread Matt Wilson
On Tue, Jun 14, 2016 at 11:27:09PM -0700, David Miller wrote:
> From: Matt Wilson <m...@amzn.com>
> Date: Tue, 14 Jun 2016 23:23:36 -0700
> 
> > Point taken, though existing drivers (even fairly popular ones) also
> > aren't as clean as you might like. A quick look around...
> 
> Existing drivers do undesirable things, film at 11...
> 
> Yet are never a reason to accept such things in new drivers.

I generally agree with this philosophy.

> > Like many other network drivers, some of this is common code used for
> > non-Linux systems, and that's why there is some overlap with Linux
> > facilities.
> 
> Again, never an excuse for such things.

I suppose I was just happy to have the *majorly* objectionable parts
cleaned up...

> > Are there other things that jump out at you?
> 
> I review hundreds of patches a day, I invested what I was able to
> before moving on to other people's work.

I wasn't asking you to do more, only if you had anything else you
wanted to say before Netanel sends a v2. 

> Other developers must help review such a large driver submission, it
> can't all be on me.

And I'm certainly not saying it's all on you. I've been reviewing this
with the team for quite a while to get it in pretty reasonable (IMHO)
shape.

--msw


Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-06-15 Thread Matt Wilson
On Tue, Jun 14, 2016 at 11:27:09PM -0700, David Miller wrote:
> From: Matt Wilson 
> Date: Tue, 14 Jun 2016 23:23:36 -0700
> 
> > Point taken, though existing drivers (even fairly popular ones) also
> > aren't as clean as you might like. A quick look around...
> 
> Existing drivers do undesirable things, film at 11...
> 
> Yet are never a reason to accept such things in new drivers.

I generally agree with this philosophy.

> > Like many other network drivers, some of this is common code used for
> > non-Linux systems, and that's why there is some overlap with Linux
> > facilities.
> 
> Again, never an excuse for such things.

I suppose I was just happy to have the *majorly* objectionable parts
cleaned up...

> > Are there other things that jump out at you?
> 
> I review hundreds of patches a day, I invested what I was able to
> before moving on to other people's work.

I wasn't asking you to do more, only if you had anything else you
wanted to say before Netanel sends a v2. 

> Other developers must help review such a large driver submission, it
> can't all be on me.

And I'm certainly not saying it's all on you. I've been reviewing this
with the team for quite a while to get it in pretty reasonable (IMHO)
shape.

--msw


Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-06-15 Thread Matt Wilson
On Tue, Jun 14, 2016 at 10:25:16PM -0700, David Miller wrote:
> From: Netanel Belgazal 
> Date: Mon, 13 Jun 2016 11:46:13 +0300
> 
> > +#define ena_trc_dbg(format, arg...) \
> > +   pr_debug("[ENA_COM: %s] " format, __func__, ##arg)
> > +#define ena_trc_info(format, arg...) \
> > +   pr_info("[ENA_COM: %s] " format, __func__, ##arg)
> > +#define ena_trc_warn(format, arg...) \
> > +   pr_warn("[ENA_COM: %s] " format, __func__, ##arg)
> > +#define ena_trc_err(format, arg...) \
> > +   pr_err("[ENA_COM: %s] " format, __func__, ##arg)
> 
> These custom tracing macros are quite inappropriate.
> 
> We have the function tracer in the kernel when that is needed.  So spitting
> out __func__ all over the place is not something that should be found in
> drivers these days.

Point taken, though existing drivers (even fairly popular ones) also
aren't as clean as you might like. A quick look around...

msw@carbon:~/git/upstream/linux$ git grep -B1 '__func__' drivers/net/ethernet/ 
| grep -A1 '#define' 
drivers/net/ethernet/broadcom/bnx2x/bnx2x.h-#define BNX2X_ERROR(fmt, ...)   
 \
drivers/net/ethernet/broadcom/bnx2x/bnx2x.h:pr_err("[%s:%d]" fmt, 
__func__, __LINE__, ##__VA_ARGS__)
[...]
drivers/net/ethernet/intel/ixgb/ixgb_osdep.h:#define ENTER() pr_debug("%s\n", 
__func__);

Like many other network drivers, some of this is common code used for
non-Linux systems, and that's why there is some overlap with Linux
facilities. For example, here's the common ENA parts as it's situated
in DPDK as a PMD:

   http://dpdk.org/browse/dpdk/tree/drivers/net/ena/base/ena_com.c

When you compare to the DPDK version you can see that the common code
has already been contextualized for Linux in this patch in
anticipation of this type of feedback. (e.g., ENA_SPINLOCK_LOCK() ->
spin_lock_irqsave(), etc., as that would obviously never fly).

The Linux-specific bits (ena_netdev.c, ena_ethtool.c, etc.) don't make
use of any of the overlapping functionality needed for the common
code.

> And one can modify pr_fmt do make pr_debug et al. have whatever prefix
> one wants.

Yup, that's an easy improvement.

> I suspect there will be several rounds of review to weed out things
> like this.  You can preempt a lot of that by removing as much in your
> driver that the kernel has existing facilities for.

Are there other things that jump out at you? I felt like this was
pretty good for an initial submission in terms of striking a balance
between using a portable core while avoiding a lot of compatibility
shims.

--msw


Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-06-15 Thread Matt Wilson
On Tue, Jun 14, 2016 at 10:25:16PM -0700, David Miller wrote:
> From: Netanel Belgazal 
> Date: Mon, 13 Jun 2016 11:46:13 +0300
> 
> > +#define ena_trc_dbg(format, arg...) \
> > +   pr_debug("[ENA_COM: %s] " format, __func__, ##arg)
> > +#define ena_trc_info(format, arg...) \
> > +   pr_info("[ENA_COM: %s] " format, __func__, ##arg)
> > +#define ena_trc_warn(format, arg...) \
> > +   pr_warn("[ENA_COM: %s] " format, __func__, ##arg)
> > +#define ena_trc_err(format, arg...) \
> > +   pr_err("[ENA_COM: %s] " format, __func__, ##arg)
> 
> These custom tracing macros are quite inappropriate.
> 
> We have the function tracer in the kernel when that is needed.  So spitting
> out __func__ all over the place is not something that should be found in
> drivers these days.

Point taken, though existing drivers (even fairly popular ones) also
aren't as clean as you might like. A quick look around...

msw@carbon:~/git/upstream/linux$ git grep -B1 '__func__' drivers/net/ethernet/ 
| grep -A1 '#define' 
drivers/net/ethernet/broadcom/bnx2x/bnx2x.h-#define BNX2X_ERROR(fmt, ...)   
 \
drivers/net/ethernet/broadcom/bnx2x/bnx2x.h:pr_err("[%s:%d]" fmt, 
__func__, __LINE__, ##__VA_ARGS__)
[...]
drivers/net/ethernet/intel/ixgb/ixgb_osdep.h:#define ENTER() pr_debug("%s\n", 
__func__);

Like many other network drivers, some of this is common code used for
non-Linux systems, and that's why there is some overlap with Linux
facilities. For example, here's the common ENA parts as it's situated
in DPDK as a PMD:

   http://dpdk.org/browse/dpdk/tree/drivers/net/ena/base/ena_com.c

When you compare to the DPDK version you can see that the common code
has already been contextualized for Linux in this patch in
anticipation of this type of feedback. (e.g., ENA_SPINLOCK_LOCK() ->
spin_lock_irqsave(), etc., as that would obviously never fly).

The Linux-specific bits (ena_netdev.c, ena_ethtool.c, etc.) don't make
use of any of the overlapping functionality needed for the common
code.

> And one can modify pr_fmt do make pr_debug et al. have whatever prefix
> one wants.

Yup, that's an easy improvement.

> I suspect there will be several rounds of review to weed out things
> like this.  You can preempt a lot of that by removing as much in your
> driver that the kernel has existing facilities for.

Are there other things that jump out at you? I felt like this was
pretty good for an initial submission in terms of striking a balance
between using a portable core while avoiding a lot of compatibility
shims.

--msw


Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-06-13 Thread Matt Wilson
On Mon, Jun 13, 2016 at 11:46:13AM +0300, Netanel Belgazal wrote:
> This is a driver for the forthcoming ENA family of networking devices.

Reviewed-by: Matt Wilson <m...@amazon.com>

> Signed-off-by: Netanel Belgazal <neta...@annapurnalabs.com>
> ---
>  Documentation/networking/00-INDEX |2 +
>  Documentation/networking/ena.txt  |  330 ++
>  MAINTAINERS   |9 +
>  drivers/net/ethernet/Kconfig  |1 +
>  drivers/net/ethernet/Makefile |1 +
>  drivers/net/ethernet/amazon/Kconfig   |   27 +
>  drivers/net/ethernet/amazon/Makefile  |5 +
>  drivers/net/ethernet/amazon/ena/Makefile  |9 +
>  drivers/net/ethernet/amazon/ena/ena_admin_defs.h  | 1246 
>  drivers/net/ethernet/amazon/ena/ena_com.c | 2768 +
>  drivers/net/ethernet/amazon/ena/ena_com.h | 1051 +++
>  drivers/net/ethernet/amazon/ena/ena_common_defs.h |   52 +
>  drivers/net/ethernet/amazon/ena/ena_eth_com.c |  508 
>  drivers/net/ethernet/amazon/ena/ena_eth_com.h |  160 +
>  drivers/net/ethernet/amazon/ena/ena_eth_io_defs.h |  460 +++
>  drivers/net/ethernet/amazon/ena/ena_ethtool.c |  836 +
>  drivers/net/ethernet/amazon/ena/ena_netdev.c  | 3362 
> +
>  drivers/net/ethernet/amazon/ena/ena_netdev.h  |  327 ++
>  drivers/net/ethernet/amazon/ena/ena_pci_id_tbl.h  |   67 +
>  drivers/net/ethernet/amazon/ena/ena_regs_defs.h   |  133 +
>  drivers/net/ethernet/amazon/ena/ena_sysfs.c   |  264 ++
>  drivers/net/ethernet/amazon/ena/ena_sysfs.h   |   55 +
>  22 files changed, 11673 insertions(+)
>  create mode 100644 Documentation/networking/ena.txt
>  create mode 100644 drivers/net/ethernet/amazon/Kconfig
>  create mode 100644 drivers/net/ethernet/amazon/Makefile
>  create mode 100644 drivers/net/ethernet/amazon/ena/Makefile
>  create mode 100644 drivers/net/ethernet/amazon/ena/ena_admin_defs.h
>  create mode 100644 drivers/net/ethernet/amazon/ena/ena_com.c
>  create mode 100644 drivers/net/ethernet/amazon/ena/ena_com.h
>  create mode 100644 drivers/net/ethernet/amazon/ena/ena_common_defs.h
>  create mode 100644 drivers/net/ethernet/amazon/ena/ena_eth_com.c
>  create mode 100644 drivers/net/ethernet/amazon/ena/ena_eth_com.h
>  create mode 100644 drivers/net/ethernet/amazon/ena/ena_eth_io_defs.h
>  create mode 100644 drivers/net/ethernet/amazon/ena/ena_ethtool.c
>  create mode 100644 drivers/net/ethernet/amazon/ena/ena_netdev.c
>  create mode 100644 drivers/net/ethernet/amazon/ena/ena_netdev.h
>  create mode 100644 drivers/net/ethernet/amazon/ena/ena_pci_id_tbl.h
>  create mode 100644 drivers/net/ethernet/amazon/ena/ena_regs_defs.h
>  create mode 100644 drivers/net/ethernet/amazon/ena/ena_sysfs.c
>  create mode 100644 drivers/net/ethernet/amazon/ena/ena_sysfs.h

[...]

--msw


Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-06-13 Thread Matt Wilson
On Mon, Jun 13, 2016 at 11:46:13AM +0300, Netanel Belgazal wrote:
> This is a driver for the forthcoming ENA family of networking devices.

Reviewed-by: Matt Wilson 

> Signed-off-by: Netanel Belgazal 
> ---
>  Documentation/networking/00-INDEX |2 +
>  Documentation/networking/ena.txt  |  330 ++
>  MAINTAINERS   |9 +
>  drivers/net/ethernet/Kconfig  |1 +
>  drivers/net/ethernet/Makefile |1 +
>  drivers/net/ethernet/amazon/Kconfig   |   27 +
>  drivers/net/ethernet/amazon/Makefile  |5 +
>  drivers/net/ethernet/amazon/ena/Makefile  |9 +
>  drivers/net/ethernet/amazon/ena/ena_admin_defs.h  | 1246 
>  drivers/net/ethernet/amazon/ena/ena_com.c | 2768 +
>  drivers/net/ethernet/amazon/ena/ena_com.h | 1051 +++
>  drivers/net/ethernet/amazon/ena/ena_common_defs.h |   52 +
>  drivers/net/ethernet/amazon/ena/ena_eth_com.c |  508 
>  drivers/net/ethernet/amazon/ena/ena_eth_com.h |  160 +
>  drivers/net/ethernet/amazon/ena/ena_eth_io_defs.h |  460 +++
>  drivers/net/ethernet/amazon/ena/ena_ethtool.c |  836 +
>  drivers/net/ethernet/amazon/ena/ena_netdev.c  | 3362 
> +
>  drivers/net/ethernet/amazon/ena/ena_netdev.h  |  327 ++
>  drivers/net/ethernet/amazon/ena/ena_pci_id_tbl.h  |   67 +
>  drivers/net/ethernet/amazon/ena/ena_regs_defs.h   |  133 +
>  drivers/net/ethernet/amazon/ena/ena_sysfs.c   |  264 ++
>  drivers/net/ethernet/amazon/ena/ena_sysfs.h   |   55 +
>  22 files changed, 11673 insertions(+)
>  create mode 100644 Documentation/networking/ena.txt
>  create mode 100644 drivers/net/ethernet/amazon/Kconfig
>  create mode 100644 drivers/net/ethernet/amazon/Makefile
>  create mode 100644 drivers/net/ethernet/amazon/ena/Makefile
>  create mode 100644 drivers/net/ethernet/amazon/ena/ena_admin_defs.h
>  create mode 100644 drivers/net/ethernet/amazon/ena/ena_com.c
>  create mode 100644 drivers/net/ethernet/amazon/ena/ena_com.h
>  create mode 100644 drivers/net/ethernet/amazon/ena/ena_common_defs.h
>  create mode 100644 drivers/net/ethernet/amazon/ena/ena_eth_com.c
>  create mode 100644 drivers/net/ethernet/amazon/ena/ena_eth_com.h
>  create mode 100644 drivers/net/ethernet/amazon/ena/ena_eth_io_defs.h
>  create mode 100644 drivers/net/ethernet/amazon/ena/ena_ethtool.c
>  create mode 100644 drivers/net/ethernet/amazon/ena/ena_netdev.c
>  create mode 100644 drivers/net/ethernet/amazon/ena/ena_netdev.h
>  create mode 100644 drivers/net/ethernet/amazon/ena/ena_pci_id_tbl.h
>  create mode 100644 drivers/net/ethernet/amazon/ena/ena_regs_defs.h
>  create mode 100644 drivers/net/ethernet/amazon/ena/ena_sysfs.c
>  create mode 100644 drivers/net/ethernet/amazon/ena/ena_sysfs.h

[...]

--msw


Re: [PATCH] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-03-15 Thread Matt Wilson
On Tue, Mar 15, 2016 at 01:19:07PM -0500, Dan Williams wrote:
> On Tue, 2016-03-15 at 12:50 +0200, Netanel Belgazal wrote:
> > +
> > +The ENA driver supports industry standard TCP/IP offload features
> > such
> > +as checksum offload and TCP transmit segmentation offload (TSO).
> > +
> > +Receive-side scaling (RSS) is supported for multi-core scaling.
> 
> Any chance it also supports VXLAN offload?

While there's room for tunnel offload fields in the TX/RX descriptors
(e.g., see ena_eth_io_tx_desc), there is no support at this time.

--msw


Re: [PATCH] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-03-15 Thread Matt Wilson
On Tue, Mar 15, 2016 at 01:19:07PM -0500, Dan Williams wrote:
> On Tue, 2016-03-15 at 12:50 +0200, Netanel Belgazal wrote:
> > +
> > +The ENA driver supports industry standard TCP/IP offload features
> > such
> > +as checksum offload and TCP transmit segmentation offload (TSO).
> > +
> > +Receive-side scaling (RSS) is supported for multi-core scaling.
> 
> Any chance it also supports VXLAN offload?

While there's room for tunnel offload fields in the TX/RX descriptors
(e.g., see ena_eth_io_tx_desc), there is no support at this time.

--msw


Re: [PATCH] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-03-15 Thread Matt Wilson
On Tue, Mar 15, 2016 at 12:50:06PM +0200, Netanel Belgazal wrote:
[...]
> diff --git a/drivers/net/ethernet/amazon/Kconfig 
> b/drivers/net/ethernet/amazon/Kconfig
> new file mode 100644
> index 000..bc4f240d
> --- /dev/null
> +++ b/drivers/net/ethernet/amazon/Kconfig
> @@ -0,0 +1,27 @@
> +#
> +# Amazon network device configuration
> +#
> +
> +config NET_VENDOR_AMAZON
> + bool "Amazon Devices"
> + default y
> + ---help---
> +   If you have a network (Ethernet) device belonging to this class, say 
> Y.
> +
> +   Note that the answer to this question doesn't directly affect the
> +   kernel: saying N will just cause the configurator to skip all
> +   the questions about amazon devices. If you say Y, you will be asked
> +   for your specific device in the following questions.
> +
> +if NET_VENDOR_AMAZON
> +
> +config ENA_ETHERNET
> + tristate "Elastic Network Adapter (ENA) support"
> + depends on (PCI_MSI && X86) || COMPILE_TEST

Remove COMPILE_TEST so that the kbuild-all robot won't complain. I
imagine that the drier should work on other systems that have MSI-X...

> + ---help---
> +   This driver supports Elastic Network Adapter (ENA) adapter"
> +
> +   To compile this driver as a module, choose M here.
> +   The module will be called ena.
> +
> +endif #NET_VENDOR_AMAZON

[...]

> diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c 
> b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> new file mode 100644
> index 000..41d7265
> --- /dev/null
> +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c

[...]

> +static int ena_init_rx_cpu_rmap(struct ena_adapter *adapter)
> +{
> + u32 i;
> + int rc;
> +
> + adapter->netdev->rx_cpu_rmap = alloc_irq_cpu_rmap(adapter->num_queues);
> + if (!adapter->netdev->rx_cpu_rmap)
> + return -ENOMEM;
> + for (i = 0; i < adapter->num_queues; i++) {
> + int irq_idx = ENA_IO_IRQ_IDX(i);
> +
> + rc = irq_cpu_rmap_add(adapter->netdev->rx_cpu_rmap,
> +   adapter->msix_entries[irq_idx].vector);
> + if (rc) {
> + free_irq_cpu_rmap(adapter->netdev->rx_cpu_rmap);
> + adapter->netdev->rx_cpu_rmap = NULL;
> + return rc;
> + }
> + }
> + return 0;
> +}

This should be in #ifdef CONFIG_RSS_ACCEL (and elsewhere where
netdev->rx_cpu_rmap is touched).

--msw


Re: [PATCH] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-03-15 Thread Matt Wilson
On Tue, Mar 15, 2016 at 12:50:06PM +0200, Netanel Belgazal wrote:
[...]
> diff --git a/drivers/net/ethernet/amazon/Kconfig 
> b/drivers/net/ethernet/amazon/Kconfig
> new file mode 100644
> index 000..bc4f240d
> --- /dev/null
> +++ b/drivers/net/ethernet/amazon/Kconfig
> @@ -0,0 +1,27 @@
> +#
> +# Amazon network device configuration
> +#
> +
> +config NET_VENDOR_AMAZON
> + bool "Amazon Devices"
> + default y
> + ---help---
> +   If you have a network (Ethernet) device belonging to this class, say 
> Y.
> +
> +   Note that the answer to this question doesn't directly affect the
> +   kernel: saying N will just cause the configurator to skip all
> +   the questions about amazon devices. If you say Y, you will be asked
> +   for your specific device in the following questions.
> +
> +if NET_VENDOR_AMAZON
> +
> +config ENA_ETHERNET
> + tristate "Elastic Network Adapter (ENA) support"
> + depends on (PCI_MSI && X86) || COMPILE_TEST

Remove COMPILE_TEST so that the kbuild-all robot won't complain. I
imagine that the drier should work on other systems that have MSI-X...

> + ---help---
> +   This driver supports Elastic Network Adapter (ENA) adapter"
> +
> +   To compile this driver as a module, choose M here.
> +   The module will be called ena.
> +
> +endif #NET_VENDOR_AMAZON

[...]

> diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c 
> b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> new file mode 100644
> index 000..41d7265
> --- /dev/null
> +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c

[...]

> +static int ena_init_rx_cpu_rmap(struct ena_adapter *adapter)
> +{
> + u32 i;
> + int rc;
> +
> + adapter->netdev->rx_cpu_rmap = alloc_irq_cpu_rmap(adapter->num_queues);
> + if (!adapter->netdev->rx_cpu_rmap)
> + return -ENOMEM;
> + for (i = 0; i < adapter->num_queues; i++) {
> + int irq_idx = ENA_IO_IRQ_IDX(i);
> +
> + rc = irq_cpu_rmap_add(adapter->netdev->rx_cpu_rmap,
> +   adapter->msix_entries[irq_idx].vector);
> + if (rc) {
> + free_irq_cpu_rmap(adapter->netdev->rx_cpu_rmap);
> + adapter->netdev->rx_cpu_rmap = NULL;
> + return rc;
> + }
> + }
> + return 0;
> +}

This should be in #ifdef CONFIG_RSS_ACCEL (and elsewhere where
netdev->rx_cpu_rmap is touched).

--msw


Re: [PATCH] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-03-15 Thread Matt Wilson
Hi Netanel,

Looks like the last round of internal review went to {lkml,netdev,etc},
so this should be considered a RFC patch.

The description still needs work.

On Tue, Mar 15, 2016 at 12:50:06PM +0200, Netanel Belgazal wrote:
> This is a driver for the Amazon ethernet ENA family.

ethernet -> Ethernet

> The driver operates variety of ENA adapters through
> feature negotiation with the adapter and upgradable commands set.
> ENA driver handles PCI Physical and Virtual ENA functions.
> 
> The ENA device is not yet released to public.
> He is expected to be released soon.

I would say "this is a driver for a device that will be available in
the future." I wouldn't say "He is expected to released soon."

> For the full specification of the device please refer to:
> 

Obviously this is a placeholder. I think that the included
documentation in ena.txt and in the code provides a reasonable theory
of operation, and you don't necessarily need to provide a pointer
here. If/when a full specification is available you can submit a patch
to ena.txt that points to the canonical location.

--msw

> ---
>  Documentation/networking/00-INDEX |2 +
>  Documentation/networking/ena.txt  |  330 +++
>  MAINTAINERS   |9 +
>  drivers/net/ethernet/Kconfig  |1 +
>  drivers/net/ethernet/Makefile |1 +
>  drivers/net/ethernet/amazon/Kconfig   |   27 +
>  drivers/net/ethernet/amazon/Makefile  |5 +
>  drivers/net/ethernet/amazon/ena/Makefile  |9 +
>  drivers/net/ethernet/amazon/ena/ena_admin_defs.h  | 1310 +
>  drivers/net/ethernet/amazon/ena/ena_com.c | 2730 ++
>  drivers/net/ethernet/amazon/ena/ena_com.h | 1040 +++
>  drivers/net/ethernet/amazon/ena/ena_common_defs.h |   52 +
>  drivers/net/ethernet/amazon/ena/ena_eth_com.c |  502 
>  drivers/net/ethernet/amazon/ena/ena_eth_com.h |  146 +
>  drivers/net/ethernet/amazon/ena/ena_eth_io_defs.h |  509 
>  drivers/net/ethernet/amazon/ena/ena_ethtool.c |  837 ++
>  drivers/net/ethernet/amazon/ena/ena_netdev.c  | 3179 
> +
>  drivers/net/ethernet/amazon/ena/ena_netdev.h  |  317 ++
>  drivers/net/ethernet/amazon/ena/ena_pci_id_tbl.h  |   77 +
>  drivers/net/ethernet/amazon/ena/ena_regs_defs.h   |  133 +
>  drivers/net/ethernet/amazon/ena/ena_sysfs.c   |  272 ++
>  drivers/net/ethernet/amazon/ena/ena_sysfs.h   |   55 +
[...]


Re: [PATCH] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-03-15 Thread Matt Wilson
Hi Netanel,

Looks like the last round of internal review went to {lkml,netdev,etc},
so this should be considered a RFC patch.

The description still needs work.

On Tue, Mar 15, 2016 at 12:50:06PM +0200, Netanel Belgazal wrote:
> This is a driver for the Amazon ethernet ENA family.

ethernet -> Ethernet

> The driver operates variety of ENA adapters through
> feature negotiation with the adapter and upgradable commands set.
> ENA driver handles PCI Physical and Virtual ENA functions.
> 
> The ENA device is not yet released to public.
> He is expected to be released soon.

I would say "this is a driver for a device that will be available in
the future." I wouldn't say "He is expected to released soon."

> For the full specification of the device please refer to:
> 

Obviously this is a placeholder. I think that the included
documentation in ena.txt and in the code provides a reasonable theory
of operation, and you don't necessarily need to provide a pointer
here. If/when a full specification is available you can submit a patch
to ena.txt that points to the canonical location.

--msw

> ---
>  Documentation/networking/00-INDEX |2 +
>  Documentation/networking/ena.txt  |  330 +++
>  MAINTAINERS   |9 +
>  drivers/net/ethernet/Kconfig  |1 +
>  drivers/net/ethernet/Makefile |1 +
>  drivers/net/ethernet/amazon/Kconfig   |   27 +
>  drivers/net/ethernet/amazon/Makefile  |5 +
>  drivers/net/ethernet/amazon/ena/Makefile  |9 +
>  drivers/net/ethernet/amazon/ena/ena_admin_defs.h  | 1310 +
>  drivers/net/ethernet/amazon/ena/ena_com.c | 2730 ++
>  drivers/net/ethernet/amazon/ena/ena_com.h | 1040 +++
>  drivers/net/ethernet/amazon/ena/ena_common_defs.h |   52 +
>  drivers/net/ethernet/amazon/ena/ena_eth_com.c |  502 
>  drivers/net/ethernet/amazon/ena/ena_eth_com.h |  146 +
>  drivers/net/ethernet/amazon/ena/ena_eth_io_defs.h |  509 
>  drivers/net/ethernet/amazon/ena/ena_ethtool.c |  837 ++
>  drivers/net/ethernet/amazon/ena/ena_netdev.c  | 3179 
> +
>  drivers/net/ethernet/amazon/ena/ena_netdev.h  |  317 ++
>  drivers/net/ethernet/amazon/ena/ena_pci_id_tbl.h  |   77 +
>  drivers/net/ethernet/amazon/ena/ena_regs_defs.h   |  133 +
>  drivers/net/ethernet/amazon/ena/ena_sysfs.c   |  272 ++
>  drivers/net/ethernet/amazon/ena/ena_sysfs.h   |   55 +
[...]


Re: [PATCH] virtio-pci: also bind to Amazon PCI vendor ID

2014-09-15 Thread Matt Wilson
61ba512-40e2-4cc9-843a-923143f3456c/pciidspec-11.doc
> > > "Test will read and properly concatenate PCI IDs and verify uniqueness"
> > > this is likely what you are running into: IDs must be unique,
> > > so if you want to put your driver in Microsoft's database,
> > > it must match a different set of IDs.
> > > But you should not need to change the vendor ID to make them unique,
> > > changing subsystem vendor ID will do.
> > >
> > > Did you try this?
> > 
> > You cannot submit a modified kvm-guest-drivers.git for WHQL
> > certification
> 
> Sorry about being unclear, that github link was just an example to give
> you the idea.  The point is to change the subsystem ID *of the device*.
> 
> > as the licensing is constructed in such a way as to
> > prevent that.
> >
> > >> Red Hat has non-redistributable Windows drivers and Microsoft
> > >> will not allow anyone else to WHQL certify drivers using that
> > >> vendor ID.
> > >
> > > Don't see what Red Hat's windows drivers have to do with Linux really.
> > > Amazon.com can do whatever it wants with its vendor ID, and if there is a
> > > hypervisor with a different vendor ID that can use the virtio drivers, 
> > > this
> > > patch is required.
> > > The following would be a reasonable commit log in that case:
> > >
> > > "Amazon.com uses PV devices with vendor ID 0x1d0f that are otherwise
> > >  compatible with Linux virtio drivers. Add 0x1d0f ID to the list
> > >  to make Linux work with these devices."
> > > Feel free to use :)
> > >
> > >
> > > But I'd like to note that by doing this on the hypervisor side,
> > > you lose the ability to run older Linux guests,
> > > and create work for all distros who now need to update their
> > > kernels to work with your ID, apparently for no good reason.
> > >
> > > So if this isn't out in the field yet, I would suggest examining
> > > the alternative listed above.
> > 
> > I am very happy to use any alternative mechanism that allows for
> > virtio to be used with a wide variety of guests.
> 
> I hope my idea above can work - it would be best for everyone.  I would
> try to explore it before giving up and changing the vendor ID.
> As I said this is just friendly advice, and not a nak to the patch,
> if your hypervisor has devices with your own ID, go ahead and
> add the ID to driver.
> 
> > Many other companies
> > have also struggled with this and AFAIK no one has had much success.
> 
> Question remains what the problem is.
> Until we have an explicit report saying "we tried XYZ and
> it failed" we are just shooting in the dark, aren't we?
> 
> 
> > > OTOH if it *is* decided this is going to be out there in the field, 
> > > please add
> > > the new devices to the PCI IDs list.
> > > http://pci-ids.ucw.cz/
> > > Otherwise there's no way to be sure someone won't try to
> > > use these IDs for something else.
> > 
> > PCI-SIG assigns vendor IDs and 0x1d0f is assigned to Amazon.  See
> > https://www.pcisig.com/membership/vid_search/
> > 
> > Vendors self-manage device IDs and we have allocated 0x1000-0x103f to
> > virtio devices.
> > 
> > >> That makes it impossible to use virtio drivers with
> > >> a Windows guest without changing the vendor ID.
> > >
> > > Hardly impossible: virtio drivers are available from a
> > > variety of sources.
> > 
> > Examples?
> > 
> > Regards,
> > 
> > Anthony Liguori
> 
> Fedora has drivers.
> Most other linux distros have drivers.
> Maybe not as good as RHEL ones but they are there, and
> their existance is the most likely reason no one bothered
> re-writing drivers.

Why not BSD license the drivers and get them into Windows Update?
Citrix donated their drivers to the Xen Project and licensed them
under the BSD license. It makes this process a lot easier...

See win-* prefixed projects in https://github.com/xenserver, now
moved to pvdrivers/win repositories in http://xenbits.xen.org/gitweb 

--msw

> Which does not mean you shouldn't do it. Be my guest.
> 
> My only point is commit log should not say "impossible to use".
> 
> 
> > >
> > > But this is IMO beside the point.
> > >
> > >> Cc: Matt Wilson 
> > >> Cc: Rusty Russell 
> > >> Cc: Michael Tsirkin 
> > >> Signed-off-by: Anthony Liguori 
> > >
> > > I'd like to see the response/confirmation of the above, and/or the
> > > commit log replaced before this patch is applied.
> > >
> > > Thanks!
> > >
> > >> ---
> > >>  drivers/virtio/virtio_pci.c |2 ++
> > >>  1 file changed, 2 insertions(+)
> > >>
> > >> diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
> > >> index 101db3f..9cbac33 100644
> > >> --- a/drivers/virtio/virtio_pci.c
> > >> +++ b/drivers/virtio/virtio_pci.c
> > >> @@ -93,6 +93,8 @@ struct virtio_pci_vq_info
> > >>  /* Qumranet donated their vendor ID for devices 0x1000 thru 0x10FF. */
> > >>  static DEFINE_PCI_DEVICE_TABLE(virtio_pci_id_table) = {
> > >>   { PCI_DEVICE(0x1af4, PCI_ANY_ID) },
> > >> + /* Amazon.com vendor ID */
> > >> + { PCI_DEVICE(0x1d0f, PCI_ANY_ID) },
> > >>   { 0 }
> > >>  };
> > >>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] virtio-pci: also bind to Amazon PCI vendor ID

2014-09-15 Thread Matt Wilson
 is to change the subsystem ID *of the device*.
 
  as the licensing is constructed in such a way as to
  prevent that.
 
   Red Hat has non-redistributable Windows drivers and Microsoft
   will not allow anyone else to WHQL certify drivers using that
   vendor ID.
  
   Don't see what Red Hat's windows drivers have to do with Linux really.
   Amazon.com can do whatever it wants with its vendor ID, and if there is a
   hypervisor with a different vendor ID that can use the virtio drivers, 
   this
   patch is required.
   The following would be a reasonable commit log in that case:
  
   Amazon.com uses PV devices with vendor ID 0x1d0f that are otherwise
compatible with Linux virtio drivers. Add 0x1d0f ID to the list
to make Linux work with these devices.
   Feel free to use :)
  
  
   But I'd like to note that by doing this on the hypervisor side,
   you lose the ability to run older Linux guests,
   and create work for all distros who now need to update their
   kernels to work with your ID, apparently for no good reason.
  
   So if this isn't out in the field yet, I would suggest examining
   the alternative listed above.
  
  I am very happy to use any alternative mechanism that allows for
  virtio to be used with a wide variety of guests.
 
 I hope my idea above can work - it would be best for everyone.  I would
 try to explore it before giving up and changing the vendor ID.
 As I said this is just friendly advice, and not a nak to the patch,
 if your hypervisor has devices with your own ID, go ahead and
 add the ID to driver.
 
  Many other companies
  have also struggled with this and AFAIK no one has had much success.
 
 Question remains what the problem is.
 Until we have an explicit report saying we tried XYZ and
 it failed we are just shooting in the dark, aren't we?
 
 
   OTOH if it *is* decided this is going to be out there in the field, 
   please add
   the new devices to the PCI IDs list.
   http://pci-ids.ucw.cz/
   Otherwise there's no way to be sure someone won't try to
   use these IDs for something else.
  
  PCI-SIG assigns vendor IDs and 0x1d0f is assigned to Amazon.  See
  https://www.pcisig.com/membership/vid_search/
  
  Vendors self-manage device IDs and we have allocated 0x1000-0x103f to
  virtio devices.
  
   That makes it impossible to use virtio drivers with
   a Windows guest without changing the vendor ID.
  
   Hardly impossible: virtio drivers are available from a
   variety of sources.
  
  Examples?
  
  Regards,
  
  Anthony Liguori
 
 Fedora has drivers.
 Most other linux distros have drivers.
 Maybe not as good as RHEL ones but they are there, and
 their existance is the most likely reason no one bothered
 re-writing drivers.

Why not BSD license the drivers and get them into Windows Update?
Citrix donated their drivers to the Xen Project and licensed them
under the BSD license. It makes this process a lot easier...

See win-* prefixed projects in https://github.com/xenserver, now
moved to pvdrivers/win repositories in http://xenbits.xen.org/gitweb 

--msw

 Which does not mean you shouldn't do it. Be my guest.
 
 My only point is commit log should not say impossible to use.
 
 
  
   But this is IMO beside the point.
  
   Cc: Matt Wilson m...@amazon.com
   Cc: Rusty Russell ru...@rustcorp.com.au
   Cc: Michael Tsirkin m...@redhat.com
   Signed-off-by: Anthony Liguori aligu...@amazon.com
  
   I'd like to see the response/confirmation of the above, and/or the
   commit log replaced before this patch is applied.
  
   Thanks!
  
   ---
drivers/virtio/virtio_pci.c |2 ++
1 file changed, 2 insertions(+)
  
   diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
   index 101db3f..9cbac33 100644
   --- a/drivers/virtio/virtio_pci.c
   +++ b/drivers/virtio/virtio_pci.c
   @@ -93,6 +93,8 @@ struct virtio_pci_vq_info
/* Qumranet donated their vendor ID for devices 0x1000 thru 0x10FF. */
static DEFINE_PCI_DEVICE_TABLE(virtio_pci_id_table) = {
 { PCI_DEVICE(0x1af4, PCI_ANY_ID) },
   + /* Amazon.com vendor ID */
   + { PCI_DEVICE(0x1d0f, PCI_ANY_ID) },
 { 0 }
};
  
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 2/2] xen/setup: Remap Xen Identity Mapped RAM

2014-08-11 Thread Matt Wilson
On Mon, Aug 11, 2014 at 10:34:01AM -0700, Matt Rushton wrote:
> Instead of ballooning up and down dom0 memory this remaps the existing mfns
> that were replaced by the identity map. The reason for this is that the
> existing implementation ballooned memory up and and down which caused dom0
> to have discontiguous pages. In some cases this resulted in the use of bounce
> buffers which reduced network I/O performance significantly. This change will
> honor the existing order of the pages with the exception of some boundary
> conditions.
> 
> To do this we need to update both the Linux p2m table and the Xen m2p table.
> Particular care must be taken when updating the p2m table since it's important
> to limit table memory consumption and reuse the existing leaf pages which get
> freed when an entire leaf page is set to the identity map. To implement this,
> mapping updates are grouped into blocks with table entries getting cached
> temporarily and then released.
> 
> On my test system before:
> Total pages: 2105014
> Total contiguous: 1640635
> 
> After:
> Total pages: 2105014
> Total contiguous: 2098904
> 
> Signed-off-by: Matthew Rushton 
> ---

Matt,

Could you provide a summary of v2->v3 changes here?

--msw

>  arch/x86/xen/p2m.c   |   23 +---
>  arch/x86/xen/p2m.h   |   15 ++
>  arch/x86/xen/setup.c |  370 
> +++---
>  3 files changed, 314 insertions(+), 94 deletions(-)
>  create mode 100644 arch/x86/xen/p2m.h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 2/2] xen/setup: Remap Xen Identity Mapped RAM

2014-08-11 Thread Matt Wilson
On Mon, Aug 11, 2014 at 10:34:01AM -0700, Matt Rushton wrote:
 Instead of ballooning up and down dom0 memory this remaps the existing mfns
 that were replaced by the identity map. The reason for this is that the
 existing implementation ballooned memory up and and down which caused dom0
 to have discontiguous pages. In some cases this resulted in the use of bounce
 buffers which reduced network I/O performance significantly. This change will
 honor the existing order of the pages with the exception of some boundary
 conditions.
 
 To do this we need to update both the Linux p2m table and the Xen m2p table.
 Particular care must be taken when updating the p2m table since it's important
 to limit table memory consumption and reuse the existing leaf pages which get
 freed when an entire leaf page is set to the identity map. To implement this,
 mapping updates are grouped into blocks with table entries getting cached
 temporarily and then released.
 
 On my test system before:
 Total pages: 2105014
 Total contiguous: 1640635
 
 After:
 Total pages: 2105014
 Total contiguous: 2098904
 
 Signed-off-by: Matthew Rushton mrush...@amazon.com
 ---

Matt,

Could you provide a summary of v2-v3 changes here?

--msw

  arch/x86/xen/p2m.c   |   23 +---
  arch/x86/xen/p2m.h   |   15 ++
  arch/x86/xen/setup.c |  370 
 +++---
  3 files changed, 314 insertions(+), 94 deletions(-)
  create mode 100644 arch/x86/xen/p2m.h
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] xen/setup: Remap Xen Identity Mapped RAM

2014-08-01 Thread Matt Wilson
On Fri, Aug 01, 2014 at 03:52:28PM +0100, David Vrabel wrote:
> On 31/07/14 18:43, David Vrabel wrote:
> > On 20/07/14 01:01, Matt Rushton wrote:
> >> Instead of ballooning up and down dom0 memory this remaps the existing mfns
> >> that were replaced by the identity map. The reason for this is that the
> >> existing implementation ballooned memory up and and down which caused dom0
> >> to have discontiguous pages. In some cases this resulted in the use of 
> >> bounce
> >> buffers which reduced network I/O performance significantly. This change 
> >> will
> >> honor the existing order of the pages with the exception of some boundary
> >> conditions.
> >>
> >> To do this we need to update both the Linux p2m table and the Xen m2p 
> >> table.
> >> Particular care must be taken when updating the p2m table since it's 
> >> important
> >> to limit table memory consumption and reuse the existing leaf pages which 
> >> get
> >> freed when an entire leaf page is set to the identity map. To implement 
> >> this,
> >> mapping updates are grouped into blocks with table entries getting cached
> >> temporarily and then released.
> >>
> >> On my test system before:
> >> Total pages: 2105014
> >> Total contiguous: 1640635
> >>
> >> After:
> >> Total pages: 2105014
> >> Total contiguous: 2098904
> > 
> > Applied to devel/for-linus-3.17
> 
> Unfortunately, this produces too many WARNINGs on some boxes or
> with certain configurations.

Hi David,

Do you have more information about the systems or configurations that
showed a problem?

--msw
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] xen/setup: Remap Xen Identity Mapped RAM

2014-08-01 Thread Matt Wilson
On Fri, Aug 01, 2014 at 03:52:28PM +0100, David Vrabel wrote:
 On 31/07/14 18:43, David Vrabel wrote:
  On 20/07/14 01:01, Matt Rushton wrote:
  Instead of ballooning up and down dom0 memory this remaps the existing mfns
  that were replaced by the identity map. The reason for this is that the
  existing implementation ballooned memory up and and down which caused dom0
  to have discontiguous pages. In some cases this resulted in the use of 
  bounce
  buffers which reduced network I/O performance significantly. This change 
  will
  honor the existing order of the pages with the exception of some boundary
  conditions.
 
  To do this we need to update both the Linux p2m table and the Xen m2p 
  table.
  Particular care must be taken when updating the p2m table since it's 
  important
  to limit table memory consumption and reuse the existing leaf pages which 
  get
  freed when an entire leaf page is set to the identity map. To implement 
  this,
  mapping updates are grouped into blocks with table entries getting cached
  temporarily and then released.
 
  On my test system before:
  Total pages: 2105014
  Total contiguous: 1640635
 
  After:
  Total pages: 2105014
  Total contiguous: 2098904
  
  Applied to devel/for-linus-3.17
 
 Unfortunately, this produces too many WARNINGs on some boxes or
 with certain configurations.

Hi David,

Do you have more information about the systems or configurations that
showed a problem?

--msw
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH v2 0/4] xen-blk: bug fixes

2014-02-06 Thread Matt Wilson
On Thu, Feb 06, 2014 at 11:20:04AM -0500, Konrad Rzeszutek Wilk wrote:
> On Wed, Feb 05, 2014 at 08:57:30PM -0800, Matt Wilson wrote:
> > On Tue, Feb 04, 2014 at 11:15:01AM -0400, Konrad Rzeszutek Wilk wrote:
> > > On Tue, Feb 04, 2014 at 11:26:11AM +0100, Roger Pau Monne wrote:
> > > > This series contain blkback bug fixes for memory leaks (patches 1 and 
> > > > 2) and a race (patch 3). Patch 4 removes blkif_request_segment_aligned 
> > > > since its memory layout is exactly the same as blkif_request_segment 
> > > > and should introduce no functional change.
> > > > 
> > > > All patches should be backported to stable branches, although the last 
> > > > one is not a functional change it will still be nice to have it for 
> > > > code correctness.
> > > 
> > > Matt and Matt, could you guys kindly take a look as well? Thank you!
> > 
> > Matt R. did some testing today and set up additional tests to run
> > overnight. He'll follow up after the overnight tests complete.
> 
> Thank you!

Just in case the various mailing list software ate Matt's messages, he
sent the following:

[PATCH v2 2/4] xen-blkback: fix memory leaks
Tested-by: Matt Rushton 
Reviewed-by: Matt Rushton 

[PATCH v2 3/4] xen-blkback: fix shutdown race
Tested-by: Matt Rushton 
Reviewed-by: Matt Rushton 

[PATCH v2 4/4] xen-blkif: drop struct blkif_request_segment_aligned
Tested-by: Matt Rushton 

I've separately sent suggestions to Matt on how to setup his mailer to
format messages per list etiquette and how to avoid breaking message
threading.

--msw
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH v2 0/4] xen-blk: bug fixes

2014-02-06 Thread Matt Wilson
On Thu, Feb 06, 2014 at 11:20:04AM -0500, Konrad Rzeszutek Wilk wrote:
 On Wed, Feb 05, 2014 at 08:57:30PM -0800, Matt Wilson wrote:
  On Tue, Feb 04, 2014 at 11:15:01AM -0400, Konrad Rzeszutek Wilk wrote:
   On Tue, Feb 04, 2014 at 11:26:11AM +0100, Roger Pau Monne wrote:
This series contain blkback bug fixes for memory leaks (patches 1 and 
2) and a race (patch 3). Patch 4 removes blkif_request_segment_aligned 
since its memory layout is exactly the same as blkif_request_segment 
and should introduce no functional change.

All patches should be backported to stable branches, although the last 
one is not a functional change it will still be nice to have it for 
code correctness.
   
   Matt and Matt, could you guys kindly take a look as well? Thank you!
  
  Matt R. did some testing today and set up additional tests to run
  overnight. He'll follow up after the overnight tests complete.
 
 Thank you!

Just in case the various mailing list software ate Matt's messages, he
sent the following:

[PATCH v2 2/4] xen-blkback: fix memory leaks
Tested-by: Matt Rushton mrush...@amazon.com
Reviewed-by: Matt Rushton mrush...@amazon.com

[PATCH v2 3/4] xen-blkback: fix shutdown race
Tested-by: Matt Rushton mrush...@amazon.com
Reviewed-by: Matt Rushton mrush...@amazon.com

[PATCH v2 4/4] xen-blkif: drop struct blkif_request_segment_aligned
Tested-by: Matt Rushton mrush...@amazon.com

I've separately sent suggestions to Matt on how to setup his mailer to
format messages per list etiquette and how to avoid breaking message
threading.

--msw
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH v2 0/4] xen-blk: bug fixes

2014-02-05 Thread Matt Wilson
On Tue, Feb 04, 2014 at 11:15:01AM -0400, Konrad Rzeszutek Wilk wrote:
> On Tue, Feb 04, 2014 at 11:26:11AM +0100, Roger Pau Monne wrote:
> > This series contain blkback bug fixes for memory leaks (patches 1 and 
> > 2) and a race (patch 3). Patch 4 removes blkif_request_segment_aligned 
> > since its memory layout is exactly the same as blkif_request_segment 
> > and should introduce no functional change.
> > 
> > All patches should be backported to stable branches, although the last 
> > one is not a functional change it will still be nice to have it for 
> > code correctness.
> 
> Matt and Matt, could you guys kindly take a look as well? Thank you!

Matt R. did some testing today and set up additional tests to run
overnight. He'll follow up after the overnight tests complete.

--msw
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH v2 0/4] xen-blk: bug fixes

2014-02-05 Thread Matt Wilson
On Tue, Feb 04, 2014 at 11:15:01AM -0400, Konrad Rzeszutek Wilk wrote:
 On Tue, Feb 04, 2014 at 11:26:11AM +0100, Roger Pau Monne wrote:
  This series contain blkback bug fixes for memory leaks (patches 1 and 
  2) and a race (patch 3). Patch 4 removes blkif_request_segment_aligned 
  since its memory layout is exactly the same as blkif_request_segment 
  and should introduce no functional change.
  
  All patches should be backported to stable branches, although the last 
  one is not a functional change it will still be nice to have it for 
  code correctness.
 
 Matt and Matt, could you guys kindly take a look as well? Thank you!

Matt R. did some testing today and set up additional tests to run
overnight. He'll follow up after the overnight tests complete.

--msw
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7] xen/grant-table: Avoid m2p_override during mapping

2014-02-03 Thread Matt Wilson
On Mon, Feb 03, 2014 at 01:24:58PM +, Zoltan Kiss wrote:
> The grant mapping API does m2p_override unnecessarily: only gntdev needs it,
> for blkback and future netback patches it just cause a lock contention, as
> those pages never go to userspace. Therefore this series does the following:
> - the original functions were renamed to __gnttab_[un]map_refs, with a new
>   parameter m2p_override
> - based on m2p_override either they follow the original behaviour, or just set
>   the private flag and call set_phys_to_machine
> - gnttab_[un]map_refs are now a wrapper to call __gnttab_[un]map_refs with
>   m2p_override false
> - a new function gnttab_[un]map_refs_userspace provides the old behaviour
> 
> It also removes a stray space from page.h and change ret to 0 if
> XENFEAT_auto_translated_physmap, as that is the only possible return value
> there.
> 
> v2:
> - move the storing of the old mfn in page->index to gnttab_map_refs
> - move the function header update to a separate patch
> 
> v3:
> - a new approach to retain old behaviour where it needed
> - squash the patches into one
> 
> v4:
> - move out the common bits from m2p* functions, and pass pfn/mfn as parameter
> - clear page->private before doing anything with the page, so 
> m2p_find_override
>   won't race with this
> 
> v5:
> - change return value handling in __gnttab_[un]map_refs
> - remove a stray space in page.h
> - add detail why ret = 0 now at some places
> 
> v6:
> - don't pass pfn to m2p* functions, just get it locally
> 
> v7:
> - the previous version broke build on ARM, as there is no need for those p2m
>   changes. I've put them into arch specific functions, which are stubs on arm
>
> Signed-off-by: Zoltan Kiss 
> Suggested-by: David Vrabel 

You're still forgetting that this was originally proposed by Anthony
Liguori .

https://lkml.kernel.org/r/1384307336-5328-1-git-send-email-anth...@codemonkey.ws

--msw
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] xen-blkback: bug fixes

2014-02-03 Thread Matt Wilson
On Tue, Jan 28, 2014 at 03:38:37PM -0400, Konrad Rzeszutek Wilk wrote:
> On Tue, Jan 28, 2014 at 06:43:32PM +0100, Roger Pau Monne wrote:
> > blkback bug fixes for memory leaks (patches 1 and 2) and a race 
> > (patch 3).
> 
> They all look OK to me. I've stuck them in my 'stable/for-jens-3.14'
> branch and are testing them now (hadn't pushed it yet).
> 
> Matt and Matt,
> 
> Could you take a look at the other two patches as well?

Sure, though somehow you didn't address your message to us, so I
didn't see it until today.

Matt Rushton did some review and testing on an earlier version that
came out fine. We'll give the final series a test since there was
still a bit of rework.

--msw

> David, Boris,
> 
> Are you OK with pushing those patches out to Jens Axboe if nobody
> gives an NACK by Friday?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] xen-blkback: bug fixes

2014-02-03 Thread Matt Wilson
On Tue, Jan 28, 2014 at 03:38:37PM -0400, Konrad Rzeszutek Wilk wrote:
 On Tue, Jan 28, 2014 at 06:43:32PM +0100, Roger Pau Monne wrote:
  blkback bug fixes for memory leaks (patches 1 and 2) and a race 
  (patch 3).
 
 They all look OK to me. I've stuck them in my 'stable/for-jens-3.14'
 branch and are testing them now (hadn't pushed it yet).
 
 Matt and Matt,
 
 Could you take a look at the other two patches as well?

Sure, though somehow you didn't address your message to us, so I
didn't see it until today.

Matt Rushton did some review and testing on an earlier version that
came out fine. We'll give the final series a test since there was
still a bit of rework.

--msw

 David, Boris,
 
 Are you OK with pushing those patches out to Jens Axboe if nobody
 gives an NACK by Friday?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7] xen/grant-table: Avoid m2p_override during mapping

2014-02-03 Thread Matt Wilson
On Mon, Feb 03, 2014 at 01:24:58PM +, Zoltan Kiss wrote:
 The grant mapping API does m2p_override unnecessarily: only gntdev needs it,
 for blkback and future netback patches it just cause a lock contention, as
 those pages never go to userspace. Therefore this series does the following:
 - the original functions were renamed to __gnttab_[un]map_refs, with a new
   parameter m2p_override
 - based on m2p_override either they follow the original behaviour, or just set
   the private flag and call set_phys_to_machine
 - gnttab_[un]map_refs are now a wrapper to call __gnttab_[un]map_refs with
   m2p_override false
 - a new function gnttab_[un]map_refs_userspace provides the old behaviour
 
 It also removes a stray space from page.h and change ret to 0 if
 XENFEAT_auto_translated_physmap, as that is the only possible return value
 there.
 
 v2:
 - move the storing of the old mfn in page-index to gnttab_map_refs
 - move the function header update to a separate patch
 
 v3:
 - a new approach to retain old behaviour where it needed
 - squash the patches into one
 
 v4:
 - move out the common bits from m2p* functions, and pass pfn/mfn as parameter
 - clear page-private before doing anything with the page, so 
 m2p_find_override
   won't race with this
 
 v5:
 - change return value handling in __gnttab_[un]map_refs
 - remove a stray space in page.h
 - add detail why ret = 0 now at some places
 
 v6:
 - don't pass pfn to m2p* functions, just get it locally
 
 v7:
 - the previous version broke build on ARM, as there is no need for those p2m
   changes. I've put them into arch specific functions, which are stubs on arm

 Signed-off-by: Zoltan Kiss zoltan.k...@citrix.com
 Suggested-by: David Vrabel david.vra...@citrix.com

You're still forgetting that this was originally proposed by Anthony
Liguori aligu...@amazon.com.

https://lkml.kernel.org/r/1384307336-5328-1-git-send-email-anth...@codemonkey.ws

--msw
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] xen-blkback: fix memory leaks

2014-01-27 Thread Matt Wilson
On Mon, Jan 27, 2014 at 05:19:41PM +0100, Roger Pau Monné wrote:
> On 27/01/14 17:09, Konrad Rzeszutek Wilk wrote:
> > On Mon, Jan 27, 2014 at 11:13:41AM +0100, Roger Pau Monne wrote:
> >> I've at least identified two possible memory leaks in blkback, both
> >> related to the shutdown path of a VBD:
> >>
> >> - We don't wait for any pending purge work to finish before cleaning
> >>   the list of free_pages. The purge work will call put_free_pages and
> >>   thus we might end up with pages being added to the free_pages list
> >>   after we have emptied it.
> >> - We don't wait for pending requests to end before cleaning persistent
> >>   grants and the list of free_pages. Again this can add pages to the
> >>   free_pages lists or persistent grants to the persistent_gnts
> >>   red-black tree.
> >>
> >> Also, add some checks in xen_blkif_free to make sure we are cleaning
> >> everything.
> >>
> >> Signed-off-by: Roger Pau Monné 
> >> Cc: Konrad Rzeszutek Wilk 
> >> Cc: David Vrabel 
> >> Cc: Boris Ostrovsky 
> >> Cc: Matt Rushton 
> >> Cc: Matt Wilson 
> >> Cc: Ian Campbell 
> >> ---
> >> This should be applied after the patch:
> >>
> >> xen-blkback: fix memory leak when persistent grants are used
> > 
> > Could you respin the series with the issues below fixed and
> > have said patch as part of the series. That way not only does
> > it have your SoB on it but it makes it easier to apply the patch
> > for lazy^H^H^Hbusy maintainers and makes it clear that you had
> > tested both of them.
> > 
> > Also, please CC Jens Axboe on these patches.
> 
> Ack, will do once the comments below are sorted out.

We'll keep an eye out for the resubmitted series and do some testing.

Thanks for taking this the extra step.

--msw

> >> >From Matt Rushton & Matt Wilson and backported to stable.
> >>
> >> I've been able to create and destroy ~4000 guests while doing heavy IO
> >> operations with this patch on a 512M Dom0 without problems.
> >> ---
> >>  drivers/block/xen-blkback/blkback.c |   29 +++--
> >>  drivers/block/xen-blkback/xenbus.c  |9 +
> >>  2 files changed, 28 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/drivers/block/xen-blkback/blkback.c 
> >> b/drivers/block/xen-blkback/blkback.c
> >> index 30ef7b3..19925b7 100644
> >> --- a/drivers/block/xen-blkback/blkback.c
> >> +++ b/drivers/block/xen-blkback/blkback.c
> >> @@ -169,6 +169,7 @@ static int dispatch_rw_block_io(struct xen_blkif 
> >> *blkif,
> >>struct pending_req *pending_req);
> >>  static void make_response(struct xen_blkif *blkif, u64 id,
> >>  unsigned short op, int st);
> >> +static void xen_blk_drain_io(struct xen_blkif *blkif, bool force);
> >>  
> >>  #define foreach_grant_safe(pos, n, rbtree, node) \
> >>for ((pos) = container_of(rb_first((rbtree)), typeof(*(pos)), node), \
> >> @@ -625,6 +626,12 @@ purge_gnt_list:
> >>print_stats(blkif);
> >>}
> >>  
> >> +  /* Drain pending IO */
> >> +  xen_blk_drain_io(blkif, true);
> >> +
> >> +  /* Drain pending purge work */
> >> +  flush_work(>persistent_purge_work);
> >> +
> >>/* Free all persistent grant pages */
> >>if (!RB_EMPTY_ROOT(>persistent_gnts))
> >>free_persistent_gnts(blkif, >persistent_gnts,
> >> @@ -930,7 +937,7 @@ static int dispatch_other_io(struct xen_blkif *blkif,
> >>return -EIO;
> >>  }
> >>  
> >> -static void xen_blk_drain_io(struct xen_blkif *blkif)
> >> +static void xen_blk_drain_io(struct xen_blkif *blkif, bool force)
> >>  {
> >>atomic_set(>drain, 1);
> >>do {
> >> @@ -943,7 +950,7 @@ static void xen_blk_drain_io(struct xen_blkif *blkif)
> >>  
> >>if (!atomic_read(>drain))
> >>break;
> >> -  } while (!kthread_should_stop());
> >> +  } while (!kthread_should_stop() || force);
> >>atomic_set(>drain, 0);
> >>  }
> >>  
> >> @@ -976,17 +983,19 @@ static void __end_block_io_op(struct pending_req 
> >> *pending_req, int error)
> >> * the proper response on the ring.
> >> */
> >>if (atomic_dec_and_test(_req->pendcnt)) {
> >> -  xen

Re: [PATCH] xen-blkback: fix memory leaks

2014-01-27 Thread Matt Wilson
On Mon, Jan 27, 2014 at 05:19:41PM +0100, Roger Pau Monné wrote:
 On 27/01/14 17:09, Konrad Rzeszutek Wilk wrote:
  On Mon, Jan 27, 2014 at 11:13:41AM +0100, Roger Pau Monne wrote:
  I've at least identified two possible memory leaks in blkback, both
  related to the shutdown path of a VBD:
 
  - We don't wait for any pending purge work to finish before cleaning
the list of free_pages. The purge work will call put_free_pages and
thus we might end up with pages being added to the free_pages list
after we have emptied it.
  - We don't wait for pending requests to end before cleaning persistent
grants and the list of free_pages. Again this can add pages to the
free_pages lists or persistent grants to the persistent_gnts
red-black tree.
 
  Also, add some checks in xen_blkif_free to make sure we are cleaning
  everything.
 
  Signed-off-by: Roger Pau Monné roger@citrix.com
  Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com
  Cc: David Vrabel david.vra...@citrix.com
  Cc: Boris Ostrovsky boris.ostrov...@oracle.com
  Cc: Matt Rushton mrush...@amazon.com
  Cc: Matt Wilson m...@amazon.com
  Cc: Ian Campbell ian.campb...@citrix.com
  ---
  This should be applied after the patch:
 
  xen-blkback: fix memory leak when persistent grants are used
  
  Could you respin the series with the issues below fixed and
  have said patch as part of the series. That way not only does
  it have your SoB on it but it makes it easier to apply the patch
  for lazy^H^H^Hbusy maintainers and makes it clear that you had
  tested both of them.
  
  Also, please CC Jens Axboe on these patches.
 
 Ack, will do once the comments below are sorted out.

We'll keep an eye out for the resubmitted series and do some testing.

Thanks for taking this the extra step.

--msw

  From Matt Rushton  Matt Wilson and backported to stable.
 
  I've been able to create and destroy ~4000 guests while doing heavy IO
  operations with this patch on a 512M Dom0 without problems.
  ---
   drivers/block/xen-blkback/blkback.c |   29 +++--
   drivers/block/xen-blkback/xenbus.c  |9 +
   2 files changed, 28 insertions(+), 10 deletions(-)
 
  diff --git a/drivers/block/xen-blkback/blkback.c 
  b/drivers/block/xen-blkback/blkback.c
  index 30ef7b3..19925b7 100644
  --- a/drivers/block/xen-blkback/blkback.c
  +++ b/drivers/block/xen-blkback/blkback.c
  @@ -169,6 +169,7 @@ static int dispatch_rw_block_io(struct xen_blkif 
  *blkif,
 struct pending_req *pending_req);
   static void make_response(struct xen_blkif *blkif, u64 id,
   unsigned short op, int st);
  +static void xen_blk_drain_io(struct xen_blkif *blkif, bool force);
   
   #define foreach_grant_safe(pos, n, rbtree, node) \
 for ((pos) = container_of(rb_first((rbtree)), typeof(*(pos)), node), \
  @@ -625,6 +626,12 @@ purge_gnt_list:
 print_stats(blkif);
 }
   
  +  /* Drain pending IO */
  +  xen_blk_drain_io(blkif, true);
  +
  +  /* Drain pending purge work */
  +  flush_work(blkif-persistent_purge_work);
  +
 /* Free all persistent grant pages */
 if (!RB_EMPTY_ROOT(blkif-persistent_gnts))
 free_persistent_gnts(blkif, blkif-persistent_gnts,
  @@ -930,7 +937,7 @@ static int dispatch_other_io(struct xen_blkif *blkif,
 return -EIO;
   }
   
  -static void xen_blk_drain_io(struct xen_blkif *blkif)
  +static void xen_blk_drain_io(struct xen_blkif *blkif, bool force)
   {
 atomic_set(blkif-drain, 1);
 do {
  @@ -943,7 +950,7 @@ static void xen_blk_drain_io(struct xen_blkif *blkif)
   
 if (!atomic_read(blkif-drain))
 break;
  -  } while (!kthread_should_stop());
  +  } while (!kthread_should_stop() || force);
 atomic_set(blkif-drain, 0);
   }
   
  @@ -976,17 +983,19 @@ static void __end_block_io_op(struct pending_req 
  *pending_req, int error)
  * the proper response on the ring.
  */
 if (atomic_dec_and_test(pending_req-pendcnt)) {
  -  xen_blkbk_unmap(pending_req-blkif,
  +  struct xen_blkif *blkif = pending_req-blkif;
  +
  +  xen_blkbk_unmap(blkif,
 pending_req-segments,
 pending_req-nr_pages);
  -  make_response(pending_req-blkif, pending_req-id,
  +  make_response(blkif, pending_req-id,
   pending_req-operation, pending_req-status);
  -  xen_blkif_put(pending_req-blkif);
  -  if (atomic_read(pending_req-blkif-refcnt) = 2) {
  -  if (atomic_read(pending_req-blkif-drain))
  -  complete(pending_req-blkif-drain_complete);
  +  free_req(blkif, pending_req);
  +  xen_blkif_put(blkif);
  +  if (atomic_read(blkif-refcnt) = 2) {
  +  if (atomic_read(blkif-drain))
  +  complete(blkif-drain_complete);
 }
  -  free_req(pending_req-blkif

Re: [Xen-devel] [PATCH] xen-blkback: fix memory leak when persistent grants are used

2014-01-25 Thread Matt Wilson
On Fri, Jan 24, 2014 at 03:36:22PM +, Ian Campbell wrote:
> On Fri, 2014-01-24 at 09:21 +, Ian Campbell wrote:
> > On Thu, 2014-01-23 at 11:28 -0800, Matt Wilson wrote:
> > > From: Matt Rushton 
> > > 
> > > Currently shrink_free_pagepool() is called before the pages used for
> > > persistent grants are released via free_persistent_gnts(). This
> > > results in a memory leak when a VBD that uses persistent grants is
> > > torn down.
> > 
> > This may well be the explanation for the memory leak I was observing on
> > ARM last night. I'll give it a go and let you know.
> 
> Results are a bit inconclusive unfortunately, it seems like I am seeing
> some other leak too (or instead).
> 
> Totally unscientifically it does seem to be leaking more slowly than
> before, so perhaps this patch has helped, but nothing conclusive I'm
> afraid.

Testing here looks good. I don't know if perhaps something else is
going on with ARM...

> I don't think that quite qualifies for a Tested-by though, sorry.

How about an Acked-by? ;-)

--msw

> Ian. 
> 
> > 
> > > Cc: Konrad Rzeszutek Wilk 
> > > Cc: "Roger Pau Monné" 
> > > Cc: Ian Campbell 
> > > Cc: David Vrabel 
> > > Cc: linux-kernel@vger.kernel.org
> > > Cc: xen-de...@lists.xen.org
> > > Cc: Anthony Liguori 
> > > Signed-off-by: Matt Rushton 
> > > Signed-off-by: Matt Wilson 
> > > ---
> > >  drivers/block/xen-blkback/blkback.c |6 +++---
> > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/block/xen-blkback/blkback.c 
> > > b/drivers/block/xen-blkback/blkback.c
> > > index 6620b73..30ef7b3 100644
> > > --- a/drivers/block/xen-blkback/blkback.c
> > > +++ b/drivers/block/xen-blkback/blkback.c
> > > @@ -625,9 +625,6 @@ purge_gnt_list:
> > >   print_stats(blkif);
> > >   }
> > >  
> > > - /* Since we are shutting down remove all pages from the buffer */
> > > - shrink_free_pagepool(blkif, 0 /* All */);
> > > -
> > >   /* Free all persistent grant pages */
> > >   if (!RB_EMPTY_ROOT(>persistent_gnts))
> > >   free_persistent_gnts(blkif, >persistent_gnts,
> > > @@ -636,6 +633,9 @@ purge_gnt_list:
> > >   BUG_ON(!RB_EMPTY_ROOT(>persistent_gnts));
> > >   blkif->persistent_gnt_c = 0;
> > >  
> > > + /* Since we are shutting down remove all pages from the buffer */
> > > + shrink_free_pagepool(blkif, 0 /* All */);
> > > +
> > >   if (log_stats)
> > >   print_stats(blkif);
> > >  
> > 
> > 
> > 
> > ___
> > Xen-devel mailing list
> > xen-de...@lists.xen.org
> > http://lists.xen.org/xen-devel
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH] xen-blkback: fix memory leak when persistent grants are used

2014-01-25 Thread Matt Wilson
On Fri, Jan 24, 2014 at 03:36:22PM +, Ian Campbell wrote:
 On Fri, 2014-01-24 at 09:21 +, Ian Campbell wrote:
  On Thu, 2014-01-23 at 11:28 -0800, Matt Wilson wrote:
   From: Matt Rushton mrush...@amazon.com
   
   Currently shrink_free_pagepool() is called before the pages used for
   persistent grants are released via free_persistent_gnts(). This
   results in a memory leak when a VBD that uses persistent grants is
   torn down.
  
  This may well be the explanation for the memory leak I was observing on
  ARM last night. I'll give it a go and let you know.
 
 Results are a bit inconclusive unfortunately, it seems like I am seeing
 some other leak too (or instead).
 
 Totally unscientifically it does seem to be leaking more slowly than
 before, so perhaps this patch has helped, but nothing conclusive I'm
 afraid.

Testing here looks good. I don't know if perhaps something else is
going on with ARM...

 I don't think that quite qualifies for a Tested-by though, sorry.

How about an Acked-by? ;-)

--msw

 Ian. 
 
  
   Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com
   Cc: Roger Pau Monné roger@citrix.com
   Cc: Ian Campbell ian.campb...@citrix.com
   Cc: David Vrabel david.vra...@citrix.com
   Cc: linux-kernel@vger.kernel.org
   Cc: xen-de...@lists.xen.org
   Cc: Anthony Liguori aligu...@amazon.com
   Signed-off-by: Matt Rushton mrush...@amazon.com
   Signed-off-by: Matt Wilson m...@amazon.com
   ---
drivers/block/xen-blkback/blkback.c |6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
   
   diff --git a/drivers/block/xen-blkback/blkback.c 
   b/drivers/block/xen-blkback/blkback.c
   index 6620b73..30ef7b3 100644
   --- a/drivers/block/xen-blkback/blkback.c
   +++ b/drivers/block/xen-blkback/blkback.c
   @@ -625,9 +625,6 @@ purge_gnt_list:
 print_stats(blkif);
 }

   - /* Since we are shutting down remove all pages from the buffer */
   - shrink_free_pagepool(blkif, 0 /* All */);
   -
 /* Free all persistent grant pages */
 if (!RB_EMPTY_ROOT(blkif-persistent_gnts))
 free_persistent_gnts(blkif, blkif-persistent_gnts,
   @@ -636,6 +633,9 @@ purge_gnt_list:
 BUG_ON(!RB_EMPTY_ROOT(blkif-persistent_gnts));
 blkif-persistent_gnt_c = 0;

   + /* Since we are shutting down remove all pages from the buffer */
   + shrink_free_pagepool(blkif, 0 /* All */);
   +
 if (log_stats)
 print_stats(blkif);

  
  
  
  ___
  Xen-devel mailing list
  xen-de...@lists.xen.org
  http://lists.xen.org/xen-devel
 
 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v6] xen/grant-table: Avoid m2p_override during mapping

2014-01-23 Thread Matt Wilson
On Thu, Jan 23, 2014 at 09:23:44PM +, Zoltan Kiss wrote:
> The grant mapping API does m2p_override unnecessarily: only gntdev needs it,
> for blkback and future netback patches it just cause a lock contention, as
> those pages never go to userspace. Therefore this series does the following:
> - the original functions were renamed to __gnttab_[un]map_refs, with a new
>   parameter m2p_override
> - based on m2p_override either they follow the original behaviour, or just set
>   the private flag and call set_phys_to_machine
> - gnttab_[un]map_refs are now a wrapper to call __gnttab_[un]map_refs with
>   m2p_override false
> - a new function gnttab_[un]map_refs_userspace provides the old behaviour
> 
> It also removes a stray space from page.h and change ret to 0 if
> XENFEAT_auto_translated_physmap, as that is the only possible return value
> there.
> 
> v2:
> - move the storing of the old mfn in page->index to gnttab_map_refs
> - move the function header update to a separate patch
> 
> v3:
> - a new approach to retain old behaviour where it needed
> - squash the patches into one
> 
> v4:
> - move out the common bits from m2p* functions, and pass pfn/mfn as parameter
> - clear page->private before doing anything with the page, so 
> m2p_find_override
>   won't race with this
> 
> v5:
> - change return value handling in __gnttab_[un]map_refs
> - remove a stray space in page.h
> - add detail why ret = 0 now at some places
> 
> v6:
> - don't pass pfn to m2p* functions, just get it locally
> 
> Signed-off-by: Zoltan Kiss 
> Suggested-by: David Vrabel 

Apologies for coming in late on this thread. I'm quite behind on
xen-devel mail that isn't CC: to me.

It seems to have been forgotten that Anthony and I proposed a similar
change last November.

https://lkml.kernel.org/r/1384307336-5328-1-git-send-email-anth...@codemonkey.ws

Or am I misunderstanding the change?

--msw
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] xen-blkback: fix memory leak when persistent grants are used

2014-01-23 Thread Matt Wilson
From: Matt Rushton 

Currently shrink_free_pagepool() is called before the pages used for
persistent grants are released via free_persistent_gnts(). This
results in a memory leak when a VBD that uses persistent grants is
torn down.

Cc: Konrad Rzeszutek Wilk 
Cc: "Roger Pau Monné" 
Cc: Ian Campbell 
Cc: David Vrabel 
Cc: linux-kernel@vger.kernel.org
Cc: xen-de...@lists.xen.org
Cc: Anthony Liguori 
Signed-off-by: Matt Rushton 
Signed-off-by: Matt Wilson 
---
 drivers/block/xen-blkback/blkback.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c 
b/drivers/block/xen-blkback/blkback.c
index 6620b73..30ef7b3 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -625,9 +625,6 @@ purge_gnt_list:
print_stats(blkif);
}
 
-   /* Since we are shutting down remove all pages from the buffer */
-   shrink_free_pagepool(blkif, 0 /* All */);
-
/* Free all persistent grant pages */
if (!RB_EMPTY_ROOT(>persistent_gnts))
free_persistent_gnts(blkif, >persistent_gnts,
@@ -636,6 +633,9 @@ purge_gnt_list:
BUG_ON(!RB_EMPTY_ROOT(>persistent_gnts));
blkif->persistent_gnt_c = 0;
 
+   /* Since we are shutting down remove all pages from the buffer */
+   shrink_free_pagepool(blkif, 0 /* All */);
+
if (log_stats)
print_stats(blkif);
 
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] xen-blkback: fix memory leak when persistent grants are used

2014-01-23 Thread Matt Wilson
From: Matt Rushton mrush...@amazon.com

Currently shrink_free_pagepool() is called before the pages used for
persistent grants are released via free_persistent_gnts(). This
results in a memory leak when a VBD that uses persistent grants is
torn down.

Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Cc: Roger Pau Monné roger@citrix.com
Cc: Ian Campbell ian.campb...@citrix.com
Cc: David Vrabel david.vra...@citrix.com
Cc: linux-kernel@vger.kernel.org
Cc: xen-de...@lists.xen.org
Cc: Anthony Liguori aligu...@amazon.com
Signed-off-by: Matt Rushton mrush...@amazon.com
Signed-off-by: Matt Wilson m...@amazon.com
---
 drivers/block/xen-blkback/blkback.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c 
b/drivers/block/xen-blkback/blkback.c
index 6620b73..30ef7b3 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -625,9 +625,6 @@ purge_gnt_list:
print_stats(blkif);
}
 
-   /* Since we are shutting down remove all pages from the buffer */
-   shrink_free_pagepool(blkif, 0 /* All */);
-
/* Free all persistent grant pages */
if (!RB_EMPTY_ROOT(blkif-persistent_gnts))
free_persistent_gnts(blkif, blkif-persistent_gnts,
@@ -636,6 +633,9 @@ purge_gnt_list:
BUG_ON(!RB_EMPTY_ROOT(blkif-persistent_gnts));
blkif-persistent_gnt_c = 0;
 
+   /* Since we are shutting down remove all pages from the buffer */
+   shrink_free_pagepool(blkif, 0 /* All */);
+
if (log_stats)
print_stats(blkif);
 
-- 
1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v6] xen/grant-table: Avoid m2p_override during mapping

2014-01-23 Thread Matt Wilson
On Thu, Jan 23, 2014 at 09:23:44PM +, Zoltan Kiss wrote:
 The grant mapping API does m2p_override unnecessarily: only gntdev needs it,
 for blkback and future netback patches it just cause a lock contention, as
 those pages never go to userspace. Therefore this series does the following:
 - the original functions were renamed to __gnttab_[un]map_refs, with a new
   parameter m2p_override
 - based on m2p_override either they follow the original behaviour, or just set
   the private flag and call set_phys_to_machine
 - gnttab_[un]map_refs are now a wrapper to call __gnttab_[un]map_refs with
   m2p_override false
 - a new function gnttab_[un]map_refs_userspace provides the old behaviour
 
 It also removes a stray space from page.h and change ret to 0 if
 XENFEAT_auto_translated_physmap, as that is the only possible return value
 there.
 
 v2:
 - move the storing of the old mfn in page-index to gnttab_map_refs
 - move the function header update to a separate patch
 
 v3:
 - a new approach to retain old behaviour where it needed
 - squash the patches into one
 
 v4:
 - move out the common bits from m2p* functions, and pass pfn/mfn as parameter
 - clear page-private before doing anything with the page, so 
 m2p_find_override
   won't race with this
 
 v5:
 - change return value handling in __gnttab_[un]map_refs
 - remove a stray space in page.h
 - add detail why ret = 0 now at some places
 
 v6:
 - don't pass pfn to m2p* functions, just get it locally
 
 Signed-off-by: Zoltan Kiss zoltan.k...@citrix.com
 Suggested-by: David Vrabel david.vra...@citrix.com

Apologies for coming in late on this thread. I'm quite behind on
xen-devel mail that isn't CC: to me.

It seems to have been forgotten that Anthony and I proposed a similar
change last November.

https://lkml.kernel.org/r/1384307336-5328-1-git-send-email-anth...@codemonkey.ws

Or am I misunderstanding the change?

--msw
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] xen/gnttab: leave lazy MMU mode in the case of a m2p override failure

2013-11-20 Thread Matt Wilson
From: Matt Wilson 

Commit f62805f1 introduced a bug where lazy MMU mode isn't exited if a
m2p_add/remove_override call fails.

Cc: Stefano Stabellini 
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 
Cc: Anthony Liguori 
Cc: xen-de...@lists.xenproject.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Matt Wilson 
---
 drivers/xen/grant-table.c |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 62ccf54..0283871 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -930,9 +930,10 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
ret = m2p_add_override(mfn, pages[i], kmap_ops ?
   _ops[i] : NULL);
if (ret)
-   return ret;
+   goto out;
}
 
+ out:
if (lazy)
arch_leave_lazy_mmu_mode();
 
@@ -969,9 +970,10 @@ int gnttab_unmap_refs(struct gnttab_unmap_grant_ref 
*unmap_ops,
ret = m2p_remove_override(pages[i], kmap_ops ?
   _ops[i] : NULL);
if (ret)
-   return ret;
+   goto out;
}
 
+ out:
if (lazy)
arch_leave_lazy_mmu_mode();
 
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] xen/gnttab: leave lazy MMU mode in the case of a m2p override failure

2013-11-20 Thread Matt Wilson
From: Matt Wilson m...@amazon.com

Commit f62805f1 introduced a bug where lazy MMU mode isn't exited if a
m2p_add/remove_override call fails.

Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com
Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Cc: Boris Ostrovsky boris.ostrov...@oracle.com
Cc: David Vrabel david.vra...@citrix.com
Cc: Anthony Liguori aligu...@amazon.com
Cc: xen-de...@lists.xenproject.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Matt Wilson m...@amazon.com
---
 drivers/xen/grant-table.c |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 62ccf54..0283871 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -930,9 +930,10 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
ret = m2p_add_override(mfn, pages[i], kmap_ops ?
   kmap_ops[i] : NULL);
if (ret)
-   return ret;
+   goto out;
}
 
+ out:
if (lazy)
arch_leave_lazy_mmu_mode();
 
@@ -969,9 +970,10 @@ int gnttab_unmap_refs(struct gnttab_unmap_grant_ref 
*unmap_ops,
ret = m2p_remove_override(pages[i], kmap_ops ?
   kmap_ops[i] : NULL);
if (ret)
-   return ret;
+   goto out;
}
 
+ out:
if (lazy)
arch_leave_lazy_mmu_mode();
 
-- 
1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] xen-gnttab: do not add m2p override entries for blkback mappings

2013-11-12 Thread Matt Wilson
On Tue, Nov 12, 2013 at 05:48:56PM -0800, Anthony Liguori wrote:
> From: Anthony Liguori 
> 
> Commit 5dc03639 switched blkback to also add m2p override entries
> when mapping grant pages but history seems to have forgotten why
> this is useful if it ever was.
> 
> The blkback driver does not need m2p override entries to exist
> and there is significant overhead due to the locking in the m2p
> override table.  We see about a 10% improvement in IOP rate with
> this patch applied running FIO in the guest.
> 
> See http://article.gmane.org/gmane.linux.kernel/1590932 for a full
> analysis of current users.
> 
> Signed-off-by: Anthony Liguori 

Reviewed-by: Matt Wilson 

One comment for discussion below.

> ---
> A backported version of this has been heavily tested but the testing
> against the latest Linux tree is light so far.

[...]

> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> index c4d2298..081be8d 100644
> --- a/drivers/xen/grant-table.c
> +++ b/drivers/xen/grant-table.c
> @@ -881,7 +881,8 @@ EXPORT_SYMBOL_GPL(gnttab_batch_copy);
>  
>  int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
>   struct gnttab_map_grant_ref *kmap_ops,
> - struct page **pages, unsigned int count)
> + struct page **pages, unsigned int count,
> + int override)
>  {
>   int i, ret;
>   bool lazy = false;
> @@ -918,10 +919,29 @@ int gnttab_map_refs(struct gnttab_map_grant_ref 
> *map_ops,
>   } else {
>   mfn = PFN_DOWN(map_ops[i].dev_bus_addr);
>   }
> - ret = m2p_add_override(mfn, pages[i], kmap_ops ?
> -_ops[i] : NULL);
> - if (ret)
> - return ret;
> +
> + if (override) {
> + ret = m2p_add_override(mfn, pages[i], kmap_ops ?
> +_ops[i] : NULL);
> + if (ret)
> + return ret;

The original code, which remains unmodified by this patch, is
returning without calling arch_leave_lazy_mmu_mode() at the bottom of
this function.

> + } else {
> + unsigned long pfn, old_mfn;
> +
> + pfn = page_to_pfn(pages[i]);
> + old_mfn = pfn_to_mfn(pfn);
> +
> + /* Save previous MFN in page private*/
> + WARN_ON(PagePrivate(pages[i]));
> + SetPagePrivate(pages[i]);
> + set_page_private(pages[i], old_mfn);
> +
> + if (!set_phys_to_machine(pfn, FOREIGN_FRAME(mfn))) {
> + ret = -ENOMEM;
> + break;

The new path doesn't repeat this error. I think that the override path
needs to be changed to do the same in a separate patch.

--msw

> + }
> + }
> + 
>   }
>  
>   if (lazy)
arch_leave_lazy_mmu_mode();

return ret;
}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] xen-gnttab: do not add m2p override entries for blkback mappings

2013-11-12 Thread Matt Wilson
On Tue, Nov 12, 2013 at 05:48:56PM -0800, Anthony Liguori wrote:
 From: Anthony Liguori aligu...@amazon.com
 
 Commit 5dc03639 switched blkback to also add m2p override entries
 when mapping grant pages but history seems to have forgotten why
 this is useful if it ever was.
 
 The blkback driver does not need m2p override entries to exist
 and there is significant overhead due to the locking in the m2p
 override table.  We see about a 10% improvement in IOP rate with
 this patch applied running FIO in the guest.
 
 See http://article.gmane.org/gmane.linux.kernel/1590932 for a full
 analysis of current users.
 
 Signed-off-by: Anthony Liguori aligu...@amazon.com

Reviewed-by: Matt Wilson m...@amazon.com

One comment for discussion below.

 ---
 A backported version of this has been heavily tested but the testing
 against the latest Linux tree is light so far.

[...]

 diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
 index c4d2298..081be8d 100644
 --- a/drivers/xen/grant-table.c
 +++ b/drivers/xen/grant-table.c
 @@ -881,7 +881,8 @@ EXPORT_SYMBOL_GPL(gnttab_batch_copy);
  
  int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
   struct gnttab_map_grant_ref *kmap_ops,
 - struct page **pages, unsigned int count)
 + struct page **pages, unsigned int count,
 + int override)
  {
   int i, ret;
   bool lazy = false;
 @@ -918,10 +919,29 @@ int gnttab_map_refs(struct gnttab_map_grant_ref 
 *map_ops,
   } else {
   mfn = PFN_DOWN(map_ops[i].dev_bus_addr);
   }
 - ret = m2p_add_override(mfn, pages[i], kmap_ops ?
 -kmap_ops[i] : NULL);
 - if (ret)
 - return ret;
 +
 + if (override) {
 + ret = m2p_add_override(mfn, pages[i], kmap_ops ?
 +kmap_ops[i] : NULL);
 + if (ret)
 + return ret;

The original code, which remains unmodified by this patch, is
returning without calling arch_leave_lazy_mmu_mode() at the bottom of
this function.

 + } else {
 + unsigned long pfn, old_mfn;
 +
 + pfn = page_to_pfn(pages[i]);
 + old_mfn = pfn_to_mfn(pfn);
 +
 + /* Save previous MFN in page private*/
 + WARN_ON(PagePrivate(pages[i]));
 + SetPagePrivate(pages[i]);
 + set_page_private(pages[i], old_mfn);
 +
 + if (!set_phys_to_machine(pfn, FOREIGN_FRAME(mfn))) {
 + ret = -ENOMEM;
 + break;

The new path doesn't repeat this error. I think that the override path
needs to be changed to do the same in a separate patch.

--msw

 + }
 + }
 + 
   }
  
   if (lazy)
arch_leave_lazy_mmu_mode();

return ret;
}
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH] grant-table: don't set m2p override if kmap_ops is not set

2013-11-06 Thread Matt Wilson
On Wed, Nov 06, 2013 at 11:34:27AM +, David Vrabel wrote:
[...]
> 
> Matt, Anthony, I presume you have profiling results or performance data
> that support this proposed change?  Can you provide them?

I've measured 10-20% performance improvement in configurations where:

1) dom0 has a moderate number of vCPUs doing blkback work
2) domU has 32 vCPUs
3) 24 configured VBDs without persistent grant support
4) some lock contention in grant table hypercalls has been alleviated

More specific results are still in the works.

> > It's perfectly fine to store a foreign pfn in the m2p table.  The m2p
> > override table is used by the grant device to allow a reverse lookup of
> > the real mfn to a pfn even if it's foreign.
> > 
> > blkback doesn't actually need this though.  This was introduced in:
> > 
> > commit 5dc03639cc903f887931831d69895facb5260f4b
> > Author: Konrad Rzeszutek Wilk 
> > Date:   Tue Mar 1 16:46:45 2011 -0500
> > 
> > xen/blkback: Utilize the M2P override mechanism for GNTMAP_host_map
> > 
> > Purely as an optimization.  In practice though due to lock contention it
> > slows things down.
> 
> The full changeset description for this change doesn't make sense to me.
> 
> xen/blkback: Utilize the M2P override mechanism for GNTMAP_host_map
> 
> Instead of doing copy grants lets do mapping grants using
> the M2P(and P2M) override mechanism.
> 
> As all it is doing is replacing set_phys_to_machine() calls with
> m2p_add_override().

Indeed, since this had nothing to do with copying. We were confused
also. Konrad?

--msw
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH] grant-table: don't set m2p override if kmap_ops is not set

2013-11-06 Thread Matt Wilson
On Wed, Nov 06, 2013 at 11:34:27AM +, David Vrabel wrote:
[...]
 
 Matt, Anthony, I presume you have profiling results or performance data
 that support this proposed change?  Can you provide them?

I've measured 10-20% performance improvement in configurations where:

1) dom0 has a moderate number of vCPUs doing blkback work
2) domU has 32 vCPUs
3) 24 configured VBDs without persistent grant support
4) some lock contention in grant table hypercalls has been alleviated

More specific results are still in the works.

  It's perfectly fine to store a foreign pfn in the m2p table.  The m2p
  override table is used by the grant device to allow a reverse lookup of
  the real mfn to a pfn even if it's foreign.
  
  blkback doesn't actually need this though.  This was introduced in:
  
  commit 5dc03639cc903f887931831d69895facb5260f4b
  Author: Konrad Rzeszutek Wilk konrad.w...@oracle.com
  Date:   Tue Mar 1 16:46:45 2011 -0500
  
  xen/blkback: Utilize the M2P override mechanism for GNTMAP_host_map
  
  Purely as an optimization.  In practice though due to lock contention it
  slows things down.
 
 The full changeset description for this change doesn't make sense to me.
 
 xen/blkback: Utilize the M2P override mechanism for GNTMAP_host_map
 
 Instead of doing copy grants lets do mapping grants using
 the M2P(and P2M) override mechanism.
 
 As all it is doing is replacing set_phys_to_machine() calls with
 m2p_add_override().

Indeed, since this had nothing to do with copying. We were confused
also. Konrad?

--msw
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] (xen) stable/for-jens-3.13

2013-11-05 Thread Matt Wilson
On Tue, Nov 05, 2013 at 02:52:29PM -0800, Matt Wilson wrote:
> On Tue, Nov 05, 2013 at 11:59:08AM -0500, Konrad Rzeszutek Wilk wrote:
> > 
> > Hey Jens,
> > 
> > Please git pull the following branch:
> > 
> >  git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git 
> > stable/for-jens-3.13
> > 
> > 
> > which has a fix to the Xen block frontend driver. For the backends that
> > don't support persistent grants, we don't want to use the logic that would
> > use copying (needed for persistent grants). That logic existed there 
> > initially
> > but with the persistent grants code added in - it was removed (so in 3.10
> > time-frame). This adds part of it back.
> 
> I just tested this patch as part of a backported series on top of a
> 3.4.68 base. When this patch is included and a domU boots on a system
> that does not support persistent grants the kernel panics on boot with
> the trace below. Full boot log: http://paste.ubuntu.com/6367329/
> 
> It's possible we missed something in the backport series. I'll test
> this patch on top of 3.11.4 to see how it fares.

Hi Konrad,

Looks like there's something amiss in the backport. Testing your git
branch worked fine.

Sorry for the false alarm.

--msw
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] (xen) stable/for-jens-3.13

2013-11-05 Thread Matt Wilson
On Tue, Nov 05, 2013 at 11:59:08AM -0500, Konrad Rzeszutek Wilk wrote:
> 
> Hey Jens,
> 
> Please git pull the following branch:
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git 
> stable/for-jens-3.13
> 
> 
> which has a fix to the Xen block frontend driver. For the backends that
> don't support persistent grants, we don't want to use the logic that would
> use copying (needed for persistent grants). That logic existed there initially
> but with the persistent grants code added in - it was removed (so in 3.10
> time-frame). This adds part of it back.

I just tested this patch as part of a backported series on top of a
3.4.68 base. When this patch is included and a domU boots on a system
that does not support persistent grants the kernel panics on boot with
the trace below. Full boot log: http://paste.ubuntu.com/6367329/

It's possible we missed something in the backport series. I'll test
this patch on top of 3.11.4 to see how it fares.

 general protection fault:  [#1] SMP 
 CPU 14 
 Modules linked in: aesni_intel cryptd aes_x86_64 aes_generic dm_mirror 
dm_region_hash dm_log dm_mod
 
 Pid: 2192, comm: blkid Not tainted 3.4.68-3.8.amzn1.x86_64 #1 Xen HVM domU
 RIP: 0010:[]  [] get_grant+0x51/0x110
 RSP: 0018:881d07b5d978  EFLAGS: 00010006
 RAX: 025c68fd93b116ed RBX: 880e72fc6b50 RCX: 
 RDX: 1e8134cf7b38b76c RSI: 01d1a05f RDI: 881d07b5da64
 RBP: 881d07b5d998 R08: e200 R09: dead00100100
 R10: dead00200200 R11: 8b9207578571 R12: 01d1a05f
 R13: 880e72efc000 R14: 881d07b5da64 R15: 881d1822b0b0
 FS:  7fe197073760() GS:881d5bcc() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 7fe196c4a3b0 CR3: 001d079c6000 CR4: 000407e0
 DR0:  DR1:  DR2: 
 DR3:  DR6: 0ff0 DR7: 0400
 Process blkid (pid: 2192, threadinfo 881d07b5c000, task 881d07ae1690)
 Stack:
  0246 880e72963400  881d1822b0b0
  881d07b5da98 812ddf60 0001 2000
  00ff881d07b5d9e8 881d07b5dfd8 881d1822b000 880e72d560b0
 Call Trace:
  [] do_blkif_request+0x410/0x7c0
  [] ? submit_bh+0xf7/0x120
  [] queue_unplugged+0x56/0xf0
  [] blk_flush_plug_list+0x1c3/0x220
  [] blk_finish_plug+0x18/0x50
  [] __do_page_cache_readahead+0x1c1/0x250
  [] force_page_cache_readahead+0x71/0xa0
  [] page_cache_sync_readahead+0x43/0x50
  [] generic_file_aio_read+0x4d8/0x760
  [] ? do_last+0x470/0x8c0
  [] blkdev_aio_read+0x58/0x80
  [] do_sync_read+0xda/0x120
  [] ? security_file_permission+0x93/0xb0
  [] ? rw_verify_area+0x61/0xf0
  [] vfs_read+0xb0/0x180
  [] sys_read+0x4a/0x90
  [] system_call_fastpath+0x16/0x1b
 Code: 48 8d 82 b0 12 00 00 48 39 c3 0f 84 ca 00 00 00 49 b9 00 01 10 00 00 00 
ad de 48 8b 13 49 ba 00 02 20 00 00 00 ad de 48 8b 43 08 <48> 89 42 08 48 89 10 
44 8b 5b f0 4c 89 0b 4c 89 53 08 45 85 db 
 RIP  [] get_grant+0x51/0x110
  RSP 
 ---[ end trace 3eba56aed2b98597 ]---

--msw

> Please pull!
> 
> P.S.
> Sorry for the late pull!
> 
>  drivers/block/xen-blkfront.c | 125 
> ++-
>  1 file changed, 100 insertions(+), 25 deletions(-)
> 
> Roger Pau Monne (1):
>   xen-blkfront: restore the non-persistent data path
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH] grant-table: don't set m2p override if kmap_ops is not set

2013-11-05 Thread Matt Wilson
On Tue, Nov 05, 2013 at 02:09:53PM -0800, Anthony Liguori wrote:
> On Tue, Nov 5, 2013 at 1:16 PM, Konrad Rzeszutek Wilk
>  wrote:
> > On Tue, Nov 05, 2013 at 12:53:17PM -0800, Anthony Liguori wrote:
> >> Matt Wilson  writes:
> >>
> >> > On Tue, Nov 05, 2013 at 05:03:58PM +0100, Roger Pau Monné wrote:
> >> >> On 05/11/13 16:08, Ian Campbell wrote:
> >> >> > On Tue, 2013-11-05 at 16:01 +0100, Roger Pau Monné wrote:
> >> >> >> On 05/11/13 15:56, Konrad Rzeszutek Wilk wrote:
> >> >> >>> On Tue, Nov 05, 2013 at 03:47:08PM +0100, Roger Pau Monné wrote:
> >> >> >>>> On 05/11/13 13:36, David Vrabel wrote:
> >> >> >>>>> On 05/11/13 11:24, Roger Pau Monne wrote:
> >> >> >>>>>> IMHO there's no reason to set a m2p override if the mapping is 
> >> >> >>>>>> done in
> >> >> >>>>>> kernel space, so only set the m2p override when kmap_ops is set.
> >> >> >>>>>
> >> >> >>>>> Can you provide a more detailed reasoning about why this is safe?
> >> >> >>>>
> >> >> >>>> To tell the truth, I don't understand why we need to use the m2p
> >> >> >>>> override for kernel space only mappings, my understanding is that 
> >> >> >>>> this
> >> >> >>>> m2p override is needed for user space mappings only (where we 
> >> >> >>>> actually
> >> >> >>>> end up doing two mappings, one in kernel space and one in user 
> >> >> >>>> space).
> >> >> >>>> For kernel space I don't see why we need to do anything else than
> >> >> >>>> setting the right p2m translation.
> >> >> >>>
> >> >> >>> We needed the m2p when doing DMA operations. As the driver would
> >> >> >>> want the bus address (so p2m) and then when unmapping the DMA we
> >> >> >>> only get the bus address  - so we needed to do a m2p lookup.
> >> >> >>
> >> >> >> OK, we need a m2p (that we already have in machine_to_phys_mapping),
> >> >> >> what I don't understand is why we need the m2p override.
> >> >> >
> >> >> > The m2p is a host global table.
> >> >> >
> >> >> > For a foreign page grant mapped into the current domain the m2p will
> >> >> > give you the foreign (owner) domain's p from the m, not the local one.
> >> >>
> >> >> Yes, you are completely right, then I have to figure out why blkback
> >> >> works fine with this patch applied (or at least it seems to work fine).
> >> >
> >> > blkback also works for me when testing a similar patch. I'm still
> >> > confused. One thing with your proposed patch: I'm not sure that you're
> >> > putting back the correct mfn.
> >>
> >> It's perfectly fine to store a foreign pfn in the m2p table.  The m2p
> >> override table is used by the grant device to allow a reverse lookup of
> >> the real mfn to a pfn even if it's foreign.
> >>
> >> blkback doesn't actually need this though.  This was introduced in:
> >>
> >> commit 5dc03639cc903f887931831d69895facb5260f4b
> >> Author: Konrad Rzeszutek Wilk 
> >> Date:   Tue Mar 1 16:46:45 2011 -0500
> >>
> >> xen/blkback: Utilize the M2P override mechanism for GNTMAP_host_map
> >>
> >> Purely as an optimization.  In practice though due to lock contention it
> >> slows things down.
> >>
> >> I think an alternative would be to use a read/write lock instead of just
> >> a spinlock since it's the read path that is the most hot.
> >
> > The m2p hash table can also be expanded to lower the contention.
> 
> That was the first thing I tried :-)
> 
> The issue isn't lookup cost as much as the number of acquisitions and
> the cost of adding/removing entries.
> 
> >> I haven't tested that yet though.
> >
> > Looking forward to your patches :-)
> 
> I'm really just being thorough in suggesting the rwlock.  I actually
> think that not using the m2p override table is the right long term
> fix.

I tried a rwlock a while back. It didn't help much because we actually
have a problem with concurrent adds/removes as much as lookups.

--msw
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH] grant-table: don't set m2p override if kmap_ops is not set

2013-11-05 Thread Matt Wilson
On Tue, Nov 05, 2013 at 05:03:58PM +0100, Roger Pau Monné wrote:
> On 05/11/13 16:08, Ian Campbell wrote:
> > On Tue, 2013-11-05 at 16:01 +0100, Roger Pau Monné wrote:
> >> On 05/11/13 15:56, Konrad Rzeszutek Wilk wrote:
> >>> On Tue, Nov 05, 2013 at 03:47:08PM +0100, Roger Pau Monné wrote:
>  On 05/11/13 13:36, David Vrabel wrote:
> > On 05/11/13 11:24, Roger Pau Monne wrote:
> >> IMHO there's no reason to set a m2p override if the mapping is done in
> >> kernel space, so only set the m2p override when kmap_ops is set.
> >
> > Can you provide a more detailed reasoning about why this is safe?
> 
>  To tell the truth, I don't understand why we need to use the m2p
>  override for kernel space only mappings, my understanding is that this
>  m2p override is needed for user space mappings only (where we actually
>  end up doing two mappings, one in kernel space and one in user space).
>  For kernel space I don't see why we need to do anything else than
>  setting the right p2m translation.
> >>>
> >>> We needed the m2p when doing DMA operations. As the driver would
> >>> want the bus address (so p2m) and then when unmapping the DMA we
> >>> only get the bus address  - so we needed to do a m2p lookup.
> >>
> >> OK, we need a m2p (that we already have in machine_to_phys_mapping),
> >> what I don't understand is why we need the m2p override.
> > 
> > The m2p is a host global table.
> > 
> > For a foreign page grant mapped into the current domain the m2p will
> > give you the foreign (owner) domain's p from the m, not the local one.
> 
> Yes, you are completely right, then I have to figure out why blkback
> works fine with this patch applied (or at least it seems to work fine).

blkback also works for me when testing a similar patch. I'm still
confused. One thing with your proposed patch: I'm not sure that you're
putting back the correct mfn.

Adding Anthony to the thread.

--msw

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH] grant-table: don't set m2p override if kmap_ops is not set

2013-11-05 Thread Matt Wilson
On Tue, Nov 05, 2013 at 05:03:58PM +0100, Roger Pau Monné wrote:
 On 05/11/13 16:08, Ian Campbell wrote:
  On Tue, 2013-11-05 at 16:01 +0100, Roger Pau Monné wrote:
  On 05/11/13 15:56, Konrad Rzeszutek Wilk wrote:
  On Tue, Nov 05, 2013 at 03:47:08PM +0100, Roger Pau Monné wrote:
  On 05/11/13 13:36, David Vrabel wrote:
  On 05/11/13 11:24, Roger Pau Monne wrote:
  IMHO there's no reason to set a m2p override if the mapping is done in
  kernel space, so only set the m2p override when kmap_ops is set.
 
  Can you provide a more detailed reasoning about why this is safe?
 
  To tell the truth, I don't understand why we need to use the m2p
  override for kernel space only mappings, my understanding is that this
  m2p override is needed for user space mappings only (where we actually
  end up doing two mappings, one in kernel space and one in user space).
  For kernel space I don't see why we need to do anything else than
  setting the right p2m translation.
 
  We needed the m2p when doing DMA operations. As the driver would
  want the bus address (so p2m) and then when unmapping the DMA we
  only get the bus address  - so we needed to do a m2p lookup.
 
  OK, we need a m2p (that we already have in machine_to_phys_mapping),
  what I don't understand is why we need the m2p override.
  
  The m2p is a host global table.
  
  For a foreign page grant mapped into the current domain the m2p will
  give you the foreign (owner) domain's p from the m, not the local one.
 
 Yes, you are completely right, then I have to figure out why blkback
 works fine with this patch applied (or at least it seems to work fine).

blkback also works for me when testing a similar patch. I'm still
confused. One thing with your proposed patch: I'm not sure that you're
putting back the correct mfn.

Adding Anthony to the thread.

--msw

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH] grant-table: don't set m2p override if kmap_ops is not set

2013-11-05 Thread Matt Wilson
On Tue, Nov 05, 2013 at 02:09:53PM -0800, Anthony Liguori wrote:
 On Tue, Nov 5, 2013 at 1:16 PM, Konrad Rzeszutek Wilk
 konrad.w...@oracle.com wrote:
  On Tue, Nov 05, 2013 at 12:53:17PM -0800, Anthony Liguori wrote:
  Matt Wilson m...@linux.com writes:
 
   On Tue, Nov 05, 2013 at 05:03:58PM +0100, Roger Pau Monné wrote:
   On 05/11/13 16:08, Ian Campbell wrote:
On Tue, 2013-11-05 at 16:01 +0100, Roger Pau Monné wrote:
On 05/11/13 15:56, Konrad Rzeszutek Wilk wrote:
On Tue, Nov 05, 2013 at 03:47:08PM +0100, Roger Pau Monné wrote:
On 05/11/13 13:36, David Vrabel wrote:
On 05/11/13 11:24, Roger Pau Monne wrote:
IMHO there's no reason to set a m2p override if the mapping is 
done in
kernel space, so only set the m2p override when kmap_ops is set.
   
Can you provide a more detailed reasoning about why this is safe?
   
To tell the truth, I don't understand why we need to use the m2p
override for kernel space only mappings, my understanding is that 
this
m2p override is needed for user space mappings only (where we 
actually
end up doing two mappings, one in kernel space and one in user 
space).
For kernel space I don't see why we need to do anything else than
setting the right p2m translation.
   
We needed the m2p when doing DMA operations. As the driver would
want the bus address (so p2m) and then when unmapping the DMA we
only get the bus address  - so we needed to do a m2p lookup.
   
OK, we need a m2p (that we already have in machine_to_phys_mapping),
what I don't understand is why we need the m2p override.
   
The m2p is a host global table.
   
For a foreign page grant mapped into the current domain the m2p will
give you the foreign (owner) domain's p from the m, not the local one.
  
   Yes, you are completely right, then I have to figure out why blkback
   works fine with this patch applied (or at least it seems to work fine).
  
   blkback also works for me when testing a similar patch. I'm still
   confused. One thing with your proposed patch: I'm not sure that you're
   putting back the correct mfn.
 
  It's perfectly fine to store a foreign pfn in the m2p table.  The m2p
  override table is used by the grant device to allow a reverse lookup of
  the real mfn to a pfn even if it's foreign.
 
  blkback doesn't actually need this though.  This was introduced in:
 
  commit 5dc03639cc903f887931831d69895facb5260f4b
  Author: Konrad Rzeszutek Wilk konrad.w...@oracle.com
  Date:   Tue Mar 1 16:46:45 2011 -0500
 
  xen/blkback: Utilize the M2P override mechanism for GNTMAP_host_map
 
  Purely as an optimization.  In practice though due to lock contention it
  slows things down.
 
  I think an alternative would be to use a read/write lock instead of just
  a spinlock since it's the read path that is the most hot.
 
  The m2p hash table can also be expanded to lower the contention.
 
 That was the first thing I tried :-)
 
 The issue isn't lookup cost as much as the number of acquisitions and
 the cost of adding/removing entries.
 
  I haven't tested that yet though.
 
  Looking forward to your patches :-)
 
 I'm really just being thorough in suggesting the rwlock.  I actually
 think that not using the m2p override table is the right long term
 fix.

I tried a rwlock a while back. It didn't help much because we actually
have a problem with concurrent adds/removes as much as lookups.

--msw
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] (xen) stable/for-jens-3.13

2013-11-05 Thread Matt Wilson
On Tue, Nov 05, 2013 at 11:59:08AM -0500, Konrad Rzeszutek Wilk wrote:
 
 Hey Jens,
 
 Please git pull the following branch:
 
  git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git 
 stable/for-jens-3.13
 
 
 which has a fix to the Xen block frontend driver. For the backends that
 don't support persistent grants, we don't want to use the logic that would
 use copying (needed for persistent grants). That logic existed there initially
 but with the persistent grants code added in - it was removed (so in 3.10
 time-frame). This adds part of it back.

I just tested this patch as part of a backported series on top of a
3.4.68 base. When this patch is included and a domU boots on a system
that does not support persistent grants the kernel panics on boot with
the trace below. Full boot log: http://paste.ubuntu.com/6367329/

It's possible we missed something in the backport series. I'll test
this patch on top of 3.11.4 to see how it fares.

 general protection fault:  [#1] SMP 
 CPU 14 
 Modules linked in: aesni_intel cryptd aes_x86_64 aes_generic dm_mirror 
dm_region_hash dm_log dm_mod
 
 Pid: 2192, comm: blkid Not tainted 3.4.68-3.8.amzn1.x86_64 #1 Xen HVM domU
 RIP: 0010:[812dda71]  [812dda71] get_grant+0x51/0x110
 RSP: 0018:881d07b5d978  EFLAGS: 00010006
 RAX: 025c68fd93b116ed RBX: 880e72fc6b50 RCX: 
 RDX: 1e8134cf7b38b76c RSI: 01d1a05f RDI: 881d07b5da64
 RBP: 881d07b5d998 R08: e200 R09: dead00100100
 R10: dead00200200 R11: 8b9207578571 R12: 01d1a05f
 R13: 880e72efc000 R14: 881d07b5da64 R15: 881d1822b0b0
 FS:  7fe197073760() GS:881d5bcc() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 7fe196c4a3b0 CR3: 001d079c6000 CR4: 000407e0
 DR0:  DR1:  DR2: 
 DR3:  DR6: 0ff0 DR7: 0400
 Process blkid (pid: 2192, threadinfo 881d07b5c000, task 881d07ae1690)
 Stack:
  0246 880e72963400  881d1822b0b0
  881d07b5da98 812ddf60 0001 2000
  00ff881d07b5d9e8 881d07b5dfd8 881d1822b000 880e72d560b0
 Call Trace:
  [812ddf60] do_blkif_request+0x410/0x7c0
  [811882b7] ? submit_bh+0xf7/0x120
  [81212026] queue_unplugged+0x56/0xf0
  [81216573] blk_flush_plug_list+0x1c3/0x220
  [812165e8] blk_finish_plug+0x18/0x50
  [8110ec11] __do_page_cache_readahead+0x1c1/0x250
  [8110efb1] force_page_cache_readahead+0x71/0xa0
  [8110f333] page_cache_sync_readahead+0x43/0x50
  [81104f58] generic_file_aio_read+0x4d8/0x760
  [811673e0] ? do_last+0x470/0x8c0
  [8118ed98] blkdev_aio_read+0x58/0x80
  [81158e1a] do_sync_read+0xda/0x120
  [811e22f3] ? security_file_permission+0x93/0xb0
  [811592a1] ? rw_verify_area+0x61/0xf0
  [81159780] vfs_read+0xb0/0x180
  [8115989a] sys_read+0x4a/0x90
  [813f0979] system_call_fastpath+0x16/0x1b
 Code: 48 8d 82 b0 12 00 00 48 39 c3 0f 84 ca 00 00 00 49 b9 00 01 10 00 00 00 
ad de 48 8b 13 49 ba 00 02 20 00 00 00 ad de 48 8b 43 08 48 89 42 08 48 89 10 
44 8b 5b f0 4c 89 0b 4c 89 53 08 45 85 db 
 RIP  [812dda71] get_grant+0x51/0x110
  RSP 881d07b5d978
 ---[ end trace 3eba56aed2b98597 ]---

--msw

 Please pull!
 
 P.S.
 Sorry for the late pull!
 
  drivers/block/xen-blkfront.c | 125 
 ++-
  1 file changed, 100 insertions(+), 25 deletions(-)
 
 Roger Pau Monne (1):
   xen-blkfront: restore the non-persistent data path
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] (xen) stable/for-jens-3.13

2013-11-05 Thread Matt Wilson
On Tue, Nov 05, 2013 at 02:52:29PM -0800, Matt Wilson wrote:
 On Tue, Nov 05, 2013 at 11:59:08AM -0500, Konrad Rzeszutek Wilk wrote:
  
  Hey Jens,
  
  Please git pull the following branch:
  
   git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git 
  stable/for-jens-3.13
  
  
  which has a fix to the Xen block frontend driver. For the backends that
  don't support persistent grants, we don't want to use the logic that would
  use copying (needed for persistent grants). That logic existed there 
  initially
  but with the persistent grants code added in - it was removed (so in 3.10
  time-frame). This adds part of it back.
 
 I just tested this patch as part of a backported series on top of a
 3.4.68 base. When this patch is included and a domU boots on a system
 that does not support persistent grants the kernel panics on boot with
 the trace below. Full boot log: http://paste.ubuntu.com/6367329/
 
 It's possible we missed something in the backport series. I'll test
 this patch on top of 3.11.4 to see how it fares.

Hi Konrad,

Looks like there's something amiss in the backport. Testing your git
branch worked fine.

Sorry for the false alarm.

--msw
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] hpet, allow user controlled mmap for user processes

2013-08-29 Thread Matt Wilson
On Fri, Mar 22, 2013 at 09:32:54AM -0400, Prarit Bhargava wrote:
> The CONFIG_HPET_MMAP Kconfig option exposes the memory map of the HPET
> registers to userspace.  The Kconfig help points out that in some cases this
> can be a security risk as some systems may erroneously configure the map such
> that additional data is exposed to userspace.
> 
> This is a problem for distributions -- some users want the MMAP functionality
> but it comes with a significant security risk.  In an effort to mitigate this
> risk, and due to the low number of users of the MMAP functionality, I've
> introduced a kernel parameter, hpet_mmap_enable, that is required in order
> to actually have the HPET MMAP exposed.
> 
> [v2]: Clemens suggested modifying the Kconfig help text and making the
> default setting configurable.
> [v3]: Fixed up Documentation and Kconfig entries, default now "Y"
> [v4]: After testing, found that I need to modify CONFIG_HPET_MMAP_DEFAULT 
> usage
> 
> Signed-off-by: Prarit Bhargava 
> Cc: Clemens Ladisch 
> ---
>  Documentation/kernel-parameters.txt |4 
>  drivers/char/Kconfig|9 +++--
>  drivers/char/hpet.c |   25 +++--
>  3 files changed, 34 insertions(+), 4 deletions(-)

It doesn't seem like this patch got picked up and seems like a good
idea to me. Clemens, what do you think?

Acked-by: Matt Wilson 

--msw
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] hpet, allow user controlled mmap for user processes

2013-08-29 Thread Matt Wilson
On Fri, Mar 22, 2013 at 09:32:54AM -0400, Prarit Bhargava wrote:
 The CONFIG_HPET_MMAP Kconfig option exposes the memory map of the HPET
 registers to userspace.  The Kconfig help points out that in some cases this
 can be a security risk as some systems may erroneously configure the map such
 that additional data is exposed to userspace.
 
 This is a problem for distributions -- some users want the MMAP functionality
 but it comes with a significant security risk.  In an effort to mitigate this
 risk, and due to the low number of users of the MMAP functionality, I've
 introduced a kernel parameter, hpet_mmap_enable, that is required in order
 to actually have the HPET MMAP exposed.
 
 [v2]: Clemens suggested modifying the Kconfig help text and making the
 default setting configurable.
 [v3]: Fixed up Documentation and Kconfig entries, default now Y
 [v4]: After testing, found that I need to modify CONFIG_HPET_MMAP_DEFAULT 
 usage
 
 Signed-off-by: Prarit Bhargava pra...@redhat.com
 Cc: Clemens Ladisch clem...@ladisch.de
 ---
  Documentation/kernel-parameters.txt |4 
  drivers/char/Kconfig|9 +++--
  drivers/char/hpet.c |   25 +++--
  3 files changed, 34 insertions(+), 4 deletions(-)

It doesn't seem like this patch got picked up and seems like a good
idea to me. Clemens, what do you think?

Acked-by: Matt Wilson m...@amazon.com

--msw
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/3] xen-blkfront: revoke foreign access for grants not mapped by the backend

2013-08-01 Thread Matt Wilson
On Wed, Jul 31, 2013 at 05:00:44PM +0200, Roger Pau Monne wrote:
> There's no need to keep the foreign access in a grant if it is not
> persistently mapped by the backend. This allows us to free grants that
> are not mapped by the backend, thus preventing blkfront from hoarding
> all grants.
> 
> The main effect of this is that blkfront will only persistently map
> the same grants as the backend, and it will always try to use grants
> that are already mapped by the backend. Also the number of persistent
> grants in blkfront is the same as in blkback (and is controlled by the
> value in blkback).
> 
> Signed-off-by: Roger Pau Monné 

Acked-by: Matt Wilson 

> Cc: Konrad Rzeszutek Wilk 
> Cc: David Vrabel 
> ---
>  drivers/block/xen-blkfront.c |   33 +
>  1 files changed, 29 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 187a437..bc9fc7d 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -1016,13 +1016,38 @@ static void blkif_completion(struct blk_shadow *s, 
> struct blkfront_info *info,
>   }
>   /* Add the persistent grant into the list of free grants */
>   for (i = 0; i < nseg; i++) {
> - list_add(>grants_used[i]->node, >persistent_gnts);
> - info->persistent_gnts_c++;
> + if (gnttab_query_foreign_access(s->grants_used[i]->gref)) {
> + /*
> +  * If the grant is still mapped by the backend (the
> +  * backend has chosen to make this grant persistent)
> +  * we add it at the head of the list, so it will be
> +  * reused first.
> +  */
> + list_add(>grants_used[i]->node, 
> >persistent_gnts);
> + info->persistent_gnts_c++;
> + } else {
> + /*
> +  * If the grant is not mapped by the backend we end the
> +  * foreign access and add it to the tail of the list,
> +  * so it will not be picked again unless we run out of
> +  * persistent grants.
> +  */
> + gnttab_end_foreign_access(s->grants_used[i]->gref, 0, 
> 0UL);
> + s->grants_used[i]->gref = GRANT_INVALID_REF;
> + list_add_tail(>grants_used[i]->node, 
> >persistent_gnts);
> + }
>   }
>   if (s->req.operation == BLKIF_OP_INDIRECT) {
>   for (i = 0; i < INDIRECT_GREFS(nseg); i++) {
> - list_add(>indirect_grants[i]->node, 
> >persistent_gnts);
> - info->persistent_gnts_c++;
> + if 
> (gnttab_query_foreign_access(s->indirect_grants[i]->gref)) {
> + list_add(>indirect_grants[i]->node, 
> >persistent_gnts);
> + info->persistent_gnts_c++;
> + } else {
> + 
> gnttab_end_foreign_access(s->indirect_grants[i]->gref, 0, 0UL);
> + s->indirect_grants[i]->gref = GRANT_INVALID_REF;
> + list_add_tail(>indirect_grants[i]->node,
> +   >persistent_gnts);
> + }
>   }
>   }
>  }
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/3] xen-blkfront: revoke foreign access for grants not mapped by the backend

2013-08-01 Thread Matt Wilson
On Wed, Jul 31, 2013 at 05:00:44PM +0200, Roger Pau Monne wrote:
 There's no need to keep the foreign access in a grant if it is not
 persistently mapped by the backend. This allows us to free grants that
 are not mapped by the backend, thus preventing blkfront from hoarding
 all grants.
 
 The main effect of this is that blkfront will only persistently map
 the same grants as the backend, and it will always try to use grants
 that are already mapped by the backend. Also the number of persistent
 grants in blkfront is the same as in blkback (and is controlled by the
 value in blkback).
 
 Signed-off-by: Roger Pau Monné roger@citrix.com

Acked-by: Matt Wilson m...@amazon.com

 Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 Cc: David Vrabel david.vra...@citrix.com
 ---
  drivers/block/xen-blkfront.c |   33 +
  1 files changed, 29 insertions(+), 4 deletions(-)
 
 diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
 index 187a437..bc9fc7d 100644
 --- a/drivers/block/xen-blkfront.c
 +++ b/drivers/block/xen-blkfront.c
 @@ -1016,13 +1016,38 @@ static void blkif_completion(struct blk_shadow *s, 
 struct blkfront_info *info,
   }
   /* Add the persistent grant into the list of free grants */
   for (i = 0; i  nseg; i++) {
 - list_add(s-grants_used[i]-node, info-persistent_gnts);
 - info-persistent_gnts_c++;
 + if (gnttab_query_foreign_access(s-grants_used[i]-gref)) {
 + /*
 +  * If the grant is still mapped by the backend (the
 +  * backend has chosen to make this grant persistent)
 +  * we add it at the head of the list, so it will be
 +  * reused first.
 +  */
 + list_add(s-grants_used[i]-node, 
 info-persistent_gnts);
 + info-persistent_gnts_c++;
 + } else {
 + /*
 +  * If the grant is not mapped by the backend we end the
 +  * foreign access and add it to the tail of the list,
 +  * so it will not be picked again unless we run out of
 +  * persistent grants.
 +  */
 + gnttab_end_foreign_access(s-grants_used[i]-gref, 0, 
 0UL);
 + s-grants_used[i]-gref = GRANT_INVALID_REF;
 + list_add_tail(s-grants_used[i]-node, 
 info-persistent_gnts);
 + }
   }
   if (s-req.operation == BLKIF_OP_INDIRECT) {
   for (i = 0; i  INDIRECT_GREFS(nseg); i++) {
 - list_add(s-indirect_grants[i]-node, 
 info-persistent_gnts);
 - info-persistent_gnts_c++;
 + if 
 (gnttab_query_foreign_access(s-indirect_grants[i]-gref)) {
 + list_add(s-indirect_grants[i]-node, 
 info-persistent_gnts);
 + info-persistent_gnts_c++;
 + } else {
 + 
 gnttab_end_foreign_access(s-indirect_grants[i]-gref, 0, 0UL);
 + s-indirect_grants[i]-gref = GRANT_INVALID_REF;
 + list_add_tail(s-indirect_grants[i]-node,
 +   info-persistent_gnts);
 + }
   }
   }
  }
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] xen-gnt: prevent adding duplicate gnt callbacks

2013-07-31 Thread Matt Wilson
On Wed, Jul 31, 2013 at 05:00:42PM +0200, Roger Pau Monne wrote:
> With the current implementation, the callback in the tail of the list
> can be added twice, because the check done in
> gnttab_request_free_callback is bogus, callback->next can be NULL if
> it is the last callback in the list. If we add the same callback twice
> we end up with an infinite loop, were callback == callback->next.
> 
> Replace this check with a proper one that iterates over the list to
> see if the callback has already been added.

Acked-by: Matt Wilson 

> Signed-off-by: Roger Pau Monné 
> Cc: Konrad Rzeszutek Wilk 
> Cc: David Vrabel 
> ---
> This patch should be backported to stable trees
> ---
>  drivers/xen/grant-table.c |   13 +++--
>  1 files changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> index 04c1b2d..d5418c1 100644
> --- a/drivers/xen/grant-table.c
> +++ b/drivers/xen/grant-table.c
> @@ -729,9 +729,18 @@ void gnttab_request_free_callback(struct 
> gnttab_free_callback *callback,
> void (*fn)(void *), void *arg, u16 count)
>  {
>   unsigned long flags;
> + struct gnttab_free_callback *cb;
> +
>   spin_lock_irqsave(_list_lock, flags);
> - if (callback->next)
> - goto out;
> +
> + /* Check if the callback is already on the list */
> + cb = gnttab_free_callback_list;
> + while (cb) {
> + if (cb == callback)
> + goto out;
> + cb = cb->next;
> + }
> +
>   callback->fn = fn;
>   callback->arg = arg;
>   callback->count = count;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] xen-gnt: prevent adding duplicate gnt callbacks

2013-07-31 Thread Matt Wilson
On Wed, Jul 31, 2013 at 05:00:42PM +0200, Roger Pau Monne wrote:
 With the current implementation, the callback in the tail of the list
 can be added twice, because the check done in
 gnttab_request_free_callback is bogus, callback-next can be NULL if
 it is the last callback in the list. If we add the same callback twice
 we end up with an infinite loop, were callback == callback-next.
 
 Replace this check with a proper one that iterates over the list to
 see if the callback has already been added.

Acked-by: Matt Wilson m...@amazon.com

 Signed-off-by: Roger Pau Monné roger@citrix.com
 Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 Cc: David Vrabel david.vra...@citrix.com
 ---
 This patch should be backported to stable trees
 ---
  drivers/xen/grant-table.c |   13 +++--
  1 files changed, 11 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
 index 04c1b2d..d5418c1 100644
 --- a/drivers/xen/grant-table.c
 +++ b/drivers/xen/grant-table.c
 @@ -729,9 +729,18 @@ void gnttab_request_free_callback(struct 
 gnttab_free_callback *callback,
 void (*fn)(void *), void *arg, u16 count)
  {
   unsigned long flags;
 + struct gnttab_free_callback *cb;
 +
   spin_lock_irqsave(gnttab_list_lock, flags);
 - if (callback-next)
 - goto out;
 +
 + /* Check if the callback is already on the list */
 + cb = gnttab_free_callback_list;
 + while (cb) {
 + if (cb == callback)
 + goto out;
 + cb = cb-next;
 + }
 +
   callback-fn = fn;
   callback-arg = arg;
   callback-count = count;
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] kexec/kdump implementation for Xen PV domU

2013-07-29 Thread Matt Wilson
On Mon, Jul 29, 2013 at 07:15:43PM +0200, Daniel Kiper wrote:
> Hi all,
> 
> Here I am sending as attachments patches enabling kexec/kdump
> support in Xen PV domU. Only x84_64 architecture is supported.
> There is no support for i386 but some code could be easily reused.
> Here is a description of patches:

[...]

>   - kexec-kernel-only_20121203.patch: this patch fixes timer
> issue on Amazon EC2 machines.

Hi Daniel,

Do you know the cause of this issue? Does it have something to do with
singleshot timer migration when offlining/onlining SMP CPUs?

--msw
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] kexec/kdump implementation for Xen PV domU

2013-07-29 Thread Matt Wilson
On Mon, Jul 29, 2013 at 07:15:43PM +0200, Daniel Kiper wrote:
 Hi all,
 
 Here I am sending as attachments patches enabling kexec/kdump
 support in Xen PV domU. Only x84_64 architecture is supported.
 There is no support for i386 but some code could be easily reused.
 Here is a description of patches:

[...]

   - kexec-kernel-only_20121203.patch: this patch fixes timer
 issue on Amazon EC2 machines.

Hi Daniel,

Do you know the cause of this issue? Does it have something to do with
singleshot timer migration when offlining/onlining SMP CPUs?

--msw
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] is kexec on Xen domU possible?

2013-07-23 Thread Matt Wilson
On Mon, Jul 22, 2013 at 11:33:15AM -0700, Greg KH wrote:
> On Mon, Jul 22, 2013 at 11:24:46AM -0700, H. Peter Anvin wrote:
> > On 07/22/2013 10:20 AM, Eric W. Biederman wrote:
> > >>>
> > >>> Also, in any virtualized environment the hypervisor can do a better job
> > >>> for things like kdump, simply because it can provide two things that are
> > >>> otherwise hard to do:
> > >>>
> > >>> 1. a known-good system state;
> > >>> 2. a known-clean kdump image.
> > >>>
> > >>> As such, I do encourage the virtualization people to (also) develop
> > >>> hypervisor-*aware* solutions for these kinds of things.
> > >>
> > >> In general I agree but if you could not change hypervisor
> > >> and/or dom0 (e.g. you are using cloud providers which are
> > >> stick to old versions of Xen) then you have no choice.
> > > 
> > > Which tends to be where kexec on panic comes in most cases.  Getting
> > > platform vendors to do something sane tends to be a multi-year political
> > > effort of dubious worth while just solving the problem locally actually
> > > gets the problem solved for those who care.
> > > 
> > 
> > It should not be a "one or the other" issue.
> 
> I don't care about kdump, I care about kexec on domU for people who are
> running on cloud providers with old versions of Xen so that they can
> control what kernel they can boot, when they want to boot it.  If kdump
> works as well, that's just a bonus, but it's down on the list of things
> for me to be concerned about.

Many Xen-based cloud providers provide a mechanism for users to boot
the kernels they want. For example you can use PV-GRUB on EC2
instances to boot a kernel that is stored within an AMI.

For more info:
  http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/UserProvidedkernels.html

--msw
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] is kexec on Xen domU possible?

2013-07-23 Thread Matt Wilson
On Mon, Jul 22, 2013 at 11:33:15AM -0700, Greg KH wrote:
 On Mon, Jul 22, 2013 at 11:24:46AM -0700, H. Peter Anvin wrote:
  On 07/22/2013 10:20 AM, Eric W. Biederman wrote:
  
   Also, in any virtualized environment the hypervisor can do a better job
   for things like kdump, simply because it can provide two things that are
   otherwise hard to do:
  
   1. a known-good system state;
   2. a known-clean kdump image.
  
   As such, I do encourage the virtualization people to (also) develop
   hypervisor-*aware* solutions for these kinds of things.
  
   In general I agree but if you could not change hypervisor
   and/or dom0 (e.g. you are using cloud providers which are
   stick to old versions of Xen) then you have no choice.
   
   Which tends to be where kexec on panic comes in most cases.  Getting
   platform vendors to do something sane tends to be a multi-year political
   effort of dubious worth while just solving the problem locally actually
   gets the problem solved for those who care.
   
  
  It should not be a one or the other issue.
 
 I don't care about kdump, I care about kexec on domU for people who are
 running on cloud providers with old versions of Xen so that they can
 control what kernel they can boot, when they want to boot it.  If kdump
 works as well, that's just a bonus, but it's down on the list of things
 for me to be concerned about.

Many Xen-based cloud providers provide a mechanism for users to boot
the kernels they want. For example you can use PV-GRUB on EC2
instances to boot a kernel that is stored within an AMI.

For more info:
  http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/UserProvidedkernels.html

--msw
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH] xen: remove unused Kconfig parameter

2013-07-10 Thread Matt Wilson
On Wed, Jul 10, 2013 at 12:34:58AM +0200, Sander Eikelenboom wrote:
> 
> Tuesday, July 9, 2013, 5:05:54 PM, you wrote:
> 
> > On Tue, Jul 09, 2013 at 10:48:40AM -0400, Konrad Rzeszutek Wilk wrote:
> >> Then that should be discussed on grub2 to remove said check and modify
> >> the code so that it can properly work without regression.
> 
> > Actually, the kernel patch removing that symbol should be applied so
> > that grub2 breaks faster. One can't possibly rely on kernel internals
> > for anything, as it is insanely insane (yep, the tautology is on purpose
> > :-)).
>
> How insanely insane is it to be able to determine whether a certain
> compiled kernel binary supports a certain function ?
> 
> Grub does this in it's update script to prevent adding a xen +
> kernel combination that has no chance of booting when dom0 support
> has not been configured in the kernel.  That doesn't seem to be a
> unreasonable thought.
> 
> Grepping the accompanied config file in /boot for the xen dom0
> Kconfig parameter seems the best possible effort grub can do at the
> moment.

I think this can be improved, even with the situation today.

> Especially since the Kconfig parameter naming doesn't change that
> often.
> 
> If you know a better way for grub to determine if a certain function
> for a kernel binary is supported then please elaborate ..

Certainly. Parse the ELF notes that are present in a dom0-capable
Linux kernel binary itself.

$ readelf -n vmlinux

Notes at offset 0x0069be88 with length 0x017c:
  Owner Data size   Description
  Xen   0x0006  Unknown note type: (0x0006)
  Xen   0x0004  Unknown note type: (0x0007)
  Xen   0x0008  Unknown note type: (0x0005)
  Xen   0x0008  Unknown note type: (0x0003)
  Xen   0x0008  NT_VERSION (version)
  Xen   0x0008  NT_ARCH (architecture)
  Xen   0x002a  Unknown note type: (0x000a)
  Xen   0x0004  Unknown note type: (0x0009)
  Xen   0x0008  Unknown note type: (0x0008)
  Xen   0x0010  Unknown note type: (0x000d)
  Xen   0x0004  Unknown note type: (0x000e)
  Xen   0x0008  Unknown note type: (0x000c)
  Xen   0x0008  Unknown note type: (0x0004)
  GNU   0x0014  NT_GNU_BUILD_ID (unique build ID bitstring)

See arch/x86/xen/xen-head.S.

There's a new note type (XEN_ELFNOTE_SUPPORTED_FEATURES) that we can
use to make dom0 support explicit.

http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/domain_build.c;hb=HEAD#l415

--msw
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH] xen: remove unused Kconfig parameter

2013-07-10 Thread Matt Wilson
On Wed, Jul 10, 2013 at 12:34:58AM +0200, Sander Eikelenboom wrote:
 
 Tuesday, July 9, 2013, 5:05:54 PM, you wrote:
 
  On Tue, Jul 09, 2013 at 10:48:40AM -0400, Konrad Rzeszutek Wilk wrote:
  Then that should be discussed on grub2 to remove said check and modify
  the code so that it can properly work without regression.
 
  Actually, the kernel patch removing that symbol should be applied so
  that grub2 breaks faster. One can't possibly rely on kernel internals
  for anything, as it is insanely insane (yep, the tautology is on purpose
  :-)).

 How insanely insane is it to be able to determine whether a certain
 compiled kernel binary supports a certain function ?
 
 Grub does this in it's update script to prevent adding a xen +
 kernel combination that has no chance of booting when dom0 support
 has not been configured in the kernel.  That doesn't seem to be a
 unreasonable thought.
 
 Grepping the accompanied config file in /boot for the xen dom0
 Kconfig parameter seems the best possible effort grub can do at the
 moment.

I think this can be improved, even with the situation today.

 Especially since the Kconfig parameter naming doesn't change that
 often.
 
 If you know a better way for grub to determine if a certain function
 for a kernel binary is supported then please elaborate ..

Certainly. Parse the ELF notes that are present in a dom0-capable
Linux kernel binary itself.

$ readelf -n vmlinux

Notes at offset 0x0069be88 with length 0x017c:
  Owner Data size   Description
  Xen   0x0006  Unknown note type: (0x0006)
  Xen   0x0004  Unknown note type: (0x0007)
  Xen   0x0008  Unknown note type: (0x0005)
  Xen   0x0008  Unknown note type: (0x0003)
  Xen   0x0008  NT_VERSION (version)
  Xen   0x0008  NT_ARCH (architecture)
  Xen   0x002a  Unknown note type: (0x000a)
  Xen   0x0004  Unknown note type: (0x0009)
  Xen   0x0008  Unknown note type: (0x0008)
  Xen   0x0010  Unknown note type: (0x000d)
  Xen   0x0004  Unknown note type: (0x000e)
  Xen   0x0008  Unknown note type: (0x000c)
  Xen   0x0008  Unknown note type: (0x0004)
  GNU   0x0014  NT_GNU_BUILD_ID (unique build ID bitstring)

See arch/x86/xen/xen-head.S.

There's a new note type (XEN_ELFNOTE_SUPPORTED_FEATURES) that we can
use to make dom0 support explicit.

http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/domain_build.c;hb=HEAD#l415

--msw
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   >