date:20150605

[dpdk-dev] [PATCH 2/2] ethtool: add new library to provide ethtool-alike APIs

2015-06-05 Thread Thomas Monjalon

2015-06-05 16:07, Andrew Harvey:
> On 6/5/15, 3:46 AM, "Thomas Monjalon"  wrote:
> >Stephen and me say the same thing about using the ethdev API.
> 
> And your point would be valid if dpdk were available to every
> interface we support (it is not) and on every processor architecture that
> we support (it is not) and every OS we support (it is not).  So to
> minimize entropy in the code why not leave the client code the same
> ioctl(fd, ?) and hide the implementation
> detail in a wrapper library.

Please, explain the relation between an ioctl and the DPDK.

[dpdk-dev] [PATCH] examples/distributor: fix missing "; " in debug macro

2015-06-05 Thread Thomas Monjalon

2015-06-05 17:01, Bruce Richardson:
> The macro to turn on additional debug output when the app was compiled
> with "-DDEBUG" was missing a ";".

It shows that such dead code is almost never tested.
It would be saner if this command would return no result:
git grep 'ifdef.*DEBUG' examples
examples/distributor/main.c:#ifdef DEBUG
examples/l3fwd-acl/main.c:#ifdef L3FWDACL_DEBUG
examples/l3fwd-acl/main.c:#ifdef L3FWDACL_DEBUG
examples/l3fwd-acl/main.c:#ifdef L3FWDACL_DEBUG
examples/l3fwd-acl/main.c:#ifdef L3FWDACL_DEBUG
examples/packet_ordering/main.c:#ifdef DEBUG
examples/vhost/main.c:#ifdef DEBUG
examples/vhost/main.h:#ifdef DEBUG
examples/vhost_xen/main.c:#ifdef DEBUG
examples/vhost_xen/main.h:#ifdef DEBUG

There is no good reason not to use CONFIG_RTE_LOG_LEVEL to trigger a debug build.
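
One way to read the suggestion above is to gate debug output on the build-time
log level instead of a per-example DEBUG define, so that the debug statement is
compiled in every build and a missing ";" cannot hide. The sketch below is
self-contained and uses stand-in names (APP_LOG_LEVEL, APP_LOG_DEBUG,
app_debug) that are assumptions for illustration, not the DPDK macros; in a
DPDK build the level would come from CONFIG_RTE_LOG_LEVEL.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Stand-in log levels; hypothetical names, not the DPDK definitions. */
#define APP_LOG_INFO  7
#define APP_LOG_DEBUG 8

/* Build-time level; assumed to be supplied by the build configuration. */
#ifndef APP_LOG_LEVEL
#define APP_LOG_LEVEL APP_LOG_INFO
#endif

/*
 * Always compiled, so syntax errors in debug statements show up in every
 * build. Returns the number of characters written, or 0 when the debug
 * level is not enabled.
 */
static int app_debug(const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(fmt != NULL);
    if (APP_LOG_LEVEL < APP_LOG_DEBUG)
        return 0; /* constant condition: the compiler can drop the rest */

    va_start(ap, fmt);
    n = vfprintf(stderr, fmt, ap);
    va_end(ap);
    return n;
}
```

Building with -DAPP_LOG_LEVEL=8 would enable the output; with the default
level the call compiles but prints nothing.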

[dpdk-dev] 4 Traffic classes per Pipe limitation

2015-06-05 Thread Dumitrescu, Cristian

Hi Avinash,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Yeddula, Avinash
> Sent: Friday, June 5, 2015 6:06 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] 4 Traffic classes per Pipe limitation
> 
> Hi,
> This is related to the QOS scheduler functionality provided by dpdk.
> 
> I see a limit on the number of traffic classes to be 4.  I'm exploring the
> available options to increase that limit to 8.

Yes, there are 4x traffic classes (scheduled in strict priority), but each 
traffic class has 4x queues (scheduled using WFQ); for big weight ratios 
between queues (e.g. 1:4 or 1:8, etc), WFQ becomes very similar to strict 
priority, a kind of strict priority without starvation. So the 16x queues per 
pipe can be considered 16x sub-traffic-classes.

You might want to watch this video on DPDK QoS: https://youtu.be/_PPklkWGugs 

> 
> This is what I found when I researched on this topic.
> The limitation on the number of TCs (and pipes) comes from the number of
> bits available. Since the QoS code overloads the 32 bit RSS field in
> the mbuf there aren't enough bits to allot. But then again if you add lots
> of pipes or subports the memory footprint gets huge.

It is not that simple. The number of 4x traffic classes is deeply built into 
the implementation for performance reasons. Increasing the number of bits 
allocated to traffic class in mbuf->sched would not help.

> 
> Any more info or suggestions on increasing the limit to 8 ?

Yes, look at the 16x pipe queues as 16x (sub)traffic classes.
> 
> Thanks
> -Avinash
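
The suggestion in this reply, using the pipe queues as sub-traffic-classes, can
be sketched as a simple mapping from eight application classes onto the 4x4
layout. This is only an illustration under assumptions: the function names, the
choice of two queues per traffic class, and the 8:1 relative shares are
hypothetical, and how weights are expressed in a real scheduler configuration
is not shown.

```c
#include <assert.h>
#include <stdint.h>

#define TCS_PER_PIPE  4 /* traffic classes per pipe, per the thread */
#define QUEUES_PER_TC 4 /* queues per traffic class, per the thread */
#define QUEUES_USED   2 /* assumption: 8 classes need 2 queues per TC */

struct tc_queue {
    uint32_t tc;    /* traffic class, 0 is served first (strict priority) */
    uint32_t queue; /* queue inside the traffic class (WFQ) */
};

/* Map an application class 0..7 (0 = most important) to (tc, queue). */
static struct tc_queue class8_to_tc_queue(uint32_t cls)
{
    struct tc_queue m;

    assert(cls < TCS_PER_PIPE * QUEUES_USED);
    m.tc = cls / QUEUES_USED;
    m.queue = cls % QUEUES_USED;
    return m;
}

/*
 * Hypothetical relative WFQ share for a queue: a large ratio (8:1) makes
 * queue 0 behave almost like strict priority over queue 1 without
 * starving it.
 */
static uint32_t queue_relative_share(uint32_t queue)
{
    assert(queue < QUEUES_USED);
    return queue == 0 ? 8 : 1;
}
```

With this mapping, classes 0 and 1 share traffic class 0, classes 2 and 3
share traffic class 1, and so on.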

[dpdk-dev] [PATCH 4/4] app: replace dump_cfg with proc_info

2015-06-05 Thread Maryam Tahhan

Extend dump_cfg to also display statistics information for given DPDK
ports and rename the application to proc_info as it's now a utility
doing a little more than just dumping the memory information for DPDK.

Signed-off-by: Maryam Tahhan 
---
 app/Makefile   |   2 +-
 app/dump_cfg/Makefile  |  45 -
 app/dump_cfg/main.c|  92 -
 app/proc_info/Makefile |  45 +
 app/proc_info/main.c   | 525 +
 mk/rte.sdktest.mk  |   4 +-
 6 files changed, 573 insertions(+), 140 deletions(-)
 delete mode 100644 app/dump_cfg/Makefile
 delete mode 100644 app/dump_cfg/main.c
 create mode 100644 app/proc_info/Makefile
 create mode 100644 app/proc_info/main.c

diff --git a/app/Makefile b/app/Makefile
index 50c670b..88c0bad 100644
--- a/app/Makefile
+++ b/app/Makefile
@@ -36,6 +36,6 @@ DIRS-$(CONFIG_RTE_LIBRTE_ACL) += test-acl
 DIRS-$(CONFIG_RTE_LIBRTE_PIPELINE) += test-pipeline
 DIRS-$(CONFIG_RTE_TEST_PMD) += test-pmd
 DIRS-$(CONFIG_RTE_LIBRTE_CMDLINE) += cmdline_test
-DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += dump_cfg
+DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += proc_info

 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/app/dump_cfg/Makefile b/app/dump_cfg/Makefile
deleted file mode 100644
index 3257127..000
--- a/app/dump_cfg/Makefile
+++ /dev/null
@@ -1,45 +0,0 @@
-#   BSD LICENSE
-#
-#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
-#   All rights reserved.
-#
-#   Redistribution and use in source and binary forms, with or without
-#   modification, are permitted provided that the following conditions
-#   are met:
-#
-# * Redistributions of source code must retain the above copyright
-#   notice, this list of conditions and the following disclaimer.
-# * Redistributions in binary form must reproduce the above copyright
-#   notice, this list of conditions and the following disclaimer in
-#   the documentation and/or other materials provided with the
-#   distribution.
-# * Neither the name of Intel Corporation nor the names of its
-#   contributors may be used to endorse or promote products derived
-#   from this software without specific prior written permission.
-#
-#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
-#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-
-include $(RTE_SDK)/mk/rte.vars.mk
-
-APP = dump_cfg
-
-CFLAGS += $(WERROR_FLAGS)
-
-# all source are stored in SRCS-y
-
-SRCS-y := main.c
-
-# this application needs libraries first
-DEPDIRS-y += lib
-
-include $(RTE_SDK)/mk/rte.app.mk
diff --git a/app/dump_cfg/main.c b/app/dump_cfg/main.c
deleted file mode 100644
index 127dbb1..000
--- a/app/dump_cfg/main.c
+++ /dev/null
@@ -1,92 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- * * Redistributions of source code must retain the above copyright
- *   notice, this list of conditions and the following disclaimer.
- * * Redistributions in binary form must reproduce the above copyright
- *   notice, this list of conditions and the following disclaimer in
- *   the documentation and/or other materials provided with the
- *   distribution.
- * * Neither the name of Intel Corporation nor the names of its
- *   contributors may be used to endorse or promote products derived
- *   from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT

[dpdk-dev] [PATCH 3/4] testpmd: extend testpmd to show all extended stats

2015-06-05 Thread Maryam Tahhan

Extend testpmd to show additional aggregate extended stats.

Signed-off-by: Maryam Tahhan 
---
 app/test-pmd/config.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index f788ed5..b42d83f 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -153,6 +153,11 @@ nic_stats_display(portid_t port_id)
   stats.opackets, stats.oerrors, stats.obytes);
}

+   printf("  RX-MAC-errors: %-10"PRIu64" RX-PHY-errors: %-10"PRIu64"\n",
+  stats.imacerr, stats.iphyerr);
+   printf("  RX-nombuf:  %-10"PRIu64"  RX-dropped: %-10"PRIu64"\n",
+  stats.rx_nombuf, stats.idrop);
+
/* stats fdir */
if (fdir_conf.mode != RTE_FDIR_MODE_NONE)
printf("  Fdirmiss:   %-10"PRIu64" Fdirmatch: %-10"PRIu64"\n",
-- 
1.8.1.4

[dpdk-dev] [PATCH 2/4] ethdev: expose extended error stats

2015-06-05 Thread Maryam Tahhan

Extend rte_eth_xstats_get to retrieve additional stats from the device
driver as well as the top-level extended stats. Add additional drop
counters to the extended stats.

Signed-off-by: Maryam Tahhan 
---
 lib/librte_ether/rte_ethdev.c | 12 
 lib/librte_ether/rte_ethdev.h |  4 
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 5a94654..8c22cda 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -129,6 +129,8 @@ static const struct rte_eth_xstats_name_off 
rte_stats_strings[] = {
{"rx_crc_errors", offsetof(struct rte_eth_stats, ibadcrc)},
{"rx_bad_length_errors", offsetof(struct rte_eth_stats, ibadlen)},
{"rx_errors", offsetof(struct rte_eth_stats, ierrors)},
+   {"rx_mac_err", offsetof(struct rte_eth_stats, imacerr)},
+   {"rx_phy_err", offsetof(struct rte_eth_stats, iphyerr)},
{"alloc_rx_buff_failed", offsetof(struct rte_eth_stats, rx_nombuf)},
{"fdir_match", offsetof(struct rte_eth_stats, fdirmatch)},
{"fdir_miss", offsetof(struct rte_eth_stats, fdirmiss)},
@@ -136,6 +138,8 @@ static const struct rte_eth_xstats_name_off 
rte_stats_strings[] = {
{"rx_flow_control_xon", offsetof(struct rte_eth_stats, rx_pause_xon)},
{"tx_flow_control_xoff", offsetof(struct rte_eth_stats, tx_pause_xoff)},
{"rx_flow_control_xoff", offsetof(struct rte_eth_stats, rx_pause_xoff)},
+   {"tx_drops", offsetof(struct rte_eth_stats, odrop)},
+   {"rx_drops", offsetof(struct rte_eth_stats, idrop)},
 };
 #define RTE_NB_STATS (sizeof(rte_stats_strings) / sizeof(rte_stats_strings[0]))

@@ -154,7 +158,6 @@ static const struct rte_eth_xstats_name_off 
rte_txq_stats_strings[] = {
 #define RTE_NB_TXQ_STATS (sizeof(rte_txq_stats_strings) /  \
sizeof(rte_txq_stats_strings[0]))

-
 /**
  * The user application callback description.
  *
@@ -1741,7 +1744,7 @@ rte_eth_xstats_get(uint8_t port_id, struct rte_eth_xstats 
*xstats,
 {
struct rte_eth_stats eth_stats;
struct rte_eth_dev *dev;
-   unsigned count, i, q;
+   unsigned count = 0, xcount = 0, i, q;
uint64_t val;
char *stats_ptr;

@@ -1754,18 +1757,19 @@ rte_eth_xstats_get(uint8_t port_id, struct 
rte_eth_xstats *xstats,

/* implemented by the driver */
if (dev->dev_ops->xstats_get != NULL)
-   return (*dev->dev_ops->xstats_get)(dev, xstats, n);
+   xcount = (*dev->dev_ops->xstats_get)(dev, xstats, n);

/* else, return generic statistics */
count = RTE_NB_STATS;
count += dev->data->nb_rx_queues * RTE_NB_RXQ_STATS;
count += dev->data->nb_tx_queues * RTE_NB_TXQ_STATS;
+   count += xcount;
if (n < count)
return count;

/* now fill the xstats structure */

-   count = 0;
+   count = xcount;
rte_eth_stats_get(port_id, &eth_stats);

/* global stats */
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 16dbe00..5bc3b81 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -224,6 +224,10 @@ struct rte_eth_stats {
/**< Total number of good bytes received from loopback,VF Only */
uint64_t olbbytes;
/**< Total number of good bytes transmitted to loopback,VF Only */
+   uint64_t imacerr;   /**< Total of RX packets with MAC Errors. */
+   uint64_t iphyerr;   /**< Total of RX packets with PHY Errors. */
+   uint64_t idrop;  /**< Total number of dropped received packets. */
+   uint64_t odrop;  /**< Total number of dropped transmitted packets. */
 };

 /**
-- 
1.8.1.4
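
The patch above relies on two ideas: a table of (name, offset) pairs that
walks a stats structure, and a getter that returns the required entry count
when the caller's array is too small. A minimal self-contained sketch of both
follows. All type and function names here (hw_stats, xstat, xstats_get) are
stand-ins chosen for illustration and are not the ethdev definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in counters; not the real rte_eth_stats layout. */
struct hw_stats {
    uint64_t crc_errors;
    uint64_t mac_errors;
    uint64_t drops;
};

struct xstat_name_off {
    const char *name;
    size_t offset;
};

struct xstat {
    char name[32];
    uint64_t value;
};

static const struct xstat_name_off xstat_table[] = {
    {"rx_crc_errors", offsetof(struct hw_stats, crc_errors)},
    {"rx_mac_err", offsetof(struct hw_stats, mac_errors)},
    {"rx_drops", offsetof(struct hw_stats, drops)},
};
#define NB_XSTATS (sizeof(xstat_table) / sizeof(xstat_table[0]))

/*
 * Fill out[] only when it has room for every entry. In both cases return
 * the number of entries required, so a caller can pass n = 0 to size its
 * array first.
 */
static unsigned xstats_get(const struct hw_stats *hw, struct xstat *out,
                           unsigned n)
{
    unsigned i;

    assert(hw != NULL);
    if (n < NB_XSTATS)
        return (unsigned)NB_XSTATS;

    for (i = 0; i < NB_XSTATS; i++) {
        uint64_t v;

        /* Read the counter through its byte offset in the structure. */
        memcpy(&v, (const char *)hw + xstat_table[i].offset, sizeof(v));
        snprintf(out[i].name, sizeof(out[i].name), "%s",
                 xstat_table[i].name);
        out[i].value = v;
    }
    return (unsigned)NB_XSTATS;
}
```

A caller would typically call xstats_get(hw, NULL, 0) once to learn the
count, allocate that many entries, and call again to fill them.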

[dpdk-dev] [PATCH 1/4] ixgbe: expose extended error statistics

2015-06-05 Thread Maryam Tahhan

Implement xstats_get() and xstats_reset() in dev_ops for ixgbe to expose
detailed error statistics to DPDK applications.

Signed-off-by: Maryam Tahhan 
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 160 +--
 1 file changed, 138 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 0d9f9b2..f789aba 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -131,7 +131,10 @@ static int ixgbe_dev_link_update(struct rte_eth_dev *dev,
int wait_to_complete);
 static void ixgbe_dev_stats_get(struct rte_eth_dev *dev,
struct rte_eth_stats *stats);
+static int ixgbe_dev_xstats_get(struct rte_eth_dev *dev,
+   struct rte_eth_xstats *xstats, unsigned n);
 static void ixgbe_dev_stats_reset(struct rte_eth_dev *dev);
+static void ixgbe_dev_xstats_reset(struct rte_eth_dev *dev);
 static int ixgbe_dev_queue_stats_mapping_set(struct rte_eth_dev *eth_dev,
 uint16_t queue_id,
 uint8_t stat_idx,
@@ -330,7 +333,9 @@ static const struct eth_dev_ops ixgbe_eth_dev_ops = {
.allmulticast_disable = ixgbe_dev_allmulticast_disable,
.link_update  = ixgbe_dev_link_update,
.stats_get= ixgbe_dev_stats_get,
+   .xstats_get   = ixgbe_dev_xstats_get,
.stats_reset  = ixgbe_dev_stats_reset,
+   .xstats_reset = ixgbe_dev_xstats_reset,
.queue_stats_mapping_set = ixgbe_dev_queue_stats_mapping_set,
.dev_infos_get= ixgbe_dev_info_get,
.mtu_set  = ixgbe_dev_mtu_set,
@@ -408,6 +413,34 @@ static const struct eth_dev_ops ixgbevf_eth_dev_ops = {
.mac_addr_remove  = ixgbevf_remove_mac_addr,
 };

+/* store statistics names and its offset in stats structure  */
+struct rte_ixgbe_xstats_name_off {
+   char name[RTE_ETH_XSTATS_NAME_SIZE];
+   unsigned offset;
+};
+
+static const struct rte_ixgbe_xstats_name_off rte_ixgbe_stats_strings[] = {
+   {"rx_illegal_byte_err", offsetof(struct ixgbe_hw_stats, errbc)},
+   {"rx_len_err", offsetof(struct ixgbe_hw_stats, rlec)},
+   {"rx_undersize_count", offsetof(struct ixgbe_hw_stats, ruc)},
+   {"rx_oversize_count", offsetof(struct ixgbe_hw_stats, roc)},
+   {"rx_fragment_count", offsetof(struct ixgbe_hw_stats, rfc)},
+   {"rx_jabber_count", offsetof(struct ixgbe_hw_stats, rjc)},
+   {"l3_l4_xsum_error", offsetof(struct ixgbe_hw_stats, xec)},
+   {"mac_local_fault", offsetof(struct ixgbe_hw_stats, mlfc)},
+   {"mac_remote_fault", offsetof(struct ixgbe_hw_stats, mrfc)},
+   {"mac_short_pkt_discard", offsetof(struct ixgbe_hw_stats, mspdc)},
+   {"fccrc_error", offsetof(struct ixgbe_hw_stats, fccrc)},
+   {"fcoe_drop", offsetof(struct ixgbe_hw_stats, fcoerpdc)},
+   {"fc_last_error", offsetof(struct ixgbe_hw_stats, fclast)},
+   {"rx_multicast_packets", offsetof(struct ixgbe_hw_stats, mprc)},
+   {"rx_broadcast_packets", offsetof(struct ixgbe_hw_stats, bprc)},
+   {"mgmt_pkts_dropped", offsetof(struct ixgbe_hw_stats, mngpdc)},
+};
+
+#define RTE_NB_XSTATS (sizeof(rte_ixgbe_stats_strings) /   \
+   sizeof(rte_ixgbe_stats_strings[0]))
+
 /**
  * Atomically reads the link status information from global
  * structure rte_eth_dev.
@@ -1739,26 +1772,18 @@ ixgbe_dev_close(struct rte_eth_dev *dev)
ixgbe_set_rar(hw, 0, hw->mac.addr, 0, IXGBE_RAH_AV);
 }

-/*
- * This function is based on ixgbe_update_stats_counters() in base/ixgbe.c
- */
 static void
-ixgbe_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+ixgbe_read_stats_registers(struct ixgbe_hw *hw, struct ixgbe_hw_stats 
*hw_stats,
+  uint64_t *total_missed_rx, uint64_t *total_qbrc,
+  uint64_t *total_qprc, uint64_t *rxnfgpc,
+  uint64_t *txdgpc)
 {
-   struct ixgbe_hw *hw =
-   IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-   struct ixgbe_hw_stats *hw_stats =
-   IXGBE_DEV_PRIVATE_TO_STATS(dev->data->dev_private);
uint32_t bprc, lxon, lxoff, total;
-   uint64_t total_missed_rx, total_qbrc, total_qprc;
unsigned i;

-   total_missed_rx = 0;
-   total_qbrc = 0;
-   total_qprc = 0;
-
hw_stats->crcerrs += IXGBE_READ_REG(hw, IXGBE_CRCERRS);
hw_stats->illerrc += IXGBE_READ_REG(hw, IXGBE_ILLERRC);
+
hw_stats->errbc += IXGBE_READ_REG(hw, IXGBE_ERRBC);
hw_stats->mspdc += IXGBE_READ_REG(hw, IXGBE_MSPDC);

@@ -1768,7 +1793,7 @@ ixgbe_dev_stats_get(struct rte_eth_dev *dev, struct 
rte_eth_stats *stats)
/* global total per queue */
hw_stats->mpc[i] += mp;
/* Running comprehensive total for stats display */
-

[dpdk-dev] [PATCH 0/4] expose ixgbe extended stats to dpdk apps

2015-06-05 Thread Maryam Tahhan

This patch implements xstats_get() and xstats_reset() in dev_ops for ixgbe to
expose detailed error statistics to DPDK applications.

The dump_cfg application is extended to demonstrate how to retrieve
statistics for DPDK interfaces and is renamed to proc_info in order to reflect
this new functionality.

The testpmd app was also extended to display additional statistics.

Maryam Tahhan (4):
  ixgbe: expose extended error statistics
  ethdev: expose extended error stats
  testpmd: extend testpmd to show all extended stats
  app: replace dump_cfg with proc_info

 app/Makefile |   2 +-
 app/dump_cfg/Makefile|  45 
 app/dump_cfg/main.c  |  92 ---
 app/proc_info/Makefile   |  45 
 app/proc_info/main.c | 525 +++
 app/test-pmd/config.c|   5 +
 drivers/net/ixgbe/ixgbe_ethdev.c | 160 ++--
 lib/librte_ether/rte_ethdev.c|  12 +-
 lib/librte_ether/rte_ethdev.h|   4 +
 mk/rte.sdktest.mk|   4 +-
 10 files changed, 728 insertions(+), 166 deletions(-)
 delete mode 100644 app/dump_cfg/Makefile
 delete mode 100644 app/dump_cfg/main.c
 create mode 100644 app/proc_info/Makefile
 create mode 100644 app/proc_info/main.c

-- 
1.8.1.4

[dpdk-dev] Re: Poor Virtio PMD TX Performance

2015-06-05 Thread 钢锁0310

There is the same problem when using ovs-dpdk.
Maybe that is because receive in the VM is slow, so there are no free tx
descriptors and ovs-dpdk drops the packets.

RTFSC

On Friday, 2015-06-05 17:23, Zhou, Tianlin wrote to dev at dpdk.org:
> Subject: [dpdk-dev] Poor Virtio PMD TX Performance
>
> Hi there,
>
> We tested TX performance of Virtio PMD by DPDK l2fwd, but found even at
> 60KPPS (720B packet length) TX rate, there is 1/1000 packet dropping rate.
> The log shows "No free tx descriptors to transmit" in Virtio PMD.
> Increasing TX queues by modifying DPDK l2fwd can decrease the packet
> dropping rate, but can't ensure no packet dropping unless retransmitting
> packets that can't be sent successfully.
> Oppositely, RX rate can be 600KPPS without packet dropping.
>
> Test Env
> - Host CPU: 4 cores, 2127.770MHz
> - Host Memory: 8G
> - Host OS: Linux dw-2 3.13.0-24-generic #46-Ubuntu SMP Thu Apr 10 19:11:08
>   UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
> - Guest CPU: 4 cores, 2127.770MHz
> - Guest Memory: 4G
> - Guest OS: fedora20
>
> Anybody here face the same problem?
>
> -Tianlin
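
The report above notes that drops can only be avoided by retransmitting the
packets that could not be sent. A bounded retry loop around a burst-send call
might look like the sketch below. The callback type tx_burst_fn and the
fake_tx stub are hypothetical stand-ins so that the example is self-contained;
they are not the DPDK TX burst API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical burst-send callback: tries to send up to n packets and
 * returns how many were accepted, in the style of a PMD TX burst.
 */
typedef uint16_t (*tx_burst_fn)(void *ctx, void **pkts, uint16_t n);

/*
 * Keep offering the unsent tail of the burst until everything is sent or
 * the queue has accepted nothing more than max_retries times in total.
 * Returns the number of packets sent; the caller still owns (and must
 * free or requeue) pkts[sent..n-1].
 */
static uint16_t tx_with_retry(tx_burst_fn tx, void *ctx, void **pkts,
                              uint16_t n, unsigned max_retries)
{
    uint16_t sent = 0;
    unsigned retries = 0;

    assert(tx != NULL);
    while (sent < n) {
        uint16_t done = tx(ctx, pkts + sent, (uint16_t)(n - sent));

        sent = (uint16_t)(sent + done);
        if (done == 0 && ++retries > max_retries)
            break;
    }
    return sent;
}

/* Stub queue for experiments: accepts at most `room` packets per call. */
struct fake_txq {
    uint16_t room;
    uint16_t calls;
};

static uint16_t fake_tx(void *ctx, void **pkts, uint16_t n)
{
    struct fake_txq *q = ctx;

    (void)pkts;
    q->calls++;
    return n < q->room ? n : q->room;
}
```

The retry budget bounds the time spent spinning on a full queue; whether to
drop or requeue the remainder is an application policy choice.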

[dpdk-dev] ACL-Dynamic Adding or Deleting rules

2015-06-05 Thread Ananyev, Konstantin



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Sugumaran, Varthamanan
> Sent: Friday, June 05, 2015 6:22 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] ACL-Dynamic Adding or Deleting rules
> 
> Hi,
> Is there a way to add/delete a single ACL rule in librte_acl?
> I had looked at the library and found no method to add/delete a rule 
> dynamically to the existing ACL context.

No, you always have to call rte_acl_build() to regenerate the whole RT table.
Konstantin


> Please let me know if there are any alternate ways of doing it.
> 
> Thanks
> Vartha
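
Since the answer is that the whole table must be regenerated, one possible
workaround (an assumption here, not something proposed in the thread) is for
the application to keep its full rule list, build a second context that
includes the new rule, and swap it in only once the build has succeeded. The
sketch below shows that pattern with stand-in types; struct rule, struct
built_ctx, and both functions are hypothetical and do not use the librte_acl
API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for an ACL rule. */
struct rule {
    unsigned id;
    unsigned prio;
};

/* Stand-in for a built, read-only classifier context. */
struct built_ctx {
    size_t nb_rules;
    struct rule rules[];
};

/* Full rebuild from a rule list; mirrors "add all rules, then build". */
static struct built_ctx *ctx_build(const struct rule *rules, size_t n)
{
    struct built_ctx *c = malloc(sizeof(*c) + n * sizeof(rules[0]));

    if (c == NULL)
        return NULL;
    c->nb_rules = n;
    if (n > 0)
        memcpy(c->rules, rules, n * sizeof(rules[0]));
    return c;
}

/*
 * Build a replacement context with one more rule and swap it in.
 * Returns 0 on success; on failure the active context is unchanged.
 */
static int ctx_add_rule(struct built_ctx **active, const struct rule *r)
{
    struct built_ctx *old, *fresh;
    struct rule *list;
    size_t n;

    assert(active != NULL && r != NULL);
    old = *active;
    n = (old != NULL) ? old->nb_rules : 0;

    list = malloc((n + 1) * sizeof(*list));
    if (list == NULL)
        return -1;
    if (n > 0)
        memcpy(list, old->rules, n * sizeof(*list));
    list[n] = *r;

    fresh = ctx_build(list, n + 1);
    free(list);
    if (fresh == NULL)
        return -1;

    *active = fresh; /* readers on other cores need their own sync */
    free(old);
    return 0;
}
```

In a multi-core data path the old context could only be freed after every
reader has stopped using it; that synchronization is deliberately left out.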

[dpdk-dev] [PATCH 2/2] ethtool: add new library to provide ethtool-alike APIs

2015-06-05 Thread Andrew Harvey (agh)

On 6/5/15, 5:47 AM, "Bruce Richardson"  wrote:

>On Fri, Jun 05, 2015 at 11:25:09AM +, Wang, Liang-min wrote:
>> 
>> 
>> > -Original Message-
>> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
>> > Sent: Friday, June 05, 2015 6:47 AM
>> > To: Andrew Harvey (agh)
>> > Cc: Stephen Hemminger; Wang, Liang-min; dev at dpdk.org
>> > Subject: Re: [dpdk-dev] [PATCH 2/2] ethtool: add new library to
>>provide
>> > ethtool-alike APIs
>> > 
>> > 2015-06-04 22:10, Andrew Harvey:
>> > > On 6/4/15, 7:58 AM, "Stephen Hemminger"
>> >  wrote:
>> > > >"Andrew Harvey (agh)"  wrote:
>> > > >> I believe that there is value in this interface for software
>>stacks
>> > > >>not  based on Linux being moved toward DPDK that need simple
>> > > >>operations like  getting the mac address.  Some of these stacks
>>have
>> > > >>a dearth of resources  available and dedicating a core/thread to
>>KNI
>> > > >>to get/set a mac address  is considered excessive. There are also
>> > > >>issues with 32/64 bit kernel  integration  using KNI.  If the
>> > > >>ethtool interface is not the correct interface then  please help
>>me
>> > > >>understand what should/could have been used. If ethtool is
>> > > >>considered 'old and clunky', Stephen's and your input would be
>> > > >>valuable in designing another interface  with  similar properties.
>> > > >>The use-case is pretty simple and there is no plans  for moving
>> > > >>anything back into the kernel on the contrary its the complete
>>opposite.
>> > > >>
>> > > >> -- Andy
>> > > >
>> > > >We have DPDK API's to do this, and any added wrappers make it
>>bigger.
>> > > >I don't see why calling your ethtool API is better than calling
>> > > >rte_eth* API.
>> > > >
>> > > >If there is a missing functionality in the rte_ethXXX api's for an
>> > > >application then add that. For example: rte_eth_mac_addr_get()
>> > >
>> > > I am getting somewhat confused by your latest comments.  Your first
>> > > email (referenced below) looked really positive and I found your
>> > > suggestions useful. Your latest post appears to contradict this and
>> > > now the interface was there all the time.  The wrapper façade
>>provided
>> > > by the ethtool library provide a clean separation of concerns and
>>will
>> > > allow people to migrate from not only KNI but in our case from a
>> > > legacy system.  If a software stack has requirements to work with
>> > > multiple IO abstractions then the ethtool approach is attractive. I
>> > > would speculate that many other stacks moving towards dpdk will have
>> > similar issues.
>> > >
>> > > Summarizing, for our use-cases the ethtool interface facilitated our
>> > > adoption to dpdk while allowing us to support our legacy IO
>>abstractions.
>> > 
>> > Stephen and me say the same thing about using the ethdev API.
>> > We don't understand why using a fake ethtool lib would be easier.
>> > Though you are saying it "facilitated [your] adoption to dpdk".
>> > Please could you explain why using an ethtool-like API is easier than
>>using
>> > the existing ethdev API?
>> > In any case, you have to develop a specific backend for DPDK
>>(rte_ethtool
>> > would be also DPDK-specific).
>> 
>> As described earlier in this patch comment reply, there are other
>>ethtool ops that have been implemented.
>> Those ops includes set/get eeprom, set/get pauseparam, set/get
>>ringparam which are not available in the existing ethdev library.
>> For this release, we focus on releasing some basic functions (btw,
>>mac_addr_set is not available but is covered by this patch).
>> The key reason that this set of library is not released as part of
>>ethdev is the ethtool API dependency on kernel include file.
>> To faithfully carry the ethtool ops and net dev ops API parameters, the
>>ethtool APIs are designed to follow the original definition except
>>avoiding carry kernel states.
>> With that, to support ethtool APIs faithfully, we need to include
>>.
>> As suggested by many DPDK veterans including Thomas (indicated over
>>your reply), you would prefer these APIs in a separate library.
>> 
>> > 
>> > It seems you already started to use such an ethtool implementation.
>> > Please note that our goal is not to prevent Cisco from upstreaming
>>(evidence
>> > with enic driver integration) but we want to guide you, and others
>>having the
>> > same needs, to the best solution for everybody.
>> > That's why we need to understand what we (or you) are missing.
>> > Maybe that it would be clearer with some code examples (which would
>>go in
>> > the lib documentation if any).
>> > 
>> > Thanks
>
>How about doing this work as a sample application initially, to
>demonstrate how
>an application written using ethtool APIs could be shimmed to use DPDK
>underneath.
>The ethtool to dpdk mapping could be contained in a single header file
>(or header
>and c file) inside the sample app. This would allow easy re-use of the
>shim
>layer, while at the same time not making it part of the core DPDK

[dpdk-dev] KNI performance

2015-06-05 Thread Marc Sune



On 05/06/15 17:06, Jay Rolette wrote:
> The past few days I've been trying to chase down why operations over KNI
> are so bloody slow. To give you an idea how bad it is, we did a simple test
> over an NFS mount:
>
> # Mount over a non-KNI interface (eth0 on vanilla Ubuntu 14.04 LTS)
> $ time $(ls -last -R /mnt/sfs2008 > /dev/null)
> real    11m58.224s
> user    0m10.758s
> sys 0m25.050s
>
> # Reboot to make sure NFS cache is cleared and mount over a KNI interface
> $ time $(ls -last -R /mnt/sfs2008 > /dev/null)
> real    87m36.295s
> user    0m14.552s
> sys 0m25.949s
>
> Packet captures showed a pretty consistent ~4ms delay. Get a READDIRPLUS
> reply from NFS server and the TCP stack on the DPDK/KNI system took about
> 4ms to ACK the reply. It isn't just on ACK packets either. If there was no
> ACK required, there would be a 4ms delay before the next call was sent
> (ACCESS, LOOKUP, another READDIR, etc.).
>
> This is running on top of a real DPDK app, so there are various queues and
> ring-buffers in the path between KNI and the wire, so I started there. Long
> story short, worst case, those could only inject ~120us of latency into the
> path.
>
> Next stop was KNI itself. Ignoring a few minor optos I found, nothing in
> the code looked like it could account for 4ms of latency. That wasn't quite
> right though...
>
> Here's the code for the processing loop in kni_thread_single():
>
>  while (!kthread_should_stop()) {
>  down_read(&kni_list_lock);
>  for (j = 0; j < KNI_RX_LOOP_NUM; j++) {
>  list_for_each_entry(dev, &kni_list_head, list) {
> #ifdef RTE_KNI_VHOST
>  kni_chk_vhost_rx(dev);
> #else
>  kni_net_rx(dev);
> #endif
>  kni_net_poll_resp(dev);
>  }
>  }
>  up_read(&kni_list_lock);
>  /* reschedule out for a while */
>  schedule_timeout_interruptible(usecs_to_jiffies( \
>  KNI_KTHREAD_RESCHEDULE_INTERVAL));
>
> Turns out the 4ms delay is due to the schedule_timeout() call. The code
> specifies a 5us sleep, but the call only guarantees a sleep of *at least*
> the time specified.
>
> The resolution of the sleep is controlled by the timer interrupt rate. If
> you are using a kernel from one of the usual Linux distros, HZ = 250 on
> x86. That works out nicely to a 4ms period. The KNI kernel thread was going
> to sleep and frequently not getting woken up for nearly 4ms.
>
> We rebuilt the kernel with HZ = 1000 and things improved considerably:
>
> # Mount over a KNI interface, HZ=1000
> $ time $(ls -last -R /mnt/sfs2008 > /dev/null)
>
> real    21m8.478s
> user    0m13.824s
> sys 0m18.113s
>
> Still not where I'd like to get it, but much, much better.
>
> Hopefully my pain is your gain and this helps other KNI users.

Jay,

If you set CONFIG_RTE_KNI_PREEMPT_DEFAULT to 'n' you should see 
reduced latency and delay since there is no preemption (though this 
sacrifices 1 CPU for the kni kmod):

http://patchwork.dpdk.org/dev/patchwork/patch/3304/

However, KNI is still pretty slow. Even considering that there will 
always be at least 1 copy involved, I still think it is too slow. I haven't 
had time to look closer yet.

Marc



>
> Jay
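
The 4 ms figure in the quoted analysis follows from rounding the requested
sleep up to whole timer ticks. The sketch below reproduces that arithmetic in
user space. It is a simplified model, not the kernel's usecs_to_jiffies()
implementation, and it ignores scheduling latency; the function names are
made up for this example.

```c
#include <assert.h>

/*
 * Convert microseconds to timer ticks, rounding up, because a timed sleep
 * may not end earlier than requested.
 */
static unsigned long usecs_to_ticks(unsigned long usecs, unsigned long hz)
{
    unsigned long usecs_per_tick;

    assert(hz > 0 && hz <= 1000000UL);
    usecs_per_tick = 1000000UL / hz;
    return (usecs + usecs_per_tick - 1) / usecs_per_tick;
}

/*
 * Upper bound, in microseconds, on a sleep of `usecs` when wake-ups only
 * happen on tick boundaries.
 */
static unsigned long worst_case_sleep_usecs(unsigned long usecs,
                                            unsigned long hz)
{
    return usecs_to_ticks(usecs, hz) * (1000000UL / hz);
}
```

For the 5 us request in the thread, this gives up to 4000 us at HZ = 250 and
up to 1000 us at HZ = 1000, which matches the direction of the improvement
reported after rebuilding the kernel.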

[dpdk-dev] add support for HTM lock elision for x86

2015-06-05 Thread Roman Dementiev

Hello Stephen,

Wednesday, June 3, 2015, 8:40:14 PM, you wrote:

> On Tue,  2 Jun 2015 15:11:30 +0200
> Roman Dementiev  wrote:

>> 
>> This series of patches adds methods that use hardware memory transactions
>> (HTM) on the fast path for DPDK locks (a.k.a. lock elision). Here the
>> methods are implemented for x86 using Restricted Transactional Memory
>> instructions (Intel(r) Transactional Synchronization Extensions). The
>> implementation falls back to the normal DPDK lock if HTM is not available
>> or memory transactions fail.
>> This is not a replacement for all lock usages since not all critical
>> sections protected by locks are friendly to HTM.
>> 

> You probably want to put a caveat around this, it won't work for people
> that expect to use spinlocks to protect I/O operations on hardware.
> Since I/O operations aren't like memory.

Yes, I/O cannot be rolled back by the CPU if the transaction fails. Thus
HTM transactions protecting I/O operations are always aborted by the
CPU. In Intel TSX, I/O operations (MMIO, outp, etc.) are TSX-unfriendly
and cause an immediate abort.

-- 
Best regards,
 Roman  mailto:roman.dementiev at intel.com


[dpdk-dev] [PATCH 3/3] fm10k: Fix improper max queue number for VF

2015-06-05 Thread Chen Jing D(Mark)

From: "Chen Jing D(Mark)" 

PF and VF share the code in function fm10k_stats_get().
The function works well for the PF, but has a problem for the VF since
the VF has fewer queues than the PF.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k_ethdev.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 3792df6..2c819e5 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -829,7 +829,7 @@ fm10k_stats_get(struct rte_eth_dev *dev, struct 
rte_eth_stats *stats)

ipackets = opackets = ibytes = obytes = 0;
for (i = 0; (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) &&
-   (i < FM10K_MAX_QUEUES_PF); ++i) {
+   (i < hw->mac.max_queues); ++i) {
stats->q_ipackets[i] = hw_stats->q[i].rx_packets.count;
stats->q_opackets[i] = hw_stats->q[i].tx_packets.count;
stats->q_ibytes[i]   = hw_stats->q[i].rx_bytes.count;
-- 
1.7.7.6
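
The fix above bounds the per-queue loop by the number of queues the device
really has, rather than by a PF-only constant. A self-contained sketch of the
same bounding logic follows; the structures, the STAT_CNTRS value, and
collect_stats() are stand-ins for illustration, not the fm10k or ethdev
definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STAT_CNTRS 16 /* stand-in for RTE_ETHDEV_QUEUE_STAT_CNTRS */

struct hw_queue_stats {
    uint64_t rx_packets;
    uint64_t tx_packets;
};

struct port_stats {
    uint64_t q_ipackets[STAT_CNTRS];
    uint64_t q_opackets[STAT_CNTRS];
    uint64_t ipackets;
    uint64_t opackets;
};

/*
 * Copy per-queue counters, stopping at whichever limit is smaller: the
 * size of the per-queue arrays or the number of queues this device (PF
 * or VF) actually owns.
 */
static void collect_stats(const struct hw_queue_stats *hw_q,
                          unsigned max_queues, struct port_stats *out)
{
    unsigned i;

    assert(hw_q != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    for (i = 0; i < STAT_CNTRS && i < max_queues; ++i) {
        out->q_ipackets[i] = hw_q[i].rx_packets;
        out->q_opackets[i] = hw_q[i].tx_packets;
        out->ipackets += hw_q[i].rx_packets;
        out->opackets += hw_q[i].tx_packets;
    }
}
```

Passing the device's own queue count keeps the loop from reading counters
for queues that a VF does not have.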

[dpdk-dev] [PATCH 2/3] fm10k: remove mbuf size sanity check

2015-06-05 Thread Chen Jing D(Mark)

From: "Chen Jing D(Mark)" 

The original implementation required the mbuf size to be greater than
ETHER_MAX_VLAN_FRAME_LEN, which is not necessary. If it is smaller
than that value, the scatter function is selected and incoming
packets larger than the mbuf size are split across several mbufs.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k_ethdev.c |7 +--
 1 files changed, 1 insertions(+), 6 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 9274ca3..3792df6 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -1005,9 +1005,7 @@ handle_rxconf(struct fm10k_rx_queue *q, const struct 
rte_eth_rxconf *conf)
  *  2. Address is 8B aligned and buffer does not cross 4K boundary.
  *
  * As such, the driver may need to adjust the DMA address within the
- * buffer by up to 512B. The mempool element size is checked here
- * to make sure a maximally sized Ethernet frame can still be wholly
- * contained within the buffer after 512B alignment.
+ * buffer by up to 512B.
  *
  * return 1 if the element size is valid, otherwise return 0.
  */
@@ -1027,9 +1025,6 @@ mempool_element_size_valid(struct rte_mempool *mp)
if (min_size > mp->elt_size)
return 0;

-   if (min_size < ETHER_MAX_VLAN_FRAME_LEN)
-   return 0;
-
/* size is valid */
return 1;
 }
-- 
1.7.7.6

[dpdk-dev] [PATCH 1/3] fm10k: Add promiscuous mode support

2015-06-05 Thread Chen Jing D(Mark)

From: "Chen Jing D(Mark)" 

Add functions to support promiscuous/allmulticast enable and
disable.
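
Since fm10k can select only one xcast mode at a time, the four callbacks
effectively resolve two flags into a single mode. A minimal sketch of that
resolution, using stand-in enum values (hypothetical, not the driver's
FM10K_XCAST_MODE_* constants):

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the driver's xcast mode constants (hypothetical). */
enum xcast_mode { XCAST_NONE, XCAST_ALLMULTI, XCAST_PROMISC };

/* Promiscuous wins over allmulticast; neither flag means unicast only. */
static inline enum xcast_mode
resolve_xcast_mode(bool promiscuous, bool all_multicast)
{
	if (promiscuous)
		return XCAST_PROMISC;
	return all_multicast ? XCAST_ALLMULTI : XCAST_NONE;
}
```

This is why disabling promiscuous mode falls back to allmulticast when that
flag is still set, and why the allmulticast callbacks do nothing while
promiscuous mode is on.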

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k_ethdev.c |  118 +-
 1 files changed, 117 insertions(+), 1 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 87852ed..9274ca3 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -51,6 +51,11 @@
 #define BIT_MASK_PER_UINT32 ((1 << CHARS_PER_UINT32) - 1)

 static void fm10k_close_mbx_service(struct fm10k_hw *hw);
+static void fm10k_dev_promiscuous_enable(struct rte_eth_dev *dev);
+static void fm10k_dev_promiscuous_disable(struct rte_eth_dev *dev);
+static void fm10k_dev_allmulticast_enable(struct rte_eth_dev *dev);
+static void fm10k_dev_allmulticast_disable(struct rte_eth_dev *dev);
+static inline int fm10k_glort_valid(struct fm10k_hw *hw);

 static void
 fm10k_mbx_initlock(struct fm10k_hw *hw)
@@ -566,6 +571,113 @@ fm10k_dev_tx_queue_stop(struct rte_eth_dev *dev, uint16_t 
tx_queue_id)
return 0;
 }

+static inline int fm10k_glort_valid(struct fm10k_hw *hw)
+{
+   return ((hw->mac.dglort_map & FM10K_DGLORTMAP_NONE)
+   != FM10K_DGLORTMAP_NONE);
+}
+
+static void
+fm10k_dev_promiscuous_enable(struct rte_eth_dev *dev)
+{
+   struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   int status;
+
+   PMD_INIT_FUNC_TRACE();
+
+   /* Return if it didn't acquire valid glort range */
+   if (!fm10k_glort_valid(hw))
+   return;
+
+   fm10k_mbx_lock(hw);
+   status = hw->mac.ops.update_xcast_mode(hw, hw->mac.dglort_map,
+   FM10K_XCAST_MODE_PROMISC);
+   fm10k_mbx_unlock(hw);
+
+   if (status != FM10K_SUCCESS)
+   PMD_INIT_LOG(ERR, "Failed to enable promiscuous mode");
+}
+
+static void
+fm10k_dev_promiscuous_disable(struct rte_eth_dev *dev)
+{
+   struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   uint8_t mode;
+   int status;
+
+   PMD_INIT_FUNC_TRACE();
+
+   /* Return if it didn't acquire valid glort range */
+   if (!fm10k_glort_valid(hw))
+   return;
+
+   if (dev->data->all_multicast == 1)
+   mode = FM10K_XCAST_MODE_ALLMULTI;
+   else
+   mode = FM10K_XCAST_MODE_NONE;
+
+   fm10k_mbx_lock(hw);
+   status = hw->mac.ops.update_xcast_mode(hw, hw->mac.dglort_map,
+   mode);
+   fm10k_mbx_unlock(hw);
+
+   if (status != FM10K_SUCCESS)
+   PMD_INIT_LOG(ERR, "Failed to disable promiscuous mode");
+}
+
+static void
+fm10k_dev_allmulticast_enable(struct rte_eth_dev *dev)
+{
+   struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   int status;
+
+   PMD_INIT_FUNC_TRACE();
+
+   /* Return if it didn't acquire valid glort range */
+   if (!fm10k_glort_valid(hw))
+   return;
+
+   /* If promiscuous mode is enabled, it doesn't make sense to enable
+* allmulticast and disable promiscuous since fm10k only can select
+* one of the modes.
+*/
+   if (dev->data->promiscuous)
+   return;
+
+   fm10k_mbx_lock(hw);
+   status = hw->mac.ops.update_xcast_mode(hw, hw->mac.dglort_map,
+   FM10K_XCAST_MODE_ALLMULTI);
+   fm10k_mbx_unlock(hw);
+
+   if (status != FM10K_SUCCESS)
+   PMD_INIT_LOG(ERR, "Failed to enable allmulticast mode");
+}
+
+static void
+fm10k_dev_allmulticast_disable(struct rte_eth_dev *dev)
+{
+   struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   int status;
+
+   PMD_INIT_FUNC_TRACE();
+
+   /* Return if it didn't acquire valid glort range */
+   if (!fm10k_glort_valid(hw))
+   return;
+
+   if (dev->data->promiscuous)
+   return;
+
+   fm10k_mbx_lock(hw);
+   /* Change mode to unicast mode */
+   status = hw->mac.ops.update_xcast_mode(hw, hw->mac.dglort_map,
+   FM10K_XCAST_MODE_NONE);
+   fm10k_mbx_unlock(hw);
+
+   if (status != FM10K_SUCCESS)
+   PMD_INIT_LOG(ERR, "Failed to disable allmulticast mode");
+}
+
 /* fls = find last set bit = 32 minus the number of leading zeros */
 #ifndef fls
 #define fls(x) (((x) == 0) ? 0 : (32 - __builtin_clz((x))))
@@ -1654,6 +1766,10 @@ static const struct eth_dev_ops fm10k_eth_dev_ops = {
.dev_start  = fm10k_dev_start,
.dev_stop   = fm10k_dev_stop,
.dev_close  = fm10k_dev_close,
+   .promiscuous_enable = fm10k_dev_promiscuous_enable,
+   .promiscuous_disable= fm10k_dev_promiscuous_disable,
+   .allmulticast_enable= fm10k_dev_allmulticast_enable,
+   .allmulticast_disable   = fm10k_dev_allmulticast_disable,
.stats_get

[dpdk-dev] [PATCH 0/3] fm10k: Add promiscuous mode support

2015-06-05 Thread Chen Jing D(Mark)

From: "Chen Jing D(Mark)" 

This patch set adds promiscuous mode configuration and 2 bug fixes.

Chen Jing D(Mark) (3):
  fm10k: Add promiscuous mode support
  fm10k: remove mbuf size sanity check
  fm10k: Fix improper max queue number for VF

 drivers/net/fm10k/fm10k_ethdev.c |  127 +++---
 1 files changed, 119 insertions(+), 8 deletions(-)

-- 
1.7.7.6

[dpdk-dev] [PATCH] examples/distributor: fix missing "; " in debug macro

2015-06-05 Thread Bruce Richardson

The macro to turn on additional debug output when the app was compiled
with "-DDEBUG" was missing a ";".

Fixes: 07db4a975094 ("examples/distributor: new sample app")

Signed-off-by: Anbarasan Murugesan 
Signed-off-by: Bruce Richardson 
---
 examples/distributor/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index ae3e7b3..972bddb 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -57,7 +57,7 @@
 #ifdef DEBUG
 #define LOG_LEVEL RTE_LOG_DEBUG
 #define LOG_DEBUG(log_type, fmt, args...) do { \
-   RTE_LOG(DEBUG, log_type, fmt, ##args)   \
+   RTE_LOG(DEBUG, log_type, fmt, ##args);  \
 } while (0)
 #else
 #define LOG_LEVEL RTE_LOG_INFO
-- 
2.4.2

[dpdk-dev] [PATCH] lib: fix RTE_MBUF_METADATA macros

2015-06-05 Thread Daniel Mrzyglod

Fix RTE_MBUF_METADATA macros to allow for unaligned accesses to
meta-data fields.
Forcing aligned accesses is not really required, so this is removing an
unneeded constraint.
This issue was met during testing of the new version of the ip_pipeline
application. There is no performance impact.
This change has no ABI impact, as the previous code that uses aligned
accesses continues to run without any issues.
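
To illustrate the difference (a sketch on a plain byte buffer rather than a
real struct rte_mbuf; the helper names are hypothetical): the old macros
indexed a typed array, so a byte offset was rounded down to a multiple of
the field size, while the new macros add the byte offset first and cast
afterwards. The sketch reads through memcpy so that it stays well-defined
whatever the alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte position the old index-based uint32 macro effectively touched. */
static inline size_t
old_u32_byte_pos(size_t offset)
{
	return (offset / sizeof(uint32_t)) * sizeof(uint32_t);
}

/* Read a uint32 at an arbitrary byte offset, as the new macros allow. */
static inline uint32_t
meta_read_u32(const uint8_t *meta, size_t offset)
{
	uint32_t v;

	memcpy(&v, meta + offset, sizeof(v));
	return v;
}
```

For offset 6 the old form would have addressed byte 4, whereas the new form
addresses byte 6 as requested. Aligned offsets behave the same in both.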

Signed-off-by: Daniel Mrzyglod 
---
 lib/librte_pipeline/rte_pipeline.c  |  8 
 lib/librte_port/rte_port.h  | 26 +-
 lib/librte_table/rte_table_array.c  |  4 +---
 lib/librte_table/rte_table_hash_ext.c   | 13 -
 lib/librte_table/rte_table_hash_key16.c | 24 
 lib/librte_table/rte_table_hash_key32.c | 24 
 lib/librte_table/rte_table_hash_key8.c  | 24 
 lib/librte_table/rte_table_hash_lru.c   | 13 -
 lib/librte_table/rte_table_lpm.c|  4 
 lib/librte_table/rte_table_lpm_ipv6.c   |  4 
 10 files changed, 14 insertions(+), 130 deletions(-)

diff --git a/lib/librte_pipeline/rte_pipeline.c 
b/lib/librte_pipeline/rte_pipeline.c
index 36d92c9..b777cf1 100644
--- a/lib/librte_pipeline/rte_pipeline.c
+++ b/lib/librte_pipeline/rte_pipeline.c
@@ -175,14 +175,6 @@ rte_pipeline_check_params(struct rte_pipeline_params 
*params)
return -EINVAL;
}

-   /* offset_port_id */
-   if (params->offset_port_id & 0x3) {
-   RTE_LOG(ERR, PIPELINE,
-   "%s: Incorrect value for parameter offset_port_id\n",
-   __func__);
-   return -EINVAL;
-   }
-
return 0;
 }

diff --git a/lib/librte_port/rte_port.h b/lib/librte_port/rte_port.h
index d84e5a1..c3a0cca 100644
--- a/lib/librte_port/rte_port.h
+++ b/lib/librte_port/rte_port.h
@@ -54,23 +54,23 @@ extern "C" {
  * Macros to allow accessing metadata stored in the mbuf headroom
  * just beyond the end of the mbuf data structure returned by a port
  */
-#define RTE_MBUF_METADATA_UINT8(mbuf, offset)  \
-   (((uint8_t *)&(mbuf)[1])[offset])
-#define RTE_MBUF_METADATA_UINT16(mbuf, offset) \
-   (((uint16_t *)&(mbuf)[1])[offset/sizeof(uint16_t)])
-#define RTE_MBUF_METADATA_UINT32(mbuf, offset) \
-   (((uint32_t *)&(mbuf)[1])[offset/sizeof(uint32_t)])
-#define RTE_MBUF_METADATA_UINT64(mbuf, offset) \
-   (((uint64_t *)&(mbuf)[1])[offset/sizeof(uint64_t)])
-
 #define RTE_MBUF_METADATA_UINT8_PTR(mbuf, offset)  \
-   (&RTE_MBUF_METADATA_UINT8(mbuf, offset))
+   (&((uint8_t *) &(mbuf)[1])[offset])
 #define RTE_MBUF_METADATA_UINT16_PTR(mbuf, offset) \
-   (&RTE_MBUF_METADATA_UINT16(mbuf, offset))
+   ((uint16_t *) RTE_MBUF_METADATA_UINT8_PTR(mbuf, offset))
 #define RTE_MBUF_METADATA_UINT32_PTR(mbuf, offset) \
-   (&RTE_MBUF_METADATA_UINT32(mbuf, offset))
+   ((uint32_t *) RTE_MBUF_METADATA_UINT8_PTR(mbuf, offset))
 #define RTE_MBUF_METADATA_UINT64_PTR(mbuf, offset) \
-   (&RTE_MBUF_METADATA_UINT64(mbuf, offset))
+   ((uint64_t *) RTE_MBUF_METADATA_UINT8_PTR(mbuf, offset))
+
+#define RTE_MBUF_METADATA_UINT8(mbuf, offset)  \
+   (* RTE_MBUF_METADATA_UINT8_PTR(mbuf, offset))
+#define RTE_MBUF_METADATA_UINT16(mbuf, offset) \
+   (* RTE_MBUF_METADATA_UINT16_PTR(mbuf, offset))
+#define RTE_MBUF_METADATA_UINT32(mbuf, offset) \
+   (* RTE_MBUF_METADATA_UINT32_PTR(mbuf, offset))
+#define RTE_MBUF_METADATA_UINT64(mbuf, offset) \
+   (* RTE_MBUF_METADATA_UINT64_PTR(mbuf, offset))
 /**@}*/

 /*
diff --git a/lib/librte_table/rte_table_array.c 
b/lib/librte_table/rte_table_array.c
index c031070..b00ca67 100644
--- a/lib/librte_table/rte_table_array.c
+++ b/lib/librte_table/rte_table_array.c
@@ -66,10 +66,8 @@ rte_table_array_create(void *params, int socket_id, uint32_t 
entry_size)
/* Check input parameters */
if ((p == NULL) ||
(p->n_entries == 0) ||
-   (!rte_is_power_of_2(p->n_entries)) ||
-   ((p->offset & 0x3) != 0)) {
+   (!rte_is_power_of_2(p->n_entries)))
return NULL;
-   }

/* Memory allocation */
total_cl_size = (sizeof(struct rte_table_array) +
diff --git a/lib/librte_table/rte_table_hash_ext.c 
b/lib/librte_table/rte_table_hash_ext.c
index 66e416b..73beeaf 100644
--- a/lib/librte_table/rte_table_hash_ext.c
+++ b/lib/librte_table/rte_table_hash_ext.c
@@ -149,19 +149,6 @@ check_params_create(struct rte_table_hash_ext_params 
*params)
return -EINVAL;
}

-   /* signature offset */
-   if ((params->signature_offset & 0x3) != 0) {
-   RTE_LOG(ERR, TABLE, "%s: signature_offset invalid value\n",
-   __func__);
-   return -EINVAL;
-   }
-
-   /* key offset */
-   if ((params->key_offset

[dpdk-dev] [PATCH v11 13/13] abi: fix v2.1 abi broken issue

2015-06-05 Thread Cunming Liang

RTE_EAL_RX_INTR will be removed in v2.2. It is only used to avoid an
unannounced ABI break in v2.1.
Users should make sure they understand the impact before turning on the feature.
This interrupt patch set requires two ABI changes:
1) struct rte_intr_handle; 2) struct rte_intr_conf.

Signed-off-by: Cunming Liang 
---
 v9 Acked-by: vincent jardin 

 drivers/net/e1000/igb_ethdev.c | 28 -
 drivers/net/ixgbe/ixgbe_ethdev.c   | 41 -
 examples/l3fwd-power/main.c|  3 +-
 .../bsdapp/eal/include/exec-env/rte_interrupts.h   |  7 +++
 lib/librte_eal/linuxapp/eal/eal_interrupts.c   | 12 
 .../linuxapp/eal/include/exec-env/rte_interrupts.h | 68 +-
 lib/librte_ether/rte_ethdev.c  |  2 +
 lib/librte_ether/rte_ethdev.h  | 32 +-
 8 files changed, 182 insertions(+), 11 deletions(-)

diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index bbd7b74..6f29222 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -96,7 +96,9 @@ static int  eth_igb_flow_ctrl_get(struct rte_eth_dev *dev,
 static int  eth_igb_flow_ctrl_set(struct rte_eth_dev *dev,
struct rte_eth_fc_conf *fc_conf);
 static int eth_igb_lsc_interrupt_setup(struct rte_eth_dev *dev);
+#ifdef RTE_EAL_RX_INTR
 static int eth_igb_rxq_interrupt_setup(struct rte_eth_dev *dev);
+#endif
 static int eth_igb_interrupt_get_status(struct rte_eth_dev *dev);
 static int eth_igb_interrupt_action(struct rte_eth_dev *dev);
 static void eth_igb_interrupt_handler(struct rte_intr_handle *handle,
@@ -199,11 +201,15 @@ static int eth_igb_rx_queue_intr_enable(struct 
rte_eth_dev *dev,
uint16_t queue_id);
 static int eth_igb_rx_queue_intr_disable(struct rte_eth_dev *dev,
uint16_t queue_id);
+#ifdef RTE_EAL_RX_INTR
 static void eth_igb_assign_msix_vector(struct e1000_hw *hw, int8_t direction,
uint8_t queue, uint8_t msix_vector);
+#endif
 static void eth_igb_configure_msix_intr(struct rte_eth_dev *dev);
+#ifdef RTE_EAL_RX_INTR
 static void eth_igb_write_ivar(struct e1000_hw *hw, uint8_t msix_vector,
uint8_t index, uint8_t offset);
+#endif

 /*
  * Define VF Stats MACRO for Non "cleared on read" register
@@ -760,7 +766,9 @@ eth_igb_start(struct rte_eth_dev *dev)
struct e1000_hw *hw =
E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
+#ifdef RTE_EAL_RX_INTR
uint32_t intr_vector = 0;
+#endif
int ret, mask;
uint32_t ctrl_ext;

@@ -801,6 +809,7 @@ eth_igb_start(struct rte_eth_dev *dev)
/* configure PF module if SRIOV enabled */
igb_pf_host_configure(dev);

+#ifdef RTE_EAL_RX_INTR
/* check and configure queue intr-vector mapping */
if (dev->data->dev_conf.intr_conf.rxq != 0)
intr_vector = dev->data->nb_rx_queues;
@@ -818,6 +827,7 @@ eth_igb_start(struct rte_eth_dev *dev)
return -ENOMEM;
}
}
+#endif

/* confiugre msix for rx interrupt */
eth_igb_configure_msix_intr(dev);
@@ -913,9 +923,11 @@ eth_igb_start(struct rte_eth_dev *dev)
 " no intr multiplex\n");
}

+#ifdef RTE_EAL_RX_INTR
/* check if rxq interrupt is enabled */
if (dev->data->dev_conf.intr_conf.rxq != 0)
eth_igb_rxq_interrupt_setup(dev);
+#endif

/* enable uio/vfio intr/eventfd mapping */
rte_intr_enable(intr_handle);
@@ -1007,12 +1019,14 @@ eth_igb_stop(struct rte_eth_dev *dev)
}
filter_info->twotuple_mask = 0;

+#ifdef RTE_EAL_RX_INTR
/* Clean datapath event and queue/vec mapping */
rte_intr_efd_disable(intr_handle);
if (intr_handle->intr_vec != NULL) {
rte_free(intr_handle->intr_vec);
intr_handle->intr_vec = NULL;
}
+#endif
 }

 static void
@@ -1020,7 +1034,9 @@ eth_igb_close(struct rte_eth_dev *dev)
 {
struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
struct rte_eth_link link;
+#ifdef RTE_EAL_RX_INTR
struct rte_pci_device *pci_dev;
+#endif

eth_igb_stop(dev);
e1000_phy_hw_reset(hw);
@@ -1038,11 +1054,13 @@ eth_igb_close(struct rte_eth_dev *dev)

igb_dev_clear_queues(dev);

+#ifdef RTE_EAL_RX_INTR
pci_dev = dev->pci_dev;
if (pci_dev->intr_handle.intr_vec) {
rte_free(pci_dev->intr_handle.intr_vec);
pci_dev->intr_handle.intr_vec = NULL;
}
+#endif

memset(&link, 0, sizeof(link));
rte_igb_dev_atomic_write_link_status(dev, &link);
@@ -1867,6 +1885,7 @@ eth_igb_lsc_interrupt_setup(struct rte_eth_dev *dev)
return

[dpdk-dev] [PATCH v11 12/13] l3fwd-power: enable one-shot rx interrupt and polling/interrupt mode switch

2015-06-05 Thread Cunming Liang

Demonstrate how to handle per rx queue interrupts in a NAPI-like
implementation in userspace. The DPDK polling thread mainly works in
polling mode and switches to interrupt mode only if no packet has
been received in recent polls.
User space interrupt notification generally takes a lot more cycles
than in the kernel, so a one-shot interrupt is used here to guarantee minimum
overhead, and the DPDK polling thread returns to polling mode immediately
once it receives an interrupt notification for an incoming packet.
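
The switching rule described above can be sketched as a small state machine.
This is an illustration only, with hypothetical names; it does not call any
DPDK API and is not the code added to l3fwd-power.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-queue bookkeeping. */
struct rxq_poll_state {
	uint32_t zero_polls;	/* consecutive polls that returned no packets */
	bool intr_mode;		/* true while waiting on a one-shot interrupt */
};

/* Record one poll result. Returns true when the caller should arm the
 * one-shot rx interrupt and block until it fires. */
static inline bool
rxq_after_poll(struct rxq_poll_state *s, uint16_t nb_rx, uint32_t threshold)
{
	if (nb_rx > 0) {
		s->zero_polls = 0;
		return false;
	}
	if (++s->zero_polls < threshold)
		return false;
	s->intr_mode = true;
	return true;
}

/* Called once the interrupt notification arrives: resume polling. */
static inline void
rxq_on_wakeup(struct rxq_poll_state *s)
{
	s->zero_polls = 0;
	s->intr_mode = false;
}
```

Because the interrupt is one-shot, the thread has to re-arm it on each
transition into interrupt mode; between wakeups it polls as usual.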

Signed-off-by: Danny Zhou 
Signed-off-by: Cunming Liang 
---
v7 changes
 - using new APIs
 - demo multiple port/queue pair wait on the same epoll instance

v6 changes
 - Split event fd add and wait

v5 changes
 - Change invoked function name and parameter to accommodate EAL change

v3 changes
 - Add spinlock to ensure thread safe when accessing interrupt mask
   register

v2 changes
 - Remove unused function which is for debug purpose

 examples/l3fwd-power/main.c | 207 +++-
 1 file changed, 165 insertions(+), 42 deletions(-)

diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
index 6ac342b..538bb93 100644
--- a/examples/l3fwd-power/main.c
+++ b/examples/l3fwd-power/main.c
@@ -74,12 +74,14 @@
 #include 
 #include 
 #include 
+#include 
+#include 

 #define RTE_LOGTYPE_L3FWD_POWER RTE_LOGTYPE_USER1

 #define MAX_PKT_BURST 32

-#define MIN_ZERO_POLL_COUNT 5
+#define MIN_ZERO_POLL_COUNT 10

 /* around 100ms at 2 Ghz */
 #define TIMER_RESOLUTION_CYCLES   200000000ULL
@@ -153,6 +155,9 @@ static uint16_t nb_txd = RTE_TEST_TX_DESC_DEFAULT;
 /* ethernet addresses of ports */
 static struct ether_addr ports_eth_addr[RTE_MAX_ETHPORTS];

+/* ethernet addresses of ports */
+static rte_spinlock_t locks[RTE_MAX_ETHPORTS];
+
 /* mask of enabled ports */
 static uint32_t enabled_port_mask = 0;
 /* Ports set in promiscuous mode off by default. */
@@ -185,6 +190,9 @@ struct lcore_rx_queue {
 #define MAX_TX_QUEUE_PER_PORT RTE_MAX_ETHPORTS
 #define MAX_RX_QUEUE_PER_PORT 128

+#define MAX_RX_QUEUE_INTERRUPT_PER_PORT 16
+
+
 #define MAX_LCORE_PARAMS 1024
 struct lcore_params {
uint8_t port_id;
@@ -211,7 +219,7 @@ static uint16_t nb_lcore_params = 
sizeof(lcore_params_array_default) /

 static struct rte_eth_conf port_conf = {
.rxmode = {
-   .mq_mode= ETH_MQ_RX_RSS,
+   .mq_mode = ETH_MQ_RX_RSS,
.max_rx_pkt_len = ETHER_MAX_LEN,
.split_hdr_size = 0,
.header_split   = 0, /**< Header Split disabled */
@@ -223,11 +231,15 @@ static struct rte_eth_conf port_conf = {
.rx_adv_conf = {
.rss_conf = {
.rss_key = NULL,
-   .rss_hf = ETH_RSS_IP,
+   .rss_hf = ETH_RSS_UDP,
},
},
.txmode = {
-   .mq_mode = ETH_DCB_NONE,
+   .mq_mode = ETH_MQ_TX_NONE,
+   },
+   .intr_conf = {
+   .lsc = 1,
+   .rxq = 1, /**< rxq interrupt feature enabled */
},
 };

@@ -399,19 +411,22 @@ power_timer_cb(__attribute__((unused)) struct rte_timer 
*tim,
/* accumulate total execution time in us when callback is invoked */
sleep_time_ratio = (float)(stats[lcore_id].sleep_time) /
(float)SCALING_PERIOD;
-
/**
 * check whether need to scale down frequency a step if it sleep a lot.
 */
-   if (sleep_time_ratio >= SCALING_DOWN_TIME_RATIO_THRESHOLD)
-   rte_power_freq_down(lcore_id);
+   if (sleep_time_ratio >= SCALING_DOWN_TIME_RATIO_THRESHOLD) {
+   if (rte_power_freq_down)
+   rte_power_freq_down(lcore_id);
+   }
else if ( (unsigned)(stats[lcore_id].nb_rx_processed /
-   stats[lcore_id].nb_iteration_looped) < MAX_PKT_BURST)
+   stats[lcore_id].nb_iteration_looped) < MAX_PKT_BURST) {
/**
 * scale down a step if average packet per iteration less
 * than expectation.
 */
-   rte_power_freq_down(lcore_id);
+   if (rte_power_freq_down)
+   rte_power_freq_down(lcore_id);
+   }

/**
 * initialize another timer according to current frequency to ensure
@@ -704,22 +719,20 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid,

 }

-#define SLEEP_GEAR1_THRESHOLD            100
-#define SLEEP_GEAR2_THRESHOLD            1000
+#define MINIMUM_SLEEP_TIME 1
+#define SUSPEND_THRESHOLD  300

 static inline uint32_t
 power_idle_heuristic(uint32_t zero_rx_packet_count)
 {
-   /* If zero count is less than 100, use it as the sleep time in us */
-   if (zero_rx_packet_count < SLEEP_GEAR1_THRESHOLD)
-   return zero_rx_packet_count;
-   /* If zero count is less than 1000, sleep time should be 100 us */
-   else if

[dpdk-dev] [PATCH v11 11/13] igb: enable rx queue interrupts for PF

2015-06-05 Thread Cunming Liang

This patch does the following for igb PF:
- Setup NIC to generate MSI-X interrupts
- Set the IVAR register to map interrupt causes to vectors
- Implement interrupt enable/disable functions

Signed-off-by: Danny Zhou 
Signed-off-by: Cunming Liang 
---
v9 changes
 - move queue-vec mapping init from dev_configure to dev_start
 - fix link interrupt not working issue in vfio-msix

v8 changes
 - add vfio-msi/vfio-legacy and uio-legacy support

v7 changes
 - add condition check when intr vector is not enabled

v6 changes
 - fill queue-vector mapping table

v5 changes
 - Rebase the patchset onto the HEAD

v3 changes
 - Remove unnecessary variables in e1000_mac_info
 - Remove spinlock from PMD

v2 changes
 - Consolidate review comments related to coding style

 drivers/net/e1000/igb_ethdev.c | 285 -
 1 file changed, 252 insertions(+), 33 deletions(-)

diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index e4b370d..bbd7b74 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -96,6 +96,7 @@ static int  eth_igb_flow_ctrl_get(struct rte_eth_dev *dev,
 static int  eth_igb_flow_ctrl_set(struct rte_eth_dev *dev,
struct rte_eth_fc_conf *fc_conf);
 static int eth_igb_lsc_interrupt_setup(struct rte_eth_dev *dev);
+static int eth_igb_rxq_interrupt_setup(struct rte_eth_dev *dev);
 static int eth_igb_interrupt_get_status(struct rte_eth_dev *dev);
 static int eth_igb_interrupt_action(struct rte_eth_dev *dev);
 static void eth_igb_interrupt_handler(struct rte_intr_handle *handle,
@@ -194,6 +195,16 @@ static int eth_igb_filter_ctrl(struct rte_eth_dev *dev,
 enum rte_filter_op filter_op,
 void *arg);

+static int eth_igb_rx_queue_intr_enable(struct rte_eth_dev *dev,
+   uint16_t queue_id);
+static int eth_igb_rx_queue_intr_disable(struct rte_eth_dev *dev,
+   uint16_t queue_id);
+static void eth_igb_assign_msix_vector(struct e1000_hw *hw, int8_t direction,
+   uint8_t queue, uint8_t msix_vector);
+static void eth_igb_configure_msix_intr(struct rte_eth_dev *dev);
+static void eth_igb_write_ivar(struct e1000_hw *hw, uint8_t msix_vector,
+   uint8_t index, uint8_t offset);
+
 /*
  * Define VF Stats MACRO for Non "cleared on read" register
  */
@@ -253,6 +264,8 @@ static const struct eth_dev_ops eth_igb_ops = {
.vlan_tpid_set= eth_igb_vlan_tpid_set,
.vlan_offload_set = eth_igb_vlan_offload_set,
.rx_queue_setup   = eth_igb_rx_queue_setup,
+   .rx_queue_intr_enable = eth_igb_rx_queue_intr_enable,
+   .rx_queue_intr_disable = eth_igb_rx_queue_intr_disable,
.rx_queue_release = eth_igb_rx_queue_release,
.rx_queue_count   = eth_igb_rx_queue_count,
.rx_descriptor_done   = eth_igb_rx_descriptor_done,
@@ -584,12 +597,6 @@ eth_igb_dev_init(struct rte_eth_dev *eth_dev)
 eth_dev->data->port_id, pci_dev->id.vendor_id,
 pci_dev->id.device_id);

-   rte_intr_callback_register(&(pci_dev->intr_handle),
-   eth_igb_interrupt_handler, (void *)eth_dev);
-
-   /* enable uio intr after callback register */
-   rte_intr_enable(&(pci_dev->intr_handle));
-
/* enable support intr */
igb_intr_enable(eth_dev);

@@ -752,7 +759,9 @@ eth_igb_start(struct rte_eth_dev *dev)
 {
struct e1000_hw *hw =
E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-   int ret, i, mask;
+   struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
+   uint32_t intr_vector = 0;
+   int ret, mask;
uint32_t ctrl_ext;

PMD_INIT_FUNC_TRACE();
@@ -792,6 +801,27 @@ eth_igb_start(struct rte_eth_dev *dev)
/* configure PF module if SRIOV enabled */
igb_pf_host_configure(dev);

+   /* check and configure queue intr-vector mapping */
+   if (dev->data->dev_conf.intr_conf.rxq != 0)
+   intr_vector = dev->data->nb_rx_queues;
+
+   if (rte_intr_efd_enable(intr_handle, intr_vector))
+   return -1;
+
+   if (rte_intr_dp_is_en(intr_handle)) {
+   intr_handle->intr_vec =
+   rte_zmalloc("intr_vec",
+   dev->data->nb_rx_queues * sizeof(int), 0);
+   if (intr_handle->intr_vec == NULL) {
+   PMD_INIT_LOG(ERR, "Failed to allocate %d rx_queues"
+" intr_vec\n", dev->data->nb_rx_queues);
+   return -ENOMEM;
+   }
+   }
+
+   /* confiugre msix for rx interrupt */
+   eth_igb_configure_msix_intr(dev);
+
/* Configure for OS presence */
igb_init_manageability(hw);

@@ -819,33 +849,9 @@ eth_igb_start(struct rte_eth_dev *dev)

[dpdk-dev] [PATCH v11 10/13] ixgbe: enable rx queue interrupts for both PF and VF

2015-06-05 Thread Cunming Liang

This patch does the following for ixgbe PF and VF:
- Setup NIC to generate MSI-X interrupts
- Set the IVAR register to map interrupt causes to vectors
- Implement interrupt enable/disable functions
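
The IVAR step amounts to a read-modify-write of one 8-bit entry inside a
32-bit register. A minimal sketch, assuming a layout where each register
holds four entries (rx and tx for two queues) and the top bit of an entry
marks it valid. That layout, the flag value and the helper name are
assumptions for illustration; the NIC datasheet is the authority.

```c
#include <assert.h>
#include <stdint.h>

#define IVAR_ENTRY_VALID 0x80u	/* assumed "entry allocated" flag */

/* Return the new value of one IVAR register after mapping a cause.
 * direction: 0 = rx, 1 = tx. The register index would be queue >> 1. */
static inline uint32_t
ivar_set_entry(uint32_t ivar, uint8_t direction, uint8_t queue,
	       uint8_t msix_vector)
{
	uint32_t shift;

	assert(direction <= 1);
	shift = 16u * (queue & 1u) + 8u * direction;
	ivar &= ~(0xFFu << shift);	/* clear the old entry */
	ivar |= (uint32_t)(msix_vector | IVAR_ENTRY_VALID) << shift;
	return ivar;
}
```

Only the selected byte changes, so the other three mappings held in the same
register are preserved.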

Signed-off-by: Danny Zhou 
Signed-off-by: Yong Liu 
Signed-off-by: Cunming Liang 
---
v10 changes
 - return an actual error code rather than -1

v9 changes
 - move queue-vec mapping init from dev_configure to dev_start

v8 changes
 - add vfio-msi/vfio-legacy and uio-legacy support

v7 changes
 - add condition check when intr vector is not enabled

v6 changes
 - fill queue-vector mapping table

v5 changes
 - Rebase the patchset onto the HEAD

v3 changes
 - Remove spinlock from PMD

v2 changes
 - Consolidate review comments related to coding style

 drivers/net/ixgbe/ixgbe_ethdev.c | 484 ++-
 drivers/net/ixgbe/ixgbe_ethdev.h |   4 +
 2 files changed, 476 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 0d9f9b2..bcec971 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -82,6 +82,9 @@
  */
 #define IXGBE_FC_LO    0x40

+/* Default minimum inter-interrupt interval for EITR configuration */
+#define IXGBE_MIN_INTER_INTERRUPT_INTERVAL_DEFAULT    0x79E
+
 /* Timer value included in XOFF frames. */
 #define IXGBE_FC_PAUSE 0x680

@@ -171,6 +174,7 @@ static int ixgbe_dev_rss_reta_query(struct rte_eth_dev *dev,
uint16_t reta_size);
 static void ixgbe_dev_link_status_print(struct rte_eth_dev *dev);
 static int ixgbe_dev_lsc_interrupt_setup(struct rte_eth_dev *dev);
+static int ixgbe_dev_rxq_interrupt_setup(struct rte_eth_dev *dev);
 static int ixgbe_dev_interrupt_get_status(struct rte_eth_dev *dev);
 static int ixgbe_dev_interrupt_action(struct rte_eth_dev *dev);
 static void ixgbe_dev_interrupt_handler(struct rte_intr_handle *handle,
@@ -183,11 +187,14 @@ static void ixgbe_dcb_init(struct ixgbe_hw *hw,struct 
ixgbe_dcb_config *dcb_conf

 /* For Virtual Function support */
 static int eth_ixgbevf_dev_init(struct rte_eth_dev *eth_dev);
+static int ixgbevf_dev_interrupt_get_status(struct rte_eth_dev *dev);
+static int ixgbevf_dev_interrupt_action(struct rte_eth_dev *dev);
 static int  ixgbevf_dev_configure(struct rte_eth_dev *dev);
 static int  ixgbevf_dev_start(struct rte_eth_dev *dev);
 static void ixgbevf_dev_stop(struct rte_eth_dev *dev);
 static void ixgbevf_dev_close(struct rte_eth_dev *dev);
 static void ixgbevf_intr_disable(struct ixgbe_hw *hw);
+static void ixgbevf_intr_enable(struct ixgbe_hw *hw);
 static void ixgbevf_dev_stats_get(struct rte_eth_dev *dev,
struct rte_eth_stats *stats);
 static void ixgbevf_dev_stats_reset(struct rte_eth_dev *dev);
@@ -197,6 +204,15 @@ static void ixgbevf_vlan_strip_queue_set(struct 
rte_eth_dev *dev,
uint16_t queue, int on);
 static void ixgbevf_vlan_offload_set(struct rte_eth_dev *dev, int mask);
 static void ixgbevf_set_vfta_all(struct rte_eth_dev *dev, bool on);
+static void ixgbevf_dev_interrupt_handler(struct rte_intr_handle *handle,
+   void *param);
+static int ixgbevf_dev_rx_queue_intr_enable(struct rte_eth_dev *dev,
+   uint16_t queue_id);
+static int ixgbevf_dev_rx_queue_intr_disable(struct rte_eth_dev *dev,
+uint16_t queue_id);
+static void ixgbevf_set_ivar_map(struct ixgbe_hw *hw, int8_t direction,
+uint8_t queue, uint8_t msix_vector);
+static void ixgbevf_configure_msix(struct rte_eth_dev *dev);

 /* For Eth VMDQ APIs support */
 static int ixgbe_uc_hash_table_set(struct rte_eth_dev *dev, struct
@@ -214,6 +230,14 @@ static int ixgbe_mirror_rule_set(struct rte_eth_dev *dev,
 static int ixgbe_mirror_rule_reset(struct rte_eth_dev *dev,
uint8_t rule_id);

+static int ixgbe_dev_rx_queue_intr_enable(struct rte_eth_dev *dev,
+   uint16_t queue_id);
+static int ixgbe_dev_rx_queue_intr_disable(struct rte_eth_dev *dev,
+   uint16_t queue_id);
+static void ixgbe_set_ivar_map(struct ixgbe_hw *hw, int8_t direction,
+   uint8_t queue, uint8_t msix_vector);
+static void ixgbe_configure_msix(struct rte_eth_dev *dev);
+
 static int ixgbe_set_queue_rate_limit(struct rte_eth_dev *dev,
uint16_t queue_idx, uint16_t tx_rate);
 static int ixgbe_set_vf_rate_limit(struct rte_eth_dev *dev, uint16_t vf,
@@ -262,7 +286,7 @@ static int ixgbevf_dev_set_mtu(struct rte_eth_dev *dev, 
uint16_t mtu);
  */
 #define UPDATE_VF_STAT(reg, last, cur) \
 {   \
-   u32 latest = IXGBE_READ_REG(hw, reg);   \
+   uint32_t latest = IXGBE_READ_REG(hw, reg);   \
cur += latest - last;   \
last = latest;  \
 }
@@ -343,6 +367,8 @@

[dpdk-dev] [PATCH v11 09/13] ethdev: add rx intr enable, disable and ctl functions

2015-06-05 Thread Cunming Liang

The patch adds two dev_ops functions to enable and disable rx queue interrupts.
In addition, it adds rte_eth_dev_rx_intr_ctl/rte_eth_dev_rx_intr_ctl_q to
support setting rx interrupt events per port or per queue.
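
The two control functions treat errors differently, which the following
sketch tries to make visible. It is a stand-alone mock: vec_ctl_fn and the
two sample callbacks are hypothetical stand-ins for the EAL call, and no
DPDK header is used.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the EAL call that registers one vector. */
typedef int (*vec_ctl_fn)(uint32_t vec);

/* Sample callbacks: one reports "already added", one a hard failure. */
static int ctl_exists(uint32_t vec) { (void)vec; return -EEXIST; }
static int ctl_fail(uint32_t vec) { (void)vec; return -EIO; }

/* Per-port flavour: try every queue, keep going on errors and report
 * success, mirroring the loop in the patch. */
static int
port_intr_ctl(const uint32_t *intr_vec, uint16_t nb_rxq, vec_ctl_fn ctl)
{
	uint16_t qid;

	if (intr_vec == NULL)
		return -EPERM;
	for (qid = 0; qid < nb_rxq; qid++)
		(void)ctl(intr_vec[qid]);
	return 0;
}

/* Per-queue flavour: a failure other than -EEXIST is returned. */
static int
queue_intr_ctl(const uint32_t *intr_vec, uint16_t nb_rxq, uint16_t qid,
	       vec_ctl_fn ctl)
{
	int rc;

	if (qid >= nb_rxq)
		return -EINVAL;
	if (intr_vec == NULL)
		return -EPERM;
	rc = ctl(intr_vec[qid]);
	return (rc != 0 && rc != -EEXIST) ? rc : 0;
}
```

A caller that needs to know whether a specific queue was registered would
therefore prefer the per-queue variant.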

Signed-off-by: Danny Zhou 
Signed-off-by: Cunming Liang 

fix by http://www.dpdk.org/dev/patchwork/patch/4784/
---
v9 changes
 - remove unnecessary check after rte_eth_dev_is_valid_port.
   the same as http://www.dpdk.org/dev/patchwork/patch/4784

v8 changes
 - add additional check for EEXIST

v7 changes
 - remove rx_intr_vec_get
 - add rx_intr_ctl and rx_intr_ctl_q

v6 changes
 - add rx_intr_vec_get to retrieve the vector num of the queue.

v5 changes
 - Rebase the patchset onto the HEAD

v4 changes
 - Export interrupt enable/disable functions for shared libraries
 - Put new functions at the end of eth_dev_ops to avoid breaking ABI

v3 changes
 - Add return value for interrupt enable/disable functions

 lib/librte_ether/rte_ethdev.c  | 107 +
 lib/librte_ether/rte_ethdev.h  | 104 
 lib/librte_ether/rte_ether_version.map |   4 ++
 3 files changed, 215 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 5a94654..27a87f5 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3280,6 +3280,113 @@ _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
}
rte_spinlock_unlock(&rte_eth_dev_cb_lock);
 }
+
+int
+rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
+{
+   uint32_t vec;
+   struct rte_eth_dev *dev;
+   struct rte_intr_handle *intr_handle;
+   uint16_t qid;
+   int rc;
+
+   if (!rte_eth_dev_is_valid_port(port_id)) {
+   PMD_DEBUG_TRACE("Invalid port_id=%u\n", port_id);
+   return -ENODEV;
+   }
+
+   dev = &rte_eth_devices[port_id];
+   intr_handle = &dev->pci_dev->intr_handle;
+   if (!intr_handle->intr_vec) {
+   PMD_DEBUG_TRACE("RX Intr vector unset\n");
+   return -EPERM;
+   }
+
+   for (qid = 0; qid < dev->data->nb_rx_queues; qid++) {
+   vec = intr_handle->intr_vec[qid];
+   rc = rte_intr_rx_ctl(intr_handle, epfd, op, vec, data);
+   if (rc && rc != -EEXIST) {
+   PMD_DEBUG_TRACE("p %u q %u rx ctl error"
+   " op %d epfd %d vec %u\n",
+   port_id, qid, op, epfd, vec);
+   }
+   }
+
+   return 0;
+}
+
+int
+rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
+ int epfd, int op, void *data)
+{
+   uint32_t vec;
+   struct rte_eth_dev *dev;
+   struct rte_intr_handle *intr_handle;
+   int rc;
+
+   if (!rte_eth_dev_is_valid_port(port_id)) {
+   PMD_DEBUG_TRACE("Invalid port_id=%u\n", port_id);
+   return -ENODEV;
+   }
+
+   dev = &rte_eth_devices[port_id];
+   if (queue_id >= dev->data->nb_rx_queues) {
+   PMD_DEBUG_TRACE("Invalid RX queue_id=%u\n", queue_id);
+   return -EINVAL;
+   }
+
+   intr_handle = &dev->pci_dev->intr_handle;
+   if (!intr_handle->intr_vec) {
+   PMD_DEBUG_TRACE("RX Intr vector unset\n");
+   return -EPERM;
+   }
+
+   vec = intr_handle->intr_vec[queue_id];
+   rc = rte_intr_rx_ctl(intr_handle, epfd, op, vec, data);
+   if (rc && rc != -EEXIST) {
+   PMD_DEBUG_TRACE("p %u q %u rx ctl error"
+   " op %d epfd %d vec %u\n",
+   port_id, queue_id, op, epfd, vec);
+   return rc;
+   }
+
+   return 0;
+}
+
+int
+rte_eth_dev_rx_intr_enable(uint8_t port_id,
+  uint16_t queue_id)
+{
+   struct rte_eth_dev *dev;
+
+   if (!rte_eth_dev_is_valid_port(port_id)) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return -ENODEV;
+   }
+
+   dev = &rte_eth_devices[port_id];
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_intr_enable, -ENOTSUP);
+   return (*dev->dev_ops->rx_queue_intr_enable)(dev, queue_id);
+}
+
+int
+rte_eth_dev_rx_intr_disable(uint8_t port_id,
+   uint16_t queue_id)
+{
+   struct rte_eth_dev *dev;
+
+   if (!rte_eth_dev_is_valid_port(port_id)) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return -ENODEV;
+   }
+
+   dev = &rte_eth_devices[port_id];
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_intr_disable, -ENOTSUP);
+   return (*dev->dev_ops->rx_queue_intr_disable)(dev, queue_id);
+}
+
 #ifdef RTE_NIC_BYPASS
 int rte_eth_dev_bypass_init(uint8_t port_id)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 16dbe00..c199d32 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -830,6 +830,8 @@ struct rte_eth_fdir {

[dpdk-dev] [PATCH v11 08/13] eal/bsd: dummy for new intr definition

2015-06-05 Thread Cunming Liang

Add stubs so that the BSD build compiles with the new intr changes.

Signed-off-by: Cunming Liang 
---
v8 changes
 - add stub for new function

v7 changes
 - remove stub 'linux only' function from source file

 lib/librte_eal/bsdapp/eal/eal_interrupts.c | 19 ++
 .../bsdapp/eal/include/exec-env/rte_interrupts.h   | 74 ++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map  |  5 ++
 3 files changed, 98 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal_interrupts.c 
b/lib/librte_eal/bsdapp/eal/eal_interrupts.c
index cb7d4f1..ea85be3 100644
--- a/lib/librte_eal/bsdapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/bsdapp/eal/eal_interrupts.c
@@ -69,3 +69,22 @@ rte_eal_intr_init(void)
return 0;
 }

+int
+rte_intr_rx_ctl(struct rte_intr_handle *intr_handle,
+   int epfd, int op, unsigned int vec, void *data)
+{
+   return -ENOTSUP;
+}
+
+int
+rte_intr_efd_enable(struct rte_intr_handle *intr_handle, uint32_t nb_efd)
+{
+   return 0;
+}
+
+void
+rte_intr_efd_disable(struct rte_intr_handle *intr_handle)
+{
+   return;
+}
+
diff --git a/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h 
b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h
index 87a9cf6..fc2c46b 100644
--- a/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h
+++ b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h
@@ -49,6 +49,80 @@ enum rte_intr_handle_type {
 struct rte_intr_handle {
int fd;  /**< file descriptor */
enum rte_intr_handle_type type;  /**< handle type */
+   int max_intr;/**< max interrupt requested */
+   uint32_t nb_efd; /**< number of available efds */
+   int *intr_vec;   /**< intr vector number array */
 };

+/**
+ * @param intr_handle
+ *   Pointer to the interrupt handle.
+ * @param epfd
+ *   Epoll instance fd which the intr vector is associated with.
+ * @param op
+ *   The operation to be performed for the vector.
+ *   Operation type of {ADD, DEL}.
+ * @param vec
+ *   RX intr vector number added to the epoll instance wait list.
+ * @param data
+ *   User raw data.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_intr_rx_ctl(struct rte_intr_handle *intr_handle,
+   int epfd, int op, unsigned int vec, void *data);
+
+/**
+ * It enables the fastpath event fds if it's necessary.
+ * It creates event fds when multi-vectors allowed,
+ * otherwise it multiplexes the single event fds.
+ *
+ * @param intr_handle
+ *   Pointer to the interrupt handle.
+ * @param nb_efd
+ *   Number of interrupt vectors to try to enable.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_intr_efd_enable(struct rte_intr_handle *intr_handle, uint32_t nb_efd);
+
+/**
+ * It disables the fastpath event fds.
+ * It deletes registered eventfds and closes the open fds.
+ *
+ * @param intr_handle
+ *   Pointer to the interrupt handle.
+ */
+void
+rte_intr_efd_disable(struct rte_intr_handle *intr_handle);
+
+/**
+ * The fastpath interrupt is enabled or not.
+ *
+ * @param intr_handle
+ *   Pointer to the interrupt handle.
+ */
+static inline int
+rte_intr_dp_is_en(struct rte_intr_handle *intr_handle)
+{
+   return 0;
+}
+
+/**
+ * The interrupt handle instance allows other cause or not.
+ * Other cause stands for non-fastpath interrupt.
+ *
+ * @param intr_handle
+ *   Pointer to the interrupt handle.
+ */
+static inline int
+rte_intr_allow_others(struct rte_intr_handle *intr_handle)
+{
+   return 1;
+}
+
 #endif /* _RTE_LINUXAPP_INTERRUPTS_H_ */
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map 
b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 67b6a6c..a74671b 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -53,8 +53,13 @@ DPDK_2.0 {
rte_hexdump;
rte_intr_callback_register;
rte_intr_callback_unregister;
+   rte_intr_allow_others;
rte_intr_disable;
+   rte_intr_dp_is_en;
+   rte_intr_efd_enable;
+   rte_intr_efd_disable;
rte_intr_enable;
+   rte_intr_rx_ctl;
rte_log;
rte_log_add_in_history;
rte_log_cur_msg_loglevel;
-- 
1.8.1.4

[dpdk-dev] [PATCH v11 07/13] eal/linux: fix lsc read error in uio_pci_generic

2015-06-05 Thread Cunming Liang

The new UIO generic handle type was introduced by this patch:
http://dpdk.org/ml/archives/dev/2015-April/017008.html
When uio_pci_generic is used with the lsc interrupt turned on, the fd read fails with an error.
The root cause is that the 'count' size passed to read is not correct.
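
For illustration only, a minimal sketch of the length selection this fix is about. The enum and function names are hypothetical stand-ins, not the EAL ones, and the sketch assumes that a UIO fd read has to supply an int-sized count while an unhandled type falls back to a single byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the handle types named in this thread. */
enum mock_handle_type {
	MOCK_HANDLE_UIO,
	MOCK_HANDLE_UIO_INTX,
	MOCK_HANDLE_OTHER
};

/*
 * Length handed to read() for a given handle type.
 * with_fix selects the behaviour after this patch (1) or before it (0).
 */
size_t
mock_read_len(enum mock_handle_type type, int with_fix)
{
	assert(with_fix == 0 || with_fix == 1);

	switch (type) {
	case MOCK_HANDLE_UIO:
		return sizeof(int);
	case MOCK_HANDLE_UIO_INTX:
		/* Before the fix this type had no case and hit the default. */
		return with_fix ? sizeof(int) : 1;
	default:
		return 1;
	}
}
```

With `with_fix` set to 0 the UIO_INTX type asks for one byte, which is the wrong 'count' described above; with the added case label it asks for the same length as plain UIO.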

Reported-by: Yong Liu 
Signed-off-by: Cunming Liang 
---
 lib/librte_eal/linuxapp/eal/eal_interrupts.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c 
b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index 4b191ae..300ebb1 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -678,6 +678,7 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
/* set the length to be read dor different handle type */
switch (src->intr_handle.type) {
case RTE_INTR_HANDLE_UIO:
+   case RTE_INTR_HANDLE_UIO_INTX:
bytes_read = sizeof(buf.uio_intr_count);
break;
case RTE_INTR_HANDLE_ALARM:
-- 
1.8.1.4

[dpdk-dev] [PATCH v11 06/13] eal/linux: standalone intr event fd create support

2015-06-05 Thread Cunming Liang

The patch exposes intr event fd creation and release for the PMD.
The device driver can assign the number of events associated with interrupt vectors.
It also provides misc functions to check 1) whether other slowpath intr (e.g. lsc) is allowed;
2) whether the intr event on fastpath is enabled.
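
A minimal sketch of the two checks, assuming a stand-in struct (`mock_intr_handle`, hypothetical) that carries only the `nb_efd` and `max_intr` fields; it is not the EAL definition and it creates no file descriptors. It only mirrors the bookkeeping arithmetic of this patch: the multi-vector path reserves one extra vector for other causes, while the single-fd path shares it. The clamp to RTE_MAX_RXTX_INTR_VEC_ID is left out for brevity.

```c
#include <assert.h>
#include <stdint.h>

#define NB_OTHER_INTR 1 /* same value as in this patch */

/* Hypothetical stand-in for the two fields the checks rely on. */
struct mock_intr_handle {
	uint32_t max_intr; /* vectors requested, including "other" causes */
	uint32_t nb_efd;   /* event fds available for the fastpath */
};

/* Mimics the bookkeeping of rte_intr_efd_enable() without creating fds. */
void
mock_efd_enable(struct mock_intr_handle *h, uint32_t nb_efd, int multi_vector)
{
	if (multi_vector) {
		h->nb_efd = nb_efd;
		h->max_intr = NB_OTHER_INTR + nb_efd;
	} else {
		/* A single fd is multiplexed between fastpath and others. */
		h->nb_efd = nb_efd < 1 ? nb_efd : 1;
		h->max_intr = NB_OTHER_INTR;
	}
	assert(h->max_intr >= h->nb_efd);
}

/* Fastpath interrupt enabled? */
int
mock_dp_is_en(const struct mock_intr_handle *h)
{
	return h->nb_efd != 0;
}

/* Is a vector left over for other (slowpath) causes such as lsc? */
int
mock_allow_others(const struct mock_intr_handle *h)
{
	return (h->max_intr - h->nb_efd) != 0;
}
```

With four vectors on the multi-vector path both checks are true; on the single-fd path the fastpath is enabled but no vector is left for other causes.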

Signed-off-by: Cunming Liang 
---
v11 changes
 - typo cleanup

 lib/librte_eal/linuxapp/eal/eal_interrupts.c   | 57 ++
 .../linuxapp/eal/include/exec-env/rte_interrupts.h | 51 +++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map|  4 ++
 3 files changed, 112 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c 
b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index 1dfead5..4b191ae 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -44,6 +44,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -68,6 +69,7 @@
 #include "eal_vfio.h"

 #define EAL_INTR_EPOLL_WAIT_FOREVER (-1)
+#define NB_OTHER_INTR   1

 static RTE_DEFINE_PER_LCORE(int, _epfd) = -1; /**< epoll fd per thread */

@@ -1110,3 +1112,58 @@ rte_intr_rx_ctl(struct rte_intr_handle *intr_handle, int epfd,

return rc;
 }
+
+int
+rte_intr_efd_enable(struct rte_intr_handle *intr_handle, uint32_t nb_efd)
+{
+   uint32_t i;
+   int fd;
+   uint32_t n = RTE_MIN(nb_efd, (uint32_t)RTE_MAX_RXTX_INTR_VEC_ID);
+
+   if (intr_handle->type == RTE_INTR_HANDLE_VFIO_MSIX) {
+   for (i = 0; i < n; i++) {
+   fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
+   if (fd < 0) {
+   RTE_LOG(ERR, EAL,
+   "cannot setup eventfd, "
+   "error %i (%s)\n",
+   errno, strerror(errno));
+   return -1;
+   }
+   intr_handle->efds[i] = fd;
+   }
+   intr_handle->nb_efd   = n;
+   intr_handle->max_intr = NB_OTHER_INTR + n;
+   } else {
+   intr_handle->efds[0]  = intr_handle->fd;
+   intr_handle->nb_efd   = RTE_MIN(nb_efd, 1U);
+   intr_handle->max_intr = NB_OTHER_INTR;
+   }
+
+   return 0;
+}
+
+void
+rte_intr_efd_disable(struct rte_intr_handle *intr_handle)
+{
+   uint32_t i;
+   struct rte_epoll_event *rev;
+
+   for (i = 0; i < intr_handle->nb_efd; i++) {
+   rev = &intr_handle->elist[i];
+   if (rev->status == RTE_EPOLL_INVALID)
+   continue;
+   if (rte_epoll_ctl(rev->epfd, EPOLL_CTL_DEL, rev->fd, rev)) {
+   /* force free if the entry valid */
+   eal_epoll_data_safe_free(rev);
+   rev->status = RTE_EPOLL_INVALID;
+   }
+   }
+
+   if (intr_handle->max_intr > intr_handle->nb_efd) {
+   for (i = 0; i < intr_handle->nb_efd; i++)
+   close(intr_handle->efds[i]);
+   }
+   intr_handle->nb_efd = 0;
+   intr_handle->max_intr = 0;
+}
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h 
b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
index 3e93a27..912cc50 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
@@ -166,4 +166,55 @@ int
 rte_intr_rx_ctl(struct rte_intr_handle *intr_handle,
int epfd, int op, unsigned int vec, void *data);

+/**
+ * It enables the fastpath event fds if it's necessary.
+ * It creates event fds when multi-vectors allowed,
+ * otherwise it multiplexes the single event fds.
+ *
+ * @param intr_handle
+ *   Pointer to the interrupt handle.
+ * @param nb_efd
+ *   Number of interrupt vectors to try to enable.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_intr_efd_enable(struct rte_intr_handle *intr_handle, uint32_t nb_efd);
+
+/**
+ * It disables the fastpath event fds.
+ * It deletes registered eventfds and closes the open fds.
+ *
+ * @param intr_handle
+ *   Pointer to the interrupt handle.
+ */
+void
+rte_intr_efd_disable(struct rte_intr_handle *intr_handle);
+
+/**
+ * The fastpath interrupt is enabled or not.
+ *
+ * @param intr_handle
+ *   Pointer to the interrupt handle.
+ */
+static inline int
+rte_intr_dp_is_en(struct rte_intr_handle *intr_handle)
+{
+   return !(!intr_handle->nb_efd);
+}
+
+/**
+ * The interrupt handle instance allows other cause or not.
+ * Other cause stands for non-fastpath interrupt.
+ *
+ * @param intr_handle
+ *   Pointer to the interrupt handle.
+ */
+static inline int
+rte_intr_allow_others(struct rte_intr_handle *intr_handle)
+{
+   return !!(intr_handle->max_intr - intr_handle->nb_efd);
+}
+
 #endif /* _RTE_LINUXAPP_INTERRUPTS_H_ */
diff --git

[dpdk-dev] [PATCH v11 05/13] eal/linux: add interrupt vectors handling on VFIO

2015-06-05 Thread Cunming Liang

This patch does the following:
 - Create VFIO eventfds for each interrupt vector (moved to the next patch)
 - Assign each interrupt vector's eventfd to VFIO by ioctl

Signed-off-by: Danny Zhou 
Signed-off-by: Cunming Liang 
---
v8 changes
 - move eventfd creation out of the setup_interrupts to a standalone function

v7 changes
 - cleanup unnecessary code change
 - split event and intr operation to other patches

 lib/librte_eal/linuxapp/eal/eal_interrupts.c | 50 
 1 file changed, 13 insertions(+), 37 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c 
b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index 4499055..1dfead5 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -128,6 +128,9 @@ static pthread_t intr_thread;
 #ifdef VFIO_PRESENT

 #define IRQ_SET_BUF_LEN  (sizeof(struct vfio_irq_set) + sizeof(int))
+/* irq set buffer length for queue interrupts and LSC interrupt */
+#define MSIX_IRQ_SET_BUF_LEN (sizeof(struct vfio_irq_set) + \
+ sizeof(int) * (RTE_MAX_RXTX_INTR_VEC_ID + 1))

 /* enable legacy (INTx) interrupts */
 static int
@@ -245,23 +248,6 @@ vfio_enable_msi(struct rte_intr_handle *intr_handle) {
intr_handle->fd);
return -1;
}
-
-   /* manually trigger interrupt to enable it */
-   memset(irq_set, 0, len);
-   len = sizeof(struct vfio_irq_set);
-   irq_set->argsz = len;
-   irq_set->count = 1;
-   irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
-   irq_set->index = VFIO_PCI_MSI_IRQ_INDEX;
-   irq_set->start = 0;
-
-   ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
-
-   if (ret) {
-   RTE_LOG(ERR, EAL, "Error triggering MSI interrupts for fd %d\n",
-   intr_handle->fd);
-   return -1;
-   }
return 0;
 }

@@ -294,7 +280,7 @@ vfio_disable_msi(struct rte_intr_handle *intr_handle) {
 static int
 vfio_enable_msix(struct rte_intr_handle *intr_handle) {
int len, ret;
-   char irq_set_buf[IRQ_SET_BUF_LEN];
+   char irq_set_buf[MSIX_IRQ_SET_BUF_LEN];
struct vfio_irq_set *irq_set;
int *fd_ptr;

@@ -302,12 +288,18 @@ vfio_enable_msix(struct rte_intr_handle *intr_handle) {

irq_set = (struct vfio_irq_set *) irq_set_buf;
irq_set->argsz = len;
-   irq_set->count = 1;
+   if (!intr_handle->max_intr)
+   intr_handle->max_intr = 1;
+   else if (intr_handle->max_intr > RTE_MAX_RXTX_INTR_VEC_ID)
+   intr_handle->max_intr = RTE_MAX_RXTX_INTR_VEC_ID + 1;
+
+   irq_set->count = intr_handle->max_intr;
irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
irq_set->start = 0;
fd_ptr = (int *) &irq_set->data;
-   *fd_ptr = intr_handle->fd;
+   memcpy(fd_ptr, intr_handle->efds, sizeof(intr_handle->efds));
+   fd_ptr[intr_handle->max_intr - 1] = intr_handle->fd;

ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);

@@ -317,22 +309,6 @@ vfio_enable_msix(struct rte_intr_handle *intr_handle) {
return -1;
}

-   /* manually trigger interrupt to enable it */
-   memset(irq_set, 0, len);
-   len = sizeof(struct vfio_irq_set);
-   irq_set->argsz = len;
-   irq_set->count = 1;
-   irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
-   irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
-   irq_set->start = 0;
-
-   ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
-
-   if (ret) {
-   RTE_LOG(ERR, EAL, "Error triggering MSI-X interrupts for fd %d\n",
-   intr_handle->fd);
-   return -1;
-   }
return 0;
 }

@@ -340,7 +316,7 @@ vfio_enable_msix(struct rte_intr_handle *intr_handle) {
 static int
 vfio_disable_msix(struct rte_intr_handle *intr_handle) {
struct vfio_irq_set *irq_set;
-   char irq_set_buf[IRQ_SET_BUF_LEN];
+   char irq_set_buf[MSIX_IRQ_SET_BUF_LEN];
int len, ret;

len = sizeof(struct vfio_irq_set);
-- 
1.8.1.4

[dpdk-dev] [PATCH v11 04/13] eal/linux: fix comments typo on vfio msi

2015-06-05 Thread Cunming Liang


Signed-off-by: Cunming Liang 
---
 lib/librte_eal/linuxapp/eal/eal_interrupts.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c 
b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index 08dc2ab..4499055 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -219,7 +219,7 @@ vfio_disable_intx(struct rte_intr_handle *intr_handle) {
return 0;
 }

-/* enable MSI-X interrupts */
+/* enable MSI interrupts */
 static int
 vfio_enable_msi(struct rte_intr_handle *intr_handle) {
int len, ret;
@@ -265,7 +265,7 @@ vfio_enable_msi(struct rte_intr_handle *intr_handle) {
return 0;
 }

-/* disable MSI-X interrupts */
+/* disable MSI interrupts */
 static int
 vfio_disable_msi(struct rte_intr_handle *intr_handle) {
struct vfio_irq_set *irq_set;
-- 
1.8.1.4

[dpdk-dev] [PATCH v11 03/13] eal/linux: add API to set rx interrupt event monitor

2015-06-05 Thread Cunming Liang

The patch adds 'rte_intr_rx_ctl' to add or delete the monitoring of interrupt vector events on a specified epoll instance.
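
As a rough illustration of the add/delete semantics, here is a sketch that uses only plain Linux `epoll` and `eventfd` calls rather than the EAL API. The helper names `vec_event_ctl` and `vec_event_wait` are hypothetical; the event flags are the ones used in this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Add (op_add != 0) or delete an event fd on an epoll instance. */
int
vec_event_ctl(int epfd, int efd, int op_add)
{
	struct epoll_event ev;

	assert(epfd >= 0 && efd >= 0);
	memset(&ev, 0, sizeof(ev));
	ev.events = EPOLLIN | EPOLLPRI | EPOLLET; /* same flags as the patch */
	ev.data.fd = efd;
	if (epoll_ctl(epfd, op_add ? EPOLL_CTL_ADD : EPOLL_CTL_DEL,
		      efd, &ev) < 0)
		return -errno;
	return 0;
}

/*
 * Wait up to timeout_ms for one event and drain the eventfd counter.
 * Returns the counter value, 0 on timeout, or -1 on error.
 */
int64_t
vec_event_wait(int epfd, int timeout_ms)
{
	struct epoll_event ev;
	uint64_t count = 0;
	int n = epoll_wait(epfd, &ev, 1, timeout_ms);

	if (n <= 0)
		return n;
	/* An eventfd read must supply an 8-byte buffer. */
	if (read(ev.data.fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		return -1;
	return (int64_t)count;
}
```

Adding the same fd twice fails with EEXIST at the epoll level, which is why a caller that re-registers a vector can reasonably treat -EEXIST as harmless.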

Signed-off-by: Cunming Liang 
---
v10 changes:
 - add RTE_INTR_HANDLE_UIO_INTX for uio_pci_generic 

v8 changes
 - fix EWOULDBLOCK and EINTR processing
 - add event status check

v7 changes
 - rename rte_intr_rx_set to rte_intr_rx_ctl.
 - rte_intr_rx_ctl uses rte_epoll_ctl to register epoll event instance.
 - the intr rx event instance includes an intr process callback.

v6 changes
 - split rte_intr_wait_rx_pkt into two functions, wait and set.
 - rewrite rte_intr_rx_wait/rte_intr_rx_set to remove queue visibility on eal.
 - rte_intr_rx_wait to support multiplexing.
 - allow epfd as input to support flexible event fd combination.

 lib/librte_eal/linuxapp/eal/eal_interrupts.c   | 101 +
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |  20 
 lib/librte_eal/linuxapp/eal/rte_eal_version.map|   1 +
 3 files changed, 122 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c 
b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index dc327a4..08dc2ab 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -897,6 +897,50 @@ rte_eal_intr_init(void)
return -ret;
 }

+static void
+eal_intr_proc_rxtx_intr(int fd, const struct rte_intr_handle *intr_handle)
+{
+   union rte_intr_read_buffer buf;
+   int bytes_read = 1;
+
+   switch (intr_handle->type) {
+   case RTE_INTR_HANDLE_UIO:
+   case RTE_INTR_HANDLE_UIO_INTX:
+   bytes_read = sizeof(buf.uio_intr_count);
+   break;
+#ifdef VFIO_PRESENT
+   case RTE_INTR_HANDLE_VFIO_MSIX:
+   case RTE_INTR_HANDLE_VFIO_MSI:
+   case RTE_INTR_HANDLE_VFIO_LEGACY:
+   bytes_read = sizeof(buf.vfio_intr_count);
+   break;
+#endif
+   default:
+   bytes_read = 1;
+   RTE_LOG(INFO, EAL, "unexpected intr type\n");
+   break;
+   }
+
+   /**
+* read out to clear the ready-to-be-read flag
+* for epoll_wait.
+*/
+   do {
+   bytes_read = read(fd, &buf, bytes_read);
+   if (bytes_read < 0) {
+   if (errno == EINTR || errno == EWOULDBLOCK ||
+   errno == EAGAIN)
+   continue;
+   RTE_LOG(ERR, EAL, "Error reading from file "
+   "descriptor %d: %s\n", fd,
+   strerror(errno));
+   } else if (bytes_read == 0)
+   RTE_LOG(ERR, EAL, "Read nothing from file "
+   "descriptor %d\n", fd);
+   return;
+   } while (1);
+}
+
 static int
 eal_epoll_process_event(struct epoll_event *evs, unsigned int n,
struct rte_epoll_event *events)
@@ -1033,3 +1077,60 @@ rte_epoll_ctl(int epfd, int op, int fd,

return 0;
 }
+
+int
+rte_intr_rx_ctl(struct rte_intr_handle *intr_handle, int epfd,
+   int op, unsigned int vec, void *data)
+{
+   struct rte_epoll_event *rev;
+   struct rte_epoll_data *epdata;
+   int epfd_op;
+   int rc = 0;
+
+   if (!intr_handle || intr_handle->nb_efd == 0 ||
+   vec >= intr_handle->nb_efd) {
+   RTE_LOG(ERR, EAL, "Wrong intr vector number.\n");
+   return -EPERM;
+   }
+
+   switch (op) {
+   case RTE_INTR_EVENT_ADD:
+   epfd_op = EPOLL_CTL_ADD;
+   rev = &intr_handle->elist[vec];
+   if (rev->status != RTE_EPOLL_INVALID) {
+   RTE_LOG(INFO, EAL, "Event already been added.\n");
+   return -EEXIST;
+   }
+
+   /* attach to intr vector fd */
+   epdata = &rev->epdata;
+   epdata->event  = EPOLLIN | EPOLLPRI | EPOLLET;
+   epdata->data   = data;
+   epdata->cb_fun = (rte_intr_event_cb_t)eal_intr_proc_rxtx_intr;
+   epdata->cb_arg = (void *)intr_handle;
+   rc = rte_epoll_ctl(epfd, epfd_op, intr_handle->efds[vec], rev);
+   if (!rc)
+   RTE_LOG(DEBUG, EAL, "eventfd %d associated with vec %d"
+   " is added on epfd %d\n", rev->fd, vec, epfd);
+   else
+   rc = -EPERM;
+   break;
+   case RTE_INTR_EVENT_DEL:
+   epfd_op = EPOLL_CTL_DEL;
+   rev = &intr_handle->elist[vec];
+   if (rev->status == RTE_EPOLL_INVALID) {
+   RTE_LOG(INFO, EAL, "Event does not exist.\n");
+   return -EPERM;
+   }
+
+   rc = rte_epoll_ctl(rev->epfd, epfd_op, rev->fd, rev);
+   if (rc)
+   rc = -EPERM;
+   break;
+   default:
+   RTE_LOG(ERR, EAL, "event op type mismatch\n");
+

[dpdk-dev] [PATCH v11 02/13] eal/linux: add rte_epoll_wait/ctl support

2015-06-05 Thread Cunming Liang

The patch adds 'rte_epoll_wait' and 'rte_epoll_ctl' for async event wakeup.
It defines 'struct rte_epoll_event' as the event param.
The 'op' uses the same enum as epoll_wait/ctl does.
The epoll event can carry raw user data and register a callback which is executed during wakeup.
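
The idea of an event that carries user data plus a wakeup callback can be sketched with plain Linux `epoll`, as below. All `mock_` names are hypothetical and the struct is a reduced stand-in for 'struct rte_epoll_event'; status tracking and the per-thread epoll fd are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

typedef void (*mock_event_cb_t)(int fd, void *arg);

/* Hypothetical event object; the real one is struct rte_epoll_event. */
struct mock_epoll_event {
	int fd;
	void *data;             /* raw user data handed back to the caller */
	mock_event_cb_t cb_fun; /* executed during wakeup, may be NULL */
	void *cb_arg;
};

/* Register the event object itself as the epoll user pointer. */
int
mock_epoll_add(int epfd, struct mock_epoll_event *event)
{
	struct epoll_event ev;

	assert(event != NULL);
	memset(&ev, 0, sizeof(ev));
	ev.events = EPOLLIN;
	ev.data.ptr = event;
	return epoll_ctl(epfd, EPOLL_CTL_ADD, event->fd, &ev) < 0 ? -errno : 0;
}

/*
 * Wait, retry on EINTR, run the callbacks and copy the user data out.
 * Returns the number of entries written to out[], or -1 on error.
 */
int
mock_epoll_wait(int epfd, void **out, int maxevents, int timeout_ms)
{
	struct epoll_event evs[16];
	int i, n;

	if (maxevents > 16)
		maxevents = 16;
	do {
		n = epoll_wait(epfd, evs, maxevents, timeout_ms);
	} while (n < 0 && errno == EINTR);
	if (n < 0)
		return -1;

	for (i = 0; i < n; i++) {
		struct mock_epoll_event *e = evs[i].data.ptr;

		if (e->cb_fun != NULL)
			e->cb_fun(e->fd, e->cb_arg);
		out[i] = e->data;
	}
	return n;
}

/* Example callback: consume one byte and count the wakeup. */
void
mock_drain_cb(int fd, void *arg)
{
	char c;

	if (read(fd, &c, 1) == 1)
		(*(int *)arg)++;
}
```

Storing the event object in `data.ptr` is what lets the wait side find both the callback and the user data from the raw epoll result.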

Signed-off-by: Cunming Liang 
---
v11 changes
 - cleanup spelling error

v9 changes
 - rework on coding style

v8 changes
 - support safe event deletion during the wakeup execution
 - add EINTR process during epoll_wait

v7 changes
 - split v6[4/8] into two patches, one for epoll event(this one)
   another for rx intr(next patch)
 - introduce rte_epoll_event definition
 - rte_epoll_wait/ctl for more generic RTE epoll API

v6 changes
 - split rte_intr_wait_rx_pkt into two functions, wait and set.
 - rewrite rte_intr_rx_wait/rte_intr_rx_set to remove queue visibility on eal.
 - rte_intr_rx_wait to support multiplexing.
 - allow epfd as input to support flexible event fd combination.

 lib/librte_eal/linuxapp/eal/eal_interrupts.c   | 138 +
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |  82 +++-
 lib/librte_eal/linuxapp/eal/rte_eal_version.map|   3 +
 3 files changed, 220 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c 
b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index 3a84b3c..dc327a4 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -69,6 +69,8 @@

 #define EAL_INTR_EPOLL_WAIT_FOREVER (-1)

+static RTE_DEFINE_PER_LCORE(int, _epfd) = -1; /**< epoll fd per thread */
+
 /**
  * union for pipe fds.
  */
@@ -895,3 +897,139 @@ rte_eal_intr_init(void)
return -ret;
 }

+static int
+eal_epoll_process_event(struct epoll_event *evs, unsigned int n,
+   struct rte_epoll_event *events)
+{
+   unsigned int i, count = 0;
+   struct rte_epoll_event *rev;
+
+   for (i = 0; i < n; i++) {
+   rev = evs[i].data.ptr;
+   if (!rev || !rte_atomic32_cmpset(&rev->status, RTE_EPOLL_VALID,
+RTE_EPOLL_EXEC))
+   continue;
+
+   events[count].status= RTE_EPOLL_VALID;
+   events[count].fd= rev->fd;
+   events[count].epfd  = rev->epfd;
+   events[count].epdata.event  = rev->epdata.event;
+   events[count].epdata.data   = rev->epdata.data;
+   if (rev->epdata.cb_fun)
+   rev->epdata.cb_fun(rev->fd,
+  rev->epdata.cb_arg);
+
+   rte_compiler_barrier();
+   rev->status = RTE_EPOLL_VALID;
+   count++;
+   }
+   return count;
+}
+
+static inline int
+eal_init_tls_epfd(void)
+{
+   int pfd = epoll_create(255);
+
+   if (pfd < 0) {
+   RTE_LOG(ERR, EAL,
+   "Cannot create epoll instance\n");
+   return -1;
+   }
+   return pfd;
+}
+
+int
+rte_intr_tls_epfd(void)
+{
+   if (RTE_PER_LCORE(_epfd) == -1)
+   RTE_PER_LCORE(_epfd) = eal_init_tls_epfd();
+
+   return RTE_PER_LCORE(_epfd);
+}
+
+int
+rte_epoll_wait(int epfd, struct rte_epoll_event *events,
+  int maxevents, int timeout)
+{
+   struct epoll_event evs[maxevents];
+   int rc;
+
+   if (!events) {
+   RTE_LOG(ERR, EAL, "rte_epoll_event can't be NULL\n");
+   return -1;
+   }
+
+   /* using per thread epoll fd */
+   if (epfd == RTE_EPOLL_PER_THREAD)
+   epfd = rte_intr_tls_epfd();
+
+   while (1) {
+   rc = epoll_wait(epfd, evs, maxevents, timeout);
+   if (likely(rc > 0)) {
+   /* epoll_wait has at least one fd ready to read */
+   rc = eal_epoll_process_event(evs, rc, events);
+   break;
+   } else if (rc < 0) {
+   if (errno == EINTR)
+   continue;
+   /* epoll_wait fail */
+   RTE_LOG(ERR, EAL, "epoll_wait returns with fail %s\n",
+   strerror(errno));
+   rc = -1;
+   break;
+   }
+   }
+
+   return rc;
+}
+
+static inline void
+eal_epoll_data_safe_free(struct rte_epoll_event *ev)
+{
+   while (!rte_atomic32_cmpset(&ev->status, RTE_EPOLL_VALID,
+   RTE_EPOLL_INVALID))
+   while (ev->status != RTE_EPOLL_VALID)
+   rte_pause();
+   memset(&ev->epdata, 0, sizeof(ev->epdata));
+   ev->fd = -1;
+   ev->epfd = -1;
+}
+
+int
+rte_epoll_ctl(int epfd, int op, int fd,
+ struct rte_epoll_event *event)
+{
+   struct epoll_event ev;
+
+   if (!event) {
+   RTE_LOG(ERR, EAL, "rte_epoll_event can't be NULL\n");
+

[dpdk-dev] [PATCH v11 01/13] eal/linux: add interrupt vectors support in intr_handle

2015-06-05 Thread Cunming Liang

The patch adds interrupt vector support to rte_intr_handle.
'vec_en' is set when interrupt vectors are detected and the associated event fds are set.
Those event fds are stored in efds[].
'intr_vec' is reserved for the device driver to initialize the vector mapping table.
When the event fds are added to a specified epoll instance, 'eptrs' will hold the rte_epoll_event object pointer.
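
To make the role of the mapping table concrete, a small sketch follows. It uses a reduced, hypothetical struct (`mock_handle`) instead of rte_intr_handle, and the round-robin assignment of queues to vectors is only an assumption for the example; a real driver decides its own mapping.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_VEC 32 /* mirrors RTE_MAX_RXTX_INTR_VEC_ID in this patch */

/* Hypothetical, reduced copy of the fields discussed above. */
struct mock_handle {
	unsigned int nb_efd; /* number of usable entries in efds[] */
	int efds[MAX_VEC];   /* one event fd per interrupt vector */
	int *intr_vec;       /* queue id -> vector number, driver owned */
};

/* Driver side: spread nb_queues rx queues over the available vectors. */
int
mock_map_queues(struct mock_handle *h, unsigned int nb_queues)
{
	unsigned int q;

	if (h->nb_efd == 0 || h->nb_efd > MAX_VEC)
		return -1;
	h->intr_vec = malloc(nb_queues * sizeof(*h->intr_vec));
	if (h->intr_vec == NULL)
		return -1;
	for (q = 0; q < nb_queues; q++)
		h->intr_vec[q] = (int)(q % h->nb_efd);
	return 0;
}

/* Lookup side: resolve the event fd that signals a given rx queue. */
int
mock_queue_to_efd(const struct mock_handle *h, unsigned int queue_id)
{
	int vec = h->intr_vec[queue_id];

	assert(vec >= 0 && (unsigned int)vec < h->nb_efd);
	return h->efds[vec];
}
```

The lookup goes queue id, then vector number, then event fd, so the EAL side never needs to know about queues.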

Signed-off-by: Danny Zhou 
Signed-off-by: Cunming Liang 
---
v7 changes:
 - add eptrs[], it's used to store the register rte_epoll_event instances.
 - add vec_en, to log the vector capability status.

v6 changes:
 - add mapping table between irq vector number and queue id.

v5 changes:
 - Create this new patch file for changed struct rte_intr_handle that
   other patches depend on, to avoid breaking git bisect.

 lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h 
b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
index bdeb3fc..9c86a15 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
@@ -38,6 +38,8 @@
 #ifndef _RTE_LINUXAPP_INTERRUPTS_H_
 #define _RTE_LINUXAPP_INTERRUPTS_H_

+#define RTE_MAX_RXTX_INTR_VEC_ID 32
+
 enum rte_intr_handle_type {
RTE_INTR_HANDLE_UNKNOWN = 0,
RTE_INTR_HANDLE_UIO,  /**< uio device handle */
@@ -49,6 +51,8 @@ enum rte_intr_handle_type {
RTE_INTR_HANDLE_MAX
 };

+struct rte_epoll_event;
+
 /** Handle for interrupts. */
 struct rte_intr_handle {
union {
@@ -58,6 +62,12 @@ struct rte_intr_handle {
};
int fd;  /**< interrupt event file descriptor */
enum rte_intr_handle_type type;  /**< handle type */
+   uint32_t max_intr;   /**< max interrupt requested */
+   uint32_t nb_efd; /**< number of available efds */
+   int efds[RTE_MAX_RXTX_INTR_VEC_ID];  /**< intr vectors/efds mapping */
+   struct rte_epoll_event *elist[RTE_MAX_RXTX_INTR_VEC_ID];
+/**< intr vector epoll event ptr */
+   int *intr_vec;   /**< intr vector number array */
 };

 #endif /* _RTE_LINUXAPP_INTERRUPTS_H_ */
-- 
1.8.1.4

[dpdk-dev] [PATCH v11 00/13] Interrupt mode PMD

2015-06-05 Thread Cunming Liang

v11 changes
 - typo cleanup and check kernel style 

v10 changes
 - code rework to return actual error code
 - bug fix for lsc when using uio_pci_generic

v9 changes
 - code rework to fix open comment
 - bug fix for igb lsc when both lsc and rxq are enabled in vfio-msix
 - new patch to turn off the feature by default so as to avoid breaking the v2.1 abi

v8 changes
 - remove condition check for only vfio-msix
 - add multiplex intr support when only one intr vector allowed
 - lsc and rxq interrupt runtime enable decision
 - add safe event delete while the event wakeup execution happens

v7 changes
 - decouple epoll event and intr operation
 - add condition check in the case intr vector is disabled
 - renaming some APIs

v6 changes
 - split rte_intr_wait_rx_pkt into two APIs 'wait' and 'set'.
 - rewrite rte_intr_rx_wait/rte_intr_rx_set.
 - using vector number instead of queue_id as interrupt API params.
 - patch reorder and split.

v5 changes
 - Rebase the patchset onto the HEAD
 - Isolate ethdev from EAL for new-added wait-for-rx interrupt function
 - Export wait-for-rx interrupt function for shared libraries
 - Split-off a new patch file for changed struct rte_intr_handle that
   other patches depend on, to avoid breaking git bisect
 - Change sample application to accommodate EAL function spec change
   accordingly

v4 changes
 - Export interrupt enable/disable functions for shared libraries
 - Adjust position of new-added structure fields and functions to
   avoid breaking ABI

v3 changes
 - Add return value for interrupt enable/disable functions
 - Move spinlock from PMD to L3fwd-power
 - Remove unnecessary variables in e1000_mac_info
 - Fix miscellaneous review comments

v2 changes
 - Fix compilation issue in Makefile for missed header file.
 - Consolidate internal and community review comments of v1 patch set.

The patch series introduces low-latency one-shot rx interrupt into DPDK with
a polling and interrupt mode switch control example.

The DPDK userspace interrupt notification and handling mechanism is based on UIO,
with the following limitations:
1) It is designed to handle the LSC interrupt only, with an inefficient suspended
   pthread wakeup procedure (e.g. UIO wakes up the LSC interrupt handling thread,
   which then wakes up the DPDK polling thread). This introduces
   non-deterministic wakeup latency for the DPDK polling thread, as well as packet
   latency if it is used to handle Rx interrupts.
2) UIO only supports a single interrupt vector, which has to be shared by the
   LSC interrupt and the interrupts assigned to dedicated rx queues.

This patchset includes the following features:
1) Enable one-shot rx queue interrupt in ixgbe PMD (PF & VF) and igb PMD (PF only).
2) Build on top of the VFIO mechanism instead of UIO, so it could support
   up to 64 interrupt vectors for rx queue interrupts.
3) Have 1 DPDK polling thread handle per Rx queue interrupt with a dedicated
   VFIO eventfd, which eliminates non-deterministic pthread wakeup latency in
   user space.
4) Demonstrate interrupt control APIs and userspace NAPI-like polling/interrupt
   switch algorithms in the L3fwd-power example.
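
A polling/interrupt switch of the kind mentioned in item 4 can be sketched as a small state update. This is only an illustration under stated assumptions, not the heuristic used by L3fwd-power: the threshold value and all names (`IDLE_THRESHOLD`, `rx_state`, `rx_mode_update`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IDLE_THRESHOLD 3 /* hypothetical: empty polls tolerated before sleeping */

enum rx_mode { RX_MODE_POLL, RX_MODE_INTR };

struct rx_state {
	enum rx_mode mode;
	uint32_t idle_polls; /* consecutive polls that returned no packets */
};

/*
 * Feed the result of one rx burst; returns the mode to use for the
 * next iteration.
 */
enum rx_mode
rx_mode_update(struct rx_state *s, uint16_t nb_rx)
{
	assert(s != NULL);

	if (nb_rx > 0) {
		/* Traffic seen: stay in (or return to) polling. */
		s->idle_polls = 0;
		s->mode = RX_MODE_POLL;
	} else if (++s->idle_polls >= IDLE_THRESHOLD) {
		/* Idle long enough: arm the one-shot interrupt and block. */
		s->mode = RX_MODE_INTR;
	}
	return s->mode;
}
```

In a real loop, entering RX_MODE_INTR would be the point to enable the rx queue interrupt and block in an epoll wait; the first packet after wakeup switches the state back to polling.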

Known limitations:
1) It does not work for UIO, because a single interrupt eventfd shared by the LSC
   and rx queue interrupt handlers causes a mess. [FIXED]
2) LSC interrupt is not supported by the VF driver, so it is disabled by default
   in L3fwd-power now. Feel free to turn it on if you want to support both LSC
   and rx queue interrupts on a PF.

Cunming Liang (13):
  eal/linux: add interrupt vectors support in intr_handle
  eal/linux: add rte_epoll_wait/ctl support
  eal/linux: add API to set rx interrupt event monitor
  eal/linux: fix comments typo on vfio msi
  eal/linux: add interrupt vectors handling on VFIO
  eal/linux: standalone intr event fd create support
  eal/linux: fix lsc read error in uio_pci_generic
  eal/bsd: dummy for new intr definition
  ethdev: add rx intr enable, disable and ctl functions
  ixgbe: enable rx queue interrupts for both PF and VF
  igb: enable rx queue interrupts for PF
  l3fwd-power: enable one-shot rx interrupt and polling/interrupt mode
switch
  abi: fix v2.1 abi broken issue

 drivers/net/e1000/igb_ethdev.c | 311 ++--
 drivers/net/ixgbe/ixgbe_ethdev.c   | 519 -
 drivers/net/ixgbe/ixgbe_ethdev.h   |   4 +
 examples/l3fwd-power/main.c| 206 ++--
 lib/librte_eal/bsdapp/eal/eal_interrupts.c |  19 +
 .../bsdapp/eal/include/exec-env/rte_interrupts.h   |  81 
 lib/librte_eal/bsdapp/eal/rte_eal_version.map  |   5 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c   | 361 --
 .../linuxapp/eal/include/exec-env/rte_interrupts.h | 219 +
 lib/librte_eal/linuxapp/eal/rte_eal_version.map|   8 +
 lib/librte_ether/rte_ethdev.c  | 109 +
 lib/librte_ether/rte_ethdev.h  | 132 ++
 lib/librte_ether/rte_ether_version.map |   4 +
 13 files

[dpdk-dev] [PATCH 4/4] app/testpmd: refactor ieee1588 forwarding

2015-06-05 Thread John McNamara

Refactor the ieee1588_fwd mode in testpmd to use the new ethdev
APIs to enable and read IEEE1588 PTP timestamps.

Signed-off-by: John McNamara 
---
 app/test-pmd/ieee1588fwd.c | 443 ++---
 1 file changed, 13 insertions(+), 430 deletions(-)

diff --git a/app/test-pmd/ieee1588fwd.c b/app/test-pmd/ieee1588fwd.c
index 84237c1..b5c4f5a 100644
--- a/app/test-pmd/ieee1588fwd.c
+++ b/app/test-pmd/ieee1588fwd.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -31,39 +31,9 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */

-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 

-#include 
-#include 
-
-#include 
-#include 
-#include 
-#include 
 #include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
 #include 
-#include 

 #include "testpmd.h"

@@ -77,6 +47,7 @@ struct ptpv2_msg {
uint8_t version; /**< must be 0x02 */
uint8_t unused[34];
 };
+
 #define PTP_SYNC_MESSAGE 0x0
 #define PTP_DELAY_REQ_MESSAGE   0x1
 #define PTP_PATH_DELAY_REQ_MESSAGE  0x2
@@ -108,393 +79,18 @@ struct ptpv2_msg {
  * is greater than the previous one.
  */

-/*
- * 1GbE 82576 Kawela registers used for IEEE1588 hardware support
- */
-#define IGBE_82576_ETQF(n) (0x05CB0 + (4 * (n)))
-#define IGBE_82576_ETQF_FILTER_ENABLE  (1 << 26)
-#define IGBE_82576_ETQF_1588_TIMESTAMP (1 << 30)
-
-#define IGBE_82576_TSYNCRXCTL  0x0B620
-#define IGBE_82576_TSYNCRXCTL_RXTS_ENABLE (1 << 4)
-
-#define IGBE_82576_RXSTMPL 0x0B624
-#define IGBE_82576_RXSTMPH 0x0B628
-#define IGBE_82576_RXSATRL 0x0B62C
-#define IGBE_82576_RXSATRH 0x0B630
-#define IGBE_82576_TSYNCTXCTL  0x0B614
-#define IGBE_82576_TSYNCTXCTL_TXTS_ENABLE (1 << 4)
-
-#define IGBE_82576_TXSTMPL 0x0B618
-#define IGBE_82576_TXSTMPH 0x0B61C
-#define IGBE_82576_SYSTIML 0x0B600
-#define IGBE_82576_SYSTIMH 0x0B604
-#define IGBE_82576_TIMINCA 0x0B608
-#define IGBE_82576_TIMADJL 0x0B60C
-#define IGBE_82576_TIMADJH 0x0B610
-#define IGBE_82576_TSAUXC  0x0B640
-#define IGBE_82576_TRGTTIML0   0x0B644
-#define IGBE_82576_TRGTTIMH0   0x0B648
-#define IGBE_82576_TRGTTIML1   0x0B64C
-#define IGBE_82576_TRGTTIMH1   0x0B650
-#define IGBE_82576_AUXSTMPL0   0x0B65C
-#define IGBE_82576_AUXSTMPH0   0x0B660
-#define IGBE_82576_AUXSTMPL1   0x0B664
-#define IGBE_82576_AUXSTMPH1   0x0B668
-#define IGBE_82576_TSYNCRXCFG  0x05F50
-#define IGBE_82576_TSSDP   0x0003C
-
-/*
- * 10GbE 82599 Niantic registers used for IEEE1588 hardware support
- */
-#define IXGBE_82599_ETQF(n) (0x05128 + (4 * (n)))
-#define IXGBE_82599_ETQF_FILTER_ENABLE  (1 << 31)
-#define IXGBE_82599_ETQF_1588_TIMESTAMP (1 << 30)
-
-#define IXGBE_82599_TSYNCRXCTL 0x05188
-#define IXGBE_82599_TSYNCRXCTL_RXTS_ENABLE (1 << 4)
-
-#define IXGBE_82599_RXSTMPL0x051E8
-#define IXGBE_82599_RXSTMPH0x051A4
-#define IXGBE_82599_RXSATRL0x051A0
-#define IXGBE_82599_RXSATRH0x051A8
-#define IXGBE_82599_RXMTRL 0x05120
-#define IXGBE_82599_TSYNCTXCTL 0x08C00
-#define IXGBE_82599_TSYNCTXCTL_TXTS_ENABLE (1 << 4)
-
-#define IXGBE_82599_TXSTMPL0x08C04
-#define IXGBE_82599_TXSTMPH0x08C08
-#define IXGBE_82599_SYSTIML0x08C0C
-#define IXGBE_82599_SYSTIMH0x08C10
-#define IXGBE_82599_TIMINCA0x08C14
-#define IXGBE_82599_TIMADJL0x08C18
-#define IXGBE_82599_TIMADJH0x08C1C
-#define IXGBE_82599_TSAUXC 0x08C20
-#define IXGBE_82599_TRGTTIML0  0x08C24
-#define IXGBE_82599_TRGTTIMH0  0x08C28
-#define IXGBE_82599_TRGTTIML1  0x08C2C
-#define IXGBE_82599_TRGTTIMH1  0x08C30
-#define IXGBE_82599_AUXSTMPL0  0x08C3C
-#define IXGBE_82599_AUXSTMPH0  0x08C40
-#define IXGBE_82599_AUXSTMPL1  0x08C44
-#define IXGBE_82599_AUXSTMPH1  0x08C48
-
-/**
- * Mandatory ETQF register for IEEE1588 packets filter.
- */
-#define ETQF_FILTER_1588_REG 3
-
-/**
- * Recommended value for increment and period of
- * the Increment Attribute Register.
- */
-#define IEEE1588_TIMINCA_INIT ((0x02 << 24) | 0x00F42400)
-
-/**
- * Data structure with pointers to port-specific functions.
- */
-typedef void (*ieee1588_start_t)(portid_t pi); /**< Start IEEE1588 feature. */
-typedef void (*ieee1588_stop_t)(portid_t pi);  /**< Stop IEEE1588 feature.  */
-typedef int  (*tmst_read_t)(portid_t pi, uint64_t *tmst); /**< Read TMST regs */
-
-struct port_ieee1588_ops {
-   ieee1588_start_t ieee1588_start;
-   ieee1588_stop_t  ieee1588_stop;
-   tmst_read_t  rx_tmst_read;
-   tmst_read_t  tx_tmst_read;
-};
-
-/**
- * 1GbE 82576 IEEE1588 operations.
- */
-static void
-igbe_82576_ieee1588_start(portid_t pi)
-{
-   uint32_t tsync_ctl;

[dpdk-dev] [PATCH 3/4] ixgbe: add support for ieee1588 timestamping

2015-06-05 Thread John McNamara

Add ixgbe support for new ethdev APIs to enable and read IEEE1588
PTP timestamps.
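
As a rough illustration of the TIMINCA initial value defined in this patch, the sketch below packs an increment period into the top byte and an increment value into the low bits of a 32-bit word. The SKETCH_ names and the helper are hypothetical; only the shift (24), the period (0x02) and the increment value (1600) are taken from the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the patch; the SKETCH_ names are illustrative only. */
#define SKETCH_TIMINCA_16NS_SHIFT 24
#define SKETCH_TIMINCA_INCVALUE   1600

/* Pack an increment period and an increment value into one 32-bit word. */
static uint32_t
sketch_timinca(uint32_t period, uint32_t incvalue)
{
	/* The increment value must fit below the period field. */
	assert(incvalue < (1u << SKETCH_TIMINCA_16NS_SHIFT));
	return (period << SKETCH_TIMINCA_16NS_SHIFT) | incvalue;
}
```

Under these assumptions, sketch_timinca(0x02, SKETCH_TIMINCA_INCVALUE) yields 0x02000640, which matches the expansion of IXGBE_TIMINCA_INIT.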

Signed-off-by: John McNamara 
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 118 +++
 1 file changed, 118 insertions(+)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 0d9f9b2..dc72011 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -116,6 +116,12 @@

 #define IXGBE_QUEUE_STAT_COUNTERS (sizeof(hw_stats->qprc) / sizeof(hw_stats->qprc[0]))

+/* IEEE1588 additional values. */
+#define IXGBE_TIMINCA_16NS_SHIFT 24
+#define IXGBE_TIMINCA_INCVALUE   1600
+#define IXGBE_TIMINCA_INIT   ((0x02 << IXGBE_TIMINCA_16NS_SHIFT) \
+ | IXGBE_TIMINCA_INCVALUE)
+
 static int eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev);
 static int  ixgbe_dev_configure(struct rte_eth_dev *dev);
 static int  ixgbe_dev_start(struct rte_eth_dev *dev);
@@ -257,6 +263,13 @@ static int ixgbe_dev_filter_ctrl(struct rte_eth_dev *dev,
 void *arg);
 static int ixgbevf_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu);

+static int ixgbe_ieee1588_enable(struct rte_eth_dev *dev);
+static int ixgbe_ieee1588_disable(struct rte_eth_dev *dev);
+static int ixgbe_ieee1588_read_rx_timestamp(struct rte_eth_dev *dev,
+   struct timespec *timestamp);
+static int ixgbe_ieee1588_read_tx_timestamp(struct rte_eth_dev *dev,
+   struct timespec *timestamp);
+
 /*
  * Define VF Stats MACRO for Non "cleared on read" register
  */
@@ -381,6 +394,10 @@ static const struct eth_dev_ops ixgbe_eth_dev_ops = {
.rss_hash_update  = ixgbe_dev_rss_hash_update,
.rss_hash_conf_get= ixgbe_dev_rss_hash_conf_get,
.filter_ctrl  = ixgbe_dev_filter_ctrl,
+   .ieee1588_enable= ixgbe_ieee1588_enable,
+   .ieee1588_disable   = ixgbe_ieee1588_disable,
+   .ieee1588_read_rx_timestamp = ixgbe_ieee1588_read_rx_timestamp,
+   .ieee1588_read_tx_timestamp = ixgbe_ieee1588_read_tx_timestamp,
 };

 /*
@@ -4439,6 +4456,107 @@ ixgbe_dev_filter_ctrl(struct rte_eth_dev *dev,
return ret;
 }

+static int
+ixgbe_ieee1588_enable(struct rte_eth_dev *dev)
+{
+   struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   uint32_t tsync_ctl;
+
+   /* Start incrementing the System Time registers used to timestamp PTP
+* packets.
+*/
+   IXGBE_WRITE_REG(hw, IXGBE_TIMINCA, IXGBE_TIMINCA_INIT);
+
+   /* Enable L2 filtering of IEEE1588 Ethernet frame types. */
+   IXGBE_WRITE_REG(hw, IXGBE_ETQF(IXGBE_ETQF_FILTER_1588),
+   (ETHER_TYPE_1588 |
+IXGBE_ETQF_FILTER_EN |
+IXGBE_ETQF_1588));
+
+   /* Enable timestamping of received PTP packets. */
+   tsync_ctl = IXGBE_READ_REG(hw, IXGBE_TSYNCRXCTL);
+   tsync_ctl |= IXGBE_TSYNCRXCTL_ENABLED;
+
+   IXGBE_WRITE_REG(hw, IXGBE_TSYNCRXCTL, tsync_ctl);
+
+   /* Enable Timestamping of transmitted PTP packets. */
+   tsync_ctl = IXGBE_READ_REG(hw, IXGBE_TSYNCTXCTL);
+   tsync_ctl |= IXGBE_TSYNCTXCTL_ENABLED;
+
+   IXGBE_WRITE_REG(hw, IXGBE_TSYNCTXCTL, tsync_ctl);
+
+   return 0;
+}
+
+static int
+ixgbe_ieee1588_disable(struct rte_eth_dev *dev)
+{
+   struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   uint32_t tsync_ctl;
+
+   /* Disable Timestamping of transmitted PTP packets. */
+   tsync_ctl = IXGBE_READ_REG(hw, IXGBE_TSYNCTXCTL);
+   tsync_ctl &= ~IXGBE_TSYNCTXCTL_ENABLED;
+
+   IXGBE_WRITE_REG(hw, IXGBE_TSYNCTXCTL, tsync_ctl);
+
+   /* Disable timestamping of received PTP packets. */
+   tsync_ctl = IXGBE_READ_REG(hw, IXGBE_TSYNCRXCTL);
+   tsync_ctl &= ~IXGBE_TSYNCRXCTL_ENABLED;
+
+   IXGBE_WRITE_REG(hw, IXGBE_TSYNCRXCTL, tsync_ctl);
+
+   /* Disable L2 filtering of IEEE1588 Ethernet frame types. */
+   IXGBE_WRITE_REG(hw, IXGBE_ETQF(IXGBE_ETQF_FILTER_1588), 0);
+
+   /* Stop incrementing the System Time registers. */
+   IXGBE_WRITE_REG(hw, IXGBE_TIMINCA, 0);
+
+   return 0;
+}
+
+static int
+ixgbe_ieee1588_read_rx_timestamp(struct rte_eth_dev *dev,
+struct timespec *timestamp)
+{
+   struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   uint32_t tsync_rxctl;
+   uint32_t rx_stmpl;
+   uint32_t rx_stmph;
+
+   tsync_rxctl = IXGBE_READ_REG(hw, IXGBE_TSYNCRXCTL);
+   if ((tsync_rxctl & 0x01) == 0)
+   return -EINVAL;
+
+   rx_stmpl = IXGBE_READ_REG(hw, IXGBE_RXSTMPL);
+   rx_stmph = IXGBE_READ_REG(hw, IXGBE_RXSTMPH);
+   timestamp->tv_sec = (uint64_t)(((uint64_t)rx_stmph << 32) | rx_stmpl);
+   timestamp->tv_nsec = 0;
+
+   return  0;
+}
+
+static int
+ixgbe_ieee1588_read_tx_timestamp(struct rte_eth_dev *dev,
+

[dpdk-dev] [PATCH 2/4] e1000: add support for ieee1588 timestamping

2015-06-05 Thread John McNamara

Add e1000/igb support for new ethdev APIs to enable and read
IEEE1588 PTP timestamps.

Signed-off-by: John McNamara 
---
 drivers/net/e1000/igb_ethdev.c | 118 +
 1 file changed, 118 insertions(+)

diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index e4b370d..f4e5527 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -74,6 +74,12 @@
 #define IGB_8_BIT_WIDTH  CHAR_BIT
 #define IGB_8_BIT_MASK   UINT8_MAX

+/* IEEE1588 additional values. */
+#define E1000_ETQF_FILTER_1588 3
+#define E1000_TIMINCA_INCVALUE   1600
+#define E1000_TIMINCA_INIT   ((0x02 << E1000_TIMINCA_16NS_SHIFT) \
+ | E1000_TIMINCA_INCVALUE)
+
 static int  eth_igb_configure(struct rte_eth_dev *dev);
 static int  eth_igb_start(struct rte_eth_dev *dev);
 static void eth_igb_stop(struct rte_eth_dev *dev);
@@ -194,6 +200,13 @@ static int eth_igb_filter_ctrl(struct rte_eth_dev *dev,
 enum rte_filter_op filter_op,
 void *arg);

+static int igb_ieee1588_enable(struct rte_eth_dev *dev);
+static int igb_ieee1588_disable(struct rte_eth_dev *dev);
+static int igb_ieee1588_read_rx_timestamp(struct rte_eth_dev *dev,
+ struct timespec *timestamp);
+static int igb_ieee1588_read_tx_timestamp(struct rte_eth_dev *dev,
+ struct timespec *timestamp);
+
 /*
  * Define VF Stats MACRO for Non "cleared on read" register
  */
@@ -269,6 +282,10 @@ static const struct eth_dev_ops eth_igb_ops = {
.rss_hash_update  = eth_igb_rss_hash_update,
.rss_hash_conf_get= eth_igb_rss_hash_conf_get,
.filter_ctrl  = eth_igb_filter_ctrl,
+   .ieee1588_enable= igb_ieee1588_enable,
+   .ieee1588_disable   = igb_ieee1588_disable,
+   .ieee1588_read_rx_timestamp = igb_ieee1588_read_rx_timestamp,
+   .ieee1588_read_tx_timestamp = igb_ieee1588_read_tx_timestamp,
 };

 /*
@@ -3642,6 +3659,107 @@ eth_igb_filter_ctrl(struct rte_eth_dev *dev,
return ret;
 }

+static int
+igb_ieee1588_enable(struct rte_eth_dev *dev)
+{
+   struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   uint32_t tsync_ctl;
+
+   /* Start incrementing the System Time registers used to timestamp PTP
+* packets.
+*/
+   E1000_WRITE_REG(hw, E1000_TIMINCA, E1000_TIMINCA_INIT);
+
+   /* Enable L2 filtering of IEEE1588 Ethernet frame types. */
+   E1000_WRITE_REG(hw, E1000_ETQF(E1000_ETQF_FILTER_1588),
+   (ETHER_TYPE_1588 |
+E1000_ETQF_FILTER_ENABLE |
+E1000_ETQF_1588));
+
+   /* Enable timestamping of received PTP packets. */
+   tsync_ctl = E1000_READ_REG(hw, E1000_TSYNCRXCTL);
+   tsync_ctl |= E1000_TSYNCRXCTL_ENABLED;
+
+   E1000_WRITE_REG(hw, E1000_TSYNCRXCTL, tsync_ctl);
+
+   /* Enable Timestamping of transmitted PTP packets. */
+   tsync_ctl = E1000_READ_REG(hw, E1000_TSYNCTXCTL);
+   tsync_ctl |= E1000_TSYNCTXCTL_ENABLED;
+
+   E1000_WRITE_REG(hw, E1000_TSYNCTXCTL, tsync_ctl);
+
+   return 0;
+}
+
+static int
+igb_ieee1588_disable(struct rte_eth_dev *dev)
+{
+   struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   uint32_t tsync_ctl;
+
+   /* Disable Timestamping of transmitted PTP packets. */
+   tsync_ctl = E1000_READ_REG(hw, E1000_TSYNCTXCTL);
+   tsync_ctl &= ~E1000_TSYNCTXCTL_ENABLED;
+
+   E1000_WRITE_REG(hw, E1000_TSYNCTXCTL, tsync_ctl);
+
+   /* Disable timestamping of received PTP packets. */
+   tsync_ctl = E1000_READ_REG(hw, E1000_TSYNCRXCTL);
+   tsync_ctl &= ~E1000_TSYNCRXCTL_ENABLED;
+
+   E1000_WRITE_REG(hw, E1000_TSYNCRXCTL, tsync_ctl);
+
+   /* Disable L2 filtering of IEEE1588 Ethernet frame types. */
+   E1000_WRITE_REG(hw, E1000_ETQF(E1000_ETQF_FILTER_1588), 0);
+
+   /* Stop incrementing the System Time registers. */
+   E1000_WRITE_REG(hw, E1000_TIMINCA, 0);
+
+   return 0;
+}
+
+static int
+igb_ieee1588_read_rx_timestamp(struct rte_eth_dev *dev,
+  struct timespec *timestamp)
+{
+   struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   uint32_t tsync_rxctl;
+   uint32_t rx_stmpl;
+   uint32_t rx_stmph;
+
+   tsync_rxctl = E1000_READ_REG(hw, E1000_TSYNCRXCTL);
+   if ((tsync_rxctl & 0x01) == 0)
+   return -EINVAL;
+
+   rx_stmpl = E1000_READ_REG(hw, E1000_RXSTMPL);
+   rx_stmph = E1000_READ_REG(hw, E1000_RXSTMPH);
+   timestamp->tv_sec = (uint64_t)(((uint64_t)rx_stmph << 32) | rx_stmpl);
+   timestamp->tv_nsec = 0;
+
+   return  0;
+}
+
+static int
+igb_ieee1588_read_tx_timestamp(struct rte_eth_dev *dev,
+  struct timespec *timestamp)
+{
+   struct e1000_hw *hw =

[dpdk-dev] [PATCH 1/4] ethdev: add support for ieee1588 timestamping

2015-06-05 Thread John McNamara

Add ethdev API to enable and read IEEE1588 PTP timestamps from
nics that support it. The following functions are added:

rte_eth_ieee1588_enable()
rte_eth_ieee1588_disable()
rte_eth_ieee1588_read_rx_timestamp()
rte_eth_ieee1588_read_tx_timestamp()
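
The wrappers added by this patch share one shape: validate the port, check that the driver supplies the operation, then call through the ops table. The sketch below shows that pattern in isolation. Every sketch_ and toy_ name is hypothetical and stands in for the real ethdev structures; it is not the DPDK implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_PORTS 4 /* hypothetical size of the port table */

struct sketch_dev;

/* Per-driver operations; a NULL pointer means "not supported". */
struct sketch_dev_ops {
	int (*ieee1588_enable)(struct sketch_dev *dev);
};

struct sketch_dev {
	const struct sketch_dev_ops *ops;
	int attached;        /* non-zero when the port is in use */
	int timestamping_on; /* set by the toy driver below */
};

static struct sketch_dev sketch_devices[SKETCH_MAX_PORTS];

/* Validate the port, check the op exists, then dispatch to the driver. */
static int
sketch_ieee1588_enable(uint8_t port_id)
{
	struct sketch_dev *dev;

	if (port_id >= SKETCH_MAX_PORTS || !sketch_devices[port_id].attached)
		return -ENODEV;

	dev = &sketch_devices[port_id];
	if (dev->ops == NULL || dev->ops->ieee1588_enable == NULL)
		return -ENOTSUP;

	return dev->ops->ieee1588_enable(dev);
}

/* Toy driver that implements the enable op. */
static int
toy_enable(struct sketch_dev *dev)
{
	assert(dev != NULL);
	dev->timestamping_on = 1;
	return 0;
}

static const struct sketch_dev_ops toy_ops = { .ieee1588_enable = toy_enable };
static const struct sketch_dev_ops empty_ops = { .ieee1588_enable = NULL };

/* Attach port 0 to the toy driver and port 1 to a driver without the op. */
static void
sketch_setup(void)
{
	sketch_devices[0].ops = &toy_ops;
	sketch_devices[0].attached = 1;
	sketch_devices[1].ops = &empty_ops;
	sketch_devices[1].attached = 1;
}
```

With this setup, port 0 returns 0, port 1 returns -ENOTSUP and an unattached port returns -ENODEV. The disable and read wrappers would presumably follow the same three steps.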

Signed-off-by: John McNamara 
---
 lib/librte_ether/rte_ethdev.c  | 70 ++-
 lib/librte_ether/rte_ethdev.h  | 88 +-
 lib/librte_ether/rte_ether_version.map |  4 ++
 3 files changed, 160 insertions(+), 2 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 5a94654..f85a1cd 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -3627,3 +3627,71 @@ rte_eth_remove_tx_callback(uint8_t port_id, uint16_t queue_id,
/* Callback wasn't found. */
return -EINVAL;
 }
+
+int
+rte_eth_ieee1588_enable(uint8_t port_id)
+{
+   struct rte_eth_dev *dev;
+
+   if (!rte_eth_dev_is_valid_port(port_id)) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return -ENODEV;
+   }
+
+   dev = &rte_eth_devices[port_id];
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->ieee1588_enable, -ENOTSUP);
+   return (*dev->dev_ops->ieee1588_enable)(dev);
+}
+
+int
+rte_eth_ieee1588_disable(uint8_t port_id)
+{
+   struct rte_eth_dev *dev;
+
+   if (!rte_eth_dev_is_valid_port(port_id)) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return -ENODEV;
+   }
+
+   dev = &rte_eth_devices[port_id];
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->ieee1588_disable, -ENOTSUP);
+   return (*dev->dev_ops->ieee1588_disable)(dev);
+}
+
+int
+rte_eth_ieee1588_read_rx_timestamp(uint8_t port_id, struct timespec *timestamp)
+{
+   struct rte_eth_dev *dev;
+
+   if (!rte_eth_dev_is_valid_port(port_id)) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return -ENODEV;
+   }
+
+   dev = &rte_eth_devices[port_id];
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->ieee1588_read_rx_timestamp,
+   -ENOTSUP);
+   return (*dev->dev_ops->ieee1588_read_rx_timestamp)(dev, timestamp);
+}
+
+
+int
+rte_eth_ieee1588_read_tx_timestamp(uint8_t port_id, struct timespec *timestamp)
+{
+   struct rte_eth_dev *dev;
+
+   if (!rte_eth_dev_is_valid_port(port_id)) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return -ENODEV;
+   }
+
+   dev = &rte_eth_devices[port_id];
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->ieee1588_read_tx_timestamp,
+   -ENOTSUP);
+   return (*dev->dev_ops->ieee1588_read_tx_timestamp)(dev, timestamp);
+}
+
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 16dbe00..abe9b1c 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -1220,6 +1220,7 @@ typedef int (*eth_mirror_rule_reset_t)(struct rte_eth_dev *dev,
  uint8_t rule_id);
 /**< @internal Remove a traffic mirroring rule on an Ethernet device */

+
 typedef int (*eth_udp_tunnel_add_t)(struct rte_eth_dev *dev,
struct rte_eth_udp_tunnel *tunnel_udp);
 /**< @internal Add tunneling UDP info */
@@ -1228,6 +1229,20 @@ typedef int (*eth_udp_tunnel_del_t)(struct rte_eth_dev *dev,
struct rte_eth_udp_tunnel *tunnel_udp);
 /**< @internal Delete tunneling UDP info */

+typedef int (*eth_ieee1588_enable_t)(struct rte_eth_dev *dev);
+/**< @internal Function used to enable IEEE1588 PTP timestamping. */
+
+typedef int (*eth_ieee1588_disable_t)(struct rte_eth_dev *dev);
+/**< @internal Function used to disable IEEE1588 PTP timestamping. */
+
+typedef int (*eth_ieee1588_read_rx_timestamp_t)(struct rte_eth_dev *dev,
+   struct timespec *timestamp);
+/**< @internal Function used to read an RX IEEE1588 PTP timestamp. */
+
+typedef int (*eth_ieee1588_read_tx_timestamp_t)(struct rte_eth_dev *dev,
+   struct timespec *timestamp);
+/**< @internal Function used to read a TX IEEE1588 PTP timestamp. */
+

 #ifdef RTE_NIC_BYPASS

@@ -1386,6 +1401,16 @@ struct eth_dev_ops {
/** Get current RSS hash configuration. */
rss_hash_conf_get_t rss_hash_conf_get;

[dpdk-dev] [PATCH 0/4] ethdev: add support for ieee1588 timestamping

2015-06-05 Thread John McNamara

This patchset adds ethdev API to enable and read IEEE1588 PTP timestamps from
devices that support it. The following functions are added:

rte_eth_ieee1588_enable()
rte_eth_ieee1588_disable()
rte_eth_ieee1588_read_rx_timestamp()
rte_eth_ieee1588_read_tx_timestamp()

The "ieee1588" forwarding mode in testpmd is also refactored to demonstrate
the new API and to clean up the code.

Adds support for igb and ixgbe. Support for i40e will follow in V2.

I would be interested in getting feedback from maintainers of non-Intel pmds
on whether this interface is sufficient to initialise, read from, and stop,
IEEE1588 functionality on other devices.
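
To make the read path concrete, here is a minimal sketch of the contract used by the igb and ixgbe patches in this series: the read fails with -EINVAL until the hardware reports a latched timestamp, and otherwise the two 32-bit halves are joined into a single 64-bit count. The register file is a mock, and every sketch_ and SKETCH_ name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define SKETCH_RXCTL_VALID 0x01u /* hypothetical "timestamp latched" bit */

/* Mock register file standing in for a NIC. */
struct sketch_nic {
	uint32_t rxctl;   /* control/status word */
	uint32_t rxstmpl; /* low 32 bits of the latched timestamp */
	uint32_t rxstmph; /* high 32 bits of the latched timestamp */
};

static int
sketch_read_rx_timestamp(struct sketch_nic *nic, struct timespec *timestamp)
{
	assert(nic != NULL && timestamp != NULL);

	/* No timestamp has been latched yet. */
	if ((nic->rxctl & SKETCH_RXCTL_VALID) == 0)
		return -EINVAL;

	/* Join the high and low halves into one 64-bit count. */
	timestamp->tv_sec =
		(time_t)(((uint64_t)nic->rxstmph << 32) | nic->rxstmpl);
	timestamp->tv_nsec = 0;

	/* Mock-only behaviour: reading consumes the latched value. */
	nic->rxctl &= ~SKETCH_RXCTL_VALID;
	return 0;
}
```

A caller would therefore treat -EINVAL as "nothing to read yet" rather than as a fatal error. Whether the raw count belongs in tv_sec, as the patches store it, is one of the interface questions raised above.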


John McNamara (4):
  ethdev: add support for ieee1588 timestamping
  e1000: add support for ieee1588 timestamping
  ixgbe: add support for ieee1588 timestamping
  app/testpmd: refactor ieee1588 forwarding

 app/test-pmd/ieee1588fwd.c | 443 +
 drivers/net/e1000/igb_ethdev.c | 118 +
 drivers/net/ixgbe/ixgbe_ethdev.c   | 118 +
 lib/librte_ether/rte_ethdev.c  |  70 +-
 lib/librte_ether/rte_ethdev.h  |  88 ++-
 lib/librte_ether/rte_ether_version.map |   4 +
 6 files changed, 409 insertions(+), 432 deletions(-)

-- 
1.8.1.4

[dpdk-dev] [PATCH v3 4/4] doc: modify the command about mirror in testpmd guide

2015-06-05 Thread Jingjing Wu

Signed-off-by: Jingjing Wu 
---
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 761172e..2cd3461 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -772,9 +772,13 @@ set port (port_id) vf (vf_id) rate (rate_value) queue_mask (queue_mask)
 set port - mirror rule
 ~~

-Set port or vlan type mirror rule for a port.
+Set pool or vlan type mirror rule for a port:

-set port (port_id) mirror-rule (rule_id) (pool-mirror|vlan-mirror) (poolmask|vlanid[,vlanid]*) dst-pool (pool_id) (on|off)
+set port (port_id) mirror-rule (rule_id) (pool-mirror-up|pool-mirror-down|vlan-mirror) (poolmask|vlanid[,vlanid]*) dst-pool (pool_id) (on|off)
+
+Set link mirror rule for a port:
+
+set port (port_id) mirror-rule (rule_id) (uplink-mirror|downlink-mirror) dst-pool (pool_id) (on|off)

 For example to enable mirror traffic with vlan 0,1 to pool 0:

-- 
1.9.3

[dpdk-dev] [PATCH v3 3/4] i40e: enable mirror functionality in i40e driver

2015-06-05 Thread Jingjing Wu

Enable mirror functionality in the i40e driver by implementing
the following ethdev ops:
.mirror_rule_set
.mirror_rule_reset
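
The diff in this patch initialises a list of mirror rules at device init and drains it when the device stops. A minimal, self-contained sketch of that list handling with the sys/queue.h TAILQ macros might look as follows. The sketch_ types and helpers are hypothetical, and plain malloc/free stand in for the driver's allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* One entry per configured mirror rule. */
struct sketch_rule {
	TAILQ_ENTRY(sketch_rule) rules;
	unsigned int id;
};
TAILQ_HEAD(sketch_rule_list, sketch_rule);

/* Stand-in for the per-device private data. */
struct sketch_pf {
	struct sketch_rule_list mirror_list;
	unsigned int nb_mirror_rule;
};

/* Start with an empty rule list. */
static void
sketch_init(struct sketch_pf *pf)
{
	TAILQ_INIT(&pf->mirror_list);
	pf->nb_mirror_rule = 0;
}

/* Append a rule; returns 0 on success, -1 if allocation fails. */
static int
sketch_add_rule(struct sketch_pf *pf, unsigned int id)
{
	struct sketch_rule *rule;

	assert(pf != NULL);
	rule = malloc(sizeof(*rule));
	if (rule == NULL)
		return -1;
	rule->id = id;
	TAILQ_INSERT_TAIL(&pf->mirror_list, rule, rules);
	pf->nb_mirror_rule++;
	return 0;
}

/* Unlink and free every rule, as the stop path does. */
static void
sketch_remove_all(struct sketch_pf *pf)
{
	struct sketch_rule *rule;

	while ((rule = TAILQ_FIRST(&pf->mirror_list)) != NULL) {
		TAILQ_REMOVE(&pf->mirror_list, rule, rules);
		free(rule);
	}
	pf->nb_mirror_rule = 0;
}
```

Taking the head on each iteration avoids touching an entry after it has been freed, which a plain TAILQ_FOREACH combined with free() would not guarantee.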

Signed-off-by: Jingjing Wu 
---
 drivers/net/i40e/i40e_ethdev.c | 334 +
 drivers/net/i40e/i40e_ethdev.h |  23 +++
 2 files changed, 357 insertions(+)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index da6c0b5..d0e83b5 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -211,6 +211,10 @@ static int i40e_dev_filter_ctrl(struct rte_eth_dev *dev,
void *arg);
 static void i40e_configure_registers(struct i40e_hw *hw);
 static void i40e_hw_init(struct i40e_hw *hw);
+static int i40e_mirror_rule_set(struct rte_eth_dev *dev,
+   struct rte_eth_mirror_conf *mirror_conf,
+   uint8_t sw_id, uint8_t on);
+static int i40e_mirror_rule_reset(struct rte_eth_dev *dev, uint8_t sw_id);

 static const struct rte_pci_id pci_id_i40e_map[] = {
 #define RTE_PCI_DEV_ID_DECL_I40E(vend, dev) {RTE_PCI_DEVICE(vend, dev)},
@@ -262,6 +266,8 @@ static const struct eth_dev_ops i40e_eth_dev_ops = {
.udp_tunnel_add   = i40e_dev_udp_tunnel_add,
.udp_tunnel_del   = i40e_dev_udp_tunnel_del,
.filter_ctrl  = i40e_dev_filter_ctrl,
+   .mirror_rule_set  = i40e_mirror_rule_set,
+   .mirror_rule_reset= i40e_mirror_rule_reset,
 };

 static struct eth_driver rte_i40e_pmd = {
@@ -563,6 +569,9 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
/* enable uio intr after callback register */
rte_intr_enable(&(pci_dev->intr_handle));

+   /* initialize mirror rule list */
+   TAILQ_INIT(&pf->mirror_list);
+
return 0;

 err_mac_alloc:
@@ -929,6 +938,7 @@ i40e_dev_stop(struct rte_eth_dev *dev)
 {
struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
struct i40e_vsi *main_vsi = pf->main_vsi;
+   struct i40e_mirror_rule *p_mirror;
int i;

/* Disable all queues */
@@ -953,6 +963,13 @@ i40e_dev_stop(struct rte_eth_dev *dev)
/* Set link down */
i40e_dev_set_link_down(dev);

+   /* Remove all mirror rules */
+   while ((p_mirror = TAILQ_FIRST(&pf->mirror_list))) {
+   TAILQ_REMOVE(&pf->mirror_list, p_mirror, rules);
+   rte_free(p_mirror);
+   }
+   pf->nb_mirror_rule = 0;
+
 }

 static void
@@ -5697,3 +5714,320 @@ i40e_configure_registers(struct i40e_hw *hw)
"0x%"PRIx32, reg_table[i].val, reg_table[i].addr);
}
 }
+
+/**
+ * i40e_aq_add_mirror_rule
+ * @hw: pointer to the hardware structure
+ * @seid: VEB seid to add mirror rule to
+ * @dst_id: destination vsi seid
+ * @entries: Buffer which contains the entities to be mirrored
+ * @count: number of entities contained in the buffer
+ * @rule_id:the rule_id of the rule to be added
+ *
+ * Add a mirror rule for a given veb.
+ *
+ **/
+static enum i40e_status_code
+i40e_aq_add_mirror_rule(struct i40e_hw *hw,
+   uint16_t seid, uint16_t dst_id,
+   uint16_t rule_type, uint16_t *entries,
+   uint16_t count, uint16_t *rule_id)
+{
+   struct i40e_aq_desc desc;
+   struct i40e_aqc_add_delete_mirror_rule *cmd =
+   (struct i40e_aqc_add_delete_mirror_rule *)&desc.params.raw;
+   struct i40e_aqc_add_delete_mirror_rule_completion *resp =
+   (struct i40e_aqc_add_delete_mirror_rule_completion *)
+   &desc.params.raw;
+   uint16_t buff_len;
+   enum i40e_status_code status;
+
+   i40e_fill_default_direct_cmd_desc(&desc,
+ i40e_aqc_opc_add_mirror_rule);
+
+   buff_len = sizeof(uint16_t) * count;
+   desc.datalen = rte_cpu_to_le_16(buff_len);
+   if (buff_len > 0)
+   desc.flags |= rte_cpu_to_le_16(
+   (uint16_t)(I40E_AQ_FLAG_BUF | I40E_AQ_FLAG_RD));
+   cmd->rule_type = rte_cpu_to_le_16(rule_type <<
+   I40E_AQC_MIRROR_RULE_TYPE_SHIFT);
+   cmd->num_entries = rte_cpu_to_le_16(count);
+   cmd->seid = rte_cpu_to_le_16(seid);
+   cmd->destination = rte_cpu_to_le_16(dst_id);
+
+   status = i40e_asq_send_command(hw, &desc, entries, buff_len, NULL);
+   PMD_DRV_LOG(INFO, "i40e_aq_add_mirror_rule, aq_status %d,"
+"rule_id = %u"
+" mirror_rules_used = %u, mirror_rules_free = %u,",
+hw->aq.asq_last_status, resp->rule_id,
+resp->mirror_rules_used, resp->mirror_rules_free);
+   *rule_id = rte_le_to_cpu_16(resp->rule_id);
+
+   return status;
+}
+
+/**
+ * i40e_aq_del_mirror_rule
+ * @hw: pointer to the hardware structure
+ * @seid: VEB seid to add mirror rule to
+ * @entries: Buffer which contains the entities to be mirrored
+ * @count: number of entities contained in the buffer
+ * @rule_id:the rule_id of the rule to be delete
+ *
+

[dpdk-dev] [PATCH v3 2/4] ethdev: redefine the mirror type

2015-06-05 Thread Jingjing Wu

This patch renames the mirror type in rte_eth_mirror_conf and the related
macros, and reworks the mirror rule set in the ixgbe driver to use the new
definitions. It also fixes some coding style issues.

Signed-off-by: Jingjing Wu 
---
 app/test-pmd/cmdline.c   | 42 +--
 drivers/net/ixgbe/ixgbe_ethdev.c | 53 ++--
 lib/librte_ether/rte_ethdev.c| 14 ---
 lib/librte_ether/rte_ethdev.h| 11 +
 4 files changed, 74 insertions(+), 46 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index d693bde..6d4474b 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -412,7 +412,7 @@ static void cmd_help_long_parsed(void *parsed_result,
"Set rate limit for queues in VF of a port\n\n"

"set port (port_id) mirror-rule (rule_id)"
-   "(pool-mirror|vlan-mirror)\n"
+   " (pool-mirror-up|pool-mirror-down|vlan-mirror)"
" (poolmask|vlanid[,vlanid]*) dst-pool (pool_id) (on|off)\n"
"   Set pool or vlan type mirror rule on a port.\n"
"   e.g., 'set port 0 mirror-rule 0 vlan-mirror 0,1"
@@ -6583,7 +6583,8 @@ cmdline_parse_token_num_t cmd_mirror_mask_ruleid =
rule_id, UINT8);
 cmdline_parse_token_string_t cmd_mirror_mask_what =
TOKEN_STRING_INITIALIZER(struct cmd_set_mirror_mask_result,
-   what, "pool-mirror#vlan-mirror");
+   what, "pool-mirror-up#pool-mirror-down"
+ "#vlan-mirror");
 cmdline_parse_token_string_t cmd_mirror_mask_value =
TOKEN_STRING_INITIALIZER(struct cmd_set_mirror_mask_result,
value, NULL);
@@ -6612,13 +6613,16 @@ cmd_set_mirror_mask_parsed(void *parsed_result,

mr_conf.dst_pool = res->dstpool_id;

-   if (!strcmp(res->what, "pool-mirror")) {
-   mr_conf.pool_mask = strtoull(res->value,NULL,16);
-   mr_conf.rule_type_mask = ETH_VMDQ_POOL_MIRROR;
-   } else if(!strcmp(res->what, "vlan-mirror")) {
-   mr_conf.rule_type_mask = ETH_VMDQ_VLAN_MIRROR;
-   nb_item = parse_item_list(res->value, "core",
-   ETH_MIRROR_MAX_VLANS, vlan_list, 1);
+   if (!strcmp(res->what, "pool-mirror-up")) {
+   mr_conf.pool_mask = strtoull(res->value, NULL, 16);
+   mr_conf.rule_type = ETH_MIRROR_VIRTUAL_POOL_UP;
+   } else if (!strcmp(res->what, "pool-mirror-down")) {
+   mr_conf.pool_mask = strtoull(res->value, NULL, 16);
+   mr_conf.rule_type = ETH_MIRROR_VIRTUAL_POOL_DOWN;
+   } else if (!strcmp(res->what, "vlan-mirror")) {
+   mr_conf.rule_type = ETH_MIRROR_VLAN;
+   nb_item = parse_item_list(res->value, "vlan",
+   ETH_MIRROR_MAX_VLANS, vlan_list, 1);
if (nb_item <= 0)
return;

@@ -6633,21 +6637,21 @@ cmd_set_mirror_mask_parsed(void *parsed_result,
}
}

-   if(!strcmp(res->on, "on"))
+   if (!strcmp(res->on, "on"))
ret = rte_eth_mirror_rule_set(res->port_id, &mr_conf,
res->rule_id, 1);
else
ret = rte_eth_mirror_rule_set(res->port_id, &mr_conf,
res->rule_id, 0);
-   if(ret < 0)
+   if (ret < 0)
printf("mirror rule add error: (%s)\n", strerror(-ret));
 }

 cmdline_parse_inst_t cmd_set_mirror_mask = {
.f = cmd_set_mirror_mask_parsed,
.data = NULL,
-   .help_str = "set port X mirror-rule Y pool-mirror|vlan-mirror "
-   "pool_mask|vlan_id[,vlan_id]* dst-pool Z 
on|off",
+   .help_str = "set port X mirror-rule Y 
pool-mirror-up|pool-mirror-down|vlan-mirror"
+   " pool_mask|vlan_id[,vlan_id]* dst-pool Z on|off",
.tokens = {
(void *)_mirror_mask_set,
(void *)_mirror_mask_port,
@@ -6714,14 +6718,14 @@ cmd_set_mirror_link_parsed(void *parsed_result,
struct rte_eth_mirror_conf mr_conf;

memset(&mr_conf, 0, sizeof(struct rte_eth_mirror_conf));
-   if(!strcmp(res->what, "uplink-mirror")) {
-   mr_conf.rule_type_mask = ETH_VMDQ_UPLINK_MIRROR;
-   }else if(!strcmp(res->what, "downlink-mirror"))
-   mr_conf.rule_type_mask = ETH_VMDQ_DOWNLIN_MIRROR;
+   if (!strcmp(res->what, "uplink-mirror"))
+   mr_conf.rule_type = ETH_MIRROR_UPLINK_PORT;
+   else
+   mr_conf.rule_type = ETH_MIRROR_DOWNLINK_PORT;

mr_conf.dst_pool = res->dstpool_id;

-   if(!strcmp(res->on, "on"))
+   if (!strcmp(res->on, "on"))
ret =

[dpdk-dev] [PATCH v3 1/4] ethdev: rename rte_eth_vmdq_mirror_conf

2015-06-05 Thread Jingjing Wu

Rename rte_eth_vmdq_mirror_conf to rte_eth_mirror_conf and move
the maximum rule id check from the ethdev level to the driver.

Signed-off-by: Jingjing Wu 
---
 app/test-pmd/cmdline.c   | 22 +++---
 drivers/net/ixgbe/ixgbe_ethdev.c | 11 +++
 drivers/net/ixgbe/ixgbe_ethdev.h |  4 +++-
 lib/librte_ether/rte_ethdev.c| 18 ++
 lib/librte_ether/rte_ethdev.h| 19 ++-
 5 files changed, 33 insertions(+), 41 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index f01db2a..d693bde 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -6604,11 +6604,11 @@ cmd_set_mirror_mask_parsed(void *parsed_result,
 {
int ret,nb_item,i;
struct cmd_set_mirror_mask_result *res = parsed_result;
-   struct rte_eth_vmdq_mirror_conf mr_conf;
+   struct rte_eth_mirror_conf mr_conf;

-   memset(&mr_conf,0,sizeof(struct rte_eth_vmdq_mirror_conf));
+   memset(&mr_conf, 0, sizeof(struct rte_eth_mirror_conf));

-   unsigned int vlan_list[ETH_VMDQ_MAX_VLAN_FILTERS];
+   unsigned int vlan_list[ETH_MIRROR_MAX_VLANS];

mr_conf.dst_pool = res->dstpool_id;

@@ -6618,11 +6618,11 @@ cmd_set_mirror_mask_parsed(void *parsed_result,
} else if(!strcmp(res->what, "vlan-mirror")) {
mr_conf.rule_type_mask = ETH_VMDQ_VLAN_MIRROR;
nb_item = parse_item_list(res->value, "core",
-   ETH_VMDQ_MAX_VLAN_FILTERS,vlan_list,1);
+   ETH_MIRROR_MAX_VLANS, vlan_list, 1);
if (nb_item <= 0)
return;

-   for(i=0; i < nb_item; i++) {
+   for (i = 0; i < nb_item; i++) {
if (vlan_list[i] > ETHER_MAX_VLAN_ID) {
printf("Invalid vlan_id: must be < 4096\n");
return;
@@ -6634,10 +6634,10 @@ cmd_set_mirror_mask_parsed(void *parsed_result,
}

if(!strcmp(res->on, "on"))
-   ret = rte_eth_mirror_rule_set(res->port_id,&mr_conf,
+   ret = rte_eth_mirror_rule_set(res->port_id, &mr_conf,
res->rule_id, 1);
else
-   ret = rte_eth_mirror_rule_set(res->port_id,&mr_conf,
+   ret = rte_eth_mirror_rule_set(res->port_id, &mr_conf,
res->rule_id, 0);
if(ret < 0)
printf("mirror rule add error: (%s)\n", strerror(-ret));
@@ -6711,9 +6711,9 @@ cmd_set_mirror_link_parsed(void *parsed_result,
 {
int ret;
struct cmd_set_mirror_link_result *res = parsed_result;
-   struct rte_eth_vmdq_mirror_conf mr_conf;
+   struct rte_eth_mirror_conf mr_conf;

-   memset(&mr_conf,0,sizeof(struct rte_eth_vmdq_mirror_conf));
+   memset(&mr_conf, 0, sizeof(struct rte_eth_mirror_conf));
if(!strcmp(res->what, "uplink-mirror")) {
mr_conf.rule_type_mask = ETH_VMDQ_UPLINK_MIRROR;
}else if(!strcmp(res->what, "downlink-mirror"))
@@ -6722,10 +6722,10 @@ cmd_set_mirror_link_parsed(void *parsed_result,
mr_conf.dst_pool = res->dstpool_id;

if(!strcmp(res->on, "on"))
-   ret = rte_eth_mirror_rule_set(res->port_id,&mr_conf,
+   ret = rte_eth_mirror_rule_set(res->port_id, &mr_conf,
res->rule_id, 1);
else
-   ret = rte_eth_mirror_rule_set(res->port_id,&mr_conf,
+   ret = rte_eth_mirror_rule_set(res->port_id, &mr_conf,
res->rule_id, 0);

/* check the return value and print it if is < 0 */
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 0d9f9b2..9e767fa 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -209,7 +209,7 @@ static int ixgbe_set_pool_tx(struct rte_eth_dev *dev,uint16_t pool,uint8_t on);
 static int ixgbe_set_pool_vlan_filter(struct rte_eth_dev *dev, uint16_t vlan,
uint64_t pool_mask,uint8_t vlan_on);
 static int ixgbe_mirror_rule_set(struct rte_eth_dev *dev,
-   struct rte_eth_vmdq_mirror_conf *mirror_conf,
+   struct rte_eth_mirror_conf *mirror_conf,
uint8_t rule_id, uint8_t on);
 static int ixgbe_mirror_rule_reset(struct rte_eth_dev *dev,
uint8_t rule_id);
@@ -3388,7 +3388,7 @@ ixgbe_set_pool_vlan_filter(struct rte_eth_dev *dev, uint16_t vlan,

 static int
 ixgbe_mirror_rule_set(struct rte_eth_dev *dev,
-   struct rte_eth_vmdq_mirror_conf *mirror_conf,
+   struct rte_eth_mirror_conf *mirror_conf,
uint8_t rule_id, uint8_t on)
 {
uint32_t mr_ctl,vlvf;
@@ -3412,7 +3412,10 @@ ixgbe_mirror_rule_set(struct rte_eth_dev *dev,
IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);

if

[dpdk-dev] [PATCH v3 0/4] enable mirror functionality in i40e driver

2015-06-05 Thread Jingjing Wu

This patch set enables mirror functionality in the i40e driver, and redefines
the structure and macros used to configure mirroring.

v2 changes:
 - correct comments style
 - add doc change

v3 changes:
 - change the mirror rule type to support a bit mask and avoid breaking the ABI
 - fix code style

Jingjing Wu (4):
  ethdev: rename rte_eth_vmdq_mirror_conf
  ethdev: redefine the mirror type
  i40e: enable mirror functionality in i40e driver
  doc: modify the command about mirror in testpmd guide

 app/test-pmd/cmdline.c  |  62 +++---
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |   8 +-
 drivers/net/i40e/i40e_ethdev.c  | 334 
 drivers/net/i40e/i40e_ethdev.h  |  23 ++
 drivers/net/ixgbe/ixgbe_ethdev.c|  64 --
 drivers/net/ixgbe/ixgbe_ethdev.h|   4 +-
 lib/librte_ether/rte_ethdev.c   |  28 +--
 lib/librte_ether/rte_ethdev.h   |  30 +--
 8 files changed, 467 insertions(+), 86 deletions(-)

-- 
1.9.3

[dpdk-dev] [PATCH 2/2] ethtool: add new library to provide ethtool-alike APIs

2015-06-05 Thread Andrew Harvey (agh)

On 6/5/15, 3:46 AM, "Thomas Monjalon"  wrote:

>2015-06-04 22:10, Andrew Harvey:
>> On 6/4/15, 7:58 AM, "Stephen Hemminger" wrote:
>> >"Andrew Harvey (agh)"  wrote:
>> >> I believe that there is value in this interface for software stacks
>> >> not based on Linux being moved toward DPDK that need simple operations
>> >> like getting the mac address.  Some of these stacks have a dearth of
>> >> resources available and dedicating a core/thread to KNI to get/set a
>> >> mac address is considered excessive. There are also issues with 32/64
>> >> bit kernel integration using KNI.  If the ethtool interface is not the
>> >> correct interface then please help me understand what should/could
>> >> have been used. If ethtool is considered 'old and clunky',
>> >> Stephen's and your input would be valuable in designing another
>> >> interface with similar properties.  The use-case is pretty simple and
>> >> there are no plans for moving anything back into the kernel; on the
>> >> contrary, it's the complete opposite.
>> >> 
>> >> -- Andy
>> >
>> >We have DPDK API's to do this, and any added wrappers make it bigger.
>> >I don't see why calling your ethtool API is better than calling
>> >rte_eth* API.
>> >
>> >If there is a missing functionality in the rte_ethXXX api's for an
>> >application then add that. For example: rte_eth_mac_addr_get()
>> 
>> I am getting somewhat confused by your latest comments.  Your first
>> email (referenced below) looked really positive and I found your
>> suggestions useful. Your latest post appears to contradict this and now
>> the interface was there all the time.  The wrapper façade provided by the
>> ethtool library provides a clean separation of concerns and will allow
>> people to migrate from not only KNI but in our case from a legacy system.
>> If a software stack has requirements to work with multiple IO
>> abstractions then the ethtool approach is attractive. I would speculate
>> that many other stacks moving towards dpdk will have similar issues.
>> 
>> Summarizing, for our use-cases the ethtool interface facilitated our
>> adoption to dpdk while allowing us to support our legacy IO
>> abstractions.
>
>Stephen and me say the same thing about using the ethdev API.

And your point would be valid if dpdk were available on every
interface we support (it is not) and on every processor architecture that
we support (it is not) and every OS we support (it is not).  So to
minimize entropy in the code why not leave the client code the same,
ioctl(fd, ...), and hide the implementation
detail in a wrapper library?  We have a large legacy code base to move
forward and sprinkling special interest code like rte_xxx throughout every
client we have is not appropriate at this time.


>We don't understand why using a fake ethtool lib would be easier.
>Though you are saying it "facilitated [your] adoption to dpdk".
>Please could you explain why using an ethtool-like API is easier than
>using the existing ethdev API?
>In any case, you have to develop a specific backend for DPDK
>(rte_ethtool would be also DPDK-specific).
>
>It seems you already started to use such an ethtool implementation.
>Please note that our goal is not to prevent Cisco from upstreaming
>(evidence with enic driver integration) but we want to guide you, and
>others having the same needs, to the best solution for everybody.
>That's why we need to understand what we (or you) are missing.
>Maybe that it would be clearer with some code examples (which would
>go in the lib documentation if any).
>
>Thanks
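
To make the wrapper argument in this message concrete, here is a minimal sketch of a backend table that keeps client code identical across IO abstractions. Everything in it is an assumption made for illustration: `struct nic_ops`, `nic_get_mac()` and both stub backends are hypothetical names, not part of DPDK or of the proposed ethtool library, and the "dpdk" backend is a stub standing in for a real ethdev call.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical backend table: one instance per IO abstraction. */
struct nic_ops {
	const char *name;
	int (*get_mac)(int port, uint8_t mac[MAC_LEN]);
};

/* Stub legacy backend (in the thread this would be an ioctl path). */
static int legacy_get_mac(int port, uint8_t mac[MAC_LEN])
{
	const uint8_t base[MAC_LEN] = {0x02, 0x00, 0x00, 0x00, 0x00, 0x00};

	if (port < 0 || port > 255)
		return -1;
	memcpy(mac, base, MAC_LEN);
	mac[5] = (uint8_t)port;
	return 0;
}

/* Stub DPDK backend (a real one would call into the ethdev API). */
static int dpdk_get_mac(int port, uint8_t mac[MAC_LEN])
{
	const uint8_t base[MAC_LEN] = {0x02, 0x00, 0x00, 0x00, 0x01, 0x00};

	if (port < 0 || port > 255)
		return -1;
	memcpy(mac, base, MAC_LEN);
	mac[5] = (uint8_t)port;
	return 0;
}

static const struct nic_ops legacy_backend = {"legacy", legacy_get_mac};
static const struct nic_ops dpdk_backend = {"dpdk", dpdk_get_mac};

/* Client code only ever calls this; the backend is chosen once at startup. */
static int nic_get_mac(const struct nic_ops *ops, int port,
		       uint8_t mac[MAC_LEN])
{
	if (ops == NULL || ops->get_mac == NULL || mac == NULL)
		return -1;
	return ops->get_mac(port, mac);
}
```

The sketch only shows the indirection being debated; it does not settle whether such a table belongs in DPDK or in the application.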

[dpdk-dev] The use of --log-level and its default state

2015-06-05 Thread Thomas Monjalon

Keith, your mail is very long but it's maybe on purpose to show that
there are too many logs ;)

2015-06-05 12:32, Wiles, Keith:
> On 6/5/15, 5:00 AM, "Thomas Monjalon"  wrote:
> >2015-05-27 15:10, Wiles, Keith:
> >> I would like to have the log-level default changed to not log
> >>everything,
> >> but the user needs to enable the log messages if he needs to see more
> >> information. Normally applications or systems are not so verbose, but if
> >> needed the user enables the verbose or debug messages.
> >> 
> >> Can we change the default logs and messages to be non-verbose instead?
> >
> >Do you mean changing this line?
> >/* default value from build option */
> >internal_cfg->log_level = RTE_LOG_LEVEL;
> >It means using the most verbose level available in the build.
> >
> >Maybe we should set RTE_LOG_NOTICE or RTE_LOG_WARNING,
> >However, there is already --log-level for the user and rte_set_log_level()
> >for the application developer.
> >So this default log level is only used for DPDK trials and development.
> >Probably that being verbose is a good option for such cases?
> >
> The normal operation for most systems is no-news-is-good-news, meaning
> only report warnings and errors; if someone wants informational output it
> should be enabled by the user. It seems PMDs and other parts of DPDK print
> out information which is not a warning or error, but debug or
> informational messages that are not very useful. The debug information is
> more for the developer than a user, or even a developer of DPDK, as the
> debug information has nothing to do with the current developer's goals.

This assumption is not fully true.
In normal applications DPDK doesn't show so many logs.
But in testpmd or examples, the log level is not set (by default) and they
behave as debug applications, which is probably a good default.
However, as you say below, the log level of some messages is not well tuned.
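
The interaction between a build-time maximum, a run-time default and a user override can be sketched with a small standalone model. This is not the EAL code: the `SK_LOG_*` names, `BUILD_LOG_LEVEL`, `set_log_level()` and `log_msg()` are hypothetical, and the numeric values are just an assumed ordering in which a lower number is more severe.

```c
#include <stdarg.h>
#include <stdio.h>

/* Illustrative levels; a lower value means a more severe message. */
enum sk_log_level {
	SK_LOG_ERR = 4,
	SK_LOG_WARNING = 5,
	SK_LOG_NOTICE = 6,
	SK_LOG_INFO = 7,
	SK_LOG_DEBUG = 8
};

/* Build-time maximum, standing in for a build configuration option. */
#define BUILD_LOG_LEVEL SK_LOG_DEBUG

/* Run-time level; defaults to the most verbose level in the build. */
static enum sk_log_level current_level = BUILD_LOG_LEVEL;

/* What an application (or a command-line option) would call. */
static void set_log_level(enum sk_log_level level)
{
	current_level = level;
}

/* Returns 1 if the message was emitted, 0 if it was filtered out. */
static int log_msg(enum sk_log_level level, const char *fmt, ...)
{
	va_list ap;

	if (level > BUILD_LOG_LEVEL || level > current_level)
		return 0;
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	return 1;
}
```

With this model, the default behaves as a debug application, and a single call such as `set_log_level(SK_LOG_WARNING)` gives the no-news-is-good-news behaviour asked for in the thread.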

> We can change the RTE_LOG_LEVEL to a value that only prints warnings and
> errors, which should always be printed. We can leave it or we can make that one
> change to reduce the amount of clutter on the screen. Some of the PMD
> information is printed out anytime the state changes, which affects the
> application screen output.
> 
> When I have to interact with users of DPDK they sometimes miss critical
> details in the output because of the sheer amount of output on the screen.
> Even most OSes try to output information to the screen in a sane way to
> allow someone to quickly spot a problem.
> 
> Here is a normal output with log level at the default:
> 
[...] 
> When I quit the application I get the four lines above. Are they really
> useful? Not really, IMO. The PMD information and the EAL information are
> not very useful 99% of the time.

Yes it is a debug mode.

> Next is --log-level=0 on the command line.
> 
> EAL: Detected lcore 0 as core 0 on socket 0
> EAL: Detected lcore 1 as core 1 on socket 0
> EAL: Detected lcore 2 as core 2 on socket 0
> EAL: Detected lcore 3 as core 3 on socket 0
> EAL: Detected lcore 4 as core 4 on socket 0
> EAL: Detected lcore 5 as core 8 on socket 0
> EAL: Detected lcore 6 as core 9 on socket 0
> EAL: Detected lcore 7 as core 10 on socket 0
> EAL: Detected lcore 8 as core 11 on socket 0
> EAL: Detected lcore 9 as core 16 on socket 0
> EAL: Detected lcore 10 as core 17 on socket 0
> EAL: Detected lcore 11 as core 18 on socket 0
> EAL: Detected lcore 12 as core 19 on socket 0
> EAL: Detected lcore 13 as core 20 on socket 0
> EAL: Detected lcore 14 as core 24 on socket 0
> EAL: Detected lcore 15 as core 25 on socket 0
> EAL: Detected lcore 16 as core 26 on socket 0
> EAL: Detected lcore 17 as core 27 on socket 0
> EAL: Detected lcore 18 as core 0 on socket 1
> EAL: Detected lcore 19 as core 1 on socket 1
> EAL: Detected lcore 20 as core 2 on socket 1
> EAL: Detected lcore 21 as core 3 on socket 1
> EAL: Detected lcore 22 as core 4 on socket 1
> EAL: Detected lcore 23 as core 8 on socket 1
> EAL: Detected lcore 24 as core 9 on socket 1
> EAL: Detected lcore 25 as core 10 on socket 1
> EAL: Detected lcore 26 as core 11 on socket 1
> EAL: Detected lcore 27 as core 16 on socket 1
> EAL: Detected lcore 28 as core 17 on socket 1
> EAL: Detected lcore 29 as core 18 on socket 1
> EAL: Detected lcore 30 as core 19 on socket 1
> EAL: Detected lcore 31 as core 20 on socket 1
> EAL: Detected lcore 32 as core 24 on socket 1
> EAL: Detected lcore 33 as core 25 on socket 1
> EAL: Detected lcore 34 as core 26 on socket 1
> EAL: Detected lcore 35 as core 27 on socket 1
> EAL: Detected lcore 36 as core 0 on socket 0
> EAL: Detected lcore 37 as core 1 on socket 0
> EAL: Detected lcore 38 as core 2 on socket 0
> EAL: Detected lcore 39 as core 3 on socket 0
> EAL: Detected lcore 40 as core 4 on socket 0
> EAL: Detected lcore 41 as core 8 on socket 0
> EAL: Detected lcore 42 as core 9 on socket 0
> EAL: Detected lcore 43 as core 10 on socket 0
> EAL: Detected lcore 44 as core 11 on socket 0
> EAL: Detected lcore 45 as core

[dpdk-dev] [PATCH 2/2] ethtool: add new library to provide ethtool-alike APIs

2015-06-05 Thread Thomas Monjalon

2015-06-05 11:25, Wang, Liang-min:
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > Stephen and me say the same thing about using the ethdev API.
> > We don't understand why using a fake ethtool lib would be easier.
> > Though you are saying it "facilitated [your] adoption to dpdk".
> > Please could you explain why using an ethtool-like API is easier than using
> > the existing ethdev API?
> > In any case, you have to develop a specific backend for DPDK (rte_ethtool
> > would be also DPDK-specific).
> 
> As described earlier in this patch comment reply, there are other ethtool ops
> that have been implemented.
> Those ops include set/get eeprom, set/get pauseparam, set/get ringparam,
> which are not available in the existing ethdev library.

1/ We cannot really consider code which is not public
2/ You may extend ethdev if some functions are missing

> For this release, we focus on releasing some basic functions (btw, 
> mac_addr_set is not available but is covered by this patch).

Yes, you are extending ethdev by adding rte_eth_dev_default_mac_addr_set.

> The key reason that this set of library is not released as part of ethdev is 
> the ethtool API dependency on kernel include file.

It is a good reason to separate the library.
But it doesn't justify its need.

> To faithfully carry the ethtool ops and net dev ops API parameters, the 
> ethtool APIs are designed to follow the original definition except avoiding 
> carry kernel states.
> With that, to support ethtool APIs faithfully, we need to include 
> . 
> As suggested by many DPDK veterans including Thomas (indicated over your 
> reply), you would prefer these APIs in a separate library.

I think I'm starting to understand that you really need ethtool conversion
(implemented in rte_ethtool_get_drvinfo) but not the other functions which
are simple wrappers. Right?

[dpdk-dev] [PATCH v1] abi: announce abi changes plan for interrupt mode

2015-06-05 Thread Cunming Liang

This patch announces the planned ABI changes for interrupt mode in v2.2.
The feature will be turned off by default to avoid breaking the v2.1 ABI.

Signed-off-by: Cunming Liang 
---
 doc/guides/rel_notes/abi.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/guides/rel_notes/abi.rst b/doc/guides/rel_notes/abi.rst
index f00a6ee..4c9bf85 100644
--- a/doc/guides/rel_notes/abi.rst
+++ b/doc/guides/rel_notes/abi.rst
@@ -38,3 +38,4 @@ Examples of Deprecation Notices

 Deprecation Notices
 ---
+* The ABI changes are planned for struct rte_intr_handle and struct rte_eth_conf in order to support interrupt mode feature. The upcoming release 2.1 will not contain these ABI changes by default, but release 2.2 will, and no backwards compatibility is planned due to the additional interrupt mode feature enabling. Binaries using this library built prior to version 2.2 will require updating and recompilation.
-- 
1.8.1.4

[dpdk-dev] [PATCH 6/6] MAINTAINERS: claim responsability for hash library

2015-06-05 Thread Pablo de Lara

Signed-off-by: Pablo de Lara 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 9362c19..189c41c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -339,6 +339,7 @@ F: doc/guides/sample_app_ug/l3_forward_access_ctrl.rst

 Hashes
 M: Bruce Richardson 
+M: Pablo de Lara 
 F: lib/librte_hash/
 F: doc/guides/prog_guide/hash_lib.rst
 F: app/test/test_hash*
-- 
2.4.2

[dpdk-dev] [PATCH 5/6] hash: add new functionality to store data in hash table

2015-06-05 Thread Pablo de Lara

Hash tables usually store not only keys, but also data associated
with them. In order to maintain the existing API,
the key index will still be returned when
looking up/deleting an entry, but the user will be able
to store/look up data associated with a key.
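
One way to picture the change is a key store whose slots hold a data word followed by the key bytes, so the slot stride becomes the header size plus the key length. The sketch below is an illustration under assumptions, not the patch's code: `struct key_slot`, `struct key_store` and the helper names are hypothetical, and the stride is rounded up to keep the data word aligned, which the patch may handle differently.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical slot layout: user data followed by the key bytes. */
struct key_slot {
	uintptr_t data;
	char key[];          /* flexible array member, key_len bytes */
};

struct key_store {
	char *mem;
	size_t key_len;
	size_t entry_size;   /* header + key, rounded up for alignment */
	size_t entries;
};

static int key_store_init(struct key_store *s, size_t entries, size_t key_len)
{
	const size_t align = sizeof(uintptr_t);

	s->key_len = key_len;
	s->entry_size = (sizeof(struct key_slot) + key_len + align - 1)
			/ align * align;
	s->entries = entries;
	s->mem = calloc(entries, s->entry_size);
	return s->mem == NULL ? -1 : 0;
}

/* Slots are addressed by index, exactly like a plain key array. */
static struct key_slot *slot_at(const struct key_store *s, size_t idx)
{
	return (struct key_slot *)(s->mem + idx * s->entry_size);
}

static int key_store_set(struct key_store *s, size_t idx,
			 const void *key, uintptr_t data)
{
	struct key_slot *k;

	if (idx >= s->entries)
		return -1;
	k = slot_at(s, idx);
	memcpy(k->key, key, s->key_len);
	k->data = data;
	return 0;
}

/* Fetch the data only if the stored key matches the requested one. */
static int key_store_get(const struct key_store *s, size_t idx,
			 const void *key, uintptr_t *data)
{
	const struct key_slot *k;

	if (idx >= s->entries)
		return -1;
	k = slot_at(s, idx);
	if (memcmp(k->key, key, s->key_len) != 0)
		return -1;
	*data = k->data;
	return 0;
}
```

Because the key index is still the handle, existing callers that only use the index are unaffected; the data word is simply an extra field reachable through the same index.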

Signed-off-by: Pablo de Lara 
---
 lib/librte_hash/rte_hash.c   | 284 ---
 lib/librte_hash/rte_hash.h   | 180 ++
 lib/librte_hash/rte_hash_version.map |   8 +
 3 files changed, 385 insertions(+), 87 deletions(-)

diff --git a/lib/librte_hash/rte_hash.c b/lib/librte_hash/rte_hash.c
index 0b7f543..c1be7b0 100644
--- a/lib/librte_hash/rte_hash.c
+++ b/lib/librte_hash/rte_hash.c
@@ -177,7 +177,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
/* Total memory required for hash context */
mem_size = hash_struct_size + tbl_size;

-   key_entry_size = params->key_len;
+   key_entry_size = sizeof(struct rte_hash_key) + params->key_len;

rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);

@@ -335,7 +335,7 @@ run_cuckoo(const struct rte_hash *h, struct rte_hash_bucket *bkt, uint32_t key_i
/* idx = 0 if primary, 1 if secondary */
unsigned idx;
static unsigned number_pushes;
-   void *k, *keys = h->key_store;
+   struct rte_hash_key *k, *keys = h->key_store;
unsigned i, j;

uint64_t hash_stored;
@@ -358,8 +358,8 @@ run_cuckoo(const struct rte_hash *h, struct rte_hash_bucket *bkt, uint32_t key_i
 * is very likely that it has entered in a loop, need rehasing
 */
if (++number_pushes > 1 && hash == original_hash) {
-   k = (char *)keys + key_idx * h->key_entry_size;
-   if (!memcmp(k, original_key, h->key_len)) {
+   k = (struct rte_hash_key *) ((char *)keys + key_idx * h->key_entry_size);
+   if (!memcmp(k->key, original_key, h->key_len)) {
rte_ring_sp_enqueue(h->free_slots, (void *)((uintptr_t)key_idx));
number_pushes = 0;
/*
@@ -381,10 +381,11 @@ run_cuckoo(const struct rte_hash *h, struct 
rte_hash_bucket *bkt, uint32_t key_i
 */
idx = !(bkt->signatures[i] & (h->sig_secondary));
key_idx_stored = bkt->key_idx[i];
-   k = (char *)keys + key_idx_stored * h->key_entry_size;
+   k = (struct rte_hash_key *) ((char *)keys +
+   key_idx_stored * 
h->key_entry_size);

if (idx == 0)
-   hash_stored = rte_hash_hash(h, k);
+   hash_stored = rte_hash_hash(h, k->key);
else
hash_stored = 
rte_hash_secondary_hash(bkt->signatures[i]);

@@ -430,7 +431,7 @@ rte_hash_rehash(struct rte_hash *h, rte_hash_function 
hash_func,
uint32_t tbl_size, mem_size, bucket_size, hash_struct_size;
uint64_t hash;
struct rte_hash_bucket *bkt;
-   void *k;
+   struct rte_hash_key *k;
struct rte_hash *sec_h;
uint32_t bucket_idx;
int32_t ret;
@@ -458,16 +459,16 @@ rte_hash_rehash(struct rte_hash *h, rte_hash_function 
hash_func,
for (j = 0; j < RTE_HASH_BUCKET_ENTRIES; j++) {
/* Check if entry in bucket is not empty */
if (h->buckets[i].signatures[j] != NULL_SIGNATURE) {
-   k = (char *)h->key_store +
-   h->buckets[i].key_idx[j] * h->key_entry_size;
+   k = (struct rte_hash_key *) ((char *)h->key_store +
+   h->buckets[i].key_idx[j] * h->key_entry_size);
/* Get new hash (with new initial value) */
-   hash = rte_hash_hash(sec_h, k);
+   hash = rte_hash_hash(sec_h, k->key);
bucket_idx = hash & sec_h->bucket_bitmask;
hash |= sec_h->sig_msb;
bkt = &sec_h->buckets[bucket_idx];
/* Add entry on secondary hash table */
ret = run_cuckoo(sec_h, bkt, h->buckets[i].key_idx[j],
-   hash, hash, k);
+   hash, hash, k->key);
if (ret == -EAGAIN)
goto exit;
num_entries++;
@@ -489,12 +490,12 @@ exit:

 static inline int32_t
 __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
-   hash_sig_t sig)
+   hash_sig_t sig, uintptr_t data)
 {
uint64_t hash0, bucket_idx0, hash1, bucket_idx1;
unsigned i;
struct rte_hash_bucket

[dpdk-dev] [PATCH 4/6] hash: add new functions rte_hash_rehash and rte_hash_reset

2015-06-05 Thread Pablo de Lara

Added rehash function to be able to keep adding more entries,
when a key cannot be added, as a result of a loop
evicting entries infinitely.

Also added reset function to be able to empty the table,
without having to destroy and create it again.
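
The reset step described above (clear the table, drain the free-slot ring, then refill it with every index except the reserved zero) can be sketched with a plain array-backed queue. This is a simplified model under assumptions: `struct slot_queue`, `struct table` and `table_reset()` are hypothetical stand-ins, and a real implementation would use the library's ring rather than this queue.

```c
#include <stdint.h>
#include <string.h>

#define ENTRIES 8   /* illustrative table size */

/* Minimal FIFO of free key-slot indexes. */
struct slot_queue {
	uint32_t buf[ENTRIES];
	unsigned head;    /* position of the next index to dequeue */
	unsigned count;   /* number of queued indexes */
};

static int queue_put(struct slot_queue *q, uint32_t v)
{
	if (q->count == ENTRIES)
		return -1;
	q->buf[(q->head + q->count) % ENTRIES] = v;
	q->count++;
	return 0;
}

static int queue_get(struct slot_queue *q, uint32_t *v)
{
	if (q->count == 0)
		return -1;
	*v = q->buf[q->head];
	q->head = (q->head + 1) % ENTRIES;
	q->count--;
	return 0;
}

struct table {
	uint32_t signatures[ENTRIES];   /* 0 means the entry is empty */
	struct slot_queue free_slots;
};

static void table_reset(struct table *t)
{
	uint32_t unused;
	uint32_t i;

	/* Forget every stored entry. */
	memset(t->signatures, 0, sizeof(t->signatures));

	/* Drain what is still queued so that no index is queued twice. */
	while (queue_get(&t->free_slots, &unused) == 0)
		;

	/* Index 0 is reserved to signal a miss, so hand out 1..ENTRIES. */
	for (i = 1; i < ENTRIES + 1; i++)
		queue_put(&t->free_slots, i);
}
```

Draining before refilling is the important part: refilling a partly full queue would either overflow it or leave duplicate indexes behind.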

Signed-off-by: Pablo de Lara 
---
 lib/librte_hash/rte_hash.c   | 91 
 lib/librte_hash/rte_hash.h   | 26 +++
 lib/librte_hash/rte_hash_version.map |  2 +
 3 files changed, 119 insertions(+)

diff --git a/lib/librte_hash/rte_hash.c b/lib/librte_hash/rte_hash.c
index 9599413..0b7f543 100644
--- a/lib/librte_hash/rte_hash.c
+++ b/lib/librte_hash/rte_hash.c
@@ -301,6 +301,33 @@ rte_hash_free(struct rte_hash *h)
rte_free(te);
 }

+void
+rte_hash_reset(struct rte_hash *h)
+{
+   void *ptr;
+   unsigned i;
+   uint32_t num_buckets, bucket_size, tbl_size;
+
+   if (h == NULL)
+   return;
+
+   num_buckets = h->num_buckets;
+   bucket_size = align_size(sizeof(struct rte_hash_bucket), 
BUCKET_ALIGNMENT);
+   tbl_size = align_size(num_buckets * bucket_size,
+ RTE_CACHE_LINE_SIZE);
+
+   memset(h->buckets, 0, tbl_size);
+   memset(h->key_store, 0, h->key_entry_size * h->entries);
+
+   /* clear the free ring */
+   while (rte_ring_dequeue(h->free_slots, &ptr) == 0)
+   rte_pause();
+
+   /* Repopulate the free slots ring. Entry zero is reserved for key misses */
+   for (i = 1; i < h->entries + 1; i++)
+   rte_ring_sp_enqueue(h->free_slots, (void *)((uintptr_t) i));
+}
+
 static inline int32_t
 run_cuckoo(const struct rte_hash *h, struct rte_hash_bucket *bkt, uint32_t 
key_idx,
uint64_t hash, uint64_t original_hash, const void *original_key)
@@ -396,6 +423,70 @@ run_cuckoo(const struct rte_hash *h, struct 
rte_hash_bucket *bkt, uint32_t key_i
original_hash, original_key);
 }

+int
+rte_hash_rehash(struct rte_hash *h, rte_hash_function hash_func,
+   uint32_t hash_func_init_val)
+{
+   uint32_t tbl_size, mem_size, bucket_size, hash_struct_size;
+   uint64_t hash;
+   struct rte_hash_bucket *bkt;
+   void *k;
+   struct rte_hash *sec_h;
+   uint32_t bucket_idx;
+   int32_t ret;
+   unsigned i, j;
+   unsigned num_entries = 0;
+
+   /* Create new table to reorganize the entries */
+   hash_struct_size = align_size(sizeof(struct rte_hash), 
RTE_CACHE_LINE_SIZE);
+   bucket_size = align_size(sizeof(struct rte_hash_bucket), 
BUCKET_ALIGNMENT);
+   tbl_size = align_size(h->num_buckets * bucket_size, 
RTE_CACHE_LINE_SIZE);
+   mem_size = hash_struct_size + tbl_size;
+
+   sec_h = (struct rte_hash *) rte_zmalloc_socket(NULL, mem_size,
+   RTE_CACHE_LINE_SIZE, h->socket_id);
+
+   memcpy(sec_h, h, hash_struct_size);
+   sec_h->buckets = (struct rte_hash_bucket *)((uint8_t *)sec_h + 
hash_struct_size);
+
+   /* Updates the primary hash function and/or its initial value to rehash 
*/
+   sec_h->hash_func_init_val = hash_func_init_val;
+   if (hash_func != NULL)
+   sec_h->hash_func = hash_func;
+
+   for (i = 0; i < h->num_buckets; i++) {
+   for (j = 0; j < RTE_HASH_BUCKET_ENTRIES; j++) {
+   /* Check if entry in bucket is not empty */
+   if (h->buckets[i].signatures[j] != NULL_SIGNATURE) {
+   k = (char *)h->key_store +
+   h->buckets[i].key_idx[j] * 
h->key_entry_size;
+   /* Get new hash (with new initial value) */
+   hash = rte_hash_hash(sec_h, k);
+   bucket_idx = hash & sec_h->bucket_bitmask;
+   hash |= sec_h->sig_msb;
+   bkt = &sec_h->buckets[bucket_idx];
+   /* Add entry on secondary hash table */
+   ret = run_cuckoo(sec_h, bkt, 
h->buckets[i].key_idx[j],
+   hash, hash, k);
+   if (ret == -EAGAIN)
+   goto exit;
+   num_entries++;
+   }
+   }
+   }
+
+   /* Replace old table with the new table */
+   h->hash_func_init_val = hash_func_init_val;
+   if (hash_func != NULL)
+   h->hash_func = hash_func;
+   memcpy(h->buckets, sec_h->buckets, tbl_size);
+   ret = 0;
+
+exit:
+   rte_free(sec_h);
+   return ret;
+}
+
 static inline int32_t
 __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
hash_sig_t sig)
diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
index b364a43..c92d935 100644
---

[dpdk-dev] [PATCH 3/6] hash: add new lookup_bulk_with_hash function

2015-06-05 Thread Pablo de Lara

Previous implementation was lacking a function
to look up a burst of entries, given precalculated hash values.
This patch implements such function, quite useful for
looking up keys from packets that have precalculated hash values
from a 5-tuple key.
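
The core of a bulk lookup with precalculated hashes is a bit mask with one bit per key in the burst, consumed lowest bit first. A minimal sketch of that bookkeeping follows; it is not the patch's pipeline. The helper names and `BULK_MAX` are hypothetical, `__builtin_ctzll` is a GCC/Clang builtin, and the sketch checks for an empty mask before counting trailing zeros and builds the mask with a 64-bit constant.

```c
#include <stdint.h>

#define BULK_MAX 64   /* illustrative maximum burst size */

/* Build a mask with the low num_keys bits set (0 <= num_keys <= 64). */
static uint64_t make_lookup_mask(unsigned num_keys)
{
	if (num_keys >= BULK_MAX)
		return UINT64_MAX;
	return (UINT64_C(1) << num_keys) - 1;
}

/* Pop the lowest pending key index, or return -1 when none is left. */
static int next_key_index(uint64_t *mask)
{
	int idx;

	if (*mask == 0)
		return -1;
	idx = __builtin_ctzll(*mask);
	*mask &= *mask - 1;   /* clear the bit just consumed */
	return idx;
}

/* Walk the burst in index order, reading the caller-supplied hashes. */
static unsigned visit_hashes(const uint32_t *hash_vals, unsigned num_keys,
			     uint32_t *out)
{
	uint64_t mask = make_lookup_mask(num_keys);
	unsigned n = 0;
	int idx;

	while ((idx = next_key_index(&mask)) >= 0)
		out[n++] = hash_vals[idx];
	return n;
}
```

The point of the precalculated variant is that the per-key step reads `hash_vals[idx]` instead of hashing the key again.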

Added the function in the hash unit test as well.

Signed-off-by: Pablo de Lara 
---
 app/test/test_hash.c |  19 ++-
 lib/librte_hash/rte_hash.c   | 226 ++-
 lib/librte_hash/rte_hash.h   |  26 
 lib/librte_hash/rte_hash_version.map |   8 ++
 4 files changed, 275 insertions(+), 4 deletions(-)

diff --git a/app/test/test_hash.c b/app/test/test_hash.c
index 4ef99ee..5d22cb9 100644
--- a/app/test/test_hash.c
+++ b/app/test/test_hash.c
@@ -456,6 +456,7 @@ static int test_five_keys(void)
 {
struct rte_hash *handle;
const void *key_array[5] = {0};
+   hash_sig_t hashes[5];
int pos[5];
int expected_pos[5];
unsigned i;
@@ -475,12 +476,24 @@ static int test_five_keys(void)
}

/* Lookup */
-   for(i = 0; i < 5; i++)
+   for (i = 0; i < 5; i++) {
key_array[i] = &keys[i];
+   hashes[i] = rte_hash_hash(handle, &keys[i]);
+   }

ret = rte_hash_lookup_multi(handle, &key_array[0], 5, (int32_t *)pos);
-   if(ret == 0)
-   for(i = 0; i < 5; i++) {
+   if (ret == 0)
+   for (i = 0; i < 5; i++) {
+   print_key_info("Lkp", key_array[i], pos[i]);
+   RETURN_IF_ERROR(pos[i] != expected_pos[i],
+   "failed to find key (pos[%u]=%d)", i, 
pos[i]);
+   }
+
+   /* Lookup with precalculated hashes */
+   ret = rte_hash_lookup_multi_with_hash(handle, &key_array[0], hashes,
+   5, (int32_t *)pos);
+   if (ret == 0)
+   for (i = 0; i < 5; i++) {
print_key_info("Lkp", key_array[i], pos[i]);
RETURN_IF_ERROR(pos[i] != expected_pos[i],
"failed to find key (pos[%u]=%d)", i, 
pos[i]);
diff --git a/lib/librte_hash/rte_hash.c b/lib/librte_hash/rte_hash.c
index cbfe17e..9599413 100644
--- a/lib/librte_hash/rte_hash.c
+++ b/lib/librte_hash/rte_hash.c
@@ -615,6 +615,25 @@ lookup_stage0(unsigned *idx, uint64_t *lookup_mask,
*lookup_mask &= ~(1llu << *idx);
 }

+/* Lookup bulk stage 0: Get primary hash value and calculate secondary hash 
value */
+static inline void
+lookup_stage0_with_hash(unsigned *idx, uint64_t *lookup_mask,
+   uint64_t *primary_hash, uint64_t *secondary_hash,
+   const hash_sig_t *hash_vals, const struct rte_hash *h)
+{
+   *idx = __builtin_ctzl(*lookup_mask);
+   if (*lookup_mask == 0)
+   *idx = 0;
+
+   *primary_hash = hash_vals[*idx];
+   *secondary_hash = rte_hash_secondary_hash(*primary_hash);
+
+   *primary_hash |= h->sig_msb;
+
+   *secondary_hash |= h->sig_msb;
+   *secondary_hash |= h->sig_secondary;
+   *lookup_mask &= ~(1llu << *idx);
+}

 /* Lookup bulk stage 1: Prefetch primary/secondary buckets */
 static inline void
@@ -631,7 +650,7 @@ lookup_stage1(uint64_t primary_hash, uint64_t 
secondary_hash,
 }

 /*
- * Lookup bulk stage 2:  Search for match hashes in primary/secondary locations
+ * Lookup bulk stage 2: Search for match hashes in primary/secondary locations
  * and prefetch first key slot
  */
 static inline void
@@ -880,6 +899,198 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const 
void **keys,
return 0;
 }

+static inline int
+__rte_hash_lookup_bulk_with_hash(const struct rte_hash *h, const void **keys,
+   const hash_sig_t *hash_vals, uint32_t num_keys,
+   int32_t *positions)
+{
+   uint64_t hits = 0;
+   uint64_t next_mask = 0;
+   uint64_t extra_hits_mask = 0;
+   uint64_t lookup_mask;
+   unsigned idx;
+   const void *key_store = h->key_store;
+
+   unsigned idx00, idx01, idx10, idx11, idx20, idx21, idx30, idx31;
+   const struct rte_hash_bucket *primary_bkt10, *primary_bkt11;
+   const struct rte_hash_bucket *secondary_bkt10, *secondary_bkt11;
+   const struct rte_hash_bucket *primary_bkt20, *primary_bkt21;
+   const struct rte_hash_bucket *secondary_bkt20, *secondary_bkt21;
+   const void *k_slot20, *k_slot21, *k_slot30, *k_slot31;
+   uint64_t primary_hash00, primary_hash01;
+   uint64_t secondary_hash00, secondary_hash01;
+   uint64_t primary_hash10, primary_hash11;
+   uint64_t secondary_hash10, secondary_hash11;
+   uint64_t primary_hash20, primary_hash21;
+   uint64_t secondary_hash20, secondary_hash21;
+
+   if (num_keys == RTE_HASH_LOOKUP_BULK_MAX)
+   lookup_mask = 0x;
+   else
+   lookup_mask = (1 << num_keys) - 1;
+
+   lookup_stage0_with_hash(&idx00, &lookup_mask, &primary_hash00,
+

[dpdk-dev] [PATCH 2/6] hash: replace existing hash library with cuckoo hash implementation

2015-06-05 Thread Pablo de Lara

This patch replaces the existing hash library with another approach,
using the Cuckoo Hash method to resolve collisions (open addressing),
which pushes items from a full bucket when a new entry tries
to be added in it, storing the evicted entry in an alternative location,
using a secondary hash function.

This gives the user the ability to store more entries when a bucket
is full, in comparison with the previous implementation.
Therefore, the unit test has been updated, as some scenarios have changed
(such as the restriction that has now been removed).

Also note that the API has not been changed, although new fields
have been added in the rte_hash structure.
The main change when creating a new table is that the number of entries
per bucket is fixed now, so its parameter is ignored now
(still there to maintain the same parameters structure).

As a last note, the maximum burst size in lookup_burst function
has been increased to 64, to improve performance.

Signed-off-by: Pablo de Lara 
---
 app/test/test_hash.c   |  86 +
 lib/librte_hash/rte_hash.c | 797 ++---
 lib/librte_hash/rte_hash.h | 157 +
 3 files changed, 721 insertions(+), 319 deletions(-)

diff --git a/app/test/test_hash.c b/app/test/test_hash.c
index 1da27c5..4ef99ee 100644
--- a/app/test/test_hash.c
+++ b/app/test/test_hash.c
@@ -169,7 +169,6 @@ static struct flow_key keys[5] = { {
 /* Parameters used for hash table in unit test functions. Name set later. */
 static struct rte_hash_parameters ut_params = {
.entries = 64,
-   .bucket_entries = 4,
.key_len = sizeof(struct flow_key), /* 13 */
.hash_func = rte_jhash,
.hash_func_init_val = 0,
@@ -527,21 +526,18 @@ static int test_five_keys(void)
 /*
  * Add keys to the same bucket until bucket full.
  * - add 5 keys to the same bucket (hash created with 4 keys per bucket):
- *   first 4 successful, 5th unsuccessful
- * - lookup the 5 keys: 4 hits, 1 miss
- * - add the 5 keys again: 4 OK, one error as bucket is full
- * - lookup the 5 keys: 4 hits (updated data), 1 miss
- * - delete the 5 keys: 5 OK (even if the 5th is not in the table)
+ *   first 4 successful, 5th successful, pushing existing item in bucket
+ * - lookup the 5 keys: 5 hits
+ * - add the 5 keys again: 5 OK
+ * - lookup the 5 keys: 5 hits (updated data)
+ * - delete the 5 keys: 5 OK
  * - lookup the 5 keys: 5 misses
- * - add the 5th key: OK
- * - lookup the 5th key: hit
  */
 static int test_full_bucket(void)
 {
struct rte_hash_parameters params_pseudo_hash = {
.name = "test4",
.entries = 64,
-   .bucket_entries = 4,
.key_len = sizeof(struct flow_key), /* 13 */
.hash_func = pseudo_hash,
.hash_func_init_val = 0,
@@ -555,7 +551,7 @@ static int test_full_bucket(void)
handle = rte_hash_create(&params_pseudo_hash);
RETURN_IF_ERROR(handle == NULL, "hash creation failed");

-   /* Fill bucket*/
+   /* Fill bucket */
for (i = 0; i < 4; i++) {
pos[i] = rte_hash_add_key(handle, &keys[i]);
print_key_info("Add", &keys[i], pos[i]);
@@ -563,47 +559,36 @@ static int test_full_bucket(void)
"failed to add key (pos[%u]=%d)", i, pos[i]);
expected_pos[i] = pos[i];
}
-   /* This shouldn't work because the bucket is full */
+   /* This should work and will push one of the items in the bucket because it is full */
pos[4] = rte_hash_add_key(handle, &keys[4]);
print_key_info("Add", &keys[4], pos[4]);
-   RETURN_IF_ERROR(pos[4] != -ENOSPC,
-   "fail: added key to full bucket (pos[4]=%d)", pos[4]);
+   RETURN_IF_ERROR(pos[4] < 0,
+   "failed to add key (pos[4]=%d)", pos[4]);
+   expected_pos[4] = pos[4];

/* Lookup */
-   for (i = 0; i < 4; i++) {
+   for (i = 0; i < 5; i++) {
pos[i] = rte_hash_lookup(handle, &keys[i]);
print_key_info("Lkp", &keys[i], pos[i]);
RETURN_IF_ERROR(pos[i] != expected_pos[i],
"failed to find key (pos[%u]=%d)", i, pos[i]);
}
-   pos[4] = rte_hash_lookup(handle, &keys[4]);
-   print_key_info("Lkp", &keys[4], pos[4]);
-   RETURN_IF_ERROR(pos[4] != -ENOENT,
-   "fail: found non-existent key (pos[4]=%d)", pos[4]);

/* Add - update */
-   for (i = 0; i < 4; i++) {
+   for (i = 0; i < 5; i++) {
pos[i] = rte_hash_add_key(handle, &keys[i]);
print_key_info("Add", &keys[i], pos[i]);
RETURN_IF_ERROR(pos[i] != expected_pos[i],
"failed to add key (pos[%u]=%d)", i, pos[i]);
}
-   pos[4] = rte_hash_add_key(handle, &keys[4]);
-   print_key_info("Add", &keys[4], pos[4]);
-   RETURN_IF_ERROR(pos[4] != -ENOSPC,
-   "fail: added key to full bucket

[dpdk-dev] [PATCH 1/6] eal: add const in prefetch functions

2015-06-05 Thread Pablo de Lara

The rte_prefetchX functions took volatile void *p as a parameter,
but they do not modify the pointed-to data, so the parameter should be const-qualified.
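
The practical effect is that callers holding a pointer to const data can prefetch without casting the qualifier away. A minimal sketch, assuming a GCC/Clang toolchain: `prefetch0()` and `sum_table()` are hypothetical names, and the helper uses `__builtin_prefetch` in place of the per-architecture inline assembly from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a prefetch helper taking a const pointer. */
static inline void prefetch0(const volatile void *p)
{
	/* Read prefetch with high temporal locality (GCC/Clang builtin). */
	__builtin_prefetch((const void *)p, 0, 3);
}

/* Sum a read-only table, prefetching a few elements ahead. */
static uint32_t sum_table(const uint32_t *tbl, size_t n)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (i + 8 < n)
			prefetch0(&tbl[i + 8]);   /* no cast needed for const data */
		sum += tbl[i];
	}
	return sum;
}
```

With a non-const parameter, the `prefetch0(&tbl[i + 8])` call would need a cast or would draw a discarded-qualifier warning.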

Signed-off-by: Pablo de Lara 
---
 lib/librte_eal/common/include/arch/ppc_64/rte_prefetch.h |  6 +++---
 lib/librte_eal/common/include/arch/x86/rte_prefetch.h| 12 ++--
 lib/librte_eal/common/include/generic/rte_prefetch.h |  6 +++---
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/lib/librte_eal/common/include/arch/ppc_64/rte_prefetch.h b/lib/librte_eal/common/include/arch/ppc_64/rte_prefetch.h
index 9df0d13..fea3be1 100644
--- a/lib/librte_eal/common/include/arch/ppc_64/rte_prefetch.h
+++ b/lib/librte_eal/common/include/arch/ppc_64/rte_prefetch.h
@@ -39,17 +39,17 @@ extern "C" {

 #include "generic/rte_prefetch.h"

-static inline void rte_prefetch0(volatile void *p)
+static inline void rte_prefetch0(const volatile void *p)
 {
asm volatile ("dcbt 0,%[p],1" : : [p] "r" (p));
 }

-static inline void rte_prefetch1(volatile void *p)
+static inline void rte_prefetch1(const volatile void *p)
 {
asm volatile ("dcbt 0,%[p],1" : : [p] "r" (p));
 }

-static inline void rte_prefetch2(volatile void *p)
+static inline void rte_prefetch2(const volatile void *p)
 {
asm volatile ("dcbt 0,%[p],1" : : [p] "r" (p));
 }
diff --git a/lib/librte_eal/common/include/arch/x86/rte_prefetch.h b/lib/librte_eal/common/include/arch/x86/rte_prefetch.h
index ec2454d..688fa5e 100644
--- a/lib/librte_eal/common/include/arch/x86/rte_prefetch.h
+++ b/lib/librte_eal/common/include/arch/x86/rte_prefetch.h
@@ -40,19 +40,19 @@ extern "C" {

 #include "generic/rte_prefetch.h"

-static inline void rte_prefetch0(volatile void *p)
+static inline void rte_prefetch0(const volatile void *p)
 {
-   asm volatile ("prefetcht0 %[p]" : [p] "+m" (*(volatile char *)p));
+   asm volatile ("prefetcht0 %[p]" : : [p] "m" (*(const volatile char *)p));
 }

-static inline void rte_prefetch1(volatile void *p)
+static inline void rte_prefetch1(const volatile void *p)
 {
-   asm volatile ("prefetcht1 %[p]" : [p] "+m" (*(volatile char *)p));
+   asm volatile ("prefetcht1 %[p]" : : [p] "m" (*(const volatile char 
*)p));
 }

-static inline void rte_prefetch2(volatile void *p)
+static inline void rte_prefetch2(const volatile void *p)
 {
-   asm volatile ("prefetcht2 %[p]" : [p] "+m" (*(volatile char *)p));
+   asm volatile ("prefetcht2 %[p]" : : [p] "m" (*(const volatile char 
*)p));
 }

 #ifdef __cplusplus
diff --git a/lib/librte_eal/common/include/generic/rte_prefetch.h 
b/lib/librte_eal/common/include/generic/rte_prefetch.h
index 217f319..ee4a9ee 100644
--- a/lib/librte_eal/common/include/generic/rte_prefetch.h
+++ b/lib/librte_eal/common/include/generic/rte_prefetch.h
@@ -51,14 +51,14 @@
  * @param p
  *   Address to prefetch
  */
-static inline void rte_prefetch0(volatile void *p);
+static inline void rte_prefetch0(const volatile void *p);

 /**
  * Prefetch a cache line into all cache levels except the 0th cache level.
  * @param p
  *   Address to prefetch
  */
-static inline void rte_prefetch1(volatile void *p);
+static inline void rte_prefetch1(const volatile void *p);

 /**
  * Prefetch a cache line into all cache levels except the 0th and 1th cache
@@ -66,6 +66,6 @@ static inline void rte_prefetch1(volatile void *p);
  * @param p
  *   Address to prefetch
  */
-static inline void rte_prefetch2(volatile void *p);
+static inline void rte_prefetch2(const volatile void *p);

 #endif /* _RTE_PREFETCH_H_ */
-- 
2.4.2
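
To illustrate why the const qualifier matters to callers, here is a minimal
sketch. It assumes a GCC/Clang toolchain and uses the __builtin_prefetch
builtin as a stand-in for the per-architecture asm in the patch; the names
prefetch0_sketch and sum_table are hypothetical and not part of DPDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for rte_prefetch0(): the pointee is never written, so the
 * parameter is declared const (and volatile, as in the patch). */
static inline void prefetch0_sketch(const volatile void *p)
{
    /* Assumption: the compiler builtin replaces the arch-specific asm. */
    __builtin_prefetch((const void *)p, 0, 3);
}

/* A caller that only holds a const pointer can prefetch without a cast. */
static uint64_t sum_table(const uint32_t *tbl, size_t n)
{
    uint64_t sum = 0;
    size_t i;

    assert(tbl != NULL);
    for (i = 0; i < n; i++) {
        if (i + 8 < n)
            prefetch0_sketch(&tbl[i + 8]); /* hint a few entries ahead */
        sum += tbl[i];
    }
    return sum;
}
```

With the old prototype (volatile void *p), the call inside sum_table would
discard the const qualifier and draw a compiler warning.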

[dpdk-dev] [PATCH 0/6] Cuckoo hash

2015-06-05 Thread Pablo de Lara

This patchset is to replace the existing hash library with
a more efficient and functional approach, using the Cuckoo hash
method to deal with collisions. This method is based on using
two different hash functions to have two possible locations
in the hash table where an entry can be.
So, if a bucket is full, a new entry can push one of the items
in that bucket to its alternative location, making space for itself.
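
The displacement idea can be sketched as follows. This is not the rte_hash
code from this patchset: it assumes single-entry buckets, 32-bit keys and two
toy hash functions chosen only so the example is easy to trace, and all ck_*
names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CK_SLOTS     16u /* power of two, so a mask can be used */
#define CK_MAX_KICKS 32  /* give up (rehash needed) after this many moves */

struct ck_table {
    uint32_t key[CK_SLOTS];
    bool used[CK_SLOTS];
};

/* Toy hash functions: low nibble and next nibble of the key. */
static uint32_t ck_h1(uint32_t k) { return k & (CK_SLOTS - 1); }
static uint32_t ck_h2(uint32_t k) { return (k >> 4) & (CK_SLOTS - 1); }

static void ck_init(struct ck_table *t)
{
    memset(t, 0, sizeof(*t));
}

/* A key can only live in one of two slots, so lookup is two probes. */
static bool ck_lookup(const struct ck_table *t, uint32_t key)
{
    uint32_t a = ck_h1(key), b = ck_h2(key);

    return (t->used[a] && t->key[a] == key) ||
           (t->used[b] && t->key[b] == key);
}

/* Returns false if the kick limit is hit; the entry left without a slot
 * is then reported through *homeless so the caller can rehash and retry. */
static bool ck_insert(struct ck_table *t, uint32_t key, uint32_t *homeless)
{
    uint32_t slot, alt;
    int kick;

    assert(t != NULL);
    if (ck_lookup(t, key))
        return true;

    slot = ck_h1(key);
    alt = ck_h2(key);
    if (t->used[slot] && !t->used[alt])
        slot = alt; /* primary is full, alternative is free */

    for (kick = 0; kick <= CK_MAX_KICKS; kick++) {
        uint32_t victim;

        if (!t->used[slot]) {
            t->key[slot] = key;
            t->used[slot] = true;
            return true;
        }
        /* Slot is occupied: take it and push the old entry onward. */
        victim = t->key[slot];
        t->key[slot] = key;
        key = victim;
        slot = (slot == ck_h1(key)) ? ck_h2(key) : ck_h1(key);
    }
    if (homeless != NULL)
        *homeless = key;
    return false;
}
```

Inserting 0x21, 0x31 and then 0x13 exercises the displacement path: 0x13
finds both of its slots taken, evicts 0x31, which in turn moves 0x21 to its
alternative slot.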

Advantages
~~~~~~~~~~
- Offers the option to store more entries when the target bucket is full
  (unlike the previous implementation)
- Memory efficient: for storing those entries, it is not necessary to
  request new memory, as the entries will be stored in the same table
- Constant worst-case lookup time: in the worst-case scenario, it always takes
  the same time to look up an entry, as there are only two possible locations
  where an entry can be.
- Storing data: the user can store data in the hash table, unlike in the
  previous implementation, and the old API can still be used

This implementation typically offers over 90% utilization before having
to rehash the table, so it is unlikely that a rehash is necessary,
as long as there is enough free space and the user uses reasonably good
hash functions.

Things left for v2:
- Improve unit tests to show clearer performance numbers
- Documentation changes

Pablo de Lara (6):
  eal: add const in prefetch functions
  hash: replace existing hash library with cuckoo hash implementation
  hash: add new lookup_bulk_with_hash function
  hash: add new functions rte_hash_rehash and rte_hash_reset
  hash: add new functionality to store data in hash table
  MAINTAINERS: claim responsability for hash library

 MAINTAINERS|1 +
 app/test/Makefile  |3 +
 app/test/test_hash.c   |  105 +-
 .../common/include/arch/ppc_64/rte_prefetch.h  |6 +-
 .../common/include/arch/x86/rte_prefetch.h |   12 +-
 .../common/include/generic/rte_prefetch.h  |6 +-
 lib/librte_hash/rte_hash.c | 1226 +---
 lib/librte_hash/rte_hash.h |  373 +-
 lib/librte_hash/rte_hash_version.map   |   18 +
 9 files changed, 1422 insertions(+), 328 deletions(-)

-- 
2.4.2

[dpdk-dev] [PATCH] lib: fix RTE_MBUF_METADATA macros

2015-06-05 Thread Dumitrescu, Cristian



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Daniel Mrzyglod
> Sent: Friday, June 5, 2015 3:55 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] lib: fix RTE_MBUF_METADATA macros
> 
> Fix RTE_MBUF_METADATA macros to allow for unaligned accesses to
> meta-data fields.
> Forcing aligned accesses is not really required, so this is removing an
> unneeded constraint.
> This issue was met during testing of the new version of the ip_pipeline
> application. There is no performance impact.
> This change has no ABI impact, as the previous code that uses aligned
> accesses continues to run without any issues.
> 
> Signed-off-by: Daniel Mrzyglod 


Acked-by: Cristian Dumitrescu
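
For readers wondering what an alignment-agnostic metadata access can look
like, here is a minimal sketch. It is not the macro change from the patch:
it assumes a memcpy-based accessor, and struct meta_buf and the meta_*
helpers are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the metadata area that follows a buffer header. */
struct meta_buf {
    uint8_t meta[64];
};

/* Read a 32-bit field at any byte offset; memcpy avoids the undefined
 * behaviour of dereferencing a misaligned uint32_t pointer. */
static inline uint32_t meta_read_u32(const struct meta_buf *b, size_t off)
{
    uint32_t v;

    assert(off + sizeof(v) <= sizeof(b->meta));
    memcpy(&v, &b->meta[off], sizeof(v));
    return v;
}

/* Write a 32-bit field at any byte offset. */
static inline void meta_write_u32(struct meta_buf *b, size_t off, uint32_t v)
{
    assert(off + sizeof(v) <= sizeof(b->meta));
    memcpy(&b->meta[off], &v, sizeof(v));
}
```

Code that already uses aligned offsets keeps working unchanged with such
accessors, which matches the no-ABI-impact remark in the commit message.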

[dpdk-dev] [PATCH 2/2] ethtool: add new library to provide ethtool-alike APIs

2015-06-05 Thread Wang, Liang-min



> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Friday, June 05, 2015 9:41 AM
> To: Wang, Liang-min
> Cc: Andrew Harvey (agh); Stephen Hemminger; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 2/2] ethtool: add new library to provide
> ethtool-alike APIs
> 
> 2015-06-05 11:25, Wang, Liang-min:
> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > Stephen and me say the same thing about using the ethdev API.
> > > We don't understand why using a fake ethtool lib would be easier.
> > > Though you are saying it "facilitated [your] adoption to dpdk".
> > > Please could you explain why using an ethtool-like API is easier
> > > than using the existing ethdev API?
> > > In any case, you have to develop a specific backend for DPDK
> > > (rte_ethtool would be also DPDK-specific).
> >
> > As described earlier in this patch comment reply, there are other ethtool
> > ops that have been implemented.
> > Those ops include set/get eeprom, set/get pauseparam, set/get
> > ringparam, which are not available in the existing ethdev library.
> 
> 1/ We cannot really consider code which is not public
> 2/ You may extend ethdev if some functions are missing
> 
> > For this release, we focus on releasing some basic functions (btw,
> > mac_addr_set is not available but is covered by this patch).
> 
> Yes, you are extending ethdev by adding
> rte_eth_dev_default_mac_addr_set.
> 
> > The key reason that this set of library is not released as part of ethdev is
> > the ethtool API dependency on kernel include file.
> 
> It is a good reason to separate the library.
> But it doesn't justify its need.
> 
> > To faithfully carry the ethtool ops and net dev ops API parameters, the
> ethtool APIs are designed to follow the original definition except avoiding
> carry kernel states.
> > With that, to support ethtool APIs faithfully, we need to include
> .
> > As suggested by many DPDK veterans including Thomas (indicated over
> your reply), you would prefer these APIs in a separate library.
> 
> I think I'm starting to understand that you really need ethtool conversion
> (implemented in rte_ethtool_get_drvinfo) but not the other functions which
> are simple wrappers. Right?

The rte_ethtool_get_drvinfo and many other ethtool ops have the same
conversion requirement.
The ethtool and net dev ops that don't require conversion are implemented in
the same ethtool library for the sake of a clean API.

[dpdk-dev] The use of --log-level and its default state

2015-06-05 Thread Wiles, Keith



On 6/5/15, 8:58 AM, "Thomas Monjalon"  wrote:

>Keith, your mail is very long but it's maybe on purpose to show that
>there are too many logs ;)

Yes it was kind of the point, but only because I wanted to show the
complete output.
>
>2015-06-05 12:32, Wiles, Keith:
>> On 6/5/15, 5:00 AM, "Thomas Monjalon"  wrote:
>> >2015-05-27 15:10, Wiles, Keith:
>> >> I would like to have the log-level default changed to not log everything,
>> >> but the user needs to enable the log messages if he needs to see more
>> >> information. Normally applications or systems are not so verbose, but if
>> >> needed the user enables the verbose or debug messages.
>> >> 
>> >> Can we change the default logs and messages to be non-verbose instead?
>> >
>> >Do you mean changing this line?
>> >/* default value from build option */
>> >internal_cfg->log_level = RTE_LOG_LEVEL;
>> >It means using the most verbose level available in the build.
>> >
>> >Maybe we should set RTE_LOG_NOTICE or RTE_LOG_WARNING,
>> >However, there is already --log-level for the user and rte_set_log_level()
>> >for the application developer.
>> >So this default log level is only used for DPDK trials and development.
>> >Probably that being verbose is a good option for such cases?
>> >
>> The normal operation for most systems is no-news-is-good-news, meaning
>> only report warnings and errors; if someone wants informational output, it
>> should be enabled by the user. It seems PMDs and other parts of DPDK print
>> out information which is not a warning or error, but debug or
>> informational messages that are not very useful. The debug information is
>> more for the developer than a user or even a developer of DPDK, as the
>> debug information has nothing to do with the current developer's goals.
>
>This assumption is not fully true.
>In normal applications DPDK doesn't show so many logs.
>But in testpmd or examples, the log level is not set (by default) and they
>behave as debug applications, which is probably a good default.
>However, as you say below, the log level of some messages is not well
>tuned.

The output was from Pktgen and I did not set any log-level in the app. I
would get the messages about non-vector and vector support when the port
is stopped/started. :-(
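
To make the "quiet by default, verbose on request" behaviour concrete, here
is a minimal sketch of a level filter. It is not the EAL logging code: the
sk_* names and the numeric levels are hypothetical, and the only assumption
carried over from the thread is that a larger level means more verbose
output.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical levels, ordered like syslog: larger means more verbose. */
enum sk_level {
    SK_ERR = 4,
    SK_WARNING = 5,
    SK_NOTICE = 6,
    SK_INFO = 7,
    SK_DEBUG = 8
};

/* Quiet default; a --log-level style option would call sk_set_level(). */
static int sk_cur_level = SK_WARNING;

static void sk_set_level(int level)
{
    sk_cur_level = level;
}

/* Returns 1 if the message was emitted, 0 if it was filtered out. */
static int sk_log(int level, const char *fmt, ...)
{
    va_list ap;

    if (level > sk_cur_level)
        return 0;
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    return 1;
}
```

With this default, the "Detected lcore" style messages would be dropped
unless the user raises the level, while errors still reach the screen.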
>
>> We can change RTE_LOG_LEVEL to a value that only prints warnings; errors
>> should always be printed. We can leave it or we can make that one
>> change to reduce the amount of clutter on the screen. Some of the PMD
>> information is printed out anytime the state changes, which affects the
>> application screen output.
>> 
>> When I have to interact with users of DPDK they sometimes miss critical
>> details in the output because of the sheer amount of output on the screen.
>> Even most OSes try to output information to the screen in a sane way to
>> allow someone to quickly spot a problem.
>> 
>> Here is a normal output with log level at the default:
>> 
>[...] 
>> When I quit the application I get the four above. Are they really useful?
>> Not really, IMO. The PMD information and the EAL information are not very
>> useful 99% of the time.
>
>Yes it is a debug mode.
>
>> Next is --log-level=0 on the command line.
>> 
>> EAL: Detected lcore 0 as core 0 on socket 0
>> EAL: Detected lcore 1 as core 1 on socket 0
>> EAL: Detected lcore 2 as core 2 on socket 0
>> EAL: Detected lcore 3 as core 3 on socket 0
>> EAL: Detected lcore 4 as core 4 on socket 0
>> EAL: Detected lcore 5 as core 8 on socket 0
>> EAL: Detected lcore 6 as core 9 on socket 0
>> EAL: Detected lcore 7 as core 10 on socket 0
>> EAL: Detected lcore 8 as core 11 on socket 0
>> EAL: Detected lcore 9 as core 16 on socket 0
>> EAL: Detected lcore 10 as core 17 on socket 0
>> EAL: Detected lcore 11 as core 18 on socket 0
>> EAL: Detected lcore 12 as core 19 on socket 0
>> EAL: Detected lcore 13 as core 20 on socket 0
>> EAL: Detected lcore 14 as core 24 on socket 0
>> EAL: Detected lcore 15 as core 25 on socket 0
>> EAL: Detected lcore 16 as core 26 on socket 0
>> EAL: Detected lcore 17 as core 27 on socket 0
>> EAL: Detected lcore 18 as core 0 on socket 1
>> EAL: Detected lcore 19 as core 1 on socket 1
>> EAL: Detected lcore 20 as core 2 on socket 1
>> EAL: Detected lcore 21 as core 3 on socket 1
>> EAL: Detected lcore 22 as core 4 on socket 1
>> EAL: Detected lcore 23 as core 8 on socket 1
>> EAL: Detected lcore 24 as core 9 on socket 1
>> EAL: Detected lcore 25 as core 10 on socket 1
>> EAL: Detected lcore 26 as core 11 on socket 1
>> EAL: Detected lcore 27 as core 16 on socket 1
>> EAL: Detected lcore 28 as core 17 on socket 1
>> EAL: Detected lcore 29 as core 18 on socket 1
>> EAL: Detected lcore 30 as core 19 on socket 1
>> EAL: Detected lcore 31 as core 20 on socket 1
>> EAL: Detected lcore 32 as core 24 on socket 1
>> EAL: Detected lcore 33 as core 25 on socket 1
>> EAL: Detected lcore 34 as core 26 on socket 1
>> EAL: Detected lcore 35 as core 27 on socket 1
>> EAL: Detected lcore 36 as core 0 on

[dpdk-dev] [PATCH 2/2] ethtool: add new library to provide ethtool-alike APIs

2015-06-05 Thread Bruce Richardson

On Fri, Jun 05, 2015 at 11:25:09AM +, Wang, Liang-min wrote:
> 
> 
> > -Original Message-
> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > Sent: Friday, June 05, 2015 6:47 AM
> > To: Andrew Harvey (agh)
> > Cc: Stephen Hemminger; Wang, Liang-min; dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH 2/2] ethtool: add new library to provide
> > ethtool-alike APIs
> > 
> > 2015-06-04 22:10, Andrew Harvey:
> > > On 6/4/15, 7:58 AM, "Stephen Hemminger"
> >  wrote:
> > > >"Andrew Harvey (agh)"  wrote:
> > > >> I believe that there is value in this interface for software stacks
> > > >> not based on Linux being moved toward DPDK that need simple
> > > >> operations like getting the mac address. Some of these stacks have
> > > >> a dearth of resources available and dedicating a core/thread to KNI
> > > >> to get/set a mac address is considered excessive. There are also
> > > >> issues with 32/64 bit kernel integration using KNI. If the
> > > >> ethtool interface is not the correct interface then please help me
> > > >> understand what should/could have been used. If ethtool is
> > > >> considered 'old and clunky', Stephen's and your input would be
> > > >> valuable in designing another interface with similar properties.
> > > >> The use-case is pretty simple and there are no plans for moving
> > > >> anything back into the kernel; on the contrary, it's the complete opposite.
> > > >>
> > > >> - Andy
> > > >
> > > >We have DPDK API's to do this, and any added wrappers make it bigger.
> > > >I don't see why calling your ethtool API is better than calling
> > > >rte_eth* API.
> > > >
> > > >If there is a missing functionality in the rte_ethXXX api's for an
> > > >application then add that. For example: rte_eth_mac_addr_get()
> > >
> > > I am getting somewhat confused by your latest comments.  Your first
> > > email (referenced below) looked really positive and I found your
> > > suggestions useful. Your latest post appears to contradict this and
> > > now the interface was there all the time.  The wrapper façade provided
> > > by the ethtool library provides a clean separation of concerns and will
> > > allow people to migrate from not only KNI but in our case from a
> > > legacy system.  If a software stack has requirements to work with
> > > multiple IO abstractions then the ethtool approach is attractive. I
> > > would speculate that many other stacks moving towards dpdk will have
> > similar issues.
> > >
> > > Summarizing, for our use-cases the ethtool interface facilitated our
> > > adoption to dpdk while allowing us to support our legacy IO abstractions.
> > 
> > Stephen and me say the same thing about using the ethdev API.
> > We don't understand why using a fake ethtool lib would be easier.
> > Though you are saying it "facilitated [your] adoption to dpdk".
> > Please could you explain why using an ethtool-like API is easier than using
> > the existing ethdev API?
> > In any case, you have to develop a specific backend for DPDK (rte_ethtool
> > would be also DPDK-specific).
> 
> As described earlier in this patch comment reply, there are other ethtool ops 
> that have been implemented.
> Those ops include set/get eeprom, set/get pauseparam, set/get ringparam,
> which are not available in the existing ethdev library.
> For this release, we focus on releasing some basic functions (btw, 
> mac_addr_set is not available but is covered by this patch).
> The key reason that this set of library is not released as part of ethdev is 
> the ethtool API dependency on kernel include file.
> To faithfully carry the ethtool ops and net dev ops API parameters, the 
> ethtool APIs are designed to follow the original definition except avoiding 
> carry kernel states.
> With that, to support ethtool APIs faithfully, we need to include 
> . 
> As suggested by many DPDK veterans including Thomas (indicated over your 
> reply), you would prefer these APIs in a separate library.
> 
> > 
> > It seems you already started to use such an ethtool implementation.
> > Please note that our goal is not to prevent Cisco from upstreaming (evidence
> > with enic driver integration) but we want to guide you, and others having
> > the same needs, to the best solution for everybody.
> > That's why we need to understand what we (or you) are missing.
> > Maybe that it would be clearer with some code examples (which would go in
> > the lib documentation if any).
> > 
> > Thanks

How about doing this work as a sample application initially, to demonstrate how
an application written using ethtool APIs could be shimmed to use DPDK
underneath.
The ethtool to dpdk mapping could be contained in a single header file (or
header and c file) inside the sample app. This would allow easy re-use of the
shim layer, while at the same time not making it part of the core DPDK
libraries.
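
As a rough sketch of what such a single-header shim could look like: the
snippet below is only an illustration, with every shim_* and stub_* name
hypothetical. The stub backend stands in for whatever ethdev calls a real
mapping would make, so the example stays self-contained and does not
presume any particular DPDK function signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHIM_MAC_LEN 6

/* Backend operations an I/O layer would provide (all names hypothetical). */
struct shim_backend {
    int (*mac_get)(uint8_t port, uint8_t mac[SHIM_MAC_LEN]);
    int (*mac_set)(uint8_t port, const uint8_t mac[SHIM_MAC_LEN]);
};

/* ethtool-style entry points that the legacy application keeps calling. */
static int shim_get_mac(const struct shim_backend *be, uint8_t port,
                        uint8_t mac[SHIM_MAC_LEN])
{
    if (be == NULL || be->mac_get == NULL)
        return -1; /* operation not supported by this backend */
    return be->mac_get(port, mac);
}

static int shim_set_mac(const struct shim_backend *be, uint8_t port,
                        const uint8_t mac[SHIM_MAC_LEN])
{
    if (be == NULL || be->mac_set == NULL)
        return -1;
    return be->mac_set(port, mac);
}

/* Stub backend: a real one would forward to the ethdev API instead. */
static uint8_t stub_mac[SHIM_MAC_LEN] = {0x02, 0, 0, 0, 0, 0x01};

static int stub_mac_get(uint8_t port, uint8_t mac[SHIM_MAC_LEN])
{
    (void)port;
    memcpy(mac, stub_mac, SHIM_MAC_LEN);
    return 0;
}

static int stub_mac_set(uint8_t port, const uint8_t mac[SHIM_MAC_LEN])
{
    (void)port;
    memcpy(stub_mac, mac, SHIM_MAC_LEN);
    return 0;
}

static const struct shim_backend stub_backend = {stub_mac_get, stub_mac_set};
```

The application code only sees shim_get_mac/shim_set_mac, so swapping the
stub for a DPDK-backed table would not touch the callers.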

Regards,
/Bruce

[dpdk-dev] [PATCH 26/26] ixgbe/base: block EEE(Energy Efficient Ethernet) setup on the interfaces that don't support EEE

2015-06-05 Thread Wenzhuo Lu

This patch sets the setup_EEE function pointer to NULL for the
interfaces which do not support EEE. Currently only the KR backplane
interface (0x15AB) supports EEE. Setting this pointer to NULL prevents
EEE registers from being incorrectly modified and gives base drivers a
flag to check for EEE support.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/base/ixgbe_x550.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ixgbe/base/ixgbe_x550.c 
b/drivers/net/ixgbe/base/ixgbe_x550.c
index 8edc52c..da312ba 100644
--- a/drivers/net/ixgbe/base/ixgbe_x550.c
+++ b/drivers/net/ixgbe/base/ixgbe_x550.c
@@ -390,6 +390,9 @@ s32 ixgbe_init_ops_X550EM(struct ixgbe_hw *hw)
mac->ops.acquire_swfw_sync = ixgbe_acquire_swfw_sync_X550em;
mac->ops.release_swfw_sync = ixgbe_release_swfw_sync_X550em;

+   if (hw->device_id != IXGBE_DEV_ID_X550EM_X_KR)
+   mac->ops.setup_eee = NULL;
+
/* PHY */
phy->ops.init = ixgbe_init_phy_ops_X550em;
phy->ops.identify = ixgbe_identify_phy_x550em;
-- 
1.9.3
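
The "NULL pointer as a capability flag" pattern described in the commit
message can be sketched like this. The sk_* types and functions are
hypothetical simplifications, not the ixgbe base code; only the device ID
0x15AB is taken from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_DEV_ID_KR         0x15AB /* KR backplane, per the commit message */
#define SK_ERR_NOT_SUPPORTED (-1)   /* hypothetical error code */

struct sk_hw;

struct sk_mac_ops {
    int (*setup_eee)(struct sk_hw *hw, bool enable);
};

struct sk_hw {
    unsigned int device_id;
    struct sk_mac_ops ops;
    bool eee_enabled;
};

/* Stand-in for the routine that would program the EEE registers. */
static int sk_setup_eee_impl(struct sk_hw *hw, bool enable)
{
    hw->eee_enabled = enable;
    return 0;
}

/* Install the op, then clear it for devices without EEE support. */
static void sk_init_ops(struct sk_hw *hw)
{
    hw->ops.setup_eee = sk_setup_eee_impl;
    if (hw->device_id != SK_DEV_ID_KR)
        hw->ops.setup_eee = NULL;
}

/* Callers test the pointer instead of touching registers blindly. */
static int sk_set_eee(struct sk_hw *hw, bool enable)
{
    if (hw->ops.setup_eee == NULL)
        return SK_ERR_NOT_SUPPORTED;
    return hw->ops.setup_eee(hw, enable);
}
```

A driver can therefore use "setup_eee != NULL" both as the guard before the
call and as the answer to "does this port support EEE?".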

[dpdk-dev] [PATCH 25/26] ixgbe/base: added x550em PHY reset function

2015-06-05 Thread Wenzhuo Lu

This patch adds the x550em PHY reset function ixgbe_reset_phy_t_X550em.
ixgbe_reset_phy_t_X550em calls the generic PHY reset, and then enables
the x550em PHY LASI (Link Alarm Status Interrupt) interrupts.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/base/ixgbe_x550.c | 21 ++---
 drivers/net/ixgbe/base/ixgbe_x550.h |  2 +-
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ixgbe/base/ixgbe_x550.c 
b/drivers/net/ixgbe/base/ixgbe_x550.c
index 9301686..8edc52c 100644
--- a/drivers/net/ixgbe/base/ixgbe_x550.c
+++ b/drivers/net/ixgbe/base/ixgbe_x550.c
@@ -1325,6 +1325,7 @@ s32 ixgbe_init_phy_ops_X550em(struct ixgbe_hw *hw)
 ixgbe_setup_internal_phy_t_x550em;
phy->ops.enter_lplu = ixgbe_enter_lplu_t_x550em;
phy->ops.handle_lasi = ixgbe_handle_lasi_ext_t_x550em;
+   phy->ops.reset = ixgbe_reset_phy_t_X550em;
break;
default:
break;
@@ -1501,9 +1502,6 @@ s32 ixgbe_init_ext_t_x550em(struct ixgbe_hw *hw)
return status;
}

-   /* Configure Link Status Alarm and Temperature Threshold interrupts */
-   status = ixgbe_enable_lasi_ext_t_x550em(hw);
-
return status;
 }

@@ -2892,3 +2890,20 @@ s32 ixgbe_check_link_t_X550em(struct ixgbe_hw *hw, 
ixgbe_link_speed *speed,

return IXGBE_SUCCESS;
 }
+
+/**
+ *  ixgbe_reset_phy_t_X550em - Performs X557 PHY reset and enables LASI
+ *  @hw: pointer to hardware structure
+ **/
+s32 ixgbe_reset_phy_t_X550em(struct ixgbe_hw *hw)
+{
+   s32 status;
+
+   status = ixgbe_reset_phy_generic(hw);
+
+   if (status != IXGBE_SUCCESS)
+   return status;
+
+   /* Configure Link Status Alarm and Temperature Threshold interrupts */
+   return ixgbe_enable_lasi_ext_t_x550em(hw);
+}
diff --git a/drivers/net/ixgbe/base/ixgbe_x550.h 
b/drivers/net/ixgbe/base/ixgbe_x550.h
index ead9e79..4cfd49c 100644
--- a/drivers/net/ixgbe/base/ixgbe_x550.h
+++ b/drivers/net/ixgbe/base/ixgbe_x550.h
@@ -101,5 +101,5 @@ s32 ixgbe_setup_mac_link_t_X550em(struct ixgbe_hw *hw,
  bool autoneg_wait_to_complete);
 s32 ixgbe_check_link_t_X550em(struct ixgbe_hw *hw, ixgbe_link_speed *speed,
  bool *link_up, bool link_up_wait_to_complete);
+s32 ixgbe_reset_phy_t_X550em(struct ixgbe_hw *hw);
 #endif /* _IXGBE_X550_H_ */
-
-- 
1.9.3

[dpdk-dev] [PATCH 24/26] ixgbe/base: set lan_id before first I2C access

2015-06-05 Thread Wenzhuo Lu

Set the lan_id before the first I2C access. The existing call was
clearly being done after a previous I2C access in the same function
and that can't be right, so call the set_lan_id method earlier. At
this point it probably doesn't matter for this QSFP function, but
it makes sense to do it consistently anyway.

On X550, be sure to set the lan_id before using it to configure the
mux control output, else the mux will not be controlled.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/base/ixgbe_phy.c  | 6 +++---
 drivers/net/ixgbe/base/ixgbe_x550.c | 2 ++
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ixgbe/base/ixgbe_phy.c 
b/drivers/net/ixgbe/base/ixgbe_phy.c
index e5ededb..3ba5661 100644
--- a/drivers/net/ixgbe/base/ixgbe_phy.c
+++ b/drivers/net/ixgbe/base/ixgbe_phy.c
@@ -1692,6 +1692,9 @@ s32 ixgbe_identify_qsfp_module_generic(struct ixgbe_hw 
*hw)
goto out;
}

+   /* LAN ID is needed for I2C access */
+   hw->mac.ops.set_lan_id(hw);
+
status = hw->phy.ops.read_i2c_eeprom(hw, IXGBE_SFF_IDENTIFIER,
                                         &identifier);

@@ -1706,9 +1709,6 @@ s32 ixgbe_identify_qsfp_module_generic(struct ixgbe_hw 
*hw)

hw->phy.id = identifier;

-   /* LAN ID is needed for sfp_type determination */
-   hw->mac.ops.set_lan_id(hw);
-
status = hw->phy.ops.read_i2c_eeprom(hw, IXGBE_SFF_QSFP_10GBE_COMP,
 _codes_10g);

diff --git a/drivers/net/ixgbe/base/ixgbe_x550.c 
b/drivers/net/ixgbe/base/ixgbe_x550.c
index 4608f75..9301686 100644
--- a/drivers/net/ixgbe/base/ixgbe_x550.c
+++ b/drivers/net/ixgbe/base/ixgbe_x550.c
@@ -1291,6 +1291,8 @@ s32 ixgbe_init_phy_ops_X550em(struct ixgbe_hw *hw)

DEBUGFUNC("ixgbe_init_phy_ops_X550em");

+   hw->mac.ops.set_lan_id(hw);
+
if (hw->mac.ops.get_media_type(hw) == ixgbe_media_type_fiber) {
phy->phy_semaphore_mask = IXGBE_GSSR_SHARED_I2C_SM;
ixgbe_setup_mux_ctl(hw);
-- 
1.9.3

[dpdk-dev] [PATCH 23/26] ixgbe/base: add link check support for x550em PHY

2015-06-05 Thread Wenzhuo Lu

This patch adds ixgbe_check_link_t_X550em for checking x550em
PHY link. We check that both the MAC and external x550em PHY have link.
This is to avoid a false link up between the internal and external PHY
when the external PHY doesn't have link.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/base/ixgbe_x550.c | 50 +
 drivers/net/ixgbe/base/ixgbe_x550.h |  2 ++
 2 files changed, 52 insertions(+)

diff --git a/drivers/net/ixgbe/base/ixgbe_x550.c 
b/drivers/net/ixgbe/base/ixgbe_x550.c
index 3695215..4608f75 100644
--- a/drivers/net/ixgbe/base/ixgbe_x550.c
+++ b/drivers/net/ixgbe/base/ixgbe_x550.c
@@ -1070,6 +1070,7 @@ void ixgbe_init_mac_link_ops_X550em(struct ixgbe_hw *hw)
break;
case ixgbe_media_type_copper:
mac->ops.setup_link = ixgbe_setup_mac_link_t_X550em;
+   mac->ops.check_link = ixgbe_check_link_t_X550em;
break;
default:
break;
@@ -2840,3 +2841,52 @@ s32 ixgbe_setup_mac_link_t_X550em(struct ixgbe_hw *hw,

return hw->phy.ops.setup_link_speed(hw, speed, 
autoneg_wait_to_complete);
 }
+
+/**
+ * ixgbe_check_link_t_X550em - Determine link and speed status
+ * @hw: pointer to hardware structure
+ * @speed: pointer to link speed
+ * @link_up: true when link is up
+ * @link_up_wait_to_complete: bool used to wait for link up or not
+ *
+ * Check that both the MAC and X557 external PHY have link.
+ **/
+s32 ixgbe_check_link_t_X550em(struct ixgbe_hw *hw, ixgbe_link_speed *speed,
+ bool *link_up, bool link_up_wait_to_complete)
+{
+   u32 status;
+   u16 autoneg_status;
+
+   if (hw->mac.ops.get_media_type(hw) != ixgbe_media_type_copper)
+   return IXGBE_ERR_CONFIG;
+
+   status = ixgbe_check_mac_link_generic(hw, speed, link_up,
+ link_up_wait_to_complete);
+
+   /* If check link fails or MAC link is not up, then return */
+   if (status != IXGBE_SUCCESS || !(*link_up))
+   return status;
+
+   /* MAC link is up, so check external PHY link.
+* Read this twice back to back to indicate current status.
+*/
+   status = hw->phy.ops.read_reg(hw, IXGBE_MDIO_AUTO_NEG_STATUS,
+ IXGBE_MDIO_AUTO_NEG_DEV_TYPE,
+ &autoneg_status);
+
+   if (status != IXGBE_SUCCESS)
+   return status;
+
+   status = hw->phy.ops.read_reg(hw, IXGBE_MDIO_AUTO_NEG_STATUS,
+ IXGBE_MDIO_AUTO_NEG_DEV_TYPE,
+ &autoneg_status);
+
+   if (status != IXGBE_SUCCESS)
+   return status;
+
+   /* If external PHY link is not up, then indicate link not up */
+   if (!(autoneg_status & IXGBE_MDIO_AUTO_NEG_LINK_STATUS))
+   *link_up = false;
+
+   return IXGBE_SUCCESS;
+}
diff --git a/drivers/net/ixgbe/base/ixgbe_x550.h 
b/drivers/net/ixgbe/base/ixgbe_x550.h
index ee23c76..ead9e79 100644
--- a/drivers/net/ixgbe/base/ixgbe_x550.h
+++ b/drivers/net/ixgbe/base/ixgbe_x550.h
@@ -99,5 +99,7 @@ s32 ixgbe_handle_lasi_ext_t_x550em(struct ixgbe_hw *hw);
 s32 ixgbe_setup_mac_link_t_X550em(struct ixgbe_hw *hw,
  ixgbe_link_speed speed,
  bool autoneg_wait_to_complete);
+s32 ixgbe_check_link_t_X550em(struct ixgbe_hw *hw, ixgbe_link_speed *speed,
+ bool *link_up, bool link_up_wait_to_complete);
 #endif /* _IXGBE_X550_H_ */

-- 
1.9.3
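
The back-to-back read in ixgbe_check_link_t_X550em can be pictured with a
small simulation. This sketch assumes the status bit is latching-low, as is
common for MDIO link status bits, so the first read may report a stale link
drop and only the second read reflects the current state. The sk_* names and
the bit value are hypothetical, not taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_LINK_STATUS 0x4u /* hypothetical bit position */

/* Simulated external PHY with a latching-low link status bit. */
struct sk_phy {
    bool link_now;     /* actual link state */
    bool latched_down; /* link dropped at some point since the last read */
};

/* A read reports a past drop once, then clears the latch. */
static uint16_t sk_read_status(struct sk_phy *phy)
{
    bool up = phy->link_now && !phy->latched_down;

    phy->latched_down = false;
    return up ? SK_LINK_STATUS : 0;
}

/* Read twice: the first read flushes the latch, the second is current. */
static bool sk_ext_phy_link_up(struct sk_phy *phy)
{
    (void)sk_read_status(phy);
    return (sk_read_status(phy) & SK_LINK_STATUS) != 0;
}

/* Link is reported up only if both the MAC and the external PHY agree. */
static bool sk_check_link(bool mac_link_up, struct sk_phy *phy)
{
    if (!mac_link_up)
        return false;
    return sk_ext_phy_link_up(phy);
}
```

A single read would report "down" right after a brief link flap even though
the link has recovered, which is the false reading the second access avoids.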

[dpdk-dev] [PATCH 22/26] ixgbe/base: add x550em PHY interrupt and forced 1G/10G support

2015-06-05 Thread Wenzhuo Lu

This patch adds x550em external PHY interrupt and forced 1G/10G
support. Support includes enabling and handling Link Status
Change and Thermal Sensor interrupt. ixgbe_handle_lasi has been added
to the API for handling the interrupts received from x550em PHY.
ixgbe_enable_lasi_ext_t_x550em and ixgbe_get_lasi_ext_t_x550em have been
added to X550em to enable mask and check interrupt flags for x550em PHY.

Forced 1G/10G link speed is handled via ixgbe_setup_mac_link_t_X550em.
ixgbe_setup_mac_link_t_X550em sets up the internal PHY and
external PHY link to either 10G or 1G based on the user-selected
auto-advertised link speed setting. It then sets up the external PHY
auto-advertised link speed.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/base/ixgbe_api.c  |  17 +++
 drivers/net/ixgbe/base/ixgbe_api.h  |   1 +
 drivers/net/ixgbe/base/ixgbe_type.h |  18 ++-
 drivers/net/ixgbe/base/ixgbe_x550.c | 241 +++-
 drivers/net/ixgbe/base/ixgbe_x550.h |   4 +
 5 files changed, 275 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ixgbe/base/ixgbe_api.c 
b/drivers/net/ixgbe/base/ixgbe_api.c
index e08a2e0..916d744 100644
--- a/drivers/net/ixgbe/base/ixgbe_api.c
+++ b/drivers/net/ixgbe/base/ixgbe_api.c
@@ -1293,6 +1293,23 @@ s32 ixgbe_enter_lplu(struct ixgbe_hw *hw)
 }

 /**
+ * ixgbe_handle_lasi - Handle external Base T PHY interrupt
+ * @hw: pointer to hardware structure
+ *
+ * Handle external Base T PHY interrupt. If high temperature
+ * failure alarm then return error, else if link status change
+ * then setup internal/external PHY link
+ *
+ * Return IXGBE_ERR_OVERTEMP if interrupt is high temperature
+ * failure alarm, else return PHY access status.
+ */
+s32 ixgbe_handle_lasi(struct ixgbe_hw *hw)
+{
+   return ixgbe_call_func(hw, hw->phy.ops.handle_lasi, (hw),
+   IXGBE_NOT_IMPLEMENTED);
+}
+
+/**
  *  ixgbe_read_analog_reg8 - Reads 8 bit analog register
  *  @hw: pointer to hardware structure
  *  @reg: analog register to read
diff --git a/drivers/net/ixgbe/base/ixgbe_api.h 
b/drivers/net/ixgbe/base/ixgbe_api.h
index b08c846..bd1208e 100644
--- a/drivers/net/ixgbe/base/ixgbe_api.h
+++ b/drivers/net/ixgbe/base/ixgbe_api.h
@@ -212,6 +212,7 @@ void ixgbe_enable_mdd(struct ixgbe_hw *hw);
 void ixgbe_mdd_event(struct ixgbe_hw *hw, u32 *vf_bitmap);
 void ixgbe_restore_mdd_vf(struct ixgbe_hw *hw, u32 vf);
 s32 ixgbe_enter_lplu(struct ixgbe_hw *hw);
+s32 ixgbe_handle_lasi(struct ixgbe_hw *hw);
 void ixgbe_set_rate_select_speed(struct ixgbe_hw *hw, ixgbe_link_speed speed);
 void ixgbe_disable_rx(struct ixgbe_hw *hw);
 void ixgbe_enable_rx(struct ixgbe_hw *hw);
diff --git a/drivers/net/ixgbe/base/ixgbe_type.h 
b/drivers/net/ixgbe/base/ixgbe_type.h
index 6a00e5b..eaaba44 100644
--- a/drivers/net/ixgbe/base/ixgbe_type.h
+++ b/drivers/net/ixgbe/base/ixgbe_type.h
@@ -1378,6 +1378,8 @@ struct ixgbe_dmac_config {
 #define IXGBE_MDIO_AUTO_NEG_STATUS 0x1 /* AUTO_NEG Status Reg */
 #define IXGBE_MDIO_AUTO_NEG_VENDOR_STAT0xC800 /* AUTO_NEG Vendor 
Status Reg */
 #define IXGBE_MDIO_AUTO_NEG_VENDOR_TX_ALARM 0xCC00 /* AUTO_NEG Vendor TX Reg */
+#define IXGBE_MDIO_AUTO_NEG_VENDOR_TX_ALARM2 0xCC01 /* AUTO_NEG Vendor Tx Reg 
*/
+#define IXGBE_MDIO_AUTO_NEG_VEN_LSC0x1 /* AUTO_NEG Vendor Tx LSC */
 #define IXGBE_MDIO_AUTO_NEG_ADVT   0x10 /* AUTO_NEG Advt Reg */
 #define IXGBE_MDIO_AUTO_NEG_LP 0x13 /* AUTO_NEG LP Status Reg */
 #define IXGBE_MDIO_AUTO_NEG_EEE_ADVT   0x3C /* AUTO_NEG EEE Advt Reg */
@@ -1406,11 +1408,24 @@ struct ixgbe_dmac_config {
 #define IXGBE_MDIO_TX_VENDOR_ALARMS_3_RST_MASK 0x3 /* PHY Reset Complete Mask 
*/
 #define IXGBE_MDIO_GLOBAL_RES_PR_10 0xC479 /* Global Resv Provisioning 10 Reg 
*/
 #define IXGBE_MDIO_POWER_UP_STALL  0x8000 /* Power Up Stall */
-
+#define IXGBE_MDIO_GLOBAL_INT_CHIP_STD_MASK0xFF00 /* int std mask */
+#define IXGBE_MDIO_GLOBAL_CHIP_STD_INT_FLAG0xFC00 /* chip std int flag */
+#define IXGBE_MDIO_GLOBAL_INT_CHIP_VEN_MASK0xFF01 /* int chip-wide mask */
+#define IXGBE_MDIO_GLOBAL_INT_CHIP_VEN_FLAG0xFC01 /* int chip-wide mask */
+#define IXGBE_MDIO_GLOBAL_ALARM_1  0xCC00 /* Global alarm 1 */
+#define IXGBE_MDIO_GLOBAL_ALM_1_HI_TMP_FAIL0x4000 /* high temp failure */
+#define IXGBE_MDIO_GLOBAL_INT_MASK 0xD400 /* Global int mask */
+#define IXGBE_MDIO_GLOBAL_AN_VEN_ALM_INT_EN0x1000 /* autoneg vendor alarm 
int enable */
+#define IXGBE_MDIO_GLOBAL_ALARM_1_INT  0x4 /* int in Global alarm 1 */
+#define IXGBE_MDIO_GLOBAL_VEN_ALM_INT_EN   0x1 /* vendor alarm int enable 
*/
+#define IXGBE_MDIO_GLOBAL_STD_ALM2_INT 0x200 /* vendor alarm2 int mask 
*/
+#define IXGBE_MDIO_GLOBAL_INT_HI_TEMP_EN   0x4000 /* int high temp enable 
*/
 #define IXGBE_MDIO_PMA_PMD_CONTROL_ADDR0x /* PMA/PMD Control Reg */
 #define IXGBE_MDIO_PMA_PMD_SDA_SCL_ADDR0xC30A /* PHY_XS SDA/SCL Addr 
Reg */
 #define IXGBE_MDIO_PMA_PMD_SDA_SCL_DATA

[dpdk-dev] [PATCH 21/26] ixgbe/base: add x550em Auto neg Flow Control support

2015-06-05 Thread Wenzhuo Lu

This patch adds x550em Auto neg Flow Control support to
ixgbe_device_supports_autoneg_fc and sets the x550em setup_fc function
pointer to ixgbe_setup_fc_generic. ixgbe_setup_fc_generic is used for
x550em because flow control is set up on the external PHY via MDIO, whereas
ixgbe_setup_fc_X550em sets up flow control on the internal PHY.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/base/ixgbe_common.c | 1 +
 drivers/net/ixgbe/base/ixgbe_x550.c   | 4 
 2 files changed, 5 insertions(+)

diff --git a/drivers/net/ixgbe/base/ixgbe_common.c 
b/drivers/net/ixgbe/base/ixgbe_common.c
index 7a8eb6b..9e80722 100644
--- a/drivers/net/ixgbe/base/ixgbe_common.c
+++ b/drivers/net/ixgbe/base/ixgbe_common.c
@@ -185,6 +185,7 @@ bool ixgbe_device_supports_autoneg_fc(struct ixgbe_hw *hw)
case IXGBE_DEV_ID_X540T:
case IXGBE_DEV_ID_X540T1:
case IXGBE_DEV_ID_X550T:
+   case IXGBE_DEV_ID_X550EM_X_10G_T:
supported = true;
break;
default:
diff --git a/drivers/net/ixgbe/base/ixgbe_x550.c 
b/drivers/net/ixgbe/base/ixgbe_x550.c
index 669d0ce..9abe927 100644
--- a/drivers/net/ixgbe/base/ixgbe_x550.c
+++ b/drivers/net/ixgbe/base/ixgbe_x550.c
@@ -382,7 +382,11 @@ s32 ixgbe_init_ops_X550EM(struct ixgbe_hw *hw)
mac->ops.get_supported_physical_layer =
ixgbe_get_supported_physical_layer_X550em;

+   if (mac->ops.get_media_type(hw) == ixgbe_media_type_copper)
+   mac->ops.setup_fc = ixgbe_setup_fc_generic;
+   else
mac->ops.setup_fc = ixgbe_setup_fc_X550em;
+
mac->ops.acquire_swfw_sync = ixgbe_acquire_swfw_sync_X550em;
mac->ops.release_swfw_sync = ixgbe_release_swfw_sync_X550em;

-- 
1.9.3
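
The function-pointer selection in the patch above can be illustrated
outside the driver. The following is a minimal, self-contained sketch
and not the ixgbe code: every demo_ type and function is a hypothetical
stand-in (for struct ixgbe_hw, ixgbe_setup_fc_generic and
ixgbe_setup_fc_X550em), and only the choice of setup routine by media
type mirrors the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's media types. */
enum demo_media_type {
    DEMO_MEDIA_COPPER,
    DEMO_MEDIA_BACKPLANE
};

struct demo_hw;
typedef int (*demo_setup_fc_t)(struct demo_hw *hw);

struct demo_hw {
    enum demo_media_type media;
    demo_setup_fc_t setup_fc;
    int fc_on_external_phy; /* recorded by whichever setup_fc ran */
};

/* Stand-in for the generic path: flow control on the external PHY. */
static int demo_setup_fc_generic(struct demo_hw *hw)
{
    hw->fc_on_external_phy = 1;
    return 0;
}

/* Stand-in for the X550em path: flow control on the internal PHY. */
static int demo_setup_fc_x550em(struct demo_hw *hw)
{
    hw->fc_on_external_phy = 0;
    return 0;
}

/* Pick the flow control setup routine from the media type. */
static void demo_init_ops(struct demo_hw *hw)
{
    assert(hw != NULL);
    if (hw->media == DEMO_MEDIA_COPPER)
        hw->setup_fc = demo_setup_fc_generic;
    else
        hw->setup_fc = demo_setup_fc_x550em;
}
```

Callers then invoke hw->setup_fc(hw) without knowing which PHY is
programmed, which is the point of routing the choice through the ops
table.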

[dpdk-dev] [PATCH 20/26] ixgbe/base: ixgbe_setup_internal_phy_x550em function clean-up

2015-06-05 Thread Wenzhuo Lu

This patch cleans up the ixgbe_setup_internal_phy_x550em() function as follows:
 - Renames it to ixgbe_setup_internal_phy_t_x550em to clarify that it is
   specific to copper
 - Returns an error if called for non-copper devices
 - Corrects the comments
 - Removes the LASI (Link Alarm Status Interrupt) status register checks, as
   they were incorrect and never worked.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/base/ixgbe_x550.c | 31 ---
 drivers/net/ixgbe/base/ixgbe_x550.h |  2 +-
 2 files changed, 13 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ixgbe/base/ixgbe_x550.c b/drivers/net/ixgbe/base/ixgbe_x550.c
index 87942bb..669d0ce 100644
--- a/drivers/net/ixgbe/base/ixgbe_x550.c
+++ b/drivers/net/ixgbe/base/ixgbe_x550.c
@@ -1150,7 +1150,8 @@ s32 ixgbe_init_phy_ops_X550em(struct ixgbe_hw *hw)
phy->ops.write_reg = ixgbe_write_phy_reg_x550em;
break;
case ixgbe_phy_x550em_ext_t:
-   phy->ops.setup_internal_link = ixgbe_setup_internal_phy_x550em;
+   phy->ops.setup_internal_link =
+ixgbe_setup_internal_phy_t_x550em;
phy->ops.enter_lplu = ixgbe_enter_lplu_t_x550em;
break;
default:
@@ -1537,35 +1538,27 @@ s32 ixgbe_setup_mac_link_sfp_x550em(struct ixgbe_hw *hw,
 }

 /**
- * ixgbe_setup_internal_phy_x550em - Configure integrated KR PHY
+ * ixgbe_setup_internal_phy_t_x550em - Configure KR PHY to X557 link
  * @hw: point to hardware structure
  *
- * Configures the integrated KR PHY to talk to the external PHY. The base
- * driver will call this function when it gets notification via interrupt from
- * the external PHY. This function forces the internal PHY into iXFI mode at
- * the correct speed.
+ * Configures the link between the integrated KR PHY and the external X557 PHY
+ * The driver will call this function when it gets a link status change
+ * interrupt from the X557 PHY. This function configures the link speed
+ * between the PHYs to match the link speed of the BASE-T link.
  *
  * A return of a non-zero value indicates an error, and the base driver should
  * not report link up.
  */
-s32 ixgbe_setup_internal_phy_x550em(struct ixgbe_hw *hw)
+s32 ixgbe_setup_internal_phy_t_x550em(struct ixgbe_hw *hw)
 {
u32 status;
-   u16 lasi, autoneg_status, speed;
+   u16 autoneg_status, speed;
ixgbe_link_speed force_speed;

-   /* Verify that the external link status has changed */
-   status = hw->phy.ops.read_reg(hw, IXGBE_MDIO_XENPAK_LASI_STATUS,
- IXGBE_MDIO_PMA_PMD_DEV_TYPE,
- &lasi);
-   if (status != IXGBE_SUCCESS)
-   return status;
-
-   /* If there was no change in link status, we can just exit */
-   if (!(lasi & IXGBE_XENPAK_LASI_LINK_STATUS_ALARM))
-   return IXGBE_SUCCESS;
+   if (hw->mac.ops.get_media_type(hw) != ixgbe_media_type_copper)
+   return IXGBE_ERR_CONFIG;

-   /* we read this twice back to back to indicate current status */
+   /* read this twice back to back to indicate current status */
status = hw->phy.ops.read_reg(hw, IXGBE_MDIO_AUTO_NEG_STATUS,
  IXGBE_MDIO_AUTO_NEG_DEV_TYPE,
  &autoneg_status);
diff --git a/drivers/net/ixgbe/base/ixgbe_x550.h b/drivers/net/ixgbe/base/ixgbe_x550.h
index a60c7ce..bd6ce9d 100644
--- a/drivers/net/ixgbe/base/ixgbe_x550.h
+++ b/drivers/net/ixgbe/base/ixgbe_x550.h
@@ -83,7 +83,7 @@ s32 ixgbe_init_phy_ops_X550em(struct ixgbe_hw *hw);
 s32 ixgbe_setup_kr_x550em(struct ixgbe_hw *hw);
 s32 ixgbe_setup_kx4_x550em(struct ixgbe_hw *hw);
 s32 ixgbe_init_ext_t_x550em(struct ixgbe_hw *hw);
-s32 ixgbe_setup_internal_phy_x550em(struct ixgbe_hw *hw);
+s32 ixgbe_setup_internal_phy_t_x550em(struct ixgbe_hw *hw);
 s32 ixgbe_setup_phy_loopback_x550em(struct ixgbe_hw *hw);
 u32 ixgbe_get_supported_physical_layer_X550em(struct ixgbe_hw *hw);
 void ixgbe_disable_rx_x550(struct ixgbe_hw *hw);
-- 
1.9.3

[dpdk-dev] [PATCH 19/26] ixgbe/base: change return value for ixgbe_setup_internal_phy_t_x550em

2015-06-05 Thread Wenzhuo Lu

This patch changes the return value for ixgbe_setup_internal_phy_t_x550em
when link is down to IXGBE_SUCCESS.
The driver will call ixgbe_setup_internal_phy_t_x550em when a link status
change is reported. The link status change can occur on link up or link
down, and if the link status change is for link down then there is no iXFI
setup necessary and no error condition needs to be returned.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/base/ixgbe_x550.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ixgbe/base/ixgbe_x550.c b/drivers/net/ixgbe/base/ixgbe_x550.c
index 146b2d4..87942bb 100644
--- a/drivers/net/ixgbe/base/ixgbe_x550.c
+++ b/drivers/net/ixgbe/base/ixgbe_x550.c
@@ -1578,9 +1578,9 @@ s32 ixgbe_setup_internal_phy_x550em(struct ixgbe_hw *hw)
if (status != IXGBE_SUCCESS)
return status;

-   /* If link is not up return an error indicating treat link as down */
+   /* If link is not up, then there is no setup necessary so return  */
if (!(autoneg_status & IXGBE_MDIO_AUTO_NEG_LINK_STATUS))
-   return IXGBE_ERR_INVALID_LINK_SETTINGS;
+   return IXGBE_SUCCESS;

status = hw->phy.ops.read_reg(hw, IXGBE_MDIO_AUTO_NEG_VENDOR_STAT,
  IXGBE_MDIO_AUTO_NEG_DEV_TYPE,
-- 
1.9.3
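
The behaviour described in this patch (a link-down event is not an
error) can be sketched in isolation. This is an illustration under
stated assumptions, not the driver code: struct demo_phy and the demo_
functions are hypothetical, and the 0x4 bit is borrowed from the
IXGBE_MDIO_AUTO_NEG_LINK_STATUS definition in ixgbe_type.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SUCCESS 0
#define DEMO_LINK_STATUS_BIT 0x4 /* same value as IXGBE_MDIO_AUTO_NEG_LINK_STATUS */

/* Hypothetical PHY state used only for this illustration. */
struct demo_phy {
    uint16_t autoneg_status; /* last value read from the status register */
    int ixfi_setups;         /* how many times the internal link was set up */
};

/* Stand-in for the iXFI setup that only makes sense when link is up. */
static int demo_setup_ixfi(struct demo_phy *phy)
{
    phy->ixfi_setups++;
    return DEMO_SUCCESS;
}

/*
 * Called on every link status change. A link-down event needs no
 * internal PHY setup, so it returns success instead of an error.
 */
static int demo_on_link_change(struct demo_phy *phy)
{
    assert(phy != NULL);
    if (!(phy->autoneg_status & DEMO_LINK_STATUS_BIT))
        return DEMO_SUCCESS;
    return demo_setup_ixfi(phy);
}
```

With this shape the caller can treat any non-zero return as a real
failure, rather than having to special-case link-down notifications.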

[dpdk-dev] [PATCH 18/26] ixgbe/base: move I2C MUX function from ixgbe_x540.c to ixgbe_x550.c

2015-06-05 Thread Wenzhuo Lu

The following patch moves the handling of the I2C MUX (which is only
used for x550em SFP+ devices) out of the ixgbe_x540.c file and
into the ixgbe_x550.c file where it belongs.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/base/ixgbe_x540.c | 30 +-
 drivers/net/ixgbe/base/ixgbe_x550.c | 63 +
 drivers/net/ixgbe/base/ixgbe_x550.h |  2 ++
 3 files changed, 66 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ixgbe/base/ixgbe_x540.c b/drivers/net/ixgbe/base/ixgbe_x540.c
index 1462689..af29b13 100644
--- a/drivers/net/ixgbe/base/ixgbe_x540.c
+++ b/drivers/net/ixgbe/base/ixgbe_x540.c
@@ -739,26 +739,6 @@ STATIC s32 ixgbe_poll_flash_update_done_X540(struct ixgbe_hw *hw)
 }

 /**
- * ixgbe_set_mux - Set mux for port 1 access with CS4227
- * @hw: pointer to hardware structure
- * @state: set mux if 1, clear if 0
- */
-STATIC void ixgbe_set_mux(struct ixgbe_hw *hw, u8 state)
-{
-   u32 esdp;
-
-   if (!hw->bus.lan_id)
-   return;
-   esdp = IXGBE_READ_REG(hw, IXGBE_ESDP);
-   if (state)
-   esdp |= IXGBE_ESDP_SDP1;
-   else
-   esdp &= ~IXGBE_ESDP_SDP1;
-   IXGBE_WRITE_REG(hw, IXGBE_ESDP, esdp);
-   IXGBE_WRITE_FLUSH(hw);
-}
-
-/**
  *  ixgbe_acquire_swfw_sync_X540 - Acquire SWFW semaphore
  *  @hw: pointer to hardware structure
  *  @mask: Mask to specify which semaphore to acquire
@@ -800,8 +780,6 @@ s32 ixgbe_acquire_swfw_sync_X540(struct ixgbe_hw *hw, u32 mask)
IXGBE_WRITE_REG(hw, IXGBE_SWFW_SYNC, swfw_sync);
ixgbe_release_swfw_sync_semaphore(hw);
msec_delay(5);
-   if (swi2c_mask)
-   ixgbe_set_mux(hw, 1);
return IXGBE_SUCCESS;
}
/* Firmware currently using resource (fwmask), hardware
@@ -832,8 +810,6 @@ s32 ixgbe_acquire_swfw_sync_X540(struct ixgbe_hw *hw, u32 mask)
IXGBE_WRITE_REG(hw, IXGBE_SWFW_SYNC, swfw_sync);
ixgbe_release_swfw_sync_semaphore(hw);
msec_delay(5);
-   if (swi2c_mask)
-   ixgbe_set_mux(hw, 1);
return IXGBE_SUCCESS;
}
/* If the resource is not released by other SW the SW can assume that
@@ -871,10 +847,8 @@ void ixgbe_release_swfw_sync_X540(struct ixgbe_hw *hw, u32 mask)

DEBUGFUNC("ixgbe_release_swfw_sync_X540");

-   if (mask & IXGBE_GSSR_I2C_MASK) {
+   if (mask & IXGBE_GSSR_I2C_MASK)
swmask |= mask & IXGBE_GSSR_I2C_MASK;
-   ixgbe_set_mux(hw, 0);
-   }
ixgbe_get_swfw_sync_semaphore(hw);

swfw_sync = IXGBE_READ_REG(hw, IXGBE_SWFW_SYNC);
@@ -1036,5 +1010,3 @@ s32 ixgbe_blink_led_stop_X540(struct ixgbe_hw *hw, u32 index)

return IXGBE_SUCCESS;
 }
-
-
diff --git a/drivers/net/ixgbe/base/ixgbe_x550.c b/drivers/net/ixgbe/base/ixgbe_x550.c
index a321594..146b2d4 100644
--- a/drivers/net/ixgbe/base/ixgbe_x550.c
+++ b/drivers/net/ixgbe/base/ixgbe_x550.c
@@ -383,6 +383,9 @@ s32 ixgbe_init_ops_X550EM(struct ixgbe_hw *hw)
ixgbe_get_supported_physical_layer_X550em;

mac->ops.setup_fc = ixgbe_setup_fc_X550em;
+   mac->ops.acquire_swfw_sync = ixgbe_acquire_swfw_sync_X550em;
+   mac->ops.release_swfw_sync = ixgbe_release_swfw_sync_X550em;
+
/* PHY */
phy->ops.init = ixgbe_init_phy_ops_X550em;
phy->ops.identify = ixgbe_identify_phy_x550em;
@@ -2549,3 +2552,63 @@ s32 ixgbe_setup_fc_X550em(struct ixgbe_hw *hw)
 out:
return ret_val;
 }
+
+/**
+ * ixgbe_set_mux - Set mux for port 1 access with CS4227
+ * @hw: pointer to hardware structure
+ * @state: set mux if 1, clear if 0
+ */
+STATIC void ixgbe_set_mux(struct ixgbe_hw *hw, u8 state)
+{
+   u32 esdp;
+
+   if (!hw->bus.lan_id)
+   return;
+   esdp = IXGBE_READ_REG(hw, IXGBE_ESDP);
+   if (state)
+   esdp |= IXGBE_ESDP_SDP1;
+   else
+   esdp &= ~IXGBE_ESDP_SDP1;
+   IXGBE_WRITE_REG(hw, IXGBE_ESDP, esdp);
+   IXGBE_WRITE_FLUSH(hw);
+}
+
+/**
+ *  ixgbe_acquire_swfw_sync_X550em - Acquire SWFW semaphore
+ *  @hw: pointer to hardware structure
+ *  @mask: Mask to specify which semaphore to acquire
+ *
+ *  Acquires the SWFW semaphore and sets the I2C MUX
+ **/
+s32 ixgbe_acquire_swfw_sync_X550em(struct ixgbe_hw *hw, u32 mask)
+{
+   s32 status;
+
+   DEBUGFUNC("ixgbe_acquire_swfw_sync_X550em");
+
+   status = ixgbe_acquire_swfw_sync_X540(hw, mask);
+   if (status)
+   return status;
+
+   if (mask & IXGBE_GSSR_I2C_MASK)
+   ixgbe_set_mux(hw, 1);
+
+   return IXGBE_SUCCESS;
+}
+
+/**
+ *  ixgbe_release_swfw_sync_X550em - Release SWFW semaphore
+ *  @hw: pointer to hardware structure
+ *  @mask: Mask to specify which semaphore to release
+ *
+ *  Releases the SWFW semaphore

[dpdk-dev] [PATCH 17/26] ixgbe/base: new simplified x550em init flow

2015-06-05 Thread Wenzhuo Lu

The init flow is simplified. We no longer wait for the PHY FW init
complete bit to be set as this bit is only set once by the PHY at power
on and then cleared on the first read. So only the first instance of
running SW (or possibly MAC FW) needs to initialize the PHY.

The PHY initialization has been simplified and now only requires that
the PHY FW be "un-stalled". SW no longer needs to put the PHY in
low-power mode or enable the transceiver.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/base/ixgbe_x550.c | 85 +
 1 file changed, 19 insertions(+), 66 deletions(-)

diff --git a/drivers/net/ixgbe/base/ixgbe_x550.c b/drivers/net/ixgbe/base/ixgbe_x550.c
index 34ea26f..a321594 100644
--- a/drivers/net/ixgbe/base/ixgbe_x550.c
+++ b/drivers/net/ixgbe/base/ixgbe_x550.c
@@ -1293,84 +1293,37 @@ s32 ixgbe_init_ext_t_x550em(struct ixgbe_hw *hw)
 {
u32 status;
u16 reg;
-   u32 retries = 1;
-
-   /* TODO: The number of attempts and delay between attempts is undefined */
-   do {
-   /* decrement retries counter and exit if we hit 0 */
-   if (retries < 1) {
-   ERROR_REPORT1(IXGBE_ERROR_INVALID_STATE,
- "External PHY not yet finished resetting.");
-   return IXGBE_ERR_PHY;
-   }
-   retries--;
-
-   usec_delay(0);
-
-   status = hw->phy.ops.read_reg(hw,
- IXGBE_MDIO_TX_VENDOR_ALARMS_3,
- IXGBE_MDIO_PMA_PMD_DEV_TYPE,
- &reg);
-
-   if (status != IXGBE_SUCCESS)
-   return status;
-
-   /* Verify PHY FW reset has completed */
-   } while ((reg & IXGBE_MDIO_TX_VENDOR_ALARMS_3_RST_MASK) != 1);

-   /* Set port to low power mode */
status = hw->phy.ops.read_reg(hw,
- IXGBE_MDIO_VENDOR_SPECIFIC_1_CONTROL,
- IXGBE_MDIO_VENDOR_SPECIFIC_1_DEV_TYPE,
- &reg);
-
-   if (status != IXGBE_SUCCESS)
-   return status;
-
-   reg |= IXGBE_MDIO_PHY_SET_LOW_POWER_MODE;
-
-   status = hw->phy.ops.write_reg(hw,
-  IXGBE_MDIO_VENDOR_SPECIFIC_1_CONTROL,
-  IXGBE_MDIO_VENDOR_SPECIFIC_1_DEV_TYPE,
-  reg);
-
-   if (status != IXGBE_SUCCESS)
-   return status;
-
-   /* Enable the transmitter */
-   status = hw->phy.ops.read_reg(hw,
- IXGBE_MDIO_PMD_STD_TX_DISABLE_CNTR,
+ IXGBE_MDIO_TX_VENDOR_ALARMS_3,
  IXGBE_MDIO_PMA_PMD_DEV_TYPE,
  &reg);

if (status != IXGBE_SUCCESS)
return status;

-   reg &= ~IXGBE_MDIO_PMD_GLOBAL_TX_DISABLE;
-
-   status = hw->phy.ops.write_reg(hw,
-  IXGBE_MDIO_PMD_STD_TX_DISABLE_CNTR,
-  IXGBE_MDIO_PMA_PMD_DEV_TYPE,
-  reg);
-
-   if (status != IXGBE_SUCCESS)
-   return status;
+   /* If PHY FW reset completed bit is set then this is the first
+* SW instance after a power on so the PHY FW must be un-stalled.
+*/
+   if (reg & IXGBE_MDIO_TX_VENDOR_ALARMS_3_RST_MASK) {
+   status = hw->phy.ops.read_reg(hw,
+   IXGBE_MDIO_GLOBAL_RES_PR_10,
+   IXGBE_MDIO_VENDOR_SPECIFIC_1_DEV_TYPE,
+   &reg);

-   /* Un-stall the PHY FW */
-   status = hw->phy.ops.read_reg(hw,
- IXGBE_MDIO_GLOBAL_RES_PR_10,
- IXGBE_MDIO_VENDOR_SPECIFIC_1_DEV_TYPE,
- &reg);
+   if (status != IXGBE_SUCCESS)
+   return status;

-   if (status != IXGBE_SUCCESS)
-   return status;
+   reg &= ~IXGBE_MDIO_POWER_UP_STALL;

-   reg &= ~IXGBE_MDIO_POWER_UP_STALL;
+   status = hw->phy.ops.write_reg(hw,
+   IXGBE_MDIO_GLOBAL_RES_PR_10,
+   IXGBE_MDIO_VENDOR_SPECIFIC_1_DEV_TYPE,
+   reg);

-   status = hw->phy.ops.write_reg(hw,
-  IXGBE_MDIO_GLOBAL_RES_PR_10,
-  IXGBE_MDIO_VENDOR_SPECIFIC_1_DEV_TYPE,
-  reg);
+   if (status != IXGBE_SUCCESS)
+   return status;
+   }

return status;
 }
-- 
1.9.3
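
The simplified flow in this patch can be modelled without hardware. The
sketch below is hypothetical throughout: struct demo_phy fakes two PHY
registers, the alarm register is modelled as clear-on-read as the
commit message describes, and DEMO_POWER_UP_STALL is an assumed bit
position. Only the reset-complete mask value (0x3) is taken from
IXGBE_MDIO_TX_VENDOR_ALARMS_3_RST_MASK in ixgbe_type.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_RST_COMPLETE_MASK 0x3    /* same value as IXGBE_MDIO_TX_VENDOR_ALARMS_3_RST_MASK */
#define DEMO_POWER_UP_STALL    0x8000 /* assumed bit position, for illustration only */

/* Fake PHY register file. */
struct demo_phy {
    uint16_t alarms;      /* reset-complete bits, set once at power on */
    uint16_t global_ctrl; /* holds the firmware stall bit */
};

/* Reading the alarm register clears it, so only the first reader sees it. */
static uint16_t demo_read_alarms(struct demo_phy *phy)
{
    uint16_t value = phy->alarms;

    phy->alarms = 0;
    return value;
}

/* Un-stall the PHY firmware only if this is the first read after power on. */
static int demo_init_ext_phy(struct demo_phy *phy)
{
    assert(phy != NULL);
    if (demo_read_alarms(phy) & DEMO_RST_COMPLETE_MASK)
        phy->global_ctrl &= (uint16_t)~DEMO_POWER_UP_STALL;
    return 0;
}
```

A second software instance calling demo_init_ext_phy() on the same PHY
reads zero from the alarm register and leaves global_ctrl alone, which
is why the old wait-for-bit loop could never succeed for it.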

[dpdk-dev] [PATCH 16/26] ixgbe/base: fix flow control for KR backplane

2015-06-05 Thread Wenzhuo Lu

The KR backplane differs from other backplanes in that we can't use
auto-negotiation to determine the flow control mode. Instead, use
whatever the user configured.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/base/ixgbe_api.c| 12 +
 drivers/net/ixgbe/base/ixgbe_api.h|  1 +
 drivers/net/ixgbe/base/ixgbe_common.c |  7 +--
 drivers/net/ixgbe/base/ixgbe_common.h |  1 +
 drivers/net/ixgbe/base/ixgbe_type.h   |  5 ++
 drivers/net/ixgbe/base/ixgbe_x550.c   | 87 +++
 drivers/net/ixgbe/base/ixgbe_x550.h   |  1 +
 7 files changed, 111 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ixgbe/base/ixgbe_api.c b/drivers/net/ixgbe/base/ixgbe_api.c
index ff0cd70..e08a2e0 100644
--- a/drivers/net/ixgbe/base/ixgbe_api.c
+++ b/drivers/net/ixgbe/base/ixgbe_api.c
@@ -1069,6 +1069,18 @@ s32 ixgbe_fc_enable(struct ixgbe_hw *hw)
 }

 /**
+ *  ixgbe_setup_fc - Set up flow control
+ *  @hw: pointer to hardware structure
+ *
+ *  Called at init time to set up flow control.
+ **/
+s32 ixgbe_setup_fc(struct ixgbe_hw *hw)
+{
+   return ixgbe_call_func(hw, hw->mac.ops.setup_fc, (hw),
+   IXGBE_NOT_IMPLEMENTED);
+}
+
+/**
  * ixgbe_set_fw_drv_ver - Try to send the driver version number FW
  * @hw: pointer to hardware structure
  * @maj: driver major number to be sent to firmware
diff --git a/drivers/net/ixgbe/base/ixgbe_api.h b/drivers/net/ixgbe/base/ixgbe_api.h
index 9ffe196..b08c846 100644
--- a/drivers/net/ixgbe/base/ixgbe_api.h
+++ b/drivers/net/ixgbe/base/ixgbe_api.h
@@ -128,6 +128,7 @@ s32 ixgbe_set_vfta(struct ixgbe_hw *hw, u32 vlan,
 s32 ixgbe_set_vlvf(struct ixgbe_hw *hw, u32 vlan, u32 vind,
   bool vlan_on, bool *vfta_changed);
 s32 ixgbe_fc_enable(struct ixgbe_hw *hw);
+s32 ixgbe_setup_fc(struct ixgbe_hw *hw);
 s32 ixgbe_set_fw_drv_ver(struct ixgbe_hw *hw, u8 maj, u8 min, u8 build,
 u8 ver);
 s32 ixgbe_get_thermal_sensor_data(struct ixgbe_hw *hw);
diff --git a/drivers/net/ixgbe/base/ixgbe_common.c b/drivers/net/ixgbe/base/ixgbe_common.c
index 3758df1..7a8eb6b 100644
--- a/drivers/net/ixgbe/base/ixgbe_common.c
+++ b/drivers/net/ixgbe/base/ixgbe_common.c
@@ -134,6 +134,7 @@ s32 ixgbe_init_ops_generic(struct ixgbe_hw *hw)

/* Flow Control */
mac->ops.fc_enable = ixgbe_fc_enable_generic;
+   mac->ops.setup_fc = ixgbe_setup_fc_generic;

/* Link */
mac->ops.get_link_capabilities = NULL;
@@ -200,19 +201,19 @@ bool ixgbe_device_supports_autoneg_fc(struct ixgbe_hw *hw)
 }

 /**
- *  ixgbe_setup_fc - Set up flow control
+ *  ixgbe_setup_fc_generic - Set up flow control
  *  @hw: pointer to hardware structure
  *
  *  Called at init time to set up flow control.
  **/
-STATIC s32 ixgbe_setup_fc(struct ixgbe_hw *hw)
+s32 ixgbe_setup_fc_generic(struct ixgbe_hw *hw)
 {
s32 ret_val = IXGBE_SUCCESS;
u32 reg = 0, reg_bp = 0;
u16 reg_cu = 0;
bool locked = false;

-   DEBUGFUNC("ixgbe_setup_fc");
+   DEBUGFUNC("ixgbe_setup_fc_generic");

/* Validate the requested mode */
if (hw->fc.strict_ieee && hw->fc.requested_mode == ixgbe_fc_rx_pause) {
diff --git a/drivers/net/ixgbe/base/ixgbe_common.h b/drivers/net/ixgbe/base/ixgbe_common.h
index 71507df..fd67a88 100644
--- a/drivers/net/ixgbe/base/ixgbe_common.h
+++ b/drivers/net/ixgbe/base/ixgbe_common.h
@@ -111,6 +111,7 @@ s32 ixgbe_enable_sec_rx_path_generic(struct ixgbe_hw *hw);
 s32 ixgbe_fc_enable_generic(struct ixgbe_hw *hw);
 bool ixgbe_device_supports_autoneg_fc(struct ixgbe_hw *hw);
 void ixgbe_fc_autoneg(struct ixgbe_hw *hw);
+s32 ixgbe_setup_fc_generic(struct ixgbe_hw *hw);

 s32 ixgbe_validate_mac_addr(u8 *mac_addr);
 s32 ixgbe_acquire_swfw_sync(struct ixgbe_hw *hw, u32 mask);
diff --git a/drivers/net/ixgbe/base/ixgbe_type.h b/drivers/net/ixgbe/base/ixgbe_type.h
index fb46c97..6a00e5b 100644
--- a/drivers/net/ixgbe/base/ixgbe_type.h
+++ b/drivers/net/ixgbe/base/ixgbe_type.h
@@ -3593,6 +3593,7 @@ struct ixgbe_mac_operations {

/* Flow Control */
s32 (*fc_enable)(struct ixgbe_hw *);
+   s32 (*setup_fc)(struct ixgbe_hw *);

/* Manageability interface */
s32 (*set_fw_drv_ver)(struct ixgbe_hw *, u8, u8, u8, u8);
@@ -3817,6 +3818,7 @@ struct ixgbe_hw {

 #define IXGBE_KRM_PORT_CAR_GEN_CTRL(P) ((P == 0) ? (0x4010) : (0x8010))
 #define IXGBE_KRM_LINK_CTRL_1(P)   ((P == 0) ? (0x420C) : (0x820C))
+#define IXGBE_KRM_AN_CNTL_1(P) ((P == 0) ? (0x422C) : (0x822C))
 #define IXGBE_KRM_DSP_TXFFE_STATE_4(P) ((P == 0) ? (0x4634) : (0x8634))
 #define IXGBE_KRM_DSP_TXFFE_STATE_5(P) ((P == 0) ? (0x4638) : (0x8638))
 #define IXGBE_KRM_RX_TRN_LINKUP_CTRL(P) ((P == 0) ? (0x4B00) : (0x8B00))
@@ -3839,6 +3841,9 @@ struct ixgbe_hw {
 #define IXGBE_KRM_LINK_CTRL_1_TETH_AN_ENABLE   (1 << 29)
 #define IXGBE_KRM_LINK_CTRL_1_TETH_AN_RESTART  (1 << 31)

+#define IXGBE_KRM_AN_CNTL_1_SYM_PAUSE  (1 << 28)
+#define

[dpdk-dev] [PATCH 15/26] ixgbe/base: add SW based LPLU support

2015-06-05 Thread Wenzhuo Lu

This patch adds SW Low Power Link Up (LPLU) support for x550em PHY.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/base/ixgbe_api.c  |  13 +++
 drivers/net/ixgbe/base/ixgbe_api.h  |   1 +
 drivers/net/ixgbe/base/ixgbe_type.h |  17 +++-
 drivers/net/ixgbe/base/ixgbe_x550.c | 173 
 drivers/net/ixgbe/base/ixgbe_x550.h |   2 +
 5 files changed, 205 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ixgbe/base/ixgbe_api.c b/drivers/net/ixgbe/base/ixgbe_api.c
index 8a14888..ff0cd70 100644
--- a/drivers/net/ixgbe/base/ixgbe_api.c
+++ b/drivers/net/ixgbe/base/ixgbe_api.c
@@ -1268,6 +1268,19 @@ void ixgbe_restore_mdd_vf(struct ixgbe_hw *hw, u32 vf)
 }

 /**
+ *  ixgbe_enter_lplu - Transition to low power states
+ *  @hw: pointer to hardware structure
+ *
+ * Configures Low Power Link Up on transition to low power states
+ * (from D0 to non-D0).
+ **/
+s32 ixgbe_enter_lplu(struct ixgbe_hw *hw)
+{
+   return ixgbe_call_func(hw, hw->phy.ops.enter_lplu, (hw),
+   IXGBE_NOT_IMPLEMENTED);
+}
+
+/**
  *  ixgbe_read_analog_reg8 - Reads 8 bit analog register
  *  @hw: pointer to hardware structure
  *  @reg: analog register to read
diff --git a/drivers/net/ixgbe/base/ixgbe_api.h b/drivers/net/ixgbe/base/ixgbe_api.h
index d822e52..9ffe196 100644
--- a/drivers/net/ixgbe/base/ixgbe_api.h
+++ b/drivers/net/ixgbe/base/ixgbe_api.h
@@ -210,6 +210,7 @@ void ixgbe_disable_mdd(struct ixgbe_hw *hw);
 void ixgbe_enable_mdd(struct ixgbe_hw *hw);
 void ixgbe_mdd_event(struct ixgbe_hw *hw, u32 *vf_bitmap);
 void ixgbe_restore_mdd_vf(struct ixgbe_hw *hw, u32 vf);
+s32 ixgbe_enter_lplu(struct ixgbe_hw *hw);
 void ixgbe_set_rate_select_speed(struct ixgbe_hw *hw, ixgbe_link_speed speed);
 void ixgbe_disable_rx(struct ixgbe_hw *hw);
 void ixgbe_enable_rx(struct ixgbe_hw *hw);
diff --git a/drivers/net/ixgbe/base/ixgbe_type.h b/drivers/net/ixgbe/base/ixgbe_type.h
index d095ae8..fb46c97 100644
--- a/drivers/net/ixgbe/base/ixgbe_type.h
+++ b/drivers/net/ixgbe/base/ixgbe_type.h
@@ -1377,6 +1377,7 @@ struct ixgbe_dmac_config {
 #define IXGBE_MDIO_AUTO_NEG_CONTROL 0x0 /* AUTO_NEG Control Reg */
 #define IXGBE_MDIO_AUTO_NEG_STATUS 0x1 /* AUTO_NEG Status Reg */
 #define IXGBE_MDIO_AUTO_NEG_VENDOR_STAT 0xC800 /* AUTO_NEG Vendor Status Reg */
+#define IXGBE_MDIO_AUTO_NEG_VENDOR_TX_ALARM 0xCC00 /* AUTO_NEG Vendor TX Reg */
 #define IXGBE_MDIO_AUTO_NEG_ADVT   0x10 /* AUTO_NEG Advt Reg */
 #define IXGBE_MDIO_AUTO_NEG_LP 0x13 /* AUTO_NEG LP Status Reg */
 #define IXGBE_MDIO_AUTO_NEG_EEE_ADVT   0x3C /* AUTO_NEG EEE Advt Reg */
@@ -1396,6 +1397,10 @@ struct ixgbe_dmac_config {
 #define IXGBE_MDIO_PHY_1000BASET_ABILITY   0x0020 /* 1000BaseT capable */
 #define IXGBE_MDIO_PHY_100BASETX_ABILITY   0x0080 /* 100BaseTX capable */
 #define IXGBE_MDIO_PHY_SET_LOW_POWER_MODE  0x0800 /* Set low power mode */
+#define IXGBE_AUTO_NEG_LP_STATUS   0xE820 /* AUTO NEG Rx LP Status Reg */
+#define IXGBE_AUTO_NEG_LP_1000BASE_CAP 0x8000 /* AUTO NEG Rx LP 1000BaseT Cap */
+#define IXGBE_AUTO_NEG_LP_10GBASE_CAP  0x0800 /* AUTO NEG Rx LP 10GBaseT Cap */
+#define IXGBE_AUTO_NEG_10GBASET_STAT   0x0021 /* AUTO NEG 10G BaseT Stat */

 #define IXGBE_MDIO_TX_VENDOR_ALARMS_3  0xCC02 /* Vendor Alarms 3 Reg */
 #define IXGBE_MDIO_TX_VENDOR_ALARMS_3_RST_MASK 0x3 /* PHY Reset Complete Mask */
@@ -1423,7 +1428,8 @@ struct ixgbe_dmac_config {

 #define IXGBE_MDIO_AUTO_NEG_LINK_STATUS 0x4 /* Indicates if link is up */

-#define IXGBE_MDIO_AUTO_NEG_VENDOR_STATUS_MASK 0x7 /* Speed/Duplex Mask */
+#define IXGBE_MDIO_AUTO_NEG_VENDOR_STATUS_MASK 0x7 /* Speed/Duplex Mask */
+#define IXGBE_MDIO_AUTO_NEG_VEN_STAT_SPEED_MASK 0x6 /* Speed Mask */
 #define IXGBE_MDIO_AUTO_NEG_VENDOR_STATUS_10M_HALF 0x0 /* 10Mb/s Half Duplex */
 #define IXGBE_MDIO_AUTO_NEG_VENDOR_STATUS_10M_FULL 0x1 /* 10Mb/s Full Duplex */
 #define IXGBE_MDIO_AUTO_NEG_VENDOR_STATUS_100M_HALF 0x2 /* 100Mb/s Half Duplex */
@@ -1432,6 +1438,8 @@ struct ixgbe_dmac_config {
 #define IXGBE_MDIO_AUTO_NEG_VENDOR_STATUS_1GB_FULL 0x5 /* 1Gb/s Full Duplex */
 #define IXGBE_MDIO_AUTO_NEG_VENDOR_STATUS_10GB_HALF 0x6 /* 10Gb/s Half Duplex */
 #define IXGBE_MDIO_AUTO_NEG_VENDOR_STATUS_10GB_FULL 0x7 /* 10Gb/s Full Duplex */
+#define IXGBE_MDIO_AUTO_NEG_VENDOR_STATUS_1GB  0x4 /* 1Gb/s */
+#define IXGBE_MDIO_AUTO_NEG_VENDOR_STATUS_10GB 0x6 /* 10Gb/s */

 #define IXGBE_MII_10GBASE_T_AUTONEG_CTRL_REG   0x20   /* 10G Control Reg */
 #define IXGBE_MII_AUTONEG_VENDOR_PROVISION_1_REG 0xC400 /* 1G Provisioning 1 */
@@ -2165,6 +2173,11 @@ enum {
 #define IXGBE_NVM_POLL_WRITE   1 /* Flag for polling for wr complete */
 #define IXGBE_NVM_POLL_READ 0 /* Flag for polling for rd complete */

+#define NVM_INIT_CTRL_3 0x38
+#define NVM_INIT_CTRL_3_LPLU   0x8
+#define NVM_INIT_CTRL_3_D10GMP_PORT0 0x40
+#define

[dpdk-dev] [PATCH 14/26] ixgbe/base: add SFP+ dual-speed support

2015-06-05 Thread Wenzhuo Lu

This patch adds SFP+ dual-speed support.
82599 fiber link code was moved from ixgbe_82599.c to ixgbe_common.c
for use by X550em, and the API was updated to support the common code
usage. SFP MAC link code is added to x550em.

Signed-off-by: Changchun Ouyang 
Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/base/ixgbe_82599.c  | 182 ---
 drivers/net/ixgbe/base/ixgbe_82599.h  |   5 +-
 drivers/net/ixgbe/base/ixgbe_api.c|  29 +
 drivers/net/ixgbe/base/ixgbe_api.h|   3 +
 drivers/net/ixgbe/base/ixgbe_common.c | 228 ++
 drivers/net/ixgbe/base/ixgbe_common.h |   5 +
 drivers/net/ixgbe/base/ixgbe_phy.h|   5 +
 drivers/net/ixgbe/base/ixgbe_type.h   |  16 +--
 drivers/net/ixgbe/base/ixgbe_x550.c   |  23 +++-
 drivers/net/ixgbe/base/ixgbe_x550.h   |   3 +
 10 files changed, 327 insertions(+), 172 deletions(-)

diff --git a/drivers/net/ixgbe/base/ixgbe_82599.c b/drivers/net/ixgbe/base/ixgbe_82599.c
index 90de625..f0deb59 100644
--- a/drivers/net/ixgbe/base/ixgbe_82599.c
+++ b/drivers/net/ixgbe/base/ixgbe_82599.c
@@ -84,6 +84,9 @@ void ixgbe_init_mac_link_ops_82599(struct ixgbe_hw *hw)
if (hw->phy.multispeed_fiber) {
/* Set up dual speed SFP+ support */
mac->ops.setup_link = ixgbe_setup_mac_link_multispeed_fiber;
+   mac->ops.setup_mac_link = ixgbe_setup_mac_link_82599;
+   mac->ops.set_rate_select_speed =
+  ixgbe_set_hard_rate_select_speed;
} else {
if ((ixgbe_get_media_type(hw) == ixgbe_media_type_backplane) &&
 (hw->phy.smart_speed == ixgbe_smart_speed_auto ||
@@ -729,172 +732,33 @@ void ixgbe_flap_tx_laser_multispeed_fiber(struct ixgbe_hw *hw)
}
 }

-
 /**
- *  ixgbe_setup_mac_link_multispeed_fiber - Set MAC link speed
+ *  ixgbe_set_hard_rate_select_speed - Set module link speed
  *  @hw: pointer to hardware structure
- *  @speed: new link speed
- *  @autoneg_wait_to_complete: true when waiting for completion is needed
+ *  @speed: link speed to set
  *
- *  Set the link speed in the AUTOC register and restarts link.
- **/
-s32 ixgbe_setup_mac_link_multispeed_fiber(struct ixgbe_hw *hw,
-ixgbe_link_speed speed,
-bool autoneg_wait_to_complete)
+ *  Set module link speed via RS0/RS1 rate select pins.
+ */
+void ixgbe_set_hard_rate_select_speed(struct ixgbe_hw *hw,
+   ixgbe_link_speed speed)
 {
-   s32 status = IXGBE_SUCCESS;
-   ixgbe_link_speed link_speed = IXGBE_LINK_SPEED_UNKNOWN;
-   ixgbe_link_speed highest_link_speed = IXGBE_LINK_SPEED_UNKNOWN;
-   u32 speedcnt = 0;
u32 esdp_reg = IXGBE_READ_REG(hw, IXGBE_ESDP);
-   u32 i = 0;
-   bool autoneg, link_up = false;
-
-   DEBUGFUNC("ixgbe_setup_mac_link_multispeed_fiber");
-
-   /* Mask off requested but non-supported speeds */
-   status = ixgbe_get_link_capabilities(hw, &link_speed, &autoneg);
-   if (status != IXGBE_SUCCESS)
-   return status;
-
-   speed &= link_speed;
-
-   /*
-* Try each speed one by one, highest priority first.  We do this in
-* software because 10gb fiber doesn't support speed autonegotiation.
-*/
-   if (speed & IXGBE_LINK_SPEED_10GB_FULL) {
-   speedcnt++;
-   highest_link_speed = IXGBE_LINK_SPEED_10GB_FULL;
-
-   /* If we already have link at this speed, just jump out */
-   status = ixgbe_check_link(hw, &link_speed, &link_up, false);
-   if (status != IXGBE_SUCCESS)
-   return status;
-
-   if ((link_speed == IXGBE_LINK_SPEED_10GB_FULL) && link_up)
-   goto out;
-
-   /* Set the module link speed */
-   switch (hw->phy.media_type) {
-   case ixgbe_media_type_fiber:
-   esdp_reg |= (IXGBE_ESDP_SDP5_DIR | IXGBE_ESDP_SDP5);
-   IXGBE_WRITE_REG(hw, IXGBE_ESDP, esdp_reg);
-   IXGBE_WRITE_FLUSH(hw);
-   break;
-   case ixgbe_media_type_fiber_qsfp:
-   /* QSFP module automatically detects MAC link speed */
-   break;
-   default:
-   DEBUGOUT("Unexpected media type.\n");
-   break;
-   }
-
-   /* Allow module to change analog characteristics (1G->10G) */
-   msec_delay(40);
-
-   status = ixgbe_setup_mac_link_82599(hw,
-   IXGBE_LINK_SPEED_10GB_FULL,
-   autoneg_wait_to_complete);
-   if (status != IXGBE_SUCCESS)
-   return status;
-
-   /* Flap the tx laser if it has not already been done */
-

[dpdk-dev] [PATCH 13/26] ixgbe/base: set lan_id for non-PCIe devices

2015-06-05 Thread Wenzhuo Lu

The introduction of ixgbe_get_bus_info_X550em failed to call the
set_lan_id method to set the func and lan_id and deal with port-
swapped configurations. Add the call to resolve the problem.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/base/ixgbe_x550.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ixgbe/base/ixgbe_x550.c b/drivers/net/ixgbe/base/ixgbe_x550.c
index e6d0a9f..ba4f38a 100644
--- a/drivers/net/ixgbe/base/ixgbe_x550.c
+++ b/drivers/net/ixgbe/base/ixgbe_x550.c
@@ -2269,6 +2269,8 @@ s32 ixgbe_get_bus_info_X550em(struct ixgbe_hw *hw)
hw->bus.width = ixgbe_bus_width_unknown;
hw->bus.speed = ixgbe_bus_speed_unknown;

+   hw->mac.ops.set_lan_id(hw);
+
return IXGBE_SUCCESS;
 }

-- 
1.9.3

[dpdk-dev] [PATCH 12/26] ixgbe/base: disable FEC(Forward Error Correction) to save power

2015-06-05 Thread Wenzhuo Lu

The FEC feature can improve BER (Bit Error Rate) but uses more power
to do so. It also cannot be used with EEE (Energy Efficient Ethernet).
EEE is an important feature, and we have no known BER issues, so FEC
is not needed.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/base/ixgbe_x550.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ixgbe/base/ixgbe_x550.c b/drivers/net/ixgbe/base/ixgbe_x550.c
index 319aa0d..e6d0a9f 100644
--- a/drivers/net/ixgbe/base/ixgbe_x550.c
+++ b/drivers/net/ixgbe/base/ixgbe_x550.c
@@ -1383,8 +1383,8 @@ s32 ixgbe_setup_kr_x550em(struct ixgbe_hw *hw)
return status;

reg_val |= IXGBE_KRM_LINK_CTRL_1_TETH_AN_ENABLE;
-   reg_val |= IXGBE_KRM_LINK_CTRL_1_TETH_AN_FEC_REQ;
-   reg_val |= IXGBE_KRM_LINK_CTRL_1_TETH_AN_CAP_FEC;
+   reg_val &= ~(IXGBE_KRM_LINK_CTRL_1_TETH_AN_FEC_REQ |
+IXGBE_KRM_LINK_CTRL_1_TETH_AN_CAP_FEC);
reg_val &= ~(IXGBE_KRM_LINK_CTRL_1_TETH_AN_CAP_KR |
 IXGBE_KRM_LINK_CTRL_1_TETH_AN_CAP_KX);

-- 
1.9.3
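
The register update in this patch is a plain set-and-clear on a bit
field. The sketch below shows the same operation as a pure function.
It is illustrative only: the two FEC bit positions are assumed values
chosen for the example (the patch does not show them), while the
AN_ENABLE position (bit 29) follows the
IXGBE_KRM_LINK_CTRL_1_TETH_AN_ENABLE definition in ixgbe_type.h.

```c
#include <stdint.h>

#define DEMO_AN_ENABLE  (UINT32_C(1) << 29) /* as IXGBE_KRM_LINK_CTRL_1_TETH_AN_ENABLE */
#define DEMO_AN_FEC_REQ (UINT32_C(1) << 14) /* assumed position, illustration only */
#define DEMO_AN_CAP_FEC (UINT32_C(1) << 15) /* assumed position, illustration only */

/*
 * Enable auto-negotiation but neither request nor advertise FEC,
 * leaving every other bit of the register value untouched.
 */
static uint32_t demo_kr_link_ctrl(uint32_t reg_val)
{
    reg_val |= DEMO_AN_ENABLE;
    reg_val &= ~(DEMO_AN_FEC_REQ | DEMO_AN_CAP_FEC);
    return reg_val;
}
```

Clearing the bits explicitly, instead of simply no longer setting them,
matters because the value is read back from hardware first and may
already have FEC enabled.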

[dpdk-dev] [PATCH 11/26] ixgbe/base: restore ESDP settings after MAC reset

2015-06-05 Thread Wenzhuo Lu

The I2C mux control relies on the SDP settings in the ESDP register,
so the value must be restored after a MAC reset. Move the setup code
into a function so it can be called from more than one place.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/base/ixgbe_x550.c | 41 -
 1 file changed, 22 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ixgbe/base/ixgbe_x550.c b/drivers/net/ixgbe/base/ixgbe_x550.c
index c91e737..319aa0d 100644
--- a/drivers/net/ixgbe/base/ixgbe_x550.c
+++ b/drivers/net/ixgbe/base/ixgbe_x550.c
@@ -264,6 +264,23 @@ STATIC void ixgbe_check_cs4227(struct ixgbe_hw *hw)
 }

 /**
+ * ixgbe_setup_mux_ctl - Setup ESDP register for I2C mux control
+ * @hw: pointer to hardware structure
+ **/
+STATIC void ixgbe_setup_mux_ctl(struct ixgbe_hw *hw)
+{
+   u32 esdp = IXGBE_READ_REG(hw, IXGBE_ESDP);
+
+   if (hw->bus.lan_id) {
+   esdp &= ~(IXGBE_ESDP_SDP1_NATIVE | IXGBE_ESDP_SDP1);
+   esdp |= IXGBE_ESDP_SDP1_DIR;
+   }
+   esdp &= ~(IXGBE_ESDP_SDP0_NATIVE | IXGBE_ESDP_SDP0_DIR);
+   IXGBE_WRITE_REG(hw, IXGBE_ESDP, esdp);
+   IXGBE_WRITE_FLUSH(hw);
+}
+
+/**
  * ixgbe_identify_phy_x550em - Get PHY type based on device id
  * @hw: pointer to hardware structure
  *
@@ -271,20 +288,11 @@ STATIC void ixgbe_check_cs4227(struct ixgbe_hw *hw)
  */
 STATIC s32 ixgbe_identify_phy_x550em(struct ixgbe_hw *hw)
 {
-   u32 esdp = IXGBE_READ_REG(hw, IXGBE_ESDP);
-
switch (hw->device_id) {
case IXGBE_DEV_ID_X550EM_X_SFP:
/* set up for CS4227 usage */
hw->phy.phy_semaphore_mask = IXGBE_GSSR_SHARED_I2C_SM;
-   if (hw->bus.lan_id) {
-
-   esdp &= ~(IXGBE_ESDP_SDP1_NATIVE | IXGBE_ESDP_SDP1);
-   esdp |= IXGBE_ESDP_SDP1_DIR;
-   }
-   esdp &= ~(IXGBE_ESDP_SDP0_NATIVE | IXGBE_ESDP_SDP0_DIR);
-   IXGBE_WRITE_REG(hw, IXGBE_ESDP, esdp);
-
+   ixgbe_setup_mux_ctl(hw);
ixgbe_check_cs4227(hw);

return ixgbe_identify_module_generic(hw);
@@ -1099,20 +1107,12 @@ s32 ixgbe_init_phy_ops_X550em(struct ixgbe_hw *hw)
 {
struct ixgbe_phy_info *phy = &hw->phy;
s32 ret_val;
-   u32 esdp;

DEBUGFUNC("ixgbe_init_phy_ops_X550em");

if (hw->device_id == IXGBE_DEV_ID_X550EM_X_SFP) {
-   esdp = IXGBE_READ_REG(hw, IXGBE_ESDP);
phy->phy_semaphore_mask = IXGBE_GSSR_SHARED_I2C_SM;
-
-   if (hw->bus.lan_id) {
-   esdp &= ~(IXGBE_ESDP_SDP1_NATIVE | IXGBE_ESDP_SDP1);
-   esdp |= IXGBE_ESDP_SDP1_DIR;
-   }
-   esdp &= ~(IXGBE_ESDP_SDP0_NATIVE | IXGBE_ESDP_SDP0_DIR);
-   IXGBE_WRITE_REG(hw, IXGBE_ESDP, esdp);
+   ixgbe_setup_mux_ctl(hw);
}

/* Identify the PHY or SFP module */
@@ -1269,6 +1269,9 @@ mac_reset_top:
hw->mac.ops.init_rx_addrs(hw);


+   if (hw->device_id == IXGBE_DEV_ID_X550EM_X_SFP)
+   ixgbe_setup_mux_ctl(hw);
+
return status;
 }

-- 
1.9.3
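
Why the helper has to run again after a reset can be shown with a small
model. This is a sketch under stated assumptions, not the driver code:
struct demo_hw fakes the ESDP register as a plain field, all bit
positions are assumed values for illustration, and demo_reset_hw() is
a hypothetical stand-in for the MAC reset path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define DEMO_SDP1        (UINT32_C(1) << 1)
#define DEMO_SDP0_DIR    (UINT32_C(1) << 8)
#define DEMO_SDP1_DIR    (UINT32_C(1) << 9)
#define DEMO_SDP0_NATIVE (UINT32_C(1) << 16)
#define DEMO_SDP1_NATIVE (UINT32_C(1) << 17)

struct demo_hw {
    uint32_t esdp;   /* fake ESDP register */
    unsigned lan_id; /* port number */
};

/* Same bit operations as ixgbe_setup_mux_ctl in the patch. */
static void demo_setup_mux_ctl(struct demo_hw *hw)
{
    uint32_t esdp;

    assert(hw != NULL);
    esdp = hw->esdp;
    if (hw->lan_id) {
        esdp &= ~(DEMO_SDP1_NATIVE | DEMO_SDP1);
        esdp |= DEMO_SDP1_DIR;
    }
    esdp &= ~(DEMO_SDP0_NATIVE | DEMO_SDP0_DIR);
    hw->esdp = esdp;
}

/* A reset puts the register back to its default, so reapply the setup. */
static void demo_reset_hw(struct demo_hw *hw, uint32_t reset_default)
{
    assert(hw != NULL);
    hw->esdp = reset_default;
    demo_setup_mux_ctl(hw);
}
```

Keeping the bit operations in one helper means the identify, init and
reset paths cannot drift apart, which is the design choice the patch
makes.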

[dpdk-dev] [PATCH 10/26] ixgbe/base: add logic to reset CS4227 when needed

2015-06-05 Thread Wenzhuo Lu

On some hardware platforms, the CS4227 does not initialize properly.
Detect those cases and reset it appropriately.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/base/ixgbe_phy.h  |  12 +++
 drivers/net/ixgbe/base/ixgbe_x550.c | 184 
 2 files changed, 196 insertions(+)

diff --git a/drivers/net/ixgbe/base/ixgbe_phy.h 
b/drivers/net/ixgbe/base/ixgbe_phy.h
index 7456bf4..ba5b308 100644
--- a/drivers/net/ixgbe/base/ixgbe_phy.h
+++ b/drivers/net/ixgbe/base/ixgbe_phy.h
@@ -83,9 +83,21 @@ POSSIBILITY OF SUCH DAMAGE.
 #define IXGBE_I2C_EEPROM_STATUS_IN_PROGRESS 0x3

 #define IXGBE_CS4227   0xBE /* CS4227 address */
+#define IXGBE_CS4227_GLOBAL_ID_LSB 0
+#define IXGBE_CS4227_SCRATCH   2
+#define IXGBE_CS4227_GLOBAL_ID_VALUE   0x03E5
+#define IXGBE_CS4227_SCRATCH_VALUE 0x5aa5
+#define IXGBE_CS4227_RETRIES   5
 #define IXGBE_CS4227_SPARE24_LSB   0x12B0  /* Reg to program EDC */
 #define IXGBE_CS4227_EDC_MODE_CX1  0x0002
 #define IXGBE_CS4227_EDC_MODE_SR   0x0004
+#define IXGBE_CS4227_RESET_HOLD    500 /* microseconds */
+#define IXGBE_CS4227_RESET_DELAY   500 /* milliseconds */
+#define IXGBE_CS4227_CHECK_DELAY   30  /* milliseconds */
+#define IXGBE_PE   0xE0 /* Port expander address */
+#define IXGBE_PE_OUTPUT    1   /* Output register offset */
+#define IXGBE_PE_CONFIG    3   /* Config register offset */
+#define IXGBE_PE_BIT1  (1 << 1)

 /* Flow control defines */
 #define IXGBE_TAF_SYM_PAUSE0x400
diff --git a/drivers/net/ixgbe/base/ixgbe_x550.c 
b/drivers/net/ixgbe/base/ixgbe_x550.c
index 0ce1c85..c91e737 100644
--- a/drivers/net/ixgbe/base/ixgbe_x550.c
+++ b/drivers/net/ixgbe/base/ixgbe_x550.c
@@ -38,6 +38,7 @@ POSSIBILITY OF SUCH DAMAGE.
 #include "ixgbe_common.h"
 #include "ixgbe_phy.h"

+
 /**
  *  ixgbe_init_ops_X550 - Inits func ptrs and MAC type
  *  @hw: pointer to hardware structure
@@ -82,6 +83,187 @@ s32 ixgbe_init_ops_X550(struct ixgbe_hw *hw)
 }

 /**
+ * ixgbe_read_cs4227 - Read CS4227 register
+ * @hw: pointer to hardware structure
+ * @reg: register number to read
+ * @value: pointer to receive value read
+ *
+ * Returns status code
+ **/
+STATIC s32 ixgbe_read_cs4227(struct ixgbe_hw *hw, u16 reg, u16 *value)
+{
+   return ixgbe_read_i2c_combined_unlocked(hw, IXGBE_CS4227, reg, value);
+}
+
+/**
+ * ixgbe_write_cs4227 - Write CS4227 register
+ * @hw: pointer to hardware structure
+ * @reg: register number to write
+ * @value: value to write to register
+ *
+ * Returns status code
+ **/
+STATIC s32 ixgbe_write_cs4227(struct ixgbe_hw *hw, u16 reg, u16 value)
+{
+   return ixgbe_write_i2c_combined_unlocked(hw, IXGBE_CS4227, reg, value);
+}
+
+/**
+ * ixgbe_get_cs4227_status - Return CS4227 status
+ * @hw: pointer to hardware structure
+ *
+ * Returns error if CS4227 not successfully initialized
+ **/
+STATIC s32 ixgbe_get_cs4227_status(struct ixgbe_hw *hw)
+{
+   s32 status;
+   u16 value = 0;
+   u8 retry;
+
+   for (retry = 0; retry < IXGBE_CS4227_RETRIES; ++retry) {
+   status = ixgbe_read_cs4227(hw, IXGBE_CS4227_GLOBAL_ID_LSB,
+  &value);
+   if (status != IXGBE_SUCCESS)
+   return status;
+   if (value == IXGBE_CS4227_GLOBAL_ID_VALUE)
+   break;
+   msec_delay(IXGBE_CS4227_CHECK_DELAY);
+   }
+   if (value != IXGBE_CS4227_GLOBAL_ID_VALUE)
+   return IXGBE_ERR_PHY;
+
+   status = ixgbe_write_cs4227(hw, IXGBE_CS4227_SCRATCH,
+   IXGBE_CS4227_SCRATCH_VALUE);
+   if (status != IXGBE_SUCCESS)
+   return status;
+   status = ixgbe_read_cs4227(hw, IXGBE_CS4227_SCRATCH, &value);
+   if (status != IXGBE_SUCCESS)
+   return status;
+   if (value != IXGBE_CS4227_SCRATCH_VALUE)
+   return IXGBE_ERR_PHY;
+   return IXGBE_SUCCESS;
+}
+
+/**
+ * ixgbe_read_pe - Read register from port expander
+ * @hw: pointer to hardware structure
+ * @reg: register number to read
+ * @value: pointer to receive read value
+ *
+ * Returns status code
+ **/
+STATIC s32 ixgbe_read_pe(struct ixgbe_hw *hw, u8 reg, u8 *value)
+{
+   s32 status;
+
+   status = ixgbe_read_i2c_byte_unlocked(hw, reg, IXGBE_PE, value);
+   if (status != IXGBE_SUCCESS)
+   ERROR_REPORT2(IXGBE_ERROR_CAUTION,
+ "port expander access failed with %d\n", status);
+   return status;
+}
+
+/**
+ * ixgbe_write_pe - Write register to port expander
+ * @hw: pointer to hardware structure
+ * @reg: register number to write
+ * @value: value to write
+ *
+ * Returns status code
+ **/
+STATIC s32 ixgbe_write_pe(struct ixgbe_hw *hw, u8 reg, u8 value)
+{
+   s32 status;
+
+   status = ixgbe_write_i2c_byte_unlocked(hw, reg,

[dpdk-dev] [PATCH 09/26] ixgbe/base: issue firmware command when coming up

2015-06-05 Thread Wenzhuo Lu

The driver now needs to issue a firmware command to inform the
firmware that a driver is coming up. This prevents the possibility
of the firmware and the driver configuring the PHY at the same
time. Upon completion of the command, the firmware will no longer
be configuring the PHY.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/base/ixgbe_type.h |  2 ++
 drivers/net/ixgbe/base/ixgbe_x550.c | 17 +
 2 files changed, 19 insertions(+)

diff --git a/drivers/net/ixgbe/base/ixgbe_type.h 
b/drivers/net/ixgbe/base/ixgbe_type.h
index 0e4f312..4e38f53 100644
--- a/drivers/net/ixgbe/base/ixgbe_type.h
+++ b/drivers/net/ixgbe/base/ixgbe_type.h
@@ -2798,6 +2798,7 @@ enum ixgbe_fdir_pballoc_type {
 #define IXGBE_HI_FLASH_ERASE_TIMEOUT   1000 /* Process Erase command limit */
 #define IXGBE_HI_FLASH_UPDATE_TIMEOUT  5000 /* Process Update command limit */
 #define IXGBE_HI_FLASH_APPLY_TIMEOUT   0 /* Process Apply command limit */
+#define IXGBE_HI_PHY_MGMT_REQ_TIMEOUT  2000 /* Wait up to 2 seconds */

 /* CEM Support */
 #define FW_CEM_HDR_LEN 0x4
@@ -2818,6 +2819,7 @@ enum ixgbe_fdir_pballoc_type {
 #define FW_MAX_READ_BUFFER_SIZE 1024
 #define FW_DISABLE_RXEN_CMD 0xDE
 #define FW_DISABLE_RXEN_LEN 0x1
+#define FW_PHY_MGMT_REQ_CMD 0x20
 /* Host Interface Command Structures */

 struct ixgbe_hic_hdr {
diff --git a/drivers/net/ixgbe/base/ixgbe_x550.c 
b/drivers/net/ixgbe/base/ixgbe_x550.c
index 4664583..0ce1c85 100644
--- a/drivers/net/ixgbe/base/ixgbe_x550.c
+++ b/drivers/net/ixgbe/base/ixgbe_x550.c
@@ -972,6 +972,7 @@ s32 ixgbe_init_phy_ops_X550em(struct ixgbe_hw *hw)
  */
 s32 ixgbe_reset_hw_X550em(struct ixgbe_hw *hw)
 {
+   struct ixgbe_hic_hdr fw_cmd;
ixgbe_link_speed link_speed;
s32 status;
u32 ctrl = 0;
@@ -980,6 +981,22 @@ s32 ixgbe_reset_hw_X550em(struct ixgbe_hw *hw)

DEBUGFUNC("ixgbe_reset_hw_X550em");

+   fw_cmd.cmd = FW_PHY_MGMT_REQ_CMD;
+   fw_cmd.buf_len = 0;
+   fw_cmd.cmd_or_resp.cmd_resv = 0;
+   fw_cmd.checksum = FW_DEFAULT_CHECKSUM;
+   status = ixgbe_host_interface_command(hw, (u32 *)&fw_cmd,
+ sizeof(fw_cmd),
+ IXGBE_HI_PHY_MGMT_REQ_TIMEOUT,
+ true);
+   if (status)
+   ERROR_REPORT2(IXGBE_ERROR_CAUTION,
+ "PHY mgmt command failed with %d\n", status);
+   else if (fw_cmd.cmd_or_resp.ret_status != FW_CEM_RESP_STATUS_SUCCESS)
+   ERROR_REPORT2(IXGBE_ERROR_CAUTION,
+ "PHY mgmt command returned %d\n",
+ fw_cmd.cmd_or_resp.ret_status);
+
/* Call adapter stop to disable Tx/Rx and clear interrupts */
status = hw->mac.ops.stop_adapter(hw);
if (status != IXGBE_SUCCESS)
-- 
1.9.3

[dpdk-dev] [PATCH 08/26] ixgbe/base: reduce I2C retry count on X550 devices

2015-06-05 Thread Wenzhuo Lu

A retry count of 10 is likely to run into problems on X550 devices
that have to detect and reset unresponsive CS4227 devices. So,
reduce the I2C retry count to 3 for X550 and above. This should
avoid any possible regressions in existing devices.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/base/ixgbe_phy.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ixgbe/base/ixgbe_phy.c 
b/drivers/net/ixgbe/base/ixgbe_phy.c
index eb89071..e5ededb 100644
--- a/drivers/net/ixgbe/base/ixgbe_phy.c
+++ b/drivers/net/ixgbe/base/ixgbe_phy.c
@@ -121,6 +121,8 @@ STATIC s32 ixgbe_read_i2c_combined_generic_int(struct 
ixgbe_hw *hw, u8 addr,
u8 reg_high;
u8 csum;

+   if (hw->mac.type >= ixgbe_mac_X550)
+   max_retry = 3;
reg_high = ((reg >> 7) & 0xFE) | 1; /* Indicate read combined */
csum = ixgbe_ones_comp_byte_add(reg_high, reg & 0xFF);
csum = ~csum;
@@ -2045,6 +2047,8 @@ STATIC s32 ixgbe_read_i2c_byte_generic_int(struct 
ixgbe_hw *hw, u8 byte_offset,

DEBUGFUNC("ixgbe_read_i2c_byte_generic");

+   if (hw->mac.type >= ixgbe_mac_X550)
+   max_retry = 3;
if (ixgbe_is_sfp_probe(hw, byte_offset, dev_addr))
max_retry = IXGBE_SFP_DETECT_RETRIES;

-- 
1.9.3
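
The effect of the smaller retry budget in the patch above can be sketched in isolation. This is only an illustration under assumptions: the enum, the function names, and the always-failing device are hypothetical stand-ins rather than ixgbe code, and the default budget of 10 is taken from the commit message.

```c
/* Hypothetical MAC generations, ordered oldest to newest so ">=" works. */
enum sketch_mac_type {
	SKETCH_MAC_82599,
	SKETCH_MAC_X540,
	SKETCH_MAC_X550,
	SKETCH_MAC_X550EM
};

/* Pick the retry budget: 3 for X550 and newer, 10 otherwise. */
static int sketch_max_retry(enum sketch_mac_type type)
{
	return (type >= SKETCH_MAC_X550) ? 3 : 10;
}

/* An I2C-like operation: returns 0 on success, non-zero on failure. */
typedef int (*sketch_i2c_op)(void *ctx);

/* Run op until it succeeds or the budget is spent; report attempts made. */
static int sketch_retry(sketch_i2c_op op, void *ctx, int max_retry,
			int *attempts)
{
	int status = -1;
	int tries = 0;

	while (tries < max_retry) {
		tries++;
		status = op(ctx);
		if (status == 0)
			break;
	}
	*attempts = tries;
	return status;
}

/* Stand-in for an unresponsive device: every access fails. */
static int sketch_dead_device(void *ctx)
{
	(void)ctx;
	return -1;
}
```

With an unresponsive device, the caller gives up after 3 attempts instead of 10, so the code that detects and resets a stuck CS4227 gets control back sooner.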

[dpdk-dev] [PATCH 07/26] ixgbe/base: provide unlocked I2C methods

2015-06-05 Thread Wenzhuo Lu

Most I2C accesses take and release the semaphore for each access. Some
callers need to perform multiple I2C operations while holding the
semaphore once, so provide unlocked I2C methods for that purpose.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/base/ixgbe_api.c  |  68 
 drivers/net/ixgbe/base/ixgbe_api.h  |   8 ++
 drivers/net/ixgbe/base/ixgbe_phy.c  | 203 +++-
 drivers/net/ixgbe/base/ixgbe_phy.h  |   4 +
 drivers/net/ixgbe/base/ixgbe_type.h |   8 ++
 5 files changed, 264 insertions(+), 27 deletions(-)

diff --git a/drivers/net/ixgbe/base/ixgbe_api.c 
b/drivers/net/ixgbe/base/ixgbe_api.c
index b45d0de..1ba7b9a 100644
--- a/drivers/net/ixgbe/base/ixgbe_api.c
+++ b/drivers/net/ixgbe/base/ixgbe_api.c
@@ -1309,6 +1309,23 @@ s32 ixgbe_read_i2c_byte(struct ixgbe_hw *hw, u8 
byte_offset, u8 dev_addr,
 }

 /**
+ *  ixgbe_read_i2c_byte_unlocked - Reads 8 bit word via I2C from device address
+ *  @hw: pointer to hardware structure
+ *  @byte_offset: byte offset to read
+ *  @dev_addr: I2C bus address to read from
+ *  @data: value read
+ *
+ *  Performs byte read operation to SFP module's EEPROM over I2C interface.
+ **/
+s32 ixgbe_read_i2c_byte_unlocked(struct ixgbe_hw *hw, u8 byte_offset,
+u8 dev_addr, u8 *data)
+{
+   return ixgbe_call_func(hw, hw->phy.ops.read_i2c_byte_unlocked,
+  (hw, byte_offset, dev_addr, data),
+  IXGBE_NOT_IMPLEMENTED);
+}
+
+/**
  * ixgbe_read_i2c_combined - Perform I2C read combined operation
  * @hw: pointer to the hardware structure
  * @addr: I2C bus address to read from
@@ -1324,6 +1341,23 @@ s32 ixgbe_read_i2c_combined(struct ixgbe_hw *hw, u8 
addr, u16 reg, u16 *val)
 }

 /**
+ * ixgbe_read_i2c_combined_unlocked - Perform I2C read combined operation
+ * @hw: pointer to the hardware structure
+ * @addr: I2C bus address to read from
+ * @reg: I2C device register to read from
+ * @val: pointer to location to receive read value
+ *
+ * Returns an error code on error.
+ **/
+s32 ixgbe_read_i2c_combined_unlocked(struct ixgbe_hw *hw, u8 addr, u16 reg,
+u16 *val)
+{
+   return ixgbe_call_func(hw, hw->phy.ops.read_i2c_combined_unlocked,
+  (hw, addr, reg, val),
+  IXGBE_NOT_IMPLEMENTED);
+}
+
+/**
  *  ixgbe_write_i2c_byte - Writes 8 bit word over I2C
  *  @hw: pointer to hardware structure
  *  @byte_offset: byte offset to write
@@ -1341,6 +1375,24 @@ s32 ixgbe_write_i2c_byte(struct ixgbe_hw *hw, u8 
byte_offset, u8 dev_addr,
 }

 /**
+ *  ixgbe_write_i2c_byte_unlocked - Writes 8 bit word over I2C
+ *  @hw: pointer to hardware structure
+ *  @byte_offset: byte offset to write
+ *  @dev_addr: I2C bus address to write to
+ *  @data: value to write
+ *
+ *  Performs byte write operation to SFP module's EEPROM over I2C interface
+ *  at a specified device address.
+ **/
+s32 ixgbe_write_i2c_byte_unlocked(struct ixgbe_hw *hw, u8 byte_offset,
+ u8 dev_addr, u8 data)
+{
+   return ixgbe_call_func(hw, hw->phy.ops.write_i2c_byte_unlocked,
+  (hw, byte_offset, dev_addr, data),
+  IXGBE_NOT_IMPLEMENTED);
+}
+
+/**
  * ixgbe_write_i2c_combined - Perform I2C write combined operation
  * @hw: pointer to the hardware structure
  * @addr: I2C bus address to write to
@@ -1356,6 +1408,22 @@ s32 ixgbe_write_i2c_combined(struct ixgbe_hw *hw, u8 
addr, u16 reg, u16 val)
 }

 /**
+ * ixgbe_write_i2c_combined_unlocked - Perform I2C write combined operation
+ * @hw: pointer to the hardware structure
+ * @addr: I2C bus address to write to
+ * @reg: I2C device register to write to
+ * @val: value to write
+ *
+ * Returns an error code on error.
+ **/
+s32 ixgbe_write_i2c_combined_unlocked(struct ixgbe_hw *hw, u8 addr, u16 reg,
+ u16 val)
+{
+   return ixgbe_call_func(hw, hw->phy.ops.write_i2c_combined_unlocked,
+  (hw, addr, reg, val), IXGBE_NOT_IMPLEMENTED);
+}
+
+/**
  *  ixgbe_write_i2c_eeprom - Writes 8 bit EEPROM word over I2C interface
  *  @hw: pointer to hardware structure
  *  @byte_offset: EEPROM byte offset to write
diff --git a/drivers/net/ixgbe/base/ixgbe_api.h 
b/drivers/net/ixgbe/base/ixgbe_api.h
index d7cc2a6..784d365 100644
--- a/drivers/net/ixgbe/base/ixgbe_api.h
+++ b/drivers/net/ixgbe/base/ixgbe_api.h
@@ -171,10 +171,18 @@ u32 ixgbe_atr_compute_sig_hash_82599(union 
ixgbe_atr_hash_dword input,
 bool ixgbe_verify_lesm_fw_enabled_82599(struct ixgbe_hw *hw);
 s32 ixgbe_read_i2c_byte(struct ixgbe_hw *hw, u8 byte_offset, u8 dev_addr,
u8 *data);
+s32 ixgbe_read_i2c_byte_unlocked(struct ixgbe_hw *hw, u8 byte_offset,
+u8 dev_addr, u8 *data);
 s32 ixgbe_read_i2c_combined(struct ixgbe_hw *hw, u8 addr, u16 reg, u16 *val);
+s32
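
The locked versus unlocked split described in this patch can be sketched with a toy bus model. Everything here is assumed for illustration: `sketch_bus`, its counters, and the function names are hypothetical and are not the ixgbe API.

```c
#include <stdint.h>

#define SKETCH_NUM_REGS 4

struct sketch_bus {
	int held;           /* non-zero while the semaphore is held */
	int acquire_count;  /* how many times the semaphore was taken */
	uint8_t regs[SKETCH_NUM_REGS];
};

static int sketch_acquire(struct sketch_bus *bus)
{
	if (bus->held)
		return -1;  /* already held: a second take would block */
	bus->held = 1;
	bus->acquire_count++;
	return 0;
}

static void sketch_release(struct sketch_bus *bus)
{
	bus->held = 0;
}

/* Unlocked accessors: the caller must already hold the semaphore. */
static int sketch_read_unlocked(struct sketch_bus *bus, uint8_t reg,
				uint8_t *val)
{
	if (!bus->held || reg >= SKETCH_NUM_REGS)
		return -1;
	*val = bus->regs[reg];
	return 0;
}

static int sketch_write_unlocked(struct sketch_bus *bus, uint8_t reg,
				 uint8_t val)
{
	if (!bus->held || reg >= SKETCH_NUM_REGS)
		return -1;
	bus->regs[reg] = val;
	return 0;
}

/* Locked accessor: one acquire/release pair per access. */
static int sketch_read(struct sketch_bus *bus, uint8_t reg, uint8_t *val)
{
	int status = sketch_acquire(bus);

	if (status)
		return status;
	status = sketch_read_unlocked(bus, reg, val);
	sketch_release(bus);
	return status;
}

/* Write, then read back, under a single hold of the semaphore. */
static int sketch_write_verify(struct sketch_bus *bus, uint8_t reg,
			       uint8_t val)
{
	uint8_t readback = 0;
	int status = sketch_acquire(bus);

	if (status)
		return status;
	status = sketch_write_unlocked(bus, reg, val);
	if (status == 0)
		status = sketch_read_unlocked(bus, reg, &readback);
	if (status == 0 && readback != val)
		status = -1;
	sketch_release(bus);
	return status;
}
```

The write-then-verify helper takes the semaphore once for both operations, which is the pattern the unlocked methods are meant to allow.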

[dpdk-dev] [PATCH 06/26] ixgbe/base: erase ixgbe_get_hi_status

2015-06-05 Thread Wenzhuo Lu

Remove ixgbe_get_hi_status, which is not called by the drivers.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/base/ixgbe_common.c | 30 --
 drivers/net/ixgbe/base/ixgbe_common.h |  1 -
 2 files changed, 31 deletions(-)

diff --git a/drivers/net/ixgbe/base/ixgbe_common.c 
b/drivers/net/ixgbe/base/ixgbe_common.c
index 8179354..11cc2f4 100644
--- a/drivers/net/ixgbe/base/ixgbe_common.c
+++ b/drivers/net/ixgbe/base/ixgbe_common.c
@@ -4346,36 +4346,6 @@ u8 ixgbe_calculate_checksum(u8 *buffer, u32 length)
 }

 /**
- *  ixgbe_get_hi_status - Get host interface command status
- *  @hw: pointer to the HW structure
- *  @return_code: reads and returns code
- *
- *  Check if command returned with success. On success return IXGBE_SUCCESS
- *  else return IXGBE_ERR_HOST_INTERFACE_COMMAND.
- **/
-s32 ixgbe_get_hi_status(struct ixgbe_hw *hw, u8 *ret_status)
-{
-   struct ixgbe_hic_hdr response;
-   u32 *response_val = (u32 *)&response;
-
-   DEBUGFUNC("ixgbe_get_host_interface_status");
-
-   /* Read the command response */
-   *response_val = IXGBE_CPU_TO_LE32(IXGBE_READ_REG(hw, IXGBE_FLEX_MNG));
-
-   if (ret_status)
-   *ret_status = response.cmd_or_resp.ret_status;
-
-   if (response.cmd_or_resp.ret_status != FW_CEM_RESP_STATUS_SUCCESS) {
-   DEBUGOUT1("Host interface error=%x.\n",
- response.cmd_or_resp.ret_status);
-   return IXGBE_ERR_HOST_INTERFACE_COMMAND;
-   }
-
-   return IXGBE_SUCCESS;
-}
-
-/**
  *  ixgbe_host_interface_command - Issue command to manageability block
  *  @hw: pointer to the HW structure
  *  @buffer: contains the command to write and where the return status will
diff --git a/drivers/net/ixgbe/base/ixgbe_common.h 
b/drivers/net/ixgbe/base/ixgbe_common.h
index b67d46c..25d5eb1 100644
--- a/drivers/net/ixgbe/base/ixgbe_common.h
+++ b/drivers/net/ixgbe/base/ixgbe_common.h
@@ -155,7 +155,6 @@ void ixgbe_enable_relaxed_ordering_gen2(struct ixgbe_hw 
*hw);
 s32 ixgbe_set_fw_drv_ver_generic(struct ixgbe_hw *hw, u8 maj, u8 min,
 u8 build, u8 ver);
 u8 ixgbe_calculate_checksum(u8 *buffer, u32 length);
-s32 ixgbe_get_hi_status(struct ixgbe_hw *hw, u8 *ret_status);
 s32 ixgbe_host_interface_command(struct ixgbe_hw *hw, u32 *buffer,
 u32 length, u32 timeout, bool return_data);

-- 
1.9.3

[dpdk-dev] [PATCH 05/26] ixgbe/base: allow tunneled UDP and TCP frames to reach their destination

2015-06-05 Thread Wenzhuo Lu

All bits in FDIRTCPM and FDIRUDPM are set to 1 when
ixgbe_fdir_set_input_mask_82599 is called. Not setting these bits causes
TCP and UDP packets to be filtered out when NVGRE or VXLAN mode is enabled.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/base/ixgbe_82599.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ixgbe/base/ixgbe_82599.c 
b/drivers/net/ixgbe/base/ixgbe_82599.c
index d3d8c6f..90de625 100644
--- a/drivers/net/ixgbe/base/ixgbe_82599.c
+++ b/drivers/net/ixgbe/base/ixgbe_82599.c
@@ -1915,7 +1915,12 @@ s32 ixgbe_fdir_set_input_mask_82599(struct ixgbe_hw *hw,
}
IXGBE_WRITE_REG_BE32(hw, IXGBE_FDIRIP6M, fdirip6m);

-   /* Set all bits in FDIRSIP4M and FDIRDIP4M cloud mode */
+   /* Set all bits in FDIRTCPM, FDIRUDPM, FDIRSIP4M and
+* FDIRDIP4M in cloud mode to allow L3/L3 packets to
+* tunnel.
+*/
+   IXGBE_WRITE_REG(hw, IXGBE_FDIRTCPM, 0xFFFFFFFF);
+   IXGBE_WRITE_REG(hw, IXGBE_FDIRUDPM, 0xFFFFFFFF);
IXGBE_WRITE_REG_BE32(hw, IXGBE_FDIRDIP4M, 0xFFFFFFFF);
IXGBE_WRITE_REG_BE32(hw, IXGBE_FDIRSIP4M, 0xFFFFFFFF);
}
-- 
1.9.3

[dpdk-dev] [PATCH 04/26] ixgbe/base: check return value after calling

2015-06-05 Thread Wenzhuo Lu

This patch moves the check of the return value from
ixgbe_start_hw_generic to immediately after the call.
Previously the code to disable relaxed ordering sat in
between, which seems a bit out of place.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/base/ixgbe_82598.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ixgbe/base/ixgbe_82598.c 
b/drivers/net/ixgbe/base/ixgbe_82598.c
index 75e3e89..9bdbce4 100644
--- a/drivers/net/ixgbe/base/ixgbe_82598.c
+++ b/drivers/net/ixgbe/base/ixgbe_82598.c
@@ -259,6 +259,8 @@ s32 ixgbe_start_hw_82598(struct ixgbe_hw *hw)
DEBUGFUNC("ixgbe_start_hw_82598");

ret_val = ixgbe_start_hw_generic(hw);
+   if (ret_val)
+   return ret_val;

/* Disable relaxed ordering */
for (i = 0; ((i < hw->mac.max_tx_queues) &&
@@ -277,8 +279,7 @@ s32 ixgbe_start_hw_82598(struct ixgbe_hw *hw)
}

/* set the completion timeout for interface */
-   if (ret_val == IXGBE_SUCCESS)
-   ixgbe_set_pcie_completion_timeout(hw);
+   ixgbe_set_pcie_completion_timeout(hw);

return ret_val;
 }
-- 
1.9.3

[dpdk-dev] ACL-Dynamic Adding or Deleting rules

2015-06-05 Thread Sugumaran, Varthamanan

Hi,
Is there a way to add/delete a single ACL rule in librte_acl?
I looked at the library and found no method to add/delete a rule 
dynamically in an existing ACL context.
Please let me know if there are any alternate ways of doing it.

Thanks
Vartha

[dpdk-dev] [PATCH 03/26] ixgbe/base: fix typo error in code comment

2015-06-05 Thread Wenzhuo Lu

Fix a typo in the code comment for FC End
of Frame Exception (FCEOFe/IPE).

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/base/ixgbe_type.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ixgbe/base/ixgbe_type.h 
b/drivers/net/ixgbe/base/ixgbe_type.h
index fc333f1..a739f97 100644
--- a/drivers/net/ixgbe/base/ixgbe_type.h
+++ b/drivers/net/ixgbe/base/ixgbe_type.h
@@ -2474,7 +2474,7 @@ enum {
 #define IXGBE_RXDADV_ERR_SHIFT 20 /* RDESC.ERRORS shift */
 #define IXGBE_RXDADV_ERR_OUTERIPER 0x0400 /* CRC IP Header error */
 #define IXGBE_RXDADV_ERR_RXE   0x2000 /* Any MAC Error */
-#define IXGBE_RXDADV_ERR_FCEOFE 0x8000 /* FCoEFe/IPE */
+#define IXGBE_RXDADV_ERR_FCEOFE 0x8000 /* FCEOFe/IPE */
 #define IXGBE_RXDADV_ERR_FCERR 0x0070 /* FCERR/FDIRERR */
 #define IXGBE_RXDADV_ERR_FDIR_LEN  0x0010 /* FDIR Length error */
 #define IXGBE_RXDADV_ERR_FDIR_DROP 0x0020 /* FDIR Drop error */
-- 
1.9.3

[dpdk-dev] [PATCH 02/26] ixgbe/base: fix code comment, double from

2015-06-05 Thread Wenzhuo Lu

Remove the redundant "from".

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/base/ixgbe_common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ixgbe/base/ixgbe_common.c 
b/drivers/net/ixgbe/base/ixgbe_common.c
index 0174ecb..8179354 100644
--- a/drivers/net/ixgbe/base/ixgbe_common.c
+++ b/drivers/net/ixgbe/base/ixgbe_common.c
@@ -1085,7 +1085,7 @@ s32 ixgbe_stop_adapter_generic(struct ixgbe_hw *hw)
msec_delay(2);

/*
-* Prevent the PCI-E bus from from hanging by disabling PCI-E master
+* Prevent the PCI-E bus from hanging by disabling PCI-E master
 * access and verify no pending requests
 */
return ixgbe_disable_pcie_master(hw);
-- 
1.9.3

[dpdk-dev] [PATCH 01/26] ixgbe/base: update copyright and readme

2015-06-05 Thread Wenzhuo Lu

Update copyright in every file.
Update README file.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/base/README| 4 ++--
 drivers/net/ixgbe/base/ixgbe_82598.c | 2 +-
 drivers/net/ixgbe/base/ixgbe_82598.h | 2 +-
 drivers/net/ixgbe/base/ixgbe_82599.c | 2 +-
 drivers/net/ixgbe/base/ixgbe_82599.h | 2 +-
 drivers/net/ixgbe/base/ixgbe_api.c   | 2 +-
 drivers/net/ixgbe/base/ixgbe_api.h   | 2 +-
 drivers/net/ixgbe/base/ixgbe_common.c| 2 +-
 drivers/net/ixgbe/base/ixgbe_common.h| 2 +-
 drivers/net/ixgbe/base/ixgbe_dcb.c   | 2 +-
 drivers/net/ixgbe/base/ixgbe_dcb.h   | 2 +-
 drivers/net/ixgbe/base/ixgbe_dcb_82598.c | 2 +-
 drivers/net/ixgbe/base/ixgbe_dcb_82598.h | 2 +-
 drivers/net/ixgbe/base/ixgbe_dcb_82599.c | 2 +-
 drivers/net/ixgbe/base/ixgbe_dcb_82599.h | 2 +-
 drivers/net/ixgbe/base/ixgbe_mbx.c   | 2 +-
 drivers/net/ixgbe/base/ixgbe_mbx.h   | 2 +-
 drivers/net/ixgbe/base/ixgbe_osdep.h | 2 +-
 drivers/net/ixgbe/base/ixgbe_phy.c   | 2 +-
 drivers/net/ixgbe/base/ixgbe_phy.h   | 2 +-
 drivers/net/ixgbe/base/ixgbe_type.h  | 2 +-
 drivers/net/ixgbe/base/ixgbe_vf.c| 3 +--
 drivers/net/ixgbe/base/ixgbe_vf.h| 2 +-
 drivers/net/ixgbe/base/ixgbe_x540.c  | 2 +-
 drivers/net/ixgbe/base/ixgbe_x540.h  | 2 +-
 drivers/net/ixgbe/base/ixgbe_x550.c  | 2 +-
 drivers/net/ixgbe/base/ixgbe_x550.h  | 2 +-
 27 files changed, 28 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ixgbe/base/README b/drivers/net/ixgbe/base/README
index ba1249b..fa71d85 100644
--- a/drivers/net/ixgbe/base/README
+++ b/drivers/net/ixgbe/base/README
@@ -1,7 +1,7 @@
 ..
  BSD LICENSE

- Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
  All rights reserved.

  Redistribution and use in source and binary forms, with or without
@@ -34,7 +34,7 @@ Intel® IXGBE driver
 ===

 This directory contains source code of FreeBSD ixgbe driver of version
-cid-10g-shared-code.2015.02.03 released by LAD. The sub-directory of lad/
+cid-10g-shared-code.2015.03.06 released by ND. The sub-directory of base/
 contains the original source package.
 This driver is valid for the product(s) listed below

diff --git a/drivers/net/ixgbe/base/ixgbe_82598.c 
b/drivers/net/ixgbe/base/ixgbe_82598.c
index 4e06550..75e3e89 100644
--- a/drivers/net/ixgbe/base/ixgbe_82598.c
+++ b/drivers/net/ixgbe/base/ixgbe_82598.c
@@ -1,6 +1,6 @@
 
/***

-Copyright (c) 2001-2014, Intel Corporation
+Copyright (c) 2001-2015, Intel Corporation
 All rights reserved.

 Redistribution and use in source and binary forms, with or without
diff --git a/drivers/net/ixgbe/base/ixgbe_82598.h 
b/drivers/net/ixgbe/base/ixgbe_82598.h
index 4000486..89dd11a 100644
--- a/drivers/net/ixgbe/base/ixgbe_82598.h
+++ b/drivers/net/ixgbe/base/ixgbe_82598.h
@@ -1,6 +1,6 @@
 
/***

-Copyright (c) 2001-2014, Intel Corporation
+Copyright (c) 2001-2015, Intel Corporation
 All rights reserved.

 Redistribution and use in source and binary forms, with or without
diff --git a/drivers/net/ixgbe/base/ixgbe_82599.c 
b/drivers/net/ixgbe/base/ixgbe_82599.c
index 239b833..d3d8c6f 100644
--- a/drivers/net/ixgbe/base/ixgbe_82599.c
+++ b/drivers/net/ixgbe/base/ixgbe_82599.c
@@ -1,6 +1,6 @@
 
/***

-Copyright (c) 2001-2014, Intel Corporation
+Copyright (c) 2001-2015, Intel Corporation
 All rights reserved.

 Redistribution and use in source and binary forms, with or without
diff --git a/drivers/net/ixgbe/base/ixgbe_82599.h 
b/drivers/net/ixgbe/base/ixgbe_82599.h
index 39316a9..adf109c 100644
--- a/drivers/net/ixgbe/base/ixgbe_82599.h
+++ b/drivers/net/ixgbe/base/ixgbe_82599.h
@@ -1,6 +1,6 @@
 
/***

-Copyright (c) 2001-2014, Intel Corporation
+Copyright (c) 2001-2015, Intel Corporation
 All rights reserved.

 Redistribution and use in source and binary forms, with or without
diff --git a/drivers/net/ixgbe/base/ixgbe_api.c 
b/drivers/net/ixgbe/base/ixgbe_api.c
index c704b69..b45d0de 100644
--- a/drivers/net/ixgbe/base/ixgbe_api.c
+++ b/drivers/net/ixgbe/base/ixgbe_api.c
@@ -1,6 +1,6 @@
 
/***

-Copyright (c) 2001-2014, Intel Corporation
+Copyright (c) 2001-2015, Intel Corporation
 All rights reserved.

 Redistribution and use in source and binary forms, with or without
diff --git a/drivers/net/ixgbe/base/ixgbe_api.h 
b/drivers/net/ixgbe/base/ixgbe_api.h
index 8386e29..d7cc2a6 100644
--- a/drivers/net/ixgbe/base/ixgbe_api.h
+++ b/drivers/net/ixgbe/base/ixgbe_api.h
@@ -1,6 +1,6 @@

[dpdk-dev] [PATCH 00/26] update ixgbe base driver

2015-06-05 Thread Wenzhuo Lu

Short summary:
*update copyright and readme
*fix code comment, double from
*fix typo error in code comment
*check return value after calling
*allow tunneled UDP and TCP frames to reach their destination
*erase ixgbe_get_hi_status
*provide unlocked I2C methods
*reduce I2C retry count on X550 devices
*issue firmware command when coming up
*add logic to reset CS4227 when needed
*restore ESDP settings after MAC reset
*disable FEC to save power
*set lan_id for non-PCIe devices
*add SFP+ dual-speed support
*add SW based LPLU support
*fix flow control for KR backplane
*new simplified x550em init flow
*move I2C MUX function from ixgbe_x540.c to ixgbe_x550.c
*change return value for ixgbe_setup_internal_phy_t_x550em
*ixgbe_setup_internal_phy_x550em function clean-up
*add x550em Auto neg Flow Control support
*add x550em PHY interrupt and forced 1G/10G support
*add link check support for x550em PHY
*set lan_id before first I2C access
*added x550em PHY reset function
*block EEE setup on the interfaces which don't support EEE

Wenzhuo Lu (26):
  ixgbe/base: update copyright and readme
  ixgbe/base: fix code comment, double from
  ixgbe/base: fix typo error in code comment
  ixgbe/base: check return value after calling
  ixgbe/base: allow tunneled UDP and TCP frames to reach their
destination
  ixgbe/base: erase ixgbe_get_hi_status
  ixgbe/base: provide unlocked I2C methods
  ixgbe/base: reduce I2C retry count on X550 devices
  ixgbe/base: issue firmware command when coming up
  ixgbe/base: add logic to reset CS4227 when needed
  ixgbe/base: restore ESDP settings after MAC reset
  ixgbe/base: disable FEC(Forward Error Correction) to save power
  ixgbe/base: set lan_id for non-PCIe devices
  ixgbe/base: add SFP+ dual-speed support
  ixgbe/base: add SW based LPLU support
  ixgbe/base: fix flow control for KR backplane
  ixgbe/base: new simplified x550em init flow
  ixgbe/base: move I2C MUX function from ixgbe_x540.c to ixgbe_x550.c
  ixgbe/base: change return value for ixgbe_setup_internal_phy_t_x550em
  ixgbe/base: ixgbe_setup_internal_phy_x550em function clean-up
  ixgbe/base: add x550em Auto neg Flow Control support
  ixgbe/base: add x550em PHY interrupt and forced 1G/10G support
  ixgbe/base: add link check support for x550em PHY
  ixgbe/base: set lan_id before first I2C access
  ixgbe/base: added x550em PHY reset function
  ixgbe/base: block EEE(Energy Efficient Ethernet) setup on the
interfaces that don't support EEE

 drivers/net/ixgbe/base/README|4 +-
 drivers/net/ixgbe/base/ixgbe_82598.c |7 +-
 drivers/net/ixgbe/base/ixgbe_82598.h |2 +-
 drivers/net/ixgbe/base/ixgbe_82599.c |  191 +-
 drivers/net/ixgbe/base/ixgbe_82599.h |7 +-
 drivers/net/ixgbe/base/ixgbe_api.c   |  141 +++-
 drivers/net/ixgbe/base/ixgbe_api.h   |   16 +-
 drivers/net/ixgbe/base/ixgbe_common.c|  270 +++-
 drivers/net/ixgbe/base/ixgbe_common.h|9 +-
 drivers/net/ixgbe/base/ixgbe_dcb.c   |2 +-
 drivers/net/ixgbe/base/ixgbe_dcb.h   |2 +-
 drivers/net/ixgbe/base/ixgbe_dcb_82598.c |2 +-
 drivers/net/ixgbe/base/ixgbe_dcb_82598.h |2 +-
 drivers/net/ixgbe/base/ixgbe_dcb_82599.c |2 +-
 drivers/net/ixgbe/base/ixgbe_dcb_82599.h |2 +-
 drivers/net/ixgbe/base/ixgbe_mbx.c   |2 +-
 drivers/net/ixgbe/base/ixgbe_mbx.h   |2 +-
 drivers/net/ixgbe/base/ixgbe_osdep.h |2 +-
 drivers/net/ixgbe/base/ixgbe_phy.c   |  215 ++-
 drivers/net/ixgbe/base/ixgbe_phy.h   |   23 +-
 drivers/net/ixgbe/base/ixgbe_type.h  |   70 +-
 drivers/net/ixgbe/base/ixgbe_vf.c|3 +-
 drivers/net/ixgbe/base/ixgbe_vf.h|2 +-
 drivers/net/ixgbe/base/ixgbe_x540.c  |   32 +-
 drivers/net/ixgbe/base/ixgbe_x540.h  |2 +-
 drivers/net/ixgbe/base/ixgbe_x550.c  | 1029 ++
 drivers/net/ixgbe/base/ixgbe_x550.h  |   20 +-
 27 files changed, 1646 insertions(+), 415 deletions(-)

-- 
1.9.3

[dpdk-dev] 4 Traffic classes per Pipe limitation

2015-06-05 Thread Yeddula, Avinash

Hi,
This is related to the QOS scheduler functionality provided by dpdk.

I see that the number of traffic classes is limited to 4. I'm exploring the 
available options to increase that limit to 8.

This is what I found when I researched this topic.
The limitation on the number of TCs (and pipes) comes from the number of
bits available. Since the QoS code overloads the 32-bit RSS field in
the mbuf, there aren't enough bits to allot. But then again, if you add lots
of pipes or subports, the memory footprint gets huge.
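
To make the bit-budget argument concrete, here is a rough sketch of packing a scheduler hierarchy into one 32-bit word. The field widths and all names below are assumptions chosen so the fields add up to 32; they are not claimed to be the exact layout used by the DPDK QoS code.

```c
#include <stdint.h>

/* Illustrative field widths only; assumed for this sketch. */
#define SK_QUEUE_BITS    2u
#define SK_TC_BITS       2u   /* 2 bits -> 4 traffic classes */
#define SK_PIPE_BITS    20u
#define SK_SUBPORT_BITS  6u
#define SK_COLOR_BITS    2u

#define SK_MASK(bits) ((1u << (bits)) - 1u)

/* Total bits needed if the traffic class field were tc_bits wide. */
static unsigned sk_total_bits(unsigned tc_bits)
{
	return SK_QUEUE_BITS + tc_bits + SK_PIPE_BITS +
	       SK_SUBPORT_BITS + SK_COLOR_BITS;
}

/* Number of traffic classes addressable with tc_bits bits. */
static unsigned sk_num_classes(unsigned tc_bits)
{
	return 1u << tc_bits;
}

/* Pack the hierarchy into one word, lowest field first. */
static uint32_t sk_pack(uint32_t subport, uint32_t pipe, uint32_t tc,
			uint32_t queue, uint32_t color)
{
	uint32_t v = 0;
	unsigned shift = 0;

	v |= (queue & SK_MASK(SK_QUEUE_BITS)) << shift;
	shift += SK_QUEUE_BITS;
	v |= (tc & SK_MASK(SK_TC_BITS)) << shift;
	shift += SK_TC_BITS;
	v |= (pipe & SK_MASK(SK_PIPE_BITS)) << shift;
	shift += SK_PIPE_BITS;
	v |= (subport & SK_MASK(SK_SUBPORT_BITS)) << shift;
	shift += SK_SUBPORT_BITS;
	v |= (color & SK_MASK(SK_COLOR_BITS)) << shift;
	return v;
}

/* Recover the traffic class field from a packed word. */
static uint32_t sk_unpack_tc(uint32_t v)
{
	return (v >> SK_QUEUE_BITS) & SK_MASK(SK_TC_BITS);
}
```

Under these assumed widths, 8 traffic classes need a third bit, which pushes the total to 33 bits, so one bit would have to come from another field (for example, halving the number of pipes or subports).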

Any more info or suggestions on increasing the limit to 8?

Thanks
-Avinash

[dpdk-dev] [PATCH 2/2] ethtool: add new library to provide ethtool-alike APIs

2015-06-05 Thread Thomas Monjalon

2015-06-04 22:10, Andrew Harvey:
> On 6/4/15, 7:58 AM, "Stephen Hemminger"  wrote:
> >"Andrew Harvey (agh)"  wrote:
> >> I believe that there is value in this interface for software stacks not
> >> based on Linux being moved toward DPDK that need simple operations like
> >> getting the mac address.  Some of these stacks have a dearth of
> >>resources
> >> available and dedicating a core/thread to KNI to get/set a mac address
> >> is considered excessive. There are also issues with 32/64 bit kernel
> >> integration
> >> using KNI.  If the ethtool interface is not the correct interface then
> >> please help me
> >> understand what should/could have been used. If ethtool is considered
> >>'old
> >> and clunky'
> >> Stephen's and your input would be valuable in designing another
> >>interface
> >> with
> >> similar properties.  The use-case is pretty simple and there are no plans
> >> for moving
> >> anything back into the kernel; on the contrary, it's the complete opposite.
> >> 
> >> - Andy
> >
> >We have DPDK API's to do this, and any added wrappers make it bigger.
> >I don't see why calling your ethtool API is better than calling
> >rte_eth* API.
> >
> >If there is a missing functionality in the rte_ethXXX api's for an
> >application then add that. For example: rte_eth_mac_addr_get()
> 
> I am getting somewhat confused by your latest comments.  Your first email
> (referenced below) looked really positive and I found your suggestions
> useful. Your latest post appears to contradict this and now the interface
> was there all the time.  The wrapper façade provided by the ethtool
> library provides a clean separation of concerns and will allow people to
> migrate from not only KNI but in our case from a legacy system.  If a
> software stack has requirements to work with multiple IO abstractions
> then the ethtool approach is attractive. I would speculate that many
> other stacks moving towards dpdk will have similar issues.
> 
> Summarizing, for our use-cases the ethtool interface facilitated our
> adoption to dpdk while allowing us to support our legacy IO abstractions.

Stephen and I say the same thing about using the ethdev API.
We don't understand why using a fake ethtool lib would be easier.
Though you are saying it "facilitated [your] adoption to dpdk".
Please could you explain why using an ethtool-like API is easier than
using the existing ethdev API?
In any case, you have to develop a specific backend for DPDK
(rte_ethtool would be also DPDK-specific).

It seems you already started to use such an ethtool implementation.
Please note that our goal is not to prevent Cisco from upstreaming
(evidence with enic driver integration) but we want to guide you, and
others having the same needs, to the best solution for everybody.
That's why we need to understand what we (or you) are missing.
Maybe it would be clearer with some code examples (which would
go in the lib documentation if any).
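
As a starting point for that discussion, the kind of ioctl-style wrapper being debated might look roughly like the sketch below. All names are hypothetical; this is neither the proposed rte_ethtool library nor the ethdev API, only an illustration of a command dispatcher sitting in front of a backend-specific call.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_CMD_GET_MAC 1  /* hypothetical command number */

/* A backend supplies the real implementation (DPDK, legacy stack, ...). */
struct sketch_backend {
	int (*get_mac)(void *priv, uint8_t mac[6]);
	void *priv;
};

/* ioctl-like entry point: client code only ever sees cmd and arg. */
static int sketch_ioctl(const struct sketch_backend *be, int cmd, void *arg)
{
	switch (cmd) {
	case SKETCH_CMD_GET_MAC:
		if (be->get_mac == 0)
			return -1;
		return be->get_mac(be->priv, arg);
	default:
		return -1;  /* unsupported command */
	}
}

/* Toy backend: priv points at a fixed 6-byte address. */
static int sketch_fixed_mac(void *priv, uint8_t mac[6])
{
	memcpy(mac, priv, 6);
	return 0;
}
```

Whether the dispatcher adds value depends on how many backends exist: with a single DPDK backend, `sketch_ioctl` is only a thin layer over a direct call.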

Thanks

[dpdk-dev] The use of --log-level and its default state

2015-06-05 Thread Wiles, Keith



On 6/5/15, 5:00 AM, "Thomas Monjalon"  wrote:

>2015-05-27 15:10, Wiles, Keith:
>> I would like to have the log-level default changed to not log
>>everything,
>> but the user needs to enable the log messages if he needs to see more
>> information. Normally applications or systems are not so verbose, but if
>> needed the user enables the verbose or debug messages.
>> 
>> Can we change the default logs and messages to be non-verbose instead?
>
>Do you mean changing this line?
>/* default value from build option */
>internal_cfg->log_level = RTE_LOG_LEVEL;
>It means using the most verbose level available in the build.
>
>Maybe we should set RTE_LOG_NOTICE or RTE_LOG_WARNING.
>However, there is already --log-level for the user and rte_set_log_level()
>for the application developer.
>So this default log level is only used for DPDK trials and development.
>Being verbose is probably a good option for such cases?
>
The normal operation for most systems is no-news-is-good-news, meaning
only warnings and errors are reported; if someone wants informational output,
it should be enabled by the user. It seems PMDs and other parts of DPDK print
out information which is not a warning or error, but debug or
informational messages that are not very useful. The debug information is
more for the developer than for a user or even a developer of DPDK, as the
debug information has nothing to do with the current developer's goals.

We can change the RTE_LOG_LEVEL to a value that only prints warnings and
errors, which should always be printed. We can leave it, or we can make that one
change to reduce the amount of clutter on the screen. Some of the PMD
information is printed out any time the state changes, which affects the
application screen output.

When I have to interact with users of DPDK they sometimes miss critical
details in the output because of the sheer amount of output on the screen.
Even most OSes try to output information to the screen in a sane way to
allow someone to quickly spot a problem.

Here is a normal output with log level at the default:

EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 1 on socket 0
EAL: Detected lcore 2 as core 2 on socket 0
EAL: Detected lcore 3 as core 3 on socket 0
EAL: Detected lcore 4 as core 4 on socket 0
EAL: Detected lcore 5 as core 8 on socket 0
EAL: Detected lcore 6 as core 9 on socket 0
EAL: Detected lcore 7 as core 10 on socket 0
EAL: Detected lcore 8 as core 11 on socket 0
EAL: Detected lcore 9 as core 16 on socket 0
EAL: Detected lcore 10 as core 17 on socket 0
EAL: Detected lcore 11 as core 18 on socket 0
EAL: Detected lcore 12 as core 19 on socket 0
EAL: Detected lcore 13 as core 20 on socket 0
EAL: Detected lcore 14 as core 24 on socket 0
EAL: Detected lcore 15 as core 25 on socket 0
EAL: Detected lcore 16 as core 26 on socket 0
EAL: Detected lcore 17 as core 27 on socket 0
EAL: Detected lcore 18 as core 0 on socket 1
EAL: Detected lcore 19 as core 1 on socket 1
EAL: Detected lcore 20 as core 2 on socket 1
EAL: Detected lcore 21 as core 3 on socket 1
EAL: Detected lcore 22 as core 4 on socket 1
EAL: Detected lcore 23 as core 8 on socket 1
EAL: Detected lcore 24 as core 9 on socket 1
EAL: Detected lcore 25 as core 10 on socket 1
EAL: Detected lcore 26 as core 11 on socket 1
EAL: Detected lcore 27 as core 16 on socket 1
EAL: Detected lcore 28 as core 17 on socket 1
EAL: Detected lcore 29 as core 18 on socket 1
EAL: Detected lcore 30 as core 19 on socket 1
EAL: Detected lcore 31 as core 20 on socket 1
EAL: Detected lcore 32 as core 24 on socket 1
EAL: Detected lcore 33 as core 25 on socket 1
EAL: Detected lcore 34 as core 26 on socket 1
EAL: Detected lcore 35 as core 27 on socket 1
EAL: Detected lcore 36 as core 0 on socket 0
EAL: Detected lcore 37 as core 1 on socket 0
EAL: Detected lcore 38 as core 2 on socket 0
EAL: Detected lcore 39 as core 3 on socket 0
EAL: Detected lcore 40 as core 4 on socket 0
EAL: Detected lcore 41 as core 8 on socket 0
EAL: Detected lcore 42 as core 9 on socket 0
EAL: Detected lcore 43 as core 10 on socket 0
EAL: Detected lcore 44 as core 11 on socket 0
EAL: Detected lcore 45 as core 16 on socket 0
EAL: Detected lcore 46 as core 17 on socket 0
EAL: Detected lcore 47 as core 18 on socket 0
EAL: Detected lcore 48 as core 19 on socket 0
EAL: Detected lcore 49 as core 20 on socket 0
EAL: Detected lcore 50 as core 24 on socket 0
EAL: Detected lcore 51 as core 25 on socket 0
EAL: Detected lcore 52 as core 26 on socket 0
EAL: Detected lcore 53 as core 27 on socket 0
EAL: Detected lcore 54 as core 0 on socket 1
EAL: Detected lcore 55 as core 1 on socket 1
EAL: Detected lcore 56 as core 2 on socket 1
EAL: Detected lcore 57 as core 3 on socket 1
EAL: Detected lcore 58 as core 4 on socket 1
EAL: Detected lcore 59 as core 8 on socket 1
EAL: Detected lcore 60 as core 9 on socket 1
EAL: Detected lcore 61 as core 10 on socket 1
EAL: Detected lcore 62 as core 11 on socket 1
EAL: Detected lcore 63 as core 16 on socket 1
EAL: Detected lcore 64

[dpdk-dev] [PATCH v3] pipeline: add statistics for librte_pipeline

2015-06-05 Thread Thomas Monjalon

2015-05-28 19:26, Dumitrescu, Cristian:
> I think we have the following options identified so far for stats collection 
> configuration:
> 
> 1. Stats configuration through the RTE_LOG_LEVEL
> 2. Single configuration flag global for all DPDK libraries
> 3. Single configuration flag per DPDK library
> 
> It would be good if Thomas and Stephen, as well as others, would reply with 
> their preference order.

There are important design questions in these threads.
I think that the best way to come to a conclusion is to submit a design rule:
- to state whether statistics must be considered as a feature or as debug,
- and to decide whether stats must be always available or disabled
globally or per-library.
It should be written in a new doc. I suggest docs/guidelines/design.rst.
Then we'll have to discuss and vote on this basis. It will avoid future
discussions.
The underlying discussion is to decide if every cycle is important even if
there is a real usability drawback.

In order to reach a conclusion, it seems reasonable to target a consensus
2 weeks after the first submission of these design rules.
Thanks, Cristian, for following up.

[dpdk-dev] [PATCH] mk: remove "u" modifier from "ar" command

2015-06-05 Thread Bruce Richardson

On Fedora 22, the "ar" binary operates by default in deterministic mode,
making the "u" parameter irrelevant, and leading to warning messages
getting printed in the build output like below.

  INSTALL-LIB librte_kvargs.a
ar: `u' modifier ignored since `D' is the default (see `U')

There are two options to remove these warnings:
* add in the "U" flag to make "ar" non-deterministic again
* remove the "u" flag to have all objects always updated

This patch takes the second approach.

Signed-off-by: Bruce Richardson 
---
 mk/rte.lib.mk | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mk/rte.lib.mk b/mk/rte.lib.mk
index 0d7482d..6bd67aa 100644
--- a/mk/rte.lib.mk
+++ b/mk/rte.lib.mk
@@ -70,7 +70,7 @@ else
 _CPU_LDFLAGS := $(CPU_LDFLAGS)
 endif

-O_TO_A = $(AR) crus $(LIB) $(OBJS-y)
+O_TO_A = $(AR) crs $(LIB) $(OBJS-y)
 O_TO_A_STR = $(subst ','\'',$(O_TO_A)) #'# fix syntax highlight
 O_TO_A_DISP = $(if $(V),"$(O_TO_A_STR)","  AR $(@)")
 O_TO_A_CMD = "cmd_$@ = $(O_TO_A_STR)"
-- 
2.4.2

[dpdk-dev] The use of --log-level and its default state

2015-06-05 Thread Thomas Monjalon

2015-05-27 15:10, Wiles, Keith:
> I would like to have the log-level default changed to not log everything,
> but the user needs to enable the log messages if he needs to see more
> information. Normally applications or systems are not so verbose, but if
> needed the user enables the verbose or debug messages.
> 
> Can we change the default logs and messages to be non-verbose instead?

Do you mean changing this line?
/* default value from build option */
internal_cfg->log_level = RTE_LOG_LEVEL;
It means using the most verbose level available in the build.

Maybe we should set RTE_LOG_NOTICE or RTE_LOG_WARNING.
However, there is already --log-level for the user and rte_set_log_level()
for the application developer.
So this default log level is only used for DPDK trials and development.
Being verbose is probably a good option for such cases?

[dpdk-dev] [PATCH] qos_sched: example modification to use librte_cfgfile

2015-06-05 Thread Michal Jastrzebski

This is a modification of qos_sched example to use
librte_cfgfile for parsing configuration file.

Signed-off-by: Michal Jastrzebski 
---
 examples/qos_sched/cfg_file.c |  157 ++---
 examples/qos_sched/cfg_file.h |   35 ++---
 examples/qos_sched/init.c |   14 ++--
 3 files changed, 47 insertions(+), 159 deletions(-)

diff --git a/examples/qos_sched/cfg_file.c b/examples/qos_sched/cfg_file.c
index 05a8caf..71ddabb 100644
--- a/examples/qos_sched/cfg_file.c
+++ b/examples/qos_sched/cfg_file.c
@@ -233,92 +233,7 @@ int cfg_close(struct cfg_file *cfg)
 }

 int
-cfg_num_sections(struct cfg_file *cfg, const char *sectionname, size_t length)
-{
-   int i;
-   int num_sections = 0;
-   for (i = 0; i < cfg->num_sections; i++) {
-   if (strncmp(cfg->sections[i]->name, sectionname, length) == 0)
-   num_sections++;
-   }
-   return num_sections;
-}
-
-int
-cfg_sections(struct cfg_file *cfg, char *sections[], int max_sections)
-{
-   int i;
-   for (i = 0; i < cfg->num_sections && i < max_sections; i++) {
-   snprintf(sections[i], CFG_NAME_LEN, "%s", cfg->sections[i]->name);
-   }
-   return i;
-}
-
-static const struct cfg_section *
-_get_section(struct cfg_file *cfg, const char *sectionname)
-{
-   int i;
-   for (i = 0; i < cfg->num_sections; i++) {
-   if (strncmp(cfg->sections[i]->name, sectionname,
-   sizeof(cfg->sections[0]->name)) == 0)
-   return cfg->sections[i];
-   }
-   return NULL;
-}
-
-int
-cfg_has_section(struct cfg_file *cfg, const char *sectionname)
-{
-   return (_get_section(cfg, sectionname) != NULL);
-}
-
-int
-cfg_section_num_entries(struct cfg_file *cfg, const char *sectionname)
-{
-   const struct cfg_section *s = _get_section(cfg, sectionname);
-   if (s == NULL)
-   return -1;
-   return s->num_entries;
-}
-
-
-int
-cfg_section_entries(struct cfg_file *cfg, const char *sectionname,
-   struct cfg_entry *entries, int max_entries)
-{
-   int i;
-   const struct cfg_section *sect = _get_section(cfg, sectionname);
-   if (sect == NULL)
-   return -1;
-   for (i = 0; i < max_entries && i < sect->num_entries; i++)
-   entries[i] = *sect->entries[i];
-   return i;
-}
-
-const char *
-cfg_get_entry(struct cfg_file *cfg, const char *sectionname,
-   const char *entryname)
-{
-   int i;
-   const struct cfg_section *sect = _get_section(cfg, sectionname);
-   if (sect == NULL)
-   return NULL;
-   for (i = 0; i < sect->num_entries; i++)
-   if (strncmp(sect->entries[i]->name, entryname, CFG_NAME_LEN) == 0)
-   return sect->entries[i]->value;
-   return NULL;
-}
-
-int
-cfg_has_entry(struct cfg_file *cfg, const char *sectionname,
-   const char *entryname)
-{
-   return (cfg_get_entry(cfg, sectionname, entryname) != NULL);
-}
-
-
-int
-cfg_load_port(struct cfg_file *cfg, struct rte_sched_port_params *port_params)
+cfg_load_port(struct rte_cfgfile *cfg, struct rte_sched_port_params *port_params)
 {
const char *entry;
int j;
@@ -326,19 +241,19 @@ cfg_load_port(struct cfg_file *cfg, struct rte_sched_port_params *port_params)
if (!cfg || !port_params)
return -1;

-   entry = cfg_get_entry(cfg, "port", "frame overhead");
+   entry = rte_cfgfile_get_entry(cfg, "port", "frame overhead");
if (entry)
port_params->frame_overhead = (uint32_t)atoi(entry);

-   entry = cfg_get_entry(cfg, "port", "number of subports per port");
+   entry = rte_cfgfile_get_entry(cfg, "port", "number of subports per port");
if (entry)
port_params->n_subports_per_port = (uint32_t)atoi(entry);

-   entry = cfg_get_entry(cfg, "port", "number of pipes per subport");
+   entry = rte_cfgfile_get_entry(cfg, "port", "number of pipes per subport");
if (entry)
port_params->n_pipes_per_subport = (uint32_t)atoi(entry);

-   entry = cfg_get_entry(cfg, "port", "queue sizes");
+   entry = rte_cfgfile_get_entry(cfg, "port", "queue sizes");
if (entry) {
char *next;

@@ -356,7 +271,7 @@ cfg_load_port(struct cfg_file *cfg, struct rte_sched_port_params *port_params)

/* Parse WRED min thresholds */
snprintf(str, sizeof(str), "tc %d wred min", j);
-   entry = cfg_get_entry(cfg, "red", str);
+   entry = rte_cfgfile_get_entry(cfg, "red", str);
if (entry) {
char *next;
int k;
@@ -372,7 +287,7 @@ cfg_load_port(struct cfg_file *cfg, struct rte_sched_port_params *port_params)

/* Parse WRED max thresholds */
snprintf(str, sizeof(str), "tc %d wred max", j);
-
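
The lookup-then-convert pattern used throughout cfg_load_port() can be
sketched without any DPDK dependency as follows. The demo_* types and the
in-memory entry table are hypothetical and only imitate the shape of the
section/entry lookup; they are not the librte_cfgfile implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct demo_entry {
	const char *section;
	const char *name;
	const char *value;
};

struct demo_cfg {
	const struct demo_entry *entries;
	size_t n_entries;
};

struct demo_port_params {
	uint32_t frame_overhead;
	uint32_t n_subports_per_port;
};

/* Return the value for section/name, or NULL when the entry is absent. */
static const char *demo_get_entry(const struct demo_cfg *cfg,
				  const char *section, const char *name)
{
	size_t i;

	for (i = 0; i < cfg->n_entries; i++)
		if (strcmp(cfg->entries[i].section, section) == 0 &&
		    strcmp(cfg->entries[i].name, name) == 0)
			return cfg->entries[i].value;
	return NULL;
}

/* Overwrite only the fields whose entries exist; keep defaults otherwise. */
static int demo_load_port(const struct demo_cfg *cfg,
			  struct demo_port_params *p)
{
	const char *entry;

	if (!cfg || !p)
		return -1;

	entry = demo_get_entry(cfg, "port", "frame overhead");
	if (entry)
		p->frame_overhead = (uint32_t)atoi(entry);

	entry = demo_get_entry(cfg, "port", "number of subports per port");
	if (entry)
		p->n_subports_per_port = (uint32_t)atoi(entry);

	return 0;
}
```

Because a missing entry leaves the field untouched, the caller is expected to
fill in defaults before loading, which is also how the example treats
port_params.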

[dpdk-dev] [PATCH v2] vhost: provide vhost API to unregister vhost unix domain socket

2015-06-05 Thread Huawei Xie

rte_vhost_driver_unregister will remove the listenfd from the event list, and
then close it.

Signed-off-by: Huawei Xie 
Signed-off-by: Peng Sun 
---
 lib/librte_vhost/rte_virtio_net.h|  3 ++
 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c |  9 
 lib/librte_vhost/vhost_user/vhost-net-user.c | 68 +++-
 lib/librte_vhost/vhost_user/vhost-net-user.h |  2 +-
 4 files changed, 69 insertions(+), 13 deletions(-)

diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index 5d38185..5630fbc 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -188,6 +188,9 @@ int rte_vhost_enable_guest_notification(struct virtio_net *dev, uint16_t queue_i
 /* Register vhost driver. dev_name could be different for multiple instance support. */
 int rte_vhost_driver_register(const char *dev_name);

+/* Unregister vhost driver. This is only meaningful to vhost user. */
+int rte_vhost_driver_unregister(const char *dev_name);
+
 /* Register callbacks. */
 int rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const);
 /* Start vhost driver session blocking loop. */
diff --git a/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
index 6b68abf..1ae7c49 100644
--- a/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
+++ b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
@@ -405,6 +405,15 @@ rte_vhost_driver_register(const char *dev_name)
 }

 /**
+ * An empty function for unregister
+ */
+int
+rte_vhost_driver_unregister(const char *dev_name __rte_unused)
+{
+   return 0;
+}
+
+/**
  * The CUSE session is launched allowing the application to receive open,
  * release and ioctl calls.
  */
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c b/lib/librte_vhost/vhost_user/vhost-net-user.c
index 31f1215..87a4711 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.c
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
@@ -66,6 +66,8 @@ struct connfd_ctx {
 struct _vhost_server {
struct vhost_server *server[MAX_VHOST_SERVER];
struct fdset fdset;
+   int vserver_cnt;
+   pthread_mutex_t server_mutex;
 };

 static struct _vhost_server g_vhost_server = {
@@ -74,10 +76,10 @@ static struct _vhost_server g_vhost_server = {
.fd_mutex = PTHREAD_MUTEX_INITIALIZER,
.num = 0
},
+   .vserver_cnt = 0,
+   .server_mutex = PTHREAD_MUTEX_INITIALIZER,
 };

-static int vserver_idx;
-
 static const char *vhost_message_str[VHOST_USER_MAX] = {
[VHOST_USER_NONE] = "VHOST_USER_NONE",
[VHOST_USER_GET_FEATURES] = "VHOST_USER_GET_FEATURES",
@@ -427,7 +429,6 @@ vserver_message_handler(int connfd, void *dat, int *remove)
}
 }

-
 /**
  * Creates and initialise the vhost server.
  */
@@ -436,34 +437,77 @@ rte_vhost_driver_register(const char *path)
 {
struct vhost_server *vserver;

-   if (vserver_idx == 0)
+   pthread_mutex_lock(&g_vhost_server.server_mutex);
+   if (ops == NULL)
ops = get_virtio_net_callbacks();
-   if (vserver_idx == MAX_VHOST_SERVER)
+
+   if (g_vhost_server.vserver_cnt == MAX_VHOST_SERVER) {
+   RTE_LOG(ERR, VHOST_CONFIG,
+   "error: the number of servers reaches maximum\n");
+   pthread_mutex_unlock(&g_vhost_server.server_mutex);
return -1;
+   }

vserver = calloc(sizeof(struct vhost_server), 1);
-   if (vserver == NULL)
+   if (vserver == NULL) {
+   pthread_mutex_unlock(&g_vhost_server.server_mutex);
return -1;
-
-   unlink(path);
+   }

vserver->listenfd = uds_socket(path);
if (vserver->listenfd < 0) {
free(vserver);
+   pthread_mutex_unlock(&g_vhost_server.server_mutex);
return -1;
}
-   vserver->path = path;
+
+   vserver->path = strdup(path);

fdset_add(&g_vhost_server.fdset, vserver->listenfd,
-   vserver_new_vq_conn, NULL,
-   vserver);
+   vserver_new_vq_conn, NULL, vserver);

-   g_vhost_server.server[vserver_idx++] = vserver;
+   g_vhost_server.server[g_vhost_server.vserver_cnt++] = vserver;
+   pthread_mutex_unlock(&g_vhost_server.server_mutex);

return 0;
 }


+/**
+ * Unregister the specified vhost server
+ */
+int
+rte_vhost_driver_unregister(const char *path)
+{
+   int i;
+   int count;
+
+   pthread_mutex_lock(&g_vhost_server.server_mutex);
+
+   for (i = 0; i < g_vhost_server.vserver_cnt; i++) {
+   if (!strcmp(g_vhost_server.server[i]->path, path)) {
+   fdset_del(&g_vhost_server.fdset,
+   g_vhost_server.server[i]->listenfd);
+
+   close(g_vhost_server.server[i]->listenfd);
+   free(g_vhost_server.server[i]->path);
+   free(g_vhost_server.server[i]);
+
+

[dpdk-dev] [PATCH 2/2] ethtool: add new library to provide ethtool-alike APIs

2015-06-05 Thread Wang, Liang-min



> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Friday, June 05, 2015 6:47 AM
> To: Andrew Harvey (agh)
> Cc: Stephen Hemminger; Wang, Liang-min; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 2/2] ethtool: add new library to provide
> ethtool-alike APIs
> 
> 2015-06-04 22:10, Andrew Harvey:
> > > On 6/4/15, 7:58 AM, "Stephen Hemminger" wrote:
> > >"Andrew Harvey (agh)"  wrote:
> > >> I believe that there is value in this interface for software stacks
> > >>not based on Linux being moved toward DPDK that need simple
> > >>operations like getting the mac address.  Some of these stacks have
> > >>a dearth of resources available and dedicating a core/thread to KNI
> > >>to get/set a mac address is considered excessive. There are also
> > >>issues with 32/64 bit kernel integration using KNI.  If the
> > >>ethtool interface is not the correct interface then please help me
> > >>understand what should/could have been used. If ethtool is
> > >>considered 'old and clunky', Stephen's and your input would be
> > >>valuable in designing another interface with similar properties.
> > >>The use-case is pretty simple and there are no plans for moving
> > >>anything back into the kernel; on the contrary, it's the complete opposite.
> > >>
> > >> - Andy
> > >
> > >We have DPDK API's to do this, and any added wrappers make it bigger.
> > >I don't see why calling your ethtool API is better than calling
> > >rte_eth* API.
> > >
> > >If there is a missing functionality in the rte_ethXXX api's for an
> > >application then add that. For example: rte_eth_mac_addr_get()
> >
> > I am getting somewhat confused by your latest comments.  Your first
> > email (referenced below) looked really positive and I found your
> > suggestions useful. Your latest post appears to contradict this and
> > now the interface was there all the time.  The wrapper façade provided
> > by the ethtool library provides a clean separation of concerns and will
> > allow people to migrate from not only KNI but in our case from a
> > legacy system.  If a software stack has requirements to work with
> > multiple IO abstractions then the ethtool approach is attractive. I
> > would speculate that many other stacks moving towards dpdk will have
> > similar issues.
> >
> > Summarizing, for our use-cases the ethtool interface facilitated our
> > adoption to dpdk while allowing us to support our legacy IO abstractions.
> 
> Stephen and me say the same thing about using the ethdev API.
> We don't understand why using a fake ethtool lib would be easier.
> Though you are saying it "facilitated [your] adoption to dpdk".
> Please could you explain why using an ethtool-like API is easier than using
> the existing ethdev API?
> In any case, you have to develop a specific backend for DPDK (rte_ethtool
> would be also DPDK-specific).

As described earlier in this patch comment reply, there are other ethtool ops
that have been implemented.
Those ops include set/get eeprom, set/get pauseparam and set/get ringparam, which
are not available in the existing ethdev library.
For this release, we focus on releasing some basic functions (btw, mac_addr_set
is not available in ethdev but is covered by this patch).
The key reason that this library is not released as part of ethdev is
the ethtool API's dependency on a kernel include file.
To faithfully carry the ethtool ops and net dev ops API parameters, the ethtool
APIs are designed to follow the original definitions, except that they avoid
carrying kernel state.
With that, to support ethtool APIs faithfully, we need to include 
. 
As suggested by many DPDK veterans including Thomas (indicated over your 
reply), you would prefer these APIs in a separate library.

> 
> It seems you already started to use such an ethtool implementation.
> Please note that our goal is not to prevent Cisco from upstreaming (evidence
> with enic driver integration) but we want to guide you, and others having the
> same needs, to the best solution for everybody.
> That's why we need to understand what we (or you) are missing.
> Maybe it would be clearer with some code examples (which would go in
> the lib documentation, if any).
> 
> Thanks

[dpdk-dev] [PATCH] fm10k: fix PF/VF MAC address register and clean up bug

2015-06-05 Thread Shaopeng He

MAC address with fixed VLAN 0 was removed. VF MAC/VLAN filter was enabled
for the default value. Removed all VLAN and MAC address table entries when
the system (e.g. testpmd) was closed.

Signed-off-by: Shaopeng He 
---
 drivers/net/fm10k/fm10k_ethdev.c | 38 +++---
 1 file changed, 23 insertions(+), 15 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 9b198a7..587debc 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -58,6 +58,8 @@ static int
 fm10k_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on);
 static void
 fm10k_MAC_filter_set(struct rte_eth_dev *dev, const u8 *mac, bool add);
+static void
+fm10k_MACVLAN_remove_all(struct rte_eth_dev *dev);

 static void
 fm10k_mbx_initlock(struct fm10k_hw *hw)
@@ -703,6 +705,8 @@ fm10k_dev_close(struct rte_eth_dev *dev)

PMD_INIT_FUNC_TRACE();

+   fm10k_MACVLAN_remove_all(dev);
+
/* Stop mailbox service first */
fm10k_close_mbx_service(hw);
fm10k_dev_stop(dev);
@@ -830,10 +834,6 @@ fm10k_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
macvlan = FM10K_DEV_PRIVATE_TO_MACVLAN(dev->data->dev_private);

-   /* @todo - add support for the VF */
-   if (hw->mac.type != fm10k_mac_pf)
-   return -ENOTSUP;
-
if (vlan_id > ETH_VLAN_ID_MAX) {
PMD_INIT_LOG(ERR, "Invalid vlan_id: must be < 4096");
return (-EINVAL);
@@ -916,10 +916,6 @@ fm10k_MAC_filter_set(struct rte_eth_dev *dev, const u8 *mac, bool add)
hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
macvlan = FM10K_DEV_PRIVATE_TO_MACVLAN(dev->data->dev_private);

-   /* @todo - add support for the VF */
-   if (hw->mac.type != fm10k_mac_pf)
-   return;
-
fm10k_mbx_lock(hw);
i = 0;
for (j = 0; j < FM10K_VFTA_SIZE; j++) {
@@ -970,6 +966,25 @@ fm10k_macaddr_remove(struct rte_eth_dev *dev, uint32_t index)
FALSE);
 }

+/* Remove all VLAN and MAC address table entries */
+static void
+fm10k_MACVLAN_remove_all(struct rte_eth_dev *dev)
+{
+   uint32_t j, k;
+   struct fm10k_macvlan_filter_info *macvlan;
+
+   macvlan = FM10K_DEV_PRIVATE_TO_MACVLAN(dev->data->dev_private);
+   for (j = 0; j < FM10K_VFTA_SIZE; j++) {
+   if (macvlan->vfta[j]) {
+   for (k = 0; k < FM10K_UINT32_BIT_SIZE; k++) {
+   if (macvlan->vfta[j] & (1 << k))
+   fm10k_vlan_filter_set(dev,
+   j * FM10K_UINT32_BIT_SIZE + k, false);
+   }
+   }
+   }
+}
+
 static inline int
 check_nb_desc(uint16_t min, uint16_t max, uint16_t mult, uint16_t request)
 {
@@ -1997,13 +2012,6 @@ eth_fm10k_dev_init(struct rte_eth_dev *dev)
/* Enable port first */
hw->mac.ops.update_lport_state(hw, hw->mac.dglort_map, 1, 1);

-   /*
-* Add default mac. glort is assigned by SM for PF, while is
-* unused for VF. PF will assign correct glort for VF.
-*/
-   hw->mac.ops.update_uc_addr(hw, hw->mac.dglort_map, hw->mac.addr,
-   0, 1, 0);
-
/* Set unicast mode by default. App can change to other mode in other
 * API func.
 */
-- 
1.9.3
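
The bitmap walk in fm10k_MACVLAN_remove_all() can be illustrated on its own
as below. This is a sketch under stated assumptions, not the driver code: the
demo_* names are hypothetical, the callback stands in for the per-VLAN filter
call, and the table is treated as read-only during the walk. The sketch
shifts an unsigned constant so that testing bit 31 is well-defined.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_VFTA_SIZE 128      /* 128 words x 32 bits = 4096 VLAN IDs */
#define DEMO_BITS_PER_WORD 32

typedef void (*demo_vlan_cb)(uint16_t vlan_id, void *arg);

/* Call cb for every VLAN ID whose bit is set; return how many were visited. */
static unsigned demo_vfta_for_each(const uint32_t vfta[DEMO_VFTA_SIZE],
				   demo_vlan_cb cb, void *arg)
{
	unsigned j, k, n = 0;

	for (j = 0; j < DEMO_VFTA_SIZE; j++) {
		if (vfta[j] == 0)
			continue;       /* skip empty words quickly */
		for (k = 0; k < DEMO_BITS_PER_WORD; k++) {
			if (vfta[j] & (UINT32_C(1) << k)) {
				cb((uint16_t)(j * DEMO_BITS_PER_WORD + k), arg);
				n++;
			}
		}
	}
	return n;
}

/* Example callback: accumulate the visited IDs into a running sum. */
static void demo_record_sum(uint16_t vlan_id, void *arg)
{
	*(uint32_t *)arg += vlan_id;
}
```

The VLAN ID is recovered as word index times 32 plus bit index, the same
arithmetic the patch passes to fm10k_vlan_filter_set().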

[dpdk-dev] Running testpmd over KNI

2015-06-05 Thread Bruce Richardson

On Thu, Jun 04, 2015 at 02:01:19PM -0700, Navneet Rao wrote:
> Running ---
> 
> ./testpmd -c7 -n3 --vdev=eth_pcap0,iface=vEth0 --vdev=eth_pcap1,iface=vEth1 -- -i --nb-cores=2 --nb-ports=2 --total-num-mbufs=1024
> 
> results in a
> 
> EAL: Error - exiting with code: 1
>   Cause: Cannot create lock on '/var/run/.rte_config'. Is another primary process running?
> 
> I don't think I am running another process using testpmd!!!
> 
> Any ideas to debug this?
> 
> Thanks
> -Navneet

Hi Navneet,

I'm a little unclear on your setup here. You are using a DPDK process to pull
packets from a physical NIC and send them to the kernel using KNI. Then you want
to have testpmd pull those packets from the KNI device using pcap back into
userspace before returning them via the same sort of path, i.e. userspace, pcap to
kernel, KNI back to userspace and out again. Can you explain why you want
such a setup, as it will work very slowly compared to just running everything
directly in userspace?

As for your specific issue: if you have a DPDK process running to manage the KNI
device, that is the process holding the lock on .rte_config. You will need to
run the second process with a different file-prefix parameter to have two
DPDK processes running side-by-side.

Regards,
/Bruce
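
To illustrate why a different file prefix avoids the clash, here is a small
sketch of a per-prefix lock file. It assumes the config path follows a
"<dir>/.<prefix>_config" pattern, as the '/var/run/.rte_config' in the error
message suggests; demo_lock_config() is a hypothetical helper, and flock() is
used here for simplicity, so the locking call EAL actually uses may differ.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Take an exclusive, non-blocking lock on "<dir>/.<prefix>_config".
 * Returns the open descriptor on success, or -1 if the file cannot be
 * opened or another holder already has the lock.
 */
static int demo_lock_config(const char *dir, const char *prefix)
{
	char path[256];
	int fd, n;

	assert(dir != NULL && prefix != NULL);
	n = snprintf(path, sizeof(path), "%s/.%s_config", dir, prefix);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd < 0)
		return -1;

	if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
		close(fd);      /* somebody else holds this prefix */
		return -1;
	}
	return fd;
}
```

Two holders of the same prefix collide, while a second prefix gets its own
file and its own lock, which is the effect of giving the second process a
different --file-prefix value.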

[dpdk-dev] [PATCH 2/2] vhost: realloc virtio_net and virtqueue to the same node of vring desc table

2015-06-05 Thread Huawei Xie

When we get the address of the vring descriptor table in the VHOST_SET_VRING_ADDR
message, we will try to reallocate the virtio_net device and virtqueue to the
same NUMA node.

Signed-off-by: Huawei Xie 
---
 config/common_linuxapp|  1 +
 lib/librte_vhost/Makefile |  4 ++
 lib/librte_vhost/virtio-net.c | 93 +++
 mk/rte.app.mk |  3 ++
 4 files changed, 101 insertions(+)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 0078dc9..4ace24e 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -421,6 +421,7 @@ CONFIG_RTE_KNI_VHOST_DEBUG_TX=n
 #
 CONFIG_RTE_LIBRTE_VHOST=n
 CONFIG_RTE_LIBRTE_VHOST_USER=y
+CONFIG_RTE_LIBRTE_VHOST_NUMA=n
 CONFIG_RTE_LIBRTE_VHOST_DEBUG=n

 #
diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
index a8645a6..6681f22 100644
--- a/lib/librte_vhost/Makefile
+++ b/lib/librte_vhost/Makefile
@@ -46,6 +46,10 @@ CFLAGS += -I vhost_cuse -lfuse
 LDFLAGS += -lfuse
 endif

+ifeq ($(CONFIG_RTE_LIBRTE_VHOST_NUMA),y)
+LDFLAGS += -lnuma
+endif
+
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := virtio-net.c vhost_rxtx.c
 ifeq ($(CONFIG_RTE_LIBRTE_VHOST_USER),y)
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 19b74d6..8a80f5e 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -38,6 +38,9 @@
 #include 
 #include 
 #include 
+#ifdef RTE_LIBRTE_VHOST_NUMA
+#include 
+#endif

 #include 

@@ -481,6 +484,93 @@ set_vring_num(struct vhost_device_ctx ctx, struct vhost_vring_state *state)
 }

 /*
+ * Reallocate virtio_det and vhost_virtqueue data structure to make them on the
+ * same numa node as the memory of vring descriptor.
+ */
+#ifdef RTE_LIBRTE_VHOST_NUMA
+static struct virtio_net*
+numa_realloc(struct virtio_net *dev, int index)
+{
+   int oldnode, newnode;
+   struct virtio_net_config_ll *old_ll_dev, *new_ll_dev;
+   struct vhost_virtqueue *old_vq, *new_vq;
+   int ret;
+   int realloc_dev = 0, realloc_vq = 0;
+
+   old_ll_dev = (struct virtio_net_config_ll *)dev;
+   old_vq = dev->virtqueue[index];
+
+   ret  = get_mempolicy(&newnode, NULL, 0, old_vq->desc,
+   MPOL_F_NODE | MPOL_F_ADDR);
+   ret = ret | get_mempolicy(&oldnode, NULL, 0, old_ll_dev,
+   MPOL_F_NODE | MPOL_F_ADDR);
+   if (ret) {
+   RTE_LOG(ERR, VHOST_CONFIG,
+   "Unable to get vring desc or dev numa information.\n");
+   return dev;
+   }
+   if (oldnode != newnode)
+   realloc_dev = 1;
+
+   ret = get_mempolicy(&oldnode, NULL, 0, old_vq,
+   MPOL_F_NODE | MPOL_F_ADDR);
+   if (ret) {
+   RTE_LOG(ERR, VHOST_CONFIG,
+   "Unable to get vq numa information.\n");
+   return dev;
+   }
+   if (oldnode != newnode)
+   realloc_vq = 1;
+
+   if (realloc_dev == 0 && realloc_vq == 0)
+   return dev;
+
+   if (realloc_dev)
+   new_ll_dev = rte_malloc_socket(NULL,
+   sizeof(struct virtio_net_config_ll), 0, newnode);
+   if (realloc_vq)
+   new_vq = rte_malloc_socket(NULL,
+   sizeof(struct vhost_virtqueue), 0, newnode);
+   if (!new_ll_dev || !new_vq) {
+   if (new_ll_dev)
+   rte_free(new_ll_dev);
+   if (new_vq)
+   rte_free(new_vq);
+   return dev;
+   }
+
+   if (realloc_vq)
+   memcpy(new_vq, old_vq, sizeof(*new_vq));
+   if (realloc_dev)
+   memcpy(new_ll_dev, old_ll_dev, sizeof(*new_ll_dev));
+   (new_ll_dev ? new_ll_dev : old_ll_dev)->dev.virtqueue[index] =
+   new_vq ? new_vq : old_vq;
+   if (realloc_vq)
+   rte_free(old_vq);
+   if (realloc_dev) {
+   if (ll_root == old_ll_dev)
+   ll_root = new_ll_dev;
+   else {
+   struct virtio_net_config_ll *prev = ll_root;
+   while (prev->next != old_ll_dev)
+   prev = prev->next;
+   prev->next = new_ll_dev;
+   new_ll_dev->next = old_ll_dev->next;
+   }
+   rte_free(old_ll_dev);
+   }
+
+   return _ll_dev->dev;
+}
+#else
+static struct virtio_net*
+numa_realloc(struct virtio_net *dev, int index __rte_unused)
+{
+   return dev;
+}
+#endif
+
+/*
  * Called from CUSE IOCTL: VHOST_SET_VRING_ADDR
  * The virtio device sends us the desc, used and avail ring addresses.
  * This function then converts these to our address space.
@@ -508,6 +598,9 @@ set_vring_addr(struct vhost_device_ctx ctx, struct vhost_vring_addr *addr)
return -1;
}

+   dev = numa_realloc(dev, addr->index);
+   vq = dev->virtqueue[addr->index];
+
vq->avail = (struct vring_avail

[dpdk-dev] [PATCH 1/2] vhost: malloc -> rte_malloc for virtio_net and virt queue allocation

2015-06-05 Thread Huawei Xie

use rte_malloc/free for virtio_net and virt queue allocation/free

Signed-off-by: Huawei Xie 
---
 lib/librte_vhost/virtio-net.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 4672e67..19b74d6 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 

 #include "vhost-net.h"
@@ -202,9 +203,9 @@ static void
 free_device(struct virtio_net_config_ll *ll_dev)
 {
/* Free any malloc'd memory */
-   free(ll_dev->dev.virtqueue[VIRTIO_RXQ]);
-   free(ll_dev->dev.virtqueue[VIRTIO_TXQ]);
-   free(ll_dev);
+   rte_free(ll_dev->dev.virtqueue[VIRTIO_RXQ]);
+   rte_free(ll_dev->dev.virtqueue[VIRTIO_TXQ]);
+   rte_free(ll_dev);
 }

 /*
@@ -278,7 +279,7 @@ new_device(struct vhost_device_ctx ctx)
struct vhost_virtqueue *virtqueue_rx, *virtqueue_tx;

/* Setup device and virtqueues. */
-   new_ll_dev = malloc(sizeof(struct virtio_net_config_ll));
+   new_ll_dev = rte_malloc(NULL, sizeof(struct virtio_net_config_ll), 0);
if (new_ll_dev == NULL) {
RTE_LOG(ERR, VHOST_CONFIG,
"(%"PRIu64") Failed to allocate memory for dev.\n",
@@ -286,19 +287,19 @@ new_device(struct vhost_device_ctx ctx)
return -1;
}

-   virtqueue_rx = malloc(sizeof(struct vhost_virtqueue));
+   virtqueue_rx = rte_malloc(NULL, sizeof(struct vhost_virtqueue), 0);
if (virtqueue_rx == NULL) {
-   free(new_ll_dev);
+   rte_free(new_ll_dev);
RTE_LOG(ERR, VHOST_CONFIG,
"(%"PRIu64") Failed to allocate memory for rxq.\n",
ctx.fh);
return -1;
}

-   virtqueue_tx = malloc(sizeof(struct vhost_virtqueue));
+   virtqueue_tx = rte_malloc(NULL, sizeof(struct vhost_virtqueue), 0);
if (virtqueue_tx == NULL) {
-   free(virtqueue_rx);
-   free(new_ll_dev);
+   rte_free(virtqueue_rx);
+   rte_free(new_ll_dev);
RTE_LOG(ERR, VHOST_CONFIG,
"(%"PRIu64") Failed to allocate memory for txq.\n",
ctx.fh);
-- 
1.8.1.4

[dpdk-dev] [PATCH 0/2] vhost: numa aware allocation of virtio_net device and vhost virt queue

2015-06-05 Thread Huawei Xie

The virtio_net device and vhost virt queue should be allocated on the same NUMA
node as the vring descriptors.
When we first allocate the virtio_net device and vhost virt queue, we don't
know the NUMA node of the vring descriptors.
When we receive the VHOST_SET_VRING_ADDR message, we get the NUMA node of the
vring descriptors, so we will try to reallocate the virtio_net device and vhost
virt queue to the same NUMA node.
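
The list fix-up that such a reallocation needs can be sketched without any
NUMA dependency as below. The demo_* names are hypothetical and malloc()
merely stands in for a node-aware allocator; the point is only that whichever
pointer referred to the old object (the list root or a predecessor's next)
must be redirected to the copy before the old object is freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct demo_node {
	int payload;
	struct demo_node *next;
};

/*
 * Replace `old` in the list rooted at *root with a fresh copy and free `old`.
 * Returns the new node, or `old` unchanged when allocation fails or when
 * `old` is not on the list.
 */
static struct demo_node *demo_relocate(struct demo_node **root,
				       struct demo_node *old)
{
	struct demo_node **link = root;
	struct demo_node *copy;

	assert(root != NULL && old != NULL);

	/* Find the pointer that refers to `old`: the root or some ->next. */
	while (*link != NULL && *link != old)
		link = &(*link)->next;
	if (*link == NULL)
		return old;

	copy = malloc(sizeof(*copy));
	if (copy == NULL)
		return old;

	memcpy(copy, old, sizeof(*copy));   /* carries payload and next */
	*link = copy;
	free(old);
	return copy;
}
```

Falling back to the old object on any failure keeps the device usable, at the
cost of leaving it on the original node.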

Huawei Xie (2):
  use rte_malloc/free for virtio_net and virt_queue memory data allocation/free
  When we get the address of vring descriptor table, will try to reallocate virtio_net device and virtqueue to the same numa node.

 config/common_linuxapp|   1 +
 lib/librte_vhost/Makefile |   4 ++
 lib/librte_vhost/virtio-net.c | 112 ++
 mk/rte.app.mk |   3 ++
 4 files changed, 111 insertions(+), 9 deletions(-)

-- 
1.8.1.4
