[dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet

2016-10-20 Thread Oleg Kuporosov
Hello Konstantin,

> 
> My vote also would be to have timestamp in the second cache line.
> About moving seqn to the 2-nd cache line too - that's probably a fair point.

It may impact throughput till ~6% for applications required such embedded 
Timestamps.

> About the rest of the patch:
> Do you really need to put that code into the PMDs itself?
> Can't the same result be achieved by using RX callbacks?
> Again that approach would work with any PMD and there would be no need
> to modify PMD code itself.

Correct, the approach with using callbacs 
(rte_eth_timesync_read_[r|t]x_timestamp())
Has also some Cons for this use case:
- FSI needs the most accurate stamping as possible by reasons were described in
Cover letter
- callback will be called from user app and so you have to take into account 
Difference between time when packet was released by NIC and callback call
- such difference is not easy to estimate correctly due to dependency on CPU 
type,
Its frequency and current load conditions
- even estimated it would be hard without additional performance penalty to 
align
Packet with timestamp, taking account that call may actually placed from
Different thread or even process.

It looks the least impacting and correct way is to have timestamp in rte_mbuf 
and fill
It in Rx burst procedure.

> Another thing, that I am in doubt why to use system time?
> Wouldn't raw HW TSC value (rte_rdtsc()) do here?

System time is being used for periodic clock synchronization between wall
clock and NIC to estimate NIC clock deviation. It is in assumption the system 
itself is
synchronized by PTP from master clock. It is run on context of control thread.

Thanks,
Oleg.


[dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet

2016-10-19 Thread Oleg Kuporosov
Hello Reshma

> I just read this mail chain, to make every one aware again, I am emphasizing
> on the point that I am also adding new "time_arraived" field to mbuf struct as
> part of  below 17.02 Road map item.

Thanks for your work for extending statistics support in DPDK.

"time_arrived" points here for mostly RX path, while more common term used 
"timestamp"
Allows also using it for TX path as well in the future.

Best regards,
Oleg.


[dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet

2016-10-19 Thread Oleg Kuporosov
Hello Oliver

Great thanks for your review and con

> > - include uint64_t timestamp field into rte_mbuf with minimal impact to
> >   throughput/latency. Keep it just simple uint64_t in ns (more than 580
> >   years) would be enough for immediate needs while using full
> >   struct timespec with twice bigger size would have much stronger
> >   performance impact as missed cacheline0.
> >
> > - it is possible as there is 6-bytes gap in 1st cacheline (fast path)
> >   and moving uint16_t vlan_tci_outer field to 2nd cacheline.
> >
> > - such move will only impact for pretty rare usable VLAN RX stripping
> >   mode for outer TCI (it used only for one NIC i40e from the whole set and
> >   allows to keep minimal performance impact for RX/TX timestamps.
> 
> This argument is difficult to accept. One can say you are adding a field for a
> pretty rare case used by only one NIC :)

It  may be looks so, but in fact not only for one NIC. Absence of timestamp 
there 
Required from developers implement its support in out of DPDK data path with
Local DPDK patching which also may lead some penalty in accuracy.
Good example here is open source implementation of tracing library - 
https://github.com/wanduow/libtrace/tree/libtrace4

These folks patched DPDK to have timestamp for every packet (Intel DPDK Patches 
folder)
And used it in out of band to store and later processing (lib/format_dpdk.c).
That was actually the starting point of my investigations.
Such approach is working for 1GBb case but not for 10-100 cases.

> 
> Honestly, I'm not able to judge whether timestamp is more important than
> vlan_tci_outer. As room is tight in the first cache line, your patch 
> submission
> is the occasion to raise the question: how to decide what should be in the
> first part of the mbuf? There are also some other candidates for moving: m-
> >seqn is only used in librte_reorder and it is not set in the RX part of a 
> >driver.

Agree, it is difficult to decide, my thoughts were:
- there is hole (6 bytes) which wasn't marked as reserved for any planned 
extensions;
-vlan_tci_outer is being used by only one NIC (i40e) so far, 2nd NIC (fm10k) is 
using it
With comment:
 * mbuf->vlan_tci_outer is an idle field in fm10k driver,
 * so it can be selected to store sglort value.
To store some another value under some specific "if".

Also for i40e it is under #ifndef RTE_LIBRTE_I40E_16BYTE_RX_DESC which per doc 
should be
Enabled for high throughput of small packets. So in default case (disabled) it 
anyway has
some performance penalty with using 32 bytes descriptor and moving it to 2nd CL 
would
not hit big additional penalty. Unfortunately I have no such NIC to measure.
Is there any data how often double tagging in being used  in  DPDK applications?

Another my thought was to have at the end of 1st CL enum which may hold
Reserved fields per specific use cases and data widths (uint8, 2xuint4, 
4xuint2, 8xbytes).

> 
> About the timestamp, it would be valuable to have other opinions, not only
> about the placement of the field in the structure, but also to check that this
> API is also usable for other NICs.

Sure, but I didn't change timesync/timestamping API itself.

> Have you measured the impact of having the timestamp in the second part
> of the mbuf?

Yes, the worst case with prefetching of 2nd CL it is 5.7-5.9 % penalty for 
combined RX+TX
And expectedly much worse without prefetching.
In the best case it is 0.3..0.5 % for RX only. It can be explained by much 
harder cache
trashing when TX is "on".

> Changing the mbuf structure should happen as rarely as possible, and we
> have to make sure we take the correct decisions. I think we will discuss this 
> at
> dpdk userland 2016.

Oh, yes, please discuss, I would not be able to join. :(

> Apart from that, I wonder if an ol_flag should be added to tell that the
> timestamp field is valid in the mbuf.

Oliver, there is PKT_RX_IEEE1588_TMST flag already.

Best regards,
Oleg.


[dpdk-dev] [PATCH 3/3] net/mlx5: implementation of Rx packet timestamping support

2016-10-13 Thread Oleg Kuporosov
Accurate RX timestaping is minimal requirement for several important
financial services industry requirement for several applications like
packet capture.

Support of periodic time synchronization with system time and adjusting
clock deviation was also added. We assume the system is synchronized by
PTP itself, so there is no links to PTP client implementation in DPDK.
Time synchronization is run by control thread by rte_alarm for each 5
seconds.  New synchronization object is copied to each configured RXQ
for TS calculation.

RX timestamp is calculated in nanoseconds and stored in rte_mbuf.
RX timestamp doesn't require additional API as simply can be
read from timestamp field of rte_mbuf.

TX timestamps were not added due to several reasons - there is no API yet,
issue with burst mode as TS will be provided only for the latest packet in
the burst and rte_mbuf will be discarded immediately after completion.

Enabling/disabling time synchronization DPDK API was added.

Signed-off-by: Oleg Kuporosov 
---
 drivers/net/mlx5/mlx5.c  |   7 +-
 drivers/net/mlx5/mlx5.h  |  10 +-
 drivers/net/mlx5/mlx5_defs.h |   4 +
 drivers/net/mlx5/mlx5_ethdev.c   | 222 ++-
 drivers/net/mlx5/mlx5_rxq.c  |   2 +
 drivers/net/mlx5/mlx5_rxtx.c |  19 ++-
 drivers/net/mlx5/mlx5_rxtx.h |   7 +-
 drivers/net/mlx5/mlx5_time.h |  53 
 drivers/net/mlx5/mlx5_trigger.c  |   1 +
 lib/librte_eal/common/include/rte_time.h |  45 +++
 10 files changed, 360 insertions(+), 10 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_time.h

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 83bdf65..64dab28 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1,8 +1,8 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright 2015 6WIND S.A.
- *   Copyright 2015 Mellanox.
+ *   Copyright 2015-2016 6WIND S.A.
+ *   Copyright 2015-2016 Mellanox.
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
@@ -122,6 +122,7 @@ mlx5_dev_close(struct rte_eth_dev *dev)
  (void *)dev,
  ((priv->ctx != NULL) ? priv->ctx->device->name : ""));
/* In case mlx5_dev_stop() has not been called. */
+   mlx5_timesync_disable(dev);
priv_dev_interrupt_handler_uninstall(priv, dev);
priv_special_flow_disable_all(priv);
priv_mac_addrs_disable(priv);
@@ -219,6 +220,8 @@ static const struct eth_dev_ops mlx5_dev_ops = {
.rss_hash_update = mlx5_rss_hash_update,
.rss_hash_conf_get = mlx5_rss_hash_conf_get,
.filter_ctrl = mlx5_dev_filter_ctrl,
+   .timesync_enable = mlx5_timesync_enable,
+   .timesync_disable = mlx5_timesync_disable,
 };

 static struct {
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index d4fb5ff..49447c4 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1,8 +1,8 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright 2015 6WIND S.A.
- *   Copyright 2015 Mellanox.
+ *   Copyright 2015-2016 6WIND S.A.
+ *   Copyright 2015-2016 Mellanox.
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
@@ -54,6 +54,7 @@
 #ifdef PEDANTIC
 #pragma GCC diagnostic ignored "-pedantic"
 #endif
+#include 
 #include 
 #include 
 #include 
@@ -64,6 +65,7 @@
 #endif

 #include "mlx5_utils.h"
+#include "mlx5_time.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_autoconf.h"
 #include "mlx5_defs.h"
@@ -113,6 +115,7 @@ struct priv {
unsigned int mps:1; /* Whether multi-packet send is supported. */
unsigned int cqe_comp:1; /* Whether CQE compression is enabled. */
unsigned int pending_alarm:1; /* An alarm is pending. */
+   unsigned int timesync_en:1; /* Timesync (timestamping) enabled */
unsigned int txq_inline; /* Maximum packet size for inlining. */
unsigned int txqs_inline; /* Queue number threshold for inlining. */
/* RX/TX queues. */
@@ -120,6 +123,7 @@ struct priv {
unsigned int txqs_n; /* TX queues array size. */
struct rxq *(*rxqs)[]; /* RX queues. */
struct txq *(*txqs)[]; /* TX queues. */
+   struct mlx5_timesync timesync; /* time synronization object */
/* Indirection tables referencing all RX WQs. */
struct ibv_exp_rwq_ind_table *(*ind_tables)[];
unsigned int ind_tables_n; /* Number of indirection tables. */
@@ -203,6 +207,8 @@ int mlx5_set_link_up(struct rte_eth_dev *dev);
 struct priv *mlx5_secondary_data_setup(struct priv *priv);
 void priv_select_tx_function(struct priv *);
 void priv_select_rx_function(struct priv *);
+int mlx5_timesync_enable(struct rte_eth_dev *dev);
+int mlx5_timesync_disable(struct rte_eth_dev *dev)

[dpdk-dev] [PATCH 2/3] app/testpmd: enabled control for packet timestamps

2016-10-13 Thread Oleg Kuporosov
Implemented two methods of control

- by --enable-timestamps CL testpmd application we can enable timestamping
  for all ports;
- in interactive mode port config  timestamps on|off is able to
  configure timestamping per specific port.

The control doesn't interact with IEEE1588 PTP implementation there as
it is under macro compilation but can be extended in the future.

This feature is required for debugging/testing purposes for real time HW
packet timestamping.

Signed-off-by: Oleg Kuporosov 
---
 app/test-pmd/cmdline.c  | 122 
 app/test-pmd/parameters.c   |   4 +
 app/test-pmd/testpmd.c  |   5 ++
 app/test-pmd/testpmd.h  |   1 +
 doc/guides/testpmd_app_ug/run_app.rst   |   4 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |   7 ++
 6 files changed, 143 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 17d238f..9b202ce 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -561,6 +561,9 @@ static void cmd_help_long_parsed(void *parsed_result,
"Set 
crc-strip/scatter/rx-checksum/hardware-vlan/drop_en"
" for ports.\n\n"

+   "port config (port_id|all) timestamps (on|off)\n"
+   "Enable/disable packet timestamps.\n\n"
+
"port config all rss 
(all|ip|tcp|udp|sctp|ether|port|vxlan|geneve|nvgre|none)\n"
"Set the RSS mode.\n\n"

@@ -10188,6 +10191,123 @@ cmdline_parse_inst_t 
cmd_config_l2_tunnel_en_dis_specific = {
},
 };

+/* Configure port timestamping */
+struct cmd_config_timestamps_result {
+   cmdline_fixed_string_t port;
+   cmdline_fixed_string_t config;
+   cmdline_fixed_string_t all;
+   uint8_t id;
+   cmdline_fixed_string_t timestamps;
+   cmdline_fixed_string_t mode;
+};
+
+cmdline_parse_token_string_t cmd_config_timestamps_port =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_timestamps_result,
+port, "port");
+cmdline_parse_token_string_t cmd_config_timestamps_config =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_timestamps_result,
+config, "config");
+cmdline_parse_token_string_t cmd_config_timestamps_all_str =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_timestamps_result,
+all, "all");
+cmdline_parse_token_num_t cmd_config_timestamps_id =
+   TOKEN_NUM_INITIALIZER
+   (struct cmd_config_timestamps_result,
+id, UINT8);
+cmdline_parse_token_string_t cmd_config_timestamps_ts_str =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_timestamps_result,
+   timestamps, "timestamps");
+cmdline_parse_token_string_t cmd_config_timestamps_path =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_timestamps_result,
+   mode, "on#off");
+
+/* enable/disable timestamps (on/off) for a port */
+static void
+cmd_config_timestamps_all_parsed(
+   void *parsed_result,
+   __attribute__((unused)) struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_config_timestamps_result *res = parsed_result;
+   portid_t pid;
+   uint8_t mode = 0;
+
+   if (!strcmp("on", res->mode))
+   mode = 1;
+   else if (!strcmp("off", res->mode))
+   mode = 0;
+   else {
+   printf("Unknown timestamps mode parameter\n");
+   return;
+   }
+   FOREACH_PORT(pid, ports) {
+   if (mode)
+   rte_eth_timesync_enable(pid);
+   else
+   rte_eth_timesync_disable(pid);
+   }
+}
+
+cmdline_parse_inst_t cmd_config_timestamps_all = {
+   .f = cmd_config_timestamps_all_parsed,
+   .data = NULL,
+   .help_str = "port config all timestamps on|off",
+   .tokens = {
+   (void *)_config_timestamps_port,
+   (void *)_config_timestamps_config,
+   (void *)_config_timestamps_all_str,
+   (void *)_config_timestamps_ts_str,
+   (void *)_config_timestamps_path,
+   NULL,
+   },
+};
+
+/* enable/disable timestamps (rx/tx/both) for a port */
+static void
+cmd_config_timestamps_specific_parsed(
+   void *parsed_result,
+   __attribute__((unused)) struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_config_timestamps_result *res =
+   parsed_result;
+   uint8_t mode = 0;
+
+   if (port_id_is_invalid(res->id, ENABLED_WARN))
+   return;
+   if (!strcmp("on", res->mode))
+   mode = 1;
+   else if (!strcmp(&

[dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet

2016-10-13 Thread Oleg Kuporosov
The hard requirement of financial services industry is accurate
timestamping aligned with the packet itself. This patch is to satisfy
this requirement:

- include uint64_t timestamp field into rte_mbuf with minimal impact to
  throughput/latency. Keep it just simple uint64_t in ns (more than 580
  years) would be enough for immediate needs while using full
  struct timespec with twice bigger size would have much stronger
  performance impact as missed cacheline0.

- it is possible as there is 6-bytes gap in 1st cacheline (fast path)
  and moving uint16_t vlan_tci_outer field to 2nd cacheline.

- such move will only impact for pretty rare usable VLAN RX stripping
  mode for outer TCI (it used only for one NIC i40e from the whole set and
  allows to keep minimal performance impact for RX/TX timestamps.

Signed-off-by: Oleg Kuporosov 
---
 lib/librte_mbuf/rte_mbuf.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 23b7bf8..1e1f2ed 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -851,8 +851,7 @@ struct rte_mbuf {

uint32_t seqn; /**< Sequence number. See also rte_reorder_insert() */

-   /** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ_STRIPPED is set. */
-   uint16_t vlan_tci_outer;
+   uint64_t timestamp;   /**< Packet's timestamp, usually in ns */

/* second cache line - fields only used in slow path or on TX */
MARKER cacheline1 __rte_cache_min_aligned;
@@ -885,6 +884,9 @@ struct rte_mbuf {
};
};

+   /** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ_STRIPPED is set. */
+   uint16_t vlan_tci_outer;
+
/** Size of the application private data. In case of an indirect
 * mbuf, it stores the direct mbuf private data size. */
uint16_t priv_size;
-- 
1.8.3.1



[dpdk-dev] [PATCH 0/3] Improvements in packet timestamps support

2016-10-13 Thread Oleg Kuporosov

Hello DPDK Developers,

Financial Services Industry which is pretty eager for several DPDK
features especially low latency while high throughput. The major issue
so far for increasing DPDK adoption there is requirement for several
applications (capturers, some trading algorithms) to support of accurate
timestamping. The requirement mostly came from regulatory and customers
need strictly follow it.

Current state of timestamping support in DPDK looks pretty good:
 - there is API to enable/disable timestamping acquisition by 
   rte_eth_timesync_enable/disable
 - get timestamps itself by timesync_read_rx/tx_timestamp
 - and even implementation of NTP IEEE 1588 for time synchronization
   by rte_eth_timesync_adjust_read/write_time APIs.

But it misses the most important feature there ? embedding timestamp
in rte_mbuf aligning it with packet.

We would like to change this to increase DPDK adoption for several new
DPDK-based applications for FSI segment. It also might be very
applicable for several RT media and debugging purposes of network
flows/streams in other segments like HPC.

There are several thoughts how to improve there:
 - include uint64_t timestamp field into rte_mbuf with minimal impact
   to throughput/latency. Keep it just simple uint64_t in ns (more than
   580 years) would be enough for immediate needs while using full
   struct timespec with twice bigger size would have much stronger
   performance impact as missed cacheline0. It is possible as there is
   6-bytes gap in 1st cacheline (fast path) and moving uint16_t
   vlan_tci_outer field to 2nd cacheline. 
 - such move will only impact for pretty rare usable VLAN RX stripping
   mode for outer TCI (it used only for one NIC i40e from the whole
   set and keep minimal performance impact for timestamps.  
 - PMD can fill the field in their callback completion routines
   depending on enabling this feature in runtime.

We evaluated other possible options but looks it will have even worse
performance impact.


Oleg Kuporosov (3):
  mbuf: embedding timestamp into the packet
  app/testpmd: enabled control for packet timestamps
  net/mlx5: implementation of Rx packet timestamping support

 app/test-pmd/cmdline.c  | 122 +++
 app/test-pmd/parameters.c   |   4 +
 app/test-pmd/testpmd.c  |   5 +
 app/test-pmd/testpmd.h  |   1 +
 doc/guides/testpmd_app_ug/run_app.rst   |   4 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |   7 +
 drivers/net/mlx5/mlx5.c |   7 +-
 drivers/net/mlx5/mlx5.h |  10 +-
 drivers/net/mlx5/mlx5_defs.h|   4 +
 drivers/net/mlx5/mlx5_ethdev.c  | 222 +++-
 drivers/net/mlx5/mlx5_rxq.c |   2 +
 drivers/net/mlx5/mlx5_rxtx.c|  19 ++-
 drivers/net/mlx5/mlx5_rxtx.h|   7 +-
 drivers/net/mlx5/mlx5_time.h|  53 +++
 drivers/net/mlx5/mlx5_trigger.c |   1 +
 lib/librte_eal/common/include/rte_time.h|  45 ++
 lib/librte_mbuf/rte_mbuf.h  |   6 +-
 17 files changed, 507 insertions(+), 12 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_time.h

Thanks,
Oleg.
-- 
1.8.3.1