Re: [dpdk-dev] [PATCH v4 2/2] doc: add guide for debug and troubleshoot

2019-01-20 Thread Varghese, Vipin
Thanks Marko, I will spin v5 with the changes asap.

Note: Just wondering why 'devtools/checkpatches.sh' did not report any error.

Thanks
Vipin Varghese

> -----Original Message-----
> From: Kovacevic, Marko
> Sent: Friday, January 18, 2019 8:59 PM
> To: Varghese, Vipin ; dev@dpdk.org;
> shreyansh.j...@nxp.com; tho...@monjalon.net
> Cc: Mcnamara, John ; Patel, Amol
> ; Padubidri, Sanjay A 
> Subject: RE: [PATCH v4 2/2] doc: add guide for debug and troubleshoot
> 
> After checking the patch again I found a few spelling mistakes
> 
> > Add user guide on debug and troubleshoot for common issues and
> > bottleneck found in sample application model.
> >
> > Signed-off-by: Vipin Varghese 
> > Acked-by: Marko Kovacevic 
> > ---
> >  doc/guides/howto/debug_troubleshoot_guide.rst | 375 ++
> >  doc/guides/howto/index.rst|   1 +
> >  2 files changed, 376 insertions(+)
> >  create mode 100644 doc/guides/howto/debug_troubleshoot_guide.rst
> >
> 
> <...>
> 
> receieve / receive
> 
> > +-  If stats for RX and drops updated on same queue? check receieve
> > thread
> > +-  If packet does not reach PMD? check if offload for port and queue
> > +   matches to traffic pattern send.
> > +
> 
> <...>
> 
> Offlaod/ offload
> 
> > +-  Is the packet multi segmented? Check if port and queue offlaod is set.
> > +
> > +Are there object drops in producer point for ring?
> > +~~
> 
> <...>
> 
> sufficent / sufficient
> 
> > +-  Are drops on specific socket? If yes check if there are sufficent
> > +   objects by rte_mempool_get_count() or rte_mempool_avail_count()
> > +-  Is 'rte_mempool_get_count() or rte_mempool_avail_count()' zero?
> > +   application requires more objects hence reconfigure number of
> > +   elements in rte_mempool_create().
> > +-  Is there single RX thread for multiple NIC? try having multiple
> > +   lcore to read from fixed interface or we might be hitting cache
> > +   limit, so increase cache_size for pool_create().
> > +
> 
> Sceanrios/ scenarios
> 
> > +#. Is performance low for some sceanrios?
> > +-  Check if sufficient objects in mempool by rte_mempool_avail_count()
> > +-  Is failure seen in some packets? we might be getting packets with
> > +   'size > mbuf data size'.
> > +-  Is NIC offload or application handling multi segment mbuf? check the
> > +   special packets are continuous with rte_pktmbuf_is_contiguous().
> > +-  If there separate user threads used to access mempool objects, use
> > +   rte_mempool_cache_create() for non DPDK threads.
> 
> debuging / debugging
> 
> > +-  Is the error reproducible with 1GB hugepage? If no, then try debuging
> > +   the issue with lookup table or objects with rte_mem_lock_page().
> > +
> > +.. note::
> > +  Stall in release of MBUF can be because
> 
> <...>
> 
> softwre / software
> 
> > +-  If softwre crypto is in use, check if the CRYPTO Library is build with
> > +   right (SIMD) flags or check if the queue pair using CPU ISA for
> > +   feature_flags AVX|SSE|NEON using rte_cryptodev_info_get()
> 
> Assited/ assisted
> 
> > +-  If its hardware assited crypto showing performance variance? Check if
> > +   hardware is on same NUMA socket as queue pair and session pool.
> > +
> 
> <...>
> 
> exceeeding / exceeding
> 
> > +   core? registered functions may be exceeeding the desired time slots
> > +   while running on same service core.
> > +-  Is function is running on RTE core? check if there are conflicting
> > +   functions running on same CPU core by rte_thread_get_affinity().
> > +
> 
> <...>
> 
> > +#. Where to capture packets?
> > +-  Enable pdump in primary to allow secondary to access queue-pair for
> > +   ports. Thus packets are copied over in RX|TX callback by secondary
> > +   process using ring buffers.
> > +-  To capture packet in middle of pipeline stage, user specific hooks
> > +   or callback are to be used to copy the packets. These packets can
> 
> secodnary / secondary
> 
> > +   be shared to secodnary process via user defined custom rings.
> > +
> > +Issue still persists?
> > +~
> > +
> > +#. Are there custom or vendor specific offload meta data?
> > +-  From PMD, then check for META data error and drops.
> > +-  From application, then check for META data error and drops.
> > +#. Is multiprocess is used configuration and data processing?
> > +-  Check enabling or disabling features from secondary is supported or
> > not?
> 
> Obejcts/ objects
> 
> > +#. Is there drops for certain scenario for packets or obejcts?
> > +-  Check user private data in objects by dumping the details for debug.
> > +
> <...>
> 
> Thanks,
> Marko K


Re: [dpdk-dev] [PATCH v4 2/2] doc: add guide for debug and troubleshoot

2019-01-18 Thread Kovacevic, Marko
After checking the patch again I found a few spelling mistakes

> Add user guide on debug and troubleshoot for common issues and
> bottleneck found in sample application model.
> 
> Signed-off-by: Vipin Varghese 
> Acked-by: Marko Kovacevic 
> ---
>  doc/guides/howto/debug_troubleshoot_guide.rst | 375 ++
>  doc/guides/howto/index.rst|   1 +
>  2 files changed, 376 insertions(+)
>  create mode 100644 doc/guides/howto/debug_troubleshoot_guide.rst
>

<...>
 
receieve / receive

> +-  If stats for RX and drops updated on same queue? check receieve
> thread
> +-  If packet does not reach PMD? check if offload for port and queue
> +   matches to traffic pattern send.
> +

<...>

Offlaod/ offload
 
> +-  Is the packet multi segmented? Check if port and queue offlaod is set.
> +
> +Are there object drops in producer point for ring?
> +~~

<...>

sufficent / sufficient 

> +-  Are drops on specific socket? If yes check if there are sufficent
> +   objects by rte_mempool_get_count() or rte_mempool_avail_count()
> +-  Is 'rte_mempool_get_count() or rte_mempool_avail_count()' zero?
> +   application requires more objects hence reconfigure number of
> +   elements in rte_mempool_create().
> +-  Is there single RX thread for multiple NIC? try having multiple
> +   lcore to read from fixed interface or we might be hitting cache
> +   limit, so increase cache_size for pool_create().
> +

Sceanrios/ scenarios
 
> +#. Is performance low for some sceanrios?
> +-  Check if sufficient objects in mempool by rte_mempool_avail_count()
> +-  Is failure seen in some packets? we might be getting packets with
> +   'size > mbuf data size'.
> +-  Is NIC offload or application handling multi segment mbuf? check the
> +   special packets are continuous with rte_pktmbuf_is_contiguous().
> +-  If there separate user threads used to access mempool objects, use
> +   rte_mempool_cache_create() for non DPDK threads.

debuging / debugging 

> +-  Is the error reproducible with 1GB hugepage? If no, then try debuging
> +   the issue with lookup table or objects with rte_mem_lock_page().
> +
> +.. note::
> +  Stall in release of MBUF can be because

<...>

softwre / software

> +-  If softwre crypto is in use, check if the CRYPTO Library is build with
> +   right (SIMD) flags or check if the queue pair using CPU ISA for
> +   feature_flags AVX|SSE|NEON using rte_cryptodev_info_get()

Assited/ assisted 

> +-  If its hardware assited crypto showing performance variance? Check if
> +   hardware is on same NUMA socket as queue pair and session pool.
> +

<...>

exceeeding / exceeding 

> +   core? registered functions may be exceeeding the desired time slots
> +   while running on same service core.
> +-  Is function is running on RTE core? check if there are conflicting
> +   functions running on same CPU core by rte_thread_get_affinity().
> +

<...>

> +#. Where to capture packets?
> +-  Enable pdump in primary to allow secondary to access queue-pair for
> +   ports. Thus packets are copied over in RX|TX callback by secondary
> +   process using ring buffers.
> +-  To capture packet in middle of pipeline stage, user specific hooks
> +   or callback are to be used to copy the packets. These packets can

secodnary / secondary 

> +   be shared to secodnary process via user defined custom rings.
> +
> +Issue still persists?
> +~
> +
> +#. Are there custom or vendor specific offload meta data?
> +-  From PMD, then check for META data error and drops.
> +-  From application, then check for META data error and drops.
> +#. Is multiprocess is used configuration and data processing?
> +-  Check enabling or disabling features from secondary is supported or
> not?

Obejcts/ objects 

> +#. Is there drops for certain scenario for packets or obejcts?
> +-  Check user private data in objects by dumping the details for debug.
> +
<...>

Thanks,
Marko K


[dpdk-dev] [PATCH v4 2/2] doc: add guide for debug and troubleshoot

2019-01-16 Thread Vipin Varghese
Add user guide on debug and troubleshoot for common issues and bottleneck
found in sample application model.

Signed-off-by: Vipin Varghese 
Acked-by: Marko Kovacevic 
---
 doc/guides/howto/debug_troubleshoot_guide.rst | 375 ++
 doc/guides/howto/index.rst|   1 +
 2 files changed, 376 insertions(+)
 create mode 100644 doc/guides/howto/debug_troubleshoot_guide.rst

diff --git a/doc/guides/howto/debug_troubleshoot_guide.rst 
b/doc/guides/howto/debug_troubleshoot_guide.rst
new file mode 100644
index 000000000..f2e337bb1
--- /dev/null
+++ b/doc/guides/howto/debug_troubleshoot_guide.rst
@@ -0,0 +1,375 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+Copyright(c) 2018 Intel Corporation.
+
+.. _debug_troubleshoot_via_pmd:
+
+Debug & Troubleshoot guide via PMD
+==================================
+
+DPDK applications can be designed to run as a single thread with a simple
+stage, or as multiple threads with complex pipeline stages. These
+applications can use poll mode devices, which help in offloading CPU
+cycles. A few models are
+
+  *  single primary
+  *  multiple primary
+  *  single primary single secondary
+  *  single primary multiple secondary
+
+In all the above cases, it is a tedious task to isolate, debug and
+understand odd behaviour which occurs randomly or periodically. The goal of
+this guide is to share and explore a few commonly seen patterns and
+behaviours, then isolate and identify the root cause via step by step debug
+at various processing stages.
+
+Application Overview
+--------------------
+
+Let us take an example application as reference to explain the issues and
+patterns commonly seen. The sample application in discussion makes use of
+the single primary model with various pipeline stages. The application uses
+PMDs and libraries such as service cores, mempool, pkt mbuf, event, crypto,
+QoS and eth.
+
+The overview of an application modeled using PMD is shown in
+:numref:`dtg_sample_app_model`.
+
+.. _dtg_sample_app_model:
+
+.. figure:: img/dtg_sample_app_model.*
+
+   Overview of pipeline stage of an application
+
+Bottleneck Analysis
+-------------------
+
+To debug bottleneck and performance issues, the desired application is made
+to run in an environment matching the following
+
+#. Linux 64-bit|32-bit
+#. DPDK PMD and libraries are used
+#. Libraries and PMD are either static or shared, but not both
+#. Machine flag optimizations of gcc or compiler are made constant
+
+Is there mismatch in packet rate (received < sent)?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+RX Port and associated core :numref:`dtg_rx_rate`.
+
+.. _dtg_rx_rate:
+
+.. figure:: img/dtg_rx_rate.*
+
+   RX send rate compared against received rate
+
+#. Are the generic configurations correct? (see the sketch below)
+-  What is the port speed and duplex? rte_eth_link_get()
+-  Are packets of higher sizes dropped? rte_eth_dev_get_mtu()
+-  Are only specific MACs received? rte_eth_promiscuous_get()
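+
+As a minimal illustrative sketch of the above checks (not part of the
+application flow; it assumes an already configured port ``port_id``, and the
+helper name is made up for illustration):
+
+.. code-block:: c
+
+   #include <stdio.h>
+   #include <rte_ethdev.h>
+
+   static void
+   dump_generic_config(uint16_t port_id)
+   {
+       struct rte_eth_link link;
+       uint16_t mtu;
+
+       /* port speed, duplex and link status */
+       rte_eth_link_get(port_id, &link);
+       printf("speed %u Mbps, full-duplex %u, up %u\n",
+              link.link_speed, (unsigned int)link.link_duplex,
+              (unsigned int)link.link_status);
+
+       /* packets larger than the MTU are dropped at the port */
+       if (rte_eth_dev_get_mtu(port_id, &mtu) == 0)
+           printf("mtu %u\n", mtu);
+
+       /* 1 = promiscuous on (all MACs accepted), 0 = off */
+       printf("promiscuous %d\n", rte_eth_promiscuous_get(port_id));
+   }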
+
+#. Are there NIC specific drops? (see the sketch below)
+-  Check rte_eth_rx_queue_info_get() for nb_desc and scattered_rx
+-  Is RSS enabled? rte_eth_dev_rss_hash_conf_get()
+-  Are packets spread on all queues? rte_eth_stats_get()
+-  If stats for RX and drops updated on same queue? check receieve thread
+-  If packet does not reach PMD? check if offload for port and queue
+   matches to traffic pattern send.
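+
+A sketch of the NIC-level checks above, assuming a configured port
+``port_id`` and RX queue ``queue_id`` (the helper name is illustrative):
+
+.. code-block:: c
+
+   #include <stdio.h>
+   #include <inttypes.h>
+   #include <rte_ethdev.h>
+
+   static void
+   dump_rxq_and_rss(uint16_t port_id, uint16_t queue_id)
+   {
+       struct rte_eth_rxq_info qinfo;
+       struct rte_eth_rss_conf rss_conf = { .rss_key = NULL };
+
+       /* descriptor count and scattered RX state of the queue */
+       if (rte_eth_rx_queue_info_get(port_id, queue_id, &qinfo) == 0)
+           printf("nb_desc %u scattered_rx %u\n",
+                  (unsigned int)qinfo.nb_desc,
+                  (unsigned int)qinfo.scattered_rx);
+
+       /* which protocol fields feed the RSS spread across queues */
+       if (rte_eth_dev_rss_hash_conf_get(port_id, &rss_conf) == 0)
+           printf("rss_hf 0x%" PRIx64 "\n", rss_conf.rss_hf);
+   }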
+
+#. If the problem still persists, it might be at the RX lcore thread
+-  Check if the RX thread, distributor or event RX adapter is processing
+   less than required
+-  Is the application built using a processing pipeline with an RX stage? If
+   there are multiple port-pairs tied to a single RX core, try to debug by
+   using rte_prefetch_non_temporal(), which hints that the mbuf in cache is
+   only needed temporarily (see the sketch below).
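+
+One possible shape of such an RX stage is sketched below; ``PKT_BURST`` and
+the function name are assumptions for illustration, not application code:
+
+.. code-block:: c
+
+   #include <rte_ethdev.h>
+   #include <rte_mbuf.h>
+   #include <rte_prefetch.h>
+
+   #define PKT_BURST 32 /* assumed application burst size */
+
+   static void
+   rx_stage(uint16_t port_id, uint16_t queue_id)
+   {
+       struct rte_mbuf *bufs[PKT_BURST];
+       uint16_t i, nb;
+
+       nb = rte_eth_rx_burst(port_id, queue_id, bufs, PKT_BURST);
+       for (i = 0; i < nb; i++) {
+           /* hint that the data is read once and need not stay cached */
+           rte_prefetch_non_temporal(rte_pktmbuf_mtod(bufs[i], void *));
+           /* ... hand the mbuf to the next pipeline stage ... */
+       }
+   }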
+
+Are there packet drops (receive|transmit)?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+RX-TX Port and associated cores :numref:`dtg_rx_tx_drop`.
+
+.. _dtg_rx_tx_drop:
+
+.. figure:: img/dtg_rx_tx_drop.*
+
+   RX-TX drops
+
+#. At RX
+-  Get RX queue count? nb_rx_queues using rte_eth_dev_info_get()
+-  Are there misses, errors, queue errors? rte_eth_stats_get() for imissed,
+   ierrors, q_errors, rx_nombuf, and rte_mbuf_refcnt_read() (sketch below)
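+
+A sketch of pulling these counters for a configured port ``port_id``; note
+the public API spelling is ``rte_eth_stats_get()`` and per-queue counters
+are capped at ``RTE_ETHDEV_QUEUE_STAT_CNTRS``:
+
+.. code-block:: c
+
+   #include <stdio.h>
+   #include <inttypes.h>
+   #include <rte_ethdev.h>
+
+   static void
+   dump_rx_drop_counters(uint16_t port_id)
+   {
+       struct rte_eth_stats st;
+       uint16_t q;
+
+       if (rte_eth_stats_get(port_id, &st) != 0)
+           return;
+       printf("imissed %" PRIu64 " ierrors %" PRIu64
+              " rx_nombuf %" PRIu64 "\n",
+              st.imissed, st.ierrors, st.rx_nombuf);
+       /* per-queue receive and error counters */
+       for (q = 0; q < RTE_ETHDEV_QUEUE_STAT_CNTRS; q++)
+           printf("q%u: rx %" PRIu64 " err %" PRIu64 "\n",
+                  (unsigned int)q, st.q_ipackets[q], st.q_errors[q]);
+   }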
+
+#. At TX
+-  Are you transmitting in bulk? Check the application for TX descriptor
+   overhead.
+-  Are there TX errors? rte_eth_stats_get() for oerrors and q_errors
+-  Are specific scenarios not releasing mbufs? Check rte_mbuf_refcnt_read()
+   for those packets (see the sketch below).
+-  Is the packet multi segmented? Check if port and queue offlaod is set.
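+
+At TX, the unsent tail of a burst must be freed by the application; a sketch
+(``bufs`` and ``nb`` are assumed to come from the application's TX path, and
+the mbuf refcount accessor is ``rte_mbuf_refcnt_read()``):
+
+.. code-block:: c
+
+   #include <stdio.h>
+   #include <rte_ethdev.h>
+   #include <rte_mbuf.h>
+
+   static void
+   tx_stage(uint16_t port_id, uint16_t queue_id,
+            struct rte_mbuf **bufs, uint16_t nb)
+   {
+       uint16_t sent, i;
+
+       sent = rte_eth_tx_burst(port_id, queue_id, bufs, nb);
+       for (i = sent; i < nb; i++) {
+           /* refcnt > 1 means another stage still holds this mbuf */
+           printf("unsent mbuf refcnt %u\n",
+                  (unsigned int)rte_mbuf_refcnt_read(bufs[i]));
+           rte_pktmbuf_free(bufs[i]); /* app owns the unsent tail */
+       }
+   }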
+
+Are there object drops in producer point for ring?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Producer point for ring :numref:`dtg_producer_ring`.
+
+.. _dtg_producer_ring:
+
+.. figure:: img/dtg_producer_ring.*
+
+   Producer point for Rings
+
+#. Performance for Producer
+-  Fetch the type of RING 'rte_ring_dump()' for flags (RING_F_SP_ENQ)
+-  If '(burst enqueue - actual enqueue) > 0'
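+
+For the producer checks above, a sketch assuming a ring ``r`` created by the
+application and ``objs``/``burst`` coming from its producer stage:
+
+.. code-block:: c
+
+   #include <stdio.h>
+   #include <rte_ring.h>
+
+   static void
+   probe_ring_producer(struct rte_ring *r, void **objs, unsigned int burst)
+   {
+       unsigned int n, free_space;
+
+       /* prints ring flags (e.g. RING_F_SP_ENQ), size and indexes */
+       rte_ring_dump(stdout, r);
+
+       n = rte_ring_enqueue_burst(r, objs, burst, &free_space);
+       if (n < burst)
+           /* producer is outpacing the consumer at this point */
+           printf("enqueued %u of %u, free space %u\n",
+                  n, burst, free_space);
+   }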