[dpdk-dev] sysfs permission problem when running dpdk app in lxc
I found that many others have faced the same issue as mine, but their questions got no replies. Anyway, there are three major security option groups in LXC: AppArmor, seccomp, and cgroup. It seems that some of these security options prevent my DPDK app from opening sysfs files in RW mode. I changed lxc.aa_profile to unconfined, and that resolved my issue.

On Mon, Sep 5, 2016 at 2:33 PM, Moon-Sang Lee wrote:
>
> I'm using ubuntu 16.04 LTS as my host and installed lxd on it.
> When I try to run my dpdk (2.2.0) app in the container, I get the
> following error message.
>
> EAL: lcore 9 is ready (tid=17fd700;cpuset=[9])
> EAL: lcore 3 is ready (tid=c915700;cpuset=[3])
> EAL: lcore 11 is ready (tid=ffc700;cpuset=[11])
> EAL: lcore 5 is ready (tid=27ff700;cpuset=[5])
> EAL: lcore 7 is ready (tid=1ffe700;cpuset=[7])
> EAL: lcore 13 is ready (tid=7fb700;cpuset=[13])
> EAL: PCI device 0000:06:00.0 on NUMA socket -1
> EAL:   probe driver: 8086:10e8 rte_igb_pmd
> EAL:   Not managed by a supported kernel driver, skipped
> EAL: PCI device 0000:06:00.1 on NUMA socket -1
> EAL:   probe driver: 8086:10e8 rte_igb_pmd
> EAL:   Not managed by a supported kernel driver, skipped
> EAL: PCI device 0000:07:00.0 on NUMA socket -1
> EAL:   probe driver: 8086:10e8 rte_igb_pmd
> EAL: Cannot open /sys/class/uio/uio0/device/config: Permission denied
> EAL: Error - exiting with code: 1
>   Cause: Requested device 0000:07:00.0 cannot be used
>
> However, the permissions look fine when I list those files.
> I appreciate any comments.
>
> root@test4:~# cat /proc/mounts | grep sysfs
> sysfs /sys sysfs rw,relatime 0 0
> root@test4:~# ls -al /sys/class/uio/uio0/device/config
> -rw-r--r-- 1 root root 4096 Sep 5 04:56 /sys/class/uio/uio0/device/config
> root@test4:~# ls -al /dev/uio0
> crw-rw-rw- 1 root root 243, 0 Sep 5 04:16 /dev/uio0
> root@test4:~#
>
> --
> Moon-Sang Lee, SW Engineer
> Email: sang0627 at gmail.com
> Wisdom begins in wonder. *Socrates*

--
Moon-Sang Lee, SW Engineer
Email: sang0627 at gmail.com
Wisdom begins in wonder. *Socrates*
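For anyone hitting the same AppArmor denial, this is roughly the change described above. A minimal sketch, assuming a classic LXC container config file; the key name differs across LXC versions (lxc.apparmor.profile in LXC 3.x and later), and LXD containers use their own configuration mechanism:

    # /var/lib/lxc/<container>/config
    # Disable AppArmor confinement so the DPDK app can open sysfs files in RW mode.
    lxc.aa_profile = unconfined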
[dpdk-dev] sysfs permission problem when running dpdk app in lxc
I'm using ubuntu 16.04 LTS as my host and installed lxd on it. When I try to run my dpdk (2.2.0) app in the container, I get the following error message.

EAL: lcore 9 is ready (tid=17fd700;cpuset=[9])
EAL: lcore 3 is ready (tid=c915700;cpuset=[3])
EAL: lcore 11 is ready (tid=ffc700;cpuset=[11])
EAL: lcore 5 is ready (tid=27ff700;cpuset=[5])
EAL: lcore 7 is ready (tid=1ffe700;cpuset=[7])
EAL: lcore 13 is ready (tid=7fb700;cpuset=[13])
EAL: PCI device 0000:06:00.0 on NUMA socket -1
EAL:   probe driver: 8086:10e8 rte_igb_pmd
EAL:   Not managed by a supported kernel driver, skipped
EAL: PCI device 0000:06:00.1 on NUMA socket -1
EAL:   probe driver: 8086:10e8 rte_igb_pmd
EAL:   Not managed by a supported kernel driver, skipped
EAL: PCI device 0000:07:00.0 on NUMA socket -1
EAL:   probe driver: 8086:10e8 rte_igb_pmd
EAL: Cannot open /sys/class/uio/uio0/device/config: Permission denied
EAL: Error - exiting with code: 1
  Cause: Requested device 0000:07:00.0 cannot be used

However, the permissions look fine when I list those files.
I appreciate any comments.

root@test4:~# cat /proc/mounts | grep sysfs
sysfs /sys sysfs rw,relatime 0 0
root@test4:~# ls -al /sys/class/uio/uio0/device/config
-rw-r--r-- 1 root root 4096 Sep 5 04:56 /sys/class/uio/uio0/device/config
root@test4:~# ls -al /dev/uio0
crw-rw-rw- 1 root root 243, 0 Sep 5 04:16 /dev/uio0
root@test4:~#

--
Moon-Sang Lee, SW Engineer
Email: sang0627 at gmail.com
Wisdom begins in wonder. *Socrates*
[dpdk-dev] KNI port type in IP pipeline
According to p. 145 of the DPDK programmer's guide 2.2.0, the KNI port type is described in Table 23.1. But I cannot find any material about how to specify the KNI port type in a pipeline configuration file, and unfortunately there seems to be no related source file in $DPDK_TOP/lib/librte_port. Does the packet framework already implement the KNI port type somewhere, or should I implement it myself?

regards,

--
Moon-Sang Lee, SW Engineer
Email: sang0627 at gmail.com
Wisdom begins in wonder. *Socrates*
[dpdk-dev] rte_lcore_to_socket_id(lcore_id) mismatches to that of lstopo
I printed the CPU layout with the cpu_layout.py tool in the dpdk tools directory and with the lstopo linux command. They show the same result: my lcores 0, 2, 4, and 6 are in socket #1. However, rte_lcore_to_socket_id() returns 0 for lcores 0, 2, 4, and 6. Why does this difference occur, and which value should I use to match an lcore to a socket? (I'm using dpdk 2.2.0 on a Xeon E5520, which is based on the Nehalem microarchitecture.)

[mslee@centos7 tools]$ ./cpu_layout.py
Core and Socket Information (as reported by '/proc/cpuinfo')

cores = [0, 1, 2, 3]
sockets = [1, 0]

         Socket 1    Socket 0
Core 0   [0, 8]      [1, 9]
Core 1   [2, 10]     [3, 11]
Core 2   [4, 12]     [5, 13]
Core 3   [6, 14]     [7, 15]

code fragment:

socketid = rte_lcore_to_socket_id(lcore_id);
RTE_LOG(INFO, APP, "init_mem: lcore_id = %d, socketid = %d\n", lcore_id, socketid);

log fragment:

APP: init_mem: lcore_id = 0, socketid = 0
APP: init_mem: lcore_id = 2, socketid = 0
APP: init_mem: lcore_id = 4, socketid = 0
APP: init_mem: lcore_id = 6, socketid = 0

--
Moon-Sang Lee, SW Engineer
Email: sang0627 at gmail.com
Wisdom begins in wonder. *Socrates*
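To see the mismatch directly, here is a small diagnostic sketch (it assumes EAL is already initialized; dump_lcore_sockets() is a hypothetical helper, not part of any DPDK example) that prints the EAL's socket mapping for every enabled lcore, for line-by-line comparison with the cpu_layout.py table:

    #include <stdio.h>
    #include <rte_lcore.h>

    /* Print the EAL's lcore-to-socket mapping for every enabled lcore. */
    static void dump_lcore_sockets(void)
    {
        unsigned lcore_id;

        RTE_LCORE_FOREACH(lcore_id) {
            printf("lcore %u -> socket %u\n",
                   lcore_id, rte_lcore_to_socket_id(lcore_id));
        }
    }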
[dpdk-dev] Errors Rx count increasing while pktgen doing nothing on Intel 82598EB 10G
ort pause state [UINT16]: pause <all|tx|rx|none> Pause/unpause port

EthApp> pause 0
Port 0: Rx Paused
EthApp> pause 0 none
PMD: ixgbe_flow_ctrl_set(): Rx packet buffer size = 0x8
Port 0: Tx & Rx not paused
EthApp> pause 0
Port 0: Rx Paused
EthApp> quit
[root@centos7 app]# pwd

On Thu, Jan 28, 2016 at 9:57 AM, Moon-Sang Lee wrote:
>
> Helin, I implemented my own sample application, which is a kind of
> carrier grade NAT server. It works fine on a 1G NIC (i.e. Intel
> Corporation 82576 Gigabit Network Connection (rev 01)), but it does not
> receive packets on a 10G NIC (i.e. Intel Corporation 82598EB 10-Gigabit
> AF Network Connection (rev 01)), as described in the previous email.
> According to my log messages, it seems that the control register for RX
> DMA is not enabled.
>
> Here is some information about my environment.
>
> 1. HW & OS
>
> [mslee@centos7 ~]$ uname -a
> Linux centos7 3.10.0-229.el7.x86_64 #1 SMP Fri Mar 6 11:36:42 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
> [mslee@centos7 ~]$ more /proc/cpuinfo
> processor       : 0
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 26
> model name      : Intel(R) Xeon(R) CPU E5520 @ 2.27GHz
> stepping        : 5
> microcode       : 0x19
> cpu MHz         : 2262.000
> cache size      : 8192 KB
> physical id     : 1
> siblings        : 8
> core id         : 0
> cpu cores       : 4
> apicid          : 16
> initial apicid  : 16
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 11
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida dtherm tpr_shadow vnmi flexpriority ept vpid
> bogomips        : 4521.93
> clflush size    : 64
> cache_alignment : 64
> address sizes   : 40 bits physical, 48 bits virtual
> power management:
> ...
>
> 2. port configure parameters for rte_eth_dev_configure():
>
> ret = rte_eth_dev_configure(port, NB_RXQ, NB_TXQ, &port_conf);
>
> where NB_RXQ=1, NB_TXQ=2, and
>
> struct rte_eth_conf port_conf = {
>     .rxmode = {
>         //.mq_mode = ETH_MQ_RX_RSS,
>         .mq_mode = ETH_MQ_RX_NONE,       // for 10G NIC
>         .max_rx_pkt_len = ETHER_MAX_LEN,
>         .split_hdr_size = 0,
>         .header_split   = 0,             // Header Split disabled
>         .hw_ip_checksum = 0,             // IP checksum offload disabled
>         .hw_vlan_filter = 0,             // VLAN filtering disabled
>         .jumbo_frame    = 0,             // Jumbo Frame Support disabled
>         .hw_strip_crc   = 0,             // CRC not stripped by hardware
>     },
>     .rx_adv_conf = {
>         .rss_conf = {
>             .rss_key = NULL,
>             .rss_hf  = ETH_RSS_IP,
>         },
>     },
>     .txmode = {
>         .mq_mode = ETH_MQ_TX_NONE,
>     },
> };
>
> 3. rx queue setup parameters
>
> ret = rte_eth_rx_queue_setup(port, RXQ_ID, NB_RXD, socket_id, NULL, pktmbuf_pool[socket_id]);
>
> where RXQ_ID = 0 and NB_RXD = 128.
>
> 4. config parameters in config/common_linuxapp
>
> #
> # Compile burst-oriented IXGBE PMD driver
> #
> CONFIG_RTE_LIBRTE_IXGBE_PMD=y
> CONFIG_RTE_LIBRTE_IXGBE_DEBUG_INIT=n
> CONFIG_RTE_LIBRTE_IXGBE_DEBUG_RX=n
> CONFIG_RTE_LIBRTE_IXGBE_DEBUG_TX=n
> CONFIG_RTE_LIBRTE_IXGBE_DEBUG_TX_FREE=n
> CONFIG_RTE_LIBRTE_IXGBE_DEBUG_DRIVER=n
> CONFIG_RTE_LIBRTE_IXGBE_PF_DISABLE_STRIP_CRC=n
> CONFIG_RTE_IXGBE_INC_VECTOR=y
> CONFIG_RTE_IXGBE_RX_OLFLAGS_ENABLE=y
>
> 5. where the log message is printed
>
> dpdk-2.2.0/drivers/net/ixgbe/ixgbe_rxtx.c:
>
> /* Allocate buffers for descriptor rings */
> if (ixgbe_alloc_rx_queue_mbufs(rxq) != 0) {
>     PMD_INIT_LOG(ERR, "Could not alloc mbuf for queue:%d",
>                  rx_queue_id);
>     return -1;
> }
> rxdctl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(rxq->reg_idx));
> rxdctl |= IXGBE_RXDCTL_ENABLE;
> IXGBE_WRITE_REG(hw, IXGBE_RXDCTL(rxq->reg_idx), rxdctl);
>
> /* Wait until RX Enable ready */
> poll_ms = RTE_IXGBE_REGISTER_POLL_WAIT_10_MS;
> do {
>     rte_delay_ms(1);
>     rxdctl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(rxq->reg_idx));
> } while (--poll_ms && !(rxdctl & IXGBE_RXDCTL_ENABLE));
> if (!poll_ms)
>     PMD_INIT_LOG(ERR, "Could not enable Rx Queue %d",
>                  rx_queue_id);
>
> I'm going to update the firmware of my NIC, but I'm not sure it will help.
> I appreciate any comment.
>
> On Wed, Jan 27, 2016 at 4:23 PM, Zhang, Helin wrote:
[dpdk-dev] Errors Rx count increasing while pktgen doing nothing on Intel 82598EB 10G
Helin, I implemented my own sample application, which is a kind of carrier grade NAT server. It works fine on a 1G NIC (i.e. Intel Corporation 82576 Gigabit Network Connection (rev 01)), but it does not receive packets on a 10G NIC (i.e. Intel Corporation 82598EB 10-Gigabit AF Network Connection (rev 01)), as described in the previous email. According to my log messages, it seems that the control register for RX DMA is not enabled.

Here is some information about my environment.

1. HW & OS

[mslee@centos7 ~]$ uname -a
Linux centos7 3.10.0-229.el7.x86_64 #1 SMP Fri Mar 6 11:36:42 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
[mslee@centos7 ~]$ more /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Xeon(R) CPU E5520 @ 2.27GHz
stepping        : 5
microcode       : 0x19
cpu MHz         : 2262.000
cache size      : 8192 KB
physical id     : 1
siblings        : 8
core id         : 0
cpu cores       : 4
apicid          : 16
initial apicid  : 16
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips        : 4521.93
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:
...

2. port configure parameters for rte_eth_dev_configure():

ret = rte_eth_dev_configure(port, NB_RXQ, NB_TXQ, &port_conf);

where NB_RXQ=1, NB_TXQ=2, and

struct rte_eth_conf port_conf = {
    .rxmode = {
        //.mq_mode = ETH_MQ_RX_RSS,
        .mq_mode = ETH_MQ_RX_NONE,       // for 10G NIC
        .max_rx_pkt_len = ETHER_MAX_LEN,
        .split_hdr_size = 0,
        .header_split   = 0,             // Header Split disabled
        .hw_ip_checksum = 0,             // IP checksum offload disabled
        .hw_vlan_filter = 0,             // VLAN filtering disabled
        .jumbo_frame    = 0,             // Jumbo Frame Support disabled
        .hw_strip_crc   = 0,             // CRC not stripped by hardware
    },
    .rx_adv_conf = {
        .rss_conf = {
            .rss_key = NULL,
            .rss_hf  = ETH_RSS_IP,
        },
    },
    .txmode = {
        .mq_mode = ETH_MQ_TX_NONE,
    },
};

3. rx queue setup parameters

ret = rte_eth_rx_queue_setup(port, RXQ_ID, NB_RXD, socket_id, NULL, pktmbuf_pool[socket_id]);

where RXQ_ID = 0 and NB_RXD = 128.

4. config parameters in config/common_linuxapp

#
# Compile burst-oriented IXGBE PMD driver
#
CONFIG_RTE_LIBRTE_IXGBE_PMD=y
CONFIG_RTE_LIBRTE_IXGBE_DEBUG_INIT=n
CONFIG_RTE_LIBRTE_IXGBE_DEBUG_RX=n
CONFIG_RTE_LIBRTE_IXGBE_DEBUG_TX=n
CONFIG_RTE_LIBRTE_IXGBE_DEBUG_TX_FREE=n
CONFIG_RTE_LIBRTE_IXGBE_DEBUG_DRIVER=n
CONFIG_RTE_LIBRTE_IXGBE_PF_DISABLE_STRIP_CRC=n
CONFIG_RTE_IXGBE_INC_VECTOR=y
CONFIG_RTE_IXGBE_RX_OLFLAGS_ENABLE=y

5. where the log message is printed

dpdk-2.2.0/drivers/net/ixgbe/ixgbe_rxtx.c:

/* Allocate buffers for descriptor rings */
if (ixgbe_alloc_rx_queue_mbufs(rxq) != 0) {
    PMD_INIT_LOG(ERR, "Could not alloc mbuf for queue:%d",
                 rx_queue_id);
    return -1;
}
rxdctl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(rxq->reg_idx));
rxdctl |= IXGBE_RXDCTL_ENABLE;
IXGBE_WRITE_REG(hw, IXGBE_RXDCTL(rxq->reg_idx), rxdctl);

/* Wait until RX Enable ready */
poll_ms = RTE_IXGBE_REGISTER_POLL_WAIT_10_MS;
do {
    rte_delay_ms(1);
    rxdctl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(rxq->reg_idx));
} while (--poll_ms && !(rxdctl & IXGBE_RXDCTL_ENABLE));
if (!poll_ms)
    PMD_INIT_LOG(ERR, "Could not enable Rx Queue %d",
                 rx_queue_id);

I'm going to update the firmware of my NIC, but I'm not sure it will help.
I appreciate any comment.
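Condensing the fragments above, the initialization path that ends in the failing RXDCTL poll is roughly the following sketch (error handling trimmed; port, ret, socket_id, NB_RXQ, NB_TXQ, NB_RXD, port_conf, and pktmbuf_pool are the names used in the fragments):

    ret = rte_eth_dev_configure(port, NB_RXQ, NB_TXQ, &port_conf);
    if (ret < 0)
        rte_exit(EXIT_FAILURE, "cannot configure port %u\n", port);

    ret = rte_eth_rx_queue_setup(port, 0, NB_RXD, socket_id, NULL,
                                 pktmbuf_pool[socket_id]);
    if (ret < 0)
        rte_exit(EXIT_FAILURE, "cannot set up rx queue on port %u\n", port);

    /* rte_eth_dev_start() is where the ixgbe PMD polls RXDCTL.ENABLE
     * and emits "Could not enable Rx Queue" on timeout. */
    ret = rte_eth_dev_start(port);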
On Wed, Jan 27, 2016 at 4:23 PM, Zhang, Helin wrote:

> Moon-Sang
>
> Were you using pktgen or another application?
> Could you share with me the detailed steps to reproduce that issue?
> We will find time for it soon. Thanks!
>
> Regards,
> Helin
>
> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Laurent GUERBY
> Sent: Wednesday, January 27, 2016 3:16 PM
> To: Moon-Sang Lee
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] Errors Rx count increasing while pktgen doing
> nothing on Intel 82598EB 10G
>
> On Wed, 2016-01-27 at 15:50 +0900, Moon-Sang Lee wrote:
> >
> > Laurent, have you resolved this problem?
> > I'm using the same NIC as yours (i.e. the Intel 82598EB 10G NIC) and
> > faced the same problem as you.
> > Here is part of my log, and it says that the PMD cannot enable the RX
> > queue for my NIC.
> > I'm using DPDK 2.2.0 and used 'null' for the 4th parameter in calling
[dpdk-dev] Errors Rx count increasing while pktgen doing nothing on Intel 82598EB 10G
>In my original email above (plus an extract of lspci -vn), here is the
>full output of the command:
>
>01:00.0 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AF Dual Port Network Connection (rev 01)
>01:00.1 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AF Dual Port Network Connection (rev 01)
>05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
>
>(The realtek is used only for internet connectivity).
>
>> Also send me the command line.
>
>On the first machine t1:
>root@t1:~/pktgen-dpdk# ./app/app/x86_64-native-linuxapp-gcc/pktgen \
>-c e -n 1 --proc-type auto -- -m '[2:3].1' -P -f t1-t3.pkt -N
>
>And on the other machine t3:
>root@t3:~/pktgen-dpdk# ./app/app/x86_64-native-linuxapp-gcc/pktgen \
>-c e -n 1 --proc-type auto -- -m '[2:3].1' -P -f t3-t1.pkt -N

You always need to start port numbering for Pktgen at zero.

Change the [2:3].1 to [2:3].0; the reason is that you removed one of the two ports, and Pktgen starts counting ports from zero for the first available port :-)

Sorry, I did not spot that sooner.

>The two "-f" pkt files are attached to this email; I do "start 1"
>manually at the pktgen prompt.
>
>Thanks for your time,
>
>Sincerely,
>
>Laurent

Regards,
Keith

--
Moon-Sang Lee, SW Engineer
Email: sang0627 at gmail.com
Wisdom begins in wonder. *Socrates*
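Applying Keith's correction, the t1 invocation becomes the following (only the port map changes; the t3 command changes the same way):

    root@t1:~/pktgen-dpdk# ./app/app/x86_64-native-linuxapp-gcc/pktgen \
        -c e -n 1 --proc-type auto -- -m '[2:3].0' -P -f t1-t3.pkt -N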
[dpdk-dev] rte_prefetch0() is effective?
I see code like the one below in the examples directory, and I wonder whether it is effective. Coherent I/O is adopted in modern architectures, so I think the DMA initiated for rte_eth_rx_burst() might already fill the cache lines of the RX buffers. Do I really need to call rte_prefetchX()?

nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst, MAX_PKT_BURST);
...
/* Prefetch and forward already prefetched packets */
for (j = 0; j < (nb_rx - PREFETCH_OFFSET); j++) {
    rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j + PREFETCH_OFFSET], void *));
    l3fwd_simple_forward(pkts_burst[j], portid, qconf);
}

--
Moon-Sang Lee, SW Engineer
Email: sang0627 at gmail.com
Wisdom begins in wonder. *Socrates*
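For context, the excerpt above is the middle stage of the software-pipelining pattern in l3fwd; the surrounding loops from the same example (lightly reformatted here, using the same variables) prefetch the first PREFETCH_OFFSET packets and then drain the remainder:

    /* Prefetch the first packets of the burst */
    for (j = 0; j < PREFETCH_OFFSET && j < nb_rx; j++)
        rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j], void *));

    /* ... middle loop shown above: prefetch packet j + PREFETCH_OFFSET,
     * forward packet j ... */

    /* Forward the remaining packets, which were already prefetched */
    for (; j < nb_rx; j++)
        l3fwd_simple_forward(pkts_burst[j], portid, qconf);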
[dpdk-dev] bytes order of ip header in dpdk 2.1
sorry, my code has a bug with regard to byte order in convert_ip_to_string().

On Wed, Nov 18, 2015 at 5:47 PM, Moon-Sang Lee wrote:
>
> Once receiving a packet via rte_eth_rx_burst(), I printed the IP header
> as below. It seems that the length (uint16_t) is in big endian, but the
> IP addresses are in little endian. I'm confused and appreciate any comment.
>
> ### print_ipv4_header src = 11.0.168.192, dst = 2.173.248.143, length = 60, ttl = 64, proto = 6
>
> This log was printed with these functions.
>
> void print_ipv4_header(struct rte_mbuf *m) {
>     struct ether_hdr *eth_header;
>     struct ipv4_hdr *ipv4_header;
>     uint16_t length;
>     uint8_t ttl, proto;
>     uint32_t src_ip, dst_ip;
>     char src_ip_str[MAX_IP_STR_LEN], dst_ip_str[MAX_IP_STR_LEN];
>
>     eth_header = rte_pktmbuf_mtod(m, struct ether_hdr *);
>     ipv4_header = (struct ipv4_hdr *)(eth_header + 1);
>     length = rte_be_to_cpu_16(ipv4_header->total_length);
>     ttl = ipv4_header->time_to_live;
>     proto = ipv4_header->next_proto_id;
>     src_ip = rte_be_to_cpu_32(ipv4_header->src_addr);
>     dst_ip = rte_be_to_cpu_32(ipv4_header->dst_addr);
>     convert_ip_to_string(src_ip_str, src_ip);
>     convert_ip_to_string(dst_ip_str, dst_ip);
>
>     debug("### print_ipv4_header src = %s, dst = %s, length = %d, ttl = %d, proto = %d",
>           src_ip_str, dst_ip_str, length, ttl, proto);
> }
>
> void convert_ip_to_string(char *str, uint32_t ip) {
>     unsigned char *ptr = (unsigned char *)&ip;
>
>     sprintf(str, "%u.%u.%u.%u", ptr[0], ptr[1], ptr[2], ptr[3]);
> }
>
> --
> Moon-Sang Lee, SW Engineer
> Email: sang0627 at gmail.com
> Wisdom begins in wonder. *Socrates*

--
Moon-Sang Lee, SW Engineer
Email: sang0627 at gmail.com
Wisdom begins in wonder. *Socrates*
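For completeness, one way to fix the bug acknowledged above. A minimal sketch, not the author's actual patch: after rte_be_to_cpu_32() the address is in host byte order, so on a little-endian machine aliasing its bytes in memory prints the octets reversed (hence "11.0.168.192" instead of "192.168.0.11"); extracting the octets by shifting is endian-independent:

    #include <stdio.h>
    #include <stdint.h>

    /* ip must be in host byte order, e.g. rte_be_to_cpu_32(ipv4_header->src_addr) */
    void convert_ip_to_string(char *str, uint32_t ip)
    {
        sprintf(str, "%u.%u.%u.%u",
                (ip >> 24) & 0xff, (ip >> 16) & 0xff,
                (ip >> 8) & 0xff, ip & 0xff);
    }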
[dpdk-dev] bytes order of ip header in dpdk 2.1
Once receiving a packet via rte_eth_rx_burst(), I printed the IP header as below. It seems that the length (uint16_t) is in big endian, but the IP addresses are in little endian. I'm confused and appreciate any comment.

### print_ipv4_header src = 11.0.168.192, dst = 2.173.248.143, length = 60, ttl = 64, proto = 6

This log was printed with these functions.

void print_ipv4_header(struct rte_mbuf *m) {
    struct ether_hdr *eth_header;
    struct ipv4_hdr *ipv4_header;
    uint16_t length;
    uint8_t ttl, proto;
    uint32_t src_ip, dst_ip;
    char src_ip_str[MAX_IP_STR_LEN], dst_ip_str[MAX_IP_STR_LEN];

    eth_header = rte_pktmbuf_mtod(m, struct ether_hdr *);
    ipv4_header = (struct ipv4_hdr *)(eth_header + 1);
    length = rte_be_to_cpu_16(ipv4_header->total_length);
    ttl = ipv4_header->time_to_live;
    proto = ipv4_header->next_proto_id;
    src_ip = rte_be_to_cpu_32(ipv4_header->src_addr);
    dst_ip = rte_be_to_cpu_32(ipv4_header->dst_addr);
    convert_ip_to_string(src_ip_str, src_ip);
    convert_ip_to_string(dst_ip_str, dst_ip);

    debug("### print_ipv4_header src = %s, dst = %s, length = %d, ttl = %d, proto = %d",
          src_ip_str, dst_ip_str, length, ttl, proto);
}

void convert_ip_to_string(char *str, uint32_t ip) {
    unsigned char *ptr = (unsigned char *)&ip;

    sprintf(str, "%u.%u.%u.%u", ptr[0], ptr[1], ptr[2], ptr[3]);
}

--
Moon-Sang Lee, SW Engineer
Email: sang0627 at gmail.com
Wisdom begins in wonder. *Socrates*
[dpdk-dev] [Q] l2fwd in examples directory
Let me clarify my mixed-up statements. My processor is an L5520, family 6, model 26, which is based on the Nehalem microarchitecture according to Wikipedia (https://en.wikipedia.org/wiki/Nehalem_(microarchitecture)). It does not have a PCIe interface integrated in the processor (PCIe sits on the chipset), so "rte_eth_dev_socket_id(portid) always returns -1" seems to be no problem. My understanding of the lstopo result might be wrong. Thanks anyway.

On Mon, Oct 19, 2015 at 4:39 PM, Moon-Sang Lee wrote:
>
> My NUT has a Xeon L5520, which is based on the Nehalem microarchitecture.
> Does Nehalem support a PCIe interface on the chipset?
>
> Anyhow, 'lstopo' shows the output below, and it seems that my PCI devices
> are connected to socket #0. I'm still wondering why
> rte_eth_dev_socket_id(portid) always returns -1.
>
> mslee@myhost:~$ lstopo
> Machine (31GB)
>   NUMANode L#0 (P#0 16GB) + Socket L#0 + L3 L#0 (8192KB)
>     L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
>       PU L#0 (P#0)
>       PU L#1 (P#8)
>     L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
>       PU L#2 (P#2)
>       PU L#3 (P#10)
>     L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
>       PU L#4 (P#4)
>       PU L#5 (P#12)
>     L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3
>       PU L#6 (P#6)
>       PU L#7 (P#14)
>   NUMANode L#1 (P#1 16GB) + Socket L#1 + L3 L#1 (8192KB)
>     L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4
>       PU L#8 (P#1)
>       PU L#9 (P#9)
>     L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5
>       PU L#10 (P#3)
>       PU L#11 (P#11)
>     L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6
>       PU L#12 (P#5)
>       PU L#13 (P#13)
>     L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7
>       PU L#14 (P#7)
>       PU L#15 (P#15)
>   HostBridge L#0
>     PCIBridge
>       PCI 14e4:163b
>         Net L#0 "em1"
>       PCI 14e4:163b
>         Net L#1 "em2"
>     PCIBridge
>       PCI 1000:0058
>         Block L#2 "sda"
>         Block L#3 "sdb"
>     PCIBridge
>       PCIBridge
>         PCIBridge
>           PCI 8086:10e8
>           PCI 8086:10e8
>         PCIBridge
>           PCI 8086:10e8
>           PCI 8086:10e8
>     PCIBridge
>       PCI 102b:0532
>     PCI 8086:3a20
>     PCI 8086:3a26
>       Block L#4 "sr0"
> mslee@myhost:~$
>
> On Sun, Oct 18, 2015 at 2:51 PM, Moon-Sang Lee wrote:
>
>> thanks bruce.
>>
>> I didn't know that PCI slots have direct socket affinity.
>> is it static or configurable through PCI configuration space?
>> well, my NUT, a two-node NUMA system, seems to always return -1 on
>> calling rte_eth_dev_socket_id(portid), whether portid is 0, 1, or
>> another value. I would appreciate it if you could explain more about
>> getting the affinity.
>>
>> p.s.
>> I'm using an intel Xeon processor and a 1G NIC (82576).
>>
>> On Fri, Oct 16, 2015 at 10:43 PM, Bruce Richardson <
>> bruce.richardson at intel.com> wrote:
>>
>>> On Thu, Oct 15, 2015 at 11:08:57AM +0900, Moon-Sang Lee wrote:
>>> > There is code as below in examples/l2fwd/main.c and I think
>>> > rte_eth_dev_socket_id(portid) always returns -1 (SOCKET_ID_ANY)
>>> > since there is no association code between port and lcore in the
>>> > example code.
>>>
>>> Can you perhaps clarify what you mean here. On modern NUMA systems,
>>> such as those from Intel :-), the PCI slots are directly connected to
>>> the CPU sockets, so the ethernet ports do indeed have a direct NUMA
>>> affinity. It's not something that the app needs to specify.
>>>
>>> /Bruce
>>>
>>> > (i.e. I need to find a matching lcore from lcore_queue_conf[] with
>>> > portid and call rte_lcore_to_socket_id(lcore_id).)
>>> >
>>> > /* init one RX queue */
>>> > fflush(stdout);
>>> > ret = rte_eth_rx_queue_setup(portid, 0, nb_rxd,
>>> >                              rte_eth_dev_socket_id(portid),
>>> >                              NULL,
>>> >                              l2fwd_pktmbuf_pool);
>>> > if (ret < 0)
>>> >     rte_exit(EXIT_FAILURE, "rte_eth_rx_queue_setup:err=%d, port=%u\n",
>>> >              ret, (unsigned) portid);
>>> >
>>> > It works fine even though memory is allocated on a different NUMA
>>> > node. But I wonder whether there is a DPDK API that associates an
>>> > lcore to a port internally, so that
>>> > rte_eth_devices[portid].pci_dev->numa_node contains the proper node.
>>> >
>>> > --
>>> > Moon-Sang Lee, SW Engineer
>>> > Email: sang0627 at gmail.com
>>> > Wisdom begins in wonder. *Socrates*
>>
>> --
>> Moon-Sang Lee, SW Engineer
>> Email: sang0627 at gmail.com
>> Wisdom begins in wonder. *Socrates*
>
> --
> Moon-Sang Lee, SW Engineer
> Email: sang0627 at gmail.com
> Wisdom begins in wonder. *Socrates*

--
Moon-Sang Lee, SW Engineer
Email: sang0627 at gmail.com
Wisdom begins in wonder. *Socrates*
[dpdk-dev] [Q] l2fwd in examples directory
My NUT has a Xeon L5520, which is based on the Nehalem microarchitecture. Does Nehalem support a PCIe interface on the chipset?

Anyhow, 'lstopo' shows the output below, and it seems that my PCI devices are connected to socket #0. I'm still wondering why rte_eth_dev_socket_id(portid) always returns -1.

mslee@myhost:~$ lstopo
Machine (31GB)
  NUMANode L#0 (P#0 16GB) + Socket L#0 + L3 L#0 (8192KB)
    L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
      PU L#0 (P#0)
      PU L#1 (P#8)
    L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
      PU L#2 (P#2)
      PU L#3 (P#10)
    L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
      PU L#4 (P#4)
      PU L#5 (P#12)
    L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3
      PU L#6 (P#6)
      PU L#7 (P#14)
  NUMANode L#1 (P#1 16GB) + Socket L#1 + L3 L#1 (8192KB)
    L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4
      PU L#8 (P#1)
      PU L#9 (P#9)
    L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5
      PU L#10 (P#3)
      PU L#11 (P#11)
    L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6
      PU L#12 (P#5)
      PU L#13 (P#13)
    L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7
      PU L#14 (P#7)
      PU L#15 (P#15)
  HostBridge L#0
    PCIBridge
      PCI 14e4:163b
        Net L#0 "em1"
      PCI 14e4:163b
        Net L#1 "em2"
    PCIBridge
      PCI 1000:0058
        Block L#2 "sda"
        Block L#3 "sdb"
    PCIBridge
      PCIBridge
        PCIBridge
          PCI 8086:10e8
          PCI 8086:10e8
        PCIBridge
          PCI 8086:10e8
          PCI 8086:10e8
    PCIBridge
      PCI 102b:0532
    PCI 8086:3a20
    PCI 8086:3a26
      Block L#4 "sr0"
mslee@myhost:~$

On Sun, Oct 18, 2015 at 2:51 PM, Moon-Sang Lee wrote:
>
> thanks bruce.
>
> I didn't know that PCI slots have direct socket affinity.
> is it static or configurable through PCI configuration space?
> well, my NUT, a two-node NUMA system, seems to always return -1 on
> calling rte_eth_dev_socket_id(portid), whether portid is 0, 1, or
> another value. I would appreciate it if you could explain more about
> getting the affinity.
>
> p.s.
> I'm using an intel Xeon processor and a 1G NIC (82576).
>
> On Fri, Oct 16, 2015 at 10:43 PM, Bruce Richardson <
> bruce.richardson at intel.com> wrote:
>
>> On Thu, Oct 15, 2015 at 11:08:57AM +0900, Moon-Sang Lee wrote:
>> > There is code as below in examples/l2fwd/main.c and I think
>> > rte_eth_dev_socket_id(portid) always returns -1 (SOCKET_ID_ANY)
>> > since there is no association code between port and lcore in the
>> > example code.
>>
>> Can you perhaps clarify what you mean here. On modern NUMA systems,
>> such as those from Intel :-), the PCI slots are directly connected to
>> the CPU sockets, so the ethernet ports do indeed have a direct NUMA
>> affinity. It's not something that the app needs to specify.
>>
>> /Bruce
>>
>> > (i.e. I need to find a matching lcore from lcore_queue_conf[] with
>> > portid and call rte_lcore_to_socket_id(lcore_id).)
>> >
>> > /* init one RX queue */
>> > fflush(stdout);
>> > ret = rte_eth_rx_queue_setup(portid, 0, nb_rxd,
>> >                              rte_eth_dev_socket_id(portid),
>> >                              NULL,
>> >                              l2fwd_pktmbuf_pool);
>> > if (ret < 0)
>> >     rte_exit(EXIT_FAILURE, "rte_eth_rx_queue_setup:err=%d, port=%u\n",
>> >              ret, (unsigned) portid);
>> >
>> > It works fine even though memory is allocated on a different NUMA
>> > node. But I wonder whether there is a DPDK API that associates an
>> > lcore to a port internally, so that
>> > rte_eth_devices[portid].pci_dev->numa_node contains the proper node.
>> >
>> > --
>> > Moon-Sang Lee, SW Engineer
>> > Email: sang0627 at gmail.com
>> > Wisdom begins in wonder. *Socrates*
>
> --
> Moon-Sang Lee, SW Engineer
> Email: sang0627 at gmail.com
> Wisdom begins in wonder. *Socrates*

--
Moon-Sang Lee, SW Engineer
Email: sang0627 at gmail.com
Wisdom begins in wonder. *Socrates*
[dpdk-dev] [Q] l2fwd in examples directory
thanks bruce.

I didn't know that PCI slots have direct socket affinity.
is it static or configurable through PCI configuration space?
well, my NUT, a two-node NUMA system, seems to always return -1 on calling rte_eth_dev_socket_id(portid), whether portid is 0, 1, or another value. I would appreciate it if you could explain more about getting the affinity.

p.s.
I'm using an intel Xeon processor and a 1G NIC (82576).

On Fri, Oct 16, 2015 at 10:43 PM, Bruce Richardson < bruce.richardson at intel.com> wrote:

> On Thu, Oct 15, 2015 at 11:08:57AM +0900, Moon-Sang Lee wrote:
> > There is code as below in examples/l2fwd/main.c and I think
> > rte_eth_dev_socket_id(portid) always returns -1 (SOCKET_ID_ANY) since
> > there is no association code between port and lcore in the example code.
>
> Can you perhaps clarify what you mean here. On modern NUMA systems, such
> as those from Intel :-), the PCI slots are directly connected to the CPU
> sockets, so the ethernet ports do indeed have a direct NUMA affinity.
> It's not something that the app needs to specify.
>
> /Bruce
>
> > (i.e. I need to find a matching lcore from lcore_queue_conf[] with
> > portid and call rte_lcore_to_socket_id(lcore_id).)
> >
> > /* init one RX queue */
> > fflush(stdout);
> > ret = rte_eth_rx_queue_setup(portid, 0, nb_rxd,
> >                              rte_eth_dev_socket_id(portid),
> >                              NULL,
> >                              l2fwd_pktmbuf_pool);
> > if (ret < 0)
> >     rte_exit(EXIT_FAILURE, "rte_eth_rx_queue_setup:err=%d, port=%u\n",
> >              ret, (unsigned) portid);
> >
> > It works fine even though memory is allocated on a different NUMA node.
> > But I wonder whether there is a DPDK API that associates an lcore to a
> > port internally, so that rte_eth_devices[portid].pci_dev->numa_node
> > contains the proper node.
> >
> > --
> > Moon-Sang Lee, SW Engineer
> > Email: sang0627 at gmail.com
> > Wisdom begins in wonder. *Socrates*

--
Moon-Sang Lee, SW Engineer
Email: sang0627 at gmail.com
Wisdom begins in wonder. *Socrates*
[dpdk-dev] [Q] l2fwd in examples directory
There is code as below in examples/l2fwd/main.c, and I think rte_eth_dev_socket_id(portid) always returns -1 (SOCKET_ID_ANY) since there is no association code between port and lcore in the example code. (i.e. I need to find a matching lcore from lcore_queue_conf[] with portid and call rte_lcore_to_socket_id(lcore_id).)

/* init one RX queue */
fflush(stdout);
ret = rte_eth_rx_queue_setup(portid, 0, nb_rxd,
                             rte_eth_dev_socket_id(portid),
                             NULL,
                             l2fwd_pktmbuf_pool);
if (ret < 0)
    rte_exit(EXIT_FAILURE, "rte_eth_rx_queue_setup:err=%d, port=%u\n",
             ret, (unsigned) portid);

It works fine even though memory is allocated on a different NUMA node. But I wonder whether there is a DPDK API that associates an lcore to a port internally, so that rte_eth_devices[portid].pci_dev->numa_node contains the proper node.

--
Moon-Sang Lee, SW Engineer
Email: sang0627 at gmail.com
Wisdom begins in wonder. *Socrates*
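Following the author's own suggestion, a common workaround is to fall back to the polling lcore's socket when the port reports SOCKET_ID_ANY. A sketch rather than the l2fwd code itself; lcore_id is assumed to be the lcore that will poll this port:

    /* Prefer the port's NUMA node; fall back to the lcore's socket when
     * the PCI device reports no NUMA affinity (rte_eth_dev_socket_id < 0). */
    int socketid = rte_eth_dev_socket_id(portid);
    if (socketid < 0)
        socketid = (int)rte_lcore_to_socket_id(lcore_id);

    ret = rte_eth_rx_queue_setup(portid, 0, nb_rxd, socketid,
                                 NULL, l2fwd_pktmbuf_pool);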
[dpdk-dev] ksoftirqd when using KNI
I've observed CPU stats with the top command and found that ksoftirqd is processing software interrupts, which presumably come from the dpdk-kni application and are processed by KNI and the kernel network stack. My observations are:

1. The dpdk-kni application drops about half of the RX packets (i.e. it fails to deliver them to skbs); it seems the rx_q on the KNI side is full. I think this is because processing in KNI and the IP stack is much slower than receiving packets from the device via dpdk.

2. Bonding multiple KNI interfaces to spread the load across multiple kernel threads does not help reduce that processing time. In addition, packets are transmitted out of order across the multiple KNIs, which requires reordering at the communication endpoint.

3. NAT with the native kernel performs twice as well as KNI + native kernel, even though the latter does not incur hardware interrupts.

Anyway, my experiment was done in a limited environment, so this does not reflect the general case. My wish for a simple NAT solution seems not feasible with KNI, so I should change my approach from KNI to a pure dpdk application.

On Fri, Sep 18, 2015 at 8:53 PM, Moon-Sang Lee wrote:
>
> I'm a newbie and am testing DPDK KNI with a 1G intel NIC.
>
> According to my understanding of the DPDK documents,
> KNI should not raise interrupts when sending/receiving packets.
>
> But when I transmit a bunch of packets to my KNI ports,
> the top command shows ksoftirqd with 50% CPU load.
>
> Would you give me some comments about this situation?
>
> --
> Moon-Sang Lee, SW Engineer
> Email: sang0627 at gmail.com
> Wisdom begins in wonder. *Socrates*

--
Moon-Sang Lee, SW Engineer
Email: sang0627 at gmail.com
Wisdom begins in wonder. *Socrates*
[dpdk-dev] about poor KNI performance
I'm running NAT with iptables on server M2, and two linux PCs, M1 and M3, are directly connected to M2 as below. All interfaces are 1G NICs over 1G links.

[ M1 ]-p1p3-[ M2 ]-p1p4-[ M3 ]

Without KNI or DPDK, native iptables in Linux sustains 360 Mbps for UDP and 545 Mbps for TCP. With KNI, it drops almost every packet for UDP (i.e. 87% of received packets are dropped) and reaches only 42 Mbps for TCP.

If I use KNI with DPDK, it runs in poll mode, so there should be no interrupts. The strange thing in my experiment is that ksoftirqd consumes 50% CPU load with KNI. Why does ksoftirqd work so hard? I'm using the KNI application from the examples directory of the DPDK 2.1.0 source tree with the options -c 0xf0 -n 4 -- -p 0x3 -P --config="(0,4,6,8),(1,5,7,9)" (i.e. rte_kni.ko is loaded with the kthread_mode=multiple option). Any comments are appreciated.

# without KNI (i.e. with native iptables) #
[mslee@localhost ~]$ iperf3 -s -i 10
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.110.4, port 48099
[  5] local 192.168.110.5 port 5201 connected to 192.168.110.4 port 54364
[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  5]   0.00-10.00  sec   376 MBytes   315 Mbits/sec  0.002 ms  471622/3548257 (13%)
[  5]  10.00-10.04  sec  1.53 MBytes   326 Mbits/sec  0.003 ms  1520/14071 (11%)
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  5]   0.00-10.04  sec   435 MBytes   363 Mbits/sec  0.003 ms  473142/3562328 (13%)
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.110.4, port 48100
[  5] local 192.168.110.5 port 5201 connected to 192.168.110.4 port 48101
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-10.00  sec   650 MBytes   545 Mbits/sec
[  5]  10.00-10.04  sec  2.53 MBytes   563 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  5]   0.00-10.04  sec   653 MBytes   546 Mbits/sec   97   sender
[  5]   0.00-10.04  sec   652 MBytes   545 Mbits/sec        receiver
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------

# with KNI (i.e. with KNI + native iptables) #
[mslee@localhost ~]$ iperf3 -s -i 10
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.110.4, port 48102
[  5] local 192.168.110.5 port 5201 connected to 192.168.110.4 port 60867
[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  5]   0.00-10.00  sec  57.9 MBytes  48.6 Mbits/sec  0.021 ms  3239121/3713340 (87%)
[  5]  10.00-10.27  sec   278 KBytes  8.37 Mbits/sec  0.018 ms  16319/18541 (88%)
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  5]   0.00-10.27  sec   456 MBytes   372 Mbits/sec  0.018 ms  3255440/3731881 (87%)
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.110.4, port 48103
[  5] local 192.168.110.5 port 5201 connected to 192.168.110.4 port 48104
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-10.00  sec  50.6 MBytes  42.4 Mbits/sec
[  5]  10.00-10.04  sec   204 KBytes  44.2 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  5]   0.00-10.04  sec  52.0 MBytes  43.4 Mbits/sec  151   sender
[  5]   0.00-10.04  sec  50.8 MBytes  42.4 Mbits/sec        receiver
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------

# snapshot of top command #
  PID USER  PR  NI    VIRT    RES    SHR S  %CPU %MEM    TIME+  COMMAND
 9519 root  20   0 8659492  20404  19948 R 400.0  0.1 10:41.09  unatd-kni
  114 root  20   0       0      0      0 S  45.7  0.0  1:31.46  ksoftirqd/8
 9525 root  20   0       0      0      0 S  17.2  0.0  0:07.89  kni_p1p3
 9590 root  20   0       0      0      0 S   1.2  0.0  0:01.57  kni_p1p4

--
Moon-Sang Lee, SW Engineer
Email: sang0627 at gmail.com
Wisdom begins in wonder. *Socrates*
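For reference, a condensed sketch of the setup described above (the module and binary paths are assumptions; the option values are the ones quoted in the text):

    # load the KNI module with one kernel thread per KNI device
    insmod rte_kni.ko kthread_mode=multiple

    # run the KNI example application with the quoted options
    ./examples/kni/build/kni -c 0xf0 -n 4 -- -p 0x3 -P \
        --config="(0,4,6,8),(1,5,7,9)"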
[dpdk-dev] ksoftirqd when using KNI
I'm a newbie and am testing DPDK KNI with a 1G intel NIC.

According to my understanding of the DPDK documents, KNI should not raise interrupts when sending/receiving packets.

But when I transmit a bunch of packets to my KNI ports, the top command shows ksoftirqd with 50% CPU load.

Would you give me some comments about this situation?

--
Moon-Sang Lee, SW Engineer
Email: sang0627 at gmail.com
Wisdom begins in wonder. *Socrates*
[dpdk-dev] [Q] asymmetric ping latency over KNI
hello. I installed dpdk-2.1.0 on my ubuntu 14.04 server, where two NIC ports are available. There are two KNI interfaces, vEth0 and vEth1, one for each NIC port. And I connected two servers, M1 and M3, to each NIC port as below.

[ M1 ]-eth0-[ M2 ]-eth1-[ M3 ]
      (vEth0)     (vEth1)

After running the KNI example from the dpdk-2.1.0 source tree, I pinged from M1 to M2 (eth0) and from M3 to M2 (eth1). It shows short latency over M1-M2, but large latency (1 second) over M2-M3. If I pass the "-i 0.2" option to ping, the large latency drops to 200 ms. If I pass "-i 0.x", then the latency becomes 0.x seconds. I can't figure out what's wrong with my configuration and runtime parameters. Any comments are appreciated.

My runtime environment is as follows.

# load the kni driver
insmod igb_uio.ko
insmod rte_kni.ko lo_mode=lo_mode_ring_skb

# execute the kni example (i.e. the packet burst size is the default, 32)
a.out -c 0x1414 -n 4 -- -p 0x3 -P --config="(0,2,4,6),(1,10,12,14)"

--
Moon-Sang Lee, SW Engineer
Email: sang0627 at gmail.com
Wisdom begins in wonder. *Socrates*