date:20150324

[dpdk-dev] Inputs needed - testing l2fwd

2015-03-24 Thread Stephen Hemminger

On Wed, 25 Mar 2015 09:40:47 +0530
Shankari Vaidyalingam  wrote:

> Hi,
> 
> Can anyone please help me whether I'm missing something in the below
> exercise
> 
> Regards
> Shankari.V
> 
> On Wed, Mar 25, 2015 at 12:25 AM, Shankari Vaidyalingam <
> shankari.v2k6 at gmail.com> wrote:
> 
> > I'm trying to run sample DPDK  applications by injecting packets from an
> > external traffic generator.
> > I'm not able to get the frames to reach the RX queue of the corresponding
> > port and make the PMD detect those packets and get the application process
> > them.
> >
> > My configuration is:
> >
> >  I'm using Oracle VirtualBox to have 2 virtual network adapters for use in
> > the sample application. I have configured both the network adapters in the
> > "Bridged Networking" mode.
> > When I open the VM I can see the interfaces - eth0 and eth1
> > I bound the 2 interfaces to igb_uio driver and then configured the
> > hugepage mapping.
> > When I try sending the frames from an external source I had set the dest
> > MAC to the MAC addr of the ports bound to DPDK.
> > But still I'm not able to see either the "Frames received" or "Frames
> > dropped" count getting incremented.
> >
> > I tried googling and tried various options for networking mode as
> > (Bridged/Host Only...) but none of them seem to work fine.
> >
> > I'm sure that I'm missing something.
> > Can you please help me in this regard.
> >
> > Regards
> > Shankari.V
> >
> >

VirtualBox has its own version of virtio which is not compatiable
with the current version of DPDK virtio driver.

[dpdk-dev] [PATCH] vhost: Fix `struct file' leakage in `eventfd_link'

2015-03-24 Thread Pavel Boldin

On Tue, Mar 24, 2015 at 6:20 PM, Xie, Huawei  wrote:

> On 3/24/2015 7:10 PM, Pavel Boldin wrote:
>
>
> On Tue, Mar 24, 2015 at 8:28 AM, Xie, Huawei  huawei.xie at intel.com>> wrote:
> On 3/23/2015 8:54 PM, Pavel Boldin wrote:
> > Due to increased `struct file's reference counter subsequent call
> > to `filp_close' does not free the `struct file'. Prepend `fput' call
> > to decrease the reference counter.
> >
> > Signed-off-by: Pavel Boldin  pboldin at mirantis.com>>
> > ---
> >  lib/librte_vhost/eventfd_link/eventfd_link.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/lib/librte_vhost/eventfd_link/eventfd_link.c
> b/lib/librte_vhost/eventfd_link/eventfd_link.c
> > index 7755dd6..62c45c8 100644
> > --- a/lib/librte_vhost/eventfd_link/eventfd_link.c
> > +++ b/lib/librte_vhost/eventfd_link/eventfd_link.c
> > @@ -117,6 +117,7 @@ eventfd_link_ioctl(struct file *f, unsigned int
> ioctl, unsigned long arg)
> >* Release the existing eventfd in the source process
> >*/
> >   spin_lock(>file_lock);
> > + fput(file);
> >   filp_close(file, files);
> >   fdt = files_fdtable(files);
> >   fdt->fd[eventfd_copy.source_fd] = NULL;
> Acked-by Huawei Xie mailto:huawei.xie at intel.com>>
>
> In future, we should remove the allocation of src eventfd, allocate a
> free fd from kernel, and install it with target fd.
>
> Well, I don't think this is correct, because this will put too much job
> for the kernel module making it a combiner.
>
> At the moment I propose to accept module refactoring patch to the upstream.
>
> After that the module can be reworked along with the application code. The
> reason is I can't work on DPDK since I have no hardware to test application
> at so I can't help with patch that touches application code.
>
> Pavel
> Pavel:
> I am not asking to do it right now but in future, and i agree we accept
> the refactoring patch now, so i ack the patch.
>
Huawei, this is the refactoring patch [1]. This patch is merely a bugfix.

[1] http://dpdk.org/dev/patchwork/patch/4113/


> Huawei
>

[dpdk-dev] [PATCH v2 2/6] eal: Close file descriptor of uio configuration

2015-03-24 Thread Stephen Hemminger

On Wed, 25 Mar 2015 12:17:32 +0900
Tetsuya Mukawa  wrote:

> On 2015/03/25 3:33, Stephen Hemminger wrote:
> > On Tue, 24 Mar 2015 13:18:33 +0900
> > Tetsuya Mukawa  wrote:
> >
> >> When pci_uio_unmap_resource() is called, a file descriptor that is used
> >> for uio configuration should be closed.
> >>
> >> Signed-off-by: Tetsuya Mukawa 
> >> ---
> >>  lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 6 +-
> >>  1 file changed, 5 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c 
> >> b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
> >> index 9cdf24f..f0277be 100644
> >> --- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
> >> +++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
> >> @@ -459,8 +459,12 @@ pci_uio_unmap_resource(struct rte_pci_device *dev)
> >>  
> >>/* close fd if in primary process */
> >>close(dev->intr_handle.fd);
> >> -
> >>dev->intr_handle.fd = -1;
> >> +
> >> +  /* close cfg_fd if in primary process */
> >> +  close(dev->intr_handle.uio_cfg_fd);
> >> +  dev->intr_handle.uio_cfg_fd = -1;
> >> +
> >>dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
> >>  }
> >>  #endif /* RTE_LIBRTE_EAL_HOTPLUG */
> >
> > For the Qlogic/Broadcom driver it needed the config fd handle, and I added
> > generic config space access functions.
> 
> Hi Stephen,
> 
> Is this the patch you mentioned?
> http://dpdk.org/dev/patchwork/patch/3024/
> 
> 
> Hi David, Bernard, Stephen
> 
> I guess here are works we will need to do.
> 1. Add close(dev->config_fd) in Stephen's patch.
> 2. Write a patch for uio to merge "dev->intr_handle->uio_cfg_fd" and
> "dev->config_fd".
> 3. Write a patch for vfio to merge "dev->intr_handle->vfio_cfg_fd" and
> "dev->config_fd".
> 
> If we already have these patches, I guess it may be nice to merge above
> patches first.
> Do you have a suggestion how to merge patches related with pci config fd?
> 
> Thanks,
> Tetsuya
> 

Yeah, that is the patch. It reopens config fd, it seems to overlap
this.

[dpdk-dev] [PATCH] cast used->idx to volatile

2015-03-24 Thread Linhaifeng



On 2015/3/24 18:06, Xie, Huawei wrote:
> On 3/24/2015 3:44 PM, Linhaifeng wrote:
>>
>> On 2015/3/24 9:53, Xie, Huawei wrote:
>>> On 3/24/2015 9:00 AM, Linhaifeng wrote:
 On 2015/3/23 20:54, Xie, Huawei wrote:
>> -Original Message-
>> From: Linhaifeng [mailto:haifeng.lin at huawei.com]
>> Sent: Monday, March 23, 2015 8:24 PM
>> To: dev at dpdk.org
>> Cc: Ouyang, Changchun; Xie, Huawei
>> Subject: Re: [dpdk-dev] [PATCH] cast used->idx to volatile
>>
>>
>>
>> On 2015/3/21 16:07, linhaifeng wrote:
>>> From: Linhaifeng 
>>>
>>> Same as rte_vhost_enqueue_burst we should cast used->idx
>>> to volatile before notify guest.
>>>
>>> Signed-off-by: Linhaifeng 
>>> ---
>>>  lib/librte_vhost/vhost_rxtx.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/lib/librte_vhost/vhost_rxtx.c 
>>> b/lib/librte_vhost/vhost_rxtx.c
>>> index 535c7a1..8d674d1 100644
>>> --- a/lib/librte_vhost/vhost_rxtx.c
>>> +++ b/lib/librte_vhost/vhost_rxtx.c
>>> @@ -722,7 +722,7 @@ rte_vhost_dequeue_burst(struct virtio_net *dev,
>> uint16_t queue_id,
>>> }
>>>
>>> rte_compiler_barrier();
>>> -   vq->used->idx += entry_success;
>>> +   *(volatile uint16_t *)>used->idx += entry_success;
> Haifeng:
> We have compiler barrier before and an external function call behind, so 
> we don't need volatile  here.
> Do you meet issue?
>
 Tx_q is sometimes stopped when we use virtio_net. Because vhost thought 
 there are no buffers in tx_q and virtio_net
 though vhost haven't handle all packets so we have to restart VM to 
 restore work.

 The status in VM is:
 Mar 18 17:11:10 linux-b2ij kernel: [46337.246687] net eth7: virtnet_poll
 Mar 18 17:11:10 linux-b2ij kernel: [46337.246690] net eth7: receive_buf
 Mar 18 17:11:10 linux-b2ij kernel: [46337.246693] net eth7: vi->num=239
 Mar 18 17:11:10 linux-b2ij kernel: [46337.246695] net eth7: 
 svq:avail->idx=52939 used->idx=52939 num_free=18 num_added=0 
 svq->last_used_idx=52820
 Mar 18 17:11:10 linux-b2ij kernel: [46337.246699] net eth7: 
 rvq:avail->idx=36215 used->idx=35977 num_free=18 num_added=0 
 rvq->last_used_idx=35977
 Mar 18 17:11:11 linux-b2ij kernel: [46337.901038] net eth7: 
 dev_queue_xmit, qdisc->flags=4, qdisc->state deactiveed=0
 Mar 18 17:11:12 linux-b2ij kernel: [46337.901042] net eth7: 
 dev_queue_xmit, txq->state=1, stopped=1

 Why compiler barrier not take effect in our case? Is compiler barrier 
 depended on -O3 option? We use -O2 option.
>>> compiler barrier always works regardless of the optimization option.
>>> I don't get your story, but the key thing is, do you check the asm code?
>>> If called from outside as an API, how is it possible it is optimized?
>>> there is only one update to used->idx in that function.
>>
>> Do you mean rte_vhost_enqueue_burst also not need cast used->idx to volatile 
>> ? Why not remove it?
> I checked the code. Seems we can remove. That is another issue.
> For your issue, you meet problem, and submit this this patch, but i am a
> bit confused it is the root cause. Do you check the asm code that
> volatile is optimized?

I'm not sure about this too.How to check volatile is optimized?

> 
>>> /* Kick guest if required. */
>>> if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
>>> eventfd_write((int)vq->callfd, 1);
>>>
>> --
>> Regards,
>> Haifeng
>
>>>
>>> .
>>>
> 
> 
> .
> 

-- 
Regards,
Haifeng

[dpdk-dev] [PATCH v2] hash: fix breaking strict-aliasing rules

2015-03-24 Thread Yerden Zhumabekov

Fix rte_hash_crc() function by making use of uintptr_t variable
to hold a pointer to data being hashed. In this way, casting uint64_t
pointer to uint32_t avoided.

Signed-off-by: Yerden Zhumabekov 
---
 lib/librte_hash/rte_hash_crc.h |   21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index 1cd626c..abdbd9a 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -513,35 +513,36 @@ rte_hash_crc(const void *data, uint32_t data_len, 
uint32_t init_val)
 {
unsigned i;
uint64_t temp = 0;
-   const uint64_t *p64 = (const uint64_t *)data;
+   uintptr_t pd = (uintptr_t) data;

for (i = 0; i < data_len / 8; i++) {
-   init_val = rte_hash_crc_8byte(*p64++, init_val);
+   init_val = rte_hash_crc_8byte(*(const uint64_t *)pd, init_val);
+   pd += 8;
}

switch (7 - (data_len & 0x07)) {
case 0:
-   temp |= (uint64_t) *((const uint8_t *)p64 + 6) << 48;
+   temp |= (uint64_t) *((const uint8_t *)pd + 6) << 48;
/* Fallthrough */
case 1:
-   temp |= (uint64_t) *((const uint8_t *)p64 + 5) << 40;
+   temp |= (uint64_t) *((const uint8_t *)pd + 5) << 40;
/* Fallthrough */
case 2:
-   temp |= (uint64_t) *((const uint8_t *)p64 + 4) << 32;
-   temp |= *((const uint32_t *)p64);
+   temp |= (uint64_t) *((const uint8_t *)pd + 4) << 32;
+   temp |= *(const uint32_t *)pd;
init_val = rte_hash_crc_8byte(temp, init_val);
break;
case 3:
-   init_val = rte_hash_crc_4byte(*(const uint32_t *)p64, init_val);
+   init_val = rte_hash_crc_4byte(*(const uint32_t *)pd, init_val);
break;
case 4:
-   temp |= *((const uint8_t *)p64 + 2) << 16;
+   temp |= *((const uint8_t *)pd + 2) << 16;
/* Fallthrough */
case 5:
-   temp |= *((const uint8_t *)p64 + 1) << 8;
+   temp |= *((const uint8_t *)pd + 1) << 8;
/* Fallthrough */
case 6:
-   temp |= *((const uint8_t *)p64);
+   temp |= *(const uint8_t *)pd;
init_val = rte_hash_crc_4byte(temp, init_val);
/* Fallthrough */
default:
-- 
1.7.9.5

[dpdk-dev] ovs-dpdk: placing the metadata

2015-03-24 Thread Zoltan Kiss

Hi,

I've noticed in lib/netdev-dpdk.c that __rte_pktmbuf_init() stores the 
packet metadata right after "struct rte_mbuf", and before the buffer data:

 /* start of buffer is just after mbuf structure */
 m->buf_addr = (char *)m + sizeof(struct dp_packet);

(struct dp_packet has the rte_mbuf as first member if DPDK enabled) 

However, lib/librte_mbuf/rte_mbuf.h seems to codify that the buffer 
should start right after the rte_mbuf:

/**
  * Given the buf_addr returns the pointer to corresponding mbuf.
  */
#define RTE_MBUF_FROM_BADDR(ba) (((struct rte_mbuf *)(ba)) - 1)

/**
  * Given the pointer to mbuf returns an address where it's  buf_addr
  * should point to.
  */
#define RTE_MBUF_TO_BADDR(mb)   (((struct rte_mbuf *)(mb)) + 1)

These macros are used for attaching/detaching mbuf's to each other. This 
is the way the code retrieves the direct buffer from an indirect one, 
and vica versa. I think if we want to keep the metadata feature (which I 
guess is quite important), we need to add a pointer to rte_mbuf, which 
helps the direct and indirect structs to find each other. Something like:

struct rte_mbuf *attach;/**< Points to the other buffer if this one
 is (in)direct. Otherwise NULL.  */

What do you think?

Regards,

Zoltan Kiss

[dpdk-dev] Need some info on the impact of BIOS setting

2015-03-24 Thread Shankari Vaidyalingam

Hi

I'd like to know the impact of the virtualization setting (Intel VT set to
enabled in BIOS) on the execution of the DPDK applications.
Do I need to disable this setting while executing the DPDK applications in
virtual machines in VirtualBox.
Any help in this regard is appreciated.

Regards
Shankari.V

[dpdk-dev] Packet data out of bounds after rte_eth_rx_burst

2015-03-24 Thread Bruce Richardson

On Tue, Mar 24, 2015 at 04:10:18PM +0200, Dor Green wrote:
> 1 . The eth_conf is:
> 
> static struct rte_eth_conf const ethconf = {
> .link_speed = 0,
> .link_duplex = 0,
> 
> .rxmode = {
> .mq_mode = ETH_MQ_RX_RSS,
> .max_rx_pkt_len = ETHER_MAX_LEN,
> .split_hdr_size = 0,
> .header_split = 0,
> .hw_ip_checksum = 0,
> .hw_vlan_filter = 0,
> .jumbo_frame = 0,
> .hw_strip_crc = 0,   /**< CRC stripped by hardware */
> },
> 
> .txmode = {
> },
> 
> .rx_adv_conf = {
> .rss_conf = {
> .rss_key = NULL,
> .rss_hf = ETH_RSS_IPV4 | ETH_RSS_IPV6,
> }
> },
> 
> .fdir_conf = {
> .mode = RTE_FDIR_MODE_SIGNATURE,
> 
> },
> 
> .intr_conf = {
> .lsc = 0,
> },
> };
> 
> I've tried setting jumbo frames on with a larger packet length and
> even turning off RSS/FDIR. No luck.
> 
> I don't see anything relating to the port in the initial prints, what
> are you looking for?

I'm looking for the PMD initialization text, like that shown below (from 
testpmd):
Configuring Port 0 (socket 0)
PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7f9ba08cd700 hw_ring=0x7f9ba0b00080 
dma_addr=0x36d00080
PMD: ixgbe_set_tx_function(): Using simple tx code path
PMD: ixgbe_set_tx_function(): Vector tx enabled.
PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x7f9ba08cce80 hw_ring=0x7f9ba0b10080 
dma_addr=0x36d10080
PMD: ixgbe_set_rx_function(): Vector rx enabled, please make sure RX burst size 
no less than 32.
Port 0: 68:05:CA:04:51:3A
Configuring Port 1 (socket 0)
PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7f9ba08cab40 hw_ring=0x7f9ba0b20100 
dma_addr=0x36d20100
PMD: ixgbe_set_tx_function(): Using simple tx code path
PMD: ixgbe_set_tx_function(): Vector tx enabled.
PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x7f9ba08ca2c0 hw_ring=0x7f9ba0b30100 
dma_addr=0x36d30100
PMD: ixgbe_set_rx_function(): Vector rx enabled, please make sure RX burst size 
no less than 32.
Port 1: 68:05:CA:04:51:38

This tells us what RX and TX functions are going to be used for each port.

> 
> 2. The packet is a normal, albeit somewhat large (1239 bytes) TCP data
> packet (SSL certificate data, specifically).
> One important thing of note that I've just realised is that it's not
> this "packet of death" which causes the segmentation fault (i.e. has
> an out-of-bounds address for its data), but the packet afterwards-- no
> matter what packet it is.
> 
Can this problem be reproduced using testpmd or any of the standard dpdk 
example apps, by sending in the same packet sequence?

Is there anything unusual being done in the setup of the mempool used for the
packet buffers?

/Bruce

> 
> On Tue, Mar 24, 2015 at 3:17 PM, Bruce Richardson
>  wrote:
> > On Tue, Mar 24, 2015 at 12:54:14PM +0200, Dor Green wrote:
> >> I've managed to fix it so 1.8 works, and the segmentation fault still 
> >> occurs.
> >>
> >> On Tue, Mar 24, 2015 at 11:55 AM, Dor Green  wrote:
> >> > I tried 1.8, but that fails to initialize my device and fails at the pci 
> >> > probe:
> >> > "Cause: Requested device :04:00.1 cannot be used"
> >> > Can't even compile 2.0rc2 atm, getting:
> >> > "/usr/lib/gcc/x86_64-linux-gnu/4.6/include/emmintrin.h:701:1: note:
> >> > expected '__m128i' but argument is of type 'int'"
> >> > For reasons I don't understand.
> >> >
> >> > As for the example apps (in 1.7), I can run them properly but I don't
> >> > think any of them do the same processing as I do. Note that mine does
> >> > work with most packets.
> >> >
> >> >
> >
> > Couple of further questions:
> > 1. What config options are being used to configure the port and what is the
> > output printed at port initialization time? This is needed to let us track 
> > down
> > what specific RX path is being used inside the ixgbe driver
> > 2. What type of packets specifically cause problems? Is it reproducible with
> > one particular packet, or packet type? Are you sending in jumbo-frames?
> >
> > Regards,
> > /Bruce
> >
> >> > On Mon, Mar 23, 2015 at 11:24 PM, Matthew Hall  >> > mhcomputing.net> wrote:
> >> >> On Mon, Mar 23, 2015 at 05:19:00PM +0200, Dor Green wrote:
> >> >>> I changed it to free and it still happens. Note that the segmentation 
> >> >>> fault
> >> >>> happens before that anyway.
> >> >>>
> >> >>> I am using 1.7.1 at the moment. I can try using a newer version.
> >> >>
> >> >> I'm using 1.7.X in my open-source DPDK-based app and it works, but I 
> >> >> have an
> >> >> IGB 1-gigabit NIC though, and how RX / TX work are quite driver 
> >> >> specific of
> >> >> course.
> >> >>
> >> >> I suspect there's some issue with how things are working in your IXGBE 
> >> >> NIC
> >> >> driver / setup. Do the same failures occur inside of the DPDK's own 
> >> >> sample
> >> >> apps?
> >> >>
> >> >> Matthew.

[dpdk-dev] [PATCH] vhost: Fix `struct file' leakage in `eventfd_link'

2015-03-24 Thread Xie, Huawei

On 3/24/2015 7:10 PM, Pavel Boldin wrote:

On Tue, Mar 24, 2015 at 8:28 AM, Xie, Huawei mailto:huawei.xie at intel.com>> wrote:
On 3/23/2015 8:54 PM, Pavel Boldin wrote:
> Due to increased `struct file's reference counter subsequent call
> to `filp_close' does not free the `struct file'. Prepend `fput' call
> to decrease the reference counter.
>
> Signed-off-by: Pavel Boldin mailto:pboldin at 
> mirantis.com>>
> ---
>  lib/librte_vhost/eventfd_link/eventfd_link.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/lib/librte_vhost/eventfd_link/eventfd_link.c 
> b/lib/librte_vhost/eventfd_link/eventfd_link.c
> index 7755dd6..62c45c8 100644
> --- a/lib/librte_vhost/eventfd_link/eventfd_link.c
> +++ b/lib/librte_vhost/eventfd_link/eventfd_link.c
> @@ -117,6 +117,7 @@ eventfd_link_ioctl(struct file *f, unsigned int ioctl, 
> unsigned long arg)
>* Release the existing eventfd in the source process
>*/
>   spin_lock(>file_lock);
> + fput(file);
>   filp_close(file, files);
>   fdt = files_fdtable(files);
>   fdt->fd[eventfd_copy.source_fd] = NULL;
Acked-by Huawei Xie mailto:huawei.xie at intel.com>>

In future, we should remove the allocation of src eventfd, allocate a
free fd from kernel, and install it with target fd.

Well, I don't think this is correct, because this will put too much job for the 
kernel module making it a combiner.

At the moment I propose to accept module refactoring patch to the upstream.

After that the module can be reworked along with the application code. The 
reason is I can't work on DPDK since I have no hardware to test application at 
so I can't help with patch that touches application code.

Pavel
Pavel:
I am not asking to do it right now but in future, and i agree we accept the 
refactoring patch now, so i ack the patch.
Huawei

[dpdk-dev] Packet data out of bounds after rte_eth_rx_burst

2015-03-24 Thread Dor Green

1 . The eth_conf is:

static struct rte_eth_conf const ethconf = {
.link_speed = 0,
.link_duplex = 0,

.rxmode = {
.mq_mode = ETH_MQ_RX_RSS,
.max_rx_pkt_len = ETHER_MAX_LEN,
.split_hdr_size = 0,
.header_split = 0,
.hw_ip_checksum = 0,
.hw_vlan_filter = 0,
.jumbo_frame = 0,
.hw_strip_crc = 0,   /**< CRC stripped by hardware */
},

.txmode = {
},

.rx_adv_conf = {
.rss_conf = {
.rss_key = NULL,
.rss_hf = ETH_RSS_IPV4 | ETH_RSS_IPV6,
}
},

.fdir_conf = {
.mode = RTE_FDIR_MODE_SIGNATURE,

},

.intr_conf = {
.lsc = 0,
},
};

I've tried setting jumbo frames on with a larger packet length and
even turning off RSS/FDIR. No luck.

I don't see anything relating to the port in the initial prints, what
are you looking for?

2. The packet is a normal, albeit somewhat large (1239 bytes) TCP data
packet (SSL certificate data, specifically).
One important thing of note that I've just realised is that it's not
this "packet of death" which causes the segmentation fault (i.e. has
an out-of-bounds address for its data), but the packet afterwards-- no
matter what packet it is.


On Tue, Mar 24, 2015 at 3:17 PM, Bruce Richardson
 wrote:
> On Tue, Mar 24, 2015 at 12:54:14PM +0200, Dor Green wrote:
>> I've managed to fix it so 1.8 works, and the segmentation fault still occurs.
>>
>> On Tue, Mar 24, 2015 at 11:55 AM, Dor Green  wrote:
>> > I tried 1.8, but that fails to initialize my device and fails at the pci 
>> > probe:
>> > "Cause: Requested device :04:00.1 cannot be used"
>> > Can't even compile 2.0rc2 atm, getting:
>> > "/usr/lib/gcc/x86_64-linux-gnu/4.6/include/emmintrin.h:701:1: note:
>> > expected '__m128i' but argument is of type 'int'"
>> > For reasons I don't understand.
>> >
>> > As for the example apps (in 1.7), I can run them properly but I don't
>> > think any of them do the same processing as I do. Note that mine does
>> > work with most packets.
>> >
>> >
>
> Couple of further questions:
> 1. What config options are being used to configure the port and what is the
> output printed at port initialization time? This is needed to let us track 
> down
> what specific RX path is being used inside the ixgbe driver
> 2. What type of packets specifically cause problems? Is it reproducible with
> one particular packet, or packet type? Are you sending in jumbo-frames?
>
> Regards,
> /Bruce
>
>> > On Mon, Mar 23, 2015 at 11:24 PM, Matthew Hall  
>> > wrote:
>> >> On Mon, Mar 23, 2015 at 05:19:00PM +0200, Dor Green wrote:
>> >>> I changed it to free and it still happens. Note that the segmentation 
>> >>> fault
>> >>> happens before that anyway.
>> >>>
>> >>> I am using 1.7.1 at the moment. I can try using a newer version.
>> >>
>> >> I'm using 1.7.X in my open-source DPDK-based app and it works, but I have 
>> >> an
>> >> IGB 1-gigabit NIC though, and how RX / TX work are quite driver specific 
>> >> of
>> >> course.
>> >>
>> >> I suspect there's some issue with how things are working in your IXGBE NIC
>> >> driver / setup. Do the same failures occur inside of the DPDK's own sample
>> >> apps?
>> >>
>> >> Matthew.

[dpdk-dev] [PATCH v3] lib/librte_vhost: update used->idx when allocation of mbuf fails

2015-03-24 Thread Linhaifeng



On 2015/3/24 15:14, Xie, Huawei wrote:
> On 3/22/2015 8:08 PM, Ouyang, Changchun wrote:
>>
>>> -Original Message-
>>> From: linhaifeng [mailto:haifeng.lin at huawei.com]
>>> Sent: Saturday, March 21, 2015 9:47 AM
>>> To: dev at dpdk.org
>>> Cc: Ouyang, Changchun; Xie, Huawei
>>> Subject: [PATCH v3] lib/librte_vhost: update used->idx when allocation of
>>> mbuf fails
>>>
>>> From: Linhaifeng 
>>>
>>> When failed to malloc buffer from mempool we just update last_used_idx
>>> but not used->idx so after many times vhost thought have handle all packets
>>> but virtio_net thought vhost have not handle all packets and will not update
>>> avail->idx.
>>>
>>> Signed-off-by: Linhaifeng 
>> Acked-by: Changchun Ouyang 
>>
>>
>>
> Acked-by: Huawei Xie 
> 
> This patch fix the issue.
> Simple solution like other PMDs is before processing one descriptor,
> ensure allocation of new mbuf is successfull, and then immediately
> refill after receiving the packet from the descriptor.
> In future, we should consider optimized bulk allocation strategy with
> threshold.
> 

Hi, huawei

THis is patch is for librte_vhost.
Do you want to ack for the other patch for virtio-net-pmd?
> 
> 
> 
> 
> 

-- 
Regards,
Haifeng

[dpdk-dev] [PATCH] cast used->idx to volatile

2015-03-24 Thread Linhaifeng



On 2015/3/24 9:53, Xie, Huawei wrote:
> On 3/24/2015 9:00 AM, Linhaifeng wrote:
>>
>> On 2015/3/23 20:54, Xie, Huawei wrote:
>>>
 -Original Message-
 From: Linhaifeng [mailto:haifeng.lin at huawei.com]
 Sent: Monday, March 23, 2015 8:24 PM
 To: dev at dpdk.org
 Cc: Ouyang, Changchun; Xie, Huawei
 Subject: Re: [dpdk-dev] [PATCH] cast used->idx to volatile



 On 2015/3/21 16:07, linhaifeng wrote:
> From: Linhaifeng 
>
> Same as rte_vhost_enqueue_burst we should cast used->idx
> to volatile before notify guest.
>
> Signed-off-by: Linhaifeng 
> ---
>  lib/librte_vhost/vhost_rxtx.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
> index 535c7a1..8d674d1 100644
> --- a/lib/librte_vhost/vhost_rxtx.c
> +++ b/lib/librte_vhost/vhost_rxtx.c
> @@ -722,7 +722,7 @@ rte_vhost_dequeue_burst(struct virtio_net *dev,
 uint16_t queue_id,
>   }
>
>   rte_compiler_barrier();
> - vq->used->idx += entry_success;
> + *(volatile uint16_t *)>used->idx += entry_success;
>>>
>>> Haifeng:
>>> We have compiler barrier before and an external function call behind, so we 
>>> don't need volatile  here.
>>> Do you meet issue?
>>>
>> Tx_q is sometimes stopped when we use virtio_net. Because vhost thought 
>> there are no buffers in tx_q and virtio_net
>> though vhost haven't handle all packets so we have to restart VM to restore 
>> work.
>>
>> The status in VM is:
>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246687] net eth7: virtnet_poll
>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246690] net eth7: receive_buf
>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246693] net eth7: vi->num=239
>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246695] net eth7: 
>> svq:avail->idx=52939 used->idx=52939 num_free=18 num_added=0 
>> svq->last_used_idx=52820
>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246699] net eth7: 
>> rvq:avail->idx=36215 used->idx=35977 num_free=18 num_added=0 
>> rvq->last_used_idx=35977
>> Mar 18 17:11:11 linux-b2ij kernel: [46337.901038] net eth7: dev_queue_xmit, 
>> qdisc->flags=4, qdisc->state deactiveed=0
>> Mar 18 17:11:12 linux-b2ij kernel: [46337.901042] net eth7: dev_queue_xmit, 
>> txq->state=1, stopped=1
>>
>> Why compiler barrier not take effect in our case? Is compiler barrier 
>> depended on -O3 option? We use -O2 option.
> compiler barrier always works regardless of the optimization option.
> I don't get your story, but the key thing is, do you check the asm code?
> If called from outside as an API, how is it possible it is optimized?
> there is only one update to used->idx in that function.


Do you mean rte_vhost_enqueue_burst also not need cast used->idx to volatile ? 
Why not remove it?

>>
>   /* Kick guest if required. */
>   if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
>   eventfd_write((int)vq->callfd, 1);
>
 --
 Regards,
 Haifeng
>>>
>>>
> 
> 
> .
> 

-- 
Regards,
Haifeng

[dpdk-dev] [PATCH] mk: added make target to print out system info

2015-03-24 Thread John McNamara

Added a 'make system_info' target to print out system info
related to DPDK. This is intended as output that can be
attached to bug reports.
---
 mk/rte.sdkroot.mk | 33 +
 1 file changed, 33 insertions(+)

diff --git a/mk/rte.sdkroot.mk b/mk/rte.sdkroot.mk
index e8423b0..b477d09 100644
--- a/mk/rte.sdkroot.mk
+++ b/mk/rte.sdkroot.mk
@@ -123,3 +123,36 @@ examples examples_clean:
 %:
$(Q)$(MAKE) -f $(RTE_SDK)/mk/rte.sdkconfig.mk checkconfig
$(Q)$(MAKE) -f $(RTE_SDK)/mk/rte.sdkbuild.mk $@
+
+.PHONY: system_info
+system_info:
+   $(Q)echo
+   $(Q)echo "CC version"
+   $(Q)echo "=="
+   $(Q)$(CC) --version
+   $(Q)echo
+
+   $(Q)echo "DPDK version"
+   $(Q)echo ""
+   $(Q)$(MAKE) showversion
+   $(Q)echo
+
+   $(Q)echo "Git commit"
+   $(Q)echo "=="
+   $(Q)git log --pretty=format:'%H' -1
+   $(Q)echo
+
+   $(Q)echo "Uname"
+   $(Q)echo "="
+   $(Q)uname -srvmpio
+   $(Q)echo
+
+   $(Q)echo "Hugepages"
+   $(Q)echo "="
+   $(Q)grep -i huge /proc/meminfo
+   $(Q)echo
+
+   $(Q)tools/cpu_layout.py
+
+   $(Q)tools/dpdk_nic_bind.py --status
+   $(Q)echo
-- 
1.8.1.4

[dpdk-dev] [PATCH] mk: added make target to print out system info

2015-03-24 Thread John McNamara

Added a 'make system_info' target to print out system info related to
DPDK. This is intended as output that can be attached to bug reports.

This is related to the recent call for tools brainstorming by Thomas.

http://dpdk.org/ml/archives/dev/2015-March/015499.html

Bug reports to the DPDK mailing list rarely have enough information and
require several follow-up questions to determine the version of software, the
OS, the compiler etc.

This 'make' target prints out some useful system information that can be
attached to an email. There is no guarantee that the end user will do this but
at least it can be documented and we can point them to it.

Suggestions for other useful information to be output welcome. Untested on
FreeBSD for now.


Sample output:

$ make system_info  

CC version
==
cc (GCC) 4.7.2 20121109 (Red Hat 4.7.2-8)
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


DPDK version

2.0.0-rc2

Git commit
==
0431d322b80858b9efa1b43480be6d4ddccf4a66

Uname
=
Linux 3.6.10-4.fc18.x86_64 #1 SMP Tue Dec 11 18:01:27 UTC 2012 x86_64 x86_64 
x86_64 GNU/Linux

Hugepages
=
AnonHugePages:102400 kB
HugePages_Total:1024
HugePages_Free:0
HugePages_Rsvd:0
HugePages_Surp:0
Hugepagesize:   2048 kB


Core and Socket Information (as reported by '/proc/cpuinfo')


cores =  [0, 1, 2, 3, 4, 5, 6, 7]
sockets =  [0, 1]

   Socket 0Socket 1
   
Core 0 [0, 16] [8, 24] 

Core 1 [1, 17] [9, 25] 

Core 2 [2, 18] [10, 26]

Core 3 [3, 19] [11, 27]

Core 4 [4, 20] [12, 28]

Core 5 [5, 21] [13, 29]

Core 6 [6, 22] [14, 30]

Core 7 [7, 23] [15, 31]


Network devices using DPDK-compatible driver

:84:00.0 '82599EB 10-Gigabit SFI/SFP+ Network Connection' drv=igb_uio 
unused=
:86:00.0 '82599EB 10-Gigabit SFI/SFP+ Network Connection' drv=igb_uio 
unused=

Network devices using kernel driver
===
:03:00.0 'RTL8111/8168 PCI Express Gigabit Ethernet controller' if=p260p1 
drv=r8169 unused=igb_uio 
:06:00.0 'I350 Gigabit Network Connection' if=em0 drv=igb unused=igb_uio 
:06:00.1 'I350 Gigabit Network Connection' if=eth0 drv=igb unused=igb_uio 
:81:00.0 '82599EB 10-Gigabit SFI/SFP+ Network Connection' if=p264p1 
drv=ixgbe unused=igb_uio 
:81:00.1 '82599EB 10-Gigabit SFI/SFP+ Network Connection' if=p264p2 
drv=ixgbe unused=igb_uio 
:84:00.1 '82599EB 10-Gigabit SFI/SFP+ Network Connection' if=p263p2 
drv=ixgbe unused=igb_uio 
:86:00.1 '82599EB 10-Gigabit SFI/SFP+ Network Connection' if=p262p2 
drv=ixgbe unused=igb_uio 
:88:00.0 '82599EB 10-Gigabit SFI/SFP+ Network Connection' if=p261p1 
drv=ixgbe unused=igb_uio 
:88:00.1 '82599EB 10-Gigabit SFI/SFP+ Network Connection' if=p261p2 
drv=ixgbe unused=igb_uio 

Other network devices
=






John McNamara (1):
  mk: added make target to print out system info

 mk/rte.sdkroot.mk | 33 +
 1 file changed, 33 insertions(+)

-- 
1.8.1.4

[dpdk-dev] [PATCH v2] ixgbe: fix ixgbe access endian issue on bigendian arch

2015-03-24 Thread xuelin....@freescale.com

From: Xuelin Shi 

enforce rules of the cpu and ixgbe exchange data.
1. cpu use data owned by ixgbe must use rte_le_to_cpu_xx(...)
2. cpu fill data to ixgbe must use rte_cpu_to_le_xx(...)

Signed-off-by: Xuelin Shi 
---
changes for v2:
  rebased on latest head.
  fix some style issue detected by checkpatch.pl

 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 104 --
 1 file changed, 66 insertions(+), 38 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index 9da2c7e..961ac08 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -128,7 +128,7 @@ ixgbe_tx_free_bufs(struct ixgbe_tx_queue *txq)
int i;

/* check DD bit on threshold descriptor */
-   status = txq->tx_ring[txq->tx_next_dd].wb.status;
+   status = rte_le_to_cpu_32(txq->tx_ring[txq->tx_next_dd].wb.status);
if (! (status & IXGBE_ADVTXD_STAT_DD))
return 0;

@@ -174,11 +174,14 @@ tx4(volatile union ixgbe_adv_tx_desc *txdp, struct 
rte_mbuf **pkts)
pkt_len = (*pkts)->data_len;

/* write data to descriptor */
-   txdp->read.buffer_addr = buf_dma_addr;
+   txdp->read.buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
+
txdp->read.cmd_type_len =
-   ((uint32_t)DCMD_DTYP_FLAGS | pkt_len);
+   rte_cpu_to_le_32((uint32_t)DCMD_DTYP_FLAGS | pkt_len);
+
txdp->read.olinfo_status =
-   (pkt_len << IXGBE_ADVTXD_PAYLEN_SHIFT);
+   rte_cpu_to_le_32(pkt_len << IXGBE_ADVTXD_PAYLEN_SHIFT);
+
rte_prefetch0(&(*pkts)->pool);
}
 }
@@ -194,11 +197,14 @@ tx1(volatile union ixgbe_adv_tx_desc *txdp, struct 
rte_mbuf **pkts)
pkt_len = (*pkts)->data_len;

/* write data to descriptor */
-   txdp->read.buffer_addr = buf_dma_addr;
+   txdp->read.buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
+
txdp->read.cmd_type_len =
-   ((uint32_t)DCMD_DTYP_FLAGS | pkt_len);
+   rte_cpu_to_le_32((uint32_t)DCMD_DTYP_FLAGS | pkt_len);
+
txdp->read.olinfo_status =
-   (pkt_len << IXGBE_ADVTXD_PAYLEN_SHIFT);
+   rte_cpu_to_le_32(pkt_len << IXGBE_ADVTXD_PAYLEN_SHIFT);
+
rte_prefetch0(&(*pkts)->pool);
 }

@@ -285,7 +291,7 @@ tx_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 * a divisor of the ring size
 */
tx_r[txq->tx_next_rs].read.cmd_type_len |=
-   rte_cpu_to_le_32(IXGBE_ADVTXD_DCMD_RS);
+   rte_cpu_to_le_32(IXGBE_ADVTXD_DCMD_RS);
txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);

txq->tx_tail = 0;
@@ -304,7 +310,7 @@ tx_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 */
if (txq->tx_tail > txq->tx_next_rs) {
tx_r[txq->tx_next_rs].read.cmd_type_len |=
-   rte_cpu_to_le_32(IXGBE_ADVTXD_DCMD_RS);
+   rte_cpu_to_le_32(IXGBE_ADVTXD_DCMD_RS);
txq->tx_next_rs = (uint16_t)(txq->tx_next_rs +
txq->tx_rs_thresh);
if (txq->tx_next_rs >= txq->nb_tx_desc)
@@ -505,6 +511,7 @@ ixgbe_xmit_cleanup(struct ixgbe_tx_queue *txq)
uint16_t nb_tx_desc = txq->nb_tx_desc;
uint16_t desc_to_clean_to;
uint16_t nb_tx_to_clean;
+   uint32_t stat;

/* Determine the last descriptor needing to be cleaned */
desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
@@ -513,7 +520,9 @@ ixgbe_xmit_cleanup(struct ixgbe_tx_queue *txq)

/* Check to make sure the last descriptor to clean is done */
desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-   if (! (txr[desc_to_clean_to].wb.status & IXGBE_TXD_STAT_DD))
+
+   stat = rte_le_to_cpu_32(txr[desc_to_clean_to].wb.status);
+   if (!(stat & IXGBE_TXD_STAT_DD))
{
PMD_TX_FREE_LOG(DEBUG,
"TX descriptor %4u is not done"
@@ -544,7 +553,7 @@ ixgbe_xmit_cleanup(struct ixgbe_tx_queue *txq)
 * up to the last descriptor with the RS bit set
 * are done. Only reset the threshold descriptor.
 */
-   txr[desc_to_clean_to].wb.status = 0;
+   txr[desc_to_clean_to].wb.status = rte_cpu_to_le_32(0);

/* Update the txq to reflect the last descriptor that was cleaned */
txq->last_desc_cleaned = desc_to_clean_to;
@@ -801,12 +810,14 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 */
slen = m_seg->data_len;
buf_dma_addr = RTE_MBUF_DATA_DMA_ADDR(m_seg);
+
txd->read.buffer_addr =
-

[dpdk-dev] [PATCH v2] librte_lpm: define tbl entry reversely for big endian

2015-03-24 Thread xuelin....@freescale.com

From: Xuelin Shi 

This module uses type conversion between struct and int.
Also truncation and comparison is used with this int.
It is not safe for different endian arch.

Add ifdef for big endian struct to fix this issue.

Signed-off-by: Xuelin Shi 
---
changes for v2:
  add 

 lib/librte_lpm/rte_lpm.h | 20 
 1 file changed, 20 insertions(+)

diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
index 1af150c..6cbcd4c 100644
--- a/lib/librte_lpm/rte_lpm.h
+++ b/lib/librte_lpm/rte_lpm.h
@@ -44,6 +44,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -96,6 +97,7 @@ extern "C" {
 /** Bitmask used to indicate successful lookup */
 #define RTE_LPM_LOOKUP_SUCCESS  0x0100

+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
 /** @internal Tbl24 entry structure. */
 struct rte_lpm_tbl24_entry {
/* Stores Next hop or group index (i.e. gindex)into tbl8. */
@@ -117,6 +119,24 @@ struct rte_lpm_tbl8_entry {
uint8_t valid_group :1; /**< Group validation flag. */
uint8_t depth   :6; /**< Rule depth. */
 };
+#else
+struct rte_lpm_tbl24_entry {
+   uint8_t depth   :6;
+   uint8_t ext_entry   :1;
+   uint8_t valid   :1;
+   union {
+   uint8_t tbl8_gindex;
+   uint8_t next_hop;
+   };
+};
+
+struct rte_lpm_tbl8_entry {
+   uint8_t depth   :6;
+   uint8_t valid_group :1;
+   uint8_t valid   :1;
+   uint8_t next_hop;
+};
+#endif

 /** @internal Rule structure. */
 struct rte_lpm_rule {
-- 
1.9.1

[dpdk-dev] [PATCH v2 6/6] eal: Fix interface of pci_map_resource()

2015-03-24 Thread Tetsuya Mukawa

The function is implemented in both linuxapp and bsdapp, but interface
is different. The patch fixes the function of bsdapp to do same as
linuxapp. After applying it, file descriptor should be opened and
closed out of pci_map_resource().
Also, remove redundant error messages from linuxapp.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/bsdapp/eal/eal_pci.c   | 111 ++
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c |  21 +++---
 2 files changed, 77 insertions(+), 55 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 08b91b4..d83916b 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -100,7 +100,7 @@ struct mapped_pci_resource {

struct rte_pci_addr pci_addr;
char path[PATH_MAX];
-   size_t nb_maps;
+   int nb_maps;
struct pci_map maps[PCI_MAX_RESOURCE];
 };

@@ -122,47 +122,30 @@ pci_unbind_kernel_driver(struct rte_pci_device *dev 
__rte_unused)

 /* map a particular resource from a file */
 static void *
-pci_map_resource(void *requested_addr, const char *devname, off_t offset,
-size_t size)
+pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size,
+int additional_flags)
 {
-   int fd;
void *mapaddr;

-   /*
-* open devname, to mmap it
-*/
-   fd = open(devname, O_RDWR);
-   if (fd < 0) {
-   RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
-   devname, strerror(errno));
-   goto fail;
-   }
-
/* Map the PCI memory resource of device */
mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
-   MAP_SHARED, fd, offset);
-   close(fd);
-   if (mapaddr == MAP_FAILED ||
-   (requested_addr != NULL && mapaddr != requested_addr)) {
-   RTE_LOG(ERR, EAL, "%s(): cannot mmap(%s(%d), %p, 0x%lx, 0x%lx):"
-   " %s (%p)\n", __func__, devname, fd, requested_addr,
+   MAP_SHARED | additional_flags, fd, offset);
+   if (mapaddr == MAP_FAILED) {
+   RTE_LOG(ERR, EAL,
+   "%s(): cannot mmap(%d, %p, 0x%lx, 0x%lx): %s (%p)\n",
+   __func__, fd, requested_addr,
(unsigned long)size, (unsigned long)offset,
strerror(errno), mapaddr);
-   goto fail;
-   }
-
-   RTE_LOG(DEBUG, EAL, "  PCI memory mapped at %p\n", mapaddr);
+   } else
+   RTE_LOG(DEBUG, EAL, "  PCI memory mapped at %p\n", mapaddr);

return mapaddr;
-
-fail:
-   return NULL;
 }

 static int
 pci_uio_map_secondary(struct rte_pci_device *dev)
 {
-   size_t i;
+   int i, fd;
struct mapped_pci_resource *uio_res;
struct mapped_pci_res_list *uio_res_list =
RTE_TAILQ_CAST(rte_uio_tailq.head, mapped_pci_res_list);
@@ -170,19 +153,34 @@ pci_uio_map_secondary(struct rte_pci_device *dev)
TAILQ_FOREACH(uio_res, uio_res_list, next) {

/* skip this element if it doesn't match our PCI address */
-   if (memcmp(_res->pci_addr, >addr, sizeof(dev->addr)))
+   if (rte_eal_compare_pci_addr(_res->pci_addr, >addr))
continue;

for (i = 0; i != uio_res->nb_maps; i++) {
-   if (pci_map_resource(uio_res->maps[i].addr,
-uio_res->path,
-(off_t)uio_res->maps[i].offset,
-(size_t)uio_res->maps[i].size)
-   != uio_res->maps[i].addr) {
+   /*
+* open devname, to mmap it
+*/
+   fd = open(uio_res->maps[i].path, O_RDWR);
+   if (fd < 0) {
+   RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+   uio_res->maps[i].path, strerror(errno));
+   return -1;
+   }
+
+   void *mapaddr = pci_map_resource(uio_res->maps[i].addr,
+   fd, (off_t)uio_res->maps[i].offset,
+   (size_t)uio_res->maps[i].size, 0);
+   if (mapaddr != uio_res->maps[i].addr) {
RTE_LOG(ERR, EAL,
-   "Cannot mmap device resource\n");
+   "Cannot mmap device resource "
+   "file %s to address: %p\n",
+   uio_res->maps[i].path,
+   uio_res->maps[i].addr);
+   close(fd);

[dpdk-dev] [PATCH v2 5/6] eal: Use map_idx in pci_uio_map_resource() of bsdapp to work same as linuxapp

2015-03-24 Thread Tetsuya Mukawa

This patch changes code that maps pci resources in bsdapp.
Linuxapp has almost same code. To consolidate both, fix implementation
of bsdapp to work same as linuxapp.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/bsdapp/eal/eal_pci.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 85f8671..08b91b4 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -195,7 +195,7 @@ pci_uio_map_secondary(struct rte_pci_device *dev)
 static int
 pci_uio_map_resource(struct rte_pci_device *dev)
 {
-   int i, j;
+   int i, map_idx;
char devname[PATH_MAX]; /* contains the /dev/uioX */
void *mapaddr;
uint64_t phaddr;
@@ -247,31 +247,31 @@ pci_uio_map_resource(struct rte_pci_device *dev)
pagesz = sysconf(_SC_PAGESIZE);

maps = uio_res->maps;
-   for (i = uio_res->nb_maps = 0; i != PCI_MAX_RESOURCE; i++) {
+   for (i = 0, map_idx = 0; i != PCI_MAX_RESOURCE; i++) {

-   j = uio_res->nb_maps;
/* skip empty BAR */
if ((phaddr = dev->mem_resource[i].phys_addr) == 0)
continue;

/* if matching map is found, then use it */
offset = i * pagesz;
-   maps[j].offset = offset;
-   maps[j].phaddr = dev->mem_resource[i].phys_addr;
-   maps[j].size = dev->mem_resource[i].len;
-   if (maps[j].addr != NULL ||
-   (mapaddr = pci_map_resource(NULL, devname, (off_t)offset,
-   (size_t)maps[j].size)
-   ) == NULL) {
+   maps[map_idx].offset = offset;
+   maps[map_idx].phaddr = dev->mem_resource[i].phys_addr;
+   maps[map_idx].size = dev->mem_resource[i].len;
+   mapaddr = pci_map_resource(NULL, devname, (off_t)offset,
+   (size_t)maps[map_idx].size);
+   if ((maps[map_idx].addr != NULL) || (mapaddr == NULL)) {
rte_free(uio_res);
return -1;
}

-   maps[j].addr = mapaddr;
-   uio_res->nb_maps++;
+   maps[map_idx].addr = mapaddr;
+   map_idx++;
dev->mem_resource[i].addr = mapaddr;
}

+   uio_res->nb_maps = map_idx;
+
TAILQ_INSERT_TAIL(uio_res_list, uio_res, next);

return 0;
-- 
1.9.1

[dpdk-dev] [PATCH v2 4/6] eal/bsdapp: Change names of pci related data structure

2015-03-24 Thread Tetsuya Mukawa

To merge pci code of linuxapp and bsdapp, this patch changes names
like below.
 - uio_map to pci_map
 - uio_resource to mapped_pci_resource
 - uio_res_list to mapped_pci_res_list
Also, add 'path' variable to pci_map of bsdapp.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/bsdapp/eal/eal_pci.c | 25 +
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 3a22b49..85f8671 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -83,8 +83,9 @@
  * enabling bus master.
  */

-struct uio_map {
+struct pci_map {
void *addr;
+   char *path;
uint64_t offset;
uint64_t size;
uint64_t phaddr;
@@ -94,16 +95,16 @@ struct uio_map {
  * For multi-process we need to reproduce all PCI mappings in secondary
  * processes, so save them in a tailq.
  */
-struct uio_resource {
-   TAILQ_ENTRY(uio_resource) next;
+struct mapped_pci_resource {
+   TAILQ_ENTRY(mapped_pci_resource) next;

struct rte_pci_addr pci_addr;
char path[PATH_MAX];
size_t nb_maps;
-   struct uio_map maps[PCI_MAX_RESOURCE];
+   struct pci_map maps[PCI_MAX_RESOURCE];
 };

-TAILQ_HEAD(uio_res_list, uio_resource);
+TAILQ_HEAD(mapped_pci_res_list, mapped_pci_resource);

 static struct rte_tailq_elem rte_uio_tailq = {
.name = "UIO_RESOURCE_LIST",
@@ -162,9 +163,9 @@ static int
 pci_uio_map_secondary(struct rte_pci_device *dev)
 {
size_t i;
-   struct uio_resource *uio_res;
-   struct uio_res_list *uio_res_list =
-   RTE_TAILQ_CAST(rte_uio_tailq.head, uio_res_list);
+   struct mapped_pci_resource *uio_res;
+   struct mapped_pci_res_list *uio_res_list =
+   RTE_TAILQ_CAST(rte_uio_tailq.head, mapped_pci_res_list);

TAILQ_FOREACH(uio_res, uio_res_list, next) {

@@ -201,10 +202,10 @@ pci_uio_map_resource(struct rte_pci_device *dev)
uint64_t offset;
uint64_t pagesz;
struct rte_pci_addr *loc = >addr;
-   struct uio_resource *uio_res;
-   struct uio_res_list *uio_res_list =
-   RTE_TAILQ_CAST(rte_uio_tailq.head, uio_res_list);
-   struct uio_map *maps;
+   struct mapped_pci_resource *uio_res;
+   struct mapped_pci_res_list *uio_res_list =
+   RTE_TAILQ_CAST(rte_uio_tailq.head, mapped_pci_res_list);
+   struct pci_map *maps;

dev->intr_handle.fd = -1;
dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
-- 
1.9.1

[dpdk-dev] [PATCH v2 3/6] eal: Fix memory leaks and needless increment of pci_map_addr

2015-03-24 Thread Tetsuya Mukawa

This patch fixes following memory leaks.
- When pci_map_resource() is failed but path is allocated correctly,
  path won't be freed in pci_uio_map_recource().
- When open() is failed, uio_res won't be freed in
  pci_uio_map_resource().
- When pci_uio_unmap() is called, path should be freed.

Also, fixes below.
- When pci_map_resource() is failed, mapaddr will be MAP_FAILED.
  In this case, pci_map_addr should not be incremented in
  pci_uio_map_resource().
- To shrink code, move close().
- Remove fail variable.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 35 ++-
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index f0277be..0128cec 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -333,7 +333,6 @@ pci_uio_map_resource(struct rte_pci_device *dev)
maps = uio_res->maps;
for (i = 0, map_idx = 0; i != PCI_MAX_RESOURCE; i++) {
int fd;
-   int fail = 0;

/* skip empty BAR */
phaddr = dev->mem_resource[i].phys_addr;
@@ -347,6 +346,11 @@ pci_uio_map_resource(struct rte_pci_device *dev)
loc->domain, loc->bus, loc->devid, 
loc->function,
i);

+   /* allocate memory to keep path */
+   maps[map_idx].path = rte_malloc(NULL, strlen(devname) + 1, 0);
+   if (maps[map_idx].path == NULL)
+   goto fail0;
+
/*
 * open resource file, to mmap it
 */
@@ -354,7 +358,7 @@ pci_uio_map_resource(struct rte_pci_device *dev)
if (fd < 0) {
RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
devname, strerror(errno));
-   return -1;
+   goto fail1;
}

/* try mapping somewhere close to the end of hugepages */
@@ -363,23 +367,13 @@ pci_uio_map_resource(struct rte_pci_device *dev)

mapaddr = pci_map_resource(pci_map_addr, fd, 0,
(size_t)dev->mem_resource[i].len, 0);
+   close(fd);
if (mapaddr == MAP_FAILED)
-   fail = 1;
+   goto fail1;

pci_map_addr = RTE_PTR_ADD(mapaddr,
(size_t)dev->mem_resource[i].len);

-   maps[map_idx].path = rte_malloc(NULL, strlen(devname) + 1, 0);
-   if (maps[map_idx].path == NULL)
-   fail = 1;
-
-   if (fail) {
-   rte_free(uio_res);
-   close(fd);
-   return -1;
-   }
-   close(fd);
-
maps[map_idx].phaddr = dev->mem_resource[i].phys_addr;
maps[map_idx].size = dev->mem_resource[i].len;
maps[map_idx].addr = mapaddr;
@@ -394,6 +388,15 @@ pci_uio_map_resource(struct rte_pci_device *dev)
TAILQ_INSERT_TAIL(uio_res_list, uio_res, next);

return 0;
+
+fail1:
+   rte_free(maps[map_idx].path);
+fail0:
+   for (i = 0; i < map_idx; i++)
+   rte_free(maps[i].path);
+   rte_free(uio_res);
+
+   return -1;
 }

 #ifdef RTE_LIBRTE_EAL_HOTPLUG
@@ -405,9 +408,11 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
if (uio_res == NULL)
return;

-   for (i = 0; i != uio_res->nb_maps; i++)
+   for (i = 0; i != uio_res->nb_maps; i++) {
pci_unmap_resource(uio_res->maps[i].addr,
(size_t)uio_res->maps[i].size);
+   rte_free(uio_res->maps[i].path);
+   }
 }

 static struct mapped_pci_resource *
-- 
1.9.1

[dpdk-dev] [PATCH v2 2/6] eal: Close file descriptor of uio configuration

2015-03-24 Thread Tetsuya Mukawa

When pci_uio_unmap_resource() is called, a file descriptor that is used
for uio configuration should be closed.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index 9cdf24f..f0277be 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -459,8 +459,12 @@ pci_uio_unmap_resource(struct rte_pci_device *dev)

/* close fd if in primary process */
close(dev->intr_handle.fd);
-
dev->intr_handle.fd = -1;
+
+   /* close cfg_fd if in primary process */
+   close(dev->intr_handle.uio_cfg_fd);
+   dev->intr_handle.uio_cfg_fd = -1;
+
dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
 }
 #endif /* RTE_LIBRTE_EAL_HOTPLUG */
-- 
1.9.1

[dpdk-dev] [PATCH v2 1/6] eal: Fix coding style of eal_pci.c and eal_pci_uio.c

2015-03-24 Thread Tetsuya Mukawa

This patch fixes coding style of below files in linuxapp and bsdapp.
 - eal_pci.c
 - eal_pci_uio.c

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/bsdapp/eal/eal_pci.c   | 24 +---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 12 
 2 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
b/lib/librte_eal/bsdapp/eal/eal_pci.c
index fe3ef86..3a22b49 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -161,9 +161,10 @@ fail:
 static int
 pci_uio_map_secondary(struct rte_pci_device *dev)
 {
-size_t i;
-struct uio_resource *uio_res;
-   struct uio_res_list *uio_res_list = RTE_TAILQ_CAST(rte_uio_tailq.head, 
uio_res_list);
+   size_t i;
+   struct uio_resource *uio_res;
+   struct uio_res_list *uio_res_list =
+   RTE_TAILQ_CAST(rte_uio_tailq.head, uio_res_list);

TAILQ_FOREACH(uio_res, uio_res_list, next) {

@@ -179,10 +180,10 @@ pci_uio_map_secondary(struct rte_pci_device *dev)
!= uio_res->maps[i].addr) {
RTE_LOG(ERR, EAL,
"Cannot mmap device resource\n");
-   return (-1);
+   return -1;
}
}
-   return (0);
+   return 0;
}

RTE_LOG(ERR, EAL, "Cannot find resource for device\n");
@@ -201,7 +202,8 @@ pci_uio_map_resource(struct rte_pci_device *dev)
uint64_t pagesz;
struct rte_pci_addr *loc = >addr;
struct uio_resource *uio_res;
-   struct uio_res_list *uio_res_list = RTE_TAILQ_CAST(rte_uio_tailq.head, 
uio_res_list);
+   struct uio_res_list *uio_res_list =
+   RTE_TAILQ_CAST(rte_uio_tailq.head, uio_res_list);
struct uio_map *maps;

dev->intr_handle.fd = -1;
@@ -209,7 +211,7 @@ pci_uio_map_resource(struct rte_pci_device *dev)

/* secondary processes - use already recorded details */
if (rte_eal_process_type() != RTE_PROC_PRIMARY)
-   return (pci_uio_map_secondary(dev));
+   return pci_uio_map_secondary(dev);

snprintf(devname, sizeof(devname), "/dev/uio at pci:%u:%u:%u",
dev->addr.bus, dev->addr.devid, dev->addr.function);
@@ -233,7 +235,7 @@ pci_uio_map_resource(struct rte_pci_device *dev)
if ((uio_res = rte_zmalloc("UIO_RES", sizeof (*uio_res), 0)) == NULL) {
RTE_LOG(ERR, EAL,
"%s(): cannot store uio mmap details\n", __func__);
-   return (-1);
+   return -1;
}

snprintf(uio_res->path, sizeof(uio_res->path), "%s", devname);
@@ -261,7 +263,7 @@ pci_uio_map_resource(struct rte_pci_device *dev)
(size_t)maps[j].size)
) == NULL) {
rte_free(uio_res);
-   return (-1);
+   return -1;
}

maps[j].addr = mapaddr;
@@ -271,7 +273,7 @@ pci_uio_map_resource(struct rte_pci_device *dev)

TAILQ_INSERT_TAIL(uio_res_list, uio_res, next);

-   return (0);
+   return 0;
 }

 /* Scan one pci sysfs entry, and fill the devices list from it. */
@@ -311,7 +313,7 @@ pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
/* FreeBSD has no NUMA support (yet) */
dev->numa_node = 0;

-/* parse resources */
+   /* parse resources */
switch (conf->pc_hdr & PCIM_HDRTYPE) {
case PCIM_HDRTYPE_NORMAL:
max = PCIR_MAX_BAR_0;
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index 2d1c69b..9cdf24f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -92,7 +92,8 @@ pci_uio_map_secondary(struct rte_pci_device *dev)
 {
int fd, i;
struct mapped_pci_resource *uio_res;
-   struct mapped_pci_res_list *uio_res_list = 
RTE_TAILQ_CAST(rte_uio_tailq.head, mapped_pci_res_list);
+   struct mapped_pci_res_list *uio_res_list =
+   RTE_TAILQ_CAST(rte_uio_tailq.head, mapped_pci_res_list);

TAILQ_FOREACH(uio_res, uio_res_list, next) {

@@ -272,7 +273,8 @@ pci_uio_map_resource(struct rte_pci_device *dev)
uint64_t phaddr;
struct rte_pci_addr *loc = >addr;
struct mapped_pci_resource *uio_res;
-   struct mapped_pci_res_list *uio_res_list = 
RTE_TAILQ_CAST(rte_uio_tailq.head, mapped_pci_res_list);
+   struct mapped_pci_res_list *uio_res_list =
+   RTE_TAILQ_CAST(rte_uio_tailq.head, mapped_pci_res_list);
struct pci_map *maps;

dev->intr_handle.fd = -1;
@@ -412,7 +414,8 @@ static struct mapped_pci_resource *
 pci_uio_find_resource(struct rte_pci_device *dev)
 {
struct mapped_pci_resource

[dpdk-dev] [PATCH 6/6] eal: Fix interface of pci_map_resource()

2015-03-24 Thread Tetsuya Mukawa

On 2015/03/20 18:53, Iremonger, Bernard wrote:
>> -Original Message-
>> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
>> Sent: Tuesday, March 17, 2015 9:31 AM
>> To: dev at dpdk.org
>> Cc: Iremonger, Bernard; Richardson, Bruce; Tetsuya Mukawa
>> Subject: [PATCH 6/6] eal: Fix interface of pci_map_resource()
>>
>> The function is implemented in both linuxapp and bsdapp, but interface is 
>> different. The patch fixes
>> the function of bsdapp to do same as linuxapp. After applying it, file 
>> descriptor should be opened and
>> closed out of pci_map_resource().
>> Also, remove redundant error messages from linuxapp.
>>
>> Signed-off-by: Tetsuya Mukawa 
>> ---
>>  lib/librte_eal/bsdapp/eal/eal_pci.c   | 109 
>> ++
>>  lib/librte_eal/linuxapp/eal/eal_pci_uio.c |  21 +++---
>>  2 files changed, 75 insertions(+), 55 deletions(-)
>>
>> diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
>> b/lib/librte_eal/bsdapp/eal/eal_pci.c
>> index 08b91b4..21a3d79 100644
>> --- a/lib/librte_eal/bsdapp/eal/eal_pci.c
>> +++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
>> @@ -100,7 +100,7 @@ struct mapped_pci_resource {
>>
>>  struct rte_pci_addr pci_addr;
>>  char path[PATH_MAX];
>> -size_t nb_maps;
>> +int nb_maps;
>>  struct pci_map maps[PCI_MAX_RESOURCE];  };
>>
>> @@ -122,47 +122,30 @@ pci_unbind_kernel_driver(struct rte_pci_device *dev 
>> __rte_unused)
>>
>>  /* map a particular resource from a file */  static void * 
>> -pci_map_resource(void *requested_addr,
>> const char *devname, off_t offset,
>> - size_t size)
>> +pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size,
>> + int additional_flags)
>>  {
>> -int fd;
>>  void *mapaddr;
>>
>> -/*
>> - * open devname, to mmap it
>> - */
>> -fd = open(devname, O_RDWR);
>> -if (fd < 0) {
>> -RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
>> -devname, strerror(errno));
>> -goto fail;
>> -}
>> -
>>  /* Map the PCI memory resource of device */
>>  mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
>> -MAP_SHARED, fd, offset);
>> -close(fd);
>> -if (mapaddr == MAP_FAILED ||
>> -(requested_addr != NULL && mapaddr != requested_addr)) {
>> -RTE_LOG(ERR, EAL, "%s(): cannot mmap(%s(%d), %p, 0x%lx, 0x%lx):"
>> -" %s (%p)\n", __func__, devname, fd, requested_addr,
>> +MAP_SHARED | additional_flags, fd, offset);
>> +if (mapaddr == MAP_FAILED) {
>> +RTE_LOG(ERR, EAL,
>> +"%s(): cannot mmap(%d, %p, 0x%lx, 0x%lx): %s (%p)\n",
>> +__func__, fd, requested_addr,
>>  (unsigned long)size, (unsigned long)offset,
>>  strerror(errno), mapaddr);
>> -goto fail;
> Hi Tetsuya,
>
> Previously pci_map_resource() returned NULL  when MAP_FAILED occurred.
> It now returns MAP_FAILED.
> Where pci_map_resource() is called it should now check for MAP_FAILED instead 
> of NULL, or else the return value of NULL should be restored.
> There is another comment below.

Hi Bernard,

I appreciate for your comment.
Yes it should be. I will fix it.

Regards,
Tetsuya

>
>
>> -}
>> -
>> -RTE_LOG(DEBUG, EAL, "  PCI memory mapped at %p\n", mapaddr);
>> +} else
>> +RTE_LOG(DEBUG, EAL, "  PCI memory mapped at %p\n", mapaddr);
>>
>>  return mapaddr;
>> -
>> -fail:
>> -return NULL;
>>  }
>>
>>  static int
>>  pci_uio_map_secondary(struct rte_pci_device *dev)  {
>> -size_t i;
>> +int i, fd;
>>  struct mapped_pci_resource *uio_res;
>>  struct mapped_pci_res_list *uio_res_list =
>>  RTE_TAILQ_CAST(rte_uio_tailq.head, 
>> mapped_pci_res_list); @@ -170,19
>> +153,34 @@ pci_uio_map_secondary(struct rte_pci_device *dev)
>>  TAILQ_FOREACH(uio_res, uio_res_list, next) {
>>
>>  /* skip this element if it doesn't match our PCI address */
>> -if (memcmp(_res->pci_addr, >addr, sizeof(dev->addr)))
>> +if (rte_eal_compare_pci_addr(_res->pci_addr, >addr))
>>  continue;
>>
>>  for (i = 0; i != uio_res->nb_maps; i++) {
>> -if (pci_map_resource(uio_res->maps[i].addr,
>> - uio_res->path,
>> - (off_t)uio_res->maps[i].offset,
>> - (size_t)uio_res->maps[i].size)
>> -!= uio_res->maps[i].addr) {
>> +/*
>> + * open devname, to mmap it
>> + */
>> +fd = open(uio_res->maps[i].path, O_RDWR);
>> +if (fd < 0) {
>> +RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
>> +uio_res->maps[i].path, strerror(errno));
>> +

[dpdk-dev] [PATCH v2 0/6] Clean up pci uio implementations

2015-03-24 Thread Tetsuya Mukawa

This patch set cleans up pci uio implementation. These clean up are
for consolidating pci uio implementation of linuxapp and bsdapp, and
moving consolidated functions in eal common.
Because of above, this patch set tries to implement linuxapp and bsdapp
almost same.
Actual consolidations will be done later patch set.

PATCH v2 changes:
 - Move 'if-condition' to later patch series.
 - Fix memory leaks of path.
 - Fix typos.
   (Thanks to David Marchand)
 - Fix commit title and body.
 - Fix pci_map_resource() to handle MAP_FAILED.
   (Thanks to Iremonger, Bernard)

Changes:
 - This patch set is derived from below.
   "[PATCH v2] eal: Port Hotplug support for BSD"
 - Set cfg_fd as -1, when cfg_fd is closed.
   (Thanks to Iremonger, Bernard)
 - Remove needless coding style fixings.
 - Fix coding style of if-else condition.
   (Thanks to Richardson, Bruce)


Tetsuya Mukawa (6):
  eal: Fix coding style of eal_pci.c and eal_pci_uio.c
  eal: Close file descriptor of uio configuration
  eal: Fix memory leaks and needless increment of pci_map_addr
  eal/bsdapp: Change names of pci related data structure
  eal: Use map_idx in pci_uio_map_resource() of bsdapp to work same as
linuxapp
  eal: Fix interface of pci_map_resource()

 lib/librte_eal/bsdapp/eal/eal_pci.c   | 162 ++
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c |  72 +++--
 2 files changed, 136 insertions(+), 98 deletions(-)

-- 
1.9.1

[dpdk-dev] Packet data out of bounds after rte_eth_rx_burst

2015-03-24 Thread Bruce Richardson

On Tue, Mar 24, 2015 at 12:54:14PM +0200, Dor Green wrote:
> I've managed to fix it so 1.8 works, and the segmentation fault still occurs.
> 
> On Tue, Mar 24, 2015 at 11:55 AM, Dor Green  wrote:
> > I tried 1.8, but that fails to initialize my device and fails at the pci 
> > probe:
> > "Cause: Requested device :04:00.1 cannot be used"
> > Can't even compile 2.0rc2 atm, getting:
> > "/usr/lib/gcc/x86_64-linux-gnu/4.6/include/emmintrin.h:701:1: note:
> > expected '__m128i' but argument is of type 'int'"
> > For reasons I don't understand.
> >
> > As for the example apps (in 1.7), I can run them properly but I don't
> > think any of them do the same processing as I do. Note that mine does
> > work with most packets.
> >
> >

Couple of further questions:
1. What config options are being used to configure the port and what is the
output printed at port initialization time? This is needed to let us track down
what specific RX path is being used inside the ixgbe driver
2. What type of packets specifically cause problems? Is it reproducible with
one particular packet, or packet type? Are you sending in jumbo-frames?

Regards,
/Bruce

> > On Mon, Mar 23, 2015 at 11:24 PM, Matthew Hall  
> > wrote:
> >> On Mon, Mar 23, 2015 at 05:19:00PM +0200, Dor Green wrote:
> >>> I changed it to free and it still happens. Note that the segmentation 
> >>> fault
> >>> happens before that anyway.
> >>>
> >>> I am using 1.7.1 at the moment. I can try using a newer version.
> >>
> >> I'm using 1.7.X in my open-source DPDK-based app and it works, but I have 
> >> an
> >> IGB 1-gigabit NIC though, and how RX / TX work are quite driver specific of
> >> course.
> >>
> >> I suspect there's some issue with how things are working in your IXGBE NIC
> >> driver / setup. Do the same failures occur inside of the DPDK's own sample
> >> apps?
> >>
> >> Matthew.

[dpdk-dev] [PATCH] vhost: Fix `struct file' leakage in `eventfd_link'

2015-03-24 Thread Pavel Boldin

On Tue, Mar 24, 2015 at 8:28 AM, Xie, Huawei  wrote:

> On 3/23/2015 8:54 PM, Pavel Boldin wrote:
> > Due to increased `struct file's reference counter subsequent call
> > to `filp_close' does not free the `struct file'. Prepend `fput' call
> > to decrease the reference counter.
> >
> > Signed-off-by: Pavel Boldin 
> > ---
> >  lib/librte_vhost/eventfd_link/eventfd_link.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/lib/librte_vhost/eventfd_link/eventfd_link.c
> b/lib/librte_vhost/eventfd_link/eventfd_link.c
> > index 7755dd6..62c45c8 100644
> > --- a/lib/librte_vhost/eventfd_link/eventfd_link.c
> > +++ b/lib/librte_vhost/eventfd_link/eventfd_link.c
> > @@ -117,6 +117,7 @@ eventfd_link_ioctl(struct file *f, unsigned int
> ioctl, unsigned long arg)
> >* Release the existing eventfd in the source process
> >*/
> >   spin_lock(>file_lock);
> > + fput(file);
> >   filp_close(file, files);
> >   fdt = files_fdtable(files);
> >   fdt->fd[eventfd_copy.source_fd] = NULL;
> Acked-by Huawei Xie 
>
> In future, we should remove the allocation of src eventfd, allocate a
> free fd from kernel, and install it with target fd.
>

Well, I don't think this is correct, because this will put too much job for
the kernel module making it a combiner.

At the moment I propose to accept module refactoring patch to the upstream.

After that the module can be reworked along with the application code. The
reason is I can't work on DPDK since I have no hardware to test application
at so I can't help with patch that touches application code.

Pavel

[dpdk-dev] [PATCH] mk: added make target to print out system info

2015-03-24 Thread Neil Horman

On Tue, Mar 24, 2015 at 02:52:59PM +, John McNamara wrote:
> Added a 'make system_info' target to print out system info
> related to DPDK. This is intended as output that can be
> attached to bug reports.
> ---
>  mk/rte.sdkroot.mk | 33 +
>  1 file changed, 33 insertions(+)
> 
> diff --git a/mk/rte.sdkroot.mk b/mk/rte.sdkroot.mk
> index e8423b0..b477d09 100644
> --- a/mk/rte.sdkroot.mk
> +++ b/mk/rte.sdkroot.mk
> @@ -123,3 +123,36 @@ examples examples_clean:
>  %:
>   $(Q)$(MAKE) -f $(RTE_SDK)/mk/rte.sdkconfig.mk checkconfig
>   $(Q)$(MAKE) -f $(RTE_SDK)/mk/rte.sdkbuild.mk $@
> +
> +.PHONY: system_info
> +system_info:
> + $(Q)echo
> + $(Q)echo "CC version"
> + $(Q)echo "=="
> + $(Q)$(CC) --version
> + $(Q)echo
> +
> + $(Q)echo "DPDK version"
> + $(Q)echo ""
> + $(Q)$(MAKE) showversion
> + $(Q)echo
> +
> + $(Q)echo "Git commit"
> + $(Q)echo "=="
> + $(Q)git log --pretty=format:'%H' -1
> + $(Q)echo
> +
> + $(Q)echo "Uname"
> + $(Q)echo "="
> + $(Q)uname -srvmpio
> + $(Q)echo
> +
> + $(Q)echo "Hugepages"
> + $(Q)echo "="
> + $(Q)grep -i huge /proc/meminfo
> + $(Q)echo
> +
> + $(Q)tools/cpu_layout.py
> +
> + $(Q)tools/dpdk_nic_bind.py --status
> + $(Q)echo
> -- 
> 1.8.1.4
> 
> 
Nak, for a few reasons:

1) While this target is in a common makefile, at least some of the information
it gathers is operating system specfic (e.g. /proc/meminfo).  This isn't going
to work on BSD, or other operating systems that we might support in the future

2) This is tied to the build system.  Theres no guarantee that users will
diagnose problems only on the system that they built the DPDK on.  

A better solution might be to simply document the sort of information that a bug
reporter is expected to gather, along with some sample tools for doing so.
There are numerous tools to get the above information, both in isolation and in
aggregate.

Neil

[dpdk-dev] Packet data out of bounds after rte_eth_rx_burst

2015-03-24 Thread Dor Green

I've managed to fix it so 1.8 works, and the segmentation fault still occurs.

On Tue, Mar 24, 2015 at 11:55 AM, Dor Green  wrote:
> I tried 1.8, but that fails to initialize my device and fails at the pci 
> probe:
> "Cause: Requested device :04:00.1 cannot be used"
> Can't even compile 2.0rc2 atm, getting:
> "/usr/lib/gcc/x86_64-linux-gnu/4.6/include/emmintrin.h:701:1: note:
> expected '__m128i' but argument is of type 'int'"
> For reasons I don't understand.
>
> As for the example apps (in 1.7), I can run them properly but I don't
> think any of them do the same processing as I do. Note that mine does
> work with most packets.
>
>
> On Mon, Mar 23, 2015 at 11:24 PM, Matthew Hall  
> wrote:
>> On Mon, Mar 23, 2015 at 05:19:00PM +0200, Dor Green wrote:
>>> I changed it to free and it still happens. Note that the segmentation fault
>>> happens before that anyway.
>>>
>>> I am using 1.7.1 at the moment. I can try using a newer version.
>>
>> I'm using 1.7.X in my open-source DPDK-based app and it works, but I have an
>> IGB 1-gigabit NIC though, and how RX / TX work are quite driver specific of
>> course.
>>
>> I suspect there's some issue with how things are working in your IXGBE NIC
>> driver / setup. Do the same failures occur inside of the DPDK's own sample
>> apps?
>>
>> Matthew.

[dpdk-dev] [PATCH] cmdline: fix type format from unsigned to size_t for buffer size

2015-03-24 Thread Olivier MATZ

Hi,

On 03/24/2015 11:48 AM, Jastrzebski, MichalX K wrote:
>>> On 02/20/2015 05:18 PM, Daniel Mrzyglod wrote:
 Function match_inst is used to take buffor using sizeof() which is size_t
>> type.
 This modification also involved changing '%u' to '%zu' in printf function.

 Signed-off-by: Daniel Mrzyglod 

 [...]
>>>
>>> Did you see a specific issue with the current code? (maybe a compilation
>>> issue or a klocwork issue?)
> Hi Olivier, Thomas
> Yes, this is an issue reported by the static analysis tool.
>>>
>>> I think this patch is ok, but there are many places where this kind
>>> of fixes should be applied in cmdline (cmdline_parse_*(),
>>> cmdline_get_help_*(), etc.). Is there a motivation for changing it
>>> only there?
> The tool we use didn't reported other places,
> that is why Daniel send only this one change.

I think it would be great to look at other places to check if the same
change is required.

Regards,
Olivier

[dpdk-dev] Packet data out of bounds after rte_eth_rx_burst

2015-03-24 Thread Dor Green

I tried 1.8, but that fails to initialize my device and fails at the pci probe:
"Cause: Requested device :04:00.1 cannot be used"
Can't even compile 2.0rc2 atm, getting:
"/usr/lib/gcc/x86_64-linux-gnu/4.6/include/emmintrin.h:701:1: note:
expected '__m128i' but argument is of type 'int'"
For reasons I don't understand.

As for the example apps (in 1.7), I can run them properly but I don't
think any of them do the same processing as I do. Note that mine does
work with most packets.

On Mon, Mar 23, 2015 at 11:24 PM, Matthew Hall  wrote:
> On Mon, Mar 23, 2015 at 05:19:00PM +0200, Dor Green wrote:
>> I changed it to free and it still happens. Note that the segmentation fault
>> happens before that anyway.
>>
>> I am using 1.7.1 at the moment. I can try using a newer version.
>
> I'm using 1.7.X in my open-source DPDK-based app and it works, but I have an
> IGB 1-gigabit NIC though, and how RX / TX work are quite driver specific of
> course.
>
> I suspect there's some issue with how things are working in your IXGBE NIC
> driver / setup. Do the same failures occur inside of the DPDK's own sample
> apps?
>
> Matthew.

[dpdk-dev] [PATCH v2 2/6] eal: Close file descriptor of uio configuration

2015-03-24 Thread Stephen Hemminger

On Tue, 24 Mar 2015 13:18:33 +0900
Tetsuya Mukawa  wrote:

> When pci_uio_unmap_resource() is called, a file descriptor that is used
> for uio configuration should be closed.
> 
> Signed-off-by: Tetsuya Mukawa 
> ---
>  lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c 
> b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
> index 9cdf24f..f0277be 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
> @@ -459,8 +459,12 @@ pci_uio_unmap_resource(struct rte_pci_device *dev)
>  
>   /* close fd if in primary process */
>   close(dev->intr_handle.fd);
> -
>   dev->intr_handle.fd = -1;
> +
> + /* close cfg_fd if in primary process */
> + close(dev->intr_handle.uio_cfg_fd);
> + dev->intr_handle.uio_cfg_fd = -1;
> +
>   dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
>  }
>  #endif /* RTE_LIBRTE_EAL_HOTPLUG */


For the Qlogic/Broadcom driver it needed the config fd handle, and I added
generic config space access functions.

[dpdk-dev] dpdk kni ping answer

2015-03-24 Thread Yaron Illouz


The source of my problem seems to be that Ethtool is not supported in
VMs (VF).

Is there another solution to give an ip to a dpdk port?


-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Yaron Illouz
Sent: Monday, March 23, 2015 2:08 PM
To: dev at dpdk.org; dev
Subject: [dpdk-dev] dpdk kni ping answer

Hi 



I want to give an ip to a dpdk port, because I need to receive traffic
by gre replication.

I though I could do it with dpdk kni
(http://dpdk.org/doc/guides/prog_guide/kernel_nic_interface.html)



I am able to give an ip to interface vEth0, but I can't ping or ssh to
the ip address.

I set the ip with the following command: ifconfig vEth0 192.168.69.4
netmask 255.255.255.0 up



Is this something possible to do with dpdk kni?

If not, what could be another solution for this.





Regards

[dpdk-dev] [PATCH v2] rte_mbuf: mbuf bulk alloc/free functions added + unittest

2015-03-24 Thread Ananyev, Konstantin



Hi Vadim,

> From: Vadim Suraev [mailto:vadim.suraev at gmail.com]
> Sent: Tuesday, March 24, 2015 7:53 AM
> To: Ananyev, Konstantin
> Cc: Olivier MATZ; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] rte_mbuf: mbuf bulk alloc/free functions 
> added + unittest
> 
> Hi, Konstantin,
> 
> >Though from my point, such function should be generic as 
> >rte_pktmbuf_free_chain() -
> >no special limitations like all mbufs from one pool, refcnt==1, etc.
> I misunderstood, I'll rework.

Ok, thanks.
Konstantin

> Regards,
> ?Vadim.
> 
> On Tue, Mar 24, 2015 at 1:48 AM, Ananyev, Konstantin  intel.com> wrote:
> Hi Vadim,
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vadim Suraev
> > Sent: Monday, March 23, 2015 5:31 PM
> > To: Olivier MATZ
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v2] rte_mbuf: mbuf bulk alloc/free functions 
> > added + unittest
> >
> > Hi, Olivier,
> > No, I personally need to free a chain using mempool bulk. If I'm not
> > mistaken, it has been proposed by one of reviewers to have lower level
> > function, so it was done (I'm sorry if misunderstood)
> 
> Was it me?
> As I remember, I said it would be good to create rte_pktmbuf_bulk_free() or 
> rte_pktmbuf_seg_bulk_free() -
> that would free a bulk of mbufs segments in the same manner as 
> rte_pktmbuf_free_chain() does:
> count number of consecutive mbufs from the same mempool to be freed and then 
> put them back into the pool at one go.
> Such function would be useful inside PMD code.
> In fact we already using analogy of such function inside vPMD TX code.
> Though from my point, such function should be generic as 
> rte_pktmbuf_free_chain() -
> no special limitations like all mbufs from one pool, refcnt==1, etc.
> So if it was me who confused you - I am sorry.
> Konstantin
> 
> > Regards,
> > Vadim.
> > On Mar 23, 2015 8:44 PM, "Olivier MATZ"  wrote:
> >
> > > Hi Neil,
> > >
> > > On 03/19/2015 02:16 PM, Neil Horman wrote:
> > > >> On 03/18/2015 09:58 PM, Neil Horman wrote:
> > >  +/**
> > >  + * Free a bulk of mbufs into its original mempool.
> > >  + * This function assumes:
> > >  + * - refcnt equals 1
> > >  + * - mbufs are direct
> > >  + * - all mbufs must belong to the same mempool
> > >  + *
> > >  + * @param mbufs
> > >  + *? ? Array of pointers to mbuf
> > >  + * @param count
> > >  + *? ? Array size
> > >  + */
> > >  +static inline void rte_pktmbuf_bulk_free(struct rte_mbuf **mbufs,
> > >  +? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?unsigned count)
> > >  +{
> > >  +? unsigned idx;
> > >  +
> > >  +? RTE_MBUF_ASSERT(count > 0);
> > >  +
> > >  +? for (idx = 0; idx < count; idx++) {
> > >  +? ? ? ? ? RTE_MBUF_ASSERT(mbufs[idx]->pool == mbufs[0]->pool);
> > >  +? ? ? ? ? RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 1);
> > >  +? ? ? ? ? rte_mbuf_refcnt_set(mbufs[idx], 0);
> > > >>> This is really a misuse of the API.? The entire point of reference
> > > counting is
> > > >>> to know when an mbuf has no more references and can be freed.? By
> > > forcing all
> > > >>> the reference counts to zero here, you allow the refcnt infrastructure
> > > to be
> > > >>> circumvented, causing memory leaks.
> > > >>>
> > > >>> I think what you need to do here is enhance the underlying pktmbuf
> > > interface
> > > >>> such that an rte_mbuf structure has a destructor method association
> > > with it
> > > >>> which is called when its refcnt reaches zero.? That way the
> > > >>> rte_pktmbuf_bulk_free function can just decrement the refcnt on each
> > > >>> mbuf_structure, and the pool as a whole can be returned when the
> > > destructor
> > > >>> function discovers that all mbufs in that bulk pool are freed.
> > > >>
> > > >> I don't really understand what's the problem here. The API explicitly
> > > >> describes the conditions for calling this functions: the segments are
> > > >> directs, they belong to the same mempool, and their refcnt is 1.
> > > >>
> > > >> This function could be useful in a driver which knows that the mbuf
> > > >> it allocated matches this conditions. I think an application that
> > > >> only uses direct mbufs and one mempool could also use this function.
> > > >
> > > >
> > > > That last condition is my issue with this patch, that the user has to
> > > know what
> > > > refcnts are.? It makes this api useful for little more than the test
> > > case that
> > > > is provided with it.? Its irritating enough that for singly allocated
> > > mbufs the
> > > > user has to know what the refcount of a buffer is before freeing, but at
> > > least
> > > > they can macrotize a {rte_pktmbuf_refcnt_update;
> > > if(rte_pktmbuf_refct_read) then
> > > > free} operation.
> > > >
> > > > With this, you've placed the user in charge of not only doing that, but
> > > also of
> > > > managing the relationship between pktmbufs and the pool they came from.
> > > while
> > > > that makes

[dpdk-dev] [PATCH] cmdline: fix type format from unsigned to size_t for buffer size

2015-03-24 Thread Jastrzebski, MichalX K

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Monday, March 23, 2015 12:47 PM
> To: Mrzyglod, DanielX T
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] cmdline: fix type format from unsigned to
> size_t for buffer size
> 
> Daniel,
> Without answer to the question from Olivier, this patch is going to be
> ignored.
> 
> 2015-02-24 12:03, Olivier MATZ:
> > Hi Daniel,
> >
> > On 02/20/2015 05:18 PM, Daniel Mrzyglod wrote:
> > > Function match_inst is used to take buffor using sizeof() which is size_t
> type.
> > > This modification also involved changing '%u' to '%zu' in printf function.
> > >
> > > Signed-off-by: Daniel Mrzyglod 
> > > ---
> > >   lib/librte_cmdline/cmdline_parse.c | 4 ++--
> > >   1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/lib/librte_cmdline/cmdline_parse.c
> b/lib/librte_cmdline/cmdline_parse.c
> > > index dfc885c..0821791 100644
> > > --- a/lib/librte_cmdline/cmdline_parse.c
> > > +++ b/lib/librte_cmdline/cmdline_parse.c
> > > @@ -138,7 +138,7 @@ nb_common_chars(const char * s1, const char *
> s2)
> > >*/
> > >   static int
> > >   match_inst(cmdline_parse_inst_t *inst, const char *buf,
> > > -unsigned int nb_match_token, void *resbuf, unsigned resbuf_size)
> > > +unsigned int nb_match_token, void *resbuf, size_t resbuf_size)
> > >   {
> > >   unsigned int token_num=0;
> > >   cmdline_parse_token_hdr_t * token_p;
> > > @@ -169,7 +169,7 @@ match_inst(cmdline_parse_inst_t *inst, const
> char *buf,
> > >
> > >   if (token_hdr.offset > resbuf_size) {
> > >   printf("Parse error(%s:%d): Token 
> > > offset(%u)
> "
> > > - "exceeds maximum size(%u)\n",
> > > + "exceeds maximum size(%zu)\n",
> > >   __FILE__, __LINE__,
> > >   token_hdr.offset, resbuf_size);
> > >   return -ENOBUFS;
> > >
> >
> >
> > Did you see a specific issue with the current code? (maybe a compilation
> > issue or a klocwork issue?)
Hi Olivier, Thomas
Yes, this is an issue reported by the static analysis tool.
> >
> > I think this patch is ok, but there are many places where this kind
> > of fixes should be applied in cmdline (cmdline_parse_*(),
> > cmdline_get_help_*(), etc.). Is there a motivation for changing it
> > only there?
The tool we use didn't reported other places, 
that is why Daniel send only this one change.
> >
> > Regards,
> > Olivier
> >
>

[dpdk-dev] [PATCH] cast used->idx to volatile

2015-03-24 Thread Xie, Huawei

On 3/24/2015 3:44 PM, Linhaifeng wrote:
>
> On 2015/3/24 9:53, Xie, Huawei wrote:
>> On 3/24/2015 9:00 AM, Linhaifeng wrote:
>>> On 2015/3/23 20:54, Xie, Huawei wrote:
> -Original Message-
> From: Linhaifeng [mailto:haifeng.lin at huawei.com]
> Sent: Monday, March 23, 2015 8:24 PM
> To: dev at dpdk.org
> Cc: Ouyang, Changchun; Xie, Huawei
> Subject: Re: [dpdk-dev] [PATCH] cast used->idx to volatile
>
>
>
> On 2015/3/21 16:07, linhaifeng wrote:
>> From: Linhaifeng 
>>
>> Same as rte_vhost_enqueue_burst we should cast used->idx
>> to volatile before notify guest.
>>
>> Signed-off-by: Linhaifeng 
>> ---
>>  lib/librte_vhost/vhost_rxtx.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/lib/librte_vhost/vhost_rxtx.c 
>> b/lib/librte_vhost/vhost_rxtx.c
>> index 535c7a1..8d674d1 100644
>> --- a/lib/librte_vhost/vhost_rxtx.c
>> +++ b/lib/librte_vhost/vhost_rxtx.c
>> @@ -722,7 +722,7 @@ rte_vhost_dequeue_burst(struct virtio_net *dev,
> uint16_t queue_id,
>>  }
>>
>>  rte_compiler_barrier();
>> -vq->used->idx += entry_success;
>> +*(volatile uint16_t *)>used->idx += entry_success;
 Haifeng:
 We have compiler barrier before and an external function call behind, so 
 we don't need volatile  here.
 Do you meet issue?

>>> Tx_q is sometimes stopped when we use virtio_net. Because vhost thought 
>>> there are no buffers in tx_q and virtio_net
>>> though vhost haven't handle all packets so we have to restart VM to restore 
>>> work.
>>>
>>> The status in VM is:
>>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246687] net eth7: virtnet_poll
>>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246690] net eth7: receive_buf
>>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246693] net eth7: vi->num=239
>>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246695] net eth7: 
>>> svq:avail->idx=52939 used->idx=52939 num_free=18 num_added=0 
>>> svq->last_used_idx=52820
>>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246699] net eth7: 
>>> rvq:avail->idx=36215 used->idx=35977 num_free=18 num_added=0 
>>> rvq->last_used_idx=35977
>>> Mar 18 17:11:11 linux-b2ij kernel: [46337.901038] net eth7: dev_queue_xmit, 
>>> qdisc->flags=4, qdisc->state deactiveed=0
>>> Mar 18 17:11:12 linux-b2ij kernel: [46337.901042] net eth7: dev_queue_xmit, 
>>> txq->state=1, stopped=1
>>>
>>> Why compiler barrier not take effect in our case? Is compiler barrier 
>>> depended on -O3 option? We use -O2 option.
>> compiler barrier always works regardless of the optimization option.
>> I don't get your story, but the key thing is, do you check the asm code?
>> If called from outside as an API, how is it possible it is optimized?
>> there is only one update to used->idx in that function.
>
> Do you mean rte_vhost_enqueue_burst also not need cast used->idx to volatile 
> ? Why not remove it?
I checked the code. Seems we can remove. That is another issue.
For your issue, you meet problem, and submit this this patch, but i am a
bit confused it is the root cause. Do you check the asm code that
volatile is optimized?

>>  /* Kick guest if required. */
>>  if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
>>  eventfd_write((int)vq->callfd, 1);
>>
> --
> Regards,
> Haifeng

>>
>> .
>>

[dpdk-dev] [PATCH v2] rte_mbuf: mbuf bulk alloc/free functions added + unittest

2015-03-24 Thread Vadim Suraev

Hi, Konstantin,

>Though from my point, such function should be generic as
rte_pktmbuf_free_chain() -
>no special limitations like all mbufs from one pool, refcnt==1, etc.
I misunderstood, I'll rework.
Regards,
 Vadim.

On Tue, Mar 24, 2015 at 1:48 AM, Ananyev, Konstantin <
konstantin.ananyev at intel.com> wrote:

> Hi Vadim,
>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vadim Suraev
> > Sent: Monday, March 23, 2015 5:31 PM
> > To: Olivier MATZ
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v2] rte_mbuf: mbuf bulk alloc/free
> functions added + unittest
> >
> > Hi, Olivier,
> > No, I personally need to free a chain using mempool bulk. If I'm not
> > mistaken, it has been proposed by one of reviewers to have lower level
> > function, so it was done (I'm sorry if misunderstood)
>
> Was it me?
> As I remember, I said it would be good to create rte_pktmbuf_bulk_free()
> or rte_pktmbuf_seg_bulk_free() -
> that would free a bulk of mbufs segments in the same manner as
> rte_pktmbuf_free_chain() does:
> count number of consecutive mbufs from the same mempool to be freed and
> then put them back into the pool at one go.
> Such function would be useful inside PMD code.
> In fact we already using analogy of such function inside vPMD TX code.
> Though from my point, such function should be generic as
> rte_pktmbuf_free_chain() -
> no special limitations like all mbufs from one pool, refcnt==1, etc.
> So if it was me who confused you - I am sorry.
> Konstantin
>
> > Regards,
> > Vadim.
> > On Mar 23, 2015 8:44 PM, "Olivier MATZ"  wrote:
> >
> > > Hi Neil,
> > >
> > > On 03/19/2015 02:16 PM, Neil Horman wrote:
> > > >> On 03/18/2015 09:58 PM, Neil Horman wrote:
> > >  +/**
> > >  + * Free a bulk of mbufs into its original mempool.
> > >  + * This function assumes:
> > >  + * - refcnt equals 1
> > >  + * - mbufs are direct
> > >  + * - all mbufs must belong to the same mempool
> > >  + *
> > >  + * @param mbufs
> > >  + *Array of pointers to mbuf
> > >  + * @param count
> > >  + *Array size
> > >  + */
> > >  +static inline void rte_pktmbuf_bulk_free(struct rte_mbuf **mbufs,
> > >  +   unsigned count)
> > >  +{
> > >  +  unsigned idx;
> > >  +
> > >  +  RTE_MBUF_ASSERT(count > 0);
> > >  +
> > >  +  for (idx = 0; idx < count; idx++) {
> > >  +  RTE_MBUF_ASSERT(mbufs[idx]->pool == mbufs[0]->pool);
> > >  +  RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 1);
> > >  +  rte_mbuf_refcnt_set(mbufs[idx], 0);
> > > >>> This is really a misuse of the API.  The entire point of reference
> > > counting is
> > > >>> to know when an mbuf has no more references and can be freed.  By
> > > forcing all
> > > >>> the reference counts to zero here, you allow the refcnt
> infrastructure
> > > to be
> > > >>> circumvented, causing memory leaks.
> > > >>>
> > > >>> I think what you need to do here is enhance the underlying pktmbuf
> > > interface
> > > >>> such that an rte_mbuf structure has a destructor method association
> > > with it
> > > >>> which is called when its refcnt reaches zero.  That way the
> > > >>> rte_pktmbuf_bulk_free function can just decrement the refcnt on
> each
> > > >>> mbuf_structure, and the pool as a whole can be returned when the
> > > destructor
> > > >>> function discovers that all mbufs in that bulk pool are freed.
> > > >>
> > > >> I don't really understand what's the problem here. The API
> explicitly
> > > >> describes the conditions for calling this functions: the segments
> are
> > > >> directs, they belong to the same mempool, and their refcnt is 1.
> > > >>
> > > >> This function could be useful in a driver which knows that the mbuf
> > > >> it allocated matches this conditions. I think an application that
> > > >> only uses direct mbufs and one mempool could also use this function.
> > > >
> > > >
> > > > That last condition is my issue with this patch, that the user has to
> > > know what
> > > > refcnts are.  It makes this api useful for little more than the test
> > > case that
> > > > is provided with it.  Its irritating enough that for singly allocated
> > > mbufs the
> > > > user has to know what the refcount of a buffer is before freeing,
> but at
> > > least
> > > > they can macrotize a {rte_pktmbuf_refcnt_update;
> > > if(rte_pktmbuf_refct_read) then
> > > > free} operation.
> > > >
> > > > With this, you've placed the user in charge of not only doing that,
> but
> > > also of
> > > > managing the relationship between pktmbufs and the pool they came
> from.
> > > while
> > > > that makes sense for the test case, it really doesn't in any general
> use
> > > case in
> > > > which packet processing is ever deferred or queued, because it means
> > > that the
> > > > application is now responsible for holding a pointer to every packet
> it
> > > > allocates and checking its

[dpdk-dev] [RFC PATCHv2 5/8] add OSv support

2015-03-24 Thread Neil Horman

On Sat, Mar 21, 2015 at 12:23:02PM +0900, Takuya ASADA wrote:
> Adding OSv support.
> Based on Linux/FreeBSD EAL, but calling OSv kernel APIs to access devices, 
> allocate contiguous memory, etc.
> 
> Signed-off-by: Takuya ASADA 
> ---
> diff --git a/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h 
> b/lib/librte_eal/osvapp/eal/include/exec-env/rte_interrupts.h
> similarity index 100%
> copy from lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h
> copy to lib/librte_eal/osvapp/eal/include/exec-env/rte_interrupts.h
> diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map 
> b/lib/librte_eal/osvapp/eal/rte_eal_version.map
> similarity index 92%
> copy from lib/librte_eal/bsdapp/eal/rte_eal_version.map
> copy to lib/librte_eal/osvapp/eal/rte_eal_version.map
> index 67b6a6c..ebb584a 100644
> --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> +++ b/lib/librte_eal/osvapp/eal/rte_eal_version.map
> @@ -31,24 +31,26 @@ DPDK_2.0 {
>   rte_eal_get_physmem_layout;
>   rte_eal_get_physmem_size;
>   rte_eal_has_hugepages;
> - rte_eal_hpet_init;
>   rte_eal_init;
>   rte_eal_iopl_init;
>   rte_eal_lcore_role;
>   rte_eal_mp_remote_launch;
>   rte_eal_mp_wait_lcore;
> + rte_eal_parse_devargs_str;
> + rte_eal_pci_close_one;
>   rte_eal_pci_dump;
>   rte_eal_pci_probe;
> + rte_eal_pci_probe_one;
>   rte_eal_pci_register;
>   rte_eal_pci_unregister;
>   rte_eal_process_type;
>   rte_eal_remote_launch;
>   rte_eal_tailq_lookup;
>   rte_eal_tailq_register;
> + rte_eal_vdev_init;
> + rte_eal_vdev_uninit;
>   rte_eal_wait_lcore;
>   rte_exit;
> - rte_get_hpet_cycles;
> - rte_get_hpet_hz;
>   rte_get_tsc_hz;
>   rte_hexdump;
>   rte_intr_callback_register;
> @@ -86,9 +88,6 @@ DPDK_2.0 {
>   rte_thread_get_affinity;
>   rte_thread_set_affinity;
>   rte_vlog;
> - rte_xen_dom0_memory_attach;
> - rte_xen_dom0_memory_init;
> - test_mp_secondary;
>  
I understand removing symbols that you don't support, but why are you adding the
vdev, pci and parsing symbols?  I don't see you using them anywhere in your
code...

Neil

[dpdk-dev] tools brainstorming

2015-03-24 Thread Simon Kågström

On 2015-03-23 17:29, Thomas Monjalon wrote:
> 2015-03-20 16:18, Simon K?gstr?m:
>>> - make autotests easier and faster to run for smoke testing
>>> - automated basic testpmd check
>>
>> Code coverage for automated tests can be useful as well.
>>
>> In a way I'm speaking in my own interests here since I've written a tool
>> to do just this (and produce nice HTML etc output), kcov, that can be
>> found at github (https://github.com/SimonKagstrom/kcov).
> 
> Feel free to do some DPDK integration for your kcov tool.
> It could definitely help to write good tests.

Well, there's not that much integration needed for kcov. It needs
debugging information in the binary, so something like the patch below
is needed (and the same for clang, not sure about icc). Apart from that,
I just ran it as

  kcov --exclude-path=/usr /tmp/kcov-dpdk ./basicfwd --no-huge -c 1 -n 1

to collect coverage info and generate a report. The report for this
particular case can be seen here:

  http://www.c64-network.org/kcov-dpdk/index.html

I'm not sure how your automatic test suite works, but just running it
like you do today but through kcov like above should do the trick.

// Simon

diff --git a/mk/toolchain/gcc/rte.vars.mk b/mk/toolchain/gcc/rte.vars.mk
index 88f235c..d3cd9e5 100644
--- a/mk/toolchain/gcc/rte.vars.mk
+++ b/mk/toolchain/gcc/rte.vars.mk
@@ -60,7 +60,7 @@ endif
 HOSTAS= as

 TOOLCHAIN_ASFLAGS =
-TOOLCHAIN_CFLAGS =
+TOOLCHAIN_CFLAGS = -g
 TOOLCHAIN_LDFLAGS =

 ifeq ($(CONFIG_RTE_LIBRTE_GCOV),y)

[dpdk-dev] [PATCH] cast used->idx to volatile

2015-03-24 Thread Linhaifeng



On 2015/3/23 20:54, Xie, Huawei wrote:
> 
> 
>> -Original Message-
>> From: Linhaifeng [mailto:haifeng.lin at huawei.com]
>> Sent: Monday, March 23, 2015 8:24 PM
>> To: dev at dpdk.org
>> Cc: Ouyang, Changchun; Xie, Huawei
>> Subject: Re: [dpdk-dev] [PATCH] cast used->idx to volatile
>>
>>
>>
>> On 2015/3/21 16:07, linhaifeng wrote:
>>> From: Linhaifeng 
>>>
>>> Same as rte_vhost_enqueue_burst we should cast used->idx
>>> to volatile before notify guest.
>>>
>>> Signed-off-by: Linhaifeng 
>>> ---
>>>  lib/librte_vhost/vhost_rxtx.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
>>> index 535c7a1..8d674d1 100644
>>> --- a/lib/librte_vhost/vhost_rxtx.c
>>> +++ b/lib/librte_vhost/vhost_rxtx.c
>>> @@ -722,7 +722,7 @@ rte_vhost_dequeue_burst(struct virtio_net *dev,
>> uint16_t queue_id,
>>> }
>>>
>>> rte_compiler_barrier();
>>> -   vq->used->idx += entry_success;
>>> +   *(volatile uint16_t *)>used->idx += entry_success;
> 
> 
> Haifeng:
> We have compiler barrier before and an external function call behind, so we 
> don't need volatile  here.
> Do you meet issue?
> 

Tx_q is sometimes stopped when we use virtio_net. Because vhost thought there 
are no buffers in tx_q and virtio_net
though vhost haven't handle all packets so we have to restart VM to restore 
work.

The status in VM is:
Mar 18 17:11:10 linux-b2ij kernel: [46337.246687] net eth7: virtnet_poll
Mar 18 17:11:10 linux-b2ij kernel: [46337.246690] net eth7: receive_buf
Mar 18 17:11:10 linux-b2ij kernel: [46337.246693] net eth7: vi->num=239
Mar 18 17:11:10 linux-b2ij kernel: [46337.246695] net eth7: 
svq:avail->idx=52939 used->idx=52939 num_free=18 num_added=0 
svq->last_used_idx=52820
Mar 18 17:11:10 linux-b2ij kernel: [46337.246699] net eth7: 
rvq:avail->idx=36215 used->idx=35977 num_free=18 num_added=0 
rvq->last_used_idx=35977
Mar 18 17:11:11 linux-b2ij kernel: [46337.901038] net eth7: dev_queue_xmit, 
qdisc->flags=4, qdisc->state deactiveed=0
Mar 18 17:11:12 linux-b2ij kernel: [46337.901042] net eth7: dev_queue_xmit, 
txq->state=1, stopped=1

Why compiler barrier not take effect in our case? Is compiler barrier depended 
on -O3 option? We use -O2 option.

>>> /* Kick guest if required. */
>>> if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
>>> eventfd_write((int)vq->callfd, 1);
>>>
>>
>> --
>> Regards,
>> Haifeng
> 
> 
> 

-- 
Regards,
Haifeng

[dpdk-dev] [PATCH] ixgbe: do not include CRC in Tx byte count

2015-03-24 Thread Stephen Hemminger

On Mon, 23 Mar 2015 16:45:44 +
"Ananyev, Konstantin"  wrote:

> 
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of stephen at 
> > networkplumber.org
> > Sent: Friday, January 23, 2015 6:24 AM
> > To: dev at dpdk.org
> > Cc: Stephen Hemminger
> > Subject: [dpdk-dev] [PATCH] ixgbe: do not include CRC in Tx byte count
> > 
> > From: Stephen Hemminger 
> > 
> > The ixgbe driver was including CRC in the transmit packet byte
> > count, but not for packets received.
> > This was notice when forwarding and
> > the number of bytes received was greater than the number of bytes 
> > transmitted
> > for the same number of packets. Make the driver behave like other
> > virtual devices and not include CRC in byte count. Use the same queue
> > counters already computed and used for Rx.
> 
> About RX side stats - as I remember it depends to what value hw_stip_crc is 
> set at configure().
> If hw_stip_crc==1, then, yes CRC bytes are not included into  QBRC value.
> I If hw_stip_crc==0, then CRC bytes are included into QBRC. 

That is an additional bug!
  * CRC should never be included in the byte count.
This is not how Linux or other drivers report byte count.

  * the byte count must be symmetrical Rx == Tx

The Brocade router always set strip_crc to 1. So did not see the additional bug
of CRC being included in byte count.

[dpdk-dev] [PATCH] ixgbe: fix data access on big endian cpu

2015-03-24 Thread Xuelin Shi

Hi Thomas,

Done. http://patchwork.dpdk.org/dev/patchwork/patch/4123/

Thanks,
Xuelin Shi

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Monday, March 23, 2015 22:02
> To: Shi Xuelin-B29237
> Cc: dev at dpdk.org; konstantin.ananyev at intel.com; helin.zhang at intel.com
> Subject: Re: [PATCH] ixgbe: fix data access on big endian cpu
> 
> 2015-03-03 16:27, xuelin.shi at freescale.com:
> > From: Xuelin Shi 
> >
> > enforce rules for cpu and ixgbe exchanging data.
> > 1. cpu use data owned by ixgbe must use rte_le_to_cpu_xx(...) 2. cpu
> > fill data to ixgbe must use rte_cpu_to_le_xx(...)
> >
> > Signed-off-by: Xuelin Shi 
> 
> Please Xuelin, could you rebase on HEAD and fix these checkpatch errors?
> 
> ERROR:SPACING: space prohibited after that '!' (ctx:BxW)
> 
> ERROR:CODE_INDENT: code indent should use tabs where possible
> +^I^I ^I   ^I  IXGBE_RXDADV_STAT_DD)) {$
> 
> Thanks

[dpdk-dev] [PATCH] librte_lpm: define tbl entry reversely for big endian

2015-03-24 Thread Xuelin Shi

Hi Thomas,

Done.  http://patchwork.dpdk.org/dev/patchwork/patch/4122/

Thanks,
Xuelin Shi

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Monday, March 23, 2015 22:04
> To: Shi Xuelin-B29237
> Cc: Bruce Richardson; dev at dpdk.org
> Subject: Re: [PATCH] librte_lpm: define tbl entry reversely for big
> endian
> 
> 2015-03-09 14:02, Bruce Richardson:
> > On Wed, Mar 04, 2015 at 02:34:12PM +0800, xuelin.shi at freescale.com
> wrote:
> > > From: Xuelin Shi 
> > >
> > > This module uses type conversion between struct and int.
> > > Also truncation and comparison is used with this int.
> > > It is not safe for different endian arch.
> > >
> > > Add ifdef for big endian struct to fix this issue.
> > >
> > > Signed-off-by: Xuelin Shi 
> >
> > Get an error compiling this up (using clang on FreeBSD).
> >
> >   CC rte_lpm.o
> >   In file included from
> /usr/home/bruce/dpdk.org/lib/librte_lpm/rte_lpm.c:57:
> >   /usr/home/bruce/dpdk.org/lib/librte_lpm/rte_lpm.h:99:5: fatal error:
> > 'RTE_BYTE_ORDER' is not defined, evaluates to 0 [-Wundef] #if
> RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
> >   ^
> >   1 error generated.
> >
> > Adding "#include " should fix the issue.
> 
> Please Xuelin, could you submit a v2?
> Thanks
> 
> > Existing unit tests on IA (little endian) pass fine there-after, but I
> > think for this patch it would be good to have an ack from someone who
> > can validate on a big endian system, since this is what this patch is
> meant to enable.
> >
> > /Bruce
> >
>

[dpdk-dev] [PATCH v3] lib/librte_vhost: update used->idx when allocation of mbuf fails

2015-03-24 Thread Xie, Huawei

On 3/22/2015 8:08 PM, Ouyang, Changchun wrote:
>
>> -Original Message-
>> From: linhaifeng [mailto:haifeng.lin at huawei.com]
>> Sent: Saturday, March 21, 2015 9:47 AM
>> To: dev at dpdk.org
>> Cc: Ouyang, Changchun; Xie, Huawei
>> Subject: [PATCH v3] lib/librte_vhost: update used->idx when allocation of
>> mbuf fails
>>
>> From: Linhaifeng 
>>
>> When failed to malloc buffer from mempool we just update last_used_idx
>> but not used->idx so after many times vhost thought have handle all packets
>> but virtio_net thought vhost have not handle all packets and will not update
>> avail->idx.
>>
>> Signed-off-by: Linhaifeng 
> Acked-by: Changchun Ouyang 
>
>
>
Acked-by: Huawei Xie 

This patch fix the issue.
Simple solution like other PMDs is before processing one descriptor,
ensure allocation of new mbuf is successfull, and then immediately
refill after receiving the packet from the descriptor.
In future, we should consider optimized bulk allocation strategy with
threshold.

[dpdk-dev] [PATCH] vhost: Fix `struct file' leakage in `eventfd_link'

2015-03-24 Thread Xie, Huawei

On 3/23/2015 8:54 PM, Pavel Boldin wrote:
> Due to increased `struct file's reference counter subsequent call
> to `filp_close' does not free the `struct file'. Prepend `fput' call
> to decrease the reference counter.
>
> Signed-off-by: Pavel Boldin 
> ---
>  lib/librte_vhost/eventfd_link/eventfd_link.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/lib/librte_vhost/eventfd_link/eventfd_link.c 
> b/lib/librte_vhost/eventfd_link/eventfd_link.c
> index 7755dd6..62c45c8 100644
> --- a/lib/librte_vhost/eventfd_link/eventfd_link.c
> +++ b/lib/librte_vhost/eventfd_link/eventfd_link.c
> @@ -117,6 +117,7 @@ eventfd_link_ioctl(struct file *f, unsigned int ioctl, 
> unsigned long arg)
>* Release the existing eventfd in the source process
>*/
>   spin_lock(>file_lock);
> + fput(file);
>   filp_close(file, files);
>   fdt = files_fdtable(files);
>   fdt->fd[eventfd_copy.source_fd] = NULL;
Acked-by Huawei Xie 

In future, we should remove the allocation of src eventfd, allocate a
free fd from kernel, and install it with target fd.

[dpdk-dev] [PATCH] i40e: remove ALLOW_LB flag on SRIOV vsi

2015-03-24 Thread Zhang, Helin



> -Original Message-
> From: Wu, Jingjing
> Sent: Friday, March 20, 2015 3:32 PM
> To: dev at dpdk.org
> Cc: Wu, Jingjing; Xu, HuilongX; Zhang, Helin
> Subject: [PATCH] i40e: remove ALLOW_LB flag on SRIOV vsi
> 
> Disable VEB switching by removing ALLOW_LB on SRIOV vsi.
> 
> If the source mac address of packet sent from VF is not listed in the VEB's 
> mac
> table, the VEB will switch the packet back to the VF.
> It's a hardware issue. Enabling ALLOW_LB flag will block VF functions.
> 
> Signed-off-by: Jingjing Wu 
Acked-by: Helin Zhang 
> ---
>  lib/librte_pmd_i40e/i40e_ethdev.c | 14 +-
>  1 file changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c
> b/lib/librte_pmd_i40e/i40e_ethdev.c
> index cf6685e..28ea5dc 100644
> --- a/lib/librte_pmd_i40e/i40e_ethdev.c
> +++ b/lib/librte_pmd_i40e/i40e_ethdev.c
> @@ -3059,11 +3059,15 @@ i40e_vsi_setup(struct i40e_pf *pf,
>   ctxt.connection_type = 0x1;
>   ctxt.flags = I40E_AQ_VSI_TYPE_VF;
> 
> - /* Configure switch ID */
> - ctxt.info.valid_sections |=
> - rte_cpu_to_le_16(I40E_AQ_VSI_PROP_SWITCH_VALID);
> - ctxt.info.switch_id =
> - rte_cpu_to_le_16(I40E_AQ_VSI_SW_ID_FLAG_ALLOW_LB);
> + /**
> +  * Do not configure switch ID to enable VEB switch by
> +  * I40E_AQ_VSI_SW_ID_FLAG_ALLOW_LB. Because in Fortville,
> +  * if the source mac address of packet sent from VF is not
> +  * listed in the VEB's mac table, the VEB will switch the
> +  * packet back to the VF. Need to enable it when HW issue
> +  * is fixed.
> +  */
> +
>   /* Configure port/vlan */
>   ctxt.info.valid_sections |=
>   rte_cpu_to_le_16(I40E_AQ_VSI_PROP_VLAN_VALID);
> --
> 1.9.3

[dpdk-dev] [PATCH] kni/ethtool/ixgbe: enforce access between ixgbe PCI and CPU

2015-03-24 Thread Zhang, Helin

Hi Xuelin

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of
> xuelin.shi at freescale.com
> Sent: Wednesday, February 11, 2015 2:50 PM
> To: thomas.monjalon at 6wind.com
> Cc: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] kni/ethtool/ixgbe: enforce access between ixgbe
> PCI and CPU
> 
> From: Xuelin Shi 
> 
> make sure:
>   CPU read from ixgbe with IXGBE_LE32_TO_CPUS
> CPU write to ixgbe with IXGBE_CPU_TO_LE32
> 
> otherwise, there is endian issue for ixgbe on BIG_ENDIAN CPU.
I got your concern, but you modified in wrong places. Source files in 
linux/kni/ will
be compiled into a kernel module. So the endian issue will be handled quite 
well by kernel
functions like writel, readl, etc. And your modifications are not needed at all 
for KNI
kernel module.

But I think the similar changes are needed in librte_pmd_e1000, 
librte_pmd_ixgbe,
librte_pmd_i40e, etc.

Regards,
Helin

> 
> Signed-off-by: Xuelin Shi 
> ---
>  .../linuxapp/kni/ethtool/ixgbe/ixgbe_osdep.h   | 24
> --
>  1 file changed, 18 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/librte_eal/linuxapp/kni/ethtool/ixgbe/ixgbe_osdep.h
> b/lib/librte_eal/linuxapp/kni/ethtool/ixgbe/ixgbe_osdep.h
> index d161600..0612632 100644
> --- a/lib/librte_eal/linuxapp/kni/ethtool/ixgbe/ixgbe_osdep.h
> +++ b/lib/librte_eal/linuxapp/kni/ethtool/ixgbe/ixgbe_osdep.h
> @@ -53,6 +53,16 @@
> 
>  #undef ASSERT
> 
> +static inline uint32_t ixgbe_read_addr(volatile void* addr) {
> + return IXGBE_LE32_TO_CPUS(*((volatile uint32_t *)addr)); }
> +
> +static inline uint32_t ixgbe_write_addr(u32 value, volatile void* addr)
> +{
> + return writel(IXGBE_CPU_TO_LE32(value), addr); }
> +
>  #ifdef DBG
>  #define hw_dbg(hw, S, A...)  printk(KERN_DEBUG S, ## A)
>  #else
> @@ -91,19 +101,20 @@
>   default: \
>   break; \
>   } \
> - writel((value), ((a)->hw_addr + (reg))); \
> + ixgbe_write_addr((value), ((a)->hw_addr + (reg))); \
>  } while (0)
>  #else
> -#define IXGBE_WRITE_REG(a, reg, value) writel((value), ((a)->hw_addr +
> (reg)))
> +#define IXGBE_WRITE_REG(a, reg, value) \
> + ixgbe_write_addr((value), ((a)->hw_addr + (reg)))
>  #endif
> 
> -#define IXGBE_READ_REG(a, reg) readl((a)->hw_addr + (reg))
> +#define IXGBE_READ_REG(a, reg) ixgbe_read_addr((a)->hw_addr + (reg))
> 
>  #define IXGBE_WRITE_REG_ARRAY(a, reg, offset, value) ( \
> - writel((value), ((a)->hw_addr + (reg) + ((offset) << 2
> + ixgbe_write_addr((value), ((a)->hw_addr + (reg) + ((offset) << 2
> 
>  #define IXGBE_READ_REG_ARRAY(a, reg, offset) ( \
> - readl((a)->hw_addr + (reg) + ((offset) << 2)))
> + ixgbe_read_addr((a)->hw_addr + (reg) + ((offset) << 2)))
> 
>  #ifndef writeq
>  #define writeq(val, addr)do { writel((u32) (val), addr); \
> @@ -111,7 +122,8 @@
>   } while (0);
>  #endif
> 
> -#define IXGBE_WRITE_REG64(a, reg, value) writeq((value), ((a)->hw_addr +
> (reg)))
> +#define IXGBE_WRITE_REG64(a, reg, value) \
> + writeq((cpu_to_le64(value)), ((a)->hw_addr + (reg)))
> 
>  #define IXGBE_WRITE_FLUSH(a) IXGBE_READ_REG(a, IXGBE_STATUS)
> struct ixgbe_hw;
> --
> 1.9.1

[dpdk-dev] [PATCH] cast used->idx to volatile

2015-03-24 Thread Xie, Huawei

On 3/24/2015 9:00 AM, Linhaifeng wrote:
>
> On 2015/3/23 20:54, Xie, Huawei wrote:
>>
>>> -Original Message-
>>> From: Linhaifeng [mailto:haifeng.lin at huawei.com]
>>> Sent: Monday, March 23, 2015 8:24 PM
>>> To: dev at dpdk.org
>>> Cc: Ouyang, Changchun; Xie, Huawei
>>> Subject: Re: [dpdk-dev] [PATCH] cast used->idx to volatile
>>>
>>>
>>>
>>> On 2015/3/21 16:07, linhaifeng wrote:
 From: Linhaifeng 

 Same as rte_vhost_enqueue_burst we should cast used->idx
 to volatile before notify guest.

 Signed-off-by: Linhaifeng 
 ---
  lib/librte_vhost/vhost_rxtx.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
 index 535c7a1..8d674d1 100644
 --- a/lib/librte_vhost/vhost_rxtx.c
 +++ b/lib/librte_vhost/vhost_rxtx.c
 @@ -722,7 +722,7 @@ rte_vhost_dequeue_burst(struct virtio_net *dev,
>>> uint16_t queue_id,
}

rte_compiler_barrier();
 -  vq->used->idx += entry_success;
 +  *(volatile uint16_t *)>used->idx += entry_success;
>>
>> Haifeng:
>> We have compiler barrier before and an external function call behind, so we 
>> don't need volatile  here.
>> Do you meet issue?
>>
> Tx_q is sometimes stopped when we use virtio_net. Because vhost thought there 
> are no buffers in tx_q and virtio_net
> though vhost haven't handle all packets so we have to restart VM to restore 
> work.
>
> The status in VM is:
> Mar 18 17:11:10 linux-b2ij kernel: [46337.246687] net eth7: virtnet_poll
> Mar 18 17:11:10 linux-b2ij kernel: [46337.246690] net eth7: receive_buf
> Mar 18 17:11:10 linux-b2ij kernel: [46337.246693] net eth7: vi->num=239
> Mar 18 17:11:10 linux-b2ij kernel: [46337.246695] net eth7: 
> svq:avail->idx=52939 used->idx=52939 num_free=18 num_added=0 
> svq->last_used_idx=52820
> Mar 18 17:11:10 linux-b2ij kernel: [46337.246699] net eth7: 
> rvq:avail->idx=36215 used->idx=35977 num_free=18 num_added=0 
> rvq->last_used_idx=35977
> Mar 18 17:11:11 linux-b2ij kernel: [46337.901038] net eth7: dev_queue_xmit, 
> qdisc->flags=4, qdisc->state deactiveed=0
> Mar 18 17:11:12 linux-b2ij kernel: [46337.901042] net eth7: dev_queue_xmit, 
> txq->state=1, stopped=1
>
> Why compiler barrier not take effect in our case? Is compiler barrier 
> depended on -O3 option? We use -O2 option.
compiler barrier always works regardless of the optimization option.
I don't get your story, but the key thing is, do you check the asm code?
If called from outside as an API, how is it possible it is optimized?
there is only one update to used->idx in that function.
>
/* Kick guest if required. */
if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
eventfd_write((int)vq->callfd, 1);

>>> --
>>> Regards,
>>> Haifeng
>>
>>

[dpdk-dev] [PATCH] scripts: enable extended tag of PCIe

2015-03-24 Thread Zhang, Helin

Hi Thomas

Zhida is our intern who has already been back to university. I think Yong might 
have reviewed it.
It is good supplementation for setting extended tag on Linux, though not 
necessary. I am OK to have it merged or not. Thanks!

Marvin, could you help to ack it, as I know you have reviewed it?

Regards,
Helin

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Monday, March 23, 2015 7:53 PM
> To: Zang, Zhida
> Cc: dev at dpdk.org; Butler, Siobhan A; Zhang, Helin
> Subject: Re: [dpdk-dev] [PATCH] scripts: enable extended tag of PCIe
> 
> Hi,
> 
> This patch needs review and documentation.
> It's going to be dropped if nobody cares.
> 
> There were some previous discussions about it:
>   http://dpdk.org/ml/archives/dev/2015-February/012708.html
> 
> 
> 2015-01-30 12:57, zhida zang:
> > As 'extended tag' of PCIe needs to be enabled for i40e high
> > performance, Linux command of 'setpci' can be used to check and set
> > the corresponding bit of 'extended tag' of PCIe configuration space.
> > The script is to check and set the right bit in PCIe configuration space to
> enable 'extended tag'.
> >
> > Signed-off-by: Zhida Zang 
> > ---
> >  tools/set_pci.py | 124
> > +++
> >  1 file changed, 124 insertions(+)
> >  create mode 100755 tools/set_pci.py
> >
> > diff --git a/tools/set_pci.py b/tools/set_pci.py new file mode 100755
> > index 000..e242efb
> > --- /dev/null
> > +++ b/tools/set_pci.py
> > @@ -0,0 +1,124 @@
> > +#! /usr/bin/python
> > +import sys
> > +import os
> > +import subprocess
> > +import getopt
> > +from os.path import basename
> > +
> > +# The register to check if extended tag is supported or not.
> > +PCI_DEV_CAP_REG = 0xA4
> > +# The control register which contains the bit to enable/disable 'extended
> tag'.
> > +PCI_DEV_CTRL_REG = 0xA8
> > +# The mask of 'extended tag' in capability register.
> > +PCI_DEV_CAP_EXT_TAG_MASK = 0x20
> > +# The mask of 'extended tag' in control register.
> > +PCI_DEV_CTRL_EXT_TAG_MASK = 0x100
> > +
> > +dev_ids = {}
> > +flag = "Set"
> > +
> > +
> > +def usage():
> > +'''Print usage information for the program'''
> > +argv0 = basename(sys.argv[0])
> > +print """
> > +Usage:
> > +--
> > +
> > +%(argv0)s [options] DEVICE1 DEVICE2 
> > +
> > +where DEVICE1, DEVICE2 etc, are specified via PCI
> > +"domain:bus:slot.func" syntax or "bus:slot.func" syntax. For devices
> > +bound to Linux kernel drivers, they may also be referred to by Linux 
> > interface
> name e.g. eth0, eth1, em0, em1, etc.
> > +
> > +Options:
> > +--help, --usage:
> > +Display usage information and quit
> > +
> > +-s --set:
> > +Set the following pci device
> > +
> > +-u --Unset:
> > +Unset the following pci device
> > +
> > +Examples:
> > +-
> > +To set pci 0a:00.0
> > +%(argv0)s -s 0a:00.0
> > +%(argv0)s --set 0a:00.0
> > +
> > +To unset :01:00.0
> > +%(argv0)s -u :01:00.0
> > +%(argv0)s --unset :01:00.0
> > +
> > +To set :02:00.0 and :02:00.1
> > +%(argv0)s -s 02:00.0 02:00.1
> > +
> > +""" % locals()  # replace items from local variables
> > +
> > +
> > +def parse_args():
> > +global flag
> > +global dev_ids
> > +if len(sys.argv) <= 1:
> > +usage()
> > +sys.exit(0)
> > +try:
> > +opts, dev_ids = getopt.getopt(
> > +sys.argv[1:],
> > +"su",
> > +["help", "usage", "set", "unset"]
> > +)
> > +except getopt.GetoptError, error:
> > +print str(error)
> > +print "Run '%s --usage' for further information" % sys.argv[0]
> > +sys.exit(1)
> > +
> > +for opt, arg in opts:
> > +if opt == "--help" or opt == "--usage":
> > +usage()
> > +sys.exit(0)
> > +if opt == "-s" or opt == "--set":
> > +flag = "Set"
> > +if opt == "-u" or opt == "--unset":
> > +flag = "Unset"
> > +
> > +
> > +def check_output(args, stderr=None):
> > +'''Run a command and capture its output'''
> > +return subprocess.Popen(
> > +args,
> > +stdout=subprocess.PIPE,
> > +stderr=stderr
> > +).communicate()[0]
> > +
> > +
> > +def set_pci():
> > +if len(dev_ids) == 0:
> > +print "Error: No devices specified."
> > +print "Run '%s --usage' for further information" % sys.argv[0]
> > +sys.exit(1)
> > +param_cap = "%x.W" % PCI_DEV_CAP_REG
> > +for k in range(len(dev_ids)):
> > +val = check_output(["setpci", "-s", dev_ids[k], param_cap])
> > +if (not (int(val, 16) & PCI_DEV_CAP_EXT_TAG_MASK)):
> > +print dev_ids[k], "Not supported"
> > +continue
> > +if (int(val, 16) & PCI_DEV_CTRL_EXT_TAG_MASK):
> > +continue
> > +param_ctrl = "%x.W" % PCI_DEV_CTRL_REG
> > +val =

[dpdk-dev] [PATCH] eal: remove unnecessary #ifdef CONFIG_BQL

2015-03-24 Thread Zhang, Helin

Hi Alex

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Alex Gartrell
> Sent: Saturday, March 14, 2015 3:22 AM
> To: dev at dpdk.org
> Cc: kernel-team at fb.com
> Subject: [dpdk-dev] [PATCH] eal: remove unnecessary #ifdef CONFIG_BQL
> 
> I couldn't figure out why this #ifdef existed so I looked around upstream and 
> it's
> not there.  It seems to build just fine without it.

You can see in igb/kcomat.h of below code section.

#ifndef CONFIG_BQL
#define netdev_tx_completed_queue(_q, _p, _b) do {} while (0)
#define netdev_completed_queue(_n, _p, _b) do {} while (0)
#define netdev_tx_sent_queue(_q, _b) do {} while (0)
#define netdev_sent_queue(_n, _b) do {} while (0)
#define netdev_tx_reset_queue(_q) do {} while (0)
#define netdev_reset_queue(_n) do {} while (0)
#endif

That means all callers of txring_txq() are conditional compiled. I don't think 
your
modifications are needed.

Regards,
Helin

> 
> Signed-off-by: Alex Gartrell 
> ---
>  lib/librte_eal/linuxapp/kni/ethtool/igb/igb.h | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/lib/librte_eal/linuxapp/kni/ethtool/igb/igb.h
> b/lib/librte_eal/linuxapp/kni/ethtool/igb/igb.h
> index a582f52..d58e1f3 100644
> --- a/lib/librte_eal/linuxapp/kni/ethtool/igb/igb.h
> +++ b/lib/librte_eal/linuxapp/kni/ethtool/igb/igb.h
> @@ -472,12 +472,10 @@ static inline u16 igb_desc_unused(const struct
> igb_ring *ring)
>   return ((ntc > ntu) ? 0 : ring->count) + ntc - ntu - 1;  }
> 
> -#ifdef CONFIG_BQL
>  static inline struct netdev_queue *txring_txq(const struct igb_ring *tx_ring)
> {
>   return netdev_get_tx_queue(tx_ring->netdev, tx_ring->queue_index);  }
> -#endif /* CONFIG_BQL */
> 
>  // #ifdef EXT_THERMAL_SENSOR_SUPPORT
>  // #ifdef IGB_PROCFS
> --
> Alex Gartrell

[dpdk-dev] Need info on --vdev option

2015-03-24 Thread Shankari Vaidyalingam

Hi Olivier,

Thanks a lot.
I changed the ordering and it worked. But the pcap file was empty even when
I specified the rx interface.
I have a basic doubt.  When the ethernet interfaces are bound to teh igb
driver they become user interfaces and no longer shown in "ifconfig"
command output. By what name can we identify those interfaces after they
get bound to igb driver and become dpdk interfaces?

Regards
Shankari.V


On Mon, Mar 23, 2015 at 9:31 PM, Olivier MATZ 
wrote:

> Hi Shankari,
>
> On 03/23/2015 04:54 PM, Shankari Vaidyalingam wrote:
> > Hi
> >
> > I'm trying to do a packet capture on the DPDK interface while running
> l2fwd
> > sample application and injecting packets from a traffic generator.
> > I'm getting the below error when I give this command: sudo ./build/l2fw-c
> > 0x03 -n 2 -- -p 0x03 --vdev
> > 'eth_pcap0,tx_pcap=/home/controller/pkt_capt/try.pcap'
> >
> > [...]
> > EAL:   :00:11.0 not managed by UIO driver, skipping
> > ./build/l2fwd: unrecognized option '--vdev'
> > ./build/l2fwd [EAL options] -- -p PORTMASK [-q NQ]
> >   -p PORTMASK: hexadecimal bitmask of ports to configure
> >   -q NQ: number of queue (=ports) per lcore (default is 1)
> >   -T PERIOD: statistics will be refreshed each PERIOD seconds (0 to
> > disable, 10 default, 86400 maximum)
> > EAL: Error - exiting with code: 1
> >   Cause: Invalid L2FWD arguments
>
> Please note the position of "--" in your command line. This is
> used to separate the arguments of eal and application. As --vdev
> is an eal argument, it should go before the "--".
>
> Regards,
> Olivier
>

50 matches

Mail list logo