Re: [lng-odp] ODP packet reference API usecase

2018-02-20 Thread Bogdan Pricope
On 19 February 2018 at 16:32, Bill Fischofer  wrote:
> On Mon, Feb 19, 2018 at 7:36 AM, Bogdan Pricope
>  wrote:
>>
>> On 17 February 2018 at 03:36, Bill Fischofer  
>> wrote:
>> > Changed to post this to the ODP mailing list, as this is a good topic to
>> > discuss there.
>> >
>> > Let's first back up a bit and discuss the intent behind packet references.
>> > Packets typically consist of one or more headers followed by a payload.
>> > References are designed to allow multiple unique headers to share a common
>> > payload, which must be treated as read/only. This is because headers
>> > typically have things like unique destination addresses (for multicast) or
>> > IP identifiers (for retransmission). A fundamental assumption of the ODP
>> > programming model is that a packet handle is owned by only one thread at a
>> > time. If two or more threads were to make simultaneous accesses to a packet
>> > through the same handle, the result would be unpredictable race conditions
>> > and likely data corruption.This is why the shared part of any packet
>> > reference must be treated as read only by all parties.
>> >
>> > So any packet reference consists of a R/W prefix, which is unique to that
>> > reference, followed by a (potentially) shared R/O suffix. The point at 
>> > which
>> > the R/O suffix begins is set by the offset parameter. For dynamic 
>> > references
>> > the prefix can either start out null, which is what odp_packet_ref() does,
>> > or can be supplied by the caller, which is what odp_packet_ref_pkt() does.
>> > In both cases the R/W prefix can be modified
>> >
>> > With this background let's look at the specifics.
>> >
>> > On Fri, Feb 16, 2018 at 9:31 AM, Bogdan Pricope 
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> I am trying to use ODP packet reference API to improve transmission of
>> >> TCP data. Yet, it seems current API is not enough for this.
>> >>
>> >> Usecase:  three data areas stored inside odp_packet_t
>> >>
>> >> mb1: size 40 bytes – to send last 20 bytes
>> >> mb2: size 40 bytes – to send 40 bytes
>> >> mb3: size 40 bytes – to send first 20 bytes
>> >>
>> >
>> > Are you looking at three areas within the same packet or three areas that
>> > are in separate packets? You cannot create a reference to part of a packet.
>> > The only control is the starting offset within the packet that is to become
>> > the shared portion of the reference. That extends to the end of the packet
>> > on which the reference is created. We had considered adding a length
>> > parameter but that would pose undue implementation complications.
>>
>> Each data area is in separate odp packet: this is generated by 3 calls
>> to an API equivalent to:
>>
>> ssize_t send(int sockfd, const void *buf, size_t len, int flags);
>>
>> Buffer 'buf' provided by application is stored as odp_packet_t inside
>> socket send buffer array.
>
>
> You can certainly use odp_packet_offset() to get addressability to a
> particular offset within a packet and send() from there. You just need
> to be aware of potential segmentation along the way (seg_len value
> returned from the odp_packet_offset() call). If you need a particular
> offset/len to be contiguously addressable odp_packet_align() may be
> used to force that, but that can be an expensive operation if the
> packet has to be resegmented to meet the needs expressed by the call.

I meant, send() calls are generating the tree data areas:
 - application calls send()
 - ofp identifies socket structure
 - ofp copy buf in an odp_packet_t and store it in socket's send sockbuf.

Usually, it will try to send packet form sockbuf immediately, but in
congestion situations, it may stay longer in sockbuf.

Eventually, send window is long enough and ofp will try to aggregate
multiple sends into one bigger packet.

Right now, this is done with memcpy(), but I was wondering if we can
use packet ref APIs

>
>>
>>
>> Packetization function receives as arguments: a list of payload
>> buffers (odp_packet_t), offset (start of the un-ack data) and
>> a len (min(data available, send window size)). Length parameter is needed.
>
>
> The odp_pktout_send() API currently does not support sending partial
> packets, however you can simulate this with references. If you've
> ACK'd a bytestream up to a certain point and the bytestream is held in
> an odp_packet_t, you can create a reference at the ACK point and then
> push the transmit headers you need onto that. For example:
>
> odp_packet_t bytestream;
> int ack_point;
> odp_packet_t xmit_hdrs;
> ...setup xmit_hdrs as needed
>
> odp_packet_t xmit_pkt = odp_packet_ref_pkt(bytestream, ack_point, xmit_hdrs);
>
> odp_pktout_send(pktio, xmit_pkt, 1);
>
> ...etc.
>
> The xmit_pkt consists of the xmit_hdrs followed by the bytestream
> starting at offset ack_point, which is what it sounds like you're
> asking for.
>

Ideally, odp_packet_pull_tail() would work on 

Re: [lng-odp] ODP packet reference API usecase

2018-02-19 Thread Bill Fischofer
On Mon, Feb 19, 2018 at 7:36 AM, Bogdan Pricope
 wrote:
>
> On 17 February 2018 at 03:36, Bill Fischofer  
> wrote:
> > Changed to post this to the ODP mailing list, as this is a good topic to
> > discuss there.
> >
> > Let's first back up a bit and discuss the intent behind packet references.
> > Packets typically consist of one or more headers followed by a payload.
> > References are designed to allow multiple unique headers to share a common
> > payload, which must be treated as read/only. This is because headers
> > typically have things like unique destination addresses (for multicast) or
> > IP identifiers (for retransmission). A fundamental assumption of the ODP
> > programming model is that a packet handle is owned by only one thread at a
> > time. If two or more threads were to make simultaneous accesses to a packet
> > through the same handle, the result would be unpredictable race conditions
> > and likely data corruption.This is why the shared part of any packet
> > reference must be treated as read only by all parties.
> >
> > So any packet reference consists of a R/W prefix, which is unique to that
> > reference, followed by a (potentially) shared R/O suffix. The point at which
> > the R/O suffix begins is set by the offset parameter. For dynamic references
> > the prefix can either start out null, which is what odp_packet_ref() does,
> > or can be supplied by the caller, which is what odp_packet_ref_pkt() does.
> > In both cases the R/W prefix can be modified
> >
> > With this background let's look at the specifics.
> >
> > On Fri, Feb 16, 2018 at 9:31 AM, Bogdan Pricope 
> > wrote:
> >>
> >> Hi,
> >>
> >> I am trying to use ODP packet reference API to improve transmission of
> >> TCP data. Yet, it seems current API is not enough for this.
> >>
> >> Usecase:  three data areas stored inside odp_packet_t
> >>
> >> mb1: size 40 bytes – to send last 20 bytes
> >> mb2: size 40 bytes – to send 40 bytes
> >> mb3: size 40 bytes – to send first 20 bytes
> >>
> >
> > Are you looking at three areas within the same packet or three areas that
> > are in separate packets? You cannot create a reference to part of a packet.
> > The only control is the starting offset within the packet that is to become
> > the shared portion of the reference. That extends to the end of the packet
> > on which the reference is created. We had considered adding a length
> > parameter but that would pose undue implementation complications.
>
> Each data area is in separate odp packet: this is generated by 3 calls
> to an API equivalent to:
>
> ssize_t send(int sockfd, const void *buf, size_t len, int flags);
>
> Buffer 'buf' provided by application is stored as odp_packet_t inside
> socket send buffer array.


You can certainly use odp_packet_offset() to get addressability to a
particular offset within a packet and send() from there. You just need
to be aware of potential segmentation along the way (seg_len value
returned from the odp_packet_offset() call). If you need a particular
offset/len to be contiguously addressable odp_packet_align() may be
used to force that, but that can be an expensive operation if the
packet has to be resegmented to meet the needs expressed by the call.

>
>
> Packetization function receives as arguments: a list of payload
> buffers (odp_packet_t), offset (start of the un-ack data) and
> a len (min(data available, send window size)). Length parameter is needed.


The odp_pktout_send() API currently does not support sending partial
packets, however you can simulate this with references. If you've
ACK'd a bytestream up to a certain point and the bytestream is held in
an odp_packet_t, you can create a reference at the ACK point and then
push the transmit headers you need onto that. For example:

odp_packet_t bytestream;
int ack_point;
odp_packet_t xmit_hdrs;
...setup xmit_hdrs as needed

odp_packet_t xmit_pkt = odp_packet_ref_pkt(bytestream, ack_point, xmit_hdrs);

odp_pktout_send(pktio, xmit_pkt, 1);

...etc.

The xmit_pkt consists of the xmit_hdrs followed by the bytestream
starting at offset ack_point, which is what it sounds like you're
asking for.

>
>
> >
> >>
> >>
> >> Resulting packet should look like:
> >> mb0: len 32 bytes to hold IPv4 + TCP headers
> >> reference to mb1: 20 bytes offset, 20 bytes len
> >> reference to mb2: 0 bytes offset, 40 bytes len
> >> reference to mb3: 0 offset, 20 bytes len
> >
> >
> > What you're describing here is either a composite packet that is assembled
> > from multiple different packets via odp_packet_concat(), or a compound
> > reference where the packet on which you're creating a reference is itself a
> > reference. You cannot create a reference directly to more than one packet.
>
> I meant a compound reference.
>
> >
> >>
> >>
> >>
> >> API calls:
> >> pkt = odp_packet_alloc(pool, 32);
> >> pkt = odp_packet_ref_pkt(mb1, 20, pkt);
> >> pkt = odp_packet_ref_pkt(mb2, 0, pkt);
> 

Re: [lng-odp] ODP packet reference API usecase

2018-02-19 Thread Bogdan Pricope
On 17 February 2018 at 03:36, Bill Fischofer  wrote:
> Changed to post this to the ODP mailing list, as this is a good topic to
> discuss there.
>
> Let's first back up a bit and discuss the intent behind packet references.
> Packets typically consist of one or more headers followed by a payload.
> References are designed to allow multiple unique headers to share a common
> payload, which must be treated as read/only. This is because headers
> typically have things like unique destination addresses (for multicast) or
> IP identifiers (for retransmission). A fundamental assumption of the ODP
> programming model is that a packet handle is owned by only one thread at a
> time. If two or more threads were to make simultaneous accesses to a packet
> through the same handle, the result would be unpredictable race conditions
> and likely data corruption.This is why the shared part of any packet
> reference must be treated as read only by all parties.
>
> So any packet reference consists of a R/W prefix, which is unique to that
> reference, followed by a (potentially) shared R/O suffix. The point at which
> the R/O suffix begins is set by the offset parameter. For dynamic references
> the prefix can either start out null, which is what odp_packet_ref() does,
> or can be supplied by the caller, which is what odp_packet_ref_pkt() does.
> In both cases the R/W prefix can be modified
>
> With this background let's look at the specifics.
>
> On Fri, Feb 16, 2018 at 9:31 AM, Bogdan Pricope 
> wrote:
>>
>> Hi,
>>
>> I am trying to use ODP packet reference API to improve transmission of
>> TCP data. Yet, it seems current API is not enough for this.
>>
>> Usecase:  three data areas stored inside odp_packet_t
>>
>> mb1: size 40 bytes – to send last 20 bytes
>> mb2: size 40 bytes – to send 40 bytes
>> mb3: size 40 bytes – to send first 20 bytes
>>
>
> Are you looking at three areas within the same packet or three areas that
> are in separate packets? You cannot create a reference to part of a packet.
> The only control is the starting offset within the packet that is to become
> the shared portion of the reference. That extends to the end of the packet
> on which the reference is created. We had considered adding a length
> parameter but that would pose undue implementation complications.

Each data area is in separate odp packet: this is generated by 3 calls
to an API equivalent to:

ssize_t send(int sockfd, const void *buf, size_t len, int flags);

Buffer 'buf' provided by application is stored as odp_packet_t inside
socket send buffer array.

Packetization function receives as arguments: a list of payload
buffers (odp_packet_t), offset (start of the un-ack data) and
a len (min(data available, send window size)). Length parameter is needed.

>
>>
>>
>> Resulting packet should look like:
>> mb0: len 32 bytes to hold IPv4 + TCP headers
>> reference to mb1: 20 bytes offset, 20 bytes len
>> reference to mb2: 0 bytes offset, 40 bytes len
>> reference to mb3: 0 offset, 20 bytes len
>
>
> What you're describing here is either a composite packet that is assembled
> from multiple different packets via odp_packet_concat(), or a compound
> reference where the packet on which you're creating a reference is itself a
> reference. You cannot create a reference directly to more than one packet.

I meant a compound reference.

>
>>
>>
>>
>> API calls:
>> pkt = odp_packet_alloc(pool, 32);
>> pkt = odp_packet_ref_pkt(mb1, 20, pkt);
>> pkt = odp_packet_ref_pkt(mb2, 0, pkt);
>> pkt = odp_packet_ref_pkt(mb3, 0, pkt);
>
>
> Not sure if this is an omission, but having pkt on both sides of the
> assignment here is attempting to create a circular reference (pkt refers to

I thought it works like:

pkt = pkt + new_buff;

> itself) and that is expressly forbidden by the APIs. The proper way to do
> what I think you're trying to do is:
>
> odp_packet_t mb0, mb1, mb2, mb3, refpkt;
>
> ... setup mb0 to hold 14 bytes of Ethernet header
> ... setup mb1 to hold 20 bytes of IPv4 header
> ... setup mb2 to hold 20 bytes of TCP header
> ... setup mb3 to hold your payload
>
> refpkt = odp_packet_ref_pkt(odp_packet_ref_pkt(odp_packet_ref_pkt(mb3, 0,
> mb2), 0, mb1), 0, mb0);
>

It should work (except for missing length part) on single thread
scenario, yet is an odd construction.
Not sure if single thread scenario is good enough for this usecase.


> You of course could use intermediate assignments to capture the handles for
> the intermediate reference.
>
> The result is refpkt is the handle to the composite reference {mb0 || mb1 ||
> mb2 || mb3}, where only mb0 is R/W and all others must be treated as R/O. Of
> the original packets you now have (in linux-generic, other implementations
> may be slightly different).
>
> refpkt has a ref_count of 1 since it is a unique (unreferenced) packet.
> mb0 has a ref_count of 1 since it is returned as refpkt.
> mb1 has a ref_count of 2 since is is referred by to 

Re: [lng-odp] ODP packet reference API usecase

2018-02-16 Thread Bill Fischofer
Changed to post this to the ODP mailing list, as this is a good topic to
discuss there.

Let's first back up a bit and discuss the intent behind packet references.
Packets typically consist of one or more headers followed by a payload.
References are designed to allow multiple unique headers to share a common
payload, which must be treated as read/only. This is because headers
typically have things like unique destination addresses (for multicast) or
IP identifiers (for retransmission). A fundamental assumption of the ODP
programming model is that a packet handle is owned by only one thread at a
time. If two or more threads were to make simultaneous accesses to a packet
through the same handle, the result would be unpredictable race conditions
and likely data corruption.This is why the shared part of any packet
reference must be treated as read only by all parties.

So any packet reference consists of a R/W prefix, which is unique to that
reference, followed by a (potentially) shared R/O suffix. The point at
which the R/O suffix begins is set by the offset parameter. For dynamic
references the prefix can either start out null, which is what
odp_packet_ref() does, or can be supplied by the caller, which is what
odp_packet_ref_pkt() does. In both cases the R/W prefix can be modified

With this background let's look at the specifics.

On Fri, Feb 16, 2018 at 9:31 AM, Bogdan Pricope 
wrote:

> Hi,
>
> I am trying to use ODP packet reference API to improve transmission of
> TCP data. Yet, it seems current API is not enough for this.
>
> Usecase:  three data areas stored inside odp_packet_t
>
> mb1: size 40 bytes – to send last 20 bytes
> mb2: size 40 bytes – to send 40 bytes
> mb3: size 40 bytes – to send first 20 bytes
>
>
Are you looking at three areas within the same packet or three areas that
are in separate packets? You cannot create a reference to part of a packet.
The only control is the starting offset within the packet that is to become
the shared portion of the reference. That extends to the end of the packet
on which the reference is created. We had considered adding a length
parameter but that would pose undue implementation complications.


>
> Resulting packet should look like:
> mb0: len 32 bytes to hold IPv4 + TCP headers
> reference to mb1: 20 bytes offset, 20 bytes len
> reference to mb2: 0 bytes offset, 40 bytes len
> reference to mb3: 0 offset, 20 bytes len
>

What you're describing here is either a composite packet that is assembled
from multiple different packets via odp_packet_concat(), or a compound
reference where the packet on which you're creating a reference is itself a
reference. You cannot create a reference directly to more than one packet.


>
>
> API calls:
> pkt = odp_packet_alloc(pool, 32);
> pkt = odp_packet_ref_pkt(mb1, 20, pkt);
> pkt = odp_packet_ref_pkt(mb2, 0, pkt);
> pkt = odp_packet_ref_pkt(mb3, 0, pkt);
>

Not sure if this is an omission, but having pkt on both sides of the
assignment here is attempting to create a circular reference (pkt refers to
itself) and that is expressly forbidden by the APIs. The proper way to do
what I think you're trying to do is:

odp_packet_t mb0, mb1, mb2, mb3, refpkt;

... setup mb0 to hold 14 bytes of Ethernet header
... setup mb1 to hold 20 bytes of IPv4 header
... setup mb2 to hold 20 bytes of TCP header
... setup mb3 to hold your payload

refpkt = odp_packet_ref_pkt(odp_packet_ref_pkt(odp_packet_ref_pkt(mb3, 0,
mb2), 0, mb1), 0, mb0);

You of course could use intermediate assignments to capture the handles for
the intermediate reference.

The result is refpkt is the handle to the composite reference {mb0 || mb1
|| mb2 || mb3}, where only mb0 is R/W and all others must be treated as
R/O. Of the original packets you now have (in linux-generic, other
implementations may be slightly different).

refpkt has a ref_count of 1 since it is a unique (unreferenced) packet.
mb0 has a ref_count of 1 since it is returned as refpkt.
mb1 has a ref_count of 2 since is is referred by to mb0/refpkt.
mb2 has a ref_count of 3 since it is referred to by mb1 and (indirectly) by
mb0/refpkt.
mb3 has a ref_count of 4 since it is referred to by mb2 and (indirectly) by
both mb1, and mb0/refpkt.


>
> Two problems:
> 1.   Cannot add mb2 and mb3 because pkt has references after first
> call to odp_packet_ref_pkt()
>
>  * The packet handle ('pkt') may itself be a (dynamic) reference to a
> packet,
>  * but the header packet handle ('hdr') must be unique.
>
> odp_packet_t odp_packet_ref_pkt(odp_packet_t pkt, uint32_t offset,
>
> odp_packet_t hdr);
>
> 2.   Although packet reference can start with an offset, it cannot
> be less that all the rest of the packet (no size).
>

Hopefully the above explanation makes sense.


>
> Is there any other way to create this packet structure? Maybe with
> odp_packet_concat() / odp_packet_split()?
>

odp_packet_concat() is actually quite efficient in the current
linux-generic implementation since