Re: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software scheduler

2017-04-19 Thread Peltonen, Janne (Nokia - FI/Espoo)

Ola Liljedahl [mailto:ola.liljed...@linaro.org] wrote:
> 
> On 10 April 2017 at 10:56, Peltonen, Janne (Nokia - FI/Espoo)
>  wrote:
> > Hi,
> >
> > Ola Liljedahl  wrote:
> >> Peltonen, Janne (Nokia - FI/Espoo)  wrote:
> >> > In an IPsec GW (as a use case example) one might want to do all
> >> > stateless processing (like ingress and egress IP processing and the
> >> > crypto operations) in ordered contexts to get parallelism, but the
> >> > stateful part (replay-check and sequence number generation) in
> >> > an atomic context (or holding an ordered lock) so that not only
> >> > the packets are output in ingress order but also their sequence
> >> > numbers are in the same order.
> >>
> >> To what extent does IPsec sequence number ordering have to equal actual
> >> transmission order? If an IPsec GW preserves ingress-to-egress packet 
> >> order,
> >> does it matter that this order might not be the same as the IPsec sequence
> >> number order? If IPsec SN's are not used for reordering, just for replay
> >> protection, I don't see why the two number series have to match.
> >
> > The sequence numbers of course do not need to fully match the transmission
> > order because the anti-replay window mechanism can tolerate out-of-sequence
> > packets (caused either by packet reordering in the network or by sequence
> > number assignment not quite following the transmission order).
> >
> > But assigning sequence numbers out of order with respect to the packet
> > order (which hopefully stays the same between the ingress and egress of
> > an SGW) eats into the out-of-sequence tolerance budget (i.e. the window
> > size at the other end) and leaves less of the budget for actual
> > reordering in the network.
> >
> > Whether out-of-sequence sequence number assignment is ok or problematic
> > depends on the peer configuration, network, possible QoS induced
> > packet reordering and the magnitude of the possible sequence number
> > (not packet) reordering with respect to transmission order in the sender.
> >
> > Often the antireplay window is something like 32 or 64 packets
> That seems like a very small window. I understand the simplicity enabled
> by such small windows, but is it really enough for modern high speed networks
> with multiple L2/L3 switch/router hops (possibly with some link aggregation
> in there as well)?

Maybe it is, maybe it isn't. But in the spirit of being conservative
in what one sends to the network, it would not be so nice to have an
implementation that places additional requirements, like bigger
windows, on the peers compared to an implementation that does not.

> 
> > and maybe
> > not all of that can be used by the IPsec sender for relaxed ordering of
> > the sequence number assignment. One issue is that the size of the replay
> > window is not negotiated so the sender cannot tell the receiver that
> > a bigger window than normal is needed.
> The receiver should be able to adapt the size of the antireplay window by
> monitoring the amount of (supposedly) stale packets (SN's). Has such a
> design been tried?

I am not aware of any such implementation (which does not mean that
they do not exist).

> Do you have any "quality" requirements here, how large proportion of packets 
> is
> allowed to be dropped due to limited size of antireplay window? I assume there
> are higher level SLA's that control packet loss, perhaps it is up to
> the service provider
> to use that packet loss budget as it sees fit.
> 
> >
> >> > That said, some might argue that IPsec replay window can take care
> >> > not only of packet reordering in the network but also of reordering
> >> > inside an IPsec GW and therefore the atomic context (or ordered lock)
> >> > is not necessarily needed in all implementations.
> >>
> >> Do you mean that the replay protection also should be responsible for
> >> end-to-end (IPsec GW to GW) order restoration?
> >
> > No, I do not mean that and I think it is not in general correct for
> > an IPsec GW to reorder received packets to the sequence number order.
> If order restoration adds latency, it can definitively do harm. And even if it
> could be done without adding latency (e.g. in some queue), we don't know if
> the SN order is the "real" order and order restoration actually is beneficial.
> 
> >
> > What I mean (but formulated sloppily) is that the window mechanism of
> > replay protection can tolerate out-of-sequence sequence numbers to some
> > extent even when the cause is not the network but the sending IPsec GW.
> Well you could consider (parts of) the IPsec GW itself to be part of
> the network...
> Where does the network start? You can consider the transmitting NIC or the
> cables the start of the network but what if you have link aggregation with
> independent NIC's? The network must have started at some earlier shared
> point where there is an unambiguous packet order. Is there always such a point?
> 

Re: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software scheduler

2017-04-16 Thread Ola Liljedahl
On 10 April 2017 at 10:56, Peltonen, Janne (Nokia - FI/Espoo)
 wrote:
> Hi,
>
> Ola Liljedahl  wrote:
>> Peltonen, Janne (Nokia - FI/Espoo)  wrote:
>> > In an IPsec GW (as a use case example) one might want to do all
>> > stateless processing (like ingress and egress IP processing and the
>> > crypto operations) in ordered contexts to get parallelism, but the
>> > stateful part (replay-check and sequence number generation) in
>> > an atomic context (or holding an ordered lock) so that not only
>> > the packets are output in ingress order but also their sequence
>> > numbers are in the same order.
>>
>> To what extent does IPsec sequence number ordering have to equal actual
>> transmission order? If an IPsec GW preserves ingress-to-egress packet order,
>> does it matter that this order might not be the same as the IPsec sequence
>> number order? If IPsec SN's are not used for reordering, just for replay
>> protection, I don't see why the two number series have to match.
>
> The sequence numbers of course do not need to fully match the transmission
> order because the anti-replay window mechanism can tolerate out-of-sequence
> packets (caused either by packet reordering in the network or by sequence
> number assignment not quite following the transmission order).
>
> But assigning sequence numbers out of order with respect to the packet
> order (which hopefully stays the same between the ingress and egress of
> an SGW) eats into the out-of-sequence tolerance budget (i.e. the window
> size at the other end) and leaves less of the budget for actual
> reordering in the network.
>
> Whether out-of-sequence sequence number assignment is ok or problematic
> depends on the peer configuration, network, possible QoS induced
> packet reordering and the magnitude of the possible sequence number
> (not packet) reordering with respect to transmission order in the sender.
>
> Often the antireplay window is something like 32 or 64 packets
That seems like a very small window. I understand the simplicity enabled
by such small windows, but is it really enough for modern high speed networks
with multiple L2/L3 switch/router hops (possibly with some link aggregation
in there as well)?

> and maybe
> not all of that can be used by the IPsec sender for relaxed ordering of
> the sequence number assignment. One issue is that the size of the replay
> window is not negotiated so the sender cannot tell the receiver that
> a bigger window than normal is needed.
The receiver should be able to adapt the size of the antireplay window by monitoring
the amount of (supposedly) stale packets (SN's). Has such a design been tried?
Do you have any "quality" requirements here, how large proportion of packets is
allowed to be dropped due to limited size of antireplay window? I assume there
are higher level SLA's that control packet loss, perhaps it is up to
the service provider
to use that packet loss budget as it sees fit.

>
>> > That said, some might argue that IPsec replay window can take care
>> > not only of packet reordering in the network but also of reordering
>> > inside an IPsec GW and therefore the atomic context (or ordered lock)
>> > is not necessarily needed in all implementations.
>>
>> Do you mean that the replay protection also should be responsible for
>> end-to-end (IPsec GW to GW) order restoration?
>
> No, I do not mean that and I think it is not in general correct for
> an IPsec GW to reorder received packets to the sequence number order.
If order restoration adds latency, it can definitively do harm. And even if it
could be done without adding latency (e.g. in some queue), we don't know if
the SN order is the "real" order and order restoration actually is beneficial.

>
> What I mean (but formulated sloppily) is that the window mechanism of
> replay protection can tolerate out-of-sequence sequence numbers to some
> extent even when the cause is not the network but the sending IPsec GW.
Well you could consider (parts of) the IPsec GW itself to be part of
the network...
Where does the network start? You can consider the transmitting NIC or the
cables the start of the network but what if you have link aggregation with
independent NIC's? The network must have started at some earlier shared
point where there is an unambiguous packet order. Is there always such a point?

>
> So, depending on the implementation and on the circumstances, one might
> want to ensure that sequence number gets assigned in the transmission
> order or one might decide not to worry about it and let the window
> mechanism in the receiver handle it.
Isn't it really the sending IPsec GW's *ingress* order that is interesting?
Ideally IPsec SN allocation and then egress (transmission) order follows the
ingress order. The SN cannot be allocated until the proper SA has been identified
so there is some potential for early reordering (unless atomic queue or ordered

Re: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software scheduler

2017-04-10 Thread Peltonen, Janne (Nokia - FI/Espoo)
Hi,

Ola Liljedahl  wrote:
> Peltonen, Janne (Nokia - FI/Espoo)  wrote:
> > In an IPsec GW (as a use case example) one might want to do all
> > stateless processing (like ingress and egress IP processing and the
> > crypto operations) in ordered contexts to get parallelism, but the
> > stateful part (replay-check and sequence number generation) in
> > an atomic context (or holding an ordered lock) so that not only
> > the packets are output in ingress order but also their sequence
> > numbers are in the same order.
>
> To what extent does IPsec sequence number ordering have to equal actual
> transmission order? If an IPsec GW preserves ingress-to-egress packet order,
> does it matter that this order might not be the same as the IPsec sequence
> number order? If IPsec SN's are not used for reordering, just for replay
> protection, I don't see why the two number series have to match.

The sequence numbers of course do not need to fully match the transmission
order because the anti-replay window mechanism can tolerate out-of-sequence
packets (caused either by packet reordering in the network or by sequence
number assignment not quite following the transmission order).

But assigning sequence numbers out of order with respect to the packet
order (which hopefully stays the same between the ingress and egress of
an SGW) eats into the out-of-sequence tolerance budget (i.e. the window
size at the other end) and leaves less of the budget for actual
reordering in the network.

Whether out-of-sequence sequence number assignment is ok or problematic
depends on the peer configuration, network, possible QoS induced
packet reordering and the magnitude of the possible sequence number
(not packet) reordering with respect to transmission order in the sender.

Often the antireplay window is something like 32 or 64 packets and maybe
not all of that can be used by the IPsec sender for relaxed ordering of
the sequence number assignment. One issue is that the size of the replay
window is not negotiated so the sender cannot tell the receiver that
a bigger window than normal is needed.
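
For reference, a minimal sketch of the usual RFC 4303 style 64-bit anti-replay
check (an illustration only, not tied to any particular implementation) makes
the budget visible: every window slot consumed by sender-side sequence number
reordering is a slot no longer available for reordering in the network.

#include <stdbool.h>
#include <stdint.h>

#define REPLAY_WIN_SIZE 64

struct replay_state {
    uint64_t top_sn;  /* highest sequence number accepted so far */
    uint64_t window;  /* bit i set => SN (top_sn - i) already seen */
};

/* Return true if 'sn' is acceptable and record it; false if it is a
 * replay or falls below the window. */
static bool replay_check_and_update(struct replay_state *rs, uint64_t sn)
{
    if (sn > rs->top_sn) {
        uint64_t shift = sn - rs->top_sn;

        /* Slide the window forward; old entries drop off the end. */
        rs->window = (shift >= REPLAY_WIN_SIZE) ? 0 : rs->window << shift;
        rs->window |= 1;
        rs->top_sn = sn;
        return true;
    }

    uint64_t offset = rs->top_sn - sn;

    if (offset >= REPLAY_WIN_SIZE)
        return false;                       /* too old, outside the window */
    if (rs->window & ((uint64_t)1 << offset))
        return false;                       /* duplicate, i.e. replay */
    rs->window |= (uint64_t)1 << offset;
    return true;
}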

> > That said, some might argue that IPsec replay window can take care
> > not only of packet reordering in the network but also of reordering
> > inside an IPsec GW and therefore the atomic context (or ordered lock)
> > is not necessarily needed in all implementations.
>
> Do you mean that the replay protection also should be responsible for
> end-to-end (IPsec GW to GW) order restoration?

No, I do not mean that and I think it is not in general correct for
an IPsec GW to reorder received packets to the sequence number order.

What I mean (but formulated sloppily) is that the window mechanism of
replay protection can tolerate out-of-sequence sequence numbers to some
extent even when the cause is not the network but the sending IPsec GW.

So, depending on the implementation and on the circumstances, one might
want to ensure that sequence number gets assigned in the transmission
order or one might decide not to worry about it and let the window
mechanism in the receiver handle it.

> It would be great if each QoS class would have its own IPsec SA but
> is that always the case?

Yes it would and that is what the RFC suggests but it is not often
the case. And since some antireplay window needs to be left for
QoS-caused out-of-order tolerance, it may not always be a good idea
to have the sender essentially consume a big chunk of the window even
before the IPsec packets enter the network.

Janne




Re: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software scheduler

2017-04-07 Thread Ola Liljedahl
On 7 April 2017 at 08:40, Peltonen, Janne (Nokia - FI/Espoo) <
janne.pelto...@nokia.com> wrote:

> Hi,
>
> On Thu, Apr 6, 2017 at 1:46 PM, Bill Fischofer 
> wrote:
> > On Thu, Apr 6, 2017 at 1:32 PM, Ola Liljedahl 
> wrote:
> > > On 6 April 2017 at 13:48, Jerin Jacob 
> wrote:
>
> > >> We see ORDERED->ATOMIC as the main use case for basic packet forwarding.
> > >> Stage 1 (ORDERED) to process on N cores and Stage 2 (ATOMIC) to maintain
> > >> the ingress order.
> > > Doesn't ORDERED scheduling maintain the ingress packet order all the
> > > way to the egress interface? At least that's my understanding of ODP
> > > ordered queues.
> > > From an ODP perspective, I fail to see how the ATOMIC stage is needed.
>
> For basic IP forwarding I also do not see why an atomic stage would be
> needed, but for stateful things like IPsec or some application specific
> higher layer processing the situation can be different.
>
> At the risk of stating the obvious: Ordered scheduling maintains ingress
> order when packets are placed in the next queue (toward the next pipeline
> stage or to pktout), but it allows parallel processing of packets of the
> same flow between the points where order is maintained. To guarantee packet
> processing in the ingress order in some section of code, the code needs
> to be executed in an atomic context or protected using an ordered lock.
>
> > As pointed out earlier, ordered locks are another option to avoid a
> > separate processing stage simply to do in-sequence operations within
> > an ordered flow. I'd be curious to understand the use-case in a bit
> > more detail here. Ordered queues preserve the originating queue's
> > order, however to achieve end-to-end ordering involving multiple
> > processing stages requires that flows traverse only ordered or atomic
> > queues. If a parallel queue is used ordering is indeterminate from
> > that point on in the pipeline.
>
> Exactly.
>
> In an IPsec GW (as a use case example) one might want to do all
> stateless processing (like ingress and egress IP processing and the
> crypto operations) in ordered contexts to get parallelism, but the
> stateful part (replay-check and sequence number generation) in
> an atomic context (or holding an ordered lock) so that not only
> the packets are output in ingress order but also their sequence
> numbers are in the same order.
>
To what extent does IPsec sequence number ordering have to equal actual
transmission order?
If an IPsec GW preserves ingress-to-egress packet order, does it matter
that this order might not be the same as the IPsec sequence number order?
If IPsec SN's are not used for reordering, just for replay protection, I
don't see why the two number series have to match.


> That said, some might argue that IPsec replay window can take care
> not only of packet reordering in the network but also of reordering
> inside an IPsec GW and therefore the atomic context (or ordered lock)
> is not necessarily needed in all implementations.
>
Do you mean that the replay protection also should be responsible for
end-to-end (IPsec GW to GW) order restoration? Doesn't that mean that
packets might have to be saved until their SN leaves the replay window (if
there are missing packets/SN's that we are waiting for)? Wouldn't this add
a lot of latency when waiting for missing packets? Latency affecting
packets in unrelated flows which don't care about that missing/late packet.
Can't individual IPsec packets have different QoS requirements? You don't
want a latency sensitive packet to have to wait for an earlier missing
packet. It would be great if each QoS class would have its own IPsec SA but
is that always the case?




> Janne
>
>
>


Re: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software scheduler

2017-04-07 Thread Maxim Uvarov
I forgot to note that it would be great to update the .travis file to
test the new option.

Maxim.

On 04/04/17 21:47, Brian Brooks wrote:
> This work derives from Ola Liljedahl's prototype [1] which introduced a
> scalable scheduler design based on primarily lock-free algorithms and
> data structures designed to decrease contention. A thread searches
> through a data structure containing only queues that are both non-empty
> and allowed to be scheduled to that thread. Strict priority scheduling is
> respected, and (W)RR scheduling may be used within queues of the same 
> priority.
> Lastly, pre-scheduling or stashing is not employed since it is optional
> functionality that can be implemented in the application.
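
(A rough sketch of that kind of per-thread search, for illustration only: the
names below are made up and the naive inner scan stands in for the patch's
actual bitset and ring structures. One pending-queue bitset per priority level
is scanned in strict priority order, with round-robin within a level.)

#include <stdint.h>

#define NUM_PRIO   8
#define MAX_QUEUES 64

struct sched_ts {
    uint64_t pending[NUM_PRIO]; /* bit q set => queue q non-empty and eligible */
    int rr_last[NUM_PRIO];      /* queue served last at this priority */
};

/* Return the next queue index to serve, or -1 if nothing is pending. */
static int sched_pick_queue(struct sched_ts *ts)
{
    for (int prio = 0; prio < NUM_PRIO; prio++) {  /* strict priority first */
        uint64_t mask = ts->pending[prio];

        if (mask == 0)
            continue;

        /* Round-robin within the priority level: resume the scan right
         * after the queue that was served last time. */
        for (int n = 1; n <= MAX_QUEUES; n++) {
            int q = (ts->rr_last[prio] + n) % MAX_QUEUES;

            if (mask & ((uint64_t)1 << q)) {
                ts->rr_last[prio] = q;
                return q;
            }
        }
    }
    return -1;
}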
> 
> In addition to scalable ring buffers, the algorithm also uses unbounded
> concurrent queues. LL/SC and CAS variants exist in cases where absence of
> ABA problem cannot be proved, and also in cases where the compiler's atomic
> built-ins may not be lowered to the desired instruction(s). Finally, a version
> of the algorithm that uses locks is also provided.
> 
> See platform/linux-generic/include/odp_config_internal.h for further build
> time configuration.
> 
> Use --enable-schedule-scalable to conditionally compile this scheduler
> into the library.
> 
> [1] https://lists.linaro.org/pipermail/lng-odp/2016-September/025682.html
> 
> v2:
>  - Move ARMv8 issues and other fixes into separate patches
>  - Abstract away some #ifdefs
>  - Fix some checkpatch.pl warnings
> 
> Brian Brooks (14):
>   Fix native Clang build on ARMv8
>   api: queue: Add ring_size
>   Add ODP_CONFIG_QUEUE_SIZE
>   Fix a locking bug
>   test: odp_scheduling: Handle dequeueing from a concurrent queue
>   test: scheduler: Fixup calling release operations
>   Avoid shm namespace collisions and allow shm block per queue
>   Add _odp_packet_to_buf_hdr_ptr()
>   Add scalable scheduler build config
>   Add LL/SC and signaling primitives
>   Add a bitset
>   Add atomic ops for 128-bit scalars
>   Add llqueue, an unbounded concurrent queue
>   Add scalable scheduler
> 
> Ola Liljedahl (2):
>   linux-generic: ring.c: use required memory orderings
>   helper: cuckootable: Specify queue ring_size
> 
>  configure.ac   |   30 +-
>  helper/cuckootable.c   |1 +
>  include/odp/api/spec/queue.h   |5 +
>  platform/linux-generic/Makefile.am |   21 +-
>  .../include/odp/api/plat/schedule_types.h  |   20 +-
>  platform/linux-generic/include/odp_atomic16.h  |  214 +++
>  platform/linux-generic/include/odp_bitset.h|  155 ++
>  .../linux-generic/include/odp_config_internal.h|   91 +-
>  platform/linux-generic/include/odp_llqueue.h   |  285 +++
>  platform/linux-generic/include/odp_llsc.h  |  332 
>  .../linux-generic/include/odp_packet_internal.h|3 +
>  .../linux-generic/include/odp_queue_internal.h |  122 +-
>  platform/linux-generic/include/odp_schedule_if.h   |  166 +-
>  .../include/odp_schedule_ordered_internal.h|  150 ++
>  platform/linux-generic/m4/odp_schedule.m4  |   55 +-
>  platform/linux-generic/odp_classification.c|4 +-
>  platform/linux-generic/odp_packet.c|5 +
>  platform/linux-generic/odp_packet_io.c |   88 +-
>  platform/linux-generic/odp_queue.c |2 +-
>  platform/linux-generic/odp_queue_scalable.c|  883 +
>  platform/linux-generic/odp_schedule_if.c   |   36 +-
>  platform/linux-generic/odp_schedule_scalable.c | 1922 
> 
>  .../linux-generic/odp_schedule_scalable_ordered.c  |  285 +++
>  platform/linux-generic/odp_traffic_mngr.c  |7 +-
>  platform/linux-generic/pktio/loop.c|   11 +-
>  platform/linux-generic/pktio/ring.c|   30 +-
>  test/common_plat/performance/odp_sched_latency.c   |   68 +-
>  test/common_plat/performance/odp_scheduling.c  |   12 +-
>  .../api/classification/odp_classification_basic.c  |8 +-
>  .../classification/odp_classification_test_pmr.c   |   42 +-
>  .../validation/api/scheduler/scheduler.c   |   11 +-
>  test/common_plat/validation/api/timer/timer.c  |5 +-
>  32 files changed, 4922 insertions(+), 147 deletions(-)
>  create mode 100644 platform/linux-generic/include/odp_atomic16.h
>  create mode 100644 platform/linux-generic/include/odp_bitset.h
>  create mode 100644 platform/linux-generic/include/odp_llqueue.h
>  create mode 100644 platform/linux-generic/include/odp_llsc.h
>  create mode 100644 platform/linux-generic/include/odp_schedule_ordered_internal.h
>  create mode 100644 platform/linux-generic/odp_queue_scalable.c
>  create mode 100644 platform/linux-generic/odp_schedule_scalable.c
>  create mode 100644 platform/linux-generic/odp_schedule_scalable_ordered.c
> 



Re: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software scheduler

2017-04-07 Thread Peltonen, Janne (Nokia - FI/Espoo)
Hi,

On Thu, Apr 6, 2017 at 1:46 PM, Bill Fischofer  
wrote:
> On Thu, Apr 6, 2017 at 1:32 PM, Ola Liljedahl  
> wrote:
> > On 6 April 2017 at 13:48, Jerin Jacob  
> > wrote:

> >> We see ORDERED->ATOMIC as the main use case for basic packet forwarding.
> >> Stage 1 (ORDERED) to process on N cores and Stage 2 (ATOMIC) to maintain
> >> the ingress order.
> > Doesn't ORDERED scheduling maintain the ingress packet order all the
> > way to the egress interface? At least that's my understanding of ODP
> > ordered queues.
> > From an ODP perspective, I fail to see how the ATOMIC stage is needed.

For basic IP forwarding I also do not see why an atomic stage would be
needed, but for stateful things like IPsec or some application specific
higher layer processing the situation can be different.

At the risk of stating the obvious: Ordered scheduling maintains ingress
order when packets are placed in the next queue (toward the next pipeline
stage or to pktout), but it allows parallel processing of packets of the
same flow between the points where order is maintained. To guarantee packet
processing in the ingress order in some section of code, the code needs
to be executed in an atomic context or protected using an ordered lock.
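
A minimal sketch of that pattern using the standard ODP ordered lock API
(illustration only; the processing helpers are placeholders, and the ordered
queue is assumed to have been created with sched.lock_count >= 1):

#include <odp_api.h>

/* Placeholders for the actual processing steps. */
static void do_stateless_processing(odp_packet_t pkt) { (void)pkt; }
static void assign_sequence_number(odp_packet_t pkt) { (void)pkt; }

static void worker_loop(odp_queue_t out_q)
{
    for (;;) {
        /* Events come from an ORDERED queue: packets of the same flow
         * may be processed in parallel here. */
        odp_event_t ev = odp_schedule(NULL, ODP_SCHED_WAIT);
        odp_packet_t pkt = odp_packet_from_event(ev);

        do_stateless_processing(pkt);

        /* Exclusive section executed in ingress order: the stateful
         * step (e.g. sequence number assignment) goes here. */
        odp_schedule_order_lock(0);
        assign_sequence_number(pkt);
        odp_schedule_order_unlock(0);

        /* Enqueue restores the source queue's ingress order. */
        odp_queue_enq(out_q, odp_packet_to_event(pkt));
    }
}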

> As pointed out earlier, ordered locks are another option to avoid a
> separate processing stage simply to do in-sequence operations within
> an ordered flow. I'd be curious to understand the use-case in a bit
> more detail here. Ordered queues preserve the originating queue's
> order, however to achieve end-to-end ordering involving multiple
> processing stages requires that flows traverse only ordered or atomic
> queues. If a parallel queue is used ordering is indeterminate from
> that point on in the pipeline.

Exactly.

In an IPsec GW (as a use case example) one might want to do all
stateless processing (like ingress and egress IP processing and the
crypto operations) in ordered contexts to get parallelism, but the
stateful part (replay-check and sequence number generation) in
an atomic context (or holding an ordered lock) so that not only
the packets are output in ingress order but also their sequence
numbers are in the same order.

That said, some might argue that IPsec replay window can take care
not only of packet reordering in the network but also of reordering
inside an IPsec GW and therefore the atomic context (or ordered lock)
is not necessarily needed in all implementations.

Janne




Re: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software scheduler

2017-04-06 Thread Brian Brooks
On Thu, Apr 6, 2017 at 1:46 PM, Bill Fischofer
<bill.fischo...@linaro.org> wrote:
> On Thu, Apr 6, 2017 at 1:32 PM, Ola Liljedahl <ola.liljed...@linaro.org> 
> wrote:
>> On 6 April 2017 at 13:48, Jerin Jacob <jerin.ja...@caviumnetworks.com> wrote:
>>> -Original Message-
>>>> Date: Thu, 6 Apr 2017 12:54:10 +0200
>>>> From: Ola Liljedahl <ola.liljed...@linaro.org>
>>>> To: Brian Brooks <brian.bro...@arm.com>
>>>> Cc: Jerin Jacob <jerin.ja...@caviumnetworks.com>,
>>>>  "lng-odp@lists.linaro.org" <lng-odp@lists.linaro.org>
>>>> Subject: Re: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software
>>>>  scheduler
>>>>
>>>> On 5 April 2017 at 18:50, Brian Brooks <brian.bro...@arm.com> wrote:
>>>> > On 04/05 21:27:37, Jerin Jacob wrote:
>>>> >> -Original Message-
>>>> >> > Date: Tue, 4 Apr 2017 13:47:52 -0500
>>>> >> > From: Brian Brooks <brian.bro...@arm.com>
>>>> >> > To: lng-odp@lists.linaro.org
>>>> >> > Subject: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software 
>>>> >> > scheduler
>>>> >> > X-Mailer: git-send-email 2.12.2
>>>> >> >
>>>> >> > This work derives from Ola Liljedahl's prototype [1] which introduced 
>>>> >> > a
>>>> >> > scalable scheduler design based on primarily lock-free algorithms and
>>>> >> > data structures designed to decrease contention. A thread searches
>>>> >> > through a data structure containing only queues that are both 
>>>> >> > non-empty
>>>> >> > and allowed to be scheduled to that thread. Strict priority 
>>>> >> > scheduling is
>>>> >> > respected, and (W)RR scheduling may be used within queues of the same 
>>>> >> > priority.
>>>> >> > Lastly, pre-scheduling or stashing is not employed since it is 
>>>> >> > optional
>>>> >> > functionality that can be implemented in the application.
>>>> >> >
>>>> >> > In addition to scalable ring buffers, the algorithm also uses 
>>>> >> > unbounded
>>>> >> > concurrent queues. LL/SC and CAS variants exist in cases where 
>>>> >> > absence of
>>>> >> > ABA problem cannot be proved, and also in cases where the compiler's 
>>>> >> > atomic
>>>> >> > built-ins may not be lowered to the desired instruction(s). Finally, 
>>>> >> > a version
>>>> >> > of the algorithm that uses locks is also provided.
>>>> >> >
>>>> >> > See platform/linux-generic/include/odp_config_internal.h for further 
>>>> >> > build
>>>> >> > time configuration.
>>>> >> >
>>>> >> > Use --enable-schedule-scalable to conditionally compile this scheduler
>>>> >> > into the library.
>>>> >>
>>>> >> This is interesting stuff.
>>>> >>
>>>> >> Do you have any performance/latency numbers in comparison to existing 
>>>> >> scheduler
>>>> >> for completing say two stage(ORDERED->ATOMIC) or N stage pipeline on 
>>>> >> any platform?
>>>> It is still a SW implementation, there is overhead associated with queue
>>>> enqueue/dequeue and the scheduling itself.
>>>> So for an N-stage pipeline, overhead will accumulate.
>>>> If only a subset of threads are associated with each stage (this could
>>>> be beneficial for I-cache hit rate), there will be less need for
>>>> scalability.
>>>> What is the recommended strategy here for OCTEON/ThunderX?
>>>
>>> From the viewpoint of portable event driven applications (which work on both
>>> embedded and server capable chips), the SW scheduler is an important piece.
>>>
>>>> All threads/cores share all work?
>>>
>>> That is the recommended one in HW as it supports that natively. But HW provides
>>> means to partition the workload based on ODP schedule groups
>>>
>>>
>>>>
>>>> >
>>>> > To give an idea, the avg latency reported by odp_sched_latency is down 
>>>> > to half
>>>> > that of othe

Re: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software scheduler

2017-04-06 Thread Bill Fischofer
On Thu, Apr 6, 2017 at 1:32 PM, Ola Liljedahl <ola.liljed...@linaro.org> wrote:
> On 6 April 2017 at 13:48, Jerin Jacob <jerin.ja...@caviumnetworks.com> wrote:
>> -Original Message-
>>> Date: Thu, 6 Apr 2017 12:54:10 +0200
>>> From: Ola Liljedahl <ola.liljed...@linaro.org>
>>> To: Brian Brooks <brian.bro...@arm.com>
>>> Cc: Jerin Jacob <jerin.ja...@caviumnetworks.com>,
>>>  "lng-odp@lists.linaro.org" <lng-odp@lists.linaro.org>
>>> Subject: Re: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software
>>>  scheduler
>>>
>>> On 5 April 2017 at 18:50, Brian Brooks <brian.bro...@arm.com> wrote:
>>> > On 04/05 21:27:37, Jerin Jacob wrote:
>>> >> -Original Message-
>>> >> > Date: Tue, 4 Apr 2017 13:47:52 -0500
>>> >> > From: Brian Brooks <brian.bro...@arm.com>
>>> >> > To: lng-odp@lists.linaro.org
>>> >> > Subject: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software 
>>> >> > scheduler
>>> >> > X-Mailer: git-send-email 2.12.2
>>> >> >
>>> >> > This work derives from Ola Liljedahl's prototype [1] which introduced a
>>> >> > scalable scheduler design based on primarily lock-free algorithms and
>>> >> > data structures designed to decrease contention. A thread searches
>>> >> > through a data structure containing only queues that are both non-empty
>>> >> > and allowed to be scheduled to that thread. Strict priority scheduling 
>>> >> > is
>>> >> > respected, and (W)RR scheduling may be used within queues of the same 
>>> >> > priority.
>>> >> > Lastly, pre-scheduling or stashing is not employed since it is optional
>>> >> > functionality that can be implemented in the application.
>>> >> >
>>> >> > In addition to scalable ring buffers, the algorithm also uses unbounded
>>> >> > concurrent queues. LL/SC and CAS variants exist in cases where absence 
>>> >> > of
>>> >> > ABA problem cannot be proved, and also in cases where the compiler's 
>>> >> > atomic
>>> >> > built-ins may not be lowered to the desired instruction(s). Finally, a 
>>> >> > version
>>> >> > of the algorithm that uses locks is also provided.
>>> >> >
>>> >> > See platform/linux-generic/include/odp_config_internal.h for further 
>>> >> > build
>>> >> > time configuration.
>>> >> >
>>> >> > Use --enable-schedule-scalable to conditionally compile this scheduler
>>> >> > into the library.
>>> >>
>>> >> This is interesting stuff.
>>> >>
>>> >> Do you have any performance/latency numbers in comparison to existing 
>>> >> scheduler
>>> >> for completing say two stage(ORDERED->ATOMIC) or N stage pipeline on any 
>>> >> platform?
>>> It is still a SW implementation, there is overhead associated with queue
>>> enqueue/dequeue and the scheduling itself.
>>> So for an N-stage pipeline, overhead will accumulate.
>>> If only a subset of threads are associated with each stage (this could
>>> be beneficial for I-cache hit rate), there will be less need for
>>> scalability.
>>> What is the recommended strategy here for OCTEON/ThunderX?
>>
>> From the viewpoint of portable event driven applications (which work on both
>> embedded and server capable chips), the SW scheduler is an important piece.
>>
>>> All threads/cores share all work?
>>
>> That is the recommended one in HW as it supports that natively. But HW provides
>> means to partition the workload based on ODP schedule groups
>>
>>
>>>
>>> >
>>> > To give an idea, the avg latency reported by odp_sched_latency is down to 
>>> > half
>>> > that of other schedulers (pre-scheduling/stashing disabled) on 4c A53, 
>>> > 16c A57,
>>> > and 12c broadwell. We are still preparing numbers, and I think it's worth 
>>> > mentioning
>>> > that they are subject to change as this patch series changes over time.
>>> >
>>> > I am not aware of an existing benchmark that involves switching between 
>>> > different
>>> > queue types. Perhaps this is happening in an example app?
>>> This 

Re: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software scheduler

2017-04-06 Thread Ola Liljedahl
On 6 April 2017 at 13:48, Jerin Jacob <jerin.ja...@caviumnetworks.com> wrote:
> -Original Message-
>> Date: Thu, 6 Apr 2017 12:54:10 +0200
>> From: Ola Liljedahl <ola.liljed...@linaro.org>
>> To: Brian Brooks <brian.bro...@arm.com>
>> Cc: Jerin Jacob <jerin.ja...@caviumnetworks.com>,
>>  "lng-odp@lists.linaro.org" <lng-odp@lists.linaro.org>
>> Subject: Re: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software
>>  scheduler
>>
>> On 5 April 2017 at 18:50, Brian Brooks <brian.bro...@arm.com> wrote:
>> > On 04/05 21:27:37, Jerin Jacob wrote:
>> >> -Original Message-
>> >> > Date: Tue, 4 Apr 2017 13:47:52 -0500
>> >> > From: Brian Brooks <brian.bro...@arm.com>
>> >> > To: lng-odp@lists.linaro.org
>> >> > Subject: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software 
>> >> > scheduler
>> >> > X-Mailer: git-send-email 2.12.2
>> >> >
>> >> > This work derives from Ola Liljedahl's prototype [1] which introduced a
>> >> > scalable scheduler design based on primarily lock-free algorithms and
>> >> > data structures designed to decrease contention. A thread searches
>> >> > through a data structure containing only queues that are both non-empty
>> >> > and allowed to be scheduled to that thread. Strict priority scheduling 
>> >> > is
>> >> > respected, and (W)RR scheduling may be used within queues of the same 
>> >> > priority.
>> >> > Lastly, pre-scheduling or stashing is not employed since it is optional
>> >> > functionality that can be implemented in the application.
>> >> >
>> >> > In addition to scalable ring buffers, the algorithm also uses unbounded
>> >> > concurrent queues. LL/SC and CAS variants exist in cases where absence 
>> >> > of
>> >> > ABA problem cannot be proved, and also in cases where the compiler's 
>> >> > atomic
>> >> > built-ins may not be lowered to the desired instruction(s). Finally, a 
>> >> > version
>> >> > of the algorithm that uses locks is also provided.
>> >> >
>> >> > See platform/linux-generic/include/odp_config_internal.h for further 
>> >> > build
>> >> > time configuration.
>> >> >
>> >> > Use --enable-schedule-scalable to conditionally compile this scheduler
>> >> > into the library.
>> >>
>> >> This is interesting stuff.
>> >>
>> >> Do you have any performance/latency numbers in comparison to existing 
>> >> scheduler
>> >> for completing say two stage(ORDERED->ATOMIC) or N stage pipeline on any 
>> >> platform?
>> It is still a SW implementation, there is overhead associated with queue
>> enqueue/dequeue and the scheduling itself.
>> So for an N-stage pipeline, overhead will accumulate.
>> If only a subset of threads are associated with each stage (this could
>> be beneficial for I-cache hit rate), there will be less need for
>> scalability.
>> What is the recommended strategy here for OCTEON/ThunderX?
>
> From the viewpoint of portable event driven applications (which work on both
> embedded and server capable chips), the SW scheduler is an important piece.
>
>> All threads/cores share all work?
>
> That is the recommended one in HW as it supports that natively. But HW provides
> means to partition the workload based on ODP schedule groups
>
>
>>
>> >
>> > To give an idea, the avg latency reported by odp_sched_latency is down to 
>> > half
>> > that of other schedulers (pre-scheduling/stashing disabled) on 4c A53, 16c 
>> > A57,
>> > and 12c broadwell. We are still preparing numbers, and I think it's worth 
>> > mentioning
>> > that they are subject to change as this patch series changes over time.
>> >
>> > I am not aware of an existing benchmark that involves switching between 
>> > different
>> > queue types. Perhaps this is happening in an example app?
>> This could be useful in e.g. IPsec termination. Use an atomic stage
>> for the replay protection check and update. Now ODP has ordered locks
>> for that so the "atomic" (exclusive) section can be achieved from an
>> ordered processing stage. Perhaps Jerin knows some other application
>> that utilises two-stage ORDERED->ATOMIC processing.
>
> We see ORDERED->ATOMIC as main 

Re: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software scheduler

2017-04-06 Thread Jerin Jacob
-Original Message-
> Date: Thu, 6 Apr 2017 12:54:10 +0200
> From: Ola Liljedahl <ola.liljed...@linaro.org>
> To: Brian Brooks <brian.bro...@arm.com>
> Cc: Jerin Jacob <jerin.ja...@caviumnetworks.com>,
>  "lng-odp@lists.linaro.org" <lng-odp@lists.linaro.org>
> Subject: Re: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software
>  scheduler
> 
> On 5 April 2017 at 18:50, Brian Brooks <brian.bro...@arm.com> wrote:
> > On 04/05 21:27:37, Jerin Jacob wrote:
> >> -Original Message-
> >> > Date: Tue, 4 Apr 2017 13:47:52 -0500
> >> > From: Brian Brooks <brian.bro...@arm.com>
> >> > To: lng-odp@lists.linaro.org
> >> > Subject: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software 
> >> > scheduler
> >> > X-Mailer: git-send-email 2.12.2
> >> >
> >> > This work derives from Ola Liljedahl's prototype [1] which introduced a
> >> > scalable scheduler design based on primarily lock-free algorithms and
> >> > data structures designed to decrease contention. A thread searches
> >> > through a data structure containing only queues that are both non-empty
> >> > and allowed to be scheduled to that thread. Strict priority scheduling is
> >> > respected, and (W)RR scheduling may be used within queues of the same 
> >> > priority.
> >> > Lastly, pre-scheduling or stashing is not employed since it is optional
> >> > functionality that can be implemented in the application.
> >> >
> >> > In addition to scalable ring buffers, the algorithm also uses unbounded
> >> > concurrent queues. LL/SC and CAS variants exist in cases where absence of
> >> > ABA problem cannot be proved, and also in cases where the compiler's 
> >> > atomic
> >> > built-ins may not be lowered to the desired instruction(s). Finally, a 
> >> > version
> >> > of the algorithm that uses locks is also provided.
> >> >
> >> > See platform/linux-generic/include/odp_config_internal.h for further 
> >> > build
> >> > time configuration.
> >> >
> >> > Use --enable-schedule-scalable to conditionally compile this scheduler
> >> > into the library.
> >>
> >> This is interesting stuff.
> >>
> >> Do you have any performance/latency numbers in comparison to existing 
> >> scheduler
> >> for completing say two stage(ORDERED->ATOMIC) or N stage pipeline on any 
> >> platform?
> It is still a SW implementation, there is overhead associated with queue
> enqueue/dequeue and the scheduling itself.
> So for an N-stage pipeline, overhead will accumulate.
> If only a subset of threads are associated with each stage (this could
> be beneficial for I-cache hit rate), there will be less need for
> scalability.
> What is the recommended strategy here for OCTEON/ThunderX?

From the viewpoint of portable event driven applications (which work on both
embedded and server capable chips), the SW scheduler is an important piece.

> All threads/cores share all work?

That is the recommended one in HW as it supports that natively. But HW provides
means to partition the workload based on ODP schedule groups.
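
A minimal sketch of such partitioning with the ODP schedule group API
(illustration only; the helper name and thread numbering are made up):

#include <odp_api.h>

/* Create a schedule group containing 'num_thr' worker threads starting at
 * thread id 'first_thr'; only these threads receive events from queues
 * bound to the group. */
static odp_schedule_group_t make_stage_group(const char *name,
                                             int first_thr, int num_thr)
{
    odp_thrmask_t mask;

    odp_thrmask_zero(&mask);
    for (int i = 0; i < num_thr; i++)
        odp_thrmask_set(&mask, first_thr + i);

    return odp_schedule_group_create(name, &mask);
}

/* A queue is then bound to the group via its scheduling parameters:
 *   odp_queue_param_t qp;
 *   odp_queue_param_init(&qp);
 *   qp.type = ODP_QUEUE_TYPE_SCHED;
 *   qp.sched.group = make_stage_group("stage1_threads", 0, 4);
 */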


> 
> >
> > To give an idea, the avg latency reported by odp_sched_latency is down to 
> > half
> > that of other schedulers (pre-scheduling/stashing disabled) on 4c A53, 16c 
> > A57,
> > and 12c broadwell. We are still preparing numbers, and I think it's worth 
> > mentioning
> > that they are subject to change as this patch series changes over time.
> >
> > I am not aware of an existing benchmark that involves switching between 
> > different
> > queue types. Perhaps this is happening in an example app?
> This could be useful in e.g. IPsec termination. Use an atomic stage
> for the replay protection check and update. Now ODP has ordered locks
> for that so the "atomic" (exclusive) section can be achieved from an
> ordered processing stage. Perhaps Jerin knows some other application
> that utilises two-stage ORDERED->ATOMIC processing.

We see ORDERED->ATOMIC as the main use case for basic packet forwarding.
Stage 1 (ORDERED) to process on N cores and Stage 2 (ATOMIC) to maintain
the ingress order.
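
A minimal sketch of setting up that kind of two-stage pipeline with ODP queue
types (illustration only; the helper below is made up):

#include <odp_api.h>

/* Create one scheduled queue with the given synchronization type. */
static odp_queue_t create_stage_queue(const char *name, odp_schedule_sync_t sync)
{
    odp_queue_param_t qp;

    odp_queue_param_init(&qp);
    qp.type = ODP_QUEUE_TYPE_SCHED;
    qp.sched.sync  = sync;
    qp.sched.prio  = ODP_SCHED_PRIO_DEFAULT;
    qp.sched.group = ODP_SCHED_GROUP_ALL;

    return odp_queue_create(name, &qp);
}

/* Stage 1 processes packets in parallel on N cores, stage 2 serializes per
 * queue so the ingress order is preserved at egress:
 *   odp_queue_t stage1 = create_stage_queue("stage1", ODP_SCHED_SYNC_ORDERED);
 *   odp_queue_t stage2 = create_stage_queue("stage2", ODP_SCHED_SYNC_ATOMIC);
 */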


> 
> >
> >> When we say scalable scheduler, what application/means is used to quantify
> >> scalability?
> It starts with the design, use non-blocking data structures and try to
> distribute data to threads so that they do not access shared data very
> often. Some of this is a little detrimental to single-threaded
> performanc

Re: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software scheduler

2017-04-06 Thread Ola Liljedahl
On 5 April 2017 at 18:50, Brian Brooks  wrote:
> On 04/05 21:27:37, Jerin Jacob wrote:
>> -Original Message-
>> > Date: Tue, 4 Apr 2017 13:47:52 -0500
>> > From: Brian Brooks 
>> > To: lng-odp@lists.linaro.org
>> > Subject: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software scheduler
>> > X-Mailer: git-send-email 2.12.2
>> >
>> > This work derives from Ola Liljedahl's prototype [1] which introduced a
>> > scalable scheduler design based on primarily lock-free algorithms and
>> > data structures designed to decrease contention. A thread searches
>> > through a data structure containing only queues that are both non-empty
>> > and allowed to be scheduled to that thread. Strict priority scheduling is
>> > respected, and (W)RR scheduling may be used within queues of the same 
>> > priority.
>> > Lastly, pre-scheduling or stashing is not employed since it is optional
>> > functionality that can be implemented in the application.
>> >
>> > In addition to scalable ring buffers, the algorithm also uses unbounded
>> > concurrent queues. LL/SC and CAS variants exist in cases where absence of
>> > ABA problem cannot be proved, and also in cases where the compiler's atomic
>> > built-ins may not be lowered to the desired instruction(s). Finally, a 
>> > version
>> > of the algorithm that uses locks is also provided.
>> >
>> > See platform/linux-generic/include/odp_config_internal.h for further build
>> > time configuration.
>> >
>> > Use --enable-schedule-scalable to conditionally compile this scheduler
>> > into the library.
>>
>> This is interesting stuff.
>>
>> Do you have any performance/latency numbers in comparison to existing 
>> scheduler
>> for completing say two stage(ORDERED->ATOMIC) or N stage pipeline on any 
>> platform?
It is still a SW implementation, there is overhead associated with queue
enqueue/dequeue and the scheduling itself.
So for an N-stage pipeline, overhead will accumulate.
If only a subset of threads are associated with each stage (this could
be beneficial for I-cache hit rate), there will be less need for
scalability.
What is the recommended strategy here for OCTEON/ThunderX? All
threads/cores share all work?

>
> To give an idea, the avg latency reported by odp_sched_latency is down to half
> that of other schedulers (pre-scheduling/stashing disabled) on 4c A53, 16c 
> A57,
> and 12c broadwell. We are still preparing numbers, and I think it's worth 
> mentioning
> that they are subject to change as this patch series changes over time.
>
> I am not aware of an existing benchmark that involves switching between 
> different
> queue types. Perhaps this is happening in an example app?
This could be useful in e.g. IPsec termination. Use an atomic stage
for the replay protection check and update. Now ODP has ordered locks
for that so the "atomic" (exclusive) section can be achieved from an
ordered processing stage. Perhaps Jerin knows some other application
that utilises two-stage ORDERED->ATOMIC processing.

>
>> When we say scalable scheduler, what application/means is used to quantify
>> scalability?
It starts with the design: use non-blocking data structures and try to
distribute data to threads so that they do not access shared data very
often. Some of this is a little detrimental to single-threaded
performance since you need to use more atomic operations. It seems to work
well on ARM (A53, A57) though; the penalty is higher on x86 (x86 is
very good with spin locks, cmpxchg seems to have more overhead
compared to ldxr/stxr on ARM which can have less memory ordering
constraints). We actually use different synchronisation strategies on
ARM and on x86 (compile time configuration).
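
A minimal sketch of that kind of compile-time selection (illustration only,
not the code in the patch set): a lock-free __atomic update on AArch64, where
LL/SC makes the retry loop cheap, and a plain spin lock elsewhere.

#include <stdint.h>

typedef struct {
    uint64_t counter;
#if !defined(__aarch64__)
    unsigned char lock;   /* x86 build: spin lock protecting the counter */
#endif
} shared_counter_t;

static inline uint64_t counter_fetch_inc(shared_counter_t *c)
{
#if defined(__aarch64__)
    /* ARM build: lock-free update, compiled down to LL/SC (or LSE). */
    return __atomic_fetch_add(&c->counter, 1, __ATOMIC_RELAXED);
#else
    /* x86 build: cheap locked RMW via test-and-set, then a plain update. */
    while (__atomic_test_and_set(&c->lock, __ATOMIC_ACQUIRE))
        ;                               /* spin until the lock is free */
    uint64_t old = c->counter++;
    __atomic_clear(&c->lock, __ATOMIC_RELEASE);
    return old;
#endif
}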

You can read more here:
https://docs.google.com/presentation/d/1BqAdni4aP4aHOqO6fNO39-0MN9zOntI-2ZnVTUXBNSQ
I also did an internal presentation on the scheduler prototype back at
Las Vegas; that presentation might also be somewhere on the Linaro web
site.


>>
>> Do you have any numbers in comparison to the existing scheduler to show the
>> magnitude of the scalability on any platform?


Re: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software scheduler

2017-04-05 Thread Brian Brooks
On 04/05 21:27:37, Jerin Jacob wrote:
> -Original Message-
> > Date: Tue, 4 Apr 2017 13:47:52 -0500
> > From: Brian Brooks 
> > To: lng-odp@lists.linaro.org
> > Subject: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software scheduler
> > X-Mailer: git-send-email 2.12.2
> > 
> > This work derives from Ola Liljedahl's prototype [1] which introduced a
> > scalable scheduler design based on primarily lock-free algorithms and
> > data structures designed to decrease contention. A thread searches
> > through a data structure containing only queues that are both non-empty
> > and allowed to be scheduled to that thread. Strict priority scheduling is
> > respected, and (W)RR scheduling may be used within queues of the same 
> > priority.
> > Lastly, pre-scheduling or stashing is not employed since it is optional
> > functionality that can be implemented in the application.
> > 
> > In addition to scalable ring buffers, the algorithm also uses unbounded
> > concurrent queues. LL/SC and CAS variants exist in cases where absence of
> > ABA problem cannot be proved, and also in cases where the compiler's atomic
> > built-ins may not be lowered to the desired instruction(s). Finally, a 
> > version
> > of the algorithm that uses locks is also provided.
> > 
> > See platform/linux-generic/include/odp_config_internal.h for further build
> > time configuration.
> > 
> > Use --enable-schedule-scalable to conditionally compile this scheduler
> > into the library.
> 
> This is interesting stuff.
> 
> Do you have any performance/latency numbers in comparison to the existing scheduler
> for completing say two stage(ORDERED->ATOMIC) or N stage pipeline on any 
> platform?

To give an idea, the avg latency reported by odp_sched_latency is down to half
that of other schedulers (pre-scheduling/stashing disabled) on 4c A53, 16c A57,
and 12c Broadwell. We are still preparing numbers, and I think it's worth 
mentioning
that they are subject to change as this patch series changes over time.

I am not aware of an existing benchmark that involves switching between 
different
queue types. Perhaps this is happening in an example app?

> When we say scalable scheduler, what application/means is used to quantify
> scalability?
> 
> Do you have any numbers in comparison to the existing scheduler to show the
> magnitude of the scalability on any platform?


Re: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software scheduler

2017-04-05 Thread Jerin Jacob
-Original Message-
> Date: Tue, 4 Apr 2017 13:47:52 -0500
> From: Brian Brooks 
> To: lng-odp@lists.linaro.org
> Subject: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software scheduler
> X-Mailer: git-send-email 2.12.2
> 
> This work derives from Ola Liljedahl's prototype [1] which introduced a
> scalable scheduler design based on primarily lock-free algorithms and
> data structures designed to decrease contention. A thread searches
> through a data structure containing only queues that are both non-empty
> and allowed to be scheduled to that thread. Strict priority scheduling is
> respected, and (W)RR scheduling may be used within queues of the same 
> priority.
> Lastly, pre-scheduling or stashing is not employed since it is optional
> functionality that can be implemented in the application.
> 
> In addition to scalable ring buffers, the algorithm also uses unbounded
> concurrent queues. LL/SC and CAS variants exist in cases where absence of
> ABA problem cannot be proved, and also in cases where the compiler's atomic
> built-ins may not be lowered to the desired instruction(s). Finally, a version
> of the algorithm that uses locks is also provided.
> 
> See platform/linux-generic/include/odp_config_internal.h for further build
> time configuration.
> 
> Use --enable-schedule-scalable to conditionally compile this scheduler
> into the library.

This is interesting stuff.

Do you have any performance/latency numbers in comparison to the existing scheduler
for completing, say, a two-stage (ORDERED->ATOMIC) or N-stage pipeline on any
platform?

When we say scalable scheduler, what application/means is used to quantify
scalability?
Do you have any numbers in comparison to the existing scheduler to show the
magnitude of the scalability on any platform?



Re: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software scheduler

2017-04-05 Thread Ola Liljedahl
_fdserver.c:
#define FDSERVER_MAX_ENTRIES 256

   /* store the file descriptor in table: */
if (fd_table_nb_entries < FDSERVER_MAX_ENTRIES) {

} else {
ODP_ERR("FD table full\n");

Weird that you get this but we don't.

It is probably related to the scalable scheduler requesting a shm
object per queue. This is a left-over from the prototype; perhaps it
needs to be fixed to allocate one shm for all queues. We may still need a
shm per ring buffer though...

Perhaps we need to increase/remove this arbitrary limit on 256 FD entries.

On 5 April 2017 at 14:05, Bill Fischofer  wrote:
> Environment is Ubuntu 16.10.
>
> On Wed, Apr 5, 2017 at 7:03 AM, Bill Fischofer
>  wrote:
>> This is running on my desktop x86:
>>
>> ./bootstrap
>> ./configure --enable-schedule-scalable --enable-cunit-support
>> make
>> cd test/common_plat/validation/api/scheduler
>> ./scheduler_main
>>
>> On Tue, Apr 4, 2017 at 10:24 PM, Honnappa Nagarahalli
>>  wrote:
>>> On 4 April 2017 at 16:12, Bill Fischofer  wrote:
 When I configure and compile this without --enable-schedule-scalable the
 scheduler validation test runs normally, however if I enable the new
 scheduler I get this output:


 ...
  CUnit - A unit testing framework for C - Version 2.1-3
  http://cunit.sourceforge.net/

 _fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:463:handle_request():FD table full

 ...lots more lines like this

 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure

 Suite: Scheduler
   Test: scheduler_test_wait_time
 ..._fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 1..2..3..4..5.._fdserver.c:342:_odp_fdserver_deregister_fd():fd
 de-registration failure
 passed
   Test: scheduler_test_num_prio ...passed
   Test: scheduler_test_queue_destroy
 ..._fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
 _fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
 _fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
 _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
 _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
 _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
 passed
   Test: scheduler_test_groups ..._fdserver.c:463:handle_request():FD table 
 full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
 _fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd 

Re: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software scheduler

2017-04-05 Thread Bill Fischofer
Environment is Ubuntu 16.10.

On Wed, Apr 5, 2017 at 7:03 AM, Bill Fischofer
 wrote:
> This is running on my desktop x86:
>
> ./bootstrap
> ./configure --enable-schedule-scalable --enable-cunit-support
> make
> cd test/common_plat/validation/api/scheduler
> ./scheduler_main
>
> On Tue, Apr 4, 2017 at 10:24 PM, Honnappa Nagarahalli
>  wrote:
>> On 4 April 2017 at 16:12, Bill Fischofer  wrote:
>>> When I configure and compile this without --enable-schedule-scalable the
>>> scheduler validation test runs normally, however if I enable the new
>>> scheduler I get this output:
>>>
>>>
>>> ...
>>>  CUnit - A unit testing framework for C - Version 2.1-3
>>>  http://cunit.sourceforge.net/
>>>
>>> _fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:463:handle_request():FD table full
>>>
>>> ...lots more lines like this
>>>
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>>
>>> Suite: Scheduler
>>>   Test: scheduler_test_wait_time
>>> ..._fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> 1..2..3..4..5.._fdserver.c:342:_odp_fdserver_deregister_fd():fd
>>> de-registration failure
>>> passed
>>>   Test: scheduler_test_num_prio ...passed
>>>   Test: scheduler_test_queue_destroy
>>> ..._fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>>> _fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>>> _fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>>> passed
>>>   Test: scheduler_test_groups ..._fdserver.c:463:handle_request():FD table 
>>> full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>>> _fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>>>
>>> These messages repeat throughout the test even though it "passes".
>>> Clearly something isn't right.
>>
>> We have done a considerable amount of testing on x86 as well as ARM with
>> different schedulers.
>> Can you provide more details?
>> What is the config command you used?
>> What platform (x86 vs ARM)?
>> I assume you are running 'make check'.
>>
>>>
>>> On Tue, Apr 4, 2017 at 1:47 PM, Brian Brooks  wrote:
>>>> This work derives from Ola Liljedahl's prototype [1], which introduced a
>>>> scalable scheduler design based on primarily lock-free 

Re: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software scheduler

2017-04-05 Thread Bill Fischofer
This is running on my desktop x86:

./bootstrap
./configure --enable-schedule-scalable --enable-cunit-support
make
cd test/common_plat/validation/api/scheduler
./scheduler_main

On Tue, Apr 4, 2017 at 10:24 PM, Honnappa Nagarahalli
 wrote:
> On 4 April 2017 at 16:12, Bill Fischofer  wrote:
>> When I configure and build this without --enable-schedule-scalable, the
>> scheduler validation test runs normally; however, if I enable the new
>> scheduler I get this output:
>>
>>
>> ...
>>  CUnit - A unit testing framework for C - Version 2.1-3
>>  http://cunit.sourceforge.net/
>>
>> _fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:463:handle_request():FD table full
>>
>> ...lots more lines like this
>>
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>
>> Suite: Scheduler
>>   Test: scheduler_test_wait_time
>> ..._fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> 1..2..3..4..5.._fdserver.c:342:_odp_fdserver_deregister_fd():fd
>> de-registration failure
>> passed
>>   Test: scheduler_test_num_prio ...passed
>>   Test: scheduler_test_queue_destroy
>> ..._fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>> _fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>> _fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>> passed
>>   Test: scheduler_test_groups ..._fdserver.c:463:handle_request():FD table 
>> full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>> _fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>>
>> These messages repeat throughout the test even though it "passes".
>> Clearly something isn't right.
>
> We have done a considerable amount of testing on x86 as well as ARM with
> different schedulers.
> Can you provide more details?
> What is the config command you used?
> What platform (x86 vs ARM)?
> I assume you are running 'make check'.
>
>>
>> On Tue, Apr 4, 2017 at 1:47 PM, Brian Brooks  wrote:
>>> This work derives from Ola Liljedahl's prototype [1], which introduced a
>>> scalable scheduler design based on primarily lock-free algorithms and
>>> data structures designed to decrease contention. A thread searches
>>> through a data structure containing only queues that are both non-empty
>>> and allowed to be scheduled to that thread. Strict 

Re: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software scheduler

2017-04-04 Thread Honnappa Nagarahalli
On 4 April 2017 at 16:12, Bill Fischofer  wrote:
> When I configure and build this without --enable-schedule-scalable, the
> scheduler validation test runs normally; however, if I enable the new
> scheduler I get this output:
>
>
> ...
>  CUnit - A unit testing framework for C - Version 2.1-3
>  http://cunit.sourceforge.net/
>
> _fdserver.c:463:handle_request():FD table full
> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
> _fdserver.c:463:handle_request():FD table full
> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
> _fdserver.c:463:handle_request():FD table full
> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
> _fdserver.c:463:handle_request():FD table full
> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
> _fdserver.c:463:handle_request():FD table full
> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
> _fdserver.c:463:handle_request():FD table full
> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
> _fdserver.c:463:handle_request():FD table full
>
> ...lots more lines like this
>
> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>
> Suite: Scheduler
>   Test: scheduler_test_wait_time
> ..._fdserver.c:463:handle_request():FD table full
> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
> 1..2..3..4..5.._fdserver.c:342:_odp_fdserver_deregister_fd():fd
> de-registration failure
> passed
>   Test: scheduler_test_num_prio ...passed
>   Test: scheduler_test_queue_destroy
> ..._fdserver.c:463:handle_request():FD table full
> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
> _fdserver.c:463:handle_request():FD table full
> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
> _fdserver.c:463:handle_request():FD table full
> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
> _fdserver.c:463:handle_request():FD table full
> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
> _fdserver.c:463:handle_request():FD table full
> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
> _fdserver.c:463:handle_request():FD table full
> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
> passed
>   Test: scheduler_test_groups ..._fdserver.c:463:handle_request():FD table 
> full
> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
> _fdserver.c:463:handle_request():FD table full
> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
> _fdserver.c:463:handle_request():FD table full
> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
> _fdserver.c:463:handle_request():FD table full
> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
> _fdserver.c:463:handle_request():FD table full
> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
> _fdserver.c:463:handle_request():FD table full
> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
> _fdserver.c:463:handle_request():FD table full
> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>
> These messages repeat throughout the test even though it "passes".
> Clearly something isn't right.

We have done a considerable amount of testing on x86 as well as ARM with
different schedulers.
Can you provide more details?
What is the config command you used?
What platform (x86 vs ARM)?
I assume you are running 'make check'.

>
> On Tue, Apr 4, 2017 at 1:47 PM, Brian Brooks  wrote:
>> This work derives from Ola Liljedahl's prototype [1], which introduced a
>> scalable scheduler design based on primarily lock-free algorithms and
>> data structures designed to decrease contention. A thread searches
>> through a data structure containing only queues that are both non-empty
>> and allowed to be scheduled to that thread. Strict priority scheduling is
>> respected, and (W)RR scheduling may be used within queues of the same 
>> priority.
>> Lastly, pre-scheduling or stashing is not employed since it is optional
>> functionality that can be implemented in the application.
>>
>> In addition to scalable ring buffers, the algorithm also uses unbounded
>> concurrent queues. LL/SC and CAS variants exist in 

Re: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software scheduler

2017-04-04 Thread Bill Fischofer
When I configure and build this without --enable-schedule-scalable, the
scheduler validation test runs normally; however, if I enable the new
scheduler I get this output:


...
 CUnit - A unit testing framework for C - Version 2.1-3
 http://cunit.sourceforge.net/

_fdserver.c:463:handle_request():FD table full
_fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
_fdserver.c:463:handle_request():FD table full
_fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
_fdserver.c:463:handle_request():FD table full
_fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
_fdserver.c:463:handle_request():FD table full
_fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
_fdserver.c:463:handle_request():FD table full
_fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
_fdserver.c:463:handle_request():FD table full
_fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
_fdserver.c:463:handle_request():FD table full

...lots more lines like this

_fdserver.c:297:_odp_fdserver_register_fd():fd registration failure

Suite: Scheduler
  Test: scheduler_test_wait_time
..._fdserver.c:463:handle_request():FD table full
_fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
1..2..3..4..5.._fdserver.c:342:_odp_fdserver_deregister_fd():fd
de-registration failure
passed
  Test: scheduler_test_num_prio ...passed
  Test: scheduler_test_queue_destroy
..._fdserver.c:463:handle_request():FD table full
_fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
_fdserver.c:463:handle_request():FD table full
_fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
_fdserver.c:463:handle_request():FD table full
_fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
_fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
_fdserver.c:463:handle_request():FD table full
_fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
_fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
_fdserver.c:463:handle_request():FD table full
_fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
_fdserver.c:463:handle_request():FD table full
_fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
_fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
_fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
_fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
_fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
passed
  Test: scheduler_test_groups ..._fdserver.c:463:handle_request():FD table full
_fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
_fdserver.c:463:handle_request():FD table full
_fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
_fdserver.c:463:handle_request():FD table full
_fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
_fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
_fdserver.c:463:handle_request():FD table full
_fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
_fdserver.c:463:handle_request():FD table full
_fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
_fdserver.c:463:handle_request():FD table full
_fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
_fdserver.c:463:handle_request():FD table full
_fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
_fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
_fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure

These messages repeat throughout the test even though it "passes".
Clearly something isn't right.
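
For context, the repeating "FD table full" / "fd registration failure" pattern above is
what a fixed-size registration table produces once it overflows: every registration past
the capacity fails, and the later deregistrations of those never-registered fds fail too.
A minimal, self-contained sketch of that behaviour follows; the table size, names and
layout are invented for illustration and are not the actual _fdserver.c code.

    #include <stdio.h>

    /* Hypothetical fixed-size registration table. */
    #define FD_TABLE_SIZE 256

    static int fd_table[FD_TABLE_SIZE];
    static int fd_count;

    /* Register one fd; once the table is full, this and every later
     * attempt fail, which is what floods the log. */
    static int table_register(int fd)
    {
        if (fd_count >= FD_TABLE_SIZE) {
            fprintf(stderr, "FD table full\n");
            return -1;  /* the caller would then log "fd registration failure" */
        }
        fd_table[fd_count++] = fd;
        return 0;
    }

    int main(void)
    {
        /* Registering a few more fds than the table holds reproduces the
         * repeating error pattern seen in the log above. */
        for (int fd = 0; fd < FD_TABLE_SIZE + 3; fd++)
            table_register(fd);
        return 0;
    }

If the scalable scheduler registers noticeably more fds (for example, one per queue or
per shared-memory block) than the default scheduler does, exhausting such a table would
explain why the messages appear only with --enable-schedule-scalable; that is an
assumption, not something the log itself proves.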

On Tue, Apr 4, 2017 at 1:47 PM, Brian Brooks  wrote:
> This work derives from Ola Liljedahl's prototype [1], which introduced a
> scalable scheduler design based on primarily lock-free algorithms and
> data structures designed to decrease contention. A thread searches
> through a data structure containing only queues that are both non-empty
> and allowed to be scheduled to that thread. Strict priority scheduling is
> respected, and (W)RR scheduling may be used within queues of the same 
> priority.
> Lastly, pre-scheduling or stashing is not employed since it is optional
> functionality that can be implemented in the application.
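
The search described in the paragraph above can be pictured as a strict-priority sweep
over the queues that are eligible for this thread, with weighted round-robin only among
queues of the same priority. A simplified sketch follows; all names, types and the
per-priority circular list are invented for illustration and are not the patch's code.

    #include <stddef.h>

    #define NUM_PRIO 8

    typedef struct sched_queue {
        struct sched_queue *next;  /* circular list of peers at this priority */
        int weight;                /* WRR weight within the priority level    */
        int credit;                /* remaining credit in the current round   */
    } sched_queue_t;

    /* Per-thread view: a queue is linked into prio_list[] only while it is
     * both non-empty and allowed to be scheduled to this thread, so the
     * search never has to skip ineligible queues. */
    typedef struct {
        sched_queue_t *prio_list[NUM_PRIO];
    } sched_thread_ctx_t;

    /* Pick the next queue to service, or NULL if nothing is eligible.
     * Lower-numbered priorities always win (strict priority); queues within
     * one level take turns in proportion to their weights ((W)RR). */
    static sched_queue_t *schedule_next(sched_thread_ctx_t *ts)
    {
        for (int prio = 0; prio < NUM_PRIO; prio++) {
            sched_queue_t *q = ts->prio_list[prio];

            if (q == NULL)
                continue;               /* nothing runnable at this level */

            if (q->credit == 0) {       /* this queue's WRR round is done */
                q->credit = q->weight;  /* refill it and move to its peer */
                q = q->next;
                ts->prio_list[prio] = q;
            }
            q->credit--;
            return q;
        }
        return NULL;                    /* no eligible work for this thread */
    }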
>
> In addition to scalable ring buffers, the algorithm also uses unbounded
> concurrent queues. LL/SC and CAS variants exist in cases where absence of
> the ABA problem cannot be proved, and also in cases where the compiler's atomic
> built-ins may not be lowered to the desired instruction(s). Finally, a version
> of the algorithm that uses locks is also provided.
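
One common way to get the effect alluded to above (a CAS variant that stays safe from
ABA when LL/SC is unavailable) is to pack a generation tag next to the index and update
both with a single wide compare-and-swap. A minimal C11 sketch follows; the names,
sizes and layout are invented for illustration and are not the patch's code.

    #include <stdatomic.h>
    #include <stdint.h>

    #define NUM_SLOTS 1024u  /* illustrative pool size */

    /* Free-list head packed as {tag, index} in one 64-bit word so a single
     * CAS updates both.  The tag is bumped on every push, so a slot index
     * that has been popped and pushed again no longer compares equal to the
     * value a competing thread read earlier, which is what defeats ABA. */
    static _Atomic uint64_t freelist_head;
    static uint32_t next_slot[NUM_SLOTS];

    #define PACK(idx, tag) (((uint64_t)(tag) << 32) | (uint32_t)(idx))
    #define IDX(v)         ((uint32_t)(v))
    #define TAG(v)         ((uint32_t)((v) >> 32))

    static void slot_push(uint32_t idx)
    {
        uint64_t old = atomic_load_explicit(&freelist_head,
                                            memory_order_relaxed);
        uint64_t new_head;

        do {
            next_slot[idx] = IDX(old);           /* link on top of current head */
            new_head = PACK(idx, TAG(old) + 1);  /* new index, bumped generation */
        } while (!atomic_compare_exchange_weak_explicit(&freelist_head, &old,
                                                        new_head,
                                                        memory_order_release,
                                                        memory_order_relaxed));
    }

The matching pop would CAS the head from {idx, tag} to {next_slot[idx], tag + 1}; a
thread holding a stale head value then fails the CAS even if the same index has since
been recycled, because the tag no longer matches.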
>
> See platform/linux-generic/include/odp_config_internal.h for further build
> time configuration.
>
> Use --enable-schedule-scalable to conditionally compile this scheduler
> into the library.
>
> [1]