Re: [lng-odp] [PATCH v2 1/2] test: perf: add new scheduling latency test

2016-09-22 Thread Maxim Uvarov

Merged,
Maxim.

On 09/22/16 01:11, Brian Brooks wrote:

For series:

Reviewed-and-tested-by: Brian Brooks 

On 09/14 11:53:06, Matias Elo wrote:

Add new scheduling latency benchmark application. The application
measures delays (avg, min, max) for high and low priority events.

The test has a configurable number of TRAFFIC events and a few SAMPLE events
(one common or one per priority). The scheduling latency is measured only
from the SAMPLE events to minimize measurement overhead.
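
For illustration, a minimal sketch of how a sample event can carry a timestamp
and how the consuming thread can compute the latency with the ODP time API.
The struct and helper names below are illustrative only, not the ones used in
odp_sched_latency.c.

#include <odp_api.h>

/* Illustrative timestamp payload carried in a sample buffer */
typedef struct {
	odp_time_t enq_time;
} sample_ts_t;

/* Producer: stamp the sample event and enqueue it to a scheduled queue */
static int send_sample(odp_queue_t queue, odp_buffer_t buf)
{
	sample_ts_t *ts = odp_buffer_addr(buf);

	ts->enq_time = odp_time_local();
	return odp_queue_enq(queue, odp_buffer_to_event(buf));
}

/* Consumer: a scheduled worker computes the latency in nanoseconds */
static uint64_t sample_latency_ns(odp_event_t ev)
{
	odp_buffer_t buf = odp_buffer_from_event(ev);
	sample_ts_t *ts = odp_buffer_addr(buf);
	odp_time_t diff = odp_time_diff(odp_time_local(), ts->enq_time);

	return odp_time_to_ns(diff);
}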

The application's command line arguments enable configuring:
- Number of processing threads
- Number of high/low priority queues
- Number of high/low priority events
- Use separate SAMPLE events for each priority
- Scheduled queue type (PARALLEL, ATOMIC, ORDERED)

Signed-off-by: Matias Elo 
---

V2:
- Remove unnecessary 'num_workers' initialization (Maxim)

  test/common_plat/performance/.gitignore  |   1 +
  test/common_plat/performance/Makefile.am |   4 +
  test/common_plat/performance/odp_sched_latency.c | 767 +++
  3 files changed, 772 insertions(+)
  create mode 100644 test/common_plat/performance/odp_sched_latency.c

diff --git a/test/common_plat/performance/.gitignore b/test/common_plat/performance/.gitignore
index edcc832..1527d25 100644
--- a/test/common_plat/performance/.gitignore
+++ b/test/common_plat/performance/.gitignore
@@ -4,4 +4,5 @@ odp_atomic
  odp_crypto
  odp_l2fwd
  odp_pktio_perf
+odp_sched_latency
  odp_scheduling
diff --git a/test/common_plat/performance/Makefile.am b/test/common_plat/performance/Makefile.am
index d23bb3e..f5dd8dd 100644
--- a/test/common_plat/performance/Makefile.am
+++ b/test/common_plat/performance/Makefile.am
@@ -5,6 +5,7 @@ TESTS_ENVIRONMENT += TEST_DIR=${builddir}
  EXECUTABLES = odp_crypto$(EXEEXT) odp_pktio_perf$(EXEEXT)
  
  COMPILE_ONLY = odp_l2fwd$(EXEEXT) \

+  odp_sched_latency$(EXEEXT) \
   odp_scheduling$(EXEEXT)
  
  TESTSCRIPTS = odp_l2fwd_run.sh \

@@ -20,6 +21,8 @@ bin_PROGRAMS = $(EXECUTABLES) $(COMPILE_ONLY)
  
  odp_crypto_LDFLAGS = $(AM_LDFLAGS) -static

  odp_crypto_CFLAGS = $(AM_CFLAGS) -I${top_srcdir}/test
+odp_sched_latency_LDFLAGS = $(AM_LDFLAGS) -static
+odp_sched_latency_CFLAGS = $(AM_CFLAGS) -I${top_srcdir}/test
  odp_scheduling_LDFLAGS = $(AM_LDFLAGS) -static
  odp_scheduling_CFLAGS = $(AM_CFLAGS) -I${top_srcdir}/test
  
@@ -27,6 +30,7 @@ noinst_HEADERS = \

  $(top_srcdir)/test/test_debug.h
  
  dist_odp_crypto_SOURCES = odp_crypto.c

+dist_odp_sched_latency_SOURCES = odp_sched_latency.c
  dist_odp_scheduling_SOURCES = odp_scheduling.c
  dist_odp_pktio_perf_SOURCES = odp_pktio_perf.c
  
diff --git a/test/common_plat/performance/odp_sched_latency.c b/test/common_plat/performance/odp_sched_latency.c

new file mode 100644
index 000..063fb21
--- /dev/null
+++ b/test/common_plat/performance/odp_sched_latency.c
@@ -0,0 +1,767 @@
+/* Copyright (c) 2016, Linaro Limited
+ * All rights reserved.
+ *
+ * SPDX-License-Identifier: BSD-3-Clause
+ */
+
+/**
+ * @file
+ *
+ * @example odp_sched_latency.c  ODP scheduling latency benchmark application
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+
+/* ODP main header */
+#include 
+
+/* ODP helper for Linux apps */
+#include 
+
+/* GNU lib C */
+#include 
+
+#define MAX_WORKERS  64    /**< Maximum number of worker threads */
+#define MAX_QUEUES   4096  /**< Maximum number of queues */
+#define EVENT_POOL_SIZE  (1024 * 1024) /**< Event pool size */
+#define TEST_ROUNDS (4 * 1024 * 1024)  /**< Test rounds for each thread */
+#define MAIN_THREAD   1 /**< Thread ID performing maintenance tasks */
+
+/* Default values for command line arguments */
+#define SAMPLE_EVENT_PER_PRIO  0 /**< Allocate a separate sample event for
+                                      each priority */
+#define HI_PRIO_EVENTS   0 /**< Number of high priority events */
+#define LO_PRIO_EVENTS  32 /**< Number of low priority events */
+#define HI_PRIO_QUEUES  16 /**< Number of high priority queues */
+#define LO_PRIO_QUEUES  64 /**< Number of low priority queues */
+
+#define EVENTS_PER_HI_PRIO_QUEUE 0  /**< Alloc HI_PRIO_QUEUES x HI_PRIO_EVENTS
+                                         events */
+#define EVENTS_PER_LO_PRIO_QUEUE 1  /**< Alloc LO_PRIO_QUEUES x LO_PRIO_EVENTS
+                                         events */
+ODP_STATIC_ASSERT(HI_PRIO_QUEUES <= MAX_QUEUES, "Too many HI priority queues");
+ODP_STATIC_ASSERT(LO_PRIO_QUEUES <= MAX_QUEUES, "Too many LO priority queues");
+
+#define CACHE_ALIGN_ROUNDUP(x)\
+   ((ODP_CACHE_LINE_SIZE) * \
+(((x) + ODP_CACHE_LINE_SIZE - 1) / (ODP_CACHE_LINE_SIZE)))
+
+/* Test priorities */
+#define NUM_PRIOS 2 /**< Number of tested priorities */
+#define HI_PRIO  0
+#define LO_PRIO  1
+
+/** Test event types */
+typedef enum {
+   WARM_UP, /**< Warm up event */
+   

Re: [lng-odp] [PATCH v2 1/2] test: perf: add new scheduling latency test

2016-09-21 Thread Brian Brooks
On 09/20 08:01:49, Savolainen, Petri (Nokia - FI/Espoo) wrote:
> Hi,
> 
> First, this app is written according to the current API and we'd like to 
> start latency testing schedulers ASAP. A review of the app code itself would 
> be appreciated.

Reviewed and tested.

> Anyway, I'll answer those API-related comments below.
> 
> 
> > -Original Message-
> > From: lng-odp [mailto:lng-odp-boun...@lists.linaro.org] On Behalf Of Bill
> > Fischofer
> > Sent: Monday, September 19, 2016 11:41 PM
> > To: Brian Brooks <brian.bro...@linaro.org>
> > Cc: Elo, Matias (Nokia - FI/Espoo) <matias....@nokia-bell-labs.com>; lng-
> > o...@lists.linaro.org
> > Subject: Re: [lng-odp] [PATCH v2 1/2] test: perf: add new scheduling
> > latency test
> > 
> > On Mon, Sep 19, 2016 at 2:11 PM, Brian Brooks <brian.bro...@linaro.org>
> > wrote:
> > 
> > > On 09/19 07:55:22, Elo, Matias (Nokia - FI/Espoo) wrote:
> > > > >
> > > > > On 09/14 11:53:06, Matias Elo wrote:
> > > > > > +
> 
> 
> > >
> > > Thinking in the general sense..
> > >
> > > Should applications have to reason about _and_ code around pre-scheduled
> > > and non-scheduled events? If the event hasn't crossed the API boundary
> > to
> > > be
> > > delivered to the application according to the scheduling group policies
> > for
> > > that core, what is the difference to the application?
> > >
> > > If a scheduler implementation uses TLS to pre-schedule events it also
> > seems
> > > like it should be able to support work-stealing of those pre-scheduled
> > > events
> > > by other threads in the runtime case where odp_schedule() is not called
> > > from
> > > that thread or the thread id is removed from scheduling group masks.
> > From
> > > the application perspective these are all implementation details.
> > >
> 
Pause signals a (HW) scheduler that the application will leave the schedule loop 
soon (the app stops calling schedule() for a long time or forever). Without the 
signal, the scheduler would not see any difference between a "mid" schedule call 
and the last call. A schedule() call starts and ends a schedule context (e.g. 
atomic locking of a queue). If the application just leaves the loop, the last 
context will not be freed and e.g. an atomic queue would deadlock.

It is the scheduler providing exclusive access to the atomic queue. At any
one point in time there may only be one core processing an event from an
atomic queue. Multiple cores can participate in processing from an atomic
queue, but the scheduler will ensure exclusive access.

If the core processing an event from an atomic queue finishes its work and
asks the scheduler for more work, the atomic context is implicitly released
by the application. The scheduler may then give that core an event from a
higher-priority queue and hand the next event from the original atomic queue
to another core.

In another scenario, the core processing an event from an atomic queue
finishes the critical-section work but still needs to continue processing the
event; it may then release the atomic context explicitly. At this point, the
scheduler may dispatch the next event from the atomic queue to another core,
so events from an atomic queue could be processed in parallel. Switching the
queue to ordered instead of atomic could also be considered here.
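
A minimal sketch of that explicit-release case, assuming hypothetical
processing helpers (critical_section() and non_critical_work() are
placeholders for illustration, not ODP APIs):

#include <odp_api.h>

/* Placeholders for the two processing stages (illustration only) */
static void critical_section(odp_event_t ev) { (void)ev; }
static void non_critical_work(odp_event_t ev) { (void)ev; }

static void process_one_event(void)
{
	odp_queue_t from;
	odp_event_t ev = odp_schedule(&from, ODP_SCHED_WAIT);

	if (ev == ODP_EVENT_INVALID)
		return;

	/* Work that needs the exclusivity of the atomic context */
	critical_section(ev);

	/* Hint to the scheduler that the atomic context is no longer needed;
	 * the next event from 'from' may now be given to another core. */
	odp_schedule_release_atomic();

	/* Remaining work may run in parallel with other events from the
	 * same queue. */
	non_critical_work(ev);
	odp_event_free(ev);
}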

Do you have something in mind as to why the odp_schedule_release_xxx() APIs
are insufficient for the 'last' schedule call?

> Also, generally pre-scheduled work cannot be "stolen" since:
> 1) it would be a costly operation to unwind already-made decisions
> 2) packet order must also be maintained in this case. It's costly to reorder /
> force order for stolen events (other events may already have been processed
> on other cores before you "steal" some events).

A scheduler implementation may pre-schedule work to cores, but you're right
it could end up being costly if data is being moved like that. Ensuring
correctness could become challenging too.

> > You're making an argument I made some time back. :)  As I recall, the
> > rationale for pause/resume was to make life easier for existing code that
> > is introducing ODP on a more gradual basis. Presumably Nokia has examples
> > of such code in house.
> 
> No. See the rationale above. It's based on the functionality of existing SoC HW 
> schedulers. HW is bad at unwinding already-made decisions. The application is in 
> the best position to decide what to do with the last events before a thread 
> exits. Typically, those are processed like any other event.
> 
> > 
> > From a

Re: [lng-odp] [PATCH v2 1/2] test: perf: add new scheduling latency test

2016-09-21 Thread Brian Brooks
For series:

Reviewed-and-tested-by: Brian Brooks 

On 09/14 11:53:06, Matias Elo wrote:
> Add new scheduling latency benchmark application. The application
> measures delays (avg, min, max) for high and low priority events.
> 
> The test has a configurable number of TRAFFIC events and few SAMPLE events
> (one common or one per priority). The scheduling latency is only measured
> from the SAMPLE events to minimize measurement overhead.
> 
> The application's command line arguments enable configuring:
> - Number of processing threads
> - Number of high/low priority queues
> - Number of high/low priority events
> - Use separate SAMPLE events for each priority
> - Scheduled queue type (PARALLEL, ATOMIC, ORDERED)
> 
> Signed-off-by: Matias Elo 
> ---
> 
> V2:
> - Remove unnecessary 'num_workers' initialization (Maxim)
> 
>  test/common_plat/performance/.gitignore  |   1 +
>  test/common_plat/performance/Makefile.am |   4 +
>  test/common_plat/performance/odp_sched_latency.c | 767 
> +++
>  3 files changed, 772 insertions(+)
>  create mode 100644 test/common_plat/performance/odp_sched_latency.c
> 
> diff --git a/test/common_plat/performance/.gitignore 
> b/test/common_plat/performance/.gitignore
> index edcc832..1527d25 100644
> --- a/test/common_plat/performance/.gitignore
> +++ b/test/common_plat/performance/.gitignore
> @@ -4,4 +4,5 @@ odp_atomic
>  odp_crypto
>  odp_l2fwd
>  odp_pktio_perf
> +odp_sched_latency
>  odp_scheduling
> diff --git a/test/common_plat/performance/Makefile.am 
> b/test/common_plat/performance/Makefile.am
> index d23bb3e..f5dd8dd 100644
> --- a/test/common_plat/performance/Makefile.am
> +++ b/test/common_plat/performance/Makefile.am
> @@ -5,6 +5,7 @@ TESTS_ENVIRONMENT += TEST_DIR=${builddir}
>  EXECUTABLES = odp_crypto$(EXEEXT) odp_pktio_perf$(EXEEXT)
>  
>  COMPILE_ONLY = odp_l2fwd$(EXEEXT) \
> +odp_sched_latency$(EXEEXT) \
>  odp_scheduling$(EXEEXT)
>  
>  TESTSCRIPTS = odp_l2fwd_run.sh \
> @@ -20,6 +21,8 @@ bin_PROGRAMS = $(EXECUTABLES) $(COMPILE_ONLY)
>  
>  odp_crypto_LDFLAGS = $(AM_LDFLAGS) -static
>  odp_crypto_CFLAGS = $(AM_CFLAGS) -I${top_srcdir}/test
> +odp_sched_latency_LDFLAGS = $(AM_LDFLAGS) -static
> +odp_sched_latency_CFLAGS = $(AM_CFLAGS) -I${top_srcdir}/test
>  odp_scheduling_LDFLAGS = $(AM_LDFLAGS) -static
>  odp_scheduling_CFLAGS = $(AM_CFLAGS) -I${top_srcdir}/test
>  
> @@ -27,6 +30,7 @@ noinst_HEADERS = \
> $(top_srcdir)/test/test_debug.h
>  
>  dist_odp_crypto_SOURCES = odp_crypto.c
> +dist_odp_sched_latency_SOURCES = odp_sched_latency.c
>  dist_odp_scheduling_SOURCES = odp_scheduling.c
>  dist_odp_pktio_perf_SOURCES = odp_pktio_perf.c
>  
> diff --git a/test/common_plat/performance/odp_sched_latency.c 
> b/test/common_plat/performance/odp_sched_latency.c
> new file mode 100644
> index 000..063fb21
> --- /dev/null
> +++ b/test/common_plat/performance/odp_sched_latency.c
> @@ -0,0 +1,767 @@
> +/* Copyright (c) 2016, Linaro Limited
> + * All rights reserved.
> + *
> + * SPDX-License-Identifier: BSD-3-Clause
> + */
> +
> +/**
> + * @file
> + *
> + * @example odp_sched_latency.c  ODP scheduling latency benchmark application
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +/* ODP main header */
> +#include 
> +
> +/* ODP helper for Linux apps */
> +#include 
> +
> +/* GNU lib C */
> +#include 
> +
> +#define MAX_WORKERS  64    /**< Maximum number of worker threads */
> +#define MAX_QUEUES   4096  /**< Maximum number of queues */
> +#define EVENT_POOL_SIZE  (1024 * 1024) /**< Event pool size */
> +#define TEST_ROUNDS (4 * 1024 * 1024)  /**< Test rounds for each thread */
> +#define MAIN_THREAD   1 /**< Thread ID performing maintenance tasks */
> +
> +/* Default values for command line arguments */
> +#define SAMPLE_EVENT_PER_PRIO  0 /**< Allocate a separate sample event for
> +                                      each priority */
> +#define HI_PRIO_EVENTS  0 /**< Number of high priority events */
> +#define LO_PRIO_EVENTS 32 /**< Number of low priority events */
> +#define HI_PRIO_QUEUES 16 /**< Number of high priority queues */
> +#define LO_PRIO_QUEUES 64 /**< Number of low priority queues */
> +
> +#define EVENTS_PER_HI_PRIO_QUEUE 0  /**< Alloc HI_PRIO_QUEUES x HI_PRIO_EVENTS
> +                                         events */
> +#define EVENTS_PER_LO_PRIO_QUEUE 1  /**< Alloc LO_PRIO_QUEUES x LO_PRIO_EVENTS
> +                                         events */
> +ODP_STATIC_ASSERT(HI_PRIO_QUEUES <= MAX_QUEUES, "Too many HI priority queues");
> +ODP_STATIC_ASSERT(LO_PRIO_QUEUES <= MAX_QUEUES, "Too many LO priority queues");
> +
> +#define CACHE_ALIGN_ROUNDUP(x)\
> + ((ODP_CACHE_LINE_SIZE) * \
> +  (((x) + ODP_CACHE_LINE_SIZE - 1) / (ODP_CACHE_LINE_SIZE)))
> +
> +/* Test 

Re: [lng-odp] [PATCH v2 1/2] test: perf: add new scheduling latency test

2016-09-20 Thread Savolainen, Petri (Nokia - FI/Espoo)
Hi,

First, this app is written according to the current API and we'd like to start 
latency testing schedulers ASAP. A review of the app code itself would be 
appreciated.

Anyway, I'll answer those API-related comments below.


> -Original Message-
> From: lng-odp [mailto:lng-odp-boun...@lists.linaro.org] On Behalf Of Bill
> Fischofer
> Sent: Monday, September 19, 2016 11:41 PM
> To: Brian Brooks <brian.bro...@linaro.org>
> Cc: Elo, Matias (Nokia - FI/Espoo) <matias@nokia-bell-labs.com>; lng-
> o...@lists.linaro.org
> Subject: Re: [lng-odp] [PATCH v2 1/2] test: perf: add new scheduling
> latency test
> 
> On Mon, Sep 19, 2016 at 2:11 PM, Brian Brooks <brian.bro...@linaro.org>
> wrote:
> 
> > On 09/19 07:55:22, Elo, Matias (Nokia - FI/Espoo) wrote:
> > > >
> > > > On 09/14 11:53:06, Matias Elo wrote:
> > > > > +


> >
> > Thinking in the general sense..
> >
> > Should applications have to reason about _and_ code around pre-scheduled
> > and non-scheduled events? If the event hasn't crossed the API boundary
> to
> > be
> > delivered to the application according to the scheduling group policies
> for
> > that core, what is the difference to the application?
> >
> > If a scheduler implementation uses TLS to pre-schedule events it also
> seems
> > like it should be able to support work-stealing of those pre-scheduled
> > events
> > by other threads in the runtime case where odp_schedule() is not called
> > from
> > that thread or the thread id is removed from scheduling group masks.
> From
> > the application perspective these are all implementation details.
> >

Pause signals a (HW) scheduler that the application will leave the schedule loop 
soon (the app stops calling schedule() for a long time or forever). Without the 
signal, the scheduler would not see any difference between a "mid" schedule call 
and the last call. A schedule() call starts and ends a schedule context (e.g. 
atomic locking of a queue). If the application just leaves the loop, the last 
context will not be freed and e.g. an atomic queue would deadlock.
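
For reference, the exit sequence being described looks roughly like the sketch
below, mirroring the pattern used in odp_sched_latency.c: the thread signals
pause, drains any events already staged for it back to their source queue, and
then resumes.

#include <odp_api.h>

static void leave_schedule_loop(void)
{
	odp_queue_t src_queue;
	odp_event_t ev;

	/* Signal the scheduler that this thread will stop scheduling */
	odp_schedule_pause();

	/* Drain events that may already be staged for this thread */
	while (1) {
		ev = odp_schedule(&src_queue, ODP_SCHED_NO_WAIT);

		if (ev == ODP_EVENT_INVALID)
			break;

		/* Return the event to its source queue so no work is lost */
		if (odp_queue_enq(src_queue, ev))
			odp_event_free(ev);
	}

	odp_schedule_resume();
}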

Also, generally pre-scheduled work cannot be "stolen" since:
1) it would be a costly operation to unwind already-made decisions
2) packet order must also be maintained in this case. It's costly to reorder /
force order for stolen events (other events may already have been processed on
other cores before you "steal" some events).



> 
> You're making an argument I made some time back. :)  As I recall, the
> rationale for pause/resume was to make life easier for existing code that
> is introducing ODP on a more gradual basis. Presumably Nokia has examples
> of such code in house.

No. See the rationale above. It's based on the functionality of existing SoC HW 
schedulers. HW is bad at unwinding already-made decisions. The application is in 
the best position to decide what to do with the last events before a thread 
exits. Typically, those are processed like any other event.

> 
> From a design standpoint, worker threads shouldn't "change their minds" and
> go off to do something else for a while. Whatever else they might want to do
> would seem better served by a separate thread that wakes up periodically to
> handle those tasks.
> 

Pause/resume should not be something that a thread does very often. But 
without it, no worker thread could ever exit the schedule loop - doing so 
could deadlock a queue (or a number of queues).

> 
> >
> > This pause state may also cause some confusion for application writers
> > because
> > it is now possible to write two different event loops for the same core
> > depending on how a particular scheduler implementation behaves. The
> > semantics
> > seem to blur a bit with scheduling groups. Level of abstraction can be
> > raised
> > by deprecating the scheduler pause state and APIs.
> >

Those cannot be just deprecated. The same signal is needed in some form to 
avoid deadlocks.

> 
> This is a worthwhile discussion to have. I'll add it to the agenda for
> tomorrow's ODP call and we can include it in the wider scheduler
> discussions scheduled for next week. The other rationale for not wanting
> this behavior (another argument I advanced earlier) is that it greatly
> complicates recovery processing. A robustly designed application should be
> able to recover from the failure of an individual thread (this is
> especially true if the ODP thread is in fact a separate process). If the
> implementation has prescheduled events to a failed thread then how are
> they
> recovered gracefully? Conversely, if the 

Re: [lng-odp] [PATCH v2 1/2] test: perf: add new scheduling latency test

2016-09-19 Thread Bill Fischofer
On Mon, Sep 19, 2016 at 2:11 PM, Brian Brooks 
wrote:

> On 09/19 07:55:22, Elo, Matias (Nokia - FI/Espoo) wrote:
> > >
> > > On 09/14 11:53:06, Matias Elo wrote:
> > > > +
> > > > + /* Clear possible locally stored buffers */
> > > > + odp_schedule_pause();
> > > > +
> > > > + while (1) {
> > > > + ev = odp_schedule(&src_queue, ODP_SCHED_NO_WAIT);
> > > > +
> > > > + if (ev == ODP_EVENT_INVALID)
> > > > + break;
> > > > +
> > > > + if (odp_queue_enq(src_queue, ev)) {
> > > > + LOG_ERR("[%i] Queue enqueue failed.\n", thr);
> > > > + odp_event_free(ev);
> > > > + return -1;
> > > > + }
> > > > + }
> > > > +
> > > > + odp_schedule_resume();
> > >
> > > Is it possible to skip this and go straight to draining the queues?
> > >
> > > Locally pre-scheduled work is an implementation detail that should be
> hidden
> > > by the scheduling APIs.
> > >
> > > A hardware scheduler may not pre-schedule work to cores the way the
> current
> > > software implementation does.
> >
> > Also some HW schedulers may operate in push mode and do local caching.
> Calling
> > odp_schedule_pause() is the only ODP method to signal the scheduler to
> stop this.
> > So to keep the application platform agnostic (and follow the API
> documentation),
> > this step cannot be skipped.
> >
> > -Matias
>
> Thinking in the general sense..
>
> Should applications have to reason about _and_ code around pre-scheduled
> and non-scheduled events? If the event hasn't crossed the API boundary to
> be
> delivered to the application according to the scheduling group policies for
> that core, what is the difference to the application?
>
> If a scheduler implementation uses TLS to pre-schedule events it also seems
> like it should be able to support work-stealing of those pre-scheduled
> events
> by other threads in the runtime case where odp_schedule() is not called
> from
> that thread or the thread id is removed from scheduling group masks. From
> the application perspective these are all implementation details.
>

You're making an argument I made some time back. :)  As I recall, the
rationale for pause/resume was to make life easier for existing code that
is introducing ODP on a more gradual basis. Presumably Nokia has examples
of such code in house.

From a design standpoint, worker threads shouldn't "change their minds" and
go off to do something else for a while. Whatever else they might want to do
would seem better served by a separate thread that wakes up periodically to
handle those tasks.


>
> This pause state may also cause some confusion for application writers
> because
> it is now possible to write two different event loops for the same core
> depending on how a particular scheduler implementation behaves. The
> semantics
> seem to blur a bit with scheduling groups. Level of abstraction can be
> raised
> by deprecating the scheduler pause state and APIs.
>

This is a worthwhile discussion to have. I'll add it to the agenda for
tomorrow's ODP call and we can include it in the wider scheduler
discussions scheduled for next week. The other rationale for not wanting
this behavior (another argument I advanced earlier) is that it greatly
complicates recovery processing. A robustly designed application should be
able to recover from the failure of an individual thread (this is
especially true if the ODP thread is in fact a separate process). If the
implementation has prescheduled events to a failed thread then how are they
recovered gracefully? Conversely, if the implementation can recover from
such a scenario then it would seem it could equally "unschedule" prestaged
events as needed due to thread termination (normal or abnormal) or for load
balancing purposes.

We may not be able to fully deprecate these APIs, but perhaps we can make
it clearer how they are intended to be used and classify them as
"discouraged" for new code.


>
> > > The ODP implementation for that environment
> > > would have to turn the scheduling call into a nop for that core if it
> is
> > > paused by use of these APIs. Another way to implement it would be to
> remove
> > > this core from all queue scheduling groups and leave the schedule call
> as-is.
> > > If implemented by the first method, the application writer could
> simply just
> > > not call the API to schedule work. If implemented by the second
> method, there
> > > are already scheduling group APIs to do this.
> >
> > The ODP implementation is free to choose how it implements these calls.
> For
> > example adding a single 'if (odp_unlikely(x))' to odp_schedule() to make
> it a NOP
> > after odp_schedule_pause() has been called shouldn't cause a significant
> overhead.
> >
> > >
> > > Are odp_schedule_pause() and odp_schedule_resume() deprecated?
> >
> > Nope.
> >
> > >
> > > > + odp_barrier_wait(>barrier);
> > > > +
> > > > + 

Re: [lng-odp] [PATCH v2 1/2] test: perf: add new scheduling latency test

2016-09-19 Thread Brian Brooks
On 09/19 07:55:22, Elo, Matias (Nokia - FI/Espoo) wrote:
> > 
> > On 09/14 11:53:06, Matias Elo wrote:
> > > +
> > > + /* Clear possible locally stored buffers */
> > > + odp_schedule_pause();
> > > +
> > > + while (1) {
> > > + ev = odp_schedule(&src_queue, ODP_SCHED_NO_WAIT);
> > > +
> > > + if (ev == ODP_EVENT_INVALID)
> > > + break;
> > > +
> > > + if (odp_queue_enq(src_queue, ev)) {
> > > + LOG_ERR("[%i] Queue enqueue failed.\n", thr);
> > > + odp_event_free(ev);
> > > + return -1;
> > > + }
> > > + }
> > > +
> > > + odp_schedule_resume();
> > 
> > Is it possible to skip this and go straight to draining the queues?
> > 
> > Locally pre-scheduled work is an implementation detail that should be hidden
> > by the scheduling APIs.
> > 
> > A hardware scheduler may not pre-schedule work to cores the way the current
> > software implementation does.
> 
> Also some HW schedulers may operate in push mode and do local caching. Calling
> odp_schedule_pause() is the only ODP method to signal the scheduler to stop 
> this.
> So to keep the application platform agnostic (and follow the API 
> documentation),
> this step cannot be skipped.
> 
> -Matias

Thinking in the general sense..

Should applications have to reason about _and_ code around pre-scheduled
and non-scheduled events? If the event hasn't crossed the API boundary to be
delivered to the application according to the scheduling group policies for
that core, what is the difference to the application?

If a scheduler implementation uses TLS to pre-schedule events it also seems
like it should be able to support work-stealing of those pre-scheduled events
by other threads in the runtime case where odp_schedule() is not called from
that thread or the thread id is removed from scheduling group masks. From
the application perspective these are all implementation details.

This pause state may also cause some confusion for application writers because
it is now possible to write two different event loops for the same core
depending on how a particular scheduler implementation behaves. The semantics
seem to blur a bit with scheduling groups. Level of abstraction can be raised
by deprecating the scheduler pause state and APIs.

> > The ODP implementation for that environment
> > would have to turn the scheduling call into a nop for that core if it is
> > paused by use of these APIs. Another way to implement it would be to remove
> > this core from all queue scheduling groups and leave the schedule call 
> > as-is.
> > If implemented by the first method, the application writer could simply just
> > not call the API to schedule work. If implemented by the second method, 
> > there
> > are already scheduling group APIs to do this.
> 
> The ODP implementation is free to choose how it implements these calls. For
> example adding a single 'if (odp_unlikely(x))' to odp_schedule() to make it a 
> NOP
> after odp_schedule_pause() has been called shouldn't cause a significant 
> overhead.
> 
> > 
> > Are odp_schedule_pause() and odp_schedule_resume() deprecated?
> 
> Nope.
> 
> > 
> > > + odp_barrier_wait(>barrier);
> > > +
> > > + clear_sched_queues();


Re: [lng-odp] [PATCH v2 1/2] test: perf: add new scheduling latency test

2016-09-19 Thread Elo, Matias (Nokia - FI/Espoo)


> -Original Message-
> From: Brian Brooks [mailto:brian.bro...@linaro.org]
> Sent: Saturday, September 17, 2016 1:05 AM
> To: Elo, Matias (Nokia - FI/Espoo) <matias@nokia-bell-labs.com>
> Cc: lng-odp@lists.linaro.org
> Subject: Re: [lng-odp] [PATCH v2 1/2] test: perf: add new scheduling latency 
> test
> 
> On 09/14 11:53:06, Matias Elo wrote:
> > +
> > +   /* Clear possible locally stored buffers */
> > +   odp_schedule_pause();
> > +
> > +   while (1) {
> > +   ev = odp_schedule(&src_queue, ODP_SCHED_NO_WAIT);
> > +
> > +   if (ev == ODP_EVENT_INVALID)
> > +   break;
> > +
> > +   if (odp_queue_enq(src_queue, ev)) {
> > +   LOG_ERR("[%i] Queue enqueue failed.\n", thr);
> > +   odp_event_free(ev);
> > +   return -1;
> > +   }
> > +   }
> > +
> > +   odp_schedule_resume();
> 
> Is it possible to skip this and go straight to draining the queues?
> 
> Locally pre-scheduled work is an implementation detail that should be hidden
> by the scheduling APIs.
> 
> A hardware scheduler may not pre-schedule work to cores the way the current
> software implementation does.

Also some HW schedulers may operate in push mode and do local caching. Calling
odp_schedule_pause() is the only ODP method to signal the scheduler to stop 
this.
So to keep the application platform agnostic (and follow the API documentation),
this step cannot be skipped.

-Matias

> The ODP implementation for that environment
> would have to turn the scheduling call into a nop for that core if it is
> paused by use of these APIs. Another way to implement it would be to remove
> this core from all queue scheduling groups and leave the schedule call as-is.
> If implemented by the first method, the application writer could simply just
> not call the API to schedule work. If implemented by the second method, there
> are already scheduling group APIs to do this.

The ODP implementation is free to choose how it implements these calls. For
example adding a single 'if (odp_unlikely(x))' to odp_schedule() to make it a 
NOP
after odp_schedule_pause() has been called shouldn't cause a significant 
overhead.
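
As a rough illustration of that point, a pause flag checked on the fast path
might look like the sketch below; 'sched_local' and its field are hypothetical
and not the actual odp-linux internals.

#include <odp_api.h>

/* Hypothetical per-thread scheduler state (illustration only) */
static __thread struct {
	int paused;
} sched_local;

/* Sketch of a schedule() fast path with a cheap pause check */
odp_event_t sketch_schedule(odp_queue_t *from, uint64_t wait)
{
	if (odp_unlikely(sched_local.paused)) {
		/* After odp_schedule_pause(): hand out only locally
		 * pre-scheduled events, or nothing at all. */
		(void)from;
		(void)wait;
		return ODP_EVENT_INVALID;
	}

	/* ... normal scheduling fast path would go here ... */
	return ODP_EVENT_INVALID;
}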

> 
> Are odp_schedule_pause() and odp_schedule_resume() deprecated?

Nope.

> 
> > +   odp_barrier_wait(>barrier);
> > +
> > +   clear_sched_queues();


Re: [lng-odp] [PATCH v2 1/2] test: perf: add new scheduling latency test

2016-09-16 Thread Brian Brooks
On 09/14 11:53:06, Matias Elo wrote:
> +
> + /* Clear possible locally stored buffers */
> + odp_schedule_pause();
> +
> + while (1) {
> + ev = odp_schedule(&src_queue, ODP_SCHED_NO_WAIT);
> +
> + if (ev == ODP_EVENT_INVALID)
> + break;
> +
> + if (odp_queue_enq(src_queue, ev)) {
> + LOG_ERR("[%i] Queue enqueue failed.\n", thr);
> + odp_event_free(ev);
> + return -1;
> + }
> + }
> +
> + odp_schedule_resume();

Is it possible to skip this and go straight to draining the queues?

Locally pre-scheduled work is an implementation detail that should be hidden
by the scheduling APIs.

A hardware scheduler may not pre-schedule work to cores the way the current
software implementation does. The ODP implementation for that environment
would have to turn the scheduling call into a nop for that core if it is
paused by use of these APIs. Another way to implement it would be to remove
this core from all queue scheduling groups and leave the schedule call as-is.
If implemented by the first method, the application writer could simply just
not call the API to schedule work. If implemented by the second method, there
are already scheduling group APIs to do this.
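
For the second method, the existing group APIs would be used roughly as in
this sketch; 'group' is assumed to have been created earlier with
odp_schedule_group_create().

#include <odp_api.h>

/* Remove the calling thread from a scheduling group so that queues
 * belonging to that group are no longer scheduled to it. */
static int leave_sched_group(odp_schedule_group_t group)
{
	odp_thrmask_t mask;

	odp_thrmask_zero(&mask);
	odp_thrmask_set(&mask, odp_thread_id());

	return odp_schedule_group_leave(group, &mask);
}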

Are odp_schedule_pause() and odp_schedule_resume() deprecated?

> + odp_barrier_wait(>barrier);
> +
> + clear_sched_queues();