[lng-odp] [Bug 2779] CID 173342: Error handling issues (CHECKED_RETURN)

2017-04-05 Thread bugzilla-daemon
https://bugs.linaro.org/show_bug.cgi?id=2779

--- Comment #4 from Bill Fischofer  ---
Patch http://patches.opendataplane.org/patch/8459/ posted to resolve this
issue, following Maxim's comments about the first patch.


Re: [lng-odp] [API-NEXT PATCH v2 07/16] test: odp_scheduling: Handle dequeueing from a concurrent queue

2017-04-05 Thread Maxim Uvarov
On 04/05/17 17:30, Ola Liljedahl wrote:
> On 5 April 2017 at 14:50, Maxim Uvarov  wrote:
>> On 04/05/17 06:57, Honnappa Nagarahalli wrote:
>>> This can go into master/api-next as an independent patch. Agree?
>>
>> agree. If we accept implementation where events can be 'delayed'
> Probably all platforms with HW queues.
> 
>> then it
>> looks like we missed some api to sync queues.
> When would those API's be used?
> 

There might be a case like that, or it might not be needed in a real-world
application.

My point is that if postponed events are accepted behavior, then we need to
document that in the API doxygen comments.

Maxim.
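
As a sketch of the kind of doxygen note being requested here (hypothetical
wording, not current API text), odp_queue_deq() could document the relaxed
visibility like this:

	/**
	 * Dequeue an event from a queue
	 *
	 * @param queue  Queue handle
	 *
	 * @return Event handle, or ODP_EVENT_INVALID if no event was visible
	 *
	 * @note With a decoupled (e.g. lock-free) queue implementation, an
	 * event that has just been enqueued may not be immediately visible to
	 * dequeue, even on the same CPU. ODP_EVENT_INVALID therefore means
	 * "no event visible right now", not "the queue is empty"; callers may
	 * need to retry.
	 */
	odp_event_t odp_queue_deq(odp_queue_t queue);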

>>
>> But I do not see why we need this patch. On the same cpu the test queues 1
>> event and after that dequeues 1 event:
>>
>> for (i = 0; i < QUEUE_ROUNDS; i++) {
>>         ev = odp_buffer_to_event(buf);
>>
>>         if (odp_queue_enq(queue, ev)) {
>>                 LOG_ERR("  [%i] Queue enqueue failed.\n", thr);
>>                 odp_buffer_free(buf);
>>                 return -1;
>>         }
>>
>>         ev = odp_queue_deq(queue);
>>
>>         buf = odp_buffer_from_event(ev);
>>
>>         if (!odp_buffer_is_valid(buf)) {
>>                 LOG_ERR("  [%i] Queue empty.\n", thr);
>>                 return -1;
>>         }
>> }
>>
>> Where exactly can this event be delayed?
> In the memory system.
> 
>>
>> If other threads do the same - then all do enqueue 1 event first and
>> then dequeue one event. I can understand a problem with enqueueing on one
>> cpu and dequeuing on another cpu. But on the same cpu it has to always
>> work. Isn't it?
> No.
> 
>>
>> Maxim.
>>
>>>
>>> On 4 April 2017 at 21:22, Brian Brooks  wrote:
 On 04/04 17:26:12, Bill Fischofer wrote:
> On Tue, Apr 4, 2017 at 3:37 PM, Brian Brooks  wrote:
>> On 04/04 21:59:15, Maxim Uvarov wrote:
>>> On 04/04/17 21:47, Brian Brooks wrote:
 Signed-off-by: Ola Liljedahl 
 Reviewed-by: Brian Brooks 
 Reviewed-by: Honnappa Nagarahalli 
 Reviewed-by: Kevin Wang 
 ---
  test/common_plat/performance/odp_scheduling.c | 12 ++--
  1 file changed, 10 insertions(+), 2 deletions(-)

 diff --git a/test/common_plat/performance/odp_scheduling.c b/test/common_plat/performance/odp_scheduling.c
 index c74a0713..38e76257 100644
 --- a/test/common_plat/performance/odp_scheduling.c
 +++ b/test/common_plat/performance/odp_scheduling.c
 @@ -273,7 +273,7 @@ static int test_plain_queue(int thr, test_globals_t *globals)
 test_message_t *t_msg;
 odp_queue_t queue;
 uint64_t c1, c2, cycles;
 -   int i;
 +   int i, j;

 /* Alloc test message */
 buf = odp_buffer_alloc(globals->pool);
 @@ -307,7 +307,15 @@ static int test_plain_queue(int thr, test_globals_t *globals)
 return -1;
 }

 -   ev = odp_queue_deq(queue);
 +   /* When enqueue and dequeue are decoupled (e.g. not using a
 +* common lock), an enqueued event may not be immediately
 +* visible to dequeue. So we just try again for a while. */
 +   for (j = 0; j < 100; j++) {
>>>
>>> where 100 number comes from?
>>
>> It is the retry count. Perhaps it could be a bit lower, or a bit higher, but
>> it works well.
>
> Actually, it's incorrect. What happens if all 100 retries fail? You'll
> call odp_buffer_from_event() for ODP_EVENT_INVALID, which is
> undefined.

 Incorrect? :) The point is that an event may not be immediately available
 to dequeue after it has been enqueued. This is due to the way that a
 concurrent ring buffer behaves in a multi-threaded environment. The approach
 here is just to retry the dequeue a couple times (100 times actually) before
 moving on to the rest of the code. Perhaps 100 times is too many times, but
 some amount of retry is needed.

 If this is not desirable, then I think it would be more accurate to consider
 odp_queue_enq() / odp_queue_deq() as async APIs -or- MT-unsafe (must be
 called from one thread at a time in order to ensure the behavior that an
 event is immediately available for dequeue once it has been enqueued).

>>
>>> Maxim.
>>>
 +   ev = odp_queue_deq(queue);
 +   if (ev != ODP_EVENT_INVALID)
 +   break;
 +   odp_cpu_pause();
 +   }

 buf = 
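
To address Bill's point about exhausting the retries, here is a minimal sketch
of the loop with an explicit exhaustion check (the retry count and the error
handling are illustrative assumptions, not the posted patch):

	odp_event_t ev = ODP_EVENT_INVALID;
	int j;

	/* An enqueued event may not be immediately visible to dequeue when
	 * enqueue and dequeue are decoupled; retry for a bounded time. */
	for (j = 0; j < 100; j++) {
		ev = odp_queue_deq(queue);
		if (ev != ODP_EVENT_INVALID)
			break;
		odp_cpu_pause();
	}

	/* Bail out instead of passing ODP_EVENT_INVALID to
	 * odp_buffer_from_event(), whose behavior would be undefined. */
	if (ev == ODP_EVENT_INVALID) {
		LOG_ERR("  [%i] Queue empty after retries.\n", thr);
		return -1;
	}

	buf = odp_buffer_from_event(ev);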

[lng-odp] [PATCH] example: l3fwd: check rc from odph_eth_addr_parse()

2017-04-05 Thread Bill Fischofer
Resolve Bug https://bugs.linaro.org/show_bug.cgi?id=2779 by checking
the return code from odph_eth_addr_parse() and failing the call if
dst_mac is unparseable.

Signed-off-by: Bill Fischofer 
---
 example/l3fwd/odp_l3fwd_db.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/example/l3fwd/odp_l3fwd_db.c b/example/l3fwd/odp_l3fwd_db.c
index 082b2c2..7d1efd5 100644
--- a/example/l3fwd/odp_l3fwd_db.c
+++ b/example/l3fwd/odp_l3fwd_db.c
@@ -394,7 +394,8 @@ int create_fwd_db_entry(char *input, char **oif, uint8_t **dst_mac)
		*oif = entry->oif;
		break;
	case 2:
-		odph_eth_addr_parse(&entry->dst_mac, token);
+		if (odph_eth_addr_parse(&entry->dst_mac, token) < 0)
+			return -1;
		*dst_mac = entry->dst_mac.addr;
		break;
 
-- 
2.9.3



[lng-odp] [Bug 2852] ODP_STATIC_ASSERT() fails when used by C++

2017-04-05 Thread bugzilla-daemon
https://bugs.linaro.org/show_bug.cgi?id=2852

--- Comment #4 from Bill Fischofer  ---
Patch v5 submitted via pull request from bill-fischofer-linaro/odp. Needs a
status update, but it seems this has not yet been merged.
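
For reference, a common way to make a static assert usable from both C and C++
translation units (a sketch of the general technique, not necessarily the
submitted patch):

	/* C11 provides _Static_assert; C++11 provides static_assert.
	 * Dispatch on the language so the macro also works when the header
	 * is included from C++ code. */
	#ifdef __cplusplus
	#define ODP_STATIC_ASSERT(cond, msg) static_assert(cond, msg)
	#else
	#define ODP_STATIC_ASSERT(cond, msg) _Static_assert(cond, msg)
	#endif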


[lng-odp] [Bug 2911] unchecked return value may result in out of bounds access

2017-04-05 Thread bugzilla-daemon
https://bugs.linaro.org/show_bug.cgi?id=2911

Bill Fischofer  changed:

           What    |Removed |Added
----------------------------------------
           Severity|critical|minor


[lng-odp] [Bug 2911] unchecked return value may result in out of bounds access

2017-04-05 Thread bugzilla-daemon
https://bugs.linaro.org/show_bug.cgi?id=2911

--- Comment #2 from Bill Fischofer  ---
Actually, while odp_packet_alloc_multi() may return -1 on errors, this code
calls the internal packet_alloc_multi(), which returns the number of packets
allocated in the range 0..max_num. So if no packets are allocated, the for
loop does not execute and netmap_pkt_to_odp() returns 0.

Please review the code and if you agree with the above I'll close this as
invalid. Thanks.
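
A minimal sketch of the pattern described above (the helper names are
simplified placeholders for the internal netmap code):

	/* packet_alloc_multi() returns the number of packets actually
	 * allocated, in the range 0..max_num, rather than -1 on failure. */
	num = packet_alloc_multi(pool, len, pkts, max_num);

	/* If num == 0 the loop body never runs, so no out-of-bounds access
	 * can occur; the function simply reports zero packets received. */
	for (i = 0; i < num; i++)
		prepare_pkt(pkts[i]);

	return num;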


Re: [lng-odp] [API-NEXT PATCH v2 15/16] Add llqueue, an unbounded concurrent queue

2017-04-05 Thread Maxim Uvarov
On 04/05/17 21:36, Ola Liljedahl wrote:
> On 5 April 2017 at 17:33, Dmitry Eremin-Solenikov
>  wrote:
>> On 05.04.2017 17:40, Ola Liljedahl wrote:
>>> On 5 April 2017 at 14:20, Maxim Uvarov  wrote:
 On 04/05/17 01:46, Ola Liljedahl wrote:
> On 4 April 2017 at 21:25, Maxim Uvarov  wrote:
>> it's better to have 2 separate files for that. One for ODP_CONFIG_LLDSCD
> "better"? In what way?
>>> Please respond to the question. If you claim something is "better",
>>> you must be able to explain *why* it is better.
>>>
>>> *We* have explained why we think it is better to keep both
>>> implementations in the same file, close to each other. I think Brian's
>>> explanation was very good.
>>
>> Because it allows one to overview a complete implementation at once
>> instead of switching between two different modes.
> That's a good argument as well. It doesn't mean that the
> implementations should live in separate files.
> 
> We keep both implementations in the same file but avoid interleaving
> the different functions (as is done now). This is actually what someone
> in our team wanted.
> 

If there are 2 files which differ in something, it's possible to diff the two
files. Interleaved code is very hard to read and understand. With 2 files you
can compile the file you need and skip the one you don't, i.e. you don't need
these ifdefs, and you always know which way the compiler went. I'm not saying
that it has to be exactly 2 files. It might be 3: 1 common file and 2 with the
differences.

btw, why are all those functions inline?


Maxim.


>>
>> --
>> With best wishes
>> Dmitry



Re: [lng-odp] [RFC/API-NEXT 1/1] api: classification: packet hashing per class of service

2017-04-05 Thread Brian Brooks
On Fri, Dec 9, 2016 at 5:54 AM, Balasubramanian Manoharan
 wrote:
> Adds support to spread packet from a single CoS to multiple queues by
> configuring hashing at CoS level.
>
> Signed-off-by: Balasubramanian Manoharan 
> ---
>  include/odp/api/spec/classification.h | 45 ++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 42 insertions(+), 3 deletions(-)
>
> diff --git a/include/odp/api/spec/classification.h 
> b/include/odp/api/spec/classification.h
> index 0e442c7..220b029 100644
> --- a/include/odp/api/spec/classification.h
> +++ b/include/odp/api/spec/classification.h
> @@ -126,6 +126,12 @@ typedef struct odp_cls_capability_t {
> /** Maximum number of CoS supported */
> unsigned max_cos;
>
> +   /** Maximum number of queues supported per CoS */
> +   unsigned max_queue_supported;
> +
> +   /** Protocol header combination supported for Hashing */
> +   odp_pktin_hash_proto_t hash_supported;

I like this idea and think it is critical for supporting
implementations that can handle lots of queues. What I don't quite
understand is how it relates to the rest of the Classification
functionality. For example, I would like all packets coming from the
physical interface to be hashed/parsed according to "hash_supported"
and then assigned to their respective queues. I would then like to
schedule from those queues. That is all. Is this possible? What
priority and synchronization will those queues have?

> /** A Boolean to denote support of PMR range */
> odp_bool_t pmr_range_supported;
>  } odp_cls_capability_t;
> @@ -164,9 +170,25 @@ typedef enum {
>   * Used to communicate class of service creation options
>   */
>  typedef struct odp_cls_cos_param {
> -	odp_queue_t queue;	/**< Queue associated with CoS */
> -	odp_pool_t pool;	/**< Pool associated with CoS */
> -	odp_cls_drop_t drop_policy;	/**< Drop policy associated with CoS */
> +	/* Minimum number of queues to be linked to this CoS.
> +	 * If the number is greater than 1 then hashing has to be
> +	 * enabled */
> +	uint32_t num_queue;
> +	/* Denotes whether hashing is enabled for queues in this CoS.
> +	 * When hashing is enabled the queues are created by the implementation
> +	 * and the application need not configure any queue to the class of
> +	 * service */
> +	odp_bool_t enable_hashing;
> +	/* Protocol header fields which are included in packet input
> +	 * hash calculation */
> +	odp_pktin_hash_proto_t hash;
> +	/* If hashing is disabled, then the application has to configure
> +	 * this queue and packets are delivered to this queue */
> +	odp_queue_t queue;
> +	/* Pool associated with CoS */
> +	odp_pool_t pool;
> +	/* Drop policy associated with CoS */
> +	odp_cls_drop_t drop_policy;
>  } odp_cls_cos_param_t;
>
>  /**
> @@ -209,6 +231,23 @@ int odp_cls_capability(odp_cls_capability_t *capability);
>  odp_cos_t odp_cls_cos_create(const char *name, odp_cls_cos_param_t *param);
>
>  /**
> + * Queue hash result
> + * Returns the queue within a CoS in which a particular packet will be enqueued
> + * based on the packet parameters and hash protocol field configured with the
> + * class of service.
> + *
> + * @param  cos class of service
> + * @param  packet  Packet handle
> + *
> + * @retval Returns the queue handle on which this packet will be
> + * enqueued.
> + * @retval ODP_QUEUE_INVALID for error case
> + *
> + * @note The packet has to be updated with valid header pointers L2, L3 and L4.
> + */
> +odp_queue_t odp_queue_hash_result(odp_cos_t cos, odp_packet_t packet);
> +
> +/**
>   * Discard a class-of-service along with all its associated resources
>   *
>   * @param[in]  cos_id  class-of-service instance.
> --
> 1.9.1
>
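
For illustration, usage of the proposed parameters might look like the
following sketch (assuming the usual odp_cls_cos_param_init() initializer and
the hash bits defined by odp_pktin_hash_proto_t; pool and pkt are placeholders
and error handling is omitted):

	odp_cls_cos_param_t param;

	odp_cls_cos_param_init(&param);
	param.num_queue = 8;            /* spread this CoS over 8 queues */
	param.enable_hashing = 1;       /* queues created by the implementation */
	param.hash.proto.ipv4_udp = 1;  /* hash on IPv4/UDP header fields */
	param.pool = pool;
	param.drop_policy = ODP_COS_DROP_NEVER;

	odp_cos_t cos = odp_cls_cos_create("hashed-cos", &param);

	/* The proposed lookup: which member queue would this packet land on? */
	odp_queue_t q = odp_queue_hash_result(cos, pkt);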


Re: [lng-odp] [API-NEXT PATCH v2 15/16] Add llqueue, an unbounded concurrent queue

2017-04-05 Thread Ola Liljedahl
On 5 April 2017 at 17:33, Dmitry Eremin-Solenikov
 wrote:
> On 05.04.2017 17:40, Ola Liljedahl wrote:
>> On 5 April 2017 at 14:20, Maxim Uvarov  wrote:
>>> On 04/05/17 01:46, Ola Liljedahl wrote:
 On 4 April 2017 at 21:25, Maxim Uvarov  wrote:
> it's better to have 2 separate files for that. One for ODP_CONFIG_LLDSCD
 "better"? In what way?
>> Please respond to the question. If you claim something is "better",
>> you must be able to explain *why* it is better.
>>
>> *We* have explained why we think it is better to keep both
>> implementations in the same file, close to each other. I think Brian's
>> explanation was very good.
>
> Because it allows one to overview a complete implementation at once
> instead of switching between two different modes.
That's a good argument as well. It doesn't mean that the
implementations should live in separate files.

We keep both implementations in the same file but avoid interleaving
the different functions (as is done now). This is actually what someone
in our team wanted.

>
> --
> With best wishes
> Dmitry


Re: [lng-odp] CRC/Adler requirement in comp interface

2017-04-05 Thread Bill Fischofer
Adding Barry to the to list since I'm not sure if he follows the ODP
mailing list and this was something he raised.

On Wed, Apr 5, 2017 at 7:31 AM, Verma, Shally  wrote:
> from yesterday meeting minutes , I see a note on this feedback on compression:
> Consider adding additional "hashes" (e.g., CRC, Adler)
>
> As we mentioned that comp interface does not provide CRC. Also adler comes as 
> output of zlib format and CRC can be available through helper functions. So 
> is there any use case identified where user need Adler as explicit algorithm 
> to compression interface?
>
> Thanks
> Shally
>


Re: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software scheduler

2017-04-05 Thread Brian Brooks
On 04/05 21:27:37, Jerin Jacob wrote:
> -Original Message-
> > Date: Tue, 4 Apr 2017 13:47:52 -0500
> > From: Brian Brooks 
> > To: lng-odp@lists.linaro.org
> > Subject: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software scheduler
> > X-Mailer: git-send-email 2.12.2
> > 
> > This work derives from Ola Liljedahl's prototype [1] which introduced a
> > scalable scheduler design based on primarily lock-free algorithms and
> > data structures designed to decrease contention. A thread searches
> > through a data structure containing only queues that are both non-empty
> > and allowed to be scheduled to that thread. Strict priority scheduling is
> > respected, and (W)RR scheduling may be used within queues of the same 
> > priority.
> > Lastly, pre-scheduling or stashing is not employed since it is optional
> > functionality that can be implemented in the application.
> > 
> > In addition to scalable ring buffers, the algorithm also uses unbounded
> > concurrent queues. LL/SC and CAS variants exist in cases where absence of
> > ABA problem cannot be proved, and also in cases where the compiler's atomic
> > built-ins may not be lowered to the desired instruction(s). Finally, a
> > version of the algorithm that uses locks is also provided.
> > 
> > See platform/linux-generic/include/odp_config_internal.h for further build
> > time configuration.
> > 
> > Use --enable-schedule-scalable to conditionally compile this scheduler
> > into the library.
> 
> This is an interesting stuff.
> 
> Do you have any performance/latency numbers in comparison to the existing
> scheduler for completing, say, a two-stage (ORDERED->ATOMIC) or N-stage
> pipeline on any platform?

To give an idea, the avg latency reported by odp_sched_latency is down to half
that of other schedulers (pre-scheduling/stashing disabled) on 4c A53, 16c A57,
and 12c Broadwell. We are still preparing numbers, and I think it's worth
mentioning that they are subject to change as this patch series changes over
time.

I am not aware of an existing benchmark that involves switching between
different queue types. Perhaps this is happening in an example app?

> When we say scalable scheduler, what application/means is used to quantify
> scalability?
>
> Do you have any numbers in comparison to the existing scheduler to show the
> magnitude of the scalability on any platform?


Re: [lng-odp] [PATCH] linux-generic: decouple odp_errno define from odp-linux

2017-04-05 Thread Bill Fischofer
This patch has an extra blank line that checkpatch flags:

le-odp_errno-define-from-odp-lin.patch
CHECK: Please don't use multiple blank lines
#48: FILE: platform/linux-generic/include/odp_errno_define.h:13:
+
+

But other than that, this is good.

On Thu, Mar 30, 2017 at 8:31 AM, Balakrishna Garapati
 wrote:
> makes it easy to define odp_errno to dpdk rteerrno and fixes
> linking issues.
>
> Signed-off-by: Balakrishna Garapati 

Reviewed-and-tested-by: Bill Fischofer 

> ---
>  platform/linux-generic/Makefile.am|  1 +
>  platform/linux-generic/include/odp_errno_define.h | 27 +++++++++++++++++++++++++++
>  platform/linux-generic/include/odp_internal.h |  3 +--
>  3 files changed, 29 insertions(+), 2 deletions(-)
>  create mode 100644 platform/linux-generic/include/odp_errno_define.h
>
> diff --git a/platform/linux-generic/Makefile.am 
> b/platform/linux-generic/Makefile.am
> index 37835c3..5452915 100644
> --- a/platform/linux-generic/Makefile.am
> +++ b/platform/linux-generic/Makefile.am
> @@ -125,6 +125,7 @@ noinst_HEADERS = \
>   ${srcdir}/include/odp_config_internal.h \
>   ${srcdir}/include/odp_crypto_internal.h \
>   ${srcdir}/include/odp_debug_internal.h \
> + ${srcdir}/include/odp_errno_define.h \
>   ${srcdir}/include/odp_forward_typedefs_internal.h \
>   ${srcdir}/include/odp_internal.h \
>   ${srcdir}/include/odp_name_table_internal.h \
> diff --git a/platform/linux-generic/include/odp_errno_define.h 
> b/platform/linux-generic/include/odp_errno_define.h
> new file mode 100644
> index 000..9baa7d9
> --- /dev/null
> +++ b/platform/linux-generic/include/odp_errno_define.h
> @@ -0,0 +1,27 @@
> +/* Copyright (c) 2017, Linaro Limited
> + * All rights reserved.
> + *
> + * SPDX-License-Identifier: BSD-3-Clause
> + */
> +
> +/**
> + * @file
> + *
> + * ODP error number define
> + */
> +
> +
> +#ifndef ODP_ERRNO_DEFINE_H_
> +#define ODP_ERRNO_DEFINE_H_
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +extern __thread int __odp_errno;
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif
> diff --git a/platform/linux-generic/include/odp_internal.h 
> b/platform/linux-generic/include/odp_internal.h
> index b313b1f..e1267cf 100644
> --- a/platform/linux-generic/include/odp_internal.h
> +++ b/platform/linux-generic/include/odp_internal.h
> @@ -20,11 +20,10 @@ extern "C" {
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>
> -extern __thread int __odp_errno;
> -
>  #define MAX_CPU_NUMBER 128
>
>  typedef struct {
> --
> 1.9.1
>
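
The point of the decoupling is that an alternative platform can supply its own
odp_errno_define.h. A hypothetical odp-dpdk variant (illustrative only; DPDK's
rte_errno is a real per-lcore variable) could be:

	/* platform/linux-dpdk/include/odp_errno_define.h (sketch) */
	#ifndef ODP_ERRNO_DEFINE_H_
	#define ODP_ERRNO_DEFINE_H_

	#include <rte_errno.h>

	/* Map ODP's errno onto DPDK's per-lcore rte_errno so both layers
	 * see the same value without link-time conflicts. */
	#define __odp_errno rte_errno

	#endif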


Re: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software scheduler

2017-04-05 Thread Jerin Jacob
-Original Message-
> Date: Tue, 4 Apr 2017 13:47:52 -0500
> From: Brian Brooks 
> To: lng-odp@lists.linaro.org
> Subject: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software scheduler
> X-Mailer: git-send-email 2.12.2
> 
> This work derives from Ola Liljedahl's prototype [1] which introduced a
> scalable scheduler design based on primarily lock-free algorithms and
> data structures designed to decrease contention. A thread searches
> through a data structure containing only queues that are both non-empty
> and allowed to be scheduled to that thread. Strict priority scheduling is
> respected, and (W)RR scheduling may be used within queues of the same 
> priority.
> Lastly, pre-scheduling or stashing is not employed since it is optional
> functionality that can be implemented in the application.
> 
> In addition to scalable ring buffers, the algorithm also uses unbounded
> concurrent queues. LL/SC and CAS variants exist in cases where absence of
> ABA problem cannot be proved, and also in cases where the compiler's atomic
> built-ins may not be lowered to the desired instruction(s). Finally, a version
> of the algorithm that uses locks is also provided.
> 
> See platform/linux-generic/include/odp_config_internal.h for further build
> time configuration.
> 
> Use --enable-schedule-scalable to conditionally compile this scheduler
> into the library.

This is an interesting stuff.

Do you have any performance/latency numbers in comparison to the existing
scheduler for completing, say, a two-stage (ORDERED->ATOMIC) or N-stage
pipeline on any platform?

When we say scalable scheduler, what application/means is used to quantify
scalability?
Do you have any numbers in comparison to the existing scheduler to show the
magnitude of the scalability on any platform?



Re: [lng-odp] [API-NEXT PATCH v2 16/16] Add scalable scheduler

2017-04-05 Thread Brian Brooks
On 04/04 22:27:54, Maxim Uvarov wrote:
> On 04/04/17 21:48, Brian Brooks wrote:
> > Add queue getters and setters to provide an abstraction over more than one
> > internal queue data structure.
> > 
> > Use buffer handles instead of pointer to internal object in pktio and tm 
> > code.
> > 
> > Increase the running time of odp_sched_latency to get more stable numbers
> > across multiple runs. This is not done for SP scheduler because it is too 
> > slow.
> > 
> > Use an explicit scheduler group in odp_sched_latency.
> > 
> > Add scalable queue and scheduler implementation.
> > 
> > Signed-off-by: Brian Brooks 
> > Signed-off-by: Kevin Wang 
> > Signed-off-by: Honnappa Nagarahalli 
> > Signed-off-by: Ola Liljedahl 
> > ---
> >  platform/linux-generic/Makefile.am |   21 +-
> >  .../include/odp/api/plat/schedule_types.h  |   20 +-
> >  .../linux-generic/include/odp_queue_internal.h |  122 +-
> >  platform/linux-generic/include/odp_schedule_if.h   |  166 +-
> >  .../include/odp_schedule_ordered_internal.h|  150 ++
> >  platform/linux-generic/m4/odp_schedule.m4  |   55 +-
> >  platform/linux-generic/odp_classification.c|4 +-
> >  platform/linux-generic/odp_packet_io.c |   88 +-
> >  platform/linux-generic/odp_queue.c |2 +-
> >  platform/linux-generic/odp_queue_scalable.c|  883 +
> >  platform/linux-generic/odp_schedule_if.c   |   36 +-
> >  platform/linux-generic/odp_schedule_scalable.c | 1922 ++++++++++++++++++++
> >  .../linux-generic/odp_schedule_scalable_ordered.c  |  285 +++
> >  platform/linux-generic/odp_traffic_mngr.c  |7 +-
> >  platform/linux-generic/pktio/loop.c|   10 +-
> >  test/common_plat/performance/odp_sched_latency.c   |   68 +-
> >  16 files changed, 3754 insertions(+), 85 deletions(-)
> >  create mode 100644 platform/linux-generic/include/odp_schedule_ordered_internal.h
> >  create mode 100644 platform/linux-generic/odp_queue_scalable.c
> >  create mode 100644 platform/linux-generic/odp_schedule_scalable.c
> >  create mode 100644 platform/linux-generic/odp_schedule_scalable_ordered.c
> > 
> > diff --git a/platform/linux-generic/Makefile.am 
> > b/platform/linux-generic/Makefile.am
> > index 70683cac..8c263b99 100644
> > --- a/platform/linux-generic/Makefile.am
> > +++ b/platform/linux-generic/Makefile.am
> > @@ -151,6 +151,8 @@ noinst_HEADERS = \
> >   ${srcdir}/include/odp_debug_internal.h \
> >   ${srcdir}/include/odp_forward_typedefs_internal.h \
> >   ${srcdir}/include/odp_internal.h \
> > + ${srcdir}/include/odp_llqueue.h \
> > + ${srcdir}/include/odp_llsc.h \
> >   ${srcdir}/include/odp_name_table_internal.h \
> >   ${srcdir}/include/odp_packet_internal.h \
> >   ${srcdir}/include/odp_packet_io_internal.h \
> > @@ -219,13 +221,9 @@ __LIB__libodp_linux_la_SOURCES = \
> >pktio/ring.c \
> >odp_pkt_queue.c \
> >odp_pool.c \
> > -  odp_queue.c \
> >odp_rwlock.c \
> >odp_rwlock_recursive.c \
> > -  odp_schedule.c \
> >odp_schedule_if.c \
> > -  odp_schedule_sp.c \
> > -  odp_schedule_iquery.c \
> >odp_shared_memory.c \
> >odp_sorted_list.c \
> >odp_spinlock.c \
> > @@ -250,6 +248,21 @@ __LIB__libodp_linux_la_SOURCES = \
> >arch/@ARCH_DIR@/odp_cpu_arch.c \
> >arch/@ARCH_DIR@/odp_sysinfo_parse.c
> >  
> > +if ODP_SCHEDULE_SP
> 
> can that ifs be removed here? I commented about that in previous patch set.

My understanding is that conditional compilation is sufficient
for linux-generic, and that is what this is. This form of conditional
compilation does not require a queue interface or awkwardly appending
'scalable' to every global symbol that pertains to this queue and
scheduler implementation.

The path we are headed down is much improved extensibility and proper
run-time interfaces, which would take a bit of time and might not be
easy to review. It is also a non-objective in linux-generic.

> Maxim.
> 
> > +__LIB__libodp_linux_la_SOURCES += odp_schedule_sp.c
> > +endif
> > +
> > +if ODP_SCHEDULE_IQUERY
> > +__LIB__libodp_linux_la_SOURCES += odp_schedule_iquery.c
> > +endif
> > +
> > +if ODP_SCHEDULE_SCALABLE
> > +__LIB__libodp_linux_la_SOURCES += odp_queue_scalable.c odp_schedule_scalable.c \
> > + odp_schedule_scalable_ordered.c
> > +else
> > +__LIB__libodp_linux_la_SOURCES += odp_queue.c odp_schedule.c
> > +endif
> > +
> >  if HAVE_PCAP
> >  __LIB__libodp_linux_la_SOURCES += 

Re: [lng-odp] [API-NEXT PATCH v2 15/16] Add llqueue, an unbounded concurrent queue

2017-04-05 Thread Dmitry Eremin-Solenikov
On 05.04.2017 17:40, Ola Liljedahl wrote:
> On 5 April 2017 at 14:20, Maxim Uvarov  wrote:
>> On 04/05/17 01:46, Ola Liljedahl wrote:
>>> On 4 April 2017 at 21:25, Maxim Uvarov  wrote:
 it's better to have 2 separate files for that. One for ODP_CONFIG_LLDSCD
>>> "better"? In what way?
> Please respond to the question. If you claim something is "better",
> you must be able to explain *why* it is better.
> 
> *We* have explained why we think it is better to keep both
> implementations in the same file, close to each other. I think Brian's
> explanation was very good.

Because it allows one to overview a complete implementation at once
instead of switching between two different modes.

-- 
With best wishes
Dmitry


Re: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software scheduler

2017-04-05 Thread Ola Liljedahl
_fdserver.c:
#define FDSERVER_MAX_ENTRIES 256

	/* store the file descriptor in table: */
	if (fd_table_nb_entries < FDSERVER_MAX_ENTRIES) {

	} else {
		ODP_ERR("FD table full\n");

Weird that you get this but we do not.

It is probably related to the scalable scheduler requesting a shm
object per queue. This is a left-over from the prototype; perhaps it
needs to be fixed so that one shm is allocated for all queues. We may
still need a shm per ring buffer though...

Perhaps we need to increase/remove this arbitrary limit of 256 FD entries.
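
A minimal sketch of the sizing fix suggested above, assuming the table stays
statically sized (ODP_CONFIG_QUEUES is the linux-generic build-time queue
limit; the exact headroom constant is an illustrative guess):

	/* _fdserver.c: derive the FD table size from the build-time queue
	 * limit instead of a fixed 256, so one shm (and thus one fd) per
	 * queue cannot exhaust the table. */
	#include <odp_config_internal.h>

	#define FDSERVER_MAX_ENTRIES (ODP_CONFIG_QUEUES + 64)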

On 5 April 2017 at 14:05, Bill Fischofer  wrote:
> Environment is Ubuntu 16.10.
>
> On Wed, Apr 5, 2017 at 7:03 AM, Bill Fischofer
>  wrote:
>> This is running on my desktop x86:
>>
>> ./bootstrap
>> ./configure --enable-schedule-scalable --enable-cunit-support
>> make
>> cd test/common_plat/validation/api/scheduler
>> ./scheduler_main
>>
>> On Tue, Apr 4, 2017 at 10:24 PM, Honnappa Nagarahalli
>>  wrote:
>>> On 4 April 2017 at 16:12, Bill Fischofer  wrote:
 When I compile configure this without --enable-schedule-scalable the
 scheduler validation test runs normally, however if I enable the new
 scheduler I get this output:


 ...
  CUnit - A unit testing framework for C - Version 2.1-3
  http://cunit.sourceforge.net/

 _fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:463:handle_request():FD table full

 ...lots more lines like this

 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure

 Suite: Scheduler
   Test: scheduler_test_wait_time
 ..._fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 1..2..3..4..5.._fdserver.c:342:_odp_fdserver_deregister_fd():fd
 de-registration failure
 passed
   Test: scheduler_test_num_prio ...passed
   Test: scheduler_test_queue_destroy
 ..._fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
 _fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
 _fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
 _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
 _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
 _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
 passed
  Test: scheduler_test_groups ..._fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
 _fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
 _fdserver.c:463:handle_request():FD table full
 _fdserver.c:297:_odp_fdserver_register_fd():fd 

Re: [lng-odp] [API-NEXT PATCH v2 15/16] Add llqueue, an unbounded concurrent queue

2017-04-05 Thread Ola Liljedahl
On 5 April 2017 at 01:21, Dmitry Eremin-Solenikov
 wrote:
> On 05.04.2017 00:25, Brian Brooks wrote:
>> On 04/04 23:23:33, Dmitry Eremin-Solenikov wrote:
>>> On 04.04.2017 22:25, Maxim Uvarov wrote:
 it's better to have 2 separate files for that. One for ODP_CONFIG_LLDSCD
 defined and one for not.
>>>
>>> Seconding that. At least LLDSCD and non-LLDSCD code should not be
>>> interleaved.
>>
>> Can you explain your judgement?
>
> Consider reading two intermixed books of technical recipes. It is just
> my opinion, but I'd prefer to have two separate code blocks: one for
> LLDSCD, one for non-LLDSCD cases.
In this case, it is two recipes for baking the *same* cake. We think
it is useful to be able to easily compare the recipes.
Each function can be considered a separate code block, it's not like
we are mixing recipes line by line.

>
> --
> With best wishes
> Dmitry


Re: [lng-odp] [API-NEXT PATCH v2 15/16] Add llqueue, an unbounded concurrent queue

2017-04-05 Thread Ola Liljedahl
On 5 April 2017 at 14:20, Maxim Uvarov  wrote:
> On 04/05/17 01:46, Ola Liljedahl wrote:
>> On 4 April 2017 at 21:25, Maxim Uvarov  wrote:
>>> it's better to have 2 separate files for that. One for ODP_CONFIG_LLDSCD
>> "better"? In what way?
Please respond to the question. If you claim something is "better",
you must be able to explain *why* it is better.

*We* have explained why we think it is better to keep both
implementations in the same file, close to each other. I think Brian's
explanation was very good.

>>
>>> defined and one for not. Also ODP_ prefix should not be used for
>>> internal things (not api).
>> OK this was not clear, some of the defines in odp_config_internal.h
>> use an ODP_ prefix, some not. You mean there is a system to that?
>>
>> Shouldn't those defines that are part of the API be declared/described
>> in the API header files (located in include/odp/api/spec)? How else do
>> you know that they are part of the API? And if they are part of the
>> API, how does the application (the 'A' in API) access the definitions
>> *and* their values?
>>
>> There are API's for querying about things like total number of queues
>> but those API's are separate and do not depend on some define with a
>> specific name.
>>
>
> That is not an API setting. It's a linux-generic internal setting. ODP apps
> do not use those values.
>
> Maxim.
>
>>>
>>> Maxim.
>>>
>>> On 04/04/17 21:48, Brian Brooks wrote:
 Signed-off-by: Ola Liljedahl 
 Reviewed-by: Brian Brooks 
 ---
 platform/linux-generic/include/odp_llqueue.h | 285 +++++++++++++++++++++++++++
  1 file changed, 285 insertions(+)
  create mode 100644 platform/linux-generic/include/odp_llqueue.h

 diff --git a/platform/linux-generic/include/odp_llqueue.h 
 b/platform/linux-generic/include/odp_llqueue.h
 new file mode 100644
 index ..aa46ace3
 --- /dev/null
 +++ b/platform/linux-generic/include/odp_llqueue.h
 @@ -0,0 +1,285 @@
 +/* Copyright (c) 2017, ARM Limited.
 + * All rights reserved.
 + *
 + * SPDX-License-Identifier:BSD-3-Clause
 + */
 +
 +#ifndef ODP_LLQUEUE_H_
 +#define ODP_LLQUEUE_H_
 +
 +#include 
 +#include 
 +#include 
 +
 +#include 
 +#include 
 +#include 
 +
 +#include 
 +#include 
 +
 +/*****************************************************************************
 + * Linked list queues
 + ****************************************************************************/
 +
 +/* The scalar equivalent of a double pointer */
 +#if __SIZEOF_PTRDIFF_T__ == 4
 +typedef uint64_t dintptr_t;
 +#endif
 +#if __SIZEOF_PTRDIFF_T__ == 8
 +typedef __int128 dintptr_t;
 +#endif
 +
 +#define SENTINEL ((void *)~(uintptr_t)0)
 +
 +struct llnode {
 + struct llnode *next;
 +};
 +
 +union llht {
 + struct {
 + struct llnode *head, *tail;
 + } st;
 + dintptr_t ui;
 +};
 +
 +struct llqueue {
 + union llht u;
 +#ifndef ODP_CONFIG_LLDSCD
 + odp_spinlock_t lock;
 +#endif
 +};
 +
 +static inline struct llnode *llq_head(struct llqueue *llq)
 +{
 + return __atomic_load_n(&llq->u.st.head, __ATOMIC_RELAXED);
 +}
 +
 +static inline void llqueue_init(struct llqueue *llq)
 +{
 + llq->u.st.head = NULL;
 + llq->u.st.tail = NULL;
 +#ifndef ODP_CONFIG_LLDSCD
 + odp_spinlock_init(&llq->lock);
 +#endif
 +}
 +
 +#ifdef ODP_CONFIG_LLDSCD
 +
 +static inline void llq_enqueue(struct llqueue *llq, struct llnode *node)
 +{
 + union llht old, neu;
 +
 + ODP_ASSERT(node->next == NULL);
 + node->next = SENTINEL;
 + do {
 + old.ui = lld(&llq->u.ui, __ATOMIC_RELAXED);
 + neu.st.head = old.st.head == NULL ? node : old.st.head;
 + neu.st.tail = node;
 + } while (odp_unlikely(scd(&llq->u.ui, neu.ui, __ATOMIC_RELEASE)));
 + if (old.st.tail != NULL) {
 + /* List was not empty */
 + ODP_ASSERT(old.st.tail->next == SENTINEL);
 + old.st.tail->next = node;
 + }
 +}
 +
 +#else
 +
 +static inline void llq_enqueue(struct llqueue *llq, struct llnode *node)
 +{
 + ODP_ASSERT(node->next == NULL);
 + node->next = SENTINEL;
 +
 + odp_spinlock_lock(&llq->lock);
 + if (llq->u.st.head == NULL) {
 + llq->u.st.head = node;
 + llq->u.st.tail = node;
 + } else {
 + llq->u.st.tail->next = node;
 + llq->u.st.tail = node;
 + }
 + odp_spinlock_unlock(&llq->lock);
 +}
 +
 +#endif
 +
 +#ifdef ODP_CONFIG_LLDSCD
 +
 +static inline 
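
For context, a minimal usage sketch of the quoted API (the element type and
its embedding are assumptions for illustration):

	struct job {
		struct llnode node;  /* embedded link, NULL when not queued */
		int id;
	};

	static struct llqueue jobq;

	static void jobq_setup(void)
	{
		/* head = tail = NULL (plus lock init in the non-LL/SC build) */
		llqueue_init(&jobq);
	}

	static void jobq_push(struct job *j)
	{
		j->node.next = NULL;  /* llq_enqueue() asserts next == NULL */
		llq_enqueue(&jobq, &j->node);
	}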

Re: [lng-odp] [API-NEXT PATCH v2 07/16] test: odp_scheduling: Handle dequeueing from a concurrent queue

2017-04-05 Thread Ola Liljedahl
On 5 April 2017 at 14:50, Maxim Uvarov  wrote:
> On 04/05/17 06:57, Honnappa Nagarahalli wrote:
>> This can go into master/api-next as an independent patch. Agree?
>
> agree. If we accept implementation where events can be 'delayed'
Probably all platforms with HW queues.

> then it
> looks like we missed some api to sync queues.
When would those API's be used?

>
> But I do not see why we need this patch. On the same cpu the test queues 1
> event and after that dequeues 1 event:
>
> for (i = 0; i < QUEUE_ROUNDS; i++) {
>         ev = odp_buffer_to_event(buf);
>
>         if (odp_queue_enq(queue, ev)) {
>                 LOG_ERR("  [%i] Queue enqueue failed.\n", thr);
>                 odp_buffer_free(buf);
>                 return -1;
>         }
>
>         ev = odp_queue_deq(queue);
>
>         buf = odp_buffer_from_event(ev);
>
>         if (!odp_buffer_is_valid(buf)) {
>                 LOG_ERR("  [%i] Queue empty.\n", thr);
>                 return -1;
>         }
> }
>
> Where exactly can this event be delayed?
In the memory system.

>
> If other threads do the same - then all do enqueue 1 event first and
> then dequeue one event. I can understand a problem with enqueueing on one
> cpu and dequeuing on another cpu. But on the same cpu it has to always
> work. Isn't it?
No.

>
> Maxim.
>
>>
>> On 4 April 2017 at 21:22, Brian Brooks  wrote:
>>> On 04/04 17:26:12, Bill Fischofer wrote:
 On Tue, Apr 4, 2017 at 3:37 PM, Brian Brooks  wrote:
> On 04/04 21:59:15, Maxim Uvarov wrote:
>> On 04/04/17 21:47, Brian Brooks wrote:
>>> Signed-off-by: Ola Liljedahl 
>>> Reviewed-by: Brian Brooks 
>>> Reviewed-by: Honnappa Nagarahalli 
>>> Reviewed-by: Kevin Wang 
>>> ---
>>>  test/common_plat/performance/odp_scheduling.c | 12 ++--
>>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/test/common_plat/performance/odp_scheduling.c b/test/common_plat/performance/odp_scheduling.c
>>> index c74a0713..38e76257 100644
>>> --- a/test/common_plat/performance/odp_scheduling.c
>>> +++ b/test/common_plat/performance/odp_scheduling.c
>>> @@ -273,7 +273,7 @@ static int test_plain_queue(int thr, test_globals_t *globals)
>>> test_message_t *t_msg;
>>> odp_queue_t queue;
>>> uint64_t c1, c2, cycles;
>>> -   int i;
>>> +   int i, j;
>>>
>>> /* Alloc test message */
>>> buf = odp_buffer_alloc(globals->pool);
>>> @@ -307,7 +307,15 @@ static int test_plain_queue(int thr, test_globals_t *globals)
>>> return -1;
>>> }
>>>
>>> -   ev = odp_queue_deq(queue);
>>> +   /* When enqueue and dequeue are decoupled (e.g. not using a
>>> +* common lock), an enqueued event may not be immediately
>>> +* visible to dequeue. So we just try again for a while. */
>>> +   for (j = 0; j < 100; j++) {
>>
>> where 100 number comes from?
>
> It is the retry count. Perhaps it could be a bit lower, or a bit higher, but
> it works well.

 Actually, it's incorrect. What happens if all 100 retries fail? You'll
 call odp_buffer_from_event() for ODP_EVENT_INVALID, which is
 undefined.
>>>
>>> Incorrect? :) The point is that an event may not be immediately available
>>> to dequeue after it has been enqueued. This is due to the way that a
>>> concurrent ring buffer behaves in a multi-threaded environment. The
>>> approach here is just to retry the dequeue a couple times (100 times
>>> actually) before moving on to the rest of the code. Perhaps 100 times is
>>> too many times, but some amount of retry is needed.
>>>
>>> If this is not desirable, then I think it would be more accurate to
>>> consider odp_queue_enq() / odp_queue_deq() as async APIs -or- MT-unsafe
>>> (must be called from one thread at a time in order to ensure the behavior
>>> that an event is immediately available for dequeue once it has been
>>> enqueued).
>>>
>
>> Maxim.
>>
>>> +   ev = odp_queue_deq(queue);
>>> +   if (ev != ODP_EVENT_INVALID)
>>> +   break;
>>> +   odp_cpu_pause();
>>> +   }
>>>
>>> buf = odp_buffer_from_event(ev);
>>>
>>>
>>
>


Re: [lng-odp] [API-NEXT PATCH v2 13/16] Add a bitset

2017-04-05 Thread Ola Liljedahl

On 05/04/2017, 15:39, "Dmitry Eremin-Solenikov"
 wrote:

>On 05.04.2017 16:33, Ola Liljedahl wrote:
>> 
>> 
>> 
>> 
>> On 05/04/2017, 15:22, "Dmitry Eremin-Solenikov"
>>  wrote:
>> 
>>> On 05.04.2017 15:16, Ola Liljedahl wrote:
 On 05/04/2017, 12:36, "Dmitry Eremin-Solenikov"
  wrote:

> On 05.04.2017 02:31, Ola Liljedahl wrote:
>> On 05/04/2017, 01:25, "Dmitry Eremin-Solenikov"
>>  wrote:
>>> On 04.04.2017 23:52, Ola Liljedahl wrote:
 Sending from my ARM email account, I hope Outlook does not mess up
 the
 format.



 On 04/04/2017, 22:21, "Dmitry Eremin-Solenikov"
  wrote:

> On 04.04.2017 21:48, Brian Brooks wrote:
>> Signed-off-by: Ola Liljedahl 
>> Reviewed-by: Brian Brooks 
>> Reviewed-by: Honnappa Nagarahalli 
>
>>
>>
>>
>>
>> 
>> +/****************************************************************************
>> + * bitset abstract data type
>> + ***************************************************************************/
>> +/* This could be a struct of scalars to support larger bit sets */
>> +
>> +#if ATOM_BITSET_SIZE <= 32
>
> Maybe I missed, where did you set this macro?
 In odp_config_internal.h
 It is a build time configuration.

>
> Also, why do you need several versions of bitset? Can you stick
>to
> one
> size that fits all?
 Some 32-bit archs (ARMv7a, x86) will only support 64-bit atomics
 (AFAIK).
 Only x86-64 and ARMv8a supports 128-bit atomics (and compiler
 support
 for
 128-bit atomics for ARMv8a is a bit lacking…).
 Other architectures might only support 32-bit atomic operations.
>>>
>>> What will be the major outcome of settling on the 64-bit atomics?
>> The size of the bitset determines the maximum number of threads, the
>> maximum number of scheduler groups and the maximum number of reorder
>> contexts (per thread).
>
> Then even 128 can become too small in the forthcoming future. As far
>as
> I understand, most of the interesting things happen around
> bitsetting/clearing. Maybe we can redefine bitset as a struct or
>array
> of atomics? Then it would be expandable without significant software
> issues, wouldn't it?
>
> I'm trying to get away of situation where we have overcomplicated low
> level code, which brings different issues on further platforms (like
> supporting this amount of threads on ARM and that amount of threads
>on
> x86/PPC/MIPS/etc).
 I think the current implementation is simple and efficient. I also
 think it
 is sufficiently capable, e.g. supports up to 128 threads/scheduler
 groups
 etc.
>>>
>>> With 96 cores on existing boards, 128 seems quite like a close limit.
>> The limit imposed by bitset_t is the number of threads (CPU's) in one
>>ODP
>> application. It is not a platform or system limit.
>> 
>> How likely is it that all of those 96 cores will be executing the same
>>ODP
>> application?
>
>That depends on the exact customer's view.
So still completely hypothetical.

>
>> I doubt anyone wants to have a ODP app spanning more than one socket,
>> consider the inter-socket latency on current multi-socket capable SoC's.
>
>Just two sockets. Sorry. I'm starting to sound like an advertisement. I'll try
>to stop that. But '128 threads' really sounds like '640k ought to be enough
>for everybody'. Let's work toward a scalable generic solution.
We can implement a more complicated solution when there is an actual need
for that. The API's won't change, nor the ABI, due to an internal
implementation change.
This limit is not an architectural limit so cannot be compared to the 640K
limit of old PC's.
If you think they are equivalent you need to think harder.


>
 on 64-bit ARM and x86, up to 64 on 32-bit ARM/x86 and 64-bit MIPS. I
 don't
 think we should make a more complicated generic implementation until
the
 need has surfaced. It is easy to over-speculate in what will be
 required in
 the future and implement stuff that is never used.
>>>
>>> It is already overcomplicated.
>> What do you think is overcomplicated? I think the code is very simple.
>> Only one or two functions have more than one line of C code in them.
>> 
>>> It is a nice scientific solution,
>> "scientific"?

Re: [lng-odp] [API-NEXT PATCH v2 13/16] Add a bitset

2017-04-05 Thread Dmitry Eremin-Solenikov
On 05.04.2017 16:33, Ola Liljedahl wrote:
> 
> 
> 
> 
> On 05/04/2017, 15:22, "Dmitry Eremin-Solenikov"
>  wrote:
> 
>> On 05.04.2017 15:16, Ola Liljedahl wrote:
>>> On 05/04/2017, 12:36, "Dmitry Eremin-Solenikov"
>>>  wrote:
>>>
 On 05.04.2017 02:31, Ola Liljedahl wrote:
> On 05/04/2017, 01:25, "Dmitry Eremin-Solenikov"
>  wrote:
>> On 04.04.2017 23:52, Ola Liljedahl wrote:
>>> Sending from my ARM email account, I hope Outlook does not mess up
>>> the
>>> format.
>>>
>>>
>>>
>>> On 04/04/2017, 22:21, "Dmitry Eremin-Solenikov"
>>>  wrote:
>>>
 On 04.04.2017 21:48, Brian Brooks wrote:
> Signed-off-by: Ola Liljedahl 
> Reviewed-by: Brian Brooks 
> Reviewed-by: Honnappa Nagarahalli 

>
>
>
>
> +/****************************************************************************
> + * bitset abstract data type
> + ***************************************************************************/
> +/* This could be a struct of scalars to support larger bit sets */
> +
> +#if ATOM_BITSET_SIZE <= 32

 Maybe I missed, where did you set this macro?
>>> In odp_config_internal.h
>>> It is a build time configuration.
>>>

 Also, why do you need several versions of bitset? Can you stick to
 one
 size that fits all?
>>> Some 32-bit archs (ARMv7a, x86) will only support 64-bit atomics
>>> (AFAIK).
>>> Only x86-64 and ARMv8a supports 128-bit atomics (and compiler
>>> support
>>> for
>>> 128-bit atomics for ARMv8a is a bit lacking…).
>>> Other architectures might only support 32-bit atomic operations.
>>
>> What will be the major outcome of settling on the 64-bit atomics?
> The size of the bitset determines the maximum number of threads, the
> maximum number of scheduler groups and the maximum number of reorder
> contexts (per thread).

 Then even 128 can become too small in the forthcoming future. As far as
 I understand, most of the interesting things happen around
 bitsetting/clearing. Maybe we can redefine bitset as a struct or array
 of atomics? Then it would be expandable without significant software
 issues, wouldn't it?

 I'm trying to get away of situation where we have overcomplicated low
 level code, which brings different issues on further platforms (like
 supporting this amount of threads on ARM and that amount of threads on
 x86/PPC/MIPS/etc).
>>> I think the current implementation is simple and efficient. I also
>>> think it
>>> is sufficiently capable, e.g. supports up to 128 threads/scheduler
>>> groups
>>> etc.
>>
>> With 96 cores on existing boards, 128 seems quite like a close limit.
> The limit imposed by bitset_t is the number of threads (CPU's) in one ODP
> application. It is not a platform or system limit.
> 
> How likely is it that all of those 96 cores will be executing the same ODP
> application?

That depends on the exact customer's view.

> I doubt anyone wants to have a ODP app spanning more than one socket,
> consider the inter-socket latency on current multi-socket capable SoC's.

Just two sockets. Sorry. I'm starting to sound like an advertisement. I'll try
to stop that. But '128 threads' really sounds like '640k ought to be enough
for everybody'. Let's work toward a scalable generic solution.

>>> on 64-bit ARM and x86, up to 64 on 32-bit ARM/x86 and 64-bit MIPS. I
>>> don't
>>> think we should make a more complicated generic implementation until the
>>> need has surfaced. It is easy to over-speculate in what will be
>>> required in
>>> the future and implement stuff that is never used.
>>
>> It is already overcomplicated.
> What do you think is overcomplicated? I think the code is very simple.
> Only one or two functions have more than one line of C code in them.
> 
>> It is a nice scientific solution,
> "scientific"?

Yep. I used the same word when we discussed ipfrag reassembly. You have
nice, fast, and ideal solutions, but they are hard for other people to
understand and maintain. Thus I'm asking for an understandable generic C
solution, which can then be further optimized by your code.


-- 
With best wishes
Dmitry
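
For concreteness, the expandable alternative Dmitry is arguing for might look
like this sketch (MAX_THREADS is a placeholder; note that splitting the bitset
across words gives up the single-word atomic snapshot of the whole set):

	#include <stdint.h>

	#define BITSET_WORDS ((MAX_THREADS + 63) / 64)

	/* Bitset as an array of atomically updated 64-bit words: grows with
	 * MAX_THREADS without requiring 128-bit hardware atomics. */
	typedef struct {
		uint64_t w[BITSET_WORDS];
	} bitset_arr_t;

	static inline void bitset_arr_set(bitset_arr_t *bs, uint32_t n)
	{
		__atomic_fetch_or(&bs->w[n / 64], UINT64_C(1) << (n % 64),
				  __ATOMIC_RELAXED);
	}

	static inline void bitset_arr_clr(bitset_arr_t *bs, uint32_t n)
	{
		__atomic_fetch_and(&bs->w[n / 64], ~(UINT64_C(1) << (n % 64)),
				   __ATOMIC_RELAXED);
	}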


Re: [lng-odp] [API-NEXT PATCH v2 13/16] Add a bitset

2017-04-05 Thread Ola Liljedahl




On 05/04/2017, 15:22, "Dmitry Eremin-Solenikov"
 wrote:

>On 05.04.2017 15:16, Ola Liljedahl wrote:
>> On 05/04/2017, 12:36, "Dmitry Eremin-Solenikov"
>>  wrote:
>> 
>>> On 05.04.2017 02:31, Ola Liljedahl wrote:
 On 05/04/2017, 01:25, "Dmitry Eremin-Solenikov"
  wrote:
> On 04.04.2017 23:52, Ola Liljedahl wrote:
>> Sending from my ARM email account, I hope Outlook does not mess up
>>the
>> format.
>>
>>
>>
>> On 04/04/2017, 22:21, "Dmitry Eremin-Solenikov"
>>  wrote:
>>
>>> On 04.04.2017 21:48, Brian Brooks wrote:
 Signed-off-by: Ola Liljedahl 
 Reviewed-by: Brian Brooks 
 Reviewed-by: Honnappa Nagarahalli 
>>>



 
 +/****************************************************************************
 + * bitset abstract data type
 + ***************************************************************************/
 +/* This could be a struct of scalars to support larger bit sets */
 +
 +#if ATOM_BITSET_SIZE <= 32
>>>
>>> Maybe I missed, where did you set this macro?
>> In odp_config_internal.h
>> It is a build time configuration.
>>
>>>
>>> Also, why do you need several versions of bitset? Can you stick to
>>> one
>>> size that fits all?
>> Some 32-bit archs (ARMv7a, x86) will only support 64-bit atomics
>> (AFAIK).
>> Only x86-64 and ARMv8a supports 128-bit atomics (and compiler
>>support
>> for
>> 128-bit atomics for ARMv8a is a bit lacking…).
>> Other architectures might only support 32-bit atomic operations.
>
> What will be the major outcome of settling on the 64-bit atomics?
 The size of the bitset determines the maximum number of threads, the
 maximum number of scheduler groups and the maximum number of reorder
 contexts (per thread).
>>>
>>> Then even 128 can become too small in the forthcoming future. As far as
>>> I understand, most of the interesting things happen around
>>> bitsetting/clearing. Maybe we can redefine bitset as a struct or array
>>> of atomics? Then it would be expandable without significant software
>>> issues, wouldn't it?
>>>
>>> I'm trying to get away of situation where we have overcomplicated low
>>> level code, which brings different issues on further platforms (like
>>> supporting this amount of threads on ARM and that amount of threads on
>>> x86/PPC/MIPS/etc).
>> I think the current implementation is simple and efficient. I also
>>think it
>> is sufficiently capable, e.g. supports up to 128 threads/scheduler
>>groups
>> etc.
>
>With 96 cores on existing boards, 128 seems quite like a close limit.
The limit imposed by bitset_t is the number of threads (CPU's) in one ODP
application. It is not a platform or system limit.

How likely is it that all of those 96 cores will be executing the same ODP
application?
I doubt anyone wants to have a ODP app spanning more than one socket,
consider the inter-socket latency on current multi-socket capable SoC's.

>
>> on 64-bit ARM and x86, up to 64 on 32-bit ARM/x86 and 64-bit MIPS. I
>>don't
>> think we should make a more complicated generic implementation until the
>> need has surfaced. It is easy to over-speculate in what will be
>>required in
>> the future and implement stuff that is never used.
>
>It is already overcomplicated.
What do you think is overcomplicated? I think the code is very simple.
Only one or two functions have more than one line of C code in them.

> It is a nice scientific solution,
"scientific"?

> it
>might be high performance, but it is a bit too complicated for generic
>code.
What is "too complicated" and what simpler solution do you suggest instead?

>I have the feeling that it can find a path in odp-cloud, but for
>odp/linux-generic we need (IMO) initially simple code.
Then we shouldn't add the scalable scheduler to linux-generic, too
complicated.

>
>-- 
>With best wishes
>Dmitry



Re: [lng-odp] [API-NEXT PATCH v2 13/16] Add a bitset

2017-04-05 Thread Dmitry Eremin-Solenikov
On 05.04.2017 15:16, Ola Liljedahl wrote:
> On 05/04/2017, 12:36, "Dmitry Eremin-Solenikov"
>  wrote:
> 
>> On 05.04.2017 02:31, Ola Liljedahl wrote:
>>> On 05/04/2017, 01:25, "Dmitry Eremin-Solenikov"
>>>  wrote:
 On 04.04.2017 23:52, Ola Liljedahl wrote:
> Sending from my ARM email account, I hope Outlook does not mess up the
> format.
>
>
>
> On 04/04/2017, 22:21, "Dmitry Eremin-Solenikov"
>  wrote:
>
>> On 04.04.2017 21:48, Brian Brooks wrote:
>>> Signed-off-by: Ola Liljedahl 
>>> Reviewed-by: Brian Brooks 
>>> Reviewed-by: Honnappa Nagarahalli 
>>
>>>
>>>
>>>
>>> +/***
>>> **
>>> **
>>> ***
>>> + * bitset abstract data type
>>> +
>>>
>>>
>>> *
>>> **
>>> **
>>> /
>>> +/* This could be a struct of scalars to support larger bit sets */
>>> +
>>> +#if ATOM_BITSET_SIZE <= 32
>>
>> Maybe I missed, where did you set this macro?
> In odp_config_internal.h
> It is a build time configuration.
>
>>
>> Also, why do you need several versions of bitset? Can you stick to
>> one
>> size that fits all?
> Some 32-bit archs (ARMv7a, x86) will only support 64-bit atomics
> (AFAIK).
> Only x86-64 and ARMv8a supports 128-bit atomics (and compiler support
> for
> 128-bit atomics for ARMv8a is a bit lacking…).
> Other architectures might only support 32-bit atomic operations.

 What will be the major outcome of settling on the 64-bit atomics?
>>> The size of the bitset determines the maximum number of threads, the
>>> maximum number of scheduler groups and the maximum number of reorder
>>> contexts (per thread).
>>
>> Then even 128 can become too small in the forthcoming future. As far as
>> I understand, most of the interesting things happen around
>> bitsetting/clearing. Maybe we can redefine bitset as a struct or array
>> of atomics? Then it would be expandable without significant software
>> issues, wouldn't it?
>>
>> I'm trying to get away of situation where we have overcomplicated low
>> level code, which brings different issues on further platforms (like
>> supporting this amount of threads on ARM and that amount of threads on
>> x86/PPC/MIPS/etc).
> I think the current implementation is simple and efficient. I also think it
> is sufficiently capable, e.g. supports up to 128 threads/scheduler groups
> etc.

With 96 cores on existing boards, 128 seems quite like a close limit.

> on 64-bit ARM and x86, up to 64 on 32-bit ARM/x86 and 64-bit MIPS. I don't
> think we should make a more complicated generic implementation until the
> need has surfaced. It is easy to over-speculate in what will be required in
> the future and implement stuff that is never used.

It is already overcomplicated. It is a nice scientific solution, and it
might be high performance, but it is a bit too complicated for generic
code. I have the feeling that it can find a path in odp-cloud, but for
odp/linux-generic we need (IMO) initially simple code.


-- 
With best wishes
Dmitry


Re: [lng-odp] [API-NEXT PATCH v2 07/16] test: odp_scheduling: Handle dequeueing from a concurrent queue

2017-04-05 Thread Maxim Uvarov
On 04/05/17 06:57, Honnappa Nagarahalli wrote:
> This can go into master/api-next as an independent patch. Agree?

Agree. If we accept an implementation where events can be 'delayed', then it
looks like we missed some API to sync queues.

But I do not see why we need this patch. On the same cpu the test queues 1
event and after that dequeues 1 event:

for (i = 0; i < QUEUE_ROUNDS; i++) {
        ev = odp_buffer_to_event(buf);

        if (odp_queue_enq(queue, ev)) {
                LOG_ERR("  [%i] Queue enqueue failed.\n", thr);
                odp_buffer_free(buf);
                return -1;
        }

        ev = odp_queue_deq(queue);

        buf = odp_buffer_from_event(ev);

        if (!odp_buffer_is_valid(buf)) {
                LOG_ERR("  [%i] Queue empty.\n", thr);
                return -1;
        }
}

Where exactly can this event be delayed?

If other threads do the same - then all do enqueue 1 event first and
then dequeue one event. I can understand a problem with enqueueing on one
cpu and dequeuing on another cpu. But on the same cpu it has to always
work. Isn't it?

Maxim.

> 
> On 4 April 2017 at 21:22, Brian Brooks  wrote:
>> On 04/04 17:26:12, Bill Fischofer wrote:
>>> On Tue, Apr 4, 2017 at 3:37 PM, Brian Brooks  wrote:
 On 04/04 21:59:15, Maxim Uvarov wrote:
> On 04/04/17 21:47, Brian Brooks wrote:
>> Signed-off-by: Ola Liljedahl 
>> Reviewed-by: Brian Brooks 
>> Reviewed-by: Honnappa Nagarahalli 
>> Reviewed-by: Kevin Wang 
>> ---
>>  test/common_plat/performance/odp_scheduling.c | 12 ++--
>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/test/common_plat/performance/odp_scheduling.c b/test/common_plat/performance/odp_scheduling.c
>> index c74a0713..38e76257 100644
>> --- a/test/common_plat/performance/odp_scheduling.c
>> +++ b/test/common_plat/performance/odp_scheduling.c
>> @@ -273,7 +273,7 @@ static int test_plain_queue(int thr, test_globals_t *globals)
>> test_message_t *t_msg;
>> odp_queue_t queue;
>> uint64_t c1, c2, cycles;
>> -   int i;
>> +   int i, j;
>>
>> /* Alloc test message */
>> buf = odp_buffer_alloc(globals->pool);
>> @@ -307,7 +307,15 @@ static int test_plain_queue(int thr, test_globals_t *globals)
>> return -1;
>> }
>>
>> -   ev = odp_queue_deq(queue);
>> +   /* When enqueue and dequeue are decoupled (e.g. not using a
>> +* common lock), an enqueued event may not be immediately
>> +* visible to dequeue. So we just try again for a while. */
>> +   for (j = 0; j < 100; j++) {
>
> where 100 number comes from?

 It is the retry count. Perhaps it could be a bit lower, or a bit higher, 
 but
 it works well.
>>>
>>> Actually, it's incorrect. What happens if all 100 retries fail? You'll
>>> call odp_buffer_from_event() for ODP_EVENT_INVALID, which is
>>> undefined.
>>
>> Incorrect? :) The point is that an event may not be immediately available
>> to dequeue after it has been enqueued. This is due to the way that a 
>> concurrent
>> ring buffer behaves in a multi-threaded environment. The approach here is
>> just to retry the dequeue a couple of times (100 times actually) before moving
>> on to the rest of the code. Perhaps 100 times is too many, but some amount
>> of retry is needed.
>>
>> If this is not desirable, then I think it would be more accurate to consider
>> odp_queue_enq() / odp_queue_deq() as async APIs -or- MT-unsafe (must be 
>> called
>> from one thread at a time in order to ensure the behavior that an event is
>> immediately available for dequeue once it has been enqueued).
>>

> Maxim.
>
>> +   ev = odp_queue_deq(queue);
>> +   if (ev != ODP_EVENT_INVALID)
>> +   break;
>> +   odp_cpu_pause();
>> +   }
>>
>> buf = odp_buffer_from_event(ev);
>>
>>
>
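
For reference, a shape that keeps the bounded retry but also covers the case
Bill raises (all retries failing) could look like the sketch below. The retry
bound DEQ_RETRIES and the error handling are illustrative assumptions, not the
posted patch; thr, queue and buf are assumed to be in scope as in the test
code above.

#define DEQ_RETRIES 100	/* illustrative bound, not from the patch */

	ev = ODP_EVENT_INVALID;
	for (j = 0; j < DEQ_RETRIES; j++) {
		ev = odp_queue_deq(queue);
		if (ev != ODP_EVENT_INVALID)
			break;
		odp_cpu_pause();
	}
	if (ev == ODP_EVENT_INVALID) {
		/* Never pass an invalid event to odp_buffer_from_event() */
		LOG_ERR("  [%i] Queue empty after retries.\n", thr);
		return -1;
	}
	buf = odp_buffer_from_event(ev);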



[lng-odp] CRC/Adler requirement in comp interface

2017-04-05 Thread Verma, Shally
From yesterday's meeting minutes, I see a note with this feedback on compression:
Consider adding additional "hashes" (e.g., CRC, Adler)

As we mentioned, the comp interface does not provide CRC. Also, Adler comes as
output of the zlib format, and CRC is available through helper functions. So is
there any identified use case where the user needs Adler as an explicit
algorithm in the compression interface?

Thanks
Shally
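
For what it's worth, if CRC stays outside the comp interface, an application
can compute it over the (de)compressed output with the existing hash API. A
minimal sketch, assuming odp_hash_crc32c() from odp/api/hash.h and a single
contiguous output buffer:

#include <odp_api.h>

/* CRC-32C over one contiguous output segment */
static uint32_t comp_output_crc(const void *data, uint32_t len)
{
	return odp_hash_crc32c(data, len, 0);
}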



Re: [lng-odp] [API-NEXT PATCH v2 13/16] Add a bitset

2017-04-05 Thread Maxim Uvarov
On 04/05/17 15:16, Ola Liljedahl wrote:
> On 05/04/2017, 12:36, "Dmitry Eremin-Solenikov"
>  wrote:
> 
>> On 05.04.2017 02:31, Ola Liljedahl wrote:
>>> On 05/04/2017, 01:25, "Dmitry Eremin-Solenikov"
>>>  wrote:
 On 04.04.2017 23:52, Ola Liljedahl wrote:
> Sending from my ARM email account, I hope Outlook does not mess up the
> format.
>
>
>
> On 04/04/2017, 22:21, "Dmitry Eremin-Solenikov"
>  wrote:
>
>> On 04.04.2017 21:48, Brian Brooks wrote:
>>> Signed-off-by: Ola Liljedahl 
>>> Reviewed-by: Brian Brooks 
>>> Reviewed-by: Honnappa Nagarahalli 
>>
>>>
>>>
>>>
>>> +/******************************************************************************
>>> + * bitset abstract data type
>>> + *****************************************************************************/
>>> +/* This could be a struct of scalars to support larger bit sets */
>>> +
>>> +#if ATOM_BITSET_SIZE <= 32
>>
>> Maybe I missed it: where did you set this macro?
> In odp_config_internal.h
> It is a build time configuration.
>
>>
>> Also, why do you need several versions of bitset? Can you stick to
>> one
>> size that fits all?
> Some 32-bit archs (ARMv7a, x86) will only support 64-bit atomics
> (AFAIK).
> Only x86-64 and ARMv8a support 128-bit atomics (and compiler support
> for
> 128-bit atomics for ARMv8a is a bit lacking…).
> Other architectures might only support 32-bit atomic operations.

 What will be the major outcome of settling on the 64-bit atomics?
>>> The size of the bitset determines the maximum number of threads, the
>>> maximum number of scheduler groups and the maximum number of reorder
>>> contexts (per thread).
>>
>> Then even 128 can become too small in the near future. As far as
>> I understand, most of the interesting things happen around
>> bitsetting/clearing. Maybe we can redefine bitset as a struct or array
>> of atomics? Then it would be expandable without significant software
>> issues, wouldn't it?
>>

Why is odp_cpu_mask_t not used for that case?

Maxim.


>> I'm trying to get away from a situation where we have overcomplicated
>> low-level code, which brings different issues on other platforms (like
>> supporting this number of threads on ARM and that number of threads on
>> x86/PPC/MIPS/etc.).
> I think the current implementation is simple and efficient. I also think it
> is sufficiently capable, e.g. supports up to 128 threads/scheduler groups
> etc.
> on 64-bit ARM and x86, up to 64 on 32-bit ARM/x86 and 64-bit MIPS. I don't
> think we should make a more complicated generic implementation until the
> need has surfaced. It is easy to over-speculate about what will be required in
> the future and implement stuff that is never used.
> 
>>
> I think the user should have control over this but if you think that
> we
> should just select the max value that is supported by the architecture
> in
> question and thus skip one build configuration, I am open to this. We
> will
> still need separate versions for 32/64/128 bits because there are
> slight
> differences in the syntax and implementation. Such are the vagaries of
> the
> C standard (and GCC extensions).
>
>
>> Any real reason for the following defines? Why do you need them?
> The functions were added as they were needed, e.g. in
> odp_schedule_scalable.c.
> I don't think there is any that is no longer used, but I can
> double-check that.

Well, maybe I should rephrase my question: why do you think that it's
better to have bitset_andn(a, b) rather than just a & ~b?
>>> The atomic bitset is an abstract data type. The implementation does not
>>> have to use a scalar word. Alternative implementation paths exist, e.g.
>>> use a struct with multiple words and perform the requested operation one
>>> word at a time (this is OK but perhaps not well documented).
>>
>> This makes sense, esp. if we add non-plain-integer bitsets.
> One note on using a struct with multiple words is that this will/might in
> some cases
> require multiple atomic operations (one per word) and this will be slower.
> 
>>
>>
>> -- 
>> With best wishes
>> Dmitry
> 
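
As a concrete illustration of the abstract-data-type point above, a
scalar-word variant might look like the following sketch (the names and the
64-bit size choice are illustrative, not the posted code):

#include <stdint.h>

typedef uint64_t bitset_t;	/* e.g. when ATOM_BITSET_SIZE == 64 */

static inline bitset_t bitset_mask(uint32_t bit)
{
	return UINT64_C(1) << bit;
}

/* Atomically set one bit in a shared bitset */
static inline void atom_bitset_set(bitset_t *bs, uint32_t bit, int mo)
{
	(void)__atomic_fetch_or(bs, bitset_mask(bit), mo);
}

/* Atomically clear one bit in a shared bitset */
static inline void atom_bitset_clr(bitset_t *bs, uint32_t bit, int mo)
{
	(void)__atomic_fetch_and(bs, ~bitset_mask(bit), mo);
}

/* Trivial for a scalar word, but keeping it behind a function leaves room
 * for a multi-word struct implementation later */
static inline bitset_t bitset_andn(bitset_t a, bitset_t b)
{
	return a & ~b;
}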



Re: [lng-odp] [API-NEXT PATCH v2 15/16] Add llqueue, an unbounded concurrent queue

2017-04-05 Thread Maxim Uvarov
On 04/05/17 01:46, Ola Liljedahl wrote:
> On 4 April 2017 at 21:25, Maxim Uvarov  wrote:
>> it's better to have 2 separate files for that. One for ODP_CONFIG_LLDSCD
> "better"? In what way?
> 
>> defined and one for not. Also, the ODP_ prefix should not be used for
>> internal things (not API).
> OK this was not clear, some of the defines in odp_config_internal.h
> use an ODP_ prefix, some not. You mean there is a system to that?
> 
> Shouldn't those defines that are part of the API be declared/described
> in the API header files (located in include/odp/api/spec)? How else do
> you know that they are part of the API? And if they are part of the
> API, how does the application (the 'A' in API) access the definitions
> *and* their values?
> 
> There are API's for querying about things like total number of queues
> but those API's are separate and do not depend on some define with a
> specific name.
> 

Those are not API settings. They are linux-generic internal settings. ODP apps
do not use those values.

Maxim.

>>
>> Maxim.
>>
>> On 04/04/17 21:48, Brian Brooks wrote:
>>> Signed-off-by: Ola Liljedahl 
>>> Reviewed-by: Brian Brooks 
>>> ---
>>>  platform/linux-generic/include/odp_llqueue.h | 285 
>>> +++
>>>  1 file changed, 285 insertions(+)
>>>  create mode 100644 platform/linux-generic/include/odp_llqueue.h
>>>
>>> diff --git a/platform/linux-generic/include/odp_llqueue.h 
>>> b/platform/linux-generic/include/odp_llqueue.h
>>> new file mode 100644
>>> index ..aa46ace3
>>> --- /dev/null
>>> +++ b/platform/linux-generic/include/odp_llqueue.h
>>> @@ -0,0 +1,285 @@
>>> +/* Copyright (c) 2017, ARM Limited.
>>> + * All rights reserved.
>>> + *
>>> + * SPDX-License-Identifier: BSD-3-Clause
>>> + */
>>> +
>>> +#ifndef ODP_LLQUEUE_H_
>>> +#define ODP_LLQUEUE_H_
>>> +
>>> +#include 
>>> +#include 
>>> +#include 
>>> +
>>> +#include 
>>> +#include 
>>> +#include 
>>> +
>>> +#include 
>>> +#include 
>>> +
>>> +/******************************************************************************
>>> + * Linked list queues
>>> + *****************************************************************************/
>>> +
>>> +/* The scalar equivalent of a double pointer */
>>> +#if __SIZEOF_PTRDIFF_T__ == 4
>>> +typedef uint64_t dintptr_t;
>>> +#endif
>>> +#if __SIZEOF_PTRDIFF_T__ == 8
>>> +typedef __int128 dintptr_t;
>>> +#endif
>>> +
>>> +#define SENTINEL ((void *)~(uintptr_t)0)
>>> +
>>> +struct llnode {
>>> + struct llnode *next;
>>> +};
>>> +
>>> +union llht {
>>> + struct {
>>> + struct llnode *head, *tail;
>>> + } st;
>>> + dintptr_t ui;
>>> +};
>>> +
>>> +struct llqueue {
>>> + union llht u;
>>> +#ifndef ODP_CONFIG_LLDSCD
>>> + odp_spinlock_t lock;
>>> +#endif
>>> +};
>>> +
>>> +static inline struct llnode *llq_head(struct llqueue *llq)
>>> +{
>>> + return __atomic_load_n(&llq->u.st.head, __ATOMIC_RELAXED);
>>> +}
>>> +
>>> +static inline void llqueue_init(struct llqueue *llq)
>>> +{
>>> + llq->u.st.head = NULL;
>>> + llq->u.st.tail = NULL;
>>> +#ifndef ODP_CONFIG_LLDSCD
>>> + odp_spinlock_init(&llq->lock);
>>> +#endif
>>> +}
>>> +
>>> +#ifdef ODP_CONFIG_LLDSCD
>>> +
>>> +static inline void llq_enqueue(struct llqueue *llq, struct llnode *node)
>>> +{
>>> + union llht old, neu;
>>> +
>>> + ODP_ASSERT(node->next == NULL);
>>> + node->next = SENTINEL;
>>> + do {
>>> + old.ui = lld(&llq->u.ui, __ATOMIC_RELAXED);
>>> + neu.st.head = old.st.head == NULL ? node : old.st.head;
>>> + neu.st.tail = node;
>>> + } while (odp_unlikely(scd(&llq->u.ui, neu.ui, __ATOMIC_RELEASE)));
>>> + if (old.st.tail != NULL) {
>>> + /* List was not empty */
>>> + ODP_ASSERT(old.st.tail->next == SENTINEL);
>>> + old.st.tail->next = node;
>>> + }
>>> +}
>>> +
>>> +#else
>>> +
>>> +static inline void llq_enqueue(struct llqueue *llq, struct llnode *node)
>>> +{
>>> + ODP_ASSERT(node->next == NULL);
>>> + node->next = SENTINEL;
>>> +
>>> + odp_spinlock_lock(&llq->lock);
>>> + if (llq->u.st.head == NULL) {
>>> + llq->u.st.head = node;
>>> + llq->u.st.tail = node;
>>> + } else {
>>> + llq->u.st.tail->next = node;
>>> + llq->u.st.tail = node;
>>> + }
>>> + odp_spinlock_unlock(&llq->lock);
>>> +}
>>> +
>>> +#endif
>>> +
>>> +#ifdef ODP_CONFIG_LLDSCD
>>> +
>>> +static inline struct llnode *llq_dequeue(struct llqueue *llq)
>>> +{
>>> + struct llnode *head;
>>> + union llht old, neu;
>>> +
>>> + /* llq_dequeue() may be used in a busy-waiting fashion
>>> +  * Read head using plain load to avoid disturbing remote LL/SC
>>> +  */
>>> + head = __atomic_load_n(&llq->u.st.head, __ATOMIC_ACQUIRE);
>>> + if (head == NULL)
>>> + return NULL;
>>> + /* Read head->next before LL to minimize cache miss latency
>>> +  * in LL/SC below
>>> +   
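
The quoted patch is cut off above. For context, the lock-based dequeue that
mirrors the spinlock enqueue fallback could look roughly like this (a sketch
consistent with the quoted code, not necessarily the posted implementation):

static inline struct llnode *llq_dequeue(struct llqueue *llq)
{
	struct llnode *head;

	odp_spinlock_lock(&llq->lock);
	head = llq->u.st.head;
	if (head == NULL) {
		odp_spinlock_unlock(&llq->lock);
		return NULL;
	}
	if (head->next == SENTINEL) {
		/* 'head' was also the tail: the list becomes empty */
		llq->u.st.head = NULL;
		llq->u.st.tail = NULL;
	} else {
		llq->u.st.head = head->next;
	}
	head->next = NULL;	/* enqueue asserts node->next == NULL */
	odp_spinlock_unlock(&llq->lock);
	return head;
}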

Re: [lng-odp] [API-NEXT PATCH v2 13/16] Add a bitset

2017-04-05 Thread Ola Liljedahl
On 05/04/2017, 12:36, "Dmitry Eremin-Solenikov"
 wrote:

>On 05.04.2017 02:31, Ola Liljedahl wrote:
>> On 05/04/2017, 01:25, "Dmitry Eremin-Solenikov"
>>  wrote:
>>> On 04.04.2017 23:52, Ola Liljedahl wrote:
 Sending from my ARM email account, I hope Outlook does not mess up the
 format.



 On 04/04/2017, 22:21, "Dmitry Eremin-Solenikov"
  wrote:

> On 04.04.2017 21:48, Brian Brooks wrote:
>> Signed-off-by: Ola Liljedahl 
>> Reviewed-by: Brian Brooks 
>> Reviewed-by: Honnappa Nagarahalli 
>
>>
>>
>> 
>> +/******************************************************************************
>> + * bitset abstract data type
>> + *****************************************************************************/
>> +/* This could be a struct of scalars to support larger bit sets */
>> +
>> +#if ATOM_BITSET_SIZE <= 32
>
> Maybe I missed it: where did you set this macro?
 In odp_config_internal.h
 It is a build time configuration.

>
> Also, why do you need several versions of bitset? Can you stick to
>one
> size that fits all?
 Some 32-bit archs (ARMv7a, x86) will only support 64-bit atomics
 (AFAIK).
Only x86-64 and ARMv8a support 128-bit atomics (and compiler support
for
128-bit atomics for ARMv8a is a bit lacking…).
 Other architectures might only support 32-bit atomic operations.
>>>
>>> What will be the major outcome of settling on the 64-bit atomics?
>> The size of the bitset determines the maximum number of threads, the
>> maximum number of scheduler groups and the maximum number of reorder
>> contexts (per thread).
>
>Then even 128 can become too small in the near future. As far as
>I understand, most of the interesting things happen around
>bitsetting/clearing. Maybe we can redefine bitset as a struct or array
>of atomics? Then it would be expandable without significant software
>issues, wouldn't it?
>
>I'm trying to get away from a situation where we have overcomplicated
>low-level code, which brings different issues on other platforms (like
>supporting this number of threads on ARM and that number of threads on
>x86/PPC/MIPS/etc.).
I think the current implementation is simple and efficient. I also think it
is sufficiently capable, e.g. supports up to 128 threads/scheduler groups
etc.
on 64-bit ARM and x86, up to 64 on 32-bit ARM/x86 and 64-bit MIPS. I don't
think we should make a more complicated generic implementation until the
need has surfaced. It is easy to over-speculate about what will be required in
the future and implement stuff that is never used.

>
 I think the user should have control over this but if you think that
we
 should just select the max value that is supported by the architecture
 in
 question and thus skip one build configuration, I am open to this. We
 will
 still need separate versions for 32/64/128 bits because there are
slight
 differences in the syntax and implementation. Such are the vagaries of
 the
 C standard (and GCC extensions).


> Any real reason for the following defines? Why do you need them?
 The functions were added as they were needed, e.g. in
 odp_schedule_scalable.c.
I don't think there is any that is no longer used, but I can
double-check that.
>>>
>> Well, maybe I should rephrase my question: why do you think that it's
>> better to have bitset_andn(a, b) rather than just a & ~b?
>> The atomic bitset is an abstract data type. The implementation does not
>> have to use a scalar word. Alternative implementation paths exist, e.g.
>> use a struct with multiple words and perform the requested operation one
>> word at a time (this is OK but perhaps not well documented).
>
>This makes sense, esp. if we add non-plain-integer bitsets.
One note on using a struct with multiple words is that this will/might in
some cases
require multiple atomic operations (one per word) and this will be slower.

>
>
>-- 
>With best wishes
>Dmitry
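
To make the multi-word trade-off above concrete: a two-word 128-bit variant
would update each word with its own atomic operation, so bits in different
words are no longer modified as a single atomic unit (an illustrative sketch,
not proposed code):

#include <stdint.h>

typedef struct {
	uint64_t w[2];	/* 128 bits as two words */
} bitset128_t;

static inline void atom_bitset128_set(bitset128_t *bs, uint32_t bit, int mo)
{
	/* One atomic RMW on the word holding 'bit'; the other word is not
	 * covered by this operation */
	(void)__atomic_fetch_or(&bs->w[bit / 64], UINT64_C(1) << (bit % 64),
				mo);
}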



Re: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software scheduler

2017-04-05 Thread Bill Fischofer
Environment is Ubuntu 16.10.

On Wed, Apr 5, 2017 at 7:03 AM, Bill Fischofer
 wrote:
> This is running on my desktop x86:
>
> ./bootstrap
> ./configure --enable-schedule-scalable --enable-cunit-support
> make
> cd test/common_plat/validation/api/scheduler
> ./scheduler_main
>
> On Tue, Apr 4, 2017 at 10:24 PM, Honnappa Nagarahalli
>  wrote:
>> On 4 April 2017 at 16:12, Bill Fischofer  wrote:
>>> When I configure and compile this without --enable-schedule-scalable the
>>> scheduler validation test runs normally, however if I enable the new
>>> scheduler I get this output:
>>>
>>>
>>> ...
>>>  CUnit - A unit testing framework for C - Version 2.1-3
>>>  http://cunit.sourceforge.net/
>>>
>>> _fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:463:handle_request():FD table full
>>>
>>> ...lots more lines like this
>>>
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>>
>>> Suite: Scheduler
>>>   Test: scheduler_test_wait_time
>>> ..._fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> 1..2..3..4..5.._fdserver.c:342:_odp_fdserver_deregister_fd():fd
>>> de-registration failure
>>> passed
>>>   Test: scheduler_test_num_prio ...passed
>>>   Test: scheduler_test_queue_destroy
>>> ..._fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>>> _fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>>> _fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>>> passed
>>>   Test: scheduler_test_groups ..._fdserver.c:463:handle_request():FD table 
>>> full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>>> _fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:463:handle_request():FD table full
>>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>>>
>>> These messages repeat throughout the test even though it "passes".
>>> Clearly something isn't right.
>>
>> We have done a considerable amount of testing on x86 as well as ARM with
>> different schedulers.
>> Can you provide more details?
>> What is the config command you used?
>> What platform (x86 vs ARM)?
>> I assume you are running 'make check'.
>>
>>>
>>> On Tue, Apr 4, 2017 at 1:47 PM, Brian Brooks  wrote:
 This work derives from Ola Liljedahl's prototype [1] which introduced a
 scalable scheduler design based on primarily lock-free 

Re: [lng-odp] [API-NEXT PATCH v2 00/16] A scalable software scheduler

2017-04-05 Thread Bill Fischofer
This is running on my desktop x86:

./bootstrap
./configure --enable-schedule-scalable --enable-cunit-support
make
cd test/common_plat/validation/api/scheduler
./scheduler_main

On Tue, Apr 4, 2017 at 10:24 PM, Honnappa Nagarahalli
 wrote:
> On 4 April 2017 at 16:12, Bill Fischofer  wrote:
>> When I configure and compile this without --enable-schedule-scalable the
>> scheduler validation test runs normally, however if I enable the new
>> scheduler I get this output:
>>
>>
>> ...
>>  CUnit - A unit testing framework for C - Version 2.1-3
>>  http://cunit.sourceforge.net/
>>
>> _fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:463:handle_request():FD table full
>>
>> ...lots more lines like this
>>
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>>
>> Suite: Scheduler
>>   Test: scheduler_test_wait_time
>> ..._fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> 1..2..3..4..5.._fdserver.c:342:_odp_fdserver_deregister_fd():fd
>> de-registration failure
>> passed
>>   Test: scheduler_test_num_prio ...passed
>>   Test: scheduler_test_queue_destroy
>> ..._fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>> _fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>> _fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>> passed
>>   Test: scheduler_test_groups ..._fdserver.c:463:handle_request():FD table 
>> full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>> _fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:463:handle_request():FD table full
>> _fdserver.c:297:_odp_fdserver_register_fd():fd registration failure
>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>> _fdserver.c:342:_odp_fdserver_deregister_fd():fd de-registration failure
>>
>> These messages repeat throughout the test even though it "passes".
>> Clearly something isn't right.
>
> We have done a considerable amount of testing on x86 as well as ARM with
> different schedulers.
> Can you provide more details?
> What is the config command you used?
> What platform (x86 vs ARM)?
> I assume you are running 'make check'.
>
>>
>> On Tue, Apr 4, 2017 at 1:47 PM, Brian Brooks  wrote:
>>> This work derives from Ola Liljedahl's prototype [1] which introduced a
>>> scalable scheduler design based on primarily lock-free algorithms and
>>> data structures designed to decrease contention. A thread searches
>>> through a data structure containing only queues that are both non-empty
>>> and allowed to be scheduled to that thread. Strict 
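
Purely to illustrate the search idea described above (all names here are
hypothetical, not from the patch series): a thread scans a list of candidate
queues, which are non-empty by construction, and picks the first one whose
scheduler group matches the thread's group mask.

#include <odp_api.h>

typedef struct {
	odp_queue_t queue;
	uint64_t grp_mask;	/* scheduler groups the queue belongs to */
} sched_cand_t;

static odp_queue_t find_work(const sched_cand_t *cand, int num,
			     uint64_t thr_grp_mask)
{
	int i;

	for (i = 0; i < num; i++)
		if (cand[i].grp_mask & thr_grp_mask)
			return cand[i].queue;
	return ODP_QUEUE_INVALID;
}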

Re: [lng-odp] [API-NEXT PATCH v2 13/16] Add a bitset

2017-04-05 Thread Dmitry Eremin-Solenikov
On 05.04.2017 02:31, Ola Liljedahl wrote:
> On 05/04/2017, 01:25, "Dmitry Eremin-Solenikov"
>  wrote:
>> On 04.04.2017 23:52, Ola Liljedahl wrote:
>>> Sending from my ARM email account, I hope Outlook does not mess up the
>>> format.
>>>
>>>
>>>
>>> On 04/04/2017, 22:21, "Dmitry Eremin-Solenikov"
>>>  wrote:
>>>
 On 04.04.2017 21:48, Brian Brooks wrote:
> Signed-off-by: Ola Liljedahl 
> Reviewed-by: Brian Brooks 
> Reviewed-by: Honnappa Nagarahalli 

>
>
> +/******************************************************************************
> + * bitset abstract data type
> + *****************************************************************************/
> +/* This could be a struct of scalars to support larger bit sets */
> +
> +#if ATOM_BITSET_SIZE <= 32

Maybe I missed it: where did you set this macro?
>>> In odp_config_internal.h
>>> It is a build time configuration.
>>>

 Also, why do you need several versions of bitset? Can you stick to one
 size that fits all?
>>> Some 32-bit archs (ARMv7a, x86) will only support 64-bit atomics
>>> (AFAIK).
>>> Only x86-64 and ARMv8a support 128-bit atomics (and compiler support
>>> for
>>> 128-bit atomics for ARMv8a is a bit lacking…).
>>> Other architectures might only support 32-bit atomic operations.
>>
>> What will be the major outcome of settling on the 64-bit atomics?
> The size of the bitset determines the maximum number of threads, the
> maximum number of scheduler groups and the maximum number of reorder
> contexts (per thread).

Then even 128 can become too small in the near future. As far as
I understand, most of the interesting things happen around
bitsetting/clearing. Maybe we can redefine bitset as a struct or array
of atomics? Then it would be expandable without significant software
issues, wouldn't it?

I'm trying to get away from a situation where we have overcomplicated
low-level code, which brings different issues on other platforms (like
supporting this number of threads on ARM and that number of threads on
x86/PPC/MIPS/etc.).

>>> I think the user should have control over this but if you think that we
>>> should just select the max value that is supported by the architecture
>>> in
>>> question and thus skip one build configuration, I am open to this. We
>>> will
>>> still need separate versions for 32/64/128 bits because there are slight
>>> differences in the syntax and implementation. Such are the vagaries of
>>> the
>>> C standard (and GCC extensions).
>>>
>>>
 Any real reason for the following defines? Why do you need them?
>>> The functions were added as they were needed, e.g. in
>>> odp_schedule_scalable.c.
I don't think there is any that is no longer used, but I can
double-check that.
>>
>> Well, maybe I should rephrase my question: why do you think that it's
>> better to have bitset_andn(a, b) rather than just a & ~b?
> The atomic bitset is an abstract data type. The implementation does not
> have to use a scalar word. Alternative implementation paths exist, e.g.
> use a struct with multiple words and perform the requested operation one
> word at a time (this is OK but perhaps not well documented).

This makes sense, esp. if we add non-plain-integer bitsets.


-- 
With best wishes
Dmitry


Re: [lng-odp] v1.14.0.0 was tagged!

2017-04-05 Thread Savolainen, Petri (Nokia - FI/Espoo)
Hi,

A month has passed. I'd suggest that we tag 1.15 and include parser 
configuration API changes from api-next. Other things in api-next still seem to
lack implementation. Crypto sha1/sha512 is the next candidate to get into 1.15.

-Petri


> -Original Message-
> From: lng-odp [mailto:lng-odp-boun...@lists.linaro.org] On Behalf Of Maxim
> Uvarov
> Sent: Wednesday, March 01, 2017 9:05 PM
> To: lng-odp@lists.linaro.org
> Subject: [lng-odp] v1.14.0.0 was tagged!
> 
> v1.14.0.0 passed all CI tests and was tagged.
> 
> A detailed changelog is available in the corresponding changelog commit, which
> is below in this email.
> 
> Thank you,
> Maxim.
> 


[lng-odp] [RFC][PATCH] added asymmetric crypto algorithm support.

2017-04-05 Thread Umesh Kartha
Asymmetric crypto algorithms are essential in protocols such as SSL/TLS.
As the current ODP crypto library lacks support for asymmetric crypto
algorithms, this RFC is an attempt to address that gap and add support for
them.

The asymmetric algorithms featured in this version are

1 RSA
  - RSA Sign
  - RSA Verify
  - RSA Public Encrypt
  - RSA Private Decrypt

  Padding schemes supported for RSA operations are
* RSA PKCS#1 BT1
* RSA PKCS#1 BT2
* RSA PKCS#1 OAEP
* RSA PKCS#1 PSS

2  ECDSA
  - ECDSA Sign
  - ECDSA Verify

  Curves supported for ECDSA operations are
* Prime192v1
* Secp224k1
* Prime256v1
* Secp384r1
* Secp521r1

3  MODEXP

4  FUNDAMENTAL ECC
  - Point Addition
  - Point Multiplication
  - Point Doubling

   Curves supported for fundamental ECC operations are the same as those for
   ECDSA operations.



Signed-off-by: Umesh Kartha 
---
 include/odp/api/spec/crypto.h | 570 +-
 1 file changed, 569 insertions(+), 1 deletion(-)

diff --git include/odp/api/spec/crypto.h include/odp/api/spec/crypto.h
index d30f050..4cd5a3d 100644
--- include/odp/api/spec/crypto.h
+++ include/odp/api/spec/crypto.h
@@ -57,6 +57,14 @@ typedef enum {
ODP_CRYPTO_OP_ENCODE,
/** Decrypt and/or verify authentication ICV */
ODP_CRYPTO_OP_DECODE,
+   /** Perform asymmetric crypto RSA operation */
+   ODP_CRYPTO_OP_RSA,
+   /** Perform asymmetric crypto modex operation */
+   ODP_CRYPTO_OP_MODEX,
+   /** Perform asymmetric crypto ECDSA operation */
+   ODP_CRYPTO_OP_ECDSA,
+   /** Perform asymmetric crypto ECC point operation */
+   ODP_CRYPTO_OP_FECC,
 } odp_crypto_op_t;
 
 /**
@@ -213,6 +221,202 @@ typedef union odp_crypto_auth_algos_t {
 } odp_crypto_auth_algos_t;
 
 /**
+ * Asymmetric crypto algorithms
+ */
+typedef enum {
+   /** RSA asymmetric key algorithm */
+   ODP_ASYM_ALG_RSA,
+
+   /** Modular exponentiation algorithm */
+   ODP_ASYM_ALG_MODEXP,
+
+   /** ECDSA authentication algorithm */
+   ODP_ASYM_ALG_ECDSA,
+
+   /** Fundamental ECC algorithm */
+   ODP_ASYM_ALG_FECC
+} odp_asym_alg_t;
+
+/**
+ *  Asymmetric algorithms in a bit field structure
+ */
+typedef union odp_crypto_asym_algos_t {
+   /** Asymmetric algorithms */
+   struct {
+   /** ODP_ASYM_ALG_RSA_PKCS   */
+   uint16_t alg_rsa_pkcs   :1;
+
+   /** ODP_ASYM_ALG_MODEXP */
+   uint16_t alg_modexp :1;
+
+   /** ODP_ASYM_ALG_ECDSA  */
+   uint16_t alg_ecdsa  :1;
+
+   /** ODP_ASYM_FECC   */
+   uint16_t alg_fecc   :1;
+   } bit;
+   /** All bits of the bit field structure
+*
+* This field can be used to set/clear all flags, or bitwise
+* operations over the entire structure.
+*/
+   uint16_t all_bits;
+} odp_crypto_asym_algos_t;
+
+/**
+ * Asymmetric Crypto RSA PKCS operation type
+ */
+typedef enum {
+   /** Encrypt with PKCS RSA public key */
+   ODP_CRYPTO_RSA_OP_PUBLIC_ENCRYPT,
+
+   /** Decrypt with PKCS RSA private key */
+   ODP_CRYPTO_RSA_OP_PRIVATE_DECRYPT,
+
+   /** Sign with RSA private key*/
+   ODP_CRYPTO_RSA_OP_SIGN,
+
+   /** Verify with RSA public key */
+   ODP_CRYPTO_RSA_OP_VERIFY,
+
+} odp_crypto_rsa_op_t;
+
+/**
+ * Asymmetric Crypto RSA PKCS padding type
+ */
+typedef enum {
+   /** RSA padding type none */
+   ODP_CRYPTO_RSA_PADDING_NONE,
+
+   /** RSA padding type PKCS#1 BT1*/
+   ODP_CRYPTO_RSA_PADDING_BT1,
+
+   /** RSA padding type PKCS#1 BT2*/
+   ODP_CRYPTO_RSA_PADDING_BT2,
+
+   /** RSA padding type PKCS#1OAEP */
+   ODP_CRYPTO_RSA_PADDING_OAEP,
+
+   /** RSA padding type PKCS#1 PSS */
+   ODP_CRYPTO_RSA_PADDING_PSS,
+
+} odp_crypto_rsa_padding_t;
+
+/**
+ *  RSA padding types in a bitfield structure.
+ */
+typedef union odp_crypto_rsa_pad_bits_t {
+   /** RSA padding type */
+   struct {
+   /** ODP_CRYPTO_RSA_PADDING_NONE */
+   uint16_t rsa_pad_none   :1;
+
+   /** ODP_CRYPTO_RSA_PADDING_BT1*/
+   uint16_t rsa_pad_bt1:1;
+
+   /** ODP_CRYPTO_RSA_PADDING_BT2*/
+   uint16_t rsa_pad_bt2:1;
+
+   /** ODP_CRYPTO_RSA_PADDING_OAEP*/
+   uint16_t rsa_pad_oaep   :1;
+
+   /** ODP_CRYPTO_RSA_PADDING_PSS */
+   uint16_t rsa_pad_pss:1;
+   } bit;
+   /** All bits of the bit field structure
+*
+* This field can be used to set/clear all flags, or bitwise
+* operations over the entire structure.
+*/
+   uint16_t all_bits;
+} odp_crypto_rsa_pad_bits_t;
+
+typedef enum {
+   /** MODEX operation not specified */
+   ODP_CRYPTO_MODEX_OP_NULL,
+
+   /** MODEX operation modular exponentiation */
+ 
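
As a small usage illustration of the proposed types (built only from the enums
visible in this excerpt; how capabilities would actually be queried is not
shown here), an application could map a padding choice to its capability bit:

static int rsa_padding_supported(odp_crypto_rsa_pad_bits_t caps,
				 odp_crypto_rsa_padding_t pad)
{
	switch (pad) {
	case ODP_CRYPTO_RSA_PADDING_NONE:
		return caps.bit.rsa_pad_none;
	case ODP_CRYPTO_RSA_PADDING_BT1:
		return caps.bit.rsa_pad_bt1;
	case ODP_CRYPTO_RSA_PADDING_BT2:
		return caps.bit.rsa_pad_bt2;
	case ODP_CRYPTO_RSA_PADDING_OAEP:
		return caps.bit.rsa_pad_oaep;
	case ODP_CRYPTO_RSA_PADDING_PSS:
		return caps.bit.rsa_pad_pss;
	default:
		return 0;
	}
}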

Re: [lng-odp] [API-NEXT PATCH v2 2/4] linux-gen: packet: remove lazy parsing

2017-04-05 Thread Elo, Matias (Nokia - FI/Espoo)



> On 4 Apr 2017, at 18:30, Maxim Uvarov  wrote:
> 
> breaks build:
> https://travis-ci.org/muvarov/odp/jobs/218496566
> 
> 

Hi Maxim,

I'm unable to reproduce this problem. Were the patches perhaps merged to the wrong
branch?

-Matias