Re: [etherlab-dev] ec_lock_* vs. ec_ioctl_lock in master/ioctl.c

2018-05-06 Thread Graeme Foot
Hi,

I found a couple of bugs in my patches.  New patches attached.

Graeme.


-Original Message-
From: etherlab-dev [mailto:etherlab-dev-boun...@etherlab.org] On Behalf Of 
Graeme Foot
Sent: Wednesday, 7 March 2018 10:53 AM
To: Gavin Lambert <gav...@compacsort.com>
Cc: etherlab-dev@etherlab.org
Subject: Re: [etherlab-dev] ec_lock_* vs. ec_ioctl_lock in master/ioctl.c

Hi Gavin,

I have attached updated eoe patches to maintain compatibility with your 
patchset for non-RTAI / RTDM applications.

Patch 0001:
This will now continue with the existing behaviour of auto-create and 
auto-delete of eoe ports as the eoe slaves are detected and removed.  It does 
contain the fix for removing the lock from the transmit callback so eoe users 
will probably want this update even if they want to keep existing behaviour.

Patch 0002:
Still only targeted at RTAI / RTDM users, but will now maintain existing 
behaviour (internal locking via the internal callbacks) for Linux user space 
applications.


Regards,
Graeme.

-Original Message-
From: etherlab-dev [mailto:etherlab-dev-boun...@etherlab.org] On Behalf Of 
Graeme Foot
Sent: Tuesday, 6 March 2018 1:40 PM
To: Esben Haabendal <esben.haaben...@gmail.com>
Cc: etherlab-dev@etherlab.org
Subject: Re: [etherlab-dev] ec_lock_* vs. ec_ioctl_lock in master/ioctl.c

Hi,

The email's getting a little hard to read so I'll try to summarise what I'm 
doing here first and reply to specifics below.

The primary problem with RTAI is that you cannot use Linux locks in RTAI hard 
realtime calls.  So the base 0017 and 0018 patches are useless for RTAI.  
Secondly RTAI / RTDM cannot use callbacks, so EoE does not work for this 
framework.

I put the patches in the features branch of Gavin's patchset as they should be 
optional and especially patch 2 (0002-eoe-via-rtdm.patch) is specifically for 
RTAI / RTDM for EoE.  I haven't looked too hard at making this one generic as 
it does break the previous patches I mentioned (base 0017 and 0018).  But it 
also needs to break them as the whole point of it is to use an application 
level lock (i.e. RTAI lock) instead (as per the design of the vanilla Etherlab 
master).  So no, the patch does not support "plain" user-space applications 
(i.e. not using RTDM) as my focus was only for RTDM applications.  Patch 2 
could probably have a few more defines to make sure it only affects RTDM users 
and keeps 0017/0018 active for non-RTDM installs.

patch 1 (0001-eoe-addif-delif-tools.patch) only looks at changing the lifetime 
options of the eoe interfaces so anyone should be able to use this.

patch 2 (0002-eoe-via-rtdm.patch) is targeted to add the ability for RTDM users 
to provide a user space alternate to the callbacks framework.  If people prefer 
this method to allow "plain" Linux user space applications to supply their own 
locking then it could be extended.  But it does mean that the base patches 
"0017-Master-locks-to-avoid-corrupted-datagram-queue.patch" and 
"0018-Use-call-back-functions.patch" do need to be removed.

So using patch 2 is a matter of preference: Patch 2 uses the Vanilla EtherLab 
EtherCAT master approach of "the user application knows best" as to when a lock 
can be acquired, but also covers the "realtime RTAI can't use Linux locks" 
issue; OR don't use patch 2 and go with the patches 0017/0018 approach where 
the master handles this locking (which as I said is not an option for RTAI).



> From: Esben Haabendal [mailto:esbenhaaben...@gmail.com] On Behalf Of 
> Esben Haabendal
> 
> Hi Graeme
> 
> Need to get one thing straight first.
> 
> RTDM user-space is not the same as "plain" Linux user-space.
> I know that you are 100% aware that, but your replies to me seems less clear 
> about it.
> 
> Your 0002-eoe-via-rtdm.patch does not seem usable for "plain" user-space 
> applications, so does not apply to the applications we support.
> 
> >> > These 4 functions are special.  The master should be written (and 
> >> > mostly seems to be) in a fashion that between a
> >> > ecrt_master_activate() and ecrt_master_deactivate() the above 
> >> > calls do not require any locks to synchronize with the master 
> >> > code, except for the EOE thread which uses callbacks.
> >> 
> >> Well, mostly is the same as not in my world.  If they mostly do not 
> >> require any locks, they do require locks.
> >
> > When I say mostly, I mean the vanilla Etherlab version is ok, but 
> > the patch that added ec_ioctl_lock_up() and ec_ioctl_lock_down() 
> > broke this.  I do not use this patch.
> 
> Do you mean that the patches (and I guess you mean 
> 0017-Master-locks-to-avoid-corrupted-datagram-queue.patch and
> 0018-Use-call-back-functions.patch) introduces the need for locking?
>

Re: [etherlab-dev] ec_lock_* vs. ec_ioctl_lock in master/ioctl.c

2018-03-15 Thread Gavin Lambert
On 5 March 2018 22:58, quoth Esben Haabendal:
> The multiple process world can be different though.  You basically end up
> with a new EtherCAT API.  A combination of the etherlabmaster API and a
> custom named semaphore API.  Without this, applications will not work
> properly together.  Why not include such a feature directly in
> etherlabmaster?  Without it, I think we are making the user-space
> applications (non-RTDM) into a second-class citizen.

If you have multiple independent processes then they must (of necessity) either 
operate on separate masters (in which case no locking beyond what the kernel 
already does is required) or they must all communicate (through some mechanism 
of your own devising) with a single process who "owns" the master.  The master 
library does not allow you to reserve or activate a single master concurrently 
in separate processes.

If you have multiple tasks within a single process operating on the same master 
(eg. multiple cycles with different intervals) then they _should_ operate on 
different domains, and *must* coordinate their calls to the ECRT APIs in some 
fashion.  In upstream Etherlab, this requires application-level locking.  In 
the current patchset, the locking is done for you (except for RTDM), but I'm 
not entirely convinced this is the correct design choice (see my other reply).

If you have one process that is running a realtime application loop and another 
process that only performs non-realtime tasks (eg. injecting CoE requests), 
even on the same master, then all versions of Etherlab handle this for you 
without requiring additional locking.  This is how you can still use the 
"ethercat" command line tool while running a realtime application.

___
etherlab-dev mailing list
etherlab-dev@etherlab.org
http://lists.etherlab.org/mailman/listinfo/etherlab-dev


Re: [etherlab-dev] ec_lock_* vs. ec_ioctl_lock in master/ioctl.c

2018-03-05 Thread Esben Haabendal
Hi Graeme

Need to get one thing straight first.

RTDM user-space is not the same as "plain" Linux user-space.
I know that you are 100% aware that, but your replies to me seems less
clear about it.

Your 0002-eoe-via-rtdm.patch does not seem usable for "plain" user-space
applications, so does not apply to the applications we support.

>> > These 4 functions are special.  The master should be written (and 
>> > mostly seems to be) in a fashion that between a ecrt_master_activate() 
>> > and ecrt_master_deactivate() the above calls do not require any locks 
>> > to synchronize with the master code, except for the EOE thread which 
>> > uses callbacks.
>> 
>> Well, mostly is the same as not in my world.  If they mostly do not require
>> any locks, they do require locks.
>
> When I say mostly, I mean the vanilla Etherlab version is ok, but the patch
> that added ec_ioctl_lock_up() and ec_ioctl_lock_down() broke this.  I do not
> use this patch.

Do you mean that the patches (and I guess you mean
0017-Master-locks-to-avoid-corrupted-datagram-queue.patch and
0018-Use-call-back-functions.patch) introduces the need for locking?

That does not sound right to me.  As I read them (and your writing in
this thread), the need for locking is there with and without these
patches.  Without them, locking needs to be handled at application
level.  With the patches, the master tries to handle the locking.

And that is a clearly a design change, and one which you clearly does
not agree with.

But if you really mean that these patches broke etherlabmaster (and not
"just" changed the design), would you please explain it a bit more
detailed?

>> As for the EOE thread, I might be overlooking something obvious.  But how
>> are you supposed to use callbacks when using the library API?
>> 
>> As far as I read the code, if you are using EoE and not RTDM, you will
>> always use the standard ec_master_internal_send_cb() and
>> ec_master_internal_receive_cb() callbacks.  See ec_ioctl_activate().
>
> If you do not set EoE callbacks (with ecrt_master_callbacks()) there will be
> NO callbacks once the master goes active.  (It will not use
> ec_master_internal_send_cb() and ec_master_internal_receive_cb().  It will not
> start an EoE processing thread.  You will get a "No EoE processing because of
> missing callbacks!" message in your system log.)

In ec_ioctl_activate() there is:

#ifndef EC_IOCTL_RTDM
ecrt_master_callbacks(master, ec_master_internal_send_cb,
ec_master_internal_receive_cb, master);
#endif

So at least for (non RTDM) user-space applications, that activate the
master with ioctl, EoE callbacks are always set to
ec_master_internal_send_cb() and ec_master_internal_receive_cb().

> If your application is in kernel space then you can specify callback functions
> and then use application level locking.  If your application is in user space,
> then you cannot specify callback functions as the kernel cannot call back into
> user space.  This is why I created my latest patch "0002-eoe-via-rtdm.patch".
> This patch lets you create your own EoE processing thread in your own
> application.  This of course also lets you use application level locking.

But only for RTDM.  The functions added are guarded by

#if !defined(__KERNEL__) && defined(EC_RTDM) && (EC_EOE)

which means that they do not apply when using non-RTDM user-space.

>> And with them, I don't see how you are supposed to do locking at the
>> application level.  I will not be able to specify application level locks
>> that are used by both application and eoe, unless they are implemented
>> directly in the master (kernel).
>> 
>> And with ec_master_eoe_thread() running inside master, you really need to do
>> locking to synchronize access to master->datagram_queue (i.e. the io_sem
>> lock).  If not, you will have race conditions between eoe and 3 of the 4
>> functions:
>
> The io_sem lock is only used by the master when it is idle.  Once you activate
> the master it is up to your application to provide the locking.

Ok.  But I dare to say that it is an open question if that is an design
for etherlabmaster.  I believe it has both pros and cons.

Pro: Application developers can implement synchronization as they find
most optimal for them.

Cons: Application developers need to figure out and implement
synchronization without much (any) help from the etherlabmaster code.
And getting synchronization is IMHO something that is often tricky to
get right (at least for many developers).

>>ecrt_master_send()
>>ecrt_master_receive()
>>ecrt_domain_queue()
>> 
>> So we really do not to do locking in at least these 3 when using EOE.
>> 
>> Or maybe the whole master->datagram_queue should be refactored to something
>> that can be synchronized safely in a lock-free way?
>
> The point of the callback functions is so that you can make sure your
> application does not call any of ecrt_master_receive(), ecrt_master_send(),
> ecrt_master_send_ext(), 

Re: [etherlab-dev] ec_lock_* vs. ec_ioctl_lock in master/ioctl.c

2018-03-04 Thread Graeme Foot
> From: Esben Haabendal [mailto:esbenhaaben...@gmail.com] On Behalf Of Esben 
> Haabendal
> 
> >> Graeme Foot  writes:
> >> 
> >> > The ec_ioctl_lock_up() and ec_ioctl_lock_down_interruptible() calls 
> >> > were added to protect the following functions when multiple 
> >> > send/receive loops are
> >> > running:
> >> > - ecrt_master_send()
> >> > - ecrt_master_receive()
> >> > - ecrt_master_domain_process()
> >> > - ecrt_master_domain_queue()
> >> >
> >> > In my opinion any locking on these functions should be at the 
> >> > application level instead.
> >> 
> >> Well, I have a different opinion on that.
> >> 
> >> Implementation of locking is inherently difficult to get right.  You 
> >> both have to ensure against race conditions while avoiding deadlocks.
> >> But you also do not want to block for too long.
> >> 
> >> While I do acknowledge that for trivial cases where there are only a 
> >> single application, it is not a big program for that single 
> >> application to maintain synchronization, I don't think that it is a 
> >> good solution to let each application developer in the more 
> >> complicated situations (like multiple independent processes) have to 
> >> do this without any help from etherlabmaster.  Forcing all (or at 
> >> least some) application developers to solve this same problem again 
> >> and again should not be the best we can do.
> >
> > These 4 functions are special.  The master should be written (and 
> > mostly seems to be) in a fashion that between a ecrt_master_activate() 
> > and ecrt_master_deactivate() the above calls do not require any locks 
> > to synchronize with the master code, except for the EOE thread which 
> > uses callbacks.
> 
> Well, mostly is the same as not in my world.  If they mostly do not require 
> any locks, they do require locks.
> 

When I say mostly, I mean the vanilla Etherlab version is ok, but the patch 
that added ec_ioctl_lock_up() and ec_ioctl_lock_down() broke this.  I do not 
use this patch.


> I guess some of the confusion I have is caused by the fact that it is unclear 
> exactly which functions are allowed to be called between
> ecrt_master_activate() and ecrt_master_deactivate(), and which functions are 
> not.  Do we have such a list?  Or how can I create such a list exactly?  And 
> would it be possible to enforce this directly in the API, so if you call a 
> function you are not allowed to call, you get an error back instead of 
> introducing random breakage by modifying unsynchronized shared data 
> structures?
> 

The functions that I think should not require locks between 
ecrt_master_activate() and ecrt_master_deactivate() are:
- ecrt_master_receive()
- ecrt_domain_process()
- ecrt_domain_state()
- ecrt_master_state()
- ecrt_domain_queue()
- ecrt_master_reference_clock_time()
- ecrt_master_sync_slave_clocks()
- ecrt_master_sync_reference_clock()
- ecrt_master_64bit_reference_clock_time_queue()
- ecrt_master_64bit_reference_clock_time()
- ecrt_master_application_time()
- ecrt_master_send()
- ecrt_master_send_ext()
- ecrt_master_deactivate_slaves()

And with some of my recent patchs
- ecrt_master_eoe_is_open()
- ecrt_master_eoe_process()
- ecrt_master_rt_slave_requests()
- ecrt_master_exec_slave_requests()

There may be some others I don't use.


> As for the EOE thread, I might be overlooking something obvious.  But how are 
> you supposed to use callbacks when using the library API?
> 
> As far as I read the code, if you are using EoE and not RTDM, you will always 
> use the standard ec_master_internal_send_cb() and
> ec_master_internal_receive_cb() callbacks.  See ec_ioctl_activate().
> 

If you do not set EoE callbacks (with ecrt_master_callbacks()) there will be NO 
callbacks once the master goes active.  (It will not use 
ec_master_internal_send_cb() and ec_master_internal_receive_cb().  It will not 
start an EoE processing thread.  You will get a "No EoE processing because of 
missing callbacks!" message in your system log.)

If your application is in kernel space then you can specify callback functions 
and then use application level locking.  If your application is in user space, 
then you cannot specify callback functions as the kernel cannot call back into 
user space.  This is why I created my latest patch "0002-eoe-via-rtdm.patch".  
This patch lets you create your own EoE processing thread in your own 
application.  This of course also lets you use application level locking.


> And with them, I don't see how you are supposed to do locking at the 
> application level.  I will not be able to specify application level locks 
> that are used by both application and eoe, unless they are implemented 
> directly in the master (kernel).
> 
> And with ec_master_eoe_thread() running inside master, you really need to do 
> locking to synchronize access to master->datagram_queue (i.e. the io_sem 
> lock).  If not, you will have race conditions between eoe and 3 of the 4 
> functions:
> 

The io_sem 

Re: [etherlab-dev] ec_lock_* vs. ec_ioctl_lock in master/ioctl.c

2018-03-02 Thread Esben Haabendal
Graeme Foot  writes:

>> Graeme Foot  writes:
>> 
>> > The ec_ioctl_lock_up() and ec_ioctl_lock_down_interruptible() calls 
>> > were added to protect the following functions when multiple 
>> > send/receive loops are
>> > running:
>> > - ecrt_master_send()
>> > - ecrt_master_receive()
>> > - ecrt_master_domain_process()
>> > - ecrt_master_domain_queue()
>> >
>> > In my opinion any locking on these functions should be at the 
>> > application level instead.
>> 
>> Well, I have a different opinion on that.
>> 
>> Implementation of locking is inherently difficult to get right.  You
>> both have to ensure against race conditions while avoiding deadlocks.
>> But you also do not want to block for too long.
>> 
>> While I do acknowledge that for trivial cases where there are only a
>> single application, it is not a big program for that single
>> application to maintain synchronization, I don't think that it is a
>> good solution to let each application developer in the more
>> complicated situations (like multiple independent processes) have to
>> do this without any help from etherlabmaster.  Forcing all (or at
>> least some) application developers to solve this same problem again
>> and again should not be the best we can do.
>
> These 4 functions are special.  The master should be written (and
> mostly seems to be) in a fashion that between a ecrt_master_activate()
> and ecrt_master_deactivate() the above calls do not require any locks
> to synchronize with the master code, except for the EOE thread which
> uses callbacks.

Well, mostly is the same as not in my world.  If they mostly do not
require any locks, they do require locks.

I guess some of the confusion I have is caused by the fact that it is
unclear exactly which functions are allowed to be called between
ecrt_master_activate() and ecrt_master_deactivate(), and which functions
are not.  Do we have such a list?  Or how can I create such a list
exactly?  And would it be possible to enforce this directly in the API,
so if you call a function you are not allowed to call, you get an error
back instead of introducing random breakage by modifying unsynchronized
shared data structures?

As for the EOE thread, I might be overlooking something obvious.  But
how are you supposed to use callbacks when using the library API?

As far as I read the code, if you are using EoE and not RTDM, you will
always use the standard ec_master_internal_send_cb() and
ec_master_internal_receive_cb() callbacks.  See ec_ioctl_activate().

And with them, I don't see how you are supposed to do locking at the
application level.  I will not be able to specify application level
locks that are used by both application and eoe, unless they are
implemented directly in the master (kernel).

And with ec_master_eoe_thread() running inside master, you really need
to do locking to synchronize access to master->datagram_queue (i.e. the
io_sem lock).  If not, you will have race conditions between eoe and 3
of the 4 functions:

   ecrt_master_send()
   ecrt_master_receive()
   ecrt_domain_queue()

So we really do not to do locking in at least these 3 when using EOE.

Or maybe the whole master->datagram_queue should be refactored to something
that can be synchronized safely in a lock-free way?

> And the reason it uses callbacks is because only your application
> knows if it is appropriate to allow the EOE thread to call
> ecrt_master_recieve() and ecrt_master_send_ext() etc at any particular
> time.

I see what you mean.  But I just don't see how it is possible to
accomplish this for user space applications.

> However, if your application has multiple send/receive loops then they need to
> be synchronized with each other (see next comment).  There are a few more
> functions such as the distributed clock calls that are also in the
> send/receive loops.  They also do not require ethercat level locks as they
> should be safe to call between the activate and deactivate.  They also do not
> need application level locking as they should really only be called by one
> send/receive loop per master.  All other ethercat function calls should not be
> in a hard realtime context.

Ok.  But you imply a struct rule for how to design applications doing
EtherCAT communication: "There can only be one send/receive loop per
master".

That might (is not in my case) accepted by all application developers.
I need to continue support for applications that use multiple
send/receive loops for multiple EtherCAT domains, each with different
cycle time.

One solution could be to create a single send/receive loop per master,
and then let the multiple application loops communicate with that
instead of directly with the master.  That might actually be a really
good idea.  But, if that is done, I think it should fit very well as a
general feature of the EtherCAT master.  It should be possible to
implement without any application specific code.



The final "problem" is as