Re: [etherlab-dev] pcap logging patch

2019-06-10 Thread Gavin Lambert
Sounds interesting, although a rolling buffer would probably be more generally 
useful (for all but the startup case).  This is basically what EC_DEBUG_IF 
does; other than being disabled by default, why wasn't that suitable?
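
To illustrate the rolling-buffer idea, here is a minimal sketch in Python rather than the master's C code. The class name `RollingFrameLog` and the byte budget are hypothetical and are not taken from the patch or from EC_DEBUG_IF; the point is only that the oldest frames are evicted so the most recent traffic is always retained.

```python
from collections import deque


class RollingFrameLog:
    """Keep only the most recent frames that fit in a fixed byte budget."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used = 0
        self.frames = deque()

    def add(self, frame):
        if len(frame) > self.max_bytes:
            return  # a single frame larger than the budget can never fit
        self.frames.append(frame)
        self.used += len(frame)
        # Evict from the oldest end until we are back within the budget.
        while self.used > self.max_bytes:
            self.used -= len(self.frames.popleft())


log = RollingFrameLog(max_bytes=100)
for i in range(10):
    log.add(bytes([i]) * 30)  # ten 30-byte dummy frames
```

With a 100-byte budget only the last three 30-byte frames survive, which is the opposite trade-off to caching the first N bytes for startup problems.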

The method that I usually use to debug traffic issues is to insert a dumb 
hub/switch between the master and first slave, and additionally connect another 
PC running Wireshark to spy on the traffic.  (For best results, disable the 
TCP/IP bindings on the monitoring PC to avoid injecting non-EtherCAT packets, 
although EtherCAT nodes will ignore these anyway.)  And it's reasonably 
portable; you just need an extra network cable and some power, no software 
changes at all.

You can go even better by adding a dedicated network monitoring device (which 
guarantees not to accept packets from the monitoring PC), but I find that the 
above is sufficient for most purposes, especially since EtherCAT packets are 
sent as broadcasts.


Gavin Lambert
Senior Software Developer

COMPAC SORTING EQUIPMENT LTD | 4 Henderson Pl | Onehunga | Auckland 1061 | New 
Zealand
Switchboard: +64 96 34 00 88 | tomra.com

The information contained in this communication and any attachment is 
confidential and may be legally privileged. It should only be read by the 
person(s) to whom it is addressed. If you have received this communication in 
error, please notify the sender and delete the communication.

___
etherlab-dev mailing list
etherlab-dev@etherlab.org
http://lists.etherlab.org/mailman/listinfo/etherlab-dev


[etherlab-dev] pcap logging patch

2019-06-10 Thread Graeme Foot
Hi,

In case anyone is interested I've attached a patch for an EtherCAT comms 
logging function:

/features/pcap/0001-pcap-logging.patch

This will cache the first 30 MB (defined under PCAP_SIZE) of EtherCAT comms 
traffic to memory in pcap format.  It adds a pcap command to the ethercat 
command-line tool, which also has a reset option to clear the cache and 
continue logging.
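
As a rough illustration of the caching scheme described above (not the patch itself, which is kernel C), the following Python sketch builds a capped in-memory capture in the classic pcap file format. The class `PcapCache`, its method names and the default snaplen are assumptions made for the example; only the PCAP_SIZE name and the 30 MB figure come from the message.

```python
import struct

PCAP_SIZE = 30 * 1024 * 1024  # cap in bytes; name from the post, exact value assumed
LINKTYPE_ETHERNET = 1         # pcap link type for Ethernet frames


class PcapCache:
    """Hypothetical capped capture: logs frames until the cap is reached."""

    def __init__(self, max_bytes=PCAP_SIZE, snaplen=65535):
        self.max_bytes = max_bytes
        self.snaplen = snaplen
        self.reset()

    def reset(self):
        # 24-byte global header: magic, version 2.4, thiszone, sigfigs,
        # snaplen, link type.  Clearing the cache lets logging continue.
        self.buf = bytearray(struct.pack(
            "<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, self.snaplen, LINKTYPE_ETHERNET))

    def log_frame(self, frame, ts_sec, ts_usec):
        data = frame[:self.snaplen]
        # 16-byte record header: timestamp, captured length, original length.
        record = struct.pack("<IIII", ts_sec, ts_usec, len(data), len(frame)) + data
        if len(self.buf) + len(record) > self.max_bytes:
            return False  # cache full: later traffic is dropped
        self.buf += record
        return True


# A 60-byte dummy broadcast frame carrying the EtherCAT EtherType 0x88A4.
frame = b"\xff" * 6 + b"\x00" * 6 + b"\x88\xa4" + bytes(46)
cache = PcapCache(max_bytes=24 + 2 * (16 + 60))  # room for exactly two frames
results = [cache.log_frame(frame, 0, i) for i in range(3)]
```

Writing `cache.buf` to a file should give something Wireshark can open, since the buffer is just the global header followed by the logged records.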

I know there are already other debug options, namely:
- Debug level 2, which prints the EtherCAT comms directly to syslog
- EC_DEBUG_IF, which creates a local debug interface that gets the EtherCAT 
  comms traffic mirrored to it (to be logged in Wireshark locally, or from a 
  remote computer if the debug interface is bridged to a real interface)
- EC_DEBUG_RING, which prints the EtherCAT comms to syslog if the debug 
  level is > 0
  Warning: EC_DEBUG_RING uses do_gettimeofday().  This is not safe to call 
  from an RTAI realtime thread; it will freeze your system if you only have 
  one CPU.  It should use jiffies instead.

None of the options above really suited my situation as I wanted to track down 
intermittent startup issues at client sites.  The Syslog rotates too quickly 
and has other information in it and the Debug IFace option was not suitable to 
set up at a client site.


Regards,

Graeme Foot
Kinetic Engineering Design Ltd.



0001-pcap-logging.patch
Description: 0001-pcap-logging.patch


Re: [etherlab-dev] Missing Vendor ID / Product Code

2019-06-10 Thread Graeme Foot
Hi,

"/features/parallel-slave/0008-fsm_sii-loading-check.patch" is still required.

This patch fixes a problem where the SII data may still be loading from EEPROM 
while the slave fsm starts to read it, resulting in bad SII data.  This 
situation can occur when rescanning the slaves after a hotplug and one of the 
newly connected slaves is a little slow reading from the EEPROM (in my case 
EL1008 modules in particular).

It's probably worth being a base patch also.  I don't think there's anything 
relying on the parallel-slave patches.  It's just that it's more of an issue 
after the parallel-slave patches as you initialise more modules sooner so it's 
more likely to happen.

Cheers,
Graeme.


From: Gavin Lambert 
Sent: Tuesday, 11 June 2019 12:41 PM
To: Graeme Foot ; etherlab-dev@etherlab.org
Subject: RE: Missing Vendor ID / Product Code

Ah, I see.  I think the original version relied on simply not sending those 
datagrams later on.  Usually there’s only one or two cycles where a slave FSM 
is “blocked” like that, so it didn’t eat up too many datagrams and didn’t “wrap 
around” to cause issues with other slaves due to the limit on parallel FSMs.

But yes, you’re correct that it’s better to not consume the datagram from the 
ring in the first place, especially if there’s going to be prolonged inactivity.

I’ll probably end up putting this in the base patches, perhaps even folding it 
into one of the existing ones.  It’s probably getting about time to make 
another patchset release soon anyway. 😊

Does this supersede your “fsm_sii-loading-check.patch” or do you think that’s 
still useful?


Gavin Lambert
Senior Software Developer



Re: [etherlab-dev] Missing Vendor ID / Product Code

2019-06-10 Thread Graeme Foot
Hi,

Yes, I saw "base/0019..." adding EC_DATAGRAM_INVALID.  I didn't explicitly see 
"base/0026..." adding the code to prevent abandoning the mailbox FSMs, but I was 
working with its merged code.  "base/0026..." added another case where the 
external datagram queue has the issue, because the FSM is not abandoned but the 
datagram is not used either.

My patch will plug that hole now by not incrementing the queue index if the 
datagram is not used (if flagged with EC_DATAGRAM_INVALID).

Regards,
Graeme.



Re: [etherlab-dev] Missing Vendor ID / Product Code

2019-06-10 Thread Gavin Lambert
Did you have a look at 
base/0026-Prevent-abandoning-the-mailbox-state-machines-early-.patch?  Because 
that does something similar.

(It was base/0019-Support-for-multiple-mailbox-protocols.patch which added the 
handling of the INVALID datagram state for the mailbox state machines.  The one 
above was a bugfix for this patch, essentially.)


Gavin Lambert
Senior Software Developer



Re: [etherlab-dev] Missing Vendor ID / Product Code

2019-06-10 Thread Graeme Foot
Hi,

Unfortunately "0008-fsm_sii-loading-check.patch" (below) didn't fix my main 
problem.  It turns out it is an inherent problem with how the master's external 
datagram ring works.  I have attached a patch that plugs the hole causing the 
problem I was having, but there may be other cases where issues could occur.

Patch: 
/features/parallel-slave/0009-ec_master_exec_slave_fsms-external-datagram-fix.patch


The guts of the problem:

ec_master_exec_slave_fsms() calls ec_master_get_external_datagram() to get a 
datagram from the external datagram ring.  The datagram is then passed to 
ec_fsm_slave_exec() for each slave that has work to do.  This call returns 
either 1 (FSM still in progress) or 0 (FSM complete).  The master assumes that 
if the FSM is still in progress then the datagram has been consumed and is in 
use, but there are various cases where this is not true.  If any of these cases 
occur then, in the first loop of ec_master_exec_slave_fsms(), these slaves' 
FSMs may be executed multiple times while another slave's FSM is waiting on its 
datagram to return.

If too many slaves, or cycles, occur during this time then the waiting slave's 
datagram either gets its state set to EC_DATAGRAM_INVALID or gets reused by 
another slave.  This can lead to "cancelled" datagram replies, or to both 
slaves getting the results from the second slave's datagram (as the first 
datagram's index will be replaced and its reply is lost).


In my case this was occurring because I use the "0001-load-sii-from-file.patch" 
patch.  During the SII config stage of a slave this patch creates a kthread 
to attempt to read the SII file from disk.  In the meantime, 
ec_fsm_slave_exec() continues to return 1 (FSM in progress) but does not use 
the presented datagrams (it sets the datagram state to EC_DATAGRAM_INVALID).

During initial startup and configuration of the master, 
ec_master_exec_slave_fsms() is called from ec_master_idle_thread() in a loop 
with (in my configuration) a call to schedule() before resuming the loop.  This 
means that multiple loops may occur before a reply to a slave's datagram 
returns, leaving plenty of time for the in-use datagrams to be recycled, 
resulting in their state or data being overwritten.


The patch I have attached now also tests the datagram's state for 
EC_DATAGRAM_INVALID before incrementing the external datagram ring index.  This 
solves my problem, where the datagram's state is being set to 
EC_DATAGRAM_INVALID while waiting for the kthread to complete.
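
The failure and the fix can be shown with a toy model. This is a Python sketch, not the master's C code: the ring size, the dictionary layout and the function name `simulate` are all invented for illustration, and the real state names and ring handling differ. Slave A has a request in flight in slot 0 while a blocked slave B is repeatedly offered datagrams it does not use.

```python
RING_SIZE = 4  # assumed small so the wrap-around is easy to see


def simulate(cycles, check_invalid):
    """Return slot 0 of the ring after `cycles` executions of blocked slave B."""
    ring = [{"state": "free", "owner": None} for _ in range(RING_SIZE)]
    index = 0

    # Slave A queues a real request in slot 0 and waits for its reply.
    ring[index].update(state="queued", owner="A")
    index = (index + 1) % RING_SIZE

    for _ in range(cycles):
        dg = ring[index]
        # Blocked slave B is offered the datagram, does not use it and
        # marks it invalid, yet its FSM still reports "in progress".
        dg["state"] = "invalid"
        dg["owner"] = None
        in_progress = True

        unused = check_invalid and dg["state"] == "invalid"
        if in_progress and not unused:
            # Unpatched behaviour: the slot counts as consumed regardless.
            index = (index + 1) % RING_SIZE
    return ring[0]


before_fix = simulate(RING_SIZE, check_invalid=False)
after_fix = simulate(RING_SIZE, check_invalid=True)
```

Without the check, B burns through one slot per cycle and after RING_SIZE cycles is handed A's in-flight slot, invalidating it. With the check, the index stays put and B keeps being offered the same unused slot.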

I suspect there may be other instances where this problem could occur.  One 
case I have thought of, but haven't been able to confirm, is when multiple 
protocols try to access a slave's mailbox at the same time (e.g. CoE, EoE, FoE, 
etc.).  Only one protocol is allowed to communicate at a time.  The other 
protocols will be offered a datagram from the ring, but they aren't able to use 
it until their turn comes up.  In these cases, if ec_read_mbox_locked() fails 
the datagram state is also set to EC_DATAGRAM_INVALID, so the patch should also 
cover this case.


Regards,
Graeme.


From: etherlab-dev  On Behalf Of Graeme Foot
Sent: Monday, 4 March 2019 2:36 PM
To: etherlab-dev@etherlab.org
Subject: Re: [etherlab-dev] Missing Vendor ID / Product Code

Hi,

I think I've finally solved the problem.  The slaves with the issue are 
returning with the "EEPROM not loaded" bit set when reading the SII information 
(bit 12 of the EEPROM status word).  If this bit is set then the slave has not 
yet finished reading the SII information from the EEPROM and the data returned 
may not be valid.  The master code was not checking for this bit.  I have 
attached a patch to do so:
/features/parallel-slave/0008-fsm_sii-loading-check.patch

The patch checks if the bit is set and keeps re-reading the EEPROM data until 
it is not.  At this point the data returned is still incorrect so a complete 
read is requested (where a write is first sent asking for the slave to load the 
data that needs to be read).  There is a 500ms timeout waiting for the bit to 
be clear.  If the bit does not clear then the EEPROM load may have failed (e.g. 
incorrect CRC value).
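
The waiting part of that check can be sketched as a simple poll loop. This is an illustrative Python model under stated assumptions, not the patch: `wait_for_eeprom_loaded`, `read_status` and the polling interval are hypothetical names and values, and only the bit position (bit 12) and the 500 ms timeout come from the description above. The follow-up complete read is not modelled.

```python
import time

EEPROM_NOT_LOADED = 1 << 12  # bit 12 of the EEPROM status word
LOAD_TIMEOUT_S = 0.5         # the 500 ms timeout described above


def wait_for_eeprom_loaded(read_status, timeout_s=LOAD_TIMEOUT_S, poll_s=0.001):
    """Poll the status word until the 'not loaded' bit clears or time runs out.

    `read_status` is a caller-supplied function returning the status word.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if not (read_status() & EEPROM_NOT_LOADED):
            return True   # EEPROM loaded; the SII data can now be re-read
        if time.monotonic() >= deadline:
            return False  # load may have failed, e.g. a bad CRC
        time.sleep(poll_s)


# A slave that needs two extra polls before its EEPROM has loaded.
statuses = iter([0x1000, 0x1000, 0x0000])
loaded = wait_for_eeprom_loaded(lambda: next(statuses))

# A slave whose bit never clears (short timeout to keep the example quick).
stuck = wait_for_eeprom_loaded(lambda: 0x1000, timeout_s=0.01)
```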


My previous patch (features/sii-read-failure/0001-sii-read-retry.patch) should 
no longer be required, but it may help to make reading of the SII data more 
robust.  I've attached the latest version of this one also.  It is now:
features/sii-read-failure/0001-slave-scan-retry.patch


Regards,
Graeme Foot.


From: etherlab-dev On Behalf Of Graeme Foot
Sent: Friday, 12 October 2018 11:38 AM
To: Gavin Lambert; etherlab-dev@etherlab.org
Subject: Re: [etherlab-dev] Missing Vendor ID / Product Code

Hi,

I've had a chance to play with my testrig and have managed to consistently 
reproduce the problem when hot-plugging a module (I haven't had the problem 
again on