Re: [Intel-wired-lan] [PATCH net-next v3 1/2] e1000e: factor out systim sanitization

2016-08-11 Thread Jarod Wilson
On Tue, Aug 02, 2016 at 07:33:01AM +, Brown, Aaron F wrote:
> > From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On
> > Behalf Of Jarod Wilson
> > Sent: Monday, August 1, 2016 6:32 PM
> > To: Avargil, Raanan <raanan.avar...@intel.com>
> > Cc: Hall, Christopher S <christopher.s.h...@intel.com>;
> > netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org; linux-
> > ker...@vger.kernel.org
> > Subject: Re: [Intel-wired-lan] [PATCH net-next v3 1/2] e1000e: factor out
> > systim sanitization
> > 
> > On Wed, Jul 27, 2016 at 11:01:55AM -0400, Jarod Wilson wrote:
> > > On Wed, Jul 27, 2016 at 02:09:13PM +, Avargil, Raanan wrote:
> > > >> This is prepatory work for an expanding list of adapter families that 
> > > >> have
> > occasional ~10 hour clock jumps when being used for PTP. Factor out the
> > sanitization function and convert to using a feature (bug) flag, per 
> > suggestion
> > from Jesse Brandeburg.
> > > >>
> > > >> Littering functional code with device-specific checks is much messier
> > than simply checking a flag, and having device-specific init set flags as
> > needed.
> > > >> There are probably a number of other cases in the e1000e code that
> > could/should be converted similarly.
> > > >
> > > > Looks ok to me.
> > > > Adding Chris who asked what happens if we reach the max retry counter
> > (E1000_MAX_82574_SYSTIM_REREAD)?
> > > > This counter is set to 50.
> > > > Can you, for testing purposes, decreased this value (or even set it to 
> > > > 0)
> > and see what happens?
> > >
> > > Unfortunately, I don't have direct access to the affected hardware myself,
> > > so I'd have to prep a test build, hand it off to someone and play relay. I
> > > could do that, but it'd have some lag and possible multiple round-trips...
> > > Anyone inside Intel have hardware handy to test on? :p
> > 
> > Was tied up with other work the middle of last week, then on vacation for
> > a bit. There was some testing feedback provided from someone at neither
> > Red Hat or Intel, but I'm not sure where it leaves us right now. What
> > needs to happen next?
> 
> Probably nothing else needs to be done on your end.  I was out for the last 
> week and a half and am now running the patches through a series of regression 
> test covering a fair number of the different e1000e parts.  I will also try 
> to duplicate Tim Woodford' success on a NUC with an i218 in my lab.  Assuming 
> nothing jumps out at me I'll probably give it a tested-by later this week so 
> that Jeff can push it on up.

Looking for a status update on this one, not seeing it pushed to DaveM
just yet.

-- 
Jarod Wilson
ja...@redhat.com



RE: [Intel-wired-lan] [PATCH net-next v3 1/2] e1000e: factor out systim sanitization

2016-08-04 Thread Brown, Aaron F
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On
> Behalf Of Woodford, Timothy W.
> Sent: Friday, July 29, 2016 7:41 AM
> To: Woodford, Timothy W. <timothy.woodf...@jhuapl.edu>; Avargil,
> Raanan <raanan.avar...@intel.com>; Jarod Wilson <ja...@redhat.com>;
> linux-ker...@vger.kernel.org; Hall, Christopher S
> <christopher.s.h...@intel.com>
> Cc: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org
> Subject: Re: [Intel-wired-lan] [PATCH net-next v3 1/2] e1000e: factor out
> systim sanitization
> 
> >>> This is prepatory work for an expanding list of adapter families that have
> occasional ~10 hour clock jumps when being used for PTP. Factor out the
> sanitization function and convert to using a feature (bug) flag, per 
> suggestion
> from Jesse Brandeburg.
> >>>
> >>> Littering functional code with device-specific checks is much messier than
> simply checking a flag, and having device-specific init set flags as needed.
> >>> There are probably a number of other cases in the e1000e code that
> could/should be converted similarly.
> >>
> >> Looks ok to me.
> >> Adding Chris who asked what happens if we reach the max retry counter
> (E1000_MAX_82574_SYSTIM_REREAD)?
> >> This counter is set to 50.
> >> Can you, for testing purposes, decreased this value (or even set it to 0)
> and see what happens?
> >  I'll set the max retry counter to 1 and run an overnight test to see what
> happens.
> 
> After running with this configuration for about 36 hours, I haven't seen any
> timing jumps.  Either this configuration eliminates the error, or it makes it
> significantly less likely to occur.
> 
> Tim Woodford

Feel free to throw a Tested-by: on it if you like.  Not a big deal either way, 
I managed to get enough cycles in on it I'm pretty happy with it as well.

> ___
> Intel-wired-lan mailing list
> intel-wired-...@lists.osuosl.org
> http://lists.osuosl.org/mailman/listinfo/intel-wired-lan


RE: [Intel-wired-lan] [PATCH net-next v3 1/2] e1000e: factor out systim sanitization

2016-08-04 Thread Brown, Aaron F
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On
> Behalf Of Jarod Wilson
> Sent: Tuesday, July 26, 2016 11:26 AM
> To: linux-ker...@vger.kernel.org
> Cc: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org
> Subject: [Intel-wired-lan] [PATCH net-next v3 1/2] e1000e: factor out systim
> sanitization
> 
> This is prepatory work for an expanding list of adapter families that have
> occasional ~10 hour clock jumps when being used for PTP. Factor out the
> sanitization function and convert to using a feature (bug) flag, per
> suggestion from Jesse Brandeburg.
> 
> Littering functional code with device-specific checks is much messier than
> simply checking a flag, and having device-specific init set flags as needed.
> There are probably a number of other cases in the e1000e code that
> could/should be converted similarly.
> 
> Suggested-by: Jesse Brandeburg 
> CC: Jesse Brandeburg 
> CC: Jeff Kirsher 
> CC: intel-wired-...@lists.osuosl.org
> CC: netdev@vger.kernel.org
> Signed-off-by: Jarod Wilson 
> ---
>  drivers/net/ethernet/intel/e1000e/82571.c  |  6 ++-
>  drivers/net/ethernet/intel/e1000e/e1000.h  |  1 +
>  drivers/net/ethernet/intel/e1000e/netdev.c | 66 ++---
> -
>  3 files changed, 44 insertions(+), 29 deletions(-)

Tested-by: Aaron Brown 


RE: [Intel-wired-lan] [PATCH net-next v3 1/2] e1000e: factor out systim sanitization

2016-08-02 Thread Brown, Aaron F
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On
> Behalf Of Jarod Wilson
> Sent: Monday, August 1, 2016 6:32 PM
> To: Avargil, Raanan <raanan.avar...@intel.com>
> Cc: Hall, Christopher S <christopher.s.h...@intel.com>;
> netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org; linux-
> ker...@vger.kernel.org
> Subject: Re: [Intel-wired-lan] [PATCH net-next v3 1/2] e1000e: factor out
> systim sanitization
> 
> On Wed, Jul 27, 2016 at 11:01:55AM -0400, Jarod Wilson wrote:
> > On Wed, Jul 27, 2016 at 02:09:13PM +, Avargil, Raanan wrote:
> > >> This is prepatory work for an expanding list of adapter families that 
> > >> have
> occasional ~10 hour clock jumps when being used for PTP. Factor out the
> sanitization function and convert to using a feature (bug) flag, per 
> suggestion
> from Jesse Brandeburg.
> > >>
> > >> Littering functional code with device-specific checks is much messier
> than simply checking a flag, and having device-specific init set flags as
> needed.
> > >> There are probably a number of other cases in the e1000e code that
> could/should be converted similarly.
> > >
> > > Looks ok to me.
> > > Adding Chris who asked what happens if we reach the max retry counter
> (E1000_MAX_82574_SYSTIM_REREAD)?
> > > This counter is set to 50.
> > > Can you, for testing purposes, decreased this value (or even set it to 0)
> and see what happens?
> >
> > Unfortunately, I don't have direct access to the affected hardware myself,
> > so I'd have to prep a test build, hand it off to someone and play relay. I
> > could do that, but it'd have some lag and possible multiple round-trips...
> > Anyone inside Intel have hardware handy to test on? :p
> 
> Was tied up with other work the middle of last week, then on vacation for
> a bit. There was some testing feedback provided from someone at neither
> Red Hat or Intel, but I'm not sure where it leaves us right now. What
> needs to happen next?

Probably nothing else needs to be done on your end.  I was out for the last 
week and a half and am now running the patches through a series of regression 
test covering a fair number of the different e1000e parts.  I will also try to 
duplicate Tim Woodford' success on a NUC with an i218 in my lab.  Assuming 
nothing jumps out at me I'll probably give it a tested-by later this week so 
that Jeff can push it on up.

> 
> --
> Jarod Wilson
> ja...@redhat.com
> 
> ___
> Intel-wired-lan mailing list
> intel-wired-...@lists.osuosl.org
> http://lists.osuosl.org/mailman/listinfo/intel-wired-lan


Re: [Intel-wired-lan] [PATCH net-next v3 1/2] e1000e: factor out systim sanitization

2016-08-01 Thread Jarod Wilson
On Wed, Jul 27, 2016 at 11:01:55AM -0400, Jarod Wilson wrote:
> On Wed, Jul 27, 2016 at 02:09:13PM +, Avargil, Raanan wrote:
> >> This is prepatory work for an expanding list of adapter families that have 
> >> occasional ~10 hour clock jumps when being used for PTP. Factor out the 
> >> sanitization function and convert to using a feature (bug) flag, per 
> >> suggestion from Jesse Brandeburg.
> >> 
> >> Littering functional code with device-specific checks is much messier than 
> >> simply checking a flag, and having device-specific init set flags as 
> >> needed.
> >> There are probably a number of other cases in the e1000e code that 
> >> could/should be converted similarly.
> > 
> > Looks ok to me.
> > Adding Chris who asked what happens if we reach the max retry counter 
> > (E1000_MAX_82574_SYSTIM_REREAD)?
> > This counter is set to 50. 
> > Can you, for testing purposes, decreased this value (or even set it to 0) 
> > and see what happens?
> 
> Unfortunately, I don't have direct access to the affected hardware myself,
> so I'd have to prep a test build, hand it off to someone and play relay. I
> could do that, but it'd have some lag and possible multiple round-trips...
> Anyone inside Intel have hardware handy to test on? :p

Was tied up with other work the middle of last week, then on vacation for
a bit. There was some testing feedback provided from someone at neither
Red Hat or Intel, but I'm not sure where it leaves us right now. What
needs to happen next?

-- 
Jarod Wilson
ja...@redhat.com



RE: [Intel-wired-lan] [PATCH net-next v3 1/2] e1000e: factor out systim sanitization

2016-07-29 Thread Woodford, Timothy W.
>>> This is prepatory work for an expanding list of adapter families that have 
>>> occasional ~10 hour clock jumps when being used for PTP. Factor out the 
>>> sanitization function and convert to using a feature (bug) flag, per 
>>> suggestion from Jesse Brandeburg.
>>> 
>>> Littering functional code with device-specific checks is much messier than 
>>> simply checking a flag, and having device-specific init set flags as needed.
>>> There are probably a number of other cases in the e1000e code that 
>>> could/should be converted similarly.
>> 
>> Looks ok to me.
>> Adding Chris who asked what happens if we reach the max retry counter 
>> (E1000_MAX_82574_SYSTIM_REREAD)?
>> This counter is set to 50. 
>> Can you, for testing purposes, decreased this value (or even set it to 0) 
>> and see what happens?
>  I'll set the max retry counter to 1 and run an overnight test to see what 
> happens.

After running with this configuration for about 36 hours, I haven't seen any 
timing jumps.  Either this configuration eliminates the error, or it makes it 
significantly less likely to occur.

Tim Woodford


RE: [Intel-wired-lan] [PATCH net-next v3 1/2] e1000e: factor out systim sanitization

2016-07-27 Thread Woodford, Timothy W.
>> This is prepatory work for an expanding list of adapter families that have 
>> occasional ~10 hour clock jumps when being used for PTP. Factor out the 
>> sanitization function and convert to using a feature (bug) flag, per 
>> suggestion from Jesse Brandeburg.
>> 
>> Littering functional code with device-specific checks is much messier than 
>> simply checking a flag, and having device-specific init set flags as needed.
>> There are probably a number of other cases in the e1000e code that 
>> could/should be converted similarly.
> 
> Looks ok to me.
> Adding Chris who asked what happens if we reach the max retry counter 
> (E1000_MAX_82574_SYSTIM_REREAD)?
> This counter is set to 50. 
> Can you, for testing purposes, decreased this value (or even set it to 0) and 
> see what happens?

I have one of the affected NICs.  I inserted some test code into the module to 
check the value of i after the for loop inside the sanitize function exits to 
check how many iterations of the loop were being run.  I ran a PTP test setup 
for a few hours with E1000_MAX_82574_SYSTIM_REREAD=50, and the maximum number 
was 5 iterations.
It looks like setting E1000_MAX_82574_SYSTIM_REREAD to 0 will just disable the 
return value checking and will have the same results as disabling the 
sanitization function.  I'll set the max retry counter to 1 and run an 
overnight test to see what happens.

Tim Woodford


Re: [Intel-wired-lan] [PATCH net-next v3 1/2] e1000e: factor out systim sanitization

2016-07-27 Thread Jarod Wilson
On Wed, Jul 27, 2016 at 02:09:13PM +, Avargil, Raanan wrote:
>> This is prepatory work for an expanding list of adapter families that have 
>> occasional ~10 hour clock jumps when being used for PTP. Factor out the 
>> sanitization function and convert to using a feature (bug) flag, per 
>> suggestion from Jesse Brandeburg.
>> 
>> Littering functional code with device-specific checks is much messier than 
>> simply checking a flag, and having device-specific init set flags as needed.
>> There are probably a number of other cases in the e1000e code that 
>> could/should be converted similarly.
> 
> Looks ok to me.
> Adding Chris who asked what happens if we reach the max retry counter 
> (E1000_MAX_82574_SYSTIM_REREAD)?
> This counter is set to 50. 
> Can you, for testing purposes, decreased this value (or even set it to 0) and 
> see what happens?

Unfortunately, I don't have direct access to the affected hardware myself,
so I'd have to prep a test build, hand it off to someone and play relay. I
could do that, but it'd have some lag and possible multiple round-trips...
Anyone inside Intel have hardware handy to test on? :p

-- 
Jarod Wilson
ja...@redhat.com



RE: [Intel-wired-lan] [PATCH net-next v3 1/2] e1000e: factor out systim sanitization

2016-07-27 Thread Avargil, Raanan
This is prepatory work for an expanding list of adapter families that have 
occasional ~10 hour clock jumps when being used for PTP. Factor out the 
sanitization function and convert to using a feature (bug) flag, per suggestion 
from Jesse Brandeburg.

Littering functional code with device-specific checks is much messier than 
simply checking a flag, and having device-specific init set flags as needed.
There are probably a number of other cases in the e1000e code that could/should 
be converted similarly.

Suggested-by: Jesse Brandeburg 
CC: Jesse Brandeburg 
CC: Jeff Kirsher 
CC: intel-wired-...@lists.osuosl.org
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/ethernet/intel/e1000e/82571.c  |  6 ++-  
drivers/net/ethernet/intel/e1000e/e1000.h  |  1 +  
drivers/net/ethernet/intel/e1000e/netdev.c | 66 ++
 3 files changed, 44 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/82571.c 
b/drivers/net/ethernet/intel/e1000e/82571.c
index 7fd4d54..6b03c85 100644
--- a/drivers/net/ethernet/intel/e1000e/82571.c
+++ b/drivers/net/ethernet/intel/e1000e/82571.c
@@ -2032,7 +2032,8 @@ const struct e1000_info e1000_82574_info = {
  | FLAG2_DISABLE_ASPM_L0S
  | FLAG2_DISABLE_ASPM_L1
  | FLAG2_NO_DISABLE_RX
- | FLAG2_DMA_BURST,
+ | FLAG2_DMA_BURST
+ | FLAG2_CHECK_SYSTIM_OVERFLOW,
.pba= 32,
.max_hw_frame_size  = DEFAULT_JUMBO,
.get_variants   = e1000_get_variants_82571,
@@ -2053,7 +2054,8 @@ const struct e1000_info e1000_82583_info = {
  | FLAG_HAS_CTRLEXT_ON_LOAD,
.flags2 = FLAG2_DISABLE_ASPM_L0S
  | FLAG2_DISABLE_ASPM_L1
- | FLAG2_NO_DISABLE_RX,
+ | FLAG2_NO_DISABLE_RX
+ | FLAG2_CHECK_SYSTIM_OVERFLOW,
.pba= 32,
.max_hw_frame_size  = DEFAULT_JUMBO,
.get_variants   = e1000_get_variants_82571,
diff --git a/drivers/net/ethernet/intel/e1000e/e1000.h 
b/drivers/net/ethernet/intel/e1000e/e1000.h
index ef96cd1..879cca4 100644
--- a/drivers/net/ethernet/intel/e1000e/e1000.h
+++ b/drivers/net/ethernet/intel/e1000e/e1000.h
@@ -452,6 +452,7 @@ s32 e1000e_get_base_timinca(struct e1000_adapter *adapter, 
u32 *timinca);
 #define FLAG2_PCIM2PCI_ARBITER_WA BIT(11)
 #define FLAG2_DFLT_CRC_STRIPPING  BIT(12)
 #define FLAG2_CHECK_RX_HWTSTAMP   BIT(13)
+#define FLAG2_CHECK_SYSTIM_OVERFLOW   BIT(14)
 
 #define E1000_RX_DESC_PS(R, i) \
(&(((union e1000_rx_desc_packet_split *)((R).desc))[i])) diff --git 
a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 41f32c0..1b2df11 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -4303,6 +4303,42 @@ void e1000e_reinit_locked(struct e1000_adapter *adapter) 
 }
 
 /**
+ * e1000e_sanitize_systim - sanitize raw cycle counter reads
+ * @hw: pointer to the HW structure
+ * @systim: cycle_t value read, sanitized and returned
+ *
+ * Errata for 82574/82583 possible bad bits read from SYSTIMH/L:
+ * check to see that the time is incrementing at a reasonable
+ * rate and is a multiple of incvalue.
+ **/
+static cycle_t e1000e_sanitize_systim(struct e1000_hw *hw, cycle_t 
+systim) {
+   u64 time_delta, rem, temp;
+   cycle_t systim_next;
+   u32 incvalue;
+   int i;
+
+   incvalue = er32(TIMINCA) & E1000_TIMINCA_INCVALUE_MASK;
+   for (i = 0; i < E1000_MAX_82574_SYSTIM_REREADS; i++) {
+   /* latch SYSTIMH on read of SYSTIML */
+   systim_next = (cycle_t)er32(SYSTIML);
+   systim_next |= (cycle_t)er32(SYSTIMH) << 32;
+
+   time_delta = systim_next - systim;
+   temp = time_delta;
+   /* VMWare users have seen incvalue of zero, don't div / 0 */
+   rem = incvalue ? do_div(temp, incvalue) : (time_delta != 0);
+
+   systim = systim_next;
+
+   if ((time_delta < E1000_82574_SYSTIM_EPSILON) && (rem == 0))
+   break;
+   }
+
+   return systim;
+}
+
+/**
  * e1000e_cyclecounter_read - read raw cycle counter (used by time counter)
  * @cc: cyclecounter structure
  **/
@@ -4312,7 +4348,7 @@ static cycle_t e1000e_cyclecounter_read(const struct 
cyclecounter *cc)
 cc);
struct e1000_hw *hw = >hw;
u32 systimel, systimeh;
-   cycle_t systim, systim_next;
+   cycle_t systim;
/* SYSTIMH