Re: [RFC PATCH 0/7] RAS/CEC: Extend CEC for errors count check on short time period

2020-10-07 Thread James Morse
Hi Shiju,
On 06/10/2020 17:13, Shiju Jose wrote:
[...]
> Please find following pseudo code we added for the kernel side to make sure
> we correctly understand your suggestions.
>
> 1. Create edac device and edac device sysfs entries for the online CPU caches.
> /drivers/edac/edac_device.c
>

RE: [RFC PATCH 0/7] RAS/CEC: Extend CEC for errors count check on short time period

2020-10-06 Thread Shiju Jose
>Subject: Re: [RFC PATCH 0/7] RAS/CEC: Extend CEC for errors count check on
>short time period
>
>Hi Shiju,
>
>On 02/10/2020 16:38, Shiju Jose wrote:
>>> -----Original Message-----

Re: [RFC PATCH 0/7] RAS/CEC: Extend CEC for errors count check on short time period

2020-10-02 Thread Borislav Petkov
On Fri, Oct 02, 2020 at 06:33:17PM +0100, James Morse wrote:
> > I think adding the CPU error collection to the kernel
> > has the following advantages,
> > 1. The CPU error collection and isolation would not be active if the
> > rasdaemon stopped running or not running on a machine.

Re: [RFC PATCH 0/7] RAS/CEC: Extend CEC for errors count check on short time period

2020-10-02 Thread James Morse
>> Subject: Re: [RFC PATCH 0/7] RAS/CEC: Extend CEC for errors count check on
>> short time period
>>
>> On Fri, Oct 02, 2020 at 01:22:28PM +0100, Shiju

RE: [RFC PATCH 0/7] RAS/CEC: Extend CEC for errors count check on short time period

2020-10-02 Thread Luck, Tony
> Because from my x86 CPUs limited experience, the cache arrays are mostly
> fine and errors reported there are not something that happens very
> frequently so we don't even need to collect and count those.
On Intel X86 we leave the counting and threshold decisions about cache health to the

RE: [RFC PATCH 0/7] RAS/CEC: Extend CEC for errors count check on short time period

2020-10-02 Thread Shiju Jose
>Subject: Re: [RFC PATCH 0/7] RAS/CEC: Extend CEC for errors count check on
>short time period
>
>On Fri, Oct 02, 2020 at 01:22:28PM +0100, Shiju Jose wrote:
>> Open Questions based on the feedback from Boris, 1. ARM process

Re: [RFC PATCH 0/7] RAS/CEC: Extend CEC for errors count check on short time period

2020-10-02 Thread Borislav Petkov
On Fri, Oct 02, 2020 at 01:22:28PM +0100, Shiju Jose wrote:
> Open Questions based on the feedback from Boris,
> 1. ARM processor error types are cache/TLB/bus errors.
>[Reference N2.4.4.1 ARM Processor Error Information UEFI Spec v2.8]
> Any of the above error types should not be consider for