Re: [Freeipmi-users] Decoding ram errors on supermicro

2018-12-14 Thread Tom Hetmer
Hi
OK, great!


I'll try this build and let you know. It may take a while because of Christmas,
and we also need to wait for some new errors to show up.


Best,
Tom Hetmer


CDN77 Operations
supp...@cdn77.com / +44 (0) 20 3514 2399 / www.cdn77.com

Re: [Freeipmi-users] Decoding ram errors on supermicro

2018-12-13 Thread Albert Chu
Hey Tom,

OK, I went with just the X10 motherboards. I have a new branch called
supermicro_dimm2 that handles all the X10s you listed.

https://github.com/chu11/freeipmi-mirror/tree/supermicro_dimm2

Can you give it a try, ideally against a few different motherboards?

Al
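The decision to support "just the X10 motherboards" amounts to gating the DIMM interpretation on the board's product ID. Below is a minimal Python sketch of that gating. It assumes the bmc-info line format quoted later in this thread ("Product ID : 2201"). The allowlist contains only the X10 IDs reported in this thread, and the names `X10_PRODUCT_IDS`, `parse_product_id`, and `use_x10_dimm_decoding` are hypothetical. This is not FreeIPMI's actual code.

```python
import re
from typing import Optional

# Hypothetical allowlist built only from the X10 product IDs reported in this
# thread (decimal values as printed by bmc-info). Not FreeIPMI's actual table.
X10_PRODUCT_IDS = {
    2049: "X10SLL-F",
    2051: "X10SLH-F / X10SLM+-F",
    2065: "X10SLM-F",
    2097: "X10DRL-i",
    2148: "X10DRW-E",
    2201: "X10DRH LN4 / X10DRH-CLN4",
}


def parse_product_id(bmc_info_output: str) -> Optional[int]:
    """Return the decimal 'Product ID' value from bmc-info output, or None."""
    match = re.search(r"^Product ID\s*:\s*(\d+)", bmc_info_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def use_x10_dimm_decoding(product_id: Optional[int]) -> bool:
    """Attempt the DIMM interpretation only on boards known to be X10."""
    return product_id in X10_PRODUCT_IDS


pid = parse_product_id("Product ID            : 2201\n")
print(pid, use_x10_dimm_decoding(pid))  # 2201 True
print(use_x10_dimm_decoding(2369))      # False (X11SPi-TF is not X10)
```

An explicit allowlist is deliberately conservative. A board with an unlisted product ID falls back to the raw OEM codes instead of being given a DIMM label that might be wrong.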

Re: [Freeipmi-users] Decoding ram errors on supermicro

2018-12-13 Thread Tom Hetmer

I'm not sure whether this or the 'version 1' method applies to X9s.
I've seen it work on all X10s, at least, so I'd limit it to X10 boards.

Best,
Tom Hetmer



Re: [Freeipmi-users] Decoding ram errors on supermicro

2018-12-12 Thread Albert Chu
Hey Tom,

So are you under the impression that all the motherboards in your product ID
list should support this DIMM interpretation?

We at least have evidence of the X10DRH LN4 working based on your prior
e-mail, but I am a tad reluctant to add all motherboards, especially non-X10
motherboards, since we do not have official information from Supermicro.

What are your thoughts?

Al

Re: [Freeipmi-users] Decoding ram errors on supermicro

2018-12-12 Thread Albert Chu
Oh whoops, so I got the wrong motherboard product ID number. But
otherwise it seems to work correctly.

I have some refactoring to do before adding the rest of the
motherboards; then I can give you another branch to try that will be a
candidate for the next release.

Al

Re: [Freeipmi-users] Decoding ram errors on supermicro

2018-12-12 Thread Tom Hetmer
Hi,

No luck.


201  | Sep-22-2018 | 00:23:34 | Sensor #0        | Memory            | Correctable memory error ; OEM Event Data2 code = 2Bh ; OEM Event Data3 code = 80h
202  | Sep-29-2018 | 09:31:25 | Sensor #0        | Memory            | Correctable memory error ; OEM Event Data2 code = 2Bh ; OEM Event Data3 code = 80h
203  | Oct-13-2018 | 19:31:34 | Sensor #0        | Memory            | Correctable memory error ; OEM Event Data2 code = 2Bh ; OEM Event Data3 code = 80h
204  | Oct-20-2018 | 01:49:38 | Sensor #0        | Memory            | Correctable memory error ; OEM Event Data2 code = 2Bh ; OEM Event Data3 code = 80h


debug: http://termbin.com/3x02

10.110.32.36: [ 811h] = product_id[16b] 
It seems that the X10SLM-F (not the X10SLM+-F) uses product ID 2065 (811h) instead of 2051.
You can check it in the full list:
https://github.com/chu11/freeipmi-mirror/files/2651093/product_ids.txt
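The debug output prints the product ID in hex, while bmc-info and the product ID list use decimal. 811h is 8*256 + 1*16 + 1 = 2065, whereas 2051 would be 803h. Below is a small Python sketch for converting such a debug line. It assumes the `[ 811h] = product_id[16b]` format shown above, and the helper name `parse_debug_product_id` is hypothetical.

```python
import re


def parse_debug_product_id(line: str) -> int:
    """Convert the hex product ID in a '[ 811h] = product_id[16b]' debug line to decimal."""
    match = re.search(r"\[\s*([0-9A-Fa-f]+)h\]\s*=\s*product_id\[16b\]", line)
    if match is None:
        raise ValueError(f"no product_id field in {line!r}")
    return int(match.group(1), 16)


pid = parse_debug_product_id("10.110.32.36: [ 811h] = product_id[16b]")
print(pid)             # 2065, i.e. 8*256 + 1*16 + 1
print(f"{2051:X}h")    # 803h, the ID the first branch expected
```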


When patched with 2065:
201,Sep-22-2018,00:23:34,Sensor #0,Memory,Warning,Correctable memory error ; DIMMB2(CPU1)
202,Sep-29-2018,09:31:25,Sensor #0,Memory,Warning,Correctable memory error ; DIMMB2(CPU1)
203,Oct-13-2018,19:31:34,Sensor #0,Memory,Warning,Correctable memory error ; DIMMB2(CPU1)
204,Oct-20-2018,01:49:38,Sensor #0,Memory,Warning,Correctable memory error ; DIMMB2(CPU1)


Voila :)
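Once each memory event carries a DIMM label, per-DIMM monitoring becomes simple. Below is a minimal Python sketch that counts correctable errors per DIMM. It assumes the seven-column comma-separated layout shown above (ID, Date, Time, Name, Type, State, Event), and the label regex only covers labels shaped like `DIMMB2(CPU1)`. The function name `count_dimm_errors` is hypothetical.

```python
import csv
import io
import re
from collections import Counter

# Matches interpreted DIMM labels such as "DIMMB2(CPU1)".
DIMM_LABEL = re.compile(r"DIMM[A-Z]+\d+\(CPU\d+\)")


def count_dimm_errors(csv_text: str) -> Counter:
    """Count correctable memory errors per DIMM in comma-separated ipmi-sel output."""
    counts: Counter = Counter()
    for row in csv.reader(io.StringIO(csv_text)):
        # Expected columns: ID, Date, Time, Name, Type, State, Event.
        if len(row) < 7 or row[4] != "Memory":
            continue  # skips blank lines, a header row, and non-memory events
        event = row[6]
        # Case-sensitive on purpose: "Uncorrectable memory error" does not match.
        if "Correctable memory error" not in event:
            continue
        match = DIMM_LABEL.search(event)
        counts[match.group(0) if match else "unknown"] += 1
    return counts


sample = (
    "201,Sep-22-2018,00:23:34,Sensor #0,Memory,Warning,Correctable memory error ; DIMMB2(CPU1)\n"
    "202,Sep-29-2018,09:31:25,Sensor #0,Memory,Warning,Correctable memory error ; DIMMB2(CPU1)\n"
)
print(count_dimm_errors(sample))  # Counter({'DIMMB2(CPU1)': 2})
```

Events that are not interpreted (for example, on a board outside the supported list) are counted under "unknown" rather than dropped, so they still show up in monitoring.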



Best,
Tom Hetmer



- Original message -
> From: "Albert Chu"
> To: "Tom Hetmer", freeipmi-users@gnu.org
> Date: 12/12/18 02:18
> Subject: Re: Re[4]: [Freeipmi-users] Decoding ram errors on supermicro
> 
> Hey Tom,
> 
> I have a branch on GitHub with (what I hope is) support for the X10SLM+-F.
> Could you give it a shot? The branch is called "supermicro_dimm".
> 
> https://github.com/chu11/freeipmi-mirror/tree/supermicro_dimm
> 
> ./autogen.sh
> ./configure
> make
> ipmi-sel/ipmi-sel --interpret-oem-data
> (add remote connection options as needed to ipmi-sel)
> 
> If that doesn't work, could you do the following:
> 
> ipmi-sel/ipmi-sel --debug --display=201
> 
> (I picked 201 as one of the DIMM outputs below. It doesn't have to be
> that one, just any specific DIMM SEL event.)
> 
> Thanks,
> 
> Al
> 
> On Tue, 2018-12-11 at 13:33 +0100, Tom Hetmer wrote:
> > Supermicro (after pointing me to the web interface and SNMP...):
> > "Sorry, we do not have this Information at our support desk. you can
> > request this via your sales channel, but it can be that you would
> > need to sign an NDA for such information."
> > 
> > So we're on our own; I don't have a better contact, as we buy from a
> > reseller.
> > Besides, they'd want an NDA for those 3 lines of code.
> > 
> > Best,
> > Tom Hetmer
> > 
Re: [Freeipmi-users] Decoding ram errors on supermicro

2018-12-11 Thread Tom Hetmer
Hey,

So that was fast: we've got an older X10SLM-F rented out to a customer.


IPMI web says
201    2018/09/22 00:23:34    OEM    Memory    Correctable Memory ECC @ DIMMB2(CPU1)
202    2018/09/29 09:31:25    OEM    Memory    Correctable Memory ECC @ DIMMB2(CPU1)
203    2018/10/13 19:31:34    OEM    Memory    Correctable Memory ECC @ DIMMB2(CPU1)
204    2018/10/20 01:49:38    OEM    Memory    Correctable Memory ECC @ DIMMB2(CPU1)


freeipmi:
ID   | Date        | Time     | Name             | Type              | State    | Event
7    | Jan-21-2016 | 15:26:16 | FANA             | Fan               | Critical | Lower Critical - going low ; Sensor Reading = 0.00 RPM ; Threshold = 600.00 RPM
8    | Jan-21-2016 | 15:26:16 | FANA             | Fan               | Critical | Lower Non-recoverable - going low ; Sensor Reading = 0.00 RPM ; Threshold = 400.00 RPM
9    | Jan-21-2016 | 15:26:25 | FANA             | Fan               | Critical | Lower Non-recoverable - going low ; Sensor Reading = 13300.00 RPM ; Threshold = 400.00 RPM
10   | Jan-21-2016 | 15:26:25 | FANA             | Fan               | Warning  | Lower Critical - going low ; Sensor Reading = 13300.00 RPM ; Threshold = 600.00 RPM
201  | Sep-22-2018 | 00:23:34 | Sensor #0        | Memory            | Warning  | Correctable memory error ; OEM Event Data2 code = 2Bh ; OEM Event Data3 code = 80h
202  | Sep-29-2018 | 09:31:25 | Sensor #0        | Memory            | Warning  | Correctable memory error ; OEM Event Data2 code = 2Bh ; OEM Event Data3 code = 80h
203  | Oct-13-2018 | 19:31:34 | Sensor #0        | Memory            | Warning  | Correctable memory error ; OEM Event Data2 code = 2Bh ; OEM Event Data3 code = 80h
204  | Oct-20-2018 | 01:49:38 | Sensor #0        | Memory            | Warning  | Correctable memory error ; OEM Event Data2 code = 2Bh ; OEM Event Data3 code = 80h
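Without the OEM interpretation, FreeIPMI prints only the raw OEM Event Data2/Data3 bytes for these memory events. Below is a minimal Python sketch that collects those bytes from the default pipe-delimited output above. One possible use is picking a record ID for `ipmi-sel --debug --display=<ID>`. It assumes the seven-column layout shown here. It deliberately does not decode the bytes into a DIMM, since that mapping is Supermicro-specific and is not documented in this thread. The function name `uninterpreted_memory_events` is hypothetical.

```python
import re
from typing import List, Tuple

# Matches the raw OEM bytes printed when the event is not interpreted.
OEM_CODES = re.compile(
    r"OEM Event Data2 code = ([0-9A-Fa-f]{2})h ; OEM Event Data3 code = ([0-9A-Fa-f]{2})h"
)


def uninterpreted_memory_events(sel_text: str) -> List[Tuple[int, int, int]]:
    """Return (record ID, Event Data2, Event Data3) for Memory rows with raw OEM bytes."""
    results = []
    for line in sel_text.splitlines():
        # Expected columns: ID | Date | Time | Name | Type | State | Event
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 7 or fields[4] != "Memory":
            continue  # skips the header row and non-memory events
        match = OEM_CODES.search(fields[6])
        if match:
            results.append((int(fields[0]), int(match.group(1), 16), int(match.group(2), 16)))
    return results


sample = (
    "ID   | Date        | Time     | Name             | Type              | State    | Event\n"
    "7    | Jan-21-2016 | 15:26:16 | FANA             | Fan               | Critical | Lower Critical - going low ; Sensor Reading = 0.00 RPM ; Threshold = 600.00 RPM\n"
    "201  | Sep-22-2018 | 00:23:34 | Sensor #0        | Memory            | Warning  | Correctable memory error ; OEM Event Data2 code = 2Bh ; OEM Event Data3 code = 80h\n"
)
for record_id, data2, data3 in uninterpreted_memory_events(sample):
    print(record_id, f"{data2:02X}h", f"{data3:02X}h")  # 201 2Bh 80h
```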


We'll ask the customer for downtime to replace the DIMM. The reported slot should
be correct, as it's official data from Supermicro's own interface.

Best,
Tom Hetmer




Re: [Freeipmi-users] Decoding ram errors on supermicro

2018-12-11 Thread Tom Hetmer
Hi,

It appears we have no ECC errors on the servers we directly own right now.
I'll let you know when we get one, though.


We also rent out some machines to customers, so there may be some errors there;
my colleague will check the report today.


I also created a ticket with Supermicro, in case they can confirm we're looking
at the right code or add any official details.


Best,
Tom Hetmer



- Original message -
> From: "Al Chu"
> To: "Tom Hetmer", freeipmi-users@gnu.org
> Date: 12/11/18 02:28
> Subject: Re: [Freeipmi-users] Decoding ram errors on supermicro
> 
> Hey Tom,
> 
> Is there a specific motherboard (among the product IDs you mentioned
> below) with a DIMM error that we can test on? To make sure I
> don't make a major mistake, I'd like to code for one motherboard first.
> 
> Thanks,
> Al
> 
>  
> On Wed, 2018-12-05 at 10:48 -0800, Albert Chu wrote:
> > On Wed, 2018-12-05 at 03:38 +0100, Tom Hetmer wrote:
> > > Alright, added to GitHub.
> > > 
> > > Here's the output from bmc-info for that particular board.
> > > Product ID            : 2201
> > > [Mon Dec  3 12:08:13 2018] DMI: Supermicro X10DRH LN4/X10DRH-CLN4, BIOS 2.0 01/30/2016
> > > 
> > > 
> > > I guess you'll support it based on the product ID?
> > 
> > Yes!  Thanks.  I'll put these in the ticket too.
> > 
> > Al
> > 
> > > So if there are any other (X10) boards with a different product ID
> > > but the same SEL output, I'll have to send it again, correct?
> > > 
> > > 
> > > I have all kinds of numbers on other machines, e.g.:
> > > X10DRW-E => 2148
> > > X11SPi-TF => 2369
> > > X10SLL-F => 2049
> > > X10DRL-i => 2097
> > > X11DDW-NT => 2407
> > > X10SLH-F/X10SLM+-F => 2051
> > > 
> > > 
> > > and so on. I think we have at least a quarter of the boards they
> > > manufacture.
> > > X9s are under 2000, and X11s seem to be 23xx. But that's maybe too much
> > > reverse engineering for you ;)
> > > I can try to ping them and ask about details, but I have no official
> > > contact at Supermicro.
> > > 
> > > 
> > > Best,
> > > Tom Hetmer
> > > 
> > > 
> > > CDN77 Operations
> > > supp...@cdn77.com / +44 (0) 20 3514 2399 / www.cdn77.com
> > > 

Re: [Freeipmi-users] Decoding ram errors on supermicro

2018-12-10 Thread Al Chu
Hey Tom,

Is there a specific motherboard (amongst the product IDs you mentioned
below) you have with a DIMM error that we can test on?  To make sure I
don't make a major mistake, I'd like to code for one motherboard first.

Thanks,
Al

 
On Wed, 2018-12-05 at 10:48 -0800, Albert Chu wrote:
> On Wed, 2018-12-05 at 03:38 +0100, Tom Hetmer wrote:
> > Alright, added to github.
> > 
> > Here's the output from bmc-info for that particular board.
> > Product ID            : 2201
> > [Mon Dec  3 12:08:13 2018] DMI: Supermicro X10DRH LN4/X10DRH-CLN4,
> > BIOS 2.0 01/30/2016
> > 
> > 
> > I guess you'll support it based on the product ID?
> 
> Yes!  Thanks.  I'll put these in the ticket too.
> 
> Al
> 
> > So if there are any other (X10) boards with a different product ID
> > but the same SEL output, I'll have to send it again, correct?
> > 
> > 
> > I have all kinds of numbers on other machines,
> > e.g.
> > X10DRW-E => 2148
> > X11SPi-TF => 2369
> > X10SLL-F => 2049
> > X10DRL-i => 2097
> > X11DDW-NT => 2407
> > X10SLH-F/X10SLM+-F/X10SLH-F/X10SLM+-F => 2051
> > 
> > 
> > and so on. I think we have at least 1/4 of the boards they
> > manufacture.
> > X9s are under 2000, X11s seem to be 23xx. But that's maybe too much
> > reverse engineering for you ;)
> > I can try to ping them and ask about details, but I have no official
> > contact with Supermicro.
> > 
> > 
> > Best,
> > Tom Hetmer
> > 
> > 
> > CDN77 Operations
> > supp...@cdn77.com / +44 (0) 20 3514 2399 / www.cdn77.com
> > 

Re: [Freeipmi-users] Decoding ram errors on supermicro

2018-12-05 Thread Albert Chu
On Wed, 2018-12-05 at 03:38 +0100, Tom Hetmer wrote:
> Alright, added to github.
> 
> Here's the output from bmc-info for that particular board.
> Product ID            : 2201
> [Mon Dec  3 12:08:13 2018] DMI: Supermicro X10DRH LN4/X10DRH-CLN4,
> BIOS 2.0 01/30/2016
> 
> 
> I guess you'll support it based on the product ID?

Yes!  Thanks.  I'll put these in the ticket too.

Al

> So if there are any other (X10) boards with a different product ID but
> the same SEL output, I'll have to send it again, correct?
> 
> 
> I have all kinds of numbers on other machines,
> e.g.
> X10DRW-E => 2148
> X11SPi-TF => 2369
> X10SLL-F => 2049
> X10DRL-i => 2097
> X11DDW-NT => 2407
> X10SLH-F/X10SLM+-F/X10SLH-F/X10SLM+-F => 2051
> 
> 
> and so on. I think we have at least 1/4 of the boards they
> manufacture.
> X9s are under 2000, X11s seem to be 23xx. But that's maybe too much
> reverse engineering for you ;)
> I can try to ping them and ask about details, but I have no official
> contact with Supermicro.
> 
> 
> Best,
> Tom Hetmer
> 
> 
> CDN77 Operations
> supp...@cdn77.com / +44 (0) 20 3514 2399 / www.cdn77.com

Re: [Freeipmi-users] Decoding ram errors on supermicro

2018-12-04 Thread Tom Hetmer
Alright, added to github.

Here's the output from bmc-info for that particular board.
Product ID            : 2201
[Mon Dec  3 12:08:13 2018] DMI: Supermicro X10DRH LN4/X10DRH-CLN4, BIOS 2.0 01/30/2016


I guess you'll support it based on the product ID?
So if there are any other (X10) boards with a different product ID but the same
SEL output, I'll have to send it again, correct?


I have all kinds of numbers on other machines,
e.g.
X10DRW-E => 2148
X11SPi-TF => 2369
X10SLL-F => 2049
X10DRL-i => 2097
X11DDW-NT => 2407
X10SLH-F/X10SLM+-F/X10SLH-F/X10SLM+-F => 2051


and so on. I think we have at least 1/4 of the boards they manufacture.
X9s are under 2000, X11s seem to be 23xx. But that's maybe too much reverse
engineering for you ;)
I can try to ping them and ask about details, but I have no official contact with
Supermicro.
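A minimal C sketch of keying the decoding on the product ID, as discussed above. The table holds only the product IDs and board names reported in this thread; `struct board`, `find_board()` and `dimm_decoding_supported()` are hypothetical names, not freeipmi's code, and limiting the decoding to X10 boards by name prefix is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Product IDs as printed by bmc-info in this thread (decimal).
 * Hypothetical table for illustration, not freeipmi's data. */
struct board {
    unsigned int product_id;
    const char *name;
};

static const struct board supermicro_boards[] = {
    { 2201, "X10DRH LN4/X10DRH-CLN4" },
    { 2148, "X10DRW-E" },
    { 2049, "X10SLL-F" },
    { 2097, "X10DRL-i" },
    { 2051, "X10SLH-F/X10SLM+-F" },
    { 2369, "X11SPi-TF" },
    { 2407, "X11DDW-NT" },
};

/* Returns the table entry for a product ID, or NULL if it is unknown. */
const struct board *find_board(unsigned int product_id)
{
    size_t n = sizeof supermicro_boards / sizeof supermicro_boards[0];
    for (size_t i = 0; i < n; i++) {
        if (supermicro_boards[i].product_id == product_id)
            return &supermicro_boards[i];
    }
    return NULL;
}

/* Assumption: only known X10 boards get the DIMM decoding, so match the
 * "X10" name prefix instead of guessing from product ID ranges. */
int dimm_decoding_supported(unsigned int product_id)
{
    const struct board *b = find_board(product_id);
    return b != NULL && strncmp(b->name, "X10", 3) == 0;
}
```

For example, `dimm_decoding_supported(2201)` returns 1 for the X10DRH LN4, while 2369 (X11SPi-TF) and any unlisted ID return 0. An explicit list avoids relying on the observed ranges (X9 under 2000, X11 around 23xx) for boards nobody has tested.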


Best,
Tom Hetmer


CDN77 Operations
supp...@cdn77.com / +44 (0) 20 3514 2399 / www.cdn77.com

----- Original message -----
> From: "Albert Chu"
> To: "Tom Hetmer" , freeipmi-users@gnu.org
> Date: 12/04/18 19:40
> Subject: Re: [Freeipmi-users] Decoding ram errors on supermicro
> 
> On Tue, 2018-12-04 at 11:39 +0100, Tom Hetmer wrote:
> > Sure. It seems there's a similar ticket
> > already: https://github.com/chu11/freeipmi-mirror/issues/19
> 
> Ahh, if you could, update it with info from ipmitool / ipmiutil.  I was
> reluctant to add support based on reverse engineering.  But if other
> tools have "official" interpretations from Supermicro, I'm more
> confident in the addition.
> 
> > Yep, that's the code. ipmitool and a few others decode it too.
> > 
> > 
> > We have a *lot* of Supermicros so I can help with testing if needed -
> > but we don't get that many CRC errors though :)
> 
> The one thing I'll need is product ID numbers (you can get them from
> bmc-info) and the name of the product.  This goes into the documentation
> and some of the code.
> 
> Thanks,
> 
> Al
> 
> > So I guess we'd have to wait till one pops up. But I hope the 'ver 2'
> > method from ipmiutil works fine.
> > We used ipmitool in our monitoring before and it was accurate but
> > slow, that's why I rewrote it all to use freeipmi.
> > 
> > 
> > Thanks!
> > 
> > 
> > Best,
> > Tom Hetmer
> > 
> > 
> > CDN77 Operations
> > supp...@cdn77.com / +44 (0) 20 3514 2399 / www.cdn77.com

Re: [Freeipmi-users] Decoding ram errors on supermicro

2018-12-04 Thread Albert Chu
On Tue, 2018-12-04 at 11:39 +0100, Tom Hetmer wrote:
> Sure. It seems there's a similar ticket
> already: https://github.com/chu11/freeipmi-mirror/issues/19

Ahh, if you could, update it with info from ipmitool / ipmiutil.  I was
reluctant to add support based on reverse engineering.  But if other
tools have "official" interpretations from Supermicro, I'm more
confident in the addition.

> Yep, that's the code. ipmitool and a few others decode it too.
> 
> 
> We have a *lot* of Supermicros so I can help with testing if needed -
> but we don't get that many CRC errors though :)

The one thing I'll need is product ID numbers (you can get them from
bmc-info) and the name of the product.  This goes into the documentation
and some of the code.
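The product ID can be read from bmc-info's text output. A minimal C sketch, assuming the `Product ID : <decimal>` layout shown in this thread; `parse_product_id()` is a hypothetical helper, not part of freeipmi's libraries.

```c
#include <stdio.h>
#include <string.h>

/* Extracts the product ID from one line of bmc-info output such as
 * "Product ID            : 2201". The label is taken from the output
 * quoted in this thread; other versions may format it differently.
 * Returns 0 and stores the ID on success, -1 if the line does not match. */
int parse_product_id(const char *line, unsigned int *product_id)
{
    static const char label[] = "Product ID";
    const char *colon;

    if (strncmp(line, label, sizeof label - 1) != 0)
        return -1;
    colon = strchr(line, ':');
    /* %u skips the whitespace after the colon. */
    if (colon == NULL || sscanf(colon + 1, "%u", product_id) != 1)
        return -1;
    return 0;
}
```

Feed it each line of `bmc-info` output (for example read with `popen()` and `fgets()`) and stop at the first line that returns 0.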

Thanks,

Al

> So I guess we'd have to wait till one pops up. But I hope the 'ver 2'
> method from ipmiutil works fine.
> We used ipmitool in our monitoring before and it was accurate but
> slow, that's why I rewrote it all to use freeipmi.
> 
> 
> Thanks!
> 
> 
> Best,
> Tom Hetmer
> 
> 
> CDN77 Operations
> supp...@cdn77.com / +44 (0) 20 3514 2399 / www.cdn77.com
-- 
Albert Chu
ch...@llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory


___
Freeipmi-users mailing list
Freeipmi-users@gnu.org
https://lists.gnu.org/mailman/listinfo/freeipmi-users


Re: [Freeipmi-users] Decoding ram errors on supermicro

2018-12-04 Thread Tom Hetmer
Sure. It seems there's a similar ticket already: 
https://github.com/chu11/freeipmi-mirror/issues/19
Yep, that's the code. ipmitool and a few others decode it too.


We have a *lot* of Supermicros so I can help with testing if needed - but we 
don't get that many CRC errors though :)
So I guess we'd have to wait till one pops up. But I hope the 'ver 2' method 
from ipmiutil works fine.
We used ipmitool in our monitoring before and it was accurate but slow, that's 
why I rewrote it all to use freeipmi.
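Monitoring built on freeipmi first needs the two OEM bytes from each SEL line. A minimal C sketch that pulls them out of a line like the one in the original report (`OEM Event Data2 code = 3Ah ; OEM Event Data3 code = 81h`); the field wording is copied from that output and assumed to be stable, and `parse_oem_data()` is a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>

/* Reads the OEM event data bytes from an ipmi-sel line such as
 *   "... error ; OEM Event Data2 code = 3Ah ; OEM Event Data3 code = 81h"
 * Returns 0 on success, -1 if either field is missing or not hex. */
int parse_oem_data(const char *line, unsigned char *data2,
                   unsigned char *data3)
{
    static const char k2[] = "OEM Event Data2 code = ";
    static const char k3[] = "OEM Event Data3 code = ";
    const char *p2 = strstr(line, k2);
    const char *p3 = strstr(line, k3);
    unsigned int v2, v3;

    if (p2 == NULL || p3 == NULL)
        return -1;
    /* %2x reads the two hex digits and leaves the trailing 'h' alone. */
    if (sscanf(p2 + sizeof k2 - 1, "%2x", &v2) != 1 ||
        sscanf(p3 + sizeof k3 - 1, "%2x", &v3) != 1)
        return -1;
    *data2 = (unsigned char)v2;
    *data3 = (unsigned char)v3;
    return 0;
}
```

The resulting bytes can then be passed to whichever DIMM decoding applies to the board.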


Thanks!


Best,
Tom Hetmer


CDN77 Operations
supp...@cdn77.com / +44 (0) 20 3514 2399 / www.cdn77.com

----- Original message -----
> From: "Albert Chu"
> To: "Tom Hetmer" , freeipmi-users@gnu.org
> Date: 12/03/18 21:06
> Subject: Re: [Freeipmi-users] Decoding ram errors on supermicro
> 
> Hi Tom,
> 
> Thanks for the pointer to ipmiutil's code.  I assume you found this
> comment:
> 
> ---
>   /* ver 2 method: 2A 80 = P1_DIMMB1 */
>   /* SuperMicro says:
>    *  pair: %c (data2 >> 4) + 0x40 + (data3 & 0x3) * 3, (='B')
>    *  dimm: %c (data2 & 0xf) + 0x27,
>    *  cpu:  %x (data3 & 0x03) + 1);
>    */
> ---
> 
> I can definitely add it to my todo list.
> 
> Would you mind writing up an issue on github here?
> 
> https://github.com/chu11/freeipmi-mirror
> 
> Al
> 
> -- 
> Albert Chu
> ch...@llnl.gov
> Computer Scientist
> High Performance Systems Division
> Lawrence Livermore National Laboratory



Re: [Freeipmi-users] Decoding ram errors on supermicro

2018-12-03 Thread Albert Chu
Hi Tom,

Thanks for the pointer to ipmiutil's code.  I assume you found this
comment:

---
  /* ver 2 method: 2A 80 = P1_DIMMB1 */
  /* SuperMicro says:
   *  pair: %c (data2 >> 4) + 0x40 + (data3 & 0x3) * 3, (='B')
   *  dimm: %c (data2 & 0xf) + 0x27,
   *  cpu:  %x (data3 & 0x03) + 1);
   */
---
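The quoted formula can be checked with a few lines of C. This sketch applies it literally; `supermicro_dimm_v2()` and the `P<cpu>_DIMM<pair><dimm>` output format (taken from the `P1_DIMMB1` example in the comment) are illustrative, not freeipmi's or ipmiutil's API.

```c
#include <stdio.h>

/* Applies the "ver 2" formula from the ipmiutil comment to the OEM
 * event data bytes and writes e.g. "P1_DIMMB1" into buf. */
void supermicro_dimm_v2(unsigned char data2, unsigned char data3,
                        char *buf, size_t len)
{
    /* Letter: data2's high nibble above 0x40, plus 3 per CPU. */
    char pair = (char)((data2 >> 4) + 0x40 + (data3 & 0x3) * 3);
    /* Slot digit: data2's low nibble plus 0x27, so 0xA maps to '1'. */
    char dimm = (char)((data2 & 0xf) + 0x27);
    /* CPU number: data3's low two bits, 1-based. */
    unsigned int cpu = (data3 & 0x03) + 1u;

    snprintf(buf, len, "P%x_DIMM%c%c", cpu, pair, dimm);
}
```

With data2 = 2Ah and data3 = 80h it produces `P1_DIMMB1`, matching the comment. Applied to the 3Ah/81h event from the X10DRH LN4 in this thread, the formula as written yields `P2_DIMMF1`, whereas the web interface reported DIMMG1 on CPU2, so the `(data3 & 0x3) * 3` CPU offset may not hold for every board and is worth checking against more samples.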

I can definitely add it to my todo list.

Would you mind writing up an issue on github here?

https://github.com/chu11/freeipmi-mirror

Al

On Mon, 2018-12-03 at 17:55 +0100, Tom Hetmer wrote:
> Hi, 
> 
> it'd be good if freeipmi supported decoding the supermicro ECC
> errors.
> 
> 
> Manufacturer: Supermicro
> Product Name: X10DRH LN4
> e.g.
> freeipmi
> 1,Dec-01-2018,06:37:53,Sensor #0,Memory,Critical,Uncorrectable memory
> error ; OEM Event Data2 code = 3Ah ; OEM Event Data3 code = 81h
> 
> 
> web interface
> 1 | 12/01/2018 | 06:37:53 | Memory | Uncorrectable ECC
> (@DIMMG1(CPU2)) | Asserted
> 
> 
> something like this worked for me (stolen from ipmiutil)
> 
> 
> $cpu = ($data3 & 0x03) + 1;
> 
> 
> $NPAIRS = 26;
> $rgpairs = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
> 
> 
> $bdata = "0x".$data2.$data3;
> $bdata = hexdec($bdata);
> $pair = (($bdata & 0xF0) >> 4) - 1;
> 
> 
> if ($pair < 0) $pair = 0;
> if ($pair > $NPAIRS) $pair = $NPAIRS - 1;
> 
> 
> $pair = $rgpairs[$pair - 1];
> 
> 
> $dimm = $bdata & 0x0F;
> 
> 
> $dimm may be incorrect, as the original code subtracts 9, but on that
> board that gave the wrong result, so I changed it to get the right one - we'll
> see if it keeps giving the right values.
> 
> Best,
> Tom Hetmer
> 
> 
> CDN77 Operations
> supp...@cdn77.com / +44 (0) 20 3514 2399 / www.cdn77.com
> 
-- 
Albert Chu
ch...@llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory
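For comparison, a C port of the arithmetic in Tom's PHP snippet. As in that snippet, the letter and DIMM number come from data3's high and low nibbles (the `& 0xF0` and `& 0x0F` masks on `data2:data3` only see data3), and the two `- 1` steps net to `- 2`. `supermicro_dimm_tom()` is a hypothetical name, and returning -1 for an out-of-range letter instead of clamping is a deliberate change.

```c
#include <stdio.h>

/* Ports the PHP decoding: bdata = data2:data3, letter index from
 * bdata's bits 4-7, DIMM number from bits 0-3, CPU from data3's low bits.
 * Writes e.g. "DIMMG1(CPU2)" into buf; returns 0, or -1 if the letter
 * index falls outside A..Z. */
int supermicro_dimm_tom(unsigned char data2, unsigned char data3,
                        char *buf, size_t len)
{
    unsigned int bdata = ((unsigned int)data2 << 8) | data3;
    int pair = (int)((bdata & 0xF0) >> 4) - 2; /* PHP subtracts 1 twice */
    unsigned int dimm = bdata & 0x0F;
    unsigned int cpu = (data3 & 0x03) + 1u;

    if (pair < 0 || pair > 25)
        return -1;
    snprintf(buf, len, "DIMM%c%u(CPU%u)", 'A' + pair, dimm, cpu);
    return 0;
}
```

It reproduces the web interface's DIMMG1 (CPU2) for 3Ah/81h, but for the ipmiutil example 2Ah/80h it gives `DIMMG0(CPU1)` rather than `P1_DIMMB1`, since data2 never enters the result.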

