Bug#900399: More good news

2018-06-07 Thread Сергей Коган

Hi!

Let's lower the severity of this bug and flag it as unverified.

Given the datasheet for the TB62501 and the actual board layout of the
T500, the described scenario (a short from VCC3SW to GND caused by a
stray write to a PMH register) is highly improbable:


- The LDO inside the RINKAN has over-current protection set as low as
55 mA and should prevent any damage even if VCC3SW is shorted. After a
single over-current/under-voltage event, the RINKAN LDO is locked in the
OFF state and requires a complete power-off to restart.


- Unused pins of the PMH are in fact floating, not tied to ground or VCC.

- Some RINKAN batches do show a tendency to malfunction for no apparent
reason. The main board temperature could be a contributing factor.
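
To make the latch-off behaviour from the first point concrete, here is a
minimal sketch in C. It is only a toy model: the struct and function
names are hypothetical, the only number taken from this message is the
55 mA limit, and the under-voltage trigger is not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Over-current limit quoted above for the RINKAN LDO, in mA. */
#define LDO_OCP_LIMIT_MA 55.0

/* Hypothetical model of the LDO state; not taken from the datasheet. */
struct ldo_model {
    bool latched_off;   /* true once the protection has tripped */
};

/* Feed one load-current sample; returns true while the output is on. */
static bool ldo_step(struct ldo_model *ldo, double load_ma)
{
    assert(ldo != NULL);
    if (!ldo->latched_off && load_ma > LDO_OCP_LIMIT_MA)
        ldo->latched_off = true;    /* a single event is enough */
    return !ldo->latched_off;
}

/* Only a complete power-off clears the latch in this model. */
static void ldo_power_cycle(struct ldo_model *ldo)
{
    assert(ldo != NULL);
    ldo->latched_off = false;
}
```

In this model a shorted VCC3SW trips the latch on the first sample above
the limit, the output stays off for any later load, and only
ldo_power_cycle() brings it back.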


So we have to seriously consider the possibility that the two laptops
died at the same time purely by coincidence.


We do plan to run memtest on the restored laptop with a current
measuring/limiting circuit on the VCC3SW bus. If no excessive current
consumption is detected, memtest has nothing to do with the issue. If
excessive current is observed during the test, that will give us a
direction in which to resume the investigation.


---
Sincerely yours,
Sergey Kogan



Bug#900399: It's confirmed: memtest86+ can kill lenovo mainboard

2018-06-06 Thread Сергей Коган

Hi!

Good news and bad news. Both T500 laptops were examined. One was
(almost) repaired. One is dead.


One-line summary:  Yes, memtest86+ killed them. No, it is not related to 
the embedded controller. It's a short-circuit in a power hub IC.


Details:

- It's important to note that the power management logic in Lenovo
ThinkPad laptops is quite sophisticated. The embedded controller
provides a high-level signal, while dedicated ICs issue signals to
various gates to power up or power down specific parts of the system.


- One of those low-level ICs is the RINKAN (U61 on Lenovo schematics).
The important part of the IC is the VCC3SW micro-power LDO (a
low-dropout linear regulator). It provides a limited 3.3 V supply for
the power button detection circuit, the thermal protection logic and the
power hub IC.


- The power hub PMH_7 (U28) is more intelligent than the RINKAN and has
an SPI connection to the EC. It controls a lot of clocks and power
signals on the main board. Note that the PMH is used across different
Lenovo products, so some of its outputs are left unused. It is common
practice to tie unused IC outputs to ground or VCC instead of leaving
them unconnected.


- Coreboot developers discovered a method of accessing the internal
registers of the PMH. The protocol is simple: write a register address
to one memory-mapped EC address, then write the desired value to a
second EC address.


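    /* Read-modify-write: set one bit in a PMH register via the EC. */
    outb(reg, EC_LENOVO_PMH7_ADDR);               /* select the register */
    val = inb(EC_LENOVO_PMH7_DATA);               /* read its current value */
    outb(reg, EC_LENOVO_PMH7_ADDR);               /* select it again */
    outb(val | (1 << bit), EC_LENOVO_PMH7_DATA);  /* write back with the bit set */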

- From here on we leave the ground of hard facts and start speculating.

- It seems that either the BIOS does not list the memory-mapped EC
registers as a reserved memory area, or memtest86+ fails to process
this reservation correctly.


- The pattern of memory writes produced by memtest is (unfortunately)
100% compatible with the PMH internal register access protocol.


- It is very possible that by writing moving ones and zeros or random
bytes, memtest pulled an unused (tied to ground or VCC) PMH pin high or
low, thereby creating a short-circuit on the VCC3SW line.


- This short-circuit would tend to overheat the RINKAN LDO, as its
output transistor is in active mode and is easily overloaded by a PMH
output transistor (which is in conduction mode with a resistance of
milliohms). It seems that the RINKAN has no over-current or thermal
protection built in.


- A VCC3SW malfunction is not critical while the main board 3.3V/9A and
5V/8A buses are powered by the TPS51221 (U41) IC. Most components draw
power from the main buses and not from VCC3SW. But once the laptop is
powered off, there is no VCC3SW bus to initiate the power-on process.
The laptop is bricked.
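
To illustrate the speculative part above, here is a small simulation in
C of an address/data register pair being hit by a sequential test
pattern. Everything in it is an assumption made for illustration: the
two addresses are placeholders and not the real EC locations, the
sim_* names are hypothetical, and the walking-ones sweep is a simplified
stand-in for what memtest86+ really does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder addresses for the simulation, NOT the real EC locations. */
#define SIM_PMH_ADDR  0x10u
#define SIM_PMH_DATA  0x11u
#define SIM_PMH_NREGS 256u

struct sim_pmh {
    uint8_t  index;                /* last value written to the address latch */
    uint8_t  regs[SIM_PMH_NREGS];  /* simulated PMH register file */
    unsigned writes;               /* number of register writes that landed */
};

/* Model one byte write arriving on the bus. */
static void sim_bus_write(struct sim_pmh *pmh, uint32_t addr, uint8_t val)
{
    assert(pmh != NULL);
    if (addr == SIM_PMH_ADDR) {
        pmh->index = val;               /* selects a register */
    } else if (addr == SIM_PMH_DATA) {
        pmh->regs[pmh->index] = val;    /* lands in the selected register */
        pmh->writes++;
    }
    /* any other address is treated as ordinary RAM and ignored here */
}

/* Write a rotating single-bit pattern to every address in [start, end). */
static void sim_walking_ones(struct sim_pmh *pmh, uint32_t start, uint32_t end)
{
    uint8_t pattern = 0x01;
    for (uint32_t a = start; a < end; a++) {
        sim_bus_write(pmh, a, pattern);
        pattern = (uint8_t)((pattern << 1) | (pattern >> 7)); /* rotate left */
    }
}
```

Sweeping a range that covers both placeholder addresses, for example
sim_walking_ones(&pmh, 0x00, 0x20) on a zero-initialised struct, leaves
exactly one register write behind: the test never intended to talk to
the PMH, but an address write followed by a data write is all the
protocol asks for.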


Findings:

Both laptops were disassembled and the main boards examined using a
multimeter and an oscilloscope. The main boards were of different
revisions (and different types: one with discrete graphics, one
without), but on both the VCC3SW power bus had malfunctioned. The first
laptop provided around 1.2 V on VCC3SW, and the measured resistance from
VCC3SW to GND was around 400 Ohm. After cutting the VCC3SW pin on the
RINKAN IC and providing external power to the VCC3SW line, the laptop
powered up and attempted to boot. We ended up wiring in an external
micro-power LDO (LP2930-3.3) to provide the power permanently. This
laptop still has some minor problems (like refusing to power up unless
the battery is removed and AC-IN is plugged in), but it is usable.


On the second T500 the RINKAN was not providing any power to the VCC3SW
bus, and the measured resistance was only ~50 Ohm. We had to cut both
the VCC3SW (output) and VREGIN20 (input) RINKAN pins to remove the
over-current condition. After that we observed power on the main 3.3 V
and 5 V buses, but the RINKAN/PMH7 do not issue 'POWER GOOD' signals,
which prevents the system from becoming usable. No repair is possible.
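
As a rough sanity check on the two resistance readings, Ohm's law gives
the current each one would draw from a healthy 3.3 V rail. The sketch
below is plain arithmetic, not a measurement: it assumes the rail sits
at its nominal voltage and that the post-failure multimeter reading
reflects the load under power, which may well not hold. The 55 mA figure
is the RINKAN over-current limit quoted in the follow-up message; the
function names are hypothetical.

```c
#include <assert.h>

#define VCC3SW_NOMINAL_V 3.3   /* nominal rail voltage, in volts */
#define LDO_OCP_LIMIT_MA 55.0  /* limit quoted in the follow-up, in mA */

/* Ohm's law, I = V / R, with the result converted to milliamps. */
static double load_current_ma(double volts, double ohms)
{
    assert(ohms > 0.0);
    return volts / ohms * 1000.0;
}

/* Nonzero if the load would draw more than the quoted limit. */
static int exceeds_ocp(double volts, double ohms)
{
    return load_current_ma(volts, ohms) > LDO_OCP_LIMIT_MA;
}
```

Under those assumptions the ~400 Ohm board works out to roughly 8 mA,
well below the limit, while the ~50 Ohm board works out to about 66 mA,
slightly above it.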


It looks like the T6x, T400/500, T410/510 and T420/520 laptop families
could be affected by this problem. Starting with the T430/530 series,
the communication protocol with the EC was changed, which broke the
tp_smapi driver and fixed the described problem as a side effect.


I have a "revived" T500 on hand and would be happy to provide any
information needed to confirm or correct my findings.


I still think that it's appropriate to warn Lenovo users that they can
brick their laptops with nothing more than a memory test.


---
Sincerely yours,
Sergey Kogan