Hi!
There is good news and bad news. Both T500 laptops were examined: one
was (almost) repaired, one is dead.
One-line summary: Yes, memtest86+ killed them. No, it is not related to
the embedded controller. It's a short-circuit in a power hub IC.
Details:
- It's important to note that the power management logic in Lenovo
ThinkPad laptops is quite sophisticated. The embedded controller provides
a high-level signal, while dedicated ICs issue signals to various gates
to power up or power down specific parts of the system.
- One of those low-level ICs is the RINKAN (U61 on Lenovo schematics).
The important part of this IC is the VCC3SW micro-power LDO (low-dropout
regulator). It provides a limited 3.3 V power supply for the power button
detection circuit, the thermal protection logic and a power hub IC.
- The power hub PMH_7 (U28) is more intelligent than RINKAN and has an
SPI connection to the EC. It controls a lot of clocks and power signals
on the main board. Note that the PMH is used across different Lenovo
products, so some of its outputs are left unused. It is common practice
to tie unused IC outputs to ground or VCC instead of leaving them
unconnected.
- The coreboot developers discovered a method of accessing the internal
registers of the PMH. The protocol is simple: write a register address
to one memory-mapped EC address, then read or write the register value
through another EC address. Setting a single bit looks like this:
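    outb(reg, EC_LENOVO_PMH7_ADDR);              /* select PMH register */
    val = inb(EC_LENOVO_PMH7_DATA);              /* read its current value */
    outb(reg, EC_LENOVO_PMH7_ADDR);              /* select it again */
    outb(val | (1 << bit), EC_LENOVO_PMH7_DATA); /* write back with bit set */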
- Now we leave the ground of hard facts and start speculating.
- It seems that either the BIOS does not list the memory-mapped EC
registers as a reserved memory area, or memtest86+ fails to process this
reservation correctly.
- The pattern of memory writes performed by memtest is (unfortunately)
100% compatible with the PMH internal register access protocol.
- It is quite possible that, by writing moving ones and zeros or random
bytes, memtest pulled an unused PMH pin (tied to ground or VCC) high or
low, thereby creating a short circuit on the VCC3SW line.
- This short circuit would tend to overheat the RINKAN LDO, because its
output transistor operates in active mode and is easily overloaded by a
PMH output transistor that is fully conducting, with a resistance of only
milliohms. It seems that RINKAN has no over-current or thermal protection
built in.
- A VCC3SW malfunction is not critical as long as the main board's
3.3 V/9 A and 5 V/8 A buses are powered by the TPS51221 (U41) IC. Most
components draw power from the main buses and not from VCC3SW. But when
the laptop is powered off, there is no working VCC3SW bus to initiate the
power-on process. The laptop is bricked.
Findings:
Both laptops were disassembled and their main boards examined using a
multimeter and an oscilloscope. The main boards were of different
revisions (and different types: one with discrete graphics, one without),
but both had a malfunctioning VCC3SW power bus. On the first laptop,
VCC3SW measured around 1.2 V, and the resistance from VCC3SW to GND was
around 400 Ohm. After we cut the VCC3SW pin of the RINKAN IC and supplied
external power to the VCC3SW line, the laptop powered up and attempted to
boot. We ended up wiring in an external micro-power LDO (LP2930-3.3) to
provide the power permanently. This laptop still has some minor problems
(for example, it refuses to power up unless the battery is removed and
AC-IN is plugged in), but it is usable.
On the second T500, RINKAN was not providing any power to the VCC3SW
bus, and the measured resistance was only ~50 Ohm. We had to cut both the
VCC3SW (output) and VREGIN20 (input) RINKAN pins to remove the
over-current condition. After that we observed power on the main 3.3 V
and 5 V buses, but RINKAN/PMH7 do not issue the 'POWER GOOD' signals,
which prevents the system from becoming usable. No repair is possible.
It looks like the T6x, T400/500, T410/510 and T420/520 laptop families
could be affected by this problem. Starting with the T430/530 series, the
communication protocol with the EC was changed, which broke the tp_smapi
driver and fixed the described problem as a side effect.
I have a "revived" T500 on hand, and I would be happy to provide any
information to confirm or correct my findings.
I still think it is appropriate to warn Lenovo users that it is possible
to brick their laptops with a mere memory test.
---
Sincerely yours,
Sergey Kogan