I agree with Francois and suggest you just disable the PPC temperature 
monitoring on that board rather than trying to rework it.  Using different code 
on that ROACH's Actel Fusion might be painful though. You could either disable 
all the safety shutdowns with that flag, as you've done, and live with that, or 
short the PPC's temperature diode. I like that idea and think it's a harmless 
fix: the Fusion drives 10uA and 100uA through these BJT temperature sensors and 
measures the differential voltage to determine the temperature, so shorting 
these lines won't increase the currents or anything. 
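
For reference, the two-current measurement can be sketched with the ideal 
diode equation. This is only a rough sketch, not the Fusion's actual 
conversion: the function names, the ideality factor of 1 and the absence of 
any calibration offset are my assumptions. It shows why a shorted sensor 
(0 V differential) reads close to absolute zero, and how small the 
differential voltage is per degree.

```python
import math

BOLTZMANN_K = 1.380649e-23    # J/K
ELECTRON_Q = 1.602176634e-19  # C


def delta_vbe_to_celsius(delta_v, i_low=10e-6, i_high=100e-6, ideality=1.0):
    """Ideal two-current BJT thermometer (assumed model, not Fusion code).

    delta_v is Vbe(i_high) - Vbe(i_low) in volts, with
    delta_v = ideality * (k*T/q) * ln(i_high / i_low).
    """
    volts_per_kelvin = ideality * (BOLTZMANN_K / ELECTRON_Q) * math.log(i_high / i_low)
    return delta_v / volts_per_kelvin - 273.15


def celsius_to_delta_vbe(temp_c, i_low=10e-6, i_high=100e-6, ideality=1.0):
    """Inverse of delta_vbe_to_celsius: expected differential voltage in volts."""
    volts_per_kelvin = ideality * (BOLTZMANN_K / ELECTRON_Q) * math.log(i_high / i_low)
    return (temp_c + 273.15) * volts_per_kelvin


if __name__ == "__main__":
    # Shorted sense lines: zero differential voltage maps to absolute zero.
    print(delta_vbe_to_celsius(0.0))
    # Differential voltage change for a 50 degree step (30 C to 80 C), in mV.
    print((celsius_to_delta_vbe(80.0) - celsius_to_delta_vbe(30.0)) * 1e3)
```

With 10uA and 100uA the slope is roughly 0.2 mV per degree, so a 50 degree 
error is only about 10 mV of differential voltage. A shorted input gives 
-273.15 in this ideal model; I'd guess the -271 you see is the same thing 
with some offset or calibration applied inside the Fusion.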

ATRTN0/1 are the current return lines for these modules. They're shared lines 
so you can have multiple temperature probes. If I understand you correctly, you 
suspect a disconnect between J31 and the Fusion. I think these two lines are 
pulled low/connected together inside the Fusion, so you could try connecting 
the suspect ATRTN line to the other one to see if you can partially restore 
some temperature monitoring functionality (it'd be noisier, if it worked at 
all). If the break's between J31 and the Fusion, try shorting J31's pins 1, 3, 
5 and 7 all together and see if you get more sensible temperature readings. 

If the break's between J31 and the PPC (and you can't get to it on the PPC side 
of the break), then I'd suggest just shorting the sensing on J31 and living 
without PPC temp sensing.

Jason

On 26 Apr 2012, at 06:38, Francois Kapp wrote:

> Hi Matt,
> 
> Good sleuthing!
> 
> The x-ray evidence points to PCB damage under the Actel device. Without that, 
> I would intuitively have suggested that the PPC is the more likely candidate for 
> soldering problems. I would say that not monitoring PPC temperature is not a 
> complete show stopper from a hardware point of view, but the device does get 
> warm, so a good check on the heatsink mounting would be required. I'll leave 
> it to the software guys to comment on the workaround suggestion.
> 
> Francois
> 
> On Apr 25, 2012 11:40 PM, "Matt Dexter" <mdex...@calmail.berkeley.edu> wrote:
> Hi,
> 
> Yesterday I spent some time debugging a Roach1 with
> something strange with the PPC temperature sensor.
> The PPC temperature reported is about 50 deg higher
> than that reported for the Xilinx FPGA or Actel Fusion
> device (which actually has the ADC).
> As in 80 vs 30 when just the Xport is running.
> 
> Could this be caused by the temp sensor having a good
> connection to 1 and only 1 of the 2 PPC's PNP transistor leads ?
> 
> Is there any easy software workaround ?
> Would it be OK to use such a workaround and not monitor PPC's temp ?
> Would you accept delivery of a Roach1 that did not monitor
> the PPC's temp but was otherwise AOK ?
> 
> Or should we remove the Actel Fusion BGA device, inspect&repair
> as necessary, install a new Actel and continue debugging ?
> 
> Thanks
> Matt
> 
> ------------------------------------------------------------
> 
> Before I had a chance to look at the board the Actel
> Fusion device U60 was replaced. Both the current and
> previous devices were programmed with the latest and greatest
> design.  I don't know for certain but suspect the previous
> Actel part behaved identically to the device now
> installed.
> 
> We made some ohmmeter measurements and for a while we
> thought that ATRTN1 (J31-7) was 43 ohms to GND vs 7.x ohms
> to ground on a board that reported good temps.  Later
> we redid the measurements and the strange board also reported
> 7.x ohms.  The reported temps were still bonkers.
> 
> The voltages at J31-7 vs J31-8 were about .69 volts and
> that matched the voltages for the Xilinx temps on J31-3 vs J31-4.
> Those voltages decreased and the reported temperatures increased
> as expected after the board was fully powered up.  But still
> the reported values were too high by 50 for the PPC.
> 
> The board would only stay powered up if we disabled the
> automatic failure condition shutdown function.  We checked
> the other reported temps, voltages and currents and they
> were all fine.  The board would also stay up if we temporarily shorted
> J31-7 to J31-8, which leads to a reported temp of -271.  The -271
> is nonsensical but understandable.  As it is higher than the low temp
> shutoff threshold of -280, all the other autoshutoff protections
> can remain enabled.
> 
> The Actel Fusion device is a 256 pin BGA so it's virtually impossible
> to probe even though the PPC temp sensor input connections
> are on or near the edge of the package.
> Using an X-Ray inspection device and comparing vs a known
> good board we found that the very short length of etch that delivers
> the ATRTN1 signal from a via (from the PPC IC and to J31-7) to the
> PCB pad for the Actel's T6 pin looks to be mainly missing.
> 
> So maybe the Actel's ADC is getting a valid version of just one of
> the PPC's PNP transistor leads and thus reporting a value with a
> strange offset ?
> 
> We tried blindly pushing in a fine wire to make a connection from J31-7
> to the tiny U60-T6 solder ball but never improved the reported value.
> It would have been a big surprise if that actually worked but hey
> no guts no glory.
> 
> Do you have any ideas before we remove the Actel Fusion part at U60
> and inspect&repair the PCB trace from the PCB pad to the breakout
> via ?
> 
> other ideas:
> 1) tweak the autoshutdown code running on this 1 Roach1 to deal
>   with the ~ 50deg offset in reported PPC temps ?
> 2) jumper the PPC temp signals so Vdiff is 0V (-271 deg) or
>   perhaps some other voltage like 1.0VCC so that
>   PPC temperature monitoring is lost but the rest of the
>   protections are up and running.
> 3) no need to disable the failure mode shutdown function entirely
>   so don't want to pursue that path.
> 4) ?
> 
> For now, I believe, additional board bringup and debugging will continue
> with J31-7&8 shorted together...
> 

