Re: PDP-11/45 RSTS/E boot problem

2019-02-19 Thread Liam Proven via cctalk
On Mon, 18 Feb 2019 at 22:16, Fred Cisin via cctalk
 wrote:
>
> One of the most common causes of a terrible ear-piercing high whine is the
> spindle contact.  Many old drives had a springy piece that rubbed against
> the end of the spindle.  Over time, it would wear a divot, polish that,
> and start to squeal.  A very light pressure on it would test that
> hypothesis.  Not enough pressure to muffle the sound, and certainly not
> enough pressure to slow the spindle!  Or, pulling up on it, away from the
> spindle.  Some people claimed that you could just rip it off.  Don't.
> Best is to twist it very slightly sideways, so that it can start wearing a
> new divot.

It was a 3½" EIDE drive. 8GB one, I think, but might have been
smaller. I didn't want to open it to do that, although there was a
time when custom PC builders "de-lidded" hard disks and fitted them
with little acrylic windows so you could see the head move. Not sure
I'd want to trust my data to that...

> Well, there don't seem to be many 350 RAMAC disks still running.
>
> (I'm trying to decide what to use as a base to make a patio table out of a
> [crashed] RAMAC 24" platter)

Conceded.

And thank you for the reminder that I'm not old yet.

My first machine with a hard disk was my work PC in my first job: an
IBM PC-AT, with a 20 MB FS/FH 5¼" ST-506 drive, probably a Seagate
ST-4026. I added a second drive to the machine, a 15 MB one, and put
Xenix/286 on it.

A few years ago I bought a surplus 2½" 1 TB drive from a chap who'd
bought a new notebook and put an SSD in it before use. So, 2nd hand
but unused.

It cost me CzK 1000, about £30 at the time.

£30 for a terabyte. I was in a state of shock. It was so tiny, too.

I found an online capacity comparator thing.

You'd need a pile of those Seagate drives the size of a _space
shuttle_ to hold a terabyte.

https://liam-on-linux.livejournal.com/53353.html
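
For a rough sense of scale, the comparison can be redone as back-of-envelope arithmetic. The sketch below is only an illustration: it assumes decimal units (1 TB = 10^12 bytes, 20 MB = 20 x 10^6 bytes) and the nominal full-height 5¼" form-factor envelope of 5.75 x 3.25 x 8 inches. The function names are made up for this example, and the result ignores packaging and any gaps between drives.

```python
# Assumed sizes in decimal units; not taken from any drive data sheet.
TERABYTE = 10**12
OLD_DRIVE_BYTES = 20 * 10**6  # the 20 MB ST-506 drive mentioned above

# Nominal full-height 5.25" form-factor envelope, converted to metres (assumption).
INCH_M = 0.0254
FH_WIDTH_M = 5.75 * INCH_M
FH_HEIGHT_M = 3.25 * INCH_M
FH_DEPTH_M = 8.0 * INCH_M


def drives_needed(target_bytes, drive_bytes):
    """Number of whole drives needed to hold target_bytes (ceiling division)."""
    return -(-target_bytes // drive_bytes)


def pile_volume_m3(count):
    """Volume of `count` drives stacked with no gaps, in cubic metres."""
    return count * FH_WIDTH_M * FH_HEIGHT_M * FH_DEPTH_M


if __name__ == "__main__":
    n = drives_needed(TERABYTE, OLD_DRIVE_BYTES)
    print(f"{n} drives, roughly {pile_volume_m3(n):.0f} cubic metres")
```

Under these assumptions the pile comes to 50,000 drives and a little over 120 cubic metres of bare drive volume.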

-- 
Liam Proven - Profile: https://about.me/liamproven
Email: lpro...@cix.co.uk - Google Mail/Hangouts/Plus: lpro...@gmail.com
Twitter/Facebook/Flickr: lproven - Skype/LinkedIn: liamproven
UK: +44 7939-087884 - ČR (+ WhatsApp/Telegram/Signal): +420 702 829 053


Re: PDP-11/45 RSTS/E boot problem

2019-02-18 Thread Paul Koning via cctalk



> On Feb 18, 2019, at 4:47 PM, Jay Jaeger via cctalk  
> wrote:
> 
> On 2/18/2019 3:38 PM, Paul Koning via cctalk wrote:
>> 
>> ...
>> Then again, I remember our college RS64 (drive for the RC11) which developed 
>> a bad motor bearing.  ...
>> 
> 
> Nice of the FE to do that.
> 
> The Univ. of Wisconsin CS Department had one of those, but the platter
> went bad.  They just flipped the platter upside down and got more use
> out of it.

Yes, that was a feature.  You had to reformat it, which required getting the 
timing track writer box from Maynard.  I have seen that done on an RS11 (RF11) 
drive on our RSTS system; it crashed some heads and was rebuilt completely (new 
heads, new platter, new motor).

> The Univ. of Wisconsin ECE Department also had one - the two machines
> were nearly twins.  I *have* *that* one - and it still ran when I tried
> it a year or so ago.

Neat.  You can run RT11 on it if you add the boot loader and driver, at least 
old versions.  DOS V4 also supports it.  And older versions of RSTS can use it 
as a swap disk.

paul



Re: PDP-11/45 RSTS/E boot problem

2019-02-18 Thread Jay Jaeger via cctalk
On 2/18/2019 3:38 PM, Paul Koning via cctalk wrote:
> 
> 
>> On Feb 18, 2019, at 4:16 PM, Fred Cisin via cctalk  
>> wrote:
>>
>> On Mon, 18 Feb 2019, Liam Proven via cctalk wrote:
>>> Well that is the thing, of course. I had that with one old IDE disk,
>>> too. It made a terrible ear-piercing high whine that I associate with
>>> a failing disk... but it passed every diagnostic I could throw at it,
>>> so I used it for non-critical stuff and in testbed machines.
>>
>> One of the most common causes of a terrible ear-piercing high whine is the 
>> spindle contact.  Many old drives had a springy piece that rubbed against 
>> the end of the spindle.  
> 
> Then again, I remember our college RS64 (drive for the RC11) which developed 
> a bad motor bearing.  Since the platter is mounted directly on the motor 
> spindle that was a problem.  And it was not under contract, so replacing the 
> motor would have set back the department a substantial sum.  So the DEC FS 
> engineer removed the motor and carried it to Appleton Electric Motor Co., 
> which pulled the old bearing, pressed on a replacement, and handed it back.  
> Jim reinstalled the motor, all was well.  Didn't even lose any data bits.
> 
>   paul
> 

Nice of the FE to do that.

The Univ. of Wisconsin CS Department had one of those, but the platter
went bad.  They just flipped the platter upside down and got more use
out of it.

The Univ. of Wisconsin ECE Department also had one - the two machines
were nearly twins.  I *have* *that* one - and it still ran when I tried
it a year or so ago.


Re: PDP-11/45 RSTS/E boot problem

2019-02-18 Thread Paul Koning via cctalk



> On Feb 18, 2019, at 4:16 PM, Fred Cisin via cctalk  
> wrote:
> 
> On Mon, 18 Feb 2019, Liam Proven via cctalk wrote:
>> Well that is the thing, of course. I had that with one old IDE disk,
>> too. It made a terrible ear-piercing high whine that I associate with
>> a failing disk... but it passed every diagnostic I could throw at it,
>> so I used it for non-critical stuff and in testbed machines.
> 
> One of the most common causes of a terrible ear-piercing high whine is the 
> spindle contact.  Many old drives had a springy piece that rubbed against the 
> end of the spindle.  

Then again, I remember our college RS64 (drive for the RC11) which developed a 
bad motor bearing.  Since the platter is mounted directly on the motor spindle 
that was a problem.  And it was not under contract, so replacing the motor 
would have set back the department a substantial sum.  So the DEC FS engineer 
removed the motor and carried it to Appleton Electric Motor Co., which pulled 
the old bearing, pressed on a replacement, and handed it back.  Jim reinstalled 
the motor, all was well.  Didn't even lose any data bits.

paul




Re: PDP-11/45 RSTS/E boot problem

2019-02-18 Thread Fred Cisin via cctalk

On Mon, 18 Feb 2019, Liam Proven via cctalk wrote:

> Well that is the thing, of course. I had that with one old IDE disk,
> too. It made a terrible ear-piercing high whine that I associate with
> a failing disk... but it passed every diagnostic I could throw at it,
> so I used it for non-critical stuff and in testbed machines.


One of the most common causes of a terrible ear-piercing high whine is the 
spindle contact.  Many old drives had a springy piece that rubbed against 
the end of the spindle.  Over time, it would wear a divot, polish that, 
and start to squeal.  A very light pressure on it would test that 
hypothesis.  Not enough pressure to muffle the sound, and certainly not 
enough pressure to slow the spindle!  Or, pulling up on it, away from the 
spindle.  Some people claimed that you could just rip it off.  Don't.
Best is to twist it very slightly sideways, so that it can start wearing a 
new divot.



> My experience is extensive enough that _anyone's_ justifications of
> why they won't use Brand X disks get ignored,


Well, there don't seem to be many 350 RAMAC disks still running.

(I'm trying to decide what to use as a base to make a patio table out of a 
[crashed] RAMAC 24" platter)


--
Grumpy Ol' Fred ci...@xenosoft.com


Re: PDP-11/45 RSTS/E boot problem

2019-02-18 Thread Liam Proven via cctalk
On Sat, 16 Feb 2019 at 01:43, Peter Coghlan via cctalk
 wrote:
>   Days turned into weeks, weeks into months and months into
> years.  It continued to occasionally make the same ghastly noises that
> never should be heard coming from a hard disk but with absolutely no sign
> of any errors being logged or damage to data whatsoever.  The noises seem
> to be associated with seek activity because I have never heard them when
> the disk is just spinning but otherwise idle.  I eventually retired it
> and replaced it with a much larger one, purely because I ran out of
> space on it.  Any thoughts on what might be happening with it?

Ha!

Well that is the thing, of course. I had that with one old IDE disk,
too. It made a terrible ear-piercing high whine that I associate with
a failing disk... but it passed every diagnostic I could throw at it,
so I used it for non-critical stuff and in testbed machines.

For about 4 or 5 *years*. It was one reason to run machines with the
case covers on, to muffle the noise. But it ran faultlessly for years.
I think in the end I sold it on to someone, with a warning of course.
That's how I dispose of all kit -- pass it on to a new owner. I try
never to scrap or recycle anything at all.

That's the problem with rule-of-thumb diagnoses. Sometimes they fail.
But more often, things fail with no warning, so it's still useful.

This is why I disregard everyone's accounts of hard disk brands they
won't touch. I did PC tech support for ~25 years. I've seen every make
of hard drive ever fail randomly, and I've seen every make of hard
drive ever work flawlessly for years even when vilely abused.

My experience is extensive enough that _anyone's_ justifications of
why they won't use Brand X disks get ignored, because if I took them,
I would not use _any_ brand of disk. Everyone who's been around a bit
has a horror story and the intersection in the Venn diagram, while
small, excludes all vendors ever.

I've never seen any one make that is significantly worse than any other.



Re: PDP-11/45 RSTS/E boot problem

2019-02-15 Thread Peter Coghlan via cctalk
Liam Proven wrote:
> 
> And some of my younger colleagues thought it was strange that I could
> predict hard disk failures from the running noises they made, and
> later than that, whether WinNT's bus-mastering DMA-mode disk
> controller device driver was installed from the sound of the disk
> accesses while the machine booted.
>

The little 8GB SCA system disk in my Alphaserver 800 started making
awful bloodcurdling clattering noises a few years ago.  The first time
I heard it, I was convinced that the works were splattered all over the
inside of the HDA casing and the machine was only continuing to run
because of what was left in the disk cache or something like that.  I
started running a backup in case I might be able to salvage some part of
the contents.  Despite several more heartstopping clunks and clatters
while the backup was running, it ran to completion, with no errors logged
to my complete surprise.  I ran an ANALYZE /DISK /READ which attempts to
read all blocks on the disk that are allocated to files.  Again, several
more awful clanks and clatters but it completed with no errors.  I lined
up a replacement disk for it but I was curious to see how exactly it
was going to fail so I decided to keep on using it for a while to see
what happens.  Days turned into weeks, weeks into months and months into
years.  It continued to occasionally make the same ghastly noises that
never should be heard coming from a hard disk but with absolutely no sign
of any errors being logged or damage to data whatsoever.  The noises seem
to be associated with seek activity because I have never heard them when
the disk is just spinning but otherwise idle.  I eventually retired it
and replaced it with a much larger one, purely because I ran out of
space on it.  Any thoughts on what might be happening with it?
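
The read-everything check described above can be approximated on other systems. The following is a minimal sketch, not a reimplementation of the VMS utility: it assumes a mounted filesystem reachable through ordinary file I/O, and the function name `read_all_files` and its parameters are invented for this example. It reads every regular file under a directory and collects any I/O errors, which exercises the blocks allocated to files in roughly the same spirit.

```python
import os


def read_all_files(root, chunk_size=1 << 20):
    """Read every regular file under `root` and report I/O errors.

    Returns (files_read, errors), where errors is a list of (path, exception).
    This only touches blocks that belong to files; free space is not read.
    """
    files_read = 0
    errors = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, sockets, device nodes and the like.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as handle:
                    # Read and discard the data; we only care whether it reads.
                    while handle.read(chunk_size):
                        pass
                files_read += 1
            except OSError as exc:
                errors.append((path, exc))
    return files_read, errors


if __name__ == "__main__":
    count, problems = read_all_files(".")
    print(f"read {count} files, {len(problems)} errors")
```

A clean pass here, as in the account above, only shows that the data was readable at that moment. Note that an operating system's file cache can satisfy reads without touching the disk, so a serious check would need to bypass or flush it.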

Regards,
Peter Coghlan.


Re: PDP-11/45 RSTS/E boot problem

2019-02-15 Thread Peter Coghlan via cctalk
Jeffrey S. Worley wrote:
> 
> Back in 2000-ish, I was upgrading my DG MV4000/dc to 8mb so as to be
> able to run the snazzy AOS/VS II tapes I'd got along with the 9 track
> drive I hacked onto the machine...
> 
> The install would start and then bomb at a certain point every time.  I
> decided to work the machine hard and then pull the board and give a
> good SNIFF.  This is a 15x15 inch board populated with 256kx1 drams. 
> The time in the machine got the board cooking nicely, and when I
> smelled a certain charred smell in the vicinity of a 74ls04, I knew it
> was that magic black smoke.  I pulled a 74HCT04 from a known-good isa
> card, socketed the spot and voilà!  Working 8mb board.  It isn't
> ALWAYS the most expensive chip, thank God, and sometimes even us not-
> as-bright guys come off with a win.
>

About 20 years earlier than that, one of my friends at school asked me to
fix his Jupiter Ace which had stopped working.  I told him I didn't hold
out much hope for success because I didn't have the vaguest idea how his
little machine worked at that time but I agreed to wave my multimeter in
the general direction of its power supply.  I opened it up and quickly
found that the voltages seemed very reasonable and I prodded around the
board rather aimlessly looking for some part that looked guilty.  I soon
noticed that one of the eight identical chips in a row at the bottom of
the board was getting hot enough to burn my finger while the others
remained cool and calm.  I can't remember where I got a replacement 4116
or 4164 or whatever it was - I probably had to get it mail order but once
it was soldered in with fingers crossed that nothing else was wrong, the
machine came right back to life.  Sometimes you just get lucky.  I wish I
could be that lucky with some of my own stuff now.

Regards,
Peter Coghlan.


Re: PDP-11/45 RSTS/E boot problem

2019-02-15 Thread Noel Chiappa via cctalk
> From: Paul Koning

> Studied it for a while, took out a small hammer, whacked the device at
> some spot, and reported "fixed".

That reminds me of an amusing story from the first time I went to see 'Star
Wars'; I went with a group of people from Tech Sq. It has that scene where
they're about to make the jump to hyperspace in the 'Falcon', and it won't
go; so one of them (I think Solo) jumps up and whacks a particular spot on
the bulkhead with his fist, and away she goes.

We all found this terribly amusing, since one of the DEC time-sharing systems
on the 9th floor had a sticky relay in the power controller, and when you'd
try to power it on or off from the front panel, the relay would stick, and
nothing would happen. So the procedure was to go around the back, open a
particular door, reach in, and whack the power controller behind it in a
particular spot with the side of your fist, and away it went!

Noel


Re: PDP-11/45 RSTS/E boot problem

2019-02-15 Thread Liam Proven via cctalk
On Fri, 15 Feb 2019 at 14:59, Paul Koning  wrote:
>
> Speaking of sounds made by machines, there is a famous security paper from a 
> few years ago in which researchers read the encryption keys out of 
> smartphones by listening to the sounds made by the device while it was 
> executing the crypto algorithms.

... wow.

> These hardware wizard stories remind me of a legendary repair wizard, 
> non-computer industrial devices I think.  He was called in to fix a tricky 
> problem at the customer site.  Studied it for a while, took out a small 
> hammer, whacked the device at some spot, and reported "fixed".  He then sent 
> in a bill for $500.
>
> Customer challenged that with a demand to itemize the work.  The itemized 
> bill came back like this:
>
> 1. Applying impact to the device: $5
> 2. Knowing where and how to apply the impact: $495

110 years old, and still apt.

https://quoteinvestigator.com/2017/03/06/tap/

I first encountered it in the form of one of the AI Koans. I guess
these are probably familiar to all here, but in case:

http://people.cs.uchicago.edu/~wiseman/humor/ai-koans.html



Re: PDP-11/45 RSTS/E boot problem

2019-02-15 Thread Paul Koning via cctalk



> On Feb 15, 2019, at 6:06 AM, Liam Proven via cctalk  
> wrote:
> 
> On Fri, 15 Feb 2019 at 04:34, Jeffrey S. Worley via cctalk
>  wrote:
>> 
>> The install would start and then bomb at a certain point every time.  I
>> decided to work the machine hard and then pull the board and give a
>> good SNIFF.
> 
> Got a nose for a hardware fault, eh? ;-)
> 
> And some of my younger colleagues thought it was strange that I could
> predict hard disk failures from the running noises they made, and
> later than that, whether WinNT's bus-mastering DMA-mode disk
> controller device driver was installed from the sound of the disk
> accesses while the machine booted.

Speaking of sounds made by machines, there is a famous security paper from a 
few years ago in which researchers read the encryption keys out of smartphones 
by listening to the sounds made by the device while it was executing the crypto 
algorithms.

These hardware wizard stories remind me of a legendary repair wizard, 
non-computer industrial devices I think.  He was called in to fix a tricky 
problem at the customer site.  Studied it for a while, took out a small hammer, 
whacked the device at some spot, and reported "fixed".  He then sent in a bill 
for $500.

Customer challenged that with a demand to itemize the work.  The itemized bill 
came back like this:

1. Applying impact to the device: $5
2. Knowing where and how to apply the impact: $495

   paul





Re: PDP-11/45 RSTS/E boot problem

2019-02-15 Thread Peter Coghlan via cctalk
Fritz Mueller wrote:
> 
> That's right -- I wasn't without an army, it was just a very small and
> quite dedicated army! :-)
> 
> I think I'd have gone down many blind alleys without help and perspective
> provided by others here, and in particular a lot of guidance provided by Noel
> over the past couple weeks in private correspondence enabling the use of
> V6 as a test case and investigative tool.  For this I am very grateful.
> 

I very much enjoyed following the story of tracking down this fault.
Thanks for sharing it.

> 
> As those of you who have worked on these machines know, they are just so
> damn serviceable, by design.  It's very empowering!
> 

I wish that this was also the case with several DEC Alphas I have with
cache failures that are not nearly so serviceable or empowering  :-(

Regards,
Peter Coghlan.

>
>   --FritzM.
>


Re: PDP-11/45 RSTS/E boot problem

2019-02-15 Thread Liam Proven via cctalk
On Fri, 15 Feb 2019 at 04:34, Jeffrey S. Worley via cctalk
 wrote:
>
> The install would start and then bomb at a certain point every time.  I
> decided to work the machine hard and then pull the board and give a
> good SNIFF.

Got a nose for a hardware fault, eh? ;-)

And some of my younger colleagues thought it was strange that I could
predict hard disk failures from the running noises they made, and
later than that, whether WinNT's bus-mastering DMA-mode disk
controller device driver was installed from the sound of the disk
accesses while the machine booted.

BTW, Jeff, Gmail bottom-quotes just fine. I'm using the web interface
right now. Just hit Ctrl-A, trim as needed and move the cursor. Yes,
it's a pain on mobile, so I try not to answer on mobiles!



Re: PDP-11/45 RSTS/E boot problem

2019-02-14 Thread William Pechter via cctalk
> On Wed, 13 Feb 2019 15:03:41 -0500, Paul Koning wrote:
>
> > On Feb 13, 2019, at 1:20 PM, Jay Jaeger via cctalk
>  wrote:
> >
> > ...
> > Maybe that story about FE's using Unix as a test to confirm operation
> > even when diagnostics said the machine was OK was not so much just a
> > legend?
>
> It still feels like a legend.  My experience with DEC field service
> engineers is that they used the diagnostics.  In the PDP-11 era, Unix
> knowledge around DEC was pretty sparse, especially early on when it
> could be found only in the Telephone Products Group (Armando
> Stettner).  RSTS would be more plausible, but I never saw that in the
> hands of FS engineers either.
>
> By and large diagnostics would find problems. I've seen a number in
> the 1970s, including a messy data path failure in the 11/45 MMU where
> we (college students) did the initial diagnosis while the FS engineer
> was on his way.
>
> My suspicion is that things not solved by diagnostics
> would be escalated to the "wizard from Maynard". And they'd probably
> start replacing whole subsystems. I've seen that once, when our
> college RSTS-11 system (11/20, 16 DL-11 lines) was crashing on average
> once a day for months. DEC brought in several of those "wizards". The
> "fix" was to replace the 11/20 by a "spare part" -- an 11/45 with more
> memory, a DH11, and RSTS/E. Decades later I was told that the wizards
> actually pinned the blame on the college FM broadcast transmitter,
> about 200 feet down the hall from the computer center. That may well
> be, though I didn't hear that at the time.
>
> RSTS did get used in
> manufacturing, at Final Assembly & Test sites like Westminster MA and
> Salem NH, where PDP-11 systems large enough to run RSTS/E were
> subjected to a load test of exerciser programs running under that OS.
> The way it was explained to us is that a system that would be happy
> with such a test would also be happy with any customer application.
> It's not clear if that was because RSTS would load things more than
> most, or was more finicky about hardware glitches than most, but it
> certainly was the practice for quite some time. Of course, not all
> PDP-11 configurations could be tested that way.
>
> paul

I guess the experience in NJ was a bit different since AT&T had two
dedicated Field Service offices who handled their sites including Bell Labs.

I was on the Commercial/Government side from 81-86 and we didn't get to
play with RSTS on customer sites at all (but sometimes we got to play in
the in-house machines in Princeton or on our own hardware).

It was a bit different in the Vax side since many diags were run under
VAX/VMS and as a brand new hire I was doing Vax installs -- including
installing the VMS 2.x and 3.x on 11/780's and 11/750's at install time.

If they had paid for software installation -- the software guys would
wipe and reinstall.
If not we left the pack and prayed the customer wouldn't wipe the diags
that we installed on the disk when we built the VMS pack.  Realistically
the only thing the customer needed to do after we got done was tweak the
system parameters, check the swap etc. and lay on the layered products
like languages.

Things got much more interesting when the VMS3.x and 4.x got CI780's and
HSC50's.  That was more involved than the easy VMS 2.x-3.x install.

As far as the 11/70's -- I'm building a pidp1170... My last 11/70
install was around 84 or so when I put in a late DECDatasystem 570 blue
11/70 with the FCC Cabinets at AT&T in Freehold.

As far as the Wizard from Maynard -- one story from my branch support
guy (rumored to be about his brother on the 11/70 line, I think in
Westminster MA... not Salem or other NH plants): they had an intermittent
11/70 that would crash every couple of days and they could run all the
diags and DEC X11 with no issues. 
They called over their in-house wizard who ran toggle-in programs from
the front panel -- playing the switches like piano keys with both
hands.   After about a half hour his comment was "Clean the terminator
fingers."

Machine ran like a SOB once the gold fingers were cleaned.

Weirdest 11/70 mess I had was after I left DEC to work for a third party
maintenance group.  Their regional support was in Dallas.  I was in NJ. 
They couldn't find their support guy so they rushed me on a plane to
Chicago to work with two techs who were babysitting a mess they had no
clue on.

The site was WW Granger in Skokie and I arrived at 3AM...  They had a 5
or 6 story warehouse which was a totally robotic automated site picking
water heaters and other industrial equ

Re: PDP-11/45 RSTS/E boot problem

2019-02-14 Thread Jeffrey S. Worley via cctalk
I got a laugh out of this anecdote.  Of course, folks heard me chuckle
and I tried to share the joke but  Way too geeky for public
consumption.

Back in 2000-ish, I was upgrading my DG MV4000/dc to 8mb so as to be
able to run the snazzy AOS/VS II tapes I'd got along with the 9 track
drive I hacked onto the machine...

The install would start and then bomb at a certain point every time.  I
decided to work the machine hard and then pull the board and give a
good SNIFF.  This is a 15x15 inch board populated with 256kx1 drams. 
The time in the machine got the board cooking nicely, and when I
smelled a certain charred smell in the vicinity of a 74ls04, I knew it
was that magic black smoke.  I pulled a 74HCT04 from a known-good isa
card, socketed the spot and voilà!  Working 8mb board.  It isn't
ALWAYS the most expensive chip, thank God, and sometimes even us not-
as-bright guys come off with a win.

I really enjoy reading this list even though I don't contribute all
that often or anything of much value.  It is a pleasure to watch you
guys work.

Jeff


On Thu, 2019-02-14 at 12:00 -0600, cctalk-requ...@classiccmp.org wrote:
> Re: PDP-11/45 RSTS/E boot problem

> When our 11/45 failed in the MMU in 1975, my classmate Josh Rosen
> traced the failing path on the schematics.  When Jim Newport the field
> service engineer showed up, Josh described the diagnostics result that
> pointed at the failed path, and added "This is the failed chip"
> (pointing to one particular chip).
>
> Jim asked "Why that one?"  Josh answered "because that is the most
> expensive chip".
>
> It turned out he was right.
>
> paul



Re: PDP-11/45 RSTS/E boot problem

2019-02-14 Thread Fritz Mueller via cctalk
That's right -- I wasn't without an army, it was just a very small and quite 
dedicated army! :-)

I think I'd have gone down many blind alleys without help and perspective 
provided by others here, and in particular a lot of guidance provided by Noel over 
the past couple weeks in private correspondence enabling the use of V6 as a 
test case and investigative tool.  For this I am very grateful.

As those of you who have worked on these machines know, they are just so damn 
serviceable, by design.  It's very empowering!

--FritzM.



Re: PDP-11/45 RSTS/E boot problem

2019-02-14 Thread Alan Frisbie via cctalk
Ethan Dicks  wrote:

> I have had an RK11-C for a long time that I've never tried to
> power up (I got an RKV11-D and used that on Qbus machines
> instead).

Wow, someone else with an RKV11-D!  I thought I was the only
person who had one.   I modified mine (using the dead bug
technique) to add 18-bit addressing instead of just 16, and
ran it successfully with RT-11 and RSX-11M on my 11/73 system.

I have had DEC people visit my place, look at the RKV11-D, and
say "DEC never made anything like that!".  :-)

Alan "and I don't exist either" Frisbie


Re: PDP-11/45 RSTS/E boot problem

2019-02-14 Thread Noel Chiappa via cctalk
> From: Jerry Weiss

> I am trying to understand how the diagnostics didn't reveal this defect.

Vonada #12: "Diagnostics are highly efficient in finding solved problems." :-)

Noel


Re: PDP-11/45 RSTS/E boot problem

2019-02-13 Thread Fritz Mueller via cctalk

On 2/13/19 5:20 PM, Jerry Weiss wrote:
> I am trying to understand how the diagnostics didn't reveal this 
> defect.  I see in the source for the diagnostic DZRKH-F there are tests 
> for addresses in the 28K-32K range and also for the 32K boundary.


So, to catch this defect the diagnostic would have to have a test or 
tests which crossed _specifically_ the 30K boundary within a transfer.


The detailed symptom was a false overflow into the ex.mem bits on the 
30K boundary, causing a skip forward in bus addresses during the transfer.


The detailed fail on the 7430 (E34 on the M795) was that it acted as if 
input pin 11 was always H.
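
To illustrate why such a fault only shows up on a transfer that crosses one particular boundary, here is a toy model. It is a sketch under loud assumptions, not derived from the M795 schematic: it treats the low 16 bits of the bus address as RKBAR, the top 2 bits as the ex.mem bits, and models the carry into ex.mem as an all-ones detector over address bits 1..15. Which address bit the stuck-high input corresponds to is a guess (bit 12 here), chosen because it makes the false carry land on the 30K-word boundary (byte address 0170000 octal). All names are invented for this example.

```python
WORD = 2  # bus addresses are byte addresses; a transfer steps one 16-bit word


def next_address(addr, stuck_high_bit=None):
    """Advance an 18-bit bus address by one word.

    stuck_high_bit, if given, is an address bit that the carry detector
    always reads as 1 (the fault model). The address counter itself is
    unaffected; only the detector's view of it is wrong.
    """
    low = addr & 0xFFFF          # modelled RKBAR
    ext = (addr >> 16) & 0x3     # modelled ex.mem bits
    seen = low if stuck_high_bit is None else low | (1 << stuck_high_bit)
    carry = (seen & 0xFFFE) == 0xFFFE  # detector: bits 1..15 all ones
    low = (low + WORD) & 0xFFFF
    if carry:
        ext = (ext + 1) & 0x3
    return (ext << 16) | low


def transfer_addresses(start, words, stuck_high_bit=None):
    """List the bus address used for each word of a transfer."""
    addrs = []
    addr = start
    for _ in range(words):
        addrs.append(addr)
        addr = next_address(addr, stuck_high_bit)
    return addrs


if __name__ == "__main__":
    boundary_30k = 30 * 1024 * WORD  # 0o170000
    start = boundary_30k - 2 * WORD
    for label, fault in (("healthy", None), ("faulty ", 12)):
        addrs = transfer_addresses(start, 4, fault)
        print(label, [oct(a) for a in addrs])
```

In this model a transfer that starts exactly on the 30K boundary, or one that crosses only the 32K boundary, behaves identically with and without the fault; only a transfer that runs across 30K picks up the spurious increment of the extended bits.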


Now, all this said, I don't think I have ever run ZRKH!  I *did* run and 
pass the earlier ZRKA, ZRKB, and ZRKC, but somehow missed ZRKH...  I'm 
wondering now how different it is from ZRKC?  Will have to take a look...


--FritzM.



Re: PDP-11/45 RSTS/E boot problem

2019-02-13 Thread Jon Elson via cctalk

On 02/13/2019 10:40 AM, Noel Chiappa via cctalk wrote:

> He's also had to do a tremendous amount of work on it to get it running,
> starting with building an entire new power harness.

Yes, the 5V power harness between the regulators and the 
backplane was a real mess on the 11/45 we got second hand.  
I probably SHOULD have rebuilt the entire harness and 
replaced all the Mate-n-Lock connectors on the regulators, 
too, but we were always just wanting to get the machine 
running again.


Jon


Re: PDP-11/45 RSTS/E boot problem

2019-02-13 Thread Jerry Weiss via cctalk

On 2/13/19 1:43 AM, Fritz Mueller via cctalk wrote:

> SUCCESS!!
>
> Put the M795 out on an extender, loaded 16 in RKBAR, and had a look around 
> with a logic probe.  Narrowed it down to E34 (a 7430 8-input NAND).  Pulled, 
> socketed, replaced, and off she goes!
>
> I can now successfully boot and run both V6 Unix and RSTS/E V06C from disk.
>
> *THAT* was a really fun and rewarding hunt :-)  First message in the thread was 
> back on Dec 30, 2018.  Lots of debugging and failed DRAM repairs, then the 
> final long assault to this single, failed gate...
>
> Thanks to all here for the help and resources, and particular shout-outs for 
> Noel and Paul who gave generously of their time and attention working through 
> the densest bits, both on and off the list.
>
> I predict a long happy weekend and a big power bill at the end of the month :-)
>
>  cheers,
>    --FritzM.



Congratulations.  Well Done.

I am trying to understand how the diagnostics didn't reveal this 
defect.  I see in the source for the diagnostic DZRKH-F there are tests 
for addresses in the 28K-32K range and also for the 32K boundary.


I'm trying to make sense of the M795 to get a better understanding. Any 
additional data on how the 7430 failed (input bad, output bad, etc.)?


  Jerry


Re: PDP-11/45 RSTS/E boot problem

2019-02-13 Thread Paul Koning via cctalk



> On Feb 13, 2019, at 3:03 PM, Paul Koning  wrote:
> 
> ...
> My suspicion is that things not solved by diagnostics would be escalated to 
> the "wizard from Maynard".  And they'd probably start replacing whole 
> subsystems. 

This says that Fritz actually was a new "Wizard from Maynard" in solving this 
problem.  Only more so -- because he didn't have the luxury of just swapping 
out whole sections of the machine with new kits, or a backup team of subsystem 
experts at the home office to call on.

That confirms it's really a very impressive performance.

paul



Re: PDP-11/45 RSTS/E boot problem

2019-02-13 Thread Paul Koning via cctalk



> On Feb 13, 2019, at 3:54 PM, Ethan Dicks via cctalk  
> wrote:
> 
> ...
> It's interesting that it was a bad 7430 in yours.  I find that for
> equipment of that vintage, my usual suspects are failed 7474s and
> failed 7440s, probably 80% of the total.  Behind that, it goes 7420s
> and then maybe 7430s.

When our 11/45 failed in the MMU in 1975, my classmate Josh Rosen traced the 
failing path on the schematics.  When Jim Newport the field service engineer 
showed up, Josh described the diagnostics result that pointed at the failed 
path, and added "This is the failed chip" (pointing to one particular chip).

Jim asked "Why that one?"  Josh answered "because that is the most expensive 
chip".

It turned out he was right.

paul



Re: PDP-11/45 RSTS/E boot problem

2019-02-13 Thread Ethan Dicks via cctalk
On Wed, Feb 13, 2019 at 2:43 AM Fritz Mueller via cctalk
 wrote:
>
> SUCCESS!!

Outstanding!

> Put the M795 out on an extender, loaded 16 in RKBAR, and had a look 
> around with a logic probe.  Narrowed it down to E34 (a 7430 8-input NAND).  
> Pulled, socketed, replaced, and off she goes!
>
> I can now successfully boot and run both V6 Unix and RSTS/E V06C from disk.

Nice.

I have had an RK11-C for a long time that I've never tried to power up
(I got an RKV11-D and used that on Qbus machines instead).  The saga
has been interesting for me as I contemplate getting mine working in
the next couple of years.  I had to look up the M795.  I had forgotten
there was one dual-height module in the entire controller.

It's interesting that it was a bad 7430 in yours.  I find that for
equipment of that vintage, my usual suspects are failed 7474s and
failed 7440s, probably 80% of the total.  Behind that, it goes 7420s
and then maybe 7430s.

-ethan


Re: PDP-11/45 RSTS/E boot problem

2019-02-13 Thread Paul Koning via cctalk


> On Feb 13, 2019, at 1:20 PM, Jay Jaeger via cctalk  
> wrote:
> 
> ...
> Maybe that story about FE's using Unix as a test to confirm operation
> even when diagnostics said the machine was OK was not so much just a
> legend?

It still feels like a legend.  My experience with DEC field service engineers is 
that they used the diagnostics.  In the PDP-11 era, Unix knowledge around DEC 
was pretty sparse, especially early on when it could be found only in the 
Telephone Products Group (Armando Stettner).  RSTS would be more plausible, but 
I never saw that in the hands of FS engineers either.

By and large diagnostics would find problems.  I've seen a number in the 1970s, 
including a messy data path failure in the 11/45 MMU where we (college 
students) did the initial diagnosis while the FS engineer was on his way.

My suspicion is that things not solved by diagnostics would be escalated to the 
"wizard from Maynard".  And they'd probably start replacing whole subsystems.  
I've seen that once, when our college RSTS-11 system (11/20, 16 DL-11 lines) 
was crashing on average once a day for months.  DEC brought in several of those 
"wizards".  The "fix" was to replace the 11/20 by a "spare part" -- an 11/45 
with more memory, a DH11, and RSTS/E.

Decades later I was told that the wizards actually pinned the blame on the 
college FM broadcast transmitter, about 200 feet down the hall from the 
computer center.  That may well be, though I didn't hear that at the time.

RSTS did get used in manufacturing, at Final Assembly & Test sites like 
Westminster MA and Salem NH, where PDP-11 systems large enough to run RSTS/E 
were subjected to a load test of exerciser programs running under that OS.  The 
way it was explained to us is that a system that would be happy with such a 
test would also be happy with any customer application.  It's not clear if that 
was because RSTS would load things more than most, or was more finicky about 
hardware glitches than most, but it certainly was the practice for quite some 
time.  Of course, not all PDP-11 configurations could be tested that way.

paul



Re: PDP-11/45 RSTS/E boot problem

2019-02-13 Thread Jay Jaeger via cctalk


On 2/13/2019 1:43 AM, Fritz Mueller via cctalk wrote:
> SUCCESS!!
> 
> Put the M795 out on an extender, loaded 16 in RKBAR, and had a look 
> around with a logic probe.  Narrowed it down to E34 (a 7430 8-input NAND).  
> Pulled, socketed, replaced, and off she goes!
> 
> I can now successfully boot and run both V6 Unix and RSTS/E V06C from disk.
> 
> *THAT* was a really fun and rewarding hunt :-)  First message in the thread 
> was back on Dec 30, 2018.  Lots of debugging and failed DRAM repairs, then 
> the final long assault to this single, failed gate...
> 
> Thanks to all here for the help and resources, and particular shout-outs for 
> Noel and Paul who gave generously of their time and attention working through 
> the densest bits, both on and off the list.
> 
> I predict a long happy weekend and a big power bill at the end of the month 
> :-)
> 
> cheers,
>   --FritzM.
> 
> 

Congratulations.  As another poster mentioned, it has been fascinating
to watch and learn, day by day, as you worked on the problem with Noel
and Paul's help.

And I learned a little bit more about my 11/45 (that it indeed had had a
processor field upgrade), which I had not looked at very closely before.

Maybe that story about FE's using Unix as a test to confirm operation
even when diagnostics said the machine was OK was not so much just a
legend?




Re: PDP-11/45 RSTS/E boot problem

2019-02-13 Thread Jon Elson via cctalk

On 02/13/2019 01:43 AM, Fritz Mueller via cctalk wrote:

SUCCESS!!

Put the M795 out on an extender, loaded 16 in RKBAR, and had a look around 
with a logic probe.  Narrowed it down to E34 (a 7430 8-input NAND).  Pulled, 
socketed, replaced, and off she goes!


WOW!  Good detective work, that certainly was a WEIRD 
problem, and not where I thought it was going to be.


Glad you got it solved!

Jon


Re: PDP-11/45 RSTS/E boot problem

2019-02-13 Thread Noel Chiappa via cctalk
> From: Alan Frisbie

> I am finding this entire discussion extremely fascinating! Every day I
> look forward to reading the latest twists in the plot.

:-)

> The ideas, hunches, tests, dead ends, and results are an excellent
> example of the debugging process.

Yeah, and it was a Duesie of a problem, too.

Although once we got clear of the bad data from the console and my confusion
about R5, and it became clear that in the Unix failure, the pure text was
being damaged, from that point it was pretty straightforward to track it down
(albeit one that needed detailed understanding of how V6 handled pure texts -
and luckily I'd come to understand that part of the system a bit while getting
the QSIC running).

Fritz's lucky discovery, early on, that it was location dependent was also a
big help.

Noel


Re: PDP-11/45 RSTS/E boot problem

2019-02-13 Thread Paul Koning via cctalk



> On Feb 13, 2019, at 2:43 AM, Fritz Mueller via cctalk  
> wrote:
> 
> SUCCESS!!
> 
> Put the M795 out on an extender, loaded 16 in RKBAR, and had a look 
> around with a logic probe.  Narrowed it down to E34 (a 7430 8-input NAND).  
> Pulled, socketed, replaced, and off she goes!
> 
> I can now successfully boot and run both V6 Unix and RSTS/E V06C from disk.

Congratulations.  You have successfully performed a repair of the type done at 
customer sites by highly trained DEC field service personnel.  They were the 
ones who traveled with an oscilloscope, a tool case including soldering iron, 
and a case full of replacement chips.

One difference is that the diagnostics didn't point to the problem, which in my 
experience is rather an unusual situation.  Nicely done.

paul



Re: PDP-11/45 RSTS/E boot problem

2019-02-12 Thread Fritz Mueller via cctalk
SUCCESS!!

Put the M795 out on an extender, loaded 16 in RKBAR, and had a look around 
with a logic probe.  Narrowed it down to E34 (a 7430 8-input NAND).  Pulled, 
socketed, replaced, and off she goes!

I can now successfully boot and run both V6 Unix and RSTS/E V06C from disk.

*THAT* was a really fun and rewarding hunt :-)  First message in the thread was 
back on Dec 30, 2018.  Lots of debugging and failed DRAM repairs, then the 
final long assault to this single, failed gate...

Thanks to all here for the help and resources, and particular shout-outs for 
Noel and Paul who gave generously of their time and attention working through 
the densest bits, both on and off the list.

I predict a long happy weekend and a big power bill at the end of the month :-)

cheers,
  --FritzM.




Re: PDP-11/45 RSTS/E boot problem

2019-02-12 Thread Alan Frisbie via cctalk
> > > Likely some disk controllers did NOT SUPPORT crossing 64K boundaries!
> >
> > No; the RK11 spec says "[the two extended memory bits] make up a two-bit
> > counter that increments each time the RKBA overflows".
> >
> > The actual error turns out to be slightly different to my guess; there's
> > a spurious overflow from the low 16-bit register to these bits at 017.
> 
> Maybe a problem with E29 or E34 on the M795 module?

I am finding this entire discussion extremely fascinating!
Every day I look forward to reading the latest twists in the
plot.   The ideas, hunches, tests, dead ends, and results are an
excellent example of the debugging process.

I am awaiting the exciting Perry Mason style conclusion, where
the guilty chip stands up and confesses on the stand.  :-)

Alan "Where were you on the night of the crime?" Frisbie


Re: PDP-11/45 RSTS/E boot problem

2019-02-11 Thread Noel Chiappa via cctalk
> From: Jerry Weiss

> it is impressive that UNIX booted successfully without tripping over a
> boundary.

Well, V6 is (or can be configured to be) extraordinarily small, so I'm not
surprised it booted OK without going over the 017 mark.

I have this persistent memory that the -11/40 in the CSR group at MIT had only
3 banks of MM-L (@16KB each) when I first got there! Which is plausible; the
smallest V6 config would have about 22KB of core text, and about 2KB of
initialized data. If you cut all the parameters to the bone (minimal number of
disk buffers, etc) you could probably get away with say 6KB of un-initialized
data. That would leave you 18KB for user programs on such a system, a bit less
than their recommendation of 24KB minimum for users, but probably minimally
useable.
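
For anyone who wants to redo that budget with different numbers, here is a
minimal Python sketch. The sizes are just the rough figures quoted above, and
the function name is made up for illustration.

```python
# Rough V6 memory budget, using the approximate figures from the message.
# All sizes are in KB.
BANK_KB = 16          # one bank, as quoted ("@16KB each")
BANKS = 3             # "only 3 banks" on the machine described

KERNEL_TEXT_KB = 22   # "about 22KB of core text"
KERNEL_DATA_KB = 2    # "about 2KB of initialized data"
KERNEL_BSS_KB = 6     # "say 6KB of un-initialized data"


def user_space_kb(banks=BANKS):
    """Memory left for user programs once the kernel has taken its share."""
    total = banks * BANK_KB
    kernel = KERNEL_TEXT_KB + KERNEL_DATA_KB + KERNEL_BSS_KB
    return total - kernel


print(f"{user_space_kb()} KB left for user programs")
```

Passing a different bank count to user_space_kb() shows how quickly an extra
bank relieves the squeeze.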

We quickly added more memory, I'm sure, but I don't now remember how/what!
Later on it was converted to an -11/45, and then we got an Able ENABLE, but
that would have been a couple of years later.

 Noel


Re: PDP-11/45 RSTS/E boot problem

2019-02-11 Thread Jerry Weiss via cctalk

On 2/11/19 12:31 PM, Noel Chiappa via cctalk wrote:

 > From: Jerry Weiss

 > Though not a disk controller, the DEC DR11-B/DA11-B would not cross 64K
 > boundaries.

Interesting! What's odd is that the DR11-B uses the Bus Interface card (M7219)
from the RC11 controller, and that _can_ cross moby boundaries, so clearly it
has the right overflow output; someone just decided not to implement it - the
DR11-B sets ERROR instead on an address overflow. Weird.
    Yes the overflow sets error and halts the transfer.  There are 
registers for the extended bits in the DR11B, just missing a few gates 
to increment.   My recollection is that my simple mod wouldn't allow the 
read back of the incremented extended bits, but in my use case this was 
never a problem.



Anyway, it will be interesting to see if RSTS operates correctly once this
problem is fixed...

Noel
  
Yes, if it turns out the increment was not functional for one or both 
extended address bits, it is impressive that UNIX booted successfully 
without tripping over a boundary.


  Jerry


Re: PDP-11/45 RSTS/E boot problem

2019-02-11 Thread Noel Chiappa via cctalk
> From: Jerry Weiss

> Though not a disk controller, the DEC DR11-B/DA11-B would not cross 64K
> boundaries.

Interesting! What's odd is that the DR11-B uses the Bus Interface card (M7219)
from the RC11 controller, and that _can_ cross moby boundaries, so clearly it
has the right overflow output; someone just decided not to implement it - the
DR11-B sets ERROR instead on an address overflow. Weird.

Anyway, it will be interesting to see if RSTS operates correctly once this
problem is fixed...

Noel

 


Re: PDP-11/45 RSTS/E boot problem

2019-02-11 Thread Paul Koning via cctalk



> On Feb 11, 2019, at 1:13 PM, Jerry Weiss  wrote:
> 
> On 2/11/19 11:50 AM, Paul Koning via cctalk wrote:
>>> ...
>> You may be thinking about PC controllers like the floppy controller.  I 
>> can't remember ANY DEC DMA device controller that had boundary crossing 
>> limits of any kind.  It certainly isn't a restriction in the RK11.
>> 
>>  paul
>> 
> Though not a disk controller, the DEC DR11-B/DA11-B would not cross 64K 
> boundaries.
> 
> I did however via a single chip "dead bug" modification, modify one to 
> accomplish this.  
>
> Jerry

That's rather shocking.  I meant my comment to apply to every DMA controller, 
not just disks.  I never used the DR11-B, though.  Perhaps there are other 
obscure devices that get this wrong.  But, for example, even devices like 
DMC-11 and TS-11 got it right.

There are of course Q-bus devices that only do a partial address space, but my 
point is that whatever the number of address bits implemented, address 
arithmetic is as a matter of normal design done across all of them, not across 
a subset.

paul



Re: PDP-11/45 RSTS/E boot problem

2019-02-11 Thread Jerry Weiss via cctalk

On 2/11/19 11:50 AM, Paul Koning via cctalk wrote:

On Feb 11, 2019, at 11:12 AM, Jon Elson via cctalk  
wrote:

On 02/11/2019 07:04 AM, Noel Chiappa via cctalk wrote:

A look at the RK11 registers after the swap-out showed an anomaly; something
about the extended memory address bits? (Maybe a multi-block transfer that
crosses a 64KB boundary? That would explain the address sensitivity we were
seeing.) Hopefully he'll track it to its lair shortly.



OH, BOY!  I think you may have found it.  Likely some disk controllers did NOT 
SUPPORT crossing 64K boundaries!  The diags would not detect that, as it was 
likely expected behavior.  I would suspect the driver would need to break up 
these operations.

You may be thinking about PC controllers like the floppy controller.  I can't 
remember ANY DEC DMA device controller that had boundary crossing limits of any 
kind.  It certainly isn't a restriction in the RK11.

paul

Though not a disk controller, the DEC DR11-B/DA11-B would not cross 64K 
boundaries.


I did however via a single chip "dead bug" modification, modify one to 
accomplish this.


    Jerry


Re: PDP-11/45 RSTS/E boot problem

2019-02-11 Thread Fritz Mueller via cctalk
Yup; specifically, the symptoms are consistent with 'D15 RKBA=ALL 1 L' being 
incorrectly generated at BA 16, causing an increment to EX.MEM, causing a 
skip in the DMA.

So it looks like a problem with bit 12 in that carry logic; I'll check E28 and 
E34 when I get back to it tonight, but I have to move the machine around so I 
can climb inside :-)

  --FritzM.




Re: PDP-11/45 RSTS/E boot problem

2019-02-11 Thread Tony Duell via cctalk
On Mon, Feb 11, 2019 at 6:03 PM Noel Chiappa via cctalk
 wrote:
>
> > From: Jon Elson
>
> > Likely some disk controllers did NOT SUPPORT crossing 64K boundaries!
>
> No; the RK11 spec says "[the two extended memory bits] make up a two-bit
> counter that increments each time the RKBA overflows".
>
> The actual error turns out to be slightly different to my guess; there's
> a spurious overflow from the low 16-bit register to these bits at 017.

Maybe a problem with E29 or E34 on the M795 module?

-tony


Re: PDP-11/45 RSTS/E boot problem

2019-02-11 Thread Noel Chiappa via cctalk
> From: Jon Elson

> Likely some disk controllers did NOT SUPPORT crossing 64K boundaries!

No; the RK11 spec says "[the two extended memory bits] make up a two-bit
counter that increments each time the RKBA overflows".

The actual error turns out to be slightly different to my guess; there's
a spurious overflow from the low 16-bit register to these bits at 017.

I can see how the diags didn't catch that one! Unless you try a multi-block
xfer that walks across the boundary. A perfect example of Vonada #12.
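
To make the mechanism concrete, here is a small Python model of the address
path as described: a 16-bit RKBA plus a two-bit extended counter that steps
when the RKBA overflows. It is only a sketch under assumptions. The function
names and the `ignored_bit` fault knob are invented for illustration and are
not taken from the RK11 prints.

```python
WORD = 2                  # Unibus DMA moves one 16-bit word per cycle
RKBA_MASK = 0o177777      # low 16 address bits
ALL_ONES = 0o177776       # bits 15..1 set; bit 0 stays 0 for word transfers


def all_ones(rkba, ignored_bit=None):
    """Model of an 'RKBA = all 1s' detector.

    ignored_bit is a hypothetical fault knob: the detector behaves as if that
    bit were always 1 (for example, a dead gate input).
    """
    if ignored_bit is not None:
        rkba |= 1 << ignored_bit
    return (rkba & ALL_ONES) == ALL_ONES


def dma_addresses(start, words, ignored_bit=None):
    """Yield the 18-bit address used for each word of a transfer."""
    ext = (start >> 16) & 0o3
    rkba = start & RKBA_MASK
    for _ in range(words):
        yield (ext << 16) | rkba
        if all_ones(rkba, ignored_bit):
            ext = (ext + 1) & 0o3      # two-bit counter, wraps at 4
        rkba = (rkba + WORD) & RKBA_MASK


# Healthy controller crossing the real boundary.
print([oct(a) for a in dma_addresses(0o177774, 3)])
# Same model with one detector input ignored (bit 12 chosen only as an example).
print([oct(a) for a in dma_addresses(0o167774, 3, ignored_bit=12)])
```

In the faulty case the extended bits step early, so the transfer continues in
the wrong 64KB bank and leaves a hole where the data should have gone. A
single-block test that never walks across the bad address will not see it.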

 Noel


Re: PDP-11/45 RSTS/E boot problem

2019-02-11 Thread Paul Koning via cctalk



> On Feb 11, 2019, at 11:12 AM, Jon Elson via cctalk  
> wrote:
> 
> On 02/11/2019 07:04 AM, Noel Chiappa via cctalk wrote:
>> A look at the RK11 registers after the swap-out showed an anomaly; something
>> about the extended memory address bits? (Maybe a multi-block transfer that
>> crosses a 64KB boundary? That would explain the address sensitivity we were
>> seeing.) Hopefully he'll track it to its lair shortly.
>> 
>> 
> OH, BOY!  I think you may have found it.  Likely some disk controllers did 
> NOT SUPPORT crossing 64K boundaries!  The diags would not detect that, as it 
> was likely expected behavior.  I would suspect the driver would need to break 
> up these operations.

You may be thinking about PC controllers like the floppy controller.  I can't 
remember ANY DEC DMA device controller that had boundary crossing limits of any 
kind.  It certainly isn't a restriction in the RK11.

paul



Re: PDP-11/45 RSTS/E boot problem

2019-02-11 Thread Tony Duell via cctalk
On Mon, Feb 11, 2019 at 4:13 PM Jon Elson via cctalk
 wrote:
>
> On 02/11/2019 07:04 AM, Noel Chiappa via cctalk wrote:
> > A look at the RK11 registers after the swap-out showed an anomaly; something
> > about the extended memory address bits? (Maybe a multi-block transfer that
> > crosses a 64KB boundary? That would explain the address sensitivity we were
> > seeing.) Hopefully he'll track it to its lair shortly.
> >
> >
> OH, BOY!  I think you may have found it.  Likely some disk
> controllers did NOT SUPPORT crossing 64K boundaries!  The
> diags would not detect that, as it was likely expected
> behavior.  I would suspect the driver would need to break up
> these operations.

I _think_ the RK11-C should cross a 64K boundary correctly. There's
an output from the low 16 bits bus address module (M795) on pin BP2
'D15 RKBA=ALL 1 L' (page 27 of the schematic on Bitsavers)  that
goes to the counter that holds the 2 extended bits, pin C1 of the M239
in slot A17 (page 13 of the same .pdf)

[I am working from 'RK11-C_schemFeb1971.pdf' from bitsavers]

Of course if there is a fault in this area then it will not correctly increment
the top 2 bits, but that might give you somewhere to check.

-tony


Re: PDP-11/45 RSTS/E boot problem

2019-02-11 Thread Jon Elson via cctalk

On 02/11/2019 07:04 AM, Noel Chiappa via cctalk wrote:

A look at the RK11 registers after the swap-out showed an anomaly; something
about the extended memory address bits? (Maybe a multi-block transfer that
crosses a 64KB boundary? That would explain the address sensitivity we were
seeing.) Hopefully he'll track it to its lair shortly.


OH, BOY!  I think you may have found it.  Likely some disk 
controllers did NOT SUPPORT crossing 64K boundaries!  The 
diags would not detect that, as it was likely expected 
behavior.  I would suspect the driver would need to break up 
these operations.


Jon


Re: PDP-11/45 RSTS/E boot problem

2019-02-11 Thread Noel Chiappa via cctalk
> From: Fritz Mueller

> If, as you are suspecting, we find damning evidence pointing
> specifically to the RK11

I got an update from Fritz. As you all will recall, the problem seemed to be
a corrupted 'pure text'. So the question was 'when was it damaged, and how'.

After some confusion caused by different OS images (the 'Ritchie' and
'Wellsch' distros), he managed to get a look at the text in main memory after
it was first read in from the file system, and before it was swapped out (it
was showing up damaged after a swap out/in cycle); it looked good at that
point. The copy written out to the swap disk however, not so good.

A look at the RK11 registers after the swap-out showed an anomaly; something
about the extended memory address bits? (Maybe a multi-block transfer that
crosses a 64KB boundary? That would explain the address sensitivity we were
seeing.) Hopefully he'll track it to its lair shortly.

We also need to characterize exactly what the fault is, because the DEC RK11
diagnostics weren't finding it, so it seems the diagnostic suite could use an
enhancement.

Noel


Re: PDP-11/45 RSTS/E boot problem

2019-02-09 Thread Fritz Mueller via cctalk


>> This seems the best place to start with the LA this weekend then.
> 
> I'm going to respectfully semi-disagree! I think that at this point there's a
> good chance we can localize to within a gate or two before we start applying
> test instruments.

Oh, I agree completely, Noel.  I should have more precisely said "when/if we 
get to the LA this weekend, this seems the place to start."

If, as you are suspecting, we find damning evidence pointing specifically to 
the RK11, I'm going to want to watch it going about its business; the LA will 
be a good tool for that.

And yes, one of the beautiful things about these machines is how far you can 
get with just a set of extenders, a KM11, the front panel, and a 'scope.

  --FritzM.



Re: PDP-11/45 RSTS/E boot problem

2019-02-09 Thread Noel Chiappa via cctalk
> From: Fritz Mueller

> This seems the best place to start with the LA this weekend then.

I'm going to respectfully semi-disagree! I think that at this point there's a
good chance we can localize to within a gate or two before we start applying
test instruments.

My thinking starts with two pieces of data; i) your discovery that when the
MM trap happens, the end of the pure text segment contains a fragment of code
from 04000 lower in the text, and ii) the data that the location in main
memory where that _should_ have been is full of zeros - i.e. never been
written into.

The latter is, I think, due to the fact that Unix clears all of main memory
on startup; I think it's just chance that that memory hasn't been used yet
for something else, and is still 0's. (Unix does clear main memory in a few
places during regular operation - e.g. when expanding the stack, the newly
added area is 0'd - but in general, e.g. when swapping in a pure text, it
doesn't seem to bother, which makes sense since it's all about to be
over-written anyway.)

Anyway, those two, together with my previous analysis that this was unlikely
to have happened when the text was first being read in from the file, block
by block, lead me to believe that the likely cause is that the BAR on the
RK11 skipped up a whole bunch (setting the 04000 bit at some point) when it
was reading the pure text back in from the swap, and skipped writing into
that zero-filled block of main memory, putting the stuff that should have
gone there up 04000, instead.

(Why it's swapping the text back in is too complicated to be worth explaining
here; anyone who _really_ wants to know should look here:

  http://gunkies.org/wiki/Unix_V6_internals

in the last section, "exec() and pure-text images".)

It's easy to confirm all these suppositions/deductions fairly easily, without
having to connect up, configure, etc the LA: we can just stop the machine
after the text is first read in (in xalloc()) from the file-system, and
confirm that the text looks good there; if so, either the swap-out (albeit
unlikely, since that doesn't account for the 0's) or subsequent swap-in had
an issue. The latter would be easy to confirm: just halt the machine after
the text is swapped in, and see what the RK registers contain.

At that point, as I said, we'll know to within a few gates where the issue
is, and then it'll be time to bring out the LA.

Actually, a plain oscilloscope would do; it's interesting to recollect that
these machines were designed and maintained without benefit of a LA, purely
with an oscilloscope! We're so spoiled now! :-)

Noel


Re: PDP-11/45 RSTS/E boot problem

2019-02-08 Thread Fritz Mueller via cctalk
>>> How about a Unibus trace? 
>> 
>> I don't think my sad little HP LA has enough buffer for that...
> 
> You could use triggers in innovative ways.

Ah, quite right, gentlemen.  This seems the best place to start with the LA 
this weekend then.

  --FritzM.




Re: PDP-11/45 RSTS/E boot problem

2019-02-08 Thread Jay Jaeger via cctalk



On 2/7/2019 11:47 AM, Noel Chiappa via cctalk wrote:
> 
> The interesting point is that when V6 first copies the text in from the file
> holding the command (using readi(), Lions 6221 for anyone who's masochistic
> enough to try and actually follow this :-), it reads it in starting from the
> bottom, one disk block at a time (since in V6, files are not stored
> contiguously).
> 

I remember when Lions first showed up.  I have a copy of a copy made
back in the day.

JRJ


Re: PDP-11/45 RSTS/E boot problem

2019-02-07 Thread Mattis Lind via cctalk
torsdag 7 februari 2019 skrev Fritz Mueller via cctalk <
cctalk@classiccmp.org>:

>
> > How about a Unibus trace?  That would give you the RK11 commands as well
> as the data it sends in response.
>
> I don't think my sad little HP LA has enough buffer for that...


You could use triggers in innovative ways. Maybe trigger on that particular
data on that particular address. What takes place just before this? Is it
DMA or is it the CPU moving the data. Is it just reading out bad or is it
written bad?  That should be possible to figure out. You will probably get
quite far with only 16 data and 16 address bits (if you have the smallest
analyzer). Of course more is better...

/Mattis


>
>--FritzM.


Re: PDP-11/45 RSTS/E boot problem

2019-02-07 Thread Fritz Mueller via cctalk


> How about a Unibus trace?  That would give you the RK11 commands as well as 
> the data it sends in response.

I don't think my sad little HP LA has enough buffer for that...

   --FritzM.

Re: PDP-11/45 RSTS/E boot problem

2019-02-07 Thread Paul Koning via cctalk



> On Feb 7, 2019, at 1:37 PM, Fritz Mueller via cctalk  
> wrote:
> 
> 
>> On Feb 7, 2019, at 9:47 AM, Noel Chiappa via cctalk  
>> wrote:
>> 
>> So, with UISA0 containing 01614, that gives us PA:161400 + 04200 = PA:165600,
>> I think. And it wound up at PA:171600 - off by 04000 (higher) - which is
>> obviously an interesting number.
> 
> Thanks, Noel.
> 
>> ...it might be interesting to look at PA:165600 and see what's actually 
>> _there_
> 
> A sea of zeros, as it turns out.
> 
> I'm thinking it might be worth obtaining a full memory dump of the text 
> segment at the point of fault (I can do this with a small toggle-in program 
> to dump it over the serial console), and then compare that to the complete 
> text section in the ls binary. That would give us more of a clue about 
> whether blocks of memory are duplicated or swapped, what the size, alignment, 
> and stride of the corrupted blocks is, how many there are, etc.
> 
> I'll get an IR trace out this weekend.  Another thing I _could_ do with the 
> LA is an IO command trace on the RK11 (though that's a lot of probes to hook 
> up to get disk address, count, and memory address).

How about a Unibus trace?  That would give you the RK11 commands as well as the 
data it sends in response.

paul



Re: PDP-11/45 RSTS/E boot problem

2019-02-07 Thread Fritz Mueller via cctalk


> On Feb 7, 2019, at 9:47 AM, Noel Chiappa via cctalk  
> wrote:
> 
> So, with UISA0 containing 01614, that gives us PA:161400 + 04200 = PA:165600,
> I think. And it wound up at PA:171600 - off by 04000 (higher) - which is
> obviously an interesting number.

Thanks, Noel.

> ...it might be interesting to look at PA:165600 and see what's actually 
> _there_

A sea of zeros, as it turns out.

I'm thinking it might be worth obtaining a full memory dump of the text segment 
at the point of fault (I can do this with a small toggle-in program to dump it 
over the serial console), and then compare that to the complete text section 
in the ls binary. That would give us more of a clue about whether blocks of 
memory are duplicated or swapped, what the size, alignment, and stride of the 
corrupted blocks is, how many there are, etc.
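
The comparison itself could be done off-line with something like the Python
sketch below, once the dump and the text section of the binary are both
available as byte strings. The function name, the 512-byte block size and the
search range are assumptions picked for illustration, not values settled in
this thread.

```python
def find_displaced_blocks(expected, actual, block=512, max_shift=0o10000):
    """Report blocks of `expected` that show up in `actual` at another offset.

    Returns a list of (expected_offset, actual_offset) pairs for blocks that
    are wrong in place but found intact elsewhere in the dump.
    """
    moved = []
    for off in range(0, len(expected) - block + 1, block):
        want = expected[off:off + block]
        if actual[off:off + block] == want:
            continue                  # block is where it belongs
        if not any(want):
            continue                  # all-zero block: a match means nothing
        lo = max(0, off - max_shift)
        hi = off + max_shift + block
        found = actual.find(want, lo, hi)
        if found != -1:
            moved.append((off, found))
    return moved


if __name__ == "__main__":
    # Tiny synthetic demo: the third 8-byte block lands 16 bytes too high and
    # its proper home is left zero-filled.
    good = bytes(range(1, 33))
    bad = good[:16] + bytes(8) + good[24:32] + good[16:24]
    for src, dst in find_displaced_blocks(good, bad, block=8, max_shift=16):
        print(f"block at {src:#o} found at {dst:#o} (shift {dst - src:#o})")
```

The list of shifts gives the size and stride of the damage directly. A constant
shift on every displaced block would point at an addressing fault rather than
at bad memory cells.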

I'll get an IR trace out this weekend.  Another thing I _could_ do with the LA 
is an IO command trace on the RK11 (though that's a lot of probes to hook up to 
get disk address, count, and memory address).

  --FritzM.




Re: PDP-11/45 RSTS/E boot problem

2019-02-07 Thread Jon Elson via cctalk

On 02/06/2019 09:11 PM, Noel Chiappa via cctalk wrote:

 > From: Jon Elson

 > I'm thinking it is bad memory. ... I think it is just a bad memory chip

Nothing so simple, I'm afraid! The memory actually contains:

   PA:171600: 016162 004767 000224 000414 016700 016152 016702 016144

and it's _supposed_ to be holding:

   PA:171600: 110024 010400 000167 16 010500 010605 010446 010346

This together with Fritz's discovery of that first 'bad memory' pattern _elsewhere_
in the binary for the command makes it look pretty likely that some sort of other
error has wound up with stuff being put in the wrong location.

OK, now it is starting to look like an address problem.  
That could actually be several things.
Possibly something going wrong in DMA, and the disk data is 
being written into the wrong place in memory.  If the two 
places the same data show up are related by some simple 
binary transposition, maybe under some cases a write to 
memory gets written simultaneously into two banks of the 
memory.  A memory interference test OUGHT to pick up 
something like that.  It could also be a bus problem, or 
something going haywire in the MMU.


And, one other possibility is that the duplicate data is a 
disk buffer or cache that was then copied to the location to 
be executed.


Jon


Re: PDP-11/45 RSTS/E boot problem

2019-02-06 Thread Fritz Mueller via cctalk
> Seems a little less-likely to be the problem, given(?) as well that you have 
> fairly consistent (is deterministic overstating it?) behaviour.

Yeah.  We've gotten to the point now where enough layered problems have been 
cleared away that the remaining behavior is quite deterministic.

> If you wanted to test it by experiment, without having to remove the 
> installed Rs, you could test-clip another R in parallel with the 38.4K, 
> probably something around 200K, to shorten the 555 period.

Yes; and I think a quick solder tack for that would even be easier to manage 
than clips in there.  Will give that a go this weekend.

  cheers,
--FritzM.



Re: PDP-11/45 RSTS/E boot problem

2019-02-06 Thread Brent Hilpert via cctalk
On 2019-Feb-06, at 10:37 PM, Fritz Mueller via cctalk wrote:

>> 4116 datasheet specs 2mS, my calcs give a refresh period of 1.5mS, the 
>> 14.5uS from the manual would give 1.86 mS, 7% shy of 2.
>> The schematic specs 1% resistors, and the parts list does appear to spec a 
>> high-tolerance "1%200PPM" cap.
>> 
>> Although there are the internal voltage divider Rs in the 555 which are also 
>> critical for the timing and everything is 40+ years old.
>> 
>> Idle speculation at my distance, we'll see what Fritz observes.
> 
> Brent:  11.8us, 6.4us positive
> Manual: 14.5us, 6.0us positive
> Actual: 15.2us, 8.5us positive
> 
> So yeah, a little pokey there...


15.2uS gives a 1.95mS refresh, so it's awfully close to the 2mS spec, but still 
within.
The datasheet I was looking at doesn't seem to give any spec for tolerance on 
the refresh so one would guess there's a safety margin built into the 2mS spec.
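
The arithmetic behind those figures, as a short Python sketch. It assumes one
row is refreshed per timer cycle and that all 128 rows of a 4116 must be
covered within the 2mS spec, which is consistent with the numbers quoted in
this thread.

```python
ROWS_4116 = 128     # rows to refresh in a 4116 (16K x 1)
SPEC_MS = 2.0       # datasheet maximum refresh interval


def refresh_period_ms(cycle_us, rows=ROWS_4116):
    """Time to walk every row once, one row per timer cycle."""
    return cycle_us * rows / 1000.0


for label, cycle_us in [("calculated", 11.8), ("manual", 14.5), ("measured", 15.2)]:
    period = refresh_period_ms(cycle_us)
    margin = 100.0 * (SPEC_MS - period) / SPEC_MS
    print(f"{label:10s} {cycle_us:5.1f} us/row -> {period:.2f} ms "
          f"({margin:.1f}% under spec)")
```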

Seems a little less-likely to be the problem, given(?) as well that you have 
fairly consistent (is deterministic overstating it?) behaviour.

If you wanted to test it by experiment, without having to remove the installed 
Rs, you could test-clip another R in parallel with the 38.4K,
probably something around 200K, to shorten the 555 period.
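
A quick Python check of what that test resistor does to the timing resistance.
It covers only the parallel combination; how far the 555 period actually
shortens depends on the rest of the timing network, which is not spelled out
here.

```python
def parallel(r1, r2):
    """Equivalent resistance of two resistors in parallel (ohms)."""
    return r1 * r2 / (r1 + r2)


R_TIMING = 38.4e3   # installed timing resistor
R_TEST = 200e3      # suggested test resistor clipped across it

r_eff = parallel(R_TIMING, R_TEST)
drop = 100.0 * (1.0 - r_eff / R_TIMING)
print(f"effective resistance {r_eff / 1e3:.1f}K, {drop:.0f}% lower")
```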



Re: PDP-11/45 RSTS/E boot problem

2019-02-06 Thread Fritz Mueller via cctalk
> 4116 datasheet specs 2mS, my calcs give a refresh period of 1.5mS, the 14.5uS 
> from the manual would give 1.86 mS, 7% shy of 2.
> The schematic specs 1% resistors, and the parts list does appear to spec a 
> high-tolerance "1%200PPM" cap.
> 
> Although there are the internal voltage divider Rs in the 555 which are also 
> critical for the timing and everything is 40+ years old.
> 
> Idle speculation at my distance, we'll see what Fritz observes.

Brent:  11.8us, 6.4us positive
Manual: 14.5us, 6.0us positive
Actual: 15.2us, 8.5us positive

So yeah, a little pokey there...



Re: PDP-11/45 RSTS/E boot problem

2019-02-06 Thread Fritz Mueller via cctalk




>> It looks like the question boils down to either "how did that part of
>> the binary get to that part of memory?", or "how did we end up
>> executing out of that part of memory?"
>
> More the former, I think...

Noel, is it possible for you to deduce where Unix _should_ be placing these
"bad" bits (from file offset octal 4220)?


Maybe a comparison of addresses where the bits should be, with addresses 
where the "bad" copy ends up, could point us at some particular failure 
modes to check in the KT11, CPU, or RK11...


--FritzM.



Re: PDP-11/45 RSTS/E boot problem

2019-02-06 Thread Noel Chiappa via cctalk
> From: Fritz Mueller

> It looks like the question boils down to either "how did that part of
> the binary get to that part of memory?", or "how did we end up
> executing out of that part of memory?"

More the former, I think.

UISA0 contains 001614, and physical memory at 0161400 does contain the first
few instructions of the command's binary, so that 01614 is probably correct
for the base address of segment (page) 0, which contains all the code for the
command. (Without looking through the OS's guts, I can't confirm, from internal
data structures, that that's where it decided to put the command's binary.)

The PC at fault time is 010210, which is correct for the frame setup
function, CSV; and looking at the contents of the stack, registers etc makes
it pretty certain it had just done the "JSR R5, CSV" to get there. And
0161400 + 010210 = 0171610, which contains something completely different
from what's in the command binary at 010210!
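
The address arithmetic above can be reproduced with a few lines. This is a minimal sketch of the usual PDP-11 relocation rule (the address register counts 64-byte blocks, and the low 13 bits of the virtual address are the offset within the page); the function name is made up, and the caller is assumed to pass the register for the page in question.

```python
def kt11_physical(par, virtual_addr):
    """Relocate a 16-bit virtual address using a page address register value."""
    base = par << 6                    # register counts 64-byte (0100 octal) blocks
    offset = virtual_addr & 0o17777    # low 13 bits: offset within the 8 KB page
    return base + offset

uisa0 = 0o001614
pc = 0o010210
pa = kt11_physical(uisa0, pc)
print(f"base {uisa0 << 6:07o}, PC {pc:06o} -> PA {pa:07o}")
```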

> Could still be a memory issue _elsewhere_ that lands us there, of
> course... Could also be a translation error lurking in the KT11, or a
> CPU bug not found by any of the DEC diagnostic suites.

Yup. Like I said, good news is we're down to one problem; bad news is it's
a Duesie!

Noel


Re: PDP-11/45 RSTS/E boot problem

2019-02-06 Thread Noel Chiappa via cctalk
> From: Jon Elson

> I'm thinking it is bad memory. ... I think it is just a bad memory chip

Nothing so simple, I'm afraid! The memory actually contains:

  PA:171600: 016162 004767 000224 000414 016700 016152 016702 016144

and it's _supposed_ to be holding:

  PA:171600: 110024 010400 000167 16 010500 010605 010446 010346

This together with Fritz's discovery of that first 'bad memory' pattern
_elsewhere_ in the binary for the command makes it look pretty likely that
some sort of other error has wound up with stuff being put in the wrong
location.

  Noel


Re: PDP-11/45 RSTS/E boot problem

2019-02-06 Thread Fritz Mueller via cctalk



On 2/6/19 6:25 PM, Jon Elson via cctalk wrote:
> I'm thinking it is bad memory.  It seems unlikely bus problems could
> alter only ONE BIT per word, so I think it is just a bad memory chip,
> and finding multiple words where the 01 bit is now turned on sure
> looks like that kind of problem.


So, there was an issue specifically relating to bit 12 on the front 
panel (d'oh!), which I have now cleared up.


Furthermore, the "authoritative" sequence of 16 words obtained from the 
front panel last night, after addressing this issue, is:


PA:171600: 016162 004767 000224 000414 016700 016152 016702 016144
PA:171620: 004767 000206 000405 012404 012467 016124 000167 177346

...and, as it turns out, this exact sequence also occurs within the ls 
binary, on disk (per "od"):


0004220 016162 004767 000224 000414 016700 016152 016702 016144
0004240 004767 000206 000405 012404 012467 016124 000167 177346

So, the memory there _seems_ fine with the latest info at our disposal. 
It looks like the question boils down to either "how did that part of 
the binary get to that part of memory?", or "how did we end up executing 
out of that part of memory?"
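
Matching a front-panel dump against the binary can be done mechanically. The sketch below searches a list of 16-bit words for a given run; the `image` list is only a stand-in built from the od line above plus two filler words, and a real check would load every word of the file instead.

```python
def find_words(haystack, needle):
    """Return every word index at which `needle` occurs inside `haystack`."""
    n = len(needle)
    return [i for i in range(len(haystack) - n + 1) if haystack[i:i + n] == needle]

# First eight words read from the front panel at PA 171600.
panel = [0o016162, 0o004767, 0o000224, 0o000414,
         0o016700, 0o016152, 0o016702, 0o016144]

# Stand-in for the binary: two filler words, then the od line at 0004220.
image = [0o0, 0o0] + panel
base_offset = 0o4220 - 2 * 2    # hypothetical byte offset of image[0]

for idx in find_words(image, panel):
    print(f"match at file offset {base_offset + 2 * idx:07o}")
```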


Could still be a memory issue _elsewhere_ that lands us there, of 
course...  Could also be a translation error lurking in the KT11, or a 
CPU bug not found by any of the DEC diagnostic suites.


I will scope the refresh clock when I get home tonight, and I'm planning 
on hauling out the logic analyzer for an IR trace this weekend...


   --FritzM.


P.S. One idea that popped into my head recently, after a suggestion here 
to check the KT11 address translation adders, and my response "but the 
diagnostics!"...  A bug in one of the carry lookahead generators used 
between the bit slices of that adder could cause a mistranslation on 
only a fairly selective subset of virtual addresses, and this might 
conceivably be missed by the KT11 diagnostics?  *IF* that's the case and 
we can chase the IR trace upstream to the place of an unlucky 
mistranslation, it will be pretty easy to track down then in the hw and fix.
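
To illustrate why such a fault could hide from most test patterns, here is a toy model of a sliced adder with one dead carry. It is not the KT11's actual circuit; the 12-bit width, 4-bit slices, and the choice of a carry stuck low are all assumptions made for the sake of the example.

```python
def sliced_add(a, b, width=12, slice_bits=4, dead_carry_into=None):
    """Add a and b slice by slice; optionally drop the carry into one slice."""
    mask = (1 << slice_bits) - 1
    result, carry = 0, 0
    for i in range(width // slice_bits):
        if i == dead_carry_into:
            carry = 0                  # simulated fault: carry-in stuck low
        shift = i * slice_bits
        s = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        result |= (s & mask) << shift
        carry = s >> slice_bits
    return result

def mistranslated(a, b, slice_index):
    """True when the faulty adder disagrees with the healthy one."""
    return sliced_add(a, b) != sliced_add(a, b, dead_carry_into=slice_index)

print(mistranslated(0o0001, 0o0002, 2))   # no carry crosses the boundary
print(mistranslated(0o0360, 0o0020, 2))   # a carry out of slice 1 gets lost
```

Only operand pairs that actually generate a carry across the broken boundary come out wrong, so a diagnostic would need to hit that case to notice.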


Re: PDP-11/45 RSTS/E boot problem

2019-02-06 Thread Jon Elson via cctalk

On 02/06/2019 05:39 PM, Fritz Mueller via cctalk wrote:

>> On Feb 6, 2019, at 2:24 PM, Brent Hilpert via cctalk wrote:
>>
>> Is the schematic available for the memory board at-issue?
>> Curious myself to see what approach for refresh DEC used.
>
> Yes, here:
> http://bitsavers.trailing-edge.com/pdf/dec/pdp11/memory/MP00672_MS11L_engDrw.pdf
>
> There is also a technical manual adjacent, with circuit descriptions.
>
> I will scope this up tonight and take a look!

Yup, page 6, a 555 RC refresh timer!

Jon



Re: PDP-11/45 RSTS/E boot problem

2019-02-06 Thread Jon Elson via cctalk

On 02/06/2019 04:24 PM, Brent Hilpert via cctalk wrote:

> On 2019-Feb-06, at 1:21 PM, Noel Chiappa via cctalk wrote:
>>> From: Brent Hilpert
>>>
>>> what about the refresh circuitry of the memory board?
>>> ...
>>> It might also explain why a number of 4116s were (apparently) failing
>>> earlier in the efforts ... replacing them might have just replaced them
>>> with 'slightly better' chips, i.e. with a slightly longer refresh tolerance.
>>
>> Ooh, excellent idea!
>
> Is the schematic available for the memory board at-issue?
> Curious myself to see what approach for refresh DEC used.


Hmm, yes, if the refresh is done by one-shots and RC timing, 
a failed cap could silently kill the refresh trigger.  An 
easy way to check is put something in a few locations and 
halt the CPU for some time (seconds to minutes).  If the 
content is now gone, then the refresh is very likely not 
being done.


Jon


Re: PDP-11/45 RSTS/E boot problem

2019-02-06 Thread Jon Elson via cctalk

On 02/06/2019 12:53 PM, Noel Chiappa via cctalk wrote:
  


> If so, i) we're down to one problem (good news), and our problem turns into
> finding out how that section of the code got trashed (bad news).

I'm thinking it is bad memory.  It seems unlikely bus 
problems could alter only ONE BIT per word, so I think it is 
just a bad memory chip, and finding multiple words where the 
01 bit is now turned on sure looks like that kind of 
problem.  It could, of course, be a bad driver or receiver 
on the memory board.  Might also check the other voltage in 
the memory array (+12 or whatever was used internally in the 
particular memory) and also look for degraded caps on the board.


Jon


Re: PDP-11/45 RSTS/E boot problem

2019-02-06 Thread Brent Hilpert via cctalk
On 2019-Feb-06, at 5:29 PM, Paul Koning wrote:
>> On Feb 6, 2019, at 8:25 PM, Brent Hilpert via cctalk  
>> wrote:
>> On 2019-Feb-06, at 5:11 PM, Fritz Mueller via cctalk wrote:
>>>>> On Feb 6, 2019, at 2:24 PM, Brent Hilpert via cctalk wrote:
>>>>>
>>>>> Is the schematic available for the memory board at-issue?
>>>>> Curious myself to see what approach for refresh DEC used.
>>>>
>>>> Yes, here:
>>>> http://bitsavers.trailing-edge.com/pdf/dec/pdp11/memory/MP00672_MS11L_engDrw.pdf
>>>
>>> For completeness, from the technical manual:
>>>
>>> "The refresh logic, shown in sheet 6 of the print set, generates REF CLK H
>>> and the refresh address. Signal REF CLK H is derived from a 555 timer
>>> (E5) which is set up as a free running oscillator, powered by the +15 V /
>>> +12 V module input (V-555). The REF CLK H signal oscillates with a period
>>> of 14.5us and has a positive pulse width of 6us during each period."
>> 
>> So I could have saved myself some fun if I had read the manual rather than 
>> just looking at the schematic.
>> Not that they're way out of whack, but the mild disparity between the 
>> manual's 14.5uS and my calculated 11.7uS is curious
>> (the calculation being based on the schematic RC values and the 555 
>> equations).
> 
> Perhaps the period was changed in a schematic rev or ECO, and the manual 
> wasn't updated to reflect it.  It would be interesting to check the data 
> sheet for the RAM chip to see what it likes for refresh cycle.  And given 
> that this is an RC oscillator your theory about out of tolerance timing 
> definitely deserves checking.


Checking further..

4116 datasheet specs 2mS, my calcs give a refresh period of 1.5mS, the 14.5uS 
from the manual would give 1.86 mS, 7% shy of 2.
The schematic specs 1% resistors, and the parts list does appear to spec a 
high-tolerance "1%200PPM" cap.

Although there are the internal voltage divider Rs in the 555 which are also 
critical for the timing and everything is 40+ years old.

Idle speculation at my distance, we'll see what Fritz observes.
Could be other problems in the refresh circuitry too, like failed outputs from 
the row counter, etc.



Re: PDP-11/45 RSTS/E boot problem

2019-02-06 Thread Brent Hilpert via cctalk
On 2019-Feb-06, at 3:39 PM, Fritz Mueller wrote:
>> On Feb 6, 2019, at 2:24 PM, Brent Hilpert via cctalk  
>> wrote:
>> 
>> Is the schematic available for the memory board at-issue?
>> Curious myself to see what approach for refresh DEC used.
> 
> Yes, here: 
> http://bitsavers.trailing-edge.com/pdf/dec/pdp11/memory/MP00672_MS11L_engDrw.pdf
> 
> There is also a technical manual adjacent, with circuit descriptions.
> 
> I will scope this up tonight and take a look!

Mixed up To: fields.
The following was intended to go to the list and was originally sent a moment 
before I saw Fritz's message mentioning the 555:


Ha!, simple free-running 555 oscillator generating the refresh cycles 
(pdf.pg27).

I suspect there is a mistake in the schematic there:
V-555 more likely connects on the other side of R4 (E5.4-C1-R4, rather 
than E5.7-R4-R5)
to make it into the standard 555 astable circuit.

Based on that, calculations indicate that the output from E5 (TP18) should be 
around 85 KHz, cycling 6.4 uS high, 5.3 uS low.
So it's generating a refresh cycle every 11.8 uS. With 7 bits used from counter 
E43 (128 rows) for full refresh, that's a cell refresh
every 1.5mS which (without having checked the 4116 specs) sounds sensible for a 
DRAM from that period.

Note the 555 (E5) is running on +12 or +15V, with a R voltage divider on the 
output before driving into TTL.
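
The frequency and duty cycle follow directly from the high and low times, and the textbook 555 astable equations relate those times to the timing parts. The sketch below uses the 6.4 uS / 5.3 uS figures from above; the component values in the last call are made up purely to exercise the formula and are NOT taken from the schematic.

```python
import math

def astable_times(r_a, r_b, c):
    """Textbook 555 astable timing: returns (high, low) durations in seconds."""
    t_high = math.log(2) * (r_a + r_b) * c
    t_low = math.log(2) * r_b * c
    return t_high, t_low

def freq_and_duty(t_high, t_low):
    """Oscillation frequency in Hz and fraction of the period spent high."""
    period = t_high + t_low
    return 1.0 / period, t_high / period

f, duty = freq_and_duty(6.4e-6, 5.3e-6)
print(f"{f / 1e3:.1f} kHz, {duty * 100:.0f}% high")

# Hypothetical parts only (1K, 1K, 1 nF), not the board's values.
print(astable_times(1e3, 1e3, 1e-9))
```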


Re: PDP-11/45 RSTS/E boot problem

2019-02-06 Thread Paul Koning via cctalk



> On Feb 6, 2019, at 8:25 PM, Brent Hilpert via cctalk  
> wrote:
> 
> On 2019-Feb-06, at 5:11 PM, Fritz Mueller via cctalk wrote:
>>>> On Feb 6, 2019, at 2:24 PM, Brent Hilpert via cctalk wrote:
>>>>
>>>> Is the schematic available for the memory board at-issue?
>>>> Curious myself to see what approach for refresh DEC used.
>>>
>>> Yes, here:
>>> http://bitsavers.trailing-edge.com/pdf/dec/pdp11/memory/MP00672_MS11L_engDrw.pdf
>>
>> For completeness, from the technical manual:
>>
>> "The refresh logic, shown in sheet 6 of the print set, generates REF CLK H
>> and the refresh address. Signal REF CLK H is derived from a 555 timer (E5)
>> which is set up as a free running oscillator, powered by the +15 V / +12 V
>> module input (V-555). The REF CLK H signal oscillates with a period of
>> 14.5us and has a positive pulse width of 6us during each period."
> 
> So I could have saved myself some fun if I had read the manual rather than 
> just looking at the schematic.
> Not that they're way out of whack, but the mild disparity between the 
> manual's 14.5uS and my calculated 11.7uS is curious
> (the calculation being based on the schematic RC values and the 555 
> equations).

Perhaps the period was changed in a schematic rev or ECO, and the manual wasn't 
updated to reflect it.  It would be interesting to check the data sheet for the 
RAM chip to see what it likes for refresh cycle.  And given that this is an RC 
oscillator your theory about out of tolerance timing definitely deserves 
checking.

paul



Re: PDP-11/45 RSTS/E boot problem

2019-02-06 Thread Brent Hilpert via cctalk
On 2019-Feb-06, at 5:11 PM, Fritz Mueller via cctalk wrote:
>>> On Feb 6, 2019, at 2:24 PM, Brent Hilpert via cctalk 
>>>  wrote:
>>> 
>>> Is the schematic available for the memory board at-issue?
>>> Curious myself to see what approach for refresh DEC used.
>> 
>> Yes, here: 
>> http://bitsavers.trailing-edge.com/pdf/dec/pdp11/memory/MP00672_MS11L_engDrw.pdf
> 
> For completeness, from the technical manual:
> 
> "The refresh logic, shown in sheet 6 of the print set, generates REF CLK H
> and the refresh address. Signal REF CLK H is derived from a 555 timer (E5)
> which is set up as a free running oscillator, powered by the +15 V / +12 V
> module input (V-555). The REF CLK H signal oscillates with a period of 14.5us
> and has a positive pulse width of 6us during each period."



So I could have saved myself some fun if I had read the manual rather than just 
looking at the schematic.
Not that they're way out of whack, but the mild disparity between the manual's 
14.5uS and my calculated 11.7uS is curious
(the calculation being based on the schematic RC values and the 555 equations).



Re: PDP-11/45 RSTS/E boot problem

2019-02-06 Thread Fritz Mueller via cctalk


>> On Feb 6, 2019, at 2:24 PM, Brent Hilpert via cctalk  
>> wrote:
>> 
>> Is the schematic available for the memory board at-issue?
>> Curious myself to see what approach for refresh DEC used.
> 
> Yes, here: 
> http://bitsavers.trailing-edge.com/pdf/dec/pdp11/memory/MP00672_MS11L_engDrw.pdf

For completeness, from the technical manual:

"The refresh logic, shown in sheet 6 of the print set, generates REF CLK H and
the refresh address. Signal REF CLK H is derived from a 555 timer (E5) which
is set up as a free running oscillator, powered by the +15 V / +12 V module
input (V-555). The REF CLK H signal oscillates with a period of 14.5us and has
a positive pulse width of 6us during each period."



Re: PDP-11/45 RSTS/E boot problem

2019-02-06 Thread Fritz Mueller via cctalk


> On Feb 6, 2019, at 2:24 PM, Brent Hilpert via cctalk  
> wrote:
> 
> Is the schematic available for the memory board at-issue?
> Curious myself to see what approach for refresh DEC used.

Yes, here: 
http://bitsavers.trailing-edge.com/pdf/dec/pdp11/memory/MP00672_MS11L_engDrw.pdf

There is also a technical manual adjacent, with circuit descriptions.

I will scope this up tonight and take a look!

--FritzM.



Re: PDP-11/45 RSTS/E boot problem

2019-02-06 Thread Brent Hilpert via cctalk
On 2019-Feb-06, at 1:21 PM, Noel Chiappa via cctalk wrote:
>> From: Brent Hilpert
> 
>> what about the refresh circuitry of the memory board?
>> ...
>> It might also explain why a number of 4116s were (apparently) failing
>> earlier in the efforts ... replacing them might have just replaced them
>> with 'slightly better' chips, i.e. with a slightly longer refresh tolerance.
> 
> Ooh, excellent idea!


Is the schematic available for the memory board at-issue?
Curious myself to see what approach for refresh DEC used.



Re: PDP-11/45 RSTS/E boot problem

2019-02-06 Thread Noel Chiappa via cctalk
> From: Brent Hilpert

> what about the refresh circuitry of the memory board?
> ...
> It might also explain why a number of 4116s were (apparently) failing
> earlier in the efforts ... replacing them might have just replaced them
> with 'slightly better' chips, i.e. with a slightly longer refresh
> tolerance.

Ooh, excellent idea!

Noel


Re: PDP-11/45 RSTS/E boot problem

2019-02-06 Thread Brent Hilpert via cctalk
On 2019-Feb-06, at 10:53 AM, Noel Chiappa via cctalk wrote:
> 
> I'm not sure that's going to tell us much: the latest development is that
> Fritz looked at the actual memory contents again, and it is once again
> trash; _almost_ identical to what was there before:
> 
>  PA:171600: 016162 004767 000224 000414 006700 006152 006702 006144
> 
> but with some extra 01 bits:
> 
>  PA:171600: 016162 004767 000224 000414 016700 016152 016702 016144
> 
> (It's not clear if this represents a real difference, or if that 
> front panel issue Fritz mentioned caused the contents to be displayed
> incorrectly.)
> 
> The exciting thing is that if the latter really is what's in main memory,
> that '16700 16152' at the PC of the MM trap could indeed generate the MM trap
> we're seeing: it's "MOV 26364, R0", and that address is in segment (page) 1,
> which is only 03500 long
> 
> If so, i) we're down to one problem (good news), and our problem turns into
> finding out how that section of the code got trashed (bad news). Which is not
> going to be simple, alas, I suspect. I don't think it's the RK11, because
> Unix reads the program image into system buffers in low memory, and that's
> clearly working OK in the 'sleep;ls' case. (It may not use the exact same
> buffers, though...) It then copies it out to the memory where it's going to
> execute from, using an MTPI loop. So maybe the memory still has issues, or
> maybe the MTPI isn't working with some main memory locations or or or...


I haven't followed this in detail enough to know what the configuration and 
memory board at play are so maybe
this can be ruled out from your end, but for consideration, what about the 
refresh circuitry of the memory board?

Mem diagnostics, unless they explicitly account for it, may not show up 
problems with memory refresh
if the loop times are short enough to effectively substitute as refresh cycles, 
while they could show up later in
real-world use with arbitrary time between accesses.

Refresh on some early boards/systems was asynchronously timed by monostables or 
onboard oscillators
which can drift or fail on the margin/slope. (I don't know what DEC's design 
policy was for DRAM refresh).
It might also explain why a number of 4116s were (apparently) failing earlier 
in the efforts (if I recall the discussion correctly),
replacing them might have just replaced them with 'slightly better' chips, i.e. 
with a slightly longer refresh tolerance.



Re: PDP-11/45 RSTS/E boot problem

2019-02-06 Thread Noel Chiappa via cctalk
> From: Mattis Lind

>> we've also looked at what's in memory at that location, and the low
>> part of the text segment seems to be correct, but there was junk at
>> the top, around the target of the JSR (i.e. at 'csv'). Not just one
>> word, but everything around that location was wrong, which would
>> suggest to me that the cause is not a simple memory failure there.
>> I've suggested to Fritz that we look at that again, to see if what was
>> recorded before is accurate 

> Would it be possible to insert a breakpoint or halt and run the
> program, insert original instruction and single step?

I'm not sure that's going to tell us much: the latest development is that
Fritz looked at the actual memory contents again, and it is once again
trash; _almost_ identical to what was there before:

  PA:171600: 016162 004767 000224 000414 006700 006152 006702 006144

but with some extra 01 bits:

  PA:171600: 016162 004767 000224 000414 016700 016152 016702 016144

(It's not clear if this represents a real difference, or if that 
front panel issue Fritz mentioned caused the contents to be displayed
incorrectly.)

The exciting thing is that if the latter really is what's in main memory,
that '16700 16152' at the PC of the MM trap could indeed generate the MM trap
we're seeing: it's "MOV 26364, R0", and that address is in segment (page) 1,
which is only 03500 long.
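
The field breakdown behind that reading can be sketched as follows. The instruction layout and the 3-bit page / 13-bit offset split are standard PDP-11; the target address 026364 and the 03500 length are taken as quoted above, and treating the length as a byte count is an assumption. Function names are made up.

```python
def decode_double_operand(word):
    """Split a PDP-11 double-operand instruction word into its fields."""
    return {
        "opcode": (word >> 12) & 0o17,    # 01 is MOV
        "src_mode": (word >> 9) & 0o7,
        "src_reg": (word >> 6) & 0o7,     # mode 6 with register 7 is PC-relative
        "dst_mode": (word >> 3) & 0o7,
        "dst_reg": word & 0o7,
    }

def page_and_offset(virtual_addr):
    """Top 3 bits select the page (segment); the low 13 bits are the offset."""
    return virtual_addr >> 13, virtual_addr & 0o17777

fields = decode_double_operand(0o016700)
page, offset = page_and_offset(0o026364)
seg1_length = 0o3500                      # as quoted; assumed to be in bytes
print(fields, page, oct(offset), offset >= seg1_length)
```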

If so, i) we're down to one problem (good news), and our problem turns into
finding out how that section of the code got trashed (bad news). Which is not
going to be simple, alas, I suspect. I don't think it's the RK11, because
Unix reads the program image into system buffers in low memory, and that's
clearly working OK in the 'sleep;ls' case. (It may not use the exact same
buffers, though...) It then copies it out to the memory where it's going to
execute from, using an MTPI loop. So maybe the memory still has issues, or
maybe the MTPI isn't working with some main memory locations or or or...

Noel


Re: PDP-11/45 RSTS/E boot problem

2019-02-05 Thread Fritz Mueller via cctalk
> On the logic analyzer suggestion: I remember seeing a logic analyzer hooked 
> to a PDP-11 at DEC, for software debugging.  As I recall, it was connected at 
> the console front panel, which seems reasonable since several key CPU data 
> paths are exposed there.

Ooh, I like that suggestion!  It might be worth making up some inline cables 
for the LA just for this purpose, so it could be a quick hookup whenever needed.

  --FritzM.




Re: PDP-11/45 RSTS/E boot problem

2019-02-05 Thread Fritz Mueller via cctalk
>> Would it be any difference if you run the machine at full speed or lower 
>> speed...
> 
> Ah, yes -- this I haven't tried yet!  I have a KM11 replica, so this is easy 
> enough to do; I'll give that a go when I next get back to the machine 
> (possibly this evening).

Ran the machine on the maintenance clock via the KM11 at a variety of speeds, 
and the behavior remains the same.  So not too timing sensitive...  At least 
it's consistent!

--FritzM.



Re: PDP-11/45 RSTS/E boot problem

2019-02-05 Thread Paul Koning via cctalk
On the logic analyzer suggestion: I remember seeing a logic analyzer hooked to 
a PDP-11 at DEC, for software debugging.  As I recall, it was connected at the 
console front panel, which seems reasonable since several key CPU data paths 
are exposed there.

paul



Re: PDP-11/45 RSTS/E boot problem

2019-02-05 Thread Josh Dersch via cctalk
On Tue, Feb 5, 2019 at 10:03 AM Fritz Mueller via cctalk <
cctalk@classiccmp.org> wrote:

>
>
> FWIW, I maintain a Windows VM (on a MacOS X host) for the sole purpose of
> running PDP11GUI, and I use an USA19H USB serial dongle connected through
> to the VM as a serial interface.  I don't know if something about this
> setup is particularly detrimental to PDP11GUI transfer performance?  It
> takes me >3hrs to write a 2.5mb pack this way (at 9600 baud), or a little
> over 1hr to read one back.  Do others see similar throughput with these
> tools?
>

Yes.  PDP11GUI is a great tool but it is extremely slow for dumping disks.
It's not your setup.  I restored an RL02 pack this way once (at 9600bps)
and it took a very long time (I didn't time it but it was well over 6
hours).  Compare this with restoring an RK05 pack on my PDP-8 using
dumprest, which takes just about an hour...

- Josh


>
> --FritzM.
>
>


Re: PDP-11/45 RSTS/E boot problem

2019-02-05 Thread Jay Jaeger via cctalk
On 2/5/2019 12:03 PM, Fritz Mueller via cctalk wrote:
>> Perhaps compile [test programs] under SimH and do a block-level diff of the 
>> image with what is currently in use, and transfer just those blocks?
> 
> I did experiment with this a little way back.  I wrote a small standalone 
> code that dumps a CRC of every sector over the console; I can run this both 
> under SIMH and on the real machine, then diff to find the changed sectors.
> 
> Unfortunately, when I tried to apply this, it seemed that SIMH's write single 
> sector wasn't working correctly for me (though "write all sectors to end" 
> seemed to work okay).  It ended up being much more tedious than I had thought 
> to do it this way; in the end I concluded I'd be better off writing some 
> different software specifically for this purpose, but I haven't gotten back 
> to it yet.
> 
> FWIW, I maintain a Windows VM (on a MacOS X host) for the sole purpose of 
> running PDP11GUI, and I use an USA19H USB serial dongle connected through to 
> the VM as a serial interface.  I don't know if something about this setup is 
> particularly detrimental to PDP11GUI transfer performance?  It takes me >3hrs 
> to write a 2.5mb pack this way (at 9600 baud), or a little over 1hr to read 
> one back.  Do others see similar throughput with these tools?
> 
>   --FritzM.
> 
> 

At 9600 bps, and allowing for 10 bit characters (8 data bits, 1 start, 1
stop), that is 960 cps, and 2.5MB RK05 should take under an hour (2400
s).  Round that up to an hour, say, for handshaking overhead, etc.  That
is consistent with your read time.

To get to three hours we would need a pause for each write of:

7200 = 200 (tracks) x 12 (sectors/trk) x 2 (sides) x n seconds/block

And n would be 1.5 seconds / sector for the write time.  That seems
excessive.
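
The same back-of-envelope numbers can be reproduced in a few lines. This sketch uses the nominal 2.5 MB pack size and the 200 x 12 x 2 geometry from the estimate above; by this arithmetic the raw payload time comes out nearer 2600 s than the rounded figure, which is still under an hour.

```python
BAUD = 9600
BITS_PER_CHAR = 10          # 8 data + 1 start + 1 stop
PACK_BYTES = 2.5e6          # nominal pack size used in the estimate above
SECTORS = 200 * 12 * 2      # tracks x sectors/track x sides, as above

chars_per_sec = BAUD / BITS_PER_CHAR
raw_seconds = PACK_BYTES / chars_per_sec

def pause_per_sector(extra_seconds, sectors=SECTORS):
    """Per-sector delay that would account for extra_seconds of transfer time."""
    return extra_seconds / sectors

print(f"{chars_per_sec:.0f} cps, raw transfer about {raw_seconds:.0f} s")
print(f"pause per sector: {pause_per_sector(7200):.1f} s")
```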

Perhaps it is doing read after write verify for each block written?   If
so, can you turn that verify off?  (When I do my transfers over a DR11,
I run a separate checksum step afterwards, and the transfer programs
also report their checksums).


Re: PDP-11/45 RSTS/E boot problem

2019-02-05 Thread Fritz Mueller via cctalk


> On Feb 5, 2019, at 10:03 AM, Fritz Mueller  wrote:
> 
> Unfortunately, when I tried to apply this, it seemed that SIMH's write single 
> sector wasn't working correctly for me...

Correction to above: "PDP11GUI's write single sector".  Apologies!

  --FritzM.



Re: PDP-11/45 RSTS/E boot problem

2019-02-05 Thread Fritz Mueller via cctalk


> Would it be any difference if you run the machine at full speed or lower 
> speed...

Ah, yes -- this I haven't tried yet!  I have a KM11 replica, so this is easy 
enough to do; I'll give that a go when I next get back to the machine (possibly 
this evening).

> ...or even single step past this instruction? With the KM11 installed you 
> could even single step the 5 minor states in each micro instruction. Would it 
> be possible to insert a breakpoint or halt and run the program, insert 
> original instruction and single step?

We're not *quite* sure yet of the exact offending instruction; memory around 
the purported fault location doesn't look like what we expect (particularly, 
it's hard to see how the instruction which should have executed last could 
possibly result in the particular fault taken; thus Noel's request for an IR 
trace.)

I think the breakpoint-and-step approach is likely to be fruitful, but we need 
to clear up some muddiness around the exact instruction sequence/location first.

--FritzM.




Re: PDP-11/45 RSTS/E boot problem

2019-02-05 Thread Fritz Mueller via cctalk


>>> I keep wondering about the psu.
>> 
>> Good theory.
> 
> I'll give these a double-check...

I did give these a look yesterday.  Indeed, the +5 regulator in position "C" 
(which includes supply to the KT11) was running a little low (4.9 and change).  
I trimmed it up, and checked the rest of the regulators while I was at it (they 
were all fine.)

This did clear up some small strangenesses I was seeing at the console in 
address translation mode, but "ls" still fails in exactly the same way.

--FritzM.




Re: PDP-11/45 RSTS/E boot problem

2019-02-05 Thread Fritz Mueller via cctalk


> On Feb 5, 2019, at 8:45 AM, Jon Elson via cctalk  
> wrote:
> 
> I'd guess the diagnostic tries a few patterns to test for gross failure of 
> this circuitry, but since it involves memory on a system running a program, 
> it may not be able to exhaustively test these adders and comparators.

In fact, the DEC diagnostics relocate themselves around memory, so they can and 
do "paint the whole floor".  The tests are fairly exhaustive, testing 
relocations, access range and privilege mechanisms, activity and statistics 
flags, and fault and interrupt behaviors. (It takes my machine about 45 minutes 
running full bore to work its way through a single pass!)

Again, not to say that there's not a bug lurking in the KT11 (it remains in 
fact a prime suspect!)  But with the ground gone over so far we have managed to 
pretty thoroughly check and rule out a lot of things like any sort of 
consistent failure of the relocation adder.

I really appreciate the time people are taking to offer help and suggestions -- 
please keep them coming!

   thanks,
--FritzM.




Re: PDP-11/45 RSTS/E boot problem

2019-02-05 Thread Fritz Mueller via cctalk
> Perhaps compile [test programs] under SimH and do a block-level diff of the 
> image with what is currently in use, and transfer just those blocks?

I did experiment with this a little way back.  I wrote a small standalone code 
that dumps a CRC of every sector over the console; I can run this both under 
SIMH and on the real machine, then diff to find the changed sectors.
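
On the host side, the compare step might look something like the sketch below. It assumes 512-byte sectors and uses zlib's CRC-32 as a stand-in, since the CRC used by the standalone dumper is not specified here; the two tiny images are synthetic.

```python
import zlib

SECTOR_BYTES = 512   # assumption: 512-byte sectors

def sector_crcs(image):
    """CRC-32 of each fixed-size sector in a bytes-like disk image."""
    return [zlib.crc32(image[i:i + SECTOR_BYTES])
            for i in range(0, len(image), SECTOR_BYTES)]

def changed_sectors(old_image, new_image):
    """Sector numbers whose CRCs differ between two equally sized images."""
    old, new = sector_crcs(old_image), sector_crcs(new_image)
    return [n for n, (a, b) in enumerate(zip(old, new)) if a != b]

# Tiny synthetic images standing in for the real pack and the simulator image.
old = bytes(4 * SECTOR_BYTES)
new = bytearray(old)
new[2 * SECTOR_BYTES + 5] = 0xFF    # flip one byte in sector 2
print(changed_sectors(old, bytes(new)))
```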

Unfortunately, when I tried to apply this, it seemed that SIMH's write single 
sector wasn't working correctly for me (though "write all sectors to end" 
seemed to work okay).  It ended up being much more tedious than I had thought 
to do it this way; in the end I concluded I'd be better off writing some 
different software specifically for this purpose, but I haven't gotten back to 
it yet.

FWIW, I maintain a Windows VM (on a MacOS X host) for the sole purpose of 
running PDP11GUI, and I use a USA19H USB serial dongle connected through to 
the VM as a serial interface.  I don't know if something about this setup is 
particularly detrimental to PDP11GUI transfer performance?  It takes me >3hrs 
to write a 2.5 MB pack this way (at 9600 baud), or a little over 1hr to read one 
back.  Do others see similar throughput with these tools?

--FritzM.



Re: PDP-11/45 RSTS/E boot problem

2019-02-05 Thread Jon Elson via cctalk

On 02/05/2019 07:36 AM, Noel Chiappa via cctalk wrote:

> One would hope that the DEC KT11 diagnostic would check for this... but just
> to be thorough, we have in fact written a short diagnostic which stores every
> possible value in each UISA register and checks that it's correct. So unless
> there is some sort of pattern sensitivity (e.g. when A is in UISAm and B is in
> UISAn), that's not it.

The MMU has to have some adders in it.  One adds the offset 
for the segment's beginning physical address to the supplied 
address from the CPU.  The other compares the requested 
address against the limit (size) of the segment, to make 
sure it doesn't exceed the segment size.  Either this adder 
or the comparator could be faulty.  I'd guess the diagnostic 
tries a few patterns to test for gross failure of this 
circuitry, but since it involves memory on a system running 
a program, it may not be able to exhaustively test these 
adders and comparators.


Jon


Re: PDP-11/45 RSTS/E boot problem

2019-02-05 Thread Mattis Lind via cctalk
On Tue, 5 Feb 2019 at 00:23, Fritz Mueller via cctalk <
cctalk@classiccmp.org> wrote:

>
> > On Feb 4, 2019, at 2:28 AM, Noel Chiappa via cctalk
> > <cctalk@classiccmp.org> wrote:
> >
> > I'm pretty sure the command only gets a few instructions in before it
> > blows up.  Here are the process' registers, and the _entire_ contents
> > of the user mode stack:
> >
> > R0 10
> > R1 0
> > R2 0
> > R3 0
> > R4 34
> > R5 444
> > SP 177760
> > PC 010210
> >
> > 060: 00 20 01 10 14 17 071554 00
>
> Okay, I've had a bit of time in front of the machine to repro this and
> take a look.  What I actually see is:
>
> R0 10
> R1 0
> R2 0
> R3 0
> R4 0
> R5 34
> R6 141774
> PC 000254
>
> (remember, for the last, this will have been after taking a trap to 250,
> where I have the usual "BR .+2; HALT" catcher installed)
>
> Also, memory at 060 (PA:164060) is all zeros as far as the eye can see...
>

Would it be any difference if you run the machine at full speed or lower
speed or even single step past this instruction? With the KM11 installed
you could even single step the 5 minor states in each micro instruction.
Would it be possible to insert a breakpoint or halt and run the program,
insert original instruction and single step?

The TIG module has a separate non crystal controlled oscillator which one
could tune for marginal checking.

Would it be possible to isolate the test case outside the UNIX environment?

/Mattis



>
> I have a bit of water on the basement floor right now after the recent
> rains here, which is complicating setup of the LA.  There's a big puddle
> where I normally place it...
>
>
>


Re: PDP-11/45 RSTS/E boot problem

2019-02-05 Thread Jay Jaeger via cctalk
> 
> Yeah, it may come to that. One issue we've been having is doing specialized
> test programmes; trying to run the C compiler fails. I don't know about the
> assembler, though. And as Fritz mentioned, it takes hours to load a new disk
> image. I think we've come up with a way around that, though; produce binary
> of stand-alone tests elsewhere (I've often/always got a v6 running on
> Ersatz-11 here), and load them into the /45's main memory with PDP11GUI.
> 
>   Noel

Perhaps compile it under SimH and do a block-level diff of the image
with what is currently in use, and transfer just those blocks?
(Presumably would be the superblock, bitmap, directory and actual
program blocks).
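
A rough sketch of the block-level diff idea, in Python on the PC side. The
512-byte block size and the function name are my assumptions, not anything
taken from SimH or PDP11GUI; it only reports which block numbers differ, and
getting those blocks onto the real pack is left to whatever transfer path is
available.

```python
BLOCK_SIZE = 512  # assumption: 512-byte blocks, as used by V6 on an RK05


def changed_blocks(old_image, new_image, block_size=BLOCK_SIZE):
    """Return the block numbers whose contents differ between two images."""
    if len(old_image) != len(new_image):
        raise ValueError("images must be the same size")
    changed = []
    for offset in range(0, len(old_image), block_size):
        old_block = old_image[offset:offset + block_size]
        new_block = new_image[offset:offset + block_size]
        if old_block != new_block:
            changed.append(offset // block_size)
    return changed
```

Reading both image files with open(path, "rb").read() and passing the bytes
in would give the list of blocks to send across.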

For my setup I use a DR11 to transfer data, with an Arduino with
Ethernet acting as a go-between for my PC and the PDP-11.


Re: PDP-11/45 RSTS/E boot problem

2019-02-05 Thread Noel Chiappa via cctalk
> From: Paul Koning

> Another possibility occurs to me: bad bits in the MMU (UISAR0 register
> ... if UISAR0 has a stuck bit so the "plain" case maps incorrectly
> you'd expect to come up with execution that looks nothing at all like
> what was intended.

One would hope that the DEC KT11 diagnostic would check for this... but just
to be thorough, we have in fact written a short diagnostic which stores every
possible value in each UISA register and checks that it's correct. So unless
there is some sort of pattern sensitivity (e.g. when A is in UISAm and B is in
UISAn), that's not it. Also, and perhaps more significantly, when checked
after the trap happens, all the UISA registers and all the KISA registers
contain correct data. So, unless it's something where _sometimes_ one reads
UISAn and gets X when it actually contains Y, I'm not sure the SARs (PARs) are
involved.
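
For anyone following along, the shape of that check is roughly the following.
This is a Python model, not the actual diagnostic (which runs on the /45
itself); the FakeRegister class, the 12-bit width and the stuck-bit parameter
are all assumptions made up for illustration.

```python
class FakeRegister:
    """Hypothetical stand-in for one UISA register; no real hardware access."""

    def __init__(self, stuck_high=0, width=12):
        self.mask = (1 << width) - 1
        self.stuck_high = stuck_high  # bits that always read back as 1
        self.value = 0

    def write(self, value):
        self.value = (value & self.mask) | self.stuck_high

    def read(self):
        return self.value


def exhaustive_test(reg, width=12):
    """Store every possible value and collect (written, read) mismatches."""
    failures = []
    for value in range(1 << width):
        reg.write(value)
        got = reg.read()
        if got != value:
            failures.append((value, got))
    return failures
```

A test of this shape looks at one register at a time, so it says nothing about
the cross-register pattern sensitivity or the intermittent misreads mentioned
above.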

> From: Jon Elson

> OK, here's a really complicated thing to try. If you know the physical
> memory address of ls when it has the problem

We do (see above), and we've also looked at what's in memory at that
location, and the low part of the text segment seems to be correct, but there
was junk at the top, around the target of the JSR (i.e. at 'csv'). Not just
one word, but everything around that location was wrong, which would suggest
to me that the cause is not a simple memory failure there.

I've suggested to Fritz that we look at that again, to see if what was
recorded before is accurate (i.e. if we see the same wrong contents), to make
sure we didn't make a mistake somehow.

> write a machine language program that loads a copy of ls into that
> location and then tries to read it back.

Yeah, it may come to that. One issue we've been having is doing specialized
test programmes; trying to run the C compiler fails. I don't know about the
assembler, though. And as Fritz mentioned, it takes hours to load a new disk
image. I think we've come up with a way around that, though; produce binary
of stand-alone tests elsewhere (I've often/always got a v6 running on
Ersatz-11 here), and load them into the /45's main memory with PDP11GUI.

Noel


RE: PDP-11/45 RSTS/E boot problem

2019-02-04 Thread Wayne S via cctalk
Yep, I noticed that, but thought it was an idea you might want to explore, and 
it’s simple enough to do.

Without the full output from the ls command and how it was executed I was just 
throwing it out there.

For instance, was the default dir where ls was run the same dir as when the 
backgrounded one was run?

That would make a difference if the filesystem was corrupt. In previous 
threads there was an issue getting the proper image onto the disk, so there is 
the potential for corruption.



There is the assumption, since boards were being worked on, that the software 
problem is probably due to said hardware, even though diags pass. With that 
assumption, shouldn’t you try to eliminate different hardware pieces? I would 
try running something that uses memory and doesn’t use disk to narrow the 
problem down.



Anyway,

Take care and good luck,



Wayne







From: Noel Chiappa 
Sent: Monday, February 4, 2019 12:43:09 PM
To: cctalk@classiccmp.org
Cc: j...@mercury.lcs.mit.edu
Subject: RE: PDP-11/45 RSTS/E boot problem

> From: Wayne S

> it might be a wonky filesystem. ...
> The corruption probably came because the entire disk was going bad.

This theory is contradicted by the fact (mentioned several times, including in
the message you were replying to) that doing a plain 'ls' bombs, but 'sleep
300 &; ls' works fine.

Noel


RE: PDP-11/45 RSTS/E boot problem

2019-02-04 Thread Wayne S via cctalk
Noel,  it might be a wonky filesystem.

I’ve had ls -l seg fault because of bad attribute data on a file in a directory 
on Solaris.

Interestingly, ls (without the -l) worked okay.

Maybe fsck or the equivalent command may show something.

It was a Solaris system with many concurrent users, so I couldn’t take it down 
to run fsck. I ended up writing a quick Perl program to just list file names, 
then modified it to get the attributes. It seg faulted when it came to the bad 
file name. I used Perl unlink to kill it and everything was okay.

The corruption probably came because the entire disk was going bad.

Just a thought.







From: cctalk  on behalf of Noel Chiappa via 
cctalk 
Sent: Monday, February 4, 2019 11:24:19 AM
To: cctalk@classiccmp.org
Cc: j...@mercury.lcs.mit.edu
Subject: Re: PDP-11/45 RSTS/E boot problem

> From: Jay Jaeger

> This sort of situation, where DEC diagnostics run OK but UNIX has issues
> was reported to be not all that uncommon - to the point where the urban
> legend was that some DEC FE's would fire up Unix V6 as a sort of system
> exerciser.

Amusing! Never heard that; our -11's were never under maintenance, so DEC FE's
never worked on them.

> Make a copy of ls, and see if the copy also fails

It acts just like the original; fails when run by itself, runs OK when 'sleep'
is also running (in the background).


> From: Bob Smith

> We finally had the cpu backplane replaced

Ow. Not an option for Fritz, I expect. (I dunno - anyone have a spare /45
backplane?)


> From: Paul Koning

> Is there any way to attach a logic analyzer to various data paths on
> this machine?

I had suggested to Fritz that the symptoms led me to believe that it was time
to deploy a LA, especially since the MM trap only occurs once between him
typing 'ls' and the process failing - i.e. easy to trigger on.

He offered me the option of looking at the IR or at the UNIBUS - I opted for
the IR so we can see _exactly_ what the machine _thinks_ it is doing! No
report back yet, though.

   Noel


Re: PDP-11/45 RSTS/E boot problem

2019-02-04 Thread Jon Elson via cctalk

On 02/04/2019 11:34 AM, Fritz Mueller via cctalk wrote:



>> 2.  Make a copy of ls, and see if the copy also fails
>> (different location on disk would mess with timing just a bit).
>
> Also done; the copy appears to behave identically to the original.


OK, here's a really complicated thing to try.  If you know 
the physical memory address of ls when it has the problem, 
write a machine language program that loads a copy of ls 
into that location and then tries to read it back.  You 
might be able to do this in Unix, having it start with the 
exact code of ls, but then has the tester above that and the 
entry point is for the test program.
This would detect a pattern sensitivity in the memory.  If 
ls, when actually running reads an instruction wrong, it 
could then try to read a bad address, and cause the MMU trap.


Jon


Re: PDP-11/45 RSTS/E boot problem

2019-02-04 Thread Jon Elson via cctalk

On 02/04/2019 11:20 AM, Fritz Mueller via cctalk wrote:


> The MMU classifies the error in register SR0; this decodes to a segment
> length error (access within the segment beyond configured bound).  As Noel
> notes, however, this is not consistent with the instructions we see at the
> point of fault.

OK, so the CPU presents an address that is within the 
segment bound, but the MMU declares it to be OUTSIDE the 
bounds of the segment. That could be a CPU problem, but 
likely would be the same with the MMU on or off, so the 
diags SHOULD catch that.  But, if the CPU is sending a good 
address, then it has to be that the MMU is failing on the 
addition/comparison with the segment size.
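
To make the addition and comparison concrete, here is a small Python model of
the relocation and length check as I understand it for the 18-bit KT11
scheme: the top three bits of the virtual address pick the page, the PAR
(counted in 64-byte blocks) is added to the 13-bit offset, and the block
number is compared with the page length field. The function name, the list
layout and the single expand_down flag are simplifications of my own, not
DEC's.

```python
PAGE_BITS = 13   # VA bits 12..0 are the offset within an 8 KB page
BLOCK_BITS = 6   # PAR and the length field count in 64-byte blocks


class SegmentLengthError(Exception):
    """Raised when the block number falls outside the page length field."""


def translate(va, par, plf, expand_down=False):
    """Map a 16-bit virtual address to an 18-bit physical address.

    par: 8 page address register values, in 64-byte blocks
    plf: 8 page length fields, as block numbers 0..127
    """
    page = (va >> PAGE_BITS) & 0o7
    offset = va & 0o17777
    block = offset >> BLOCK_BITS
    limit = plf[page]
    # Upward pages abort above the limit, downward pages abort below it.
    out_of_bounds = block < limit if expand_down else block > limit
    if out_of_bounds:
        raise SegmentLengthError(
            "page %o: block %o vs limit %o" % (page, block, limit))
    return ((par[page] << BLOCK_BITS) + offset) & 0o777777
```

As a sanity check, a PAR0 of 01640 (my inference, not a value anyone has
reported) would reproduce the 060 -> PA:164060 mapping quoted elsewhere in
this thread. A fault in either the adder or the comparator would show up as a
wrong physical address or a spurious length abort respectively.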

>> Anyway, is it possible to borrow an MMU from somebody else?
>
> Potentially...  It is a two board option; I do have a spare for both of
> the boards, but these spares each are in need of other repairs at the
> moment.
>
> One slightly complicating factor is that I have a *very* early 11/45.
> Most of my boards (including the MMU boards), as well as my backplane,
> pre-date the currently available schematics on bitsavers, etc., and there
> are no records regarding which ECOs have been applied on my hardware.
> Thus my interest in tracking down ECOs/FCOs...  I've been picking my way
> through the list that Jay recently posted, verifying by looking at the
> greenwires which FCOs I have applied and which not.  It's a bit
> painstaking.


This could be messy, but DEC was FAIRLY good at making 
updates backwards compatible where possible.  So, it MAY be 
true that a later MMU will still work in this CPU.


Jon


Re: PDP-11/45 RSTS/E boot problem

2019-02-04 Thread Noel Chiappa via cctalk
> From: Fritz Mueller

> I've had a bit of time in front of the machine to repro this and take a
> look. What I actually see is:

> R0 10
> R1 0
> R2 0
> R3 0
> R4 0
> R5 34
> R6 141774
> PC 000254

Argh. (Very red face!)

I worked out the trap stack layout by looking at m40.s and trap.c, and
totally forgot about the return PC (that's the 0444) from the call to
trap():

  0001740 13 141756 022050 13 00 00 00 34
  0001760 000444 31 177760 00 030351 10 010210 170010

I clearly should have looked at core(V) in the V6 manual!

The R6 you have recorded is correct for just after the trap; that's
the kernel mode SP, which points to the top of the kernel stack,
in segment 6 (in the swappable per-process kernel area, which runs
from 14-1776).

So there is no R5 mystery, I was just confused. Back to the other two!

Noel


Re: PDP-11/45 RSTS/E boot problem

2019-02-04 Thread Fritz Mueller via cctalk


>>> The obvious answer is bad memory.
>> 
>> At the board level, yes.  Deeper, it could be bad memory bits or bad
>> memory decode.
> 
> Yes, one of the standard early PDP-11 memory tests is the "no duplicate 
> address test".

I should say that the memory board is not _completely_ whack -- it is passing 
the rather thorough MAINDEC ZQMC, a 0-124k exerciser with multiple 
pattern/sequence tests which also kicks around the KT11.

That doesn't rule out the possibility that there is a lurker in there not 
covered by the DEC diags.  But if there is, it's something subtle...




Re: PDP-11/45 RSTS/E boot problem

2019-02-04 Thread Fritz Mueller via cctalk


> On Feb 4, 2019, at 2:28 AM, Noel Chiappa via cctalk  
> wrote:
> 
> I'm pretty sure the command only gets a few instructions in before it blows
> up.  Here are the process' registers, and the _entire_ contents of the user
> mode stack:
> 
> R0 10
> R1 0
> R2 0
> R3 0
> R4 34
> R5 444
> SP 177760
> PC 010210
> 
> 060: 00 20 01 10 14 17 071554 00

Okay, I've had a bit of time in front of the machine to repro this and take a 
look.  What I actually see is:

R0 10
R1 0
R2 0
R3 0
R4 0
R5 34
R6 141774
PC 000254

(remember, for the last, this will have been after taking a trap to 250, where 
I have the usual "BR .+2; HALT" catcher installed)

Also, memory at 060 (PA:164060) is all zeros as far as the eye can see...

I have a bit of water on the basement floor right now after the recent rains 
here, which is complicating setup of the LA.  There's a big puddle where I 
normally place it...




Re: PDP-11/45 RSTS/E boot problem

2019-02-04 Thread Paul Koning via cctalk



> On Feb 4, 2019, at 5:47 PM, Ethan Dicks  wrote:
> 
> On Mon, Feb 4, 2019 at 3:15 PM Paul Koning via cctalk
>  wrote:
>>> On Feb 4, 2019, at 3:43 PM, Noel Chiappa via cctalk  
>>> wrote:
>> That translates into "the problem depends on the physical address of the 
>> code being executed".
>> 
>> The obvious answer is bad memory.
> 
> At the board level, yes.  Deeper, it could be bad memory bits or bad
> memory decode.
> 
> A simple ones-and-zeros test can identify bad DRAMs.  It's not as
> likely to find bad decoding, which could result in the same chips
> tested more than once and other chips not tested at all.  I've found
> both problems in real MS11-L boards I have for my stack of 11/04 and
> 11/34s I'm testing.
> 
> ISTR in the DEC world, they were good about that.  I have multiple
> papertapes for the PDP-8, that I think were literally called "ones and
> zeros" and "memory address" tests.  I would think XXDP has something
> similar in terms of progressive tests that expect the previous stage
> passed.

Yes, one of the standard early PDP-11 memory tests is the "no duplicate address 
test".

paul



Re: PDP-11/45 RSTS/E boot problem

2019-02-04 Thread Ethan Dicks via cctalk
On Mon, Feb 4, 2019 at 3:15 PM Paul Koning via cctalk
 wrote:
> > On Feb 4, 2019, at 3:43 PM, Noel Chiappa via cctalk  
> > wrote:
> That translates into "the problem depends on the physical address of the code 
> being executed".
>
> The obvious answer is bad memory.

At the board level, yes.  Deeper, it could be bad memory bits or bad
memory decode.

A simple ones-and-zeros test can identify bad DRAMs.  It's not as
likely to find bad decoding, which could result in the same chips
tested more than once and other chips not tested at all.  I've found
both problems in real MS11-L boards I have for my stack of 11/04 and
11/34s I'm testing.

ISTR in the DEC world, they were good about that.  I have multiple
papertapes for the PDP-8, that I think were literally called "ones and
zeros" and "memory address" tests.  I would think XXDP has something
similar in terms of progressive tests that expect the previous stage
passed.
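
To illustrate why the simple test misses decode faults, here is a toy model in
Python. The AliasedMemory class and its dead_bit parameter are invented for
the example (one address line is simply ignored); the two test functions are
my own minimal versions, not the actual DEC diagnostics.

```python
class AliasedMemory:
    """Toy memory in which one address line can be ignored (a decode fault)."""

    def __init__(self, size, dead_bit=None):
        self.cells = [0] * size
        self.dead_mask = 0 if dead_bit is None else (1 << dead_bit)

    def _decode(self, addr):
        return addr & ~self.dead_mask

    def write(self, addr, value):
        self.cells[self._decode(addr)] = value

    def read(self, addr):
        return self.cells[self._decode(addr)]


def ones_and_zeros_test(mem, size):
    """Write all ones, then all zeros, to each cell and read it straight back."""
    for pattern in (0o177777, 0):
        for addr in range(size):
            mem.write(addr, pattern)
            if mem.read(addr) != pattern:
                return False
    return True


def address_test(mem, size):
    """Store each cell's own address everywhere, then read them all back."""
    for addr in range(size):
        mem.write(addr, addr)
    return all(mem.read(addr) == addr for addr in range(size))
```

With dead_bit=4 the ones-and-zeros test still passes, because every write is
read back from the same (wrong) cell, while the address test fails as soon as
two addresses turn out to share a cell.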

-ethan


Re: PDP-11/45 RSTS/E boot problem

2019-02-04 Thread Jay Jaeger via cctalk
On 2/4/2019 11:34 AM, Fritz Mueller via cctech wrote:
> 
>> On Feb 4, 2019, at 9:13 AM, Jay Jaeger  wrote:
>>
>> If he hasn't already, if Fritz has more than one memory board, he might
>> try swapping them to see if that changes anything.
> 
> I only have a 128kw MS11-L here to work with, unfortunately.  It's been 
> through a bunch of recent troubleshooting (tracking down and replacing failed 
> DRAMs).  I *think* it's pretty solid at this point (also passing some of the 
> hairier DEC diagnostics) but...
> 
> I'd be happy to try out a different memory board if anybody was interested in 
> sending out a loaner?  (I'm in the SF Bay area).
> 

Well it turns out I have a couple of spares, but maybe someone closer
would be easier (Madison, WI  53711)

I have an MS11-LB, 64Kw, M7891-BB and two MS11-LD, 128Kw, M7891-DB and
an M7891-D?

So, two of these are newer revisions (rather than M7891-xA) - I have no
idea what the difference is.  On that last one I probably didn't record
whether it was D, DB or DA.

I also have quite a few RK05 packs and would be willing to sell one (and
I have boxes to ship boards and packs in).  The ones I am most willing
to part with would need their open/close springs removed, as they are
broken and dangerous to the platter in their current condition, but are
otherwise fine.  I would just remove the spring.

$20 for a pack is what I usually price them at, plus shipping.  (PayPal,
preferably)


The board would be a loan (with compensation for time spent if it is bad
*and* gets fixed) ;).



Let me know - it might take me a couple of days to hunt the board down,
remove the spring, re-test the pack (in my 11/34, which runs @rkunix V6
just fine ;)), pack everything up and ship it.

JRJ




Re: PDP-11/45 RSTS/E boot problem

2019-02-04 Thread Paul Koning via cctalk



> On Feb 4, 2019, at 3:43 PM, Noel Chiappa via cctalk  
> wrote:
> 
>> From: Wayne S
> 
>> it might be a wonky filesystem. ...
>> The corruption probably came because the entire disk was going bad.
> 
> This theory is contradicted by the fact (mentioned several times, including in
> the message you were replying to) that doing a plain 'ls' bombs, but 'sleep
> 300 &; ls' works fine.

That translates into "the problem depends on the physical address of the code 
being executed".

The obvious answer is bad memory.  Another possibility occurs to me: bad bits 
in the MMU (UISAR0 register if I remember correctly).  Bad memory is likely to 
show up with a few bits wrong; if UISAR0 has a stuck bit so the "plain" case 
maps incorrectly you'd expect to come up with execution that looks nothing at 
all like what was intended.

paul



RE: PDP-11/45 RSTS/E boot problem

2019-02-04 Thread Noel Chiappa via cctalk
> From: Wayne S

> it might be a wonky filesystem. ...
> The corruption probably came because the entire disk was going bad.

This theory is contradicted by the fact (mentioned several times, including in
the message you were replying to) that doing a plain 'ls' bombs, but 'sleep
300 &; ls' works fine.

Noel


Re: PDP-11/45 RSTS/E boot problem

2019-02-04 Thread Noel Chiappa via cctalk
> From: Jay Jaeger

> This sort of situation, where DEC diagnostics run OK but UNIX has issues
> was reported to be not all that uncommon - to the point where the urban
> legend was that some DEC FE's would fire up Unix V6 as a sort of system
> exerciser.

Amusing! Never heard that; our -11's were never under maintenance, so DEC FE's
never worked on them.

> Make a copy of ls, and see if the copy also fails

It acts just like the original; fails when run by itself, runs OK when 'sleep'
is also running (in the background).


> From: Bob Smith

> We finally had the cpu backplane replaced

Ow. Not an option for Fritz, I expect. (I dunno - anyone have a spare /45
backplane?)


> From: Paul Koning

> Is there any way to attach a logic analyzer to various data paths on
> this machine?

I had suggested to Fritz that the symptoms led me to believe that it was time
to deploy a LA, especially since the MM trap only occurs once between him
typing 'ls' and the process failing - i.e. easy to trigger on.

He offered me the option of looking at the IR or at the UNIBUS - I opted for
the IR so we can see _exactly_ what the machine _thinks_ it is doing! No
report back yet, though.

   Noel


Re: PDP-11/45 RSTS/E boot problem

2019-02-04 Thread Warner Losh via cctalk
On Mon, Feb 4, 2019 at 11:35 AM Paul Koning via cctalk <
cctalk@classiccmp.org> wrote:

>  The spec says allowed tolerances are +/- 5%.  He knew the reality for
> correct operation was -0%, +5%, so he tweaked all the supplies to read a
> hair above nominal.
>

Ah, the good old days...  I recall our PDP-11 tech tweaking +5V from 5.05V
to 4.95V and back again to demonstrate that tiny differences matter a lot
on one of the cranky 11/23+'s we had, after I made a particularly unhelpful
teenage smart-ass remark... The 11/23+ wouldn't boot at the slightly lower
than full voltage. It was cranky for a couple of years. Before that unit was
retired, the 5V and 12V rails had been tweaked up to 5.2V and 12.5V in an
effort to keep the system alive long enough to transition customers from it
to a new Vax installed to deal with the growth in demand...  In the end, we
put that 11/23+ back in service for developers with a different disk
controller and it was happy back at +5.05V / +12.1V...
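
For the arithmetic, a quick Python sketch of the spec window versus the
"real" window Paul describes. The helper names are mine; the percentages are
just the +/- 5% and -0%/+5% figures quoted above.

```python
def percent_off_nominal(measured, nominal):
    """Deviation of a rail from its nominal voltage, in percent."""
    return (measured - nominal) / nominal * 100.0


def within_tolerance(measured, nominal, low_pct=-5.0, high_pct=5.0):
    """True if the rail sits inside the given percentage window."""
    deviation = percent_off_nominal(measured, nominal)
    return low_pct <= deviation <= high_pct
```

By the book, 4.95V on a 5V rail is only about 1% low and well inside spec;
pass low_pct=0.0 to model the -0% rule and it fails, which matches the
behaviour of that 11/23+. The 5.2V and 12.5V settings come out at roughly +4%
each, still inside the upper limit.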

Warner

