Re: PDP-11/45 RSTS/E boot problem
On Mon, 18 Feb 2019 at 22:16, Fred Cisin via cctalk wrote: > > One of the most common causes of a terrible ear-piercing high whine is the > spindle contact. Many old drives had a springy piece that rubbed against > the end of the spindle. Over time, it would wear a divot, polish that, > and start to squeal. A very light pressure on it would test that > hypothesis. Not enough pressure to muffle the sound, and certainly not > enough pressure to slow the spindle! Or, pulling up on it, away from the > spindle. Some people claimed that you could just rip it off. Don't. > Best is to twist it very slightly sideways, so that it can start wearing a > new divot. It was a 3½" EIDE drive. 8GB one, I think, but might have been smaller. I didn't want to open it to do that, although there was a time when custom PC builders "de-lidded" hard disks and fitted them with little acrylic windows so you could see the head move. Not sure I'd want to trust my data to that... > Well, there don't seem to be many 350 RAMAC disks still running. > > (I'm trying to decide what to use as a base to make a patio table out of a > [crashed] RAMAC 24" platter) Conceded. And thank you for the reminder that I'm not old yet. My first machine with a hard disk was my work PC in my first job: an IBM PC-AT, with a 20 MB FS/FH 5¼" ST-506 drive, probably a Seagate ST-4026. I added a second drive to the machine, a 15 MB one, and put Xenix/286 on it. A few years ago I bought a surplus 2½" 1 TB drive from a chap who'd bought a new notebook and put an SSD in it before use. So, 2nd hand but unused. It cost me CzK 1000, about £30 at the time. £30 for a terabyte. I was in a state of shock. It was so tiny, too. I found an online capacity comparator thing. You'd need a pile of those Seagate drives the size of a _space shuttle_ to hold a terabyte.
https://liam-on-linux.livejournal.com/53353.html -- Liam Proven - Profile: https://about.me/liamproven Email: lpro...@cix.co.uk - Google Mail/Hangouts/Plus: lpro...@gmail.com Twitter/Facebook/Flickr: lproven - Skype/LinkedIn: liamproven UK: +44 7939-087884 - ČR (+ WhatsApp/Telegram/Signal): +420 702 829 053
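For scale, the arithmetic behind that comparison can be sketched in a few lines of Python. This is only a rough estimate: it assumes decimal units (1 TB = 10^12 bytes, 20 MB = 20 x 10^6 bytes) and takes the 20 MB figure from the message at face value; the function and constant names are made up for illustration.

```python
def drives_needed(target_bytes: int, drive_bytes: int) -> int:
    """Number of whole drives needed to hold target_bytes, rounding up."""
    return -(-target_bytes // drive_bytes)  # ceiling division on integers

ONE_TB = 10**12            # assumption: decimal terabyte
DRIVE_20MB = 20 * 10**6    # assumption: decimal megabytes, nominal capacity

print(drives_needed(ONE_TB, DRIVE_20MB))
```

Under those assumptions it comes to 50,000 of the 20 MB drives per terabyte.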
Re: PDP-11/45 RSTS/E boot problem
> On Feb 18, 2019, at 4:47 PM, Jay Jaeger via cctalk > wrote: > > On 2/18/2019 3:38 PM, Paul Koning via cctalk wrote: >> >> ... >> Then again, I remember our college RS64 (drive for the RC11) which developed >> a bad motor bearing. ... >> > > Nice of the FE to do that. > > The Univ. of Wisconsin CS Department had one of those, but the platter > went bad. They just flipped the platter upside down and got more use > out of it. Yes, that was a feature. You had to reformat it, which required getting the timing track writer box from Maynard. I have seen that done on an RS11 (RF11) drive on our RSTS system; it crashed some heads and was rebuilt completely (new heads, new platter, new motor). > The Univ. of Wisconsin ECE Department also had one - the two machines > were nearly twins. I *have* *that* one - and it still ran when I tried > it a year or so ago. Neat. You can run RT11 on it if you add the boot loader and driver, at least old versions. DOS V4 also supports it. And older versions of RSTS can use it as a swap disk. paul
Re: PDP-11/45 RSTS/E boot problem
On 2/18/2019 3:38 PM, Paul Koning via cctalk wrote: > > >> On Feb 18, 2019, at 4:16 PM, Fred Cisin via cctalk >> wrote: >> >> On Mon, 18 Feb 2019, Liam Proven via cctalk wrote: >>> Well that is the thing, of course. I had that with one old IDE disk, >>> too. It made a terrible ear-piercing high whine that I associate with >>> a failing disk... but it passed every diagnostic I could throw at it, >>> so I used it for non-critical stuff and in testbed machines. >> >> One of the most common causes of a terrible ear-piercing high whine is the >> spindle contact. Many old drives had a springy piece that rubbed against >> the end of the spindle. > > Then again, I remember our college RS64 (drive for the RC11) which developed > a bad motor bearing. Since the platter is mounted directly on the motor > spindle that was a problem. And it was not under contract, so replacing the > motor would have set back the department a substantial sum. So the DEC FS > engineer removed the motor and carried it to Appleton Electric Motor Co., > which pulled the old bearing, pressed on a replacement, and handed it back. > Jim reinstalled the motor, all was well. Didn't even lose any data bits. > > paul > Nice of the FE to do that. The Univ. of Wisconsin CS Department had one of those, but the platter went bad. They just flipped the platter upside down and got more use out of it. The Univ. of Wisconsin ECE Department also had one - the two machines were nearly twins. I *have* *that* one - and it still ran when I tried it a year or so ago.
Re: PDP-11/45 RSTS/E boot problem
> On Feb 18, 2019, at 4:16 PM, Fred Cisin via cctalk > wrote: > > On Mon, 18 Feb 2019, Liam Proven via cctalk wrote: >> Well that is the thing, of course. I had that with one old IDE disk, >> too. It made a terrible ear-piercing high whine that I associate with >> a failing disk... but it passed every diagnostic I could throw at it, >> so I used it for non-critical stuff and in testbed machines. > > One of the most common causes of a terrible ear-piercing high whine is the > spindle contact. Many old drives had a springy piece that rubbed against the > end of the spindle. Then again, I remember our college RS64 (drive for the RC11) which developed a bad motor bearing. Since the platter is mounted directly on the motor spindle that was a problem. And it was not under contract, so replacing the motor would have set back the department a substantial sum. So the DEC FS engineer removed the motor and carried it to Appleton Electric Motor Co., which pulled the old bearing, pressed on a replacement, and handed it back. Jim reinstalled the motor, all was well. Didn't even lose any data bits. paul
Re: PDP-11/45 RSTS/E boot problem
On Mon, 18 Feb 2019, Liam Proven via cctalk wrote: Well that is the thing, of course. I had that with one old IDE disk, too. It made a terrible ear-piercing high whine that I associate with a failing disk... but it passed every diagnostic I could throw at it, so I used it for non-critical stuff and in testbed machines. One of the most common causes of a terrible ear-piercing high whine is the spindle contact. Many old drives had a springy piece that rubbed against the end of the spindle. Over time, it would wear a divot, polish that, and start to squeal. A very light pressure on it would test that hypothesis. Not enough pressure to muffle the sound, and certainly not enough pressure to slow the spindle! Or, pulling up on it, away from the spindle. Some people claimed that you could just rip it off. Don't. Best is to twist it very slightly sideways, so that it can start wearing a new divot. My experience is extensive enough that _anyone's_ justifications of why they won't use Brand X disks get ignored, Well, there don't seem to be many 350 RAMAC disks still running. (I'm trying to decide what to use as a base to make a patio table out of a [crashed] RAMAC 24" platter) -- Grumpy Ol' Fred ci...@xenosoft.com
Re: PDP-11/45 RSTS/E boot problem
On Sat, 16 Feb 2019 at 01:43, Peter Coghlan via cctalk wrote: > Days turned into weeks, weeks into months and months into > years. It continued to occasionally make the same ghastly noises that > never should be heard coming from a hard disk but with absolutely no sign > of any errors being logged or damage to data whatsoever. The noises seem > to be associated with seek activity because I have never heard them when > the disk is just spinning but otherwise idle. I eventually retired it > and replaced it with a much larger one, purely because I ran out of > space on it. Any thoughts on what might be happening with it? Ha! Well that is the thing, of course. I had that with one old IDE disk, too. It made a terrible ear-piercing high whine that I associate with a failing disk... but it passed every diagnostic I could throw at it, so I used it for non-critical stuff and in testbed machines. For about 4 or 5 *years*. It was one reason to run machines with the case covers on, to muffle the noise. But it ran faultlessly for years. I think in the end I sold it on to someone, with a warning of course. That's how I dispose of all kit -- pass it on to a new owner. I try never to scrap or recycle anything at all. That's the problem with rule-of-thumb diagnoses. Sometimes they fail. But more often, things fail with no warning, so it's still useful. This is why I disregard everyone's accounts of hard disk brands they won't touch. I did PC tech support for ~25 years. I've seen every make of hard drive ever fail randomly, and I've seen every make of hard drive ever work flawlessly for years even when vilely abused. My experience is extensive enough that _anyone's_ justifications of why they won't use Brand X disks get ignored, because if I took them, I would not use _any_ brand of disk. Everyone who's been around a bit has a horror story and the intersection in the Venn diagram, while small, excludes all vendors ever.
I've never seen any one make that is significantly worse than any other. -- Liam Proven - Profile: https://about.me/liamproven Email: lpro...@cix.co.uk - Google Mail/Hangouts/Plus: lpro...@gmail.com Twitter/Facebook/Flickr: lproven - Skype/LinkedIn: liamproven UK: +44 7939-087884 - ČR (+ WhatsApp/Telegram/Signal): +420 702 829 053
Re: PDP-11/45 RSTS/E boot problem
Liam Proven wrote: > > And some of my younger colleagues thought it was strange that I could > predict hard disk failures from the running noises they made, and > later than that, whether WinNT's bus-mastering DMA-mode disk > controller device driver was installed from the sound of the disk > accesses while the machine booted. > The little 8GB SCA system disk in my Alphaserver 800 started making awful bloodcurdling clattering noises a few years ago. The first time I heard it, I was convinced that the works were splattered all over the inside of the HDA casing and the machine was only continuing to run because of what was left in the disk cache or something like that. I started running a backup in case I might be able to salvage some part of the contents. Despite several more heartstopping clunks and clatters while the backup was running, it ran to completion, with no errors logged to my complete surprise. I ran an ANALYZE /DISK /READ which attempts to read all blocks on the disk that are allocated to files. Again, several more awful clanks and clatters but it completed with no errors. I lined up a replacement disk for it but I was curious to see how exactly it was going to fail so I decided to keep on using it for a while to see what happens. Days turned into weeks, weeks into months and months into years. It continued to occasionally make the same ghastly noises that never should be heard coming from a hard disk but with absolutely no sign of any errors being logged or damage to data whatsoever. The noises seem to be associated with seek activity because I have never heard them when the disk is just spinning but otherwise idle. I eventually retired it and replaced it with a much larger one, purely because I ran out of space on it. Any thoughts on what might be happening with it? Regards, Peter Coghlan.
Re: PDP-11/45 RSTS/E boot problem
Jeffrey S. Worley wrote: > > Back in 2000-ish, I was upgrading my DG MV4000/dc to 8mb so as to be > able to run the snazzy AOS/VS II tapes I'd got along with the 9 track > drive I hacked onto the machine... > > The install would start and then bomb at a certain point every time. I > decided to work the machine hard and then pull the board and give a > good SNIFF. This is a 15x15 inch board populated with 256kx1 drams. > The time in the machine got the board cooking nicely, and when I > smelled a certain charred smell in the vicinity of a 74ls04, I knew it > was that magic black smoke. I pulled a 74HCT04 from a known-good isa > card, socketed the spot and voilà! Working 8mb board. It isn't > ALWAYS the most expensive chip, thank God, and sometimes even us > not-as-bright guys come off with a win. > About 20 years earlier than that, one of my friends at school asked me to fix his Jupiter Ace which had stopped working. I told him I didn't hold out much hope for success because I didn't have the vaguest idea how his little machine worked at that time but I agreed to wave my multimeter in the general direction of its power supply. I opened it up and quickly found that the voltages seemed very reasonable and I prodded around the board rather aimlessly looking for some part that looked guilty. I soon noticed that one of the eight identical chips in a row at the bottom of the board was getting hot enough to burn my finger while the others remained cool and calm. I can't remember where I got a replacement 4116 or 4164 or whatever it was - I probably had to get it mail order but once it was soldered in with fingers crossed that nothing else was wrong, the machine came right back to life. Sometimes you just get lucky. I wish I could be that lucky with some of my own stuff now. Regards, Peter Coghlan.
Re: PDP-11/45 RSTS/E boot problem
> From: Paul Koning > Studied it for a while, took out a small hammer, whacked the device at > some spot, and reported "fixed". That reminds me of an amusing story from the first time I went to see 'Star Wars'; I went with a group of people from Tech Sq. It has that scene where they're about to make the jump to hyperspace in the 'Falcon', and it won't go; so one of them (I think Solo) jumps up and whacks a particular spot on the bulkhead with his fist, and away she goes. We all found this terribly amusing, since one of the DEC time-sharing systems on the 9th floor had a sticky relay in the power controller, and when you'd try to power it on or off from the front panel, the relay would stick, and nothing would happen. So the procedure was to go around the back, open a particular door, reach in, and whack the power controller behind it in a particular spot with the side of your fist, and away it went! Noel
Re: PDP-11/45 RSTS/E boot problem
On Fri, 15 Feb 2019 at 14:59, Paul Koning wrote: > > Speaking of sounds made by machines, there is a famous security paper from a > few years ago in which researchers read the encryption keys out of > smartphones by listening to the sounds made by the device while it was > executing the crypto algorithms. ... wow. > These hardware wizard stories remind me of a legendary repair wizard, > non-computer industrial devices I think. He was called in to fix a tricky > problem at the customer site. Studied it for a while, took out a small > hammer, whacked the device at some spot, and reported "fixed". He then sent > in a bill for $500. > > Customer challenged that with a demand to itemize the work. The itemized > bill came back like this: > > 1. Applying impact to the device: $5 > 2. Knowing where and how to apply the impact: $495 110 years old, and still apt. https://quoteinvestigator.com/2017/03/06/tap/ I first encountered it in the form of one of the AI Koans. I guess these are probably familiar to all here, but in case: http://people.cs.uchicago.edu/~wiseman/humor/ai-koans.html -- Liam Proven - Profile: https://about.me/liamproven Email: lpro...@cix.co.uk - Google Mail/Hangouts/Plus: lpro...@gmail.com Twitter/Facebook/Flickr: lproven - Skype/LinkedIn: liamproven UK: +44 7939-087884 - ČR (+ WhatsApp/Telegram/Signal): +420 702 829 053
Re: PDP-11/45 RSTS/E boot problem
> On Feb 15, 2019, at 6:06 AM, Liam Proven via cctalk > wrote: > > On Fri, 15 Feb 2019 at 04:34, Jeffrey S. Worley via cctalk > wrote: >> >> The install would start and then bomb at a certain point every time. I >> decided to work the machine hard and then pull the board and give a >> good SNIFF. > > Got a nose for a hardware fault, eh? ;-) > > And some of my younger colleagues thought it was strange that I could > predict hard disk failures from the running noises they made, and > later than that, whether WinNT's bus-mastering DMA-mode disk > controller device driver was installed from the sound of the disk > accesses while the machine booted. Speaking of sounds made by machines, there is a famous security paper from a few years ago in which researchers read the encryption keys out of smartphones by listening to the sounds made by the device while it was executing the crypto algorithms. These hardware wizard stories remind me of a legendary repair wizard, non-computer industrial devices I think. He was called in to fix a tricky problem at the customer site. Studied it for a while, took out a small hammer, whacked the device at some spot, and reported "fixed". He then sent in a bill for $500. Customer challenged that with a demand to itemize the work. The itemized bill came back like this: 1. Applying impact to the device: $5 2. Knowing where and how to apply the impact: $495 paul
Re: PDP-11/45 RSTS/E boot problem
Fritz Mueller wrote: > > That's right -- I wasn't without an army, it was just a very small and > quite dedicated army! :-) > > I think I'd have gone down many blind alleys without help and perspective > provided by others here, and in particular a lot of guidance provided by Noel > over the past couple weeks in private correspondence enabling the use of > V6 as a test case and investigative tool. For this I am very grateful. > I very much enjoyed following the story of tracking down this fault. Thanks for sharing it. > > As those of you who have worked on these machines know, they are just so > damn serviceable, by design. It's very empowering! > I wish that this was also the case with several DEC Alphas I have with cache failures that are not nearly so serviceable or empowering :-( Regards, Peter Coghlan. > > --FritzM. >
Re: PDP-11/45 RSTS/E boot problem
On Fri, 15 Feb 2019 at 04:34, Jeffrey S. Worley via cctalk wrote: > > The install would start and then bomb at a certain point every time. I > decided to work the machine hard and then pull the board and give a > good SNIFF. Got a nose for a hardware fault, eh? ;-) And some of my younger colleagues thought it was strange that I could predict hard disk failures from the running noises they made, and later than that, whether WinNT's bus-mastering DMA-mode disk controller device driver was installed from the sound of the disk accesses while the machine booted. BTW, Jeff, Gmail bottom-quotes just fine. I'm using the web interface right now. Just hit Ctrl-A, trim as needed and move the cursor. Yes, it's a pain on mobile, so I try not to answer on mobiles! -- Liam Proven - Profile: https://about.me/liamproven Email: lpro...@cix.co.uk - Google Mail/Hangouts/Plus: lpro...@gmail.com Twitter/Facebook/Flickr: lproven - Skype/LinkedIn: liamproven UK: +44 7939-087884 - ČR (+ WhatsApp/Telegram/Signal): +420 702 829 053
Re: PDP-11/45 RSTS/E boot problem
> Message: 2 > Date: Wed, 13 Feb 2019 15:03:41 -0500 > From: Paul Koning > To: Jay Jaeger , "General Discussion: On-Topic and > Off-Topic Posts" > Subject: Re: PDP-11/45 RSTS/E boot problem > Message-ID: > Content-Type: text/plain; charset=us-ascii > > > > On Feb 13, 2019, at 1:20 PM, Jay Jaeger via cctalk > wrote: > > > > ... > > Maybe that story about FE's using Unix as a test to confirm operation > > even when diagnostics said the machine was OK was not so much just a > > legend? > > It still feels like a legend. My experience with DEC field service > engineers is that they used the diagnostics. In the PDP-11 era, Unix > knowledge around DEC was pretty sparse, especially early on when it > could be found only in the Telephone Products Group (Armando > Stettner). RSTS would be more plausible, but I never saw that in the > hands of FS engineers either. > By and large diagnostics would find problems. I've seen a number in > the 1970s, including a messy data path failure in the 11/45 MMU where > we (college students) did the initial diagnosis while the FS engineer > was on his way. My suspicion is that things not solved by diagnostics > would be escalated to the "wizard from Maynard". And they'd probably > start replacing whole subsystems. I've seen that once, when our > college RSTS-11 system (11/20, 16 DL-11 lines) was crashing on average > once a day for months. DEC brought in several of those "wizards". The > "fix" was to replace the 11/20 by a "spare part" -- an 11/45 with more > memory, a DH11, and RSTS/E. Decades later I was told that the wizards > actually pinned the blame on the college FM broadcast transmitter, > about 200 feet down the hall from the computer center. That may well > be, though I didn't hear that at the time. RSTS did get used in > manufacturing, at Final Assembly & Test sites like Westminster MA and > Salem NH, where PDP-11 systems large enough to run RSTS/E were > subjected to a load test of exerciser programs running under that OS.
> The way it was explained to us is that a system that would be happy > with such a test would also be happy with any customer application. > It's not clear if that was because RSTS would load things more than > most, or was more finicky about hardware glitches than most, but it > certainly was the practice for quite some time. Of course, not all > PDP-11 configurations could be tested that way. paul I guess the experience in NJ was a bit different since AT&T had two dedicated Field Service offices who handled their sites including Bell Labs. I was on the Commercial/Government side from 81-86 and we didn't get to play with RSTS on customer sites at all (but sometimes we got to play in the in-house machines in Princeton or on our own hardware). It was a bit different in the Vax side since many diags were run under VAX/VMS and as a brand new hire I was doing Vax installs -- including installing the VMS 2.x and 3.x on 11/780's and 11/750's at install time. If they had paid for software installation -- the software guys would wipe and reinstall. If not we left the pack and prayed the customer wouldn't wipe the diags that we installed on the disk when we built the VMS pack. Realistically the only thing the customer needed to do after we got done was tweak the system parameters, check the swap etc. and lay on the layered products like languages. Things got much more interesting when the VMS3.x and 4.x got CI780's and HSC50's. That was more involved than the easy VMS 2.x-3.x install. As far as the 11/70's -- I'm building a pidp1170... My last 11/70 install was around 84 or so when I put in a late DECDatasystem 570 blue 11/70 with the FCC Cabinets at AT&T in Freehold. As far as the Wizard from Maynard -- one story from my branch support guy (rumored to be about his brother on the 11/70 line, I think in Westminster MA... not Salem or other NH plants) involved an intermittent 11/70 that would crash every couple of days and they could run all the diags and DEC X11 with no issues.
They called over their in-house wizard who ran toggle-in programs from the front panel -- playing the switches like piano keys with both hands. After about a half hour his comment was "Clean the terminator fingers." Machine ran like a SOB once the gold fingers were cleaned. Weirdest 11/70 mess I had was after I left DEC to work for a third party maintenance group. Their regional support was in Dallas. I was in NJ. They couldn't find their support guy so they rushed me on a plane to Chicago to work with two techs who were babysitting a mess they had no clue on. The site was WW Granger in Skokie and I arrived at 3AM... They had a 5 or 6 story warehouse which was a totally robotic automated site picking water heaters and other industrial equ
Re: PDP-11/45 RSTS/E boot problem
I got a laugh out of this anecdote. Of course, folks heard me chuckle and I tried to share the joke but way too geeky for public consumption. Back in 2000-ish, I was upgrading my DG MV4000/dc to 8mb so as to be able to run the snazzy AOS/VS II tapes I'd got along with the 9 track drive I hacked onto the machine... The install would start and then bomb at a certain point every time. I decided to work the machine hard and then pull the board and give a good SNIFF. This is a 15x15 inch board populated with 256kx1 drams. The time in the machine got the board cooking nicely, and when I smelled a certain charred smell in the vicinity of a 74ls04, I knew it was that magic black smoke. I pulled a 74HCT04 from a known-good isa card, socketed the spot and voilà! Working 8mb board. It isn't ALWAYS the most expensive chip, thank God, and sometimes even us not-as-bright guys come off with a win. I really enjoy reading this list even though I don't contribute all that often or anything of much value. It is a pleasure to watch you guys work. Jeff On Thu, 2019-02-14 at 12:00 -0600, cctalk-requ...@classiccmp.org wrote: > Re: PDP-11/45 RSTS/E boot problem > When our 11/45 failed in the MMU in 1975, my classmate Josh Rosen traced the failing path on the schematics. When Jim Newport the field service engineer showed up, Josh described the diagnostics result that pointed at the failed path, and added "This is the failed chip" (pointing to one particular chip). Jim asked "Why that one?" Josh answered "because that is the most expensive chip". It turned out he was right. paul
Re: PDP-11/45 RSTS/E boot problem
That's right -- I wasn't without an army, it was just a very small and quite dedicated army! :-) I think I'd have gone down many blind alleys without help and perspective provided by others here, and in particular a lot of guidance provided by Noel over the past couple weeks in private correspondence enabling the use of V6 as a test case and investigative tool. For this I am very grateful. As those of you who have worked on these machines know, they are just so damn serviceable, by design. It's very empowering! --FritzM.
Re: PDP-11/45 RSTS/E boot problem
Ethan Dicks wrote: > I have had an RK11-C for a long time that I've never tried to > power up (I got an RKV11-D and used that on Qbus machines > instead). Wow, someone else with an RKV11-D! I thought I was the only person who had one. I modified mine (using the dead bug technique) to add 18-bit addressing instead of just 16, and ran it successfully with RT-11 and RSX-11M on my 11/73 system. I have had DEC people visit my place, look at the RKV11-D, and say "DEC never made anything like that!". :-) Alan "and I don't exist either" Frisbie
Re: PDP-11/45 RSTS/E boot problem
> From: Jerry Weiss > I am trying to understand how the diagnostics didn't reveal this defect. Vonada #12: "Diagnostics are highly efficient in finding solved problems." :-) Noel
Re: PDP-11/45 RSTS/E boot problem
On 2/13/19 5:20 PM, Jerry Weiss wrote: I am trying to understand how the diagnostics didn't reveal this defect. I see in the source for the diagnostic DZRKH-F there are tests for address in the 28K-32K range and also for the 32K boundary. So, to catch this defect the diagnostic would have to have a test or tests which crossed _specifically_ the 30K boundary within a transfer. The detailed symptom was a false overflow into the ex.mem bits on the 30K boundary, causing a skip forward in bus addresses during the transfer. The detailed fail on the 7430 (E34 on the M795) was that it acted as if input pin 11 was always H. Now, all this said, I don't think I have ever run ZRKH! I *did* run and pass the earlier ZRKA, ZRKB, and ZRKC, but somehow missed ZRKH... I'm wondering now how different it is from ZRKC? Will have to take a look... --FritzM.
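A rough illustration of why a fault like this is easy to miss: a minimal Python sketch of an 8-input NAND with one input stuck high. The pin numbers follow the standard 7430 package (inputs on pins 1-6, 11 and 12, output on pin 8). The function names are made up here, and nothing in the sketch models the actual M795 wiring, which is not described in this thread.

```python
from itertools import product

INPUT_PINS = (1, 2, 3, 4, 5, 6, 11, 12)  # 7430 input pins; the output is pin 8

def nand8(levels, stuck_high=()):
    """Output of an 8-input NAND; pins listed in stuck_high always read as H."""
    seen = [True if pin in stuck_high else levels[pin] for pin in INPUT_PINS]
    return not all(seen)

def exposing_inputs(stuck_pin):
    """Every input combination where the faulty gate disagrees with a good one."""
    found = []
    for combo in product((False, True), repeat=len(INPUT_PINS)):
        levels = dict(zip(INPUT_PINS, combo))
        if nand8(levels) != nand8(levels, stuck_high=(stuck_pin,)):
            found.append(levels)
    return found

bad = exposing_inputs(11)
print(len(bad), "of", 2 ** len(INPUT_PINS), "input combinations expose the fault")
```

Only the case where pin 11 is L while the other seven inputs are all H produces a wrong output, so a test has to hit that exact combination to notice anything.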
Re: PDP-11/45 RSTS/E boot problem
On 02/13/2019 10:40 AM, Noel Chiappa via cctalk wrote: He's also had to do a tremendous amount of work on it to get it running, starting with building an entire new power harness. Yes, the 5V power harness between the regulators and the backplane was a real mess on the 11/45 we got second hand. I probably SHOULD have rebuilt the entire harness and replaced all the Mate-n-Lock connectors on the regulators, too, but we were always just wanting to get the machine running again. Jon
Re: PDP-11/45 RSTS/E boot problem
On 2/13/19 1:43 AM, Fritz Mueller via cctalk wrote: SUCCESS!! Put the M795 out on an extender, loaded 16 in RKBAR, and had a look around with a logic probe. Narrowed it down to E34 (a 7430 8-input NAND). Pulled, socketed, replaced, and off she goes! I can now successfully boot and run both V6 Unix and RSTS/E V06C from disk. *THAT* was a really fun and rewarding hunt :-) First message in the thread was back on Dec 30, 2018. Lots of debugging and failed DRAM repairs, then the final long assault to this single, failed gate... Thanks to all here for the help and resources, and particular shout-outs for Noel and Paul who gave generously of their time and attention working through the densest bits, both on and off the list. I predict a long happy weekend and a big power bill at the end of the month :-) cheers, --FritzM. Congratulations. Well Done. I am trying to understand how the diagnostics didn't reveal this defect. I see in the source for the diagnostic DZRKH-F there are tests for address in the 28K-32K range and also for the 32K boundary. I'm trying to make sense of the M795 to get a better understanding. Any additional data on how the 7430 failed (input bad, output bad, etc ?) Jerry
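To make the boundary question concrete, here is a small Python sketch that checks whether a transfer includes words on both sides of a given boundary. It assumes 16-bit words at even byte addresses and K = 1024 words. The function name and the example transfers are hypothetical and are not taken from DZRKH-F.

```python
WORD_BYTES = 2   # PDP-11 word size in bytes
K_WORDS = 1024   # "K" as used in the thread: 1024 words

def crosses_boundary(start_byte, word_count, boundary_k):
    """True if the transfer includes words on both sides of the boundary."""
    boundary_byte = boundary_k * K_WORDS * WORD_BYTES
    end_byte = start_byte + word_count * WORD_BYTES  # first address past the transfer
    return start_byte < boundary_byte < end_byte

# Hypothetical transfers, byte addresses in octal.
print(oct(30 * K_WORDS * WORD_BYTES))        # 30K words expressed as a byte address
print(crosses_boundary(0o167770, 8, 30))     # starts just below 30K and runs past it
print(crosses_boundary(0o170000, 256, 30))   # starts exactly at 30K
print(crosses_boundary(0o177770, 8, 30))     # lies entirely above 30K
print(crosses_boundary(0o177770, 8, 32))     # same transfer checked against 32K
```

A transfer that sits wholly inside the 28K-32K range, or one that crosses 32K, can still miss the 30K boundary entirely, which is one way a test set could cover both cases without ever exercising this fault.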
Re: PDP-11/45 RSTS/E boot problem
> On Feb 13, 2019, at 3:03 PM, Paul Koning wrote: > > ... > My suspicion is that things not solved by diagnostics would be escalated to > the "wizard from Maynard". And they'd probably start replacing whole > subsystems. This says that Fritz actually was a new "Wizard from Maynard" in solving this problem. Only more so -- because he didn't have the luxury of just swapping out whole sections of the machine with new kits, or a backup team of subsystem experts at the home office to call on. That confirms it's really a very impressive performance. paul
Re: PDP-11/45 RSTS/E boot problem
> On Feb 13, 2019, at 3:54 PM, Ethan Dicks via cctalk > wrote: > > ... > It's interesting that it was a bad 7430 in yours. I find that for > equipment of that vintage, my usual suspects are failed 7474s and > failed 7440s, probably 80% of the total. Behind that, it goes 7420s > and then maybe 7430s. When our 11/45 failed in the MMU in 1975, my classmate Josh Rosen traced the failing path on the schematics. When Jim Newport the field service engineer showed up, Josh described the diagnostics result that pointed at the failed path, and added "This is the failed chip" (pointing to one particular chip). Jim asked "Why that one?" Josh answered "because that is the most expensive chip". It turned out he was right. paul
Re: PDP-11/45 RSTS/E boot problem
On Wed, Feb 13, 2019 at 2:43 AM Fritz Mueller via cctalk wrote: > > SUCCESS!! Outstanding! > Put the M795 out on an extender, loaded 16 in RKBAR, and had a look > around with a logic probe. Narrowed it down to E34 (a 7430 8-input NAND). > Pulled, socketed, replaced, and off she goes! > > I can now successfully boot and run both V6 Unix and RSTS/E V06C from disk. Nice. I have had an RK11-C for a long time that I've never tried to power up (I got an RKV11-D and used that on Qbus machines instead). The saga has been interesting for me as I contemplate getting mine working in the next couple of years. I had to look up the M795. I had forgotten there was one dual-height module in the entire controller. It's interesting that it was a bad 7430 in yours. I find that for equipment of that vintage, my usual suspects are failed 7474s and failed 7440s, probably 80% of the total. Behind that, it goes 7420s and then maybe 7430s. -ethan
Re: PDP-11/45 RSTS/E boot problem
> On Feb 13, 2019, at 1:20 PM, Jay Jaeger via cctalk > wrote: > > ... > Maybe that story about FE's using Unix as a test to confirm operation > even when diagnostics said the machine was OK was not so much just a > legend? It still feels like a legend. My experience with DEC field service engineers is that they used the diagnostics. In the PDP-11 era, Unix knowledge around DEC was pretty sparse, especially early on when it could be found only in the Telephone Products Group (Armando Stettner). RSTS would be more plausible, but I never saw that in the hands of FS engineers either. By and large diagnostics would find problems. I've seen a number in the 1970s, including a messy data path failure in the 11/45 MMU where we (college students) did the initial diagnosis while the FS engineer was on his way. My suspicion is that things not solved by diagnostics would be escalated to the "wizard from Maynard". And they'd probably start replacing whole subsystems. I've seen that once, when our college RSTS-11 system (11/20, 16 DL-11 lines) was crashing on average once a day for months. DEC brought in several of those "wizards". The "fix" was to replace the 11/20 by a "spare part" -- an 11/45 with more memory, a DH11, and RSTS/E. Decades later I was told that the wizards actually pinned the blame on the college FM broadcast transmitter, about 200 feet down the hall from the computer center. That may well be, though I didn't hear that at the time. RSTS did get used in manufacturing, at Final Assembly & Test sites like Westminster MA and Salem NH, where PDP-11 systems large enough to run RSTS/E were subjected to a load test of exerciser programs running under that OS. The way it was explained to us is that a system that would be happy with such a test would also be happy with any customer application.
It's not clear if that was because RSTS would load things more than most, or was more finicky about hardware glitches than most, but it certainly was the practice for quite some time. Of course, not all PDP-11 configurations could be tested that way. paul
Re: PDP-11/45 RSTS/E boot problem
On 2/13/2019 1:43 AM, Fritz Mueller via cctalk wrote: > SUCCESS!! > > Put the M795 out on an extender, loaded 16 in RKBAR, and had a look > around with a logic probe. Narrowed it down to E34 (a 7430 8-input NAND). > Pulled, socketed, replaced, and off she goes! > > I can now successfully boot and run both V6 Unix and RSTS/E V06C from disk. > > *THAT* was a really fun and rewarding hunt :-) First message in the thread > was back on Dec 30, 2018. Lots of debugging and failed DRAM repairs, then > the final long assault to this single, failed gate... > > Thanks to all here for the help and resources, and particular shout-outs for > Noel and Paul who gave generously of their time and attention working through > the densest bits, both on and off the list. > > I predict a long happy weekend and a big power bill at the end of the month > :-) > > cheers, > --FritzM. > > Congratulations. As another poster mentioned, it has been fascinating to watch and learn, day by day, as you worked on the problem with Noel and Paul's help. And I learned a little bit more about my 11/45 (that it indeed had had a processor field upgrade), which I had not looked at very closely before. Maybe that story about FE's using Unix as a test to confirm operation even when diagnostics said the machine was OK was not so much just a legend?
Re: PDP-11/45 RSTS/E boot problem
On 02/13/2019 01:43 AM, Fritz Mueller via cctalk wrote: SUCCESS!! Put the M795 out on an extender, loaded 16 in RKBAR, and had a look around with a logic probe. Narrowed it down to E34 (a 7430 8-input NAND). Pulled, socketed, replaced, and off she goes! WOW! Good detective work, that certainly was a WEIRD problem, and not where I thought it was going to be. Glad you got it solved! Jon
Re: PDP-11/45 RSTS/E boot problem
> From: Alan Frisbie > I am finding this entire discussion extremely fascinating! Every day I > look forward to reading the latest twists in the plot. :-) > The ideas, hunches, tests, dead ends, and results are an excellent > example of the debugging process. Yeah, and it was a Duesie of a problem, too. Although once we got clear of the bad data from the console and my confusion about R5, and it became clear that in the Unix failure, the pure text was being damaged, from that point it was pretty straightforward to track it down (albeit one that needed detailed understanding of how V6 handled pure texts - and luckily I'd come to understand that part of the system a bit while getting the QSIC running). Fritz's lucky discovery, early on, that it was location dependent was also a big help. Noel
Re: PDP-11/45 RSTS/E boot problem
> On Feb 13, 2019, at 2:43 AM, Fritz Mueller via cctalk > wrote: > > SUCCESS!! > > Put the M795 out on an extender, loaded 16 in RKBAR, and had a look > around with a logic probe. Narrowed it down to E34 (a 7430 8-input NAND). > Pulled, socketed, replaced, and off she goes! > > I can now successfully boot and run both V6 Unix and RSTS/E V06C from disk. Congratulations. You have successfully performed a repair of the type done at customer sites by highly trained DEC field service personnel. They were the ones who traveled with an oscilloscope, a tool case including soldering iron, and a case full of replacement chips. One difference is that the diagnostics didn't point to the problem, which in my experience is rather an unusual situation. Nicely done. paul
Re: PDP-11/45 RSTS/E boot problem
SUCCESS!! Put the M795 out on an extender, loaded 16 in RKBAR, and had a look around with a logic probe. Narrowed it down to E34 (a 7430 8-input NAND). Pulled, socketed, replaced, and off she goes! I can now successfully boot and run both V6 Unix and RSTS/E V06C from disk. *THAT* was a really fun and rewarding hunt :-) First message in the thread was back on Dec 30, 2018. Lots of debugging and failed DRAM repairs, then the final long assault to this single, failed gate... Thanks to all here for the help and resources, and particular shout-outs for Noel and Paul who gave generously of their time and attention working through the densest bits, both on and off the list. I predict a long happy weekend and a big power bill at the end of the month :-) cheers, --FritzM.
Re: PDP-11/45 RSTS/E boot problem
> > > Likely some disk controllers did NOT SUPPORT crossing 64K boundaries! > > > > No; the RK11 spec says "[the two extended memory bits] make up a two-bit > > counter that increments each time the RKBA overflows". > > > > The actual error turns out to be slightly different to my guess; there's > > a spurious overflow from the low 16-bit register to these bits at 017. > > Maybe a problem with E29 or E34 on the M795 module? I am finding this entire discussion extremely fascinating! Every day I look forward to reading the latest twists in the plot. The ideas, hunches, tests, dead ends, and results are an excellent example of the debugging process. I am awaiting the exciting Perry Mason style conclusion, where the guilty chip stands up and confesses on the stand. :-) Alan "Where were you on the night of the crime?" Frisbie
Re: PDP-11/45 RSTS/E boot problem
> From: Jerry Weiss > it is impressive that UNIX booted successfully without tripping over a > boundary. Well, V6 is (or can be configured to be) extraordinarily small, so I'm not surprised it booted OK without going over the 017 mark. I have this persistent memory that the -11/40 in the CSR group at MIT had only 3 banks of MM-L (@16KB each) when I first got there! Which is plausible; the smallest V6 config would have about 22KB of core text, and about 2KB of initialized data. If you cut all the parameters to the bone (minimal number of disk buffers, etc) you could probably get away with say 6KB of un-initialized data. That would leave you 18KB for user programs on such a system, a bit less than their recommendation of 24KB minimum for users, but probably minimally useable. We quickly added more memory, I'm sure, but I don't now remember how/what! Later on it was converted to an -11/45, and then we got an Able ENABLE, but that would have been a couple of years later. Noel
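The memory budget in the message above can be checked with a few lines of arithmetic. This is only a sketch using the round figures quoted there (three 16KB banks, about 22KB of kernel text, 2KB of initialized data, and an estimated 6KB of uninitialized data); the variable names are illustrative.

```python
# Memory budget for a minimal V6 configuration, using the approximate
# figures quoted in the message above (all sizes in KB).
BANK_KB = 16
banks = 3
total_kb = banks * BANK_KB

kernel = {
    "text": 22,                 # "about 22KB of core text"
    "initialized data": 2,      # "about 2KB of initialized data"
    "uninitialized data": 6,    # "say 6KB", with parameters cut to the bone
}
kernel_kb = sum(kernel.values())
user_kb = total_kb - kernel_kb

print(f"total {total_kb}KB, kernel {kernel_kb}KB, left for users {user_kb}KB")
```

With these inputs the remainder comes out at the 18KB mentioned in the message, which is below the 24KB recommended minimum for user programs.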
Re: PDP-11/45 RSTS/E boot problem
On 2/11/19 12:31 PM, Noel Chiappa via cctalk wrote: > From: Jerry Weiss > Though not a disk controller, the DEC DR11-B/DA11-B would not cross 64K > boundaries. Interesting! What's odd is that the DR11-B uses the Bus Interface card (M7219) from the RC11 controller, and that _can_ cross moby boundaries, so clearly it has the right overflow output; someone just decided not to implement it - the DR11-B sets ERROR instead on an address overflow. Weird. Yes the overflow sets error and halts the transfer. There are registers for the extended bits in the DR11B, just missing a few gates to increment. My recollection is that my simple mod wouldn't allow the read back of the incremented extended bits, but in my use case this was never a problem. Anyway, it will be interesting to see if RSTS operates correctly once this problem is fixed... Noel Yes, if it turns out the increment was not functional for one or both extended address bits, it is impressive that UNIX booted successfully without tripping over a boundary. Jerry
Re: PDP-11/45 RSTS/E boot problem
> From: Jerry Weiss > Though not a disk controller, the DEC DR11-B/DA11-B would not cross 64K > boundaries. Interesting! What's odd is that the DR11-B uses the Bus Interface card (M7219) from the RC11 controller, and that _can_ cross moby boundaries, so clearly it has the right overflow output; someone just decided not to implement it - the DR11-B sets ERROR instead on an address overflow. Weird. Anyway, it will be interesting to see if RSTS operates correctly once this problem is fixed... Noel
Re: PDP-11/45 RSTS/E boot problem
> On Feb 11, 2019, at 1:13 PM, Jerry Weiss wrote: > > On 2/11/19 11:50 AM, Paul Koning via cctalk wrote: >>> ... >> You may be thinking about PC controllers like the floppy controller. I >> can't remember ANY DEC DMA device controller that had boundary crossing >> limits of any kind. It certainly isn't a restriction in the RK11. >> >> paul >> > Though not a disk controller, the DEC DR11-B/DA11-B would not cross 64K > boundaries. > > I did however via a single chip "dead bug" modification, modify one to > accomplish this. > > Jerry That's rather shocking. I meant my comment to apply to every DMA controller, not just disks. I never used the DR11-B, though. Perhaps there are other obscure devices that get this wrong. But, for example, even devices like DMC-11 and TS-11 got it right. There are of course Q-bus devices that only do a partial address space, but my point is that whatever the number of address bits implemented, address arithmetic is as a matter of normal design done across all of them, not across a subset. paul
Re: PDP-11/45 RSTS/E boot problem
On 2/11/19 11:50 AM, Paul Koning via cctalk wrote: On Feb 11, 2019, at 11:12 AM, Jon Elson via cctalk wrote: On 02/11/2019 07:04 AM, Noel Chiappa via cctalk wrote: A look at the RK11 registers after the swap-out showed an anomaly; something about the extended memory address bits? (Maybe a multi-block transfer that crosses a 64KB boundary? That would explain the address sensitivity we were seeing.) Hopefully he'll track it to its lair shortly. OH, BOY! I think you may have found it. Likely some disk controllers did NOT SUPPORT crossing 64K boundaries! The diags would not detect that, as it was likely expected behavior. I would suspect the driver would need to break up these operations. You may be thinking about PC controllers like the floppy controller. I can't remember ANY DEC DMA device controller that had boundary crossing limits of any kind. It certainly isn't a restriction in the RK11. paul Though not a disk controller, the DEC DR11-B/DA11-B would not cross 64K boundaries. I did however via a single chip "dead bug" modification, modify one to accomplish this. Jerry
Re: PDP-11/45 RSTS/E boot problem
Yup; specifically, the symptoms are consistent with 'D15 RKBA=ALL 1 L' being incorrectly generated at BA 16, causing an increment to EX.MEM, causing a skip in the DMA. So it looks like a problem with bit 12 in that carry logic; I'll check E28 and E34 when I get back to it tonight, but I have to move the machine around so I can climb inside :-) --FritzM.
Re: PDP-11/45 RSTS/E boot problem
On Mon, Feb 11, 2019 at 6:03 PM Noel Chiappa via cctalk wrote: > > > From: Jon Elson > > > Likely some disk controllers did NOT SUPPORT crossing 64K boundaries! > > No; the RK11 spec says "[the two extended memory bits] make up a two-bit > counter that increments each time the RKBA overflows". > > The actual error turns out to be slightly different to my guess; there's > a spurious overflow from the low 16-bit register to these bits at 017. Maybe a problem with E29 or E34 on the M795 module? -tony
Re: PDP-11/45 RSTS/E boot problem
> From: Jon Elson > Likely some disk controllers did NOT SUPPORT crossing 64K boundaries! No; the RK11 spec says "[the two extended memory bits] make up a two-bit counter that increments each time the RKBA overflows". The actual error turns out to be slightly different to my guess; there's a spurious overflow from the low 16-bit register to these bits at 017. I can see how the diags didn't catch that one! Unless you try a multi-block xfer that walks across the boundary. A perfect example of Vonada #12. Noel
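The spec sentence quoted above (a 16-bit RKBA whose overflow clocks a two-bit extended-memory counter) can be sketched as a small model. This is a minimal illustration of the intended, fault-free behaviour only; the class and function names are made up for the example, and it does not attempt to reproduce the actual fault on the M795.

```python
class BusAddress:
    """Intended RK11 bus address behaviour: 16-bit RKBA plus 2 extended bits."""

    def __init__(self, rkba=0, ex_mem=0):
        self.rkba = rkba & 0o177777   # low 16 address bits
        self.ex_mem = ex_mem & 0o3    # two extended memory bits (address bits 16-17)

    def physical(self):
        """Full 18-bit Unibus address."""
        return (self.ex_mem << 16) | self.rkba

    def step(self):
        """Advance by one word; a carry out of RKBA increments the extended bits."""
        self.rkba = (self.rkba + 2) & 0o177777
        if self.rkba == 0:            # RKBA overflowed
            self.ex_mem = (self.ex_mem + 1) & 0o3


def transfer_addresses(start, words):
    """List the 18-bit addresses touched by a DMA transfer of `words` words."""
    ba = BusAddress(start & 0o177777, start >> 16)
    addresses = []
    for _ in range(words):
        addresses.append(ba.physical())
        ba.step()
    return addresses


# A four-word transfer that walks across the first 64KB boundary.
addrs = transfer_addresses(0o177774, 4)
print([oct(a) for a in addrs])
```

In this model the addresses stay contiguous across the boundary. A test that never walks a transfer over that boundary would never exercise the carry path between the two registers, which fits the remark above about the diagnostics.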
Re: PDP-11/45 RSTS/E boot problem
> On Feb 11, 2019, at 11:12 AM, Jon Elson via cctalk > wrote: > > On 02/11/2019 07:04 AM, Noel Chiappa via cctalk wrote: >> A look at the RK11 registers after the swap-out showed an anomaly; something >> about the extended memory address bits? (Maybe a multi-block transfer that >> crosses a 64KB boundary? That would explain the address sensitivity we were >> seeing.) Hopefully he'll track it to its lair shortly. >> >> > OH, BOY! I think you may have found it. Likely some disk controllers did > NOT SUPPORT crossing 64K boundaries! The diags would not detect that, as it > was likely expected behavior. I would suspect the driver would need to break > up these operations. You may be thinking about PC controllers like the floppy controller. I can't remember ANY DEC DMA device controller that had boundary crossing limits of any kind. It certainly isn't a restriction in the RK11. paul
Re: PDP-11/45 RSTS/E boot problem
On Mon, Feb 11, 2019 at 4:13 PM Jon Elson via cctalk wrote: > > On 02/11/2019 07:04 AM, Noel Chiappa via cctalk wrote: > > A look at the RK11 registers after the swap-out showed an anomaly; something > > about the extended memory address bits? (Maybe a multi-block transfer that > > crosses a 64KB boundary? That would explain the address sensitivity we were > > seeing.) Hopefully he'll track it to its lair shortly. > > > > > OH, BOY! I think you may have found it. Likely some disk > controllers did NOT SUPPORT crossing 64K boundaries! The > diags would not detect that, as it was likely expected > behavior. I would suspect the driver would need to break up > these operations. I _think_ the RK11-C should cross a 64K boundary correctly. There's an output from the low 16 bits bus address module (M795) on pin BP2 'D15 RKBA=ALL 1 L' (page 27 of the schematic on Bitsavers) that goes to the counter that holds the 2 extended bits, pin C1 of the M239 in slot A17 (page 13 of the same .pdf) [I am working from 'RK11-C_schemFeb1971.pdf' from bitsavers] Of course if there is a fault in this area then it will not correctly increment the top 2 bits, but that might give you somewhere to check. -tony
Re: PDP-11/45 RSTS/E boot problem
On 02/11/2019 07:04 AM, Noel Chiappa via cctalk wrote: A look at the RK11 registers after the swap-out showed an anomaly; something about the extended memory address bits? (Maybe a multi-block transfer that crosses a 64KB boundary? That would explain the address sensitivity we were seeing.) Hopefully he'll track it to its lair shortly. OH, BOY! I think you may have found it. Likely some disk controllers did NOT SUPPORT crossing 64K boundaries! The diags would not detect that, as it was likely expected behavior. I would suspect the driver would need to break up these operations. Jon
Re: PDP-11/45 RSTS/E boot problem
> From: Fritz Mueller > If, as you are suspecting, we find damning evidence pointing > specifically to the RK11 I got an update from Fritz. As you all will recall, the problem seemed to be a corrupted 'pure text'. So the question was 'when was it damaged, and how'. After some confusion caused by different OS images (the 'Ritchie' and 'Wellsch' distros), he managed to get a look at the text in main memory after it was first read in from the file system, and before it was swapped out (it was showing up damaged after a swap out/in cycle); it looked good at that point. The copy written out to the swap disk however, not so good. A look at the RK11 registers after the swap-out showed an anomaly; something about the extended memory address bits? (Maybe a multi-block transfer that crosses a 64KB boundary? That would explain the address sensitivity we were seeing.) Hopefully he'll track it to its lair shortly. We also need to characterize exactly what the fault is, because the DEC RK11 diagnostics weren't finding it, so it seems the diagnostic suite could use an enhancement. Noel
Re: PDP-11/45 RSTS/E boot problem
>> This seems the best place to start with the LA this weekend then. > > I'm going to respectfully semi-disagree! I think that at this point there's a > good chance we can localize to within a gate or two before we start applying > test instruments. Oh, I agree completely, Noel. I should have more precisely said "when/if we get to the LA this weekend, this seems the place to start." If, as you are suspecting, we find damning evidence pointing specifically to the RK11, I'm going to want to watch it going about its business; the LA will be a good tool for that. And yes, one of the beautiful things about these machines is how far you can get with just a set of extenders, a KM11, the front panel, and a 'scope. --FritzM.
Re: PDP-11/45 RSTS/E boot problem
> From: Fritz Mueller > This seems the best place to start with the LA this weekend then. I'm going to respectfully semi-disagree! I think that at this point there's a good chance we can localize to within a gate or two before we start applying test instruments. My thinking starts with two pieces of data; i) your discovery that when the MM trap happens, the end of the pure text segment contains a fragment of code from 04000 lower in the text, and ii) the data that the location in main memory where that _should_ have been is full of zeros - i.e. never been written into. The latter is, I think, due to the fact that Unix clears all of main memory on startup; I think it's just chance that that memory hasn't been used yet for something else, and is still 0's. (Unix does clear main memory in a few places during regular operation - e.g. when expanding the stack, the newly added area is 0'd - but in general, e.g. when swapping in a pure text, it doesn't seem to bother, which makes sense since it's all about to be over-written anyway.) Anyway, those two, together with my previous analysis that this was unlikely to have happened when the text was first being read in from the file, block by block, lead me to believe that the likely cause is that the BAR on the RK11 skipped up a whole bunch (setting the 04000 bit at some point) when it was reading the pure text back in from the swap, and skipped writing into that zero-filled block of main memory, putting the stuff that should have gone there up 04000, instead. (Why it's swapping the text back in is too complicated to be worth explaining here; anyone who _really_ wants to know should look here: http://gunkies.org/wiki/Unix_V6_internals in the last section, "exec() and pure-text images".) 
We can confirm all these suppositions/deductions fairly easily, without having to connect up, configure, etc the LA: we can just stop the machine after the text is first read in (in xalloc()) from the file-system, and confirm that the text looks good there; if so, either the swap-out (albeit unlikely, since that doesn't account for the 0's) or subsequent swap-in had an issue. The latter would be easy to confirm: just halt the machine after the text is swapped in, and see what the RK registers contain. At that point, as I said, we'll know to within a few gates where the issue is, and then it'll be time to bring out the LA. Actually, a plain oscilloscope would do; it's interesting to recollect that these machines were designed and maintained without benefit of a LA, purely with an oscilloscope! We're so spoiled now! :-) Noel
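The hypothesis in the message above (the bus address skipping upward by 04000 part-way through a swap-in, leaving a zero-filled hole and landing the data 04000 too high) can be illustrated with a toy DMA model. This is a sketch under stated assumptions: the point at which the skip occurs is chosen here purely to match the addresses quoted elsewhere in the thread, the word values are synthetic, and the function name is hypothetical.

```python
def dma_read_in(memory, base, words, skip_at=None, skip_by=0):
    """Write `words` into `memory` starting at byte address `base`.

    If `skip_at` is given, the address jumps by `skip_by` bytes just before
    that word index is written (the hypothesized fault).
    """
    addr = base
    for index, word in enumerate(words):
        if skip_at is not None and index == skip_at:
            addr += skip_by
        memory[addr] = word
        addr += 2  # PDP-11 words are two bytes


BASE = 0o161400                      # base of the text segment, from UISA0 = 01614
SKIP_WORD = 0o4200 // 2              # assumed: the skip hits at text offset 04200
text = list(range(1, 1101))          # synthetic non-zero "text" words

memory = {}                          # unwritten locations read back as zero
dma_read_in(memory, BASE, text, skip_at=SKIP_WORD, skip_by=0o4000)

intended = BASE + 0o4200             # PA:165600, where the word should have gone
actual = intended + 0o4000           # PA:171600, where it is found instead
print(oct(intended), memory.get(intended, 0))
print(oct(actual), memory.get(actual, 0))
```

In the model the intended location still reads as zero and the displaced word shows up 04000 higher, which is the pattern the message describes.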
Re: PDP-11/45 RSTS/E boot problem
>>> How about a Unibus trace? >> >> I don't think my sad little HP LA has enough buffer for that... > > You could use triggers in innovative ways. Ah, quite right, gentlemen. This seems the best place to start with the LA this weekend then. --FritzM.
Re: PDP-11/45 RSTS/E boot problem
On 2/7/2019 11:47 AM, Noel Chiappa via cctalk wrote: > > The interesting point is that when V6 first copies the text in from the file > holding the command (using readi(), Lions 6221 for anyone who's masochistic > enough to try and actually follow this :-), it reads it in starting from the > bottom, one disk block at a time (since in V6, files are not stored > contiguously). > I remember when Lions first showed up. I have a copy of a copy made back in the day. JRJ
Re: PDP-11/45 RSTS/E boot problem
On Thursday, 7 February 2019, Fritz Mueller via cctalk < cctalk@classiccmp.org> wrote: > > > How about a Unibus trace? That would give you the RK11 commands as well > as the data it sends in response. > > I don't think my sad little HP LA has enough buffer for that... You could use triggers in innovative ways. Maybe trigger on that particular data on that particular address. What takes place just before this? Is it DMA or is it the CPU moving the data? Is it just reading out bad or is it written bad? That should be possible to figure out. You will probably get quite far with only 16 data and 16 address bits (if you have the smallest analyzer). Of course more is better... /Mattis > >--FritzM.
Re: PDP-11/45 RSTS/E boot problem
> How about a Unibus trace? That would give you the RK11 commands as well as > the data it sends in response. I don't think my sad little HP LA has enough buffer for that... --FritzM.
Re: PDP-11/45 RSTS/E boot problem
> On Feb 7, 2019, at 1:37 PM, Fritz Mueller via cctalk > wrote: > > >> On Feb 7, 2019, at 9:47 AM, Noel Chiappa via cctalk >> wrote: >> >> So, with UISA0 containing 01614, that gives us PA:161400 + 04200 = PA:165600, >> I think. And it wound up at PA:171600 - off by 04000 (higher) - which is >> obviously an interesting number. > > Thanks, Noel. > >> ...it might be interesting to look at PA:165600 and see what's actually >> _there_ > > A sea of zeros, as it turns out. > > I'm thinking it might be worth obtaining a full memory dump of the text > segment at the point of fault (I can do this with a small toggle-in program > to dump it over the serial console), , and then compare that to the complete > text section in the ls binary. That would give us more of a clue about > whether blocks of memory are duplicated or swapped, what the size, alignment, > and stride of the corrupted blocks is, how many there are, etc. > > I'll get an IR trace out this weekend. Another thing I _could_ do with the > LA is an IO command trace on the RK11 (though that's a lot of probes to hook > up to get disk address, count, and memory address). How about a Unibus trace? That would give you the RK11 commands as well as the data it sends in response. paul
Re: PDP-11/45 RSTS/E boot problem
> On Feb 7, 2019, at 9:47 AM, Noel Chiappa via cctalk > wrote: > > So, with UISA0 containing 01614, that gives us PA:161400 + 04200 = PA:165600, > I think. And it wound up at PA:171600 - off by 04000 (higher) - which is > obviously an interesting number. Thanks, Noel. > ...it might be interesting to look at PA:165600 and see what's actually > _there_ A sea of zeros, as it turns out. I'm thinking it might be worth obtaining a full memory dump of the text segment at the point of fault (I can do this with a small toggle-in program to dump it over the serial console), and then compare that to the complete text section in the ls binary. That would give us more of a clue about whether blocks of memory are duplicated or swapped, what the size, alignment, and stride of the corrupted blocks is, how many there are, etc. I'll get an IR trace out this weekend. Another thing I _could_ do with the LA is an IO command trace on the RK11 (though that's a lot of probes to hook up to get disk address, count, and memory address). --FritzM.
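The dump comparison proposed above could be done with a short script once both the memory dump and the binary's text section are available as lists of words. The following is a minimal sketch: the function name, the block size, and the sample data are all assumptions made for illustration, and the data is synthetic rather than taken from the machine.

```python
def find_displaced_blocks(expected, actual, block=8):
    """Compare two word lists block by block.

    Returns (offset, source) pairs for every block of `actual` that differs
    from `expected`; `source` is the block-aligned offset in `expected` where
    the same contents occur, or None if they occur nowhere.
    """
    report = []
    for start in range(0, len(actual), block):
        chunk = actual[start:start + block]
        if chunk == expected[start:start + block]:
            continue
        source = None
        for cand in range(0, len(expected) - len(chunk) + 1, block):
            if expected[cand:cand + len(chunk)] == chunk:
                source = cand
                break
        report.append((start, source))
    return report


# Synthetic example: one block of the "dump" holds a copy of an earlier block.
expected = list(range(100, 164))      # stand-in for the text section of the binary
actual = list(expected)               # stand-in for the memory dump
actual[48:56] = expected[16:24]

for offset, source in find_displaced_blocks(expected, actual):
    stride = None if source is None else offset - source
    print(f"block at word {offset} came from word {source} (stride {stride})")
```

The list of (offset, source) pairs gives the size, alignment, and stride of any misplaced blocks directly; blocks reported with a source of None are corrupted rather than displaced.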
Re: PDP-11/45 RSTS/E boot problem
On 02/06/2019 09:11 PM, Noel Chiappa via cctalk wrote: > From: Jon Elson > I'm thinking it is bad memory. ... I think it is just a bad memory chip Nothing so simple, I'm afraid! The memory actually contains: PA:171600: 016162 004767 000224 000414 016700 016152 016702 016144 and it's _supposed_ to be holding: PA:171600: 110024 010400 000167 16 010500 010605 010446 010346 This together with Fritz's discovery of that first 'bad memory' pattern _elsewhere_ in the binary for the command makes it look pretty likely that some sort of other error has wound up with stuff being put in the wrong location. OK, now it is starting to look like an address problem. That could actually be several things. Possibly something going wrong in DMA, and the disk data is being written into the wrong place in memory. If the two places the same data show up are related by some simple binary transposition, maybe under some cases a write to memory gets written simultaneously into two banks of the memory. A memory interference test OUGHT to pick up something like that. It could also be a bus problem, or something going haywire in the MMU. And, one other possibility is that the duplicate data is a disk buffer or cache that was then copied to the location to be executed. Jon
Re: PDP-11/45 RSTS/E boot problem
> Seems a little less-likely to be the problem, given(?) as well that you have > fairly consistent (is deterministic overstating it?) behaviour. Yeah. We've gotten to the point now where enough layered problems have been cleared away that the remaining behavior is quite deterministic. > If you wanted to test it by experiment, without having to remove the > installed Rs, you could test-clip another R in parallel with the 38.4K, > probably something around 200K, to shorten the 555 period. Yes; and I think a quick solder tack for that would even be easier to manage than clips in there. Will give that a go this weekend. cheers, --FritzM.
Re: PDP-11/45 RSTS/E boot problem
On 2019-Feb-06, at 10:37 PM, Fritz Mueller via cctalk wrote: >> 4116 datasheet specs 2mS, my calcs give a refresh period of 1.5mS, the >> 14.5uS from the manual would give 1.86 mS, 7% shy of 2. >> The schematic specs 1% resistors, and the parts list does appear to spec a >> high-tolerance "1%200PPM" cap. >> >> Although there are the internal voltage divider Rs in the 555 which are also >> critical for the timing and everything is 40+ years old. >> >> Idle speculation at my distance, we'll see what Fritz observes. > > Brent: 11.8us, 6.4us position > Manual: 14.5us, 6.0us positive > Actual: 15.2us, 8.5us positive > > So yeah, a little pokey there... 15.2uS gives a 1.95mS refresh, so it's awfully close to the 2mS spec, but still within. The datasheet I was looking at doesn't seem to give any spec for tolerance on the refresh so one would guess there's a safety margin built into the 2mS spec. Seems a little less-likely to be the problem, given(?) as well that you have fairly consistent (is deterministic overstating it?) behaviour. If you wanted to test it by experiment, without having to remove the installed Rs, you could test-clip another R in parallel with the 38.4K, probably something around 200K, to shorten the 555 period.
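The refresh figures in this exchange can be reproduced with a short calculation. It assumes one row is refreshed per REF CLK period and that the 4116 has 128 rows to cover; the thread does not state this explicitly, but 128 periods matches every number quoted (14.5uS giving 1.86mS, 15.2uS giving 1.95mS).

```python
ROWS = 128        # assumed number of 4116 rows refreshed in sequence
SPEC_MS = 2.0     # 4116 datasheet refresh limit quoted in the thread


def refresh_interval_ms(clock_period_us, rows=ROWS):
    """Time to refresh every row once, at one row per refresh clock period."""
    return clock_period_us * rows / 1000.0


# Calculated, manual, and measured REF CLK periods from the thread (microseconds).
periods_us = {"calculated": 11.7, "manual": 14.5, "measured": 15.2}

for label, period in periods_us.items():
    interval = refresh_interval_ms(period)
    margin = (SPEC_MS - interval) / SPEC_MS * 100
    print(f"{label:10s} {period:5.1f} us -> {interval:.2f} ms ({margin:.0f}% margin)")
```

On these assumptions the measured clock leaves only a few percent of margin against the 2mS limit, which is why shortening the 555 period with a parallel resistor is a cheap experiment.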
Re: PDP-11/45 RSTS/E boot problem
> 4116 datasheet specs 2mS, my calcs give a refresh period of 1.5mS, the 14.5uS > from the manual would give 1.86 mS, 7% shy of 2. > The schematic specs 1% resistors, and the parts list does appear to spec a > high-tolerance "1%200PPM" cap. > > Although there are the internal voltage divider Rs in the 555 which are also > critical for the timing and everything is 40+ years old. > > Idle speculation at my distance, we'll see what Fritz observes.
Brent: 11.8us, 6.4us positive
Manual: 14.5us, 6.0us positive
Actual: 15.2us, 8.5us positive
So yeah, a little pokey there...
Re: PDP-11/45 RSTS/E boot problem
It looks like the question boils down to either "how did that part of the binary get to that part of memory?", or "how did we end up executing out of that part of memory?" More the former, I think... Noel, is it possible for you deduce where Unix _should_ be placing these "bad" bits (from file offset octal 4220)? Maybe a comparison of addresses where the bits should be, with addresses where the "bad" copy ends up, could point us at some particular failure modes to check in the KT11, CPU, or RK11... --FritzM.
Re: PDP-11/45 RSTS/E boot problem
> From: Fritz Mueller > It looks like the question boils down to either "how did that part of > the binary get to that part of memory?", or "how did we end up > executing out of that part of memory?" More the former, I think. UISA0 contains 001614, and physical memory at 0161400 does contain the first few instructions of the command's binary, so that 01614 is probably correct for the base address of segment (page) 0, which contains all the code for the command. (Without looking through the OS's guts, I can't confirm, from internal data structures, that that's where it decided to put the command's binary.) The PC at fault time is 010210, which is correct for the frame setup function, CSV; and looking at the contents of the stack, registers etc makes it pretty certain it had just done the "JSR R5, CSV" to get there. And 0161400 + 010210 = 0171610, which contains something completely different from what's in the command binary at 010210! > Could still be a memory issue _elsewhere_ that lands us there, of > course... Could also be a translation error lurking in the KT11, or a > CPU bug not found by any of the DEC diagnostic suites. Yup. Like I said, good news is we're down to one problem; bad news is it's a Duesie! Noel
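The address arithmetic in the message above follows the usual PDP-11 memory management rule: the page address register counts 64-byte (0100 octal) blocks, and the offset within the 8KB page is added to that base. A minimal sketch, with a hypothetical function name and only page 0 populated, since that is the only register value given in the thread:

```python
PAGE_BITS = 13                 # 8KB pages: the low 13 bits are the in-page offset
OFFSET_MASK = (1 << PAGE_BITS) - 1


def translate(virtual, pars):
    """Map a 16-bit virtual address to an 18-bit physical address.

    `pars` maps page number to page address register contents; each PAR unit
    is one 64-byte block. Page length and access checks are not modelled.
    """
    page = virtual >> PAGE_BITS
    offset = virtual & OFFSET_MASK
    return ((pars[page] << 6) + offset) & 0o777777


user_pars = {0: 0o001614}      # UISA0 as read from the machine

print(oct(translate(0o000000, user_pars)))   # base of the text segment
print(oct(translate(0o010210, user_pars)))   # PC at fault time (inside CSV)
```

The two results are the physical addresses quoted in the message, 0161400 and 0171610, so the translation itself is consistent; the open question is why the contents at the second address are wrong.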
Re: PDP-11/45 RSTS/E boot problem
> From: Jon Elson > I'm thinking it is bad memory. ... I think it is just a bad memory chip
Nothing so simple, I'm afraid! The memory actually contains:
PA:171600: 016162 004767 000224 000414 016700 016152 016702 016144
and it's _supposed_ to be holding:
PA:171600: 110024 010400 000167 16 010500 010605 010446 010346
This together with Fritz's discovery of that first 'bad memory' pattern _elsewhere_ in the binary for the command makes it look pretty likely that some sort of other error has wound up with stuff being put in the wrong location. Noel
Re: PDP-11/45 RSTS/E boot problem
On 2/6/19 6:25 PM, Jon Elson via cctalk wrote: I'm thinking it is bad memory. It seems unlikely bus problems could alter only ONE BIT per word, so I think it is just a bad memory chip, and finding multiple words where the 01 bit is now turned on sure looks like that kind of problem. So, there was an issue specifically relating to bit 12 on the front panel (d'oh!), which I have now cleared up. Furthermore, the "authoritative" sequence of 16 words obtained from the front panel last night, after addressing this issue, is: PA:171600: 016162 004767 000224 000414 016700 016152 016702 016144 PA:171620: 004767 000206 000405 012404 012467 016124 000167 177346 ...and, as it turns out, this exact sequence also occurs within the ls binary, on disk (per "od"): 0004220 016162 004767 000224 000414 016700 016152 016702 016144 0004240 004767 000206 000405 012404 012467 016124 000167 177346 So, the memory there _seems_ fine with the latest info at our disposal. It looks like the question boils down to either "how did that part of the binary get to that part of memory?", or "how did we end up executing out of that part of memory?" Could still be a memory issue _elsewhere_ that lands us there, of course... Could also be a translation error lurking in the KT11, or a CPU bug not found by any of the DEC diagnostic suites. I will scope the refresh clock when I get home tonight, and I'm planning on hauling out the logic analyzer for an IR trace this weekend... --FritzM. P.S. One idea that popped into my head recently, after a suggestion here to check the KT11 address translation adders, and my response "but the diagnostics!"... A bug in one of the carry lookahead generators used between the bit slices of that adder could cause a mistranslation on only a fairly selective subset of virtual addresses, and this might conceivably be missed by the KT11 diagnostics? 
*IF* that's the case and we can chase the IR trace upstream to the place of an unlucky mistranslation, it will be pretty easy to track down then in the hw and fix.
Re: PDP-11/45 RSTS/E boot problem
On 02/06/2019 05:39 PM, Fritz Mueller via cctalk wrote: On Feb 6, 2019, at 2:24 PM, Brent Hilpert via cctalk wrote: Is the schematic available for the memory board at-issue? Curious myself to see what approach for refresh DEC used. Yes, here: http://bitsavers.trailing-edge.com/pdf/dec/pdp11/memory/MP00672_MS11L_engDrw.pdf There is also a technical manual adjacent, with circuit descriptions. I will scope this up tonight and take a look! Yup, page 6, a 555 RC refresh timer! Jon
Re: PDP-11/45 RSTS/E boot problem
On 02/06/2019 04:24 PM, Brent Hilpert via cctalk wrote: On 2019-Feb-06, at 1:21 PM, Noel Chiappa via cctalk wrote: From: Brent Hilpert what about the refresh circuitry of the memory board? ... It might also explain why a number of 4116s were (apparently) failing earlier in the efforts ... replacing them might have just replaced them with 'slightly better' chips, i.e. with a slightly longer refresh tolerance. Ooh, excellent idea! Is the schematic available for the memory board at-issue? Curious myself to see what approach for refresh DEC used. Hmm, yes, if the refresh is done by one-shots and RC timing, a failed cap could silently kill the refresh trigger. An easy way to check is put something in a few locations and halt the CPU for some time (seconds to minutes). If the content is now gone, then the refresh is very likely not being done. Jon
Re: PDP-11/45 RSTS/E boot problem
On 02/06/2019 12:53 PM, Noel Chiappa via cctalk wrote: If so, i) we're down to one problem (good news), and our problem turns into finding out how that section of the code got trashed (bad news). I'm thinking it is bad memory. It seems unlikely bus problems could alter only ONE BIT per word, so I think it is just a bad memory chip, and finding multiple words where the 01 bit is now turned on sure looks like that kind of problem. It could, of course, be a bad driver or receiver on the memory board. Might also check the other voltage in the memory array (+12 or whatever was used internally in the particular memory) and also look for degraded caps on the board. Jon
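To pin down which bit Jon is referring to, the two front-panel dumps Noel posted in this thread can be XORed word by word. A minimal sketch, using only the octal values from that message:

```python
# The two readings of PA:171600 quoted in the thread (octal words).
earlier = "016162 004767 000224 000414 006700 006152 006702 006144"
later = "016162 004767 000224 000414 016700 016152 016702 016144"

# XOR each pair of words; a set bit in the result marks a bit that changed.
diffs = [int(a, 8) ^ int(b, 8) for a, b in zip(earlier.split(), later.split())]

# Collect every bit position (0 = least significant) that differs anywhere.
changed_bits = sorted({bit for d in diffs for bit in range(16) if (d >> bit) & 1})

for index, d in enumerate(diffs):
    print(f"word {index}: xor = {d:06o}")
print("bit positions that differ:", changed_bits)
```

The check only shows where the two readings disagree. Whether the memory array or the readout was at fault has to be settled some other way.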
Re: PDP-11/45 RSTS/E boot problem
On 2019-Feb-06, at 5:29 PM, Paul Koning wrote:
>> On Feb 6, 2019, at 8:25 PM, Brent Hilpert via cctalk wrote:
>>> On 2019-Feb-06, at 5:11 PM, Fritz Mueller via cctalk wrote:
>>>>> On Feb 6, 2019, at 2:24 PM, Brent Hilpert via cctalk wrote:
>>>>>
>>>>> Is the schematic available for the memory board at-issue?
>>>>> Curious myself to see what approach for refresh DEC used.
>>>>
>>>> Yes, here:
>>>> http://bitsavers.trailing-edge.com/pdf/dec/pdp11/memory/MP00672_MS11L_engDrw.pdf
>>>
>>> For completeness, from the technical manual:
>>>
>>> "The refresh logic, shown in sheet 6 of the print set, generates REF CLK H
>>> and the refresh address. Signal REF CLK H is derived from a 555 timer
>>> (E5) which is set up as a free running oscillator, powered by the +15 V /
>>> +12 V module input (V-555). The REF CLK H signal oscillates with a period
>>> of 14.5us and has a positive pulse width of 6us during each period."
>>
>> So I could have saved myself some fun if I had read the manual rather than
>> just looking at the schematic.
>> Not that they're way out of whack, but the mild disparity between the
>> manual's 14.5uS and my calculated 11.7uS is curious
>> (the calculation being based on the schematic RC values and the 555
>> equations).
>
> Perhaps the period was changed in a schematic rev or ECO, and the manual
> wasn't updated to reflect it. It would be interesting to check the data
> sheet for the RAM chip to see what it likes for refresh cycle. And given
> that this is an RC oscillator your theory about out of tolerance timing
> definitely deserves checking.

Checking further.. 4116 datasheet specs 2mS, my calcs give a refresh period of 1.5mS, the 14.5uS from the manual would give 1.86 mS, 7% shy of 2.

The schematic specs 1% resistors, and the parts list does appear to spec a high-tolerance "1%200PPM" cap. Although there are the internal voltage divider Rs in the 555 which are also critical for the timing and everything is 40+ years old.

Idle speculation at my distance, we'll see what Fritz observes. Could be other problems in the refresh circuitry too, like failed outputs from the row counter, etc.
Re: PDP-11/45 RSTS/E boot problem
On 2019-Feb-06, at 3:39 PM, Fritz Mueller wrote: >> On Feb 6, 2019, at 2:24 PM, Brent Hilpert via cctalk >> wrote: >> >> Is the schematic available for the memory board at-issue? >> Curious myself to see what approach for refresh DEC used. > > Yes, here: > http://bitsavers.trailing-edge.com/pdf/dec/pdp11/memory/MP00672_MS11L_engDrw.pdf > > There is also a technical manual adjacent, with circuit descriptions. > > I will scope this up tonight and take a look! Mixed up To: fields. The following was intended to go to the list and was originally sent a moment before I saw Fritz's message mentioning the 555: Ha!, simple free-running 555 oscillator generating the refresh cycles (pdf.pg27). I suspect there is a mistake in the schematic there: V-555 more likely connects on the other side of R4 (E5.4-C1-R4, rather than E5.7-R4-R5) to make it into the standard 555 astable circuit. Based on that, calculations indicate that the output from E5 (TP18) should be around 85 KHz, cycling 6.4 uS high, 5.3 uS low. So it's generating a refresh cycle every 11.8 uS. With 7 bits used from counter E43 (128 rows) for full refresh, that's a cell refresh every 1.5mS which (without having checked the 4116 specs) sounds sensible for a DRAM from that period. Note the 555 (E5) is running on +12 or +15V, with a R voltage divider on the output before driving into TTL.
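The refresh arithmetic above is simple enough to check in a few lines. A minimal sketch, assuming one row is refreshed per REF CLK H period and 128 rows per full pass (7 counter bits), and using the 2 ms 4116 refresh figure quoted elsewhere in this thread as the limit:

```python
ROWS = 128          # 7 bits of the row counter, per the message
SPEC_LIMIT_MS = 2.0  # 4116 refresh interval as quoted elsewhere in the thread

def full_refresh_ms(clock_period_us, rows=ROWS):
    """Time to refresh every row once, in milliseconds."""
    return clock_period_us * rows / 1000.0

candidates = (
    ("estimate from the schematic", 11.8),  # period in microseconds
    ("figure from the manual", 14.5),
)

for label, period_us in candidates:
    interval = full_refresh_ms(period_us)
    margin = (SPEC_LIMIT_MS - interval) / SPEC_LIMIT_MS * 100.0
    print(f"{label}: {period_us} us/row -> {interval:.2f} ms "
          f"({margin:.0f}% inside the {SPEC_LIMIT_MS} ms limit)")
```

Plugging in a period measured with the scope at TP18 would show directly how much margin the board really has.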
Re: PDP-11/45 RSTS/E boot problem
> On Feb 6, 2019, at 8:25 PM, Brent Hilpert via cctalk wrote:
>
> On 2019-Feb-06, at 5:11 PM, Fritz Mueller via cctalk wrote:
>>>> On Feb 6, 2019, at 2:24 PM, Brent Hilpert via cctalk wrote:
>>>>
>>>> Is the schematic available for the memory board at-issue?
>>>> Curious myself to see what approach for refresh DEC used.
>>>
>>> Yes, here:
>>> http://bitsavers.trailing-edge.com/pdf/dec/pdp11/memory/MP00672_MS11L_engDrw.pdf
>>
>> For completeness, from the technical manual:
>>
>> "The refresh logic, shown in sheet 6 of the print set, generates REF CLK H
>> and the refresh address. Signal REF CLK H is derived from a 555 timer (E5)
>> which is set up as a free running oscillator, powered by the +15 V / +12 V
>> module input (V-555). The REF CLK H signal oscillates with a period of
>> 14.5us and has a positive pulse width of 6us during each period."
>
> So I could have saved myself some fun if I had read the manual rather than
> just looking at the schematic.
> Not that they're way out of whack, but the mild disparity between the
> manual's 14.5uS and my calculated 11.7uS is curious
> (the calculation being based on the schematic RC values and the 555
> equations).

Perhaps the period was changed in a schematic rev or ECO, and the manual wasn't updated to reflect it. It would be interesting to check the data sheet for the RAM chip to see what it likes for refresh cycle. And given that this is an RC oscillator your theory about out of tolerance timing definitely deserves checking.

paul
Re: PDP-11/45 RSTS/E boot problem
On 2019-Feb-06, at 5:11 PM, Fritz Mueller via cctalk wrote:
>>> On Feb 6, 2019, at 2:24 PM, Brent Hilpert via cctalk wrote:
>>>
>>> Is the schematic available for the memory board at-issue?
>>> Curious myself to see what approach for refresh DEC used.
>>
>> Yes, here:
>> http://bitsavers.trailing-edge.com/pdf/dec/pdp11/memory/MP00672_MS11L_engDrw.pdf
>
> For completeness, from the technical manual:
>
> "The refresh logic, shown in sheet 6 of the print set, generates REF CLK H
> and the refresh address. Signal REF CLK H is derived from a 555 timer (E5)
> which is set up as a free running oscillator, powered by the +15 V / +12 V
> module input (V-555). The REF CLK H signal oscillates with a period of 14.5us
> and has a positive pulse width of 6us during each period."

So I could have saved myself some fun if I had read the manual rather than just looking at the schematic.

Not that they're way out of whack, but the mild disparity between the manual's 14.5uS and my calculated 11.7uS is curious (the calculation being based on the schematic RC values and the 555 equations).
Re: PDP-11/45 RSTS/E boot problem
>> On Feb 6, 2019, at 2:24 PM, Brent Hilpert via cctalk wrote:
>>
>> Is the schematic available for the memory board at-issue?
>> Curious myself to see what approach for refresh DEC used.
>
> Yes, here:
> http://bitsavers.trailing-edge.com/pdf/dec/pdp11/memory/MP00672_MS11L_engDrw.pdf

For completeness, from the technical manual:

"The refresh logic, shown in sheet 6 of the print set, generates REF CLK H and the refresh address. Signal REF CLK H is derived from a 555 timer (E5) which is set up as a free running oscillator, powered by the +15 V / +12 V module input (V-555). The REF CLK H signal oscillates with a period of 14.5us and has a positive pulse width of 6us during each period."
Re: PDP-11/45 RSTS/E boot problem
> On Feb 6, 2019, at 2:24 PM, Brent Hilpert via cctalk > wrote: > > Is the schematic available for the memory board at-issue? > Curious myself to see what approach for refresh DEC used. Yes, here: http://bitsavers.trailing-edge.com/pdf/dec/pdp11/memory/MP00672_MS11L_engDrw.pdf There is also a technical manual adjacent, with circuit descriptions. I will scope this up tonight and take a look! --FritzM.
Re: PDP-11/45 RSTS/E boot problem
On 2019-Feb-06, at 1:21 PM, Noel Chiappa via cctalk wrote: >> From: Brent Hilpert > >> what about the refresh circuitry of the memory board? >> ... >> It might also explain why a number of 4116s were (apparently) failing >> earlier in the efforts ... replacing them might have just replaced them >> with 'slightly better' chips, i.e. with a slightly longer refresh tolerance. > > Ooh, excellent idea! Is the schematic available for the memory board at-issue? Curious myself to see what approach for refresh DEC used.
Re: PDP-11/45 RSTS/E boot problem
> From: Brent Hilpert > what about the refresh circuitry of the memory board? > ... > It might also explain why a number of 4116s were (apparently) failing > earlier in the efforts ... replacing them might have just replaced them > with 'slightly better' chips, i.e. with a slightly longer refresh tolerance. Ooh, excellent idea! Noel
Re: PDP-11/45 RSTS/E boot problem
On 2019-Feb-06, at 10:53 AM, Noel Chiappa via cctalk wrote: > > I'm not sure that's going to tell us much: the latest development is that > Fritz looked at the actual memory contents again, and it is once again > trash; _almost_ identical to what was there before: > > PA:171600: 016162 004767 000224 000414 006700 006152 006702 006144 > > but with some extra 01 bits: > > PA:171600: 016162 004767 000224 000414 016700 016152 016702 016144 > > (It's not clear if this represents a real difference, or if that > front panel issue Fritz mentioned caused the contents to be displayed > incorrectly.) > > The exciting thing is that if the latter really is what's in main memory, > that '16700 16152' at the PC of the MM trap could indeed generate the MM trap > we're seeing: it's "MOV 26364, R0", and that address is in segment (page) 1, > which is only 03500 long > > If so, i) we're down to one problem (good news), and our problem turns into > finding out how that section of the code got trashed (bad news). Which is not > going to be simple, alas, I suspect. I don't think it's the RK11, because > Unix reads the program image into system buffers in low memory, and that's > clearly working OK in the 'sleep;ls' case. (It may not use the exact same > buffers, though...) It then copies it out to the memory where it's going to > execute from, using an MTPI loop. So maybe the memory still has issues, or > maybe the MTPI isn't working with some main memory locations or or or... I haven't followed this in detail enough to know what the configuration and memory board at play are so maybe this can be ruled out from your end, but for consideration, what about the refresh circuitry of the memory board? Mem diagnostics, unless they explicitly account for it, may not show up problems with memory refresh if the loop times are short enough to effectively substitute as refresh cycles, while they could show up later in real-world use with arbitrary time between accesses. 
Refresh on some early boards/systems was asynchronously timed by monostables or onboard oscillators which can drift or fail on the margin/slope. (I don't know what DEC's design policy was for DRAM refresh). It might also explain why a number of 4116s were (apparently) failing earlier in the efforts (if I recall the discussion correctly), replacing them might have just replaced them with 'slightly better' chips, i.e. with a slightly longer refresh tolerance.
Re: PDP-11/45 RSTS/E boot problem
> From: Mattis Lind

>> we've also looked at what's in memory at that location, and the low
>> part of the text segment seems to be correct, but there was junk at
>> the top, around the target of the JSR (i.e. at 'csv'). Not just one
>> word, but everything around that location was wrong, which would
>> suggest to me that the cause is not a simple memory failure there.
>> I've suggested to Fritz that we look at that again, to see if what was
>> recorded before is accurate

> Would it be possible to insert a breakpoint or halt and run the
> program, insert original instruction and single step?

I'm not sure that's going to tell us much: the latest development is that Fritz looked at the actual memory contents again, and it is once again trash; _almost_ identical to what was there before:

PA:171600: 016162 004767 000224 000414 006700 006152 006702 006144

but with some extra 01 bits:

PA:171600: 016162 004767 000224 000414 016700 016152 016702 016144

(It's not clear if this represents a real difference, or if that front panel issue Fritz mentioned caused the contents to be displayed incorrectly.)

The exciting thing is that if the latter really is what's in main memory, that '16700 16152' at the PC of the MM trap could indeed generate the MM trap we're seeing: it's "MOV 26364, R0", and that address is in segment (page) 1, which is only 03500 long.

If so, i) we're down to one problem (good news), and our problem turns into finding out how that section of the code got trashed (bad news). Which is not going to be simple, alas, I suspect. I don't think it's the RK11, because Unix reads the program image into system buffers in low memory, and that's clearly working OK in the 'sleep;ls' case. (It may not use the exact same buffers, though...) It then copies it out to the memory where it's going to execute from, using an MTPI loop. So maybe the memory still has issues, or maybe the MTPI isn't working with some main memory locations or or or...

Noel
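For anyone following along without an instruction card handy, here is a rough sketch of how '16700 16152' decodes and why the resulting address lands outside the segment. It assumes the standard PDP-11 double-operand layout and 8 KB (020000-byte) pages, and it treats the quoted length of 03500 as a byte count; the function name is invented for illustration, and the last value is derived arithmetic rather than something stated in the thread.

```python
def decode_double_operand(word):
    """Split a PDP-11 double-operand instruction word into its fields."""
    return {
        "opcode": (word >> 12) & 0o17,   # 01 is MOV
        "src_mode": (word >> 9) & 0o7,   # 6 is index mode, X(Rn)
        "src_reg": (word >> 6) & 0o7,    # 7 is the PC, so the source is PC-relative
        "dst_mode": (word >> 3) & 0o7,   # 0 is register direct
        "dst_reg": word & 0o7,
    }

fields = decode_double_operand(0o016700)
print(fields)

target = 0o026364              # effective address as given in the message
page = target >> 13            # top three bits of the virtual address pick the page
offset = target & 0o17777      # byte offset within that page
segment_length = 0o3500        # length quoted for page 1, assumed to be in bytes

print(f"page {page}, offset {offset:06o}, length {segment_length:06o}")
print("outside the segment:", offset >= segment_length)

# Index mode adds the index word to the PC as it stands after the index word
# has been fetched, so that PC value can be recovered from the target address.
index_word = 0o016152
pc_after_index_word = (target - index_word) & 0o177777
print(f"PC after fetching the index word: {pc_after_index_word:06o}")
```

So with the words as read, the access falls well past the end of page 1, which is consistent with the length-error trap being described.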
Re: PDP-11/45 RSTS/E boot problem
> On the logic analyzer suggestion: I remember seeing a logic analyzer hooked > to a PDP-11 at DEC, for software debugging. As I recall, it was connected at > the console front panel, which seems reasonable since several key CPU data > paths are exposed there. Ooh, I like that suggestion! It might be worth making up some inline cables for the LA just for this purpose, so it could be a quick hookup whenever needed. --FritzM.
Re: PDP-11/45 RSTS/E boot problem
>> Would it be any difference if you run the machine at full speed or lower
>> speed...
>
> Ah, yes -- this I haven't tried yet! I have a KM11 replica, so this is easy
> enough to do; I'll give that a go when I next get back to the machine
> (possibly this evening).

Ran the machine on the maintenance clock via the KM11 at a variety of speeds, and the behavior remains the same. So not too timing sensitive... At least it's consistent!

--FritzM.
Re: PDP-11/45 RSTS/E boot problem
On the logic analyzer suggestion: I remember seeing a logic analyzer hooked to a PDP-11 at DEC, for software debugging. As I recall, it was connected at the console front panel, which seems reasonable since several key CPU data paths are exposed there. paul
Re: PDP-11/45 RSTS/E boot problem
On Tue, Feb 5, 2019 at 10:03 AM Fritz Mueller via cctalk <cctalk@classiccmp.org> wrote:
> FWIW, I maintain a Windows VM (on a MacOS X host) for the sole purpose of
> running PDP11GUI, and I use an USA19H USB serial dongle connected through
> to the VM as a serial interface. I don't know if something about this
> setup is particularly detrimental to PDP11GUI transfer performance? It
> takes me >3hrs to write a 2.5mb pack this way (at 9600 baud), or a little
> over 1hr to read one back. Do others see similar throughput with these
> tools?

Yes. PDP11GUI is a great tool but it is extremely slow for dumping disks. It's not your setup. I restored an RL02 pack this way once (at 9600bps) and it took a very long time (I didn't time it but it was well over 6 hours). Compare this with restoring an RK05 pack on my PDP-8 using dumprest, which takes just about an hour...

- Josh

> --FritzM.
Re: PDP-11/45 RSTS/E boot problem
On 2/5/2019 12:03 PM, Fritz Mueller via cctalk wrote: >> Perhaps compile [test programs] under SimH and do a block-level diff of the >> image with what is currently in use, and transfer just those blocks? > > I did experiment with this a little way back. I wrote a small standalone > code that dumps a CRC of every sector over the console; I can run this both > under SIMH and on the real machine, then diff to find the changed sectors. > > Unfortunately, when I tried to apply this, it seemed that SIMH's write single > sector wasn't working correctly for me (though "write all sectors to end" > seemed to work okay). It ended up being much more tedious than I had thought > to do it this way; in the end I concluded I'd be better off writing some > different software specifically for this purpose, but I haven't gotten back > to it yet. > > FWIW, I maintain a Windows VM (on a MacOS X host) for the sole purpose of > running PDP11GUI, and I use an USA19H USB serial dongle connected through to > the VM as a serial interface. I don't know if something about this setup is > particularly detrimental to PDP11GUI transfer performance? It takes me >3hrs > to write a 2.5mb pack this way (at 9600 baud), or a little over 1hr to read > one back. Do others see similar throughput with these tools? > > --FritzM. > > At 9600 bps, and allowing for 10 bit characters (8 data bits, 1 start, 1 stop), that is 960 cps, and 2.5MB RK05 should take under an hour (2400 s). Round that up to an hour, say, for handshaking overhead, etc. That is consistent with your read time. To get to three hours we would need a pause for each write of: 7200 = 200 (tracks) x 12 (sectors/trk) x 2 (sides) x n seconds/block And n would be 1.5 seconds / sector for the write time. That seems excessive. Perhaps it is doing read after write verify for each block written? If so, can you turn that verify off? 
(When I do my transfers over a DR11, I run a separate checksum step afterwards, and the transfer programs also report their checksums).
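The estimate above can be redone step by step. A small sketch, using the 200 x 12 x 2 geometry from the message and assuming 512-byte sectors (the sector size is not stated in the thread):

```python
BAUD = 9600
BITS_PER_CHAR = 10                  # 8 data bits + 1 start + 1 stop
TRACKS, SECTORS_PER_TRACK, SIDES = 200, 12, 2
SECTOR_BYTES = 512                  # assumption: 256 16-bit words per sector

chars_per_second = BAUD / BITS_PER_CHAR
total_sectors = TRACKS * SECTORS_PER_TRACK * SIDES
pack_bytes = total_sectors * SECTOR_BYTES

# Time for the raw data alone, with no protocol overhead.
raw_seconds = pack_bytes / chars_per_second

# Observed times from the thread, rounded to whole hours.
read_seconds = 1 * 3600
write_seconds = 3 * 3600

# Extra time spent per sector on writes, beyond what a read takes.
extra_per_sector = (write_seconds - read_seconds) / total_sectors

print(f"{chars_per_second:.0f} chars/s, {total_sectors} sectors, {pack_bytes} bytes")
print(f"raw transfer time: {raw_seconds:.0f} s")
print(f"unexplained write overhead: {extra_per_sector:.2f} s per sector")
```

With these assumptions the raw time comes out somewhat above the rough figure in the message but still well under an hour, and the per-sector write overhead matches the 1.5 seconds worked out above.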
Re: PDP-11/45 RSTS/E boot problem
> On Feb 5, 2019, at 10:03 AM, Fritz Mueller wrote: > > Unfortunately, when I tried to apply this, it seemed that SIMH's write single > sector wasn't working correctly for me... Correction to above: "PDP11GUI's write single sector". Apologies! --FritzM.
Re: PDP-11/45 RSTS/E boot problem
> Would it be any difference if you run the machine at full speed or lower
> speed...

Ah, yes -- this I haven't tried yet! I have a KM11 replica, so this is easy enough to do; I'll give that a go when I next get back to the machine (possibly this evening).

> ...or even single step past this instruction? With the KM11 installed you
> could even single step the 5 minor states in each micro instruction. Would it
> be possible to insert a breakpoint or halt and run the program, insert
> original instruction and single step?

We're not *quite* sure yet of the exact offending instruction; memory around the purported fault location doesn't look like what we expect (particularly, it's hard to see how the instruction which should have executed last could possibly result in the particular fault taken; thus Noel's request for an IR trace.) I think the breakpoint-and-step approach is likely to be fruitful, but we need to clear up some muddiness around the exact instruction sequence/location first.

--FritzM.
Re: PDP-11/45 RSTS/E boot problem
>>> I keep wondering about the psu. >> >> Good theory. > > I'll give these a double-check... I did give these a look yesterday. Indeed, the +5 regulator in position "C" (which includes supply to the KT11) was running a little low (4.9 and change). I trimmed it up, and checked the rest of the regulators while I was at it (they were all fine.) This did clear up some small strangenesses I was seeing at the console in address translation mode, but "ls" still fails in exactly the same way. --FritzM.
Re: PDP-11/45 RSTS/E boot problem
> On Feb 5, 2019, at 8:45 AM, Jon Elson via cctalk wrote:
>
> I'd guess the diagnostic tries a few patterns to test for gross failure of
> this circuitry, but since it involves memory on a system running a program,
> it may not be able to exhaustively test these adders and comparators.

In fact, the DEC diagnostics relocate themselves around memory, so they can and do "paint the whole floor". The tests are fairly exhaustive, testing relocations, access range and privilege mechanisms, activity and statistics flags, and fault and interrupt behaviors. (It takes my machine about 45 minutes running full bore to work its way through a single pass!)

Again, not to say that there's not a bug lurking in the KT11 (it remains in fact a prime suspect!) But with the ground gone over so far we have managed to pretty thoroughly check and rule out a lot of things like any sort of consistent failure of the relocation adder.

I really appreciate the time people are taking to offer help and suggestions -- please keep them coming!

thanks,
--FritzM.
Re: PDP-11/45 RSTS/E boot problem
> Perhaps compile [test programs] under SimH and do a block-level diff of the > image with what is currently in use, and transfer just those blocks? I did experiment with this a little way back. I wrote a small standalone code that dumps a CRC of every sector over the console; I can run this both under SIMH and on the real machine, then diff to find the changed sectors. Unfortunately, when I tried to apply this, it seemed that SIMH's write single sector wasn't working correctly for me (though "write all sectors to end" seemed to work okay). It ended up being much more tedious than I had thought to do it this way; in the end I concluded I'd be better off writing some different software specifically for this purpose, but I haven't gotten back to it yet. FWIW, I maintain a Windows VM (on a MacOS X host) for the sole purpose of running PDP11GUI, and I use an USA19H USB serial dongle connected through to the VM as a serial interface. I don't know if something about this setup is particularly detrimental to PDP11GUI transfer performance? It takes me >3hrs to write a 2.5mb pack this way (at 9600 baud), or a little over 1hr to read one back. Do others see similar throughput with these tools? --FritzM.
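The host-side half of the sector-CRC idea can be sketched in a few lines. This is only an illustration of the approach, not Fritz's program: it assumes 512-byte sectors and uses zlib's CRC-32, whereas the CRC his standalone code prints is not specified, and the function names are made up.

```python
import zlib

SECTOR_BYTES = 512  # assumption; the thread does not state the sector size

def sector_crcs(image, sector_bytes=SECTOR_BYTES):
    """Return one CRC-32 per sector of a disk image held in memory."""
    return [zlib.crc32(image[i:i + sector_bytes])
            for i in range(0, len(image), sector_bytes)]

def changed_sectors(old_image, new_image):
    """List the sector numbers whose CRCs differ between two images."""
    old, new = sector_crcs(old_image), sector_crcs(new_image)
    common = min(len(old), len(new))
    differing = [n for n in range(common) if old[n] != new[n]]
    # Sectors present in only one image count as changed too.
    differing.extend(range(common, max(len(old), len(new))))
    return differing

# Tiny synthetic example: four sectors, with one byte altered in sector 2.
old_image = bytes(4 * SECTOR_BYTES)
new_image = bytearray(old_image)
new_image[2 * SECTOR_BYTES + 7] = 0o377
print(changed_sectors(old_image, bytes(new_image)))
```

In practice the two CRC lists would come from the simulator image and from the console output of the real machine, and only the sectors in the resulting list would need to be rewritten.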
Re: PDP-11/45 RSTS/E boot problem
On 02/05/2019 07:36 AM, Noel Chiappa via cctalk wrote: One would hope that the DEC KT11 diagnostic would check for this... but just to be thorough, we have in fact written a short diagnostic which stores every possible value in each UISA register and checks that it's correct. So unless there is some sort of pattern sensitivity (e.g. when A is in UISAm and B is in UISAn), that's not it. The MMU has to have some adders in it. One adds the offset for the segment's beginning physical address to the supplied address from the CPU. The other compares the requested address against the limit (size) of the segment, to make sure it doesn't exceed the segment size. Either this adder or the comparator could be faulty. I'd guess the diagnostic tries a few patterns to test for gross failure of this circuitry, but since it involves memory on a system running a program, it may not be able to exhaustively test these adders and comparators. Jon
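As a rough model of the two operations Jon describes, here is a sketch of PDP-11 style relocation and length checking for an upward-growing page. It assumes the usual layout (base register and length field both counted in 64-byte blocks, 18-bit physical addresses) and is not a description of the actual KT11 logic. The example values are borrowed from elsewhere in the thread: user address 060 appearing at PA:164060, and the access to 026364 in a page quoted as 03500 long, which is taken here to mean 035 blocks, so a length field of 034.

```python
class LengthError(Exception):
    """Raised when an access falls beyond the configured page length."""

def translate(va, par, plf=0o177):
    """Relocate a 16-bit virtual address through one page's registers.

    par: base of the page in 64-byte blocks.
    plf: highest valid block number within the page (upward-growing pages only).
    """
    block = (va >> 6) & 0o177    # block number within the 8 KB page
    disp = va & 0o77             # byte displacement within the block
    if block > plf:              # the comparison Jon mentions
        raise LengthError(f"block {block:o} exceeds length {plf:o}")
    # The relocation add Jon mentions, truncated to 18 bits.
    return (((par + block) << 6) | disp) & 0o777777

# Base chosen so that user address 060 lands at PA:164060, as observed.
print(f"{translate(0o000060, par=0o1640):06o}")

# The base value below is hypothetical; only the length check matters here.
try:
    translate(0o026364, par=0o1700, plf=0o34)
except LengthError as err:
    print("length error:", err)
```

A fault in either the add or the compare would show up only for the particular block numbers that exercise the bad path, which is why pattern coverage in the diagnostics matters.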
Re: PDP-11/45 RSTS/E boot problem
Den tis 5 feb. 2019 kl 00:23 skrev Fritz Mueller via cctalk < cctalk@classiccmp.org>: > > > On Feb 4, 2019, at 2:28 AM, Noel Chiappa via cctalk < > cctalk@classiccmp.org> wrote: > > > > I'm pretty sure the command only gets a few instructions in before it > blows > > up. Here are the process' registers, and the _entire_ contents of the > user > > mode stack: > > > > R0 10 > > R1 0 > > R2 0 > > R3 0 > > R4 34 > > R5 444 > > SP 177760 > > PC 010210 > > > > 060: 00 20 01 10 14 17 071554 00 > > Okay, I've had a bit of time in front of the machine to repro this and > take a look. What I actually see is: > > R0 10 > R1 0 > R2 0 > R3 0 > R4 0 > R5 34 > R6 141774 > PC 000254 > > (remember, for the last, this will have been after taking a trap to 250, > where I have the usual "BR .+2; HALT" catcher installed) > > Also, memory at 060 (PA:164060) is all zeros as far as the eye can see... > Would it be any difference if you run the machine at full speed or lower speed or even single step past this instruction? With the KM11 installed you could even single step the 5 minor states in each micro instruction. Would it be possible to insert a breakpoint or halt and run the program, insert original instruction and single step? The TIG module has a separate non crystal controlled oscillator which one could tune for marginal checking. Would it be possible to isolate the test case outside the UNIX environment? /Mattis > > I have a bit of water on the basement floor right now after the recent > rains here, which is complicating setup of the LA. There's a big puddle > where I normally place it... > > >
Re: PDP-11/45 RSTS/E boot problem
> > Yeah, it may come to that. One issue we've been having is doing specialized > test programmes; trying to run the C compiler fails. I don't know about the > assembler, though. And as Fritz mentioned, it takes hours to load a new disk > image. I think we've come up with a way around that, though; produce binary > of stand-alone tests elsewhere (I've often/always got a v6 running on > Ersatz-11 here), and load them into the /45's main memory with PDP11GUI. > > Noel Perhaps compile it under SimH and do a block-level diff of the image with what is currently in use, and transfer just those blocks? (Presumably would be the superblock, bitmap, directory and actual program blocks). For my setup I use a DR11 to transfer data, using an Arduino with Ethernet as a go-between my PC and the PDP-11.
Re: PDP-11/45 RSTS/E boot problem
> From: Paul Koning > Another possibility occurs to me: bad bits in the MMU (UISAR0 register > ... if UISAR0 has a stuck bit so the "plain" case maps incorrectly > you'd expect to come up with execution that looks nothing at all like > what was intended. One would hope that the DEC KT11 diagnostic would check for this... but just to be thorough, we have in fact written a short diagnostic which stores every possible value in each UISA register and checks that it's correct. So unless there is some sort of pattern sensitivity (e.g. when A is in UISAm and B is in UISAn), that's not it. Also, and perhaps more significantly, when checked after the trap happens, all the UISA registers and all the KISA registers contain correct data. So, unless it's something where _sometimes_ one reads UISAn and gets X when it actually contains Y, I'm not sure the SARs (PARs) are involved. > From: Jon Elson > OK, here's a really complicated thing to try. If you know the physical > memory address of ls when it has the problem We do (see above), and we've also looked at what's in memory at that location, and the low part of the text segment seems to be correct, but there was junk at the top, around the target of the JSR (i.e. at 'csv'). Not just one word, but everything around that location was wrong, which would suggest to me that the cause is not a simple memory failure there. I've suggested to Fritz that we look at that again, to see if what was recorded before is accurate (i.e. if we see the same wrong contents), to make sure we didn't make a mistake somehow. > write a machine language program that loads a copy of ls into that > location and then tries to read it back. Yeah, it may come to that. One issue we've been having is doing specialized test programmes; trying to run the C compiler fails. I don't know about the assembler, though. And as Fritz mentioned, it takes hours to load a new disk image. 
I think we've come up with a way around that, though; produce binary of stand-alone tests elsewhere (I've often/always got a v6 running on Ersatz-11 here), and load them into the /45's main memory with PDP11GUI. Noel
RE: PDP-11/45 RSTS/E boot problem
Yep, I noticed that, but thought it was an idea you might want to explore, and it’s simple enough to do. Without the full output from the ls command and how it was executed I was just throwing it out there. For instance, was the default dir where ls was run the same dir as when the backgrounded one was run? That would make a difference if the filesystem was corrupt. In previous threads, there was an issue getting the proper image onto the disk, so there is the potential for corruption.

There is the assumption, since boards were being worked on, that the problem seen in software is probably due to said hardware, even though diags pass. With that assumption, shouldn’t you try to eliminate different hardware pieces? I would try running something that uses memory and doesn’t use disk to narrow the problem down.

Anyway,
Take care and good luck,
Wayne

From: Noel Chiappa
Sent: Monday, February 4, 2019 12:43:09 PM
To: cctalk@classiccmp.org
Cc: j...@mercury.lcs.mit.edu
Subject: RE: PDP-11/45 RSTS/E boot problem

> From: Wayne S

> it might be a wonky filesystem. ...
> The corruption probably came because the entire disk was going bad.

This theory is contradicted by the fact (mentioned several times, including in the message you were replying to) that doing a plain 'ls' bombs, but 'sleep 300 &; ls' works fine.

Noel
RE: PDP-11/45 RSTS/E boot problem
Noel, it might be a wonky filesystem. I’ve had ls -l seg fault because of bad attribute data on a file in a directory on Solaris. Interestingly, ls (without the -l) worked okay. Maybe fsck or the equivalent command may show something. It was a Solaris system with many concurrent users so I couldn’t take it down to run fsck so I ended up writing a quick Perl program to just list file names and then modified it to get the attributes. It seg faulted when it came to the bad file name. I used Perl unlink to kill it and everything was okay. The corruption probably came because the entire disk was going bad. Just a thought.

From: cctalk on behalf of Noel Chiappa via cctalk
Sent: Monday, February 4, 2019 11:24:19 AM
To: cctalk@classiccmp.org
Cc: j...@mercury.lcs.mit.edu
Subject: Re: PDP-11/45 RSTS/E boot problem

> From: Jay Jaeger

> This sort of situation, where DEC diagnostics run OK but UNIX has issues
> was reported to be not all that uncommon - to the point where the urban
> legend was that some DEC FE's would fire up Unix V6 as a sort of system
> exerciser.

Amusing! Never heard that; our -11's were never under maintenance, so DEC FE's never worked on them.

> Make a copy of ls, and see if the copy also fails

It acts just like the original; fails when run by itself, runs OK when 'sleep' is also running (in the background).

> From: Bob Smith

> We finally had the cpu backplane replaced

Ow. Not an option for Fritz, I expect. (I dunno - anyone have a spare /45 backplane?)

> From: Paul Koning

> Is there any way to attach a logic analyzer to various data paths on
> this machine?

I had suggested to Fritz that the symptoms led me to believe that it was time to deploy a LA, especially since the MM trap only occurs once between him typing 'ls' and the process failing - i.e. easy to trigger on. He offered me the options of look at the IR or at the UNIBUS - I opted for the IR so we can see _exactly_ what the machine _thinks_ it is doing! No report back yet, though.

Noel
Re: PDP-11/45 RSTS/E boot problem
On 02/04/2019 11:34 AM, Fritz Mueller via cctalk wrote: 2. Make a copy of ls, and see if the copy also fails (different location on disk would mess with timing just a bit). Also done; the copy appears to behave identically to the original. OK, here's a really complicated thing to try. If you know the physical memory address of ls when it has the problem, write a machine language program that loads a copy of ls into that location and then tries to read it back. You might be able to do this in Unix, having it start with the exact code of ls, but then has the tester above that and the entry point is for the test program. This would detect a pattern sensitivity in the memory. If ls, when actually running reads an instruction wrong, it could then try to read a bad address, and cause the MMU trap. Jon
Re: PDP-11/45 RSTS/E boot problem
On 02/04/2019 11:20 AM, Fritz Mueller via cctalk wrote: The MMU classifies the error in register SR0; this decodes to a segment length error (access within the segment beyond configured bound). As Noel notes, however, this is not consistent with the instructions we see at the point of fault. OK, so the CPU presents an address that is within the segment bound, but the MMU declares it to be OUTSIDE the bounds of the segment. That could be a CPU problem, but likely would be the same with the MMU on or off, so the diags SHOULD catch that. But, if the CPU is sending a good address, then it has to be the MMU is failing on the addition/comparison with the segment size. Anyway, is it possible to borrow an MMU from somebody else? Potentially... It is a two board option; I do have a spare for both of the boards, but these spares each are in need of other repairs at the moment. One slightly complicating factor is that I have a *very* early 11/45. Most of my boards (including the MMU boards), as well as my backplane, pre-date the currently available schematics on bitsavers, etc., and there are no records regarding which ECOs have been applied on my hardware. Thus my interest in tracking down ECOs/FCOs... I've been picking my way through the list that Jay recently posted, verifying by looking at the greenwires which FCO's I have applied and which not. It's a bit painstaking. This could be messy, but DEC was FAIRLY good at making updates backwards compatible where possible. So, it MAY be true that a later MMU will still work in this CPU. Jon
Re: PDP-11/45 RSTS/E boot problem
> From: Fritz Mueller > I've had a bit of time in front of the machine to repro this and take a > look. What I actually see is: > R0 10 > R1 0 > R2 0 > R3 0 > R4 0 > R5 34 > R6 141774 > PC 000254 Argh. (Very red face!) I worked out the trap stack layout by looking at m40.s and trap.c, and totally forgot about the return PC (that's the 0444) from the call to trap(): 0001740 13 141756 022050 13 00 00 00 34 0001760 000444 31 177760 00 030351 10 010210 170010 I clearly should have looked at core(V) in the V6 manual! The R6 you have recorded is correct for just after the trap; that's the kernel mode SP, which points to the top of the kernel stack, in segment 6 (in the swappable per-process kernel area, which runs from 14-1776). So there is no R5 mystery, I was just confused. Back to the other two! Noel
Re: PDP-11/45 RSTS/E boot problem
>>> The obvious answer is bad memory. >> >> At the board level, yes. Deeper, it could be bad memory bits or bad >> memory decode. > > Yes, one of the standard early PDP-11 memory tests is the "no duplicate > address test". I should say that the memory board is not _completely_ whack -- it is passing the rather thorough MAINDEC ZQMC, a 0-124k exerciser with multiple pattern/sequence tests which also kicks around the KT11. That doesn't rule out the possibility that there is a lurker in there not covered by the DEC diags. But if there is, it's something subtle...
Re: PDP-11/45 RSTS/E boot problem
> On Feb 4, 2019, at 2:28 AM, Noel Chiappa via cctalk > wrote: > > I'm pretty sure the command only gets a few instructions in before it blows > up. Here are the process' registers, and the _entire_ contents of the user > mode stack: > > R0 10 > R1 0 > R2 0 > R3 0 > R4 34 > R5 444 > SP 177760 > PC 010210 > > 060: 00 20 01 10 14 17 071554 00 Okay, I've had a bit of time in front of the machine to repro this and take a look. What I actually see is: R0 10 R1 0 R2 0 R3 0 R4 0 R5 34 R6 141774 PC 000254 (remember, for the last, this will have been after taking a trap to 250, where I have the usual "BR .+2; HALT" catcher installed) Also, memory at 060 (PA:164060) is all zeros as far as the eye can see... I have a bit of water on the basement floor right now after the recent rains here, which is complicating setup of the LA. There's a big puddle where I normally place it...
Re: PDP-11/45 RSTS/E boot problem
> On Feb 4, 2019, at 5:47 PM, Ethan Dicks wrote: > > On Mon, Feb 4, 2019 at 3:15 PM Paul Koning via cctalk > wrote: >>> On Feb 4, 2019, at 3:43 PM, Noel Chiappa via cctalk >>> wrote: >> That translates into "the problem depends on the physical address of the >> code being executed". >> >> The obvious answer is bad memory. > > At the board level, yes. Deeper, it could be bad memory bits or bad > memory decode. > > A simple ones-and-zeros test can identify bad DRAMs. It's not as > likely to find bad decoding, which could result in the same chips > tested more than once and other chips not tested at all. I've found > both problems in real MS11-L boards I have for my stack of 11/04 and > 11/34s I'm testing. > > ISTR in the DEC world, they were good about that. I have multiple > papertapes for the PDP-8, that I think were literally called "ones and > zeros" and "memory address" tests. I would think XXDP has something > similar in terms of progressive tests that expect the previous stage > passed. Yes, one of the standard early PDP-11 memory tests is the "no duplicate address test". paul
Re: PDP-11/45 RSTS/E boot problem
On Mon, Feb 4, 2019 at 3:15 PM Paul Koning via cctalk wrote: > > On Feb 4, 2019, at 3:43 PM, Noel Chiappa via cctalk > > wrote: > That translates into "the problem depends on the physical address of the code > being executed". > > The obvious answer is bad memory. At the board level, yes. Deeper, it could be bad memory bits or bad memory decode. A simple ones-and-zeros test can identify bad DRAMs. It's not as likely to find bad decoding, which could result in the same chips tested more than once and other chips not tested at all. I've found both problems in real MS11-L boards I have for my stack of 11/04 and 11/34s I'm testing. ISTR in the DEC world, they were good about that. I have multiple papertapes for the PDP-8, that I think were literally called "ones and zeros" and "memory address" tests. I would think XXDP has something similar in terms of progressive tests that expect the previous stage passed. -ethan
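To make the distinction between the two kinds of test concrete, here is a toy Python simulation. It is not any DEC diagnostic: `FaultyMemory` is a made-up model in which the address decoder ignores one address line, so two addresses select the same cell.

```python
class FaultyMemory:
    """Simulated word memory; optionally one address bit is ignored."""

    def __init__(self, size, dead_address_bit=None):
        self.size = size
        self.cells = [0] * size
        # With a dead address bit, addr and addr ^ (1 << bit) alias.
        self.mask = -1 if dead_address_bit is None else ~(1 << dead_address_bit)

    def write(self, addr, value):
        self.cells[addr & self.mask] = value & 0xFFFF

    def read(self, addr):
        return self.cells[addr & self.mask]


def ones_and_zeros_test(mem):
    """Write then immediately read each word; returns failing addresses."""
    bad = []
    for pattern in (0x0000, 0xFFFF):
        for addr in range(mem.size):
            mem.write(addr, pattern)
            if mem.read(addr) != pattern:
                bad.append(addr)
    return bad


def address_test(mem):
    """Store each word's own address everywhere, then verify everything."""
    for addr in range(mem.size):
        mem.write(addr, addr)
    return [addr for addr in range(mem.size) if mem.read(addr) != addr]
```

With `FaultyMemory(64, dead_address_bit=4)`, the ones-and-zeros test reports nothing, because every cell that gets selected stores data correctly; the address test fails on half the locations, because aliased addresses overwrite each other. That is the decode fault Ethan describes: some chips tested twice, others never touched.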
Re: PDP-11/45 RSTS/E boot problem
On 2/4/2019 11:34 AM, Fritz Mueller via cctech wrote: > >> On Feb 4, 2019, at 9:13 AM, Jay Jaeger wrote: >> >> If he hasn't already, if Fritz has more than one memory board, he might >> try swapping them to see if that changes anything. > > I only have a 128kw MS11-L here to work with, unfortunately. It's been > through a bunch of recent troubleshooting (tracking down and replacing failed > DRAMs). I *think* it's pretty solid at this point (also passing some of the > hairier DEC diagnostics) but... > > I'd be happy to try out a different memory board if anybody was interested in > sending out a loaner? (I'm in the SF Bay area). > Well, it turns out I have a couple of spares, but maybe someone closer would be easier (Madison, WI 53711). I have an MS11-LB, 64Kw, M7891-BB and two MS11-LD, 128Kw, M7891-DB and an M7891-D? So, two of these are newer revisions (rather than M7891-xA) - I have no idea what the difference is. On that last one I probably didn't record whether it was D, DB or DA. I also have quite a few RK05 packs and would be willing to sell one (and I have boxes to ship boards and packs in). The ones I am most willing to part with would need their open/close springs removed, as they are broken and dangerous to the platter in their current condition, but are otherwise fine. I would just remove the spring. $20 for a pack is what I usually price them at, plus shipping. (PayPal, preferably) The board would be a loan (with compensation for time spent if it is bad *and* gets fixed) ;). Let me know - might take me a couple of days to hunt the board down and remove the spring and re-test the pack and pack everything up and ship it. (in my 11/34 which runs @rkunix V6 just fine. ;)) JRJ
Re: PDP-11/45 RSTS/E boot problem
> On Feb 4, 2019, at 3:43 PM, Noel Chiappa via cctalk > wrote: > >> From: Wayne S > >> it might be a wonky filesystem. ... >> The corruption probably came because the entire disk was going bad. > > This theory is contradicted by the fact (mentioned several times, including in > the message you were replying to) that doing a plain 'ls' bombs, but 'sleep > 300 &; ls' works fine. That translates into "the problem depends on the physical address of the code being executed". The obvious answer is bad memory. Another possibility occurs to me: bad bits in the MMU (UISAR0 register if I remember correctly). Bad memory is likely to show up with a few bits wrong; if UISAR0 has a stuck bit so the "plain" case maps incorrectly you'd expect to come up with execution that looks nothing at all like what was intended. paul
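A quick Python illustration of Paul's stuck-bit idea, under the usual 18-bit relocation rule (the page address register counts 64-byte blocks and is added to the in-page part of the virtual address). The helper names are invented for this sketch, and the register value 0o1640 is only an example, chosen so that virtual 060 maps to physical 164060.

```python
def physical_address(par, va):
    """18-bit physical address from a page address register and a VA."""
    # PAR is in units of 64 bytes; the low 13 bits of the VA are the
    # displacement within the page.
    return ((par << 6) + (va & 0o17777)) & 0o777777


def force_bit(value, bit, stuck_at):
    """Model a register bit that always reads as `stuck_at` (0 or 1)."""
    mask = 1 << bit
    return value | mask if stuck_at else value & ~mask


def mapping_error(par, va, bit, stuck_at):
    """Difference between intended and actual physical address."""
    actual = physical_address(force_bit(par, bit, stuck_at), va)
    return actual - physical_address(par, va)
```

If the intended register value already has the stuck bit in its stuck state, `mapping_error` is zero and the process runs normally; otherwise every fetch lands a fixed distance away in someone else's memory. Whether a given load address trips the fault depends on where the process was placed, which would fit the observation that 'ls' alone fails while 'sleep 300 &; ls' works.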
RE: PDP-11/45 RSTS/E boot problem
> From: Wayne S > it might be a wonky filesystem. ... > The corruption probably came because the entire disk was going bad. This theory is contradicted by the fact (mentioned several times, including in the message you were replying to) that doing a plain 'ls' bombs, but 'sleep 300 &; ls' works fine. Noel
Re: PDP-11/45 RSTS/E boot problem
> From: Jay Jaeger > This sort of situation, where DEC diagnostics run OK but UNIX has issues > was reported to be not all that uncommon - to the point where the urban > legend was that some DEC FE's would fire up Unix V6 as a sort of system > exerciser. Amusing! Never heard that; our -11's were never under maintenance, so DEC FE's never worked on them. > Make a copy of ls, and see if the copy also fails It acts just like the original; fails when run by itself, runs OK when 'sleep' is also running (in the background). > From: Bob Smith > We finally had the cpu backplane replaced Ow. Not an option for Fritz, I expect. (I dunno - anyone have a spare /45 backplane?) > From: Paul Koning > Is there any way to attach a logic analyzer to various data paths on > this machine? I had suggested to Fritz that the symptoms led me to believe that it was time to deploy a LA, especially since the MM trap only occurs once between him typing 'ls' and the process failing - i.e. easy to trigger on. He offered me the option of looking at the IR or at the UNIBUS - I opted for the IR so we can see _exactly_ what the machine _thinks_ it is doing! No report back yet, though. Noel
Re: PDP-11/45 RSTS/E boot problem
On Mon, Feb 4, 2019 at 11:35 AM Paul Koning via cctalk < cctalk@classiccmp.org> wrote: > The spec says allowed tolerances are +/- 5%. He knew the reality for > correct operation was -0%, +5%, so he tweaked all the supplies to read a > hair above nominal. > Ah, the good old days... I recall our PDP-11 tech tweaking +5V from 5.05V to 4.95V and back again to demonstrate that tiny differences matter a lot on one of the cranky 11/23+'s we had after I made a particularly unhelpful teenage smart ass remark... The 11/23+ wouldn't boot at the slightly lower than full voltage. It was cranky for a couple of years. Before that unit was retired, the 5V and 12V rails had been tweaked up to 5.2V and 12.5V in an effort to keep the system alive long enough to transition customers from it to a new Vax installed to deal with the growth in demand... In the end, we put that 11/23+ back in service for developers with a different disk controller and it was happy back at +5.05V / +12.1V... Warner