Re: SunBlade 100 will not boot from HDD (6.8 and newer)

2021-12-01 Thread Mark Kettenis
> Date: Thu, 25 Nov 2021 15:14:27 -0700
> From: Ted Bullock 

Hi Ted,

I made some small changes to the code and committed it.  I chose to
use device_type in the end since that better reflects the intention of
disabling devices that use the Open Firmware driver for IDE devices.

Thanks for getting to the bottom of this!

> On 2021-11-25 5:22 a.m., Ted Bullock wrote:
> > On 2021-11-25 5:05 a.m., Mark Kettenis wrote:
> >> From: Ted Bullock 
> >>> On 2021-11-25 3:55 a.m., Otto Moerbeek wrote:
> > +    parent = OF_parent(handle);
> 
>  I think the OF_parent call can go inside the !strcmp(buf, "block")
>  block.
> >>>
> >>>
> >>> I worried that the following re-assignment of the handle would cause
> >>> problems, so I chose an order of operations and placed the parent call
> >>> before this re-assignment.  The reason for my concern is based on not
> >>> knowing the proper distinction between OF_finddevice and OF_open in the
> >>> boot prom itself (they are both kind of opaque to me and just send bits
> >>> for interpretation into the boot prom).
> >>>
> >>> if ((handle = OF_open(fname)) == -1) {
> >>> DNPRINTF(BOOT_D_OFDEV, "devopen: open of %s failed\n", fname);
> >>> return ENXIO;
> >>> }
> >>>
> >>> It's possible that there is no meaningful distinction between when
> >>> handle is re-assigned midway through the function, in which case you are
> >>> correct it could absolutely be moved later in the function call.
> >>> Notably there is quite a bit of variable re-use and re-purposing in the
> >>> devopen function so I chose the place I thought would be safest.
> >>
> >> I think you have a point.  Might make sense to have a separate
> >> variables here.  Probably best to keep using handle for the OF_open()
> >> result, and use "node" or "dhandle" for the OF_finddevice() result.
> >>
> > 
> > More to the point I just went to test since I was now awake and thinking
> > about the problem some more and moving the call to OF_parent past the
> > reassignment with OF_open does indeed break booting. I presume that the
> > handle return value is contextually entirely different between
> > OF_finddevice and OF_open.
> 
> I deliberately left the call to OF_parent earlier in the function.  This
> can be moved closer to the IDE comparison logic only if the variable
> handle isn't re-assigned. Maybe do this in a future patch.
> 
> Also I left the query to name rather than device_type because it seems
> more correct from my reading of the ieee1275 spec. The device tree in OF
> uses the name as the identifier in the tree, not the device_type.
> 
> I do agree that checking if the parent exists is more correct and I have
> done that.
> 
> This patch currently works for me.
> 
> Index: arch/sparc64/stand/ofwboot/ofdev.c
> ===
> RCS file: /cvs/src/sys/arch/sparc64/stand/ofwboot/ofdev.c,v
> retrieving revision 1.31
> diff -u -p -u -p -r1.31 ofdev.c
> --- arch/sparc64/stand/ofwboot/ofdev.c9 Dec 2020 18:10:19 -   
> 1.31
> +++ arch/sparc64/stand/ofwboot/ofdev.c25 Nov 2021 22:04:01 -
> @@ -520,7 +520,7 @@ devopen(struct open_file *of, const char
>   char fname[256];
>   char buf[DEV_BSIZE];
>   struct disklabel label;
> - int handle, part;
> + int handle, part, parent;
>   int error = 0;
>  #ifdef SOFTRAID
>   char volno;
> @@ -649,6 +649,9 @@ devopen(struct open_file *of, const char
>  #endif
>   if ((handle = OF_finddevice(fname)) == -1)
>   return ENOENT;
> +
> + parent = OF_parent(handle);
> +
>   DNPRINTF(BOOT_D_OFDEV, "devopen: found %s\n", fname);
>   if (OF_getprop(handle, "name", buf, sizeof buf) < 0)
>   return ENXIO;
> @@ -685,6 +688,17 @@ devopen(struct open_file *of, const char
>  
>   of->f_dev = devsw;
>   of->f_devdata = 
> +
> + /* Some PROMS have bugged writing code for ide block devices */
> + if (parent &&
> + OF_getprop(parent, "name", buf, sizeof buf) > 0 &&
> + !strcmp(buf, "ide"))
> + {
> + DNPRINTF(BOOT_D_OFDEV, 
> + "devopen: Disable writing for IDE block device\n");
> + of->f_flags |= F_NOWRITE;
> + }
> +
>  #ifdef SPARC_BOOT_UFS
>   bcopy(_system_ufs, _system[nfsys++], sizeof 
> file_system[0]);
>   bcopy(_system_ufs2, _system[nfsys++], sizeof 
> file_system[0]);
> Index: arch/sparc64/stand/ofwboot/vers.c
> ===
> RCS file: /cvs/src/sys/arch/sparc64/stand/ofwboot/vers.c,v
> retrieving revision 1.22
> diff -u -p -u -p -r1.22 vers.c
> --- arch/sparc64/stand/ofwboot/vers.c 9 Dec 2020 18:10:19 -   1.22
> +++ arch/sparc64/stand/ofwboot/vers.c 25 Nov 2021 22:04:01 -
> @@ -1 +1 @@
> -const char version[] = "1.21";
> +const char version[] = "1.22";
> Index: 

Re: SunBlade 100 will not boot from HDD (6.8 and newer)

2021-11-25 Thread Ted Bullock
On 2021-11-25 5:22 a.m., Ted Bullock wrote:
> On 2021-11-25 5:05 a.m., Mark Kettenis wrote:
>> From: Ted Bullock 
>>> On 2021-11-25 3:55 a.m., Otto Moerbeek wrote:
> +    parent = OF_parent(handle);

 I think the OF_parent call can go inside the !strcmp(buf, "block")
 block.
>>>
>>>
>>> I worried that the following re-assignment of the handle would cause
>>> problems, so I chose an order of operations and placed the parent call
>>> before this re-assignment.  The reason for my concern is based on not
>>> knowing the proper distinction between OF_finddevice and OF_open in the
>>> boot prom itself (they are both kind of opaque to me and just send bits
>>> for interpretation into the boot prom).
>>>
>>> if ((handle = OF_open(fname)) == -1) {
>>> DNPRINTF(BOOT_D_OFDEV, "devopen: open of %s failed\n", fname);
>>> return ENXIO;
>>> }
>>>
>>> It's possible that there is no meaningful distinction between when
>>> handle is re-assigned midway through the function, in which case you are
>>> correct it could absolutely be moved later in the function call.
>>> Notably there is quite a bit of variable re-use and re-purposing in the
>>> devopen function so I chose the place I thought would be safest.
>>
>> I think you have a point.  Might make sense to have a separate
>> variables here.  Probably best to keep using handle for the OF_open()
>> result, and use "node" or "dhandle" for the OF_finddevice() result.
>>
> 
> More to the point I just went to test since I was now awake and thinking
> about the problem some more and moving the call to OF_parent past the
> reassignment with OF_open does indeed break booting. I presume that the
> handle return value is contextually entirely different between
> OF_finddevice and OF_open.

I deliberately left the call to OF_parent earlier in the function.  This
can be moved closer to the IDE comparison logic only if the variable
handle isn't re-assigned. Maybe do this in a future patch.

Also I left the query to name rather than device_type because it seems
more correct from my reading of the ieee1275 spec. The device tree in OF
uses the name as the identifier in the tree, not the device_type.

I do agree that checking if the parent exists is more correct and I have
done that.

This patch currently works for me.

Index: arch/sparc64/stand/ofwboot/ofdev.c
===
RCS file: /cvs/src/sys/arch/sparc64/stand/ofwboot/ofdev.c,v
retrieving revision 1.31
diff -u -p -u -p -r1.31 ofdev.c
--- arch/sparc64/stand/ofwboot/ofdev.c  9 Dec 2020 18:10:19 -   1.31
+++ arch/sparc64/stand/ofwboot/ofdev.c  25 Nov 2021 22:04:01 -
@@ -520,7 +520,7 @@ devopen(struct open_file *of, const char
char fname[256];
char buf[DEV_BSIZE];
struct disklabel label;
-   int handle, part;
+   int handle, part, parent;
int error = 0;
 #ifdef SOFTRAID
char volno;
@@ -649,6 +649,9 @@ devopen(struct open_file *of, const char
 #endif
if ((handle = OF_finddevice(fname)) == -1)
return ENOENT;
+
+   parent = OF_parent(handle);
+
DNPRINTF(BOOT_D_OFDEV, "devopen: found %s\n", fname);
if (OF_getprop(handle, "name", buf, sizeof buf) < 0)
return ENXIO;
@@ -685,6 +688,17 @@ devopen(struct open_file *of, const char
 
of->f_dev = devsw;
of->f_devdata = 
+
+   /* Some PROMS have bugged writing code for ide block devices */
+   if (parent &&
+   OF_getprop(parent, "name", buf, sizeof buf) > 0 &&
+   !strcmp(buf, "ide"))
+   {
+   DNPRINTF(BOOT_D_OFDEV, 
+   "devopen: Disable writing for IDE block device\n");
+   of->f_flags |= F_NOWRITE;
+   }
+
 #ifdef SPARC_BOOT_UFS
bcopy(_system_ufs, _system[nfsys++], sizeof 
file_system[0]);
bcopy(_system_ufs2, _system[nfsys++], sizeof 
file_system[0]);
Index: arch/sparc64/stand/ofwboot/vers.c
===
RCS file: /cvs/src/sys/arch/sparc64/stand/ofwboot/vers.c,v
retrieving revision 1.22
diff -u -p -u -p -r1.22 vers.c
--- arch/sparc64/stand/ofwboot/vers.c   9 Dec 2020 18:10:19 -   1.22
+++ arch/sparc64/stand/ofwboot/vers.c   25 Nov 2021 22:04:01 -
@@ -1 +1 @@
-const char version[] = "1.21";
+const char version[] = "1.22";
Index: lib/libsa/fchmod.c
===
RCS file: /cvs/src/sys/lib/libsa/fchmod.c,v
retrieving revision 1.1
diff -u -p -u -p -r1.1 fchmod.c
--- lib/libsa/fchmod.c  3 Aug 2019 15:22:17 -   1.1
+++ lib/libsa/fchmod.c  25 Nov 2021 22:04:03 -
@@ -53,6 +53,11 @@ fchmod(int fd, mode_t m)
errno = EOPNOTSUPP;
return (-1);
}
+   /* writing is broken or unsupported */
+   if (f->f_flags & F_NOWRITE) {
+

Re: SunBlade 100 will not boot from HDD (6.8 and newer)

2021-11-25 Thread Ted Bullock

On 2021-11-25 5:05 a.m., Mark Kettenis wrote:

From: Ted Bullock 

On 2021-11-25 3:55 a.m., Otto Moerbeek wrote:

+   parent = OF_parent(handle);


I think the OF_parent call can go inside the !strcmp(buf, "block") block.



I worried that the following re-assignment of the handle would cause
problems, so I chose an order of operations and placed the parent call
before this re-assignment.  The reason for my concern is based on not
knowing the proper distinction between OF_finddevice and OF_open in the
boot prom itself (they are both kind of opaque to me and just send bits
for interpretation into the boot prom).

if ((handle = OF_open(fname)) == -1) {
DNPRINTF(BOOT_D_OFDEV, "devopen: open of %s failed\n", fname);
return ENXIO;
}

It's possible that there is no meaningful distinction between when
handle is re-assigned midway through the function, in which case you are
correct it could absolutely be moved later in the function call.
Notably there is quite a bit of variable re-use and re-purposing in the
devopen function so I chose the place I thought would be safest.


I think you have a point.  Might make sense to have a separate
variables here.  Probably best to keep using handle for the OF_open()
result, and use "node" or "dhandle" for the OF_finddevice() result.



More to the point I just went to test since I was now awake and thinking 
about the problem some more and moving the call to OF_parent past the 
reassignment with OF_open does indeed break booting. I presume that the 
handle return value is contextually entirely different between 
OF_finddevice and OF_open.


In fact, I will need to go through the recovery process again to install 
a working boot block :P So further testing will need to be when it's not 
5:22 AM and people won't yell at me.


Honestly there are so many ifdefs, gotos and variable re tasking in 
these functions to handle building distinct boot binaries that the 
entire file is worryingly fragile.


Anyway one thing at a time, I don't think its appropriate to be renaming 
variables in this patch but I am willing to do that work on subsequent 
ones if people want to see that done.


--
Ted Bullock 



Re: SunBlade 100 will not boot from HDD (6.8 and newer)

2021-11-25 Thread Mark Kettenis
> Date: Thu, 25 Nov 2021 04:28:55 -0700
> Content-Language: en-US
> Cc: Theo de Raadt ,
> Mark Kettenis , bugs@openbsd.org
> From: Ted Bullock 
> Content-Type: text/plain; charset=UTF-8; format=flowed
> 
> On 2021-11-25 3:55 a.m., Otto Moerbeek wrote:
> >> +  parent = OF_parent(handle);
> > 
> > I think the OF_parent call can go inside the !strcmp(buf, "block") block.
> 
> 
> I worried that the following re-assignment of the handle would cause 
> problems, so I chose an order of operations and placed the parent call 
> before this re-assignment.  The reason for my concern is based on not 
> knowing the proper distinction between OF_finddevice and OF_open in the 
> boot prom itself (they are both kind of opaque to me and just send bits 
> for interpretation into the boot prom).
> 
> if ((handle = OF_open(fname)) == -1) {
>   DNPRINTF(BOOT_D_OFDEV, "devopen: open of %s failed\n", fname);
>   return ENXIO;
> }
> 
> It's possible that there is no meaningful distinction between when 
> handle is re-assigned midway through the function, in which case you are 
> correct it could absolutely be moved later in the function call. 
> Notably there is quite a bit of variable re-use and re-purposing in the 
> devopen function so I chose the place I thought would be safest.

I think you have a point.  Might make sense to have a separate
variables here.  Probably best to keep using handle for the OF_open()
result, and use "node" or "dhandle" for the OF_finddevice() result.

Cheers,

Mark



Re: SunBlade 100 will not boot from HDD (6.8 and newer)

2021-11-25 Thread Ted Bullock

On 2021-11-25 4:29 a.m., Mark Kettenis wrote:

And I think it is better to use "device_type" here.


https://github.com/openbios/openboot/blob/master/obp/dev/ide/ide.fth

The obp source list both identifiers, however, the 1275 spec says:

3.2.1.1 Node names

Each node in the device tree is identified by a node *name* using the 
following notation:


driver-name@unit-address:device-arguments

which led me to thinking that using the name identifier was more 
appropriate than device_type.


Later the 1275 spec says

“name” - Standard property name to define the name of the package

in contrast to

“device_type” - Standard property name to specify the implemented interface

 Anyway I think either is appropriate and will work, however I'll 
follow you're guidance on it, as I'm much less experienced here.



So based on what I see in diskprobe.c, I would probably write this as:

 parent = OF_parent(handle)
if (parent && OF_getprop(parent, "device_type", buf,
sizeof(buf)) > 0 && strcmp(buf, "ide") == 0)
of->f_flags |= F_NOWRITE;

>

Can you test that Ted and mail a new diff?


Yes, it will have to wait until later today when I'm at the machine 
though.  Thanks for taking the time to review in the meantime.


--
Ted Bullock 



Re: SunBlade 100 will not boot from HDD (6.8 and newer)

2021-11-25 Thread Mark Kettenis
> Date: Thu, 25 Nov 2021 11:55:24 +0100
> From: Otto Moerbeek 
> 
> On Wed, Nov 24, 2021 at 02:48:29PM -0700, Ted Bullock wrote:
> 
> > On 2021-11-20 2:49 p.m., Ted Bullock wrote:
> > > This patch disables fchmod in the bootblock for IDE drives on sparc64
> > > 
> > > I can confirm that this allows my sunblade 100 to boot -current
> > Hi folks,
> > 
> > I'm requesting to have the patch I sent in a previous mail reviewed and
> > committed. I also attached the patch here.
> > 
> > (for reference)
> > https://marc.info/?l=openbsd-bugs=163744498300496=2
> > 
> > This should affect only a small set of older sparc64 machines that use
> > IDE drives instead of SCSI.  At least two such machines types are not
> > working and are corrupting filesystems when booted with recent versions
> > of OpenBSD since writing calls were introduced to the second stage boot
> > block. Tested on an Ultra 5 and Sun Blade 100. The impact is that these
> > older sparc64 machines will potentially re-use entropy from previous
> > boot cycles if a new random seed file was not written between restarts.
> > 
> > -- 
> > Ted Bullock 
> 
> Tests ok (using a SCSI disk, I still see the correct behaviour, I do
> not have IDE disks so I cannot test that).
> 
> One comment inline,
> 
>   -Otto
> 
> > Index: arch/sparc64/stand/ofwboot/ofdev.c
> > ===
> > RCS file: /cvs/src/sys/arch/sparc64/stand/ofwboot/ofdev.c,v
> > retrieving revision 1.31
> > diff -u -p -u -p -r1.31 ofdev.c
> > --- arch/sparc64/stand/ofwboot/ofdev.c  9 Dec 2020 18:10:19 -   
> > 1.31
> > +++ arch/sparc64/stand/ofwboot/ofdev.c  20 Nov 2021 12:36:10 -
> > @@ -520,7 +520,7 @@ devopen(struct open_file *of, const char
> > char fname[256];
> > char buf[DEV_BSIZE];
> > struct disklabel label;
> > -   int handle, part;
> > +   int handle, part, parent;
> > int error = 0;
> >  #ifdef SOFTRAID
> > char volno;
> > @@ -649,6 +649,9 @@ devopen(struct open_file *of, const char
> >  #endif
> > if ((handle = OF_finddevice(fname)) == -1)
> > return ENOENT;
> > +
> > +   parent = OF_parent(handle);
> 
> I think the OF_parent call can go inside the !strcmp(buf, "block") block.
> 
> 
> > +
> > DNPRINTF(BOOT_D_OFDEV, "devopen: found %s\n", fname);
> > if (OF_getprop(handle, "name", buf, sizeof buf) < 0)
> > return ENXIO;
> > @@ -685,6 +688,13 @@ devopen(struct open_file *of, const char
> >  
> > of->f_dev = devsw;
> > of->f_devdata = 
> > +
> > +   /* Some PROMS have bugged writing code for ide block devices */
> 
> here I mean

Also, it is probably a good idea to check that there is a parent.  And I think 
it is better to use "device_type" here.

> > +   if (OF_getprop(parent, "name", buf, sizeof buf) < 0)
> > +   return ENXIO;
> > +   if (!strcmp(buf, "ide"))
> > +   of->f_flags |= F_NOWRITE;
> > +

So based on what I see in diskprobe.c, I would probably write this as:

parent = OF_parent(handle)
if (parent && OF_getprop(parent, "device_type", buf,
sizeof(buf)) > 0 && strcmp(buf, "ide") == 0)
of->f_flags |= F_NOWRITE;

Can you test that Ted and mail a new diff?

> >  #ifdef SPARC_BOOT_UFS
> > bcopy(_system_ufs, _system[nfsys++], sizeof 
> > file_system[0]);
> > bcopy(_system_ufs2, _system[nfsys++], sizeof 
> > file_system[0]);
> > Index: arch/sparc64/stand/ofwboot/vers.c
> > ===
> > RCS file: /cvs/src/sys/arch/sparc64/stand/ofwboot/vers.c,v
> > retrieving revision 1.22
> > diff -u -p -u -p -r1.22 vers.c
> > --- arch/sparc64/stand/ofwboot/vers.c   9 Dec 2020 18:10:19 -   
> > 1.22
> > +++ arch/sparc64/stand/ofwboot/vers.c   20 Nov 2021 12:36:11 -
> > @@ -1 +1 @@
> > -const char version[] = "1.21";
> > +const char version[] = "1.22";
> > Index: lib/libsa/fchmod.c
> > ===
> > RCS file: /cvs/src/sys/lib/libsa/fchmod.c,v
> > retrieving revision 1.1
> > diff -u -p -u -p -r1.1 fchmod.c
> > --- lib/libsa/fchmod.c  3 Aug 2019 15:22:17 -   1.1
> > +++ lib/libsa/fchmod.c  20 Nov 2021 12:36:12 -
> > @@ -53,6 +53,11 @@ fchmod(int fd, mode_t m)
> > errno = EOPNOTSUPP;
> > return (-1);
> > }
> > +   /* writing is broken or unsupported */
> > +   if (f->f_flags & F_NOWRITE) {
> > +   errno = EOPNOTSUPP;
> > +   return (-1);
> > +   }
> >  
> > errno = (f->f_ops->fchmod)(f, m);
> > return (0);
> > Index: lib/libsa/stand.h
> > ===
> > RCS file: /cvs/src/sys/lib/libsa/stand.h,v
> > retrieving revision 1.71
> > diff -u -p -u -p -r1.71 stand.h
> > --- lib/libsa/stand.h   24 Oct 2021 17:49:19 -  1.71
> > +++ 

Re: SunBlade 100 will not boot from HDD (6.8 and newer)

2021-11-25 Thread Ted Bullock

On 2021-11-25 3:55 a.m., Otto Moerbeek wrote:

+   parent = OF_parent(handle);


I think the OF_parent call can go inside the !strcmp(buf, "block") block.



I worried that the following re-assignment of the handle would cause 
problems, so I chose an order of operations and placed the parent call 
before this re-assignment.  The reason for my concern is based on not 
knowing the proper distinction between OF_finddevice and OF_open in the 
boot prom itself (they are both kind of opaque to me and just send bits 
for interpretation into the boot prom).


if ((handle = OF_open(fname)) == -1) {
DNPRINTF(BOOT_D_OFDEV, "devopen: open of %s failed\n", fname);
return ENXIO;
}

It's possible that there is no meaningful distinction between when 
handle is re-assigned midway through the function, in which case you are 
correct it could absolutely be moved later in the function call. 
Notably there is quite a bit of variable re-use and re-purposing in the 
devopen function so I chose the place I thought would be safest.


--
Ted Bullock 



Re: SunBlade 100 will not boot from HDD (6.8 and newer)

2021-11-25 Thread Otto Moerbeek
On Wed, Nov 24, 2021 at 02:48:29PM -0700, Ted Bullock wrote:

> On 2021-11-20 2:49 p.m., Ted Bullock wrote:
> > This patch disables fchmod in the bootblock for IDE drives on sparc64
> > 
> > I can confirm that this allows my sunblade 100 to boot -current
> Hi folks,
> 
> I'm requesting to have the patch I sent in a previous mail reviewed and
> committed. I also attached the patch here.
> 
> (for reference)
> https://marc.info/?l=openbsd-bugs=163744498300496=2
> 
> This should affect only a small set of older sparc64 machines that use
> IDE drives instead of SCSI.  At least two such machines types are not
> working and are corrupting filesystems when booted with recent versions
> of OpenBSD since writing calls were introduced to the second stage boot
> block. Tested on an Ultra 5 and Sun Blade 100. The impact is that these
> older sparc64 machines will potentially re-use entropy from previous
> boot cycles if a new random seed file was not written between restarts.
> 
> -- 
> Ted Bullock 

Tests ok (using a SCSI disk, I still see the correct behaviour, I do
not have IDE disks so I cannot test that).

One comment inline,

-Otto

> Index: arch/sparc64/stand/ofwboot/ofdev.c
> ===
> RCS file: /cvs/src/sys/arch/sparc64/stand/ofwboot/ofdev.c,v
> retrieving revision 1.31
> diff -u -p -u -p -r1.31 ofdev.c
> --- arch/sparc64/stand/ofwboot/ofdev.c9 Dec 2020 18:10:19 -   
> 1.31
> +++ arch/sparc64/stand/ofwboot/ofdev.c20 Nov 2021 12:36:10 -
> @@ -520,7 +520,7 @@ devopen(struct open_file *of, const char
>   char fname[256];
>   char buf[DEV_BSIZE];
>   struct disklabel label;
> - int handle, part;
> + int handle, part, parent;
>   int error = 0;
>  #ifdef SOFTRAID
>   char volno;
> @@ -649,6 +649,9 @@ devopen(struct open_file *of, const char
>  #endif
>   if ((handle = OF_finddevice(fname)) == -1)
>   return ENOENT;
> +
> + parent = OF_parent(handle);

I think the OF_parent call can go inside the !strcmp(buf, "block") block.


> +
>   DNPRINTF(BOOT_D_OFDEV, "devopen: found %s\n", fname);
>   if (OF_getprop(handle, "name", buf, sizeof buf) < 0)
>   return ENXIO;
> @@ -685,6 +688,13 @@ devopen(struct open_file *of, const char
>  
>   of->f_dev = devsw;
>   of->f_devdata = 
> +
> + /* Some PROMS have bugged writing code for ide block devices */

here I mean

> + if (OF_getprop(parent, "name", buf, sizeof buf) < 0)
> + return ENXIO;
> + if (!strcmp(buf, "ide"))
> + of->f_flags |= F_NOWRITE;
> +
>  #ifdef SPARC_BOOT_UFS
>   bcopy(_system_ufs, _system[nfsys++], sizeof 
> file_system[0]);
>   bcopy(_system_ufs2, _system[nfsys++], sizeof 
> file_system[0]);
> Index: arch/sparc64/stand/ofwboot/vers.c
> ===
> RCS file: /cvs/src/sys/arch/sparc64/stand/ofwboot/vers.c,v
> retrieving revision 1.22
> diff -u -p -u -p -r1.22 vers.c
> --- arch/sparc64/stand/ofwboot/vers.c 9 Dec 2020 18:10:19 -   1.22
> +++ arch/sparc64/stand/ofwboot/vers.c 20 Nov 2021 12:36:11 -
> @@ -1 +1 @@
> -const char version[] = "1.21";
> +const char version[] = "1.22";
> Index: lib/libsa/fchmod.c
> ===
> RCS file: /cvs/src/sys/lib/libsa/fchmod.c,v
> retrieving revision 1.1
> diff -u -p -u -p -r1.1 fchmod.c
> --- lib/libsa/fchmod.c3 Aug 2019 15:22:17 -   1.1
> +++ lib/libsa/fchmod.c20 Nov 2021 12:36:12 -
> @@ -53,6 +53,11 @@ fchmod(int fd, mode_t m)
>   errno = EOPNOTSUPP;
>   return (-1);
>   }
> + /* writing is broken or unsupported */
> + if (f->f_flags & F_NOWRITE) {
> + errno = EOPNOTSUPP;
> + return (-1);
> + }
>  
>   errno = (f->f_ops->fchmod)(f, m);
>   return (0);
> Index: lib/libsa/stand.h
> ===
> RCS file: /cvs/src/sys/lib/libsa/stand.h,v
> retrieving revision 1.71
> diff -u -p -u -p -r1.71 stand.h
> --- lib/libsa/stand.h 24 Oct 2021 17:49:19 -  1.71
> +++ lib/libsa/stand.h 20 Nov 2021 12:36:12 -
> @@ -107,10 +107,11 @@ struct open_file {
>  extern struct open_file files[];
>  
>  /* f_flags values */
> -#define  F_READ  0x0001  /* file opened for reading */
> -#define  F_WRITE 0x0002  /* file opened for writing */
> -#define  F_RAW   0x0004  /* raw device open - no file system */
> -#define F_NODEV  0x0008  /* network open - no device */
> +#define F_READ  0x0001 /* file opened for reading */
> +#define F_WRITE 0x0002 /* file opened for writing */
> +#define F_RAW   0x0004 /* raw device open - no file system */
> +#define F_NODEV 0x0008 /* network open - no device */
> 

Re: SunBlade 100 will not boot from HDD (6.8 and newer)

2021-11-24 Thread Otto Moerbeek
On Wed, Nov 24, 2021 at 11:29:09PM +0100, Mark Kettenis wrote:

> > Date: Wed, 24 Nov 2021 14:48:29 -0700
> > From: Ted Bullock 
> 
> Hi Ted,
> 
> Yes, I think your idea is sane and the diff looks good.  But I'd like
> to test it myself before I commit it and I haven't found the time to
> do so yet.
> 
> Do remind me if I haven't done so in a week or so.

Same for me.

-Otto

> 
> Thanks,
> 
> Mark
> 
> > On 2021-11-20 2:49 p.m., Ted Bullock wrote:
> > > This patch disables fchmod in the bootblock for IDE drives on sparc64
> > > 
> > > I can confirm that this allows my sunblade 100 to boot -current
> > Hi folks,
> > 
> > I'm requesting to have the patch I sent in a previous mail reviewed and
> > committed. I also attached the patch here.
> > 
> > (for reference)
> > https://marc.info/?l=openbsd-bugs=163744498300496=2
> > 
> > This should affect only a small set of older sparc64 machines that use
> > IDE drives instead of SCSI.  At least two such machines types are not
> > working and are corrupting filesystems when booted with recent versions
> > of OpenBSD since writing calls were introduced to the second stage boot
> > block. Tested on an Ultra 5 and Sun Blade 100. The impact is that these
> > older sparc64 machines will potentially re-use entropy from previous
> > boot cycles if a new random seed file was not written between restarts.
> > 
> > -- 
> > Ted Bullock 
> > 
> > [2:text/plain Show Save:sunblade100.patch (3kB)]
> > 



Re: SunBlade 100 will not boot from HDD (6.8 and newer)

2021-11-24 Thread Mark Kettenis
> Date: Wed, 24 Nov 2021 14:48:29 -0700
> From: Ted Bullock 

Hi Ted,

Yes, I think your idea is sane and the diff looks good.  But I'd like
to test it myself before I commit it and I haven't found the time to
do so yet.

Do remind me if I haven't done so in a week or so.

Thanks,

Mark

> On 2021-11-20 2:49 p.m., Ted Bullock wrote:
> > This patch disables fchmod in the bootblock for IDE drives on sparc64
> > 
> > I can confirm that this allows my sunblade 100 to boot -current
> Hi folks,
> 
> I'm requesting to have the patch I sent in a previous mail reviewed and
> committed. I also attached the patch here.
> 
> (for reference)
> https://marc.info/?l=openbsd-bugs=163744498300496=2
> 
> This should affect only a small set of older sparc64 machines that use
> IDE drives instead of SCSI.  At least two such machines types are not
> working and are corrupting filesystems when booted with recent versions
> of OpenBSD since writing calls were introduced to the second stage boot
> block. Tested on an Ultra 5 and Sun Blade 100. The impact is that these
> older sparc64 machines will potentially re-use entropy from previous
> boot cycles if a new random seed file was not written between restarts.
> 
> -- 
> Ted Bullock 
> 
> [2:text/plain Show Save:sunblade100.patch (3kB)]
> 



Re: SunBlade 100 will not boot from HDD (6.8 and newer)

2021-11-24 Thread Ted Bullock
On 2021-11-20 2:49 p.m., Ted Bullock wrote:
> This patch disables fchmod in the bootblock for IDE drives on sparc64
> 
> I can confirm that this allows my sunblade 100 to boot -current
Hi folks,

I'm requesting to have the patch I sent in a previous mail reviewed and
committed. I also attached the patch here.

(for reference)
https://marc.info/?l=openbsd-bugs=163744498300496=2

This should affect only a small set of older sparc64 machines that use
IDE drives instead of SCSI.  At least two such machines types are not
working and are corrupting filesystems when booted with recent versions
of OpenBSD since writing calls were introduced to the second stage boot
block. Tested on an Ultra 5 and Sun Blade 100. The impact is that these
older sparc64 machines will potentially re-use entropy from previous
boot cycles if a new random seed file was not written between restarts.

-- 
Ted Bullock Index: arch/sparc64/stand/ofwboot/ofdev.c
===
RCS file: /cvs/src/sys/arch/sparc64/stand/ofwboot/ofdev.c,v
retrieving revision 1.31
diff -u -p -u -p -r1.31 ofdev.c
--- arch/sparc64/stand/ofwboot/ofdev.c  9 Dec 2020 18:10:19 -   1.31
+++ arch/sparc64/stand/ofwboot/ofdev.c  20 Nov 2021 12:36:10 -
@@ -520,7 +520,7 @@ devopen(struct open_file *of, const char
char fname[256];
char buf[DEV_BSIZE];
struct disklabel label;
-   int handle, part;
+   int handle, part, parent;
int error = 0;
 #ifdef SOFTRAID
char volno;
@@ -649,6 +649,9 @@ devopen(struct open_file *of, const char
 #endif
if ((handle = OF_finddevice(fname)) == -1)
return ENOENT;
+
+   parent = OF_parent(handle);
+
DNPRINTF(BOOT_D_OFDEV, "devopen: found %s\n", fname);
if (OF_getprop(handle, "name", buf, sizeof buf) < 0)
return ENXIO;
@@ -685,6 +688,13 @@ devopen(struct open_file *of, const char
 
of->f_dev = devsw;
of->f_devdata = 
+
+   /* Some PROMS have bugged writing code for ide block devices */
+   if (OF_getprop(parent, "name", buf, sizeof buf) < 0)
+   return ENXIO;
+   if (!strcmp(buf, "ide"))
+   of->f_flags |= F_NOWRITE;
+
 #ifdef SPARC_BOOT_UFS
bcopy(_system_ufs, _system[nfsys++], sizeof 
file_system[0]);
bcopy(_system_ufs2, _system[nfsys++], sizeof 
file_system[0]);
Index: arch/sparc64/stand/ofwboot/vers.c
===
RCS file: /cvs/src/sys/arch/sparc64/stand/ofwboot/vers.c,v
retrieving revision 1.22
diff -u -p -u -p -r1.22 vers.c
--- arch/sparc64/stand/ofwboot/vers.c   9 Dec 2020 18:10:19 -   1.22
+++ arch/sparc64/stand/ofwboot/vers.c   20 Nov 2021 12:36:11 -
@@ -1 +1 @@
-const char version[] = "1.21";
+const char version[] = "1.22";
Index: lib/libsa/fchmod.c
===
RCS file: /cvs/src/sys/lib/libsa/fchmod.c,v
retrieving revision 1.1
diff -u -p -u -p -r1.1 fchmod.c
--- lib/libsa/fchmod.c  3 Aug 2019 15:22:17 -   1.1
+++ lib/libsa/fchmod.c  20 Nov 2021 12:36:12 -
@@ -53,6 +53,11 @@ fchmod(int fd, mode_t m)
errno = EOPNOTSUPP;
return (-1);
}
+   /* writing is broken or unsupported */
+   if (f->f_flags & F_NOWRITE) {
+   errno = EOPNOTSUPP;
+   return (-1);
+   }
 
errno = (f->f_ops->fchmod)(f, m);
return (0);
Index: lib/libsa/stand.h
===
RCS file: /cvs/src/sys/lib/libsa/stand.h,v
retrieving revision 1.71
diff -u -p -u -p -r1.71 stand.h
--- lib/libsa/stand.h   24 Oct 2021 17:49:19 -  1.71
+++ lib/libsa/stand.h   20 Nov 2021 12:36:12 -
@@ -107,10 +107,11 @@ struct open_file {
 extern struct open_file files[];
 
 /* f_flags values */
-#defineF_READ  0x0001  /* file opened for reading */
-#defineF_WRITE 0x0002  /* file opened for writing */
-#defineF_RAW   0x0004  /* raw device open - no file system */
-#define F_NODEV0x0008  /* network open - no device */
+#define F_READ  0x0001 /* file opened for reading */
+#define F_WRITE 0x0002 /* file opened for writing */
+#define F_RAW   0x0004 /* raw device open - no file system */
+#define F_NODEV 0x0008 /* network open - no device */
+#define F_NOWRITE   0x0010 /* bootblock writing broken or unsupported */
 
 #define isupper(c) ((c) >= 'A' && (c) <= 'Z')
 #define islower(c) ((c) >= 'a' && (c) <= 'z')


Re: SunBlade 100 will not boot from HDD (6.8 and newer)

2021-11-20 Thread Ted Bullock
On 2021-11-18 6:25 p.m., Ted Bullock wrote:
> On 2021-11-15 2:41 p.m., Ted Bullock wrote:
>> On 2021-11-15 2:21 p.m., Theo de Raadt wrote:
>>> I do want fchmod for random reuse on most sparc64 machines.  But if
>>> OFW "write"
>>> support isn't working on some machines, we should nop it out.  Luckily
>>> this is
>>> in MD code.
>>
>> It's probably reasonable for any version older than OpenBoot 4.17.1
>> should get this treatment (This is the most recent ofw for the blade
>> 100/150)
>>
> 
> I've had some difficulty trying to figure out how the OFW actually is
> programmed, specifically how to query the PROM version.
> 

So after reading the openfirmware documents for a few days and learning to
read forth, I determined that using the sun prom fcode firmware-version
will not do the trick here. The data it returns is not granular enough and
the 1275 spec says it's obsolete too. 

That said, it got me to thinking about something else I had reported, I've
only encountered the problem on older sun gear using IDE devices (ultra 5
and blade 100). I think the writing bug on the sun proms is specific to 
IDE drives (though the nature of the bug is still a mystery to me).

Theo, you said that you want the majority of the sparc64 systems to 
continue to be able to prevent re-use of the seed on boot, so I propose
that OpenBSD just no-op the fchmod call for IDE devices only, which should
work around the problem for the broken(?) prom systems and everything else
that uses some variant of SCSI and its decedents will work as before.

This patch disables fchmod in the bootblock for IDE drives on sparc64

I can confirm that this allows my sunblade 100 to boot -current

Index: arch/sparc64/stand/ofwboot/ofdev.c
===
RCS file: /cvs/src/sys/arch/sparc64/stand/ofwboot/ofdev.c,v
retrieving revision 1.31
diff -u -p -u -p -r1.31 ofdev.c
--- arch/sparc64/stand/ofwboot/ofdev.c  9 Dec 2020 18:10:19 -   1.31
+++ arch/sparc64/stand/ofwboot/ofdev.c  20 Nov 2021 12:36:10 -
@@ -520,7 +520,7 @@ devopen(struct open_file *of, const char
char fname[256];
char buf[DEV_BSIZE];
struct disklabel label;
-   int handle, part;
+   int handle, part, parent;
int error = 0;
 #ifdef SOFTRAID
char volno;
@@ -649,6 +649,9 @@ devopen(struct open_file *of, const char
 #endif
if ((handle = OF_finddevice(fname)) == -1)
return ENOENT;
+
+   parent = OF_parent(handle);
+
DNPRINTF(BOOT_D_OFDEV, "devopen: found %s\n", fname);
if (OF_getprop(handle, "name", buf, sizeof buf) < 0)
return ENXIO;
@@ -685,6 +688,13 @@ devopen(struct open_file *of, const char
 
of->f_dev = devsw;
of->f_devdata = 
+
+   /* Some PROMS have bugged writing code for ide block devices */
+   if (OF_getprop(parent, "name", buf, sizeof buf) < 0)
+   return ENXIO;
+   if (!strcmp(buf, "ide"))
+   of->f_flags |= F_NOWRITE;
+
 #ifdef SPARC_BOOT_UFS
bcopy(_system_ufs, _system[nfsys++], sizeof 
file_system[0]);
bcopy(_system_ufs2, _system[nfsys++], sizeof 
file_system[0]);
Index: arch/sparc64/stand/ofwboot/vers.c
===
RCS file: /cvs/src/sys/arch/sparc64/stand/ofwboot/vers.c,v
retrieving revision 1.22
diff -u -p -u -p -r1.22 vers.c
--- arch/sparc64/stand/ofwboot/vers.c   9 Dec 2020 18:10:19 -   1.22
+++ arch/sparc64/stand/ofwboot/vers.c   20 Nov 2021 12:36:11 -
@@ -1 +1 @@
-const char version[] = "1.21";
+const char version[] = "1.22";
Index: lib/libsa/fchmod.c
===
RCS file: /cvs/src/sys/lib/libsa/fchmod.c,v
retrieving revision 1.1
diff -u -p -u -p -r1.1 fchmod.c
--- lib/libsa/fchmod.c  3 Aug 2019 15:22:17 -   1.1
+++ lib/libsa/fchmod.c  20 Nov 2021 12:36:12 -
@@ -53,6 +53,11 @@ fchmod(int fd, mode_t m)
errno = EOPNOTSUPP;
return (-1);
}
+   /* writing is broken or unsupported */
+   if (f->f_flags & F_NOWRITE) {
+   errno = EOPNOTSUPP;
+   return (-1);
+   }
 
errno = (f->f_ops->fchmod)(f, m);
return (0);
Index: lib/libsa/stand.h
===
RCS file: /cvs/src/sys/lib/libsa/stand.h,v
retrieving revision 1.71
diff -u -p -u -p -r1.71 stand.h
--- lib/libsa/stand.h   24 Oct 2021 17:49:19 -  1.71
+++ lib/libsa/stand.h   20 Nov 2021 12:36:12 -
@@ -107,10 +107,11 @@ struct open_file {
 extern struct open_file files[];
 
 /* f_flags values */
-#defineF_READ  0x0001  /* file opened for reading */
-#defineF_WRITE 0x0002  /* file opened for writing */
-#defineF_RAW   0x0004  /* raw device open - no file system */
-#define F_NODEV 

Re: SunBlade 100 will not boot from HDD (6.8 and newer)

2021-11-18 Thread Ted Bullock
On 2021-11-15 2:41 p.m., Ted Bullock wrote:
> On 2021-11-15 2:21 p.m., Theo de Raadt wrote:
>> I do want fchmod for random reuse on most sparc64 machines.  But if
>> OFW "write"
>> support isn't working on some machines, we should nop it out.  Luckily
>> this is
>> in MD code.
> 
> It's probably reasonable for any version older than OpenBoot 4.17.1
> should get this treatment (This is the most recent ofw for the blade
> 100/150)
> 

I've had some difficulty trying to figure out how the OFW actually is
programmed, specifically how to query the PROM version.

Anyway, I've included a patch with how I think to work around the broken
prom? Does this approach work? Obviously it needs the logic to check for
the prom version and I do see some whitespace issues too.

Index: arch/sparc64/stand/ofwboot/ofdev.c
===
RCS file: /cvs/src/sys/arch/sparc64/stand/ofwboot/ofdev.c,v
retrieving revision 1.31
diff -u -p -u -p -r1.31 ofdev.c
--- arch/sparc64/stand/ofwboot/ofdev.c  9 Dec 2020 18:10:19 -   1.31
+++ arch/sparc64/stand/ofwboot/ofdev.c  19 Nov 2021 01:07:40 -
@@ -666,6 +666,17 @@ devopen(struct open_file *of, const char
ofdev.handle = handle;
ofdev.type = OFDEV_DISK;
ofdev.bsize = DEV_BSIZE;
+   
+   /*
+* Older OpenBoot PROMS seem to have bugged writing code
+* At least OpenBoot Version Equal or Older to 4.17.1
+* How do you ask OFW this?
+*/
+   if (1) {
+   printf("devopen: Flagging %s with F_NOWRITE\n", fname);
+   of->f_flags |= F_NOWRITE;
+   }
+
if (!strcmp(buf, "block")) {
error = load_disklabel(, );
if (error && error != ERDLAB)
Index: lib/libsa/fchmod.c
===
RCS file: /cvs/src/sys/lib/libsa/fchmod.c,v
retrieving revision 1.1
diff -u -p -u -p -r1.1 fchmod.c
--- lib/libsa/fchmod.c  3 Aug 2019 15:22:17 -   1.1
+++ lib/libsa/fchmod.c  19 Nov 2021 01:07:48 -
@@ -53,6 +53,12 @@ fchmod(int fd, mode_t m)
errno = EOPNOTSUPP;
return (-1);
}
+   /* writing is broken or unsupported */
+   if (f->f_flags & F_NOWRITE) {
+   errno = EOPNOTSUPP;
+   printf("Awooga Don't Write\n");
+   return (-1);
+   }
 
errno = (f->f_ops->fchmod)(f, m);
return (0);
Index: lib/libsa/stand.h
===
RCS file: /cvs/src/sys/lib/libsa/stand.h,v
retrieving revision 1.71
diff -u -p -u -p -r1.71 stand.h
--- lib/libsa/stand.h   24 Oct 2021 17:49:19 -  1.71
+++ lib/libsa/stand.h   19 Nov 2021 01:07:48 -
@@ -111,6 +111,7 @@ extern struct open_file files[];
 #defineF_WRITE 0x0002  /* file opened for writing */
 #defineF_RAW   0x0004  /* raw device open - no file system */
 #define F_NODEV0x0008  /* network open - no device */
+#define F_NOWRITE  0x0010  /* bootblock writing is broken or unsupported */
 
 #define isupper(c) ((c) >= 'A' && (c) <= 'Z')
 #define islower(c) ((c) >= 'a' && (c) <= 'z')


-- 
Ted Bullock 



Re: SunBlade 100 will not boot from HDD (6.8 and newer)

2021-11-15 Thread Ted Bullock

On 2021-11-15 2:21 p.m., Theo de Raadt wrote:

I do want fchmod for random reuse on most sparc64 machines.  But if OFW "write"
support isn't working on some machines, we should nop it out.  Luckily this is
in MD code.


It's probably reasonable for any version older than OpenBoot 4.17.1 
should get this treatment (This is the most recent ofw for the blade 
100/150)



--
Ted Bullock 



Re: SunBlade 100 will not boot from HDD (6.8 and newer)

2021-11-15 Thread Theo de Raadt
Ted Bullock  wrote:

> On 2021-11-15 12:28 p.m., Theo de Raadt wrote:
> > Otto Moerbeek  wrote:
> >> So you build code with just the struct fs_ops change, while
> >> keeping everything else the same?
> >>
> >> Very puzzling...
> > indeed, you are losing us Ted.
> > Just comment out the fchmod() call.
> > Does it work then?
> 
> Yes the system boots fine with the fchmod call commented out on the
> most recent -current snapshot.
> 
> Regarding my other comment, I was looking for other calls to fchmod in
> the bootblock and thought I saw it was being called elsewhere from the 
> commentary on the patch notes but actually searching bootblock
> directory says otherwise.

>From a MI viewpoint, fchmod is used in two different places:

sys/stand/boot.c

/* Prevent re-upgrade: chmod a-x bsd.upgrade */
if (isupgrade) {
struct stat st;

if (fstat(fd, ) == 0) {
st.st_mode &= 
~(S_IXUSR|S_IXGRP|S_IXOTH);
if (fchmod(fd, st.st_mode) == -1)

and

if (sb.st_mode & S_ISTXT) {
printf("NOTE: random seed is being reused.\n");
error = -1;
goto done;
}
fchmod(fd, sb.st_mode | S_ISTXT);


But sparc64 doesn't use the MI boot.c, so it only contains the latter fchmod()
operation in the MD sys/arch/sparc64/stand/ofwboot/boot.c
I didn't bother adding the re-upgrade chunk to ofwboot, because one day I want
sparc64 to use MI boot.c


I do want fchmod for random reuse on most sparc64 machines.  But if OFW "write"
support isn't working on some machines, we should nop it out.  Luckily this is
in MD code.



Re: SunBlade 100 will not boot from HDD (6.8 and newer)

2021-11-15 Thread Ted Bullock



On 2021-11-15 12:28 p.m., Theo de Raadt wrote:

Otto Moerbeek  wrote:

So you build code with just the struct fs_ops change, while
keeping everything else the same?

Very puzzling...


indeed, you are losing us Ted.

Just comment out the fchmod() call.

Does it work then?


Yes the system boots fine with the fchmod call commented out on the most 
recent -current snapshot.


Regarding my other comment, I was looking for other calls to fchmod in 
the bootblock and thought I saw it was being called elsewhere from the 
commentary on the patch notes but actually searching bootblock directory 
says otherwise.


--
Ted Bullock 



Re: SunBlade 100 will not boot from HDD (6.8 and newer)

2021-11-15 Thread Theo de Raadt
Otto Moerbeek  wrote:

> On Mon, Nov 15, 2021 at 11:59:07AM -0700, Ted Bullock wrote:
> 
> > On 2021-11-14 7:57 p.m., Theo de Raadt wrote:
> > > But I fear the "write" operation was never been tested by the Sun,
> > > or maybe we are not doing something
> > > 
> > > but only on earlier generations
> > I perused the change log of stand.h and revision 1.67 also might/will
> > trigger the same bug on upgrading between versions for these machines.
> 
> So you build code with just the struct fs_ops change, while
> keeping everything else the same?
> 
> Very puzzling...

indeed, you are losing us Ted.

Just comment out the fchmod() call.

Does it work then?



Re: SunBlade 100 will not boot from HDD (6.8 and newer)

2021-11-15 Thread Otto Moerbeek
On Mon, Nov 15, 2021 at 11:59:07AM -0700, Ted Bullock wrote:

> On 2021-11-14 7:57 p.m., Theo de Raadt wrote:
> > But I fear the "write" operation was never been tested by the Sun,
> > or maybe we are not doing something
> > 
> > but only on earlier generations
> I perused the change log of stand.h and revision 1.67 also might/will
> trigger the same bug on upgrading between versions for these machines.

So you build code with just the struct fs_ops change, while
keeping everything else the same?

Very puzzling...

-Otto



Re: SunBlade 100 will not boot from HDD (6.8 and newer)

2021-11-15 Thread Ted Bullock

On 2021-11-14 7:57 p.m., Theo de Raadt wrote:

But I fear the "write" operation was never been tested by the Sun,
or maybe we are not doing something

but only on earlier generations
I perused the change log of stand.h and revision 1.67 also might/will 
trigger the same bug on upgrading between versions for these machines.



--
Ted Bullock 



Re: SunBlade 100 will not boot from HDD (6.8 and newer)

2021-11-14 Thread Theo de Raadt
Ted Bullock  wrote:

> On 2021-11-14 7:36 p.m., Theo de Raadt wrote:
> > So maybe the "write" operation is broken on these older roms.
> 
> Is there any more verbose way of seeing what's happening as it starts?
> I'm fairly certain that ddb isn't available since this is happening 
> before the kernel is even loaded. I'm kind of worried that the only
> way to see into what is happening is with a logic analyzer but maybe
> there is higher level stuff in the obp?

hey, you can add printf's to the bootblock code

But I fear the "write" operation was never been tested by the Sun,
or maybe we are not doing something

but only on earlier generations



Re: SunBlade 100 will not boot from HDD (6.8 and newer)

2021-11-14 Thread Ted Bullock

On 2021-11-14 7:36 p.m., Theo de Raadt wrote:

So maybe the "write" operation is broken on these older roms.


Is there any more verbose way of seeing what's happening as it starts? 
I'm fairly certain that ddb isn't available since this is happening 
before the kernel is even loaded. I'm kind of worried that the only way 
to see into what is happening is with a logic analyzer but maybe there 
is higher level stuff in the obp?


--
Ted Bullock 



Re: SunBlade 100 will not boot from HDD (6.8 and newer)

2021-11-14 Thread Theo de Raadt
Ted Bullock  wrote:

> OK more specifics,
> 
> I repeated my tests from last night and the boot does indeed fail
> after revision 1.35 on sys/arch/sparc64/stand/ofboot/boot.c
> 
> > 
> >> Random was added to the bootblocks before 6.7.  But then code was
> >> added to fchmod the random file, so that it would not be reused.
> >> That fchmod is a write-to-disk operation.
> 
> 
> Rebooting with command: boot
> Boot device: disk  File and args:
> OpenBSD IEEE 1275 Bootblock 2.0
> ..>> OpenBSD BOOT 1.19
> Trying bsd...
> /
> Evaluating:
> Flushbuf error
> read header: short read (only 0 of 64)
> 
> >> Is this write code triggering a bug in the bootblocks or is it actually
> >> broken on it's own?
> >>
> >> see loadrandom()
> > 
> 
> the fchmod call is what's triggers the error, I don't know the answer
> to your question though.

So maybe the "write" operation is broken on these older roms.

I wonder if we should do a version-check, and fail the write operation.
The bootblocks will be OK if the fchmod fails.



Re: SunBlade 100 will not boot from HDD (6.8 and newer)

2021-11-14 Thread Ted Bullock

OK more specifics,

I repeated my tests from last night and the boot does indeed fail after 
revision 1.35 on sys/arch/sparc64/stand/ofboot/boot.c





Random was added to the bootblocks before 6.7.  But then code was
added to fchmod the random file, so that it would not be reused.
That fchmod is a write-to-disk operation.



Rebooting with command: boot
Boot device: disk  File and args:
OpenBSD IEEE 1275 Bootblock 2.0
..>> OpenBSD BOOT 1.19
Trying bsd...
/
Evaluating:
Flushbuf error
read header: short read (only 0 of 64)


Is this write code triggering a bug in the bootblocks or is it actually
broken on it's own?

see loadrandom()




the fchmod call is what's triggers the error, I don't know the answer to 
your question though.


I have to step away for the evening but for those who are interested 
here is the process I used today and a similar one yesterday to 
determine when the problem was introduced.


Process
===

On separate NFS machine to prevent downloading entire tree every test
Checkout openbsd cvs src using
cvs -q checkout -PD"" src
Update source to specific date using from inside the src directory
cvs -q up -PdD"2020-05-26 16:35 UTC"

1. Install OpenBSD snapshot
 - Before first boot, switch to shell and download ofwboot from 
http://cdn.openbsd.org/pub/OpenBSD/6.7/sparc64/

 - replace the ofwboot file on the new root fs (/mnt when in bsd.rd)
2. boot new install
3. umount /usr/src (default)
4. mount  /usr/src
5. cd /usr/src/sys
6. make obj
7. cd arch/sparc64/stand
8. make
9. make install
10. installboot -v wd0
11. reboot


Checking "2020-06-05 09:15 UTC"

- This patch is just before Otto introduced bootblock 2.1

Result : Failure, fault introduced before this change



Bisecting date between 2020-06-05 and 2020-05-26

Checking 2020-06-01

Result : Failure, fault introduced before this date



Bisecting date between 2020-05-26 and 2020-06-01

Checking 2020-05-28

Result : Failure, fault introduced before this date



Bisecting date between 2020-05-26 and 2020-05-28

Checking 2020-05-27

Result : Failure, fault introduced before this date



Resetting to 2020-05-26

Checking 2020-05-26

Result : Boots



Updating to "2020-05-26 16:29 UTC"

Result : Boots



Updating to "2020-05-26 16:35 UTC"

Result : Fault

introduced/triggered at R1.35 on sys/arch/sparc64/stand/ofboot/boot.c


--
Ted Bullock 



Re: SunBlade 100 will not boot from HDD (6.8 and newer)

2021-11-14 Thread Ted Bullock

Heyo Theo (and Mark),

On 2021-11-13 4:12 p.m., Theo de Raadt wrote:

If you say that 6.7 works, then the changes in the logs to look at are
the following.

It isn't that many diffs, see below.


I went through every diff to sys/arch/sparc64/stand between release 6.7 
(working) until things broke. Note that I was updating the entire tree, 
not just /stand (more on this in a little bit).


The point at which things break is by the time the tree hits the 
following patch:



revision 1.10
date: 2020/06/05 09:16:13;  author: otto;  state: Exp;  lines: +11 -3;  
commitid: UX6Wf38M2cpPhjSY;
Qemu does not like we load ofwboot on top of the bootblocks and as
miod notes this is actually a regression I introduced in the latest
commit. So stop doing that, even if it works on real hardware.
ok kettenis@


Specifically, the problem starts between May 26, and June 5 patches. 
There are no changes to files in sys/arch/sparc64/stand between those 
dates but there are changes in the larger tree impacting ffs during that 
window, perhaps the problem is there.



Random was added to the bootblocks before 6.7.  But then code was
added to fchmod the random file, so that it would not be reused.
That fchmod is a write-to-disk operation.

Is this write code triggering a bug in the bootblocks or is it actually
broken on it's own?

see loadrandom()


I didn't see the bug manifest by the time it got to the patches you are 
referring to here ^  However since the filesystems are getting mangled 
on the -current version I assume that there is some garbage being 
written to disk where it doesn't belong, possibly because of this code. 
Presumably it wouldn't do that if things were read only.



ofwboot/elf64_exec.c

+* Kernels up to and including OpenBSD 6.7
+* check for an exact match if the length.
+* Lie here to make sure we can still boot
+* older kernels with softraid.

The format of a structure on the disk changed.  Are you using softraid?
It might matter.


No I'm not using softraid, all the tests I've been doing have been with 
the default install and default partition layout.


...

Regarding the rest of what you wrote, I haven't taken a look yet, I was 
just working through the tedious process of updating and patching 
looking for when things break so far.


Before I go, I said that I had been updating the entire tree to make 
changes outside of the /stand path would get pulled in if they prompted 
build changes. I should mention that I see that libsa updated between 
may 26 and otto's patch on June 5, 2020 that caused ofwboot to rebuild 
as well due to a patch to some of it's files that aren't in /stand


U src/sys/ufs/ffs/ffs_alloc.c
U src/sys/ufs/ffs/ffs_vfsops.c
U src/sys/ufs/ufs/dinode.h
U src/sys/ufs/ufs/inode.h

It's possible that the problem is in one of those changes and I haven't 
gone through yet their history yet. I specifically mention this because 
I have tested the latest forth bootblock with an older version of 
ofwboot and the system has started fine, so it might not actually be 
otto's change that introduce the bug? More testing IS required.


--
Ted Bullock 



Re: SunBlade 100 will not boot from HDD (6.8 and newer)

2021-11-13 Thread Theo de Raadt
Nope, this is also needed ahead of time

  # cd /usr/src/sys
  # make obj

otherwise, sys/stand and sys/lib don't have obj@ subtrees, and the
later compile will reach into the wrong places.

Experienced developers can forget about this because they always have
obj subtrees...

> # cd /usr/src/sys/arch/sparc64/stand
> # make obj
> # make
> 
> you should end up with an ofwboot binary in
> /usr/src/sys/arch/sparc64/stand/ofwboot/obj that you can test.




Re: SunBlade 100 will not boot from HDD (6.8 and newer)

2021-11-13 Thread Theo de Raadt
If you say that 6.7 works, then the changes in the logs to look at are
the following.

It isn't that many diffs, see below.

Compiling the bootblocks is not hard

  cd /usr/src
  make obj  # YES FROM THE TOP OF THE TREE
  cd sys/arch/sparc64/stand
  make
  make install

in the forth bootblk:

revision 1.10
date: 2020/06/05 09:16:13;  author: otto;  state: Exp;  lines: +11 -3;  
commitid: UX6Wf38M2cpPhjSY;
Qemu does not like we load ofwboot on top of the bootblocks and as
miod notes this is actually a regression I introduced in the latest
commit. So stop doing that, even if it works on real hardware.
ok kettenis@

revision 1.9
date: 2020/04/02 06:06:22;  author: otto;  state: Exp;  lines: +529 -373;  
commitid: DiwabwH5EeuMX7D2;
Change bootblocks to be able to read from ffs1, ffs2 and softraid.
ffs2 part from eeh at netbsd. ok deraadt@


ofwboot/boot.c

Random was added to the bootblocks before 6.7.  But then code was
added to fchmod the random file, so that it would not be reused.
That fchmod is a write-to-disk operation.

Is this write code triggering a bug in the bootblocks or is it actually
broken on it's own?

see loadrandom()



ofwboot/elf64_exec.c

+* Kernels up to and including OpenBSD 6.7
+* check for an exact match if the length.
+* Lie here to make sure we can still boot
+* older kernels with softraid.

The format of a structure on the disk changed.  Are you using softraid?
It might matter.



There is also some change for the symbol table.  The commit history
explains it.



ofwboot/ofdev.c

date: 2018/03/29 08:12:58;  author: stsp;  state: Exp;  lines: +3 -3;  
commitid: CrCKSPquGDvYCD8s;
Make sparc64 ofwboot open the softraid boot chunk early on and keep the
handle open until a kernel has been loaded from the softraid volume.

This works around an apparent memory leak in the firmware on T5220
which fails to load an ldom guest kernel from softraid with a "Fast
Data Access MMU Miss" trap after many OF_open()/OF_close() calls.
This problem goes away when we call OF_open()/OF_close() just once
instead of for every block we want to read.

Crank ofwboot version to 1.10.

Make sure to keep working boot media around before upgrading!
Softraid boot of an T5220 ldom guest (CRYPTO), and a v240 (RAID 1),
have been tested and are known to work. Please report issues to bugs@.

ok kettenis@




Those are the ones to try to back out, independently, or sometimes a
chain of commits.  Figure out which of these problems it is.


It is very *POSSIBLE* this is trigging bugs in the PROMS on the older
machines, but we don't use them anymore, so that's how we got into
this situation.



Re: SunBlade 100 will not boot from HDD (6.8 and newer)

2021-11-13 Thread Mark Kettenis
> Date: Sat, 13 Nov 2021 15:45:19 -0700
> From: Ted Bullock 
> 
> On 2021-11-04 4:04 p.m., Ted Bullock wrote:
> > On 2021-11-02 5:36 p.m., Ted Bullock wrote:
> >> Reporting an issue that is causing my SunBlade 100 (ultrasparc64) 
> >> workstation to be unable too boot. 
> > 
> > I've identified the malfunctioning component that is causing installs 
> > after 6.7 to fail for this system, although I'm not certain yet which 
> > patch broke things for this system.
> I've done further testing on ofwboot. The version in 7.0 and snapshots 
> is very broken, and appears to entirely mangle the root filesystem of 
> sparc64 machines that use wd(4) boot devices to the point that it's 
> necessary to reinstall entirely between testing attempts. (I've had a 
> friend test on an ultra 10 and encountered the same fault)
> 
> The current workaround I am using is to install and before the first 
> boot, manually copy a working version of ofwboot from the 6.7 release. 
> Note that ofwboot from 6.8 and 6.9, while broken, don't seem to mangle 
> the filesystem entirely like the version in snapshots does.
> 
> I gather from the lack of response to the bug reports that the platform 
> and impacted systems are low priority, so I don't want to impose too 
> much on the devs here. I would like to bisect the various patches to 
> ofwboot to find the fault but I need some guidance on how to build the 
> program (in sys/arch/sparc64/stand) since my first several attempts 
> don't seem to have produced a working binary.
> 
> Thanks guys

I suspect that ofdev.c rev 1.30 is what breaks things.  So maybe try a
version from before 2020/05/26 and one after and see if things break.

If you check out the source tree with -D2020/05/25 and do:

# cd /usr/src/sys/arch/sparc64/stand
# make obj
# make

you should end up with an ofwboot binary in
/usr/src/sys/arch/sparc64/stand/ofwboot/obj that you can test.






Re: SunBlade 100 will not boot from HDD (6.8 and newer)

2021-11-13 Thread Ted Bullock

On 2021-11-04 4:04 p.m., Ted Bullock wrote:

On 2021-11-02 5:36 p.m., Ted Bullock wrote:
Reporting an issue that is causing my SunBlade 100 (ultrasparc64) 
workstation to be unable too boot. 


I've identified the malfunctioning component that is causing installs 
after 6.7 to fail for this system, although I'm not certain yet which 
patch broke things for this system.
I've done further testing on ofwboot. The version in 7.0 and snapshots 
is very broken, and appears to entirely mangle the root filesystem of 
sparc64 machines that use wd(4) boot devices to the point that it's 
necessary to reinstall entirely between testing attempts. (I've had a 
friend test on an ultra 10 and encountered the same fault)


The current workaround I am using is to install and before the first 
boot, manually copy a working version of ofwboot from the 6.7 release. 
Note that ofwboot from 6.8 and 6.9, while broken, don't seem to mangle 
the filesystem entirely like the version in snapshots does.


I gather from the lack of response to the bug reports that the platform 
and impacted systems are low priority, so I don't want to impose too 
much on the devs here. I would like to bisect the various patches to 
ofwboot to find the fault but I need some guidance on how to build the 
program (in sys/arch/sparc64/stand) since my first several attempts 
don't seem to have produced a working binary.


Thanks guys

--
Ted Bullock 



Re: SunBlade 100 will not boot from HDD (6.8 and newer)

2021-11-04 Thread Ted Bullock

On 2021-11-02 5:36 p.m., Ted Bullock wrote:
Reporting an issue that is causing my SunBlade 100 (ultrasparc64) 
workstation to be unable too boot.  


I've identified the malfunctioning component that is causing installs 
after 6.7 to fail for this system, although I'm not certain yet which 
patch broke things for this system.


ofwboot on sparc64 was altered between versions 6.7 and 6.8 in such a 
way that it breaks booting from hdd (wd device). The bootblock itself or 
part 1 of the boot seems fine but something causes the hdd to be 
unreadable to the kernel when loaded with the phase 2 bootloaders.


I've installed the ofwboot version from Nov 3 snapshot into a working 
6.7 install and was able to reproduce the same failure mode that I'm 
seeing on -current today with a fresh install.


Further, I tried booting a 6.7 install with the kernel from the Nov. 3 
snapshot and the system boots normally with the -current kernel.


I'm not clear which patch broke the bootloader for the hdd install but 
there don't seem to be many between 6.7 and 6.8 that impacted ofwboot.


I'm guessing that there aren't many users running with a standard IDE 
hard drive on sparc64 anymore so the failure mode hasn't been caught or 
at least looked into. I don't feel particularly competent at the moment 
to diagnose what is causing the bug in ofwboot itself, so I'm hoping 
someone can/will spare a bit of their valuable time to help identify the 
problem.


--
Ted Bullock