Hi, Our IIJ team is planning to run OpenBSD on HPE ProLiant DL20 Gen10 server.
Yasuoka-san and I are testing on this host and found that `atactl sd0 readattr`
doesn't show any messages, just quits silently.

'sd0' disk is shown in dmesg as follows.

```
sd0 at scsibus1 targ 0 lun 0: <ATA, MK000480GWSSC, HPG3> naa.5002538e01735e7f
sd0: 457862MB, 512 bytes/sector, 937703088 sectors, thin
```

To see what's happening to atactl(8), we added printf code like this.

```
diff --git a/sbin/atactl/atactl.c b/sbin/atactl/atactl.c
index 85dfced8c9a..2babdafffc7 100644
--- a/sbin/atactl/atactl.c
+++ b/sbin/atactl/atactl.c
@@ -1660,10 +1660,11 @@ device_attr(int argc, char *argv[])
        if (attr_val.revision != attr_thr.revision) {
                /*
                 * Non standard vendor implementation.
                 * Return, since we don't know how to use this.
                 */
+               printf("atactl: revision mismatch SMART_READ=0x%02x 
SMART_THREASHOLD=0x%02x\n", attr_val.revision, attr_thr.revision);
                return;
        }

        attr = attr_val.attribute;
        thr = attr_thr.threshold;

```

And got following message.

```
$ doas ./atactl sd0 readattr
atactl: revision mismatch SMART_READ=0x01 SMART_THREASHOLD=0x00

```

These 2 revisions of 'attr_val' and 'attr_thr' are different on this disk.
The comment says that it's wrong vendor implementation but I can see
'smartctl' shows the attributes as follows. NetBSD's atactl doesn't
see these 2 revisions are same but checks each checksum is valid.

```
SMART Attributes Data Structure revision number: 1
  Vendor Specific SMART Attributes with Thresholds:
  ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  
WHEN_FAILED RAW_VALUE
    1 Raw_Read_Error_Rate     0x000f   200   200   002    Pre-fail  Always      
 -       0
    5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always      
 -       0
    9 Power_On_Hours          0x0032   099   099   000    Old_age   Always      
 -       3138
  173 Unknown_Attribute       0x0033   099   099   005    Pre-fail  Always      
 -       8
  175 Program_Fail_Count_Chip 0x0033   100   100   001    Pre-fail  Always      
 -       0
  180 Unused_Rsvd_Blk_Cnt_Tot 0x003b   200   200   097    Pre-fail  Always      
 -       0
  194 Temperature_Celsius     0x0022   080   075   000    Old_age   Always      
 -       20
  196 Reallocated_Event_Count 0x0033   100   100   005    Pre-fail  Always      
 -       0
  202 Unknown_SSD_Attribute   0x0033   100   100   010    Pre-fail  Always      
 -       0
```

So, I propose to change the method to verify checksums instead of revision 
checks.

```
diff --git a/sbin/atactl/atactl.c b/sbin/atactl/atactl.c
index 85dfced8c9a..1f77460ce3d 100644
--- a/sbin/atactl/atactl.c
+++ b/sbin/atactl/atactl.c
@@ -1657,13 +1657,11 @@ device_attr(int argc, char *argv[])
        req.datalen = sizeof(attr_thr);
        ata_command(&req);

-       if (attr_val.revision != attr_thr.revision) {
-               /*
-                * Non standard vendor implementation.
-                * Return, since we don't know how to use this.
-                */
-               return;
-       }
+       if (smart_cksum((u_int8_t *)&attr_val, sizeof(attr_val)) != 0)
+               errx(1, "Checksum mismatch (attr_val)");
+
+       if (smart_cksum((u_int8_t *)&attr_thr, sizeof(attr_thr)) != 0)
+               errx(1, "Checksum mismatch (attr_thr)");

        attr = attr_val.attribute;
        thr = attr_thr.threshold;

```

With this patch, we got following result.

```
Attributes table revision: 1
  ID      Attribute name                  Threshold       Value   Raw
    1     Raw Read Error Rate                2            200     0x000000000000
    5     Reallocated Sector Count           5            100     0x000000000000
    9     Power-On Hours Count               0             99     0x000000000c42
  173     Unknown                            5             99     0x000000000008
  175     Unknown                            1            100     0x000000000000
  180     Unknown                           97            200     0x000000000000
  194     Temperature                        0             80     0x000000000014
  196     Reallocation Event Count           5            100     0x000000000000
  202     Data Address Mark Errors          10            100     0x000000000000
```

Is the proposed patch OK?

--
Yuichiro NAITO (naito.yuich...@gmail.com)

Reply via email to