Re: Device timeout reading fsbn ...

2019-10-01 Thread Thomas Mueller
from Michael van Elst:

> mueller6...@twc.com ("Thomas Mueller") writes:

> >Do you know when (what version) NCQ was introduced to NetBSD?  Was it before 
> >or after 7.99.1?

> It's only in HEAD and will be in netbsd-9.


> >What is atatctl?  "which atatctl" shows nothing.  Is atatctl part of 
> >smartmontools?

> Sorry, atactl, it is a native command. E.g.

> # atactl wd0 smart status
> SMART supported, SMART enabled
> id value thresh crit collect reliability description raw
(snip)

Now I see why I could trust my old 7.99.1 installation to act as server when I 
was updating a NetBSD installation by NFS from the other computer.

Thanks for the information!

I looked through src/doc/CHANGES.prev on HEAD but couldn't find where NCQ was 
introduced.

I ran "atactl wd1 smart status" but couldn't find anything wrong from that 
display of information.

I suppose I should run smartmontools from a different hard drive or other drive 
such as a USB stick?

from Mike Pumford:

> On 01/10/2019 14:36, Thomas Mueller wrote:

> > Do you know when (what version) NCQ was introduced to NetBSD?  Was it 
> > before or after 7.99.1?

> It went in after NetBSD 8.x was branched so I'd guess it would be somewhere in
> the 8.99.xx versions. It is in the 9.0_BETA branch as well.

> > What is atatctl?  "which atatctl" shows nothing.  Is atatctl part of 
> > smartmontools?

> > I don't have smartmontools installed but could run it from the System 
> > Rescue CD or build in NetBSD (or FreeBSD or Linux?) on the Hitachi hard 
> > drive.

> > Firmware or driver bug could explain why the Western Digital Green hard 
> > drive might be adversely affected but not all other hard drives.

> > I believe Western Digital discontinued the Green hard drives because of 
> > technical or performance problems.

> The drives were deliberately designed to spin themselves down behind the
> back of the operating system and ATA driver, meaning that the next time
> the OS tried to do an IO the operation would time out and have to be
> retried after the disk had spun back up. This tended to trigger the type of
> fsbn errors you are seeing.

> All the extra spin up/spin down cycles played havoc with performance and I
> think also took its toll on the drive electronics. The whole idea was fairly
> flawed as spinning up a drive uses more power than at any other time in drive
> operation so doing it more often costs power unless you are confident that the
> drive can be down long enough to offset that usage.

> I thought they did finally produce a version of the firmware where you could
> at least turn that ridiculous behaviour off but I've no idea where you can
> find it. The other way to avoid it is to ensure the OS does a disk operation
> often enough to inhibit the spindown.

I haven't noticed the crash with FreeBSD, but FreeBSD has other problems, which 
could be caused by either the hard drive or the motherboard.

But if I want to go further with NetBSD, I guess I need to run

sysctl -w hw.wd1.use_ncq=0

and see if this solves the problem.

But I still need to be aware of the possibility of this hard drive going fully 
bad.

Tom



Re: Device timeout reading fsbn ...

2019-10-01 Thread Mike Pumford




On 01/10/2019 14:36, Thomas Mueller wrote:


> Do you know when (what version) NCQ was introduced to NetBSD?  Was it before or 
> after 7.99.1?

It went in after NetBSD 8.x was branched so I'd guess it would be 
somewhere in the 8.99.xx versions. It is in the 9.0_BETA branch as well.

> What is atatctl?  "which atatctl" shows nothing.  Is atatctl part of 
> smartmontools?
>
> I don't have smartmontools installed but could run it from the System Rescue CD 
> or build in NetBSD (or FreeBSD or Linux?) on the Hitachi hard drive.
>
> Firmware or driver bug could explain why the Western Digital Green hard drive 
> might be adversely affected but not all other hard drives.
>
> I believe Western Digital discontinued the Green hard drives because of 
> technical or performance problems.

The drives were deliberately designed to spin themselves down behind the 
back of the operating system and ATA driver, meaning that the next time 
the OS tried to do an IO the operation would time out and have to be 
retried after the disk had spun back up. This tended to trigger the type 
of fsbn errors you are seeing.


All the extra spin up/spin down cycles played havoc with performance and 
I think also took its toll on the drive electronics. The whole idea was 
fairly flawed as spinning up a drive uses more power than at any other 
time in drive operation so doing it more often costs power unless you 
are confident that the drive can be down long enough to offset that usage.
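
The trade-off described above can be put into numbers. The following is a minimal sketch, assuming a simple model in which each spin-up costs a fixed amount of extra energy and a spun-down drive draws less power than an idle one. The function name and all figures are hypothetical, not measurements of any real drive.

```python
def break_even_seconds(spinup_extra_joules, idle_watts, standby_watts):
    """Seconds a drive must stay spun down before one spin-down/spin-up
    cycle saves energy compared with simply idling."""
    saving_rate = idle_watts - standby_watts  # watts saved while spun down
    if saving_rate <= 0:
        raise ValueError("standby must draw less power than idle")
    return spinup_extra_joules / saving_rate


# Hypothetical figures chosen only to illustrate the arithmetic.
print(break_even_seconds(spinup_extra_joules=30.0, idle_watts=4.0, standby_watts=1.0))
```

If the drive is woken again sooner than the returned time, the cycle cost more energy than it saved under this model.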


I thought they did finally produce a version of the firmware where you 
could at least turn that ridiculous behaviour off but I've no idea where 
you can find it. The other way to avoid it is to ensure the OS does a 
disk operation often enough to inhibit the spindown.
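
One way to sketch the "disk operation often enough" workaround is a small loop that rewrites a file on the affected drive and forces it to disk. This is only an illustration: the function names, the file path and the interval are assumptions, and the interval has to be shorter than the drive's idle timer, which is not stated in this thread.

```python
import os
import time


def poke_disk(path):
    """Write a timestamp and fsync it so the request is sent to the drive
    instead of staying in the OS cache."""
    with open(path, "w") as f:
        f.write(f"{time.time()}\n")
        f.flush()
        os.fsync(f.fileno())


def keep_awake(path, interval_s, iterations=None):
    """Poke the disk every interval_s seconds.

    iterations=None loops forever; a number limits the loop (useful for
    trying it out). Returns how many writes were done."""
    count = 0
    while iterations is None or count < iterations:
        poke_disk(path)
        count += 1
        if iterations is None or count < iterations:
            time.sleep(interval_s)
    return count
```

It could be started as, for example, keep_awake("/mnt/green/.keepalive", 5), where both the path and the 5 second interval are placeholders.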


Mike


Re: uefi and i915drm

2019-10-01 Thread Cayo Puigdefabregas
Good news, with the latest beta 9 I can use my i915drmkms graphics with UEFI.
Thank you everybody

Cayo

On Wed, 18 Sep 2019 at 20:24, Cayo Puigdefabregas wrote:
>
> Thank you.
> I will try to get more info, more printf.
> I also want to make a non-modular kernel. I don't know if I'll know
> how to do it and if this can fix it.
>
> Greetings
>
> Cayo
>
> On Wed, 18 Sep 2019 at 20:01, Jonathan A. Kollasch wrote:
> >
> > The graphics component of 7th generation Core i* chips is too new.
> >
> > i915drm is a kernel builtin module in your kernel, so loading it again
> > from the file system fails.
> >
> > Jonathan Kollasch
> >
> > On Tue, Sep 17, 2019 at 10:59:06PM +0200, Cayo Puigdefabregas wrote:
> > > Hi, I have a problem with all versions of NetBSD and I don't know if
> > > it can be fixed in this BETA.
> > > The problem is that I cannot run startx when NetBSD is configured to
> > > boot from an EFI partition.
> > > I assumed that there is some problem when the i915drmkms module is
> > > loaded.
> > > But I don't see anything strange, because the modules seem to load
> > > correctly.
> > > What seems odd to me is that it depends on booting from the EFI
> > > partition.
> > >
> > > When I partition the hard drive without UEFI and install
> > > with BIOS, it works very well if I use the NetBSD bootloader,
> > > but I have the same problem if I boot from grub2 with the
> > > NetBSD bootloader in a partition.
> > >
> > > If I try to load the i915drm module with modload, it gives me an error.
> > > I attach all the log files that I think are interesting.
> > >
> > > Does anyone have any idea why this happens?
> > > I would appreciate any suggestions.
> > >
> > > Thank you
> >


Re: Device timeout reading fsbn ...

2019-10-01 Thread Michael van Elst
mueller6...@twc.com ("Thomas Mueller") writes:

>Do you know when (what version) NCQ was introduced to NetBSD?  Was it before 
>or after 7.99.1?

It's only in HEAD and will be in netbsd-9.


>What is atatctl?  "which atatctl" shows nothing.  Is atatctl part of 
>smartmontools?

Sorry, atactl, it is a native command. E.g.

# atactl wd0 smart status
SMART supported, SMART enabled
 id value thresh crit collect reliability description                raw
  1   109      6 yes  online  positive    Raw read error rate        25066120
  3   100      0 yes  online  positive    Spin-up time               0
  4   100     20 no   online  positive    Start/stop count           77
  5   100     36 yes  online  positive    Reallocated sector count   0
  7    86     30 yes  online  positive    Seek error rate            415904943
  9    63      0 no   online  positive    Power-on hours count       32748
 10   100     97 yes  online  positive    Spin retry count           0
 12   100     20 no   online  positive    Device power cycle count   78
183   100      0 no   online  positive    SATA Downshift Error Count 0
184   100     99 no   online  positive    End-to-end error           0
187   100      0 no   online  positive    Reported uncorrect         0
188   100      0 no   online  positive    Command Timeout            0
189   100      0 no   online  positive    High Fly Writes            0
190    70     45 no   online  positive    Airflow Temperature        30 Lifetime min/max 23/0
194    30      0 no   online  positive    Temperature                30 Lifetime min/max 0/19
195    50      0 no   online  positive    Hardware ECC Recovered     25066120
197   100      0 no   online  positive    Current pending sector     0
198   100      0 no   offline positive    Offline uncorrectable      0
199   200      0 no   online  positive    Ultra DMA CRC error count  0
240   100      0 no   offline positive    Head flying hours          24661702246380
241   100      0 no   offline positive    Total LBAs Written         4085414554
242   100      0 no   offline positive    Total LBAs Read            4233695140


A high 'raw read error rate' or even recovered ECC errors don't say that
the drive is defective. That's natural behaviour with modern drives
(and some don't hide that).

The 'raw' column is also not standardized and the values may be misleading.
Sometimes smartmontools knows how to interpret the values better.

Critical values (second column) start with some number (mostly 100)
and go down.  The thresholds are based on manufacturing statistics,
when you reach them, it becomes likely that the drive will fail
soon, but maybe it already has :)

Some drives do not have SMART enabled by default. You can change that
with the atactl command, but some drives only then start to collect
statistics.
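
The rule of thumb above (values count down towards a threshold) can be checked mechanically. The following is a rough sketch, assuming the first three whitespace-separated fields of each attribute line are id, value and thresh as in the output shown. The function names are made up for illustration, and treating a threshold of 0 as "never trips" is an assumption.

```python
def parse_smart_rows(text):
    """Return (id, value, thresh) for every line whose first three
    fields are integers; header and status lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        try:
            attr_id, value, thresh = (int(f) for f in fields[:3])
        except ValueError:
            continue  # not an attribute line
        rows.append((attr_id, value, thresh))
    return rows


def at_or_below_threshold(rows):
    """IDs of attributes whose value has dropped to the threshold.
    Assumption: a threshold of 0 means the attribute can never trip."""
    return [attr_id for attr_id, value, thresh in rows
            if thresh > 0 and value <= thresh]


# A few attribute lines in the layout printed by "atactl wd0 smart status".
SAMPLE = """\
SMART supported, SMART enabled
id value thresh crit collect reliability description raw
  4 100   20 no  online  positive Start/stop count 77
  5 100   36 yes online  positive Reallocated sector count 0
  7  86   30 yes online  positive Seek error rate 415904943
 10 100   97 yes online  positive Spin retry count 0
"""

print(at_or_below_threshold(parse_smart_rows(SAMPLE)))
```

An empty list only says that no attribute has reached its threshold; as noted above, a drive can still fail without that happening.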

-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: Device timeout reading fsbn ...

2019-10-01 Thread J. Lewis Muir
On 10/01, Thomas Mueller wrote:
> mueller6...@twc.com ("Thomas Mueller") writes, and Michael van Elst responds:
> > Backing up the data is of course the first thing. But please also check
> > the disk with smartmontools or 'atatctl wd1 smart status' to see if itself
> > reports problems.
> 
> What is atatctl?  "which atatctl" shows nothing.  Is atatctl part of 
> smartmontools?

I assume he made a typo with an extra 't'.  I think he meant atactl:

  https://netbsd.gw.com/cgi-bin/man-cgi?atactl++NetBSD-current

Lewis


Re: Device timeout reading fsbn ...

2019-10-01 Thread Chavdar Ivanov
atactl, mistype.

On Tue, 1 Oct 2019 at 14:41, Thomas Mueller  wrote:
>
>
> mueller6...@twc.com ("Thomas Mueller") writes, and Michael van Elst responds:
>
> > >> sysctl -w hw.wd0.use_ncq=0
>
> > >Actually that would be wd1 in my case as opposed to wd0, but is there any 
> > >danger in using this sysctl, could it make the hard drive go bad more 
> > >quickly?
>
> > There is no danger, it prevents the driver from queuing more than one
> > request to the drive. That's how older netbsd versions behaved.
>
> > While a bad disk can't be ruled out, I tend to suspect a firmware or
> > driver bug that gets exercised with NCQ. That would explain why there
> > is no error with older netbsd.
>
>
> > >I guess I still need to move things over to the Hitachi hard drive, 
> > >including user data that could not be reinstalled.
>
> > Backing up the data is of course the first thing. But please also check
> > the disk with smartmontools or 'atatctl wd1 smart status' to see if itself
> > reports problems.
>
> Do you know when (what version) NCQ was introduced to NetBSD?  Was it before 
> or after 7.99.1?
>
> What is atatctl?  "which atatctl" shows nothing.  Is atatctl part of 
> smartmontools?
>
> I don't have smartmontools installed but could run it from the System Rescue 
> CD or build in NetBSD (or FreeBSD or Linux?) on the Hitachi hard drive.
>
> Firmware or driver bug could explain why the Western Digital Green hard drive 
> might be adversely affected but not all other hard drives.
>
> I believe Western Digital discontinued the Green hard drives because of 
> technical or performance problems.
>
> Tom
>


-- 



Re: Device timeout reading fsbn ...

2019-10-01 Thread Thomas Mueller


mueller6...@twc.com ("Thomas Mueller") writes, and Michael van Elst responds:

> >> sysctl -w hw.wd0.use_ncq=0

> >Actually that would be wd1 in my case as opposed to wd0, but is there any 
> >danger in using this sysctl, could it make the hard drive go bad more 
> >quickly?

> There is no danger, it prevents the driver from queuing more than one
> request to the drive. That's how older netbsd versions behaved.

> While a bad disk can't be ruled out, I tend to suspect a firmware or
> driver bug that gets exercised with NCQ. That would explain why there
> is no error with older netbsd.


> >I guess I still need to move things over to the Hitachi hard drive, 
> >including user data that could not be reinstalled.

> Backing up the data is of course the first thing. But please also check
> the disk with smartmontools or 'atatctl wd1 smart status' to see if itself
> reports problems.

Do you know when (what version) NCQ was introduced to NetBSD?  Was it before or 
after 7.99.1?

What is atatctl?  "which atatctl" shows nothing.  Is atatctl part of 
smartmontools?

I don't have smartmontools installed but could run it from the System Rescue CD 
or build in NetBSD (or FreeBSD or Linux?) on the Hitachi hard drive.

Firmware or driver bug could explain why the Western Digital Green hard drive 
might be adversely affected but not all other hard drives.

I believe Western Digital discontinued the Green hard drives because of 
technical or performance problems.

Tom



Re: Device timeout reading fsbn ...

2019-10-01 Thread Michael van Elst
mueller6...@twc.com ("Thomas Mueller") writes:

>> sysctl -w hw.wd0.use_ncq=0

>Actually that would be wd1 in my case as opposed to wd0, but is there any 
>danger in using this sysctl, could it make the hard drive go bad more quickly?

There is no danger, it prevents the driver from queuing more than one
request to the drive. That's how older netbsd versions behaved.

While a bad disk can't be ruled out, I tend to suspect a firmware or
driver bug that gets exercised with NCQ. That would explain why there
is no error with older netbsd.
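
Since the setting is per drive (wd0 in the example, wd1 on the affected machine), a small helper can build the right command for a given drive. This is a sketch only: the function name is hypothetical, and using the value 1 to switch NCQ back on is an assumption; only the value 0 appears in this thread.

```python
def ncq_sysctl_command(drive, enable):
    """Build the sysctl invocation that switches NCQ on or off for one
    wd drive, e.g. wd0 or wd1. Nothing is executed here."""
    if not (drive.startswith("wd") and drive[2:].isdigit()):
        raise ValueError(f"expected a name like wd0 or wd1, got {drive!r}")
    # Assumption: 1 re-enables NCQ; the thread only shows 0 (disable).
    return ["sysctl", "-w", f"hw.{drive}.use_ncq={1 if enable else 0}"]


print(" ".join(ncq_sysctl_command("wd1", enable=False)))
```

The returned list could be passed to subprocess.run() as root to apply the change.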


>I guess I still need to move things over to the Hitachi hard drive, including 
>user data that could not be reinstalled.

Backing up the data is of course the first thing. But please also check
the disk with smartmontools or 'atatctl wd1 smart status' to see if itself
reports problems.


-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: Device timeout reading fsbn ...

2019-10-01 Thread Thomas Mueller
> On Mon, Sep 30, 2019 at 03:03:05AM +, Thomas Mueller wrote:
> > What does it mean when I get such messages, and then I can't do anything 
> > more except reboot (not clean, I need to run fsck_ffs on partitions that 
> > had been mounted)?
 
> > To reboot, I have to Reset, or Ctrl-Alt-Esc into debugger, followed by 
> > reboot.
 
> > Does it mean the hard drive is going bad, about to go completely bad?

> You could check the SMART data; but yes, it usually means that the drive is
> going bad.

 
> Manuel Bouyer 

I suspect the hard drive may be going slowly bad, may have been a bit buggy to 
begin with.

I had that situation some years back, some things didn't work fully right, 
Linux Slackware was OK but FreeBSD was not really functional.  Maybe six years 
later, the hard drive failed completely.  It was a Western Digital Caviar.

So I need to move things over to the other (Hitachi) hard drive, a refurbished 
special.  I have already started, and will see whether NetBSD and FreeBSD run 
any better on that hard disk.

It is strange that the hard disk seems OK under some FreeBSD and NetBSD 
installations.

That hard disk is a Western Digital Green, suggesting it may be ready to go bad 
any time.

> A timeout can have many causes, but here I suspect some issue with NCQ.

> Try to disable NCQ for that drive with

> sysctl -w hw.wd0.use_ncq=0

>   Michael van Elst

Actually that would be wd1 in my case as opposed to wd0, but is there any 
danger in using this sysctl, could it make the hard drive go bad more quickly?

I guess I still need to move things over to the Hitachi hard drive, including 
user data that could not be reinstalled.

Tom