Re: [CentOS] Errors on an SSD drive

2017-08-11 Thread Robert Nichols

On 08/11/2017 02:32 PM, m.r...@5-cent.us wrote:

Robert Nichols wrote:

On 08/11/2017 12:16 PM, Chris Murphy wrote:

On Fri, Aug 11, 2017 at 7:53 AM, Robert Nichols
 wrote:

On 08/10/2017 11:06 AM, Chris Murphy wrote:


On Thu, Aug 10, 2017, 6:48 AM Robert Moskowitz 
wrote:

On 08/09/2017 10:46 AM, Chris Murphy wrote:


If it's a bad sector problem, you'd write to sector 17066160 and see
if the drive complies or spits back a write error. It looks like a bad
sector in that the same LBA is reported each time but I've only ever
seen this with both a read error and a UNC error. So I'm not sure
it's a bad sector.



That'll read that sector and display hex and ascii. If you recognize
the contents, it's probably user data. Otherwise, it's file system
metadata or a system binary.

If you get nothing but an I/O error, then it's lost so it doesn't
matter what it is, you can definitely overwrite it.

dd if=/dev/zero of=/dev/sda seek=17066160 count=1



You really don't want to do that without first finding out what file is
using that block. You will convert a detected I/O error into silent
corruption of that file, and that is a much worse situation.


Yeah he'd want to do an fsck -f and see if repairs are made, and also



fsck checks filesystem metadata, not the content of files. It is not going
to detect that a file has had 512 bytes replaced by zeros. If the file
is a non-configuration file installed from an RPM, then "rpm -Va" should
flag it.

LVM certainly makes the procedure harder. Figuring out what filesystem
block corresponds to that LBA is still possible, but you have to examine
the LV layout in /etc/lvm/backup/ and learn more than you probably wanted
to know about LVM.


I posted a link yesterday - let me know if you want me to repost it - to
someone's web page who REALLY knows about filesystems and sectors, and how
to identify the file that a bad sector is part of.

And it works. I haven't needed it in a few years, but I have followed his
directions, and identified the file on the bad sector.


But, have you tried it when LVM is involved? That's an additional mapping
layer for disk addresses that is not covered in the page you linked, which
is just a partial copy of the smartmontools bad block HOWTO at
https://www.smartmontools.org/browser/trunk/www/badblockhowto.xml .
That smartmontools page does have a section that deals with LVM. I advise
not looking at that on a full stomach.

--
Bob Nichols "NOSPAM" is really part of my email address.
Do NOT delete it.

___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Errors on an SSD drive

2017-08-11 Thread m . roth
Robert Nichols wrote:
> On 08/11/2017 12:16 PM, Chris Murphy wrote:
>> On Fri, Aug 11, 2017 at 7:53 AM, Robert Nichols
>>  wrote:
>>> On 08/10/2017 11:06 AM, Chris Murphy wrote:

>>>> On Thu, Aug 10, 2017, 6:48 AM Robert Moskowitz
>>>> wrote:
>>>>> On 08/09/2017 10:46 AM, Chris Murphy wrote:
>>>>>>
>>>>>> If it's a bad sector problem, you'd write to sector 17066160 and see
>>>>>> if the drive complies or spits back a write error. It looks like a bad
>>>>>> sector in that the same LBA is reported each time but I've only ever
>>>>>> seen this with both a read error and a UNC error. So I'm not sure
>>>>>> it's a bad sector.
>>>>
>>>> That'll read that sector and display hex and ascii. If you recognize
>>>> the contents, it's probably user data. Otherwise, it's file system
>>>> metadata or a system binary.
>>>>
>>>> If you get nothing but an I/O error, then it's lost so it doesn't
>>>> matter what it is, you can definitely overwrite it.
>>>>
>>>> dd if=/dev/zero of=/dev/sda seek=17066160 count=1
>>>
>>>
>>> You really don't want to do that without first finding out what file is
>>> using that block. You will convert a detected I/O error into silent
>>> corruption of that file, and that is a much worse situation.
>>
>> Yeah he'd want to do an fsck -f and see if repairs are made, and also

> fsck checks filesystem metadata, not the content of files. It is not going
> to detect that a file has had 512 bytes replaced by zeros. If the file
> is a non-configuration file installed from an RPM, then "rpm -Va" should
> flag it.
>
> LVM certainly makes the procedure harder. Figuring out what filesystem
> block corresponds to that LBA is still possible, but you have to examine
> the LV layout in /etc/lvm/backup/ and learn more than you probably wanted
> to know about LVM.

I posted a link yesterday - let me know if you want me to repost it - to
someone's web page who REALLY knows about filesystems and sectors, and how
to identify the file that a bad sector is part of.

And it works. I haven't needed it in a few years, but I have followed his
directions, and identified the file on the bad sector.

  mark



Re: [CentOS] Errors on an SSD drive

2017-08-11 Thread Warren Young
On Aug 11, 2017, at 1:07 PM, Robert Nichols  wrote:

>> Yeah he'd want to do an fsck -f and see if repairs are made
> 
> fsck checks filesystem metadata, not the content of files.

Chris might have been thinking of fsck -c or -k, which do various sorts of 
badblocks scans.

That’s still a poor alternative to strong data checksumming and Merkle tree 
structured filesystems, of course.

> LVM certainly makes the procedure harder. Figuring out what filesystem
> block corresponds to that LBA is still possible, but you have to examine
> the LV layout in /etc/lvm/backup/ and learn more than you probably wanted
> to know about LVM.

Linux kernel 4.8 added a feature called reverse mapping which is intended to 
solve this problem.

In principle, this will let you get a list of files that are known to be 
corrupted due to errors at the block layer, then fix it by removing or 
overwriting those files.  The block layer, DM, LVM2, and filesystem layers will 
then be able to understand that those blocks are no longer corrupt, therefore 
the filesystem is fine, as are all the possible layers in between.

This understanding is based on a question I asked and had answered on the 
Stratis-Docs GitHub issue tracker:

https://github.com/stratis-storage/stratis-docs/issues/53

We’ll see how well it works in practice.  It is certainly possible in 
principle: ZFS does this today.


Re: [CentOS] Errors on an SSD drive

2017-08-11 Thread Robert Nichols

On 08/11/2017 12:16 PM, Chris Murphy wrote:

On Fri, Aug 11, 2017 at 7:53 AM, Robert Nichols
 wrote:

On 08/10/2017 11:06 AM, Chris Murphy wrote:


On Thu, Aug 10, 2017, 6:48 AM Robert Moskowitz 
wrote:




On 08/09/2017 10:46 AM, Chris Murphy wrote:


If it's a bad sector problem, you'd write to sector 17066160 and see if
the drive complies or spits back a write error. It looks like a bad sector
in that the same LBA is reported each time but I've only ever seen this
with both a read error and a UNC error. So I'm not sure it's a bad sector.

What is DID_BAD_TARGET?



I have no experience on how to force a write to a specific sector and
not cause other problems.  I suspect that this sector is in the /
partition:

Disk /dev/sda: 240.1 GB, 240057409536 bytes, 468862128 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0xc89d

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1            2048     2099199     1048576   83  Linux
/dev/sda2         2099200     4196351     1048576   82  Linux swap / Solaris
/dev/sda3         4196352   468862127   232332888   83  Linux



LBA 17066160 would be on sda3.

dd if=/dev/sda skip=17066160 count=1 2>/dev/null | hexdump -C

That'll read that sector and display hex and ascii. If you recognize the
contents, it's probably user data. Otherwise, it's file system metadata or
a system binary.

If you get nothing but an I/O error, then it's lost so it doesn't matter
what it is, you can definitely overwrite it.

dd if=/dev/zero of=/dev/sda seek=17066160 count=1



You really don't want to do that without first finding out what file is
using that block. You will convert a detected I/O error into silent
corruption of that file, and that is a much worse situation.


Yeah he'd want to do an fsck -f and see if repairs are made, and also
rpm -Va. There *will* be legitimately modified files, so it's going to
be tedious to exactly sort out the ones that are legitimately modified
vs corrupt. If it's a configuration file, I'd say you could ignore it
but any modified binaries other than permissions need to be replaced
and are the likely culprit.

The smartmontools page has hints on how to figure out what file is
affected by a particular sector being corrupt but the more layers are
involved the more difficult that gets. I'm not sure there's an easy way
to do this with LVM in between the physical device and file system.


fsck checks filesystem metadata, not the content of files. It is not going
to detect that a file has had 512 bytes replaced by zeros. If the file
is a non-configuration file installed from an RPM, then "rpm -Va" should
flag it.

LVM certainly makes the procedure harder. Figuring out what filesystem
block corresponds to that LBA is still possible, but you have to examine
the LV layout in /etc/lvm/backup/ and learn more than you probably wanted
to know about LVM.

--
Bob Nichols "NOSPAM" is really part of my email address.
Do NOT delete it.



Re: [CentOS] Errors on an SSD drive

2017-08-11 Thread Chris Murphy
On Fri, Aug 11, 2017 at 7:53 AM, Robert Nichols
 wrote:
> On 08/10/2017 11:06 AM, Chris Murphy wrote:
>>
>> On Thu, Aug 10, 2017, 6:48 AM Robert Moskowitz 
>> wrote:
>>
>>>
>>>
>>> On 08/09/2017 10:46 AM, Chris Murphy wrote:

>>>> If it's a bad sector problem, you'd write to sector 17066160 and see if
>>>> the drive complies or spits back a write error. It looks like a bad sector
>>>> in that the same LBA is reported each time but I've only ever seen this
>>>> with both a read error and a UNC error. So I'm not sure it's a bad sector.
>>>>
>>>> What is DID_BAD_TARGET?
>>>
>>>
>>> I have no experience on how to force a write to a specific sector and
>>> not cause other problems.  I suspect that this sector is in the /
>>> partition:
>>>
>>> Disk /dev/sda: 240.1 GB, 240057409536 bytes, 468862128 sectors
>>> Units = sectors of 1 * 512 = 512 bytes
>>> Sector size (logical/physical): 512 bytes / 512 bytes
>>> I/O size (minimum/optimal): 512 bytes / 512 bytes
>>> Disk label type: dos
>>> Disk identifier: 0xc89d
>>>
>>>    Device Boot      Start         End      Blocks   Id  System
>>> /dev/sda1            2048     2099199     1048576   83  Linux
>>> /dev/sda2         2099200     4196351     1048576   82  Linux swap / Solaris
>>> /dev/sda3         4196352   468862127   232332888   83  Linux
>>>
>>
>> LBA 17066160 would be on sda3.
>>
>> dd if=/dev/sda skip=17066160 count=1 2>/dev/null | hexdump -C
>>
>> That'll read that sector and display hex and ascii. If you recognize the
>> contents, it's probably user data. Otherwise, it's file system metadata or
>> a system binary.
>>
>> If you get nothing but an I/O error, then it's lost so it doesn't matter
>> what it is, you can definitely overwrite it.
>>
>> dd if=/dev/zero of=/dev/sda seek=17066160 count=1
>
>
> You really don't want to do that without first finding out what file is
> using that block. You will convert a detected I/O error into silent
> corruption of that file, and that is a much worse situation.

Yeah he'd want to do an fsck -f and see if repairs are made, and also
rpm -Va. There *will* be legitimately modified files, so it's going to
be tedious to exactly sort out the ones that are legitimately modified
vs corrupt. If it's a configuration file, I'd say you could ignore it
but any modified binaries other than permissions need to be replaced
and are the likely culprit.

The smartmontools page has hints on how to figure out what file is
affected by a particular sector being corrupt but the more layers are
involved the more difficult that gets. I'm not sure there's an easy way
to do this with LVM in between the physical device and file system.

-- 
Chris Murphy


Re: [CentOS] Errors on an SSD drive

2017-08-11 Thread Robert Nichols

On 08/10/2017 11:06 AM, Chris Murphy wrote:

On Thu, Aug 10, 2017, 6:48 AM Robert Moskowitz  wrote:




On 08/09/2017 10:46 AM, Chris Murphy wrote:

If it's a bad sector problem, you'd write to sector 17066160 and see if
the drive complies or spits back a write error. It looks like a bad sector in
that the same LBA is reported each time but I've only ever seen this with
both a read error and a UNC error. So I'm not sure it's a bad sector.

What is DID_BAD_TARGET?


I have no experience on how to force a write to a specific sector and
not cause other problems.  I suspect that this sector is in the /
partition:

Disk /dev/sda: 240.1 GB, 240057409536 bytes, 468862128 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0xc89d

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1            2048     2099199     1048576   83  Linux
/dev/sda2         2099200     4196351     1048576   82  Linux swap / Solaris
/dev/sda3         4196352   468862127   232332888   83  Linux



LBA 17066160 would be on sda3.

dd if=/dev/sda skip=17066160 count=1 2>/dev/null | hexdump -C

That'll read that sector and display hex and ascii. If you recognize the
contents, it's probably user data. Otherwise, it's file system metadata or
a system binary.

If you get nothing but an I/O error, then it's lost so it doesn't matter
what it is, you can definitely overwrite it.

dd if=/dev/zero of=/dev/sda seek=17066160 count=1


You really don't want to do that without first finding out what file is using
that block. You will convert a detected I/O error into silent corruption of
that file, and that is a much worse situation.

--
Bob Nichols "NOSPAM" is really part of my email address.
Do NOT delete it.



Re: [CentOS] Errors on an SSD drive

2017-08-11 Thread hw

Chris Murphy wrote:

On Wed, Aug 9, 2017, 11:55 AM Mark Haney  wrote:


To be honest, I'd not try a btrfs volume on a notebook SSD. I did that on a
couple of systems and it corrupted pretty quickly. I'd stick with xfs/ext4
if you manage to get the drive working again.




Sounds like a hardware problem. Btrfs is explicitly optimized for SSD, the
maintainers worked for FusionIO for several years of its development. If
the drive is silently corrupting data, Btrfs will pretty much immediately
start complaining where other filesystems will continue. Bad RAM can also
result in scary warnings that you don't get with other filesystems. And I've
been using it in numerous SSDs for years and NVMe for a year with zero
problems.


That's one thing I've been wondering about:  When using btrfs RAID, do you
need to somehow monitor the disks to see if one has failed?


On CentOS though, I'd get newer btrfs-progs RPM from Fedora, and use either
an elrepo.org kernel, a Fedora kernel, or build my own latest long-term
from kernel.org. There's just too much development that's happened since
the tree found in RHEL/CentOS kernels.


I can't go with a more recent kernel version before NVIDIA has updated their
drivers to no longer need fence.h (or what it was).

And I thought stuff gets backported, especially things as important as file
systems.


Also FWIW Red Hat is deprecating Btrfs, in the RHEL 7.4 announcement.
Support will be removed probably in RHEL 8. I have no idea how it'll affect
CentOS kernels though. It will remain in Fedora kernels.


That would suck badly to the point at which I'd have to look for yet another
distribution.  The only one remaining is Arch.

What do they suggest as a replacement?  The only other FS that comes close is
ZFS, and removing btrfs altogether would be taking living in the past too many
steps too far.


Anyway, blkdiscard can be used on an SSD, whole or partition, to zero it
out. And at least recent ext4 and XFS mkfs will do a blkdiscard, same as
mkfs.btrfs.


Chris Murphy








On Wed, Aug 9, 2017 at 1:48 PM, hw  wrote:


Robert Moskowitz wrote:


I am building a new system using a Kingston 240GB SSD drive I pulled
from my notebook (when I had to upgrade to a 500GB SSD drive).  CentOS
install went fine and ran for a couple days then got errors on the
console.  Here is an example:

[168176.995064] sd 0:0:0:0: [sda] tag#14 FAILED Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[168177.004050] sd 0:0:0:0: [sda] tag#14 CDB: Read(10) 28 00 01 04 68 b0
00 00 08 00
[168177.011615] blk_update_request: I/O error, dev sda, sector 17066160
[168487.534510] sd 0:0:0:0: [sda] tag#17 FAILED Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[168487.543576] sd 0:0:0:0: [sda] tag#17 CDB: Read(10) 28 00 01 04 68 b0
00 00 08 00
[168487.551206] blk_update_request: I/O error, dev sda, sector 17066160
[168787.813941] sd 0:0:0:0: [sda] tag#20 FAILED Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[168787.822951] sd 0:0:0:0: [sda] tag#20 CDB: Read(10) 28 00 01 04 68 b0
00 00 08 00
[168787.830544] blk_update_request: I/O error, dev sda, sector 17066160
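
As a cross-check on the log above, the Read(10) CDB can be decoded by hand.
This is only a minimal sketch, assuming the standard SCSI Read(10) layout
(byte 0 is the opcode, bytes 2-5 are the big-endian LBA, bytes 7-8 are the
transfer length) and the 512-byte logical sectors reported by fdisk. The
variable names are purely illustrative.

```shell
#!/bin/sh
# CDB bytes copied from the kernel log: Read(10) 28 00 01 04 68 b0 00 00 08 00
cdb="28 00 01 04 68 b0 00 00 08 00"
set -- $cdb                      # split into positional parameters $1..$10

lba=$(( 0x$3$4$5$6 ))            # CDB bytes 2-5: logical block address
len=$(( 0x$8$9 ))                # CDB bytes 7-8: transfer length in sectors

echo "LBA:    $lba"
echo "length: $len sectors ($(( len * 512 )) bytes)"
```

The decoded LBA should match the sector number in the blk_update_request
lines, and the length shows how large each failing read request was.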

Eventually, I could not do anything on the system.  Not even a 'reboot'.
I had to do a cold power cycle to bring things back.

Is there anything to do about this or trash the drive and start anew?



Make sure the cables and power supply are ok.  Try the drive in another
machine that has a different controller to see if there is an
incompatibility between the drive and the controller.

You could make a btrfs file system on the whole device: that should say
that a trim operation is performed for the whole device.  Maybe that helps.

If the errors persist, replace the drive.  I'd use Intel SSDs because they
seem to have the fewest problems with broken firmware.  Do not use SSDs
with hardware RAID controllers unless the SSDs were designed for this
application.


Re: [CentOS] Errors on an SSD drive

2017-08-10 Thread Warren Young
On Aug 10, 2017, at 2:17 PM, John R Pierce  wrote:
> 
> On 8/10/2017 1:12 PM, Warren Young wrote:
>> You want those pages to get swapped out quickly so that the precious RAM can 
>> be used more productively; by the buffer cache, if nothing else.
> 
> most modern virtual memory OS's don't swap out unused pages, instead, they 
> swap IN accessed pages directly from the executable file.  only thing written 
> to swap are 'dirty' pages that have been changed since loading.

Is that not a distinction without a difference in my case?

Let’s say I have a system with 256 MB of free user-space RAM, and I have a 
binary that happens to be nearly 256 MB on disk, between the main executable 
and all the libraries it uses.

Question: Can my program allocate any dynamic RAM?

The OS’s VMM is free to use addresses beyond 0-256 MB, but since we’ve said 
there is no swap space, everything swapped in must still be assigned a place in 
physical RAM *somewhere*.

Is there a meaningful distinction between:

Scenario 1: The application’s first few executable pages are loaded from disk, 
a few key libraries are loaded, then the application does a dynamic memory 
allocation, then somehow causes all the rest of the executable pages to be 
loaded, running the system out of RAM.

Scenario 2: The application is entirely loaded into RAM, nearly filling it, 
then the application attempts a large dynamic memory allocation, causing an OOM 
error.

Regardless of the answer to these questions, I can tell you that switching that 
web site to a more efficient web application stack allowed us to shrink the VPS 
from a 256 MB plan, under which it would occasionally crash and require a 
reboot, to a 64 MB plan, under which the site has been rock-solid.  Same VPS 
provider, same web site content, same user-facing functionality.

If I’d had the ability to assign swap space, I probably could have gotten away 
with a 64 MB VPS plan with the inefficient web technology, too.  They gave me 
plenty of disk space with that plan.

(And no, swapon /some-file is no solution here.  The VPS technology simply 
didn’t allow swap space, even from a swap file on one of the system disks.  It 
wasn’t simply an inability to add a swap partition.)


Re: [CentOS] Errors on an SSD drive

2017-08-10 Thread J Martin Rushton
On 10/08/17 21:17, John R Pierce wrote:
> On 8/10/2017 1:12 PM, Warren Young wrote:
>> It’s a bad idea to do without swap even if you almost never use it,
>> because today’s bloated apps often have many pages of virtual memory
>> they rarely or never actually touch.  You want those pages to get
>> swapped out quickly so that the precious RAM can be used more
>> productively; by the buffer cache, if nothing else.
> 
> most modern virtual memory OS's don't swap out unused pages, instead,
> they swap IN accessed pages directly from the executable file.  only
> thing written to swap are 'dirty' pages that have been changed since
> loading.
> 
Modern?  They've been doing that since I did my VMS theory 30-odd years ago.





Re: [CentOS] Errors on an SSD drive

2017-08-10 Thread Warren Young
On Aug 10, 2017, at 10:46 AM, mad.scientist.at.la...@tutanota.com wrote:
> 
> is that because the drive is compressing the information?

No.  I believe by “probabilistic representation” the parent poster simply means 
that in any given data cell, you don’t have a hard “1” or “0”, you have some 
voltage potential which can be interpreted as some number of 1 or 0 bits, often 
3 bits or more.

Between that fact and wear-leveling, you can’t take a simple voltage 
measurement on a data cell and say, “This cell contains 011.”  You need more 
smarts about what’s going on to turn the voltage reading into the correct value.
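
To put a number on that: storing n bits in one cell means the drive has to
tell apart 2^n voltage levels. A trivial sketch of the arithmetic follows;
the range of bits per cell in the loop is just illustrative.

```shell
#!/bin/sh
# Number of distinct voltage levels a cell must resolve for N bits per cell.
for bits in 1 2 3 4; do
    levels=$(( 1 << bits ))      # 2^bits
    echo "$bits bit(s) per cell -> $levels voltage levels"
done
```

So the 3-bits-per-cell case mentioned above means resolving eight levels in
the same voltage range where a 1-bit cell only needs two.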

As the drive’s data cells wear out, the drive’s ability to do that correctly 
and reliably degrades.  Thus cell death, thus drive death, thus filesystem 
death, thus backups, else sadness.

And please don’t top-post.

A: Yes.

Q: Are you sure?

A: Because it makes the flow of conversation more difficult to read.

Q: Why shouldn’t I top-post?


Re: [CentOS] Errors on an SSD drive

2017-08-10 Thread John R Pierce

On 8/10/2017 1:12 PM, Warren Young wrote:

It’s a bad idea to do without swap even if you almost never use it, because 
today’s bloated apps often have many pages of virtual memory they rarely or 
never actually touch.  You want those pages to get swapped out quickly so that 
the precious RAM can be used more productively; by the buffer cache, if nothing 
else.


most modern virtual memory OS's don't swap out unused pages, instead, 
they swap IN accessed pages directly from the executable file.  only 
thing written to swap are 'dirty' pages that have been changed since 
loading.



--
john r pierce, recycling bits in santa cruz



Re: [CentOS] Errors on an SSD drive

2017-08-10 Thread Warren Young
On Aug 10, 2017, at 2:07 AM, John Hodrien  wrote:
> 
> For a well configured desktop that rarely needs to swap, I struggle to see the
> load on the SSD as being significant, and yet obviously the performance of an
> SSD would make it ideal for swap.

I agree.

It’s a bad idea to do without swap even if you almost never use it, because 
today’s bloated apps often have many pages of virtual memory they rarely or 
never actually touch.  You want those pages to get swapped out quickly so that 
the precious RAM can be used more productively; by the buffer cache, if nothing 
else.

I once used a web application server on a headless VPS that still had GUI 
libraries linked to its binary because one of the underlying technologies it 
uses was also used in a GUI app, and it was too difficult to tear all that GUI 
code out, even if it was never called.  Because the VPS technology didn’t 
support swap, I directly paid the price for those megs of unused (and 
unusable!) libraries in my monthly VPS rental fees.

> Coo, I've never seen a disk actually shrink due to failed sectors.  I don't
> think I've got an SSD into a worn state yet to see this.

Me, neither.  I’m pretty sure the spare sector pool’s size isn’t reported to 
the OS, and the drive isn’t allowed to dip into the sectors it does expose 
externally for spares.  

When the spare pool is used up, the drive just starts failing in a way that 
even SMART can see.


Re: [CentOS] Errors on an SSD drive

2017-08-10 Thread m . roth
Chris Murphy wrote:
> On Thu, Aug 10, 2017, 6:48 AM Robert Moskowitz 
> wrote:
>
>>
>>
>> On 08/09/2017 10:46 AM, Chris Murphy wrote:
>> > If it's a bad sector problem, you'd write to sector 17066160 and see
>> > if the drive complies or spits back a write error. It looks like a bad
>> > sector in that the same LBA is reported each time but I've only ever
>> > seen this with both a read error and a UNC error. So I'm not sure it's
>> > a bad sector.
>> >
>> > What is DID_BAD_TARGET?
>>
>> I have no experience on how to force a write to a specific sector and
>> not cause other problems.  I suspect that this sector is in the /
>> partition:
>>
>> Disk /dev/sda: 240.1 GB, 240057409536 bytes, 468862128 sectors
>> Units = sectors of 1 * 512 = 512 bytes
>> Sector size (logical/physical): 512 bytes / 512 bytes
>> I/O size (minimum/optimal): 512 bytes / 512 bytes
>> Disk label type: dos
>> Disk identifier: 0xc89d
>>
>>    Device Boot      Start         End      Blocks   Id  System
>> /dev/sda1            2048     2099199     1048576   83  Linux
>> /dev/sda2         2099200     4196351     1048576   82  Linux swap / Solaris
>> /dev/sda3         4196352   468862127   232332888   83  Linux
>>
>
> LBA 17066160 would be on sda3.
>
> dd if=/dev/sda skip=17066160 count=1 2>/dev/null | hexdump -C
>
> That'll read that sector and display hex and ascii. If you recognize the
> contents, it's probably user data. Otherwise, it's file system metadata or
> a system binary.

Yeah, I was going to suggest you find out what that's part of. Try this link
, which is about
identifying what an unreadable sector is part of.

 mark



Re: [CentOS] Errors on an SSD drive

2017-08-10 Thread mad.scientist.at.large

is that because the drive is compressing the information?  is there a way to 
turn this off?  i hate mandatory compression as losing one bit in a compressed 
file tends to be a big deal compared to the same in an uncompressed file.
--
Securely sent with Tutanota. Claim your encrypted mailbox today!
https://tutanota.com

10. Aug 2017 10:06 by li...@colorremedies.com:


> On Thu, Aug 10, 2017, 6:48 AM Robert Moskowitz <r...@htt-consult.com>
> wrote:
>
>> 
>>
> SSD's, in particular SD Cards (which you're not using, which is noted as
> /dev/mmcblk0...) store your data as a probabilistic representation, and
> through a lot of magic, the probability of retrieving your data correctly
> from SSD is made very high. Almost deterministic.
>
> The magic is in the firmware, and so there's some possibility any given SSD
> problem is related to a firmware bug. So it's worth comparing the firmware
> reported by smartctl and what the manufacturer has, and then their
> changelog. Most have a way to update firmware without Windows, but don't
> have images that will boot an arm board, usually the "universal" updater is
> based on FreeDOS funny enough. You'd need to stick the SSD in an x86
> computer to do this. Hilariously perverse, I did this with a Samsung 830
> SSD a while back, sticking it into a Macbook Pro, and burned that firmware
> ISO onto a DVD-RW, and it booted that Mac (using the firmware's BIOS
> compatibility support module) and updated the SSD's firmware without a
> problem.
>
>
>
> Chris Murphy


Re: [CentOS] Errors on an SSD drive

2017-08-10 Thread Chris Murphy
On Thu, Aug 10, 2017, 6:48 AM Robert Moskowitz  wrote:

>
>
> On 08/09/2017 10:46 AM, Chris Murphy wrote:
> > If it's a bad sector problem, you'd write to sector 17066160 and see if
> > the drive complies or spits back a write error. It looks like a bad
> > sector in that the same LBA is reported each time but I've only ever
> > seen this with both a read error and a UNC error. So I'm not sure it's a
> > bad sector.
> >
> > What is DID_BAD_TARGET?
>
> I have no experience on how to force a write to a specific sector and
> not cause other problems.  I suspect that this sector is in the /
> partition:
>
> Disk /dev/sda: 240.1 GB, 240057409536 bytes, 468862128 sectors
> Units = sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk label type: dos
> Disk identifier: 0xc89d
>
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sda1            2048     2099199     1048576   83  Linux
> /dev/sda2         2099200     4196351     1048576   82  Linux swap / Solaris
> /dev/sda3         4196352   468862127   232332888   83  Linux
>

LBA 17066160 would be on sda3.

dd if=/dev/sda skip=17066160 count=1 2>/dev/null | hexdump -C

That'll read that sector and display hex and ascii. If you recognize the
contents, it's probably user data. Otherwise, it's file system metadata or
a system binary.
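
For anyone who wants to go further and work out where that sector sits
inside the partition, here is a rough sketch of the arithmetic, using the
sda3 start and end sectors from the fdisk output above. The 4096-byte
filesystem block size is an assumption (check it with the filesystem's own
tools), and if LVM sits on sda3 the result is only an offset into the PV,
not a filesystem block number.

```shell
#!/bin/sh
lba=17066160          # failing sector from the kernel log
part_start=4196352    # first sector of /dev/sda3 per fdisk
part_end=468862127    # last sector of /dev/sda3 per fdisk
sector_size=512       # logical sector size per fdisk
fs_block_size=4096    # ASSUMPTION: common ext4/XFS block size, not confirmed

if [ "$lba" -ge "$part_start" ] && [ "$lba" -le "$part_end" ]; then
    rel=$(( lba - part_start ))                        # sectors into sda3
    fs_block=$(( rel * sector_size / fs_block_size ))  # only valid without LVM
    echo "sector offset within sda3: $rel"
    echo "filesystem block (if no LVM): $fs_block"
else
    echo "LBA $lba is not on sda3"
fi
```

The smartmontools bad block HOWTO describes how to go from a block number
like this to the owning file.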

If you get nothing but an I/O error, then it's lost so it doesn't matter
what it is, you can definitely overwrite it.

dd if=/dev/zero of=/dev/sda seek=17066160 count=1

If you want an extra confirmation, you can first do 'smartctl -t long
/dev/sda' and then after the prescribed testing time, which is listed,
check it again with 'smartctl -a /dev/sda' and see if the test completed, or
if under self-test log section, it shows it was aborted and lists a number
under the LBA_of_first_error column.



> But I don't know where it is in relation to the way the drive was
> formatted in my notebook.  I think it would have been in the / partition.
>




>
> > And what do you get for
> > smartctl -x 
>
> About 17KB of output?


Can you attach it as a file to the list? If the list won't accept the
attachment, put it up on fpaste.org or pastebin or something like that.
MUA's tend to nerf the output so don't paste it into an email.





> I don't know how to read what it is saying, but
> noted in the beginning:
>
> Write SCT (Get) XXX Error Recovery Control Command failed: scsi error
> badly formed scsi parameters
>
> Don't know what this means...
>
> BTW, the system is a Cubieboard2 armv7 SoC running Centos7-armv7hl. This
> is the first time I have used an SSD on a Cubie, but I know it is
> frequently done.  I would have to ask on the Cubie forum what others
> experience with SSDs have been.
>

It's very common. I think this is just an ordinary bad sector, if that LBA
value is consistent. If it's a new SSD it's slightly concerning. You can
either keep an eye on it, or put a little pressure on the manufacturer or
place of purchase that you have a bad sector and would like to swap out the
unit.
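
Whether the LBA is consistent can be checked against the kernel log from the original report. A Read(10) CDB carries the LBA as a big-endian 32-bit value in bytes 2 through 5 and the transfer length in bytes 7 and 8. A minimal shell sketch, with the function name decode_read10 made up for illustration:

```shell
#!/bin/sh
# Decode the LBA and transfer length from a SCSI Read(10) CDB given as a
# string of ten hex bytes. Layout: opcode (0x28), flags, 4-byte big-endian
# LBA, group number, 2-byte big-endian transfer length, control.
decode_read10() {
    set -- $1                       # split the hex string into $1..$10
    [ "$#" -eq 10 ] && [ "$1" = "28" ] || return 1
    echo "lba=$((0x$3$4$5$6)) sectors=$((0x$8$9))"
}

# CDB copied from the kernel log in the original report.
decode_read10 "28 00 01 04 68 b0 00 00 08 00"
```

For the CDB in the log this prints lba=17066160 sectors=8, which matches the sector number in the blk_update_request lines. Eight sectors of 512 bytes is a single 4 KiB read.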

SSDs, and in particular SD cards (which you're not using; those show up as
/dev/mmcblk0...), store your data as a probabilistic representation, and
through a lot of magic the probability of retrieving your data correctly
from an SSD is made very high. Almost deterministic.

The magic is in the firmware, so there's some possibility that any given SSD
problem is related to a firmware bug. It's worth comparing the firmware
version reported by smartctl with what the manufacturer offers, and then
reading their changelog. Most have a way to update firmware without Windows,
but don't have images that will boot an ARM board; usually the "universal"
updater is based on FreeDOS, funny enough. You'd need to stick the SSD in an
x86 computer to do this. Hilariously perverse: I did this with a Samsung 830
SSD a while back, sticking it into a MacBook Pro and burning the firmware
ISO onto a DVD-RW. It booted that Mac (using the firmware's BIOS
compatibility support module) and updated the SSD's firmware without a
problem.



Chris Murphy
___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Errors on an SSD drive

2017-08-10 Thread Chris Murphy
On Wed, Aug 9, 2017, 11:55 AM Mark Haney  wrote:

> To be honest, I'd not try a btrfs volume on a notebook SSD. I did that on a
> couple of systems and it corrupted pretty quickly. I'd stick with xfs/ext4
> if you manage to get the drive working again.
>

Sounds like a hardware problem. Btrfs is explicitly optimized for SSD; the
maintainers worked for FusionIO for several years of its development. If
the drive is silently corrupting data, Btrfs will pretty much immediately
start complaining where other filesystems will carry on. Bad RAM can also
result in scary warnings that you wouldn't get with other filesystems. And
I've been using it on numerous SSDs for years and on NVMe for a year with
zero problems.

On CentOS though, I'd get newer btrfs-progs RPM from Fedora, and use either
an elrepo.org kernel, a Fedora kernel, or build my own latest long-term
from kernel.org. There's just too much development that's happened since
the tree found in RHEL/CentOS kernels.

Also FWIW Red Hat is deprecating Btrfs, in the RHEL 7.4 announcement.
Support will be removed probably in RHEL 8. I have no idea how it'll affect
CentOS kernels though. It will remain in Fedora kernels.

Anyway, blkdiscard can be used on an SSD, whole device or partition, to zero
it out. And at least recent ext4 and XFS mkfs will do a blkdiscard, same as
mkfs.btrfs.


Chris Murphy






> On Wed, Aug 9, 2017 at 1:48 PM, hw  wrote:
>
> > Robert Moskowitz wrote:
> >
> >> I am building a new system using an Kingston 240GB SSD drive I pulled
> >> from my notebook (when I had to upgrade to a 500GB SSD drive).  Centos
> >> install went fine and ran for a couple days then got errors on the
> >> console.  Here is an example:
> >>
> >> [168176.995064] sd 0:0:0:0: [sda] tag#14 FAILED Result:
> >> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> >> [168177.004050] sd 0:0:0:0: [sda] tag#14 CDB: Read(10) 28 00 01 04 68 b0
> >> 00 00 08 00
> >> [168177.011615] blk_update_request: I/O error, dev sda, sector 17066160
> >> [168487.534510] sd 0:0:0:0: [sda] tag#17 FAILED Result:
> >> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> >> [168487.543576] sd 0:0:0:0: [sda] tag#17 CDB: Read(10) 28 00 01 04 68 b0
> >> 00 00 08 00
> >> [168487.551206] blk_update_request: I/O error, dev sda, sector 17066160
> >> [168787.813941] sd 0:0:0:0: [sda] tag#20 FAILED Result:
> >> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> >> [168787.822951] sd 0:0:0:0: [sda] tag#20 CDB: Read(10) 28 00 01 04 68 b0
> >> 00 00 08 00
> >> [168787.830544] blk_update_request: I/O error, dev sda, sector 17066160
> >>
> >> Eventually, I could not do anything on the system.  Not even a 'reboot'.
> >> I had to do a cold power cycle to bring things back.
> >>
> >> Is there anything to do about this or trash the drive and start anew?
> >>
> >
> > Make sure the cables and power supply are ok.  Try the drive in another
> > machine
> > that has a different controller to see if there is an incompatibility
> > between
> > the drive and the controller.
> >
> > You could make a btrfs file system on the whole device: that should say
> > that
> > a trim operation is performed for the whole device.  Maybe that helps.
> >
> > If the errors persist, replace the drive.  I'd use Intel SSDs because they
> > seem to have the least problems with broken firmwares.  Do not use SSDs
> > with
> > hardware RAID controllers unless the SSDs were designed for this
> > application.
> >
> >
> >
> >
>
>
> --
> Mark Haney
> Network Engineer at NeoNova
> 919-460-3330 (opt 1) • mark.ha...@neonova.net
> www.neonova.net 
>   
> 


Re: [CentOS] Errors on an SSD drive

2017-08-10 Thread John Hodrien

On Thu, 10 Aug 2017, Robert Moskowitz wrote:


Other than the 17K output from smartctl -x, what do you recommend?


smartctl -a is a little easier on the eye.

jh


Re: [CentOS] Errors on an SSD drive

2017-08-10 Thread Robert Moskowitz



On 08/10/2017 10:31 AM, m.r...@5-cent.us wrote:

Robert Moskowitz wrote:

On 08/09/2017 10:44 PM, mad.scientist.at.la...@tutanota.com wrote:

what file system are you using?  ssd drives have different
characteristics that need to be accommodated (including a relatively slow
write process which is obvious as soon as the buffer is full), and
never, never put a swap partition on it, the high activity will wear it
out rather quickly.  might also check cables, often a problem
particularly if they are older sata cables being run at a possibly
higher than rated speed.

When working with a Cubieboard SoC (or most of the other armv7 boards),
you tend to have everything hanging out:
http://medon.htt-consult.com/~rgm/cubieboard/cubietower-2.JPG

I have checked the cables and they are all tight.


in any case, reformatting it might not be a bad idea, and you can always
use the command line program badblocks to exercise and test it.

I will have to look into that.


Here's a thought: I've not done this, but could you use smartctl to check
the drive?


Other than the 17K output from smartctl -x, what do you recommend?




Re: [CentOS] Errors on an SSD drive

2017-08-10 Thread hw

Robert Moskowitz wrote:



On 08/09/2017 01:48 PM, hw wrote:

Robert Moskowitz wrote:

I am building a new system using an Kingston 240GB SSD drive I pulled from my 
notebook (when I had to upgrade to a 500GB SSD drive).  Centos install went 
fine and ran for a couple days then got errors on the console.  Here is an 
example:

[168176.995064] sd 0:0:0:0: [sda] tag#14 FAILED Result: hostbyte=DID_BAD_TARGET 
driverbyte=DRIVER_OK
[168177.004050] sd 0:0:0:0: [sda] tag#14 CDB: Read(10) 28 00 01 04 68 b0 00 00 
08 00
[168177.011615] blk_update_request: I/O error, dev sda, sector 17066160
[168487.534510] sd 0:0:0:0: [sda] tag#17 FAILED Result: hostbyte=DID_BAD_TARGET 
driverbyte=DRIVER_OK
[168487.543576] sd 0:0:0:0: [sda] tag#17 CDB: Read(10) 28 00 01 04 68 b0 00 00 
08 00
[168487.551206] blk_update_request: I/O error, dev sda, sector 17066160
[168787.813941] sd 0:0:0:0: [sda] tag#20 FAILED Result: hostbyte=DID_BAD_TARGET 
driverbyte=DRIVER_OK
[168787.822951] sd 0:0:0:0: [sda] tag#20 CDB: Read(10) 28 00 01 04 68 b0 00 00 
08 00
[168787.830544] blk_update_request: I/O error, dev sda, sector 17066160

Eventually, I could not do anything on the system.  Not even a 'reboot'.  I had 
to do a cold power cycle to bring things back.

Is there anything to do about this or trash the drive and start anew?


Make sure the cables and power supply are ok.  Try the drive in another machine
that has a different controller to see if there is an incompatibility between
the drive and the controller.

You could make a btrfs file system on the whole device: that should say that
a trim operation is performed for the whole device.  Maybe that helps.


This is a Centos7-armv7hl install, which is done by dd'ing the provided image
onto a drive, so I really can't alter the provided file systems much other than
to resize them.  What I have is:


Perhaps there's some incompatibility on this architecture.

BTW, that the cables sit tight doesn't mean they are good.




Model: ATA KINGSTON SV300S3 (scsi)
Disk /dev/sda: 240GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:

Number  Start   End     Size    Type     File system     Flags
 1      1049kB  1075MB  1074MB  primary  ext3
 2      1075MB  2149MB  1074MB  primary  linux-swap(v1)
 3      2149MB  240GB   238GB   primary  ext4






If the errors persist, replace the drive.  I'd use Intel SSDs because they
seem to have the least problems with broken firmwares.  Do not use SSDs with
hardware RAID controllers unless the SSDs were designed for this application.







Re: [CentOS] Errors on an SSD drive

2017-08-10 Thread hw

Mark Haney wrote:

To be honest, I'd not try a btrfs volume on a notebook SSD. I did that on a
couple of systems and it corrupted pretty quickly.  I'd stick with xfs/ext4
if you manage to get the drive working again.


That was merely to see if a trim operation on the whole device would bring some
improvement.

I have the system on SSDs at home and data on spinning disks, so far no problems
with btrfs.  Do I need to worry now?






On Wed, Aug 9, 2017 at 1:48 PM, hw  wrote:


Robert Moskowitz wrote:


I am building a new system using an Kingston 240GB SSD drive I pulled
from my notebook (when I had to upgrade to a 500GB SSD drive).  Centos
install went fine and ran for a couple days then got errors on the
console.  Here is an example:

[168176.995064] sd 0:0:0:0: [sda] tag#14 FAILED Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[168177.004050] sd 0:0:0:0: [sda] tag#14 CDB: Read(10) 28 00 01 04 68 b0
00 00 08 00
[168177.011615] blk_update_request: I/O error, dev sda, sector 17066160
[168487.534510] sd 0:0:0:0: [sda] tag#17 FAILED Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[168487.543576] sd 0:0:0:0: [sda] tag#17 CDB: Read(10) 28 00 01 04 68 b0
00 00 08 00
[168487.551206] blk_update_request: I/O error, dev sda, sector 17066160
[168787.813941] sd 0:0:0:0: [sda] tag#20 FAILED Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[168787.822951] sd 0:0:0:0: [sda] tag#20 CDB: Read(10) 28 00 01 04 68 b0
00 00 08 00
[168787.830544] blk_update_request: I/O error, dev sda, sector 17066160

Eventually, I could not do anything on the system.  Not even a 'reboot'.
I had to do a cold power cycle to bring things back.

Is there anything to do about this or trash the drive and start anew?



Make sure the cables and power supply are ok.  Try the drive in another
machine
that has a different controller to see if there is an incompatibility
between
the drive and the controller.

You could make a btrfs file system on the whole device: that should say
that
a trim operation is performed for the whole device.  Maybe that helps.

If the errors persist, replace the drive.  I'd use Intel SSDs because they
seem to have the least problems with broken firmwares.  Do not use SSDs
with
hardware RAID controllers unless the SSDs were designed for this
application.




Re: [CentOS] Errors on an SSD drive

2017-08-10 Thread m . roth
Robert Moskowitz wrote:
> On 08/09/2017 10:44 PM, mad.scientist.at.la...@tutanota.com wrote:

>> what file system are you using?  ssd drives have different
>> characteristics that need to be accommodated (including a relatively slow
>> write process which is obvious as soon as the buffer is full), and
>> never, never put a swap partition on it, the high activity will wear it
>> out rather quickly.  might also check cables, often a problem
>> particularly if they are older sata cables being run at a possibly
>> higher than rated speed.
>
> When working with a Cubieboard SoC (or most of the other armv7 boards),
> you tend to have everything hanging out:
> http://medon.htt-consult.com/~rgm/cubieboard/cubietower-2.JPG
>
> I have checked the cables and they are all tight.
>
>> in any case, reformatting it might not be a bad idea, and you can always
>> use the command line program badblocks to exercise and test it.
>
> I will have to look into that.
>
Here's a thought: I've not done this, but could you use smartctl to check
the drive?

   mark



Re: [CentOS] Errors on an SSD drive

2017-08-10 Thread Robert Moskowitz



On 08/09/2017 10:44 PM, mad.scientist.at.la...@tutanota.com wrote:

what file system are you using?  ssd drives have different characteristics that 
need to be accommodated (including a relatively slow write process which is 
obvious as soon as the buffer is full), and never, never put a swap partition 
on it, the high activity will wear it out rather quickly.  might also check 
cables, often a problem particularly if they are older sata cables being run at 
a possibly higher than rated speed.


When working with a Cubieboard SoC (or most of the other armv7 boards), 
you tend to have everything hanging out: 
http://medon.htt-consult.com/~rgm/cubieboard/cubietower-2.JPG


I have checked the cables and they are all tight.


in any case, reformatting it might not be a bad idea, and you can always use the 
command line program badblocks to exercise and test it.


I will have to look into that.


   keep in mind the drive will invisibly remap any bad sectors if possible.  if 
the reported size of the drive is smaller than it should be the drive has run 
out of spare blocks and dying blocks are being removed from the storage place 
with no replacements.

--
Securely sent with Tutanota. Claim your encrypted mailbox today!
https://tutanota.com

9. Aug 2017 18:44 by elie...@ngtech.co.il:



I have yet to see an SSD read/write error which wasn't related to disk issues
like a bad sector, but the controller might have an issue with the drive.
To verify it you will need to burn some read/write IOPS of the drive, but if
it's under warranty then it's better to verify it now than later.

Eliezer


Eliezer Croitoru
Linux System Administrator
Mobile: +972-5-28704261
Email: elie...@ngtech.co.il



-Original Message-
From: CentOS [mailto:centos-boun...@centos.org] On Behalf Of Robert
Moskowitz
Sent: Wednesday, August 9, 2017 17:03
To: CentOS mailing list <centos@centos.org>
Subject: [CentOS] Errors on an SSD drive

I am building a new system using an Kingston 240GB SSD drive I pulled
from my notebook (when I had to upgrade to a 500GB SSD drive).  Centos
install went fine and ran for a couple days then got errors on the
console.  Here is an example:

[168176.995064] sd 0:0:0:0: [sda] tag#14 FAILED Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[168177.004050] sd 0:0:0:0: [sda] tag#14 CDB: Read(10) 28 00 01 04 68 b0
00 00 08 00
[168177.011615] blk_update_request: I/O error, dev sda, sector 17066160
[168487.534510] sd 0:0:0:0: [sda] tag#17 FAILED Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[168487.543576] sd 0:0:0:0: [sda] tag#17 CDB: Read(10) 28 00 01 04 68 b0
00 00 08 00
[168487.551206] blk_update_request: I/O error, dev sda, sector 17066160
[168787.813941] sd 0:0:0:0: [sda] tag#20 FAILED Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[168787.822951] sd 0:0:0:0: [sda] tag#20 CDB: Read(10) 28 00 01 04 68 b0
00 00 08 00
[168787.830544] blk_update_request: I/O error, dev sda, sector 17066160

Eventually, I could not do anything on the system.  Not even a
'reboot'.  I had to do a cold power cycle to bring things back.

Is there anything to do about this or trash the drive and start anew?

Thanks



Re: [CentOS] Errors on an SSD drive

2017-08-10 Thread Robert Moskowitz



On 08/09/2017 01:48 PM, hw wrote:

Robert Moskowitz wrote:
I am building a new system using an Kingston 240GB SSD drive I pulled 
from my notebook (when I had to upgrade to a 500GB SSD drive).  
Centos install went fine and ran for a couple days then got errors on 
the console.  Here is an example:


[168176.995064] sd 0:0:0:0: [sda] tag#14 FAILED Result: 
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[168177.004050] sd 0:0:0:0: [sda] tag#14 CDB: Read(10) 28 00 01 04 68 
b0 00 00 08 00

[168177.011615] blk_update_request: I/O error, dev sda, sector 17066160
[168487.534510] sd 0:0:0:0: [sda] tag#17 FAILED Result: 
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[168487.543576] sd 0:0:0:0: [sda] tag#17 CDB: Read(10) 28 00 01 04 68 
b0 00 00 08 00

[168487.551206] blk_update_request: I/O error, dev sda, sector 17066160
[168787.813941] sd 0:0:0:0: [sda] tag#20 FAILED Result: 
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[168787.822951] sd 0:0:0:0: [sda] tag#20 CDB: Read(10) 28 00 01 04 68 
b0 00 00 08 00

[168787.830544] blk_update_request: I/O error, dev sda, sector 17066160

Eventually, I could not do anything on the system.  Not even a 
'reboot'.  I had to do a cold power cycle to bring things back.


Is there anything to do about this or trash the drive and start anew?


Make sure the cables and power supply are ok.  Try the drive in 
another machine
that has a different controller to see if there is an incompatibility 
between

the drive and the controller.

You could make a btrfs file system on the whole device: that should 
say that

a trim operation is performed for the whole device.  Maybe that helps.


This is a Centos7-armv7hl install, which is done by dd'ing the provided image
onto a drive, so I really can't alter the provided file systems much other
than to resize them.  What I have is:


Model: ATA KINGSTON SV300S3 (scsi)
Disk /dev/sda: 240GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:

Number  Start   End     Size    Type     File system     Flags
 1      1049kB  1075MB  1074MB  primary  ext3
 2      1075MB  2149MB  1074MB  primary  linux-swap(v1)
 3      2149MB  240GB   238GB   primary  ext4






If the errors persist, replace the drive.  I'd use Intel SSDs because they
seem to have the least problems with broken firmwares.  Do not use SSDs with
hardware RAID controllers unless the SSDs were designed for this application.





Re: [CentOS] Errors on an SSD drive

2017-08-10 Thread Robert Moskowitz



On 08/09/2017 10:46 AM, Chris Murphy wrote:

If it's a bad sector problem, you'd write to sector 17066160 and see if the
drive complies or spits back a write error. It looks like a bad sector in
that the same LBA is reported each time but I've only ever seen this with
both a read error and a UNC error. So I'm not sure it's a bad sector.

What is DID_BAD_TARGET?


I have no experience on how to force a write to a specific sector and 
not cause other problems.  I suspect that this sector is in the / partition:


Disk /dev/sda: 240.1 GB, 240057409536 bytes, 468862128 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0xc89d

Device     Boot    Start        End     Blocks  Id  System
/dev/sda1           2048    2099199    1048576  83  Linux
/dev/sda2        2099200    4196351    1048576  82  Linux swap / Solaris
/dev/sda3        4196352  468862127  232332888  83  Linux

But I don't know where it is in relation to the way the drive was 
formatted in my notebook.  I think it would have been in the / partition.




And what do you get for
smartctl -x 


About 17KB of output?  I don't know how to read what it is saying, but 
noted in the beginning:


Write SCT (Get) XXX Error Recovery Control Command failed: scsi error 
badly formed scsi parameters


Don't know what this means...

BTW, the system is a Cubieboard2 armv7 SoC running Centos7-armv7hl. This 
is the first time I have used an SSD on a Cubie, but I know it is 
frequently done.  I would have to ask on the Cubie forum what others 
experience with SSDs have been.





Chris Murphy

On Wed, Aug 9, 2017, 8:03 AM Robert Moskowitz  wrote:


I am building a new system using an Kingston 240GB SSD drive I pulled
from my notebook (when I had to upgrade to a 500GB SSD drive).  Centos
install went fine and ran for a couple days then got errors on the
console.  Here is an example:

[168176.995064] sd 0:0:0:0: [sda] tag#14 FAILED Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[168177.004050] sd 0:0:0:0: [sda] tag#14 CDB: Read(10) 28 00 01 04 68 b0
00 00 08 00
[168177.011615] blk_update_request: I/O error, dev sda, sector 17066160
[168487.534510] sd 0:0:0:0: [sda] tag#17 FAILED Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[168487.543576] sd 0:0:0:0: [sda] tag#17 CDB: Read(10) 28 00 01 04 68 b0
00 00 08 00
[168487.551206] blk_update_request: I/O error, dev sda, sector 17066160
[168787.813941] sd 0:0:0:0: [sda] tag#20 FAILED Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[168787.822951] sd 0:0:0:0: [sda] tag#20 CDB: Read(10) 28 00 01 04 68 b0
00 00 08 00
[168787.830544] blk_update_request: I/O error, dev sda, sector 17066160

Eventually, I could not do anything on the system.  Not even a
'reboot'.  I had to do a cold power cycle to bring things back.

Is there anything to do about this or trash the drive and start anew?

Thanks



Re: [CentOS] Errors on an SSD drive

2017-08-10 Thread John Hodrien

On Thu, 10 Aug 2017, mad.scientist.at.la...@tutanota.com wrote:


what file system are you using?  ssd drives have different characteristics
that need to be accommodated (including a relatively slow write process which
is obvious as soon as the buffer is full), and never, never put a swap
partition on it, the high activity will wear it out rather quickly.


I know this is common doctrine, but is this still generally held true?

For a well configured desktop that rarely needs to swap, I struggle to see the
load on the SSD as being significant, and yet obviously the performance of an
SSD would make it ideal for swap.


might also check cables, often a problem particularly if they are older sata
cables being run at a possibly higher than rated speed.  in any case,
reformatting it might not be a bad idea, and you can always use the command
line program badblocks to exercise and test it.


Exercising an SSD?

smartctl will give you sensible information on what the drive thinks of
itself, and will give you actual figures on wear levelling and such like.


keep in mind the drive will invisibly remap any bad sectors if possible.  if
the reported size of the drive is smaller than it should be the drive has
run out of spare blocks and dying blocks are being removed from the storage
place with no replacements.


Coo, I've never seen a disk actually shrink due to failed sectors.  I don't
think I've got an SSD into a worn state yet to see this.

jh


Re: [CentOS] Errors on an SSD drive

2017-08-09 Thread mad.scientist.at.large
what file system are you using?  ssd drives have different characteristics that 
need to be accommodated (including a relatively slow write process which is 
obvious as soon as the buffer is full), and never, never put a swap partition 
on it, the high activity will wear it out rather quickly.  might also check 
cables, often a problem particularly if they are older sata cables being run at 
a possibly higher than rated speed.  in any case, reformatting it might not be a 
bad idea, and you can always use the command line program badblocks to exercise 
and test it.  keep in mind the drive will invisibly remap any bad sectors if 
possible.  if the reported size of the drive is smaller than it should be the 
drive has run out of spare blocks and dying blocks are being removed from the 
storage place with no replacements.

--
Securely sent with Tutanota. Claim your encrypted mailbox today!
https://tutanota.com

9. Aug 2017 18:44 by elie...@ngtech.co.il:


> I have yet to see an SSD read/write error which wasn't related to disk issues
> like a bad sector, but the controller might have an issue with the drive.
> To verify it you will need to burn some read/write IOPS of the drive, but if
> it's under warranty then it's better to verify it now than later.
>
> Eliezer
>
> 
> Eliezer Croitoru
> Linux System Administrator
> Mobile: +972-5-28704261
> Email: elie...@ngtech.co.il
>
>
>
> -Original Message-
> From: CentOS [mailto:centos-boun...@centos.org] On Behalf Of Robert
> Moskowitz
> Sent: Wednesday, August 9, 2017 17:03
> To: CentOS mailing list <centos@centos.org>
> Subject: [CentOS] Errors on an SSD drive
>
> I am building a new system using an Kingston 240GB SSD drive I pulled 
> from my notebook (when I had to upgrade to a 500GB SSD drive).  Centos 
> install went fine and ran for a couple days then got errors on the 
> console.  Here is an example:
>
> [168176.995064] sd 0:0:0:0: [sda] tag#14 FAILED Result: 
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> [168177.004050] sd 0:0:0:0: [sda] tag#14 CDB: Read(10) 28 00 01 04 68 b0 
> 00 00 08 00
> [168177.011615] blk_update_request: I/O error, dev sda, sector 17066160
> [168487.534510] sd 0:0:0:0: [sda] tag#17 FAILED Result: 
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> [168487.543576] sd 0:0:0:0: [sda] tag#17 CDB: Read(10) 28 00 01 04 68 b0 
> 00 00 08 00
> [168487.551206] blk_update_request: I/O error, dev sda, sector 17066160
> [168787.813941] sd 0:0:0:0: [sda] tag#20 FAILED Result: 
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> [168787.822951] sd 0:0:0:0: [sda] tag#20 CDB: Read(10) 28 00 01 04 68 b0 
> 00 00 08 00
> [168787.830544] blk_update_request: I/O error, dev sda, sector 17066160
>
> Eventually, I could not do anything on the system.  Not even a 
> 'reboot'.  I had to do a cold power cycle to bring things back.
>
> Is there anything to do about this or trash the drive and start anew?
>
> Thanks
>


Re: [CentOS] Errors on an SSD drive

2017-08-09 Thread Eliezer Croitoru
I have yet to see an SSD read/write error which wasn't related to disk issues
like a bad sector, but the controller might have an issue with the drive.
To verify it you will need to burn some read/write IOPS of the drive, but if
it's under warranty then it's better to verify it now than later.

Eliezer


Eliezer Croitoru
Linux System Administrator
Mobile: +972-5-28704261
Email: elie...@ngtech.co.il



-Original Message-
From: CentOS [mailto:centos-boun...@centos.org] On Behalf Of Robert
Moskowitz
Sent: Wednesday, August 9, 2017 17:03
To: CentOS mailing list 
Subject: [CentOS] Errors on an SSD drive

I am building a new system using an Kingston 240GB SSD drive I pulled 
from my notebook (when I had to upgrade to a 500GB SSD drive).  Centos 
install went fine and ran for a couple days then got errors on the 
console.  Here is an example:

[168176.995064] sd 0:0:0:0: [sda] tag#14 FAILED Result: 
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[168177.004050] sd 0:0:0:0: [sda] tag#14 CDB: Read(10) 28 00 01 04 68 b0 
00 00 08 00
[168177.011615] blk_update_request: I/O error, dev sda, sector 17066160
[168487.534510] sd 0:0:0:0: [sda] tag#17 FAILED Result: 
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[168487.543576] sd 0:0:0:0: [sda] tag#17 CDB: Read(10) 28 00 01 04 68 b0 
00 00 08 00
[168487.551206] blk_update_request: I/O error, dev sda, sector 17066160
[168787.813941] sd 0:0:0:0: [sda] tag#20 FAILED Result: 
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[168787.822951] sd 0:0:0:0: [sda] tag#20 CDB: Read(10) 28 00 01 04 68 b0 
00 00 08 00
[168787.830544] blk_update_request: I/O error, dev sda, sector 17066160

Eventually, I could not do anything on the system.  Not even a 
'reboot'.  I had to do a cold power cycle to bring things back.

Is there anything to do about this or trash the drive and start anew?

Thanks



Re: [CentOS] Errors on an SSD drive

2017-08-09 Thread Mark Haney
To be honest, I'd not try a btrfs volume on a notebook SSD. I did that on a
couple of systems and it corrupted pretty quickly.  I'd stick with xfs/ext4
if you manage to get the drive working again.



On Wed, Aug 9, 2017 at 1:48 PM, hw  wrote:

> Robert Moskowitz wrote:
>
>> I am building a new system using an Kingston 240GB SSD drive I pulled
>> from my notebook (when I had to upgrade to a 500GB SSD drive).  Centos
>> install went fine and ran for a couple days then got errors on the
>> console.  Here is an example:
>>
>> [168176.995064] sd 0:0:0:0: [sda] tag#14 FAILED Result:
>> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
>> [168177.004050] sd 0:0:0:0: [sda] tag#14 CDB: Read(10) 28 00 01 04 68 b0
>> 00 00 08 00
>> [168177.011615] blk_update_request: I/O error, dev sda, sector 17066160
>> [168487.534510] sd 0:0:0:0: [sda] tag#17 FAILED Result:
>> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
>> [168487.543576] sd 0:0:0:0: [sda] tag#17 CDB: Read(10) 28 00 01 04 68 b0
>> 00 00 08 00
>> [168487.551206] blk_update_request: I/O error, dev sda, sector 17066160
>> [168787.813941] sd 0:0:0:0: [sda] tag#20 FAILED Result:
>> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
>> [168787.822951] sd 0:0:0:0: [sda] tag#20 CDB: Read(10) 28 00 01 04 68 b0
>> 00 00 08 00
>> [168787.830544] blk_update_request: I/O error, dev sda, sector 17066160
>>
>> Eventually, I could not do anything on the system.  Not even a 'reboot'.
>> I had to do a cold power cycle to bring things back.
>>
>> Is there anything to do about this or trash the drive and start anew?
>>
>
> Make sure the cables and power supply are ok.  Try the drive in another
> machine that has a different controller to see if there is an
> incompatibility between the drive and the controller.
>
> You could make a btrfs file system on the whole device: that should say
> that a trim operation is performed for the whole device.  Maybe that helps.
>
> If the errors persist, replace the drive.  I'd use Intel SSDs because they
> seem to have the fewest problems with broken firmware.  Do not use SSDs
> with hardware RAID controllers unless the SSDs were designed for this
> application.


-- 
Mark Haney
Network Engineer at NeoNova
919-460-3330 (opt 1) • mark.ha...@neonova.net
www.neonova.net



Re: [CentOS] Errors on an SSD drive

2017-08-09 Thread hw

Robert Moskowitz wrote:

> I am building a new system using a Kingston 240GB SSD drive I pulled from my
> notebook (when I had to upgrade to a 500GB SSD drive).  CentOS install went
> fine and ran for a couple days then got errors on the console.  Here is an
> example:
>
> [168176.995064] sd 0:0:0:0: [sda] tag#14 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> [168177.004050] sd 0:0:0:0: [sda] tag#14 CDB: Read(10) 28 00 01 04 68 b0 00 00 08 00
> [168177.011615] blk_update_request: I/O error, dev sda, sector 17066160
> [168487.534510] sd 0:0:0:0: [sda] tag#17 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> [168487.543576] sd 0:0:0:0: [sda] tag#17 CDB: Read(10) 28 00 01 04 68 b0 00 00 08 00
> [168487.551206] blk_update_request: I/O error, dev sda, sector 17066160
> [168787.813941] sd 0:0:0:0: [sda] tag#20 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> [168787.822951] sd 0:0:0:0: [sda] tag#20 CDB: Read(10) 28 00 01 04 68 b0 00 00 08 00
> [168787.830544] blk_update_request: I/O error, dev sda, sector 17066160
>
> Eventually, I could not do anything on the system.  Not even a 'reboot'.  I had
> to do a cold power cycle to bring things back.
>
> Is there anything to do about this or trash the drive and start anew?


Make sure the cables and power supply are ok.  Try the drive in another machine
that has a different controller to see if there is an incompatibility between
the drive and the controller.

You could make a btrfs file system on the whole device: that should say that
a trim operation is performed for the whole device.  Maybe that helps.

If the errors persist, replace the drive.  I'd use Intel SSDs because they
seem to have the fewest problems with broken firmware.  Do not use SSDs with
hardware RAID controllers unless the SSDs were designed for this application.



Re: [CentOS] Errors on an SSD drive

2017-08-09 Thread Chris Murphy
If it's a bad sector problem, you'd write to sector 17066160 and see if the
drive complies or spits back a write error. It looks like a bad sector in
that the same LBA is reported each time but I've only ever seen this with
both a read error and a UNC error. So I'm not sure it's a bad sector.
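
As a cross-check on "the same LBA is reported each time", the CDB bytes in the
log can be decoded by hand. What follows is only a rough sketch in Python,
assuming the usual 10-byte READ(10) layout (opcode, flags, 4-byte big-endian
LBA, group number, 2-byte big-endian transfer length, control) and assuming
512-byte logical sectors. The helper name decode_read10 is made up for
illustration and is not part of any library.

```python
# Decode the READ(10) CDB bytes printed in the kernel log.
# Assumed layout: opcode, flags, 4-byte big-endian LBA, group number,
# 2-byte big-endian transfer length, control.
# decode_read10 is a hypothetical helper name used only for this sketch.

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def decode_read10(cdb_hex):
    """Return (lba, block_count) for a READ(10) CDB given as a hex string."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2..5: starting LBA
    blocks = int.from_bytes(cdb[7:9], "big")   # bytes 7..8: transfer length
    return lba, blocks


lba, blocks = decode_read10("28 00 01 04 68 b0 00 00 08 00")
print("LBA:", lba)                                # 17066160
print("blocks:", blocks)                          # 8
print("byte offset:", lba * SECTOR_SIZE)          # 8737873920
print("length in bytes:", blocks * SECTOR_SIZE)   # 4096
```

So every failing request in the log is the same 8-sector read starting at LBA
17066160, which is the sector number the kernel prints on the
blk_update_request line.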

What is DID_BAD_TARGET?

And what do you get for
smartctl -x 

Chris Murphy

On Wed, Aug 9, 2017, 8:03 AM Robert Moskowitz  wrote:

> I am building a new system using a Kingston 240GB SSD drive I pulled
> from my notebook (when I had to upgrade to a 500GB SSD drive).  CentOS
> install went fine and ran for a couple days then got errors on the
> console.  Here is an example:
>
> [168176.995064] sd 0:0:0:0: [sda] tag#14 FAILED Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> [168177.004050] sd 0:0:0:0: [sda] tag#14 CDB: Read(10) 28 00 01 04 68 b0
> 00 00 08 00
> [168177.011615] blk_update_request: I/O error, dev sda, sector 17066160
> [168487.534510] sd 0:0:0:0: [sda] tag#17 FAILED Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> [168487.543576] sd 0:0:0:0: [sda] tag#17 CDB: Read(10) 28 00 01 04 68 b0
> 00 00 08 00
> [168487.551206] blk_update_request: I/O error, dev sda, sector 17066160
> [168787.813941] sd 0:0:0:0: [sda] tag#20 FAILED Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> [168787.822951] sd 0:0:0:0: [sda] tag#20 CDB: Read(10) 28 00 01 04 68 b0
> 00 00 08 00
> [168787.830544] blk_update_request: I/O error, dev sda, sector 17066160
>
> Eventually, I could not do anything on the system.  Not even a
> 'reboot'.  I had to do a cold power cycle to bring things back.
>
> Is there anything to do about this or trash the drive and start anew?
>
> Thanks