Re: Boot-time hard drive errors

2013-02-24 Thread Martin Alejandro Paredes Sanchez
On Sunday 24 February 2013 14:33:06 Ronald F. Guilmette wrote:
> I have a somewhat eclectic system, currently running (or at any rate,
> trying to run) 9.1-RELEASE.  The system in question contains three
> drives, to wit:
>
> ATA-8 SATA 3.x device
> ATA-8 SATA 1.x device
> ATA-8 SATA 3.x device
>
> Previously, I had the ST3500320AS in this system, along with one other
> entirely different Seagate drive, i.e. one not shown in the list above.
> (Also, I was previously running 8.3-RELEASE and only recently updated
> to 9.1-RELEASE.)
>
> Since I reconfigured the system to its current state, i.e. with the set
> of three drives listed above, whenever I reboot the system, about 50%
> of the time, when the boot process gets down to the point where it
> would ordinarily be printing out the messages relating to ada0, ada1,
> etc. suddenly I start to get a massive and apparently endless stream
> of error messages, apparently relating to one of the drives listed
> above, but the stream actually alternates between two consecutive
> error messages, both undoubtedly related to each other.
>

Is your HDD controller SATA 3?

I had a similar problem (sometimes the machine would not boot), and it was
caused by my HDD controller being SATA 1:

Intel ICH5 SATA150 controller

while my hard disk is SATA 2:

WDC WD2500AVVS-00L2B0 01.03A01

The problem disappeared when I locked the HDD at 150 MB/s (using the drive's
jumper to limit it to SATA 1).
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscr...@freebsd.org"


Re: Boot-time hard drive errors

2013-02-24 Thread Simon


Have you tried Pause/Break to see if you could freeze the screen and read the
error message?

I would stress test all three drives to see if they pass with flying colors. One
or more of your drives could indeed be flaky; being new means little. Also,
something could be conflicting from time to time, and that could also show up
under stress testing.

Make a backup of any important data before stress testing.

-Simon

On Sun, 24 Feb 2013 13:33:06 -0800, Ronald F. Guilmette wrote:



>I have a somewhat eclectic system, currently running (or at any rate,
>trying to run) 9.1-RELEASE.  The system in question contains three
>drives, to wit:

>ATA-8 SATA 3.x device
>ATA-8 SATA 1.x device
>ATA-8 SATA 3.x device

>Previously, I had the ST3500320AS in this system, along with one other
>entirely different Seagate drive, i.e. one not shown in the list above.
>(Also, I was previously running 8.3-RELEASE and only recently updated
>to 9.1-RELEASE.)

>Since I reconfigured the system to its current state, i.e. with the set
>of three drives listed above, whenever I reboot the system, about 50%
>of the time, when the boot process gets down to the point where it
>would ordinarily be printing out the messages relating to ada0, ada1,
>etc. suddenly I start to get a massive and apparently endless stream
>of error messages, apparently relating to one of the drives listed
>above, but the stream actually alternates between two consecutive
>error messages, both undoubtedly related to each other.

>The boot process never completes, and I am just left staring at a
>screen that's displaying, in very rapid succession, first the one
>error message and then the other, and then the first one again, and
>then the second one again, and on and on like that.

>Unfortunately, the two error messages are being printed on the screen
>so fast (and alternating, as described above) that I cannot even read
>them, but I could just barely make out that they seem to relate to ada2...
>well, anyway, one or another of the hard drives.

>I do not know the proper way to rectify whatever is causing these "flaky"
>errors.  I use the term "flaky" because, as I have said, this boot-time
>problem only seems to occur maybe about 50% of the time, and the rest
>of the time when I boot up there is no problem whatsoever.

>Because I am able to boot up successfully, with no problems whatsoever,
>a significant fraction of the time, I am inclined to think that whatever
>is causing the failure is not actually a hardware fault.  (And by the way,
>the WDC drive and the Hitachi drive are both practically brand new.  That
>doesn't prove anything, of course, but it does make me think that they
>are unlikely to have serious hardware faults.)

>I would report this problem by filing a standard PR, but as I've said
>above, I can't even read the error messages, because they are being
>printed in such rapid succession, so I'm not sure that filing a PR
>would be useful to anybody.  I mean what would it say?  That I'm getting
>some unspecified failure at boot time that seems to relate to the hard
>drives in this system?  That kind of PR would clearly not be very helpful.

>Has anyone else ever encountered symptoms like those I have listed
>above, either with 9.1-RELEASE or with any other version of FreeBSD?


>Regards,
>rfg


Boot-time hard drive errors

2013-02-24 Thread Ronald F. Guilmette


I have a somewhat eclectic system, currently running (or at any rate,
trying to run) 9.1-RELEASE.  The system in question contains three
drives, to wit:

ATA-8 SATA 3.x device
ATA-8 SATA 1.x device
ATA-8 SATA 3.x device

Previously, I had the ST3500320AS in this system, along with one other
entirely different Seagate drive, i.e. one not shown in the list above.
(Also, I was previously running 8.3-RELEASE and only recently updated
to 9.1-RELEASE.)

Since I reconfigured the system to its current state, i.e. with the set
of three drives listed above, whenever I reboot the system, about 50%
of the time, when the boot process gets down to the point where it
would ordinarily be printing out the messages relating to ada0, ada1,
etc. suddenly I start to get a massive and apparently endless stream
of error messages, apparently relating to one of the drives listed
above, but the stream actually alternates between two consecutive
error messages, both undoubtedly related to each other.

The boot process never completes, and I am just left staring at a
screen that's displaying, in very rapid succession, first the one
error message and then the other, and then the first one again, and
then the second one again, and on and on like that.

Unfortunately, the two error messages are being printed on the screen
so fast (and alternating, as described above) that I cannot even read
them, but I could just barely make out that they seem to relate to ada2...
well, anyway, one or another of the hard drives.

I do not know the proper way to rectify whatever is causing these "flaky"
errors.  I use the term "flaky" because, as I have said, this boot-time
problem only seems to occur maybe about 50% of the time, and the rest
of the time when I boot up there is no problem whatsoever.

Because I am able to boot up successfully, with no problems whatsoever,
a significant fraction of the time, I am inclined to think that whatever
is causing the failure is not actually a hardware fault.  (And by the way,
the WDC drive and the Hitachi drive are both practically brand new.  That
doesn't prove anything, of course, but it does make me think that they
are unlikely to have serious hardware faults.)

I would report this problem by filing a standard PR, but as I've said
above, I can't even read the error messages, because they are being
printed in such rapid succession, so I'm not sure that filing a PR
would be useful to anybody.  I mean what would it say?  That I'm getting
some unspecified failure at boot time that seems to relate to the hard
drives in this system?  That kind of PR would clearly not be very helpful.

Has anyone else ever encountered symptoms like those I have listed
above, either with 9.1-RELEASE or with any other version of FreeBSD?


Regards,
rfg


Re: "make package" vs "pkg create"

2013-02-24 Thread Matthew Seaman
On 24/02/2013 16:33, Joshua Isom wrote:
> I tried making a build jail, not with poudriere or tinderbox.  I just
> went to the ports and ran `make -DBATCH package-recursive clean` to get
> packages created.  I ran `pkg add *` in the packages/All directory, but
> all failed because of MANIFEST missing.  I'm guessing this is a bug in
> the .mk files, since I do have WITH_PKGNG set.  Is this a known problem
> or is there supposed to be a different way to do it?  Am I just supposed
> to use poudriere or the source to keep my ports up to date until all
> the packages are rebuilt on freebsd.org?

'MANIFEST' is pretty fundamental to pkgs -- probably the error you are
seeing is because files that aren't pkgng packages are present in that
directory.  That's going to upset pkg add.  What's the history of
this jail?  Did it start out using pkgng, or did it get converted from
pkg_tools?  If the latter, did the conversion go smoothly?  Can you use
eg. 'pkg info' in your jail to get an accurate listing of the packages
installed there?

If 'WITH_PKGNG' is set in your make.conf, then 'make package' will
certainly use pkgng to generate packages.  I do that a lot in testing,
and it works just fine.

If you can clear out the non-pkgng stuff, the recommended way to do what
you intend is to generate a repository catalogue, and then use 'pkg
install'.  'pkg add' really should only be considered for installing
single packages when there is absolutely no alternative.

You should be able to run 'pkg repo /usr/ports/packages' to build a
repository catalogue for all the pkgng packages you've built in your
jail.  Then you can either mount the jail's package tree on the machine
where you want to install packages, or make it available through a web
server.  Set PACKAGESITE appropriately in ${LOCALBASE}/etc/pkg.conf --
for instance, this is what you'd set to use a repo made as above and
mounted in the same location:

   PACKAGESITE : file:/usr/ports/packages

You can then use 'pkg install' or 'pkg upgrade' in the usual way.

Note: you won't need to install every package in your repo -- many of
them will exist solely in order to facilitate building other packages.
If you choose the packages you specifically want, pkgng will sort out
installing the required dependencies, and moreover will set the
autoremove flags appropriately, so you could later purge things
installed solely as dependencies of packages you no longer want.
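
A minimal sketch of that workflow as a shell function, assuming the
repository directory and PACKAGESITE setting shown above.  The names
install_from_local_repo, run and DRY_RUN are made up for illustration and
are not part of pkgng; 'pkg update' is used here to refresh the local copy
of the catalogue before installing.

```sh
#!/bin/sh
# Sketch only: install_from_local_repo and DRY_RUN are illustrative names,
# not part of pkgng.

# Run a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: directory holding the built packages; remaining args: wanted packages.
install_from_local_repo() {
    repo_dir=${1:-/usr/ports/packages}
    if [ $# -gt 0 ]; then shift; fi

    # Build the repository catalogue for everything built so far.
    run pkg repo "$repo_dir" || return 1

    # Assumes PACKAGESITE in ${LOCALBASE}/etc/pkg.conf already points at
    # this repo (file:/usr/ports/packages in the example above).
    run pkg update || return 1

    # Name only the packages you want; pkg pulls in their dependencies.
    if [ $# -gt 0 ]; then
        run pkg install "$@"
    fi
}
```

With DRY_RUN set, 'install_from_local_repo /usr/ports/packages vim' only
prints the three pkg commands it would run, which is a cheap way to check
the sequence before touching a real system.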

Cheers,

Matthew

-- 
Dr Matthew J Seaman MA, D.Phil.
PGP: http://www.infracaninophile.co.uk/pgpkey






Re: "make package" vs "pkg create"

2013-02-24 Thread Steve O'Hara-Smith
On Sun, 24 Feb 2013 10:33:45 -0600
Joshua Isom  wrote:

> I tried making a build jail, not with poudriere or tinderbox.  I just
> went to the ports and ran `make -DBATCH package-recursive clean` to get 
> packages created.  I ran `pkg add *` in the packages/All directory, but 
> all failed because of MANIFEST missing.  I'm guessing this is a bug in 
> the .mk files, since I do have WITH_PKGNG set.

No bug, but you will need to run pkg repo  to turn the collection
of packages you've built into a pkgng repository.

>  Is this a known problem 
> or is there supposed to be a different way to do it?  Am I just supposed 
> to use poudriere or the source to keep my ports up to date until all
> the packages are rebuilt on freebsd.org?

You don't need to use poudriere, but it is very convenient once set
up. For example, to update the ports tree and rebuild the affected ports:

poudriere ports -u
poudriere bulk -f /root/packages -j build

build is my build jail, and /root/packages is a file listing the
packages I want.

-- 
Steve O'Hara-Smith 


Re: Can't build kernel

2013-02-24 Thread Andre Goree
On 02/23/2013 07:04 PM, ill...@gmail.com wrote:
> On 22 February 2013 18:56, Andre Goree  wrote:
> 
>> cc1: warnings being treated as errors
> 
> Need to set NO_WERROR perhaps?
> 

Thanks for the suggestion, though it did not help.  This turned out to
be user error (i.e. a failed patch).  After erasing /usr/src and pulling
everything down again, I was able to rebuild without issue.  Thanks.

-- 
Andre Goree
an...@drenet.info





"make package" vs "pkg create"

2013-02-24 Thread Joshua Isom
I tried making a build jail, not with poudriere or tinderbox.  I just
went to the ports and ran `make -DBATCH package-recursive clean` to get 
packages created.  I ran `pkg add *` in the packages/All directory, but 
all failed because of MANIFEST missing.  I'm guessing this is a bug in 
the .mk files, since I do have WITH_PKGNG set.  Is this a known problem 
or is there supposed to be a different way to do it?  Am I just supposed 
to use poudriere or the source to keep my ports up to date until all
the packages are rebuilt on freebsd.org?



Re: Strange delays in ZFS scrub or resilver

2013-02-24 Thread Rob Rati
A bit of a stab in the dark here, but are any of the disks in your array 
Advanced Format drives?  If so, did you create a pool with a block size of 4k?  
Lastly, are all the partitions on your disks (if any) aligned to 4k sector 
boundaries (in the case of the Advanced Format disks)?
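
For the alignment question, a minimal sketch of the arithmetic, assuming
partition start sectors as reported in the first column of 'gpart show' and
512-byte logical sectors unless told otherwise.  The function name
is_4k_aligned is made up for illustration.

```sh
#!/bin/sh
# Succeed if a partition starting at sector $1 begins on a 4096-byte
# boundary.  $2 is the logical sector size in bytes (assumed 512 if omitted).
is_4k_aligned() {
    [ $(( $1 * ${2:-512} % 4096 )) -eq 0 ]
}

# Example: a start sector of 34 (34 * 512 = 17408 bytes) is not aligned,
# while 40 (40 * 512 = 20480 bytes) is.
for start in 34 40 2048; do
    if is_4k_aligned "$start"; then
        echo "sector $start: aligned"
    else
        echo "sector $start: NOT aligned"
    fi
done
```

For the pool block size, the ashift value in the pool configuration (12
corresponds to 4k) can usually be read with something like
'zdb -C poolname | grep ashift'; treat that invocation as a starting point
rather than a guaranteed recipe.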

Rob

On Feb 23, 2013, at 11:23 PM, John Levine wrote:

> I have a raidz of three 1 TB SATA drives, in USB enclosures.  One of
> the disks went bad, so I replaced it last night and it's been
> resilvering ever since.  I can watch the activity lights on the disks
> and it cranks away for a minute or so, then stops for a minute, then
> cranks for a minute, and so forth.  If I do a zpool status while it's
> stopped, the zpool waits until the I/O resumes, and a ^T shows it
> waiting for zio->io_cv.
> 
> I'm running FreeBSD 9.1, amd64 version, totally vanilla install on a
> mini-itx box with 4GB of RAM.  The root/swap disk is an SSD separate
> from the zfs disks.  When the disks are active, top shows about 10%
> system time and 4% interrupt.  When it isn't, top shows about 99.8%
> idle.  The server isn't doing much else, and nothing else currently
> touches the disks.  (They're for remote backup of a system somewhere
> else, and I have the backup job turned off until resilvering
> completes.)
> 
> I'm running this on the console, and there are no disk error messages.
> 
> Any idea what's going on or how to fix it?  I could move the disks to
> an ESATA enclosure if USB is losing interrupts or something.
> 
> My recollection is that when I've done a scrub, it does the same thing,
> work, pause, work, pause.
> 
> R's,
> John


Re: HAST - detect failure and restore avoiding an outage?

2013-02-24 Thread Pawel Jakub Dawidek
On Sun, Feb 24, 2013 at 12:05:06PM +0200, Mikolaj Golub wrote:
> On Sat, Feb 23, 2013 at 09:51:03PM +0100, Pawel Jakub Dawidek wrote:
> 
> > I'm fine with the patch except for the missing breaks in the switch added to
> > hastd/primary.c.
> 
> Oops. Fixed. Thanks!
> 
> > I'm also wondering... You count all those errors separately just to
> > print them as one number. If we do that already let's print them
> > separately, eg.
> > 
> > local i/o errors: read(0), write(3), delete(5), flush(9)
> 
> The idea was that hastd would provide all available counters, and hastctl
> would show only an aggregated counter to save screen space, but anyone
> who wanted to write their own utility to monitor hastd, talking
> directly to hastd via its socket, would be able to see all counters
> separately.
> 
> But your idea of writing the errors in one string looks better, as it
> saves screen space and provides more detailed info. I would
> prefer a slightly different output though:
> 
>   role: secondary
>   provname: test
>   localpath: /dev/md102
>   extentsize: 2097152 (2.0MB)
>   keepdirty: 0
>   remoteaddr: kopusha:7771
>   replication: memsync
>   status: complete
>   dirty: 0 (0B)
>   statistics:
>     reads: 13
>     writes: 521
>     deletes: 0
>     flushes: 0
>     activemap updates: 0
>     local i/o errors:
>       read: 13, write: 425, delete: 0, flush: 0
> 
> but I don't have a strong opinion and will be OK with yours if you don't
> like my version.

My only comment would be to keep that in one line so it is easier to
grep. And merging those two lines won't exceed 80 chars.
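
To illustrate the grep point, a minimal sketch, assuming the merged line
ends up looking like "local i/o errors: read: 13, write: 425, delete: 0,
flush: 0" (that exact layout is a guess until the patch is final) and that
the status output is fed in on stdin.  The function name hast_local_errors
is made up for illustration.

```sh
#!/bin/sh
# Sum the per-operation counters on the "local i/o errors" line.
# The one-line layout is an assumption, not the committed hastctl format.
hast_local_errors() {
    grep 'local i/o errors:' |
    sed 's/.*local i\/o errors://' |
    tr -c '0-9\n' ' ' |
    awk '{ s = 0; for (i = 1; i <= NF; i++) s += $i; print s }'
}

# Example with canned input instead of a live hastd:
printf '%s\n' \
    '    writes: 521' \
    '    local i/o errors: read: 13, write: 425, delete: 0, flush: 0' |
    hast_local_errors
```

A monitoring script could compare that total against zero and raise an
alert, which is much simpler when everything is on a single line.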

> > BTW, why not count activemap update errors as write and flush errors?
> 
> I need (internally) separate counters for activemap errors because
> they are updated by a different thread and I don't want to
> introduce locking for error counter update operations. As hastctl was
> supposed to show an aggregated counter, I didn't bother making
> activemap update errors count as write and flush errors. I
> improved this too in the updated patch:
> 
> http://people.freebsd.org/~trociny/hast.stat_error.2.patch

The patch looks good.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: HAST - detect failure and restore avoiding an outage?

2013-02-24 Thread Mikolaj Golub
On Sat, Feb 23, 2013 at 09:51:03PM +0100, Pawel Jakub Dawidek wrote:

> I'm fine with the patch except for the missing breaks in the switch added to
> hastd/primary.c.

Oops. Fixed. Thanks!

> I'm also wondering... You count all those errors separately just to
> print them as one number. If we do that already let's print them
> separately, eg.
> 
>   local i/o errors: read(0), write(3), delete(5), flush(9)

The idea was that hastd would provide all available counters, and hastctl
would show only an aggregated counter to save screen space, but anyone
who wanted to write their own utility to monitor hastd, talking
directly to hastd via its socket, would be able to see all counters
separately.

But your idea of writing the errors in one string looks better, as it
saves screen space and provides more detailed info. I would
prefer a slightly different output though:

  role: secondary
  provname: test
  localpath: /dev/md102
  extentsize: 2097152 (2.0MB)
  keepdirty: 0
  remoteaddr: kopusha:7771
  replication: memsync
  status: complete
  dirty: 0 (0B)
  statistics:
    reads: 13
    writes: 521
    deletes: 0
    flushes: 0
    activemap updates: 0
    local i/o errors:
      read: 13, write: 425, delete: 0, flush: 0

but I don't have a strong opinion and will be OK with yours if you don't
like my version.

> 
> BTW, why not count activemap update errors as write and flush errors?

I need (internally) separate counters for activemap errors because
they are updated by a different thread and I don't want to
introduce locking for error counter update operations. As hastctl was
supposed to show an aggregated counter, I didn't bother making
activemap update errors count as write and flush errors. I
improved this too in the updated patch:

http://people.freebsd.org/~trociny/hast.stat_error.2.patch

-- 
Mikolaj Golub


ZFS root, error 2 when mounting root

2013-02-24 Thread bw.mail.lists
Basically, I tried to follow 
https://wiki.freebsd.org/RootOnZFS/GPTZFSBoot/9.0-RELEASE, but ended up 
with a system that didn't know how to mount /.


There are two scripts attached.

zfsnocache.sh follows the instructions on the wiki. The system booted 
just fine, but when it got to the part where it mounts the root 
partition, it stopped with 'error 2' 'unknown file system'. I could 
import the pool when booting from LiveFS, I wrote to it, it was working 
fine, but at boot it just refused to be mounted as /.


zfswithcache.sh is from http://strahlert.net/wordpress/?p=142, I think.
This worked with no issues.


The main difference I see between those two scripts is that one doesn't 
use a cache file and the other one does, hence the name of the scripts. 
But it should work without cachefile too, shouldn't it? The other 
difference is how mountpoints are set, but I can't figure out what could 
be wrong there.


Can someone please explain why zfsnocache fails to mount / ?