Re: Problems replacing failing drive in ZFS pool

2010-07-20 Thread Charles Sprickman

On Wed, 21 Jul 2010, Charles Sprickman wrote:


On Tue, 20 Jul 2010, alan bryan wrote:




--- On Mon, 7/19/10, Dan Langille  wrote:


From: Dan Langille 
Subject: Re: Problems replacing failing drive in ZFS pool
To: "Freddie Cash" 
Cc: "freebsd-stable" 
Date: Monday, July 19, 2010, 7:07 PM
On 7/19/2010 12:15 PM, Freddie Cash wrote:
> On Mon, Jul 19, 2010 at 8:56 AM, Garrett Moore wrote:
>> So you think it's because when I switch from the old disk to the new disk,
>> ZFS doesn't realize the disk has changed, and thinks the data is just
>> corrupt now? Even if that happens, shouldn't the pool still be available,
>> since it's RAIDZ1 and only one disk has gone away?
>
> I think it's because you pull the old drive, boot with the new drive,
> the controller re-numbers all the devices (ie da3 is now da2, da2 is
> now da1, da1 is now da0, da0 is now da6, etc), and ZFS thinks that all
> the drives have changed, thus corrupting the pool.  I've had this
> happen on our storage servers a couple of times before I started using
> glabel(8) on all our drives (dead drive on RAID controller, remove
> drive, reboot for whatever reason, all device nodes are renumbered,
> everything goes kablooey).

Can you explain a bit about how you use glabel(8) in
conjunction with ZFS?  If I can retrofit this into an
existing ZFS array to make things easier in the future...

8.0-STABLE #0: Fri Mar  5 00:46:11 EST 2010

]# zpool status
  pool: storage
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad8     ONLINE       0     0     0
            ad10    ONLINE       0     0     0
            ad12    ONLINE       0     0     0
            ad14    ONLINE       0     0     0
            ad16    ONLINE       0     0     0

> Of course, always have good backups.  ;)

In my case, this ZFS array is the backup.  ;)

But I'm setting up a tape library, real soon now

-- Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org
mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"



Dan,

Here's how to do it after the fact:

http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2009-07/msg00623.html


Two things:

-What's the preferred labelling method for disks that will be used with zfs 
these days?  geom_label or gpt labels?  I've been using the latter and I find 
them a little simpler.


-I think that if you already are using gpt partitioning, you can add a gpt 
label after the fact (ie: gpart -i index# -l your_label adaX).  "gpart list" 
will give you a list of index numbers.


Oops.

That should be "gpart modify -i index# -l your_label adaX".
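
A minimal sketch of the corrected command, assuming a GPT-partitioned disk. The helper name gpt_label_cmd and the example values (index 1, label disk01, disk ada0) are placeholders, not taken from any real system. It only prints the command so it can be reviewed before being run by hand:

```shell
#!/bin/sh
# Sketch only: print (do not run) the gpart command that sets a GPT label.
# index, label and disk are placeholders supplied by the caller.
gpt_label_cmd() {
    index=$1 label=$2 disk=$3
    # Refuse missing arguments so a half-typed command is never printed.
    if [ -z "$index" ] || [ -z "$label" ] || [ -z "$disk" ]; then
        echo "usage: gpt_label_cmd index label disk" >&2
        return 1
    fi
    printf 'gpart modify -i %s -l %s %s\n' "$index" "$label" "$disk"
}

# Example: label partition 1 on ada0 as "disk01".
gpt_label_cmd 1 disk01 ada0
```

Once the label is set, "gpart show -l adaX" should list it in place of the partition type, and the partition should normally also be reachable as /dev/gpt/your_label.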


Charles


--Alan Bryan

Re: Problems replacing failing drive in ZFS pool

2010-07-20 Thread Charles Sprickman

On Tue, 20 Jul 2010, alan bryan wrote:




--- On Mon, 7/19/10, Dan Langille  wrote:


From: Dan Langille 
Subject: Re: Problems replacing failing drive in ZFS pool
To: "Freddie Cash" 
Cc: "freebsd-stable" 
Date: Monday, July 19, 2010, 7:07 PM
On 7/19/2010 12:15 PM, Freddie Cash wrote:
> On Mon, Jul 19, 2010 at 8:56 AM, Garrett Moore wrote:
>> So you think it's because when I switch from the old disk to the new disk,
>> ZFS doesn't realize the disk has changed, and thinks the data is just
>> corrupt now? Even if that happens, shouldn't the pool still be available,
>> since it's RAIDZ1 and only one disk has gone away?
>
> I think it's because you pull the old drive, boot with the new drive,
> the controller re-numbers all the devices (ie da3 is now da2, da2 is
> now da1, da1 is now da0, da0 is now da6, etc), and ZFS thinks that all
> the drives have changed, thus corrupting the pool.  I've had this
> happen on our storage servers a couple of times before I started using
> glabel(8) on all our drives (dead drive on RAID controller, remove
> drive, reboot for whatever reason, all device nodes are renumbered,
> everything goes kablooey).

Can you explain a bit about how you use glabel(8) in
conjunction with ZFS?  If I can retrofit this into an
existing ZFS array to make things easier in the future...

8.0-STABLE #0: Fri Mar  5 00:46:11 EST 2010

]# zpool status
  pool: storage
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad8     ONLINE       0     0     0
            ad10    ONLINE       0     0     0
            ad12    ONLINE       0     0     0
            ad14    ONLINE       0     0     0
            ad16    ONLINE       0     0     0

> Of course, always have good backups.  ;)

In my case, this ZFS array is the backup.  ;)

But I'm setting up a tape library, real soon now

-- Dan Langille - http://langille.org/



Dan,

Here's how to do it after the fact:

http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2009-07/msg00623.html


Two things:

-What's the preferred labelling method for disks that will be used with 
zfs these days?  geom_label or gpt labels?  I've been using the latter and 
I find them a little simpler.


-I think that if you already are using gpt partitioning, you can add a 
gpt label after the fact (ie: gpart -i index# -l your_label adaX).  "gpart 
list" will give you a list of index numbers.


Charles


--Alan Bryan

Re: Problems replacing failing drive in ZFS pool

2010-07-20 Thread jhell
On 07/21/2010 02:14, Joshua Boyd wrote:
> [r...@foghornleghorn ~]# zpool replace tank da0 label/disk01
> cannot open 'label/disk01': no such GEOM provider
> must be a full path or shorthand device name

Of course you can't. You have labeled a disk that is already in use, so
the label should never appear in /dev/label/*.

If you tried to resilver onto the same disk that was already in use, I would
think that, even if it could be done, the result would be inconsistent data
and write errors all over the place.
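
A minimal sketch of checking for the label node before attempting the replace. It assumes the label is expected under /dev/label and reuses the names from the quoted commands (tank, da0, disk01). The label_exists helper and its dev_root parameter are hypothetical, and the zpool command is only printed, never run:

```shell
#!/bin/sh
# Sketch only: succeed if a glabel node for the given label is present.
# dev_root defaults to /dev; it is a parameter purely so the check can be
# tried against a scratch directory.
label_exists() {
    label=$1
    dev_root=${2:-/dev}
    [ -e "$dev_root/label/$label" ]
}

# Print the replace command only when the label node is really there.
if label_exists disk01; then
    echo "zpool replace tank da0 label/disk01"
else
    echo "label/disk01 not present; not replacing" >&2
fi
```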


Regards,

-- 

 jhell,v



Re: Problems replacing failing drive in ZFS pool

2010-07-20 Thread Joshua Boyd
On Wed, Jul 21, 2010 at 2:09 AM, Joshua Boyd  wrote:

> On Wed, Jul 21, 2010 at 1:57 AM, alan bryan wrote:
>
>>
>>
>> --- On Mon, 7/19/10, Dan Langille  wrote:
>>
>> > From: Dan Langille 
>> > Subject: Re: Problems replacing failing drive in ZFS pool
>> > To: "Freddie Cash" 
>> > Cc: "freebsd-stable" 
>> > Date: Monday, July 19, 2010, 7:07 PM
>> > On 7/19/2010 12:15 PM, Freddie Cash wrote:
>> > > On Mon, Jul 19, 2010 at 8:56 AM, Garrett Moore wrote:
>> > >> So you think it's because when I switch from the old disk to the new disk,
>> > >> ZFS doesn't realize the disk has changed, and thinks the data is just
>> > >> corrupt now? Even if that happens, shouldn't the pool still be available,
>> > >> since it's RAIDZ1 and only one disk has gone away?
>> > >
>> > > I think it's because you pull the old drive, boot with the new drive,
>> > > the controller re-numbers all the devices (ie da3 is now da2, da2 is
>> > > now da1, da1 is now da0, da0 is now da6, etc), and ZFS thinks that all
>> > > the drives have changed, thus corrupting the pool.  I've had this
>> > > happen on our storage servers a couple of times before I started using
>> > > glabel(8) on all our drives (dead drive on RAID controller, remove
>> > > drive, reboot for whatever reason, all device nodes are renumbered,
>> > > everything goes kablooey).
>> >
>> > Can you explain a bit about how you use glabel(8) in
>> > conjunction with ZFS?  If I can retrofit this into an
>> > existing ZFS array to make things easier in the future...
>> >
>> > 8.0-STABLE #0: Fri Mar  5 00:46:11 EST 2010
>> >
>> > ]# zpool status
>> >   pool: storage
>> >  state: ONLINE
>> >  scrub: none requested
>> > config:
>> >
>> >         NAME        STATE     READ WRITE CKSUM
>> >         storage     ONLINE       0     0     0
>> >           raidz1    ONLINE       0     0     0
>> >             ad8     ONLINE       0     0     0
>> >             ad10    ONLINE       0     0     0
>> >             ad12    ONLINE       0     0     0
>> >             ad14    ONLINE       0     0     0
>> >             ad16    ONLINE       0     0     0
>> >
>> > > Of course, always have good backups.  ;)
>> >
>> > In my case, this ZFS array is the backup.  ;)
>> >
>> > But I'm setting up a tape library, real soon now
>> >
>> > -- Dan Langille - http://langille.org/
>>
>> Dan,
>>
>> Here's how to do it after the fact:
>>
>>
>> http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2009-07/msg00623.html
>>
>> --Alan Bryan
>>
>
> [r...@foghornleghorn ~]# glabel label disk01 /dev/da0
> glabel: Can't store metadata on /dev/da0: Operation not permitted.
>
> Hrmph.
>

Never mind, sysctl kern.geom.debugflags=16 solves that problem, but then you
get this:

[r...@foghornleghorn ~]# zpool replace tank da0 label/disk01
cannot open 'label/disk01': no such GEOM provider
must be a full path or shorthand device name






-- 
Joshua Boyd
JBipNet

E-mail: boy...@jbip.net

http://www.jbip.net


Re: Problems replacing failing drive in ZFS pool

2010-07-20 Thread Joshua Boyd
On Wed, Jul 21, 2010 at 1:57 AM, alan bryan  wrote:

>
>
> --- On Mon, 7/19/10, Dan Langille  wrote:
>
> > From: Dan Langille 
> > Subject: Re: Problems replacing failing drive in ZFS pool
> > To: "Freddie Cash" 
> > Cc: "freebsd-stable" 
> > Date: Monday, July 19, 2010, 7:07 PM
> > On 7/19/2010 12:15 PM, Freddie Cash wrote:
> > > On Mon, Jul 19, 2010 at 8:56 AM, Garrett Moore wrote:
> > >> So you think it's because when I switch from the old disk to the new disk,
> > >> ZFS doesn't realize the disk has changed, and thinks the data is just
> > >> corrupt now? Even if that happens, shouldn't the pool still be available,
> > >> since it's RAIDZ1 and only one disk has gone away?
> > >
> > > I think it's because you pull the old drive, boot with the new drive,
> > > the controller re-numbers all the devices (ie da3 is now da2, da2 is
> > > now da1, da1 is now da0, da0 is now da6, etc), and ZFS thinks that all
> > > the drives have changed, thus corrupting the pool.  I've had this
> > > happen on our storage servers a couple of times before I started using
> > > glabel(8) on all our drives (dead drive on RAID controller, remove
> > > drive, reboot for whatever reason, all device nodes are renumbered,
> > > everything goes kablooey).
> >
> > Can you explain a bit about how you use glabel(8) in
> > conjunction with ZFS?  If I can retrofit this into an
> > existing ZFS array to make things easier in the future...
> >
> > 8.0-STABLE #0: Fri Mar  5 00:46:11 EST 2010
> >
> > ]# zpool status
> >   pool: storage
> >  state: ONLINE
> >  scrub: none requested
> > config:
> >
> >         NAME        STATE     READ WRITE CKSUM
> >         storage     ONLINE       0     0     0
> >           raidz1    ONLINE       0     0     0
> >             ad8     ONLINE       0     0     0
> >             ad10    ONLINE       0     0     0
> >             ad12    ONLINE       0     0     0
> >             ad14    ONLINE       0     0     0
> >             ad16    ONLINE       0     0     0
> >
> > > Of course, always have good backups.  ;)
> >
> > In my case, this ZFS array is the backup.  ;)
> >
> > But I'm setting up a tape library, real soon now
> >
> > -- Dan Langille - http://langille.org/
>
> Dan,
>
> Here's how to do it after the fact:
>
>
> http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2009-07/msg00623.html
>
> --Alan Bryan
>

[r...@foghornleghorn ~]# glabel label disk01 /dev/da0
glabel: Can't store metadata on /dev/da0: Operation not permitted.

Hrmph.





-- 
Joshua Boyd
JBipNet

E-mail: boy...@jbip.net

http://www.jbip.net


Re: Problems replacing failing drive in ZFS pool

2010-07-20 Thread alan bryan


--- On Mon, 7/19/10, Dan Langille  wrote:

> From: Dan Langille 
> Subject: Re: Problems replacing failing drive in ZFS pool
> To: "Freddie Cash" 
> Cc: "freebsd-stable" 
> Date: Monday, July 19, 2010, 7:07 PM
> On 7/19/2010 12:15 PM, Freddie Cash wrote:
> > On Mon, Jul 19, 2010 at 8:56 AM, Garrett Moore wrote:
> >> So you think it's because when I switch from the old disk to the new disk,
> >> ZFS doesn't realize the disk has changed, and thinks the data is just
> >> corrupt now? Even if that happens, shouldn't the pool still be available,
> >> since it's RAIDZ1 and only one disk has gone away?
> >
> > I think it's because you pull the old drive, boot with the new drive,
> > the controller re-numbers all the devices (ie da3 is now da2, da2 is
> > now da1, da1 is now da0, da0 is now da6, etc), and ZFS thinks that all
> > the drives have changed, thus corrupting the pool.  I've had this
> > happen on our storage servers a couple of times before I started using
> > glabel(8) on all our drives (dead drive on RAID controller, remove
> > drive, reboot for whatever reason, all device nodes are renumbered,
> > everything goes kablooey).
>
> Can you explain a bit about how you use glabel(8) in
> conjunction with ZFS?  If I can retrofit this into an
> existing ZFS array to make things easier in the future...
>
> 8.0-STABLE #0: Fri Mar  5 00:46:11 EST 2010
>
> ]# zpool status
>   pool: storage
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         storage     ONLINE       0     0     0
>           raidz1    ONLINE       0     0     0
>             ad8     ONLINE       0     0     0
>             ad10    ONLINE       0     0     0
>             ad12    ONLINE       0     0     0
>             ad14    ONLINE       0     0     0
>             ad16    ONLINE       0     0     0
>
> > Of course, always have good backups.  ;)
>
> In my case, this ZFS array is the backup.  ;)
>
> But I'm setting up a tape library, real soon now
>
> -- Dan Langille - http://langille.org/
> 

Dan,

Here's how to do it after the fact:

http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2009-07/msg00623.html

--Alan Bryan








Re: 8.1-RC2 MCE caused by some LAPIC/clock changes? (was: 8.1-RC2 - PCI fatal error or MCE triggered by USB/ehci on Sun X4100M2?)

2010-07-20 Thread Markus Gebert

On 20.07.2010, at 21:59, John Baldwin wrote:

>> I started narrowing the revisions down until I 
>> found out, that while on r202386 I'm still able to trigger the MCE, r202387 
>> seems to solve the problem on CURRENT:
>> 
>> http://svn.freebsd.org/viewvc/base?view=revision&revision=202387
> 
> Although this change was MFC'd, it was later disabled by default because it 
> causes issues on other machines.  I think there is a tunable you need to set 
> in loader.conf to enable it for 8.1.  Attilio (the author of that commit) 
> should know which tunable to set.

Might be this one in sys/amd64/amd64/clock.c:


static int lapic_allclocks = 1;
TUNABLE_INT("machdep.lapic_allclocks", &lapic_allclocks);


The r202387 changes put this into local_apic.c, guess it was moved later on (or 
after MFC), and that's why I couldn't find it on 8-stable. And, indeed, this 
tunable seems to be gone again in current. Testing with 
machdep.lapic_allclocks=0 right now. So far it looks very promising. I'll let 
it run overnight.
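
A hedged sketch of making that setting persistent, assuming the tunable is picked up from /boot/loader.conf at boot as John suggests. The helper set_loader_tunable and its conf parameter are illustrative names, not part of FreeBSD, and the function appends a line only if the tunable is not already present:

```shell
#!/bin/sh
# Sketch only: add name="value" to a loader.conf-style file unless a line
# for that exact tunable name already exists.
set_loader_tunable() {
    name=$1 value=$2 conf=${3:-/boot/loader.conf}
    # Compare the text before "=" literally, so dots in the name are not
    # treated as regex wildcards.
    if awk -F= -v n="$name" '$1 == n { found = 1 } END { exit !found }' \
        "$conf" 2>/dev/null; then
        return 0
    fi
    printf '%s="%s"\n' "$name" "$value" >> "$conf"
}

# Example (would edit the real /boot/loader.conf, so left commented out):
# set_loader_tunable machdep.lapic_allclocks 0
```

After a reboot, "kenv machdep.lapic_allclocks" should show whether the loader actually passed the value to the kernel.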

Another thing though: Today I compared verbose boot output from 8-stable and 
the current box. I saw that the ioapic sets up IRQ routing differently on these 
two systems although the hardware is the same. This seemed not so interesting 
at first, but then I noticed that 8-stable sets up two routes (to lapic0 and 
lapic2, or sometimes lapic3) for IRQ58 (mpt0), while current only uses one 
route (to lapic0).

I used 'cpuset -c -l 0 -x 58' in an attempt to make my 8-stable box behave like 
the one running current. Indeed, this seems to have changed IRQ58 to be routed 
to lapic0 only. And the box was running for hours without showing the symptoms.

I just checked the verbose boot output of my 8-stable box again (booted with 
machdep.lapic_allclocks=0 as mentioned above). And now it seems to have set up 
IRQ routes just like the current box (one route for IRQ58 to lapic0).

So I don't get which issue came first... If either one is ruled out, the 
problem seems to be gone. Was it the clock issue causing a wrong IRQ routing 
setup, which in turn causes mpt or the CPU to go nuts? Or is mpt having two 
interrupt routes actually a normal thing (then why doesn't current behave this 
way?), but the mpt driver causes strange things when operating with clock 
issues? Or have I misinterpreted something?

Here's the boot verbose output of ioapic related to interrupts 56 (em0), 57 
(em1) and 58 (mpt0):

 1st X4100M2 - running 8-stable (machdep.lapic_allclocks=1, MCEs can be 
reproduced easily) 
# egrep '^ioapic' boot.normal | egrep 'IRQ 5[678]' | sort
ioapic2: routing intpin 0 (PCI IRQ 56) to lapic 0 vector 55
ioapic2: routing intpin 0 (PCI IRQ 56) to lapic 1 vector 50
ioapic2: routing intpin 1 (PCI IRQ 57) to lapic 0 vector 56
ioapic2: routing intpin 1 (PCI IRQ 57) to lapic 2 vector 50
ioapic2: routing intpin 2 (PCI IRQ 58) to lapic 0 vector 57
ioapic2: routing intpin 2 (PCI IRQ 58) to lapic 3 vector 50


 1st X4100M2 - running 8-stable (machdep.lapic_allclocks=0, test currently 
running, no MCEs so far) 
# egrep '^ioapic' boot.lapic_allclocks0 | egrep 'IRQ 5[678]' | sort
ioapic2: routing intpin 0 (PCI IRQ 56) to lapic 0 vector 55
ioapic2: routing intpin 0 (PCI IRQ 56) to lapic 2 vector 50
ioapic2: routing intpin 1 (PCI IRQ 57) to lapic 0 vector 56
ioapic2: routing intpin 1 (PCI IRQ 57) to lapic 3 vector 50
ioapic2: routing intpin 2 (PCI IRQ 58) to lapic 0 vector 57


 2nd X4100M2 - running current (MCEs cannot be reproduced) 
# dmesg | egrep '^ioapic' | egrep 'IRQ 5[678]' | sort
ioapic2: routing intpin 0 (PCI IRQ 56) to lapic 0 vector 55
ioapic2: routing intpin 0 (PCI IRQ 56) to lapic 2 vector 50
ioapic2: routing intpin 1 (PCI IRQ 57) to lapic 0 vector 56
ioapic2: routing intpin 1 (PCI IRQ 57) to lapic 3 vector 50
ioapic2: routing intpin 2 (PCI IRQ 58) to lapic 0 vector 57
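
The comparison above can be condensed into a per-IRQ route count. This is a minimal sketch that assumes the "ioapicN: routing intpin ... (PCI IRQ n) to lapic ..." line format shown in these excerpts; the function name count_irq_routes is made up for illustration:

```shell
#!/bin/sh
# Sketch only: count how many lapic routes each PCI IRQ has.
# Reads verbose boot messages on stdin and prints one line per IRQ.
count_irq_routes() {
    awk '/^ioapic[0-9]+: routing intpin .*\(PCI IRQ [0-9]+\) to lapic/ {
        # Keep only the number between "(PCI IRQ " and the closing ")".
        irq = $0
        sub(/.*\(PCI IRQ /, "", irq)
        sub(/\).*/, "", irq)
        routes[irq]++
    }
    END {
        for (irq in routes)
            printf "IRQ %s: %d route(s)\n", irq, routes[irq]
    }' | sort
}

# Example, using the file names from the excerpts above:
# count_irq_routes < boot.normal
# count_irq_routes < boot.lapic_allclocks0
```

An IRQ that reports two routes on one boot and one on another is then easy to spot without reading the raw lines side by side.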



Markus



Re: 8.1-RC2 MCE caused by some LAPIC/clock changes? (was: 8.1-RC2 - PCI fatal error or MCE triggered by USB/ehci on Sun X4100M2?)

2010-07-20 Thread Markus Gebert

On 20.07.2010, at 10:15, jhell wrote:

>> Any ideas how to proceed?
>> 
> 
> Adding to this I remembered some specific commits that caught my attention 
> when they happened. Specifically they were to mca.c (locate mca) on my 
> machine provided the file paths and svn log provided the commit log.
> 
> When you said April and I saw the log it rang a bell.

Thank you for the hint. We've already tried to reproduce with MCA disabled, and 
didn't succeed. The thing is, without altering the BIOS default settings, the 
OS doesn't even get an MCE before the system reboots itself, showing those 
"hypertransport sync flood" and "pci express fatal error" messages during POST. 
So I guess it's safe to say that the problem happens before MCA can kick in.


Markus


Re: 8.1-RC2 MCE caused by some LAPIC/clock changes? (was: 8.1-RC2 - PCI fatal error or MCE triggered by USB/ehci on Sun X4100M2?)

2010-07-20 Thread John Baldwin
On Saturday, July 17, 2010 2:35:21 pm Markus Gebert wrote:
> 
> On 13.07.2010, at 16:02, Markus Gebert wrote:
> 
> > Unfortunately, I have not been able to get anything useful out of the svn
> > commit logs, which could explain this. Maybe someone else has an idea what
> > could have changed between 7 and 8 to break it, and again between 8 and
> > CURRENT to magically fix it again.
>
> I tracked this down further. I couldn't easily downgrade my 8.1 installation
> to see when the problem was introduced because the zpool version used is 14.
> So I tried to figure out when the problem was solved in CURRENT.
>
> I started with the first possible revision that can boot off my v14 pool
> (r201143, Dec 28, zfs v14 commit). With this revision, I was able to trigger
> the MCE.
>
> Then I took some later revision (r206010, Apr 1, chosen randomly), and I
> couldn't reproduce the problem. I started narrowing the revisions down until I
> found out that while on r202386 I'm still able to trigger the MCE, r202387
> seems to solve the problem on CURRENT:
>
> http://svn.freebsd.org/viewvc/base?view=revision&revision=202387

Although this change was MFC'd, it was later disabled by default because it 
causes issues on other machines.  I think there is a tunable you need to set 
in loader.conf to enable it for 8.1.  Attilio (the author of that commit) 
should know which tunable to set.

-- 
John Baldwin


Re: today's 8.1/i386: panic: bad pte

2010-07-20 Thread Mikhail T.

On 20.07.2010 12:47, Alan Cox wrote:
> Historically, this panic has indicated flakey memory.  This panic
> occurs because a memory location within a page table has unexpectedly
> changed to zero.

Ouch... Thanks for the hint (maybe the panic should say something like 
that?)

In any case, is there a way to identify the flakey DIMM? I did run 
memtest on this box and haven't received any errors... Thanks! Yours,


   -mi



Re: today's 8.1/i386: panic: bad pte

2010-07-20 Thread Alan Cox

Mikhail T. wrote:
> On 20.07.2010 12:47, Alan Cox wrote:
>> Historically, this panic has indicated flakey memory.  This panic
>> occurs because a memory location within a page table has unexpectedly
>> changed to zero.
>
> Ouch... Thanks for the hint (maybe the panic should say something
> like that?)
>
> In any case, is there a way to identify the flakey DIMM? I did run
> memtest on this box and haven't received any errors... Thanks! Yours,


No, not from the panic message.  If a thorough memtest didn't turn up a 
problem, then I would start looking for another cause.


Alan



Re: today's 8.1/i386: panic: bad pte

2010-07-20 Thread Alan Cox
On Mon, Jul 19, 2010 at 11:40 PM, Mikhail T. wrote:

> Some part of KDE4's kdm crashed at start-up and seems to have taken the
> entire machine with it:
>
>   kgdb /boot/kernel/kernel /var/crash/vmcore.22
>   GNU gdb 6.1.1 [FreeBSD]
>   Copyright 2004 Free Software Foundation, Inc.
>   GDB is free software, covered by the GNU General Public License, and
>   you are
>   welcome to change it and/or distribute copies of it under certain
>   conditions.
>   Type "show copying" to see the conditions.
>   There is absolutely no warranty for GDB.  Type "show warranty" for
>   details.
>   This GDB was configured as "i386-marcel-freebsd"...
>
>   Unread portion of the kernel message buffer:
>   <6>pid 18398 (drkonqi), uid 0: exited on signal 11 (core dumped)
>   TPTE at 0xbfca9488  IS ZERO @ VA 2a522000
>   panic: bad pte
>   Uptime: 2h28m24s
>   Physical memory: 1263 MB
>   Dumping 195 MB: 180 164 148 132 116 100 84 68 52 36 20 4
>
>   Reading symbols from /boot/kernel/splash_pcx.ko...Reading symbols
>   from /boot/kernel/splash_pcx.ko.symbols...done.
>   done.
>   Loaded symbols for /boot/kernel/splash_pcx.ko
>   Reading symbols from /boot/kernel/vesa.ko...Reading symbols from
>   /boot/kernel/vesa.ko.symbols...done.
>   done.
>   Loaded symbols for /boot/kernel/vesa.ko
>   Reading symbols from /boot/modules/nvidia.ko...done.
>   Loaded symbols for /boot/modules/nvidia.ko
>   Reading symbols from /boot/kernel/linux.ko...Reading symbols from
>   /boot/kernel/linux.ko.symbols...done.
>   done.
>   Loaded symbols for /boot/kernel/linux.ko
>   Reading symbols from /boot/kernel/acpi.ko...Reading symbols from
>   /boot/kernel/acpi.ko.symbols...done.
>   done.
>   Loaded symbols for /boot/kernel/acpi.ko
>   Reading symbols from /boot/kernel/linprocfs.ko...Reading symbols
>   from /boot/kernel/linprocfs.ko.symbols...done.
>   done.
>   Loaded symbols for /boot/kernel/linprocfs.ko
>   #0  doadump () at pcpu.h:231
>   231 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
>   (kgdb) bt full
>   #0  doadump () at pcpu.h:231
>   No locals.
>   #1  0xc05d10a4 in boot (howto=260) at
>   /usr/src/sys/kern/kern_shutdown.c:416
>_giantcnt = Variable "_giantcnt" is not available.
>   (kgdb) where
>   #0  doadump () at pcpu.h:231
>   #1  0xc05d10a4 in boot (howto=260) at
>   /usr/src/sys/kern/kern_shutdown.c:416
>   #2  0xc05d12b1 in panic (fmt=Variable "fmt" is not available.
>   ) at /usr/src/sys/kern/kern_shutdown.c:590
>   #3  0xc07f0406 in pmap_remove_pages (pmap=0xc85bbc78) at
>   /usr/src/sys/i386/i386/pmap.c:4198
>   #4  0xc079516b in vmspace_exit (td=0xc51f3a00) at
>   /usr/src/sys/vm/vm_map.c:409
>   #5  0xc05a7253 in exit1 (td=0xc51f3a00, rv=139) at
>   /usr/src/sys/kern/kern_exit.c:303
>   #6  0xc05d3296 in sigexit (td=0xc51f3a00, sig=139) at
>   /usr/src/sys/kern/kern_sig.c:2872
>   #7  0xc05d47a8 in postsig (sig=11) at /usr/src/sys/kern/kern_sig.c:2759
>   #8  0xc06082f8 in ast (framep=0xe5fafd38) at
>   /usr/src/sys/kern/subr_trap.c:234
>   #9  0xc07e2c44 in doreti_ast () at
>   /usr/src/sys/i386/i386/exception.s:368
>
> Does this look familiar to anyone? Thanks!
>
>
Historically, this panic has indicated flakey memory.  This panic occurs
because a memory location within a page table has unexpectedly changed to
zero.

Alan


Re: panic: handle_written_inodeblock: bad size

2010-07-20 Thread Jeremy Chadwick
On Mon, Jul 19, 2010 at 01:41:24PM -0700, Jeremy Chadwick wrote:
> On Mon, Jul 19, 2010 at 11:55:59AM -0400, Mikhail T. wrote:
> > On 19.07.2010 07:31, Jeremy Chadwick wrote:
> > >If you boot the machine in single-user, and run fsck manually, are there
> > >any errors?
> > Thanks, Jeremy... I wish, there was a way to learn, /which/
> > file-system is giving trouble... However, after sending the question
> > out last night, I tried to pkg_delete a package on the machine, and
> > was very lucky to see a file-system error (inode something or other)
> > before the panic struck. That, at least, told me, which file-system
> > was in trouble (/var).
> > [...]
> > And, IMO, at the very least, *any panic related to a file-system
> > must clearly identify the file-system in question*... What do you
> > think?
>
> [...] 
> Assuming work tonight isn't that busy for me, I'll see if I can dedicate
> some cycles to printing this information in the error string you saw.

I spent some time on this tonight.  It's not as simple as it sounds, for
me anyway.  Relevant source bits:

src/sys/ufs/ffs/ffs_softdep.c
src/sys/ufs/ffs/fs.h
src/sys/ufs/ffs/softdep.h

ffs_softdep.c, which is almost 6500 lines, contains a large number of
inode-related functions which can call panic().  Functions which have
easy access to the related inodedep struct are the ones which would be
able to print this information easily.  Sort of.

struct inodedep (see softdep.h) contains a member called id_fs, which is
struct fs (see fs.h).  struct fs contains a member called fs_fsmnt (a
char buffer), which is the name of the mounted filesystem.  fs_fsmnt[0]
should be NUL ('\0') if the filesystem isn't mounted.

So in the case of your panic within handle_written_inodeblock(), it
would be as simple as something like:

    u_char *mntpt = NULL;

    /* fs_fsmnt is a char array, so it decays to a pointer without '&'. */
    if (inodedep->id_fs->fs_fsmnt[0] != '\0')
        mntpt = inodedep->id_fs->fs_fsmnt;
    else
        ;   /* XXX do what here? */

Then, the panic() statements later have to do something like this (taken
from real code):

    if (dp1->di_db[adp->ad_lbn] != adp->ad_oldblkno)
        panic("%s: %s: %s #%jd mismatch %d != %jd",
            "handle_written_inodeblock",
            (mntpt ? (const char *)mntpt : ""),
            "direct pointer",
            (intmax_t)adp->ad_lbn,
            dp1->di_db[adp->ad_lbn],
            (intmax_t)adp->ad_oldblkno);

The panic message would look like one of the following:

panic: handle_written_inodeblock: /mnt: direct pointer #nnn mismatch nnn != nnn
panic: handle_written_inodeblock: : direct pointer #nnn mismatch nnn != nnn

The "" string there is a Bad Idea(tm); see below.

Secondly, this brings up the question: what happens if someone is doing
something like "fsck /var", where /var uses soft updates?  /var isn't
mounted when this happens.  Can these inode-related functions get called
during that time?  If so, fs_fsmnt would (in theory -- I haven't tested
in practise) be null.  So in that case, what should get printed as the
filesystem?  Well, this is where the "" string comes into play.

My first answer was: "the name of the device/slice/etc. which the inode
is associated with".

The problem is that I couldn't find a way to get this information, as
it's not stored in struct fs anywhere.  One would have to pass it down
the stack, which changes the kernel ABI and is not something I'm willing
to do (plus there are performance implications, as you're passing
something else on the stack with every call).  Of course there may be a
way to get this easily, but I don't see it or know of it.

Thirdly, and this is equally important: given the repetitive nature
of this code (it would have to be repeated in numerous functions),
making a common function that populates a (global) variable with the
fsname it's working on would be ideal.  But I don't know the implications
of this, nor do I see many (I think two?) global variables used within
ffs_softdep.c.

Extending one of the structs to get access to the necessary information
is not as simple as "just do it" -- there are implications when it comes
to memory usage and so on.  This is not a piece of code to bang on
lightly.

This should probably be discussed on freebsd-hackers, but cross-posting
across 3 separate mailing lists is rude.  If you want to drive this,
cool, but please start a new thread about the matter (wanting the
filesystem or device printed in panic() when things like filesystem
panics happen) on freebsd-hackers.  I'm not subscribed to that list, so
please CC me if you go this route.

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |


today's 8.1/i386: panic: bad pte

2010-07-20 Thread Mikhail T.
Some part of KDE4's kdm crashed at start-up and seems to have taken the 
entire machine with it:


   kgdb /boot/kernel/kernel /var/crash/vmcore.22
   GNU gdb 6.1.1 [FreeBSD]
   Copyright 2004 Free Software Foundation, Inc.
   GDB is free software, covered by the GNU General Public License, and
   you are
   welcome to change it and/or distribute copies of it under certain
   conditions.
   Type "show copying" to see the conditions.
   There is absolutely no warranty for GDB.  Type "show warranty" for
   details.
   This GDB was configured as "i386-marcel-freebsd"...

   Unread portion of the kernel message buffer:
   <6>pid 18398 (drkonqi), uid 0: exited on signal 11 (core dumped)
   TPTE at 0xbfca9488  IS ZERO @ VA 2a522000
   panic: bad pte
   Uptime: 2h28m24s
   Physical memory: 1263 MB
   Dumping 195 MB: 180 164 148 132 116 100 84 68 52 36 20 4

   Reading symbols from /boot/kernel/splash_pcx.ko...Reading symbols
   from /boot/kernel/splash_pcx.ko.symbols...done.
   done.
   Loaded symbols for /boot/kernel/splash_pcx.ko
   Reading symbols from /boot/kernel/vesa.ko...Reading symbols from
   /boot/kernel/vesa.ko.symbols...done.
   done.
   Loaded symbols for /boot/kernel/vesa.ko
   Reading symbols from /boot/modules/nvidia.ko...done.
   Loaded symbols for /boot/modules/nvidia.ko
   Reading symbols from /boot/kernel/linux.ko...Reading symbols from
   /boot/kernel/linux.ko.symbols...done.
   done.
   Loaded symbols for /boot/kernel/linux.ko
   Reading symbols from /boot/kernel/acpi.ko...Reading symbols from
   /boot/kernel/acpi.ko.symbols...done.
   done.
   Loaded symbols for /boot/kernel/acpi.ko
   Reading symbols from /boot/kernel/linprocfs.ko...Reading symbols
   from /boot/kernel/linprocfs.ko.symbols...done.
   done.
   Loaded symbols for /boot/kernel/linprocfs.ko
   #0  doadump () at pcpu.h:231
   231 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
   (kgdb) bt full
   #0  doadump () at pcpu.h:231
   No locals.
   #1  0xc05d10a4 in boot (howto=260) at
   /usr/src/sys/kern/kern_shutdown.c:416
_giantcnt = Variable "_giantcnt" is not available.
   (kgdb) where
   #0  doadump () at pcpu.h:231
   #1  0xc05d10a4 in boot (howto=260) at
   /usr/src/sys/kern/kern_shutdown.c:416
   #2  0xc05d12b1 in panic (fmt=Variable "fmt" is not available.
   ) at /usr/src/sys/kern/kern_shutdown.c:590
   #3  0xc07f0406 in pmap_remove_pages (pmap=0xc85bbc78) at
   /usr/src/sys/i386/i386/pmap.c:4198
   #4  0xc079516b in vmspace_exit (td=0xc51f3a00) at
   /usr/src/sys/vm/vm_map.c:409
   #5  0xc05a7253 in exit1 (td=0xc51f3a00, rv=139) at
   /usr/src/sys/kern/kern_exit.c:303
   #6  0xc05d3296 in sigexit (td=0xc51f3a00, sig=139) at
   /usr/src/sys/kern/kern_sig.c:2872
   #7  0xc05d47a8 in postsig (sig=11) at /usr/src/sys/kern/kern_sig.c:2759
   #8  0xc06082f8 in ast (framep=0xe5fafd38) at
   /usr/src/sys/kern/subr_trap.c:234
   #9  0xc07e2c44 in doreti_ast () at
   /usr/src/sys/i386/i386/exception.s:368

Does this look familiar to anyone? Thanks!

   -mi

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Problems replacing failing drive in ZFS pool

2010-07-20 Thread Pawel Tyll
Hi guys,

> I second what others have said - crap.
> But there could be some hope, not sure.
> Can you check what is the actual size used by the pool on the disk?
> It should be somewhere in zdb -C output ("asize"?).
> If I remember correctly, that actual size should be a multiple of some rather
> large power of two, so it could be that it is smaller than 'User Capacity' of
> both old and new drives.
Well, I see some possibilities for a creative solution here, using some
SSD (or USB stick, or mdconfig as an act of desperation) and gconcat, but
it's asking for trouble and should probably be considered a temporary
hack.

What I personally would do is get a 2TB drive and use it instead, with
GPT and -l for a label, and replace it as gpt/something. Using 100 or so
MB less than the whole disk is also a good idea, as you can see ;)

Cheers and good luck.




Re: SIGEPIPE after update to 8.1-RC2

2010-07-20 Thread Ruben van Staveren

On 20 Jul 2010, at 10:03, Jeremy Chadwick wrote:

> On Tue, Jul 20, 2010 at 09:19:39AM +0200, Ruben van Staveren wrote:
>> To me, this is a clear breakage and should be considered a show
>> stopper issue for 8.1-RELEASE.
> 
> Too late for that now…

Oh well, errata when the culprit is found…

I've filed this as misc/148781

> 
> ftp://ftp4.freebsd.org/pub/FreeBSD/releases/amd64/
> ftp://ftp4.freebsd.org/pub/FreeBSD/releases/i386/

Thanks!

> 
> -- 
> | Jeremy Chadwick   j...@parodius.com |
> | Parodius Networking   http://www.parodius.com/ |
> | UNIX Systems Administrator  Mountain View, CA, USA |
> | Making life hard for others since 1977.  PGP: 4BD6C0CB |

Regards,
Ruben


Re: Problems replacing failing drive in ZFS pool

2010-07-20 Thread Andriy Gapon
on 20/07/2010 01:04 Garrett Moore said the following:
> Well, hotswapping worked, but now I have a totally different problem. Just
> for reference:
> # zpool offline tank da3
> # camcontrol stop da3
> 
> # camcontrol rescan all
> <'da3 lost device, removing device entry'>
> # camcontrol rescan all
> <'da3 at mpt0 ...', so new drive was found! yay>
> # zpool replace tank da3
> *cannot replace da3 with da3: device is too small*
> 
> So I looked at the smartctl output for the old and new drive. Old:
> Device Model: WDC WD15EADS-00P8B0
> Serial Number:WD-WMAVU0087717
> Firmware Version: 01.00A01
> User Capacity:1,500,301,910,016 bytes
> 
> New:
> Device Model: WDC WD15EADS-00R6B0
> Serial Number:WD-WCAVY4770428
> Firmware Version: 01.00A01
> User Capacity:1,500,300,828,160 bytes
> 
> God damnit, Western Digital. What can I do now? It's such a small
> difference, is there a way I can work around this? My other replacement
> drive is the "00R6B0" drive model as well, with the slightly smaller
> capacity.

I second what others have said - crap.
But there could be some hope, not sure.
Can you check what is the actual size used by the pool on the disk?
It should be somewhere in zdb -C output ("asize"?).
If I remember correctly, that actual size should be a multiple of some rather
large power of two, so it could be that it is smaller than 'User Capacity' of
both old and new drives.

-- 
Andriy Gapon


Re: SIGEPIPE after update to 8.1-RC2

2010-07-20 Thread Sean

On 20/07/2010, at 5:19 PM, Ruben van Staveren wrote:

> Hi,
> 
> This happens during a "sudo portupgrade -va --batch"
> my shell is /bin/tcsh too. When I run "exec bash" after sudo -s and then do 
> the portupgrade the problem doesn't show up. 
> 
> To me, this is a clear breakage and should be considered a show stopper issue 
> for 8.1-RELEASE. All shells should be equally supported, especially when they 
> reside in /bin. Is there already an open pr on this ?
> 

No PR from me, and not a chance of a fix to 8.1 at this point, unless it really 
does cause breakage (not just a message, but actually stops things); the tag 
has been laid down and would need to be slid forward.

It's likely to be either of two things... a bug in sh, that using tcsh 
highlights because of differing signal setup; or a bug in tcsh that a bug fix 
in sh highlights. It's a bug that comes and goes in the history of FreeBSD, at 
least since early 2006 (based on 10 seconds with Google - 
http://www.linuxquestions.org/questions/*bsd-17/broken-pipe-432167/)


> Thanks,
>   Ruben  
> 
> 



Re: 8.1-RC2 MCE caused by some LAPIC/clock changes? (was: 8.1-RC2 - PCI fatal error or MCE triggered by USB/ehci on Sun X4100M2?)

2010-07-20 Thread jhell


On Sat, 17 Jul 2010 14:35, Markus Gebert wrote:
In Message-Id: 



On 13.07.2010, at 16:02, Markus Gebert wrote:

Unfortunately, I have not been able to get anything useful out of the svn 
commit logs, which could explain this. Maybe someone else has an idea 
what could have changed between 7 and 8 to break it, and again between 
8 and CURRENT to magically fix it again.


I tracked this down further. I couldn't easily downgrade my 8.1 
installation to see when the problem was introduced because the zpool 
version used is 14. So I tried to figure out, when the problem was 
solved in CURRENT.


I started with the first possible revision that can boot off my v14 pool 
(r201143, Dec 28, zfs v14 commit). With this revision, I was able to 
trigger the MCE.


Then I took some later revision (r206010, Apr 1, chosen randomly), and 
I couldn't reproduce the problem. I started narrowing the revisions down 
until I found out that while on r202386 I'm still able to trigger the 
MCE, r202387 seems to solve the problem on CURRENT:


http://svn.freebsd.org/viewvc/base?view=revision&revision=202387

Since John Baldwin mentioned this problem could be timing related, it 
seems reasonable that a clock-related change could fix it. But this 
commit seems to have been MFC'd to 8-STABLE and 8.1 (at least as far as 
I can tell) along with some other changes to amd64 specific code. I 
thought that maybe these other changes that have been MFC'd could have 
reintroduced the problem later on, but so far I could not reproduce the 
problem with newer CURRENT revisions. So, I actually nailed this one 
down to a single commit on CURRENT, but still cannot tell what the 
actual difference is compared to 8-STABLE/8.1.


Any ideas how to proceed?



Adding to this, I remembered some specific commits that caught my attention 
when they happened. Specifically, they were to mca.c: "locate mca" on my 
machine provided the file paths, and "svn log" provided the commit log.


When you said April and I saw the log, it rang a bell.

These may be of interest to you:


r210079 | jhb | 2010-07-14 17:10:14 -0400 (Wed, 14 Jul 2010) | 13 lines

MFC 208507,208556,208621:

Add support for corrected machine check interrupts.  CMCI is a new local 
APIC interrupt that fires when a threshold of corrected machine check 
events is reached.  CMCI also includes a count of events when reporting 
corrected errors in the bank's status register.  Note that individual 
banks may or may not support CMCI.  If they do, each bank includes its own 
threshold register that determines when the interrupt fires.  Currently 
the code uses a very simple strategy where it doubles the threshold on 
each interrupt until it succeeds in throttling the interrupt to occur only 
once a minute (this interval can be tuned via sysctl).  The threshold is 
also adjusted on each hourly poll which will lower the threshold once 
events stop occurring.



r206183 | alc | 2010-04-05 12:11:42 -0400 (Mon, 05 Apr 2010) | 6 lines

MFC r204907, r204913, r205402, r205573, r205573
  Implement AMD's recommended workaround for Erratum 383 on Family 10h
  processors.

  Enable machine check exceptions by default.



And a list of mca.c's within the stable/8 src tree:
/usr/src/sbin/mca/mca.c
/usr/src/sys/amd64/amd64/mca.c
/usr/src/sys/dev/aha/aha_mca.c
/usr/src/sys/dev/buslogic/bt_mca.c
/usr/src/sys/dev/ep/if_ep_mca.c
/usr/src/sys/i386/i386/mca.c
/usr/src/sys/ia64/ia64/mca.c


Regards & Good luck,

--

 jhell



Re: SIGEPIPE after update to 8.1-RC2

2010-07-20 Thread Jeremy Chadwick
On Tue, Jul 20, 2010 at 09:19:39AM +0200, Ruben van Staveren wrote:
> To me, this is a clear breakage and should be considered a show
> stopper issue for 8.1-RELEASE.

Too late for that now...

ftp://ftp4.freebsd.org/pub/FreeBSD/releases/amd64/
ftp://ftp4.freebsd.org/pub/FreeBSD/releases/i386/

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: SIGEPIPE after update to 8.1-RC2

2010-07-20 Thread Ruben van Staveren
Hi,

On 18 Jul 2010, at 4:20, Sean wrote:

> On 18/07/2010 1:24 AM, Alex Kozlov wrote:
>> Hi, stable
>> 
>> After updating my buildbox from 26 April 8-STABLE
>> to 8.1-RC2 I'm constantly getting SIGEPIPE
>> 
> 
> 
> [snip]
> 
> I'm getting the same thing; what shell are you using? I changed my shell on 
> one machine from /bin/tcsh to /usr/local/bin/bash and problem disappeared.

Another occasion where this problem acts up:

is marked as broken: does not build
** Makefile possibly broken: mail/moztraybiff:
grep: write error: Broken pipe
moztraybiff-1.2.4_1
--->  Session ended at: Tue, 20 Jul 2010 09:04:41 +0200 (consumed 00:03:01)
/usr/local/sbin/portupgrade:1473:in `get_pkgname': Makefile broken (MakefileBrokenError)
from /usr/local/sbin/portupgrade:623
from /usr/local/sbin/portupgrade:614:in `each'
from /usr/local/sbin/portupgrade:614
from /usr/local/sbin/portupgrade:588:in `catch'
from /usr/local/sbin/portupgrade:588
from /usr/local/lib/ruby/1.8/optparse.rb:1310:in `call'
from /usr/local/lib/ruby/1.8/optparse.rb:1310:in `parse_in_order'
from /usr/local/lib/ruby/1.8/optparse.rb:1306:in `catch'
from /usr/local/lib/ruby/1.8/optparse.rb:1306:in `parse_in_order'
from /usr/local/lib/ruby/1.8/optparse.rb:1254:in `catch'
from /usr/local/lib/ruby/1.8/optparse.rb:1254:in `parse_in_order'
from /usr/local/lib/ruby/1.8/optparse.rb:1248:in `order!'
from /usr/local/lib/ruby/1.8/optparse.rb:1241:in `order'
from /usr/local/sbin/portupgrade:565:in `main'
from /usr/local/lib/ruby/1.8/optparse.rb:791:in `initialize'
from /usr/local/sbin/portupgrade:229:in `new'
from /usr/local/sbin/portupgrade:229:in `main'
from /usr/local/sbin/portupgrade:2213

This happens during a "sudo portupgrade -va --batch"
my shell is /bin/tcsh too. When I run "exec bash" after sudo -s and then do the 
portupgrade the problem doesn't show up. 

To me, this is a clear breakage and should be considered a show stopper issue 
for 8.1-RELEASE. All shells should be equally supported, especially when they 
reside in /bin. Is there already an open pr on this ?

Thanks,
Ruben  

