Hi Richard,

Apologies for the delay in getting back to you. I am going to cross-post this
to the ufs-discuss mailing list as well.

Based on the symptoms you are seeing, it seems that for some reason the data
UFS reads during the boot-time fsck is bad (in your examples, for / and /usr).
Yet, judging from your later correspondence with Sanjay Nadkarni, a subsequent
manual fsck of the same filesystem finds no errors, as shown in this email to
Sanjay:

# fsck -F ufs /dev/md/rdsk/d30
** /dev/md/rdsk/d30
** Last Mounted on /
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
216878 files, 4949179 used, 3627460 free (386204 frags, 405157 blocks, 4.5% fragmentation)

So it looks to me like this could be a mirror resync issue, although the
metastat output in your original email shows the device in question on your
first test system, d30, in the Okay state. I suppose it could also be a UFS
logging issue.

Basically, something is making the system think the filesystem in question
needs a check. The boot-time check fails and forces you into system maintenance
mode, yet when you then run fsck manually everything looks OK, so somehow the
condition clears itself up.

fs-usr is the method script failing in both scenarios you sent data about.
Under SMF it remounts the filesystem read/write (from read-only) during boot,
which is why fsck is being run on these filesystems.

So, if possible, I need a few things from you to help me see where this is
failing:

1. Can you boot with 'boot -m verbose'? That should give me more data about the
SMF services that are running at the time of the failure. You will have to halt
your system to do this, since it doesn't look like reboot supports these
arguments. That should still be a valid test, since you said in your later
emails to Sanjay that this also happens on a second reboot. My only concern is
that halting your system and then booting may quiesce the filesystem enough to
mask the issue, but it is worth a try.

2. Can you modify your JumpStart install to build each mirror with only one
submirror (a one-way mirror) and see whether the problem still happens? Then,
after the install, attach the other submirror. This would give me some data on
where the problem might be occurring; I am trying to isolate mirror resync
issues from UFS issues.
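For the attach step after the install, something along these lines should do.
It is only a sketch under assumptions: it assumes the modified profile builds
one-way mirrors d10 to d60 whose only submirrors (d11 to d61) are on c0t0d0,
and it reuses the second-submirror names (d12 to d62) and the slice mapping
from your current profile and metastat output. As written it only prints the
metainit/metattach commands, so you can review them before running them by
hand:

```sh
#!/bin/sh
# Dry run: print, but do not run, the SVM commands that would build the
# second submirror of each mirror on c1t4d0 and attach it.
# ASSUMPTION: the install left one-way mirrors d10..d60 with submirrors
# d11..d61 on c0t0d0; the d12..d62 names below are hypothetical reuses
# of the names seen in the metastat output.

DISK2=c1t4d0

print_attach_cmds() {
    # each input line: mirror, new submirror, slice on the second disk
    while read mirror sub slice; do
        # one-way concat on the second disk ...
        echo "metainit $sub 1 1 $DISK2$slice"
        # ... attached to the existing mirror (this starts a resync)
        echo "metattach $mirror $sub"
    done <<EOF
d10 d12 s0
d20 d22 s3
d30 d32 s4
d40 d42 s5
d50 d52 s7
d60 d62 s1
EOF
}

print_attach_cmds
```

It sticks to plain Bourne shell constructs so it should also run under
Solaris /bin/sh. Each metattach kicks off a resync of that mirror, and
metastat shows its progress.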

3. Had you seen this issue before b16? I am just trying to narrow down which
Solaris putbacks to look at.

thanks,
sarah
******

> Hi,
> I am seeing a problem with snv_16 and snv_18 where, on
> reboot, the mirrored file systems fail fsck. This
> problem is most noticeable on the first reboot after
> my jumpstart builds.
> My two FC-AL disks are formatted:-
> install_type initial_install
> system_type standalone
> partitioning explicit
> filesys mirror:d10 c0t0d0s0 c1t4d0s0 256 / logging
> filesys mirror:d20 c0t0d0s3 c1t4d0s3 4096 /var logging
> filesys mirror:d30 c0t0d0s4 c1t4d0s4 4096 /usr logging
> filesys mirror:d40 c0t0d0s5 c1t4d0s5 1536 /opt logging
> filesys mirror:d50 c0t0d0s7 c1t4d0s7 15360 /tmp2 logging
> filesys mirror:d60 c0t0d0s1 c1t4d0s1 free swap
> metadb c0t0d0s6 size 8192 count 4
> metadb c1t4d0s6 size 8192 count 4
> cluster SUNWCXall
> locale en_GB
>
> The install works fine, and I have put a metastat at
> the end of my finish script; all looks OK:-
> /sbin/metastat
> metastat: brscs02:
> system/metainit:default
> system/mdmonitor:default
> network/rpc/meta:default: service(s) not online in SMF
>
> d60: Mirror
> Submirror 0: d61
> State: Okay
> Submirror 1: d62
> State: Okay
> Pass: 1
> Read option: roundrobin (default)
> Write option: parallel (default)
> Size: 19182960 blocks (9.1 GB)
>
> d61: Submirror of d60
> State: Okay
> Size: 19182960 blocks (9.1 GB)
> Stripe 0:
> Device     Start Block  Dbase  State  Reloc  Hot Spare
> c0t0d0s1   0            No     Okay   Yes
>
>
> d62: Submirror of d60
> State: Okay
> Size: 19182960 blocks (9.1 GB)
> Stripe 0:
> Device     Start Block  Dbase  State  Reloc  Hot Spare
> c1t4d0s1   0            No     Okay   Yes
>
>
> d50: Mirror
> Submirror 0: d51
> State: Okay
> Submirror 1: d52
> State: Okay
> Pass: 1
> Read option: roundrobin (default)
> Write option: parallel (default)
> Size: 31458321 blocks (15 GB)
>
> d51: Submirror of d50
> State: Okay
> Size: 31458321 blocks (15 GB)
> Stripe 0:
> Device     Start Block  Dbase  State  Reloc  Hot Spare
> c0t0d0s7   0            No     Okay   Yes
>
>
> d52: Submirror of d50
> State: Okay
> Size: 31458321 blocks (15 GB)
> Stripe 0:
> Device     Start Block  Dbase  State  Reloc  Hot Spare
> c1t4d0s7   0            No     Okay   Yes
>
>
> d40: Mirror
> Submirror 0: d41
> State: Okay
> Submirror 1: d42
> State: Okay
> Pass: 1
> Read option: roundrobin (default)
> Write option: parallel (default)
> Size: 3146121 blocks (1.5 GB)
>
> d41: Submirror of d40
> State: Okay
> Size: 3146121 blocks (1.5 GB)
> Stripe 0:
> Device     Start Block  Dbase  State  Reloc  Hot Spare
> c0t0d0s5   0            No     Okay   Yes
>
>
> d42: Submirror of d40
> State: Okay
> Size: 3146121 blocks (1.5 GB)
> Stripe 0:
> Device     Start Block  Dbase  State  Reloc  Hot Spare
> c1t4d0s5   0            No     Okay   Yes
>
>
> d30: Mirror
> Submirror 0: d31
> State: Okay
> Submirror 1: d32
> State: Okay
> Pass: 1
> Read option: roundrobin (default)
> Write option: parallel (default)
> Size: 8389656 blocks (4.0 GB)
>
> d31: Submirror of d30
> State: Okay
> Size: 8389656 blocks (4.0 GB)
> Stripe 0:
> Device     Start Block  Dbase  State  Reloc  Hot Spare
> c0t0d0s4   0            No     Okay   Yes
>
>
> d32: Submirror of d30
> State: Okay
> Size: 8389656 blocks (4.0 GB)
> Stripe 0:
> Device     Start Block  Dbase  State  Reloc  Hot Spare
> c1t4d0s4   0            No     Okay   Yes
>
>
> d20: Mirror
> Submirror 0: d21
> State: Okay
> Submirror 1: d22
> State: Okay
> Pass: 1
> Read option: roundrobin (default)
> Write option: parallel (default)
> Size: 8389656 blocks (4.0 GB)
>
> d21: Submirror of d20
> State: Okay
> Size: 8389656 blocks (4.0 GB)
> Stripe 0:
> Device     Start Block  Dbase  State  Reloc  Hot Spare
> c0t0d0s3   0            No     Okay   Yes
>
>
> d22: Submirror of d20
> State: Okay
> Size: 8389656 blocks (4.0 GB)
> Stripe 0:
> Device     Start Block  Dbase  State  Reloc  Hot Spare
> c1t4d0s3   0            No     Okay   Yes
>
>
> d10: Mirror
> Submirror 0: d11
> State: Okay
> Submirror 1: d12
> State: Okay
> Pass: 1
> Read option: roundrobin (default)
> Write option: parallel (default)
> Size: 525798 blocks (256 MB)
>
> d11: Submirror of d10
> State: Okay
> Size: 525798 blocks (256 MB)
> Stripe 0:
> Device     Start Block  Dbase  State  Reloc  Hot Spare
> c0t0d0s0   0            No     Okay   Yes
>
>
> d12: Submirror of d10
> State: Okay
> Size: 525798 blocks (256 MB)
> Stripe 0:
> Device     Start Block  Dbase  State  Reloc  Hot Spare
> c1t4d0s0   0            No     Okay   Yes
>
>
> Device Relocation Information:
> Device Reloc Device ID
> c1t4d0 Yes id1,[EMAIL PROTECTED]
> c0t0d0 Yes id1,[EMAIL PROTECTED]
>
>
> But when it reboots it sometimes fails:-
> Finish script E3500+login.sh execution completed.
>
> The begin script log 'begin.log'
> is located in /var/sadm/system/logs after reboot.
>
> The finish script log 'finish.log'
> is located in /var/sadm/system/logs after reboot.
>
> syncing file systems... done
> rebooting...
> Resetting...
> ttya initialized
> Using POST's System Configuration
> Setting up memory
> fhc ac simm-status environment sram flashprom
> SUNW,UltraSPARC-II
> Probing UPA Slot at 2,0 sbus fhc ac environment
> flashprom eeprom sbus-speed counter-timer
> Probing UPA Slot at 3,0 sbus counter-timer
> Probing /[EMAIL PROTECTED],0 at d,0 SUNW,socal sf ssd sf ssd
> Probing /[EMAIL PROTECTED],0 at 1,0 QLGC,isp sd st
> Probing /[EMAIL PROTECTED],0 at 2,0 Nothing there
> Probing /[EMAIL PROTECTED],0 at 3,0 SUNW,hme SUNW,fas sd st
> Probing /[EMAIL PROTECTED],0 at 0,0 network
> 5-slot Sun Enterprise E3500, No Keyboard
> OpenBoot 3.2.30, 2048 MB memory installed, Serial #11240214.
> Copyright 2002 Sun Microsystems, Inc. All rights reserved
> Ethernet address 8:0:20:ab:83:16, Host ID: 80ab8316.
>
>
>
> Rebooting with command: boot
>
> Port#1 received soc-status=14
> Port#0 received soc-status=14 loop 0 is ONLINE
> Boot device: disk File and args:
> Loading ufs-file-system package 1.4 04 Aug 1995 13:02:54.
> FCode UFS Reader 1.12 00/07/17 15:48:16.
> Loading: /platform/SUNW,Ultra-Enterprise/ufsboot
> Loading: /platform/sun4u/ufsboot
> SunOS Release 5.11 Version snv_16 64-bit
> Copyright 1983-2005 Sun Microsystems, Inc. All rights reserved.
> Use is subject to license terms.
> SUNW,sbus-gem0: Using Gigabit SERDES Interface
> SUNW,sbus-gem0: Auto-Negotiated 1000 Mbps Full-Duplex Link Up
> Hostname: brscs02
> The / file system (/dev/md/rdsk/d10) is being checked.
> The /usr file system (/dev/md/rdsk/d30) is being checked.
>
> WARNING - Unable to repair the /usr filesystem. Run fsck manually (fsck -F ufs /dev/md/rdsk/d30).
>
> Jul 26 17:50:14 svc.startd[7]: svc:/system/filesystem/usr:default: Method "/lib/svc/method/fs-usr" failed with exit status 95.
> [ system/filesystem/usr:default failed fatally (see 'svcs -x' for details) ]
> Requesting System Maintenance Mode
> (See /lib/svc/share/README for more information.)
> Console login service(s) cannot run
>
> Root password for system maintenance (control-d to bypass):
>
>
> Sometimes the reboot works but it checks the
> filesystems:-
>
> SUNW,sbus-gem0: Auto-Negotiated 1000 Mbps Full-Duplex Link Up
>
> Hostname: brscs02
>
> The / file system (/dev/md/rdsk/d10) is being checked.
>
> Configuring devices.
>
> Loading smf(5) service descriptions:
>
>
> Other times another reboot will also fail.
>
>
> I have also seen the problem on an Ultra 60 with two
> onboard 9GB SCSI drives.
>
>
> Solaris 10 GA does not seem to have this problem.
>
> Cheers
> Richard.
This message posted from opensolaris.org
_______________________________________________
ufs-discuss mailing list
[email protected]
