Re: [osol-discuss] Re: Re: How To Mount CDROM?

2005-10-21 Thread Juergen Keil


> >There's at least one bug in Solaris Express snv_22 with the SUNWvolr
> >package's preinstall script:
> >  
> I'll file a bug and fix it. Thanks for your analysis and fix

I already submitted it yesterday, under category volmgt/other.

CR 6339683: SUNWvolr preinstall script broken, smf "smserver" service disabled


___
opensolaris-discuss mailing list
opensolaris-discuss@opensolaris.org


Re: [osol-discuss] Re: Solaris on Intel Macs??

2006-04-10 Thread Juergen Keil

> > The next try was with more console_putchar calls added to the
> > gateA20() code.  This narrowed it down to the loop waiting for an 
> > empty keyboard controller input buffer.
> 
> So how far did you get after that ?

Well, it doesn't hang any more after printing "Loading stage2 ",
after I added some code [*] to grub's gateA20() function.


grub is now able to load and display the menu.lst file and the splashimage.
grub reports reasonable base and upper memory.


I can type exactly *one* character on the usb keyboard.  When trying to
read a second character from the keyboard, the system hangs.


Or I can wait until the grub menu timeout expires.  This starts loading the
default entry.  multiboot and the boot_archive are loaded.  The text screen is
cleared, and the system hangs before printing the
"SunOS Release 5.xx" copyright string.


[*]

diff -rub ../opensolaris-20060404/usr/src/grub/grub-0.95/stage2/asm.S usr/src/grub/grub-0.95/stage2/asm.S
--- ../opensolaris-20060404/usr/src/grub/grub-0.95/stage2/asm.S 2006-04-05 23:28:21.0 +0200
+++ usr/src/grub/grub-0.95/stage2/asm.S 2006-04-10 18:11:44.152925307 +0200
@@ -1787,7 +1787,30 @@
        jnz     3f
        ret

-3:     /* use keyboard controller */
+3:     /*
+        * try to switch gateA20 using PORT92, the "Fast A20 and Init"
+        * register
+        */
+       mov     $0x92, %dx
+       inb     %dx, %al
+       /* skip the port92 code if it's unimplemented (read returns 0xff) */
+       cmpb    $0xff, %al
+       jz      6f
+
+       /* set or clear bit1, the ALT_A20_GATE bit */
+       movb    4(%esp), %ah
+       testb   %ah, %ah
+       jz      4f
+       orb     $2, %al
+       jmp     5f
+4:     and     $0xfd, %al
+
+       /* clear the INIT_NOW bit; don't accidentally reset the machine */
+5:     and     $0xfe, %al
+       outb    %al, %dx
+
+
+6:     /* use keyboard controller */
        pushl   %eax

        call    gloop1
@@ -1797,9 +1820,12 @@

 gloopint1:
        inb     $K_STATUS
+       cmpb    $0xff, %al
+       jz      gloopint1_done
        andb    $K_IBUF_FUL, %al
        jnz     gloopint1

+gloopint1_done:
        movb    $KB_OUTPUT_MASK, %al
        cmpb    $0, 0x8(%esp)
        jz      gdoit
@@ -1820,6 +1846,8 @@

 gloop1:
        inb     $K_STATUS
+       cmpb    $0xff, %al
+       jz      gloop2ret
        andb    $K_IBUF_FUL, %al
        jnz     gloop1
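
For anyone who finds the bit manipulation easier to read in C, here is a
minimal sketch of the value the port 0x92 part of the patch computes.  The
function name port92_new_value() and the PORT92_* macros are made up for
this illustration and are not grub code; the patch itself does the inb/outb
in assembly and then still falls through to the keyboard controller code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PORT92_INIT_NOW      0x01u  /* bit 0: writing 1 resets the machine */
#define PORT92_ALT_A20_GATE  0x02u  /* bit 1: alternate A20 gate */

/*
 * Given the byte read from port 0x92, compute the byte to write back.
 * Returns 0 (and leaves *out alone) if the read returned 0xff, which the
 * patch treats as "port 0x92 is not implemented"; returns 1 otherwise.
 */
static int
port92_new_value(uint8_t current, int enable_a20, uint8_t *out)
{
    uint8_t v = current;

    assert(out != NULL);
    if (current == 0xff)
        return (0);

    if (enable_a20)
        v |= PORT92_ALT_A20_GATE;           /* orb $2, %al */
    else
        v &= (uint8_t)~PORT92_ALT_A20_GATE; /* and $0xfd, %al */

    v &= (uint8_t)~PORT92_INIT_NOW;         /* and $0xfe, %al */
    *out = v;
    return (1);
}
```

For example, a current value of 0x01 with enable_a20 set gives 0x02: the
A20 bit is set and the reset bit is cleared before the value is written.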



Re: [osol-discuss] Re: Solaris on Intel Macs??

2006-04-10 Thread Juergen Keil

> > Or I can wait until the grub menu timeout expires.  This starts loading the
> > default entry.  multiboot and the boot_archive is loaded.  Text screen is
> > cleared,  , and the system hangs - before printing the 
> > "SunOS Release 5.xx" copyright string.
> 
> A completely wild guess but maybe we got into multiboot's main and
> got to here:
> 

Hmm, I've already tried searching for "gateA20" code in multiboot, but haven't
found any such code...

> http://cvs.opensolaris.org/source/xref/on/usr/src/psm/stand/boot/i386/common/keyboard.c#kb_init

These loops (called via ischar() / getchar() => kb_ischar() / kb_getchar())
look interesting and would hang on systems that don't have a ps/2
keyboard controller, so it could be a problem with the intel macs.

But would the kernel call them when it was not started with the "-a" flag?

I guess when started with "-a" the kernel would try to read various parameters
from the console.  Is there any reason to call ischar() / getchar() when
"-a" is not passed to the kernel?



Re: [osol-discuss] 76B Cannot find install software

2007-11-26 Thread Juergen Keil


Joerg Schilling wrote:
> Jürgen Keil <[EMAIL PROTECTED]> wrote:
> 
> > On some boards you can also change the configuration of the
> > s-ata controller to p-ata "legacy ide" (instead of ahci mode).  In
> > legacy mode, Solaris should be able to find both the (s-ata) disks
> > and the (s-ata) optical device.
> 
> What is the disadvantage from the legacy mode?

- no cfgadm_sata support, so you can't disconnect/connect/configure/unconfigure
  s-ata devices while the kernel is up and running

- no native command queuing

- no s-ata port multiplier support

- afaik: zfs is unable to read a s-ata hdd's SMART data, so things like
  automatically replacing a failing s-ata disk with a hotspare probably
  don't work
  (I guess that could be added to the ata driver)
  
- it seems some sata controllers support dma access to memory >= 4GB,
  so the kernel doesn't have to use bounce buffers.  The ata driver
  can only access 32-bit addresses via dma.
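
To illustrate the last point: a driver limited to 32-bit DMA needs a bounce
buffer whenever a transfer touches physical memory at or above 4GB.  A rough
sketch of that check in C follows.  needs_bounce_buffer() and DMA32_LIMIT
are hypothetical names for this example, not code from the ata driver.

```c
#include <assert.h>
#include <stdint.h>

#define DMA32_LIMIT  (UINT64_C(1) << 32)  /* first address beyond 32 bits */

/*
 * Returns 1 if the physical range [paddr, paddr + len) is not completely
 * below dma_limit, i.e. the device cannot reach all of it directly.
 * Written so that paddr + len is never computed and cannot overflow.
 */
static int
needs_bounce_buffer(uint64_t paddr, uint64_t len, uint64_t dma_limit)
{
    assert(dma_limit > 0);
    if (len == 0)
        return (0);
    if (paddr >= dma_limit)
        return (1);
    return (len > dma_limit - paddr);
}
```

A buffer that ends exactly at the 4GB boundary is still reachable; one byte
more and the transfer would have to be copied through low memory.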
  
  



Re: [osol-discuss] 'more' broken in b77 miniroot?

2007-11-28 Thread Juergen Keil

> Date: Wed, 28 Nov 2007 13:27:38 -0500
> From: Kyle McDonald <[EMAIL PROTECTED]>
> To: James Carlson <[EMAIL PROTECTED]>
> CC: Jürgen Keil <[EMAIL PROTECTED]>, opensolaris-discuss@opensolaris.org
> Subject: Re: [osol-discuss] 'more' broken in b77 miniroot?
> 
> James Carlson wrote:
> > Jürgen Keil writes:
> >   
> >> In snv_75a, the miniroot /sbin/sulogin shell script contains this line:
> >>
> >> exec 0<> /dev/console 1>&0 2>&0
> >>
> >> The miniroot /sbin/sulogin from snv_75a has SCCS ID 
> >> "@(#)sulogin.sh 1.5".  Has that changed for snv_77?
> >> 
> >
> > It's still the same in the gate.
> >
> >   
> This might be the difference.
> 
> I didn't choose 'Single User Shell' from the menu.
> 
> The machine is configured to do Custom Jumpstart automatically, and to 
> see the environment the Begin script would run in, I temporarily changed 
> the begin script to just call 'exit 1'. This made JumpStart give up and 
> leave me a shell prompt.
> 
> Is this prompt JumpStart left me at supposed to be the same as 'sulogin'?

Maybe not.


Can you try "ls -lR / | truss more"  ?   What kind of error
does it get (when it tries to read from stderr fd#2) ?


You may also want to check the shell's file descriptor flags
with "pfiles $$".  If stderr isn't opened O_RDWR, check
the process tree with "ptree $$" and use "pfiles {pid}" on the
parents to find out where the readability of stderr is lost.
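
The same check can also be done from a small C program.  This is only a
sketch: fd_is_readable() is a name made up for this example, it simply wraps
fcntl(F_GETFL) and looks at the access mode bits.

```c
#include <assert.h>
#include <fcntl.h>

/*
 * Returns 1 if fd was opened for reading (O_RDONLY or O_RDWR),
 * 0 if it was opened write-only, and -1 if fcntl() fails.
 */
static int
fd_is_readable(int fd)
{
    int flags, mode;

    assert(fd >= 0);
    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return (-1);

    mode = flags & O_ACCMODE;
    return (mode == O_RDONLY || mode == O_RDWR);
}
```

Calling fd_is_readable(2) from a program started in the affected shell
would show whether stderr can be read from at all.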



Re: [osol-discuss] [driver-discuss] CPU temperature and fan

2007-12-17 Thread Juergen Keil

> Is there any existing tools or interface on the solaris can monitor CPU
> temperature and control fan status?

I'm using the following dtrace script to monitor cpu temperatures on a
Tecra S1 centrino laptop (it monitors some dtrace probes in the
tzmon kernel module).  Unfortunately it's not very useful on ASUS
mainboards with the Q-Fan feature enabled: the ASUS BIOS controls the
cpu fan speed, and ASUS' ACPI code always reports a cpu temperature of
40.0°C:


#!/usr/sbin/dtrace -s

#pragma D option quiet

sdt:tzmon:tzmon_eval_zone:tz-temp
{
        printf("temp %d.%1u°K/%d.%1u°C",
            arg0 / 10, arg0 % 10,
            (arg0 - 2732) / 10, (arg0 - 2732) % 10);
}

sdt:tzmon:tzmon_eval_zone:tz-temp
/(int)arg1 > 0/
{
        printf(", crit hot %d.%1u°K/%d.%1u°C",
            arg1 / 10, arg1 % 10,
            (arg1 - 2732) / 10, (arg1 - 2732) % 10);
}

sdt:tzmon:tzmon_eval_zone:tz-temp
/(int)arg2 > 0/
{
        printf(", hot %d.%1u°K/%d.%1u°C",
            arg2 / 10, arg2 % 10,
            (arg2 - 2732) / 10, (arg2 - 2732) % 10);
}

sdt:tzmon:tzmon_eval_zone:tz-temp
{
        printf(", %s\n", stringof(arg3));
}
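
The arithmetic in the script treats the probe arguments as tenths of a
Kelvin, so subtracting 2732 gives tenths of a degree Celsius.  The same
conversion as a small C sketch is shown below.  dk_to_dc() and
format_deci() are names invented for this example; unlike the D script,
the formatting helper also copes with values below 0°C.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Convert tenths of a Kelvin to tenths of a degree Celsius. */
static int
dk_to_dc(int deci_kelvin)
{
    return (deci_kelvin - 2732);
}

/*
 * Format a value given in tenths as "W.T" (e.g. 400 becomes "40.0").
 * Returns the value of snprintf(), i.e. the length of the full string.
 */
static int
format_deci(char *buf, size_t len, int deci)
{
    assert(buf != NULL && len > 0);
    return (snprintf(buf, len, "%s%d.%d",
        deci < 0 ? "-" : "", abs(deci) / 10, abs(deci) % 10));
}
```

With these helpers, a reading of 3132 converts to 400 tenths, which is the
constant 40.0°C that the ASUS boards report.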



Re: [osol-discuss] [ufs-discuss] PANIC! mounting cdrom slice on b78

2008-06-16 Thread Juergen Keil
Robert William Fuller wrote:

> [EMAIL PROTECTED] wrote:
> > Hi Kyle,
> > 
> > given that what happens looks ever-so-slightly different each time, a 
> > hardware glitch could be possible; to exclude this, would you happen to 
> > know whether these panics occurred before build 78 as well ? If they occur 
> > if you use the b77 hsfs module on your post-b78 system ? Does the machine 
> > you're using have a history of hardware issues, or other symptoms that'd 
> > point at flaky hardware (such as e.g. ZFS block checksumming errors) ?
> 
> Did anybody else notice they're all NULL pointer de-references???  It's 
> probably not a hardware problem  For example, if it's a memory 
> problem, then you'll often see random pointers, but not 3 NULL pointers 
> in a row

Yep, I noticed that, too.

IIRC a bug like ``kmem_free(NULL, size)'' somewhere in the kernel can have the
effect that a subsequent ``kmem_alloc(size, KM_SLEEP)'' somewhere else in the
kernel will return with a NULL pointer!  (Assuming you run release bits)

For that reason I did suggest to Kyle to try to reproduce this hsfs mount
panic with kmem heap checking enabled.

Add the following line to /etc/system, reboot, and try again to reproduce
the hsfs mount panic:

   set kmem_flags=0xf



Re: [osol-discuss] [ufs-discuss] PANIC! mounting cdrom slice on b78

2008-06-16 Thread Juergen Keil
Frank Hofmann wrote:

> On Mon, 16 Jun 2008, Juergen Keil wrote:
> 

> > IIRC a bug like ``kmem_free(NULL, size)'' somewhere in the kernel can have the
> > effect that a subsequent ``kmem_alloc(size, KM_SLEEP)'' somewhere else in the
> > kernel will return with a NULL pointer!  (Assuming you run release bits)
> 
> If this is so, then it's a bug and should be fixed. Quote kmem_alloc(9F):
> 
> NOTES
>   kmem_alloc(0, flag) always returns NULL. kmem_free(NULL,  0)
>   is legal.
> 
> That's manpage - consider it a spec ...

Well, it says kmem_free with a ptr == NULL and size == 0 is legal;
but what about ptr == NULL and size > 0?


Quick test with ::call in kmdb, when booted with kmem_flags=0xf:

- kmem_alloc::call 8 0
  kmem_free::call  8
  
  works, as expected

- kmem_free::call 0 8

  kmdb fails this call, with "caught a trap"



Re: [osol-discuss] [ufs-discuss] PANIC! mounting cdrom slice on b78

2008-06-16 Thread Juergen Keil
Frank Hofmann wrote:
 
> On Mon, 16 Jun 2008, Juergen Keil wrote:
> 
> > For that reason I did suggest to Kyle to try to reproduce this hsfs mount
> > panic with kmem heap checking enabled.
> >
> > Add the following line to /etc/system, reboot, retry to reproduce the hsfs
> > mount panic:
> >
> >   set kmem_flags=0xf
> 
> Good idea.

Ok, I can actually reproduce that panic using last week's opensolaris bits.

All I have to do is try and "mount -F hsfs" a non-existent slice; e.g. using
a CD containing OpenSolaris 2008.05, mount -F hsfs /dev/dsk/c1t1d0s4 /mnt
("mount -F hsfs /dev/dsk/c1t1d0p0 /mnt" should work, though):


panic[cpu1]/thread=ff0348445720: 
BAD TRAP: type=e (#pf Page fault) rp=ff00108bb990 addr=40 occurred in module
 "genunix" due to a NULL pointer dereference


mount: 
#pf Page fault
Bad kernel fault at addr=0x40
pid=19108, pc=0xfba92633, sp=0xff00108bba80, eflags=0x10207
cr0: 80050033 cr4: 6f8
cr2: 40
cr3: 22f819000
cr8: c

rdi: fbca88a0 rsi:1 rdx:8
rcx:0  r8: fbca8a70  r9:0
rax:0 rbx:0 rbp: ff00108bbaa0
r10: ff02d24a6500 r11: ff00108bb680 r12:   1b0103
r13: ff00108bbc08 r14:   1b0103 r15:   10
fsb:0 gsb: ff02d2e75540  ds:   4b
 es:   4b  fs:0  gs:  1c3
trp:e err:0 rip: fba92633
 cs:   30 rfl:10207 rsp: ff00108bba80
 ss:   38

ff00108bb870 unix:die+c8 ()
ff00108bb980 unix:trap+13c3 ()
ff00108bb990 unix:_cmntrap+e9 ()
ff00108bbaa0 genunix:vfs_devismounted+23 ()
ff00108bbbc0 hsfs:hs_getmdev+176 ()
ff00108bbc60 hsfs:hsfs_mount+195 ()
ff00108bbc90 genunix:fsop_mount+21 ()
ff00108bbe00 genunix:domount+9ff ()
ff00108bbe80 genunix:mount+d2 ()
ff00108bbec0 genunix:syscall_ap+8f ()
ff00108bbf10 unix:brand_sys_syscall32+197 ()

syncing file systems...
 done
dumping to /dev/dsk/c9t0d0s1, offset 860356608, content: kernel
> $C
ff00108bbaa0 vfs_devismounted+0x23(1b0103)
ff00108bbbc0 hs_getmdev+0x176(ff02dcf8a508, 804729e, 101, 
ff00108bbc08, ff00108bbc3c, ff0315246708)
ff00108bbc60 hsfs_mount+0x195(ff02dcf8a508, ff02ffea2c00, 
ff00108bbe30, ff0315246708)
ff00108bbc90 fsop_mount+0x21(ff02dcf8a508, ff02ffea2c00, 
ff00108bbe30, ff0315246708)
ff00108bbe00 domount+0x9ff(0, ff00108bbe30, ff02ffea2c00, 
ff0315246708, ff00108bbe28)
ff00108bbe80 mount+0xd2(ff0347a60fd8, ff00108bbeb8)
ff00108bbec0 syscall_ap+0x8f()
ff00108bbf10 sys_syscall32+0x101()






The panic with "kmem_flags=0xf" is more interesting:

> ::status
debugging crash dump vmcore.5 (64-bit) from tiger2
operating system: 5.11 snv_93_jk (i86pc)
panic message: kernel heap corruption detected
dump content: kernel pages only
kernel memory allocator: 
invalid free: buffer not in cache
buffer=ff0010455e30  bufctl=0  cache: kmem_alloc_256

panic[cpu1]/thread=ff03a05ad060: 
kernel heap corruption detected


ff0010455a20 genunix:kmem_error+497 ()
ff0010455a40 genunix:kmem_free+d6 ()
ff0010455bb0 hsfs:hs_mountfs+8b9 ()
ff0010455c60 hsfs:hsfs_mount+1e9 ()
ff0010455c90 genunix:fsop_mount+21 ()
ff0010455e00 genunix:domount+9ff ()
ff0010455e80 genunix:mount+d2 ()
ff0010455ec0 genunix:syscall_ap+8f ()
ff0010455f10 unix:brand_sys_syscall32+197 ()

syncing file systems...
 done
dumping to /dev/dsk/c9t0d0s1, offset 860356608, content: kernel

> $C
ff0010455980 vpanic()
ff0010455a20 kmem_error+0x497(1, ff02ce62b020, ff0010455e30)
ff0010455a40 kmem_free+0xd6(ff0010455e30, e8)
ff0010455bb0 hs_mountfs+0x8b9(ff03a5096dc8, 1b0104, 
ff03a2b9f140, 6100, 0, ff034ed39978, 0)
ff0010455c60 hsfs_mount+0x1e9(ff03a5096dc8, ff02f09e8900, 
ff0010455e30, ff034ed39978)
ff0010455c90 fsop_mount+0x21(ff03a5096dc8, ff02f09e8900, 
ff0010455e30, ff034ed39978)
ff0010455e00 domount+0x9ff(0, ff0010455e30, ff02f09e8900, 
ff034ed39978, ff0010455e28)
ff0010455e80 mount+0xd2(ff02e97cce38, ff0010455eb8)
ff0010455ec0 syscall_ap+0x8f()
ff0010455f10 sys_syscall32+0x101()

> hs_mountfs+0x8b9::dis
hs_mountfs+0x88f:   movq   -0x78(%rbp),%r8
hs_mountfs+0x893:   xorq   %r9,%r9
hs_mountfs+0x896:   call   +0x34c9f65   
hs_mountfs+0x89b:   movq   0x30(%rsp),%rdi
hs_mountfs+0x8a0:   call   +0x34c700b   
hs_mountfs+0x8a5:   testq  %r13,%r

Re: [osol-discuss] [ufs-discuss] PANIC! mounting cdrom slice on b78

2008-06-16 Thread Juergen Keil

Hmm, in usr/src/uts/common/fs/hsfs/hsfs_vfsops.c function hs_mountfs(),
whenever we use one of the first three |goto cleanup|,  the local variables
|svp| and |jvp| are uninitialized.  That should corrupt the kernel heap
when we kmem_free() with an uninitialized stack-local pointer in the
cleanup section ...



        struct hs_volume *svp;  /* Supplemental VD for ISO-9660:1999 */
        struct hs_volume *jvp;  /* Joliet VD */

        ...

        /*
         * Refuse to go any further if this
         * device is being used for swapping
         */
        if (IS_SWAPVP(common_specvp(devvp))) {
                error = EBUSY;
                goto cleanup;
        }

        vap.va_mask = AT_SIZE;
        if ((error = VOP_GETATTR(devvp, &vap, ATTR_COMM, cr, NULL)) != 0) {
                cmn_err(CE_NOTE, "Cannot get attributes of the CD-ROM driver");
                goto cleanup;
        }

        /*
         * Make sure we have a nonzero size partition.
         * The current version of the SD driver will *not* fail the open
         * of such a partition so we have to check for it here.
         */
        if (vap.va_size == 0) {
                error = ENXIO;
                goto cleanup;
        }

        /*
         * Init a new hsfs structure.
         */
        fsp = kmem_zalloc(sizeof (*fsp), KM_SLEEP);
        svp = kmem_zalloc(sizeof (*svp), KM_SLEEP);
        jvp = kmem_zalloc(sizeof (*jvp), KM_SLEEP);

        ...


cleanup:
        (void) VOP_CLOSE(devvp, FREAD, 1, (offset_t)0, cr, NULL);
        VN_RELE(devvp);
        if (fsp)
                kmem_free(fsp, sizeof (*fsp));
        if (svp)
                kmem_free(svp, sizeof (*svp));
        if (jvp)
                kmem_free(jvp, sizeof (*jvp));
        return (error);



Re: [osol-discuss] [ufs-discuss] PANIC! mounting cdrom slice on b78

2008-06-16 Thread Juergen Keil

I filed a bug at http://bugs.opensolaris.org/;
Bug-ID is not yet known.

Fix is obvious:

diff --git a/usr/src/uts/common/fs/hsfs/hsfs_vfsops.c b/usr/src/uts/common/fs/hsfs/hsfs_vfsops.c
--- a/usr/src/uts/common/fs/hsfs/hsfs_vfsops.c
+++ b/usr/src/uts/common/fs/hsfs/hsfs_vfsops.c
@@ -596,8 +596,8 @@ hs_mountfs(
        size_t  pathbufsz = strlen(path) + 1;
        int redo_rootvp;

-       struct hs_volume *svp;  /* Supplemental VD for ISO-9660:1999 */
-       struct hs_volume *jvp;  /* Joliet VD */
+       struct hs_volume *svp = NULL;   /* Supplemental VD for ISO-9660:1999 */
+       struct hs_volume *jvp = NULL;   /* Joliet VD */

        /*
         * The rules for which extension will be used are:
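
For illustration, here is the same goto-cleanup pattern as a small userland
sketch, with calloc()/free() standing in for kmem_zalloc()/kmem_free().
struct volume and mount_sketch() are hypothetical names, not hsfs code.
With the pointers initialized to NULL, the early exits reach the cleanup
label without handing garbage to the allocator:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct volume {
    int unused;
};

static int
mount_sketch(size_t dev_size)
{
    int error = 0;
    struct volume *svp = NULL;  /* NULL until actually allocated */
    struct volume *jvp = NULL;

    /* Early exit before any allocation, like the va_size == 0 check. */
    if (dev_size == 0) {
        error = ENXIO;
        goto cleanup;
    }

    svp = calloc(1, sizeof (*svp));
    jvp = calloc(1, sizeof (*jvp));
    if (svp == NULL || jvp == NULL) {
        error = ENOMEM;
        goto cleanup;
    }

    /* ... the real mount work would go here ... */
    assert(svp != NULL && jvp != NULL);

cleanup:
    if (svp != NULL)
        free(svp);
    if (jvp != NULL)
        free(jvp);
    return (error);
}
```

Without the "= NULL" initializers, the dev_size == 0 path would test and
free whatever happened to be on the stack, which is the failure mode seen
in the kmem_flags=0xf panic.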


