Re: [osol-discuss] Need help debugging hang issue on build 134

2010-03-24 Thread m...@bruningsystems.com

Hi Ronny,

How are you getting the crash dump if your system is hung
and you are not getting into kmdb?  To get into kmdb,
you need to type f1-a on the console (based on the addresses,
you are on x64, not SPARC).

max

Ronny Egner wrote:

Hi all,

::cpuinfo -v yields the output in the attached file cpuinfo.txt.

I noticed one interesting thing:

While preparing to catch the hang, I opened the console and pre-typed mdb -K
so that I could crash the system if needed.
When the system hung, I pressed Enter, but nothing happened. While digging around in the
core dump, I found my mdb -K:

ff1381a42e40 ff13a6594cc0 ff138192c310   1  60 0
  PC: _resume_from_idle+0xf1    CMD: mdb -K
  stack pointer for thread ff1381a42e40: ff008c643d30
  [ ff008c643d30 _resume_from_idle+0xf1() ]
swtch+0x145()
cv_wait+0x61()
vmem_xalloc+0x635()
vmem_alloc+0x161()
segkmem_xalloc+0x90()
segkmem_alloc_vn+0xcd()
segkmem_zio_alloc+0x24()
vmem_xalloc+0x546()
vmem_alloc+0x161()
kmem_slab_create+0x81()
kmem_slab_alloc+0x5b()
kmem_cache_alloc+0x1fa()
zio_data_buf_alloc+0x2c()
arc_get_data_buf+0x18b()
arc_buf_alloc+0xa2()
arc_read_nolock+0x12f()
arc_read+0x75()
dbuf_read_impl+0x172()
dbuf_read+0xfe()
dmu_buf_hold_array_by_dnode+0x1c9()
dmu_buf_hold_array+0x6e()
dmu_read_uio+0x4d()
zfs_read+0x2d1()
fop_read+0x6b()
vn_rdwr+0x17f()
gexec+0x140()
exec_common+0x45c()
exece+0x1f()
_sys_sysenter_post_swapgs+0x149()

  

ff1381a42e40::thread


ADDR             STATE  FLG PFLG SFLG   PRI  EPRI PIL INTR
ff1381a42e40       run  1000  104    3    60     0   0 n/a
  

ff1381a42e40::threadlist


ADDR PROC  LWP CMD/LWPID
ff1381a42e40 ff13a6594cc0 ff138192c310 mdb/1


So I looked for the mdb process and found it (mdb -K) on the run queue of CPU 7:


 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  7 ff1376a44540      1f    7    0  99   no    no t-0    ff008b137c60     sched
                       |    |
            RUNNING <--+    +-->  PRI THREAD           PROC
              READY                60 ff1381a42e40     mdb
           QUIESCED                60 ff13a35ee720     nfsd
             EXISTS                60 ff1377474400     bash
             ENABLE                60 ff008c1bec60     sched
                                   59 ff13817ef540     nscd
                                   59 ff13817fbc60     syslogd
                                   58 ff1381a38720     smbd

So it seems mdb was blocked by sched (thread ff008b137c60); digging into
that thread yields:

  

ff008b137c60::findstack


stack pointer for thread ff008b137c60: ff008b1370d0
  ff008b137120 intr_thread_prolog+0x2a()
  ff008b137140 apic_setspl+0x5c()
  ff008b137180 splr+0x55()
  ff008b137c60 0x22d9fd9301c7()


Any ideas?




___
opensolaris-discuss mailing list
opensolaris-discuss@opensolaris.org




Re: [osol-discuss] Need help debugging hang issue on build 134

2010-03-23 Thread m...@bruningsystems.com

You might try ::cpuinfo -v
in mdb on the dump to see what was running on the cpus
at the time of the hang.

max

Brian Ruthven - Sun UK wrote:



Ronny Egner wrote:

ff008bfe0480 unix:die+dd ()
ff008bfe0590 unix:trap+177b ()
ff008bfe05a0 unix:cmntrap+e6 ()
ff008bfe0690 0 ()
ff008bfe06b0 unix:debug_enter+38 ()
ff008bfe06d0 unix:abort_sequence_enter+35 ()
ff008bfe0720 kbtrans:kbtrans_streams_key+102 ()
ff008bfe0750 conskbd:conskbdlrput+e7 ()
ff008bfe07c0 unix:putnext+21e ()
ff008bfe0800 kbtrans:kbtrans_queueevent+7c ()
ff008bfe0830 kbtrans:kbtrans_queuepress+7c ()
ff008bfe0870 kbtrans:kbtrans_untrans_keypressed_raw+46 ()
ff008bfe08a0 kbtrans:kbtrans_processkey+32 ()
ff008bfe08f0 kbtrans:kbtrans_streams_key+175 ()
ff008bfe0920 usbkbm:usbkbm_wrap_kbtrans+20 ()
ff008bfe0960 usbkbm:usbkbm_streams_callback+3c ()
ff008bfe09e0 usbkbm:usbkbm_unpack_usb_packet+2f6 ()
ff008bfe0a10 usbkbm:usbkbm_rput+84 ()
ff008bfe0a80 unix:putnext+21e ()
ff008bfe0ac0 hid:hid_interrupt_pipe_callback+7c ()
ff008bfe0b00 usba:usba_req_normal_cb+155 ()
ff008bfe0b60 usba:hcdi_do_cb+133 ()
ff008bfe0ba0 usba:hcdi_cb_thread+b2 ()
ff008bfe0c40 genunix:taskq_thread+248 ()
ff008bfe0c50 unix:thread_start+8 ()

syncing file systems...
 done
dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: all
  

snip
So, judging just from the stack trace ($C macro), the problem seems to
be related to some kind of USB device?

Can anyone help me?
  


Your stack trace above is the direct result of dropping to the 
debugger, from handling the USB interrupt at the bottom of the trace, 
all the way up through the usbkbm and kbtrans drivers, up to dropping 
out of Solaris via the debug_enter() call. The remainder of the stack 
is the result of forcing the panic: a jump to address zero triggers 
an instant bad trap, and the system panics through the normal 
mechanism. The USB device here is merely the messenger of the F1-A 
sequence to drop to the debugger. This particular stack is not a 
problem in itself.



For your issue, you will need to see what other threads are doing in 
the system at the time you halted it.


[ Might be worth ruling out 
http://defect.opensolaris.org/bz/show_bug.cgi?id=12528 - I don't think 
it's the right bug, but it reared its head recently. I don't know 
whether this only affects the graphics card or the entire system... ]


You may need to look at what the I/O layer was doing with the two 
internal disks at the time. IIRC, threads blocked in biowait() are a 
good starting point.


Regards,
Brian






Re: [osol-discuss] b134 freezes

2010-03-20 Thread m...@bruningsystems.com

Hi Jouko,

You might try running mdb -k and seeing what thread 0xd8db3dc0 is doing:

d8db3dc0::threadlist -v

Jouko Holopainen wrote:

The situation has improved: after I disabled compression on the swap volume,
Firefox no longer hangs (at least not yet). Should I make a separate partition for it?

I debugged the kernel freeze with http://www.bruningsystems.com/runq.d, and it
reports (just a part here):
de7c4ac0 on run queue, execname = xscreensaver
dispthread at clock = d8db3dc0, exec = sched, nrunnable at clock = 36
dispthread at clock = d8db3dc0, exec = sched, nrunnable at clock = 36
dispthread at clock = d8db3dc0, exec = sched, nrunnable at clock = 36
dispthread at clock = d8db3dc0, exec = sched, nrunnable at clock = 36
...

Yes, there are 36 threads in the queue behind sched (according to the script).
I can send the whole output (480 KB, it should compress well) to anyone interested.
  




Re: [osol-discuss] b134 freezes

2010-03-20 Thread m...@bruningsystems.com

You could try making a copy of /etc/driver_aliases.  Then rem_drv atge
(you may need a reboot afterwards).  Then see if the problem goes away.
Also, try disabling nwam.  That may stop the taskq thread(s) for atge.

max

Jouko Holopainen wrote:

d8db3dc0::threadlist -v


ADDR     PROC     LWP CLS PRI WCHAN
d8db3dc0 fec21dd8   0   0  60 d2bb902c
  PC: _resume_from_idle+0xb1    TASKQ: atge_mii0
  stack pointer for thread d8db3dc0: d8db3c58
swtch+0x188()
cv_timedwait_hires+0xc5()
cv_reltimedwait+0x52()
_mii_task+0x184()
taskq_thread+0x1f7()
thread_start+8()

This is a bit strange, as atge (wired) is not connected at all.
Or maybe it is not that strange after all... maybe it is trying to bring the link up.
  




Re: [osol-discuss] swap purge

2009-10-14 Thread m...@bruningsystems.com

kill doesn't work?

max

Mike DeMarco wrote:

Looking for some kind of tool that will let me release the swap space consumed 
by a process that was swapped out and whose parent died. There is no way for this 
process to get back on the stack, and its carcass consumes unrecoverable swap 
space.
  




Re: [osol-discuss] swap purge

2009-10-14 Thread m...@bruningsystems.com

Hi Mike,

Mike DeMarco wrote:
How do I know that the process is swapped out?

The best answer I can give is that I don't.

I have no tools that can tell me what processes are represented by the number in the w column of vmstat. I do know that once swap is hit, this number will grow and stay above zero until the kernel is rebooted.


I am looking for tools/methods to analyze the processes in swap, determine whether 
they are indeed unneeded, and remove them, since they will never be called out of 
swap and are consuming disk space.
  
The w column in vmstat is a count of the number of threads swapped 
out.  A swapped-out thread will not have 0x1 (TS_LOAD) set in
t_schedflag.  You can find the process(es) the threads belong to with:


# mdb -k
> ::walk thread t | ::print kthread_t t_schedflag | ::grep (.&1)==0 | ::eval <t=K | ::print kthread_t t_procp | ::print proc_t p_user.u_psargs


(Starting at ::walk thread t is all one line.)

max



Re: [osol-discuss] swap purge

2009-10-14 Thread m...@bruningsystems.com

Mike DeMarco wrote:
How do I know that the process is swapped out?

The best answer I can give is that I don't.

I have no tools that can tell me what processes are represented by the number in the w column of vmstat. I do know that once swap is hit, this number will grow and stay above zero until the kernel is rebooted.


I am looking for tools/methods to analyze the processes in swap, determine whether 
they are indeed unneeded, and remove them, since they will never be called out of 
swap and are consuming disk space.
  

A little follow-up from my box:

# vmstat 2
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr cd s0 s1 --   in   sy   cs us sy id
 1 0 1 1551744 875932 51 488 6 285 349 0 17112 25 -0 11 0 935 160430 1503 11 9 80
 0 0 11 1693076 1221836 0 117 213 0 0 0  0 74  0  0  0  540  433  498  0  1 99
 0 0 11 1692408 1220544 0 89 307 0 0  0  0 89  0  0  0  586  314  555  1  0 99


So, 11 threads are swapped out.  Here is the list:

# mdb -k
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc 
pcplusmp scsi_vhci zfs usba sockfs ip hook neti sctp arp uhci sd fctl 
lofs audiosup fcip random cpc crypto logindmux ptm ufs sppp ipc ]
> ::walk thread t | ::print kthread_t t_schedflag | ::grep (.&1)==0 | ::eval <t=K | ::print kthread_t t_procp | ::print proc_t p_user.u_psargs

p_user.u_psargs = [ /usr/lib/inet/in.ndpd ]
p_user.u_psargs = [ /usr/sbin/avahi-daemon-bridge-dsd -D ]
p_user.u_psargs = [ /usr/openwin/bin/fbconsole -n -d :0 ]
p_user.u_psargs = [ /usr/lib/thunderbird/thunderbird-bin ]
p_user.u_psargs = [ /usr/lib/thunderbird/thunderbird-bin ]
p_user.u_psargs = [ /usr/lib/thunderbird/thunderbird-bin ]
p_user.u_psargs = [ /usr/lib/thunderbird/thunderbird-bin ]
p_user.u_psargs = [ gnome-terminal ]
p_user.u_psargs = [ gnome-terminal ]
p_user.u_psargs = [ /usr/sbin/gdm-binary ]
p_user.u_psargs = [ /usr/lib/rmvolmgr -s ]


max



Re: [osol-discuss] swap purge

2009-10-14 Thread m...@bruningsystems.com

Mike DeMarco wrote:

mdb -k
Loading modules: [ unix genunix specfs dtrace ufs sd mpt px md ip hook neti 
sctp arp nca fcp fctl emlxs cpc random crypto zfs wrsmd fcip ssd logindmux ptm 
sppp nfs lofs ipc ]
  

> ::walk thread t| ::print kthread_t t_schedflag | ::grep (.&1)==0 | ::eval

After ::eval, on the same line, you need: <t=K | ::print kthread_t 
t_procp | ::print proc_t p_user.u_psargs

Usage: eval command
  

> ::walk thread t| ::print kthread_t t_schedflag | ::grep (.&1)==0


4000
4000
4000
  

And 0x4000 is, according to /usr/include/sys/thread.h:
#define TS_RUNQMATCH    0x4000  /* exact run queue balancing by setbackdq() */


In particular, the TS_LOAD flag (0x1) is not set, indicating the thread 
is not in memory (i.e., swapped out).

If you want the pid instead of the arguments, you can use
::print proc_t p_pidp | ::print struct pid pid_id instead of
::print proc_t p_user.u_psargs.

max



Re: [osol-discuss] swap purge

2009-10-14 Thread m...@bruningsystems.com

Mike DeMarco wrote:

> ::walk thread t | ::print kthread_t t_schedflag | ::grep (.&1)==0 | ::eval <t=K | ::print kthread_t t_procp | ::print proc_t p_user.u_psargs
p_user.u_psargs = [ /lib/svc/bin/svc.configd ]
p_user.u_psargs = [ /lib/svc/bin/svc.startd ]
p_user.u_psargs = [ devfsadmd ]
p_user.u_psargs = [ devfsadmd ]
p_user.u_psargs = [ devfsadmd ]
p_user.u_psargs = [ devfsadmd ]
p_user.u_psargs = [ /usr/lib/sysevent/syseventd ]
p_user.u_psargs = [ /usr/lib/sysevent/syseventd ]
p_user.u_psargs = [ /usr/lib/sysevent/syseventd ]
p_user.u_psargs = [ /usr/lib/sysevent/syseventd ]
p_user.u_psargs = [ /usr/lib/sysevent/syseventd ]
p_user.u_psargs = [ /usr/sbin/nscd ]
p_user.u_psargs = [ /usr/lib/crypto/kcfd ]
p_user.u_psargs = [ /usr/platform/SUNW,SPARC-Enterprise/lib/sparcv9/oplhpd ]
p_user.u_psargs = [ /usr/platform/SUNW,SPARC-Enterprise/lib/sparcv9/oplhpd ]
p_user.u_psargs = [ /usr/lib/picl/picld ]
p_user.u_psargs = [ /usr/lib/picl/picld ]
p_user.u_psargs = [ /usr/sbin/rpcbind ]
p_user.u_psargs = [ /usr/lib/nfs/statd ]
p_user.u_psargs = [ /usr/lib/nfs/statd ]
p_user.u_psargs = [ /usr/lib/nfs/nfsmapid ]
p_user.u_psargs = [ /usr/openwin/bin/rpc.ttdbserverd ]
p_user.u_psargs = [ /bin/sh /lib/svc/method/rpc-ttdbserverd ]
p_user.u_psargs = [ /bin/sh /lib/svc/method/rpc-cmsd ]
p_user.u_psargs = [ /usr/sadm/lib/smc/bin/smcboot ]
p_user.u_psargs = [ /usr/sadm/lib/smc/bin/smcboot ]
p_user.u_psargs = [ /usr/sadm/lib/smc/bin/smcboot ]
p_user.u_psargs = [ /usr/lib/dcs -l ]
p_user.u_psargs = [ /usr/lib/fm/fmd/fmd ]
p_user.u_psargs = [ /usr/lib/fm/fmd/fmd ]
p_user.u_psargs = [ /usr/lib/fm/fmd/fmd ]
p_user.u_psargs = [ /usr/lib/fm/fmd/fmd ]
p_user.u_psargs = [ /usr/lib/fm/fmd/fmd ]
p_user.u_psargs = [ /lib/svc/bin/svc.configd ]
p_user.u_psargs = [ /lib/svc/bin/svc.startd ]
p_user.u_psargs = [ /usr/lib/crypto/kcfd ]
p_user.u_psargs = [ /usr/sbin/rpcbind ]
p_user.u_psargs = [ /usr/lib/nfs/statd ]
p_user.u_psargs = [ /usr/lib/nfs/lockd ]
p_user.u_psargs = [ /usr/lib/nfs/nfs4cbd ]
p_user.u_psargs = [ /usr/lib/nfs/nfsmapid ]
p_user.u_psargs = [ /usr/openwin/bin/rpc.ttdbserverd ]
p_user.u_psargs = [ /bin/sh /lib/svc/method/rpc-ttdbserverd ]
p_user.u_psargs = [ /bin/sh /lib/svc/method/rpc-cmsd ]
p_user.u_psargs = [ /usr/sbin/nscd ]
p_user.u_psargs = [ /usr/sadm/lib/smc/bin/smcboot ]
p_user.u_psargs = [ /usr/sadm/lib/smc/bin/smcboot ]
p_user.u_psargs = [ /usr/sadm/lib/smc/bin/smcboot ]
p_user.u_psargs = [ /usr/dt/bin/dtlogin -daemon ]
p_user.u_psargs = [ 
/usr/java/bin/java -server -Xmx128m -XX:+UseParallelGC -XX:ParallelGCThreads=4
 ]
p_user.u_psargs = [ 
/usr/java/bin/java -server -Xmx128m -XX:+UseParallelGC -XX:ParallelGCThreads=4
 ]
p_user.u_psargs = [ 
/usr/java/bin/java -server -Xmx128m -XX:+UseParallelGC -XX:ParallelGCThreads=4
 ]
p_user.u_psargs = [ 
/usr/java/bin/java -server -Xmx128m -XX:+UseParallelGC -XX:ParallelGCThreads=4
 ]
p_user.u_psargs = [ 
/usr/lib/saf/ttymon -g -d /dev/console -l console -T vt100 -m ldterm,ttcompat -
 ]
p_user.u_psargs = [ /usr/lib/dmi/dmispd ]
p_user.u_psargs = [ /lib/svc/bin/svc.startd ]
p_user.u_psargs = [ /lib/svc/bin/svc.startd ]
p_user.u_psargs = [ /lib/svc/bin/svc.startd ]
p_user.u_psargs = [ /usr/bin/pfksh ]
p_user.u_psargs = [ /usr/bin/pfksh ]
p_user.u_psargs = [ 
/usr/lib/saf/ttymon -g -d /dev/console -l console -m ldterm,ttcompat -h -p seie
 ]
p_user.u_psargs = [ /usr/lib/sendmail -bd -q15m ]
p_user.u_psargs = [ 
/usr/java/bin/java -server -Xmx128m -XX:+UseParallelGC -XX:ParallelGCThreads=4
 ]
p_user.u_psargs = [ 
/usr/java/bin/java -server -Xmx128m -XX:+UseParallelGC -XX:ParallelGCThreads=4
 ]
p_user.u_psargs = [ 
/usr/java/bin/java -server -Xmx128m -XX:+UseParallelGC -XX:ParallelGCThreads=4
 ]
p_user.u_psargs = [ 
/usr/java/bin/java -server -Xmx128m -XX:+UseParallelGC -XX:ParallelGCThreads=4
 ]
p_user.u_psargs = [ /lib/svc/bin/svc.startd ]
p_user.u_psargs = [ /lib/svc/bin/svc.startd ]
p_user.u_psargs = [ /lib/svc/bin/svc.startd ]


So is this telling me that a thread of rpcbind is still swapped out?
  
Correct.  As well as thread(s) from svc.configd, svc.startd, devfsadmd, 
syseventd, etc.

max




Re: [osol-discuss] swap purge

2009-10-14 Thread m...@bruningsystems.com

Hi Mike,

The only book I can recommend is Solaris Modular Debugger Guide,
at http://docs.sun.com/app/docs/doc/816-5041?l=en
If you look around a little, you'll probably find some good blog entries.
Eric Schrock wrote a nice little puzzle a while ago
(see http://blogs.sun.com/eschrock/entry/mdb_puzzle).
I have also done a little with mdb, and blogged a bit about it
at mbruning.blogspot.com.

The best way to learn it is by experience.  I recommend learning
the data structures (header files work well for this), and trying to see
the data structures via mdb.

Of course, I think a very good way to come up to speed with it is
to take a course.  I would start with a Solaris Internals course, then
take a course on kernel crash analysis and debugging.  (But then, I _would_
recommend this, as I teach both.)

max


Mike DeMarco wrote:

Thanks Max:
  This helps.

Can you recommend good documentation/books for mdb?
  




Re: [osol-discuss] Question about (Open)Solaris VM - why is swap required?

2009-04-25 Thread m...@bruningsystems.com

Hi,

Mike Gerdts wrote:

On Sat, Apr 25, 2009 at 2:01 AM, Shawn Walker swal...@opensolaris.org wrote:
  

Anon Y Mous wrote:


as we tested scaling and put more load on the servers (i.e. allowing
Apache to spawn more children), we were surprised to see that the system
doesn't tend to go below 8Gb (half) of the available RAM. Further requests
for memory allocation failed (can't fork new processes including new sshd,
vmstat, top and bash).


This may also be helpful:
http://blogs.sun.com/jimlaurent/entry/solaris_faq_myths_and_facts



The first comment in this blog entry probably reflects the behavior
that you are seeing.

In comparison, Linux will allow overallocation of memory.  If things
try to touch more memory+swap space than the system has, Linux will
invoke its out of memory killer.
See, for example, http://lwn.net/Articles/104179/.  This behavior can
be turned off on Linux, but I believe that it is enabled by default on
most distros.
  

You might also want to look at the output of swap -sh.
I assume you are getting free memory numbers from vmstat?

Try running the following program, then look at the vmstat and swap -sh
output.  If the amount of reservable swap space, as reported by swap -sh,
is less than the space needed during a fork, the call will fail, even
though there is plenty of free memory.


Here's the program:

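#include <unistd.h>

char a[1024*1024*1024];  /* 1 GB of BSS, essentially a 1 GB heap; swap is
                            reserved for it at exec time, never touched */

int
main(void)
{
   pause();  /* sleep until a signal arrives, keeping the reservation */
   return 0;
}
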
# swap -lh
swapfile                 dev    swaplo   blocks     free
/dev/zvol/dsk/rpool/swap 182,2      4K    1023M    1023M
# swap -sh
total: 876M allocated + 286M reserved = 1.1G used, 1.1G available
# vmstat 2 2
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr cd lf s0 s2   in   sy   cs us sy id
 0 0 0 1237452 538672 13 61  0  0  0  0 17  3 17 -0  8  480 3489  905  3  1 96
 0 0 0 1165692 443224 6  22  0  0  0  0  0  0  0  0  0  432 2013  726  5  1 94

# ./foo &
[1] 1333
# swap -lh
swapfile                 dev    swaplo   blocks     free
/dev/zvol/dsk/rpool/swap 182,2      4K    1023M    1023M   <-- no swap space used on device

# swap -sh
total: 876M allocated + 1.3G reserved = 2.1G used, 114M available   <-- only 114MB available

# vmstat 2 2
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr cd lf s0 s2   in   sy   cs us sy id
 0 0 0 1236672 538584 13 61  0  0  0  0 17  3 17 -0  8  480 3488  905  3  1 96
 0 0 0 116968 443116  9  22  0  0  0  0  0  0  0  0  0  434 2258  725  4  1 95   <-- memory usage has not significantly changed

# ./foo &
[2] 1337
#
[2]+  Killed  ./foo   <-- here, exec probably failed (not fork) due to insufficient reservable space

#


max



[osol-discuss] problems sending to opensolaris-discuss

2009-04-02 Thread m...@bruningsystems.com

Hi,
Is anyone else having problems posting to osol-discuss?
I notice that today there have been only 10 messages (thus far).
I posted a couple of messages about 5 hours ago that are
still not showing up.  Let's see if this one makes it...

thank you.
max



[osol-discuss] Free kernel internals training from Bruning Systems

2009-04-02 Thread m...@bruningsystems.com


Bruning Systems and OSUNIX are hosting a free one-day training course on 
kernel internals for developers.  Limited positions are available.  
Learn more at http://sl.osunix.org/FreeKernelTrainingDay


thanks,
max



[osol-discuss] OpenSolaris Internals course description

2009-04-01 Thread m...@bruningsystems.com

Hi All,

A course description for the OpenSolaris Internals course to be held in
Warsaw, Poland May 4-8 is available at 
http://www.bruningsystems.com/page14/page13/page13.html.


Any comments/suggestions are appreciated.

thanks,
max



[osol-discuss] OpenSolaris Internals course announcement

2009-03-25 Thread m...@bruningsystems.com

Hi All,

Systemics Poland (http://www.systemics.pl) and  Bruning Systems 
(http://www.bruningsystems.com) are holding an OpenSolaris Internals 
class on May 4-8 at the Systemics site in Warsaw.  The course will be 
taught in English.  For pricing and availability, please contact 
Magdalena Sternik magdalena.ster...@systemics.pl.  For detailed 
information on the course, please contact me at m...@bruningsystems.com.  
Please note that the course will be held subject to the number of people 
who enroll.


Thanks,
max


Re: [osol-discuss] what kinds of situation will the read(fd, buf, cnt) return a negative number?

2009-03-21 Thread m...@bruningsystems.com



man -s2 read

Read the sections on Return Values and Errors.


max

Ian mao wrote:

Need the detailed situations.
And is there any tool to trigger the read function failure?

Thanks!
  




Re: [osol-discuss] regarding ptrace equivalent in solaris

2009-03-16 Thread m...@bruningsystems.com

casper@sun.com wrote:

Hi, I'm using the 5.11 kernel version on the amd64 architecture, 32-bit. I need help
with the following issues:



Forget about ptrace; use /proc.

As a Solaris process has multiple threads of control, you can't change the
registers using ptrace.

You get the registers by reading the lwpstatus file:

/proc/<pid>/lwp/<lwp>/lwpstatus

You can use the PCSREG command to change the registers; the command is
written, followed by its argument, to /proc/<pid>/lwp/<lwp>/lwpctl or
to /proc/<pid>/ctl.
  

You might also take a look at man -s 4 proc.
max




[osol-discuss] opensolaris.org not responding

2009-03-03 Thread m...@bruningsystems.com

Hi,
I would post this to website-discuss, except that I am not on that
mailing list.  To get on that mailing list, I need to get to
opensolaris.org, and www.opensolaris.org has timed out every time I
have tried it today.  I am having no problems with any other sites.
thanks,
max




Re: [osol-discuss] diagnosing server hang

2009-01-30 Thread m...@bruningsystems.com
Hi Matt,
Maybe this should be moved to mdb-discuss...
My comments are at the end.

Matt Harrison wrote:
 Matt Harrison wrote:
   
 m...@bruningsystems.com wrote:
 
 Hi Matt,

 Matt Harrison wrote:
   
 Matt Harrison wrote:
  
 
 thanks Ian, I'll look into this tomorrow.

 
   
 Well I'm not sure if it's good news or not. I've got the machine 
 running memtest86+ with the standard tests and so far it's done 2 
 passes (3 hours runtime) without a single error.

 I'm going to leave it running overnight but does it seem there could 
 be another problem other than memory?
   
 
 Have you gotten anywhere yet with this hang?  Have you tried set 
 snooping=1 in
 /etc/system?  How about booting with kmdb and forcing a dump?
 I'm not sure why this is necessarily hardware related...

 max


   
 Unfortunately not yet...I was forced to bring the server back up to get 
 some files from it, and I haven't had a chance to take it down again yet.

 I still need to make sure it can survive 24h solid of memtest, but I am 
 happy to try other things.

 I'm not familiar with the snooping variable, nor with kmdb, although I 
 have read about it being used here and there.

 I'll go ahead with the memtest when I can and report back.
 

 Ok, sorry it's taken a while but I've had the server run memtest for 24 
 hours and it hasn't found any errors whatsoever.

 Does anyone have an idea as to what I could try next?
   
I would try booting with kmdb (or, alternatively, loading kmdb once the
machine is up but before it hangs).  You can do this from a command-line
console login (no graphics) by running:

# mdb -K    <-- this will load kmdb and drop into it
:c          <-- this will continue

If you must have a windowing system to reproduce the hang, you can still
use kmdb, but unless you can redirect console input/output from/to a
serial port, you won't be able to see what you are doing.  That is OK.
Type:

# mdb -K -F   <-- again, loads kmdb and drops into it.  The machine will appear hung.

Now, carefully with no typos:

:c   <-- then Enter (that's colon, c, Enter: 3 keystrokes).  The machine
should come back (unless you made a typo).

Now, do whatever you are doing that causes the machine to hang.
When the machine is hung, type F1-a (that is, function key F1 and the
a key together).  Unless the machine is hard hung, this will put you
into kmdb.  Again, you won't be able to see what is happening if your
console is on a windowing system.  Then type (again, no typos):

$<systemdump   <-- this will give you a panic dump and reboot.

If the above doesn't work, you either made a typo, or your machine is
hard hung.  If it is hard hung, add this line to /etc/system (of course,
you'll have to bounce the machine to get it back up to do this):

set snooping=1

Then reboot.  This sets a deadman timer.  Again, do your thing to cause
the hang.  If the scheduling clock does not run for 50 seconds (the
default), the machine will panic, giving you a dump.

If neither of these works, it implies that the real-time clock is being
blocked out.  This is highly unlikely, but it can occur.

Once you have the dump, report back...

max




Re: [osol-discuss] diagnosing server hang

2009-01-27 Thread m...@bruningsystems.com
Hi Matt,

Matt Harrison wrote:
 Matt Harrison wrote:
   
 thanks Ian, I'll look into this tomorrow.

 

 Well I'm not sure if it's good news or not. I've got the machine running 
 memtest86+ with the standard tests and so far it's done 2 passes (3 
 hours runtime) without a single error.

 I'm going to leave it running overnight but does it seem there could be 
 another problem other than memory?
   
Have you gotten anywhere yet with this hang?  Have you tried set 
snooping=1 in
/etc/system?  How about booting with kmdb and forcing a dump?
I'm not sure why this is necessarily hardware related...

max




Re: [osol-discuss] migrating issues from SXCE to 2008.11

2009-01-11 Thread m...@bruningsystems.com
Hi Matt,

Matt Harrison wrote:
 Hi all,

 I've got a filer using SXCE installed and I'm thinking about migrating 
 to OpenSolaris due mainly to the improved pkg command.

 I've tried testing it on VMware (6.0.0 build-45731) but there is an 
 immediate problem.

 The install went fine but on reboot it appears to fail to start X, 
 dumping me back to a console login with some garbage characters 
 printed over it. It tries to start X 3 times I think, each time coming 
 back to the messed up prompt.

 I tried to login on the console to check the X logs but it just spams 
 the screen with more garbage. Attached is a screenshot of the console 
 after a login attempt.

 I have installed the previous release of OSOL before on the same 
 version of VMware, and that didn't have this problem.

 Any ideas welcomed before I abandon it for SXCE again :)
I am seeing this a few times right now (I am developing an Xinput module).
If you can telnet/ssh in, take a look at /var/log/Xorg.0.log.  You should
also try killing Xorg; based on the screen snapshot you attached, I suspect
it is still running.

max


 Many thanks

 Matt Harrison

 

 




Re: [osol-discuss] migrating issues from SXCE to 2008.11

2009-01-11 Thread m...@bruningsystems.com
Hi Matt,

Matt Harrison wrote:
 m...@bruningsystems.com wrote:
 Hi Matt,

 Matt Harrison wrote:
 Hi all,

 I've got a filer using SXCE installed and I'm thinking about 
 migrating to OpenSolaris due mainly to the improved pkg command.

 I've tried testing it on VMware (6.0.0 build-45731) but there is an 
 immediate problem.

 The install went fine but on reboot it appears to fail to start X, 
 dumping me back to a console login with some garbage characters 
 printed over it. It tries to start X 3 times I think, each time 
 coming back to the messed up prompt.

 I tried to login on the console to check the X logs but it just 
 spams the screen with more garbage. Attached is a screenshot of the 
 console after a login attempt.

 I have installed the previous release of OSOL before on the same 
 version of VMware, and that didn't have this problem.

 Any ideas welcomed before I abandon it for SXCE again :)
 I am seeing this a few times right now (I am developing an Xinput module).
 If you can telnet/ssh in, take a look at /var/log/Xorg.0.log.  You should
 also try killing Xorg; based on the screen snapshot you attached, I suspect
 it is still running.

 max

 Thanks for the reply Max,

 I've managed to ssh in and grepped /var/log/Xorg.0.log for EE:

 The two lines that show up are

 (EE) Unable to locate/open config file
 and
 (EE) AIGLX: Screen 0 is not DRI capable

 So presumably I need to fiddle a bit with the X config; unfortunately,
 that's a job I always hate and haven't done in ages :P One thing I
 liked about SXCE was the way X seems to work better than on my Linux
 boxes ;)

 I'll have a fiddle and see what I can come up with.
You might also want to look at ~/.xsession-errors
max


 Thanks

 Matt




Re: [osol-discuss] Linux/FreeBSD/(Open)Solaris survey paper

2008-09-20 Thread m...@bruningsystems.com
Hi Philip,

I also wrote an article that is a lot more detailed, but aimed at
application developers, not kernel people, per se.  It can be found
at: http://developers.sun.com/solaris/articles/solaris_linux_app.html

Have fun!
max


Philip Torchinsky wrote:
 Stefan,

 yes, this is close to what I am looking for, but I try to read this 
 deeply. Thank you for the link, I lost it a year ago!
 If anybody has more links to point me on, please, let me know!

 Philip

 Stefan Varga:
   
 I can find data spread across many books and papers (like Solaris 
 Internals for Solaris and some unknown to me yet for Linux and 
 FreeBSD) but I believe there can be a paper with all the info 
 gathered and analyzed already. Can you advise me one?
   
   
 This one?
 http://www.softpanorama.org/Articles/solaris_vs_linux.shtml

 Stefan
 


   
