Re: [osol-discuss] Need help debugging hang issue on build 134
Hi Ronny,

How are you getting the crash dump if your system is hung and you are not getting into kmdb? To get into kmdb, you need to type F1-a on the console (based on the addresses, you are on x64, not SPARC).

max

Ronny Egner wrote:

Hi all,

cpuinfo -v yields (see file cpuinfo.txt).

I noticed one interesting thing: when preparing to catch the hang, I opened the console and pre-typed mdb -K so that I could crash the system if needed. When the system hung I pressed ENTER, but nothing happened. While digging around in the core dump I found my mdb -K:

    ff1381a42e40 ff13a6594cc0 ff138192c310   1  60                0
      PC: _resume_from_idle+0xf1    CMD: mdb -K
      stack pointer for thread ff1381a42e40: ff008c643d30
      [ ff008c643d30 _resume_from_idle+0xf1() ]
        swtch+0x145()
        cv_wait+0x61()
        vmem_xalloc+0x635()
        vmem_alloc+0x161()
        segkmem_xalloc+0x90()
        segkmem_alloc_vn+0xcd()
        segkmem_zio_alloc+0x24()
        vmem_xalloc+0x546()
        vmem_alloc+0x161()
        kmem_slab_create+0x81()
        kmem_slab_alloc+0x5b()
        kmem_cache_alloc+0x1fa()
        zio_data_buf_alloc+0x2c()
        arc_get_data_buf+0x18b()
        arc_buf_alloc+0xa2()
        arc_read_nolock+0x12f()
        arc_read+0x75()
        dbuf_read_impl+0x172()
        dbuf_read+0xfe()
        dmu_buf_hold_array_by_dnode+0x1c9()
        dmu_buf_hold_array+0x6e()
        dmu_read_uio+0x4d()
        zfs_read+0x2d1()
        fop_read+0x6b()
        vn_rdwr+0x17f()
        gexec+0x140()
        exec_common+0x45c()
        exece+0x1f()
        _sys_sysenter_post_swapgs+0x149()

    ff1381a42e40::thread
    ADDR         STATE  FLG PFLG SFLG PRI EPRI PIL INTR
    ff1381a42e40 run   1000  104    3  60    0   0 n/a

    ff1381a42e40::threadlist
    ADDR         PROC         LWP          CMD/LWPID
    ff1381a42e40 ff13a6594cc0 ff138192c310 mdb/1

So I looked for the mdb process and found it in the run queue on CPU ID 7:

     ID ADDR         FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD       PROC
      7 ff1376a44540  1f    7    0  99   no    no t-0    ff008b137c60 sched
                       |    |
            RUNNING <--+    +-->  PRI THREAD       PROC
              READY                60 ff1381a42e40 mdb
           QUIESCED                60 ff13a35ee720 nfsd
             EXISTS                60 ff1377474400 bash
             ENABLE                60 ff008c1bec60 sched
                                   59 ff13817ef540 nscd
                                   59 ff13817fbc60 syslogd
                                   58 ff1381a38720 smbd

So it seems mdb was blocked by sched (thread ff008b137c60); digging into it yields:

    ff008b137c60::findstack
    stack pointer for thread ff008b137c60: ff008b1370d0
      ff008b137120 intr_thread_prolog+0x2a()
      ff008b137140 apic_setspl+0x5c()
      ff008b137180 splr+0x55()
      ff008b137c60 0x22d9fd9301c7()

Any ideas?

___ opensolaris-discuss mailing list opensolaris-discuss@opensolaris.org
Re: [osol-discuss] Need help debugging hang issue on build 134
You might try ::cpuinfo -v in mdb on the dump to see what was running on the CPUs at the time of the hang.

max

Brian Ruthven - Sun UK wrote:

Ronny Egner wrote:

    ff008bfe0480 unix:die+dd ()
    ff008bfe0590 unix:trap+177b ()
    ff008bfe05a0 unix:cmntrap+e6 ()
    ff008bfe0690 0 ()
    ff008bfe06b0 unix:debug_enter+38 ()
    ff008bfe06d0 unix:abort_sequence_enter+35 ()
    ff008bfe0720 kbtrans:kbtrans_streams_key+102 ()
    ff008bfe0750 conskbd:conskbdlrput+e7 ()
    ff008bfe07c0 unix:putnext+21e ()
    ff008bfe0800 kbtrans:kbtrans_queueevent+7c ()
    ff008bfe0830 kbtrans:kbtrans_queuepress+7c ()
    ff008bfe0870 kbtrans:kbtrans_untrans_keypressed_raw+46 ()
    ff008bfe08a0 kbtrans:kbtrans_processkey+32 ()
    ff008bfe08f0 kbtrans:kbtrans_streams_key+175 ()
    ff008bfe0920 usbkbm:usbkbm_wrap_kbtrans+20 ()
    ff008bfe0960 usbkbm:usbkbm_streams_callback+3c ()
    ff008bfe09e0 usbkbm:usbkbm_unpack_usb_packet+2f6 ()
    ff008bfe0a10 usbkbm:usbkbm_rput+84 ()
    ff008bfe0a80 unix:putnext+21e ()
    ff008bfe0ac0 hid:hid_interrupt_pipe_callback+7c ()
    ff008bfe0b00 usba:usba_req_normal_cb+155 ()
    ff008bfe0b60 usba:hcdi_do_cb+133 ()
    ff008bfe0ba0 usba:hcdi_cb_thread+b2 ()
    ff008bfe0c40 genunix:taskq_thread+248 ()
    ff008bfe0c50 unix:thread_start+8 ()

    syncing file systems... done
    dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: all
    <snip>

So, just judging from the stack trace ($C macro), the problem seems to be related to some kind of USB device? Can anyone help me?

Your stack trace above is the direct result of dropping to the debugger. Read from the bottom up, it starts with handling the USB interrupt, goes up through the usbkbm and kbtrans drivers, and ends with dropping out of Solaris via the debug_enter() call. The remainder of the stack is the result of forcing the panic - a jmpl to a zero address triggers an instant bad trap, and the system panics through the normal mechanism.

The USB device here is merely the messenger of the Alt-F1 sequence to drop to the debugger. This particular stack is not a problem in itself.

For your issue, you will need to see what other threads were doing in the system at the time you halted it.

[ Might be worth ruling out http://defect.opensolaris.org/bz/show_bug.cgi?id=12528 - I don't think it's the right bug, but it reared its head recently. I don't know whether this only affects the graphics card or the entire system... ]

You may need to look at what the IO layer was doing with the two internal disks at the time - IIRC, threads blocked in biowait() are a good starting point.

Regards,
Brian
Re: [osol-discuss] b134 freezes
Hi Jouko,

You might try running mdb -k to see what thread 0xd8db3dc0 is, using:

    d8db3dc0::threadlist -v

Jouko Holopainen wrote:

The situation improves... after I disabled compression on the swap, Firefox no longer hangs. At least not yet. Should I make a separate partition for it?

I debugged the kernel freeze with http://www.bruningsystems.com/runq.d and it says (just a part here):

    de7c4ac0 on run queue, execname = xscreensaver
    dispthread at clock = d8db3dc0, exec = sched, nrunnable at clock = 36
    dispthread at clock = d8db3dc0, exec = sched, nrunnable at clock = 36
    dispthread at clock = d8db3dc0, exec = sched, nrunnable at clock = 36
    dispthread at clock = d8db3dc0, exec = sched, nrunnable at clock = 36
    ...

Yes, there are 36 sched's in the queue (according to the script). I can give the whole output (480k, should zip a lot) to anyone interested.
Re: [osol-discuss] b134 freezes
You could try making a copy of /etc/driver_aliases, then run rem_drv atge (you may need a reboot afterwards), and see if the problem goes away. Also, try disabling nwam. That may stop the taskq thread(s) for atge.

max

Jouko Holopainen wrote:

    d8db3dc0::threadlist -v
    ADDR     PROC     LWP CLS PRI WCHAN
    d8db3dc0 fec21dd8   0   0  60 d2bb902c
      PC: _resume_from_idle+0xb1    TASKQ: atge_mii0
      stack pointer for thread d8db3dc0: d8db3c58
        swtch+0x188()
        cv_timedwait_hires+0xc5()
        cv_reltimedwait+0x52()
        _mii_task+0x184()
        taskq_thread+0x1f7()
        thread_start+8()

This is a bit strange as atge (wired) is not connected at all. Or maybe it is not that strange after all... maybe it is trying to come up.
Re: [osol-discuss] swap purge
kill doesn't work? max Mike DeMarco wrote: looking for some kind of tool that will let me release the swap space consumed by a process that was swapped out and its parent died. There is no way for this process to get back on the stack and its carcass consumes unrecoverable swap space. ___ opensolaris-discuss mailing list opensolaris-discuss@opensolaris.org
Re: [osol-discuss] swap purge
Hi Mike,

Mike DeMarco wrote:

How do I know that the process is swapped out? Best answer I can give for this is that I don't. I have no tools that can tell me what processes are represented by the number that is in the w column in vmstat. I do know that once swap is hit this number will grow and stay non-zero until the kernel is rebooted. I am looking for tools/methods to analyze the processes in swap, determine if indeed they are unneeded, and remove them, since they will never be called out of swap and are consuming disk space.

The w column in vmstat is a count of the number of threads swapped out. A swapped out thread will not have 0x1 set in t_schedflag. You can find the process(es) the threads belong to by:

    # mdb -k
    ::walk thread t | ::print kthread_t t_schedflag | ::grep (.&1)==0 | ::eval <t=K | ::print kthread_t t_procp | ::print proc_t p_user.u_psargs

(Starting at ::walk thread t is all one line.)

max
Re: [osol-discuss] swap purge
Mike DeMarco wrote:

How do I know that the process is swapped out? Best answer I can give for this is that I don't. I have no tools that can tell me what processes are represented by the number that is in the w column in vmstat. I do know that once swap is hit this number will grow and stay non-zero until the kernel is rebooted. I am looking for tools/methods to analyze the processes in swap, determine if indeed they are unneeded, and remove them, since they will never be called out of swap and are consuming disk space.

A little follow up from my box.

    # vmstat 2
     kthr      memory            page            disk          faults      cpu
     r b w   swap  free  re  mf pi po fr de sr cd s0 s1 --   in   sy   cs us sy id
     1 0 1 1551744 875932 51 488 6 285 349 0 17112 25 -0 11 0 935 160430 1503 11 9 80
     0 0 11 1693076 1221836 0 117 213 0 0 0 0 74 0 0 0 540 433 498 0 1 99
     0 0 11 1692408 1220544 0 89 307 0 0 0 0 89 0 0 0 586 314 555 1 0 99

So, 11 threads swapped out. Here is a list.

    # mdb -k
    Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp scsi_vhci zfs usba sockfs ip hook neti sctp arp uhci sd fctl lofs audiosup fcip random cpc crypto logindmux ptm ufs sppp ipc ]
    ::walk thread t | ::print kthread_t t_schedflag | ::grep (.&1)==0 | ::eval <t=K | ::print kthread_t t_procp | ::print proc_t p_user.u_psargs
    p_user.u_psargs = [ /usr/lib/inet/in.ndpd ]
    p_user.u_psargs = [ /usr/sbin/avahi-daemon-bridge-dsd -D ]
    p_user.u_psargs = [ /usr/openwin/bin/fbconsole -n -d :0 ]
    p_user.u_psargs = [ /usr/lib/thunderbird/thunderbird-bin ]
    p_user.u_psargs = [ /usr/lib/thunderbird/thunderbird-bin ]
    p_user.u_psargs = [ /usr/lib/thunderbird/thunderbird-bin ]
    p_user.u_psargs = [ /usr/lib/thunderbird/thunderbird-bin ]
    p_user.u_psargs = [ gnome-terminal ]
    p_user.u_psargs = [ gnome-terminal ]
    p_user.u_psargs = [ /usr/sbin/gdm-binary ]
    p_user.u_psargs = [ /usr/lib/rmvolmgr -s ]

max
Re: [osol-discuss] swap purge
Mike DeMarco wrote:

    mdb -k
    Loading modules: [ unix genunix specfs dtrace ufs sd mpt px md ip hook neti sctp arp nca fcp fctl emlxs cpc random crypto zfs wrsmd fcip ssd logindmux ptm sppp nfs lofs ipc ]
    ::walk thread t| ::print kthread_t t_schedflag | ::grep (.&1)==0 | ::eval

After ::eval, on the same line, you need:

    <t=K | ::print kthread_t t_procp | ::print proc_t p_user.u_psargs

    Usage: eval command
    ::walk thread t| ::print kthread_t t_schedflag | ::grep (.&1)==0
    4000
    4000
    4000

And 0x4000 is, according to /usr/include/sys/thread.h:

    #define TS_RUNQMATCH 0x4000 /* exact run queue balancing by setbackdq() */

In particular, the TS_LOAD flag (0x1) is not set, indicating the thread is not in memory (i.e., swapped out).

If you want the pid instead of the arguments, you can use

    p_pidp | ::print struct pid pid_id

instead of p_user.u_psargs.

max
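The bit test that the ::grep stage of the pipeline applies to each t_schedflag value can be sketched in C. This is only an illustration under stated assumptions: the two flag values are the ones quoted in the message above, and the helper name thread_is_swapped_out is made up for this example; it is not a kernel or mdb interface.

```c
/* Flag values as quoted in the message above (from <sys/thread.h>). */
#define TS_LOAD      0x0001  /* thread is in memory */
#define TS_RUNQMATCH 0x4000  /* exact run queue balancing by setbackdq() */

/*
 * Hypothetical helper for illustration only: returns 1 when TS_LOAD is
 * clear in a t_schedflag value (the thread is swapped out), else 0.
 */
int
thread_is_swapped_out(unsigned int schedflag)
{
        return ((schedflag & TS_LOAD) == 0);
}
```

With the value from Mike's output, thread_is_swapped_out(0x4000) returns 1, because only TS_RUNQMATCH is set. For a value with the low bit set, such as 0x4001, it returns 0.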
Re: [osol-discuss] swap purge
Mike DeMarco wrote:

    ::walk thread t | ::print kthread_t t_schedflag | ::grep (.&1)==0 | ::eval <t=K | ::print kthread_t t_procp | ::print proc_t p_user.u_psargs
    p_user.u_psargs = [ /lib/svc/bin/svc.configd ]
    p_user.u_psargs = [ /lib/svc/bin/svc.startd ]
    p_user.u_psargs = [ devfsadmd ]
    p_user.u_psargs = [ devfsadmd ]
    p_user.u_psargs = [ devfsadmd ]
    p_user.u_psargs = [ devfsadmd ]
    p_user.u_psargs = [ /usr/lib/sysevent/syseventd ]
    p_user.u_psargs = [ /usr/lib/sysevent/syseventd ]
    p_user.u_psargs = [ /usr/lib/sysevent/syseventd ]
    p_user.u_psargs = [ /usr/lib/sysevent/syseventd ]
    p_user.u_psargs = [ /usr/lib/sysevent/syseventd ]
    p_user.u_psargs = [ /usr/sbin/nscd ]
    p_user.u_psargs = [ /usr/lib/crypto/kcfd ]
    p_user.u_psargs = [ /usr/platform/SUNW,SPARC-Enterprise/lib/sparcv9/oplhpd ]
    p_user.u_psargs = [ /usr/platform/SUNW,SPARC-Enterprise/lib/sparcv9/oplhpd ]
    p_user.u_psargs = [ /usr/lib/picl/picld ]
    p_user.u_psargs = [ /usr/lib/picl/picld ]
    p_user.u_psargs = [ /usr/sbin/rpcbind ]
    p_user.u_psargs = [ /usr/lib/nfs/statd ]
    p_user.u_psargs = [ /usr/lib/nfs/statd ]
    p_user.u_psargs = [ /usr/lib/nfs/nfsmapid ]
    p_user.u_psargs = [ /usr/openwin/bin/rpc.ttdbserverd ]
    p_user.u_psargs = [ /bin/sh /lib/svc/method/rpc-ttdbserverd ]
    p_user.u_psargs = [ /bin/sh /lib/svc/method/rpc-cmsd ]
    p_user.u_psargs = [ /usr/sadm/lib/smc/bin/smcboot ]
    p_user.u_psargs = [ /usr/sadm/lib/smc/bin/smcboot ]
    p_user.u_psargs = [ /usr/sadm/lib/smc/bin/smcboot ]
    p_user.u_psargs = [ /usr/lib/dcs -l ]
    p_user.u_psargs = [ /usr/lib/fm/fmd/fmd ]
    p_user.u_psargs = [ /usr/lib/fm/fmd/fmd ]
    p_user.u_psargs = [ /usr/lib/fm/fmd/fmd ]
    p_user.u_psargs = [ /usr/lib/fm/fmd/fmd ]
    p_user.u_psargs = [ /usr/lib/fm/fmd/fmd ]
    p_user.u_psargs = [ /lib/svc/bin/svc.configd ]
    p_user.u_psargs = [ /lib/svc/bin/svc.startd ]
    p_user.u_psargs = [ /usr/lib/crypto/kcfd ]
    p_user.u_psargs = [ /usr/sbin/rpcbind ]
    p_user.u_psargs = [ /usr/lib/nfs/statd ]
    p_user.u_psargs = [ /usr/lib/nfs/lockd ]
    p_user.u_psargs = [ /usr/lib/nfs/nfs4cbd ]
    p_user.u_psargs = [ /usr/lib/nfs/nfsmapid ]
    p_user.u_psargs = [ /usr/openwin/bin/rpc.ttdbserverd ]
    p_user.u_psargs = [ /bin/sh /lib/svc/method/rpc-ttdbserverd ]
    p_user.u_psargs = [ /bin/sh /lib/svc/method/rpc-cmsd ]
    p_user.u_psargs = [ /usr/sbin/nscd ]
    p_user.u_psargs = [ /usr/sadm/lib/smc/bin/smcboot ]
    p_user.u_psargs = [ /usr/sadm/lib/smc/bin/smcboot ]
    p_user.u_psargs = [ /usr/sadm/lib/smc/bin/smcboot ]
    p_user.u_psargs = [ /usr/dt/bin/dtlogin -daemon ]
    p_user.u_psargs = [ /usr/java/bin/java -server -Xmx128m -XX:+UseParallelGC -XX:ParallelGCThreads=4 ]
    p_user.u_psargs = [ /usr/java/bin/java -server -Xmx128m -XX:+UseParallelGC -XX:ParallelGCThreads=4 ]
    p_user.u_psargs = [ /usr/java/bin/java -server -Xmx128m -XX:+UseParallelGC -XX:ParallelGCThreads=4 ]
    p_user.u_psargs = [ /usr/java/bin/java -server -Xmx128m -XX:+UseParallelGC -XX:ParallelGCThreads=4 ]
    p_user.u_psargs = [ /usr/lib/saf/ttymon -g -d /dev/console -l console -T vt100 -m ldterm,ttcompat - ]
    p_user.u_psargs = [ /usr/lib/dmi/dmispd ]
    p_user.u_psargs = [ /lib/svc/bin/svc.startd ]
    p_user.u_psargs = [ /lib/svc/bin/svc.startd ]
    p_user.u_psargs = [ /lib/svc/bin/svc.startd ]
    p_user.u_psargs = [ /usr/bin/pfksh ]
    p_user.u_psargs = [ /usr/bin/pfksh ]
    p_user.u_psargs = [ /usr/lib/saf/ttymon -g -d /dev/console -l console -m ldterm,ttcompat -h -p seie ]
    p_user.u_psargs = [ /usr/lib/sendmail -bd -q15m ]
    p_user.u_psargs = [ /usr/java/bin/java -server -Xmx128m -XX:+UseParallelGC -XX:ParallelGCThreads=4 ]
    p_user.u_psargs = [ /usr/java/bin/java -server -Xmx128m -XX:+UseParallelGC -XX:ParallelGCThreads=4 ]
    p_user.u_psargs = [ /usr/java/bin/java -server -Xmx128m -XX:+UseParallelGC -XX:ParallelGCThreads=4 ]
    p_user.u_psargs = [ /usr/java/bin/java -server -Xmx128m -XX:+UseParallelGC -XX:ParallelGCThreads=4 ]
    p_user.u_psargs = [ /lib/svc/bin/svc.startd ]
    p_user.u_psargs = [ /lib/svc/bin/svc.startd ]
    p_user.u_psargs = [ /lib/svc/bin/svc.startd ]

So is this telling me that a thread of rpcbind is still swapped out?

Correct. As well as thread(s) from svc.configd, svc.startd, devfsadmd, syseventd, etc.

max
Re: [osol-discuss] swap purge
Hi Mike, The only book I can recommend is Solaris Modular Debugger Guide, at http://docs.sun.com/app/docs/doc/816-5041?l=en If you look around a little, you'll probably find some good blog entries. Eric Schrock wrote a nice little puzzle a while ago (see http://blogs.sun.com/eschrock/entry/mdb_puzzle). I have also done a little with mdb, and blogged a bit about it at mbruning.blogspot.com. The best way to learn it is by experience. I recommend learning the data structures (header files work well for this), and trying to see the data structures via mdb. Of course, I think a very good way to come up to speed with it is to take a course. I would start with a Solaris Internals course, then take a course on kernel crash analysis and debugging. (But then, I _would_ recommend this, as I teach both.) max Mike DeMarco wrote: Thanks Max: This helps. Can you recommend good documentation/books for mdb? ___ opensolaris-discuss mailing list opensolaris-discuss@opensolaris.org
Re: [osol-discuss] Question about (Open)Solaris VM - why is swap required?
Hi, Mike Gerdts wrote: On Sat, Apr 25, 2009 at 2:01 AM, Shawn Walker swal...@opensolaris.org wrote: Anon Y Mous wrote: as we tested scaling and put more load on the servers (i.e. allowing Apache to spawn more children), we were surprised to see that the system doesn't tend to go below 8Gb (half) of the available RAM. Further requests for memory allocation failed (can't fork new processes including new sshd, vmstat, top and bash). This may also be helpful: http://blogs.sun.com/jimlaurent/entry/solaris_faq_myths_and_facts The first comment in this blog entry probably reflects the behavior that you are seeing. In comparison, Linux will allow overallocation of memory. If things try to touch more memory+swap space than the system has, Linux will invoke its out of memory killer. See, for example, http://lwn.net/Articles/104179/. This behavior can be turned off on Linux, but I believe that it is enabled by default on most distros. You might also want to look at output of swap -sh I assume you are getting free memory numbers from vmstat? Try running the following program, then look at vmstat and swap -sh output. If the amount of reservable swap space as reported by swap -sh is less than the space needed during a fork, the call will fail. Even though there is plenty of free memory. 
Here's the program:

    #include <unistd.h>

    char a[1024*1024*1024];   /* essentially, a 1GB heap */

    int main(void) { pause(); return 0; }

    # swap -lh
    swapfile                  dev    swaplo   blocks     free
    /dev/zvol/dsk/rpool/swap 182,2       4K    1023M    1023M
    # swap -sh
    total: 876M allocated + 286M reserved = 1.1G used, 1.1G available
    # vmstat 2 2
     kthr      memory            page            disk          faults      cpu
     r b w   swap  free  re  mf pi po fr de sr cd lf s0 s2   in   sy   cs us sy id
     0 0 0 1237452 538672 13 61 0 0 0 0 17 3 17 -0 8 480 3489 905 3 1 96
     0 0 0 1165692 443224 6 22 0 0 0 0 0 0 0 0 0 432 2013 726 5 1 94
    # ./foo &
    [1] 1333
    # swap -lh
    swapfile                  dev    swaplo   blocks     free
    /dev/zvol/dsk/rpool/swap 182,2       4K    1023M    1023M   <-- no swap space used on device
    # swap -sh
    total: 876M allocated + 1.3G reserved = 2.1G used, 114M available   <-- only 114MB available
    # vmstat 2 2
     kthr      memory            page            disk          faults      cpu
     r b w   swap  free  re  mf pi po fr de sr cd lf s0 s2   in   sy   cs us sy id
     0 0 0 1236672 538584 13 61 0 0 0 0 17 3 17 -0 8 480 3488 905 3 1 96
     0 0 0 116968 443116 9 22 0 0 0 0 0 0 0 0 0 434 2258 725 4 1 95   <-- memory usage has not significantly changed
    # ./foo &
    [2] 1337
    #
    [2]+ Killed ./foo   <-- here, exec probably failed, not fork, due to insufficient reservable space
    #

max
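The numbers in the session above can be checked with a little arithmetic. The sketch below is only an illustration: it takes the two vmstat swap values (in KB) from the second sample of each run and compares the drop with the size of the 1GB array. The macro and function names are invented for this example.

```c
/* Size of the array in the test program above, in KB (1 GB). */
#define ARRAY_KB (1024L * 1024L)

/* Hypothetical helper: KB of available swap consumed between two samples. */
long
swap_drop_kb(long before_kb, long after_kb)
{
        return (before_kb - after_kb);
}

/* Hypothetical helper: 1 if a reservation of request_kb still fits, else 0. */
int
reservation_fits(long available_kb, long request_kb)
{
        return (request_kb <= available_kb);
}
```

swap_drop_kb(1165692, 116968) gives 1048724 KB, which is 148 KB more than the 1048576 KB array, so starting ./foo reserved essentially the whole array even though free memory barely moved. reservation_fits(116968, ARRAY_KB) is 0, which is consistent with the second ./foo being killed.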
[osol-discuss] problems sending to opensolaris-discuss
Hi, Is anyone else having problems posting to osol-discuss? I notice that today there have been only 10 messages (thus far). I posted a couple of messages about 5 hours ago that are still not showing up. Let's see if this one makes it... thank you. max ___ opensolaris-discuss mailing list opensolaris-discuss@opensolaris.org
[osol-discuss] Free kernel internals training from Bruning Systems
Bruning Systems and OSUNIX are hosting a free one day training course on kernel internals for developers. Limited positions are available. Learn more at http://sl.osunix.org/FreeKernelTrainingDay thanks, max ___ opensolaris-discuss mailing list opensolaris-discuss@opensolaris.org
[osol-discuss] OpenSolaris Internals course description
Hi All, A course description for the OpenSolaris Internals course to be held in Warsaw, Poland May 4-8 is available at http://www.bruningsystems.com/page14/page13/page13.html. Any comments/suggestions are appreciated. thanks, max ___ opensolaris-discuss mailing list opensolaris-discuss@opensolaris.org
[osol-discuss] OpenSolaris Internals course announcement
Hi All, Systemics Poland (http://www.systemics.pl) and Bruning Systems (http://www.bruningsystems.com) are holding an OpenSolaris Internals class on May 4-8 at the Systemics site in Warsaw. The course will be taught in English. For pricing and availability, please contact Magdalena Sternik magdalena.ster...@systemics.pl. For detailed information on the course, please contact me at m...@bruningsystems.com. Please note that the course will be held subject to the number of people who enroll. Thanks, max ___ opensolaris-discuss mailing list opensolaris-discuss@opensolaris.org
Re: [osol-discuss] In what kinds of situations will read(fd, buf, cnt) return a negative number?
    man -s2 read

Read the sections on Return Values and Errors.

max

Ian mao wrote:

Need the detailed situations. And is there any tool to trigger the read function failure? Thanks!
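One simple way to see a failing read() without any special tool is to call it on a file descriptor that is not open. The sketch below does that and hands back the errno value; the function name errno_from_bad_read is made up for this example, and EBADF is only one of the errors listed in the man page.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Calls read() on a file descriptor that is not open and returns the
 * errno value that read() set, or 0 if read() unexpectedly succeeded.
 */
int
errno_from_bad_read(void)
{
        char buf[16];
        ssize_t n;

        errno = 0;
        n = read(-1, buf, sizeof (buf));   /* -1 is never a valid descriptor */
        return (n == -1 ? errno : 0);
}
```

Here read() returns -1 and sets errno to EBADF. A caller would normally check for the -1 return first and only then look at errno, for example to print strerror(errno).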
Re: [osol-discuss] regarding ptrace equivalent in Solaris
casper@sun.com wrote:

hi, I'm using the 5.11 kernel version on amd64 architecture, 32-bit. I need help on the following issues

Forget about ptrace; use /proc. As a Solaris process has multiple threads of control, you can't change the registers using ptrace.

You get the registers by reading the lwpstatus file: /proc/<pid>/lwp/<lwp>/lwpstatus.

You can use the PCSREG command to change the registers; a command is written, together with its argument, to /proc/<pid>/lwp/<lwp>/lwpctl or to /proc/<pid>/ctl.

You might also take a look at man -s 4 proc.

max
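To make the path layout concrete, here is a small sketch that builds the name of a per-LWP file under /proc. It only formats the string; it does not open the file or issue PCSREG, and the helper name lwp_file_path is hypothetical. The structures you would then read or write are described in proc(4).

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper: builds "/proc/<pid>/lwp/<lwpid>/<name>" in buf,
 * where name is, for example, "lwpstatus" or "lwpctl".
 * Returns the length of the path, or -1 if buf is too small.
 */
int
lwp_file_path(char *buf, size_t len, pid_t pid, int lwpid, const char *name)
{
        int n = snprintf(buf, len, "/proc/%ld/lwp/%d/%s",
            (long)pid, lwpid, name);

        if (n < 0 || (size_t)n >= len)
                return (-1);
        return (n);
}
```

For pid 1234 and LWP 1, lwp_file_path(buf, sizeof (buf), 1234, 1, "lwpstatus") fills buf with /proc/1234/lwp/1/lwpstatus, and passing "lwpctl" instead gives the control file that commands are written to.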
[osol-discuss] opensolaris.org not responding
Hi, I would post this to website-discuss, except that I am not on that mailing list. To get on that mailing list, I need to get to opensolaris.org. www.opensolaris.org has timed out every time I have tried it today. I am having no problems with any other sites. thanks, max ___ opensolaris-discuss mailing list opensolaris-discuss@opensolaris.org
Re: [osol-discuss] diagnosing server hang
Hi Matt,

Maybe this should be moved to mdb-discuss... My comments are at the end.

Matt Harrison wrote: Matt Harrison wrote: m...@bruningsystems.com wrote:

Hi Matt,

Matt Harrison wrote: Matt Harrison wrote:

thanks Ian, I'll look into this tomorrow.

Well I'm not sure if it's good news or not. I've got the machine running memtest86+ with the standard tests and so far it's done 2 passes (3 hours runtime) without a single error. I'm going to leave it running overnight but does it seem there could be another problem other than memory?

Have you gotten anywhere yet with this hang? Have you tried set snooping=1 in /etc/system? How about booting with kmdb and forcing a dump? I'm not sure why this is necessarily hardware related...

max

Unfortunately not yet... I was forced to bring the server back up to get some files from it, and I haven't had a chance to take it down again yet. I still need to make sure it can survive 24h solid of memtest, but I am happy to try other things. I'm not familiar with the snooping variable, nor with kmdb, although I have read about it being used here and there. I'll go ahead with the memtest when I can and report back.

Ok, sorry it's taken a while but I've had the server run memtest for 24 hours and it hasn't found any errors whatsoever. Does anyone have an idea as to what I could try next?

I would try booting with kmdb (or, alternatively, load kmdb once the machine is up but before it is hung). You can do this from a command line console login (no graphics) by running:

    # mdb -K    <-- this will load kmdb and drop into it
    :c          <-- this will continue

If you must have a windowing system to reproduce the hang, you can still use kmdb, but, unless you can redirect console input/output from/to a serial port, you won't be able to see what you are doing. But it is ok. You type:

    # mdb -K -F    <-- again, loads kmdb and drops into it

The machine will appear hung. Now, carefully, with no typos:

    :c    <-- and enter (that's colon, c, enter: 3 key strokes)

The machine should come back (unless you have a typo). Now, do whatever you are doing that causes the machine to hang. When the machine is hung, type F1-a (that is, function key F1 and a together). Unless the machine is hard hung, this will put you into kmdb. Again, you won't be able to see what is happening if your console is on a windowing system. Then type (again, no typos):

    $<systemdump    <-- this will give you a panic dump and reboot

If the above doesn't work, you either made typos, or your machine is hard hung. If it is hard hung, add this line to /etc/system (of course, you'll have to bounce the machine to get it back up to do this):

    set snooping=1

Then reboot. This sets a deadman timer. Again, do your thing to cause the hang. If the scheduling clock does not run for (by default) 50 seconds, the machine will panic, giving you a dump. If neither of these works, it implies that the real time clock is blocked out. This is highly unlikely, but can occur.

Once you have the dump, report back...

max
Re: [osol-discuss] diagnosing server hang
Hi Matt, Matt Harrison wrote: Matt Harrison wrote: thanks Ian, I'll look into this tomorrow. Well I'm not sure if it's good news or not. I've got the machine running memtest86+ with the standard tests and so far it's done 2 passes (3 hours runtime) without a single error. I'm going to leave it running overnight but does it seem there could be another problem other than memory? Have you gotten anywhere yet with this hang? Have you tried set snooping=1 in /etc/system? How about booting with kmdb and forcing a dump? I'm not sure why this is necessarily hardware related... max ___ opensolaris-discuss mailing list opensolaris-discuss@opensolaris.org
Re: [osol-discuss] migrating issues from SXCE to 2008.11
Hi Matt, Matt Harrison wrote: Hi all, I've got a filer using SXCE installed and I'm thinking about migrating to OpenSolaris due mainly to the improved pkg command. I've tried testing it on VMware (6.0.0 build-45731) but there is an immediate problem. The install went fine but on reboot it appears to fail to start X, dumping me back to a console login with some garbage characters printed over it. It tries to start X 3 times I think, each time coming back to the messed up prompt. I tried to login on the console to check the X logs but it just spams the screen with more garbage. Attached is a screenshot of the console after a login attempt. I have installed the previous release of OSOL before on the same version of VMware, and that didn't have this problem. Any ideas welcomed before I abandon it for SXCE again :) I see this a few times right now (I am developing an Xinput module). If you can telnet/ssh in, take a look at /var/log/Xorg.0.log. You should also try killing Xorg. I suspect it is running based on the screen snapshot you have attached. max Many thanks Matt Harrison ___ opensolaris-discuss mailing list opensolaris-discuss@opensolaris.org ___ opensolaris-discuss mailing list opensolaris-discuss@opensolaris.org
Re: [osol-discuss] migrating issues from SXCE to 2008.11
Hi Matt, Matt Harrison wrote: m...@bruningsystems.com wrote: Hi Matt, Matt Harrison wrote: Hi all, I've got a filer using SXCE installed and I'm thinking about migrating to OpenSolaris due mainly to the improved pkg command. I've tried testing it on VMware (6.0.0 build-45731) but there is an immediate problem. The install went fine but on reboot it appears to fail to start X, dumping me back to a console login with some garbage characters printed over it. It tries to start X 3 times I think, each time coming back to the messed up prompt. I tried to login on the console to check the X logs but it just spams the screen with more garbage. Attached is a screenshot of the console after a login attempt. I have installed the previous release of OSOL before on the same version of VMware, and that didn't have this problem. Any ideas welcomed before I abandon it for SXCE again :) I see this a few times right now (I am developing an Xinput module). If you can telnet/ssh in, take a look at /var/log/Xorg.0.log. You should also try killing Xorg. I suspect it is running based on the screen snapshot you have attached. max Thanks for the reply Max, I've managed to ssh in and grepped /var/log/Xorg.0.log for EE: The two lines that show up are (EE) Unable to locate/open config file and (EE) AIGLX: Screen 0 is not DRI capable So presumably I need to fiddle a bit with the X config, unfortunately thats a job I always hate and haven't done in ages :P One thing I liked about SXCE was the way X seems to work better than on my linux boxes ;) I'll have a fiddle and see what I can come up with. You might also want to look at ~/.xsession-errors max Thanks Matt ___ opensolaris-discuss mailing list opensolaris-discuss@opensolaris.org
Re: [osol-discuss] Linux/FreeBSD/(Open)Solaris survey paper
Hi Philip, I also wrote an article that is a lot more detailed, but aimed at application developers, not kernel people, per se. It can be found at: http://developers.sun.com/solaris/articles/solaris_linux_app.html Have fun! max Philip Torchinsky wrote: Stefan, yes, this is close to what I am looking for, but I try to read this deeply. Thank you for the link, I lost it a year ago! If anybody has more links to point me on, please, let me know! Philip Stefan Varga: I can find data spread across many books and papers (like Solaris Internals for Solaris and some unknown to me yet for Linux and FreeBSD) but I believe there can be a paper with all the info gathered and analyzed already. Can you advise me one? this one ? http://www.softpanorama.org/Articles/solaris_vs_linux.shtml Stefan ___ opensolaris-discuss mailing list opensolaris-discuss@opensolaris.org ___ opensolaris-discuss mailing list opensolaris-discuss@opensolaris.org