Re: USB printing panic

2011-03-12 Thread Bill Green

On 2/10/2011 1:33 AM, Martin Husemann wrote:
> Can you try something like this?
>
> Martin

Hi Martin,

Apologies for the long-delayed response.  I'm using 5.1; after making 
the change in your patch I still see the panic.  For now, I've added a
test to ulpt_tick for sc->sc_in_xfer == NULL, which simply returns if
so.  This seems naive, but it solves the panic issue, although it also
seems that occasionally communication with the printer is terminated 
early as well (meaning I have to send jobs multiple times).  That could 
well be another issue with my CUPS setup, though.


Bill



Re: USB printing panic

2011-02-10 Thread Martin Husemann
Can you try something like this?

Martin
Index: ulpt.c
===================================================================
RCS file: /cvsroot/src/sys/dev/usb/ulpt.c,v
retrieving revision 1.85
diff -u -p -r1.85 ulpt.c
--- ulpt.c  3 Nov 2010 22:34:24 -   1.85
+++ ulpt.c  10 Feb 2011 09:33:25 -
@@ -656,7 +656,7 @@ ulptclose(dev_t dev, int flag, int mode,
 
 	if (sc->sc_has_callout) {
 		DPRINTFN(2, ("ulptclose: stopping read callout\n"));
-		callout_stop(&sc->sc_read_callout);
+		callout_halt(&sc->sc_read_callout, NULL);
 		sc->sc_has_callout = 0;
 	}
 


Re: USB printing panic

2011-02-09 Thread Bill Green

Hello Eduardo,

All of the below should be taken with the caveat that I'm an amateur at 
best and I've never looked at the NetBSD kernel (or that of any other 
OS, for that matter) before.  So it's likely to be completely wrong, or 
worse.


On 2/9/2011 4:08 PM, Eduardo Horvath wrote:

> On Wed, 9 Feb 2011, Bill Green wrote:
>
>> cpu0: data fault: pc=14becc8 addr=0
>> kernel trap 30: data access exception
>> Stopped in pid 0.5 (system) at  netbsd:usbd_setup_xfer+0x8: ldub [%o0 + 0x70], %g3
>
> This one is definitely a NULL pointer dereference in the kernel, probably
> in usbd_setup_xfer.

In usbdi.c there are several functions (usbd_setup_xfer, usbd_transfer, 
others) which take pointers to structures and don't check whether they 
are null before using them.


In ulpt.c, ulpt_tick calls usbd_setup_xfer and usbd_transfer, passing 
them a usb_xfer_handle contained in the struct ulpt_softc it gets a 
pointer to as argument.


The following appears to be happening in my case: after rastertoqpdl
crashes, the USB transfer is never finished (from the perspective of the
printer, which will eventually print a sheet with a timeout error).
ulptclose is called, which sets sc->sc_out_xfer (the handle that eventually
gets passed to usbd_setup_xfer and friends) to NULL, but leaves sc itself
(the struct ulpt_softc that ulpt_tick uses) in place.


ulpt_tick sometimes (I haven't found where) gets called after ulptclose, 
and only checks whether sc is null, and NOT sc->sc_out_xfer.


I've added a test to ulpt_tick to check if sc->sc_out_xfer is null, and 
haven't been able to panic the system since.  But I'm not sure whether 
anything else is making calls to the usbd_* functions with similar 
possible problems, or what the best way to fix this would be.  Perhaps 
one could set the struct ulpt_softc itself to NULL in ulptclose, if 
other functions in ulpt.c follow the same assumptions? But, as I 
mentioned, there seem to be a lot of functions in usbdi.c that assume 
they are getting usable pointers, and these functions get used in a lot 
of other drivers besides the ulpt code.




>> panic: kernel fault
>> Stopped in pid 0.5 (system) at  netbsd:cpu_Debugger+0x4:nop
>> db>  bt
>> data_access_fault(b5cbaa0, 30, 1476388, 0, 70, 400) at
>
> Definitely a kernel problem but don't know the specifics.  You need to
> dump the trapframe.


I think this is the same bug detailed above. I'm not exactly sure what 
you mean by needing to dump the trapframe unless it is what I've 
provided below.


#0  dumpsys () at ../../../../arch/sparc64/sparc64/machdep.c:755
#1  0x014abeb8 in cpu_reboot (howto=256, user_boot_string=0x0)
    at ../../../../arch/sparc64/sparc64/machdep.c:623
#2  0x010c7a28 in db_sync_cmd (addr=190633464, have_addr=false, count=-1, modif=0xb5cd4d8 "")
    at ../../../../ddb/db_command.c:1304
#3  0x010c821c in db_command (last_cmdp=0x180f678) at ../../../../ddb/db_command.c:926
#4  0x010c8514 in db_command_loop () at ../../../../ddb/db_command.c:583
#5  0x010cbc90 in db_trap (type=, code=0) at ../../../../ddb/db_trap.c:101
#6  0x014bbc6c in kdb_trap (type=48, tf=0xb5cd9e0) at ../../../../arch/sparc64/sparc64/db_interface.c:498
#7  0x014b8604 in data_access_fault (tf=0xb5cd9e0, type=48, pc=21757420, addr=0, sfva=0, sfsr=8390665)
    at ../../../../arch/sparc64/sparc64/trap.c:1200
#8  0x01008b24 in Ldatafault_internal ()
#9  0x01008b24 in Ldatafault_internal ()
Previous frame identical to this frame (corrupt stack?)

(gdb) bt full
[...]
#7  0x014b8604 in data_access_fault (tf=0xb5cd9e0, type=48, pc=21757420, addr=0, sfva=0, sfsr=8390665)
    at ../../../../arch/sparc64/sparc64/trap.c:1200
l = (struct lwp *) 0xb30ef80
p = (struct proc *) 0x1823c98
vm = (struct vmspace *) 0xe0018000
va = 0
rv = 0
access_type = 1
onfault = 0
sticks = 128
ksi = {ksi_flags = 1, ksi_list = {cqe_next = 0xb5cd131, 
cqe_prev = 0x108360c}, ksi_info = {_signo = 0,

---Type <return> to continue, or q <return> to quit---
_code = 16352, _errno = 510, _pad = 33555456, _reason = {_rt = 
{_pid = 0, _uid = 16360, _value = {sival_int = 0,
  sival_ptr = 0x0}}, _child = {_pid = 0, _uid = 16360, _status 
= 0, _utime = 0, _stime = 0}, _fault = {
_addr = 0x3fe8, _trap = 0}, _poll = {_band = 16360, _fd = 0}}}, 
ksi_lid = 0}

lastdouble = 0
[...]

(gdb) frame 7
#7  0x014b8604 in data_access_fault (tf=0xb5cd9e0, type=48, pc=21757420, addr=0, sfva=0, sfsr=8390665)
    at ../../../../arch/sparc64/sparc64/trap.c:1200
1200            DEBUGGER(type, tf);
(gdb) print *tf
$1 = {tf_tstate = 17666409988, tf_pc = 21757420, tf_npc = 21757424, 
tf_fault = 0, tf_kstack = 0, tf_y = 0, tf_tt = 48,
  tf_pil = 0 '\0', tf_oldpil = 0 '\0', tf_global = {0, 4294967296, 
29442048, 0, 1, 29442048, 2504691800080896,
387520}, tf_out = {25980824, 5, 0, 25980824, 190635096, 0, 
190632721, 20291036}, tf_loc

Re: USB printing panic

2011-02-09 Thread Eduardo Horvath
On Wed, 9 Feb 2011, Bill Green wrote:

> I am running NetBSD 5.1 sparc64 on a Sun Ultra 5. A Samsung USB printer is
> connected to the system via an NEC-chipset PCI USB host.  Printing via CUPS
> using the SPLIX drivers (http://splix.sourceforge.net/) causes a kernel
> panic.  Printing via CUPS across the network (from hosts with
> their own drivers) works without problems.
> 
> As far as I can tell, a component of SPLIX (rastertoqpdl)
> crashes with SIGBUS, and this sometimes panics the kernel. I don't know what
> the bug in SPLIX is, either.
> 
> I'm attaching below a dmesg, and ddb backtraces from two panics; the second,

[...]
> panic: kernel fault
> Stopped in pid 0.5 (system) at  netbsd:cpu_Debugger+0x4:nop
> db> bt
> data_access_fault(b5cbaa0, 30, 1476388, 0, 70, 400) at

Definitely a kernel problem but don't know the specifics.  You need to 
dump the trapframe.


> cpu0: data fault: pc=14becc8 addr=0
> kernel trap 30: data access exception
> Stopped in pid 0.5 (system) at  netbsd:usbd_setup_xfer+0x8: ldub [%o0 + 0x70], %g3

This one is definitely a NULL pointer dereference in the kernel, probably 
in usbd_setup_xfer.

Eduardo


USB printing panic

2011-02-09 Thread Bill Green

Hello,

I am running NetBSD 5.1 sparc64 on a Sun Ultra 5. A Samsung USB printer is
connected to the system via an NEC-chipset PCI USB host.  Printing via CUPS
using the SPLIX drivers (http://splix.sourceforge.net/) causes a kernel
panic.  Printing via CUPS across the network (from hosts with
their own drivers) works without problems.

As far as I can tell, a component of SPLIX (rastertoqpdl)
crashes with SIGBUS, and this sometimes panics the kernel. I don't know what
the bug in SPLIX is, either.

I'm attaching below a dmesg, and ddb backtraces from two panics; the second,
occurring when a DEBUG kernel was running, is preceded by kernel debugging
messages (and I also enabled the debug code in dev/usb/ulpt.c).  I also executed
a few "show" commands in ddb, although I don't entirely understand them, in the hope
that they might be useful.  I don't know how to track this any further.
I do have the core dump from the last crash and would be happy to help
however possible.

Boot messages:

Sun Ultra 5/10 UPA/PCI (UltraSPARC-IIi 270MHz), No Keyboard
OpenBoot 3.11, 256 MB memory installed, Serial #10467420.
Ethernet address 8:0:20:9f:b8:5c, Host ID: 809fb85c.



Initializing Memory |
ok boot debug
Boot device: /pci@1f,0/pci@1,1/ide@3/disk@0,0  File and args: debug
NetBSD IEEE 1275 Bootblock

NetBSD/sparc64 OpenFirmware Boot, Revision 1.13

=0x859bd8
Loading debug: 7579168+367512+483888 [519120+340115]=0x9a2130
Loaded initial symtab at 0x18cfdc8, strtab at 0x194f098, # entries 21604
consinit()
stdin node = f0061840
stdout package = f0061840
buffer @ 0x1c05ca0
console is /pci@1f,0/pci@1,1/ebus@1/se@14,40:a
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
2006, 2007, 2008, 2009, 2010
The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.

NetBSD 5.1 (GENERIC-DEBUG) #4: Thu Feb  3 16:59:01 PST 2011

b...@puddle.supposedly.org:/home/bill/netbsd-5-1-source/usr/src/sys/arch/sparc64/compile/GENERIC-DEBUG

total memory = 256 MB
avail memory = 238 MB
mainbus0 (root): SUNW,Ultra-5_10 (Sun Ultra 5/10 UPA/PCI): hostid 809fb85c
cpu0 at mainbus0: SUNW,UltraSPARC-IIi @ 270 MHz, UPA id 0
cpu0: 16K instruction (32 b/l), 16K data (32 b/l), 256K external (64 b/l)
psycho0 at mainbus0 addr 0xfffc4000
psycho0: SUNW,sabre: impl 0, version 0: ign 7c0 bus range 0 to 2; PCI bus 0
extent `psycho mem' (0x0 - 0x), flags = 0x0
 0x0 - 0x807f
extent `psycho io' (0x0 - 0xff), flags = 0x0
 0x0 - 0x47f
DVMA map: c000 to e000
IOTSB: 1045a000 to 104da000
memory range: 01ff 
pci0 at psycho0
ppb0 at pci0 dev 1 function 1: Sun Microsystems Simba PCI bridge (rev. 0x11)
pci1 at ppb0 bus 1
ebus0 at pci1 dev 1 function 0
ebus0: Sun Microsystems PCIO Ebus2, revision 0x01
auxio0 at ebus0 addr 726000-726003, 728000-728003, 72a000-72a003, 72c000-72c003, 72f000-72f003
power at ebus0 addr 724000-724003 ipl 37 not configured
SUNW,pll at ebus0 addr 504000-504002 not configured
sab0 at ebus0 addr 40-40007f ipl 43: rev 3.2
sabtty0 at sab0 port 0: console i/o
sabtty1 at sab0 port 1
com0 at ebus0 addr 3083f8-3083ff ipl 41: ns16550a, working fifo
kbd0 at com0
com1 at ebus0 addr 3062f8-3062ff ipl 42: ns16550a, working fifo
ms0 at com1
wsmouse0 at ms0 mux 0
lpt0 at ebus0 addr 3043bc-3043cb, 30015c-30015d, 70-7f ipl 34
fdthree at ebus0 addr 3023f0-3023f7, 706000-70600f, 72-720003 ipl 39 not configured
clock0 at ebus0 addr 0-1fff: mk48t59
flashprom at ebus0 addr 0-f not configured
audiocs0 at ebus0 addr 20-2000ff, 702000-70200f, 704000-70400f, 722000-722003 ipl 35 ipl 36: CS4231A
audio0 at audiocs0: full duplex, playback, capture
hme0 at pci1 dev 1 function 1: Sun Happy Meal Ethernet, rev. 1
hme0: interrupting at ivec 3021
hme0: Ethernet address 08:00:20:9f:b8:5c
nsphy0 at hme0 phy 1: DP83840 10/100 media interface, rev. 1
nsphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
machfb0 at pci1 dev 2 function 0: ATI Technologies 3D Rage I/II (rev. 0x9a)
machfb0: 16 MB aperture at 0xe100, 4 KB registers at 0x
machfb0: memctl 003210b3
machfb0: 2048 KB SGRAM 62.999 MHz, maximum RAMDAC clock 170 MHz
gen_cntl: 01000210
mach64_get_mode: 1152 5304 5432 1528 900 902 938 937
machfb0: initial resolution 1152x864 at 8 bpp
machfb0: attached to /dev/fb0
machfb0: initializing the DSP
wsdisplay1 at machfb0 kbdmux 1
cmdide0 at pci1 dev 3 function 0
cmdide0: CMD Technology PCI0646 (rev. 0x03)
cmdide0: primary channel configured to native-PCI mode
cmdide0: using ivec 1820 for native-PCI interrupt
atabus0 at cmdide0 channel 0
cmdide0: secondary channel configured to native-PCI mode
atabus1 at cmdide0 channel 1
ppb1 at pci0 dev 1 function 0: Sun Microsystems Simba PCI bridge (rev. 0x11)
pci2 at ppb1 bus 2
ohci0 at pci2 dev 2 function 0: NEC USB Host Controller (rev. 0x43)
ohci0: interrupting at ivec 14
ohci0: OHCI version 1.0
usb0 at