Re: pool related crashes, but "kernel did no panic"

2016-06-09 Thread Alexey Suslikov
On Tue, May 31, 2016 at 7:16 PM, Theo de Raadt  wrote:
>> is exactly 80 characters long (such a long printf violates the "80 chars"
>> rule, doesn't it?).
>
> there is no hard and fast rule for that at all; printing extra newlines
> has other downsides such as the screen scrolling sooner.

Hi. I finally have a trace with a pfsync-related panic. See here:

http://article.gmane.org/gmane.os.openbsd.bugs/23666



Re: pool related crashes, but "kernel did no panic"

2016-05-31 Thread Theo de Raadt
> is exactly 80 characters long (such a long printf violates the "80 chars"
> rule, doesn't it?).

there is no hard and fast rule for that at all; printing extra newlines
has other downsides such as the screen scrolling sooner.




Re: pool related crashes, but "kernel did no panic"

2016-05-31 Thread Alexey Suslikov
On Mon, May 30, 2016 at 9:02 PM, Ted Unangst  wrote:
> Alexey Suslikov wrote:
>> On Thu, May 12, 2016 at 4:14 PM, Bob Beck  wrote:
>> > Thank you! Now that's a bug report.
>>
>> Hi.
>>
>> Moved to 6.0-beta some time ago to make crash dumps more up
>> to date. Also, removed some services to minimize their impact.
>>
>> A fresh build against today's cvs didn't survive even half a day.
>>
>> http://article.gmane.org/gmane.os.openbsd.bugs/23593
>>
>> For me, it looks like: 5.7-5.8 - rare crashes, 5.9-6.0 - more frequent
>> crashes.
>>
>> Backtrace differs from crash to crash, but this remains the same:
>>
>> Stopped at  pool_put+0x1dd: xorq    0x8(%rax),%rcx
>>
>> Do you have any idea where I should look in the source code?
>
> sys/kern/subr_pool.c

Thanks for your replies, especially to Stefan, who noticed that the
"show pools" output was being truncated for some reason.

Here, kernel output is redirected to com, which is redirected to the kvm;
a browser running a java applet is connected to the kvm. This is how I
capture it.

amappl1: pool(0x81974640:amappl1): page inconsistency: page 0xff01e0

is exactly 80 characters long (such a long printf violates the "80 chars"
rule, doesn't it?).

Maybe there's a bug in kvm (java applet?) and output gets truncated.
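
One way to tell in advance whether a diagnostic would run past the console
width is to measure its formatted length before printing it. The following
is only a userland sketch, not kernel code: formatted_len(), fits_console()
and the CONSOLE_COLS value of 80 are names and assumptions made up for this
illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define CONSOLE_COLS 80		/* assumed width of the console */

/*
 * Return how many characters a printf-style message would occupy,
 * without printing it.  vsnprintf() with a zero-sized buffer returns
 * the length the full output would have had (C99).
 */
static int
formatted_len(const char *fmt, ...)
{
	va_list ap;
	int len;

	assert(fmt != NULL);
	va_start(ap, fmt);
	len = vsnprintf(NULL, 0, fmt, ap);
	va_end(ap);
	return len;
}

/* Nonzero if a line of `len` characters fits on one console row. */
static int
fits_console(int len)
{
	return len >= 0 && len <= CONSOLE_COLS;
}
```

For instance, fits_console(formatted_len("%s: ", label)) could be checked for
each piece of a message to decide where a newline is needed.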

Anyway, let's see, because now I run with the following:

Index: sys/kern/subr_pool.c
===================================================================
RCS file: /cvs/src/sys/kern/subr_pool.c,v
retrieving revision 1.194
diff -u -p -u -p -r1.194 subr_pool.c
--- sys/kern/subr_pool.c	15 Jan 2016 11:21:58 -	1.194
+++ sys/kern/subr_pool.c	31 May 2016 09:10:21 -
@@ -1160,7 +1160,8 @@ pool_chk_page(struct pool *pp, struct po
 	page = (caddr_t)((u_long)ph & pp->pr_pgmask);
 	if (page != ph->ph_page && POOL_INPGHDR(pp)) {
 		printf("%s: ", label);
-		printf("pool(%p:%s): page inconsistency: page %p; "
+		printf("pool(%p:%s):\n"
+		    "page inconsistency: page %p;\n"
 		    "at page head addr %p (p %p)\n",
 		    pp, pp->pr_wchan, ph->ph_page, ph, page);
 		return 1;
@@ -1172,9 +1173,10 @@ pool_chk_page(struct pool *pp, struct po
 		if ((caddr_t)pi < ph->ph_page ||
 		    (caddr_t)pi >= ph->ph_page + pp->pr_pgsize) {
 			printf("%s: ", label);
-			printf("pool(%p:%s): page inconsistency: page %p;"
-			    " item ordinal %d; addr %p\n", pp,
-			    pp->pr_wchan, ph->ph_page, n, pi);
+			printf("pool(%p:%s):\n"
+			    "page inconsistency: page %p;\n"
+			    "item ordinal %d; addr %p\n",
+			    pp, pp->pr_wchan, ph->ph_page, n, pi);
 			return (1);
 		}

@@ -1204,16 +1206,18 @@ pool_chk_page(struct pool *pp, struct po
 #endif /* DIAGNOSTIC */
 	}
 	if (n + ph->ph_nmissing != pp->pr_itemsperpage) {
-		printf("pool(%p:%s): page inconsistency: page %p;"
-		    " %d on list, %d missing, %d items per page\n", pp,
-		    pp->pr_wchan, ph->ph_page, n, ph->ph_nmissing,
+		printf("pool(%p:%s):\n"
+		    "page inconsistency: page %p;\n"
+		    "%d on list, %d missing, %d items per page\n",
+		    pp, pp->pr_wchan, ph->ph_page, n, ph->ph_nmissing,
 		    pp->pr_itemsperpage);
 		return 1;
 	}
 	if (expected >= 0 && n != expected) {
-		printf("pool(%p:%s): page inconsistency: page %p;"
-		    " %d on list, %d missing, %d expected\n", pp,
-		    pp->pr_wchan, ph->ph_page, n, ph->ph_nmissing,
+		printf("pool(%p:%s):\n"
+		    "page inconsistency: page %p;\n"
+		    "%d on list, %d missing, %d expected\n",
+		    pp, pp->pr_wchan, ph->ph_page, n, ph->ph_nmissing,
 		    expected);
 		return 1;
 	}
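
For readers who want to see what the three conditions in the patched
function test, here is a simplified userland model of them. It is a sketch
under assumptions, not the kernel's code: struct pool_model, struct page_hdr
and the helper names are invented for this illustration, and plain integers
stand in for the kernel pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the pool and page header structures. */
struct pool_model {
	uintptr_t pgmask;	/* clears the offset-within-page bits */
	size_t pgsize;		/* page size in bytes, a power of two */
	int itemsperpage;	/* items each page is divided into */
};

struct page_hdr {
	uintptr_t page;		/* page start recorded in the header */
	int nmissing;		/* items currently handed out */
};

/* Build the mask for a power-of-two page size. */
static uintptr_t
page_mask(size_t pgsize)
{
	assert(pgsize != 0 && (pgsize & (pgsize - 1)) == 0);
	return ~((uintptr_t)pgsize - 1);
}

/* A header stored inside its page must mask back to the recorded page. */
static int
hdr_page_ok(const struct pool_model *pp, const struct page_hdr *ph,
    uintptr_t hdr_addr)
{
	return (hdr_addr & pp->pgmask) == ph->page;
}

/* A free item must lie within [page, page + pgsize). */
static int
item_in_page(const struct pool_model *pp, const struct page_hdr *ph,
    uintptr_t item_addr)
{
	return item_addr >= ph->page && item_addr < ph->page + pp->pgsize;
}

/* Items on the free list plus items handed out must account for the page. */
static int
counts_ok(const struct pool_model *pp, const struct page_hdr *ph, int nfree)
{
	return nfree + ph->nmissing == pp->itemsperpage;
}
```

Any of the three helpers returning 0 corresponds to one of the "page
inconsistency" messages in the diff.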



Re: pool related crashes, but "kernel did no panic"

2016-05-30 Thread Ted Unangst
Alexey Suslikov wrote:
> On Thu, May 12, 2016 at 4:14 PM, Bob Beck  wrote:
> > Thank you! Now that's a bug report.
> 
> Hi.
> 
> Moved to 6.0-beta some time ago to make crash dumps more up
> to date. Also, removed some services to minimize their impact.
> 
> A fresh build against today's cvs didn't survive even half a day.
> 
> http://article.gmane.org/gmane.os.openbsd.bugs/23593
> 
> For me, it looks like: 5.7-5.8 - rare crashes, 5.9-6.0 - more frequent
> crashes.
> 
> Backtrace differs from crash to crash, but this remains the same:
> 
> Stopped at  pool_put+0x1dd: xorq    0x8(%rax),%rcx
> 
> Do you have any idea where I should look in the source code?

sys/kern/subr_pool.c



Re: pool related crashes, but "kernel did no panic"

2016-05-30 Thread Alexey Suslikov
On Thu, May 12, 2016 at 4:14 PM, Bob Beck  wrote:
> Thank you! Now that's a bug report.

Hi.

Moved to 6.0-beta some time ago to make crash dumps more up
to date. Also, removed some services to minimize their impact.

A fresh build against today's cvs didn't survive even half a day.

http://article.gmane.org/gmane.os.openbsd.bugs/23593

For me, it looks like: 5.7-5.8 - rare crashes, 5.9-6.0 - more frequent
crashes.

Backtrace differs from crash to crash, but this remains the same:

Stopped at  pool_put+0x1dd: xorq    0x8(%rax),%rcx

Do you have any idea where I should look in the source code?

Thanks.



Re: pool related crashes, but "kernel did no panic"

2016-05-13 Thread Alexey Suslikov
On Fri, May 13, 2016 at 3:59 AM, David Gwynne  wrote:
>
>> On 12 May 2016, at 20:28, Alexey Suslikov  wrote:
>>
>> On Wed, Apr 27, 2016 at 7:22 PM, Theo de Raadt wrote:
>>>> On 27/04/16(Wed) 15:45, Alexey Suslikov wrote:
>>>>> Theo de Raadt  cvs.openbsd.org> writes:
>>>>>
>>>>>>
>>>>>> Most of these bug reports completely stink.
>>>>>>
>>>>>> ALWAYS include *ALL* information in a report.
>>>>>
>>>>> In an idealistic world, yes.
>>>>
>>>> In an idealistic world there would be no bug.
>>>
>>> In an idealistic world, Alexey Suslikov wouldn't feel compelled to
>>> defend sloppiness.
>>
>> follow up is here
>>
>> http://marc.info/?l=openbsd-bugs=146304833425471=2
>> http://marc.info/?l=openbsd-bugs=146304864925575=2
>>
>
> this should be fixed in stable. can you make sure you have the following:
>
> http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/kern/uipc_mbuf.c.diff?r1=1.219=1.219.2.1

what do you think about this (new) one

http://marc.info/?l=openbsd-bugs=146312050712969=2

I really can do more to debug this, and I have been asking for advice
since the beginning of this thread.



Re: pool related crashes, but "kernel did no panic"

2016-05-12 Thread Bob Beck
Thank you! Now that's a bug report.


On Thu, May 12, 2016 at 4:28 AM, Alexey Suslikov wrote:
> On Wed, Apr 27, 2016 at 7:22 PM, Theo de Raadt wrote:
>>> On 27/04/16(Wed) 15:45, Alexey Suslikov wrote:
>>> > Theo de Raadt  cvs.openbsd.org> writes:
>>> >
>>> > >
>>> > > Most of these bug reports completely stink.
>>> > >
>>> > > ALWAYS include *ALL* information in a report.
>>> >
>>> > In an idealistic world, yes.
>>>
>>> In an idealistic world there would be no bug.
>>
>> In an idealistic world, Alexey Suslikov wouldn't feel compelled to
>> defend sloppiness.
>
> follow up is here
>
> http://marc.info/?l=openbsd-bugs=146304833425471=2
> http://marc.info/?l=openbsd-bugs=146304864925575=2
>



Re: pool related crashes, but "kernel did no panic"

2016-05-12 Thread Alexey Suslikov
On Wed, Apr 27, 2016 at 7:22 PM, Theo de Raadt  wrote:
>> On 27/04/16(Wed) 15:45, Alexey Suslikov wrote:
>> > Theo de Raadt  cvs.openbsd.org> writes:
>> >
>> > >
>> > > Most of these bug reports completely stink.
>> > >
>> > > ALWAYS include *ALL* information in a report.
>> >
>> > In an idealistic world, yes.
>>
>> In an idealistic world there would be no bug.
>
> In an idealistic world, Alexey Suslikov wouldn't feel compelled to
> defend sloppiness.

follow up is here

http://marc.info/?l=openbsd-bugs=146304833425471=2
http://marc.info/?l=openbsd-bugs=146304864925575=2



Re: pool related crashes, but "kernel did no panic"

2016-04-27 Thread Bob Beck
On Wed, Apr 27, 2016 at 03:45:45PM +, Alexey Suslikov wrote:
> Theo de Raadt  cvs.openbsd.org> writes:
> 
> > 
> > Most of these bug reports completely stink.
> > 
> > ALWAYS include *ALL* information in a report.
> 
> In an idealistic world, yes.
> 
> Above are not parts of the "chain", but different statements of the
> same bug. To have both blue screen and ddb, I need to keep kvm console
> running in a browser for undefined period of time (crash can occur twice
> per day, or once per 2 months), which isn't as easy as it seems.

http://www.openbsd.org/report.html

We are pretty clear in there about what you need, and if you don't have all
the information, there's really not a lot we can do. We don't ask you to
include it for decorative purposes; we ask so we can actually know what's
going on. Without it, your report is only an exercise in frustration for
all of us.




Re: pool related crashes, but "kernel did no panic"

2016-04-27 Thread Theo de Raadt
> On 27/04/16(Wed) 15:45, Alexey Suslikov wrote:
> > Theo de Raadt  cvs.openbsd.org> writes:
> > 
> > > 
> > > Most of these bug reports completely stink.
> > > 
> > > ALWAYS include *ALL* information in a report.
> > 
> > In an idealistic world, yes.
> 
> In an idealistic world there would be no bug.

In an idealistic world, Alexey Suslikov wouldn't feel compelled to
defend sloppiness.



Re: pool related crashes, but "kernel did no panic"

2016-04-27 Thread Martin Pieuchot
On 27/04/16(Wed) 15:45, Alexey Suslikov wrote:
> Theo de Raadt  cvs.openbsd.org> writes:
> 
> > 
> > Most of these bug reports completely stink.
> > 
> > ALWAYS include *ALL* information in a report.
> 
> In an idealistic world, yes.

In an idealistic world there would be no bug.

> Above are not parts of the "chain", but different statements of the
> same bug. To have both blue screen and ddb, I need to keep kvm console
> running in a browser for undefined period of time (crash can occur twice
> per day, or once per 2 months), which isn't as easy as it seems.

Come on, your bug reports are useless because you don't include a dmesg.
How hard is it to do so?  If you don't include a dmesg, do not spend
your time reporting a bug; it is useless.



Re: pool related crashes, but "kernel did no panic"

2016-04-27 Thread Alexey Suslikov
Theo de Raadt  cvs.openbsd.org> writes:

> 
> Most of these bug reports completely stink.
> 
> ALWAYS include *ALL* information in a report.

In an idealistic world, yes.

Above are not parts of the "chain", but different statements of the
same bug. To have both blue screen and ddb, I need to keep kvm console
running in a browser for undefined period of time (crash can occur twice
per day, or once per 2 months), which isn't as easy as it seems.

But sure, I'll try to file a more complete report.



Re: pool related crashes, but "kernel did no panic"

2016-04-27 Thread Theo de Raadt
Most of these bug reports completely stink.

ALWAYS include *ALL* information in a report.

If you are told your report is missing information, write a completely
fresh report that includes ALL INFORMATION.  Don't reply in a series
of emails adding more and more information.  People who submit reports
which are missing information should feel terrible.

People in this project are not being paid to reconstruct sloppy email
chains of partial information.

It is a simple request, and we need to be firm.

> On Wed, Apr 27, 2016 at 09:13:40AM +, alexey.susli...@gmail.com wrote:
> > Hi tech@.
> > 
> > (Maybe related to http://marc.info/?l=openbsd-bugs=146174654219490=2).
>  
> ;-)
> 
> > Crashing server acts as a carp backup (the master has the same hardware
> > config but doesn't crash, in contrast to the backup). Will post additional
> > information if necessary.
> 
> In my case, the server is acting as a backup for 2 carp devices and also
> as a master for 2 other carp devices.
> But indeed, it is always the same node (part of a 2-node setup) that is
> crashing.  This node just crashed again a few minutes ago. It seems
> upgrading it to 5.9 makes the bug more frequent. So I am keeping the
> other node with «OpenBSD 5.8-current (GENERIC.MP) #1661: Tue Nov 24
> 20:16:36 MST 2015» for now.
> 
> 
> Here is fresh output:
> 
> ddb{2}> trace
> Debugger() at Debugger+0x9
> panic() at panic+0xfe
> pool_runqueue() at pool_runqueue
> pool_get() at pool_get+0xb5
> m_clget() at m_clget+0x51
> m_dup_pkt() at m_dup_pkt+0x88
> carp_input() at carp_input+0x17c
> if_input_process() at if_input_process+0xcd
> taskq_thread() at taskq_thread+0x6c
> end trace frame: 0x0, count: -9
> ddb{2}> show panic
> pool_do_get: mcl2k free list modified: page 0xff00f1ec7000;
> item addr 0xffffff00f1eca800; offset 0x0=0x0 != 0xaaa0cffd8d1e5cb4
> ddb{2}> show register
> rdi     0x1
> rsi     0x292
> rbp     0x800022519b50
> rbx     0x817195a0    systqmp+0x1860
> rdx     0
> rcx     0x8004f000
> rax     0x1
> r8      0x800022519a70
> r9      0
> r10     0x800022519a20
> r11     0x8
> r12     0x100
> r13     0x800022519b60
> r14     0x2
> r15     0x2
> rip     0x81349a09    Debugger+0x9
> cs      0x8
> rflags  0x282
> rsp     0x800022519b40
> ss      0x10
> Debugger+0x9:   leave
> ddb{2}>
> 
> 
> --
> oc
> 



Re: pool related crashes, but "kernel did no panic"

2016-04-27 Thread Alexey Suslikov
Stuart Henderson  spacehopper.org> writes:

> There should be some lines printed before you get dumped into DDB
> (probably a uvm_fault), the information in them is important.

I either have a screenshot, or ddb. Not both at the same time.

Here is one of the screenshots from 5.9, transcribed:

uvm_fault(0x81940240, 0x10, 0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip 811a5c3e cs 8 rflags 10206 cr2 10 cpl a rsp 800022171e20
panic: trap type 6, code=0, pc=811a5c3e
Starting stack trace...
panic() at panic+0x10b
trap() at trap+0x7b8
--- trap (number 6) ---
pool_p_free() at pool_p_free+0x7e
pool_gc_pages() at pool_gc_pages+0xe4
taskq_thread() at taskq_thread+0x6c
end trace frame: 0x0, count: 252
End of stack trace.
syncing disks... 5 done



Re: pool related crashes, but "kernel did no panic"

2016-04-27 Thread Alexey Suslikov
Another one from my collection.

Apr 16:

ddb{0}> show panic
the kernel did not panic

ddb{0}> trace
pool_do_get() at pool_do_get+0x90
pool_get() at pool_get+0xb5
m_get() at m_get+0x28
sbappendaddr() at sbappendaddr+0x9a
uipc_usrreq() at uipc_usrreq+0x3b8
sosend() at sosend+0x3d8
dosendsyslog() at dosendsyslog+0x110
sys_sendsyslog2() at sys_sendsyslog2+0xbd
syscall() at syscall+0x368
--- syscall (number 112) ---
end of kernel
end trace frame: 0x183f8dab6913, count: -9
0x1842755e571a:

ddb{0}> show registers
rdi     0x7
rsi     0x9ff5c49ed229ae92
rbp     0x8000222f5b00
rbx     0xff022d80d6d0
rdx     0x8000222f5b64
rcx     0x818c76e0    cpu_info_primary
rax     0x7293fa06e984af44
r8      0
r9      0x1
r10     0x811c7c00    uipc_usrreq
r11     0x81344be0    copy_fault
r12     0x8194c000    mbpool
r13     0xff40b152a900
r14     0x2
r15     0x818b4570    sun_noname
rip     0x811a5340    pool_do_get+0x90
cs      0x8
rflags  0x10282    __ALIGN_SIZE+0xf282
rsp     0x8000222f5ab0
ss      0x10
pool_do_get+0x90:   movq    0(%r13),%rdi



Re: pool related crashes, but "kernel did no panic"

2016-04-27 Thread Stuart Henderson
On 2016/04/27 13:54, Alexey Suslikov wrote:
> Another one from my collection.
> 
> Apr 16:
> 
> ddb{0}> show panic
> the kernel did not panic

There should be some lines printed before you get dumped into DDB
(probably a uvm_fault), the information in them is important.


> ddb{0}> trace
> pool_do_get() at pool_do_get+0x90
> pool_get() at pool_get+0xb5
> m_get() at m_get+0x28
> sbappendaddr() at sbappendaddr+0x9a
> uipc_usrreq() at uipc_usrreq+0x3b8
> sosend() at sosend+0x3d8
> dosendsyslog() at dosendsyslog+0x110
> sys_sendsyslog2() at sys_sendsyslog2+0xbd
> syscall() at syscall+0x368
> --- syscall (number 112) ---
> end of kernel
> end trace frame: 0x183f8dab6913, count: -9
> 0x1842755e571a:
> 
> ddb{0}> show registers
> rdi     0x7
> rsi     0x9ff5c49ed229ae92
> rbp     0x8000222f5b00
> rbx     0xff022d80d6d0
> rdx     0x8000222f5b64
> rcx     0x818c76e0    cpu_info_primary
> rax     0x7293fa06e984af44
> r8      0
> r9      0x1
> r10     0x811c7c00    uipc_usrreq
> r11     0x81344be0    copy_fault
> r12     0x8194c000    mbpool
> r13     0xff40b152a900
> r14     0x2
> r15     0x818b4570    sun_noname
> rip     0x811a5340    pool_do_get+0x90
> cs      0x8
> rflags  0x10282    __ALIGN_SIZE+0xf282
> rsp     0x8000222f5ab0
> ss      0x10
> pool_do_get+0x90:   movq    0(%r13),%rdi
> 



Re: pool related crashes, but "kernel did no panic"

2016-04-27 Thread Bob Beck



On Wed, Apr 27, 2016 at 02:57:31PM +0200, Olivier Cherrier wrote:
> On Wed, Apr 27, 2016 at 09:13:40AM +, alexey.susli...@gmail.com wrote:
> > Hi tech@.
> > 
> > (Maybe related to http://marc.info/?l=openbsd-bugs=146174654219490=2).
>  
> ;-)
> 
> > Crashing server acts as a carp backup (the master has the same hardware
> > config but doesn't crash, in contrast to the backup). Will post additional
> > information if necessary.
> 
> In my case, the server is acting as a backup for 2 carp devices and also
> as a master for 2 other carp devices.
> But indeed, it is always the same node (part of a 2-node setup) that is
> crashing.  This node just crashed again a few minutes ago. It seems
> upgrading it to 5.9 makes the bug more frequent. So I am keeping the
> other node with «OpenBSD 5.8-current (GENERIC.MP) #1661: Tue Nov 24
> 20:16:36 MST 2015» for now.
> 
> 
> Here is fresh output:
> 
> ddb{2}> trace
> Debugger() at Debugger+0x9
> panic() at panic+0xfe

show panic please

> pool_runqueue() at pool_runqueue
> pool_get() at pool_get+0xb5
> m_clget() at m_clget+0x51
> m_dup_pkt() at m_dup_pkt+0x88
> carp_input() at carp_input+0x17c
> if_input_process() at if_input_process+0xcd
> taskq_thread() at taskq_thread+0x6c
> end trace frame: 0x0, count: -9
> ddb{2}> show panic
> pool_do_get: mcl2k free list modified: page 0xff00f1ec7000;
> item addr 0xffffff00f1eca800; offset 0x0=0x0 != 0xaaa0cffd8d1e5cb4
> ddb{2}> show register
> rdi     0x1
> rsi     0x292
> rbp     0x800022519b50
> rbx     0x817195a0    systqmp+0x1860
> rdx     0
> rcx     0x8004f000
> rax     0x1
> r8      0x800022519a70
> r9      0
> r10     0x800022519a20
> r11     0x8
> r12     0x100
> r13     0x800022519b60
> r14     0x2
> r15     0x2
> rip     0x81349a09    Debugger+0x9
> cs      0x8
> rflags  0x282
> rsp     0x800022519b40
> ss      0x10
> Debugger+0x9:   leave
> ddb{2}>
> 
> 
> --
> oc
> 



Re: pool related crashes, but "kernel did no panic"

2016-04-27 Thread Olivier Cherrier
On Wed, Apr 27, 2016 at 09:13:40AM +, alexey.susli...@gmail.com wrote:
> Hi tech@.
> 
> (Maybe related to http://marc.info/?l=openbsd-bugs=146174654219490=2).
 
;-)

> Crashing server acts as a carp backup (the master has the same hardware
> config but doesn't crash, in contrast to the backup). Will post additional
> information if necessary.

In my case, the server is acting as a backup for 2 carp devices and also
as a master for 2 other carp devices.
But indeed, it is always the same node (part of a 2-node setup) that is
crashing.  This node just crashed again a few minutes ago. It seems
upgrading it to 5.9 makes the bug more frequent. So I am keeping the
other node with «OpenBSD 5.8-current (GENERIC.MP) #1661: Tue Nov 24
20:16:36 MST 2015» for now.


Here is fresh output:

ddb{2}> trace
Debugger() at Debugger+0x9
panic() at panic+0xfe
pool_runqueue() at pool_runqueue
pool_get() at pool_get+0xb5
m_clget() at m_clget+0x51
m_dup_pkt() at m_dup_pkt+0x88
carp_input() at carp_input+0x17c
if_input_process() at if_input_process+0xcd
taskq_thread() at taskq_thread+0x6c
end trace frame: 0x0, count: -9
ddb{2}> show panic
pool_do_get: mcl2k free list modified: page 0xff00f1ec7000;
item addr 0xffffff00f1eca800; offset 0x0=0x0 != 0xaaa0cffd8d1e5cb4
ddb{2}> show register
rdi     0x1
rsi     0x292
rbp     0x800022519b50
rbx     0x817195a0    systqmp+0x1860
rdx     0
rcx     0x8004f000
rax     0x1
r8      0x800022519a70
r9      0
r10     0x800022519a20
r11     0x8
r12     0x100
r13     0x800022519b60
r14     0x2
r15     0x2
rip     0x81349a09    Debugger+0x9
cs      0x8
rflags  0x282
rsp     0x800022519b40
ss      0x10
Debugger+0x9:   leave
ddb{2}>


--
oc
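
The "free list modified ... offset 0x0=0x0 != 0xaaa0cffd8d1e5cb4" panic
above appears to say that the word at offset 0 of a free item held 0x0 where
a specific nonzero value was expected. A check of that kind can be modelled
in userland as below. This is a sketch of the general idea only (a marker
derived from the item's address and a per-page secret, verified when the
item is taken off the free list); struct free_item and the function names
are assumptions made for the illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header written at the start of every free item. */
struct free_item {
	uintptr_t magic;		/* marker checked on allocation */
	struct free_item *next;		/* next free item in the page */
};

/* The expected marker mixes the item's own address with a page secret. */
static uintptr_t
item_magic(const struct free_item *it, uintptr_t page_secret)
{
	assert(it != NULL);
	return (uintptr_t)it ^ page_secret;
}

/* Called when an item is returned to the free list. */
static void
item_mark_free(struct free_item *it, uintptr_t page_secret)
{
	it->magic = item_magic(it, page_secret);
}

/*
 * Called before an item is handed out again.  Returns 0 if something
 * wrote to the item while it was supposed to be free.
 */
static int
item_check(const struct free_item *it, uintptr_t page_secret)
{
	return it->magic == item_magic(it, page_secret);
}
```

In this model, a write of zero to a freed item (for example through a stale
pointer) would make item_check() fail, which is the kind of corruption the
panic message reports.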



Re: pool related crashes, but "kernel did no panic"

2016-04-27 Thread Martin Pieuchot
On 27/04/16(Wed) 09:13, Alexey Suslikov wrote:
> Hi tech@.
> 
> (Maybe related to http://marc.info/?l=openbsd-bugs=146174654219490=2).

Maybe, maybe not.  Please keep sending your bug reports to bugs@ with all
the required information.