Re: [smartos-discuss] Re-occurrence of bug 3917

2016-09-20 Thread Brian Bennett
Len or Zak, would either of you be able to provide us with a crash dump? It 
should be in /var/crash/volatile on any CN that had this panic occur.
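
For anyone gathering the dump, a minimal sketch of listing candidate files is below. The directory is the one named above; the helper name `list_crash_dumps` is hypothetical, and the file name patterns (vmdump.N, unix.N, vmcore.N) are assumed to follow the usual savecore naming, so adjust them if your CN is configured differently.

```shell
#!/bin/sh
# Hypothetical helper: list crash dump candidates in a savecore directory.
# Defaults to /var/crash/volatile, the location mentioned in this thread.
# The vmdump.N / unix.N / vmcore.N patterns are an assumption about naming.
list_crash_dumps() {
    dir=${1:-/var/crash/volatile}
    for f in "$dir"/vmdump.* "$dir"/unix.* "$dir"/vmcore.*; do
        # An unmatched glob stays a literal pattern, so check that it exists.
        if [ -e "$f" ]; then
            echo "$f"
        fi
    done
}
```

Running `list_crash_dumps` with no argument on the affected CN should print whatever dump files are present; pass a different directory as the first argument if dumps are stored elsewhere.
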


-- 
Brian Bennett
Systems Engineer, Cloud Operations
Joyent, Inc. | www.joyent.com 
> On Sep 19, 2016, at 5:09 AM, Len Weincier wrote:
> 
> Hi 
> 
> This has now happened to 3 hosts in the last 3 days. Any idea what we can 
> look at?
> 
> It seems to happen under high load on those systems, all older E5-based hosts.
> 
> We just had another reboot and this is in /var/adm/messages 
> 
> 2016-09-19T11:49:39.065057+00:00 c1a unix: [ID 836849 kern.notice] 
> #012#015panic[cpu14]/thread=ff19e4687420: 
> 2016-09-19T11:49:39.065068+00:00 c1a genunix: [ID 761616 kern.notice] 
> turnstile_block(ff19ab6e9230): unowned mutex
> 2016-09-19T11:49:39.065074+00:00 c1a unix: [ID 10 kern.notice] #012
> 2016-09-19T11:49:39.065079+00:00 c1a genunix: [ID 655072 kern.notice] 
> ff00ba689380 genunix:turnstile_block+78a ()
> 2016-09-19T11:49:39.065084+00:00 c1a genunix: [ID 655072 kern.notice] 
> ff00ba6893f0 unix:mutex_vector_enter+3a3 ()
> 2016-09-19T11:49:39.065089+00:00 c1a genunix: [ID 655072 kern.notice] 
> ff00ba6894c0 vnd:vnd_mac_input+12a ()
> 2016-09-19T11:49:39.065094+00:00 c1a genunix: [ID 655072 kern.notice] 
> ff00ba689580 dls:dls_rx_promisc+119 ()
> 2016-09-19T11:49:39.065099+00:00 c1a genunix: [ID 655072 kern.notice] 
> ff00ba6895e0 mac:mac_promisc_dispatch_one+81 ()
> 2016-09-19T11:49:39.065104+00:00 c1a genunix: [ID 655072 kern.notice] 
> ff00ba689660 mac:mac_promisc_dispatch+b2 ()
> 2016-09-19T11:49:39.065109+00:00 c1a genunix: [ID 655072 kern.notice] 
> ff00ba689750 mac:mac_tx_send+33f ()
> 2016-09-19T11:49:39.065114+00:00 c1a genunix: [ID 655072 kern.notice] 
> ff00ba6897f0 mac:mac_tx_single_ring_mode+6e ()
> 2016-09-19T11:49:39.065132+00:00 c1a genunix: [ID 655072 kern.notice] 
> ff00ba6898a0 mac:mac_tx+da ()
> 2016-09-19T11:49:39.065139+00:00 c1a genunix: [ID 655072 kern.notice] 
> ff00ba689950 dld:str_mdata_raw_fastpath_put+85 ()
> 2016-09-19T11:49:39.065144+00:00 c1a genunix: [ID 655072 kern.notice] 
> ff00ba689990 vnd:vnd_squeue_tx_one+6a ()
> 2016-09-19T11:49:39.065149+00:00 c1a genunix: [ID 655072 kern.notice] 
> ff00ba689a20 vnd:vnd_squeue_tx_drain+112 ()
> 2016-09-19T11:49:39.065164+00:00 c1a genunix: [ID 655072 kern.notice] 
> ff00ba689ac0 vnd:vnd_squeue_tx_append+103 ()
> 2016-09-19T11:49:39.065170+00:00 c1a genunix: [ID 655072 kern.notice] 
> ff00ba689b50 ip:squeue_enter+41c ()
> 2016-09-19T11:49:39.065175+00:00 c1a genunix: [ID 655072 kern.notice] 
> ff00ba689ba0 gsqueue:gsqueue_enter_one+43 ()
> 2016-09-19T11:49:39.065179+00:00 c1a genunix: [ID 655072 kern.notice] 
> ff00ba689c40 vnd:vnd_frameio_write+10e ()
> 2016-09-19T11:49:39.065184+00:00 c1a genunix: [ID 655072 kern.notice] 
> ff00ba689cc0 vnd:vnd_ioctl+270 ()
> 2016-09-19T11:49:39.065196+00:00 c1a genunix: [ID 655072 kern.notice] 
> ff00ba689d00 genunix:cdev_ioctl+39 ()
> 2016-09-19T11:49:39.065202+00:00 c1a genunix: [ID 655072 kern.notice] 
> ff00ba689d50 specfs:spec_ioctl+60 ()
> 2016-09-19T11:49:39.065208+00:00 c1a genunix: [ID 655072 kern.notice] 
> ff00ba689de0 genunix:fop_ioctl+55 ()
> 2016-09-19T11:49:39.065213+00:00 c1a genunix: [ID 655072 kern.notice] 
> ff00ba689f00 genunix:ioctl+9b ()
> 2016-09-19T11:49:39.065218+00:00 c1a genunix: [ID 655072 kern.notice] 
> ff00ba689f10 unix:brand_sys_syscall+238 ()
> 2016-09-19T11:49:39.065224+00:00 c1a unix: [ID 10 kern.notice]
> 
> 
> Thanks
> Len
> 
> On Mon, 19 Sep 2016 at 09:34 Zak McGregor wrote:
> Hi Brian
> 
> Thanks, here's a full stack trace.
> 
> Ciao
> 
> Zak
> 
> On 18 September 2016 at 21:58, Brian Bennett wrote:
> > Zak,
> >
> > Considering that illumos #3917 is three years old, you've probably hit a 
> > different bug involving mutexes. It would be best if you can give the full 
> > stack trace, not just the top two frames. Having the full stack trace, I 
> > may be able to identify the particular crash you encountered.
> >
> > Thanks.
> >
> > --
> > Brian Bennett
> > Systems Engineer, Cloud Operations
> > Joyent, Inc. | www.joyent.com 
> >
> >> On Sep 16, 2016, at 4:04 AM, Zak McGregor wrote:
> >>
> >> Hi
> >>
> >> This issue here:
> >> https://illumos.org/issues/3917 
> >>
> >> seems to have hit one of our production boxes today. I took a look at
> >> the dump and it seems to tally precisely with this issue.
> >>
> >> Here's a snippet:
> >>
> >> mdb -k unix.0 vmcore.0
> >> Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc
> >> pcplusmp scsi_vhci ufs ip hook neti sockfs arp usba uhci mm stmf_sbd
> >> stmf zfs lofs idm crypto random cpc logindmux ptm kvm sd sppp nsmb

Re: [smartos-discuss] Re-occurrence of bug 3917

2016-09-20 Thread Len Weincier
Hi Brian

How do we get the dump to you?

Thanks
Len


On Tue, 20 Sep 2016 at 20:21 Brian Bennett wrote:

> Len or Zak, would either of you be able to provide us with a crash dump?
> It should be in /var/crash/volatile on any CN that had this panic occur.

Re: [smartos-discuss] Re-occurrence of bug 3917

2016-09-20 Thread Brian Bennett
You can either put it somewhere I can download it, or I can give you a Manta 
URL to upload to.


-- 
Brian Bennett
Systems Engineer, Cloud Operations
Joyent, Inc. | www.joyent.com 
> On Sep 20, 2016, at 11:22 AM, Len Weincier wrote:
> 
> Hi Brian
> 
> How do we get the dump to you?
> 
> Thanks
> Len

Re: [smartos-discuss] Re-occurrence of bug 3917

2016-09-20 Thread Len Weincier
Hi Brian

I have the file. Can I get a Manta URL, please?

Thanks
Len


On Tue, 20 Sep 2016 at 20:32 Brian Bennett wrote:

> You can either put it somewhere I can download it, or I can give you a
> Manta URL to upload to.

Re: [smartos-discuss] Can not override physical sector disks

2016-09-20 Thread Joshua M. Clulow
On 20 September 2016 at 07:00, InterNetX - Juergen Gotteswinter wrote:
> try changing this in /kernel/drv/sd.conf
> sd-config-list=
> "",
> "retries-timeout:1,retries-busy:1,retries-reset:1,retries-victim:2,physical-blocksize:4096",

This is not likely to help.  The original post is about a pool with
512 byte sectors, and replacement with a 4K/512e disk.

The problem is that ZFS currently only considers the physical block
size, which is already 4K, when attaching a device to a pool.  In this
particular case, though it is likely suboptimal, it is technically
acceptable to attach a 512e device to the pool.  That's what the
patched platform I linked to has been modified to do, and what OS-4718
describes.
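
To make the sector-size arithmetic concrete, here is a minimal sketch of the reasoning above. The function names are hypothetical and this is not how ZFS implements the check; it only encodes two facts: ashift is the base-2 logarithm of the sector size (512 bytes gives 9, 4096 bytes gives 12), and a disk is technically usable in a pool when its logical sector size does not exceed the pool's 2^ashift.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the ashift reasoning; not ZFS code.

# ashift_for SIZE: base-2 logarithm of a power-of-two sector size in bytes.
ashift_for() {
    size=$1
    bits=0
    while [ "$size" -gt 1 ]; do
        size=$((size / 2))
        bits=$((bits + 1))
    done
    echo "$bits"
}

# disk_class LOGICAL PHYSICAL: name the common sector-size combinations.
disk_class() {
    logical=$1
    physical=$2
    if [ "$logical" -eq 512 ] && [ "$physical" -eq 512 ]; then
        echo "512n"
    elif [ "$logical" -eq 512 ] && [ "$physical" -eq 4096 ]; then
        echo "512e"
    elif [ "$logical" -eq 4096 ] && [ "$physical" -eq 4096 ]; then
        echo "4Kn"
    else
        echo "unknown"
    fi
}

# fits_pool POOL_ASHIFT LOGICAL: succeeds when the disk can address blocks
# as small as the pool's allocation size (logical sector <= 2^ashift).
fits_pool() {
    pool_ashift=$1
    logical=$2
    [ "$(ashift_for "$logical")" -le "$pool_ashift" ]
}
```

Under these assumptions, `fits_pool 9 512` succeeds for a 512e disk (logical 512, physical 4096) even though it is likely suboptimal, while `fits_pool 9 4096` fails for a 4K native disk.
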

-- 
Joshua M. Clulow
UNIX Admin/Developer
http://blog.sysmgr.org


---
smartos-discuss
Archives: https://www.listbox.com/member/archive/184463/=now
RSS Feed: https://www.listbox.com/member/archive/rss/184463/25769125-55cfbc00
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=25769125_secret=25769125-7688e9fb
Powered by Listbox: http://www.listbox.com


Re: [smartos-discuss] Can not override physical sector disks

2016-09-20 Thread Humberto Ramirez
Is there a definitive approach / guide / manual / wiki as to how to
properly work / replace 4k - 512 - 512e disks? This has been asked before,
here and in some other lists and obviously continues to be a source of
problems and confusion...

On Sep 19, 2016 11:54 PM, "Joshua M. Clulow" wrote:

> On 19 September 2016 at 20:12, 郑圆杰 wrote:
> > I have created a zpool with ashift=9.
>
> How did you do this?  Just by using disks with native 512 byte
> sectors, or through some other mechanism?
>
> > Now a disk is out of service, and I am trying to replace it with a new disk.
>
> Is the replacement disk a different model from the original disk?
>
> > Unfortunately, the new disk reports that the physical sector size is 4K.
> > An error occurs when I try to run “zpool replace” / “zpool attach”.
> 
> Do you know if the new disk is an "Advanced Format" disk (aka "512e")?
>  That is: does the new disk present 4KB physical sectors, but provide
> emulation for legacy 512 byte sectors?
> 
> If the new disks are 4K native, I'm afraid you cannot use them in an
> ashift=9 pool.  If the disks _do_ provide an emulated 512 byte logical
> sector size, you might be hitting this bug:
> 
> https://smartos.org/bugview/OS-4718
> 
> If these _are_ Advanced Format (512e) disks, you might want to try
> this custom patched platform:
> 
> https://us-east.manta.joyent.com/jmc/public/tmp/platform-20160904T224833Z-OS-4718.tgz
> 
> This custom platform image includes an attempted fix for OS-4718 which
> should help.  Source diff for the platform build is here:
> 
> https://gist.github.com/jclulow/ccb00c396c2f6961672494ef2dbdee66
> 
> Let me know how it goes!
> 
> Cheers.
> 
> --
> Joshua M. Clulow
> UNIX Admin/Developer
> http://blog.sysmgr.org
> 





Re: [smartos-discuss] Can not override physical sector disks

2016-09-20 Thread InterNetX - Juergen Gotteswinter
Try changing this in /kernel/drv/sd.conf:

sd-config-list=
"",
"retries-timeout:1,retries-busy:1,retries-reset:1,retries-victim:2",


to

sd-config-list=
"",
"retries-timeout:1,retries-busy:1,retries-reset:1,retries-victim:2,physical-blocksize:4096",

If any of the other entries matches your disk/SSD and sets it to 4K,
remove it.

After that, run update_drv -vf and try replacing your disk again. This worked
for me, even though update_drv says a reboot is needed; ignore that.


On 20.09.2016 at 06:49, 郑圆杰 wrote:
> Hi Joshua,
> Thank you very much!
> The original disk's driver reports only a logical sector size of 512 bytes,
> without a physical sector size. We created the zpool with the default
> settings, and ZFS uses the logical sector size for ashift if the physical
> sector size is not reported.
> The replacement disk is a different model that reports a logical sector size
> of 512 bytes and a physical sector size of 4K.
> As mentioned in https://smartos.org/bugview/OS-4718, ZFS then treats the
> alignment as mismatched.


Re: [smartos-discuss] Can not override physical sector disks

2016-09-20 Thread InterNetX - Juergen Gotteswinter
update_drv -vf sd

sorry, typo

On 20.09.2016 at 16:00, InterNetX - Juergen Gotteswinter wrote:
> After that, run update_drv -vf and try replacing your disk again. This worked
> for me, even though update_drv says a reboot is needed; ignore that.


[smartos-discuss] Contributing packages to SmartOS

2016-09-20 Thread Filip Chabik
Hi @all,

As I recently started contributing some small work to the joyent-pkgsrc
repository, I would like to clarify a few open points to establish how to
prepare a "perfect" (or at the very least, recommended) environment for
building packages. First of all, there's a pkgbuild image that can easily be
found via the imgadm command:
imgadm update ; imgadm avail | grep pkgbuild

The way I understand it, you should pick the version against which the package
will be built. For example, if you want to target 2015Q4 (to provide some
package for 15.4.1 LTS images), you should pick the 15.4.1 pkgbuild image, and
if you target 2016Q2 (or trunk), you should grab the latest one available (at
the moment that would be 16.2.0).
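
The branch-to-image pairing described above can be sketched as a small helper. It is based only on the two examples in this message (2015Q4 pairs with 15.4.x, 2016Q2 pairs with 16.2.x); the function name is hypothetical, and the assumption that other quarterly branches follow the same YYYYQn to YY.n pattern is mine.

```shell
#!/bin/sh
# Hypothetical helper: derive the pkgbuild image series from a pkgsrc
# quarterly branch name, e.g. 2016Q2 -> 16.2. Assumes the naming pattern
# seen in the two examples above holds for other quarters.
image_series_for_branch() {
    branch=$1
    case "$branch" in
        [0-9][0-9][0-9][0-9]Q[1-4])
            year=${branch%Q*}       # e.g. 2016
            quarter=${branch#*Q}    # e.g. 2
            echo "${year#??}.${quarter}"
            ;;
        *)
            echo "unrecognized branch: $branch" >&2
            return 1
            ;;
    esac
}
```

The result can then be matched against the versions listed by the imgadm command above; anything that is not a quarterly branch name (such as trunk) is rejected rather than guessed.
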
There are a few places that cover preparing the environment on the pkgbuild
image: some are in Joyent's images documentation, some are on GitHub, some are
in Joyent's pkgsrc documentation, and some bits are available only in a Gist.
The list:

 * https://docs.joyent.com/public-cloud/instances/infrastructure/images/smartos/pkgbuild
 * https://github.com/joyent/pkgbuild/
 * https://pkgsrc.joyent.com/
 * https://gist.github.com/drscream/c45419950d8af648e2c6

The last one is there because I want to sign the packages I'm building, and
it's the only place that explains how to add your key so that it is recognized
by SmartOS and allows me to install self-built packages. (It doesn't work, BTW,
but I will get back to that later.)

I tried to combine all of these sources and still failed to get it right in my
first two attempts (even though Jonathan picked up the PR I made on GitHub, he
had to introduce quite a few changes in order to push them further upstream --
thank you, BTW, also for the awesome help in the IRC channel). For one thing, I
always failed to properly add my public key to the pkgsrc keyring, like so:

gpg --primary-keyring /opt/local/etc/gnupg/pkgsrc.gpg \
    --import pkgsrc/pkgsrc_pkg_sig.pub

The command itself does not complain, but pkg_install still complains when I
try to install a package built and signed with my GPG key :/
Secondly, I never used run-sandbox, which is necessary for a proper chroot
build environment (or at least that's how I understand it), simply because I
missed it on pkgsrc.joyent.com and docs.joyent.com (I noticed it later on
GitHub).

OK, enough. I would now like to clarify the step-by-step procedure for getting
this whole environment right, with proper GPG signing, proper sandboxing, etc.

1. Spawn a new zone based on the pkgbuild image (for example,
   4183fce6-49b2-11e6-a1ca-4f007e77f9d5 for 16.2.0).
   a) First question: which user should I use to build packages? There's the
   default admin user with 'sudo' superpowers, and there's also the pbulk user
   described as Package Builder (which, by default, has no home directory), but
   the code checkout under /data is owned by root... So, admin? pbulk? root?
2. The way I understand it, an ordinary contributor like me should not push
   directly to joyent-pkgsrc. The preferred way is to fork the repo on GitHub,
   give keys to the user that will be handling the builds (see 1a), and clone
   the fork onto your pkgbuild zone in place of the existing /data/pkgsrc. If
   you plan to maintain this repo for some time, it might not be a terrible
   idea to add joyent-pkgsrc as an upstream remote and sync it every now and
   then.
3. The user is picked, keys are in place, and the code has been forked and
   cloned. There's no gcc installed by default, so I will now follow the
   instructions for Building Packages from pkgsrc.joyent.com.
   a) pkgin -y in gcc49 gnupg2
   b) Check out the branch you want to build against (unless it's the default
   trunk), for example joyent/release/2016Q2.
   c) The directory structure is mostly already provided under /data. One
   difference I can already see is that the default /opt/local/etc/mk.conf
   file now includes /opt/local/etc/mk.conf.local, where all custom changes
   should reside (DISTDIR, PACKAGES, WRKOBJDIR, SIGN_PACKAGES, etc.).
   d) Put GPG and GPG_SIGN_AS into /opt/local/etc/pkg_install.conf and import
   your private and public GPG keys for the user that will be building
   packages (1a).
   e) Set up gpg-agent accordingly (.bashrc config, .gnupg/gpg.conf, and
   .gnupg/gpg-agent.conf files are involved).
   f) Add your public key to the pkgsrc keyring (even though the command does
   not complain, bmake install fails to recognize my signature).
4. Once that's done, as the user picked for building packages (1a), run
   run-sandbox 2016Q2, where 2016Q2 matches the pkgbuild image version, the
   branch you checked out, and the release you want to build the package
   against.
5. Afterwards, the rest should be quite straightforward: pick a package and
   run bmake package to check whether all is fine. What should happen:
   a) All the dependencies for building the package should be fetched and
   installed.
   b) The package should be built.
   c) The package should be signed with your 

Re: [smartos-discuss] Can not override physical sector disks

2016-09-20 Thread Richard Elling

> On Sep 20, 2016, at 6:48 AM, Humberto Ramirez wrote:
> 
> Is there a definitive approach / guide / manual / wiki as to how to properly 
> work / replace 4k - 512 - 512e disks? This has been asked before, here and in 
> some other lists and obviously continues to be a source of problems and 
> confusion…
> 

The mismatch is reported because overriding it causes pain and tears.
Do yourself a favor and don’t replace 512n drives with 512e.
 -- richard
