Re: Kernel crash during video transcoding

2020-08-16 Thread Alexandre Levy
"m" is not NULL :

(kgdb) frame 16
#16 0x80ec23ed in vm_page_busy_acquire (m=0xfe00040ff9e8,
allocflags=16) at /usr/src/sys/vm/vm_page.c:884
(kgdb) p *m
$2 = {plinks = {q = {tqe_next = 0x578491b51dd60510, tqe_prev =
0xd78c11bd9dde8518}, s = {ss = {sle_next = 0x578491b51dd60510}}, memguard =
{p = 6306325585301210384,
  v = 15531808720989095192}, uma = {slab = 0x578491b51dd60510, zone =
0xd78c11bd9dde8518}}, listq = {tqe_next = 0xd78c11bd9dde8518, tqe_prev =
0x265bc92017d7aa38},
  object = 0x2659c92217d5aa3a, pindex = 2758957463725517354, phys_addr =
2758957463725517354, md = {pv_list = {tqh_first = 0x2e49c1321fc5a22a,
tqh_last = 0x3e4bd1300fc7b228},
pv_gen = 265794104, pat_mode = 1046204704}, ref_count = 257405624,
busy_lock = 1054593440, a = {{flags = 4757, queue = 48 '0', act_count = 134
'\206'}, _bits = 2251297429},
  order = 98 'b', pool = 204 '\314', flags = 75 'K', oflags = 105 'i',
psind = -107 '\225', segind = 18 '\022', valid = 48 '0', dirty = 134 '\206'}
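
One way to read the dump above: on amd64 with 48-bit virtual addresses, a valid pointer has bits 63..47 either all zero or all one, and the pointer fields printed for *m (tqe_next, object, tqh_first, ...) do not have that shape. The sketch below is only an illustration of that check, assuming 48-bit addressing; the helper names are made up for this example. It also measures the distance between m->object (0x2659c92217d5aa3a) and the lock cookie c=0x2659c92217d5aa52 that _rw_wowned received in frame #15.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: amd64 with 48-bit virtual addresses, where bits 63..47 of a
 * valid ("canonical") pointer are all zeros or all ones.
 */
static bool
is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;	/* the 17 most significant bits */

	return (top == 0 || top == 0x1ffff);
}

/* Byte distance between two addresses; the caller passes hi >= lo. */
static uint64_t
byte_offset(uint64_t hi, uint64_t lo)
{
	assert(hi >= lo);
	return (hi - lo);
}
```

Fed the values from the dump, is_canonical_48() rejects both tqe_next and object, and byte_offset() gives 0x18 between the cookie and m->object. That would be consistent with the cookie having been computed from this object pointer, which is an observation from the numbers rather than a confirmed diagnosis.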

I had to recompile drm-devel-kmod with make WITH_DEBUG=yes DEBUG_FLAGS="-g
-O0" because "m" was optimized out. I then started a kgdb session with the
same crash dump as before, loaded the module symbols with add-kld
/boot/modules/i915kms.ko, and I now get a different backtrace for frames
#17 to #28.

Also, the panic doesn't occur when I plug a screen into the HDMI port (which
now works for some reason...), and I can see that frame #17 is now the
following:

#17 0x82b4e980 in intel_plane_can_remap
(plane_state=0xf80315148300)
at
/usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.3_4/drivers/gpu/drm/i915/display/intel_display.c:2583

and used to be:

#17 0x82b4e980 in remap_io_mapping (vma=0xf80315148300,
addr=<optimized out>, pfn=<optimized out>, size=<optimized out>,
iomap=<optimized out>)

I don't understand why the backtrace changed even though the crash dump is
the same as before. Any suggestions?

On Sun, 16 Aug 2020 at 18:19, Hans Petter Selasky wrote:

> On 2020-08-16 17:28, Alexandre Levy wrote:
> > Now at intel_freebsd.c:193 (frame #17) the driver calls
> > vm_page_busy_acquire(m, VM_ALLOC_WAITFAIL). 'm' is the page grabbed from
> > vm_obj of the calling frame.
>
> Can you check if "m" is NULL at this point?
>
> --HPS
>
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


(solved) Re: dma fails to connect (error:1408F10B:SSL routines:ssl3_get_record:wrong version number)

2020-08-16 Thread Ronald Klop
On Sun, 16 Aug 2020 16:44:51 +0200, Ronald Klop   
wrote:



> Hi,
>
> I have uname -UK -> 1300101 1300101 on my laptop. It uses libexec/dma
> as its mail agent.
> I have 2 jails running uname -U -> 1300101 and 1300104. All dma configs
> are the same.
>
> In all 1300101 versions dma can deliver mail to my smarthost. On 1300104
> I get:
>
> Aug 16 16:29:00 freebsd13_py3 dma[385ba.800e480a0][52169]: trying remote
> delivery to smtp.greenhost.nl [213.108.110.112] pref 0
> Aug 16 16:29:00 freebsd13_py3 dma[385ba.800e480a0][52169]:
> SSL_client_method
> Aug 16 16:29:00 freebsd13_py3 dma[385ba.800e480a0][52169]: remote
> delivery deferred: SSL handshake failed fatally: error:1408F10B:SSL
> routines:ssl3_get_record:wrong version number
>
> Any thoughts on this?
> Bisecting this will take me hours and hours of compilation.
>
> Regards,
> Ronald.



I found the cause of the error with ngrep. My jail has an underscore in
its name, and the server complained about it in reply to the SMTP EHLO
command. dma does not handle this error properly when STARTTLS is enabled,
so the conversation with the server gets out of step, which results in
STARTTLS getting weird results later on.
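
For context, host names are conventionally restricted to letters, digits and hyphens in dot-separated labels (the RFC 952 / RFC 1123 rules), which is why a name with an underscore can be refused as an EHLO argument. Below is a minimal sketch in C of such a check. It is not dma's code or the server's; the function name and the 253/63 length limits are choices made for this illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if `name` consists only of dot-separated labels made of
 * letters, digits and hyphens, where no label is empty or starts or ends
 * with a hyphen.
 */
static bool
hostname_is_valid(const char *name)
{
	size_t label_len = 0;
	size_t total;

	assert(name != NULL);
	total = strlen(name);
	if (total == 0 || total > 253)
		return (false);
	for (size_t i = 0; i < total; i++) {
		unsigned char ch = (unsigned char)name[i];

		if (ch == '.') {
			/* Empty label, or label ending in a hyphen. */
			if (label_len == 0 || name[i - 1] == '-')
				return (false);
			label_len = 0;
			continue;
		}
		if (!isalnum(ch) && ch != '-')
			return (false);	/* '_' is rejected here */
		if (ch == '-' && label_len == 0)
			return (false);	/* label starts with a hyphen */
		if (++label_len > 63)
			return (false);
	}
	/* The final label must be non-empty and must not end in a hyphen. */
	return (label_len > 0 && name[total - 1] != '-');
}
```

With this check, the jail name from the logs, freebsd13_py3, is rejected, while a renamed freebsd13-py3 would pass.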


I proposed a fix upstream and will rename my jail so that its hostname no
longer contains an underscore.

https://github.com/corecode/dma/pull/87

Computers and all the time-consuming little bugs. Arrgh.

Ronald.


Re: dma fails to connect (error:1408F10B:SSL routines:ssl3_get_record:wrong version number)

2020-08-16 Thread Benjamin Kaduk
On Sun, Aug 16, 2020 at 04:44:51PM +0200, Ronald Klop wrote:
> Hi,
> 
> I have uname -UK -> 1300101 1300101 on my laptop. It uses libexec/dma as
> its mail agent.
> I have 2 jails running uname -U -> 1300101 and 1300104. All dma configs  
> are the same.
> 
> In all 1300101 versions dma can deliver mail to my smarthost. On 1300104 I  
> get:
> 
> Aug 16 16:29:00 freebsd13_py3 dma[385ba.800e480a0][52169]: trying remote  
> delivery to smtp.greenhost.nl [213.108.110.112] pref 0
> Aug 16 16:29:00 freebsd13_py3 dma[385ba.800e480a0][52169]:  
> SSL_client_method
> Aug 16 16:29:00 freebsd13_py3 dma[385ba.800e480a0][52169]: remote delivery  
> deferred: SSL handshake failed fatally: error:1408F10B:SSL  
> routines:ssl3_get_record:wrong version number
> 
> Any thoughts on this?
> Bisecting this will take me hours and hours of compilation.

IMO bisecting is not the fastest approach.
"ssl3_get_record:wrong version number" sometimes means "you tried to speak
TLS to an endpoint that's doing plaintext", but if it reflects an actual
TLS version mismatch, a packet capture should make it clear quite quickly.
Note that openssl upstream has been gradually ratcheting the default
settings towards a more-secure state, so if your peer is only using TLS
1.0/1.1, non-AEAD ciphers, etc., a local upgrade might result in a failure
to communicate with the default settings.
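
To illustrate the first case: a TLS record starts with a five-byte header (content type, two version bytes, two length bytes), so a peer that answers in plaintext hands the TLS layer bytes such as "501 " where that header is expected. The following is a minimal sketch in C of a header sanity check. It is not OpenSSL's code, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* TLS record content types (RFC 5246 / RFC 8446). */
enum {
	TLS_CT_CHANGE_CIPHER_SPEC = 20,
	TLS_CT_ALERT = 21,
	TLS_CT_HANDSHAKE = 22,
	TLS_CT_APPLICATION_DATA = 23
};

/*
 * Return true if the first five bytes of `buf` look like a TLS record
 * header: a known content type followed by a version whose major byte is 3.
 */
static bool
looks_like_tls_record(const unsigned char *buf, size_t len)
{
	assert(buf != NULL);
	if (len < 5)
		return (false);
	if (buf[0] < TLS_CT_CHANGE_CIPHER_SPEC ||
	    buf[0] > TLS_CT_APPLICATION_DATA)
		return (false);
	/* SSL 3.0 through TLS 1.3 all use 0x03 as the major version byte. */
	return (buf[1] == 0x03);
}
```

An SMTP reply line fails this check on its very first byte, whereas a handshake record beginning 0x16 0x03 0x03 passes. A packet capture shows directly which of the two the peer sent.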

-Ben


Re: Kernel crash during video transcoding

2020-08-16 Thread Hans Petter Selasky

On 2020-08-16 17:28, Alexandre Levy wrote:

> Now at intel_freebsd.c:193 (frame #17) the driver calls
> vm_page_busy_acquire(m, VM_ALLOC_WAITFAIL). 'm' is the page grabbed from
> vm_obj of the calling frame.


Can you check if "m" is NULL at this point?

--HPS


review of a change to sosend_generic()

2020-08-16 Thread Rick Macklem
Hi,

I put D25923 up on phabricator a little while ago.
I clicked on a couple of people that I thought might like to
review it.

However, if anyone else would like to review it, please do so.
The review is as much about the concept as the actual implementation.

Thanks, rick

Here is the description of it...
The kernel RPC cannot process non-application data records when
  using TLS.  It must do an upcall to a userspace daemon that will
  call SSL_read() to process them.
  
  This patch adds a new flag called MSG_TLSAPPDATA that the kernel
  RPC can use to tell soreceive() to return ENXIO instead of a non-application
  data record, when that is what is at the top of the receive queue.
  
  The code could use any error return that is not normally returned by
  soreceive(). If some other errno is preferred, that can easily be changed.
  
  I also put the code in #ifdef KERN_TLS/#endif, although it will build without
  that, so that it is recognized as only useful when KERN_TLS is enabled.
  
  The alternative to doing this is to have the kernel RPC re-queue the
  non-application data message after receiving it, but that seems more
  complicated and might introduce message ordering issues when there
  are multiple non-application data records one after another.
  
  I do not know what, if any, changes will be required to support TLS1.3.
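
The behaviour described above can be pictured with a small userland model. This is only a sketch of the concept, not the code in D25923: the struct, the function and the flag value (MODEL_MSG_TLSAPPDATA) are hypothetical, and the only facts taken from TLS itself are the record content types, where 23 is application data.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_TLS_APPDATA_TYPE	23	/* TLS application_data content type */
#define MODEL_MSG_TLSAPPDATA	0x1	/* hypothetical flag value */

struct model_record {
	int	type;			/* TLS record content type */
	struct model_record *next;	/* next record in the receive queue */
};

/*
 * Dequeue the record at the head of *queue into *out and return 0.
 * With MODEL_MSG_TLSAPPDATA set, a non-application-data record is left on
 * the queue and ENXIO is returned, so the caller can do an upcall instead.
 */
static int
model_receive(struct model_record **queue, int flags,
    struct model_record **out)
{
	struct model_record *head;

	assert(queue != NULL && out != NULL);
	head = *queue;
	if (head == NULL)
		return (EWOULDBLOCK);	/* nothing queued */
	if ((flags & MODEL_MSG_TLSAPPDATA) != 0 &&
	    head->type != MODEL_TLS_APPDATA_TYPE)
		return (ENXIO);
	*queue = head->next;
	head->next = NULL;
	*out = head;
	return (0);
}
```

Because the record stays at the head of the queue when ENXIO is returned, nothing has to be re-queued and the order of consecutive non-application data records is preserved in this model.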


Re: Kernel crash during video transcoding

2020-08-16 Thread Alexandre Levy
Hi,

I looked at the crash dump and the code more closely:

#18 0x82be1c5f in i915_gem_fault (dummy=<optimized out>,
vmf=<optimized out>)
at
/usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.3_4/drivers/gpu/drm/i915/gem/i915_gem_mman.c:367
(kgdb) p area->vm_obj->lock
$43 = {lock_object = {lo_name = 0x8112c767 "vm object", lo_flags =
627245056, lo_data = 0, lo_witness = 0xf8045f575800}, rw_lock =
18446741878623409920}

So vm_obj is not NULL and has a rw_lock member.

Now at intel_freebsd.c:193 (frame #17) the driver calls
vm_page_busy_acquire(m, VM_ALLOC_WAITFAIL). 'm' is the page grabbed from
vm_obj of the calling frame.

The panic occurs in kern_rwlock.c:270 in frame #15 when
calling rw_wowner(rwlock2rw(c)) so something goes wrong either in rw_wowner
or in rwlock2rw.

Looking at rwlock2rw():

/*
 * Return the rwlock address when the lock cookie address is provided.
 * This functionality assumes that struct rwlock* have a member named
 * rw_lock.
 */
#define rwlock2rw(c)	(__containerof(c, struct rwlock, rw_lock))

I think this one is just extracting the rw_lock member of the passed-in
struct. However, I don't understand what the cookie address is about, due to
my lack of knowledge of kernel locking concepts. So maybe something is
wrong with the cookie or the rw_lock value itself.
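
For readers unfamiliar with the idiom, a container-of macro works in the opposite direction from a member access: it starts from the address of a member (here the "cookie", the address of the rw_lock word) and subtracts the member's offset to get back to the enclosing structure. Below is a minimal userland sketch. The model_* types are stand-ins whose field names mirror the kgdb output above, and their layout is an assumption, not the kernel's definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in types; field names mirror the kgdb output, layout is assumed. */
struct model_lock_object {
	const char *lo_name;
	unsigned int lo_flags;
	unsigned int lo_data;
	void *lo_witness;
};

struct model_rwlock {
	struct model_lock_object lock_object;
	volatile uintptr_t rw_lock;
};

/* Portable equivalent of a container-of macro. */
#define MODEL_CONTAINEROF(ptr, type, member) \
	((type *)(void *)((char *)(ptr) - offsetof(type, member)))

/* From the address of the rw_lock word back to the enclosing rwlock. */
static struct model_rwlock *
model_rwlock2rw(volatile uintptr_t *cookie)
{
	assert(cookie != NULL);
	return (MODEL_CONTAINEROF(cookie, struct model_rwlock, rw_lock));
}
```

The macro itself is pure pointer arithmetic and never reads memory, so in this model a bad cookie only causes trouble once something dereferences the result.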

Looking at rw_wowner():

/*
 * Return a pointer to the owning thread if the lock is write-locked or
 * NULL if the lock is unlocked or read-locked.
 */

#define lv_rw_wowner(v) \
	((v) & RW_LOCK_READ ? NULL : \
	 (struct thread *)RW_OWNER((v)))

#define rw_wowner(rw)	lv_rw_wowner(RW_READ_VALUE(rw))

I don't think that one could cause a panic, but again I'm not experienced
enough to be sure. It seems to return either the thread that owns the
lock or NULL if no thread owns it.
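
The two macros can be modelled in userland to separate the decoding from the memory access. This is a sketch under assumptions: the flag values and the model_* names are illustrative rather than copied from the kernel headers, and it presumes that RW_READ_VALUE loads the lock word through the pointer, as its name suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bits kept in the low bits of the lock word (assumed values). */
#define MODEL_RW_LOCK_READ	((uintptr_t)0x01)
#define MODEL_RW_LOCK_FLAGMASK	((uintptr_t)0x1f)

struct model_thread {
	int	td_tid;
};

/*
 * Decode an already-loaded lock word: NULL if the read bit is set,
 * otherwise the owning thread with the flag bits masked off.
 */
static struct model_thread *
model_lv_rw_wowner(uintptr_t v)
{
	if ((v & MODEL_RW_LOCK_READ) != 0)
		return (NULL);
	return ((struct model_thread *)(v & ~MODEL_RW_LOCK_FLAGMASK));
}

/* Load the lock word through the pointer, then decode it. */
static struct model_thread *
model_rw_wowner(const volatile uintptr_t *lock_word)
{
	assert(lock_word != NULL);
	/* The load below is the only memory access in this model. */
	return (model_lv_rw_wowner(*lock_word));
}
```

In this model the decoding cannot fault, since it only masks an integer. The load of the lock word is the one place where an invalid address would matter.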

There is also the fact that the driver calls vm_page_busy_acquire with the
VM_ALLOC_WAITFAIL flag, which is defined in vm_page.h as:

#define VM_ALLOC_WAITFAIL   0x0010  /* (acf) Sleep and return error */

Could this be the reason for the panic, in that we try to lock, fail, and
eventually just return an error without retrying? There is also the flag
VM_ALLOC_WAITOK, documented as /* (acf) Sleep and retry */. Should I try to
patch intel_freebsd.c to use this flag instead?

Thanks.

On Sat, 15 Aug 2020 at 20:35, Alexandre Levy wrote:

> Hi,
>
> I could finally generate a crash dump even with a black screen. I had to
> guess that I was in the crash handler, so I typed "dump" and pressed Enter,
> which worked. The driver logs "[drm] Cannot find any crtc or sizes", which
> I guess is the reason why I couldn't see anything on my screen.
>
> Back to the initial problem, I could start a kgdb session, loaded the
> i915kms.ko symbols and here are the results :
>
> (kgdb) bt
> #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
> #1  doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:394
> #2  0x8049c26a in db_dump (dummy=<optimized out>,
> dummy2=<optimized out>, dummy3=<optimized out>, dummy4=<optimized out>) at
> /usr/src/sys/ddb/db_command.c:575
> #3  0x8049c02c in db_command (last_cmdp=<optimized out>,
> cmd_table=<optimized out>, dopager=1) at /usr/src/sys/ddb/db_command.c:482
> #4  0x8049bd9d in db_command_loop () at
> /usr/src/sys/ddb/db_command.c:535
> #5  0x8049f048 in db_trap (type=<optimized out>, code=<optimized out>)
> at /usr/src/sys/ddb/db_main.c:270
> #6  0x80c1b374 in kdb_trap (type=3, code=0, tf=<optimized out>) at
> /usr/src/sys/kern/subr_kdb.c:699
> #7  0x8100ca98 in trap (frame=0xfe00d7567300) at
> /usr/src/sys/amd64/amd64/trap.c:576
> #8  <signal handler called>
> #9  kdb_enter (why=0x811d5de0 "panic", msg=<optimized out>) at
> /usr/src/sys/kern/subr_kdb.c:486
> #10 0x80bd00be in vpanic (fmt=<optimized out>, ap=<optimized out>)
> at /usr/src/sys/kern/kern_shutdown.c:902
> #11 0x80bcfe53 in panic (fmt=0x81c8c7c8
> "\b\214\031\201\377\377\377\377") at /usr/src/sys/kern/kern_shutdown.c:839
> #12 0x8100cee7 in trap_fatal (frame=0xfe00d7567600, eva=0) at
> /usr/src/sys/amd64/amd64/trap.c:915
> #13 0x8100c360 in trap (frame=0xfe00d7567600) at
> /usr/src/sys/amd64/amd64/trap.c:212
> #14 <signal handler called>
> #15 _rw_wowned (c=0x2659c92217d5aa52) at
> /usr/src/sys/kern/kern_rwlock.c:270
> #16 0x80ec23ed in vm_page_busy_acquire (m=0xfe00040ff9e8,
> allocflags=16) at /usr/src/sys/vm/vm_page.c:884
> #17 0x82b4e980 in remap_io_mapping (vma=0xf80315148300,
> addr=<optimized out>, pfn=<optimized out>, size=<optimized out>,
> iomap=<optimized out>)
> at
> /usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.3_4/drivers/gpu/drm/i915/intel_freebsd.c:193
> #18 0x82be1c5f in i915_gem_fault (dummy=<optimized out>,
> vmf=<optimized out>)
> at
> /usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.3_4/drivers/gpu/drm/i915/gem/i915_gem_mman.c:367
> #19 0x82cb5ddf in linux_cdev_pager_populate
> (vm_obj=0xf80368501420, pidx=<optimized out>, fault_type=<optimized out>,
> max_prot=<optimized out>,
> first=0xfe00d7567868, last=0xfe00d7567888) at
> /usr/src/sys/compat/linuxkpi/common/src/linux_compat.c:554
> #20 0x80ea9e8f in vm_pager_populate (object=0x2659c92217d5aa52,
> 

Re: CFT for vendor openzfs - week 5 reminder + memdisk images

2020-08-16 Thread Walter von Entferndt
On Sunday, 16 August 2020, 14:00:00 CEST, Matthew Macy wrote:
> Yes, this appears to have been going on for at least the last week.
> The FreeBSD infrastructure directly available to developers appears to
> be unreliable for serving large files. Individuals with accounts on
> freefall have been able to scp the files. It's possible that we may
> just end up sharing images more widely by way of releng generated
> images after commit. I'll see if there's an alternative for the last
> week of the CFT.
Why not use torrent as a workaround?
-- 
=|o)"Stell' Dir vor es geht und keiner kriegt's hin." (Wolfgang Neuss)




dma fails to connect (error:1408F10B:SSL routines:ssl3_get_record:wrong version number)

2020-08-16 Thread Ronald Klop

Hi,

I have uname -UK -> 1300101 1300101 on my laptop. It uses libexec/dma as
its mail agent.
I have 2 jails running uname -U -> 1300101 and 1300104. All dma configs  
are the same.


In all 1300101 versions dma can deliver mail to my smarthost. On 1300104 I  
get:


Aug 16 16:29:00 freebsd13_py3 dma[385ba.800e480a0][52169]: trying remote  
delivery to smtp.greenhost.nl [213.108.110.112] pref 0
Aug 16 16:29:00 freebsd13_py3 dma[385ba.800e480a0][52169]:  
SSL_client_method
Aug 16 16:29:00 freebsd13_py3 dma[385ba.800e480a0][52169]: remote delivery  
deferred: SSL handshake failed fatally: error:1408F10B:SSL  
routines:ssl3_get_record:wrong version number


Any thoughts on this?
Bisecting this will take me hours and hours of compilation.

Regards,
Ronald.