Re: Maxphys on -current?

2023-08-04 Thread Jaromír Doleček
On Fri, 4 Aug 2023 at 17:27, Jason Thorpe wrote: > If someone does pick this up, I think it would be a good idea to start from > scratch, because MAXPHYS, as it stands, is used for multiple things. > Thankfully, I think it would be relatively straightforward to do the work > that I am sugge

Re: Module autounload proposal: opt-in, not opt-out

2022-08-07 Thread Jaromír Doleček
On Mon, 8 Aug 2022 at 07:23, Taylor R Campbell wrote: > I propose we remove the assumption that ENOTTY means no objection, and > switch from opt-out to opt-in: if a module _has_ been audited to > verify MODULE_CMD_AUTOUNLOAD is safe, it can return 0; otherwise it > will never be autounloaded, e

Re: maximum limit of files open with O_EXLOCK

2021-06-19 Thread Jaromír Doleček
On Sat, 19 Jun 2021 at 10:12, nia wrote: > The Zig developer found the kernel limit: > https://nxr.netbsd.org/xref/src/sys/kern/vfs_lockf.c#116 > > but it doesn't seem to be adjustable through sysctl. > I wonder if it should be. Yes, it should be a sysctl. The default should probably also be bu

Re: Ext4 support

2021-04-29 Thread Jaromír Doleček
On Thu, 29 Apr 2021 at 22:06, Vincent DEFERT <20@defert.com> wrote: > I'd like to have full ext4 support so an ext4-formatted disk could be > used to exchange data between Linux and NetBSD, for instance. Most of the commonly used features for this should actually be implemented. Big one mi

Re: one remaining mystery about the FreeBSD domU failure on NetBSD XEN3_DOM0

2021-04-14 Thread Jaromír Doleček
On Wed, 14 Apr 2021 at 03:21, Greg A. Woods wrote: > However their front-end code does detect it and seems to make use of it, > and has done for some 6 years now according to "git blame" (with no > recent fixes beyond fixing a memory leak on their end). Here we see it > live from FreeBSD's sys

Re: I think I've found why Xen domUs can't mount some file-backed disk images! (vnd(4) hides labels!)

2021-04-11 Thread Jaromír Doleček
On Sun, 11 Apr 2021 at 17:51, Robert Elz wrote: > > Date: Sun, 11 Apr 2021 14:25:40 - (UTC) > From: mlel...@serpens.de (Michael van Elst) > Message-ID: > > | + dg->dg_secperunit = vnd->sc_size / DEV_BSIZE; > > While it shouldn't make any difference for an

Re: Bounties for xhci features: scatter-gather, suspend/resume

2021-03-31 Thread Jaromír Doleček
On Thu, 25 Mar 2021 at 21:36, wrote: > xHCI scatter-gather support > > The infamous "failed to create xfers". xhci wants contiguous > allocations. With too much memory fragmentation, they're hard to do. > This shows up as USB drivers randomly failing on machines that have been > Up for a while.

Re: Issues with older wd* / IDE on various platforms

2020-11-13 Thread Jaromír Doleček
On Fri, 13 Nov 2020 at 22:31, Robert Swindells wrote: > > > John Klos wrote: > >I've noticed problems in three places and only recently did it occur to me > >that they may all be related. > > [snip] > > >All three of these machines have much older IDE, so I'm wondering what in > >NetBSD change

Re: notes from running will-it-scale

2020-07-19 Thread Jaromír Doleček
Very interesting, particularly the outrageous assembly for pmap_{zero,copy}_page(). Is there some way to tell the compiler that the address is already 4096-aligned and avoid the conditionals? Failing that, we could just adopt the FreeBSD assembly for this. Does anyone see a problem with introduci

Re: kernel stack usage

2020-07-04 Thread Jaromír Doleček
On Sat, 4 Jul 2020 at 15:30, Kamil Rytarowski wrote: > > Kamil - what's the difference in gcc between -Wframe-larger-than= and > > -Wstack-size= ? > > > > -Wstack-size doesn't exist? Sorry, meant -Wstack-usage= > > I see according to gcc documentation -Wframe-larger-than doesn't count > > si

Re: kernel stack usage

2020-07-04 Thread Jaromír Doleček
ays, which makes it much less useful than -Wstack-usage. Jaromir On Sun, 31 May 2020 at 16:39, Kamil Rytarowski wrote: > > Can we adopt -Wframe-larger-than=1024 and mark it fatal? > > This option is supported by GCC and Clang. > > On 31.05.2020 15:55, Jaromír Doleček wrote: > &

Re: Straw proposal: MI kthread vector/fp unit API

2020-06-23 Thread Jaromír Doleček
On Tue, 23 Jun 2020 at 00:14, Eduardo Horvath wrote: > The SPARC has always had a lazy FPU save logic. The fpstate structure is > not part of the pcb and is allocated on first use. No lazy FPU save logic please. It was eradicated from x86 for a reason: https://access.redhat.com/solutions/3485

Re: UBSan: Undefined Behavior in lf_advlock

2020-06-05 Thread Jaromír Doleček
On Fri, 5 Jun 2020 at 21:49, syzbot wrote: > [ 44.1699615] panic: UBSan: Undefined Behavior in > /syzkaller/managers/netbsd-kubsan/kernel/sys/kern/vfs_lockf.c:843:16, signed > integer overflow: 131072 + 9223372036854771712 cannot be represented in type > 'long int' > > [ 44.1931600] cpu0:
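
The panic quoted above reports a signed overflow when adding 131072 and 9223372036854771712 in vfs_lockf.c. A minimal sketch, in plain userland C, of one way to guard such an addition follows. range_end() is a hypothetical helper written for illustration; it is not the function in vfs_lockf.c and not necessarily the fix that was applied there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the vfs_lockf.c code): compute start + len
 * without ever evaluating an overflowing signed addition.
 * Returns false when the sum does not fit in int64_t.
 */
bool
range_end(int64_t start, int64_t len, int64_t *end)
{
	if (len > 0 && start > INT64_MAX - len)
		return false;	/* would overflow upward */
	if (len < 0 && start < INT64_MIN - len)
		return false;	/* would overflow downward */
	*end = start + len;
	return true;
}

/* Usage with the operands from the UBSan report. */
void
range_end_demo(void)
{
	int64_t end;

	/* 131072 + 9223372036854771712 exceeds INT64_MAX, so it is rejected. */
	assert(!range_end(131072, INT64_C(9223372036854771712), &end));
}
```

The check is done by comparing against INT64_MAX - len before adding, so the overflowing expression itself is never evaluated.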

UEFI boot and PCI interrupt routing (amd64)

2020-06-03 Thread Jaromír Doleček
Hi, I'm working on a driver for some PCI device, I'm far enough to execute operations which should trigger interrupt, but the interrupt handler (registered via pci_intr_establish()) is not called. It looks like there is some kind of interrupt routing problem, maybe. Any hints on what could/should

Re: kernel stack usage

2020-05-31 Thread Jaromír Doleček
I think it would make sense to add -Wstack-usage=X to kernel builds. Either 2KB or 1KB seems to be a good limit; there are not too many offenders between 1KB and 2KB, so it may be worth using 1KB and changing them. I'm sure there would be something similar for LLVM too. Jaromir On Sun, 31 May 2020 at 15:39, Si
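
As a rough illustration of the kind of change such a warning asks for: a large on-stack scratch buffer can be moved to the heap. The sketch below is userland C with hypothetical names (SCRATCH_SIZE, sum_via_heap_scratch); it is not taken from any of the functions discussed in the thread, and kernel code would use a kernel allocator such as kmem_alloc(9) rather than malloc(3).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SCRATCH_SIZE 4096	/* hypothetical scratch buffer size */

/*
 * Sums len bytes after staging them in a scratch buffer.
 * Declaring "unsigned char scratch[SCRATCH_SIZE];" as a local would
 * put 4 KiB on the stack, which a 1 KiB -Wstack-usage limit is meant
 * to flag. Allocating the buffer instead keeps the frame small.
 * Returns -1 on oversized input or allocation failure.
 */
long
sum_via_heap_scratch(const unsigned char *data, size_t len)
{
	unsigned char *scratch;
	long sum = 0;

	if (len > SCRATCH_SIZE)
		return -1;
	scratch = malloc(SCRATCH_SIZE);
	if (scratch == NULL)
		return -1;
	memcpy(scratch, data, len);
	for (size_t i = 0; i < len; i++)
		sum += scratch[i];
	free(scratch);
	return sum;
}

/* Small usage example. */
void
sum_demo(void)
{
	const unsigned char bytes[] = { 1, 2, 3 };

	assert(sum_via_heap_scratch(bytes, sizeof(bytes)) == 6);
}
```

The trade-off is an allocation that can fail and must be freed on every exit path, which is why such conversions are usually done function by function.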

Re: kernel stack usage

2020-05-30 Thread Jaromír Doleček
On Sat, 30 May 2020 at 18:41, Jason Thorpe wrote: > These two seem slightly bogus. coredump_note_elf64() was storing register > state not the stack, but not nearly 3K worth. procfs_domounts() has nearly > nothing on the stack as far as I can tell, and the one function that could be > auto-i

Re: kernel stack usage

2020-05-30 Thread Jaromír Doleček
I've fixed several where I felt comfortable, feel free to do more: 4096 pci_conf_print at pci_subr.c:4812; 4096 dtv_demux_read at dtv_demux.c:493; 3408 genfb_calc_hsize at genfb.c:630; 2240 bwfm_rx_event_cb at bwfm.c:2099; 1664 wdcprobe_with_reset at wdc.c:491. Jaromir On Sat, 30 May 20

ACPI SCI interrupt flooding - any ideas?

2020-04-27 Thread Jaromír Doleček
Hi, we have the SCI interrupt storm - http://gnats.netbsd.org/53687 Linux has this quirk, and it claims such thing happens due to some unsupported hw features. Would it make sense to have some quirk like that? https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9c4aa1ee

Netowork interface deferred start - does it work?

2020-03-27 Thread Jaromír Doleček
Hi, I'm looking into some performance improvements for xennet(4) and particularly xvif(4). The drivers use the deferred start and there are lots of comments on how important it is to batch requests, but actually it's really rare that the start routine would ever be called with more than one mbuf in

Re: config_mounroot - spinout while attaching nouveaufb0 on amd64 with LOCKDEBUG

2020-02-19 Thread Jaromír Doleček
On Mon, 17 Feb 2020 at 17:55, matthew green wrote: > > FWIW, i've been running my radeon with a patch that exlicitly drops > kernel lock around the "real attach" function (the one that config > mountroot ends up calling.) > > we really need to MPSAFE-ify the autoconf subsystem. right now, it

config_mounroot - spinout while attaching nouveaufb0 on amd64 with LOCKDEBUG

2020-02-16 Thread Jaromír Doleček
Hi, while debugging the MSI attachment for nouveaufb0, I've several times hit a spinout panic like the one below. It doesn't happen on every boot, but on almost every one. I confirmed via ddb that this happens due to config_mountroot_thread() holding the kernel lock for too long - that's where backtrac

Re: Proposal to remove netsmb / smbfs

2020-01-20 Thread Jaromír Doleček
On Mon, 20 Jan 2020 at 21:07, Jason Thorpe wrote: > > (Cross-posted to tech-kern / tech-net because it affects networking and file > systems.) > > I would like to propose that we remove netsmb and smbfs. Two reasons: Yes, please. > 1- They only support SMB1, which is an ancient flavor of t

Re: New "ahcisata0 port 1: device present" messages with NetBSD 9

2020-01-17 Thread Jaromír Doleček
All right, let's do that. Penalising quirky old gear makes more sense than penalising new gear. I didn't like that the controller approach means we penalise ALL drives on the controller. However, it's better than penalising EVO drives everywhere, when we have evidence it works without errors on ot

Re: New "ahcisata0 port 1: device present" messages with NetBSD 9

2020-01-13 Thread Jaromír Doleček
On Mon, 13 Jan 2020 at 11:56, Simon Burge wrote: > A bit more digging shows that this seems to be a (somewhat) known > problem with Samsung EVO 860 disks and AMD SB710/750 chipsets. The > problem also occurs on Windows and Linux with these drives and chipsets. > > Here's a couple of links: >

Re: adding linux syscall fallocate

2019-11-16 Thread Jaromír Doleček
On Sat, 16 Nov 2019 at 17:14, HRISHIKESH GOYAL wrote: > Does *posix_fallocate()* implemented (has support) for any of the > underlying filesystems in NetBSD/FreeBSD? > As far as I know no filesystem actually implements the support in NetBSD. > Also, as per my understanding, all calls to *po

Re: adding linux syscall fallocate

2019-11-09 Thread Jaromír Doleček
found this: > https://wiki.netbsd.org/projects/project/ffs-fallocate/ > Are there any details for this project besides that page? I don't know > anything about NetBSD internals though if it's not meant for gurus, I'd > have a look at it and give it a try. > > Best rega

Re: adding linux syscall fallocate

2019-11-03 Thread Jaromír Doleček
On Sun, 3 Nov 2019 at 08:57, r0ller wrote: > As you can see on the attached screenshot, "line 4741" gets printed out. So I > went on to check what happens in VOP_FALLOCATE but it gets really internal > there. > > Does anyone have any hint? fallocate VOP is not implemented for FFS: > grep fa

Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-27 Thread Jaromír Doleček
On Thu, 26 Sep 2019 at 18:08, Manuel Bouyer wrote: > > On Thu, Sep 26, 2019 at 05:10:01PM +0200, Maxime Villard wrote: > > issues for a clearly marginal use case, and given the current general > ^^^ > > This is where we dissagree. You guess it's marginal bu

Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-26 Thread Jaromír Doleček
On Thu, 26 Sep 2019 at 10:54, Maxime Villard wrote: > I recently made a big set of changes to fix many bugs and vulnerabilities in > compat_linux and compat_linux32, the majority of which have a security impact > bigger than the Intel CPU bugs we hear about so much. These compat layers are > e

Re: Enable functionality by default

2019-04-16 Thread Jaromír Doleček
On Tue, 16 Apr 2019 at 17:55, Sevan Janiyan wrote: > altq, veriexec, BUFQ_PRIOCSCAN, CARP > ... > Are there any reasons not enable these features by default on all ports > (with the exception of resource constrained systems such as sun2, sun3, > zaurus) ? I think enabling veriexec used to incu

Re: Removing PF

2019-04-01 Thread Jaromír Doleček
On Mon, 1 Apr 2019 at 14:32, Stephen Borrill wrote: >Your two statements are mutually inconsistent: > 1) No-one is maintaining ipf or pf > and > 2) If the effort had been on one firewall instead of three, the one chosen > would be more functional. IMO it's consistent - if we had one, it would

Re: Regarding the ULTRIX and OSF1 compats

2019-03-16 Thread Jaromír Doleček
On Sat, 16 Mar 2019 at 16:12, Robert Elz wrote: > Sorry, I must have missed that. All I ever seem to see is that xxx is > unmaintained and full of unspecified bugs, and that obviously no-one cares, > and so we should delete it. That's not an argument for anything. You suggest that there a

Re: Reserve device major numbers for pkgsrc

2019-02-16 Thread Jaromír Doleček
Perhaps do not pre-reserve anything and simply reserve consecutive numbers as the need arises? On Sat, 16 Feb 2019 at 23:55, Kamil Rytarowski wrote: > > On 16.02.2019 23:40, Jaromír Doleček wrote: > > On Sat, 16 Feb 2019 at 23:24, Kamil Rytarowski wrote: > >> >

Re: Reserve device major numbers for pkgsrc

2019-02-16 Thread Jaromír Doleček
On Sat, 16 Feb 2019 at 23:24, Kamil Rytarowski wrote: > > We started to build and ship kernel modules through pkgsrc. > > I would like to reserve 3 major numbers for the HAXM case from the base > pool of devices and prevent potential future conflicts and compatibility > breakage due to picking

Re: scsipi: physio split the request

2018-12-28 Thread Jaromír Doleček
> On Dec 27, 12:29pm, buh...@nfbcal.org (Brian Buhrow) wrote: > -- Subject: Re: scsipi: physio split the request > > | hello. Just out of curiosity, why did the tls-maxphys branch never > | get merged with head once the work was done or mostly done? Simply nobody finished it up yet. Le jeu

Re: scsipi: physio split the request

2018-12-28 Thread Jaromír Doleček
On Thu, 27 Dec 2018 at 15:41, Emmanuel Dreyfus wrote: > > On Thu, Dec 27, 2018 at 02:33:28PM +, Christos Zoulas wrote: > > I think you need resurrect the tls-maxphys branch... It was close to working > > IIRC. > > What happens if I just #define MAXPHYS (1024*1204*1024) ? Several drivers us

Re: svr4, again

2018-12-18 Thread Jaromír Doleček
On Tue, 18 Dec 2018 at 13:16, Maxime Villard wrote: > It is clear that COMPAT_SVR4 is completely buggy, but to be clear on the > use of the code: +1 to removal for COMPAT_SVR4, there is always attic. I remember I've also been doing some mechanical changes in the area in the past, and also encount

Re: [uvm_hotplug] Fixing the build of tests

2018-12-15 Thread Jaromír Doleček
On Sat, 15 Dec 2018 at 15:27, Jason Thorpe wrote: > We can buy ourselves some time on aarch64 by using a different page size, > yes? iOS, for example, uses 16KiB VM pages (backed by 4KiB or 16KiB physical > pages, depending on the specific CPU type). I don't think the ARM eabi has > the sa

Re: pci_intr_alloc() vs pci_intr_establish() - retry type?

2018-12-05 Thread Jaromír Doleček
On Wed, 5 Dec 2018 at 08:39, Masanobu SAITOH wrote: > I suspect Serial ATA AHCI 1.2.1 specification page 111 has the hint. > > Figure 23: Port/CCC and MSI Message Mapping, Example 1 > Figure 24: Port and MSI Message Mapping, Example 2 > > I suspect MSI-X also assume this layout. pci_msix_

Re: pci_intr_alloc() vs pci_intr_establish() - retry type?

2018-12-04 Thread Jaromír Doleček
I've now disabled MSI-X for ahcisata(4), need to figure out what needs to be set up there. On Mon, 3 Dec 2018 at 22:41, Jared McNeill wrote: > IIUC we don't actually have confirmation that ThunderX AHCI works yet.. > Nick is having issues with his board. Okay - I misunderstood, thought it was conf

Re: pci_intr_alloc() vs pci_intr_establish() - retry type?

2018-12-03 Thread Jaromír Doleček
On Mon, 3 Dec 2018 at 12:09, Masanobu SAITOH wrote: > C3000's AHCI has multi-vector MSI-X table and it doesn't work since > November 20th... Can you try whether by chance this code, adapted from nvme(4), changes anything on your system? http://www.netbsd.org/~jdolecek/ahcisata_msixoff.diff ahcisata(4) w

Re: pci_intr_alloc() vs pci_intr_establish() - retry type?

2018-11-27 Thread Jaromír Doleček
On Wed, 28 Nov 2018 at 00:42, Taylor R Campbell wrote: > > Date: Tue, 27 Nov 2018 21:14:04 + > > From: Robert Swindells > > > > It looks to me that drivers try MSI and/or MSIX first based on the > > device type not on whether the host controller can handle it. pci_intr_alloc() checks what

pci_intr_alloc() vs pci_intr_establish() - retry type?

2018-11-27 Thread Jaromír Doleček
I see several drivers (e.g. xhci(4), ahcisata(4), bge(4), nvme(4)) retry the pci_intr_alloc()+pci_intr_establish() with 'lower' types when pci_intr_establish() fails. Is this a real case? Can it actually happen that pci_intr_alloc() returns the interrupt handlers, but pci_intr_establish() would fai
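
The retry pattern described above can be sketched as a small standalone model. Everything in it is hypothetical: the enum, the callback type and the demo backends are stand-ins invented for illustration, and they do not reproduce the real pci_intr_alloc() or pci_intr_establish() signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Interrupt types ordered from most to least preferred. */
enum intr_type { INTR_MSIX, INTR_MSI, INTR_INTX, INTR_NONE };

/* Hypothetical stand-in for the alloc and establish steps. */
typedef bool (*intr_step_fn)(enum intr_type);

/*
 * Try each type in order and fall back to the next "lower" type when
 * either step fails. Returns the type that worked, or INTR_NONE.
 */
enum intr_type
intr_setup_with_fallback(intr_step_fn alloc, intr_step_fn establish)
{
	static const enum intr_type order[] = {
		INTR_MSIX, INTR_MSI, INTR_INTX
	};

	for (size_t i = 0; i < sizeof(order) / sizeof(order[0]); i++) {
		if (!alloc(order[i]))
			continue;	/* this type is not available */
		if (establish(order[i]))
			return order[i];
		/* a real driver would release the allocation here */
	}
	return INTR_NONE;
}

/* Toy backends: allocation always works, establishing MSI-X fails. */
bool demo_alloc(enum intr_type t) { (void)t; return true; }
bool demo_establish(enum intr_type t) { return t != INTR_MSIX; }

void
intr_demo(void)
{
	/* MSI-X is allocated but cannot be established, so MSI is used. */
	assert(intr_setup_with_fallback(demo_alloc, demo_establish) ==
	    INTR_MSI);
}
```

The toy backends model exactly the case the message asks about: allocation succeeds for a type, but establishing it fails, so the loop moves on to the next type.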

Re: 8.0 performance issue when running build.sh?

2018-08-09 Thread Jaromír Doleček
2018-08-09 19:40 GMT+02:00 Thor Lancelot Simon : > On Thu, Aug 09, 2018 at 10:10:07AM +0200, Martin Husemann wrote: >> 100.00 2054 14.18 kernel_lock >> 47.43 846 6.72 kernel_lock fileassoc_file_delete+20 >> 23.73 188 3.36 kernel_lock intr

Re: panic: biodone2 already

2018-08-07 Thread Jaromír Doleček
2018-08-07 18:42 GMT+02:00 Emmanuel Dreyfus : > kern/53506 Thanks. Could you please try a -current kernel for DOMU and see if it crashes the same? If possible a DOMU kernel from daily builds, to rule out local compiler issue. There are not really many differences in xbd/evtchn code itself betwe

Re: panic: biodone2 already

2018-08-06 Thread Jaromír Doleček
This is always a bug: the driver processes the same buf twice. It can do harm. If the buf is reused for some other I/O, the system can fail to store data, or claim to have read data when it didn't. Can you give a full backtrace? Jaromir 2018-08-06 17:56 GMT+02:00 Emmanuel Dreyfus : > Hello > > I have a Xen domU th

Re: Removing dbregs

2018-07-13 Thread Jaromír Doleček
2018-07-13 22:54 GMT+02:00 Kamil Rytarowski : > I disagree with disabling it. The code is not broken, it's covered by > tests, it's in use. This looks like a perfect candidate for an optional (default off) feature. It is useless and dangerous for general purpose use by virtue of being root only, but us

Re: CVS commit: src/sys/arch/x86/x86

2018-07-08 Thread Jaromír Doleček
On Sun, 8 Jul 2018 at 15:29, Kamil Rytarowski wrote: > I've introduced the change to mpbios.c as it was small, selfcontained > and without the need to decorate the whole function. Am I reading the code wrong, or did you actually introduce a bug in mpbios.c? Shouldn't this: memtop |= (uint16_t)mpb

Re: interrupt cleanup #1

2018-06-24 Thread Jaromír Doleček
On Sun, 24 Jun 2018 at 12:13, Cherry G.Mathew wrote: > Sometime tomorrow I'll send a first re-org patch (no functional changes) > followed by the actual meat later in the week. I have some small changes for x86/intr.c and xen/evtchn.c too. In intr.c pretty much just some #ifdef shuffle to ma

Re: Fixing excessive shootdowns during FFS read (kern/53124) on x86 - emap, direct map

2018-06-10 Thread Jaromír Doleček
2018-05-12 21:24 GMT+02:00 Chuck Silvers : > the problem with keeping the pages locked (ie. PG_BUSY) while accessing > the user address space is that it can lead to deadlock if the page Meanwhile I've tested this scenario - wrote a test program to do mmap(2) for file, then calling write() using th

Re: Revisiting uvm_loan() for 'direct' write pipes

2018-06-10 Thread Jaromír Doleček
2018-05-25 23:19 GMT+02:00 Jason Thorpe : > BTW, I was thinking about this, and I think you need to also handle the case > where you try the ubc_uiomove_direct() case, but then *fall back* onto the > non-direct case if some magic error is returned. > ... > These cache flushes could be potentially v

Re: Revisiting uvm_loan() for 'direct' write pipes

2018-05-25 Thread Jaromír Doleček
2018-05-21 21:49 GMT+02:00 Jaromír Doleček : > It turned out uvm_loan() incurs most of the overhead. I'm still on my > way to figure what it is exactly which makes it so much slower than > uiomove(). I've now pinned the problem down to the pmap_page_protect(..., VM_PROT_READ),

Revisiting uvm_loan() for 'direct' write pipes

2018-05-21 Thread Jaromír Doleček
Hello, I've been playing a little with revisiting kern/sys_pipe.c to take advantage of the direct map in order to avoid the pmap_enter() et al. via the new uvm_direct_process() interface. Mostly since I want to have at least one other consumer of the interface before I consider it as final, to make s

Re: Fixing excessive shootdowns during FFS read (kern/53124) on x86 - emap, direct map

2018-05-13 Thread Jaromír Doleček
2018-05-12 21:24 GMT+02:00 Chuck Silvers : > the problem with keeping the pages locked (ie. PG_BUSY) while accessing > the user address space is that it can lead to deadlock if the page > we are accessing via the kernel mapping is the same page that we are > accessing via the user mapping. The pag

Re: Fixing excessive shootdowns during FFS read (kern/53124) on x86 - emap, direct map

2018-05-12 Thread Jaromír Doleček
2018-05-12 7:07 GMT+02:00 Michael van Elst : > On Fri, May 11, 2018 at 05:28:18PM +0200, Jaromír Doleček wrote: >> I've implemented a tweak to read-ahead code to skip the full >> read-ahead if last page of the range is already in cache, this >> improved things a lot: > > That looks a bit like a hac

Removing uvm_emap (aka ephemeral mapping interface)?

2018-05-11 Thread Jaromír Doleček
Hello, I suggest removing the emap code. Unfortunately it turned out to be too tricky to implement and use correctly. It's not currently used anywhere; the only use in sys_pipe.c was ifdeffed out due to stability problems on 2009-08-31. While the commit states the stability problems were present also

Re: Fixing excessive shootdowns during FFS read (kern/53124) on x86 - emap, direct map

2018-05-11 Thread Jaromír Doleček
uristics for read-ahead is good enough for general case. Jaromir 2018-04-19 22:39 GMT+02:00 Jaromír Doleček : > I've finally got my test rig setup, so was able to check the > performance difference when using emap. > > Good news there is significant speedup on NVMe device, without >

Re: Fixing excessive shootdowns during FFS read (kern/53124) on x86 - emap, direct map

2018-04-19 Thread Jaromír Doleček
21:28 GMT+02:00 Jaromír Doleček : > 2018-03-31 13:42 GMT+02:00 Jaromír Doleček : >> 2018-03-25 17:27 GMT+02:00 Joerg Sonnenberger : >>> Yeah, that's what ephemeral mappings where supposed to be for. The other >>> question is whether we can't just use the di

Re: Fixing excessive shootdowns during FFS read (kern/53124) on x86 - emap, direct map

2018-04-02 Thread Jaromír Doleček
2018-03-31 13:42 GMT+02:00 Jaromír Doleček : > 2018-03-25 17:27 GMT+02:00 Joerg Sonnenberger : >> Yeah, that's what ephemeral mappings where supposed to be for. The other >> question is whether we can't just use the direct map for this on amd64 >> and similar platfo

Re: Fixing excessive shootdowns during FFS read (kern/53124) on x86

2018-03-31 Thread Jaromír Doleček
2018-03-25 17:27 GMT+02:00 Joerg Sonnenberger : > Yeah, that's what ephemeral mappings where supposed to be for. The other > question is whether we can't just use the direct map for this on amd64 > and similar platforms? Right, we could/should use emap. I haven't realized emap is actually already

Re: Fixing excessive shootdowns during FFS read (kern/53124) on x86

2018-03-27 Thread Jaromír Doleček
2018-03-27 0:51 GMT+02:00 Michael van Elst : > UBC_WINSHIFT=17 (i.e. 2*MAXPHYS) works fine, but sure, for KVA limited > platforms you have to be careful. On the other hand, it's a temporary > mapping that only exists while doing I/O. How many concurrent I/O > operations would you expect on a 32bit
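
The window-size arithmetic in this exchange can be checked with a short sketch. It assumes MAXPHYS is 64 KiB, which is what the quoted "UBC_WINSHIFT=17 (i.e. 2*MAXPHYS)" implies; the function names are hypothetical and the code is plain userland C, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: MAXPHYS is 64 KiB, as implied by "17 (i.e. 2*MAXPHYS)". */
#define MAXPHYS_BYTES	((size_t)64 * 1024)

/* Window size in bytes for a given shift: 1 << shift. */
size_t
ubc_win_bytes(unsigned int shift)
{
	return (size_t)1 << shift;
}

/* Number of windows needed to cover len bytes, rounded up. */
size_t
ubc_windows_for(size_t len, unsigned int shift)
{
	size_t win = ubc_win_bytes(shift);

	return (len + win - 1) / win;
}

void
ubc_demo(void)
{
	/* A shift of 17 gives 128 KiB, i.e. twice the assumed MAXPHYS. */
	assert(ubc_win_bytes(17) == 2 * MAXPHYS_BYTES);
}
```

Under these assumptions, a 1 MiB transfer is covered by 128 windows at a shift of 13 (8 KiB) and by 8 windows at a shift of 17, which is the trade-off against kernel virtual address space that the message raises for KVA-limited platforms.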

Re: Fixing excessive shootdowns during FFS read (kern/53124) on x86

2018-03-26 Thread Jaromír Doleček
2018-03-25 13:48 GMT+02:00 Michael van Elst : > For some reason it's not per block. There is a mechanism that moves > an 8kbyte window, independent of the block size. You can easily change > the window size (I'm currently experimenting with 128kbyte) by building > a kernel with e.g. options UBC_WIN

Fixing excessive shootdowns during FFS read (kern/53124) on x86

2018-03-25 Thread Jaromír Doleček
Hello, I'd like to gather some feedback on how to best tackle kern/53124. The problem there is that FFS triggers a pathologic case. I/O transfer maps and then unmaps each block into kernel pmap, so that the data could be copied into user memory. This triggers TLB shootdown IPIs for each FS block,

Re: amd64: svs

2018-01-11 Thread Jaromír Doleček
2018-01-10 19:56 GMT+01:00 Maxime Villard : > That's what we do, too. Doing this is faster than "unmapping the kernel pages > on return to user space". > > Switching the address space costs one "movq %rax,%cr3". Yes, but address space switch also causes an implicit TLB/paging structure cache flush

Re: amd64: svs

2018-01-09 Thread Jaromír Doleček
2018-01-08 10:24 GMT+01:00 Maxime Villard : > > As far as SVS is concerned, it is not needed: each time an L4 slot is added > (pmap_get_ptp) or removed (pmap_free_ptp), SVS only applies the change in the > user page tables. > > The TLB is then flushed as usual: the slots that are stale in the pmap

Re: amd64: svs

2018-01-07 Thread Jaromír Doleček
BTW Maxime, I've updated https://en.wikipedia.org/wiki/Supervisor_Mode_Access_Prevention to note your work on NetBSD support for SMAP and SMEP. Would it be feasible to get SMAP support pulled up to netbsd-8 by chance? The AMD Epyc processors support SMEP/SMAP as well, hopefully this will eventuall

Re: increase softint_bytes

2017-11-16 Thread Jaromír Doleček
softint_establish() already fails if there is no space, and prints a WARNING in this case. The caller however needs to check for the failure, which ixgbe(4) mostly doesn't. Not sure if that is really the root cause, the panic seems to be actually in different area, and I didn't actually see the war

Re: ext3 support

2017-10-30 Thread Jaromír Doleček
I share pretty much the same sentiment. ext3/ext4 journalling support is not very useful, as it's not likely to be used for anything more critical than sharing files between Linux and NetBSD. Ext3 journal would have to be a complete reimplementation - NetBSD wapbl is not usable nor sufficient for ex

Re: Deadlock on fragmented memory?

2017-10-24 Thread Jaromír Doleček
Thanks for doing this. How difficult do you think it would be to convert the execargs to use 4K pages rather than continuous KVA? Are there any particular gotchas? I would be interested to look at this - it sounds like a nice high-kernel only project for me, to make a break from the ATA oddities. Ja

Removal of 'wd* at umass?' driver?

2017-10-08 Thread Jaromír Doleček
Hello, The code for this umass attachment (dev/usb/umass_isdata.c) doesn't use the atabus, and sidesteps the normal atabus processing. Today I discovered that after the NCQ branch was merged, some kernels which included dev/usb/usbdevices.config but no atabus failed to compile due to missing symbo

Re: Snapshot performances

2017-09-26 Thread Jaromír Doleček
The WAPBL resource exhaustion caught my attention. Does this still happen with netbsd-8? I've fixed something like this in WAPBL a while ago, and the fixes are on the release branch. Jaromir 2017-09-26 14:52 GMT+02:00 Edgar Fuß : > > How is an internal snapshot defined? > I'm not completely sur

Re: Proposal: Disable autoload of compat_xyz modules

2017-08-01 Thread Jaromír Doleček
I like that all the arch-specific code is under sys/arch, and not randomly spread around tree, i.e. I prefer to keep the compat things under sys/arch. For sure, same argument could be used the opposite direction, that it would be neater to have all the compat code together. But IMO the arch-speci

SATA PMP and ATAPI - apparently not working, is it supposed to?

2017-06-28 Thread Jaromír Doleček
Hi, during my testing on jdolecek-ncq branch, I found that when I attach the ATAPI device via a port multiplier, the device is never actually detected. The behaviour I see is that atapibus is detected on the port, but not the actual device cd1, like this: ... atabus10 at siisata0 channel 0 ... at

Re: HEADS-UP: jdolecek-ncq branch merge imminent

2017-06-28 Thread Jaromír Doleček
ng XXXs in siisata(4) prevent the branch to be merged? Jaromir 2017-06-13 22:34 GMT+02:00 Jaromír Doleček : > I'll see if I can do some chaos monkey thing for wd(4) to test the error > paths. > > I'm not quite sure what kind of special error recovery would be necessary > t

Re: HEADS-UP: jdolecek-ncq branch merge imminent

2017-06-14 Thread Jaromír Doleček
ere anything on HEAD which is useful to sync to the branch? I'm wondering whether it's useful to sync it to ncq branch right now at all. Jaromir 2017-06-08 10:45 GMT+02:00 Jonathan A. Kollasch : > On Wed, Jun 07, 2017 at 09:45:48PM +0200, Jaromír Doleček wrote: > > Hello, > >

HEADS-UP: jdolecek-ncq branch merge imminent

2017-06-07 Thread Jaromír Doleček
Hello, I plan to merge the branch to HEAD very soon, likely over the weekend. Eventual further fixes will be done on HEAD already, including mvsata(4) restabilization, and potential switch of siisata(4) to support NCQ. The plan is to get this pulled up to netbsd-8 branch soon also, so that it wil

Re: Modules and bus_dmamap_create() failing

2017-05-28 Thread Jaromír Doleček
size would help? Jaromir 2017-05-28 13:48 GMT+02:00 Jonathan A. Kollasch : > > On Sat, May 27, 2017 at 11:21:14AM +0200, Jaromír Doleček wrote: > > The driver allocates sizeable, but not excessive amounts of memory - mostly > > working with values like MAXPHYS / PAGE_SIZE * (queue size)

Modules and bus_dmamap_create() failing

2017-05-27 Thread Jaromír Doleček
Hello, while working on vioscsi(4) improvements, I had quite tedious problems with driver failing to allocate the dma map in bus_dmamap_create(). Symptoms were that the bus_dmamap_create()/bus_dmamem_alloc() call was very often, but not always, failing when driver was loaded via module. For examp

Initial ahci(4) NCQ support on jdolecek-ncq branch, test/review wanted

2017-04-19 Thread Jaromír Doleček
Hi, I've committed the initial NCQ support, it's available at jdolecek-ncq branch (branch only over sys). It works under Parallels for now, and is ready for brave adventurers. There is sys/dev/ata/TODO.ncq which you might want to check, also. If people want to try it out and give feedback, I'd re

Re: Exposing FUA as alternative to DIOCCACHESYNC for WAPBL

2017-04-03 Thread Jaromír Doleček
2017-04-02 17:28 GMT+02:00 Thor Lancelot Simon : > However -- I believe for the 20-30% of SAS drives you mention as shipping > with WCE set, it should be possible to obtain nearly identical performance > and more safety by setting the Queue Algorithm Modifier bit in the control > mode page to 1. T

Re: Exposing FUA as alternative to DIOCCACHESYNC for WAPBL

2017-03-31 Thread Jaromír Doleček
2017-03-31 22:16 GMT+02:00 Thor Lancelot Simon : > It's not obvious, but in fact ORDERED gets set for writes > as a default, I believe -- in sd.c, I think? > > This confused me for some time when I last looked at it. It confused me also, that's why I changed the code a while back to be less confus

Re: Exposing FUA as alternative to DIOCCACHESYNC for WAPBL

2017-03-31 Thread Jaromír Doleček
> The problem is that it does not always use SIMPLE and ORDERED tags in a > way that would facilitate the use of ORDERED tags to enforce barriers. Our scsipi layer actually never issues ORDERED tags right now as far as I can see, and there is currently no interface to get it set for an I/O. > Als

Re: Exposing FUA as alternative to DIOCCACHESYNC for WAPBL

2017-03-27 Thread Jaromír Doleček
some serious objections. Jaromir 2017-03-05 23:22 GMT+01:00 Jaromír Doleček : > Here is an updated patch. It was updated to check for the FUA support > for SCSI, using the MODE SENSE device-specific flag. Code was tested > with QEMU emulated bha(4) and nvme. WAPBL code was updated to use t

Re: Exposing FUA as alternative to DIOCCACHESYNC for WAPBL

2017-03-27 Thread Jaromír Doleček
2017-03-12 11:15 GMT+01:00 Edgar Fuß : > Some comments as I probably count as one of the larger WAPBL consumers (we > have ~150 employee's Home and Mail on NFS on FFS2+WAPBL on RAIDframe on SAS): I've not changed the code in RF to pass the cache flags, so the patch doesn't actually enable FUA ther

Re: Exposing FUA as alternative to DIOCCACHESYNC for WAPBL

2017-03-05 Thread Jaromír Doleček
Here is an updated patch. It was updated to check for the FUA support for SCSI, using the MODE SENSE device-specific flag. Code was tested with QEMU emulated bha(4) and nvme. WAPBL code was updated to use the flag. It keeps the flag naming for now. In the patch, WAPBL sets the flag for journal wri

Re: Exposing FUA as alternative to DIOCCACHESYNC for WAPBL

2017-03-03 Thread Jaromír Doleček
2017-03-03 18:11 GMT+01:00 David Holland : > Yes and no; there's also standard terminology for talking about > caches, so my inclination would be to call it something like > B_MEDIASYNC: synchronous at the media level. Okay, this might be good. Words are better than acronyms :) > > For DPO it's not

Re: Exposing FUA as alternative to DIOCCACHESYNC for WAPBL

2017-03-02 Thread Jaromír Doleček
> Some quick thoughts, though: > > (1) ultimately it's necessary to patch each driver to crosscheck the > flag, because otherwise eventually there'll be silent problems. Maybe. I think I like having this as responsibility on the caller for now, avoids too broad tree changes. Ultimately it might in

Exposing FUA as alternative to DIOCCACHESYNC for WAPBL

2017-03-01 Thread Jaromír Doleček
Hi, I'm working on an interface for WAPBL to use the Force Unit Access (FUA) feature on compatible hardware (currently SCSI and NVMe), as a replacement for full disk cache flushes. I'd also like to add support for DPO (Disable Page Out), as that is a trivial extension of FUA support, at least for SCSI. S

Re: Plan: journalling fixes for WAPBL

2017-01-02 Thread Jaromír Doleček
2017-01-02 18:31 GMT+01:00 David Holland : > Well, there's two things going on here. One is the parallelism limit, > which you can work around by having more drives, e.g. in an array. > The other is netbsd's 64k MAXPHYS issue, which is our own problem that > we could put time into. (And in fact, it

Re: Possible buffer cache race?

2016-10-30 Thread Jaromír Doleček
2016-10-24 19:34 GMT+02:00 David Holland : > Is this correlated with the syncer running? I have been seeing a > problem where every time the syncer runs it locks out everything else, > and once that happens some things take several seconds to complete > afterwards. Doesn't seem to me just syncer s

Re: WAPBL fix for deallocation exhaustion + slow file removal

2016-10-28 Thread Jaromír Doleček
The fix was committed, with only minor changes (some comments, and fixed mishandling of error return value in ffs_indirtrunc()). Jaromir 2016-10-06 22:36 GMT+02:00 Jaromír Doleček : > I've incorporated the mutex fix, here is the final patch relative to > trunk. I'd like to comm

Re: PCIVERBOSE causing kernel stack overflow during boot - why?

2016-10-26 Thread Jaromír Doleček
In my case it crashed on the same device, Core i7-6xxxK/Xeon-D Memory Controller (Target Address, Thermal, RAS) ID 0x6fa8. The last pci line before the trap was for the device immediately preceding that one. Thanks Paul for getting to the bottom of this. Jaromir 2016-10-25 7:02 GMT+02:00 Paul Goyet

Re: PCIVERBOSE causing kernel stack overflow during boot - why?

2016-10-23 Thread Jaromír Doleček
Here is the output from lspci/pcictl. I'll try that DDB_COMMANDONENTER also - the machine is remote though, so I'll send it later when I get it. Thanks. Jaromir 2016-10-19 7:23 GMT+02:00 Paul Goyette : > On Tue, 18 Oct 2016, Paul Goyette wrote: > >> Just as an added experiment, can you try to b

Possible buffer cache race?

2016-10-23 Thread Jaromír Doleček
Hi, I'm doing some testing with nvme(4) with quite deep I/O queues with a -current kernel on an MP system. At this moment I'm basically just confirming functionality, so it's just a bunch of parallel tar processes extracting files within a filesystem to a subdirectory. I have the filesystem mounted async and the machine

PCIVERBOSE causing kernel stack overflow during boot - why?

2016-10-17 Thread Jaromír Doleček
Hi, I've got an amd64 system which panics with 'stack overflow detected' on boot, somewhere halfway through probing pci9 bus, when booted with kernel with PCIVERBOSE. Same kernel config without PCIVERBOSE boots fine. dmesg without PCIVERBOSE is attached. Any idea what might be causing this? I've

Re: WAPBL fix for deallocation exhaustion + slow file removal

2016-10-06 Thread Jaromír Doleček
I've incorporated the mutex fix, here is the final patch relative to trunk. I'd like to commit this sometime next week. Jaromir 2016-10-01 19:00 GMT+02:00 Taylor R Campbell : >Date: Sat, 1 Oct 2016 18:40:31 +0200 >From: Jaromír Doleček > >> Thanks for taking a shot at this! But I th

Re: WAPBL fix for deallocation exhaustion + slow file removal

2016-10-01 Thread Jaromír Doleček
> Thanks for taking a shot at this! But I think it needs a little more > time for review -- certainly I can't digest it in the 24 hours you're > giving. Sure. > From a quick glance at the patch, I see one bug immediately in > vfs_wapbl.c that must have been introduced in a recent change: > pool_

WAPBL fix for deallocation exhaustion + slow file removal

2016-10-01 Thread Jaromír Doleček
Hi, the attached patch contains a fix for the WAPBL deallocation structure exhaustion and panic (kern/47146), and avoids the need to do slow partial truncates in a loop, fixing kern/49175. The patch changes wapbl_register_deallocation() to fail with EAGAIN when we run into the limit, and changes ffs_truncate() and f

Re: Plan: journalling fixes for WAPBL

2016-09-28 Thread Jaromír Doleček
I think it's a fair assessment to say that on SATA with NCQ/31 tags (max is actually 31, not 32 tags), it's pretty much impossible to have acceptable write performance without using the write cache. We could never saturate even a drive with a 16MB cache with just 31 tags and 64k maxphys. So it's IMO not useful
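
A rough back-of-the-envelope check of the figures in this message, as a sketch only: it assumes each tagged command carries at most one maxphys-sized transfer, and the function and constant names are made up for illustration (they are not kernel identifiers).

```python
KIB = 1024
MIB = 1024 * KIB


def max_in_flight_bytes(tags: int, maxphys_bytes: int) -> int:
    """Upper bound on outstanding data if every tag holds one full-sized transfer."""
    return tags * maxphys_bytes


# Figures quoted in the message: 31 NCQ tags, 64k maxphys, 16MB drive cache.
in_flight = max_in_flight_bytes(tags=31, maxphys_bytes=64 * KIB)
drive_cache = 16 * MIB

print(f"in flight: {in_flight // KIB} KiB")
print(f"share of drive cache: {in_flight / drive_cache:.1%}")
```

Under that assumption, the queue can hold a bit under 2 MiB at once, a small fraction of the 16MB cache mentioned above.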

Re: CVS commit: src/sys/arch

2016-09-23 Thread Jaromír Doleček
Hey Maxime, Seems the KASSERTs() are too aggressive, or there is some other bug. I can trigger the kassert by simply attaching to rump_ffs, setting a breakpoint and continuing, i.e: > rump_ffs -o log ./ffs ./mnt > gdb rump_ffs ... (gdb) attach RUMP_PID (gdb) break ffs_truncate Breakpoint 1 at 0x
