Re: Re: Re: EEVDF/vhost regression (bisected to 86bfbb7ce4f6 sched/fair: Add lag based placement)
On Thu, Dec 07, 2023 at 07:22:12AM +0100, Tobias Huschle wrote:
> 3. vhost looping endlessly, waiting for kworker to be scheduled
>
> I dug a little deeper on what the vhost is doing. I'm not an expert on
> virtio whatsoever, so these are just educated guesses that maybe
> someone can verify/correct. Please bear with me probably messing up
> the terminology.
>
> - vhost is looping through available queues.
> - vhost wants to wake up a kworker to process a found queue.
> - kworker does something with that queue and terminates quickly.
>
> What I found by throwing in some very noisy trace statements was that,
> if the kworker is not woken up, the vhost just keeps looping across
> all available queues (and seems to repeat itself). So it essentially
> relies on the scheduler to schedule the kworker fast enough. Otherwise
> it will just keep on looping until it is migrated off the CPU.

Normally it takes the buffers off the queue and is done with it. I am
guessing that at the same time the guest is running on some other CPU
and keeps adding available buffers?

-- 
MST
Re: Re: Re: EEVDF/vhost regression (bisected to 86bfbb7ce4f6 sched/fair: Add lag based placement)
On Tue, Nov 28, 2023 at 04:55:11PM +0800, Abel Wu wrote:
> On 11/27/23 9:56 PM, Tobias Huschle Wrote:
> > On Wed, Nov 22, 2023 at 11:00:16AM +0100, Peter Zijlstra wrote:
> > > On Tue, Nov 21, 2023 at 02:17:21PM +0100, Tobias Huschle wrote:

[...]

> > What are the weights of the two entities?
>
> Both entities have the same weights (I saw 1048576 for both of them).

The story looks different when we look at the cgroup hierarchy though:

sew := weight of the sched entity (se->load.weight)

CPU 6/KVM-2360 [011] d 1158.884473: sched_place: comm=vhost-2961 pid=2984 sev=3595548386 sed=3598548386 sel=0 sew=1048576 avg=3595548386 min=3595548386 cpu=11 nr=0 vru=3595548386 lag=0
CPU 6/KVM-2360 [011] d 1158.884473: sched_place: comm= pid=0 sev=19998138425 sed=20007532920 sel=0 sew=335754 avg=19998138425 min=19998138425 cpu=11 nr=0 vru=19998138425 lag=0
CPU 6/KVM-2360 [011] d 1158.884474: sched_place: comm= pid=0 sev=37794158943 sed=37807515464 sel=0 sew=236146 avg=37794158943 min=37794158943 cpu=11 nr=0 vru=37794158943 lag=0
CPU 6/KVM-2360 [011] d 1158.884474: sched_place: comm= pid=0 sev=50387168150 sed=50394482435 sel=0 sew=430665 avg=50387168150 min=50387168150 cpu=11 nr=0 vru=50387168150 lag=0
CPU 6/KVM-2360 [011] d 1158.884474: sched_place: comm= pid=0 sev=76600751247 sed=77624751246 sel=0 sew=3876 avg=76600751247 min=76600751247 cpu=11 nr=0 vru=76600751247 lag=0
<...>
vhost-2961-2984 [011] d 1158.884487: sched_place: comm=kworker/11:2 pid=202 sev=76603905961 sed=76606905961 sel=0 sew=1048576 avg=76603905961 min=76603905961 cpu=11 nr=1 vru=76603905961 lag=0

Here we can see the following weights:

kworker     -> 1048576
vhost       -> 1048576
cgroup root ->    3876

kworker and vhost weights remain the same. The weights of the nodes in
the cgroup hierarchy vary.

I also spent some more thought on this and have some more observations:

1. kworker lag after short runtime

vhost-2961-2984 [011] d 1158.884486: sched_waking: comm=kworker/11:2 pid=202 prio=120 target_cpu=011
vhost-2961-2984 [011] d 1158.884487: sched_place: comm=kworker/11:2 pid=202 sev=76603905961 sed=76606905961 sel=0 sew=1048576 avg=76603905961 min=76603905961 cpu=11 nr=1 vru=76603905961 lag=0
<...>                                                                                                                                                                                        ^
vhost-2961-2984 [011] d 1158.884490: sched_switch: prev_comm=vhost-2961 prev_pid=2984 prev_prio=120 prev_state=R+ ==> next_comm=kworker/11:2 next_pid=202 next_prio=120
kworker/11:2-202 [011] d 1158.884491: sched_waking: comm=CPU 0/KVM pid=2988 prio=120 target_cpu=009
kworker/11:2-202 [011] d 1158.884492: sched_stat_runtime: comm=kworker/11:2 pid=202 runtime=5150 [ns] vruntime=7660391 [ns] deadline=76606905961 [ns] lag=76606905961
kworker/11:2-202 [011] d 1158.884492: sched_update: comm=kworker/11:2 pid=202 sev=7660391 sed=76606905961 sel=-1128 sew=1048576 avg=76603909983 min=76603905961 cpu=11 nr=2 lag=-1128 lim=1000
                                                                                                         ^
kworker/11:2-202 [011] d 1158.884494: sched_stat_wait: comm=vhost-2961 pid=2984 delay=5150 [ns]
kworker/11:2-202 [011] d 1158.884494: sched_switch: prev_comm=kworker/11:2 prev_pid=202 prev_prio=120 prev_state=I ==> next_comm=vhost-2961 next_pid=2984 next_prio=120

In the sequence above, the kworker gets woken up by the vhost and placed
on the timeline with 0 lag. The kworker then executes for 5150ns and
returns control to the vhost. Unfortunately, this short runtime earns
the kworker a negative lag of -1128, which in turn causes the kworker
not to be selected by check_preempt_wakeup_fair.

My naive understanding of lag is that only entities which consume more
time than they should get negative lag. Why is the kworker being
punished for running only a tiny portion of time?

In the majority of cases, the kworker finishes after a 4-digit number of
ns. There are occasional outliers with 5-digit numbers. I would
therefore not expect negative lag for the kworker.

It is fair to say that the kworker was executing while the vhost was
not. The kworker gets put on the queue with no lag, so it essentially
has its vruntime set to avg_vruntime. After giving up its timeslice, the
kworker now has a vruntime which is larger than the avg_vruntime. Hence
the negative lag might make sense here from an algorithmic standpoint.

2a/b. vhost getting increased deadlines over time, no call of pick_eevdf

vhost-2961-2984 [011] d.h.. 1158.892878: sched_stat_runtime: comm=vhost-2961 pid=2984 runtime=8385872 [ns] vruntime=3603948448 [ns] deadline=3606948448 [ns] lag=3598548386
vhost-2961-2984
Re: [PATCH net-next 6/6] tools: virtio: introduce vhost_net_test
On Tue, Dec 5, 2023 at 7:35 PM Yunsheng Lin wrote:
>
> introduce vhost_net_test basing on virtio_test to test
> vhost_net changing in the kernel.
>
> Signed-off-by: Yunsheng Lin
> ---
>  tools/virtio/Makefile         |   8 +-
>  tools/virtio/vhost_net_test.c | 441 ++
>  2 files changed, 446 insertions(+), 3 deletions(-)
>  create mode 100644 tools/virtio/vhost_net_test.c
>
> diff --git a/tools/virtio/Makefile b/tools/virtio/Makefile
> index d128925980e0..e25e99c1c3b7 100644
> --- a/tools/virtio/Makefile
> +++ b/tools/virtio/Makefile
> @@ -1,8 +1,9 @@
>  # SPDX-License-Identifier: GPL-2.0
>  all: test mod
> -test: virtio_test vringh_test
> +test: virtio_test vringh_test vhost_net_test
>  virtio_test: virtio_ring.o virtio_test.o
>  vringh_test: vringh_test.o vringh.o virtio_ring.o
> +vhost_net_test: virtio_ring.o vhost_net_test.o
>
>  try-run = $(shell set -e; \
>  	if ($(1)) >/dev/null 2>&1; \
> @@ -49,6 +50,7 @@ oot-clean: OOT_BUILD+=clean
>
>  .PHONY: all test mod clean vhost oot oot-clean oot-build
>  clean:
> -	${RM} *.o vringh_test virtio_test vhost_test/*.o vhost_test/.*.cmd \
> -	vhost_test/Module.symvers vhost_test/modules.order *.d
> +	${RM} *.o vringh_test virtio_test vhost_net_test vhost_test/*.o \
> +	vhost_test/.*.cmd vhost_test/Module.symvers \
> +	vhost_test/modules.order *.d
>  -include *.d
> diff --git a/tools/virtio/vhost_net_test.c b/tools/virtio/vhost_net_test.c
> new file mode 100644
> index ..7e7b7aba3668
> --- /dev/null
> +++ b/tools/virtio/vhost_net_test.c
> @@ -0,0 +1,441 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#define _GNU_SOURCE
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#define RANDOM_BATCH -1
> +
> +static int tun_alloc(void)
> +{
> +	struct ifreq ifr;
> +	int fd, e;
> +
> +	fd = open("/dev/net/tun", O_RDWR);
> +	if (fd < 0) {
> +		perror("Cannot open /dev/net/tun");
> +		return fd;
> +	}
> +
> +	memset(&ifr, 0, sizeof(ifr));
> +
> +	ifr.ifr_flags = IFF_TUN | IFF_NO_PI;

Why did you use IFF_TUN but not IFF_TAP here?

> +	strncpy(ifr.ifr_name, "tun0", IFNAMSIZ);

tun0 is pretty common if there's a VPN. Do we need some randomized name
here?

> +
> +	e = ioctl(fd, TUNSETIFF, (void *)&ifr);
> +	if (e < 0) {
> +		perror("ioctl[TUNSETIFF]");
> +		close(fd);
> +		return e;
> +	}
> +
> +	return fd;
> +}
> +
> +/* Unused */
> +void *__kmalloc_fake, *__kfree_ignore_start, *__kfree_ignore_end;

Why do we need tricks like these here?

> +
> +struct vq_info {
> +	int kick;
> +	int call;
> +	int num;
> +	int idx;
> +	void *ring;
> +	/* copy used for control */
> +	struct vring vring;
> +	struct virtqueue *vq;
> +};
> +
> +struct vdev_info {
> +	struct virtio_device vdev;
> +	int control;
> +	struct pollfd fds[1];
> +	struct vq_info vqs[1];
> +	int nvqs;
> +	void *buf;
> +	size_t buf_size;
> +	struct vhost_memory *mem;
> +};
> +
> +static struct vhost_vring_file no_backend = { .index = 1, .fd = -1 },
> +	backend = { .index = 1, .fd = 1 };

A magic number like fd = 1 is pretty confusing. And I don't see why we
need global variables here.

> +static const struct vhost_vring_state null_state = {};
> +
> +bool vq_notify(struct virtqueue *vq)
> +{
> +	struct vq_info *info = vq->priv;
> +	unsigned long long v = 1;
> +	int r;
> +	r = write(info->kick, &v, sizeof v);
> +	assert(r == sizeof v);
> +	return true;
> +}
> +
> +void vq_callback(struct virtqueue *vq)
> +{
> +}
> +
> +
> +void vhost_vq_setup(struct vdev_info *dev, struct vq_info *info)
> +{
> +	struct vhost_vring_state state = { .index = info->idx };
> +	struct vhost_vring_file file = { .index = info->idx };
> +	unsigned long long features = dev->vdev.features;
> +	struct vhost_vring_addr addr = {
> +		.index = info->idx,
> +		.desc_user_addr = (uint64_t)(unsigned long)info->vring.desc,
> +		.avail_user_addr = (uint64_t)(unsigned long)info->vring.avail,
> +		.used_user_addr = (uint64_t)(unsigned long)info->vring.used,
> +	};
> +	int r;
> +	r = ioctl(dev->control, VHOST_SET_FEATURES, &features);
> +	assert(r >= 0);
> +	state.num = info->vring.num;
> +	r = ioctl(dev->control, VHOST_SET_VRING_NUM, &state);
> +	assert(r >= 0);
> +	state.num = 0;
> +	r = ioctl(dev->control, VHOST_SET_VRING_BASE, &state);
> +	assert(r >= 0);
> +	r = ioctl(dev->control, VHOST_SET_VRING_ADDR, &addr);
> +	assert(r >= 0);
Re: [PATCH net-next 4/6] vhost/net: remove vhost_net_page_frag_refill()
On Tue, Dec 5, 2023 at 7:35 PM Yunsheng Lin wrote:
>
> The page frag in vhost_net_page_frag_refill() uses the
> 'struct page_frag' from skb_page_frag_refill(), but its
> implementation is similar to page_frag_alloc_align() now.
>
> This patch removes vhost_net_page_frag_refill() by using
> 'struct page_frag_cache' instead of 'struct page_frag',
> and allocating frags using page_frag_alloc_align().
>
> The added benefit is that it not only unifies the page frag
> implementation a little, but also gives about a 0.5% performance
> boost when testing with the vhost_net_test introduced in the
> last patch.
>
> Signed-off-by: Yunsheng Lin
> ---

Acked-by: Jason Wang

Thanks
Re: [PATCH net-next 5/6] net: introduce page_frag_cache_drain()
On Tue, Dec 5, 2023 at 7:35 PM Yunsheng Lin wrote:
>
> When draining a page_frag_cache, most users are doing
> similar steps, so introduce an API to avoid code
> duplication.
>
> Signed-off-by: Yunsheng Lin
> ---

For the vhost part:

Acked-by: Jason Wang

Thanks
Re: [PATCH] ipvs: add a stateless type of service and a stateless Maglev hashing scheduler
Hi Lev,

kernel test robot noticed the following build warnings:

https://git-scm.com/docs/git-format-patch#_base_tree_information]
url:    https://github.com/intel-lab-lkp/linux/commits/Lev-Pantiukhin/ipvs-add-a-stateless-type-of-service-and-a-stateless-Maglev-hashing-scheduler/20231204-232344
base:   https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git master
patch link:    https://lore.kernel.org/r/20231204152020.472247-1-kndrvt%40yandex-team.ru
patch subject: [PATCH] ipvs: add a stateless type of service and a stateless Maglev hashing scheduler
config: i386-randconfig-141-20231207 (https://download.01.org/0day-ci/archive/20231207/202312070849.i9gwwsh0-...@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-12) 11.3.0
reproduce: (https://download.01.org/0day-ci/archive/20231207/202312070849.i9gwwsh0-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new
version of the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot
| Reported-by: Dan Carpenter
| Closes: https://lore.kernel.org/r/202312070849.i9gwwsh0-...@intel.com/

New smatch warnings:
net/netfilter/ipvs/ip_vs_core.c:545 ip_vs_schedule() error: uninitialized symbol 'need_state'.
vim +/need_state +545 net/netfilter/ipvs/ip_vs_core.c

^1da177e4c3f41 net/ipv4/ipvs/ip_vs_core.c      Linus Torvalds         2005-04-16  440  struct ip_vs_conn *
190ecd27cd7294 net/netfilter/ipvs/ip_vs_core.c Julian Anastasov       2010-10-17  441  ip_vs_schedule(struct ip_vs_service *svc, struct sk_buff *skb,
d4383f04d145cc net/netfilter/ipvs/ip_vs_core.c Jesper Dangaard Brouer 2012-09-26  442                 struct ip_vs_proto_data *pd, int *ignored,
d4383f04d145cc net/netfilter/ipvs/ip_vs_core.c Jesper Dangaard Brouer 2012-09-26  443                 struct ip_vs_iphdr *iph)
^1da177e4c3f41 net/ipv4/ipvs/ip_vs_core.c      Linus Torvalds         2005-04-16  444  {
9330419d9aa4f9 net/netfilter/ipvs/ip_vs_core.c Hans Schillstrom       2011-01-03  445  	struct ip_vs_protocol *pp = pd->pp;
^1da177e4c3f41 net/ipv4/ipvs/ip_vs_core.c      Linus Torvalds         2005-04-16  446  	struct ip_vs_conn *cp = NULL;
ceec4c38168184 net/netfilter/ipvs/ip_vs_core.c Julian Anastasov       2013-03-22  447  	struct ip_vs_scheduler *sched;
^1da177e4c3f41 net/ipv4/ipvs/ip_vs_core.c      Linus Torvalds         2005-04-16  448  	struct ip_vs_dest *dest;
ee78378f976488 net/netfilter/ipvs/ip_vs_core.c Alex Gartrell          2015-08-26  449  	__be16 _ports[2], *pptr, cport, vport;
ee78378f976488 net/netfilter/ipvs/ip_vs_core.c Alex Gartrell          2015-08-26  450  	const void *caddr, *vaddr;
3575792e005dc9 net/netfilter/ipvs/ip_vs_core.c Julian Anastasov       2010-09-17  451  	unsigned int flags;
b276d504bee439 net/netfilter/ipvs/ip_vs_core.c Lev Pantiukhin         2023-12-04  452  	bool need_state;
^1da177e4c3f41 net/ipv4/ipvs/ip_vs_core.c      Linus Torvalds         2005-04-16  453  
190ecd27cd7294 net/netfilter/ipvs/ip_vs_core.c Julian Anastasov       2010-10-17  454  	*ignored = 1;
2f74713d1436b7 net/netfilter/ipvs/ip_vs_core.c Jesper Dangaard Brouer 2012-09-26  455  	/*
2f74713d1436b7 net/netfilter/ipvs/ip_vs_core.c Jesper Dangaard Brouer 2012-09-26  456  	 * IPv6 frags, only the first hit here.
2f74713d1436b7 net/netfilter/ipvs/ip_vs_core.c Jesper Dangaard Brouer 2012-09-26  457  	 */
6b3d933000cbe5 net/netfilter/ipvs/ip_vs_core.c Gao Feng               2017-11-13  458  	pptr = frag_safe_skb_hp(skb, iph->len, sizeof(_ports), _ports);
^1da177e4c3f41 net/ipv4/ipvs/ip_vs_core.c      Linus Torvalds         2005-04-16  459  	if (pptr == NULL)
^1da177e4c3f41 net/ipv4/ipvs/ip_vs_core.c      Linus Torvalds         2005-04-16  460  		return NULL;
^1da177e4c3f41 net/ipv4/ipvs/ip_vs_core.c      Linus Torvalds         2005-04-16  461  
ee78378f976488 net/netfilter/ipvs/ip_vs_core.c Alex Gartrell          2015-08-26  462  	if (likely(!ip_vs_iph_inverse(iph))) {
ee78378f976488 net/netfilter/ipvs/ip_vs_core.c Alex Gartrell          2015-08-26  463  		cport = pptr[0];
ee78378f976488 net/netfilter/ipvs/ip_vs_core.c Alex Gartrell          2015-08-26  464  		caddr = &iph->saddr;
ee78378f976488 net/netfilter/ipvs/ip_vs_core.c Alex Gartrell          2015-08-26  465  		vport = pptr[1];
ee78378f976488 net/netfilter/ipvs/ip_vs_core.c Alex Gartrell          2015-08-26  466  		vaddr = &iph->daddr;
ee78378f976488 net/netfilter/ipvs/ip_vs_core.c Alex Gartrell          2015-08-26  467  	} else {
ee78378f976488 net/netfilter/ipvs/ip_vs_core.c Alex Gartrell          2015-08-26  468  		cport = pptr[1];
ee78378f976488 net/netfilter/ipvs/ip_vs_core.c Alex Gartrell          2015-08-26  469  		caddr = &iph->daddr;
ee78378f976488 net/netfilter/ipvs/ip_vs_core.c Alex Gartrell          2015-08-26  470  		vport = pptr[0];
ee78378f976488 net/netfilter/ipvs/ip_vs_core.c Alex Gartrell          2015-08-26  471  		vaddr = &iph->saddr;
ee78378f976488 net/netfilter/ipvs/ip_vs_core.c Alex
[PATCH v2 1/2] Documentation/ABI: Add ABI documentation for sys-bus-dax
Add the missing sysfs ABI documentation for the device DAX subsystem.
Various ABI attributes under this have been present since v5.1, and more
have been added over time. In preparation for adding a new attribute,
add this file with the historical details.

Cc: Dan Williams
Signed-off-by: Vishal Verma
---
 Documentation/ABI/testing/sysfs-bus-dax | 151
 1 file changed, 151 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-dax b/Documentation/ABI/testing/sysfs-bus-dax
new file mode 100644
index ..a61a7b186017
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-dax
@@ -0,0 +1,151 @@
+What:		/sys/bus/dax/devices/daxX.Y/align
+Date:		October, 2020
+KernelVersion:	v5.10
+Contact:	nvd...@lists.linux.dev
+Description:
+		(RW) Provides a way to specify an alignment for a dax device.
+		Values allowed are constrained by the physical address ranges
+		that back the dax device, and also by arch requirements.
+
+What:		/sys/bus/dax/devices/daxX.Y/mapping
+Date:		October, 2020
+KernelVersion:	v5.10
+Contact:	nvd...@lists.linux.dev
+Description:
+		(WO) Provides a way to allocate a mapping range under a dax
+		device. Specified in the format -.
+
+What:		/sys/bus/dax/devices/daxX.Y/mapping[0..N]/start
+Date:		October, 2020
+KernelVersion:	v5.10
+Contact:	nvd...@lists.linux.dev
+Description:
+		(RO) A dax device may have multiple constituent discontiguous
+		address ranges. These are represented by the different
+		'mappingX' subdirectories. The 'start' attribute indicates the
+		start physical address for the given range.
+
+What:		/sys/bus/dax/devices/daxX.Y/mapping[0..N]/end
+Date:		October, 2020
+KernelVersion:	v5.10
+Contact:	nvd...@lists.linux.dev
+Description:
+		(RO) A dax device may have multiple constituent discontiguous
+		address ranges. These are represented by the different
+		'mappingX' subdirectories. The 'end' attribute indicates the
+		end physical address for the given range.
+
+What:		/sys/bus/dax/devices/daxX.Y/mapping[0..N]/page_offset
+Date:		October, 2020
+KernelVersion:	v5.10
+Contact:	nvd...@lists.linux.dev
+Description:
+		(RO) A dax device may have multiple constituent discontiguous
+		address ranges. These are represented by the different
+		'mappingX' subdirectories. The 'page_offset' attribute
+		indicates the offset of the current range in the dax device.
+
+What:		/sys/bus/dax/devices/daxX.Y/resource
+Date:		June, 2019
+KernelVersion:	v5.3
+Contact:	nvd...@lists.linux.dev
+Description:
+		(RO) The resource attribute indicates the starting physical
+		address of a dax device. In case of a device with multiple
+		constituent ranges, it indicates the starting address of the
+		first range.
+
+What:		/sys/bus/dax/devices/daxX.Y/size
+Date:		October, 2020
+KernelVersion:	v5.10
+Contact:	nvd...@lists.linux.dev
+Description:
+		(RW) The size attribute indicates the total size of a dax
+		device. For creating subdivided dax devices, or for resizing
+		an existing device, the new size can be written to this as
+		part of the reconfiguration process.
+
+What:		/sys/bus/dax/devices/daxX.Y/numa_node
+Date:		November, 2019
+KernelVersion:	v5.5
+Contact:	nvd...@lists.linux.dev
+Description:
+		(RO) If NUMA is enabled and the platform has affinitized the
+		backing device for this dax device, emit the CPU node
+		affinity for this device.
+
+What:		/sys/bus/dax/devices/daxX.Y/target_node
+Date:		February, 2019
+KernelVersion:	v5.1
+Contact:	nvd...@lists.linux.dev
+Description:
+		(RO) The target-node attribute is the Linux numa-node that a
+		device-dax instance may create when it is online. Prior to
+		being online, the device's 'numa_node' property reflects the
+		closest online cpu node, which is the typical expectation of
+		a device 'numa_node'. Once it is online it becomes its own
+		distinct numa node.
+
+What:		$(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/available_size
+Date:		October, 2020
+KernelVersion:	v5.10
+Contact:	nvd...@lists.linux.dev
+Description:
+		(RO) The available_size attribute tracks available dax region
+		capacity. This only applies to volatile hmem devices, not pmem
+		devices, since pmem devices are defined by nvdimm namespace
+		boundaries.
+
+What:		$(readlink -f
[PATCH v2 2/2] dax: add a sysfs knob to control memmap_on_memory behavior
Add a sysfs knob for dax devices to control the memmap_on_memory setting
if the dax device were to be hotplugged as system memory.

The default memmap_on_memory setting for dax devices originating via
pmem or hmem is set to 'false' - i.e. no memmap_on_memory semantics, to
preserve legacy behavior. For dax devices via CXL, the default is on.
The sysfs control allows the administrator to override the above
defaults if needed.

Cc: David Hildenbrand
Cc: Dan Williams
Cc: Dave Jiang
Cc: Dave Hansen
Cc: Huang Ying
Reviewed-by: Jonathan Cameron
Reviewed-by: David Hildenbrand
Signed-off-by: Vishal Verma
---
 drivers/dax/bus.c                       | 40 +
 Documentation/ABI/testing/sysfs-bus-dax | 13 +++
 2 files changed, 53 insertions(+)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 1ff1ab5fa105..11abb57cc031 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -1270,6 +1270,45 @@ static ssize_t numa_node_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(numa_node);
 
+static ssize_t memmap_on_memory_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct dev_dax *dev_dax = to_dev_dax(dev);
+
+	return sprintf(buf, "%d\n", dev_dax->memmap_on_memory);
+}
+
+static ssize_t memmap_on_memory_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t len)
+{
+	struct dev_dax *dev_dax = to_dev_dax(dev);
+	struct dax_region *dax_region = dev_dax->region;
+	ssize_t rc;
+	bool val;
+
+	rc = kstrtobool(buf, &val);
+	if (rc)
+		return rc;
+
+	if (dev_dax->memmap_on_memory == val)
+		return len;
+
+	device_lock(dax_region->dev);
+	if (!dax_region->dev->driver) {
+		device_unlock(dax_region->dev);
+		return -ENXIO;
+	}
+
+	device_lock(dev);
+	dev_dax->memmap_on_memory = val;
+	device_unlock(dev);
+
+	device_unlock(dax_region->dev);
+	return rc == 0 ? len : rc;
+}
+static DEVICE_ATTR_RW(memmap_on_memory);
+
 static umode_t dev_dax_visible(struct kobject *kobj, struct attribute *a, int n)
 {
 	struct device *dev = container_of(kobj, struct device, kobj);
@@ -1296,6 +1335,7 @@ static struct attribute *dev_dax_attributes[] = {
 	&dev_attr_align.attr,
 	&dev_attr_resource.attr,
 	&dev_attr_numa_node.attr,
+	&dev_attr_memmap_on_memory.attr,
 	NULL,
 };
 
diff --git a/Documentation/ABI/testing/sysfs-bus-dax b/Documentation/ABI/testing/sysfs-bus-dax
index a61a7b186017..bb063a004e41 100644
--- a/Documentation/ABI/testing/sysfs-bus-dax
+++ b/Documentation/ABI/testing/sysfs-bus-dax
@@ -149,3 +149,16 @@ KernelVersion:	v5.1
 Contact:	nvd...@lists.linux.dev
 Description:
 	(RO) The id attribute indicates the region id of a dax region.
+
+What:		/sys/bus/dax/devices/daxX.Y/memmap_on_memory
+Date:		October, 2023
+KernelVersion:	v6.8
+Contact:	nvd...@lists.linux.dev
+Description:
+		(RW) Control the memmap_on_memory setting if the dax device
+		were to be hotplugged as system memory. This determines whether
+		the 'altmap' for the hotplugged memory will be placed on the
+		device being hotplugged (memmap_on_memory=1) or if it will be
+		placed on regular memory (memmap_on_memory=0). This attribute
+		must be set before the device is handed over to the 'kmem'
+		driver (i.e. hotplugged into system-ram).
-- 
2.41.0
[PATCH v2 0/2] Add DAX ABI for memmap_on_memory
The DAX drivers were missing sysfs ABI documentation entirely. Add this
missing documentation for the sysfs ABI for DAX regions and DAX devices
in patch 1. Add a new ABI for toggling memmap_on_memory semantics in
patch 2.

The missing ABI was spotted in [1]; this series is a split of the new
ABI additions behind the initial documentation creation.

[1]: https://lore.kernel.org/linux-cxl/651f27b728fef_ae7e729...@dwillia2-xfh.jf.intel.com.notmuch/

Cc: Dan Williams
Cc:
Cc:
To: Dave Jiang
Signed-off-by: Vishal Verma

Changes in v2:
- Fix CC lists, patch 1/2 didn't get sent correctly in v1
- Link to v1: https://lore.kernel.org/r/20231206-vv-dax_abi-v1-0-474eb88e2...@intel.com

---
Vishal Verma (2):
  Documentation/ABI: Add ABI documentation for sys-bus-dax
  dax: add a sysfs knob to control memmap_on_memory behavior

 drivers/dax/bus.c                       |  40
 Documentation/ABI/testing/sysfs-bus-dax | 164
 2 files changed, 204 insertions(+)
---
base-commit: c4e1ccfad42352918810802095a8ace8d1c744c9
change-id: 20231025-vv-dax_abi-17a219c46076

Best regards,
-- 
Vishal Verma
[PATCH v8] bus: mhi: host: Add tracing support
This change adds ftrace support for the following functions, which helps
in debugging issues when there is a channel state or MHI state change
and also when we receive data and control events:

1. mhi_intvec_mhi_states
2. mhi_process_data_event_ring
3. mhi_process_ctrl_ev_ring
4. mhi_gen_tre
5. mhi_update_channel_state
6. mhi_tryset_pm_state
7. mhi_pm_st_worker

Wherever the trace events are added, debug messages are removed.

Signed-off-by: Krishna chaitanya chundru
---
Changes in v8:
- Pass the structure and dereference the variables in TP_fast_assign as
  suggested by Steve
- Link to v7: https://lore.kernel.org/r/20231206-ftrace_support-v7-1-aca49a042...@quicinc.com

Changes in v7:
- Change the log format as pointed out by Mani.
- Link to v6: https://lore.kernel.org/r/20231204-ftrace_support-v6-1-9b206546d...@quicinc.com

Changes in v6:
- Use 'rp' directly as suggested by Jeffrey.
- Link to v5: https://lore.kernel.org/r/20231127-ftrace_support-v5-1-eb67daead...@quicinc.com

Changes in v5:
- Use DECLARE_EVENT_CLASS for multiple events as suggested by Steve.
- Instead of converting to u64 to print the address, use %px to print
  the address to avoid warnings on some platforms.
- Link to v4: https://lore.kernel.org/r/2023-ftrace_support-v4-1-c83602399...@quicinc.com

Changes in v4:
- Fix compilation issues in the previous patch which happened due to
  rebasing. In the defconfig the FTRACE config is not enabled, due to
  which the compilation issue was not seen in my workspace.
- Link to v3: https://lore.kernel.org/r/2023-ftrace_support-v3-1-f358d2911...@quicinc.com

Changes in v3:
- Move the trace header file from include/trace/events to
  drivers/bus/mhi/host/ so that we can include driver header files.
- Use macros directly in the trace events as suggested by Jeffrey Hugo.
- Reorder the structures in the events as suggested by Steve to avoid
  holes in the buffer.
- Removed the mhi_to_physical function as this can give security issues.
- Removed macros to define strings as we can get those from driver
  headers.
- Link to v2: https://lore.kernel.org/r/20231013-ftrace_support-v2-1-6e893ce01...@quicinc.com

Changes in v2:
- Passing the raw state into the trace event and using
  __print_symbolic() as suggested by Bjorn.
- Change mhi_pm_st_worker to mhi_pm_st_transition as suggested by Bjorn.
- Fixed the kernel test robot issues.
- Link to v1: https://lore.kernel.org/r/20231005-ftrace_support-v1-1-23a2f394f...@quicinc.com
---
 drivers/bus/mhi/host/init.c  |   3 +
 drivers/bus/mhi/host/main.c  |  19 ++--
 drivers/bus/mhi/host/pm.c    |   7 +-
 drivers/bus/mhi/host/trace.h | 205 +++
 4 files changed, 221 insertions(+), 13 deletions(-)

diff --git a/drivers/bus/mhi/host/init.c b/drivers/bus/mhi/host/init.c
index f78aefd2d7a3..6acb85f4c5f8 100644
--- a/drivers/bus/mhi/host/init.c
+++ b/drivers/bus/mhi/host/init.c
@@ -20,6 +20,9 @@
 #include
 #include "internal.h"
+#define CREATE_TRACE_POINTS
+#include "trace.h"
+
 static DEFINE_IDA(mhi_controller_ida);
 const char * const mhi_ee_str[MHI_EE_MAX] = {
diff --git a/drivers/bus/mhi/host/main.c b/drivers/bus/mhi/host/main.c
index dcf627b36e82..189f4786403e 100644
--- a/drivers/bus/mhi/host/main.c
+++ b/drivers/bus/mhi/host/main.c
@@ -15,6 +15,7 @@
 #include
 #include
 #include "internal.h"
+#include "trace.h"
 int __must_check mhi_read_reg(struct mhi_controller *mhi_cntrl,
			      void __iomem *base, u32 offset, u32 *out)
@@ -491,11 +492,8 @@ irqreturn_t mhi_intvec_threaded_handler(int irq_number, void *priv)
	state = mhi_get_mhi_state(mhi_cntrl);
	ee = mhi_get_exec_env(mhi_cntrl);
-	dev_dbg(dev, "local ee: %s state: %s device ee: %s state: %s\n",
-		TO_MHI_EXEC_STR(mhi_cntrl->ee),
-		mhi_state_str(mhi_cntrl->dev_state),
-		TO_MHI_EXEC_STR(ee), mhi_state_str(state));
+	trace_mhi_intvec_states(mhi_cntrl, ee, state);
	if (state == MHI_STATE_SYS_ERR) {
		dev_dbg(dev, "System error detected\n");
		pm_state = mhi_tryset_pm_state(mhi_cntrl,
@@ -832,6 +830,8 @@ int mhi_process_ctrl_ev_ring(struct mhi_controller *mhi_cntrl,
	while (dev_rp != local_rp) {
		enum mhi_pkt_type type = MHI_TRE_GET_EV_TYPE(local_rp);
+		trace_mhi_ctrl_event(mhi_cntrl, local_rp);
+
		switch (type) {
		case MHI_PKT_TYPE_BW_REQ_EVENT:
		{
@@ -997,6 +997,8 @@ int mhi_process_data_event_ring(struct mhi_controller *mhi_cntrl,
	while (dev_rp != local_rp && event_quota > 0) {
		enum mhi_pkt_type type = MHI_TRE_GET_EV_TYPE(local_rp);
+		trace_mhi_data_event(mhi_cntrl, local_rp);
+
		chan = MHI_TRE_GET_EV_CHID(local_rp);
		WARN_ON(chan >= mhi_cntrl->max_chan);
@@ -1235,6 +1237,7 @@ int mhi_gen_tre(struct mhi_
Re: [PATCH v7] bus: mhi: host: Add tracing support
On 12/6/2023 9:25 PM, Steven Rostedt wrote: On Wed, 6 Dec 2023 21:12:57 +0530 Krishna chaitanya chundru wrote: diff --git a/drivers/bus/mhi/host/init.c b/drivers/bus/mhi/host/init.c index f78aefd2d7a3..6acb85f4c5f8 100644 --- a/drivers/bus/mhi/host/init.c +++ b/drivers/bus/mhi/host/init.c @@ -20,6 +20,9 @@ #include #include "internal.h" +#define CREATE_TRACE_POINTS +#include "trace.h" + static DEFINE_IDA(mhi_controller_ida); const char * const mhi_ee_str[MHI_EE_MAX] = { diff --git a/drivers/bus/mhi/host/main.c b/drivers/bus/mhi/host/main.c index dcf627b36e82..507cf3351a97 100644 --- a/drivers/bus/mhi/host/main.c +++ b/drivers/bus/mhi/host/main.c @@ -15,6 +15,7 @@ #include #include #include "internal.h" +#include "trace.h" int __must_check mhi_read_reg(struct mhi_controller *mhi_cntrl, void __iomem *base, u32 offset, u32 *out) @@ -491,11 +492,9 @@ irqreturn_t mhi_intvec_threaded_handler(int irq_number, void *priv) state = mhi_get_mhi_state(mhi_cntrl); ee = mhi_get_exec_env(mhi_cntrl); - dev_dbg(dev, "local ee: %s state: %s device ee: %s state: %s\n", - TO_MHI_EXEC_STR(mhi_cntrl->ee), - mhi_state_str(mhi_cntrl->dev_state), - TO_MHI_EXEC_STR(ee), mhi_state_str(state)); + trace_mhi_intvec_states(mhi_cntrl->mhi_dev->name, mhi_cntrl->ee, + mhi_cntrl->dev_state, ee, state); I'm not sure if this was discussed before, but why not just pass in the mhi_cntrl pointer and do the assigning in the TRACE_EVENT() TP_fast_assign() portion? It will save on setting up the parameters. if (state == MHI_STATE_SYS_ERR) { dev_dbg(dev, "System error detected\n"); pm_state = mhi_tryset_pm_state(mhi_cntrl, @@ -832,6 +831,10 @@ int mhi_process_ctrl_ev_ring(struct mhi_controller *mhi_cntrl, while (dev_rp != local_rp) { enum mhi_pkt_type type = MHI_TRE_GET_EV_TYPE(local_rp); + trace_mhi_ctrl_event(mhi_cntrl->mhi_dev->name, local_rp, +local_rp->ptr, local_rp->dword[0], +local_rp->dword[1]); And here.. 
+ switch (type) { case MHI_PKT_TYPE_BW_REQ_EVENT: { @@ -997,6 +1000,9 @@ int mhi_process_data_event_ring(struct mhi_controller *mhi_cntrl, while (dev_rp != local_rp && event_quota > 0) { enum mhi_pkt_type type = MHI_TRE_GET_EV_TYPE(local_rp); + trace_mhi_data_event(mhi_cntrl->mhi_dev->name, local_rp, local_rp->ptr, +local_rp->dword[0], local_rp->dword[1]); and here.. for the local_rp. Especially since you are just repeating what you send into the two events. You can consolidate it down to just adding that in the TP_fast_assign() of the shared DECLARE_EVENT_CLASS(). + chan = MHI_TRE_GET_EV_CHID(local_rp); WARN_ON(chan >= mhi_cntrl->max_chan); @@ -1235,6 +1241,8 @@ int mhi_gen_tre(struct mhi_controller *mhi_cntrl, struct mhi_chan *mhi_chan, mhi_tre->dword[0] = MHI_TRE_DATA_DWORD0(info->len); mhi_tre->dword[1] = MHI_TRE_DATA_DWORD1(bei, eot, eob, chain); + trace_mhi_gen_tre(mhi_cntrl->mhi_dev->name, mhi_chan->chan, mhi_tre, + mhi_tre->ptr, mhi_tre->dword[0], mhi_tre->dword[1]); Same here. You can pass in the mhi_tre pointer. /* increment WP */ mhi_add_ring_element(mhi_cntrl, tre_ring); mhi_add_ring_element(mhi_cntrl, buf_ring); @@ -1327,9 +1335,7 @@ static int mhi_update_channel_state(struct mhi_controller *mhi_cntrl, enum mhi_cmd_type cmd = MHI_CMD_NOP; int ret; - dev_dbg(dev, "%d: Updating channel state to: %s\n", mhi_chan->chan, - TO_CH_STATE_TYPE_STR(to_state)); - + trace_mhi_channel_command_start(mhi_cntrl->mhi_dev->name, mhi_chan->chan, to_state); And here.. switch (to_state) { case MHI_CH_STATE_TYPE_RESET: write_lock_irq(_chan->lock); @@ -1396,9 +1402,7 @@ static int mhi_update_channel_state(struct mhi_controller *mhi_cntrl, write_unlock_irq(_chan->lock); } - dev_dbg(dev, "%d: Channel state change to %s successful\n", - mhi_chan->chan, TO_CH_STATE_TYPE_STR(to_state)); - + trace_mhi_channel_command_end(mhi_cntrl->mhi_dev->name, mhi_chan->chan, to_state); and here.. 
Where you can also update the DECLARE_EVENT_CLASS() to directly access the mhi_cntrl and mhi_chan pointers. Sometimes it's better to do the dereference from inside the TP_fast_assign. That way, things like eprobes and bpf tracing can also have access to the full structure if needed. -- Steve Hi Steve, I will make changes as suggested in the next patch series. - Krishna Chaitanya.
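Steve's suggestion — pass one pointer and do the field copies inside TP_fast_assign() of a shared DECLARE_EVENT_CLASS() — can be sketched outside the kernel with a plain C analogue: a record function that takes a struct pointer and dereferences it at record time, so call sites stay short and the copy logic lives in one place. All names below (struct layout, fields) are illustrative, not the real MHI driver types.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the MHI controller and a trace record;
 * the field names are illustrative, not the real driver layout. */
struct mhi_ctrl_demo {
	const char *name;
	int ee;
	int dev_state;
};

struct trace_record {
	char name[32];
	int ee;
	int dev_state;
};

/* Analogue of TP_fast_assign(): the caller hands over one pointer and
 * the dereferencing happens here, at record time. This is the shape of
 * a shared DECLARE_EVENT_CLASS(): every event that uses the class gets
 * the same copy logic without each call site passing every field. */
static void record_event(struct trace_record *rec,
			 const struct mhi_ctrl_demo *ctrl)
{
	strncpy(rec->name, ctrl->name, sizeof(rec->name) - 1);
	rec->name[sizeof(rec->name) - 1] = '\0';
	rec->ee = ctrl->ee;
	rec->dev_state = ctrl->dev_state;
}
```

The same idea is why passing `mhi_cntrl` itself is attractive: tools that hook the tracepoint (eprobes, bpf) then see the full structure, not just the pre-chewed arguments.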
[PATCH] nvdimm-btt: fix a potential memleak in btt_freelist_init
When an error happens in btt_freelist_init(), its caller discover_arenas() will directly free arena, which makes arena->freelist allocated in btt_freelist_init() a leaked memory. Fix this by freeing arena->freelist in all error handling paths of btt_freelist_init(). Fixes: 5212e11fde4d ("nd_btt: atomic sector updates") Signed-off-by: Dinghao Liu --- drivers/nvdimm/btt.c | 15 +++ 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c index d5593b0dc700..d8c4ba8bfdda 100644 --- a/drivers/nvdimm/btt.c +++ b/drivers/nvdimm/btt.c @@ -544,8 +544,10 @@ static int btt_freelist_init(struct arena_info *arena) for (i = 0; i < arena->nfree; i++) { new = btt_log_read(arena, i, _new, LOG_NEW_ENT); - if (new < 0) - return new; + if (new < 0) { + ret = new; + goto out_free; + } /* old and new map entries with any flags stripped out */ log_oldmap = ent_lba(le32_to_cpu(log_new.old_map)); @@ -577,7 +579,7 @@ static int btt_freelist_init(struct arena_info *arena) ret = btt_map_read(arena, le32_to_cpu(log_new.lba), _entry, NULL, NULL, 0); if (ret) - return ret; + goto out_free; /* * The map_entry from btt_read_map is stripped of any flag bits, @@ -594,11 +596,16 @@ static int btt_freelist_init(struct arena_info *arena) ret = btt_map_write(arena, le32_to_cpu(log_new.lba), le32_to_cpu(log_new.new_map), 0, 0, 0); if (ret) - return ret; + goto out_free; } } return 0; + +out_free: + kfree(arena->freelist); + arena->freelist = NULL; + return ret; } static bool ent_is_padding(struct log_entry *ent) -- 2.17.1
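The fix follows the common kernel idiom of unwinding through a single `out_free` label so every failure path frees and NULLs the allocation before returning. A minimal self-contained sketch of that idiom (userspace C, illustrative names — not the nvdimm code itself):

```c
#include <stdlib.h>

struct arena_demo {
	int *freelist;
};

/* Sketch of the error-unwind idiom used by the fix: each failure path
 * jumps to one label that frees the allocation and NULLs the pointer,
 * so the caller never sees (and never double-frees) a stale pointer. */
static int freelist_init_demo(struct arena_demo *a, int fail_step)
{
	int ret;

	a->freelist = calloc(16, sizeof(*a->freelist));
	if (!a->freelist)
		return -12; /* -ENOMEM */

	if (fail_step == 1) {		/* say, a log read failed */
		ret = -5;		/* -EIO */
		goto out_free;
	}
	if (fail_step == 2) {		/* say, a map write failed */
		ret = -5;
		goto out_free;
	}
	return 0;

out_free:
	free(a->freelist);
	a->freelist = NULL;	/* a later kfree() by the caller is now safe */
	return ret;
}
```

NULLing the pointer after freeing is what keeps discover_arenas()'s own cleanup of the arena safe: kfree(NULL) is a no-op, so the double-free/leak pair is resolved in one place.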
Re: [PATCH net-next 2/6] page_frag: unify gfp bit for order 3 page allocation
On Tue, 5 Dec 2023 19:34:40 +0800 Yunsheng Lin wrote: > __GFP_DIRECT_RECLAIM is xor'd to avoid > direct reclaim in skb_page_frag_refill(), but it is not > xor'd in __page_frag_cache_refill(). xor is not the same thing as masking a bit off. The patch itself LGTM.
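The distinction Jakub is pointing at: clearing a flag takes `& ~flag`, while `^ flag` merely toggles — it clears the bit only when it happens to be set, and *sets* it otherwise. A quick demonstration with a made-up flag value (not the real GFP encoding):

```c
/* Illustrative flag value, not the real GFP bit. */
#define DEMO_DIRECT_RECLAIM 0x400u

/* Masking always clears the bit, regardless of its current state. */
static unsigned int mask_off(unsigned int flags)
{
	return flags & ~DEMO_DIRECT_RECLAIM;
}

/* Xor toggles: clears the bit when it was set, but SETS it when clear. */
static unsigned int xor_bit(unsigned int flags)
{
	return flags ^ DEMO_DIRECT_RECLAIM;
}
```

So "xor'd to avoid direct reclaim" only describes the case where the bit started out set; masking is the operation that is safe to state unconditionally.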
Re: [PATCH net-next] tcp: add tracepoints for data send/recv/acked
On Tue, 5 Dec 2023 20:39:28 +0100, Eric Dumazet wrote: > On Tue, Dec 5, 2023 at 3:11 AM Xuan Zhuo wrote: > > > > On Mon, 4 Dec 2023 13:28:21 +0100, Eric Dumazet wrote: > > > On Mon, Dec 4, 2023 at 12:43 PM Philo Lu wrote: > > > > > > > > Add 3 tracepoints, namely tcp_data_send/tcp_data_recv/tcp_data_acked, > > > > which will be called every time a tcp data packet is sent, received, and > > > > acked. > > > > tcp_data_send: called after a data packet is sent. > > > > tcp_data_recv: called after a data packet is receviced. > > > > tcp_data_acked: called after a valid ack packet is processed (some sent > > > > data are ackknowledged). > > > > > > > > We use these callbacks for fine-grained tcp monitoring, which collects > > > > and analyses every tcp request/response event information. The whole > > > > system has been described in SIGMOD'18 (see > > > > https://dl.acm.org/doi/pdf/10.1145/3183713.3190659 for details). To > > > > achieve this with bpf, we require hooks for data events that call bpf > > > > prog (1) when any data packet is sent/received/acked, and (2) after > > > > critical tcp state variables have been updated (e.g., snd_una, snd_nxt, > > > > rcv_nxt). However, existing bpf hooks cannot meet our requirements. > > > > Besides, these tracepoints help to debug tcp when data send/recv/acked. > > > > > > This I do not understand. > > > > > > > > > > > Though kretprobe/fexit can also be used to collect these information, > > > > they will not work if the kernel functions get inlined. Considering the > > > > stability, we prefer tracepoint as the solution. > > > > > > I dunno, this seems quite weak to me. I see many patches coming to add > > > tracing in the stack, but no patches fixing any issues. > > > > > > We have implemented a mechanism to split the request and response from the > > TCP > > connection using these "hookers", which can handle various protocols such as > > HTTP, HTTPS, Redis, and MySQL. 
This mechanism allows us to record important > > information about each request and response, including the amount of data > > uploaded, the time taken by the server to handle the request, and the time > > taken > > for the client to receive the response. This mechanism has been running > > internally for many years and has proven to be very useful. > > > > One of the main benefits of this mechanism is that it helps in locating the > > source of any issues or problems that may arise. For example, if there is a > > problem with the network, the application, or the machine, we can use this > > mechanism to identify and isolate the issue. > > > > TCP has long been a challenge when it comes to tracking the transmission of > > data > > on the network. The application can only confirm that it has sent a certain > > amount of data to the kernel, but it has limited visibility into whether the > > client has actually received this data. Our mechanism addresses this issue > > by > > providing insights into the amount of data received by the client and the > > time > > it was received. Furthermore, we can also detect any packet loss or delays > > caused by the server. > > > > https://help-static-aliyun-doc.aliyuncs.com/assets/img/zh-CN/7912288961/9732df025beny.svg > > > > So, we do not want to add some tracepoint to do some unknow debug. > > We have a clear goal. debugging is just an incidental capability. > > > > We have powerful mechanisms in the stack already that ordinary (no > privilege requested) applications can readily use. > > We have been using them for a while. > > If existing mechanisms are missing something you need, please expand them. > > For reference, start looking at tcp_get_timestamping_opt_stats() history. > > Sender side can for instance get precise timestamps. 
> > Combinations of these timestamps reveal different parts of the overall > network latency, > > T0: sendmsg() enters TCP > T1: first byte enters qdisc > T2: first byte sent to the NIC > T3: first byte ACKed in TCP > T4: last byte sent to the NIC > T5: last byte ACKed > T1 - T0: how long the first byte was blocked in the TCP layer ("Head > of Line Blocking" latency). > T2 - T1: how long the first byte was blocked in the Linux traffic > shaping layer (known as QDisc). > T3 - T2: the network ‘distance’ (propagation delay + current queuing > delay along the network path and at the receiver). > T5 - T2: how fast the sent chunk was delivered. > Message Size / (T5 - T0): goodput (from application’s perspective) The key point is that using our mechanism, the application does not need to be modified. As long as the app's network protocol is request-response, we can trace tcp connection at any time to analyze the request and response. And record the start and end times of request and response. Of course there is some ttl and other information. Thanks.
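The latency components Eric lists fall out mechanically once the six timestamps are in hand (e.g. collected via SO_TIMESTAMPING). A small sketch of the arithmetic — struct and function names are assumptions for illustration, units arbitrary (say microseconds):

```c
#include <stdint.h>

/* T0..T5 as described above; names and layout are illustrative. */
struct tx_timestamps {
	uint64_t t0; /* sendmsg() enters TCP */
	uint64_t t1; /* first byte enters qdisc */
	uint64_t t2; /* first byte sent to the NIC */
	uint64_t t3; /* first byte ACKed in TCP */
	uint64_t t5; /* last byte ACKed */
};

/* T1 - T0: "Head of Line Blocking" latency in the TCP layer. */
static uint64_t hol_blocking(const struct tx_timestamps *ts)
{
	return ts->t1 - ts->t0;
}

/* T2 - T1: time blocked in the traffic shaping layer (qdisc). */
static uint64_t qdisc_delay(const struct tx_timestamps *ts)
{
	return ts->t2 - ts->t1;
}

/* T3 - T2: network 'distance' (propagation + queuing delay). */
static uint64_t network_rtt(const struct tx_timestamps *ts)
{
	return ts->t3 - ts->t2;
}

/* Message size / (T5 - T0): goodput from the app's perspective. */
static double goodput(const struct tx_timestamps *ts, uint64_t msg_bytes)
{
	return (double)msg_bytes / (double)(ts->t5 - ts->t0);
}
```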
Re: [PATCH net-next v7 3/4] virtio/vsock: fix logic which reduces credit update messages
On 07.12.2023 01:08, Michael S. Tsirkin wrote: > On Thu, Dec 07, 2023 at 12:52:51AM +0300, Arseniy Krasnov wrote: >> >> >> On 07.12.2023 00:53, Michael S. Tsirkin wrote: >>> On Thu, Dec 07, 2023 at 12:18:48AM +0300, Arseniy Krasnov wrote: Add one more condition for sending credit update during dequeue from stream socket: when number of bytes in the rx queue is smaller than SO_RCVLOWAT value of the socket. This is actual for non-default value of SO_RCVLOWAT (e.g. not 1) - idea is to "kick" peer to continue data transmission, because we need at least SO_RCVLOWAT bytes in our rx queue to wake up user for reading data (in corner case it is also possible to stuck both tx and rx sides, this is why 'Fixes' is used). Also handle case when 'fwd_cnt' wraps, while 'last_fwd_cnt' is still not. Fixes: b89d882dc9fc ("vsock/virtio: reduce credit update messages") Signed-off-by: Arseniy Krasnov --- Changelog: v6 -> v7: * Handle wrap of 'fwd_cnt'. * Do to send credit update when 'fwd_cnt' == 'last_fwd_cnt'. net/vmw_vsock/virtio_transport_common.c | 18 +++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c index e137d740804e..39f8660d825d 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -558,6 +558,8 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk, struct virtio_vsock_sock *vvs = vsk->trans; size_t bytes, total = 0; struct sk_buff *skb; + u32 fwd_cnt_delta; + bool low_rx_bytes; int err = -EFAULT; u32 free_space; @@ -601,7 +603,15 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk, } } - free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt); + /* Handle wrap of 'fwd_cnt'. */ + if (vvs->fwd_cnt < vvs->last_fwd_cnt) + fwd_cnt_delta = vvs->fwd_cnt + (U32_MAX - vvs->last_fwd_cnt); >>> >>> Are you sure there's no off by one here? 
for example if fwd_cnt is 0 >>> and last_fwd_cnt is 0xf then apparently delta is 0. >> >> Seems yes, I need +1 here > > And then you will get a nop, because assigning U32_MAX + 1 to u32 > gives you 0. Adding () does nothing to change the result, > + and - are commutative. Ahh, unsigned here, yes. @Stefano, what did You mean about wrapping here? I think Michael is right, for example vvs->fwd_cnt wraps and now == 5 vvs->last_fwd_cnt == 0x now delta before this patch will be 6 - correct value May be I didn't get your idea, so implement it very naive? Thanks, Arseniy > > >>> >>> + else + fwd_cnt_delta = vvs->fwd_cnt - vvs->last_fwd_cnt; >>> >>> I actually don't see what is wrong with just >>> fwd_cnt_delta = vvs->fwd_cnt - vvs->last_fwd_cnt >>> 32 bit unsigned math will I think handle wrap around correctly. >>> >>> And given buf_alloc is also u32 - I don't see where the bug is in >>> the original code. >> >> I think problem is when fwd_cnt wraps, while last_fwd_cnt is not. In this >> case fwd_cnt_delta will be too big, so we won't send credit update which >> leads to stall for sender >> >> Thanks, Arseniy > > Care coming up with an example? > > >>> >>> + + free_space = vvs->buf_alloc - fwd_cnt_delta; + low_rx_bytes = (vvs->rx_bytes < + sock_rcvlowat(sk_vsock(vsk), 0, INT_MAX)); spin_unlock_bh(>rx_lock); @@ -611,9 +621,11 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk, * too high causes extra messages. Too low causes transmitter * stalls. As stalls are in theory more expensive than extra * messages, we set the limit to a high value. TODO: experiment - * with different values. + * with different values. Also send credit update message when + * number of bytes in rx queue is not enough to wake up reader. */ - if (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) + if (fwd_cnt_delta && + (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE || low_rx_bytes)) virtio_transport_send_credit_update(vsk); return total; -- 2.25.1 >>> >
Re: [PATCH net-next v7 3/4] virtio/vsock: fix logic which reduces credit update messages
On Thu, Dec 07, 2023 at 12:52:51AM +0300, Arseniy Krasnov wrote: > > > On 07.12.2023 00:53, Michael S. Tsirkin wrote: > > On Thu, Dec 07, 2023 at 12:18:48AM +0300, Arseniy Krasnov wrote: > >> Add one more condition for sending credit update during dequeue from > >> stream socket: when number of bytes in the rx queue is smaller than > >> SO_RCVLOWAT value of the socket. This is actual for non-default value > >> of SO_RCVLOWAT (e.g. not 1) - idea is to "kick" peer to continue data > >> transmission, because we need at least SO_RCVLOWAT bytes in our rx > >> queue to wake up user for reading data (in corner case it is also > >> possible to stuck both tx and rx sides, this is why 'Fixes' is used). > >> Also handle case when 'fwd_cnt' wraps, while 'last_fwd_cnt' is still > >> not. > >> > >> Fixes: b89d882dc9fc ("vsock/virtio: reduce credit update messages") > >> Signed-off-by: Arseniy Krasnov > >> --- > >> Changelog: > >> v6 -> v7: > >> * Handle wrap of 'fwd_cnt'. > >> * Do to send credit update when 'fwd_cnt' == 'last_fwd_cnt'. > >> > >> net/vmw_vsock/virtio_transport_common.c | 18 +++--- > >> 1 file changed, 15 insertions(+), 3 deletions(-) > >> > >> diff --git a/net/vmw_vsock/virtio_transport_common.c > >> b/net/vmw_vsock/virtio_transport_common.c > >> index e137d740804e..39f8660d825d 100644 > >> --- a/net/vmw_vsock/virtio_transport_common.c > >> +++ b/net/vmw_vsock/virtio_transport_common.c > >> @@ -558,6 +558,8 @@ virtio_transport_stream_do_dequeue(struct vsock_sock > >> *vsk, > >>struct virtio_vsock_sock *vvs = vsk->trans; > >>size_t bytes, total = 0; > >>struct sk_buff *skb; > >> + u32 fwd_cnt_delta; > >> + bool low_rx_bytes; > >>int err = -EFAULT; > >>u32 free_space; > >> > >> @@ -601,7 +603,15 @@ virtio_transport_stream_do_dequeue(struct vsock_sock > >> *vsk, > >>} > >>} > >> > >> - free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt); > >> + /* Handle wrap of 'fwd_cnt'. 
*/ > >> + if (vvs->fwd_cnt < vvs->last_fwd_cnt) > >> + fwd_cnt_delta = vvs->fwd_cnt + (U32_MAX - vvs->last_fwd_cnt); > > > > Are you sure there's no off by one here? for example if fwd_cnt is 0 > > and last_fwd_cnt is 0xf then apparently delta is 0. > > Seems yes, I need +1 here And then you will get a nop, because assigning U32_MAX + 1 to u32 gives you 0. Adding () does nothing to change the result, + and - are commutative. > > > > > >> + else > >> + fwd_cnt_delta = vvs->fwd_cnt - vvs->last_fwd_cnt; > > > > I actually don't see what is wrong with just > > fwd_cnt_delta = vvs->fwd_cnt - vvs->last_fwd_cnt > > 32 bit unsigned math will I think handle wrap around correctly. > > > > And given buf_alloc is also u32 - I don't see where the bug is in > > the original code. > > I think problem is when fwd_cnt wraps, while last_fwd_cnt is not. In this > case fwd_cnt_delta will be too big, so we won't send credit update which > leads to stall for sender > > Thanks, Arseniy Care coming up with an example? > > > > > >> + > >> + free_space = vvs->buf_alloc - fwd_cnt_delta; > >> + low_rx_bytes = (vvs->rx_bytes < > >> + sock_rcvlowat(sk_vsock(vsk), 0, INT_MAX)); > >> > >>spin_unlock_bh(>rx_lock); > >> > >> @@ -611,9 +621,11 @@ virtio_transport_stream_do_dequeue(struct vsock_sock > >> *vsk, > >> * too high causes extra messages. Too low causes transmitter > >> * stalls. As stalls are in theory more expensive than extra > >> * messages, we set the limit to a high value. TODO: experiment > >> - * with different values. > >> + * with different values. Also send credit update message when > >> + * number of bytes in rx queue is not enough to wake up reader. > >> */ > >> - if (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) > >> + if (fwd_cnt_delta && > >> + (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE || low_rx_bytes)) > >>virtio_transport_send_credit_update(vsk); > >> > >>return total; > >> -- > >> 2.25.1 > >
Re: [PATCH net-next v7 3/4] virtio/vsock: fix logic which reduces credit update messages
On 07.12.2023 00:53, Michael S. Tsirkin wrote: > On Thu, Dec 07, 2023 at 12:18:48AM +0300, Arseniy Krasnov wrote: >> Add one more condition for sending credit update during dequeue from >> stream socket: when number of bytes in the rx queue is smaller than >> SO_RCVLOWAT value of the socket. This is actual for non-default value >> of SO_RCVLOWAT (e.g. not 1) - idea is to "kick" peer to continue data >> transmission, because we need at least SO_RCVLOWAT bytes in our rx >> queue to wake up user for reading data (in corner case it is also >> possible to stuck both tx and rx sides, this is why 'Fixes' is used). >> Also handle case when 'fwd_cnt' wraps, while 'last_fwd_cnt' is still >> not. >> >> Fixes: b89d882dc9fc ("vsock/virtio: reduce credit update messages") >> Signed-off-by: Arseniy Krasnov >> --- >> Changelog: >> v6 -> v7: >> * Handle wrap of 'fwd_cnt'. >> * Do to send credit update when 'fwd_cnt' == 'last_fwd_cnt'. >> >> net/vmw_vsock/virtio_transport_common.c | 18 +++--- >> 1 file changed, 15 insertions(+), 3 deletions(-) >> >> diff --git a/net/vmw_vsock/virtio_transport_common.c >> b/net/vmw_vsock/virtio_transport_common.c >> index e137d740804e..39f8660d825d 100644 >> --- a/net/vmw_vsock/virtio_transport_common.c >> +++ b/net/vmw_vsock/virtio_transport_common.c >> @@ -558,6 +558,8 @@ virtio_transport_stream_do_dequeue(struct vsock_sock >> *vsk, >> struct virtio_vsock_sock *vvs = vsk->trans; >> size_t bytes, total = 0; >> struct sk_buff *skb; >> +u32 fwd_cnt_delta; >> +bool low_rx_bytes; >> int err = -EFAULT; >> u32 free_space; >> >> @@ -601,7 +603,15 @@ virtio_transport_stream_do_dequeue(struct vsock_sock >> *vsk, >> } >> } >> >> -free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt); >> +/* Handle wrap of 'fwd_cnt'. */ >> +if (vvs->fwd_cnt < vvs->last_fwd_cnt) >> +fwd_cnt_delta = vvs->fwd_cnt + (U32_MAX - vvs->last_fwd_cnt); > > Are you sure there's no off by one here? 
for example if fwd_cnt is 0 > and last_fwd_cnt is 0xf then apparently delta is 0. Seems yes, I need +1 here > > >> +else >> +fwd_cnt_delta = vvs->fwd_cnt - vvs->last_fwd_cnt; > > I actually don't see what is wrong with just > fwd_cnt_delta = vvs->fwd_cnt - vvs->last_fwd_cnt > 32 bit unsigned math will I think handle wrap around correctly. > > And given buf_alloc is also u32 - I don't see where the bug is in > the original code. I think problem is when fwd_cnt wraps, while last_fwd_cnt is not. In this case fwd_cnt_delta will be too big, so we won't send credit update which leads to stall for sender Thanks, Arseniy > > >> + >> +free_space = vvs->buf_alloc - fwd_cnt_delta; >> +low_rx_bytes = (vvs->rx_bytes < >> +sock_rcvlowat(sk_vsock(vsk), 0, INT_MAX)); >> >> spin_unlock_bh(>rx_lock); >> >> @@ -611,9 +621,11 @@ virtio_transport_stream_do_dequeue(struct vsock_sock >> *vsk, >> * too high causes extra messages. Too low causes transmitter >> * stalls. As stalls are in theory more expensive than extra >> * messages, we set the limit to a high value. TODO: experiment >> - * with different values. >> + * with different values. Also send credit update message when >> + * number of bytes in rx queue is not enough to wake up reader. >> */ >> -if (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) >> +if (fwd_cnt_delta && >> +(free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE || low_rx_bytes)) >> virtio_transport_send_credit_update(vsk); >> >> return total; >> -- >> 2.25.1 >
Re: [PATCH net-next v7 3/4] virtio/vsock: fix logic which reduces credit update messages
On Thu, Dec 07, 2023 at 12:18:48AM +0300, Arseniy Krasnov wrote: > Add one more condition for sending credit update during dequeue from > stream socket: when number of bytes in the rx queue is smaller than > SO_RCVLOWAT value of the socket. This is actual for non-default value > of SO_RCVLOWAT (e.g. not 1) - idea is to "kick" peer to continue data > transmission, because we need at least SO_RCVLOWAT bytes in our rx > queue to wake up user for reading data (in corner case it is also > possible to stuck both tx and rx sides, this is why 'Fixes' is used). > Also handle case when 'fwd_cnt' wraps, while 'last_fwd_cnt' is still > not. > > Fixes: b89d882dc9fc ("vsock/virtio: reduce credit update messages") > Signed-off-by: Arseniy Krasnov > --- > Changelog: > v6 -> v7: > * Handle wrap of 'fwd_cnt'. > * Do to send credit update when 'fwd_cnt' == 'last_fwd_cnt'. > > net/vmw_vsock/virtio_transport_common.c | 18 +++--- > 1 file changed, 15 insertions(+), 3 deletions(-) > > diff --git a/net/vmw_vsock/virtio_transport_common.c > b/net/vmw_vsock/virtio_transport_common.c > index e137d740804e..39f8660d825d 100644 > --- a/net/vmw_vsock/virtio_transport_common.c > +++ b/net/vmw_vsock/virtio_transport_common.c > @@ -558,6 +558,8 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk, > struct virtio_vsock_sock *vvs = vsk->trans; > size_t bytes, total = 0; > struct sk_buff *skb; > + u32 fwd_cnt_delta; > + bool low_rx_bytes; > int err = -EFAULT; > u32 free_space; > > @@ -601,7 +603,15 @@ virtio_transport_stream_do_dequeue(struct vsock_sock > *vsk, > } > } > > - free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt); > + /* Handle wrap of 'fwd_cnt'. */ > + if (vvs->fwd_cnt < vvs->last_fwd_cnt) > + fwd_cnt_delta = vvs->fwd_cnt + (U32_MAX - vvs->last_fwd_cnt); Are you sure there's no off by one here? for example if fwd_cnt is 0 and last_fwd_cnt is 0xf then apparently delta is 0. 
> + else > + fwd_cnt_delta = vvs->fwd_cnt - vvs->last_fwd_cnt; I actually don't see what is wrong with just fwd_cnt_delta = vvs->fwd_cnt - vvs->last_fwd_cnt 32 bit unsigned math will I think handle wrap around correctly. And given buf_alloc is also u32 - I don't see where the bug is in the original code. > + > + free_space = vvs->buf_alloc - fwd_cnt_delta; > + low_rx_bytes = (vvs->rx_bytes < > + sock_rcvlowat(sk_vsock(vsk), 0, INT_MAX)); > > spin_unlock_bh(>rx_lock); > > @@ -611,9 +621,11 @@ virtio_transport_stream_do_dequeue(struct vsock_sock > *vsk, >* too high causes extra messages. Too low causes transmitter >* stalls. As stalls are in theory more expensive than extra >* messages, we set the limit to a high value. TODO: experiment > - * with different values. > + * with different values. Also send credit update message when > + * number of bytes in rx queue is not enough to wake up reader. >*/ > - if (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) > + if (fwd_cnt_delta && > + (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE || low_rx_bytes)) > virtio_transport_send_credit_update(vsk); > > return total; > -- > 2.25.1
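Michael's point can be checked directly: in 32-bit unsigned arithmetic the plain subtraction already yields the correct delta across a wrap of `fwd_cnt`, while the explicit branch in the patch is off by one (and patching it up with `U32_MAX + 1` would just wrap back to 0). A standalone comparison of the two computations:

```c
#include <stdint.h>

/* What the original code computes: unsigned subtraction is modulo
 * 2^32, so the wrap of fwd_cnt is handled for free. */
static uint32_t delta_plain(uint32_t fwd_cnt, uint32_t last_fwd_cnt)
{
	return fwd_cnt - last_fwd_cnt;
}

/* The patch's explicit branch, reproduced for comparison: it loses
 * one count whenever the counter has wrapped. */
static uint32_t delta_branch(uint32_t fwd_cnt, uint32_t last_fwd_cnt)
{
	if (fwd_cnt < last_fwd_cnt)
		return fwd_cnt + (UINT32_MAX - last_fwd_cnt);
	return fwd_cnt - last_fwd_cnt;
}
```

Using the example from the thread — `fwd_cnt` wrapped to 5, `last_fwd_cnt == 0xffffffff` — the true delta is 6, and `delta_plain()` produces exactly that while the branch yields 5.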
[PATCH net-next v7 4/4] vsock/test: two tests to check credit update logic
Both tests are almost same, only differs in two 'if' conditions, so implemented in a single function. Tests check, that credit update message is sent: 1) During setting SO_RCVLOWAT value of the socket. 2) When number of 'rx_bytes' become smaller than SO_RCVLOWAT value. Signed-off-by: Arseniy Krasnov Reviewed-by: Stefano Garzarella --- Changelog: v1 -> v2: * Update commit message by removing 'This patch adds XXX' manner. * Update commit message by adding details about dependency for this test from kernel internal define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE. * Add comment for this dependency in 'vsock_test.c' where this define is duplicated. v2 -> v3: * Replace synchronization based on control TCP socket with vsock data socket - this is needed to allow sender transmit data only when new buffer size of receiver is visible to sender. Otherwise there is race and test fails sometimes. v3 -> v4: * Replace 'recv_buf()' to 'recv(MSG_DONTWAIT)' in last read operation in server part. This is needed to ensure that 'poll()' wake up us when number of bytes ready to read is equal to SO_RCVLOWAT value. v4 -> v5: * Use 'recv_buf(MSG_DONTWAIT)' instead of 'recv(MSG_DONTWAIT)'. v5 -> v6: * Add second test which checks, that credit update is sent during reading data from socket. * Update commit message. tools/testing/vsock/vsock_test.c | 175 +++ 1 file changed, 175 insertions(+) diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c index 01fa816868bc..66246d81d654 100644 --- a/tools/testing/vsock/vsock_test.c +++ b/tools/testing/vsock/vsock_test.c @@ -1232,6 +1232,171 @@ static void test_double_bind_connect_client(const struct test_opts *opts) } } +#define RCVLOWAT_CREDIT_UPD_BUF_SIZE (1024 * 128) +/* This define is the same as in 'include/linux/virtio_vsock.h': + * it is used to decide when to send credit update message during + * reading from rx queue of a socket. Value and its usage in + * kernel is important for this test. 
+ */ +#define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE (1024 * 64) + +static void test_stream_rcvlowat_def_cred_upd_client(const struct test_opts *opts) +{ + size_t buf_size; + void *buf; + int fd; + + fd = vsock_stream_connect(opts->peer_cid, 1234); + if (fd < 0) { + perror("connect"); + exit(EXIT_FAILURE); + } + + /* Send 1 byte more than peer's buffer size. */ + buf_size = RCVLOWAT_CREDIT_UPD_BUF_SIZE + 1; + + buf = malloc(buf_size); + if (!buf) { + perror("malloc"); + exit(EXIT_FAILURE); + } + + /* Wait until peer sets needed buffer size. */ + recv_byte(fd, 1, 0); + + if (send(fd, buf, buf_size, 0) != buf_size) { + perror("send failed"); + exit(EXIT_FAILURE); + } + + free(buf); + close(fd); +} + +static void test_stream_credit_update_test(const struct test_opts *opts, + bool low_rx_bytes_test) +{ + size_t recv_buf_size; + struct pollfd fds; + size_t buf_size; + void *buf; + int fd; + + fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL); + if (fd < 0) { + perror("accept"); + exit(EXIT_FAILURE); + } + + buf_size = RCVLOWAT_CREDIT_UPD_BUF_SIZE; + + if (setsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_BUFFER_SIZE, + _size, sizeof(buf_size))) { + perror("setsockopt(SO_VM_SOCKETS_BUFFER_SIZE)"); + exit(EXIT_FAILURE); + } + + if (low_rx_bytes_test) { + /* Set new SO_RCVLOWAT here. This enables sending credit +* update when number of bytes if our rx queue become < +* SO_RCVLOWAT value. +*/ + recv_buf_size = 1 + VIRTIO_VSOCK_MAX_PKT_BUF_SIZE; + + if (setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, + _buf_size, sizeof(recv_buf_size))) { + perror("setsockopt(SO_RCVLOWAT)"); + exit(EXIT_FAILURE); + } + } + + /* Send one dummy byte here, because 'setsockopt()' above also +* sends special packet which tells sender to update our buffer +* size. This 'send_byte()' will serialize such packet with data +* reads in a loop below. Sender starts transmission only when +* it receives this single byte. 
+*/ + send_byte(fd, 1, 0); + + buf = malloc(buf_size); + if (!buf) { + perror("malloc"); + exit(EXIT_FAILURE); + } + + /* Wait until there will be 128KB of data in rx queue. */ + while (1) { + ssize_t res; + + res = recv(fd, buf, buf_size, MSG_PEEK); + if (res == buf_size) + break; + + if (res
[PATCH net-next v7 2/4] virtio/vsock: send credit update during setting SO_RCVLOWAT
Send credit update message when SO_RCVLOWAT is updated and it is bigger than number of bytes in rx queue. It is needed, because 'poll()' will wait until number of bytes in rx queue will be not smaller than SO_RCVLOWAT, so kick sender to send more data. Otherwise mutual hungup for tx/rx is possible: sender waits for free space and receiver is waiting data in 'poll()'. Signed-off-by: Arseniy Krasnov --- Changelog: v1 -> v2: * Update commit message by removing 'This patch adds XXX' manner. * Do not initialize 'send_update' variable - set it directly during first usage. v3 -> v4: * Fit comment in 'virtio_transport_notify_set_rcvlowat()' to 80 chars. v4 -> v5: * Do not change callbacks order in transport structures. v5 -> v6: * Reorder callbacks in transport structures. * Do to send credit update when 'fwd_cnt' == 'last_fwd_cnt'. drivers/vhost/vsock.c | 1 + include/linux/virtio_vsock.h| 1 + net/vmw_vsock/virtio_transport.c| 1 + net/vmw_vsock/virtio_transport_common.c | 30 + net/vmw_vsock/vsock_loopback.c | 1 + 5 files changed, 34 insertions(+) diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index f75731396b7e..ec20ecff85c7 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -449,6 +449,7 @@ static struct virtio_transport vhost_transport = { .notify_send_pre_enqueue = virtio_transport_notify_send_pre_enqueue, .notify_send_post_enqueue = virtio_transport_notify_send_post_enqueue, .notify_buffer_size = virtio_transport_notify_buffer_size, + .notify_set_rcvlowat = virtio_transport_notify_set_rcvlowat, .read_skb = virtio_transport_read_skb, }, diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h index ebb3ce63d64d..c82089dee0c8 100644 --- a/include/linux/virtio_vsock.h +++ b/include/linux/virtio_vsock.h @@ -256,4 +256,5 @@ void virtio_transport_put_credit(struct virtio_vsock_sock *vvs, u32 credit); void virtio_transport_deliver_tap_pkt(struct sk_buff *skb); int virtio_transport_purge_skbs(void *vsk, struct sk_buff_head *list); int 
virtio_transport_read_skb(struct vsock_sock *vsk, skb_read_actor_t read_actor); +int virtio_transport_notify_set_rcvlowat(struct vsock_sock *vsk, int val); #endif /* _LINUX_VIRTIO_VSOCK_H */ diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c index af5bab1acee1..f495b9e5186b 100644 --- a/net/vmw_vsock/virtio_transport.c +++ b/net/vmw_vsock/virtio_transport.c @@ -537,6 +537,7 @@ static struct virtio_transport virtio_transport = { .notify_send_pre_enqueue = virtio_transport_notify_send_pre_enqueue, .notify_send_post_enqueue = virtio_transport_notify_send_post_enqueue, .notify_buffer_size = virtio_transport_notify_buffer_size, + .notify_set_rcvlowat = virtio_transport_notify_set_rcvlowat, .read_skb = virtio_transport_read_skb, }, diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c index f6dc896bf44c..e137d740804e 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -1684,6 +1684,36 @@ int virtio_transport_read_skb(struct vsock_sock *vsk, skb_read_actor_t recv_acto } EXPORT_SYMBOL_GPL(virtio_transport_read_skb); +int virtio_transport_notify_set_rcvlowat(struct vsock_sock *vsk, int val) +{ + struct virtio_vsock_sock *vvs = vsk->trans; + bool send_update; + + spin_lock_bh(>rx_lock); + + /* If number of available bytes is less than new SO_RCVLOWAT value, +* kick sender to send more data, because sender may sleep in its +* 'send()' syscall waiting for enough space at our side. Also +* don't send credit update when peer already knows actual value - +* such transmission will be useless. 
+*/ + send_update = (vvs->rx_bytes < val) && + (vvs->fwd_cnt != vvs->last_fwd_cnt); + + spin_unlock_bh(>rx_lock); + + if (send_update) { + int err; + + err = virtio_transport_send_credit_update(vsk); + if (err < 0) + return err; + } + + return 0; +} +EXPORT_SYMBOL_GPL(virtio_transport_notify_set_rcvlowat); + MODULE_LICENSE("GPL v2"); MODULE_AUTHOR("Asias He"); MODULE_DESCRIPTION("common code for virtio vsock"); diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c index 048640167411..6dea6119f5b2 100644 --- a/net/vmw_vsock/vsock_loopback.c +++ b/net/vmw_vsock/vsock_loopback.c @@ -96,6 +96,7 @@ static struct virtio_transport loopback_transport = { .notify_send_pre_enqueue = virtio_transport_notify_send_pre_enqueue, .notify_send_post_enqueue = virtio_transport_notify_send_post_enqueue,
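The decision added in virtio_transport_notify_set_rcvlowat() can be exercised in isolation; a sketch of the predicate in plain C (names illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirror of the send_update condition: kick the sender only when the
 * new SO_RCVLOWAT value exceeds what is already queued locally, and
 * only when the peer's view of fwd_cnt is stale — if fwd_cnt equals
 * last_fwd_cnt, a credit update carries no new information. */
static bool should_send_update(uint32_t rx_bytes, int val,
			       uint32_t fwd_cnt, uint32_t last_fwd_cnt)
{
	return rx_bytes < (uint32_t)val && fwd_cnt != last_fwd_cnt;
}
```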
[PATCH net-next v7 0/4] send credit update during setting SO_RCVLOWAT
Hello, DESCRIPTION This patchset fixes old problem with hungup of both rx/tx sides and adds test for it. This happens due to non-default SO_RCVLOWAT value and deferred credit update in virtio/vsock. Link to previous old patchset: https://lore.kernel.org/netdev/39b2e9fd-601b-189d-39a9-914e55745...@sberdevices.ru/ Here is what happens step by step: TEST INITIAL CONDITIONS 1) Vsock buffer size is 128KB. 2) Maximum packet size is also 64KB as defined in header (yes it is hardcoded, just to remind about that value). 3) SO_RCVLOWAT is default, e.g. 1 byte. STEPS SENDER RECEIVER 1) sends 128KB + 1 byte in a single buffer. 128KB will be sent, but for 1 byte sender will wait for free space at peer. Sender goes to sleep. 2) reads 64KB, credit update not sent 3) sets SO_RCVLOWAT to 64KB + 1 4) poll() -> wait forever, there is only 64KB available to read. So in step 4) receiver also goes to sleep, waiting for enough data or connection shutdown message from the sender. Idea to fix it is that rx kicks tx side to continue transmission (and may be close connection) when rx changes number of bytes to be woken up (e.g. SO_RCVLOWAT) and this value is bigger than number of available bytes to read. I've added small test for this, but not sure as it uses hardcoded value for maximum packet length, this value is defined in kernel header and used to control deferred credit update. And as this is not available to userspace, I can't control test parameters correctly (if one day this define will be changed - test may become useless). 
Head for this patchset is: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=021b0c952f226236f2edf89c737efb9a28d1422d Link to v1: https://lore.kernel.org/netdev/20231108072004.1045669-1-avkras...@salutedevices.com/ Link to v2: https://lore.kernel.org/netdev/20231119204922.2251912-1-avkras...@salutedevices.com/ Link to v3: https://lore.kernel.org/netdev/20231122180510.2297075-1-avkras...@salutedevices.com/ Link to v4: https://lore.kernel.org/netdev/20231129212519.2938875-1-avkras...@salutedevices.com/ Link to v5: https://lore.kernel.org/netdev/20231130130840.253733-1-avkras...@salutedevices.com/ Link to v6: https://lore.kernel.org/netdev/20231205064806.2851305-1-avkras...@salutedevices.com/ Changelog: v1 -> v2: * Patchset rebased and tested on new HEAD of net-next (see hash above). * New patch is added as 0001 - it removes return from SO_RCVLOWAT set callback in 'af_vsock.c' when transport callback is set - with that we can set 'sk_rcvlowat' only once in 'af_vsock.c' and in future do not copy-paste it to every transport. It was discussed in v1. * See per-patch changelog after ---. v2 -> v3: * See changelog after --- in 0003 only (0001 and 0002 still same). v3 -> v4: * Patchset rebased and tested on new HEAD of net-next (see hash above). * See per-patch changelog after ---. v4 -> v5: * Change patchset tag 'RFC' -> 'net-next'. * See per-patch changelog after ---. v5 -> v6: * New patch 0003 which sends credit update during reading bytes from socket. * See per-patch changelog after ---. v6 -> v7: * Patchset rebased and tested on new HEAD of net-next (see hash above). * See per-patch changelog after ---. 
Arseniy Krasnov (4): vsock: update SO_RCVLOWAT setting callback virtio/vsock: send credit update during setting SO_RCVLOWAT virtio/vsock: fix logic which reduces credit update messages vsock/test: two tests to check credit update logic drivers/vhost/vsock.c | 1 + include/linux/virtio_vsock.h | 1 + include/net/af_vsock.h | 2 +- net/vmw_vsock/af_vsock.c | 9 +- net/vmw_vsock/hyperv_transport.c | 4 +- net/vmw_vsock/virtio_transport.c | 1 + net/vmw_vsock/virtio_transport_common.c | 48 ++- net/vmw_vsock/vsock_loopback.c | 1 + tools/testing/vsock/vsock_test.c | 175 9 files changed, 234 insertions(+), 8 deletions(-) -- 2.25.1
[PATCH net-next v7 3/4] virtio/vsock: fix logic which reduces credit update messages
Add one more condition for sending a credit update during dequeue from a stream socket: when the number of bytes in the rx queue is smaller than the SO_RCVLOWAT value of the socket. This matters for non-default SO_RCVLOWAT values (i.e. not 1): the idea is to "kick" the peer to continue data transmission, because we need at least SO_RCVLOWAT bytes in our rx queue to wake up the user for reading data (in a corner case it is also possible for both the tx and rx sides to get stuck, which is why 'Fixes' is used). Also handle the case where 'fwd_cnt' wraps while 'last_fwd_cnt' has not yet. Fixes: b89d882dc9fc ("vsock/virtio: reduce credit update messages") Signed-off-by: Arseniy Krasnov --- Changelog: v6 -> v7: * Handle wrap of 'fwd_cnt'. * Do not send credit update when 'fwd_cnt' == 'last_fwd_cnt'. net/vmw_vsock/virtio_transport_common.c | 18 +++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c index e137d740804e..39f8660d825d 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -558,6 +558,8 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk, struct virtio_vsock_sock *vvs = vsk->trans; size_t bytes, total = 0; struct sk_buff *skb; + u32 fwd_cnt_delta; + bool low_rx_bytes; int err = -EFAULT; u32 free_space; @@ -601,7 +603,15 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk, } } - free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt); + /* Handle wrap of 'fwd_cnt'. */ + if (vvs->fwd_cnt < vvs->last_fwd_cnt) + fwd_cnt_delta = vvs->fwd_cnt + (U32_MAX - vvs->last_fwd_cnt); + else + fwd_cnt_delta = vvs->fwd_cnt - vvs->last_fwd_cnt; + + free_space = vvs->buf_alloc - fwd_cnt_delta; + low_rx_bytes = (vvs->rx_bytes < + sock_rcvlowat(sk_vsock(vsk), 0, INT_MAX)); spin_unlock_bh(&vvs->rx_lock); @@ -611,9 +621,11 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk, * too high causes extra messages. Too low causes transmitter * stalls.
As stalls are in theory more expensive than extra * messages, we set the limit to a high value. TODO: experiment -* with different values. +* with different values. Also send credit update message when +* number of bytes in rx queue is not enough to wake up reader. */ - if (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) + if (fwd_cnt_delta && + (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE || low_rx_bytes)) virtio_transport_send_credit_update(vsk); return total; -- 2.25.1
[PATCH net-next v7 1/4] vsock: update SO_RCVLOWAT setting callback
Do not return early when the transport callback for SO_RCVLOWAT is set; return only in the error case. This way we don't need to set the 'sk_rcvlowat' field in each transport - it is set only in 'vsock_set_rcvlowat()'. Also, since 'sk_rcvlowat' is now set only in af_vsock.c, rename the callback from 'set_rcvlowat' to 'notify_set_rcvlowat'. Signed-off-by: Arseniy Krasnov Reviewed-by: Stefano Garzarella --- Changelog: v3 -> v4: * Rename 'set_rcvlowat' to 'notify_set_rcvlowat'. * Commit message updated. include/net/af_vsock.h | 2 +- net/vmw_vsock/af_vsock.c | 9 +++-- net/vmw_vsock/hyperv_transport.c | 4 ++-- 3 files changed, 10 insertions(+), 5 deletions(-) diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h index e302c0e804d0..535701efc1e5 100644 --- a/include/net/af_vsock.h +++ b/include/net/af_vsock.h @@ -137,7 +137,6 @@ struct vsock_transport { u64 (*stream_rcvhiwat)(struct vsock_sock *); bool (*stream_is_active)(struct vsock_sock *); bool (*stream_allow)(u32 cid, u32 port); - int (*set_rcvlowat)(struct vsock_sock *vsk, int val); /* SEQ_PACKET. */ ssize_t (*seqpacket_dequeue)(struct vsock_sock *vsk, struct msghdr *msg, @@ -168,6 +167,7 @@ struct vsock_transport { struct vsock_transport_send_notify_data *); /* sk_lock held by the caller */ void (*notify_buffer_size)(struct vsock_sock *, u64 *); + int (*notify_set_rcvlowat)(struct vsock_sock *vsk, int val); /* Shutdown. */ int (*shutdown)(struct vsock_sock *, int); diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index 816725af281f..54ba7316f808 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -2264,8 +2264,13 @@ static int vsock_set_rcvlowat(struct sock *sk, int val) transport = vsk->transport; - if (transport && transport->set_rcvlowat) - return transport->set_rcvlowat(vsk, val); + if (transport && transport->notify_set_rcvlowat) { + int err; + + err = transport->notify_set_rcvlowat(vsk, val); + if (err) + return err; + } WRITE_ONCE(sk->sk_rcvlowat, val ? 
: 1); return 0; diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c index 7cb1a9d2cdb4..e2157e387217 100644 --- a/net/vmw_vsock/hyperv_transport.c +++ b/net/vmw_vsock/hyperv_transport.c @@ -816,7 +816,7 @@ int hvs_notify_send_post_enqueue(struct vsock_sock *vsk, ssize_t written, } static -int hvs_set_rcvlowat(struct vsock_sock *vsk, int val) +int hvs_notify_set_rcvlowat(struct vsock_sock *vsk, int val) { return -EOPNOTSUPP; } @@ -856,7 +856,7 @@ static struct vsock_transport hvs_transport = { .notify_send_pre_enqueue = hvs_notify_send_pre_enqueue, .notify_send_post_enqueue = hvs_notify_send_post_enqueue, - .set_rcvlowat = hvs_set_rcvlowat + .notify_set_rcvlowat = hvs_notify_set_rcvlowat }; static bool hvs_check_transport(struct vsock_sock *vsk) -- 2.25.1
[PATCH RFC 4/4] dev_dax_iomap: Add CONFIG_DEV_DAX_IOMAP kernel build parameter
From: John Groves Add CONFIG_DEV_DAX_IOMAP kernel build parameter --- drivers/dax/Kconfig | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig index a88744244149..b1ebcc77120b 100644 --- a/drivers/dax/Kconfig +++ b/drivers/dax/Kconfig @@ -78,4 +78,10 @@ config DEV_DAX_KMEM Say N if unsure. +config DEV_DAX_IOMAP + depends on DEV_DAX && DAX + def_bool y + help + Support iomap mapping of devdax devices (for FS-DAX file + systems that reside on character /dev/dax devices) endif -- 2.40.1
[PATCH RFC 3/4] dev_dax_iomap: Add dax_operations to /dev/dax struct dax_device
From: John Groves This is the primary content of this rfc. Notes about this commit: * These methods are based somewhat loosely on pmem_dax_ops from drivers/nvdimm/pmem.c * dev_dax_direct_access() is physaddr based * dev_dax_direct_access() works for mmap, but fails dax_copy_to_iter() on posix read * dev_dax_recovery_write() and dev_dax_zero_page_range() have not been tested yet. I'm looking for suggestions as to how to test those. I'm hoping somebody (Dan?) can point the way to getting this working with posix I/O. Does this need to go the memremap route? Thanks, John --- drivers/dax/bus.c | 105 ++ 1 file changed, 105 insertions(+) diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c index 1b55fd7aabaf..8f8c2991c7c2 100644 --- a/drivers/dax/bus.c +++ b/drivers/dax/bus.c @@ -10,6 +10,12 @@ #include "dax-private.h" #include "bus.h" +#if IS_ENABLED(CONFIG_DEV_DAX_IOMAP) +#include +#include +#include +#endif + static DEFINE_MUTEX(dax_bus_lock); #define DAX_NAME_LEN 30 @@ -1374,6 +1380,100 @@ phys_addr_t dax_pgoff_to_phys(struct dev_dax *dev_dax, pgoff_t pgoff, } +/* the phys address approach */ +long __dev_dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff, +long nr_pages, enum dax_access_mode mode, void **kaddr, +pfn_t *pfn) +{ + struct dev_dax *dev_dax = dax_get_private(dax_dev); + size_t size = nr_pages << PAGE_SHIFT; + size_t offset = pgoff << PAGE_SHIFT; + long range_remainder = 0; + phys_addr_t phys; + int i; + + /* +* pmem hides dax ranges by mapping to a contiguous +* pmem->virt_addr = devm_memremap_pages() (in pmem_attach_disk()). +* Is it legal to avoid the vmap overhead (and resource consumption) and just return +* a (potentially partial) phys range? This function does this, returning the +* phys_addr with the length truncated if necessary to the range remainder +*/ + phys = dax_pgoff_to_phys(dev_dax, pgoff, nr_pages << PAGE_SHIFT); + + if (kaddr) + *kaddr = (void *)phys; + + if (pfn) + *pfn = phys_to_pfn_t(phys, PFN_DEV|PFN_MAP); /* are flags correct? 
*/ + /* +* If dax_pgoff_to_phys() also returned the range remainder (range_len - range_offset) +* this loop would not be necessary +*/ + for (i = 0; i < dev_dax->nr_range; i++) { + size_t rlen = range_len(&(dev_dax->ranges[i].range)); + + if (offset < rlen) { + range_remainder = rlen - offset; + break; + } + offset -= rlen; + } + + /* +* Return length valid at phys. Hoping callers can deal with len < entire_dax_device +* (or < npages). This returns the remaining length in the applicable dax region. +*/ + return PHYS_PFN(min_t(size_t, range_remainder, size)); +} + +static int dev_dax_zero_page_range(struct dax_device *dax_dev, pgoff_t pgoff, + size_t nr_pages) +{ + long resid = nr_pages << PAGE_SHIFT; + long offset = pgoff << PAGE_SHIFT; + + /* Break into one write per dax region */ + while (resid > 0) { + void *kaddr; + pgoff_t poff = offset >> PAGE_SHIFT; + long len = __dev_dax_direct_access(dax_dev, poff, + nr_pages, DAX_ACCESS, &kaddr, NULL); + len = min_t(long, len, PAGE_SIZE); + write_dax(kaddr, ZERO_PAGE(0), offset, len); + + offset += len; + resid -= len; + } + return 0; +} + +static long dev_dax_direct_access(struct dax_device *dax_dev, + pgoff_t pgoff, long nr_pages, enum dax_access_mode mode, + void **kaddr, pfn_t *pfn) +{ + return __dev_dax_direct_access(dax_dev, pgoff, nr_pages, mode, kaddr, pfn); +} + +static size_t dev_dax_recovery_write(struct dax_device *dax_dev, pgoff_t pgoff, + void *addr, size_t bytes, struct iov_iter *i) +{ + size_t len, off; + + off = offset_in_page(addr); + len = PFN_PHYS(PFN_UP(off + bytes)); + + return _copy_from_iter_flushcache(addr, bytes, i); +} + +static const struct dax_operations dev_dax_ops = { + .direct_access = dev_dax_direct_access, + .zero_page_range = dev_dax_zero_page_range, + .recovery_write = dev_dax_recovery_write, +}; +#endif /* IS_ENABLED(CONFIG_DEV_DAX_IOMAP) */ + struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data) { struct dax_region *dax_region = data->dax_region; @@ -1429,11 +1529,16 @@ struct 
dev_dax *devm_create_dev_dax(struct dev_dax_data *data) } } +#if IS_ENABLED(CONFIG_DEV_DAX_IOMAP) + /* holder_ops currently populated separately in a slightly hacky way */ + dax_dev =
[PATCH RFC 2/4] dev_dax_iomap: Temporary hacks due to linkage issues
From: John Groves These are functions that should be called from outside, but I had linkage issues and did this hack instead. Will fix in the "real" patches... --- drivers/dax/bus.c | 50 +++ 1 file changed, 50 insertions(+) diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c index 1659b787b65f..1b55fd7aabaf 100644 --- a/drivers/dax/bus.c +++ b/drivers/dax/bus.c @@ -1324,6 +1324,56 @@ static const struct device_type dev_dax_type = { .groups = dax_attribute_groups, }; +#if IS_ENABLED(CONFIG_DEV_DAX_IOMAP) + +/* + * This is write_pmem() from pmem.c + */ +static void write_dax(void *pmem_addr, struct page *page, + unsigned int off, unsigned int len) +{ + unsigned int chunk; + void *mem; + + while (len) { + mem = kmap_atomic(page); + chunk = min_t(unsigned int, len, PAGE_SIZE - off); + memcpy_flushcache(pmem_addr, mem + off, chunk); + kunmap_atomic(mem); + len -= chunk; + off = 0; + page++; + pmem_addr += chunk; + } +} + +/* + * This function is from drivers/dax/device.c + * For some reason EXPORT_SYMBOL(dax_pgoff_to_phys) didn't result in linkable code + */ +phys_addr_t dax_pgoff_to_phys(struct dev_dax *dev_dax, pgoff_t pgoff, + unsigned long size) +{ + int i; + + for (i = 0; i < dev_dax->nr_range; i++) { + struct dev_dax_range *dax_range = &dev_dax->ranges[i]; + struct range *range = &dax_range->range; + unsigned long long pgoff_end; + phys_addr_t phys; + + pgoff_end = dax_range->pgoff + PHYS_PFN(range_len(range)) - 1; + if (pgoff < dax_range->pgoff || pgoff > pgoff_end) + continue; + phys = PFN_PHYS(pgoff - dax_range->pgoff) + range->start; + if (phys + size - 1 <= range->end) + return phys; + break; + } + return -1; +} + + struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data) { struct dax_region *dax_region = data->dax_region; -- 2.40.1
[PATCH RFC 1/4] dev_dax_iomap: Add add_dax_ops() func for fs-dax to provide dax holder_ops
From: John Groves This is clearly not the right way to set the holder_ops; where & how should this be done? --- drivers/dax/super.c | 16 include/linux/dax.h | 5 + 2 files changed, 21 insertions(+) diff --git a/drivers/dax/super.c b/drivers/dax/super.c index 0da9232ea175..3d4e205c1854 100644 --- a/drivers/dax/super.c +++ b/drivers/dax/super.c @@ -467,6 +467,22 @@ struct dax_device *alloc_dax(void *private, const struct dax_operations *ops) } EXPORT_SYMBOL_GPL(alloc_dax); +#if IS_ENABLED(CONFIG_DEV_DAX_IOMAP) +/* famfs calls this to add the holder_ops. There is probably a more elegant approach */ +int add_dax_ops( + struct dax_device *dax_dev, + const struct dax_holder_operations *hops) +{ + /* dax_dev->ops should have been populated by devm_create_dev_dax() */ + WARN_ON(!dax_dev->ops); + + /* Use cmpxchg? */ + dax_dev->holder_ops = hops; + return 0; +} +EXPORT_SYMBOL_GPL(add_dax_ops); +#endif /* DEV_DAX_IOMAP */ + void put_dax(struct dax_device *dax_dev) { if (!dax_dev) diff --git a/include/linux/dax.h b/include/linux/dax.h index b463502b16e1..31b68667b3cb 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -57,6 +57,11 @@ struct dax_holder_operations { #if IS_ENABLED(CONFIG_DAX) struct dax_device *alloc_dax(void *private, const struct dax_operations *ops); + +#if IS_ENABLED(CONFIG_DEV_DAX_IOMAP) +int add_dax_ops(struct dax_device *dax_dev, + const struct dax_holder_operations *hops); +#endif void *dax_holder(struct dax_device *dax_dev); void put_dax(struct dax_device *dax_dev); void kill_dax(struct dax_device *dax_dev); -- 2.40.1
[PATCH RFC 0/4] dev_dax_iomap: introduce iomap for /dev/dax
From: John Groves This patch set is not intended to be merged; I'm hoping to get some clarification as to the correct approach (especially from Dan). This work is related to famfs, which is a dax file system for shared fabric-attached memory (FAM). Famfs is "coming soon" as an RFC, but the concept and requirements were presented at LPC 2023. See https://lpc.events/event/17/contributions/1455/ and https://www.youtube.com/watch?v=aA_DgO95gLo. My expectation is that a future (fully working) version of this patch will be folded into the famfs patches. Unlike the current set of fs-dax file systems, famfs does not need a block (pmem) device, and should really run on a /dev/dax character device since that's how sharable fabric-attached cxl memory will surface. But /dev/dax character devices are missing some functionality that is provided by the block /dev/pmem driver - specifically the struct dax_operations pointer in struct dax_device. This patch, when CONFIG_DEV_DAX_IOMAP=y, populates dax_dev->ops for character dax devices. The added operations differ (currently) from /dev/pmem's dev_dax->ops in that they don't use memremap but instead provide a physical address in response to the dev_dax->direct_access() method. The dax_operations are direct_access() (which resolves a dax dev offset to an address), zero_page_range() and recovery_write(). I'm not sure yet how to test the latter two, but the direct_access() method works in conjunction with famfs - but only for mmapped files. But Posix reads fail. Specifically dax_iomap_iter() calls dax_copy_to_iter(), which declines to copy the data for some reason in one of the lower level copy_to_user variants. I've tried to isolate the reason for the failure with a VM under gdb, but no luck there yet. I'm wondering if there is some flag or attribute that needs to be applied to these addresses/pages somehow to allow this to work.
The failing copy_to_user() differs from the same path with pmem fs-dax, in that pmem does a memremap (which I think generates one contiguous range, even if the device had more than one range - is this right, and does this mean it's consuming some of the vmap/vmalloc range?) I spent some time attempting a memremap, but I haven't figured out the magic for that. However, I like the simplicity of resolving to phys if that's not a non-starter for some reason. I hope this is enough context for a meaningful review and suggestions as to what a working dev_dax->dax_operations implementation should look like. Thanks for any tips! John Groves (4): Add add_dax_ops() func for fs-dax to provide dax holder_ops Temporary hacks due to linkage issues Add dax_operations to /dev/dax struct dax_device Add CONFIG_DEV_DAX_IOMAP kernel build parameter drivers/dax/Kconfig | 6 ++ drivers/dax/bus.c | 155 drivers/dax/super.c | 16 + include/linux/dax.h | 5 ++ 4 files changed, 182 insertions(+) -- 2.40.1
[PATCH v6 1/2] depmod: Handle installing modules under a different directory
Some distributions aim at shipping all files in /usr. The path under which kernel modules are installed is hardcoded to /lib, which conflicts with this goal. When kmod provides kmod.pc, use it to determine the correct module installation path. With a kmod that does not provide the config, /lib/modules is used as before. While pkg-config does not return an error when a variable does not exist, the kmod configure script puts some effort into ensuring that module_directory is non-empty. With that, an empty module_directory from pkg-config can be used to detect absence of the variable. Signed-off-by: Michal Suchanek --- v6: - use ?= instead of := to make it easier to override the value - use shorter expression for determining the module directory assuming it's non-empty --- Makefile | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 511b5616aa41..84f32bd563d4 100644 --- a/Makefile +++ b/Makefile @@ -1081,7 +1081,9 @@ export INSTALL_DTBS_PATH ?= $(INSTALL_PATH)/dtbs/$(KERNELRELEASE) # makefile but the argument can be passed to make if needed. # -MODLIB = $(INSTALL_MOD_PATH)/lib/modules/$(KERNELRELEASE) +export KERNEL_MODULE_DIRECTORY ?= $(or $(shell pkg-config --variable=module_directory kmod 2>/dev/null),/lib/modules) + +MODLIB = $(INSTALL_MOD_PATH)$(KERNEL_MODULE_DIRECTORY)/$(KERNELRELEASE) export MODLIB PHONY += prepare0 -- 2.42.0
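The Makefile expression above can be tried standalone: ask kmod's pkg-config file for module_directory and fall back to /lib/modules when kmod.pc is absent or the variable is empty (older kmod). This is a shell sketch of the same logic, not part of the patch.

```shell
# Query kmod for its module directory; empty output (missing kmod.pc or
# missing variable) falls back to the traditional /lib/modules.
dir=$(pkg-config --variable=module_directory kmod 2>/dev/null)
dir=${dir:-/lib/modules}
echo "$dir/$(uname -r)"
```

On a usrmerge distribution with a patched kmod this prints something like /usr/lib/modules/6.7.0; everywhere else it keeps the historical /lib/modules path.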
[PATCH v6 2/2] kbuild: rpm-pkg: Fix build with non-default MODLIB
The default MODLIB value is composed of three variables MODLIB = $(INSTALL_MOD_PATH)$(KERNEL_MODULE_DIRECTORY)/$(KERNELRELEASE) However, the kernel.spec hardcodes the default value of $(KERNEL_MODULE_DIRECTORY), and a changed value is not reflected when building the package. Pass KERNEL_MODULE_DIRECTORY to kernel.spec to fix this problem. Signed-off-by: Michal Suchanek --- Build on top of the previous patch adding KERNEL_MODULE_DIRECTORY --- scripts/package/kernel.spec | 8 scripts/package/mkspec | 1 + 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/scripts/package/kernel.spec b/scripts/package/kernel.spec index 3eee0143e0c5..12996ed365f8 100644 --- a/scripts/package/kernel.spec +++ b/scripts/package/kernel.spec @@ -67,7 +67,7 @@ cp $(%{make} %{makeflags} -s image_name) %{buildroot}/boot/vmlinuz-%{KERNELRELEA %{make} %{makeflags} INSTALL_HDR_PATH=%{buildroot}/usr headers_install cp System.map %{buildroot}/boot/System.map-%{KERNELRELEASE} cp .config %{buildroot}/boot/config-%{KERNELRELEASE} -ln -fns /usr/src/kernels/%{KERNELRELEASE} %{buildroot}/lib/modules/%{KERNELRELEASE}/build +ln -fns /usr/src/kernels/%{KERNELRELEASE} %{buildroot}%{KERNEL_MODULE_DIRECTORY}/%{KERNELRELEASE}/build %if %{with_devel} %{make} %{makeflags} run-command KBUILD_RUN_COMMAND='${srctree}/scripts/package/install-extmod-build %{buildroot}/usr/src/kernels/%{KERNELRELEASE}' %endif @@ -98,8 +98,8 @@ fi %files %defattr (-, root, root) -/lib/modules/%{KERNELRELEASE} -%exclude /lib/modules/%{KERNELRELEASE}/build +%{KERNEL_MODULE_DIRECTORY}/%{KERNELRELEASE} +%exclude %{KERNEL_MODULE_DIRECTORY}/%{KERNELRELEASE}/build /boot/* %files headers @@ -110,5 +110,5 @@ fi %files devel %defattr (-, root, root) /usr/src/kernels/%{KERNELRELEASE} -/lib/modules/%{KERNELRELEASE}/build +%{KERNEL_MODULE_DIRECTORY}/%{KERNELRELEASE}/build %endif diff --git a/scripts/package/mkspec b/scripts/package/mkspec index ce201bfa8377..e952fa4f2937 100755 --- a/scripts/package/mkspec +++ b/scripts/package/mkspec @@ -24,6 
+24,7 @@ fi cat<
Re: [PATCH] init: move THIS_MODULE from <linux/export.h> to <linux/init.h>
[Re: [PATCH] init: move THIS_MODULE from <linux/export.h> to <linux/init.h> ] On 03/12/2023 (Sun 19:06) Masahiro Yamada wrote: > On Sun, Nov 26, 2023 at 4:19 PM Masahiro Yamada > wrote: > > > > Commit f50169324df4 ("module.h: split out the EXPORT_SYMBOL into > > export.h") appropriately separated EXPORT_SYMBOL into <linux/export.h> > > because modules and EXPORT_SYMBOL are orthogonal; modules are symbol > > consumers, while EXPORT_SYMBOL are used by symbol providers, which > > may not be necessarily a module. > > > > However, that commit also relocated THIS_MODULE. As explained in the > > commit description, the intention was to define THIS_MODULE in a > > lightweight header, but I do not believe <linux/export.h> was the > > suitable location because EXPORT_SYMBOL and THIS_MODULE are unrelated. > > > > Move it to another lightweight header, <linux/init.h>. The reason for > > choosing <linux/init.h> is to make self-contained > > without relying on incorrectly including > > . > > > > With this adjustment, the role of <linux/export.h> becomes clearer as > > it only defines EXPORT_SYMBOL. > > > > Signed-off-by: Masahiro Yamada > > --- > > > Applied to kbuild. > > I did not get any report from the 0day bot so far, > but I hope it will get a little more compile tests > before getting into linux-next. Haven't touched that kind of header shuffle for over 10 years? But yeah, it is near impossible to not trip over some implicit header inclusion somewhere in some driver or a less common arch and hence break the build at least once when doing this kind of stuff. Paul. -- > > > > > > > include/linux/export.h | 18 -- > > include/linux/init.h | 7 +++ > > 2 files changed, 7 insertions(+), 18 deletions(-) > > > > diff --git a/include/linux/export.h b/include/linux/export.h > > index 9911508a9604..0bbd02fd351d 100644 > > --- a/include/linux/export.h > > +++ b/include/linux/export.h > > @@ -6,15 +6,6 @@ > > #include > > #include > > > > -/* > > - * Export symbols from the kernel to modules. 
Forked from module.h > > - * to reduce the amount of pointless cruft we feed to gcc when only > > - * exporting a simple symbol or two. > > - * > > - * Try not to add #includes here. It slows compilation and makes kernel > > - * hackers place grumpy comments in header files. > > - */ > > - > > /* > > * This comment block is used by fixdep. Please do not remove. > > * > > @@ -23,15 +14,6 @@ > > * side effect of the *.o build rule. > > */ > > > > -#ifndef __ASSEMBLY__ > > -#ifdef MODULE > > -extern struct module __this_module; > > -#define THIS_MODULE (&__this_module) > > -#else > > -#define THIS_MODULE ((struct module *)0) > > -#endif > > -#endif /* __ASSEMBLY__ */ > > - > > #ifdef CONFIG_64BIT > > #define __EXPORT_SYMBOL_REF(sym) \ > > .balign 8 ASM_NL \ > > diff --git a/include/linux/init.h b/include/linux/init.h > > index 01b52c9c7526..3fa3f6241350 100644 > > --- a/include/linux/init.h > > +++ b/include/linux/init.h > > @@ -179,6 +179,13 @@ extern void (*late_time_init)(void); > > > > extern bool initcall_debug; > > > > +#ifdef MODULE > > +extern struct module __this_module; > > +#define THIS_MODULE (&__this_module) > > +#else > > +#define THIS_MODULE ((struct module *)0) > > +#endif > > + > > #endif > > > > #ifndef MODULE > > -- > > 2.40.1 > > > > > -- > Best Regards > Masahiro Yamada
Re: [PATCH net] vsock/virtio: fix "comparison of distinct pointer types lacks a cast" warning
On 06.12.2023 19:41, Stefano Garzarella wrote: > After backporting commit 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY > flag support") in CentOS Stream 9, CI reported the following error: > > In file included from ./include/linux/kernel.h:17, > from ./include/linux/list.h:9, > from ./include/linux/preempt.h:11, > from ./include/linux/spinlock.h:56, > from net/vmw_vsock/virtio_transport_common.c:9: > net/vmw_vsock/virtio_transport_common.c: In function > ‘virtio_transport_can_zcopy‘: > ./include/linux/minmax.h:20:35: error: comparison of distinct pointer > types lacks a cast [-Werror] >20 | (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1))) > | ^~ > ./include/linux/minmax.h:26:18: note: in expansion of macro ‘__typecheck‘ >26 | (__typecheck(x, y) && __no_side_effects(x, y)) > | ^~~ > ./include/linux/minmax.h:36:31: note: in expansion of macro ‘__safe_cmp‘ >36 | __builtin_choose_expr(__safe_cmp(x, y), \ > | ^~ > ./include/linux/minmax.h:45:25: note: in expansion of macro > ‘__careful_cmp‘ >45 | #define min(x, y) __careful_cmp(x, y, <) > | ^ > net/vmw_vsock/virtio_transport_common.c:63:37: note: in expansion of > macro ‘min‘ >63 | int pages_to_send = min(pages_in_iov, > MAX_SKB_FRAGS); > > We could solve it by using min_t(), but this operation seems entirely > unnecessary, because we also pass MAX_SKB_FRAGS to iov_iter_npages(), > which performs almost the same check, returning at most MAX_SKB_FRAGS > elements. So, let's eliminate this unnecessary comparison. 
> > Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support") > Cc: avkras...@salutedevices.com > Signed-off-by: Stefano Garzarella > --- Reviewed-by: Arseniy Krasnov > net/vmw_vsock/virtio_transport_common.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/net/vmw_vsock/virtio_transport_common.c > b/net/vmw_vsock/virtio_transport_common.c > index f6dc896bf44c..c8e162c9d1df 100644 > --- a/net/vmw_vsock/virtio_transport_common.c > +++ b/net/vmw_vsock/virtio_transport_common.c > @@ -59,8 +59,7 @@ static bool virtio_transport_can_zcopy(const struct > virtio_transport *t_ops, > t_ops = virtio_transport_get_ops(info->vsk); > > if (t_ops->can_msgzerocopy) { > - int pages_in_iov = iov_iter_npages(iov_iter, MAX_SKB_FRAGS); > - int pages_to_send = min(pages_in_iov, MAX_SKB_FRAGS); > + int pages_to_send = iov_iter_npages(iov_iter, MAX_SKB_FRAGS); > > /* +1 is for packet header. */ > return t_ops->can_msgzerocopy(pages_to_send + 1);
Re: [PATCH 0/2] kmod /usr support
On Fri, Nov 10, 2023 at 01:13:53PM +0100, Michal Suchanek wrote: Hello, This is a resend of the last patch in the series, which adds prefix support to the kernel module location, together with an additional patch for validating the user-supplied input to options that are interpreted as directories. Thanks Applied, thanks Lucas De Marchi
Re: [PATCH net] vsock/virtio: fix "comparison of distinct pointer types lacks a cast" warning
On Wed, Dec 06, 2023 at 05:41:43PM +0100, Stefano Garzarella wrote: > After backporting commit 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY > flag support") in CentOS Stream 9, CI reported the following error: > > In file included from ./include/linux/kernel.h:17, > from ./include/linux/list.h:9, > from ./include/linux/preempt.h:11, > from ./include/linux/spinlock.h:56, > from net/vmw_vsock/virtio_transport_common.c:9: > net/vmw_vsock/virtio_transport_common.c: In function > ‘virtio_transport_can_zcopy’: > ./include/linux/minmax.h:20:35: error: comparison of distinct pointer > types lacks a cast [-Werror] >20 | (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1))) > | ^~ > ./include/linux/minmax.h:26:18: note: in expansion of macro > ‘__typecheck’ >26 | (__typecheck(x, y) && __no_side_effects(x, y)) > | ^~~ > ./include/linux/minmax.h:36:31: note: in expansion of macro > ‘__safe_cmp’ >36 | __builtin_choose_expr(__safe_cmp(x, y), \ > | ^~ > ./include/linux/minmax.h:45:25: note: in expansion of macro > ‘__careful_cmp’ >45 | #define min(x, y) __careful_cmp(x, y, <) > | ^ > net/vmw_vsock/virtio_transport_common.c:63:37: note: in expansion of > macro ‘min’ >63 | int pages_to_send = min(pages_in_iov, > MAX_SKB_FRAGS); > > We could solve it by using min_t(), but this operation seems entirely > unnecessary, because we also pass MAX_SKB_FRAGS to iov_iter_npages(), > which performs almost the same check, returning at most MAX_SKB_FRAGS > elements. So, let's eliminate this unnecessary comparison. > > Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support") > Cc: avkras...@salutedevices.com > Signed-off-by: Stefano Garzarella > --- Acked-by: Michael S. 
Tsirkin > net/vmw_vsock/virtio_transport_common.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/net/vmw_vsock/virtio_transport_common.c > b/net/vmw_vsock/virtio_transport_common.c > index f6dc896bf44c..c8e162c9d1df 100644 > --- a/net/vmw_vsock/virtio_transport_common.c > +++ b/net/vmw_vsock/virtio_transport_common.c > @@ -59,8 +59,7 @@ static bool virtio_transport_can_zcopy(const struct > virtio_transport *t_ops, > t_ops = virtio_transport_get_ops(info->vsk); > > if (t_ops->can_msgzerocopy) { > - int pages_in_iov = iov_iter_npages(iov_iter, MAX_SKB_FRAGS); > - int pages_to_send = min(pages_in_iov, MAX_SKB_FRAGS); > + int pages_to_send = iov_iter_npages(iov_iter, MAX_SKB_FRAGS); > > /* +1 is for packet header. */ > return t_ops->can_msgzerocopy(pages_to_send + 1); > -- > 2.43.0
Re: [PATCH v7 3/4] remoteproc: zynqmp: add pm domains support
[PATCH net] vsock/virtio: fix "comparison of distinct pointer types lacks a cast" warning
After backporting commit 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support") in CentOS Stream 9, CI reported the following error: In file included from ./include/linux/kernel.h:17, from ./include/linux/list.h:9, from ./include/linux/preempt.h:11, from ./include/linux/spinlock.h:56, from net/vmw_vsock/virtio_transport_common.c:9: net/vmw_vsock/virtio_transport_common.c: In function 'virtio_transport_can_zcopy': ./include/linux/minmax.h:20:35: error: comparison of distinct pointer types lacks a cast [-Werror] 20 | (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1))) | ^~ ./include/linux/minmax.h:26:18: note: in expansion of macro '__typecheck' 26 | (__typecheck(x, y) && __no_side_effects(x, y)) | ^~~ ./include/linux/minmax.h:36:31: note: in expansion of macro '__safe_cmp' 36 | __builtin_choose_expr(__safe_cmp(x, y), \ | ^~ ./include/linux/minmax.h:45:25: note: in expansion of macro '__careful_cmp' 45 | #define min(x, y) __careful_cmp(x, y, <) | ^ net/vmw_vsock/virtio_transport_common.c:63:37: note: in expansion of macro 'min' 63 | int pages_to_send = min(pages_in_iov, MAX_SKB_FRAGS); We could solve it by using min_t(), but this operation seems entirely unnecessary, because we also pass MAX_SKB_FRAGS to iov_iter_npages(), which performs almost the same check, returning at most MAX_SKB_FRAGS elements. So, let's eliminate this unnecessary comparison.
Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support") Cc: avkras...@salutedevices.com Signed-off-by: Stefano Garzarella --- net/vmw_vsock/virtio_transport_common.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c index f6dc896bf44c..c8e162c9d1df 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -59,8 +59,7 @@ static bool virtio_transport_can_zcopy(const struct virtio_transport *t_ops, t_ops = virtio_transport_get_ops(info->vsk); if (t_ops->can_msgzerocopy) { - int pages_in_iov = iov_iter_npages(iov_iter, MAX_SKB_FRAGS); - int pages_to_send = min(pages_in_iov, MAX_SKB_FRAGS); + int pages_to_send = iov_iter_npages(iov_iter, MAX_SKB_FRAGS); /* +1 is for packet header. */ return t_ops->can_msgzerocopy(pages_to_send + 1); -- 2.43.0
Re: [PATCH v7] bus: mhi: host: Add tracing support
On Wed, 6 Dec 2023 21:12:57 +0530 Krishna chaitanya chundru wrote: > diff --git a/drivers/bus/mhi/host/init.c b/drivers/bus/mhi/host/init.c > index f78aefd2d7a3..6acb85f4c5f8 100644 > --- a/drivers/bus/mhi/host/init.c > +++ b/drivers/bus/mhi/host/init.c > @@ -20,6 +20,9 @@ > #include > #include "internal.h" > > +#define CREATE_TRACE_POINTS > +#include "trace.h" > + > static DEFINE_IDA(mhi_controller_ida); > > const char * const mhi_ee_str[MHI_EE_MAX] = { > diff --git a/drivers/bus/mhi/host/main.c b/drivers/bus/mhi/host/main.c > index dcf627b36e82..507cf3351a97 100644 > --- a/drivers/bus/mhi/host/main.c > +++ b/drivers/bus/mhi/host/main.c > @@ -15,6 +15,7 @@ > #include > #include > #include "internal.h" > +#include "trace.h" > > int __must_check mhi_read_reg(struct mhi_controller *mhi_cntrl, > void __iomem *base, u32 offset, u32 *out) > @@ -491,11 +492,9 @@ irqreturn_t mhi_intvec_threaded_handler(int irq_number, > void *priv) > > state = mhi_get_mhi_state(mhi_cntrl); > ee = mhi_get_exec_env(mhi_cntrl); > - dev_dbg(dev, "local ee: %s state: %s device ee: %s state: %s\n", > - TO_MHI_EXEC_STR(mhi_cntrl->ee), > - mhi_state_str(mhi_cntrl->dev_state), > - TO_MHI_EXEC_STR(ee), mhi_state_str(state)); > > + trace_mhi_intvec_states(mhi_cntrl->mhi_dev->name, mhi_cntrl->ee, > + mhi_cntrl->dev_state, ee, state); I'm not sure if this was discussed before, but why not just pass in the mhi_cntrl pointer and do the assigning in the TRACE_EVENT() TP_fast_assign() portion? It will save on setting up the parameters. > if (state == MHI_STATE_SYS_ERR) { > dev_dbg(dev, "System error detected\n"); > pm_state = mhi_tryset_pm_state(mhi_cntrl, > @@ -832,6 +831,10 @@ int mhi_process_ctrl_ev_ring(struct mhi_controller > *mhi_cntrl, > while (dev_rp != local_rp) { > enum mhi_pkt_type type = MHI_TRE_GET_EV_TYPE(local_rp); > > + trace_mhi_ctrl_event(mhi_cntrl->mhi_dev->name, local_rp, > + local_rp->ptr, local_rp->dword[0], > + local_rp->dword[1]); And here.. 
> + > switch (type) { > case MHI_PKT_TYPE_BW_REQ_EVENT: > { > @@ -997,6 +1000,9 @@ int mhi_process_data_event_ring(struct mhi_controller > *mhi_cntrl, > while (dev_rp != local_rp && event_quota > 0) { > enum mhi_pkt_type type = MHI_TRE_GET_EV_TYPE(local_rp); > > + trace_mhi_data_event(mhi_cntrl->mhi_dev->name, local_rp, > local_rp->ptr, > + local_rp->dword[0], local_rp->dword[1]); and here.. for the local_rp. Especially since you are just repeating what you send into the two events. You can consolidate it down to just adding that in the TP_fast_assign() of the shared DECLARE_EVENT_CLASS(). > + > chan = MHI_TRE_GET_EV_CHID(local_rp); > > WARN_ON(chan >= mhi_cntrl->max_chan); > @@ -1235,6 +1241,8 @@ int mhi_gen_tre(struct mhi_controller *mhi_cntrl, > struct mhi_chan *mhi_chan, > mhi_tre->dword[0] = MHI_TRE_DATA_DWORD0(info->len); > mhi_tre->dword[1] = MHI_TRE_DATA_DWORD1(bei, eot, eob, chain); > > + trace_mhi_gen_tre(mhi_cntrl->mhi_dev->name, mhi_chan->chan, mhi_tre, > + mhi_tre->ptr, mhi_tre->dword[0], mhi_tre->dword[1]); Same here. You can pass in the mhi_tre pointer. > /* increment WP */ > mhi_add_ring_element(mhi_cntrl, tre_ring); > mhi_add_ring_element(mhi_cntrl, buf_ring); > @@ -1327,9 +1335,7 @@ static int mhi_update_channel_state(struct > mhi_controller *mhi_cntrl, > enum mhi_cmd_type cmd = MHI_CMD_NOP; > int ret; > > - dev_dbg(dev, "%d: Updating channel state to: %s\n", mhi_chan->chan, > - TO_CH_STATE_TYPE_STR(to_state)); > - > + trace_mhi_channel_command_start(mhi_cntrl->mhi_dev->name, > mhi_chan->chan, to_state); And here.. > switch (to_state) { > case MHI_CH_STATE_TYPE_RESET: > write_lock_irq(&mhi_chan->lock); > @@ -1396,9 +1402,7 @@ static int mhi_update_channel_state(struct > mhi_controller *mhi_cntrl, > write_unlock_irq(&mhi_chan->lock); > } > > - dev_dbg(dev, "%d: Channel state change to %s successful\n", > - mhi_chan->chan, TO_CH_STATE_TYPE_STR(to_state)); > - > + trace_mhi_channel_command_end(mhi_cntrl->mhi_dev->name, mhi_chan->chan, > to_state); and here..
Where you can also update the DECLARE_EVENT_CLASS() to directly access the mhi_cntrl and mhi_chan pointers. Sometimes it's better to do the dereference from inside the TP_fast_assign. That way, things like eprobes and bpf tracing can also have access to the full structure if needed. -- Steve
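A minimal sketch of what is being suggested: the event class takes the struct pointers and does all the dereferencing in TP_fast_assign(). Names follow the quoted patch, but the exact fields, helpers, and format string here are illustrative assumptions, not the final code:

```c
/* Hypothetical sketch of the suggestion: pass the pointers, assign inside
 * TP_fast_assign(). Assumes struct mhi_ring_element { __le64 ptr;
 * __le32 dword[2]; } as in the MHI driver headers. */
DECLARE_EVENT_CLASS(mhi_process_event_ring,

	TP_PROTO(struct mhi_controller *mhi_cntrl, struct mhi_ring_element *rp),

	TP_ARGS(mhi_cntrl, rp),

	TP_STRUCT__entry(
		__string(name, mhi_cntrl->mhi_dev->name)
		__field(u64, ptr)
		__field(u32, dword0)
		__field(u32, dword1)
	),

	TP_fast_assign(
		__assign_str(name, mhi_cntrl->mhi_dev->name);
		__entry->ptr = le64_to_cpu(rp->ptr);
		__entry->dword0 = le32_to_cpu(rp->dword[0]);
		__entry->dword1 = le32_to_cpu(rp->dword[1]);
	),

	TP_printk("%s: TRE: 0x%llx 0x%08x 0x%08x",
		  __get_str(name), __entry->ptr,
		  __entry->dword0, __entry->dword1)
);

DEFINE_EVENT(mhi_process_event_ring, mhi_ctrl_event,
	TP_PROTO(struct mhi_controller *mhi_cntrl, struct mhi_ring_element *rp),
	TP_ARGS(mhi_cntrl, rp));
```

The call sites then shrink to `trace_mhi_ctrl_event(mhi_cntrl, local_rp);`, and, as noted above, eprobes and BPF programs also get access to the full structures.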
Re: [PATCH v7 3/4] remoteproc: zynqmp: add pm domains support
On Fri, 1 Dec 2023 at 11:10, Tanmay Shah wrote: > > > On 11/29/23 11:10 AM, Mathieu Poirier wrote: > > On Mon, Nov 27, 2023 at 10:33:05AM -0600, Tanmay Shah wrote: > > > > > > On 11/23/23 12:11 PM, Mathieu Poirier wrote: > > > > On Wed, Nov 22, 2023 at 03:00:36PM -0600, Tanmay Shah wrote: > > > > > Hi Mathieu, > > > > > > > > > > Please find my comments below. > > > > > > > > > > On 11/21/23 4:59 PM, Mathieu Poirier wrote: > > > > > > Hi, > > > > > > > > > > > > On Fri, Nov 17, 2023 at 09:42:37AM -0800, Tanmay Shah wrote: > > > > > > > Use TCM pm domains extracted from device-tree > > > > > > > to power on/off TCM using general pm domain framework. > > > > > > > > > > > > > > Signed-off-by: Tanmay Shah > > > > > > > --- > > > > > > > > > > > > > > Changes in v7: > > > > > > > - %s/pm_dev1/pm_dev_core0/r > > > > > > > - %s/pm_dev_link1/pm_dev_core0_link/r > > > > > > > - %s/pm_dev2/pm_dev_core1/r > > > > > > > - %s/pm_dev_link2/pm_dev_core1_link/r > > > > > > > - remove pm_domain_id check to move next patch > > > > > > > - add comment about how 1st entry in pm domain list is used > > > > > > > - fix loop when jump to fail_add_pm_domains loop > > > > > > > > > > > > > > drivers/remoteproc/xlnx_r5_remoteproc.c | 215 > > > > > > > +++- > > > > > > > 1 file changed, 212 insertions(+), 3 deletions(-) > > > > > > > > > > > > > > diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c > > > > > > > b/drivers/remoteproc/xlnx_r5_remoteproc.c > > > > > > > index 4395edea9a64..22bccc5075a0 100644 > > > > > > > --- a/drivers/remoteproc/xlnx_r5_remoteproc.c > > > > > > > +++ b/drivers/remoteproc/xlnx_r5_remoteproc.c > > > > > > > @@ -16,6 +16,7 @@ > > > > > > > #include > > > > > > > #include > > > > > > > #include > > > > > > > +#include > > > > > > > > > > > > > > #include "remoteproc_internal.h" > > > > > > > > > > > > > > @@ -102,6 +103,12 @@ static const struct mem_bank_data > > > > > > > zynqmp_tcm_banks_lockstep[] = { > > > > > > > * @rproc: rproc handle > > > > > > > * 
@pm_domain_id: RPU CPU power domain id > > > > > > > * @ipi: pointer to mailbox information > > > > > > > + * @num_pm_dev: number of tcm pm domain devices for this core > > > > > > > + * @pm_dev_core0: pm domain virtual devices for power domain > > > > > > > framework > > > > > > > + * @pm_dev_core0_link: pm domain device links after registration > > > > > > > + * @pm_dev_core1: used only in lockstep mode. second core's pm > > > > > > > domain virtual devices > > > > > > > + * @pm_dev_core1_link: used only in lockstep mode. second core's > > > > > > > pm device links after > > > > > > > + * registration > > > > > > > */ > > > > > > > struct zynqmp_r5_core { > > > > > > > struct device *dev; > > > > > > > @@ -111,6 +118,11 @@ struct zynqmp_r5_core { > > > > > > > struct rproc *rproc; > > > > > > > u32 pm_domain_id; > > > > > > > struct mbox_info *ipi; > > > > > > > + int num_pm_dev; > > > > > > > + struct device **pm_dev_core0; > > > > > > > + struct device_link **pm_dev_core0_link; > > > > > > > + struct device **pm_dev_core1; > > > > > > > + struct device_link **pm_dev_core1_link; > > > > > > > }; > > > > > > > > > > > > > > /** > > > > > > > @@ -651,7 +663,8 @@ static int > > > > > > > add_tcm_carveout_lockstep_mode(struct rproc *rproc) > > > > > > > > > > > > > > ZYNQMP_PM_CAPABILITY_ACCESS, 0, > > > > > > > > > > > > > > ZYNQMP_PM_REQUEST_ACK_BLOCKING); > > > > > > > if (ret < 0) { > > > > > > > - dev_err(dev, "failed to turn on TCM 0x%x", > > > > > > > pm_domain_id); > > > > > > > + dev_err(dev, "failed to turn on TCM 0x%x", > > > > > > > + pm_domain_id); > > > > > > > > > > > > Spurious change, you should have caught that. > > > > > > > > > > Ack, need to observe changes more closely before sending them. 
> > > > > > > > > > > > > > > > > > goto release_tcm_lockstep; > > > > > > > } > > > > > > > > > > > > > > @@ -758,6 +771,189 @@ static int zynqmp_r5_parse_fw(struct rproc > > > > > > > *rproc, const struct firmware *fw) > > > > > > > return ret; > > > > > > > } > > > > > > > > > > > > > > +static void zynqmp_r5_remove_pm_domains(struct rproc *rproc) > > > > > > > +{ > > > > > > > + struct zynqmp_r5_core *r5_core = rproc->priv; > > > > > > > + struct device *dev = r5_core->dev; > > > > > > > + struct zynqmp_r5_cluster *cluster; > > > > > > > + int i; > > > > > > > + > > > > > > > + cluster = > > > > > > > platform_get_drvdata(to_platform_device(dev->parent)); > > > > > > > + > > > > > > > + for (i = 1; i < r5_core->num_pm_dev; i++) { > > > > > > > + device_link_del(r5_core->pm_dev_core0_link[i]); > > > > > > > + dev_pm_domain_detach(r5_core->pm_dev_core0[i], false); > > > > > > > + } > > > > > > > + > > > > > > > +
[PATCH v7] bus: mhi: host: Add tracing support
This change adds ftrace support for the following functions, which helps in debugging issues when there is a channel state or MHI state change and also when we receive data and control events: 1. mhi_intvec_mhi_states 2. mhi_process_data_event_ring 3. mhi_process_ctrl_ev_ring 4. mhi_gen_tre 5. mhi_update_channel_state 6. mhi_tryset_pm_state 7. mhi_pm_st_worker Wherever trace events are added, debug messages are removed. Signed-off-by: Krishna chaitanya chundru --- Changes in v7: - change log format as pointed out by mani. - Link to v6: https://lore.kernel.org/r/20231204-ftrace_support-v6-1-9b206546d...@quicinc.com Changes in v6: - use 'rp' directly as suggested by jeffrey. - Link to v5: https://lore.kernel.org/r/20231127-ftrace_support-v5-1-eb67daead...@quicinc.com Changes in v5: - Use DECLARE_EVENT_CLASS for multiple events as suggested by steve. - Instead of converting to u64 to print the address, use %px to print the address to avoid warnings on some platforms. - Link to v4: https://lore.kernel.org/r/2023-ftrace_support-v4-1-c83602399...@quicinc.com Changes in v4: - Fix compilation issues in the previous patch which happened due to rebasing. - In the defconfig the FTRACE config is not enabled, due to which the compilation issue was not seen in my workspace. - Link to v3: https://lore.kernel.org/r/2023-ftrace_support-v3-1-f358d2911...@quicinc.com Changes in v3: - move trace header file from include/trace/events to drivers/bus/mhi/host/ so that we can include driver header files. - Use macros directly in the trace events as suggested by Jeffrey Hugo. - Reorder the structure in the events as suggested by steve to avoid holes in the buffer. - removed the mhi_to_physical function as this can give security issues. - removed macros to define strings as we can get those from driver headers.
- Link to v2: https://lore.kernel.org/r/20231013-ftrace_support-v2-1-6e893ce01...@quicinc.com Changes in v2: - Passing the raw state into the trace event and using __print_symbolic() as suggested by bjorn. - Change mhi_pm_st_worker to mhi_pm_st_transition as suggested by bjorn. - Fixed the kernel test rebot issues. - Link to v1: https://lore.kernel.org/r/20231005-ftrace_support-v1-1-23a2f394f...@quicinc.com --- drivers/bus/mhi/host/init.c | 3 + drivers/bus/mhi/host/main.c | 24 ++--- drivers/bus/mhi/host/pm.c| 7 +- drivers/bus/mhi/host/trace.h | 208 +++ 4 files changed, 229 insertions(+), 13 deletions(-) diff --git a/drivers/bus/mhi/host/init.c b/drivers/bus/mhi/host/init.c index f78aefd2d7a3..6acb85f4c5f8 100644 --- a/drivers/bus/mhi/host/init.c +++ b/drivers/bus/mhi/host/init.c @@ -20,6 +20,9 @@ #include #include "internal.h" +#define CREATE_TRACE_POINTS +#include "trace.h" + static DEFINE_IDA(mhi_controller_ida); const char * const mhi_ee_str[MHI_EE_MAX] = { diff --git a/drivers/bus/mhi/host/main.c b/drivers/bus/mhi/host/main.c index dcf627b36e82..507cf3351a97 100644 --- a/drivers/bus/mhi/host/main.c +++ b/drivers/bus/mhi/host/main.c @@ -15,6 +15,7 @@ #include #include #include "internal.h" +#include "trace.h" int __must_check mhi_read_reg(struct mhi_controller *mhi_cntrl, void __iomem *base, u32 offset, u32 *out) @@ -491,11 +492,9 @@ irqreturn_t mhi_intvec_threaded_handler(int irq_number, void *priv) state = mhi_get_mhi_state(mhi_cntrl); ee = mhi_get_exec_env(mhi_cntrl); - dev_dbg(dev, "local ee: %s state: %s device ee: %s state: %s\n", - TO_MHI_EXEC_STR(mhi_cntrl->ee), - mhi_state_str(mhi_cntrl->dev_state), - TO_MHI_EXEC_STR(ee), mhi_state_str(state)); + trace_mhi_intvec_states(mhi_cntrl->mhi_dev->name, mhi_cntrl->ee, + mhi_cntrl->dev_state, ee, state); if (state == MHI_STATE_SYS_ERR) { dev_dbg(dev, "System error detected\n"); pm_state = mhi_tryset_pm_state(mhi_cntrl, @@ -832,6 +831,10 @@ int mhi_process_ctrl_ev_ring(struct mhi_controller *mhi_cntrl, while 
(dev_rp != local_rp) { enum mhi_pkt_type type = MHI_TRE_GET_EV_TYPE(local_rp); + trace_mhi_ctrl_event(mhi_cntrl->mhi_dev->name, local_rp, +local_rp->ptr, local_rp->dword[0], +local_rp->dword[1]); + switch (type) { case MHI_PKT_TYPE_BW_REQ_EVENT: { @@ -997,6 +1000,9 @@ int mhi_process_data_event_ring(struct mhi_controller *mhi_cntrl, while (dev_rp != local_rp && event_quota > 0) { enum mhi_pkt_type type = MHI_TRE_GET_EV_TYPE(local_rp); + trace_mhi_data_event(mhi_cntrl->mhi_dev->name, local_rp, local_rp->ptr, +local_rp->dword[0], local_rp->dword[1]); + chan = MHI_TRE_GET_EV_CHID(local_rp); WARN_ON(chan >= mhi_cntrl->max_chan); @@ -1235,6
Re: [PATCH] kmod: Add FIPS 202 SHA-3 support
On Sun, Oct 22, 2023 at 07:09:28PM +0100, Dimitri John Ledkov wrote: Add support for parsing FIPS 202 SHA-3 signature hashes. Separately, it is not clear why explicit hashes are re-encoded here, instead of trying to generically show any digest openssl supports. Signed-off-by: Dimitri John Ledkov --- libkmod/libkmod-signature.c | 12 1 file changed, 12 insertions(+) diff --git a/libkmod/libkmod-signature.c b/libkmod/libkmod-signature.c index b749a818f9..a39059cd7c 100644 --- a/libkmod/libkmod-signature.c +++ b/libkmod/libkmod-signature.c @@ -57,6 +57,9 @@ enum pkey_hash_algo { PKEY_HASH_SHA512, PKEY_HASH_SHA224, PKEY_HASH_SM3, + PKEY_HASH_SHA3_256, + PKEY_HASH_SHA3_384, + PKEY_HASH_SHA3_512, PKEY_HASH__LAST }; @@ -70,6 +73,9 @@ const char *const pkey_hash_algo[PKEY_HASH__LAST] = { [PKEY_HASH_SHA512] = "sha512", [PKEY_HASH_SHA224] = "sha224", [PKEY_HASH_SM3] = "sm3", + [PKEY_HASH_SHA3_256]= "sha3-256", + [PKEY_HASH_SHA3_384]= "sha3-384", + [PKEY_HASH_SHA3_512]= "sha3-512", }; enum pkey_id_type { @@ -167,6 +173,12 @@ static int obj_to_hash_algo(const ASN1_OBJECT *o) case NID_sm3: return PKEY_HASH_SM3; # endif + case NID_sha3_256: + return PKEY_HASH_SHA3_256; + case NID_sha3_384: + return PKEY_HASH_SHA3_384; + case NID_sha3_512: + return PKEY_HASH_SHA3_512; with your other patch, libkmod: remove pkcs7 obj_to_hash_algo(), this hunk is not needed anymore. Do you want to send a new version of this patch? thanks Lucas De Marchi default: return -1; } -- 2.34.1
[PATCH] ring-buffer: Force absolute timestamp on discard of event
From: "Steven Rostedt (Google)" There's a race where if an event is discarded from the ring buffer and an interrupt were to happen at that time and insert an event, the time stamp is still used from the discarded event as an offset. This can screw up the timings. If the event is going to be discarded, set the "before_stamp" to zero. When a new event comes in, it compares the "before_stamp" with the "write_stamp" and if they are not equal, it will insert an absolute timestamp. This will prevent the timings from getting out of sync due to the discarded event. Cc: sta...@vger.kernel.org Fixes: 6f6be606e763f ("ring-buffer: Force before_stamp and write_stamp to be different on discard") Signed-off-by: Steven Rostedt (Google) --- kernel/trace/ring_buffer.c | 19 --- 1 file changed, 8 insertions(+), 11 deletions(-) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 43cc47d7faaf..a6da2d765c78 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -3030,22 +3030,19 @@ rb_try_to_discard(struct ring_buffer_per_cpu *cpu_buffer, unsigned long write_mask = local_read(&cpu_buffer->write) & ~RB_WRITE_MASK; unsigned long event_length = rb_event_length(event); + /* +* For the before_stamp to be different than the write_stamp +* to make sure that the next event adds an absolute +* value and does not rely on the saved write stamp, which +* is now going to be bogus. +*/ + rb_time_set(&cpu_buffer->before_stamp, 0); + /* Something came in, can't discard */ if (!rb_time_cmpxchg(&cpu_buffer->write_stamp, write_stamp, write_stamp - delta)) return false; - /* -* It's possible that the event time delta is zero -* (has the same time stamp as the previous event) -* in which case write_stamp and before_stamp could -* be the same. In such a case, force before_stamp -* to be different than write_stamp. It doesn't -* matter what it is, as long as its different.
-*/ - if (!delta) - rb_time_set(&cpu_buffer->before_stamp, 0); - /* * If an event were to come in now, it would see that the * write_stamp and the before_stamp are different, and assume -- 2.42.0
[PATCH] ring-buffer: Test last update in 32bit version of __rb_time_read()
From: "Steven Rostedt (Google)" Since 64 bit cmpxchg() is very expensive on 32bit architectures, the timestamp used by the ring buffer does some interesting tricks to be able to still have an atomic 64 bit number. It originally just used 60 bits and broke it up into two 32 bit words where the extra 2 bits were used for synchronization. But this was not enough for all use cases, and all 64 bits were required. The 32bit version of the ring buffer timestamp was then broken up into 3 32bit words using the same counter trick. But one update was not done. The check to see if the read operation was done without interruption only checked the first two words and not last one (like it had before this update). Fix it by making sure all three updates happen without interruption by comparing the initial counter with the last updated counter. Cc: sta...@vger.kernel.org Fixes: f03f2abce4f39 ("ring-buffer: Have 32 bit time stamps use all 64 bits") Signed-off-by: Steven Rostedt (Google) --- kernel/trace/ring_buffer.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index a6da2d765c78..8d2a4f00eca9 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -644,8 +644,8 @@ static inline bool __rb_time_read(rb_time_t *t, u64 *ret, unsigned long *cnt) *cnt = rb_time_cnt(top); - /* If top and bottom counts don't match, this interrupted a write */ - if (*cnt != rb_time_cnt(bottom)) + /* If top and msb counts don't match, this interrupted a write */ + if (*cnt != rb_time_cnt(msb)) return false; /* The shift to msb will lose its cnt bits */ -- 2.42.0
Re: [PATCH] ipvs: add a stateless type of service and a stateless Maglev hashing scheduler
Hello, On Mon, 4 Dec 2023, Lev Pantiukhin wrote: > +#define IP_VS_SVC_F_STATELESS 0x0040 /* stateless scheduling > */ I have another idea for the traffic that does not need per-client state. We need some per-dest cp to forward the packet. If we replace the cp->caddr usage with iph->saddr/daddr usage we can try it. cp->caddr is used at the following places: - tcp_snat_handler (iph->daddr), tcp_dnat_handler (iph->saddr): iph is already provided. tcp_snat_handler requires IP_VS_SVC_F_STATELESS to be set for a service with a present vaddr, i.e. non-fwmark based. So, NAT+svc->fwmark is another restriction for IP_VS_SVC_F_STATELESS because we do not know what VIP to use as saddr for outgoing traffic. - ip_vs_nfct_expect_related - we should investigate for any problems when IP_VS_CONN_F_NFCT is set; probably, we can not work with NFCT? - ip_vs_conn_drop_conntrack - FTP: - sets IP_VS_CONN_F_NFCT, uses cp->app Maybe IP_VS_CONN_F_NFCT should be a restriction for IP_VS_SVC_F_STATELESS mode? cp->app for sure, because we keep TCP seq/ack state for the app in cp->in_seq/out_seq. We can keep some dest->cp_route or another name that will hold our cp for such connections. The idea is not to allocate a cp for every packet but to reuse this saved cp. It has all the needed info to forward the skb to the real server. The first packet will create it, save it with some locking into dest, and the next packets will reuse it. Probably, it should be a ONE_PACKET entry (not hashed in the table) but can be with a running timer, if needed. One refcnt for attaching to dest, a new temp refcnt for every packet. But in this mode __ip_vs_conn_put_timer uses a 0-second timer, so we have to handle it somehow. It should be released when the dest is removed and on edit_dest if needed. There are other problems to solve, such as set_tcp_state() changing dest->activeconns and dest->inactconns. They are used also in ip_vs_bind_dest(), ip_vs_unbind_dest().
As we do not keep previous connection state and as conn can start in established state, we should avoid touching these counters. For UDP ONE_PACKET has no such problem with states but for TCP/SCTP we should take care. Regards -- Julian Anastasov
Re: [PATCH] ipvs: add a stateless type of service and a stateless Maglev hashing scheduler
Hi Lev, kernel test robot noticed the following build warnings: [auto build test WARNING on horms-ipvs-next/master] [also build test WARNING on horms-ipvs/master linus/master v6.7-rc4 next-20231206] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Lev-Pantiukhin/ipvs-add-a-stateless-type-of-service-and-a-stateless-Maglev-hashing-scheduler/20231204-232344 base: https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git master patch link: https://lore.kernel.org/r/20231204152020.472247-1-kndrvt%40yandex-team.ru patch subject: [PATCH] ipvs: add a stateless type of service and a stateless Maglev hashing scheduler config: microblaze-randconfig-r123-20231206 (https://download.01.org/0day-ci/archive/20231206/202312061823.o99vpf15-...@intel.com/config) compiler: microblaze-linux-gcc (GCC) 13.2.0 reproduce: (https://download.01.org/0day-ci/archive/20231206/202312061823.o99vpf15-...@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. 
not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot | Closes: https://lore.kernel.org/oe-kbuild-all/202312061823.o99vpf15-...@intel.com/ sparse warnings: (new ones prefixed by >>) >> net/netfilter/ipvs/ip_vs_ctl.c:972:42: sparse: sparse: restricted __be32 >> degrades to integer net/netfilter/ipvs/ip_vs_ctl.c:972:60: sparse: sparse: restricted __be32 degrades to integer net/netfilter/ipvs/ip_vs_ctl.c:976:48: sparse: sparse: restricted __be32 degrades to integer net/netfilter/ipvs/ip_vs_ctl.c:976:70: sparse: sparse: restricted __be32 degrades to integer >> net/netfilter/ipvs/ip_vs_ctl.c:976:30: sparse: sparse: incorrect type in >> assignment (different base types) @@ expected restricted __be32 >> [usertype] diff @@ got unsigned int @@ net/netfilter/ipvs/ip_vs_ctl.c:976:30: sparse: expected restricted __be32 [usertype] diff net/netfilter/ipvs/ip_vs_ctl.c:976:30: sparse: got unsigned int >> net/netfilter/ipvs/ip_vs_ctl.c:978:41: sparse: sparse: cast from restricted >> __be32 net/netfilter/ipvs/ip_vs_ctl.c:1532:27: sparse: sparse: dereference of noderef expression vim +972 net/netfilter/ipvs/ip_vs_ctl.c 962 963 static int __ip_vs_mh_compare_dests(struct list_head *a, struct list_head *b) 964 { 965 struct ip_vs_dest *dest_a = list_entry(a, struct ip_vs_dest, n_list); 966 struct ip_vs_dest *dest_b = list_entry(b, struct ip_vs_dest, n_list); 967 unsigned int i = 0; 968 __be32 diff; 969 970 switch (dest_a->af) { 971 case AF_INET: > 972 return (int)(dest_a->addr.ip - dest_b->addr.ip); 973 974 case AF_INET6: 975 for (; i < ARRAY_SIZE(dest_a->addr.ip6); i++) { > 976 diff = dest_a->addr.ip6[i] - > dest_b->addr.ip6[i]; 977 if (diff) > 978 return (int)diff; 979 } 980 } 981 982 return 0; 983 } 984 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki
Re: [PATCH 12/27] tty: hvc: convert to u8 and size_t
On 06/12/2023 at 08:36, Jiri Slaby (SUSE) wrote: > Switch character types to u8 and sizes to size_t. To conform to > characters/sizes in the rest of the tty layer. > > Signed-off-by: Jiri Slaby (SUSE) > Cc: Michael Ellerman > Cc: Nicholas Piggin > Cc: Christophe Leroy > Cc: Amit Shah > Cc: Arnd Bergmann > Cc: Paul Walmsley > Cc: Palmer Dabbelt > Cc: Albert Ou > Cc: linuxppc-...@lists.ozlabs.org > Cc: virtualizat...@lists.linux.dev > Cc: linux-ri...@lists.infradead.org > --- > arch/powerpc/include/asm/hvconsole.h | 4 ++-- > arch/powerpc/include/asm/hvsi.h| 18 > arch/powerpc/include/asm/opal.h| 8 +--- > arch/powerpc/platforms/powernv/opal.c | 14 +++-- > arch/powerpc/platforms/pseries/hvconsole.c | 4 ++-- > drivers/char/virtio_console.c | 10 - > drivers/tty/hvc/hvc_console.h | 4 ++-- > drivers/tty/hvc/hvc_dcc.c | 24 +++--- > drivers/tty/hvc/hvc_iucv.c | 18 > drivers/tty/hvc/hvc_opal.c | 5 +++-- > drivers/tty/hvc/hvc_riscv_sbi.c| 9 > drivers/tty/hvc/hvc_rtas.c | 11 +- > drivers/tty/hvc/hvc_udbg.c | 9 > drivers/tty/hvc/hvc_vio.c | 18 > drivers/tty/hvc/hvc_xen.c | 23 +++-- > drivers/tty/hvc/hvsi_lib.c | 20 ++ > 16 files changed, 107 insertions(+), 92 deletions(-) > > diff --git a/arch/powerpc/include/asm/hvconsole.h > b/arch/powerpc/include/asm/hvconsole.h > index ccb2034506f0..d841a97010a0 100644 > --- a/arch/powerpc/include/asm/hvconsole.h > +++ b/arch/powerpc/include/asm/hvconsole.h > @@ -21,8 +21,8 @@ >* Vio firmware always attempts to fetch MAX_VIO_GET_CHARS chars.
The > 'count' >* parm is included to conform to put_chars() function pointer template >*/ > -extern int hvc_get_chars(uint32_t vtermno, char *buf, int count); > -extern int hvc_put_chars(uint32_t vtermno, const char *buf, int count); > +extern ssize_t hvc_get_chars(uint32_t vtermno, u8 *buf, size_t count); > +extern ssize_t hvc_put_chars(uint32_t vtermno, const u8 *buf, size_t count); Would be a good opportunity to drop this pointless deprecated 'extern' keyword on all function prototypes you are changing. Christophe