Re: [PATCH v2 6/5] statx: add STATX_RESULT_MASK flag
On Oct 19, 2018, at 11:42 AM, Miklos Szeredi wrote:
>>> +#define STATX_RESULT_MASK STATX__RESERVED
>>
>> Please don't use that bit.
>
> Using it internally is perfectly harmless. If we'll need to extend
> statx in the future and make use of this flag externally, then we can
> easily move the internal flag somewhere else (e.g. extend request_mask
> to 64bit, which we'll probably need to do anyway in that case).

I was thinking about this - what is the point of returning an error if STATX__RESERVED is set? If this is used to indicate the presence of e.g. stx_mask2, then newer applications trying to request any of the flags encoded into stx_mask2 will get an error, rather than the expected behaviour of "ignore flags you don't understand, and don't set them in the return stx_mask".

Essentially, this will make STATX__RESERVED useless in the future, since no application will be able to use it without getting an error if they are running on an old kernel.

Cheers, Andreas
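To make the compatibility concern above concrete, here is a small userspace model of the two request-mask policies being contrasted. It is only a sketch under stated assumptions: the DEMO_ names, the "supported" mask and both functions are illustrative and are not kernel API; only the bit values are meant to mirror STATX_BASIC_STATS, STATX_BTIME and STATX__RESERVED from the uapi header.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative copies of uapi bit values; DEMO_ marks them as not the real macros. */
#define DEMO_STATX_BASIC_STATS 0x000007ffU
#define DEMO_STATX_BTIME       0x00000800U
#define DEMO_STATX__RESERVED   0x80000000U

/* Assumption: what a hypothetical "old kernel" knows how to fill in. */
#define DEMO_SUPPORTED (DEMO_STATX_BASIC_STATS | DEMO_STATX_BTIME)

/* Strict policy: any request carrying the reserved bit fails outright. */
static int demo_statx_strict(uint32_t request, uint32_t *result)
{
	assert(result);
	if (request & DEMO_STATX__RESERVED)
		return -EINVAL;
	*result = request & DEMO_SUPPORTED;
	return 0;
}

/* Lenient policy: unknown bits are ignored and simply left clear in *result. */
static int demo_statx_lenient(uint32_t request, uint32_t *result)
{
	assert(result);
	*result = request & DEMO_SUPPORTED;
	return 0;
}
```

Under the strict policy a newer application that sets the reserved bit gets -EINVAL from the old kernel and has to retry without it; under the lenient policy it can tell what was honoured by inspecting the returned mask.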
Re: [PATCH 0/6] Tracing register accesses with pstore and dynamic debug
On Sun, Sep 09, 2018 at 01:57:01AM +0530, Sai Prakash Ranjan wrote:
> Hi,
>
> This patch series adds Event tracing support to pstore and is continuation
> to the RFC patch introduced to add a new tracing facility for register
> accesses called Register Trace Buffer(RTB). Since we decided to not introduce
> a separate framework to trace register accesses and use existing framework
> like tracepoints, I have moved from RFC. Details of the RFC in link below:
>
> Link:
> https://lore.kernel.org/lkml/cover.1535119710.git.saiprakash.ran...@codeaurora.org/
>
> MSR tracing example given by Steven was helpful in using tracepoints for
> register accesses instead of using separate trace. But just having these
> IO traces would not help much unless we could have them in some persistent
> ram buffer for debugging unclocked access or some kind of bus hang or an
> unexpected reset caused by some buggy driver which happens a lot during
> initial development stages. By analyzing the last few entries of this buffer,
> we could identify the register access which is causing the issue.

Hi Sai,

I wanted to see if I could make some time to get your patches working. We are hitting use cases that need something like this as well. Basically, devices hang and then the ramdump does not tell us much, so in this case pstore events can be really helpful. This use case came up last year as well.

Anyway, while I was going through your patches, I cleaned up some pstore code as well, and I have 3 more patches on top of yours for this cleanup. I would prefer that we submit the patches together and sync our work so that there is the least conflict.

Here's my latest tree:
https://github.com/joelagnel/linux-kernel/commits/pstore-events
(note that I have only build tested the patches since I just wrote them and it's quite late in the night here ;-))

thanks,

- Joel
Re: [PATCH 03/34] teach move_mount(2) to work with OPEN_TREE_CLONE [ver #12]
On Fri, Oct 19, 2018 at 11:36:19PM +0100, David Howells wrote:
> Alan Jenkins wrote:
>
> > # open_tree_clone 3
> > # cd /proc/self/fd/3
> > # mount --move . /mnt
> > [ 41.747831] mnt_flags=1020 umount=0
> > # cd /
> > # umount /mnt
> > umount: /mnt: target is busy
> >
> > ^ a newly introduced bug? I do not remember having this problem before.
>
> The reason EBUSY is returned is because propagate_mount_busy() is called by
> do_umount() with refcnt == 2, but mnt_count == 3:
>
> umount-3577 M=f8898a34 u=3 0x555 sp=__x64_sys_umount+0x12/0x15
>
> the trace line being added here:
>
> 	if (!propagate_mount_busy(mnt, 2)) {
> 		if (!list_empty(&mnt->mnt_list))
> 			umount_tree(mnt, UMOUNT_PROPAGATE|UMOUNT_SYNC);
> 		retval = 0;
> 	} else {
> 		trace_mnt_count(mnt, mnt->mnt_id,
> 				atomic_read(&mnt->mnt_count),
> 				0x555, __builtin_return_address(0));
> 	}
>
> The busy evaluation is a result of this check:
>
> 	if (!list_empty(&mnt->mnt_mounts) || do_refcount_check(mnt, refcnt))
>
> in propagate_mount_busy().
>
>
> The problem apparently being that mnt_count counts both refs from mountings
> and refs from other sources, such as file descriptors or pathwalk.

As it bloody well should. Once the tree has been attached, that open_ctree() descriptor is referring to vfsmount of /mnt (what else could it be?) IOW, it *is* genuinely busy.

The livelock on umount -l you've mentioned is a different story - that's definitely a bug, but this -EBUSY is correct.
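The arithmetic behind the -EBUSY above can be shown with a toy model. This is a sketch, not kernel code: struct demo_mount and both helpers are hypothetical stand-ins, and the comparison is assumed to have the same shape as do_refcount_check() (busy when the mount holds more references than the caller expects).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the kernel's struct mount; only what the check needs. */
struct demo_mount {
	int mnt_count;   /* all references: mount tree, open fds, pathwalk */
	int nr_children; /* stand-in for a non-empty mnt_mounts list */
};

/* Assumed shape of do_refcount_check(): more refs than expected means busy. */
static bool demo_refcount_check(const struct demo_mount *mnt, int expected)
{
	return mnt->mnt_count > expected;
}

/* Model of the busy test: child mounts or extra references both count. */
static bool demo_mount_busy(const struct demo_mount *mnt, int expected)
{
	assert(mnt);
	return mnt->nr_children > 0 || demo_refcount_check(mnt, expected);
}
```

With mnt_count == 3 and an expected count of 2, as in the trace, the model reports busy; the third reference is the still-open descriptor.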
Re: [PATCH 03/24] iov_iter: Add I/O discard iterator
On Sat, Oct 20, 2018 at 02:10:59AM +0100, David Howells wrote:
> @@ -1060,6 +1074,9 @@ void iov_iter_revert(struct iov_iter *i, size_t unroll)
>  	}
>  	unroll -= i->iov_offset;
>  	switch (iov_iter_type(i)) {
> +	case ITER_DISCARD:
> +		i->iov_offset = 0;
> +		return;

... the hell? That makes no sense whatsoever; what, besides this and immediately preceding part of iov_iter_revert() so much as looks at ->iov_offset for those?

Just have it bugger off before the

	if (unroll <= i->iov_offset) {
		i->iov_offset -= unroll;
		return;
	}

bit...
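A rough sketch of the ordering suggested above, as a self-contained toy. Everything here is hypothetical (struct demo_iter, the DEMO_ enum, a single-segment iterator); it only illustrates returning early for the discard flavour so that iov_offset is never consulted for it.

```c
#include <assert.h>
#include <stddef.h>

enum demo_iter_type { DEMO_ITER_IOVEC, DEMO_ITER_DISCARD };

struct demo_iter {
	enum demo_iter_type type;
	size_t count;      /* bytes remaining */
	size_t iov_offset; /* offset into the current segment; unused for discard */
};

static void demo_iter_revert(struct demo_iter *i, size_t unroll)
{
	assert(i);
	if (!unroll)
		return;
	i->count += unroll;
	/* Discard iterators have no segments: leave before iov_offset is touched. */
	if (i->type == DEMO_ITER_DISCARD)
		return;
	if (unroll <= i->iov_offset) {
		i->iov_offset -= unroll;
		return;
	}
	/* Single-segment model: cannot move before the start of the only segment. */
	i->iov_offset = 0;
}
```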
Re: [PATCH 02/24] iov_iter: Renumber the ITER_* constants in uio.h
On Sat, Oct 20, 2018 at 02:10:52AM +0100, David Howells wrote:
> Renumber the ITER_* constants in uio.h to be contiguous to make comparing
> them more efficient in a switch-statement.

Are you sure that they *are* more efficient that way? Some of those paths are fairly hot, so much that I would really like to see profiles before and after these changes (both this and the previous one).
Re: [PATCH 01/24] iov_iter: Separate type from direction and use accessor functions
On Sat, Oct 20, 2018 at 02:10:44AM +0100, David Howells wrote:

One general comment: I would strongly recommend splitting the iov_iter initializers change into a separate patch.

> index 8d41ca7bfcf1..dcdbcb6f09f8 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -2990,7 +2990,7 @@ cifs_readdata_to_iov(struct cifs_readdata *rdata, struct iov_iter *iter)
>  		size_t copy = min_t(size_t, remaining, PAGE_SIZE);
>  		size_t written;
>
> -		if (unlikely(iter->type & ITER_PIPE)) {
> +		if (unlikely(iov_iter_is_pipe(iter))) {
>  			void *addr = kmap_atomic(page);
>
>  			written = copy_to_iter(addr, copy, iter);

FWIW, I wonder if that one is actually a missing primitive getting open-coded...

> @@ -786,7 +786,7 @@ setup_aio_ctx_iter(struct cifs_aio_ctx *ctx, struct iov_iter *iter, int rw)
>  	struct page **pages = NULL;
>  	struct bio_vec *bv = NULL;
>
> -	if (iter->type & ITER_KVEC) {
> +	if (iov_iter_is_kvec(iter)) {
>  		memcpy(&ctx->iter, iter, sizeof(struct iov_iter));
>  		ctx->len = count;
>  		iov_iter_advance(iter, count);

... and so, to much greater extent, is this.

> @@ -2054,14 +2054,22 @@ int smbd_recv(struct smbd_connection *info, struct msghdr *msg)
> +	switch (iov_iter_type(&msg->msg_iter)) {
> +	case ITER_KVEC:
>  		buf = msg->msg_iter.kvec->iov_base;
>  		to_read = msg->msg_iter.kvec->iov_len;
>  		rc = smbd_recv_buf(info, buf, to_read);
>  		break;
>
> -	case READ | ITER_BVEC:
> +	case ITER_BVEC:
>  		page = msg->msg_iter.bvec->bv_page;
>  		page_offset = msg->msg_iter.bvec->bv_offset;
>  		to_read = msg->msg_iter.bvec->bv_len;

Incidentally, this is bollocks - looks like a fallout of RDMA patches of some sort, but AFAICS there's no reason to have separate bvec and kvec paths there - smbd_recv_buf() can bloody well use copy_to_iter(), eliminating the need for kmap_atomic, sleep avoidance, etc. As well as this branching on iterator flavour... Anyway, not your headache.

> @@ -1313,7 +1313,7 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
>  	spin_lock_init(&dio->bio_lock);
>  	dio->refcount = 1;
>
> -	dio->should_dirty = (iter->type == ITER_IOVEC);
> +	dio->should_dirty = iter_is_iovec(iter);

Nope. This path *can* get both read and write iov_iter. Not an equivalent change.

> @@ -1795,7 +1795,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>  	if (pos >= dio->i_size)
>  		goto out_free_dio;
>
> -	if (iter->type == ITER_IOVEC)
> +	if (iter_is_iovec(iter))
>  		dio->flags |= IOMAP_DIO_DIRTY;

Ditto.

> @@ -417,28 +417,35 @@ int iov_iter_fault_in_readable(struct iov_iter *i, size_t bytes)
>  	int err;
>  	struct iovec v;
>
> -	if (!(i->type & (ITER_BVEC|ITER_KVEC))) {
> +	switch (iov_iter_type(i)) {
> +	case ITER_IOVEC:
> +	case ITER_PIPE:
>  		iterate_iovec(i, bytes, v, iov, skip, ({
>  			err = fault_in_pages_readable(v.iov_base, v.iov_len);
>  			if (unlikely(err))
>  			return err;
>  		0;}))
> +		break;
> +	case ITER_KVEC:
> +	case ITER_BVEC:
> +		break;
>  	}
>  	return 0;
>  }
>  EXPORT_SYMBOL(iov_iter_fault_in_readable);

Huh? That makes no sense whatsoever - ITER_PIPE ones are write-only in the first place, so they won't be passed to that one, but feeding ITER_PIPE to iterate_iovec() is insane. And even if copy-from ITER_PIPE ones would appear, why the devil would we want to fault-in anything?

> @@ -987,7 +1003,7 @@ void iov_iter_revert(struct iov_iter *i, size_t unroll)
>  		return;
>  	i->count += unroll;
> -	if (unlikely(i->type & ITER_PIPE)) {
> +	if (unlikely(iov_iter_is_pipe(i))) {
>  		struct pipe_inode_info *pipe = i->pipe;
...
> +	case ITER_PIPE:
> +		BUG();
> +	}
>  }
>  EXPORT_SYMBOL(iov_iter_revert);

Wha...?
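The "not an equivalent change" remark on the should_dirty hunks can be illustrated with a toy model of the pre-patch encoding, in which the direction sits in bit 0 of ->type and the flavour in the higher bits. The DEMO_ names and helpers below are hypothetical; the point is only that comparing the whole type word against the iovec value also tests the direction, while a flavour-only predicate does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy copy of the pre-patch layout: direction in bit 0, flavour above it. */
enum { DEMO_READ = 0, DEMO_WRITE = 1 };
enum { DEMO_ITER_IOVEC = 0, DEMO_ITER_KVEC = 2, DEMO_ITER_BVEC = 4, DEMO_ITER_PIPE = 8 };

struct demo_iter { unsigned int type; };

/* Flavour-only test: ignores the direction bit. */
static bool demo_iter_is_iovec(const struct demo_iter *i)
{
	assert(i);
	return !(i->type & (DEMO_ITER_KVEC | DEMO_ITER_BVEC | DEMO_ITER_PIPE));
}

/* The old open-coded test: true only for an iovec iterator with READ direction. */
static bool demo_old_should_dirty(const struct demo_iter *i)
{
	assert(i);
	return i->type == DEMO_ITER_IOVEC;
}

/* One way to keep the old meaning once flavour and direction are tested separately. */
static bool demo_kept_should_dirty(const struct demo_iter *i)
{
	return demo_iter_is_iovec(i) && (i->type & DEMO_WRITE) == DEMO_READ;
}
```

For a write-direction iovec iterator the old test is false while the flavour-only test is true, which is the behavioural difference being pointed out.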
Re: [PATCH v4 2/4] perf: add arm64 smmuv3 pmu driver
Hi Neil,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linux-sof-driver/master]
[also build test ERROR on v4.19-rc8 next-20181019]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Shameer-Kolothum/arm64-SMMUv3-PMU-driver-with-IORT-support/20181017-063949
base:   https://github.com/thesofproject/linux master
config: i386-allyesconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386

All error/warnings (new ones prefixed by >>):

   drivers//perf/arm_smmuv3_pmu.c: In function 'smmu_pmu_counter_set_value':
>> drivers//perf/arm_smmuv3_pmu.c:145:36: warning: left shift count >= width of type [-Wshift-count-overflow]
     if (smmu_pmu->counter_mask & BIT(32))
                                  ^~
   drivers//perf/arm_smmuv3_pmu.c:146:3: error: implicit declaration of function 'writeq'; did you mean 'writel'? [-Werror=implicit-function-declaration]
      writeq(value, smmu_pmu->reloc_base + SMMU_PMCG_EVCNTR(idx, 8));
      ^~
      writel
   drivers//perf/arm_smmuv3_pmu.c: In function 'smmu_pmu_counter_get_value':
   drivers//perf/arm_smmuv3_pmu.c:155:36: warning: left shift count >= width of type [-Wshift-count-overflow]
     if (smmu_pmu->counter_mask & BIT(32))
                                  ^~
   drivers//perf/arm_smmuv3_pmu.c:156:11: error: implicit declaration of function 'readq'; did you mean 'readl'? [-Werror=implicit-function-declaration]
      value = readq(smmu_pmu->reloc_base + SMMU_PMCG_EVCNTR(idx, 8));
              ^
              readl
   drivers//perf/arm_smmuv3_pmu.c: In function 'smmu_pmu_reset':
>> drivers//perf/arm_smmuv3_pmu.c:607:2: error: implicit declaration of function 'writeq_relaxed'; did you mean 'seq_release'? [-Werror=implicit-function-declaration]
     writeq_relaxed(smmu_pmu->counter_present_mask,
     ^~
     seq_release
   drivers//perf/arm_smmuv3_pmu.c: In function 'smmu_pmu_probe':
>> drivers//perf/arm_smmuv3_pmu.c:666:15: error: implicit declaration of function 'readq_relaxed'; did you mean 'seq_release'? [-Werror=implicit-function-declaration]
     ceid_64[0] = readq_relaxed(smmu_pmu->reg_base + SMMU_PMCG_CEID0);
                  ^
                  seq_release
   drivers//perf/arm_smmuv3_pmu.c:687:108: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 4 has type 'resource_size_t {aka unsigned int}' [-Wformat=]
     name = devm_kasprintf(&pdev->dev, GFP_KERNEL, "smmuv3_pmcg_%llx",
                                                                 ^
             (res_0->start) >> SMMU_PA_SHIFT);
   cc1: some warnings being treated as errors

vim +607 drivers//perf/arm_smmuv3_pmu.c

   601
   602	static void smmu_pmu_reset(struct smmu_pmu *smmu_pmu)
   603	{
   604		smmu_pmu_disable(&smmu_pmu->pmu);
   605
   606		/* Disable counter and interrupt */
 > 607		writeq_relaxed(smmu_pmu->counter_present_mask,
   608			       smmu_pmu->reg_base + SMMU_PMCG_CNTENCLR0);
   609		writeq_relaxed(smmu_pmu->counter_present_mask,
   610			       smmu_pmu->reg_base + SMMU_PMCG_INTENCLR0);
   611		writeq_relaxed(smmu_pmu->counter_present_mask,
   612			       smmu_pmu->reloc_base + SMMU_PMCG_OVSCLR0);
   613	}
   614
   615	static int smmu_pmu_probe(struct platform_device *pdev)
   616	{
   617		struct smmu_pmu *smmu_pmu;
   618		struct resource *res_0, *res_1;
   619		u32 cfgr, reg_size;
   620		u64 ceid_64[2];
   621		int irq, err;
   622		char *name;
   623		struct device *dev = &pdev->dev;
   624
   625		smmu_pmu = devm_kzalloc(dev, sizeof(*smmu_pmu), GFP_KERNEL);
   626		if (!smmu_pmu)
   627			return -ENOMEM;
   628
   629		smmu_pmu->dev = dev;
   630		platform_set_drvdata(pdev, smmu_pmu);
   631
   632		smmu_pmu->pmu = (struct pmu) {
   633			.task_ctx_nr	= perf_invalid_context,
   634			.pmu_enable	= smmu_pmu_enable,
   635			.pmu_disable	= smmu_pmu_disable,
   636			.event_init	= smmu_pmu_event_init,
   637			.add		= smmu_pmu_event_add,
   638			.del		= smmu_pmu_event_del,
   639			.start		= smmu_pmu_event_start,
   640			.stop		= smmu_pmu_event_stop,
   641
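Both classes of failure above come from building for a 32-bit target, where unsigned long is 32 bits wide and 64-bit MMIO accessors are not provided by default. The following userspace sketch shows the two ideas usually involved: a 64-bit-safe bit mask, and a 64-bit register access split into two 32-bit accesses, low word first. The DEMO_ names are hypothetical, the "registers" are a plain array, and whether a non-atomic split access is acceptable for this hardware is a question for the driver authors, not something this report states.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit-safe single-bit mask; 1UL << 32 overflows where long is 32 bits. */
#define DEMO_BIT_ULL(n) (1ULL << (n))

/*
 * Write a 64-bit value as two 32-bit stores, low word first.
 * regs[] stands in for a pair of adjacent 32-bit registers.
 */
static void demo_writeq_lo_hi(uint64_t value, volatile uint32_t *regs)
{
	assert(regs);
	regs[0] = (uint32_t)value;
	regs[1] = (uint32_t)(value >> 32);
}

/* Read the same pair back, low word first, and reassemble the value. */
static uint64_t demo_readq_lo_hi(const volatile uint32_t *regs)
{
	uint64_t lo, hi;

	assert(regs);
	lo = regs[0];
	hi = regs[1];
	return lo | (hi << 32);
}
```

The format warning at line 687 is a separate issue: resource_size_t is 32 bits in this configuration, so it does not match %llx without a cast or a different format specifier.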
Re: perf overlapping maps...
From: David Miller
Date: Fri, 19 Oct 2018 21:05:49 -0700 (PDT)

> One solution I've come up with is:
>
> 1) When synthesizing a fork event, set PERF_RECORD_MISC_COMM_EXEC in
>    header->misc.
>
> 2) Use this to elide the map groups clone in
>    thread__clone_map_groups().

Looking into code history, I notice:

commit 363b785f3805a2632eb09a8b430842461c21a640
Author: Don Zickus
Date:   Fri Mar 14 10:43:44 2014 -0400

    perf tools: Speed up thread map generation

and the subsequent:

commit 4aa5f4f7bb8bc41cba15bcd0d80c4fb085027d6b
Author: Arnaldo Carvalho de Melo
Date:   Fri Feb 27 19:52:10 2015 -0300

    perf tools: Fix FORK after COMM when synthesizing records for pre-existing threads

If Don wanted to have the map cloning to happen for processes without CLONE_VM, I'm not sure that's right. For real threads, we just take a reference to the map group from the parent.

Don, a quick summary. If we synthesize a fork event, let's say for an emacs process, perf will clone the map groups of the parent bash shell which invoked emacs. Via:

	thread__fork(thread, parent, timestamp) {
		...
		thread__clone_map_groups(thread, parent) {
			...
			map_groups__clone(thread, parent->mg)

Which is completely bogus. It brings all of the bash process maps into the emacs thread map group.

Then we process the emacs mmap2 events, which overlap the bash process maps already cloned into the emacs map group. And this makes all kinds of erroneous things happen.

I'm suggesting to elide the map groups clone in this situation where we are synthesizing the fork.

Thanks.
perf overlapping maps...
Symbols aren't exactly right all the time on sparc and even the owner of a sample is set to "unknown" from time to time, so I turned on some debugging to investigate.

One thing that stands out is that we get overlapping maps all the time. So I tried to narrow down how this happens.

Here is one case: we get a new thread fork event for emacs-gtk before the MMAP events, so we go:

	thread__fork(thread, parent, timestamp) {
		...
		thread__clone_map_groups(thread, parent) {
			...
			map_groups__clone(thread, parent->mg)

Dumping this map_groups__clone() operation I see:

map_groups__clone: parent 0x1425420 --> 0x1418fb0
map_groups__clone: new [0100:0111] /bin/bash
map_groups__clone: new [01212000:0121e000] /bin/bash
map_groups__clone: new [0121e000:012a] /tmp/perf-1309.map
map_groups__clone: new [fff1:fff100024000] /lib/sparc64-linux-gnu/ld-2.27.so
map_groups__clone: new [fff100124000:fff100126000] /lib/sparc64-linux-gnu/ld-2.27.so
map_groups__clone: new [fff100128000:fff100152000] /lib/sparc64-linux-gnu/libtinfo.so.6.1
map_groups__clone: new [fff100254000:fff100256000] /lib/sparc64-linux-gnu/libtinfo.so.6.1
map_groups__clone: new [fff100258000:fff10025c000] /lib/sparc64-linux-gnu/libdl-2.27.so
map_groups__clone: new [fff10035c000:fff10035e000] /lib/sparc64-linux-gnu/libdl-2.27.so
map_groups__clone: new [fff10046a000:fff10046c000] [vdso]
map_groups__clone: new [fff10046c000:fff1005cc000] /lib/sparc64-linux-gnu/libc-2.27.so
map_groups__clone: new [fff1006d:fff1006d4000] /lib/sparc64-linux-gnu/libc-2.27.so
map_groups__clone: new [fff1006d4000:fff1006d6000] /tmp/perf-1309.map
map_groups__clone: new [fff100874000:fff10087e000] /lib/sparc64-linux-gnu/libnss_files-2.27.so
map_groups__clone: new [fff10097e000:fff10098] /lib/sparc64-linux-gnu/libnss_files-2.27.so
map_groups__clone: new [fff10098:fff100986000] /tmp/perf-1309.map

It's inheriting maps for the parent bash shell that invoked emacs-gtk, which makes no sense at all.

We proceed to process the MMAP events which have the proper mappings for emacs-gtk, and eventually we happen to hit a mapping that overlaps with one of the address ranges of the parent bash shell.

For the stuff that doesn't overlap, we have bogus parent bash shell process mappings left in the emacs-gtk thread map group.

The above trace is simply from "./perf record 2>x.log", nothing fancy.

What we are doing above can't be right. Yes, when processing real perf events from the kernel for a fork event, we should do that inheritance stuff. But if we are synthesizing the fork to build threads and maps for already running processes, we absolutely should not perform the map groups clone.

One solution I've come up with is:

1) When synthesizing a fork event, set PERF_RECORD_MISC_COMM_EXEC in
   header->misc.

2) Use this to elide the map groups clone in
   thread__clone_map_groups().

Comments?
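The two-step proposal above can be sketched as a self-contained toy. Nothing here is perf code: struct demo_thread, the fixed-size map array and DEMO_MISC_SYNTHESIZED_FORK are hypothetical stand-ins, and the flag's bit position is chosen only for this model. It shows the intended control flow: a fork record marked as synthesized skips the clone and leaves the child's maps to be filled in by the MMAP events that follow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative marker for "this fork record was synthesized"; model-only value. */
#define DEMO_MISC_SYNTHESIZED_FORK (1U << 13)

#define DEMO_MAX_MAPS 8

struct demo_map { unsigned long start, end; };

struct demo_thread {
	struct demo_map maps[DEMO_MAX_MAPS];
	size_t nr_maps;
};

/* Copy the parent's maps into the child, as a real fork would inherit them. */
static void demo_clone_maps(struct demo_thread *child, const struct demo_thread *parent)
{
	for (size_t k = 0; k < parent->nr_maps; k++)
		child->maps[k] = parent->maps[k];
	child->nr_maps = parent->nr_maps;
}

/* Returns true if the parent's maps were cloned into the child. */
static bool demo_thread_fork(struct demo_thread *child,
			     const struct demo_thread *parent,
			     unsigned int misc)
{
	assert(child && parent);
	child->nr_maps = 0;
	if (misc & DEMO_MISC_SYNTHESIZED_FORK)
		return false; /* maps will come from the synthesized MMAP events */
	demo_clone_maps(child, parent);
	return true;
}
```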
[tip:locking/core 6/10] arch/x86/include/asm/rmwcc.h:23:17: error: jump into statement expression
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git locking/core
head:   01a14bda11add9dcd4a59200f13834d634559935
commit: 7aa54be2976550f17c11a1c3e3630002dea39303 [6/10] locking/qspinlock, x86: Provide liveness guarantee
config: x86_64-randconfig-u0-10201040 (attached as .config)
compiler: gcc-5 (Debian 5.5.0-3) 5.4.1 20171010
reproduce:
        git checkout 7aa54be2976550f17c11a1c3e3630002dea39303
        # save the attached .config to linux build tree
        make ARCH=x86_64

All errors (new ones prefixed by >>):

   In file included from arch/x86/include/asm/atomic.h:5:0,
                    from include/linux/atomic.h:7,
                    from include/linux/crypto.h:20,
                    from arch/x86/kernel/asm-offsets.c:9:
   arch/x86/include/asm/qspinlock.h: In function 'queued_fetch_set_pending_acquire':
>> arch/x86/include/asm/rmwcc.h:23:17: error: jump into statement expression
       : clobbers : cc_label);\
       ^
   include/linux/compiler.h:58:42: note: in definition of macro '__trace_if'
       if (__builtin_constant_p(!!(cond)) ? !!(cond) : \
       ^
   arch/x86/include/asm/qspinlock.h:18:2: note: in expansion of macro 'if'
       if (GEN_BINARY_RMWcc(LOCK_PREFIX "btsl", lock->val.counter, c,
       ^
   arch/x86/include/asm/rmwcc.h:21:2: note: in expansion of macro 'asm_volatile_goto'
       asm_volatile_goto (fullop "; j" #cc " %l[cc_label]" \
       ^
   arch/x86/include/asm/rmwcc.h:54:2: note: in expansion of macro '__GEN_RMWcc'
       __GEN_RMWcc(op " %[val], " arg0, var, cc, \
       ^
   arch/x86/include/asm/rmwcc.h:58:2: note: in expansion of macro 'GEN_BINARY_RMWcc_6'
       GEN_BINARY_RMWcc_6(op, var, cc, vcon, val, "%[var]")
       ^
   arch/x86/include/asm/rmwcc.h:9:30: note: in expansion of macro 'GEN_BINARY_RMWcc_5'
       #define __RMWcc_CONCAT(a, b) a ## b
       ^
   arch/x86/include/asm/rmwcc.h:10:28: note: in expansion of macro '__RMWcc_CONCAT'
       #define RMWcc_CONCAT(a, b) __RMWcc_CONCAT(a, b)
       ^
   arch/x86/include/asm/rmwcc.h:60:32: note: in expansion of macro 'RMWcc_CONCAT'
       #define GEN_BINARY_RMWcc(X...) RMWcc_CONCAT(GEN_BINARY_RMWcc_, RMWcc_ARGS(X))(X)
       ^
   arch/x86/include/asm/qspinlock.h:18:6: note: in expansion of macro 'GEN_BINARY_RMWcc'
       if (GEN_BINARY_RMWcc(LOCK_PREFIX "btsl", lock->val.counter, c,
       ^
   arch/x86/include/asm/rmwcc.h:25:1: note: label 'cc_label' defined here
       cc_label: c = true; \
       ^
   include/linux/compiler.h:58:30: note: in definition of macro '__trace_if'
       if (__builtin_constant_p(!!(cond)) ? !!(cond) : \
       ^
   arch/x86/include/asm/qspinlock.h:18:2: note: in expansion of macro 'if'
       if (GEN_BINARY_RMWcc(LOCK_PREFIX "btsl", lock->val.counter, c,
       ^
   arch/x86/include/asm/rmwcc.h:54:2: note: in expansion of macro '__GEN_RMWcc'
       __GEN_RMWcc(op " %[val], " arg0, var, cc, \
       ^
   arch/x86/include/asm/rmwcc.h:58:2: note: in expansion of macro 'GEN_BINARY_RMWcc_6'
       GEN_BINARY_RMWcc_6(op, var, cc, vcon, val, "%[var]")
       ^
   arch/x86/include/asm/rmwcc.h:9:30: note: in expansion of macro 'GEN_BINARY_RMWcc_5'
       #define __RMWcc_CONCAT(a, b) a ## b
       ^
   arch/x86/include/asm/rmwcc.h:10:28: note: in expansion of macro '__RMWcc_CONCAT'
       #define RMWcc_CONCAT(a, b) __RMWcc_CONCAT(a, b)
       ^
   arch/x86/include/asm/rmwcc.h:60:32: note: in expansion of macro 'RMWcc_CONCAT'
       #define GEN_BINARY_RMWcc(X...) RMWcc_CONCAT(GEN_BINARY_RMWcc_, RMWcc_ARGS(X))(X)
       ^
   arch/x86/include/asm/qspinlock.h:18:6: note: in expansion of macro 'GEN_BINARY_RMWcc'
       if (GEN_BINARY_RMWcc(LOCK_PREFIX "btsl", lock->val.counter, c,
       ^
>> arch/x86/include/asm/rmwcc.h:25:1: error: duplicate label 'cc_label'
       cc_label: c = true; \
       ^
   include/linux/compiler.h:58:42: note: in definition of macro '__trace_if'
       if (__builtin_constant_p(!!(cond)) ? !!(cond) : \
       ^
   arch/x86/include/asm/qspinlock.h:18:2: note: in expansion of macro 'if'
       if (GEN_BINARY_RMWcc(LOCK_PREFIX "btsl", lock->val.counter, c,
       ^
   arch/x86/include/asm/rmwcc.h:54:2: note: in expansion of macro '__GEN_RMWcc'
       __GEN_RMWcc(op " %[val], " arg0, var, cc, \
       ^
   arch/x86/include/asm/rmwcc.h:58:2: note: in expansion of macro 'GEN_BINARY_RMWcc_6'
       GEN_BINARY_RMWcc_6(op, var, cc, vcon, val, "%[var]")
       ^
   arch/x86/include/asm/rmwcc.h:9:30: note: in expansion of macro 'GEN_BINARY_RMWcc_5'
       #define __RMWcc_CONCAT(a, b) a ## b
       ^
   arch/x86/include/asm/rmwcc.h:10:28: note: in expansion of macro '__RMWcc_CONCAT'
       #define RMWcc_CONCAT(a, b) __RMWcc_CONCAT(a, b)
       ^
   arch/x86/include/asm/rmwcc.h:60:32:
[PATCH v2 1/1] ARM: dts: imx6sx-sdb: Add flexcan support
From: Dong Aisheng

CAN transceiver is different on RevA and RevB board.
It's active high on RevA while active low on Rev B.

Signed-off-by: Dong Aisheng
Signed-off-by: Joakim Zhang
---
 arch/arm/boot/dts/imx6sx-sdb-reva.dts | 12
 arch/arm/boot/dts/imx6sx-sdb.dts      |  5
 arch/arm/boot/dts/imx6sx-sdb.dtsi     | 42 +++
 3 files changed, 59 insertions(+)

diff --git a/arch/arm/boot/dts/imx6sx-sdb-reva.dts b/arch/arm/boot/dts/imx6sx-sdb-reva.dts
index 9cc6ff206aea..d98dcf00b9c4 100644
--- a/arch/arm/boot/dts/imx6sx-sdb-reva.dts
+++ b/arch/arm/boot/dts/imx6sx-sdb-reva.dts
@@ -10,6 +10,18 @@
 
 / {
 	model = "Freescale i.MX6 SoloX SDB RevA Board";
+
+	/* Transceiver EN/STBY is active high on RevA board */
+	reg_can_en: regulator-can-en {
+		gpio = < 25 GPIO_ACTIVE_HIGH>;
+		enable-active-high;
+	};
+
+	reg_can_stby: regulator-can-stby {
+		gpio = < 27 GPIO_ACTIVE_HIGH>;
+		enable-active-high;
+		vin-supply = <&reg_can_en>;
+	};
 };
 
  {
diff --git a/arch/arm/boot/dts/imx6sx-sdb.dts b/arch/arm/boot/dts/imx6sx-sdb.dts
index 6dd9bebfe027..092b8de142a8 100644
--- a/arch/arm/boot/dts/imx6sx-sdb.dts
+++ b/arch/arm/boot/dts/imx6sx-sdb.dts
@@ -10,6 +10,11 @@
 
 / {
 	model = "Freescale i.MX6 SoloX SDB RevB Board";
+
+	/* Transceiver EN/STBY is active low on RevB board */
+	reg_can_stby: regulator-can-stby {
+		gpio = < 27 GPIO_ACTIVE_LOW>;
+	};
 };
 
  {
diff --git a/arch/arm/boot/dts/imx6sx-sdb.dtsi b/arch/arm/boot/dts/imx6sx-sdb.dtsi
index f8f31872fa14..e37ec4b396a2 100644
--- a/arch/arm/boot/dts/imx6sx-sdb.dtsi
+++ b/arch/arm/boot/dts/imx6sx-sdb.dtsi
@@ -136,6 +136,20 @@
 		regulator-max-microvolt = <500>;
 	};
 
+	reg_can_en: regulator-can-en {
+		compatible = "regulator-fixed";
+		regulator-name = "can-en";
+		regulator-min-microvolt = <330>;
+		regulator-max-microvolt = <330>;
+	};
+
+	reg_can_stby: regulator-can-stby {
+		compatible = "regulator-fixed";
+		regulator-name = "can-stby";
+		regulator-min-microvolt = <330>;
+		regulator-max-microvolt = <330>;
+	};
+
 	sound {
 		compatible = "fsl,imx6sx-sdb-wm8962",
 			     "fsl,imx-audio-wm8962";
 		model = "wm8962-audio";
@@ -202,6 +216,20 @@
 	status = "okay";
 };
 
+ {
+	pinctrl-names = "default";
+	pinctrl-0 = <&pinctrl_flexcan1>;
+	xceiver-supply = <&reg_can_stby>;
+	status = "okay";
+};
+
+ {
+	pinctrl-names = "default";
+	pinctrl-0 = <&pinctrl_flexcan2>;
+	xceiver-supply = <&reg_can_stby>;
+	status = "okay";
+};
+
  {
 	clock-frequency = <10>;
 	pinctrl-names = "default";
@@ -397,6 +425,20 @@
 			>;
 		};
 
+		pinctrl_flexcan1: flexcan1grp {
+			fsl,pins = <
+				MX6SX_PAD_QSPI1B_DQS__CAN1_TX		0x1b020
+				MX6SX_PAD_QSPI1A_SS1_B__CAN1_RX		0x1b020
+			>;
+		};
+
+		pinctrl_flexcan2: flexcan2grp {
+			fsl,pins = <
+				MX6SX_PAD_QSPI1B_SS1_B__CAN2_RX		0x1b020
+				MX6SX_PAD_QSPI1A_DQS__CAN2_TX		0x1b020
+			>;
+		};
+
 		pinctrl_gpio_keys: gpio_keysgrp {
 			fsl,pins = <
 				MX6SX_PAD_CSI_DATA04__GPIO1_IO_18 0x17059
-- 
2.17.1
RE: [PATCH] ARM: dts: imx6sx-sdb: Add flexcan support
-----Original Message-----
From: Fabio Estevam [mailto:feste...@gmail.com]
Sent: October 19, 2018 20:52
To: Joakim Zhang
Cc: Shawn Guo; Sascha Hauer; Sascha Hauer; Fabio Estevam; dl-linux-imx; Rob Herring; Mark Rutland; A.s. Dong; open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS; linux-kernel; moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE
Subject: Re: [PATCH] ARM: dts: imx6sx-sdb: Add flexcan support

Hi Joakim,

On Fri, Oct 19, 2018 at 6:43 AM Joakim Zhang wrote:
>
> From: Dong Aisheng
>
> CAN transceiver is different on RevA and RevB board.
> It's active high on RevA while active low on Rev B.
>
> Signed-off-by: Dong Aisheng
> Signed-off-by: Joakim Zhang
> ---
>  arch/arm/boot/dts/imx6sx-sdb.dts  |  5
>  arch/arm/boot/dts/imx6sx-sdb.dtsi | 42 +++
>  2 files changed, 47 insertions(+)
>
> diff --git a/arch/arm/boot/dts/imx6sx-sdb.dts
> b/arch/arm/boot/dts/imx6sx-sdb.dts
> index 6dd9bebfe027..092b8de142a8 100644
> --- a/arch/arm/boot/dts/imx6sx-sdb.dts
> +++ b/arch/arm/boot/dts/imx6sx-sdb.dts
> @@ -10,6 +10,11 @@
>
>  / {
>  	model = "Freescale i.MX6 SoloX SDB RevB Board";
> +
> +	/* Transceiver EN/STBY is active low on RevB board */
> +	reg_can_stby: regulator-can-stby {
> +		gpio = < 27 GPIO_ACTIVE_LOW>;

Don't we need a gpio = < 27 GPIO_ACTIVE_HIGH>; and also an "enable-active-high" entry in the imx6sx-sdb-reva.dts?

Yes, you are right. I will add the entry in the imx6sx-sdb-reva.dts. Thanks a lot!

BRs,
Joakim Zhang
Re: [PATCH 3/3] fpga manager: Adding FPGA Manager support for Xilinx zynqmp
On Fri, Oct 19, 2018 at 2:33 PM Moritz Fischer wrote:
>
> Hi Nava,
>
> Looks good to me, a couple of nits inline below.
>
> On Fri, Oct 19, 2018 at 1:50 AM Nava kishore Manne
> wrote:
> >
> > This patch adds FPGA Manager support for the Xilinx
> > ZynqMp chip.
>
> Isn't it ZynqMP ?
> >
> > Signed-off-by: Nava kishore Manne
> > ---
> > Changes for v1:
> >                 -None.
> >
> > Changes for RFC-V2:
> >                 -Updated the Fpga Mgr registrations call's
> >                  to 4.18
> >
> >  drivers/fpga/Kconfig       |   9 +++
> >  drivers/fpga/Makefile      |   1 +
> >  drivers/fpga/zynqmp-fpga.c | 159 +
> >  3 files changed, 169 insertions(+)
> >  create mode 100644 drivers/fpga/zynqmp-fpga.c
> >
> > diff --git a/drivers/fpga/Kconfig b/drivers/fpga/Kconfig
> > index 1ebcef4bab5b..26ebbcf3d3a3 100644
> > --- a/drivers/fpga/Kconfig
> > +++ b/drivers/fpga/Kconfig
> > @@ -56,6 +56,15 @@ config FPGA_MGR_ZYNQ_FPGA
> >  	help
> >  	  FPGA manager driver support for Xilinx Zynq FPGAs.
> >
> > +config FPGA_MGR_ZYNQMP_FPGA
> > +	tristate "Xilinx Zynqmp FPGA"
> > +	depends on ARCH_ZYNQMP || COMPILE_TEST
> > +	help
> > +	  FPGA manager driver support for Xilinx ZynqMP FPGAs.
> > +	  This driver uses processor configuration port(PCAP)
>
> This driver uses *the* processor configuration port.
>
> > +	  to configure the programmable logic(PL) through PS
> > +	  on ZynqMP SoC.
> > +
> >  config FPGA_MGR_XILINX_SPI
> >  	tristate "Xilinx Configuration over Slave Serial (SPI)"
> >  	depends on SPI
> > diff --git a/drivers/fpga/Makefile b/drivers/fpga/Makefile
> > index 7a2d73ba7122..3488ebbaee46 100644
> > --- a/drivers/fpga/Makefile
> > +++ b/drivers/fpga/Makefile
> > @@ -16,6 +16,7 @@ obj-$(CONFIG_FPGA_MGR_SOCFPGA_A10)+= socfpga-a10.o
> >  obj-$(CONFIG_FPGA_MGR_TS73XX) += ts73xx-fpga.o
> >  obj-$(CONFIG_FPGA_MGR_XILINX_SPI) += xilinx-spi.o
> >  obj-$(CONFIG_FPGA_MGR_ZYNQ_FPGA) += zynq-fpga.o
> > +obj-$(CONFIG_FPGA_MGR_ZYNQMP_FPGA) += zynqmp-fpga.o
> >  obj-$(CONFIG_ALTERA_PR_IP_CORE) += altera-pr-ip-core.o
> >  obj-$(CONFIG_ALTERA_PR_IP_CORE_PLAT)+= altera-pr-ip-core-plat.o
> >
> > diff --git a/drivers/fpga/zynqmp-fpga.c b/drivers/fpga/zynqmp-fpga.c
> > new file mode 100644
> > index 000000000000..2760d7e3872a
> > --- /dev/null
> > +++ b/drivers/fpga/zynqmp-fpga.c
> > @@ -0,0 +1,159 @@
> > +// SPDX-License-Identifier: GPL-2.0+
> > +/*
> > + * Copyright (C) 2018 Xilinx, Inc.
> > + */
> > +
> > +#include
> > +#include
> > +#include
> > +#include
> > +#include
> > +#include
> > +#include
> > +#include
> > +
> > +/* Constant Definitions */
> > +#define IXR_FPGA_DONE_MASK 0X0008U
> > +
> > +/**
> > + * struct zynqmp_fpga_priv - Private data structure
> > + * @dev: Device data structure
> > + * @flags: flags which is used to identify the bitfile type
> > + */
> > +struct zynqmp_fpga_priv {
> > +	struct device *dev;
> > +	u32 flags;
> > +};
> > +
> > +static int zynqmp_fpga_ops_write_init(struct fpga_manager *mgr,
> > +				      struct fpga_image_info *info,
> > +				      const char *buf, size_t size)
> > +{
> > +	struct zynqmp_fpga_priv *priv;
> > +
> > +	priv = mgr->priv;
> > +	priv->flags = info->flags;
> > +
> > +	return 0;
> > +}
> > +
> > +static int zynqmp_fpga_ops_write(struct fpga_manager *mgr,
> > +				 const char *buf, size_t size)
> > +{
> > +	struct zynqmp_fpga_priv *priv;
> > +	char *kbuf;
> > +	dma_addr_t dma_addr;
> > +	int ret;
> > +	const struct zynqmp_eemi_ops *eemi_ops = zynqmp_pm_get_eemi_ops();
>
> Reverse xmas-tree please, i.e. long lines first.
>
> > +
> > +	if (!eemi_ops || !eemi_ops->fpga_load)
> > +		return -ENXIO;
> > +
> > +	priv = mgr->priv;
> > +
> > +	kbuf = dma_alloc_coherent(priv->dev, size, &dma_addr, GFP_KERNEL);
> > +	if (!kbuf)
> > +		return -ENOMEM;
> > +
> > +	memcpy(kbuf, buf, size);
> > +
> > +	wmb(); /* ensure all writes are done before initiate FW call */
> > +
> > +	ret = eemi_ops->fpga_load(dma_addr, size, priv->flags);

Don't you have to do anything with the flags? Is it really just a pass-through of FPGA manager flags to eemi calls? Don't you want to make partial bitstreams e.g. use a flags value that you export in your firmware header (xlnx-zynqmp.h) and set those based on what flags get passed in, i.e. explicitly translate FPGA Manager flags to your firmware flags?

Thanks,
Moritz
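To make the suggestion concrete, an explicit translation could look something like the sketch below. This is only an illustration: FW_FPGA_FULL, FW_FPGA_PARTIAL and fpga_mgr_to_fw_flags() are hypothetical names with made-up bit values, not anything exported by xlnx-zynqmp.h. FPGA_MGR_PARTIAL_RECONFIG is the FPGA Manager flag (bit 0 as far as I know), redefined here so the snippet builds in user space.

```c
#include <stdint.h>

/* FPGA Manager flag, redefined locally so the sketch builds standalone. */
#define FPGA_MGR_PARTIAL_RECONFIG (1u << 0)

/* Hypothetical firmware-side flags; names and bit positions are assumptions. */
#define FW_FPGA_FULL    0x0u
#define FW_FPGA_PARTIAL (1u << 3)

/*
 * Translate FPGA Manager flags into firmware flags one bit at a time instead
 * of passing them through. FPGA Manager bits without a firmware counterpart
 * are dropped, so they can never be misread by the firmware.
 */
static uint32_t fpga_mgr_to_fw_flags(uint32_t mgr_flags)
{
	uint32_t fw_flags = FW_FPGA_FULL;

	if (mgr_flags & FPGA_MGR_PARTIAL_RECONFIG)
		fw_flags |= FW_FPGA_PARTIAL;

	return fw_flags;
}
```

The firmware bit is deliberately placed at a different position than the FPGA Manager bit here, to show that the two numbering schemes do not have to line up once the mapping is explicit.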
[PATCH 1/3] tracing: Fix synthetic event to accept unsigned modifier
From: Masami Hiramatsu

Fix synthetic event to accept unsigned modifier for its field type
correctly.

Currently, synthetic_events interface returns error for "unsigned"
modifiers as below;

 # echo "myevent unsigned long var" >> synthetic_events
 sh: write error: Invalid argument

This is because argv_split() breaks "unsigned long" into "unsigned"
and "long", but parse_synth_field() doesn't expect it.

With this fix, synthetic_events can handle the "unsigned long"
correctly as below;

 # echo "myevent unsigned long var" >> synthetic_events
 # cat synthetic_events
 myevent	unsigned long var

Link: http://lkml.kernel.org/r/153986832571.18251.8448135724590496531.stgit@devbox
Cc: Shuah Khan
Cc: Tom Zanussi
Cc: sta...@vger.kernel.org
Fixes: commit 4b147936fa50 ("tracing: Add support for 'synthetic' events")
Signed-off-by: Masami Hiramatsu
Signed-off-by: Steven Rostedt (VMware)
---
 kernel/trace/trace_events_hist.c | 30 --
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 85f6b01431c7..6ff83941065a 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -738,16 +738,30 @@ static void free_synth_field(struct synth_field *field)
 	kfree(field);
 }
 
-static struct synth_field *parse_synth_field(char *field_type,
-					     char *field_name)
+static struct synth_field *parse_synth_field(int argc, char **argv,
+					     int *consumed)
 {
 	struct synth_field *field;
+	const char *prefix = NULL;
+	char *field_type = argv[0], *field_name;
 	int len, ret = 0;
 	char *array;
 
 	if (field_type[0] == ';')
 		field_type++;
 
+	if (!strcmp(field_type, "unsigned")) {
+		if (argc < 3)
+			return ERR_PTR(-EINVAL);
+		prefix = "unsigned ";
+		field_type = argv[1];
+		field_name = argv[2];
+		*consumed = 3;
+	} else {
+		field_name = argv[1];
+		*consumed = 2;
+	}
+
 	len = strlen(field_name);
 	if (field_name[len - 1] == ';')
 		field_name[len - 1] = '\0';
@@ -760,11 +774,15 @@ static struct synth_field *parse_synth_field(char *field_type,
 	array = strchr(field_name, '[');
 	if (array)
 		len += strlen(array);
+	if (prefix)
+		len += strlen(prefix);
 	field->type = kzalloc(len, GFP_KERNEL);
 	if (!field->type) {
 		ret = -ENOMEM;
 		goto free;
 	}
+	if (prefix)
+		strcat(field->type, prefix);
 	strcat(field->type, field_type);
 	if (array) {
 		strcat(field->type, array);
@@ -1009,7 +1027,7 @@ static int create_synth_event(int argc, char **argv)
 	struct synth_field *field, *fields[SYNTH_FIELDS_MAX];
 	struct synth_event *event = NULL;
 	bool delete_event = false;
-	int i, n_fields = 0, ret = 0;
+	int i, consumed = 0, n_fields = 0, ret = 0;
 	char *name;
 
 	mutex_lock(&synth_event_mutex);
@@ -1061,13 +1079,13 @@ static int create_synth_event(int argc, char **argv)
 			goto err;
 		}
 
-		field = parse_synth_field(argv[i], argv[i + 1]);
+		field = parse_synth_field(argc - i, &argv[i], &consumed);
 		if (IS_ERR(field)) {
 			ret = PTR_ERR(field);
 			goto err;
 		}
-		fields[n_fields] = field;
-		i++; n_fields++;
+		fields[n_fields++] = field;
+		i += consumed - 1;
 	}
 
 	if (i < argc) {
-- 
2.19.0
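As a rough user-space illustration of the token handling this patch introduces (not the kernel code itself), the sketch below builds the type string from already-split tokens and reports how many were consumed. parse_field_tokens() and its buffer-based interface are invented for the example, and the ';' stripping and array-suffix handling of the real parse_synth_field() are left out.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the type string for one field from whitespace-split tokens.
 * Returns the number of tokens consumed (2 or 3), or 0 when the tokens do
 * not form a complete "type name" pair or the type does not fit the buffer.
 */
static int parse_field_tokens(int argc, char **argv, char *type,
			      size_t type_len, const char **name)
{
	const char *prefix = "";
	const char *field_type;
	int consumed;

	if (argc < 2)
		return 0;

	field_type = argv[0];
	if (!strcmp(field_type, "unsigned")) {
		/* "unsigned" is a modifier, so a type and a name must follow. */
		if (argc < 3)
			return 0;
		prefix = "unsigned ";
		field_type = argv[1];
		*name = argv[2];
		consumed = 3;
	} else {
		*name = argv[1];
		consumed = 2;
	}

	/* snprintf reports the length it wanted, which exposes truncation. */
	if ((size_t)snprintf(type, type_len, "%s%s", prefix, field_type) >= type_len)
		return 0;

	return consumed;
}
```

Returning the consumed count is what lets the caller step over two or three tokens per field, in the same spirit as the "i += consumed - 1" change in create_synth_event().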
[PATCH 3/3] selftests: ftrace: Add synthetic event syntax testcase
From: Masami Hiramatsu

Add a testcase to check the syntax and field types for
synthetic_events interface.

Link: http://lkml.kernel.org/r/153986838264.18251.16627517536956299922.stgit@devbox

Acked-by: Shuah Khan
Signed-off-by: Masami Hiramatsu
Signed-off-by: Steven Rostedt (VMware)
---
 .../trigger-synthetic-event-syntax.tc | 80 +++
 1 file changed, 80 insertions(+)
 create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic-event-syntax.tc

diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic-event-syntax.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic-event-syntax.tc
new file mode 100644
index ..88e6c3f43006
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic-event-syntax.tc
@@ -0,0 +1,80 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: event trigger - test synthetic_events syntax parser
+
+do_reset() {
+    reset_trigger
+    echo > set_event
+    clear_trace
+}
+
+fail() { #msg
+    do_reset
+    echo $1
+    exit_fail
+}
+
+if [ ! -f set_event ]; then
+    echo "event tracing is not supported"
+    exit_unsupported
+fi
+
+if [ ! -f synthetic_events ]; then
+    echo "synthetic event is not supported"
+    exit_unsupported
+fi
+
+reset_tracer
+do_reset
+
+echo "Test synthetic_events syntax parser"
+
+echo > synthetic_events
+
+# synthetic event must have a field
+! echo "myevent" >> synthetic_events
+echo "myevent u64 var1" >> synthetic_events
+
+# synthetic event must be found in synthetic_events
+grep "myevent[[:space:]]u64 var1" synthetic_events
+
+# it is not possible to add same name event
+! echo "myevent u64 var2" >> synthetic_events
+
+# Non-append open will cleanup all events and add new one
+echo "myevent u64 var2" > synthetic_events
+
+# multiple fields with different spaces
+echo "myevent u64 var1; u64 var2;" > synthetic_events
+grep "myevent[[:space:]]u64 var1; u64 var2" synthetic_events
+echo "myevent u64 var1 ; u64 var2 ;" > synthetic_events
+grep "myevent[[:space:]]u64 var1; u64 var2" synthetic_events
+echo "myevent u64 var1 ;u64 var2" > synthetic_events
+grep "myevent[[:space:]]u64 var1; u64 var2" synthetic_events
+
+# test field types
+echo "myevent u32 var" > synthetic_events
+echo "myevent u16 var" > synthetic_events
+echo "myevent u8 var" > synthetic_events
+echo "myevent s64 var" > synthetic_events
+echo "myevent s32 var" > synthetic_events
+echo "myevent s16 var" > synthetic_events
+echo "myevent s8 var" > synthetic_events
+
+echo "myevent char var" > synthetic_events
+echo "myevent int var" > synthetic_events
+echo "myevent long var" > synthetic_events
+echo "myevent pid_t var" > synthetic_events
+
+echo "myevent unsigned char var" > synthetic_events
+echo "myevent unsigned int var" > synthetic_events
+echo "myevent unsigned long var" > synthetic_events
+grep "myevent[[:space:]]unsigned long var" synthetic_events
+
+# test string type
+echo "myevent char var[10]" > synthetic_events
+grep "myevent[[:space:]]char\[10\] var" synthetic_events
+
+do_reset
+
+exit 0
--
2.19.0
[PATCH 2/3] tracing: Fix synthetic event to allow semicolon at end
From: Masami Hiramatsu

Fix synthetic event to allow independent semicolon at end.

The synthetic_events interface accepts a semicolon after the
last word if there is no space.

 # echo "myevent u64 var;" >> synthetic_events

But if there is a space, it returns an error.

 # echo "myevent u64 var ;" > synthetic_events
 sh: write error: Invalid argument

This behavior is difficult for users to understand. Let's
allow the last independent semicolon too.

Link: http://lkml.kernel.org/r/153986835420.18251.2191216690677025744.stgit@devbox

Cc: Shuah Khan
Cc: Tom Zanussi
Cc: sta...@vger.kernel.org
Fixes: commit 4b147936fa50 ("tracing: Add support for 'synthetic' events")
Signed-off-by: Masami Hiramatsu
Signed-off-by: Steven Rostedt (VMware)
---
 kernel/trace/trace_events_hist.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 6ff83941065a..d239004aaf29 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -1088,7 +1088,7 @@ static int create_synth_event(int argc, char **argv)
 		i += consumed - 1;
 	}
 
-	if (i < argc) {
+	if (i < argc && strcmp(argv[i], ";") != 0) {
 		ret = -EINVAL;
 		goto err;
 	}
--
2.19.0
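The one-line change above can be read as a small predicate over the leftover tokens. Below is a minimal userspace sketch of that predicate, assuming a hypothetical helper name leftover_is_error(); it mirrors only the patched condition and is not taken from the kernel source.

```c
#include <string.h>

/*
 * Sketch of the post-loop check (hypothetical helper): once the field
 * parser has stopped at index i, a leftover token is an error unless
 * it is a lone ";". Like the patched condition, it inspects only
 * argv[i].
 */
static int leftover_is_error(int argc, char **argv, int i)
{
	return i < argc && strcmp(argv[i], ";") != 0;
}
```

With the tokens of "myevent u64 var ;" and i pointing at the final ";", the helper reports no error, whereas any other stray token at that position is still rejected.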
[PATCH 0/3] [GIT PULL] tracing: A few small fixes to synthetic events
Linus (aka Greg),

Masami found some issues with the creation of synthetic events.
The first two patches fix handling of unsigned type, and handling
of a space before an ending semi-colon. The third patch adds a
selftest to test the processing of synthetic events.

Please pull the latest trace-v4.19-rc8-2 tree, which can be found at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
trace-v4.19-rc8-2

Tag SHA1: e97b692b87a195b6f33386989bb7fc7341e27288
Head SHA1: ba0e41ca81b935b958006c7120466e2217357827

Masami Hiramatsu (3):
      tracing: Fix synthetic event to accept unsigned modifier
      tracing: Fix synthetic event to allow semicolon at end
      selftests: ftrace: Add synthetic event syntax testcase

 kernel/trace/trace_events_hist.c | 32 +++--
 .../inter-event/trigger-synthetic-event-syntax.tc | 80 ++
 2 files changed, 105 insertions(+), 7 deletions(-)
 create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic-event-syntax.tc
Re: [RFC PATCH 1/5] x86: introduce preemption disable prefix
On Fri, 19 Oct 2018 04:44:33 +
Nadav Amit wrote:

> at 9:29 PM, Andy Lutomirski wrote:
>
> >> On Oct 18, 2018, at 6:08 PM, Nadav Amit wrote:
> >>
> >> at 10:00 AM, Andy Lutomirski wrote:
> >>
> On Oct 18, 2018, at 9:47 AM, Nadav Amit wrote:
>
> at 8:51 PM, Andy Lutomirski wrote:
>
> >> On Wed, Oct 17, 2018 at 8:12 PM Nadav Amit wrote:
> >> at 6:22 PM, Andy Lutomirski wrote:
> >>
> On Oct 17, 2018, at 5:54 PM, Nadav Amit wrote:
>
> It is sometimes beneficial to prevent preemption for very few
> instructions, or prevent preemption for some instructions that precede
> a branch (this latter case will be introduced in the next patches).
>
> To provide such functionality on x86-64, we use an empty REX-prefix
> (opcode 0x40) as an indication that preemption is disabled for the
> following instruction.
> >>>
> >>> Nifty!
> >>>
> >>> That being said, I think you have a few bugs. First, you can’t just ignore
> >>> a rescheduling interrupt, as you introduce unbounded latency when this
> >>> happens ― you’re effectively emulating preempt_enable_no_resched(), which
> >>> is not a drop-in replacement for preempt_enable(). To fix this, you may
> >>> need to jump to a slow-path trampoline that calls schedule() at the end or
> >>> consider rewinding one instruction instead. Or use TF, which is only a
> >>> little bit terrifying…
> >>
> >> Yes, I didn’t pay enough attention here. For my use-case, I think that the
> >> easiest solution would be to make synchronize_sched() ignore preemptions
> >> that happen while the prefix is detected. It would slightly change the
> >> meaning of the prefix.
>
> So thinking about it further, rewinding the instruction seems the easiest
> and most robust solution. I’ll do it.
>
> >>> You also aren’t accounting for the case where you get an exception that
> >>> is, in turn, preempted.
> >>
> >> Hmm.. Can you give me an example for such an exception in my use-case? I
> >> cannot think of an exception that might be preempted (assuming #BP, #MC
> >> cannot be preempted).
> >
> > Look for cond_local_irq_enable().
>
> I looked at it. Yet, I still don’t see how exceptions might happen in my
> use-case, but having said that - this can be fixed too.
> >>>
> >>> I’m not totally certain there’s a case that matters. But it’s worth
> >>> checking
> >>
> >> I am still checking. But, I wanted to ask you whether the existing code is
> >> correct, since it seems to me that others do the same mistake I did, unless
> >> I don’t understand the code.
> >>
> >> Consider for example do_int3(), and see my inlined comments:
> >>
> >> dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code)
> >> {
> >>    ...
> >>    ist_enter(regs);                // => preempt_disable()
> >>    cond_local_irq_enable(regs);    // => assume it enables IRQs
> >>
> >>    ...
> >>    // resched irq can be delivered here. It will not cause rescheduling
> >>    // since preemption is disabled
> >>
> >>    cond_local_irq_disable(regs);   // => assume it disables IRQs
> >>    ist_exit(regs);                 // => preempt_enable_no_resched()
> >> }
> >>
> >> At this point resched will not happen for unbounded length of time (unless
> >> there is another point when exiting the trap handler that checks if
> >> preemption should take place).
> >
> > I think it's only a bug in the cases where someone uses extable to fix
> > up an int3 (which would be nuts) or that we oops. But I should still
> > fix it. In the normal case where int3 was in user code, we'll miss
> > the reschedule in do_trap(), but we'll reschedule in
> > prepare_exit_to_usermode() -> exit_to_usermode_loop().
>
> Thanks for your quick response, and sorry for bothering instead of dealing
> with it. Note that do_debug() does something similar to do_int3().
>
> And then there is optimized_callback() that also uses
> preempt_enable_no_resched(). I think the original use was correct, but then
> a19b2e3d7839 ("kprobes/x86: Remove IRQ disabling from ftrace-based/optimized
> kprobes") removed the IRQ disabling, while leaving
> preempt_enable_no_resched(). No?

Ah, good catch!
Indeed, we don't need to stick on no_resched anymore.

Thanks!

--
Masami Hiramatsu
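The distinction the thread turns on, preempt_enable() versus preempt_enable_no_resched(), can be illustrated with a toy userspace model. Everything below is an assumption made for illustration: the struct toy_cpu type and all toy_* functions are hypothetical and only imitate the behaviour described in the discussion (a reschedule request that arrives while preemption is disabled is acted on at enable time only by the variant that checks for it). It is not kernel code.

```c
#include <stdbool.h>

/* Toy userspace model; none of this is kernel code. */
struct toy_cpu {
	int preempt_count;	/* > 0 means preemption is disabled */
	bool need_resched;	/* set when a reschedule request arrives */
	int schedule_calls;	/* how many times the toy scheduler ran */
};

static void toy_schedule(struct toy_cpu *cpu)
{
	cpu->need_resched = false;
	cpu->schedule_calls++;
}

/* A reschedule request: honoured at once only if preemption is enabled. */
static void toy_resched_irq(struct toy_cpu *cpu)
{
	cpu->need_resched = true;
	if (cpu->preempt_count == 0)
		toy_schedule(cpu);
}

static void toy_preempt_disable(struct toy_cpu *cpu)
{
	cpu->preempt_count++;
}

/* Re-enable and act on any request that arrived in the meantime. */
static void toy_preempt_enable(struct toy_cpu *cpu)
{
	if (--cpu->preempt_count == 0 && cpu->need_resched)
		toy_schedule(cpu);
}

/* Re-enable without checking: a pending request stays pending. */
static void toy_preempt_enable_no_resched(struct toy_cpu *cpu)
{
	cpu->preempt_count--;
}
```

In this model, the sequence disable, request, enable_no_resched leaves need_resched set with no call to the toy scheduler, which is the "unbounded latency" case raised above; replacing the last step with toy_preempt_enable() runs the scheduler immediately.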
Re: [PATCH v7 00/21] tpm: separate tpm 1.x and tpm 2.x commands
On Fri, 19 Oct 2018, Tomas Winkler wrote:

This patch series provides initial separation of tpm 1.x and tpm 2.x
commands, in foresight that the tpm 1.x chips will eventually phase out
and can be compiled out for modern systems.
A new file is added tpm1-cmd.c that contains tpm 1.x specific commands.
In addition, tpm 1.x commands are now implemented using the tpm_buf
structure instead of the tpm_cmd_t construct. The latter is now removed.

Note: my tpm 1.x HW availability is limited hence some more testing is needed.

This series also contains two trivial cleanups and addition of new
commands by TCG spec 1.36, now supported on new Intel's platforms.

V6:
 1. Dropping tpm: move pcr extend code to tpm2-cmd.c and rebasing code
    over that change
 2. Trivial fixes in kdoc and header
V7:
 1. Add backportable patch for nuvoton duration calculation
 2. Rebase durations patches over it.
 3. Fix notorious typo tmp->tpm

Tomas Winkler (21):
  tpm: tpm_i2c_nuvoton: use correct command duration for TPM 2.x
  tpm2: add new tpm2 commands according to TCG 1.36
  tpm: sort objects in the Makefile
  tpm: factor out tpm 1.x duration calculation to tpm1-cmd.c
  tpm: add tpm_calc_ordinal_duration() wrapper
  tpm: factor out tpm_get_timeouts()
  tpm: move tpm1_pcr_extend to tpm1-cmd.c
  tpm: move tpm_getcap to tpm1-cmd.c
  tpm: factor out tpm1_get_random into tpm1-cmd.c
  tpm: move tpm 1.x selftest code from tpm-interface.c tpm1-cmd.c
  tpm: factor out tpm 1.x pm suspend flow into tpm1-cmd.c
  tpm: factor out tpm_startup function
  tpm: add tpm_auto_startup() into tpm-interface.c
  tpm: tpm-interface.c drop unused macros
  tpm: tpm-space.c remove unneeded semicolon
  tpm: tpm1: rewrite tpm1_get_random() using tpm_buf structure
  tpm1: implement tpm1_pcr_read_dev() using tpm_buf structure
  tpm1: rename tpm1_pcr_read_dev to tpm1_pcr_read()
  tpm1: reimplement SAVESTATE using tpm_buf
  tpm1: reimplement tpm1_continue_selftest() using tpm_buf
  tpm: use u32 instead of int for PCR index

 drivers/char/tpm/Makefile            |  16 +-
 drivers/char/tpm/st33zp24/st33zp24.c |   2 +-
 drivers/char/tpm/tpm-chip.c          |  11 +-
 drivers/char/tpm/tpm-interface.c     | 817 +++
 drivers/char/tpm/tpm-sysfs.c         |  52 +--
 drivers/char/tpm/tpm.h               |  97 ++---
 drivers/char/tpm/tpm1-cmd.c          | 781 +
 drivers/char/tpm/tpm2-cmd.c          | 301 +++--
 drivers/char/tpm/tpm2-space.c        |   2 +-
 drivers/char/tpm/tpm_i2c_nuvoton.c   |  11 +-
 drivers/char/tpm/tpm_tis_core.c      |  10 +-
 include/linux/tpm.h                  |  11 +-
 security/integrity/ima/ima_crypto.c  |   5 +-
 13 files changed, 1082 insertions(+), 1034 deletions(-)
 create mode 100644 drivers/char/tpm/tpm1-cmd.c

--
2.14.4

Starts to look reasonable:

https://patchwork.kernel.org/project/linux-integrity/list/?series=33257

This is the list of patches (assuming that I didn't miss anything)
that still need tested-by tags:

- tpm: factor out tpm1_get_random into tpm1-cmd.c
- tpm: factor out tpm 1.x pm suspend flow into tpm1-cmd.c
- tpm1: rename tpm1_pcr_read_dev to tpm1_pcr_read()
  (the subsystem tag is wrong in this, just noticed, should be 'tpm:')
- tpm: use u32 instead of int for PCR index

/Jarkko
[PATCH 24/24] afs: Probe multiple fileservers simultaneously
Send probes to all the unprobed fileservers in a fileserver list on all
addresses simultaneously in an attempt to find out the fastest route whilst
not getting stuck for 20s on any server or address that we don't get a
reply from.

This alleviates the problem whereby attempting to access a new server can
take a long time because the rotation algorithm ends up rotating through
all servers and addresses until it finds one that responds.

Signed-off-by: David Howells
---

 fs/afs/Makefile            |    4 -
 fs/afs/addr_list.c         |   40 --
 fs/afs/cmservice.c         |  129 +++--
 fs/afs/fs_probe.c          |  270
 fs/afs/fsclient.c          |   27 +++-
 fs/afs/internal.h          |   98 +---
 fs/afs/proc.c              |    6 -
 fs/afs/rotate.c            |  174 ++--
 fs/afs/rxrpc.c             |   44 ---
 fs/afs/server.c            |  109 +-
 fs/afs/server_list.c       |    6 -
 fs/afs/vl_list.c           |    6 +
 fs/afs/vl_probe.c          |  273
 fs/afs/vl_rotate.c         |  159 +-
 fs/afs/vlclient.c          |   35 +++---
 fs/afs/volume.c            |   16 ---
 include/trace/events/afs.h |    4 -
 17 files changed, 1050 insertions(+), 350 deletions(-)
 create mode 100644 fs/afs/fs_probe.c
 create mode 100644 fs/afs/vl_probe.c

diff --git a/fs/afs/Makefile b/fs/afs/Makefile
index cc942b790cff..0738e2bf5193 100644
--- a/fs/afs/Makefile
+++ b/fs/afs/Makefile
@@ -17,6 +17,7 @@ kafs-y := \
 	file.o \
 	flock.o \
 	fsclient.o \
+	fs_probe.o \
 	inode.o \
 	main.o \
 	misc.o \
@@ -29,8 +30,9 @@ kafs-y := \
 	super.o \
 	netdevices.o \
 	vlclient.o \
-	vl_rotate.o \
 	vl_list.o \
+	vl_probe.o \
+	vl_rotate.o \
 	volume.o \
 	write.o \
 	xattr.o \
diff --git a/fs/afs/addr_list.c b/fs/afs/addr_list.c
index 1536d1d21c33..967db336d11a 100644
--- a/fs/afs/addr_list.c
+++ b/fs/afs/addr_list.c
@@ -303,6 +303,8 @@ void afs_merge_fs_addr4(struct afs_addr_list *alist, __be32 xdr, u16 port)
 		sizeof(alist->addrs[0]) * (alist->nr_addrs - i));
 
 	srx = &alist->addrs[i];
+	srx->srx_family = AF_RXRPC;
+	srx->transport_type = SOCK_DGRAM;
 	srx->transport_len = sizeof(srx->transport.sin);
 	srx->transport.sin.sin_family = AF_INET;
 	srx->transport.sin.sin_port = htons(port);
@@ -341,6 +343,8 @@ void afs_merge_fs_addr6(struct afs_addr_list *alist, __be32 *xdr, u16 port)
 		sizeof(alist->addrs[0]) * (alist->nr_addrs - i));
 
 	srx = &alist->addrs[i];
+	srx->srx_family = AF_RXRPC;
+	srx->transport_type = SOCK_DGRAM;
 	srx->transport_len = sizeof(srx->transport.sin6);
 	srx->transport.sin6.sin6_family = AF_INET6;
 	srx->transport.sin6.sin6_port = htons(port);
@@ -353,23 +357,32 @@ void afs_merge_fs_addr6(struct afs_addr_list *alist, __be32 *xdr, u16 port)
  */
 bool afs_iterate_addresses(struct afs_addr_cursor *ac)
 {
-	_enter("%hu+%hd", ac->start, (short)ac->index);
+	unsigned long set, failed;
+	int index;
 
 	if (!ac->alist)
 		return false;
 
+	set = ac->alist->responded;
+	failed = ac->alist->failed;
+	_enter("%lx-%lx-%lx,%d", set, failed, ac->tried, ac->index);
+
 	ac->nr_iterations++;
 
-	if (ac->begun) {
-		ac->index++;
-		if (ac->index == ac->alist->nr_addrs)
-			ac->index = 0;
+	set &= ~(failed | ac->tried);
 
-		if (ac->index == ac->start)
-			return false;
-	}
+	if (!set)
+		return false;
+
+	index = READ_ONCE(ac->alist->preferred);
+	if (test_bit(index, &set))
+		goto selected;
+
+	index = __ffs(set);
 
-	ac->begun = true;
+selected:
+	ac->index = index;
+	set_bit(index, &ac->tried);
 	ac->responded = false;
 	return true;
 }
@@ -383,12 +396,13 @@ int afs_end_cursor(struct afs_addr_cursor *ac)
 
 	alist = ac->alist;
 	if (alist) {
-		if (ac->responded && ac->index != ac->start)
-			WRITE_ONCE(alist->index, ac->index);
+		if (ac->responded &&
+		    ac->index != alist->preferred &&
+		    test_bit(ac->alist->preferred, &ac->tried))
+			WRITE_ONCE(alist->preferred, ac->index);
 		afs_put_addrlist(alist);
+		ac->alist = NULL;
 	}
 
-	ac->alist = NULL;
-	ac->begun = false;
 	return ac->error;
 }
diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c
index 8cf8d10daa6c..8ee5972893ed 100644
--- a/fs/afs/cmservice.c
+++ b/fs/afs/cmservice.c
@@ -122,6 +122,8 @@ bool afs_cm_incoming_call(struct afs_call *call)
 {
 	_enter("{%u, CB.OP %u}",
[PATCH 18/24] afs: Get the target vnode in afs_rmdir() and get a callback on it
Get the target vnode in afs_rmdir() and validate it before we attempt the
deletion.  The vnode pointer will be passed through to the delivery function
in a later patch so that the delivery function can mark it deleted.

Signed-off-by: David Howells
---

 fs/afs/dir.c |   11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 8936731c59ff..f2dd48d4363f 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -1174,7 +1174,7 @@ static void afs_dir_remove_subdir(struct dentry *dentry)
 static int afs_rmdir(struct inode *dir, struct dentry *dentry)
 {
 	struct afs_fs_cursor fc;
-	struct afs_vnode *dvnode = AFS_FS_I(dir);
+	struct afs_vnode *dvnode = AFS_FS_I(dir), *vnode = NULL;
 	struct key *key;
 	u64 data_version = dvnode->status.data_version;
 	int ret;
@@ -1188,6 +1188,14 @@ static int afs_rmdir(struct inode *dir, struct dentry *dentry)
 		goto error;
 	}
 
+	/* Try to make sure we have a callback promise on the victim. */
+	if (d_really_is_positive(dentry)) {
+		vnode = AFS_FS_I(d_inode(dentry));
+		ret = afs_validate(vnode, key);
+		if (ret < 0)
+			goto error_key;
+	}
+
 	ret = -ERESTARTSYS;
 	if (afs_begin_vnode_operation(&fc, dvnode, key)) {
 		while (afs_select_fileserver(&fc)) {
@@ -1206,6 +1214,7 @@ static int afs_rmdir(struct inode *dir, struct dentry *dentry)
 		}
 	}
 
+error_key:
 	key_put(key);
 error:
 	return ret;
[PATCH 17/24] afs: Calc callback expiry in op reply delivery
Calculate the callback expiration time at the point of operation reply
delivery, using the reply time queried from AF_RXRPC on that call as a base.

Signed-off-by: David Howells
---

 fs/afs/afs.h      |    2 +-
 fs/afs/fsclient.c |   22 +-
 fs/afs/inode.c    |    4 ++--
 fs/afs/internal.h |    2 ++
 fs/afs/rxrpc.c    |    6 ++
 5 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/fs/afs/afs.h b/fs/afs/afs.h
index fb9bcb8758ea..417cd23529c5 100644
--- a/fs/afs/afs.h
+++ b/fs/afs/afs.h
@@ -68,8 +68,8 @@ typedef enum {
 } afs_callback_type_t;
 
 struct afs_callback {
+	time64_t		expires_at;	/* Time at which expires */
 	unsigned		version;	/* Callback version */
-	unsigned		expiry;		/* Time at which expires */
 	afs_callback_type_t	type;		/* Type of callback */
 };
 
diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index f758750e81d8..6105cdb17163 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -287,13 +287,19 @@ static void xdr_decode_AFSCallBack(struct afs_call *call,
 	*_bp = bp;
 }
 
-static void xdr_decode_AFSCallBack_raw(const __be32 **_bp,
+static ktime_t xdr_decode_expiry(struct afs_call *call, u32 expiry)
+{
+	return ktime_add_ns(call->reply_time, expiry * NSEC_PER_SEC);
+}
+
+static void xdr_decode_AFSCallBack_raw(struct afs_call *call,
+				       const __be32 **_bp,
 				       struct afs_callback *cb)
 {
 	const __be32 *bp = *_bp;
 
 	cb->version	= ntohl(*bp++);
-	cb->expiry	= ntohl(*bp++);
+	cb->expires_at	= xdr_decode_expiry(call, ntohl(*bp++));
 	cb->type	= ntohl(*bp++);
 	*_bp = bp;
 }
@@ -440,6 +446,7 @@ int afs_fs_fetch_file_status(struct afs_fs_cursor *fc, struct afs_volsync *volsy
 	call->reply[0] = vnode;
 	call->reply[1] = volsync;
 	call->expected_version = new_inode ? 1 : vnode->status.data_version;
+	call->want_reply_time = true;
 
 	/* marshall the parameters */
 	bp = call->request;
@@ -627,6 +634,7 @@ static int afs_fs_fetch_data64(struct afs_fs_cursor *fc, struct afs_read *req)
 	call->reply[1] = NULL; /* volsync */
 	call->reply[2] = req;
 	call->expected_version = vnode->status.data_version;
+	call->want_reply_time = true;
 
 	/* marshall the parameters */
 	bp = call->request;
@@ -672,6 +680,7 @@ int afs_fs_fetch_data(struct afs_fs_cursor *fc, struct afs_read *req)
 	call->reply[1] = NULL; /* volsync */
 	call->reply[2] = req;
 	call->expected_version = vnode->status.data_version;
+	call->want_reply_time = true;
 
 	/* marshall the parameters */
 	bp = call->request;
@@ -714,7 +723,7 @@ static int afs_deliver_fs_create_vnode(struct afs_call *call)
 					&call->expected_version, NULL);
 	if (ret < 0)
 		return ret;
-	xdr_decode_AFSCallBack_raw(&bp, call->reply[3]);
+	xdr_decode_AFSCallBack_raw(call, &bp, call->reply[3]);
 	/* xdr_decode_AFSVolSync(&bp, call->reply[X]); */
 
 	_leave(" = 0 [done]");
@@ -773,6 +782,7 @@ int afs_fs_create(struct afs_fs_cursor *fc,
 	call->reply[2] = newstatus;
 	call->reply[3] = newcb;
 	call->expected_version = current_data_version + 1;
+	call->want_reply_time = true;
 
 	/* marshall the parameters */
 	bp = call->request;
@@ -2042,7 +2052,7 @@ static int afs_deliver_fs_fetch_status(struct afs_call *call)
 					&call->expected_version, NULL);
 	if (ret < 0)
 		return ret;
-	xdr_decode_AFSCallBack_raw(&bp, callback);
+	xdr_decode_AFSCallBack_raw(call, &bp, callback);
 	if (volsync)
 		xdr_decode_AFSVolSync(&bp, volsync);
 
@@ -2088,6 +2098,7 @@ int afs_fs_fetch_status(struct afs_fs_cursor *fc,
 	call->reply[2] = callback;
 	call->reply[3] = volsync;
 	call->expected_version = 1; /* vnode->status.data_version */
+	call->want_reply_time = true;
 
 	/* marshall the parameters */
 	bp = call->request;
@@ -2188,7 +2199,7 @@ static int afs_deliver_fs_inline_bulk_status(struct afs_call *call)
 		bp = call->buffer;
 		callbacks = call->reply[2];
 		callbacks[call->count].version	= ntohl(bp[0]);
-		callbacks[call->count].expiry	= ntohl(bp[1]);
+		callbacks[call->count].expires_at = xdr_decode_expiry(call, ntohl(bp[1]));
 		callbacks[call->count].type	= ntohl(bp[2]);
 		statuses = call->reply[1];
 		if (call->count == 0 && vnode && statuses[0].abort_code == 0)
@@ -2261,6 +2272,7 @@ int afs_fs_inline_bulk_status(struct afs_fs_cursor *fc,
 	call->reply[2] = callbacks;
 	call->reply[3] = volsync;
 	call->count2 = nr_fids;
+	call->want_reply_time = true;
 
 	/*
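The arithmetic behind "reply time plus relative expiry" can be checked in
userspace. The following is a minimal sketch under stated assumptions: the
reply time is taken to be a nanosecond timestamp, the result is wanted in
whole seconds, and the helper name is hypothetical. It is not the code from
the patch.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/*
 * Hypothetical helper: turn a reply timestamp (nanoseconds) and a relative
 * expiry (seconds, as carried on the wire) into an absolute expiry time in
 * seconds.  The relative value is widened before the addition so that a
 * large 32-bit expiry cannot overflow.
 */
static int64_t expiry_to_abs_seconds(int64_t reply_time_ns, uint32_t expiry_secs)
{
	assert(reply_time_ns >= 0);
	return reply_time_ns / NSEC_PER_SEC + (int64_t)expiry_secs;
}
```

Using the reply time rather than the time at which the result is later
processed keeps any delay between delivery and processing from extending the
apparent lifetime of the callback.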
[PATCH 19/24] afs: Expand data structure fields to support YFS
Expand fields in various data structures to support the expanded information
that YFS is capable of returning.

Signed-off-by: David Howells
---

 fs/afs/afs.h      |   35 ++-
 fs/afs/fsclient.c |    9 +
 2 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/fs/afs/afs.h b/fs/afs/afs.h
index 417cd23529c5..d12ffb457e47 100644
--- a/fs/afs/afs.h
+++ b/fs/afs/afs.h
@@ -130,19 +130,18 @@ typedef u32 afs_access_t;
 struct afs_file_status {
 	u64			size;		/* file size */
 	afs_dataversion_t	data_version;	/* current data version */
-	time_t			mtime_client;	/* last time client changed data */
-	time_t			mtime_server;	/* last time server changed data */
-	unsigned		abort_code;	/* Abort if bulk-fetching this failed */
-
-	afs_file_type_t		type;		/* file type */
-	unsigned		nlink;		/* link count */
-	u32			author;		/* author ID */
-	u32			owner;		/* owner ID */
-	u32			group;		/* group ID */
+	struct timespec64	mtime_client;	/* Last time client changed data */
+	struct timespec64	mtime_server;	/* Last time server changed data */
+	s64			author;		/* author ID */
+	s64			owner;		/* owner ID */
+	s64			group;		/* group ID */
 	afs_access_t		caller_access;	/* access rights for authenticated caller */
 	afs_access_t		anon_access;	/* access rights for unauthenticated caller */
 	umode_t			mode;		/* UNIX mode */
+	afs_file_type_t		type;		/* file type */
+	u32			nlink;		/* link count */
 	s32			lock_count;	/* file lock count (0=UNLK -1=WRLCK +ve=#RDLCK */
+	u32			abort_code;	/* Abort if bulk-fetching this failed */
 };
 
 /*
@@ -159,25 +158,27 @@ struct afs_file_status {
  * AFS volume synchronisation information
  */
 struct afs_volsync {
-	time_t			creation;	/* volume creation time */
+	time64_t		creation;	/* volume creation time */
 };
 
 /*
  * AFS volume status record
  */
 struct afs_volume_status {
-	u32			vid;		/* volume ID */
-	u32			parent_id;	/* parent volume ID */
+	afs_volid_t		vid;		/* volume ID */
+	afs_volid_t		parent_id;	/* parent volume ID */
 	u8			online;		/* true if volume currently online and available */
 	u8			in_service;	/* true if volume currently in service */
 	u8			blessed;	/* same as in_service */
 	u8			needs_salvage;	/* true if consistency checking required */
 	u32			type;		/* volume type (afs_voltype_t) */
-	u32			min_quota;	/* minimum space set aside (blocks) */
-	u32			max_quota;	/* maximum space this volume may occupy (blocks) */
-	u32			blocks_in_use;	/* space this volume currently occupies (blocks) */
-	u32			part_blocks_avail; /* space available in volume's partition */
-	u32			part_max_blocks; /* size of volume's partition */
+	u64			min_quota;	/* minimum space set aside (blocks) */
+	u64			max_quota;	/* maximum space this volume may occupy (blocks) */
+	u64			blocks_in_use;	/* space this volume currently occupies (blocks) */
+	u64			part_blocks_avail; /* space available in volume's partition */
+	u64			part_max_blocks; /* size of volume's partition */
+	s64			vol_copy_date;
+	s64			vol_backup_date;
 };
 
 #define AFS_BLOCK_SIZE	1024
diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index 6105cdb17163..2da65309e0de 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -69,8 +69,7 @@ void afs_update_inode_from_status(struct afs_vnode *vnode,
 	struct timespec64 t;
 	umode_t mode;
 
-	t.tv_sec = status->mtime_client;
-	t.tv_nsec = 0;
+	t = status->mtime_client;
 	vnode->vfs_inode.i_ctime = t;
 	vnode->vfs_inode.i_mtime = t;
 	vnode->vfs_inode.i_atime = t;
@@ -194,8 +193,10 @@ static int xdr_decode_AFSFetchStatus(struct afs_call *call,
 	EXTRACT_M(mode);
 	EXTRACT_M(group);
 
-	status->mtime_client = ntohl(xdr->mtime_client);
-	status->mtime_server =
[PATCH 21/24] afs: Allow dumping of server cursor on operation failure
Provide an option to allow the file or volume location server cursor to be
dumped if the rotation routine falls off the end without managing to contact
a server.

Signed-off-by: David Howells
---

 fs/afs/Kconfig     |   12 +++
 fs/afs/addr_list.c |    2 ++
 fs/afs/internal.h  |    3 +++
 fs/afs/rotate.c    |   57
 fs/afs/vl_rotate.c |   53
 5 files changed, 127 insertions(+)

diff --git a/fs/afs/Kconfig b/fs/afs/Kconfig
index ebba3b18e5da..701aaa9b1899 100644
--- a/fs/afs/Kconfig
+++ b/fs/afs/Kconfig
@@ -27,3 +27,15 @@ config AFS_FSCACHE
 	help
 	  Say Y here if you want AFS data to be cached locally on disk through
 	  the generic filesystem cache manager
+
+config AFS_DEBUG_CURSOR
+	bool "AFS server cursor debugging"
+	depends on AFS_FS
+	help
+	  Say Y here to cause the contents of a server cursor to be dumped to
+	  the dmesg log if the server rotation algorithm fails to successfully
+	  contact a server.
+
+	  See for more information.
+
+	  If unsure, say N.
diff --git a/fs/afs/addr_list.c b/fs/afs/addr_list.c
index 3f60b4012587..bc5ce31a4ae4 100644
--- a/fs/afs/addr_list.c
+++ b/fs/afs/addr_list.c
@@ -358,6 +358,8 @@ bool afs_iterate_addresses(struct afs_addr_cursor *ac)
 	if (!ac->alist)
 		return false;
 
+	ac->nr_iterations++;
+
 	if (ac->begun) {
 		ac->index++;
 		if (ac->index == ac->alist->nr_addrs)
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index ce79bd514331..ac9da1e4050e 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -660,6 +660,7 @@ struct afs_addr_cursor {
 	short			error;
 	bool			begun;		/* T if we've begun iteration */
 	bool			responded;	/* T if the current address responded */
+	unsigned short		nr_iterations;	/* Number of address iterations */
 };
 
 /*
@@ -677,6 +678,7 @@ struct afs_vl_cursor {
 #define AFS_VL_CURSOR_STOP	0x0001		/* Set to cease iteration */
 #define AFS_VL_CURSOR_RETRY	0x0002		/* Set to do a retry */
 #define AFS_VL_CURSOR_RETRIED	0x0004		/* Set if started a retry */
+	unsigned short		nr_iterations;	/* Number of server iterations */
 };
 
 /*
@@ -700,6 +702,7 @@ struct afs_fs_cursor {
 #define AFS_FS_CURSOR_VNOVOL	0x0008		/* Set if seen VNOVOL */
 #define AFS_FS_CURSOR_CUR_ONLY	0x0010		/* Set if current server only (file lock held) */
 #define AFS_FS_CURSOR_NO_VSLEEP	0x0020		/* Set to prevent sleep on VBUSY, VOFFLINE, ... */
+	unsigned short		nr_iterations;	/* Number of server iterations */
 };
 
 /*
diff --git a/fs/afs/rotate.c b/fs/afs/rotate.c
index 41405dde0113..7c4487781637 100644
--- a/fs/afs/rotate.c
+++ b/fs/afs/rotate.c
@@ -156,6 +156,8 @@ bool afs_select_fileserver(struct afs_fs_cursor *fc)
 		return false;
 	}
 
+	fc->nr_iterations++;
+
 	/* Evaluate the result of the previous operation, if there was one. */
 	switch (error) {
 	case SHRT_MAX:
@@ -519,6 +521,56 @@ bool afs_select_current_fileserver(struct afs_fs_cursor *fc)
 	return false;
 }
 
+/*
+ * Dump cursor state in the case of the error being EDESTADDRREQ.
+ */
+static void afs_dump_edestaddrreq(const struct afs_fs_cursor *fc)
+{
+	static int count;
+	int i;
+
+	if (!IS_ENABLED(CONFIG_AFS_DEBUG_CURSOR) || count > 3)
+		return;
+	count++;
+
+	rcu_read_lock();
+
+	pr_notice("EDESTADDR occurred\n");
+	pr_notice("FC: cbb=%x cbb2=%x fl=%hx err=%hd\n",
+		  fc->cb_break, fc->cb_break_2, fc->flags, fc->error);
+	pr_notice("FC: st=%u ix=%u ni=%u\n",
+		  fc->start, fc->index, fc->nr_iterations);
+
+	if (fc->server_list) {
+		const struct afs_server_list *sl = fc->server_list;
+		pr_notice("FC: SL nr=%u ix=%u vnov=%hx\n",
+			  sl->nr_servers, sl->index, sl->vnovol_mask);
+		for (i = 0; i < sl->nr_servers; i++) {
+			const struct afs_server *s = sl->servers[i].server;
+			pr_notice("FC: server fl=%lx av=%u %pU\n",
+				  s->flags, s->addr_version, &s->uuid);
+			if (s->addresses) {
+				const struct afs_addr_list *a =
+					rcu_dereference(s->addresses);
+				pr_notice("FC: - av=%u nr=%u/%u/%u ax=%u\n",
+					  a->version,
+					  a->nr_ipv4, a->nr_addrs, a->max_addrs,
+					  a->index);
+				pr_notice("FC: - pr=%lx yf=%lx\n",
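The "static int count" guard at the top of afs_dump_edestaddrreq() limits how
often the dump is emitted. Below is a minimal userspace sketch of that guard
only, with fprintf() standing in for pr_notice() and a compile-time constant
standing in for IS_ENABLED(CONFIG_AFS_DEBUG_CURSOR); the struct and function
names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define DEBUG_CURSOR 1	/* assumed stand-in for the Kconfig option */

/* Hypothetical, cut-down cursor carrying only what the dump prints. */
struct cursor {
	unsigned short start;
	unsigned short index;
	unsigned short nr_iterations;
	short error;
};

/* Dump the cursor, but only for the first few failures; returns true if a
 * dump was actually emitted. */
static bool dump_cursor(const struct cursor *c)
{
	static int count;	/* not thread-safe; fine for a debugging aid */

	assert(c != NULL);
	if (!DEBUG_CURSOR || count > 3)
		return false;
	count++;

	fprintf(stderr, "FC: st=%u ix=%u ni=%u err=%hd\n",
		(unsigned)c->start, (unsigned)c->index,
		(unsigned)c->nr_iterations, c->error);
	return true;
}
```

Because the test is "count > 3" and the counter starts at zero, the first
four failures are dumped and later ones are silently skipped.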
[PATCH 15/24] afs: Implement the YFS cache manager service
Implement the YFS cache manager service which gives extra capabilities on top of AFS. This is done by listening for an additional service on the same port and indicating that anyone requesting an upgrade should be upgraded to the YFS port. Signed-off-by: David Howells --- fs/afs/cmservice.c| 103 + fs/afs/protocol_yfs.h | 57 +++ fs/afs/rxrpc.c| 15 +++ 3 files changed, 174 insertions(+), 1 deletion(-) create mode 100644 fs/afs/protocol_yfs.h diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c index fc0010d800a0..8cf8d10daa6c 100644 --- a/fs/afs/cmservice.c +++ b/fs/afs/cmservice.c @@ -16,6 +16,7 @@ #include #include "internal.h" #include "afs_cm.h" +#include "protocol_yfs.h" static int afs_deliver_cb_init_call_back_state(struct afs_call *); static int afs_deliver_cb_init_call_back_state3(struct afs_call *); @@ -30,6 +31,8 @@ static void SRXAFSCB_Probe(struct work_struct *); static void SRXAFSCB_ProbeUuid(struct work_struct *); static void SRXAFSCB_TellMeAboutYourself(struct work_struct *); +static int afs_deliver_yfs_cb_callback(struct afs_call *); + #define CM_NAME(name) \ const char afs_SRXCB##name##_name[] __tracepoint_string = \ "CB." 
#name @@ -100,13 +103,24 @@ static const struct afs_call_type afs_SRXCBTellMeAboutYourself = { .work = SRXAFSCB_TellMeAboutYourself, }; +/* + * YFS CB.CallBack operation type + */ +static CM_NAME(YFS_CallBack); +static const struct afs_call_type afs_SRXYFSCB_CallBack = { + .name = afs_SRXCBYFS_CallBack_name, + .deliver= afs_deliver_yfs_cb_callback, + .destructor = afs_cm_destructor, + .work = SRXAFSCB_CallBack, +}; + /* * route an incoming cache manager call * - return T if supported, F if not */ bool afs_cm_incoming_call(struct afs_call *call) { - _enter("{CB.OP %u}", call->operation_ID); + _enter("{%u, CB.OP %u}", call->service_id, call->operation_ID); switch (call->operation_ID) { case CBCallBack: @@ -127,6 +141,11 @@ bool afs_cm_incoming_call(struct afs_call *call) case CBTellMeAboutYourself: call->type = _SRXCBTellMeAboutYourself; return true; + case YFSCBCallBack: + if (call->service_id != YFS_CM_SERVICE) + return false; + call->type = _SRXYFSCB_CallBack; + return true; default: return false; } @@ -570,3 +589,85 @@ static int afs_deliver_cb_tell_me_about_yourself(struct afs_call *call) return afs_queue_call_work(call); } + +/* + * deliver request data to a YFS CB.CallBack call + */ +static int afs_deliver_yfs_cb_callback(struct afs_call *call) +{ + struct afs_callback_break *cb; + struct sockaddr_rxrpc srx; + struct yfs_xdr_YFSFid *bp; + size_t size; + int ret, loop; + + _enter("{%u}", call->unmarshall); + + switch (call->unmarshall) { + case 0: + afs_extract_to_tmp(call); + call->unmarshall++; + + /* extract the FID array and its count in two steps */ + case 1: + _debug("extract FID count"); + ret = afs_extract_data(call, true); + if (ret < 0) + return ret; + + call->count = ntohl(call->tmp); + _debug("FID count: %u", call->count); + if (call->count > YFSCBMAX) + return afs_protocol_error(call, -EBADMSG, + afs_eproto_cb_fid_count); + + size = array_size(call->count, sizeof(struct yfs_xdr_YFSFid)); + call->buffer = kmalloc(size, GFP_KERNEL); + if 
(!call->buffer) + return -ENOMEM; + afs_extract_to_buf(call, size); + call->unmarshall++; + + case 2: + _debug("extract FID array"); + ret = afs_extract_data(call, false); + if (ret < 0) + return ret; + + _debug("unmarshall FID array"); + call->request = kcalloc(call->count, + sizeof(struct afs_callback_break), + GFP_KERNEL); + if (!call->request) + return -ENOMEM; + + cb = call->request; + bp = call->buffer; + for (loop = call->count; loop > 0; loop--, cb++) { + cb->fid.vid = xdr_to_u64(bp->volume); + cb->fid.vnode = xdr_to_u64(bp->vnode.lo); + cb->fid.vnode_hi = ntohl(bp->vnode.hi); + cb->fid.unique = ntohl(bp->vnode.unique); + bp++; + } + + afs_extract_to_tmp(call);
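The two-step extraction in the delivery function (read the FID count, bounds-check it, then unmarshal the array) can be sketched in userspace C. This is only an illustration, not the kernel code: the sketch_* names, the SKETCH_FID_MAX cap (standing in for YFSCBMAX) and the 24-byte big-endian wire layout (volume, vnode.hi, vnode.unique, vnode.lo) are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FID_MAX       1024 /* hypothetical cap, standing in for YFSCBMAX */
#define SKETCH_FID_WIRE_SIZE 24   /* assumed on-the-wire size of one FID */

struct sketch_fid {
	uint64_t vid;      /* volume ID */
	uint64_t vnode;    /* lower 64 bits of the vnode ID */
	uint32_t vnode_hi; /* upper 32 bits of the vnode ID */
	uint32_t unique;
};

/* Read a big-endian 32-bit value from a byte buffer. */
static uint32_t sketch_get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read a big-endian 64-bit value as two 32-bit halves, most significant first. */
static uint64_t sketch_get_be64(const uint8_t *p)
{
	return ((uint64_t)sketch_get_be32(p) << 32) | sketch_get_be32(p + 4);
}

/*
 * Decode a count-prefixed FID array.  Returns the number of FIDs decoded,
 * or -1 if the count is out of range or the buffer is too short.
 */
long sketch_decode_fids(const uint8_t *buf, size_t len,
			struct sketch_fid *out, size_t out_max)
{
	uint32_t count, i;

	/* Step 1: extract and validate the count. */
	if (len < 4)
		return -1;
	count = sketch_get_be32(buf);
	if (count > SKETCH_FID_MAX || count > out_max)
		return -1;
	if (len - 4 < (size_t)count * SKETCH_FID_WIRE_SIZE)
		return -1;

	/* Step 2: unmarshal each FID into host byte order. */
	buf += 4;
	for (i = 0; i < count; i++, buf += SKETCH_FID_WIRE_SIZE) {
		out[i].vid      = sketch_get_be64(buf);
		out[i].vnode_hi = sketch_get_be32(buf + 8);
		out[i].unique   = sketch_get_be32(buf + 12);
		out[i].vnode    = sketch_get_be64(buf + 16);
	}
	return (long)count;
}
```

Rejecting the count before sizing anything from it mirrors the order used in the patch, where the count is compared against YFSCBMAX before the buffer is allocated.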
[PATCH 22/24] afs: Eliminate the address pointer from the address list cursor
Eliminate the address pointer from the address list cursor as it's
redundant (ac->alist->addrs[ac->index] can be used to find the same
address) and address lists must be replaced rather than being rearranged,
so it is of limited value.

Signed-off-by: David Howells
---
 fs/afs/addr_list.c | 2 --
 fs/afs/internal.h  | 1 -
 fs/afs/rxrpc.c     | 2 +-
 fs/afs/server.c    | 2 --
 fs/afs/vl_rotate.c | 2 +-
 fs/afs/volume.c    | 6 +++---
 6 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/fs/afs/addr_list.c b/fs/afs/addr_list.c
index bc5ce31a4ae4..1536d1d21c33 100644
--- a/fs/afs/addr_list.c
+++ b/fs/afs/addr_list.c
@@ -371,7 +371,6 @@ bool afs_iterate_addresses(struct afs_addr_cursor *ac)
 	ac->begun = true;
 	ac->responded = false;
-	ac->addr = &ac->alist->addrs[ac->index];
 	return true;
 }
@@ -389,7 +388,6 @@ int afs_end_cursor(struct afs_addr_cursor *ac)
 		afs_put_addrlist(alist);
 	}
-	ac->addr = NULL;
 	ac->alist = NULL;
 	ac->begun = false;
 	return ac->error;
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index ac9da1e4050e..e5b596bd8acf 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -653,7 +653,6 @@ struct afs_interface {
  */
 struct afs_addr_cursor {
 	struct afs_addr_list *alist; /* Current address list (pins ref) */
-	struct sockaddr_rxrpc *addr;
 	u32 abort_code;
 	unsigned short start; /* Starting point in alist->addrs[] */
 	unsigned short index; /* Wrapping offset from start to current addr */
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index 444ba0d511ef..42e1ea7372e9 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -359,7 +359,7 @@ static int afs_send_pages(struct afs_call *call, struct msghdr *msg)
 long afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call,
 		   gfp_t gfp, bool async)
 {
-	struct sockaddr_rxrpc *srx = ac->addr;
+	struct sockaddr_rxrpc *srx = &ac->alist->addrs[ac->index];
 	struct rxrpc_call *rxcall;
 	struct msghdr msg;
 	struct kvec iov[1];
diff --git a/fs/afs/server.c b/fs/afs/server.c
index aa35cfae5440..7c1be8b4dc9a 100644
--- a/fs/afs/server.c
+++ b/fs/afs/server.c
@@ -367,7 +367,6 @@ static void afs_destroy_server(struct afs_net *net, struct afs_server *server)
 		.alist = alist,
 		.start = alist->index,
 		.index = 0,
-		.addr = &alist->addrs[alist->index],
 		.error = 0,
 	};
 	_enter("%p", server);
@@ -518,7 +517,6 @@ static bool afs_do_probe_fileserver(struct afs_fs_cursor *fc)
 	_enter("");

-	fc->ac.addr = NULL;
 	fc->ac.start = READ_ONCE(fc->ac.alist->index);
 	fc->ac.index = fc->ac.start;
 	fc->ac.error = 0;
diff --git a/fs/afs/vl_rotate.c b/fs/afs/vl_rotate.c
index 5b99ea7be194..ead6dedbb561 100644
--- a/fs/afs/vl_rotate.c
+++ b/fs/afs/vl_rotate.c
@@ -209,7 +209,7 @@ bool afs_select_vlserver(struct afs_vl_cursor *vc)
 	if (!afs_iterate_addresses(&vc->ac))
 		goto next_server;

-	_leave(" = t %pISpc", &vc->ac.addr->transport);
+	_leave(" = t %pISpc", &vc->ac.alist->addrs[vc->ac.index].transport);
 	return true;

 next_server:
diff --git a/fs/afs/volume.c b/fs/afs/volume.c
index f0020e35bf6f..7527c081726e 100644
--- a/fs/afs/volume.c
+++ b/fs/afs/volume.c
@@ -88,16 +88,16 @@ static struct afs_vldb_entry *afs_vl_lookup_vldb(struct afs_cell *cell,
 			case VL_SERVICE:
 				clear_bit(vc.ac.index, &vc.ac.alist->yfs);
 				set_bit(vc.ac.index, &vc.ac.alist->probed);
-				vc.ac.addr->srx_service = ret;
+				vc.ac.alist->addrs[vc.ac.index].srx_service = ret;
 				break;
 			case YFS_VL_SERVICE:
 				set_bit(vc.ac.index, &vc.ac.alist->yfs);
 				set_bit(vc.ac.index, &vc.ac.alist->probed);
-				vc.ac.addr->srx_service = ret;
+				vc.ac.alist->addrs[vc.ac.index].srx_service = ret;
 				break;
 			}
 		}
-		
+
 		vldb = afs_vl_get_entry_by_name_u(&vc, volname, volnamesz);
 	}
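The reasoning behind dropping the cached pointer can be shown with a small userspace sketch: if the current address is always derived from the list pointer and the index, it cannot go stale when the list is replaced. The sketch_* types and the bounds check are assumptions made for the example and are much simpler than the kernel structures.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_addr {
	unsigned short service;
};

struct sketch_addr_list {
	unsigned int nr_addrs;
	struct sketch_addr addrs[4];
};

struct sketch_cursor {
	struct sketch_addr_list *alist; /* current address list */
	unsigned short index;           /* current position in alist->addrs[] */
};

/*
 * Derive the current address from the list and the index each time it is
 * needed, instead of caching a pointer into a list that may be replaced.
 */
struct sketch_addr *sketch_cursor_addr(const struct sketch_cursor *ac)
{
	if (!ac->alist || ac->index >= ac->alist->nr_addrs)
		return NULL;
	return &ac->alist->addrs[ac->index];
}
```

After the cursor's list pointer is swapped for a replacement list, the same call yields the entry in the new list, whereas a cached pointer would still refer to the old one.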
[PATCH 23/24] afs: Fix callback handling
In some circumstances, the callback interest pointer is NULL, so in such a
case we can't dereference it when checking to see if the callback is
broken.  This causes an oops in some circumstances.

Fix this by replacing the function that worked out the aggregate break
counter with one that actually does the comparison, and then make that
return true (ie. broken) if there is no callback interest as yet (ie. the
pointer is NULL).

Fixes: 68251f0a6818 ("afs: Fix whole-volume callback handling")
Signed-off-by: David Howells
---
 fs/afs/fsclient.c  | 2 +-
 fs/afs/internal.h  | 9 ++++++---
 fs/afs/security.c  | 7 ++++---
 fs/afs/yfsclient.c | 2 +-
 4 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index 3975969719de..7c75a1813321 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -269,7 +269,7 @@ static void xdr_decode_AFSCallBack(struct afs_call *call,
 	write_seqlock(&vnode->cb_lock);

-	if (call->cb_break == afs_cb_break_sum(vnode, cbi)) {
+	if (!afs_cb_is_broken(call->cb_break, vnode, cbi)) {
 		vnode->cb_version = ntohl(*bp++);
 		cb_expiry = ntohl(*bp++);
 		vnode->cb_type = ntohl(*bp++);
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index e5b596bd8acf..b60d15212975 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -776,10 +776,13 @@ static inline unsigned int afs_calc_vnode_cb_break(struct afs_vnode *vnode)
 	return vnode->cb_break + vnode->cb_s_break + vnode->cb_v_break;
 }

-static inline unsigned int afs_cb_break_sum(struct afs_vnode *vnode,
-					    struct afs_cb_interest *cbi)
+static inline bool afs_cb_is_broken(unsigned int cb_break,
+				    const struct afs_vnode *vnode,
+				    const struct afs_cb_interest *cbi)
 {
-	return vnode->cb_break + cbi->server->cb_s_break + vnode->volume->cb_v_break;
+	return !cbi || cb_break != (vnode->cb_break +
+				    cbi->server->cb_s_break +
+				    vnode->volume->cb_v_break);
 }

 /*
diff --git a/fs/afs/security.c b/fs/afs/security.c
index d1ae53fd3739..5f58a9a17e69 100644
--- a/fs/afs/security.c
+++ b/fs/afs/security.c
@@ -147,7 +147,8 @@ void afs_cache_permit(struct afs_vnode *vnode, struct key *key,
 				break;
 			}

-			if (cb_break != afs_cb_break_sum(vnode, vnode->cb_interest)) {
+			if (afs_cb_is_broken(cb_break, vnode,
+					     vnode->cb_interest)) {
 				changed = true;
 				break;
 			}
@@ -177,7 +178,7 @@ void afs_cache_permit(struct afs_vnode *vnode, struct key *key,
 		}
 	}

-	if (cb_break != afs_cb_break_sum(vnode, vnode->cb_interest))
+	if (afs_cb_is_broken(cb_break, vnode, vnode->cb_interest))
 		goto someone_else_changed_it;

 	/* We need a ref on any permits list we want to copy as we'll have to
@@ -256,7 +257,7 @@ void afs_cache_permit(struct afs_vnode *vnode, struct key *key,
 	spin_lock(&vnode->lock);
 	zap = rcu_access_pointer(vnode->permit_cache);
-	if (cb_break == afs_cb_break_sum(vnode, vnode->cb_interest) &&
+	if (!afs_cb_is_broken(cb_break, vnode, vnode->cb_interest) &&
 	    zap == permits)
 		rcu_assign_pointer(vnode->permit_cache, replacement);
 	else
diff --git a/fs/afs/yfsclient.c b/fs/afs/yfsclient.c
index d5e3f0095040..12658c1363ae 100644
--- a/fs/afs/yfsclient.c
+++ b/fs/afs/yfsclient.c
@@ -324,7 +324,7 @@ static void xdr_decode_YFSCallBack(struct afs_call *call,
 	write_seqlock(&vnode->cb_lock);

-	if (call->cb_break == afs_cb_break_sum(vnode, cbi)) {
+	if (!afs_cb_is_broken(call->cb_break, vnode, cbi)) {
 		cb_expiry = xdr_to_u64(xdr->expiration_time);
 		do_div(cb_expiry, 10 * 1000 * 1000);
 		vnode->cb_version = ntohl(xdr->version);
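The shape of the fix, moving the comparison into the helper so that a NULL callback interest is reported as "broken" instead of being dereferenced, can be sketched in userspace C. The sketch_* structures are cut-down stand-ins assumed for the example; only the logic follows the afs_cb_is_broken() helper in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_server   { unsigned int cb_s_break; };
struct sketch_volume   { unsigned int cb_v_break; };
struct sketch_interest { struct sketch_server *server; };
struct sketch_vnode {
	unsigned int cb_break;
	struct sketch_volume *volume;
};

/*
 * Return true if the callback must be treated as broken: either there is
 * no callback interest yet, or the break counter sampled earlier no longer
 * matches the sum of the vnode, server and volume break counters.
 */
bool sketch_cb_is_broken(unsigned int cb_break,
			 const struct sketch_vnode *vnode,
			 const struct sketch_interest *cbi)
{
	if (!cbi)
		return true; /* checked before any dereference of cbi */
	return cb_break != (vnode->cb_break +
			    cbi->server->cb_s_break +
			    vnode->volume->cb_v_break);
}
```

Because the NULL test comes first, callers no longer have to guard the call themselves, which is why each call site in the patch shrinks to a single predicate.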
[PATCH 20/24] afs: Implement YFS support in the fs client
Implement support for talking to YFS-variant fileservers in the cache manager and the filesystem client. These implement upgraded services on the same port as their AFS services. YFS fileservers provide expanded capabilities over AFS. Signed-off-by: David Howells --- fs/afs/Makefile|3 fs/afs/callback.c |9 fs/afs/dir.c | 21 fs/afs/fsclient.c | 104 ++ fs/afs/internal.h | 35 + fs/afs/protocol_yfs.h | 106 ++ fs/afs/server.c|8 fs/afs/yfsclient.c | 2184 include/trace/events/afs.h | 58 + 9 files changed, 2500 insertions(+), 28 deletions(-) create mode 100644 fs/afs/yfsclient.c diff --git a/fs/afs/Makefile b/fs/afs/Makefile index 03e9f7afea1b..cc942b790cff 100644 --- a/fs/afs/Makefile +++ b/fs/afs/Makefile @@ -33,7 +33,8 @@ kafs-y := \ vl_list.o \ volume.o \ write.o \ - xattr.o + xattr.o \ + yfsclient.o kafs-$(CONFIG_PROC_FS) += proc.o obj-$(CONFIG_AFS_FS) := kafs.o diff --git a/fs/afs/callback.c b/fs/afs/callback.c index df9bfee698ad..1c7955f5cdaf 100644 --- a/fs/afs/callback.c +++ b/fs/afs/callback.c @@ -210,12 +210,10 @@ void afs_init_callback_state(struct afs_server *server) /* * actually break a callback */ -void afs_break_callback(struct afs_vnode *vnode) +void __afs_break_callback(struct afs_vnode *vnode) { _enter(""); - write_seqlock(>cb_lock); - clear_bit(AFS_VNODE_NEW_CONTENT, >flags); if (test_and_clear_bit(AFS_VNODE_CB_PROMISED, >flags)) { vnode->cb_break++; @@ -230,7 +228,12 @@ void afs_break_callback(struct afs_vnode *vnode) afs_lock_may_be_available(vnode); spin_unlock(>lock); } +} +void afs_break_callback(struct afs_vnode *vnode) +{ + write_seqlock(>cb_lock); + __afs_break_callback(vnode); write_sequnlock(>cb_lock); } diff --git a/fs/afs/dir.c b/fs/afs/dir.c index f2dd48d4363f..43dea3b00c29 100644 --- a/fs/afs/dir.c +++ b/fs/afs/dir.c @@ -1200,7 +1200,7 @@ static int afs_rmdir(struct inode *dir, struct dentry *dentry) if (afs_begin_vnode_operation(, dvnode, key)) { while (afs_select_fileserver()) { fc.cb_break = afs_calc_vnode_cb_break(dvnode); - 
afs_fs_remove(, dentry->d_name.name, true, + afs_fs_remove(, vnode, dentry->d_name.name, true, data_version); } @@ -1245,7 +1245,9 @@ static int afs_dir_remove_link(struct dentry *dentry, struct key *key, if (d_really_is_positive(dentry)) { struct afs_vnode *vnode = AFS_FS_I(d_inode(dentry)); - if (dir_valid) { + if (test_bit(AFS_VNODE_DELETED, >flags)) { + /* Already done */ + } else if (dir_valid) { drop_nlink(>vfs_inode); if (vnode->vfs_inode.i_nlink == 0) { set_bit(AFS_VNODE_DELETED, >flags); @@ -1274,7 +1276,7 @@ static int afs_dir_remove_link(struct dentry *dentry, struct key *key, static int afs_unlink(struct inode *dir, struct dentry *dentry) { struct afs_fs_cursor fc; - struct afs_vnode *dvnode = AFS_FS_I(dir), *vnode; + struct afs_vnode *dvnode = AFS_FS_I(dir), *vnode = NULL; struct key *key; unsigned long d_version = (unsigned long)dentry->d_fsdata; u64 data_version = dvnode->status.data_version; @@ -1304,7 +1306,18 @@ static int afs_unlink(struct inode *dir, struct dentry *dentry) if (afs_begin_vnode_operation(, dvnode, key)) { while (afs_select_fileserver()) { fc.cb_break = afs_calc_vnode_cb_break(dvnode); - afs_fs_remove(, dentry->d_name.name, false, + + if (test_bit(AFS_SERVER_FL_IS_YFS, >server->flags) && + !test_bit(AFS_SERVER_FL_NO_RM2, >server->flags)) { + yfs_fs_remove_file2(, vnode, dentry->d_name.name, + data_version); + if (fc.ac.error != -ECONNABORTED || + fc.ac.abort_code != RXGEN_OPCODE) + continue; + set_bit(AFS_SERVER_FL_NO_RM2, >server->flags); + } + + afs_fs_remove(, vnode, dentry->d_name.name, false, data_version); } diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c index 2da65309e0de..3975969719de 100644 --- a/fs/afs/fsclient.c +++ b/fs/afs/fsclient.c @@ -17,6 +17,7 @@ #include "internal.h" #include "afs_fs.h" #include "xdr_fs.h" +#include "protocol_yfs.h" static const struct afs_fid afs_zero_fid;
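The unlink path in this patch tries the newer YFS removal call first and falls back to the older call if the server aborts with RXGEN_OPCODE, recording that in AFS_SERVER_FL_NO_RM2 so the probe is not repeated. A minimal userspace sketch of that "try, remember, fall back" pattern follows. All sketch_* names, the simulated server and the SKETCH_ABORT_UNKNOWN_OP value are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_ABORT_UNKNOWN_OP (-1) /* stands in for an RXGEN_OPCODE abort */

struct sketch_server {
	bool is_yfs;         /* server offers the upgraded service */
	bool no_rm2;         /* set once the newer call is known to be missing */
	bool implements_rm2; /* simulated server capability */
	int rm2_calls;       /* how often the newer call was attempted */
	int rm_calls;        /* how often the older call was used */
};

/* Simulated newer removal call: aborts if the server lacks the operation. */
static int sketch_remove_file2(struct sketch_server *s)
{
	s->rm2_calls++;
	return s->implements_rm2 ? 0 : SKETCH_ABORT_UNKNOWN_OP;
}

/* Simulated older removal call: always available. */
static int sketch_remove(struct sketch_server *s)
{
	s->rm_calls++;
	return 0;
}

int sketch_unlink(struct sketch_server *s)
{
	if (s->is_yfs && !s->no_rm2) {
		int ret = sketch_remove_file2(s);

		/* Any outcome other than "unknown operation" is final. */
		if (ret != SKETCH_ABORT_UNKNOWN_OP)
			return ret;
		/* Remember the missing operation so it is not retried. */
		s->no_rm2 = true;
	}
	return sketch_remove(s);
}
```

The flag makes the extra round trip a one-off cost per server rather than a cost on every unlink.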
[PATCH 16/24] afs: Fix FS.FetchStatus delivery from updating wrong vnode
The FS.FetchStatus reply delivery function was updating the inode of the
directory in which a lookup had been done with the status of the looked-up
file.  This corrupts some of the directory state.

Fixes: 5cf9dd55a0ec ("afs: Prospectively look up extra files when doing a single lookup")
Signed-off-by: David Howells
---
 fs/afs/fsclient.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index 5e3027f21390..f758750e81d8 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -2026,7 +2026,7 @@ static int afs_deliver_fs_fetch_status(struct afs_call *call)
 	struct afs_file_status *status = call->reply[1];
 	struct afs_callback *callback = call->reply[2];
 	struct afs_volsync *volsync = call->reply[3];
-	struct afs_vnode *vnode = call->reply[0];
+	struct afs_fid *fid = call->reply[0];
 	const __be32 *bp;
 	int ret;

@@ -2034,21 +2034,15 @@ static int afs_deliver_fs_fetch_status(struct afs_call *call)
 	if (ret < 0)
 		return ret;

-	_enter("{%llx:%llu}", vnode->fid.vid, vnode->fid.vnode);
+	_enter("{%llx:%llu}", fid->vid, fid->vnode);

 	/* unmarshall the reply once we've received all of it */
 	bp = call->buffer;
-	ret = afs_decode_status(call, &bp, status, vnode,
+	ret = afs_decode_status(call, &bp, status, NULL,
 				&call->expected_version, NULL);
 	if (ret < 0)
 		return ret;
-	callback[call->count].version = ntohl(bp[0]);
-	callback[call->count].expiry = ntohl(bp[1]);
-	callback[call->count].type = ntohl(bp[2]);
-	if (vnode)
-		xdr_decode_AFSCallBack(call, vnode, &bp);
-	else
-		bp += 3;
+	xdr_decode_AFSCallBack_raw(&bp, callback);
 	if (volsync)
 		xdr_decode_AFSVolSync(&bp, volsync);

@@ -2089,7 +2083,7 @@ int afs_fs_fetch_status(struct afs_fs_cursor *fc,
 	}

 	call->key = fc->key;
-	call->reply[0] = NULL; /* vnode for fid[0] */
+	call->reply[0] = fid;
 	call->reply[1] = status;
 	call->reply[2] = callback;
 	call->reply[3] = volsync;
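The fix decodes the callback into a caller-supplied record instead of applying it to a vnode. The body of xdr_decode_AFSCallBack_raw() is not shown in this excerpt, so the following userspace sketch is an assumption modelled on the three removed ntohl() lines (version, expiry, type); the sketch_* names and the byte-buffer cursor are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_callback {
	uint32_t version;
	uint32_t expiry;
	uint32_t type;
};

/* Read a big-endian 32-bit value from a byte buffer. */
static uint32_t sketch_get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Decode three big-endian words into the caller's record and advance the
 * cursor past them.  No vnode or other shared state is touched.
 */
void sketch_decode_callback_raw(const uint8_t **cursor,
				struct sketch_callback *cb)
{
	const uint8_t *p = *cursor;

	cb->version = sketch_get_be32(p);
	cb->expiry  = sketch_get_be32(p + 4);
	cb->type    = sketch_get_be32(p + 8);
	*cursor = p + 12;
}
```

Keeping the decoder free of side effects leaves the decision about which inode, if any, receives the result with the caller.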
[PATCH 11/24] afs: Don't invoke the server to read data beyond EOF
When writing a new page, clear space in the page rather than attempting to
load it from the server if the space is beyond the EOF.

Signed-off-by: David Howells
---
 fs/afs/write.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/fs/afs/write.c b/fs/afs/write.c
index fdb9d6024126..11066a3248ba 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -33,10 +33,21 @@ static int afs_fill_page(struct afs_vnode *vnode, struct key *key,
 			 loff_t pos, unsigned int len, struct page *page)
 {
 	struct afs_read *req;
+	size_t p;
+	void *data;
 	int ret;

 	_enter(",,%llu", (unsigned long long)pos);

+	if (pos >= vnode->vfs_inode.i_size) {
+		p = pos & ~PAGE_MASK;
+		ASSERTCMP(p + len, <=, PAGE_SIZE);
+		data = kmap(page);
+		memset(data + p, 0, len);
+		kunmap(page);
+		return 0;
+	}
+
 	req = kzalloc(sizeof(struct afs_read) + sizeof(struct page *),
 		      GFP_KERNEL);
 	if (!req)
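The offset arithmetic in the new branch relies on PAGE_MASK being ~(PAGE_SIZE - 1) in the kernel, so pos & ~PAGE_MASK is the byte offset within the page. A userspace sketch of the same decision is given below. The sketch_* names, the fixed 4096-byte page and the return codes are assumptions for the example, and the sketch returns -1 where the kernel code asserts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size for the example */

/*
 * Returns 1 if the range was zero-filled locally because it lies at or
 * beyond EOF, 0 if the caller still has to read it from the server, and
 * -1 if the range would run past the end of the page.
 */
int sketch_fill_page(uint64_t file_size, uint64_t pos, size_t len,
		     unsigned char *page)
{
	/* Byte offset within the page; same value as pos & ~PAGE_MASK. */
	size_t off = (size_t)(pos & (SKETCH_PAGE_SIZE - 1));

	if (off + len > SKETCH_PAGE_SIZE)
		return -1;

	if (pos >= file_size) {
		/* Nothing exists on the server here, so just clear it. */
		memset(page + off, 0, len);
		return 1;
	}
	return 0;
}
```

Only the requested range is cleared; bytes on either side of it in the page are left as they were.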
[PATCH 13/24] afs: Commit the status on a new file/dir/symlink
Call the function to commit the status on a new file, dir or symlink so
that the access rights for the caller's key are cached for that object.
Without this, the next access to the file will cause a FetchStatus
operation to be emitted to retrieve the access rights.

Signed-off-by: David Howells
---
 fs/afs/dir.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 024b7cf7441c..8936731c59ff 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -1089,6 +1089,7 @@ static void afs_vnode_new_inode(struct afs_fs_cursor *fc,
 	vnode = AFS_FS_I(inode);
 	set_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags);
+	afs_vnode_commit_status(fc, vnode, 0);
 	d_add(new_dentry, inode);
 }
[PATCH 12/24] afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS
Increase the sizes of the volume ID to 64 bits and the vnode ID (inode
number equivalent) to 96 bits to allow the support of YFS.

This requires the iget comparator to check the vnode->fid rather than i_ino
and i_generation as i_ino is not sufficiently capacious. It also requires
this data to be placed into the vnode cache key for fscache.

For the moment, just discard the top 32 bits of the vnode ID when returning
it through stat.

Signed-off-by: David Howells
---

 fs/afs/afs.h | 11 ++-
 fs/afs/cache.c | 2 +-
 fs/afs/callback.c | 2 +-
 fs/afs/dir.c | 24
 fs/afs/dynroot.c | 2 +-
 fs/afs/file.c | 8
 fs/afs/flock.c | 22 +++---
 fs/afs/fsclient.c | 24
 fs/afs/inode.c | 31 +--
 fs/afs/proc.c | 2 +-
 fs/afs/rotate.c | 2 +-
 fs/afs/security.c | 6 +++---
 fs/afs/super.c | 5 +++--
 fs/afs/volume.c | 2 +-
 fs/afs/write.c | 18 +-
 fs/afs/xattr.c | 2 +-
 include/trace/events/afs.h | 4 ++--
 17 files changed, 86 insertions(+), 81 deletions(-)

diff --git a/fs/afs/afs.h b/fs/afs/afs.h
index b4ff1f7ae4ab..c23b31b742fa 100644
--- a/fs/afs/afs.h
+++ b/fs/afs/afs.h
@@ -23,9 +23,9 @@
 #define AFSPATHMAX	1024	/* Maximum length of a pathname plus NUL */
 #define AFSOPAQUEMAX	1024	/* Maximum length of an opaque field */
 
-typedef unsigned		afs_volid_t;
-typedef unsigned		afs_vnodeid_t;
-typedef unsigned long long	afs_dataversion_t;
+typedef u64			afs_volid_t;
+typedef u64			afs_vnodeid_t;
+typedef u64			afs_dataversion_t;
 
 typedef enum {
 	AFSVL_RWVOL,		/* read/write volume */
@@ -52,8 +52,9 @@ typedef enum {
  */
 struct afs_fid {
 	afs_volid_t	vid;		/* volume ID */
-	afs_vnodeid_t	vnode;		/* file index within volume */
-	unsigned	unique;		/* unique ID number (file index version) */
+	afs_vnodeid_t	vnode;		/* Lower 64-bits of file index within volume */
+	u32		vnode_hi;	/* Upper 32-bits of file index */
+	u32		unique;		/* unique ID number (file index version) */
 };
 
 /*
diff --git a/fs/afs/cache.c b/fs/afs/cache.c
index b1c31ec4523a..f6d0a21e8052 100644
--- a/fs/afs/cache.c
+++ b/fs/afs/cache.c
@@ -49,7 +49,7 @@ static enum fscache_checkaux afs_vnode_cache_check_aux(void *cookie_netfs_data,
 	struct afs_vnode *vnode = cookie_netfs_data;
 	struct afs_vnode_cache_aux aux;
 
-	_enter("{%x,%x,%llx},%p,%u",
+	_enter("{%llx,%x,%llx},%p,%u",
 	       vnode->fid.vnode, vnode->fid.unique, vnode->status.data_version,
 	       buffer, buflen);
 
diff --git a/fs/afs/callback.c b/fs/afs/callback.c
index 5f261fbf2182..8698198ad427 100644
--- a/fs/afs/callback.c
+++ b/fs/afs/callback.c
@@ -310,7 +310,7 @@ void afs_break_callbacks(struct afs_server *server, size_t count,
 	/* TODO: Sort the callback break list by volume ID */
 
 	for (; count > 0; callbacks++, count--) {
-		_debug("- Fid { vl=%08x n=%u u=%u } CB { v=%u x=%u t=%u }",
+		_debug("- Fid { vl=%08llx n=%llu u=%u } CB { v=%u x=%u t=%u }",
 		       callbacks->fid.vid,
 		       callbacks->fid.vnode,
 		       callbacks->fid.unique,
diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 78f9754fd03d..024b7cf7441c 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -552,7 +552,7 @@ static int afs_do_lookup_one(struct inode *dir, struct dentry *dentry,
 	}
 
 	*fid = cookie.fid;
-	_leave(" = 0 { vn=%u u=%u }", fid->vnode, fid->unique);
+	_leave(" = 0 { vn=%llu u=%u }", fid->vnode, fid->unique);
 	return 0;
 }
 
@@ -830,7 +830,7 @@ static struct dentry *afs_lookup(struct inode *dir, struct dentry *dentry,
 	struct key *key;
 	int ret;
 
-	_enter("{%x:%u},%p{%pd},",
+	_enter("{%llx:%llu},%p{%pd},",
 	       dvnode->fid.vid, dvnode->fid.vnode, dentry, dentry);
 
 	ASSERTCMP(d_inode(dentry), ==, NULL);
@@ -900,7 +900,7 @@ static int afs_d_revalidate(struct dentry *dentry, unsigned int flags)
 
 	if (d_really_is_positive(dentry)) {
 		vnode = AFS_FS_I(d_inode(dentry));
-		_enter("{v={%x:%u} n=%pd fl=%lx},",
+		_enter("{v={%llx:%llu} n=%pd fl=%lx},",
 		       vnode->fid.vid, vnode->fid.vnode, dentry,
 		       vnode->flags);
 	} else {
@@ -969,7 +969,7 @@ static int afs_d_revalidate(struct dentry *dentry, unsigned int flags)
 	/* if the vnode ID has changed, then the dirent points to a
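To make the two consequences named in the commit message concrete, here is a minimal userspace sketch, not the code from the patch. The struct mirrors the new afs_fid layout shown in the diff; the names sketch_fid, sketch_fid_equal() and sketch_stat_ino() are hypothetical. It assumes that "discard the top 32 bits" means reporting only the lower 64 bits of the vnode ID as the inode number.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the widened struct afs_fid. */
struct sketch_fid {
	uint64_t vid;		/* volume ID */
	uint64_t vnode;		/* lower 64 bits of the file index */
	uint32_t vnode_hi;	/* upper 32 bits of the file index */
	uint32_t unique;	/* file index version */
};

/*
 * An inode comparator has to look at every field of the fid, because a
 * 64-bit inode number cannot hold a 96-bit vnode ID.
 */
static bool sketch_fid_equal(const struct sketch_fid *a,
			     const struct sketch_fid *b)
{
	return a->vid == b->vid &&
	       a->vnode == b->vnode &&
	       a->vnode_hi == b->vnode_hi &&
	       a->unique == b->unique;
}

/*
 * Assumed stat behaviour: only the lower 64 bits are reported, so two
 * distinct files can show the same inode number.
 */
static uint64_t sketch_stat_ino(const struct sketch_fid *fid)
{
	return fid->vnode;
}
```

Two fids that differ only in vnode_hi compare as different under sketch_fid_equal() yet yield the same sketch_stat_ino() value, which is why the comparator cannot rely on the inode number alone.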
[PATCH 11/24] afs: Don't invoke the server to read data beyond EOF
When writing a new page, clear space in the page rather than attempting to
load it from the server if the space is beyond the EOF.

Signed-off-by: David Howells
---

 fs/afs/write.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/fs/afs/write.c b/fs/afs/write.c
index fdb9d6024126..11066a3248ba 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -33,10 +33,21 @@ static int afs_fill_page(struct afs_vnode *vnode, struct key *key,
 			 loff_t pos, unsigned int len, struct page *page)
 {
 	struct afs_read *req;
+	size_t p;
+	void *data;
 	int ret;
 
 	_enter(",,%llu", (unsigned long long)pos);
 
+	if (pos >= vnode->vfs_inode.i_size) {
+		p = pos & ~PAGE_MASK;
+		ASSERTCMP(p + len, <=, PAGE_SIZE);
+		data = kmap(page);
+		memset(data + p, 0, len);
+		kunmap(page);
+		return 0;
+	}
+
 	req = kzalloc(sizeof(struct afs_read) + sizeof(struct page *),
 		      GFP_KERNEL);
 	if (!req)
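The early exit added above can be modelled in userspace as follows. This is only a sketch under stated assumptions: SKETCH_PAGE_SIZE and sketch_fill_page() are hypothetical names, the page is a plain byte array rather than a kmap()ed struct page, and the return value is changed to say whether a server read is still needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/*
 * If the range starts at or beyond EOF, zero it locally instead of asking
 * the server for data that cannot exist.
 *
 * Returns 1 if the range was zero-filled, 0 if the caller must still read
 * it from the server.
 */
static int sketch_fill_page(unsigned char *page, uint64_t i_size,
			    uint64_t pos, size_t len)
{
	size_t p;

	if (pos < i_size)
		return 0;

	/* Offset of pos within its page, like pos & ~PAGE_MASK. */
	p = (size_t)(pos & (SKETCH_PAGE_SIZE - 1));
	assert(p + len <= SKETCH_PAGE_SIZE);
	memset(page + p, 0, len);
	return 1;
}
```

With a 100-byte file, a request for 50 bytes at file offset 4296 zeroes bytes 200 to 249 of the page and leaves the rest untouched, while a request at offset 10 returns 0 and leaves the page for the normal read path.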
[PATCH 13/24] afs: Commit the status on a new file/dir/symlink
Call the function to commit the status on a new file, dir or symlink so
that the access rights for the caller's key are cached for that object.
Without this, the next access to the file will cause a FetchStatus
operation to be emitted to retrieve the access rights.

Signed-off-by: David Howells
---

 fs/afs/dir.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 024b7cf7441c..8936731c59ff 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -1089,6 +1089,7 @@ static void afs_vnode_new_inode(struct afs_fs_cursor *fc,
 
 	vnode = AFS_FS_I(inode);
 	set_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags);
+	afs_vnode_commit_status(fc, vnode, 0);
 	d_add(new_dentry, inode);
 }
[PATCH 14/24] afs: Remove callback details from afs_callback_break struct
Remove unnecessary details of a broken callback, such as version, expiry
and type, from the afs_callback_break struct as they're not actually used
and make the list take more memory.

Signed-off-by: David Howells
---

 fs/afs/afs.h | 2 +-
 fs/afs/callback.c | 8 ++--
 fs/afs/cmservice.c | 17 +
 3 files changed, 8 insertions(+), 19 deletions(-)

diff --git a/fs/afs/afs.h b/fs/afs/afs.h
index c23b31b742fa..fb9bcb8758ea 100644
--- a/fs/afs/afs.h
+++ b/fs/afs/afs.h
@@ -75,7 +75,7 @@ struct afs_callback {
 
 struct afs_callback_break {
 	struct afs_fid		fid;	/* File identifier */
-	struct afs_callback	cb;	/* Callback details */
+	//struct afs_callback	cb;	/* Callback details */
 };
 
 #define AFSCBMAX 50	/* maximum callbacks transferred per bulk op */
diff --git a/fs/afs/callback.c b/fs/afs/callback.c
index 8698198ad427..df9bfee698ad 100644
--- a/fs/afs/callback.c
+++ b/fs/afs/callback.c
@@ -310,14 +310,10 @@ void afs_break_callbacks(struct afs_server *server, size_t count,
 	/* TODO: Sort the callback break list by volume ID */
 
 	for (; count > 0; callbacks++, count--) {
-		_debug("- Fid { vl=%08llx n=%llu u=%u } CB { v=%u x=%u t=%u }",
+		_debug("- Fid { vl=%08llx n=%llu u=%u }",
 		       callbacks->fid.vid,
 		       callbacks->fid.vnode,
-		       callbacks->fid.unique,
-		       callbacks->cb.version,
-		       callbacks->cb.expiry,
-		       callbacks->cb.type
-		       );
+		       callbacks->fid.unique);
 		afs_break_one_callback(server, &callbacks->fid);
 	}
 
diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c
index 186f621f8722..fc0010d800a0 100644
--- a/fs/afs/cmservice.c
+++ b/fs/afs/cmservice.c
@@ -218,7 +218,6 @@ static int afs_deliver_cb_callback(struct afs_call *call)
 			cb->fid.vid	= ntohl(*bp++);
 			cb->fid.vnode	= ntohl(*bp++);
 			cb->fid.unique	= ntohl(*bp++);
-			cb->cb.type	= AFSCM_CB_UNTYPED;
 		}
 
 		afs_extract_to_tmp(call);
@@ -236,24 +235,18 @@ static int afs_deliver_cb_callback(struct afs_call *call)
 		if (call->count2 != call->count && call->count2 != 0)
 			return afs_protocol_error(call, -EBADMSG,
 						  afs_eproto_cb_count);
-		afs_extract_to_buf(call, call->count2 * 3 * 4);
+		call->_iter = &call->iter;
+		iov_iter_discard(&call->iter, READ, call->count2 * 3 * 4);
 		call->unmarshall++;
 
 	case 4:
-		_debug("extract CB array");
+		_debug("extract discard %zu/%u",
+		       iov_iter_count(&call->iter), call->count2 * 3 * 4);
+
 		ret = afs_extract_data(call, false);
 		if (ret < 0)
 			return ret;
 
-		_debug("unmarshall CB array");
-		cb = call->request;
-		bp = call->buffer;
-		for (loop = call->count2; loop > 0; loop--, cb++) {
-			cb->cb.version	= ntohl(*bp++);
-			cb->cb.expiry	= ntohl(*bp++);
-			cb->cb.type	= ntohl(*bp++);
-		}
-
 		call->unmarshall++;
 	case 5:
 		break;
[PATCH 08/24] afs: Fix TTL on VL server and address lists
Currently the TTL on VL server and address lists isn't set in all
circumstances and may be set to poor choices in others, since the TTL is
derived from the SRV/AFSDB DNS record if and when available.

Fix the TTL by limiting the range to a minimum and maximum from the current
time. At some point these can be made into sysctl knobs. Further, use the
TTL we obtained from the upcall to set the expiry on negative results too;
in future a mechanism can be added to force reloading of such data.

Signed-off-by: David Howells
---

 fs/afs/cell.c | 26 ++
 fs/afs/proc.c | 14 +++---
 2 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/fs/afs/cell.c b/fs/afs/cell.c
index 963b6fa51fdf..cf445dbd5f2e 100644
--- a/fs/afs/cell.c
+++ b/fs/afs/cell.c
@@ -20,6 +20,8 @@
 #include "internal.h"
 
 static unsigned __read_mostly afs_cell_gc_delay = 10;
+static unsigned __read_mostly afs_cell_min_ttl = 10 * 60;
+static unsigned __read_mostly afs_cell_max_ttl = 24 * 60 * 60;
 
 static void afs_manage_cell(struct work_struct *);
 
@@ -171,6 +173,8 @@ static struct afs_cell *afs_alloc_cell(struct afs_net *net,
 
 		rcu_assign_pointer(cell->vl_servers, vllist);
 		cell->dns_expiry = TIME64_MAX;
+	} else {
+		cell->dns_expiry = ktime_get_real_seconds();
 	}
 
 	_leave(" = %p", cell);
@@ -358,25 +362,39 @@ int afs_cell_init(struct afs_net *net, const char *rootcell)
 static void afs_update_cell(struct afs_cell *cell)
 {
 	struct afs_vlserver_list *vllist, *old;
-	time64_t now, expiry;
+	unsigned int min_ttl = READ_ONCE(afs_cell_min_ttl);
+	unsigned int max_ttl = READ_ONCE(afs_cell_max_ttl);
+	time64_t now, expiry = 0;
 
 	_enter("%s", cell->name);
 
 	vllist = afs_dns_query(cell, &expiry);
+
+	now = ktime_get_real_seconds();
+	if (min_ttl > max_ttl)
+		max_ttl = min_ttl;
+	if (expiry < now + min_ttl)
+		expiry = now + min_ttl;
+	else if (expiry > now + max_ttl)
+		expiry = now + max_ttl;
+
 	if (IS_ERR(vllist)) {
 		switch (PTR_ERR(vllist)) {
 		case -ENODATA:
-			/* The DNS said that the cell does not exist */
+		case -EDESTADDRREQ:
+			/* The DNS said that the cell does not exist or there
+			 * weren't any addresses to be had.
+			 */
 			set_bit(AFS_CELL_FL_NOT_FOUND, &cell->flags);
 			clear_bit(AFS_CELL_FL_DNS_FAIL, &cell->flags);
-			cell->dns_expiry = ktime_get_real_seconds() + 61;
+			cell->dns_expiry = expiry;
 			break;
 
 		case -EAGAIN:
 		case -ECONNREFUSED:
 		default:
 			set_bit(AFS_CELL_FL_DNS_FAIL, &cell->flags);
-			cell->dns_expiry = ktime_get_real_seconds() + 10;
+			cell->dns_expiry = now + 10;
 			break;
 		}
 
diff --git a/fs/afs/proc.c b/fs/afs/proc.c
index 6585f4bec0d3..fc36c41641ab 100644
--- a/fs/afs/proc.c
+++ b/fs/afs/proc.c
@@ -37,16 +37,24 @@ static inline struct afs_net *afs_seq2net_single(struct seq_file *m)
  */
 static int afs_proc_cells_show(struct seq_file *m, void *v)
 {
-	struct afs_cell *cell = list_entry(v, struct afs_cell, proc_link);
+	struct afs_vlserver_list *vllist;
+	struct afs_cell *cell;
 
 	if (v == SEQ_START_TOKEN) {
 		/* display header on line 1 */
-		seq_puts(m, "USE NAME\n");
+		seq_puts(m, "USE    TTL SV NAME\n");
 		return 0;
 	}
 
+	cell = list_entry(v, struct afs_cell, proc_link);
+	vllist = rcu_dereference(cell->vl_servers);
+
 	/* display one cell per line on subsequent lines */
-	seq_printf(m, "%3u %s\n", atomic_read(&cell->usage), cell->name);
+	seq_printf(m, "%3u %6lld %2u %s\n",
+		   atomic_read(&cell->usage),
+		   cell->dns_expiry - ktime_get_real_seconds(),
+		   vllist ? vllist->nr_servers : 0,
+		   cell->name);
 	return 0;
 }
 
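The clamping step in afs_update_cell() can be pulled out into a standalone function to see how it behaves at the edges. The following is a userspace sketch: sketch_clamp_expiry() is a hypothetical name, int64_t stands in for time64_t, and the order of the checks follows the hunk above.

```c
#include <stdint.h>

/*
 * Clamp a DNS-supplied expiry time into [now + min_ttl, now + max_ttl].
 * An expiry of 0 models the case where the lookup supplied no TTL at all,
 * matching the "expiry = 0" initialiser in the patch.
 */
static int64_t sketch_clamp_expiry(int64_t expiry, int64_t now,
				   unsigned int min_ttl, unsigned int max_ttl)
{
	if (min_ttl > max_ttl)
		max_ttl = min_ttl;	/* keep the window non-empty */
	if (expiry < now + min_ttl)
		expiry = now + min_ttl;
	else if (expiry > now + max_ttl)
		expiry = now + max_ttl;
	return expiry;
}
```

With the defaults from the patch (600 and 86400 seconds) and now = 1000, a missing TTL becomes 1600, a one-hour TTL (4600) passes through unchanged, and anything later than 87400 is pulled back to 87400. If the two limits are ever misconfigured so that the minimum exceeds the maximum, the minimum wins.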
[PATCH 09/24] afs: Handle EIO from delivery function
Fix afs_deliver_to_call() to handle -EIO being returned by the operation
delivery function, indicating that the call found itself in the wrong
state, by printing an error and aborting the call.

Currently, an assertion failure will occur. This can happen, say, if the
delivery function falls off the end without calling afs_extract_data() with
the want_more parameter set to false to collect the end of the Rx phase of
a call.

The assertion failure looks like:

	AFS: Assertion failed
	4 == 7 is false
	0x4 == 0x7 is false
	------------[ cut here ]------------
	kernel BUG at fs/afs/rxrpc.c:462!

and is matched in the trace buffer by a line like:

kworker/7:3-3226 [007] ...1 85158.030203: afs_io_error: c=0003be0c r=-5 CM_REPLY

Fixes: 98bf40cd99fc ("afs: Protect call->state changes against signals")
Reported-by: Marc Dionne
Signed-off-by: David Howells
---

 fs/afs/rxrpc.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index a3904a8315de..947ae3ab389b 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -499,7 +499,6 @@ static void afs_deliver_to_call(struct afs_call *call)
 		case -EINPROGRESS:
 		case -EAGAIN:
 			goto out;
-		case -EIO:
 		case -ECONNABORTED:
 			ASSERTCMP(state, ==, AFS_CALL_COMPLETE);
 			goto done;
@@ -508,6 +507,10 @@ static void afs_deliver_to_call(struct afs_call *call)
 			rxrpc_kernel_abort_call(call->net->socket, call->rxcall,
 						abort_code, ret, "KIV");
 			goto local_abort;
+		case -EIO:
+			pr_err("kAFS: Call %u in bad state %u\n",
+			       call->debug_id, state);
+			/* Fall through */
 		case -ENODATA:
 		case -EBADMSG:
 		case -EMSGSIZE:
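The effect of moving the -EIO case can be seen in a reduced model of the switch. This is a simplified userspace sketch, not the kernel function: the enum, sketch_classify() and the SKETCH_OTHER catch-all are hypothetical, and only the error codes visible in the two hunks are modelled.

```c
#include <errno.h>
#include <stdio.h>

enum sketch_action {
	SKETCH_WAIT,	/* "goto out": more data is expected */
	SKETCH_DONE,	/* "goto done": the call is already complete */
	SKETCH_ABORT,	/* abort the call locally */
	SKETCH_OTHER,	/* codes not shown in the quoted hunks */
};

static enum sketch_action sketch_classify(int ret, unsigned int debug_id,
					  unsigned int state)
{
	switch (ret) {
	case -EINPROGRESS:
	case -EAGAIN:
		return SKETCH_WAIT;
	case -ECONNABORTED:
		return SKETCH_DONE;
	case -EIO:
		/* Log the bad state, then treat it like an unmarshal error. */
		fprintf(stderr, "kAFS: Call %u in bad state %u\n",
			debug_id, state);
		/* Fall through */
	case -ENODATA:
	case -EBADMSG:
	case -EMSGSIZE:
		return SKETCH_ABORT;
	}
	return SKETCH_OTHER;
}
```

Before the patch, -EIO shared the -ECONNABORTED branch and therefore hit the assertion that the call was already complete; in this model it now logs and lands in the abort group instead.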
[PATCH 10/24] afs: Add a couple of tracepoints to log I/O errors
Add a couple of tracepoints to log the production of I/O errors within the
AFS filesystem.

Signed-off-by: David Howells
---

 fs/afs/cmservice.c | 10 +++--
 fs/afs/dir.c | 18 ++
 fs/afs/internal.h | 11 ++
 fs/afs/mntpt.c | 5 ++-
 fs/afs/rxrpc.c | 2 +
 fs/afs/server.c | 2 +
 fs/afs/write.c | 1 +
 include/trace/events/afs.h | 81
 8 files changed, 114 insertions(+), 16 deletions(-)

diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c
index 4db62ae8dc1a..186f621f8722 100644
--- a/fs/afs/cmservice.c
+++ b/fs/afs/cmservice.c
@@ -260,7 +260,7 @@ static int afs_deliver_cb_callback(struct afs_call *call)
 	}
 
 	if (!afs_check_call_state(call, AFS_CALL_SV_REPLYING))
-		return -EIO;
+		return afs_io_error(call, afs_io_error_cm_reply);
 
 	/* we'll need the file server record as that tells us which set of
 	 * vnodes to operate upon */
@@ -368,7 +368,7 @@ static int afs_deliver_cb_init_call_back_state3(struct afs_call *call)
 	}
 
 	if (!afs_check_call_state(call, AFS_CALL_SV_REPLYING))
-		return -EIO;
+		return afs_io_error(call, afs_io_error_cm_reply);
 
 	/* we'll need the file server record as that tells us which set of
 	 * vnodes to operate upon */
@@ -409,7 +409,7 @@ static int afs_deliver_cb_probe(struct afs_call *call)
 		return ret;
 
 	if (!afs_check_call_state(call, AFS_CALL_SV_REPLYING))
-		return -EIO;
+		return afs_io_error(call, afs_io_error_cm_reply);
 
 	return afs_queue_call_work(call);
 }
@@ -490,7 +490,7 @@ static int afs_deliver_cb_probe_uuid(struct afs_call *call)
 	}
 
 	if (!afs_check_call_state(call, AFS_CALL_SV_REPLYING))
-		return -EIO;
+		return afs_io_error(call, afs_io_error_cm_reply);
 
 	return afs_queue_call_work(call);
 }
@@ -573,7 +573,7 @@ static int afs_deliver_cb_tell_me_about_yourself(struct afs_call *call)
 		return ret;
 
 	if (!afs_check_call_state(call, AFS_CALL_SV_REPLYING))
-		return -EIO;
+		return afs_io_error(call, afs_io_error_cm_reply);
 
 	return afs_queue_call_work(call);
 }
diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 855bf2b79fed..78f9754fd03d 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -138,6 +138,7 @@ static bool afs_dir_check_page(struct afs_vnode *dvnode, struct page *page,
 			       ntohs(dbuf->blocks[tmp].hdr.magic));
 			trace_afs_dir_check_failed(dvnode, off, i_size);
 			kunmap(page);
+			trace_afs_file_error(dvnode, -EIO, afs_file_error_dir_bad_magic);
 			goto error;
 		}
 
@@ -190,9 +191,11 @@ static struct afs_read *afs_read_dir(struct afs_vnode *dvnode, struct key *key)
 retry:
 	i_size = i_size_read(&dvnode->vfs_inode);
 	if (i_size < 2048)
-		return ERR_PTR(-EIO);
-	if (i_size > 2048 * 1024)
+		return ERR_PTR(afs_bad(dvnode, afs_file_error_dir_small));
+	if (i_size > 2048 * 1024) {
+		trace_afs_file_error(dvnode, -EFBIG, afs_file_error_dir_big);
 		return ERR_PTR(-EFBIG);
+	}
 
 	_enter("%llu", i_size);
 
@@ -315,7 +318,8 @@ static struct afs_read *afs_read_dir(struct afs_vnode *dvnode, struct key *key)
 /*
  * deal with one block in an AFS directory
  */
-static int afs_dir_iterate_block(struct dir_context *ctx,
+static int afs_dir_iterate_block(struct afs_vnode *dvnode,
+				 struct dir_context *ctx,
 				 union afs_xdr_dir_block *block,
 				 unsigned blkoff)
 {
@@ -365,7 +369,7 @@ static int afs_dir_iterate_block(struct dir_context *ctx,
 				       " (len %u/%zu)",
 				       blkoff / sizeof(union afs_xdr_dir_block),
 				       offset, next, tmp, nlen);
-				return -EIO;
+				return afs_bad(dvnode, afs_file_error_dir_over_end);
 			}
 			if (!(block->hdr.bitmap[next / 8] &
 			      (1 << (next % 8)))) {
@@ -373,7 +377,7 @@ static int afs_dir_iterate_block(struct dir_context *ctx,
 				       " %u unmarked extension (len %u/%zu)",
 				       blkoff / sizeof(union afs_xdr_dir_block),
 				       offset, next, tmp, nlen);
-				return -EIO;
+				return afs_bad(dvnode, afs_file_error_dir_unmarked_ext);
 			}
 
 			_debug("ENT[%zu.%u]: ext %u/%zu",
@@ -442,7 +446,7 @@ static int afs_dir_iterate(struct
[PATCH 03/24] iov_iter: Add I/O discard iterator
Add a new iterator, ITER_DISCARD, that can only be used in READ mode and just discards any data copied to it. This is useful in a network filesystem for discarding any unwanted data sent by a server. Signed-off-by: David Howells --- include/linux/uio.h |7 lib/iov_iter.c | 96 ++- 2 files changed, 79 insertions(+), 24 deletions(-) diff --git a/include/linux/uio.h b/include/linux/uio.h index f445d5cbb571..58efe514b253 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -26,6 +26,7 @@ enum iter_type { ITER_KVEC, ITER_BVEC, ITER_PIPE, + ITER_DISCARD, }; struct iov_iter { @@ -73,6 +74,11 @@ static inline bool iov_iter_is_pipe(const struct iov_iter *i) return iov_iter_type(i) == ITER_PIPE; } +static inline bool iov_iter_is_discard(const struct iov_iter *i) +{ + return iov_iter_type(i) == ITER_DISCARD; +} + static inline unsigned char iov_iter_rw(const struct iov_iter *i) { return i->iter_dir; @@ -221,6 +227,7 @@ void iov_iter_bvec(struct iov_iter *i, unsigned int direction, const struct bio_ unsigned long nr_segs, size_t count); void iov_iter_pipe(struct iov_iter *i, unsigned int direction, struct pipe_inode_info *pipe, size_t count); +void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count); ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages, size_t maxsize, unsigned maxpages, size_t *start); ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages, diff --git a/lib/iov_iter.c b/lib/iov_iter.c index 3d8c459e7cd8..2288488e8aaa 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -91,6 +91,9 @@ case ITER_PIPE: { \ break; \ } \ + case ITER_DISCARD: {\ + break; \ + } \ case ITER_IOVEC: { \ const struct iovec *iov;\ struct iovec v; \ @@ -144,6 +147,10 @@ case ITER_PIPE: { \ break; \ } \ + case ITER_DISCARD: {\ + skip += n; \ + break; \ + } \ } \ i->count -= n; \ i->iov_offset = skip; \ @@ -448,6 +455,7 @@ int iov_iter_fault_in_readable(struct iov_iter *i, size_t bytes) break; case ITER_KVEC: case ITER_BVEC: + 
case ITER_DISCARD: break; } return 0; @@ -870,6 +878,8 @@ size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes, return copy_page_to_iter_iovec(page, offset, bytes, i); case ITER_PIPE: return copy_page_to_iter_pipe(page, offset, bytes, i); + case ITER_DISCARD: + return bytes; } BUG(); } @@ -882,6 +892,7 @@ size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes, return 0; switch (iov_iter_type(i)) { case ITER_PIPE: + case ITER_DISCARD: break; case ITER_BVEC: case ITER_KVEC: { @@ -945,7 +956,7 @@ size_t iov_iter_copy_from_user_atomic(struct page *page, kunmap_atomic(kaddr); return 0; } - if (unlikely(iov_iter_is_pipe(i))) { + if (unlikely(iov_iter_is_pipe(i) || iov_iter_type(i) == ITER_DISCARD)) { kunmap_atomic(kaddr); WARN_ON(1); return 0; @@ -1018,6 +1029,9 @@ void iov_iter_advance(struct iov_iter *i, size_t size) case ITER_BVEC: iterate_and_advance(i, size, v, 0, 0, 0); return; + case ITER_DISCARD: + i->count -= size; + return; } BUG(); } @@ -1060,6 +1074,9 @@ void iov_iter_revert(struct iov_iter *i, size_t unroll) } unroll -= i->iov_offset; switch (iov_iter_type(i)) { + case ITER_DISCARD: + i->iov_offset = 0; + return; case ITER_BVEC: { const struct bio_vec *bvec = i->bvec; while (1) { @@ -1103,6 +1120,7 @@ size_t
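The behaviour described in the commit message (accept data in READ mode, store nothing, still account for the bytes) can be modelled in userspace. The following is only a minimal sketch under stated assumptions, not the kernel code: `demo_iter`, `demo_iter_kvec()`, `demo_iter_discard()` and `demo_copy_to_iter()` are hypothetical names invented for illustration, and clamping to the remaining count is a choice of this model.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model; not the kernel's struct iov_iter. */
enum demo_iter_type { DEMO_ITER_KVEC, DEMO_ITER_DISCARD };

struct demo_iter {
	enum demo_iter_type type;
	char *buf;	/* destination for DEMO_ITER_KVEC, unused for discard */
	size_t count;	/* bytes the iterator will still accept */
};

static void demo_iter_kvec(struct demo_iter *i, char *buf, size_t count)
{
	i->type = DEMO_ITER_KVEC;
	i->buf = buf;
	i->count = count;
}

static void demo_iter_discard(struct demo_iter *i, size_t count)
{
	i->type = DEMO_ITER_DISCARD;
	i->buf = NULL;
	i->count = count;
}

/* Copy up to len bytes into the iterator; returns the number consumed. */
static size_t demo_copy_to_iter(const void *src, size_t len,
				struct demo_iter *i)
{
	if (len > i->count)
		len = i->count;
	switch (i->type) {
	case DEMO_ITER_KVEC:
		memcpy(i->buf, src, len);
		i->buf += len;
		break;
	case DEMO_ITER_DISCARD:
		/* Nothing is stored; the data is simply dropped. */
		break;
	}
	i->count -= len;
	return len;
}
```

A caller that wants to skip N unwanted bytes from a server would set up the discard variant with a count of N and feed received data to it exactly as it would to a buffer-backed iterator.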
[PATCH 07/24] afs: Implement VL server rotation
Track VL servers as independent entities rather than lumping all their addresses together into one set and implement server-level rotation by:

 (1) Add the concept of a VL server list, where each server has its own separate address list. This code is similar to the FS server list.

 (2) Use the DNS resolver to retrieve a set of servers and their associated addresses, ports, preference and weight ratings.

 (3) In the case of a legacy DNS resolver or an address list given directly through /proc/net/afs/cells, create a list containing just a dummy server record and attach all the addresses to that.

 (4) Implement a simple rotation policy, for the moment ignoring the priorities and weights assigned to the servers.

 (5) Show the address list through /proc/net/afs//vlservers. This also displays the source and status of the data as indicated by the upcall.

Signed-off-by: David Howells
---
 fs/afs/Makefile    |   2
 fs/afs/addr_list.c | 163 +
 fs/afs/cell.c      |  39 +++---
 fs/afs/dynroot.c   |   2
 fs/afs/internal.h  | 114 --
 fs/afs/proc.c      |  90 +++---
 fs/afs/server.c    |  42 ++-
 fs/afs/vl_list.c   | 336
 fs/afs/vl_rotate.c | 251 +++
 fs/afs/vlclient.c  |  32 ++---
 fs/afs/volume.c    |  52 ++--
 11 files changed, 905 insertions(+), 218 deletions(-)
 create mode 100644 fs/afs/vl_list.c
 create mode 100644 fs/afs/vl_rotate.c

diff --git a/fs/afs/Makefile b/fs/afs/Makefile index 546874057bd3..03e9f7afea1b 100644 --- a/fs/afs/Makefile +++ b/fs/afs/Makefile @@ -29,6 +29,8 @@ kafs-y := \ super.o \ netdevices.o \ vlclient.o \ + vl_rotate.o \ + vl_list.o \ volume.o \ write.o \ xattr.o diff --git a/fs/afs/addr_list.c b/fs/afs/addr_list.c index 7b34fad4f8f5..3f60b4012587 100644 --- a/fs/afs/addr_list.c +++ b/fs/afs/addr_list.c @@ -64,19 +64,25 @@ struct afs_addr_list *afs_alloc_addrlist(unsigned int nr, /* * Parse a text string consisting of delimited addresses.
*/ -struct afs_addr_list *afs_parse_text_addrs(const char *text, size_t len, - char delim, - unsigned short service, - unsigned short port) +struct afs_vlserver_list *afs_parse_text_addrs(struct afs_net *net, + const char *text, size_t len, + char delim, + unsigned short service, + unsigned short port) { + struct afs_vlserver_list *vllist; struct afs_addr_list *alist; const char *p, *end = text + len; + const char *problem; unsigned int nr = 0; + int ret = -ENOMEM; _enter("%*.*s,%c", (int)len, (int)len, text, delim); - if (!len) + if (!len) { + _leave(" = -EDESTADDRREQ [empty]"); return ERR_PTR(-EDESTADDRREQ); + } if (delim == ':' && (memchr(text, ',', len) || !memchr(text, '.', len))) delim = ','; @@ -84,18 +90,24 @@ struct afs_addr_list *afs_parse_text_addrs(const char *text, size_t len, /* Count the addresses */ p = text; do { - if (!*p) - return ERR_PTR(-EINVAL); + if (!*p) { + problem = "nul"; + goto inval; + } if (*p == delim) continue; nr++; if (*p == '[') { p++; - if (p == end) - return ERR_PTR(-EINVAL); + if (p == end) { + problem = "brace1"; + goto inval; + } p = memchr(p, ']', end - p); - if (!p) - return ERR_PTR(-EINVAL); + if (!p) { + problem = "brace2"; + goto inval; + } p++; if (p >= end) break; @@ -109,10 +121,19 @@ struct afs_addr_list *afs_parse_text_addrs(const char *text, size_t len, _debug("%u/%u addresses", nr, AFS_MAX_ADDRESSES); - alist = afs_alloc_addrlist(nr, service, port); - if (!alist) + vllist = afs_alloc_vlserver_list(1); + if (!vllist) return ERR_PTR(-ENOMEM); + vllist->nr_servers = 1; + vllist->servers[0].server = afs_alloc_vlserver("", 7, AFS_VL_PORT); + if (!vllist->servers[0].server) + goto error_vl; + +
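The rotation policy described in the commit message (servers tried in list order, each with its own address list, priorities and weights ignored for now) might look roughly like the following in userspace. This is a sketch under stated assumptions, not the code in vl_rotate.c: `toy_vlserver`, `toy_vl_cursor` and `toy_vl_next()` are hypothetical names, and addresses are plain strings purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: each server owns its own address list. */
struct toy_vlserver {
	const char *const *addrs;
	unsigned int nr_addrs;
};

struct toy_vl_cursor {
	const struct toy_vlserver *servers;
	unsigned int nr_servers;
	unsigned int server;	/* index of the server being tried */
	unsigned int addr;	/* index of the next address on that server */
};

/*
 * Return the next address to try, or NULL once every address of every
 * server has been offered.  Servers are visited in list order; priority
 * and weight are ignored in this simple policy.
 */
static const char *toy_vl_next(struct toy_vl_cursor *vc)
{
	while (vc->server < vc->nr_servers) {
		const struct toy_vlserver *s = &vc->servers[vc->server];

		if (vc->addr < s->nr_addrs)
			return s->addrs[vc->addr++];
		vc->server++;	/* this server is exhausted; rotate */
		vc->addr = 0;
	}
	return NULL;
}
```

The legacy case in point (3) corresponds to a list with a single dummy server that carries all the addresses, so the same cursor logic applies unchanged.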
[PATCH 05/24] afs: Set up the iov_iter before calling afs_extract_data()
afs_extract_data sets up a temporary iov_iter and passes it to AF_RXRPC each time it is called to describe the remaining buffer to be filled.

Instead:

 (1) Put an iterator in the afs_call struct.

 (2) Set the iterator for each marshalling stage to load data into the appropriate places. A number of convenience functions are provided to this end (eg. afs_extract_to_buf()).

     This iterator is then passed to afs_extract_data().

 (3) Use the new ITER_DISCARD iterator to discard any excess data provided by FetchData.

Signed-off-by: David Howells
---
 fs/afs/cmservice.c         |  40 +++---
 fs/afs/fsclient.c          | 282 +++-
 fs/afs/internal.h          |  56 -
 fs/afs/rxrpc.c             |  41 ++
 fs/afs/vlclient.c          | 104 +++-
 include/trace/events/afs.h |  22 ++-
 6 files changed, 236 insertions(+), 309 deletions(-)

diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c index 58f79301a716..4db62ae8dc1a 100644 --- a/fs/afs/cmservice.c +++ b/fs/afs/cmservice.c @@ -176,13 +176,13 @@ static int afs_deliver_cb_callback(struct afs_call *call) switch (call->unmarshall) { case 0: - call->offset = 0; + afs_extract_to_tmp(call); call->unmarshall++; /* extract the FID array and its count in two steps */ case 1: _debug("extract FID count"); - ret = afs_extract_data(call, >tmp, 4, true); + ret = afs_extract_data(call, true); if (ret < 0) return ret; @@ -196,13 +196,12 @@ static int afs_deliver_cb_callback(struct afs_call *call) GFP_KERNEL); if (!call->buffer) return -ENOMEM; - call->offset = 0; + afs_extract_to_buf(call, call->count * 3 * 4); call->unmarshall++; case 2: _debug("extract FID array"); - ret = afs_extract_data(call, call->buffer, - call->count * 3 * 4, true); + ret = afs_extract_data(call, true); if (ret < 0) return ret; @@ -222,13 +221,13 @@ static int afs_deliver_cb_callback(struct afs_call *call) cb->cb.type = AFSCM_CB_UNTYPED; } - call->offset = 0; + afs_extract_to_tmp(call); call->unmarshall++; /* extract the callback array and its count in two steps */ case 3: _debug("extract CB count"); - ret = 
afs_extract_data(call, >tmp, 4, true); + ret = afs_extract_data(call, true); if (ret < 0) return ret; @@ -237,13 +236,12 @@ static int afs_deliver_cb_callback(struct afs_call *call) if (call->count2 != call->count && call->count2 != 0) return afs_protocol_error(call, -EBADMSG, afs_eproto_cb_count); - call->offset = 0; + afs_extract_to_buf(call, call->count2 * 3 * 4); call->unmarshall++; case 4: _debug("extract CB array"); - ret = afs_extract_data(call, call->buffer, - call->count2 * 3 * 4, false); + ret = afs_extract_data(call, false); if (ret < 0) return ret; @@ -256,7 +254,6 @@ static int afs_deliver_cb_callback(struct afs_call *call) cb->cb.type = ntohl(*bp++); } - call->offset = 0; call->unmarshall++; case 5: break; @@ -303,7 +300,8 @@ static int afs_deliver_cb_init_call_back_state(struct afs_call *call) rxrpc_kernel_get_peer(call->net->socket, call->rxcall, ); - ret = afs_extract_data(call, NULL, 0, false); + afs_extract_discard(call, 0); + ret = afs_extract_data(call, false); if (ret < 0) return ret; @@ -332,16 +330,15 @@ static int afs_deliver_cb_init_call_back_state3(struct afs_call *call) switch (call->unmarshall) { case 0: - call->offset = 0; call->buffer = kmalloc_array(11, sizeof(__be32), GFP_KERNEL); if (!call->buffer) return -ENOMEM; + afs_extract_to_buf(call, 11 * sizeof(__be32)); call->unmarshall++; case 1: _debug("extract UUID"); - ret = afs_extract_data(call, call->buffer, - 11 * sizeof(__be32), false); + ret = afs_extract_data(call, false); switch (ret) { case 0: break; case -EAGAIN: return
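The pattern in this patch (set the destination up once per unmarshalling stage, then call the extraction function repeatedly as data trickles in) can be illustrated with a small userspace model. This is a sketch only: `toy_call`, `toy_extract_to_buf()`, `toy_extract_discard()` and `toy_extract_data()` are hypothetical stand-ins, the chunk is passed in explicitly rather than pulled from AF_RXRPC, and the `-EAGAIN` convention is an assumption of the model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a call that carries its own receive iterator. */
struct toy_call {
	char *dst;		/* where the next byte goes; NULL means discard */
	size_t remaining;	/* bytes still wanted for the current stage */
};

/* Point the call's iterator at a buffer for the next unmarshalling stage. */
static void toy_extract_to_buf(struct toy_call *call, void *buf, size_t size)
{
	call->dst = buf;
	call->remaining = size;
}

/* Arrange for the next size bytes to be thrown away. */
static void toy_extract_discard(struct toy_call *call, size_t size)
{
	call->dst = NULL;
	call->remaining = size;
}

/*
 * Feed one received chunk to the call.  Returns 0 once the current stage
 * is complete and -EAGAIN if more data is still wanted.  *used reports how
 * much of the chunk was consumed.
 */
static int toy_extract_data(struct toy_call *call, const char *chunk,
			    size_t len, size_t *used)
{
	size_t n = len < call->remaining ? len : call->remaining;

	if (call->dst) {
		memcpy(call->dst, chunk, n);
		call->dst += n;
	}
	call->remaining -= n;
	*used = n;
	return call->remaining ? -EAGAIN : 0;
}
```

Because the position lives in the call rather than in a temporary set up on every invocation, the extraction function needs neither a buffer pointer nor a size argument, which mirrors the argument reduction visible in the diff.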
[PATCH 01/24] iov_iter: Separate type from direction and use accessor functions
In the iov_iter struct, separate the iterator type from the iterator direction and use accessor functions to access them in most places.

Convert a bunch of places to use switch-statements to access them rather than chains of bitwise-AND statements. This makes it easier to add further iterator types. This can also be more efficient: to implement a switch over small contiguous integers, the compiler can use ~50% fewer compare instructions than it would need bitwise-AND instructions.

Further, cease passing the iterator type into the iterator setup function. The iterator function can set that itself. Only the direction is required.

Signed-off-by: David Howells
---
 block/bio.c                              |   2
 drivers/block/drbd/drbd_main.c           |   2
 drivers/block/drbd/drbd_receiver.c       |   2
 drivers/block/loop.c                     |   9 +-
 drivers/block/nbd.c                      |  12 +-
 drivers/fsi/fsi-sbefifo.c                |   4 -
 drivers/isdn/mISDN/l1oip_core.c          |   3 -
 drivers/misc/vmw_vmci/vmci_queue_pair.c  |   6 +
 drivers/nvme/target/io-cmd-file.c        |   2
 drivers/target/iscsi/iscsi_target_util.c |   6 -
 drivers/target/target_core_file.c        |   6 +
 drivers/usb/usbip/usbip_common.c         |   2
 drivers/xen/pvcalls-back.c               |   8 +-
 fs/9p/vfs_addr.c                         |   4 -
 fs/9p/vfs_dir.c                          |   2
 fs/9p/xattr.c                            |   4 -
 fs/afs/rxrpc.c                           |  15 +--
 fs/block_dev.c                           |   2
 fs/ceph/file.c                           |   7 +
 fs/cifs/connect.c                        |   4 -
 fs/cifs/file.c                           |   4 -
 fs/cifs/misc.c                           |   4 -
 fs/cifs/smb2ops.c                        |   4 -
 fs/cifs/smbdirect.c                      |  17 +++
 fs/cifs/transport.c                      |   8 +-
 fs/direct-io.c                           |   2
 fs/dlm/lowcomms.c                        |   2
 fs/fuse/file.c                           |   2
 fs/iomap.c                               |   2
 fs/nfsd/vfs.c                            |   4 -
 fs/ocfs2/cluster/tcp.c                   |   2
 fs/orangefs/inode.c                      |   2
 fs/splice.c                              |   7 +
 include/linux/uio.h                      |  59 ---
 lib/iov_iter.c                           | 154 ++
 mm/filemap.c                             |   2
 mm/page_io.c                             |   2
 net/9p/client.c                          |   2
 net/9p/trans_virtio.c                    |   2
 net/bluetooth/6lowpan.c                  |   2
 net/bluetooth/a2mp.c                     |   2
 net/bluetooth/smp.c                      |   2
 net/ceph/messenger.c                     |   6 +
 net/netfilter/ipvs/ip_vs_sync.c          |   2
 net/smc/smc_clc.c                        |   4 -
 net/socket.c                             |   6 +
 net/sunrpc/svcsock.c                     |   2
 net/tipc/topsrv.c                        |   2
 net/tls/tls_device.c                     |   4 -
 net/tls/tls_sw.c                         |   4 -
 50 files changed, 235 insertions(+), 184 deletions(-)

diff --git
a/block/bio.c b/block/bio.c index 0093bed81c0e..c55f36bbe12a 100644 --- a/block/bio.c +++ b/block/bio.c @@ -1255,7 +1255,7 @@ struct bio *bio_copy_user_iov(struct request_queue *q, /* * success */ - if (((iter->type & WRITE) && (!map_data || !map_data->null_mapped)) || + if ((iov_iter_rw(iter) == WRITE && (!map_data || !map_data->null_mapped)) || (map_data && map_data->from_user)) { ret = bio_copy_from_iter(bio, iter); if (ret) diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c index ef8212a4b73e..ded9735d44af 100644 --- a/drivers/block/drbd/drbd_main.c +++ b/drivers/block/drbd/drbd_main.c @@ -1856,7 +1856,7 @@ int drbd_send(struct drbd_connection *connection, struct socket *sock, /* THINK if (signal_pending) return ... ? */ - iov_iter_kvec(_iter, WRITE | ITER_KVEC, , 1, size); + iov_iter_kvec(_iter, WRITE, , 1, size); if (sock == connection->data.socket) { rcu_read_lock(); diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c index 75f6b47169e6..fcc70642b004 100644 --- a/drivers/block/drbd/drbd_receiver.c +++ b/drivers/block/drbd/drbd_receiver.c @@ -516,7 +516,7 @@ static int drbd_recv_short(struct socket *sock, void *buf, size_t size, int flag struct msghdr msg = { .msg_flags = (flags ? flags : MSG_WAITALL | MSG_NOSIGNAL) }; - iov_iter_kvec(_iter, READ | ITER_KVEC, , 1, size); + iov_iter_kvec(_iter, READ, , 1, size); return sock_recvmsg(sock, , msg.msg_flags); } diff --git a/drivers/block/loop.c
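The idea of the patch (type and direction held separately, read through accessors, with the setup function choosing the type itself) can be shown in a few lines of userspace C. This is a minimal sketch, not the definitions in uio.h: every `toy_`/`TOY_` name below is hypothetical, and the struct holds only the two fields relevant to the point.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical userspace model of the idea; not the kernel definitions. */
enum toy_type { TOY_IOVEC, TOY_KVEC, TOY_BVEC, TOY_PIPE };
enum toy_dir { TOY_READ = 0, TOY_WRITE = 1 };

struct toy_iter {
	enum toy_type type;	/* what kind of buffer is described */
	enum toy_dir dir;	/* which way the data flows */
};

static inline enum toy_type toy_iter_type(const struct toy_iter *i)
{
	return i->type;
}

static inline enum toy_dir toy_iter_rw(const struct toy_iter *i)
{
	return i->dir;
}

/* The setup function fixes the type itself; callers pass only a direction. */
static void toy_iter_kvec(struct toy_iter *i, enum toy_dir dir)
{
	i->type = TOY_KVEC;
	i->dir = dir;
}

/* Dispatch with a switch over small contiguous integers. */
static const char *toy_iter_name(const struct toy_iter *i)
{
	switch (toy_iter_type(i)) {
	case TOY_IOVEC:	return "iovec";
	case TOY_KVEC:	return "kvec";
	case TOY_BVEC:	return "bvec";
	case TOY_PIPE:	return "pipe";
	}
	return "unknown";
}
```

With this shape a caller can no longer pass a mismatched type to a setup function, and adding a further iterator type means adding one enum value and one case rather than claiming another bit.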
[PATCH 06/24] afs: Improve FS server rotation error handling
Improve the error handling in FS server rotation by:

 (1) Cache the latest useful error value for the fs operation as a whole in struct afs_fs_cursor separately from the error cached in the afs_addr_cursor struct. The one in the address cursor gets clobbered occasionally. Copy over the error to the fs operation only when it's something we'd be interested in passing to userspace.

 (2) Make it so that EDESTADDRREQ is the default that is seen only if no addresses are available to be accessed.

 (3) When calling utility functions, such as checking a volume status or probing a fileserver, don't let a successful result clobber the cached error in the cursor; instead, stash the result in a temporary variable until it has been assessed.

 (4) Don't return ETIMEDOUT or ETIME if a better error, such as ENETUNREACH, is already cached.

 (5) On leaving the rotation loop, turn any remote abort code into a more useful error than ECONNABORTED.

Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells
---
 fs/afs/addr_list.c |   4 +-
 fs/afs/internal.h  |   1 +
 fs/afs/rotate.c    |  95 +---
 3 files changed, 55 insertions(+), 45 deletions(-)

diff --git a/fs/afs/addr_list.c b/fs/afs/addr_list.c index 55a756c60746..7b34fad4f8f5 100644 --- a/fs/afs/addr_list.c +++ b/fs/afs/addr_list.c @@ -318,10 +318,8 @@ bool afs_iterate_addresses(struct afs_addr_cursor *ac) if (ac->index == ac->alist->nr_addrs) ac->index = 0; - if (ac->index == ac->start) { - ac->error = -EDESTADDRREQ; + if (ac->index == ac->start) return false; - } } ac->begun = true; diff --git a/fs/afs/internal.h b/fs/afs/internal.h index 36e9cc74ac11..81936a4d5035 100644 --- a/fs/afs/internal.h +++ b/fs/afs/internal.h @@ -629,6 +629,7 @@ struct afs_fs_cursor { unsigned intcb_break_2; /* cb_break + cb_s_break (2nd vnode) */ unsigned char start; /* Initial index in server list */ unsigned char index; /* Number of servers tried beyond start */ + short error; unsigned short flags; #define
AFS_FS_CURSOR_STOP 0x0001 /* Set to cease iteration */ #define AFS_FS_CURSOR_VBUSY0x0002 /* Set if seen VBUSY */ diff --git a/fs/afs/rotate.c b/fs/afs/rotate.c index 1faef56b12bd..d7cbc3c230ee 100644 --- a/fs/afs/rotate.c +++ b/fs/afs/rotate.c @@ -39,9 +39,10 @@ bool afs_begin_vnode_operation(struct afs_fs_cursor *fc, struct afs_vnode *vnode fc->vnode = vnode; fc->key = key; fc->ac.error = SHRT_MAX; + fc->error = -EDESTADDRREQ; if (mutex_lock_interruptible(>io_lock) < 0) { - fc->ac.error = -EINTR; + fc->error = -EINTR; fc->flags |= AFS_FS_CURSOR_STOP; return false; } @@ -80,7 +81,7 @@ static bool afs_start_fs_iteration(struct afs_fs_cursor *fc, * and have to return an error. */ if (fc->flags & AFS_FS_CURSOR_CUR_ONLY) { - fc->ac.error = -ESTALE; + fc->error = -ESTALE; return false; } @@ -127,7 +128,7 @@ static bool afs_sleep_and_retry(struct afs_fs_cursor *fc) { msleep_interruptible(1000); if (signal_pending(current)) { - fc->ac.error = -ERESTARTSYS; + fc->error = -ERESTARTSYS; return false; } @@ -143,11 +144,12 @@ bool afs_select_fileserver(struct afs_fs_cursor *fc) struct afs_addr_list *alist; struct afs_server *server; struct afs_vnode *vnode = fc->vnode; + int error = fc->ac.error; _enter("%u/%u,%u/%u,%d,%d", fc->index, fc->start, fc->ac.index, fc->ac.start, - fc->ac.error, fc->ac.abort_code); + error, fc->ac.abort_code); if (fc->flags & AFS_FS_CURSOR_STOP) { _leave(" = f [stopped]"); @@ -155,15 +157,16 @@ bool afs_select_fileserver(struct afs_fs_cursor *fc) } /* Evaluate the result of the previous operation, if there was one. */ - switch (fc->ac.error) { + switch (error) { case SHRT_MAX: goto start; case 0: default: /* Success or local failure. Stop. */ + fc->error = error; fc->flags |= AFS_FS_CURSOR_STOP; - _leave(" = f [okay/local %d]", fc->ac.error); + _leave(" = f [okay/local %d]", error); return false; case -ECONNABORTED: @@ -178,7 +181,7 @@ bool afs_select_fileserver(struct afs_fs_cursor *fc)
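Points (2) and (4) of the commit message amount to keeping the most informative error seen so far rather than the most recent one. One way to picture that is a ranking function, as in the sketch below. The ranking itself is an assumption made for illustration and is not taken from rotate.c; `toy_error_rank()` and `toy_merge_error()` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical ranking, for illustration only: a higher number means the
 * error tells userspace more.  The patch itself may order things differently.
 */
static int toy_error_rank(int error)
{
	switch (error) {
	case -EDESTADDRREQ:	return 0;	/* default: no addresses tried */
	case -ETIMEDOUT:
	case -ETIME:		return 1;	/* timed out, cause unknown */
	case -ENETUNREACH:
	case -EHOSTUNREACH:
	case -ECONNREFUSED:	return 2;	/* a specific network failure */
	default:		return 3;	/* anything else */
	}
}

/* Keep whichever of the cached and fresh errors ranks higher. */
static int toy_merge_error(int cached, int fresh)
{
	return toy_error_rank(fresh) >= toy_error_rank(cached) ? fresh : cached;
}
```

Starting the cached value at `-EDESTADDRREQ` means that value survives only if no attempt ever produced anything better, which matches the default described in point (2).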
[PATCH 04/24] afs: Better tracing of protocol errors
Include the site of detection of AFS protocol errors in trace lines to better be able to determine what went wrong. Signed-off-by: David Howells --- fs/afs/cmservice.c |6 ++ fs/afs/fsclient.c | 117 +++- fs/afs/inode.c |2 - fs/afs/internal.h |2 - fs/afs/rxrpc.c |5 +- fs/afs/vlclient.c | 30 --- include/trace/events/afs.h | 54 +--- 7 files changed, 146 insertions(+), 70 deletions(-) diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c index 9e51d6fe7e8f..58f79301a716 100644 --- a/fs/afs/cmservice.c +++ b/fs/afs/cmservice.c @@ -189,7 +189,8 @@ static int afs_deliver_cb_callback(struct afs_call *call) call->count = ntohl(call->tmp); _debug("FID count: %u", call->count); if (call->count > AFSCBMAX) - return afs_protocol_error(call, -EBADMSG); + return afs_protocol_error(call, -EBADMSG, + afs_eproto_cb_fid_count); call->buffer = kmalloc(array3_size(call->count, 3, 4), GFP_KERNEL); @@ -234,7 +235,8 @@ static int afs_deliver_cb_callback(struct afs_call *call) call->count2 = ntohl(call->tmp); _debug("CB count: %u", call->count2); if (call->count2 != call->count && call->count2 != 0) - return afs_protocol_error(call, -EBADMSG); + return afs_protocol_error(call, -EBADMSG, + afs_eproto_cb_count); call->offset = 0; call->unmarshall++; diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c index 50929cb91732..d9a5815945dc 100644 --- a/fs/afs/fsclient.c +++ b/fs/afs/fsclient.c @@ -233,7 +233,7 @@ static int xdr_decode_AFSFetchStatus(struct afs_call *call, bad: xdr_dump_bad(*_bp); - return afs_protocol_error(call, -EBADMSG); + return afs_protocol_error(call, -EBADMSG, afs_eproto_bad_status); } /* @@ -399,9 +399,10 @@ static int afs_deliver_fs_fetch_status_vnode(struct afs_call *call) /* unmarshall the reply once we've received all of it */ bp = call->buffer; - if (afs_decode_status(call, , >status, vnode, - >expected_version, NULL) < 0) - return afs_protocol_error(call, -EBADMSG); + ret = afs_decode_status(call, , >status, vnode, + >expected_version, NULL); + if (ret < 0) + return ret; 
xdr_decode_AFSCallBack(call, vnode, ); if (call->reply[1]) xdr_decode_AFSVolSync(, call->reply[1]); @@ -580,9 +581,10 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call) return ret; bp = call->buffer; - if (afs_decode_status(call, , >status, vnode, - >status.data_version, req) < 0) - return afs_protocol_error(call, -EBADMSG); + ret = afs_decode_status(call, , >status, vnode, + >status.data_version, req); + if (ret < 0) + return ret; xdr_decode_AFSCallBack(call, vnode, ); if (call->reply[1]) xdr_decode_AFSVolSync(, call->reply[1]); @@ -733,10 +735,13 @@ static int afs_deliver_fs_create_vnode(struct afs_call *call) /* unmarshall the reply once we've received all of it */ bp = call->buffer; xdr_decode_AFSFid(, call->reply[1]); - if (afs_decode_status(call, , call->reply[2], NULL, NULL, NULL) < 0 || - afs_decode_status(call, , >status, vnode, - >expected_version, NULL) < 0) - return afs_protocol_error(call, -EBADMSG); + ret = afs_decode_status(call, , call->reply[2], NULL, NULL, NULL); + if (ret < 0) + return ret; + ret = afs_decode_status(call, , >status, vnode, + >expected_version, NULL); + if (ret < 0) + return ret; xdr_decode_AFSCallBack_raw(, call->reply[3]); /* xdr_decode_AFSVolSync(, call->reply[X]); */ @@ -839,9 +844,10 @@ static int afs_deliver_fs_remove(struct afs_call *call) /* unmarshall the reply once we've received all of it */ bp = call->buffer; - if (afs_decode_status(call, , >status, vnode, - >expected_version, NULL) < 0) - return afs_protocol_error(call, -EBADMSG); + ret = afs_decode_status(call, , >status, vnode, + >expected_version, NULL); + if (ret < 0) + return ret; /* xdr_decode_AFSVolSync(, call->reply[X]); */ _leave(" = 0 [done]"); @@ -929,10
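The change visible in the diff is that each detection site passes a cause code alongside the error, so the trace line can say where the problem was noticed. A userspace approximation of that shape follows. It is only a sketch: the `toy_` names are hypothetical, `fprintf()` stands in for the kernel tracepoint, and the three cause values merely echo the style of the `afs_eproto_*` constants seen in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical cause codes; the patch defines its own afs_eproto_* set. */
enum toy_eproto_cause {
	toy_eproto_none,
	toy_eproto_bad_status,
	toy_eproto_cb_count,
};

static const char *const toy_eproto_names[] = {
	[toy_eproto_none]	= "none",
	[toy_eproto_bad_status]	= "bad-status",
	[toy_eproto_cb_count]	= "cb-count",
};

struct toy_call {
	unsigned int debug_id;
	enum toy_eproto_cause last_cause;
};

/* Record where the error was detected, emit a trace line, pass error on. */
static int toy_protocol_error(struct toy_call *call, int error,
			      enum toy_eproto_cause cause)
{
	call->last_cause = cause;
	fprintf(stderr, "call %u: protocol error %d at %s\n",
		call->debug_id, error, toy_eproto_names[cause]);
	return error;
}
```

Having the helper return the error lets a detection site stay a one-liner of the form `return toy_protocol_error(call, -EBADMSG, cause);`, which is the calling style the diff uses.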
[PATCH 02/24] iov_iter: Renumber the ITER_* constants in uio.h
Renumber the ITER_* constants in uio.h to be contiguous to make comparing them more efficient in a switch-statement. Signed-off-by: David Howells --- include/linux/uio.h |8 +++-- lib/iov_iter.c | 77 ++- 2 files changed, 62 insertions(+), 23 deletions(-) diff --git a/include/linux/uio.h b/include/linux/uio.h index 73d2cdf4..f445d5cbb571 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -22,10 +22,10 @@ struct kvec { }; enum iter_type { - ITER_IOVEC = 0, - ITER_KVEC = 2, - ITER_BVEC = 4, - ITER_PIPE = 8, + ITER_IOVEC, + ITER_KVEC, + ITER_BVEC, + ITER_PIPE, }; struct iov_iter { diff --git a/lib/iov_iter.c b/lib/iov_iter.c index bd828591afb0..3d8c459e7cd8 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -75,18 +75,28 @@ #define iterate_all_kinds(i, n, v, I, B, K) { \ if (likely(n)) {\ size_t skip = i->iov_offset;\ - if (unlikely(i->iter_type & ITER_BVEC)) { \ + switch (iov_iter_type(i)) { \ + case ITER_BVEC: { \ struct bio_vec v; \ struct bvec_iter __bi; \ - iterate_bvec(i, n, v, __bi, skip, (B)) \ - } else if (unlikely(i->iter_type & ITER_KVEC)) { \ + iterate_bvec(i, n, v, __bi, skip, (B)); \ + break; \ + } \ + case ITER_KVEC: { \ const struct kvec *kvec;\ struct kvec v; \ - iterate_kvec(i, n, v, kvec, skip, (K)) \ - } else {\ + iterate_kvec(i, n, v, kvec, skip, (K)); \ + break; \ + } \ + case ITER_PIPE: { \ + break; \ + } \ + case ITER_IOVEC: { \ const struct iovec *iov;\ struct iovec v; \ - iterate_iovec(i, n, v, iov, skip, (I)) \ + iterate_iovec(i, n, v, iov, skip, (I)); \ + break; \ + } \ } \ } \ } @@ -96,7 +106,8 @@ n = i->count; \ if (i->count) { \ size_t skip = i->iov_offset;\ - if (unlikely(i->iter_type & ITER_BVEC)) { \ + switch (iov_iter_type(i)) { \ + case ITER_BVEC: { \ const struct bio_vec *bvec = i->bvec; \ struct bio_vec v; \ struct bvec_iter __bi; \ @@ -104,7 +115,9 @@ i->bvec = __bvec_iter_bvec(i->bvec, __bi); \ i->nr_segs -= i->bvec - bvec; \ skip = __bi.bi_bvec_done; \ - } else if (unlikely(i->iter_type & ITER_KVEC)) {\ + break; \ + } 
\ + case ITER_KVEC: { \ const struct kvec *kvec;\ struct kvec v; \ iterate_kvec(i, n, v, kvec, skip, (K)) \ @@ -114,7 +127,9 @@ } \ i->nr_segs -= kvec - i->kvec; \ i->kvec = kvec; \ - } else {\ + break; \ + } \ + case ITER_IOVEC: { \ const struct iovec *iov;\ struct iovec v; \
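A minimal user-space sketch of the idea behind this patch, not the kernel code itself: once the type constants are contiguous small integers, a plain switch can dispatch on them, and the compiler is free to use a jump table or a short compare tree. The names prefixed `sk_`/`SK_` are hypothetical stand-ins, not identifiers from uio.h.

```c
#include <stddef.h>

/* Hypothetical stand-in for the renumbered enum: values are 0, 1, 2, 3. */
enum sk_iter_kind {
	SK_ITER_IOVEC,
	SK_ITER_KVEC,
	SK_ITER_BVEC,
	SK_ITER_PIPE,
};

/*
 * Dispatch on the iterator kind with a switch rather than a chain of
 * "if (type & FLAG)" tests. Every kind gets an explicit case, including
 * the one that used to be the implicit "else" branch.
 */
static const char *sk_iter_name(enum sk_iter_kind kind)
{
	switch (kind) {
	case SK_ITER_IOVEC:
		return "iovec";
	case SK_ITER_KVEC:
		return "kvec";
	case SK_ITER_BVEC:
		return "bvec";
	case SK_ITER_PIPE:
		return "pipe";
	}
	return "unknown";	/* not reached for valid enum values */
}
```

With the old bit-valued constants, ITER_IOVEC was 0 and so could only ever be the fall-through branch; with contiguous values each type can be named in its own case.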
[PATCH 00/24] AFS development
Hi Al, Here's a set of development patches for AFS if you could pull it for the upcoming merge window. Its main features are: (1) Provide wrapper functions for accessing iov iterators, renumber the iterator types to be more amenable to switching on and provide a new read discard iterator type (ITER_DISCARD). (2) Use iov iterators more directly in AFS unmarshalling routines. (3) Support for retrieving DNS information where the VL server address list is partitioned by server. (4) Implement VL server rotation and improve both this and FS server rotation. (5) Add support for the YFS variant of the AFS server. (6) When first attempting to use a server or a list of servers, plumb all the addresses simultaneously to try and determine the best route. The patches are tagged here: git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git afs-next-20181020 and can also be found on the following branch: http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=afs-next David --- David Howells (24): iov_iter: Separate type from direction and use accessor functions iov_iter: Renumber the ITER_* constants in uio.h iov_iter: Add I/O discard iterator afs: Better tracing of protocol errors afs: Set up the iov_iter before calling afs_extract_data() afs: Improve FS server rotation error handling afs: Implement VL server rotation afs: Fix TTL on VL server and address lists afs: Handle EIO from delivery function afs: Add a couple of tracepoints to log I/O errors afs: Don't invoke the server to read data beyond EOF afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS afs: Commit the status on a new file/dir/symlink afs: Remove callback details from afs_callback_break struct afs: Implement the YFS cache manager service afs: Fix FS.FetchStatus delivery from updating wrong vnode afs: Calc callback expiry in op reply delivery afs: Get the target vnode in afs_rmdir() and get a callback on it afs: Expand data structure fields to support YFS afs: Implement YFS 
support in the fs client afs: Allow dumping of server cursor on operation failure afs: Eliminate the address pointer from the address list cursor afs: Fix callback handling afs: Probe multiple fileservers simultaneously block/bio.c |2 drivers/block/drbd/drbd_main.c |2 drivers/block/drbd/drbd_receiver.c |2 drivers/block/loop.c |9 drivers/block/nbd.c | 12 drivers/fsi/fsi-sbefifo.c|4 drivers/isdn/mISDN/l1oip_core.c |3 drivers/misc/vmw_vmci/vmci_queue_pair.c |6 drivers/nvme/target/io-cmd-file.c|2 drivers/target/iscsi/iscsi_target_util.c |6 drivers/target/target_core_file.c|6 drivers/usb/usbip/usbip_common.c |2 drivers/xen/pvcalls-back.c |8 fs/9p/vfs_addr.c |4 fs/9p/vfs_dir.c |2 fs/9p/xattr.c|4 fs/afs/Kconfig | 12 fs/afs/Makefile |7 fs/afs/addr_list.c | 209 ++- fs/afs/afs.h | 50 - fs/afs/cache.c |2 fs/afs/callback.c| 17 fs/afs/cell.c| 65 + fs/afs/cmservice.c | 287 +++- fs/afs/dir.c | 75 + fs/afs/dynroot.c |4 fs/afs/file.c|8 fs/afs/flock.c | 22 fs/afs/fs_probe.c| 270 fs/afs/fsclient.c| 583 fs/afs/inode.c | 37 - fs/afs/internal.h| 322 fs/afs/mntpt.c |5 fs/afs/proc.c| 110 +- fs/afs/protocol_yfs.h| 163 ++ fs/afs/rotate.c | 302 +++- fs/afs/rxrpc.c | 115 +- fs/afs/security.c| 13 fs/afs/server.c | 145 -- fs/afs/server_list.c |6 fs/afs/super.c |5 fs/afs/vl_list.c | 340 + fs/afs/vl_probe.c| 273 fs/afs/vl_rotate.c | 355 + fs/afs/vlclient.c| 195 +-- fs/afs/volume.c | 56 - fs/afs/write.c | 30 fs/afs/xattr.c |2 fs/afs/yfsclient.c | 2184 ++ fs/block_dev.c
[PATCH 04/24] afs: Better tracing of protocol errors
Include the site of detection of AFS protocol errors in trace lines to better be able to determine what went wrong. Signed-off-by: David Howells --- fs/afs/cmservice.c |6 ++ fs/afs/fsclient.c | 117 +++- fs/afs/inode.c |2 - fs/afs/internal.h |2 - fs/afs/rxrpc.c |5 +- fs/afs/vlclient.c | 30 --- include/trace/events/afs.h | 54 +--- 7 files changed, 146 insertions(+), 70 deletions(-) diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c index 9e51d6fe7e8f..58f79301a716 100644 --- a/fs/afs/cmservice.c +++ b/fs/afs/cmservice.c @@ -189,7 +189,8 @@ static int afs_deliver_cb_callback(struct afs_call *call) call->count = ntohl(call->tmp); _debug("FID count: %u", call->count); if (call->count > AFSCBMAX) - return afs_protocol_error(call, -EBADMSG); + return afs_protocol_error(call, -EBADMSG, + afs_eproto_cb_fid_count); call->buffer = kmalloc(array3_size(call->count, 3, 4), GFP_KERNEL); @@ -234,7 +235,8 @@ static int afs_deliver_cb_callback(struct afs_call *call) call->count2 = ntohl(call->tmp); _debug("CB count: %u", call->count2); if (call->count2 != call->count && call->count2 != 0) - return afs_protocol_error(call, -EBADMSG); + return afs_protocol_error(call, -EBADMSG, + afs_eproto_cb_count); call->offset = 0; call->unmarshall++; diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c index 50929cb91732..d9a5815945dc 100644 --- a/fs/afs/fsclient.c +++ b/fs/afs/fsclient.c @@ -233,7 +233,7 @@ static int xdr_decode_AFSFetchStatus(struct afs_call *call, bad: xdr_dump_bad(*_bp); - return afs_protocol_error(call, -EBADMSG); + return afs_protocol_error(call, -EBADMSG, afs_eproto_bad_status); } /* @@ -399,9 +399,10 @@ static int afs_deliver_fs_fetch_status_vnode(struct afs_call *call) /* unmarshall the reply once we've received all of it */ bp = call->buffer; - if (afs_decode_status(call, , >status, vnode, - >expected_version, NULL) < 0) - return afs_protocol_error(call, -EBADMSG); + ret = afs_decode_status(call, , >status, vnode, + >expected_version, NULL); + if (ret < 0) + return ret; 
xdr_decode_AFSCallBack(call, vnode, ); if (call->reply[1]) xdr_decode_AFSVolSync(, call->reply[1]); @@ -580,9 +581,10 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call) return ret; bp = call->buffer; - if (afs_decode_status(call, , >status, vnode, - >status.data_version, req) < 0) - return afs_protocol_error(call, -EBADMSG); + ret = afs_decode_status(call, , >status, vnode, + >status.data_version, req); + if (ret < 0) + return ret; xdr_decode_AFSCallBack(call, vnode, ); if (call->reply[1]) xdr_decode_AFSVolSync(, call->reply[1]); @@ -733,10 +735,13 @@ static int afs_deliver_fs_create_vnode(struct afs_call *call) /* unmarshall the reply once we've received all of it */ bp = call->buffer; xdr_decode_AFSFid(, call->reply[1]); - if (afs_decode_status(call, , call->reply[2], NULL, NULL, NULL) < 0 || - afs_decode_status(call, , >status, vnode, - >expected_version, NULL) < 0) - return afs_protocol_error(call, -EBADMSG); + ret = afs_decode_status(call, , call->reply[2], NULL, NULL, NULL); + if (ret < 0) + return ret; + ret = afs_decode_status(call, , >status, vnode, + >expected_version, NULL); + if (ret < 0) + return ret; xdr_decode_AFSCallBack_raw(, call->reply[3]); /* xdr_decode_AFSVolSync(, call->reply[X]); */ @@ -839,9 +844,10 @@ static int afs_deliver_fs_remove(struct afs_call *call) /* unmarshall the reply once we've received all of it */ bp = call->buffer; - if (afs_decode_status(call, , >status, vnode, - >expected_version, NULL) < 0) - return afs_protocol_error(call, -EBADMSG); + ret = afs_decode_status(call, , >status, vnode, + >expected_version, NULL); + if (ret < 0) + return ret; /* xdr_decode_AFSVolSync(, call->reply[X]); */ _leave(" = 0 [done]"); @@ -929,10
[PATCH 01/24] iov_iter: Separate type from direction and use accessor functions
In the iov_iter struct, separate the iterator type from the iterator direction and use accessor functions to access them in most places. Convert a bunch of places to use switch-statements to access them rather than chains of bitwise-AND statements. This makes it easier to add further iterator types. Also, this can be more efficient as to implement a switch of small contiguous integers, the compiler can use ~50% fewer compare instructions than it has to use bitwise-and instructions. Further, cease passing the iterator type into the iterator setup function. The iterator function can set that itself. Only the direction is required. Signed-off-by: David Howells --- block/bio.c |2 drivers/block/drbd/drbd_main.c |2 drivers/block/drbd/drbd_receiver.c |2 drivers/block/loop.c |9 +- drivers/block/nbd.c | 12 +- drivers/fsi/fsi-sbefifo.c|4 - drivers/isdn/mISDN/l1oip_core.c |3 - drivers/misc/vmw_vmci/vmci_queue_pair.c |6 + drivers/nvme/target/io-cmd-file.c|2 drivers/target/iscsi/iscsi_target_util.c |6 - drivers/target/target_core_file.c|6 + drivers/usb/usbip/usbip_common.c |2 drivers/xen/pvcalls-back.c |8 +- fs/9p/vfs_addr.c |4 - fs/9p/vfs_dir.c |2 fs/9p/xattr.c|4 - fs/afs/rxrpc.c | 15 +-- fs/block_dev.c |2 fs/ceph/file.c |7 + fs/cifs/connect.c|4 - fs/cifs/file.c |4 - fs/cifs/misc.c |4 - fs/cifs/smb2ops.c|4 - fs/cifs/smbdirect.c | 17 +++ fs/cifs/transport.c |8 +- fs/direct-io.c |2 fs/dlm/lowcomms.c|2 fs/fuse/file.c |2 fs/iomap.c |2 fs/nfsd/vfs.c|4 - fs/ocfs2/cluster/tcp.c |2 fs/orangefs/inode.c |2 fs/splice.c |7 + include/linux/uio.h | 59 --- lib/iov_iter.c | 154 ++ mm/filemap.c |2 mm/page_io.c |2 net/9p/client.c |2 net/9p/trans_virtio.c|2 net/bluetooth/6lowpan.c |2 net/bluetooth/a2mp.c |2 net/bluetooth/smp.c |2 net/ceph/messenger.c |6 + net/netfilter/ipvs/ip_vs_sync.c |2 net/smc/smc_clc.c|4 - net/socket.c |6 + net/sunrpc/svcsock.c |2 net/tipc/topsrv.c|2 net/tls/tls_device.c |4 - net/tls/tls_sw.c |4 - 50 files changed, 235 insertions(+), 184 deletions(-) diff --git 
a/block/bio.c b/block/bio.c index 0093bed81c0e..c55f36bbe12a 100644 --- a/block/bio.c +++ b/block/bio.c @@ -1255,7 +1255,7 @@ struct bio *bio_copy_user_iov(struct request_queue *q, /* * success */ - if (((iter->type & WRITE) && (!map_data || !map_data->null_mapped)) || + if ((iov_iter_rw(iter) == WRITE && (!map_data || !map_data->null_mapped)) || (map_data && map_data->from_user)) { ret = bio_copy_from_iter(bio, iter); if (ret) diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c index ef8212a4b73e..ded9735d44af 100644 --- a/drivers/block/drbd/drbd_main.c +++ b/drivers/block/drbd/drbd_main.c @@ -1856,7 +1856,7 @@ int drbd_send(struct drbd_connection *connection, struct socket *sock, /* THINK if (signal_pending) return ... ? */ - iov_iter_kvec(_iter, WRITE | ITER_KVEC, , 1, size); + iov_iter_kvec(_iter, WRITE, , 1, size); if (sock == connection->data.socket) { rcu_read_lock(); diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c index 75f6b47169e6..fcc70642b004 100644 --- a/drivers/block/drbd/drbd_receiver.c +++ b/drivers/block/drbd/drbd_receiver.c @@ -516,7 +516,7 @@ static int drbd_recv_short(struct socket *sock, void *buf, size_t size, int flag struct msghdr msg = { .msg_flags = (flags ? flags : MSG_WAITALL | MSG_NOSIGNAL) }; - iov_iter_kvec(_iter, READ | ITER_KVEC, , 1, size); + iov_iter_kvec(_iter, READ, , 1, size); return sock_recvmsg(sock, , msg.msg_flags); } diff --git a/drivers/block/loop.c
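A rough sketch of the separation described in the commit message above, under the assumption that type and direction live in two distinct fields; the real struct iov_iter layout is not shown in this excerpt and may differ. All `sk_`/`SK_` names are hypothetical and exist only for illustration.

```c
/* Hypothetical direction values, standing in for READ and WRITE. */
#define SK_READ  0u
#define SK_WRITE 1u

/* Hypothetical iterator kinds, numbered contiguously. */
enum sk_kind { SK_IOVEC, SK_KVEC, SK_BVEC, SK_PIPE };

struct sk_iter {
	enum sk_kind kind;	/* what sort of segment list is walked */
	unsigned int dir;	/* SK_READ or SK_WRITE */
};

/* Accessors: callers no longer mask bits out of a combined field. */
static inline enum sk_kind sk_iter_kind(const struct sk_iter *i)
{
	return i->kind;
}

static inline unsigned int sk_iter_rw(const struct sk_iter *i)
{
	return i->dir;
}

/*
 * Setup takes only the direction. The function knows it is building a
 * kvec iterator, so the caller does not pass the kind in as well.
 */
static inline void sk_iter_kvec_init(struct sk_iter *i, unsigned int dir)
{
	i->kind = SK_KVEC;
	i->dir = dir;
}
```

This mirrors the call-site change visible in the diff, where `WRITE | ITER_KVEC` becomes just `WRITE` and a test such as `iter->type & WRITE` becomes `iov_iter_rw(iter) == WRITE`.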
[PATCH 06/24] afs: Improve FS server rotation error handling
Improve the error handling in FS server rotation by: (1) Cache the latest useful error value for the fs operation as a whole in struct afs_fs_cursor separately from the error cached in the afs_addr_cursor struct. The one in the address cursor gets clobbered occasionally. Copy over the error to the fs operation only when it's something we'd be interested in passing to userspace. (2) Make it so that EDESTADDRREQ is the default that is seen only if no addresses are available to be accessed. (3) When calling utility functions, such as checking a volume status or probing a fileserver, don't let a successful result clobber the cached error in the cursor; instead, stash the result in a temporary variable until it has been assessed. (4) Don't return ETIMEDOUT or ETIME if a better error, such as ENETUNREACH, is already cached. (5) On leaving the rotation loop, turn any remote abort code into a more useful error than ECONNABORTED. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: David Howells --- fs/afs/addr_list.c |4 +- fs/afs/internal.h |1 + fs/afs/rotate.c| 95 +--- 3 files changed, 55 insertions(+), 45 deletions(-) diff --git a/fs/afs/addr_list.c b/fs/afs/addr_list.c index 55a756c60746..7b34fad4f8f5 100644 --- a/fs/afs/addr_list.c +++ b/fs/afs/addr_list.c @@ -318,10 +318,8 @@ bool afs_iterate_addresses(struct afs_addr_cursor *ac) if (ac->index == ac->alist->nr_addrs) ac->index = 0; - if (ac->index == ac->start) { - ac->error = -EDESTADDRREQ; + if (ac->index == ac->start) return false; - } } ac->begun = true; diff --git a/fs/afs/internal.h b/fs/afs/internal.h index 36e9cc74ac11..81936a4d5035 100644 --- a/fs/afs/internal.h +++ b/fs/afs/internal.h @@ -629,6 +629,7 @@ struct afs_fs_cursor { unsigned intcb_break_2; /* cb_break + cb_s_break (2nd vnode) */ unsigned char start; /* Initial index in server list */ unsigned char index; /* Number of servers tried beyond start */ + short error; unsigned short flags; #define 
AFS_FS_CURSOR_STOP 0x0001 /* Set to cease iteration */ #define AFS_FS_CURSOR_VBUSY0x0002 /* Set if seen VBUSY */ diff --git a/fs/afs/rotate.c b/fs/afs/rotate.c index 1faef56b12bd..d7cbc3c230ee 100644 --- a/fs/afs/rotate.c +++ b/fs/afs/rotate.c @@ -39,9 +39,10 @@ bool afs_begin_vnode_operation(struct afs_fs_cursor *fc, struct afs_vnode *vnode fc->vnode = vnode; fc->key = key; fc->ac.error = SHRT_MAX; + fc->error = -EDESTADDRREQ; if (mutex_lock_interruptible(>io_lock) < 0) { - fc->ac.error = -EINTR; + fc->error = -EINTR; fc->flags |= AFS_FS_CURSOR_STOP; return false; } @@ -80,7 +81,7 @@ static bool afs_start_fs_iteration(struct afs_fs_cursor *fc, * and have to return an error. */ if (fc->flags & AFS_FS_CURSOR_CUR_ONLY) { - fc->ac.error = -ESTALE; + fc->error = -ESTALE; return false; } @@ -127,7 +128,7 @@ static bool afs_sleep_and_retry(struct afs_fs_cursor *fc) { msleep_interruptible(1000); if (signal_pending(current)) { - fc->ac.error = -ERESTARTSYS; + fc->error = -ERESTARTSYS; return false; } @@ -143,11 +144,12 @@ bool afs_select_fileserver(struct afs_fs_cursor *fc) struct afs_addr_list *alist; struct afs_server *server; struct afs_vnode *vnode = fc->vnode; + int error = fc->ac.error; _enter("%u/%u,%u/%u,%d,%d", fc->index, fc->start, fc->ac.index, fc->ac.start, - fc->ac.error, fc->ac.abort_code); + error, fc->ac.abort_code); if (fc->flags & AFS_FS_CURSOR_STOP) { _leave(" = f [stopped]"); @@ -155,15 +157,16 @@ bool afs_select_fileserver(struct afs_fs_cursor *fc) } /* Evaluate the result of the previous operation, if there was one. */ - switch (fc->ac.error) { + switch (error) { case SHRT_MAX: goto start; case 0: default: /* Success or local failure. Stop. */ + fc->error = error; fc->flags |= AFS_FS_CURSOR_STOP; - _leave(" = f [okay/local %d]", fc->ac.error); + _leave(" = f [okay/local %d]", error); return false; case -ECONNABORTED: @@ -178,7 +181,7 @@ bool afs_select_fileserver(struct afs_fs_cursor *fc)
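Points (2) and (4) of the commit message can be illustrated with a small user-space sketch. It assumes a cursor that holds one cached error, initialised to -EDESTADDRREQ, and a helper that refuses to overwrite a more informative error with a timeout. The `sk_` names are hypothetical; this is not the code in fs/afs/rotate.c.

```c
#include <errno.h>

/* Hypothetical cursor holding the best error seen so far. */
struct sk_cursor {
	int error;
};

/* Default: reported only if no address was ever usable. */
static void sk_cursor_init(struct sk_cursor *c)
{
	c->error = -EDESTADDRREQ;
}

/* Record the result of one attempt against one address. */
static void sk_note_error(struct sk_cursor *c, int err)
{
	switch (err) {
	case -ETIMEDOUT:
	case -ETIME:
		/* A timeout only replaces the default, never a better error. */
		if (c->error == -EDESTADDRREQ)
			c->error = err;
		break;
	default:
		c->error = err;
		break;
	}
}
```

For example, after noting -ENETUNREACH and then -ETIMEDOUT, the cursor still holds -ENETUNREACH, which is the more useful value to hand back to userspace.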
Re: [PATCH v7 13/21] tpm: add tpm_auto_startup() into tpm-interface.c
On Fri, 19 Oct 2018, Tomas Winkler wrote: Add wrapper tpm_auto_startup() to tpm-interface.c instead of open coded decision between TPM 1.x and TPM 2.x in tpm-chip.c Signed-off-by: Tomas Winkler Tested-by: Jarkko Sakkinen --- V3: New in the series. V4: Fix the commit message. V5-7: Resend. drivers/char/tpm/tpm-chip.c | 11 +++ drivers/char/tpm/tpm-interface.c | 15 +++ drivers/char/tpm/tpm.h | 1 + 3 files changed, 19 insertions(+), 8 deletions(-) diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c index 46caadca916a..32db84683c40 100644 --- a/drivers/char/tpm/tpm-chip.c +++ b/drivers/char/tpm/tpm-chip.c @@ -451,14 +451,9 @@ int tpm_chip_register(struct tpm_chip *chip) { int rc; - if (chip->ops->flags & TPM_OPS_AUTO_STARTUP) { - if (chip->flags & TPM_CHIP_FLAG_TPM2) - rc = tpm2_auto_startup(chip); - else - rc = tpm1_auto_startup(chip); - if (rc) - return rc; - } + rc = tpm_auto_startup(chip); + if (rc) + return rc; tpm_sysfs_add_device(chip); diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c index 54b81700561b..69e007a198ce 100644 --- a/drivers/char/tpm/tpm-interface.c +++ b/drivers/char/tpm/tpm-interface.c @@ -545,6 +545,21 @@ int tpm_send(struct tpm_chip *chip, void *cmd, size_t buflen) } EXPORT_SYMBOL_GPL(tpm_send); +int tpm_auto_startup(struct tpm_chip *chip) +{ + int rc; + + if (!(chip->ops->flags & TPM_OPS_AUTO_STARTUP)) + return 0; + + if (chip->flags & TPM_CHIP_FLAG_TPM2) + rc = tpm2_auto_startup(chip); + else + rc = tpm1_auto_startup(chip); + + return rc; +} + /* * We are about to suspend. Save the TPM state * so that it can be restored. 
diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h index 2eb73f6966c3..daca1d0190b1 100644 --- a/drivers/char/tpm/tpm.h +++ b/drivers/char/tpm/tpm.h @@ -541,6 +541,7 @@ ssize_t tpm_transmit_cmd(struct tpm_chip *chip, struct tpm_space *space, size_t min_rsp_body_length, unsigned int flags, const char *desc); int tpm_get_timeouts(struct tpm_chip *); +int tpm_auto_startup(struct tpm_chip *chip); int tpm1_pm_suspend(struct tpm_chip *chip, int tpm_suspend_pcr); int tpm1_auto_startup(struct tpm_chip *chip); -- 2.14.4 Reviewed-by: Jarkko Sakkinen /Jarkko
Re: [PATCH v7 12/21] tpm: factor out tpm_startup function
On Fri, 19 Oct 2018, Tomas Winkler wrote: TPM manual startup is used only from within TPM 1.x or TPM 2.x code, hence remove the tpm_startup() function from tpm-interface.c and add two static function implementations, tpm1_startup() and tpm2_startup(), to tpm1-cmd.c and tpm2-cmd.c respectively. Signed-off-by: Tomas Winkler Tested-by: Jarkko Sakkinen --- V2-V2: Resend. V4: Fix the commit message. V5: 1. A small fix in the kdoc. 2. Fixed Jarkko's name in Tested-by. V6: Rebase. V7: Resend. drivers/char/tpm/tpm-interface.c | 41 drivers/char/tpm/tpm.h | 1 - drivers/char/tpm/tpm1-cmd.c | 37 +++- drivers/char/tpm/tpm2-cmd.c | 34 +++-- 4 files changed, 68 insertions(+), 45 deletions(-) diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c index e7f220f691f9..54b81700561b 100644 --- a/drivers/char/tpm/tpm-interface.c +++ b/drivers/char/tpm/tpm-interface.c @@ -414,47 +414,6 @@ ssize_t tpm_transmit_cmd(struct tpm_chip *chip, struct tpm_space *space, } EXPORT_SYMBOL_GPL(tpm_transmit_cmd); -#define TPM_ORD_STARTUP 153 -#define TPM_ST_CLEAR 1 - -/** - * tpm_startup - turn on the TPM - * @chip: TPM chip to use - * - * Normally the firmware should start the TPM. This function is provided as a - * workaround if this does not happen. A legal case for this could be for - * example when a TPM emulator is used. 
- * - * Return: same as tpm_transmit_cmd() - */ -int tpm_startup(struct tpm_chip *chip) -{ - struct tpm_buf buf; - int rc; - - dev_info(&chip->dev, "starting up the TPM manually\n"); - - if (chip->flags & TPM_CHIP_FLAG_TPM2) { - rc = tpm_buf_init(&buf, TPM2_ST_NO_SESSIONS, TPM2_CC_STARTUP); - if (rc < 0) - return rc; - - tpm_buf_append_u16(&buf, TPM2_SU_CLEAR); - } else { - rc = tpm_buf_init(&buf, TPM_TAG_RQU_COMMAND, TPM_ORD_STARTUP); - if (rc < 0) - return rc; - - tpm_buf_append_u16(&buf, TPM_ST_CLEAR); - } - - rc = tpm_transmit_cmd(chip, NULL, buf.data, PAGE_SIZE, 0, 0, - "attempting to start the TPM"); - - tpm_buf_destroy(&buf); - return rc; -} - int tpm_get_timeouts(struct tpm_chip *chip) { if (chip->flags & TPM_CHIP_FLAG_HAVE_TIMEOUTS) diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h index 754f7bcb15fa..2eb73f6966c3 100644 --- a/drivers/char/tpm/tpm.h +++ b/drivers/char/tpm/tpm.h @@ -540,7 +540,6 @@ ssize_t tpm_transmit_cmd(struct tpm_chip *chip, struct tpm_space *space, void *buf, size_t bufsiz, size_t min_rsp_body_length, unsigned int flags, const char *desc); -int tpm_startup(struct tpm_chip *chip); int tpm_get_timeouts(struct tpm_chip *); int tpm1_pm_suspend(struct tpm_chip *chip, int tpm_suspend_pcr); diff --git a/drivers/char/tpm/tpm1-cmd.c b/drivers/char/tpm/tpm1-cmd.c index 3bd9f1fa77ce..8a84db315676 100644 --- a/drivers/char/tpm/tpm1-cmd.c +++ b/drivers/char/tpm/tpm1-cmd.c @@ -308,6 +308,40 @@ unsigned long tpm1_calc_ordinal_duration(struct tpm_chip *chip, u32 ordinal) return duration; } +#define TPM_ORD_STARTUP 153 +#define TPM_ST_CLEAR 1 + +/** + * tpm_startup() - turn on the TPM + * @chip: TPM chip to use + * + * Normally the firmware should start the TPM. This function is provided as a + * workaround if this does not happen. A legal case for this could be for + * example when a TPM emulator is used. 
+ * + * Return: same as tpm_transmit_cmd() + */ +static int tpm1_startup(struct tpm_chip *chip) +{ + struct tpm_buf buf; + int rc; + + dev_info(&chip->dev, "starting up the TPM manually\n"); + + rc = tpm_buf_init(&buf, TPM_TAG_RQU_COMMAND, TPM_ORD_STARTUP); + if (rc < 0) + return rc; + + tpm_buf_append_u16(&buf, TPM_ST_CLEAR); + + rc = tpm_transmit_cmd(chip, NULL, buf.data, PAGE_SIZE, 0, 0, + "attempting to start the TPM"); + + tpm_buf_destroy(&buf); + + return rc; +} + int tpm1_get_timeouts(struct tpm_chip *chip) { cap_t cap; @@ -317,7 +351,7 @@ int tpm1_get_timeouts(struct tpm_chip *chip) rc = tpm1_getcap(chip, TPM_CAP_PROP_TIS_TIMEOUT, &cap, NULL, sizeof(cap.timeout)); if (rc == TPM_ERR_INVALID_POSTINIT) { - if (tpm_startup(chip)) + if (tpm1_startup(chip)) return rc; rc = tpm1_getcap(chip, TPM_CAP_PROP_TIS_TIMEOUT, &cap, @@ -727,3 +761,4 @@ int tpm1_pm_suspend(struct tpm_chip *chip, int tpm_suspend_pcr) return rc; } + diff --git a/drivers/char/tpm/tpm2-cmd.c b/drivers/char/tpm/tpm2-cmd.c index dd2e98f4de08..6ca4fc0a0d6f 100644 --- a/drivers/char/tpm/tpm2-cmd.c +++ b/drivers/char/tpm/tpm2-cmd.c @@ -948,6 +948,36 @@ static int tpm2_get_cc_attrs_tbl(struct
Re: [PATCH v7 12/21] tpm: factor out tpm_startup function
On Fri, 19 Oct 2018, Tomas Winkler wrote:

TPM manual startup is used only from within TPM 1.x or TPM 2.x code,
hence remove tpm_startup() function from tpm-interface.c
and add two static function implementations, tpm1_startup()
and tpm2_startup(), into tpm1-cmd.c and tpm2-cmd.c respectively.

Signed-off-by: Tomas Winkler
Tested-by: Jarkko Sakkinen
---
V2-V2: Resend.
V4: Fix the commit message.
V5: 1. A small fix in the kdoc.
    2. Fixed Jarkko's name in Tested-by.
V6: Rebase.
V7: Resend.

 drivers/char/tpm/tpm-interface.c | 41
 drivers/char/tpm/tpm.h           |  1 -
 drivers/char/tpm/tpm1-cmd.c      | 37 +++-
 drivers/char/tpm/tpm2-cmd.c      | 34 +++--
 4 files changed, 68 insertions(+), 45 deletions(-)

diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
index e7f220f691f9..54b81700561b 100644
--- a/drivers/char/tpm/tpm-interface.c
+++ b/drivers/char/tpm/tpm-interface.c
@@ -414,47 +414,6 @@ ssize_t tpm_transmit_cmd(struct tpm_chip *chip, struct tpm_space *space,
 }
 EXPORT_SYMBOL_GPL(tpm_transmit_cmd);
 
-#define TPM_ORD_STARTUP 153
-#define TPM_ST_CLEAR 1
-
-/**
- * tpm_startup - turn on the TPM
- * @chip: TPM chip to use
- *
- * Normally the firmware should start the TPM. This function is provided as a
- * workaround if this does not happen. A legal case for this could be for
- * example when a TPM emulator is used.
- *
- * Return: same as tpm_transmit_cmd()
- */
-int tpm_startup(struct tpm_chip *chip)
-{
-	struct tpm_buf buf;
-	int rc;
-
-	dev_info(&chip->dev, "starting up the TPM manually\n");
-
-	if (chip->flags & TPM_CHIP_FLAG_TPM2) {
-		rc = tpm_buf_init(&buf, TPM2_ST_NO_SESSIONS, TPM2_CC_STARTUP);
-		if (rc < 0)
-			return rc;
-
-		tpm_buf_append_u16(&buf, TPM2_SU_CLEAR);
-	} else {
-		rc = tpm_buf_init(&buf, TPM_TAG_RQU_COMMAND, TPM_ORD_STARTUP);
-		if (rc < 0)
-			return rc;
-
-		tpm_buf_append_u16(&buf, TPM_ST_CLEAR);
-	}
-
-	rc = tpm_transmit_cmd(chip, NULL, buf.data, PAGE_SIZE, 0, 0,
-			      "attempting to start the TPM");
-
-	tpm_buf_destroy(&buf);
-	return rc;
-}
-
 int tpm_get_timeouts(struct tpm_chip *chip)
 {
 	if (chip->flags & TPM_CHIP_FLAG_HAVE_TIMEOUTS)
diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index 754f7bcb15fa..2eb73f6966c3 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -540,7 +540,6 @@ ssize_t tpm_transmit_cmd(struct tpm_chip *chip, struct tpm_space *space,
 			 void *buf, size_t bufsiz,
 			 size_t min_rsp_body_length, unsigned int flags,
 			 const char *desc);
-int tpm_startup(struct tpm_chip *chip);
 int tpm_get_timeouts(struct tpm_chip *);
 
 int tpm1_pm_suspend(struct tpm_chip *chip, int tpm_suspend_pcr);
diff --git a/drivers/char/tpm/tpm1-cmd.c b/drivers/char/tpm/tpm1-cmd.c
index 3bd9f1fa77ce..8a84db315676 100644
--- a/drivers/char/tpm/tpm1-cmd.c
+++ b/drivers/char/tpm/tpm1-cmd.c
@@ -308,6 +308,40 @@ unsigned long tpm1_calc_ordinal_duration(struct tpm_chip *chip, u32 ordinal)
 	return duration;
 }
 
+#define TPM_ORD_STARTUP 153
+#define TPM_ST_CLEAR 1
+
+/**
+ * tpm_startup() - turn on the TPM
+ * @chip: TPM chip to use
+ *
+ * Normally the firmware should start the TPM. This function is provided as a
+ * workaround if this does not happen. A legal case for this could be for
+ * example when a TPM emulator is used.
+ *
+ * Return: same as tpm_transmit_cmd()
+ */
+static int tpm1_startup(struct tpm_chip *chip)
+{
+	struct tpm_buf buf;
+	int rc;
+
+	dev_info(&chip->dev, "starting up the TPM manually\n");
+
+	rc = tpm_buf_init(&buf, TPM_TAG_RQU_COMMAND, TPM_ORD_STARTUP);
+	if (rc < 0)
+		return rc;
+
+	tpm_buf_append_u16(&buf, TPM_ST_CLEAR);
+
+	rc = tpm_transmit_cmd(chip, NULL, buf.data, PAGE_SIZE, 0, 0,
+			      "attempting to start the TPM");
+
+	tpm_buf_destroy(&buf);
+
+	return rc;
+}
+
 int tpm1_get_timeouts(struct tpm_chip *chip)
 {
 	cap_t cap;
@@ -317,7 +351,7 @@ int tpm1_get_timeouts(struct tpm_chip *chip)
 	rc = tpm1_getcap(chip, TPM_CAP_PROP_TIS_TIMEOUT, &cap, NULL,
 			 sizeof(cap.timeout));
 	if (rc == TPM_ERR_INVALID_POSTINIT) {
-		if (tpm_startup(chip))
+		if (tpm1_startup(chip))
 			return rc;
 
 		rc = tpm1_getcap(chip, TPM_CAP_PROP_TIS_TIMEOUT, &cap,
@@ -727,3 +761,4 @@ int tpm1_pm_suspend(struct tpm_chip *chip, int tpm_suspend_pcr)
 
 	return rc;
 }
+
diff --git a/drivers/char/tpm/tpm2-cmd.c b/drivers/char/tpm/tpm2-cmd.c
index dd2e98f4de08..6ca4fc0a0d6f 100644
--- a/drivers/char/tpm/tpm2-cmd.c
+++ b/drivers/char/tpm/tpm2-cmd.c
@@ -948,6 +948,36 @@ static int tpm2_get_cc_attrs_tbl(struct
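
For readers following the quoted patch: tpm1_startup() builds a small request (tag, ordinal, then a 16-bit startup type) and hands it to tpm_transmit_cmd(). A rough user-space sketch of the resulting bytes is below. It is not the kernel's tpm_buf code; put_be16(), put_be32() and build_tpm1_startup() are hypothetical helpers, and the value 193 for TPM_TAG_RQU_COMMAND and the 2+4+4+2 byte layout are assumptions taken from the TPM 1.2 command format rather than from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ordinal and startup type are from the patch; the tag value is assumed. */
#define TPM_TAG_RQU_COMMAND 193
#define TPM_ORD_STARTUP     153
#define TPM_ST_CLEAR        1

/* Hypothetical helpers: TPM requests carry big-endian integers. */
static void put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

static void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)((v >> 16) & 0xff);
	p[2] = (uint8_t)((v >> 8) & 0xff);
	p[3] = (uint8_t)(v & 0xff);
}

/*
 * Serialize a TPM 1.x Startup(ST_CLEAR) request into out[].
 * Assumed layout: tag (2) | total length (4) | ordinal (4) | startup type (2).
 * Returns the number of bytes written.
 */
static size_t build_tpm1_startup(uint8_t *out, size_t cap)
{
	const size_t len = 2 + 4 + 4 + 2;

	assert(out != NULL && cap >= len);
	put_be16(out, TPM_TAG_RQU_COMMAND);
	put_be32(out + 2, (uint32_t)len);
	put_be32(out + 6, TPM_ORD_STARTUP);
	put_be16(out + 10, TPM_ST_CLEAR);
	return len;
}
```

Calling build_tpm1_startup() on a 16-byte buffer fills the first 12 bytes. In the kernel the header is written by tpm_buf_init() and the startup type by tpm_buf_append_u16().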
Re: [PATCH v7 11/21] tpm: factor out tpm 1.x pm suspend flow into tpm1-cmd.c
On Fri, 19 Oct 2018, Tomas Winkler wrote:

Factor out TPM 1.x suspend flow from tpm-interface.c into a new function
tpm1_pm_suspend() in tpm1-cmd.c

Signed-off-by: Tomas Winkler
Reviewed-by: Jarkko Sakkinen

I'll test this later.

/Jarkko
Re: [PATCH] Documentation: dynamic-debug: fix wildcard description
On 10/19/18 5:20 PM, Will Korteland wrote:
> On 2018-10-20 00:16, Randy Dunlap wrote:
>> -A another way is to use wildcard. The match rule support ``*`` (matches
>> -zero or more characters) and ``?`` (matches exactly one character).For
>> +A another way is to use wildcards. The match rule supports ``*`` (matches
>> +zero or more characters) and ``?`` (matches exactly one character). For
>> example, you can match all usb drivers::
>
> "A another" -> "Another"?
>
> - Will

Yes. I'll send an update.

thanks,
--
~Randy
Re: [PATCH 1/2] sched/fair: move rq_of helper function
Hi Vincent,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on v4.19-rc8 next-20181019]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Vincent-Guittot/sched-fair-move-rq_of-helper-function/20181020-081004
config: i386-randconfig-x002-201841 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386

All error/warnings (new ones prefixed by >>):

   In file included from arch/x86/include/asm/current.h:5:0,
                    from include/linux/sched.h:12,
                    from kernel/sched/pelt.c:27:
   kernel/sched/sched.h: In function 'rq_of':
>> include/linux/kernel.h:997:51: error: dereferencing pointer to incomplete type 'struct rq'
     BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \
     ^
   include/linux/compiler.h:335:18: note: in definition of macro '__compiletime_assert'
     int __cond = !(condition);\
     ^
   include/linux/compiler.h:358:2: note: in expansion of macro '_compiletime_assert'
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
     ^~~
   include/linux/build_bug.h:45:37: note: in expansion of macro 'compiletime_assert'
     #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
     ^~
   include/linux/kernel.h:997:2: note: in expansion of macro 'BUILD_BUG_ON_MSG'
     BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \
     ^~~~
   include/linux/kernel.h:997:20: note: in expansion of macro '__same_type'
     BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \
     ^~~
>> kernel/sched/sched.h:581:9: note: in expansion of macro 'container_of'
     return container_of(cfs_rq, struct rq, cfs);
     ^~~~
   In file included from <command-line>:0:0:
>> include/linux/compiler_types.h:237:35: error: invalid use of undefined type 'struct rq'
     #define __compiler_offsetof(a, b) __builtin_offsetof(a, b)
     ^
   include/linux/stddef.h:17:32: note: in expansion of macro '__compiler_offsetof'
     #define offsetof(TYPE, MEMBER) __compiler_offsetof(TYPE, MEMBER)
     ^~~
   include/linux/kernel.h:1000:21: note: in expansion of macro 'offsetof'
     ((type *)(__mptr - offsetof(type, member))); })
     ^~~~
>> kernel/sched/sched.h:581:9: note: in expansion of macro 'container_of'
     return container_of(cfs_rq, struct rq, cfs);
     ^~~~
--
   In file included from arch/x86/include/asm/current.h:5:0,
                    from include/linux/sched.h:12,
                    from kernel/sched/sched.h:5,
                    from kernel/sched/fair.c:23:
   kernel/sched/sched.h: In function 'rq_of':
>> include/linux/kernel.h:997:51: error: dereferencing pointer to incomplete type 'struct rq'
     BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \
     ^
   include/linux/compiler.h:335:18: note: in definition of macro '__compiletime_assert'
     int __cond = !(condition);\
     ^
   include/linux/compiler.h:358:2: note: in expansion of macro '_compiletime_assert'
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
     ^~~
   include/linux/build_bug.h:45:37: note: in expansion of macro 'compiletime_assert'
     #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
     ^~
   include/linux/kernel.h:997:2: note: in expansion of macro 'BUILD_BUG_ON_MSG'
     BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \
     ^~~~
   include/linux/kernel.h:997:20: note: in expansion of macro '__same_type'
     BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \
     ^~~
>> kernel/sched/sched.h:581:9: note: in expansion of macro 'container_of'
     return container_of(cfs_rq, struct rq, cfs);
     ^~~~
   In file included from <command-line>:0:0:
>> include/linux/compiler_types.h:237:35: error: invalid use of undefined type 'struct rq'
     #define __compiler_offsetof(a, b) __builtin_offsetof(a, b)
     ^
   include/linux/stddef.h:17:32: note: in expansion of macro '__compiler_offsetof'
     #define offsetof(TYPE, MEMBER) __compiler_offsetof(TYPE, MEMBER)
     ^
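
Both errors in the report point at the same thing: container_of() needs the full layout of struct rq, and at kernel/sched/sched.h:581 the type appears to be only forward-declared when rq_of() is compiled. A minimal user-space sketch of the working ordering is below. The container_of() macro and both structures are simplified stand-ins for illustration, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's container_of(), without the type check. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in types; the real kernel structures have many more fields. */
struct cfs_rq {
	int nr_running;
};

/*
 * struct rq must be completely defined before rq_of() is compiled:
 * offsetof() needs to know where the 'cfs' member sits inside it.
 * With only "struct rq;" visible here, gcc reports the incomplete type.
 */
struct rq {
	int cpu;
	struct cfs_rq cfs;
};

/* Map an embedded cfs_rq back to the rq that contains it. */
static inline struct rq *rq_of(struct cfs_rq *cfs_rq)
{
	assert(cfs_rq != NULL);
	return container_of(cfs_rq, struct rq, cfs);
}
```

Moving the definition of rq_of() below the definition of struct rq, as in this sketch, is one way to satisfy the compiler. Whether that is the right fix for the patch itself is for the author to decide.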