Re: wake_wide mechanism clarification
Hi, Joel

On 07/29/2017 10:13 AM, Joel Fernandes wrote:
> +Michael Wang on his current email address (old one bounced). (my
> reply was to Mike Galbraith but I also meant to CC Michael Wang for
> the discussion). Thanks

Just back from vacation and saw this long, long discussion...

I think the guys explained the idea well: wake_wide() just tries to
filter out the cases that may congest the waker's domain, as far as
possible without side effects (ideally).

The factor at the very beginning was just a static number picked after
a great many rounds of practical testing. Peter Zijlstra suggested we
add the domain size and make it a flexible factor, which in practice
turned out not that bad.

So the simple answer is that we use llc_size since there was no better
option at that time :-)

But things change very fast and new features can introduce new cases.
I'm pretty sure that if we redid the testing the results would be very
different; however, the idea itself still makes sense to me, at least
in theory.

Recently I've also been thinking about the scheduler issue: CFS tries
to find a general solution for all these cases, and the obvious outcome
is that every case suffers some damage while the scheduler itself gets
bloated in order to achieve the goal of 'taking care of all'.

So in order to achieve the maximum performance for a particular
workload, some user-defined scheduler would be an interesting idea :-P

Regards,
Michael Wang

>
> On Sat, Jul 29, 2017 at 1:01 AM, Joel Fernandes wrote:
>> Hi Mike,
>>
>> I have spent some time understanding the email thread and
>> previous discussions. Unfortunately the second condition we are
>> checking for in the wake_wide still didn't make sense to me (mentioned
>> below) :-(
>>
>> On Fri, Jun 30, 2017 at 10:02 AM, Mike Galbraith wrote:
>>> On Fri, 2017-06-30 at 10:28 -0400, Josef Bacik wrote:
>>>> On Thu, Jun 29, 2017 at 08:04:59PM -0700, Joel Fernandes wrote:
>>>>
>>>>> That makes sense that we multiply slave's flips by a factor because
>>>>> it's low, but I still didn't get why the factor is chosen to be
>>>>> llc_size instead of something else for the multiplication with slave
>>>>> (slave * factor).
>>>
>>>> Yeah I don't know why llc_size was chosen...
>>>
>>> static void update_top_cache_domain(int cpu)
>>> {
>>> 	struct sched_domain_shared *sds = NULL;
>>> 	struct sched_domain *sd;
>>> 	int id = cpu;
>>> 	int size = 1;
>>>
>>> 	sd = highest_flag_domain(cpu, SD_SHARE_PKG_RESOURCES);
>>> 	if (sd) {
>>> 		id = cpumask_first(sched_domain_span(sd));
>>> 		size = cpumask_weight(sched_domain_span(sd));
>>> 		sds = sd->shared;
>>> 	}
>>>
>>> 	rcu_assign_pointer(per_cpu(sd_llc, cpu), sd);
>>> 	per_cpu(sd_llc_size, cpu) = size;
>>>
>>> The goal of wake wide was to approximate when pulling would be a futile
>>> consolidation effort and counterproductive to scaling. 'course with
>>> ever increasing socket size, any 1:N waker is ever more likely to run
>>> out of CPU for its one and only self (slamming into scaling wall)
>>> before it needing to turn its minions loose to conquer the world.
>>
>> Actually the original question was why do we have the second condition
>> as "master < slave * factor", instead of "master < factor". That's
>> what didn't make sense to me. Why don't we return 0 from wake_wide if
>> master < factor ?
>>
>> In fact, as the factor is set to the llc_size, I think the condition
>> that makes sense to me is:
>>
>> 	if ((master + slave) < llc_size)
>> 		return 0;
>>
>> In other words, if the master flips and the slave flips are totally
>> higher than the llc_size, then we are most likely waking up too many
>> tasks as affine and should then switch to wide to prevent overloading.
>>
>> Digging further into the original patch from Michael Wang (I also CC'd
>> him), this was the code (before you had changed it to master/slave):
>>
>> 	wakee->nr_wakee_switch > factor &&
>> 	waker->nr_wakee_switch > (factor * wakee->nr_wakee_switch)
>>
>> To explain the second condition above, Michael Wang said the following in [1]
>>
>> "Furthermore, if waker also has a high 'nr_wakee_switch', imply that multiple
>> tasks rely on it, then waker's higher latency will damage all of them, pull
>> wakee seems to be a bad deal."
>>
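To make the two conditions from this thread easier to compare, here is a small user-space sketch. It is only a model under stated assumptions: wake_wide_sketch() follows the master/slave form quoted above (with the pair ordered so that master is the larger flip count), and wake_wide_sum_sketch() is the "(master + slave) < llc_size" alternative proposed in the quoted mail. Both function names and the plain integer parameters are hypothetical; the kernel takes these values from the tasks and from the per-CPU sd_llc_size.

```c
#include <assert.h>

/*
 * User-space model of the master/slave check discussed above.
 * master/slave stand in for the waker's and wakee's flip counters,
 * factor stands in for sd_llc_size. Returns 1 for "wake wide".
 */
static int wake_wide_sketch(unsigned int master, unsigned int slave,
			    unsigned int factor)
{
	assert(factor > 0);

	/* Order the pair so that master holds the larger flip count. */
	if (master < slave) {
		unsigned int tmp = master;

		master = slave;
		slave = tmp;
	}

	/* Stay affine unless both thresholds are exceeded. */
	if (slave < factor || master < slave * factor)
		return 0;
	return 1;
}

/* Model of the alternative from the quoted mail: sum against llc_size. */
static int wake_wide_sum_sketch(unsigned int master, unsigned int slave,
				unsigned int llc_size)
{
	if (master + slave < llc_size)
		return 0;
	return 1;
}
```

With a factor of 4, a pair such as (master=10, slave=4) stays affine under the first form but would already go wide under the summed form, which shows how much earlier the summed condition triggers.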
[ Linux 4.4 stable ] missing 'printk: set may_schedule for some of console_trylock() callers'
Hi, greg k-h

During our testing with 4.4.73 we got a soft lockup like:

NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [systemd-udevd:856]
...
Call Trace:
 [] vprintk_emit+0x319/0x4a0
 [] printk_emit+0x33/0x3b
 [] ? simple_strtoull+0x2c/0x50
 [] devkmsg_write+0xaa/0x100
 [] ? vprintk+0x30/0x30
 [] do_readv_writev+0x1c2/0x270
 [] ? kmem_cache_free+0x7d/0x1a0
 [] vfs_writev+0x39/0x50
 [] SyS_writev+0x4a/0xd0
 [] entry_SYSCALL_64_fastpath+0x12/0x6a

Currently in 4.4 the console_unlock() called by vprintk_emit() runs with
preemption disabled, so its cond_resched() has no effect, and a soft
lockup appears if writing the data to every console takes too long.

We found the upstream patch:

  commit 6b97a20d3a79
  printk: set may_schedule for some of console_trylock() callers

which should address this issue, but it is not included in the latest
4.4.78 stable yet. Is there any plan to backport it in the future?

Regards,
Michael Wang
Re: [PATCH] md: return 0 instead of error in rdev_attr_show()
We found the upstream fix, sorry for the noise...

Regards,
Michael Wang

On 04/11/2017 12:14 PM, Michael Wang wrote:
>
> sysfs_kf_read() expect the show() callback return the dumped
> length, while rdev_attr_show() can return the error which lead
> into overflow:
>
> BUG: unable to handle kernel paging request at 88040b084000
> IP: [] __memmove+0x24/0x1a0
> PGD 1edb067 PUD 1ede067 PMD 406b9a063 PTE 80040b084161
> Oops: 0003 [#1] SMP
> [snip]
> Call Trace:
>  [] ? sysfs_kf_read+0x80/0xb0
>  [] kernfs_fop_read+0xab/0x160
>  [] __vfs_read+0x28/0xd0
>  [] vfs_read+0x86/0x130
>  [] SyS_read+0x46/0xa0
>  [] entry_SYSCALL_64_fastpath+0x12/0x6a
>
> Simply return 0 in case of error solved the problem.
>
> Signed-off-by: Michael Wang
> ---
>  drivers/md/md.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 1db88d7..d46d714 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -3271,10 +3271,9 @@ rdev_attr_show(struct kobject *kobj, struct attribute
> *attr, char *page)
>  	struct rdev_sysfs_entry *entry = container_of(attr, struct
> rdev_sysfs_entry, attr);
>  	struct md_rdev *rdev = container_of(kobj, struct md_rdev, kobj);
>
> -	if (!entry->show)
> -		return -EIO;
> -	if (!rdev->mddev)
> -		return -EBUSY;
> +	if (!entry->show || !rdev->mddev)
> +		return 0;
> +
>  	return entry->show(rdev, page);
>  }
>
>
[PATCH] md: return 0 instead of error in rdev_attr_show()
sysfs_kf_read() expects the show() callback to return the dumped
length, while rdev_attr_show() can return an error, which leads to
an overflow:

BUG: unable to handle kernel paging request at 88040b084000
IP: [] __memmove+0x24/0x1a0
PGD 1edb067 PUD 1ede067 PMD 406b9a063 PTE 80040b084161
Oops: 0003 [#1] SMP
[snip]
Call Trace:
 [] ? sysfs_kf_read+0x80/0xb0
 [] kernfs_fop_read+0xab/0x160
 [] __vfs_read+0x28/0xd0
 [] vfs_read+0x86/0x130
 [] SyS_read+0x46/0xa0
 [] entry_SYSCALL_64_fastpath+0x12/0x6a

Simply returning 0 in case of error solves the problem.

Signed-off-by: Michael Wang
---
 drivers/md/md.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 1db88d7..d46d714 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -3271,10 +3271,9 @@ rdev_attr_show(struct kobject *kobj, struct attribute *attr, char *page)
 	struct rdev_sysfs_entry *entry = container_of(attr, struct rdev_sysfs_entry, attr);
 	struct md_rdev *rdev = container_of(kobj, struct md_rdev, kobj);
 
-	if (!entry->show)
-		return -EIO;
-	if (!rdev->mddev)
-		return -EBUSY;
+	if (!entry->show || !rdev->mddev)
+		return 0;
+
 	return entry->show(rdev, page);
 }
 
-- 
2.5.0
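A rough illustration of why a negative show() result is dangerous to a caller that treats it as a length. This is a user-space sketch only; it assumes the caller keeps the length in an unsigned variable, which is an assumption about the read path and not something shown in this patch. The helper names and SKETCH_EIO are hypothetical.

```c
#include <stddef.h>
#include <sys/types.h>

#define SKETCH_EIO 5	/* hypothetical stand-in for the kernel's EIO */

/*
 * Length held in an unsigned variable: a negative return value from
 * show() wraps to a huge number, so the clamp hands back 'count'.
 */
static size_t read_len_unsigned(ssize_t show_ret, size_t count)
{
	size_t len = (size_t)show_ret;

	return len < count ? len : count;
}

/* Error-aware variant: a negative return is passed straight back. */
static ssize_t read_len_checked(ssize_t show_ret, size_t count)
{
	if (show_ret < 0)
		return show_ret;
	return (size_t)show_ret < count ? show_ret : (ssize_t)count;
}
```

In this model read_len_unsigned(-SKETCH_EIO, 4096) reports 4096 bytes although nothing was written, while read_len_checked() returns the error. Returning 0 from rdev_attr_show(), as the patch does, avoids the first case at the cost of hiding the error from the reader.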
Re: [RFC PATCH] raid1: reset 'bi_next' before reuse the bio
On 04/05/2017 12:17 AM, NeilBrown wrote:
[snip]
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index 7d67235..0554110 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -1986,11 +1986,13 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
>>  		/* Don't try recovering from here - just fail it
>>  		 * ... unless it is the last working device of course */
>>  		md_error(mddev, rdev);
>> -		if (test_bit(Faulty, &rdev->flags))
>> +		if (test_bit(Faulty, &rdev->flags)) {
>>  			/* Don't try to read from here, but make sure
>>  			 * put_buf does it's thing
>>  			 */
>>  			bio->bi_end_io = end_sync_write;
>> +			bio->bi_next = NULL;
>> +		}
>>  	}
>>
>>  	while(sectors) {
>
>
> Ah - I see what is happening now. I was looking at the vanilla 4.4
> code, which doesn't have the failfast changes.

My bad, I forgot to mention... yes, our md stuff is very close to
upstream.

>
> I don't think your patch is correct though. We really shouldn't be
> re-using that bio, and setting bi_next to NULL just hides the bug. It
> doesn't fix it.
> As the rdev is now Faulty, it doesn't make sense for
> sync_request_write() to submit a write request to it.

Makes sense, though I still have concerns regarding the design:
  * in this case, since the read_disk is already abandoned, is it fine
    to keep r1_bio->read_disk recording the faulty device index?
  * we assign 'end_sync_write' to the original read bio in this case,
    but when is it supposed to be called?

>
> Can you confirm that this works please.

Yes, it works.

Tested-by: Michael Wang

Regards,
Michael Wang

>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index d2d8b8a5bd56..219f1e1f1d1d 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -2180,6 +2180,8 @@ static void sync_request_write(struct mddev *mddev,
> struct r1bio *r1_bio)
>  		    (i == r1_bio->read_disk ||
>  		     !test_bit(MD_RECOVERY_SYNC, &mddev->recovery
>  			continue;
> +		if (test_bit(Faulty, &conf->mirrors[i].rdev->flags))
> +			continue;
>
>  		bio_set_op_attrs(wbio, REQ_OP_WRITE, 0);
>  		if (test_bit(FailFast, &conf->mirrors[i].rdev->flags))
>
>
> Thanks,
> NeilBrown
>
[RFC PATCH] raid1: reset 'bi_next' before reuse the bio
During testing we found that the sync read bio can go through this path:

	md_do_sync()
	  sync_request()
	    generic_make_request()
	      blk_queue_bio()
	        blk_attempt_plug_merge()
	          bio->bi_next CHAINED HERE

	...

	raid1d()
	  sync_request_write()
	    fix_sync_read_error()
	      if FailFast && Faulty
	        bio->bi_end_io = end_sync_write
	    generic_make_request()
	      BUG_ON(bio->bi_next)

This needs the following conditions to be met:
  * the bio was merged once
  * the read disk has FailFast enabled
  * the read disk is Faulty

And since the block layer won't reset 'bi_next' after the bio is done
inside a request, we hit the BUG like that.

This patch simply resets bi_next before we reuse the bio.

Signed-off-by: Michael Wang
---
 drivers/md/raid1.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 7d67235..0554110 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1986,11 +1986,13 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
 		/* Don't try recovering from here - just fail it
 		 * ... unless it is the last working device of course */
 		md_error(mddev, rdev);
-		if (test_bit(Faulty, &rdev->flags))
+		if (test_bit(Faulty, &rdev->flags)) {
 			/* Don't try to read from here, but make sure
 			 * put_buf does it's thing
 			 */
 			bio->bi_end_io = end_sync_write;
+			bio->bi_next = NULL;
+		}
 	}
 
 	while(sectors) {
-- 
2.5.0
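The sequence described in this patch (chain on merge, stale link left behind, check on resubmit) can be modelled in a few lines of user-space C. This is only a toy under stated assumptions: struct toy_bio and all toy_* helpers are hypothetical names, and the model keeps nothing of the real struct bio except the bi_next pointer.

```c
#include <stddef.h>

/* Toy stand-in for struct bio: only the chaining pointer is modelled. */
struct toy_bio {
	struct toy_bio *bi_next;
};

/* Models the plug merge: the bio gets linked to another bio. */
static void toy_chain(struct toy_bio *bio, struct toy_bio *next)
{
	bio->bi_next = next;
}

/*
 * Models the submit-side check: returns 0 for a clean bio and -1
 * where the kernel would hit BUG_ON(bio->bi_next).
 */
static int toy_submit_check(const struct toy_bio *bio)
{
	return bio->bi_next ? -1 : 0;
}

/* Models the proposed change: drop the stale link before reuse. */
static void toy_prepare_reuse(struct toy_bio *bio)
{
	bio->bi_next = NULL;
}
```

A bio that was chained and then resubmitted without toy_prepare_reuse() fails the check; after the reset it passes. The model only shows the symptom and says nothing about whether reusing the bio is right in the first place.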
Re: [RFC PATCH] blk: reset 'bi_next' when bio is done inside request
On 04/04/2017 02:24 PM, Michael Wang wrote: > On 04/04/2017 12:23 PM, Michael Wang wrote: > [snip] >>> add something like >>> if (wbio->bi_next) >>> printk("bi_next!= NULL i=%d read_disk=%d bi_end_io=%pf\n", >>> i, r1_bio->read_disk, wbio->bi_end_io); >>> >>> that might help narrow down what is happening. >> >> Just triggered again in 4.4, dmesg like: >> >> [ 399.240230] md: super_written gets error=-5 >> [ 399.240286] md: super_written gets error=-5 >> [ 399.240286] md/raid1:md0: dm-0: unrecoverable I/O read error for block >> 204160 >> [ 399.240300] md/raid1:md0: dm-0: unrecoverable I/O read error for block >> 204160 >> [ 399.240312] md/raid1:md0: dm-0: unrecoverable I/O read error for block >> 204160 >> [ 399.240323] md/raid1:md0: dm-0: unrecoverable I/O read error for block >> 204160 >> [ 399.240334] md/raid1:md0: dm-0: unrecoverable I/O read error for block >> 204160 >> [ 399.240341] md/raid1:md0: dm-0: unrecoverable I/O read error for block >> 204160 >> [ 399.240349] md/raid1:md0: dm-0: unrecoverable I/O read error for block >> 204160 >> [ 399.240352] bi_next!= NULL i=0 read_disk=0 bi_end_io=end_sync_write >> [raid1] > > Is it possible that the fail fast who changed the 'bi_end_io' inside > fix_sync_read_error() help the used bio pass the check? Hi, NeilBrown, below patch fixed the issue in our testing, I'll post a md RFC patch so we can continue the discussion there. Regards, Michael Wang > > I'm not sure but if the read bio was supposed to be reused as write > for fail fast, maybe we should reset it like this? > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > index 7d67235..0554110 100644 > --- a/drivers/md/raid1.c > +++ b/drivers/md/raid1.c > @@ -1986,11 +1986,13 @@ static int fix_sync_read_error(struct r1bio *r1_bio) > /* Don't try recovering from here - just fail it > * ... 
unless it is the last working device of course */ > md_error(mddev, rdev); > - if (test_bit(Faulty, &rdev->flags)) > + if (test_bit(Faulty, &rdev->flags)) { > /* Don't try to read from here, but make sure > * put_buf does it's thing > */ > bio->bi_end_io = end_sync_write; > + bio->bi_next = NULL; > + } > } > > while(sectors) { > > Regards, > Michael Wang > > >> [ 399.240363] [ cut here ] >> [ 399.240364] kernel BUG at block/blk-core.c:2147! >> [ 399.240365] invalid opcode: [#1] SMP >> [ 399.240378] Modules linked in: ib_srp scsi_transport_srp raid1 md_mod >> ib_ipoib ib_cm ib_uverbs ib_umad mlx5_ib mlx5_core vxlan ip6_udp_tunnel >> udp_tunnel mlx4_ib ib_sa ib_mad ib_core ib_addr ib_netlink iTCO_wdt >> iTCO_vendor_support dcdbas dell_smm_hwmon acpi_cpufreq x86_pkg_temp_thermal >> tpm_tis coretemp evdev tpm i2c_i801 crct10dif_pclmul serio_raw crc32_pclmul >> battery processor acpi_pad button kvm_intel kvm dm_round_robin irqbypass >> dm_multipath autofs4 sg sd_mod crc32c_intel ahci libahci psmouse libata >> mlx4_core scsi_mod xhci_pci xhci_hcd mlx_compat fan thermal [last unloaded: >> scsi_transport_srp] >> [ 399.240380] CPU: 1 PID: 2052 Comm: md0_raid1 Not tainted >> 4.4.50-1-pserver+ #26 >> [ 399.240381] Hardware name: Dell Inc. 
Precision Tower 3620/09WH54, BIOS >> 1.3.6 05/26/2016 >> [ 399.240381] task: 8804031b6200 ti: 8800d72b4000 task.ti: >> 8800d72b4000 >> [ 399.240385] RIP: 0010:[] [] >> generic_make_request+0x29e/0x2a0 >> [ 399.240385] RSP: 0018:8800d72b7d10 EFLAGS: 00010286 >> [ 399.240386] RAX: 8804031b6200 RBX: 8800d2577e00 RCX: >> 3fff >> [ 399.240387] RDX: c001 RSI: 0001 RDI: >> 8800d5e8c1e0 >> [ 399.240387] RBP: 8800d72b7d50 R08: R09: >> 003f >> [ 399.240388] R10: 0004 R11: 001db9ac R12: >> >> [ 399.240388] R13: 8800d2748e00 R14: 88040a016400 R15: >> 8800d2748e40 >> [ 399.240389] FS: () GS:88041dc4() >> knlGS: >> [ 399.240390] CS: 0010 DS: ES: CR0: 80050033 >> [ 399.240390] CR2: 7fb49246a000 CR3: 00040215c000 CR4: >> 003406e0 >> [ 399.240391] DR0: DR1: 000
Re: [RFC PATCH] blk: reset 'bi_next' when bio is done inside request
On 04/04/2017 12:23 PM, Michael Wang wrote: [snip] >> add something like >> if (wbio->bi_next) >> printk("bi_next!= NULL i=%d read_disk=%d bi_end_io=%pf\n", >> i, r1_bio->read_disk, wbio->bi_end_io); >> >> that might help narrow down what is happening. > > Just triggered again in 4.4, dmesg like: > > [ 399.240230] md: super_written gets error=-5 > [ 399.240286] md: super_written gets error=-5 > [ 399.240286] md/raid1:md0: dm-0: unrecoverable I/O read error for block > 204160 > [ 399.240300] md/raid1:md0: dm-0: unrecoverable I/O read error for block > 204160 > [ 399.240312] md/raid1:md0: dm-0: unrecoverable I/O read error for block > 204160 > [ 399.240323] md/raid1:md0: dm-0: unrecoverable I/O read error for block > 204160 > [ 399.240334] md/raid1:md0: dm-0: unrecoverable I/O read error for block > 204160 > [ 399.240341] md/raid1:md0: dm-0: unrecoverable I/O read error for block > 204160 > [ 399.240349] md/raid1:md0: dm-0: unrecoverable I/O read error for block > 204160 > [ 399.240352] bi_next!= NULL i=0 read_disk=0 bi_end_io=end_sync_write [raid1] Is it possible that the fail fast who changed the 'bi_end_io' inside fix_sync_read_error() help the used bio pass the check? I'm not sure but if the read bio was supposed to be reused as write for fail fast, maybe we should reset it like this? diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index 7d67235..0554110 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -1986,11 +1986,13 @@ static int fix_sync_read_error(struct r1bio *r1_bio) /* Don't try recovering from here - just fail it * ... unless it is the last working device of course */ md_error(mddev, rdev); - if (test_bit(Faulty, &rdev->flags)) + if (test_bit(Faulty, &rdev->flags)) { /* Don't try to read from here, but make sure * put_buf does it's thing */ bio->bi_end_io = end_sync_write; + bio->bi_next = NULL; + } } while(sectors) { Regards, Michael Wang > [ 399.240363] [ cut here ] > [ 399.240364] kernel BUG at block/blk-core.c:2147! 
> [ 399.240365] invalid opcode: [#1] SMP > [ 399.240378] Modules linked in: ib_srp scsi_transport_srp raid1 md_mod > ib_ipoib ib_cm ib_uverbs ib_umad mlx5_ib mlx5_core vxlan ip6_udp_tunnel > udp_tunnel mlx4_ib ib_sa ib_mad ib_core ib_addr ib_netlink iTCO_wdt > iTCO_vendor_support dcdbas dell_smm_hwmon acpi_cpufreq x86_pkg_temp_thermal > tpm_tis coretemp evdev tpm i2c_i801 crct10dif_pclmul serio_raw crc32_pclmul > battery processor acpi_pad button kvm_intel kvm dm_round_robin irqbypass > dm_multipath autofs4 sg sd_mod crc32c_intel ahci libahci psmouse libata > mlx4_core scsi_mod xhci_pci xhci_hcd mlx_compat fan thermal [last unloaded: > scsi_transport_srp] > [ 399.240380] CPU: 1 PID: 2052 Comm: md0_raid1 Not tainted 4.4.50-1-pserver+ > #26 > [ 399.240381] Hardware name: Dell Inc. Precision Tower 3620/09WH54, BIOS > 1.3.6 05/26/2016 > [ 399.240381] task: 8804031b6200 ti: 8800d72b4000 task.ti: > 8800d72b4000 > [ 399.240385] RIP: 0010:[] [] > generic_make_request+0x29e/0x2a0 > [ 399.240385] RSP: 0018:8800d72b7d10 EFLAGS: 00010286 > [ 399.240386] RAX: 8804031b6200 RBX: 8800d2577e00 RCX: > 3fff > [ 399.240387] RDX: c001 RSI: 0001 RDI: > 8800d5e8c1e0 > [ 399.240387] RBP: 8800d72b7d50 R08: R09: > 003f > [ 399.240388] R10: 0004 R11: 001db9ac R12: > > [ 399.240388] R13: 8800d2748e00 R14: 88040a016400 R15: > 8800d2748e40 > [ 399.240389] FS: () GS:88041dc4() > knlGS: > [ 399.240390] CS: 0010 DS: ES: CR0: 80050033 > [ 399.240390] CR2: 7fb49246a000 CR3: 00040215c000 CR4: > 003406e0 > [ 399.240391] DR0: DR1: DR2: > > [ 399.240391] DR3: DR6: fffe0ff0 DR7: > 0400 > [ 399.240392] Stack: > [ 399.240393] 8800d72b7d18 8800d72b7d30 > > [ 399.240394] a079c290 8800d2577e00 > 8800d2748e00 > [ 399.240395] 8800d72b7e58 a079e74c 88040b661c00 > 8800d2577e00 > [ 399.240396] Call Trace: > [ 399.240398] [] ? sync_request+0xb20/0xb20 [raid1] > [ 399.240400] [] raid1d+0x65c/0x1060 [raid1] > [ 399.240403] [] ? &g
Re: [RFC PATCH] blk: reset 'bi_next' when bio is done inside request
On 04/04/2017 11:37 AM, NeilBrown wrote: > On Tue, Apr 04 2017, Michael Wang wrote: [snip] >>> >>> If sync_request_write() is using a bio that has already been used, it >>> should call bio_reset() and fill in the details again. >>> However I don't see how that would happen. >>> Can you give specific details on the situation that triggers the bug? >> >> We have storage side mapping lv through scst to server, on server side >> we assemble them into multipath device, and then assemble these dm into >> two raid1. >> >> The test is firstly do mkfs.ext4 on raid1 then start fio on it, on storage >> side we unmap all the lv (could during mkfs or fio), then on server side >> we hit the BUG (reproducible). > > So I assume the initial resync is still happening at this point? > And you unmap *all* the lv's so you expect IO to fail? > I can see that the code would behave strangely if you have a > bad-block-list configured (which is the default). > Do you have a bbl? If you create the array without the bbl, does it > still crash? The resync is at least happen concurrently in this case, we try to simulate the case that all the connections dropped, the IO do failed, also bunch of kernel log like: md: super_written gets error=-5 blk_update_request: I/O error, dev dm-3, sector 64184 md/raid1:md1: dm-2: unrecoverable I/O read error for block 46848 we expect that to happen, but server should not crash on BUG. And we haven't done any thing special regarding bbl, the bad_blocks in sysfs are all empty. > >> >> The path of bio was confirmed by add tracing, it is reused in >> sync_request_write() >> with 'bi_next' once chained inside blk_attempt_plug_merge(). > > I still don't see why it is re-used. > I assume you didn't explicitly ask for a check/repair (i.e. didn't write > to .../md/sync_action at all?). In that case MD_RECOVERY_REQUESTED is > not set. Just unmap lv on storage side, no operation on server side. 
> So sync_request() sends only one bio to generic_make_request(): >r1_bio->bios[r1_bio->read_disk]; > > then sync_request_write() *doesn't* send that bio again, but does send > all the others. > > So where does it reuse a bio? If that's the design then it would be strange... the log do showing the path of that bio go through sync_request(), will do more investigation. > >> >> We also tried to reset the bi_next inside sync_request_write() before >> generic_make_request() which also works. >> >> The testing was done with 4.4, but we found upstream also left bi_next >> chained after done in request, thus we post this RFC. >> >> Regarding raid1, we haven't found the place on path where the bio was >> reset... where does it supposed to be? > > I'm not sure what you mean. > We only reset bios when they are being reused. > One place is in process_checks() where bio_reset() is called before > filling in all the details. > > > Maybe, in sync_request_write(), before > > wbio->bi_rw = WRITE; > > add something like > if (wbio->bi_next) > printk("bi_next!= NULL i=%d read_disk=%d bi_end_io=%pf\n", > i, r1_bio->read_disk, wbio->bi_end_io); > > that might help narrow down what is happening. 
Just triggered again in 4.4, dmesg like: [ 399.240230] md: super_written gets error=-5 [ 399.240286] md: super_written gets error=-5 [ 399.240286] md/raid1:md0: dm-0: unrecoverable I/O read error for block 204160 [ 399.240300] md/raid1:md0: dm-0: unrecoverable I/O read error for block 204160 [ 399.240312] md/raid1:md0: dm-0: unrecoverable I/O read error for block 204160 [ 399.240323] md/raid1:md0: dm-0: unrecoverable I/O read error for block 204160 [ 399.240334] md/raid1:md0: dm-0: unrecoverable I/O read error for block 204160 [ 399.240341] md/raid1:md0: dm-0: unrecoverable I/O read error for block 204160 [ 399.240349] md/raid1:md0: dm-0: unrecoverable I/O read error for block 204160 [ 399.240352] bi_next!= NULL i=0 read_disk=0 bi_end_io=end_sync_write [raid1] [ 399.240363] [ cut here ] [ 399.240364] kernel BUG at block/blk-core.c:2147! [ 399.240365] invalid opcode: [#1] SMP [ 399.240378] Modules linked in: ib_srp scsi_transport_srp raid1 md_mod ib_ipoib ib_cm ib_uverbs ib_umad mlx5_ib mlx5_core vxlan ip6_udp_tunnel udp_tunnel mlx4_ib ib_sa ib_mad ib_core ib_addr ib_netlink iTCO_wdt iTCO_vendor_support dcdbas dell_smm_hwmon acpi_cpufreq x86_pkg_temp_thermal tpm_tis coretemp evdev tpm i2c_i801 crct10dif_pclmul serio_raw crc32_pclmul battery processor acpi_pad button kvm_intel kvm dm_round_robin irqbypass dm_multipath autofs4 sg sd_mod crc32c_intel ahci libahci psmouse libata
Re: [RFC PATCH] blk: reset 'bi_next' when bio is done inside request
Hi, Neil

On 04/03/2017 11:25 PM, NeilBrown wrote:
> On Mon, Apr 03 2017, Michael Wang wrote:
>
>> blk_attempt_plug_merge() try to merge bio into request and chain them
>> by 'bi_next', while after the bio is done inside request, we forgot to
>> reset the 'bi_next'.
>>
>> This lead into BUG while removing all the underlying devices from md-raid1,
>> the bio once go through:
>>
>>   md_do_sync()
>>     sync_request()
>>       generic_make_request()
>
> This is a read request from the "first" device.
>
>>         blk_queue_bio()
>>           blk_attempt_plug_merge()
>>             CHAINED HERE
>>
>> will keep chained and reused by:
>>
>>   raid1d()
>>     sync_request_write()
>>       generic_make_request()
>
> This is a write request to some other device, isn't it?
>
> If sync_request_write() is using a bio that has already been used, it
> should call bio_reset() and fill in the details again.
> However I don't see how that would happen.
> Can you give specific details on the situation that triggers the bug?

On the storage side we map LVs through scst to the server; on the
server side we assemble them into multipath devices, and then assemble
these dm devices into two raid1 arrays.

The test first runs mkfs.ext4 on the raid1 and then starts fio on it.
On the storage side we unmap all the LVs (possibly during mkfs or fio),
then on the server side we hit the BUG (reproducible).

The path of the bio was confirmed by adding tracing: it is reused in
sync_request_write() with 'bi_next' still chained from
blk_attempt_plug_merge().

We also tried resetting bi_next inside sync_request_write() before
generic_make_request(), which also works.

The testing was done with 4.4, but we found that upstream also leaves
bi_next chained after the bio is done in the request, thus we posted
this RFC.

Regarding raid1, we haven't found the place on the path where the bio
is reset... where is it supposed to be?

BTW, fix_sync_read_error() was also invoked and succeeded before the
BUG triggered.

Regards,
Michael Wang

>
> Thanks,
> NeilBrown
>
>
>>         BUG_ON(bio->bi_next)
>>
>> After reset the 'bi_next' this can no longer happen.
>>
>> Signed-off-by: Michael Wang
>> ---
>>  block/blk-core.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 43b7d06..91223b2 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -2619,8 +2619,10 @@ bool blk_update_request(struct request *req, int
>> error, unsigned int nr_bytes)
>>  		struct bio *bio = req->bio;
>>  		unsigned bio_bytes = min(bio->bi_iter.bi_size, nr_bytes);
>>
>> -		if (bio_bytes == bio->bi_iter.bi_size)
>> +		if (bio_bytes == bio->bi_iter.bi_size) {
>>  			req->bio = bio->bi_next;
>> +			bio->bi_next = NULL;
>> +		}
>>
>>  		req_bio_endio(req, bio, bio_bytes, error);
>>
>> --
>> 2.5.0
[RFC PATCH] blk: reset 'bi_next' when bio is done inside request
blk_attempt_plug_merge() tries to merge a bio into a request and chains
them by 'bi_next', but after the bio is done inside the request we
forget to reset 'bi_next'.

This leads to a BUG while removing all the underlying devices from
md-raid1. A bio that once went through:

	md_do_sync()
	  sync_request()
	    generic_make_request()
	      blk_queue_bio()
	        blk_attempt_plug_merge()
	          CHAINED HERE

will stay chained and be reused by:

	raid1d()
	  sync_request_write()
	    generic_make_request()
	      BUG_ON(bio->bi_next)

After resetting 'bi_next' this can no longer happen.

Signed-off-by: Michael Wang
---
 block/blk-core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 43b7d06..91223b2 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2619,8 +2619,10 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes)
 		struct bio *bio = req->bio;
 		unsigned bio_bytes = min(bio->bi_iter.bi_size, nr_bytes);
 
-		if (bio_bytes == bio->bi_iter.bi_size)
+		if (bio_bytes == bio->bi_iter.bi_size) {
 			req->bio = bio->bi_next;
+			bio->bi_next = NULL;
+		}
 
 		req_bio_endio(req, bio, bio_bytes, error);
 
-- 
2.5.0
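For readers who want to see the effect of the hunk in isolation, here is a user-space sketch of a completion loop that unlinks each fully completed bio from its request and clears the link. It is a simplified model, not the kernel function: struct req_bio, struct toy_request and toy_update_request() are hypothetical names, and only the size and the bi_next pointer are modelled.

```c
#include <stddef.h>

/* Toy bio: remaining size in bytes plus the chaining pointer. */
struct req_bio {
	struct req_bio *bi_next;
	unsigned int size;
};

/* Toy request: just the head of the bio chain. */
struct toy_request {
	struct req_bio *bio;
};

/*
 * Complete nr_bytes against the request. A bio that is fully done is
 * unlinked and its bi_next is cleared, as in the proposed hunk.
 * Returns the number of bios that were fully completed.
 */
static unsigned int toy_update_request(struct toy_request *req,
				       unsigned int nr_bytes)
{
	unsigned int done = 0;

	while (req->bio && nr_bytes) {
		struct req_bio *bio = req->bio;
		unsigned int bio_bytes =
			bio->size < nr_bytes ? bio->size : nr_bytes;

		if (bio_bytes == bio->size) {
			req->bio = bio->bi_next;
			bio->bi_next = NULL;	/* no stale link left behind */
			done++;
		}
		bio->size -= bio_bytes;
		nr_bytes -= bio_bytes;
	}
	return done;
}
```

Completing 1280 bytes against three chained 512-byte bios leaves the first two unlinked with bi_next == NULL and the third still attached to the request with 256 bytes outstanding.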
Re: [PATCH v2 1/1] block: fix blk_queue_split() resource exhaustion
Dear Maintainers

I'd like to ask about the status of this patch, since we hit the issue
too during our testing on md raid1.

The split remainder bio_A was queued ahead, followed by bio_B for the
lower device. At this moment raid starts freezing; the loop takes out
bio_A first and delivers it, which hangs since raid is freezing, while
the freeze never ends since it is waiting for bio_B to finish, and
bio_B is still on the queue, waiting for bio_A to finish...

We're looking for a good solution and we found that this patch has
already progressed a lot, but we can't find it in linux-next, so we'd
like to ask: are we still planning to have this fix upstream?

Regards,
Michael Wang

On 07/11/2016 04:10 PM, Lars Ellenberg wrote:
> For a long time, generic_make_request() converts recursion into
> iteration by queuing recursive arguments on current->bio_list.
>
> This is convenient for stacking drivers,
> the top-most driver would take the originally submitted bio,
> and re-submit a re-mapped version of it, or one or more clones,
> or one or more new allocated bios to its backend(s). Which
> are then simply processed in turn, and each can again queue
> more "backend-bios" until we reach the bottom of the driver stack,
> and actually dispatch to the real backend device.
>
> Any stacking driver ->make_request_fn() could expect that,
> once it returns, any backend-bios it submitted via recursive calls
> to generic_make_request() would now be processed and dispatched, before
> the current task would call into this driver again.
>
> This is changed by commit
>   54efd50 block: make generic_make_request handle arbitrarily sized bios
>
> Drivers may call blk_queue_split() inside their ->make_request_fn(),
> which may split the current bio into a front-part to be dealt with
> immediately, and a remainder-part, which may need to be split even
> further. That remainder-part will simply also be pushed to
> current->bio_list, and would end up being head-of-queue, in front
> of any backend-bios the current make_request_fn() might submit during
> processing of the front-part.
>
> Which means the current task would immediately end up back in the same
> make_request_fn() of the same driver again, before any of its backend
> bios have even been processed.
>
> This can lead to resource starvation deadlock.
> Drivers could avoid this by learning to not need blk_queue_split(),
> or by submitting their backend bios in a different context (dedicated
> kernel thread, work_queue context, ...). Or by playing funny re-ordering
> games with entries on current->bio_list.
>
> Instead, I suggest to distinguish between recursive calls to
> generic_make_request(), and pushing back the remainder part in
> blk_queue_split(), by pointing current->bio_lists to a
> 	struct recursion_to_iteration_bio_lists {
> 		struct bio_list recursion;
> 		struct bio_list queue;
> 	}
>
> By providing each q->make_request_fn() with an empty "recursion"
> bio_list, then merging any recursively submitted bios to the
> head of the "queue" list, we can make the recursion-to-iteration
> logic in generic_make_request() process deepest level bios first,
> and "sibling" bios of the same level in "natural" order.
>
> Signed-off-by: Lars Ellenberg
> Signed-off-by: Roland Kammerer
> ---
>  block/bio.c               | 20 +++
>  block/blk-core.c          | 49 +--
>  block/blk-merge.c         | 5 -
>  drivers/md/bcache/btree.c | 12 ++--
>  drivers/md/dm-bufio.c     | 2 +-
>  drivers/md/raid1.c        | 5 ++---
>  drivers/md/raid10.c       | 5 ++---
>  include/linux/bio.h       | 25
>  include/linux/sched.h     | 4 ++--
>  9 files changed, 80 insertions(+), 47 deletions(-)
>
> diff --git a/block/bio.c b/block/bio.c
> index 848cd35..c2606fd 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -366,12 +366,16 @@ static void punt_bios_to_rescuer(struct bio_set *bs)
>  	 */
>
>  	bio_list_init(&punt);
> -	bio_list_init(&nopunt);
>
> -	while ((bio = bio_list_pop(current->bio_list)))
> +	bio_list_init(&nopunt);
> +	while ((bio = bio_list_pop(&current->bio_lists->recursion)))
>  		bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio);
> +	current->bio_lists->recursion = nopunt;
>
> -	*current->bio_list = nopunt;
> +	bio_list_init(&nopunt);
> +	while ((bio = bio_list_pop(&current->bio_lists->queue)))
> +		bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio);
> +	current->bio_lists-
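The ordering problem described in this thread (remainder bio_A ending up ahead of backend bio_B) and the two-list idea from the quoted patch description can be illustrated with a toy model. This is a sketch under assumptions, not the patch itself: bios are reduced to single-character labels, struct label_list and all helper names are hypothetical, and it is assumed that the split remainder is placed on the "queue" list while backend bios land on the "recursion" list.

```c
#include <stddef.h>
#include <string.h>

#define MAX_BIOS 8	/* toy capacity; the model never exceeds it */

/* Tiny ordered list of bio labels; index 0 is the head. */
struct label_list {
	char items[MAX_BIOS];
	int len;
};

static void list_add_tail(struct label_list *l, char label)
{
	l->items[l->len++] = label;
}

/* Put all of src in front of dst, keeping src's internal order. */
static void list_merge_head(struct label_list *dst,
			    const struct label_list *src)
{
	memmove(dst->items + src->len, dst->items, (size_t)dst->len);
	memcpy(dst->items, src->items, (size_t)src->len);
	dst->len += src->len;
}

/* Single list: remainder 'A' is queued first, backend bio 'B' after it. */
static void single_list_order(struct label_list *out)
{
	out->len = 0;
	list_add_tail(out, 'A');	/* remainder from the split */
	list_add_tail(out, 'B');	/* backend bio submitted afterwards */
}

/*
 * Two lists: 'A' goes to the queue, 'B' to a fresh recursion list,
 * and the recursion list is then merged to the head of the queue.
 */
static void two_list_order(struct label_list *out)
{
	struct label_list recursion = { .len = 0 };

	out->len = 0;
	list_add_tail(out, 'A');
	list_add_tail(&recursion, 'B');
	list_merge_head(out, &recursion);
}
```

In the single-list model the processing order is A then B, which is the order that lets the freeze wait forever on bio_B. In the two-list model the backend bio comes out first.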
[BUG] block: bdi_register_owner() failure cause NULL pointer dereference
Hi, Folks We observed the hard lockup while trying raid assemble with sas3ircu, it was start with the failure inside bdi_register_owner() with duplicated kobj path, and later comeup the NULL pointer dereference, after that system hang and we saw hard lockup on screen. The duplicated issue could be with the scsi controller driver and we are going to upgrade it anyway, but my question is why we don't do some error handling like: diff --git a/block/genhd.c b/block/genhd.c index a178c8e..318bc63 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -614,7 +614,15 @@ void device_add_disk(struct device *parent, struct gendisk *disk) /* Register BDI before referencing it from bdev */ bdi = &disk->queue->backing_dev_info; - bdi_register_owner(bdi, disk_to_dev(disk)); + if (bdi_register_owner(bdi, disk_to_dev(disk))) { + disk_release_events(disk); + blk_free_devt(devt); + disk->ev = NULL; + disk->first_minor = 0; + disk->major = 0; + WARN_ON(1); + return; + } blk_register_region(disk_devt(disk), disk->minors, NULL, exact_match, exact_lock, disk); to prevent the following NULL pointer dereference and hard lockup? Regards, Michael Wang Sep 29 09:53:28 st401b-3 systemd[1]: Starting Update UTMP about System Runlevel Changes... Sep 29 09:53:28 st401b-3 ntpd[4970]: Listen and drop on 1 v6wildcard :: UDP 123 Sep 29 09:53:28 st401b-3 ntpd[4970]: Listen normally on 2 lo 127.0.0.1 UDP 123 Sep 29 09:53:28 st401b-3 ntpd[4970]: Listen normally on 3 eth0 10.41.12.3 UDP 123 Sep 29 09:53:28 st401b-3 ntpd[4970]: Listen normally on 4 lo ::1 UDP 123 Sep 29 09:53:28 st401b-3 ntpd[4970]: Listen normally on 5 eth0 fe80::ec4:7aff:feab:6b0 UDP 123 Sep 29 09:53:28 st401b-3 ntpd[4970]: peers refreshed Sep 29 09:53:28 st401b-3 ntpd[4970]: Listening on routing socket on fd #22 for interface updates Sep 29 09:53:28 st401b-3 systemd[1]: Started Update UTMP about System Runlevel Changes. Sep 29 09:53:28 st401b-3 systemd[1]: Startup finished in 18.720s (kernel) + 39.513s (userspace) = 58.233s. 
Sep 29 09:55:01 st401b-3 CRON[5433]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1) Sep 29 09:55:01 st401b-3 CRON[5434]: (root) CMD (test -x /opt/profitbricks/bin/check_memory && /opt/profitbricks/bin/check_memory) Sep 29 09:55:17 st401b-3 kernel: [ 167.693658] scsi 0:1:0:0: Direct-Access LSI Logical Volume 3000 PQ: 0 ANSI: 6 Sep 29 09:55:17 st401b-3 kernel: [ 167.693795] scsi 0:1:0:0: RAID1: handle(0x0143), wwid(0x02f7ec7949091b05), pd_count(2), type(SSP) Sep 29 09:55:17 st401b-3 kernel: [ 167.694042] sd 0:1:0:0: [sdam] 5859373056 512-byte logical blocks: (3.00 TB/2.73 TiB) Sep 29 09:55:17 st401b-3 kernel: [ 167.694044] sd 0:1:0:0: [sdam] 4096-byte physical blocks Sep 29 09:55:17 st401b-3 kernel: [ 167.694057] sd 0:1:0:0: Attached scsi generic sg40 type 0 Sep 29 09:55:17 st401b-3 kernel: [ 167.694129] sd 0:1:0:0: [sdam] Write Protect is off Sep 29 09:55:17 st401b-3 kernel: [ 167.694131] sd 0:1:0:0: [sdam] Mode Sense: 03 00 00 08 Sep 29 09:55:17 st401b-3 kernel: [ 167.694166] sd 0:1:0:0: [sdam] No Caching mode page found Sep 29 09:55:17 st401b-3 kernel: [ 167.694282] sd 0:0:4:0: hidding raid component Sep 29 09:55:17 st401b-3 kernel: [ 167.694589] sd 0:1:0:0: [sdam] Assuming drive cache: write through Sep 29 09:55:17 st401b-3 kernel: [ 167.703346] sd 0:1:0:0: [sdam] Attached SCSI disk Sep 29 09:55:17 st401b-3 kernel: [ 167.703653] sd 0:0:5:0: hidding raid component Sep 29 09:55:39 st401b-3 check_backup_lvm_push: critical: local git command rev-parse HEAD failed, retval: 0, fatal: Not a git repository: '/var/lib/backup-lvm/.git/' Sep 29 09:56:03 st401b-3 kernel: [ 213.684812] scsi 0:1:1:0: Direct-Access LSI Logical Volume 3000 PQ: 0 ANSI: 6 Sep 29 09:56:03 st401b-3 kernel: [ 213.684946] scsi 0:1:1:0: RAID1: handle(0x0142), wwid(0x0ab3eca651cd1b58), pd_count(2), type(SSP) Sep 29 09:56:03 st401b-3 kernel: [ 213.685189] sd 0:1:1:0: [sde] 5859373056 512-byte logical blocks: (3.00 TB/2.73 TiB) Sep 29 09:56:03 st401b-3 kernel: [ 213.685192] sd 0:1:1:0: 
[sde] 4096-byte physical blocks Sep 29 09:56:03 st401b-3 kernel: [ 213.685204] sd 0:1:1:0: Attached scsi generic sg41 type 0 Sep 29 09:56:03 st401b-3 kernel: [ 213.685275] sd 0:1:1:0: [sde] Write Protect is off Sep 29 09:56:03 st401b-3 kernel: [ 213.685277] sd 0:1:1:0: [sde] Mode Sense: 03 00 00 08 Sep 29 09:56:03 st401b-3 kernel: [ 213.685307] sd 0:1:1:0: [sde] No Caching mode page found Sep 29 09:56:03 st401b-3 kernel: [ 213.685423] sd 0:0:6:0: hidding raid component Sep 29 09:56:03 st401b-3 kernel: [ 213.685698] sd 0:1:1:0: [sde] Assuming drive cache: write through Sep 29 09:56:03 st401b-3 kernel: [ 213.686226] [ cut here ] Sep 29 09:56:03 st401b-3 kernel: [ 213
Re: [PATCH RESEND] infiniband:core:Add needed error path in cm_init_av_by_path
On 12/16/2015 07:16 PM, Jason Gunthorpe wrote: > On Wed, Dec 16, 2015 at 11:26:39AM +0100, Michael Wang wrote: [snip] >> >> I've rechecked the ib_init_ah_from_path() again, and found it >> still set IB_AH_GRH when the GID cache missing, but with: > > How do you mean? > > ah_attr->ah_flags = IB_AH_GRH; > ah_attr->grh.dgid = rec->dgid; > > ret = ib_find_cached_gid(device, &rec->sgid, ndev, &port_num, > &gid_index); > if (ret) { > if (ndev) > dev_put(ndev); > return ret; > } > > If find_cached_gid fails then ib_init_ah_from_path also fails. > > Is there a case where ib_find_cached_gid can succeed but not return > good data? Just for the GRH header, ib_find_cached_gid() will fail but the flag and dgid will be correct, and others are all 0 including hop limit, but may be just coincidence... As long as hop_limit > 1 means the pkg do have to pass through a router to other subnet, the fix make sense :-) Regards, Michael Wang > > I agree it would read nicer if the ah_flags and grh.dgid was moved > after the ib_find_cached_gid > >> BTW, cma_sidr_rep_handler() also call ib_init_ah_from_path() without >> a check on return. > > That sounds like a problem. > > Jason > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RESEND] infiniband:core:Add needed error path in cm_init_av_by_path
On 12/15/2015 06:30 PM, Jason Gunthorpe wrote: > On Tue, Dec 15, 2015 at 05:38:34PM +0100, Michael Wang wrote: >> The hop_limit is only suggest that the package allowed to be >> routed, not have to, correct? > > If the hop limit is >= 2 (?) then the GRH is mandatory. The > SM will return this information in the PathRecord if it determines a > GRH is required. The whole stack follows this protocol. > > The GRH is optional for in-subnet communications. Thanks for the explain :-) I've rechecked the ib_init_ah_from_path() again, and found it still set IB_AH_GRH when the GID cache missing, but with: grh.sgid_index = 0 grh.flow_label = 0 grh.hop_limit = 0 grh.traffic_class = 0 Not sure if it's just coincidence, hop_limit is 0, so router will discard the pkg and GRH won't be used, the transaction in subnet still works. Could this be designed as an optimization for the case like when SM reassigning the GID? BTW, cma_sidr_rep_handler() also call ib_init_ah_from_path() without a check on return. Regards, Michael Wang > > Jason
Re: [PATCH RESEND] infiniband:core:Add needed error path in cm_init_av_by_path
On 12/15/2015 04:52 PM, Nicholas Krause wrote: > This adds a needed error path in the function, cm_init_av_by_path > after the call to ib_init_ah_from_path in order to avoid incorrectly > accessing the path pointer of structure type ib_sa_path_rec if this > function call fails to complete its intended work successfully by > returning a error code. > > Signed-off-by: Nicholas Krause > --- > drivers/infiniband/core/cm.c | 7 +-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c > index 0a26dd6..e9b36ea 100644 > --- a/drivers/infiniband/core/cm.c > +++ b/drivers/infiniband/core/cm.c > @@ -383,8 +383,11 @@ static int cm_init_av_by_path(struct ib_sa_path_rec > *path, struct cm_av *av) > return ret; > > av->port = port; > - ib_init_ah_from_path(cm_dev->ib_device, port->port_num, path, > - &av->ah_attr); > + ret = ib_init_ah_from_path(cm_dev->ib_device, port->port_num, path, > +&av->ah_attr); ..Just wondering what if the transport don't require GRH? eg inside the same subnet? The hop_limit is only suggest that the package allowed to be routed, not have to, correct? Regards, Michael Wang > + if (ret) > + return ret; > + > av->timeout = path->packet_life_time + 1; > > return 0;
Re: [PATCH] infiniband:core:Add needed error path in cm_init_av_by_path
On 12/07/2015 08:57 AM, Haggai Eran wrote: > On Friday, December 4, 2015 8:02 PM, Nicholas Krause > wrote: >> To: dledf...@redhat.com >> Cc: sean.he...@intel.com; hal.rosenst...@gmail.com; Haggai Eran; >> jguntho...@obsidianresearch.com; Matan Barak; yun.w...@profitbricks.com; >> ted.h@oracle.com; Doron Tsur; Erez Shitrit; david.ah...@oracle.com; >> linux-r...@vger.kernel.org; linux-kernel@vger.kernel.org >> Subject: [PATCH] infiniband:core:Add needed error path in cm_init_av_by_path >> >> This adds a needed error path in the function, cm_init_av_by_path >> after the call to ib_init_ah_from_path in order to avoid incorrectly >> accessing the path pointer of structure type ib_sa_path_rec if this >> function call fails to complete its intended work successfully by >> returning a error code. >> >> Signed-off-by: Nicholas Krause > > The subject doesn't seem to match the convention but apart from that, > > Reviewed-by: Haggai Eran > > I wonder if this should go to stable. If I understand correctly, this will > fail only when the SGID isn't found in the GID table, but such connections > would fail later on when creating a QP, right? Me too think this need a reconsider, to me the current logical don't really care the missing gid in cache when initializing AV, I'm not sure if it's necessary to fail all the following path for such cache missing... Regards, Michael Wang > > Haggai
Re: [RFC PATCH v2] iommu/amd: gray the 'irq_remap_table' object for kmemleak
On 12/02/2015 06:36 PM, Catalin Marinas wrote: > On 2 December 2015 at 13:59, Borislav Petkov wrote: [snip] > > 1. The sl?b allocators themselves use page allocations, so kmemleak > could end up detecting the same pointer twice, hiding a potential leak > > 2. Most page allocations do not contain data/pointers relevant to > kmemleak (e.g. page cache pages), however the randomness of such data > greatly diminishes kmemleak's ability to detect real leaks > > Arguably, kmemleak could be made to detect both cases above by a > combination of page flags, additional annotations or specific page > alloc API. However, this has its own drawbacks in terms of code > complexity (potentially outside mm/kmemleak.c) and overhead. Thanks for the very nice explain :-) I used to thought overhead is the only concern, missing the point regarding allocator it self. Regards, Michael Wang > > Regarding a kmemleak_alloc() annotation like in the patch I suggested, > that's the second one I've seen needed outside alloc APIs (the first > one is commit f75782e4e067 - "block: kmemleak: Track the page > allocations for struct request"). If the number of such explicit > annotations stays small, it's better to keep it this way. > > There are other explicit annotations like kmemleak_not_leak() or > kmemleak_ignore() but these are for objects kmemleak knows about and > incorrectly reports them as leaks. Most of the time is because the > pointers to such objects are stored in a different form (e.g. physical > address). > > Anyway, kmemleak is not the only tool requiring annotations (take > spin_lock_nested() for example). If needed, we could do with an > additional page alloc/free API which informs kmemleak in the process > but I don't think it's worth it.
Re: [RFC PATCH v2] iommu/amd: gray the 'irq_remap_table' object for kmemleak
On 12/02/2015 03:13 PM, Borislav Petkov wrote: > On Wed, Dec 02, 2015 at 03:09:18PM +0100, Michael Wang wrote: >> This tool will help improve the kernel, AFAIK it's already made it's >> best, if you got any idea on how to make it even better that would be >> great, but at this moment, it still need few of care :-P > > I think you're replying to my emails without even reading what I said. > So I'm going to do the same and stop wasting my time. First of all, I respect all your reply, and reply regarding your point. Frankly speaking, I think you know all these already, it's not a big deal but you refuse to obey the rules setup by others, although it do help make things better, and benefit yourself too. If you refuse to make things better I'm totally fine, time is valuable for all of us :-) Regards, Michael Wang
Re: [RFC PATCH v2] iommu/amd: gray the 'irq_remap_table' object for kmemleak
On 12/02/2015 02:59 PM, Borislav Petkov wrote: > On Wed, Dec 02, 2015 at 02:48:47PM +0100, Michael Wang wrote: >> I'm not sure why amd-iommu use get_page not kmalloc to initialize the >> pointer table, but if it's necessary then the conflict will be there, >> it's not the fault of driver or kmemleak, but the design require them >> to cooperate with each other. > > So, according to you, we should go and "fix" all callers of > __get_free_pages() to make kmemleak happy. Then when the next new tool > comes along, we should "fix" another kernel API just so that the tools > are happy. That's the way we have to detect leak, no driver could get rid of the possibility of memory leaking, so it should respect the rule to help others locating the problem, if a driver full of false report then most likely folks will gradually lost interests on help fix leaking problem for it. > > Bzzt. Wrong! > > The tools should work without sprinkling their code everywhere. Driver > etc developers don't need to care about what tool they make happy or > not. Tools' hooks should be hidden in macro magic so that developers > don't care. This tool will help improve the kernel, AFAIK it's already made it's best, if you got any idea on how to make it even better that would be great, but at this moment, it still need few of care :-P Regards, Michael Wang
Re: [RFC PATCH v2] iommu/amd: gray the 'irq_remap_table' object for kmemleak
On 12/02/2015 02:40 PM, Borislav Petkov wrote: > On Wed, Dec 02, 2015 at 02:18:44PM +0100, Michael Wang wrote: [snip] > >> Yeah, but it would be better to solve it, otherwise whoever saw this >> report will need to go into the amd-iommu, make sure it's not a real >> leak, then change their testing script... > > No, you don't need to go into the iommu - you need to fix kmemleak. > > And frankly, I'm getting sick and tired of all those tools needing > special handling and us adding code just so that they're happy. If the > tools can't figure out something, they shouldn't warn just in case but > shut up instead. If you mean the design of kmemleak, IMHO it's not that bad. The problem is regarding performance, think about if kmemleak go into every page to find out pointers, I guess the whole system will stuck. I'm not sure why amd-iommu use get_page not kmalloc to initialize the pointer table, but if it's necessary then the conflict will be there, it's not the fault of driver or kmemleak, but the design require them to cooperate with each other. Regards, Michael Wang
Re: [RFC PATCH v2] iommu/amd: gray the 'irq_remap_table' object for kmemleak
Hi, Borislav On 12/02/2015 02:13 PM, Borislav Petkov wrote: > On Wed, Dec 02, 2015 at 02:01:55PM +0100, Michael Wang wrote: >> Yeah.. it's a little complicated since we have our own kernel tree and this >> won't be a problem for us, but we really prefer to help fix it in mainline >> too, as long as this is really a defect, so others could save time on >> research >> in future. > > Well, to keep it realistic and if it were me, I wouldn't even take such > a fix as it is apparently kmemleak's problem. Do you mean this could be a real kmemleak? Could you please provide more details? > > So you could fix your testing instead to ignore that error message now > that you know it is a false-positive. That should be easiest. > Yeah, but it would be better to solve it, otherwise whoever saw this report will need to go into the amd-iommu, make sure it's not a real leak, then change their testing script... Regards, Michael Wang
Re: [RFC PATCH v2] iommu/amd: gray the 'irq_remap_table' object for kmemleak
On 12/02/2015 01:56 PM, Joerg Roedel wrote: > On Wed, Dec 02, 2015 at 01:31:38PM +0100, Michael Wang wrote: >> It's not my work or your work... it's a defect in the module and maintainer >> should take responsibility on fixing it, correct? > > No, its a false positive from an in-kernel checking tool, the iommu > driver is correct. You just sent a patch to silence the false positive > report. Yeah, but caused by the driver :-P and have to be fixed in there too. > >> We're very willing to help, but as I mentioned we are out of resource for >> testing at this moment, but we can send you a new patch without testing, >> will that works for you? > > This should be testable on any AMD IOMMU system with working interrupt > remapping. I will probably have no time to test this, if you really > can't test yourself, try to get a Tested-by from someone else. Good point, anyone willing to help test the fix? Or better provide the new patch to be the author. Regards, Michael Wang > > > > Joerg
Re: [RFC PATCH v2] iommu/amd: gray the 'irq_remap_table' object for kmemleak
On 12/02/2015 01:53 PM, Borislav Petkov wrote: > On Wed, Dec 02, 2015 at 01:31:38PM +0100, Michael Wang wrote: >> It's not my work or your work... it's a defect in the module and maintainer >> should take responsibility on fixing it, correct? > > Well, you said "actually we just want to get rid of this annoying report > on obj won't leak..." > > It sounds to me like you want to have something fixed. So you do the > patch properly, add to the commit message why exactly you're doing it > and test it. Like everyone else. Yeah.. it's a little complicated since we have our own kernel tree and this won't be a problem for us, but we really prefer to help fix it in mainline too, as long as this is really a defect, so others could save time on research in future. But seems like we can only wait for another chance to confirm the another solution, frankly speaking I think we both will forgot this soon... fortunately it's not that critical :-P Regards, Michael Wang
Re: [RFC PATCH v2] iommu/amd: gray the 'irq_remap_table' object for kmemleak
On 12/02/2015 12:51 PM, Joerg Roedel wrote: > On Wed, Dec 02, 2015 at 12:38:03PM +0100, Michael Wang wrote: >> Joerg, this is really a tiny fix, would you mind to merge it into some >> of your cleanup patch and testing them together? we are not in hurry, >> just want to make sure the issue will get solved. > > I am not doing your work. You sent a patch, received feedback, and now > you can send a new patch based on it. Thats the process. If it addresses > the feedback I will merge it, but I will not scissor your patch > together. It's not my work or your work... it's a defect in the module and maintainer should take responsibility on fixing it, correct? We're very willing to help, but as I mentioned we are out of resource for testing at this moment, but we can send you a new patch without testing, will that works for you? Regards, Michael Wang > > > Joerg
Re: [RFC PATCH v2] iommu/amd: gray the 'irq_remap_table' object for kmemleak
On 12/02/2015 12:31 PM, Catalin Marinas wrote: > On 2 December 2015 at 10:56, Michael Wang wrote: [snip] > > I could copy your description but I don't currently have a way (nor > time) to test the patch. If you plan to test it anyway, please feel > free to include my diff (which I guess was badly re-formatted by > gmail), I don't really mind which author it is (I found it easier to > show a diff than explain in plain English ;)). Unfortunately that's the same on my side... we already close the ticket and I don't have resources to testing it again. Joerg, this is really a tiny fix, would you mind to merge it into some of your cleanup patch and testing them together? we are not in hurry, just want to make sure the issue will get solved. Regards, Michael Wang
Re: [RFC PATCH v2] iommu/amd: gray the 'irq_remap_table' object for kmemleak
On 12/02/2015 11:52 AM, Catalin Marinas wrote: [snip] >> >> Is there any more concern? actually we just want to get rid of this >> annoying report on obj won't leak, if you're going to create obj for >> 'irq_lookup_table' that's also fine for us, or will you pick this patch? > > My preference (from a kmemleak perspective) is to tell kmemleak about > the irq_lookup_table. Untested: I'm fine with both solution, will leave the decision to maintainer :-) BTW, could you please send a formal patch with descriptions? Regards, Michael Wang > > diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c > index 013bdfff2d4d..c41609f71cbe 100644 > --- a/drivers/iommu/amd_iommu_init.c > +++ b/drivers/iommu/amd_iommu_init.c > @@ -27,6 +27,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -1692,6 +1693,7 @@ static struct syscore_ops amd_iommu_syscore_ops = { > > static void __init free_on_init_error(void) > { > + kmemleak_free(irq_lookup_table); > free_pages((unsigned long)irq_lookup_table, > get_order(rlookup_table_size)); > > @@ -1906,6 +1908,7 @@ static int __init early_amd_iommu_init(void) > irq_lookup_table = (void *)__get_free_pages( > GFP_KERNEL | __GFP_ZERO, > get_order(rlookup_table_size)); > + kmemleak_alloc(irq_lookup_table, rlookup_table_size, 1, GFP_KERNEL); > if (!irq_lookup_table) > goto out; > }
Re: [RFC PATCH v2] iommu/amd: gray the 'irq_remap_table' object for kmemleak
Hi, Joerg On 11/25/2015 04:14 PM, Michael Wang wrote: > On 11/25/2015 04:08 PM, Joerg Roedel wrote: [snip] >>> This is caused by the 'irq_lookup_table' was allocated with >>> __get_free_pages() which won't create kmemleak object, thus it's >>> pointers won't be count as referencing 'irq_remap_table' in >>> kmemleak scan. >> >> Isn't it better to allocate the kmemleak object manually instead of >> ignoring all irq-table pointers? With this patch we might not notice any >> real leak of irq-tables. > > We've considered that too, but found that the irq-tables is not > dynamically alloc/free, they won't be freed once initialized, so there > is no leaking for such object :-) Is there any more concern? actually we just want to get rid of this annoying report on obj won't leak, if you're going to create obj for 'irq_lookup_table' that's also fine for us, or will you pick this patch? Regards, Michael Wang > > Regards, > Michael Wang > >> >> >> >> Joerg >>
Re: [RFC PATCH v2] iommu/amd: gray the 'irq_remap_table' object for kmemleak
On 11/25/2015 04:08 PM, Joerg Roedel wrote: > On Fri, Nov 20, 2015 at 12:33:50PM +0100, Michael Wang wrote: >> The kmemleak testing on 3.18.24 show: >> >> unreferenced object 0x880233ff9010 (size 16): >> comm "swapper/0", pid 1, jiffies 4294937440 (age 2010.490s) >> hex dump (first 16 bytes): >> 0a 0a 00 00 20 00 00 00 00 44 fb 33 02 88 ff ff D.3 >> backtrace: >> [] create_object+0x10d/0x2d0 >> [] kmemleak_alloc+0x5b/0xc0 >> [] kmem_cache_alloc_trace+0xb9/0x160 >> [] get_irq_table+0x151/0x380 >> >> This is caused by the 'irq_lookup_table' was allocated with >> __get_free_pages() which won't create kmemleak object, thus it's >> pointers won't be count as referencing 'irq_remap_table' in >> kmemleak scan. > > Isn't it better to allocate the kmemleak object manually instead of > ignoring all irq-table pointers? With this patch we might not notice any > real leak of irq-tables. We've considered that too, but found that the irq-tables is not dynamically alloc/free, they won't be freed once initialized, so there is no leaking for such object :-) Regards, Michael Wang > > > > Joerg
Re: [RFC PATCH v2] iommu/amd: gray the 'irq_remap_table' object for kmemleak
Hi, Joerg On 11/20/2015 12:33 PM, Michael Wang wrote: > The kmemleak testing on 3.18.24 show: > > unreferenced object 0x880233ff9010 (size 16): > comm "swapper/0", pid 1, jiffies 4294937440 (age 2010.490s) > hex dump (first 16 bytes): > 0a 0a 00 00 20 00 00 00 00 44 fb 33 02 88 ff ff D.3 > backtrace: > [] create_object+0x10d/0x2d0 > [] kmemleak_alloc+0x5b/0xc0 > [] kmem_cache_alloc_trace+0xb9/0x160 > [] get_irq_table+0x151/0x380 > > This is caused by the 'irq_lookup_table' was allocated with > __get_free_pages() which won't create kmemleak object, thus it's > pointers won't be count as referencing 'irq_remap_table' in > kmemleak scan. > > The 'irq_remap_table' won't be freed after initialized, doesn't > make sense to check it's leaking. > > This patch mark the 'irq_remap_table' object as 'gray' to stop > the 'false positives' report. Any comments on this one? Regards, Michael Wang > > Signed-off-by: Michael Wang > --- > v2: > Use kmemleak_not_leak() instead of kmemleak_ignore() since > the 'irq_remap_table' itself also contain pointer. > > drivers/iommu/amd_iommu.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c > index 8b2be1e..87a1a88 100644 > --- a/drivers/iommu/amd_iommu.c > +++ b/drivers/iommu/amd_iommu.c > @@ -3603,6 +3603,7 @@ static struct irq_remap_table *get_irq_table(u16 devid, > bool ioapic) > } > > irq_lookup_table[devid] = table; > + kmemleak_not_leak(table); > set_dte_irq_entry(devid, table); > iommu_flush_dte(iommu, devid); > if (devid != alias) {
Re: nfnetlink warnings
On 11/23/2015 11:32 AM, Borislav Petkov wrote: > On Mon, Nov 23, 2015 at 11:20:18AM +0100, Michael Wang wrote: >> Who want to do that would take responsibility to make an else branch at >> that time, but reserve the branch at this moment sounds unnecessary, and >> not that pretty frankly speaking. > > Actually, I was looking for the better idea which doesn't uglify the > code. And here it is: > > https://lkml.kernel.org/r/5585663.OcpAQiytKY@wuerfel Looks even better :-) Regards, Michael Wang > > :-)
Re: nfnetlink warnings
On 11/23/2015 10:54 AM, Borislav Petkov wrote: > Hi Michael, > > On Mon, Nov 23, 2015 at 10:49:34AM +0100, Michael Wang wrote: >> Why not just initialized it as NULL, or mark it as uninitialized_var()? > > because I'd like us to save us the redundant NULL initialization in the > if-case. Well, I would vote initialized with NULL, rather than use another else branch to do the same thing. > > I'm not saying any of the approaches are good visually, though. Who > knows, someone might have a better idea like, maybe "Oh, I wanted to > rewrite that code and this handling is going to be different anyway ..." > or so. Or something to that effect. Who want to do that would take responsibility to make an else branch at that time, but reserve the branch at this moment sounds unnecessary, and not that pretty frankly speaking. > > Btw, please do not top-post. Enjoy ;-) Regards, Michael Wang > > Thanks.
Re: nfnetlink warnings
Hi, Borislav Why not just initialized it as NULL, or mark it as uninitialized_var()? Regards, Michael Wang On 11/23/2015 10:36 AM, Borislav Petkov wrote: > Hey, > > so I keep getting those since recently: > > net/netfilter/nfnetlink_queue.c:519:19: warning: ‘nfnl_ct’ may be used > uninitialized in this function [-Wmaybe-uninitialized] > if (ct && nfnl_ct->build(skb, ct, ctinfo, NFQA_CT, NFQA_CT_INFO) < 0) >^ > net/netfilter/nfnetlink_queue.c:316:23: note: ‘nfnl_ct’ was declared here > struct nfnl_ct_hook *nfnl_ct; >^ > net/netfilter/nfnetlink_queue.c: In function ‘nfqnl_recv_verdict’: > net/netfilter/nfnetlink_queue.c:1083:11: warning: ‘nfnl_ct’ may be used > uninitialized in this function [-Wmaybe-uninitialized] > nfnl_ct->seq_adjust(entry->skb, ct, ctinfo, diff); >^ > > and was thinking can we shut them up like this? I know, it is ugly :-\ > > I mean, it is obvious in both cases that nfnl_ct won't be used if ct is > not set but apparently gcc can't see that far... > > --- > diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c > index 7d81d280cb4f..cd61b0b5c413 100644 > --- a/net/netfilter/nfnetlink_queue.c > +++ b/net/netfilter/nfnetlink_queue.c > @@ -372,6 +372,8 @@ nfqnl_build_packet_message(struct net *net, struct > nfqnl_instance *queue, > if (ct != NULL) > size += nfnl_ct->build_size(ct); > } > + } else { > + nfnl_ct = NULL; > } > > if (queue->flags & NFQA_CFG_F_UID_GID) { > @@ -1069,6 +1071,8 @@ nfqnl_recv_verdict(struct sock *ctnl, struct sk_buff > *skb, > nfnl_ct = rcu_dereference(nfnl_ct_hook); > if (nfnl_ct != NULL) > ct = nfqnl_ct_parse(nfnl_ct, nlh, nfqa, entry, &ctinfo); > + } else { > + nfnl_ct = NULL; > } > > if (nfqa[NFQA_PAYLOAD]) {
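For readers who want to see the pattern behind the warning in isolation, here is a minimal user-space sketch. It is not the nfnetlink code: struct toy_hook, toy_current_hook and toy_build_message() are hypothetical names made up for illustration. A pointer is assigned only inside one branch, and later dereferenced only when a second variable, set in that same branch, is non-NULL. Initializing the pointer at its declaration is the alternative suggested in the reply above to adding an else branch.

```c
#include <stddef.h>

/* Hypothetical stand-in for a hook structure such as struct nfnl_ct_hook. */
struct toy_hook {
	int (*build)(int ct);
};

static int toy_build(int ct)
{
	return ct * 2;
}

static struct toy_hook toy_registered_hook = { toy_build };

/* Stand-in for a globally published hook pointer; may be NULL. */
static struct toy_hook *toy_current_hook = &toy_registered_hook;

/* Returns the built value, or -1 when no conntrack data is attached. */
static int toy_build_message(int want_conntrack, int ct_value)
{
	struct toy_hook *hook = NULL;	/* initialized once, no else branch needed */
	const int *ct = NULL;

	if (want_conntrack) {
		hook = toy_current_hook;
		if (hook != NULL)
			ct = &ct_value;
	}

	/*
	 * 'ct' can only be non-NULL if the branch above ran and found a
	 * hook, so 'hook' is valid here. Without the initializer a
	 * compiler may fail to prove that and warn.
	 */
	if (ct != NULL)
		return hook->build(*ct);
	return -1;
}
```

The cost of the initializer is one store on the path that assigns the pointer again anyway, which is the redundancy the original poster wanted to avoid.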
Re: [RFC PATCH v2] iommu/amd: gray the 'irq_remap_table' object for kmemleak
On 11/20/2015 12:33 PM, Michael Wang wrote:
> The kmemleak testing on 3.18.24 show:
>
> unreferenced object 0x880233ff9010 (size 16):
>   comm "swapper/0", pid 1, jiffies 4294937440 (age 2010.490s)
>   hex dump (first 16 bytes):
>     0a 0a 00 00 20 00 00 00 00 44 fb 33 02 88 ff ff D.3
>   backtrace:
>     [] create_object+0x10d/0x2d0
>     [] kmemleak_alloc+0x5b/0xc0
>     [] kmem_cache_alloc_trace+0xb9/0x160
>     [] get_irq_table+0x151/0x380
>
> This is caused by the 'irq_lookup_table' was allocated with
> __get_free_pages() which won't create kmemleak object, thus it's
> pointers won't be count as referencing 'irq_remap_table' in
> kmemleak scan.
>
> The 'irq_remap_table' won't be freed after initialized, doesn't
> make sense to check it's leaking.
>
> This patch mark the 'irq_remap_table' object as 'gray' to stop
> the 'false positives' report.
>
> Signed-off-by: Michael Wang

Reported-by: Danil Kipnis

Regards,
Michael Wang

> ---
> v2:
>     Use kmemleak_not_leak() instead of kmemleak_ignore() since
>     the 'irq_remap_table' itself also contain pointer.
>
>  drivers/iommu/amd_iommu.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 8b2be1e..87a1a88 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -3603,6 +3603,7 @@ static struct irq_remap_table *get_irq_table(u16 devid, bool ioapic)
>  	}
>
>  	irq_lookup_table[devid] = table;
> +	kmemleak_not_leak(table);
>  	set_dte_irq_entry(devid, table);
>  	iommu_flush_dte(iommu, devid);
>  	if (devid != alias) {
>
[RFC PATCH v2] iommu/amd: gray the 'irq_remap_table' object for kmemleak
The kmemleak testing on 3.18.24 shows:

unreferenced object 0x880233ff9010 (size 16):
  comm "swapper/0", pid 1, jiffies 4294937440 (age 2010.490s)
  hex dump (first 16 bytes):
    0a 0a 00 00 20 00 00 00 00 44 fb 33 02 88 ff ff D.3
  backtrace:
    [] create_object+0x10d/0x2d0
    [] kmemleak_alloc+0x5b/0xc0
    [] kmem_cache_alloc_trace+0xb9/0x160
    [] get_irq_table+0x151/0x380

This happens because 'irq_lookup_table' is allocated with
__get_free_pages(), which does not create a kmemleak object, so its
pointers are not counted as references to 'irq_remap_table' during
the kmemleak scan.

The 'irq_remap_table' is never freed after initialization, so it
doesn't make sense to check whether it is leaking.

This patch marks the 'irq_remap_table' object as 'gray' to stop
the false-positive reports.

Signed-off-by: Michael Wang
---
v2:
    Use kmemleak_not_leak() instead of kmemleak_ignore() since
    the 'irq_remap_table' itself also contains a pointer.

 drivers/iommu/amd_iommu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 8b2be1e..87a1a88 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3603,6 +3603,7 @@ static struct irq_remap_table *get_irq_table(u16 devid, bool ioapic)
 	}

 	irq_lookup_table[devid] = table;
+	kmemleak_not_leak(table);
 	set_dte_irq_entry(devid, table);
 	iommu_flush_dte(iommu, devid);
 	if (devid != alias) {
-- 
2.1.4
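The v2 note (not_leak instead of ignore because the table itself holds a pointer) can be illustrated with a toy model of kmemleak's object colors. This is only a sketch under stated assumptions: `struct obj`, `toy_scan()` and `leaks_with_table_color()` are hypothetical names, and the real scanner works on raw memory and reference counts rather than an explicit pointer array. The model keeps just the documented behavior: white objects that stay unreferenced are reported, gray objects are never reported but are scanned, and black objects are neither reported nor scanned.

```c
#include <stddef.h>

enum color { WHITE, GRAY, BLACK };

#define MAX_REFS 2

/* Toy tracked object: a color plus the pointers stored inside it. */
struct obj {
	enum color color;
	struct obj *refs[MAX_REFS];
};

/*
 * Toy scan: anything pointed to from a GRAY object becomes GRAY.
 * BLACK objects are skipped, so what they point to is not discovered.
 * Returns how many objects are still WHITE, i.e. would be reported.
 */
int toy_scan(struct obj **objs, size_t n)
{
	int changed = 1;
	int leaks = 0;
	size_t i, j;

	while (changed) {
		changed = 0;
		for (i = 0; i < n; i++) {
			if (objs[i]->color != GRAY)
				continue;
			for (j = 0; j < MAX_REFS; j++) {
				struct obj *target = objs[i]->refs[j];

				if (target && target->color == WHITE) {
					target->color = GRAY;
					changed = 1;
				}
			}
		}
	}

	for (i = 0; i < n; i++)
		if (objs[i]->color == WHITE)
			leaks++;
	return leaks;
}

/*
 * Scenario from the patch, simplified: the table is referenced only from
 * untracked memory, and it holds a pointer to one more tracked allocation.
 */
int leaks_with_table_color(enum color table_color)
{
	struct obj child = { WHITE, { NULL, NULL } };
	struct obj table = { table_color, { &child, NULL } };
	struct obj *all[] = { &table, &child };

	return toy_scan(all, 2);
}
```

In this model an untouched (white) table yields two reports, a black (ignored) table still yields one report for the allocation it points to, and a gray (not-leak) table yields none, which matches the reasoning given for v2.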
[RFC PATCH] iommu/amd: make kmemleak ignore the 'irq_remap_table' object
The kmemleak testing on 3.18.24 shows:

unreferenced object 0x880233ff9010 (size 16):
  comm "swapper/0", pid 1, jiffies 4294937440 (age 2010.490s)
  hex dump (first 16 bytes):
    0a 0a 00 00 20 00 00 00 00 44 fb 33 02 88 ff ff D.3
  backtrace:
    [] create_object+0x10d/0x2d0
    [] kmemleak_alloc+0x5b/0xc0
    [] kmem_cache_alloc_trace+0xb9/0x160
    [] get_irq_table+0x151/0x380

This happens because 'irq_lookup_table' is allocated with
__get_free_pages(), which does not create a kmemleak object, so its
pointers are not counted as references during the kmemleak scan.

The allocated 'irq_remap_table' is never freed after initialization,
so it doesn't make sense to let kmemleak scan it.

This patch marks the 'irq_remap_table' object as 'ignored' to stop
the false-positive reports.

Signed-off-by: Michael Wang
---
 drivers/iommu/amd_iommu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 8b2be1e..87a1a88 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3603,6 +3603,7 @@ static struct irq_remap_table *get_irq_table(u16 devid, bool ioapic)
 	}

 	irq_lookup_table[devid] = table;
+	kmemleak_ignore(table);
 	set_dte_irq_entry(devid, table);
 	iommu_flush_dte(iommu, devid);
 	if (devid != alias) {
-- 
2.1.4
Re: [PATCH RFC v2] Documentation/infiniband: Add docs for rdma-helpers
On 05/18/2015 06:58 PM, Doug Ledford wrote:
[snip]
>> I see :-) I've not worked with kdoc yet, and I'm not sure if there are
>> any guidelines on how to write the header of an inline func for kdoc?
>
> It's an automated tool thing. Any comment section that starts with /**
> is automatically included as a kdoc. Then there is an expected format
> after that. See Documentation/kernel-doc-nano-HOWTO.txt.

Got it :-)

>
>>>
>>> Just because I want to move this along versus waiting for another
>>> respin, I'm going to copy and paste these into those locations and clean
>>> up the changelog when I integrate this patch.
>>
>> Got it, if there is anything I could help, please let me know ;-)
>
> I'm sending the patch for review, please let me know if you are OK with
> how I handled the attribution.

The definition is far more detailed and accurate; as far as I understand it is already good enough and should benefit developers a lot ;-)

Regards,
Michael Wang

>
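For readers who have not written one, a kernel-doc header on an inline helper looks roughly like the sketch below. The helper `demo_cap_mad()` and the flag `DEMO_CAP_MAD` are hypothetical and exist only to carry the comment; they are not the helpers from ib_verbs.h, and the wording of the comment is illustrative rather than what was eventually merged.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bit, for illustration only. */
#define DEMO_CAP_MAD (1u << 0)

/**
 * demo_cap_mad() - Check whether a port supports management datagrams.
 * @port_caps: Capability bitmask of the port being queried.
 *
 * The opening marker of this comment (a slash followed by two asterisks)
 * is what tells the kernel-doc tool to extract it. The first line names
 * the function and gives a one-line summary, each parameter gets an
 * "@name:" line, and free-form description follows after a blank line.
 *
 * Return: true if the port supports MADs, false otherwise.
 */
static inline bool demo_cap_mad(uint32_t port_caps)
{
	return (port_caps & DEMO_CAP_MAD) != 0;
}
```

Keeping the description next to the inline function, as Jason requested, means the text is seen by anyone reading the header and cannot drift away from the code as easily as a separate file under Documentation/.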
Re: [PATCH RFC v2] Documentation/infiniband: Add docs for rdma-helpers
On 05/18/2015 05:21 PM, Doug Ledford wrote: [snip] >> >> I'll put the highlights and changelog under '---' in next version, is it >> looks like this? > > We're still missing Jason's feedback request though. Specifically, he > pointed out that kdocs are usually not done in Documentation/*, they are > done in the .c files where the function is (or the .h file if the > function is an inline, which these all are). So, you included some > limited documentation for each of these items in your original patches > that added them. His request was that you put this expanded information > not in Documentation/infiniband where someone has to go looking for it, > but as part of the kdoc header for each of the various helpers in > ib_verbs.h itself. I see :-) I've not work with the kdoc yet, not sure if there is any guidelines on how to write the header of inline func for kdoc? > > Just because I want to move this along versus waiting for another > respin, I'm going to copy and paste these into those locations and clean > up the changelog when I integrate this patch. Got it, if there is anything I could help, please let me know ;-) Regards, Michael Wang > >> >> Subject: [PATCH RFC v3] Documentation/infiniband: Add docs for rdma-helpers >> >> This is the following patch for: >> https://lkml.org/lkml/2015/5/5/417 >> which try to document the settled rdma_cap_XX(). >> >> Signed-off-by: Michael Wang >> --- >> Highlights: >> There could be many missing/mistakes/misunderstanding, please don't >> be hesitate to point out the issues, any suggestions to improve or >> complete the description are very welcomed ;-) >> >> v2: >> * Merge the descriptions from Doug: >> http://www.spinics.net/lists/linux-rdma/msg25172.html >> >> v3: >> ... 
>> >> Documentation/infiniband/rdma_helpers.txt | 79 >> +++ >> 1 file changed, 79 insertions(+) >> create mode 100644 Documentation/infiniband/rdma_helpers.txt >> >> diff --git a/Documentation/infiniband/rdma_helpers.txt >> b/Documentation/infiniband/rdma_helpers.txt >> new file mode 100644 >> index 000..be9416d >> --- /dev/null >> +++ b/Documentation/infiniband/rdma_helpers.txt >> >> Regards, >> Michael Wang >> >>> >>>> >>>> Signed-off-by: Michael Wang >>>> --- >>>> Documentation/infiniband/rdma_helpers.txt | 79 >>>> +++ >>>> 1 file changed, 79 insertions(+) >>>> create mode 100644 Documentation/infiniband/rdma_helpers.txt >>>> >>>> diff --git a/Documentation/infiniband/rdma_helpers.txt >>>> b/Documentation/infiniband/rdma_helpers.txt >>>> new file mode 100644 >>>> index 000..be9416d >>>> --- /dev/null >>>> +++ b/Documentation/infiniband/rdma_helpers.txt >>>> @@ -0,0 +1,79 @@ >>>> +RDMA HELPERS >>>> + >>>> + The following helpers are used to check the specific capabilities of a >>>> + particular port before utilizing those capabilities. >>>> + >>>> +rdma_cap_ib_mad- Infiniband Management Datagrams. >>>> +rdma_cap_ib_smi- Infiniband Subnet Management Interface. >>>> +rdma_cap_ib_cm - Infiniband Communication Manager. >>>> +rdma_cap_iw_cm - IWARP Communication Manager. >>>> +rdma_cap_ib_sa - Infiniband Subnet Administration. >>>> +rdma_cap_ib_mcast - Infiniband Multicast Join/Leave Protocol. >>>> +rdma_cap_read_multi_sge- RDMA Read Work Request Support Multiple >>>> SGE. >>>> +rdma_cap_af_ib - Native Infiniband Address. >>>> +rdma_cap_eth_ah- InfiniBand Transport With Ethernet >>>> Address. >>>> + >>>> +USAGE >>>> + >>>> + if (rdma_cap_XX(device, i)) { >>>> + /* The port i of device support XX */ >>>> + ... >>>> + } else { >>>> + /* The port i of device don't support XX */ >>>> + ... >>>> + } >>>> + >>>> + rdma_cap_ib_mad >>>> + --- >>>> +Management Datagrams (MAD) are a required part of the InfiniBand >>>> +specification and are supported on all InfiniBand devices. 
A slightly >>>> +extended version are also supported on OPA i
Re: [PATCH RFC v2] Documentation/infiniband: Add docs for rdma-helpers
Hi, Or On 05/18/2015 11:47 AM, Or Gerlitz wrote: [snip] >> Highlights: >> There could be many missing/mistakes/misunderstanding, please don't >> be hesitate to point out the issues, any suggestions to improve or >> complete the description are very welcomed ;-) > > Michael, none of what you wrote above belongs to the change-log which > is going to stay for-ever in the upstream git repo. You should put it > all belong the --- line after your S.O.B and add proper change log > telling what this patch is about Thanks for point out this for me :-) I'll put the highlights and changelog under '---' in next version, is it looks like this? Subject: [PATCH RFC v3] Documentation/infiniband: Add docs for rdma-helpers This is the following patch for: https://lkml.org/lkml/2015/5/5/417 which try to document the settled rdma_cap_XX(). Signed-off-by: Michael Wang --- Highlights: There could be many missing/mistakes/misunderstanding, please don't be hesitate to point out the issues, any suggestions to improve or complete the description are very welcomed ;-) v2: * Merge the descriptions from Doug: http://www.spinics.net/lists/linux-rdma/msg25172.html v3: ... 
Documentation/infiniband/rdma_helpers.txt | 79 +++ 1 file changed, 79 insertions(+) create mode 100644 Documentation/infiniband/rdma_helpers.txt diff --git a/Documentation/infiniband/rdma_helpers.txt b/Documentation/infiniband/rdma_helpers.txt new file mode 100644 index 000..be9416d --- /dev/null +++ b/Documentation/infiniband/rdma_helpers.txt Regards, Michael Wang > >> >> Signed-off-by: Michael Wang >> --- >> Documentation/infiniband/rdma_helpers.txt | 79 >> +++ >> 1 file changed, 79 insertions(+) >> create mode 100644 Documentation/infiniband/rdma_helpers.txt >> >> diff --git a/Documentation/infiniband/rdma_helpers.txt >> b/Documentation/infiniband/rdma_helpers.txt >> new file mode 100644 >> index 000..be9416d >> --- /dev/null >> +++ b/Documentation/infiniband/rdma_helpers.txt >> @@ -0,0 +1,79 @@ >> +RDMA HELPERS >> + >> + The following helpers are used to check the specific capabilities of a >> + particular port before utilizing those capabilities. >> + >> +rdma_cap_ib_mad- Infiniband Management Datagrams. >> +rdma_cap_ib_smi- Infiniband Subnet Management Interface. >> +rdma_cap_ib_cm - Infiniband Communication Manager. >> +rdma_cap_iw_cm - IWARP Communication Manager. >> +rdma_cap_ib_sa - Infiniband Subnet Administration. >> +rdma_cap_ib_mcast - Infiniband Multicast Join/Leave Protocol. >> +rdma_cap_read_multi_sge- RDMA Read Work Request Support Multiple >> SGE. >> +rdma_cap_af_ib - Native Infiniband Address. >> +rdma_cap_eth_ah- InfiniBand Transport With Ethernet Address. >> + >> +USAGE >> + >> + if (rdma_cap_XX(device, i)) { >> + /* The port i of device support XX */ >> + ... >> + } else { >> + /* The port i of device don't support XX */ >> + ... >> + } >> + >> + rdma_cap_ib_mad >> + --- >> +Management Datagrams (MAD) are a required part of the InfiniBand >> +specification and are supported on all InfiniBand devices. A slightly >> +extended version are also supported on OPA interfaces. 
>> + >> + rdma_cap_ib_smi >> + --- >> +Subnet Management Interface (SMI) will handle SMP packet from SM >> +in an infiniband fabric. >> + >> + rdma_cap_ib_cm >> + --- >> +Communication Manager (CM) service, used to ease the process of >> connecting >> +to a remote host. The IB-CM can be used to connect to remote hosts >> using >> +either InfiniBand or RoCE connections, iWARP has its own CM. >> + >> + rdma_cap_iw_cm >> + --- >> +iWARP Communication Manager (CM), Similar to the IB-CM, but only used on >> +iWARP devices. >> + >> + rdma_cap_ib_sa >> + --- >> +Subnet Administration (SA) is the database built by SM in an >> +infiniband fabric. >> + >> + rdma_cap_ib_mcast >> + --- >> +InfiniBand (and OPA) use a different multicast mechanism rather than >> +traditional IP multicast found on Ethernet devices. If this is true, >> then >> +traditional IPv4/IPv6 multicast is handled by the IPoIB layer and direct >> +multicast joins and leaves
[PATCH RFC v2] Documentation/infiniband: Add docs for rdma-helpers
Since v1:
  * Merge the descriptions from Doug:
    http://www.spinics.net/lists/linux-rdma/msg25172.html

This is the follow-up patch for:
  https://lkml.org/lkml/2015/5/5/417
which tries to document the settled rdma_cap_XX() helpers.

Highlights:
  There may be many omissions, mistakes, or misunderstandings; please
  don't hesitate to point out issues. Any suggestions to improve or
  complete the descriptions are very welcome ;-)

Signed-off-by: Michael Wang
---
 Documentation/infiniband/rdma_helpers.txt | 79 +++
 1 file changed, 79 insertions(+)
 create mode 100644 Documentation/infiniband/rdma_helpers.txt

diff --git a/Documentation/infiniband/rdma_helpers.txt b/Documentation/infiniband/rdma_helpers.txt
new file mode 100644
index 000..be9416d
--- /dev/null
+++ b/Documentation/infiniband/rdma_helpers.txt
@@ -0,0 +1,79 @@
+RDMA HELPERS
+
+  The following helpers are used to check the specific capabilities of a
+  particular port before utilizing those capabilities.
+
+    rdma_cap_ib_mad         - Infiniband Management Datagrams.
+    rdma_cap_ib_smi         - Infiniband Subnet Management Interface.
+    rdma_cap_ib_cm          - Infiniband Communication Manager.
+    rdma_cap_iw_cm          - IWARP Communication Manager.
+    rdma_cap_ib_sa          - Infiniband Subnet Administration.
+    rdma_cap_ib_mcast       - Infiniband Multicast Join/Leave Protocol.
+    rdma_cap_read_multi_sge - RDMA Read Work Request Support Multiple SGE.
+    rdma_cap_af_ib          - Native Infiniband Address.
+    rdma_cap_eth_ah         - InfiniBand Transport With Ethernet Address.
+
+USAGE
+
+  if (rdma_cap_XX(device, i)) {
+	/* The port i of device support XX */
+	...
+  } else {
+	/* The port i of device don't support XX */
+	...
+  }
+
+  rdma_cap_ib_mad
+  ---
+    Management Datagrams (MAD) are a required part of the InfiniBand
+    specification and are supported on all InfiniBand devices. A slightly
+    extended version are also supported on OPA interfaces.
+
+  rdma_cap_ib_smi
+  ---
+    Subnet Management Interface (SMI) will handle SMP packet from SM
+    in an infiniband fabric.
+
+  rdma_cap_ib_cm
+  ---
+    Communication Manager (CM) service, used to ease the process of connecting
+    to a remote host. The IB-CM can be used to connect to remote hosts using
+    either InfiniBand or RoCE connections, iWARP has its own CM.
+
+  rdma_cap_iw_cm
+  ---
+    iWARP Communication Manager (CM), Similar to the IB-CM, but only used on
+    iWARP devices.
+
+  rdma_cap_ib_sa
+  ---
+    Subnet Administration (SA) is the database built by SM in an
+    infiniband fabric.
+
+  rdma_cap_ib_mcast
+  ---
+    InfiniBand (and OPA) use a different multicast mechanism rather than
+    traditional IP multicast found on Ethernet devices. If this is true, then
+    traditional IPv4/IPv6 multicast is handled by the IPoIB layer and direct
+    multicast joins and leaves are handled per the InfiniBand specifications.
+
+  rdma_cap_read_multi_sge
+  ---
+    Certain devices (iWARP in particular) have restrictions on the number of
+    scatter gather elements that can be present in an RDMA READ work request,
+    this is true if the device does not have that restriction.
+
+  rdma_cap_af_ib
+  ---
+    Many code paths for traditional InfiniBand and RoCE links are the same,
+    but need minor differences to accommodate the different addresses on the
+    two types of connections. This helper is true when the address of the
+    specific connection is of the InfiniBand native variety.
+
+  rdma_cap_eth_ah
+  ---
+    Queue Pair is InfiniBand transport, but uses Ethernet address instead
+    of native InfiniBand address (aka, this is a RoCE QP, and that means
+    ethertype 0x8915 + GRH for RoCEv1 and IP/UDP to well known UDP port for
+    RoCEv2), this is true when the address family of the specific queue pair
+    is of the Ethernet (RoCE) variety.
-- 
2.1.0
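The USAGE pattern in the patch can be tried out with a small user-space model. Everything below is an assumption made for illustration: `struct toy_device`, the `TOY_CAP_*` values, zero-based port indexing, and the `toy_*` functions are invented here and are not the kernel's data structures or flag values. The sketch only shows the shape of a per-port capability check and how a caller branches on it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values; the real kernel constants differ. */
enum toy_cap {
	TOY_CAP_IB_MAD = 1u << 0,
	TOY_CAP_IB_CM  = 1u << 1,
	TOY_CAP_IW_CM  = 1u << 2
};

#define TOY_MAX_PORTS 4

/* Toy device: one capability bitmask per port (zero-based in this model). */
struct toy_device {
	uint8_t num_ports;
	uint32_t port_caps[TOY_MAX_PORTS];
};

/* Common check shared by the per-capability helpers below. */
static bool toy_port_has(const struct toy_device *dev, uint8_t port,
			 uint32_t cap)
{
	if (port >= dev->num_ports)
		return false;	/* unknown port: report no capability */
	return (dev->port_caps[port] & cap) != 0;
}

bool toy_cap_ib_cm(const struct toy_device *dev, uint8_t port)
{
	return toy_port_has(dev, port, TOY_CAP_IB_CM);
}

bool toy_cap_iw_cm(const struct toy_device *dev, uint8_t port)
{
	return toy_port_has(dev, port, TOY_CAP_IW_CM);
}

/* Caller side: branch on the capability instead of on the device type. */
const char *toy_pick_cm(const struct toy_device *dev, uint8_t port)
{
	if (toy_cap_ib_cm(dev, port))
		return "ib-cm";
	if (toy_cap_iw_cm(dev, port))
		return "iw-cm";
	return "none";
}
```

The point of the per-port query is that a single device may expose ports with different capabilities, so the caller asks about the port it is about to use rather than about the device as a whole.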
Re: [PATCH RFC] Documentation/infiniband: Add docs for rdma-helpers
On 05/15/2015 04:40 PM, Doug Ledford wrote:
[snip]
>
> The test itself doesn't mean that. It means we need a RoCE address
> (it's true when transport is IB and link layer is Ethernet). That we
> *use* it during connectionless communication because we have to generate
> our own address vector for the packet while during connected queue pair
> use the address vector is created by the card using the queue pair
> information is just the circumstance of its use. And even though a
> disconnected queue pair isn't solidly connected to a remote endpoint, it
> is solidly bound to an adapter that requires either an IB or Ethernet
> address family. Maybe this to resolve your issue with the wording:

Thanks for the explanation :-) The term 'connectionless' still sounds a little strange to me when it just means there is no HW support for creating the address vector, but I can understand the concept.

>
> This helper is true when the address family of this queue pair is of the
> Ethernet (RoCE) variety.

Sounds good, will be merged in next version :-)

Regards,
Michael Wang

>
>
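Doug's parenthetical (the test is true when the transport is IB and the link layer is Ethernet) can be written down as a predicate. The sketch below is a toy: the enums, `struct toy_port`, and `toy_cap_eth_ah()` are hypothetical names chosen for this example and say nothing about how the kernel actually stores or derives the capability.

```c
#include <stdbool.h>

/* Hypothetical enums for illustration; not the kernel's definitions. */
enum toy_transport { TOY_TRANSPORT_IB, TOY_TRANSPORT_IWARP };
enum toy_link_layer { TOY_LINK_INFINIBAND, TOY_LINK_ETHERNET };

struct toy_port {
	enum toy_transport transport;
	enum toy_link_layer link_layer;
};

/* True only for the RoCE combination: IB transport over an Ethernet link. */
bool toy_cap_eth_ah(const struct toy_port *port)
{
	return port->transport == TOY_TRANSPORT_IB &&
	       port->link_layer == TOY_LINK_ETHERNET;
}
```

Note that nothing in the predicate mentions connected or connectionless use, which is the distinction Doug draws: the helper describes the address family the port needs, not the situation in which the address vector gets built.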
Re: [PATCH RFC] Documentation/infiniband: Add docs for rdma-helpers
On 05/15/2015 04:27 PM, Doug Ledford wrote: [snip] >> >> Me too used to think it's 'connection', while I found some docs explain >> this as 'communication'... but anyway, 'connection' sounds >> more close to what it did in kernel :-) > > That's kind of what I thought. Anyway, it's communication management > (which to me is a gross abuse of the english language for which the IBTA > should be appropriately chastised), but that doesn't mean that lower > down in the more descriptive area of text that we can't call out that > this is really for establishing a connection and that once your > connection is established and you *truly* want to communicate, this does > nothing. I see :-) we can reserve the communication management as the definition of CM, to obey the standard, meanwhile give some description related to connection below in the long description. > [snip] >> Shall we put this long description into USAGE? Here maybe list >> all the helpers to give some quick overview with a brief >> description, what's your opinion? > > Given how we have a more complete description of this below, it need not > have such a lengthy description here. Got it :-) Regards, Michael Wang >> >>>> + >>>> +USAGE >>>> + >>>> + if (rdma_cap_XX(device, i)) { >>>> + /* The port i of device support XX */ >>>> + ... >>>> + } else { >>>> + /* The port i of device don't support XX */ >>>> + ... >>>> + } >>>> + >>>> + rdma_cap_ib_mad >>>> + --- >>>> +Management Datagrams (MAD) is the prototype of management packet >>>> +to be used by all the kinds of infiniband managers, use the helper >>>> +to verify the port before utilize related features. >>> Management Datagrams (MAD) are a required part of the InfiniBand >>> specification and are supported on all InfiniBand devices. A slightly >>> extended version are also supported on OPA interfaces. >>> >>> I would drop all instances of "use the helper to verify..." as that's >>> redundant. This whole doc is about using the helpers to verify things. 
>> >> Agree, will be dropped in next version. >> >> And all the comments below make sense, will be merged ;-) >> >> Regards, >> Michael Wang >> >>> >>>> + >>>> + rdma_cap_ib_smi >>>> + --- >>>> +Subnet Management Interface (SMI) will handle SMP packet from SM >>>> +in an infiniband fabric, use the helper to verify the port before >>>> +utilize related features. >>>> + >>>> + rdma_cap_ib_cm >>>> + --- >>>> +Communication Manager (CM) will handle the connections between >>>^Connection Manager (CM) service, used to ease the process of >>> connecting to a remote host. The IB CM can be used to connect to remote >>> hosts using either InfiniBand or RoCE connections. iWARP has its own >>> connection manager, see below. >>>> +adaptors, currently there are two different implementation, >>>> +IB or IWARP, use the helper to verify whether the port using >>>> +IB-CM or not >>>> + >>>> + rdma_cap_iw_cm >>>> + --- >>>> +IWARP has it's own implemented CM which is different from infiniband, >>> iWARP connection manager. Similar to the IB Connection Manager, >>> but only used on iWARP devices. >>>> +use the helper to check whether the port using IWARP-CM or not. >>>> + >>>> + rdma_cap_ib_sa >>>> + --- >>>> +Subnet Administration (SA) is the database built by SM in an >>>> +infiniband fabric, use the helper to verify the port before >>>> +utilize related features. >>>> + >>>> + rdma_cap_ib_mcast >>>> + --- >>>> +Multicast is the feature for one QP to send messages to multiple >>>> +QP in an infiniband fabric, use the helper to verify the port before >>>> +utilize related features. >>> >>> InfiniBand (and OPA) use a different multicast mechanism than >>> traditional IP multicast found on Ethernet devices. If this capability >>> is true, then traditional IPv4/IPv6 multicast is handled by the IPoIB >>> layer and direct multicast joins and leaves are handled per the >
Re: [PATCH RFC] Documentation/infiniband: Add docs for rdma-helpers
On 05/13/2015 06:42 PM, Hefty, Sean wrote:
>>> + rdma_cap_ib_cm
>>> + ---
>>> +Communication Manager (CM) will handle the connections between
>>     ^Connection Manager (CM) service, used to ease the process of
>
> In IB terms, this is communication manager. It also handles transport level
> address resolution for UD QPs.

I could find both 'connection' and 'communication' in different docs: 'connection' is more related to verbs, while 'communication' is closer to the specification.

IMHO 'connection' makes more sense. After all, any transport between adaptors could be called communication, while connection management is exactly what the CM does in the kernel.

Doug, what's your opinion?

>
>>> + rdma_cap_eth_ah
>>> + ---
>>> +Infiniband address handler format is special in ethernet fabric, use
>>> +the helper to verify whether the port is using ethernet format or not.
>>
>> This helper is true when the address of the specific connection is of
>> the Ethernet (RoCE) variety.
>
> This is used for connectionless communication.

Could you please give more details on this?

Regards,
Michael Wang

>
Re: [PATCH RFC] Documentation/infiniband: Add docs for rdma-helpers
On 05/13/2015 05:11 PM, Doug Ledford wrote: [snip] >> + >> + For core layer, below helpers are used to check if a paticular capability >> + is supported by the port. > > The following helpers are used to check the specific capabilities of a > particular port before utilizing those capabilities. Will be in next version :-) > >> + >> +rdma_cap_ib_mad - Infiniband Management Datagrams. >> +rdma_cap_ib_smi - Infiniband Subnet Management Interface. >> +rdma_cap_ib_cm - Infiniband Communication Manager. > InfiniBand Connection Management Me too used to think it's 'connection', while I found some docs explain this as 'communication'... but anyway, 'connection' sounds more close to what it did in kernel :-) >> +rdma_cap_iw_cm - IWARP Communication Manager. > iWARP Connection Management >> +rdma_cap_ib_sa - Infiniband Subnet Administration. >> +rdma_cap_ib_mcast - Infiniband Multicast. > InfiniBand Multicast join/leave protocol >> +rdma_cap_read_multi_sge - RDMA Read Multiple Scatter-Gather Entries. > RDMA Read verb supports more than 1 sge in the work request Will be in next version :-) >> +rdma_cap_af_ib - Native Infiniband Address. >> +rdma_cap_eth_ah - Ethernet Address Handler. > Queue Pair is InfiniBand transport, but uses Ethernet address instead of > native InfiniBand address (aka, this is a RoCE QP, and that means > ethertype 0x8915 + GRH for RoCEv1 and IP/UDP to well known UDP port for > RoCEv2) Shall we put this long description into USAGE? Here maybe list all the helpers to give some quick overview with a brief description, what's your opinion? >> + >> +USAGE >> + >> + if (rdma_cap_XX(device, i)) { >> +/* The port i of device support XX */ >> +... >> + } else { >> +/* The port i of device don't support XX */ >> +... >> + } >> + >> + rdma_cap_ib_mad >> + --- >> +Management Datagrams (MAD) is the prototype of management packet >> +to be used by all the kinds of infiniband managers, use the helper >> +to verify the port before utilize related features. 
> Management Datagrams (MAD) are a required part of the InfiniBand > specification and are supported on all InfiniBand devices. A slightly > extended version are also supported on OPA interfaces. > > I would drop all instances of "use the helper to verify..." as that's > redundant. This whole doc is about using the helpers to verify things. Agree, will be dropped in next version. And all the comments below make sense, will be merged ;-) Regards, Michael Wang > >> + >> + rdma_cap_ib_smi >> + --- >> +Subnet Management Interface (SMI) will handle SMP packet from SM >> +in an infiniband fabric, use the helper to verify the port before >> +utilize related features. >> + >> + rdma_cap_ib_cm >> + --- >> +Communication Manager (CM) will handle the connections between >^Connection Manager (CM) service, used to ease the process of > connecting to a remote host. The IB CM can be used to connect to remote > hosts using either InfiniBand or RoCE connections. iWARP has its own > connection manager, see below. >> +adaptors, currently there are two different implementation, >> +IB or IWARP, use the helper to verify whether the port using >> +IB-CM or not >> + >> + rdma_cap_iw_cm >> + --- >> +IWARP has it's own implemented CM which is different from infiniband, > iWARP connection manager. Similar to the IB Connection Manager, > but only used on iWARP devices. >> +use the helper to check whether the port using IWARP-CM or not. >> + >> + rdma_cap_ib_sa >> + --- >> +Subnet Administration (SA) is the database built by SM in an >> +infiniband fabric, use the helper to verify the port before >> +utilize related features. >> + >> + rdma_cap_ib_mcast >> + --- >> +Multicast is the feature for one QP to send messages to multiple >> +QP in an infiniband fabric, use the helper to verify the port before >> +utilize related features. > > InfiniBand (and OPA) use a different multicast mechanism than > traditional IP multicast found on Ethernet devices. 
If this capability > is true, then traditional IPv4/IPv6 multicast is handled by the IPoIB > layer and direct multicast joins and leaves are handled per the > InfiniBand specifications. > >> + >> + rdm
Re: [PATCH RFC] Documentation/infiniband: Add docs for rdma-helpers
On 05/13/2015 05:59 PM, Jason Gunthorpe wrote:
> On Wed, May 13, 2015 at 03:24:32PM +0200, Michael Wang wrote:
>> This is the following patch for:
>> https://lkml.org/lkml/2015/5/5/417
>> which try to document the settled rdma_cap_XX().
>>
>> Highlights:
>> There could be many missing/mistakes/misunderstanding, please don't
>> be hesitate to point out the issues, any suggestions to improve or
>> complete the description are very welcomed ;-)
>
> I'd rather see this in the kdoc for each function.

I thought you meant kernel documentation like this... my misunderstanding. But this is the usual way to document kernel stuff, isn't it?

BTW, could you give more details on the kdoc?

Regards,
Michael Wang

>
> Thanks,
> Jason
>
[PATCH RFC] Documentation/infiniband: Add docs for rdma-helpers
This is the follow-up patch for:
  https://lkml.org/lkml/2015/5/5/417
which tries to document the settled rdma_cap_XX().

Highlights:
  There could be many omissions/mistakes/misunderstandings, please don't
  hesitate to point out the issues, any suggestions to improve or
  complete the description are very welcome ;-)

Signed-off-by: Michael Wang
---
 Documentation/infiniband/rdma_helpers.txt | 76 +++
 1 file changed, 76 insertions(+)
 create mode 100644 Documentation/infiniband/rdma_helpers.txt

diff --git a/Documentation/infiniband/rdma_helpers.txt b/Documentation/infiniband/rdma_helpers.txt
new file mode 100644
index 000..abc75ec
--- /dev/null
+++ b/Documentation/infiniband/rdma_helpers.txt
@@ -0,0 +1,76 @@
+RDMA HELPERS
+
+  For core layer, below helpers are used to check if a particular capability
+  is supported by the port.
+
+    rdma_cap_ib_mad         - Infiniband Management Datagrams.
+    rdma_cap_ib_smi         - Infiniband Subnet Management Interface.
+    rdma_cap_ib_cm          - Infiniband Communication Manager.
+    rdma_cap_iw_cm          - IWARP Communication Manager.
+    rdma_cap_ib_sa          - Infiniband Subnet Administration.
+    rdma_cap_ib_mcast       - Infiniband Multicast.
+    rdma_cap_read_multi_sge - RDMA Read Multiple Scatter-Gather Entries.
+    rdma_cap_af_ib          - Native Infiniband Address.
+    rdma_cap_eth_ah         - Ethernet Address Handler.
+
+USAGE
+
+  if (rdma_cap_XX(device, i)) {
+    /* The port i of device support XX */
+    ...
+  } else {
+    /* The port i of device don't support XX */
+    ...
+  }
+
+  rdma_cap_ib_mad
+  ---
+    Management Datagrams (MAD) is the prototype of management packet
+    to be used by all the kinds of infiniband managers, use the helper
+    to verify the port before utilize related features.
+
+  rdma_cap_ib_smi
+  ---
+    Subnet Management Interface (SMI) will handle SMP packet from SM
+    in an infiniband fabric, use the helper to verify the port before
+    utilize related features.
+
+  rdma_cap_ib_cm
+  ---
+    Communication Manager (CM) will handle the connections between
+    adaptors, currently there are two different implementation,
+    IB or IWARP, use the helper to verify whether the port using
+    IB-CM or not
+
+  rdma_cap_iw_cm
+  ---
+    IWARP has its own implemented CM which is different from infiniband,
+    use the helper to check whether the port using IWARP-CM or not.
+
+  rdma_cap_ib_sa
+  ---
+    Subnet Administration (SA) is the database built by SM in an
+    infiniband fabric, use the helper to verify the port before
+    utilize related features.
+
+  rdma_cap_ib_mcast
+  ---
+    Multicast is the feature for one QP to send messages to multiple
+    QP in an infiniband fabric, use the helper to verify the port before
+    utilize related features.
+
+  rdma_cap_read_multi_sge
+  ---
+    RDMA read operation could support multiple scatter-gather entries,
+    use the helper to verify whether the port support this feature
+    or not.
+
+  rdma_cap_af_ib
+  ---
+    RDMA address format could be ethernet or infiniband, use the helper
+    to verify whether the port support infiniband format or not.
+
+  rdma_cap_eth_ah
+  ---
+    Infiniband address handler format is special in ethernet fabric, use
+    the helper to verify whether the port is using ethernet format or not.
-- 
2.1.0
Re: [PATCH v8 00/23] IB/Verbs: IB Management Helpers
On 05/12/2015 10:09 PM, Doug Ledford wrote:
[snip]
>>
>> I had asked for better kdocs for the new helpers so new people can
>> understand when and where to use them.
>>
>> I've not looked at the series at all for the past few postings.
>
> Michael, please work up an incremental patch to address the kdocs issue.
> I've picked up the v8 patchset, and there is no need to respin it, but I
> would like to have that kdoc patch before the 4.2 merge window opens.

Sure, now that these helpers have settled down, it's time to document
them. I'll send out the RFC ASAP :-)

Regards,
Michael Wang

>
>
Re: [PATCH v8 00/23] IB/Verbs: IB Management Helpers
On 05/12/2015 04:24 PM, Doug Ledford wrote:
[snip]
>>
>> AFAIK Or was asking to merge the #15~23, and want to reserve the changelog
>> meanwhile reply the cover of prev version (I'm still confused on that...),
>> I've replied but get no respond yet.
>>
>> I can make a v9 to merge the #15~#23 if that could benefit the
>> maintainability,
>> please let me know your opinion :-)
>
> I don't think it would make a significant difference.  I've pulled the
> v8 patchset out of patchworks and I'll throw a new branch with it
> included up to my github repo sometime today.

Got it :-)

Regards,
Michael Wang

>
>> About the Bug, if it was not introduced in this series, maybe including the
>> fix in next series would be better?
>>
>> Regards,
>> Michael Wang
>>
>>>
>>>> Frankly
>>>> I vote for the former because as it stands this series does not break
>>>> directly.
>>>> It was only after I changed the implementation of rdma_cap_ib_mad that it
>>>> broke.
>>>>
>>>>
>>>> For the rest of the series.
>>>>
>>>> Reviewed-by: Ira Weiny
>>>> Tested-by: Ira Weiny
>>>>    -- Limited to mlx4, qib, and OPA (with additional patches.)
>>>>
>>>>
>>>> On Tue, May 05, 2015 at 02:50:17PM +0200, Michael Wang wrote:
>>>>> Since v7:
>>>>>   * Thanks to Doug, Ira, Devesh for the testing :-)
>>>>>   * Thanks for the comments from or, Doug, Ira, Jason :-)
>>>>>     Please remind me if anything missed :-P
>>>>>   * Use rdma_cap_XX() instead of cap_XX() for readability
>>>>>   * Remove CC list in git log for maintainability
>>>>>   * Use bool as return value
>>>>>   * Updated github repository to v8
>>>>>
>>>>> There are plenty of lengthy code to check the transport type of IB device,
>>>>> or the link layer type of it's port, but actually we are just speculating
>>>>> whether a particular management/feature is supported by the device/port.
>>>>>
>>>>> Thus instead of inferring, we should have our own mechanism for IB
>>>>> management
>>>>> capability/protocol/feature checking, several proposals below.
Re: [PATCH v8 00/23] IB/Verbs: IB Management Helpers
On 05/12/2015 01:49 AM, ira.weiny wrote:
> I have run with this series and the only issue I have found is not with this
> patch set directly.
>
> This patch:
>
>> IB/Verbs: Use management helper rdma_cap_ib_mad()
>
> causes an error when you actually use the port passed from the ib_umad module.
> I have a patch to fix that which I found while trying to build on this series
> for the use of a bit mask.
>
> Doug, I don't know what you would like to do for this fix.  I am submitting it
> shortly with a new version of the core capability bit patches.  If you want to
> just add it after this series or force Michael to respin with the fix?
> Frankly
> I vote for the former because as it stands this series does not break
> directly.
> It was only after I changed the implementation of rdma_cap_ib_mad that it
> broke.

Agreed, it sounds more reasonable to include the fix in the series that
introduced it :-P

>
>
> For the rest of the series.
>
> Reviewed-by: Ira Weiny
> Tested-by: Ira Weiny
>    -- Limited to mlx4, qib, and OPA (with additional patches.)

Thanks for the review and testing :-)

Regards,
Michael Wang

>
>
> On Tue, May 05, 2015 at 02:50:17PM +0200, Michael Wang wrote:
>> Since v7:
>>   * Thanks to Doug, Ira, Devesh for the testing :-)
>>   * Thanks for the comments from or, Doug, Ira, Jason :-)
>>     Please remind me if anything missed :-P
>>   * Use rdma_cap_XX() instead of cap_XX() for readability
>>   * Remove CC list in git log for maintainability
>>   * Use bool as return value
>>   * Updated github repository to v8
>>
>> There are plenty of lengthy code to check the transport type of IB device,
>> or the link layer type of it's port, but actually we are just speculating
>> whether a particular management/feature is supported by the device/port.
>>
>> Thus instead of inferring, we should have our own mechanism for IB management
>> capability/protocol/feature checking, several proposals below.
Re: [PATCH v8 00/23] IB/Verbs: IB Management Helpers
On 05/12/2015 02:27 AM, Doug Ledford wrote:
> On Mon, 2015-05-11 at 19:49 -0400, ira.weiny wrote:
>> I have run with this series and the only issue I have found is not with this
>> patch set directly.
>>
>> This patch:
>>
>>> IB/Verbs: Use management helper rdma_cap_ib_mad()
>>
>> causes an error when you actually use the port passed from the ib_umad
>> module.
>> I have a patch to fix that which I found while trying to build on this series
>> for the use of a bit mask.
>>
>> Doug, I don't know what you would like to do for this fix.  I am submitting
>> it
>> shortly with a new version of the core capability bit patches.  If you want
>> to
>> just add it after this series or force Michael to respin with the fix?
>
> As I recall, there was a comment from Or requesting to squash some of
> the individual patches down, but I no longer have that email in my Inbox
> to double check.  And it seemed like there was one other review comment
> not yet addressed.  Do I have that right Michael?  And if so, are you
> working on a v9?

AFAIK Or was asking to merge #15~#23, and wants me to keep the full
changelog while replying to the cover letter of the previous version
(I'm still confused about that...). I've replied but got no response yet.

I can make a v9 that merges #15~#23 if that would benefit
maintainability, please let me know your opinion :-)

About the bug, if it was not introduced in this series, maybe including
the fix in the next series would be better?

Regards,
Michael Wang

>
>> Frankly
>> I vote for the former because as it stands this series does not break
>> directly.
>> It was only after I changed the implementation of rdma_cap_ib_mad that it
>> broke.
>>
>>
>> For the rest of the series.
>>
>> Reviewed-by: Ira Weiny
>> Tested-by: Ira Weiny
>>    -- Limited to mlx4, qib, and OPA (with additional patches.)
Re: [PATCH v8 00/23] IB/Verbs: IB Management Helpers
On 05/05/2015 04:16 PM, Or Gerlitz wrote:
> On 5/5/2015 3:50 PM, Michael Wang wrote:
>> Since v7:
>>    * Thanks to Doug, Ira, Devesh for the testing:-)
>>    * Thanks for the comments from or, Doug, Ira, Jason:-)
>>      Please remind me if anything missed:-P
>>    * Use rdma_cap_XX() instead of cap_XX() for readability
>>    * Remove CC list in git log for maintainability
>>    * Use bool as return value
>>    * Updated github repository to v8
>
> Didn't you see that patches 15~23 will be squashed into one?

I missed the conversation on that. You mentioned removing the CC list,
but you don't want to merge the patches, correct?

>
>
> Also, when you post version N you need not only to list the changes since
> version N-1 but rather also to keep the full changes since Vx for x=1...N-2,
> reviewers needs not chase your previous cover letters and see if/what went
> wrong, specifically with a sensitive series like this one.

That may be too long and carry some outdated info; I can keep the log
from this version onward if folks prefer that.

>
> So if/when there's V9, use that practice, and for the time being, reply on
> the V8 cover letter with the full listing of changes

I'm using git send-email, which replies to the cover letter of each
version, so I'm not sure how to reply to a previous cover letter. And
that won't start a new thread for the new version, will it?

Regards,
Michael Wang

>
> Or.
>
>
[PATCH v8 00/23] IB/Verbs: IB Management Helpers
Since v7:
  * Thanks to Doug, Ira, Devesh for the testing :-)
  * Thanks for the comments from Or, Doug, Ira, Jason :-)
    Please remind me if anything was missed :-P
  * Use rdma_cap_XX() instead of cap_XX() for readability
  * Remove CC list in git log for maintainability
  * Use bool as return value
  * Updated github repository to v8

There is plenty of lengthy code that checks the transport type of an IB
device, or the link-layer type of its port, but actually we are just
speculating whether a particular management/feature is supported by the
device/port.

Thus instead of inferring, we should have our own mechanism for IB
management capability/protocol/feature checking, several proposals below.

This patch set will introduce query_protocol() to check management
requirements instead of inferring them from transport and link layer
respectively, along with the new enum on protocol type.

Mapping List:
            node-type   link-layer  transport   protocol
  nes       RNIC        ETH         IWARP       IWARP
  amso1100  RNIC        ETH         IWARP       IWARP
  cxgb3     RNIC        ETH         IWARP       IWARP
  cxgb4     RNIC        ETH         IWARP       IWARP
  usnic     USNIC_UDP   ETH         USNIC_UDP   USNIC_UDP
  ocrdma    IB_CA       ETH         IB          IBOE
  mlx4      IB_CA       IB/ETH      IB          IB/IBOE
  mlx5      IB_CA       IB          IB          IB
  ehca      IB_CA       IB          IB          IB
  ipath     IB_CA       IB          IB          IB
  mthca     IB_CA       IB          IB          IB
  qib       IB_CA       IB          IB          IB

For example:
  if (transport == IB) && (link-layer == ETH)
will now become:
  if (query_protocol() == IBOE)

Thus we will be able to get rid of the respective transport and
link-layer checking, and it will help us to add new protocols/technologies
(like OPA) more easily; also, with the introduced management helpers, the
IB management logic will be clearer and easier to extend.
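
As an illustration of the mapping list and the before/after check above, here is a minimal userspace sketch in C. It is not the kernel code: every `demo_*` enum and function is a hypothetical stand-in introduced for this example, and only the (transport, link-layer) to protocol rows are taken from the table.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's transport, link-layer and protocol types. */
enum demo_transport  { DEMO_TRANSPORT_IB, DEMO_TRANSPORT_IWARP, DEMO_TRANSPORT_USNIC_UDP };
enum demo_link_layer { DEMO_LINK_IB, DEMO_LINK_ETH };
enum demo_protocol   { DEMO_PROTOCOL_IB, DEMO_PROTOCOL_IBOE,
		       DEMO_PROTOCOL_IWARP, DEMO_PROTOCOL_USNIC_UDP };

/* Collapse a (transport, link-layer) pair into one protocol, following the mapping list. */
static inline enum demo_protocol demo_query_protocol(enum demo_transport t,
						     enum demo_link_layer l)
{
	/* Only the three transports that appear in the mapping list are modelled. */
	assert(t == DEMO_TRANSPORT_IB || t == DEMO_TRANSPORT_IWARP ||
	       t == DEMO_TRANSPORT_USNIC_UDP);

	if (t == DEMO_TRANSPORT_IWARP)
		return DEMO_PROTOCOL_IWARP;
	if (t == DEMO_TRANSPORT_USNIC_UDP)
		return DEMO_PROTOCOL_USNIC_UDP;
	/* IB transport: the link layer decides between native IB and IBoE. */
	return l == DEMO_LINK_ETH ? DEMO_PROTOCOL_IBOE : DEMO_PROTOCOL_IB;
}

/* Old style: infer "IBoE" from two separate properties. */
static inline bool demo_is_iboe_inferred(enum demo_transport t, enum demo_link_layer l)
{
	return t == DEMO_TRANSPORT_IB && l == DEMO_LINK_ETH;
}

/* New style: ask for the protocol once and compare. */
static inline bool demo_is_iboe_queried(enum demo_transport t, enum demo_link_layer l)
{
	return demo_query_protocol(t, l) == DEMO_PROTOCOL_IBOE;
}
```

Both predicates agree on every row of the table; the queried form simply moves the inference into one place, which is the point of the series.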
Highlights:
  The long CC list in each patch drew complaints about maintainability;
  it was suggested that folks provide their Reviewed-by or Acked-by
  instead, so for those who used to be on the CC list, please provide
  your signature voluntarily :-)

  The 'mgmt-helpers' branch of 'g...@github.com:ywang-pb/infiniband-wy.git'
  contains this series based on the latest 'infiniband/for-next'

  Patches 1#~14# include all the logic reform, 15#~23# introduce the
  management helpers.

  Doug suggested the bitmask mechanism:
	https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg23765.html
  which could be the plan for future reforming; we prefer that to be
  another series which focuses on semantics and performance.

  This patch set is somewhat 'bloated' now and it may be a good time for
  staging; I'd like to suggest we focus on improving the existing helpers
  and push all further reforms into the next series ;-)

Proposals:
  Sean:
	https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg23339.html
  Doug:
	https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg23418.html
	https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg23765.html
  Jason:
	https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg23425.html

Michael Wang (23):
  IB/Verbs: Implement new callback query_protocol()
  IB/Verbs: Implement raw management helpers
  IB/Verbs: Reform IB-core mad/agent/user_mad
  IB/Verbs: Reform IB-core cm
  IB/Verbs: Reform IB-core sa_query
  IB/Verbs: Reform IB-core multicast
  IB/Verbs: Reform IB-ulp ipoib
  IB/Verbs: Reform IB-ulp xprtrdma
  IB/Verbs: Reform IB-core verbs
  IB/Verbs: Reform cm related part in IB-core cma/ucm
  IB/Verbs: Reform route related part in IB-core cma
  IB/Verbs: Reform mcast related part in IB-core cma
  IB/Verbs: Reform cma_acquire_dev()
  IB/Verbs: Reform rest part in IB-core cma
  IB/Verbs: Use management helper rdma_cap_ib_mad()
  IB/Verbs: Use management helper rdma_cap_ib_smi()
  IB/Verbs: Use management helper rdma_cap_ib_cm()
  IB/Verbs: Use management helper rdma_cap_iw_cm()
  IB/Verbs: Use management helper rdma_cap_ib_sa()
  IB/Verbs: Use management helper rdma_cap_ib_mcast()
  IB/Verbs: Use management helper rdma_cap_read_multi_sge()
  IB/Verbs: Use management helper rdma_cap_af_ib()
  IB/Verbs: Use management helper rdma_cap_eth_ah()

 drivers/infiniband/core/agent.c  | 2 +-
 drivers/infiniband/core/cm.c     | 20 ++-
 drivers/infiniband/core/cma.c    | 257 +++
 drivers/infiniband/core/device.c | 1 +
 drivers/infiniband/core/mad.c    | 43 +++--
 drivers/infiniband/c
[PATCH v8 02/23] IB/Verbs: Implement raw management helpers
Add raw helpers:
	rdma_protocol_ib
	rdma_protocol_iboe
	rdma_protocol_iwarp
	rdma_ib_or_iboe (transition, clean up later)
to help us detect which technology the port supports.

Signed-off-by: Michael Wang
---
 include/rdma/ib_verbs.h | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 080f204..e6dd984 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1752,6 +1752,28 @@ int ib_query_port(struct ib_device *device,
 enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device,
 					       u8 port_num);
 
+static inline bool rdma_protocol_ib(struct ib_device *device, u8 port_num)
+{
+	return device->query_protocol(device, port_num) == RDMA_PROTOCOL_IB;
+}
+
+static inline bool rdma_protocol_iboe(struct ib_device *device, u8 port_num)
+{
+	return device->query_protocol(device, port_num) == RDMA_PROTOCOL_IBOE;
+}
+
+static inline bool rdma_protocol_iwarp(struct ib_device *device, u8 port_num)
+{
+	return device->query_protocol(device, port_num) == RDMA_PROTOCOL_IWARP;
+}
+
+static inline bool rdma_ib_or_iboe(struct ib_device *device, u8 port_num)
+{
+	enum rdma_protocol_type pt = device->query_protocol(device, port_num);
+
+	return (pt == RDMA_PROTOCOL_IB || pt == RDMA_PROTOCOL_IBOE);
+}
+
 int ib_query_gid(struct ib_device *device,
 		 u8 port_num, int index, union ib_gid *gid);
 
-- 
2.1.0
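
To see how the raw helpers behave per port, the patch can be modelled in userspace. This is only a sketch under stated assumptions: `struct demo_ib_device` and the two `demo_*_query_protocol()` callbacks are hypothetical, the enum's numeric values are arbitrary, and `uint8_t` stands in for the kernel's `u8`. The helper bodies follow the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Protocol names as used in the patch; the numeric values here are arbitrary. */
enum rdma_protocol_type {
	RDMA_PROTOCOL_IB,
	RDMA_PROTOCOL_IBOE,
	RDMA_PROTOCOL_IWARP,
};

/* Hypothetical, stripped-down device: only the callback the helpers need. */
struct demo_ib_device {
	enum rdma_protocol_type (*query_protocol)(struct demo_ib_device *device,
						  uint8_t port_num);
};

static inline bool rdma_protocol_ib(struct demo_ib_device *device, uint8_t port_num)
{
	assert(device && device->query_protocol);
	return device->query_protocol(device, port_num) == RDMA_PROTOCOL_IB;
}

static inline bool rdma_protocol_iboe(struct demo_ib_device *device, uint8_t port_num)
{
	assert(device && device->query_protocol);
	return device->query_protocol(device, port_num) == RDMA_PROTOCOL_IBOE;
}

static inline bool rdma_protocol_iwarp(struct demo_ib_device *device, uint8_t port_num)
{
	assert(device && device->query_protocol);
	return device->query_protocol(device, port_num) == RDMA_PROTOCOL_IWARP;
}

static inline bool rdma_ib_or_iboe(struct demo_ib_device *device, uint8_t port_num)
{
	enum rdma_protocol_type pt;

	assert(device && device->query_protocol);
	pt = device->query_protocol(device, port_num);
	return pt == RDMA_PROTOCOL_IB || pt == RDMA_PROTOCOL_IBOE;
}

/* Hypothetical dual-port driver, like the mlx4 row (IB/ETH): port 1 is IB, others IBoE. */
static inline enum rdma_protocol_type
demo_dual_query_protocol(struct demo_ib_device *device, uint8_t port_num)
{
	(void)device;
	return port_num == 1 ? RDMA_PROTOCOL_IB : RDMA_PROTOCOL_IBOE;
}

/* Hypothetical RNIC driver: every port speaks iWARP. */
static inline enum rdma_protocol_type
demo_iwarp_query_protocol(struct demo_ib_device *device, uint8_t port_num)
{
	(void)device;
	(void)port_num;
	return RDMA_PROTOCOL_IWARP;
}
```

Because the answer comes from a per-port callback, a device with mixed ports can give different answers for port 1 and port 2, which a device-wide transport check cannot express.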
[PATCH v8 11/23] IB/Verbs: Reform route related part in IB-core cma
Use raw management helpers to reform route related part in IB-core cma. Signed-off-by: Michael Wang --- drivers/infiniband/core/cma.c | 31 --- drivers/infiniband/core/ucma.c | 25 ++--- 2 files changed, 14 insertions(+), 42 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 8a07e89..36c5f8a 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -923,13 +923,9 @@ static inline int cma_user_data_offset(struct rdma_id_private *id_priv) static void cma_cancel_route(struct rdma_id_private *id_priv) { - switch (rdma_port_get_link_layer(id_priv->id.device, id_priv->id.port_num)) { - case IB_LINK_LAYER_INFINIBAND: + if (rdma_protocol_ib(id_priv->id.device, id_priv->id.port_num)) { if (id_priv->query) ib_sa_cancel_query(id_priv->query_id, id_priv->query); - break; - default: - break; } } @@ -1957,26 +1953,15 @@ int rdma_resolve_route(struct rdma_cm_id *id, int timeout_ms) return -EINVAL; atomic_inc(&id_priv->refcount); - switch (rdma_node_get_transport(id->device->node_type)) { - case RDMA_TRANSPORT_IB: - switch (rdma_port_get_link_layer(id->device, id->port_num)) { - case IB_LINK_LAYER_INFINIBAND: - ret = cma_resolve_ib_route(id_priv, timeout_ms); - break; - case IB_LINK_LAYER_ETHERNET: - ret = cma_resolve_iboe_route(id_priv); - break; - default: - ret = -ENOSYS; - } - break; - case RDMA_TRANSPORT_IWARP: + if (rdma_protocol_ib(id->device, id->port_num)) + ret = cma_resolve_ib_route(id_priv, timeout_ms); + else if (rdma_protocol_iboe(id->device, id->port_num)) + ret = cma_resolve_iboe_route(id_priv); + else if (rdma_protocol_iwarp(id->device, id->port_num)) ret = cma_resolve_iw_route(id_priv, timeout_ms); - break; - default: + else ret = -ENOSYS; - break; - } + if (ret) goto err; diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c index 45d67e9..dae7620 100644 --- a/drivers/infiniband/core/ucma.c +++ b/drivers/infiniband/core/ucma.c @@ -722,26 +722,13 @@ static ssize_t 
ucma_query_route(struct ucma_file *file, resp.node_guid = (__force __u64) ctx->cm_id->device->node_guid; resp.port_num = ctx->cm_id->port_num; - switch (rdma_node_get_transport(ctx->cm_id->device->node_type)) { - case RDMA_TRANSPORT_IB: - switch (rdma_port_get_link_layer(ctx->cm_id->device, - ctx->cm_id->port_num)) { - case IB_LINK_LAYER_INFINIBAND: - ucma_copy_ib_route(&resp, &ctx->cm_id->route); - break; - case IB_LINK_LAYER_ETHERNET: - ucma_copy_iboe_route(&resp, &ctx->cm_id->route); - break; - default: - break; - } - break; - case RDMA_TRANSPORT_IWARP: + + if (rdma_protocol_ib(ctx->cm_id->device, ctx->cm_id->port_num)) + ucma_copy_ib_route(&resp, &ctx->cm_id->route); + else if (rdma_protocol_iboe(ctx->cm_id->device, ctx->cm_id->port_num)) + ucma_copy_iboe_route(&resp, &ctx->cm_id->route); + else if (rdma_protocol_iwarp(ctx->cm_id->device, ctx->cm_id->port_num)) ucma_copy_iw_route(&resp, &ctx->cm_id->route); - break; - default: - break; - } out: if (copy_to_user((void __user *)(unsigned long)cmd.response, -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
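
The shape of the change in rdma_resolve_route() can be shown with a small userspace sketch: the nested transport/link-layer switch and the flat protocol chain select the same resolver. All `demo_*` names and the `DEMO_ROUTE_*` return codes are hypothetical stand-ins for the real resolver calls; `-ENOSYS` is used as in the patch.

```c
#include <errno.h>

enum demo_transport  { DEMO_TRANSPORT_IB, DEMO_TRANSPORT_IWARP, DEMO_TRANSPORT_USNIC_UDP };
enum demo_link_layer { DEMO_LINK_IB, DEMO_LINK_ETH, DEMO_LINK_UNSPECIFIED };
enum demo_protocol   { DEMO_PROTOCOL_IB, DEMO_PROTOCOL_IBOE,
		       DEMO_PROTOCOL_IWARP, DEMO_PROTOCOL_USNIC_UDP };

/* Positive codes stand in for "this resolver was called". */
enum { DEMO_ROUTE_IB = 1, DEMO_ROUTE_IBOE = 2, DEMO_ROUTE_IW = 3 };

/* Before: switch on transport, then on link layer. */
static inline int demo_resolve_route_old(enum demo_transport t, enum demo_link_layer l)
{
	int ret;

	switch (t) {
	case DEMO_TRANSPORT_IB:
		switch (l) {
		case DEMO_LINK_IB:
			ret = DEMO_ROUTE_IB;
			break;
		case DEMO_LINK_ETH:
			ret = DEMO_ROUTE_IBOE;
			break;
		default:
			ret = -ENOSYS;
		}
		break;
	case DEMO_TRANSPORT_IWARP:
		ret = DEMO_ROUTE_IW;
		break;
	default:
		ret = -ENOSYS;
		break;
	}
	return ret;
}

/* After: one flat chain keyed on the protocol. */
static inline int demo_resolve_route_new(enum demo_protocol p)
{
	if (p == DEMO_PROTOCOL_IB)
		return DEMO_ROUTE_IB;
	else if (p == DEMO_PROTOCOL_IBOE)
		return DEMO_ROUTE_IBOE;
	else if (p == DEMO_PROTOCOL_IWARP)
		return DEMO_ROUTE_IW;
	else
		return -ENOSYS;
}
```

Anything that is not IB, IBoE or iWARP (usnic in the mapping list, for instance) falls through to `-ENOSYS` in both versions.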
[PATCH v8 03/23] IB/Verbs: Reform IB-core mad/agent/user_mad
Use raw management helpers to reform IB-core mad/agent/user_mad. Signed-off-by: Michael Wang --- drivers/infiniband/core/agent.c| 2 +- drivers/infiniband/core/mad.c | 43 +++--- drivers/infiniband/core/user_mad.c | 26 --- 3 files changed, 41 insertions(+), 30 deletions(-) diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c index f6d2961..89d4fbc 100644 --- a/drivers/infiniband/core/agent.c +++ b/drivers/infiniband/core/agent.c @@ -156,7 +156,7 @@ int ib_agent_port_open(struct ib_device *device, int port_num) goto error1; } - if (rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_INFINIBAND) { + if (rdma_protocol_ib(device, port_num)) { /* Obtain send only MAD agent for SMI QP */ port_priv->agent[0] = ib_register_mad_agent(device, port_num, IB_QPT_SMI, NULL, 0, diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c index 74c30f4..507eb67 100644 --- a/drivers/infiniband/core/mad.c +++ b/drivers/infiniband/core/mad.c @@ -2938,7 +2938,7 @@ static int ib_mad_port_open(struct ib_device *device, init_mad_qp(port_priv, &port_priv->qp_info[1]); cq_size = mad_sendq_size + mad_recvq_size; - has_smi = rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_INFINIBAND; + has_smi = rdma_protocol_ib(device, port_num); if (has_smi) cq_size *= 2; @@ -3057,9 +3057,6 @@ static void ib_mad_init_device(struct ib_device *device) { int start, end, i; - if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB) - return; - if (device->node_type == RDMA_NODE_IB_SWITCH) { start = 0; end = 0; @@ -3069,6 +3066,9 @@ static void ib_mad_init_device(struct ib_device *device) } for (i = start; i <= end; i++) { + if (!rdma_ib_or_iboe(device, i)) + continue; + if (ib_mad_port_open(device, i)) { dev_err(&device->dev, "Couldn't open port %d\n", i); goto error; @@ -3086,40 +3086,39 @@ error_agent: dev_err(&device->dev, "Couldn't close port %d\n", i); error: - i--; + while (--i >= start) { + if (!rdma_ib_or_iboe(device, i)) + continue; 
- while (i >= start) { if (ib_agent_port_close(device, i)) dev_err(&device->dev, "Couldn't close port %d for agents\n", i); if (ib_mad_port_close(device, i)) dev_err(&device->dev, "Couldn't close port %d\n", i); - i--; } } static void ib_mad_remove_device(struct ib_device *device) { - int i, num_ports, cur_port; - - if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB) - return; + int start, end, i; if (device->node_type == RDMA_NODE_IB_SWITCH) { - num_ports = 1; - cur_port = 0; + start = 0; + end = 0; } else { - num_ports = device->phys_port_cnt; - cur_port = 1; + start = 1; + end = device->phys_port_cnt; } - for (i = 0; i < num_ports; i++, cur_port++) { - if (ib_agent_port_close(device, cur_port)) + + for (i = start; i <= end; i++) { + if (!rdma_ib_or_iboe(device, i)) + continue; + + if (ib_agent_port_close(device, i)) dev_err(&device->dev, - "Couldn't close port %d for agents\n", - cur_port); - if (ib_mad_port_close(device, cur_port)) - dev_err(&device->dev, "Couldn't close port %d\n", - cur_port); + "Couldn't close port %d for agents\n", i); + if (ib_mad_port_close(device, i)) + dev_err(&device->dev, "Couldn't close port %d\n", i); } } diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c index 928cdd2..aa8b334 100644 --- a/drivers/infiniband/core/user_mad.c +++ b/drivers/infiniband/core/user_mad.c @@ -1273,9 +1273,7 @@ static void ib_umad_add_one(struct ib_device *device) { struct ib_umad_device *umad_dev; int s, e, i; - - if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB) - return; + int count = 0;
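
The per-port loop and its error unwinding in ib_mad_init_device() can be sketched in userspace as follows. The `demo_*` structure and functions are hypothetical; the `supported[]` flag stands in for rdma_ib_or_iboe(device, i), and `fail_port` exists only to force the error path.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_PORTS 8

/* Hypothetical device model for this example. */
struct demo_device {
	int start, end;                   /* inclusive port range, e.g. 1..phys_port_cnt */
	bool supported[DEMO_MAX_PORTS];   /* stands in for rdma_ib_or_iboe(device, i) */
	bool opened[DEMO_MAX_PORTS];      /* which ports are currently open */
	int fail_port;                    /* port whose open fails, or -1 for none */
};

static inline int demo_port_open(struct demo_device *d, int i)
{
	if (i == d->fail_port)
		return -1;
	d->opened[i] = true;
	return 0;
}

static inline void demo_port_close(struct demo_device *d, int i)
{
	d->opened[i] = false;
}

static inline int demo_init_device(struct demo_device *d)
{
	int i;

	assert(d && d->start >= 0 && d->end < DEMO_MAX_PORTS);

	for (i = d->start; i <= d->end; i++) {
		if (!d->supported[i])
			continue;	/* skip ports without the capability */
		if (demo_port_open(d, i))
			goto error;
	}
	return 0;

error:
	/* i is the port that failed; walk back over the ports before it. */
	while (--i >= d->start) {
		if (!d->supported[i])
			continue;
		demo_port_close(d, i);
	}
	return -1;
}
```

Folding the decrement into the loop condition matters once a `continue` is added: with the decrement at the bottom of the loop body, a skipped port would never advance the index.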
[PATCH v8 05/23] IB/Verbs: Reform IB-core sa_query
Use raw management helpers to reform IB-core sa_query. Signed-off-by: Michael Wang --- drivers/infiniband/core/sa_query.c | 30 +- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c index c38f030..b115c28 100644 --- a/drivers/infiniband/core/sa_query.c +++ b/drivers/infiniband/core/sa_query.c @@ -450,7 +450,7 @@ static void ib_sa_event(struct ib_event_handler *handler, struct ib_event *event struct ib_sa_port *port = &sa_dev->port[event->element.port_num - sa_dev->start_port]; - if (rdma_port_get_link_layer(handler->device, port->port_num) != IB_LINK_LAYER_INFINIBAND) + if (WARN_ON(!rdma_protocol_ib(handler->device, port->port_num))) return; spin_lock_irqsave(&port->ah_lock, flags); @@ -540,7 +540,7 @@ int ib_init_ah_from_path(struct ib_device *device, u8 port_num, ah_attr->port_num = port_num; ah_attr->static_rate = rec->rate; - force_grh = rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_ETHERNET; + force_grh = rdma_protocol_iboe(device, port_num); if (rec->hop_limit > 1 || force_grh) { ah_attr->ah_flags = IB_AH_GRH; @@ -1153,9 +1153,7 @@ static void ib_sa_add_one(struct ib_device *device) { struct ib_sa_device *sa_dev; int s, e, i; - - if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB) - return; + int count = 0; if (device->node_type == RDMA_NODE_IB_SWITCH) s = e = 0; @@ -1175,7 +1173,7 @@ static void ib_sa_add_one(struct ib_device *device) for (i = 0; i <= e - s; ++i) { spin_lock_init(&sa_dev->port[i].ah_lock); - if (rdma_port_get_link_layer(device, i + 1) != IB_LINK_LAYER_INFINIBAND) + if (!rdma_protocol_ib(device, i + 1)) continue; sa_dev->port[i].sm_ah= NULL; @@ -1189,8 +1187,13 @@ static void ib_sa_add_one(struct ib_device *device) goto err; INIT_WORK(&sa_dev->port[i].update_task, update_sm_ah); + + count++; } + if (!count) + goto free; + ib_set_client_data(device, &sa_client, sa_dev); /* @@ -1204,19 +1207,20 @@ static void ib_sa_add_one(struct 
ib_device *device) if (ib_register_event_handler(&sa_dev->event_handler)) goto err; - for (i = 0; i <= e - s; ++i) - if (rdma_port_get_link_layer(device, i + 1) == IB_LINK_LAYER_INFINIBAND) + for (i = 0; i <= e - s; ++i) { + if (rdma_protocol_ib(device, i + 1)) update_sm_ah(&sa_dev->port[i].update_task); + } return; err: - while (--i >= 0) - if (rdma_port_get_link_layer(device, i + 1) == IB_LINK_LAYER_INFINIBAND) + while (--i >= 0) { + if (rdma_protocol_ib(device, i + 1)) ib_unregister_mad_agent(sa_dev->port[i].agent); - + } +free: kfree(sa_dev); - return; } @@ -1233,7 +1237,7 @@ static void ib_sa_remove_one(struct ib_device *device) flush_workqueue(ib_wq); for (i = 0; i <= sa_dev->end_port - sa_dev->start_port; ++i) { - if (rdma_port_get_link_layer(device, i + 1) == IB_LINK_LAYER_INFINIBAND) { + if (rdma_protocol_ib(device, i + 1)) { ib_unregister_mad_agent(sa_dev->port[i].agent); if (sa_dev->port[i].sm_ah) kref_put(&sa_dev->port[i].sm_ah->ref, free_sm_ah); -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
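
The `count` logic added to ib_sa_add_one() replaces the old device-wide transport check: per-port state is set up only for IB ports, and the allocation is dropped again if no port qualified. A minimal userspace sketch, assuming a hypothetical `demo_sa_device` and a caller-supplied `port_is_ib[]` array in place of rdma_protocol_ib():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-port and per-device state for this example. */
struct demo_sa_port {
	bool registered;	/* stands in for a registered MAD agent */
};

struct demo_sa_device {
	int nports;
	struct demo_sa_port port[];	/* one entry per port */
};

/* Returns the client data, or NULL when no port speaks IB or allocation fails. */
static inline struct demo_sa_device *demo_sa_add_one(const bool *port_is_ib, int nports)
{
	struct demo_sa_device *sa_dev;
	int i, count = 0;

	assert(port_is_ib && nports > 0);

	sa_dev = calloc(1, sizeof(*sa_dev) + (size_t)nports * sizeof(sa_dev->port[0]));
	if (!sa_dev)
		return NULL;
	sa_dev->nports = nports;

	for (i = 0; i < nports; i++) {
		if (!port_is_ib[i])
			continue;	/* non-IB ports get no SA state */
		sa_dev->port[i].registered = true;
		count++;
	}

	if (!count) {
		/* No IB port at all: nothing to register, release the allocation. */
		free(sa_dev);
		return NULL;
	}
	return sa_dev;
}
```

The caller owns the returned structure and releases it with `free()`; a NULL return means the device is simply not of interest to this client.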
[PATCH v8 10/23] IB/Verbs: Reform cm related part in IB-core cma/ucm
Use raw management helpers to reform cm related part in IB-core cma/ucm. Few checks focus on the device cm type rather than the port capability, directly pass port 1 works currently, but can't support mixing cm type device in future. Signed-off-by: Michael Wang --- drivers/infiniband/core/cma.c | 81 +-- drivers/infiniband/core/ucm.c | 3 +- 2 files changed, 26 insertions(+), 58 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index d570030..8a07e89 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -735,8 +735,7 @@ int rdma_init_qp_attr(struct rdma_cm_id *id, struct ib_qp_attr *qp_attr, int ret = 0; id_priv = container_of(id, struct rdma_id_private, id); - switch (rdma_node_get_transport(id_priv->id.device->node_type)) { - case RDMA_TRANSPORT_IB: + if (rdma_ib_or_iboe(id->device, id->port_num)) { if (!id_priv->cm_id.ib || (id_priv->id.qp_type == IB_QPT_UD)) ret = cma_ib_init_qp_attr(id_priv, qp_attr, qp_attr_mask); else @@ -745,19 +744,15 @@ int rdma_init_qp_attr(struct rdma_cm_id *id, struct ib_qp_attr *qp_attr, if (qp_attr->qp_state == IB_QPS_RTR) qp_attr->rq_psn = id_priv->seq_num; - break; - case RDMA_TRANSPORT_IWARP: + } else if (rdma_protocol_iwarp(id->device, id->port_num)) { if (!id_priv->cm_id.iw) { qp_attr->qp_access_flags = 0; *qp_attr_mask = IB_QP_STATE | IB_QP_ACCESS_FLAGS; } else ret = iw_cm_init_qp_attr(id_priv->cm_id.iw, qp_attr, qp_attr_mask); - break; - default: + } else ret = -ENOSYS; - break; - } return ret; } @@ -1037,17 +1032,12 @@ void rdma_destroy_id(struct rdma_cm_id *id) mutex_unlock(&id_priv->handler_mutex); if (id_priv->cma_dev) { - switch (rdma_node_get_transport(id_priv->id.device->node_type)) { - case RDMA_TRANSPORT_IB: + if (rdma_ib_or_iboe(id_priv->id.device, 1)) { if (id_priv->cm_id.ib) ib_destroy_cm_id(id_priv->cm_id.ib); - break; - case RDMA_TRANSPORT_IWARP: + } else if (rdma_protocol_iwarp(id_priv->id.device, 1)) { if (id_priv->cm_id.iw) 
iw_destroy_cm_id(id_priv->cm_id.iw); - break; - default: - break; } cma_leave_mc_groups(id_priv); cma_release_dev(id_priv); @@ -1626,7 +1616,7 @@ static void cma_listen_on_dev(struct rdma_id_private *id_priv, int ret; if (cma_family(id_priv) == AF_IB && - rdma_node_get_transport(cma_dev->device->node_type) != RDMA_TRANSPORT_IB) + !rdma_ib_or_iboe(cma_dev->device, 1)) return; id = rdma_create_id(cma_listen_handler, id_priv, id_priv->id.ps, @@ -2028,7 +2018,7 @@ static int cma_bind_loopback(struct rdma_id_private *id_priv) mutex_lock(&lock); list_for_each_entry(cur_dev, &dev_list, list) { if (cma_family(id_priv) == AF_IB && - rdma_node_get_transport(cur_dev->device->node_type) != RDMA_TRANSPORT_IB) + !rdma_ib_or_iboe(cur_dev->device, 1)) continue; if (!cma_dev) @@ -2060,7 +2050,7 @@ port_found: goto out; id_priv->id.route.addr.dev_addr.dev_type = - (rdma_port_get_link_layer(cma_dev->device, p) == IB_LINK_LAYER_INFINIBAND) ? + (rdma_protocol_ib(cma_dev->device, p)) ? ARPHRD_INFINIBAND : ARPHRD_ETHER; rdma_addr_set_sgid(&id_priv->id.route.addr.dev_addr, &gid); @@ -2537,18 +2527,15 @@ int rdma_listen(struct rdma_cm_id *id, int backlog) id_priv->backlog = backlog; if (id->device) { - switch (rdma_node_get_transport(id->device->node_type)) { - case RDMA_TRANSPORT_IB: + if (rdma_ib_or_iboe(id->device, 1)) { ret = cma_ib_listen(id_priv); if (ret) goto err; - break; - case RDMA_TRANSPORT_IWARP: + } else if (rdma_protocol_iwarp(id->device, 1)) { ret = cma_iw_listen(id_priv, backlog); if (ret) goto err; - break; -
[PATCH v8 09/23] IB/Verbs: Reform IB-core verbs
Use raw management helpers to reform IB-core verbs.

Signed-off-by: Michael Wang
---
 drivers/infiniband/core/verbs.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index f93eb8d..7dd2f51 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -198,11 +198,9 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc,
 	u32 flow_class;
 	u16 gid_index;
 	int ret;
-	int is_eth = (rdma_port_get_link_layer(device, port_num) ==
-			IB_LINK_LAYER_ETHERNET);

 	memset(ah_attr, 0, sizeof *ah_attr);
-	if (is_eth) {
+	if (rdma_protocol_iboe(device, port_num)) {
 		if (!(wc->wc_flags & IB_WC_GRH))
 			return -EPROTOTYPE;

@@ -871,7 +869,7 @@ int ib_resolve_eth_l2_attrs(struct ib_qp *qp,
 	union ib_gid sgid;

 	if ((*qp_attr_mask & IB_QP_AV) &&
-	    (rdma_port_get_link_layer(qp->device, qp_attr->ah_attr.port_num) == IB_LINK_LAYER_ETHERNET)) {
+	    (rdma_protocol_iboe(qp->device, qp_attr->ah_attr.port_num))) {
 		ret = ib_query_gid(qp->device, qp_attr->ah_attr.port_num,
 				   qp_attr->ah_attr.grh.sgid_index, &sgid);
 		if (ret)
--
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[PATCH v8 08/23] IB/Verbs: Reform IB-ulp xprtrdma
Use raw management helpers to reform IB-ulp xprtrdma. Signed-off-by: Michael Wang --- net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 4 +-- net/sunrpc/xprtrdma/svc_rdma_transport.c | 45 +--- 2 files changed, 20 insertions(+), 29 deletions(-) diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c index f9f13a3..2cc625d 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c @@ -117,8 +117,8 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp, static int rdma_read_max_sge(struct svcxprt_rdma *xprt, int sge_count) { - if (rdma_node_get_transport(xprt->sc_cm_id->device->node_type) == -RDMA_TRANSPORT_IWARP) + if (rdma_protocol_iwarp(xprt->sc_cm_id->device, + xprt->sc_cm_id->port_num)) return 1; else return min_t(int, sge_count, xprt->sc_max_sge); diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c index f609c1c..3df8320 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c @@ -851,7 +851,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt) struct ib_qp_init_attr qp_attr; struct ib_device_attr devattr; int uninitialized_var(dma_mr_acc); - int need_dma_mr; + int need_dma_mr = 0; int ret; int i; @@ -985,35 +985,26 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt) /* * Determine if a DMA MR is required and if so, what privs are required */ - switch (rdma_node_get_transport(newxprt->sc_cm_id->device->node_type)) { - case RDMA_TRANSPORT_IWARP: - newxprt->sc_dev_caps |= SVCRDMA_DEVCAP_READ_W_INV; - if (!(newxprt->sc_dev_caps & SVCRDMA_DEVCAP_FAST_REG)) { - need_dma_mr = 1; - dma_mr_acc = - (IB_ACCESS_LOCAL_WRITE | -IB_ACCESS_REMOTE_WRITE); - } else if (!(devattr.device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY)) { - need_dma_mr = 1; - dma_mr_acc = IB_ACCESS_LOCAL_WRITE; - } else - need_dma_mr = 0; - break; - case RDMA_TRANSPORT_IB: - if (!(newxprt->sc_dev_caps & 
SVCRDMA_DEVCAP_FAST_REG)) { - need_dma_mr = 1; - dma_mr_acc = IB_ACCESS_LOCAL_WRITE; - } else if (!(devattr.device_cap_flags & -IB_DEVICE_LOCAL_DMA_LKEY)) { - need_dma_mr = 1; - dma_mr_acc = IB_ACCESS_LOCAL_WRITE; - } else - need_dma_mr = 0; - break; - default: + if (!rdma_protocol_iwarp(newxprt->sc_cm_id->device, +newxprt->sc_cm_id->port_num) && + !rdma_ib_or_iboe(newxprt->sc_cm_id->device, +newxprt->sc_cm_id->port_num)) goto errout; + + if (!(newxprt->sc_dev_caps & SVCRDMA_DEVCAP_FAST_REG) || + !(devattr.device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY)) { + need_dma_mr = 1; + dma_mr_acc = IB_ACCESS_LOCAL_WRITE; + if (rdma_protocol_iwarp(newxprt->sc_cm_id->device, + newxprt->sc_cm_id->port_num) && + !(newxprt->sc_dev_caps & SVCRDMA_DEVCAP_FAST_REG)) + dma_mr_acc |= IB_ACCESS_REMOTE_WRITE; } + if (rdma_protocol_iwarp(newxprt->sc_cm_id->device, + newxprt->sc_cm_id->port_num)) + newxprt->sc_dev_caps |= SVCRDMA_DEVCAP_READ_W_INV; + /* Create the DMA MR if needed, otherwise, use the DMA LKEY */ if (need_dma_mr) { /* Register all of physical memory */ -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v8 04/23] IB/Verbs: Reform IB-core cm
Use raw management helpers to reform IB-core cm.

Signed-off-by: Michael Wang
---
 drivers/infiniband/core/cm.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index e28a494..add5e484 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -3760,11 +3760,9 @@ static void cm_add_one(struct ib_device *ib_device)
 	};
 	unsigned long flags;
 	int ret;
+	int count = 0;
 	u8 i;

-	if (rdma_node_get_transport(ib_device->node_type) != RDMA_TRANSPORT_IB)
-		return;
-
 	cm_dev = kzalloc(sizeof(*cm_dev) + sizeof(*port) *
 			 ib_device->phys_port_cnt, GFP_KERNEL);
 	if (!cm_dev)
@@ -3783,6 +3781,9 @@ static void cm_add_one(struct ib_device *ib_device)

 	set_bit(IB_MGMT_METHOD_SEND, reg_req.method_mask);
 	for (i = 1; i <= ib_device->phys_port_cnt; i++) {
+		if (!rdma_ib_or_iboe(ib_device, i))
+			continue;
+
 		port = kzalloc(sizeof *port, GFP_KERNEL);
 		if (!port)
 			goto error1;
@@ -3809,7 +3810,13 @@ static void cm_add_one(struct ib_device *ib_device)
 		ret = ib_modify_port(ib_device, i, 0, &port_modify);
 		if (ret)
 			goto error3;
+
+		count++;
 	}
+
+	if (!count)
+		goto free;
+
 	ib_set_client_data(ib_device, &cm_client, cm_dev);

 	write_lock_irqsave(&cm.device_lock, flags);
@@ -3825,11 +3832,15 @@ error1:
 	port_modify.set_port_cap_mask = 0;
 	port_modify.clr_port_cap_mask = IB_PORT_CM_SUP;
 	while (--i) {
+		if (!rdma_ib_or_iboe(ib_device, i))
+			continue;
+
 		port = cm_dev->port[i-1];
 		ib_modify_port(ib_device, port->port_num, 0, &port_modify);
 		ib_unregister_mad_agent(port->mad_agent);
 		cm_remove_port_fs(port);
 	}
+free:
 	device_unregister(cm_dev->device);
 	kfree(cm_dev);
 }
@@ -3853,6 +3864,9 @@ static void cm_remove_one(struct ib_device *ib_device)
 	write_unlock_irqrestore(&cm.device_lock, flags);

 	for (i = 1; i <= ib_device->phys_port_cnt; i++) {
+		if (!rdma_ib_or_iboe(ib_device, i))
+			continue;
+
 		port = cm_dev->port[i-1];
 		ib_modify_port(ib_device, port->port_num, 0, &port_modify);
 		ib_unregister_mad_agent(port->mad_agent);
--
2.1.0
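The per-port filtering that this patch moves into cm_add_one() can be modelled outside the kernel. What follows is only a sketch under stated assumptions: `mock_device`, `mock_protocol`, `port_is_ib_or_iboe()` and `count_cm_ports()` are hypothetical names invented for illustration and are not kernel APIs. It shows why the patch needs a `count`: once the check is per port instead of per device, a device whose ports are all non-IB ends up registering nothing and has to be freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-port protocol tags, loosely modelled on the series. */
enum mock_protocol { MOCK_PROTO_IB, MOCK_PROTO_IBOE, MOCK_PROTO_IWARP };

#define MOCK_MAX_PORTS 8

/* Hypothetical stand-in for struct ib_device; proto[0] describes port 1. */
struct mock_device {
	unsigned char phys_port_cnt;
	enum mock_protocol proto[MOCK_MAX_PORTS];
};

/* Stand-in for the per-port check: true for IB and IBoE ports only. */
static inline bool port_is_ib_or_iboe(const struct mock_device *dev,
				      unsigned char port_num)
{
	assert(dev != NULL);
	assert(port_num >= 1 && port_num <= dev->phys_port_cnt);

	enum mock_protocol pt = dev->proto[port_num - 1];

	return pt == MOCK_PROTO_IB || pt == MOCK_PROTO_IBOE;
}

/* Walk ports 1..phys_port_cnt, skip unsupported ones, count the rest. */
static inline int count_cm_ports(const struct mock_device *dev)
{
	int count = 0;

	assert(dev != NULL && dev->phys_port_cnt <= MOCK_MAX_PORTS);

	for (unsigned char i = 1; i <= dev->phys_port_cnt; i++) {
		if (!port_is_ib_or_iboe(dev, i))
			continue;
		count++;
	}

	/* A caller would tear the device state down when this is 0. */
	return count;
}
```

With a mixed device (one IB, one iWARP, one IBoE port) the count is 2; with an iWARP-only device it is 0, which corresponds to the `goto free` path in the patch.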
[PATCH v8 12/23] IB/Verbs: Reform mcast related part in IB-core cma
Use raw management helpers to reform mcast related part in IB-core cma. Signed-off-by: Michael Wang --- drivers/infiniband/core/cma.c | 56 ++- 1 file changed, 18 insertions(+), 38 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 36c5f8a..34ec13f 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -997,17 +997,12 @@ static void cma_leave_mc_groups(struct rdma_id_private *id_priv) mc = container_of(id_priv->mc_list.next, struct cma_multicast, list); list_del(&mc->list); - switch (rdma_port_get_link_layer(id_priv->cma_dev->device, id_priv->id.port_num)) { - case IB_LINK_LAYER_INFINIBAND: + if (rdma_protocol_ib(id_priv->cma_dev->device, + id_priv->id.port_num)) { ib_sa_free_multicast(mc->multicast.ib); kfree(mc); - break; - case IB_LINK_LAYER_ETHERNET: + } else kref_put(&mc->mcref, release_mc); - break; - default: - break; - } } } @@ -3314,24 +3309,13 @@ int rdma_join_multicast(struct rdma_cm_id *id, struct sockaddr *addr, list_add(&mc->list, &id_priv->mc_list); spin_unlock(&id_priv->lock); - switch (rdma_node_get_transport(id->device->node_type)) { - case RDMA_TRANSPORT_IB: - switch (rdma_port_get_link_layer(id->device, id->port_num)) { - case IB_LINK_LAYER_INFINIBAND: - ret = cma_join_ib_multicast(id_priv, mc); - break; - case IB_LINK_LAYER_ETHERNET: - kref_init(&mc->mcref); - ret = cma_iboe_join_multicast(id_priv, mc); - break; - default: - ret = -EINVAL; - } - break; - default: + if (rdma_protocol_iboe(id->device, id->port_num)) { + kref_init(&mc->mcref); + ret = cma_iboe_join_multicast(id_priv, mc); + } else if (rdma_protocol_ib(id->device, id->port_num)) + ret = cma_join_ib_multicast(id_priv, mc); + else ret = -ENOSYS; - break; - } if (ret) { spin_lock_irq(&id_priv->lock); @@ -3359,19 +3343,15 @@ void rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr) ib_detach_mcast(id->qp, &mc->multicast.ib->rec.mgid, be16_to_cpu(mc->multicast.ib->rec.mlid)); - if 
(rdma_node_get_transport(id_priv->cma_dev->device->node_type) == RDMA_TRANSPORT_IB) { - switch (rdma_port_get_link_layer(id->device, id->port_num)) { - case IB_LINK_LAYER_INFINIBAND: - ib_sa_free_multicast(mc->multicast.ib); - kfree(mc); - break; - case IB_LINK_LAYER_ETHERNET: - kref_put(&mc->mcref, release_mc); - break; - default: - break; - } - } + + BUG_ON(id_priv->cma_dev->device != id->device); + + if (rdma_protocol_ib(id->device, id->port_num)) { + ib_sa_free_multicast(mc->multicast.ib); + kfree(mc); + } else if (rdma_protocol_iboe(id->device, id->port_num)) + kref_put(&mc->mcref, release_mc); + return; } } -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v8 13/23] IB/Verbs: Reform cma_acquire_dev()
Reform cma_acquire_dev() with management helpers, introduce cma_validate_port() to make the code more clean. Signed-off-by: Michael Wang --- drivers/infiniband/core/cma.c | 68 +-- 1 file changed, 40 insertions(+), 28 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 34ec13f..3fb3458 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -349,18 +349,35 @@ static int cma_translate_addr(struct sockaddr *addr, struct rdma_dev_addr *dev_a return ret; } +static inline int cma_validate_port(struct ib_device *device, u8 port, + union ib_gid *gid, int dev_type) +{ + u8 found_port; + int ret = -ENODEV; + + if ((dev_type == ARPHRD_INFINIBAND) && !rdma_protocol_ib(device, port)) + return ret; + + if ((dev_type != ARPHRD_INFINIBAND) && rdma_protocol_ib(device, port)) + return ret; + + ret = ib_find_cached_gid(device, gid, &found_port, NULL); + if (port != found_port) + return -ENODEV; + + return ret; +} + static int cma_acquire_dev(struct rdma_id_private *id_priv, struct rdma_id_private *listen_id_priv) { struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr; struct cma_device *cma_dev; - union ib_gid gid, iboe_gid; + union ib_gid gid, iboe_gid, *gidp; int ret = -ENODEV; - u8 port, found_port; - enum rdma_link_layer dev_ll = dev_addr->dev_type == ARPHRD_INFINIBAND ? 
- IB_LINK_LAYER_INFINIBAND : IB_LINK_LAYER_ETHERNET; + u8 port; - if (dev_ll != IB_LINK_LAYER_INFINIBAND && + if (dev_addr->dev_type != ARPHRD_INFINIBAND && id_priv->id.ps == RDMA_PS_IPOIB) return -EINVAL; @@ -370,41 +387,36 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv, memcpy(&gid, dev_addr->src_dev_addr + rdma_addr_gid_offset(dev_addr), sizeof gid); - if (listen_id_priv && - rdma_port_get_link_layer(listen_id_priv->id.device, -listen_id_priv->id.port_num) == dev_ll) { + + if (listen_id_priv) { cma_dev = listen_id_priv->cma_dev; port = listen_id_priv->id.port_num; - if (rdma_node_get_transport(cma_dev->device->node_type) == RDMA_TRANSPORT_IB && - rdma_port_get_link_layer(cma_dev->device, port) == IB_LINK_LAYER_ETHERNET) - ret = ib_find_cached_gid(cma_dev->device, &iboe_gid, -&found_port, NULL); - else - ret = ib_find_cached_gid(cma_dev->device, &gid, -&found_port, NULL); + gidp = rdma_protocol_iboe(cma_dev->device, port) ? + &iboe_gid : &gid; - if (!ret && (port == found_port)) { - id_priv->id.port_num = found_port; + ret = cma_validate_port(cma_dev->device, port, gidp, + dev_addr->dev_type); + if (!ret) { + id_priv->id.port_num = port; goto out; } } + list_for_each_entry(cma_dev, &dev_list, list) { for (port = 1; port <= cma_dev->device->phys_port_cnt; ++port) { if (listen_id_priv && listen_id_priv->cma_dev == cma_dev && listen_id_priv->id.port_num == port) continue; - if (rdma_port_get_link_layer(cma_dev->device, port) == dev_ll) { - if (rdma_node_get_transport(cma_dev->device->node_type) == RDMA_TRANSPORT_IB && - rdma_port_get_link_layer(cma_dev->device, port) == IB_LINK_LAYER_ETHERNET) - ret = ib_find_cached_gid(cma_dev->device, &iboe_gid, &found_port, NULL); - else - ret = ib_find_cached_gid(cma_dev->device, &gid, &found_port, NULL); - - if (!ret && (port == found_port)) { - id_priv->id.port_num = found_port; - goto out; - } + + gidp = rdma_protocol_iboe(cma_dev->device, port) ? + &iboe_gid : &gid; + + ret = cma_v
[PATCH v8 15/23] IB/Verbs: Use management helper rdma_cap_ib_mad()
Introduce helper rdma_cap_ib_mad() to help us check if the port of an
IB device supports Infiniband Management Datagrams.

Signed-off-by: Michael Wang
---
 drivers/infiniband/core/mad.c      |  6 +++---
 drivers/infiniband/core/user_mad.c |  6 +++---
 include/rdma/ib_verbs.h            | 15 +++++++++++++++
 3 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 507eb67..80777cd 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -3066,7 +3066,7 @@ static void ib_mad_init_device(struct ib_device *device)
 	}

 	for (i = start; i <= end; i++) {
-		if (!rdma_ib_or_iboe(device, i))
+		if (!rdma_cap_ib_mad(device, i))
 			continue;

 		if (ib_mad_port_open(device, i)) {
@@ -3087,7 +3087,7 @@ error_agent:

 error:
 	while (--i >= start) {
-		if (!rdma_ib_or_iboe(device, i))
+		if (!rdma_cap_ib_mad(device, i))
 			continue;

 		if (ib_agent_port_close(device, i))
@@ -3111,7 +3111,7 @@ static void ib_mad_remove_device(struct ib_device *device)
 	}

 	for (i = start; i <= end; i++) {
-		if (!rdma_ib_or_iboe(device, i))
+		if (!rdma_cap_ib_mad(device, i))
 			continue;

 		if (ib_agent_port_close(device, i))
diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index aa8b334..d451717 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -1294,7 +1294,7 @@ static void ib_umad_add_one(struct ib_device *device)
 	umad_dev->end_port = e;

 	for (i = s; i <= e; ++i) {
-		if (!rdma_ib_or_iboe(device, i))
+		if (!rdma_cap_ib_mad(device, i))
 			continue;

 		umad_dev->port[i - s].umad_dev = umad_dev;
@@ -1315,7 +1315,7 @@ static void ib_umad_add_one(struct ib_device *device)

 err:
 	while (--i >= s) {
-		if (!rdma_ib_or_iboe(device, i))
+		if (!rdma_cap_ib_mad(device, i))
 			continue;

 		ib_umad_kill_port(&umad_dev->port[i - s]);
@@ -1333,7 +1333,7 @@ static void ib_umad_remove_one(struct ib_device *device)
 		return;

 	for (i = 0; i <= umad_dev->end_port - umad_dev->start_port; ++i) {
-		if (rdma_ib_or_iboe(device, i))
+		if (rdma_cap_ib_mad(device, i))
 			ib_umad_kill_port(&umad_dev->port[i]);
 	}

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index e6dd984..23ba66e 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1774,6 +1774,21 @@ static inline bool rdma_ib_or_iboe(struct ib_device *device, u8 port_num)
 	return (pt == RDMA_PROTOCOL_IB || pt == RDMA_PROTOCOL_IBOE);
 }

+/**
+ * rdma_cap_ib_mad - Check if the port of device has the capability Infiniband
+ * Management Datagrams.
+ *
+ * @device: Device to be checked
+ * @port_num: Port number of the device
+ *
+ * Return false when port of the device don't support Infiniband
+ * Management Datagrams.
+ */
+static inline bool rdma_cap_ib_mad(struct ib_device *device, u8 port_num)
+{
+	return rdma_ib_or_iboe(device, port_num);
+}
+
 int ib_query_gid(struct ib_device *device,
 		 u8 port_num, int index, union ib_gid *gid);

--
2.1.0
[PATCH v8 14/23] IB/Verbs: Reform rest part in IB-core cma
Use raw management helpers to reform the remaining parts of IB-core cma.

Signed-off-by: Michael Wang
---
 drivers/infiniband/core/cma.c | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 3fb3458..d43f492f 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -447,10 +447,10 @@ static int cma_resolve_ib_dev(struct rdma_id_private *id_priv)
 	pkey = ntohs(addr->sib_pkey);

 	list_for_each_entry(cur_dev, &dev_list, list) {
-		if (rdma_node_get_transport(cur_dev->device->node_type) != RDMA_TRANSPORT_IB)
-			continue;
-
 		for (p = 1; p <= cur_dev->device->phys_port_cnt; ++p) {
+			if (!rdma_ib_or_iboe(cur_dev->device, p))
+				continue;
+
 			if (ib_find_cached_pkey(cur_dev->device, p, pkey, &index))
 				continue;

@@ -645,10 +645,9 @@ static int cma_modify_qp_rtr(struct rdma_id_private *id_priv,
 	if (ret)
 		goto out;

-	if (rdma_node_get_transport(id_priv->cma_dev->device->node_type)
-	    == RDMA_TRANSPORT_IB &&
-	    rdma_port_get_link_layer(id_priv->id.device, id_priv->id.port_num)
-	    == IB_LINK_LAYER_ETHERNET) {
+	BUG_ON(id_priv->cma_dev->device != id_priv->id.device);
+
+	if (rdma_protocol_iboe(id_priv->id.device, id_priv->id.port_num)) {
 		ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr.smac, NULL);

 		if (ret)
@@ -712,11 +711,10 @@ static int cma_ib_init_qp_attr(struct rdma_id_private *id_priv,
 	int ret;
 	u16 pkey;

-	if (rdma_port_get_link_layer(id_priv->id.device, id_priv->id.port_num) ==
-	    IB_LINK_LAYER_INFINIBAND)
-		pkey = ib_addr_get_pkey(dev_addr);
-	else
+	if (rdma_protocol_iboe(id_priv->id.device, id_priv->id.port_num))
 		pkey = 0xffff;
+	else
+		pkey = ib_addr_get_pkey(dev_addr);

 	ret = ib_find_cached_pkey(id_priv->id.device, id_priv->id.port_num,
 				  pkey, &qp_attr->pkey_index);
--
2.1.0
[PATCH v8 22/23] IB/Verbs: Use management helper rdma_cap_af_ib()
Introduce helper rdma_cap_af_ib() to help us check if the port of an
IB device supports Native Infiniband Address.

Signed-off-by: Michael Wang
---
 drivers/infiniband/core/cma.c |  2 +-
 include/rdma/ib_verbs.h       | 15 +++++++++++++++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 101e9cc..a6cbf42 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -448,7 +448,7 @@ static int cma_resolve_ib_dev(struct rdma_id_private *id_priv)

 	list_for_each_entry(cur_dev, &dev_list, list) {
 		for (p = 1; p <= cur_dev->device->phys_port_cnt; ++p) {
-			if (!rdma_ib_or_iboe(cur_dev->device, p))
+			if (!rdma_cap_af_ib(cur_dev->device, p))
 				continue;

 			if (ib_find_cached_pkey(cur_dev->device, p, pkey, &index))
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index ac1e7f1..41f8445 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1865,6 +1865,21 @@ static inline bool rdma_cap_ib_mcast(struct ib_device *device, u8 port_num)
 }

 /**
+ * rdma_cap_af_ib - Check if the port of device has the capability
+ * Native Infiniband Address.
+ *
+ * @device: Device to be checked
+ * @port_num: Port number of the device
+ *
+ * Return false when port of the device don't support
+ * Native Infiniband Address.
+ */
+static inline bool rdma_cap_af_ib(struct ib_device *device, u8 port_num)
+{
+	return rdma_ib_or_iboe(device, port_num);
+}
+
+/**
  * rdma_cap_read_multi_sge - Check if the port of device has the capability
  * RDMA Read Multiple Scatter-Gather Entries.
  *
--
2.1.0
[PATCH v8 21/23] IB/Verbs: Use management helper rdma_cap_read_multi_sge()
Introduce helper rdma_cap_read_multi_sge() to help us check if the port of an
IB device supports RDMA Read Multiple Scatter-Gather Entries.

Signed-off-by: Michael Wang
---
 include/rdma/ib_verbs.h                 | 16 ++++++++++++++++
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c |  4 ++--
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 6bbbc86..2cf23b1 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1864,6 +1864,22 @@ static inline bool rdma_cap_ib_mcast(struct ib_device *device, u8 port_num)
 	return rdma_cap_ib_sa(device, port_num);
 }

+/**
+ * rdma_cap_read_multi_sge - Check if the port of device has the capability
+ * RDMA Read Multiple Scatter-Gather Entries.
+ *
+ * @device: Device to be checked
+ * @port_num: Port number of the device
+ *
+ * Return false when port of the device don't support
+ * RDMA Read Multiple Scatter-Gather Entries.
+ */
+static inline bool rdma_cap_read_multi_sge(struct ib_device *device,
+					   u8 port_num)
+{
+	return !rdma_protocol_iwarp(device, port_num);
+}
+
 int ib_query_gid(struct ib_device *device,
 		 u8 port_num, int index, union ib_gid *gid);

diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 2cc625d..86b4416 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -117,8 +117,8 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,

 static int rdma_read_max_sge(struct svcxprt_rdma *xprt, int sge_count)
 {
-	if (rdma_protocol_iwarp(xprt->sc_cm_id->device,
-				xprt->sc_cm_id->port_num))
+	if (!rdma_cap_read_multi_sge(xprt->sc_cm_id->device,
+				     xprt->sc_cm_id->port_num))
 		return 1;
 	else
 		return min_t(int, sge_count, xprt->sc_max_sge);
--
2.1.0
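The effect of this patch on rdma_read_max_sge() can be checked with a small userspace model. This is a sketch only: `mock_protocol`, `mock_cap_read_multi_sge()` and `mock_read_max_sge()` are hypothetical names, and the protocol is passed in directly rather than looked up from a device and port as the kernel helpers do. The point it illustrates is that the caller now asks about a capability (multi-SGE reads) while the helper keeps the protocol knowledge (iWARP lacks it) in one place.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-port protocol tags, loosely modelled on the series. */
enum mock_protocol { MOCK_PROTO_IB, MOCK_PROTO_IBOE, MOCK_PROTO_IWARP };

/* Stand-in for rdma_protocol_iwarp(): is this port an iWARP port? */
static inline bool mock_protocol_iwarp(enum mock_protocol pt)
{
	return pt == MOCK_PROTO_IWARP;
}

/* Stand-in for rdma_cap_read_multi_sge(): everything except iWARP. */
static inline bool mock_cap_read_multi_sge(enum mock_protocol pt)
{
	return !mock_protocol_iwarp(pt);
}

/*
 * Stand-in for rdma_read_max_sge(): a port without the capability is
 * limited to one SGE per RDMA Read, otherwise the smaller of the
 * requested count and the transport maximum is used.
 */
static inline int mock_read_max_sge(enum mock_protocol pt, int sge_count,
				    int max_sge)
{
	assert(sge_count >= 0 && max_sge >= 0);

	if (!mock_cap_read_multi_sge(pt))
		return 1;

	return sge_count < max_sge ? sge_count : max_sge;
}
```

Because the double negation (`!cap` of `!iwarp`) reduces to the original iWARP test, behaviour is unchanged; only the question asked at the call site differs.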
[PATCH v8 23/23] IB/Verbs: Use management helper rdma_cap_eth_ah()
Introduce helper rdma_cap_eth_ah() to help us check if the port of an
IB device supports Ethernet Address Handler.

Signed-off-by: Michael Wang
---
 drivers/infiniband/core/cma.c      |  2 +-
 drivers/infiniband/core/sa_query.c |  2 +-
 drivers/infiniband/core/verbs.c    |  4 ++--
 include/rdma/ib_verbs.h            | 15 +++++++++++++++
 4 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index a6cbf42..0a3e859 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -711,7 +711,7 @@ static int cma_ib_init_qp_attr(struct rdma_id_private *id_priv,
 	int ret;
 	u16 pkey;

-	if (rdma_protocol_iboe(id_priv->id.device, id_priv->id.port_num))
+	if (rdma_cap_eth_ah(id_priv->id.device, id_priv->id.port_num))
 		pkey = 0xffff;
 	else
 		pkey = ib_addr_get_pkey(dev_addr);
diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 30aa5e5..7f7c8c9 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -540,7 +540,7 @@ int ib_init_ah_from_path(struct ib_device *device, u8 port_num,
 	ah_attr->port_num = port_num;
 	ah_attr->static_rate = rec->rate;

-	force_grh = rdma_protocol_iboe(device, port_num);
+	force_grh = rdma_cap_eth_ah(device, port_num);

 	if (rec->hop_limit > 1 || force_grh) {
 		ah_attr->ah_flags = IB_AH_GRH;
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 7dd2f51..d110a5e 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -200,7 +200,7 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc,
 	int ret;

 	memset(ah_attr, 0, sizeof *ah_attr);
-	if (rdma_protocol_iboe(device, port_num)) {
+	if (rdma_cap_eth_ah(device, port_num)) {
 		if (!(wc->wc_flags & IB_WC_GRH))
 			return -EPROTOTYPE;

@@ -869,7 +869,7 @@ int ib_resolve_eth_l2_attrs(struct ib_qp *qp,
 	union ib_gid sgid;

 	if ((*qp_attr_mask & IB_QP_AV) &&
-	    (rdma_protocol_iboe(qp->device, qp_attr->ah_attr.port_num))) {
+	    (rdma_cap_eth_ah(qp->device, qp_attr->ah_attr.port_num))) {
 		ret = ib_query_gid(qp->device, qp_attr->ah_attr.port_num,
 				   qp_attr->ah_attr.grh.sgid_index, &sgid);
 		if (ret)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 41f8445..721c378 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1880,6 +1880,21 @@ static inline bool rdma_cap_af_ib(struct ib_device *device, u8 port_num)
 }

 /**
+ * rdma_cap_eth_ah - Check if the port of device has the capability
+ * Ethernet Address Handler.
+ *
+ * @device: Device to be checked
+ * @port_num: Port number of the device
+ *
+ * Return false when port of the device don't support
+ * Ethernet Address Handler.
+ */
+static inline bool rdma_cap_eth_ah(struct ib_device *device, u8 port_num)
+{
+	return rdma_protocol_iboe(device, port_num);
+}
+
+/**
  * rdma_cap_read_multi_sge - Check if the port of device has the capability
  * RDMA Read Multiple Scatter-Gather Entries.
  *
--
2.1.0
[PATCH v8 20/23] IB/Verbs: Use management helper rdma_cap_ib_mcast()
Introduce helper rdma_cap_ib_mcast() to help us check if the port of an IB device support Infiniband Multicast. Signed-off-by: Michael Wang --- drivers/infiniband/core/cma.c | 6 +++--- drivers/infiniband/core/multicast.c | 6 +++--- include/rdma/ib_verbs.h | 15 +++ 3 files changed, 21 insertions(+), 6 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 8def2f5..101e9cc 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -1007,7 +1007,7 @@ static void cma_leave_mc_groups(struct rdma_id_private *id_priv) mc = container_of(id_priv->mc_list.next, struct cma_multicast, list); list_del(&mc->list); - if (rdma_protocol_ib(id_priv->cma_dev->device, + if (rdma_cap_ib_mcast(id_priv->cma_dev->device, id_priv->id.port_num)) { ib_sa_free_multicast(mc->multicast.ib); kfree(mc); @@ -3321,7 +3321,7 @@ int rdma_join_multicast(struct rdma_cm_id *id, struct sockaddr *addr, if (rdma_protocol_iboe(id->device, id->port_num)) { kref_init(&mc->mcref); ret = cma_iboe_join_multicast(id_priv, mc); - } else if (rdma_protocol_ib(id->device, id->port_num)) + } else if (rdma_cap_ib_mcast(id->device, id->port_num)) ret = cma_join_ib_multicast(id_priv, mc); else ret = -ENOSYS; @@ -3355,7 +3355,7 @@ void rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr) BUG_ON(id_priv->cma_dev->device != id->device); - if (rdma_protocol_ib(id->device, id->port_num)) { + if (rdma_cap_ib_mcast(id->device, id->port_num)) { ib_sa_free_multicast(mc->multicast.ib); kfree(mc); } else if (rdma_protocol_iboe(id->device, id->port_num)) diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c index b57ed03..605f20a 100644 --- a/drivers/infiniband/core/multicast.c +++ b/drivers/infiniband/core/multicast.c @@ -780,7 +780,7 @@ static void mcast_event_handler(struct ib_event_handler *handler, int index; dev = container_of(handler, struct mcast_device, event_handler); - if (WARN_ON(!rdma_protocol_ib(dev->device, 
event->element.port_num))) + if (WARN_ON(!rdma_cap_ib_mcast(dev->device, event->element.port_num))) return; index = event->element.port_num - dev->start_port; @@ -820,7 +820,7 @@ static void mcast_add_one(struct ib_device *device) } for (i = 0; i <= dev->end_port - dev->start_port; i++) { - if (!rdma_protocol_ib(device, dev->start_port + i)) + if (!rdma_cap_ib_mcast(device, dev->start_port + i)) continue; port = &dev->port[i]; port->dev = dev; @@ -858,7 +858,7 @@ static void mcast_remove_one(struct ib_device *device) flush_workqueue(mcast_wq); for (i = 0; i <= dev->end_port - dev->start_port; i++) { - if (rdma_protocol_ib(device, dev->start_port + i)) { + if (rdma_cap_ib_mcast(device, dev->start_port + i)) { port = &dev->port[i]; deref_port(port); wait_for_completion(&port->comp); diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index c3a561e8..6bbbc86 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1849,6 +1849,21 @@ static inline bool rdma_cap_ib_sa(struct ib_device *device, u8 port_num) return rdma_protocol_ib(device, port_num); } +/** + * rdma_cap_ib_mcast - Check if the port of device has the capability Infiniband + * Multicast. + * + * @device: Device to be checked + * @port_num: Port number of the device + * + * Return false when port of the device don't support Infiniband + * Multicast. + */ +static inline bool rdma_cap_ib_mcast(struct ib_device *device, u8 port_num) +{ + return rdma_cap_ib_sa(device, port_num); +} + int ib_query_gid(struct ib_device *device, u8 port_num, int index, union ib_gid *gid); -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
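Taken together, the capability helpers quoted in these patches form a thin layer over the protocol helpers. The sketch below models that layering in userspace so the mapping can be read at a glance. It is an illustration only: every `mock_*` name is hypothetical, the protocol is passed in directly instead of a device and port number, and the mappings are copied from the hunks in this thread (they may well differ in later revisions of the series).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-port protocol tags, loosely modelled on the series. */
enum mock_protocol { MOCK_PROTO_IB, MOCK_PROTO_IBOE, MOCK_PROTO_IWARP };

/* Protocol layer: what the port speaks. */
static inline bool mock_protocol_ib(enum mock_protocol pt)
{
	return pt == MOCK_PROTO_IB;
}

static inline bool mock_protocol_iboe(enum mock_protocol pt)
{
	return pt == MOCK_PROTO_IBOE;
}

static inline bool mock_protocol_iwarp(enum mock_protocol pt)
{
	return pt == MOCK_PROTO_IWARP;
}

static inline bool mock_ib_or_iboe(enum mock_protocol pt)
{
	return mock_protocol_ib(pt) || mock_protocol_iboe(pt);
}

/* Capability layer: what the port can do, as mapped in the quoted hunks. */
static inline bool mock_cap_ib_mad(enum mock_protocol pt)
{
	return mock_ib_or_iboe(pt);
}

static inline bool mock_cap_af_ib(enum mock_protocol pt)
{
	return mock_ib_or_iboe(pt);
}

static inline bool mock_cap_ib_sa(enum mock_protocol pt)
{
	return mock_protocol_ib(pt);
}

static inline bool mock_cap_ib_mcast(enum mock_protocol pt)
{
	return mock_cap_ib_sa(pt);	/* multicast is defined via SA */
}

static inline bool mock_cap_eth_ah(enum mock_protocol pt)
{
	return mock_protocol_iboe(pt);
}

static inline bool mock_cap_read_multi_sge(enum mock_protocol pt)
{
	return !mock_protocol_iwarp(pt);
}

/* Relations that follow from the mappings above, for any protocol. */
static inline void mock_check_layering(enum mock_protocol pt)
{
	/* IB multicast implies IB MAD support in this model. */
	assert(!mock_cap_ib_mcast(pt) || mock_cap_ib_mad(pt));
	/* A port is never both SA-capable and Ethernet-AH-capable. */
	assert(!(mock_cap_ib_sa(pt) && mock_cap_eth_ah(pt)));
}
```

The design choice this models is that call sites name the feature they depend on, so a later change in how a capability is derived only touches the helper body, not every caller.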
[PATCH v8 19/23] IB/Verbs: Use management helper rdma_cap_ib_sa()
Introduce helper rdma_cap_ib_sa() to help us check if the port of an IB device support Infiniband Subnet Administration. Signed-off-by: Michael Wang --- drivers/infiniband/core/cma.c | 4 ++-- drivers/infiniband/core/sa_query.c | 10 +- drivers/infiniband/core/ucma.c | 2 +- include/rdma/ib_verbs.h| 15 +++ 4 files changed, 23 insertions(+), 8 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 0787035..8def2f5 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -933,7 +933,7 @@ static inline int cma_user_data_offset(struct rdma_id_private *id_priv) static void cma_cancel_route(struct rdma_id_private *id_priv) { - if (rdma_protocol_ib(id_priv->id.device, id_priv->id.port_num)) { + if (rdma_cap_ib_sa(id_priv->id.device, id_priv->id.port_num)) { if (id_priv->query) ib_sa_cancel_query(id_priv->query_id, id_priv->query); } @@ -1957,7 +1957,7 @@ int rdma_resolve_route(struct rdma_cm_id *id, int timeout_ms) return -EINVAL; atomic_inc(&id_priv->refcount); - if (rdma_protocol_ib(id->device, id->port_num)) + if (rdma_cap_ib_sa(id->device, id->port_num)) ret = cma_resolve_ib_route(id_priv, timeout_ms); else if (rdma_protocol_iboe(id->device, id->port_num)) ret = cma_resolve_iboe_route(id_priv); diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c index b115c28..30aa5e5 100644 --- a/drivers/infiniband/core/sa_query.c +++ b/drivers/infiniband/core/sa_query.c @@ -450,7 +450,7 @@ static void ib_sa_event(struct ib_event_handler *handler, struct ib_event *event struct ib_sa_port *port = &sa_dev->port[event->element.port_num - sa_dev->start_port]; - if (WARN_ON(!rdma_protocol_ib(handler->device, port->port_num))) + if (WARN_ON(!rdma_cap_ib_sa(handler->device, port->port_num))) return; spin_lock_irqsave(&port->ah_lock, flags); @@ -1173,7 +1173,7 @@ static void ib_sa_add_one(struct ib_device *device) for (i = 0; i <= e - s; ++i) { spin_lock_init(&sa_dev->port[i].ah_lock); - if 
(!rdma_protocol_ib(device, i + 1)) + if (!rdma_cap_ib_sa(device, i + 1)) continue; sa_dev->port[i].sm_ah= NULL; @@ -1208,7 +1208,7 @@ static void ib_sa_add_one(struct ib_device *device) goto err; for (i = 0; i <= e - s; ++i) { - if (rdma_protocol_ib(device, i + 1)) + if (rdma_cap_ib_sa(device, i + 1)) update_sm_ah(&sa_dev->port[i].update_task); } @@ -1216,7 +1216,7 @@ static void ib_sa_add_one(struct ib_device *device) err: while (--i >= 0) { - if (rdma_protocol_ib(device, i + 1)) + if (rdma_cap_ib_sa(device, i + 1)) ib_unregister_mad_agent(sa_dev->port[i].agent); } free: @@ -1237,7 +1237,7 @@ static void ib_sa_remove_one(struct ib_device *device) flush_workqueue(ib_wq); for (i = 0; i <= sa_dev->end_port - sa_dev->start_port; ++i) { - if (rdma_protocol_ib(device, i + 1)) { + if (rdma_cap_ib_sa(device, i + 1)) { ib_unregister_mad_agent(sa_dev->port[i].agent); if (sa_dev->port[i].sm_ah) kref_put(&sa_dev->port[i].sm_ah->ref, free_sm_ah); diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c index dae7620..d42b816 100644 --- a/drivers/infiniband/core/ucma.c +++ b/drivers/infiniband/core/ucma.c @@ -723,7 +723,7 @@ static ssize_t ucma_query_route(struct ucma_file *file, resp.node_guid = (__force __u64) ctx->cm_id->device->node_guid; resp.port_num = ctx->cm_id->port_num; - if (rdma_protocol_ib(ctx->cm_id->device, ctx->cm_id->port_num)) + if (rdma_cap_ib_sa(ctx->cm_id->device, ctx->cm_id->port_num)) ucma_copy_ib_route(&resp, &ctx->cm_id->route); else if (rdma_protocol_iboe(ctx->cm_id->device, ctx->cm_id->port_num)) ucma_copy_iboe_route(&resp, &ctx->cm_id->route); diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index cc92a64..c3a561e8 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1834,6 +1834,21 @@ static inline bool rdma_cap_iw_cm(struct ib_device *device, u8 port_num) return rdma_protocol_iwarp(device, port_num); } +/** + * rdma_cap_ib_sa - Check if the port of device has the capability Infiniband + * Subnet 
Admin
[PATCH v8 16/23] IB/Verbs: Use management helper rdma_cap_ib_smi()
Introduce helper rdma_cap_ib_smi() to help us check if the port of an
IB device supports Infiniband Subnet Management Interface.

Signed-off-by: Michael Wang
---
 drivers/infiniband/core/agent.c |  2 +-
 drivers/infiniband/core/mad.c   |  2 +-
 include/rdma/ib_verbs.h         | 15 +++++++++++++++
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c
index 89d4fbc..a6fc4d6 100644
--- a/drivers/infiniband/core/agent.c
+++ b/drivers/infiniband/core/agent.c
@@ -156,7 +156,7 @@ int ib_agent_port_open(struct ib_device *device, int port_num)
 		goto error1;
 	}
 
-	if (rdma_protocol_ib(device, port_num)) {
+	if (rdma_cap_ib_smi(device, port_num)) {
 		/* Obtain send only MAD agent for SMI QP */
 		port_priv->agent[0] = ib_register_mad_agent(device, port_num,
 							    IB_QPT_SMI, NULL, 0,
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 80777cd..e9699c9 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -2938,7 +2938,7 @@ static int ib_mad_port_open(struct ib_device *device,
 	init_mad_qp(port_priv, &port_priv->qp_info[1]);
 
 	cq_size = mad_sendq_size + mad_recvq_size;
-	has_smi = rdma_protocol_ib(device, port_num);
+	has_smi = rdma_cap_ib_smi(device, port_num);
 	if (has_smi)
 		cq_size *= 2;
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 23ba66e..e983e33 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1789,6 +1789,21 @@ static inline bool rdma_cap_ib_mad(struct ib_device *device, u8 port_num)
 	return rdma_ib_or_iboe(device, port_num);
 }
 
+/**
+ * rdma_cap_ib_smi - Check if the port of device has the capability Infiniband
+ * Subnet Management Interface.
+ *
+ * @device: Device to be checked
+ * @port_num: Port number of the device
+ *
+ * Return false when port of the device don't support Infiniband
+ * Subnet Management Interface.
+ */
+static inline bool rdma_cap_ib_smi(struct ib_device *device, u8 port_num)
+{
+	return rdma_protocol_ib(device, port_num);
+}
+
 int ib_query_gid(struct ib_device *device, u8 port_num, int index,
 		 union ib_gid *gid);
 
-- 
2.1.0
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
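The layering in this patch (a capability helper that simply forwards to a raw protocol helper) can be tried outside the kernel. The following is a minimal user-space sketch, not the kernel code: `struct mock_device`, `enum mock_protocol` and every `mock_*` function are hypothetical stand-ins, and the enum names other than the iWARP one are assumed rather than taken from the posted hunks.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Stand-in for the series' protocol enum. Only RDMA_PROTOCOL_IWARP is
 * visible in the posted hunks; the mock names below are assumptions.
 */
enum mock_protocol {
	MOCK_PROTOCOL_IB,
	MOCK_PROTOCOL_IBOE,
	MOCK_PROTOCOL_IWARP,
	MOCK_PROTOCOL_USNIC_UDP,
};

#define MOCK_MAX_PORTS 2

/* Hypothetical device: one protocol per port, ports numbered from 1. */
struct mock_device {
	int port_cnt;
	enum mock_protocol port_protocol[MOCK_MAX_PORTS];
};

/* Plays the role of the per-driver query_protocol() callback. */
static enum mock_protocol mock_query_protocol(const struct mock_device *dev,
					      int port_num)
{
	assert(port_num >= 1 && port_num <= dev->port_cnt);
	return dev->port_protocol[port_num - 1];
}

/* Raw helper: which protocol does the port speak? */
static bool mock_protocol_ib(const struct mock_device *dev, int port_num)
{
	return mock_query_protocol(dev, port_num) == MOCK_PROTOCOL_IB;
}

/*
 * Capability helper: does the port have an SMI? Today the answer is
 * "exactly when the protocol is IB", but callers no longer encode that.
 */
static bool mock_cap_ib_smi(const struct mock_device *dev, int port_num)
{
	return mock_protocol_ib(dev, port_num);
}
```

With a two-port device whose first port is IB and second port is IBoE (the mlx4 row of the mapping list), `mock_cap_ib_smi()` is true for port 1 and false for port 2. The point of the extra layer is that a later change to what "has SMI" means touches one helper instead of every caller.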
[PATCH v8 18/23] IB/Verbs: Use management helper rdma_cap_iw_cm()
Introduce helper rdma_cap_iw_cm() to help us check if the port of an IB device support IWARP Communication Manager. Signed-off-by: Michael Wang --- drivers/infiniband/core/cma.c | 14 +++--- include/rdma/ib_verbs.h | 15 +++ 2 files changed, 22 insertions(+), 7 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 71a3668..0787035 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -754,7 +754,7 @@ int rdma_init_qp_attr(struct rdma_cm_id *id, struct ib_qp_attr *qp_attr, if (qp_attr->qp_state == IB_QPS_RTR) qp_attr->rq_psn = id_priv->seq_num; - } else if (rdma_protocol_iwarp(id->device, id->port_num)) { + } else if (rdma_cap_iw_cm(id->device, id->port_num)) { if (!id_priv->cm_id.iw) { qp_attr->qp_access_flags = 0; *qp_attr_mask = IB_QP_STATE | IB_QP_ACCESS_FLAGS; @@ -1036,7 +1036,7 @@ void rdma_destroy_id(struct rdma_cm_id *id) if (rdma_cap_ib_cm(id_priv->id.device, 1)) { if (id_priv->cm_id.ib) ib_destroy_cm_id(id_priv->cm_id.ib); - } else if (rdma_protocol_iwarp(id_priv->id.device, 1)) { + } else if (rdma_cap_iw_cm(id_priv->id.device, 1)) { if (id_priv->cm_id.iw) iw_destroy_cm_id(id_priv->cm_id.iw); } @@ -2520,7 +2520,7 @@ int rdma_listen(struct rdma_cm_id *id, int backlog) ret = cma_ib_listen(id_priv); if (ret) goto err; - } else if (rdma_protocol_iwarp(id->device, 1)) { + } else if (rdma_cap_iw_cm(id->device, 1)) { ret = cma_iw_listen(id_priv, backlog); if (ret) goto err; @@ -2865,7 +2865,7 @@ int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param) ret = cma_resolve_ib_udp(id_priv, conn_param); else ret = cma_connect_ib(id_priv, conn_param); - } else if (rdma_protocol_iwarp(id->device, id->port_num)) + } else if (rdma_cap_iw_cm(id->device, id->port_num)) ret = cma_connect_iw(id_priv, conn_param); else ret = -ENOSYS; @@ -2987,7 +2987,7 @@ int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param) else ret = cma_rep_recv(id_priv); } - } else if 
(rdma_protocol_iwarp(id->device, id->port_num)) + } else if (rdma_cap_iw_cm(id->device, id->port_num)) ret = cma_accept_iw(id_priv, conn_param); else ret = -ENOSYS; @@ -3042,7 +3042,7 @@ int rdma_reject(struct rdma_cm_id *id, const void *private_data, ret = ib_send_cm_rej(id_priv->cm_id.ib, IB_CM_REJ_CONSUMER_DEFINED, NULL, 0, private_data, private_data_len); - } else if (rdma_protocol_iwarp(id->device, id->port_num)) { + } else if (rdma_cap_iw_cm(id->device, id->port_num)) { ret = iw_cm_reject(id_priv->cm_id.iw, private_data, private_data_len); } else @@ -3068,7 +3068,7 @@ int rdma_disconnect(struct rdma_cm_id *id) /* Initiate or respond to a disconnect. */ if (ib_send_cm_dreq(id_priv->cm_id.ib, NULL, 0)) ib_send_cm_drep(id_priv->cm_id.ib, NULL, 0); - } else if (rdma_protocol_iwarp(id->device, id->port_num)) { + } else if (rdma_cap_iw_cm(id->device, id->port_num)) { ret = iw_cm_disconnect(id_priv->cm_id.iw, 0); } else ret = -EINVAL; diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index e349596..cc92a64 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1819,6 +1819,21 @@ static inline bool rdma_cap_ib_cm(struct ib_device *device, u8 port_num) return rdma_ib_or_iboe(device, port_num); } +/** + * rdma_cap_iw_cm - Check if the port of device has the capability IWARP + * Communication Manager. + * + * @device: Device to be checked + * @port_num: Port number of the device + * + * Return false when port of the device don't support IWARP + * Communication Manager. + */ +static inline bool rdma_cap_iw_cm(struct ib_device *device, u8 port_num) +{ + return rdma_protocol_iwarp(device, port_num); +} + int ib_query_gid(struct ib_device *device, u8 port_num, int index, union ib_gid *gid); -- 2.1.0 -- To unsubscribe from this list: send the line "
[PATCH v8 17/23] IB/Verbs: Use management helper rdma_cap_ib_cm()
Introduce helper rdma_cap_ib_cm() to help us check if the port of an IB device support Infiniband Communication Manager. Signed-off-by: Michael Wang --- drivers/infiniband/core/cm.c | 6 +++--- drivers/infiniband/core/cma.c | 19 +-- drivers/infiniband/core/ucm.c | 2 +- include/rdma/ib_verbs.h | 15 +++ 4 files changed, 28 insertions(+), 14 deletions(-) diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index add5e484..7073f98 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -3781,7 +3781,7 @@ static void cm_add_one(struct ib_device *ib_device) set_bit(IB_MGMT_METHOD_SEND, reg_req.method_mask); for (i = 1; i <= ib_device->phys_port_cnt; i++) { - if (!rdma_ib_or_iboe(ib_device, i)) + if (!rdma_cap_ib_cm(ib_device, i)) continue; port = kzalloc(sizeof *port, GFP_KERNEL); @@ -3832,7 +3832,7 @@ error1: port_modify.set_port_cap_mask = 0; port_modify.clr_port_cap_mask = IB_PORT_CM_SUP; while (--i) { - if (!rdma_ib_or_iboe(ib_device, i)) + if (!rdma_cap_ib_cm(ib_device, i)) continue; port = cm_dev->port[i-1]; @@ -3864,7 +3864,7 @@ static void cm_remove_one(struct ib_device *ib_device) write_unlock_irqrestore(&cm.device_lock, flags); for (i = 1; i <= ib_device->phys_port_cnt; i++) { - if (!rdma_ib_or_iboe(ib_device, i)) + if (!rdma_cap_ib_cm(ib_device, i)) continue; port = cm_dev->port[i-1]; diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index d43f492f..71a3668 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -745,7 +745,7 @@ int rdma_init_qp_attr(struct rdma_cm_id *id, struct ib_qp_attr *qp_attr, int ret = 0; id_priv = container_of(id, struct rdma_id_private, id); - if (rdma_ib_or_iboe(id->device, id->port_num)) { + if (rdma_cap_ib_cm(id->device, id->port_num)) { if (!id_priv->cm_id.ib || (id_priv->id.qp_type == IB_QPT_UD)) ret = cma_ib_init_qp_attr(id_priv, qp_attr, qp_attr_mask); else @@ -1033,7 +1033,7 @@ void rdma_destroy_id(struct rdma_cm_id *id) 
mutex_unlock(&id_priv->handler_mutex); if (id_priv->cma_dev) { - if (rdma_ib_or_iboe(id_priv->id.device, 1)) { + if (rdma_cap_ib_cm(id_priv->id.device, 1)) { if (id_priv->cm_id.ib) ib_destroy_cm_id(id_priv->cm_id.ib); } else if (rdma_protocol_iwarp(id_priv->id.device, 1)) { @@ -1616,8 +1616,7 @@ static void cma_listen_on_dev(struct rdma_id_private *id_priv, struct rdma_cm_id *id; int ret; - if (cma_family(id_priv) == AF_IB && - !rdma_ib_or_iboe(cma_dev->device, 1)) + if (cma_family(id_priv) == AF_IB && !rdma_cap_ib_cm(cma_dev->device, 1)) return; id = rdma_create_id(cma_listen_handler, id_priv, id_priv->id.ps, @@ -2008,7 +2007,7 @@ static int cma_bind_loopback(struct rdma_id_private *id_priv) mutex_lock(&lock); list_for_each_entry(cur_dev, &dev_list, list) { if (cma_family(id_priv) == AF_IB && - !rdma_ib_or_iboe(cur_dev->device, 1)) + !rdma_cap_ib_cm(cur_dev->device, 1)) continue; if (!cma_dev) @@ -2517,7 +2516,7 @@ int rdma_listen(struct rdma_cm_id *id, int backlog) id_priv->backlog = backlog; if (id->device) { - if (rdma_ib_or_iboe(id->device, 1)) { + if (rdma_cap_ib_cm(id->device, 1)) { ret = cma_ib_listen(id_priv); if (ret) goto err; @@ -2861,7 +2860,7 @@ int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param) id_priv->srq = conn_param->srq; } - if (rdma_ib_or_iboe(id->device, id->port_num)) { + if (rdma_cap_ib_cm(id->device, id->port_num)) { if (id->qp_type == IB_QPT_UD) ret = cma_resolve_ib_udp(id_priv, conn_param); else @@ -2972,7 +2971,7 @@ int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param) id_priv->srq = conn_param->srq; } - if (rdma_ib_or_iboe(id->device, id->port_num)) { + if (rdma_cap_ib_cm(id->device, id->port_num)) { if (id->qp_type == IB_QPT_UD) { if (conn_param) ret = cma_send_sidr_rep(id_priv, IB_SIDR_SUCCESS
[PATCH v8 07/23] IB/Verbs: Reform IB-ulp ipoib
Use raw management helpers to reform IB-ulp ipoib.

Signed-off-by: Michael Wang
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 7cad4dd..468fc2b 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1680,9 +1680,7 @@ static void ipoib_add_one(struct ib_device *device)
 	struct net_device *dev;
 	struct ipoib_dev_priv *priv;
 	int s, e, p;
-
-	if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
-		return;
+	int count = 0;
 
 	dev_list = kmalloc(sizeof *dev_list, GFP_KERNEL);
 	if (!dev_list)
@@ -1699,15 +1697,21 @@ static void ipoib_add_one(struct ib_device *device)
 	}
 
 	for (p = s; p <= e; ++p) {
-		if (rdma_port_get_link_layer(device, p) != IB_LINK_LAYER_INFINIBAND)
+		if (!rdma_protocol_ib(device, p))
 			continue;
 		dev = ipoib_add_port("ib%d", device, p);
 		if (!IS_ERR(dev)) {
 			priv = netdev_priv(dev);
 			list_add_tail(&priv->list, dev_list);
+			count++;
 		}
 	}
 
+	if (!count) {
+		kfree(dev_list);
+		return;
+	}
+
 	ib_set_client_data(device, &ipoib_client, dev_list);
 }
 
@@ -1716,9 +1720,6 @@ static void ipoib_remove_one(struct ib_device *device)
 	struct ipoib_dev_priv *priv, *tmp;
 	struct list_head *dev_list;
 
-	if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
-		return;
-
 	dev_list = ib_get_client_data(device, &ipoib_client);
 	if (!dev_list)
 		return;
-- 
2.1.0
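The patch above drops the device-wide transport check and instead counts the ports that qualify, freeing the list when none do. A minimal user-space sketch of that pattern follows; `struct mock_device`, `struct mock_client_data` and `mock_add_one()` are hypothetical names, and a plain boolean array stands in for the per-port rdma_protocol_ib() answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MOCK_MAX_PORTS 4

/* Hypothetical device: port_is_ib[] stands in for rdma_protocol_ib(). */
struct mock_device {
	int port_cnt;
	bool port_is_ib[MOCK_MAX_PORTS];
};

/* Hypothetical per-device client data (the role dev_list plays in ipoib). */
struct mock_client_data {
	int nports;
	int ports[MOCK_MAX_PORTS];
};

/*
 * Allocate the client data up front, register every qualifying port, and
 * release the allocation again if no port qualified. Returns NULL when
 * nothing was registered (or on allocation failure).
 */
static struct mock_client_data *mock_add_one(const struct mock_device *dev)
{
	struct mock_client_data *data;
	int count = 0;
	int p;

	assert(dev->port_cnt <= MOCK_MAX_PORTS);

	data = calloc(1, sizeof(*data));
	if (!data)
		return NULL;

	for (p = 1; p <= dev->port_cnt; ++p) {
		if (!dev->port_is_ib[p - 1])
			continue;
		data->ports[count++] = p;
	}

	if (!count) {
		/* No port matched: behave as the old early return did. */
		free(data);
		return NULL;
	}

	data->nports = count;
	return data;
}
```

For a device with no IB ports the function returns NULL, so the matching remove path only needs the "no client data" check that ipoib_remove_one() already has. That is why the patch can delete the transport test there as well.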
[PATCH v8 06/23] IB/Verbs: Reform IB-core multicast
Use raw management helpers to reform IB-core multicast.

Signed-off-by: Michael Wang
---
 drivers/infiniband/core/multicast.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c
index fa17b55..b57ed03 100644
--- a/drivers/infiniband/core/multicast.c
+++ b/drivers/infiniband/core/multicast.c
@@ -780,8 +780,7 @@ static void mcast_event_handler(struct ib_event_handler *handler,
 	int index;
 
 	dev = container_of(handler, struct mcast_device, event_handler);
-	if (rdma_port_get_link_layer(dev->device, event->element.port_num) !=
-	    IB_LINK_LAYER_INFINIBAND)
+	if (WARN_ON(!rdma_protocol_ib(dev->device, event->element.port_num)))
 		return;
 
 	index = event->element.port_num - dev->start_port;
@@ -808,9 +807,6 @@ static void mcast_add_one(struct ib_device *device)
 	int i;
 	int count = 0;
 
-	if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
-		return;
-
 	dev = kmalloc(sizeof *dev + device->phys_port_cnt * sizeof *port,
 		      GFP_KERNEL);
 	if (!dev)
@@ -824,8 +820,7 @@ static void mcast_add_one(struct ib_device *device)
 	}
 
 	for (i = 0; i <= dev->end_port - dev->start_port; i++) {
-		if (rdma_port_get_link_layer(device, dev->start_port + i) !=
-		    IB_LINK_LAYER_INFINIBAND)
+		if (!rdma_protocol_ib(device, dev->start_port + i))
 			continue;
 		port = &dev->port[i];
 		port->dev = dev;
@@ -863,8 +858,7 @@ static void mcast_remove_one(struct ib_device *device)
 	flush_workqueue(mcast_wq);
 
 	for (i = 0; i <= dev->end_port - dev->start_port; i++) {
-		if (rdma_port_get_link_layer(device, dev->start_port + i) ==
-		    IB_LINK_LAYER_INFINIBAND) {
+		if (rdma_protocol_ib(device, dev->start_port + i)) {
 			port = &dev->port[i];
 			deref_port(port);
 			wait_for_completion(&port->comp);
-- 
2.1.0
[PATCH v8 01/23] IB/Verbs: Implement new callback query_protocol()
Add new callback query_protocol() and implement it for each HW.

Mapping List:
		node-type	link-layer	transport	protocol
nes		RNIC		ETH		IWARP		IWARP
amso1100	RNIC		ETH		IWARP		IWARP
cxgb3		RNIC		ETH		IWARP		IWARP
cxgb4		RNIC		ETH		IWARP		IWARP
usnic		USNIC_UDP	ETH		USNIC_UDP	USNIC_UDP
ocrdma		IB_CA		ETH		IB		IBOE
mlx4		IB_CA		IB/ETH		IB		IB/IBOE
mlx5		IB_CA		IB		IB		IB
ehca		IB_CA		IB		IB		IB
ipath		IB_CA		IB		IB		IB
mthca		IB_CA		IB		IB		IB
qib		IB_CA		IB		IB		IB

Signed-off-by: Michael Wang
---
 drivers/infiniband/core/device.c             |  1 +
 drivers/infiniband/hw/amso1100/c2_provider.c |  7 +++++++
 drivers/infiniband/hw/cxgb3/iwch_provider.c  |  7 +++++++
 drivers/infiniband/hw/cxgb4/provider.c       |  7 +++++++
 drivers/infiniband/hw/ehca/ehca_hca.c        |  6 ++++++
 drivers/infiniband/hw/ehca/ehca_iverbs.h     |  3 +++
 drivers/infiniband/hw/ehca/ehca_main.c       |  1 +
 drivers/infiniband/hw/ipath/ipath_verbs.c    |  7 +++++++
 drivers/infiniband/hw/mlx4/main.c            | 10 ++++++++++
 drivers/infiniband/hw/mlx5/main.c            |  7 +++++++
 drivers/infiniband/hw/mthca/mthca_provider.c |  7 +++++++
 drivers/infiniband/hw/nes/nes_verbs.c        |  6 ++++++
 drivers/infiniband/hw/ocrdma/ocrdma_main.c   |  1 +
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c  |  6 ++++++
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h  |  3 +++
 drivers/infiniband/hw/qib/qib_verbs.c        |  7 +++++++
 drivers/infiniband/hw/usnic/usnic_ib_main.c  |  1 +
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c |  6 ++++++
 drivers/infiniband/hw/usnic/usnic_ib_verbs.h |  2 ++
 include/rdma/ib_verbs.h                      |  9 +++++++++
 20 files changed, 104 insertions(+)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 18c1ece..b360350 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -76,6 +76,7 @@ static int ib_device_check_mandatory(struct ib_device *device)
 	} mandatory_table[] = {
 		IB_MANDATORY_FUNC(query_device),
 		IB_MANDATORY_FUNC(query_port),
+		IB_MANDATORY_FUNC(query_protocol),
 		IB_MANDATORY_FUNC(query_pkey),
 		IB_MANDATORY_FUNC(query_gid),
 		IB_MANDATORY_FUNC(alloc_pd),
diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c b/drivers/infiniband/hw/amso1100/c2_provider.c
index bdf3507..6fe329a 100644
--- a/drivers/infiniband/hw/amso1100/c2_provider.c
+++ b/drivers/infiniband/hw/amso1100/c2_provider.c
@@ -99,6 +99,12 @@ static int c2_query_port(struct ib_device *ibdev,
 	return 0;
 }
 
+static enum rdma_protocol_type
+c2_query_protocol(struct ib_device *device, u8 port_num)
+{
+	return RDMA_PROTOCOL_IWARP;
+}
+
 static int c2_query_pkey(struct ib_device *ibdev,
 			 u8 port, u16 index, u16 * pkey)
 {
@@ -801,6 +807,7 @@ int c2_register_device(struct c2_dev *dev)
 	dev->ibdev.dma_device = &dev->pcidev->dev;
 	dev->ibdev.query_device = c2_query_device;
 	dev->ibdev.query_port = c2_query_port;
+	dev->ibdev.query_protocol = c2_query_protocol;
 	dev->ibdev.query_pkey = c2_query_pkey;
 	dev->ibdev.query_gid = c2_query_gid;
 	dev->ibdev.alloc_ucontext = c2_alloc_ucontext;
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 811b24a..298d1ca 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -1232,6 +1232,12 @@ static int iwch_query_port(struct ib_device *ibdev,
 	return 0;
 }
 
+static enum rdma_protocol_type
+iwch_query_protocol(struct ib_device *device, u8 port_num)
+{
+	return RDMA_PROTOCOL_IWARP;
+}
+
 static ssize_t show_rev(struct device *dev, struct device_attribute *attr,
 			char *buf)
 {
@@ -1385,6 +1391,7 @@ int iwch_register_device(struct iwch_dev *dev)
 	dev->ibdev.dma_device = &(dev->rdev.rnic_info.pdev->dev);
 	dev->ibdev.query_device = iwch_query_device;
 	dev->ibdev.query_port = iwch_query_port;
+	dev->ibdev.query_protocol = iwch_query_protocol;
 	dev->ibdev.query_pkey = iwch_query_pkey;
 	dev->ibdev.query_gid = iwch_query_gid;
 	dev->ibdev.alloc_ucontext = iwch_alloc_ucontext;
diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
index 66bd6a2..f52ee63 100644
--- a/driv
Re: [PATCH v7 00/23] IB/Verbs: IB Management Helpers
On 05/01/2015 08:34 AM, ira.weiny wrote:
> On Tue, Apr 28, 2015 at 05:10:00PM +0200, Michael Wang wrote:
>> Since v6:
>>   * Thanks to Ira, Devesh for the review and testing :-)
>>   * Thanks for the comments from Sean, Tom, Jason, Doug, Devesh, Ira,
>>     Liran :-) Please remind me if anything missed :-P
>>   * Use query_protocol() and enum protocol type in 1#
>>   * Use rdma_protocol_XX() in 2#
>>   * Drop cma_set_legacy_transport()
>>   * Reserve rdma_ib_or_iboe() and rdma_node_get_transport()
>>   * Updated github repository to v7
>
> I pulled these via Dougs for-4.2 branch and have done light testing with mlx4
> and qib.
>
> Now we need to look at converting to some bit mask.
>
> Does anyone have a link to the emails which proposed bitmasks?  I can't find
> them right now.

I've recorded two links related to the bitmask proposal:

https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg23418.html
https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg23765.html

There are also some scattered pieces of discussion from the review of the
previous version, though. I think the best way is to send out another series
dedicated to the bitmask and then gather the discussion there :-)

>
>
> For the Series:
>
> Reviewed-by: Ira Weiny

Thanks for the review :-)

Regards,
Michael Wang

>
>
>>
>> There are plenty of lengthy code to check the transport type of IB device,
>> or the link layer type of it's port, but actually we are just speculating
>> whether a particular management/feature is supported by the device/port.
>>
>> Thus instead of inferring, we should have our own mechanism for IB management
>> capability/protocol/feature checking, several proposals below.
>>
>> This patch set will introduce query_protocol() to check management requirement
>> instead of inferring from transport and link layer respectively, along with
>> the new enum on protocol type.
>>
>> Mapping List:
>>		node-type	link-layer	transport	protocol
>> nes		RNIC		ETH		IWARP		IWARP
>> amso1100	RNIC		ETH		IWARP		IWARP
>> cxgb3	RNIC		ETH		IWARP		IWARP
>> cxgb4	RNIC		ETH		IWARP		IWARP
>> usnic	USNIC_UDP	ETH		USNIC_UDP	USNIC_UDP
>> ocrdma	IB_CA		ETH		IB		IBOE
>> mlx4		IB_CA		IB/ETH		IB		IB/IBOE
>> mlx5		IB_CA		IB		IB		IB
>> ehca		IB_CA		IB		IB		IB
>> ipath	IB_CA		IB		IB		IB
>> mthca	IB_CA		IB		IB		IB
>> qib		IB_CA		IB		IB		IB
>>
>> For example:
>>	if (transport == IB) && (link-layer == ETH)
>> will now become:
>>	if (query_protocol() == IBOE)
>>
>> Thus we will be able to get rid of the respective transport and link-layer
>> checking, and it will help us to add new protocol/Technology (like OPA) more
>> easier, also with the introduced management helpers, IB management logical
>> will be more clear and easier for extending.
>>
>> Highlights:
>>     The 'mgmt-helpers' branch of 'g...@github.com:ywang-pb/infiniband-wy.git'
>>     contain this series based on the latest 'infiniband/for-next'
>>
>>     The patch set covered a wide range of IB stuff, thus for those who are
>>     familiar with the particular part, your suggestion would be invaluable
>>     ;-)
>>
>>     Patch 1#~14# included all the logical reform, 15#~23# introduced the
>>     management helpers.
>>
>>     we appreciate for those one who have the HW willing to provide Tested-by
>>     :-)
>>
>>     Doug suggested the bitmask mechanism:
>>	https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg23765.html
>>     which could be the plan for future reforming, we prefer that to be another
>>     series which focus on semantic and performance.
>>
>>     This patch-set is somewhat 'bloated' now and it may be a good timing for
>>     staging, I'd like to suggest we focus on improving existed helpers and push
>>     all the further reforms into next series ;-)
>>
>>
>> Proposals:
>>     Sean:
>>	https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg23339.html
>>     Doug:
>>	https://
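The "for example" rewrite in the quoted cover letter, (transport == IB) && (link-layer == ETH) becoming (query_protocol() == IBOE), can be checked against the mapping list row by row. The sketch below is a user-space illustration only: the enums and the table are hypothetical stand-ins, and splitting mlx4 into one IB-link row and one ETH-link row is an assumption about how the "IB/ETH" and "IB/IBOE" columns pair up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the columns of the mapping list. */
enum mock_transport  { T_IB, T_IWARP, T_USNIC_UDP };
enum mock_link_layer { LL_IB, LL_ETH };
enum mock_protocol   { P_IB, P_IBOE, P_IWARP, P_USNIC_UDP };

struct mapping_row {
	const char *driver;
	enum mock_link_layer link_layer;
	enum mock_transport transport;
	enum mock_protocol protocol;
};

/* The mapping list, with mlx4 split per link layer (assumed pairing). */
static const struct mapping_row mapping[] = {
	{ "nes",        LL_ETH, T_IWARP,     P_IWARP },
	{ "amso1100",   LL_ETH, T_IWARP,     P_IWARP },
	{ "cxgb3",      LL_ETH, T_IWARP,     P_IWARP },
	{ "cxgb4",      LL_ETH, T_IWARP,     P_IWARP },
	{ "usnic",      LL_ETH, T_USNIC_UDP, P_USNIC_UDP },
	{ "ocrdma",     LL_ETH, T_IB,        P_IBOE },
	{ "mlx4 (IB)",  LL_IB,  T_IB,        P_IB },
	{ "mlx4 (ETH)", LL_ETH, T_IB,        P_IBOE },
	{ "mlx5",       LL_IB,  T_IB,        P_IB },
	{ "ehca",       LL_IB,  T_IB,        P_IB },
	{ "ipath",      LL_IB,  T_IB,        P_IB },
	{ "mthca",      LL_IB,  T_IB,        P_IB },
	{ "qib",        LL_IB,  T_IB,        P_IB },
};

#define MAPPING_ROWS (sizeof(mapping) / sizeof(mapping[0]))

/* Old style: infer IBoE from transport plus link layer. */
static bool old_is_iboe(const struct mapping_row *r)
{
	return r->transport == T_IB && r->link_layer == LL_ETH;
}

/* New style: ask for the protocol directly. */
static bool new_is_iboe(const struct mapping_row *r)
{
	return r->protocol == P_IBOE;
}

/* Number of rows on which the two styles give different answers. */
static size_t count_disagreements(void)
{
	size_t i, bad = 0;

	for (i = 0; i < MAPPING_ROWS; i++) {
		assert(mapping[i].driver != NULL);
		if (old_is_iboe(&mapping[i]) != new_is_iboe(&mapping[i]))
			bad++;
	}
	return bad;
}
```

Under these assumptions the two predicates agree on every row, which is what makes the one-for-one substitution in the later patches safe. The iWARP and usNIC rows are the interesting ones: they also have an Ethernet link layer, so a link-layer test alone would not be equivalent.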
Re: [PATCH v7 00/23] IB/Verbs: IB Management Helpers
On 04/29/2015 06:28 PM, Doug Ledford wrote:
> On Tue, 2015-04-28 at 17:10 +0200, Michael Wang wrote:
>> Since v6:
>>   * Thanks to Ira, Devesh for the review and testing :-)
>>   * Thanks for the comments from Sean, Tom, Jason, Doug, Devesh, Ira,
>>     Liran :-) Please remind me if anything missed :-P
>>   * Use query_protocol() and enum protocol type in 1#
>>   * Use rdma_protocol_XX() in 2#
>>   * Drop cma_set_legacy_transport()
>>   * Reserve rdma_ib_or_iboe() and rdma_node_get_transport()
>>   * Updated github repository to v7
>
> I've taken your patchset and threw it into a for-4.2 branch in my repo.
> This will get it 0day testing. I've also pulled it into my test cluster
> and done minimal testing (bootup, finds devices, gets IP address via
> dhcp). Here are the hardware types it passed on:
>
> mthca (yes, I still have these...only just barely, but I do)
> mlx4 (IB/RoCE, in both standalone and SRIOV usage)
> mlx5 (IB only, I don't have the Eth capable mlx5 hardware yet)
> cxgb3
> cxgb4
> qib
> ocrdma

My appreciation :-) Please let me know if anything is broken.

Regards,
Michael Wang

>
Re: [PATCH v7 04/23] IB/Verbs: Reform IB-core cm
On 04/29/2015 05:48 PM, Or Gerlitz wrote:
[snip]
>
>>
>> I think the CC list is not that big for a patch set covered such a wide
>> range, isn't it :-P
>
> Maybe it's a matter of taste, but for me it look way way too big. If
> you really want to have such a huge listing, do it in the early patches
> of the series where you introduce the new concepts, and later, on
> downstream patches, when you use it, put one person if they happen to
> be the author or maintain that area (e.g Sean <-- CM/CMA, Doug/Erez <--
> IPoIB Ira, Hal <-- MAD, Steve <-- IW_CM, etc)

Thanks for the suggestion. I can't accurately recall who participated in
which part of the review... my bad, I'll take care next time :-) and will
stop adding new CCs from now on.

BTW, now that folks are already familiar with the cap_XX stuff, maybe the
last version to be applied could merge all the cap_XX patches into one.
After all, they focus on the description rather than the logic, so keeping
them separate won't make the review any easier.

Regards,
Michael Wang

>
> Or.
>
Re: [PATCH v7 04/23] IB/Verbs: Reform IB-core cm
Hi, Or

On 04/28/2015 09:02 PM, Or Gerlitz wrote:
> On Tue, Apr 28, 2015 at 6:10 PM, Michael Wang
> wrote:
>> Use raw management helpers to reform IB-core cm.
>>
>> Cc: Hal Rosenstock
>> Cc: Steve Wise
>> Cc: Tom Talpey
>> Cc: Jason Gunthorpe
>> Cc: Doug Ledford
>> Cc: Ira Weiny
>> Cc: Sean Hefty
>> Signed-off-by: Michael Wang
>> ---
>>  drivers/infiniband/core/cm.c | 20 +++++++++++++++++---
>>  1 file changed, 17 insertions(+), 3 deletions(-)
>
> Hi Michael,
>
> I don't really see the benefit (e.g for someone doing bisection
> 1/2/5/10 years from now and landing here) of listing all the group of
> reviewers for each of the ~30 patches that make this series, any
> special reason that caused you doing so?

Those on the CC list helped correct some problems or contributed to the
definition/plan :-) They are familiar with the whole story of this patch
set.

As you mentioned, a few years from now, when someone bisects down to these
patches and wants to learn why they look like this, they will have enough
addresses to send questions to. A few of those people may no longer work on
the same area, but the chance of finding someone who knows the story is
higher.

I think the CC list is not that big for a patch set covering such a wide
range, is it :-P

Regards,
Michael Wang

>
> Or.
>
[PATCH v7 08/23] IB/Verbs: Reform IB-ulp xprtrdma
Use raw management helpers to reform IB-ulp xprtrdma.

Cc: Hal Rosenstock
Cc: Steve Wise
Cc: Tom Talpey
Cc: Jason Gunthorpe
Cc: Doug Ledford
Cc: Ira Weiny
Cc: Sean Hefty
Signed-off-by: Michael Wang
---
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |  4 ++--
 net/sunrpc/xprtrdma/svc_rdma_transport.c | 45 ++++++++++++++++++---------------------------------
 2 files changed, 20 insertions(+), 29 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index f9f13a3..2cc625d 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -117,8 +117,8 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
 
 static int rdma_read_max_sge(struct svcxprt_rdma *xprt, int sge_count)
 {
-	if (rdma_node_get_transport(xprt->sc_cm_id->device->node_type) ==
-	     RDMA_TRANSPORT_IWARP)
+	if (rdma_protocol_iwarp(xprt->sc_cm_id->device,
+				xprt->sc_cm_id->port_num))
 		return 1;
 	else
 		return min_t(int, sge_count, xprt->sc_max_sge);
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index f609c1c..3df8320 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -851,7 +851,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 	struct ib_qp_init_attr qp_attr;
 	struct ib_device_attr devattr;
 	int uninitialized_var(dma_mr_acc);
-	int need_dma_mr;
+	int need_dma_mr = 0;
 	int ret;
 	int i;
 
@@ -985,35 +985,26 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 	/*
 	 * Determine if a DMA MR is required and if so, what privs are required
 	 */
-	switch (rdma_node_get_transport(newxprt->sc_cm_id->device->node_type)) {
-	case RDMA_TRANSPORT_IWARP:
-		newxprt->sc_dev_caps |= SVCRDMA_DEVCAP_READ_W_INV;
-		if (!(newxprt->sc_dev_caps & SVCRDMA_DEVCAP_FAST_REG)) {
-			need_dma_mr = 1;
-			dma_mr_acc =
-				(IB_ACCESS_LOCAL_WRITE |
-				 IB_ACCESS_REMOTE_WRITE);
-		} else if (!(devattr.device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY)) {
-			need_dma_mr = 1;
-			dma_mr_acc = IB_ACCESS_LOCAL_WRITE;
-		} else
-			need_dma_mr = 0;
-		break;
-	case RDMA_TRANSPORT_IB:
-		if (!(newxprt->sc_dev_caps & SVCRDMA_DEVCAP_FAST_REG)) {
-			need_dma_mr = 1;
-			dma_mr_acc = IB_ACCESS_LOCAL_WRITE;
-		} else if (!(devattr.device_cap_flags &
-			     IB_DEVICE_LOCAL_DMA_LKEY)) {
-			need_dma_mr = 1;
-			dma_mr_acc = IB_ACCESS_LOCAL_WRITE;
-		} else
-			need_dma_mr = 0;
-		break;
-	default:
+	if (!rdma_protocol_iwarp(newxprt->sc_cm_id->device,
+				 newxprt->sc_cm_id->port_num) &&
+	    !rdma_ib_or_iboe(newxprt->sc_cm_id->device,
+			     newxprt->sc_cm_id->port_num))
 		goto errout;
+
+	if (!(newxprt->sc_dev_caps & SVCRDMA_DEVCAP_FAST_REG) ||
+	    !(devattr.device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY)) {
+		need_dma_mr = 1;
+		dma_mr_acc = IB_ACCESS_LOCAL_WRITE;
+		if (rdma_protocol_iwarp(newxprt->sc_cm_id->device,
+					newxprt->sc_cm_id->port_num) &&
+		    !(newxprt->sc_dev_caps & SVCRDMA_DEVCAP_FAST_REG))
+			dma_mr_acc |= IB_ACCESS_REMOTE_WRITE;
 	}
 
+	if (rdma_protocol_iwarp(newxprt->sc_cm_id->device,
+				newxprt->sc_cm_id->port_num))
+		newxprt->sc_dev_caps |= SVCRDMA_DEVCAP_READ_W_INV;
+
 	/* Create the DMA MR if needed, otherwise, use the DMA LKEY */
 	if (need_dma_mr) {
 		/* Register all of physical memory */
-- 
2.1.0
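The svc_rdma_accept() hunk replaces a two-case switch with a flat sequence of tests, which is worth checking for behavioural equivalence. The following user-space sketch models both versions as pure functions and compares them over every input combination. All names here (`enum mock_kind`, `struct mr_decision`, `decide_old()`, `decide_new()`, the `ACC_*` values) are hypothetical; the model assumes the old RDMA_TRANSPORT_IB case corresponds to rdma_ib_or_iboe() being true, and it records "no access flags" where the kernel code leaves dma_mr_acc unset.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of the port behind the connection. */
enum mock_kind { KIND_IB_OR_IBOE, KIND_IWARP, KIND_OTHER };

/* Stand-in access flag values, not the kernel's. */
#define ACC_LOCAL_WRITE  (1u << 0)
#define ACC_REMOTE_WRITE (1u << 1)

struct mr_decision {
	bool supported;     /* false models "goto errout" */
	bool need_dma_mr;
	unsigned int acc;   /* only meaningful when need_dma_mr is set */
	bool read_w_inv;    /* models SVCRDMA_DEVCAP_READ_W_INV being set */
};

/* Model of the removed switch statement. */
static struct mr_decision decide_old(enum mock_kind kind, bool fast_reg,
				     bool local_dma_lkey)
{
	struct mr_decision d = { true, false, 0, false };

	switch (kind) {
	case KIND_IWARP:
		d.read_w_inv = true;
		if (!fast_reg) {
			d.need_dma_mr = true;
			d.acc = ACC_LOCAL_WRITE | ACC_REMOTE_WRITE;
		} else if (!local_dma_lkey) {
			d.need_dma_mr = true;
			d.acc = ACC_LOCAL_WRITE;
		}
		break;
	case KIND_IB_OR_IBOE:
		if (!fast_reg) {
			d.need_dma_mr = true;
			d.acc = ACC_LOCAL_WRITE;
		} else if (!local_dma_lkey) {
			d.need_dma_mr = true;
			d.acc = ACC_LOCAL_WRITE;
		}
		break;
	default:
		d.supported = false;
		break;
	}
	return d;
}

/* Model of the added flat logic. */
static struct mr_decision decide_new(enum mock_kind kind, bool fast_reg,
				     bool local_dma_lkey)
{
	struct mr_decision d = { true, false, 0, false };
	bool iwarp = (kind == KIND_IWARP);

	if (!iwarp && kind != KIND_IB_OR_IBOE) {
		d.supported = false;
		return d;
	}

	if (!fast_reg || !local_dma_lkey) {
		d.need_dma_mr = true;
		d.acc = ACC_LOCAL_WRITE;
		if (iwarp && !fast_reg)
			d.acc |= ACC_REMOTE_WRITE;
	}

	if (iwarp)
		d.read_w_inv = true;
	return d;
}

/* Compare both models over all 3 * 2 * 2 input combinations. */
static int count_mismatches(void)
{
	int kind, fast_reg, lkey, bad = 0;

	for (kind = KIND_IB_OR_IBOE; kind <= KIND_OTHER; kind++)
		for (fast_reg = 0; fast_reg <= 1; fast_reg++)
			for (lkey = 0; lkey <= 1; lkey++) {
				struct mr_decision a = decide_old(
					(enum mock_kind)kind, fast_reg != 0, lkey != 0);
				struct mr_decision b = decide_new(
					(enum mock_kind)kind, fast_reg != 0, lkey != 0);

				if (a.supported != b.supported ||
				    a.need_dma_mr != b.need_dma_mr ||
				    a.acc != b.acc ||
				    a.read_w_inv != b.read_w_inv)
					bad++;
			}
	return bad;
}

/* Convenience check for callers that just want a hard failure. */
static void check_equivalence(void)
{
	assert(count_mismatches() == 0);
}
```

In this model the two versions agree on all twelve combinations. The one asymmetry the flat version has to preserve is that only iWARP without fast registration gets remote write access, which is what the nested `if` in the added hunk does.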
[PATCH v7 05/23] IB/Verbs: Reform IB-core sa_query
Use raw management helpers to reform IB-core sa_query.

Cc: Hal Rosenstock
Cc: Steve Wise
Cc: Tom Talpey
Cc: Jason Gunthorpe
Cc: Doug Ledford
Cc: Ira Weiny
Cc: Sean Hefty
Signed-off-by: Michael Wang
---
 drivers/infiniband/core/sa_query.c | 30 +-
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index c38f030..b115c28 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -450,7 +450,7 @@ static void ib_sa_event(struct ib_event_handler *handler, struct ib_event *event
 		struct ib_sa_port *port =
 			&sa_dev->port[event->element.port_num - sa_dev->start_port];
 
-		if (rdma_port_get_link_layer(handler->device, port->port_num) != IB_LINK_LAYER_INFINIBAND)
+		if (WARN_ON(!rdma_protocol_ib(handler->device, port->port_num)))
 			return;
 
 		spin_lock_irqsave(&port->ah_lock, flags);
@@ -540,7 +540,7 @@ int ib_init_ah_from_path(struct ib_device *device, u8 port_num,
 	ah_attr->port_num = port_num;
 	ah_attr->static_rate = rec->rate;
 
-	force_grh = rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_ETHERNET;
+	force_grh = rdma_protocol_iboe(device, port_num);
 
 	if (rec->hop_limit > 1 || force_grh) {
 		ah_attr->ah_flags = IB_AH_GRH;
@@ -1153,9 +1153,7 @@ static void ib_sa_add_one(struct ib_device *device)
 {
 	struct ib_sa_device *sa_dev;
 	int s, e, i;
-
-	if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
-		return;
+	int count = 0;
 
 	if (device->node_type == RDMA_NODE_IB_SWITCH)
 		s = e = 0;
@@ -1175,7 +1173,7 @@ static void ib_sa_add_one(struct ib_device *device)
 
 	for (i = 0; i <= e - s; ++i) {
 		spin_lock_init(&sa_dev->port[i].ah_lock);
-		if (rdma_port_get_link_layer(device, i + 1) != IB_LINK_LAYER_INFINIBAND)
+		if (!rdma_protocol_ib(device, i + 1))
 			continue;
 
 		sa_dev->port[i].sm_ah = NULL;
@@ -1189,8 +1187,13 @@ static void ib_sa_add_one(struct ib_device *device)
 			goto err;
 
 		INIT_WORK(&sa_dev->port[i].update_task, update_sm_ah);
+
+		count++;
 	}
 
+	if (!count)
+		goto free;
+
 	ib_set_client_data(device, &sa_client, sa_dev);
 
 	/*
@@ -1204,19 +1207,20 @@ static void ib_sa_add_one(struct ib_device *device)
 	if (ib_register_event_handler(&sa_dev->event_handler))
 		goto err;
 
-	for (i = 0; i <= e - s; ++i)
-		if (rdma_port_get_link_layer(device, i + 1) == IB_LINK_LAYER_INFINIBAND)
+	for (i = 0; i <= e - s; ++i) {
+		if (rdma_protocol_ib(device, i + 1))
 			update_sm_ah(&sa_dev->port[i].update_task);
+	}
 
 	return;
 
 err:
-	while (--i >= 0)
-		if (rdma_port_get_link_layer(device, i + 1) == IB_LINK_LAYER_INFINIBAND)
+	while (--i >= 0) {
+		if (rdma_protocol_ib(device, i + 1))
 			ib_unregister_mad_agent(sa_dev->port[i].agent);
-
+	}
+free:
 	kfree(sa_dev);
-
 	return;
 }
 
@@ -1233,7 +1237,7 @@ static void ib_sa_remove_one(struct ib_device *device)
 	flush_workqueue(ib_wq);
 
 	for (i = 0; i <= sa_dev->end_port - sa_dev->start_port; ++i) {
-		if (rdma_port_get_link_layer(device, i + 1) == IB_LINK_LAYER_INFINIBAND) {
+		if (rdma_protocol_ib(device, i + 1)) {
 			ib_unregister_mad_agent(sa_dev->port[i].agent);
 			if (sa_dev->port[i].sm_ah)
 				kref_put(&sa_dev->port[i].sm_ah->ref, free_sm_ah);
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
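The `count` bookkeeping that this patch adds to ib_sa_add_one() can be tried outside the kernel. What follows is only a userspace sketch under stated assumptions: `struct sa_dev_sketch`, `add_one_sketch()` and the `port_is_ib` array are hypothetical names (the array stands in for rdma_protocol_ib()), and calloc/free stand in for the kernel allocator. It shows the idea that per-device state is kept only when at least one port qualifies.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_MAX_PORTS 8

/* Hypothetical per-device state; not the kernel's struct ib_sa_device. */
struct sa_dev_sketch {
	int nports;
	int initialized[SKETCH_MAX_PORTS];	/* 1 if the port was set up */
};

/*
 * Returns the allocated state, or NULL when allocation fails or when
 * no port speaks IB. port_is_ib[] stands in for rdma_protocol_ib().
 */
static struct sa_dev_sketch *add_one_sketch(const int *port_is_ib, int nports)
{
	struct sa_dev_sketch *sa_dev;
	int i, count = 0;

	assert(nports >= 0 && nports <= SKETCH_MAX_PORTS);

	sa_dev = calloc(1, sizeof(*sa_dev));
	if (!sa_dev)
		return NULL;
	sa_dev->nports = nports;

	for (i = 0; i < nports; i++) {
		if (!port_is_ib[i])
			continue;	/* skip non-IB ports, as the patch does */
		sa_dev->initialized[i] = 1;
		count++;
	}

	if (!count)
		goto free;	/* nothing to manage: drop the allocation */

	return sa_dev;

free:
	free(sa_dev);
	return NULL;
}
```

A caller would treat a NULL return as "this device has nothing for the SA client", which mirrors the early exit through the new `free:` label.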
[PATCH v7 06/23] IB/Verbs: Reform IB-core multicast
Use raw management helpers to reform IB-core multicast.

Cc: Hal Rosenstock
Cc: Steve Wise
Cc: Tom Talpey
Cc: Jason Gunthorpe
Cc: Doug Ledford
Cc: Ira Weiny
Cc: Sean Hefty
Signed-off-by: Michael Wang
---
 drivers/infiniband/core/multicast.c | 12 +++-
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c
index fa17b55..b57ed03 100644
--- a/drivers/infiniband/core/multicast.c
+++ b/drivers/infiniband/core/multicast.c
@@ -780,8 +780,7 @@ static void mcast_event_handler(struct ib_event_handler *handler,
 	int index;
 
 	dev = container_of(handler, struct mcast_device, event_handler);
-	if (rdma_port_get_link_layer(dev->device, event->element.port_num) !=
-	    IB_LINK_LAYER_INFINIBAND)
+	if (WARN_ON(!rdma_protocol_ib(dev->device, event->element.port_num)))
 		return;
 
 	index = event->element.port_num - dev->start_port;
@@ -808,9 +807,6 @@ static void mcast_add_one(struct ib_device *device)
 	int i;
 	int count = 0;
 
-	if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
-		return;
-
 	dev = kmalloc(sizeof *dev + device->phys_port_cnt * sizeof *port,
 		      GFP_KERNEL);
 	if (!dev)
@@ -824,8 +820,7 @@ static void mcast_add_one(struct ib_device *device)
 	}
 
 	for (i = 0; i <= dev->end_port - dev->start_port; i++) {
-		if (rdma_port_get_link_layer(device, dev->start_port + i) !=
-		    IB_LINK_LAYER_INFINIBAND)
+		if (!rdma_protocol_ib(device, dev->start_port + i))
 			continue;
 		port = &dev->port[i];
 		port->dev = dev;
@@ -863,8 +858,7 @@ static void mcast_remove_one(struct ib_device *device)
 	flush_workqueue(mcast_wq);
 
 	for (i = 0; i <= dev->end_port - dev->start_port; i++) {
-		if (rdma_port_get_link_layer(device, dev->start_port + i) ==
-		    IB_LINK_LAYER_INFINIBAND) {
+		if (rdma_protocol_ib(device, dev->start_port + i)) {
 			port = &dev->port[i];
 			deref_port(port);
 			wait_for_completion(&port->comp);
-- 
2.1.0
[PATCH v7 02/23] IB/Verbs: Implement raw management helpers
Add raw helpers:
	rdma_protocol_ib
	rdma_protocol_iboe
	rdma_protocol_iwarp
	rdma_ib_or_iboe
To help us detect which technology the port supported.

Cc: Hal Rosenstock
Cc: Steve Wise
Cc: Tom Talpey
Cc: Jason Gunthorpe
Cc: Doug Ledford
Cc: Ira Weiny
Cc: Sean Hefty
Signed-off-by: Michael Wang
---
 include/rdma/ib_verbs.h | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 080f204..acdba60 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1752,6 +1752,28 @@ int ib_query_port(struct ib_device *device,
 enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device,
 					       u8 port_num);
 
+static inline int rdma_protocol_ib(struct ib_device *device, u8 port_num)
+{
+	return device->query_protocol(device, port_num) == RDMA_PROTOCOL_IB;
+}
+
+static inline int rdma_protocol_iboe(struct ib_device *device, u8 port_num)
+{
+	return device->query_protocol(device, port_num) == RDMA_PROTOCOL_IBOE;
+}
+
+static inline int rdma_protocol_iwarp(struct ib_device *device, u8 port_num)
+{
+	return device->query_protocol(device, port_num) == RDMA_PROTOCOL_IWARP;
+}
+
+static inline int rdma_ib_or_iboe(struct ib_device *device, u8 port_num)
+{
+	enum rdma_protocol_type pt = device->query_protocol(device, port_num);
+
+	return (pt == RDMA_PROTOCOL_IB || pt == RDMA_PROTOCOL_IBOE);
+}
+
 int ib_query_gid(struct ib_device *device,
 		 u8 port_num, int index, union ib_gid *gid);
 
-- 
2.1.0
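To see how the four helpers behave for a mixed-protocol device, here is a minimal userspace sketch. It is an illustration under assumptions, not kernel code: the `struct ib_device` below is a one-member stand-in, the enum lists only the three values this patch tests against, `uint8_t` replaces `u8`, and `mock_query_protocol()` and `count_ib_or_iboe_ports()` are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins; the kernel definitions are not reproduced here. */
enum rdma_protocol_type {
	RDMA_PROTOCOL_IB,
	RDMA_PROTOCOL_IBOE,
	RDMA_PROTOCOL_IWARP,
};

struct ib_device {
	enum rdma_protocol_type (*query_protocol)(struct ib_device *device,
						  uint8_t port_num);
};

/* Same bodies as the helpers in the patch, with uint8_t for u8. */
static inline int rdma_protocol_ib(struct ib_device *device, uint8_t port_num)
{
	return device->query_protocol(device, port_num) == RDMA_PROTOCOL_IB;
}

static inline int rdma_protocol_iboe(struct ib_device *device, uint8_t port_num)
{
	return device->query_protocol(device, port_num) == RDMA_PROTOCOL_IBOE;
}

static inline int rdma_protocol_iwarp(struct ib_device *device, uint8_t port_num)
{
	return device->query_protocol(device, port_num) == RDMA_PROTOCOL_IWARP;
}

static inline int rdma_ib_or_iboe(struct ib_device *device, uint8_t port_num)
{
	enum rdma_protocol_type pt = device->query_protocol(device, port_num);

	return (pt == RDMA_PROTOCOL_IB || pt == RDMA_PROTOCOL_IBOE);
}

/* Hypothetical driver callback: port 1 is IB, port 2 is IBoE, the rest iWARP. */
static enum rdma_protocol_type mock_query_protocol(struct ib_device *device,
						   uint8_t port_num)
{
	(void)device;
	switch (port_num) {
	case 1:
		return RDMA_PROTOCOL_IB;
	case 2:
		return RDMA_PROTOCOL_IBOE;
	default:
		return RDMA_PROTOCOL_IWARP;
	}
}

/* Count the ports in [first, last] whose protocol is IB or IBoE. */
static int count_ib_or_iboe_ports(struct ib_device *device, int first, int last)
{
	int p, n = 0;

	assert(first >= 0 && last <= 255);
	for (p = first; p <= last; p++)
		if (rdma_ib_or_iboe(device, (uint8_t)p))
			n++;
	return n;
}
```

The point of the per-port form is visible in the mock: one device can answer differently for each port, which a check on the device's node type alone cannot express.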
[PATCH v7 10/23] IB/Verbs: Reform cm related part in IB-core cma/ucm
Use raw management helpers to reform cm related part in IB-core cma/ucm.

Few checks focus on the device cm type rather than the port capability,
directly pass port 1 works currently, but can't support mixing cm type
device in future.

Cc: Hal Rosenstock
Cc: Steve Wise
Cc: Tom Talpey
Cc: Jason Gunthorpe
Cc: Doug Ledford
Cc: Ira Weiny
Cc: Sean Hefty
Signed-off-by: Michael Wang
---
 drivers/infiniband/core/cma.c | 81 +--
 drivers/infiniband/core/ucm.c | 3 +-
 2 files changed, 26 insertions(+), 58 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index d570030..8a07e89 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -735,8 +735,7 @@ int rdma_init_qp_attr(struct rdma_cm_id *id, struct ib_qp_attr *qp_attr,
 	int ret = 0;
 
 	id_priv = container_of(id, struct rdma_id_private, id);
-	switch (rdma_node_get_transport(id_priv->id.device->node_type)) {
-	case RDMA_TRANSPORT_IB:
+	if (rdma_ib_or_iboe(id->device, id->port_num)) {
 		if (!id_priv->cm_id.ib || (id_priv->id.qp_type == IB_QPT_UD))
 			ret = cma_ib_init_qp_attr(id_priv, qp_attr, qp_attr_mask);
 		else
@@ -745,19 +744,15 @@ int rdma_init_qp_attr(struct rdma_cm_id *id, struct ib_qp_attr *qp_attr,
 
 		if (qp_attr->qp_state == IB_QPS_RTR)
 			qp_attr->rq_psn = id_priv->seq_num;
-		break;
-	case RDMA_TRANSPORT_IWARP:
+	} else if (rdma_protocol_iwarp(id->device, id->port_num)) {
 		if (!id_priv->cm_id.iw) {
 			qp_attr->qp_access_flags = 0;
 			*qp_attr_mask = IB_QP_STATE | IB_QP_ACCESS_FLAGS;
 		} else
 			ret = iw_cm_init_qp_attr(id_priv->cm_id.iw, qp_attr,
 						 qp_attr_mask);
-		break;
-	default:
+	} else
 		ret = -ENOSYS;
-		break;
-	}
 
 	return ret;
 }
@@ -1037,17 +1032,12 @@ void rdma_destroy_id(struct rdma_cm_id *id)
 	mutex_unlock(&id_priv->handler_mutex);
 
 	if (id_priv->cma_dev) {
-		switch (rdma_node_get_transport(id_priv->id.device->node_type)) {
-		case RDMA_TRANSPORT_IB:
+		if (rdma_ib_or_iboe(id_priv->id.device, 1)) {
 			if (id_priv->cm_id.ib)
 				ib_destroy_cm_id(id_priv->cm_id.ib);
-			break;
-		case RDMA_TRANSPORT_IWARP:
+		} else if (rdma_protocol_iwarp(id_priv->id.device, 1)) {
 			if (id_priv->cm_id.iw)
 				iw_destroy_cm_id(id_priv->cm_id.iw);
-			break;
-		default:
-			break;
 		}
 		cma_leave_mc_groups(id_priv);
 		cma_release_dev(id_priv);
@@ -1626,7 +1616,7 @@ static void cma_listen_on_dev(struct rdma_id_private *id_priv,
 	int ret;
 
 	if (cma_family(id_priv) == AF_IB &&
-	    rdma_node_get_transport(cma_dev->device->node_type) != RDMA_TRANSPORT_IB)
+	    !rdma_ib_or_iboe(cma_dev->device, 1))
 		return;
 
 	id = rdma_create_id(cma_listen_handler, id_priv, id_priv->id.ps,
@@ -2028,7 +2018,7 @@ static int cma_bind_loopback(struct rdma_id_private *id_priv)
 	mutex_lock(&lock);
 	list_for_each_entry(cur_dev, &dev_list, list) {
 		if (cma_family(id_priv) == AF_IB &&
-		    rdma_node_get_transport(cur_dev->device->node_type) != RDMA_TRANSPORT_IB)
+		    !rdma_ib_or_iboe(cur_dev->device, 1))
 			continue;
 
 		if (!cma_dev)
@@ -2060,7 +2050,7 @@ port_found:
 		goto out;
 
 	id_priv->id.route.addr.dev_addr.dev_type =
-		(rdma_port_get_link_layer(cma_dev->device, p) == IB_LINK_LAYER_INFINIBAND) ?
+		(rdma_protocol_ib(cma_dev->device, p)) ?
 		ARPHRD_INFINIBAND : ARPHRD_ETHER;
 
 	rdma_addr_set_sgid(&id_priv->id.route.addr.dev_addr, &gid);
@@ -2537,18 +2527,15 @@ int rdma_listen(struct rdma_cm_id *id, int backlog)
 
 	id_priv->backlog = backlog;
 	if (id->device) {
-		switch (rdma_node_get_transport(id->device->node_type)) {
-		case RDMA_TRANSPORT_IB:
+		if (rdma_ib_or_iboe(id->device, 1)) {
 			ret = cma_ib_listen(id_priv);
 			if (ret)
 				goto err;
-			break;
-		case RDMA_TRANSPORT_IWARP:
+		} else if (rdma_protocol_iwarp(id->device, 1)) {
 			ret = cma_iw_listen(id_priv,
[PATCH v7 03/23] IB/Verbs: Reform IB-core mad/agent/user_mad
Use raw management helpers to reform IB-core mad/agent/user_mad.

Cc: Hal Rosenstock
Cc: Steve Wise
Cc: Tom Talpey
Cc: Jason Gunthorpe
Cc: Doug Ledford
Cc: Ira Weiny
Cc: Sean Hefty
Signed-off-by: Michael Wang
---
 drivers/infiniband/core/agent.c | 2 +-
 drivers/infiniband/core/mad.c | 43 +++---
 drivers/infiniband/core/user_mad.c | 26 ---
 3 files changed, 41 insertions(+), 30 deletions(-)

diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c
index f6d2961..89d4fbc 100644
--- a/drivers/infiniband/core/agent.c
+++ b/drivers/infiniband/core/agent.c
@@ -156,7 +156,7 @@ int ib_agent_port_open(struct ib_device *device, int port_num)
 		goto error1;
 	}
 
-	if (rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_INFINIBAND) {
+	if (rdma_protocol_ib(device, port_num)) {
 		/* Obtain send only MAD agent for SMI QP */
 		port_priv->agent[0] = ib_register_mad_agent(device, port_num,
 							    IB_QPT_SMI, NULL, 0,
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 74c30f4..507eb67 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -2938,7 +2938,7 @@ static int ib_mad_port_open(struct ib_device *device,
 	init_mad_qp(port_priv, &port_priv->qp_info[1]);
 
 	cq_size = mad_sendq_size + mad_recvq_size;
-	has_smi = rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_INFINIBAND;
+	has_smi = rdma_protocol_ib(device, port_num);
 	if (has_smi)
 		cq_size *= 2;
 
@@ -3057,9 +3057,6 @@ static void ib_mad_init_device(struct ib_device *device)
 {
 	int start, end, i;
 
-	if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
-		return;
-
 	if (device->node_type == RDMA_NODE_IB_SWITCH) {
 		start = 0;
 		end = 0;
@@ -3069,6 +3066,9 @@ static void ib_mad_init_device(struct ib_device *device)
 	}
 
 	for (i = start; i <= end; i++) {
+		if (!rdma_ib_or_iboe(device, i))
+			continue;
+
 		if (ib_mad_port_open(device, i)) {
 			dev_err(&device->dev, "Couldn't open port %d\n", i);
 			goto error;
@@ -3086,40 +3086,39 @@ error_agent:
 		dev_err(&device->dev, "Couldn't close port %d\n", i);
 
 error:
-	i--;
+	while (--i >= start) {
+		if (!rdma_ib_or_iboe(device, i))
+			continue;
 
-	while (i >= start) {
 		if (ib_agent_port_close(device, i))
 			dev_err(&device->dev,
 				"Couldn't close port %d for agents\n", i);
 		if (ib_mad_port_close(device, i))
 			dev_err(&device->dev, "Couldn't close port %d\n", i);
-		i--;
 	}
 }
 
 static void ib_mad_remove_device(struct ib_device *device)
 {
-	int i, num_ports, cur_port;
-
-	if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
-		return;
+	int start, end, i;
 
 	if (device->node_type == RDMA_NODE_IB_SWITCH) {
-		num_ports = 1;
-		cur_port = 0;
+		start = 0;
+		end = 0;
 	} else {
-		num_ports = device->phys_port_cnt;
-		cur_port = 1;
+		start = 1;
+		end = device->phys_port_cnt;
 	}
-	for (i = 0; i < num_ports; i++, cur_port++) {
-		if (ib_agent_port_close(device, cur_port))
+
+	for (i = start; i <= end; i++) {
+		if (!rdma_ib_or_iboe(device, i))
+			continue;
+
+		if (ib_agent_port_close(device, i))
 			dev_err(&device->dev,
-				"Couldn't close port %d for agents\n",
-				cur_port);
-		if (ib_mad_port_close(device, cur_port))
-			dev_err(&device->dev, "Couldn't close port %d\n",
-				cur_port);
+				"Couldn't close port %d for agents\n", i);
+		if (ib_mad_port_close(device, i))
+			dev_err(&device->dev, "Couldn't close port %d\n", i);
 	}
 }
 
diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index 928cdd2..aa8b334 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -1273,9 +1273,7 @@ static void ib_umad_add_one(struct ib_device *device)
 {
 	struct ib_umad_device *umad_dev;
 	int s, e, i;
-
-
[PATCH v7 11/23] IB/Verbs: Reform route related part in IB-core cma
Use raw management helpers to reform route related part in IB-core cma.

Cc: Hal Rosenstock
Cc: Steve Wise
Cc: Tom Talpey
Cc: Jason Gunthorpe
Cc: Doug Ledford
Cc: Ira Weiny
Cc: Sean Hefty
Signed-off-by: Michael Wang
---
 drivers/infiniband/core/cma.c | 31 ---
 drivers/infiniband/core/ucma.c | 25 ++---
 2 files changed, 14 insertions(+), 42 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 8a07e89..36c5f8a 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -923,13 +923,9 @@ static inline int cma_user_data_offset(struct rdma_id_private *id_priv)
 
 static void cma_cancel_route(struct rdma_id_private *id_priv)
 {
-	switch (rdma_port_get_link_layer(id_priv->id.device, id_priv->id.port_num)) {
-	case IB_LINK_LAYER_INFINIBAND:
+	if (rdma_protocol_ib(id_priv->id.device, id_priv->id.port_num)) {
 		if (id_priv->query)
 			ib_sa_cancel_query(id_priv->query_id, id_priv->query);
-		break;
-	default:
-		break;
 	}
 }
 
@@ -1957,26 +1953,15 @@ int rdma_resolve_route(struct rdma_cm_id *id, int timeout_ms)
 		return -EINVAL;
 
 	atomic_inc(&id_priv->refcount);
-	switch (rdma_node_get_transport(id->device->node_type)) {
-	case RDMA_TRANSPORT_IB:
-		switch (rdma_port_get_link_layer(id->device, id->port_num)) {
-		case IB_LINK_LAYER_INFINIBAND:
-			ret = cma_resolve_ib_route(id_priv, timeout_ms);
-			break;
-		case IB_LINK_LAYER_ETHERNET:
-			ret = cma_resolve_iboe_route(id_priv);
-			break;
-		default:
-			ret = -ENOSYS;
-		}
-		break;
-	case RDMA_TRANSPORT_IWARP:
+	if (rdma_protocol_ib(id->device, id->port_num))
+		ret = cma_resolve_ib_route(id_priv, timeout_ms);
+	else if (rdma_protocol_iboe(id->device, id->port_num))
+		ret = cma_resolve_iboe_route(id_priv);
+	else if (rdma_protocol_iwarp(id->device, id->port_num))
 		ret = cma_resolve_iw_route(id_priv, timeout_ms);
-		break;
-	default:
+	else
 		ret = -ENOSYS;
-		break;
-	}
+
 	if (ret)
 		goto err;
 
diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 45d67e9..dae7620 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -722,26 +722,13 @@ static ssize_t ucma_query_route(struct ucma_file *file,
 
 	resp.node_guid = (__force __u64) ctx->cm_id->device->node_guid;
 	resp.port_num = ctx->cm_id->port_num;
-	switch (rdma_node_get_transport(ctx->cm_id->device->node_type)) {
-	case RDMA_TRANSPORT_IB:
-		switch (rdma_port_get_link_layer(ctx->cm_id->device,
-			ctx->cm_id->port_num)) {
-		case IB_LINK_LAYER_INFINIBAND:
-			ucma_copy_ib_route(&resp, &ctx->cm_id->route);
-			break;
-		case IB_LINK_LAYER_ETHERNET:
-			ucma_copy_iboe_route(&resp, &ctx->cm_id->route);
-			break;
-		default:
-			break;
-		}
-		break;
-	case RDMA_TRANSPORT_IWARP:
+
+	if (rdma_protocol_ib(ctx->cm_id->device, ctx->cm_id->port_num))
+		ucma_copy_ib_route(&resp, &ctx->cm_id->route);
+	else if (rdma_protocol_iboe(ctx->cm_id->device, ctx->cm_id->port_num))
+		ucma_copy_iboe_route(&resp, &ctx->cm_id->route);
+	else if (rdma_protocol_iwarp(ctx->cm_id->device, ctx->cm_id->port_num))
 		ucma_copy_iw_route(&resp, &ctx->cm_id->route);
-		break;
-	default:
-		break;
-	}
 
 out:
 	if (copy_to_user((void __user *)(unsigned long)cmd.response,
-- 
2.1.0
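The rdma_resolve_route() hunk replaces a switch on transport with a nested switch on link layer by one flat chain keyed on the port protocol. The sketch below checks, in plain userspace C, that the two shapes pick the same route handler for every combination. All enum and function names here are hypothetical simplifications, and the `to_protocol()` mapping (IB transport with an Ethernet link layer means IBoE) is an assumption read off the code this patch removes, not something the helpers themselves state.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins for the kernel enums. */
enum transport { TRANSPORT_IB, TRANSPORT_IWARP, TRANSPORT_OTHER };
enum link_layer { LL_INFINIBAND, LL_ETHERNET, LL_UNSPECIFIED };
enum protocol { PROTO_IB, PROTO_IBOE, PROTO_IWARP, PROTO_OTHER };
enum route_kind { ROUTE_IB, ROUTE_IBOE, ROUTE_IW, ROUTE_ENOSYS };

/* Assumed mapping from the old (transport, link layer) pair to a protocol. */
static enum protocol to_protocol(enum transport t, enum link_layer ll)
{
	if (t == TRANSPORT_IB && ll == LL_INFINIBAND)
		return PROTO_IB;
	if (t == TRANSPORT_IB && ll == LL_ETHERNET)
		return PROTO_IBOE;
	if (t == TRANSPORT_IWARP)
		return PROTO_IWARP;
	return PROTO_OTHER;
}

/* Old shape: switch on transport, then on link layer. */
static enum route_kind route_nested(enum transport t, enum link_layer ll)
{
	switch (t) {
	case TRANSPORT_IB:
		switch (ll) {
		case LL_INFINIBAND:
			return ROUTE_IB;
		case LL_ETHERNET:
			return ROUTE_IBOE;
		default:
			return ROUTE_ENOSYS;
		}
	case TRANSPORT_IWARP:
		return ROUTE_IW;
	default:
		return ROUTE_ENOSYS;
	}
}

/* New shape: one flat chain keyed on the per-port protocol. */
static enum route_kind route_flat(enum protocol p)
{
	if (p == PROTO_IB)
		return ROUTE_IB;
	else if (p == PROTO_IBOE)
		return ROUTE_IBOE;
	else if (p == PROTO_IWARP)
		return ROUTE_IW;
	else
		return ROUTE_ENOSYS;
}

/* Walk every combination and assert that both shapes agree. */
static void check_shapes_agree(void)
{
	int t, ll;

	for (t = TRANSPORT_IB; t <= TRANSPORT_OTHER; t++)
		for (ll = LL_INFINIBAND; ll <= LL_UNSPECIFIED; ll++)
			assert(route_nested(t, ll) ==
			       route_flat(to_protocol(t, ll)));
}
```

Calling `check_shapes_agree()` once exercises all nine combinations, including the IB transport with an unspecified link layer, which both shapes map to the -ENOSYS branch.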
[PATCH v7 13/23] IB/Verbs: Reform cma_acquire_dev()
Reform cma_acquire_dev() with management helpers, introduce
cma_validate_port() to make the code more clean.

Cc: Hal Rosenstock
Cc: Steve Wise
Cc: Tom Talpey
Cc: Jason Gunthorpe
Cc: Doug Ledford
Cc: Ira Weiny
Cc: Sean Hefty
Signed-off-by: Michael Wang
---
 drivers/infiniband/core/cma.c | 68 +--
 1 file changed, 40 insertions(+), 28 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 34ec13f..3fb3458 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -349,18 +349,35 @@ static int cma_translate_addr(struct sockaddr *addr, struct rdma_dev_addr *dev_a
 	return ret;
 }
 
+static inline int cma_validate_port(struct ib_device *device, u8 port,
+				      union ib_gid *gid, int dev_type)
+{
+	u8 found_port;
+	int ret = -ENODEV;
+
+	if ((dev_type == ARPHRD_INFINIBAND) && !rdma_protocol_ib(device, port))
+		return ret;
+
+	if ((dev_type != ARPHRD_INFINIBAND) && rdma_protocol_ib(device, port))
+		return ret;
+
+	ret = ib_find_cached_gid(device, gid, &found_port, NULL);
+	if (port != found_port)
+		return -ENODEV;
+
+	return ret;
+}
+
 static int cma_acquire_dev(struct rdma_id_private *id_priv,
 			   struct rdma_id_private *listen_id_priv)
 {
 	struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr;
 	struct cma_device *cma_dev;
-	union ib_gid gid, iboe_gid;
+	union ib_gid gid, iboe_gid, *gidp;
 	int ret = -ENODEV;
-	u8 port, found_port;
-	enum rdma_link_layer dev_ll = dev_addr->dev_type == ARPHRD_INFINIBAND ?
-		IB_LINK_LAYER_INFINIBAND : IB_LINK_LAYER_ETHERNET;
+	u8 port;
 
-	if (dev_ll != IB_LINK_LAYER_INFINIBAND &&
+	if (dev_addr->dev_type != ARPHRD_INFINIBAND &&
 	    id_priv->id.ps == RDMA_PS_IPOIB)
 		return -EINVAL;
 
@@ -370,41 +387,36 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv,
 
 	memcpy(&gid, dev_addr->src_dev_addr +
 	       rdma_addr_gid_offset(dev_addr), sizeof gid);
-	if (listen_id_priv &&
-	    rdma_port_get_link_layer(listen_id_priv->id.device,
-				     listen_id_priv->id.port_num) == dev_ll) {
+
+	if (listen_id_priv) {
 		cma_dev = listen_id_priv->cma_dev;
 		port = listen_id_priv->id.port_num;
-		if (rdma_node_get_transport(cma_dev->device->node_type) == RDMA_TRANSPORT_IB &&
-		    rdma_port_get_link_layer(cma_dev->device, port) == IB_LINK_LAYER_ETHERNET)
-			ret = ib_find_cached_gid(cma_dev->device, &iboe_gid,
-						 &found_port, NULL);
-		else
-			ret = ib_find_cached_gid(cma_dev->device, &gid,
-						 &found_port, NULL);
+		gidp = rdma_protocol_iboe(cma_dev->device, port) ?
+		       &iboe_gid : &gid;
 
-		if (!ret && (port == found_port)) {
-			id_priv->id.port_num = found_port;
+		ret = cma_validate_port(cma_dev->device, port, gidp,
+					dev_addr->dev_type);
+		if (!ret) {
+			id_priv->id.port_num = port;
 			goto out;
 		}
 	}
+
 	list_for_each_entry(cma_dev, &dev_list, list) {
 		for (port = 1; port <= cma_dev->device->phys_port_cnt; ++port) {
 			if (listen_id_priv &&
 			    listen_id_priv->cma_dev == cma_dev &&
 			    listen_id_priv->id.port_num == port)
 				continue;
-			if (rdma_port_get_link_layer(cma_dev->device, port) == dev_ll) {
-				if (rdma_node_get_transport(cma_dev->device->node_type) == RDMA_TRANSPORT_IB &&
-				    rdma_port_get_link_layer(cma_dev->device, port) == IB_LINK_LAYER_ETHERNET)
-					ret = ib_find_cached_gid(cma_dev->device, &iboe_gid, &found_port, NULL);
-				else
-					ret = ib_find_cached_gid(cma_dev->device, &gid, &found_port, NULL);
-
-				if (!ret && (port == found_port)) {
-					id_priv->id.port_num = found_port;
-					goto out;
-				}
+
+			gidp = rdma_protocol_iboe(cma_dev->devic
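The three checks in cma_validate_port() can be exercised in userspace with a small model. This is a sketch under assumptions rather than the kernel function: `struct port_sketch`, `find_gid_port_sketch()` and `validate_port_sketch()` are hypothetical, a GID is reduced to an int, the `SKETCH_ARPHRD_*` constants are local to the example, and the mock lookup is assumed to leave `found_port` at 0 when nothing matches.

```c
#include <assert.h>
#include <errno.h>

/* Constants local to this sketch; they only need to differ from each other. */
#define SKETCH_ARPHRD_ETHER		1
#define SKETCH_ARPHRD_INFINIBAND	32

/* Hypothetical port description: protocol flag plus a single integer "GID". */
struct port_sketch {
	int is_ib;
	int gid;
};

/* Mock GID lookup: reports the 1-based port that owns gid, if any. */
static int find_gid_port_sketch(const struct port_sketch *ports, int nports,
				int gid, int *found_port)
{
	int i;

	*found_port = 0;	/* assumption: no valid port is numbered 0 */
	for (i = 0; i < nports; i++) {
		if (ports[i].gid == gid) {
			*found_port = i + 1;
			return 0;
		}
	}
	return -ENOENT;
}

/* Same order of checks as cma_validate_port() in the patch. */
static int validate_port_sketch(const struct port_sketch *ports, int nports,
				int port, int gid, int dev_type)
{
	int found_port;
	int ret = -ENODEV;

	assert(port >= 1 && port <= nports);

	/* An InfiniBand address must land on an IB port ... */
	if (dev_type == SKETCH_ARPHRD_INFINIBAND && !ports[port - 1].is_ib)
		return ret;

	/* ... and any other address type must not. */
	if (dev_type != SKETCH_ARPHRD_INFINIBAND && ports[port - 1].is_ib)
		return ret;

	/* The GID has to be cached on exactly this port. */
	ret = find_gid_port_sketch(ports, nports, gid, &found_port);
	if (port != found_port)
		return -ENODEV;

	return ret;
}
```

With one IB port and one non-IB port, an InfiniBand address validates only against the IB port that actually holds the GID; every mismatch collapses to -ENODEV, which lets the caller keep scanning other ports.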
[PATCH v7 07/23] IB/Verbs: Reform IB-ulp ipoib
Use raw management helpers to reform IB-ulp ipoib.

Cc: Hal Rosenstock
Cc: Steve Wise
Cc: Tom Talpey
Cc: Jason Gunthorpe
Cc: Doug Ledford
Cc: Ira Weiny
Cc: Sean Hefty
Signed-off-by: Michael Wang
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 7cad4dd..468fc2b 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1680,9 +1680,7 @@ static void ipoib_add_one(struct ib_device *device)
 	struct net_device *dev;
 	struct ipoib_dev_priv *priv;
 	int s, e, p;
-
-	if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
-		return;
+	int count = 0;
 
 	dev_list = kmalloc(sizeof *dev_list, GFP_KERNEL);
 	if (!dev_list)
@@ -1699,15 +1697,21 @@ static void ipoib_add_one(struct ib_device *device)
 	}
 
 	for (p = s; p <= e; ++p) {
-		if (rdma_port_get_link_layer(device, p) != IB_LINK_LAYER_INFINIBAND)
+		if (!rdma_protocol_ib(device, p))
 			continue;
 		dev = ipoib_add_port("ib%d", device, p);
 		if (!IS_ERR(dev)) {
 			priv = netdev_priv(dev);
 			list_add_tail(&priv->list, dev_list);
+			count++;
 		}
 	}
 
+	if (!count) {
+		kfree(dev_list);
+		return;
+	}
+
 	ib_set_client_data(device, &ipoib_client, dev_list);
 }
 
@@ -1716,9 +1720,6 @@ static void ipoib_remove_one(struct ib_device *device)
 	struct ipoib_dev_priv *priv, *tmp;
 	struct list_head *dev_list;
 
-	if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
-		return;
-
 	dev_list = ib_get_client_data(device, &ipoib_client);
 	if (!dev_list)
 		return;
-- 
2.1.0
[PATCH v7 14/23] IB/Verbs: Reform rest part in IB-core cma
Use raw management helpers to reform rest part in IB-core cma.

Cc: Hal Rosenstock
Cc: Steve Wise
Cc: Tom Talpey
Cc: Jason Gunthorpe
Cc: Doug Ledford
Cc: Ira Weiny
Cc: Sean Hefty
Signed-off-by: Michael Wang
---
 drivers/infiniband/core/cma.c | 20 +---
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 3fb3458..d43f492f 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -447,10 +447,10 @@ static int cma_resolve_ib_dev(struct rdma_id_private *id_priv)
 	pkey = ntohs(addr->sib_pkey);
 
 	list_for_each_entry(cur_dev, &dev_list, list) {
-		if (rdma_node_get_transport(cur_dev->device->node_type) != RDMA_TRANSPORT_IB)
-			continue;
-
 		for (p = 1; p <= cur_dev->device->phys_port_cnt; ++p) {
+			if (!rdma_ib_or_iboe(cur_dev->device, p))
+				continue;
+
 			if (ib_find_cached_pkey(cur_dev->device, p, pkey, &index))
 				continue;
 
@@ -645,10 +645,9 @@ static int cma_modify_qp_rtr(struct rdma_id_private *id_priv,
 	if (ret)
 		goto out;
 
-	if (rdma_node_get_transport(id_priv->cma_dev->device->node_type)
-	    == RDMA_TRANSPORT_IB &&
-	    rdma_port_get_link_layer(id_priv->id.device, id_priv->id.port_num)
-	    == IB_LINK_LAYER_ETHERNET) {
+	BUG_ON(id_priv->cma_dev->device != id_priv->id.device);
+
+	if (rdma_protocol_iboe(id_priv->id.device, id_priv->id.port_num)) {
 		ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr.smac, NULL);
 
 		if (ret)
@@ -712,11 +711,10 @@ static int cma_ib_init_qp_attr(struct rdma_id_private *id_priv,
 	int ret;
 	u16 pkey;
 
-	if (rdma_port_get_link_layer(id_priv->id.device, id_priv->id.port_num) ==
-	    IB_LINK_LAYER_INFINIBAND)
-		pkey = ib_addr_get_pkey(dev_addr);
-	else
+	if (rdma_protocol_iboe(id_priv->id.device, id_priv->id.port_num))
 		pkey = 0x;
+	else
+		pkey = ib_addr_get_pkey(dev_addr);
 
 	ret = ib_find_cached_pkey(id_priv->id.device, id_priv->id.port_num,
 				  pkey, &qp_attr->pkey_index);
-- 
2.1.0
[PATCH v7 17/23] IB/Verbs: Use management helper cap_ib_cm()
Introduce helper cap_ib_cm() to help us check if the port of an
IB device support Infiniband Communication Manager.

Cc: Hal Rosenstock
Cc: Steve Wise
Cc: Tom Talpey
Cc: Jason Gunthorpe
Cc: Doug Ledford
Cc: Ira Weiny
Cc: Sean Hefty
Signed-off-by: Michael Wang
---
 drivers/infiniband/core/cm.c | 6 +++---
 drivers/infiniband/core/cma.c | 19 +--
 drivers/infiniband/core/ucm.c | 2 +-
 include/rdma/ib_verbs.h | 15 +++
 4 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index add5e484..3ffaad3 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -3781,7 +3781,7 @@ static void cm_add_one(struct ib_device *ib_device)
 
 	set_bit(IB_MGMT_METHOD_SEND, reg_req.method_mask);
 	for (i = 1; i <= ib_device->phys_port_cnt; i++) {
-		if (!rdma_ib_or_iboe(ib_device, i))
+		if (!cap_ib_cm(ib_device, i))
 			continue;
 
 		port = kzalloc(sizeof *port, GFP_KERNEL);
@@ -3832,7 +3832,7 @@ error1:
 	port_modify.set_port_cap_mask = 0;
 	port_modify.clr_port_cap_mask = IB_PORT_CM_SUP;
 	while (--i) {
-		if (!rdma_ib_or_iboe(ib_device, i))
+		if (!cap_ib_cm(ib_device, i))
 			continue;
 
 		port = cm_dev->port[i-1];
@@ -3864,7 +3864,7 @@ static void cm_remove_one(struct ib_device *ib_device)
 	write_unlock_irqrestore(&cm.device_lock, flags);
 
 	for (i = 1; i <= ib_device->phys_port_cnt; i++) {
-		if (!rdma_ib_or_iboe(ib_device, i))
+		if (!cap_ib_cm(ib_device, i))
 			continue;
 
 		port = cm_dev->port[i-1];
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index d43f492f..ecb0484 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -745,7 +745,7 @@ int rdma_init_qp_attr(struct rdma_cm_id *id, struct ib_qp_attr *qp_attr,
 	int ret = 0;
 
 	id_priv = container_of(id, struct rdma_id_private, id);
-	if (rdma_ib_or_iboe(id->device, id->port_num)) {
+	if (cap_ib_cm(id->device, id->port_num)) {
 		if (!id_priv->cm_id.ib || (id_priv->id.qp_type == IB_QPT_UD))
 			ret = cma_ib_init_qp_attr(id_priv, qp_attr, qp_attr_mask);
 		else
@@ -1033,7 +1033,7 @@ void rdma_destroy_id(struct rdma_cm_id *id)
 	mutex_unlock(&id_priv->handler_mutex);
 
 	if (id_priv->cma_dev) {
-		if (rdma_ib_or_iboe(id_priv->id.device, 1)) {
+		if (cap_ib_cm(id_priv->id.device, 1)) {
 			if (id_priv->cm_id.ib)
 				ib_destroy_cm_id(id_priv->cm_id.ib);
 		} else if (rdma_protocol_iwarp(id_priv->id.device, 1)) {
@@ -1616,8 +1616,7 @@ static void cma_listen_on_dev(struct rdma_id_private *id_priv,
 	struct rdma_cm_id *id;
 	int ret;
 
-	if (cma_family(id_priv) == AF_IB &&
-	    !rdma_ib_or_iboe(cma_dev->device, 1))
+	if (cma_family(id_priv) == AF_IB && !cap_ib_cm(cma_dev->device, 1))
 		return;
 
 	id = rdma_create_id(cma_listen_handler, id_priv, id_priv->id.ps,
@@ -2008,7 +2007,7 @@ static int cma_bind_loopback(struct rdma_id_private *id_priv)
 	mutex_lock(&lock);
 	list_for_each_entry(cur_dev, &dev_list, list) {
 		if (cma_family(id_priv) == AF_IB &&
-		    !rdma_ib_or_iboe(cur_dev->device, 1))
+		    !cap_ib_cm(cur_dev->device, 1))
 			continue;
 
 		if (!cma_dev)
@@ -2517,7 +2516,7 @@ int rdma_listen(struct rdma_cm_id *id, int backlog)
 
 	id_priv->backlog = backlog;
 	if (id->device) {
-		if (rdma_ib_or_iboe(id->device, 1)) {
+		if (cap_ib_cm(id->device, 1)) {
 			ret = cma_ib_listen(id_priv);
 			if (ret)
 				goto err;
@@ -2861,7 +2860,7 @@ int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
 		id_priv->srq = conn_param->srq;
 	}
 
-	if (rdma_ib_or_iboe(id->device, id->port_num)) {
+	if (cap_ib_cm(id->device, id->port_num)) {
 		if (id->qp_type == IB_QPT_UD)
 			ret = cma_resolve_ib_udp(id_priv, conn_param);
 		else
@@ -2972,7 +2971,7 @@ int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
 		id_priv->srq = conn_param->srq;
 	}
 
-	if (rdma_ib_or_iboe(id->device, id->port_num)) {
+	if (cap_ib_cm(id->device, id->port_num)) {
 		if (id->qp_type == IB_QPT_UD) {
 			if (conn_param)
[PATCH v7 15/23] IB/Verbs: Use management helper cap_ib_mad()
Introduce helper cap_ib_mad() to help us check if the port of an
IB device support Infiniband Management Datagrams.

Cc: Hal Rosenstock
Cc: Steve Wise
Cc: Tom Talpey
Cc: Jason Gunthorpe
Cc: Doug Ledford
Cc: Ira Weiny
Cc: Sean Hefty
Signed-off-by: Michael Wang
---
 drivers/infiniband/core/mad.c | 6 +++---
 drivers/infiniband/core/user_mad.c | 6 +++---
 include/rdma/ib_verbs.h | 15 +++
 3 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 507eb67..59459e7 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -3066,7 +3066,7 @@ static void ib_mad_init_device(struct ib_device *device)
 	}
 
 	for (i = start; i <= end; i++) {
-		if (!rdma_ib_or_iboe(device, i))
+		if (!cap_ib_mad(device, i))
 			continue;
 
 		if (ib_mad_port_open(device, i)) {
@@ -3087,7 +3087,7 @@ error_agent:
 
 error:
 	while (--i >= start) {
-		if (!rdma_ib_or_iboe(device, i))
+		if (!cap_ib_mad(device, i))
 			continue;
 
 		if (ib_agent_port_close(device, i))
@@ -3111,7 +3111,7 @@ static void ib_mad_remove_device(struct ib_device *device)
 	}
 
 	for (i = start; i <= end; i++) {
-		if (!rdma_ib_or_iboe(device, i))
+		if (!cap_ib_mad(device, i))
 			continue;
 
 		if (ib_agent_port_close(device, i))
diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index aa8b334..e3ccbf2 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -1294,7 +1294,7 @@ static void ib_umad_add_one(struct ib_device *device)
 	umad_dev->end_port = e;
 
 	for (i = s; i <= e; ++i) {
-		if (!rdma_ib_or_iboe(device, i))
+		if (!cap_ib_mad(device, i))
 			continue;
 
 		umad_dev->port[i - s].umad_dev = umad_dev;
@@ -1315,7 +1315,7 @@ static void ib_umad_add_one(struct ib_device *device)
 
 err:
 	while (--i >= s) {
-		if (!rdma_ib_or_iboe(device, i))
+		if (!cap_ib_mad(device, i))
 			continue;
 
 		ib_umad_kill_port(&umad_dev->port[i - s]);
@@ -1333,7 +1333,7 @@ static void ib_umad_remove_one(struct ib_device *device)
 		return;
 
 	for (i = 0; i <= umad_dev->end_port - umad_dev->start_port; ++i) {
-		if (rdma_ib_or_iboe(device, i))
+		if (cap_ib_mad(device, i))
 			ib_umad_kill_port(&umad_dev->port[i]);
 	}
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index acdba60..cb3ba2d 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1774,6 +1774,21 @@ static inline int rdma_ib_or_iboe(struct ib_device *device, u8 port_num)
 	return (pt == RDMA_PROTOCOL_IB || pt == RDMA_PROTOCOL_IBOE);
 }
 
+/**
+ * cap_ib_mad - Check if the port of device has the capability Infiniband
+ * Management Datagrams.
+ *
+ * @device: Device to be checked
+ * @port_num: Port number of the device
+ *
+ * Return 0 when port of the device don't support Infiniband
+ * Management Datagrams.
+ */
+static inline int cap_ib_mad(struct ib_device *device, u8 port_num)
+{
+	return rdma_ib_or_iboe(device, port_num);
+}
+
 int ib_query_gid(struct ib_device *device,
 		 u8 port_num, int index, union ib_gid *gid);
 
-- 
2.1.0
[PATCH v7 20/23] IB/Verbs: Use management helper cap_ib_mcast()
Introduce helper cap_ib_mcast() to help us check if the port of an
IB device supports Infiniband Multicast.

Cc: Hal Rosenstock
Cc: Steve Wise
Cc: Tom Talpey
Cc: Jason Gunthorpe
Cc: Doug Ledford
Cc: Ira Weiny
Cc: Sean Hefty
Signed-off-by: Michael Wang
---
 drivers/infiniband/core/cma.c       |  6 +++---
 drivers/infiniband/core/multicast.c |  6 +++---
 include/rdma/ib_verbs.h             | 15 +++
 3 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index ec3a901..c06ca60 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1007,7 +1007,7 @@ static void cma_leave_mc_groups(struct rdma_id_private *id_priv)
 		mc = container_of(id_priv->mc_list.next,
 				  struct cma_multicast, list);
 		list_del(&mc->list);
-		if (rdma_protocol_ib(id_priv->cma_dev->device,
+		if (cap_ib_mcast(id_priv->cma_dev->device,
 				      id_priv->id.port_num)) {
 			ib_sa_free_multicast(mc->multicast.ib);
 			kfree(mc);
@@ -3321,7 +3321,7 @@ int rdma_join_multicast(struct rdma_cm_id *id, struct sockaddr *addr,
 	if (rdma_protocol_iboe(id->device, id->port_num)) {
 		kref_init(&mc->mcref);
 		ret = cma_iboe_join_multicast(id_priv, mc);
-	} else if (rdma_protocol_ib(id->device, id->port_num))
+	} else if (cap_ib_mcast(id->device, id->port_num))
 		ret = cma_join_ib_multicast(id_priv, mc);
 	else
 		ret = -ENOSYS;
@@ -3355,7 +3355,7 @@ void rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr)
 
 			BUG_ON(id_priv->cma_dev->device != id->device);
 
-			if (rdma_protocol_ib(id->device, id->port_num)) {
+			if (cap_ib_mcast(id->device, id->port_num)) {
 				ib_sa_free_multicast(mc->multicast.ib);
 				kfree(mc);
 			} else if (rdma_protocol_iboe(id->device, id->port_num))
diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c
index b57ed03..bdc1880 100644
--- a/drivers/infiniband/core/multicast.c
+++ b/drivers/infiniband/core/multicast.c
@@ -780,7 +780,7 @@ static void mcast_event_handler(struct ib_event_handler *handler,
 	int index;
 
 	dev = container_of(handler, struct mcast_device, event_handler);
-	if (WARN_ON(!rdma_protocol_ib(dev->device, event->element.port_num)))
+	if (WARN_ON(!cap_ib_mcast(dev->device, event->element.port_num)))
 		return;
 
 	index = event->element.port_num - dev->start_port;
@@ -820,7 +820,7 @@ static void mcast_add_one(struct ib_device *device)
 	}
 
 	for (i = 0; i <= dev->end_port - dev->start_port; i++) {
-		if (!rdma_protocol_ib(device, dev->start_port + i))
+		if (!cap_ib_mcast(device, dev->start_port + i))
 			continue;
 		port = &dev->port[i];
 		port->dev = dev;
@@ -858,7 +858,7 @@ static void mcast_remove_one(struct ib_device *device)
 	flush_workqueue(mcast_wq);
 
 	for (i = 0; i <= dev->end_port - dev->start_port; i++) {
-		if (rdma_protocol_ib(device, dev->start_port + i)) {
+		if (cap_ib_mcast(device, dev->start_port + i)) {
 			port = &dev->port[i];
 			deref_port(port);
 			wait_for_completion(&port->comp);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index f3d9760..dde2aa9 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1849,6 +1849,21 @@ static inline int cap_ib_sa(struct ib_device *device, u8 port_num)
 	return rdma_protocol_ib(device, port_num);
 }
 
+/**
+ * cap_ib_mcast - Check if the port of device has the capability Infiniband
+ * Multicast.
+ *
+ * @device: Device to be checked
+ * @port_num: Port number of the device
+ *
+ * Return 0 when port of the device don't support Infiniband
+ * Multicast.
+ */
+static inline int cap_ib_mcast(struct ib_device *device, u8 port_num)
+{
+	return cap_ib_sa(device, port_num);
+}
+
 int ib_query_gid(struct ib_device *device,
 		 u8 port_num, int index, union ib_gid *gid);
 
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
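For readers following the series, the layering that 19/23 and 20/23 set up can be tried outside the kernel. What follows is only a minimal user-space sketch, not the kernel code: struct ib_device is cut down to a hypothetical per-port table (port_proto), and enum port_proto and NUM_PORTS are illustrative stand-ins for whatever the real rdma_protocol_ib() inspects. The only parts taken from the patches are the helper names and the fact that cap_ib_mcast() defers to cap_ib_sa(), which in turn defers to rdma_protocol_ib().

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t u8;

/* Hypothetical stand-ins: the real struct ib_device is far larger. */
enum port_proto { PROTO_IB, PROTO_IBOE, PROTO_IWARP };
enum { NUM_PORTS = 2 };

struct ib_device {
	enum port_proto port_proto[NUM_PORTS];	/* ports are numbered from 1 */
};

/* Model of the raw helper: is this port plain InfiniBand? */
static inline int rdma_protocol_ib(struct ib_device *device, u8 port_num)
{
	assert(port_num >= 1 && port_num <= NUM_PORTS);
	return device->port_proto[port_num - 1] == PROTO_IB;
}

/* As in 19/23: the SA capability currently means "IB port". */
static inline int cap_ib_sa(struct ib_device *device, u8 port_num)
{
	return rdma_protocol_ib(device, port_num);
}

/* As in 20/23: the multicast capability currently follows SA. */
static inline int cap_ib_mcast(struct ib_device *device, u8 port_num)
{
	return cap_ib_sa(device, port_num);
}
```

With a device whose port 1 is PROTO_IB and port 2 is PROTO_IBOE, cap_ib_mcast() is non-zero for port 1 and zero for port 2. Callers ask about the capability by name, so the mapping from protocol to capability can later change in one place.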
[PATCH v7 19/23] IB/Verbs: Use management helper cap_ib_sa()
Introduce helper cap_ib_sa() to help us check if the port of an
IB device supports Infiniband Subnet Administration.

Cc: Hal Rosenstock
Cc: Steve Wise
Cc: Tom Talpey
Cc: Jason Gunthorpe
Cc: Doug Ledford
Cc: Ira Weiny
Cc: Sean Hefty
Signed-off-by: Michael Wang
---
 drivers/infiniband/core/cma.c      |  4 ++--
 drivers/infiniband/core/sa_query.c | 10 +-
 drivers/infiniband/core/ucma.c     |  2 +-
 include/rdma/ib_verbs.h            | 15 +++
 4 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 7d55296..ec3a901 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -933,7 +933,7 @@ static inline int cma_user_data_offset(struct rdma_id_private *id_priv)
 
 static void cma_cancel_route(struct rdma_id_private *id_priv)
 {
-	if (rdma_protocol_ib(id_priv->id.device, id_priv->id.port_num)) {
+	if (cap_ib_sa(id_priv->id.device, id_priv->id.port_num)) {
 		if (id_priv->query)
 			ib_sa_cancel_query(id_priv->query_id, id_priv->query);
 	}
@@ -1957,7 +1957,7 @@ int rdma_resolve_route(struct rdma_cm_id *id, int timeout_ms)
 		return -EINVAL;
 
 	atomic_inc(&id_priv->refcount);
-	if (rdma_protocol_ib(id->device, id->port_num))
+	if (cap_ib_sa(id->device, id->port_num))
 		ret = cma_resolve_ib_route(id_priv, timeout_ms);
 	else if (rdma_protocol_iboe(id->device, id->port_num))
 		ret = cma_resolve_iboe_route(id_priv);
diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index b115c28..c82aa48 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -450,7 +450,7 @@ static void ib_sa_event(struct ib_event_handler *handler, struct ib_event *event
 		struct ib_sa_port *port =
 			&sa_dev->port[event->element.port_num - sa_dev->start_port];
 
-		if (WARN_ON(!rdma_protocol_ib(handler->device, port->port_num)))
+		if (WARN_ON(!cap_ib_sa(handler->device, port->port_num)))
 			return;
 
 		spin_lock_irqsave(&port->ah_lock, flags);
@@ -1173,7 +1173,7 @@ static void ib_sa_add_one(struct ib_device *device)
 
 	for (i = 0; i <= e - s; ++i) {
 		spin_lock_init(&sa_dev->port[i].ah_lock);
-		if (!rdma_protocol_ib(device, i + 1))
+		if (!cap_ib_sa(device, i + 1))
 			continue;
 
 		sa_dev->port[i].sm_ah    = NULL;
@@ -1208,7 +1208,7 @@ static void ib_sa_add_one(struct ib_device *device)
 		goto err;
 
 	for (i = 0; i <= e - s; ++i) {
-		if (rdma_protocol_ib(device, i + 1))
+		if (cap_ib_sa(device, i + 1))
 			update_sm_ah(&sa_dev->port[i].update_task);
 	}
 
@@ -1216,7 +1216,7 @@ static void ib_sa_add_one(struct ib_device *device)
 
 err:
 	while (--i >= 0) {
-		if (rdma_protocol_ib(device, i + 1))
+		if (cap_ib_sa(device, i + 1))
 			ib_unregister_mad_agent(sa_dev->port[i].agent);
 	}
 free:
@@ -1237,7 +1237,7 @@ static void ib_sa_remove_one(struct ib_device *device)
 	flush_workqueue(ib_wq);
 
 	for (i = 0; i <= sa_dev->end_port - sa_dev->start_port; ++i) {
-		if (rdma_protocol_ib(device, i + 1)) {
+		if (cap_ib_sa(device, i + 1)) {
 			ib_unregister_mad_agent(sa_dev->port[i].agent);
 			if (sa_dev->port[i].sm_ah)
 				kref_put(&sa_dev->port[i].sm_ah->ref, free_sm_ah);
diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index dae7620..6204065 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -723,7 +723,7 @@ static ssize_t ucma_query_route(struct ucma_file *file,
 	resp.node_guid = (__force __u64) ctx->cm_id->device->node_guid;
 	resp.port_num = ctx->cm_id->port_num;
 
-	if (rdma_protocol_ib(ctx->cm_id->device, ctx->cm_id->port_num))
+	if (cap_ib_sa(ctx->cm_id->device, ctx->cm_id->port_num))
 		ucma_copy_ib_route(&resp, &ctx->cm_id->route);
 	else if (rdma_protocol_iboe(ctx->cm_id->device, ctx->cm_id->port_num))
 		ucma_copy_iboe_route(&resp, &ctx->cm_id->route);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index d69e467..f3d9760 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1834,6 +1834,21 @@ static inline int cap_iw_cm(struct ib_device *device, u8 port_num)
 	return rdma_protocol_iwarp(device, port_num);
 }
 
+/**
+ * cap_ib_sa - Check if the port
[PATCH v7 12/23] IB/Verbs: Reform mcast related part in IB-core cma
Use raw management helpers to reform mcast related part in IB-core cma.

Cc: Hal Rosenstock
Cc: Steve Wise
Cc: Tom Talpey
Cc: Jason Gunthorpe
Cc: Doug Ledford
Cc: Ira Weiny
Cc: Sean Hefty
Signed-off-by: Michael Wang
---
 drivers/infiniband/core/cma.c | 56 ++-
 1 file changed, 18 insertions(+), 38 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 36c5f8a..34ec13f 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -997,17 +997,12 @@ static void cma_leave_mc_groups(struct rdma_id_private *id_priv)
 		mc = container_of(id_priv->mc_list.next,
 				  struct cma_multicast, list);
 		list_del(&mc->list);
-		switch (rdma_port_get_link_layer(id_priv->cma_dev->device, id_priv->id.port_num)) {
-		case IB_LINK_LAYER_INFINIBAND:
+		if (rdma_protocol_ib(id_priv->cma_dev->device,
+				      id_priv->id.port_num)) {
 			ib_sa_free_multicast(mc->multicast.ib);
 			kfree(mc);
-			break;
-		case IB_LINK_LAYER_ETHERNET:
+		} else
 			kref_put(&mc->mcref, release_mc);
-			break;
-		default:
-			break;
-		}
 	}
 }
 
@@ -3314,24 +3309,13 @@ int rdma_join_multicast(struct rdma_cm_id *id, struct sockaddr *addr,
 	list_add(&mc->list, &id_priv->mc_list);
 	spin_unlock(&id_priv->lock);
 
-	switch (rdma_node_get_transport(id->device->node_type)) {
-	case RDMA_TRANSPORT_IB:
-		switch (rdma_port_get_link_layer(id->device, id->port_num)) {
-		case IB_LINK_LAYER_INFINIBAND:
-			ret = cma_join_ib_multicast(id_priv, mc);
-			break;
-		case IB_LINK_LAYER_ETHERNET:
-			kref_init(&mc->mcref);
-			ret = cma_iboe_join_multicast(id_priv, mc);
-			break;
-		default:
-			ret = -EINVAL;
-		}
-		break;
-	default:
+	if (rdma_protocol_iboe(id->device, id->port_num)) {
+		kref_init(&mc->mcref);
+		ret = cma_iboe_join_multicast(id_priv, mc);
+	} else if (rdma_protocol_ib(id->device, id->port_num))
+		ret = cma_join_ib_multicast(id_priv, mc);
+	else
 		ret = -ENOSYS;
-		break;
-	}
 
 	if (ret) {
 		spin_lock_irq(&id_priv->lock);
@@ -3359,19 +3343,15 @@ void rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr)
 				ib_detach_mcast(id->qp,
 						&mc->multicast.ib->rec.mgid,
 						be16_to_cpu(mc->multicast.ib->rec.mlid));
-			if (rdma_node_get_transport(id_priv->cma_dev->device->node_type) == RDMA_TRANSPORT_IB) {
-				switch (rdma_port_get_link_layer(id->device, id->port_num)) {
-				case IB_LINK_LAYER_INFINIBAND:
-					ib_sa_free_multicast(mc->multicast.ib);
-					kfree(mc);
-					break;
-				case IB_LINK_LAYER_ETHERNET:
-					kref_put(&mc->mcref, release_mc);
-					break;
-				default:
-					break;
-				}
-			}
+
+			BUG_ON(id_priv->cma_dev->device != id->device);
+
+			if (rdma_protocol_ib(id->device, id->port_num)) {
+				ib_sa_free_multicast(mc->multicast.ib);
+				kfree(mc);
+			} else if (rdma_protocol_iboe(id->device, id->port_num))
+				kref_put(&mc->mcref, release_mc);
+
 			return;
 		}
 	}
-- 
2.1.0
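One behavioural detail of the rdma_join_multicast() hunk is easy to miss in the diff: the old nested switch returned -EINVAL for an unknown link layer on an IB-transport device and -ENOSYS for any other transport, while the new if/else chain returns -ENOSYS for every port that is neither IBoE nor IB. The block below is a rough user-space sketch of the new dispatch only. enum port_proto, enum join_path and pick_join_path() are hypothetical names made up for illustration; only the branch order and the error code come from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for what rdma_protocol_*() would report. */
enum port_proto { PROTO_IB, PROTO_IBOE, PROTO_IWARP };
enum join_path { JOIN_IB, JOIN_IBOE };

/*
 * Mirror the branch order of the reworked rdma_join_multicast():
 * IBoE first, then IB, otherwise -ENOSYS.
 * Returns 0 and stores the chosen path in *path on success.
 */
static int pick_join_path(enum port_proto proto, enum join_path *path)
{
	assert(path != NULL);

	if (proto == PROTO_IBOE) {
		*path = JOIN_IBOE;	/* cma_iboe_join_multicast() in the patch */
		return 0;
	} else if (proto == PROTO_IB) {
		*path = JOIN_IB;	/* cma_join_ib_multicast() in the patch */
		return 0;
	}

	return -ENOSYS;		/* no multicast join path for this port */
}
```

An iWARP port passed to pick_join_path() gets -ENOSYS here, the same code the patched function assigns to ret in its final else branch.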
[PATCH v7 18/23] IB/Verbs: Use management helper cap_iw_cm()
Introduce helper cap_iw_cm() to help us check if the port of an
IB device supports IWARP Communication Manager.

Cc: Hal Rosenstock
Cc: Steve Wise
Cc: Tom Talpey
Cc: Jason Gunthorpe
Cc: Doug Ledford
Cc: Ira Weiny
Cc: Sean Hefty
Signed-off-by: Michael Wang
---
 drivers/infiniband/core/cma.c | 14 +++---
 include/rdma/ib_verbs.h       | 15 +++
 2 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index ecb0484..7d55296 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -754,7 +754,7 @@ int rdma_init_qp_attr(struct rdma_cm_id *id, struct ib_qp_attr *qp_attr,
 
 		if (qp_attr->qp_state == IB_QPS_RTR)
 			qp_attr->rq_psn = id_priv->seq_num;
-	} else if (rdma_protocol_iwarp(id->device, id->port_num)) {
+	} else if (cap_iw_cm(id->device, id->port_num)) {
 		if (!id_priv->cm_id.iw) {
 			qp_attr->qp_access_flags = 0;
 			*qp_attr_mask = IB_QP_STATE | IB_QP_ACCESS_FLAGS;
@@ -1036,7 +1036,7 @@ void rdma_destroy_id(struct rdma_cm_id *id)
 		if (cap_ib_cm(id_priv->id.device, 1)) {
 			if (id_priv->cm_id.ib)
 				ib_destroy_cm_id(id_priv->cm_id.ib);
-		} else if (rdma_protocol_iwarp(id_priv->id.device, 1)) {
+		} else if (cap_iw_cm(id_priv->id.device, 1)) {
 			if (id_priv->cm_id.iw)
 				iw_destroy_cm_id(id_priv->cm_id.iw);
 		}
@@ -2520,7 +2520,7 @@ int rdma_listen(struct rdma_cm_id *id, int backlog)
 			ret = cma_ib_listen(id_priv);
 			if (ret)
 				goto err;
-		} else if (rdma_protocol_iwarp(id->device, 1)) {
+		} else if (cap_iw_cm(id->device, 1)) {
 			ret = cma_iw_listen(id_priv, backlog);
 			if (ret)
 				goto err;
@@ -2865,7 +2865,7 @@ int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
 			ret = cma_resolve_ib_udp(id_priv, conn_param);
 		else
 			ret = cma_connect_ib(id_priv, conn_param);
-	} else if (rdma_protocol_iwarp(id->device, id->port_num))
+	} else if (cap_iw_cm(id->device, id->port_num))
 		ret = cma_connect_iw(id_priv, conn_param);
 	else
 		ret = -ENOSYS;
@@ -2987,7 +2987,7 @@ int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
 			else
 				ret = cma_rep_recv(id_priv);
 		}
-	} else if (rdma_protocol_iwarp(id->device, id->port_num))
+	} else if (cap_iw_cm(id->device, id->port_num))
 		ret = cma_accept_iw(id_priv, conn_param);
 	else
 		ret = -ENOSYS;
@@ -3042,7 +3042,7 @@ int rdma_reject(struct rdma_cm_id *id, const void *private_data,
 			ret = ib_send_cm_rej(id_priv->cm_id.ib,
 				IB_CM_REJ_CONSUMER_DEFINED, NULL,
 				0, private_data, private_data_len);
-	} else if (rdma_protocol_iwarp(id->device, id->port_num)) {
+	} else if (cap_iw_cm(id->device, id->port_num)) {
 		ret = iw_cm_reject(id_priv->cm_id.iw,
 				   private_data, private_data_len);
 	} else
@@ -3068,7 +3068,7 @@ int rdma_disconnect(struct rdma_cm_id *id)
 		/* Initiate or respond to a disconnect. */
 		if (ib_send_cm_dreq(id_priv->cm_id.ib, NULL, 0))
 			ib_send_cm_drep(id_priv->cm_id.ib, NULL, 0);
-	} else if (rdma_protocol_iwarp(id->device, id->port_num)) {
+	} else if (cap_iw_cm(id->device, id->port_num)) {
 		ret = iw_cm_disconnect(id_priv->cm_id.iw, 0);
 	} else
 		ret = -EINVAL;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 87b07f2..d69e467 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1819,6 +1819,21 @@ static inline int cap_ib_cm(struct ib_device *device, u8 port_num)
 	return rdma_ib_or_iboe(device, port_num);
 }
 
+/**
+ * cap_iw_cm - Check if the port of device has the capability IWARP
+ * Communication Manager.
+ *
+ * @device: Device to be checked
+ * @port_num: Port number of the device
+ *
+ * Return 0 when port of the device don't support IWARP
+ * Communication Manager.
+ */
+static inline int cap_iw_cm(struct ib_device *device, u8 port_num)
+{
+	return rdma_protocol_iwarp(device, port_num);
+}
+
 int ib_query_gid(struct ib_device *device,
 		 u8 port_num, int index, union ib_gid *gid);
[PATCH v7 22/23] IB/Verbs: Use management helper cap_af_ib()
Introduce helper cap_af_ib() to help us check if the port of an
IB device supports Native Infiniband Address.

Cc: Hal Rosenstock
Cc: Steve Wise
Cc: Tom Talpey
Cc: Jason Gunthorpe
Cc: Doug Ledford
Cc: Ira Weiny
Cc: Sean Hefty
Signed-off-by: Michael Wang
---
 drivers/infiniband/core/cma.c |  2 +-
 include/rdma/ib_verbs.h       | 15 +++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index c06ca60..c3dbcdd 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -448,7 +448,7 @@ static int cma_resolve_ib_dev(struct rdma_id_private *id_priv)
 
 	list_for_each_entry(cur_dev, &dev_list, list) {
 		for (p = 1; p <= cur_dev->device->phys_port_cnt; ++p) {
-			if (!rdma_ib_or_iboe(cur_dev->device, p))
+			if (!cap_af_ib(cur_dev->device, p))
 				continue;
 
 			if (ib_find_cached_pkey(cur_dev->device, p, pkey, &index))
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index cca0293..c045be1 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1865,6 +1865,21 @@ static inline int cap_ib_mcast(struct ib_device *device, u8 port_num)
 }
 
 /**
+ * cap_af_ib - Check if the port of device has the capability
+ * Native Infiniband Address.
+ *
+ * @device: Device to be checked
+ * @port_num: Port number of the device
+ *
+ * Return 0 when port of the device don't support
+ * Native Infiniband Address.
+ */
+static inline int cap_af_ib(struct ib_device *device, u8 port_num)
+{
+	return rdma_ib_or_iboe(device, port_num);
+}
+
+/**
  * cap_read_multi_sge - Check if the port of device has the capability
  * RDMA Read Multiple Scatter-Gather Entries.
  *
-- 
2.1.0
[PATCH v7 21/23] IB/Verbs: Use management helper cap_read_multi_sge()
Introduce helper cap_read_multi_sge() to help us check if the port of an
IB device supports RDMA Read Multiple Scatter-Gather Entries.

Cc: Hal Rosenstock
Cc: Steve Wise
Cc: Tom Talpey
Cc: Jason Gunthorpe
Cc: Doug Ledford
Cc: Ira Weiny
Cc: Sean Hefty
Signed-off-by: Michael Wang
---
 include/rdma/ib_verbs.h                 | 15 +++
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c |  2 +-
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index dde2aa9..cca0293 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1864,6 +1864,21 @@ static inline int cap_ib_mcast(struct ib_device *device, u8 port_num)
 	return cap_ib_sa(device, port_num);
 }
 
+/**
+ * cap_read_multi_sge - Check if the port of device has the capability
+ * RDMA Read Multiple Scatter-Gather Entries.
+ *
+ * @device: Device to be checked
+ * @port_num: Port number of the device
+ *
+ * Return 0 when port of the device don't support
+ * RDMA Read Multiple Scatter-Gather Entries.
+ */
+static inline int cap_read_multi_sge(struct ib_device *device, u8 port_num)
+{
+	return !rdma_protocol_iwarp(device, port_num);
+}
+
 int ib_query_gid(struct ib_device *device,
 		 u8 port_num, int index, union ib_gid *gid);
 
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 2cc625d..7711b7a 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -117,7 +117,7 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
 
 static int rdma_read_max_sge(struct svcxprt_rdma *xprt, int sge_count)
 {
-	if (rdma_protocol_iwarp(xprt->sc_cm_id->device,
+	if (!cap_read_multi_sge(xprt->sc_cm_id->device,
 				xprt->sc_cm_id->port_num))
 		return 1;
 	else
-- 
2.1.0
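The polarity flip in this patch deserves a second look: cap_read_multi_sge() is defined as the negation of rdma_protocol_iwarp(), and the caller negates it again, so rdma_read_max_sge() still returns 1 on iWARP ports. Below is a small user-space sketch of that equivalence. enum port_proto and read_max_sge() are illustrative names, and the clamp to max_sge in the final return is purely an assumption, because the else branch of the real function is not visible in the hunk.

```c
#include <assert.h>

/* Hypothetical stand-in for what rdma_protocol_iwarp() would report. */
enum port_proto { PROTO_IB, PROTO_IBOE, PROTO_IWARP };

/* Model of the new helper: every protocol except iWARP qualifies. */
static inline int cap_read_multi_sge(enum port_proto proto)
{
	return proto != PROTO_IWARP;
}

/*
 * Model of the patched caller. The "return 1" path is from the patch;
 * the clamp to max_sge is an assumed placeholder for the elided branch.
 */
static int read_max_sge(enum port_proto proto, int sge_count, int max_sge)
{
	assert(sge_count >= 1 && max_sge >= 1);

	if (!cap_read_multi_sge(proto))
		return 1;	/* iWARP: one SGE per RDMA Read */

	return sge_count < max_sge ? sge_count : max_sge;
}
```

The negated capability check gives the same answer as the old direct protocol test for all three protocols, which is what a pure refactoring should do.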
[PATCH v7 23/23] IB/Verbs: Use management helper cap_eth_ah()
Introduce helper cap_eth_ah() to help us check if the port of an
IB device supports Ethernet Address Handler.

Cc: Hal Rosenstock
Cc: Steve Wise
Cc: Tom Talpey
Cc: Jason Gunthorpe
Cc: Doug Ledford
Cc: Ira Weiny
Cc: Sean Hefty
Signed-off-by: Michael Wang
---
 drivers/infiniband/core/cma.c      |  2 +-
 drivers/infiniband/core/sa_query.c |  2 +-
 drivers/infiniband/core/verbs.c    |  4 ++--
 include/rdma/ib_verbs.h            | 15 +++
 4 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index c3dbcdd..2e96883 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -711,7 +711,7 @@ static int cma_ib_init_qp_attr(struct rdma_id_private *id_priv,
 	int ret;
 	u16 pkey;
 
-	if (rdma_protocol_iboe(id_priv->id.device, id_priv->id.port_num))
+	if (cap_eth_ah(id_priv->id.device, id_priv->id.port_num))
 		pkey = 0xffff;
 	else
 		pkey = ib_addr_get_pkey(dev_addr);
diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index c82aa48..1f0d009 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -540,7 +540,7 @@ int ib_init_ah_from_path(struct ib_device *device, u8 port_num,
 	ah_attr->port_num = port_num;
 	ah_attr->static_rate = rec->rate;
 
-	force_grh = rdma_protocol_iboe(device, port_num);
+	force_grh = cap_eth_ah(device, port_num);
 
 	if (rec->hop_limit > 1 || force_grh) {
 		ah_attr->ah_flags = IB_AH_GRH;
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 7dd2f51..db1139f 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -200,7 +200,7 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc,
 	int ret;
 
 	memset(ah_attr, 0, sizeof *ah_attr);
-	if (rdma_protocol_iboe(device, port_num)) {
+	if (cap_eth_ah(device, port_num)) {
 		if (!(wc->wc_flags & IB_WC_GRH))
 			return -EPROTOTYPE;
 
@@ -869,7 +869,7 @@ int ib_resolve_eth_l2_attrs(struct ib_qp *qp,
 	union ib_gid sgid;
 
 	if ((*qp_attr_mask & IB_QP_AV) &&
-	    (rdma_protocol_iboe(qp->device, qp_attr->ah_attr.port_num))) {
+	    (cap_eth_ah(qp->device, qp_attr->ah_attr.port_num))) {
 		ret = ib_query_gid(qp->device, qp_attr->ah_attr.port_num,
 				   qp_attr->ah_attr.grh.sgid_index, &sgid);
 		if (ret)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index c045be1..c724114 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1880,6 +1880,21 @@ static inline int cap_af_ib(struct ib_device *device, u8 port_num)
 }
 
 /**
+ * cap_eth_ah - Check if the port of device has the capability
+ * Ethernet Address Handler.
+ *
+ * @device: Device to be checked
+ * @port_num: Port number of the device
+ *
+ * Return 0 when port of the device don't support
+ * Ethernet Address Handler.
+ */
+static inline int cap_eth_ah(struct ib_device *device, u8 port_num)
+{
+	return rdma_protocol_iboe(device, port_num);
+}
+
+/**
  * cap_read_multi_sge - Check if the port of device has the capability
  * RDMA Read Multiple Scatter-Gather Entries.
  *
-- 
2.1.0
[PATCH v7 09/23] IB/Verbs: Reform IB-core verbs
Use raw management helpers to reform IB-core verbs.

Cc: Hal Rosenstock
Cc: Steve Wise
Cc: Tom Talpey
Cc: Jason Gunthorpe
Cc: Doug Ledford
Cc: Ira Weiny
Cc: Sean Hefty
Signed-off-by: Michael Wang
---
 drivers/infiniband/core/verbs.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index f93eb8d..7dd2f51 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -198,11 +198,9 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc,
 	u32 flow_class;
 	u16 gid_index;
 	int ret;
-	int is_eth = (rdma_port_get_link_layer(device, port_num) ==
-			IB_LINK_LAYER_ETHERNET);
 
 	memset(ah_attr, 0, sizeof *ah_attr);
-	if (is_eth) {
+	if (rdma_protocol_iboe(device, port_num)) {
 		if (!(wc->wc_flags & IB_WC_GRH))
 			return -EPROTOTYPE;
 
@@ -871,7 +869,7 @@ int ib_resolve_eth_l2_attrs(struct ib_qp *qp,
 	union ib_gid sgid;
 
 	if ((*qp_attr_mask & IB_QP_AV) &&
-	    (rdma_port_get_link_layer(qp->device, qp_attr->ah_attr.port_num) == IB_LINK_LAYER_ETHERNET)) {
+	    (rdma_protocol_iboe(qp->device, qp_attr->ah_attr.port_num))) {
 		ret = ib_query_gid(qp->device, qp_attr->ah_attr.port_num,
 				   qp_attr->ah_attr.grh.sgid_index, &sgid);
 		if (ret)
-- 
2.1.0
[PATCH v7 16/23] IB/Verbs: Use management helper cap_ib_smi()
Introduce helper cap_ib_smi() to help us check if the port of an
IB device supports Infiniband Subnet Management Interface.

Cc: Hal Rosenstock
Cc: Steve Wise
Cc: Tom Talpey
Cc: Jason Gunthorpe
Cc: Doug Ledford
Cc: Ira Weiny
Cc: Sean Hefty
Signed-off-by: Michael Wang
---
 drivers/infiniband/core/agent.c |  2 +-
 drivers/infiniband/core/mad.c   |  2 +-
 include/rdma/ib_verbs.h         | 15 +++
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c
index 89d4fbc..61471ee 100644
--- a/drivers/infiniband/core/agent.c
+++ b/drivers/infiniband/core/agent.c
@@ -156,7 +156,7 @@ int ib_agent_port_open(struct ib_device *device, int port_num)
 		goto error1;
 	}
 
-	if (rdma_protocol_ib(device, port_num)) {
+	if (cap_ib_smi(device, port_num)) {
 		/* Obtain send only MAD agent for SMI QP */
 		port_priv->agent[0] = ib_register_mad_agent(device, port_num,
 							    IB_QPT_SMI, NULL, 0,
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 59459e7..ee3a05e 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -2938,7 +2938,7 @@ static int ib_mad_port_open(struct ib_device *device,
 	init_mad_qp(port_priv, &port_priv->qp_info[1]);
 
 	cq_size = mad_sendq_size + mad_recvq_size;
-	has_smi = rdma_protocol_ib(device, port_num);
+	has_smi = cap_ib_smi(device, port_num);
 	if (has_smi)
 		cq_size *= 2;
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index cb3ba2d..b364a82 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1789,6 +1789,21 @@ static inline int cap_ib_mad(struct ib_device *device, u8 port_num)
 	return rdma_ib_or_iboe(device, port_num);
 }
 
+/**
+ * cap_ib_smi - Check if the port of device has the capability Infiniband
+ * Subnet Management Interface.
+ *
+ * @device: Device to be checked
+ * @port_num: Port number of the device
+ *
+ * Return 0 when port of the device don't support Infiniband
+ * Subnet Management Interface.
+ */
+static inline int cap_ib_smi(struct ib_device *device, u8 port_num)
+{
+	return rdma_protocol_ib(device, port_num);
+}
+
 int ib_query_gid(struct ib_device *device,
 		 u8 port_num, int index, union ib_gid *gid);
 
-- 
2.1.0