The region field was not added to the devlink man page.
Fixes: 8b4fbf0bed8e6 ("devlink: Add support for devlink-region access")
Signed-off-by: Alex Vesker
---
man/man8/devlink.8 | 6 +-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/man/man8/devlink.8 b/man/man8
e snapshot support via devlink: (Alex Vesker)
The last three patches add support for capturing a region snapshot of the
firmware crspace during critical errors, using the devlink region_snapshot
parameter.
-Saeed.
----
Alex Vesker (3):
Signed-off-by: Alex Vesker
Signed-off-by: Jiri Pirko
---
devlink/devlink.c | 485 +-
man/man8/devlink-region.8 | 131 +
man/man8/devlink.8| 1 +
3 files changed, 616 insertions(+), 1 deletion(-)
create mode 100644 man/man8
On 7/13/2018 3:51 AM, Jakub Kicinski wrote:
On Thu, 12 Jul 2018 15:13:09 +0300, Alex Vesker wrote:
To restrict the driver with the snapshot ID selection a new callback
is introduced for the driver to get the snapshot ID before creating
a new snapshot. This will also allow giving the same ID
Add support for DEVLINK_CMD_REGION_DEL used
for deleting a snapshot from a region. The snapshot ID is required.
Also added notification support for NEW and DEL of snapshots.
Signed-off-by: Alex Vesker
Signed-off-by: Jiri Pirko
---
include/uapi/linux/devlink.h | 2 +
net/core/devlink.c
address regions are registered on init and snapshots are attached
once a new snapshot is collected by the driver.
Signed-off-by: Alex Vesker
Signed-off-by: Tariq Toukan
Signed-off-by: Jiri Pirko
---
drivers/net/ethernet/mellanox/mlx4/Makefile | 2 +-
drivers/net/ethernet/mellanox/mlx4/catas.c
This parameter enables capturing a region snapshot of the crspace
during critical errors. The parameter is disabled by default; it can be
enabled using devlink param commands. It can be configured at runtime
and at driver init.
Signed-off-by: Alex Vesker
Signed-off-by: Jiri
are provided a snapshot read
will be done.
This is used for both snapshot access and will be used in the same
way to access current data on the region.
Signed-off-by: Alex Vesker
Signed-off-by: Jiri Pirko
---
include/uapi/linux/devlink.h | 7 ++
net/core/devlink.c | 182
To restrict the driver's snapshot ID selection, a new callback
is introduced for the driver to get the snapshot ID before creating
a new snapshot. This also allows giving the same ID to multiple
snapshots taken of different regions at the same time.
Signed-off-by: Alex Vesker
Signed
region_snapshot - When set enables capturing region snapshots
Signed-off-by: Alex Vesker
Signed-off-by: Jiri Pirko
Reviewed-by: Moshe Shemesh
---
include/net/devlink.h | 4
net/core/devlink.c| 5 +
2 files changed, 9 insertions(+)
diff --git a/include/net/devlink.h b/include/net
of readable data followed by a
lock which is used to block volatile CR space access.
Signed-off-by: Alex Vesker
Signed-off-by: Tariq Toukan
Signed-off-by: Jiri Pirko
---
drivers/net/ethernet/mellanox/mlx4/fw.c | 5 -
drivers/net/ethernet/mellanox/mlx4/fw.h | 1 +
drivers/net/ethernet/mellanox
Add support for DEVLINK_CMD_REGION_GET command which is used for
querying for the supported DEV/REGION values of devlink devices.
The support is both for doit and dumpit.
Reply includes:
BUS_NAME, DEVICE_NAME, REGION_NAME, REGION_SIZE
Signed-off-by: Alex Vesker
Signed-off-by: Jiri Pirko
Extend the support for the DEVLINK_CMD_REGION_GET command to also
return the IDs of the snapshots currently present on the region.
Each reply will include a nested snapshots attribute that
can contain multiple snapshot attributes each with an ID.
Signed-off-by: Alex Vesker
Signed-off-by: Jiri Pirko
will be deleted using the destructor function
when destroying a region or when a snapshot delete command
is issued from the devlink user tool.
Signed-off-by: Alex Vesker
Signed-off-by: Jiri Pirko
---
include/net/devlink.h | 13 +++
net/core/devlink.c| 95
Add a parameter to enable devlink region snapshot
-Allocate snapshot memory using kvmalloc
-Introduce destructor function devlink_snapshot_data_dest_t to avoid
double allocation
v2->v3:
-Fix incorrect comment in devlink.h for DEVLINK_ATTR_REGION_SIZE
from u32 to u64
Alex Vesker (11):
devlin
, register-space.
Signed-off-by: Alex Vesker
Signed-off-by: Jiri Pirko
---
include/net/devlink.h | 22 ++
net/core/devlink.c| 84 +++
2 files changed, 106 insertions(+)
diff --git a/include/net/devlink.h b/include/net/devlink.h
index
Add a parameter to enable devlink region snapshot
-Allocate snapshot memory using kvmalloc
-Introduce destructor function devlink_snapshot_data_dest_t to avoid
double allocation
Alex Vesker (11):
devlink: Add support for creating and destroying regions
devlink: Add callback to query for snaps
On 3/31/2018 8:21 PM, David Ahern wrote:
On 3/31/18 9:53 AM, Andrew Lunn wrote:
I want to be able to log in to a customer and access this snapshot
without any previous configuration from the user, and without asking them
to enable the feature and then waiting for a repro...this will help
debugging
/20/436
How well does this API work for a 2Gbyte snapshot?
Ccing Alex who did the tests.
I didn't check the performance for such a large snapshot.
From my measurement it takes 0.09s for 1 MB of data, which means
about 3 minutes for 2 GB.
This can be tuned and improved since this is a socket application
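The estimate above can be reproduced with a one-line extrapolation. This is only a sketch: it assumes the read time scales linearly with snapshot size and that "2 GB" means 2048 MB, and `estimate_seconds` is a made-up helper name, not part of devlink.

```c
/* Linearly extrapolate a transfer time from a measured per-megabyte cost.
 * Assumption: the cost per MB stays constant for large snapshots. */
static double estimate_seconds(double size_mb, double seconds_per_mb)
{
	return size_mb * seconds_per_mb;
}
```

Calling `estimate_seconds(2048.0, 0.09)` gives roughly 184 seconds, which is where the "about 3 minutes" figure comes from.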
On 3/31/2018 1:26 AM, David Ahern wrote:
On 3/30/18 1:39 PM, Alex Vesker wrote:
On 3/30/2018 7:57 PM, David Ahern wrote:
On 3/30/18 8:34 AM, Andrew Lunn wrote:
And it seems to want contiguous pages. How well does that work after
the system has been running for a while and memory
On 3/30/2018 7:57 PM, David Ahern wrote:
On 3/30/18 8:34 AM, Andrew Lunn wrote:
And it seems to want contiguous pages. How well does that work after
the system has been running for a while and memory is fragmented?
The allocation can be changed, there is no real need for contiguous pages.
It
On 3/29/2018 10:51 PM, Andrew Lunn wrote:
Show all of the exposed regions with region sizes:
$ devlink region show
pci/0000:00:05.0/cr-space: size 1048576 snapshot [1 2]
So you have 2Mbytes of snapshot data. Is this held in the device, or
kernel memory?
This is allocated in devlink, the
On 3/29/2018 8:13 PM, Andrew Lunn wrote:
On Thu, Mar 29, 2018 at 07:07:43PM +0300, Alex Vesker wrote:
This is a proposal which will allow access to driver defined address
regions using devlink. Each device can create its supported address
regions and register them. A device which exposes
, register-space.
Signed-off-by: Alex Vesker <va...@mellanox.com>
Signed-off-by: Jiri Pirko <j...@mellanox.com>
---
include/net/devlink.h | 22 ++
net/core/devlink.c| 84 +++
2 files changed, 106 insertions(+)
diff --git a
To restrict the driver's snapshot ID selection, a new callback
is introduced for the driver to get the snapshot ID before creating
a new snapshot. This also allows giving the same ID to multiple
snapshots taken of different regions at the same time.
Signed-off-by: Alex Vesker <
are provided a snapshot read
will be done.
This is used for both snapshot access and will be used in the same
way to access current data on the region.
Signed-off-by: Alex Vesker <va...@mellanox.com>
Signed-off-by: Jiri Pirko <j...@mellanox.com>
---
include/uapi/linux/devlink.h | 7
Extend the support for the DEVLINK_CMD_REGION_GET command to also
return the IDs of the snapshots currently present on the region.
Each reply will include a nested snapshots attribute that
can contain multiple snapshot attributes each with an ID.
Signed-off-by: Alex Vesker <va...@mellanox.com>
Add support for DEVLINK_CMD_REGION_GET command which is used for
querying for the supported DEV/REGION values of devlink devices.
The support is both for doit and dumpit.
Reply includes:
BUS_NAME, DEVICE_NAME, REGION_NAME, REGION_SIZE
Signed-off-by: Alex Vesker <va...@mellanox.com>
Sign
region.
The snapshots can be deleted from the devlink user tool or when
destroying a region.
Signed-off-by: Alex Vesker <va...@mellanox.com>
Signed-off-by: Jiri Pirko <j...@mellanox.com>
---
include/net/devlink.h | 9 +
net/core/devli
of readable data followed by a
lock which is used to block volatile CR space access.
Signed-off-by: Alex Vesker <va...@mellanox.com>
Signed-off-by: Tariq Toukan <tar...@mellanox.com>
Signed-off-by: Jiri Pirko <j...@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx4/fw.c | 5 -
dr
address regions are registered on init and snapshots are attached
once a new snapshot is collected by the driver.
Signed-off-by: Alex Vesker <va...@mellanox.com>
Signed-off-by: Tariq Toukan <tar...@mellanox.com>
Signed-off-by: Jiri Pirko <j...@mellanox.com>
---
drivers/net/ethe
-health snapshot 1 address 0
length 16
0014 95dc 0014 9514 0035 1670 0034 db30
For more information you can check devlink-region.8 man page
Future:
There is a plan to extend the support to include a write command
as well as performing read and dump live region
Alex Vesker
Add support for DEVLINK_CMD_REGION_DEL used
for deleting a snapshot from a region. The snapshot ID is required.
Also added notification support for NEW and DEL of snapshots.
Signed-off-by: Alex Vesker <va...@mellanox.com>
Signed-off-by: Jiri Pirko <j...@mellanox.com>
---
includ
On Mon, 12 Mar 2018 09:01:54 -0700
Alexander Duyck <alexander.du...@gmail.com> wrote:
> On Mon, Mar 12, 2018 at 12:59 AM, Christoph Hellwig <h...@lst.de> wrote:
> > On Sun, Mar 11, 2018 at 09:59:09PM -0600, Alex Williamson wrote:
> >> I still struggle to understa
ed and why they should care.
Can't we just have a pci_simple_sriov_configure() helper and ignore
this unmanaged business? Thanks,
Alex
when a PF driver is not present to manage a device, or the PF
> + driver does not provide functionality to support SR-IOV.
Given a pf, how does a user determine whether it is managed or unmanaged
and therefore which autoprobe attributes are in effect? Thanks,
Alex
On Fri, 2 Mar 2018 06:54:17 +
"Tian, Kevin" <kevin.t...@intel.com> wrote:
> > From: Alex Williamson
> > Sent: Friday, March 2, 2018 4:22 AM
> > >
> > > I am pretty sure that you are describing is true of some, but not for
> > > al
On Thu, 1 Mar 2018 18:49:53 -0800
Alexander Duyck <alexander.du...@gmail.com> wrote:
> On Thu, Mar 1, 2018 at 3:58 PM, Alex Williamson
> <alex.william...@redhat.com> wrote:
> > On Thu, 1 Mar 2018 14:42:40 -0800
> > Alexander Duyck <alexander.du...@gmail.com> w
On Thu, 1 Mar 2018 14:42:40 -0800
Alexander Duyck <alexander.du...@gmail.com> wrote:
> On Thu, Mar 1, 2018 at 12:22 PM, Alex Williamson
> <alex.william...@redhat.com> wrote:
> > On Wed, 28 Feb 2018 16:36:38 -0800
> > Alexander Duyck <alexander.du...@gmail.com>
On Wed, 28 Feb 2018 16:36:38 -0800
Alexander Duyck <alexander.du...@gmail.com> wrote:
> On Wed, Feb 28, 2018 at 2:59 PM, Alex Williamson
> <alex.william...@redhat.com> wrote:
> > On Wed, 28 Feb 2018 09:49:21 -0800
> > Alexander Duyck <alexander.du...@gmail.com>
On Wed, 28 Feb 2018 09:49:21 -0800
Alexander Duyck <alexander.du...@gmail.com> wrote:
> On Tue, Feb 27, 2018 at 2:25 PM, Alexander Duyck
> <alexander.du...@gmail.com> wrote:
> > On Tue, Feb 27, 2018 at 1:40 PM, Alex Williamson
> > <alex.william...@redhat.com>
On Tue, 27 Feb 2018 11:06:54 -0800
Alexander Duyck wrote:
> From: Alexander Duyck
>
> This patch is meant to add support for SR-IOV on devices when the VFs are
> not managed by the kernel. Examples of recent patches attempting to do this
+-
> drivers/gpu/drm/amd/display/dc/core/dc.c | 2 +-
For amdgpu:
Acked-by: Alex Deucher <alexander.deuc...@amd.com>
> drivers/media/i2c/msp3400-kthreads.c | 2 +-
> drivers/message/fusion/mptsas.c | 2 +-
> drive
rks fine no matter how many times he changes the state of
VirtualConnect modules.
Jarod,
could you please add printing slave->link_new_state for both slaves at each
entry to bond_miimon_inspect?
(and instead of nudging slave->new_link like I suggested, use Jay's patch).
Alex
On 11/03/2017
ect: entered
BOND_LINK_DOWN case on slave ens3f0
Oct 31 09:09:26 SYDC1LNX kernel: bond0: bond_miimon_inspect: entered
BOND_LINK_UP case on slave ens3f1
...
Alex
On 11/02/2017 09:11 PM, Jay Vosburgh wrote:
Alex Sidorenko <alexandre.sidore...@hpe.com> wrote:
On 11/02/2017 12:51 AM, Jay Vosb
On 11/02/2017 12:51 AM, Jay Vosburgh wrote:
Jarod Wilson <ja...@redhat.com> wrote:
On 2017-11-01 8:35 PM, Jay Vosburgh wrote:
Jay Vosburgh <jay.vosbu...@canonical.com> wrote:
Alex Sidorenko <alexandre.sidore...@hpe.com> wrote:
The problem has been found while tryi
commit++;
continue;
}
--
2.7.4
--
--
Alex Sidorenko email: a...@hpe.com
ERT Linux Hewlett-Packard Enterprise (Canada)
--
On 10/23/2017 6:47 PM, Jason Gunthorpe wrote:
On Sat, Oct 14, 2017 at 11:48:23AM -0700, Saeed Mahameed wrote:
From: Alex Vesker <va...@mellanox.com>
This change is needed for PKEY support, since the RQs are shared
between the child interface and the parent. The parent is responsible
fo
ng a bit long.
> Is it possible to run-time determine that the ACS control register is hard
> wired
> to zero, and apply the quirk to all such devices.
> Or even changing to a (device & mask) == value test??
In fact, hard-wired ACS doesn't need a quirk at all, please see the
other thread of the discussion. Thanks,
Alex
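The "(device & mask) == value" test suggested in the quoted question could be sketched as below. This only illustrates that suggestion, not what the reply recommends (the reply says hard-wired ACS needs no quirk at all). The helper name and any IDs passed to it are hypothetical and do not come from the PCI quirk table.

```c
#include <stdbool.h>
#include <stdint.h>

/* Match a whole family of device IDs at once: bits cleared in mask are
 * ignored, the remaining bits must equal value. */
static bool device_matches(uint16_t device, uint16_t mask, uint16_t value)
{
	return (device & mask) == value;
}
```

With a mask of 0xfff0 and a value of 0x10a0, for example, every ID from 0x10a0 through 0x10af matches, so one table entry can stand in for sixteen.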
ill need the quirk for IOMMU grouping to allow assignment
> > of individual SR-IOV functions.
> >
> > Signed-off-by: Roland Dreier <rol...@purestorage.com>
>
> I haven't seen a real conclusion to the discussion yet, so I'm waiting on
> that and hopefully an ack from Alex.
ed the --cc-cmd option to send-email. I'll be sure to CC netdev@ on
[PATCH v2].
Alex
C in the capabilities
> register would be more useful.
Some sort of interface for manipulating the control vector would be
necessary to fully support it and maybe the interface today just
doesn't make much sense for it. Thanks,
Alex
with X550, since
> from reading the code it looks like it should work, but people have observed
> that it doesn't on our system. However the issue might be elsewhere.
>
> However I'll send a patch removing that "| PCI_ACS_EC" from the check if
> you agree - maybe I'm misunderstanding the logic but I don't see how it
> could work if it ever became relevant.
I don't yet see anything wrong with the EC handling, but please explain
further if I'm just being dense. Thanks,
Alex
On Thu, 20 Jul 2017 15:53:04 -0700
Roland Dreier <roland.dre...@gmail.com> wrote:
> On Thu, Jul 20, 2017 at 3:15 PM, Alex Williamson
> <alex.william...@redhat.com> wrote:
>
> > Most of the ACS capabilities are worded as "Must be implemented by
> > devices
your suggested, thanks. :)
>
> @Bjorn do you want me to spawn a new patchset with the new commit title
> and the Reviewed-by from Casey on the patch 3, or maybe you could pick this
> up and modify it on your own? thanks.
Hi Ding,
Bjorn is currently on holiday so it might be a good idea to respin the
series with any updates so nothing is lost. Thanks,
Alex
ACS capabilities are worded as "Must be implemented by
devices that implement ..." Shouldn't a hard-wired ACS capability
sufficiently describe that, or is there something wrong with how
they're hard wired? Thanks,
Alex
>
> Signed-off-by: Roland Dreier <rol
it
abundantly clear that there's nothing in the body of the loop, it's
also more aesthetically pleasing than a semi-colon on the line by
itself, ex. /* Nothing */; It's just too easy to misinterpret the
loop otherwise, especially without gratuitous white space. Thanks,
Alex
> >
t.
I'm glad it was an easy fix. Last time I had issues with RGMII, I had to
pull out the oscilloscope.
Alex
Regards,
Teresa
ixing that commit, you should also provide
a proper "Fixes: " tag on the line right before your signoff.
Thank you very much for the feedback. I will update accordingly.
Alex
Thanks.
On Monday, November 28, 2016 4:14:04 PM EST Alex Sidorenko wrote:
> On Monday, November 28, 2016 3:54:59 PM EST David Miller wrote:
> > From: Alex Sidorenko <alexandre.sidore...@hpe.com>
> > Date: Mon, 28 Nov 2016 15:49:26 -0500
> >
> > > Now the quest
On Monday, November 28, 2016 3:54:59 PM EST David Miller wrote:
> From: Alex Sidorenko <alexandre.sidore...@hpe.com>
> Date: Mon, 28 Nov 2016 15:49:26 -0500
>
> > Now the question is whether it is OK to have icsk->icsk_ack.rcv_mss
> > larger than MTU.
>
> It
. Previous versions used mss_clamp
* here. I don't know if the value based on our guesses
* of peer's MSS is better for the performance. It's more correct
* but may be worse for the performance because of rcv_mss
On 11/16/2016 08:54 AM, Andrew Lunn wrote:
On Wed, Nov 16, 2016 at 08:44:30AM -0800, Alex wrote:
On 11/16/2016 05:50 AM, Andrew Lunn wrote:
On Wed, Nov 16, 2016 at 01:02:33AM -0800, Alexandru Gagniuc wrote:
With RGMII, we need a 1.5 to 2ns skew between clock and data lines. The
VSC8601
i-txid", "rgmii-rxid" or "rgmii" interfaces. */
Hi Alexandru
You should be able to make "rgmii" work as expected. If that is the
phy mode, disable the skew.
And that's exactly the implemented behavior. See vsc8601_config_init()
below.
Alex
t applies a skew in both TX and RX directions.
Alex
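The behavior discussed in this thread (no skew for plain "rgmii", skew on both TX and RX otherwise) might be sketched as follows. This is not the driver's vsc8601_config_init(); the enum and function are hypothetical stand-ins, and treating "rgmii-id" as the mode that enables the skew is an assumption based on the fragments above.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for two of the PHY interface modes named above. */
enum phy_mode_sketch {
	MODE_RGMII,    /* "rgmii": the PHY leaves the skew disabled */
	MODE_RGMII_ID, /* "rgmii-id": the PHY skews both TX and RX */
};

/* Decide whether the PHY should enable its internal clock skew. */
static bool phy_should_add_skew(enum phy_mode_sketch mode)
{
	return mode == MODE_RGMII_ID;
}
```

How "rgmii-txid" and "rgmii-rxid" should be handled is left out on purpose, since the quoted fragments do not settle it.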
for port lookup
and as a result, packets are dropped. The fix consists of changing
'int num' to 'unsigned int num'. Testing of a fixed kernel shows that
there is no packet drop anymore.
Signed-off-by: Alex Sidorenko <alexandre.sidore...@hpe.com>
---
v2: fixed formatting
include/linux/if_
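Why the signedness of 'num' matters can be shown in a few lines of plain C. The sketch below assumes the lookup index is computed as a remainder of 'num' and that 'num' can become negative; the snippet above is truncated, so both points are assumptions, and the function names are illustrative rather than the team driver's own.

```c
/* Signed counter: C's % keeps the sign of the dividend, so a negative
 * num produces a negative index that no port can match. */
static int port_index_signed(int num, int en_port_count)
{
	return num % en_port_count;
}

/* Unsigned counter: the result always lies in [0, en_port_count). */
static int port_index_unsigned(unsigned int num, int en_port_count)
{
	return (int)(num % (unsigned int)en_port_count);
}
```

With two enabled ports, `port_index_signed(-3, 2)` returns -1, while `port_index_unsigned((unsigned int)-3, 2)` returns 1, a valid index.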
for port lookup and as a result, packets are dropped. The fix consists of
changing 'int num' to 'unsigned int num'. Testing of a fixed kernel shows
that there is no packet drop anymore.
Signed-off-by: Alex Sidorenko <alexandre.sidore...@hpe.com>
---
include/linux/if_team.h | 2 +-
ass 'unsigned char hash' to team_num_to_port_index(), so there should
be no overflow. I did not test that mode in my tests.
Regards,
Alex
--
--
Alex Sidorenko email: a...@hpe.com
ERT Linux Hewlett-Packard Enterprise (Canada)
--
ello,
> >
> > Original bugzilla thread could be found here:
> > https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=966840
>
> That bugzilla is private and I can't read it.
Hmm, I can, but I don't see anything in it that supports this. Is that
really the right bz? It's the right hardware, but has all sorts of FUD
about the version of various other components in the stack.
> > This is our HW bug; it exists only in 82579 devices. Newer devices
> > have no such problem. We have found the root cause and suggested this
> > solution.
>
> Is there an erratum you can reference?
>
> > This solution should work for 95% of cases, so I do not
> > think that this is fragile. For the other cases, a possible solution is
> > to bring up a working system and manually disable FLR before the VM
> > starts using our adapter.
>
> I don't think a 95% solution is sufficient. Can you use the
> pci_dev_specific_reset() framework to make a 100% solution?
Right, plus when this does work I suspect it removes the one mechanism
we have to reset the device, which depending on how obscure the failure
scenario is, isn't a clear cut improvement for device assignment.
Thanks,
Alex
s);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503,
> quirk_intel_flr_cap_dis);
This seems like a pretty fragile quirk since we're just hoping that the
BIOS hasn't already written this byte. Should we at least re-read and
warn if the write didn't take? What about using dev_flags or a device
specific reset to make this less fragile? A device specific reset
could pick the best reset mechanism for the device, ignoring AF FLR.
Thanks,
Alex
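The "re-read and warn if the write didn't take" idea raised above could look roughly like this. It is a userspace sketch only: the register is simulated, and `fake_cfg_byte`, `fake_write`, `fake_read`, and `write_and_verify` are hypothetical names, not PCI core APIs.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Simulated one-byte config register. 'locked' models the byte having
 * already been written and made read-only (an assumption for the demo). */
struct fake_cfg_byte {
	uint8_t value;
	bool locked;
};

static void fake_write(struct fake_cfg_byte *reg, uint8_t v)
{
	if (!reg->locked)
		reg->value = v;
}

static uint8_t fake_read(const struct fake_cfg_byte *reg)
{
	return reg->value;
}

/* Write the byte, read it back, and warn if the write had no effect. */
static bool write_and_verify(struct fake_cfg_byte *reg, uint8_t v)
{
	fake_write(reg, v);
	if (fake_read(reg) != v) {
		fprintf(stderr, "quirk write did not take: wanted 0x%02x, got 0x%02x\n",
			(unsigned int)v, (unsigned int)fake_read(reg));
		return false;
	}
	return true;
}
```

A caller that gets `false` back could then fall back to another mechanism, such as the device specific reset mentioned in the message.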
pace or a VM driver can't
very well determine these sorts of interactions when it only has
visibility to a subset of the functions and users and hardware folks
would throw a fit if I extended iommu groups to encompass all the
related devices rather than take the relatively simple step of
virtualizing these accesses and occasionally quirking devices that are
extra broken, as seems to be required here. Thanks,
Alex
Greetings,
We had this warning[1] on long-term mainline kernel 3.18.19. Can anybody
please advise on what might be causing it?
Thanks,
Alex.
[1]
Jul 21 22:57:27 vsa-01cc-vc-1 kernel: [96804.538709] BUG: Bad page state
in process kworker/0:1H pfn:4b317
Jul 21 22:57:27 vsa-01cc-vc-1
drivers.
>
> Yes, agreed. We can remove the CONFIG_VXLAN and CONFIG_GENEVE stuff now.
> But I think this can be a separate series.
Actually it should be pretty easy to do all of this in the same
series. I am directly affecting the code where it is wrapped up
anyway. Dropping the defines will make this easier for me to test.
- Alex
Tom Herbert <...@herbertland.com> writes:
>
> Transports over UDP is intended to encapsulate TCP and other transport
> protocols directly and securely in UDP.
>
> The goal of this work is twofold:
>
> 1) Allow applications to run their own transport layer stack (i.e.from
>userspace). This
uced in Tom's patchset with hardware
>> >> offloads.
>> >>
>> >> ---
>> >>
>> >> Alexander Duyck (2):
>> >> ip6_gre: Do not allow segmentation offloads GRE_CSUM is enabled
>> >> with FOU/GUE
>> >>
@@ static int inet_gro_complete(struct sk_buff *skb,
>> int nhoff)
>>
>> if (skb->encapsulation)
>> skb_set_inner_network_header(skb, nhoff);
>> + else
>> + skb_set_network_header(skb, nhoff);
>
> This seems like an unrelated change.
It was needed for the bits in ipip_gro_complete and sit_gro_complete.
I agree that based on your other comments this is not needed.
>>
>> csum_replace2(&iph->check, iph->tot_len, newlen);
>> iph->tot_len = newlen;
>>
>> rcu_read_lock();
>> - ops = rcu_dereference(inet_offloads[proto]);
>> - if (WARN_ON(!ops || !ops->callbacks.gro_complete))
>> - goto out_unlock;
>>
>> /* Only need to add sizeof(*iph) to get to the next hdr below
>> * because any hdr with option will have been flushed in
>> * inet_gro_receive().
>> */
>> - err = ops->callbacks.gro_complete(skb, nhoff + sizeof(*iph));
>> + nhoff += sizeof(*iph);
>> +
>> + ops = rcu_dereference(inet_offloads[proto]);
>> + if (WARN_ON(!ops || !ops->callbacks.gro_complete))
>> + goto out_unlock;
>> +
>> + err = ops->callbacks.gro_complete(skb, nhoff);
>
> Why this change?
This is a bit of a mess left from an experiment I did where I was also
resetting the outer transport header. It turns out we are leaving the
outer transport header pointing at the inner header. I was debating
adding a line after the call to gro_complete that resets the outer
network header using skb_set_transport_header.
>>
>> out_unlock:
>> rcu_read_unlock();
>> @@ -1483,7 +1488,8 @@ out_unlock:
>> static int ipip_gro_complete(struct sk_buff *skb, int nhoff)
>> {
>> skb->encapsulation = 1;
>> - skb_shinfo(skb)->gso_type |= SKB_GSO_IPIP;
>> + skb_shinfo(skb)->gso_type = (ip_hdr(skb)->version == 4) ?
>> + SKB_GSO_IPIPV4 : SKB_GSO_IPIPV6;
>
> I don't think this is necessary. IPIP means IPv4 over IPv4, IPv4 over
> IPv6 can have its own gro_complete
Yeah, I am not all that familiar with IPIP and SIT so I wasn't sure if
they were restricted like that or not. I kind of spaced on the fact
that it is an inet_offload and not something that is used for IPv4 and
IPv6.
The resetting of the network header and transport header is something
we might want to address perhaps for the GRO and bridging case but I
think other people may be working on that as well so for now I might
just go back to watching how this all develops.
Thanks.
- Alex
nts and then we can support tunnels with outer checksum
because the checksum has been computed once and can be applied to all
of the segmented frames.
Hope that helps.
- Alex
tX-3/4 adapters in terms of VXLAN tunnels.
>
> I'm going to mark this as "deferred" in patchwork, so Alex why don't you just
> respin these and repost next week when you get final feedback from the
> Mellanox
> folks?
>
> THanks.
Okay. Will do.
Thanks.
- Alex
wever it cannot support outer IPv6 headers. For
>> this reason I am adding the feature to the hw_enc_features and adding an
>> extra check to the features_check call that will disable GSO and checksum
>> offload in the case that the encapsulated frame has an outer IP version of
>> th
Thanks, looks like exactly the same issue, I'll check if it works on 4.5.
/--Regards, Alex/
On 03/04/16 15:26, poma wrote:
On 03.04.2016 12:15, Alex wrote:
Hello,
[1.] System hang when connecting USB modem (LU150)
[2.] I'm running 4.4.5 kernel (Arch Linux). When this modem is connected
I'm
der approach as it is much easier to be compliant with all the
RFCs.
We might be able to get some of that supported for net-next but things
are going to be limited. We need to have the UDP tunnels actually
setting the DF bit which as far as I know none of them do now. In
addition we will have to add rules for all the encapsulated types so
that we can enforce the outer IP header incrementing in the event that
DF is not set. Then we will also have to go through and make certain
that we have the DF bit set in all headers between the transport and
the outer network header in order to allow support for GSO partial.
What you are describing is no small task. There are bugs that need to
be fixed now in net. We can try to get the features you want pushed
for net-next but they don't exist now so locking down GRO so that it
matches the feature set provided by GSO is not a regression.
- Alex
based off of a conversation several of us had on the
list about doing TSO for tunnels and the fact that the IP IDs for the
outer header have to advance. It makes it easier for me to validate
that I am doing things properly if GRO doesn't destroy the IP ID data
for the outer headers.
- Alex
On Fri, Feb 19, 2016 at 1:53 PM, Jesse Gross wrote:
> On Fri, Feb 19, 2016 at 11:26 AM, Alexander Duyck wrote:
>> This patch series makes it so that we enable the outer Tx checksum for IPv4
>> tunnels by default. This makes the behavior consistent with how
irtio_transport_dgram_dequeue,
> + .dgram_bind = virtio_transport_dgram_bind,
> + .dgram_allow = virtio_transport_dgram_allow,
> +
> + .stream_enqueue = virtio_transport_stream_enqueue,
> + .stream_dequeue = virtio_transport_
irtual Sockets"
> + depends on VSOCKETS && VIRTIO
> + select VIRTIO_VSOCKETS_COMMON
> + help
> + This module implements a virtio transport for Virtual Sockets.
> +
> + Enable this transport if your Virtual Machine runs on
> Qemu/KVM.
Is
_transport_recv_pkt_work);
> + INIT_WORK(&vsock->tx_work, virtio_transport_send_pkt_work);
> +
> + mutex_lock(&vsock->rx_lock);
> + virtio_vsock_rx_fill(vsock);
> + mutex_unlock(&vsock->rx_lock);
> +
> + mutex_unlock(&the_virtio_vsock_mutex);
> + return 0;
> +
> +out_vqs:
);
> + child = __vsock_create(sock_net(sk), NULL, sk, GFP_KERNEL,
> +sk->sk_type, 0);
> + if (!child) {
> + virtio_transport_send_reset(vsk, pkt);
> + return -ENOMEM;
> + }
> +
> + sk->sk_ack_backlog++;
> +
> + lock_sock(child);
> +
> + child->sk_state = SS_CONNECTED;
> +
> + vchild = vsock_sk(child);
> + vsock_addr_init(&vchild->local_addr, le32_to_cpu(pkt->hdr.dst_cid),
> + le32_to_cpu(pkt->hdr.dst_port));
> + vsock_addr_init(&vchild->remote_addr, le32_to_cpu(pkt->hdr.src_cid),
> + le32_to_cpu(pkt->hdr.src_port));
> +
> + vsock_insert_connected(vchild);
> + vsock_enqueue_accept(sk, child);
> + virtio_transport_send_response(vchild, pkt);
> +
> + release_sock(child);
> +
> + sk->sk_data_ready(sk);
> + return 0;
> +}
> +
> +static void virtio_transport_space_update(struct sock *sk,
> + struct virtio_vsock_pkt *pkt)
> +{
> + struct vsock_sock *vsk = vsock_sk(sk);
> + struct virtio_transport *trans = vsk->trans;
> + bool space_available;
> +
> + /* buf_alloc and fwd_cnt is always included in the hdr */
> + mutex_lock(&trans->tx_lock);
> + trans->peer_buf_alloc = le32_to_cpu(pkt->hdr.buf_alloc);
> + trans->peer_fwd_cnt = le32_to_cpu(pkt->hdr.fwd_cnt);
> + space_available = virtio_transport_has_space(vsk);
> + mutex_unlock(&trans->tx_lock);
> +
> + if (space_available)
> + sk->sk_write_space(sk);
> +}
> +
> +/* We are under the virtio-vsock's vsock->rx_lock or
> + * vhost-vsock's vq->mutex lock */
> +void virtio_transport_recv_pkt(struct virtio_vsock_pkt *pkt)
> +{
> + struct virtio_transport *trans;
> + struct sockaddr_vm src, dst;
> + struct vsock_sock *vsk;
> + struct sock *sk;
> +
> + vsock_addr_init(&src, le32_to_cpu(pkt->hdr.src_cid),
> le32_to_cpu(pkt->hdr.src_port));
> + vsock_addr_init(&dst, le32_to_cpu(pkt->hdr.dst_cid),
> le32_to_cpu(pkt->hdr.dst_port));
> +
> + virtio_vsock_dumppkt(__func__, pkt);
> +
> + if (le16_to_cpu(pkt->hdr.type) != VIRTIO_VSOCK_TYPE_STREAM) {
> + /* TODO send RST */
TODO's shouldn't make it into final submissions.
> + goto free_pkt;
> + }
> +
> + /* The socket must be in connected or bound table
> + * otherwise send reset back
> + */
> + sk = vsock_find_connected_socket(&src, &dst);
> + if (!sk) {
> + sk = vsock_find_bound_socket(&dst);
> + if (!sk) {
> + pr_debug("%s: can not find bound_socket\n", __func__);
> + virtio_vsock_dumppkt(__func__, pkt);
> + /* Ignore this pkt instead of sending reset back */
> + /* TODO send a RST unless this packet is a RST
> (to avoid infinite loops) */
Ditto.
> + goto free_pkt;
> + }
> + }
> +
> + vsk = vsock_sk(sk);
> + trans = vsk->trans;
> + BUG_ON(!trans);
See above re: BUG_ON
> +
> + virtio_transport_space_update(sk, pkt);
> +
> + lock_sock(sk);
> + switch (sk->sk_state) {
> + case VSOCK_SS_LISTEN:
> + virtio_transport_recv_listen(sk, pkt);
> + virtio_transport_free_pkt(pkt);
> + break;
> + case SS_CONNECTING:
> + virtio_transport_recv_connecting(sk, pkt);
> + virtio_transport_free_pkt(pkt);
> + break;
> + case SS_CONNECTED:
> + virtio_transport_recv_connected(sk, pkt);
> + break;
> + default:
> + virtio_transport_free_pkt(pkt);
> + break;
> + }
> + release_sock(sk);
> +
> + /* Release refcnt obtained when we fetched this socket out of the
> + * bound or connected list.
> + */
> + sock_put(sk);
> + return;
> +
> +free_pkt:
> + virtio_transport_free_pkt(pkt);
> +}
> +EXPORT_SYMBOL_GPL(virtio_transport_recv_pkt);
> +
> +void virtio_transport_free_pkt(struct virtio_vsock_pkt *pkt)
> +{
> + kfree(pkt->buf);
> + kfree(pkt);
> +}
> +EXPORT_SYMBOL_GPL(virtio_transport_free_pkt);
> +
> +MODULE_LICENSE("GPL v2");
> +MODULE_AUTHOR("Asias He");
> +MODULE_DESCRIPTION("common code for virtio vsock");
> --
> 2.5.0
--
Alex Bennée
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
to get to 10Gbps, requiring real-mode tricks. virtio-net may add
some latency, but it's not that hard to get it to 10Gbps and it already
supports migration. An emulated IOMMU in the guest is really only good
for relatively static mappings, the latency for anything else is likely
too high. Maybe there are shadow page table tricks that could help, but
it's imposing overhead the whole time the guest is running, not only on
migration. Thanks,
Alex
On Thu, 2015-10-22 at 15:32 +0300, Michael S. Tsirkin wrote:
> On Wed, Oct 21, 2015 at 01:20:27PM -0600, Alex Williamson wrote:
> > The trouble here is that the VF needs to be unplugged prior to the start
> > of migration because we can't do effective dirty page tracking while