On 9/26/2023 11:40 AM, Parav Pandit wrote:
From: Jason Wang <jasow...@redhat.com>
Sent: Tuesday, September 26, 2023 8:16 AM
On Mon, Sep 25, 2023 at 6:41 PM Parav Pandit <pa...@nvidia.com> wrote:
From: Jason Wang <jasow...@redhat.com>
Sent: Friday, September 22, 2023 8:38 AM
Device context has no overlap.
I can give you one example, e.g debugging.
Almost every feature needs debugging. :) So I am omitting it for time
being.
Well, I don't think so. We have a lot of handy tools for that (ethtool -d?).
Sure add something specific for debug and explicitly mention that it is for
debug like -d.
Every feature and functionality needs debug, not specifically device context.
So add infra for debug. Device migration series is not the vehicle to piggy
back on.
1) define virtqueue state, inflight descriptors in the section of
basic facility but not under the admin commands
It will be part of the device context such a way that so that one can only read
the vq state only instead of full device context.
This will work.
I'm not sure what it looks like, but I think they are well decoupled in this
series.
E.g driver can choose to just read e.g last_avail_idx and report to ethtool or
watchdog.
Once its done it will be visible how it looks like.
The key is it needs to cover BOTH use cases.
As I replied in other thread, I see several problems:
1) layer violation, PCI specific state were mixed into the basic facilities
After agreeing to see merged donctext, now you are hinting that you don’t agree
to merge two.
I disagree if you are leaning towards that direction.
I hope my deduction from your above comment is incorrect.
There is no violation. PCI specific device context will be captured in PCI
specific section.
Device type contexts will be captured in those device type sections.
TLVs will cover many of the device context information.
2) I don't see a good definition on "device context"
3) TLV works fine for queue but not registers
Please definition of device context in [1].
[1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00064.html
What needs to be done first is to describe what device context means and what
it contains. Not the actual data structure since it may vary.
Sure, it is already defined in device migration theory of operation section in
[2].
I will try to take it out and put in device management section, so that device
migration can refer to it and
Some other basic facility also can refer to it (which must need to explain a
use case beyond silly debug point).
[2] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00062.html
3) define transport specific interfaces or admin commands to access
them
As you also envisioned, it is done using admin commands to access it.
That's fine, but it should allow other ways.
For passthrough admin commands fits the design.
Sure, you need to draft it how to do other ways..
> >
Does this work? It seems you refused such steps in the past.
I didn’t.
There must be some confusion in many emails we both exchanged, because
already posted v0 has it split such as device context.
For dirty page tracking I couldn’t find a solid use case without device
migration, so I asked which you already replied above.
First, you never explain why such coupling gives us any benefit.
It is not coupled. Device migration uses this facility. So it is matter of text
organization in the spec.
Not the design.
Second, I've given you sufficient examples but you tend to ignore them. Why
not go through Qemu codes then you will see the answer.
You gave example of debug, and profiling. You didn’t explain the use case of
how to actually connect and how to profile etc.
Actually, I would like to leave 2) as it's very complicated which
might not converge easily.
I will split current "device migration" section to two.
1. device management
2. device migration
Device management covers device mode, device context and dirty page
tracking.
I don't see any connection between "management" and "device context".
You have better name than device management?
May be device operation.
May be it is just better to have the device context just the way it is in [1]
under basic facility.
Device migration refers to device management section.
We can omit the dirty page tracking commands for a moment and first close
on the device mode and device context as first step.
Since it is already part of v0 and it is needed, I will keep it in subsequent
v1 but
moved to device management section.
Could you please answer what's wrong with the first 4 patches in this series?
1. cover letter is missing the problem statement and use case
I only reply to this section of comments, this does not mean I agree
with you on your other statements, Instead I agree with
Jason on his replies to you.
In my cover letter:
"The main usecase of these new facilities is Live Migration."
Did you miss it?
2. why queue suspend and both device suspend are introduced, only one should be
there. The design description is missing.
there are no queue suspend, they are device suspend and vq state
accessors. please read the patch if you want to comment.
3. Even though it claims under some random basic facility, cover letter clearly states
the main use case is "live migration".
it is not random, they are precisely defined virtio basic facilities.
4. Patch 4 is not needed at all. When device is suspended, it is _suspended_.
It does not do any bifurcation.
The device should not accept vq reset and the driver should reset vqs,
please read previous discussions with MST
and please don't ignore the conclusions
5. only suspend bit of patch2 is not enough to cover P2P. One needs suspend and
freeze both covered in series [1].
we have discussed this for many times, P2P is out of virtio spec, do you
want to mediate every PCI state/functionality?
6. Finally the whole description of 1 to 4 need to be split in the device
operation, so that both passthrough and medication can utilize it using admin
cmd and otherwise.
Do you see any reasons this solution can not be used for passthrough and
mediation?
Or does features_OK work for passthrough or mediation? Any difference?
Since Zhu, told that dirty tracking and inflight descriptors will be done, I
presume he will propose to do over admin q or command interface.
And since all can run over the admin commands, the plumbing done in 1 to 4 can
be made using admin commands.
No
Until now we could not establish creating yet another DMA interface that is
better than q interface.
So ...
To me both the methods will start looking more converged to me over admin
command and queues.
I don't think so, again, we are introducing basic facilities and these
facilities don't
depend on or rely on admin vq.
Passthrough will use them over owner device.
Mediation somehow need to do over member device.
Mediation will not use any device suspend command because it needs to keep
bisecting everything.
please read QEMU vhost live migration solution
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org