Can vhost translate to io_uring?

2023-06-14 Thread Eric W. Biederman
I am sad my idea for simplifying things did not work out. Let's try an even bigger idea to reduce maintenance and simplify things. Could vhost depend on io_uring? Could vhost just be a translation layer of existing vhost requests to io_uring requests? At a quick glance it looks like

Re: [CFT][PATCH v3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-06-11 Thread Eric W. Biederman
Oleg Nesterov writes: > On 06/06, Mike Christie wrote: >> >> On 6/6/23 7:16 AM, Oleg Nesterov wrote: >> > On 06/05, Mike Christie wrote: >> > >> >> So it works like if we were using a kthread still: >> >> >> >> 1. Userspace thread0 opens /dev/vhost-$something. >> >> 2. thread0 does

[CFT][PATCH v3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-06-02 Thread Eric W. Biederman
and vhost_worker_free will warn about the situation. Fixes: 6e890c5d5021 ("vhost: use vhost_tasks for worker threads") Co-developed-by: Mike Christie Signed-off-by: Eric W. Biederman --- This fixes the ordering issue in vhost_task_fn so that get_signal should not work. This patch i

Re: [PATCH 1/1] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-06-02 Thread Eric W. Biederman
Oleg Nesterov writes: > Hi Mike, > > sorry, but somehow I can't understand this patch... > > I'll try to read it with a fresh head on Weekend, but for example, > > On 06/01, Mike Christie wrote: >> >> static int vhost_task_fn(void *data) >> { >> struct vhost_task *vtsk = data; >> -int

Re: [PATCH 1/1] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-06-01 Thread Eric W. Biederman
Mike Christie writes: > When switching from kthreads to vhost_tasks two bugs were added: > 1. The vhost worker tasks's now show up as processes so scripts doing > ps or ps a would not incorrectly detect the vhost task as another > process. 2. kthreads disabled freeze by setting PF_NOFREEZE, but

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-05-30 Thread Eric W. Biederman
"Eric W. Biederman" writes: > Linus Torvalds writes: > >> So I'd really like to finish this. Even if we end up with a hack or >> two in signal handling that we can hopefully fix up later by having >> vhost fix up some of its current assumptions. > >

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-05-29 Thread Eric W. Biederman
michael.chris...@oracle.com writes: > On 5/29/23 2:35 PM, Mike Christie wrote: >>> Hmm... If we use CLONE_THREAD the exiting vhost_worker() will auto-reap >>> itself, >> Oh wait, are you saying that when we get auto-reaped then we would do the >> last >> fput and call the

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-05-29 Thread Eric W. Biederman
michael.chris...@oracle.com writes: > On 5/29/23 6:19 AM, Oleg Nesterov wrote: >> On 05/27, Eric W. Biederman wrote: >>> >>> Looking forward I don't see not asking the worker threads to stop >>> for the coredump right now causing any problems in t

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-05-27 Thread Eric W. Biederman
Mike Christie writes: > On 5/23/23 7:15 AM, Oleg Nesterov wrote: >> >> Now the main question. Whatever we do, SIGKILL/SIGSTOP/etc can come right >> before we call work->fn(). Is it "safe" to run this callback with >> signal_pending() or fatal_signal_pending() ? > > The questions before this one

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-05-27 Thread Eric W. Biederman
Linus Torvalds writes: > On Sat, May 27, 2023 at 2:49 AM Eric W. Biederman > wrote: >> >> The real sticky widget for me is how to handle one of these processes >> coredumping. It really looks like it will result in a reliable hang. > > Well, if *that* is the main

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-05-27 Thread Eric W. Biederman
Linus Torvalds writes: > So I'd really like to finish this. Even if we end up with a hack or > two in signal handling that we can hopefully fix up later by having > vhost fix up some of its current assumptions. The real sticky widget for me is how to handle one of these processes coredumping.

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-05-25 Thread Eric W. Biederman
Oleg Nesterov writes: > On 05/24, Eric W. Biederman wrote: >> >> Oleg Nesterov writes: >> >> > Yes, but probably SIGABRT/exit doesn't really differ from SIGKILL wrt >> > vhost_worker(). >> >> Actually I think it reveals that exiting with SI

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-05-24 Thread Eric W. Biederman
Oleg Nesterov writes: > On 05/23, Eric W. Biederman wrote: >> >> I want to point out that we need to consider not just SIGKILL, but >> SIGABRT that causes a coredump, as well as the process performing >> an ordinary exit(2). All of which will cause get_signal to return

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-05-23 Thread Eric W. Biederman
Oleg Nesterov writes: > On 05/22, Oleg Nesterov wrote: >> >> Right now I think that "int dead" should die, > > No, probably we shouldn't call get_signal() if we have already > dequeued SIGKILL. Very much agreed. It is one thing to add a patch to move do_exit out of get_signal. It is another

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-05-23 Thread Eric W. Biederman
"Michael S. Tsirkin" writes: > On Sun, May 21, 2023 at 09:51:24PM -0500, Mike Christie wrote: >> When switching from kthreads to vhost_tasks two bugs were added: >> 1. The vhost worker tasks's now show up as processes so scripts doing ps >> or ps a would not incorrectly detect the vhost task as

Re: [PATCH 1/3] signal: Don't always put SIGKILL in shared_pending

2023-05-23 Thread Eric W. Biederman
Mike Christie writes: > When get_pending detects the task has been marked to be killed we try to ^^^ get_signal > clean up the SIGKILL by doing a sigdelset and recalc_sigpending, but we > still leave it in shared_pending. If the signal is being short circuit > delivered there is no

Re: [RFC PATCH 1/8] signal: Dequeue SIGKILL even if SIGNAL_GROUP_EXIT/group_exec_task is set

2023-05-18 Thread Eric W. Biederman
Mike Christie writes: > On 5/18/23 1:28 PM, Eric W. Biederman wrote: >> Still the big issue seems to be the way get_signal is connected into >> these threads so that it keeps getting called. Calling get_signal after >> a fatal signal has been returned happens nowhere el

Re: [RFC PATCH 5/8] vhost: Add callback that stops new work and waits on running ones

2023-05-18 Thread Eric W. Biederman
Mike Christie writes: > On 5/18/23 9:18 AM, Christian Brauner wrote: >>> @@ -352,12 +353,13 @@ static int vhost_worker(void *data) >>> if (!node) { >>> schedule(); >>> /* >>> -* When we get a SIGKILL our release function

Re: [RFC PATCH 1/8] signal: Dequeue SIGKILL even if SIGNAL_GROUP_EXIT/group_exec_task is set

2023-05-18 Thread Eric W. Biederman
Oleg Nesterov writes: > On 05/18, Mike Christie wrote: >> >> On 5/18/23 11:25 AM, Oleg Nesterov wrote: >> > I too do not understand the 1st change in this patch ... >> > >> > On 05/18, Mike Christie wrote: >> >> >> >> In the other patches we do: >> >> >> >> if (get_signal(ksig)) >> >>

Re: [RFC PATCH 1/8] signal: Dequeue SIGKILL even if SIGNAL_GROUP_EXIT/group_exec_task is set

2023-05-17 Thread Eric W. Biederman
Long story short. In the patch below the first hunk is a noop. The code you are bypassing was added to ensure that process termination (aka SIGKILL) is processed before any other signals. Other than signal processing order there are not any substantive differences in the two code paths. With

Re: [RFC PATCH 1/8] signal: Dequeue SIGKILL even if SIGNAL_GROUP_EXIT/group_exec_task is set

2023-05-17 Thread Eric W. Biederman
Mike Christie writes: > This has us dequeue SIGKILL even if SIGNAL_GROUP_EXIT/group_exec_task is > set when we are dealing with PF_USER_WORKER tasks. > When a vhost_task gets a SIGKILL, we could have outstanding IO in flight. > We can easily stop new work/IO from being queued to the vhost_task,

Re: [PATCH v11 8/8] vhost: use vhost_tasks for worker threads

2023-05-16 Thread Eric W. Biederman
Oleg Nesterov writes: > On 05/16, Eric W. Biederman wrote: >> >> A kernel thread can block SIGKILL and that is supported. >> >> For a thread that is part of a process you can't block SIGKILL when the >> task is part of a user mode process. > > Or SIGSTOP.

Re: [PATCH v11 8/8] vhost: use vhost_tasks for worker threads

2023-05-16 Thread Eric W. Biederman
Linus Torvalds writes: > On Mon, May 15, 2023 at 3:23 PM Mike Christie > wrote: >> >> The vhost layer really doesn't want any signals and wants to work like >> kthreads >> for that case. To make it really simple can we do something like this where >> it >> separates user and io worker

Re: [GIT PULL] virtio: last minute fixup

2022-05-13 Thread Eric W. Biederman
Linus Torvalds writes: > On Thu, May 12, 2022 at 10:10 AM Linus Torvalds > wrote: >> >> And most definitely not just random data that can be trivially >> auto-generated after-the-fact. > > Put another way: when people asked for change ID's and I said "we have > links", I by no means meant that

Re: [PATCH V6 01/10] Use copy_process in vhost layer

2022-01-18 Thread Eric W. Biederman
Mike Christie writes: > On 1/17/22 11:31 AM, Eric W. Biederman wrote: >> Mike Christie writes: >> >>> On 12/22/21 12:24 PM, Eric W. Biederman wrote: >>>> All I am certain of is that you need to set >>>> "args->exit_signal = -1;".

Re: [PATCH V6 01/10] Use copy_process in vhost layer

2022-01-17 Thread Eric W. Biederman
Mike Christie writes: > On 12/22/21 12:24 PM, Eric W. Biederman wrote: >> All I am certain of is that you need to set >> "args->exit_signal = -1;". This prevents having to play games with >> do_notify_parent. > > Hi Eric, > > I have all your review

[PATCH] kthread: Generalize pf_io_worker so it can point to struct kthread

2021-12-22 Thread Eric W. Biederman
Suggested-by: Linus Torvalds Signed-off-by: "Eric W. Biederman" --- I looked again and the vhost_worker changes do not generalize pf_io_worker, and as pf_io_worker is already a void * it is easy to generalize. So I just did that. Unless someone spots a problem I will add this to my signal

Re: [PATCH 09/10] kthread: Ensure struct kthread is present for all kthreads

2021-12-22 Thread Eric W. Biederman
Added a couple of people from the vhost thread. Linus Torvalds writes: > On Wed, Dec 22, 2021 at 3:25 PM Eric W. Biederman > wrote: >> >> Solve this by skipping the put_user for all kthreads. > > Ugh. > > While this fixes the problem, could we please just not

Re: [PATCH V6 01/10] Use copy_process in vhost layer

2021-12-22 Thread Eric W. Biederman
Mike Christie writes: > On 12/21/21 6:20 PM, Eric W. Biederman wrote: >> michael.chris...@oracle.com writes: >> >>> On 12/17/21 1:26 PM, Eric W. Biederman wrote: >>>> Mike Christie writes: >>>> >>>>> The following patches made over

Re: [PATCH V6 01/10] Use copy_process in vhost layer

2021-12-21 Thread Eric W. Biederman
michael.chris...@oracle.com writes: > On 12/17/21 1:26 PM, Eric W. Biederman wrote: >> Mike Christie writes: >> >>> The following patches made over Linus's tree, allow the vhost layer to do >>> a copy_process on the thread that does the VHOST_SET_OWNER

Re: [PATCH V6 01/10] Use copy_process in vhost layer

2021-12-17 Thread Eric W. Biederman
Mike Christie writes: > The following patches made over Linus's tree, allow the vhost layer to do > a copy_process on the thread that does the VHOST_SET_OWNER ioctl like how > io_uring does a copy_process against its userspace app. This allows the > vhost layer's worker threads to inherit

Re: [PATCH V6 10/10] vhost: use user_worker to check RLIMITs

2021-12-17 Thread Eric W. Biederman
Mike Christie writes: > For vhost workers we use the kthread API which inherits its values from > and checks against the kthreadd thread. This results in the wrong RLIMITs > being checked. This patch has us use the user_worker helpers which will > inherit its values/checks from the thread that

Re: [PATCH V6 06/10] fork: add helpers to clone a process for kernel use

2021-12-17 Thread Eric W. Biederman
Mike Christie writes: > The vhost layer is creating kthreads to execute IO and management > operations. These threads need to share a mm with a userspace thread, > inherit cgroups, and we would like to have the thread accounted for > under the userspace thread's rlimit nproc value so a user

Re: [PATCH V6 05/10] signal: Perfom autoreap for PF_USER_WORKER

2021-12-17 Thread Eric W. Biederman
Mike Christie writes: > Userspace doesn't know about PF_USER_WORKER threads, so it can't do wait > to clean them up. For cases like where qemu will do dynamic/hot add/remove > of vhost devices, then we need to auto reap the thread like was done for > the kthread case, because qemu does not know

Re: [PATCH v2 01/12] kexec: Allow architecture code to opt-out at runtime

2021-11-02 Thread Eric W. Biederman
Joerg Roedel writes: > Hi again, > > On Mon, Nov 01, 2021 at 04:11:42PM -0500, Eric W. Biederman wrote: >> I seem to remember the consensus when this was reviewed that it was >> unnecessary and there is already support for doing something like >> this at a more fin

Re: [PATCH v2 01/12] kexec: Allow architecture code to opt-out at runtime

2021-11-01 Thread Eric W. Biederman
Borislav Petkov writes: > On Mon, Sep 13, 2021 at 05:55:52PM +0200, Joerg Roedel wrote: >> From: Joerg Roedel >> >> Allow a runtime opt-out of kexec support for architecture code in case >> the kernel is running in an environment where kexec is not properly >> supported yet. >> >> This will

Re: [PATCH 2/2] x86/kexec/64: Forbid kexec when running as an SEV-ES guest

2021-05-06 Thread Eric W. Biederman
Joerg Roedel writes: > On Thu, May 06, 2021 at 12:42:03PM -0500, Eric W. Biederman wrote: >> I don't understand this. >> >> Fundamentally kexec is about doing things more or less in spite of >> what the firmware is doing. >> >> I don't have any idea wha

Re: [PATCH 2/2] x86/kexec/64: Forbid kexec when running as an SEV-ES guest

2021-05-06 Thread Eric W. Biederman
Joerg Roedel writes: > From: Joerg Roedel > > For now, kexec is not supported when running as an SEV-ES guest. Doing > so requires additional hypervisor support and special code to hand > over the CPUs to the new kernel in a safe way. > > Until this is implemented, do not support kexec in

Re: [PATCH v2 2/3] mm/memory_hotplug: Introduce MHP_NO_FIRMWARE_MEMMAP

2020-04-30 Thread Eric W. Biederman
David Hildenbrand writes: > On 30.04.20 18:33, Eric W. Biederman wrote: >> David Hildenbrand writes: >> >>> On 30.04.20 17:38, Eric W. Biederman wrote: >>>> David Hildenbrand writes: >>>> >>>>> Some devices/drivers that add memo

Re: [PATCH v2 2/3] mm/memory_hotplug: Introduce MHP_NO_FIRMWARE_MEMMAP

2020-04-30 Thread Eric W. Biederman
David Hildenbrand writes: > On 30.04.20 17:38, Eric W. Biederman wrote: >> David Hildenbrand writes: >> >>> Some devices/drivers that add memory via add_memory() and friends (e.g., >>> dax/kmem, but also virtio-mem in the future) don't want to create en

Re: [PATCH v2 2/3] mm/memory_hotplug: Introduce MHP_NO_FIRMWARE_MEMMAP

2020-04-30 Thread Eric W. Biederman
; If there is no entry, it will simply return with -EINVAL. > > [1] > https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-firmware-memmap You know what this justification is rubbish, and I have previously explained why it is rubbish. Nacked-by: "Eric W. Biederman"

Re: [PATCH v7] pci: quirk to skip msi disable on shutdown

2015-09-17 Thread Eric W. Biederman
...@redhat.com> >> Cc: Bjorn Helgaas <bhelg...@google.com> >> Cc: Yinghai Lu <yhlu.kernel.s...@gmail.com> >> Cc: Ulrich Obergfell <uober...@redhat.com> >> Cc: Rusty Russell <ru...@rustcorp.com.au> >> Cc: "Eric W. Biederman" <ebied...

Re: [PATCH trivial] include: uapi: standard all files' macro prefix and suffix, excluding linux/ sub-directory

2013-08-02 Thread Eric W. Biederman
Chen Gang gang.c...@asianux.com writes: It is a trivial patch for include/uapi, exclude linux sub-directory. If it is useful, I will send another patch for linux sub-directory. BTW: it is really big mail addresses from ./scripts/get_maintainers.pl What is the point? Is there a bug that

Re: [Xen-devel] is kexec on Xen domU possible?

2013-07-24 Thread Eric W. Biederman
Greg KH gre...@linuxfoundation.org writes: On Tue, Jul 23, 2013 at 05:22:36PM -0700, Matt Wilson wrote: On Mon, Jul 22, 2013 at 11:33:15AM -0700, Greg KH wrote: On Mon, Jul 22, 2013 at 11:24:46AM -0700, H. Peter Anvin wrote: On 07/22/2013 10:20 AM, Eric W. Biederman wrote: Many Xen

Re: is kexec on Xen domU possible?

2013-07-22 Thread Eric W. Biederman
Daniel Kiper daniel.ki...@oracle.com writes: On Fri, Jul 19, 2013 at 01:58:05PM -0700, H. Peter Anvin wrote: On 07/19/2013 12:14 PM, Greg KH wrote: The errors that the kexec tools seem to run into is finding the memory to place the new kernel into, is that just an issue that PV guests

Re: [PATCH] x86: make IDT read-only

2013-04-10 Thread Eric W. Biederman
Ingo Molnar mi...@kernel.org writes: * Eric W. Biederman ebied...@xmission.com wrote: H. Peter Anvin h...@zytor.com writes: On 04/08/2013 03:43 PM, Kees Cook wrote: This makes the IDT unconditionally read-only. This primarily removes the IDT from being a target for arbitrary memory

Re: [PATCH] x86: make IDT read-only

2013-04-09 Thread Eric W. Biederman
H. Peter Anvin h...@zytor.com writes: On 04/08/2013 03:43 PM, Kees Cook wrote: This makes the IDT unconditionally read-only. This primarily removes the IDT from being a target for arbitrary memory write attacks. It has an added benefit of also not leaking (via the sidt instruction) the

Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation

2013-01-11 Thread Eric W. Biederman
David Vrabel david.vra...@citrix.com writes: On 11/01/13 13:22, Daniel Kiper wrote: On Thu, Jan 10, 2013 at 02:19:55PM +, David Vrabel wrote: On 04/01/13 17:01, Daniel Kiper wrote: My .5 cents: - We should focus on KEXEC_CMD_kexec_load and KEXEC_CMD_kexec_unload; probably we

Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation

2013-01-10 Thread Eric W. Biederman
Konrad Rzeszutek Wilk konrad.w...@oracle.com writes: On Mon, Jan 07, 2013 at 01:34:04PM +0100, Daniel Kiper wrote: I think that new kexec hypercall function should mimic kexec syscall. It means that all arguments passed to hypercall should have same types if it is possible or if it is not

Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation

2013-01-02 Thread Eric W. Biederman
Andrew Cooper andrew.coop...@citrix.com writes: On 27/12/12 18:02, Eric W. Biederman wrote: Andrew Cooper andrew.coop...@citrix.com writes: On 27/12/2012 07:53, Eric W. Biederman wrote: The syscall ABI still has the wrong semantics. Aka totally unmaintainable and unmergeable. The concept

Re: [PATCH v3 00/11] xen: Initial kexec/kdump implementation

2012-12-27 Thread Eric W. Biederman
Andrew Cooper andrew.coop...@citrix.com writes: On 27/12/2012 07:53, Eric W. Biederman wrote: The syscall ABI still has the wrong semantics. Aka totally unmaintainable and unmergeable. The concept of domU support is also strange. What does domU support even mean, when the dom0 support

Re: [PATCH v3 01/11] kexec: introduce kexec firmware support

2012-12-27 Thread Eric W. Biederman
Daniel Kiper daniel.ki...@oracle.com writes: Daniel Kiper daniel.ki...@oracle.com writes: Some kexec/kdump implementations (e.g. Xen PVOPS) could not use default Linux infrastructure and require some support from firmware and/or hypervisor. To cope with that problem kexec firmware

Re: [PATCH v3 06/11] x86/xen: Add i386 kexec/kdump implementation

2012-12-27 Thread Eric W. Biederman
H. Peter Anvin h...@zytor.com writes: On 12/27/2012 03:23 PM, Daniel Kiper wrote: On 12/26/2012 06:18 PM, Daniel Kiper wrote: Add i386 kexec/kdump implementation. v2 - suggestions/fixes: - allocate transition page table pages below 4 GiB (suggested by Jan Beulich). Why?

Re: [PATCH v3 01/11] kexec: introduce kexec firmware support

2012-12-26 Thread Eric W. Biederman
Daniel Kiper daniel.ki...@oracle.com writes: Some kexec/kdump implementations (e.g. Xen PVOPS) could not use default Linux infrastructure and require some support from firmware and/or hypervisor. To cope with that problem kexec firmware infrastructure was introduced. It allows a developer to

Re: [PATCH v3 00/11] xen: Initial kexec/kdump implementation

2012-12-26 Thread Eric W. Biederman
The syscall ABI still has the wrong semantics. Aka totally unmaintainable and unmergeable. The concept of domU support is also strange. What does domU support even mean, when the dom0 support is loading a kernel to pick up Xen when Xen falls over. I expect a lot of decisions about what code

Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-22 Thread Eric W. Biederman
Daniel Kiper daniel.ki...@oracle.com writes: On Tue, Nov 20, 2012 at 08:40:39AM -0800, ebied...@xmission.com wrote: Daniel Kiper daniel.ki...@oracle.com writes: Some kexec/kdump implementations (e.g. Xen PVOPS) could not use default functions or require some changes in behavior of

Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-20 Thread Eric W. Biederman
Daniel Kiper daniel.ki...@oracle.com writes: Some kexec/kdump implementations (e.g. Xen PVOPS) could not use default functions or require some changes in behavior of kexec/kdump generic code. To cope with that problem kexec_ops struct was introduced. It allows a developer to replace all or

Re: [PATCH 1/4] veth: move loopback logic to common location

2009-11-26 Thread Eric W. Biederman
Patrick McHardy ka...@trash.net writes: Arnd Bergmann wrote: On Tuesday 24 November 2009, Patrick McHardy wrote: Eric W. Biederman wrote: I don't quite follow what you intend with dev_queue_xmit when the macvlan is in one namespace and the real physical device is in another. Are you

Re: [PATCH 1/4] veth: move loopback logic to common location

2009-11-24 Thread Eric W. Biederman
Patrick McHardy ka...@trash.net writes: I did all my testing with macvlan interfaces in separate namespaces communicating with each other, so I'd assume that we should always clear skb->mark and skb->dst in this function. Good point, in that case we probably should clear it as well. But in the

Re: [PATCH 1/4] veth: move loopback logic to common location

2009-11-24 Thread Eric W. Biederman
Patrick McHardy ka...@trash.net writes: Eric W. Biederman wrote: Patrick McHardy ka...@trash.net writes: I did all my testing with macvlan interfaces in separate namespaces communicating with each other, so I'd assume that we should always clear skb->mark and skb->dst in this function. Good

Re: Paravirtualization on VMware's Platform [VMI].

2009-10-01 Thread Eric W. Biederman
Alok Kataria akata...@vmware.com writes: On Tue, 2009-09-29 at 01:08 -0700, Arjan van de Ven wrote: For now I have just added some text in the feature-removal file and disabled VMI by default in the Kconfig, the reason that needs to be done is because Live Migration of a VMI enabled VM to

Re: [Lguest] [PATCH 4/5] lguest: use KVM hypercalls

2009-04-15 Thread Eric W. Biederman
: Eric W. Biederman ebied...@xmission.com Date: Tue Jan 20 11:03:21 2009 + tun: Move read_wait into tun_file The poll interface requires that the waitqueue exist while the struct file is open. In the rare case when a tun device disappears before the tun file closes we fail

Re: [Lguest] [PATCH 4/5] lguest: use KVM hypercalls

2009-04-15 Thread Eric W. Biederman
Herbert Xu herb...@gondor.apana.org.au writes: On Wed, Apr 15, 2009 at 06:23:29AM -0700, Eric W. Biederman wrote: There is a GIGANTIC reason to have the wait queue on tfile. If you open a file, and do ip link del tapN you can still be blocked waiting in poll. The problem

Re: [Lguest] [PATCH 4/5] lguest: use KVM hypercalls

2009-04-15 Thread Eric W. Biederman
Herbert Xu herb...@gondor.apana.org.au writes: On Wed, Apr 15, 2009 at 06:35:58AM -0700, Eric W. Biederman wrote: Because as far as I can tell we would just leak that refcount. The poll code does not appear to call back into any of the file methods when it frees itself from the wait queue

Re: [Lguest] [PATCH 4/5] lguest: use KVM hypercalls

2009-04-15 Thread Eric W. Biederman
Herbert Xu herb...@gondor.apana.org.au writes: On Wed, Apr 15, 2009 at 07:06:10AM -0700, Eric W. Biederman wrote: There is the boring rmmod case that has always existed. There is more interesting case of moving your tap device into another network namespace. In which case

Re: [Lguest] [PATCH 4/5] lguest: use KVM hypercalls

2009-04-15 Thread Eric W. Biederman
Herbert Xu herb...@gondor.apana.org.au writes: On Wed, Apr 15, 2009 at 07:18:44AM -0700, Eric W. Biederman wrote: So holding the reference only blocks us indefinitely in netdev_wait_allrefs, blocking the network namespace exit, and holding net_mutex indefinitely. OK that's a killer because

Re: another RFC patch: bzImage with ELF payload

2007-06-01 Thread Eric W. Biederman
Jeremy Fitzhardinge [EMAIL PROTECTED] writes: Jeremy Fitzhardinge wrote: BTW, this won't apply as-is; I have some mucking-around patches to try and get the linux/elf*.h headers into a bit more order, but that's not ready yet. BTW2: I've been basically ignoring zImage. Do we need to

Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage

2007-05-09 Thread Eric W. Biederman
H. Peter Anvin [EMAIL PROTECTED] writes: yhlu wrote: On 5/8/07, H. Peter Anvin [EMAIL PROTECTED] wrote: Jeremy Fitzhardinge wrote: Specifically boot_params.screen_info isn't being properly set up by the caller. will setup real_mode_data in kexec path? -ENOPARSE I believe YH is asking

Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage

2007-05-09 Thread Eric W. Biederman
H. Peter Anvin [EMAIL PROTECTED] writes: Eric W. Biederman wrote: I expect I can find a few more examples where we specify video_cols and video_lines but we use video_mode == 0. Going farther mode 0x00 is a BIOS 40x25 mode. So the patch below is not always safe even if we boot

Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage

2007-05-09 Thread Eric W. Biederman
yhlu [EMAIL PROTECTED] writes: On 5/8/07, Vivek Goyal [EMAIL PROTECTED] wrote: On Tue, May 08, 2007 at 11:51:35AM -0700, yhlu wrote: This message generally appears if you did not specify --args-linux on kexec command line while loading vmlinux. besides elf-x86_64, still need --args-linux to

Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage

2007-05-09 Thread Eric W. Biederman
H. Peter Anvin [EMAIL PROTECTED] writes: yhlu wrote: so the kexec tools need to scan the pci devices list, and find out how to set real_mode.isVGA and orig_video_mode, also need to parse the command line about vga console. BTW, welcome to the hell of bypassing setup. Well in this case

Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage

2007-05-09 Thread Eric W. Biederman
] Cc: Rusty Russell [EMAIL PROTECTED] Cc: Andi Kleen [EMAIL PROTECTED] Cc: Alan [EMAIL PROTECTED] Cc: Ingo Molnar [EMAIL PROTECTED] Cc: Eric W. Biederman [EMAIL PROTECTED] --- drivers/video/console/vgacon.c |9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) Index: vanilla

Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage

2007-05-08 Thread Eric W. Biederman
yhlu [EMAIL PROTECTED] writes: Eric, With the latest change that make vmlinux to be elf64 and make bzImage do switch to 64bit long mode, the kernel started via kexec can not get VGA console. but the serial console works well. I wonder if the setup.S is skipped in bzImage via kexec path.

Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage

2007-05-08 Thread Eric W. Biederman
yhlu [EMAIL PROTECTED] writes: Eric, i tried to load vmlinux with kexec and got Ramdisks not supported with generic elf arguments So i use mkelfImage with my patch ( convert elf64 to elf32) to make another elf32. and loaded with kexec and can not get vga console too. ---serial console

Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage

2007-05-01 Thread Eric W. Biederman
Jeremy Fitzhardinge [EMAIL PROTECTED] writes: Eric W. Biederman wrote: I'm not going to worry about going farther until the patches in flight settle down a little bit, but this looks promising. Is there any value in adding an early-putchar function pointer into the structure somehow? I

Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage

2007-04-30 Thread Eric W. Biederman
Rusty Russell [EMAIL PROTECTED] writes: On Sun, 2007-04-29 at 21:38 -0700, H. Peter Anvin wrote: Rusty Russell wrote: Dammit, Eric, you spend a lot of time using words like insane where you mean we didn't do everything all at once. It's *not* clear that using %esi is sane, but

[PATCH 0/12] Early USB debug port and i386 boot cleanups

2007-04-30 Thread Eric W. Biederman
Modern hardware relies primarily on memory mapped I/O which is typically at addresses that are not mapped by the kernel's initial page tables, which makes them currently unusable for early debugging print support. So this patch set digs in and fixes the early page tables on both arch/i386

[PATCH 08/12] i386: Convert the boot time page tables to the kernels native format.

2007-04-30 Thread Eric W. Biederman
that boot_ioremap could be replaced with something that work in the presence of PAE page tables. The net result is a simpler and easier to work in, early boot environment. Signed-off-by: Eric W. Biederman [EMAIL PROTECTED] --- arch/i386/Kconfig |7 -- arch/i386/kernel/efi.c

[PATCH 05/12] i386: During page table initialization always set the leaf page table entries.

2007-04-30 Thread Eric W. Biederman
. Signed-off-by: Eric W. Biederman [EMAIL PROTECTED] --- arch/i386/mm/init.c | 52 +++--- 1 files changed, 20 insertions(+), 32 deletions(-) diff --git a/arch/i386/mm/init.c b/arch/i386/mm/init.c index b77a43c..dbe16f6 100644 --- a/arch/i386/mm/init.c

[PATCH 09/12] i386/x86_64: EHCI usb debug port early printk support.

2007-04-30 Thread Eric W. Biederman
data. Signed-off-by: Eric W. Biederman [EMAIL PROTECTED] --- arch/x86_64/kernel/early_printk.c | 571 + drivers/usb/host/ehci.h |8 + include/asm-i386/fixmap.h |1 + include/asm-x86_64/fixmap.h |1 + 4 files changed, 581

[PATCH 11/12] i386: Move setup_idt from head.S to head32.c

2007-04-30 Thread Eric W. Biederman
This slightly delays when we setup the idt. But by doing it in C things are noticeably simpler. Signed-off-by: Eric W. Biederman [EMAIL PROTECTED] --- arch/i386/kernel/head.S | 68 +++- arch/i386/kernel/head32.c | 26 + 2 files

Re: [PATCH 07/12] i386: Add missing !X86_PAE dependincy to the 2G/2G split.

2007-04-30 Thread Eric W. Biederman
H. Peter Anvin [EMAIL PROTECTED] writes: Eric W. Biederman wrote: When in PAE mode we require that the user kernel divide to be on a 1G boundary. The 2G/2G split does not have that property so require !X86_PAE ? -hpa From arch/i386/Kconfig: choice depends

Re: [PATCH 08/12] i386: Convert the boot time page tables to the kernels native format.

2007-04-30 Thread Eric W. Biederman
Andi Kleen [EMAIL PROTECTED] writes: On Monday 30 April 2007 18:15:08 Eric W. Biederman wrote: Currently we have a lot of special case code and a lot of limitations because we cannot count on the initial boot time page tables being in the format our page table handling routines know how

Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage

2007-04-30 Thread Eric W. Biederman
Jeremy Fitzhardinge [EMAIL PROTECTED] writes: Eric W. Biederman wrote: I have several ideas on how we can make this work but first I have to ask what is it that you are trying to accomplish? The requirements are: 1. the domain builder needs to get various information about

Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage

2007-04-30 Thread Eric W. Biederman
H. Peter Anvin [EMAIL PROTECTED] writes: Eric W. Biederman wrote: I'm tempted to just reload the segments in setup.S, but that might break loadlin support or one of the other bootloaders that starts the kernel in 32bit mode so we need to be careful. We already load all the segments

Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage

2007-04-29 Thread Eric W. Biederman
Jeremy Fitzhardinge [EMAIL PROTECTED] writes: Eric W. Biederman wrote: All it does is set a flag that tells a bootloader: hey, I can run when loaded at a non-default address, and this is what you have to align me to. All relocation processing happens in the kernel itself. Is it possible

Re: [PATCH 10/28] i386: map enough initial memory to create lowmem mappings

2007-04-29 Thread Eric W. Biederman
Jeremy Fitzhardinge [EMAIL PROTECTED] writes: Eric W. Biederman wrote: And it just occurred to me PSE disabled, otherwise you would not have needed more than 4 pages. I supposed you were testing the Xen case. No, actually, I wasn't. It was booting native (the Xen boot path doesn't go

Re: The virtuailization patches break Voyager.

2007-04-28 Thread Eric W. Biederman
Andi Kleen [EMAIL PROTECTED] writes: On Saturday 28 April 2007 09:52:30 Jeremy Fitzhardinge wrote: Well, not really. The problem with the subarch mechanism is that it promotes a lot of copied code with small modifications, and so making changes is the inherently non-general activity of trying

Re: The virtuailization patches break Voyager.

2007-04-28 Thread Eric W. Biederman
Andi Kleen [EMAIL PROTECTED] writes: On Saturday 28 April 2007 11:15:33 Eric W. Biederman wrote: Scary thought. But I don't see why people using embedded x86s should suddenly design new interrupt controllers etc. - after all the main value of using x86s embedded is some degree

Re: [PATCH] i386: introduce voyager smp_ops, fix voyager build

2007-04-28 Thread Eric W. Biederman
Jeremy Fitzhardinge [EMAIL PROTECTED] writes: This adds an smp_ops for voyager, and hooks things up appropriately. This is the first baby-step to making subarch runtime switchable. Unless I have missed something early_gdt_descr still needs to be updated. Eric

Re: [PATCH 10/28] i386: map enough initial memory to create lowmem mappings

2007-04-25 Thread Eric W. Biederman
Jeremy Fitzhardinge [EMAIL PROTECTED] writes: Chuck Ebbert wrote: H. Peter Anvin wrote: Andi Kleen wrote: Then we would have seen reports surely? Yes, I would have thought so. It surprised me that such an obvious bug could be there, apparently for a long time. But it's

Re: [PATCH 10/28] i386: map enough initial memory to create lowmem mappings

2007-04-25 Thread Eric W. Biederman
Jeremy Fitzhardinge [EMAIL PROTECTED] writes: Eric W. Biederman wrote: Since you have PSE disabled for Xen my hunch is that somehow that got left on for your test boot. No. Under Xen cpuid masks out PSE (and complains if you try to set it in a pte), but when booting native it will just use

Re: [PATCH 10/28] i386: map enough initial memory to create lowmem mappings

2007-04-23 Thread Eric W. Biederman
Jeremy Fitzhardinge [EMAIL PROTECTED] writes: H. Peter Anvin wrote: It would be *trivial* to make a certain number of page table slots available at the end of the head.S-generated map. Or you could use a fixmap. That certain number of page table slots should be the fixmap slots. If you do

Re: [patch 20/26] Xen-paravirt_ops: Core Xen implementation

2007-03-19 Thread Eric W. Biederman
Chris Wright [EMAIL PROTECTED] writes: * Ingo Molnar ([EMAIL PROTECTED]) wrote: ENTRY(swapper_pg_dir) + .align PAGE_SIZE_asm .fill 1024,4,0 does the native kernel lose memory here? Not in my builds. Shouldn't the align be before the label? Otherwise padding would be inserted

Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-19 Thread Eric W. Biederman
Rusty Russell [EMAIL PROTECTED] writes: On Sun, 2007-03-18 at 13:08 +0100, Andi Kleen wrote: The idea is _NOT_ that you go look for references to the paravirt_ops members structure, that would be stupid and you wouldn't be able to use the most efficient addressing mode on a given cpu,