Re: [PATCH 6/7] qemu: Implement virtio-pstore device
On Thu, 28 Jul 2016 14:39:53 +0900 Namhyung Kimwrote: > Well, I dont' know. As you know, the kernel oops dump is already sent > to serial device but it's rather slow. As I wrote in the cover > letter, enabling ftrace_dump_on_oops makes it even worse.. Also > pstore saves the (compressed) binary data so I thought it'd be better > to have a dedicated IO channel. BTW, I agree with this. It is better to have a quick way to grab the ftrace buffers when a crash happens, as serial is excruciatingly slow. Although, currently I still use kexec/kdump, but as Namhyung said, it depends on crash being up to date. I tend to be sending in updates every time I have to use it. -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] x86/paravirt: Do not trace _paravirt_ident_*() functions
Hmm, I'm guessing this patch got lost. -- Steve On Wed, 25 May 2016 13:47:26 -0400 Steven Rostedt <rost...@goodmis.org> wrote: > Łukasz Daniluk reported that on a RHEL kernel that his machine would lock up > after enabling function tracer. I asked him to bisect the functions within > available_filter_functions, which he did and it came down to three: > > _paravirt_nop(), _paravirt_ident_32() and _paravirt_ident_64() > > It was found that this is only an issue when noreplace-paravirt is added to > the kernel command line. > > This means that those functions are most likely called within critical > sections of the funtion tracer, and must not be traced. > > In newer kenels _paravirt_nop() is defined within gcc asm(), and is no > longer an issue. But both _paravirt_ident_{32,64}() causes the following > splat when they are traced: > > mm/pgtable-generic.c:33: bad pmd 8800d2435150(01d00054) > mm/pgtable-generic.c:33: bad pmd 8800d3624190(01d00070) > mm/pgtable-generic.c:33: bad pmd 8800d36a5110(01d00054) > mm/pgtable-generic.c:33: bad pmd 880118eb1450(01d00054) > NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [systemd-journal:469] > Modules linked in: e1000e > CPU: 2 PID: 469 Comm: systemd-journal Not tainted 4.6.0-rc4-test+ #513 > Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 > 05/07/2012 > task: 880118f740c0 ti: 8800d4aec000 task.ti: 8800d4aec000 > RIP: 0010:[] [] > queued_spin_lock_slowpath+0x118/0x1a0 > RSP: 0018:8800d4aefb90 EFLAGS: 0246 > RAX: RBX: RCX: 88011eb16d40 > RDX: 82485760 RSI: 1f288820 RDI: ea008030 > RBP: 8800d4aefb90 R08: 000c R09: > R10: 821c8e0e R11: R12: 88200fb8 > R13: 7f7a4e3f7000 R14: ea000303f600 R15: 8800d4b562e0 > FS: 7f7a4e3d7840() GS:88011eb0() knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: 7f7a4e3f7000 CR3: d3e71000 CR4: 001406e0 > Stack: > 8800d4aefba0 81cc5f47 8800d4aefc60 8122c15b > 8800d4aefcb0 8800d4aefbd0 811bf4cb 0002 > 0015 8800d2276050 8000c0fd8867 ea008030 > Call Trace: > [] _raw_spin_lock+0x27/0x30 > [] handle_pte_fault+0x13db/0x16b0 > [] ? function_trace_call+0x15b/0x180 > [] ? handle_pte_fault+0x5/0x16b0 > [] handle_mm_fault+0x312/0x670 > [] ? find_vma+0x68/0x70 > [] __do_page_fault+0x1b1/0x4e0 > [] do_page_fault+0x22/0x30 > [] page_fault+0x28/0x30 > [] ? copy_user_enhanced_fast_string+0x5/0x10 > [] ? seq_read+0x305/0x370 > [] __vfs_read+0x28/0xe0 > [] ? __vfs_read+0x5/0xe0 > [] ? __vfs_read+0x5/0xe0 > [] vfs_read+0x86/0x130 > [] SyS_read+0x46/0xa0 > [] entry_SYSCALL_64_fastpath+0x1e/0xa8 > Code: 12 48 c1 ea 0c 83 e8 01 83 e2 30 48 98 48 81 c2 40 6d 01 00 48 03 14 > c5 80 6a 5d 82 48 89 0a 8b 41 08 85 c0 75 09 f3 90 8b 41 08 <85> c0 74 f7 > 4c 8b 09 4d 85 c9 74 08 41 0f 18 09 eb 02 f3 90 8b > > Reported-by: Łukasz Daniluk <lukasz.dani...@intel.com> > Signed-off-by: Steven Rostedt <rost...@goodmis.org> > > diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c > index f08ac28b8136..f975d226be6e 100644 > --- a/arch/x86/kernel/paravirt.c > +++ b/arch/x86/kernel/paravirt.c > @@ -55,12 +55,12 @@ asm (".pushsection .entry.text, \"ax\"\n" > ".popsection"); > > /* identity function, which can be inlined */ > -u32 _paravirt_ident_32(u32 x) > +u32 notrace _paravirt_ident_32(u32 x) > { > return x; > } > > -u64 _paravirt_ident_64(u64 x) > +u64 notrace _paravirt_ident_64(u64 x) > { > return x; > } ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] x86/paravirt: Do not trace _paravirt_ident_*() functions
I just spent half a day bisecting function tracing because I tripped over this again. I thought this was merged, but I guess it was missed again. Can someone please pull this in. And mark it for stable, it goes probably as far back as 2.6.32. Thanks! -- Steve On Wed, 25 May 2016 13:47:26 -0400 Steven Rostedt <rost...@goodmis.org> wrote: > Łukasz Daniluk reported that on a RHEL kernel that his machine would lock up > after enabling function tracer. I asked him to bisect the functions within > available_filter_functions, which he did and it came down to three: > > _paravirt_nop(), _paravirt_ident_32() and _paravirt_ident_64() > > It was found that this is only an issue when noreplace-paravirt is added to > the kernel command line. > > This means that those functions are most likely called within critical > sections of the funtion tracer, and must not be traced. > > In newer kenels _paravirt_nop() is defined within gcc asm(), and is no > longer an issue. But both _paravirt_ident_{32,64}() causes the following > splat when they are traced: > > mm/pgtable-generic.c:33: bad pmd 8800d2435150(01d00054) > mm/pgtable-generic.c:33: bad pmd 8800d3624190(01d00070) > mm/pgtable-generic.c:33: bad pmd 8800d36a5110(01d00054) > mm/pgtable-generic.c:33: bad pmd 880118eb1450(01d00054) > NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [systemd-journal:469] > Modules linked in: e1000e > CPU: 2 PID: 469 Comm: systemd-journal Not tainted 4.6.0-rc4-test+ #513 > Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 > 05/07/2012 > task: 880118f740c0 ti: 8800d4aec000 task.ti: 8800d4aec000 > RIP: 0010:[] [] > queued_spin_lock_slowpath+0x118/0x1a0 > RSP: 0018:8800d4aefb90 EFLAGS: 0246 > RAX: RBX: RCX: 88011eb16d40 > RDX: 82485760 RSI: 1f288820 RDI: ea008030 > RBP: 8800d4aefb90 R08: 000c R09: > R10: 821c8e0e R11: R12: 88200fb8 > R13: 7f7a4e3f7000 R14: ea000303f600 R15: 8800d4b562e0 > FS: 7f7a4e3d7840() GS:88011eb0() knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: 7f7a4e3f7000 CR3: d3e71000 CR4: 001406e0 > Stack: > 8800d4aefba0 81cc5f47 8800d4aefc60 8122c15b > 8800d4aefcb0 8800d4aefbd0 811bf4cb 0002 > 0015 8800d2276050 8000c0fd8867 ea008030 > Call Trace: > [] _raw_spin_lock+0x27/0x30 > [] handle_pte_fault+0x13db/0x16b0 > [] ? function_trace_call+0x15b/0x180 > [] ? handle_pte_fault+0x5/0x16b0 > [] handle_mm_fault+0x312/0x670 > [] ? find_vma+0x68/0x70 > [] __do_page_fault+0x1b1/0x4e0 > [] do_page_fault+0x22/0x30 > [] page_fault+0x28/0x30 > [] ? copy_user_enhanced_fast_string+0x5/0x10 > [] ? seq_read+0x305/0x370 > [] __vfs_read+0x28/0xe0 > [] ? __vfs_read+0x5/0xe0 > [] ? __vfs_read+0x5/0xe0 > [] vfs_read+0x86/0x130 > [] SyS_read+0x46/0xa0 > [] entry_SYSCALL_64_fastpath+0x1e/0xa8 > Code: 12 48 c1 ea 0c 83 e8 01 83 e2 30 48 98 48 81 c2 40 6d 01 00 48 03 14 > c5 80 6a 5d 82 48 89 0a 8b 41 08 85 c0 75 09 f3 90 8b 41 08 <85> c0 74 f7 > 4c 8b 09 4d 85 c9 74 08 41 0f 18 09 eb 02 f3 90 8b > > Reported-by: Łukasz Daniluk <lukasz.dani...@intel.com> > Signed-off-by: Steven Rostedt <rost...@goodmis.org> > > diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c > index f08ac28b8136..f975d226be6e 100644 > --- a/arch/x86/kernel/paravirt.c > +++ b/arch/x86/kernel/paravirt.c > @@ -55,12 +55,12 @@ asm (".pushsection .entry.text, \"ax\"\n" > ".popsection"); > > /* identity function, which can be inlined */ > -u32 _paravirt_ident_32(u32 x) > +u32 notrace _paravirt_ident_32(u32 x) > { > return x; > } > > -u64 _paravirt_ident_64(u64 x) > +u64 notrace _paravirt_ident_64(u64 x) > { > return x; > } ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v3 21/27] x86/ftrace: Adapt function tracing for PIE support
On Thu, 24 May 2018 13:40:24 +0200 Petr Mladekwrote: > On Wed 2018-05-23 12:54:15, Thomas Garnier wrote: > > When using -fPIE/PIC with function tracing, the compiler generates a > > call through the GOT (call *__fentry__@GOTPCREL). This instruction > > takes 6 bytes instead of 5 on the usual relative call. > > > > If PIE is enabled, replace the 6th byte of the GOT call by a 1-byte nop > > so ftrace can handle the previous 5-bytes as before. > > > > Position Independent Executable (PIE) support will allow to extended the > > KASLR randomization range below the -2G memory limit. > > > > Signed-off-by: Thomas Garnier > > --- > > arch/x86/include/asm/ftrace.h | 6 +++-- > > arch/x86/include/asm/sections.h | 4 > > arch/x86/kernel/ftrace.c| 42 +++-- > > 3 files changed, 48 insertions(+), 4 deletions(-) > > > > diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h > > index c18ed65287d5..8f2decce38d8 100644 > > --- a/arch/x86/include/asm/ftrace.h > > +++ b/arch/x86/include/asm/ftrace.h > > @@ -25,9 +25,11 @@ extern void __fentry__(void); > > static inline unsigned long ftrace_call_adjust(unsigned long addr) > > { > > /* > > -* addr is the address of the mcount call instruction. > > -* recordmcount does the necessary offset calculation. > > +* addr is the address of the mcount call instruction. PIE has always a > > +* byte added to the start of the function. > > */ > > + if (IS_ENABLED(CONFIG_X86_PIE)) > > + addr -= 1; > > This seems to modify the address even for modules that are _not_ compiled with > PIE, see below. Can one load a module not compiled for PIE in a kernel with PIE? > > > return addr; > > } > > > > diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c > > index 01ebcb6f263e..73b3c30cb7a3 100644 > > --- a/arch/x86/kernel/ftrace.c > > +++ b/arch/x86/kernel/ftrace.c > > @@ -135,6 +135,44 @@ ftrace_modify_code_direct(unsigned long ip, unsigned > > const char *old_code, > > return 0; > > } > > > > +/* Bytes before call GOT offset */ > > +const unsigned char got_call_preinsn[] = { 0xff, 0x15 }; > > + > > +static int > > +ftrace_modify_initial_code(unsigned long ip, unsigned const char *old_code, > > + unsigned const char *new_code) > > +{ > > + unsigned char replaced[MCOUNT_INSN_SIZE + 1]; > > + > > + ftrace_expected = old_code; > > + > > + /* > > +* If PIE is not enabled or no GOT call was found, default to the > > +* original approach to code modification. > > +*/ > > + if (!IS_ENABLED(CONFIG_X86_PIE) || > > + probe_kernel_read(replaced, (void *)ip, sizeof(replaced)) || > > + memcmp(replaced, got_call_preinsn, sizeof(got_call_preinsn))) > > + return ftrace_modify_code_direct(ip, old_code, new_code); > > And this looks like an attempt to handle modules compiled without > PIE. Does it works with the right ip in that case? I'm guessing the || is for the "or no GOT call was found", but it doesn't explain why no GOT would be found. > > I wonder if a better solution would be to update > scripts/recordmcount.c to store the incremented location into the module. If recordmcount.c can handle this, then I think that's the preferred approach. Thanks! -- Steve > > IMPORTANT: I have only vague picture about how this all works. It is > possible that I am completely wrong. The code might be correct, > especially if you tested this situation. > > Best Regards, > Petr ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: net-next boot error
[ Added Thomas Gleixner ] On Thu, 26 Jul 2018 11:34:39 +0200 Dmitry Vyukov wrote: > On Thu, Jul 26, 2018 at 11:29 AM, syzbot > wrote: > > Hello, > > > > syzbot found the following crash on: > > > > HEAD commit:dc66fe43b7eb rds: send: Fix dead code in rds_sendmsg > > git tree: net-next > > console output: https://syzkaller.appspot.com/x/log.txt?x=127874c840 > > kernel config: https://syzkaller.appspot.com/x/.config?x=f34ce142a9f5f0e8 > > dashboard link: https://syzkaller.appspot.com/bug?extid=604f8271211546f5b3c7 > > compiler: gcc (GCC) 8.0.1 20180413 (experimental) > > > > Unfortunately, I don't have any reproducer for this crash yet. > > > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > > Reported-by: syzbot+604f8271211546f5b...@syzkaller.appspotmail.com > > > > possible deadlock in static_key_slow_incsd 0:0:1:0: [sda] Attached SCSI disk > > MACsec IEEE 802.1AE > > tun: Universal TUN/TAP device driver, 1.6 > > > > > > WARNING: possible recursive locking detected > > +Tetsuo, perhaps this boot lockdep problem then disables lockdep for > actual testing. I think lockdep should respect panic_on_warn. > > > > 4.18.0-rc6+ #141 Not tainted > > > > swapper/0/1 is trying to acquire lock: > > (ptrval) (cpu_hotplug_lock.rw_sem){}, at: > > static_key_slow_inc+0x12/0x30 kernel/jump_label.c:124 > > > > but task is already holding lock: > > (ptrval) (cpu_hotplug_lock.rw_sem){}, at: get_online_cpus > > include/linux/cpu.h:126 [inline] > > (ptrval) (cpu_hotplug_lock.rw_sem){}, at: init_vqs+0xe1a/0x1520 > > drivers/net/virtio_net.c:2777 Here init_vqs() does: get_online_cpus(); virtnet_set_affinity(vi); put_online_cpus(); Which disables cpu hotplug and calls virtnet_set_affinity() Note, get_online_cpus() is no longer recursive. > > > > other info that might help us debug this: > > Possible unsafe locking scenario: > > > >CPU0 > > > > lock(cpu_hotplug_lock.rw_sem); > > lock(cpu_hotplug_lock.rw_sem); > > > > *** DEADLOCK *** > > > > May be due to missing lock nesting notation > > > > 3 locks held by swapper/0/1: > > #0: (ptrval) (>mutex){}, at: device_lock > > include/linux/device.h:1134 [inline] > > #0: (ptrval) (>mutex){}, at: __driver_attach+0x15f/0x2f0 > > drivers/base/dd.c:820 > > #1: (ptrval) (cpu_hotplug_lock.rw_sem){}, at: get_online_cpus > > include/linux/cpu.h:126 [inline] > > #1: (ptrval) (cpu_hotplug_lock.rw_sem){}, at: > > init_vqs+0xe1a/0x1520 drivers/net/virtio_net.c:2777 > > #2: (ptrval) (xps_map_mutex){+.+.}, at: > > __netif_set_xps_queue+0x243/0x23f0 net/core/dev.c:2278 > > > > stack backtrace: > > CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.18.0-rc6+ #141 > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > > Google 01/01/2011 > > Call Trace: > > __dump_stack lib/dump_stack.c:77 [inline] > > dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113 > > print_deadlock_bug kernel/locking/lockdep.c:1765 [inline] > > check_deadlock kernel/locking/lockdep.c:1809 [inline] > > validate_chain kernel/locking/lockdep.c:2405 [inline] > > __lock_acquire.cold.65+0x1fb/0x486 kernel/locking/lockdep.c:3435 > > lock_acquire+0x1e4/0x540 kernel/locking/lockdep.c:3924 > > percpu_down_read_preempt_disable include/linux/percpu-rwsem.h:36 [inline] > > percpu_down_read include/linux/percpu-rwsem.h:59 [inline] > > cpus_read_lock+0x43/0xa0 kernel/cpu.c:289 > > static_key_slow_inc+0x12/0x30 kernel/jump_label.c:124 > > __netif_set_xps_queue+0xaac/0x23f0 net/core/dev.c:2320 __netif_set_xps_queue() calls static_key_slow_inc() which will also do a get_online_cpus() which will trigger this bug. There's a static_key_slow_inc_cpuslocked() version that should be used when get_online_cpus() is already taken, but I see __netif_set_xps_queue() is called from several places, and I doubt it is always called with get_online_cpus() held. Thus just using the cpuslocked() version is probably not sufficient of a fix. I don't know the code enough to offer other suggestions. -- Steve > > netif_set_xps_queue+0x26/0x30 net/core/dev.c:2455 > > virtnet_set_affinity+0x2ba/0x4b0 drivers/net/virtio_net.c:1944 > > init_vqs+0xe22/0x1520 drivers/net/virtio_net.c:2778 > > virtnet_probe+0x1092/0x2260 drivers/net/virtio_net.c:3016 > > virtio_dev_probe+0x592/0x942 drivers/virtio/virtio.c:245 > > really_probe drivers/base/dd.c:446 [inline] > > driver_probe_device+0x6ad/0x970 drivers/base/dd.c:588 > > __driver_attach+0x28b/0x2f0 drivers/base/dd.c:822 > > bus_for_each_dev+0x15d/0x1f0 drivers/base/bus.c:311 > > driver_attach+0x3d/0x50 drivers/base/dd.c:841 > > bus_add_driver+0x4b2/0x600 drivers/base/bus.c:667 > > driver_register+0x1c8/0x320 drivers/base/driver.c:170 > > register_virtio_driver+0x79/0xd0
[PATCH] lguest32 kallsyms backtrace of guest.
This is taken from the work I did on lguest64. When killing a guest, we read the guest stack to do a nice back trace of the guest and send it via printk to the host. So instead of just getting an error message from the lguest launcher of: lguest: bad read address 537012178 len 1 I also get in my dmesg: called from [c0405f30] show_trace_log_lvl+0x1a/0x2f [c04069aa] show_trace+0x12/0x14 [c0406a03] dump_stack+0x16/0x18 [f8c746de] lguest_dump_lg_regs+0x22/0x13c [lg] [f8c7131b] lgread+0x59/0x90 [lg] [f8c715bd] run_guest+0x26b/0x406 [lg] [f8c739be] read+0x73/0x7d [lg] [c04825e9] vfs_read+0xad/0x161 [c0482a75] sys_read+0x3d/0x61 [c0404f34] syscall_call+0x7/0xb === [f8c7131b] lgread+0x59/0x90 [lg] Printing LG 0 regs cr3: 021eb000 EIP: 0061: [e00227d2] ESP: 0069:c236fe3c EFLAGS: 00010202 EAX: 0004 EBX: e001fb20 ECX: 0008 EDX: 03f2 ESI: e001ee00 EDI: e001fb60 EBP: c236fea0 CR2: 1278000 lguest_data-cr2: 80011380 errcode: 0 trapnum: d Stack Dump: [c1042b7a] trace_hardirqs_on+0x125/0x149 [c123b0ea] wait_for_completion+0x90/0x98 [c123bddc] __mutex_unlock_slowpath+0x129/0x13e [c1048769] unlock_cpu_hotplug+0x62/0x64 [c104b5b6] sys_init_module+0x14e3/0x162c [c1081d44] do_sync_read+0xc2/0xff [c1004f85] restore_nocheck+0x12/0x15 [c1004f34] syscall_call+0x7/0xb TODO: - Clean up a little (still has stuff from lguest64 in it). - Perhaps make a config option or runtime switch to turn it off. - Send to the launcher the dump instead of printk. - make modules work too. Also I need to change the %u of the bad read print to a %x, because seeing 0x200227d2 is better than seeing 537012178 for addresses. Signed-off-by: Steven Rostedt [EMAIL PROTECTED] Index: linux-2.6.21-rc5-mm2/drivers/lguest/Makefile === --- linux-2.6.21-rc5-mm2.orig/drivers/lguest/Makefile +++ linux-2.6.21-rc5-mm2/drivers/lguest/Makefile @@ -4,4 +4,4 @@ obj-$(CONFIG_LGUEST_GUEST) += lguest.o l # Host requires the other files, which can be a module. obj-$(CONFIG_LGUEST) += lg.o lg-objs := core.o hypercalls.o page_tables.o interrupts_and_traps.o \ - segments.o io.o lguest_user.o hypervisor.o + segments.o io.o lguest_user.o hypervisor.o lguest_debug.o Index: linux-2.6.21-rc5-mm2/drivers/lguest/core.c === --- linux-2.6.21-rc5-mm2.orig/drivers/lguest/core.c +++ linux-2.6.21-rc5-mm2/drivers/lguest/core.c @@ -210,6 +210,28 @@ int lguest_address_ok(const struct lgues } /* Just like get_user, but don't let guest access lguest binary. */ +u8 lgread_u8(struct lguest *lg, u32 addr) +{ + u8 val = 0; + + /* Don't let them access lguest binary */ + if (!lguest_address_ok(lg, addr) + || get_user(val, (u32 __user *)addr) != 0) + kill_guest(lg, bad read address %u, addr); + return val; +} + +u16 lgread_u16(struct lguest *lg, u32 addr) +{ + u16 val = 0; + + /* Don't let them access lguest binary */ + if (!lguest_address_ok(lg, addr) + || get_user(val, (u32 __user *)addr) != 0) + kill_guest(lg, bad read address %u, addr); + return val; +} + u32 lgread_u32(struct lguest *lg, u32 addr) { u32 val = 0; Index: linux-2.6.21-rc5-mm2/drivers/lguest/lg.h === --- linux-2.6.21-rc5-mm2.orig/drivers/lguest/lg.h +++ linux-2.6.21-rc5-mm2/drivers/lguest/lg.h @@ -176,6 +176,8 @@ extern struct mutex lguest_lock; /* core.c: */ /* Entry points in hypervisor */ const unsigned long *__lguest_default_idt_entries(void); +u8 lgread_u8(struct lguest *lg, u32 addr); +u16 lgread_u16(struct lguest *lg, u32 addr); u32 lgread_u32(struct lguest *lg, u32 addr); void lgwrite_u32(struct lguest *lg, u32 val, u32 addr); void lgread(struct lguest *lg, void *buf, u32 addr, unsigned bytes); @@ -238,6 +240,7 @@ int hypercall(struct lguest *info, struc #define kill_guest(lg, fmt...) \ do { \ if (!(lg)-dead) { \ + lguest_dump_lg_regs(lg);\ (lg)-dead = kasprintf(GFP_ATOMIC, fmt);\ if (!(lg)-dead)\ (lg)-dead = (void *)1; \ @@ -248,5 +251,11 @@ static inline unsigned long guest_pa(str { return vaddr - lg-page_offset; } + +/* lguest_debug.c */ +void lguest_print_address(struct lguest *lg, unsigned long address); +void lguest_dump_trace(struct lguest *lg, struct lguest_regs *regs); +void lguest_dump_lg_regs(struct lguest *lg); + #endif /* __ASSEMBLY__ */ #endif /* _LGUEST_H */ Index: linux-2.6.21-rc5-mm2/drivers/lguest/lguest.c
[PATCH] Lguest32 print hex on bad reads and writes
Currently the lguest32 error messages from bad reads and writes prints a decimal integer for addresses. This is pretty annoying. So this patch changes those to be hex outputs. This is applied on top of my debug patch. Signed-off-by: Steven Rostedt [EMAIL PROTECTED] Index: linux-2.6.21-rc5-mm2/drivers/lguest/core.c === --- linux-2.6.21-rc5-mm2.orig/drivers/lguest/core.c +++ linux-2.6.21-rc5-mm2/drivers/lguest/core.c @@ -220,7 +220,7 @@ u8 lgread_u8(struct lguest *lg, u32 addr /* Don't let them access lguest binary */ if (!lguest_address_ok(lg, addr) || get_user(val, (u32 __user *)addr) != 0) - kill_guest(lg, bad read address %u, addr); + kill_guest(lg, bad read address %x, addr); return val; } @@ -231,7 +231,7 @@ u16 lgread_u16(struct lguest *lg, u32 ad /* Don't let them access lguest binary */ if (!lguest_address_ok(lg, addr) || get_user(val, (u32 __user *)addr) != 0) - kill_guest(lg, bad read address %u, addr); + kill_guest(lg, bad read address %x, addr); return val; } @@ -242,7 +242,7 @@ u32 lgread_u32(struct lguest *lg, u32 ad /* Don't let them access lguest binary */ if (!lguest_address_ok(lg, addr) || get_user(val, (u32 __user *)addr) != 0) - kill_guest(lg, bad read address %u, addr); + kill_guest(lg, bad read address %x, addr); return val; } @@ -250,7 +250,7 @@ void lgwrite_u32(struct lguest *lg, u32 { if (!lguest_address_ok(lg, addr) || put_user(val, (u32 __user *)addr) != 0) - kill_guest(lg, bad write address %u, addr); + kill_guest(lg, bad write address %x, addr); } void lgread(struct lguest *lg, void *b, u32 addr, unsigned bytes) @@ -259,7 +259,7 @@ void lgread(struct lguest *lg, void *b, || copy_from_user(b, (void __user *)addr, bytes) != 0) { /* copy_from_user should do this, but as we rely on it... */ memset(b, 0, bytes); - kill_guest(lg, bad read address %u len %u, addr, bytes); + kill_guest(lg, bad read address %x len %u, addr, bytes); } } @@ -268,7 +268,7 @@ void lgwrite(struct lguest *lg, u32 addr if (addr + bytes addr || !lguest_address_ok(lg, addr+bytes) || copy_to_user((void __user *)addr, b, bytes) != 0) - kill_guest(lg, bad write address %u len %u, addr, bytes); + kill_guest(lg, bad write address %x len %u, addr, bytes); } static void set_ts(unsigned int guest_ts) ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH] Lguest32 print hex on bad reads and writes
On Wed, 2007-04-04 at 23:06 -0400, Kyle Moffett wrote: (Erk, I wonder what I was thinking when I wrote that?) Can I ask for %#x (or 0x%x)? I'm easily confused. How about %p for pointers? But that would require casting the numbers to pointers. -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH] lguest32 kallsyms backtrace of guest.
On Thu, 2007-04-05 at 12:54 +1000, Rusty Russell wrote: This is a cool idea, but there are two issues with this patch. The first is that it's 500 lines of code: that's around +10% on lguest's total code size! The second is that it conflicts with the medium-term plan to allow any user to run up lguests: this is why lg.ko never printk()s about problems with the guest. Not much I can do about the size, but it's in the debug section so hopefully it's not considered too bad :) While it is useful for cases where a guest dies mysteriously before it brings up the console, three alternatives come to mind: 1) Modify early_printk so Guests can use it. 2) Have a separate tool(-set?) for this kind of post-mortem. Then you just have to implement guest suspend! 8) 3) Put this in a CONFIG_LGUEST_DEBUG. Note that options 1 or 2 make you do more work, but are probably better in the long term. I'm happy for #3 to sit as a patch in the tree for the duration, tho! OK, I'll make a #3 patch to send, but the #1 looks best. Not to mention that I still need to make it so that the console can read it. -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
[PATCH] Lguest launcher, child starving parent
Glauber noticed long delays between hitting a key, and seeing data come up on the virtual console. Looking into this, I found that the wake_parent routine that reads from all devices was actually starving out the parent after sending the parent a signal to wake up. The thing is, the child which takes the console input is recognized by the scheduler as an interactive process. The parent, doesn't do so much, so it is recognized more as a CPU hog. So the child easily gets a higher priority than the parent. So when the child receives data from the console, it sends a signal to the parent and then does another select. The problem is that the select doesn't actually read from the device, and will return immediately since their is still data pending until it is read. But it's the parent that reads the data. So the child actually starves the parent from reading the data by spinning and waiting for it to read the data. The fix I implemented was to have the child wait for a response from the parent before going on. Since there was already communication between the parent and child via a pipe, I used that. This time, the data returned by the pipe is either everything went fine (fd == -1) or the device should be ignored (fd = 0). Signed-off-by: Steven Rostedt [EMAIL PROTECTED] Index: linux-2.6-lguest/Documentation/lguest/lguest.c === --- linux-2.6-lguest.orig/Documentation/lguest/lguest.c 2007-04-05 16:13:08.0 -0400 +++ linux-2.6-lguest/Documentation/lguest/lguest.c 2007-04-05 16:16:31.0 -0400 @@ -328,15 +328,15 @@ static void wake_parent(int pipefd, stru for (;;) { fd_set rfds = devices-infds; + int ignorefd; select(devices-max_infd+1, rfds, NULL, NULL, NULL); - if (FD_ISSET(pipefd, rfds)) { - int ignorefd; - if (read(pipefd, ignorefd, sizeof(ignorefd)) == 0) - exit(0); - FD_CLR(ignorefd, devices-infds); - } kill(getppid(), SIGUSR1); + /* wait for parent response */ + if (read(pipefd, ignorefd, sizeof(ignorefd)) == 0) + exit(0); + if (ignorefd = 0) + FD_CLR(ignorefd, devices-infds); } } @@ -594,6 +594,7 @@ static void handle_output(int fd, unsign static void handle_input(int fd, int childfd, struct device_list *devices) { struct timeval poll = { .tv_sec = 0, .tv_usec = 0 }; + int fdok = -1; for (;;) { struct device *i; @@ -608,7 +609,9 @@ static void handle_input(int fd, int chi FD_CLR(i-fd, devices-infds); /* Tell child to ignore it too... */ write(childfd, i-fd, sizeof(i-fd)); - } + } else + /* Tell child to continue */ + write(childfd, fdok, sizeof(fdok)); } } } ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: lguest over qemu
(Note that lguest doesn't support NMIs, but Steven has code for NMI support for lguest-x86-64 which could be ported across). Rusty, About that. Is there a way to get a NMI only stack in i386? In x86_64 it's part of the TSS. So I can always know I have a good stack for the NMI. I'm not sure i368 has the same thing. Or do we always have a good stack whenever we are in ring0? Oh, and btw, I've just rewrote all of the Lguest64 page table handling. I'm just going over one design change that is really bothering me. In x86_64 we can have 2M or 4k pages (like the PSE in i386). But since 4K pages are used by the shadow page tables, I have to map them like that. But this means that I can have the same guest address as both a PMD and a PTE. Which is breaking some of my code. I'm working on a fix as I write this. But to get you up-to-date to where I'm at. I've implemented a way to have the HV mapped uniquely for all VCPUs. So there's a HV text section (the same for all VCPUs), a HV VCPU Data section (readonly in all rings with guest cr3), and a HV VCPU Scratch Pad section (read/write in rings 0,1,and2). So now the guest kernel runs in ring 1. With this change, I already implemented a syscall trampoline that no longer needs to switch to the host, as well as iretq by the guest kernel goes directly to the guest user space (or kernel). The next version of lguest64 will be much cleaner and faster -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 18/25] [PATCH] turn priviled operations into macros in entry.S
Hi Andi, Thanks for all the comments, it's greatly appreciated. On Wed, 8 Aug 2007, Andi Kleen wrote: +#define SYSRETQ\ + movq%gs:pda_oldrsp,%rsp;\ + swapgs; \ + sysretq; When the macro does more than sysret it should have a different name Noted. Do you have a better idea? Something like SETSTACK_SWAPGS_SYSRETQ? */ .globl int_ret_from_sys_call int_ret_from_sys_call: - cli + DISABLE_INTERRUPTS(CLBR_ANY) ANY? There are certainly some registers alive at this point like rax Glauber will need to address this, this is his code ;-) retint_restore_args: - cli + DISABLE_INTERRUPTS(CLBR_ANY) Similar. /* * The iretq could re-enable interrupts: */ @@ -566,10 +587,14 @@ retint_restore_args: restore_args: RESTORE_ARGS 0,8,0 iret_label: - iretq +#ifdef CONFIG_PARAVIRT + INTERRUPT_RETURN +ENTRY(native_iret) ENTRY adds alignment. Why do you need that export anyways? The paravirt ops struct points to it. +#endif +1: iretq .section __ex_table,a - .quad iret_label,bad_iret + .quad 1b, bad_iret iret_label seems more expressive to me than 1 The reason for this change is because of the added: #ifdef CONFIG_PARAVIRT INTERRUPT_RETURN ENTRY(native_iret) #endif If we are paravirt ops, we need the iretq in the exception table, not the paravit ops function call. Since that function call may simply call the native_iretq, and if we take a fault at the iretq, it wont be in the exception table. + ENABLE_INTERRUPTS(CLBR_NONE) In many of the CLBR_NONEs there are actually some registers free; but it might be safer to keep it this way. But if some client can get significantly better code with one or two free registers it might be worthwhile to investigate. Glauber, that comment is for you. - swapgs + SWAPGS_NOSTACK There's still stack here OK, bad name then. How about SWAPGS_UNTRUSTED_STACK? From earlier in the file where SWAPGS_NOSTACK is declared we have: /* Currently paravirt can't handle swapgs nicely when we * don't have a stack. So we either find a way around these * or just fault and emulate if a guest tries to call swapgs * directly. * * Either way, this is a good way to document that we don't * have a reliable stack. */ #define SWAPGS_NOSTACK swapgs paranoid_restore\trace: RESTORE_ALL 8 - iretq + INTERRUPT_RETURN I suspect Xen will need much more changes anyways because of its ring 3 guest. Are these changes sufficient for lguest? Probably not, but this part of the code I don't fully understand. But just doing this doesn't break lguest. But perhaps it only did not break it yet ;-) Thanks, -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 25/25] [PATCH] add paravirtualization support for x86_64
On Wed, 8 Aug 2007, Andi Kleen wrote: Strictly you would at least need a !X86_VSMP dependency, but with the vsmp change i requested that will be unnecessary Is this really synced with the latest version of the i386 code? Glauber started the paravirt ops 64 a second time around, from scratch using the PVOPS of i386 as a base. But since we couldn't just take a the PVOPS patch from i386 and apply it to x86_64, this was mainly done by looking at i386 code and massaging it for x86_64. Somethings may have slipped (and we may have been looking at different versions of PVOPS). It's not easy trying to keep up with a moving target ;-) -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 18/25] [PATCH] turn priviled operations into macros in entry.S
On Wed, 8 Aug 2007, Andi Kleen wrote: Probably not, but this part of the code I don't fully understand. I would suggest to defer all this until at least one example to test it (except vsmp which is too simple) is around. Who uses that code? NMIs and debug regs? Lguest only has the host handle the NMIs (doesn't pass to guest). And we haven't gotten to debug regs. Who else uses that part of the code? We can leave the paranoid check change out for this series. -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 18/25] [PATCH] turn priviled operations into macros in entry.S
-- On Wed, 8 Aug 2007, Andi Kleen wrote: Steven Rostedt [EMAIL PROTECTED] writes: On Wed, 8 Aug 2007, Andi Kleen wrote: Probably not, but this part of the code I don't fully understand. I would suggest to defer all this until at least one example to test it (except vsmp which is too simple) is around. Who uses that code? NMIs and debug regs? Lguest only has the host handle the NMIs (doesn't pass to guest). And we haven't gotten to debug regs. Who else uses that part of the code? I'm not sure I understand your question. You're asking who uses entry.S? Answer would be everybody. If you asked something else please reformulate. When I said this part of the code I don't fully understand I was not talking about entry.S. I understand entry.S very well, but the comment was originally on the paranoid_restore code. Which I thought had to deal with NMIs and such that I didn't worry about that I simply did the default. paranoid_restore\trace: RESTORE_ALL 8 - iretq + INTERRUPT_RETURN I suspect Xen will need much more changes anyways because of its ring 3 guest. Are these changes sufficient for lguest? The above was what I was replying to. -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 18/25] [PATCH] turn priviled operations into macros in entry.S
-- On Wed, 8 Aug 2007, Andi Kleen wrote: On Wed, Aug 08, 2007 at 09:47:05AM -0400, Steven Rostedt wrote: [...] asm volatile (pushq %2; pushq %%rsp; pushfq; pushq %3; call *%6; /* The stack we pushed is off by 8, due to the previous pushq */ addq $8, %%rsp Weird stack inbalance, i'm surprised that works. %6 must be doing strange things. Heh, no it's very subtle. We do the push %rsp after the push %2. So on return from the call, the stack that was pushed, is really 8 bytes off. Because we didn't save the stack pointer from the entry of the asm, we saved it after doing that push %2. :-) We get back to this call via a retq from the HV switcher code. So that rsp that was pushed will be the stack. /* Need to read manual, does rdmsr clear * the top 32 bits of rax? */ xor %rax, %rax movl$MSR_GS_BASE,%ecx rdmsr This seems incredibly slow. Since GS changes are controlled in the kernel why don't you just cache them in the per cpu area? The only case where user can reload it is using segment selector changes, which can be also handled. Also why do you need the guest gs value anyways? I did a swapgs here, so we have the HV GS value. I don't know of an easy way to read the GS value to put it into the RSP. You could just use your own stack and let the guest switch to its own. Well, it is quite complex, since we can't save anything in the writable section that the guest can't also write to. Remember, we are in ring 1 for the guest kernel, so we have no protection. All the host related stuff is in read only sections when we are using the guest cr3. What I would like to do is have the GS point to the read only section, and load the RSP from a value there. But first we need to store the old value of RSP. The problem is, where do you store it. Hmm, I guess I can create my own big offset from the read only section :) OK this gives me an idea, I *can* point the kernel GS to the vcpu read only section, and have a way to store the value with the GS offset. The read only section is always mapped before the read write section, so it should be easy to calculate the diff! Thanks for the comment, you guided me to this great idea! -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
[PATCH 2/7] Added generic lg.h in lguest directory.
Add a generic lg.h file to call the architecture specific one. Signed-off-by: Steven Rostedt [EMAIL PROTECTED] --- drivers/lguest/lg.h |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h new file mode 100644 index 000..4c4356e --- /dev/null +++ b/drivers/lguest/lg.h @@ -0,0 +1,3 @@ +#ifdef CONFIG_X86_32 +#include i386/lg.h +#endif -- 1.4.4.4 -- ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
[PATCH 5/7] Change lguest launcher to use asm generic include
Have the lguest launcher include e820.h via asm/e820.h instead of explicitly saying i386. Signed-off-by: Steven Rostedt [EMAIL PROTECTED] --- Documentation/lguest/lguest.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/Documentation/lguest/lguest.c b/Documentation/lguest/lguest.c index f791840..eb7a01c 100644 --- a/Documentation/lguest/lguest.c +++ b/Documentation/lguest/lguest.c @@ -46,7 +46,7 @@ typedef uint32_t u32; typedef uint16_t u16; typedef uint8_t u8; #include ../../include/linux/lguest_launcher.h -#include ../../include/asm-i386/e820.h +#include ../../include/asm/e820.h /*:*/ #define PAGE_PRESENT 0x7 /* Present, RW, Execute */ -- 1.4.4.4 -- ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
[PATCH 7/7] Move lguest_dma_info into generic lg.h
The lguest_dma_info is also generic across architectures. Move it to the generic lg.h Signed-off-by: Steven Rostedt [EMAIL PROTECTED] --- drivers/lguest/i386/lg.h | 11 --- drivers/lguest/lg.h | 11 +++ 2 files changed, 11 insertions(+), 11 deletions(-) diff --git a/drivers/lguest/i386/lg.h b/drivers/lguest/i386/lg.h index c5ea14c..ae83bd2 100644 --- a/drivers/lguest/i386/lg.h +++ b/drivers/lguest/i386/lg.h @@ -49,17 +49,6 @@ int init_pagetables(struct page **switcher_page, unsigned int pages); #define FULL_EXEC_SEGMENT ((struct desc_struct){0x, 0x00cf9b00}) #define FULL_SEGMENT ((struct desc_struct){0x, 0x00cf9300}) -struct lguest_dma_info -{ - struct list_head list; - union futex_key key; - unsigned long dmas; - u16 next_dma; - u16 num_dmas; - u16 guestid; - u8 interrupt; /* 0 when not registered */ -}; - /*H:310 The page-table code owes a great debt of gratitude to Andi Kleen. He * reviewed the original code which used u32 for all page table entries, and * insisted that it would be far clearer with explicit typing. I thought it diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h index 3147bf6..6057e3c 100644 --- a/drivers/lguest/lg.h +++ b/drivers/lguest/lg.h @@ -12,6 +12,17 @@ void release_all_dma(struct lguest *lg); unsigned long get_dma_buffer(struct lguest *lg, unsigned long key, unsigned long *interrupt); +struct lguest_dma_info +{ + struct list_head list; + union futex_key key; + unsigned long dmas; + u16 next_dma; + u16 num_dmas; + u16 guestid; + u8 interrupt; /* 0 when not registered */ +}; + #ifdef CONFIG_X86_32 #include i386/lg.h #endif -- 1.4.4.4 -- ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
[PATCH 1/7] Move lg.h to the i386 specific lguest directory
Make a i386 directory in the lguest directory, and move the lg.h into it. This will clear the way for other archs to have their own lg.h Signed-off-by: Steven Rostedt [EMAIL PROTECTED] --- drivers/lguest/i386/lg.h | 300 ++ drivers/lguest/lg.h | 300 -- 2 files changed, 300 insertions(+), 300 deletions(-) diff --git a/drivers/lguest/i386/lg.h b/drivers/lguest/i386/lg.h new file mode 100644 index 000..64f0abe --- /dev/null +++ b/drivers/lguest/i386/lg.h @@ -0,0 +1,300 @@ +#ifndef _LGUEST_H +#define _LGUEST_H + +#include asm/desc.h + +#define GDT_ENTRY_LGUEST_CS10 +#define GDT_ENTRY_LGUEST_DS11 +#define LGUEST_CS (GDT_ENTRY_LGUEST_CS * 8) +#define LGUEST_DS (GDT_ENTRY_LGUEST_DS * 8) + +#ifndef __ASSEMBLY__ +#include linux/types.h +#include linux/init.h +#include linux/stringify.h +#include linux/binfmts.h +#include linux/futex.h +#include linux/lguest.h +#include linux/lguest_launcher.h +#include linux/wait.h +#include linux/err.h +#include asm/semaphore.h +#include irq_vectors.h + +#define GUEST_PL 1 + +struct lguest_regs +{ + /* Manually saved part. */ + unsigned long ebx, ecx, edx; + unsigned long esi, edi, ebp; + unsigned long gs; + unsigned long eax; + unsigned long fs, ds, es; + unsigned long trapnum, errcode; + /* Trap pushed part */ + unsigned long eip; + unsigned long cs; + unsigned long eflags; + unsigned long esp; + unsigned long ss; +}; + +void free_pagetables(void); +int init_pagetables(struct page **switcher_page, unsigned int pages); + +/* Full 4G segment descriptors, suitable for CS and DS. */ +#define FULL_EXEC_SEGMENT ((struct desc_struct){0x, 0x00cf9b00}) +#define FULL_SEGMENT ((struct desc_struct){0x, 0x00cf9300}) + +struct lguest_dma_info +{ + struct list_head list; + union futex_key key; + unsigned long dmas; + u16 next_dma; + u16 num_dmas; + u16 guestid; + u8 interrupt; /* 0 when not registered */ +}; + +/*H:310 The page-table code owes a great debt of gratitude to Andi Kleen. He + * reviewed the original code which used u32 for all page table entries, and + * insisted that it would be far clearer with explicit typing. I thought it + * was overkill, but he was right: it is much clearer than it was before. + * + * We have separate types for the Guest's ptes pgds and the shadow ptes + * pgds. There's already a Linux type for these (pte_t and pgd_t) but they + * change depending on kernel config options (PAE). */ + +/* Each entry is identical: lower 12 bits of flags and upper 20 bits for the + * page frame number (0 == first physical page, etc). They are different + * types so the compiler will warn us if we mix them improperly. */ +typedef union { + struct { unsigned flags:12, pfn:20; }; + struct { unsigned long val; } raw; +} spgd_t; +typedef union { + struct { unsigned flags:12, pfn:20; }; + struct { unsigned long val; } raw; +} spte_t; +typedef union { + struct { unsigned flags:12, pfn:20; }; + struct { unsigned long val; } raw; +} gpgd_t; +typedef union { + struct { unsigned flags:12, pfn:20; }; + struct { unsigned long val; } raw; +} gpte_t; + +/* We have two convenient macros to convert a raw value as handed to us by + * the Guest into the correct Guest PGD or PTE type. */ +#define mkgpte(_val) ((gpte_t){.raw.val = _val}) +#define mkgpgd(_val) ((gpgd_t){.raw.val = _val}) +/*:*/ + +struct pgdir +{ + unsigned long cr3; + spgd_t *pgdir; +}; + +/* This is a guest-specific page (mapped ro) into the guest. */ +struct lguest_ro_state +{ + /* Host information we need to restore when we switch back. */ + u32 host_cr3; + struct Xgt_desc_struct host_idt_desc; + struct Xgt_desc_struct host_gdt_desc; + u32 host_sp; + + /* Fields which are used when guest is running. */ + struct Xgt_desc_struct guest_idt_desc; + struct Xgt_desc_struct guest_gdt_desc; + struct i386_hw_tss guest_tss; + struct desc_struct guest_idt[IDT_ENTRIES]; + struct desc_struct guest_gdt[GDT_ENTRIES]; +}; + +/* We have two pages shared with guests, per cpu. */ +struct lguest_pages +{ + /* This is the stack page mapped rw in guest */ + char spare[PAGE_SIZE - sizeof(struct lguest_regs)]; + struct lguest_regs regs; + + /* This is the host state guest descriptor page, ro in guest */ + struct lguest_ro_state state; +} __attribute__((aligned(PAGE_SIZE))); + +#define CHANGED_IDT1 +#define CHANGED_GDT2 +#define CHANGED_GDT_TLS4 /* Actually a subset of CHANGED_GDT */ +#define CHANGED_ALL3 + +/* The private info the thread maintains about the guest. */ +struct lguest +{ + /* At end of a page shared mapped over lguest_pages in guest
[PATCH 4/7] Moved the io struct up to the generic lg.h
Move the io struct into the lg.h file since the io.c is generic to other architectures. Also added a proper ifdef for the generic lg.h. Signed-off-by: Steven Rostedt [EMAIL PROTECTED] --- drivers/lguest/i386/lg.h | 11 ++- drivers/lguest/lg.h | 16 2 files changed, 18 insertions(+), 9 deletions(-) diff --git a/drivers/lguest/i386/lg.h b/drivers/lguest/i386/lg.h index 64f0abe..c5ea14c 100644 --- a/drivers/lguest/i386/lg.h +++ b/drivers/lguest/i386/lg.h @@ -20,6 +20,8 @@ #include linux/err.h #include asm/semaphore.h #include irq_vectors.h +/* some files include this, some include the parent */ +#include ../lg.h #define GUEST_PL 1 @@ -245,15 +247,6 @@ void pin_page(struct lguest *lg, unsigned long vaddr); int lguest_device_init(void); void lguest_device_remove(void); -/* io.c: */ -void lguest_io_init(void); -int bind_dma(struct lguest *lg, -unsigned long key, unsigned long udma, u16 numdmas, u8 interrupt); -void send_dma(struct lguest *info, unsigned long key, unsigned long udma); -void release_all_dma(struct lguest *lg); -unsigned long get_dma_buffer(struct lguest *lg, unsigned long key, -unsigned long *interrupt); - /* hypercalls.c: */ void do_hypercalls(struct lguest *lg); void write_timestamp(struct lguest *lg); diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h index 4c4356e..3147bf6 100644 --- a/drivers/lguest/lg.h +++ b/drivers/lguest/lg.h @@ -1,3 +1,19 @@ +#ifndef _LGUEST_LG_GENERIC +#define _LGUEST_LG_GENERIC + +struct lguest; + +/* io.c: */ +void lguest_io_init(void); +int bind_dma(struct lguest *lg, +unsigned long key, unsigned long udma, u16 numdmas, u8 interrupt); +void send_dma(struct lguest *info, unsigned long key, unsigned long udma); +void release_all_dma(struct lguest *lg); +unsigned long get_dma_buffer(struct lguest *lg, unsigned long key, +unsigned long *interrupt); + #ifdef CONFIG_X86_32 #include i386/lg.h #endif + +#endif /* _LGUEST_LG_GENERIC */ -- 1.4.4.4 -- ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
[PATCH 6/7] Remove __pa() use in hvc_lguest
The hvc_lguest uses __pa in the const initialization. In some architectures, __pa() is not constant so this fails in compiles. Signed-off-by: Steven Rostedt [EMAIL PROTECTED] --- drivers/char/hvc_lguest.c |8 +++- 1 files changed, 7 insertions(+), 1 deletions(-) diff --git a/drivers/char/hvc_lguest.c b/drivers/char/hvc_lguest.c index feeccba..c7d7d4c 100644 --- a/drivers/char/hvc_lguest.c +++ b/drivers/char/hvc_lguest.c @@ -42,7 +42,6 @@ * use of physical address for the buffer itself. */ static char inbuf[256]; static struct lguest_dma cons_input = { .used_len = 0, - .addr[0] = __pa(inbuf), .len[0] = sizeof(inbuf), .len[1] = 0 }; @@ -114,6 +113,13 @@ static struct hv_ops lguest_cons = { * (0), and the struct hv_ops containing the put_chars() function. */ static int __init cons_init(void) { + /* +* We can't initialize using __pa in const declarations, +* since __pa(inbuf) does not evaluate into a constant on +* all architectures (namely x86_64). +*/ + cons_input.addr[0] = __pa(inbuf); + if (strcmp(paravirt_ops.name, lguest) != 0) return 0; -- 1.4.4.4 -- ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/7] Added generic lg.h in lguest directory.
-- On Thu, 9 Aug 2007, Stephen Rothwell wrote: On Wed, 08 Aug 2007 20:32:13 -0400 Steven Rostedt [EMAIL PROTECTED] wrote: Add a generic lg.h file to call the architecture specific one. Please combine this with the previous patch so that the tree will build at each step. Yeah, I wanted to do that, but git got confused when I did that, and did what I didn't want. That was, created a new file in the i386 directory, and diffed the one in this directory. I'm really want to keep git knowing the history of the file on moving. I could just simply cat the two files instead. Would that be acceptible? -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
[PATCH 0/5 -v2] Modify lguest32 to make room for lguest64 (version 2)
[ Changes since last version. - Move lg.h to include/asm instead (suggested by Rusty Russel) - All steps of the series compiles (suggested by Stephen Rothwell) - Better ifdef header naming (suggested by Stephen Rothwell) - Added Andi Kleen to CC (forgot to on V1) ] Hi all, I've been working on lguest64 and in order to do this, I had to move a lot of the i386 specific out of the way. Well, the lguest64 port is still not ready to display, but before Rusty makes too many changes I would like this in upstream so I don't have to keep repeating my changes :-) So this patch series moves lguest32 out of the way for other archs. -- Steve -- ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
[PATCH 4/5 -v2] Added generic lg.h in the include/linux directory
Add a generic lg.h that can be included from lguest files. This file will hold the data that can be shared across archs. Signed-off-by: Steven Rostedt [EMAIL PROTECTED] --- arch/i386/kernel/asm-offsets.c|2 +- drivers/lguest/core.c |2 +- drivers/lguest/hypercalls.c |2 +- drivers/lguest/interrupts_and_traps.c |2 +- drivers/lguest/io.c |2 +- drivers/lguest/lguest_user.c |2 +- drivers/lguest/page_tables.c |2 +- drivers/lguest/segments.c |2 +- include/asm-i386/lg.h | 24 ++-- include/linux/lg.h| 30 ++ 10 files changed, 40 insertions(+), 30 deletions(-) diff --git a/arch/i386/kernel/asm-offsets.c b/arch/i386/kernel/asm-offsets.c index e426e51..d6d17e5 100644 --- a/arch/i386/kernel/asm-offsets.c +++ b/arch/i386/kernel/asm-offsets.c @@ -21,7 +21,7 @@ #ifdef CONFIG_LGUEST_GUEST #include linux/lguest.h -#include asm/lg.h +#include linux/lg.h #endif #define DEFINE(sym, val) \ diff --git a/drivers/lguest/core.c b/drivers/lguest/core.c index e5f441d..5bcd237 100644 --- a/drivers/lguest/core.c +++ b/drivers/lguest/core.c @@ -19,7 +19,7 @@ #include asm/highmem.h #include asm/asm-offsets.h #include asm/i387.h -#include asm/lg.h +#include linux/lg.h /* Found in switcher.S */ extern char start_switcher_text[], end_switcher_text[], switch_to_guest[]; diff --git a/drivers/lguest/hypercalls.c b/drivers/lguest/hypercalls.c index 0ddb133..3c0d41e 100644 --- a/drivers/lguest/hypercalls.c +++ b/drivers/lguest/hypercalls.c @@ -26,7 +26,7 @@ #include asm/page.h #include asm/pgtable.h #include irq_vectors.h -#include asm/lg.h +#include linux/lg.h /*H:120 This is the core hypercall routine: where the Guest gets what it * wants. Or gets killed. Or, in the case of LHCALL_CRASH, both. diff --git a/drivers/lguest/interrupts_and_traps.c b/drivers/lguest/interrupts_and_traps.c index 899e681..3daf3e2 100644 --- a/drivers/lguest/interrupts_and_traps.c +++ b/drivers/lguest/interrupts_and_traps.c @@ -12,7 +12,7 @@ * them first, so we also have a way of reflecting them into the Guest as if * they had been delivered to it directly. :*/ #include linux/uaccess.h -#include asm/lg.h +#include linux/lg.h /* The address of the interrupt handler is split into two bits: */ static unsigned long idt_address(u32 lo, u32 hi) diff --git a/drivers/lguest/io.c b/drivers/lguest/io.c index 091a46d..d196121 100644 --- a/drivers/lguest/io.c +++ b/drivers/lguest/io.c @@ -25,7 +25,7 @@ #include linux/mm.h #include linux/highmem.h #include linux/uaccess.h -#include asm/lg.h +#include linux/lg.h /*L:300 * I/O diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c index a29d85b..2f477df 100644 --- a/drivers/lguest/lguest_user.c +++ b/drivers/lguest/lguest_user.c @@ -7,7 +7,7 @@ #include linux/uaccess.h #include linux/miscdevice.h #include linux/fs.h -#include asm/lg.h +#include linux/lg.h /*L:030 setup_regs() doesn't really belong in this file, but it gives us an * early glimpse deeper into the Host so it's worth having here. diff --git a/drivers/lguest/page_tables.c b/drivers/lguest/page_tables.c index db3dd8b..812d56a 100644 --- a/drivers/lguest/page_tables.c +++ b/drivers/lguest/page_tables.c @@ -13,7 +13,7 @@ #include linux/random.h #include linux/percpu.h #include asm/tlbflush.h -#include asm/lg.h +#include linux/lg.h /*M:008 We hold reference to pages, which prevents them from being swapped. * It'd be nice to have a callback in the struct mm_struct when Linux wants diff --git a/drivers/lguest/segments.c b/drivers/lguest/segments.c index 1d16d5d..351b13f 100644 --- a/drivers/lguest/segments.c +++ b/drivers/lguest/segments.c @@ -9,7 +9,7 @@ * In these modern times, the segment handling code consists of simple sanity * checks, and the worst you'll experience reading this code is butterfly-rash * from frolicking through its parklike serenity. :*/ -#include asm/lg.h +#include linux/lg.h /*H:600 * We've almost completed the Host; there's just one file to go! diff --git a/include/asm-i386/lg.h b/include/asm-i386/lg.h index 64f0abe..c8e01c3 100644 --- a/include/asm-i386/lg.h +++ b/include/asm-i386/lg.h @@ -1,5 +1,5 @@ -#ifndef _LGUEST_H -#define _LGUEST_H +#ifndef _I386_LGUEST_H +#define _I386_LGUEST_H #include asm/desc.h @@ -47,17 +47,6 @@ int init_pagetables(struct page **switcher_page, unsigned int pages); #define FULL_EXEC_SEGMENT ((struct desc_struct){0x, 0x00cf9b00}) #define FULL_SEGMENT ((struct desc_struct){0x, 0x00cf9300}) -struct lguest_dma_info -{ - struct list_head list; - union futex_key key; - unsigned long dmas; - u16 next_dma; - u16 num_dmas; - u16 guestid; - u8 interrupt; /* 0 when not registered */ -}; - /*H:310 The page-table code owes a great debt of gratitude to Andi Kleen. He
[PATCH 2/5 -v2] Change lguest launcher to use asm generic include
Have the lguest launcher include e820.h via asm/e820.h instead of explicitly saying i386. Signed-off-by: Steven Rostedt [EMAIL PROTECTED] --- Documentation/lguest/lguest.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/Documentation/lguest/lguest.c b/Documentation/lguest/lguest.c index f791840..eb7a01c 100644 --- a/Documentation/lguest/lguest.c +++ b/Documentation/lguest/lguest.c @@ -46,7 +46,7 @@ typedef uint32_t u32; typedef uint16_t u16; typedef uint8_t u8; #include ../../include/linux/lguest_launcher.h -#include ../../include/asm-i386/e820.h +#include ../../include/asm/e820.h /*:*/ #define PAGE_PRESENT 0x7 /* Present, RW, Execute */ -- 1.4.4.4 -- ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 18/25] [PATCH] turn priviled operations into macros in entry.S
-- On Wed, 8 Aug 2007, Jeremy Fitzhardinge wrote: Steven Rostedt wrote: /* * x86 arch doesn't have an easy way to find out where * gs is located. So we need to read the MSR. But first * we need to save off the rcx, rax and rdx. Why don't you store it in gs? movq %gs:my_gs_base, %rax? Because it can't be trusted. After the swapgs, we are pointing to the RW section of the HV. But by running the guest kernel in ring 1, we have no protection from the guest writing into that area too. So we can't put anything in that section and expect it to be safe after jumping to guest code. The only trusted values, is jumping in after getting there by an interrupt, where the hardware places the values onto the stack. But with Andi's comments, I realized I can point the gs pointer to the RO area. And make a constant offset that will point up into the RW area, so we could save the stack and replace it. -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 25/25] [PATCH] add paravirtualization support for x86_64
-- On Thu, 9 Aug 2007, Andi Kleen wrote: This has to match the normal C calling convention though, doesn't it? Native cli/sti/save/restore_flags are all only assembly and can be easily (in fact more easily than in C) written as pure assembler functions. Then you can use whatever calling convention you want. I agree. Should we make a paravirt_ops_asm.S file that can implement these native funcions, and so we can get rid of the C functions only doing asm? While some paravirt implementations may have more complicated implementations i guess it's still a reasonable requirement to make them simple enough in pure assembler. If not they can use a trampoline, but that's hopefully not needed. It works for lguest64. I'm sure it should be no problem with other HVs. -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH] input: Fix interrupt enable in i8042_ctr when enabling interrupt fails
-- On Mon, 10 Sep 2007, Markus Armbruster wrote: I believe this possible, but unlikely (perhaps not so unlikely on virtual machines). Scenarios involve enable succeeding the first time, failing the second time, and succeeding the third time. I can provide details, but the point I'd like to make is not that this is broken (although it is, strictly speaking), but that it is not obviously correct where it easily could be: just clear the interrupt enable bits when writing them to the hardware failed, like the old code did. I also want to stress that this is more of a clean up for technically correct code than a bug fix. This bug probably would never happen on baremetal unless it was running on broken hardware. BUT!!! With more and more systems going to a virtual environment, having a bug or some other anomaly can trigger the error that this patch prevents. The patch will also trigger a print in the case of running on a buggy virtual machine, which would help out the developers of that virtual machine to fix their code. Please apply. -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 7/24] consolidate msr.h
On Mon, 19 Nov 2007, Bastian Blank wrote: On Fri, Nov 09, 2007 at 04:42:48PM -0200, Glauber de Oliveira Costa wrote: - wrmsrl(MSR_CSTAR, ia32_cstar_target); + wrmsrl(MSR_CSTAR, (u64)ia32_cstar_target); Hmm, why do you add explicit casts? The compiler should convert that correctly on its own. +static inline void wrmsrl(unsigned int msr, unsigned long long val) Hmm, long long is 64 bit on all x86, but why not use explicit u64 to show that? (quick reply) With PVOPS on it gives compiler warnings without that explict cast. Without looking at the code, IIRC with non-PVOPS it is a macro directly into asm, so it didn't matter what the cast was. But with PVOPS as a function, it gave compiler warnings. Take it out and try compiling it for both i386 and x86_64. One of them gave warnings. But maybe it's not a problem now. -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/6] Staging: hv: Unify hyper-v device abstractions
Hi! Just an FYI, When sending out patch series, could you send out a [PATCH 0/6] first, and have all other patches a reply to that patch. Both git and quilt have ways to do this. The reason is that the linux kernel mailing list gets over 600 emails a day, and for those of us that skim through, we like to hit [make thread as read] when we are not interested in the thread or patch set. When people send out a set of patches that are all individual, this means I need to hit this command for every patch. Also do not think I'm just picking on you. I've complained to several people about doing this. One just recently sent out 21 patches. Thanks, -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [Xen-devel] Re: linux-next: Tree for July 25 (xen)
On Sat, 2011-08-06 at 11:22 -0700, Jeremy Fitzhardinge wrote: On 08/05/2011 06:58 PM, Konrad Rzeszutek Wilk wrote: On Fri, Aug 05, 2011 at 11:14:23PM +0200, Ingo Molnar wrote: * Ingo Molnar mi...@elte.hu wrote: * Randy Dunlap rdun...@xenotime.net wrote: On 08/04/11 15:40, Konrad Rzeszutek Wilk wrote: On Thu, Aug 04, 2011 at 06:32:59PM -0400, Konrad Rzeszutek Wilk wrote: These build failures are still triggering upstream: arch/x86/xen/trace.c:44:2: error: array index in initializer not of integer type arch/x86/xen/trace.c:44:2: error: (near initialization for ‘xen_hypercall_names’) arch/x86/xen/trace.c:45:1: error: ‘__HYPERVISOR_arch_4’ undeclared here (not in a function) arch/x86/xen/trace.c:45:2: error: array index in initializer not of integer type arch/x86/xen/trace.c:45:2: error: (near initialization for ‘xen_hypercall_names’) Oh, that I haven't seen. Can you send me the .config for that please. You can't be trying very hard then. I see lots of these (but no, Ah, I am getting it now. Thanks for reporting it. This should do the trick: Acked-by: Randy Dunlap rdun...@xenotime.net That patch did the trick here too: Acked-by: Ingo Molnar mi...@elte.hu Except that i'm still seeing the occasional build failure - see the error log below. Config attached. Much appreciate for the report. I believe this fix (which I think hit linux-next yesterday?) should do that: commit 1e9ea2656b656edd3c8de98675bbc0340211b5bd Author: Jeremy Fitzhardinge jer...@goop.org Date: Wed Aug 3 09:43:44 2011 -0700 xen/tracing: it looks like we wanted CONFIG_FTRACE Apparently we wanted CONFIG_FTRACE rather the CONFIG_FUNCTION_TRACER. Reported-by: Sander Eikelenboom li...@eikelenboom.it Tested-by: Sander Eikelenboom li...@eikelenboom.it Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile index 45e94ac..3326204 100644 --- a/arch/x86/xen/Makefile +++ b/arch/x86/xen/Makefile @@ -15,7 +15,7 @@ obj-y := enlighten.o setup.o multicalls.o mmu.o irq.o \ grant-table.o suspend.o platform-pci-unplug.o \ p2m.o -obj-$(CONFIG_FUNCTION_TRACER) += trace.o +obj-$(CONFIG_FTRACE) += trace.o I'm not sure this is correct either. Maybe it should be CONFIG_TRACEPOINTS? Steven? Actually, I believe the correct answer is: CONFIG_EVENT_TRACING -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH 2/6] virtio/console: Add a failback for unstealable pipe buffer
On Thu, 2012-08-09 at 18:24 +0900, Masami Hiramatsu wrote: Yeah, it is really easy to fix that. But out of curiosity, would that be really a problem? I guess that host can access any guest page if need. If that is right, is that really insecure to leak randomly allocated unused page to the host? Yeah, it's like protecting userspace pages from the kernel ;-) -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/5] trace-cmd: Use TRACE_DIR envrionment variable if defined
On Wed, 2012-08-22 at 17:43 +0900, Yoshihiro YUNOMAE wrote: From: Masami Hiramatsu masami.hiramatsu...@hitachi.com Use TRACE_DIR environment variable for setting TRACING_DIR would be better, as we are searching for /debug/tracing and not /debug/trace. Perhaps DEBUG_TRACING_DIR would be even better as to be less of a generic term. -- Steve debugfs/tracing directory if defined. This is for controlling guest(or remote) ftrace. Signed-off-by: Masami Hiramatsu masami.hiramatsu...@hitachi.com Signed-off-by: Yoshihiro YUNOMAE yoshihiro.yunomae...@hitachi.com --- trace-util.c |9 + 1 files changed, 9 insertions(+), 0 deletions(-) diff --git a/trace-util.c b/trace-util.c index e128188..d5a3eb4 100644 --- a/trace-util.c +++ b/trace-util.c @@ -311,6 +311,15 @@ char *tracecmd_find_tracing_dir(void) char type[100]; FILE *fp; + tracing_dir = getenv(TRACE_DIR); + if (tracing_dir) { + tracing_dir = strdup(tracing_dir); + if (!tracing_dir) + die(malloc); + warning(Use environmental tracing directory: %s\n, tracing_dir); + return tracing_dir; + } + if ((fp = fopen(/proc/mounts,r)) == NULL) { warning(Can't open /proc/mounts for read); return NULL; ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/5] trace-cmd: Use tracing directory to count CPUs
On Wed, 2012-08-22 at 17:43 +0900, Yoshihiro YUNOMAE wrote: From: Masami Hiramatsu masami.hiramatsu...@hitachi.com Count debugfs/tracing/per_cpu/cpu* to determine the number of CPUs. I'm curious, do you find that sysconf doesn't return the # of CPUs the system has? I've had boxes where the per_cpu/cpu* had more cpus than the box actually holds. But this was a bug in the kernel, not the tool. This change log needs to have rational instead of just explaining what the patch does. Thanks, -- Steve Signed-off-by: Masami Hiramatsu masami.hiramatsu...@hitachi.com Signed-off-by: Yoshihiro YUNOMAE yoshihiro.yunomae...@hitachi.com --- ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 3/5] trace-cmd: Support trace-agent of virtio-trace
On Wed, 2012-08-22 at 17:43 +0900, Yoshihiro YUNOMAE wrote: Add read path and control path to use trace-agent of virtio-trace. When we use trace-agent, trace-cmd will be used as follows: # AGENT_READ_DIR=/tmp/virtio-trace/tracing \ AGENT_CTL=/tmp/virtio-trace/agent-ctl-path.in \ TRACING_DIR=/tmp/virtio-trace/debugfs/tracing \\ Ha! You used TRACING_DIR but patch one introduces TRACE_DIR. Lets change this to DEBUG_TRACING_DIR instead anyway. Also, I don't like the generic environment variables. Perhaps VIRTIO_TRACE_DIR, or AGENT_TRACE_DIR and AGENT_TRACE_CTL. Lets try to keep the environment namespace sparse. trace-cmd record -e sched:* Here, AGENT_READ_DIR is the path for a reading directory of virtio-trace, AGENT_CTL is a control path of trace-agent, and TRACING_DIR is a debugfs path of a guest. Signed-off-by: Yoshihiro YUNOMAE yoshihiro.yunomae...@hitachi.com --- trace-cmd.h |1 + trace-recorder.c | 57 +- trace-util.c | 18 + 3 files changed, 75 insertions(+), 1 deletions(-) diff --git a/trace-cmd.h b/trace-cmd.h index f904dc5..75506ed 100644 --- a/trace-cmd.h +++ b/trace-cmd.h @@ -72,6 +72,7 @@ static inline int tracecmd_host_bigendian(void) } char *tracecmd_find_tracing_dir(void); +char *guest_agent_tracing_read_dir(void); /* --- Opening and Reading the trace.dat file --- */ diff --git a/trace-recorder.c b/trace-recorder.c index 215affc..3b750e9 100644 --- a/trace-recorder.c +++ b/trace-recorder.c @@ -33,6 +33,7 @@ #include unistd.h #include ctype.h #include errno.h +#include stdbool.h #include trace-cmd.h @@ -43,6 +44,8 @@ struct tracecmd_recorder { int page_size; int cpu; int stop; + int ctl_fd; + boolagent_existing; Thanks for the reminder. I need to convert a lot to use 'bool' instead. }; void tracecmd_free_recorder(struct tracecmd_recorder *recorder) @@ -59,11 +62,29 @@ void tracecmd_free_recorder(struct tracecmd_recorder *recorder) free(recorder); } +static char *use_trace_agent_dir(char *ctl_path, + struct tracecmd_recorder *recorder) +{ + ctl_path = strdup(ctl_path); + if (!ctl_path) + die(malloc); + warning(Use environmental control path: %s\n, ctl_path); s/Use/Using/ -- Steve + + recorder-ctl_fd = open(ctl_path, O_WRONLY); + if (recorder-ctl_fd 0) + return NULL; + + recorder-agent_existing = true; + + return guest_agent_tracing_read_dir(); +} + ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/5] trace-cmd: Use tracing directory to count CPUs
On Thu, 2012-08-23 at 12:00 +0900, Masami Hiramatsu wrote: (2012/08/23 11:01), Masami Hiramatsu wrote: (2012/08/22 22:41), Steven Rostedt wrote: On Wed, 2012-08-22 at 17:43 +0900, Yoshihiro YUNOMAE wrote: From: Masami Hiramatsu masami.hiramatsu...@hitachi.com Count debugfs/tracing/per_cpu/cpu* to determine the number of CPUs. I'm curious, do you find that sysconf doesn't return the # of CPUs the system has? No, sysconf returns the number of hosts CPUs, not guests. I've had boxes where the per_cpu/cpu* had more cpus than the box actually holds. But this was a bug in the kernel, not the tool. This change log needs to have rational instead of just explaining what the patch does. Ah, I see. Hmm, then this should be enabled by a command line option or an environment variable. Oops, I misunderstood. I'll add more comment for why this should be tried instead of sysconf. And now that I understand why you are doing this, why not only do this if the TRACE_AGENT or DEBUG_TRACING_DIR is defined. That is, if we are doing it against a bare metal system, then sysconf should suffice, but if we are tracing against a guest, then it should use the tracing directory to determine the buffers. We could add options to override this, but I would think the default should just Do The Right Thing(tm). -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: Readonly GDT
On Tue, 2013-04-09 at 17:43 -0700, H. Peter Anvin wrote: OK, thinking about the GDT here. The GDT is quite small -- 256 bytes on i386, 128 bytes on x86-64. As such, we probably don't want to allocate a full page to it for only that. This means that in order to create a readonly mapping we have to pack GDTs from different CPUs together in the same pages, *or* we tolerate that other things on the same page gets reflected in the same mapping. What about grouping via nodes? However, the packing solution has the advantage of reducing address space consumption which matters on 32 bits: even on i386 we can easily burn a megabyte of address space for 4096 processors, but burning 16 megabytes starts to hurt. Having 4096 32 bit processors, you deserve what you get. ;-) -- Steve It would be important to measure the performance impact on task switch, though. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] tracing/events: Add bounce tracing to swiotbl-xen
On Thu, 22 Aug 2013 22:47:28 +0100 Zoltan Kiss zoltan.k...@citrix.com wrote: /* * Used to do a quick range check in swiotlb_tbl_unmap_single and * swiotlb_tbl_sync_single_*, to see if the memory was in fact allocated by this @@ -358,6 +362,9 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page, /* * Oh well, have to allocate and map a bounce buffer. */ + + trace_bounced(dev, dev_addr, size, swiotlb_force); Please use a more specific name. bounce is too generic. I know tracepoints are grouped by systems, but its easier for tools to just state a tracepoint name than the system:event pair. trace_xen_bounced() or trace_swiotlb_bounced() Something other than just bounced. Thanks! -- Steve + map = swiotlb_tbl_map_single(dev, start_dma_addr, phys, size, dir); if (map == SWIOTLB_MAP_ERROR) return DMA_ERROR_CODE; ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCHv2] tracing/events: Add bounce tracing to swiotbl
On Wed, 25 Sep 2013 13:56:49 -0400 Konrad Rzeszutek Wilk konrad.w...@oracle.com wrote: .. snip.. + TP_printk(dev_name: %s dma_mask=%llx dev_addr=%llx + size=%zu swiotlb_force=%x, + __get_str(dev_name), + __entry-dma_mask, + (unsigned long long)__entry-dev_addr, + __entry-size, + __entry-swiotlb_force) Would it make sense to do something like this: __entry-swiotlb_force ? swiotlb_force : ) I think that's fine. I do believe that the libtraceevents can parse the ?: syntax. -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH -tip RFC 0/2] kprobes: introduce NOKPROBE_SYMBOL() and prohibit probing on .entry.text
On Tue, 12 Nov 2013 02:18:53 +0900 Masami Hiramatsu masami.hiramatsu...@hitachi.com wrote: After that we can convert all the rest, probably as part of this series. OK, I'll do. :) BTW, converting all the __kprobes involves many archs, which kprobes ported. In that case, which mailing-list would better me to post the series, linux-arch? I would add linux-arch. Note, you may need to support both ways for the current time being, as new __kprobes are being added (I've seen several in patches flying by in LKML). But perhaps at a later -rc we convert the rest and discontinue them. That way, linux-next will break all new __kprobes, and that should get them fixed before we enter 3.14-rc. -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH -tip RFC v2 01/22] kprobes: Prohibit probing on .entry.text code
On Fri, 15 Nov 2013 04:53:18 + Masami Hiramatsu masami.hiramatsu...@hitachi.com wrote: .entry.text is a code area which is used for interrupt/syscall entries, and there are many sensitive codes. Thus, it is better to prohibit probing on all of such codes instead of a part of that. Since some symbols are already registered on kprobe blacklist, this also removes them from the blacklist. This change only works with x86. On other archs, I get this: kernel/built-in.o: In function `register_kprobe': (.kprobes.text+0x9f4): undefined reference to `__entry_text_start' kernel/built-in.o: In function `register_kprobe': (.kprobes.text+0x9f8): undefined reference to `__entry_text_end' make[1]: *** [vmlinux] Error 1 make: *** [sub-make] Error 2 -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH -tip v3 00/23] kprobes: introduce NOKPROBE_SYMBOL() and general cleaning of kprobe blacklist
On Wed, 20 Nov 2013 12:36:00 -0500 Frank Ch. Eigler f...@redhat.com wrote: Hi - Does this new blacklist cover enough that the kernel now survives a broadly wildcarded perf-probe, e.g. over e.g. all of its kallsyms? That's generally the purpose of the annotations - if it doesn't then that's a bug. AFAIK, no kernel since kprobes was introduced has ever stood up to that test. perf probe lacks the wildcarding powers of systemtap, so one needs to resort to something like: # cat /proc/kallsyms | grep ' [tT] ' | while read addr type symbol; do perf probe $symbol done I'm curious to why one would do that. IIUC, perf now has function tracing support. -- Steve then wait for a few hours for that to finish. Then, or while the loop is still running, run # perf record -e 'probe:*' -aR sleep 1 to take a kernel down. - FChE ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH] x86/paravirt: Do not trace _paravirt_ident_*() functions
Łukasz Daniluk reported that on a RHEL kernel that his machine would lock up after enabling function tracer. I asked him to bisect the functions within available_filter_functions, which he did and it came down to three: _paravirt_nop(), _paravirt_ident_32() and _paravirt_ident_64() It was found that this is only an issue when noreplace-paravirt is added to the kernel command line. This means that those functions are most likely called within critical sections of the funtion tracer, and must not be traced. In newer kenels _paravirt_nop() is defined within gcc asm(), and is no longer an issue. But both _paravirt_ident_{32,64}() causes the following splat when they are traced: mm/pgtable-generic.c:33: bad pmd 8800d2435150(01d00054) mm/pgtable-generic.c:33: bad pmd 8800d3624190(01d00070) mm/pgtable-generic.c:33: bad pmd 8800d36a5110(01d00054) mm/pgtable-generic.c:33: bad pmd 880118eb1450(01d00054) NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [systemd-journal:469] Modules linked in: e1000e CPU: 2 PID: 469 Comm: systemd-journal Not tainted 4.6.0-rc4-test+ #513 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012 task: 880118f740c0 ti: 8800d4aec000 task.ti: 8800d4aec000 RIP: 0010:[] [] queued_spin_lock_slowpath+0x118/0x1a0 RSP: 0018:8800d4aefb90 EFLAGS: 0246 RAX: RBX: RCX: 88011eb16d40 RDX: 82485760 RSI: 1f288820 RDI: ea008030 RBP: 8800d4aefb90 R08: 000c R09: R10: 821c8e0e R11: R12: 88200fb8 R13: 7f7a4e3f7000 R14: ea000303f600 R15: 8800d4b562e0 FS: 7f7a4e3d7840() GS:88011eb0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7f7a4e3f7000 CR3: d3e71000 CR4: 001406e0 Stack: 8800d4aefba0 81cc5f47 8800d4aefc60 8122c15b 8800d4aefcb0 8800d4aefbd0 811bf4cb 0002 0015 8800d2276050 8000c0fd8867 ea008030 Call Trace: [] _raw_spin_lock+0x27/0x30 [] handle_pte_fault+0x13db/0x16b0 [] ? function_trace_call+0x15b/0x180 [] ? handle_pte_fault+0x5/0x16b0 [] handle_mm_fault+0x312/0x670 [] ? find_vma+0x68/0x70 [] __do_page_fault+0x1b1/0x4e0 [] do_page_fault+0x22/0x30 [] page_fault+0x28/0x30 [] ? copy_user_enhanced_fast_string+0x5/0x10 [] ? seq_read+0x305/0x370 [] __vfs_read+0x28/0xe0 [] ? __vfs_read+0x5/0xe0 [] ? __vfs_read+0x5/0xe0 [] vfs_read+0x86/0x130 [] SyS_read+0x46/0xa0 [] entry_SYSCALL_64_fastpath+0x1e/0xa8 Code: 12 48 c1 ea 0c 83 e8 01 83 e2 30 48 98 48 81 c2 40 6d 01 00 48 03 14 c5 80 6a 5d 82 48 89 0a 8b 41 08 85 c0 75 09 f3 90 8b 41 08 <85> c0 74 f7 4c 8b 09 4d 85 c9 74 08 41 0f 18 09 eb 02 f3 90 8b Reported-by: Łukasz Daniluk <lukasz.dani...@intel.com> Signed-off-by: Steven Rostedt <rost...@goodmis.org> diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c index f08ac28b8136..f975d226be6e 100644 --- a/arch/x86/kernel/paravirt.c +++ b/arch/x86/kernel/paravirt.c @@ -55,12 +55,12 @@ asm (".pushsection .entry.text, \"ax\"\n" ".popsection"); /* identity function, which can be inlined */ -u32 _paravirt_ident_32(u32 x) +u32 notrace _paravirt_ident_32(u32 x) { return x; } -u64 _paravirt_ident_64(u64 x) +u64 notrace _paravirt_ident_64(u64 x) { return x; } ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC v3 20/27] x86/ftrace: Adapt function tracing for PIE support
On Wed, 4 Oct 2017 14:19:56 -0700 Thomas Garnierwrote: > When using -fPIE/PIC with function tracing, the compiler generates a > call through the GOT (call *__fentry__@GOTPCREL). This instruction > takes 6 bytes instead of 5 on the usual relative call. > > With this change, function tracing supports 6 bytes on traceable > function and can still replace relative calls on the ftrace assembly > functions. > > Position Independent Executable (PIE) support will allow to extended the > KASLR randomization range below the -2G memory limit. Question: This 6 bytes is only the initial call that gcc creates. When function tracing is enabled, the calls are back to the normal call to the ftrace trampoline? -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC v3 20/27] x86/ftrace: Adapt function tracing for PIE support
On Thu, 5 Oct 2017 09:01:14 -0700 Thomas Garnier <thgar...@google.com> wrote: > On Thu, Oct 5, 2017 at 6:06 AM, Steven Rostedt <rost...@goodmis.org> wrote: > > On Wed, 4 Oct 2017 14:19:56 -0700 > > Thomas Garnier <thgar...@google.com> wrote: > > > >> When using -fPIE/PIC with function tracing, the compiler generates a > >> call through the GOT (call *__fentry__@GOTPCREL). This instruction > >> takes 6 bytes instead of 5 on the usual relative call. > >> > >> With this change, function tracing supports 6 bytes on traceable > >> function and can still replace relative calls on the ftrace assembly > >> functions. > >> > >> Position Independent Executable (PIE) support will allow to extended the > >> KASLR randomization range below the -2G memory limit. > > > > Question: This 6 bytes is only the initial call that gcc creates. When > > function tracing is enabled, the calls are back to the normal call to > > the ftrace trampoline? > > That is correct. > Then I think a better idea is to simply nop them out at compile time, and have the code that updates them to nops to know about it. See scripts/recordmcount.c Could we simply add a 5 byte nop followed by a 1 byte nop, and treat it the same as if it didn't exist? This code can be a little complex, and can cause really nasty side effects if things go wrong. I would like to keep from adding more variables to the changes here. -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH] x86/vmware: Do not trace vmware_sched_clock()
From: Steven Rostedt (VMware) When running function tracing on a Linux guest running on VMware Workstation, the guest would crash. This is due to tracing of the sched_clock internal call of the VMware vmware_sched_clock(), which causes an infinite recursion within the tracing code (clock calls must not be traced). Make vmware_sched_clock() not traced by ftrace. Cc: sta...@vger.kernel.org Fixes: 80e9a4f21fd7c ("x86/vmware: Add paravirt sched clock") Reported-by: GwanYeong Kim Signed-off-by: Steven Rostedt (VMware) --- diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c index d9ab49bed8af..0eda91f8eeac 100644 --- a/arch/x86/kernel/cpu/vmware.c +++ b/arch/x86/kernel/cpu/vmware.c @@ -77,7 +77,7 @@ static __init int setup_vmw_sched_clock(char *s) } early_param("no-vmw-sched-clock", setup_vmw_sched_clock); -static unsigned long long vmware_sched_clock(void) +static unsigned long long notrace vmware_sched_clock(void) { unsigned long long ns; ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: netconsole deadlock with virtnet
On Mon, 23 Nov 2020 13:08:55 +0200 Leon Romanovsky wrote: > [ 10.028024] Chain exists of: > [ 10.028025] console_owner --> target_list_lock --> _xmit_ETHER#2 Note, the problem is that we have a location that grabs the xmit_lock while holding target_list_lock (and possibly console_owner). > [ 10.028028] > [ 10.028028] Possible interrupt unsafe locking scenario: > [ 10.028029] > [ 10.028029]CPU0CPU1 > [ 10.028030] > [ 10.028030] lock(_xmit_ETHER#2); > [ 10.028032]local_irq_disable(); > [ 10.028032]lock(console_owner); > [ 10.028034]lock(target_list_lock); > [ 10.028035] > [ 10.028035] lock(console_owner); > [ 10.028036] > [ 10.028037] *** DEADLOCK *** > [ 10.028037] > [ 10.028107] the dependencies between the lock to be acquired > [ 10.028107] and HARDIRQ-irq-unsafe lock: > [ 10.028108] -> (_xmit_ETHER#2){+.-.}-{2:2} ops: 217 { > [ 10.028110]HARDIRQ-ON-W at: > [ 10.028111]__lock_acquire+0x8bc/0x1a94 > [ 10.028111]lock_acquire.part.0+0x170/0x360 > [ 10.028112]lock_acquire+0x68/0x8c > [ 10.028113]_raw_spin_trylock+0x80/0xd0 > [ 10.028113]virtnet_poll+0xac/0x360 xmit_lock is taken in virtnet_poll() (via virtnet_poll_cleantx()). This is called from the softirq, and interrupts are not disabled. > [ 10.028114]net_rx_action+0x1b0/0x4e0 > [ 10.028115]__do_softirq+0x1f4/0x638 > [ 10.028115]do_softirq+0xb8/0xcc > [ 10.028116]__local_bh_enable_ip+0x18c/0x200 > [ 10.028116]virtnet_napi_enable+0xc0/0xd4 > [ 10.028117]virtnet_open+0x98/0x1c0 > [ 10.028118]__dev_open+0x12c/0x200 > [ 10.028118]__dev_change_flags+0x1a0/0x220 > [ 10.028119]dev_change_flags+0x2c/0x70 > [ 10.028119]do_setlink+0x214/0xe20 > [ 10.028120]__rtnl_newlink+0x514/0x820 > [ 10.028120]rtnl_newlink+0x58/0x84 > [ 10.028121]rtnetlink_rcv_msg+0x184/0x4b4 > [ 10.028122]netlink_rcv_skb+0x60/0x124 > [ 10.028122]rtnetlink_rcv+0x20/0x30 > [ 10.028123]netlink_unicast+0x1b4/0x270 > [ 10.028124]netlink_sendmsg+0x1f0/0x400 > [ 10.028124]sock_sendmsg+0x5c/0x70 > [ 10.028125]sys_sendmsg+0x24c/0x280 > [ 10.028125]___sys_sendmsg+0x88/0xd0 > [ 10.028126]__sys_sendmsg+0x70/0xd0 > [ 10.028127]__arm64_sys_sendmsg+0x2c/0x40 > [ 10.028128]el0_svc_common.constprop.0+0x84/0x200 > [ 10.028128]do_el0_svc+0x2c/0x90 > [ 10.028129]el0_svc+0x18/0x50 > [ 10.028129]el0_sync_handler+0xe0/0x350 > [ 10.028130]el0_sync+0x158/0x180 [..] > [ 10.028171] ... key at: [] > netdev_xmit_lock_key+0x10/0x390 > [ 10.028171] ... acquired at: > [ 10.028172]__lock_acquire+0x134c/0x1a94 > [ 10.028172]lock_acquire.part.0+0x170/0x360 > [ 10.028173]lock_acquire+0x68/0x8c > [ 10.028173]_raw_spin_lock+0x64/0x90 > [ 10.028174]virtnet_poll_tx+0x84/0x120 > [ 10.028174]netpoll_poll_dev+0x12c/0x350 > [ 10.028175]netpoll_send_skb+0x39c/0x400 > [ 10.028175]netpoll_send_udp+0x2b8/0x440 > [ 10.028176]write_msg+0xfc/0x120 [netconsole] > [ 10.028176]console_unlock+0x3ec/0x6a4 The above shows the problem. We have: console_unlock() (which holds the console_owner lock) write_msg() (which holds the target_list_lock) Then we write_msg() calls: netpoll_send_udp() { netpoll_send_skb() { netpoll_poll_dev() { virtnet_poll_tx() (which takes the xmit_lock!) DEADLOCK! In netpoll_send_skb() I see this: /* tickle device maybe there is some cleanup */ netpoll_poll_dev(np->dev); Which looks to me that it will call some code that should only be used in softirq context. It's called with locks held that are taken in interrupt context, and any locks that are taken in netpoll_poll_dev() must always be taken with interrupts disabled. That is, if xmit_lock is taken within netpoll_poll_dev(), then it must always be taken with interrupts disabled. Otherwise you can have the deadlock that lockdep reported. -- Steve > [ 10.028177]
Re: netconsole deadlock with virtnet
On Tue, 24 Nov 2020 11:22:03 +0800 Jason Wang wrote: > Btw, have a quick search, there are several other drivers that uses tx > lock in the tx NAPI. tx NAPI is not the issue. The issue is that write_msg() (in netconsole.c) calls this polling logic with the target_list_lock held. Are those other drivers called by netconsole? If not, then this is unique to virtio-net. -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: netconsole deadlock with virtnet
On Mon, 23 Nov 2020 10:52:52 -0800 Jakub Kicinski wrote: > On Mon, 23 Nov 2020 09:31:28 -0500 Steven Rostedt wrote: > > On Mon, 23 Nov 2020 13:08:55 +0200 > > Leon Romanovsky wrote: > > > > > > > [ 10.028024] Chain exists of: > > > [ 10.028025] console_owner --> target_list_lock --> _xmit_ETHER#2 > > > > Note, the problem is that we have a location that grabs the xmit_lock while > > holding target_list_lock (and possibly console_owner). > > Well, it try_locks the xmit_lock. Does lockdep understand try-locks? > > (not that I condone the shenanigans that are going on here) Does it? virtnet_poll_tx() { __netif_tx_lock() { spin_lock(>_xmit_lock); That looks like we can have: CPU0CPU1 lock(xmit_lock) lock(console) lock(target_list_lock) __netif_tx_lock() lock(xmit_lock); [BLOCKED] lock(console) [BLOCKED] DEADLOCK. So where is the trylock here? Perhaps you need the trylock in virtnet_poll_tx()? -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: netconsole deadlock with virtnet
[ Adding netdev as perhaps someone there knows ] On Wed, 18 Nov 2020 12:09:59 +0800 Jason Wang wrote: > > This CPU0 lock(_xmit_ETHER#2) -> hard IRQ -> lock(console_owner) is > > basically > > soft IRQ -> lock(_xmit_ETHER#2) -> hard IRQ -> printk() > > > > Then CPU1 spins on xmit, which is owned by CPU0, CPU0 spins on > > console_owner, which is owned by CPU1? It still looks to me that the target_list_lock is taken in IRQ, (which can be the case because printk calls write_msg() which takes that lock). And someplace there's a: lock(target_list_lock) lock(xmit_lock) which means you can remove the console lock from this scenario completely, and you still have a possible deadlock between target_list_lock and xmit_lock. > > > If this is true, it looks not a virtio-net specific issue but somewhere > else. > > I think all network driver will synchronize through bh instead of hardirq. I think the issue is where target_list_lock is held when we take xmit_lock. Is there anywhere in netconsole.c that can end up taking xmit_lock while holding the target_list_lock? If so, that's the problem. As target_list_lock is something that can be taken in IRQ context, which means *any* other lock that is taking while holding the target_list_lock must also protect against interrupts from happening while it they are held. -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: netconsole deadlock with virtnet
On Tue, 17 Nov 2020 12:23:41 +0200 Leon Romanovsky wrote: > Hi, > > Approximately two weeks ago, our regression team started to experience those > netconsole splats. The tested code is Linus's master (-rc4) + netdev net-next > + netdev net-rc. > > Such splats are random and we can't bisect because there is no stable > reproducer. > > Any idea, what is the root cause? > > [ 21.141117] netpoll: netconsole: local port > [ 21.141947] netpoll: netconsole: local IPv4 address 10.236.251.7 > [ 21.143138] netpoll: netconsole: interface 'eth1' > [ 21.143988] netpoll: netconsole: remote port 62517 > [ 21.145127] netpoll: netconsole: remote IPv4 address 10.236.251.1 > [ 21.146206] netpoll: netconsole: remote ethernet address 68:05:ca:aa:99:49 > [ 21.149464] = > [ 21.149466] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected > [ 21.149467] 5.10.0-rc4_for_upstream_min_debug_2020_11_16_13_03 #1 Not > tainted > [ 21.149468] - > [ 21.149469] modprobe/596 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: > [ 21.149471] 000194b4e298 (_xmit_ETHER#2){+.-.}-{2:2}, at: > virtnet_poll_tx+0x84/0x120 > [ 21.149478] and this task is already holding: > [ 21.149478] 893d4018 (target_list_lock){}-{2:2}, at: > write_msg+0x6c/0x120 [netconsole] > [ 21.149483] which would create a new lock dependency: > [ 21.149484] (target_list_lock){}-{2:2} -> (_xmit_ETHER#2){+.-.}-{2:2} > [ 21.149491] but this new dependency connects a HARDIRQ-irq-safe lock: > [ 21.149492] (console_owner){-.-.}-{0:0} > [ 21.149496] ... which became HARDIRQ-irq-safe at: > [ 21.149496] __lock_acquire+0xa78/0x1a94 > [ 21.149498] lock_acquire.part.0+0x170/0x360 > [ 21.149498] lock_acquire+0x68/0x8c > [ 21.149500] console_unlock+0x1e8/0x6a4 > [ 21.149500] vprintk_emit+0x1c4/0x3c4 > [ 21.149501] vprintk_default+0x40/0x4c > [ 21.149502] vprintk_func+0x10c/0x220 > [ 21.149503] printk+0x68/0x90 > [ 21.149504] crng_fast_load+0x1bc/0x1c0 > [ 21.149505] add_interrupt_randomness+0x280/0x290 > [ 21.149506] handle_irq_event+0x80/0x120 > [ 21.149507] handle_fasteoi_irq+0xac/0x200 > [ 21.149508] __handle_domain_irq+0x84/0xf0 > [ 21.149509] gic_handle_irq+0xd4/0x320 > [ 21.149510] el1_irq+0xd0/0x180 > [ 21.149511] arch_cpu_idle+0x24/0x44 > [ 21.149512] default_idle_call+0x48/0xa0 > [ 21.149513] do_idle+0x260/0x300 > [ 21.149514] cpu_startup_entry+0x30/0x6c > [ 21.149515] rest_init+0x1b4/0x288 > [ 21.149515] arch_call_rest_init+0x18/0x24 > [ 21.149516] start_kernel+0x5cc/0x608 > [ 21.149518] to a HARDIRQ-irq-unsafe lock: > [ 21.149519] (_xmit_ETHER#2){+.-.}-{2:2} > [ 21.149523] ... which became HARDIRQ-irq-unsafe at: > [ 21.149524] ... __lock_acquire+0x8bc/0x1a94 > [ 21.149525] lock_acquire.part.0+0x170/0x360 > [ 21.149526] lock_acquire+0x68/0x8c > [ 21.149527] _raw_spin_trylock+0x80/0xd0 > [ 21.149527] virtnet_poll+0xac/0x360 > [ 21.149528] net_rx_action+0x1b0/0x4e0 > [ 21.149529] __do_softirq+0x1f4/0x638 > [ 21.149530] do_softirq+0xb8/0xcc > [ 21.149531] __local_bh_enable_ip+0x18c/0x200 > [ 21.149532] virtnet_napi_enable+0xc0/0xd4 > [ 21.149533] virtnet_open+0x98/0x1c0 > [ 21.149534] __dev_open+0x12c/0x200 > [ 21.149535] __dev_change_flags+0x1a0/0x220 > [ 21.149536] dev_change_flags+0x2c/0x70 > [ 21.149536] do_setlink+0x214/0xe20 > [ 21.149537] __rtnl_newlink+0x514/0x820 > [ 21.149538] rtnl_newlink+0x58/0x84 > [ 21.149539] rtnetlink_rcv_msg+0x184/0x4b4 > [ 21.149540] netlink_rcv_skb+0x60/0x124 > [ 21.149541] rtnetlink_rcv+0x20/0x30 > [ 21.149542] netlink_unicast+0x1b4/0x270 > [ 21.149543] netlink_sendmsg+0x1f0/0x400 > [ 21.149544] sock_sendmsg+0x5c/0x70 > [ 21.149545] sys_sendmsg+0x24c/0x280 > [ 21.149545] ___sys_sendmsg+0x88/0xd0 > [ 21.149546] __sys_sendmsg+0x70/0xd0 > [ 21.149547] __arm64_sys_sendmsg+0x2c/0x40 > [ 21.149548] el0_svc_common.constprop.0+0x84/0x200 > [ 21.149549] do_el0_svc+0x2c/0x90 > [ 21.149550] el0_svc+0x18/0x50 > [ 21.149551] el0_sync_handler+0xe0/0x350 > [ 21.149552] el0_sync+0x158/0x180 > [ 21.149553] other info that might help us debug this: > [ 21.149555] Chain exists of: > [ 21.149556] console_owner --> target_list_lock --> _xmit_ETHER#2 So somewhere we have target_list_lock taken while holding the console_owner lock (which can happen in interrupt). And what this is saying, target_list_lock is held somewhere where _xmit_ETHER#2 is taken (which I'm guessing from the back traces is the xmit_lock taken in virtnet_poll_tx(). Thus, you can have a deadlock with three CPUs (I need to update the lockdep output to make it proper one of these days). CPU0CPU1CPU2
Re: [RFC][PATCH] vhost/vsock: Add vsock_list file to map cid with vhost tasks
On Thu, 13 May 2021 16:57:34 +0100 Stefan Hajnoczi wrote: > This approach relies on process hierarchy of the VMM (QEMU). > Multi-process QEMU is in development and will allow VIRTIO devices to > run as separate processes from the main QEMU. It then becomes harder to > correlate a VIRTIO device process with its QEMU process. And we need to know all these mapping regardless, as we need to map each thread / process to the vCPU in order to correlate between host thread and vCPU thread for showing in KernelShark. Thus this mapping to find the main thread/process needs to be done regardless. > > So I think in the end this approach ends up being as fragile as parsing > command-lines. The kernel doesn't really have the concept of a "VM" that > the vhost_vsock is associated with :). Maybe just parse QEMU and crosvm > command-lines? > That's what we do now, and it already broke once, and even parsing the command line wont be enough for the stated reasons above. -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC][PATCH] vhost/vsock: Add vsock_list file to map cid with vhost tasks
On Fri, 7 May 2021 16:11:20 +0200 Stefano Garzarella wrote: > Hi Steven, > > On Wed, May 05, 2021 at 04:38:55PM -0400, Steven Rostedt wrote: > >The new trace-cmd 3.0 (which is almost ready to be released) allows for > >tracing between host and guests with timestamp synchronization such that > >the events on the host and the guest can be interleaved in the proper order > >that they occur. KernelShark now has a plugin that visualizes this > >interaction. > > > >The implementation requires that the guest has a vsock CID assigned, and on > >the guest a "trace-cmd agent" is running, that will listen on a port for > >the CID. The on the host a "trace-cmd record -A guest@cid:port -e events" > >can be called and the host will connect to the guest agent through the > >cid/port pair and have the agent enable tracing on behalf of the host and > >send the trace data back down to it. > > > >The problem is that there is no sure fire way to find the CID for a guest. > >Currently, the user must know the cid, or we have a hack that looks for the > >qemu process and parses the --guest-cid parameter from it. But this is > >prone to error and does not work on other implementation (was told that > >crosvm does not use qemu). > > For debug I think could be useful to link the vhost-vsock kthread to the > CID, but for the user point of view, maybe is better to query the VM > management layer, for example if you're using libvirt, you can easily do: > > $ virsh dumpxml fedora34 | grep cid > We looked into going this route, but then that means trace-cmd host/guest tracing needs a way to handle every layer, as some people use libvirt (myself included), some people use straight qemu, some people us Xen, and some people use crosvm. We need to support all of them. Which is why I'm looking at doing this from the lowest common denominator, and since vsock is a requirement from trace-cmd to do this tracing, getting the thread that's related to the vsock is that lowest denominator. > > > > >As I can not find a way to discover CIDs assigned to guests via any kernel > >interface, I decided to create this one. Note, I'm not attached to it. If > >there's a better way to do this, I would love to have it. But since I'm not > >an expert in the networking layer nor virtio, I decided to stick to what I > >know and add a debugfs interface that simply lists all the registered > >CIDs > >and the worker task that they are associated with. The worker task at > >least has the PID of the task it represents. > > I honestly don't know if it's the best interface, like I said maybe for > debugging it's fine, but if we want to expose it to the user in some > way, we could support devlink/netlink to provide information about the > vsock devices currently in use. Ideally, a devlink/netlink is the right approach. I just had no idea on how to implement that ;-) So I went with what I know, which is debugfs files! > >Signed-off-by: Steven Rostedt (VMware) > >--- > >diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c > >index 5e78fb719602..4f03b25b23c1 100644 > >--- a/drivers/vhost/vsock.c > >+++ b/drivers/vhost/vsock.c > >@@ -15,6 +15,7 @@ > > #include > > #include > > #include > >+#include > > > > #include > > #include "vhost.h" > >@@ -900,6 +901,128 @@ static struct miscdevice vhost_vsock_misc = { > > .fops = _vsock_fops, > > }; > > > >+static struct dentry *vsock_file; > >+ > >+struct vsock_file_iter { > >+struct hlist_node *node; > >+int index; > >+}; > >+ > >+ > >+static void *vsock_next(struct seq_file *m, void *v, loff_t *pos) > >+{ > >+struct vsock_file_iter *iter = v; > >+struct vhost_vsock *vsock; > >+ > >+if (pos) > >+(*pos)++; > >+ > >+if (iter->index >= (int)HASH_SIZE(vhost_vsock_hash)) > >+return NULL; > >+ > >+if (iter->node) > >+iter->node = rcu_dereference_raw(hlist_next_rcu(iter->node)); > >+ > >+for (;;) { > >+if (iter->node) { > >+vsock = > >hlist_entry_safe(rcu_dereference_raw(iter->node), > >+ struct vhost_vsock, hash); > >+if (vsock->guest_cid) > >+break; > >+iter->node = > >rcu_dereference_raw(hlist_next_rcu(iter->node)); > >+continue; > >+} > >
Re: [RFC][PATCH] vhost/vsock: Add vsock_list file to map cid with vhost tasks
On Fri, 7 May 2021 17:43:32 +0200 Stefano Garzarella wrote: > >The start/stop of a seq_file() is made for taking locks. I do this with all > >my code in ftrace. Yeah, there's a while loop between the two, but that's > >just to fill the buffer. It's not that long and it never goes to userspace > >between the two. You can even use this for spin locks (but I wouldn't > >recommend doing it for raw ones). > > Ah okay, thanks for the clarification! > > I was worried because building with `make C=2` I had these warnings: > > ../drivers/vhost/vsock.c:944:13: warning: context imbalance in 'vsock_start' > - wrong count at exit > ../drivers/vhost/vsock.c:963:13: warning: context imbalance in 'vsock_stop' - > unexpected unlock > > Maybe we need to annotate the functions somehow. Yep, I it should have been. static void *vsock_start(struct seq_file *m, loff_t *pos) __acquires(rcu) { [...] } static void vsock_stop(struct seq_file *m, void *p) __releases(rcu) { [...] } static int vsock_show(struct seq_file *m, void *v) __must_hold(rcu) { [...] } And guess what? I just copied those annotations from sock_hash_seq_start(), sock_hash_seq_show() and sock_hash_seq_stop() from net/core/sock_map.c which is doing exactly the same thing ;-) So there's definitely precedence for this. > > > > >> > >> >+ > >> >+ iter->index = -1; > >> >+ iter->node = NULL; > >> >+ t = vsock_next(m, iter, NULL); > >> >+ > >> >+ for (; iter->index < HASH_SIZE(vhost_vsock_hash) && l < *pos; > >> >+ t = vsock_next(m, iter, )) > >> >+ ; > >> > >> A while() maybe was more readable... > > > >Again, I just cut and pasted from my other code. > > > >If you have a good idea on how to implement this with netlink (something > >that ss or netstat can dislpay), I think that's the best way to go. > > Okay, I'll take a look and get back to you. > If it's too complicated, we can go ahead with this patch. Awesome, thanks! -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[RFC][PATCH] vhost/vsock: Add vsock_list file to map cid with vhost tasks
The new trace-cmd 3.0 (which is almost ready to be released) allows for tracing between host and guests with timestamp synchronization such that the events on the host and the guest can be interleaved in the proper order that they occur. KernelShark now has a plugin that visualizes this interaction. The implementation requires that the guest has a vsock CID assigned, and on the guest a "trace-cmd agent" is running, that will listen on a port for the CID. The on the host a "trace-cmd record -A guest@cid:port -e events" can be called and the host will connect to the guest agent through the cid/port pair and have the agent enable tracing on behalf of the host and send the trace data back down to it. The problem is that there is no sure fire way to find the CID for a guest. Currently, the user must know the cid, or we have a hack that looks for the qemu process and parses the --guest-cid parameter from it. But this is prone to error and does not work on other implementation (was told that crosvm does not use qemu). As I can not find a way to discover CIDs assigned to guests via any kernel interface, I decided to create this one. Note, I'm not attached to it. If there's a better way to do this, I would love to have it. But since I'm not an expert in the networking layer nor virtio, I decided to stick to what I know and add a debugfs interface that simply lists all the registered CIDs and the worker task that they are associated with. The worker task at least has the PID of the task it represents. Now I can find the cid / host process in charge of the guest pair: # cat /sys/kernel/debug/vsock_list 3 vhost-1954:2002 # ps aux | grep 1954 qemu1954 9.9 21.3 1629092 796148 ? Sl 16:22 0:58 /usr/bin/qemu-kvm -name guest=Fedora21,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-Fedora21/master-key.aes -machine pc-1.2,accel=kvm,usb=off,dump-guest-core=off -cpu qemu64 -m 1000 -overcommit mem-lock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 1eefeeb0-3ac7-07c1-926e-236908313b4c -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=32,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -blockdev {"driver":"host_device","filename":"/dev/mapper/vg_bxtest-GuestFedora","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-1-format","read-only":false,"driver":"raw","file":"libvirt-1-storage"} -device ide-hd,bus=ide.0,unit=0,drive=libvirt-1- format,id=ide0-0-0,bootindex=1 -netdev tap,fd=34,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:9f:e9:d5,bus=pci.0,addr=0x3 -netdev tap,fd=35,id=hostnet1 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:ec:dc:6e,bus=pci.0,addr=0x5 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev pipe,id=charchannel0,path=/var/lib/trace-cmd/virt/Fedora21/trace-pipe-cpu0 -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=trace-pipe-cpu0 -chardev pipe,id=charchannel1,path=/var/lib/trace-cmd/virt/Fedora21/trace-pipe-cpu1 -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=trace-pipe-cpu1 -vnc 127.0.0.1:0 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -device vhost-vsock-pci,id=vsock0,guest-cid=3,vhostfd=16,bus=pci.0,addr=0x7 -msg timestamp=on root2000 0.0 0.0 0 0 ?S16:22 0:00 [kvm-pit/1954] root2002 0.0 0.0 0 0 ?S16:22 0:00 [vhost-1954] This is just an example of what I'm looking for. Just a way to find what process is using what cid. Signed-off-by: Steven Rostedt (VMware) --- diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index 5e78fb719602..4f03b25b23c1 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include "vhost.h" @@ -900,6 +901,128 @@ static struct miscdevice vhost_vsock_misc = { .fops = _vsock_fops, }; +static struct dentry *vsock_file; + +struct vsock_file_iter { + struct hlist_node *node; + int index; +}; + + +static void *vsock_next(struct seq_file *m, void *v, loff_t *pos) +{ + struct vsock_file_iter *iter = v; + struct vhost_vsock *vsock; + + if (pos) + (*pos)++; + + if (iter->index >= (int)HASH_SIZE(vhost_v
Re: [RFC][PATCH] vhost/vsock: Add vsock_list file to map cid with vhost tasks
For kicks, I wrote this program that uses libtracefs to search all CIDS (1-255), and find the kvm guests that are attached to them. It traces the sched_wakeup and kvm_exit, looking for: this_task -> wakeup -> wakeup -> kvm_exit when doing a connect to a cid. When it finds the pid that did a kvm_exit, it knows that's the PID that is woken by the vhost worker task. It's a little slow, and I would really like a better way to do this, but it's at least an option that is available now. -- Steve #define _GNU_SOURCE #include #include #include #include #include #include #include #include #include #include #include #define MAX_CID 256 static int this_pid; static int open_vsock(unsigned int cid, unsigned int port) { struct sockaddr_vm addr = { .svm_family = AF_VSOCK, .svm_cid = cid, .svm_port = port, }; int sd; sd = socket(AF_VSOCK, SOCK_STREAM, 0); if (sd < 0) return -1; if (connect(sd, (struct sockaddr *), sizeof(addr))) return -1; return sd; } struct pids { struct pids *next; int pid; }; struct trace_info { struct tracefs_instance *instance; struct tep_handle *tep; struct tep_event *wake_up; struct tep_event *kvm_exit; struct tep_format_field *common_pid; struct tep_format_field *wake_pid; struct pids *pids; intcid; intpid; }; static void tear_down_trace(struct trace_info *info) { tracefs_instance_file_write(info->instance, "events/enable", "0"); tracefs_instance_destroy(info->instance); tracefs_instance_free(info->instance); tep_free(info->tep); } static int setup_trace(struct trace_info *info) { const char *systems[] = { "sched", "kvm", NULL}; char *name; int ret; info->pids = NULL; ret = asprintf(, "vsock_find-%d\n", getpid()); if (ret < 0) return ret; info->instance = tracefs_instance_create(name); free(name); if (!info->instance) return -1; tracefs_trace_off(info->instance); info->tep = tracefs_local_events_system(NULL, systems); if (!info->tep) goto fail; info->wake_up = tep_find_event_by_name(info->tep, "sched", "sched_waking"); if (!info->wake_up) { fprintf(stderr, "Failed to find sched_waking\n"); goto fail; } info->kvm_exit = tep_find_event_by_name(info->tep, "kvm", "kvm_exit"); if (!info->kvm_exit) { fprintf(stderr, "Failed to find kvm_exit\n"); goto fail; } info->wake_pid = tep_find_any_field(info->wake_up, "pid"); if (!info->wake_pid) { fprintf(stderr, "Failed to find wake up pid\n"); goto fail; } info->common_pid = tep_find_common_field(info->wake_up, "common_pid"); if (!info->common_pid) { fprintf(stderr, "Failed to find common pid\n"); goto fail; } ret = tracefs_instance_file_write(info->instance, "events/sched/sched_waking/enable", "1"); if (ret < 0) { fprintf(stderr, "Failed to enable sched_waking\n"); goto fail; } ret = tracefs_instance_file_write(info->instance, "events/kvm/kvm_exit/enable", "1"); if (ret < 0) { fprintf(stderr, "Failed to enable kvm_exit\n"); goto fail; } return 0; fail: tear_down_trace(info); return -1; } static void free_pids(struct pids *pids) { struct pids *next; while (pids) { next = pids; pids = pids->next; free(next); } } static void add_pid(struct pids **pids, int pid) { struct pids *new_pid; new_pid = malloc(sizeof(*new_pid)); if (!new_pid) return; new_pid->pid = pid; new_pid->next = *pids; *pids = new_pid; } static bool match_pid(struct pids *pids, int pid) { while (pids) { if (pids->pid == pid) return true; pids = pids->next; } return false; } static int callback(struct tep_event *event, struct tep_record *record, int cpu, void *data) { struct trace_info *info = data; struct tep_handle *tep = info->tep; unsigned long long val; int type; int pid; int ret; ret = tep_read_number_field(info->common_pid, record->data, ); if (ret < 0) return 0; pid = val; if (!match_pid(info->pids, pid)) return 0; type = tep_data_type(tep, record); if (type == info->kvm_exit->id) { info->pid = pid; return -1; } if (type != info->wake_up->id) return 0; ret = tep_read_number_field(info->wake_pid, record->data, ); if (ret < 0) return 0; add_pid(>pids, (int)val); return 0; } static void print_cid_pid(int cid, int pid) { FILE *fp; char *path; char *buf = NULL; char *save; size_t l = 0; int tgid = -1; if (asprintf(, "/proc/%d/status", pid) < 0) return; fp = fopen(path, "r"); free(path); if (!fp) return; while (getline(, , fp) > 0) { char *tok; if (strncmp(buf, "Tgid:", 5) != 0) continue; tok = strtok_r(buf, ":", ); if (!tok) continue; tok = strtok_r(NULL, ":", ); if (!tok) continue; while (isspace(*tok)) tok++; tgid = strtol(tok, NULL, 0); break; } free(buf); if (tgid >= 0) printf("%d\t%d\n", cid, tgid); } static void find_cid(struct trace_info *info, int cid) { int fd; add_pid(>pids, this_pid); tracefs_instance_file_clear(info->instance, "trace"); tracefs_trace_on(info->instance); fd =
Re: [RFC PATCH v5 19/19] virtio/vsock: update trace event for SEQPACKET
On Thu, 18 Feb 2021 08:42:15 +0300 Arseny Krasnov wrote: Not sure if this was pulled in yet, but I do have a small issue with this patch. > @@ -69,14 +82,19 @@ TRACE_EVENT(virtio_transport_alloc_pkt, > __entry->type = type; > __entry->op = op; > __entry->flags = flags; > + __entry->msg_len = msg_len; > + __entry->msg_cnt = msg_cnt; > ), > - TP_printk("%u:%u -> %u:%u len=%u type=%s op=%s flags=%#x", > + TP_printk("%u:%u -> %u:%u len=%u type=%s op=%s flags=%#x " > + "msg_len=%u msg_cnt=%u", It's considered poor formatting to split strings like the above. This is one of the exceptions for the 80 character limit. Do not break strings just to keep it within 80 characters. -- Steve > __entry->src_cid, __entry->src_port, > __entry->dst_cid, __entry->dst_port, > __entry->len, > show_type(__entry->type), > show_op(__entry->op), > - __entry->flags) > + __entry->flags, > + __entry->msg_len, > + __entry->msg_cnt) > ); ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v3 1/3] MAINTAINERS: Update maintainers for paravirt ops and VMware hypervisor interface
On Thu, 11 Nov 2021 07:50:39 +0100 Greg KH wrote: > > Signed-off-by: Srivatsa S. Bhat (VMware) > > Acked-by: Alexey Makhalov > > Acked-by: Deep Shah > > Acked-by: Juergen Gross > > Cc: sta...@vger.kernel.org > > Why are MAINTAINERS updates needed for stable? That's not normal :( Probably not needed, but does it hurt? And who's normal? -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 9/9] testmmiotrace: eliminate anonymous module_init & module_exit
On Wed, 16 Mar 2022 12:20:10 -0700 Randy Dunlap wrote: > Eliminate anonymous module_init() and module_exit(), which can lead to > confusion or ambiguity when reading System.map, crashes/oops/bugs, > or an initcall_debug log. > > Give each of these init and exit functions unique driver-specific > names to eliminate the anonymous names. > > Example 1: (System.map) > 832fc78c t init > 832fc79e t init > 832fc8f8 t init > > Example 2: (initcall_debug log) > calling init+0x0/0x12 @ 1 > initcall init+0x0/0x12 returned 0 after 15 usecs > calling init+0x0/0x60 @ 1 > initcall init+0x0/0x60 returned 0 after 2 usecs > calling init+0x0/0x9a @ 1 > initcall init+0x0/0x9a returned 0 after 74 usecs > > Fixes: 8b7d89d02ef3 ("x86: mmiotrace - trace memory mapped IO") > Signed-off-by: Randy Dunlap > Cc: Thomas Gleixner > Cc: Steven Rostedt Acked-by: Steven Rostedt (Google) -- Steve > Cc: Ingo Molnar > Cc: Karol Herbst > Cc: Pekka Paalanen > Cc: Dave Hansen > Cc: Andy Lutomirski > Cc: Peter Zijlstra > Cc: Borislav Petkov > Cc: "H. Peter Anvin" > Cc: nouv...@lists.freedesktop.org > Cc: x...@kernel.org > --- > arch/x86/mm/testmmiotrace.c |8 > 1 file changed, 4 insertions(+), 4 deletions(-) > > --- lnx-517-rc8.orig/arch/x86/mm/testmmiotrace.c > +++ lnx-517-rc8/arch/x86/mm/testmmiotrace.c > @@ -113,7 +113,7 @@ static void do_test_bulk_ioremapping(voi > synchronize_rcu(); > } > > -static int __init init(void) > +static int __init testmmiotrace_init(void) > { > unsigned long size = (read_far) ? (8 << 20) : (16 << 10); > int ret = security_locked_down(LOCKDOWN_MMIOTRACE); > @@ -136,11 +136,11 @@ static int __init init(void) > return 0; > } > > -static void __exit cleanup(void) > +static void __exit testmmiotrace_cleanup(void) > { > pr_debug("unloaded.\n"); > } > > -module_init(init); > -module_exit(cleanup); > +module_init(testmmiotrace_init); > +module_exit(testmmiotrace_cleanup); > MODULE_LICENSE("GPL"); ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH] virtio: Workaround fix for hard hang on guest using fifos
This is more of a workaround patch and probably not the proper fix. But I'm doing some work that is using fifos for guests and this is causing a hang that is quite annoying. I currently have this patch applied to continue my work. I was working on analyzing data transfers between host and guests via virtio sockets (FIFOs on host, dev on guest), vsockets and TCP packets. I wrote a program to test each by passing a 1GB file and timing it. I'm using the splice system call to help move things along. In doing so, I found that my pipe between splice calls originally used "page_size" for data transfer, and that is not as efficient as finding out what the pipe size is. So I changed the code to use pipe_size and while debugging it, the guest locked up hard. I'm attaching the "agent-fifo" that runs on the guest, and the "client-fifo" that runs on the host (the names may be backwards, but makes sense when you add how I test vsockets and network packets). Here's what I did: # ./client-fifo /var/lib/virt/Guest/trace-pipe-cpu0.out /test/bigfile Where the trace-pipe-cpu0.out is the receiving side from the guest's virtio pipe. The /test/bigfile is created when data starts coming in from the guest pipe. # dd if=/dev/urandom of=bigfile bs=1024 count=1048576 # ./agent-fifo /dev/virtio-ports/trace-pipe-cpu0 bigfile With the updates to change the size being passed in the splice from page_size to pipe_size, this never finished (it would copy around a meg or so). And stopped. When I killed the agent-fifo task on the guest, the guest hung hard. Debugging this, I found that the guest is stuck in the loop in drivers/char/virt_console.c: __send_control_msg(): if (virtqueue_add_outbuf(vq, sg, 1, >cpkt, GFP_ATOMIC) == 0) { virtqueue_kick(vq); while (!virtqueue_get_buf(vq, ) && !virtqueue_is_broken(vq)) cpu_relax(); } It never exits that loop. My workaround (this patch) is to put in a timeout, and exit out if it spins there for more than 5 seconds. This makes the problem go away. Below is my changes, but this is a band-aid, it is not the cure. Workaround-fix-by: Steven Rostedt (Google) --- diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c index e3c430539a17..65f259f3f8cb 100644 --- a/drivers/char/virtio_console.c +++ b/drivers/char/virtio_console.c @@ -551,6 +551,7 @@ static ssize_t __send_control_msg(struct ports_device *portdev, u32 port_id, struct scatterlist sg[1]; struct virtqueue *vq; unsigned int len; + u64 end; if (!use_multiport(portdev)) return 0; @@ -567,9 +568,15 @@ static ssize_t __send_control_msg(struct ports_device *portdev, u32 port_id, if (virtqueue_add_outbuf(vq, sg, 1, >cpkt, GFP_ATOMIC) == 0) { virtqueue_kick(vq); + end = jiffies + 5 * HZ; while (!virtqueue_get_buf(vq, ) - && !virtqueue_is_broken(vq)) + && !virtqueue_is_broken(vq)) { + if (unlikely(end < jiffies)) { + dev_warn(>vdev->dev, "send_control_msg timed out!\n"); + break; + } cpu_relax(); + } } spin_unlock(>c_ovq_lock); #include #include #include #include #include #include #include #include #include #include #define _GNU_SOURCE #include #ifndef F_GETPIPE_SZ # define F_GETPIPE_SZ 1032 /* The Linux number for the option */ #endif const char *this_name = "agent-fifo"; static void usage(void) { printf("usage: %s dev bigfile\n" "\n",this_name); exit(-1); } static void __vdie(const char *fmt, va_list ap, int err) { int ret = errno; const char *p = this_name; if (err && errno) perror(p); else ret = -1; fprintf(stderr, " "); vfprintf(stderr, fmt, ap); fprintf(stderr, "\n"); exit(ret); } void die(const char *fmt, ...) { va_list ap; va_start(ap, fmt); __vdie(fmt, ap, 0); va_end(ap); } void pdie(const char *fmt, ...) { va_list ap; va_start(ap, fmt); __vdie(fmt, ap, 1); va_end(ap); } static unsigned long long time() { struct timeval tv; gettimeofday(, NULL); return tv.tv_sec * 100 + tv.tv_usec; } static void print_time(unsigned long long time) { unsigned long long seconds; unsigned long long usecs; seconds = time / 100; usecs = time - (seconds * 100); printf("time: %llu.%06llu\n", seconds, usecs); } int main(int argc, char *argv[]) { unsigned long long start; unsigned long long end; struct stat st; off_t size; char *file; char *port; int page_size; int pipe_size; int brass[2]; int ret; int cfd; int fd; if (argc < 3) usage(); port = argv[1]; file = argv[2]; fd = ope
Re: [PATCH] virtio: Workaround fix for hard hang on guest using fifos
On Fri, 25 Mar 2022 17:36:27 -0700 Linus Torvalds wrote: > On Fri, Mar 25, 2022 at 2:20 PM Steven Rostedt wrote: > > > > With the updates to change the size being passed in the splice from > > page_size to pipe_size, this never finished (it would copy around a meg or > > so). And stopped. When I killed the agent-fifo task on the guest, the guest > > hung hard. > > Without knowing (or really caring) at all how virtqueue works, this > sounds very much like the classic pipe deadlock where two processes > communicate over a pair of pipes, sending each other commands, and > replying to each other with status updates. It does look like this. Note, this is not due to my application, since it flows in one direction, but I think the way virtio pipes are implemented can cause this to occur. A virtio pipe on the guest looks like a normal bidirectional char device. You open it, read it, write to it. No problem. On the host, there are two FIFOs that are attached to the guest. One for each direction. That is, the host is using two pipes that convert to a character device on the guest. > > And you absolutely cannot do that if one side can possibly want to up > fill the whole pipe. > > Deadlock: > > - process A is trying to send data to process B (on 'pipe_A'), and > blocks because the pipe is full > > - process B is reads the data and everything is fine, and A gets to continue > I think you left out a step here, where A is blocked again. > - but then process B sends some stratus update the other way (on > 'pipe_B' - you can't use the same pipe for bidirectional, it's why you > use a pair of pipes or a socketpair) and waits for the result. > > - now A and B are both waiting for each other - A is waiting for B to > empty the big bunch of data it's sending, and B is waiting for the > result for the (small) command it sent. > > and neither makes any progress. > > You can find several mentions of these kinds of problems by just > googling for "bidirectional pipe deadlock" or similar. > > The solution is invariably to either > > (a) make sure that nobody writes even remotely close to enough data > to fill a pipe before reading the other pipe (you can still fill up a > pipe, but at least somebody is always going to succeed and make > progress and do the read to make progress). > > (b) make sure everybody who writes to a pipe will use nonblocking IO > (and is willing to do reads in between to satisfy the other end). > > That first case is basically what one of the PIPE_BUF guarantees is > all about (the other one is the atomicity it guarantees, ie you can > write a "command packet" and be guaranteed that readers will see it > without data mixed in from other writes). > > I have no idea what your client/agent does and how it interacts with > the virtio pipes, but it really _sounds_ like a very similar issue, > where it used to work (because PIPE_BUF) and now no longer does > (because pipe filled). So the agent writes to the virtio char device on the guest, while the client is reading the guest's FIFO on the host. One directional. > > And that virtio_console __send_control_msg() pattern very much sounds > like a "send data and wait for ACK" behavior of "process B". Something happens where data stops transferring (the virtio system waiting for status?) This mostly happens when I run the agent under gdb. Which it stops transferring data. Everything is still fine. I hit Ctrl^C twice to get back to the gdb command prompt, and then I kill the process with the gdb "kill" command. It asks me if I'm sure, I type "y", and then boom! The system is hung. The guest locks up. And printk()s show that the exit of the agent is in that loop. watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [agent-test:1165] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables bnep bluetooth ecdh_generic ecc cfg80211 rfkill cirrus drm_shmem_helper drm_kms_helper vmw_vsock_virtio_transport vmw_vsock_virtio_transport_common pcspkr joydev serio_raw vsock virtio_console drm virtio_balloon 8139too i2c_piix4 floppy nfsd auth_rpcgss nfs_acl lockd grace sunrpc virtio_net net_failover failover virtio_pci virtio virtio_pci_legacy_dev virtio_pci_modern_dev virtio_ring 8139cp mii ata_generic pata_acpi CPU: 0 PID: 1165 Comm: agent-test Not tainted 5.17.0-test+ #41 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 RIP: 0010:virtqueue_get_buf_ctx_split+0x0/0xd0 [virtio_ring] Code: 75 a6 45 31 e4 39 45 2c 74 e6 0f 0b 83 c6 01 48 83 c2 10 39 c6 75 b4 45 31 e4 39 45 2c 74 d1 0f 0b 66 0f 1f 84 00 00 00 00 00 <0f> 1f 44 00 00 41 54 53 80 7f 3b 00 0f 85 9a 00 00 00 48 8b 47 68 RSP: 0018:b85c012e7c68 EFLAGS: 0246 RAX: RBX: 9517051713
Re: [PATCH 2/3] x86/xen: move paravirt lazy code
On Wed, 13 Sep 2023 13:38:27 +0200 Juergen Gross wrote: > diff --git a/include/trace/events/xen.h b/include/trace/events/xen.h > index 44a3f565264d..0577f0cdd231 100644 > --- a/include/trace/events/xen.h > +++ b/include/trace/events/xen.h > @@ -6,26 +6,26 @@ > #define _TRACE_XEN_H > > #include > -#include > +#include > #include > > struct multicall_entry; > > /* Multicalls */ > DECLARE_EVENT_CLASS(xen_mc__batch, > - TP_PROTO(enum paravirt_lazy_mode mode), > + TP_PROTO(enum xen_lazy_mode mode), > TP_ARGS(mode), > TP_STRUCT__entry( > - __field(enum paravirt_lazy_mode, mode) > + __field(enum xen_lazy_mode, mode) > ), > TP_fast_assign(__entry->mode = mode), > TP_printk("start batch LAZY_%s", > - (__entry->mode == PARAVIRT_LAZY_MMU) ? "MMU" : > - (__entry->mode == PARAVIRT_LAZY_CPU) ? "CPU" : "NONE") > + (__entry->mode == XEN_LAZY_MMU) ? "MMU" : > + (__entry->mode == XEN_LAZY_CPU) ? "CPU" : "NONE") There's helper functions that make the above easier to implement as well as exports the symbols so that user space can parse this better: TRACE_DEFINE_ENUM(XEN_LAZY_NONE); TRACE_DEFINE_ENUM(XEN_LAZY_MMU); TRACE_DEFINE_ENUM(XEN_LAZY_CPU); [..] TP_printk("start batch LAZY_%s", __print_symbolic(mode, { XEN_LAZY_NONE, "NONE" }, { XEN_LAZY_MMU, "MMU }, { XEN_LAZY_CPU, "CPU" })) Then user space parsers that read the raw data can convert these events into something humans can read. -- Steve > ); > #define DEFINE_XEN_MC_BATCH(name)\ > DEFINE_EVENT(xen_mc__batch, name, \ > - TP_PROTO(enum paravirt_lazy_mode mode), \ > + TP_PROTO(enum xen_lazy_mode mode), \ >TP_ARGS(mode)) > > DEFINE_XEN_MC_BATCH(xen_mc_batch); ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 24/36] printk: Remove trace_.*_rcuidle() usage
On Thu, 9 Jun 2022 15:02:20 +0200 Petr Mladek wrote: > > I'm somewhat curious whether we can actually remove that trace event. > > Good question. > > Well, I think that it might be useful. It allows to see trace and > printk messages together. Yes people still use it. I was just asked about it at Kernel Recipes. That is, someone wanted printk mixed in with the tracing, and I told them about this event (which they didn't know about but was happy to hear that it existed). -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2 33/44] ftrace: WARN on rcuidle
Nit, the subject should have "tracing:" an not "ftrace:" as the former encompasses the tracing infrastructure and the latter is for the function hook part of that. On Mon, 19 Sep 2022 12:00:12 +0200 Peter Zijlstra wrote: > CONFIG_GENERIC_ENTRY disallows any and all tracing when RCU isn't > enabled. > > XXX if s390 (the only other GENERIC_ENTRY user as of this writing) > isn't comfortable with this, we could switch to > HAVE_NOINSTR_VALIDATION which is x86_64 only atm. > > Signed-off-by: Peter Zijlstra (Intel) > --- > include/linux/tracepoint.h | 13 - > kernel/trace/trace.c |3 +++ > 2 files changed, 15 insertions(+), 1 deletion(-) > > --- a/include/linux/tracepoint.h > +++ b/include/linux/tracepoint.h > @@ -178,6 +178,16 @@ static inline struct tracepoint *tracepo > #endif /* CONFIG_HAVE_STATIC_CALL */ > > /* > + * CONFIG_GENERIC_ENTRY archs are expected to have sanitized entry and idle > + * code that disallow any/all tracing/instrumentation when RCU isn't > watching. > + */ > +#ifdef CONFIG_GENERIC_ENTRY > +#define RCUIDLE_COND(rcuidle)(rcuidle) > +#else Should probably move the below comment to here: /* srcu can't be used from NMI */ > +#define RCUIDLE_COND(rcuidle)(rcuidle && in_nmi()) > +#endif > + > +/* > * it_func[0] is never NULL because there is at least one element in the > array > * when the array itself is non NULL. > */ > @@ -189,7 +199,8 @@ static inline struct tracepoint *tracepo > return; \ > \ > /* srcu can't be used from NMI */ \ And remove the above. -- Steve > - WARN_ON_ONCE(rcuidle && in_nmi()); \ > + if (WARN_ON_ONCE(RCUIDLE_COND(rcuidle)))\ > + return; \ > \ > /* keep srcu and sched-rcu usage consistent */ \ > preempt_disable_notrace(); \ > --- a/kernel/trace/trace.c > +++ b/kernel/trace/trace.c > @@ -3104,6 +3104,9 @@ void __trace_stack(struct trace_array *t > return; > } > > + if (WARN_ON_ONCE(IS_ENABLED(CONFIG_GENERIC_ENTRY))) > + return; > + > /* >* When an NMI triggers, RCU is enabled via ct_nmi_enter(), >* but if the above rcu_is_watching() failed, then the NMI > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 3/6] ftrace/x86: Warn and ignore graph tracing when RCU is disabled
On Mon, 23 Jan 2023 21:50:12 +0100 Peter Zijlstra wrote: > All RCU disabled code should be noinstr and hence we should never get > here -- when we do, WARN about it and make sure to not actually do > tracing. > > Signed-off-by: Peter Zijlstra (Intel) > --- > arch/x86/kernel/ftrace.c |3 +++ > 1 file changed, 3 insertions(+) > > --- a/arch/x86/kernel/ftrace.c > +++ b/arch/x86/kernel/ftrace.c > @@ -646,6 +646,9 @@ void prepare_ftrace_return(unsigned long > if (unlikely(atomic_read(>tracing_graph_pause))) > return; > > + if (WARN_ONCE(!rcu_is_watching(), "RCU not on for: %pS\n", (void *)ip)) > + return; > + Please add this to after recursion trylock below. Although WARN_ONCE() should not not have recursion issues, as function tracing can do weird things, I rather be safe than sorry, and not have the system triple boot due to some path that might get added in the future. If rcu_is_watching() is false, it will still get by the below recursion check and warn. That is, the below check should be done before this function calls any other function. > bit = ftrace_test_recursion_trylock(ip, *parent); > if (bit < 0) > return; > -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 3/6] ftrace/x86: Warn and ignore graph tracing when RCU is disabled
On Mon, 23 Jan 2023 16:53:04 -0500 Steven Rostedt wrote: > On Mon, 23 Jan 2023 21:50:12 +0100 > Peter Zijlstra wrote: > > > All RCU disabled code should be noinstr and hence we should never get > > here -- when we do, WARN about it and make sure to not actually do > > tracing. > > > > Signed-off-by: Peter Zijlstra (Intel) > > --- > > arch/x86/kernel/ftrace.c |3 +++ > > 1 file changed, 3 insertions(+) > > > > --- a/arch/x86/kernel/ftrace.c > > +++ b/arch/x86/kernel/ftrace.c > > @@ -646,6 +646,9 @@ void prepare_ftrace_return(unsigned long > > if (unlikely(atomic_read(>tracing_graph_pause))) > > return; > > > > + if (WARN_ONCE(!rcu_is_watching(), "RCU not on for: %pS\n", (void *)ip)) > > + return; > > + > > Please add this to after recursion trylock below. Although WARN_ONCE() > should not not have recursion issues, as function tracing can do weird > things, I rather be safe than sorry, and not have the system triple boot > due to some path that might get added in the future. > > If rcu_is_watching() is false, it will still get by the below recursion > check and warn. That is, the below check should be done before this > function calls any other function. > > > bit = ftrace_test_recursion_trylock(ip, *parent); > > if (bit < 0) > > return; > > > Actually, perhaps we can just add this, and all you need to do is create and set CONFIG_NO_RCU_TRACING (or some other name). This should cover all ftrace locations. (Uncompiled). -- Steve diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h index c303f7a114e9..10ee3fbb9113 100644 --- a/include/linux/trace_recursion.h +++ b/include/linux/trace_recursion.h @@ -135,6 +135,22 @@ extern void ftrace_record_recursion(unsigned long ip, unsigned long parent_ip); # define do_ftrace_record_recursion(ip, pip) do { } while (0) #endif +#ifdef CONFIG_NO_RCU_TRACING +# define trace_warn_on_no_rcu(ip) \ + ({ \ + bool __ret = false; \ + if (!trace_recursion_test(TRACE_RECORD_RECURSION_BIT)) { \ + trace_recursion_set(TRACE_RECORD_RECURSION_BIT); \ + __ret = WARN_ONCE(!rcu_is_watching(), \ + "RCU not on for: %pS\n", (void *)ip); \ + trace_recursion_clear(TRACE_RECORD_RECURSION_BIT); \ + } \ + __ret; \ + }) +#else +# define trace_warn_on_no_rcu(ip) false +#endif + /* * Preemption is promised to be disabled when return bit >= 0. */ @@ -144,6 +160,9 @@ static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsign unsigned int val = READ_ONCE(current->trace_recursion); int bit; + if (trace_warn_on_no_rcu(ip)) + return -1; + bit = trace_get_context_bit() + start; if (unlikely(val & (1 << bit))) { /* ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2 0/6] use canonical ftrace path whenever possible
On Fri, 10 Mar 2023 12:06:58 -0700 Ross Zwisler wrote: > On Fri, Mar 10, 2023 at 03:29:49AM -0500, Michael S. Tsirkin wrote: > > On Wed, Feb 15, 2023 at 03:33:44PM -0700, Ross Zwisler wrote: > > > Changes in v2: > > > * Dropped patches which were pulled into maintainer trees. > > > * Split BPF patches out into another series targeting bpf-next. > > > * trace-agent now falls back to debugfs if tracefs isn't present. > > > * Added Acked-by from m...@redhat.com to series. > > > * Added a typo fixup for the virtio-trace README. > > > > > > Steven, assuming there are no objections, would you feel comfortable > > > taking this series through your tree? > > > > Acked-by: Michael S. Tsirkin > > > > if you want the virtio patches through my tree after all, let me know. > > Yes, please, that would be great. I'll send out v3 with the few patches that > haven't gotten a response, but I'll drop the virtio patches assuming you've > got them. > Since the last patch 6/6 is not tracing related, I would prefer it to go through the virtio tree. For patches 1-5, please add: Reviewed-by: Steven Rostedt (Google) Thanks! -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 01/20] x86: move prepare_ftrace_return prototype to header
On Tue, 16 May 2023 21:35:30 +0200 Arnd Bergmann wrote: > From: Arnd Bergmann > > On 32-bit builds, the prepare_ftrace_return() function only has a global > definition, but no prototype before it, which causes a warning: > > arch/x86/kernel/ftrace.c:625:6: warning: no previous prototype for > ‘prepare_ftrace_return’ [-Wmissing-prototypes] > 625 | void prepare_ftrace_return(unsigned long ip, unsigned long *parent, > > Move the prototype that is already needed for some configurations into > a header file where it can be seen unconditionally. > > Signed-off-by: Arnd Bergmann > --- > arch/x86/include/asm/ftrace.h | 3 +++ > arch/x86/kernel/ftrace.c | 3 --- > 2 files changed, 3 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h > index 5061ac98ffa1..b8d4a07f9595 100644 > --- a/arch/x86/include/asm/ftrace.h > +++ b/arch/x86/include/asm/ftrace.h Acked-by: Steven Rostedt (Google) -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization