Re: linux-next: Tree for Jun 23 (objtool (2))

2020-07-16 Thread Miroslav Benes
On Wed, 15 Jul 2020, Josh Poimboeuf wrote:

> On Wed, Jul 15, 2020 at 03:41:55PM +0200, Petr Mladek wrote:
> > On Wed 2020-07-15 14:10:54, Petr Mladek wrote:
> > > On Wed 2020-07-15 13:11:14, Miroslav Benes wrote:
> > > > Petr, would you agree to revert -flive-patching.
> > > 
> > > Yes, I agree.
> > 
> > Or better to say that I will not block it.
> > 
> > I see that it causes maintenance burden. There are questions about
> > reliability and performance impact. I do not have a magic solution
> > in the pocket.
> > 
> > Anyway, we need a solution to know what functions need to get livepatched.
> > I do not have experience with comparing the assembly, so I do not know
> > how it is complicated and reliable.
> 
> Thanks Petr/Miroslav.  I can do the revert patch.  It doesn't have to be
> a permanent revert.  I'd certainly be willing to ACK it again in the
> future if/when it becomes more ready.

Ok.

> Yannick has agreed to start looking at the objtool function diff
> feature.  It's purely theoretical at the moment, we'll see how it works
> in practice.  We already do something similar in kpatch-build -- it
> differs from the objtool model, but it at least shows that something
> similar is possible.

Great. I'm looking forward to seeing that.

Thanks
Miroslav


Re: linux-next: Tree for Jun 23 (objtool (2))

2020-07-15 Thread Josh Poimboeuf
On Wed, Jul 15, 2020 at 03:41:55PM +0200, Petr Mladek wrote:
> On Wed 2020-07-15 14:10:54, Petr Mladek wrote:
> > On Wed 2020-07-15 13:11:14, Miroslav Benes wrote:
> > > Petr, would you agree to revert -flive-patching.
> > 
> > Yes, I agree.
> 
> Or better to say that I will not block it.
> 
> I see that it causes maintenance burden. There are questions about
> reliability and performance impact. I do not have a magic solution
> in the pocket.
> 
> Anyway, we need a solution to know what functions need to get livepatched.
> I do not have experience with comparing the assembly, so I do not know
> how it is complicated and reliable.

Thanks Petr/Miroslav.  I can do the revert patch.  It doesn't have to be
a permanent revert.  I'd certainly be willing to ACK it again in the
future if/when it becomes more ready.

Yannick has agreed to start looking at the objtool function diff
feature.  It's purely theoretical at the moment, we'll see how it works
in practice.  We already do something similar in kpatch-build -- it
differs from the objtool model, but it at least shows that something
similar is possible.

-- 
Josh



Re: linux-next: Tree for Jun 23 (objtool (2))

2020-07-15 Thread Petr Mladek
On Wed 2020-07-15 14:10:54, Petr Mladek wrote:
> On Wed 2020-07-15 13:11:14, Miroslav Benes wrote:
> > Petr, would you agree to revert -flive-patching.
> 
> Yes, I agree.

Or better to say that I will not block it.

I see that it causes maintenance burden. There are questions about
reliability and performance impact. I do not have a magic solution
in the pocket.

Anyway, we need a solution to know what functions need to get livepatched.
I do not have experience with comparing the assembly, so I do not know
how it is complicated and reliable.

Best Regards,
Petr


Re: linux-next: Tree for Jun 23 (objtool (2))

2020-07-15 Thread Petr Mladek
On Wed 2020-07-15 13:11:14, Miroslav Benes wrote:
> Petr, would you agree to revert -flive-patching.

Yes, I agree.

Best Regards,
Petr


Re: linux-next: Tree for Jun 23 (objtool (2))

2020-07-15 Thread Miroslav Benes
On Tue, 14 Jul 2020, Josh Poimboeuf wrote:

> On Tue, Jul 14, 2020 at 12:56:21PM +0200, Miroslav Benes wrote:
> > On Thu, 2 Jul 2020, Josh Poimboeuf wrote:
> > 
> > > On Tue, Jun 23, 2020 at 08:06:07AM -0700, Randy Dunlap wrote:
> > > > On 6/22/20 11:28 PM, Stephen Rothwell wrote:
> > > > > Hi all,
> > > > > 
> > > > > Changes since 20200622:
> > > > > 
> > > > 
> > > > on x86_64:
> > > > 
> > > > arch/x86/kernel/cpu/mce/core.o: warning: objtool: mce_timed_out()+0x24: 
> > > > unreachable instruction
> > > > kernel/exit.o: warning: objtool: __x64_sys_exit_group()+0x14: 
> > > > unreachable instruction
> > > > 
> > > > Full randconfig file is attached.
> > > 
> > > More livepatch...
> > 
> > Correct.
> > 
> > Both are known and I thought Josh had fixes queued somewhere for both, but 
> > my memory fails me quite often. See below.
> 
> I did have fixes for some of them in a stash somewhere, but I never
> finished them because I decided it's a GCC bug.

Same here.
 
> > However, I think it is time to decide how to approach this whole saga. It 
> > seems that there are not so many places in the kernel in need of 
> > __noreturn annotation in the end and as jikos argued at least some of 
> > those should be fixed regardless.
> 
> I would agree that global functions like do_group_exit() deserve a
> __noreturn annotation, though it should be in the header file.  But
> static functions shouldn't need it.

Agreed. I'll post the patches for global functions eventually, but see 
below first.

> > Josh, should I prepare proper patches and submit them to relevant
> > maintainers to see where this path is going?
> 
> If that's how you want to handle it, ok, but it doesn't seem right to
> me, for the static functions at least.
> 
> > It would be much better to fix it in GCC, but it has been like banging 
> > one's head against a wall so far. Josh, you wanted to create a bug 
> > for GCC in this respect in the past? Has that happened?
> 
> I didn't open a bug, but I could, if you think that would help.  I
> haven't had a lot of success with GCC bugs in the past.

Understood.

> > If I remember correctly, we discussed briefly a possibility to cope with 
> > that in objtool, but no solution was presented.
> 
> That would also feel like a GCC workaround and might impede objtool's
> ability to find bugs like this one, and possibly more serious bugs.
> 
> > Removing -flive-patching is also a possibility. I don't like it much, but 
> > we discussed it with Petr M. a couple of months ago and it might be a way 
> > too.
> 
> -flive-patching has many problems which I outlined before.  None of them
> have been addressed.  I still feel the same way, that it should be
> reverted until it's ready.  Otherwise it's a drain on upstream.
> 
> Also, if the GCC developers won't acknowledge this bug then it doesn't
> give me confidence in their ability to keep the feature working as
> optimizations are added or changed.

I must admit that I've started to share the sentiment recently. And it is 
probably the main reason for changing my mind about the whole thing.

> I still think a potential alternative exists: objtool could be used as a
> simple tree-wide object diff tool by generating a checksum for each
> function.  Then the patch can be applied and built to see exactly which
> functions have changed, based on the changed checksums.  In which case
> this feature would no longer be needed anyway, would you agree?

Yes.

> I also think that could be a first step for converging our patch
> creation processes.

Yes again.

Petr, would you agree to revert -flive-patching due to reasons above? Is 
there anything you want to add?

Miroslav


Re: linux-next: Tree for Jun 23 (objtool (2))

2020-07-14 Thread Josh Poimboeuf
On Tue, Jul 14, 2020 at 12:56:21PM +0200, Miroslav Benes wrote:
> On Thu, 2 Jul 2020, Josh Poimboeuf wrote:
> 
> > On Tue, Jun 23, 2020 at 08:06:07AM -0700, Randy Dunlap wrote:
> > > On 6/22/20 11:28 PM, Stephen Rothwell wrote:
> > > > Hi all,
> > > > 
> > > > Changes since 20200622:
> > > > 
> > > 
> > > on x86_64:
> > > 
> > > arch/x86/kernel/cpu/mce/core.o: warning: objtool: mce_timed_out()+0x24: 
> > > unreachable instruction
> > > kernel/exit.o: warning: objtool: __x64_sys_exit_group()+0x14: unreachable 
> > > instruction
> > > 
> > > Full randconfig file is attached.
> > 
> > More livepatch...
> 
> Correct.
> 
> Both are known and I thought Josh had fixes queued somewhere for both, but 
> my memory fails me quite often. See below.

I did have fixes for some of them in a stash somewhere, but I never
finished them because I decided it's a GCC bug.

> However, I think it is time to decide how to approach this whole saga. It 
> seems that there are not so many places in the kernel in need of 
> __noreturn annotation in the end and as jikos argued at least some of 
> those should be fixed regardless.

I would agree that global functions like do_group_exit() deserve a
__noreturn annotation, though it should be in the header file.  But
static functions shouldn't need it.

> Josh, should I prepare proper patches and submit them to relevant
> maintainers to see where this path is going?

If that's how you want to handle it, ok, but it doesn't seem right to
me, for the static functions at least.

> It would be much better to fix it in GCC, but it has been like banging 
> one's head against a wall so far. Josh, you wanted to create a bug 
> for GCC in this respect in the past? Has that happened?

I didn't open a bug, but I could, if you think that would help.  I
haven't had a lot of success with GCC bugs in the past.

> If I remember correctly, we discussed briefly a possibility to cope with 
> that in objtool, but no solution was presented.

That would also feel like a GCC workaround and might impede objtool's
ability to find bugs like this one, and possibly more serious bugs.

> Removing -flive-patching is also a possibility. I don't like it much, but 
> we discussed it with Petr M. a couple of months ago and it might be a way 
> too.

-flive-patching has many problems which I outlined before.  None of them
have been addressed.  I still feel the same way, that it should be
reverted until it's ready.  Otherwise it's a drain on upstream.

Also, if the GCC developers won't acknowledge this bug then it doesn't
give me confidence in their ability to keep the feature working as
optimizations are added or changed.

I still think a potential alternative exists: objtool could be used as a
simple tree-wide object diff tool by generating a checksum for each
function.  Then the patch can be applied and built to see exactly which
functions have changed, based on the changed checksums.  In which case
this feature would no longer be needed anyway, would you agree?

I also think that could be a first step for converging our patch
creation processes.

-- 
Josh



Re: linux-next: Tree for Jun 23 (objtool (2))

2020-07-14 Thread Miroslav Benes
On Thu, 2 Jul 2020, Josh Poimboeuf wrote:

> On Tue, Jun 23, 2020 at 08:06:07AM -0700, Randy Dunlap wrote:
> > On 6/22/20 11:28 PM, Stephen Rothwell wrote:
> > > Hi all,
> > > 
> > > Changes since 20200622:
> > > 
> > 
> > on x86_64:
> > 
> > arch/x86/kernel/cpu/mce/core.o: warning: objtool: mce_timed_out()+0x24: 
> > unreachable instruction
> > kernel/exit.o: warning: objtool: __x64_sys_exit_group()+0x14: unreachable 
> > instruction
> > 
> > Full randconfig file is attached.
> 
> More livepatch...

Correct.

Both are known and I thought Josh had fixes queued somewhere for both, but 
my memory fails me quite often. See below.

However, I think it is time to decide how to approach this whole saga. It 
seems that there are not so many places in the kernel in need of 
__noreturn annotation in the end and as jikos argued at least some of 
those should be fixed regardless. Josh, should I prepare proper patches 
and submit them to relevant maintainers to see where this path is going?

It would be much better to fix it in GCC, but it has been like banging 
one's head against a wall so far. Josh, you wanted to create a bug 
for GCC in this respect in the past? Has that happened?

If I remember correctly, we discussed briefly a possibility to cope with 
that in objtool, but no solution was presented.

Removing -flive-patching is also a possibility. I don't like it much, but 
we discussed it with Petr M. a couple of months ago and it might be a way 
too.

Thanks
Miroslav

---

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 14e4b4d17ee5..469a71ecea3c 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -279,7 +279,7 @@ static int fake_panic;
 static atomic_t mce_fake_panicked;
 
 /* Panic in progress. Enable interrupts and wait for final IPI */
-static void wait_for_panic(void)
+static void __noreturn wait_for_panic(void)
 {
long timeout = PANIC_TIMEOUT*USEC_PER_SEC;
 
diff --git a/kernel/exit.c b/kernel/exit.c
index 727150f28103..570649152e7f 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -877,7 +877,7 @@ SYSCALL_DEFINE1(exit, int, error_code)
  * as well as by sys_exit_group (below).
  */
 void
-do_group_exit(int exit_code)
+__noreturn do_group_exit(int exit_code)
 {
struct signal_struct *sig = current->signal;



Re: linux-next: Tree for Jun 23 (objtool (2))

2020-07-02 Thread Josh Poimboeuf
On Tue, Jun 23, 2020 at 08:06:07AM -0700, Randy Dunlap wrote:
> On 6/22/20 11:28 PM, Stephen Rothwell wrote:
> > Hi all,
> > 
> > Changes since 20200622:
> > 
> 
> on x86_64:
> 
> arch/x86/kernel/cpu/mce/core.o: warning: objtool: mce_timed_out()+0x24: 
> unreachable instruction
> kernel/exit.o: warning: objtool: __x64_sys_exit_group()+0x14: unreachable 
> instruction
> 
> Full randconfig file is attached.

More livepatch...

-- 
Josh



Re: linux-next: Tree for Jun 23 (objtool (2))

2020-06-23 Thread Randy Dunlap
On 6/22/20 11:28 PM, Stephen Rothwell wrote:
> Hi all,
> 
> Changes since 20200622:
> 

on x86_64:

arch/x86/kernel/cpu/mce/core.o: warning: objtool: mce_timed_out()+0x24: 
unreachable instruction
kernel/exit.o: warning: objtool: __x64_sys_exit_group()+0x14: unreachable 
instruction

Full randconfig file is attached.

-- 
~Randy
Reported-by: Randy Dunlap 
#
# Automatically generated file; DO NOT EDIT.
# Linux/x86_64 5.8.0-rc2 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="gcc (SUSE Linux) 7.5.0"
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=70500
CONFIG_LD_VERSION=23200
CONFIG_CLANG_VERSION=0
CONFIG_CC_CAN_LINK=y
CONFIG_CC_CAN_LINK_STATIC=y
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_TABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_UAPI_HEADER_TEST=y
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
# CONFIG_KERNEL_GZIP is not set
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
CONFIG_KERNEL_XZ=y
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_INIT=""
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SYSVIPC=y
CONFIG_WATCH_QUEUE=y
# CONFIG_CROSS_MEMORY_ATTACH is not set
# CONFIG_USELIB is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_GENERIC_IRQ_MIGRATION=y
CONFIG_HARDIRQS_SW_RESEND=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_SIM=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
# end of IRQ subsystem

CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_INIT=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_HZ_PERIODIC=y
# CONFIG_NO_HZ_IDLE is not set
# CONFIG_NO_HZ_FULL is not set
CONFIG_CONTEXT_TRACKING=y
CONFIG_CONTEXT_TRACKING_FORCE=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
# end of Timers subsystem

# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_COUNT=y

#
# CPU/Task time and stats accounting
#
CONFIG_VIRT_CPU_ACCOUNTING=y
# CONFIG_TICK_CPU_ACCOUNTING is not set
CONFIG_VIRT_CPU_ACCOUNTING_GEN=y
CONFIG_IRQ_TIME_ACCOUNTING=y
CONFIG_HAVE_SCHED_AVG_IRQ=y
# CONFIG_SCHED_THERMAL_PRESSURE is not set
CONFIG_PSI=y
# CONFIG_PSI_DEFAULT_DISABLED is not set
# end of CPU/Task time and stats accounting

# CONFIG_CPU_ISOLATION is not set

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU_GENERIC=y
CONFIG_TASKS_RCU=y
CONFIG_TASKS_RUDE_RCU=y
CONFIG_TASKS_TRACE_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
# end of RCU Subsystem

CONFIG_BUILD_BIN2C=y
CONFIG_IKCONFIG=m
CONFIG_IKHEADERS=y
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y

#
# Scheduler features
#
# end of Scheduler features

CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y
CONFIG_CC_HAS_INT128=y
CONFIG_ARCH_SUPPORTS_INT128=y
# CONFIG_NUMA_BALANCING is not set
CONFIG_CGROUPS=y
CONFIG_PAGE_COUNTER=y
CONFIG_MEMCG=y
CONFIG_MEMCG_KMEM=y
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
# CONFIG_CFS_BANDWIDTH is not set
# CONFIG_RT_GROUP_SCHED is not set
# CONFIG_CGROUP_PIDS is not set
# CONFIG_CGROUP_RDMA is not set
# CONFIG_CGROUP_FREEZER is not set
CONFIG_CGROUP_HUGETLB=y
# CONFIG_CPUSETS is not set
# CONFIG_CGROUP_DEVICE is not set
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_PERF=y
CONFIG_CGROUP_BPF=y
# CONFIG_CGROUP_DEBUG is not set
CONFIG_SOCK_CGROUP_DATA=y
CONFIG_CHECKPOINT_RESTORE=y
CONFIG_SCHED_AUTOGROUP=y
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
# CONFIG_RD_GZIP is not set
# CONFIG_RD_BZIP2 is not set
# CONFIG_RD_LZMA is not set
CONFIG_RD_XZ=y
# CONFIG_RD_LZO is not set
# CONFIG_RD_LZ4 is not set
CONFIG_BOOT_CONFIG=y
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_HAVE_UID16=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_HAVE_PCSPKR_PLATFORM=y
CONFIG_BPF=y
CONFIG_EXPERT=y
# CONFIG_MULTIUSER is not set
CONFIG_SGETMASK_SYSCALL=y
CONFIG_SYSFS_SYSCALL=y
CONFIG_FHANDLE=y
CONFIG_POSIX_TIMERS=y
# CONFIG_PRINTK is not set
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
# CONFIG_BASE_FULL is not set
CONFIG_FUTEX=y
CONFIG_FUTEX_PI=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y