Re: [lttng-dev] date in lttng trace logs

2022-01-17 Thread Mathieu Desnoyers via lttng-dev
- On Jan 16, 2022, at 6:29 AM, lttng-dev  wrote: 

> Hi,
> Is it possible to have the date in lttng trace logs like below?

> [Sat-Jan-15 06:03:53.725525515] (+2.019614411) localhost:

See man babeltrace(1), especially "--clock-date". 
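For example, with the babeltrace 1 text viewer (trace path hypothetical):

```shell
# Print the full date with each event's timestamp:
babeltrace --clock-date /path/to/trace

# Related options: --clock-gmt (print in GMT instead of local time),
# --clock-seconds (timestamps as seconds since Epoch)
```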

Thanks, 

Mathieu 

> --
> Regards,
> Subrata Paul

> ___
> lttng-dev mailing list
> lttng-dev@lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

-- 
Mathieu Desnoyers 
EfficiOS Inc. 
http://www.efficios.com 
___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] Transfer of LTTng copyright ownership: Jesper Derehag

2022-01-14 Thread Mathieu Desnoyers via lttng-dev
Hi Jesper,

In order to facilitate maintenance of the LTTng project, would you agree to
transfer your copyright ownership for code authored in your own time to
EfficiOS Inc. ?

Even though most of those changes are somewhat trivial, I would prefer that
we keep a public record of this for the future.

Those changes are authored with this email:

Jesper Derehag 

This affects the lttng-tools and lttng-ust projects.

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


[lttng-dev] [PATCH lttng-ust] Copyright ownership transfer

2022-01-14 Thread Mathieu Desnoyers via lttng-dev
Apply copyright ownership transfer from David Goulet and Julien Desfossez
to EfficiOS Inc.

Link: https://lists.lttng.org/pipermail/lttng-dev/2022-January/030087.html
Link: https://lists.lttng.org/pipermail/lttng-dev/2022-January/030092.html
Signed-off-by: Mathieu Desnoyers 
Cc: David Goulet 
Cc: Julien Desfossez 
Change-Id: Ibc6bb52296406e68466d44ae991a7ab70134dd76
---
 include/lttng/ust-ctl.h| 2 +-
 include/lttng/ust-error.h  | 3 +--
 src/common/ustcomm.c   | 2 +-
 src/common/ustcomm.h   | 3 +--
 src/common/utils.c | 2 +-
 src/lib/lttng-ust-ctl/ustctl.c | 2 +-
 src/lib/lttng-ust/lttng-ust-comm.c | 2 +-
 src/lib/lttng-ust/strerror.c   | 2 +-
 8 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/include/lttng/ust-ctl.h b/include/lttng/ust-ctl.h
index 360bb09a..25e588b3 100644
--- a/include/lttng/ust-ctl.h
+++ b/include/lttng/ust-ctl.h
@@ -1,7 +1,7 @@
 /*
  * SPDX-License-Identifier: GPL-2.0-only
  *
- * Copyright (C) 2011 Julien Desfossez 
+ * Copyright (C) 2011 EfficiOS Inc.
  * Copyright (C) 2011-2013 Mathieu Desnoyers 
  */
 
diff --git a/include/lttng/ust-error.h b/include/lttng/ust-error.h
index 543c894a..49045bab 100644
--- a/include/lttng/ust-error.h
+++ b/include/lttng/ust-error.h
@@ -1,8 +1,7 @@
 /*
  * SPDX-License-Identifier: LGPL-2.1-only
  *
- * Copyright (C) 2011 David Goulet 
- * Copyright (C) 2011 Julien Desfossez 
+ * Copyright (C) 2011 EfficiOS Inc.
  * Copyright (C) 2011 Mathieu Desnoyers 
  */
 
diff --git a/src/common/ustcomm.c b/src/common/ustcomm.c
index 6dfec5ea..252ed4f2 100644
--- a/src/common/ustcomm.c
+++ b/src/common/ustcomm.c
@@ -1,7 +1,7 @@
 /*
  * SPDX-License-Identifier: LGPL-2.1-only
  *
- * Copyright (C) 2011 David Goulet 
+ * Copyright (C) 2011 EfficiOS Inc.
  * Copyright (C) 2011-2013 Mathieu Desnoyers 
  */
 
diff --git a/src/common/ustcomm.h b/src/common/ustcomm.h
index 734757c3..d1e9af13 100644
--- a/src/common/ustcomm.h
+++ b/src/common/ustcomm.h
@@ -1,8 +1,7 @@
 /*
  * SPDX-License-Identifier: LGPL-2.1-only
  *
- * Copyright (C) 2011 David Goulet 
- * Copyright (C) 2011 Julien Desfossez 
+ * Copyright (C) 2011 EfficiOS Inc.
  * Copyright (C) 2011 Mathieu Desnoyers 
  */
 
diff --git a/src/common/utils.c b/src/common/utils.c
index 108f76db..0c95b06c 100644
--- a/src/common/utils.c
+++ b/src/common/utils.c
@@ -1,7 +1,7 @@
 /*
  * SPDX-License-Identifier: LGPL-2.1-only
  *
- * Copyright (C) 2011 David Goulet 
+ * Copyright (C) 2011 EfficiOS Inc.
  * Copyright (C) 2011 Mathieu Desnoyers 
  */
 
diff --git a/src/lib/lttng-ust-ctl/ustctl.c b/src/lib/lttng-ust-ctl/ustctl.c
index 8c60bdb2..94d84843 100644
--- a/src/lib/lttng-ust-ctl/ustctl.c
+++ b/src/lib/lttng-ust-ctl/ustctl.c
@@ -1,7 +1,7 @@
 /*
  * SPDX-License-Identifier: GPL-2.0-only
  *
- * Copyright (C) 2011 Julien Desfossez 
+ * Copyright (C) 2011 EfficiOS Inc.
  * Copyright (C) 2011-2013 Mathieu Desnoyers 
  */
 
diff --git a/src/lib/lttng-ust/lttng-ust-comm.c 
b/src/lib/lttng-ust/lttng-ust-comm.c
index d6755079..caba7452 100644
--- a/src/lib/lttng-ust/lttng-ust-comm.c
+++ b/src/lib/lttng-ust/lttng-ust-comm.c
@@ -1,7 +1,7 @@
 /*
  * SPDX-License-Identifier: LGPL-2.1-only
  *
- * Copyright (C) 2011 David Goulet 
+ * Copyright (C) 2011 EfficiOS Inc.
  * Copyright (C) 2011 Mathieu Desnoyers 
  */
 
diff --git a/src/lib/lttng-ust/strerror.c b/src/lib/lttng-ust/strerror.c
index 1fa7fc3a..5a671a07 100644
--- a/src/lib/lttng-ust/strerror.c
+++ b/src/lib/lttng-ust/strerror.c
@@ -1,7 +1,7 @@
 /*
  * SPDX-License-Identifier: LGPL-2.1-only
  *
- * Copyright (C) 2011 David Goulet 
+ * Copyright (C) 2011 EfficiOS Inc.
  * Copyright (C) 2011-2013 Mathieu Desnoyers 
  */
 
-- 
2.20.1



[lttng-dev] [PATCH lttng-modules] Copyright ownership transfer

2022-01-14 Thread Mathieu Desnoyers via lttng-dev
Apply copyright ownership transfer from Julien Desfossez to EfficiOS Inc.

Link: https://lists.lttng.org/pipermail/lttng-dev/2022-January/030092.html
Signed-off-by: Mathieu Desnoyers 
Cc: Julien Desfossez 
Change-Id: Ida168b1fbe6589cb371a549ef14d9b4b28b221b3
---
 .../lttng-syscalls-extractor/lttng-syscalls-extractor.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/include/instrumentation/syscalls/lttng-syscalls-extractor/lttng-syscalls-extractor.c
 
b/include/instrumentation/syscalls/lttng-syscalls-extractor/lttng-syscalls-extractor.c
index 0f683750..8670d25f 100644
--- 
a/include/instrumentation/syscalls/lttng-syscalls-extractor/lttng-syscalls-extractor.c
+++ 
b/include/instrumentation/syscalls/lttng-syscalls-extractor/lttng-syscalls-extractor.c
@@ -5,7 +5,7 @@
  * Dump syscall metadata to console.
  *
  * Copyright 2011 Mathieu Desnoyers 
- * Copyright 2011 Julien Desfossez 
+ * Copyright 2011 EfficiOS Inc.
  */
 
 #include 
-- 
2.20.1



Re: [lttng-dev] Transfer of LTTng copyright ownership: David Goulet

2022-01-14 Thread Mathieu Desnoyers via lttng-dev
- On Jan 14, 2022, at 2:34 PM, David Goulet dgou...@ev0ke.net wrote:

> On 14 Jan (14:31:02), Mathieu Desnoyers wrote:
>> Hi David,
>> 
>> In order to facilitate maintenance of the LTTng project, would you agree to
>> transfer your copyright ownership for code authored outside of your 
>> employment
>> at EfficiOS to EfficiOS Inc. ?
>> 
>> This particularly affects the lttng-tools and lttng-ust files with copyright:
>> 
>> Copyright (C) 201N David Goulet 
> 
> No problem.
> 
> I agree and this email is GPG signed :).

Great, I will CC you on the patches applying the copyright ownership transfer.
Please reply to them publicly with your Signed-off-by tag so I can add it
before merging them.
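For reference, the requested sign-off can be produced with git's -s/--signoff flag. A throwaway-repo sketch (names, email and message illustrative):

```shell
# Illustrative only: create a disposable repo and make a signed-off commit.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git -c user.name="David Goulet" -c user.email="dgoulet@example.net" \
    commit --allow-empty -q -s -m "Apply copyright ownership transfer"
# -s appends a Signed-off-by trailer taken from user.name/user.email;
# this prints the trailer from the commit message:
git log -1 --format=%B | grep "Signed-off-by:"
```

An existing commit can be amended the same way with `git commit --amend -s --no-edit`.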

Thanks,

Mathieu

> 
> Cheers!
> David
> 
> --
> xtN36bqNuwFHhNp1j45nwcCwE5HeLkdFZT9naRvLgG0=

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [lttng-dev] Transfer of LTTng copyright ownership: Simon Marchi

2022-01-14 Thread Mathieu Desnoyers via lttng-dev



- On Jan 14, 2022, at 2:49 PM, Simon Marchi simon.mar...@efficios.com wrote:

> On 2022-01-14 2:39 p.m., Mathieu Desnoyers wrote:
>> Hi Simon,
>> 
>> In order to facilitate maintenance of the LTTng project, would you agree to
>> transfer your copyright ownership for code authored outside of your 
>> employment
>> at EfficiOS to EfficiOS Inc. ?
>> 
>> This particularly affects the lttng-tools files with copyright:
>> 
>> Copyright (C) 201N Simon Marchi 
>> 
>> Thank you!
>> 
>> Mathieu
> 
> Yes.

Great, I will CC you on the patches applying the copyright ownership transfer.
Please reply to them publicly with your Signed-off-by tag so I can add it
before merging them.

Thanks,

Mathieu

> 
> Thanks,
> 
> Simon

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [lttng-dev] Transfer of LTTng copyright ownership: Julien Desfossez

2022-01-14 Thread Mathieu Desnoyers via lttng-dev
- On Jan 14, 2022, at 2:51 PM, Julien Desfossez j...@klipix.org wrote:

> On 2022-01-14 2:49 p.m., Mathieu Desnoyers wrote:
>> Hi Julien,
>> 
>> In order to facilitate maintenance of the LTTng project, would you agree to
>> transfer your copyright ownership for code authored outside of your 
>> employment
>> at EfficiOS to EfficiOS Inc. ?
>> 
>> This particularly affects the lttng-tools, lttng-ust and lttng-modules files
>> with copyright:
>> 
>> Copyright (C) 201N Julien Desfossez 
> Yes, no problem !

Great, I will CC you on the patches applying the copyright ownership transfer.
Please reply to them publicly with your Signed-off-by tag so I can add it
before merging them.

Thanks,

Mathieu

> 
>> Thanks for all your contributions to LTTng !
>> 
>> Mathieu
> 
> Thanks !
> 
> Julien

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


[lttng-dev] Transfer of LTTng copyright ownership: Julien Desfossez

2022-01-14 Thread Mathieu Desnoyers via lttng-dev
Hi Julien,

In order to facilitate maintenance of the LTTng project, would you agree to
transfer your copyright ownership for code authored outside of your employment
at EfficiOS to EfficiOS Inc. ?

This particularly affects the lttng-tools, lttng-ust and lttng-modules files
with copyright:

Copyright (C) 201N Julien Desfossez 

Thanks for all your contributions to LTTng !

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


[lttng-dev] Transfer of LTTng copyright ownership: Simon Marchi

2022-01-14 Thread Mathieu Desnoyers via lttng-dev
Hi Simon,

In order to facilitate maintenance of the LTTng project, would you agree to
transfer your copyright ownership for code authored outside of your employment
at EfficiOS to EfficiOS Inc. ?

This particularly affects the lttng-tools files with copyright:

Copyright (C) 201N Simon Marchi 

Thank you!

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


[lttng-dev] Transfer of LTTng copyright ownership: David Goulet

2022-01-14 Thread Mathieu Desnoyers via lttng-dev
Hi David,

In order to facilitate maintenance of the LTTng project, would you agree to
transfer your copyright ownership for code authored outside of your employment
at EfficiOS to EfficiOS Inc. ?

This particularly affects the lttng-tools and lttng-ust files with copyright:

Copyright (C) 201N David Goulet 

Thanks again for all your contributions to LTTng !

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH 07/23] membarrier: Rewrite sync_core_before_usermode() and improve documentation

2022-01-12 Thread Mathieu Desnoyers
include/linux/sync_core.h
> deleted file mode 100644
> index 013da4b8b327..000000000000
> --- a/include/linux/sync_core.h
> +++ /dev/null
> @@ -1,21 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0 */
> -#ifndef _LINUX_SYNC_CORE_H
> -#define _LINUX_SYNC_CORE_H
> -
> -#ifdef CONFIG_ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
> -#include <asm/sync_core.h>
> -#else
> -/*
> - * This is a dummy sync_core_before_usermode() implementation that can be 
> used
> - * on all architectures which return to user-space through core serializing
> - * instructions.
> - * If your architecture returns to user-space through non-core-serializing
> - * instructions, you need to write your own functions.
> - */
> -static inline void sync_core_before_usermode(void)
> -{
> -}
> -#endif
> -
> -#endif /* _LINUX_SYNC_CORE_H */
> -

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH 06/23] powerpc/membarrier: Remove special barrier on mm switch

2022-01-12 Thread Mathieu Desnoyers
- On Jan 8, 2022, at 11:43 AM, Andy Lutomirski l...@kernel.org wrote:

> powerpc did the following on some, but not all, paths through
> switch_mm_irqs_off():
> 
>   /*
>* Only need the full barrier when switching between processes.
>* Barrier when switching from kernel to userspace is not
>* required here, given that it is implied by mmdrop(). Barrier
>* when switching from userspace to kernel is not needed after
>* store to rq->curr.
>*/
>   if (likely(!(atomic_read(&next->membarrier_state) &
>(MEMBARRIER_STATE_PRIVATE_EXPEDITED |
> MEMBARRIER_STATE_GLOBAL_EXPEDITED)) || !prev))
>   return;
> 
> This is puzzling: if !prev, then one might expect that we are switching
> from kernel to user, not user to kernel, which is inconsistent with the
> comment.  But this is all nonsense, because the one and only caller would
> never have prev == NULL and would, in fact, OOPS if prev == NULL.
> 
> In any event, this code is unnecessary, since the new generic
> membarrier_finish_switch_mm() provides the same barrier without arch help.
> 
> arch/powerpc/include/asm/membarrier.h remains as an empty header,
> because a later patch in this series will add code to it.

My disagreement with "membarrier: Make the post-switch-mm barrier explicit"
may affect this patch significantly, or even make it irrelevant.

Thanks,

Mathieu

> 
> Cc: Michael Ellerman 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: Nicholas Piggin 
> Cc: Mathieu Desnoyers 
> Cc: Peter Zijlstra 
> Signed-off-by: Andy Lutomirski 
> ---
> arch/powerpc/include/asm/membarrier.h | 24 
> arch/powerpc/mm/mmu_context.c |  1 -
> 2 files changed, 25 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/membarrier.h
> b/arch/powerpc/include/asm/membarrier.h
> index de7f79157918..b90766e95bd1 100644
> --- a/arch/powerpc/include/asm/membarrier.h
> +++ b/arch/powerpc/include/asm/membarrier.h
> @@ -1,28 +1,4 @@
> #ifndef _ASM_POWERPC_MEMBARRIER_H
> #define _ASM_POWERPC_MEMBARRIER_H
> 
> -static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
> -  struct mm_struct *next,
> -  struct task_struct *tsk)
> -{
> - /*
> -  * Only need the full barrier when switching between processes.
> -  * Barrier when switching from kernel to userspace is not
> -  * required here, given that it is implied by mmdrop(). Barrier
> -  * when switching from userspace to kernel is not needed after
> -  * store to rq->curr.
> -  */
> - if (IS_ENABLED(CONFIG_SMP) &&
> -  likely(!(atomic_read(&next->membarrier_state) &
> -  (MEMBARRIER_STATE_PRIVATE_EXPEDITED |
> -   MEMBARRIER_STATE_GLOBAL_EXPEDITED)) || !prev))
> - return;
> -
> - /*
> -  * The membarrier system call requires a full memory barrier
> -  * after storing to rq->curr, before going back to user-space.
> -  */
> - smp_mb();
> -}
> -
> #endif /* _ASM_POWERPC_MEMBARRIER_H */
> diff --git a/arch/powerpc/mm/mmu_context.c b/arch/powerpc/mm/mmu_context.c
> index 74246536b832..5f2daa6b0497 100644
> --- a/arch/powerpc/mm/mmu_context.c
> +++ b/arch/powerpc/mm/mmu_context.c
> @@ -84,7 +84,6 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct
> mm_struct *next,
>   asm volatile ("dssall");
> 
>   if (!new_on_cpu)
> - membarrier_arch_switch_mm(prev, next, tsk);
> 
>   /*
>* The actual HW switching method differs between the various
> --
> 2.33.1

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [lttng-dev] [RELEASE] LTTng-UST 2.12.3 and 2.11.5 (Linux user-space tracer)

2022-01-05 Thread Mathieu Desnoyers via lttng-dev
- On Jan 5, 2022, at 4:27 PM, Mathieu Desnoyers 
mathieu.desnoy...@efficios.com wrote:

> Hi,
> 
> This is a release announcement for the LTTng-UST 2.12.3 and 2.11.5
> tracer. Those are bug fix releases.
> 
> The 2.11.5 release marks the end of life (EOL) of the stable-2.11 branch.
> Users experiencing issues with the 2.11 branch are expected to upgrade to
> either 2.12.3 or 2.13.1.
> 
> Note that the latest stable branch is 2.13 (its most recent release
> is 2.13.1, released on December 10, 2021).
> 
> * Notable changes in 2.12.3:
> 
> Two notable fixes are "Fix: nestable pthread cancelstate" and
> "Fix: abort on decrement_sem_count during concurrent tracing start and
> teardown",
> which fix a rare deadlock scenario at application exit.

Just as a clarification, the second fix corrects a race scenario triggering an
assertion failure.

Thanks,

Mathieu

> 
> Another notable fix is "fix: liblttng-ust-fd async-signal-safe close()", which
> ensures that the implementation of the symbol interposition for "close" is
> async-signal safe, as it should be.
> 
> Please refer to the change logs below for the fixes contained in those
> two releases.
> 
> Feedback is welcome,
> 
> Thanks,
> 
> Mathieu
> 
> Project website: https://lttng.org
> Documentation: https://lttng.org/docs
> Download link: https://lttng.org/download
> 
> 
> 2022-01-05 (National Bird Day) lttng-ust 2.12.3
>* Fix: ust-cancelstate: include string.h for strerror
>* Fix: lttng-ust-fd: remove lttng_ust_common_ctor call
>* Fix: nestable pthread cancelstate
>* Fix: abort on decrement_sem_count during concurrent tracing start 
> and teardown
>* fix: liblttng-ust-fd async-signal-safe close()
>* Set git-review branch to stable-2.12
>* fix: link benchmark with pthread
>* Fix: liblttng-ust-ctl have dependencies on liburcu
>* Fix: add extern "C" to two header files
> 
> 2022-01-05 (National Bird Day) lttng-ust 2.11.5
>* Set git-review branch to stable-2.11
>* fix: allow building with userspace-rcu 0.13
> 
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


[lttng-dev] [RELEASE] LTTng-UST 2.12.3 and 2.11.5 (Linux user-space tracer)

2022-01-05 Thread Mathieu Desnoyers via lttng-dev
Hi,

This is a release announcement for the LTTng-UST 2.12.3 and 2.11.5
tracer. Those are bug fix releases.

The 2.11.5 release marks the end of life (EOL) of the stable-2.11 branch.
Users experiencing issues with the 2.11 branch are expected to upgrade to
either 2.12.3 or 2.13.1.

Note that the latest stable branch is 2.13 (its most recent release
is 2.13.1, released on December 10, 2021).

* Notable changes in 2.12.3:

Two notable fixes are "Fix: nestable pthread cancelstate" and
"Fix: abort on decrement_sem_count during concurrent tracing start and 
teardown",
which fix a rare deadlock scenario at application exit.

Another notable fix is "fix: liblttng-ust-fd async-signal-safe close()", which
ensures that the implementation of the symbol interposition for "close" is
async-signal safe, as it should be.

Please refer to the change logs below for the fixes contained in those
two releases.

Feedback is welcome,

Thanks,

Mathieu

Project website: https://lttng.org
Documentation: https://lttng.org/docs
Download link: https://lttng.org/download


2022-01-05 (National Bird Day) lttng-ust 2.12.3
* Fix: ust-cancelstate: include string.h for strerror
* Fix: lttng-ust-fd: remove lttng_ust_common_ctor call
* Fix: nestable pthread cancelstate
* Fix: abort on decrement_sem_count during concurrent tracing start and 
teardown
* fix: liblttng-ust-fd async-signal-safe close()
* Set git-review branch to stable-2.12
* fix: link benchmark with pthread
* Fix: liblttng-ust-ctl have dependencies on liburcu
* Fix: add extern "C" to two header files

2022-01-05 (National Bird Day) lttng-ust 2.11.5
* Set git-review branch to stable-2.11
* fix: allow building with userspace-rcu 0.13

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


[lttng-dev] [RELEASE] Userspace RCU 0.11.4, 0.12.3 and 0.13.1

2022-01-05 Thread Mathieu Desnoyers via lttng-dev
Hi,

This is a release announcement for the currently maintained stable
branches of the Userspace RCU library. Please note that 0.11.4 is
the last release of the 0.11 stable branch, which is reaching end
of life. Users facing issues with the 0.11 stable branch are expected
to upgrade to 0.12.3 or 0.13.1.

The content of those bug fix releases is fairly straightforward,
and the change log small enough to be shown in this release
announcement (see below).

Feedback is as always welcome!

Thanks,

Mathieu

Project website: https://liburcu.org
Git repository: git://git.liburcu.org/urcu.git

Changelog:

2022-01-05 Userspace RCU 0.13.1
* fix: properly detect 'cmpxchg' on x86-32
* fix: use urcu-tls compat with c++ compiler
* fix: remove autoconf features default value in help message
* fix: add missing pkgconfig file for memb flavour lib
* Make temporary variable in _rcu_dereference non-const
* Fix: x86 and s390: uatomic __hp() macro C++ support
* Fix: x86 and s390: uatomic __hp() macro clang support
* Fix: x86 and s390 uatomic: __hp() macro warning with gcc 11
* Fix: changelog: v0.13.0 was released in 2021

2022-01-05 Userspace RCU 0.12.3
* fix: use urcu-tls compat with c++ compiler
* fix: add missing pkgconfig file for memb flavour lib
* Make temporary variable in _rcu_dereference non-const
* Fix: x86 and s390: uatomic __hp() macro C++ support
* Fix: x86 and s390: uatomic __hp() macro clang support
* Fix: x86 and s390 uatomic: __hp() macro warning with gcc 11
* Document known ABI issue in README.md
* fix: clock_gettime on macOs
* Fix: rculist header: use parenthesis around macro parameters
* Fix: rcuhlist header: use parenthesis around macro parameters
* Fix: hlist header: use parenthesis around macro parameters
* Fix: list.h: use parenthesis around macro parameters, 
caa_container_of()
* Fix: hlist iteration relies on undefined behavior
* Fix: use __atomic_load() rather than atomic load explicit
* Fix: use atomic load memory_order_consume for rcu_dereference on 
C11/C++11
* fix: warnings on non-Linux platforms
* fix: HAVE_SCHED_SETAFFINITY is not defined
* Add git review config to stable branch
* fix: include 'sys/endian.h' on FreeBSD
* cleanup: explicitly mark unused parameters (-Wunused-parameter)
* fix: shadowed local variable (-Wshadow)
* cleanup: all functions have declarations (-Wmissing-prototypes)
* Import libtap from babeltrace

2022-01-05 Userspace RCU 0.11.4
* fix: add missing pkgconfig file for memb flavour lib
* Make temporary variable in _rcu_dereference non-const
* Fix: x86 and s390: uatomic __hp() macro C++ support
* Fix: x86 and s390: uatomic __hp() macro clang support
* Fix: x86 and s390 uatomic: __hp() macro warning with gcc 11
* Document known ABI issue in README.md
* fix: clock_gettime on macOs
* Fix: rculist header: use parenthesis around macro parameters
* Fix: rcuhlist header: use parenthesis around macro parameters
* Fix: hlist header: use parenthesis around macro parameters
* Fix: list.h: use parenthesis around macro parameters, 
caa_container_of()
* Fix: hlist iteration relies on undefined behavior
* Fix: use __atomic_load() rather than atomic load explicit
* Fix: use atomic load memory_order_consume for rcu_dereference on 
C11/C++11
* Fix: gitreview defaultbranch should be stable-0.11
* fix: warnings on non-Linux platforms
* fix: HAVE_SCHED_SETAFFINITY is not defined
* Add git review config to stable branch
* fix: include 'sys/endian.h' on FreeBSD
* cleanup: explicitly mark unused parameters (-Wunused-parameter)
* fix: shadowed local variable (-Wshadow)
* cleanup: all functions have declarations (-Wmissing-prototypes)
* Import libtap from babeltrace

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


[lttng-dev] [RELEASE] LTTng-modules 2.12.7 and 2.13.1 (Linux kernel tracer)

2022-01-05 Thread Mathieu Desnoyers via lttng-dev
Hi,

This is a release announcement for the LTTng Linux kernel tracer 2.12.7 and
2.13.1 releases. Both are bug fix releases in the lttng-modules stable branches.
They also enable support for the most recent Linux kernel releases.

LTTng-modules 2.12.7 contains instrumentation fixes for the 5.14, 5.15 and
5.16-rc Linux kernels. Kernel version ranges were adjusted for instrumentation
of the RHEL 8.4 kernel.

LTTng-modules 2.13.1 fixes scenarios where trigger actions are missing:

There is an issue when associating multiple actions to a single system
call, where the first action is effectively added, but the following
actions are silently ignored.

The affected on-event trigger feature was newly introduced in 2.13.

This 2.13.1 release also contains instrumentation fixes for the 5.15 and 5.16-rc
Linux kernels. It now emits a warning in the kernel console whenever 
registration
of events internally fails within LTTng-modules, for example in case of
out-of-memory error.

As always, feedback is welcome!

Thanks,

Mathieu

Project website: https://lttng.org
Documentation: https://lttng.org/docs
Download link: https://lttng.org/download

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


[lttng-dev] [RELEASE] LTTng-UST 2.13.1 (Linux user-space tracer)

2021-12-10 Thread Mathieu Desnoyers via lttng-dev
This 2.13.1 release of LTTng-UST is a bugfix-only release for various
shortcomings that were identified in the 2.13.0 release.

A few key highlights: the C++ support for building probe providers has
been improved, async-signal-safety of the close wrapper has been fixed,
pthread cancelstate in nested use scenarios is now properly handled.

2021-12-10 (Lost and Found Day) lttng-ust 2.13.1
* Fix: ust-compiler: constructor/destructor build on g++ 4.8
* ust-compiler: constructor/destructor whitespaces layout and macro 
dependency
* Fix: ust-cancelstate: include string.h for strerror
* Fix: libnuma is prepended to LIBS
* fix: Allow disabling some abi compat tests
* Fix: generate probe registration constructor as a C++ constructor
* Fix: nestable pthread cancelstate
* Fix: abort on decrement_sem_count during concurrent tracing start and 
teardown
* fix: allocating C++ compound literal on heap with Clang
* Check for C++11 when building C++ probe providers
* fix: liblttng-ust-fd async-signal-safe close()
* tracepoints: print debug message when lttng-ust-tracepoint.so is not 
found
* Fix: static_assert unavailable with glibc < 2.16
* Fix: combined tracing of lttng-ust 2.12/2.13 generates corrupted 
traces
* doc/man: Document LTTNG_UST_ABORT_ON_CRITICAL variable
* fix: remove autoconf features default value in help message
* Set git-review branch to stable-2.13
* Fix: add extern "C" to two header files

Project website: https://lttng.org
Documentation: https://lttng.org/docs
Download link: https://lttng.org/download


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [lttng-dev] 回复: lttng-consumerd can NOT get notification to consume ring buffer data

2021-12-09 Thread Mathieu Desnoyers via lttng-dev
- On Dec 7, 2021, at 8:47 PM, zhenyu.ren  wrote: 

> Hi, Desnoyers

> For various reasons, none of the three suggestions you mentioned about
> reproducing the problem can be achieved easily.
> I also wonder why the commit counter is not updated... It seems that the span
> producers did not complete committing (were they killed? did they hang?). It's
> very hard to debug all the span producers since there are too many of them...
> So, can I make lttng-consumerd just skip this subbuffer and try to consume
> the next subbuffer?

No, it's not that simple. Once this state is reached (e.g. caused by a killed
application), just skipping this sub-buffer will not allow applications to
continue using this buffer for tracing until it is destroyed.

Thanks, 

Mathieu 

-- 
Mathieu Desnoyers 
EfficiOS Inc. 
http://www.efficios.com 


Re: [lttng-dev] lttng-consumerd can NOT get notification to consume ring buffer data

2021-12-07 Thread Mathieu Desnoyers via lttng-dev
> 210 static inline int lib_ring_buffer_poll_deliver(const struct
> lttng_ust_lib_ring_buffer_config *config,
> 211 struct lttng_ust_lib_ring_buffer *buf,
> 212 struct channel *chan,
> 213 struct lttng_ust_shm_handle *handle)
> 214 {
> 215 unsigned long consumed_old, consumed_idx, commit_count, write_offset;
> 216
> 217 consumed_old = uatomic_read(&buf->consumed);
> 218 consumed_idx = subbuf_index(consumed_old, chan);
> 219 commit_count = v_read(config, &shmp_index(handle, buf->commit_cold,
> consumed_idx)->cc_sb);
> 220 /*
> 221 * No memory barrier here, since we are only interested
> 222 * in a statistically correct polling result. The next poll will
> 223 * get the data if we are racing. The mb() that ensures correct
> 224 * memory order is in get_subbuf.
> 225 */
> 226 write_offset = v_read(config, &buf->offset);
> 227
> 228 /*
> 229 * Check that the subbuffer we are trying to consume has been
> 230 * already fully committed.
> 231 */
> 232
> 233 if (((commit_count - chan->backend.subbuf_size)
> 234 & chan->commit_count_mask)
> 235 - (buf_trunc(consumed_old, chan)
> 236 >> chan->backend.num_subbuf_order)
> 237 != 0)
> 238 return 0;
> 239
> 240 /*
> 241 * Check that we are not about to read the same subbuffer in
> 242 * which the writer head is.
> 243 */
> 244 if (subbuf_trunc(write_offset, chan) - subbuf_trunc(consumed_old, chan)
> 245 == 0)
> 246 return 0;
> 247
> 248 return 1;
> 249
> 250 }

> It seems that lib_ring_buffer_channel_do_read()-> 
> lib_ring_buffer_poll_deliver()
> returned 0 at line 238 ($consume_old:0x86c400000, $commit_count:0x86c0,
> $write_offset:0x86d40).

> Buffer type: per UID
> Channels:
> -
> - channel0: [enabled]

> Attributes:
> overwrite mode: 0
> subbufers size: 1048576
> number of subbufers: 16
> switch timer interval: 2000
> read timer interval: 6000
> trace file count: 80
> trace file size (bytes): 2097152
> output: mmap()

> PS: Lttng version is 2.7(I know it is an very old version:) )
> Do you have any ideas about this problem? Thanks in advance

> Thanks a lot
> zhenyu.ren

> ___
> lttng-dev mailing list
> lttng-dev@lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

-- 
Mathieu Desnoyers 
EfficiOS Inc. 
http://www.efficios.com 


Re: [lttng-dev] Feature request: dynamically load from prefix

2021-10-05 Thread Mathieu Desnoyers via lttng-dev
- On Jul 15, 2021, at 7:21 AM, lttng-dev lttng-dev@lists.lttng.org wrote:

> Hello,
> 
> The production rootfs should be untouched, ideally read-only,
> for development/tests a subdirectory can be mounted (eg. /usr/local).
> Idea is that the contents of that directory alone (and at most some
> env variables)
> should allow enabling development features.
> 
> For lttng I would have wanted to add a library
> '/usr/local/lib/libmyservice-tracepoints.so' with runpath
> '/usr/local/lib' that would activate lttng tracing,
> pulling in lttng libraries (ust, ust-tracepoint) from /usr/local/lib.
> 
> There is a caveat though: unless 'libmyservice-tracepoints.so' is
> preloaded, the code in lttng/tracepoint.h will run constructor functions
> to register the tracepoint probes, trying to dlopen the lttng-ust-tracepoint
> library and fail at that because this is not in the library search paths.
> 
> At a later time, 'libmyservice-tracepoints.so' will be loaded, and
> lttng-ust-tracepoint (along with lttng-ust) can be resolved, but the
> tracepoints are not registered.
> 
> So I guess what I would need is to either retrigger the registration
> of tracepoints
> (likely complicated with static and weak symbols potentially causing a mess), 
> or
> redirect the dlopen function.
> Useful would be either to try to find the library in /usr/local/lib or
> to use '/usr/local/lib/libmyservice-tracepoints.so'
> instead of lttng-ust-tracepoint (both have (dis)advantages).
> 
> At any rate, I would welcome some customization macro.

I'm currently working on improvements to the reporting of such scenario.

See:

https://review.lttng.org/c/lttng-ust/+/6480 tracepoints: print debug message 
when lttng-ust-tracepoint.so is not found 
https://review.lttng.org/c/lttng-ust/+/6484 tracepoints: increase dlopen 
failure message level from debug to critical

With this in place, it should be easier for a lttng end-user to figure out that
liblttng-ust-tracepoint.so cannot be found by dlopen() because it is not in the
system's library search path.

From that point, what is wrong with requesting the user to add the path towards
liblttng-ust-tracepoint.so to the environment variable LD_LIBRARY_PATH when
running the application?
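Concretely, that amounts to a one-liner before launching the instrumented
application. A minimal sketch, assuming the /usr/local/lib prefix discussed
earlier in this thread; the prefix and the application name are placeholders:

```shell
# Prepend the directory holding liblttng-ust-tracepoint.so to the dynamic
# linker search path, then start the instrumented application.
# /usr/local/lib is an assumption; use the actual install prefix.
export LD_LIBRARY_PATH="/usr/local/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
# ./myservice   # placeholder for the instrumented application
```

With this in place, the constructor's dlopen() of liblttng-ust-tracepoint.so
succeeds without touching the read-only rootfs.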

Thanks,

Mathieu

> 
> For illustration the current hack-around is following
> 
> Norbert Lange
> 
> #define TRACEPOINT_DEFINE
> #define TRACEPOINT_PROBE_DYNAMIC_LINKAGE
> 
> #include 
> 
> static inline void *s_remap_dlopen(const char *localfilename, int
> flags, unsigned prelen) {
>void *md = (dlopen)(localfilename + prelen, flags);
>return md ? md : (dlopen)(localfilename, flags);
> }
> 
> # ideally this would be LTTNG_TRACEPOINT_PROBE_DLOPEN instead of the dlopen 
> mess
> #define dlopen(x, y) s_remap_dlopen("/usr/local/lib/" x, y,
> (unsigned)sizeof("/usr/local/lib/") - 1U)
> 
> #include "trace_mytracepoints.h"

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [lttng-dev] with -Og option, lttng-ust compile failed on 32bit arm

2021-09-30 Thread Mathieu Desnoyers via lttng-dev
- On Sep 29, 2021, at 11:47 PM, lttng-dev  
wrote: 

> Hi,

> The problem happened after upgrading to lttng-ust 2.13.0; 2.12.0 doesn't have
> this issue.

> liburcu version: 0.13.0

> gcc: 11.2.0

> These are my reproduction steps; it is a cross-compile environment based on
> the Yocto project.

> 1. git clone git://git.yoctoproject.org/poky
> 2. . oe-init-build-env
> 3. echo "MACHINE='qemuarm'" >> conf/local.conf
> 4. echo "DEBUG_BUILD='1'" >> conf/local.conf
> 5. bitbake lttng-ust

> compile failed with error:

>| 
>/work/cortexa15t2hf-neon-poky-linux-gnueabi/lttng-ust/2_2.13.0-r0/recipe-sysroot-native/usr/bin/arm-poky-linux-gnueabi/../../libexec/arm-poky-linux-gnueabi/gcc/arm-poky-linux-gnueabi/11.2.0/ld:
>| ../../../src/lib/lttng-ust/.libs/liblttng-ust.so: undefined reference to
> | `_uatomic_link_error'
> | collect2: error: ld returned 1 exit status
> | Makefile:399: recipe for target 'test_ust_error' failed

> checked with "nm ../../../src/lib/lttng-ust/.libs/liblttng-ust.so" | grep
> atomic, we can see 'U _uatomic_link_error', but since -Og

> is used, liburcu doesn't define this function.

> [snip]
> #if !defined __OPTIMIZE__ || defined UATOMIC_NO_LINK_ERROR
> static inline __attribute__((always_inline, noreturn))
> void _uatomic_link_error(void)
> {
> #ifdef ILLEGAL_INSTR
> /*
> * generate an illegal instruction. Cannot catch this with
> * linker tricks when optimizations are disabled.
> */
> __asm__ __volatile__(ILLEGAL_INSTR);
> #else
> __builtin_trap();
> #endif
> }
> #else /* #if !defined __OPTIMIZE__ || defined UATOMIC_NO_LINK_ERROR */
> extern void _uatomic_link_error(void);
> #endif /* #else #if !defined __OPTIMIZE__ || defined UATOMIC_NO_LINK_ERROR */

> [snip]

> we cannot see 'U _uatomic_link_error' in the following conditions, so the
> compile succeeded:

> 1. without -Og(using -O2), + 32bit arm

> 2. -Og + 64bit arm

> 3. -Og + x86/x86-64

> Do you have any idea about how to fix this? I don't understand why only "-Og +
> 32bit arm" will call function _uatomic_link_error.
I suspect it depends on which optimizations are enabled at -Og on each
architecture. The "_uatomic_link_error()" trick indeed depends on the compiler
optimizing away unreachable calls.

If you really intend on using "-Og" on arm32, try building with
"-DUATOMIC_NO_LINK_ERROR". It makes sure an illegal instruction is generated
rather than relying on the linker error.

Thanks, 

Mathieu 

> Thanks

> //Changqing


-- 
Mathieu Desnoyers 
EfficiOS Inc. 
http://www.efficios.com 


Re: [lttng-dev] Lttng live protocol

2021-09-15 Thread Mathieu Desnoyers via lttng-dev
- On Sep 1, 2021, at 1:23 PM, lttng-dev  wrote: 

> Hi there,

> I am currently evaluating the use of CTF and lttng tooling for application
> tracing on windows. We are exploring alternatives to ETW that are more
> customisable.
> One thing we would really like to do is real-time monitoring of our 
> application
> from another machine. I have a few questions regarding this:

> 1. Is lttng live protocol suitable for this purpose? What kind of latency 
> would
> we expect? (e.g 10s or 100s of milliseconds or more)

The lttng live protocol has been designed for extracting a low throughput of
events to a live pretty-printer, with delays in the area of a few seconds.
It's a polling-based mechanism at the moment.
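For illustration, a live viewer built on Babeltrace 2 attaches to the relay
daemon with an lttng-live URL of the following shape. All host and session
names below are placeholders, not part of the protocol:

```shell
# Compose an lttng-live URL for babeltrace2: relay daemon host, traced
# host, and session name are all placeholders for your own setup.
RELAYD_HOST=relayd.example.com
TRACED_HOST=builder01
SESSION=my-session
LIVE_URL="net://${RELAYD_HOST}/host/${TRACED_HOST}/${SESSION}"
# babeltrace2 --input-format=lttng-live "$LIVE_URL"
echo "$LIVE_URL"
```

Running babeltrace2 with only "net://${RELAYD_HOST}" lists the sessions
currently advertised by the relay daemon.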

> 2. Is the protocol documented?

No. There is only an implementation with the lttng project and in babeltrace. 

> 3. Is it possible to use lttng-relayd to read from local CTF log files (which
> are being written to) and stream events to other machines / a viewer on the
> same machine? The reason I ask this is the documentation seems suggests
> lttng-relayd can consume CTF files [
> https://lttng.org/docs/v2.13/#doc-lttng-relayd |
> https://lttng.org/docs/v2.13/#doc-lttng-relayd ] .

lttng-relayd needs to control both writing to the CTF log files and reading
from them. The "writing to" cannot be done by an external process.

> 4. I see there is a windows cygwin build on jenkins. Would you recommend this
> for production use?

We do not recommend Cygwin builds for production use unless there are no 
alternatives. From my own 
past experience, the Cygwin layer is not a solid basis for production-quality 
software. 

Thanks for your interest, 

Mathieu 

> Any guidance would be much appreciated.

> Thanks in advance,

> Mayur

> --


> Mayur Patel

> Lead Software Engineer

> T : +44 20 7234 9840

> M : +44 7342 180127

> A : [
> https://www.google.com/maps/place/88-89%20Blackfriars%20Road,%20London,%20SE1%208HA
> | 88-89 Blackfriars Road, London, SE1 8HA ]



-- 
Mathieu Desnoyers 
EfficiOS Inc. 
http://www.efficios.com 


Re: [PATCH v3 4/5] KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs

2021-09-02 Thread Mathieu Desnoyers
- On Sep 1, 2021, at 4:30 PM, Sean Christopherson sea...@google.com wrote:

> Add a test to verify an rseq's CPU ID is updated correctly if the task is
> migrated while the kernel is handling KVM_RUN.  This is a regression test
> for a bug introduced by commit 72c3c0fe54a3 ("x86/kvm: Use generic xfer
> to guest work function"), where TIF_NOTIFY_RESUME would be cleared by KVM
> without updating rseq, leading to a stale CPU ID and other badness.
> 
> Signed-off-by: Sean Christopherson 

Thanks!

Acked-by: Mathieu Desnoyers 

> ---
> tools/testing/selftests/kvm/.gitignore  |   1 +
> tools/testing/selftests/kvm/Makefile|   3 +
> tools/testing/selftests/kvm/rseq_test.c | 236 
> 3 files changed, 240 insertions(+)
> create mode 100644 tools/testing/selftests/kvm/rseq_test.c
> 
> diff --git a/tools/testing/selftests/kvm/.gitignore
> b/tools/testing/selftests/kvm/.gitignore
> index 0709af0144c8..6d031ff6b68e 100644
> --- a/tools/testing/selftests/kvm/.gitignore
> +++ b/tools/testing/selftests/kvm/.gitignore
> @@ -47,6 +47,7 @@
> /kvm_page_table_test
> /memslot_modification_stress_test
> /memslot_perf_test
> +/rseq_test
> /set_memory_region_test
> /steal_time
> /kvm_binary_stats_test
> diff --git a/tools/testing/selftests/kvm/Makefile
> b/tools/testing/selftests/kvm/Makefile
> index 5832f510a16c..0756e79cb513 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -80,6 +80,7 @@ TEST_GEN_PROGS_x86_64 += kvm_create_max_vcpus
> TEST_GEN_PROGS_x86_64 += kvm_page_table_test
> TEST_GEN_PROGS_x86_64 += memslot_modification_stress_test
> TEST_GEN_PROGS_x86_64 += memslot_perf_test
> +TEST_GEN_PROGS_x86_64 += rseq_test
> TEST_GEN_PROGS_x86_64 += set_memory_region_test
> TEST_GEN_PROGS_x86_64 += steal_time
> TEST_GEN_PROGS_x86_64 += kvm_binary_stats_test
> @@ -92,6 +93,7 @@ TEST_GEN_PROGS_aarch64 += dirty_log_test
> TEST_GEN_PROGS_aarch64 += dirty_log_perf_test
> TEST_GEN_PROGS_aarch64 += kvm_create_max_vcpus
> TEST_GEN_PROGS_aarch64 += kvm_page_table_test
> +TEST_GEN_PROGS_aarch64 += rseq_test
> TEST_GEN_PROGS_aarch64 += set_memory_region_test
> TEST_GEN_PROGS_aarch64 += steal_time
> TEST_GEN_PROGS_aarch64 += kvm_binary_stats_test
> @@ -103,6 +105,7 @@ TEST_GEN_PROGS_s390x += demand_paging_test
> TEST_GEN_PROGS_s390x += dirty_log_test
> TEST_GEN_PROGS_s390x += kvm_create_max_vcpus
> TEST_GEN_PROGS_s390x += kvm_page_table_test
> +TEST_GEN_PROGS_s390x += rseq_test
> TEST_GEN_PROGS_s390x += set_memory_region_test
> TEST_GEN_PROGS_s390x += kvm_binary_stats_test
> 
> diff --git a/tools/testing/selftests/kvm/rseq_test.c
> b/tools/testing/selftests/kvm/rseq_test.c
> new file mode 100644
> index ..060538bd405a
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/rseq_test.c
> @@ -0,0 +1,236 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +#define _GNU_SOURCE /* for program_invocation_short_name */
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "kvm_util.h"
> +#include "processor.h"
> +#include "test_util.h"
> +
> +#define VCPU_ID 0
> +
> +static __thread volatile struct rseq __rseq = {
> + .cpu_id = RSEQ_CPU_ID_UNINITIALIZED,
> +};
> +
> +/*
> + * Use an arbitrary, bogus signature for configuring rseq, this test does not
> + * actually enter an rseq critical section.
> + */
> +#define RSEQ_SIG 0xdeadbeef
> +
> +/*
> + * Any bug related to task migration is likely to be timing-dependent; 
> perform
> + * a large number of migrations to reduce the odds of a false negative.
> + */
> +#define NR_TASK_MIGRATIONS 10
> +
> +static pthread_t migration_thread;
> +static cpu_set_t possible_mask;
> +static bool done;
> +
> +static atomic_t seq_cnt;
> +
> +static void guest_code(void)
> +{
> + for (;;)
> + GUEST_SYNC(0);
> +}
> +
> +static void sys_rseq(int flags)
> +{
> + int r;
> +
> + r = syscall(__NR_rseq, &__rseq, sizeof(__rseq), flags, RSEQ_SIG);
> + TEST_ASSERT(!r, "rseq failed, errno = %d (%s)", errno, strerror(errno));
> +}
> +
> +static void *migration_worker(void *ign)
> +{
> + cpu_set_t allowed_mask;
> + int r, i, nr_cpus, cpu;
> +
> + CPU_ZERO(&allowed_mask);
> +
> + nr_cpus = CPU_COUNT(&possible_mask);
> +
> + for (i = 0; i < NR_TASK_MIGRATIONS; i++) {
> + cpu = i % nr_cpus;
> + if (!CPU_ISSET(cpu, &possible_mask))
> + continue;
> +
> + CPU_SET

Re: [PATCH v2 4/5] KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs

2021-08-27 Thread Mathieu Desnoyers
- On Aug 27, 2021, at 7:23 PM, Sean Christopherson sea...@google.com wrote:

> On Fri, Aug 27, 2021, Mathieu Desnoyers wrote:
[...]
>> Does it reproduce if we randomize the delay to have it picked randomly from
>> 0us to 100us (with 1us step)? It would remove a lot of the needs for
>> arch-specific magic delay value.
> 
> My less-than-scientific testing shows that it can reproduce at delays up to
> ~500us, but above ~10us the reproducibility starts to drop.  The bug still
> reproduces reliably, it just takes more iterations, and obviously the test
> runs a bit slower.
> 
> Any objection to using a 1-10us delay, e.g. a simple usleep((i % 10) + 1)?

Works for me, thanks!

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH v2 4/5] KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs

2021-08-27 Thread Mathieu Desnoyers



- On Aug 26, 2021, at 7:54 PM, Sean Christopherson sea...@google.com wrote:

> On Thu, Aug 26, 2021, Mathieu Desnoyers wrote:
>> - On Aug 25, 2021, at 8:51 PM, Sean Christopherson sea...@google.com 
>> wrote:
>> >> >> +  r = sched_setaffinity(0, sizeof(allowed_mask), &allowed_mask);
>> >> >> +  TEST_ASSERT(!r, "sched_setaffinity failed, errno = %d (%s)",
>> >> >> +  errno, strerror(errno));
>> >> >> +  atomic_inc(&seq_cnt);
>> >> >> +
>> >> >> +  CPU_CLR(cpu, &allowed_mask);
>> >> >> +
>> >> >> +  /*
>> >> >> +   * Let the read-side get back into KVM_RUN to improve 
>> >> >> the odds
>> >> >> +   * of task migration coinciding with KVM's run loop.
>> >> > 
>> >> > This comment should be about increasing the odds of letting the seqlock
>> >> > read-side complete. Otherwise, the delay between the two back-to-back
>> >> > atomic_inc is so small that the seqlock read-side may never have time to
>> >> > complete the reading the rseq cpu id and the sched_getcpu() call, and 
>> >> > can
>> >> > retry forever.
>> > 
>> > Hmm, but that's not why there's a delay.  I'm not arguing that a livelock
>> > isn't possible (though that syscall would have to be screaming fast),
>> 
>> I don't think we have the same understanding of the livelock scenario.
>> AFAIU the livelock would be caused by a too-small delay between the two
>> consecutive atomic_inc() from one loop iteration to the next compared to
>> the time it takes to perform a read-side critical section of the seqlock.
>> Back-to-back atomic_inc can be performed very quickly, so I doubt that the
>> sched_getcpu implementation have good odds to be fast enough to complete
>> in that narrow window, leading to lots of read seqlock retry.
> 
> Ooooh, yeah, brain fart on my side.  I was thinking of the two atomic_inc()
> in the same loop iteration and completely ignoring the next iteration.  Yes,
> I 100% agree that a delay to ensure forward progress is needed.  An assertion
> in main() that the reader complete at least some reasonable number of
> KVM_RUNs is also probably a good idea, e.g. to rule out a false pass due to
> the reader never making forward progress.

Agreed.

> 
> FWIW, the do-while loop does make forward progress without a delay, but at
> ~50% throughput, give or take.

I did not expect absolutely no progress, but a significant slow down of
the read-side.

> 
>> > but the primary motivation is very much to allow the read-side enough time
>> > to get back into KVM proper.
>> 
>> I'm puzzled by your statement. AFAIU, let's say we don't have the delay,
>> then we have:
>> 
>> migration thread KVM_RUN/read-side thread
>> ---
>>  - ioctl(KVM_RUN)
>> - atomic_inc_seq_cst()
>> - sched_setaffinity
>> - atomic_inc_seq_cst()
>>  - a = atomic_load(&seq_cnt) & ~1
>>  - smp_rmb()
>>  - b = 
>> LOAD_ONCE(__rseq_abi->cpu_id);
>>  - sched_getcpu()
>>  - smp_rmb()
>>  - re-load seqcnt/compare 
>> (succeeds)
>>- Can only succeed if entire 
>> read-side happens while the seqcnt
>>  is in an even numbered 
>> state.
>>  - if (a != b) abort()
>>   /* no delay. Even counter state is very
>>  short. */
>> - atomic_inc_seq_cst()
>>   /* Let's suppose the lack of delay causes the
>>  setaffinity to complete too early compared
>>  with KVM_RUN ioctl */
>> - sched_setaffinity
>> - atomic_inc_seq_cst()
>> 
>>   /* no delay. Even counter state is very
>>  short. */
>> - atomic_inc_seq_cst()
>>   /* Then a setaffinity from a following
>>  migration thread loop wil

Re: [PATCH v2 4/5] KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs

2021-08-26 Thread Mathieu Desnoyers
- On Aug 25, 2021, at 8:51 PM, Sean Christopherson sea...@google.com wrote:

> On Mon, Aug 23, 2021, Mathieu Desnoyers wrote:
>> [ re-send to Darren Hart ]
>> 
>> - On Aug 23, 2021, at 11:18 AM, Mathieu Desnoyers
>> mathieu.desnoy...@efficios.com wrote:
>> 
>> > - On Aug 20, 2021, at 6:50 PM, Sean Christopherson sea...@google.com 
>> > wrote:
>> > 
>> >> Add a test to verify an rseq's CPU ID is updated correctly if the task is
>> >> migrated while the kernel is handling KVM_RUN.  This is a regression test
>> >> for a bug introduced by commit 72c3c0fe54a3 ("x86/kvm: Use generic xfer
>> >> to guest work function"), where TIF_NOTIFY_RESUME would be cleared by KVM
>> >> without updating rseq, leading to a stale CPU ID and other badness.
>> >> 
>> > 
>> > [...]
>> > 
>> > +#define RSEQ_SIG 0xdeadbeef
>> > 
>> > Is there any reason for defining a custom signature rather than including
>> > tools/testing/selftests/rseq/rseq.h ? This should take care of including
>> > the proper architecture header which will define the appropriate signature.
>> > 
>> > Arguably you don't define rseq critical sections in this test per se, but
>> > I'm wondering why the custom signature here.
> 
> Partly to avoid taking a dependency on rseq.h, and partly to try to call out
> that the test doesn't actually do any rseq critical sections.

It might be good to add a comment near this define stating this then, so nobody
attempts to copy this as an example.

> 
>> > [...]
>> > 
>> >> +
>> >> +static void *migration_worker(void *ign)
>> >> +{
>> >> + cpu_set_t allowed_mask;
>> >> + int r, i, nr_cpus, cpu;
>> >> +
>> >> + CPU_ZERO(&allowed_mask);
>> >> +
>> >> + nr_cpus = CPU_COUNT(&possible_mask);
>> >> +
>> >> + for (i = 0; i < 2; i++) {
>> >> + cpu = i % nr_cpus;
>> >> + if (!CPU_ISSET(cpu, &possible_mask))
>> >> + continue;
>> >> +
>> >> + CPU_SET(cpu, &allowed_mask);
>> >> +
>> >> + /*
>> >> +  * Bump the sequence count twice to allow the reader to detect
>> >> +  * that a migration may have occurred in between rseq and sched
>> >> +  * CPU ID reads.  An odd sequence count indicates a migration
>> >> +  * is in-progress, while a completely different count indicates
>> >> +  * a migration occurred since the count was last read.
>> >> +  */
>> >> + atomic_inc(&seq_cnt);
>> > 
>> > So technically this atomic_inc contains the required barriers because the
>> > selftests implementation uses "__sync_add_and_fetch(&v->val, 1)". But
>> > it's rather odd that the semantic differs from the kernel implementation in
>> > terms of memory barriers: the kernel implementation of atomic_inc
>> > guarantees no memory barriers, but this one happens to provide full
>> > barriers pretty much by accident (selftests futex/include/atomic.h
>> > documents no such guarantee).
> 
> Yeah, I got quite lost trying to figure out what atomics the test would 
> actually
> end up with.

At the very least, until things are clarified in the selftests atomic header,
I would recommend adding a comment stating which memory barriers are required
alongside each use of atomic_inc here. I would even prefer that we add extra
(currently unneeded) write barriers to make extra sure that this stays
documented. Performance of the write-side does not matter much here.

> 
>> > If this full barrier guarantee is indeed provided by the selftests atomic.h
>> > header, I would really like a comment stating that in the atomic.h header
>> > so the carpet is not pulled from under our feet by a future optimization.
>> > 
>> > 
>> >> +r = sched_setaffinity(0, sizeof(allowed_mask), &allowed_mask);
>> >> +TEST_ASSERT(!r, "sched_setaffinity failed, errno = %d (%s)",
>> >> +errno, strerror(errno));
>> >> +atomic_inc(&seq_cnt);
>> >> +
>> >> +CPU_CLR(cpu, &allowed_mask);
>> >> +
>> >> + /*
>> >> +  * Let the read-side get back into KVM_RUN to improve the odds
>> >> +  * of task migration coinciding with KVM's run loop.
>> > 
>> > This comment should be about increasing the odds of letting the se

Re: [PATCH v2 4/5] KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs

2021-08-23 Thread Mathieu Desnoyers
[ re-send to Darren Hart ]

- On Aug 23, 2021, at 11:18 AM, Mathieu Desnoyers 
mathieu.desnoy...@efficios.com wrote:

> - On Aug 20, 2021, at 6:50 PM, Sean Christopherson sea...@google.com 
> wrote:
> 
>> Add a test to verify an rseq's CPU ID is updated correctly if the task is
>> migrated while the kernel is handling KVM_RUN.  This is a regression test
>> for a bug introduced by commit 72c3c0fe54a3 ("x86/kvm: Use generic xfer
>> to guest work function"), where TIF_NOTIFY_RESUME would be cleared by KVM
>> without updating rseq, leading to a stale CPU ID and other badness.
>> 
> 
> [...]
> 
> +#define RSEQ_SIG 0xdeadbeef
> 
> Is there any reason for defining a custom signature rather than including
> tools/testing/selftests/rseq/rseq.h ? This should take care of including
> the proper architecture header which will define the appropriate signature.
> 
> Arguably you don't define rseq critical sections in this test per se, but
> I'm wondering why the custom signature here.
> 
> [...]
> 
>> +
>> +static void *migration_worker(void *ign)
>> +{
>> +cpu_set_t allowed_mask;
>> +int r, i, nr_cpus, cpu;
>> +
>> +CPU_ZERO(&allowed_mask);
>> +
>> +nr_cpus = CPU_COUNT(&possible_mask);
>> +
>> +for (i = 0; i < 2; i++) {
>> +cpu = i % nr_cpus;
>> +if (!CPU_ISSET(cpu, &possible_mask))
>> +continue;
>> +
>> +CPU_SET(cpu, &allowed_mask);
>> +
>> +/*
>> + * Bump the sequence count twice to allow the reader to detect
>> + * that a migration may have occurred in between rseq and sched
>> + * CPU ID reads.  An odd sequence count indicates a migration
>> + * is in-progress, while a completely different count indicates
>> + * a migration occurred since the count was last read.
>> + */
>> +atomic_inc(&seq_cnt);
> 
> So technically this atomic_inc contains the required barriers because the
> selftests implementation uses "__sync_add_and_fetch(&v->val, 1)". But it's
> rather odd that the semantic differs from the kernel implementation in terms
> of memory barriers: the kernel implementation of atomic_inc guarantees no
> memory barriers, but this one happens to provide full barriers pretty much by
> accident (selftests futex/include/atomic.h documents no such guarantee).
> 
> If this full barrier guarantee is indeed provided by the selftests atomic.h
> header, I would really like a comment stating that in the atomic.h header so
> the carpet is not pulled from under our feet by a future optimization.
> 
> 
>> +r = sched_setaffinity(0, sizeof(allowed_mask), &allowed_mask);
>> +TEST_ASSERT(!r, "sched_setaffinity failed, errno = %d (%s)",
>> +errno, strerror(errno));
>> +atomic_inc(&seq_cnt);
>> +
>> +CPU_CLR(cpu, &allowed_mask);
>> +
>> +/*
>> + * Let the read-side get back into KVM_RUN to improve the odds
>> + * of task migration coinciding with KVM's run loop.
> 
> This comment should be about increasing the odds of letting the seqlock
> read-side complete. Otherwise, the delay between the two back-to-back
> atomic_inc is so small that the seqlock read-side may never have time to
> complete reading the rseq cpu id and the sched_getcpu() call, and can retry
> forever.
> 
> I'm wondering if 1 microsecond is sufficient on other architectures as well.
> One alternative way to make this depend less on the architecture's
> implementation of sched_getcpu (whether it's a vDSO, or goes through a
> syscall) would be to read the rseq cpu id and call sched_getcpu a few times
> (e.g. 3 times) in the migration thread rather than use usleep, and throw away
> the value read. This would ensure the delay is appropriate on all
> architectures.
> 
> Thanks!
> 
> Mathieu
> 
>> + */
>> +usleep(1);
>> +}
>> +done = true;
>> +return NULL;
>> +}
>> +
>> +int main(int argc, char *argv[])
>> +{
>> +struct kvm_vm *vm;
>> +u32 cpu, rseq_cpu;
>> +int r, snapshot;
>> +
>> +/* Tell stdout not to buffer its content */
>> +setbuf(stdout, NULL);
>> +
> +r = sched_getaffinity(0, sizeof(possible_mask), &possible_mask);
>> +TEST_ASSERT(!r, "sched_getaffinity failed, errno = %d (%s)", errno,
>> +strerror(errno));
>> +
>> +if (CPU_COUNT(_

Re: [PATCH v2 4/5] KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs

2021-08-23 Thread Mathieu Desnoyers
> + TEST_ASSERT(get_ucall(vm, VCPU_ID, NULL) == UCALL_SYNC,
> + "Guest failed?");
> +
> + /*
> +  * Verify rseq's CPU matches sched's CPU.  Ensure migration
> +  * doesn't occur between sched_getcpu() and reading the rseq
> +  * cpu_id by rereading both if the sequence count changes, or
> +  * if the count is odd (migration in-progress).
> +  */
> + do {
> + /*
> +  * Drop bit 0 to force a mismatch if the count is odd,
> +  * i.e. if a migration is in-progress.
> +  */
> + snapshot = atomic_read(&seq_cnt) & ~1;
> + smp_rmb();
> + cpu = sched_getcpu();
> + rseq_cpu = READ_ONCE(__rseq.cpu_id);
> + smp_rmb();
> + } while (snapshot != atomic_read(&seq_cnt));
> +
> + TEST_ASSERT(rseq_cpu == cpu,
> + "rseq CPU = %d, sched CPU = %d\n", rseq_cpu, cpu);
> + }
> +
> + pthread_join(migration_thread, NULL);
> +
> + kvm_vm_free(vm);
> +
> + sys_rseq(RSEQ_FLAG_UNREGISTER);
> +
> + return 0;
> +}
> --
> 2.33.0.rc2.250.ged5fa647cd-goog

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH v2 1/5] KVM: rseq: Update rseq when processing NOTIFY_RESUME on xfer to KVM guest

2021-08-23 Thread Mathieu Desnoyers
- On Aug 20, 2021, at 6:49 PM, Sean Christopherson sea...@google.com wrote:

> Invoke rseq's NOTIFY_RESUME handler when processing the flag prior to
> transferring to a KVM guest, which is roughly equivalent to an exit to
> userspace and processes many of the same pending actions.  While the task
> cannot be in an rseq critical section as the KVM path is reachable only
> by via ioctl(KVM_RUN), the side effects that apply to rseq outside of a
> critical section still apply, e.g. the current CPU needs to be updated if
> the task is migrated.
> 
> Clearing TIF_NOTIFY_RESUME without informing rseq can lead to segfaults
> and other badness in userspace VMMs that use rseq in combination with KVM,
> e.g. due to the CPU ID being stale after task migration.

Acked-by: Mathieu Desnoyers 

> 
> Fixes: 72c3c0fe54a3 ("x86/kvm: Use generic xfer to guest work function")
> Reported-by: Peter Foley 
> Bisected-by: Doug Evans 
> Cc: Shakeel Butt 
> Cc: Thomas Gleixner 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Sean Christopherson 
> ---
> kernel/entry/kvm.c |  4 +++-
> kernel/rseq.c  | 14 +++---
> 2 files changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/entry/kvm.c b/kernel/entry/kvm.c
> index 49972ee99aff..049fd06b4c3d 100644
> --- a/kernel/entry/kvm.c
> +++ b/kernel/entry/kvm.c
> @@ -19,8 +19,10 @@ static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu,
> unsigned long ti_work)
>   if (ti_work & _TIF_NEED_RESCHED)
>   schedule();
> 
> - if (ti_work & _TIF_NOTIFY_RESUME)
> + if (ti_work & _TIF_NOTIFY_RESUME) {
>   tracehook_notify_resume(NULL);
> + rseq_handle_notify_resume(NULL, NULL);
> + }
> 
>   ret = arch_xfer_to_guest_mode_handle_work(vcpu, ti_work);
>   if (ret)
> diff --git a/kernel/rseq.c b/kernel/rseq.c
> index 35f7bd0fced0..6d45ac3dae7f 100644
> --- a/kernel/rseq.c
> +++ b/kernel/rseq.c
> @@ -282,9 +282,17 @@ void __rseq_handle_notify_resume(struct ksignal *ksig,
> struct pt_regs *regs)
> 
>   if (unlikely(t->flags & PF_EXITING))
>   return;
> - ret = rseq_ip_fixup(regs);
> - if (unlikely(ret < 0))
> - goto error;
> +
> + /*
> +  * regs is NULL if and only if the caller is in a syscall path.  Skip
> +  * fixup and leave rseq_cs as is so that rseq_sycall() will detect and
> +  * kill a misbehaving userspace on debug kernels.
> +  */
> + if (regs) {
> + ret = rseq_ip_fixup(regs);
> + if (unlikely(ret < 0))
> + goto error;
> + }
>   if (unlikely(rseq_update_cpu_id(t)))
>   goto error;
>   return;
> --
> 2.33.0.rc2.250.ged5fa647cd-goog

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH 1/5] KVM: rseq: Update rseq when processing NOTIFY_RESUME on xfer to KVM guest

2021-08-20 Thread Mathieu Desnoyers
- On Aug 19, 2021, at 7:48 PM, Sean Christopherson sea...@google.com wrote:

> On Thu, Aug 19, 2021, Mathieu Desnoyers wrote:
>> - On Aug 17, 2021, at 8:12 PM, Sean Christopherson sea...@google.com 
>> wrote:
>> > @@ -250,7 +250,7 @@ static int rseq_ip_fixup(struct pt_regs *regs)
>> > * If not nested over a rseq critical section, restart is useless.
>> > * Clear the rseq_cs pointer and return.
>> > */
>> > -  if (!in_rseq_cs(ip, _cs))
>> > +  if (!regs || !in_rseq_cs(ip, _cs))
>> 
>> I think clearing the thread's rseq_cs unconditionally here when regs is NULL
>> is not the behavior we want when this is called from xfer_to_guest_mode_work.
>> 
>> If we have a scenario where userspace ends up calling this ioctl(KVM_RUN)
>> from within a rseq c.s., we really want a CONFIG_DEBUG_RSEQ=y kernel to
>> kill this application in the rseq_syscall handler when exiting back to 
>> usermode
>> when the ioctl eventually returns.
>> 
>> However, clearing the thread's rseq_cs will prevent this from happening.
>> 
>> So I would favor an approach where we simply do:
>> 
>> if (!regs)
>>  return 0;
>> 
>> Immediately at the beginning of rseq_ip_fixup, before getting the instruction
>> pointer, so effectively skip all side-effects of the ip fixup code. Indeed, 
>> it
>> is not relevant to do any fixup here, because it is nested in a ioctl system
>> call.
>> 
>> Effectively, this would preserve the SIGSEGV behavior when this ioctl is
>> erroneously called by user-space from a rseq critical section.
> 
> Ha, that's effectively what I implemented first, but I changed it because of
> the comment in clear_rseq_cs() that says:
> 
>  The rseq_cs field is set to NULL on preemption or signal delivery ... as well
>  as well as on top of code outside of the rseq assembly block.
> 
> Which makes it sound like something might rely on clearing rseq_cs?

This comment is describing succinctly the lazy clear scheme for rseq_cs.

Without the lazy clear scheme, a rseq c.s. would look like:

 * init(rseq_cs)
 * cpu = TLS->rseq::cpu_id_start
 *   [1]   TLS->rseq::rseq_cs = rseq_cs
 *   [start_ip]
 *   [2]   if (cpu != TLS->rseq::cpu_id)
 * goto abort_ip;
 *   [3]   <last_instruction_in_cs>
 *   [post_commit_ip]  
 *   [4]   TLS->rseq::rseq_cs = NULL

But as a fast-path optimization, [4] is not entirely needed because the rseq_cs
descriptor contains information about the instruction pointer range of the 
critical
section. Therefore, userspace can omit [4], but if the kernel never clears it, 
it
means that it will have to re-read the rseq_cs descriptor's content each time it
needs to check it to confirm that it is not nested over a rseq c.s..

So making the kernel lazily clear the rseq_cs pointer is just an optimization 
which
ensures that the kernel won't do useless work the next time it needs to check
rseq_cs, given that it has already validated that the userspace code is 
currently
not within the rseq c.s. currently advertised by the rseq_cs field.

> 
> Ah, or is it the case that rseq_cs is non-NULL if and only if userspace is in 
> an
> rseq critical section, and because syscalls in critical sections are illegal, 
> by
> definition clearing rseq_cs is a nop unless userspace is misbehaving.

Not quite, as I described above. But we want it to stay set so the 
CONFIG_DEBUG_RSEQ
code executed when returning from ioctl to userspace will be able to validate 
that
it is not nested within a rseq critical section.

> 
> If that's true, what about explicitly checking that at NOTIFY_RESUME?  Or is 
> it
> not worth the extra code to detect an error that will likely be caught 
> anyways?

The error will indeed already be caught on return from ioctl to userspace, so I
don't see any added value in duplicating this check.

Thanks,

Mathieu

> 
> diff --git a/kernel/rseq.c b/kernel/rseq.c
> index 35f7bd0fced0..28b8342290b0 100644
> --- a/kernel/rseq.c
> +++ b/kernel/rseq.c
> @@ -282,6 +282,13 @@ void __rseq_handle_notify_resume(struct ksignal *ksig,
> struct pt_regs *regs)
> 
>if (unlikely(t->flags & PF_EXITING))
>return;
> +   if (!regs) {
> +#ifdef CONFIG_DEBUG_RSEQ
> +   if (t->rseq && rseq_get_rseq_cs(t, &rseq_cs))
> +   goto error;
> +#endif
> +       return;
> +   }
>ret = rseq_ip_fixup(regs);
>    if (unlikely(ret < 0))
>goto error;
> 
>> Thanks for looking into this !
>> 
>> Mathieu
>> 
>> >return clear_rseq_cs(t);
>> >ret = rseq_need_restart(t, rseq_cs.flags);
>> >if (ret <= 0)
>> > --
>> > 2.33.0.rc1.237.g0d66db33f3-goog
>> 
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH 4/5] KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs

2021-08-20 Thread Mathieu Desnoyers
- On Aug 19, 2021, at 7:33 PM, Sean Christopherson sea...@google.com wrote:

> On Thu, Aug 19, 2021, Mathieu Desnoyers wrote:
>> - On Aug 17, 2021, at 8:12 PM, Sean Christopherson sea...@google.com 
>> wrote:
>> 
>> > Add a test to verify an rseq's CPU ID is updated correctly if the task is
>> > migrated while the kernel is handling KVM_RUN.  This is a regression test
>> > for a bug introduced by commit 72c3c0fe54a3 ("x86/kvm: Use generic xfer
>> > to guest work function"), where TIF_NOTIFY_RESUME would be cleared by KVM
>> > without updating rseq, leading to a stale CPU ID and other badness.
>> > 
>> > Signed-off-by: Sean Christopherson 
>> > ---
>> 
>> [...]
>> 
>> > +  while (!done) {
>> > +  vcpu_run(vm, VCPU_ID);
>> > +  TEST_ASSERT(get_ucall(vm, VCPU_ID, NULL) == UCALL_SYNC,
>> > +  "Guest failed?");
>> > +
>> > +  cpu = sched_getcpu();
>> > +  rseq_cpu = READ_ONCE(__rseq.cpu_id);
>> > +
>> > +  /*
>> > +   * Verify rseq's CPU matches sched's CPU, and that sched's CPU
>> > +   * is stable.  This doesn't handle the case where the task is
>> > +   * migrated between sched_getcpu() and reading rseq, and again
>> > +   * between reading rseq and sched_getcpu(), but in practice no
>> > +   * false positives have been observed, while on the other hand
>> > +   * blocking migration while this thread reads CPUs messes with
>> > +   * the timing and prevents hitting failures on a buggy kernel.
>> > +   */
>> 
>> I think you could get a stable cpu id between sched_getcpu and 
>> __rseq_abi.cpu_id
>> if you add a pthread mutex to protect:
>> 
>> sched_getcpu and __rseq_abi.cpu_id  reads
>> 
>> vs
>> 
>> sched_setaffinity calls within the migration thread.
>> 
>> Thoughts ?
> 
> I tried that and couldn't reproduce the bug.  That's what I attempted to call
> out
> in the blurb "blocking migration while this thread reads CPUs ... prevents
> hitting
> failures on a buggy kernel".
> 
> I considered adding arbitrary delays around the mutex to try and hit the bug,
> but
> I was worried that even if I got it "working" for this bug, the test would be
> too
> tailored to this bug and potentially miss future regression.  Letting the two
> threads run wild seemed like it would provide the best coverage, at the cost 
> of
> potentially causing to false failures.

OK, so your point is that using mutual exclusion to ensure stability of the cpu 
id
changes the timings too much, to a point where the issues don't reproduce. I 
understand
that this mutex ties the migration thread timings to the vcpu thread's use of 
the mutex,
which will reduce timings randomness, which is unwanted here.

I still really hate flakiness in tests, because then people stop caring when 
they
fail once in a while. And with the nature of rseq, a once-in-a-while failure is 
a
big deal. Let's see if we can use other tricks to ensure stability of the cpu id
without changing timings too much.

One idea would be to use a seqcount lock. But even if we use that, I'm 
concerned that
the very long writer critical section calling sched_setaffinity would need to be
alternated with a sleep to ensure the read-side progresses. The sleep delay 
could be
relatively small compared to the duration of the sched_setaffinity call, e.g. 
ratio
1:10.

static volatile uint64_t seqcnt;

The thread responsible for setting the affinity would do something like:

for (;;) {
  atomic_inc_seq_cst();
  sched_setaffinity(..., n++ % nr_cpus);
  atomic_inc_seq_cst();
  usleep(1);  /* this is where read-side is allowed to progress. */
}

And the thread reading the rseq cpu id and calling sched_getcpu():

uint64_t snapshot;

do {
  snapshot = atomic_load() & ~1; /* force retry if odd */
  smp_rmb();
  cpu = sched_getcpu();
  rseq_cpu = READ_ONCE(__rseq.cpu_id);
  smp_rmb();
} while (snapshot != atomic_load());

So the reader retry the cpu id reads whenever sched_setaffinity is being
called by the migration thread, and whenever it is preempted for more
than one migration thread loop.

That should achieve our goal of providing cpu id stability without significantly
changing the timings of the migration thread, given that it never blocks waiting
for the reader.

Thoughts ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH 4/5] KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs

2021-08-19 Thread Mathieu Desnoyers
- On Aug 17, 2021, at 8:12 PM, Sean Christopherson sea...@google.com wrote:

> Add a test to verify an rseq's CPU ID is updated correctly if the task is
> migrated while the kernel is handling KVM_RUN.  This is a regression test
> for a bug introduced by commit 72c3c0fe54a3 ("x86/kvm: Use generic xfer
> to guest work function"), where TIF_NOTIFY_RESUME would be cleared by KVM
> without updating rseq, leading to a stale CPU ID and other badness.
> 
> Signed-off-by: Sean Christopherson 
> ---

[...]

> +
> +static void *migration_worker(void *ign)
> +{
> + cpu_set_t allowed_mask;
> + int r, i, nr_cpus, cpu;
> +
> + CPU_ZERO(&allowed_mask);
> +
> + nr_cpus = CPU_COUNT(&possible_mask);
> +
> + for (i = 0; i < 2; i++) {
> + cpu = i % nr_cpus;
> + if (!CPU_ISSET(cpu, &possible_mask))
> + continue;
> +
> + CPU_SET(cpu, &allowed_mask);
> +
> + r = sched_setaffinity(0, sizeof(allowed_mask), &allowed_mask);
> + TEST_ASSERT(!r, "sched_setaffinity failed, errno = %d (%s)", 
> errno,
> + strerror(errno));
> +
> + CPU_CLR(cpu, &allowed_mask);
> +
> + usleep(10);
> + }
> + done = true;
> + return NULL;
> +}
> +
> +int main(int argc, char *argv[])
> +{
> + struct kvm_vm *vm;
> + u32 cpu, rseq_cpu;
> + int r;
> +
> + /* Tell stdout not to buffer its content */
> + setbuf(stdout, NULL);
> +
> + r = sched_getaffinity(0, sizeof(possible_mask), &possible_mask);
> + TEST_ASSERT(!r, "sched_getaffinity failed, errno = %d (%s)", errno,
> + strerror(errno));
> +
> + if (CPU_COUNT(&possible_mask) < 2) {
> + print_skip("Only one CPU, task migration not possible\n");
> + exit(KSFT_SKIP);
> + }
> +
> + sys_rseq(0);
> +
> + /*
> +  * Create and run a dummy VM that immediately exits to userspace via
> +  * GUEST_SYNC, while concurrently migrating the process by setting its
> +  * CPU affinity.
> +  */
> + vm = vm_create_default(VCPU_ID, 0, guest_code);
> +
> + pthread_create(&migration_thread, NULL, migration_worker, 0);
> +
> + while (!done) {
> + vcpu_run(vm, VCPU_ID);
> + TEST_ASSERT(get_ucall(vm, VCPU_ID, NULL) == UCALL_SYNC,
> + "Guest failed?");
> +
> + cpu = sched_getcpu();
> + rseq_cpu = READ_ONCE(__rseq.cpu_id);
> +
> + /*
> +  * Verify rseq's CPU matches sched's CPU, and that sched's CPU
> +  * is stable.  This doesn't handle the case where the task is
> +  * migrated between sched_getcpu() and reading rseq, and again
> +  * between reading rseq and sched_getcpu(), but in practice no
> +  * false positives have been observed, while on the other hand
> +  * blocking migration while this thread reads CPUs messes with
> +  * the timing and prevents hitting failures on a buggy kernel.
> +  */

I think you could get a stable cpu id between sched_getcpu and __rseq_abi.cpu_id
if you add a pthread mutex to protect:

sched_getcpu and __rseq_abi.cpu_id  reads

vs

sched_setaffinity calls within the migration thread.

Thoughts ?

Thanks,

Mathieu

> + TEST_ASSERT(rseq_cpu == cpu || cpu != sched_getcpu(),
> + "rseq CPU = %d, sched CPU = %d\n", rseq_cpu, cpu);
> + }
> +
> + pthread_join(migration_thread, NULL);
> +
> + kvm_vm_free(vm);
> +
> + sys_rseq(RSEQ_FLAG_UNREGISTER);
> +
> + return 0;
> +}
> --
> 2.33.0.rc1.237.g0d66db33f3-goog

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH 1/5] KVM: rseq: Update rseq when processing NOTIFY_RESUME on xfer to KVM guest

2021-08-19 Thread Mathieu Desnoyers
- On Aug 17, 2021, at 8:12 PM, Sean Christopherson sea...@google.com wrote:

> Invoke rseq's NOTIFY_RESUME handler when processing the flag prior to
> transferring to a KVM guest, which is roughly equivalent to an exit to
> userspace and processes many of the same pending actions.  While the task
> cannot be in an rseq critical section as the KVM path is reachable only
> via ioctl(KVM_RUN), the side effects that apply to rseq outside of a
> critical section still apply, e.g. the CPU ID needs to be updated if the
> task is migrated.
> 
> Clearing TIF_NOTIFY_RESUME without informing rseq can lead to segfaults
> and other badness in userspace VMMs that use rseq in combination with KVM,
> e.g. due to the CPU ID being stale after task migration.

I agree with the problem assessment, but I would recommend a small change
to this fix.

> 
> Fixes: 72c3c0fe54a3 ("x86/kvm: Use generic xfer to guest work function")
> Reported-by: Peter Foley 
> Bisected-by: Doug Evans 
> Cc: Shakeel Butt 
> Cc: Thomas Gleixner 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Sean Christopherson 
> ---
> kernel/entry/kvm.c | 4 +++-
> kernel/rseq.c  | 4 ++--
> 2 files changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/entry/kvm.c b/kernel/entry/kvm.c
> index 49972ee99aff..049fd06b4c3d 100644
> --- a/kernel/entry/kvm.c
> +++ b/kernel/entry/kvm.c
> @@ -19,8 +19,10 @@ static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu,
> unsigned long ti_work)
>   if (ti_work & _TIF_NEED_RESCHED)
>   schedule();
> 
> - if (ti_work & _TIF_NOTIFY_RESUME)
> + if (ti_work & _TIF_NOTIFY_RESUME) {
>   tracehook_notify_resume(NULL);
> + rseq_handle_notify_resume(NULL, NULL);
> + }
> 
>   ret = arch_xfer_to_guest_mode_handle_work(vcpu, ti_work);
>   if (ret)
> diff --git a/kernel/rseq.c b/kernel/rseq.c
> index 35f7bd0fced0..58c79a7918cd 100644
> --- a/kernel/rseq.c
> +++ b/kernel/rseq.c
> @@ -236,7 +236,7 @@ static bool in_rseq_cs(unsigned long ip, struct rseq_cs
> *rseq_cs)
> 
> static int rseq_ip_fixup(struct pt_regs *regs)
> {
> - unsigned long ip = instruction_pointer(regs);
> + unsigned long ip = regs ? instruction_pointer(regs) : 0;
>   struct task_struct *t = current;
>   struct rseq_cs rseq_cs;
>   int ret;
> @@ -250,7 +250,7 @@ static int rseq_ip_fixup(struct pt_regs *regs)
>* If not nested over a rseq critical section, restart is useless.
>* Clear the rseq_cs pointer and return.
>*/
> - if (!in_rseq_cs(ip, &rseq_cs))
> + if (!regs || !in_rseq_cs(ip, &rseq_cs))

I think clearing the thread's rseq_cs unconditionally here when regs is NULL
is not the behavior we want when this is called from xfer_to_guest_mode_work.

If we have a scenario where userspace ends up calling this ioctl(KVM_RUN)
from within a rseq c.s., we really want a CONFIG_DEBUG_RSEQ=y kernel to
kill this application in the rseq_syscall handler when exiting back to usermode
when the ioctl eventually returns.

However, clearing the thread's rseq_cs will prevent this from happening.

So I would favor an approach where we simply do:

if (!regs)
 return 0;

Immediately at the beginning of rseq_ip_fixup, before getting the instruction
pointer, so effectively skip all side-effects of the ip fixup code. Indeed, it
is not relevant to do any fixup here, because it is nested in a ioctl system
call.

Effectively, this would preserve the SIGSEGV behavior when this ioctl is
erroneously called by user-space from a rseq critical section.

Thanks for looking into this !

Mathieu

>   return clear_rseq_cs(t);
>   ret = rseq_need_restart(t, rseq_cs.flags);
>   if (ret <= 0)
> --
> 2.33.0.rc1.237.g0d66db33f3-goog

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH 2/5] entry: rseq: Call rseq_handle_notify_resume() in tracehook_notify_resume()

2021-08-19 Thread Mathieu Desnoyers
- On Aug 17, 2021, at 8:12 PM, Sean Christopherson sea...@google.com wrote:

> Invoke rseq_handle_notify_resume() from tracehook_notify_resume() now
> that the two function are always called back-to-back by architectures
> that have rseq.  The rseq helper is stubbed out for architectures that
> don't support rseq, i.e. this is a nop across the board.
> 
> Note, tracehook_notify_resume() is horribly named and arguably does not
> belong in tracehook.h as literally every line of code in it has nothing
> to do with tracing.  But, that's been true since commit a42c6ded827d
> ("move key_repace_session_keyring() into tracehook_notify_resume()")
> first usurped tracehook_notify_resume() back in 2012.  Punt cleaning that
> mess up to future patches.
> 
> No functional change intended.

This will make it harder to introduce new code paths which consume the
NOTIFY_RESUME without calling the rseq callback, which introduces issues.
Agreed.

Acked-by: Mathieu Desnoyers 

> 
> Signed-off-by: Sean Christopherson 
> ---
> arch/arm/kernel/signal.c | 1 -
> arch/arm64/kernel/signal.c   | 1 -
> arch/csky/kernel/signal.c| 4 +---
> arch/mips/kernel/signal.c| 4 +---
> arch/powerpc/kernel/signal.c | 4 +---
> arch/s390/kernel/signal.c| 1 -
> include/linux/tracehook.h| 2 ++
> kernel/entry/common.c| 4 +---
> kernel/entry/kvm.c   | 4 +---
> 9 files changed, 7 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
> index a3a38d0a4c85..9df68d139965 100644
> --- a/arch/arm/kernel/signal.c
> +++ b/arch/arm/kernel/signal.c
> @@ -670,7 +670,6 @@ do_work_pending(struct pt_regs *regs, unsigned int
> thread_flags, int syscall)
>   uprobe_notify_resume(regs);
>   } else {
>   tracehook_notify_resume(regs);
> - rseq_handle_notify_resume(NULL, regs);
>   }
>   }
>   local_irq_disable();
> diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
> index 23036334f4dc..22b55db13da6 100644
> --- a/arch/arm64/kernel/signal.c
> +++ b/arch/arm64/kernel/signal.c
> @@ -951,7 +951,6 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
> 
>   if (thread_flags & _TIF_NOTIFY_RESUME) {
>   tracehook_notify_resume(regs);
> - rseq_handle_notify_resume(NULL, regs);
> 
>   /*
>* If we reschedule after checking the affinity
> diff --git a/arch/csky/kernel/signal.c b/arch/csky/kernel/signal.c
> index 312f046d452d..bc4238b9f709 100644
> --- a/arch/csky/kernel/signal.c
> +++ b/arch/csky/kernel/signal.c
> @@ -260,8 +260,6 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
>   if (thread_info_flags & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL))
>   do_signal(regs);
> 
> - if (thread_info_flags & _TIF_NOTIFY_RESUME) {
> + if (thread_info_flags & _TIF_NOTIFY_RESUME)
>   tracehook_notify_resume(regs);
> - rseq_handle_notify_resume(NULL, regs);
> - }
> }
> diff --git a/arch/mips/kernel/signal.c b/arch/mips/kernel/signal.c
> index f1e985109da0..c9b2a75563e1 100644
> --- a/arch/mips/kernel/signal.c
> +++ b/arch/mips/kernel/signal.c
> @@ -906,10 +906,8 @@ asmlinkage void do_notify_resume(struct pt_regs *regs, 
> void
> *unused,
>   if (thread_info_flags & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL))
>   do_signal(regs);
> 
> - if (thread_info_flags & _TIF_NOTIFY_RESUME) {
> + if (thread_info_flags & _TIF_NOTIFY_RESUME)
>   tracehook_notify_resume(regs);
> - rseq_handle_notify_resume(NULL, regs);
> - }
> 
>   user_enter();
> }
> diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
> index e600764a926c..b93b87df499d 100644
> --- a/arch/powerpc/kernel/signal.c
> +++ b/arch/powerpc/kernel/signal.c
> @@ -293,10 +293,8 @@ void do_notify_resume(struct pt_regs *regs, unsigned long
> thread_info_flags)
>   do_signal(current);
>   }
> 
> - if (thread_info_flags & _TIF_NOTIFY_RESUME) {
> + if (thread_info_flags & _TIF_NOTIFY_RESUME)
>   tracehook_notify_resume(regs);
> - rseq_handle_notify_resume(NULL, regs);
> - }
> }
> 
> static unsigned long get_tm_stackpointer(struct task_struct *tsk)
> diff --git a/arch/s390/kernel/signal.c b/arch/s390/kernel/signal.c
> index 78ef53b29958..b307db26bf2d 100644
> --- a/arch/s390/kernel/signal.c
> +++ b/arch/s390/kernel/signal.c
> 

[lttng-dev] LTTng 2.13.0 - Nordicité - Linux kernel and user-space tracer

2021-08-03 Thread Mathieu Desnoyers via lttng-dev
ion is met, nothing is done.

  - Rotate session
This action causes the LTTng session daemon to rotate the session with the
given name. See lttng-rotate(1) for more information about the session
rotation concept. If no session with the given name exists at the time the
condition is met, nothing is done.

  - Snapshot session
This action causes the LTTng session daemon to take a snapshot of the
session with the given name. See lttng-snapshot(1) for more information
about the session snapshot concept. If no session with the given name exists
at the time the condition is met, nothing is done.

These new actions can also be combined together. For instance, the following
trigger will stop `my_session`, record a snapshot of `my_session`, and notify
any listening application when '/etc/passwd' is opened:

$ lttng add-trigger \
--condition event-rule-matches \
  --type kernel:syscall \
  --name 'openat' \
  --filter 'filename == "/etc/passwd"' \
--action stop-session my_session \
--action snapshot-session my_session \
--action notify

For more information, see the following manual pages:
  - lttng-add-trigger(1),
  - lttng-remove-trigger(1),
  - lttng-list-triggers(1).


Notification payload capture
---

The new event-rule matches condition type also allows 'captures'.
This allows event record and context fields to be captured when an event-rule
matches condition is satisfied.

The captured field values are made available in the evaluation object of the
notifications transmitted to listening applications.

Capture descriptors can be specified using a syntax reminiscent of the one used
by the filter expressions.

The following example will capture a process's name and the 'filename' argument
of all `openat()` system calls:

$ lttng add-trigger \
--condition event-rule-matches \
  --type kernel:syscall \
  --name 'openat' \
  --capture 'filename' \
  --capture '$ctx.procname' \
--action notify

See the lttng-add-trigger(1) manual page for more information.


vtracef and vtracelog (LTTng-UST)
---

New versions of the `tracef()` and `tracelog()` tracing helpers accepting
variable argument lists are introduced as `vtracef()` and `vtracelog()`.

See the tracef(3) and tracelog(3) manual pages for more information.


Add time namespace context (LTTng-UST and LTTng-modules)
---

It is now possible to add the time namespace of a process as a context to
channels (`time_ns`) using the `add-context` command.

See the time_namespaces(7) manual page for more information.


Links
---

Project website: https://lttng.org

Download links:
https://lttng.org/files/lttng-tools/lttng-tools-2.13.0.tar.bz2
https://lttng.org/files/lttng-ust/lttng-ust-2.13.0.tar.bz2
https://lttng.org/files/lttng-modules/lttng-modules-2.13.0.tar.bz2

GPG signatures:
https://lttng.org/files/lttng-tools/lttng-tools-2.13.0.tar.bz2.asc
https://lttng.org/files/lttng-ust/lttng-ust-2.13.0.tar.bz2.asc
https://lttng.org/files/lttng-modules/lttng-modules-2.13.0.tar.bz2.asc

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] unable to add callstack-user context

2021-07-09 Thread Mathieu Desnoyers via lttng-dev
- On Jul 9, 2021, at 4:30 AM, lttng-dev  wrote: 

> Hi there, I'm a little new to lttng and tracing in general. I am using a Java
> Userspace agent, which is working well, but I am having trouble adding
> callstack context to my tracing channels. I'm not sure if it's relevant but
> running 'lttng add-context --list" shows callstack-user in the list.

It's not surprising, because the callstack-user context is specific to the 
kernel tracer. 

> the commands I input are:

> lttng create
> lttng enable-channel --userspace mychannel
> lttng enable-event --userspace --all --channel mychannel
> lttng add-context --userspace --channel mychannel --type callstack-user
> >Error: callstack-user: UST invalid context

This is expected. 

> When I run the following command instead:
> lttng add-context --userspace --channel mychannel --type vpid

> I am able to get it to work and I see vpid appear in my traces. Are there any
> other things I need to set up first to be able to use it?

> also, are there any commands that can list the per-domain contexts available?

Not currently, though I think it would make sense to make "lttng add-context 
--list" allow a tracing domain 
(e.g. --kernel/--userspace) to be specified as a future improvement. 

Some information about this is available in the lttng-add-context(1) man page, 
but the fact that the 
kernel and user callstack contexts are only supported by the kernel tracer 
don't seem to be mentioned. 

Thanks, 

Mathieu 

> ___
> lttng-dev mailing list
> lttng-dev@lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

-- 
Mathieu Desnoyers 
EfficiOS Inc. 
http://www.efficios.com 
___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] what will happen during tracing if getcpu() doesn't return correct cpu number occasionally?

2021-07-07 Thread Mathieu Desnoyers via lttng-dev
- On Jul 6, 2021, at 10:37 PM, lttng-dev  wrote: 

> Hi,
> I know lttng-ust tracepoint process uses per cpu lockless ringbuffer algorithm
> for high performance so that it relies on getcpu() to return the cpu number on
> which the app is running.
> Unfortunately, I am working on ARM, where the Linux kernel does not support a
> vdso getcpu() implementation, and one getcpu() call takes 200ns!!!
> My question is :
> 1. do you have any advice for that?

You might want to try wiring up the "rseq" system call in user-space to provide 
an accurate cpu_id 
field in a __rseq_abi TLS variable. It is always kept up to date by the kernel. 
The rseq system call 
is implemented on ARM. However the __rseq_abi TLS is a shared resource across 
libraries, and 
we have not agreed with glibc people on how exactly it must be shared within a 
process. 

> 2. If I implement a cached version of getcpu() (just like the getcpu()
> implementation before kernel 2.6.23), what will happen during tracing?

You'd have to give more details on how this "cache-version" works. 

> Since use of the cache could speed up getcpu() calls, at the cost of a
> very small chance that the returned cpu number would be out of date, I am not
> sure whether the "wrong" cpu number will result in the tracing app crashing?

LTTng-UST always has to expect that it can be migrated at any point between 
getcpu and writes to 
per-cpu data. Therefore, it always relies on atomic operations when interacting 
with the ring buffer, 
and there is no expectation that it runs on the "right" CPU compared to the 
ring buffer data structure 
for consistency. Therefore, you won't experience crashes nor corruption even if 
the CPU number is 
wrong once in a while, as long as it belongs to the "possible CPUs". 

This behavior is selected by lttng's libringbuffer "RING_BUFFER_SYNC_GLOBAL" 
configuration 
option internally, as selected by lttng-ust. 

Note that the kernel tracer instead selects "RING_BUFFER_SYNC_PER_CPU", which 
is faster, but 
requires that preemption (or migration) be disabled between the 
"smp_processor_id()" and writes to 
the ring buffer per-cpu data structures. 

Thanks, 

Mathieu 

> Thanks
> zhenyu.ren

> _______
> lttng-dev mailing list
> lttng-dev@lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

-- 
Mathieu Desnoyers 
EfficiOS Inc. 
http://www.efficios.com 
___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [PATCH 8/8] membarrier: Rewrite sync_core_before_usermode() and improve documentation

2021-06-18 Thread Mathieu Desnoyers
- On Jun 18, 2021, at 3:58 PM, Andy Lutomirski l...@kernel.org wrote:

> On Fri, Jun 18, 2021, at 9:31 AM, Mathieu Desnoyers wrote:
>> - On Jun 17, 2021, at 8:12 PM, Andy Lutomirski l...@kernel.org wrote:
>> 
>> > On 6/17/21 7:47 AM, Mathieu Desnoyers wrote:
>> > 
>> >> Please change back this #ifndef / #else / #endif within function for
>> >> 
>> >> if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE)) {
>> >>   ...
>> >> } else {
>> >>   ...
>> >> }
>> >> 
>> >> I don't think mixing up preprocessor and code logic makes it more 
>> >> readable.
>> > 
>> > I agree, but I don't know how to make the result work well.
>> > membarrier_sync_core_before_usermode() isn't defined in the !IS_ENABLED
>> > case, so either I need to fake up a definition or use #ifdef.
>> > 
>> > If I faked up a definition, I would want to assert, at build time, that
>> > it isn't called.  I don't think we can do:
>> > 
>> > static void membarrier_sync_core_before_usermode()
>> > {
>> >BUILD_BUG_IF_REACHABLE();
>> > }
>> 
>> Let's look at the context here:
>> 
>> static void ipi_sync_core(void *info)
>> {
>> [...]
>> membarrier_sync_core_before_usermode()
>> }
>> 
>> ^ this can be within #ifdef / #endif
>> 
>> static int membarrier_private_expedited(int flags, int cpu_id)
>> [...]
>>if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
>> return -EINVAL;
>> if (!(atomic_read(&mm->membarrier_state) &
>>   MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY))
>> return -EPERM;
>> ipi_func = ipi_sync_core;
>> 
>> All we need to make the line above work is to define an empty ipi_sync_core
>> function in the #else case after the ipi_sync_core() function definition.
>> 
>> Or am I missing your point ?
> 
> Maybe?
> 
> My objection is that an empty ipi_sync_core is a lie — it doesn’t sync the 
> core.
> I would be fine with that if I could have the compiler statically verify that
> it’s not called, but I’m uncomfortable having it there if the implementation 
> is
> actively incorrect.

I see. Another approach would be to implement a "setter" function to populate
"ipi_func". That setter function would return -EINVAL in its #ifndef 
CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE
implementation.

Would that be better ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH for 4.16 v7 02/11] powerpc: membarrier: Skip memory barrier in switch_mm()

2021-06-18 Thread Mathieu Desnoyers
- On Jun 18, 2021, at 1:13 PM, Christophe Leroy christophe.le...@csgroup.eu 
wrote:
[...]
> 
> I don't understand all that complexity to just replace a simple
> 'smp_mb__after_unlock_lock()'.
> 
> #define smp_mb__after_unlock_lock()   smp_mb()
> #define smp_mb()  barrier()
> # define barrier() __asm__ __volatile__("": : :"memory")
> 
> 
> Am I missing some subtility ?

On powerpc CONFIG_SMP, smp_mb() is actually defined as:

#define smp_mb()   __smp_mb()
#define __smp_mb()  mb()
#define mb()   __asm__ __volatile__ ("sync" : : : "memory")

So the original motivation here was to skip a "sync" instruction whenever
switching between threads which are part of the same process. But based on
recent discussions, I suspect my implementation may be inaccurately doing
so though.

Thanks,

Mathieu


> 
> Thanks
> Christophe

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [lttng-dev] Disabling sys_membarrier in lttng urcu

2021-06-18 Thread Mathieu Desnoyers via lttng-dev
- On Jun 17, 2021, at 6:13 PM, lttng-dev lttng-dev@lists.lttng.org wrote:

> Hello,
> 
> Some old topic, see
> https://lists.lttng.org/pipermail/lttng-dev/2019-November/029411.html.
> (I actually thought this was solved meanwhile).
> 
> With lttng 2.13 liburcu is replicated in lttng-ust so my old custom
> hack aint helping.
> 
> Aside from another crude hack, I thought about doing this:
> 
> extern int lttng_ust_urcu_has_sys_membarrier;
> int setup() {
>  lttng_ust_urcu_has_sys_membarrier = 0;
> }
> 
> this is obvious possible in my own program, but I don't know if some
> lttng daemon would need to update RCU structures in shared memory that
> should sync to other processes (and wont do that with sys_membarrier
> in case of Xenomai threads)?
> 
> Seems safer to me to hack it out once more...

We currently don't have any RCU synchronization across shared memory,
and the implementation we have in lttng-ust only tracks threads within
the same process, so it would be safe to tweak the lttng-ust-urcu behavior
to use membarrier or not based on an environment variable.

So I think we could expose a new "--disable-sys-membarrier" configure option
to lttng-ust, and maybe add a LTTNG_UST_MEMBARRIER environment variable.
This would likely be 2.14 material though.

I also notice that lttng-ust 2.13's implementation of RCU still refers to
CONFIG_RCU_FORCE_SYS_MEMBARRIER which belongs to liburcu. This is
probably something we'll want to fix right away before we release 2.13
final.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [PATCH 8/8] membarrier: Rewrite sync_core_before_usermode() and improve documentation

2021-06-18 Thread Mathieu Desnoyers
- On Jun 17, 2021, at 8:12 PM, Andy Lutomirski l...@kernel.org wrote:

> On 6/17/21 7:47 AM, Mathieu Desnoyers wrote:
> 
>> Please change back this #ifndef / #else / #endif within function for
>> 
>> if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE)) {
>>   ...
>> } else {
>>   ...
>> }
>> 
>> I don't think mixing up preprocessor and code logic makes it more readable.
> 
> I agree, but I don't know how to make the result work well.
> membarrier_sync_core_before_usermode() isn't defined in the !IS_ENABLED
> case, so either I need to fake up a definition or use #ifdef.
> 
> If I faked up a definition, I would want to assert, at build time, that
> it isn't called.  I don't think we can do:
> 
> static void membarrier_sync_core_before_usermode()
> {
>BUILD_BUG_IF_REACHABLE();
> }

Let's look at the context here:

static void ipi_sync_core(void *info)
{
[]
membarrier_sync_core_before_usermode()
}

^ this can be within #ifdef / #endif

static int membarrier_private_expedited(int flags, int cpu_id)
[...]
   if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
return -EINVAL;
	if (!(atomic_read(&mm->membarrier_state) &
  MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY))
return -EPERM;
ipi_func = ipi_sync_core;

All we need to make the line above work is to define an empty ipi_sync_core
function in the #else case after the ipi_sync_core() function definition.

Or am I missing your point ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH 8/8] membarrier: Rewrite sync_core_before_usermode() and improve documentation

2021-06-17 Thread Mathieu Desnoyers
- On Jun 15, 2021, at 11:21 PM, Andy Lutomirski l...@kernel.org wrote:

[...]

> +# An architecture that wants to support
> +# MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE needs to define precisely what 
> it
> +# is supposed to do and implement membarrier_sync_core_before_usermode() to
> +# make it do that.  Then it can select ARCH_HAS_MEMBARRIER_SYNC_CORE via
> +# Kconfig.Unfortunately, MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE is not a
> +# fantastic API and may not make sense on all architectures.  Once an
> +# architecture meets these requirements,

Can we please remove the editorial comment about the quality of the membarrier
sync-core's API ?

At least it's better than having all userspace rely on mprotect() undocumented
side-effects to perform something which typically works, until it won't, or
until this prevents mprotect's implementation from being improved because it
will start breaking JITs all over the place.

We can simply state that the definition of what membarrier sync-core does is
defined per-architecture, and document the sequence of operations to perform
when doing cross-modifying code specifically for each architecture.

Now if there are weird architectures where membarrier is an odd fit (I've heard
that riscv might need address ranges to which the core sync needs to apply),
then those might need to implement their own arch-specific system call, which
is all fine.

> +#
> +# On x86, a program can safely modify code, issue
> +# MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, and then execute that code, via
> +# the modified address or an alias, from any thread in the calling process.
> +#
> +# On arm64, a program can modify code, flush the icache as needed, and issue
> +# MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE to force a "context 
> synchronizing
> +# event", aka pipeline flush on all CPUs that might run the calling process.
> +# Then the program can execute the modified code as long as it is executed
> +# from an address consistent with the icache flush and the CPU's cache type.
> +#
> +# On powerpc, a program can use MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE
> +# similarly to arm64.  It would be nice if the powerpc maintainers could
> +# add a more clear explanantion.

We should document the requirements on ARMv7 as well.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH 8/8] membarrier: Rewrite sync_core_before_usermode() and improve documentation

2021-06-17 Thread Mathieu Desnoyers
- On Jun 15, 2021, at 11:21 PM, Andy Lutomirski l...@kernel.org wrote:

> The old sync_core_before_usermode() comments suggested that a 
> non-icache-syncing
> return-to-usermode instruction is x86-specific and that all other
> architectures automatically notice cross-modified code on return to
> userspace.
> 
> This is misleading.  The incantation needed to modify code from one
> CPU and execute it on another CPU is highly architecture dependent.
> On x86, according to the SDM, one must modify the code, issue SFENCE
> if the modification was WC or nontemporal, and then issue a "serializing
> instruction" on the CPU that will execute the code.  membarrier() can do
> the latter.
> 
> On arm64 and powerpc, one must flush the icache and then flush the pipeline
> on the target CPU, although the CPU manuals don't necessarily use this
> language.
> 
> So let's drop any pretense that we can have a generic way to define or
> implement membarrier's SYNC_CORE operation and instead require all
> architectures to define the helper and supply their own documentation as to
> how to use it.

Agreed. Documentation of the sequence of operations that need to be performed
when cross-modifying code on SMP should be per-architecture. The documentation
of the architectural effects of membarrier sync-core should be per-arch as well.

> This means x86, arm64, and powerpc for now.

And also arm32, as discussed in the other leg of the patchset's email thread.

> Let's also
> rename the function from sync_core_before_usermode() to
> membarrier_sync_core_before_usermode() because the precise flushing details
> may very well be specific to membarrier, and even the concept of
> "sync_core" in the kernel is mostly an x86-ism.

OK

> 
[...]
> 
> static void ipi_rseq(void *info)
> {
> @@ -368,12 +373,14 @@ static int membarrier_private_expedited(int flags, int
> cpu_id)
>   smp_call_func_t ipi_func = ipi_mb;
> 
>   if (flags == MEMBARRIER_FLAG_SYNC_CORE) {
> - if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
> +#ifndef CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE
>   return -EINVAL;
> +#else
>   if (!(atomic_read(&mm->membarrier_state) &
>   MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY))
>   return -EPERM;
>   ipi_func = ipi_sync_core;
> +#endif

Please change back this #ifndef / #else / #endif within function for

if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE)) {
  ...
} else {
  ...
}

I don't think mixing up preprocessor and code logic makes it more readable.

Thanks,

Mathieu

>   } else if (flags == MEMBARRIER_FLAG_RSEQ) {
>   if (!IS_ENABLED(CONFIG_RSEQ))
>   return -EINVAL;
> --
> 2.31.1

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


[lttng-dev] Usage example of libbabeltrace (babeltrace2) to read CTF traces from a GUI

2021-06-14 Thread Mathieu Desnoyers via lttng-dev
Hi Philippe, Hi Jeremie,

Steven is interested in using libbabeltrace as a CTF trace reader in the
KernelShark project. It's a GUI which shows Linux kernel traces.

He notices that most of the documented usage examples of the libbabeltrace API
focus on use-cases where a custom trace format is converted into CTF.

I know that the babeltrace2 program is the main user of libbabeltrace at this
point, and that it reads CTF as source.

For using libbabeltrace as a CTF reader for a GUI, what would be the best examples
to look at as starting point ? Perhaps the babeltrace2 binary, or just adapt a
smaller example program and change the custom trace format source to "ctf.fs" ?
Or anything else ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [lttng-dev] LTTng 32 bits on 64 bits systems - kernel events... but no user-event

2021-06-07 Thread Mathieu Desnoyers via lttng-dev



- On Jun 4, 2021, at 9:09 AM, lttng-dev lttng-dev@lists.lttng.org wrote:

> Hi LTTng team,

> I am currently trying to add lttng 32bits on a 64bits system with a custom
> kernel 4.19.177+ x86_64 (Debian 10.9)
> My former attempt only with only a 100% arm32 bits was successful (raspberry).
> I am facing now a strange issue : I can record kernel event but not userspace
> event... but I can see these user-event (see below)!
> (tuto : [ 
> https://lttng.org/docs/#doc-instrumenting-32-bit-app-on-64-bit-system
> | https://lttng.org/docs/#doc-instrumenting-32-bit-app-on-64-bit-system ] )

> I followed two times the tutorial : on lttng 2.12 and lttng 2.10.
> I am on lttng (LTTng Trace Control) 2.10.11 - KeKriek now
> All installation seemed to install without any error.
> The only thing I added is at the very end where I did a : ln -s
> /usr/local/bin/lttng /usr/bin/lttng (to directly call commands like "lttng
> start")

> Here some interesting outputs, I am using this projet as an example : [
> https://lttng.org/docs/#doc-tracing-your-own-user-application |
> https://lttng.org/docs/#doc-tracing-your-own-user-application ]
> lttng list -u : OK [ https://paste.ubuntu.com/p/hZptnNzySw/ | Ubuntu Pastebin ]
> LTTNG_UST_DEBUG=1 ./hello : seems OK : [ https://paste.ubuntu.com/p/w6xHrJsWJ9/ | Ubuntu Pastebin ]
> kernel-event output : [ https://paste.ubuntu.com/p/5KfyS8wVdb/ | Ubuntu Pastebin ]
> user-event output : none - the output folder stated with lttng create my-user-session --output=/tmp/my-user-session doesn't even appear
> apt-file search lttng : [ https://paste.ubuntu.com/p/xMcCc7bqwk/ | Ubuntu Pastebin ]
> lsmod | grep lttng : [ https://paste.ubuntu.com/p/7YNkNsh9Fr/ | Ubuntu Pastebin ]

> Note : file /usr/local/bin/lttng (the exe I am using) gives :
> /usr/local/bin/lttng: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV),
> dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux
> 3.2.0, BuildID[sha1]=542a108c40e386c17027c2c8f8fcb30e278b7748, with 
> debug_info,
> not stripped

> I am wondering if the installation didn't pick a 64bits RCU and analyses
> programms only on this.
> However, I have currently absolutely no idea how to debug this.

> Could you advise me some methods ?

Did you follow this section of the documentation ? 

https://lttng.org/docs/#doc-instrumenting-32-bit-app-on-64-bit-system 

Thanks, 

Mathieu 

> Regards,

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [Kgdb-bugreport] [PATCH 4/6] sched: Add get_current_state()

2021-06-07 Thread Mathieu Desnoyers
- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra pet...@infradead.org wrote:

> Remove yet another few p->state accesses.

[...]

> 
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -212,6 +212,8 @@ struct task_group;
> 
> #endif
> 
> +#define get_current_state()  READ_ONCE(current->state)

Why use a macro rather than a static inline here ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


___
Kgdb-bugreport mailing list
Kgdb-bugreport@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kgdb-bugreport


Re: [Kgdb-bugreport] [PATCH 3/6] sched, perf, kvm: Fix preemption condition

2021-06-07 Thread Mathieu Desnoyers
- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra pet...@infradead.org wrote:
[...]
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8568,13 +8568,12 @@ static void perf_event_switch(struct tas
>   },
>   };
> 
> - if (!sched_in && task->state == TASK_RUNNING)
> + if (!sched_in && current->on_rq) {
>   switch_event.event_id.header.misc |=
>   PERF_RECORD_MISC_SWITCH_OUT_PREEMPT;
> + }
> 
> -	perf_iterate_sb(perf_event_switch_output,
> -			&switch_event,
> -			NULL);
> +	perf_iterate_sb(perf_event_switch_output, &switch_event, NULL);
> }

There is a lot of code cleanup going on here which does not seem to belong
to a "Fix" patch.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com




Re: [Kgdb-bugreport] [PATCH 3/6] sched, perf, kvm: Fix preemption condition

2021-06-07 Thread Mathieu Desnoyers
- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra pet...@infradead.org wrote:

> When ran from the sched-out path (preempt_notifier or perf_event),
> p->state is irrelevant to determine preemption. You can get preempted
> with !task_is_running() just fine.
> 
> The right indicator for preemption is if the task is still on the
> runqueue in the sched-out path.
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> ---
> kernel/events/core.c |7 +++
> virt/kvm/kvm_main.c  |2 +-
> 2 files changed, 4 insertions(+), 5 deletions(-)
> 
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8568,13 +8568,12 @@ static void perf_event_switch(struct tas
>   },
>   };
> 
> - if (!sched_in && task->state == TASK_RUNNING)
> + if (!sched_in && current->on_rq) {

This changes from checking task->state to current->on_rq, but this change
from "task" to "current" is not described in the commit message, which is odd.

Are we really sure that task == current here ?

Thanks,

Mathieu

>   switch_event.event_id.header.misc |=
>   PERF_RECORD_MISC_SWITCH_OUT_PREEMPT;
> + }
> 
> -	perf_iterate_sb(perf_event_switch_output,
> -			&switch_event,
> -			NULL);
> +	perf_iterate_sb(perf_event_switch_output, &switch_event, NULL);
> }
> 
> /*
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4869,7 +4869,7 @@ static void kvm_sched_out(struct preempt
> {
>   struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
> 
> - if (current->state == TASK_RUNNING) {
> + if (current->on_rq) {
>   WRITE_ONCE(vcpu->preempted, true);
>   WRITE_ONCE(vcpu->ready, true);
>   }

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com




Re: [Kgdb-bugreport] [PATCH 6/6] sched: Change task_struct::state

2021-06-07 Thread Mathieu Desnoyers
- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra pet...@infradead.org wrote:

> Change the type and name of task_struct::state. Drop the volatile and
> shrink it to an 'unsigned int'. Rename it in order to find all uses
> such that we can use READ_ONCE/WRITE_ONCE as appropriate.
> 
[...]
> 
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c

[...]
> @@ -1559,7 +1560,8 @@ static int fill_psinfo(struct elf_prpsin
>   psinfo->pr_pgrp = task_pgrp_vnr(p);
>   psinfo->pr_sid = task_session_vnr(p);
> 
> - i = p->state ? ffz(~p->state) + 1 : 0;
> + state = READ_ONCE(p->__state);
> + i = state ? ffz(~state) + 1 : 0;
>   psinfo->pr_state = i;
>   psinfo->pr_sname = (i > 5) ? '.' : "RSDTZW"[i];
>   psinfo->pr_zomb = psinfo->pr_sname == 'Z';

[...]

> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -113,13 +113,13 @@ struct task_group;
>__TASK_TRACED | EXIT_DEAD | 
> EXIT_ZOMBIE | \
>TASK_PARKED)
> 
> -#define task_is_running(task)		(READ_ONCE((task)->state) == TASK_RUNNING)
> +#define task_is_running(task)		(READ_ONCE((task)->__state) == TASK_RUNNING)
> 
> -#define task_is_traced(task)		((task->state & __TASK_TRACED) != 0)
> +#define task_is_traced(task)		((READ_ONCE(task->__state) & __TASK_TRACED) != 0)
> 
> -#define task_is_stopped(task)		((task->state & __TASK_STOPPED) != 0)
> +#define task_is_stopped(task)		((READ_ONCE(task->__state) & __TASK_STOPPED) != 0)
> 
> -#define task_is_stopped_or_traced(task)	((task->state & (__TASK_STOPPED | __TASK_TRACED)) != 0)
> +#define task_is_stopped_or_traced(task)	((READ_ONCE(task->__state) & (__TASK_STOPPED | __TASK_TRACED)) != 0)
> 
> #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
> 
> @@ -134,14 +134,14 @@ struct task_group;
>   do {\
>   WARN_ON_ONCE(is_special_task_state(state_value));\
>   current->task_state_change = _THIS_IP_; \
> - current->state = (state_value);     \
> + WRITE_ONCE(current->__state, (state_value));\
>   } while (0)

Why not introduce set_task_state(p) and get_task_state(p) rather than sprinkle
READ_ONCE() and WRITE_ONCE() all over the kernel ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com




[lttng-dev] [RELEASE] Userspace RCU 0.13.0

2021-06-03 Thread Mathieu Desnoyers via lttng-dev
Hi,

The Userspace RCU 0.13 release is mostly a library soname version bump
to address an ABI incompatibility between the 0.10 and { 0.11, 0.12 }
releases.

The observed application vs library compatibility problem occurs as follows:

- An application executable is built with _LGPL_SOURCE defined, includes
  any of the Userspace RCU 0.10 urcu flavor headers, and is built
  without the -fpic compiler option.

- The Userspace RCU 0.10 library shared objects are updated to 0.11
  or 0.12 without rebuilding the application.

- The application will hang, typically when RCU grace period
  (synchronize_rcu) is invoked.

Some possible work-arounds for this are:

- Rebuild the application against Userspace RCU 0.11+.
- Rebuild the application with -fpic.
- Upgrade Userspace RCU to 0.13+ without installing 0.11 nor 0.12.


* Explanation of the issue

In URCU 0.11, we introduced new symbols to clean up the library symbol
namespacing, using the "alias" attribute to keep emitting the old
symbols, expecting to preserve ABI backward compatibility.
Unfortunately, it turns out that even though it works well for function
symbols, it is broken for public global variables due to the way ELF
copy relocation works.

When building a non-PIC executable that uses an extern variable, a .bss
symbol is emitted in the executable. This will take precedence over the
symbol implemented within the library in the Global Symbol Table.
Unfortunately, the alias within the library will not be aware that the
actual GST symbol differs from its alias within the library, and the
addresses for the symbol and its alias will differ at runtime.

Considering that this compatibility issue affects official library
releases, there is little we can do beyond documenting this issue, and
bumping the Userspace RCU major soname for the next (0.13) release.

In summary, do not upgrade from Userspace RCU 0.10 to 0.11 or 0.12 if 
you have applications which:

- define _LGPL_SOURCE
- use Userspace RCU 0.10 headers
- are not compiled with -fpic

We recommend instead to upgrade to Userspace RCU 0.13, which bumps the
library soname major number.

Thanks,

Mathieu

Project website: http://liburcu.org
Git repository: git://git.liburcu.org/urcu.git

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com



[lttng-dev] [PATCH lttng-tools stable-2.12] Fix: list_lttng_agent_events: unbalanced RCU read-side lock on error

2021-05-27 Thread Mathieu Desnoyers via lttng-dev
The error label jumps to the end label, which releases the RCU read-side
lock. There are many error paths in this function which goto error
without holding the RCU read-side lock, thus causing an unbalanced RCU
read-side lock.

There is no point in keeping such short RCU read-side critical sections,
so cover the entire function with a single read-side critical section.

[ Applies to stable-2.12 and possibly prior versions. Does _not_ apply
  to stable-2.13+. ]

Signed-off-by: Mathieu Desnoyers 
Change-Id: I5b20c229a5df22d22ecfdc64dbbb87ee118649d2
---
 src/bin/lttng-sessiond/cmd.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/src/bin/lttng-sessiond/cmd.c b/src/bin/lttng-sessiond/cmd.c
index b608df1e1..eb5da1b76 100644
--- a/src/bin/lttng-sessiond/cmd.c
+++ b/src/bin/lttng-sessiond/cmd.c
@@ -510,7 +510,6 @@ static int list_lttng_agent_events(struct agent *agt,
 
rcu_read_lock();
nb_event = lttng_ht_get_count(agt->events);
-   rcu_read_unlock();
if (nb_event == 0) {
ret = nb_event;
*total_size = 0;
@@ -524,7 +523,6 @@ static int list_lttng_agent_events(struct agent *agt,
 * This is only valid because the commands which add events are
 * processed in the same thread as the listing.
 */
-   rcu_read_lock();
	cds_lfht_for_each_entry(agt->events->ht, &iter, event, node.node) {
		ret = increment_extended_len(event->filter_expression, NULL, NULL,
				&extended_len);
@@ -534,7 +532,6 @@ static int list_lttng_agent_events(struct agent *agt,
goto error;
}
}
-   rcu_read_unlock();
 
*total_size = nb_event * sizeof(*tmp_events) + extended_len;
tmp_events = zmalloc(*total_size);
@@ -547,7 +544,6 @@ static int list_lttng_agent_events(struct agent *agt,
extended_at = ((uint8_t *) tmp_events) +
nb_event * sizeof(struct lttng_event);
 
-   rcu_read_lock();
	cds_lfht_for_each_entry(agt->events->ht, &iter, event, node.node) {
		strncpy(tmp_events[i].name, event->name,
				sizeof(tmp_events[i].name));
tmp_events[i].name[sizeof(tmp_events[i].name) - 1] = '\0';
-- 
2.17.1



Re: [lttng-dev] [PATCH lttng-ust] Improve tracelog handling, reduce exported functions

2021-05-21 Thread Mathieu Desnoyers via lttng-dev



- On May 20, 2021, at 1:43 PM, Norbert Lange nolang...@gmail.com wrote:

> Am Do., 20. Mai 2021 um 19:18 Uhr schrieb Mathieu Desnoyers
> :
>>
>>
>>
>> - On May 20, 2021, at 12:51 PM, Norbert Lange nolang...@gmail.com wrote:
>>
>> > Am Do., 20. Mai 2021 um 18:25 Uhr schrieb Mathieu Desnoyers
>> > :
>> >>
>> >> - On May 20, 2021, at 11:54 AM, Norbert Lange nolang...@gmail.com 
>> >> wrote:
>> >> [...]
>> >>
>> >> >> What prevents you from linking against lttng-ust.so again ?
>> >> >
>> >> > I did not poke around enough with Lttng to be confident it wont have
>> >> > side effects,
>> >> > I really don't want it active in production. It doesn't seem there is
>> >> > much public knowledge with Xenomai either.
>> >> > lttng-ust.so will spawn threads, lttng-ust-tracepoint.so is mostly 
>> >> > passive,
>> >>
>> >> There is indeed a split between instrumentation and runtime threads done
>> >> with lttng-ust-tracepoint.so vs lttng-ust.so.
>> >>
>> >> I understand that this split is missing for tracelog and tracef, and
>> >> would be a good thing to have.
>> >>
>> >> I would be interested to move the tracelog and tracef implementation
>> >> from liblttng-ust.so to liblttng-ust-tracepoint.so, even this late
>> >> in the -rc cycle, because all users of tracelog/tracef need to link
>> >> against liblttng-ust-tracepoint.so anyway. So moving these symbols
>> >> should not affect anyone.
>> >>
>> >> Can you give it a try and let me know if it works for you ?
>> >
>> > Will take some time, whats the timeframe you need for feedback?
>>
>> Here is the tentative commit:
>>
>> https://review.lttng.org/c/lttng-ust/+/5927 Move tracef/tracelog symbols to
>> liblttng-ust-tracepoint.so
> 
> Well... this is certainly an improvement. I am not completely happy
> though: "... users now link against
> liblttng-ust-tracepoint.so explicitly"

I'm abandoning this change for now.

It's too late in the rc cycle for doing an ABI breaking change. Also, this
can eventually be done gradually by introducing a new .so with new symbols,
and mapping the tracelog/tracef APIs to those new symbols in a future release.
So I don't think it justifies breaking ABI at this stage.

> 
> My homecooked solution currently works like this:
> 
> *) define the probes from  with
> TRACEPOINT_PROBE_DYNAMIC_LINKAGE,
>link them in the application, together with other dynamic probes
> *) build a separate library with *other* tracepoints, lets call it
> libtracepoint.so
> *) don't link the application to any lttng library.
> 
> Which means:
> 
> 1) the application works without lttng libraries. tracepoints are no-ops
> 2) if available then liblttng-ust-tracepoint.so is loaded (constructor
> function from your headers). tracepoints are no-ops
> 3) if the application dlopen's libtracepoint.so and in turn
> liblttng-ust.so then tracepoints work.
> 
> I'd lose option 1 compared to reimplementing tracelog using homecooked
> lttng-ust-tracelog tracepoints.
> 
> So, are there any issues using  that way,
> it seems to work fine,
> are there mutliple competing instances now?
> (I am not re-using any bit from tracelog.h, I am just after using the
> tracepoint definition).
> 
> I mean I could dlsym all the functions, but tracelog has 1 per
> loglevel and really ugly long names ;)

I recommend that you re-use the parts of tracelog which are useful to
you, but that you implement your own events within your provider .so
with your own event names, so there is no clash with upstream lttng-ust.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [lttng-dev] [PATCH lttng-ust] Improve tracelog handling, reduce exported functions

2021-05-20 Thread Mathieu Desnoyers via lttng-dev



- On May 20, 2021, at 12:51 PM, Norbert Lange nolang...@gmail.com wrote:

> Am Do., 20. Mai 2021 um 18:25 Uhr schrieb Mathieu Desnoyers
> :
>>
>> - On May 20, 2021, at 11:54 AM, Norbert Lange nolang...@gmail.com wrote:
>> [...]
>>
>> >> What prevents you from linking against lttng-ust.so again ?
>> >
>> > I did not poke around enough with Lttng to be confident it wont have
>> > side effects,
>> > I really don't want it active in production. It doesn't seem there is
>> > much public knowledge with Xenomai either.
>> > lttng-ust.so will spawn threads, lttng-ust-tracepoint.so is mostly passive,
>>
>> There is indeed a split between instrumentation and runtime threads done
>> with lttng-ust-tracepoint.so vs lttng-ust.so.
>>
>> I understand that this split is missing for tracelog and tracef, and
>> would be a good thing to have.
>>
>> I would be interested to move the tracelog and tracef implementation
>> from liblttng-ust.so to liblttng-ust-tracepoint.so, even this late
>> in the -rc cycle, because all users of tracelog/tracef need to link
>> against liblttng-ust-tracepoint.so anyway. So moving these symbols
>> should not affect anyone.
>>
>> Can you give it a try and let me know if it works for you ?
> 
> Will take some time, whats the timeframe you need for feedback?

Here is the tentative commit:

https://review.lttng.org/c/lttng-ust/+/5927 Move tracef/tracelog symbols to 
liblttng-ust-tracepoint.so

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [lttng-dev] [PATCH lttng-ust] Improve tracelog handling, reduce exported functions

2021-05-20 Thread Mathieu Desnoyers via lttng-dev
- On May 20, 2021, at 11:54 AM, Norbert Lange nolang...@gmail.com wrote:
[...]

>> What prevents you from linking against lttng-ust.so again ?
> 
> I did not poke around enough with Lttng to be confident it wont have
> side effects,
> I really don't want it active in production. It doesn't seem there is
> much public knowledge with Xenomai either.
> lttng-ust.so will spawn threads, lttng-ust-tracepoint.so is mostly passive,

There is indeed a split between instrumentation and runtime threads done
with lttng-ust-tracepoint.so vs lttng-ust.so.

I understand that this split is missing for tracelog and tracef, and
would be a good thing to have.

I would be interested to move the tracelog and tracef implementation
from liblttng-ust.so to liblttng-ust-tracepoint.so, even this late
in the -rc cycle, because all users of tracelog/tracef need to link
against liblttng-ust-tracepoint.so anyway. So moving these symbols
should not affect anyone.

Can you give it a try and let me know if it works for you ?

> So Id want a dynamic tracepoint-provider than i can dlopen (so that
> the signal masks are inherited,
> I hope you dont touch them).

The signals are all blocked for lttng-ust listener threads. We don't
modify the signal masks in the tracepoint probes. Not sure which is
the target of your question though.

> 
> Of course I could just remove all lttng libraries on the production
> system aswell. Still doesnt change that
> tracelog and tracef doesnt work that way.

Would moving the tracelog/tracef implementation to liblttng-ust-tracepoint.so
solve your issues ?

> 
> I implemented my own tracelog/tracef using the normal lttng
> tracepoints for now, they totally break on source level with 2.13
> aswell ;)
> is it ok if I do this to access them:
> 
> #define TRACEPOINT_DEFINE
> #define TRACEPOINT_PROBE_DYNAMIC_LINKAGE
> // 2.12
> // #include 
> // #include 
> // 2.13
> #include 
> #include 
> 
> ie. I would load lttng-ust.so later and can then use those tracepoints.

Reimplementing the tracelog/tracef providers is not an intended use-case.
I'd very much prefer if we move their implementation to
liblttng-ust-tracepoint.so.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [lttng-dev] [PATCH lttng-ust] Improve tracelog handling, reduce exported functions

2021-05-20 Thread Mathieu Desnoyers via lttng-dev
- On May 20, 2021, at 10:57 AM, Norbert Lange nolang...@gmail.com wrote:

> Am Do., 20. Mai 2021 um 16:19 Uhr schrieb Mathieu Desnoyers
> :
>>
>> - On May 20, 2021, at 8:18 AM, lttng-dev lttng-dev@lists.lttng.org wrote:
>>
>> > Instead of creating functions for each loglevel, simply pass the
>> > callback as argument.
>> >
>> > Further pack all preprocessor information into a struct that
>> > the compiler already can prepare.
>>
>> This introduces an ABI break too late in the cycle.
> 
> So 2.14 would be the next chance I guess

No. The original ABI was introduced about 10 years ago with lttng-ust 2.0,
and lttng-ust 2.13 introduces the first ABI break since. I don't
plan on doing any ABI break in lttng-ust in the foreseeable future.

ABI breaks require that our users recompile all their instrumented
applications, which is really cumbersome for large software deployments.
We don't break ABI lightly.

We should rather introduce new features as extensions (new symbols).

> 
>> Also, I'm not so keen on adding an indirect call on the fast-path
>> when it's not absolutely needed.
> 
> Code seems pretty similar: https://godbolt.org/z/oK1WhWqGT

By fast-path, I also mean:

+(*callback)(source->file, source->line, source->func, msg, len,
+LTTNG_UST_CALLER_IP());

Which introduces an indirect call which needs to be taken when tracing
is active.

> 
>> What is wrong with having one symbol per loglevel ?
> 
> Macro-magic is cumbersome to edit, more code, more relocations.

If it were still time for ABI breaks, I would be tempted to consider it,
especially given that tracelog and tracef are not expected to be "high-speed",
but now is too late for breaking ABI.

> 
> Easier to adapt aswell, could roll my own tracelog functions while
> using lttng_ust__tracelog_printf (started soind that as I don't want
> to link to lttng-ust.so)

What prevents you from linking against lttng-ust.so again ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [lttng-dev] LTTng - Xenomai : different results between timestamp-lttng and rt_time_read()

2021-05-20 Thread Mathieu Desnoyers via Xenomai
- On May 20, 2021, at 9:56 AM, Mathieu Desnoyers 
mathieu.desnoy...@efficios.com wrote:

> - On May 20, 2021, at 9:54 AM, lttng-dev lttng-...@lists.lttng.org wrote:
> 
>> - On May 20, 2021, at 5:11 AM, lttng-dev lttng-...@lists.lttng.org wrote:
>> 
>>> Am Do., 20. Mai 2021 um 10:28 Uhr schrieb MONTET Julien
>>> :
>>>>
>>>> Hi Norbert,
>>>>
>>>> Thank you for your answer !
>>>>
>>>> Yes, I am using a Xenomai cobalt - xenomai is 3.1
>>>> cat /proc/xenomai/version => 3.1
>>>>
>>>> After the installation, I tested "test tools" in /proc/xenomai/ and it 
>>>> worked
>>>> nice.
>>> 
>>> Just asked to make sure, thought the scripts usual add some -xeno tag
>>> to the kernel version.
>>> 
>>>> What do you mean by "it might deadlock really good" ?
>>> 
>>> clock_gettime will either use a syscall (kills realtime always) or is
>>> optimized via VDSO (which very likely is your case).
>>> 
>>> What happens is that the kernel will take a spinlock, then write new
>>> values, then releases the spinlock.
>>> your program will aswell spin (but just to see if the spinlock is
>>> free), read the values and interpolates them.
>>> 
>>> But if your program interrupts the kernel while the kernel holds the
>>> lock (all on the same cpu core), then it will spin forever and the
>>> kernel will never execute.
>> 
>> Just one clarification: the specific locking strategy used by the
>> Linux kernel monotonic clock vDSO is a "seqlock", where the kernel
>> sets a bit which keeps concurrent readers looping until they observe
> 
> When I say "sets a bit", I actually mean "increment a sequence counter",
> and readers observe either odd or even state, thus knowing whether
> they need to retry, and whether the value read before/after reading
> the data structure changed.

Looking again at the Linux kernel's kernel/time/vsyscall.c implementation
of vdso_update_{begin,end}, I notice that interrupts are disabled across
the entire update. So I understand that the Interrupt pipeline (I-pipe)
interrupt gets delivered even when the kernel disables interrupts. Did
you consider modifying the I-pipe kernel patch to change the vDSO update so
it updates the vDSO from within an I-pipe virq handler?

AFAIU this would allow Xenomai userspace to use the Linux kernel vDSO
clock sources.

Thanks,

Mathieu

> 
> Thanks,
> 
> Mathieu
> 
>> a consistent value. With Xenomai it indeed appears to be prone to
>> deadlock if a high priority Xenomai thread interrupts the kernel
>> while the write seqlock is held, and then proceeds to loop forever
>> on the read-side of the seqlock.
>> 
>> Note that for the in-kernel tracer clock read use-case, which
>> needs to be able to happen from NMI context, I've contributed a
>> modified version of the seqlock to the Linux kernel:
>> 
>> https://lwn.net/Articles/831540/ The seqcount latch lock type
>> 
>> It basically keeps two copies of the clock data structures, so the
>> read-side never has to loop waiting for the updater: it simply gets
>> redirected to the "stable" copy of the data.
>> 
>> The trade-off here is that with the latch lock used for clocks, a
>> reader may observe time going slightly backwards between two clock
>> reads when reading while specific clock rate adjustments are made
>> by an updater. The clock user needs to be aware of this.
>> 
>> Thanks,
>> 
>> Mathieu
>> 
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> http://www.efficios.com
>> ___
>> lttng-dev mailing list
>> lttng-...@lists.lttng.org
>> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
> 
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com



Re: [lttng-dev] LTTng - Xenomai : different results between timestamp-lttng and rt_time_read()

2021-05-20 Thread Mathieu Desnoyers via lttng-dev
- On May 20, 2021, at 9:56 AM, Mathieu Desnoyers 
mathieu.desnoy...@efficios.com wrote:

> - On May 20, 2021, at 9:54 AM, lttng-dev lttng-dev@lists.lttng.org wrote:
> 
>> - On May 20, 2021, at 5:11 AM, lttng-dev lttng-dev@lists.lttng.org wrote:
>> 
>>> Am Do., 20. Mai 2021 um 10:28 Uhr schrieb MONTET Julien
>>> :
>>>>
>>>> Hi Norbert,
>>>>
>>>> Thank you for your answer !
>>>>
>>>> Yes, I am using a Xenomai cobalt - xenomai is 3.1
>>>> cat /proc/xenomai/version => 3.1
>>>>
>>>> After the installation, I tested "test tools" in /proc/xenomai/ and it 
>>>> worked
>>>> nice.
>>> 
>>> Just asked to make sure, thought the scripts usual add some -xeno tag
>>> to the kernel version.
>>> 
>>>> What do you mean by "it might deadlock really good" ?
>>> 
>>> clock_gettime will either use a syscall (kills realtime always) or is
>>> optimized via VDSO (which very likely is your case).
>>> 
>>> What happens is that the kernel will take a spinlock, then write new
>>> values, then releases the spinlock.
>>> your program will aswell spin (but just to see if the spinlock is
>>> free), read the values and interpolates them.
>>> 
>>> But if your program interrupts the kernel while the kernel holds the
>>> lock (all on the same cpu core), then it will spin forever and the
>>> kernel will never execute.
>> 
>> Just one clarification: the specific locking strategy used by the
>> Linux kernel monotonic clock vDSO is a "seqlock", where the kernel
>> sets a bit which keeps concurrent readers looping until they observe
> 
> When I say "sets a bit", I actually mean "increment a sequence counter",
> and readers observe either odd or even state, thus knowing whether
> they need to retry, and whether the value read before/after reading
> the data structure changed.

Looking again at the Linux kernel's kernel/time/vsyscall.c implementation
of vdso_update_{begin,end}, I notice that interrupts are disabled across
the entire update. So I understand that the Interrupt pipeline (I-pipe)
interrupt gets delivered even when the kernel disables interrupts. Did
you consider modifying the I-pipe kernel patch to change the vDSO update so
it updates the vDSO from within an I-pipe virq handler?

AFAIU this would allow Xenomai userspace to use the Linux kernel vDSO
clock sources.

Thanks,

Mathieu

> 
> Thanks,
> 
> Mathieu
> 
>> a consistent value. With Xenomai it indeed appears to be prone to
>> deadlock if a high priority Xenomai thread interrupts the kernel
>> while the write seqlock is held, and then proceeds to loop forever
>> on the read-side of the seqlock.
>> 
>> Note that for the in-kernel tracer clock read use-case, which
>> needs to be able to happen from NMI context, I've contributed a
>> modified version of the seqlock to the Linux kernel:
>> 
>> https://lwn.net/Articles/831540/ The seqcount latch lock type
>> 
>> It basically keeps two copies of the clock data structures, so the
>> read-side never has to loop waiting for the updater: it simply gets
>> redirected to the "stable" copy of the data.
>> 
>> The trade-off here is that with the latch lock used for clocks, a
>> reader may observe time going slightly backwards between two clock
>> reads when reading while specific clock rate adjustments are made
>> by an updater. The clock user needs to be aware of this.
>> 
>> Thanks,
>> 
>> Mathieu
>> 
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> http://www.efficios.com
>> ___
>> lttng-dev mailing list
>> lttng-dev@lists.lttng.org
>> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
> 
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [lttng-dev] [PATCH lttng-ust] Improve tracelog handling, reduce exported functions

2021-05-20 Thread Mathieu Desnoyers via lttng-dev
> + __attribute__ ((format(printf, 3, 4)));
> +
> +extern void lttng_ust__tracelog_vprintf(tpcallback_t *callback,
> + const struct lttng_ust__tracelog_sourceinfo *source, const char *fmt, va_list ap)
> + __attribute__ ((format(printf, 3, 0)));
> +
> +static inline
> +void lttng_ust___tracelog_vprintf(tpcallback_t *callback,
> + const struct lttng_ust__tracelog_sourceinfo *source,
> + const char *fmt, va_list ap)
> + __attribute__((always_inline, format(printf, 3, 0)));
> +
> +
> +static inline
> +void lttng_ust___tracelog_vprintf(tpcallback_t *callback,
> + const struct lttng_ust__tracelog_sourceinfo *source,
> + const char *fmt, va_list ap)
> +{
> + char *msg;
> + const int len = vasprintf(&msg, fmt, ap);
> +
> + /* len does not include the final \0 */
> + if (len < 0)
> + goto end;
> + (*callback)(source->file, source->line, source->func, msg, len,
> + LTTNG_UST_CALLER_IP());
> + free(msg);
> +end:
> + return;
> +}
> +
> +
> +void lttng_ust__tracelog_printf(tpcallback_t *callback,
> + const struct lttng_ust__tracelog_sourceinfo *source, const char *fmt, ...)
> +{
> + va_list ap;
> +
> + va_start(ap, fmt);
> + lttng_ust___tracelog_vprintf(callback, source, fmt, ap);
> + va_end(ap);
> +}
> +
> +void lttng_ust__tracelog_vprintf(tpcallback_t *callback,
> + const struct lttng_ust__tracelog_sourceinfo *source, const char *fmt, va_list ap)
> +{
> + lttng_ust___tracelog_vprintf(callback, source, fmt, ap);
> +}
> --
> 2.30.2
> 
> ___
> lttng-dev mailing list
> lttng-dev@lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [lttng-dev] reading context fields causes syscalls

2021-05-20 Thread Mathieu Desnoyers via lttng-dev
- On May 20, 2021, at 9:42 AM, Norbert Lange nolang...@gmail.com wrote:

> Am Do., 20. Mai 2021 um 15:28 Uhr schrieb Mathieu Desnoyers
> :
>>
>> - On May 20, 2021, at 8:46 AM, Norbert Lange nolang...@gmail.com wrote:
>>
>> > Am Mi., 19. Mai 2021 um 20:52 Uhr schrieb Mathieu Desnoyers
>> > :
>> >>
>> >> - On May 19, 2021, at 8:11 AM, lttng-dev lttng-dev@lists.lttng.org 
>> >> wrote:
>> >>
>> >> > Hello,
>> >> >
>> >> > Several context fields will cause a syscall atleast the first time a
>> >> > tracepoint is
>> >> > recorded. For example all of the following:
>> >> >
>> >> > `lttng add-context -c chan --userspace --type=vpid --type=vtid 
>> >> > --type=procname`
>> >> >
>> >> > Each of them seems cached in TLS however, and most should never change
>> >> > after startup.
>> >> >
>> >> > As I am using Lttng over Xenomai, syscalls are strictly forbidden, I
>> >> > would like to have some function that prepares all data, which I can
>> >> > call on each thread before it switches to realtime work.
>> >> >
>> >> > Kinda similar to urcu_bp_register_thread, I'd like to have some
>> >> > `lttng_ust_warmup_thread` function that fetches the context values
>> >> > that can be cached. (urcu_bp_register_thread should be called there
>> >> > aswell)
>> >> > I considered just doing a tracepoint, but AFAIK the channel can be
>> >> > changed/configured after the process is running. So this is not robust
>> >> > enough.
>> >>
>> >> The new lttng_ust_init_thread() API in lttng-ust 2.13 would be the right
>> >> place to do this I think:
>> >>
>> >> /*
>> >>  * Initialize this thread's LTTng-UST data structures. There is
>> >>  * typically no need to call this, because LTTng-UST initializes its
>> >>  * per-thread data structures lazily, but it should be called explicitly
>> >>  * upon creation of each thread before signal handlers nesting over
>> >>  * those threads use LTTng-UST tracepoints.
>> >>  */
>> >>
>> >> It would make sense that this new initialization helper also initializes
>> >> all contexts which cache the result of a system call. Considering that
>> >> contexts can be used from the filter and capture bytecode interpreter, as
>> >> well as contexts added to channels, I think we'd need to simply initialize
>> >> them all.
>> >
>> > Yeah, just figured that it doesnt help at all if I do a tracepoint, as
>> > it might just be disabled ;)
>> > lttng_ust_init_thread() sounds right for that, maybe add one or 2 
>> > arguments for
>> > stuff you want initialized / dont want initialized over the default.
>> >
>> > I take that the downside of eager initialization is potentially wasted
>> > resources (now ignoring any one-time runtime cost).
>>
>> I would not want to introduce too much coupling between the application and
>> the tracer though. The public API I've added for the 2.13 release cycle takes
>> no argument, and I'm not considering changing that at this stage (we are already
>> at -rc2, so we're past the API freeze).
> 
> Ok, figured if that's preferred then nows the last chance

Actually I made an exception between rc1 and rc2 when I changed the probe
provider's API and ABI, but I don't expect any more breaking API/ABI changes
going forward.
We have customers now using rc2 as a stable baseline, and I need a really strong
argument to break ABI at this stage.

Adding features to lttng_ust_init_thread() does not fit in that category. This
should have happened before -rc1 if we wished to do that.

> 
>>
>> I'd be open to adding an extra API with a different purpose though. Currently
>> lttng_ust_init_thread is meant to initialize per-thread data structures for
>> tracing signal handlers.
>>
>> Your use-case is different: you aim at tracing from a context which cannot
>> issue system calls. Basically, any attempt to issue a system call from that
>> thread after this is a no-go. I would be tempted to introduce something like
>> "lttng_ust_{set,clear}_thread_no_syscall" or such, which would have the
>> following
>> effects when set:
> 
> No systemcalls and no clock_gettime().

Right, no clock_gettime because of the seqlock.

>>
>> * Force immediate initialization of all thread's cached context information,

Re: [lttng-dev] LTTng - Xenomai : different results between timestamp-lttng and rt_time_read()

2021-05-20 Thread Mathieu Desnoyers via lttng-dev
- On May 20, 2021, at 9:54 AM, lttng-dev lttng-dev@lists.lttng.org wrote:

> - On May 20, 2021, at 5:11 AM, lttng-dev lttng-dev@lists.lttng.org wrote:
> 
>> Am Do., 20. Mai 2021 um 10:28 Uhr schrieb MONTET Julien
>> :
>>>
>>> Hi Norbert,
>>>
>>> Thank you for your answer !
>>>
>>> Yes, I am using a Xenomai cobalt - xenomai is 3.1
>>> cat /proc/xenomai/version => 3.1
>>>
>>> After the installation, I tested "test tools" in /proc/xenomai/ and it 
>>> worked
>>> nice.
>> 
>> Just asked to make sure, thought the scripts usual add some -xeno tag
>> to the kernel version.
>> 
>>> What do you mean by "it might deadlock really good" ?
>> 
>> clock_gettime will either use a syscall (kills realtime always) or is
>> optimized via VDSO (which very likely is your case).
>> 
>> What happens is that the kernel will take a spinlock, then write new
>> values, then releases the spinlock.
>> your program will aswell spin (but just to see if the spinlock is
>> free), read the values and interpolates them.
>> 
>> But if your program interrupts the kernel while the kernel holds the
>> lock (all on the same cpu core), then it will spin forever and the
>> kernel will never execute.
> 
> Just one clarification: the specific locking strategy used by the
> Linux kernel monotonic clock vDSO is a "seqlock", where the kernel
> sets a bit which keeps concurrent readers looping until they observe

When I say "sets a bit", I actually mean "increment a sequence counter",
and readers observe either odd or even state, thus knowing whether
they need to retry, and whether the value read before/after reading
the data structure changed.

Thanks,

Mathieu

> a consistent value. With Xenomai it indeed appears to be prone to
> deadlock if a high priority Xenomai thread interrupts the kernel
> while the write seqlock is held, and then proceeds to loop forever
> on the read-side of the seqlock.
> 
> Note that for the in-kernel tracer clock read use-case, which
> needs to be able to happen from NMI context, I've contributed a
> modified version of the seqlock to the Linux kernel:
> 
> https://lwn.net/Articles/831540/ The seqcount latch lock type
> 
> It basically keeps two copies of the clock data structures, so the
> read-side never has to loop waiting for the updater: it simply gets
> redirected to the "stable" copy of the data.
> 
> The trade-off here is that with the latch lock used for clocks, a
> reader may observe time going slightly backwards between two clock
> reads when reading while specific clock rate adjustments are made
> by an updater. The clock user needs to be aware of this.
> 
> Thanks,
> 
> Mathieu
> 
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
> ___
> lttng-dev mailing list
> lttng-dev@lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [lttng-dev] LTTng - Xenomai : different results between timestamp-lttng and rt_time_read()

2021-05-20 Thread Mathieu Desnoyers via lttng-dev
- On May 20, 2021, at 5:11 AM, lttng-dev lttng-dev@lists.lttng.org wrote:

> Am Do., 20. Mai 2021 um 10:28 Uhr schrieb MONTET Julien
> :
>>
>> Hi Norbert,
>>
>> Thank you for your answer !
>>
>> Yes, I am using a Xenomai cobalt - xenomai is 3.1
>> cat /proc/xenomai/version => 3.1
>>
>> After the installation, I tested "test tools" in /proc/xenomai/ and it worked
>> nice.
> 
> Just asked to make sure, thought the scripts usual add some -xeno tag
> to the kernel version.
> 
>> What do you mean by "it might deadlock really good" ?
> 
> clock_gettime will either use a syscall (kills realtime always) or is
> optimized via VDSO (which very likely is your case).
> 
> What happens is that the kernel will take a spinlock, then write new
> values, then releases the spinlock.
> your program will aswell spin (but just to see if the spinlock is
> free), read the values and interpolates them.
> 
> But if your program interrupts the kernel while the kernel holds the
> lock (all on the same cpu core), then it will spin forever and the
> kernel will never execute.

Just one clarification: the specific locking strategy used by the
Linux kernel monotonic clock vDSO is a "seqlock", where the kernel
sets a bit which keeps concurrent readers looping until they observe
a consistent value. With Xenomai it indeed appears to be prone to
deadlock if a high priority Xenomai thread interrupts the kernel
while the write seqlock is held, and then proceeds to loop forever
on the read-side of the seqlock.
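As an illustration, the seqlock read/write pattern described above can be sketched as follows. This is a deliberately simplified model using C11 atomics, not the actual kernel vDSO code:

```c
/*
 * Simplified seqlock sketch (illustrative only, not the kernel vDSO code).
 * The writer makes the sequence odd while updating; readers loop until
 * they observe an even, unchanged sequence.
 */
#include <assert.h>
#include <stdatomic.h>

struct clock_data {
	unsigned long sec;
	unsigned long nsec;
};

static _Atomic unsigned int clk_seq;	/* even: stable, odd: update in progress */
static struct clock_data clk_data;

/* Writer side (the kernel timekeeper, in the real case). */
static void clock_update(unsigned long sec, unsigned long nsec)
{
	atomic_fetch_add_explicit(&clk_seq, 1, memory_order_acquire);	/* odd */
	clk_data.sec = sec;
	clk_data.nsec = nsec;
	atomic_fetch_add_explicit(&clk_seq, 1, memory_order_release);	/* even */
}

/*
 * Reader side: this retry loop is what spins forever if a higher-priority
 * thread preempts the writer while the sequence is odd.
 */
static struct clock_data clock_read(void)
{
	struct clock_data snapshot;
	unsigned int s1, s2;

	do {
		s1 = atomic_load_explicit(&clk_seq, memory_order_acquire);
		snapshot = clk_data;
		s2 = atomic_load_explicit(&clk_seq, memory_order_acquire);
	} while (s1 != s2 || (s1 & 1));
	return snapshot;
}
```

If the writer is preempted between its two increments by the very thread calling `clock_read()`, that reader never leaves its loop: this is the deadlock scenario described above.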

Note that for the in-kernel tracer clock read use-case, which
needs to be able to happen from NMI context, I've contributed a
modified version of the seqlock to the Linux kernel:

https://lwn.net/Articles/831540/ The seqcount latch lock type

It basically keeps two copies of the clock data structures, so the
read-side never has to loop waiting for the updater: it simply gets
redirected to the "stable" copy of the data.

The trade-off here is that with the latch lock used for clocks, a
reader may observe time going slightly backwards between two clock
reads when reading while specific clock rate adjustments are made
by an updater. The clock user needs to be aware of this.
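A minimal sketch of that latch idea (again simplified, and not the actual kernel implementation):

```c
/*
 * Simplified seqcount-latch sketch (illustrative only). Two copies of the
 * data; the writer flips the sequence and updates the copy readers are
 * currently steered away from, so readers never wait on a held lock.
 */
#include <assert.h>
#include <stdatomic.h>

struct clk {
	unsigned long sec;
	unsigned long nsec;
};

static struct clk latch_copies[2];
static _Atomic unsigned int latch_seq;

/* Single writer: update the inactive copy, flip, update the other copy. */
static void latch_update(unsigned long sec, unsigned long nsec)
{
	unsigned int s;

	s = atomic_fetch_add_explicit(&latch_seq, 1, memory_order_acq_rel) + 1;
	latch_copies[(s + 1) & 1].sec = sec;	/* copy readers do NOT use */
	latch_copies[(s + 1) & 1].nsec = nsec;
	s = atomic_fetch_add_explicit(&latch_seq, 1, memory_order_acq_rel) + 1;
	latch_copies[(s + 1) & 1].sec = sec;
	latch_copies[(s + 1) & 1].nsec = nsec;
}

/*
 * Reader: retries only if it raced with a flip; it never spins waiting
 * for an update in progress to finish.
 */
static struct clk latch_read(void)
{
	struct clk snapshot;
	unsigned int s;

	do {
		s = atomic_load_explicit(&latch_seq, memory_order_acquire);
		snapshot = latch_copies[s & 1];
	} while (s != atomic_load_explicit(&latch_seq, memory_order_acquire));
	return snapshot;
}
```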

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [lttng-dev] [PATCH] Avoid using the heap for small strings with tracef/tracelog

2021-05-20 Thread Mathieu Desnoyers via lttng-dev
- On May 20, 2021, at 8:20 AM, Norbert Lange nolang...@gmail.com wrote:

> This patch was intended for 2.12, no src/common/tracer.h there.

You should post feature patches against the master branch. Or if you just
want to send an RFC and you work on an older branch, please state that the
patch is an RFC and which branch it is against, but note that the work won't
be merged unless it's rebased against master.

> Also this patch is brocken, as a va_list cant be iterated twice.

Right. But you could easily do:

va_start() [ first use ] va_end()
then
va_start() [ second use ] va_end(), right ?
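In other words, something along these lines (illustrative sketch only; `format_twice` is a made-up name, not lttng-ust code):

```c
/*
 * Sketch of the two-pass approach: one va_start()/va_end() pass to size
 * the output, a second one to actually format it. Illustrative only.
 */
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static int format_twice(char *out, size_t outsize, const char *fmt, ...)
{
	va_list ap;
	int len;

	va_start(ap, fmt);		/* first use: compute the length */
	len = vsnprintf(NULL, 0, fmt, ap);
	va_end(ap);
	if (len < 0 || (size_t) len >= outsize)
		return -1;
	va_start(ap, fmt);		/* second use: format for real */
	(void) vsnprintf(out, outsize, fmt, ap);
	va_end(ap);
	return len;
}
```

Re-invoking va_start() after va_end() in the same variadic function is well-defined C, which sidesteps the "cannot iterate a va_list twice" problem.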

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [lttng-dev] reading context fields causes syscalls

2021-05-20 Thread Mathieu Desnoyers via lttng-dev
- On May 20, 2021, at 8:46 AM, Norbert Lange nolang...@gmail.com wrote:

> Am Mi., 19. Mai 2021 um 20:52 Uhr schrieb Mathieu Desnoyers
> :
>>
>> - On May 19, 2021, at 8:11 AM, lttng-dev lttng-dev@lists.lttng.org wrote:
>>
>> > Hello,
>> >
>> > Several context fields will cause a syscall atleast the first time a
>> > tracepoint is
>> > recorded. For example all of the following:
>> >
>> > `lttng add-context -c chan --userspace --type=vpid --type=vtid 
>> > --type=procname`
>> >
>> > Each of them seems cached in TLS however, and most should never change
>> > after startup.
>> >
>> > As I am using Lttng over Xenomai, syscalls are strictly forbidden, I
>> > would like to have some function that prepares all data, which I can
>> > call on each thread before it switches to realtime work.
>> >
>> > Kinda similar to urcu_bp_register_thread, I'd like to have some
>> > `lttng_ust_warmup_thread` function that fetches the context values
>> > that can be cached. (urcu_bp_register_thread should be called there
>> > aswell)
>> > I considered just doing a tracepoint, but AFAIK the channel can be
>> > changed/configured after the process is running. So this is not robust
>> > enough.
>>
>> The new lttng_ust_init_thread() API in lttng-ust 2.13 would be the right
>> place to do this I think:
>>
>> /*
>>  * Initialize this thread's LTTng-UST data structures. There is
>>  * typically no need to call this, because LTTng-UST initializes its
>>  * per-thread data structures lazily, but it should be called explicitly
>>  * upon creation of each thread before signal handlers nesting over
>>  * those threads use LTTng-UST tracepoints.
>>  */
>>
>> It would make sense that this new initialization helper also initializes
>> all contexts which cache the result of a system call. Considering that
>> contexts can be used from the filter and capture bytecode interpreter, as
>> well as contexts added to channels, I think we'd need to simply initialize
>> them all.
> 
> Yeah, just figured that it doesnt help at all if I do a tracepoint, as
> it might just be disabled ;)
> lttng_ust_init_thread() sounds right for that, maybe add one or 2 arguments for
> stuff you want initialized / dont want initialized over the default.
> 
> I take that the downside of eager initialization is potentially wasted
> resources (now ignoring any one-time runtime cost).

I would not want to introduce too much coupling between the application and
the tracer though. The public API I've added for the 2.13 release cycle takes
no argument, and I'm not considering changing that at this stage (we are already
at -rc2, so we're past the API freeze).

I'd be open to adding an extra API with a different purpose though. Currently
lttng_ust_init_thread is meant to initialize per-thread data structures for
tracing signal handlers.

Your use-case is different: you aim at tracing from a context which cannot
issue system calls. Basically, any attempt to issue a system call from that
thread after this is a no-go. I would be tempted to introduce something like
"lttng_ust_{set,clear}_thread_no_syscall" or such, which would have the 
following
effects when set:

* Force immediate initialization of all thread's cached context information,
* Set a TLS variable flag indicating that the tracer should not do any system
  call whatsoever. The tracer could either use dummy data (zeroes), log an
  error, or abort() the process if a thread in no_syscall mode attempts to
  issue a system call. This could be dynamically selected by a new
  environment variable.
* Prevent threads in no_syscall mode from calling the write() system call on
  sub-buffer switch (of course the read-timer channel option is preferred).
Thoughts ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [lttng-dev] [PATCH] Avoid using the heap for small strings with tracef/tracelog

2021-05-19 Thread Mathieu Desnoyers via lttng-dev
- On May 19, 2021, at 12:56 PM, lttng-dev lttng-dev@lists.lttng.org wrote:

> Try to use vsnprintf with a 512 Byte buffer on the Stack,
> if that fails allocate a larger one.

I agree with the approach.

Can we move the hardcoded "512" to a #define within src/common/tracer.h ?

Thanks,

Mathieu

> 
> Signed-off-by: Norbert Lange 
> ---
> liblttng-ust/tracef.c   | 13 ++---
> liblttng-ust/tracelog.c | 12 +---
> 2 files changed, 19 insertions(+), 6 deletions(-)
> 
> diff --git a/liblttng-ust/tracef.c b/liblttng-ust/tracef.c
> index ea98e43e..18c29e25 100644
> --- a/liblttng-ust/tracef.c
> +++ b/liblttng-ust/tracef.c
> @@ -32,17 +32,24 @@
> void _lttng_ust_tracef(const char *fmt, ...)
> {
>   va_list ap;
> - char *msg;
> + char local_buf[512];
> + char *msg = local_buf;
>   int len;
> 
>   va_start(ap, fmt);
> - len = vasprintf(&msg, fmt, ap);
> + len = vsnprintf(local_buf, sizeof(local_buf), fmt, ap);
> + if (len >= sizeof(local_buf)) {
> + msg = (char *)malloc(len + 1);
> + len = msg ? vsnprintf(msg, len + 1, fmt, ap) : -1;
> + }
> +
>   /* len does not include the final \0 */
>   if (len < 0)
>   goto end;
>   __tracepoint_cb_lttng_ust_tracef___event(msg, len,
>   LTTNG_UST_CALLER_IP());
> - free(msg);
> end:
> + if (msg != local_buf)
> + free(msg);
>   va_end(ap);
> }
> diff --git a/liblttng-ust/tracelog.c b/liblttng-ust/tracelog.c
> index 65fc87ed..a5d110fa 100644
> --- a/liblttng-ust/tracelog.c
> +++ b/liblttng-ust/tracelog.c
> @@ -35,19 +35,25 @@
>   const char *fmt, ...) \
>   { \
>   va_list ap; \
> - char *msg; \
> + char local_buf[512]; \
> + char *msg = local_buf; \
>   int len; \
>   \
>   va_start(ap, fmt); \
> - len = vasprintf(&msg, fmt, ap); \
> + len = vsnprintf(local_buf, sizeof(local_buf), fmt, ap); \
> + if (len >= sizeof(local_buf)) { \
> + msg = (char *)malloc(len + 1); \
> + len = msg ? vsnprintf(msg, len + 1, fmt, ap) : -1; \
> + } \
>   /* len does not include the final \0 */ \
>   if (len < 0) \
>   goto end; \
>   __tracepoint_cb_lttng_ust_tracelog___##level(file, \
>   line, func, msg, len, \
>   LTTNG_UST_CALLER_IP()); \
> - free(msg); \
>   end: \
> + if (msg != local_buf) \
> + free(msg); \
>   va_end(ap); \
>   }
> 
> --
> 2.30.2
> 
> ___
> lttng-dev mailing list
> lttng-dev@lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [lttng-dev] reading context fields causes syscalls

2021-05-19 Thread Mathieu Desnoyers via lttng-dev
- On May 19, 2021, at 8:11 AM, lttng-dev lttng-dev@lists.lttng.org wrote:

> Hello,
> 
> Several context fields will cause a syscall atleast the first time a
> tracepoint is
> recorded. For example all of the following:
> 
> `lttng add-context -c chan --userspace --type=vpid --type=vtid 
> --type=procname`
> 
> Each of them seems cached in TLS however, and most should never change
> after startup.
> 
> As I am using Lttng over Xenomai, syscalls are strictly forbidden, I
> would like to have some function that prepares all data, which I can
> call on each thread before it switches to realtime work.
> 
> Kinda similar to urcu_bp_register_thread, I'd like to have some
> `lttng_ust_warmup_thread` function that fetches the context values
> that can be cached. (urcu_bp_register_thread should be called there
> aswell)
> I considered just doing a tracepoint, but AFAIK the channel can be
> changed/configured after the process is running. So this is not robust
> enough.

The new lttng_ust_init_thread() API in lttng-ust 2.13 would be the right
place to do this I think:

/*
 * Initialize this thread's LTTng-UST data structures. There is
 * typically no need to call this, because LTTng-UST initializes its
 * per-thread data structures lazily, but it should be called explicitly
 * upon creation of each thread before signal handlers nesting over
 * those threads use LTTng-UST tracepoints.
 */

It would make sense that this new initialization helper also initializes
all contexts which cache the result of a system call. Considering that
contexts can be used from the filter and capture bytecode interpreter, as
well as contexts added to channels, I think we'd need to simply initialize
them all.

Thanks,

Mathieu

> 
> regards, Norbert

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [lttng-dev] [RELEASE] LTTng-modules 2.11.9 and 2.12.6 (Linux kernel tracer)

2021-05-14 Thread Mathieu Desnoyers via lttng-dev
- On May 14, 2021, at 3:52 PM, Mathieu Desnoyers 
mathieu.desnoy...@efficios.com wrote:

> Hi,
> 
> The 2.11.9 and 2.12.6 releases of the LTTng kernel tracer are the latest
> bug fix releases of the 2.11 and 2.12 stable branches of the LTTng modules
> project.
> 
> There are a few minor bug fixes, but those are the noteworthy changes:
> 
> - Support for 5.12 Linux kernels,
> - Support recent stable kernel branches 4.14, 4.19, 5.4,
> - Support for newer Ubuntu 4.15, 5.4 and RHEL 8.2/8.3 kernels,
> - Fix: increment buffer offset when failing to copy from user-space:
> 
>Upon failure to copy from user-space due to failing access ok check, the
>ring buffer offset is not incremented, which could generate unreadable
>traces because we don't account for the padding we write into the ring
>buffer.
>
>Note that this typically won't affect a common use-case of copying
>strings from user-space, because unless mprotect is invoked within a
>narrow race window (between user strlen and user strcpy), the strlen
>will fail on access ok when calculating the space to reserve, which will
>match what happens on strcpy.

There is one additional noteworthy set of changes in lttng-modules 2.12.6:

* Disable block rwbs bitwise enum in default build
* Disable sched_switch bitwise enum in default build
* Add experimental bitwise enum config option

The block rwbs bitwise enum and sched switch bitwise enum were introduced late
in the 2.12 rc cycle, and ended up producing traces which were not strictly
conformant with the CTF 1.8 specifications: they were producing enumerations
with values associated with no known labels.

This causes Babeltrace 1 and 2, with default options, to generate warnings when
viewing kernel traces.

So those commits introduce a new build-time option which can optionally enable
those bitwise enumerations, e.g.:

 make CONFIG_LTTNG_EXPERIMENTAL_BITWISE_ENUM=y

So until we finalize the CTF2 specification and its implementation in the
tracers and trace viewers, this provides a way for experimental users of
LTTng-modules to keep generating those bitwise enumerations without producing
warnings in the default use-case.

Without this option, a simple integer field is emitted in the trace rather
than a bitwise enumeration.

Thanks,

Mathieu


> 
> Thanks,
> 
> Mathieu
> 
> Project website: https://lttng.org
> Documentation: https://lttng.org/docs
> Download link: https://lttng.org/download
> 
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


[lttng-dev] [RELEASE] LTTng-modules 2.11.9 and 2.12.6 (Linux kernel tracer)

2021-05-14 Thread Mathieu Desnoyers via lttng-dev
Hi,

The 2.11.9 and 2.12.6 releases of the LTTng kernel tracer are the latest
bug fix releases of the 2.11 and 2.12 stable branches of the LTTng modules
project.

There are a few minor bug fixes, but those are the noteworthy changes:

- Support for 5.12 Linux kernels,
- Support recent stable kernel branches 4.14, 4.19, 5.4,
- Support for newer Ubuntu 4.15, 5.4 and RHEL 8.2/8.3 kernels,
- Fix: increment buffer offset when failing to copy from user-space:

Upon failure to copy from user-space due to failing access ok check, the
ring buffer offset is not incremented, which could generate unreadable
traces because we don't account for the padding we write into the ring
buffer.

Note that this typically won't affect a common use-case of copying
strings from user-space, because unless mprotect is invoked within a
narrow race window (between user strlen and user strcpy), the strlen
will fail on access ok when calculating the space to reserve, which will
match what happens on strcpy.

Thanks,

Mathieu

Project website: https://lttng.org
Documentation: https://lttng.org/docs
Download link: https://lttng.org/download

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


[lttng-dev] [RELEASE] LTTng-UST 2.11.4 and 2.12.2

2021-05-14 Thread Mathieu Desnoyers via lttng-dev
Hi,

LTTng-UST 2.11.4 and 2.12.2 are bug fix releases of the LTTng-UST user-space
tracer for the 2.11 and 2.12 stable branches.

There are a few fixes, as well as an in-depth rework of the benchmark code
which was sitting in the tests of lttng-ust. This test code was contributed
more than 10 years ago, before I took maintainership of the LTTng-UST
project, and it was in a broken state (it always returned 0 seconds on my
test machine because it was truncating significant digits). So I reworked
the benchmark in the 2.11 and 2.12 stable branches to facilitate comparison
of benchmarks between branches, especially given the major rework done in
the upcoming 2.13 version.

Thanks,

Mathieu

Project website: https://lttng.org
Documentation: https://lttng.org/docs
Download link: https://lttng.org/download

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [lttng-dev] User-space RCU: call rcu_barrier() before dissociating helper thread?

2021-05-05 Thread Mathieu Desnoyers via lttng-dev
- On May 5, 2021, at 3:54 AM, Martin Wilck mwi...@suse.com wrote:

> On Fri, 2021-04-30 at 14:41 -0400, Mathieu Desnoyers wrote:
>> - On Apr 29, 2021, at 9:49 AM, lttng-dev
>> lttng-dev@lists.lttng.org wrote:
>> 
>> > In multipath-tools, we are using a custom RCU helper thread, which
>> > is cleaned
>> > out
>> > on exit:
>> > 
>> > https://github.com/opensvc/multipath-tools/blob/23a01fa679481ff1144139222fbd2c4c863b78f8/multipathd/main.c#L3058
>> > 
>> > I put a call to rcu_barrier() there in order to make sure all
>> > callbacks had
>> > finished
>> > before detaching the helper thread.
>> > 
>> > Now we got a report that rcu_barrier() isn't available before user-
>> > space RCU 0.8
>> > (https://github.com/opensvc/multipath-tools/issues/5) (and RHEL7 /
>> > Centos7
>> > still has 0.7.16).
>> > 
>> > Question: was it over-cautious or otherwise wrong to call
>> > rcu_barrier() before
>> > set_thread_call_rcu_data(NULL)? Can we maybe just skip this call?
>> > If no, what
>> > would be the recommended way for liburcu < 0.8 to dissociate a
>> > helper thread?
>> > 
>> > (Note: I'm not currently subscribed to lttng-dev).
>> 
>> First of all, there is a significant reason why liburcu does not free
>> the "default"
>> call_rcu worker thread data structures at process exit. This is
>> caused by the fact that
>> a call_rcu callback may very well invoke call_rcu() to re-enqueue
>> more work.
>> 
>> AFAIU this is somewhat similar to what happens to the Linux kernel
>> RCU implementation
>> when the machine needs to be shutdown or rebooted: there may indeed
>> never be any point
>> in time where it is safe to free the call_rcu worker thread data
>> structures without leaks,
>> due to the fact that a call_rcu callback may re-enqueue further work
>> indefinitely.
>> 
>> So my understanding is that you implement your own call rcu worker
>> thread because the
>> one provided by liburcu leaks data structure on process exit, and you
>> expect that
>> call rcu_barrier once will suffice to ensure quiescence of the call
>> rcu worker thread
>> data structures. Unfortunately, this does not cover the scenario
>> where a call_rcu
>> callback re-enqueues additional work.
> 
> I understand. In multipath-tools, we only have one callback, which
> doesn't re-enqueue any work. Our callback really just calls free() on a
> data structure. And it's unlikely that we'll get more RCU callbacks any
> time soon.
> 
> So, to clarify my question: Does it make sense to call rcu_barrier()
> before set_thread_call_rcu_data(NULL) in this case?

Yes, it would ensure that all pending callbacks are executed prior to
removing the worker thread. And considering that you don't have chained
callbacks, it makes sense to invoke rcu_barrier() only once.

> If yes, is there an
> alternative for safely detaching the custom RCU thread if rcu_barrier()
> is unavailable?

I suspect you could re-implement something similar to rcu_barrier() within
your application through call_rcu and a rendez-vous synchronization. It
all depends on how much complexity you want to add to your application
for the sake of not leaking data structures when using old versions of
liburcu.

> 
>> So without knowing more details on the reasons why you wish to clean
>> up memory at
>> process exit, and why it would be valid to do so in your particular
>> use-case, it's
>> rather difficult for me to elaborate a complete answer.
> 
> multipathd is a long-running process, so being wary of memory leaks is
> important. valgrind tests pop up an ugly warning about liburcu - it's
> obviously not a big issue, as it occurs only on exit, but it makes a
> negative impression on users running memory leak tests. It's possible
> to work around that by using valgrind "suppressions", but so far my
> policy was to use these only as last resort measure, in case we
> couldn't find any way to work around it in our code. That's why I came
> up with the "custom RCU thread" approach.
> 
> Anyway, from what you're saying, it might be be better to simply accept
> the fact that this pseudo-memory-leak exists than trying to fix it in
> an unsafe way with older liburcu versions.

If we push this line of thinking to the extreme, we should look into what
improvements should be made to liburcu upstream so we can fix this situation in
the future, and then you can decide how you want to handle legacy liburcu
on your side.

>> I can see that maybe we could change l

Re: [lttng-dev] User-space RCU: call rcu_barrier() before dissociating helper thread?

2021-04-30 Thread Mathieu Desnoyers via lttng-dev
- On Apr 29, 2021, at 9:49 AM, lttng-dev lttng-dev@lists.lttng.org wrote:

> In multipath-tools, we are using a custom RCU helper thread, which is cleaned
> out
> on exit:
> 
> https://github.com/opensvc/multipath-tools/blob/23a01fa679481ff1144139222fbd2c4c863b78f8/multipathd/main.c#L3058
> 
> I put a call to rcu_barrier() there in order to make sure all callbacks had
> finished
> before detaching the helper thread.
> 
> Now we got a report that rcu_barrier() isn't available before user-space RCU 
> 0.8
> (https://github.com/opensvc/multipath-tools/issues/5) (and RHEL7 / Centos7
> still has 0.7.16).
> 
> Question: was it over-cautious or otherwise wrong to call rcu_barrier() before
> set_thread_call_rcu_data(NULL)? Can we maybe just skip this call? If no, what
> would be the recommended way for liburcu < 0.8 to dissociate a helper thread?
> 
> (Note: I'm not currently subscribed to lttng-dev).

First of all, there is a significant reason why liburcu does not free the
"default" call_rcu worker thread data structures at process exit. This is
caused by the fact that a call_rcu callback may very well invoke call_rcu()
to re-enqueue more work.

AFAIU this is somewhat similar to what happens to the Linux kernel RCU
implementation when the machine needs to be shut down or rebooted: there may
indeed never be any point in time where it is safe to free the call_rcu
worker thread data structures without leaks, due to the fact that a call_rcu
callback may re-enqueue further work indefinitely.

So my understanding is that you implement your own call rcu worker thread
because the one provided by liburcu leaks data structures on process exit,
and you expect that calling rcu_barrier() once will suffice to ensure
quiescence of the call rcu worker thread data structures. Unfortunately,
this does not cover the scenario where a call_rcu callback re-enqueues
additional work.

So without knowing more details on the reasons why you wish to clean up
memory at process exit, and why it would be valid to do so in your
particular use-case, it's rather difficult for me to elaborate a complete
answer.

I can see that maybe we could change liburcu to make it so that we free all
call_rcu data structures _if_ they happen to be empty of callbacks at
process exit, after invoking one rcu_barrier(). That should take care of not
leaking data structures in the common case where call_rcu does not enqueue
further callbacks.

Thoughts ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


[lttng-dev] [RELEASE] LTTng 2.13.0-rc1 - Nordicité - Linux kernel and user-space tracer

2021-04-23 Thread Mathieu Desnoyers via lttng-dev
session daemon to take a snapshot of the
session with the given name. See lttng-snapshot(1) for more information
about the session snapshot concept. If no session with the given name exists
at the time the condition is met, nothing is done.

These new actions can also be combined together. For instance, the following
trigger will stop `my_session`, record a snapshot of `my_session`, and notify
any listening application when '/etc/passwd' is opened:

$ lttng add-trigger \
--condition event-rule-matches \
  --domain kernel \
  --type syscall \
  --name "openat" \
  --filter 'filename == "/etc/passwd"' \
--action stop-session my_session \
--action snapshot-session my_session \
--action notify

For more information, see the following manual pages:
  - lttng-add-trigger(1),
  - lttng-remove-trigger(1),
  - lttng-list-triggers(1).


Notification payload capture
---

The new event-rule matches condition type also allows 'captures'.
This allows event record and context fields to be captured when an event-rule
matches condition is satisfied.

The captured field values are made available in the evaluation object of the
notifications transmitted to listening applications.

Capture descriptors can be specified using a syntax reminiscent of the one used
by the filter expressions.

The following example will capture a process's name and the 'filename' argument
of all `openat()` system calls:

$ lttng add-trigger \
--condition event-rule-matches \
  --domain kernel \
  --type syscall \
  --name "openat" \
  --capture "filename" \
  --capture "$ctx.procname" \
--action notify

See the lttng-add-trigger(1) manual page for more information.


vtracef and vtracelog (LTTng-UST)
---

New versions of the `tracef()` and `tracelog()` tracing helpers accepting
variable argument lists are introduced as `vtracef()` and `vtracelog()`.

See the tracef(3) and tracelog(3) manual pages for more information.


Add time namespace context (LTTng-UST and LTTng-modules)
---

It is now possible to add the time namespace of a process as a context to
channels (`time_ns`) using the `add-context` command.

See the time_namespaces(7) manual page for more information.


Links
---

Project website: https://lttng.org

Download links:
https://lttng.org/files/lttng-tools/lttng-tools-2.13.0-rc1.tar.bz2
https://lttng.org/files/lttng-ust/lttng-ust-2.13.0-rc1.tar.bz2
https://lttng.org/files/lttng-modules/lttng-modules-2.13.0-rc1.tar.bz2

GPG signatures:
https://lttng.org/files/lttng-tools/lttng-tools-2.13.0-rc1.tar.bz2.asc
https://lttng.org/files/lttng-ust/lttng-ust-2.13.0-rc1.tar.bz2.asc
https://lttng.org/files/lttng-modules/lttng-modules-2.13.0-rc1.tar.bz2.asc

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH][RFC] tracing: Enable tracepoints via module parameters

2021-04-20 Thread Mathieu Desnoyers
- On Apr 20, 2021, at 10:55 AM, rostedt rost...@goodmis.org wrote:

> On Tue, 20 Apr 2021 09:29:27 -0400 (EDT)
> Mathieu Desnoyers  wrote:
> 
>> - On Apr 20, 2021, at 8:55 AM, rostedt rost...@goodmis.org wrote:
>> [...]
>> > 
>> > Would adding automatic module parameters be an issue? That is, you can add
>> > in the insmod command line a parameter that will enable tracepoints. We
>> > could have a way to even see them from the modinfo. I think I had that
>> > working once, and it wasn't really that hard to do.
>> 
>> There is one thing we should consider here in terms of namespacing: those 
>> module
>> command line parameters should be specific to each tracer (e.g. ftrace, perf,
>> ebpf).
>> 
>> LTTng for instance already tackles early module load tracing in a different
>> way: users can enable instrumentation of yet-to-be loaded kernel modules. So
>> it would not make sense in that scheme to have module load parameters.
>> 
>> It's a different trade-off in terms of error reporting though: for instance,
>> LTTng won't report an error if a user does a typo when entering an event 
>> name.
>> 
>> So I think those command line parameters should be tracer-specific, do you 
>> agree
>> ?
> 
> 
> No, I do not agree. I would like to make it consistent with the kernel
> command line. As you can put in: "trace_event=sched_switch" and the
> sched_switch trace point will be enable (for the tracefs directory) on boot
> up. The same should be for modules as well.
> 
> It shouldn't affect LTTng, as you already have a way to enable them as they
> get loaded.

That sounds fine. So that would be within the "trace_event" (and not tracepoint)
namespace for module load parameters as well ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH][RFC] tracing: Enable tracepoints via module parameters

2021-04-20 Thread Mathieu Desnoyers
- On Apr 20, 2021, at 8:55 AM, rostedt rost...@goodmis.org wrote:
[...]
> 
> Would adding automatic module parameters be an issue? That is, you can add
> in the insmod command line a parameter that will enable tracepoints. We
> could have a way to even see them from the modinfo. I think I had that
> working once, and it wasn't really that hard to do.

There is one thing we should consider here in terms of namespacing: those
module command line parameters should be specific to each tracer (e.g.
ftrace, perf, ebpf).

LTTng for instance already tackles early module load tracing in a different
way: users can enable instrumentation of yet-to-be loaded kernel modules. So
it would not make sense in that scheme to have module load parameters.

It's a different trade-off in terms of error reporting though: for instance,
LTTng won't report an error if a user does a typo when entering an event name.

So I think those command line parameters should be tracer-specific, do you 
agree ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [lttng-dev] liburcu: LTO breaking rcu_dereference on arm64 and possibly other architectures ?

2021-04-19 Thread Mathieu Desnoyers via lttng-dev
- On Apr 19, 2021, at 11:41 AM, Duncan Sands baldr...@free.fr wrote:

> Hi Mathieu,
> 
> On 4/19/21 5:31 PM, Mathieu Desnoyers wrote:
>> - On Apr 19, 2021, at 5:41 AM, Duncan Sands baldr...@free.fr wrote:
>> 
>> 
>>>
>>>> Quick question: should we use __atomic_load() or atomic_load_explicit() 
>>>> (C) and
>>>> (std::atomic<__typeof__(x)>)(x)).load() (C++) ?
>>>
>>> If both are available, is there any advantage to using the C++ version when
>>> compiling C++?  As opposed to using the C11 one for both C and C++?
>> 
>> I recently noticed that using C11/C++11 atomic load explicit is not a good
>> fit for rcu_dereference, because we want the type to be a pointer, not an
>> _Atomic type. gcc appears to accept a looser typing, but clang has issues
>> trying to build that code.
> 
> in the long run maybe the original variables should be declared with the
> appropriate atomic type from the get-go.

Considering that rcu_dereference is public API, we would have to wait until we
do a major soname ABI bump _and_ an API break to do that, which I am very
reluctant to do, especially for the API break part.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [lttng-dev] liburcu: LTO breaking rcu_dereference on arm64 and possibly other architectures ?

2021-04-19 Thread Mathieu Desnoyers via lttng-dev
- On Apr 19, 2021, at 5:41 AM, Duncan Sands baldr...@free.fr wrote:


> 
>> Quick question: should we use __atomic_load() or atomic_load_explicit() (C) 
>> and
>> (std::atomic<__typeof__(x)>)(x)).load() (C++) ?
> 
> If both are available, is there any advantage to using the C++ version when
> compiling C++?  As opposed to using the C11 one for both C and C++?

I recently noticed that using C11/C++11 atomic load explicit is not a good
fit for rcu_dereference, because we want the type to be a pointer, not an
_Atomic type. gcc appears to accept a looser typing, but clang has issues
trying to build that code.

So I plan to use __atomic_load(p, v, __ATOMIC_CONSUME) instead in both C and C++.

Also, I'll drop the cmm_smp_read_barrier_depends() when using __ATOMIC_CONSUME,
because AFAIU their memory ordering semantics are redundant for rcu_dereference.

Here is the resulting commit for review on gerrit:

https://review.lttng.org/c/userspace-rcu/+/5455 Fix: use __atomic_load() rather 
than atomic load explicit [NEW]

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [lttng-dev] liburcu: LTO breaking rcu_dereference on arm64 and possibly other architectures ?

2021-04-16 Thread Mathieu Desnoyers via lttng-dev
- On Apr 16, 2021, at 11:22 AM, lttng-dev lttng-dev@lists.lttng.org wrote:

> Hi Mathieu,
> 

Hi Duncan,

> On 4/16/21 4:52 PM, Mathieu Desnoyers via lttng-dev wrote:
>> Hi Paul, Will, Peter,
>> 
>> I noticed in this discussion https://lkml.org/lkml/2021/4/16/118 that LTO
>> is able to break rcu_dereference. This seems to be taken care of by
>> arch/arm64/include/asm/rwonce.h on arm64 in the Linux kernel tree.
>> 
>> In the liburcu user-space library, we have this comment near 
>> rcu_dereference()
>> in
>> include/urcu/static/pointer.h:
>> 
>>   * The compiler memory barrier in CMM_LOAD_SHARED() ensures that
>>   value-speculative
>>   * optimizations (e.g. VSS: Value Speculation Scheduling) does not perform 
>> the
>>   * data read before the pointer read by speculating the value of the 
>> pointer.
>>   * Correct ordering is ensured because the pointer is read as a volatile 
>> access.
>>   * This acts as a global side-effect operation, which forbids reordering of
>>   * dependent memory operations. Note that such concern about 
>> dependency-breaking
>>   * optimizations will eventually be taken care of by the 
>> "memory_order_consume"
>>   * addition to forthcoming C++ standard.
>> 
>> (note: CMM_LOAD_SHARED() is the equivalent of READ_ONCE(), but was 
>> introduced in
>> liburcu as a public API before READ_ONCE() existed in the Linux kernel)
> 
> this is not directly on topic, but what do you think of porting userspace RCU 
> to
> use the C++ memory model and GCC/LLVM atomic builtins (__atomic_store etc)
> rather than rolling your own?  Tools like thread sanitizer would then 
> understand
> what userspace RCU is doing.  Not to mention the compiler.  More developers
> would understand it too!

Yes, that sounds like a clear win.

> From a code organization viewpoint, going down this path would presumably mean
> directly using GCC/LLVM atomic support when available, and falling back on
> something like the current uatomic to emulate them for older compilers.

Yes, I think this approach would be good. One caveat though: the GCC atomic
operations were known to be broken with some older compilers for specific
architectures, so we may have to keep track of a list of known buggy
compilers to use our own implementation instead in those situations. It's
been a while since I've looked at this though, so we may not even be
supporting those old compilers in liburcu anymore.

> 
> Some parts of uatomic have pretty clear equivalents (see below), but not all, 
> so
> the conversion could be quite tricky.

We'd have to see on a case by case basis, but it cannot hurt to start the effort
by integrating the easy ones.

> 
>> Peter tells me the "memory_order_consume" is not something which can be used
>> today.
> 
> This is a pity, because it seems to have been invented with rcu_dereference in
> mind.

Actually, (see other leg of this email thread) memory_order_consume works for
rcu_dereference, but it appears to be implemented as a slightly heavier than
required memory_order_acquire on weakly-ordered architectures. So we're just
moving the issue into compiler-land. Oh well.

> 
>> Any information on its status at C/C++ standard levels and 
>> implementation-wise ?
>> 
>> Pragmatically speaking, what should we change in liburcu to ensure we don't
>> generate
>> broken code when LTO is enabled ? I suspect there are a few options here:
>> 
>> 1) Fail to build if LTO is enabled,
>> 2) Generate slower code for rcu_dereference, either on all architectures or 
>> only
>> on weakly-ordered architectures,
>> 3) Generate different code depending on whether LTO is enabled or not. AFAIU
>> this would only
>> work if every compile unit is aware that it will end up being optimized 
>> with
>> LTO. Not sure
>> how this could be done in the context of user-space.
>> 4) [ Insert better idea here. ]
>> 
>> Thoughts ?
> 
> Best wishes, Duncan.
> 
> PS: We are experimentally running with the following patch, as it already 
> makes
> thread sanitizer a lot happier:

Quick question: should we use __atomic_load() or atomic_load_explicit() (C) and
(std::atomic<__typeof__(x)>)(x)).load() (C++) ?

We'd have to make this dependent on C11/C++11 though, and keep volatile for
older compilers.

Last thing: I have limited time to work on this, so if you have well-tested
patches you wish to submit, I'll do my best to review them!

Thanks,

Mathieu

> 
> --- a/External/UserspaceRCU/userspace-rcu/include/urcu/system.h
> 
> +++ b/External/UserspaceRCU/userspace-rcu/include/urcu/system.h
> 
>

Re: [lttng-dev] liburcu: LTO breaking rcu_dereference on arm64 and possibly other architectures ?

2021-04-16 Thread Mathieu Desnoyers via lttng-dev
- On Apr 16, 2021, at 3:02 PM, paulmck paul...@kernel.org wrote:
[...]
> 
> If it can be done reasonably, I suggest also having some way for the
> person building userspace RCU to say "I know what I am doing, so do
> it with volatile rather than memory_order_consume."

Like so ?

#define CMM_ACCESS_ONCE(x) (*(__volatile__  __typeof__(x) *)&(x))
#define CMM_LOAD_SHARED(p) CMM_ACCESS_ONCE(p)

/*
 * By defining URCU_DEREFERENCE_USE_VOLATILE, the user requires use of
 * volatile access to implement rcu_dereference rather than
 * memory_order_consume load from the C11/C++11 standards.
 *
 * This may improve performance on weakly-ordered architectures where
 * the compiler implements memory_order_consume as a
 * memory_order_acquire, which is stricter than required by the
 * standard.
 *
 * Note that using volatile accesses for rcu_dereference may cause
 * LTO to generate incorrectly ordered code starting from C11/C++11.
 */

#ifdef URCU_DEREFERENCE_USE_VOLATILE
# define rcu_dereference(x) CMM_LOAD_SHARED(x)
#else
# if defined (__cplusplus)
#  if __cplusplus >= 201103L
#   include 
#   define rcu_dereference(x)   ((std::atomic<__typeof__(x)>)(x)).load(std::memory_order_consume)
#  else
#   define rcu_dereference(x)   CMM_LOAD_SHARED(x)
#  endif
# else
#  if (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L)
#   include 
#   define rcu_dereference(x)   atomic_load_explicit(&(x), memory_order_consume)
#  else
#   define rcu_dereference(x)   CMM_LOAD_SHARED(x)
#  endif
# endif
#endif

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com



Re: liburcu: LTO breaking rcu_dereference on arm64 and possibly other architectures ?

2021-04-16 Thread Mathieu Desnoyers
- On Apr 16, 2021, at 12:01 PM, paulmck paul...@kernel.org wrote:

> On Fri, Apr 16, 2021 at 05:17:11PM +0200, Peter Zijlstra wrote:
>> On Fri, Apr 16, 2021 at 10:52:16AM -0400, Mathieu Desnoyers wrote:
>> > Hi Paul, Will, Peter,
>> > 
>> > I noticed in this discussion https://lkml.org/lkml/2021/4/16/118 that LTO
>> > is able to break rcu_dereference. This seems to be taken care of by
>> > arch/arm64/include/asm/rwonce.h on arm64 in the Linux kernel tree.
>> > 
>> > In the liburcu user-space library, we have this comment near 
>> > rcu_dereference()
>> > in
>> > include/urcu/static/pointer.h:
>> > 
>> >  * The compiler memory barrier in CMM_LOAD_SHARED() ensures that
>> >  value-speculative
>> >  * optimizations (e.g. VSS: Value Speculation Scheduling) does not perform 
>> > the
>> >  * data read before the pointer read by speculating the value of the 
>> > pointer.
>> >  * Correct ordering is ensured because the pointer is read as a volatile 
>> > access.
>> >  * This acts as a global side-effect operation, which forbids reordering of
>> >  * dependent memory operations. Note that such concern about 
>> > dependency-breaking
>> >  * optimizations will eventually be taken care of by the 
>> > "memory_order_consume"
>> >  * addition to forthcoming C++ standard.
>> > 
>> > (note: CMM_LOAD_SHARED() is the equivalent of READ_ONCE(), but was 
>> > introduced in
>> > liburcu as a public API before READ_ONCE() existed in the Linux kernel)
>> > 
>> > Peter tells me the "memory_order_consume" is not something which can be 
>> > used
>> > today.
>> > Any information on its status at C/C++ standard levels and 
>> > implementation-wise ?
> 
> Actually, you really can use memory_order_consume.  All current
> implementations will compile it as if it was memory_order_acquire.
> This will work correctly, but may be slower than you would like on ARM,
> PowerPC, and so on.
> 
> On things like x86, the penalty is forgone optimizations, so less
> of a problem there.

OK

> 
>> > Pragmatically speaking, what should we change in liburcu to ensure we don't
>> > generate
>> > broken code when LTO is enabled ? I suspect there are a few options here:
>> > 
>> > 1) Fail to build if LTO is enabled,
>> > 2) Generate slower code for rcu_dereference, either on all architectures 
>> > or only
>> >on weakly-ordered architectures,
>> > 3) Generate different code depending on whether LTO is enabled or not. 
>> > AFAIU
>> > this would only
>> >work if every compile unit is aware that it will end up being optimized 
>> > with
>> >LTO. Not sure
>> >how this could be done in the context of user-space.
>> > 4) [ Insert better idea here. ]
> 
> Use memory_order_consume if LTO is enabled.  That will work now, and
> might generate good code in some hoped-for future.

In the context of a user-space library, how does one check whether LTO is
enabled with preprocessor directives ? A quick test with gcc seems to show
that both with and without -flto cannot be distinguished from a preprocessor
POV, e.g. the output of both

gcc --std=c11 -O2 -dM -E - < /dev/null
and
gcc --std=c11 -O2 -flto -dM -E - < /dev/null

is exactly the same. Am I missing something here ?

If we accept to use memory_order_consume all the time in both C and C++ code
starting from C11 and C++11, the following code snippet could do the trick:

#define CMM_ACCESS_ONCE(x) (*(__volatile__  __typeof__(x) *)&(x))
#define CMM_LOAD_SHARED(p) CMM_ACCESS_ONCE(p)

#if defined (__cplusplus)
# if __cplusplus >= 201103L
#  include <atomic>
#  define rcu_dereference(x)   ((std::atomic<__typeof__(x)>)(x)).load(std::memory_order_consume)
# else
#  define rcu_dereference(x)   CMM_LOAD_SHARED(x)
# endif
#else
# if (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L)
#  include <stdatomic.h>
#  define rcu_dereference(x)   atomic_load_explicit(&(x), memory_order_consume)
# else
#  define rcu_dereference(x)   CMM_LOAD_SHARED(x)
# endif
#endif

This uses the volatile approach prior to C11/C++11, and moves to
memory_order_consume afterwards. This will bring a performance penalty on
weakly-ordered architectures even when -flto is not specified though.

Then the burden is pushed on the compiler people to eventually implement an
efficient memory_order_consume.

Is that acceptable ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


[lttng-dev] liburcu: LTO breaking rcu_dereference on arm64 and possibly other architectures ?

2021-04-16 Thread Mathieu Desnoyers via lttng-dev
Hi Paul, Will, Peter,

I noticed in this discussion https://lkml.org/lkml/2021/4/16/118 that LTO
is able to break rcu_dereference. This seems to be taken care of by
arch/arm64/include/asm/rwonce.h on arm64 in the Linux kernel tree.

In the liburcu user-space library, we have this comment near rcu_dereference()
in include/urcu/static/pointer.h:

 * The compiler memory barrier in CMM_LOAD_SHARED() ensures that value-speculative
 * optimizations (e.g. VSS: Value Speculation Scheduling) does not perform the
 * data read before the pointer read by speculating the value of the pointer.
 * Correct ordering is ensured because the pointer is read as a volatile access.
 * This acts as a global side-effect operation, which forbids reordering of
 * dependent memory operations. Note that such concern about dependency-breaking
 * optimizations will eventually be taken care of by the "memory_order_consume"
 * addition to forthcoming C++ standard.

(note: CMM_LOAD_SHARED() is the equivalent of READ_ONCE(), but was introduced in
liburcu as a public API before READ_ONCE() existed in the Linux kernel)

Peter tells me the "memory_order_consume" is not something which can be used today.
Any information on its status at C/C++ standard levels and implementation-wise ?

Pragmatically speaking, what should we change in liburcu to ensure we don't
generate broken code when LTO is enabled ? I suspect there are a few options here:

1) Fail to build if LTO is enabled,
2) Generate slower code for rcu_dereference, either on all architectures or only
   on weakly-ordered architectures,
3) Generate different code depending on whether LTO is enabled or not. AFAIU
   this would only work if every compile unit is aware that it will end up
   being optimized with LTO. Not sure how this could be done in the context
   of user-space.
4) [ Insert better idea here. ]

Thoughts ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] QSBR urcu read lock question

2021-04-15 Thread Mathieu Desnoyers via lttng-dev
- On Apr 15, 2021, at 2:11 PM, lbj lbj...@yahoo.com wrote:

> Thanks Mathieu. Is it safe to assume that if call_rcu is called twice then the
> callbacks are executed in the order that call_rcu was invoked? I think there 
> is
> a queue and only one thread that QSBR uses to handle callbacks, i just wanted
> to make sure that the queue was a guaranteed fifo.

It may very well depend on your configuration. For instance, if an application
invokes create_all_cpu_call_rcu_data(), it will create per-cpu worker threads
on each possible CPU. In that configuration, a given thread invoking call_rcu()
twice (back to back) may be migrated from one CPU to another in between, hence
different call-rcu worker threads will be responsible for executing the
callbacks, and they can be executed in any order.

So even though by careful analysis of your specific application configuration
you may presume that the callbacks will be executed in the same order they were
invoked, this is not guaranteed by the API, so I would not recommend relying
on FIFO order assumptions.

And given that the call_rcu API does not guarantee FIFO order, we could
eventually decide to change the order of callback execution from FIFO to LIFO
if it leads to performance improvements due to better cache locality.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [lttng-dev] QSBR urcu read lock question

2021-04-15 Thread Mathieu Desnoyers via lttng-dev
- On Apr 15, 2021, at 10:54 AM, lbj lbj...@yahoo.com wrote:

> Mathieu,
> Thanks so much for your wealth if information and timely responses, they are
> greatly appreciated. Final question: is there any harm in explicitly calling
> rcu_thread_online/rcu_thread_offline from within my call_rcu callback 
> function?
> From what you described it sounds like it would be redundant, but presumably
> would be harmless. Correct? Thanks again.

You could indeed invoke pairs of:

  rcu_thread_offline();   <--- emphasis on _offline_ here.
  [ long wait ... ]
  rcu_thread_online();

in that specific order within the call-rcu worker thread. Note that the qsbr
state of the call-rcu worker thread is "online" when it invokes the callbacks,
so each callback should make sure that state is back to "online" before it
returns control back to its caller.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [lttng-dev] QSBR urcu read lock question

2021-04-15 Thread Mathieu Desnoyers via lttng-dev
- On Apr 15, 2021, at 8:41 AM, lbj lbj...@yahoo.com wrote:

> Hi Mathieu,
> When I say “reclamation thread” I do mean the thread launched by call_rcu that
> is typically responsible for memory deallocations. Is is possible/recommended
> to register for rcu and then take an rcu-reader lock in such a thread? That is
> my main question.
> 
> As for reader locks being no-ops in QSBR, I read that but dont quite 
> understand
> it. Something must be preventing memory reclamation of rcu protected elements
> when I take that lock.

Note that a RCU read-side "lock" is really just a marker about the beginning/end
of a transaction which delays grace periods. We use the name "lock" to match
the kernel RCU APIs, but it should not be considered as doing any kind of mutual
exclusion.

> 
> My specific situation is: I have a QSBR rcu protected “policy” object (just a
> regular old C++ object that periodically gets refreshed and must be atomically
> updated because worker cores are reading it while spinning, and they cant slow
> down). When a new policy is received we invoke call_rcu on the old policy.
> call_rcu will eventually launch a thread in which the old policy’s resources
> are reclaimed. In this thread I would like to iterate through another, 
> separate
> structure, which is also QSBR rcu protected (a urcu hashtable). To do so
> safely, presumably I must use an rcu readlock. I just want to make sure such a
> scenario is reasonable and at very least not contra-indicated. Thanks!

If you look at urcu-call-rcu-impl.h, you will notice that call_rcu_thread()
indeed registers itself as a reader thread.

So the call-rcu callbacks can indeed take a RCU read-side lock, but for QSBR
the story does not end there, because due to the nature of QSBR, the read-side
lock is indeed a no-op, and it relies instead on all registered QSBR reader
threads to periodically invoke urcu_qsbr_quiescent_state() to report that they
are in a quiescent state, or invoke urcu_qsbr_thread_offline() if they expect
to be in a quiescent state for a long period of time (e.g. blocking), followed
by urcu_qsbr_thread_online().

And indeed, the call_rcu_thread puts itself in "offline" mode while awaiting
for grace periods (this is implicitly done within the qsbr synchronize_rcu()
implementation) and when sleeping.

So yes, you should be able to have a RCU read-side from within a call-rcu
worker thread, and it's OK to assume you can do a RCU traversal with the QSBR
urcu flavor from a call-rcu worker thread as well.

Thanks,

Mathieu

> 
> Jeff
> 
> 
> 
> Sent from my iPhone
> 
>> On Apr 15, 2021, at 8:20 AM, Mathieu Desnoyers wrote:
>> 
>> - On Apr 13, 2021, at 11:19 PM, lttng-dev lttng-dev@lists.lttng.org wrote:
>> 
>>> Hello all,
>>> 
>>> I have two different entities that are both protected by QSBR rcu: a policy 
>>> and
>>> a hashtable. In the reclamation thread for the policy I would like to take a
>>> read lock so that I can safely iterate through the hashtable. I dont see
>>> anything wrong with this, but I just wanted to make sure it was ok since 
>>> taking
>>> an rcu read lock in an rcu reclamation thread seems like it may be a bit
>>> suspect. Thanks for any insights, let me know if clarification is needed!
>> 
>> When you say "the reclamation thread for the policy", do you refer to a 
>> call-rcu
>> worker thread ?
>> 
>> Also, you are aware that RCU read-side lock/unlock are effectively no-ops for
>> QSBR rcu, right ?
>> 
>> Thanks,
>> 
>> Mathieu
>> 
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
> > http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [lttng-dev] QSBR urcu read lock question

2021-04-15 Thread Mathieu Desnoyers via lttng-dev
- On Apr 13, 2021, at 11:19 PM, lttng-dev lttng-dev@lists.lttng.org wrote:

> Hello all,
> 
> I have two different entities that are both protected by QSBR rcu: a policy 
> and
> a hashtable. In the reclamation thread for the policy I would like to take a
> read lock so that I can safely iterate through the hashtable. I dont see
> anything wrong with this, but I just wanted to make sure it was ok since 
> taking
> an rcu read lock in an rcu reclamation thread seems like it may be a bit
> suspect. Thanks for any insights, let me know if clarification is needed!

When you say "the reclamation thread for the policy", do you refer to a call-rcu
worker thread ?

Also, you are aware that RCU read-side lock/unlock are effectively no-ops for
QSBR rcu, right ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH v3 0/3] rseq: minor optimizations

2021-04-13 Thread Mathieu Desnoyers
- On Apr 13, 2021, at 4:33 PM, Eric Dumazet eric.duma...@gmail.com wrote:

> From: Eric Dumazet 
> 
> rseq is a heavy user of copy to/from user data in fast paths.
> This series tries to reduce the cost.
> 
> v3: Third patch going back to v1 (only deal with 64bit arches)
> v2: Addressed Peter and Mathieu feedbacks, thanks !

For the whole series:

Reviewed-by: Mathieu Desnoyers 

Thanks Eric!

Mathieu

> 
> Eric Dumazet (3):
>  rseq: optimize rseq_update_cpu_id()
>  rseq: remove redundant access_ok()
>  rseq: optimise rseq_get_rseq_cs() and clear_rseq_cs()
> 
> kernel/rseq.c | 29 +
> 1 file changed, 21 insertions(+), 8 deletions(-)
> 
> --
> 2.31.1.295.g9ea45b61b8-goog

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH v2 3/3] rseq: optimise rseq_get_rseq_cs() and clear_rseq_cs()

2021-04-13 Thread Mathieu Desnoyers



- On Apr 13, 2021, at 2:22 PM, Eric Dumazet eduma...@google.com wrote:

> On Tue, Apr 13, 2021 at 8:00 PM Mathieu Desnoyers
>  wrote:
>>
> 
>> As long as the ifdefs are localized within clearly identified wrappers in the
>> rseq code I don't mind doing the special-casing there.
>>
>> The point which remains is that I don't think we want to optimize for speed
>> on 32-bit architectures when it adds special-casing and complexity to the 
>> 32-bit
>> build. I suspect there is less and less testing performed on 32-bit
>> architectures
>> nowadays, and it's good that as much code as possible is shared between 
>> 32-bit
>> and
>> 64-bit builds to share the test coverage.
>>
> 
> Quite frankly V1 was fine, I can't really make it looking better.

Yes, I'm OK with V1 of that patch.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH v2 3/3] rseq: optimise rseq_get_rseq_cs() and clear_rseq_cs()

2021-04-13 Thread Mathieu Desnoyers
- On Apr 13, 2021, at 1:33 PM, Eric Dumazet eduma...@google.com wrote:

> On Tue, Apr 13, 2021 at 7:20 PM Mathieu Desnoyers
>  wrote:
>>
>> - On Apr 13, 2021, at 1:07 PM, Eric Dumazet eduma...@google.com wrote:
>>
>> > On Tue, Apr 13, 2021 at 7:01 PM Eric Dumazet  wrote:
>> >>
>> >> On Tue, Apr 13, 2021 at 6:57 PM Eric Dumazet  wrote:
>> >> >
>> >> > On Tue, Apr 13, 2021 at 6:54 PM Mathieu Desnoyers
>> >> >  wrote:
>> >> > >
>> >> > > - On Apr 13, 2021, at 12:22 PM, Eric Dumazet 
>> >> > > eric.duma...@gmail.com wrote:
>> >> > >
>> >> > > > From: Eric Dumazet 
>> >> > > >
>> >> > > > Commit ec9c82e03a74 ("rseq: uapi: Declare rseq_cs field as union,
>> >> > > > update includes") added regressions for our servers.
>> >> > > >
>> >> > > > Using copy_from_user() and clear_user() for 64bit values
>> >> > > > is suboptimal.
>> >> > > >
>> >> > > > We can use faster put_user() and get_user().
>> >> > > >
>> >> > > > 32bit arches can be changed to use the ptr32 field,
>> >> > > > since the padding field must always be zero.
>> >> > > >
>> >> > > > v2: added ideas from Peter and Mathieu about making this
>> >> > > >generic, since my initial patch was only dealing with
>> >> > > >64bit arches.
>> >> > >
>> >> > > Ah, now I remember the reason why reading and clearing the entire 
>> >> > > 64-bit
>> >> > > is important: it's because we don't want to allow user-space 
>> >> > > processes to
>> >> > > use this change in behavior to figure out whether they are running on 
>> >> > > a
>> >> > > 32-bit or in a 32-bit compat mode on a 64-bit kernel.
>> >> > >
>> >> > > So although I'm fine with making 64-bit kernels faster, we'll want to 
>> >> > > keep
>> >> > > updating the entire 64-bit ptr field on 32-bit kernels as well.
>> >> > >
>> >> > > Thanks,
>> >> > >
>> >> >
>> >> > So... back to V1 then ?
>> >>
>> >> Or add more stuff as in :
>> >
>> > diff against v2, WDYT ?
>>
>> I like this approach slightly better, because it moves the preprocessor 
>> ifdefs
>> into
>> rseq_get_rseq_cs and clear_rseq_cs, while keeping the same behavior for a 
>> 32-bit
>> process running on native 32-bit kernel and as compat task on a 64-bit 
>> kernel.
>>
>> That being said, I don't expect anyone to care much about performance of 
>> 32-bit
>> kernels, so we could use copy_from_user() on 32-bit kernels to remove
>> special-cases
>> in 32-bit specific code. This would eliminate the 32-bit specific "padding"
>> read, and
>> let the TASK_SIZE comparison handle the check for both 32-bit and 64-bit
>> kernels.
>>
>> As for clear_user(), I wonder whether we could simply keep using it, but 
>> change
>> the
>> clear_user() macro to figure out that it can use a faster 8-byte put_user ? I
>> find it
>> odd that performance optimizations which would be relevant elsewhere creep 
>> into
>> the
>> rseq code.
> 
> 
> clear_user() is a maze of arch-dependent macros/functions/assembly
> 
> I guess the same could be said from  copy_in_user(), but apparently we removed
> special-casing, like in commit a41e0d754240fe8ca9c4f2070bf67e3b0228aa22
> 
> Definitely it seems odd having to carefully choose between multiple methods.

As long as the ifdefs are localized within clearly identified wrappers in the
rseq code I don't mind doing the special-casing there.

The point which remains is that I don't think we want to optimize for speed
on 32-bit architectures when it adds special-casing and complexity to the 32-bit
build. I suspect there is less and less testing performed on 32-bit
architectures nowadays, and it's good that as much code as possible is shared
between 32-bit and 64-bit builds to share the test coverage.

Thanks,

Mathieu

> 
> 
>>
>> Thanks,
>>
>> Mathieu
>>
>> >
>> > diff --git a/kernel/rseq.c b/kernel/rseq.c
>> > index
>> > f2eee3f7f5d330688c81cb2e57d47ca6b843873e..537b1f684efa11069990018ffa3642c209993011
>>

Re: [PATCH v2 3/3] rseq: optimise rseq_get_rseq_cs() and clear_rseq_cs()

2021-04-13 Thread Mathieu Desnoyers
- On Apr 13, 2021, at 1:07 PM, Eric Dumazet eduma...@google.com wrote:

> On Tue, Apr 13, 2021 at 7:01 PM Eric Dumazet  wrote:
>>
>> On Tue, Apr 13, 2021 at 6:57 PM Eric Dumazet  wrote:
>> >
>> > On Tue, Apr 13, 2021 at 6:54 PM Mathieu Desnoyers
>> >  wrote:
>> > >
>> > > - On Apr 13, 2021, at 12:22 PM, Eric Dumazet eric.duma...@gmail.com 
>> > > wrote:
>> > >
>> > > > From: Eric Dumazet 
>> > > >
>> > > > Commit ec9c82e03a74 ("rseq: uapi: Declare rseq_cs field as union,
>> > > > update includes") added regressions for our servers.
>> > > >
>> > > > Using copy_from_user() and clear_user() for 64bit values
>> > > > is suboptimal.
>> > > >
>> > > > We can use faster put_user() and get_user().
>> > > >
>> > > > 32bit arches can be changed to use the ptr32 field,
>> > > > since the padding field must always be zero.
>> > > >
>> > > > v2: added ideas from Peter and Mathieu about making this
>> > > >generic, since my initial patch was only dealing with
>> > > >64bit arches.
>> > >
>> > > Ah, now I remember the reason why reading and clearing the entire 64-bit
>> > > is important: it's because we don't want to allow user-space processes to
>> > > use this change in behavior to figure out whether they are running on a
>> > > 32-bit or in a 32-bit compat mode on a 64-bit kernel.
>> > >
>> > > So although I'm fine with making 64-bit kernels faster, we'll want to 
>> > > keep
>> > > updating the entire 64-bit ptr field on 32-bit kernels as well.
>> > >
>> > > Thanks,
>> > >
>> >
>> > So... back to V1 then ?
>>
>> Or add more stuff as in :
> 
> diff against v2, WDYT ?

I like this approach slightly better, because it moves the preprocessor ifdefs
into rseq_get_rseq_cs and clear_rseq_cs, while keeping the same behavior for a
32-bit process running on native 32-bit kernel and as compat task on a 64-bit
kernel.

That being said, I don't expect anyone to care much about performance of 32-bit
kernels, so we could use copy_from_user() on 32-bit kernels to remove
special-cases in 32-bit specific code. This would eliminate the 32-bit specific
"padding" read, and let the TASK_SIZE comparison handle the check for both
32-bit and 64-bit kernels.

As for clear_user(), I wonder whether we could simply keep using it, but change
the clear_user() macro to figure out that it can use a faster 8-byte put_user ?
I find it odd that performance optimizations which would be relevant elsewhere
creep into the rseq code.

Thanks,

Mathieu

> 
> diff --git a/kernel/rseq.c b/kernel/rseq.c
> index f2eee3f7f5d330688c81cb2e57d47ca6b843873e..537b1f684efa11069990018ffa3642c209993011 100644
> --- a/kernel/rseq.c
> +++ b/kernel/rseq.c
> @@ -136,6 +136,10 @@ static int rseq_get_cs_ptr(struct rseq_cs __user **uptrp,
>  {
>  	u32 ptr;
>  
> +	if (get_user(ptr, &rseq->rseq_cs.ptr.padding))
> +		return -EFAULT;
> +	if (ptr)
> +		return -EINVAL;
>  	if (get_user(ptr, &rseq->rseq_cs.ptr.ptr32))
>  		return -EFAULT;
>  	*uptrp = (struct rseq_cs __user *)ptr;
> @@ -150,8 +154,9 @@ static int rseq_get_rseq_cs(struct task_struct *t, struct rseq_cs *rseq_cs)
>  	u32 sig;
>  	int ret;
>  
> -	if (rseq_get_cs_ptr(&urseq_cs, t->rseq))
> -		return -EFAULT;
> +	ret = rseq_get_cs_ptr(&urseq_cs, t->rseq);
> +	if (ret)
> +		return ret;
>  	if (!urseq_cs) {
>  		memset(rseq_cs, 0, sizeof(*rseq_cs));
>  		return 0;
> @@ -237,7 +242,8 @@ static int clear_rseq_cs(struct task_struct *t)
>  #ifdef CONFIG_64BIT
>  	return put_user(0UL, &t->rseq->rseq_cs.ptr64);
>  #else
> -	return put_user(0UL, &t->rseq->rseq_cs.ptr.ptr32);
> +	return put_user(0UL, &t->rseq->rseq_cs.ptr.ptr32) |
> +	       put_user(0UL, &t->rseq->rseq_cs.ptr.padding);
>  #endif
>  }

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH v2 3/3] rseq: optimise rseq_get_rseq_cs() and clear_rseq_cs()

2021-04-13 Thread Mathieu Desnoyers
- On Apr 13, 2021, at 12:57 PM, Eric Dumazet eduma...@google.com wrote:

> On Tue, Apr 13, 2021 at 6:54 PM Mathieu Desnoyers
>  wrote:
>>
>> - On Apr 13, 2021, at 12:22 PM, Eric Dumazet eric.duma...@gmail.com 
>> wrote:
>>
>> > From: Eric Dumazet 
>> >
>> > Commit ec9c82e03a74 ("rseq: uapi: Declare rseq_cs field as union,
>> > update includes") added regressions for our servers.
>> >
>> > Using copy_from_user() and clear_user() for 64bit values
>> > is suboptimal.
>> >
>> > We can use faster put_user() and get_user().
>> >
>> > 32bit arches can be changed to use the ptr32 field,
>> > since the padding field must always be zero.
>> >
>> > v2: added ideas from Peter and Mathieu about making this
>> >generic, since my initial patch was only dealing with
>> >64bit arches.
>>
>> Ah, now I remember the reason why reading and clearing the entire 64-bit
>> is important: it's because we don't want to allow user-space processes to
>> use this change in behavior to figure out whether they are running on a
>> 32-bit or in a 32-bit compat mode on a 64-bit kernel.
>>
>> So although I'm fine with making 64-bit kernels faster, we'll want to keep
>> updating the entire 64-bit ptr field on 32-bit kernels as well.
>>
>> Thanks,
>>
> 
> So... back to V1 then ?

In terms of behavior, yes. And it's probably the "easy" fix, but I hate that
it adds lots of preprocessor ifdefs into the rseq code.

But this would require auditing get_user()/put_user() for each architecture
supported by rseq to ensure they support 8-byte load/store. And it would become
an added burden on architecture maintainers wishing to add rseq support for
their architecture.

One alternative would be to implement rseq_get_user_u64 and rseq_put_user_u64
wrappers as static functions within rseq.c to hide the preprocessor ifdeffery
from the higher-level code. I try very hard to avoid mixing preprocessor ifdefs
with C code logic whenever I can.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH v2 3/3] rseq: optimise rseq_get_rseq_cs() and clear_rseq_cs()

2021-04-13 Thread Mathieu Desnoyers
- On Apr 13, 2021, at 12:22 PM, Eric Dumazet eric.duma...@gmail.com wrote:

> From: Eric Dumazet 
> 
> Commit ec9c82e03a74 ("rseq: uapi: Declare rseq_cs field as union,
> update includes") added regressions for our servers.
> 
> Using copy_from_user() and clear_user() for 64bit values
> is suboptimal.
> 
> We can use faster put_user() and get_user().
> 
> 32bit arches can be changed to use the ptr32 field,
> since the padding field must always be zero.
> 
> v2: added ideas from Peter and Mathieu about making this
>generic, since my initial patch was only dealing with
>64bit arches.

Ah, now I remember the reason why reading and clearing the entire 64-bit
is important: it's because we don't want to allow user-space processes to
use this change in behavior to figure out whether they are running on a
32-bit or in a 32-bit compat mode on a 64-bit kernel.

So although I'm fine with making 64-bit kernels faster, we'll want to keep
updating the entire 64-bit ptr field on 32-bit kernels as well.

Thanks,

Mathieu

> 
> Signed-off-by: Eric Dumazet 
> Cc: Mathieu Desnoyers 
> Cc: Peter Zijlstra 
> Cc: "Paul E. McKenney" 
> Cc: Boqun Feng 
> Cc: Arjun Roy 
> Cc: Ingo Molnar 
> ---
> kernel/rseq.c | 41 +++++++++++++++++++++++++++++++++--------
> 1 file changed, 33 insertions(+), 8 deletions(-)
> 
> diff --git a/kernel/rseq.c b/kernel/rseq.c
> index cfe01ab5253c1c424c0e8b25acbb6a8e1b41a5b6..f2eee3f7f5d330688c81cb2e57d47ca6b843873e 100644
> --- a/kernel/rseq.c
> +++ b/kernel/rseq.c
> @@ -119,23 +119,46 @@ static int rseq_reset_rseq_cpu_id(struct task_struct *t)
>   return 0;
> }
> 
> +#ifdef CONFIG_64BIT
> +static int rseq_get_cs_ptr(struct rseq_cs __user **uptrp,
> +const struct rseq __user *rseq)
> +{
> + u64 ptr;
> +
> + if (get_user(ptr, &rseq->rseq_cs.ptr64))
> + return -EFAULT;
> + *uptrp = (struct rseq_cs __user *)ptr;
> + return 0;
> +}
> +#else
> +static int rseq_get_cs_ptr(struct rseq_cs __user **uptrp,
> +const struct rseq __user *rseq)
> +{
> + u32 ptr;
> +
> + if (get_user(ptr, &rseq->rseq_cs.ptr.ptr32))
> + return -EFAULT;
> + *uptrp = (struct rseq_cs __user *)ptr;
> + return 0;
> +}
> +#endif
> +
> static int rseq_get_rseq_cs(struct task_struct *t, struct rseq_cs *rseq_cs)
> {
>   struct rseq_cs __user *urseq_cs;
> - u64 ptr;
>   u32 __user *usig;
>   u32 sig;
>   int ret;
> 
> - if (copy_from_user(&ptr, &t->rseq->rseq_cs.ptr64, sizeof(ptr)))
> + if (rseq_get_cs_ptr(&urseq_cs, t->rseq))
>   return -EFAULT;
> - if (!ptr) {
> + if (!urseq_cs) {
>   memset(rseq_cs, 0, sizeof(*rseq_cs));
>   return 0;
>   }
> - if (ptr >= TASK_SIZE)
> + if ((unsigned long)urseq_cs >= TASK_SIZE)
>   return -EINVAL;
> - urseq_cs = (struct rseq_cs __user *)(unsigned long)ptr;
> +
>   if (copy_from_user(rseq_cs, urseq_cs, sizeof(*rseq_cs)))
>   return -EFAULT;
> 
> @@ -211,9 +234,11 @@ static int clear_rseq_cs(struct task_struct *t)
>*
>* Set rseq_cs to NULL.
>*/
> - if (clear_user(&t->rseq->rseq_cs.ptr64, sizeof(t->rseq->rseq_cs.ptr64)))
> - return -EFAULT;
> - return 0;
> +#ifdef CONFIG_64BIT
> + return put_user(0UL, &t->rseq->rseq_cs.ptr64);
> +#else
> + return put_user(0UL, &t->rseq->rseq_cs.ptr.ptr32);
> +#endif
> }
> 
> /*
> --
> 2.31.1.295.g9ea45b61b8-goog

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH 3/3] rseq: optimise for 64bit arches

2021-04-13 Thread Mathieu Desnoyers
- On Apr 13, 2021, at 11:06 AM, David Laight david.lai...@aculab.com wrote:
[...]
> 
> Hmmm... too much replication.
> You could do:
> #ifdef CONFIG_64BIT
> #define PTR_TYPE u64
> #define PTR_FLD ptr64
> #else
> #define PTR_TYPE u32
> #define PTR_FLD ptr32
> #endif
> 
> Then have one copy of the code and the #undefs.

Good point, agreed.
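[Editor's note: a user-space sketch of the single-copy version David proposes. Types, the get_user() stub, and the union mirroring the uapi layout (little-endian assumed) are stand-ins for illustration only; PTR_TYPE/PTR_FLD are the macro names suggested above.]

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;
typedef uint32_t u32;

/* Stand-in for the kernel accessor: a plain load here. */
#define get_user(x, p)	((x) = *(p), 0)

/* Little-endian mirror of the uapi rseq_cs pointer union (assumption). */
union rseq_cs_ptr {
	u64 ptr64;
	struct { u32 ptr32; u32 padding; } ptr;
};

#define MODEL_64BIT 1	/* flip to model a 32-bit kernel */

#ifdef MODEL_64BIT
#define PTR_TYPE u64
#define PTR_FLD  ptr64
#else
#define PTR_TYPE u32
#define PTR_FLD  ptr.ptr32
#endif

/* One copy of the helper; the macros select width and field name. */
static int rseq_get_cs_ptr(void **uptrp, const union rseq_cs_ptr *u)
{
	PTR_TYPE ptr;

	if (get_user(ptr, &u->PTR_FLD))
		return -14; /* -EFAULT */
	*uptrp = (void *)(uintptr_t)ptr;
	return 0;
}

#undef PTR_TYPE
#undef PTR_FLD
```

Only the two #defines are conditional; the function body exists once, which keeps the ifdeffery out of the C logic.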

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH 2/3] rseq: remove redundant access_ok()

2021-04-13 Thread Mathieu Desnoyers
- On Apr 13, 2021, at 3:36 AM, Eric Dumazet eric.duma...@gmail.com wrote:

> From: Eric Dumazet 
> 
> After commit 8f2817701492 ("rseq: Use get_user/put_user rather
> than __get_user/__put_user") we no longer need
> an access_ok() call from __rseq_handle_notify_resume()

While we are doing that, should we also remove the access_ok() check in
rseq_syscall() ? It look like it can also be removed for the exact same
reason outlined here.

Thanks,

Mathieu

> 
> Signed-off-by: Eric Dumazet 
> Cc: Mathieu Desnoyers 
> Cc: Peter Zijlstra 
> Cc: "Paul E. McKenney" 
> Cc: Boqun Feng 
> Cc: Arjun Roy 
> Cc: Ingo Molnar 
> ---
> kernel/rseq.c | 2 --
> 1 file changed, 2 deletions(-)
> 
> diff --git a/kernel/rseq.c b/kernel/rseq.c
> index d2689ccbb132c0fc8ec0924008771e5ee1ca855e..57344f9abb43905c7dd2b6081205ff508d963e1e 100644
> --- a/kernel/rseq.c
> +++ b/kernel/rseq.c
> @@ -273,8 +273,6 @@ void __rseq_handle_notify_resume(struct ksignal *ksig,
> struct pt_regs *regs)
> 
>   if (unlikely(t->flags & PF_EXITING))
>   return;
> - if (unlikely(!access_ok(t->rseq, sizeof(*t->rseq))))
> - goto error;
>   ret = rseq_ip_fixup(regs);
>   if (unlikely(ret < 0))
>   goto error;
> --
> 2.31.1.295.g9ea45b61b8-goog

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH 1/3] rseq: optimize rseq_update_cpu_id()

2021-04-13 Thread Mathieu Desnoyers
- On Apr 13, 2021, at 3:36 AM, Eric Dumazet eric.duma...@gmail.com wrote:

> From: Eric Dumazet 
> 
> Two put_user() in rseq_update_cpu_id() are replaced
> by a pair of unsafe_put_user() with appropriate surroundings.
> 
> This removes one stac/clac pair on x86 in fast path.
> 
> Signed-off-by: Eric Dumazet 
> Cc: Mathieu Desnoyers 
> Cc: Peter Zijlstra 
> Cc: "Paul E. McKenney" 
> Cc: Boqun Feng 
> Cc: Arjun Roy 
> Cc: Ingo Molnar 
> ---
> kernel/rseq.c | 15 +++++++++++----
> 1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/rseq.c b/kernel/rseq.c
> index a4f86a9d6937cdfa2f13d1dcc9be863c1943d06f..d2689ccbb132c0fc8ec0924008771e5ee1ca855e 100644
> --- a/kernel/rseq.c
> +++ b/kernel/rseq.c
> @@ -84,13 +84,20 @@
> static int rseq_update_cpu_id(struct task_struct *t)
> {
>   u32 cpu_id = raw_smp_processor_id();
> + struct rseq *r = t->rseq;

AFAIU the variable above should be a struct rseq __user *.

Elsewhere in the file we use "rseq" rather than "r" for struct rseq __user *
variable name, it would be better to keep the naming consistent across the file
if possible.

Thanks,

Mathieu

> 
> - if (put_user(cpu_id, &t->rseq->cpu_id_start))
> - return -EFAULT;
> - if (put_user(cpu_id, &t->rseq->cpu_id))
> - return -EFAULT;
> + if (!user_write_access_begin(r, sizeof(*r)))
> + goto efault;
> + unsafe_put_user(cpu_id, &r->cpu_id_start, efault_end);
> + unsafe_put_user(cpu_id, &r->cpu_id, efault_end);
> + user_write_access_end();
>   trace_rseq_update(t);
>   return 0;
> +
> +efault_end:
> + user_write_access_end();
> +efault:
> + return -EFAULT;
> }
> 
> static int rseq_reset_rseq_cpu_id(struct task_struct *t)
> --
> 2.31.1.295.g9ea45b61b8-goog

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH 3/3] rseq: optimise for 64bit arches

2021-04-13 Thread Mathieu Desnoyers
- On Apr 13, 2021, at 6:36 AM, David Laight david.lai...@aculab.com wrote:

> From: Peter Zijlstra
>> Sent: 13 April 2021 10:10
>> 
>> On Tue, Apr 13, 2021 at 12:36:57AM -0700, Eric Dumazet wrote:
>> > From: Eric Dumazet 
>> >
>> > Commit ec9c82e03a74 ("rseq: uapi: Declare rseq_cs field as union,
>> > update includes") added regressions for our servers.
>> >
>> > Using copy_from_user() and clear_user() for 64bit values
>> > on 64bit arches is suboptimal.
>> >
>> > We might revisit this patch once all 32bit arches support
>> > get_user() and/or put_user() for 8 bytes values.
>> 
>> Argh, what a mess :/ afaict only nios32 lacks put_user_8, but get_user_8
>> is missing in a fair number of archs.
>> 
>> That said; 32bit archs never have to actually set the top bits in that
>> word, so they _could_ only set the low 32 bits. That works provided
>> userspace itself keeps the high bits clear.
> 
> Does that work for 32bit BE ?

Yes, because uapi/linux/rseq.h defines the ptr32 as:

#if (defined(__BYTE_ORDER) && (__BYTE_ORDER == __BIG_ENDIAN)) || defined(__BIG_ENDIAN)
__u32 padding;  /* Initialized to zero. */
__u32 ptr32;
#else /* LITTLE */
__u32 ptr32;
__u32 padding;  /* Initialized to zero. */
#endif /* ENDIAN */

which takes care of BE vs LE.
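[Editor's note: a small user-space sanity check of why this layout works — ptr32 always overlays the numerically-low 32 bits of ptr64 on either byte order, so a 32-bit store with zeroed padding is seen as a zero-extended 64-bit value. The union name and the use of GCC/Clang's __BYTE_ORDER__ macro are illustrative assumptions, not the uapi header itself.]

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mirrors the uapi layout quoted above: on big-endian the padding word
 * comes first, so ptr32 sits over the low half of ptr64 either way.
 */
union rseq_cs_ptr {
	uint64_t ptr64;
	struct {
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
		uint32_t padding;	/* Initialized to zero. */
		uint32_t ptr32;
#else
		uint32_t ptr32;
		uint32_t padding;	/* Initialized to zero. */
#endif
	} ptr;
};
```

The assertions below hold on both little- and big-endian builds, which is exactly what lets a 32-bit kernel update only ptr32 while a 64-bit kernel reads ptr64.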

> 
>   David
> 
>> So I suppose that if we're going to #ifdef this, we might as well do the
>> whole thing.
>> 
>> Mathieu; did I forget a reason why this cannot work?

The only difference it brings on 32-bit is that the truncation of high bits
will be done before the following validation:

if (!ptr) {
memset(rseq_cs, 0, sizeof(*rseq_cs));
return 0;
}
if (ptr >= TASK_SIZE)
return -EINVAL;

The question is whether we really want to issue a segmentation fault if 32-bit
user-space has set non-zero high bits, or if silently ignoring those high
bits is acceptable.

Nits below:

>> 
>> diff --git a/kernel/rseq.c b/kernel/rseq.c
>> index a4f86a9d6937..94006190b8eb 100644
>> --- a/kernel/rseq.c
>> +++ b/kernel/rseq.c
>> @@ -115,20 +115,25 @@ static int rseq_reset_rseq_cpu_id(struct task_struct *t)
>>  static int rseq_get_rseq_cs(struct task_struct *t, struct rseq_cs *rseq_cs)
>>  {
>>  struct rseq_cs __user *urseq_cs;
>> -u64 ptr;
>> +unsigned long ptr;

I am always reluctant to use long/unsigned long as the type for the
get/put_user() (x) argument, because it hides the cast deep within
architecture-specific macros. I understand that in this specific case it
happens that on 64-bit archs we end up casting a u64 to unsigned long (same
size), and on 32-bit archs we end up casting a u32 to unsigned long (also same
size), so there is no practical concern about type promotion and
sign-extension, but I think it would be better to have something explicit,
e.g.:

#ifdef CONFIG_64BIT
static int rseq_get_cs_ptr(struct rseq_cs __user **uptrp, struct rseq_cs *rseq_cs)
{
	u64 ptr;

	if (get_user(ptr, &rseq_cs->ptr64))
		return -EFAULT;
	*uptrp = (struct rseq_cs __user *)ptr;
	return 0;
}
#else
static int rseq_get_cs_ptr(struct rseq_cs __user **uptrp, struct rseq_cs *rseq_cs)
{
	u32 ptr;

	if (get_user(ptr, &rseq_cs->ptr.ptr32))
		return -EFAULT;
	*uptrp = (struct rseq_cs __user *)ptr;
	return 0;
}
#endif

And use those helpers to get the ptr value.

>>  u32 __user *usig;
>>  u32 sig;
>>  int ret;
>> 
>> -if (copy_from_user(&ptr, &t->rseq->rseq_cs.ptr64, sizeof(ptr)))
>> +#ifdef CONFIG_64BIT
>> +if (get_user(ptr, &t->rseq->rseq_cs.ptr64))
>>  return -EFAULT;
>> +#else
>> +if (get_user(ptr, &t->rseq->rseq_cs.ptr32))

Note that this is also not right. It should be &t->rseq->rseq_cs.ptr.ptr32.

Thanks,

Mathieu

>> +return -EFAULT;
>> +#endif
>>  if (!ptr) {
>>  memset(rseq_cs, 0, sizeof(*rseq_cs));
>>  return 0;
>>  }
>>  if (ptr >= TASK_SIZE)
>>  return -EINVAL;
>> -urseq_cs = (struct rseq_cs __user *)(unsigned long)ptr;
>> +urseq_cs = (struct rseq_cs __user *)ptr;
>>  if (copy_from_user(rseq_cs, urseq_cs, sizeof(*rseq_cs)))
>>  return -EFAULT;
>> 
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT,
> UK
> Registration No: 1397386 (Wales)

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

