Re: [PATCH 2/3] printk: Add /sys/consoles/ interface

2017-11-03 Thread Petr Mladek
On Fri 2017-11-03 15:32:34, Kroah-Hartman wrote:
> > > diff --git a/Documentation/ABI/testing/sysfs-consoles 
> > > b/Documentation/ABI/testing/sysfs-consoles
> > > new file mode 100644
> > > index 000..6a1593e
> > > --- /dev/null
> > > +++ b/Documentation/ABI/testing/sysfs-consoles
> > > @@ -0,0 +1,13 @@
> > > +What:/sys/consoles/
> 
> Eeek, what!
> 
> > I rather add Greg in CC. I am not 100% sure that the top level
> > directory is the right thing to do.
> 
> Neither do I.
> 
> > Alternative might be to hide this under /sys/kernel/consoles/.
> 
> No no no.
> 
> > > diff --git a/include/linux/console.h b/include/linux/console.h
> > > index a5b5d79..76840be 100644
> > > --- a/include/linux/console.h
> > > +++ b/include/linux/console.h
> > > @@ -148,6 +148,7 @@ struct console {
> > >   void*data;
> > >   struct   console *next;
> > >   int level;
> > > + struct kobject *kobj;
> 
> Why are you using "raw" kobjects and not a "real" struct device?  This
> is a device, use that interface instead please.

Hmm, struct console has a member

struct tty_driver *(*device)(struct console *, int *);

but it is set only when the console has tty binding.

> If you need a console 'bus' to place them on, fine, but the virtual bus
> is probably best and simpler to use.
> 
> That is if you _really_ feel you need sysfs interaction with the console
> layer (hint, I am not yet convinced...)

The purpose of this patch is to make a userfriendly interface
for setting console-specific loglevel (message filtering).

It curretnly uses kobject for creating a simple directory
structure under /sys/. It is inspired by /sys/power/, /sys/kernel/mm/,
/sys/kernel/debug/tracing/ stuff.

There are ideas to add more files that would allow to modify even the
global setting. This is currectly possible by the four numbers in
/proc/sys/kernel/printk. Nobody knows what the four numbers mean.
IMHO, the following interface would be easier to work with:

   /sys/console/loglevel
   /sys/console/min_loglevel
   /sys/condole/default_loglevel

> > /*
> >  * Find the related struct console a safe way. The kobject
> >  * desctruction is asynchronous.
> >  */
> > > + console_lock();
> > > + for_each_console(con) {
> > > + if (con->kobj == kobj) {
> 
> You are doing something wrong, go from kobj to your console directly,
> the fact that you can not do that here is a _huge_ hint that your
> structure is not correct.
> 
> Hint, it's not correct at all :)

I know that we are not following the original purpose of sysfs.
But there are more (mis)users.

> Please cc: me on stuff like this if you want a review, and as you are
> adding a random new sysfs root directory, please always cc: me on that
> so I can talk you out of it...

Good to know.

Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] printk: Add ability to set loglevel via "console=" cmdline

2017-11-03 Thread Petr Mladek
On Thu 2017-09-28 17:43:57, Calvin Owens wrote:
> This extends the "console=" interface to allow setting the per-console
> loglevel by adding "/N" to the string, where N is the desired loglevel
> expressed as a base 10 integer. Invalid values are silently ignored.
> 
> Cc: Petr Mladek <pmla...@suse.com>
> Cc: Steven Rostedt <rost...@goodmis.org>
> Cc: Sergey Senozhatsky <sergey.senozhat...@gmail.com>
> Signed-off-by: Calvin Owens <calvinow...@fb.com>
> ---
>  Documentation/admin-guide/kernel-parameters.txt |  6 ++---
>  kernel/printk/console_cmdline.h |  1 +
>  kernel/printk/printk.c  | 30 
> -
>  3 files changed, 28 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> b/Documentation/admin-guide/kernel-parameters.txt
> index 0549662..f22b992 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -607,10 +607,10 @@
>   ttyS[,options]
>   ttyUSB0[,options]
>   Use the specified serial port.  The options are of
> - the form "pnf", where "" is the baud rate,
> + the form "pnf/l", where "" is the baud rate,
>   "p" is parity ("n", "o", or "e"), "n" is number of
> - bits, and "f" is flow control ("r" for RTS or
> - omit it).  Default is "9600n8".
> + bits, "f" is flow control ("r" for RTS or omit it),
> + and "l" is the loglevel on [0,7]. Default is "9600n8".
>  

If I get this correctly, the patch allows to define the loglevel for any
console. I think that we need to describe it in a generic
way. Something like:

   console=[KNL] Output console device and options.

Format: name[,options][/min_loglevel]

Where "name" is the console name, "options"
are console-specific options, and "min_loglevel"
allows to increase the loglevel for a particular
console over the global one.

tty  Use the virtual console device .

I would also add a cross reference into the loglevel= section
about that the global loglevel might be overridden by a higher
console-specific min_loglevel value.


>   See Documentation/admin-guide/serial-console.rst for 
> more
>   information.  See
> diff --git a/kernel/printk/console_cmdline.h b/kernel/printk/console_cmdline.h
> index 2ca4a8b..269e666 100644
> --- a/kernel/printk/console_cmdline.h
> +++ b/kernel/printk/console_cmdline.h
> @@ -5,6 +5,7 @@ struct console_cmdline
>  {
>   charname[16];   /* Name of the driver   */
>   int index;  /* Minor dev. to use*/
> + int loglevel;   /* Loglevel to use */

Again, I would use "min_loglevel".


>   char*options;   /* Options for the driver   */
>  #ifdef CONFIG_A11Y_BRAILLE_CONSOLE
>   char*brl_options;   /* Options for braille driver */
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 488bda3..4c14cf2 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2541,6 +2552,12 @@ void register_console(struct console *newcon)
>   if (newcon->index < 0)
>   newcon->index = c->index;
>  
> + /*
> +  * Carry over the loglevel from the cmdline
> +  */
> + newcon->level = c->loglevel;
> + extant = true;

I would personally do the following:

if (!newcon->min_loglevel)
newcon->min_loglevel = c->min_loglevel;

It is similar to newcon->index handling above. It will use the
command line setting only when the console is registered for the first
time.

All this is based on my assumption that all non-initialized struct
console members are zero. At least I do not see any location where
some other members would be explicitly zeroed. And there might
be candidates, e.g. data, match(), next.

In each case, I do not know what the shortcut "extant" stands for
and I feel confused ;-)


>   if (_braille_register_console(newcon, c))
>   return;
>  
&g

Re: [PATCH 2/3] printk: Add /sys/consoles/ interface

2017-11-03 Thread Petr Mladek
On Thu 2017-09-28 17:43:56, Calvin Owens wrote:
> This adds a new sysfs interface that contains a directory for each
> console registered on the system. Each directory contains a single
> "loglevel" file for reading and setting the per-console loglevel.
> 
> We can let kobject destruction race with console removal: if it does,
> loglevel_{show,store}() will safely fail with -ENODEV. This is a little
> weird, but avoids embedding the kobject and therefore needing to totally
> refactor the way we handle console struct lifetime.

It looks like a sane approach. It might be worth a comment in the code.


>  Documentation/ABI/testing/sysfs-consoles | 13 +
>  include/linux/console.h  |  1 +
>  kernel/printk/printk.c   | 88 
> 
>  3 files changed, 102 insertions(+)
>  create mode 100644 Documentation/ABI/testing/sysfs-consoles
> 
> diff --git a/Documentation/ABI/testing/sysfs-consoles 
> b/Documentation/ABI/testing/sysfs-consoles
> new file mode 100644
> index 000..6a1593e
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-consoles
> @@ -0,0 +1,13 @@
> +What:/sys/consoles/

I rather add Greg in CC. I am not 100% sure that the top level
directory is the right thing to do.

Alternative might be to hide this under /sys/kernel/consoles/.


> +Date:September 2017
> +KernelVersion:   4.15
> +Contact: Calvin Owens 
> +Description: The /sys/consoles tree contains a directory for each console
> + configured on the system. These directories contain the
> + following attributes:
> +
> + * "loglevel"Set the per-console loglevel: the kernel uses
> + max(system_loglevel, perconsole_loglevel) when
> + deciding whether to emit a given message. The
> + default is 0, which means max() always yields
> + the system setting in the kernel.printk sysctl.

I would call the attribute "min_loglevel". The name "loglevel" should
be reserved for the really used loglevel that depends also on the
global loglevel value.


> diff --git a/include/linux/console.h b/include/linux/console.h
> index a5b5d79..76840be 100644
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -148,6 +148,7 @@ struct console {
>   void*data;
>   struct   console *next;
>   int level;
> + struct kobject *kobj;
>  };
>  
>  /*
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 3f1675e..488bda3 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -105,6 +105,8 @@ enum devkmsg_log_masks {
>  
>  static unsigned int __read_mostly devkmsg_log = DEVKMSG_LOG_MASK_DEFAULT;
>  
> +static struct kobject *consoles_dir_kobj;
>
>  static int __control_devkmsg(char *str)
>  {
>   if (!str)
> @@ -2371,6 +2373,82 @@ static int __init keep_bootcon_setup(char *str)
>  
>  early_param("keep_bootcon", keep_bootcon_setup);
>  
> +static ssize_t loglevel_show(struct kobject *kobj, struct kobj_attribute 
> *attr,
> +  char *buf)
> +{
> + struct console *con;
> + ssize_t ret = -ENODEV;
> +

This might deserve a comment. Something like:

/*
 * Find the related struct console a safe way. The kobject
 * desctruction is asynchronous.
 */
> + console_lock();
> + for_each_console(con) {
> + if (con->kobj == kobj) {
> + ret = sprintf(buf, "%d\n", con->level);
> + break;
> + }
> + }
> + console_unlock();
> +
> + return ret;
> +}
> +
> +static ssize_t loglevel_store(struct kobject *kobj, struct kobj_attribute 
> *attr,
> +   const char *buf, size_t count)
> +{
> + struct console *con;
> + ssize_t ret;
> + int tmp;

I would use some meaningful name, e.g. new_level ;-)

> + ret = kstrtoint(buf, 10, );
> + if (ret < 0)
> + return ret;
> +
> + if (tmp < LOGLEVEL_EMERG)
> + return -ERANGE;
> +
> + /*
> +  * Mimic the behavior of /dev/kmsg with respect to minimum_loglevel
> +  */
> + if (tmp < minimum_console_loglevel)
> + tmp = minimum_console_loglevel;

Hmm, I would remove this "mimic" stuff. minimum_console_loglevel is currently
used to limit operations by the syslog system call. But root is still
able modify the minimum_console_loglevel by writing into
/proc/sys/kernel/printk.

My plan is that /sys/console interface would eventually replace the
crazy /proc/sys/kernel/printk one.

In each case, the default con->level value is zero. It would be
weird if people were not able to set this value.

> +
> + ret = -ENODEV;

I would repeat the same comment here:

/*
 * Find the related struct console a safe way. The kobject
 * desctruction is asynchronous.
 */

> + 

Re: [PATCH 1/3] printk: Introduce per-console loglevel setting

2017-11-03 Thread Petr Mladek
On Thu 2017-09-28 17:43:55, Calvin Owens wrote:
> This patch introduces a new per-console loglevel setting, and changes
> console_unlock() to use max(global_level, per_console_level) when
> deciding whether or not to emit a given log message.

> diff --git a/include/linux/console.h b/include/linux/console.h
> index b8920a0..a5b5d79 100644
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -147,6 +147,7 @@ struct console {
>   int cflag;
>   void*data;
>   struct   console *next;
> + int level;

I would make the meaning more clear and call this min_loglevel.

>  };
>  
>  /*
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 512f7c2..3f1675e 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1141,9 +1141,14 @@ module_param(ignore_loglevel, bool, S_IRUGO | S_IWUSR);
>  MODULE_PARM_DESC(ignore_loglevel,
>"ignore loglevel setting (prints all kernel messages to the 
> console)");
>  
> -static bool suppress_message_printing(int level)
> +static int effective_loglevel(struct console *con)
>  {
> - return (level >= console_loglevel && !ignore_loglevel);
> + return max(console_loglevel, con ? con->level : LOGLEVEL_EMERG);
> +}
> +
> +static bool suppress_message_printing(int level, struct console *con)
> +{
> + return (level >= effective_loglevel(con) && !ignore_loglevel);
>  }

We need to be more careful here:

First, there is one ugly level called CONSOLE_LOGLEVEL_SILENT. Fortunately,
it is used only by vkdb_printf(). I guess that the purpose is to store
messages into the log buffer and do not show them on consoles.

It is hack and it is racy. It would hide the messages only when the
console_lock() is not already taken. Similar hack is used on more
locations, e.g. in __handle_sysrq() and these are racy as well.
We need to come up with something better in the future but this
is a task for another patchset.


Second, this functions are called with NULL when we need to take
all usable consoles into account. You simplified it by ignoring
the per-console setting. But it is not correct. For example,
you might need to delay the printing in boot_delay_msec()
also on the fast console. Also this was the reason to remove
one optimization in console_unlock().

I thought about a reasonable solution and came up with something like:

static bool suppress_message_printing(int level, struct console *con)
{
int callable_loglevel;

if (ignore_loglevel || console_loglevel == CONSOLE_LOGLEVEL_MOTORMOUTH)
return false;

/* Make silent even fast consoles. */
if (console_loglevel == CONSOLE_LOGLEVEL_SILENT)
return true;

if (con)
callable_loglevel = con->min_loglevel;
else
callable_loglevel = max_custom_console_loglevel;

/* Global setting might make all consoles more verbose. */
if (callable_loglevel < console_loglevel)
callable_loglevel = console_loglevel;

return level >= callable_loglevel();
}

Yes, it is complicated. But the logic is complicated. IMHO, this has
the advantage that we do most of the decisions on a single place
and it might be easier to get the picture.

Anyway, max_custom_console_loglevel would be a global variable
defined as:

/*
 * Minimum loglevel of the most talkative registered console.
 * It is a maximum of all registered con->min_logvevel values.
 */
static int max_custom_console_loglevel = LOGLEVEL_EMERG;

The value should get updated when any console is registered
and when a registered console is manipulated. It means in
register_console(), unregister_console(), and the sysfs
write callbacks.


>  #ifdef CONFIG_BOOT_PRINTK_DELAY
> @@ -2199,22 +2205,11 @@ void console_unlock(void)
>   } else {
>   len = 0;
>   }
> -skip:
> +
>   if (console_seq == log_next_seq)
>   break;
>  
>   msg = log_from_idx(console_idx);
> - if (suppress_message_printing(msg->level)) {
> - /*
> -  * Skip record we have buffered and already printed
> -  * directly to the console when we received it, and
> -  * record that has level above the console loglevel.
> -  */
> - console_idx = log_next(console_idx);
> - console_seq++;
> - goto skip;
> - }

I would like to keep this code. It does not make sense to prepare the
text buffer if it won't be used at all. It would work with the change
that I proposed above.


>   len += msg_print_text(msg, false, text + len, sizeof(text) - 
> len);
>   if (nr_ext_console_drivers) {
>   ext_len = msg_print_ext_header(ext_text,
> @@ -2230,7 +2225,7 @@ void console_unlock(void)
>   raw_spin_unlock(_lock);
>  
>   

Re: [PATCH 1/3] printk: Introduce per-console loglevel setting

2017-10-20 Thread Petr Mladek
On Thu 2017-10-19 16:40:45, Calvin Owens wrote:
> On 09/28/2017 05:43 PM, Calvin Owens wrote:
> >Not all consoles are created equal: depending on the actual hardware,
> >the latency of a printk() call can vary dramatically. The worst examples
> >are serial consoles, where it can spin for tens of milliseconds banging
> >the UART to emit a message, which can cause application-level problems
> >when the kernel spews onto the console.
> 
> Any thoughts on this series? Happy to resend again, but if there are no
> objections I'd love to see it merged sooner rather than later :)
> 
> Happy to resend too, just let me know.

There is no need to resend the patch. It is on my radar and I am
going to look at it.

Please, be patient, you hit conference, illness, after vacation
season. We do not want to unnecessarily delay it but it is
not a trivial change that might be accepted within minutes.

Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3 v12] printk: Add monotonic, boottime, and realtime timestamps

2017-09-26 Thread Petr Mladek
On Mon 2017-09-18 13:51:00, Prarit Bhargava wrote:
> printk.time=1/CONFIG_PRINTK_TIME=1 adds a unmodified local hardware clock
> timestamp to printk messages.  The local hardware clock loses time each
> day making it difficult to determine exactly when an issue has occurred in
> the kernel log, and making it difficult to determine how kernel and
> hardware issues relate to each other in real time.
> 
> Make printk output different timestamps by adding options for no
> timestamp, the local hardware clock, the monotonic clock, the boottime
> clock, and the real clock.  Allow a user to pick one of the clocks by
> using the printk.time kernel parameter.  Output the type of clock in
> /sys/module/printk/parameters/time so userspace programs can interpret the
> timestamp.
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 512f7c2baedd..5e0bf2ef02f7 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1201,14 +1204,130 @@ static inline void boot_delay_msec(int level)
> +static int printk_time = CONFIG_PRINTK_TIME_TYPE;
> +
> +static int printk_set_ts_source(enum timestamp_sources ts_source)
> +{
> + int err = 0;


> @@ -1861,6 +1980,7 @@ static size_t msg_print_text(const struct printk_log 
> *msg,
>bool syslog, char *buf, size_t size) { return 0; }
>  static bool suppress_message_printing(int level) { return false; }
>  
> +static int printk_time;

I worried if the variable should have got initialized. But it seems to
be a relic from an older version. The variable is not longer used and
needed when CONFIG_PRINTK is not defined. It is proved by gcc:

  CC  kernel/printk/printk.o
kernel/printk/printk.c:1983:12: warning: ‘printk_time’ defined but not used 
[-Wunused-variable]
 static int printk_time;


>  #endif /* CONFIG_PRINTK */
>  
>  #ifdef CONFIG_EARLY_PRINTK

Otherwise that patch looks fine. With the unused variable removed,
feel free to use:

Reviewed-by: Petr Mladek <pmla...@suse.com>

Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3 v11] printk: Add monotonic, boottime, and realtime timestamps

2017-09-15 Thread Petr Mladek
On Tue 2017-09-05 08:06:41, Prarit Bhargava wrote:
> printk.time=1/CONFIG_PRINTK_TIME=1 adds a unmodified local hardware clock
> timestamp to printk messages.  The local hardware clock loses time each
> day making it difficult to determine exactly when an issue has occurred in
> the kernel log, and making it difficult to determine how kernel and
> hardware issues relate to each other in real time.
> 
> Make printk output different timestamps by adding options for no
> timestamp, the local hardware clock, the monotonic clock, the boottime
> clock, and the real clock.  Allow a user to pick one of the clocks by
> using the printk.time kernel parameter.  Output the type of clock in
> /sys/module/printk/parameters/time so userspace programs can interpret the
> timestamp.
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index fc47863f629c..5aaeb1ebd26c 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1202,14 +1205,113 @@ static inline void boot_delay_msec(int level)
>  }
>  #endif
>  
> -static bool printk_time = IS_ENABLED(CONFIG_PRINTK_TIME);
> -module_param_named(time, printk_time, bool, S_IRUGO | S_IWUSR);
> +/**
> + * enum timestamp_sources - Timestamp sources for printk() messages.
> + * @PRINTK_TIME_DISABLED: No time stamp.
> + * @PRINTK_TIME_LOCAL: Local hardware clock timestamp.
> + * @PRINTK_TIME_BOOT: Boottime clock timestamp.
> + * @PRINTK_TIME_MONO: Monotonic clock timestamp.
> + * @PRINTK_TIME_REAL: Realtime clock timestamp.
> + */
> +enum timestamp_sources {
> + PRINTK_TIME_DISABLED = 0,
> + PRINTK_TIME_LOCAL = 1,
> + PRINTK_TIME_BOOT = 2,
> + PRINTK_TIME_MONO = 3,
> + PRINTK_TIME_REAL = 4,
> +};
> +
> +static const char * const timestamp_sources_str[5] = {
> + "disabled",
> + "local",
> + "boottime",
> + "monotonic",
> + "realtime",
> +};
> +
> +static int printk_time = CONFIG_PRINTK_TIME_TYPE;
> +
> +static void printk_set_ts_func(void)
> +{
> + switch (printk_time) {
> + case PRINTK_TIME_LOCAL:
> + case PRINTK_TIME_DISABLED:
> + default:
> + printk_get_ts = local_clock;
> + break;

This is slightly confusing. One would expect that local_clock()
will be used when "disabled" is written into
/sys/module/printk/parameters/time. But it is not true
because this function is not called in that case.

I know that you do this to avoid recursion in
printk_get_first_ts(), I know because I read the discussion
about an older version of the patch. But this is less
obvious from the code.


> + case PRINTK_TIME_BOOT:
> + printk_get_ts = ktime_get_boot_fast_ns;
> + break;
> + case PRINTK_TIME_MONO:
> + printk_get_ts = ktime_get_mono_fast_ns;
> + break;
> + case PRINTK_TIME_REAL:
> + printk_get_ts = ktime_get_real_fast_ns;
> + break;
> + }
> +}

I think that it would be cleaner the following way:

static int printk_set_ts_source(enum timestamp_sources ts_source)
{
int err = 0;

switch (ts_source) {
case PRINTK_TIME_LOCAL:
printk_get_ts = local_clock;
break;
case PRINTK_TIME_BOOT:
printk_get_ts = ktime_get_boot_fast_ns;
break;
case PRINTK_TIME_MONO:
printk_get_ts = ktime_get_mono_fast_ns;
break;
case PRINTK_TIME_REAL:
printk_get_ts = ktime_get_real_fast_ns;
break;
case PRINTK_TIME_DISABLED:
/*
 * The timestamp is always stored into the log buffer.
 * Keep the current one.
 */
break;
default:
err = -EINVAL;
break;
}

if (!err)
printk_time = ts_source;
return err;
}

Then we could avoid the recursion directly in printk_get_first_ts().

> +
> +static u64 printk_get_first_ts(void)
> +{
> + printk_set_ts_func();

printk_set_ts_source(printk_time);

/* Fallback for invalid or disabled timestamp source */
if (printk_get_ts == printk_get_first_ts)
printk_get_ts = local_clock;

> + return printk_get_ts();
> +}
> +
> +static int param_set_time(const char *val, const struct kernel_param *kp)
> +{
> + char *param = strstrip((char *)val);
> + int _printk_time = -1;
> + int ts;
int err;
> +
> + if (strlen(param) == 1) {
> + /* Preserve legacy boolean settings */
> + if ((param[0] == '0') || (param[0] == 'n') ||
> + (param[0] == 'N'))
> + _printk_time = PRINTK_TIME_DISABLED;
> + if ((param[0] == '1') || (param[0] == 'y') ||
> + (param[0] == 'Y'))
> + _printk_time = PRINTK_TIME_LOCAL;
> + }
> + if (_printk_time == -1) {
> + for (ts = 0; ts < ARRAY_SIZE(timestamp_sources_str); ts++) {
> 

Re: [PATCH 2/2 v7] printk: Add monotonic, boottime, and realtime timestamps

2017-08-23 Thread Petr Mladek
On Thu 2017-08-17 09:15:39, Prarit Bhargava wrote:
> printk.time=1/CONFIG_PRINTK_TIME=1 adds a unmodified local hardware clock
> timestamp to printk messages.  The local hardware clock loses time each
> day making it difficult to determine exactly when an issue has occurred in
> the kernel log, and making it difficult to determine how kernel and
> hardware issues relate to each other in real time.
> 
> Make printk output different timestamps by adding options for no
> timestamp, the local hardware clock, the monotonic clock, the boottime
> clock, and the real clock.  Allow a user to pick one of the clocks by
> using the printk.time kernel parameter.  Output the type of clock in
> /sys/module/printk/parameters/time so userspace programs can interpret the
> timestamp.
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index fc47863f629c..3c46217629fd 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1202,14 +1205,144 @@ static inline void boot_delay_msec(int level)
>  }
>  #endif
>  
> -static bool printk_time = IS_ENABLED(CONFIG_PRINTK_TIME);
> -module_param_named(time, printk_time, bool, S_IRUGO | S_IWUSR);
> +static int printk_time = CONFIG_PRINTK_TIME_TYPE;
> +static int printk_time_source;
> +
> +/**
> + * enum timestamp_sources - Timestamp sources for printk() messages.
> + * @PRINTK_TIME_UNDEFINED: Timestamp undefined.  This option is not 
> selectable
> + * from the configs, and is used as a reference in the code.
> + * @PRINTK_TIME_DISABLE: No time stamp.
> + * @PRINTK_TIME_LOCAL: Local hardware clock timestamp.
> + * @PRINTK_TIME_BOOT: Boottime clock timestamp.
> + * @PRINTK_TIME_MONO: Monotonic clock timestamp.
> + * @PRINTK_TIME_REAL: Realtime clock timestamp.  On 32-bit
> + * systems selecting the real clock printk timestamp may lead to unlikely
> + * situations where a timestamp is wrong because the real time offset is read
> + * without the protection of a sequence lock when printk_get_ts() is set to
> + * printk_get_real_ns().
> + */
> +enum timestamp_sources {

I guess that this might be static. Also it is related to the
PRINTK_TIME_TYPE Kconfig option. So the name should be
enum printk_time_type or so.


> + PRINTK_TIME_UNDEFINED = 0,

I would use -1 for PRINTK_TIME_UNDEFINED or I would remove it
completely, see below. Then we could use 0 for PRINTK_TIME_DISABLED
which is more expected.


> + PRINTK_TIME_DISABLE = 1,

This should be PRINTK_TIME_DISABLED to be in sync with
the string and style of the other flags.


> + PRINTK_TIME_LOCAL = 2,
> + PRINTK_TIME_BOOT = 3,
> + PRINTK_TIME_MONO = 4,
> + PRINTK_TIME_REAL = 5,
> +};
> +
> +static const char * const timestamp_sources_str[6] = {
> + "undefined",

Do we really need the string "undefined"? Is user able
to see or enter it?


> + "disabled",
> + "local",
> + "boottime",
> + "monotonic",
> + "realtime",
> +};
> +
> +/**
> + * printk_get_real_ns: - Return a realtime timestamp for printk messages
> + * On 32-bit systems selecting the real clock printk timestamp may lead to
> + * unlikely situations where a timestamp is wrong because the real time 
> offset
> + * is read without the protection of a sequence lock.
> + */
> +static u64 printk_get_real_ns(void)
> +{
> + return ktime_get_mono_fast_ns() + ktime_get_real_offset();
> +}
> +
> +static u64 printk_set_timestamp(void)

I would rename this function to printk_set_ts_func() or so.
It will be more clear the it sets the function pointer.


> +{
> + switch (printk_time) {
> + case PRINTK_TIME_LOCAL:
> + case PRINTK_TIME_DISABLE:
> + default:
> + printk_get_ts = local_clock;

The PRINTK_TIME_DISABLE case is more complicated. We should set
printk_get_ts only when it has not been set yet. Otherwise,
we should keep storing the same type of time stamps into
to the log buffer and keep it consistent.


> + break;
> + case PRINTK_TIME_BOOT:
> + printk_get_ts = ktime_get_boot_fast_ns;
> + break;
> + case PRINTK_TIME_MONO:
> + printk_get_ts = ktime_get_mono_fast_ns;
> + break;
> + case PRINTK_TIME_REAL:
> + printk_get_ts = printk_get_real_ns;
> + break;
> + }
> + return printk_get_ts();

On one hand, this side effect helps to reuse the same code but it
looks too hacky to me. Another possibility would be to create:

static u64 printk_get_first_ts()
{
printk_set_ts_function();

return printk_get_ts();
}

and initialize:

static u64 (*printk_get_ts)(void) = printk_get_first_ts();

Another side effect is that we might not need PRINTK_TIME_UNDEFINED.


> +}
> +
> +static int printk_time_set(const char *val, const struct kernel_param *kp)
> +{
> + char *param = strstrip((char *)val);
> + int _printk_time = PRINTK_TIME_UNDEFINED;
> + int ts;
> +
> + if (strlen(param) == 1) {
> + /* Preserve legacy boolean settings */
> + if 

Re: [PATCH 2/2 v7] printk: Add monotonic, boottime, and realtime timestamps

2017-08-22 Thread Petr Mladek
On Tue 2017-08-22 10:09:40, Prarit Bhargava wrote:
> 
> 
> On 08/17/2017 11:30 AM, Mark Salyzyn wrote:
> > On 08/17/2017 06:15 AM, Prarit Bhargava wrote:
> >> printk.time=1/CONFIG_PRINTK_TIME=1 adds a unmodified local hardware clock
> >> timestamp to printk messages.  The local hardware clock loses time each
> >> day making it difficult to determine exactly when an issue has occurred in
> >> the kernel log, and making it difficult to determine how kernel and
> >> hardware issues relate to each other in real time.
> > Congrats, greatly eases merges into older kernels, and has eliminated the 
> > churn
> > this could place on all the configured systems out there.
> > 
> > Sadly, one of my suggestions did not quite go the way I expected ;-} easy to
> > correct, and fix (I missed a spot in my original suggestion, as code changed
> > underfoot over the set ;-/). (see bottom)
> 
> Added.  I will send out v8 shortly.

It might make sense to wait a bit. I am in the middle of v7 review and
have several comments.

I would suggest to slow down this a bit anyway. You sent 7 versions
within three weeks. It improved nicely over time. But I think that
I am not the only one who has troubles to follow.

Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 0/4] kmod: help make deterministic

2017-06-27 Thread Petr Mladek
On Tue 2017-06-27 02:27:44, Luis R. Rodriguez wrote:
> On Tue, Jun 27, 2017 at 12:44:42AM +0200, Luis R. Rodriguez wrote:
> > On Mon, Jun 26, 2017 at 11:37:36PM +0200, Jessica Yu wrote:
> > > +++ Kees Cook [20/06/17 17:23 -0700]:
> > > > On Tue, Jun 20, 2017 at 1:56 PM, Luis R. Rodriguez  
> > > > wrote:
> > > > > On Fri, May 26, 2017 at 02:12:24PM -0700, Luis R. Rodriguez wrote:
> > > > > > Luis R. Rodriguez (4):
> > > > > >   module: use list_for_each_entry_rcu() on find_module_all()
> > > > > >   kmod: reduce atomic operations on kmod_concurrent and simplify
> > > > > >   kmod: add test driver to stress test the module loader
> > > > > >   kmod: throttle kmod thread limit
> > > > > 
> > > > > About a month now with no further nitpicks. What tree should these 
> > > > > changes
> > > > > go through if there are no issues? Andrew's, Jessica's ?
> > > > 
> > > > Seems like going through Jessica's would make the most sense?
> > > 
> > > Would be happy to take patches 01 (which I need to anyway), 02,
> > > possibly 04 if decoupled from the test driver (03).
> > 
> > Feel free to decouple it, but note that then the commit log must then be
> > changed. My own take is this fix is not so critical as it is a corner case, 
> > so
> > I have instead preferred to couple in the test case and respective fix
> > together. I'll leave it up to you how to proceed.
> 
> Note: Linus noted swait is actually very special use-case [0] so I'd hate to
> add a new use case not vetted for. This use case on kmod.c really does *not*
> require anything but a simple wait though, so really am inclined to let that
> through unless I hear back...
> 
> [0] https://lkml.kernel.org/r/20170627001534.gk21...@wotan.suse.de

Heh, I was not aware of this special case either. The welcoming
comment of the swait API confused me as well.

In this light, I suggest to switch the patch to using the normal wait API.

Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 4/4] kmod: throttle kmod thread limit

2017-06-26 Thread Petr Mladek
On Fri 2017-06-23 12:20:11, Luis R. Rodriguez wrote:
> If we reach the limit of modprobe_limit threads running the next
> request_module() call will fail. The original reason for adding
> a kill was to do away with possible issues with in old circumstances
> which would create a recursive series of request_module() calls.
> 
> We can do better than just be super aggressive and reject calls
> once we've reached the limit by simply making pending callers wait
> until the threshold has been reduced, and then throttling them in,
> one by one.
> 
> This throttling enables requests over the kmod concurrent limit to
> be processed once a pending request completes. Only the first item
> queued up to wait is woken up. The assumption here is once a task
> is woken it will have no other option to also kick the queue to check
> if there are more pending tasks -- regardless of whether or not it
> was successful.
> 
> By throttling and processing only max kmod concurrent tasks we ensure
> we avoid unexpected fatal request_module() calls, and we keep memory
> consumption on module loading to a minimum.
> 
> With x86_64 qemu, with 4 cores, 4 GiB of RAM it takes the following run
> time to run both tests:
> 
> time ./kmod.sh -t 0008
> real0m16.523s
> user0m0.879s
> sys 0m8.977s
> 
> time ./kmod.sh -t 0009
> real0m56.080s
> user0m0.717s
> sys 0m10.324s
> 
> Signed-off-by: Luis R. Rodriguez 

All the changes look fine to me. They make perfect sense.

Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/4] kmod: reduce atomic operations on kmod_concurrent and simplify

2017-06-26 Thread Petr Mladek
On Fri 2017-05-26 14:12:26, Luis R. Rodriguez wrote:
> atomic operations. We do this by inverting the logic of of the enabler,
> instead of incrementing kmod_concurrent as we get new kmod users, define the
> variable kmod_concurrent_max as the max number of currently allowed kmod
> users and as we get new kmod users just decrement it if its still positive.
> This combines the dec and read in one atomic operation.
> 
> In this case we no longer get the same false failure:
> 
> CPU1  CPU2
> atomic_dec_if_positive()
>   atomic_dec_if_positive()
> atomic_inc()
>   atomic_inc()
> 
> Suggested-by: Petr Mladek <pmla...@suse.com>
> Suggested-by: Dmitry Torokhov <dmitry.torok...@gmail.com>
> Signed-off-by: Luis R. Rodriguez <mcg...@kernel.org>

The change looks fine to me. The code is much easier and less hacky.

Reviewed-by: Petr Mladek <pmla...@suse.com>

Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 4/4] kmod: throttle kmod thread limit

2017-06-26 Thread Petr Mladek
On Fri 2017-06-23 21:16:37, Luis R. Rodriguez wrote:
> On Fri, Jun 23, 2017 at 07:56:11PM +0200, Luis R. Rodriguez wrote:
> > On Fri, Jun 23, 2017 at 06:16:19PM +0200, Luis R. Rodriguez wrote:
> > > On Thu, Jun 22, 2017 at 05:19:36PM +0200, Petr Mladek wrote:
> > > > On Fri 2017-05-26 14:12:28, Luis R. Rodriguez wrote:
> > > > > --- a/kernel/kmod.c
> > > > > +++ b/kernel/kmod.c
> > > > > @@ -178,6 +175,7 @@ int __request_module(bool wait, const char *fmt, 
> > > > > ...)
> > > > >   ret = call_modprobe(module_name, wait ? UMH_WAIT_PROC : 
> > > > > UMH_WAIT_EXEC);
> > > > >  
> > > > >   atomic_inc(_concurrent_max);
> > > > > + wake_up_all(_wq);
> > > > 
> > > > Does it make sense to wake up all waiters when we released the resource
> > > > only for one? IMHO, a simple wake_up() should be here.
> > > 
> > > Then we should wake_up() also on failure, otherwise we have the potential
> > > to not wake some in a proper time.
> > 
> > I checked and it turns out we have no error paths after we consume a kmod
> > ticket, if you will. Once we bump with atomic_dec_if_positive() we assume
> > we're moving forward with an attempt, and the only failure path is already
> > bundled with a wake at the end of the __request_module() call.
> > 
> > Then the next question would be *who* exactly gets woken up next if we just
> > use wake_up() ? The common core wake up code varies depending on use and
> > all this reminded me of the complexity we just don't need, so I have now
> > converted to use swait. swait uses list_add() if empty and then iterates
> > with list_first_entry() on wakeup, so that should get the first item added
> > to the wait list.
> > 
> > Works with me. Will run a test a before v4 is sent, but since only 2 patches
> > are modified will only send a respective update for these 2 patches.
> 
> Alright, this worked out well! Its just a tiny bit slower on test cases 0008
> and 0009 (few seconds) but that's fine, its natural due to the lack of the
> swake_up_all().

This is interesting. I guess that it was faster with swake_up_all()
because it worked as a speculative pre-wake. I mean that it takes some
time between adding a process into run-queue and really running it.
IMHO, swake_up_all() caused that __request_module() callers were
more often really running and trying to pass that
atomic_dec_if_positive(_concurrent_max) >= 0).

Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 4/4] kmod: throttle kmod thread limit

2017-06-26 Thread Petr Mladek
On Fri 2017-06-23 18:16:19, Luis R. Rodriguez wrote:
> On Thu, Jun 22, 2017 at 05:19:36PM +0200, Petr Mladek wrote:
> > On Fri 2017-05-26 14:12:28, Luis R. Rodriguez wrote:
> > > --- a/kernel/kmod.c
> > > +++ b/kernel/kmod.c
> > > @@ -163,14 +163,11 @@ int __request_module(bool wait, const char *fmt, 
> > > ...)
> > >   return ret;
> > >  
> > >   if (atomic_dec_if_positive(_concurrent_max) < 0) {
> > > - /* We may be blaming an innocent here, but unlikely */
> > > - if (kmod_loop_msg < 5) {
> > > - printk(KERN_ERR
> > > -"request_module: runaway loop modprobe %s\n",
> > > -module_name);
> > > - kmod_loop_msg++;
> > > - }
> > > - return -ENOMEM;
> > > + pr_warn_ratelimited("request_module: kmod_concurrent_max (%u) 
> > > close to 0 (max_modprobes: %u), for module %s\n, throttling...",
> > > + atomic_read(_concurrent_max),
> > > + 50, module_name);
> > 
> > It is weird to pass the constant '50' via %s.
> 
> The 50 was passed with %u, so I take it you meant it is odd to use a parameter
> for it.

Yeah, I meant %u and not %s.

> > Also a #define should be
> > used to keep it in sync with the kmod_concurrent_max initialization.
> 
> OK.
> 
> > > + wait_event_interruptible(kmod_wq,
> > > +  
> > > atomic_dec_if_positive(_concurrent_max) >= 0);
> > >   }
> > >  
> > >   trace_module_request(module_name, wait, _RET_IP_);
> > > @@ -178,6 +175,7 @@ int __request_module(bool wait, const char *fmt, ...)
> > >   ret = call_modprobe(module_name, wait ? UMH_WAIT_PROC : UMH_WAIT_EXEC);
> > >  
> > >   atomic_inc(_concurrent_max);
> > > + wake_up_all(_wq);
> > 
> > Does it make sense to wake up all waiters when we released the resource
> > only for one? IMHO, a simple wake_up() should be here.
> 
> Then we should wake_up() also on failure, otherwise we have the potential
> to not wake some in a proper time.

I think that we must wake_up() always when we increment
kmod_concurrent_max. If the value was negative, the increment will
allow exactly one process to pass that
atomic_dec_if_positive(_concurrent_max) >= 0). It the value
is positive, there must have been other wake_up() calls or there
is no waiter.

IMHO, this works because kmod_concurrent_max handling is atomic
and race-less now. Also (s)wait_event_interruptible() is safe
and does not allow to get into sleep when the resource is available.

Anyway, it is great that you have double checked this.

Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 4/4] kmod: throttle kmod thread limit

2017-06-22 Thread Petr Mladek
On Fri 2017-05-26 14:12:28, Luis R. Rodriguez wrote:
> If we reach the limit of modprobe_limit threads running the next
> request_module() call will fail. The original reason for adding
> a kill was to do away with possible issues with in old circumstances
> which would create a recursive series of request_module() calls.
> We can do better than just be super aggressive and reject calls
> once we've reached the limit by simply making pending callers wait
> until the threshold has been reduced.
> 
> The only difference is the clutch helps with avoiding making
> request_module() requests fatal more often. With x86_64 qemu,
> with 4 cores, 4 GiB of RAM it takes the following run time to
> run both tests:
> 
> time ./kmod.sh -t 0008
> real0m12.364s
> user0m0.704s
> sys 0m5.373s
> 
> time ./kmod.sh -t 0009
> real0m47.638s
> user0m1.033s
> sys 0m5.425s
> 
> Signed-off-by: Luis R. Rodriguez 
> ---
>  kernel/kmod.c| 16 +++-
>  tools/testing/selftests/kmod/kmod.sh | 24 ++--
>  2 files changed, 9 insertions(+), 31 deletions(-)
> 
> diff --git a/kernel/kmod.c b/kernel/kmod.c
> index 3e346c700e80..46b12fed6fd0 100644
> --- a/kernel/kmod.c
> +++ b/kernel/kmod.c
> @@ -163,14 +163,11 @@ int __request_module(bool wait, const char *fmt, ...)
>   return ret;
>  
>   if (atomic_dec_if_positive(_concurrent_max) < 0) {
> - /* We may be blaming an innocent here, but unlikely */
> - if (kmod_loop_msg < 5) {
> - printk(KERN_ERR
> -"request_module: runaway loop modprobe %s\n",
> -module_name);
> - kmod_loop_msg++;
> - }
> - return -ENOMEM;
> + pr_warn_ratelimited("request_module: kmod_concurrent_max (%u) 
> close to 0 (max_modprobes: %u), for module %s\n, throttling...",
> + atomic_read(_concurrent_max),
> + 50, module_name);

It is weird to pass the constant '50' via %s. Also a #define should be
used to keep it in sync with the kmod_concurrent_max initialization.


> + wait_event_interruptible(kmod_wq,
> +  
> atomic_dec_if_positive(_concurrent_max) >= 0);
>   }
>  
>   trace_module_request(module_name, wait, _RET_IP_);
> @@ -178,6 +175,7 @@ int __request_module(bool wait, const char *fmt, ...)
>   ret = call_modprobe(module_name, wait ? UMH_WAIT_PROC : UMH_WAIT_EXEC);
>  
>   atomic_inc(_concurrent_max);
> + wake_up_all(_wq);

Does it make sense to wake up all waiters when we released the resource
only for one? IMHO, a simple wake_up() should be here.

I am sorry for the late review. The month ran really fast.

Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/6] kmod: preempt on kmod_umh_threads_get()

2017-05-25 Thread Petr Mladek
On Wed 2017-05-24 19:27:38, Dmitry Torokhov wrote:
> On Thu, May 25, 2017 at 03:00:17AM +0200, Luis R. Rodriguez wrote:
> > On Wed, May 24, 2017 at 05:45:37PM -0700, Dmitry Torokhov wrote:
> > > On Thu, May 25, 2017 at 02:14:52AM +0200, Luis R. Rodriguez wrote:
> > > > On Fri, May 19, 2017 at 03:27:12PM -0700, Dmitry Torokhov wrote:
> > > > > On Thu, May 18, 2017 at 08:24:43PM -0700, Luis R. Rodriguez wrote:
> > > > > > In theory it is possible multiple concurrent threads will try to
> > > > > > kmod_umh_threads_get() and as such atomic_inc(_concurrent) at
> > > > > > the same time, therefore enabling a small time during which we've
> > > > > > bumped kmod_concurrent but have not really enabled work. By using
> > > > > > preemption we mitigate this a bit.
> > > > > > 
> > > > > > Preemption is not needed when we kmod_umh_threads_put().
> > > > > > 
> > > > > > Signed-off-by: Luis R. Rodriguez 
> > > > > > ---
> > > > > >  kernel/kmod.c | 24 ++--
> > > > > >  1 file changed, 22 insertions(+), 2 deletions(-)
> > > > > > 
> > > > > > diff --git a/kernel/kmod.c b/kernel/kmod.c
> > > > > > index 563600fc9bb1..7ea11dbc7564 100644
> > > > > > --- a/kernel/kmod.c
> > > > > > +++ b/kernel/kmod.c
> > > > > > @@ -113,15 +113,35 @@ static int call_modprobe(char *module_name, 
> > > > > > int wait)
> > > > > >  
> > > > > >  static int kmod_umh_threads_get(void)
> > > > > >  {
> > > > > > +   int ret = 0;
> > > > > > +
> > > > > > +   /*
> > > > > > +* Disabling preemption makes sure that we are not rescheduled 
> > > > > > here
> > > > > > +*
> > > > > > +* Also preemption helps kmod_concurrent is not increased by 
> > > > > > mistake
> > > > > > +* for too long given in theory two concurrent threads could 
> > > > > > race on
> > > > > > +* atomic_inc() before we atomic_read() -- we know that's 
> > > > > > possible
> > > > > > +* and but we don't care, this is not used for object 
> > > > > > accounting and
> > > > > > +* is just a subjective threshold. The alternative is a lock.
> > > > > > +*/
> > > > > > +   preempt_disable();
> > > > > > atomic_inc(_concurrent);
> > > > > > if (atomic_read(_concurrent) <= max_modprobes)
> > > > > 
> > > > > That is very "fancy" way to basically say:
> > > > > 
> > > > >   if (atomic_inc_return(_concurrent) <= max_modprobes)
> > > > 
> > > > Do you mean to combine the atomic_inc() and atomic_read() in one as you 
> > > > noted
> > > > (as that is not a change in this patch), *or* that using a memory 
> > > > barrier here
> > > > with atomic_inc_return() should suffice to address the same and avoid an
> > > > explicit preemption  enable / disable ?
> > > 
> > > I am saying that atomic_inc_return() will avoid situation where you have
> > > more than one threads incrementing the counter and believing that they
> > > are [not] allowed to start modprobe.
> > > 
> > > I have no idea why you think preempt_disable() would help here. It only
> > > ensures that current thread will not be preempted between the point
> > > where you update the counter and where you check the result. It does not
> > > stop interrupts nor does it affect other threads that might be updating
> > > the same counter.
> > 
> > The preemption was inspired by __module_get() and try_module_get(), was that
> > rather silly ?
> 
> As far as I can see prrempt_disable() was needed in __module_get() when
> modules user per-cpu refcounts: you did not want to move away from CPU
> while manipulating refcount.
> 
> Now that modules use simple atomics for refcounting I think these
> preempt_disable() and preempt_enable() can be removed.

preempt_disable() still might be useful because you do the
atomic_dec() when you reach the limit.

By other words, you have three operations that should be atomic:
inc, read, and dec. atomic_inc_return() covers only two of them.

Hmm, a solution might be to use atomic_dec_if_positive().
I would kmod_concurrent to something like kmod_concurrent_allowed,
intialize it with the maximum allowed number. Then you could do:

static int kmod_umh_threads_get(void)
{
if (atomic_dec_if_positive(kmod_concurrent_available) < 0)
return -EBUSY;
return 0;
}

Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Documentation/livepatch: remove the limitation for schedule() patching

2017-01-06 Thread Petr Mladek
On Fri 2017-01-06 15:00:45, Miroslav Benes wrote:
> The Limitations section of the documentation describes the impossibility
> to livepatch anything that is inlined to __schedule() function. This had
> been true till 4.9 kernel came. Thanks to commit 0100301bfdf5
> ("sched/x86: Rewrite the switch_to() code") from Brian Gerst there is
> __switch_to_asm function now (implemented in assembly) called properly
> from context_switch(). RIP is thus saved on the stack and a task would
> return to proper version of __schedule() et al. functions.
> 
> Of course __switch_to_asm() is not patchable for the reason described in
> the section. But there is no __fentry__ call and I cannot imagine a
> reason to do it anyway.
> 
> Therefore, remove the paragraphs from the section.
> 
> Signed-off-by: Miroslav Benes <mbe...@suse.cz>

It is great to get a feature for free ;-)

Reviewed-by: Petr Mladek <pmla...@suse.com>

Best Regards,
Petr

---
> FWIW, I also tested this to be sure on top of the consistency model
> patch set. I patched schedule() function which calls __schedule() (it is
> impossible to patch it directly due to notrace attribute). It works well
> except...
> 
> 1. the patching process does not finish, because many tasks sleep in
> schedule. STOP/CONT signal does not help. I'll investigate.

Are these userspace processes or kthreads? Kthreads would cause
problems because they do not handle signals.


> 2. reversion of the process does not work as expected. The kernel
> crashes after the removal of the module. A task very likely slept in
> schedule and was not migrated properly. It might be because of the races
> in klp_reverse_transition() described by Petr, or might be somewhere
> else. I'll look into it.

I hope that I will be able to do another dive into the consistency
model patchset the following week.

Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 09/10] kmod: add helpers for getting kmod count and limit

2016-12-15 Thread Petr Mladek
On Thu 2016-12-08 11:49:20, Luis R. Rodriguez wrote:
> This adds helpers for getting access to the kmod count and limit from
> userspace. While at it, this also lets userspace fine tune the kmod
> limit after boot, it uses the shiny new proc_douintvec_minmax().
> 
> These knobs should help userspace more gracefully and deterministically
> handle module loading.
>
> Signed-off-by: Luis R. Rodriguez 
> ---
>  include/linux/kmod.h |  8 +
>  kernel/kmod.c| 83 
> ++--
>  kernel/sysctl.c  | 14 +
>  3 files changed, 103 insertions(+), 2 deletions(-)

I am not sure if it is worth it. As you say in the 3rd patch,
there was rather low limit for 16 years and nobody probably had
problems with it.

Anyway, it seems that such know should also get documented in
Documentation/sysctl/kernel.txt

Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: livepatch/x86: apply alternatives and paravirt patches after relocations

2016-08-19 Thread Petr Mladek
On Thu 2016-08-18 14:03:13, Jessica Yu wrote:
> +++ Petr Mladek [18/08/16 11:51 +0200]:
> >On Wed 2016-08-17 20:58:29, Jessica Yu wrote:
> >>Implement arch_klp_init_object_loaded() for x86, which applies
> >>alternatives/paravirt patches. This fixes the order in which relocations
> >>and alternatives/paravirt patches are applied.
> >>
> >>--- /dev/null
> >>+++ b/arch/x86/kernel/livepatch.c
> >>+   for (s = info->sechdrs; s < info->sechdrs + info->hdr.e_shnum; s++) {
> >>+   /* Apply per-object .klp.arch sections */
> >>+   cnt = sscanf(info->secstrings + s->sh_name,
> >>+".klp.arch.%55[^.].%127s",
> >>+sec_objname, secname);
> >>+   if (cnt != 2)
> >>+   continue;
> >>+   if (strcmp(sec_objname, objname))
> >>+   continue;
> >>+   if (!strcmp(".altinstructions", secname))
> >
> >The previous version of the patch compared against "altinstructions"
> >(without the dot). I admit that I haven't tested it but the dot
> >looks suspicious here.
> 
> Good eye, I should have explained why the dot is needed in the strcmp..
> So, the new documentation states that any arch-specific sections to
> be applied by livepatch are to be prefixed with the string
> ".klp.arch.$objname.", note the required dot at the end of this prefix.
> 
> So for example, if we have a .parainstructions section with a patch
> for the kvm module, the prefixed section name would look like:
> 
>   .klp.arch.kvm..parainstructions
>   ^   prefix   ^^ original name ^
> 
> That extra dot looks weird, but it is needed when we have section names
> like "__ftr_fixup" on powerpc. Without the extra dot at the end of
> ".klp.arch.$objname." We'd get names like ".klp.arch.$objname__ftr_fixup",
> and we wouldn't be able to tell where the objname ends and where the
> section name begins. But with ".klp.arch.$objname.__ftr_fixup", we
> have a hard delimeter and know that after the dot after $objname comes
> the original section name.

That is a bit unfortunate but it makes perfect sense.
Thanks a lot for explanation.

Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 3/3] Documentation: livepatch: add section about arch-specific code

2016-08-18 Thread Petr Mladek
On Wed 2016-08-17 20:58:30, Jessica Yu wrote:
> Document usage of arch-specific elf sections in livepatch as well
> as implementation of arch-specific code.
> 
> Signed-off-by: Jessica Yu <j...@redhat.com>
> ---
>  Documentation/livepatch/module-elf-format.txt | 20 
>  1 file changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/livepatch/module-elf-format.txt 
> b/Documentation/livepatch/module-elf-format.txt
> index eedbdcf..02bfafa 100644
> --- a/Documentation/livepatch/module-elf-format.txt
> +++ b/Documentation/livepatch/module-elf-format.txt
> @@ -25,7 +25,8 @@ Table of Contents
> 3.3.2 Required name format
> 3.3.3 Example livepatch symbol names
> 3.3.4 Example `readelf --symbols` output
> -4. Symbol table and Elf section access
> +4. Architecture-specific sections
> +5. Symbol table and Elf section access
>  
>  
>  0. Background and motivation
> @@ -46,7 +47,7 @@ architecture.
>  
>  Since apply_relocate_add() requires access to a module's section header
>  table, symbol table, and relocation section indices, Elf information is
> -preserved for livepatch modules (see section 4). Livepatch manages its own
> +preserved for livepatch modules (see section 5). Livepatch manages its own
>  relocation sections and symbols, which are described in this document. The
>  Elf constants used to mark livepatch symbols and relocation sections were
>  selected from OS-specific ranges according to the definitions from glibc.
> @@ -117,7 +118,7 @@ also possible for a livepatch module to have no livepatch 
> relocation
>  sections, as in the case of the sample livepatch module (see
>  samples/livepatch).
>  
> -Since Elf information is preserved for livepatch modules (see Section 4), a
> +Since Elf information is preserved for livepatch modules (see Section 5), a
>  livepatch relocation section can be applied simply by passing in the
>  appropriate section index to apply_relocate_add(), which then uses it to
>  access the relocation section and apply the relocations.
> @@ -292,8 +293,19 @@ Symbol table '.symtab' contains 127 entries:
>  [*] Note that the 'Ndx' (Section index) for these symbols is SHN_LIVEPATCH 
> (0xff20).
>  "OS" means OS-specific.
>  
> +-
> +4. Architecture-specific sections
> +-
> +Architectures may override arch_klp_init_object_loaded() to perform
> +additional arch-specific tasks when a target module loads, such as applying
> +arch-specific sections. On x86 for example, we must apply per-object
> +.altinstructions and .parainstructions sections when a target module loads.
> +These sections can be prefixed with ".klp.arch.$objname." so that they can
  ^^^

I would personally use "must" instead of "can". Or replace "can be prefixed"
with "are prefixed".

Otherwise, it looks fine.

Reviewed-by: Petr Mladek <pmla...@suse.com>

Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 1/3] livepatch: use arch_klp_init_object_loaded() to finish arch-specific tasks

2016-08-18 Thread Petr Mladek
On Wed 2016-08-17 20:58:28, Jessica Yu wrote:
> Introduce arch_klp_init_object_loaded() to complete any additional
> arch-specific tasks during patching. Architecture code may override this
> function.
> 
> Signed-off-by: Jessica Yu <j...@redhat.com>

Reviewed-by: Petr Mladek <pmla...@suse.com>

Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/3] livepatch/x86: apply alternatives and paravirt patches after relocations

2016-08-18 Thread Petr Mladek
On Wed 2016-08-17 20:58:29, Jessica Yu wrote:
> Implement arch_klp_init_object_loaded() for x86, which applies
> alternatives/paravirt patches. This fixes the order in which relocations
> and alternatives/paravirt patches are applied.
> 
> Previously, if a patch module had alternatives or paravirt patches,
> these were applied first by the module loader before livepatch can apply
> per-object relocations. The (buggy) sequence of events was:
> 
> (1) Load patch module
> (2) Apply alternatives and paravirt patches to patch module
> * Note that these are applied to the new functions in the patch module
> (3) Apply per-object relocations to patch module when target module loads.
> * This clobbers what was written in step 2
> 
> This lead to crashes and corruption in general, since livepatch would
> overwrite or step on previously applied alternative/paravirt patches.
> The correct sequence of events should be:
> 
> (1) Load patch module
> (2) Apply per-object relocations to patch module
> (3) Apply alternatives and paravirt patches to patch module
> 
> This is fixed by delaying paravirt/alternatives patching until after
> relocations are applied. Any .altinstructions or .parainstructions
> sections are prefixed with ".klp.arch.${objname}" and applied in
> arch_klp_init_object_loaded().
> 
> Signed-off-by: Jessica Yu 
> ---
>  arch/x86/kernel/Makefile|  1 +
>  arch/x86/kernel/livepatch.c | 65 
> +
>  2 files changed, 66 insertions(+)
>  create mode 100644 arch/x86/kernel/livepatch.c
> 
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index d3f49c3..92fd50c 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -81,6 +81,7 @@ obj-$(CONFIG_X86_MPPARSE)   += mpparse.o
>  obj-y+= apic/
>  obj-$(CONFIG_X86_REBOOTFIXUPS)   += reboot_fixups_32.o
>  obj-$(CONFIG_DYNAMIC_FTRACE) += ftrace.o
> +obj-$(CONFIG_LIVEPATCH)  += livepatch.o
>  obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += ftrace.o
>  obj-$(CONFIG_FTRACE_SYSCALLS)+= ftrace.o
>  obj-$(CONFIG_X86_TSC)+= trace_clock.o
> diff --git a/arch/x86/kernel/livepatch.c b/arch/x86/kernel/livepatch.c
> new file mode 100644
> index 000..e9d252d
> --- /dev/null
> +++ b/arch/x86/kernel/livepatch.c
> @@ -0,0 +1,65 @@
> +/*
> + * livepatch.c - x86-specific Kernel Live Patching Core
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see .
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/* Apply per-object alternatives. Based on x86 module_finalize() */
> +void arch_klp_init_object_loaded(struct klp_patch *patch,
> +  struct klp_object *obj)
> +{
> + int cnt;
> + struct klp_modinfo *info;
> + Elf_Shdr *s, *alt = NULL, *para = NULL;
> + void *aseg, *pseg;
> + const char *objname;
> + char sec_objname[MODULE_NAME_LEN];
> + char secname[KSYM_NAME_LEN];
> +
> + info = patch->mod->klp_info;
> + objname = obj->name ? obj->name : "vmlinux";
> +
> + /* See livepatch core code for BUILD_BUG_ON() explanation */
> + BUILD_BUG_ON(MODULE_NAME_LEN < 56 || KSYM_NAME_LEN != 128);
> +
> + for (s = info->sechdrs; s < info->sechdrs + info->hdr.e_shnum; s++) {
> + /* Apply per-object .klp.arch sections */
> + cnt = sscanf(info->secstrings + s->sh_name,
> +  ".klp.arch.%55[^.].%127s",
> +  sec_objname, secname);
> + if (cnt != 2)
> + continue;
> + if (strcmp(sec_objname, objname))
> + continue;
> + if (!strcmp(".altinstructions", secname))

The previous version of the patch compared against "altinstructions"
(without the dot). I admit that I haven't tested it but the dot
looks suspicious here.

> + alt = s;
> + if (!strcmp(".parainstructions", secname))

Same here.

Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] capabilities: add capability cgroup controller

2016-07-08 Thread Petr Mladek
On Thu 2016-07-07 20:27:13, Topi Miettinen wrote:
> On 07/07/16 09:16, Petr Mladek wrote:
> > On Sun 2016-07-03 15:08:07, Topi Miettinen wrote:
> >> The attached patch would make any uses of capabilities generate audit
> >> messages. It works for simple tests as you can see from the commit
> >> message, but unfortunately the call to audit_cgroup_list() deadlocks the
> >> system when booting a full blown OS. There's no deadlock when the call
> >> is removed.
> >>
> >> I guess that in some cases, cgroup_mutex and/or css_set_lock could be
> >> already held earlier before entering audit_cgroup_list(). Holding the
> >> locks is however required by task_cgroup_from_root(). Is there any way
> >> to avoid this? For example, only print some kind of cgroup ID numbers
> >> (are there unique and stable IDs, available without locks?) for those
> >> cgroups where the task is registered in the audit message?
> > 
> > I am not sure if anyone know what really happens here. I suggest to
> > enable lockdep. It might detect possible deadlock even before it
> > really happens, see Documentation/locking/lockdep-design.txt
> > 
> > It can be enabled by
> > 
> >CONFIG_PROVE_LOCKING=y
> > 
> > It depends on
> > 
> > CONFIG_DEBUG_KERNEL=y
> > 
> > and maybe some more options, see lib/Kconfig.debug
> 
> Thanks a lot! I caught this stack dump:
> 
> starting version 230
> [3.416647] [ cut here ]
> [3.417310] WARNING: CPU: 0 PID: 95 at
> /home/topi/d/linux.git/kernel/locking/lockdep.c:2871
> lockdep_trace_alloc+0xb4/0xc0
> [3.417605] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
> [3.417923] Modules linked in:
> [3.418288] CPU: 0 PID: 95 Comm: systemd-udevd Not tainted 4.7.0-rc5+ #97
> [3.418444] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS Debian-1.8.2-1 04/01/2014
> [3.418726]  0086 7970f3b0 8816fb00
> 813c9c45
> [3.418993]  8816fb50  8816fb40
> 81091e9b
> [3.419176]  0b3705e2c798 0046 0410
> 
> [3.419374] Call Trace:
> [3.419511]  [] dump_stack+0x67/0x92
> [3.419644]  [] __warn+0xcb/0xf0
> [3.419745]  [] warn_slowpath_fmt+0x5f/0x80
> [3.419868]  [] lockdep_trace_alloc+0xb4/0xc0
> [3.419988]  [] kmem_cache_alloc_node+0x42/0x600
> [3.420156]  [] ? debug_lockdep_rcu_enabled+0x1d/0x20
> [3.420170]  [] __alloc_skb+0x5b/0x1d0
> [3.420170]  [] audit_log_start+0x29b/0x480
> [3.420170]  [] ? __lock_task_sighand+0x95/0x270
> [3.420170]  [] audit_log_cap_use+0x39/0xf0
> [3.420170]  [] ns_capable+0x45/0x70
> [3.420170]  [] capable+0x17/0x20
> [3.420170]  [] oom_score_adj_write+0x150/0x2f0
> [3.420170]  [] __vfs_write+0x37/0x160
> [3.420170]  [] ? update_fast_ctr+0x17/0x30
> [3.420170]  [] ? percpu_down_read+0x49/0x90
> [3.420170]  [] ? __sb_start_write+0xb7/0xf0
> [3.420170]  [] ? __sb_start_write+0xb7/0xf0
> [3.420170]  [] vfs_write+0xb8/0x1b0
> [3.420170]  [] ? __fget_light+0x66/0x90
> [3.420170]  [] SyS_write+0x58/0xc0
> [3.420170]  [] do_syscall_64+0x5c/0x300
> [3.420170]  [] entry_SYSCALL64_slow_path+0x25/0x25
> [3.420170] ---[ end trace fb586899fb556a5e ]---
> [3.447922] random: systemd-udevd urandom read with 3 bits of entropy
> available
> [4.014078] clocksource: Switched to clocksource tsc
> Begin: Loading essential drivers ... done.
> 
> This is with qemu and the boot continues normally. With real computer,
> there's no such output and system just seems to freeze.
> 
> Could it be possible that the deadlock happens because there's some IO
> towards /sys/fs/cgroup, which causes a capability check and that in turn
> causes locking problems when we try to print cgroup list?

The above warning is printed by the code from
kernel/locking/lockdep.c:2871

static void __lockdep_trace_alloc(gfp_t gfp_mask, unsigned long flags)
{
[...]
/* We're only interested __GFP_FS allocations for now */
if (!(gfp_mask & __GFP_FS))
return;

/*
 * Oi! Can't be having __GFP_FS allocations with IRQs disabled.
 */
if (DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags)))
return;


The backtrace shows that your new audit_log_cap_use() is called
from vfs_write(). You might try to use audit_log_start() with
GFP_NOFS instead of GFP_KERNEL.

Note that this is rather intuitive advice. I still need to learn a lot
about memory management and kernel in general to be more sure about
a correct solution.

Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] capabilities: add capability cgroup controller

2016-07-07 Thread Petr Mladek
On Sun 2016-07-03 15:08:07, Topi Miettinen wrote:
> The attached patch would make any uses of capabilities generate audit
> messages. It works for simple tests as you can see from the commit
> message, but unfortunately the call to audit_cgroup_list() deadlocks the
> system when booting a full blown OS. There's no deadlock when the call
> is removed.
> 
> I guess that in some cases, cgroup_mutex and/or css_set_lock could be
> already held earlier before entering audit_cgroup_list(). Holding the
> locks is however required by task_cgroup_from_root(). Is there any way
> to avoid this? For example, only print some kind of cgroup ID numbers
> (are there unique and stable IDs, available without locks?) for those
> cgroups where the task is registered in the audit message?

I am not sure if anyone know what really happens here. I suggest to
enable lockdep. It might detect possible deadlock even before it
really happens, see Documentation/locking/lockdep-design.txt

It can be enabled by

   CONFIG_PROVE_LOCKING=y

It depends on

CONFIG_DEBUG_KERNEL=y

and maybe some more options, see lib/Kconfig.debug


Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 4/6] livepatch: reuse module loader code to write relocations

2016-03-21 Thread Petr Mladek
On Wed 2016-03-16 15:47:06, Jessica Yu wrote:
> Reuse module loader code to write relocations, thereby eliminating the need
> for architecture specific relocation code in livepatch. Specifically, reuse
> the apply_relocate_add() function in the module loader to write relocations
> instead of duplicating functionality in livepatch's arch-dependent
> klp_write_module_reloc() function.
> 
> In order to accomplish this, livepatch modules manage their own relocation
> sections (marked with the SHF_RELA_LIVEPATCH section flag) and
> livepatch-specific symbols (marked with SHN_LIVEPATCH symbol section
> index). To apply livepatch relocation sections, livepatch symbols
> referenced by relocs are resolved and then apply_relocate_add() is called
> to apply those relocations.
> 
> In addition, remove x86 livepatch relocation code and the s390
> klp_write_module_reloc() function stub. They are no longer needed since
> relocation work has been offloaded to module loader.

Most of the problems were covered by Mirek and Josh. I agree with
them. Please read two more comments below.

> diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
> index 780f00c..2aa20fa 100644
> --- a/kernel/livepatch/core.c
> +++ b/kernel/livepatch/core.c
> +static int klp_resolve_symbols(Elf_Shdr *relasec, struct module *pmod)
> +{
> + int i, cnt, vmlinux, ret;
> + struct klp_buf bufs = {0};
> + Elf_Rela *relas;
> + Elf_Sym *sym;
> + char *symname;
> + unsigned long sympos;
> +
> + relas = (Elf_Rela *) relasec->sh_addr;
> + /* For each rela in this klp relocation section */
> + for (i = 0; i < relasec->sh_size / sizeof(Elf_Rela); i++) {
> + sym = pmod->core_kallsyms.symtab + ELF_R_SYM(relas[i].r_info);
> + if (sym->st_shndx != SHN_LIVEPATCH)
> + return -EINVAL;
> +
> + klp_clear_buf();
> +
> + /* Format: .klp.sym.objname.symbol_name,sympos */
> + symname = pmod->core_kallsyms.strtab + sym->st_name;
> + cnt = sscanf(symname, ".klp.sym.%64[^.].%128[^,],%lu",
> +  bufs.objname, bufs.symname, );

Note that MODULE_NAME_LEN even is not 64. It is defined by:

#define MAX_PARAM_PREFIX_LEN (64 - sizeof(unsigned long))

I strongly suggest to use the proposal from Josh.


> + if (cnt != 3)
> + return -EINVAL;
> +
> + /* klp_find_object_symbol() treats a NULL objname as vmlinux */
> + vmlinux = !strcmp(bufs.objname, "vmlinux");
> + ret = klp_find_object_symbol(vmlinux ? NULL : bufs.objname,
> +  bufs.symname, sympos,
> +  (unsigned long *) >st_value);
> + if (ret)
> + return ret;
>   }
> - preempt_enable();
>  
> - /*
> -  * Check if it's in another .o within the patch module. This also
> -  * checks that the external symbol is unique.
> -  */
> - return klp_find_object_symbol(pmod->name, name, 0, addr);
> + return 0;
>  }

[...]
> @@ -842,6 +867,9 @@ int klp_register_patch(struct klp_patch *patch)
>  {
>   int ret;
>  
> + if (!is_livepatch_module(patch->mod))
> + return -EINVAL;
> +

This breaks bisectability if livepatch-sample is used. Please, merge
the 5th patch here or move it before this one.

Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 0/8] Additional kmsg devices

2016-02-26 Thread Petr Mladek
On Fri 2016-02-26 14:22:42, Kazimierz Krosman wrote:
> On 02/25/2016 10:47 PM, Tejun Heo wrote:
> >I'm not sure this is the right layer to implement generic logging
> >facility.
> In general this patches add only one feature- possibility of adding
> and deleting
> new kmsg devices, so I think that it can be treated as kmsg upgrade.
> >>2. Using kmsg can cause lower CPU utilisation in the real-word use case than
> >>>userspace logging mechanisms.
> >>>We created 2 tests: (1) 100 writer processes write to created kmsg buffer 
> >>>and
> >>>(2) 100 writers write to socket (stream)- there is one reader to protect
> >>>socket buffer against overflow. Tests show that cpu utilisation in case of 
> >>>first
> >>>test is about 2.3 times lower (39.1%) than it is in second case (87.7%) 
> >>>(measured
> >>>with top program; tests code is attached below). Tested on Odroid XU4.
> >This sounds like a generic IPC problem than anything else.
> 
> For the test purpose I've written two tests (attached in cover
> letter). I think that tests
> show that in this use case (multiple writers) system with additional
> kmsg devices
> consumes less CPU time than system which use sockets for logging.
> Logging system
> based on sockets needs read process, that continuously reads socket
> and protects
> against socket buffers overflow and messages drop. It is one of
> advantages of this
> solution: no maintenance.

Wait. The net addition of this patch set is 1755 lines out of it
526 lines seems to be in non-test code. You added another level
of complexity into the handling of the ring buffer(s). And it will
require no maintenance?


> Could you explain in more detail what did you mean by IPC problems?

I guess that the idea was to make IPC more effective in general.
You definitely could not move all functionality that needs IPC
into the kernel.

Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 3/8] kmsg: introduce additional kmsg devices support

2016-02-26 Thread Petr Mladek
On Wed 2016-02-24 12:53:16, Kazimierz Krosman wrote:
> From: Marcin Niesluchowski 
> 
> kmsg device provides operations on cyclic logging buffer used mainly
> by kernel but also in userspace by privileged processes.
> 
> Additional kmsg devices keep the same log format but may be added
> dynamically with custom size.
> 
> Signed-off-by: Marcin Niesluchowski 
> Signed-off-by: Paul Osmialowski 
> [Rebased kmsg patch v5 on Linux 4.5-rc5]
> Signed-off-by: Kazimierz Krosman 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 3653a8e..b99403b 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -584,9 +595,11 @@ ssize_t msg_print_ext_body(char *buf, size_t size,
>  void log_buf_kexec_setup(void)
>  {
>   VMCOREINFO_SYMBOL(log_buf);
> - VMCOREINFO_SYMBOL(log_buf_len);
> - VMCOREINFO_SYMBOL(log_first_idx);
> - VMCOREINFO_SYMBOL(log_next_idx);
> + VMCOREINFO_STRUCT_SIZE(log_buffer);
> + VMCOREINFO_OFFSET(log_buffer, buf);
> + VMCOREINFO_OFFSET(log_buffer, len);
> + VMCOREINFO_OFFSET(log_buffer, first_idx);
> + VMCOREINFO_OFFSET(log_buffer, next_idx);

This breaks makedumpfile, crash and possibly other tools
that use this information to access the log buffer.

Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v4 2/6] module: preserve Elf information for livepatch modules

2016-02-10 Thread Petr Mladek
On Wed 2016-02-03 20:11:07, Jessica Yu wrote:
> diff --git a/kernel/module.c b/kernel/module.c
> index 71c77ed..9c16eb2 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -2676,6 +2764,23 @@ static int copy_module_from_user(const void __user 
> *umod, unsigned long len,
>   return 0;
>  }
>  
> +#ifdef CONFIG_LIVEPATCH
> +static int find_livepatch_modinfo(struct module *mod, struct load_info *info)
> +{
> + mod->klp = get_modinfo(info, "livepatch") ? true : false;
> +
> + return 0;
> +}
> +#else /* !CONFIG_LIVEPATCH */
> +static int find_livepatch_modinfo(struct module *mod, struct load_info *info)
> +{
> + if (get_modinfo(info, "livepatch"))

One more thing. I suggest to write a friendly message here. For example:

pr_err("LifePatch support is not available. Rejecting the module: s\n",
   mod->name);

It might safe some debugging a poor user or developer.

> + return -ENOEXEC;
> +
> + return 0;
> +}
> +#endif /* CONFIG_LIVEPATCH */
> +

Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v4 2/6] module: preserve Elf information for livepatch modules

2016-02-09 Thread Petr Mladek
On Wed 2016-02-03 20:11:07, Jessica Yu wrote:
> For livepatch modules, copy Elf section, symbol, and string information
> from the load_info struct in the module loader. Persist copies of the
> original symbol table and string table.
> 
> Livepatch manages its own relocation sections in order to reuse module
> loader code to write relocations. Livepatch modules must preserve Elf
> information such as section indices in order to apply livepatch relocation
> sections using the module loader's apply_relocate_add() function.
> 
> In order to apply livepatch relocation sections, livepatch modules must
> keep a complete copy of their original symbol table in memory. Normally, a
> stripped down copy of a module's symbol table (containing only "core"
> symbols) is made available through module->core_symtab. But for livepatch
> modules, the symbol table copied into memory on module load must be exactly
> the same as the symbol table produced when the patch module was compiled.
> This is because the relocations in each livepatch relocation section refer
> to their respective symbols with their symbol indices, and the original
> symbol indices (and thus the symtab ordering) must be preserved in order
> for apply_relocate_add() to find the right symbol.
> 
> diff --git a/kernel/module.c b/kernel/module.c
> index 71c77ed..9c16eb2 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3222,6 +3331,12 @@ static noinline int do_init_module(struct module *mod)
>*/
>   current->flags &= ~PF_USED_ASYNC;
>  
> +#ifdef CONFIG_KALLSYMS
> + /* Make symtab and strtab available prior to module init call */
> + mod->num_symtab = mod->core_num_syms;
> + mod->symtab = mod->core_symtab;
> + mod->strtab = mod->core_strtab;
> +#endif

This should be done with module_mutex. Otherwise, it looks racy
at least against module_kallsyms_on_each_symbol().

BTW: I wonder why even the original code is not racy
for example against module_get_kallsym. It is called
without the mutex. This code sets the number of entries
before the pointer to the entries.

Note that the module is in the list even in the UNFORMED state.


>   do_mod_ctors(mod);
>   /* Start the module */
>   if (mod->init != NULL)
> @@ -3266,11 +3381,6 @@ static noinline int do_init_module(struct module *mod)
>   /* Drop initial reference. */
>   module_put(mod);
>   trim_init_extable(mod);
> -#ifdef CONFIG_KALLSYMS
> - mod->num_symtab = mod->core_num_syms;
> - mod->symtab = mod->core_symtab;
> - mod->strtab = mod->core_strtab;
> -#endif
>   mod_tree_remove_init(mod);
>   disable_ro_nx(>init_layout);
>   module_arch_freeing_init(mod);

In each case, it was called with the mutex here.

Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v4 4/6] livepatch: reuse module loader code to write relocations

2016-02-09 Thread Petr Mladek
On Wed 2016-02-03 20:11:09, Jessica Yu wrote:
> Reuse module loader code to write relocations, thereby eliminating the need
> for architecture specific relocation code in livepatch. Specifically, reuse
> the apply_relocate_add() function in the module loader to write relocations
> instead of duplicating functionality in livepatch's arch-dependent
> klp_write_module_reloc() function.
> 
> In order to accomplish this, livepatch modules manage their own relocation
> sections (marked with the SHF_RELA_LIVEPATCH section flag) and
> livepatch-specific symbols (marked with SHN_LIVEPATCH symbol section
> index). To apply livepatch relocation sections, livepatch symbols
> referenced by relocs are resolved and then apply_relocate_add() is called
> to apply those relocations.
> 
> In addition, remove x86 livepatch relocation code and the s390
> klp_write_module_reloc() function stub. They are no longer needed since
> relocation work has been offloaded to module loader.
> 
> diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
> index 7aa975d..c1fe57c 100644
> --- a/kernel/livepatch/core.c
> +++ b/kernel/livepatch/core.c
> @@ -28,6 +28,9 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> +#include 
>  #include 
>  
>  /**
> @@ -87,6 +90,166 @@ static bool klp_is_object_loaded(struct klp_object *obj)
>   return !obj->name || obj->mod;
>  }
>  
> +/*
> + * Check if a livepatch symbol is formatted properly.
> + *
> + * See Documentation/livepatch/module-elf-format.txt for a
> + * detailed outline of requirements.
> + */
> +static int klp_check_symbol_format(struct module *pmod, Elf_Sym *sym)
> +{
> + size_t len;
> + char *s, *objname, *symname;
> +
> + if (sym->st_shndx != SHN_LIVEPATCH)
> + return -EINVAL;
> +
> + /*
> +  * Livepatch symbol names must follow this format:
> +  * .klp.sym.objname.symbol_name,sympos
> +  */
> + s = pmod->strtab + sym->st_name;
> + /* [.klp.sym.]objname.symbol_name,sympos */
> + if (!s || strncmp(s, KLP_SYM_PREFIX, KLP_SYM_PREFIX_LEN))
> + return -EINVAL;
> +
> + /* .klp.sym.[objname].symbol_name,sympos */
> + objname = s + KLP_SYM_PREFIX_LEN;
> + len = strcspn(objname, ".");
> + if (!(len > 0))
> + return -EINVAL;
> +
> + /* .klp.sym.objname.symbol_name,[sympos] */
> + if (!strchr(s, ','))
> + return -EINVAL;
> +
> + /* .klp.sym.objname.[symbol_name],sympos */
> + symname = objname + len + 1;
> + len = strcspn(symname, ",");
> + if (!(len > 0))
> + return -EINVAL;
> +
> + return 0;
> +}
> +
> +/*
> + * Check if a livepatch relocation section is formatted properly.
> + *
> + * See Documentation/livepatch/module-elf-format.txt for a
> + * detailed outline of requirements.
> + */
> +static int klp_check_relasec_format(struct module *pmod, Elf_Shdr *relasec)
> +{
> + char *secname;
> + size_t len;
> +
> + secname = pmod->klp_info->secstrings + relasec->sh_name;
> + /* [.klp.rela.]objname.section_name */
> + if (!secname || strncmp(secname, KLP_RELASEC_PREFIX,
> + KLP_RELASEC_PREFIX_LEN))
> + return -EINVAL;
> +
> + /* .klp.rela.[objname].section_name */
> + len = strcspn(secname + KLP_RELASEC_PREFIX_LEN, ".");
> + if (!(len > 0))
> + return -EINVAL;
> +
> + return 0;
> +}
> +
> +/*
> + * Check if obj->name matches the objname encoded in the rela
> + * section name (.klp.rela.[objname].section_name)
> + *
> + * Must pass klp_check_relasec_format() before calling this.
> + */
> +static bool klp_relasec_matches_object(struct module *pmod, Elf_Shdr 
> *relasec,
> +struct klp_object *obj)
> +{
> + size_t len;
> + const char *obj_objname, *sec_objname, *secname;
> +
> + secname = pmod->klp_info->secstrings + relasec->sh_name;
> + /* .klp.rela.[objname].section_name */
> + sec_objname = secname + KLP_RELASEC_PREFIX_LEN;
> + obj_objname = klp_is_module(obj) ? obj->name : "vmlinux";
> +
> + /* Get length of the objname encoded in the section name */
> + len = strcspn(sec_objname, ".");
> +
> + if (strlen(obj_objname) != len)
> + return false;
> +
> + return strncmp(sec_objname, obj_objname, len) ? false : true;
> +}
> +
> +/*
> + * klp_get_* helper functions
> + *
> + * klp_get_* functions extract different components of the name
> + * of a livepatch symbol. The full symbol name from the strtab
> + * is passed in as parameter @s, and @result is filled in with
> + * the extracted component.
> + *
> + * These functions assume a correctly formatted symbol and the
> + * klp_check_symbol_format() test *must* pass before calling any
> + * of these functions.
> + */
> +
> +/* .klp.sym.[objname].symbol_name,sympos */
> +static int klp_get_sym_objname(char *s, char **result)
> +{
> + size_t len;
> + char *objname, *objname_start;
> +
> + /*