[PATCH v3 0/3] cgroup: Introducing bypass mode

2017-08-09 Thread Waiman Long
 v2->v3:
  - Remove invalid cgroup subdirectory creation patch.
  - Add use cases for the bypass mode and remove statements about
control files ownership in cgroup-v2.txt.
  - Restrict bypass mode to non-domain (threaded) controllers only.

 v1->v2:
  - Remove relax no-internal-process constraint patch as this feature
is in the thread mode v4 patch.
  - Remove subtree root mode patch.
  - Remove the skip dying css patch as I can no longer reproduce the
problem.
  - Rework the bypass mode so that write to "cgroup.controllers"
to enable or disable controller interface files is only allowed
if the parent grants bypass mode to children by writing the
'#'-prefixed controller to "cgroup.subtree_control".
  - Add a patch to disable subdirectory creation on an invalid domain.

 v1 patch - https://lkml.org/lkml/2017/6/14/551
 v2 patch - https://lkml.org/lkml/2017/7/21/606

This patchset introduces a new capability to the cgroup v2 core that gives
more freedom and flexibility to non-domain controllers so that they
can shape their own unique views of the virtual cgroup hierarchies
that best suit their own use cases. It also enables a cgroup
parent to selectively enable a non-domain controller in a subset of
its child cgroups instead of in either all or none of them.

The bypass mode cannot be used on domain controllers because it would
complicate the resource distribution model and rules.

One use case is an application that wants to use cpuset, for example,
to bind some worker threads to individual cpus. At the same time, the
application may also want to use the cpu controller to limit the amount
of cpu consumed by some other threads. Right now, the only way to do
that with the current v2 control scheme is to create child cgroups
with both cpu and cpuset controllers enabled and put the desired
processes or threads into those child cgroups.

The cost of enabling cpuset on a task that needs the cpu controller is
negligible. However, the cost of enabling the cpu controller on tasks
that only need cpuset can be noticeable. The performance difference
may become a concern for users who are thinking of moving from cgroup
v1 to v2.

Similarly, if we want to use perf_event, freezer or other non-domain
controllers instead of cpuset in a subset of tasks, we will also need
to enable the cpu controller along with the associated performance cost.

With bypass mode, we will have the ability to enable just the
non-domain controllers that the tasks need in their respective child
cgroups. It is just like what we can currently do with cgroup v1.

This patchset is layered on top of the "for-4.14" branch of Tejun's
cgroup git tree.

Patch 1 introduces a new bypass mode that allows a non-domain
controller to be disabled in a cgroup, but re-enabled again in its
children. This is enabled by writing the controller name prefixed with
'#' to the "cgroup.subtree_control" file. Then all its children will
have this controller in bypass mode.

Patch 2 extends the bypass mode mechanism to allow those child
cgroups that are put into the bypass mode for a particular non-domain
controller by their parent to be re-enabled again by writing the
controller name with the '+' prefix to the "cgroup.controllers" file.

Patch 3 extends the debug controller to expose additional controller
masks introduced by this patchset.
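
As a rough illustration of the interface described in patches 1 and 2,
the userspace C sketch below classifies a single token written to
"cgroup.subtree_control". It is not code from the patchset: the enum
and the function name are hypothetical, and only the '#' prefix (new
here) and the '+'/'-' prefixes (existing cgroup v2 syntax) come from
the description above.

```c
#include <stddef.h>

/* Hypothetical operation codes; not the names used in the patchset. */
enum ctrl_op {
	CTRL_OP_INVALID = 0,
	CTRL_OP_ENABLE,		/* "+name": enable in children */
	CTRL_OP_DISABLE,	/* "-name": disable in children */
	CTRL_OP_BYPASS,		/* "#name": children start in bypass mode */
};

/*
 * Classify one token such as "+cpu" or "#cpuset". On success *name
 * points at the controller name inside tok. A missing prefix or an
 * empty controller name yields CTRL_OP_INVALID.
 */
enum ctrl_op parse_ctrl_token(const char *tok, const char **name)
{
	enum ctrl_op op;

	if (!tok || !name)
		return CTRL_OP_INVALID;

	switch (tok[0]) {
	case '+':
		op = CTRL_OP_ENABLE;
		break;
	case '-':
		op = CTRL_OP_DISABLE;
		break;
	case '#':
		op = CTRL_OP_BYPASS;
		break;
	default:
		return CTRL_OP_INVALID;
	}

	if (tok[1] == '\0')
		return CTRL_OP_INVALID;

	*name = tok + 1;
	return op;
}
```

With this model, "#cpuset" written by the parent maps to
CTRL_OP_BYPASS, after which a child could re-enable the controller by
writing "+cpuset" to its own "cgroup.controllers" file, as patch 2
describes.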

Waiman Long (3):
  cgroup: subtree_control bypass mode for non-domain controllers
  cgroup: Allow reenabling of controller in bypass mode
  cgroup: Make debug controller report new controller masks

 Documentation/cgroup-v2.txt |  58 +++---
 include/linux/cgroup-defs.h |  19 +++-
 kernel/cgroup/cgroup.c  | 250 +++-
 kernel/cgroup/debug.c   |   2 +
 4 files changed, 257 insertions(+), 72 deletions(-)

-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] printk: Add boottime and real timestamps

2017-08-09 Thread Luis R. Rodriguez
On Mon, Aug 07, 2017 at 02:17:33PM -0400, Prarit Bhargava wrote:
> 
> 
> On 08/07/2017 01:14 PM, Luis R. Rodriguez wrote:
> 
> > 
> > Note printk_late_init() is a late_initcall(). This means if the
> > printk_time_setting was disabled it will take a while to enable it. 
> > Enabling it
> > is done at the device_initcall(), so if printk setting is disabled but a 
> > user
> > enables it with a toggle of the module param there is a period of time 
> > during
> > which time resolution would be different. 
> 
> I'm not sure I follow your comment.  Could you elaborate with an example of
> what you think is going wrong or might be confusing?

Sure let's consider this:

+static u64 printk_get_ts(void)
+{
+	u64 mono, offset_real;
+
+	if (printk_time <= PRINTK_TIME_LOCAL)
+		return local_clock();
+
+	if (printk_time == PRINTK_TIME_BOOT)
+		return ktime_get_boot_log_ts();
+
+	mono = ktime_get_real_log_ts(&offset_real);
+
+	if (printk_time == PRINTK_TIME_MONO)
+		return mono;
+
+	return mono + offset_real;
+}

So even if printk_time was flipped, in the end the backend routines used will be
local_clock(), ktime_get_boot_log_ts() or ktime_get_real_log_ts().

This is used here:
 
@@ -1643,7 +1756,7 @@ static bool cont_add(int facility, int level, enum log_flags flags, const char *
 		cont.facility = facility;
 		cont.level = level;
 		cont.owner = current;
-		cont.ts_nsec = local_clock();
+		cont.ts_nsec = printk_get_ts();
 		cont.flags = flags;
 	}


But let's inspect these new calls:
 
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
@@ -477,6 +479,24 @@ u64 notrace ktime_get_boot_fast_ns(void)
 }
 EXPORT_SYMBOL_GPL(ktime_get_boot_fast_ns);
 
+u64 ktime_get_real_log_ts(u64 *offset_real)
+{
+	*offset_real = ktime_to_ns(tk_core.timekeeper.offs_real);
+
+	if (timekeeping_active)
+		return ktime_get_mono_fast_ns();
+	else
+		return local_clock();
+}
+
+u64 ktime_get_boot_log_ts(void)
+{
+	if (timekeeping_active)
+		return ktime_get_boot_fast_ns();
+	else
+		return local_clock();
+}
+

So they are really only effectively calling something other than
what local_clock() returns *iff* timekeeping_active is true. But
this is only set later at the respective device_initcall() in this
file:

@@ -1530,6 +1550,8 @@ void __init timekeeping_init(void)
 
 	write_seqcount_end(&tk_core.seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
+
+	timekeeping_active = 1;
 }
 

So when the boot param is processed and prints out that it has
changed, someone inspecting any time setting after that print
may assume it is now using ktime_get_mono_fast_ns() or
ktime_get_boot_fast_ns(), but this is not accurate: it will use
local_clock() until *after* device_initcall().

So in between boot and this particular device_initcall(), time
resolution can only be that of local_clock(). Seems worth documenting
that.
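
The fallback can be modelled in a small userspace C sketch. This is
only an illustration of the control flow in the quoted hunk, not
kernel code: the sim_* names and the clock values are made up, and
the flag merely plays the role of timekeeping_active.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel clocks; values are arbitrary. */
uint64_t sim_local_clock_ns = 100;
uint64_t sim_boot_clock_ns = 5000;

/* Plays the role of timekeeping_active in the quoted hunk. */
bool sim_timekeeping_active;

/* Same shape as ktime_get_boot_log_ts() in the quoted diff. */
uint64_t sim_boot_log_ts(void)
{
	if (sim_timekeeping_active)
		return sim_boot_clock_ns;
	return sim_local_clock_ns;
}
```

Until sim_timekeeping_active is set, a caller that selected the boot
clock still gets the local clock value, which is the window
described above.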

  Luis


Re: [PATCH v2 0/4] ipmi: bt-i2c: added IPMI Block Transfer over I2C

2017-08-09 Thread Brendan Higgins
On Wed, Aug 9, 2017 at 7:26 PM, Corey Minyard  wrote:
> On 08/09/2017 08:04 PM, Brendan Higgins wrote:
>>>
>>> Perhaps that is some level of abuse, but it's pretty common.  I'm not
>>> against it.
>>>
>>> There is standard IPMI firmware NetFN (though no commands defined) that
>>> if
>>> you use
>>> the driver automatically goes into "Maintenance mode" and modifies the
>>> timeouts
>>> and handling to some extent to help with this.
>>
>> That is a really good point, I missed that.
>> ...
>>>
>>>
>>> There are ways to accomplish this that aren't that complex.  You can
>>> create
>>> an OEM
>>> command that can query the maximum message size and the ability to do
>>> sequence
>>> numbers in the messages.
>>>
>>> If messages larger than 32-bytes are supported, and the host I2C/SMBus
>>> driver
>>> supports it, you could use the standard SSIF SMBus commands to do this,
>>> they
>>> have an 8-bit length field.
>>>
>>> If sequence numbers are supported, The SSIF could use different SMBus
>>> commands
>>> to do the write and read requests.  Since this is only if you get an OEM
>>> command,
>>> and if you put the sequence numbers at the end where they are easy to add
>>> on
>>> the send side, this is a small change to the driver.
>>
>> What if we just had an OEM command that changed the message structure from
>> that point on? We could abuse the "maintenance mode" NetFN to get back
>> into
>> normal SSIF if necessary.
>
>
> Actually, I wouldn't have a separate "openbmc mode".  I would have OpenBMC
> always
> work with standard SSIF, and have separate SMBus commands for messages with
> the sequence number and messages larger than 32 bytes.
>
> I've attached a patch with what I would expect the changes to be to the host
> driver.
> It doesn't handle multiple outstanding messages, but it shows what detection
> and a
> separate SMBus command would look like.

I took a look at the patch, it seems reasonable. If I was maintaining
SSIF, I probably
would not want that kind of clutter for my admittedly weird use case,
but if you're
okay with it, then so am I.

>
>
>>> So I think the changes would be small and contained.  I'm actually ok
>>> with a
>>> different driver, but I think it would be more valuable to the OpenBMC
>>> project
>>> to have a standardized interface that would work (in a not quite as
>>> efficient
>>> mode) with software that does not use the Linux IPMI driver.
>>
>> I guess I see all of my asks as hacky things which we can hopefully
>> remove
>> at some point. Hopefully, most OpenBMC users won't want or need these
>> things.
>> ...

>>>> Regardless of what we do with the "BT-I2C" stuff, I am still interested
>>>> in
>>>> what
>>>> you think about this.
>>>
>>>
>>> I think you are right, it probably belongs some place else.  The way that
>>> makes the most
>>> sense to me would be to have an "ipmi" directory with a "host" and
>>> "slave"
>>> side, and since
>>> ipmi is not really a char driver, to move it to the main driver
>>> directory.
>>> That might be
>>> fairly disruptive, though.
>>
>> That was my thinking exactly.
>>
>>> The other option that makes sense to me would be to add a
>>> drivers/char/ipmi_slave directory,
>>> or something like that, and put the slave code there.  That would be less
>>> disruptive.
>>
>> Right that is the approach I took, except I called it
>> drivers/char/ipmi_bmc.
>>
>> I originally thought doing the less disruptive thing is best; however, I
>> know
>> there are also some OpenBMC people who are interested in implementing
>> IPMB. So maybe now is the time to bite the bullet and create an ipmi
>> directory under drivers/.
>
>
> I'm not sure IPMB would make much difference, there's no host side change as
> it's
> already supported.  I don't think there would be any significant code
> sharing
> between the two.

No, I don't expect much code sharing between them. I just thought it would be a
reasonable place to put IPMB, sort of like how we have a bunch of "character"
device drivers in drivers/char, but I suppose that might be somewhat of an
anti-pattern ;-)

>
> If there end up being a significant amount of common code, then it would
> definitely be worth the effort to move it.
>
>>> -corey
>>
>> In summary, I think I can live with making it a mangled form of SSIF, but
>> I would prefer to put it in its own driver.
>
>
> You can look at the patch and consider it, and consider that you would need
> to
> implement flag and event handling.  On an x86 host there would be SMBIOS
> and ACPI stuff to deal with somehow for discovery.  There are probably a few
> other
> things to deal with.
>
>> In any case, I think I would rather focus on the BMC side IPMI
>> framework
>> now, since it is a bigger change and would also reduce the work of
>> implementing a BMC side SSIF driver.
>>
>> Here is what I propose: we focus on the BMC side IPMI framework RFC that
>> I sent out the other day:
>> 

Re: [Linux-ima-devel] [PATCH, RESEND 08/12] ima: added parser for RPM data type

2017-08-09 Thread Roberto Sassu

On 8/2/2017 9:22 AM, James Morris wrote:

On Tue, 1 Aug 2017, Roberto Sassu wrote:


On 8/1/2017 12:27 PM, Christoph Hellwig wrote:

On Tue, Aug 01, 2017 at 12:20:36PM +0200, Roberto Sassu wrote:

This patch introduces a parser for RPM packages. It extracts the digests
from the RPMTAG_FILEDIGESTS header section and converts them to binary
data
before adding them to the hash table.

The advantage of this data type is that verifiers can determine who
produced that data, as headers are signed by Linux distributions vendors.
RPM headers signatures can be provided as digest list metadata.


Err, parsing arbitrary file formats has no business in the kernel.


The benefit of this choice is that no actions are required for
Linux distribution vendors to support the solution I'm proposing,
because they already provide signed digest lists (RPM headers).

Since the proof of loading a digest list is the digest of the
digest list (included in the list metadata), if RPM headers are
converted to a different format, remote attestation verifiers
cannot check the signature.

If the concern is security, it would be possible to prevent unsigned
RPM headers from being parsed, if the PGP key type is upstreamed
(adding in CC keyri...@vger.kernel.org).


It's a security concern and also a layering violation, there should be no
need to parse package file formats in the kernel.


Parsing RPMs is not strictly necessary. Digests from the headers
can be extracted and written to a new file using the compact data
format (introduced with patch 7/12).

At boot time, IMA measures this file before digests are uploaded to the
kernel. At this point, only files with unknown digest will be added
to the measurement list. At verification time, verifiers recreate the
measurement list by merging together the digests uploaded to the
kernel with the unknown digests. Then, they verify the obtained list.

There are two ways to verify the digests: searching them in a reference
database, or checking a signature. With the 'ima-sig' measurement list
template, it is possible to verify signatures for each accessed file.
With this patch set, it is possible to verify the signature of
the file containing the digests uploaded to the kernel. If the data
format changes, the signature cannot be verified.

To avoid this limitation, the parsers could be moved to a userspace
tool which then uploads the parsed digests to the kernel. IMA would
measure the original files. But, if the tool is compromised, it could
load digests not included in the parsed files. With the current solution
this problem does not arise because no changes can be done by userspace
applications to the uploaded data while digests are parsed by IMA.

I could remove the RPM parser from the patch set for now.

Is the remaining part of the patch set ok, and is the explanation of
what it does clear?

Thanks

Roberto



I'm not really clear on exactly how this patch series works.  Can you
provide a more concrete explanation of what steps would occur during boot
and attestation?



--
HUAWEI TECHNOLOGIES Duesseldorf GmbH, HRB 56063
Managing Director: Bo PENG, Qiuen PENG, Shengli WANG


Re: [Linux-ima-devel] [PATCH 11/12] ima: don't report measurements if digests are included in the loaded lists

2017-08-09 Thread Ken Goldman

On 7/25/2017 11:44 AM, Roberto Sassu wrote:

Don't report measurements if the file digest has been included in
an uploaded digest list.

The advantage of this solution is that the boot time overhead, when
a TPM is available, is very small because a PCR is extended only
for unknown files. The disadvantage is that verifiers do not know
anymore which and when files are accessed (they must assume that
the worst case happened, i.e. all files have been accessed).


Am I reading this correctly that you want to measure certain files, but
not ones that have been included in a "digest list", which sounds like a
white list of sorts?


If so, I have two concerns:

1 - How would the client get this digest list?  Shouldn't it be up to 
the relying party to decide what is trusted and not trusted, not the client?


What of the case with two different relying parties that have a 
different list of trusted applications?  E.g., one trusts any version of 
program X, while the other trusts only version 3.1 and up?


2 - What about files on the digest list that were not run?  The relying 
party may want to know if a program wasn't run?  E.g., antivirus or a 
firewall.


If the rule is "don't measure if it's on the digest list", how does the 
relying party know if it was run?




Re: [PATCH v4] printk: Add monotonic, boottime, and realtime timestamps

2017-08-09 Thread Paul E. McKenney
On Tue, Aug 08, 2017 at 07:08:00PM -0400, Prarit Bhargava wrote:
> 
> 
> On 08/08/2017 04:28 AM, Peter Zijlstra wrote:
> > On Mon, Aug 07, 2017 at 01:36:39PM -0700, Paul E. McKenney wrote:
> >> On Mon, Aug 07, 2017 at 04:06:09PM -0400, Prarit Bhargava wrote:
> > 
> >>> peterz?  Want to offer a suggestion?  The issue is that I'm changing a 
> >>> bool
> >>> config option to an int and that impacts all the arch's defconfigs.  John 
> >>> points
> >>> out that this is a lot of churn and we're both wondering if there's a 
> >>> better way
> >>> to do the configs.
> >>
> >> The usual approach is to keep the old bool Kconfig option, and add another
> >> int Kconfig option that depends on the original one.  The tests for
> >> the int value get a bit more complex, but one way to handle this is to
> >> define a cpp macro something like the following:
> >>
> >> #ifdef CONFIG_OLD_OPTION
> >> #define CPP_NEW_OPTION 0
> >> #else
> >> #define CPP_NEW_OPTION CONFIG_NEW_OPTION
> >> #endif
> >>
> >> Then use CPP_NEW_OPTION, where zero means disabled and other numbers
> >> select the available options.
> >>
> >> Adjust to suit depending on what values mean what.
> >>
> >> Another approach is to make the range of the new Kconfig option
> >> depend on the old option:
> >>
> >> config NEW_OPTION
> >> 	int "your description here"
> >> 	range 1 5 if OLD_OPTION
> >> 	range 0 0 if !OLD_OPTION
> >> 	default 0
> >> 	help
> >> 	  your help here
> >>
> >> Again, adjust to suit depending on what values mean what.
> > 
> > Right this. Except I don't see the !OLD_OPTION working as expected.
> > A 'new' config will not include the old one, so the !OLD_OPTION thing
> > will 'always' be true.
> > 
> > So your:
> > 
> >> @@ -1,8 +1,46 @@
> >>  menu "printk and dmesg options"
> >>
> >> +choice
> >> +   prompt "printk default clock"
> >> +   config PRINTK_TIME_DISABLE
> >> +   bool "Disabled"
> >> +   help
> >> +Selecting this option disables the time stamps of printk().
> >> +
> >> +   config PRINTK_TIME_LOCAL
> >> +   bool "Local Clock"
> >> +   help
> >> + Selecting this option causes the time stamps of printk() to be
> >> + stamped with the unadjusted hardware clock.
> >> +
> >> +   config PRINTK_TIME_BOOT
> >> +   bool "CLOCK_BOOTTIME"
> >> +   help
> >> + Selecting this option causes the time stamps of printk() to be
> >> + stamped with the adjusted boottime clock.
> >> +
> >> +   config PRINTK_TIME_MONO
> >> +   bool "CLOCK_MONOTONIC"
> >> +   help
> >> + Selecting this option causes the time stamps of printk() to be
> >> + stamped with the adjusted monotonic clock.
> >> +
> >> +   config PRINTK_TIME_REAL
> >> +   bool "CLOCK_REALTIME"
> >> +   help
> >> + Selecting this option causes the time stamps of printk() to be
> >> + stamped with the adjusted realtime clock.
> >> +
> >> +endchoice
> >> +
> >>  config PRINTK_TIME
> > 
> > Change that into something like:
> > 
> > config PRINTK_CLOCK
> > 
> > 
> >> -   bool "Show timing information on printks"
> >> +   int "Show time stamp information on printks"
> >> depends on PRINTK
> >> +   default 0 if PRINTK_TIME_DISABLE
> >> +   default 1 if PRINTK_TIME_LOCAL
> > 
> > And that into:
> > 
> > default 1 if PRINTK_TIME_LOCAL || PRINTK_TIME
> > 
> >> +   default 2 if PRINTK_TIME_BOOT
> >> +   default 3 if PRINTK_TIME_MONO
> >> +   default 4 if PRINTK_TIME_REAL
> >>  help
> >>Selecting this option causes time stamps of the printk()
> > 
> > Then the old PRINTK_TIME symbol will auto-convert into the new
> > equivalent.
> > 
> 
> I don't think there's an easy code way around this.  Essentially this Kconfig
> code boils down to properly evaluating
> 
> config PRINTK_CLOCK
>   default 1 if PRINTK_TIME
>   default 0
> 
> where there is no Kconfig entry for PRINTK_TIME.
> 
> If undefined CONFIG_PRINTK_TIME is used in a config, it is immediately
> scrubbed by the kconfig script so it doesn't "exist" when CONFIG_PRINTK_CLOCK
> is evaluated.  The result of that is CONFIG_PRINTK_CLOCK=0.
> 
> I tried
> 
> config PRINTK_TIME
>   bool "old config option"
> 
> then I end up with both a CONFIG_PRINTK_CLOCK=1 and a CONFIG_PRINTK_TIME=y in
> the resulting config which is confusing.
> 
> I've debated using the other suggestion that Paul made but TBH (sorry
> Paul) it seems like I'm avoiding the real but noisy solution of
> 
>   s/PRINTK_TIME=y/PRINTK_TIME=1/g
> 
> I'm obviously open to other suggestions...

It is someone else's turn to provide a suggestion.  ;-)

Thanx, Paul
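
For reference, the cpp-macro idea quoted above can be tried outside
the kernel tree with a standalone C sketch. The macro name
PRINTK_CLOCK_SEL and the hand-written CONFIG_ definition are
assumptions made for illustration. It also only helps if the legacy
bool keeps its Kconfig entry, because otherwise the symbol is
scrubbed and never defined, as Prarit describes.

```c
/*
 * Simulate an old defconfig: the legacy bool is set and the new int
 * option is absent. In a real build these would come from the
 * generated config header rather than being written by hand.
 */
#define CONFIG_PRINTK_TIME 1

#if defined(CONFIG_PRINTK_CLOCK)
/* New-style config: use the explicit clock selection. */
#define PRINTK_CLOCK_SEL CONFIG_PRINTK_CLOCK
#elif defined(CONFIG_PRINTK_TIME)
/* Legacy bool only: map it to 1, the local clock. */
#define PRINTK_CLOCK_SEL 1
#else
/* Neither symbol set: timestamps disabled. */
#define PRINTK_CLOCK_SEL 0
#endif

/* Hypothetical accessor so the selection can be inspected at run time. */
int printk_clock_sel(void)
{
	return PRINTK_CLOCK_SEL;
}
```

The mapping of the legacy bool to 1 follows the "default 1 if
PRINTK_TIME_LOCAL || PRINTK_TIME" line suggested earlier in the
thread.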



Re: [PATCH v3] printk: Add boottime and real timestamps

2017-08-09 Thread Prarit Bhargava


On 08/09/2017 02:24 PM, Luis R. Rodriguez wrote:
> On Mon, Aug 07, 2017 at 02:17:33PM -0400, Prarit Bhargava wrote:
>>
>>
>> On 08/07/2017 01:14 PM, Luis R. Rodriguez wrote:
>>
>>>
>>> Note printk_late_init() is a late_initcall(). This means if the
>>> printk_time_setting was disabled it will take a while to enable it. 
>>> Enabling it
>>> is done at the device_initcall(), so if printk setting is disabled but a 
>>> user
>>> enables it with a toggle of the module param there is a period of time 
>>> during
>>> which time resolution would be different. 
>>
>> I'm not sure I follow your comment.  Could you elaborate with an example of
>> what you think is going wrong or might be confusing?
> 
> Sure let's consider this:
> 
> +static u64 printk_get_ts(void)
> +{
> + u64 mono, offset_real;
> +
> + if (printk_time <= PRINTK_TIME_LOCAL)
> + return local_clock();
> +
> + if (printk_time == PRINTK_TIME_BOOT)
> + return ktime_get_boot_log_ts();
> +
> + mono = ktime_get_real_log_ts(&offset_real);
> +
> + if (printk_time == PRINTK_TIME_MONO)
> + return mono;
> +
> + return mono + offset_real;
> +}
> 
> So even if printk_time was flipped, in the end the backend routines used will
> be
> local_clock(), ktime_get_boot_log_ts() or ktime_get_real_log_ts().
> 
> This is used here;
>  
> @@ -1643,7 +1756,7 @@ static bool cont_add(int facility, int level, enum 
> log_flags flags, const char *
>   cont.facility = facility;
>   cont.level = level;
>   cont.owner = current;
> - cont.ts_nsec = local_clock();
> + cont.ts_nsec = printk_get_ts();
>   cont.flags = flags;
>   }
> 
> 
> But lets inspect these new calls:
>  
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> @@ -477,6 +479,24 @@ u64 notrace ktime_get_boot_fast_ns(void)
>  }
>  EXPORT_SYMBOL_GPL(ktime_get_boot_fast_ns);
>  
> +u64 ktime_get_real_log_ts(u64 *offset_real)
> +{
> + *offset_real = ktime_to_ns(tk_core.timekeeper.offs_real);
> +
> + if (timekeeping_active)
> + return ktime_get_mono_fast_ns();
> + else
> + return local_clock();
> +}
> +
> +u64 ktime_get_boot_log_ts(void)
> +{
> + if (timekeeping_active)
> + return ktime_get_boot_fast_ns();
> + else
> + return local_clock();
> +}
> +
> 
> So they are really only effectively calling something other than
> what local_clock() returns *iff* timekeeping_active is true. But
> this is only set later at the respective device_initcall() in this
> file:
> 
> @@ -1530,6 +1550,8 @@ void __init timekeeping_init(void)
>  
>   write_seqcount_end(&tk_core.seq);
>   raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
> +
> + timekeeping_active = 1;
>  }
>  
> 
> So when the boot param is processed and prints out that it has
> changed, someone inspecting any time setting after that print
> may assume it is now using ktime_get_mono_fast_ns() or
> ktime_get_boot_fast_ns(), but this is not accurate: it will use
> local_clock() until *after* device_initcall().
> 
> So in between boot and this particular device_initcall(), time
> resolution can only be that of local_clock(). Seems worth documenting
> that.

I've moved to a different model of using a fn ptr for printk_get_ts() and using
peterz's suggestion of returning 0 until the timekeeping is initialized, so this
won't be a problem any more.

P.

> 
>   Luis
> 


[PATCH v7 9/9] sparc64: Add support for ADI (Application Data Integrity)

2017-08-09 Thread Khalid Aziz
ADI is a new feature supported on SPARC M7 and newer processors to allow
hardware to catch rogue accesses to memory. ADI is supported for data
fetches only and not instruction fetches. An app can enable ADI on its
data pages, set version tags on them and use versioned addresses to
access the data pages. Upper bits of the address contain the version
tag. On M7 processors, upper four bits (bits 63-60) contain the version
tag. If a rogue app attempts to access ADI enabled data pages, its
access is blocked and processor generates an exception. Please see
Documentation/sparc/adi.txt for further details.

This patch extends mprotect to enable ADI (TSTATE.mcde), enable/disable
MCD (Memory Corruption Detection) on selected memory ranges, enable
TTE.mcd in PTEs, return ADI parameters to userspace and save/restore ADI
version tags on page swap out/in or migration. ADI is not enabled by
default for any task. A task must explicitly enable ADI on a memory
range and set version tag for ADI to be effective for the task.

Signed-off-by: Khalid Aziz 
Cc: Khalid Aziz 
---
v7:
- Enhanced arch_validate_prot() to enable ADI only on writable
  addresses backed by physical RAM
- Added support for saving/restoring ADI tags for each ADI
  block size address range on a page on swap in/out
- Added code to copy ADI tags on COW
- Updated values for auxiliary vectors to not conflict with
  values on other architectures to avoid conflict in glibc. glibc
  consolidates all auxiliary vectors into its headers and
  duplicate values in consolidated header are problematic
- Disable same page merging on ADI enabled pages since ADI tags
  may not match on pages with identical data
- Broke the patch up further into smaller patches

v6:
- Eliminated instructions to read and write PSTATE as well as
  MCDPER and PMCDPER on every access to userspace addresses
  by setting PSTATE and PMCDPER correctly upon entry into
  kernel. PSTATE.mcde and PMCDPER are set upon entry into
  kernel when running on an M7 processor. PSTATE.mcde being
  set only affects memory accesses that have TTE.mcd set.
  PMCDPER being set only affects writes to memory addresses
  that have TTE.mcd set. This ensures any faults caused by
  ADI tag mismatch on a write are exposed before kernel returns
  to userspace.

v5:
- Fixed indentation issues and instructions in assembly code
- Removed CONFIG_SPARC64 from mdesc.c
- Changed to maintain state of MCDPER register in thread info
  flags as opposed to in mm context. MCDPER is a per-thread
  state and belongs in thread info flag as opposed to mm context
  which is shared across threads. Added comments to clarify this
  is a lazily maintained state and must be updated on context
  switch and copy_process()
- Updated code to use the new arch_do_swap_page() and
  arch_unmap_one() functions

v4:
- Broke patch up into smaller patches

v3:
- Removed CONFIG_SPARC_ADI
- Replaced prctl commands with mprotect
- Added auxiliary vectors for ADI parameters
- Enabled ADI for swappable pages

v2:
- Fixed a build error

 Documentation/sparc/adi.txt | 272 +++
 arch/sparc/include/asm/mman.h   |  72 -
 arch/sparc/include/asm/mmu_64.h |  17 ++
 arch/sparc/include/asm/mmu_context_64.h |  43 +
 arch/sparc/include/asm/page_64.h|   4 +
 arch/sparc/include/asm/pgtable_64.h |  46 ++
 arch/sparc/include/asm/thread_info_64.h |   2 +-
 arch/sparc/include/asm/trap_block.h |   2 +
 arch/sparc/include/uapi/asm/mman.h  |   2 +
 arch/sparc/kernel/adi_64.c  | 277 
 arch/sparc/kernel/etrap_64.S|  28 +++-
 arch/sparc/kernel/process_64.c  |  25 +++
 arch/sparc/kernel/setup_64.c|  11 +-
 arch/sparc/kernel/vmlinux.lds.S |   5 +
 arch/sparc/mm/gup.c |  37 +
 arch/sparc/mm/hugetlbpage.c |  14 +-
 arch/sparc/mm/init_64.c |  33 
 arch/sparc/mm/tsb.c |  21 +++
 include/linux/mm.h  |   3 +
 mm/ksm.c|   4 +
 20 files changed, 913 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/sparc/adi.txt

diff --git a/Documentation/sparc/adi.txt b/Documentation/sparc/adi.txt
new file mode 100644
index ..383bc65fec1e
--- /dev/null
+++ b/Documentation/sparc/adi.txt
@@ -0,0 +1,272 @@
+Application Data Integrity (ADI)
+
+
+SPARC M7 processor adds the Application Data Integrity (ADI) feature.
+ADI allows a task to set version tags on any subset of its address
+space. Once ADI is enabled and 

[PATCH v7 0/9] Application Data Integrity feature introduced by SPARC M7

2017-08-09 Thread Khalid Aziz
SPARC M7 processor adds additional metadata for memory address space
that can be used to secure access to regions of memory. This additional
metadata is implemented as a 4-bit tag attached to each cacheline size
block of memory. A task can set a tag on any number of such blocks.
Access to such block is granted only if the virtual address used to
access that block of memory has the tag encoded in the uppermost 4 bits
of VA. Since sparc processor does not implement all 64 bits of VA, top 4
bits are available for ADI tags. Any mismatch between tag encoded in VA
and tag set on the memory block results in a trap. Tags are verified in
the VA presented to the MMU and tags are associated with the physical
page VA maps on to. If a memory page is swapped out and page frame gets
reused for another task, the tags are lost and hence must be saved when
swapping or migrating the page.
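
To make the tag layout concrete, here is a minimal userspace C sketch
of encoding a 4-bit version tag in bits 63-60 of a 64-bit virtual
address and of the match check described above. The helper names are
hypothetical and this is a simplified model, not code from this
series; it ignores any tag values the hardware may treat specially.

```c
#include <assert.h>
#include <stdint.h>

#define ADI_TAG_SHIFT	60
#define ADI_TAG_MASK	(UINT64_C(0xF) << ADI_TAG_SHIFT)

/* Return va with its upper four bits replaced by tag (0..15). */
uint64_t adi_set_tag(uint64_t va, unsigned int tag)
{
	assert(tag <= 0xF);
	return (va & ~ADI_TAG_MASK) | ((uint64_t)tag << ADI_TAG_SHIFT);
}

/* Extract the version tag from bits 63-60 of va. */
unsigned int adi_get_tag(uint64_t va)
{
	return (unsigned int)((va & ADI_TAG_MASK) >> ADI_TAG_SHIFT);
}

/* Model of the check: access is allowed only when the tags match. */
int adi_access_ok(uint64_t va, unsigned int mem_tag)
{
	return adi_get_tag(va) == mem_tag;
}
```

For example, tagging address 0x0000100000002000 with version 0xA
yields 0xA000100000002000, and an access through that address would
only pass the check if the memory block also carries tag 0xA.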

A userspace task enables ADI through mprotect(). This patch series adds
a page protection bit PROT_ADI and a corresponding VMA flag
VM_SPARC_ADI. VM_SPARC_ADI is used to trigger setting TTE.mcd bit in the
sparc pte that enables ADI checking on the corresponding page. MMU
validates the tag embedded in VA for every page that has TTE.mcd bit set
in its pte. After enabling ADI on a memory range, the userspace task can
set ADI version tags using stxa instruction with ASI_MCD_PRIMARY or
ASI_MCD_ST_BLKINIT_PRIMARY ASI.

Once userspace task calls mprotect() with PROT_ADI, kernel takes
following overall steps:

1. Find the VMAs covering the address range passed in to mprotect and
set VM_SPARC_ADI flag. If address range covers a subset of a VMA, the
VMA will be split.

2. When a page is allocated for a VA and the VMA covering this VA has
VM_SPARC_ADI flag set, set the TTE.mcd bit so MMU will check the
version tag.

3. Userspace can now set version tags on the memory it has enabled ADI
on. Userspace accesses ADI enabled memory using a virtual address that
has the version tag embedded in the high bits. MMU validates this
version tag against the actual tag set on the memory. If tag matches,
MMU performs the VA->PA translation and access is granted. If there is a
mismatch, hypervisor sends a data access exception or precise memory
corruption detected exception depending upon whether precise exceptions
are enabled or not (controlled by MCDPERR register). Kernel sends
SIGSEGV to the task with appropriate si_code.

4. If a page is being swapped out or migrated, kernel must save any ADI
tags set on the page. Kernel maintains a page worth of tag storage
descriptors. Each descriptor points to a tag storage space and the
address range it covers. If the page being swapped out or migrated has
ADI enabled on it, kernel finds a tag storage descriptor that covers the
address range for the page or allocates a new descriptor if none of the
existing descriptors cover the address range. Kernel saves tags from the
page into the tag storage space descriptor points to.

5. When the page is swapped back in or reinstantiated after migration,
kernel restores the version tags on the new physical page by retrieving
the original tag from tag storage pointed to by a tag storage descriptor
for the virtual address range for new page.
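
Steps 4 and 5 amount to copying per-block tags out of a page and back
again. The sketch below models only that copy, packing two 4-bit tags
per byte. The geometry (8 KB page, 64-byte ADI block) and every name in
it are assumptions made for illustration; the text above does not
specify the storage layout the kernel uses, and real code reads and
writes tags with ADI-specific instructions rather than a plain array.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed geometry, for this sketch only. */
#define SKETCH_PAGE_SIZE	8192u
#define SKETCH_ADI_BLKSZ	64u
#define SKETCH_NBLOCKS		(SKETCH_PAGE_SIZE / SKETCH_ADI_BLKSZ)	/* 128 tags per page */
#define SKETCH_TAG_BYTES	(SKETCH_NBLOCKS / 2)			/* 64 bytes when packed */

/* Swap-out side: pack one 4-bit tag per block into storage, two tags per byte. */
static void sketch_save_tags(const uint8_t tags[SKETCH_NBLOCKS],
			     uint8_t storage[SKETCH_TAG_BYTES])
{
	for (size_t i = 0; i < SKETCH_TAG_BYTES; i++)
		storage[i] = (uint8_t)(((tags[2 * i] & 0xf) << 4) |
				       (tags[2 * i + 1] & 0xf));
}

/* Swap-in side: unpack the stored tags for the new physical page. */
static void sketch_restore_tags(const uint8_t storage[SKETCH_TAG_BYTES],
				uint8_t tags[SKETCH_NBLOCKS])
{
	for (size_t i = 0; i < SKETCH_TAG_BYTES; i++) {
		tags[2 * i] = storage[i] >> 4;
		tags[2 * i + 1] = storage[i] & 0xf;
	}
}
```

Under these assumptions a page needs 64 bytes of tag storage, which is
why a descriptor can cover an address range rather than a single page.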

User task can disable ADI by calling mprotect() again on the memory
range with PROT_ADI bit unset. Kernel clears the VM_SPARC_ADI flag in
VMAs, merges adjacent VMAs if necessary, and clears TTE.mcd bit in the
corresponding ptes.

The IOMMU does not support ADI checking. Any version tags embedded in
the top bits of a VA meant for the IOMMU are cleared and replaced with
the sign extension of the first non-version tag bit (bit 59 for SPARC
M7) for IOMMU addresses.
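
The tag stripping described for IOMMU addresses can be sketched as plain
bit manipulation. The function name is hypothetical and the code assumes
the SPARC M7 layout mentioned above (tag in bits 63:60, bit 59 as the
first non-tag bit); it is not the code from the series.

```c
#include <stdint.h>

/* Replace bits 63:60 of va with copies of bit 59 (sign extension from bit 59). */
static inline uint64_t sketch_strip_adi_tag(uint64_t va)
{
	uint64_t addr = va & 0x0fffffffffffffffULL;	/* clear the tag bits */

	if (va & (1ULL << 59))
		addr |= 0xf000000000000000ULL;		/* extend a set bit 59 upward */
	return addr;
}
```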

This patch series adds support for this feature in 9 patches:

Patch 1/9
  Tag mismatch on access by a task results in a trap from hypervisor as
  data access exception or a precise memory corruption detected
  exception. As part of handling these exceptions, kernel sends a
  SIGSEGV to user process with special si_code to indicate which fault
  occurred. This patch adds three new si_codes to differentiate between
  various mismatch errors.

Patch 2/9
  When a page is swapped or migrated, metadata associated with the page
  must be saved so it can be restored later. This patch adds a new
  function that saves/restores this metadata when updating pte upon a
  swap/migration.

Patch 3/9
  SPARC M7 processor adds new fields to control registers to support ADI
  feature. It also adds a new exception for precise traps on tag
  mismatch. This patch adds definitions for the new control register
  fields, new ASIs for ADI and an exception handler for the precise trap
  on tag mismatch.

Patch 4/9
  New hypervisor fault types were added by sparc M7 processor to support
  ADI feature. This patch adds code to handle these fault types for data
  access exception handler.

Patch 5/9
  When ADI is in use for a page and a tag mismatch occurs, processor
  raises "Memory corruption Detected" trap. This patch adds a 

[PATCH v3 3/3] cgroup: Make debug controller report new controller masks

2017-08-09 Thread Waiman Long
The newly added cgroup controller masks (subtree_bypass and
enable_ss_mask) are now being reported in the debug.masks controller
file.

Signed-off-by: Waiman Long 
---
 kernel/cgroup/debug.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/cgroup/debug.c b/kernel/cgroup/debug.c
index f661b4c..5f35a76 100644
--- a/kernel/cgroup/debug.c
+++ b/kernel/cgroup/debug.c
@@ -262,6 +262,8 @@ static int cgroup_masks_read(struct seq_file *seq, void *v)
 
 	cgroup_masks_read_one(seq, "subtree_control", cgrp->subtree_control);
 	cgroup_masks_read_one(seq, "subtree_ss_mask", cgrp->subtree_ss_mask);
+	cgroup_masks_read_one(seq, "subtree_bypass",  cgrp->subtree_bypass);
+	cgroup_masks_read_one(seq, "enable_ss_mask",  cgrp->enable_ss_mask);
 
 	cgroup_kn_unlock(of->kn);
 	return 0;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 1/3] cgroup: subtree_control bypass mode for non-domain controllers

2017-08-09 Thread Waiman Long
The special prefix '#' attached to a non-domain controller name can now
be written into the cgroup.subtree_control file to set that controller
in bypass mode in all the child cgroups. The controller will show up
in the children's cgroup.controllers file, but the corresponding
control knobs will be absent. However, that controller can be
enabled or bypassed in its children by writing to their respective
subtree_control files.

This mode is useful to non-domain controllers where there are costs to
each additional layer of hierarchy. This mode will also allow more
freedom in how each controller can shape its effective hierarchy
independently of the others.
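
To make the three prefixes concrete, here is a minimal sketch of how a
single token written to "cgroup.subtree_control" could be classified.
The enum and function names are hypothetical and this is not the parser
in kernel/cgroup/cgroup.c; it only illustrates the '+', '-' and '#'
convention described above.

```c
#include <stddef.h>

/* Hypothetical operation codes for this sketch. */
enum sketch_ctrl_op {
	SKETCH_OP_INVALID,
	SKETCH_OP_ENABLE,	/* "+name" */
	SKETCH_OP_DISABLE,	/* "-name" */
	SKETCH_OP_BYPASS,	/* "#name" */
};

/*
 * Classify one token by its prefix and return a pointer to the controller
 * name through *name. *name is left untouched for invalid tokens.
 */
static enum sketch_ctrl_op sketch_parse_token(const char *tok, const char **name)
{
	enum sketch_ctrl_op op;

	switch (tok[0]) {
	case '+': op = SKETCH_OP_ENABLE; break;
	case '-': op = SKETCH_OP_DISABLE; break;
	case '#': op = SKETCH_OP_BYPASS; break;
	default:  return SKETCH_OP_INVALID;
	}
	if (tok[1] == '\0')
		return SKETCH_OP_INVALID;	/* prefix with no controller name */
	*name = tok + 1;
	return op;
}
```

With this sketch, "#cpuset" yields SKETCH_OP_BYPASS with the name
"cpuset", while a bare "cpuset" is rejected.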

Signed-off-by: Waiman Long 
---
 include/linux/cgroup-defs.h |  12 ++--
 kernel/cgroup/cgroup.c  | 143 
 2 files changed, 100 insertions(+), 55 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 59e4ad9..15655e5 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -308,16 +308,18 @@ struct cgroup {
 	struct cgroup_file events_file;	/* handle for "cgroup.events" */
 
 	/*
-	 * The bitmask of subsystems enabled on the child cgroups.
-	 * ->subtree_control is the one configured through
-	 * "cgroup.subtree_control" while ->child_ss_mask is the effective
-	 * one which may have more subsystems enabled.  Controller knobs
-	 * are made available iff it's enabled in ->subtree_control.
+	 * The bitmask of subsystems enabled or bypassed on the child cgroups.
+	 * ->subtree_control and ->subtree_bypass are the one configured
+	 * through "cgroup.subtree_control" while ->subtree_ss_mask is the
+	 * effective one which may have more subsystems enabled.  Controller
+	 * knobs are made available iff it's enabled in ->subtree_ss_mask.
 	 */
 	u16 subtree_control;
 	u16 subtree_ss_mask;
+	u16 subtree_bypass;
 	u16 old_subtree_control;
 	u16 old_subtree_ss_mask;
+	u16 old_subtree_bypass;
 
 	/* Private pointers for each registered subsystem */
 	struct cgroup_subsys_state __rcu *subsys[CGROUP_SUBSYS_COUNT];
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index f5ca55d..9e69f7f 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -365,7 +365,8 @@ static bool cgroup_can_be_thread_root(struct cgroup *cgrp)
 		return false;
 
 	/* and no domain controllers can be enabled */
-	if (cgrp->subtree_control & ~cgrp_dfl_threaded_ss_mask)
+	if ((cgrp->subtree_control|cgrp->subtree_bypass) &
+	    ~cgrp_dfl_threaded_ss_mask)
 		return false;
 
 	return true;
@@ -387,7 +388,8 @@ bool cgroup_is_thread_root(struct cgroup *cgrp)
 	 * enabled is a thread root.
 	 */
 	if (cgroup_has_tasks(cgrp) &&
-	    (cgrp->subtree_control & cgrp_dfl_threaded_ss_mask))
+	   ((cgrp->subtree_control|cgrp->subtree_bypass)
+	   & cgrp_dfl_threaded_ss_mask))
 		return true;
 
 	return false;
@@ -412,7 +414,7 @@ static bool cgroup_is_valid_domain(struct cgroup *cgrp)
 }
 
 /* subsystems visibly enabled on a cgroup */
-static u16 cgroup_control(struct cgroup *cgrp)
+static u16 cgroup_control(struct cgroup *cgrp, bool show_bypass)
 {
 	struct cgroup *parent = cgroup_parent(cgrp);
 	u16 root_ss_mask = cgrp->root->subsys_mask;
@@ -420,6 +422,9 @@ static u16 cgroup_control(struct cgroup *cgrp)
 	if (parent) {
 		u16 ss_mask = parent->subtree_control;
 
+		if (show_bypass)
+			ss_mask |= parent->subtree_bypass;
+
 		/* threaded cgroups can only have threaded controllers */
 		if (cgroup_is_threaded(cgrp))
 			ss_mask &= cgrp_dfl_threaded_ss_mask;
@@ -433,13 +438,17 @@ static u16 cgroup_control(struct cgroup *cgrp)
 }
 
 /* subsystems enabled on a cgroup */
-static u16 cgroup_ss_mask(struct cgroup *cgrp)
+static u16 cgroup_ss_mask(struct cgroup *cgrp, bool show_bypass)
 {
 	struct cgroup *parent = cgroup_parent(cgrp);
 
 	if (parent) {
 		u16 ss_mask = parent->subtree_ss_mask;
 
+
+		if (show_bypass)
+			ss_mask |= parent->subtree_bypass;
+
 		/* threaded cgroups can only have threaded controllers */
 		if (cgroup_is_threaded(cgrp))
 			ss_mask &= cgrp_dfl_threaded_ss_mask;
@@ -492,7 +501,7 @@ static struct cgroup_subsys_state *cgroup_e_css(struct cgroup *cgrp,
 	 * This function is used while updating css associations and thus
 	 * can't test the csses directly.  Test ss_mask.
 	 */
-	while (!(cgroup_ss_mask(cgrp) & (1 << ss->id))) {
+	while (!(cgroup_ss_mask(cgrp, false) & (1 << ss->id))) {
 		cgrp = cgroup_parent(cgrp);
 		if (!cgrp)

[PATCH v3 2/3] cgroup: Allow reenabling of controller in bypass mode

2017-08-09 Thread Waiman Long
Non-domain controllers set to bypass mode in the parent's
"cgroup.subtree_control" can now be optionally enabled by writing the
controller name with the '+' prefix to "cgroup.controllers". Using the
'#' prefix will reset it back to the bypass state.

This capability allows a cgroup parent to individually enable
non-domain controllers in a subset of its children instead of either
all or none of them. This increases the flexibility each controller
has in shaping the effective cgroup hierarchy to best suit its need.
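
The selective enabling described above can be modelled with three
bitmasks. The sketch below is a simplified model, not the logic in
kernel/cgroup/cgroup.c: the field names mirror masks mentioned in this
series (subtree_control, subtree_bypass, enable_ss_mask), but the struct,
the helper names and the exact rule for combining the masks are
assumptions made for illustration.

```c
#include <stdint.h>

/* Simplified model of the per-cgroup masks; not the kernel's struct cgroup. */
struct sketch_cgroup {
	uint16_t subtree_control;	/* controllers enabled for all children */
	uint16_t subtree_bypass;	/* controllers put in bypass mode for children */
	uint16_t enable_ss_mask;	/* bypassed controllers this cgroup re-enabled */
};

/* Model of writing "+ctrl" to the child's "cgroup.controllers"; ssid must be < 16. */
static int sketch_child_enable(const struct sketch_cgroup *parent,
			       struct sketch_cgroup *child, unsigned int ssid)
{
	uint16_t bit = (uint16_t)(1u << ssid);

	if (!(parent->subtree_bypass & bit))
		return -1;	/* only controllers in bypass mode may be written */
	child->enable_ss_mask |= bit;
	return 0;
}

/* Controllers whose knobs are visible in the child under this model. */
static uint16_t sketch_child_effective(const struct sketch_cgroup *parent,
				       const struct sketch_cgroup *child)
{
	return (uint16_t)(parent->subtree_control |
			  (parent->subtree_bypass & child->enable_ss_mask));
}
```

In this model two children of the same parent can differ: one that wrote
"+ctrl" sees the bypassed controller, while its sibling does not.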

Signed-off-by: Waiman Long 
---
 Documentation/cgroup-v2.txt |  58 +--
 include/linux/cgroup-defs.h |   7 +++
 kernel/cgroup/cgroup.c  | 109 ++--
 3 files changed, 156 insertions(+), 18 deletions(-)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index dc44785..e76dc4cf 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -363,10 +363,16 @@ disabled by writing to the "cgroup.subtree_control" file::
 
   # echo "+cpu +memory -io" > cgroup.subtree_control
 
+The prefixes '+', '-' and '#' are used to enable, disable or put
+a controller in the bypass mode respectively.  In the bypass mode,
+a controller is disabled in a cgroup, but it can be enabled again in
+its child cgroups as it will still be listed in "cgroup.controllers".
+Bypass mode can only be used on non-domain controllers.
+
 Only controllers which are listed in "cgroup.controllers" can be
-enabled.  When multiple operations are specified as above, either they
-all succeed or fail.  If multiple operations on the same controller
-are specified, the last one is effective.
+enabled or bypassed.  When multiple operations are specified as above,
+either they all succeed or fail.  If multiple operations on the same
+controller are specified, the last one is effective.
 
 Enabling a controller in a cgroup indicates that the distribution of
 the target resource across its immediate children will be controlled.
@@ -390,6 +396,20 @@ prefixed controller interface files from C and D.  This means that the
 controller interface files - anything which doesn't start with
 "cgroup." are owned by the parent rather than the cgroup itself.
 
+Once a non-domain controller is put into bypass mode in
+"cgroup.subtree_control", that controller can optionally be enabled
+again in child cgroups by writing the controller name with the '+'
+prefix into "cgroup.controllers".  Writing the controller name with
+the '#' prefix into "cgroup.controllers" resets the state back to
+bypass mode.  The state of a non-domain controller cannot be changed
+anymore if it is enabled or bypassed in its "cgroup.subtree_control".
+
+The use of bypass mode thus allows a cgroup parent to have the ability
+to selectively enable a non-domain controller in a subset of its
+child cgroups instead of in either all or none of them. In other words,
+a non-domain controller can be enabled only on the cgroup that actually
+needs it, if desired.
+
 
 Top-down Constraint
 ~~~
@@ -397,10 +417,11 @@ Top-down Constraint
 Resources are distributed top-down and a cgroup can further distribute
 a resource only if the resource has been distributed to it from the
 parent.  This means that all non-root "cgroup.subtree_control" files
-can only contain controllers which are enabled in the parent's
-"cgroup.subtree_control" file.  A controller can be enabled only if
-the parent has the controller enabled and a controller can't be
-disabled if one or more children have it enabled.
+can only contain controllers which are enabled or bypassed in the parent's
+"cgroup.subtree_control" file.  A controller can be enabled or bypassed
+only if the parent has the controller enabled or bypassed and the
+state of a controller can't be changed if one or more children have
+it enabled or bypassed.
 
 
 No Internal Process Constraint
@@ -823,11 +844,18 @@ All cgroup core files are prefixed with "cgroup."
should be granted along with the containing directory.
 
   cgroup.controllers
-   A read-only space separated values file which exists on all
+   A read-write space separated values file which exists on all
cgroups.
 
It shows space separated list of all controllers available to
-   the cgroup.  The controllers are not ordered.
+   the cgroup.  Controller names with '#' prefix are in bypass
+   mode.  The controllers are not ordered.
+
+   When a controller is set into bypass mode in its parent's
+   "cgroup.subtree_control", its name prefixed with '+' or '#'
+   can be written to enable it or reset it back to bypass mode
+   respectively.  Controllers not in bypass mode are not allowed
+   to be written.
 
   cgroup.subtree_control
A read-write space separated values file which exists on all
@@ -837,12 +865,12 @@ All cgroup core files are prefixed with "cgroup."
which are enabled to control 

Re: [PATCH v2 0/4] ipmi: bt-i2c: added IPMI Block Transfer over I2C

2017-08-09 Thread Brendan Higgins
> Perhaps that is some level of abuse, but it's pretty common.  I'm not
> against it.
>
> There is standard IPMI firmware NetFN (though no commands defined) that if
> you use
> the driver automatically goes into "Maintenance mode" and modified the
> timeouts
> and handling to some extent to help with this.

That is a really good point, I missed that.
...
>
>
> There are ways to accomplish this that aren't that complex.  You can create
> an OEM
> command that can query the maximum message size and the ability to do
> sequence
> numbers in the messages.
>
> If messages larger than 32-bytes are supported, and the host I2C/SMBus
> driver
> supports it, you could use the standard SSIF SMBus commands to do this, they
> have an 8-bit length field.
>
> If sequence numbers are supported, The SSIF could use different SMBus
> commands
> to do the write and read requests.  Since this is only if you get an OEM
> command,
> and if you put the sequence numbers at the end where they are easy to add on
> the send side, this is a small change to the driver.

What if we just had an OEM command that changed the message structure from
that point on? We could abuse the "maintenance mode" NetFN to get back into
normal SSIF if necessary.

>
> So I think the changes would be small and contained.  I'm actually ok with a
> different driver, but I think it would be more valuable to the OpenBMC
> project
> to have a standardized interface that would work (in a not quite as
> efficient
> mode) with software that does not use the Linux IPMI driver.

I guess I see all of my asks as hacky things which we can hopefully remove
at some point. Hopefully, most OpenBMC users won't want or need these things.
...
>>
>> Regardless of what we do with the "BT-I2C" stuff, I am still interested in
>> what
>> you think about this.
>
>
> I think you are right, it probably belongs some place else.  The way that
> makes the most
> sense to me would be to have an "ipmi" directory with a "host" and "slave"
> side, and since
> ipmi is not really a char driver, to move it to the main driver directory.
> That might be
> fairly disruptive, though.

That was my thinking exactly.

>
> The other option that makes sense to me would be to add a
> drivers/char/ipmi_slave directory,
> or something like that, and put the slave code there.  That would be less
> disruptive.

Right, that is the approach I took, except I called it drivers/char/ipmi_bmc.

I originally thought doing the less disruptive thing is best; however, I know
there are also some OpenBMC people who are interested in implementing
IPMB. So maybe now is the time to bite the bullet and create an ipmi
directory under drivers/.

>
> -corey

In summary, I think I can live with making it a mangled form of SSIF, but
I would prefer to put it in its own driver.

In any case, I think I would rather focus on the BMC side IPMI framework
now, since it is a bigger change and would also reduce the work of
implementing a BMC side SSIF driver.

Here is what I propose: we focus on the BMC side IPMI framework RFC that
I sent out the other day:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1463473.html
I will add a change to the BMC side IPMI framework patchset to move all the
IPMI stuff to the new drivers/ipmi directory as discussed and then drop the
patch in that patchset that depends on this patchset.

Let me know what you think


Re: [PATCH v1 0/3] ipmi: bt-i2c: added IPMI Block Transfer over I2C

2017-08-09 Thread Brendan Higgins
On Wed, Aug 9, 2017 at 3:56 AM, Anton D. Kachalov  wrote:
> Hello,
>
> I would like to mention some of our related work on IPMI and I2C.
>
> We use the OpenIPMI stack to connect to the computing nodes through I2C
> using IPMB (BT is not supported by the nodes):
>
> https://github.com/ya-mouse/meta-openbmc-yandex/blob/master/meta-yandex/meta-openrack/meta-shaosi/recipes-kernel/linux/linux-obmc/ipmi_i2c.c
>
> It lacks complete slave support (the slave part only receives known
> packets with query results, due to the OpenIPMI implementation in the
> kernel) and uses one local slave to communicate with a number of target
> systems on the same bus (currently only a 1-to-1 schema is supported).
>
> With this we are able to use ipmitool across different /dev/ipmiX devices
> to communicate with nodes.

Cool, I met someone else who had a similar use case, which is part of why
I decided to share this (not sure if I should say who).

So it sounds like we are probably not going to go with the approach I proposed;
if you indeed find this useful, I would suggest that we put this in our OpenBMC
repository and switch it out with the suggested method at some point.

Let me know what you think


Re: [RFC 3/5] dt-bindings: i3c: Document core bindings

2017-08-09 Thread Rob Herring
On Mon, Jul 31, 2017 at 06:24:48PM +0200, Boris Brezillon wrote:
> A new I3C subsystem has been added and a generic description has been
> created to represent the I3C bus and the devices connected on it.
> 
> Document this generic representation.
> 
> Signed-off-by: Boris Brezillon 
> ---
>  Documentation/devicetree/bindings/i3c/i3c.txt | 90 
> +++
>  1 file changed, 90 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/i3c/i3c.txt
> 
> diff --git a/Documentation/devicetree/bindings/i3c/i3c.txt 
> b/Documentation/devicetree/bindings/i3c/i3c.txt
> new file mode 100644
> index ..49261dec7b01
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/i3c/i3c.txt
> @@ -0,0 +1,90 @@
> +Generic device tree bindings for I3C busses
> +===
> +
> +This document describes generic bindings that should be used to describe I3C
> +busses in a device tree.
> +
> +Required properties
> +---
> +
> +- #address-cells  - should be <1>. Read more about addresses below.
> +- #size-cells - should be <0>.
> +- compatible  - name of I3C bus controller following generic names
> + recommended practice.
> +
> +For other required properties e.g. to describe register sets,
> +clocks, etc. check the binding documentation of the specific driver.
> +
> +Optional properties
> +---
> +
> +These properties may not be supported by all I3C master drivers. Each I3C
> +master bindings should specify which of them are supported.
> +
> +- i3c-scl-frequency: frequency (in Hz) of the SCL signal used for I3C
> +  transfers. When undefined the core set it to 12.5MHz.
> +
> +- i2c-scl-frequency: frequency (in Hz) of the SCL signal used for I2C
> +  transfers. When undefined, the core looks at LVR values
> +  of I2C devices described in the device tree to determine
> +  the maximum I2C frequency.
> +
> +I2C devices
> +===
> +
> +Each I2C device connected to the bus should be described in a subnode with
> +the following properties:
> +
> +All properties described in Documentation/devicetree/bindings/i2c/i2c.txt are
> +valid here.
> +
> +New required properties:
> +
> +- i3c-lvr: 32 bits integer property (only the lowest 8 bits are meaningful)

What does lvr mean?

> +describing device capabilities as described in the I3C
> +specification.
> +
> +bit[31:8]: unused
> +bit[7:5]: I2C device index. Possible values

index? Seems more like flags

> + * 0: I2C device has a 50 ns spike filter
> + * 1: I2C device does not have a 50 ns spike filter but supports high
> +  frequency on SCL
> + * 2: I2C device does not have a 50 ns spike filter and is not
> +  tolerant to high frequencies
> + * 3-7: reserved
> +
> +bit[4]: tell whether the device operates in FM or FM+ mode
> + * 0: FM+ mode
> + * 1: FM mode
> +
> +bit[3:0]: device type
> + * 0-15: reserved

That's useful...
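
For reference, the bit layout in the quoted binding text can be decoded
as follows. This is only a sketch: the struct and function names are
made up, and the field meanings are copied from the proposed binding
above, not from the I3C specification itself.

```c
#include <stdint.h>

/* Field layout taken from the quoted binding text; names are hypothetical. */
struct sketch_lvr {
	unsigned int index;	/* bit[7:5]: 0, 1 and 2 are defined, 3-7 reserved */
	unsigned int fm_only;	/* bit[4]: 0 = FM+ mode, 1 = FM mode */
	unsigned int type;	/* bit[3:0]: device type, all values reserved */
};

/* Split the low 8 bits of an i3c-lvr value into its fields; bits 31:8 are ignored. */
static struct sketch_lvr sketch_decode_lvr(uint32_t lvr)
{
	struct sketch_lvr d;

	d.index = (lvr >> 5) & 0x7;
	d.fm_only = (lvr >> 4) & 0x1;
	d.type = lvr & 0xf;
	return d;
}
```

Applied to the example further down, i3c-lvr = <0x10> decodes to index 0
(a device with a 50 ns spike filter) operating in FM mode.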

> +
> +I3C devices
> +===
> +
> +I3C are not described in the device tree yet. We could decide to represent 
> them
> +at some point to assign a specific dynamic address to a device or to force an
> +I3C device to act as an I2C device if it has a static address.

I think we need to define this sooner rather than later if there's not a 
standard connector. That's the only thing that would enforce any sort of 
standard. Of course, that didn't help with SDIO.

> +
> +Example:
> +
> + i3c-master@0d04 {

The node name should go into the DT spec. I tend to think "i3c" would be 
sufficient and aligned with i2c.

> + compatible = "cdns,i3c-master";
> + clocks = <>, <>;
> + clock-names = "pclk", "sysclk";
> + interrupts = <3 0>;
> + reg = <0x0d04 0x1000>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> +
> + status = "okay";
> + i2c-scl-frequency = <10>;
> +
> + nunchuk: nunchuk@52 {
> + compatible = "nintendo,nunchuk";
> + reg = <0x52>;
> + i3c-lvr = <0x10>;
> + };
> + };
> +
> -- 
> 2.7.4
> 


make pdfdocs problem with 4.13-rc4

2017-08-09 Thread Jim Davis
On my Fedora 26 workstation, with the latest patches, running make
pdfdocs stops with

[jim@krebstar ~]$ tail /tmp/make-pdfdocs.out

Underfull \hbox (badness 1) in paragraph at lines 3980--3983
[]\EU1/DejaVuSans(0)/m/n/10 Threshold below
[31]
! Missing \endgroup inserted.

\endgroup
l.4114 \begin{savenotes}\sphinxattablestart

?

Pressing the return key (many times) doesn't solve the problem, and
eventually make fizzles out with

make[2]: *** [Makefile:33: media.pdf] Error 1
make[1]: *** [Documentation/Makefile:83: pdfdocs] Error 2
make: *** [Makefile:1473: pdfdocs] Error 2

Oh, and

[jim@krebstar ~]$ grep 'LaTeX Warning:' /tmp/make-pdfdocs.out | wc -l
5438

-- 
Jim


Re: [PATCH v2 0/4] ipmi: bt-i2c: added IPMI Block Transfer over I2C

2017-08-09 Thread Corey Minyard

On 08/09/2017 08:04 PM, Brendan Higgins wrote:

>> Perhaps that is some level of abuse, but it's pretty common.  I'm not
>> against it.
>>
>> There is standard IPMI firmware NetFN (though no commands defined) that
>> if you use the driver automatically goes into "Maintenance mode" and
>> modified the timeouts and handling to some extent to help with this.
>
> That is a really good point, I missed that.
> ...
>>
>> There are ways to accomplish this that aren't that complex.  You can
>> create an OEM command that can query the maximum message size and the
>> ability to do sequence numbers in the messages.
>>
>> If messages larger than 32-bytes are supported, and the host I2C/SMBus
>> driver supports it, you could use the standard SSIF SMBus commands to do
>> this, they have an 8-bit length field.
>>
>> If sequence numbers are supported, The SSIF could use different SMBus
>> commands to do the write and read requests.  Since this is only if you
>> get an OEM command, and if you put the sequence numbers at the end where
>> they are easy to add on the send side, this is a small change to the
>> driver.
>
> What if we just had an OEM command that changed the message structure from
> that point on? We could abuse the "maintenance mode" NetFN to get back into
> normal SSIF if necessary.


Actually, I wouldn't have a separate "openbmc mode".  I would have OpenBMC
always work with standard SSIF, and have separate SMBus commands for
messages with the sequence number and messages larger than 32 bytes.

I've attached a patch with what I would expect the changes to be to the
host driver.  It doesn't handle multiple outstanding messages, but it shows
what detection and a separate SMBus command would look like.
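
As a rough sketch of the idea (separate from the attached patch), the
send side could pick the SMBus command based on whether the BMC
advertised the extension and append the sequence number at the end of
the message. The function name, its parameters and the one-byte sequence
number are assumptions for illustration; only the two command values
come from the patch fragment at the end of this mail.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SSIF_IPMI_REQUEST	2	/* standard SSIF write command */
#define SSIF_OPENBMC_REQUEST	0x80	/* value from the attached patch fragment */

/*
 * Copy msg into out and, when the BMC advertised the extension, append a
 * one-byte sequence number. Stores the SMBus command to use in *cmd and
 * returns the number of bytes placed in out, or 0 if out is too small.
 */
static size_t sketch_build_request(int bmc_has_ext, uint8_t seq,
				   const uint8_t *msg, size_t len,
				   uint8_t *out, size_t out_size, uint8_t *cmd)
{
	size_t total = len + (bmc_has_ext ? 1 : 0);

	if (total > out_size)
		return 0;
	memcpy(out, msg, len);
	if (bmc_has_ext)
		out[len] = seq;		/* sequence number goes at the end */
	*cmd = bmc_has_ext ? SSIF_OPENBMC_REQUEST : SSIF_IPMI_REQUEST;
	return total;
}
```

A BMC that never reports the capability keeps receiving unmodified
messages on the standard command, which is the point of not having a
separate mode.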


>> So I think the changes would be small and contained.  I'm actually ok
>> with a different driver, but I think it would be more valuable to the
>> OpenBMC project to have a standardized interface that would work (in a
>> not quite as efficient mode) with software that does not use the Linux
>> IPMI driver.
>
> I guess I see all of my asks as hacky things which we can hopefully remove
> at some point. Hopefully, most OpenBMC users won't want or need these things.
> ...
>>>
>>> Regardless of what we do with the "BT-I2C" stuff, I am still interested
>>> in what you think about this.
>>
>> I think you are right, it probably belongs some place else.  The way
>> that makes the most sense to me would be to have an "ipmi" directory
>> with a "host" and "slave" side, and since ipmi is not really a char
>> driver, to move it to the main driver directory.  That might be fairly
>> disruptive, though.
>
> That was my thinking exactly.
>
>> The other option that makes sense to me would be to add a
>> drivers/char/ipmi_slave directory, or something like that, and put the
>> slave code there.  That would be less disruptive.
>
> Right, that is the approach I took, except I called it drivers/char/ipmi_bmc.
>
> I originally thought doing the less disruptive thing is best; however, I know
> there are also some OpenBMC people who are interested in implementing
> IPMB. So maybe now is the time to bite the bullet and create an ipmi
> directory under drivers/.


I'm not sure IPMB would make much difference, there's no host side change
as it's already supported.  I don't think there would be any significant
code sharing between the two.

If there ends up being a significant amount of common code, then it would
definitely be worth the effort to move it.


>> -corey
>
> In summary, I think I can live with making it a mangled form of SSIF, but
> I would prefer to put it in its own driver.


You can look at the patch and consider it, and consider that you would need
to implement flag and event handling.  On an x86 host there would be SMBIOS
and ACPI stuff to deal with somehow for discovery.  There's probably a few
other things to deal with.


> In any case, I think I would rather focus on the BMC side IPMI framework
> now, since it is a bigger change and would also reduce the work of
> implementing a BMC side SSIF driver.
>
> Here is what I propose: we focus on the BMC side IPMI framework RFC that
> I sent out the other day:
> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1463473.html
> I will add a change to the BMC side IPMI framework patchset to move all the
> IPMI stuff to the new drivers/ipmi directory as discussed and then drop the
> patch in that patchset that depends on this patchset.
>
> Let me know what you think


Let's hold off on the new directory, there's probably some convincing of
the "powers that be" for that.

I'll look at the patch set tomorrow, unless something critical comes up.

Thanks,

-corey

diff --git a/drivers/char/ipmi/ipmi_ssif.c b/drivers/char/ipmi/ipmi_ssif.c
index 11237e8..d467b1a 100644
--- a/drivers/char/ipmi/ipmi_ssif.c
+++ b/drivers/char/ipmi/ipmi_ssif.c
@@ -60,6 +60,11 @@
 
 #define IPMI_GET_SYSTEM_INTERFACE_CAPABILITIES_CMD	0x57
 
+static u8 openbmc_iana[3] = { 0x10, 0x20, 0x30 };
+#define IPMI_OPENBMC_CAPABILITY_REQUEST_CMD	0x01
+#define SSIF_OPENBMC_REQUEST	0x80
+#define SSIF_OPENBMC_RESPONSE	0x81
+
 #define	SSIF_IPMI_REQUEST			2
 

[PATCH] Document:add Chinese translation of rfkill.txt

2017-08-09 Thread wangguohao . 2009
From: "guohao.w" 

Signed-off-by: guohao.w 
---
 Documentation/translations/zh_CN/rfkill.txt | 117 
 1 file changed, 117 insertions(+)
 create mode 100644 Documentation/translations/zh_CN/rfkill.txt

diff --git a/Documentation/translations/zh_CN/rfkill.txt 
b/Documentation/translations/zh_CN/rfkill.txt
new file mode 100644
index ..2c90f5733551
--- /dev/null
+++ b/Documentation/translations/zh_CN/rfkill.txt
@@ -0,0 +1,117 @@
+Chinese translated version of Documentation/rfkill.txt
+
+If you have any comment or update to the content, please post to LKML directly.
+However, if you have problem communicating in English you can also ask the
+Chinese maintainer for help.  Contact the Chinese maintainer, if this
+translation is outdated or there is problem with translation.
+
+Chinese maintainer: guohao.wang 
+-
+Documentation/rfkill.txt的中文翻译
+
+如果想评论或更新本文的内容,请直接发信到LKML。如果你使用英文交流有困难的话,也可
+以向中文版维护者求助。如果本翻译更新不及时或者翻译存在问题,请联系中文版维护者。
+
+中文版维护者:guohao.wang 
+中文版翻译者:guohao.wang 
+中文版校译者:Lily <1517048...@qq.com>
+以下为正文
+-
+
+
+rfkill - RF kill 开关的支持
+===
+
+1. 介绍
+2. 实现细节
+3. 内核API
+4. 用户空间支持
+
+
+1. 介绍
+
+rfkill子系统提供一个通用接口来禁用任意系统中的射频发射器。当发射器被锁定时,它
+将不再消耗任何电力。
+
+这个子系统也具有响应按键操作来禁用特定种类发射器(或全部种类)的能力。这个是适
+用于关闭发射器的场合,比如说在飞行器上。
+
+rfkill子系统提供了“硬”和“软”锁定的概念,它们意思几乎没有区别(断开 == 发射器关
+机),
+而真正的区别在于它们状态是否能被改变:
+ - 硬锁定:只读射频设备锁定,它不能被软件修改。
+ - 软锁定:可写的射频设备锁定(不需可读性),它可被系统软件设置。
+
+rfkill子系统有两个被记录在kernel-parameters.txt的参数rfkill.default_state 与
+rfkill.master_switch_mode。
+
+2. 实现细节
+
+rfkill 子系统是由三个主要部分组成:
+ * rfkill核心,
+ * 已被弃用的rfkill-input模块(一个输入层的handler,被用户空间策略代码替换),
+ * rfkill驱动。
+
+rfkill核心为驱动程序在内核中注册它们的射频发生器的打开和关闭的方法提供API,同时
+使系统知道硬件的禁用状态也许在设备中被实现。
+
+rfkill核心代码还会提醒用户空间状态的改变,并提供为用户空间提供一个查询当前状态的
+方法。更多信息见下节“用户空间支持”。
+
+当设备被硬锁定时(通过调用rfkill_set_hw_state()或query_hw_block),set_block()将会
+因其他的软件锁定而被调用,但是驱动可以忽略这个调用,因为它们可以使用
+rfkill_set_hw_state()的返回值来同步软件状态以此来替代set_block()调用的追踪。事
+实上,驱动应该使用rfkill_set_hw_state()的返回值除非硬件的确分开跟踪它的软锁定和
+硬锁定。
+
+3. 内核API
+
+设备发射器的驱动通常都实现一个rfkill的驱动。
+
+如果rfkill仅仅是个按钮,平台驱动也许实现输入设备。如果这个按钮要影响硬件你需要
+实现一个rfkill驱动取代平台驱动。如果平台提供一个开/关发射器的方法,以上方法同
+样适用。
+
+对部分平台,在挂起/休眠期间修改硬件状态是非常可能的,这样的情况下在恢复时候以
+当前状态更新rfkill核心的状态很有必要。
+
+去创建一个rfkill驱动,驱动的Kconfig需要有
+
+   depends on RFKILL || !RFKILL
+
+来确保当rfkill是模块时驱动不被编译进内核。!RFKILL允许驱动在rfkill没有被配置的
+情况下编译,这种情况所有的rfkill API 仍然可以被使用但是几乎什么都没有被编译进去。
+
+当状态改变正在发生时调用rfkill_set_hw_state()需要rfkill驱动可以进行硬锁定,除非它
+们也可以分配poll_hw_block()回调(然后rfkill核心将会轮询设备)。不要这样做除非你
+不能通过其他方法获取事件。
+
+RFKill 提供一个每开关LED触发器,可以根据开关的状态驱动LED(LED_FULL表示断开,
+LED_OFF表示其他情况)。
+
+4. 用户空间支持
+
+被推荐使用的用户态接口是/dev/rfkill,它属于杂项字符设备,它允许用户态获取和设置
+rfkill的状态来设置硬件。它也通知用户态设备的添加和移除。API是一个简单的读/写
+API,它在linux/rfkill.h中定义,有个ioctl允许关闭在kernel过渡期间废弃的输入
+handler。
+
+除了一个ioctl,与内核的通信是通过“struct rfkill_event”实例read()和write()来
+完成。 在这个结构体中,软锁定和硬锁定块被正确区分(不同于sysfs,如下),用户空间
+能够获得系统中所有rfkill设备的一致的快照。 此外,切换所有rfkill驱动(或指定类
+型的所有驱动程序)的状态可能更新所有热插拔设备的默认状态。
+
+应用程序打开/dev/rfkill后,它可以读到所有设备的当前状态。可以通过轮询热插拔描述
+符,或状态更改事件,或者侦听rfkill核心框架发出的uevent来获取修改。
+
+此外,每个rfkill设备都注册在sysfs并发出uevents。
+
+rfkill设备发出uevents(具有“更改”的操作),并设置以下环境变量:
+
+RFKILL_NAME
+RFKILL_STATE
+RFKILL_TYPE
+
+这些变量的内容对应于上面解释的“name”,“state”和“type” 的sysfs文件。
+
+更多的细节查看 Documentation/ABI/stable/sysfs-class-rfkill.
-- 
2.13.4



Re: [Linux-ima-devel] [PATCH, RESEND 08/12] ima: added parser for RPM data type

2017-08-09 Thread Mimi Zohar
On Wed, 2017-08-09 at 11:15 +0200, Roberto Sassu wrote:
> On 8/2/2017 9:22 AM, James Morris wrote:
> > On Tue, 1 Aug 2017, Roberto Sassu wrote:
> >
> >> On 8/1/2017 12:27 PM, Christoph Hellwig wrote:
> >>> On Tue, Aug 01, 2017 at 12:20:36PM +0200, Roberto Sassu wrote:
> >>>> This patch introduces a parser for RPM packages. It extracts the digests
> >>>> from the RPMTAG_FILEDIGESTS header section and converts them to binary
> >>>> data before adding them to the hash table.
> >>>>
> >>>> The advantage of this data type is that verifiers can determine who
> >>>> produced that data, as headers are signed by Linux distributions vendors.
> >>>> RPM headers signatures can be provided as digest list metadata.
> >>>
> >>> Err, parsing arbitrary file formats has no business in the kernel.
> >>
> >> The benefit of this choice is that no actions are required for
> >> Linux distribution vendors to support the solution I'm proposing,
> >> because they already provide signed digest lists (RPM headers).
> >>
> >> Since the proof of loading a digest list is the digest of the
> >> digest list (included in the list metadata), if RPM headers are
> >> converted to a different format, remote attestation verifiers
> >> cannot check the signature.
> >>
> >> If the concern is security, it would be possible to prevent unsigned
> >> RPM headers from being parsed, if the PGP key type is upstreamed
> >> (adding in CC keyri...@vger.kernel.org).
> >
> > It's a security concern and also a layering violation, there should be no
> > need to parse package file formats in the kernel.
> 
> Parsing RPMs is not strictly necessary. Digests from the headers
> can be extracted and written to a new file using the compact data
> format (introduced with patch 7/12).
> 
> At boot time, IMA measures this file before digests are uploaded to the
> kernel. At this point, only files with unknown digest will be added
> to the measurement list. At verification time, verifiers recreate the
> measurement list by merging together the digests uploaded to the
> kernel with the unknown digests. Then, they verify the obtained list.
> 
> There are two ways to verify the digests: searching them in a reference
> database, or checking a signature. With the 'ima-sig' measurement list
> template, it is possible to verify signatures for each accessed file.
> With this patch set, it is possible to verify the signature of
> the file containing the digests uploaded to the kernel. If the data
> format changes, the signature cannot be verified.
> 
> To avoid this limitation, the parsers could be moved to a userspace
> tool which then uploads the parsed digests to the kernel. IMA would
> measure the original files. But, if the tool is compromised, it could
> load digests not included in the parsed files. With the current solution
> this problem does not arise because no changes can be done by userspace
> applications to the uploaded data while digests are parsed by IMA.
> 
> I could remove the RPM parser from the patch set for now.
> 
> Is the remaining part of the patch set ok, and is the explanation of
> what it does clear?

From a trusted boot perspective, file measurements are added to the
measurement list, before access to the file is given.  The measurement
list contains ALL measurements, as defined by policy.  This patch set
changes that meaning to be all measurements, as defined by policy,
with the exception of those in a white list.

Changing the fundamental meaning of the measurement list is not
acceptable.  You could define a new securityfs file to differentiate
between the full measurement list and this abbreviated one.  But
before making this sort of change, I would prefer to address the
underlying problem - TPM performance.

There are a couple of things that could be done to improve the TPM
driver performance, itself.  Once all of these options have been
pursued, we could then consider batching the measurements to the TPM,
meaning that the measurement list would still contain all the file
measurements, but instead of extending the TPM for each measurement, a
batched hash - a hash of a group of file measurements - would be
extended into the TPM.
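
To make the batching idea concrete, here is a rough sketch, not a proposed
implementation. It models a PCR extend as new = H(old || digest) and
assumes that the "batched hash" is the hash of the concatenated
measurement digests in a group. That construction, the SHA-256 choice, and
the names extend, extend_each and extend_batched are assumptions made only
for illustration.

```python
import hashlib
from typing import List, Tuple

PCR_SIZE = hashlib.sha256().digest_size


def extend(pcr: bytes, digest: bytes) -> bytes:
    """Model of a PCR extend: new value = H(old value || digest)."""
    return hashlib.sha256(pcr + digest).digest()


def extend_each(measurements: List[bytes]) -> Tuple[bytes, int]:
    """Current behaviour: one extend per file measurement."""
    pcr = bytes(PCR_SIZE)
    for digest in measurements:
        pcr = extend(pcr, digest)
    return pcr, len(measurements)


def extend_batched(measurements: List[bytes], batch_size: int) -> Tuple[bytes, int]:
    """Batched variant: one extend per group of measurements.

    The measurement list itself is unchanged and still holds every
    entry; only the number of extend operations drops.
    """
    pcr = bytes(PCR_SIZE)
    extends = 0
    for start in range(0, len(measurements), batch_size):
        group = measurements[start:start + batch_size]
        # Assumed construction of the batched hash for this sketch.
        batch_digest = hashlib.sha256(b"".join(group)).digest()
        pcr = extend(pcr, batch_digest)
        extends += 1
    return pcr, extends
```

In this model a verifier that holds the full measurement list and knows
the group boundaries can recompute the same PCR value by replaying
extend_batched().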

Mimi

> > I'm not really clear on exactly how this patch series works.  Can you
> > provide a more concrete explanation of what steps would occur during boot
> > and attestation?
> >
