Re: [PATCH 4/5] acpi/processor: Fix the return value of acpi_processor_ids_walk()

2018-05-21 Thread Dou Liyang



At 05/19/2018 11:06 PM, Thomas Gleixner wrote:

On Tue, 20 Mar 2018, Dou Liyang wrote:


The ACPI driver should make sure that all the processor IDs in the ACPI
namespace are unique for CPU hotplug. The driver performs a depth-first walk
of the namespace tree and calls acpi_processor_ids_walk() for each processor
object.

But acpi_processor_ids_walk() returns true once a processor has been
checked, which makes the walk stop right after the first processor.

Replace the value with AE_OK, which is the standard acpi_status value.

Fixes: 8c8cb30f49b8 ("acpi/processor: Implement DEVICE operator for processor enumeration")

Signed-off-by: Dou Liyang 
---
  drivers/acpi/acpi_processor.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 449d86d39965..db5bdb59639c 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -663,11 +663,11 @@ static acpi_status __init (acpi_handle handle,
}
  
  	processor_validated_ids_update(uid);

-   return true;
+   return AE_OK;
  
  err:

acpi_handle_info(handle, "Invalid processor object\n");
-   return false;
+   return AE_OK;


I'm not sure whether this is the right return value here. Rafael?


Hi, Thomas, Rafael,

Yes, I used AE_OK so that the walk skips invalid objects and continues
with the remaining objects, but I'm not sure either.

For this bug, I recently sent another patch that removes this check
code entirely.

   https://lkml.org/lkml/2018/5/17/320

IMO, duplicate IDs can already be avoided by this other check:

   if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id))  1)

Because the mapping between cpu_id (pr->id) and processor_id is fixed,
when a physical CPU is hot-plugged and its processor_id duplicates that of
a present CPU, condition 1) above evaluates to 0, and Linux does not add
this CPU.

Also, this code runs every time the system boots, and the time it wastes
grows with the number of CPUs.

So I prefer to remove this code.
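To make the failure mode concrete, here is a small userspace sketch. It does not use the real ACPICA headers: acpi_status, AE_OK, walk_objects() and the two callbacks are hypothetical, simplified stand-ins. It models one rule of ACPICA's namespace walker as I understand it: when a walk callback returns anything other than AE_OK (leaving aside AE_CTRL_DEPTH, which only skips the current subtree), the walk stops. Because `true` converts to 1 and AE_OK is 0, the old code stopped after the first valid processor. Note that the error path's old `return false` already evaluated to 0, i.e. AE_OK, so that second hunk changes readability rather than behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified stand-ins for ACPICA types (hypothetical). Only AE_OK == 0
 * is modeled; every other status is treated as "stop the walk".
 */
typedef unsigned int acpi_status;
#define AE_OK ((acpi_status)0)

/* Hypothetical callback type, loosely modeled on acpi_walk_namespace(). */
typedef acpi_status (*walk_cb_t)(int uid, void *context);

/*
 * Hypothetical walker: calls cb for each object in order and stops as
 * soon as cb returns something other than AE_OK. Returns how many
 * objects were visited.
 */
static size_t walk_objects(const int *uids, size_t n, walk_cb_t cb,
			   void *context)
{
	size_t visited = 0;

	assert(uids != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		visited++;
		if (cb(uids[i], context) != AE_OK)
			break;	/* non-AE_OK status terminates the walk */
	}
	return visited;
}

/* Old behavior: 'true' converts to 1, which is not AE_OK. */
static acpi_status cb_return_true(int uid, void *context)
{
	(void)uid;
	(void)context;
	return true;
}

/* Patched behavior: AE_OK lets the walk continue. */
static acpi_status cb_return_ok(int uid, void *context)
{
	(void)uid;
	(void)context;
	return AE_OK;
}

/* Four processor objects with distinct UIDs (illustrative data). */
static const int demo_uids[] = { 0, 1, 2, 3 };
```

With four processors, `walk_objects(demo_uids, 4, cb_return_true, NULL)` visits only one object, while `cb_return_ok` lets all four be visited.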

Thanks,
dou


--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Documentation/translations: Italian

2018-05-21 Thread Jonathan Corbet
On Mon, 21 May 2018 22:54:18 +0200
Federico Vaga  wrote:

> I'm writing to you because I would like to start an effort to translate the 
> Documentation into Italian. I would also like to propose providing 
> guidelines for translations.

That seems like a great idea to me! :)

> I know that there are already translations for Asian languages but I am not 
> able to find their history. I do not know if translations into European 
> languages are going to be accepted (perhaps there is the assumption that 
> everyone on the European continent knows English and it is a waste of energy 
> to do translations[?]). For example, even though French and German developers 
> are quite active, there are no translations in their languages yet: is there 
> a particular reason, or has simply nobody done it?

Nobody has done it.  There certainly is no policy against translations to
any specific language - that would be hard to justify, to say the least.

OK, I might draw the line at Klingon.  But the discussion of error handling
in Klingon could actually be a lot of fun.

I'm happy to accept new translations of stuff in the documentation
directory.  In general, I've had two concerns about translations: they are
generally impossible for me to review, and there needs to be somebody
committed to keeping the translations current as the documentation
changes.  For Italian, the first problem doesn't exist, but the second is
always there. What are your intentions for maintaining the translations in
the long term?

> If you agree with the need to support different translations, I would like to 
> do the Italian one. But first I would like to open a little discussion about 
> translations, "how to write translations"; this discussion should produce a 
> document (in English) with guidelines for translators (e.g.
> Documentation/translation/howto.rst): what to translate first, what NOT to
> translate, how to structure it.
> Once this is defined I will start the Italian translation (I already have 
> some documents translated).

This can be a fine plan, assuming we're convinced that the guidelines
document is really needed.  I guess I'm not yet convinced of that.  But you
might also consider gaining some experience in writing, merging, and
maintaining a translation before trying to lay down rules for everybody
else.  In other words, I think you might want to do things in the opposite
order.

> How to do translations (IMHO)
> -----------------------------
> Here are my personal guidelines for translations:
> 
> - Translate only Sphinx-ready documents; do not translate documents which are 
> not yet in Sphinx format. We should avoid useless double work; at some point, 
> I guess, everything will be Sphinx.

I wouldn't insist on that.  But a better idea in any case would be: if a
document you want to translate isn't yet in RST, just do the conversion.
The amount of work required is usually quite small.

> - Include in all documents a disclaimer saying that English is the main 
> reference (use the Sphinx 'include' directive to include it).
> - Include in all documents a reference to the English version, so it will be 
> easy to jump to the original document.

Remember that the docs need to be readable *without* Sphinx processing.
Better to just name the source document in a quick line at the top, IMO.

> - Translate in order: non-technical documents (they are stable and useful for a 
> wider group of people, developers and managers: process/, doc-guide/), 
> technical documents about key concepts (they are stable and important for 
> newcomers), subsystems (the big picture is stable; typically they do not 
> describe all the little details that may change), and then other documents.

If you want to work in that order, that is more than fine.  Others have
agreed - the process docs tend to get translated first.  But if somebody
else wants to start elsewhere, I wouldn't try to tell them not to.

Anyway, thanks for wanting to help improve the documentation!  If you have
some of this work already done, you might want to consider going ahead and
posting some patches.

Thanks,

jon


Documentation/translations: Italian

2018-05-21 Thread Federico Vaga
Hello,

I'm writing to you because I would like to start an effort to translate the 
Documentation into Italian. I would also like to propose providing 
guidelines for translations.

I looked a bit through the archive but I did not find anything about these two 
topics (Italian translation, guidelines for translations).

I know that there are already translations for Asian languages but I am not 
able to find their history. I do not know if translations into European 
languages are going to be accepted (perhaps there is the assumption that 
everyone on the European continent knows English and it is a waste of energy 
to do translations[?]). For example, even though French and German developers 
are quite active, there are no translations in their languages yet: is there 
a particular reason, or has simply nobody done it?

Why
===
There is nothing better for understanding than our own mother tongue, and 
reading Documentation is one of those activities where it is important to 
understand its message rather than learn a different language (there are 
dedicated books and courses for that). This is especially true for young 
developers and newcomers who are really focused on understanding Linux, and a 
different language can sometimes be an obstacle. I personally had a couple of 
experiences where I pointed people to the documentation and had to explain 
English rather than Linux. They were very competent people, but they were not 
used to using English every day.

I count myself among the people who prefer their mother tongue when 
it is time to really understand something. I work for an international 
organization in a country that is not mine, with people coming from all around 
the European continent, and our common tongue is bad-English with all its 
dialects and accents: true-English (with its own dialects), spaghetti-English, 
kartoffel-English, paella-English, fromage-English and more. Misunderstanding 
is not rare, and sometimes expressing ourselves takes more time than needed. 
This is another reason why I believe that, for understanding purposes, it is 
good to read in our own mother tongue.

Plan
====
If you agree with the need to support different translations, I would like to 
do the Italian one. But first I would like to open a little discussion about 
translations, "how to write translations"; this discussion should produce a 
document (in English) with guidelines for translators (e.g.
Documentation/translation/howto.rst): what to translate first, what NOT to
translate, how to structure it.
Once this is defined I will start the Italian translation (I already have some 
documents translated).

How to do translations (IMHO)
-----------------------------
Here are my personal guidelines for translations:

- Translate only Sphinx-ready documents; do not translate documents which are 
not yet in Sphinx format. We should avoid useless double work; at some point, 
I guess, everything will be Sphinx.
- Include in all documents a disclaimer saying that English is the main 
reference (use the Sphinx 'include' directive to include it).
- Include in all documents a reference to the English version, so it will be 
easy to jump to the original document.
- Translate in order: non-technical documents (they are stable and useful for a 
wider group of people, developers and managers: process/, doc-guide/), 
technical documents about key concepts (they are stable and important for 
newcomers), subsystems (the big picture is stable; typically they do not 
describe all the little details that may change), and then other documents.
- Avoid scattered translations: try to finish one "topic" before translating 
something else.

There is probably much more, which is why I would like to have a little 
discussion about it.


Thanks for reading everything :)

-- 
Federico Vaga
http://www.federicovaga.it/




Re: [PATCH v5 3/5] i2c: i2c-qcom-geni: Add bus driver for the Qualcomm GENI I2C controller

2018-05-21 Thread Wolfram Sang
Hi,

On Fri, Mar 23, 2018 at 02:20:59PM -0600, Karthikeyan Ramasubramanian wrote:
> This bus driver supports the GENI based i2c hardware controller in the
> Qualcomm SOCs. The Qualcomm Generic Interface (GENI) is a programmable
> module supporting a wide range of serial interfaces including I2C. The
> driver supports FIFO mode and DMA mode of transfer and switches modes
> dynamically depending on the size of the transfer.
> 
> Signed-off-by: Karthikeyan Ramasubramanian 
> Signed-off-by: Sagar Dharia 
> Signed-off-by: Girish Mahadevan 

Is one of these people interested in maintaining this driver? Then, an
entry for MAINTAINERS would be needed, too. (Same goes for
drivers/soc/qcom/ IMHO, but this is not my realm, so just saying)

> +static const struct geni_i2c_err_log gi2c_log[] = {
> + [GP_IRQ0] = {-EINVAL, "Unknown I2C err GP_IRQ0"},
> + [NACK] = {-ENOTCONN, "NACK: slv unresponsive, check its power/reset-ln"},
> + [GP_IRQ2] = {-EINVAL, "Unknown I2C err GP IRQ2"},
> + [BUS_PROTO] = {-EPROTO, "Bus proto err, noisy/unexpected start/stop"},
> + [ARB_LOST] = {-EBUSY, "Bus arbitration lost, clock line undriveable"},
> + [GP_IRQ5] = {-EINVAL, "Unknown I2C err GP IRQ5"},
> + [GENI_OVERRUN] = {-EIO, "Cmd overrun, check GENI cmd-state machine"},
> + [GENI_ILLEGAL_CMD] = {-EILSEQ, "Illegal cmd, check GENI cmd-state machine"},
> + [GENI_ABORT_DONE] = {-ETIMEDOUT, "Abort after timeout successful"},
> + [GENI_TIMEOUT] = {-ETIMEDOUT, "I2C TXN timed out"},
> +};

Please check Documentation/i2c/fault-codes for better -ERRNO values,
especially for NACK and ARB_LOST.
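As a rough illustration of what that could look like, here is a self-contained sketch of the table with the two entries changed to codes that Documentation/i2c/fault-codes describes: ENXIO when the address phase of a transfer gets no ACK, and EAGAIN when arbitration is lost in master transmit mode. The enum and struct definitions are hypothetical mirrors of the driver's (only the table itself was quoted), and mapping NACK to ENXIO assumes that interrupt reports an address-phase NACK.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical mirror of the driver's error indices, same order as quoted. */
enum geni_i2c_err_code {
	GP_IRQ0,
	NACK,
	GP_IRQ2,
	BUS_PROTO,
	ARB_LOST,
	GP_IRQ5,
	GENI_OVERRUN,
	GENI_ILLEGAL_CMD,
	GENI_ABORT_DONE,
	GENI_TIMEOUT,
};

/* Assumed layout: a negative errno plus a log message. */
struct geni_i2c_err_log {
	int err;
	const char *msg;
};

static const struct geni_i2c_err_log gi2c_log[] = {
	[GP_IRQ0] = {-EINVAL, "Unknown I2C err GP_IRQ0"},
	/* fault-codes: ENXIO = address phase of a transfer was not ACKed */
	[NACK] = {-ENXIO, "NACK: slv unresponsive, check its power/reset-ln"},
	[GP_IRQ2] = {-EINVAL, "Unknown I2C err GP IRQ2"},
	[BUS_PROTO] = {-EPROTO, "Bus proto err, noisy/unexpected start/stop"},
	/* fault-codes: EAGAIN = arbitration lost in master transmit mode */
	[ARB_LOST] = {-EAGAIN, "Bus arbitration lost, clock line undriveable"},
	[GP_IRQ5] = {-EINVAL, "Unknown I2C err GP IRQ5"},
	[GENI_OVERRUN] = {-EIO, "Cmd overrun, check GENI cmd-state machine"},
	[GENI_ILLEGAL_CMD] = {-EILSEQ, "Illegal cmd, check GENI cmd-state machine"},
	[GENI_ABORT_DONE] = {-ETIMEDOUT, "Abort after timeout successful"},
	[GENI_TIMEOUT] = {-ETIMEDOUT, "I2C TXN timed out"},
};

/* Compile-time check that every error code has a table entry. */
_Static_assert(sizeof(gi2c_log) / sizeof(gi2c_log[0]) == GENI_TIMEOUT + 1,
	       "gi2c_log must cover every error code");
```

EAGAIN also lets callers that retry on lost arbitration treat it as a transient condition, which ENOTCONN/EBUSY do not convey as clearly.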

Rest looks good from a glimpse.

Thanks,

   Wolfram





[v4 08/11] Documentation: hwmon: Add documents for PECI hwmon client drivers

2018-05-21 Thread Jae Hyun Yoo
This commit adds hwmon documents for PECI cputemp and dimmtemp drivers.

Signed-off-by: Jae Hyun Yoo 
Reviewed-by: Haiyue Wang 
Reviewed-by: James Feist 
Reviewed-by: Vernon Mauery 
Cc: Jason M Biils 
Cc: Randy Dunlap 
---
 Documentation/hwmon/peci-cputemp  | 78 +++
 Documentation/hwmon/peci-dimmtemp | 50 
 2 files changed, 128 insertions(+)
 create mode 100644 Documentation/hwmon/peci-cputemp
 create mode 100644 Documentation/hwmon/peci-dimmtemp

diff --git a/Documentation/hwmon/peci-cputemp b/Documentation/hwmon/peci-cputemp
new file mode 100644
index ..821a9258f2e6
--- /dev/null
+++ b/Documentation/hwmon/peci-cputemp
@@ -0,0 +1,78 @@
+Kernel driver peci-cputemp
+==========================
+
+Supported chips:
+   One of the Intel server CPUs listed below, connected to a PECI bus.
+   * Intel Xeon E5/E7 v3 server processors
+   Intel Xeon E5-14xx v3 family
+   Intel Xeon E5-24xx v3 family
+   Intel Xeon E5-16xx v3 family
+   Intel Xeon E5-26xx v3 family
+   Intel Xeon E5-46xx v3 family
+   Intel Xeon E7-48xx v3 family
+   Intel Xeon E7-88xx v3 family
+   * Intel Xeon E5/E7 v4 server processors
+   Intel Xeon E5-16xx v4 family
+   Intel Xeon E5-26xx v4 family
+   Intel Xeon E5-46xx v4 family
+   Intel Xeon E7-48xx v4 family
+   Intel Xeon E7-88xx v4 family
+   * Intel Xeon Scalable server processors
+   Intel Xeon Bronze family
+   Intel Xeon Silver family
+   Intel Xeon Gold family
+   Intel Xeon Platinum family
+   Addresses scanned: PECI client address 0x30 - 0x37
+   Datasheet: Available from http://www.intel.com/design/literature.htm
+
+Author:
+   Jae Hyun Yoo 
+
+Description
+-----------
+
+This driver implements a generic PECI hwmon feature which provides Digital
+Thermal Sensor (DTS) thermal readings of the CPU package and CPU cores that are
+accessible using the PECI Client Command Suite via the processor PECI client.
+
+All temperature values are given in millidegrees Celsius and are measurable
+only when the target CPU is powered on.
+
+sysfs attributes
+----------------
+
+temp1_label          "Die"
+temp1_input          Provides current die temperature of the CPU package.
+temp1_max            Provides thermal control temperature of the CPU package,
+                     also known as Tcontrol.
+temp1_crit           Provides shutdown temperature of the CPU package, also
+                     known as the maximum processor junction temperature,
+                     Tjmax or Tprochot.
+temp1_crit_hyst      Provides the hysteresis value from Tcontrol to Tjmax of
+                     the CPU package.
+
+temp2_label          "Tcontrol"
+temp2_input          Provides current Tcontrol temperature of the CPU
+                     package, also known as the fan temperature target.
+                     Indicates the relative value from the thermal monitor
+                     trip temperature at which fans should be engaged.
+temp2_crit           Provides Tcontrol critical value of the CPU package,
+                     which is the same as Tjmax.
+
+temp3_label          "Tthrottle"
+temp3_input          Provides current Tthrottle temperature of the CPU
+                     package. Used for throttling temperature. If this value
+                     is allowed and lower than Tjmax, throttling will occur
+                     and be reported at a temperature lower than Tjmax.
+
+temp4_label          "Tjmax"
+temp4_input          Provides the maximum junction temperature, Tjmax, of the
+                     CPU package.
+
+temp[5-*]_label      Provides string "Core X", where X is the resolved core
+                     number.
+temp[5-*]_input      Provides current temperature of each core.
+temp[5-*]_max        Provides thermal control temperature of the core.
+temp[5-*]_crit       Provides shutdown temperature of the core.
+temp[5-*]_crit_hyst  Provides the hysteresis value from Tcontrol to Tjmax of
+                     the core.
diff --git a/Documentation/hwmon/peci-dimmtemp b/Documentation/hwmon/peci-dimmtemp
new file mode 100644
index ..c54f2526188c
--- /dev/null
+++ b/Documentation/hwmon/peci-dimmtemp
@@ -0,0 +1,50 @@
+Kernel driver peci-dimmtemp
+===========================
+
+Supported chips:
+   One of Intel server CPUs listed below which is connected to a PECI 
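Since the peci-cputemp attributes follow the hwmon convention of integer millidegrees Celsius, a userspace consumer only needs to parse an integer and scale it. The sketch below is a hedged illustration: parse_millidegrees() and read_temp_mdeg() are hypothetical helpers, and a path such as /sys/class/hwmon/hwmon0/temp1_input is only an example, because the hwmonN index differs between systems.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse the text of an hwmon tempN_* attribute. Values are integers in
 * millidegrees Celsius, normally followed by a newline.
 */
static int parse_millidegrees(const char *buf, long *mdeg)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(buf, &end, 10);
	if (errno != 0 || end == buf)
		return -EINVAL;	/* overflow or no digits */
	if (*end != '\0' && *end != '\n')
		return -EINVAL;	/* trailing garbage */
	*mdeg = v;
	return 0;
}

/*
 * Read one attribute file, e.g. a path such as
 * /sys/class/hwmon/hwmon0/temp1_input (the hwmonN index is system-specific).
 */
static int read_temp_mdeg(const char *path, long *mdeg)
{
	char buf[32];
	FILE *f = fopen(path, "r");

	if (!f)
		return errno ? -errno : -EIO;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -EIO;
	}
	fclose(f);
	return parse_millidegrees(buf, mdeg);
}
```

For example, a temp1_input value of "45500" parses to 45500, i.e. 45.5 degrees Celsius (`mdeg / 1000` whole degrees, `mdeg % 1000` thousandths).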

[v4 02/11] Documentation: ioctl: Add ioctl numbers for PECI subsystem

2018-05-21 Thread Jae Hyun Yoo
This commit updates ioctl-number.txt to reflect ioctl numbers used
by the PECI subsystem.

Signed-off-by: Jae Hyun Yoo 
Cc: James Feist 
Cc: Jason M Biils 
Cc: Vernon Mauery 
---
 Documentation/ioctl/ioctl-number.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index 480c8609dc58..1670ca4072b2 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -322,6 +322,8 @@ Code  Seq#(hex) Include FileComments
 0xB3   00  linux/mmc/ioctl.h
 0xB4   00-0F   linux/gpio.h
 0xB5   00-0F   uapi/linux/rpmsg.h  

+0xB6   00-0F   uapi/linux/peci-ioctl.h PECI subsystem
+   
 0xC0   00-0F   linux/usb/iowarrior.h
 0xCA   00-0F   uapi/misc/cxl.h
 0xCA   10-2F   uapi/misc/ocxl.h
-- 
2.17.0
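The table entry reserves ioctl type ("Code") 0xB6 with sequence numbers 0x00-0x0F for PECI. The sketch below shows how a command built with the standard _IOWR() macro encodes those two fields. The request structure and the EXAMPLE_* names are hypothetical; the real definitions live in uapi/linux/peci-ioctl.h, which this patch does not show.

```c
#include <assert.h>
#include <linux/ioctl.h>

/* Type ("Code") number registered for PECI in ioctl-number.txt. */
#define EXAMPLE_PECI_IOC_BASE 0xB6

/*
 * Hypothetical request layout for illustration only; the real structures
 * are defined in uapi/linux/peci-ioctl.h.
 */
struct example_peci_xfer {
	unsigned char addr;
	unsigned char tx_len;
	unsigned char rx_len;
	unsigned char buf[32];
};

/* Hypothetical command using sequence number 0x00 of the 00-0F range. */
#define EXAMPLE_PECI_IOC_XFER \
	_IOWR(EXAMPLE_PECI_IOC_BASE, 0x00, struct example_peci_xfer)

/* The type and sequence number can be recovered from the command value. */
_Static_assert(_IOC_TYPE(EXAMPLE_PECI_IOC_XFER) == EXAMPLE_PECI_IOC_BASE,
	       "type field must be 0xB6");
_Static_assert(_IOC_NR(EXAMPLE_PECI_IOC_XFER) <= 0x0F,
	       "sequence number must stay inside the reserved 00-0F range");
```

Keeping every command's sequence number within the registered range is what prevents collisions with other subsystems that share neighboring type codes.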



[PATCH v4 00/10] PECI device driver introduction

2018-05-21 Thread Jae Hyun Yoo
Introduction of the Platform Environment Control Interface (PECI) bus
device driver. PECI is a one-wire bus interface that provides a
communication channel between an Intel processor and chipset components to
external monitoring or control devices. PECI is designed to support the
following sideband functions:

* Processor and DRAM thermal management
  - Processor fan speed control is managed by comparing Digital Thermal
Sensor (DTS) thermal readings acquired via PECI against the
processor-specific fan speed control reference point, or TCONTROL. Both
TCONTROL and DTS thermal readings are accessible via the processor PECI
client. These variables are referenced to a common temperature, the TCC
activation point, and are both defined as negative offsets from that
reference.
  - PECI based access to the processor package configuration space provides
a means for Baseboard Management Controllers (BMC) or other platform
management devices to actively manage the processor and memory power
and thermal features.

* Platform Manageability
  - Platform manageability functions including thermal, power, and error
monitoring. Note that platform 'power' management includes monitoring
and control for both the processor and DRAM subsystem to assist with
data center power limiting.
  - PECI allows read access to certain error registers in the processor MSR
space and status monitoring registers in the PCI configuration space
within the processor and downstream devices.
  - PECI permits writes to certain registers in the processor PCI
configuration space.

* Processor Interface Tuning and Diagnostics
  - Processor interface tuning and diagnostics capabilities
(Intel Interconnect BIST). The processor's Intel Interconnect Built In
Self Test (Intel IBIST) allows for in-field diagnostic capabilities in
the Intel UPI and memory controller interfaces. PECI provides a port to
execute these diagnostics via its PCI Configuration read and write
capabilities.

* Failure Analysis
  - Output the state of the processor after a failure for analysis via
Crashdump.
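The thermal-management item above says that DTS readings and TCONTROL are both negative offsets from the TCC activation temperature. Turning such an offset into an absolute temperature is simple arithmetic, sketched below. The unit is an assumption: the raw value is taken to be a signed 16-bit offset in 1/64 degree Celsius steps, which is how PECI GetTemp() readings are commonly described; check the processor documentation before relying on it. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a PECI-style relative temperature to an absolute value in
 * millidegrees Celsius.
 *
 * Assumption: raw_offset is a signed 16-bit value in units of 1/64 degC,
 * expressed as a (normally negative) offset from the TCC activation
 * temperature, whose absolute value is known separately.
 */
static int32_t peci_offset_to_mdeg(int32_t tcc_activation_mdeg,
				   int16_t raw_offset)
{
	/*
	 * Multiply before dividing so 1/64 degC steps keep their precision;
	 * C integer division truncates toward zero.
	 */
	int32_t offset_mdeg = ((int32_t)raw_offset * 1000) / 64;

	return tcc_activation_mdeg + offset_mdeg;
}
```

For example, with a TCC activation point of 100 degrees C (100000) and a raw offset of -640 (that is, -10 degrees), the result is 90000 millidegrees.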

PECI uses a single wire for self-clocking and data transfer. The bus
requires no additional control lines. The physical layer is a self-clocked
one-wire bus that begins each bit with a driven, rising edge from an idle
level near zero volts. The duration of the signal driven high depends on
whether the bit value is a logic '0' or logic '1'. PECI also supports a
variable data transfer rate, established with every message. In this way, it
is highly flexible even though the underlying logic is simple.
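The bit encoding described above (a rising edge starts every bit, and the high time distinguishes '0' from '1') can be modeled in a few lines. This is a deliberately simplified sketch: the half-period threshold is an assumption standing in for the real timing windows in the PECI specification, and bit_ns stands for the bit period, which per the text is established for each message.

```c
#include <assert.h>

/*
 * Simplified model of the PECI physical layer: every bit starts with a
 * rising edge, and a logic '1' stays high for longer than a logic '0'.
 * Assumption: a bit whose high time exceeds half of the bit period
 * (bit_ns, set per message) is decoded as '1', otherwise as '0'.
 */
static int peci_decode_bit(unsigned int high_ns, unsigned int bit_ns)
{
	return 2u * high_ns > bit_ns;
}
```

Because the decision depends on the ratio of high time to bit period rather than on absolute durations, the same logic works at whatever rate a given message uses.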

The interface design was optimized for interfacing between an Intel
processor and chipset components in both single processor and multiple
processor environments. The single wire interface provides low board
routing overhead for the multiple load connections in the congested routing
area near the processor and chipset components. Bus speed, error checking,
and low protocol overhead provide adequate link bandwidth and reliability
to transfer critical device operating conditions and configuration
information.

This implementation provides the basic framework to add PECI extensions to
the Linux bus and device models. A hardware-specific 'Adapter' driver can
be attached to the PECI bus to provide the sideband functions described above.
It is also possible to access all devices on an adapter from userspace
through the /dev interface. A device-specific 'Client' driver can also be
attached to the PECI bus so each processor client's features can be
supported by the 'Client' driver through an adapter connection in the bus.
This patch set includes the Aspeed 24xx/25xx PECI driver and the PECI
cputemp/dimmtemp drivers as the first implementations of adapter and
client drivers on the PECI bus framework.

Please review.

Thanks,

-Jae

Changes from v3:
* Made code simpler and more compact.
* Removed unused header file inclusion.
* Fixed incorrect error return values and messages.
* Removed DTS margin temperature from the peci-cputemp.
* Made some magic numbers use defines.
* Moved peci_get_cpu_id() into peci-core as a common function.
* Replaced the cancel_delayed_work() call with a cancel_delayed_work_sync().
* Replaced AST and Aspeed uses with ASPEED.
* Simplified peci command timeout checking logic using
  regmap_read_poll_timeout().
* Simplified endian swap codes using endian handling macros.
* Dropped regmap read/write error checking except for the first access.
* Added a PECI reset setting in the device tree node.
* Removed unnecessary sleep from the probe context.
* Removed IRQF_SHARED flag from irq request code in the ASPEED PECI driver.
* Fixed typos in documents.
* Combined peci-bus.txt, peci-adapter.txt and peci-client.txt into peci.txt.
* Fixed and swept documents to drop some incorrect or unnecessary
  descriptions.
* Fixed device tree to make unit-address format use reg contents.
* Simplified bit manipulations using .
* 

Re: [PATCH 0/3] bpf: add boot parameters for sysctl knobs

2018-05-21 Thread Alexei Starovoitov
On Mon, May 21, 2018 at 02:29:30PM +0200, Eugene Syromiatnikov wrote:
> Hello.
> 
> This patch set adds the ability to set default values for
> kernel.unprivileged_bpf_disabled, net.core.bpf_jit_harden,
> net.core.bpf_jit_kallsyms sysctl knobs as well as an option to override
> them via a boot-time kernel parameter.

Commit logs should explain not only 'what' is being done by the patch,
but also 'why'.



[PATCH] Documentation: document hung_task_panic kernel parameter

2018-05-21 Thread Omar Sandoval
From: Omar Sandoval 

This parameter has been around since commit e162b39a368f ("softlockup:
decouple hung tasks check from softlockup detection") in 2009 but was
never documented.

Signed-off-by: Omar Sandoval 
---
 Documentation/admin-guide/kernel-parameters.txt | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 11fc28ecdb6d..4e37bebdc3d0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1341,6 +1341,16 @@
x86-64 are 2M (when the CPU supports "pse") and 1G
(when the CPU supports the "pdpe1gb" cpuinfo flag).
 
+   hung_task_panic=
+   [KNL] Should the hung task detector generate panics.
+   Format: 
+
+   A nonzero value instructs the kernel to panic when a
+   hung task is detected. The default value is controlled
+   by the CONFIG_BOOTPARAM_HUNG_TASK_PANIC build-time
+   option. The value selected by this boot parameter can
+   be changed later by the kernel.hung_task_panic sysctl.
+
hvc_iucv=   [S390] Number of z/VM IUCV hypervisor console (HVC)
   terminal devices. Valid values: 0..8
hvc_iucv_allow= [S390] Comma-separated list of z/VM user IDs.
-- 
2.17.0
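The documented semantics (a build-time default, overridden by hung_task_panic= on the command line, where any nonzero value means panic) can be illustrated with a small userspace sketch. This is not the kernel's parser: hung_task_panic_from_cmdline() and EXAMPLE_DEFAULT_HUNG_TASK_PANIC are hypothetical, the "last occurrence wins" rule is a choice made for this sketch, and the runtime kernel.hung_task_panic sysctl is not modeled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the build-time default chosen via Kconfig. */
#define EXAMPLE_DEFAULT_HUNG_TASK_PANIC 0UL

/*
 * Return the hung_task_panic value a command line would select: the last
 * well-formed "hung_task_panic=<n>" wins, otherwise the build-time default.
 */
static unsigned long hung_task_panic_from_cmdline(const char *cmdline)
{
	static const char key[] = "hung_task_panic=";
	const size_t key_len = sizeof(key) - 1;
	unsigned long value = EXAMPLE_DEFAULT_HUNG_TASK_PANIC;
	const char *p = cmdline;

	while ((p = strstr(p, key)) != NULL) {
		/* Only match whole parameters, not e.g. "xhung_task_panic=". */
		if (p == cmdline || p[-1] == ' ') {
			char *end;
			unsigned long v = strtoul(p + key_len, &end, 0);

			if (end != p + key_len)	/* at least one digit parsed */
				value = v;
		}
		p += key_len;
	}
	return value;
}
```

With `"ro hung_task_panic=1 quiet"` the sketch returns 1 (panic on hung tasks); with no such parameter it falls back to the default.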



Re: [PATCH v5 3/5] i2c: i2c-qcom-geni: Add bus driver for the Qualcomm GENI I2C controller

2018-05-21 Thread Doug Anderson
Wolfram,

On Fri, Mar 23, 2018 at 4:34 PM, Doug Anderson  wrote:
> Hi,
>
> On Fri, Mar 23, 2018 at 1:20 PM, Karthikeyan Ramasubramanian
>  wrote:
>> This bus driver supports the GENI based i2c hardware controller in the
>> Qualcomm SOCs. The Qualcomm Generic Interface (GENI) is a programmable
>> module supporting a wide range of serial interfaces including I2C. The
>> driver supports FIFO mode and DMA mode of transfer and switches modes
>> dynamically depending on the size of the transfer.
>>
>> Signed-off-by: Karthikeyan Ramasubramanian 
>> Signed-off-by: Sagar Dharia 
>> Signed-off-by: Girish Mahadevan 
>> ---
>>  drivers/i2c/busses/Kconfig |  13 +
>>  drivers/i2c/busses/Makefile|   1 +
>>  drivers/i2c/busses/i2c-qcom-geni.c | 650 +
>>  3 files changed, 664 insertions(+)
>
> [...]
>
>> +/*
>> + * Hardware uses the underlying formula to calculate time periods of
>> + * SCL clock cycle. Firmware uses some additional cycles excluded from the
>> + * below formula and it is confirmed that the time periods are within
>> + * specification limits.
>
> I was hoping for more than just "oh, and there's a fudge factor", but
> I guess this is the best I'm going to get?
>
>
>> +static int geni_i2c_probe(struct platform_device *pdev)
>> +{
>> +   struct geni_i2c_dev *gi2c;
>> +   struct resource *res;
>> +   u32 proto, tx_depth;
>> +   int ret;
>> +
>> +   gi2c = devm_kzalloc(&pdev->dev, sizeof(*gi2c), GFP_KERNEL);
>> +   if (!gi2c)
>> +           return -ENOMEM;
>> +
>> +   gi2c->se.dev = &pdev->dev;
>> +   gi2c->se.wrapper = dev_get_drvdata(pdev->dev.parent);
>> +   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>> +   gi2c->se.base = devm_ioremap_resource(&pdev->dev, res);
>> +   if (IS_ERR(gi2c->se.base))
>> +           return PTR_ERR(gi2c->se.base);
>> +
>> +   gi2c->se.clk = devm_clk_get(&pdev->dev, "se");
>> +   if (IS_ERR(gi2c->se.clk)) {
>> +           ret = PTR_ERR(gi2c->se.clk);
>> +           dev_err(&pdev->dev, "Err getting SE Core clk %d\n", ret);
>> +           return ret;
>> +   }
>> +
>> +   ret = device_property_read_u32(&pdev->dev, "clock-frequency",
>> +                                  &gi2c->clk_freq_out);
>> +   if (ret) {
>> +           /* Clock frequency not specified, so default to 100kHz. */
>> +           dev_info(&pdev->dev,
>> +                   "Bus frequency not specified, default to 100kHz.\n");
>
> If you happen to spin again, can you remove the comment since it's
> obvious from the string in the print?  It looks a lot like this code:
>
> /* Print hello, world */
> printf("hello, world\n");
>
>
> In any case, that's a pretty minor nit, so I'll add:
>
> Reviewed-by: Douglas Anderson 
>
> ...assuming that the bindings and "geni" code get Acked / landed
> somewhere.  Ideally let's not land this before the geni code lands
> since if the geni API changes for some reason it'll cause us grief.

The bindings and "geni" code have landed in Andy's tree, so whenever
you get a chance it would be super if you could land this i2c driver
(assuming it looks good to you).  I know at least a few people have
been poking at this and it seems to work for basic transfers.

Thanks!

-Doug


Re: [PATCH v2 0/7] mm: pages for hugetlb's overcommit may be able to charge to memcg

2018-05-21 Thread Mike Kravetz
On 05/17/2018 09:27 PM, TSUKADA Koutaro wrote:
> Thanks to Mike Kravetz for comment on the previous version patch.
> 
> The purpose of this patch-set is to make it possible to control whether or
> not to charge surplus hugetlb pages obtained by overcommitting to memory
> cgroup. In the future, I would like to limit the memory usage
> of applications that use both normal pages and hugetlb pages with the memory
> cgroup (not the hugetlb cgroup).
> 
> Applications that use shared libraries like libhugetlbfs.so use both normal
> pages and hugetlb pages, but we do not know how much of each they use.
> Suppose you want to manage the memory usage of such applications by cgroup.
> How do you set the memory cgroup and hugetlb cgroup limits when you want to
> limit memory usage to 10GB?
> 
> If you set a limit of 10GB for each, the user can use a total of 20GB of
> memory, so the limit does not work well. Since it is difficult to estimate
> the ratio of normal pages to hugetlb pages the user will use, setting limits
> of 2GB for the memory cgroup and 8GB for the hugetlb cgroup is not a very
> good idea. In such a case, I thought that by using my patch-set, we could
> manage resources just by setting 10GB as the limit of the memory cgroup
> (with no limit on the hugetlb cgroup).
> 
> This patch-set introduces charge_surplus_huge_pages (boolean) to
> struct hstate. If it is true, surplus hugepages are charged to the memory
> cgroup to which the task that obtained them belongs. If it is false, nothing
> is done, as before; the default value is false. charge_surplus_huge_pages
> can be controlled via procfs or sysfs interfaces.
> 
> Since THP is very effective in environments with a kernel page size of 4KB,
> such as x86, there is little reason to actively use HugeTLBfs there, so I
> think there is no situation in which to enable charge_surplus_huge_pages.
> However, on some distributions for arm64, the kernel page size is 64KB, and
> the THP size is 512MB, which is too large and makes THP difficult to use.
> HugeTLBfs may support multiple huge page sizes, and in such a special
> environment there is a desire to use HugeTLBfs.

One of the basic questions/concerns I have is accounting for surplus huge
pages in the default memory resource controller.  The existing hugetlb
resource controller already takes hugetlbfs huge pages into account,
including surplus pages.  This series would allow surplus pages to be
accounted for in the default memory controller, or the hugetlb controller
or both.

I understand that current mechanisms do not meet the needs of the above
use case.  The question is whether this is an appropriate way to approach
the issue.  My cgroup experience and knowledge is extremely limited, but
it does not appear that any other resource can be controlled by multiple
controllers.  Therefore, I am concerned that this may be going against
basic cgroup design philosophy.

It would be good to get comments from people more cgroup knowledgeable,
and especially from those involved in the decision to do separate hugetlb
control.
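The 10GB example from the cover letter can be reduced to a toy accounting model, which also makes the "both controllers" concern visible: with charging enabled, a surplus huge page counts against the memory cgroup and, if one is set, the hugetlb cgroup as well. Everything below is hypothetical and simplified (sizes in GB, only surplus huge pages modeled); it is not the series' implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified accounting model (sizes in GB) of the 10GB
 * example. Only surplus hugetlb pages are modeled.
 */
struct limits {
	unsigned int memcg_limit;   /* memory cgroup limit */
	unsigned int hugetlb_limit; /* hugetlb cgroup limit, UINT_MAX = none */
	bool charge_surplus;        /* proposed charge_surplus_huge_pages */
};

/* Does this mix of normal and surplus huge pages fit under the limits? */
static bool usage_allowed(const struct limits *l, unsigned int normal_gb,
			  unsigned int surplus_huge_gb)
{
	unsigned int memcg_usage =
		normal_gb + (l->charge_surplus ? surplus_huge_gb : 0);

	/* A surplus page is checked against both controllers. */
	return memcg_usage <= l->memcg_limit &&
	       surplus_huge_gb <= l->hugetlb_limit;
}

/* 10GB in each controller, surplus pages not charged to memcg. */
static const struct limits separate_limits = { 10, 10, false };
/* 10GB memcg only, surplus pages charged to memcg (the proposal). */
static const struct limits proposed_limits = { 10, UINT_MAX, true };
```

Under separate_limits, 10GB of normal pages plus 10GB of surplus huge pages is allowed (20GB total); under proposed_limits the same usage is rejected, while any mix summing to 10GB fits.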

-- 
Mike Kravetz

> 
> The patch set is for 4.17.0-rc3+. I don't know whether the patch set is
> acceptable or not, so I have only done a simple test.
> 
> Thanks,
> Tsukada
> 
> TSUKADA Koutaro (7):
>   hugetlb: introduce charge_surplus_huge_pages to struct hstate
>   hugetlb: supports migrate charging for surplus hugepages
>   memcg: use compound_order rather than hpage_nr_pages
>   mm, sysctl: make charging surplus hugepages controllable
>   hugetlb: add charge_surplus_hugepages attribute
>   Documentation, hugetlb: describe about charge_surplus_hugepages
>   memcg: supports movement of surplus hugepages statistics
> 
>  Documentation/vm/hugetlbpage.txt |6 +
>  include/linux/hugetlb.h  |4 +
>  kernel/sysctl.c  |7 +
>  mm/hugetlb.c |  148 +++
>  mm/memcontrol.c  |  109 +++-
>  5 files changed, 269 insertions(+), 5 deletions(-)
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/6] cpuset: Enable cpuset controller in default hierarchy

2018-05-21 Thread Waiman Long
On 05/21/2018 11:09 AM, Patrick Bellasi wrote:
> On 21-May 09:55, Waiman Long wrote:
>
>> Changing cpuset.cpus will require searching for all the tasks in
>> the cpuset and changing their CPU masks.
> ... I'm wondering if that has to be the case. In principle there can
> be a different solution which is: update on demand. In the wakeup
> path, once we know a task really needs a CPU and we want to find one
> for it, at that point we can align the cpuset mask with the task's
> one. Sort of using the cpuset mask as a clamp on top of the task's
> affinity mask.
>
> The main downside of such an approach could be the overheads in the
> wakeup path... but, still... that should be measured.
> The advantage is that we do not spend time changing attributes of
> tasks which, potentially, could be sleeping for a long time.

We already have a linked list of tasks in a cgroup. So it isn't too hard
to find them. Doing update on demand will require adding a bunch of code
to the wakeup path. So unless there is a good reason to do it, I don't
see it as necessary at this point.

>
>> That isn't a fast operation, but it shouldn't be too bad either
>> depending on how many tasks are in the cpuset.
> Indeed, although it still seems a bit odd, and overkill, to update
> task affinity for tasks which are not currently RUNNABLE. Isn't it?
>
>> I would not suggest doing rapid changes to cpuset.cpus as a means to tune
>> the behavior of a task. So what exactly is the tuning you are thinking
>> about? Is it moving a task from a high-power CPU to a low-power one
>> or vice versa?
> That's definitely a possible use case. In Android, for example, we
> usually assign more resources to TOP_APP tasks (those belonging to the
> application you are currently using) while we restrict the resources
> once we switch an app to be in BACKGROUND.

Switching an app from foreground to background and vice versa shouldn't
happen that frequently. Maybe once every few seconds, at most. I am just
wondering what use cases will require changing cpuset attributes tens of
times per second.

> More generally, consider a generic Run-Time Resource
> Management framework, which assigns resources to the tasks of multiple
> applications and wants fine-grained control.
>
>> If so, it is probably better to move the task from one cpuset of
>> high-power cpus to another cpuset of low-power cpus.
> This is what Android does now, but also what we would like to
> change, for two main reasons:
>
> 1. it does not fit with the "number one guideline" for proper
>    CGroups usage, which is "Organize Once and Control":
>      https://elixir.bootlin.com/linux/latest/source/Documentation/cgroup-v2.txt#L518
>    where it says that:
>      migrating processes across cgroups frequently as a means to
>      apply different resource restrictions is discouraged.
>
>    Despite this guideline, it turns out that in v1 at least, it seems
>    to be faster to move tasks across cpusets than to tune cpuset
>    attributes... also when all the tasks are sleeping.

It is probably similar in v2, as the core logic is almost the same.

> 2. it does not allow us to take advantage of accounting controllers such
>    as the memory controller where, by moving tasks around, we cannot
>    properly account and control the amount of memory a task can use.

For v1, the memory controller and the cpuset controller can be in different
hierarchies. For v2, we have a unified hierarchy. However, we don't need
to enable all the controllers at every level of the hierarchy. For
example,

A (memory, cpuset) -- B1 (cpuset)
                   \-- B2 (cpuset)

Cgroup A has memory and cpuset controllers enabled. The child cgroups B1
and B2 only have cpuset enabled. You can move tasks between B1 and B2
and they will be subjected to the same memory limitation as imposed by
the memory controller in A. So there are ways to work around that.

> Thus, for these reasons and also to possibly migrate to the unified
> hierarchy schema proposed by CGroups v2... we would like a
> low-overhead mechanism for setting/tuning cpuset at run-time with
> whatever frequency you like.

We may be able to improve the performance of changing cpuset attribute
somewhat, but I don't believe there will be much improvement here.

 +
 +The "cpuset" controller is hierarchical.  That means the controller
 +cannot use CPUs or memory nodes not allowed in its parent.
 +
 +
 +Cpuset Interface Files
 +~~~~~~~~~~~~~~~~~~~~~~
 +
 +  cpuset.cpus
 +  A read-write multiple values file which exists on non-root
 +  cpuset-enabled cgroups.
 +
 +  It lists the CPUs allowed to be used by tasks within this
 +  cgroup.  The CPU numbers are comma-separated numbers or
 +  ranges.  For example:
 +
 +# cat cpuset.cpus
 +0-4,6,8-10
 +
 +  An empty value indicates that the cgroup is using the same
 +  setting as the nearest cgroup ancestor with a non-empty
 +  

Re: [PATCH 0/3] docs/vm: transhuge: split userspace bits to admin-guide/mm

2018-05-21 Thread Jonathan Corbet
On Mon, 14 May 2018 11:13:37 +0300
Mike Rapoport  wrote:

> Here are minor updates to the transparent hugepage docs. Apart from minor
> formatting and spelling updates, these patches rearrange transhuge.rst
> so that the userspace interface description is not interleaved with the
> implementation details, which makes it possible to split the userspace
> related bits into Documentation/admin-guide/mm; the third patch does
> that.

Looks good, I've applied the set, after adding a changelog for #3.

Thanks,

jon


Re: [PATCH v8 1/6] cpuset: Enable cpuset controller in default hierarchy

2018-05-21 Thread Patrick Bellasi
On 21-May 09:55, Waiman Long wrote:
> On 05/21/2018 07:55 AM, Patrick Bellasi wrote:
> > Hi Waiman!

[...]

> >> +Cpuset
> >> +------
> >> +
> >> +The "cpuset" controller provides a mechanism for constraining
> >> +the CPU and memory node placement of tasks to only the resources
> >> +specified in the cpuset interface files in a task's current cgroup.
> >> +This is especially valuable on large NUMA systems where placing jobs
> >> +on properly sized subsets of the systems with careful processor and
> >> +memory placement to reduce cross-node memory access and contention
> >> +can improve overall system performance.
> > Another quite important use-case for cpuset is Android, where cpusets are
> > actively used for both power saving and performance tuning.
> > For example, depending on the status of an application, its threads
> > can be allowed to run on all available CPUs (e.g. foreground apps) or
> > be restricted to only a few energy-efficient CPUs (e.g. background apps).
> >
> > Since here we are "rewriting" cpusets for v2, I think it's important
> > to take this mobile-world scenario into consideration.
> >
> > For example, in this context, we are looking at the possibility to
> > update/tune cpuset.cpus with a relatively high rate, i.e. tens of
> > times per second. Not sure that's the same update rate usually
> > required for the large NUMA systems you cite above.  However, in this
> > case it's quite important to have really small overheads for these
> > operations.
> 
> The cgroup interface isn't designed for high update throughput.

Indeed, I had the same impression...

> Changing cpuset.cpus will require searching for all the tasks in
> the cpuset and changing their CPU masks.

... I'm wondering if that has to be the case. In principle there can
be a different solution which is: update on demand. In the wakeup
path, once we know a task really needs a CPU and we want to find one
for it, at that point we can align the cpuset mask with the task's
one. Sort of using the cpuset mask as a clamp on top of the task's
affinity mask.

The main downside of such an approach could be the overheads in the
wakeup path... but, still... that should be measured.
The advantage is that we do not spend time changing attributes of
tasks which, potentially, could be sleeping for a long time.


> That isn't a fast operation, but it shouldn't be too bad either
> depending on how many tasks are in the cpuset.

Indeed, although it still seems a bit odd, and overkill, to update
task affinity for tasks which are not currently RUNNABLE. Isn't it?

> I would not suggest doing rapid changes to cpuset.cpus as a means to tune
> the behavior of a task. So what exactly is the tuning you are thinking
> about? Is it moving a task from a high-power CPU to a low-power one
> or vice versa?

That's definitely a possible use case. In Android, for example, we
usually assign more resources to TOP_APP tasks (those belonging to the
application you are currently using) while we restrict the resources
once we switch an app to be in BACKGROUND.

More generally, consider a generic Run-Time Resource
Management framework, which assigns resources to the tasks of multiple
applications and wants fine-grained control.

> If so, it is probably better to move the task from one cpuset of
> high-power cpus to another cpuset of low-power cpus.

This is what Android does now, but also what we would like to
change, for two main reasons:

1. it does not fit with the "number one guideline" for proper
   CGroups usage, which is "Organize Once and Control":
     https://elixir.bootlin.com/linux/latest/source/Documentation/cgroup-v2.txt#L518
   where it says that:
     migrating processes across cgroups frequently as a means to
     apply different resource restrictions is discouraged.

   Despite this guideline, it turns out that in v1 at least, it seems
   to be faster to move tasks across cpusets than to tune cpuset
   attributes... also when all the tasks are sleeping.


2. it does not allow us to take advantage of accounting controllers such
   as the memory controller where, by moving tasks around, we cannot
   properly account and control the amount of memory a task can use.

Thus, for these reasons and also to possibly migrate to the unified
hierarchy schema proposed by CGroups v2... we would like a
low-overhead mechanism for setting/tuning cpuset at run-time with
whatever frequency you like.

> >> +
> >> +The "cpuset" controller is hierarchical.  That means the controller
> >> +cannot use CPUs or memory nodes not allowed in its parent.
> >> +
> >> +
> >> +Cpuset Interface Files
> >> +~~
> >> +
> >> +  cpuset.cpus
> >> +  A read-write multiple values file which exists on non-root
> >> +  cpuset-enabled cgroups.
> >> +
> >> +  It lists the CPUs allowed to be used by tasks within this
> >> +  cgroup.  The CPU numbers are comma-separated numbers or
> >> +  ranges.  For example:
> >> +
> >> +# cat 

Re: [PATCH v2 3/7] memcg: use compound_order rather than hpage_nr_pages

2018-05-21 Thread Punit Agrawal
TSUKADA Koutaro  writes:

> On 2018/05/19 2:51, Punit Agrawal wrote:
>> Punit Agrawal  writes:
>>
>>> Tsukada-san,
>>>
>>> I am not familiar with memcg so can't comment about whether the patchset
>>> is the right way to solve the problem outlined in the cover letter but
>>> had a couple of comments about this patch.
>>>
>>> TSUKADA Koutaro  writes:
>>>
 The current memcg implementation assumes that the compound page is THP.
 In order to be able to charge surplus hugepage, we use compound_order.

 Signed-off-by: TSUKADA Koutaro 
>>>
>>> Please move this before Patch 1/7. This is to prevent wrong accounting
>>> of pages to memcg for size != PMD_SIZE.
>>
>> I just noticed that the default state is off so the change isn't enabled
>> until the sysfs node is exposed in the next patch. Please ignore this
>> comment.
>>
>> The one below still applies.
>>
>>>
 ---
   memcontrol.c |   10 +-
   1 file changed, 5 insertions(+), 5 deletions(-)

 diff --git a/mm/memcontrol.c b/mm/memcontrol.c
 index 2bd3df3..a8f1ff8 100644
 --- a/mm/memcontrol.c
 +++ b/mm/memcontrol.c
 @@ -4483,7 +4483,7 @@ static int mem_cgroup_move_account(struct page *page,
   struct mem_cgroup *to)
   {
unsigned long flags;
 -  unsigned int nr_pages = compound ? hpage_nr_pages(page) : 1;
 +  unsigned int nr_pages = compound ? (1 << compound_order(page)) : 1;
>>>
>>> Instead of replacing calls to hpage_nr_pages(), is it possible to modify
>>> it to do the calculation?
>
> Thank you for reviewing my code, and please just call me Tsukada.
>
> I think it is possible to modify hpage_nr_pages() itself rather than
> replacing the calls to it.
>
> Judging from what hpage_nr_pages() is meant to compute, I thought
> that its definition could be moved outside
> CONFIG_TRANSPARENT_HUGEPAGE. It seems that both THP and HugeTLBfs pages
> can be handled correctly, because compound_order() checks whether the
> page is PageHead or not.
>
> Also, I would like to use compound_order() inside hpage_nr_pages(), but
> since huge_mm.h is included before mm.h, where compound_order() is defined,
> I moved hpage_nr_pages() to mm.h.
>
> Instead of patch 3/7, does the following patch implement what you
> intended?
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index a8a1262..1186ab7 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -204,12 +204,6 @@ static inline spinlock_t *pud_trans_huge_lock(pud_t *pud,
>   else
>   return NULL;
>  }
> -static inline int hpage_nr_pages(struct page *page)
> -{
> - if (unlikely(PageTransHuge(page)))
> - return HPAGE_PMD_NR;
> - return 1;
> -}
>
>  struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
>   pmd_t *pmd, int flags);
> @@ -254,8 +248,6 @@ static inline bool thp_migration_supported(void)
>  #define HPAGE_PUD_MASK ({ BUILD_BUG(); 0; })
>  #define HPAGE_PUD_SIZE ({ BUILD_BUG(); 0; })
>
> -#define hpage_nr_pages(x) 1
> -
>  static inline bool transparent_hugepage_enabled(struct vm_area_struct *vma)
>  {
>   return false;
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 1ac1f06..082f2ee 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -673,6 +673,12 @@ static inline unsigned int compound_order(struct page *page)
>   return page[1].compound_order;
>  }
>
> +static inline int hpage_nr_pages(struct page *page)
> +{
> + VM_BUG_ON_PAGE(PageTail(page), page);
> + return (1 << compound_order(page));
> +}
> +
>  static inline void set_compound_order(struct page *page, unsigned int order)
>  {
>   page[1].compound_order = order;

That looks a lot better. Thanks for giving it a go.



Re: [PATCH v2 0/7] mm: pages for hugetlb's overcommit may be able to charge to memcg

2018-05-21 Thread Punit Agrawal
Hi Tsukada,

I was staring at memcg code to better understand your changes and had
the below thought.

TSUKADA Koutaro  writes:

[...]

> In this patch-set, introduce the charge_surplus_huge_pages(boolean) to
> struct hstate. If it is true, it charges to the memory cgroup to which the
> task that obtained surplus hugepages belongs. If it is false, do nothing as
> before, and the default value is false. The charge_surplus_huge_pages can
> be controlled via procfs or sysfs interfaces.

Instead of tying the surplus huge page charging control per-hstate,
could the control be made per-memcg?

This can be done by introducing a per-memory controller file in sysfs
(memory.charge_surplus_hugepages?) that indicates whether surplus
hugepages are to be charged to the controller and forms part of the
total limit. IIUC, the limit already accounts for page and swap cache
pages.

This would allow the control to be enabled per-cgroup and also keep the
userspace control interface in one place.

As said earlier, I'm not familiar with memcg, so the above might not be
feasible, but I think it'll lead to a more coherent user
interface. Hopefully, more knowledgeable folks on the thread can chime
in.

Thanks,
Punit

> Since THP is very effective in environments with a kernel page size of 4KB,
> such as x86, there is no reason to actively use HugeTLBfs, so I think
> that there is no need to enable charge_surplus_huge_pages there. However, in
> some distributions on architectures such as arm64, the kernel page size is
> 64KB and the THP size is as large as 512MB, which makes THP difficult to use.
> HugeTLBfs may support multiple huge page sizes, and in such a special
> environment there is a desire to use HugeTLBfs.
>
> The patch set is against 4.17.0-rc3+. I don't know whether the patch set is
> acceptable or not, so I have only done a simple test.
>
> Thanks,
> Tsukada
>
> TSUKADA Koutaro (7):
>   hugetlb: introduce charge_surplus_huge_pages to struct hstate
>   hugetlb: supports migrate charging for surplus hugepages
>   memcg: use compound_order rather than hpage_nr_pages
>   mm, sysctl: make charging surplus hugepages controllable
>   hugetlb: add charge_surplus_hugepages attribute
>   Documentation, hugetlb: describe about charge_surplus_hugepages
>   memcg: supports movement of surplus hugepages statistics
>
>  Documentation/vm/hugetlbpage.txt |    6 +
>  include/linux/hugetlb.h          |    4 +
>  kernel/sysctl.c                  |    7 +
>  mm/hugetlb.c                     |  148 +++
>  mm/memcontrol.c                  |  109 +++-
>  5 files changed, 269 insertions(+), 5 deletions(-)


Re: [PATCH v8 1/6] cpuset: Enable cpuset controller in default hierarchy

2018-05-21 Thread Waiman Long
On 05/21/2018 07:55 AM, Patrick Bellasi wrote:
> Hi Waiman!
>
> I've started looking at the possibility to move Android to use cgroups
> v2 and the availability of the cpuset controller makes this even more
> promising.
>
> I'll try to give this series a run on Android; meanwhile, I have
> some (hopefully not too dumb) questions below.
>
> On 17-May 16:55, Waiman Long wrote:
>> Given the fact that thread mode has been merged into 4.14, it is now
>> time to enable cpuset to be used in the default hierarchy (cgroup v2)
>> as it is clearly threaded.
>>
>> The cpuset controller has experienced feature creep since its
>> introduction more than a decade ago. Besides the core cpus and mems
>> control files to limit cpus and memory nodes, there are a bunch of
>> additional features that can be controlled from the userspace. Some of
>> the features are of doubtful usefulness and may not be actively used.
>>
>> This patch enables cpuset controller in the default hierarchy with
>> a minimal set of features, namely just the cpus and mems and their
>> effective_* counterparts.  We can certainly add more features to the
>> default hierarchy in the future if there is a real user need for them
>> later on.
>>
>> Alternatively, with the unified hierarchy, it may make more sense
>> to move some of those additional cpuset features, if desired, to the
>> memory controller or maybe to the cpu controller instead of staying
>> with cpuset.
>>
>> Signed-off-by: Waiman Long 
>> ---
>>  Documentation/cgroup-v2.txt | 90 ++---
>>  kernel/cgroup/cpuset.c      | 48 ++--
>>  2 files changed, 130 insertions(+), 8 deletions(-)
>>
>> diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
>> index 74cdeae..cf7bac6 100644
>> --- a/Documentation/cgroup-v2.txt
>> +++ b/Documentation/cgroup-v2.txt
>> @@ -53,11 +53,13 @@ v1 is available under Documentation/cgroup-v1/.
>> 5-3-2. Writeback
>>   5-4. PID
>> 5-4-1. PID Interface Files
>> - 5-5. Device
>> - 5-6. RDMA
>> -   5-6-1. RDMA Interface Files
>> - 5-7. Misc
>> -   5-7-1. perf_event
>> + 5-5. Cpuset
>> +   5-5-1. Cpuset Interface Files
>> + 5-6. Device
>> + 5-7. RDMA
>> +   5-7-1. RDMA Interface Files
>> + 5-8. Misc
>> +   5-8-1. perf_event
>>   5-N. Non-normative information
>> 5-N-1. CPU controller root cgroup process behaviour
>> 5-N-2. IO controller root cgroup process behaviour
>> @@ -1435,6 +1437,84 @@ through fork() or clone(). These will return -EAGAIN if the creation
>>  of a new process would cause a cgroup policy to be violated.
>>  
>>  
>> +Cpuset
>> +--
>> +
>> +The "cpuset" controller provides a mechanism for constraining
>> +the CPU and memory node placement of tasks to only the resources
>> +specified in the cpuset interface files in a task's current cgroup.
>> +This is especially valuable on large NUMA systems where placing jobs
>> +on properly sized subsets of the systems with careful processor and
>> +memory placement to reduce cross-node memory access and contention
>> +can improve overall system performance.
> Another quite important use-case for cpuset is Android, where cpusets are
> actively used for both power saving and performance tuning.
> For example, depending on the status of an application, its threads
> can be allowed to run on all available CPUs (e.g. foreground apps) or
> be restricted to only a few energy-efficient CPUs (e.g. background apps).
>
> Since here we are "rewriting" cpusets for v2, I think it's important
> to take this mobile-world scenario into consideration.
>
> For example, in this context, we are looking at the possibility to
> update/tune cpuset.cpus with a relatively high rate, i.e. tens of
> times per second. Not sure that's the same update rate usually
> required for the large NUMA systems you cite above.  However, in this
> case it's quite important to have really small overheads for these
> operations.

The cgroup interface isn't designed for high update throughput. Changing
cpuset.cpus will require searching for all the tasks in the cpuset
and changing their CPU masks. That isn't a fast operation, but it shouldn't
be too bad either, depending on how many tasks are in the cpuset.

I would not suggest doing rapid changes to cpuset.cpus as a means to tune
the behavior of a task. So what exactly is the tuning you are thinking
about? Is it moving a task from a high-power CPU to a low-power one
or vice versa? If so, it is probably better to move the task from one
cpuset of high-power CPUs to another cpuset of low-power CPUs.

>> +
>> +The "cpuset" controller is hierarchical.  That means the controller
>> +cannot use CPUs or memory nodes not allowed in its parent.
>> +
>> +
>> +Cpuset Interface Files
>> +~~
>> +
>> +  cpuset.cpus
>> +A read-write multiple values file which exists on non-root
>> +

Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-05-21 Thread Ganapatrao Kulkarni
On Mon, May 21, 2018 at 4:10 PM, Mark Rutland  wrote:
> On Mon, May 21, 2018 at 11:37:12AM +0100, Mark Rutland wrote:
>> Hi Ganapat,
>>
>>
>> Sorry for the delay in replying; I was away most of last week.
>>
>> On Tue, May 15, 2018 at 04:03:19PM +0530, Ganapatrao Kulkarni wrote:
>> > On Sat, May 5, 2018 at 12:16 AM, Ganapatrao Kulkarni wrote:
>> > > On Thu, Apr 26, 2018 at 4:29 PM, Mark Rutland wrote:
>> > >> On Wed, Apr 25, 2018 at 02:30:47PM +0530, Ganapatrao Kulkarni wrote:
>>
>> > >>> +static int alloc_counter(struct thunderx2_pmu_uncore_channel *pmu_uncore)
>> > >>> +{
>> > >>> + int counter;
>> > >>> +
>> > >>> + raw_spin_lock(&pmu_uncore->lock);
>> > >>> + counter = find_first_zero_bit(pmu_uncore->counter_mask,
>> > >>> + pmu_uncore->uncore_dev->max_counters);
>> > >>> + if (counter == pmu_uncore->uncore_dev->max_counters) {
>> > >>> + raw_spin_unlock(&pmu_uncore->lock);
>> > >>> + return -ENOSPC;
>> > >>> + }
>> > >>> + set_bit(counter, pmu_uncore->counter_mask);
>> > >>> + raw_spin_unlock(&pmu_uncore->lock);
>> > >>> + return counter;
>> > >>> +}
>> > >>> +
>> > >>> +static void free_counter(struct thunderx2_pmu_uncore_channel *pmu_uncore,
>> > >>> + int counter)
>> > >>> +{
>> > >>> + raw_spin_lock(&pmu_uncore->lock);
>> > >>> + clear_bit(counter, pmu_uncore->counter_mask);
>> > >>> + raw_spin_unlock(&pmu_uncore->lock);
>> > >>> +}
>> > >>
>> > >> I don't believe that locking is required in either of these, as the perf
>> > >> core serializes pmu::add() and pmu::del(), where these get called.
>> >
>> > without this locking, I am seeing "BUG: scheduling while atomic" when
>> > I run perf with more events together than the maximum number of counters
>> > supported
>>
>> Did you manage to get to the bottom of this?
>>
>> Do you have a backtrace?
>>
>> It looks like in your latest posting you reserve counters through the
>> userspace ABI, which doesn't seem right to me, and I'd like to
>> understand the problem.
>
> Looks like I misunderstood -- those are still allocated kernel-side.
>
> I'll follow that up in the v5 posting.

please review v5.
>
> Thanks,
> Mark.

thanks
Ganapat


Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-05-21 Thread Ganapatrao Kulkarni
Hi Mark,

On Mon, May 21, 2018 at 4:25 PM, Mark Rutland  wrote:
> On Sat, May 05, 2018 at 12:16:13AM +0530, Ganapatrao Kulkarni wrote:
>> On Thu, Apr 26, 2018 at 4:29 PM, Mark Rutland  wrote:
>> > On Wed, Apr 25, 2018 at 02:30:47PM +0530, Ganapatrao Kulkarni wrote:
>
>> >> + *
>> >> + *  L3 Tile and DMC channel selection is through SMC call
>> >> + *  SMC call arguments,
>> >> + *   x0 = THUNDERX2_SMC_CALL_ID  (Vendor SMC call Id)
>> >> + *   x1 = THUNDERX2_SMC_SET_CHANNEL  (Id to set DMC/L3C channel)
>> >> + *   x2 = Node id
>> >
>> > How do we map Linux node IDs to the firmware's view of node IDs?
>> >
>> > I don't believe the two are necessarily the same -- Linux's node IDs are
>> > a Linux-specific construct.
>>
>> Both are the same; it is the NUMA node id from ACPI/firmware.
>
> I am very wary about assuming that the Linux nid will always be the same
> as the ACPI node id.
>
> For that to *potentially* be true, this driver should depend on
> CONFIG_NUMA, NUMA must not be disabled on the command line, etc, or the
> node id will always be NUMA_NO_NODE.

OK, I can check the node id which we get from the ACPI helpers in probe.
If it is NUMA_NO_NODE, should I initialize only the first socket's uncore
and always pass zero as the nid param to the firmware?

>
> I would be *much* happier if we had an explicit mapping somewhere to the
> ID the FW expects.
>
>> > It would be much nicer if we could pass something based on the MPIDR,
>> > which is a known HW construct, or if this implicitly affected the
>> > current node.
>>
>> IMO, node id is sufficient.
>
> I agree that *a* node ID is sufficient, I just don't think that we're
> guaranteed to have the specific node ID the FW wants.

For ThunderX2, which is a 2-socket-only platform, the PXM and nid should be
the same (either 0 or 1). However, I can send the PXM id (node_to_pxm()) to
the firmware to make it more sane.

>
>> > It would be vastly more sane for this to not be muxed at all. :/
>>
>> I am helpless due to crappy HW design!
>
> I'm certainly not blaming you for this! :)
>
> I hope the HW designers don't make the same mistake in future, though...
>
> Thanks,
> Mark.

thanks
Ganapat


[PATCH 3/3] bpf: add ability to configure BPF JIT kallsyms export at the boot time

2018-05-21 Thread Eugene Syromiatnikov
This patch introduces two configuration options,
BPF_JIT_KALLSYMS_BOOTPARAM and BPF_JIT_KALLSYMS_BOOTPARAM_VALUE, that
allow configuring the initial value of net.core.bpf_jit_kallsyms sysctl
knob. This enables export of addresses of JIT'ed BPF programs that
are created during early boot.

Signed-off-by: Eugene Syromiatnikov 
---
 Documentation/admin-guide/kernel-parameters.txt | 10 +
 init/Kconfig| 30 +
 kernel/bpf/core.c   | 14 
 3 files changed, 54 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 5adc6d0..10e7502 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -452,6 +452,16 @@
2 - JIT hardening is enabled for all users.
Default value is set via kernel config option.
 
+   bpf_jit_kallsyms=
+   Format: { "0" | "1" }
+   Sets initial value of net.core.bpf_jit_kallsyms
+   sysctl knob.
+   0 - Addresses of JIT'ed BPF programs are not exported
+   to kallsyms.
+   1 - Export of addresses of JIT'ed BPF programs is
+   enabled for privileged users.
+   Default value is set via kernel config option.
+
bttv.card=  [HW,V4L] bttv (bt848 + bt878 based grabber cards)
bttv.radio= Most important insmod options are available as
kernel args too.
diff --git a/init/Kconfig b/init/Kconfig
index b661497..b5405ca 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1464,6 +1464,36 @@ config BPF_JIT_HARDEN_BOOTPARAM_VALUE
 
  If you are unsure how to answer this question, answer 0.
 
+config BPF_JIT_KALLSYMS_BOOTPARAM
+   bool "BPF JIT kallsyms export boot parameter"
+   default n
+   help
+ This option adds a kernel parameter 'bpf_jit_kallsyms' that allows
+ configuring default state of the net.core.bpf_jit_kallsyms sysctl
+ knob.  If this option is selected, the default value of the
+ net.core.bpf_jit_kallsyms sysctl knob can be set on the kernel command
+ line.  The purpose of this option is to allow enabling BPF JIT
+ kallsyms export for the BPF programs created during the early boot,
+ so they can be traced later.
+
+ If you are unsure how to answer this question, answer N.
+
+config BPF_JIT_KALLSYMS_BOOTPARAM_VALUE
+   int "BPF JIT kallsyms export boot parameter default value"
+   depends on BPF_JIT_KALLSYMS_BOOTPARAM
+   range 0 1
+   default 0
+   help
+ This option sets the default value for the kernel parameter
+ 'bpf_jit_kallsyms' that configures default value of the
+ net.core.bpf_jit_kallsyms sysctl knob at boot.  If this option is set
+ to 0 (zero), the net.core.bpf_jit_kallsyms will default to 0, which
+ will lead to disabling of exporting of addresses of JIT'ed BPF
+ programs.  If this option is set to 1 (one), addresses of privileged
+ BPF programs are exported to kallsyms.
+
+ If you are unsure how to answer this question, answer 0.
+
 config USERFAULTFD
bool "Enable userfaultfd() system call"
select ANON_INODES
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 9edb7a8..003d708 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -321,7 +321,21 @@ __setup("bpf_jit_harden=", bpf_jit_harden_setup);
 int bpf_jit_harden   __read_mostly;
 #endif /* CONFIG_BPF_JIT_HARDEN_BOOTPARAM */
 
+#ifdef CONFIG_BPF_JIT_KALLSYMS_BOOTPARAM
+int bpf_jit_kallsyms __read_mostly = CONFIG_BPF_JIT_KALLSYMS_BOOTPARAM_VALUE;
+
+static int __init bpf_jit_kallsyms_setup(char *str)
+{
+   unsigned long enabled;
+
+   if (!kstrtoul(str, 0, &enabled))
+   bpf_jit_kallsyms = !!enabled;
+   return 1;
+}
+__setup("bpf_jit_kallsyms=", bpf_jit_kallsyms_setup);
+#else /* !CONFIG_BPF_JIT_KALLSYMS_BOOTPARAM */
 int bpf_jit_kallsyms __read_mostly;
+#endif /* CONFIG_BPF_JIT_KALLSYMS_BOOTPARAM */
 
 static __always_inline void
 bpf_get_prog_addr_region(const struct bpf_prog *prog,
-- 
2.1.4



[PATCH 2/3] bpf: add ability to configure BPF JIT hardening via boot-time parameter

2018-05-21 Thread Eugene Syromiatnikov
This patch introduces two configuration options,
BPF_JIT_HARDEN_BOOTPARAM and BPF_JIT_HARDEN_BOOTPARAM_VALUE, that allow
configuring the initial value of net.core.bpf_jit_harden sysctl knob,
which is useful for enforcing JIT hardening during the early boot.

Signed-off-by: Eugene Syromiatnikov 
---
 Documentation/admin-guide/kernel-parameters.txt | 10 +
 init/Kconfig| 29 +
 kernel/bpf/core.c   | 17 +++
 3 files changed, 56 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index aa8e831..5adc6d0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -442,6 +442,16 @@
bert_disable    [ACPI]
Disable BERT OS support on buggy BIOSes.
 
+   bpf_jit_harden=
+   Format: { "0" | "1" | "2" }
+   Sets initial value of net.core.bpf_jit_harden
+   sysctl knob.
+   0 - JIT hardening is disabled.
+   1 - JIT hardening is enabled for unprivileged users
+   only.
+   2 - JIT hardening is enabled for all users.
+   Default value is set via kernel config option.
+
bttv.card=  [HW,V4L] bttv (bt848 + bt878 based grabber cards)
bttv.radio= Most important insmod options are available as
kernel args too.
diff --git a/init/Kconfig b/init/Kconfig
index 1403a3e..b661497 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1435,6 +1435,35 @@ config UNPRIVILEGED_BPF_BOOTPARAM_VALUE
 
  If you are unsure how to answer this question, answer 0.
 
+config BPF_JIT_HARDEN_BOOTPARAM
+   bool "BPF JIT harden boot parameter"
+   default n
+   help
+ This option adds a kernel parameter 'bpf_jit_harden' that allows
+ configuring the default state of the net.core.bpf_jit_harden sysctl knob.
+ If this option is selected, the default value of the
+ net.core.bpf_jit_harden sysctl knob can be set on the kernel command
+ line.  The purpose of this option is to allow enabling BPF JIT
+ hardening for the BPF programs created during the early boot.
+
+ If you are unsure how to answer this question, answer N.
+
+config BPF_JIT_HARDEN_BOOTPARAM_VALUE
+   int "BPF JIT harden boot parameter default value"
+   depends on BPF_JIT_HARDEN_BOOTPARAM
+   range 0 2
+   default 0
+   help
+ This option sets the default value for the kernel parameter
+ 'bpf_jit_harden' that configures the default value of the
+ net.core.bpf_jit_harden sysctl knob at boot.  If this option is set to
+ 0 (zero), net.core.bpf_jit_harden will default to 0, which means no
+ hardening at boot.  If this option is set to 1 (one), hardening will
+ be applied to unprivileged users only.  If this option is set to
+ 2 (two), JIT hardening will be enabled for all users.
+
+ If you are unsure how to answer this question, answer 0.
+
 config USERFAULTFD
bool "Enable userfaultfd() system call"
select ANON_INODES
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 2194c6a..9edb7a8 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -303,7 +304,23 @@ struct bpf_prog *bpf_patch_insn_single(struct bpf_prog *prog, u32 off,
 #ifdef CONFIG_BPF_JIT
 /* All BPF JIT sysctl knobs here. */
 int bpf_jit_enable   __read_mostly = IS_BUILTIN(CONFIG_BPF_JIT_ALWAYS_ON);
+
+#ifdef CONFIG_BPF_JIT_HARDEN_BOOTPARAM
+int bpf_jit_harden   __read_mostly = CONFIG_BPF_JIT_HARDEN_BOOTPARAM_VALUE;
+
+static int __init bpf_jit_harden_setup(char *str)
+{
+   unsigned long value;
+
+   if (!kstrtoul(str, 0, &value))
+       bpf_jit_harden = min(value, 2UL);
+   return 1;
+}
+__setup("bpf_jit_harden=", bpf_jit_harden_setup);
+#else /* !CONFIG_BPF_JIT_HARDEN_BOOTPARAM */
 int bpf_jit_harden   __read_mostly;
+#endif /* CONFIG_BPF_JIT_HARDEN_BOOTPARAM */
+
 int bpf_jit_kallsyms __read_mostly;
 
 static __always_inline void
-- 
2.1.4


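The 0/1/2 levels documented for bpf_jit_harden= take effect where the JIT decides whether to blind constants. Below is a user-space model of that decision. It assumes the decision follows the kernel's bpf_jit_blinding_enabled() helper, where level 1 exempts loaders holding CAP_SYS_ADMIN. clamp_harden_level(), should_blind() and the `privileged` flag are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the clamp in bpf_jit_harden_setup(): values above 2 become 2. */
static int clamp_harden_level(unsigned long value)
{
	return value < 2UL ? (int)value : 2;
}

/* Hypothetical model of the blinding decision:
 *   0 - never harden
 *   1 - harden only programs loaded without CAP_SYS_ADMIN
 *   2 - harden every program
 * `privileged` stands in for capable(CAP_SYS_ADMIN). */
static bool should_blind(int harden_level, bool privileged)
{
	if (harden_level == 0)
		return false;
	if (harden_level == 1 && privileged)
		return false;
	return true;
}
```

Because bpf_jit_harden_setup() clamps with min() instead of rejecting out-of-range input, bpf_jit_harden=5 ends up behaving like bpf_jit_harden=2.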

[PATCH 1/3] bpf: add ability to configure unprivileged BPF via boot-time parameter

2018-05-21 Thread Eugene Syromiatnikov
This patch introduces two configuration options,
UNPRIVILEGED_BPF_BOOTPARAM and UNPRIVILEGED_BPF_BOOTPARAM_VALUE, that
allow configuring the initial value of the kernel.unprivileged_bpf_disabled
sysctl knob, which is useful when unprivileged bpf() access needs to be
disabled during early boot.

Signed-off-by: Eugene Syromiatnikov 
---
 Documentation/admin-guide/kernel-parameters.txt |  8 +++
 init/Kconfig| 31 +
 kernel/bpf/syscall.c| 16 +
 3 files changed, 55 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 11fc28e..aa8e831 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4355,6 +4355,14 @@
unknown_nmi_panic
[X86] Cause panic on unknown NMI.
 
+   unprivileged_bpf_disabled=
+   Format: { "0" | "1" }
+   Sets initial value of kernel.unprivileged_bpf_disabled
+   sysctl knob.
+   0 - unprivileged bpf() syscall access enabled.
+   1 - unprivileged bpf() syscall access disabled.
+   Default value is set via kernel config option.
+
usbcore.authorized_default=
[USB] Default USB device authorization:
(default -1 = authorized except for wireless USB,
diff --git a/init/Kconfig b/init/Kconfig
index 480a4f2..1403a3e 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1404,6 +1404,37 @@ config BPF_JIT_ALWAYS_ON
  Enables BPF JIT and removes BPF interpreter to avoid
  speculative execution of BPF instructions by the interpreter
 
+config UNPRIVILEGED_BPF_BOOTPARAM
+   bool "Unprivileged bpf() boot parameter"
+   depends on BPF_SYSCALL
+   default n
+   help
+ This option adds a kernel parameter 'unprivileged_bpf_disabled'
+ that allows configuring the default state of the
+ kernel.unprivileged_bpf_disabled sysctl knob.
+ If this option is selected, unprivileged access to the bpf() syscall
+ can be disabled with unprivileged_bpf_disabled=1 on the kernel command
+ line.  The purpose of this option is to allow disabling unprivileged
+ bpf() syscall access during the early boot.
+
+ If you are unsure how to answer this question, answer N.
+
+config UNPRIVILEGED_BPF_BOOTPARAM_VALUE
+   int "Unprivileged bpf() boot parameter default value"
+   depends on UNPRIVILEGED_BPF_BOOTPARAM
+   range 0 1
+   default 0
+   help
+ This option sets the default value for the kernel parameter
+ 'unprivileged_bpf_disabled', which allows disabling unprivileged bpf()
+ syscall access at boot.  If this option is set to 0 (zero), the
+ unprivileged bpf() boot kernel parameter will default to 0, allowing
+ unprivileged bpf() syscall access at bootup.  If this option is
+ set to 1 (one), the unprivileged bpf() kernel parameter will default
+ to 1, disabling unprivileged bpf() syscall access at bootup.
+
+ If you are unsure how to answer this question, answer 0.
+
 config USERFAULTFD
bool "Enable userfaultfd() system call"
select ANON_INODES
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index bfcde94..fdc5fd9 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PROG_ARRAY || \
   (map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
@@ -45,7 +46,22 @@ static DEFINE_SPINLOCK(prog_idr_lock);
 static DEFINE_IDR(map_idr);
 static DEFINE_SPINLOCK(map_idr_lock);
 
+#ifdef CONFIG_UNPRIVILEGED_BPF_BOOTPARAM
+int sysctl_unprivileged_bpf_disabled __read_mostly =
+   CONFIG_UNPRIVILEGED_BPF_BOOTPARAM_VALUE;
+
+static int __init unprivileged_bpf_setup(char *str)
+{
+   unsigned long disabled;
+
+   if (!kstrtoul(str, 0, &disabled))
+       sysctl_unprivileged_bpf_disabled = !!disabled;
+   return 1;
+}
+__setup("unprivileged_bpf_disabled=", unprivileged_bpf_setup);
+#else /* !CONFIG_UNPRIVILEGED_BPF_BOOTPARAM */
 int sysctl_unprivileged_bpf_disabled __read_mostly;
+#endif /* CONFIG_UNPRIVILEGED_BPF_BOOTPARAM */
 
 static const struct bpf_map_ops * const bpf_map_types[] = {
 #define BPF_PROG_TYPE(_id, _ops)
-- 
2.1.4


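The knob that unprivileged_bpf_disabled= seeds is consulted at the entry of the bpf() syscall. Below is a minimal model. It assumes the check has the form `sysctl_unprivileged_bpf_disabled && !capable(CAP_SYS_ADMIN)` and returns -EPERM. bpf_entry_check() and has_cap_sys_admin are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical mirror of kernel.unprivileged_bpf_disabled, seeded from the
 * Kconfig default (UNPRIVILEGED_BPF_BOOTPARAM_VALUE) and possibly overridden
 * by unprivileged_bpf_disabled= on the command line. */
static int unprivileged_bpf_disabled = 0;

/* Model of the bpf() entry check: returns 0 if the call may proceed.
 * has_cap_sys_admin stands in for capable(CAP_SYS_ADMIN). */
static int bpf_entry_check(bool has_cap_sys_admin)
{
	if (unprivileged_bpf_disabled && !has_cap_sys_admin)
		return -EPERM;
	return 0;
}
```

Seeding the value from the command line means this check already rejects unprivileged callers during early boot, before any sysctl configuration is applied.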

[PATCH 0/3] bpf: add boot parameters for sysctl knobs

2018-05-21 Thread Eugene Syromiatnikov
Hello.

This patch set adds the ability to set default values for the
kernel.unprivileged_bpf_disabled, net.core.bpf_jit_harden and
net.core.bpf_jit_kallsyms sysctl knobs, as well as an option to override
them via boot-time kernel parameters.

Eugene Syromiatnikov (3):
  bpf: add ability to configure unprivileged BPF via boot-time parameter
  bpf: add ability to configure BPF JIT hardening via boot-time
parameter
  bpf: add ability to configure BPF JIT kallsyms export at the boot time

 Documentation/admin-guide/kernel-parameters.txt | 28 
 init/Kconfig| 90 +
 kernel/bpf/core.c   | 31 +
 kernel/bpf/syscall.c| 16 +
 4 files changed, 165 insertions(+)

-- 
2.1.4



Re: [PATCH v8 1/6] cpuset: Enable cpuset controller in default hierarchy

2018-05-21 Thread Patrick Bellasi
Hi Waiman!

I've started looking at the possibility of moving Android to cgroups
v2, and the availability of the cpuset controller makes this even more
promising.

I'll try to give this series a run on Android; meanwhile I have
some (hopefully not too dumb) questions below.

On 17-May 16:55, Waiman Long wrote:
> Given the fact that thread mode had been merged into 4.14, it is now
> time to enable cpuset to be used in the default hierarchy (cgroup v2)
> as it is clearly threaded.
> 
> The cpuset controller had experienced feature creep since its
> introduction more than a decade ago. Besides the core cpus and mems
> control files to limit cpus and memory nodes, there are a bunch of
> additional features that can be controlled from the userspace. Some of
> the features are of doubtful usefulness and may not be actively used.
> 
> This patch enables cpuset controller in the default hierarchy with
> a minimal set of features, namely just the cpus and mems and their
> effective_* counterparts.  We can certainly add more features to the
> default hierarchy in the future if there is a real user need for them
> later on.
> 
> Alternatively, with the unified hierarchy, it may make more sense
> to move some of those additional cpuset features, if desired, to
> the memory controller or maybe to the cpu controller instead of staying
> with cpuset.
> 
> Signed-off-by: Waiman Long 
> ---
>  Documentation/cgroup-v2.txt | 90 ++---
>  kernel/cgroup/cpuset.c  | 48 ++--
>  2 files changed, 130 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
> index 74cdeae..cf7bac6 100644
> --- a/Documentation/cgroup-v2.txt
> +++ b/Documentation/cgroup-v2.txt
> @@ -53,11 +53,13 @@ v1 is available under Documentation/cgroup-v1/.
> 5-3-2. Writeback
>   5-4. PID
> 5-4-1. PID Interface Files
> - 5-5. Device
> - 5-6. RDMA
> -   5-6-1. RDMA Interface Files
> - 5-7. Misc
> -   5-7-1. perf_event
> + 5-5. Cpuset
> +   5-5-1. Cpuset Interface Files
> + 5-6. Device
> + 5-7. RDMA
> +   5-7-1. RDMA Interface Files
> + 5-8. Misc
> +   5-8-1. perf_event
>   5-N. Non-normative information
> 5-N-1. CPU controller root cgroup process behaviour
> 5-N-2. IO controller root cgroup process behaviour
> @@ -1435,6 +1437,84 @@ through fork() or clone(). These will return -EAGAIN if the creation
>  of a new process would cause a cgroup policy to be violated.
>  
>  
> +Cpuset
> +------
> +
> +The "cpuset" controller provides a mechanism for constraining
> +the CPU and memory node placement of tasks to only the resources
> +specified in the cpuset interface files in a task's current cgroup.
> +This is especially valuable on large NUMA systems where placing jobs
> +on properly sized subsets of the systems with careful processor and
> +memory placement to reduce cross-node memory access and contention
> +can improve overall system performance.

Another quite important use-case for cpuset is Android, where cpusets
are actively used for both power saving and performance tuning.
For example, depending on the status of an application, its threads
can be allowed to run on all available CPUs (e.g. foreground apps) or
be restricted to a few energy-efficient CPUs (e.g. background apps).

Since we are "rewriting" cpusets for v2 here, I think it's important
to take this mobile scenario into consideration.

For example, in this context, we are looking at the possibility of
updating/tuning cpuset.cpus at a relatively high rate, i.e. tens of
times per second. I'm not sure that's the same update rate usually
required for the large NUMA systems you cite above.  However, in this
case it's quite important to have really small overheads for these
operations.

> +
> +The "cpuset" controller is hierarchical.  That means the controller
> +cannot use CPUs or memory nodes not allowed in its parent.
> +
> +
> +Cpuset Interface Files
> +~~~~~~~~~~~~~~~~~~~~~~
> +
> +  cpuset.cpus
> + A read-write multiple values file which exists on non-root
> + cpuset-enabled cgroups.
> +
> + It lists the CPUs allowed to be used by tasks within this
> + cgroup.  The CPU numbers are comma-separated numbers or
> + ranges.  For example:
> +
> +   # cat cpuset.cpus
> +   0-4,6,8-10
> +
> + An empty value indicates that the cgroup is using the same
> + setting as the nearest cgroup ancestor with a non-empty
> + "cpuset.cpus" or all the available CPUs if none is found.

Does that mean that we can move tasks into a newly created group for
which we have not yet configured this value?
AFAIK, that's a different behavior wrt v1... and I like it better.

> +
> + The value of "cpuset.cpus" stays constant until the next update
> + and won't be affected by any CPU hotplug events.

This also sounds interesting, does it mean

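The cpuset.cpus format quoted above is a comma-separated list of CPU numbers and ranges. Below is a minimal user-space parser for that format. It assumes at most 64 CPUs, so a single uint64_t can serve as the mask. The kernel itself parses this format into a cpumask (e.g. via cpulist_parse()); parse_cpulist() here is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a CPU list such as "0-4,6,8-10" into a 64-bit mask.
 * Returns 0 on success, -1 on malformed input or CPU numbers >= 64.
 * An empty string (or a lone newline) yields an empty mask. */
static int parse_cpulist(const char *s, uint64_t *mask)
{
	uint64_t m = 0;

	while (*s != '\0' && *s != '\n') {
		char *end;
		unsigned long lo, hi;

		if (!isdigit((unsigned char)*s))
			return -1;
		lo = strtoul(s, &end, 10);
		hi = lo;
		s = end;
		if (*s == '-') {		/* range "lo-hi" */
			s++;
			if (!isdigit((unsigned char)*s))
				return -1;
			hi = strtoul(s, &end, 10);
			s = end;
		}
		if (lo > hi || hi >= 64)
			return -1;
		for (unsigned long cpu = lo; cpu <= hi; cpu++)
			m |= UINT64_C(1) << cpu;
		if (*s == ',') {
			s++;
			if (!isdigit((unsigned char)*s))
				return -1;	/* trailing or doubled comma */
		} else if (*s != '\0' && *s != '\n') {
			return -1;
		}
	}
	*mask = m;
	return 0;
}
```

An empty result corresponds to the patch's "empty value" case, in which the cgroup falls back to the nearest ancestor with a non-empty "cpuset.cpus".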
Re: [RFT v2 1/4] perf cs-etm: Generate sample for missed packets

2018-05-21 Thread Robert Walker

Hi Leo,

On 21/05/18 09:52, Leo Yan wrote:

Commit e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight
traces") reworks the sample generation flow from CoreSight trace to
match the correct format so the perf report tool can display the samples
properly.  But the change has a side effect on packet handling: it only
generates samples when 'prev_packet->last_instr_taken_branch' is true,
which results in the start tracing packet and exception packets being
dropped.

This patch checks two extra conditions for generating samples:

- If 'prev_packet->sample_type' is zero, the packet is the start tracing
   packet; in this case the start packet's end_addr is also zero, so it
   needs to be handled in cs_etm__last_executed_instr();



I think you also need to add something to handle discontinuities in
the trace - for example, it is possible to configure the ETM to only trace
execution in specific code regions or to trace a few cycles every so
often. In these cases, prev_packet->sample_type will not be zero, but 
whatever the previous packet was.  You will get a CS_ETM_TRACE_ON packet 
in such cases, generated by an I_TRACE_ON element in the trace stream.

You also get this on exception return.

However, you should also keep the test for prev_packet->sample_type == 0
as you may not see a CS_ETM_TRACE_ON when decoding a buffer that has
wrapped.

Regards

Rob


- If 'prev_packet->exc' is true, the previous packet is an exception
   handling packet, so a sample needs to be generated for the exception
   flow.

Fixes: e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight traces")
Cc: Mike Leach 
Cc: Robert Walker 
Cc: Mathieu Poirier 
Signed-off-by: Leo Yan 
---
  tools/perf/util/cs-etm.c | 35 ---
  1 file changed, 28 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 822ba91..378953b 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -495,6 +495,13 @@ static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
  static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
  {
/*
+* If end_addr is zero, this is the start tracing packet;
+* return 0 in this case.
+*/
+   if (!packet->end_addr)
+   return 0;
+
+   /*
 * The packet records the execution range with an exclusive end address
 *
 * A64 instructions are constant size, so the last executed
@@ -897,13 +904,27 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
etmq->period_instructions = instrs_over;
}
  
-   if (etm->sample_branches &&
-   etmq->prev_packet &&
-   etmq->prev_packet->sample_type == CS_ETM_RANGE &&
-   etmq->prev_packet->last_instr_taken_branch) {
-   ret = cs_etm__synth_branch_sample(etmq);
-   if (ret)
-   return ret;
+   if (etm->sample_branches && etmq->prev_packet) {
+   bool generate_sample = false;
+
+   /* Generate sample for start tracing packet */
+   if (etmq->prev_packet->sample_type == 0)
+   generate_sample = true;
+
+   /* Generate sample for exception packet */
+   if (etmq->prev_packet->exc)
+   generate_sample = true;
+
+   /* Generate sample for normal branch packet */
+   if (etmq->prev_packet->sample_type == CS_ETM_RANGE &&
+   etmq->prev_packet->last_instr_taken_branch)
+   generate_sample = true;
+
+   if (generate_sample) {
+   ret = cs_etm__synth_branch_sample(etmq);
+   if (ret)
+   return ret;
+   }
}
  
  	if (etm->sample_branches || etm->synth_opts.last_branch) {




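The branch-sample condition in the patch is a predicate over the previous packet. Below is a sketch of it, extended with the CS_ETM_TRACE_ON case suggested in the review for trace discontinuities. The struct, the enum values and that extension are assumptions for illustration, not the definitions in tools/perf/util/cs-etm.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical packet types; 0 means "no previous packet decoded yet". */
enum pkt_type { PKT_NONE = 0, PKT_RANGE, PKT_TRACE_ON };

struct pkt {
	enum pkt_type sample_type;
	bool exc;                     /* packet was an exception */
	bool last_instr_taken_branch; /* range ended with a taken branch */
};

/* Returns true when a branch sample should be synthesized for prev. */
static bool want_branch_sample(const struct pkt *prev)
{
	if (prev->sample_type == PKT_NONE)      /* start of trace, or wrapped buffer */
		return true;
	if (prev->sample_type == PKT_TRACE_ON)  /* discontinuity, per the review */
		return true;
	if (prev->exc)                          /* exception flow */
		return true;
	return prev->sample_type == PKT_RANGE && prev->last_instr_taken_branch;
}
```

The PKT_NONE test stays alongside PKT_TRACE_ON because, as the review points out, a wrapped buffer may not begin with a TRACE_ON packet.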

Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-05-21 Thread Mark Rutland
On Sat, May 05, 2018 at 12:16:13AM +0530, Ganapatrao Kulkarni wrote:
> On Thu, Apr 26, 2018 at 4:29 PM, Mark Rutland  wrote:
> > On Wed, Apr 25, 2018 at 02:30:47PM +0530, Ganapatrao Kulkarni wrote:

> >> + *
> >> + *  L3 Tile and DMC channel selection is through SMC call
> >> + *  SMC call arguments,
> >> + *   x0 = THUNDERX2_SMC_CALL_ID  (Vendor SMC call Id)
> >> + *   x1 = THUNDERX2_SMC_SET_CHANNEL  (Id to set DMC/L3C channel)
> >> + *   x2 = Node id
> >
> > How do we map Linux node IDs to the firmware's view of node IDs?
> >
> > I don't believe the two are necessarily the same -- Linux's node IDs are
> > a Linux-specific construct.
> 
> both are same, it is numa node id from ACPI/firmware.

I am very wary about assuming that the Linux nid will always be the same
as the ACPI node id.

For that to *potentially* be true, this driver should depend on
CONFIG_NUMA, NUMA must not be disabled on the command line, etc.;
otherwise the node id will always be NUMA_NO_NODE.

I would be *much* happier if we had an explicit mapping somewhere to the
ID the FW expects.

> > It would be much nicer if we could pass something based on the MPIDR,
> > which is a known HW construct, or if this implicitly affected the
> > current node.
> 
> IMO,  node id is sufficient.

I agree that *a* node ID is sufficient, I just don't think that we're
guaranteed to have the specific node ID the FW wants.

> > It would be vastly more sane for this to not be muxed at all. :/
> 
> i am helpless due to crappy hw design!

I'm certainly not blaming you for this! :)

I hope the HW designers don't make the same mistake in future, though...

Thanks,
Mark.

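One shape the explicit mapping asked for here could take is a table from Linux node IDs to the IDs the firmware expects. The driver would consult it before issuing the SMC, and the lookup would fail for NUMA_NO_NODE or unmapped nodes. This is a hypothetical sketch: fw_node_of[], MAX_NODES and linux_nid_to_fw_node() are illustrative names, and how the table gets populated is left open.

```c
#include <assert.h>
#include <errno.h>

#define NUMA_NO_NODE (-1)   /* same value the kernel uses */
#define MAX_NODES 2         /* hypothetical table size */

/* Hypothetical table: Linux node ID -> node ID expected by firmware,
 * -1 when no mapping is known. */
static int fw_node_of[MAX_NODES] = { -1, -1 };

/* Translate a Linux nid into the firmware's node ID, failing explicitly
 * instead of assuming the two numbering schemes are identical. */
static int linux_nid_to_fw_node(int nid, int *fw_node)
{
	if (nid == NUMA_NO_NODE || nid < 0 || nid >= MAX_NODES)
		return -EINVAL;
	if (fw_node_of[nid] < 0)
		return -ENODEV;
	*fw_node = fw_node_of[nid];
	return 0;
}
```

The point is only that the conversion is explicit and can fail, rather than silently passing a Linux-specific ID to firmware.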

Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-05-21 Thread Mark Rutland
On Mon, May 21, 2018 at 11:37:12AM +0100, Mark Rutland wrote:
> Hi Ganapat,
> 
> 
> Sorry for the delay in replying; I was away most of last week.
> 
> On Tue, May 15, 2018 at 04:03:19PM +0530, Ganapatrao Kulkarni wrote:
> > On Sat, May 5, 2018 at 12:16 AM, Ganapatrao Kulkarni wrote:
> > > On Thu, Apr 26, 2018 at 4:29 PM, Mark Rutland wrote:
> > >> On Wed, Apr 25, 2018 at 02:30:47PM +0530, Ganapatrao Kulkarni wrote:
> 
> > >>> +static int alloc_counter(struct thunderx2_pmu_uncore_channel *pmu_uncore)
> > >>> +{
> > >>> + int counter;
> > >>> +
> > >>> + raw_spin_lock(&pmu_uncore->lock);
> > >>> + counter = find_first_zero_bit(pmu_uncore->counter_mask,
> > >>> + pmu_uncore->uncore_dev->max_counters);
> > >>> + if (counter == pmu_uncore->uncore_dev->max_counters) {
> > >>> + raw_spin_unlock(&pmu_uncore->lock);
> > >>> + return -ENOSPC;
> > >>> + }
> > >>> + set_bit(counter, pmu_uncore->counter_mask);
> > >>> + raw_spin_unlock(&pmu_uncore->lock);
> > >>> + return counter;
> > >>> +}
> > >>> +
> > >>> +static void free_counter(struct thunderx2_pmu_uncore_channel *pmu_uncore,
> > >>> + int counter)
> > >>> +{
> > >>> + raw_spin_lock(&pmu_uncore->lock);
> > >>> + clear_bit(counter, pmu_uncore->counter_mask);
> > >>> + raw_spin_unlock(&pmu_uncore->lock);
> > >>> +}
> > >>
> > >> I don't believe that locking is required in either of these, as the perf
> > >> core serializes pmu::add() and pmu::del(), where these get called.
> > 
> > without this locking, i am seeing "BUG: scheduling while atomic" when
> > i run perf with more events together than the maximum counters
> > supported
> 
> Did you manage to get to the bottom of this?
> 
> Do you have a backtrace?
> 
> It looks like in your latest posting you reserve counters through the
> userspace ABI, which doesn't seem right to me, and I'd like to
> understand the problem.

Looks like I misunderstood -- those are still allocated kernel-side.

I'll follow that up in the v5 posting.

Thanks,
Mark.


Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-05-21 Thread Mark Rutland
Hi Ganapat,


Sorry for the delay in replying; I was away most of last week.

On Tue, May 15, 2018 at 04:03:19PM +0530, Ganapatrao Kulkarni wrote:
> On Sat, May 5, 2018 at 12:16 AM, Ganapatrao Kulkarni wrote:
> > On Thu, Apr 26, 2018 at 4:29 PM, Mark Rutland  wrote:
> >> On Wed, Apr 25, 2018 at 02:30:47PM +0530, Ganapatrao Kulkarni wrote:

> >>> +static int alloc_counter(struct thunderx2_pmu_uncore_channel *pmu_uncore)
> >>> +{
> >>> + int counter;
> >>> +
> >>> + raw_spin_lock(&pmu_uncore->lock);
> >>> + counter = find_first_zero_bit(pmu_uncore->counter_mask,
> >>> + pmu_uncore->uncore_dev->max_counters);
> >>> + if (counter == pmu_uncore->uncore_dev->max_counters) {
> >>> + raw_spin_unlock(&pmu_uncore->lock);
> >>> + return -ENOSPC;
> >>> + }
> >>> + set_bit(counter, pmu_uncore->counter_mask);
> >>> + raw_spin_unlock(&pmu_uncore->lock);
> >>> + return counter;
> >>> +}
> >>> +
> >>> +static void free_counter(struct thunderx2_pmu_uncore_channel *pmu_uncore,
> >>> + int counter)
> >>> +{
> >>> + raw_spin_lock(&pmu_uncore->lock);
> >>> + clear_bit(counter, pmu_uncore->counter_mask);
> >>> + raw_spin_unlock(&pmu_uncore->lock);
> >>> +}
> >>
> >> I don't believe that locking is required in either of these, as the perf
> >> core serializes pmu::add() and pmu::del(), where these get called.
> 
> without this locking, i am seeing "BUG: scheduling while atomic" when
> i run perf with more events together than the maximum counters
> supported

Did you manage to get to the bottom of this?

Do you have a backtrace?

It looks like in your latest posting you reserve counters through the
userspace ABI, which doesn't seem right to me, and I'd like to
understand the problem.

Thanks,
Mark.

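The review's point is that pmu::add() and pmu::del() are already serialized by the perf core, so the counter bitmap needs no lock of its own there. Below is a single-threaded user-space sketch of the same lowest-free-bit allocation. It assumes at most 64 counters, so one uint64_t holds the mask. struct counter_pool and both functions are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

struct counter_pool {
	uint64_t mask;          /* bit n set => counter n in use */
	int max_counters;       /* at most 64 in this sketch */
};

/* Allocate the lowest free counter, or return -ENOSPC if all are in use.
 * No locking: callers are assumed to be serialized, as pmu::add()/del() are. */
static int alloc_counter(struct counter_pool *p)
{
	assert(p->max_counters > 0 && p->max_counters <= 64);
	for (int i = 0; i < p->max_counters; i++) {
		if (!(p->mask & (UINT64_C(1) << i))) {
			p->mask |= UINT64_C(1) << i;
			return i;
		}
	}
	return -ENOSPC;
}

/* Release a counter previously returned by alloc_counter(). */
static void free_counter(struct counter_pool *p, int counter)
{
	p->mask &= ~(UINT64_C(1) << counter);
}
```

In the driver the same logic maps onto find_first_zero_bit(), set_bit() and clear_bit() on counter_mask, as in the quoted code.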

[RFT v2 1/4] perf cs-etm: Generate sample for missed packets

2018-05-21 Thread Leo Yan
Commit e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight
traces") reworks the sample generation flow from CoreSight trace to
match the correct format so the perf report tool can display the samples
properly.  But the change has a side effect on packet handling: it only
generates samples when 'prev_packet->last_instr_taken_branch' is true,
which results in the start tracing packet and exception packets being
dropped.

This patch checks two extra conditions for generating samples:

- If 'prev_packet->sample_type' is zero, the packet is the start tracing
  packet; in this case the start packet's end_addr is also zero, so it
  needs to be handled in cs_etm__last_executed_instr();

- If 'prev_packet->exc' is true, the previous packet is an exception
  handling packet, so a sample needs to be generated for the exception
  flow.

Fixes: e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight traces")
Cc: Mike Leach 
Cc: Robert Walker 
Cc: Mathieu Poirier 
Signed-off-by: Leo Yan 
---
 tools/perf/util/cs-etm.c | 35 ---
 1 file changed, 28 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 822ba91..378953b 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -495,6 +495,13 @@ static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
 static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
 {
/*
+* If end_addr is zero, this is the start tracing packet;
+* return 0 in this case.
+*/
+   if (!packet->end_addr)
+   return 0;
+
+   /*
 * The packet records the execution range with an exclusive end address
 *
 * A64 instructions are constant size, so the last executed
@@ -897,13 +904,27 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
etmq->period_instructions = instrs_over;
}
 
-   if (etm->sample_branches &&
-   etmq->prev_packet &&
-   etmq->prev_packet->sample_type == CS_ETM_RANGE &&
-   etmq->prev_packet->last_instr_taken_branch) {
-   ret = cs_etm__synth_branch_sample(etmq);
-   if (ret)
-   return ret;
+   if (etm->sample_branches && etmq->prev_packet) {
+   bool generate_sample = false;
+
+   /* Generate sample for start tracing packet */
+   if (etmq->prev_packet->sample_type == 0)
+   generate_sample = true;
+
+   /* Generate sample for exception packet */
+   if (etmq->prev_packet->exc)
+   generate_sample = true;
+
+   /* Generate sample for normal branch packet */
+   if (etmq->prev_packet->sample_type == CS_ETM_RANGE &&
+   etmq->prev_packet->last_instr_taken_branch)
+   generate_sample = true;
+
+   if (generate_sample) {
+   ret = cs_etm__synth_branch_sample(etmq);
+   if (ret)
+   return ret;
+   }
}
 
if (etm->sample_branches || etm->synth_opts.last_branch) {
-- 
2.7.4

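The new guard in cs_etm__last_executed_instr() sits in front of a simple computation. A64 instructions are all 4 bytes, so for a range with an exclusive end address, the last executed instruction is 4 bytes before end_addr. Below is a sketch of that arithmetic using a hypothetical struct range_pkt. instr_count() is an assumed helper, and variable-length T32 instructions are not handled.

```c
#include <assert.h>
#include <stdint.h>

#define A64_INSTR_SIZE 4u   /* every A64 instruction is 4 bytes */

/* Hypothetical execution range decoded from one trace packet. */
struct range_pkt {
	uint64_t start_addr;    /* first executed instruction */
	uint64_t end_addr;      /* exclusive end; 0 for the start tracing packet */
};

/* Address of the last executed instruction, or 0 for the start packet. */
static uint64_t last_executed_instr(const struct range_pkt *p)
{
	if (!p->end_addr)
		return 0;
	return p->end_addr - A64_INSTR_SIZE;
}

/* Number of A64 instructions in [start_addr, end_addr). */
static uint64_t instr_count(const struct range_pkt *p)
{
	if (!p->end_addr)
		return 0;
	return (p->end_addr - p->start_addr) / A64_INSTR_SIZE;
}
```

Without the guard, an end_addr of 0 would underflow to a huge unsigned address instead of signalling "no previous instruction".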


[RFT v2 0/4] Perf script: Add python script for CoreSight trace disassembler

2018-05-21 Thread Leo Yan
This patch series adds support for using 'perf script' as a CoreSight
trace disassembler.  For this purpose it adds a new Python script that
parses CoreSight tracing events and uses the 'objdump' command for the
disassembled lines; the result is a readable program execution flow for
reviewing tracing data.

Patch 0001 is a fix to generate samples for the start packet and
exception packets.

Patch 0002 is a prerequisite that adds addr to the sample dict, so this
value can be used by the Python script to analyze instruction ranges.

Patch 0003 adds the Python script for the trace disassembler.

Patch 0004 adds documentation explaining the script's usage, with an
example.

This patch series has been rebased on the acme git tree [1] at commit
19422a9f2a3b ("perf tools: Fix kernel_start for PTI on x86") and
tested on HiKey (ARM64, eight Cortex-A53 cores).

In this version the script has no dependency on the ARM64 platform and is
expected to support ARM32 as well, but I lack an ARM32 platform to test
it on, so this first targets upstream support for ARM64.

This patch series initially supports 'per-thread' recorded tracing data,
but we also need to verify that the script can disassemble CPU-wide
tracing and kernel panic kdump tracing data.  I have verified that this
patch series works with kernel panic kdump tracing data; because Mathieu
is working on CPU-wide tracing, we will need to retest CPU-wide and kdump
tracing afterwards to ensure the Python script handles all cases well.

You are very welcome to test the script in this patch series; your
test results and suggestions are very valuable for improving the script
to cover more cases.

Changes from v1:
* Following Mike's and Rob's suggestions, add the fix to generate samples
  for the start packet and exception packets.
* Simplify the Python script by removing the exception prediction
  algorithm; the disassembler can rely on sane exception packets instead.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git


Leo Yan (4):
  perf cs-etm: Generate sample for missed packets
  perf script python: Add addr into perf sample dict
  perf script python: Add script for CoreSight trace disassembler
  coresight: Document for CoreSight trace disassembler

 Documentation/trace/coresight.txt  |  52 +
 tools/perf/scripts/python/arm-cs-trace-disasm.py   | 234 +
 tools/perf/util/cs-etm.c   |  35 ++-
 .../util/scripting-engines/trace-event-python.c|   2 +
 4 files changed, 316 insertions(+), 7 deletions(-)
 create mode 100644 tools/perf/scripts/python/arm-cs-trace-disasm.py

-- 
2.7.4



[RFT v2 4/4] coresight: Document for CoreSight trace disassembler

2018-05-21 Thread Leo Yan
This commit documents CoreSight trace disassembler usage and gives an
example.

Signed-off-by: Leo Yan 
---
 Documentation/trace/coresight.txt | 52 +++
 1 file changed, 52 insertions(+)

diff --git a/Documentation/trace/coresight.txt b/Documentation/trace/coresight.txt
index 6f0120c..b8f2359 100644
--- a/Documentation/trace/coresight.txt
+++ b/Documentation/trace/coresight.txt
@@ -381,3 +381,55 @@ sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tuto
$ taskset -c 2 ./sort_autofdo
Bubble sorting array of 3 elements
5806 ms
+
+
+Tracing data disassembler
+-------------------------
+
+'perf script' can use a script to parse tracing packets and rely on
+'objdump' for the disassembled lines; this converts tracing data into a
+readable program execution flow, which makes tracing data easy to review.
+
+The CoreSight trace disassembler is located at
+tools/perf/scripts/python/arm-cs-trace-disasm.py.  The script supports the
+following options:
+
+   -d, --objdump: Set path to objdump executable, this option is
+  mandatory.
+   -k, --vmlinux: Set path to vmlinux file.
+   -v, --verbose: Enable debug logging; with this option enabled the
+  script dumps the data of every event.
+
+Below is an example of using the Python script to dump the CoreSight
+trace disassembly:
+
+   $ perf script -s arm-cs-trace-disasm.py -i perf.data \
+   -F cpu,event,ip,addr,sym -- -d objdump -k ./vmlinux > cs-disasm.log
+
+Below is an example of the disassembler log:
+
+ARM CoreSight Trace Data Assembler Dump
+   08a5f2dc :
+   08a5f2dc:   340000a0    cbz w0, 08a5f2f0
+   08a5f2f0 :
+   08a5f2f0:   f9400260    ldr x0, [x19]
+   08a5f2f4:   d5033f9f    dsb sy
+   08a5f2f8:   913ec000    add x0, x0, #0xfb0
+   08a5f2fc:   b900001f    str wzr, [x0]
+   08a5f300:   f9400bf3    ldr x19, [sp, #16]
+   08a5f304:   a8c27bfd    ldp x29, x30, [sp], #32
+   08a5f308:   d65f03c0    ret
+   08a5fa18 :
+   08a5fa18:   14000025    b   08a5faac
+   08a5faac :
+   08a5faac:   b9406261    ldr w1, [x19, #96]
+   08a5fab0:   52800015    mov w21, #0x0    // #0
+   08a5fab4:   f901ca61    str x1, [x19, #912]
+   08a5fab8:   2a1503e0    mov w0, w21
+   08a5fabc:   3940e261    ldrb w1, [x19, #56]
+   08a5fac0:   f901ce61    str x1, [x19, #920]
+   08a5fac4:   a94153f3    ldp x19, x20, [sp, #16]
+   08a5fac8:   a9425bf5    ldp x21, x22, [sp, #32]
+   08a5facc:   a94363f7    ldp x23, x24, [sp, #48]
+   08a5fad0:   a8c47bfd    ldp x29, x30, [sp], #64
+   08a5fad4:   d65f03c0    ret
-- 
2.7.4



[RFT v2 2/4] perf script python: Add addr into perf sample dict

2018-05-21 Thread Leo Yan
ARM CoreSight auxtrace uses 'sample->addr' to record the target address
of branch instructions, so 'sample->addr' is required for tracing data
analysis.

This commit adds 'sample->addr' to the perf sample dict so that it can
be used by Python scripts when parsing events.

Signed-off-by: Leo Yan 
---
 tools/perf/util/scripting-engines/trace-event-python.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index 10dd5fc..7f8afac 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -531,6 +531,8 @@ static PyObject *get_perf_sample_dict(struct perf_sample *sample,
PyLong_FromUnsignedLongLong(sample->period));
pydict_set_item_string_decref(dict_sample, "phys_addr",
PyLong_FromUnsignedLongLong(sample->phys_addr));
+   pydict_set_item_string_decref(dict_sample, "addr",
+   PyLong_FromUnsignedLongLong(sample->addr));
set_sample_read_in_dict(dict_sample, sample, evsel);
pydict_set_item_string_decref(dict, "sample", dict_sample);
 
-- 
2.7.4
