Re: [PATCH 1/2] ARM: dts: imx51-zii-scu3-esb: Add switch IRQ line pinmux config

2018-07-12 Thread Andrey Smirnov
On Thu, Jul 12, 2018 at 6:37 AM Fabio Estevam  wrote:
>
> Hi Andrey,
>
> On Wed, Jul 11, 2018 at 11:33 PM, Andrey Smirnov
>  wrote:
>
> > +   pinctrl_switch: switchgrp {
> > +   fsl,pins = <
> > +   MX51_PAD_AUD3_BB_CK__GPIO4_20   0xc5
>
> The i.MX51 Reference Manual states that 0xa5 is the default reset
> value for the register IOMUXC_SW_PAD_CTL_PAD_AUD3_BB_CK.
>
> By reading your commit log I had the impression you wanted to provide
> the default value explicitly.
>
> Please clarify.

I wanted to avoid relying on defaults, be they register reset values or
settings the bootloader left us with. The default value of 0xa5 works,
but, given that the pin is IRQ_TYPE_LEVEL_HIGH, I thought it would be
better to configure it with a pulldown. Do you want me to add that to
the commit log?

Thanks,
Andrey Smirnov



Re: [PATCH] PCI/AER: Enable SERR# forwarding in non ACPI flow

2018-07-12 Thread poza

On 2018-07-12 20:15, Bharat Kumar Gogada wrote:

Currently PCI_BRIDGE_CTL_SERR is enabled only in the ACPI flow.
This bit is required for forwarding errors reported
by EP devices to the upstream device.
This patch enables SERR# forwarding for Type-1 PCI devices.

Signed-off-by: Bharat Kumar Gogada 
---
 drivers/pci/pcie/aer.c |   23 +++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index a2e8838..943e084 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -343,6 +343,19 @@ int pci_enable_pcie_error_reporting(struct pci_dev 
*dev)

if (!dev->aer_cap)
return -EIO;

+   if (!IS_ENABLED(CONFIG_ACPI) &&
+   dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+   u16 control;
+
+   /*
+* A Type-1 PCI bridge will not forward ERR_ messages coming
+* from an endpoint if SERR# forwarding is not enabled.
+*/
+   pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &control);
+   control |= PCI_BRIDGE_CTL_SERR;
+   pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
+   }
+
 	return pcie_capability_set_word(dev, PCI_EXP_DEVCTL, 
PCI_EXP_AER_FLAGS);

 }
 EXPORT_SYMBOL_GPL(pci_enable_pcie_error_reporting);
@@ -352,6 +365,16 @@ int pci_disable_pcie_error_reporting(struct 
pci_dev *dev)

if (pcie_aer_get_firmware_first(dev))
return -EIO;

+   if (!IS_ENABLED(CONFIG_ACPI) &&
+   dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+   u16 control;
+
+   /* Clear SERR Forwarding */
+   pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &control);
+   control &= ~PCI_BRIDGE_CTL_SERR;
+   pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
+   }
+
return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
  PCI_EXP_AER_FLAGS);
 }



Should this configuration not be set by firmware? Why should Linux
dictate it?


Regards,
Oza.





[PATCH v2 3/3] ARM: configs: imx_v6_v7_defconfig: add DMATEST support

2018-07-12 Thread Robin Gong
Add DMATEST support and remove invalid options: CONFIG_BT_HCIUART_H4 is
enabled by default, and CONFIG_SND_SOC_IMX_WM8962 is out of date and no
longer appears in any config file. Please refer to
Documentation/driver-api/dmaengine/dmatest.rst to test the MEMCPY
feature of imx-sdma.

Signed-off-by: Robin Gong 
---
 arch/arm/configs/imx_v6_v7_defconfig | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/arm/configs/imx_v6_v7_defconfig 
b/arch/arm/configs/imx_v6_v7_defconfig
index e381d05..f28d4d9 100644
--- a/arch/arm/configs/imx_v6_v7_defconfig
+++ b/arch/arm/configs/imx_v6_v7_defconfig
@@ -81,7 +81,6 @@ CONFIG_CAN=y
 CONFIG_CAN_FLEXCAN=y
 CONFIG_BT=y
 CONFIG_BT_HCIUART=y
-CONFIG_BT_HCIUART_H4=y
 CONFIG_BT_HCIUART_LL=y
 CONFIG_CFG80211=y
 CONFIG_CFG80211_WEXT=y
@@ -282,7 +281,6 @@ CONFIG_SND_SOC_FSL_ASRC=y
 CONFIG_SND_IMX_SOC=y
 CONFIG_SND_SOC_PHYCORE_AC97=y
 CONFIG_SND_SOC_EUKREA_TLV320=y
-CONFIG_SND_SOC_IMX_WM8962=y
 CONFIG_SND_SOC_IMX_ES8328=y
 CONFIG_SND_SOC_IMX_SGTL5000=y
 CONFIG_SND_SOC_IMX_SPDIF=y
@@ -371,6 +369,7 @@ CONFIG_DMADEVICES=y
 CONFIG_FSL_EDMA=y
 CONFIG_IMX_SDMA=y
 CONFIG_MXS_DMA=y
+CONFIG_DMATEST=m
 CONFIG_STAGING=y
 CONFIG_STAGING_MEDIA=y
 CONFIG_VIDEO_IMX_MEDIA=y
-- 
2.7.4



[PATCH v2 2/3] dmaengine: imx-sdma: add memcpy interface

2018-07-12 Thread Robin Gong
Add MEMCPY capability for imx-sdma driver.

Signed-off-by: Robin Gong 
---
 drivers/dma/imx-sdma.c | 95 --
 1 file changed, 92 insertions(+), 3 deletions(-)

diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c
index e3d5e73..ef50f2c 100644
--- a/drivers/dma/imx-sdma.c
+++ b/drivers/dma/imx-sdma.c
@@ -342,6 +342,7 @@ struct sdma_desc {
  * @pc_from_device:script address for those device_2_memory
  * @pc_to_device:  script address for those memory_2_device
  * @device_to_device:  script address for those device_2_device
+ * @pc_to_pc:  script address for those memory_2_memory
  * @flags: loop mode or not
  * @per_address:   peripheral source or destination address in common case
  *  destination address in p_2_p case
@@ -367,6 +368,7 @@ struct sdma_channel {
enum dma_slave_buswidth word_size;
unsigned intpc_from_device, pc_to_device;
unsigned intdevice_to_device;
+   unsigned intpc_to_pc;
unsigned long   flags;
dma_addr_t  per_address, per_address2;
unsigned long   event_mask[2];
@@ -869,14 +871,16 @@ static void sdma_get_pc(struct sdma_channel *sdmac,
 * These are needed once we start to support transfers between
 * two peripherals or memory-to-memory transfers
 */
-   int per_2_per = 0;
+   int per_2_per = 0, emi_2_emi = 0;
 
sdmac->pc_from_device = 0;
sdmac->pc_to_device = 0;
sdmac->device_to_device = 0;
+   sdmac->pc_to_pc = 0;
 
switch (peripheral_type) {
case IMX_DMATYPE_MEMORY:
+   emi_2_emi = sdma->script_addrs->ap_2_ap_addr;
break;
case IMX_DMATYPE_DSP:
emi_2_per = sdma->script_addrs->bp_2_ap_addr;
@@ -949,6 +953,7 @@ static void sdma_get_pc(struct sdma_channel *sdmac,
sdmac->pc_from_device = per_2_emi;
sdmac->pc_to_device = emi_2_per;
sdmac->device_to_device = per_2_per;
+   sdmac->pc_to_pc = emi_2_emi;
 }
 
 static int sdma_load_context(struct sdma_channel *sdmac)
@@ -965,6 +970,8 @@ static int sdma_load_context(struct sdma_channel *sdmac)
load_address = sdmac->pc_from_device;
else if (sdmac->direction == DMA_DEV_TO_DEV)
load_address = sdmac->device_to_device;
+   else if (sdmac->direction == DMA_MEM_TO_MEM)
+   load_address = sdmac->pc_to_pc;
else
load_address = sdmac->pc_to_device;
 
@@ -1214,10 +1221,28 @@ static int sdma_alloc_chan_resources(struct dma_chan 
*chan)
 {
struct sdma_channel *sdmac = to_sdma_chan(chan);
struct imx_dma_data *data = chan->private;
+   struct imx_dma_data mem_data;
int prio, ret;
 
-   if (!data)
-   return -EINVAL;
+   /*
+* MEMCPY may never set up chan->private via a filter function (e.g.
+* when used by dmatest), so create 'struct imx_dma_data mem_data' for
+* this case. Note that in any other slave case you have to set up
+* chan->private with 'struct imx_dma_data' in your own filter function
+* if you request the dma channel by dma_request_channel() rather than
+* dma_request_slave_channel(). Otherwise, 'MEMCPY in case?' will
+* appear to warn you to correct your filter function.
+*/
+   if (!data) {
+   dev_dbg(sdmac->sdma->dev, "MEMCPY in case?\n");
+   mem_data.priority = 2;
+   mem_data.peripheral_type = IMX_DMATYPE_MEMORY;
+   mem_data.dma_request = 0;
+   mem_data.dma_request2 = 0;
+   data = &mem_data;
+
+   sdma_get_pc(sdmac, IMX_DMATYPE_MEMORY);
+   }
 
switch (data->priority) {
case DMA_PRIO_HIGH:
@@ -1307,6 +1332,10 @@ static struct sdma_desc *sdma_transfer_init(struct 
sdma_channel *sdmac,
if (sdma_alloc_bd(desc))
goto err_desc_out;
 
+   /* No slave_config called in MEMCPY case, so do here */
+   if (direction == DMA_MEM_TO_MEM)
+   sdma_config_ownership(sdmac, false, true, false);
+
if (sdma_load_context(sdmac))
goto err_desc_out;
 
@@ -1318,6 +1347,63 @@ static struct sdma_desc *sdma_transfer_init(struct 
sdma_channel *sdmac,
return NULL;
 }
 
+static struct dma_async_tx_descriptor *sdma_prep_memcpy(
+   struct dma_chan *chan, dma_addr_t dma_dst,
+   dma_addr_t dma_src, size_t len, unsigned long flags)
+{
+   struct sdma_channel *sdmac = to_sdma_chan(chan);
+   struct sdma_engine *sdma = sdmac->sdma;
+   int channel = sdmac->channel;
+   size_t count;
+   int i = 0, param;
+   struct sdma_buffer_descriptor *bd;
+   struct sdma_desc *desc;
+
+   if (!chan || !len)
+   return NULL;
+
+   

[lkp-robot] [xarray] f0b90e702f: BUG:soft_lockup-CPU##stuck_for#s

2018-07-12 Thread kernel test robot

FYI, we noticed the following commit (built with gcc-7):

commit: f0b90e702fe74fa575b7382ec3474d341098d5b1 ("xarray: Add XArray 
unconditional store operations")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

in testcase: boot

on test machine: qemu-system-i386 -enable-kvm -cpu Haswell,+smep,+smap -m 360M

caused below changes (please refer to attached dmesg/kmsg for entire 
log/backtrace):


++++
|| 3d730c4294 | f0b90e702f |
++++
| boot_successes | 0  | 0  |
| boot_failures  | 14 | 25 |
| WARNING:at_mm/slab_common.c:#kmalloc_slab  | 14 | 25 |
| EIP:kmalloc_slab   | 14 | 25 |
| Mem-Info   | 14 | 25 |
| INFO:trying_to_register_non-static_key | 14 | 25 |
| BUG:unable_to_handle_kernel| 14 ||
| Oops:#[##] | 14 ||
| EIP:__pci_epf_register_driver  | 14 ||
| Kernel_panic-not_syncing:Fatal_exception   | 14 ||
| BUG:soft_lockup-CPU##stuck_for#s   | 0  | 25 |
| EIP:xa_entry   | 0  | 5  |
| Kernel_panic-not_syncing:softlockup:hung_tasks | 0  | 25 |
| EIP:xa_is_node | 0  | 8  |
| EIP:xas_load   | 0  | 2  |
| EIP:debug_lockdep_rcu_enabled  | 0  | 1  |
| EIP:xa_load| 0  | 3  |
| EIP:xas_descend| 0  | 2  |
| EIP:xa_head| 0  | 1  |
| EIP:xas_start  | 0  | 3  |
++++



[   44.03] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:1]
[   44.03] irq event stamp: 1072387
[   44.03] hardirqs last  enabled at (1072387): [<4106ebde>] 
console_unlock+0x3f3/0x42d
[   44.03] hardirqs last disabled at (1072386): [<4106e84f>] 
console_unlock+0x64/0x42d
[   44.03] softirqs last  enabled at (1072364): [<417ecbeb>] 
__do_softirq+0x183/0x1b3
[   44.03] softirqs last disabled at (1072357): [<41007967>] 
do_softirq_own_stack+0x1d/0x23
[   44.03] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW 
4.18.0-rc3-00012-gf0b90e7 #169
[   44.03] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-1 04/01/2014
[   44.03] EIP: xa_is_node+0x0/0x1a
[   44.03] Code: 89 73 08 89 7b 0c eb 0b 39 43 14 72 0c 8b 75 ec 8b 7d f0 
89 73 10 89 7b 14 8d 4d ec 89 d8 e8 88 fe ff ff 5a 59 5b 5e 5f 5d c3 <89> c2 55 
83 e2 03 83 fa 02 89 e5 0f 94 c2 3d 00 10 00 00 0f 97 c0 
[   44.03] EAX: 4c93caf2 EBX: 5442fec0 ECX: 4c93caf2 EDX: 0001
[   44.03] ESI:  EDI:  EBP: 5442feb4 ESP: 5442feac
[   44.03] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00200293
[   44.03] CR0: 80050033 CR2:  CR3: 01d27000 CR4: 000406b0
[   44.03] Call Trace:
[   44.03]  ? xas_load+0x26/0x2f
[   44.03]  ? xa_load+0x35/0x52
[   44.03]  ? xarray_checks+0x8c2/0x984
[   44.03]  ? check_xa_tag_1+0x308/0x308
[   44.03]  ? do_one_initcall+0x6a/0x13c
[   44.03]  ? parse_args+0xd9/0x1e3
[   44.03]  ? kernel_init_freeable+0xe1/0x172
[   44.03]  ? rest_init+0xaf/0xaf
[   44.03]  ? kernel_init+0x8/0xd0
[   44.03]  ? ret_from_fork+0x19/0x24
[   44.03] Kernel panic - not syncing: softlockup: hung tasks
[   44.03] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GWL
4.18.0-rc3-00012-gf0b90e7 #169
[   44.03] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-1 04/01/2014
[   44.03] Call Trace:
[   44.03]  ? dump_stack+0x79/0xab
[   44.03]  ? panic+0x99/0x1d8
[   44.03]  ? watchdog_timer_fn+0x1ac/0x1d3
[   44.03]  ? __hrtimer_run_queues+0xa0/0x114
[   44.03]  ? watchdog+0x16/0x16
[   44.03]  ? hrtimer_run_queues+0xd2/0xe5
[   44.03]  ? run_local_timers+0x15/0x39
[   44.03]  ? update_process_times+0x18/0x39
[   44.03]  ? tick_nohz_handler+0xba/0xfb
[   44.03]  ? smp_apic_timer_interrupt+0x54/0x67
[   44.03]  ? apic_timer_interrupt+0x41/0x48
[   44.03]  ? siphash_2u64+0x54f/0x7de
[   44.03]  ? minmax_running_min+0x6f/0x6f
[   44.03]  ? xas_load+0x26/0x2f
[   44.03]  ? xa_load+0x35/0x52
[   44.03]  ? xarray_checks+0x8c2/0x984
[   44.03]  ? 

[PATCH v2 1/3] dmaengine: imx-sdma: add SDMA_BD_MAX_CNT to replace '0xffff'

2018-07-12 Thread Robin Gong
Add macro SDMA_BD_MAX_CNT to replace '0xffff'.

Signed-off-by: Robin Gong 
---
 drivers/dma/imx-sdma.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c
index 3b622d6..e3d5e73 100644
--- a/drivers/dma/imx-sdma.c
+++ b/drivers/dma/imx-sdma.c
@@ -185,6 +185,7 @@
  * Mode/Count of data node descriptors - IPCv2
  */
 struct sdma_mode_count {
+#define SDMA_BD_MAX_CNT	0xffff
u32 count   : 16; /* size of the buffer pointed by this BD */
u32 status  :  8; /* E,R,I,C,W,D status bits stored here */
u32 command :  8; /* command mostly used for channel 0 */
@@ -1344,9 +1345,9 @@ static struct dma_async_tx_descriptor *sdma_prep_slave_sg(
 
count = sg_dma_len(sg);
 
-   if (count > 0xffff) {
+   if (count > SDMA_BD_MAX_CNT) {
dev_err(sdma->dev, "SDMA channel %d: maximum bytes for 
sg entry exceeded: %d > %d\n",
-   channel, count, 0xffff);
+   channel, count, SDMA_BD_MAX_CNT);
goto err_bd_out;
}
 
@@ -1421,9 +1422,9 @@ static struct dma_async_tx_descriptor 
*sdma_prep_dma_cyclic(
 
sdmac->flags |= IMX_DMA_SG_LOOP;
 
-   if (period_len > 0xffff) {
+   if (period_len > SDMA_BD_MAX_CNT) {
dev_err(sdma->dev, "SDMA channel %d: maximum period size 
exceeded: %zu > %d\n",
-   channel, period_len, 0xffff);
+   channel, period_len, SDMA_BD_MAX_CNT);
goto err_bd_out;
}
 
@@ -1970,7 +1971,7 @@ static int sdma_probe(struct platform_device *pdev)
sdma->dma_device.residue_granularity = DMA_RESIDUE_GRANULARITY_SEGMENT;
sdma->dma_device.device_issue_pending = sdma_issue_pending;
sdma->dma_device.dev->dma_parms = &sdma->dma_parms;
-   dma_set_max_seg_size(sdma->dma_device.dev, 65535);
+   dma_set_max_seg_size(sdma->dma_device.dev, SDMA_BD_MAX_CNT);
 
platform_set_drvdata(pdev, sdma);
 
-- 
2.7.4









[PATCH v2 0/3] add memcpy support for sdma

2018-07-12 Thread Robin Gong
This patchset adds a memcpy interface for imx-sdma and, in addition,
supports dmatest and enables its config by default, so that DMA can be
tested easily without any other device support such as uart/audio/spi...

Changes from v1:
  1. remove the bus_width check for memcpy, since only the max bus width
     is needed in the memcpy case to speed up the copy.
  2. remove the DMATEST support patch, since DMATEST is a common memcpy case.
  3. split the SDMA_BD_MAX_CNT macro (replacing '0xffff') into its own patch.
  4. move sdma_config_ownership() from alloc_chan into sdma_prep_memcpy.
  5. address some minor review comments.

Robin Gong (3):
  dmaengine: imx-sdma: add SDMA_BD_MAX_CNT to replace '0xffff'
  dmaengine: imx-sdma: add memcpy interface
  ARM: configs: imx_v6_v7_defconfig: add DMATEST support

 arch/arm/configs/imx_v6_v7_defconfig |   3 +-
 drivers/dma/imx-sdma.c   | 106 ---
 2 files changed, 99 insertions(+), 10 deletions(-)

-- 
2.7.4






Re: [PATCH] vfio-pci: Disable binding to PFs with SR-IOV enabled

2018-07-12 Thread Peter Xu
On Thu, Jul 12, 2018 at 04:33:04PM -0600, Alex Williamson wrote:
> We expect to receive PFs with SR-IOV disabled, however some host
> drivers leave SR-IOV enabled at unbind.  This puts us in a state where
> we can potentially assign both the PF and the VF, leading to both
> functionality as well as security concerns due to lack of managing the
> SR-IOV state as well as vendor dependent isolation from the PF to VF.
> If we were to attempt to actively disable SR-IOV on driver probe, we
> risk VF bound drivers blocking, potentially risking live lock
> scenarios.  Therefore simply refuse to bind to PFs with SR-IOV enabled
> with a warning message indicating the issue.  Users can resolve this
> by re-binding to the host driver and disabling SR-IOV before
> attempting to use the device with vfio-pci.
> 
> Signed-off-by: Alex Williamson 

Reviewed-by: Peter Xu 

-- 
Peter Xu




[RFC PATCH] vfio/pci: map prefetchble bars as writecombine

2018-07-12 Thread Srinath Mannam
By default all BARs are mapped with VMA access permissions
set to pgprot_noncached.

On ARM64, pgprot_noncached is MT_DEVICE_nGnRnE, which
is strongly ordered and allows only aligned access.
This type of mapping works for NON-PREFETCHABLE BARs
containing EP controller registers,
but it prevents unaligned access to PREFETCHABLE BARs.

CMB NVMe drives require PREFETCHABLE BARs to be mapped
as MT_NORMAL_NC to allow unaligned access.

Signed-off-by: Srinath Mannam 
Reviewed-by: Ray Jui 
Reviewed-by: Vikram Prakash 
---
 drivers/vfio/pci/vfio_pci.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index b423a30..eff6b65 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1142,7 +1142,10 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
}
 
vma->vm_private_data = vdev;
-   vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+   if (pci_resource_flags(pdev, index) & IORESOURCE_PREFETCH)
+   vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
+   else
+   vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff;
 
return remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
-- 
2.7.4



Re: [PATCH v2] tools/memory-model: Add extra ordering for locks and remove it for ordinary release/acquire

2018-07-12 Thread Paul E. McKenney
On Thu, Jul 12, 2018 at 07:05:39PM -0700, Daniel Lustig wrote:
> On 7/12/2018 11:10 AM, Linus Torvalds wrote:
> > On Thu, Jul 12, 2018 at 11:05 AM Peter Zijlstra  
> > wrote:
> >>
> >> The locking pattern is fairly simple and shows where RCpc comes apart
> >> from expectation real nice.
> > 
> > So who does RCpc right now for the unlock-lock sequence? Somebody
> > mentioned powerpc. Anybody else?
> > 
> > How nasty would it be to make powerpc conform? I will always advocate
> > tighter locking and ordering rules over looser ones..
> > 
> > Linus
> 
> RISC-V probably would have been RCpc if we weren't having this discussion.
> Depending on how we map atomics/acquire/release/unlock/lock, we can end up
> producing RCpc, "RCtso" (feel free to find a better name here...), or RCsc
> behaviors, and we're trying to figure out which we actually need.
> 
> I think the debate is this:
> 
> Obviously programmers would prefer just to have RCsc and not have to figure 
> out
> all the complexity of the other options.  On x86 or architectures with native
> RCsc operations (like ARMv8), that's generally easy enough to get.
> 
> For weakly-ordered architectures that use fences for ordering (including
> PowerPC and sometimes RISC-V, see below), though, it takes extra fences to go
> from RCpc to either "RCtso" or RCsc.  People using these architectures are
> concerned about whether there's a negative performance impact from those extra
> fences.
> 
> However, some scheduler code, some RCU code, and probably some other examples
> already implicitly or explicitly assume unlock()/lock() provides stronger
> ordering than RCpc.

Just to be clear, the RCU code uses smp_mb__after_unlock_lock() to get
the ordering that it needs out of spinlocks.  Maybe that is what you
meant by "explicitly assume", but I figured I should clarify.

Thanx, Paul



Re: Consolidating RCU-bh, RCU-preempt, and RCU-sched

2018-07-12 Thread Paul E. McKenney
On Fri, Jul 13, 2018 at 11:47:18AM +0800, Lai Jiangshan wrote:
> On Fri, Jul 13, 2018 at 8:02 AM, Paul E. McKenney
>  wrote:
> > Hello!
> >
> > I now have a semi-reasonable prototype of changes consolidating the
> > RCU-bh, RCU-preempt, and RCU-sched update-side APIs in my -rcu tree.
> > There are likely still bugs to be fixed and probably other issues as well,
> > but a prototype does exist.
> >
> > Assuming continued good rcutorture results and no objections, I am
> > thinking in terms of this timeline:
> >
> > o   Preparatory work and cleanups are slated for the v4.19 merge window.
> >
> > o   The actual consolidation and post-consolidation cleanup is slated
> > for the merge window after v4.19 (v5.0?).  These cleanups include
> > the replacements called out below within the RCU implementation
> > itself (but excluding kernel/rcu/sync.c, see question below).
> >
> > o   Replacement of now-obsolete update APIs is slated for the second
> > merge window after v4.19 (v5.1?).  The replacements are currently
> > expected to be as follows:
> >
> > synchronize_rcu_bh() -> synchronize_rcu()
> > synchronize_rcu_bh_expedited() -> synchronize_rcu_expedited()
> > call_rcu_bh() -> call_rcu()
> > rcu_barrier_bh() -> rcu_barrier()
> > synchronize_sched() -> synchronize_rcu()
> > synchronize_sched_expedited() -> synchronize_rcu_expedited()
> > call_rcu_sched() -> call_rcu()
> > rcu_barrier_sched() -> rcu_barrier()
> > get_state_synchronize_sched() -> get_state_synchronize_rcu()
> > cond_synchronize_sched() -> cond_synchronize_rcu()
> > synchronize_rcu_mult() -> synchronize_rcu()
> >
> > I have done light testing of these replacements with good results.
> >
> > Any objections to this timeline?
> >
> > I also have some questions on the ultimate end point.  I have default
> > choices, which I will likely take if there is no discussion.
> >
> > o   Currently, I am thinking in terms of keeping the per-flavor
> > read-side functions.  For example, rcu_read_lock_bh() would
> > continue to disable softirq, and would also continue to tell
> > lockdep about the RCU-bh read-side critical section.  However,
> > synchronize_rcu() will wait for all flavors of read-side critical
> > sections, including those introduced by (say) preempt_disable(),
> > so there will no longer be any possibility of mismatching (say)
> > RCU-bh readers with RCU-sched updaters.
> >
> > I could imagine other ways of handling this, including:
> >
> > a.  Eliminate rcu_read_lock_bh() in favor of
> > local_bh_disable() and so on.  Rely on lockdep
> > instrumentation of these other functions to identify RCU
> > readers, introducing such instrumentation as needed.  I am
> > not a fan of this approach because of the large number of
> > places in the Linux kernel where interrupts, preemption,
> > and softirqs are enabled or disabled "behind the scenes".
> >
> > b.  Eliminate rcu_read_lock_bh() in favor of rcu_read_lock(),
> > and required callers to also disable softirqs, preemption,
> > or whatever as needed.  I am not a fan of this approach
> > because it seems a lot less convenient to users of RCU-bh
> > and RCU-sched.
> >
> > At the moment, I therefore favor keeping the RCU-bh and RCU-sched
> > read-side APIs.  But are there better approaches?
> 
> Hello, Paul
> 
> Since local_bh_disable() will be guaranteed to be protected by RCU
> and is more general, I'm afraid it will be preferred over
> rcu_read_lock_bh(), which will gradually be phased out.
> 
> In other words, keeping the RCU-bh read-side APIs will be a slower
> version of option A. The same goes for the RCU-sched approach.
> But it'll still be better than a hurried option A, IMHO.

I am OK with the read-side RCU-bh and RCU-sched interfaces going away,
it is just that I am not willing to put all that much effort into
it myself.  ;-)

Unless there is a good reason for me to hurry it along, of course.

Thanx, Paul

> Thanks,
> Lai
> 
> >
> > o   How should kernel/rcu/sync.c be handled?  Here are some
> > possibilities:
> >
> > a.  Leave the full gp_ops[] array and simply translate
> > the obsolete update-side functions to their RCU
> > equivalents.
> >
> > b.  Leave the current gp_ops[] array, but only have
> > the RCU_SYNC entry.  The __INIT_HELD field would
> > be set to a function that was OK with being in an
> > RCU read-side critical section, an interrupt-disabled
> > section, etc.
> >
> > 
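The update-side replacements listed above are purely mechanical, so they lend themselves to scripted conversion. In practice a tree-wide change like this would more likely use Coccinelle, but a minimal sed sketch (file path and source content hypothetical) looks like:

```shell
# Toy source file using two of the obsolete update-side APIs.
cat > /tmp/rcu-demo.c <<'EOF'
call_rcu_bh(&p->rcu, free_obj);
synchronize_sched();
EOF

# Apply two of the proposed replacements, matching whole words only,
# so that e.g. synchronize_sched_expedited() would be left alone
# (it gets its own rule in the full mapping).
sed -i -e 's/\bcall_rcu_bh\b/call_rcu/g' \
       -e 's/\bsynchronize_sched\b/synchronize_rcu/g' /tmp/rcu-demo.c

cat /tmp/rcu-demo.c
```

In a real conversion each entry in the mapping would get its own expression, and the resulting call sites would still need review by hand.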

Re: Consolidating RCU-bh, RCU-preempt, and RCU-sched

2018-07-12 Thread Lai Jiangshan
On Fri, Jul 13, 2018 at 8:02 AM, Paul E. McKenney
 wrote:
> Hello!
>
> I now have a semi-reasonable prototype of changes consolidating the
> RCU-bh, RCU-preempt, and RCU-sched update-side APIs in my -rcu tree.
> There are likely still bugs to be fixed and probably other issues as well,
> but a prototype does exist.
>
> Assuming continued good rcutorture results and no objections, I am
> thinking in terms of this timeline:
>
> o   Preparatory work and cleanups are slated for the v4.19 merge window.
>
> o   The actual consolidation and post-consolidation cleanup is slated
> for the merge window after v4.19 (v5.0?).  These cleanups include
> the replacements called out below within the RCU implementation
> itself (but excluding kernel/rcu/sync.c, see question below).
>
> o   Replacement of now-obsolete update APIs is slated for the second
> merge window after v4.19 (v5.1?).  The replacements are currently
> expected to be as follows:
>
> synchronize_rcu_bh() -> synchronize_rcu()
> synchronize_rcu_bh_expedited() -> synchronize_rcu_expedited()
> call_rcu_bh() -> call_rcu()
> rcu_barrier_bh() -> rcu_barrier()
> synchronize_sched() -> synchronize_rcu()
> synchronize_sched_expedited() -> synchronize_rcu_expedited()
> call_rcu_sched() -> call_rcu()
> rcu_barrier_sched() -> rcu_barrier()
> get_state_synchronize_sched() -> get_state_synchronize_rcu()
> cond_synchronize_sched() -> cond_synchronize_rcu()
> synchronize_rcu_mult() -> synchronize_rcu()
>
> I have done light testing of these replacements with good results.
>
> Any objections to this timeline?
>
> I also have some questions on the ultimate end point.  I have default
> choices, which I will likely take if there is no discussion.
>
> o   Currently, I am thinking in terms of keeping the per-flavor
> read-side functions.  For example, rcu_read_lock_bh() would
> continue to disable softirq, and would also continue to tell
> lockdep about the RCU-bh read-side critical section.  However,
> synchronize_rcu() will wait for all flavors of read-side critical
> sections, including those introduced by (say) preempt_disable(),
> so there will no longer be any possibility of mismatching (say)
> RCU-bh readers with RCU-sched updaters.
>
> I could imagine other ways of handling this, including:
>
> a.  Eliminate rcu_read_lock_bh() in favor of
> local_bh_disable() and so on.  Rely on lockdep
> instrumentation of these other functions to identify RCU
> readers, introducing such instrumentation as needed.  I am
> not a fan of this approach because of the large number of
> places in the Linux kernel where interrupts, preemption,
> and softirqs are enabled or disabled "behind the scenes".
>
> b.  Eliminate rcu_read_lock_bh() in favor of rcu_read_lock(),
> and required callers to also disable softirqs, preemption,
> or whatever as needed.  I am not a fan of this approach
> because it seems a lot less convenient to users of RCU-bh
> and RCU-sched.
>
> At the moment, I therefore favor keeping the RCU-bh and RCU-sched
> read-side APIs.  But are there better approaches?

Hello, Paul

Since local_bh_disable() will be guaranteed to be protected by RCU
and is more general, I'm afraid it will be preferred over
rcu_read_lock_bh(), which will gradually be phased out.

In other words, keeping the RCU-bh read-side APIs will be a slower
version of option A. The same goes for the RCU-sched approach.
But it'll still be better than a hurried option A, IMHO.

Thanks,
Lai

>
> o   How should kernel/rcu/sync.c be handled?  Here are some
> possibilities:
>
> a.  Leave the full gp_ops[] array and simply translate
> the obsolete update-side functions to their RCU
> equivalents.
>
> b.  Leave the current gp_ops[] array, but only have
> the RCU_SYNC entry.  The __INIT_HELD field would
> be set to a function that was OK with being in an
> RCU read-side critical section, an interrupt-disabled
> section, etc.
>
> This allows for possible addition of SRCU functionality.
> It is also a trivial change.  Note that the sole user
> of sync.c uses RCU_SCHED_SYNC, and this would need to
> be changed to RCU_SYNC.
>
> But is it likely that we will ever add SRCU?
>
> c.  Eliminate that gp_ops[] array, hard-coding the function
> pointers into their call sites.
>
> I don't really have a preference.  Left to myself, I will be lazy
> and take 

[PATCH v1 1/2] mm: fix race on soft-offlining free huge pages

2018-07-12 Thread Naoya Horiguchi
There's a race condition between soft offline and hugetlb_fault which
causes unexpected process killing and/or hugetlb allocation failure.

The process killing is caused by the following flow:

  CPU 0   CPU 1  CPU 2

  soft offline
get_any_page
// find the hugetlb is free
  mmap a hugetlb file
  page fault
...
  hugetlb_fault
hugetlb_no_page
  alloc_huge_page
  // succeed
  soft_offline_free_page
  // set hwpoison flag
 mmap the hugetlb file
 page fault
   ...
 hugetlb_fault
   hugetlb_no_page
 find_lock_page
   return VM_FAULT_HWPOISON
   mm_fault_error
 do_sigbus
 // kill the process


The hugetlb allocation failure comes from the following flow:

  CPU 0  CPU 1

 mmap a hugetlb file
 // reserve all free page but don't fault-in
  soft offline
get_any_page
// find the hugetlb is free
  soft_offline_free_page
  // set hwpoison flag
dissolve_free_huge_page
// fail because all free hugepages are reserved
 page fault
   ...
 hugetlb_fault
   hugetlb_no_page
 alloc_huge_page
   ...
 dequeue_huge_page_node_exact
 // ignore hwpoisoned hugepage
 // and finally fail due to no-mem

The root cause is that the current soft-offline code is written on the
assumption that the PageHWPoison flag should be set first, to avoid
accessing the corrupted data.  This makes sense for memory_failure() or
hard offline, but not for soft offline, which deals with corrected (not
uncorrected) errors and is therefore safe from data loss.
This patch changes the soft offline semantics to set the PageHWPoison flag
only after containment of the error page completes successfully.

Reported-by: Xishi Qiu 
Suggested-by: Xishi Qiu 
Signed-off-by: Naoya Horiguchi 
---
 mm/hugetlb.c| 11 +--
 mm/memory-failure.c | 22 --
 mm/migrate.c|  2 --
 3 files changed, 21 insertions(+), 14 deletions(-)

diff --git v4.18-rc4-mmotm-2018-07-10-16-50/mm/hugetlb.c v4.18-rc4-mmotm-2018-07-10-16-50_patched/mm/hugetlb.c
index 430be42..937c142 100644
--- v4.18-rc4-mmotm-2018-07-10-16-50/mm/hugetlb.c
+++ v4.18-rc4-mmotm-2018-07-10-16-50_patched/mm/hugetlb.c
@@ -1479,22 +1479,20 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
 /*
  * Dissolve a given free hugepage into free buddy pages. This function does
  * nothing for in-use (including surplus) hugepages. Returns -EBUSY if the
- * number of free hugepages would be reduced below the number of reserved
- * hugepages.
+ * dissolution fails because a given page is not a free hugepage, or because
+ * free hugepages are fully reserved.
  */
 int dissolve_free_huge_page(struct page *page)
 {
-   int rc = 0;
+   int rc = -EBUSY;
 
spin_lock(&hugetlb_lock);
if (PageHuge(page) && !page_count(page)) {
struct page *head = compound_head(page);
struct hstate *h = page_hstate(head);
int nid = page_to_nid(head);
-   if (h->free_huge_pages - h->resv_huge_pages == 0) {
-   rc = -EBUSY;
+   if (h->free_huge_pages - h->resv_huge_pages == 0)
goto out;
-   }
/*
 * Move PageHWPoison flag from head page to the raw error page,
 * which makes any subpages rather than the error page reusable.
@@ -1508,6 +1506,7 @@ int dissolve_free_huge_page(struct page *page)
h->free_huge_pages_node[nid]--;
h->max_huge_pages--;
update_and_free_page(h, head);
+   rc = 0;
}
 out:
spin_unlock(&hugetlb_lock);
diff --git v4.18-rc4-mmotm-2018-07-10-16-50/mm/memory-failure.c v4.18-rc4-mmotm-2018-07-10-16-50_patched/mm/memory-failure.c
index 9d142b9..c63d982 100644
--- v4.18-rc4-mmotm-2018-07-10-16-50/mm/memory-failure.c
+++ v4.18-rc4-mmotm-2018-07-10-16-50_patched/mm/memory-failure.c
@@ -1598,8 +1598,18 @@ static int 

[PATCH v1 0/2] mm: soft-offline: fix race against page allocation

2018-07-12 Thread Naoya Horiguchi
Xishi recently reported an issue about a race on reusing the target pages
of soft offlining.
Discussion and analysis showed that we need to make sure that setting
PG_hwpoison is done in the right place, under zone->lock, for soft offline.
1/2 handles the free hugepage case, and 2/2 handles the free buddy page case.

Thanks,
Naoya Horiguchi
---
Summary:

Naoya Horiguchi (2):
  mm: fix race on soft-offlining free huge pages
  mm: soft-offline: close the race against page allocation

 include/linux/page-flags.h |  5 +
 include/linux/swapops.h| 10 --
 mm/hugetlb.c   | 11 +--
 mm/memory-failure.c| 44 +++-
 mm/migrate.c   |  4 +---
 mm/page_alloc.c| 29 +
 6 files changed, 75 insertions(+), 28 deletions(-)


[PATCH v1 2/2] mm: soft-offline: close the race against page allocation

2018-07-12 Thread Naoya Horiguchi
A process can be killed with SIGBUS(BUS_MCEERR_AR) when it tries to
allocate a page that was just freed on the way of soft-offline.
This is undesirable because soft-offline (which is about corrected errors)
is less aggressive than hard-offline (which is about uncorrected errors),
and we can make soft-offline fail and keep using the page for a good
reason like "system is busy."

Two main changes of this patch are:

- setting the migrate type of the target page to MIGRATE_ISOLATE. As done
  in free_unref_page_commit(), this makes the kernel bypass the pcplist when
  freeing the page, so we can assume that the page is on the free list just
  after put_page() returns,

- setting PG_hwpoison on free page under zone->lock which protects
  freelists, so this allows us to avoid setting PG_hwpoison on a page
  that is decided to be allocated soon.

Reported-by: Xishi Qiu 
Signed-off-by: Naoya Horiguchi 
---
 include/linux/page-flags.h |  5 +
 include/linux/swapops.h| 10 --
 mm/memory-failure.c| 26 +-
 mm/migrate.c   |  2 +-
 mm/page_alloc.c| 29 +
 5 files changed, 56 insertions(+), 16 deletions(-)

diff --git v4.18-rc4-mmotm-2018-07-10-16-50/include/linux/page-flags.h v4.18-rc4-mmotm-2018-07-10-16-50_patched/include/linux/page-flags.h
index 901943e..74bee8c 100644
--- v4.18-rc4-mmotm-2018-07-10-16-50/include/linux/page-flags.h
+++ v4.18-rc4-mmotm-2018-07-10-16-50_patched/include/linux/page-flags.h
@@ -369,8 +369,13 @@ PAGEFLAG_FALSE(Uncached)
 PAGEFLAG(HWPoison, hwpoison, PF_ANY)
 TESTSCFLAG(HWPoison, hwpoison, PF_ANY)
 #define __PG_HWPOISON (1UL << PG_hwpoison)
+extern bool set_hwpoison_free_buddy_page(struct page *page);
 #else
 PAGEFLAG_FALSE(HWPoison)
+static inline bool set_hwpoison_free_buddy_page(struct page *page)
+{
+   return 0;
+}
 #define __PG_HWPOISON 0
 #endif
 
diff --git v4.18-rc4-mmotm-2018-07-10-16-50/include/linux/swapops.h 
v4.18-rc4-mmotm-2018-07-10-16-50_patched/include/linux/swapops.h
index 9c0eb4d..fe8e08b 100644
--- v4.18-rc4-mmotm-2018-07-10-16-50/include/linux/swapops.h
+++ v4.18-rc4-mmotm-2018-07-10-16-50_patched/include/linux/swapops.h
@@ -335,11 +335,6 @@ static inline int is_hwpoison_entry(swp_entry_t entry)
return swp_type(entry) == SWP_HWPOISON;
 }
 
-static inline bool test_set_page_hwpoison(struct page *page)
-{
-   return TestSetPageHWPoison(page);
-}
-
 static inline void num_poisoned_pages_inc(void)
 {
atomic_long_inc(_poisoned_pages);
@@ -362,11 +357,6 @@ static inline int is_hwpoison_entry(swp_entry_t swp)
return 0;
 }
 
-static inline bool test_set_page_hwpoison(struct page *page)
-{
-   return false;
-}
-
 static inline void num_poisoned_pages_inc(void)
 {
 }
diff --git v4.18-rc4-mmotm-2018-07-10-16-50/mm/memory-failure.c 
v4.18-rc4-mmotm-2018-07-10-16-50_patched/mm/memory-failure.c
index c63d982..794687a 100644
--- v4.18-rc4-mmotm-2018-07-10-16-50/mm/memory-failure.c
+++ v4.18-rc4-mmotm-2018-07-10-16-50_patched/mm/memory-failure.c
@@ -57,6 +57,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "internal.h"
 #include "ras/ras_event.h"
 
@@ -1697,6 +1698,7 @@ static int __soft_offline_page(struct page *page, int 
flags)
 static int soft_offline_in_use_page(struct page *page, int flags)
 {
int ret;
+   int mt;
struct page *hpage = compound_head(page);
 
if (!PageHuge(page) && PageTransHuge(hpage)) {
@@ -1715,23 +1717,37 @@ static int soft_offline_in_use_page(struct page *page, 
int flags)
put_hwpoison_page(hpage);
}
 
+   /*
+* Setting MIGRATE_ISOLATE here ensures that the page will be linked
+* to the free list immediately (not via the pcplist) when released after
+* successful page migration. Otherwise we can't guarantee that the
+* page is really free after put_page() returns, so
+* set_hwpoison_free_buddy_page() is highly likely to fail.
+*/
+   mt = get_pageblock_migratetype(page);
+   set_pageblock_migratetype(page, MIGRATE_ISOLATE);
if (PageHuge(page))
ret = soft_offline_huge_page(page, flags);
else
ret = __soft_offline_page(page, flags);
-
+   set_pageblock_migratetype(page, mt);
return ret;
 }
 
-static void soft_offline_free_page(struct page *page)
+static int soft_offline_free_page(struct page *page)
 {
int rc = 0;
struct page *head = compound_head(page);
 
if (PageHuge(head))
rc = dissolve_free_huge_page(page);
-   if (!rc && !TestSetPageHWPoison(page))
-   num_poisoned_pages_inc();
+   if (!rc) {
+   if (set_hwpoison_free_buddy_page(page))
+   num_poisoned_pages_inc();
+   else
+   rc = -EBUSY;
+   }
+   return rc;
 }
 
 /**
@@ -1775,7 +1791,7 @@ int soft_offline_page(struct page *page, int flags)
if (ret > 0)
ret = 

Re: [PATCH 5/5] f2fs: do not __punch_discard_cmd in lfs mode

2018-07-12 Thread Chao Yu
On 2018/7/12 23:09, Yunlong Song wrote:
> In lfs mode, it is better to submit and wait for discard of the
> new_blkaddr's overall section, rather than punch it which makes
> more small discards and is not friendly with flash alignment. And
> f2fs does not have to wait discard of each new_blkaddr except for the
> start_block of each section with this patch.

For a non-zoned block device, unaligned discard can be allowed; and if synchronous
discard is very slow, it will block the block allocator here. Rather than that, I
prefer to just punch the 4k lba out of the discard entry, for performance.

If you don't want to encounter this condition, I suggest issuing large-size
discards more quickly.

Thanks,

> 
> Signed-off-by: Yunlong Song 
> ---
>  fs/f2fs/segment.c | 76 
> ++-
>  fs/f2fs/segment.h |  7 -
>  2 files changed, 75 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index f6c20e0..bce321a 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -893,7 +893,19 @@ static void __remove_discard_cmd(struct f2fs_sb_info 
> *sbi,
>  static void f2fs_submit_discard_endio(struct bio *bio)
>  {
>   struct discard_cmd *dc = (struct discard_cmd *)bio->bi_private;
> + struct f2fs_sb_info *sbi = F2FS_SB(dc->bdev->bd_super);
>  
> + if (test_opt(sbi, LFS)) {
> + unsigned int segno = GET_SEGNO(sbi, dc->lstart);
> + unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
> + int cnt = (dc->len >> sbi->log_blocks_per_seg) /
> + sbi->segs_per_sec;
> +
> + while (cnt--) {
> + set_bit(secno, FREE_I(sbi)->discard_secmap);
> + secno++;
> + }
> + }
>   dc->error = blk_status_to_errno(bio->bi_status);
>   dc->state = D_DONE;
>   complete_all(>wait);
> @@ -1349,8 +1361,15 @@ static void f2fs_wait_discard_bio(struct f2fs_sb_info 
> *sbi, block_t blkaddr)
>   dc = (struct discard_cmd *)f2fs_lookup_rb_tree(>root,
>   NULL, blkaddr);
>   if (dc) {
> - if (dc->state == D_PREP) {
> + if (dc->state == D_PREP && !test_opt(sbi, LFS))
>   __punch_discard_cmd(sbi, dc, blkaddr);
> + else if (dc->state == D_PREP && test_opt(sbi, LFS)) {
> + struct discard_policy dpolicy;
> +
> + __init_discard_policy(sbi, , DPOLICY_FORCE, 1);
> + __submit_discard_cmd(sbi, , dc);
> + dc->ref++;
> + need_wait = true;
>   } else {
>   dc->ref++;
>   need_wait = true;
> @@ -2071,9 +2090,10 @@ static void get_new_segment(struct f2fs_sb_info *sbi,
>   unsigned int hint = GET_SEC_FROM_SEG(sbi, *newseg);
>   unsigned int old_zoneno = GET_ZONE_FROM_SEG(sbi, *newseg);
>   unsigned int left_start = hint;
> - bool init = true;
> + bool init = true, check_discard = test_opt(sbi, LFS) ? true : false;
>   int go_left = 0;
>   int i;
> + unsigned long *free_secmap;
>  
>   spin_lock(_i->segmap_lock);
>  
> @@ -2084,11 +2104,25 @@ static void get_new_segment(struct f2fs_sb_info *sbi,
>   goto got_it;
>   }
>  find_other_zone:
> - secno = find_next_zero_bit(free_i->free_secmap, MAIN_SECS(sbi), hint);
> + if (check_discard) {
> + int entries = f2fs_bitmap_size(MAIN_SECS(sbi)) / 
> sizeof(unsigned long);
> +
> + free_secmap = free_i->tmp_secmap;
> + for (i = 0; i < entries; i++)
> + free_secmap[i] = (!(free_i->free_secmap[i] ^
> + free_i->discard_secmap[i])) | 
> free_i->free_secmap[i];
> + } else
> + free_secmap = free_i->free_secmap;
> +
> + secno = find_next_zero_bit(free_secmap, MAIN_SECS(sbi), hint);
>   if (secno >= MAIN_SECS(sbi)) {
>   if (dir == ALLOC_RIGHT) {
> - secno = find_next_zero_bit(free_i->free_secmap,
> + secno = find_next_zero_bit(free_secmap,
>   MAIN_SECS(sbi), 0);
> + if (secno >= MAIN_SECS(sbi) && check_discard) {
> + check_discard = false;
> + goto find_other_zone;
> + }
>   f2fs_bug_on(sbi, secno >= MAIN_SECS(sbi));
>   } else {
>   go_left = 1;
> @@ -2098,13 +2132,17 @@ static void get_new_segment(struct f2fs_sb_info *sbi,
>   if (go_left == 0)
>   goto skip_left;
>  
> - while (test_bit(left_start, free_i->free_secmap)) {
> + while (test_bit(left_start, free_secmap)) {
>   if (left_start > 0) {
>   left_start--;
>   continue;
>   }
> - 

[PATCH v1 1/2] mm: fix race on soft-offlining free huge pages

2018-07-12 Thread Naoya Horiguchi
There's a race condition between soft offline and hugetlb_fault which
causes unexpected process killing and/or hugetlb allocation failure.

The process killing is caused by the following flow:

  CPU 0   CPU 1  CPU 2

  soft offline
get_any_page
// find the hugetlb is free
  mmap a hugetlb file
  page fault
...
  hugetlb_fault
hugetlb_no_page
  alloc_huge_page
  // succeed
  soft_offline_free_page
  // set hwpoison flag
 mmap the hugetlb file
 page fault
   ...
 hugetlb_fault
   hugetlb_no_page
 find_lock_page
   return VM_FAULT_HWPOISON
   mm_fault_error
 do_sigbus
 // kill the process


The hugetlb allocation failure comes from the following flow:

  CPU 0  CPU 1

 mmap a hugetlb file
 // reserve all free page but don't fault-in
  soft offline
get_any_page
// find the hugetlb is free
  soft_offline_free_page
  // set hwpoison flag
dissolve_free_huge_page
// fail because all free hugepages are reserved
 page fault
   ...
 hugetlb_fault
   hugetlb_no_page
 alloc_huge_page
   ...
 dequeue_huge_page_node_exact
 // ignore hwpoisoned hugepage
 // and finally fail due to no-mem

The root cause of this is that the current soft-offline code is written
based on the assumption that the PageHWPoison flag should be set first to
avoid accessing the corrupted data.  This makes sense for memory_failure()
or hard offline, but not for soft offline, because soft offline is
about corrected (not uncorrected) errors and is safe from data loss.
This patch changes the soft offline semantics so that the PageHWPoison flag
is set only after containment of the error page completes successfully.

Reported-by: Xishi Qiu 
Suggested-by: Xishi Qiu 
Signed-off-by: Naoya Horiguchi 
---
 mm/hugetlb.c| 11 +--
 mm/memory-failure.c | 22 --
 mm/migrate.c|  2 --
 3 files changed, 21 insertions(+), 14 deletions(-)

diff --git v4.18-rc4-mmotm-2018-07-10-16-50/mm/hugetlb.c 
v4.18-rc4-mmotm-2018-07-10-16-50_patched/mm/hugetlb.c
index 430be42..937c142 100644
--- v4.18-rc4-mmotm-2018-07-10-16-50/mm/hugetlb.c
+++ v4.18-rc4-mmotm-2018-07-10-16-50_patched/mm/hugetlb.c
@@ -1479,22 +1479,20 @@ static int free_pool_huge_page(struct hstate *h, 
nodemask_t *nodes_allowed,
 /*
  * Dissolve a given free hugepage into free buddy pages. This function does
  * nothing for in-use (including surplus) hugepages. Returns -EBUSY if the
- * number of free hugepages would be reduced below the number of reserved
- * hugepages.
+ * dissolution fails because a given page is not a free hugepage, or because
+ * free hugepages are fully reserved.
  */
 int dissolve_free_huge_page(struct page *page)
 {
-   int rc = 0;
+   int rc = -EBUSY;
 
spin_lock(_lock);
if (PageHuge(page) && !page_count(page)) {
struct page *head = compound_head(page);
struct hstate *h = page_hstate(head);
int nid = page_to_nid(head);
-   if (h->free_huge_pages - h->resv_huge_pages == 0) {
-   rc = -EBUSY;
+   if (h->free_huge_pages - h->resv_huge_pages == 0)
goto out;
-   }
/*
 * Move PageHWPoison flag from head page to the raw error page,
 * which makes any subpages rather than the error page reusable.
@@ -1508,6 +1506,7 @@ int dissolve_free_huge_page(struct page *page)
h->free_huge_pages_node[nid]--;
h->max_huge_pages--;
update_and_free_page(h, head);
+   rc = 0;
}
 out:
spin_unlock(_lock);
diff --git v4.18-rc4-mmotm-2018-07-10-16-50/mm/memory-failure.c 
v4.18-rc4-mmotm-2018-07-10-16-50_patched/mm/memory-failure.c
index 9d142b9..c63d982 100644
--- v4.18-rc4-mmotm-2018-07-10-16-50/mm/memory-failure.c
+++ v4.18-rc4-mmotm-2018-07-10-16-50_patched/mm/memory-failure.c
@@ -1598,8 +1598,18 @@ static int 

[PATCH v1 0/2] mm: soft-offline: fix race against page allocation

2018-07-12 Thread Naoya Horiguchi
Xishi recently reported an issue about a race on reusing the target pages
of soft offlining.
Discussion and analysis showed that we need to make sure that setting PG_hwpoison
is done in the right place, under zone->lock, for soft offline.
1/2 handles the free hugepage case, and 2/2 handles the free buddy page case.

Thanks,
Naoya Horiguchi
---
Summary:

Naoya Horiguchi (2):
  mm: fix race on soft-offlining free huge pages
  mm: soft-offline: close the race against page allocation

 include/linux/page-flags.h |  5 +
 include/linux/swapops.h| 10 --
 mm/hugetlb.c   | 11 +--
 mm/memory-failure.c| 44 +++-
 mm/migrate.c   |  4 +---
 mm/page_alloc.c| 29 +
 6 files changed, 75 insertions(+), 28 deletions(-)


Re: [PATCH v2 1/3] dt-bindings: thermal: Add binding document for SR thermal

2018-07-12 Thread Srinath Mannam
Hi Rob,

I have provided my inputs for the purpose of having multiple nodes.
Please get back if you have any comments or suggestions.

Regards,
Srinath.

On Tue, Jul 3, 2018 at 4:15 PM, Srinath Mannam
 wrote:
> Hi Rob,
>
> Kindly provide your feedback.
>
> Regards,
> Srinath.
>
> On Fri, Jun 22, 2018 at 11:21 AM, Srinath Mannam
>  wrote:
>> Hi Rob,
>>
>> Please find my comments for the reason to have multiple DT nodes.
>>
>> On Thu, Jun 21, 2018 at 1:22 AM, Rob Herring  wrote:
>>> On Mon, Jun 18, 2018 at 02:01:17PM +0530, Srinath Mannam wrote:
 From: Pramod Kumar 

 Add binding document for supported thermal implementation
 in Stingray.

 Signed-off-by: Pramod Kumar 
 Reviewed-by: Ray Jui 
 Reviewed-by: Scott Branden 
 Reviewed-by: Srinath Mannam 
 ---
  .../bindings/thermal/brcm,sr-thermal.txt   | 45 
 ++
  1 file changed, 45 insertions(+)
  create mode 100644 
 Documentation/devicetree/bindings/thermal/brcm,sr-thermal.txt

 diff --git a/Documentation/devicetree/bindings/thermal/brcm,sr-thermal.txt 
 b/Documentation/devicetree/bindings/thermal/brcm,sr-thermal.txt
 new file mode 100644
 index 000..33f9e11
 --- /dev/null
 +++ b/Documentation/devicetree/bindings/thermal/brcm,sr-thermal.txt
 @@ -0,0 +1,45 @@
 +* Broadcom Stingray Thermal
 +
 +This binding describes thermal sensors that is part of Stingray SoCs.
 +
 +Required properties:
 +- compatible : Must be "brcm,sr-thermal"
 +- reg : memory where tmon data will be available.
 +
 +Example:
 + tmons {
 + compatible = "simple-bus";
 + #address-cells = <1>;
 + #size-cells = <1>;
 + ranges;
 +
 + tmon_ihost0: thermal@8f10 {
 + compatible = "brcm,sr-thermal";
 + reg = <0x8f10 0x4>;
 + };
>>>
>>> You still haven't given me a compelling reason why you need a node per
>>> register.
>>>
>>> You have a single range of registers. Make this 1 node.
>>>
>>
>> We have two reasons to have multiple nodes:
>> 1. Our chip has multiple functional blocks. Each functional block has
>> its own thermal zone.
>> Functional blocks and their thermal zones are enabled/disabled based on the
>> end product.
>> A few functional blocks need to be disabled for some products, so their
>> thermal zones also need to be disabled.
>> In that case, the nodes of those thermal zones are removed from the DTS
>> file of the corresponding product.
>>
>> 2. The thermal framework provides a sysfs interface to configure thermal
>> zones and to read the temperature of a thermal zone.
>> To configure an individual thermal zone, we need a separate DT node;
>> the same applies to reading the temperature of an individual thermal zone.
>> Ex: To read temperature of thermal zone 0.
>>  cat /sys/class/thermal/thermal_zone0/temp
>>  To configure trip temperature of thermal zone 0.
>>   echo 11 > /sys/class/thermal/thermal_zone0/trip_point_0_temp
>>
>> Also, to avoid driver source changes across multiple products, it is
>> cleaner to have multiple DT nodes.
>>
>>> Rob



general protection fault in propagate_entity_cfs_rq

2018-07-12 Thread syzbot

Hello,

syzbot found the following crash on:

HEAD commit:6fd066604123 Merge branch 'bpf-arm-jit-improvements'
git tree:   bpf-next
console output: https://syzkaller.appspot.com/x/log.txt?x=11e9267840
kernel config:  https://syzkaller.appspot.com/x/.config?x=a501a01deaf0fe9
dashboard link: https://syzkaller.appspot.com/bug?extid=2e37f794f31be5667a88
compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=1014db9440
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11f81e7840

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+2e37f794f31be5667...@syzkaller.appspotmail.com

IPv6: ADDRCONF(NETDEV_UP): team0: link is not ready
8021q: adding VLAN 0 to HW filter on device team0
IPv6: ADDRCONF(NETDEV_CHANGE): team0: link becomes ready
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault:  [#1] SMP KASAN
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.18.0-rc3+ #51
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
RIP: 0010:propagate_entity_cfs_rq.isra.70+0x199/0x20c0  
kernel/sched/fair.c:10039
Code: 0d 02 00 00 48 c7 c0 60 70 2a 89 48 89 f9 48 c1 e8 03 48 01 d8 48 89  
85 28 fb ff ff 4c 8d a9 58 01 00 00 4c 89 e8 48 c1 e8 03 <80> 3c 18 00 0f  
85 5e 11 00 00 4c 8b a1 58 01 00 00 0f 1f 44 00 00

RSP: 0018:8801daf06c90 EFLAGS: 00010003
RAX: 03fffe20074fc1d0 RBX: dc00 RCX: 11003a7e0d2c
RDX: 11003a7e0d2a RSI: 11003b5e0e7f RDI: 11003a7e0d2c
RBP: 8801daf071a0 R08: 8801dae2cbc0 R09: 111a25cc
R10: 019d6e0b R11:  R12: 11003b5e0e3b
R13: 11003a7e0e84 R14: 8801d3f06800 R15: 
FS:  () GS:8801daf0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7fb1b24d7e78 CR3: 0001ab04b000 CR4: 001406e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 
 detach_entity_cfs_rq+0x6e3/0xf50 kernel/sched/fair.c:10059
 migrate_task_rq_fair+0xba/0x290 kernel/sched/fair.c:6709
 set_task_cpu+0x131/0x770 kernel/sched/core.c:1194
 detach_task.isra.89+0xdb/0x150 kernel/sched/fair.c:7438
 detach_tasks kernel/sched/fair.c:7525 [inline]
 load_balance+0xf0b/0x3640 kernel/sched/fair.c:8884
 rebalance_domains+0x82a/0xd90 kernel/sched/fair.c:9262
 run_rebalance_domains+0x365/0x4c0 kernel/sched/fair.c:9884
 __do_softirq+0x2e8/0xb17 kernel/softirq.c:288
 invoke_softirq kernel/softirq.c:368 [inline]
 irq_exit+0x1d1/0x200 kernel/softirq.c:408
 exiting_irq arch/x86/include/asm/apic.h:527 [inline]
 smp_apic_timer_interrupt+0x186/0x730 arch/x86/kernel/apic/apic.c:1052
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
 
RIP: 0010:native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:54
Code: c7 48 89 45 d8 e8 5a 04 24 fa 48 8b 45 d8 e9 d2 fe ff ff 48 89 df e8  
49 04 24 fa eb 8a 90 90 90 90 90 90 90 55 48 89 e5 fb f4 <5d> c3 0f 1f 84  
00 00 00 00 00 55 48 89 e5 f4 5d c3 90 90 90 90 90

RSP: 0018:8801d9af7c38 EFLAGS: 0286 ORIG_RAX: ff13
RAX: dc00 RBX: 11003b35ef8a RCX: 81667982
RDX: 111e3610 RSI: 0004 RDI: 88f1b080
RBP: 8801d9af7c38 R08: ed003b5e46d7 R09: ed003b5e46d6
R10: ed003b5e46d6 R11: 8801daf236b3 R12: 0001
R13: 8801d9af7cf0 R14: 899edd20 R15: 
 arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
 default_idle+0xc7/0x450 arch/x86/kernel/process.c:500
 arch_cpu_idle+0x10/0x20 arch/x86/kernel/process.c:491
 default_idle_call+0x6d/0x90 kernel/sched/idle.c:93
 cpuidle_idle_call kernel/sched/idle.c:153 [inline]
 do_idle+0x3aa/0x570 kernel/sched/idle.c:262
 cpu_startup_entry+0x10c/0x120 kernel/sched/idle.c:368
 start_secondary+0x433/0x5d0 arch/x86/kernel/smpboot.c:265
 secondary_startup_64+0xa5/0xb0 arch/x86/kernel/head_64.S:242
Modules linked in:
Dumping ftrace buffer:
   (ftrace buffer empty)
---[ end trace cb0cd83b57bb4bba ]---
RIP: 0010:propagate_entity_cfs_rq.isra.70+0x199/0x20c0  
kernel/sched/fair.c:10039
Code: 0d 02 00 00 48 c7 c0 60 70 2a 89 48 89 f9 48 c1 e8 03 48 01 d8 48 89  
85 28 fb ff ff 4c 8d a9 58 01 00 00 4c 89 e8 48 c1 e8 03 <80> 3c 18 00 0f  
85 5e 11 00 00 4c 8b a1 58 01 00 00 0f 1f 44 00 00

RSP: 0018:8801daf06c90 EFLAGS: 00010003
RAX: 03fffe20074fc1d0 RBX: dc00 RCX: 11003a7e0d2c
RDX: 11003a7e0d2a RSI: 11003b5e0e7f RDI: 11003a7e0d2c
RBP: 8801daf071a0 R08: 8801dae2cbc0 R09: 111a25cc
R10: 019d6e0b R11:  R12: 11003b5e0e3b
R13: 11003a7e0e84 R14: 8801d3f06800 R15: 
FS:  () GS:8801daf0() knlGS:
CS:  0010 DS:  ES:  CR0: 

 arch_cpu_idle+0x10/0x20 arch/x86/kernel/process.c:491
 default_idle_call+0x6d/0x90 kernel/sched/idle.c:93
 cpuidle_idle_call kernel/sched/idle.c:153 [inline]
 do_idle+0x3aa/0x570 kernel/sched/idle.c:262
 cpu_startup_entry+0x10c/0x120 kernel/sched/idle.c:368
 start_secondary+0x433/0x5d0 arch/x86/kernel/smpboot.c:265
 secondary_startup_64+0xa5/0xb0 arch/x86/kernel/head_64.S:242
Modules linked in:
Dumping ftrace buffer:
   (ftrace buffer empty)
---[ end trace cb0cd83b57bb4bba ]---
RIP: 0010:propagate_entity_cfs_rq.isra.70+0x199/0x20c0  
kernel/sched/fair.c:10039
Code: 0d 02 00 00 48 c7 c0 60 70 2a 89 48 89 f9 48 c1 e8 03 48 01 d8 48 89  
85 28 fb ff ff 4c 8d a9 58 01 00 00 4c 89 e8 48 c1 e8 03 <80> 3c 18 00 0f  
85 5e 11 00 00 4c 8b a1 58 01 00 00 0f 1f 44 00 00

RSP: 0018:8801daf06c90 EFLAGS: 00010003
RAX: 03fffe20074fc1d0 RBX: dc00 RCX: 11003a7e0d2c
RDX: 11003a7e0d2a RSI: 11003b5e0e7f RDI: 11003a7e0d2c
RBP: 8801daf071a0 R08: 8801dae2cbc0 R09: 111a25cc
R10: 019d6e0b R11:  R12: 11003b5e0e3b
R13: 11003a7e0e84 R14: 8801d3f06800 R15: 
FS:  () GS:8801daf0() knlGS:
CS:  0010 DS:  ES:  CR0: 

Re: [PATCH 1/2] tracing: kprobes: Prohibit probing on notrace functions

2018-07-12 Thread Masami Hiramatsu
On Thu, 12 Jul 2018 13:54:12 -0400
Francis Deslauriers  wrote:

> From: Masami Hiramatsu 
> 
> Prohibit kprobe-events probing on notrace functions,
> since probing on a notrace function can cause a recursive
> event call. In most cases those are just skipped, but
> in some cases it falls into an infinite recursive call.

BTW, I'm considering adding an option to allow putting
kprobes on notrace functions - just for debugging
ftrace with kprobes. That is a "developer only" option,
so generally it should be disabled, but for debugging
ftrace we still need it. Or should I introduce
another kprobes module for debugging it?

Thank you,


-- 
Masami Hiramatsu 



Re: [V9fs-developer] [PATCH v2 3/6] 9p: Replace the fidlist with an IDR

2018-07-12 Thread Matthew Wilcox
On Fri, Jul 13, 2018 at 10:05:50AM +0800, jiangyiwen wrote:
> > @@ -908,30 +908,29 @@ static struct p9_fid *p9_fid_create(struct p9_client 
> > *clnt)
> >  {
> > int ret;
> > struct p9_fid *fid;
> > -   unsigned long flags;
> >  
> > p9_debug(P9_DEBUG_FID, "clnt %p\n", clnt);
> > fid = kmalloc(sizeof(struct p9_fid), GFP_KERNEL);
> > if (!fid)
> > return NULL;
> >  
> > -   ret = p9_idpool_get(clnt->fidpool);
> > -   if (ret < 0)
> > -   goto error;
> > -   fid->fid = ret;
> > -
> > memset(&fid->qid, 0, sizeof(struct p9_qid));
> > fid->mode = -1;
> > fid->uid = current_fsuid();
> > fid->clnt = clnt;
> > fid->rdir = NULL;
> > -   spin_lock_irqsave(&clnt->lock, flags);
> > -   list_add(&fid->flist, &clnt->fidlist);
> > -   spin_unlock_irqrestore(&clnt->lock, flags);
> > +   fid->fid = 0;
> >  
> > -   return fid;
> > +   idr_preload(GFP_KERNEL);
> 
> It is best to use GFP_NOFS instead, or else it may cause some
> unpredictable problem, because when out of memory it will
> reclaim memory from v9fs.

Earlier in this function, fid was allocated with GFP_KERNEL:

> > fid = kmalloc(sizeof(struct p9_fid), GFP_KERNEL);


> > +   spin_lock_irq(&clnt->lock);
> > +   ret = idr_alloc_u32(&clnt->fids, fid, &fid->fid, P9_NOFID - 1,
> > +   GFP_NOWAIT);
> > +   spin_unlock_irq(&clnt->lock);
> 
> use spin_lock instead, clnt->lock is not used in irq context.

I don't think that's right.  What about p9_fid_destroy?  It was already
using spin_lock_irqsave(), so I just assumed that whoever wrote that
code at least considered that it might be called from interrupt context.

Also consider p9_free_req() which shares the same lock.  We could get
rid of clnt->lock altogether as there's a lock embedded in each IDR,
but that'll introduce an unwanted dependence on the RDMA tree in this
merge window.

> > @@ -1095,14 +1086,11 @@ void p9_client_destroy(struct p9_client *clnt)
> >  
> > v9fs_put_trans(clnt->trans_mod);
> >  
> > -   list_for_each_entry_safe(fid, fidptr, &clnt->fidlist, flist) {
> > +   idr_for_each_entry(&clnt->fids, fid, id) {
> > pr_info("Found fid %d not clunked\n", fid->fid);
> > p9_fid_destroy(fid);
> > }
> >  
> > -   if (clnt->fidpool)
> > -   p9_idpool_destroy(clnt->fidpool);
> > -
> 
> I suggest add idr_destroy in the end.

Why?  p9_fid_destroy calls idr_remove() for each fid, so it'll already
be empty.

Thanks for all the review, to everyone who's submitted review.  This is
a really healthy community.



Re: [PATCH 24/32] vfs: syscall: Add fsopen() to prepare for superblock creation [ver #9]

2018-07-12 Thread Theodore Y. Ts'o
On Thu, Jul 12, 2018 at 11:54:41PM +0100, David Howells wrote:
> 
> Would that mean then that doing:
> 
>   mount /dev/sda3 /a
>   mount /dev/sda3 /b
> 
> would then fail on the second command because /dev/sda3 is already open
> exclusively?

Good point.  One workaround would be to require an open with O_PATH instead.

- Ted


Re: [PATCH 0/2] scsi: arcmsr: fix error of resuming from hibernation

2018-07-12 Thread Martin K. Petersen


Ching,

> This patch series is against mkp's 4.19/scsi-queue.
>
> 1. Fix error of resuming from hibernation for adapter type E.
> 2. Update driver version to v1.40.00.09-20180709

Applied to 4.19/scsi-queue, thank you!

-- 
Martin K. Petersen  Oracle Linux Engineering




Re: [PATCH v2] tools/memory-model: Add extra ordering for locks and remove it for ordinary release/acquire

2018-07-12 Thread Daniel Lustig
On 7/12/2018 2:45 AM, Will Deacon wrote:
> On Thu, Jul 12, 2018 at 11:34:32AM +0200, Peter Zijlstra wrote:
>> On Thu, Jul 12, 2018 at 09:40:40AM +0200, Peter Zijlstra wrote:
>>> And I think if we raise atomic*_acquire() to require TSO (but ideally
>>> raise it to RCsc) we're there.
>>
>> To clarify, just the RmW-acquire. Things like atomic_read_acquire() can
>> stay smp_load_acquire() and be RCpc.
> 
> I don't have strong opinions about strengthening RmW atomics to TSO, so
> if it helps to unblock Alan's patch (which doesn't go near this!) then I'll
> go with it. The important part is that we continue to allow roach motel
> into the RmW for other accesses in the non-fully-ordered cases.
> 
> Daniel -- your AMO instructions are cool with this, right? It's just the
> fence-based implementations that will need help?
> 
> Will
Right, let me pull this part out of the overly-long response I just gave
on the thread with Linus :)

If we pair AMOs with AMOs, we get RCsc, and everything is fine.  If we
start mixing in fences (mostly because we don't currently have native
load-acquire or store-release opcodes), then that's when all the rest of the
complexity comes in.

Dan



RE: [PATCH V2] ARM: dts: make pfuze switch always-on for imx platforms

2018-07-12 Thread Anson Huang
Hi, Shawn
Although commit 5fe156f1cab4 ("regulator: pfuze100: add
enable/disable for switch") was reverted to avoid the boot failure on some i.MX
platforms, adding the "regulator-always-on" property for those critical pfuze
switches is still the right thing to do and makes sense no matter how the pfuze
regulator's switch ON/OFF function ends up being implemented. Can the patches
below be applied anyway?

ARM: dts: imx6sll-evk: make pfuze100 sw4 always on
ARM: dts: make pfuze switch always-on for imx platforms
ARM: dts: imx6sl-evk: keep sw4 always on

Let me know your thoughts, thanks!

Anson Huang
Best Regards!


> -Original Message-
> From: Anson Huang
> Sent: Wednesday, June 27, 2018 9:31 AM
> To: shawn...@kernel.org; s.ha...@pengutronix.de; ker...@pengutronix.de;
> Fabio Estevam ; robh...@kernel.org;
> mark.rutl...@arm.com
> Cc: dl-linux-imx ; linux-arm-ker...@lists.infradead.org;
> devicet...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [PATCH V2] ARM: dts: make pfuze switch always-on for imx platforms
> 
> commit 5fe156f1cab4 ("regulator: pfuze100: add enable/disable for switch")
> will cause unreferenced switches to be turned off if
> "regulator-always-on" is NOT present. pfuze switches are normally used by
> critical modules which must be always ON, or shared by many peripherals which
> do NOT implement power domain control, so just make sure all switches are
> always ON to avoid any system issue caused by unexpectedly turning
> switches off.
> 
> Fixes: 5fe156f1cab4 ("regulator: pfuze100: add enable/disable for switch")
> Signed-off-by: Anson Huang 
> Reviewed-by: Fabio Estevam 
> ---
> changes since V1:
>   improve the way of referencing commit, and add fix tag.
>  arch/arm/boot/dts/imx6q-display5.dtsi  | 1 +
>  arch/arm/boot/dts/imx6q-mccmon6.dts| 1 +
>  arch/arm/boot/dts/imx6q-novena.dts | 1 +
>  arch/arm/boot/dts/imx6q-pistachio.dts  | 1 +
> arch/arm/boot/dts/imx6qdl-gw54xx.dtsi  | 1 +
> arch/arm/boot/dts/imx6qdl-sabresd.dtsi | 1 +
> arch/arm/boot/dts/imx6sx-sdb-reva.dts  | 1 +
>  7 files changed, 7 insertions(+)
> 
> diff --git a/arch/arm/boot/dts/imx6q-display5.dtsi
> b/arch/arm/boot/dts/imx6q-display5.dtsi
> index 85232c7..33d266f 100644
> --- a/arch/arm/boot/dts/imx6q-display5.dtsi
> +++ b/arch/arm/boot/dts/imx6q-display5.dtsi
> @@ -326,6 +326,7 @@
>   sw4_reg: sw4 {
>   regulator-min-microvolt = <80>;
>   regulator-max-microvolt = <330>;
> + regulator-always-on;
>   };
> 
>   swbst_reg: swbst {
> diff --git a/arch/arm/boot/dts/imx6q-mccmon6.dts
> b/arch/arm/boot/dts/imx6q-mccmon6.dts
> index b7e9f38..e6429c5 100644
> --- a/arch/arm/boot/dts/imx6q-mccmon6.dts
> +++ b/arch/arm/boot/dts/imx6q-mccmon6.dts
> @@ -166,6 +166,7 @@
>   sw4_reg: sw4 {
>   regulator-min-microvolt = <80>;
>   regulator-max-microvolt = <330>;
> + regulator-always-on;
>   };
> 
>   swbst_reg: swbst {
> diff --git a/arch/arm/boot/dts/imx6q-novena.dts
> b/arch/arm/boot/dts/imx6q-novena.dts
> index fcd824d..0b3c651 100644
> --- a/arch/arm/boot/dts/imx6q-novena.dts
> +++ b/arch/arm/boot/dts/imx6q-novena.dts
> @@ -341,6 +341,7 @@
>   reg_sw4: sw4 {
>   regulator-min-microvolt = <80>;
>   regulator-max-microvolt = <330>;
> + regulator-always-on;
>   };
> 
>   reg_swbst: swbst {
> diff --git a/arch/arm/boot/dts/imx6q-pistachio.dts
> b/arch/arm/boot/dts/imx6q-pistachio.dts
> index a31e83c..6ea09f9 100644
> --- a/arch/arm/boot/dts/imx6q-pistachio.dts
> +++ b/arch/arm/boot/dts/imx6q-pistachio.dts
> @@ -253,6 +253,7 @@
>   sw4_reg: sw4 {
>   regulator-min-microvolt = <80>;
>   regulator-max-microvolt = <330>;
> + regulator-always-on;
>   };
> 
>   swbst_reg: swbst {
> diff --git a/arch/arm/boot/dts/imx6qdl-gw54xx.dtsi
> b/arch/arm/boot/dts/imx6qdl-gw54xx.dtsi
> index a1a6fb5..281cae5 100644
> --- a/arch/arm/boot/dts/imx6qdl-gw54xx.dtsi
> +++ b/arch/arm/boot/dts/imx6qdl-gw54xx.dtsi
> @@ -268,6 +268,7 @@
>   sw4_reg: sw4 {
>   regulator-min-microvolt = <80>;
>   regulator-max-microvolt = <330>;
> + regulator-always-on;
>   };
> 
>   swbst_reg: swbst {
> diff --git a/arch/arm/boot/dts/imx6qdl-sabresd.dtsi
> b/arch/arm/boot/dts/imx6qdl-sabresd.dtsi
> index 15744ad..6e46a19 100644
> --- a/arch/arm/boot/dts/imx6qdl-sabresd.dtsi
> +++ 


[PATCH] mm, swap: Make CONFIG_THP_SWAP depends on CONFIG_SWAP

2018-07-12 Thread Huang, Ying
From: Huang Ying 

CONFIG_THP_SWAP should depend on CONFIG_SWAP, because it's
unreasonable to optimize swapping for THP (Transparent Huge Page)
without basic swapping support.

In the original code, when CONFIG_SWAP=n and CONFIG_THP_SWAP=y,
split_swap_cluster() will not be built because it is in swapfile.c,
but it will be called in huge_memory.c.  This doesn't trigger a build
error in practice because the call site is enclosed by
PageSwapCache(), which is defined to be constant 0 when CONFIG_SWAP=n.
But this is fragile and should be fixed.

The comments are fixed too to reflect the latest progress.

Fixes: 38d8b4e6bdc8 ("mm, THP, swap: delay splitting THP during swap out")
Signed-off-by: "Huang, Ying" 
Reviewed-by: Dan Williams 
Reviewed-by: Naoya Horiguchi 
Cc: Michal Hocko 
Cc: Johannes Weiner 
Cc: Shaohua Li 
Cc: Hugh Dickins 
Cc: Minchan Kim 
Cc: Rik van Riel 
Cc: Dave Hansen 
Cc: Zi Yan 
Cc: Daniel Jordan 
---
 mm/Kconfig | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index b78e7cd4e9fe..97114c94239c 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -419,10 +419,11 @@ config ARCH_WANTS_THP_SWAP
 
 config THP_SWAP
def_bool y
-   depends on TRANSPARENT_HUGEPAGE && ARCH_WANTS_THP_SWAP
+   depends on TRANSPARENT_HUGEPAGE && ARCH_WANTS_THP_SWAP && SWAP
help
  Swap transparent huge pages in one piece, without splitting.
- XXX: For now this only does clustered swap space allocation.
+ XXX: For now, swap cluster backing transparent huge page
+ will be split after swapout.
 
  For selection by architectures with reasonable THP sizes.
 
-- 
2.16.4




Re: [V9fs-developer] [PATCH v2 3/6] 9p: Replace the fidlist with an IDR

2018-07-12 Thread jiangyiwen
On 2018/7/12 5:02, Matthew Wilcox wrote:
> The p9_idpool being used to allocate the IDs uses an IDR to allocate
> the IDs ... which we then keep in a doubly-linked list, rather than in
> the IDR which allocated them.  We can use an IDR directly which saves
> two pointers per p9_fid, and a tiny memory allocation per p9_client.
> 
> Signed-off-by: Matthew Wilcox 
> ---
>  include/net/9p/client.h |  9 +++--
>  net/9p/client.c | 44 +++--
>  2 files changed, 19 insertions(+), 34 deletions(-)
> 
> diff --git a/include/net/9p/client.h b/include/net/9p/client.h
> index 7af9d769b97d..e405729cd1c7 100644
> --- a/include/net/9p/client.h
> +++ b/include/net/9p/client.h
> @@ -27,6 +27,7 @@
>  #define NET_9P_CLIENT_H
>  
>  #include 
> +#include 
>  
>  /* Number of requests per row */
>  #define P9_ROW_MAXTAG 255
> @@ -128,8 +129,7 @@ struct p9_req_t {
>   * @proto_version: 9P protocol version to use
>   * @trans_mod: module API instantiated with this client
>   * @trans: tranport instance state and API
> - * @fidpool: fid handle accounting for session
> - * @fidlist: List of active fid handles
> + * @fids: All active FID handles
>   * @tagpool - transaction id accounting for session
>   * @reqs - 2D array of requests
>   * @max_tag - current maximum tag id allocated
> @@ -169,8 +169,7 @@ struct p9_client {
>   } tcp;
>   } trans_opts;
>  
> - struct p9_idpool *fidpool;
> - struct list_head fidlist;
> + struct idr fids;
>  
>   struct p9_idpool *tagpool;
>   struct p9_req_t *reqs[P9_ROW_MAXTAG];
> @@ -188,7 +187,6 @@ struct p9_client {
>   * @iounit: the server reported maximum transaction size for this file
>   * @uid: the numeric uid of the local user who owns this handle
>   * @rdir: readdir accounting structure (allocated on demand)
> - * @flist: per-client-instance fid tracking
>   * @dlist: per-dentry fid tracking
>   *
>   * TODO: This needs lots of explanation.
> @@ -204,7 +202,6 @@ struct p9_fid {
>  
>   void *rdir;
>  
> - struct list_head flist;
>   struct hlist_node dlist;/* list of all fids attached to a 
> dentry */
>  };
>  
> diff --git a/net/9p/client.c b/net/9p/client.c
> index 389a2904b7b3..b89c7298267c 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -908,30 +908,29 @@ static struct p9_fid *p9_fid_create(struct p9_client 
> *clnt)
>  {
>   int ret;
>   struct p9_fid *fid;
> - unsigned long flags;
>  
>   p9_debug(P9_DEBUG_FID, "clnt %p\n", clnt);
>   fid = kmalloc(sizeof(struct p9_fid), GFP_KERNEL);
>   if (!fid)
>   return NULL;
>  
> - ret = p9_idpool_get(clnt->fidpool);
> - if (ret < 0)
> - goto error;
> - fid->fid = ret;
> -
>   memset(&fid->qid, 0, sizeof(struct p9_qid));
>   fid->mode = -1;
>   fid->uid = current_fsuid();
>   fid->clnt = clnt;
>   fid->rdir = NULL;
> - spin_lock_irqsave(&clnt->lock, flags);
> - list_add(&fid->flist, &clnt->fidlist);
> - spin_unlock_irqrestore(&clnt->lock, flags);
> + fid->fid = 0;
>  
> - return fid;
> + idr_preload(GFP_KERNEL);

It is best to use GFP_NOFS instead, or else it may cause some
unpredictable problem, because when out of memory it will
reclaim memory from v9fs.

> + spin_lock_irq(&clnt->lock);
> + ret = idr_alloc_u32(&clnt->fids, fid, &fid->fid, P9_NOFID - 1,
> + GFP_NOWAIT);
> + spin_unlock_irq(&clnt->lock);

use spin_lock instead, clnt->lock is not used in irq context.

> + idr_preload_end();
> +
> + if (!ret)
> + return fid;
>  
> -error:
>   kfree(fid);
>   return NULL;
>  }
> @@ -943,9 +942,8 @@ static void p9_fid_destroy(struct p9_fid *fid)
>  
>   p9_debug(P9_DEBUG_FID, "fid %d\n", fid->fid);
>   clnt = fid->clnt;
> - p9_idpool_put(fid->fid, clnt->fidpool);
>   spin_lock_irqsave(&clnt->lock, flags);
> - list_del(&fid->flist);
>   spin_unlock_irqrestore(&clnt->lock, flags);
>   spin_unlock_irqrestore(>lock, flags);
>   kfree(fid->rdir);
>   kfree(fid);
> @@ -1028,7 +1026,7 @@ struct p9_client *p9_client_create(const char 
> *dev_name, char *options)
>   memcpy(clnt->name, client_id, strlen(client_id) + 1);
>  
>   spin_lock_init(>lock);
> - INIT_LIST_HEAD(&clnt->fidlist);
> + idr_init(&clnt->fids);
>  
>   err = p9_tag_init(clnt);
>   if (err < 0)
> @@ -1048,18 +1046,12 @@ struct p9_client *p9_client_create(const char 
> *dev_name, char *options)
>   goto destroy_tagpool;
>   }
>  
> - clnt->fidpool = p9_idpool_create();
> - if (IS_ERR(clnt->fidpool)) {
> - err = PTR_ERR(clnt->fidpool);
> - goto put_trans;
> - }
> -
>   p9_debug(P9_DEBUG_MUX, "clnt %p trans %p msize %d protocol %d\n",
>clnt, clnt->trans_mod, clnt->msize, clnt->proto_version);
>  
>   err = clnt->trans_mod->create(clnt, dev_name, options);
>   if (err)
> - goto destroy_fidpool;
> + goto put_trans;
>  
>   if (clnt->msize > 

Re: [V9fs-developer] [PATCH v2 3/6] 9p: Replace the fidlist with an IDR

2018-07-12 Thread jiangyiwen
On 2018/7/12 5:02, Matthew Wilcox wrote:
> The p9_idpool being used to allocate the IDs uses an IDR to allocate
> the IDs ... which we then keep in a doubly-linked list, rather than in
> the IDR which allocated them.  We can use an IDR directly which saves
> two pointers per p9_fid, and a tiny memory allocation per p9_client.
> 
> Signed-off-by: Matthew Wilcox 
> ---
>  include/net/9p/client.h |  9 +++--
>  net/9p/client.c | 44 +++--
>  2 files changed, 19 insertions(+), 34 deletions(-)
> 
> diff --git a/include/net/9p/client.h b/include/net/9p/client.h
> index 7af9d769b97d..e405729cd1c7 100644
> --- a/include/net/9p/client.h
> +++ b/include/net/9p/client.h
> @@ -27,6 +27,7 @@
>  #define NET_9P_CLIENT_H
>  
>  #include 
> +#include 
>  
>  /* Number of requests per row */
>  #define P9_ROW_MAXTAG 255
> @@ -128,8 +129,7 @@ struct p9_req_t {
>   * @proto_version: 9P protocol version to use
>   * @trans_mod: module API instantiated with this client
>   * @trans: tranport instance state and API
> - * @fidpool: fid handle accounting for session
> - * @fidlist: List of active fid handles
> + * @fids: All active FID handles
>   * @tagpool - transaction id accounting for session
>   * @reqs - 2D array of requests
>   * @max_tag - current maximum tag id allocated
> @@ -169,8 +169,7 @@ struct p9_client {
>   } tcp;
>   } trans_opts;
>  
> - struct p9_idpool *fidpool;
> - struct list_head fidlist;
> + struct idr fids;
>  
>   struct p9_idpool *tagpool;
>   struct p9_req_t *reqs[P9_ROW_MAXTAG];
> @@ -188,7 +187,6 @@ struct p9_client {
>   * @iounit: the server reported maximum transaction size for this file
>   * @uid: the numeric uid of the local user who owns this handle
>   * @rdir: readdir accounting structure (allocated on demand)
> - * @flist: per-client-instance fid tracking
>   * @dlist: per-dentry fid tracking
>   *
>   * TODO: This needs lots of explanation.
> @@ -204,7 +202,6 @@ struct p9_fid {
>  
>   void *rdir;
>  
> - struct list_head flist;
>   struct hlist_node dlist;/* list of all fids attached to a 
> dentry */
>  };
>  
> diff --git a/net/9p/client.c b/net/9p/client.c
> index 389a2904b7b3..b89c7298267c 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -908,30 +908,29 @@ static struct p9_fid *p9_fid_create(struct p9_client 
> *clnt)
>  {
>   int ret;
>   struct p9_fid *fid;
> - unsigned long flags;
>  
>   p9_debug(P9_DEBUG_FID, "clnt %p\n", clnt);
>   fid = kmalloc(sizeof(struct p9_fid), GFP_KERNEL);
>   if (!fid)
>   return NULL;
>  
> - ret = p9_idpool_get(clnt->fidpool);
> - if (ret < 0)
> - goto error;
> - fid->fid = ret;
> -
>   memset(>qid, 0, sizeof(struct p9_qid));
>   fid->mode = -1;
>   fid->uid = current_fsuid();
>   fid->clnt = clnt;
>   fid->rdir = NULL;
> - spin_lock_irqsave(>lock, flags);
> - list_add(>flist, >fidlist);
> - spin_unlock_irqrestore(>lock, flags);
> + fid->fid = 0;
>  
> - return fid;
> + idr_preload(GFP_KERNEL);

It is best to use GFP_NOFS here instead; otherwise, under memory
pressure the allocation may recurse into reclaim and try to reclaim
memory from v9fs itself, causing unpredictable problems.

> + spin_lock_irq(>lock);
> + ret = idr_alloc_u32(>fids, fid, >fid, P9_NOFID - 1,
> + GFP_NOWAIT);
> + spin_unlock_irq(>lock);

Use spin_lock() instead; clnt->lock is never taken in irq context,
so there is no need to disable interrupts here.

> + idr_preload_end();
> +
> + if (!ret)
> + return fid;
>  
> -error:
>   kfree(fid);
>   return NULL;
>  }
> @@ -943,9 +942,8 @@ static void p9_fid_destroy(struct p9_fid *fid)
>  
>   p9_debug(P9_DEBUG_FID, "fid %d\n", fid->fid);
>   clnt = fid->clnt;
> - p9_idpool_put(fid->fid, clnt->fidpool);
>   spin_lock_irqsave(>lock, flags);
> - list_del(>flist);
> + idr_remove(>fids, fid->fid);
>   spin_unlock_irqrestore(>lock, flags);
>   kfree(fid->rdir);
>   kfree(fid);
> @@ -1028,7 +1026,7 @@ struct p9_client *p9_client_create(const char 
> *dev_name, char *options)
>   memcpy(clnt->name, client_id, strlen(client_id) + 1);
>  
>   spin_lock_init(>lock);
> - INIT_LIST_HEAD(>fidlist);
> + idr_init(>fids);
>  
>   err = p9_tag_init(clnt);
>   if (err < 0)
> @@ -1048,18 +1046,12 @@ struct p9_client *p9_client_create(const char 
> *dev_name, char *options)
>   goto destroy_tagpool;
>   }
>  
> - clnt->fidpool = p9_idpool_create();
> - if (IS_ERR(clnt->fidpool)) {
> - err = PTR_ERR(clnt->fidpool);
> - goto put_trans;
> - }
> -
>   p9_debug(P9_DEBUG_MUX, "clnt %p trans %p msize %d protocol %d\n",
>clnt, clnt->trans_mod, clnt->msize, clnt->proto_version);
>  
>   err = clnt->trans_mod->create(clnt, dev_name, options);
>   if (err)
> - goto destroy_fidpool;
> + goto put_trans;
>  
>   if (clnt->msize > 

Re: [PATCH v2] tools/memory-model: Add extra ordering for locks and remove it for ordinary release/acquire

2018-07-12 Thread Daniel Lustig
On 7/12/2018 11:10 AM, Linus Torvalds wrote:
> On Thu, Jul 12, 2018 at 11:05 AM Peter Zijlstra  wrote:
>>
>> The locking pattern is fairly simple and shows where RCpc comes apart
>> from expectation real nice.
> 
> So who does RCpc right now for the unlock-lock sequence? Somebody
> mentioned powerpc. Anybody else?
> 
> How nasty would it be to make powerpc conform? I will always advocate
> tighter locking and ordering rules over looser ones..
> 
> Linus

RISC-V probably would have been RCpc if we weren't having this discussion.
Depending on how we map atomics/acquire/release/unlock/lock, we can end up
producing RCpc, "RCtso" (feel free to find a better name here...), or RCsc
behaviors, and we're trying to figure out which we actually need.

I think the debate is this:

Obviously programmers would prefer just to have RCsc and not have to figure out
all the complexity of the other options.  On x86 or architectures with native
RCsc operations (like ARMv8), that's generally easy enough to get.

For weakly-ordered architectures that use fences for ordering (including
PowerPC and sometimes RISC-V, see below), though, it takes extra fences to go
from RCpc to either "RCtso" or RCsc.  People using these architectures are
concerned about whether there's a negative performance impact from those extra
fences.

However, some scheduler code, some RCU code, and probably some other examples
already implicitly or explicitly assume unlock()/lock() provides stronger
ordering than RCpc.  So, we have to decide whether to:
1) define unlock()/lock() to enforce "RCtso" or RCsc, insert more fences on
PowerPC and RISC-V accordingly, and probably negatively regress PowerPC
2) leave unlock()/lock() as enforcing only RCpc, fix any code that currently
assumes something stronger than RCpc is being provided, and hope people don't
get it wrong in the future
3) some mixture like having unlock()/lock() be "RCtso" but smp_store_release()/
smp_cond_load_acquire() be only RCpc

Also, FWIW, if other weakly-ordered architectures come along in the future and
also use any kind of lightweight fence rather than native RCsc operations,
they'll likely be in the same boat as RISC-V and Power here, in the sense of
not providing RCsc by default either.

Is that a fair assessment everyone?



I can also not-so-briefly summarize RISC-V's status here, since I think there's
been a bunch of confusion about where we're coming from:

First of all, I promise we're not trying to start a fight about all this :)
We're trying to understand the LKMM requirements so we know what instructions
to use.

With that, the easy case: RISC-V is RCsc if we use AMOs or load-reserved/
store-conditional, all of which have RCsc .aq and .rl bits:

  (a) ...
  amoswap.w.rl x0, x0, [lock]  // unlock()
  ...
loop:
  amoswap.w.aq a0, t1, [lock]  // lock()
  bnez a0, loop// lock()
  (b) ...

(a) is ordered before (b) here, regardless of what (a) and (b) are.  Likewise
for our load-reserved/store-conditional instructions, which also have .aq and
.rl.  That's similar to how ARM behaves, and is no problem.  We're happy with
that too.

Unfortunately, we don't (currently?) have plain load-acquire or store-release
opcodes in the ISA.  (That's a different discussion...)  For those, we need
fences instead.  And that's where it gets messier.

RISC-V *would* end up providing only RCpc if we use what I'd argue is the most
"natural" fence-based mapping for store-release operations, and then pair that
with LR/SC:

  (a) ...
  fence rw,w // unlock()
  sw x0, [lock]  // unlock()
  ...
loop:
  lr.w.aq a0, [lock]  // lock()
  sc.w t1, [lock] // lock()
  bnez loop   // lock()
  (b) ...

However, if (a) and (b) are loads to different addresses, then (a) is not
ordered before (b) here.  One unpaired RCsc operation is not a full fence.
Clearly "fence rw,w" is not sufficient if the scheduler, RCU, and code
elsewhere depend on "RCtso" or RCsc.

RISC-V can get back to "RCtso", matching PowerPC, by using a stronger fence:

  (a) ...
  fence.tso  // unlock(), fence.tso == fence rw,w + fence r,r
  sw x0, [lock]  // unlock()
  ...
loop:
  lr.w.aq a0, [lock]  // lock()
  sc.w t1, [lock] // lock()
  bnez loop   // lock()
  (b) ...

(a) is ordered before (b), unless (a) is a store and (b) is a load to a
different address.

(Modeling note: this example is why I asked for Alan's v3 patch over the v2
patch, which I believe would only have worked if the fence.tso were at the end)

To get full RCsc here, we'd need a fence rw,rw in between the unlock store and
the lock load, much like PowerPC would I believe need a heavyweight sync:

  (a) ...
  fence rw,w // unlock()
  sw x0, [lock]  // unlock()
  ...
  fence rw,rw// can attach either to lock() or to unlock()
  ...
loop:
  lr.w.aq a0, [lock]  // lock()
  sc.w t1, [lock] // lock()
  bnez loop   // lock()
  (b) ...

In general, RISC-V's fence.tso will suffice wherever PowerPC's lwsync does, and
RISC-V's fence 

Re: [PATCH v13 06/18] x86/xen/time: initialize pv xen time in init_hypervisor_platform

2018-07-12 Thread Pavel Tatashin
> -void __ref xen_init_time_ops(void)
> +void __init xen_init_time_ops(void)
>  {
> pv_time_ops = xen_time_ops;
>
> @@ -542,17 +542,11 @@ void __init xen_hvm_init_time_ops(void)
> return;
>
> if (!xen_feature(XENFEAT_hvm_safe_pvclock)) {
> -   printk(KERN_INFO "Xen doesn't support pvclock on HVM,"
> -   "disable pv timer\n");
> +   pr_info("Xen doesn't support pvclock on HVM, disable pv 
> timer");
> return;
> }
> -
> -   pv_time_ops = xen_time_ops;
> +   xen_init_time_ops();
> x86_init.timers.setup_percpu_clockev = xen_time_init;
> x86_cpuinit.setup_percpu_clockev = xen_hvm_setup_cpu_clockevents;

Boris reported a bug on HVM which causes a panic in
x86_late_time_init(). It is introduced here: xen_init_time_ops() sets
x86_init.timers.timer_init = xen_time_init, which on HVM used to be
hpet_time_init(). However, we might not even need hpet here. Thus,
adding x86_init.timers.timer_init = x86_init_noop; at the end of
xen_hvm_init_time_ops() should be sufficient.

Thank you,
Pavel


Re: [PATCH v3 1/2] leds: core: Introduce generic pattern interface

2018-07-12 Thread Baolin Wang
Hi Jacek,

On 13 July 2018 at 05:41, Jacek Anaszewski  wrote:
> Hi Baolin,
>
>
> On 07/12/2018 02:24 PM, Baolin Wang wrote:
>>
>> Hi Jacek,
>>
>> On 12 July 2018 at 05:10, Jacek Anaszewski 
>> wrote:
>>>
>>> Hi Baolin.
>>>
>>>
>>> On 07/11/2018 01:02 PM, Baolin Wang wrote:


 Hi Jacek and Pavel,

 On 29 June 2018 at 13:03, Baolin Wang  wrote:
>
>
> From: Bjorn Andersson 
>
> Some LED controllers have support for autonomously controlling
> brightness over time, according to some preprogrammed pattern or
> function.
>
> This adds a new optional operator that LED class drivers can implement
> if they support such functionality as well as a new device attribute to
> configure the pattern for a given LED.
>
> [Baolin Wang did some minor improvements.]
>
> Signed-off-by: Bjorn Andersson 
> Signed-off-by: Baolin Wang 
> ---
> Changes from v2:
>- Change kernel version to 4.19.
>- Force user to return error pointer if failed to issue
> pattern_get().
>- Use strstrip() to trim trailing newline.
>- Other optimization.
>
> Changes from v1:
>- Add some comments suggested by Pavel.
>- Change 'delta_t' can be 0.
>
> Note: I removed the pattern repeat check and will get the repeat number
> by adding
> one extra file named 'pattern_repeat' according to previous discussion.
> ---



 Do you have any comments for this version patch set? Thanks.
>>>
>>>
>>>
>>> I tried modifying uleds.c driver to support pattern ops, but
>>> I'm getting segfault when doing "cat pattern". I didn't give
>>> it serious testing and analysis - will do it at weekend.
>>>
>>> It also drew my attention to the issue of desired pattern sysfs
>>> interface semantics on uninitialized pattern. In your implementation
>>> user seems to be unable to determine if the pattern is activated
>>> or not. We should define the semantics for this use case and
>>> describe it in the documentation. Possibly pattern could
>>> return alone new line character then.
>>
>>
>> I am not sure I get your point correctly. If the user writes values to
>> the pattern interface, that means we have activated the pattern.
>> If I am wrong, could you elaborate on the issue you are concerned
>> about? Thanks.
>
>
> Now I see that writing an empty string disables the pattern, right?
> It should be explicitly stated in the pattern file documentation.

Yes, you are right. OK, I will add some documentation for this. Thanks.

>>> This is the code snippet I've used for testing pattern interface:
>>>
>>> static struct led_pattern ptrn[10];
>>> static int ptrn_len;
>>>
>>> static int uled_pattern_clear(struct led_classdev *ldev)
>>> {
>>>  return 0;
>>> }
>>>
>>> static int uled_pattern_set(struct led_classdev *ldev,
>>>struct led_pattern *pattern,
>>>int len)
>>> {
>>>  int i;
>>>
>>>  for (i = 0; i < len; i++) {
>>>  ptrn[i].brightness = pattern[i].brightness;
>>>  ptrn[i].delta_t = pattern[i].delta_t;
>>>  }
>>>
>>>  ptrn_len = len;
>>>
>>>  return 0;
>>> }
>>>
>>> static struct led_pattern *uled_pattern_get(struct led_classdev *ldev,
>>>int *len)
>>> {
>>>  int i;
>>>
>>>  for (i = 0; i < ptrn_len; i++) {
>>>  ptrn[i].brightness = 3;
>>>  ptrn[i].delta_t = 5;
>>>  }
>>>
>>>  *len = ptrn_len;
>>>
>>>  return ptrn;
>>>
>>> }
>>
>>
>> The reason you hit a segfault when doing "cat pattern" is that you should
>> not return a static pattern array: pattern_show() frees the pattern memory
>> after use. Could you change it to return a dynamically allocated pattern
>> pointer, as in my patch 2?
>
>
> Thanks for pointing this out.
>
>
>Documentation/ABI/testing/sysfs-class-led |   17 +
>drivers/leds/led-class.c  |  119
> +
>include/linux/leds.h  |   19 +
>3 files changed, 155 insertions(+)
>
> diff --git a/Documentation/ABI/testing/sysfs-class-led
> b/Documentation/ABI/testing/sysfs-class-led
> index 5f67f7a..e01ac55 100644
> --- a/Documentation/ABI/testing/sysfs-class-led
> +++ b/Documentation/ABI/testing/sysfs-class-led
> @@ -61,3 +61,20 @@ Description:
>   gpio and backlight triggers. In case of the backlight
> trigger,
>   it is useful when driving a LED which is intended to
> indicate
>   a device in a standby like state.
> +
> +What: /sys/class/leds//pattern
> +Date: June 2018
> +KernelVersion: 4.19
> +Description:
> +   Specify a pattern for the LED, for LED hardware that support
> +   altering the brightness as a 

Re: [PATCH v2 1/3] clk: meson: add DT documentation for emmc clock controller

2018-07-12 Thread Yixun Lan
Hi Rob, Jerome, Kevin

see my comments

On 07/13/18 08:15, Rob Herring wrote:
> On Thu, Jul 12, 2018 at 5:29 PM Yixun Lan  wrote:
>>
>> HI Rob
>>
>> see my comments
>>
>> On 07/12/2018 10:17 PM, Rob Herring wrote:
>>> On Wed, Jul 11, 2018 at 8:47 PM Yixun Lan  wrote:

 Hi Rob

 see my comments

 On 07/12/18 03:43, Rob Herring wrote:
> On Tue, Jul 10, 2018 at 04:36:56PM +, Yixun Lan wrote:
>> Document the MMC sub clock controller driver, the potential consumer
>> of this driver is MMC or NAND.
>
> So you all have decided to properly model this now?
>
 Yes, ;-)

>>
>> Signed-off-by: Yixun Lan 
>> ---
>>  .../bindings/clock/amlogic,mmc-clkc.txt   | 31 +++
>>  1 file changed, 31 insertions(+)
>>  create mode 100644 
>> Documentation/devicetree/bindings/clock/amlogic,mmc-clkc.txt
>>
>> diff --git 
>> a/Documentation/devicetree/bindings/clock/amlogic,mmc-clkc.txt 
>> b/Documentation/devicetree/bindings/clock/amlogic,mmc-clkc.txt
>> new file mode 100644
>> index ..ff6b4bf3ecf9
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/clock/amlogic,mmc-clkc.txt
>> @@ -0,0 +1,31 @@
>> +* Amlogic MMC Sub Clock Controller Driver
>> +
>> +The Amlogic MMC clock controller generates and supplies clock to support
>> +MMC and NAND controller
>> +
>> +Required Properties:
>> +
>> +- compatible: should be:
>> +"amlogic,meson-gx-mmc-clkc"
>> +"amlogic,meson-axg-mmc-clkc"
>> +
>> +- #clock-cells: should be 1.
>> +- clocks: phandles to clocks corresponding to the clock-names property
>> +- clock-names: list of parent clock names
>> +- "clkin0", "clkin1"
>> +
>> +Parent node should have the following properties :
>> +- compatible: "syscon", "simple-mfd, and "amlogic,meson-axg-mmc-clkc"
>
> You don't need "simple-mfd" and probably not syscon either. The order is
> wrong too. Most specific first.
>
 Ok, I will drop "simple-mfd"..

 but the syscon is a must, since this mmc clock model accesses registers
 via the regmap interface
>>>
>>> A syscon compatible should not be the only way to get a regmap.
>> Do you have any suggestion about another function I can use? Is
>> devm_regmap_init_mmio() feasible?
>>
>>> Removing lines 56/57 of drivers/mfd/syscon.c should be sufficient.
>>>
>> I'm not sure what the point of removing the 'syscon' compatible in
>> drivers/mfd/syscon.c would be; it sounds like this would break a lot of
>> DTs or require fixes. Will you propose a patch for this? Then I can
>> certainly adjust here.
> 
> Removing the 2 lines will simply allow any node to be a syscon. If
> there's a specific driver for a node, then that makes sense to allow
> that.
> 
>>
>>> Why do you need a regmap in the first place? What else needs to access
>>> this register directly?
>> Yes, the SD_EMMC_CLOCK register contain several bits which not fit well
>> into common clock model, and they need to be access in the NAND or eMMC
>> driver itself, Martin had explained this in early thread[1]
>>
>> In this register
>> Bit[31] select NAND or eMMC function
>> Bit[25] enable SDIO IRQ
>> Bit[24] Clock always on
>> Bit[15:14] SRAM Power down
>>
>> [1]
>> https://lkml.kernel.org/r/CAFBinCBeyXf6LNaZzAw6WnsxzDAv8E=yp2eem0xcpwmeui6...@mail.gmail.com
>>
>>> Don't you need a patch removing the clock code
>>> from within the emmc driver? It's not even using regmap, so using
>>> regmap here doesn't help.
>>>
>> No, and the current eMMC driver still uses iomap to access the register
> 
> Which means a read-modify-write can corrupt the register value if both
> users don't access thru regmap. Changes are probably infrequent enough
> that you get lucky...
> 
What you say here is true,
and we try to guarantee that only one of NAND or eMMC is enabled, so there
is no race condition. As an example of the use cases:

1) for enabling NAND driver, we do
   a) enable both mmc-clkc, and NAND driver in DT, they can access
register by using regmap interface
   b) disable eMMC DT node

2) for enabling eMMC driver, we do
   a) enable eMMC node, access register by using iomap (for now)
   b) disable both mmc-clkc and NAND in DT


>> I think we would probably like to take a two-step approach.
>> First, from the hardware perspective, the NAND and eMMC (port C) drivers
>> can't be used at the same time, since they share the pins, clock, and
>> internal RAM. So we have to enable only one of NAND or eMMC in DT, not
>> both of them.
> 
> Yes, of course.
> 
>> Second, we might like to convert eMMC driver to also use mmc-clkc model.
> 
> IMO, this should be done as part of merging this series. Otherwise, we
> have duplicated code for the same thing.

IMO, I'd leave this out of this series, since this patch series is quite
complete as itself. Although, the downside is code duplication.

Still, I need to hear Jerome's or Kevin's opinion, to 

Re: [PATCH v2 1/3] clk: meson: add DT documentation for emmc clock controller

2018-07-12 Thread Yixun Lan
Hi Rob, Jerome, Kevin

see my comments

On 07/13/18 08:15, Rob Herring wrote:
> On Thu, Jul 12, 2018 at 5:29 PM Yixun Lan  wrote:
>>
>> HI Rob
>>
>> see my comments
>>
>> On 07/12/2018 10:17 PM, Rob Herring wrote:
>>> On Wed, Jul 11, 2018 at 8:47 PM Yixun Lan  wrote:

 Hi Rob

 see my comments

 On 07/12/18 03:43, Rob Herring wrote:
> On Tue, Jul 10, 2018 at 04:36:56PM +, Yixun Lan wrote:
>> Document the MMC sub clock controller driver, the potential consumer
>> of this driver is MMC or NAND.
>
> So you all have decided to properly model this now?
>
 Yes, ;-)

>>
>> Signed-off-by: Yixun Lan 
>> ---
>>  .../bindings/clock/amlogic,mmc-clkc.txt   | 31 +++
>>  1 file changed, 31 insertions(+)
>>  create mode 100644 
>> Documentation/devicetree/bindings/clock/amlogic,mmc-clkc.txt
>>
>> diff --git 
>> a/Documentation/devicetree/bindings/clock/amlogic,mmc-clkc.txt 
>> b/Documentation/devicetree/bindings/clock/amlogic,mmc-clkc.txt
>> new file mode 100644
>> index ..ff6b4bf3ecf9
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/clock/amlogic,mmc-clkc.txt
>> @@ -0,0 +1,31 @@
>> +* Amlogic MMC Sub Clock Controller Driver
>> +
>> +The Amlogic MMC clock controller generates and supplies clock to support
>> +MMC and NAND controller
>> +
>> +Required Properties:
>> +
>> +- compatible: should be:
>> +"amlogic,meson-gx-mmc-clkc"
>> +"amlogic,meson-axg-mmc-clkc"
>> +
>> +- #clock-cells: should be 1.
>> +- clocks: phandles to clocks corresponding to the clock-names property
>> +- clock-names: list of parent clock names
>> +- "clkin0", "clkin1"
>> +
>> +Parent node should have the following properties :
>> +- compatible: "syscon", "simple-mfd, and "amlogic,meson-axg-mmc-clkc"
>
> You don't need "simple-mfd" and probably not syscon either. The order is
> wrong too. Most specific first.
>
 Ok, I will drop "simple-mfd"..

 but the syscon is a must, since this mmc clock model access registers
 via the regmap interface
>>>
>>> A syscon compatible should not be the only way to get a regmap.
>> do you have any suggestion about other function that I can use? is
>> devm_regmap_init_mmio() feasible
>>
>>> Removing lines 56/57 of drivers/mfd/syscon.c should be sufficient.
>>>
>> I'm not sure what the point of removing the 'syscon' compatible in
>> drivers/mfd/syscon.c would be; it sounds like this will break a lot of DTs
>> or need fixing. Will you propose a patch for this? Then I can certainly adjust here
> 
> Removing the 2 lines will simply allow any node to be a syscon. If
> there's a specific driver for a node, then that makes sense to allow
> that.
> 
>>
>>> Why do you need a regmap in the first place? What else needs to access
>>> this register directly?
>> Yes, the SD_EMMC_CLOCK register contain several bits which not fit well
>> into common clock model, and they need to be access in the NAND or eMMC
>> driver itself, Martin had explained this in early thread[1]
>>
>> In this register
>> Bit[31] select NAND or eMMC function
>> Bit[25] enable SDIO IRQ
>> Bit[24] Clock always on
>> Bit[15:14] SRAM Power down
>>
>> [1]
>> https://lkml.kernel.org/r/CAFBinCBeyXf6LNaZzAw6WnsxzDAv8E=yp2eem0xcpwmeui6...@mail.gmail.com
>>
>>> Don't you need a patch removing the clock code
>>> from within the emmc driver? It's not even using regmap, so using
>>> regmap here doesn't help.
>>>
>> No, and current eMMC driver still use iomap to access the register
> 
> Which means a read-modify-write can corrupt the register value if both
> users don't access thru regmap. Changes are probably infrequent enough
> that you get lucky...
> 
What you say here is true,
and we try to guarantee that only one of NAND or eMMC is enabled, so there
is no race condition. As an example of the use cases:

1) for enabling NAND driver, we do
   a) enable both mmc-clkc, and NAND driver in DT, they can access
register by using regmap interface
   b) disable eMMC DT node

2) for enabling eMMC driver, we do
   a) enable eMMC node, access register by using iomap (for now)
   b) disable both mmc-clkc and NAND in DT


>> I think we would probably like to take a two-step approach.
>> First, from the hardware perspective, the NAND and eMMC (port C) drivers
>> can't exist at the same time, since they share the pins, clock and internal
>> RAM. So we have to enable only one of NAND or eMMC in DT, not enable
>> both of them.
> 
> Yes, of course.
> 
>> Second, we might like to convert the eMMC driver to also use the mmc-clkc model.
> 
> IMO, this should be done as part of merging this series. Otherwise, we
> have duplicated code for the same thing.

IMO, I'd leave this out of this series, since this patch series is quite
complete by itself. Although, the downside is code duplication.

Still, I need to hear Jerome's or Kevin's opinion.

Proposal

2018-07-12 Thread Miss Victoria Mehmet
Hello



I have a business proposal of mutual benefits i would like to discuss with
you i asked before and i still await your positive response thanks



Re: Bug report about KASLR and ZONE_MOVABLE

2018-07-12 Thread Chao Fan
On Fri, Jul 13, 2018 at 07:52:40AM +0800, Baoquan He wrote:
>Hi Michal,
>
>On 07/12/18 at 02:32pm, Michal Hocko wrote:
>> On Thu 12-07-18 14:01:15, Chao Fan wrote:
>> > On Thu, Jul 12, 2018 at 01:49:49PM +0800, Dou Liyang wrote:
>> > >Hi Baoquan,
>> > >
>> > >At 07/11/2018 08:40 PM, Baoquan He wrote:
>> > >> Please try this v3 patch:
>> > >> >>From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001
>> > >> From: Baoquan He 
>> > >> Date: Wed, 11 Jul 2018 20:31:51 +0800
>> > >> Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text
>> > >> 
>> > >> In find_zone_movable_pfns_for_nodes(), when trying to find the starting
>> > >> PFN at which the movable zone begins in each node, the kernel text
>> > >> position is not considered. KASLR may put the kernel after the point
>> > >> where the movable zone begins.
>> > >> 
>> > >> Fix it by finding movable zone after kernel text on that node.
>> > >> 
>> > >> Signed-off-by: Baoquan He 
>> > >
>> > >
>> > >You fix this on the _zone_init side_. This may make 'kernelcore=' or
>> > >'movablecore=' fail if KASLR puts the kernel back at the tail of the
>> > >last node, or beyond.
>> > 
>> > I think it may not fail.
>> > There is a 'restart' to do another pass.
>> > 
>> > >
>> > >Since we have fixed the mirror memory on the KASLR side, and Chao is trying
>> > >to fix 'movable_node' on the KASLR side, have you had a chance to fix
>> > >this on the KASLR side?
>> > >
>> > 
>> > I think it's better to fix it here, not on the KASLR side,
>> > because much more code would have to change if doing it on the KASLR side.
>> > We don't parse 'kernelcore' in the compressed code, and as you can see,
>> > the distribution of ZONE_MOVABLE needs a lot of code, so we do not need
>> > to do that much work on the KASLR side. But here, several lines will do.
>> 
>> I am not able to find the beginning of the email thread right now. Could
>> you summarize what is the actual problem please?
>
>The bug is found on x86 for now.
>
>When "kernelcore=" or "movablecore=" is added to the kernel command line,
>kernel memory is spread evenly among nodes. However, this is only right
>when KASLR is not enabled, since the kernel is then placed at 16M on x86.
>With KASLR enabled, it can be put anywhere from 16M to 64T randomly.
> 
>Consider a scenario: we have 10 nodes, each node has 20G of memory, and
>we specify "kernelcore=50%", meaning each node will take 10G for
>kernelcore and 10G for the movable area. But this doesn't take the kernel
>position into consideration. E.g. if the kernel is put at 15G of the 2nd
>node, namely node1, we think node1 has 10G for kernelcore and 10G for
>movable, when in fact there is only 5G available for movable, just after
>the kernel.
>
>I made a v4 patch which possibly can fix it.
>
>
>From dbcac3631863aed556dc2c4ff1839772dfd02d18 Mon Sep 17 00:00:00 2001
>From: Baoquan He 
>Date: Fri, 13 Jul 2018 07:49:29 +0800
>Subject: [PATCH v4] mm, page_alloc: find movable zone after kernel text
>
>In find_zone_movable_pfns_for_nodes(), when trying to find the starting
>PFN at which the movable zone begins in each node, the kernel text
>position is not considered. KASLR may put the kernel after the point
>where the movable zone begins.
>
>Fix it by finding movable zone after kernel text on that node.
>
>Signed-off-by: Baoquan He 

You can post it as a standalone PATCH, then I will test it next week.

Thanks,
Chao Fan

>---
> mm/page_alloc.c | 15 +--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
>diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>index 1521100f1e63..5bc1a47dafda 100644
>--- a/mm/page_alloc.c
>+++ b/mm/page_alloc.c
>@@ -6547,7 +6547,7 @@ static unsigned long __init 
>early_calculate_totalpages(void)
> static void __init find_zone_movable_pfns_for_nodes(void)
> {
>   int i, nid;
>-  unsigned long usable_startpfn;
>+  unsigned long usable_startpfn, kernel_endpfn, arch_startpfn;
>   unsigned long kernelcore_node, kernelcore_remaining;
>   /* save the state before borrow the nodemask */
>   nodemask_t saved_node_state = node_states[N_MEMORY];
>@@ -6649,8 +6649,9 @@ static void __init find_zone_movable_pfns_for_nodes(void)
>   if (!required_kernelcore || required_kernelcore >= totalpages)
>   goto out;
> 
>+  kernel_endpfn = PFN_UP(__pa_symbol(_end));
>   /* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */
>-  usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
>+  arch_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
> 
> restart:
>   /* Spread kernelcore memory as evenly as possible throughout nodes */
>@@ -6659,6 +6660,16 @@ static void __init 
>find_zone_movable_pfns_for_nodes(void)
>   unsigned long start_pfn, end_pfn;
> 
>   /*
>+   * KASLR may put kernel near tail of node memory,
>+   * start after kernel on that node to find PFN
>+   * at which zone begins.
>+   */
>+  if (pfn_to_nid(kernel_endpfn) == nid)
>+  usable_startpfn = max(arch_startpfn, kernel_endpfn);


REGRESSION: [PATCH] mmc: tegra: Use sdhci_pltfm_clk_get_max_clock

2018-07-12 Thread Marcel Ziswiler
On Mon, 2018-07-02 at 15:16 +0200, Ulf Hansson wrote:
> On 4 June 2018 at 17:35, Aapo Vienamo  wrote:
> > The sdhci get_max_clock callback is set to
> > sdhci_pltfm_clk_get_max_clock
> > and tegra_sdhci_get_max_clock is removed. It appears that the
> > sdhci-tegra specific callback was originally introduced due to the
> > requirement that the host clock has to be twice the bus clock on
> > DDR50
> > mode. As far as I can tell the only effect the removal has on DDR50
> > mode
> > is in cases where the parent clock is unable to supply the
> > requested
> > clock rate, causing the DDR50 mode to run at a lower frequency.
> > Currently the DDR50 mode isn't enabled on any of the SoCs and would
> > also
> > require configuring the SDHCI clock divider register to function
> > properly.
> > 
> > The problem with tegra_sdhci_get_max_clock is that it divides the
> > clock
> > rate by two and thus artificially limits the maximum frequency of
> > faster
> > signaling modes which don't have the host-bus frequency ratio
> > requirement
> > of DDR50 such as SDR104 and HS200. Furthermore, the call to
> > clk_round_rate() may return an error which isn't handled by
> > tegra_sdhci_get_max_clock.
> > 
> > Signed-off-by: Aapo Vienamo 
> 
> Thanks, applied for next!
> 
> Kind regards
> Uffe
> 
> > ---
> >  drivers/mmc/host/sdhci-tegra.c | 15 ++-
> >  1 file changed, 2 insertions(+), 13 deletions(-)
> > 
> > diff --git a/drivers/mmc/host/sdhci-tegra.c
> > b/drivers/mmc/host/sdhci-tegra.c
> > index 970d38f6..c8745b5 100644
> > --- a/drivers/mmc/host/sdhci-tegra.c
> > +++ b/drivers/mmc/host/sdhci-tegra.c
> > @@ -234,17 +234,6 @@ static void
> > tegra_sdhci_set_uhs_signaling(struct sdhci_host *host,
> > sdhci_set_uhs_signaling(host, timing);
> >  }
> > 
> > -static unsigned int tegra_sdhci_get_max_clock(struct sdhci_host
> > *host)
> > -{
> > -   struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
> > -
> > -   /*
> > -* DDR modes require the host to run at double the card
> > frequency, so
> > -* the maximum rate we can support is half of the module
> > input clock.
> > -*/
> > -   return clk_round_rate(pltfm_host->clk, UINT_MAX) / 2;
> > -}
> > -
> >  static void tegra_sdhci_set_tap(struct sdhci_host *host, unsigned
> > int tap)
> >  {
> > u32 reg;
> > @@ -309,7 +298,7 @@ static const struct sdhci_ops tegra_sdhci_ops =
> > {
> > .platform_execute_tuning = tegra_sdhci_execute_tuning,
> > .set_uhs_signaling = tegra_sdhci_set_uhs_signaling,
> > .voltage_switch = tegra_sdhci_voltage_switch,
> > -   .get_max_clock = tegra_sdhci_get_max_clock,
> > +   .get_max_clock = sdhci_pltfm_clk_get_max_clock,
> >  };
> > 
> >  static const struct sdhci_pltfm_data sdhci_tegra20_pdata = {
> > @@ -357,7 +346,7 @@ static const struct sdhci_ops
> > tegra114_sdhci_ops = {
> > .platform_execute_tuning = tegra_sdhci_execute_tuning,
> > .set_uhs_signaling = tegra_sdhci_set_uhs_signaling,
> > .voltage_switch = tegra_sdhci_voltage_switch,
> > -   .get_max_clock = tegra_sdhci_get_max_clock,
> > +   .get_max_clock = sdhci_pltfm_clk_get_max_clock,
> >  };
> > 
> >  static const struct sdhci_pltfm_data sdhci_tegra114_pdata = {
> > --
> > 2.7.4

Hm, for us this definitely breaks stuff. While using Stefan's patch set
[1] we not only run eMMC at DDR52, even SD cards run stably at
SDR104. With this patch, however, the clock gets crippled to 45.33 resp.
48 MHz always. This is observed on Apalis/Colibri T30 as well as
Apalis TK1.

Current next-20180712 just with Stefan's 3 patches:

root@apalis-t30:~# cat /sys/kernel/debug/mmc1/ios 
clock:  4800 Hz
actual clock:   4533 Hz
vdd:21 (3.3 ~ 3.4 V)
bus mode:   2 (push-pull)
chip select:0 (don't care)
power mode: 2 (on)
bus width:  3 (8 bits)
timing spec:8 (mmc DDR52)
signal voltage: 1 (1.80 V)
driver type:0 (driver type B)
root@apalis-t30:~# hdparm -t /dev/mmcblk1

/dev/mmcblk1:
 Timing buffered disk reads: 218 MB in  3.03 seconds =  71.95 MB/sec

root@apalis-t30:~# cat /sys/kernel/debug/mmc2/ios 
clock:  4800 Hz
actual clock:   4800 Hz
vdd:21 (3.3 ~ 3.4 V)
bus mode:   2 (push-pull)
chip select:0 (don't care)
power mode: 2 (on)
bus width:  2 (4 bits)
timing spec:6 (sd uhs SDR104)
signal voltage: 1 (1.80 V)
driver type:0 (driver type B)
root@apalis-t30:~# hdparm -t /dev/mmcblk2

/d


Re: [PATCH 22/32] vfs: Provide documentation for new mount API [ver #9]

2018-07-12 Thread Randy Dunlap
On 07/10/2018 03:43 PM, David Howells wrote:
> Provide documentation for the new mount API.
> 
> Signed-off-by: David Howells 
> ---
> 
>  Documentation/filesystems/mount_api.txt |  439 
> +++
>  1 file changed, 439 insertions(+)
>  create mode 100644 Documentation/filesystems/mount_api.txt

Hi,

I would review this but it sounds like I should just wait for the
next version.

-- 
~Randy



Re: [PATCH v8 2/2] regulator: add QCOM RPMh regulator driver

2018-07-12 Thread David Collins
On 07/12/2018 09:54 AM, Mark Brown wrote:
> On Mon, Jul 09, 2018 at 04:44:14PM -0700, David Collins wrote:
>> On 07/02/2018 03:28 AM, Mark Brown wrote:
>>> On Fri, Jun 22, 2018 at 05:46:14PM -0700, David Collins wrote:
 +static unsigned int rpmh_regulator_pmic4_ldo_of_map_mode(unsigned int 
 mode)
 +{
 +  static const unsigned int of_mode_map[RPMH_REGULATOR_MODE_COUNT] = {
 +  [RPMH_REGULATOR_MODE_RET]  = REGULATOR_MODE_STANDBY,
 +  [RPMH_REGULATOR_MODE_LPM]  = REGULATOR_MODE_IDLE,
 +  [RPMH_REGULATOR_MODE_AUTO] = REGULATOR_MODE_INVALID,
 +  [RPMH_REGULATOR_MODE_HPM]  = REGULATOR_MODE_FAST,
 +  };
> 
>>> Same here, based on that it looks like auto mode is a good map for
>>> normal.
> 
>> LDO type regulators physically do not support AUTO mode.  That is why I
>> specified REGULATOR_MODE_INVALID in the mapping.
> 
> The other question here is why this is even in the table if it's not
> valid (I'm not seeing a need for the MODE_COUNT define)?

I thought that having a table would be more concise and easier to follow.
I can change this to a switch case statement.

Take care,
David

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: [PATCH V2 00/19] C-SKY(csky) Linux Kernel Port

2018-07-12 Thread Guo Ren
On Thu, Jul 12, 2018 at 10:04:10AM -0600, Sandra Loosemore wrote:
> On 07/12/2018 06:51 AM, Guo Ren wrote:
> >On Wed, Jul 11, 2018 at 10:51:33AM +0100, David Howells wrote:
> >>Can you say what the --target tuple should be so that I can add the arch to 
> >>my
> >>collection of Fedora cross-binutils and cross-gcc tools built from upstream
> >>binutils and gcc sources?
> >Mentor Graphics is helping us upstream gcc and binutils.
> >
> >@Sandra,
> >
> >Could you help me to reply the question?
> 
> Neither binutils nor gcc support for C-SKY are in the upstream repositories
> yet.  We should be resubmitting the binutils port soon (with bug fixes to
> address the test failures that caused it to be rejected the last time), and
> the gcc port will follow that shortly.
> 
> The target triplets we have been testing are csky-elf, csky-linux-gnu, and
> csky-linux-uclibc.  Note that the gcc port will only support v2
> processors/ABI so that is the default ABI for these triplets.
> 
> I'm not familiar with the Fedora tools, but to build a complete toolchain
> you'll need library support as well and I'm not sure what the submission
> status/plans for that are.  E.g. Mentor did a newlib/libgloss port for local
> testing of the ELF toolchain and provided it to C-SKY, but pushing that to
> the upstream repository ourselves is not on our todo list.
> 
> -Sandra

Thank you, Sandra.

 Guo Ren



[PATCH 16/18] tools/accounting: change strncpy+truncation to strlcpy

2018-07-12 Thread Dominique Martinet
Generated by scripts/coccinelle/misc/strncpy_truncation.cocci

Signed-off-by: Dominique Martinet 
---

Please see https://marc.info/?l=linux-kernel=153144450722324=2 (the
first patch of the series) for the motivation behind this patch

 tools/accounting/getdelays.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tools/accounting/getdelays.c b/tools/accounting/getdelays.c
index 9f420d98b5fb..66817a7a4fce 100644
--- a/tools/accounting/getdelays.c
+++ b/tools/accounting/getdelays.c
@@ -314,8 +314,7 @@ int main(int argc, char *argv[])
err(1, "Invalid rcv buf size\n");
break;
case 'm':
-   strncpy(cpumask, optarg, sizeof(cpumask));
-   cpumask[sizeof(cpumask) - 1] = '\0';
+   strlcpy(cpumask, optarg, sizeof(cpumask));
maskset = 1;
printf("cpumask %s maskset %d\n", cpumask, maskset);
break;
-- 
2.17.1



[PATCH 18/18] cpupower: change strncpy+truncation to strlcpy

2018-07-12 Thread Dominique Martinet
Generated by scripts/coccinelle/misc/strncpy_truncation.cocci

Signed-off-by: Dominique Martinet 
---

Please see https://marc.info/?l=linux-kernel=153144450722324=2 (the
first patch of the series) for the motivation behind this patch

 tools/power/cpupower/bench/parse.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tools/power/cpupower/bench/parse.c 
b/tools/power/cpupower/bench/parse.c
index 9ba8a44ad2a7..1566b89989b2 100644
--- a/tools/power/cpupower/bench/parse.c
+++ b/tools/power/cpupower/bench/parse.c
@@ -221,9 +221,8 @@ int prepare_config(const char *path, struct config *config)
sscanf(val, "%u", >cpu);
 
else if (strcmp("governor", opt) == 0) {
-   strncpy(config->governor, val,
+   strlcpy(config->governor, val,
sizeof(config->governor));
-   config->governor[sizeof(config->governor) - 1] = '\0';
}
 
else if (strcmp("priority", opt) == 0) {
-- 
2.17.1



[PATCH 16/18] tools/accounting: change strncpy+truncation to strlcpy

2018-07-12 Thread Dominique Martinet
Generated by scripts/coccinelle/misc/strncpy_truncation.cocci

Signed-off-by: Dominique Martinet 
---

Please see https://marc.info/?l=linux-kernel=153144450722324=2 (the
first patch of the serie) for the motivation behind this patch

 tools/accounting/getdelays.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tools/accounting/getdelays.c b/tools/accounting/getdelays.c
index 9f420d98b5fb..66817a7a4fce 100644
--- a/tools/accounting/getdelays.c
+++ b/tools/accounting/getdelays.c
@@ -314,8 +314,7 @@ int main(int argc, char *argv[])
err(1, "Invalid rcv buf size\n");
break;
case 'm':
-   strncpy(cpumask, optarg, sizeof(cpumask));
-   cpumask[sizeof(cpumask) - 1] = '\0';
+   strlcpy(cpumask, optarg, sizeof(cpumask));
maskset = 1;
printf("cpumask %s maskset %d\n", cpumask, maskset);
break;
-- 
2.17.1



[PATCH 18/18] cpupower: change strncpy+truncation to strlcpy

2018-07-12 Thread Dominique Martinet
Generated by scripts/coccinelle/misc/strncpy_truncation.cocci

Signed-off-by: Dominique Martinet 
---

Please see https://marc.info/?l=linux-kernel=153144450722324=2 (the
first patch of the serie) for the motivation behind this patch

 tools/power/cpupower/bench/parse.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tools/power/cpupower/bench/parse.c 
b/tools/power/cpupower/bench/parse.c
index 9ba8a44ad2a7..1566b89989b2 100644
--- a/tools/power/cpupower/bench/parse.c
+++ b/tools/power/cpupower/bench/parse.c
@@ -221,9 +221,8 @@ int prepare_config(const char *path, struct config *config)
sscanf(val, "%u", >cpu);
 
else if (strcmp("governor", opt) == 0) {
-   strncpy(config->governor, val,
+   strlcpy(config->governor, val,
sizeof(config->governor));
-   config->governor[sizeof(config->governor) - 1] = '\0';
}
 
else if (strcmp("priority", opt) == 0) {
-- 
2.17.1
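
Every patch in the series credits the same semantic patch. The actual strncpy_truncation.cocci file is posted with the first patch of the series and is not quoted in this thread; a minimal Coccinelle rule performing this transformation would look roughly like the following (a sketch, not necessarily the exact rule used):

```coccinelle
@@
expression dst, src, size;
@@
- strncpy(dst, src, size);
- dst[size - 1] = '\0';
+ strlcpy(dst, src, size);
```

Coccinelle matches the two-statement pattern regardless of the concrete expressions used for the destination, source, and size, which is why the generated patches land uniformly across so many subsystems.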



[PATCH 17/18] perf: change strncpy+truncation to strlcpy

2018-07-12 Thread Dominique Martinet
Generated by scripts/coccinelle/misc/strncpy_truncation.cocci

Signed-off-by: Dominique Martinet 
---

Please see https://marc.info/?l=linux-kernel&m=153144450722324&w=2 (the
first patch of the series) for the motivation behind this patch

 tools/perf/util/bpf-loader.h | 3 +--
 tools/perf/util/util.c   | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index 5d3aefd6fae7..8d08a1fc97a0 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -143,10 +143,9 @@ __bpf_strerror(char *buf, size_t size)
 {
if (!size)
return 0;
-   strncpy(buf,
+   strlcpy(buf,
"ERROR: eBPF object loading is disabled during compiling.\n",
size);
-   buf[size - 1] = '\0';
return 0;
 }
 
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index eac5b858a371..8b9e3aa7aad3 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -459,8 +459,7 @@ fetch_kernel_version(unsigned int *puint, char *str,
return -1;
 
if (str && str_size) {
-   strncpy(str, utsname.release, str_size);
-   str[str_size - 1] = '\0';
+   strlcpy(str, utsname.release, str_size);
}
 
if (!puint || int_ver_ready)
-- 
2.17.1



[PATCH 13/18] ibmvscsi: change strncpy+truncation to strlcpy

2018-07-12 Thread Dominique Martinet
Generated by scripts/coccinelle/misc/strncpy_truncation.cocci

Signed-off-by: Dominique Martinet 
---

Please see https://marc.info/?l=linux-kernel&m=153144450722324&w=2 (the
first patch of the series) for the motivation behind this patch

 drivers/scsi/ibmvscsi/ibmvscsi.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c
index 17df76f0be3c..79eb8af03a19 100644
--- a/drivers/scsi/ibmvscsi/ibmvscsi.c
+++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
@@ -1274,14 +1274,12 @@ static void send_mad_capabilities(struct ibmvscsi_host_data *hostdata)
if (hostdata->client_migrated)
hostdata->caps.flags |= cpu_to_be32(CLIENT_MIGRATED);
 
-   strncpy(hostdata->caps.name, dev_name(&hostdata->host->shost_gendev),
+   strlcpy(hostdata->caps.name, dev_name(&hostdata->host->shost_gendev),
sizeof(hostdata->caps.name));
-   hostdata->caps.name[sizeof(hostdata->caps.name) - 1] = '\0';
 
location = of_get_property(of_node, "ibm,loc-code", NULL);
location = location ? location : dev_name(hostdata->dev);
-   strncpy(hostdata->caps.loc, location, sizeof(hostdata->caps.loc));
-   hostdata->caps.loc[sizeof(hostdata->caps.loc) - 1] = '\0';
+   strlcpy(hostdata->caps.loc, location, sizeof(hostdata->caps.loc));
 
req->common.type = cpu_to_be32(VIOSRP_CAPABILITIES_TYPE);
req->buffer = cpu_to_be64(hostdata->caps_addr);
-- 
2.17.1



[PATCH 15/18] blktrace: change strncpy+truncation to strlcpy

2018-07-12 Thread Dominique Martinet
Using strlcpy fixes this new gcc warning:
kernel/trace/blktrace.c: In function 'do_blk_trace_setup':
kernel/trace/blktrace.c:497:2: warning: 'strncpy' specified bound 32 equals destination size [-Wstringop-truncation]
  strncpy(buts->name, name, BLKTRACE_BDEV_SIZE);
  ^

Generated by scripts/coccinelle/misc/strncpy_truncation.cocci

Signed-off-by: Dominique Martinet 
---

Please see https://marc.info/?l=linux-kernel&m=153144450722324&w=2 (the
first patch of the series) for the motivation behind this patch

 kernel/trace/blktrace.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 987d9a9ae283..2478d9838eab 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -494,8 +494,7 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
if (!buts->buf_size || !buts->buf_nr)
return -EINVAL;
 
-   strncpy(buts->name, name, BLKTRACE_BDEV_SIZE);
-   buts->name[BLKTRACE_BDEV_SIZE - 1] = '\0';
+   strlcpy(buts->name, name, BLKTRACE_BDEV_SIZE);
 
/*
 * some device names have larger paths - convert the slashes
-- 
2.17.1





[PATCH 12/18] test_power: change strncpy+truncation to strlcpy

2018-07-12 Thread Dominique Martinet
Generated by scripts/coccinelle/misc/strncpy_truncation.cocci

Signed-off-by: Dominique Martinet 
---

Please see https://marc.info/?l=linux-kernel&m=153144450722324&w=2 (the
first patch of the series) for the motivation behind this patch

 drivers/power/supply/test_power.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/power/supply/test_power.c b/drivers/power/supply/test_power.c
index 57246cdbd042..64adf630f64f 100644
--- a/drivers/power/supply/test_power.c
+++ b/drivers/power/supply/test_power.c
@@ -297,8 +297,7 @@ static int map_get_value(struct battery_property_map *map, const char *key,
char buf[MAX_KEYLENGTH];
int cr;
 
-   strncpy(buf, key, MAX_KEYLENGTH);
-   buf[MAX_KEYLENGTH-1] = '\0';
+   strlcpy(buf, key, MAX_KEYLENGTH);
 
cr = strnlen(buf, MAX_KEYLENGTH) - 1;
if (cr < 0)
-- 
2.17.1



[PATCH 14/18] kdb_support: change strncpy+truncation to strlcpy

2018-07-12 Thread Dominique Martinet
Generated by scripts/coccinelle/misc/strncpy_truncation.cocci

Signed-off-by: Dominique Martinet 
---

Please see https://marc.info/?l=linux-kernel&m=153144450722324&w=2 (the
first patch of the series) for the motivation behind this patch

 kernel/debug/kdb/kdb_support.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/debug/kdb/kdb_support.c b/kernel/debug/kdb/kdb_support.c
index 990b3cc526c8..1f6a4b6bde0b 100644
--- a/kernel/debug/kdb/kdb_support.c
+++ b/kernel/debug/kdb/kdb_support.c
@@ -119,8 +119,7 @@ int kdbnearsym(unsigned long addr, kdb_symtab_t *symtab)
 * What was Rusty smoking when he wrote that code?
 */
if (symtab->sym_name != knt1) {
-   strncpy(knt1, symtab->sym_name, knt1_size);
-   knt1[knt1_size-1] = '\0';
+   strlcpy(knt1, symtab->sym_name, knt1_size);
}
for (i = 0; i < ARRAY_SIZE(kdb_name_table); ++i) {
if (kdb_name_table[i] &&
-- 
2.17.1



[PATCH 05/18] iio: change strncpy+truncation to strlcpy

2018-07-12 Thread Dominique Martinet
Generated by scripts/coccinelle/misc/strncpy_truncation.cocci

Signed-off-by: Dominique Martinet 
---

Please see https://marc.info/?l=linux-kernel&m=153144450722324&w=2 (the
first patch of the series) for the motivation behind this patch

 drivers/iio/common/st_sensors/st_sensors_core.c | 3 +--
 drivers/iio/pressure/st_pressure_i2c.c  | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/iio/common/st_sensors/st_sensors_core.c b/drivers/iio/common/st_sensors/st_sensors_core.c
index 57db19182e95..26fbd1bd9413 100644
--- a/drivers/iio/common/st_sensors/st_sensors_core.c
+++ b/drivers/iio/common/st_sensors/st_sensors_core.c
@@ -380,8 +380,7 @@ void st_sensors_of_name_probe(struct device *dev,
return;
 
/* The name from the OF match takes precedence if present */
-   strncpy(name, of_id->data, len);
-   name[len - 1] = '\0';
+   strlcpy(name, of_id->data, len);
 }
 EXPORT_SYMBOL(st_sensors_of_name_probe);
 #else
diff --git a/drivers/iio/pressure/st_pressure_i2c.c b/drivers/iio/pressure/st_pressure_i2c.c
index fbb59059e942..2026a1012012 100644
--- a/drivers/iio/pressure/st_pressure_i2c.c
+++ b/drivers/iio/pressure/st_pressure_i2c.c
@@ -94,9 +94,8 @@ static int st_press_i2c_probe(struct i2c_client *client,
if ((ret < 0) || (ret >= ST_PRESS_MAX))
return -ENODEV;
 
-   strncpy(client->name, st_press_id_table[ret].name,
+   strlcpy(client->name, st_press_id_table[ret].name,
sizeof(client->name));
-   client->name[sizeof(client->name) - 1] = '\0';
} else if (!id)
return -ENODEV;
 
-- 
2.17.1



[PATCH 08/18] myricom: change strncpy+truncation to strlcpy

2018-07-12 Thread Dominique Martinet
Generated by scripts/coccinelle/misc/strncpy_truncation.cocci

Signed-off-by: Dominique Martinet 
---

Please see https://marc.info/?l=linux-kernel&m=153144450722324&w=2 (the
first patch of the series) for the motivation behind this patch

 drivers/net/ethernet/myricom/myri10ge/myri10ge.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
index b2d2ec8c11e2..f7178cdb6bd8 100644
--- a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
+++ b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
@@ -553,8 +553,7 @@ myri10ge_validate_firmware(struct myri10ge_priv *mgp,
}
 
/* save firmware version for ethtool */
-   strncpy(mgp->fw_version, hdr->version, sizeof(mgp->fw_version));
-   mgp->fw_version[sizeof(mgp->fw_version) - 1] = '\0';
+   strlcpy(mgp->fw_version, hdr->version, sizeof(mgp->fw_version));
 
sscanf(mgp->fw_version, "%d.%d.%d", &mgp->fw_ver_major,
   &mgp->fw_ver_minor, &mgp->fw_ver_tiny);
-- 
2.17.1






Re: [bug] kpti, perf_event, bts: sporadic truncated trace

2018-07-12 Thread Hugh Dickins
On Thu, 12 Jul 2018, Metzger, Markus T wrote:

> Hello,
> 
> Starting with 4.15 I noticed that BTS is sporadically missing the tail
> of the trace in the perf_event data buffer.  It shows as
> 
> [decode error (1): instruction overflow]
> 
> in GDB.  Chances to see this are higher the longer the debuggee is
> running.  With this [1] tiny patch to one of GDB's tests, I am able to
> reproduce it reliably on my box.  To run the test, use:
> 
> $ make -s check RUNTESTFLAGS="gdb.btrace/exception.exp"
> 
> from the gdb/ sub-directory in the GDB build directory.
> 
> The issue remains when I use 'nopti' on the kernel command-line.
> 
> 
> Bisecting yielded commit
> 
> c1961a4 x86/events/intel/ds: Map debug buffers in cpu_entry_area
> 
> I reverted the commit on top of v4.17 [2] and the issue disappears
> when I use 'nopti' on the kernel command-line.
> 
> regards,
> markus.
> 
> 
> [1]
> diff --git a/gdb/testsuite/gdb.btrace/exception.exp b/gdb/testsuite/gdb.btrace/exception.exp
> index 9408d61..a24ddd3 100755
> --- a/gdb/testsuite/gdb.btrace/exception.exp
> +++ b/gdb/testsuite/gdb.btrace/exception.exp
> @@ -36,16 +36,12 @@ if ![runto_main] {
>  gdb_test_no_output "set record function-call-history-size 0"
>  
>  # set bp
> -set bp_1 [gdb_get_line_number "bp.1" $srcfile]
>  set bp_2 [gdb_get_line_number "bp.2" $srcfile]
> -gdb_breakpoint $bp_1
>  gdb_breakpoint $bp_2
>  
> -# trace the code between the two breakpoints
> -gdb_continue_to_breakpoint "cont to bp.1" ".*$srcfile:$bp_1\r\n.*"
>  # increase the BTS buffer size - the trace can be quite big
> -gdb_test_no_output "set record btrace bts buffer-size 128000"
> -gdb_test_no_output "record btrace"
> +gdb_test_no_output "set record btrace bts buffer-size 1024000"
> +gdb_test_no_output "record btrace bts"
>  gdb_continue_to_breakpoint "cont to bp.2" ".*$srcfile:$bp_2\r\n.*"
>  
>  # show the flat branch trace
> 
> 
> [2]
> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
[ snipped the revert ]

Although my name was kept on that commit as a generous courtesy, it
did change a lot after leaving my fingers - and I was never the best
person to be making perf changes in the first place!

I'm sorry to hear that it's breaking you, I've spent a little while
looking through its final state, most of it looks fine to me, but I
notice one discrepancy: whose effect I cannot predict at all, but
there's a chance that it has something to do with what you're seeing.

A little "optimization" crept into alloc_bts_buffer() along the way,
which now places bts_interrupt_threshold not on a record boundary.
And Stephane has shown me the sentence in Vol 3B, 17.4.9, which says
"This address must point to an offset from the BTS buffer base that
is a multiple of the BTS record size."

Please give the patch below a try, and let us know if it helps (if it
does not, then I think we'll need perfier expertise than I can give).

Hugh

--- 4.18-rc4/arch/x86/events/intel/ds.c 2018-06-03 14:15:21.0 -0700
+++ linux/arch/x86/events/intel/ds.c	2018-07-12 17:38:28.471378616 -0700
@@ -408,9 +408,11 @@ static int alloc_bts_buffer(int cpu)
ds->bts_buffer_base = (unsigned long) cea;
ds_update_cea(cea, buffer, BTS_BUFFER_SIZE, PAGE_KERNEL);
ds->bts_index = ds->bts_buffer_base;
-   max = BTS_RECORD_SIZE * (BTS_BUFFER_SIZE / BTS_RECORD_SIZE);
-   ds->bts_absolute_maximum = ds->bts_buffer_base + max;
-   ds->bts_interrupt_threshold = ds->bts_absolute_maximum - (max / 16);
+   max = BTS_BUFFER_SIZE / BTS_RECORD_SIZE;
+   ds->bts_absolute_maximum = ds->bts_buffer_base +
+   max * BTS_RECORD_SIZE;
+   ds->bts_interrupt_threshold = ds->bts_absolute_maximum -
+   (max / 16) * BTS_RECORD_SIZE;
return 0;
 }
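
The arithmetic in the patch above can be checked in isolation: keeping `max` as a count of whole records (rather than a byte size) guarantees that both the absolute maximum and the interrupt threshold sit at exact record-size multiples from the buffer base, which is what the SDM requires. A small sketch (the sizes here are illustrative, not the kernel's actual constants):

```c
#include <assert.h>

/* Illustrative constants: the real BTS record is three 8-byte fields
 * and the kernel's buffer size differs; only the alignment property
 * being demonstrated matters here. */
#define BTS_RECORD_SIZE 24UL
#define BTS_BUFFER_SIZE (1UL << 16)

struct ds_thresholds {
	unsigned long base;
	unsigned long absolute_maximum;
	unsigned long interrupt_threshold;
};

static void set_bts_thresholds(struct ds_thresholds *ds, unsigned long base)
{
	/* number of whole records that fit in the buffer */
	unsigned long max = BTS_BUFFER_SIZE / BTS_RECORD_SIZE;

	ds->base = base;
	ds->absolute_maximum = base + max * BTS_RECORD_SIZE;
	/* leave 1/16th of the records as head-room, still record-aligned */
	ds->interrupt_threshold = ds->absolute_maximum -
		(max / 16) * BTS_RECORD_SIZE;
}
```

The pre-patch code computed the head-room as a fraction of the byte size instead, which is why the threshold could land mid-record.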
 



Re: [V9fs-developer] [PATCH v2 2/6] 9p: Change p9_fid_create calling convention

2018-07-12 Thread jiangyiwen
On 2018/7/12 5:02, Matthew Wilcox wrote:
> Return NULL instead of ERR_PTR when we can't allocate a FID.  The ENOSPC
> return value was getting all the way back to userspace, and that's
> confusing for a userspace program which isn't expecting read() to tell it
> there's no space left on the filesystem.  The best error we can return to
> indicate a temporary failure caused by lack of client resources is ENOMEM.
> 
> Maybe it would be better to sleep until a FID is available, but that's
> not a change I'm comfortable making.
> 
> Signed-off-by: Matthew Wilcox 

Reviewed-by: Yiwen Jiang 

> ---
>  net/9p/client.c | 23 +--
>  1 file changed, 9 insertions(+), 14 deletions(-)
> 
> diff --git a/net/9p/client.c b/net/9p/client.c
> index 999eceb8af98..389a2904b7b3 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -913,13 +913,11 @@ static struct p9_fid *p9_fid_create(struct p9_client *clnt)
>   p9_debug(P9_DEBUG_FID, "clnt %p\n", clnt);
>   fid = kmalloc(sizeof(struct p9_fid), GFP_KERNEL);
>   if (!fid)
> - return ERR_PTR(-ENOMEM);
> + return NULL;
>  
>   ret = p9_idpool_get(clnt->fidpool);
> - if (ret < 0) {
> - ret = -ENOSPC;
> + if (ret < 0)
>   goto error;
> - }
>   fid->fid = ret;
>  
>   memset(&fid->qid, 0, sizeof(struct p9_qid));
> @@ -935,7 +933,7 @@ static struct p9_fid *p9_fid_create(struct p9_client *clnt)
>  
>  error:
>   kfree(fid);
> - return ERR_PTR(ret);
> + return NULL;
>  }
>  
>  static void p9_fid_destroy(struct p9_fid *fid)
> @@ -1137,9 +1135,8 @@ struct p9_fid *p9_client_attach(struct p9_client *clnt, struct p9_fid *afid,
>   p9_debug(P9_DEBUG_9P, ">>> TATTACH afid %d uname %s aname %s\n",
>afid ? afid->fid : -1, uname, aname);
>   fid = p9_fid_create(clnt);
> - if (IS_ERR(fid)) {
> - err = PTR_ERR(fid);
> - fid = NULL;
> + if (!fid) {
> + err = -ENOMEM;
>   goto error;
>   }
>   fid->uid = n_uname;
> @@ -1188,9 +1185,8 @@ struct p9_fid *p9_client_walk(struct p9_fid *oldfid, uint16_t nwname,
>   clnt = oldfid->clnt;
>   if (clone) {
>   fid = p9_fid_create(clnt);
> - if (IS_ERR(fid)) {
> - err = PTR_ERR(fid);
> - fid = NULL;
> + if (!fid) {
> + err = -ENOMEM;
>   goto error;
>   }
>  
> @@ -2018,9 +2014,8 @@ struct p9_fid *p9_client_xattrwalk(struct p9_fid *file_fid,
>   err = 0;
>   clnt = file_fid->clnt;
>   attr_fid = p9_fid_create(clnt);
> - if (IS_ERR(attr_fid)) {
> - err = PTR_ERR(attr_fid);
> - attr_fid = NULL;
> + if (!attr_fid) {
> + err = -ENOMEM;
>   goto error;
>   }
>   p9_debug(P9_DEBUG_9P,
> 
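
The calling-convention change being reviewed above boils down to: instead of encoding an errno into the returned pointer with ERR_PTR() and making every caller unwrap it with IS_ERR()/PTR_ERR(), the constructor returns NULL on any failure and callers map that to a single temporary error. A self-contained sketch (fid_create(), attach(), and MAX_FIDS are illustrative stand-ins, not the 9p client's real API):

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MAX_FIDS 2	/* tiny pool so exhaustion is easy to hit */

struct fid { int id; };

static int next_fid;

/* New-style constructor: NULL on any failure (allocation or pool
 * exhaustion), no error code smuggled inside the pointer. */
static struct fid *fid_create(void)
{
	struct fid *fid;

	if (next_fid >= MAX_FIDS)	/* pool exhausted */
		return NULL;
	fid = malloc(sizeof(*fid));	/* leaked here; a sketch only */
	if (!fid)
		return NULL;
	fid->id = next_fid++;
	return fid;
}

/* Caller maps every failure to one temporary error, as the patch
 * does with -ENOMEM, rather than leaking -ENOSPC to userspace. */
static int attach(struct fid **out)
{
	*out = fid_create();
	return *out ? 0 : -ENOMEM;
}
```

The design point is that -ENOSPC has a filesystem-level meaning to userspace ("no space left on device"), so a transient client-resource shortage should surface as -ENOMEM instead.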


