[PATCH] usb: dwc3: gadget: Fix PCM1 for ISOC EP with ep->mult less than 3

2017-12-05 Thread Manu Gautam
For isochronous endpoints with ep->mult less than 3, the PCM1 value of
trb->size is set incorrectly.
For ep->mult = 2 it ends up as 0 or -1, and for ep->mult = 1 it ends up
as -2. This is because the initial mult is set to ep->mult - 1
instead of 2.

Signed-off-by: Manu Gautam 
---
 drivers/usb/dwc3/gadget.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index f064f15..17013c1 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -920,7 +920,7 @@ static void __dwc3_prepare_one_trb(struct dwc3_ep *dep, 
struct dwc3_trb *trb,
 */
if (speed == USB_SPEED_HIGH) {
		struct usb_ep *ep = &dep->endpoint;
-   unsigned int mult = ep->mult - 1;
+   unsigned int mult = 2;
unsigned int maxp = usb_endpoint_maxp(ep->desc);
 
if (length <= (2 * maxp))
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
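
For reference, a small standalone sketch (illustrative only, not the driver
code; length and maxp stand in for the request length and wMaxPacketSize) of
why starting the count at 2 gives the expected PCM1 values of 0/1/2 for
one/two/three packets per microframe, while starting at ep->mult - 1
underflows for short transfers as described above:

/* pcm1_sketch.c - illustrative only */
#include <stdio.h>

/* PCM1 encodes "packets per microframe minus one" (0, 1 or 2). */
static unsigned int pcm1_for(unsigned int length, unsigned int maxp)
{
	unsigned int mult = 2;		/* assume up to 3 packets */

	if (length <= (2 * maxp))
		mult--;			/* at most 2 packets needed */
	if (length <= maxp)
		mult--;			/* a single packet is enough */

	return mult;
}

int main(void)
{
	/* with maxp = 1024: 512 bytes -> 0, 2048 -> 1, 3072 -> 2 */
	printf("%u %u %u\n", pcm1_for(512, 1024),
	       pcm1_for(2048, 1024), pcm1_for(3072, 1024));
	return 0;
}

With ep->mult == 1 the old starting value would be 0, and the two decrements
for a short packet would wrap it to (unsigned)-2, matching the wrong values
noted in the commit message.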



Re: [PATCH] usb: dwc2: add optional usb ecc reset bit

2017-12-05 Thread John Youn
On 11/01/2017 08:35 AM, Dinh Nguyen wrote:
> The dwc2 USB controller in Stratix10 has an additional ECC reset bit that
> needs to get de-asserted in order for the controller to work properly.
>
> Signed-off-by: Dinh Nguyen 
> ---
>  drivers/usb/dwc2/core.h |  1 +
>  drivers/usb/dwc2/platform.c | 10 ++
>  2 files changed, 11 insertions(+)
>
> diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h
> index 8367d4f9..a4b5f4e 100644
> --- a/drivers/usb/dwc2/core.h
> +++ b/drivers/usb/dwc2/core.h
> @@ -920,6 +920,7 @@ struct dwc2_hsotg {
>   int irq;
>   struct clk *clk;
>   struct reset_control *reset;
> + struct reset_control *reset_ecc;
>
>   unsigned int queuing_high_bandwidth:1;
>   unsigned int srp_success:1;
> diff --git a/drivers/usb/dwc2/platform.c b/drivers/usb/dwc2/platform.c
> index daf0d37..d466e03 100644
> --- a/drivers/usb/dwc2/platform.c
> +++ b/drivers/usb/dwc2/platform.c
> @@ -220,6 +220,15 @@ static int dwc2_lowlevel_hw_init(struct dwc2_hsotg 
> *hsotg)
>
>   reset_control_deassert(hsotg->reset);
>
> + hsotg->reset_ecc = devm_reset_control_get_optional(hsotg->dev, 
> "dwc2-ecc");
> + if (IS_ERR(hsotg->reset_ecc)) {
> + ret = PTR_ERR(hsotg->reset_ecc);
> + dev_err(hsotg->dev, "error getting reset control for ecc %d\n", 
> ret);
> + return ret;
> + }
> +
> + reset_control_deassert(hsotg->reset_ecc);
> +
>   /* Set default UTMI width */
>   hsotg->phyif = GUSBCFG_PHYIF16;
>
> @@ -318,6 +327,7 @@ static int dwc2_driver_remove(struct platform_device *dev)
>   dwc2_lowlevel_hw_disable(hsotg);
>
>   reset_control_assert(hsotg->reset);
> + reset_control_assert(hsotg->reset_ecc);
>
>   return 0;
>  }
>

Acked-by: John Youn 

John
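
As a side note, a minimal sketch of the optional-reset pattern used in the
hunk above (illustrative only; it relies on the reset framework treating a
NULL control, which the *_optional getter returns when the DT has no
matching entry, as a no-op):

	/* Illustrative sketch, not the full probe path. */
	rst = devm_reset_control_get_optional(dev, "dwc2-ecc");
	if (IS_ERR(rst))
		return PTR_ERR(rst);	/* e.g. -EPROBE_DEFER */

	reset_control_deassert(rst);	/* no-op when rst == NULL (reset absent) */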


[PATCH] usb: xhci: fix TDS for MTK xHCI1.1

2017-12-05 Thread Chunfeng Yun
For MTK's xHCI 1.0 or later, TD size is the number of max
packet sized packets remaining in the TD, not including
this TRB (following the spec).

For MTK's xHCI 0.96 and older, TD size is the number of max
packet sized packets remaining in the TD, including this TRB
(not following the spec).

Signed-off-by: Chunfeng Yun 
---
 drivers/usb/host/xhci-ring.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index c239c68..0619869 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -3108,7 +3108,7 @@ static u32 xhci_td_remainder(struct xhci_hcd *xhci, int 
transferred,
 {
u32 maxp, total_packet_count;
 
-   /* MTK xHCI is mostly 0.97 but contains some features from 1.0 */
+   /* MTK xHCI 0.96 contains some features from 1.0 */
if (xhci->hci_version < 0x100 && !(xhci->quirks & XHCI_MTK_HOST))
return ((td_total_len - transferred) >> 10);
 
@@ -3117,8 +3117,8 @@ static u32 xhci_td_remainder(struct xhci_hcd *xhci, int 
transferred,
trb_buff_len == td_total_len)
return 0;
 
-   /* for MTK xHCI, TD size doesn't include this TRB */
-   if (xhci->quirks & XHCI_MTK_HOST)
+   /* for MTK xHCI 0.96, TD size includes this TRB, but not in 1.x */
+   if ((xhci->quirks & XHCI_MTK_HOST) && (xhci->hci_version < 0x100))
trb_buff_len = 0;
 
	maxp = usb_endpoint_maxp(&urb->ep->desc);
-- 
1.9.1
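
For illustration, a sketch (not the xhci-ring.c code; the helper and
parameter names below are made up) of the TD-size rule the patch describes:
TD size is the count of wMaxPacketSize packets still left in the TD, and on
MTK 0.96 hosts the current TRB is not subtracted from that count:

/* Illustrative only: "TD size" = packets remaining in the TD. */
static unsigned int td_size_left(unsigned int td_total_len,
				 unsigned int transferred,
				 unsigned int trb_buff_len,
				 unsigned int maxp, int mtk_096)
{
	/* 0.96 counts this TRB as still remaining, 1.x does not */
	unsigned int done = transferred + (mtk_096 ? 0 : trb_buff_len);
	unsigned int total_packets = (td_total_len + maxp - 1) / maxp;

	return total_packets - done / maxp;
}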



Re: [PATCH] usb: core: Fix logging messages with spurious periods after newlines

2017-12-05 Thread Joe Perches
On Wed, 2017-12-06 at 07:27 +0100, Greg Kroah-Hartman wrote:
> On Tue, Dec 05, 2017 at 10:22:05PM -0800, Joe Perches wrote:
> > Using a period after a newline causes bad output.
> 
> Nice catch, how did you find that?

$ git grep '\\n\."'



Re: [PATCH] usb: core: Fix logging messages with spurious periods after newlines

2017-12-05 Thread Greg Kroah-Hartman
On Tue, Dec 05, 2017 at 10:22:05PM -0800, Joe Perches wrote:
> Using a period after a newline causes bad output.

Nice catch, how did you find that?

thanks,

greg k-h


Re: usb device implemented with functionfs - must app run as root?

2017-12-05 Thread Greg KH
On Tue, Dec 05, 2017 at 10:09:35PM +, andy_purc...@keysight.com wrote:
> I have implemented a USB device using functionfs.
> A colleague now says our app must run as a normal user, not as root.
> 
> I tried it and it does not work. 
> The problem is this - the endpoint files created by the OS are owned by root. 
> These ep files are created after I write the descriptors and strings to the 
> /dev/usbffs/ep0 file. 
> 
> $ ls -l /dev/usbffs/
> total 0
> -rw-rw-rw- 1 xyzuser xyzgrp 0 Dec  5 21:36 ep0
> -rw------- 1 root    root   0 Dec  5 21:39 ep1
> -rw------- 1 root    root   0 Dec  5 21:39 ep2
> -rw------- 1 root    root   0 Dec  5 21:39 ep3
> 
> A normal user-space app cannot open, write, read, these ep files.
> 
> Is there a remedy for this?

Write a udev rule to change the owners of those files :)

You must have done that already for the ep0 file, right?

thanks,

greg k-h


[PATCH] usb: core: Fix logging messages with spurious periods after newlines

2017-12-05 Thread Joe Perches
Using a period after a newline causes bad output.
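
For illustration (the device name in the log output is made up), the
misplaced period ends up on its own line in the kernel log:

	/* before: the '.' after the '\n' starts a new, junk log line */
	dev_err(&udev->dev, "Failed to disable LTM before suspend\n.");
	/* log (roughly):
	 *   usb 1-1: Failed to disable LTM before suspend
	 *   .
	 */

	/* after: drop the stray period, keep the trailing newline */
	dev_err(&udev->dev, "Failed to disable LTM before suspend\n");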

Miscellanea:

o Coalesce formats too

Signed-off-by: Joe Perches 
---
 drivers/usb/core/driver.c  |  8 
 drivers/usb/core/hub.c | 17 +++--
 drivers/usb/core/message.c |  6 +++---
 3 files changed, 14 insertions(+), 17 deletions(-)

diff --git a/drivers/usb/core/driver.c b/drivers/usb/core/driver.c
index 64262a9a8829..d8d7377b5fb8 100644
--- a/drivers/usb/core/driver.c
+++ b/drivers/usb/core/driver.c
@@ -342,8 +342,8 @@ static int usb_probe_interface(struct device *dev)
if (driver->disable_hub_initiated_lpm) {
lpm_disable_error = usb_unlocked_disable_lpm(udev);
if (lpm_disable_error) {
-   dev_err(>dev, "%s Failed to disable LPM for 
driver %s\n.",
-   __func__, driver->name);
+   dev_err(>dev, "%s Failed to disable LPM for 
driver %s\n",
+   __func__, driver->name);
error = lpm_disable_error;
goto err;
}
@@ -537,8 +537,8 @@ int usb_driver_claim_interface(struct usb_driver *driver,
if (driver->disable_hub_initiated_lpm) {
lpm_disable_error = usb_unlocked_disable_lpm(udev);
if (lpm_disable_error) {
-   dev_err(>dev, "%s Failed to disable LPM for 
driver %s\n.",
-   __func__, driver->name);
+   dev_err(>dev, "%s Failed to disable LPM for 
driver %s\n",
+   __func__, driver->name);
return -ENOMEM;
}
}
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index cf7bbcb9a63c..b95855076f43 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -1049,12 +1049,10 @@ static void hub_activate(struct usb_hub *hub, enum 
hub_activation_type type)
ret = hcd->driver->update_hub_device(hcd, hdev,
>tt, GFP_NOIO);
if (ret < 0) {
-   dev_err(hub->intfdev, "Host not "
-   "accepting hub info "
-   "update.\n");
-   dev_err(hub->intfdev, "LS/FS devices "
-   "and hubs may not work "
-   "under this hub\n.");
+   dev_err(hub->intfdev,
+   "Host not accepting hub info 
update\n");
+   dev_err(hub->intfdev,
+   "LS/FS devices and hubs may not 
work under this hub\n");
}
}
hub_power_on(hub, true);
@@ -3157,7 +3155,7 @@ int usb_port_suspend(struct usb_device *udev, 
pm_message_t msg)
usb_set_usb2_hardware_lpm(udev, 0);
 
if (usb_disable_ltm(udev)) {
-   dev_err(>dev, "Failed to disable LTM before suspend\n.");
+   dev_err(>dev, "Failed to disable LTM before suspend\n");
status = -ENOMEM;
if (PMSG_IS_AUTO(msg))
goto err_ltm;
@@ -5484,13 +5482,12 @@ static int usb_reset_and_verify_device(struct 
usb_device *udev)
 */
ret = usb_unlocked_disable_lpm(udev);
if (ret) {
-   dev_err(>dev, "%s Failed to disable LPM\n.", __func__);
+   dev_err(>dev, "%s Failed to disable LPM\n", __func__);
goto re_enumerate_no_bos;
}
ret = usb_disable_ltm(udev);
if (ret) {
-   dev_err(>dev, "%s Failed to disable LTM\n.",
-   __func__);
+   dev_err(>dev, "%s Failed to disable LTM\n", __func__);
goto re_enumerate_no_bos;
}
 
diff --git a/drivers/usb/core/message.c b/drivers/usb/core/message.c
index f836bae1e485..dac4adeec213 100644
--- a/drivers/usb/core/message.c
+++ b/drivers/usb/core/message.c
@@ -1356,7 +1356,7 @@ int usb_set_interface(struct usb_device *dev, int 
interface, int alternate)
 * so that the xHCI driver can recalculate the U1/U2 timeouts.
 */
if (usb_disable_lpm(dev)) {
-   dev_err(>dev, "%s Failed to disable LPM\n.", __func__);
+   dev_err(>dev, "%s Failed to disable LPM\n", __func__);
mutex_unlock(hcd->bandwidth_mutex);
return -ENOMEM;
}
@@ -1500,7 +1500,7 @@ int usb_reset_configuration(struct usb_device *dev)
 * that the xHCI driver can recalculate the U1/U2 timeouts.
 */
if (usb_disable_lpm(dev)) {
-   dev_err(>dev, "%s Failed to 

Re: [PATCH] usb: dwc2: Fix TxFIFOn sizes and total TxFIFO size issues

2017-12-05 Thread John Youn
On 11/30/2017 12:16 AM, Minas Harutyunyan wrote:
> In host mode, reading from DPTXSIZn returns an invalid value in the
> dwc2_check_param_tx_fifo_sizes() function.
>
> The total TxFIFO size calculation unnecessarily reduces the total by
> ep_info; hw->total_fifo_size can be fully allocated for FIFOs.
>
> Added num_dev_in_eps member in dwc2_hw_params structure to save number
> of IN EPs.
>
> Added g_tx_fifo_size array in dwc2_hw_params structure to store power
> on reset values of DPTXSIZn registers in forced device mode.
>
> Updated dwc2_hsotg_tx_fifo_count() function to get TxFIFO count from
> num_dev_in_eps.
>
> Updated dwc2_get_dev_hwparams() function to store DPTXFSIZn in
> g_tx_fifo_size array.
>
> The dwc2_get_host/dev_hwparams() function calls were moved to after
> num_dev_in_eps is set from hwcfg4.
>
> Modified dwc2_check_param_tx_fifo_sizes() function to check TxFIFOn
> sizes based on g_tx_fifo_size array.
>
> Removed the ep_info subtraction from the tx_addr_max calculation in the
> dwc2_hsotg_tx_fifo_total_depth() function. Also removed the
> dwc2_hsotg_ep_info_size() function as it is no longer needed.
>
> Signed-off-by: Gevorg Sahakyan 
> Signed-off-by: Minas Harutyunyan 
> ---
>  drivers/usb/dwc2/core.h   |  4 
>  drivers/usb/dwc2/gadget.c | 42 ++
>  drivers/usb/dwc2/params.c | 29 +++--
>  3 files changed, 25 insertions(+), 50 deletions(-)
>
> diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h
> index 8367d4f985c1..686e9b5527dd 100644
> --- a/drivers/usb/dwc2/core.h
> +++ b/drivers/usb/dwc2/core.h
> @@ -532,6 +532,7 @@ struct dwc2_core_params {
>   *   2 - Internal DMA
>   * @power_optimized Are power optimizations enabled?
>   * @num_dev_ep  Number of device endpoints available
> + * @num_dev_in_eps  Number of device IN endpoints available
>   * @num_dev_perio_in_ep Number of device periodic IN endpoints
>   *  available
>   * @dev_token_q_depth   Device Mode IN Token Sequence Learning Queue
> @@ -560,6 +561,7 @@ struct dwc2_core_params {
>   *   2 - 8 or 16 bits
>   * @snpsid: Value from SNPSID register
>   * @dev_ep_dirs:Direction of device endpoints (GHWCFG1)
> + * @g_tx_fifo_size[] Power-on values of TxFIFO sizes
>   */
>  struct dwc2_hw_params {
>   unsigned op_mode:3;
> @@ -581,12 +583,14 @@ struct dwc2_hw_params {
>   unsigned fs_phy_type:2;
>   unsigned i2c_enable:1;
>   unsigned num_dev_ep:4;
> + unsigned num_dev_in_eps : 4;
>   unsigned num_dev_perio_in_ep:4;
>   unsigned total_fifo_size:16;
>   unsigned power_optimized:1;
>   unsigned utmi_phy_data_width:2;
>   u32 snpsid;
>   u32 dev_ep_dirs;
> + u32 g_tx_fifo_size[MAX_EPS_CHANNELS];
>  };
>
>  /* Size of control and EP0 buffers */
> diff --git a/drivers/usb/dwc2/gadget.c b/drivers/usb/dwc2/gadget.c
> index 0d8e09ccb59c..55950baae437 100644
> --- a/drivers/usb/dwc2/gadget.c
> +++ b/drivers/usb/dwc2/gadget.c
> @@ -198,55 +198,18 @@ int dwc2_hsotg_tx_fifo_count(struct dwc2_hsotg *hsotg)
>  {
>   if (hsotg->hw_params.en_multiple_tx_fifo)
>   /* In dedicated FIFO mode we need count of IN EPs */
> - return (dwc2_readl(hsotg->regs + GHWCFG4)  &
> - GHWCFG4_NUM_IN_EPS_MASK) >> GHWCFG4_NUM_IN_EPS_SHIFT;
> + return hsotg->hw_params.num_dev_in_eps;
>   else
>   /* In shared FIFO mode we need count of Periodic IN EPs */
>   return hsotg->hw_params.num_dev_perio_in_ep;
>  }
>
> -/**
> - * dwc2_hsotg_ep_info_size - return Endpoint Info Control block size in 
> DWORDs
> - */
> -static int dwc2_hsotg_ep_info_size(struct dwc2_hsotg *hsotg)
> -{
> - int val = 0;
> - int i;
> - u32 ep_dirs;
> -
> - /*
> -  * Don't need additional space for ep info control registers in
> -  * slave mode.
> -  */
> - if (!using_dma(hsotg)) {
> - dev_dbg(hsotg->dev, "Buffer DMA ep info size 0\n");
> - return 0;
> - }
> -
> - /*
> -  * Buffer DMA mode - 1 location per endpoit
> -  * Descriptor DMA mode - 4 locations per endpoint
> -  */
> - ep_dirs = hsotg->hw_params.dev_ep_dirs;
> -
> - for (i = 0; i <= hsotg->hw_params.num_dev_ep; i++) {
> - val += ep_dirs & 3 ? 1 : 2;
> - ep_dirs >>= 2;
> - }
> -
> - if (using_desc_dma(hsotg))
> - val = val * 4;
> -
> - return val;
> -}
> -
>  /**
>   * dwc2_hsotg_tx_fifo_total_depth - return total FIFO depth available for
>   * device mode TX FIFOs
>   */
>  int dwc2_hsotg_tx_fifo_total_depth(struct dwc2_hsotg *hsotg)
>  {
> - int ep_info_size;
>   int addr;
>   int tx_addr_max;
>   u32 np_tx_fifo_size;
> @@ -255,8 +218,7 @@ int dwc2_hsotg_tx_fifo_total_depth(struct dwc2_hsotg 
> *hsotg)
>   hsotg->params.g_np_tx_fifo_size);
>
>   /* Get Endpoint Info 
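
For illustration, a sketch (assuming the DPTXFSIZN() register macro and
FIFOSIZE_DEPTH_GET() helper from the dwc2 headers; this is not the actual
patch hunk) of the idea of snapshotting the power-on TxFIFO depths per
device IN endpoint so later size checks can be made against them:

/* Sketch only: record power-on TxFIFO depths for each device IN endpoint. */
static void sketch_store_tx_fifo_sizes(struct dwc2_hsotg *hsotg)
{
	struct dwc2_hw_params *hw = &hsotg->hw_params;
	int ep;

	for (ep = 1; ep <= hw->num_dev_in_eps; ep++)
		hw->g_tx_fifo_size[ep] = FIFOSIZE_DEPTH_GET(
			dwc2_readl(hsotg->regs + DPTXFSIZN(ep)));
}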

Re: [PATCH v3 1/2] usb: dwc2: host: Don't retry NAKed transactions right away

2017-12-05 Thread John Youn
On 12/05/2017 08:18 AM, Stefan Wahren wrote:
> Hi Felipe,
> Hi John,
>
>
> Am 30.10.2017 um 18:08 schrieb Douglas Anderson:
>> On rk3288-veyron devices on Chrome OS it was found that plugging in an
>> Arduino-based USB device could cause the system to lockup, especially
>> if the CPU Frequency was at one of the slower operating points (like
>> 100 MHz / 200 MHz).
>>
>> Upon tracing, I found that the following was happening:
>> * The USB device (full speed) was connected to a high speed hub and
>>then to the rk3288.  Thus, we were dealing with split transactions,
>>which is all handled in software on dwc2.
>> * Userspace was initiating a BULK IN transfer
>> * When we sent the SSPLIT (to start the split transaction), we got an
>>ACK.  Good.  Then we issued the CSPLIT.
>> * When we sent the CSPLIT, we got back a NAK.  We immediately (from
>>the interrupt handler) started to retry and sent another SSPLIT.
>> * The device kept NAKing our CSPLIT, so we kept ping-ponging between
>>sending a SSPLIT and a CSPLIT, each time sending from the interrupt
>>handler.
>> * The handling of the interrupts (because of the low CPU speed and
>>the inefficiency of the dwc2 interrupt handler) was actually taking
>>_longer_ than it took the other side to send the ACK/NAK.  Thus we
>>were _always_ in the USB interrupt routine.
>> * The fact that USB interrupts were always going off was preventing
>>other things from happening in the system.  This included preventing
>>the system from being able to transition to a higher CPU frequency.
>>
>> As I understand it, there is no requirement to retry super quickly
>> after a NAK, we just have to retry sometime in the future.  Thus one
>> solution to the above is to just add a delay between getting a NAK and
>> retrying the transmission.  If this delay is sufficiently long to get
>> out of the interrupt routine then the rest of the system will be able
>> to make forward progress.  Even a 25 us delay would probably be
>> enough, but we'll be extra conservative and try to delay 1 ms (the
>> exact amount depends on HZ and the accuracy of the jiffy and how close
>> the current jiffy is to ticking, but could be as much as 20 ms or as
>> little as 1 ms).
>>
>> Presumably adding a delay like this could impact the USB throughput,
>> so we only add the delay with repeated NAKs.
>>
>> NOTE: Upon further testing of a pl2303 serial adapter, I found that
>> this fix may help with problems there.  Specifically I found that the
>> pl2303 serial adapters tend to respond with a NAK when they have
>> nothing to say and thus we end with this same sequence.
>>
>> Signed-off-by: Douglas Anderson 
>> Cc: sta...@vger.kernel.org
>> Reviewed-by: Julius Werner 
>> Tested-by: Stefan Wahren 
>> ---
>>
>> Changes in v3:
>> - Add tested-by for Stefan Wahren
>> - Sent to Felipe Balbi as candidate to land this.
>> - Add Cc for stable (it's always been broken so go as far is as easy)
>>
>> Changes in v2:
>> - Address 
>> https://urldefense.proofpoint.com/v2/url?u=http-3A__crosreview.com_737520=DwICaQ=DPL6_X_6JkXFx7AXWqB0tg=U3o8uKoKhWme5_V9D-eeCkB11BFwt4KvWztBgdE9ZpA=Y_xpJ6Ks0XAK5_bQgmeQEvgKThZtPBQJ3cejNCGfEvM=olyPwyYvn_072esVwYxrCduKOKKJPUgc1YHX-CNhM1s=
>>  feedback
>>
>
> does it need a resend?
>

You can add my acked-by:

Acked-by: John Youn 

Regards,
John
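
A purely hypothetical sketch of the idea in the patch above (the field
names and helpers below are invented for illustration and are not dwc2's
actual internals): after a few back-to-back NAKs, defer the next attempt by
about a millisecond instead of re-queuing it straight from the interrupt
handler:

	/* Hypothetical sketch only - these names do not exist in dwc2. */
	if (++chan->consecutive_naks >= 3) {
		/* give the rest of the system time to run; retry ~1 ms later */
		chan->retry_time = jiffies + msecs_to_jiffies(1);
		defer_channel_retry(chan);
	} else {
		retry_channel_now(chan);	/* the old immediate retry */
	}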



Re: [PATCH] usb: dwc3: gadget: Correct ISOC DATA PIDs for short packets

2017-12-05 Thread Manu Gautam
Hi Felipe,


On 7/19/2017 1:16 PM, Felipe Balbi wrote:
> Hi,
>
> Manu Gautam  writes:
>>> Manu Gautam  writes:
> Manu Gautam  writes:
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>> index aea9a5b..b81547d 100644
>> --- a/drivers/usb/dwc3/gadget.c
>> +++ b/drivers/usb/dwc3/gadget.c
>> @@ -854,8 +854,13 @@ static void __dwc3_prepare_one_trb(struct dwc3_ep 
>> *dep, struct dwc3_trb *trb,
>>  trb->ctrl = DWC3_TRBCTL_ISOCHRONOUS_FIRST;
>>  
>>  if (speed == USB_SPEED_HIGH) {
>> -struct usb_ep *ep = >endpoint;
>> -trb->size |= 
>> DWC3_TRB_SIZE_PCM1(ep->mult - 1);
>> +unsigned int maxp = usb_endpoint_maxp(
>> +
>> dep->endpoint.desc);
>> +unsigned int rem = length % maxp;
>> +unsigned int mult = (length / maxp) & 
>> 0x3;
>> +
>> +trb->size |= DWC3_TRB_SIZE_PCM1(
>> +rem ? mult : mult - 1);
> Manu, It seems to me like we shouldn't be relying on req->length. Which
> gadget driver are you using to test this?
 f_uvc function is used.
 In bus analyzer logs there are DATA2, DATA1 PIDs even for a 2K byte TRB
 (also last packet of the video frame are always less than maxpacket size).
>>> Understood, yeah it makes sense, although I think your patch can be
>>> simplified. Seems to me that it should be enough to set PCM1 to
>>> req->length / usb_endpoint_maxp(), no?
>> Still need to take care of two things:
>> 1. Handle the case where req->length is more than 3K (buggy function driver).
>> 2. We don't need to send an extra packet for isoc if length is a multiple of maxp.
>> Hence, the remainder must be checked.
>>
>>> Or, if we want to make use of ep->mult, we could:
>>>
>>> unsigned int mult = ep->mult - 1;

It should be:
mult = 2;
Otherwise this logic works correctly only for 3K transfers. And for short
packets '11' is programmed as PCM1 (as mult becomes negative).
I didn't test the updated patch for other mult values earlier, sorry about that.
Will be sending a fix for this.


>>>
>>> if (req->length < (usb_endpoint_maxp() << 1))
>>> mult--;
>> I think it should be <=
>> E.g. for 2k size only two transfers should take place)
>>
>>
>>> if (req->length < usb_endpoint_maxp())
>>> mult--;
>> <=
>>
>>> trb->size |= DWC3_TRB_SIZE_PCM1(mult);
>>>
>>> how about that?
>>>
>> This also looks fine and I can send the updated patch.
> please do. While doing that, please also add a comment pointing out the
> USB Spec section you took it from and a simplified text of why we need
> it. This way, nobody will dare changing that part of the code without
> checking the spec ;-)
>
> IOW, add something akin to:
>
> /*
>  * USB Specification X.x Section Y states that ""
>  *
>  * IOW, we should satisfy the following cases:
>  *
>  * i) req->length <= wMaxPacketSize
>  *- DATA0
>  *
>  * ii) wMaxPacketSize < req->length <= (2 * wMaxPacketSize)
>  *- DATA0, DATA1
>  *
>  * iii) (2 * maxPayloadSize) < req->length <= (3 * maxPayloadSize)
>  *- DATA2, DATA1, DATA0
>  */
>
> Or something similar to that.
>
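
Putting Felipe's three cases and Manu's '<=' correction together, a sketch
of the resulting PCM1 selection (illustrative; it mirrors the discussion
above rather than quoting a final patch):

	unsigned int maxp = usb_endpoint_maxp(dep->endpoint.desc);
	unsigned int mult = 2;

	if (req->length <= (2 * maxp))	/* cases i) and ii): two packets or fewer */
		mult--;
	if (req->length <= maxp)	/* case i): a single DATA0 packet */
		mult--;

	trb->size |= DWC3_TRB_SIZE_PCM1(mult);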

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: [PATCH v4 72/73] xfs: Convert mru cache to XArray

2017-12-05 Thread Matthew Wilcox
On Tue, Dec 05, 2017 at 08:45:49PM -0800, Matthew Wilcox wrote:
> The dquot code is just going to have to live with the fact that there's
> additional locking going on that it doesn't need.  I'm open to getting
> rid of the irqsafety, but we can't give up the spinlock protection
> without giving up the RCU/lockdep analysis and the ability to move nodes.
> I don't suppose the dquot code can 

Oops, thought I'd finished writing this paragraph.

I don't suppose the dquot code can be restructured to use the xa_lock to
protect, say, qi_dquots?  I suspect not, since you wouldn't know which
of the three xarray locks to use.


Re: [PATCH v4 72/73] xfs: Convert mru cache to XArray

2017-12-05 Thread Matthew Wilcox
On Wed, Dec 06, 2017 at 02:14:56PM +1100, Dave Chinner wrote:
> > The other conversions use the normal API instead of the advanced API, so
> > all of this gets hidden away.  For example, the inode cache does this:
> 
> Ah, OK, that's not obvious from the code changes. :/

Yeah, it's a lot easier to understand (I think!) if you build the
docs in that tree and look at
file:///home/willy/kernel/xarray-3/Documentation/output/core-api/xarray.html
(mutatis mutandi).  I've tried to tell a nice story about how to put
all the pieces together from the normal to the advanced API.

> However, it's probably overkill for XFS. In all the cases, when we
> insert there should be no entry in the tree - the
> radix tree insert error handling code there was simply catching
> "should never happen" cases and handling it without crashing.

I thought it was probably overkill to be using xa_cmpxchg() in the
pag_ici patch.  I didn't want to take away your error handling as part
of the conversion, but I think a rational person implementing it today
would just call xa_store() and not even worry about the return value
except to check it for IS_ERR().

That said, using xa_cmpxchg() in the dquot code looked like the right
thing to do?  Since we'd dropped the qi mutex and the ILOCK, it looks
entirely reasonable for another thread to come in and set up the dquot.
But I'm obviously quite ignorant of the XFS internals, so maybe there's
something else going on that makes this essentially a "can't happen".

> Now that I've looked at this, I have to say that having a return
> value of NULL meaning "success" is quite counter-intuitive. That's
> going to fire my "that looks so wrong" detector every time I look at
> the code and notice it's erroring out on a non-null return value
> that isn't a PTR_ERR case

It's the same convention as cmpxchg().  I think it's triggering your
"looks so wrong" detector because it's fundamentally not the natural
thing to write.  I certainly don't cmpxchg() new entries into an array
and check the result was NULL ;-)
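
A small illustration of the convention being debated (reusing the
xa_cmpxchg() call from the pag_ici conversion quoted earlier in this
thread): the return value is the previous entry, so NULL is the success
case when inserting into an empty slot:

	curr = xa_cmpxchg(&pag->pag_ici_xa, agino, NULL, ip, GFP_NOFS);
	if (IS_ERR(curr))
		return PTR_ERR(curr);	/* e.g. allocation failure */
	if (curr != NULL)
		return -EAGAIN;		/* slot was already populated */
	/* curr == NULL: our entry was stored */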

> Also, there's no need for irqsave/restore() locking contexts here as
> we never access these caches from interrupt contexts. That's just
> going to be extra overhead, especially on workloads that run 10^6
> inodes inodes through the cache every second. That's a problem
> caused by driving the locks into the XA structure and then needing
> to support callers that require irq safety

I'm quite happy to have normal API variants that don't save/restore
interrupts.  Just need to come up with good names ... I don't think
xa_store_noirq() is a good name, but maybe you do?

> > It's the design pattern I've always intended to use.  Naturally, the
> > xfs radix trees weren't my initial target; it was the page cache, and
> > the page cache does the same thing; uses the tree_lock to protect both
> > the radix tree and several other fields in that same data structure.
> > 
> > I'm open to argument on this though ... particularly if you have a better
> > design pattern in mind!
> 
> I don't mind structures having internal locking - I have a problem
> with leaking them into contexts outside the structure they protect.
> That way lies madness - you can't change the internal locking in
> future because of external dependencies, and the moment you need
> something different externally we've got to go back to an external
> lock anyway.
> 
> This is demonstrated by the way you converted the XFS dquot tree -
> you didn't replace the dquot tree lock with the internal xa_lock
> because it's a mutex and we have to sleep holding it. IOWs, we've
> added another layer of locking here, not simplified the code.

I agree the dquot code is no simpler than it was, but it's also no more
complicated from a locking analysis point of view; the xa_lock is just
not providing you with any useful exclusion.

At least, not today.  One of the future plans is to allow xa_nodes to
be allocated from ZONE_MOVABLE.  In order to do that, we have to be
able to tell which lock protects any given node.  With the XArray,
we can find that out (xa_node->root->xa_lock); with the radix tree,
we don't even know what kind of lock protects the tree.

> What I really see here is that  we have inconsistent locking
> patterns w.r.t. XA stores inside XFS - some have an external mutex
> to cover a wider scope, some use xa_lock/xa_unlock to span multiple
> operations, some directly access the internal xa lock via direct
> spin_lock/unlock(...xa_lock) calls and non-locking XA call variants.
> In some places you remove explicit rcu_read_lock() calls because the
> internal xa_lock implies RCU, but in other places we still need them
> because we have to protect the objects the tree points to, not just
> the tree
> 
> IOWs, there's no consistent pattern to the changes you've made to
> the XFS code. The existing radix tree code has clear, consistent
> locking, tagging and lookup patterns. In contrast, each conversion
> to the XA code has resulted in a 

Re: [PATCH v4 72/73] xfs: Convert mru cache to XArray

2017-12-05 Thread Dave Chinner
On Tue, Dec 05, 2017 at 06:02:08PM -0800, Matthew Wilcox wrote:
> On Wed, Dec 06, 2017 at 12:36:48PM +1100, Dave Chinner wrote:
> > > - if (radix_tree_preload(GFP_NOFS))
> > > - return -ENOMEM;
> > > -
> > >   INIT_LIST_HEAD(&elem->list_node);
> > >   elem->key = key;
> > >  
> > > - spin_lock(&mru->lock);
> > > - error = radix_tree_insert(&mru->store, key, elem);
> > > - radix_tree_preload_end();
> > > - if (!error)
> > > - _xfs_mru_cache_list_insert(mru, elem);
> > > - spin_unlock(&mru->lock);
> > > + do {
> > > + xas_lock(&xas);
> > > + xas_store(&xas, elem);
> > > + error = xas_error(&xas);
> > > + if (!error)
> > > + _xfs_mru_cache_list_insert(mru, elem);
> > > + xas_unlock(&xas);
> > > + } while (xas_nomem(&xas, GFP_NOFS));
> > 
> > Ok, so why does this have a retry loop on ENOMEM despite the
> > existing code handling that error? And why put such a loop in this
> > code and not any of the other XFS code that used
> > radix_tree_preload() and is arguably much more important to avoid
> > ENOMEM on insert (e.g. the inode cache)?
> 
> If we need more nodes in the tree, xas_store() will try to allocate them
> with GFP_NOWAIT | __GFP_NOWARN.  If that fails, it signals it in xas_error().
> xas_nomem() will notice that we're in an ENOMEM situation, and allocate
> a node using your preferred GFP flags (NOIO in your case).  Then we retry,
> guaranteeing forward progress. [1]
> 
> The other conversions use the normal API instead of the advanced API, so
> all of this gets hidden away.  For example, the inode cache does this:
> 
> +   curr = xa_cmpxchg(&pag->pag_ici_xa, agino, NULL, ip, GFP_NOFS);
> 
> and xa_cmpxchg internally does:
> 
> do {
> xa_lock_irqsave(xa, flags);
> curr = xas_create(&xas);
> if (curr == old)
> xas_store(&xas, entry);
> xa_unlock_irqrestore(xa, flags);
> } while (xas_nomem(&xas, gfp));

Ah, OK, that's not obvious from the code changes. :/

However, it's probably overkill for XFS. In all the cases, when we
insert there should be no entry in the tree - the
radix tree insert error handling code there was simply catching
"should never happen" cases and handling it without crashing.

Now that I've looked at this, I have to say that having a return
value of NULL meaning "success" is quite counter-intuitive. That's
going to fire my "that looks so wrong" detector every time I look at
the code and notice it's erroring out on a non-null return value
that isn't a PTR_ERR case

Also, there's no need for irqsave/restore() locking contexts here as
we never access these caches from interrupt contexts. That's just
going to be extra overhead, especially on workloads that run 10^6
inodes inodes through the cache every second. That's a problem
caused by driving the locks into the XA structure and then needing
to support callers that require irq safety

> > Also, I really don't like the pattern of using xa_lock()/xa_unlock()
> > to protect access to an external structure. i.e. the mru->lock
> > context is protecting multiple fields and operations in the MRU
> > structure, not just the radix tree operations. Turning that around
> > so that a larger XFS structure and algorithm is now protected by an
> > opaque internal lock from generic storage structure the forms part
> > of the larger structure seems like a bad design pattern to me...
> 
> It's the design pattern I've always intended to use.  Naturally, the
> xfs radix trees weren't my initial target; it was the page cache, and
> the page cache does the same thing; uses the tree_lock to protect both
> the radix tree and several other fields in that same data structure.
> 
> I'm open to argument on this though ... particularly if you have a better
> design pattern in mind!

I don't mind structures having internal locking - I have a problem
with leaking them into contexts outside the structure they protect.
That way lies madness - you can't change the internal locking in
future because of external dependencies, and the moment you need
something different externally we've got to go back to an external
lock anyway.

This is demonstrated by the way you converted the XFS dquot tree -
you didn't replace the dquot tree lock with the internal xa_lock
because it's a mutex and we have to sleep holding it. IOWs, we've
added another layer of locking here, not simplified the code.

What I really see here is that  we have inconsistent locking
patterns w.r.t. XA stores inside XFS - some have an external mutex
to cover a wider scope, some use xa_lock/xa_unlock to span multiple
operations, some directly access the internal xa lock via direct
spin_lock/unlock(...xa_lock) calls and non-locking XA call variants.
In some places you remove explicit rcu_read_lock() calls because the
internal xa_lock implies RCU, but in other places we still need them
because we have to protect the objects the tree points to, not just
the tree

IOWs, there's no 

Re: [PATCH v4 00/73] XArray version 4

2017-12-05 Thread Dave Chinner
On Tue, Dec 05, 2017 at 06:05:15PM -0800, Matthew Wilcox wrote:
> On Wed, Dec 06, 2017 at 12:45:49PM +1100, Dave Chinner wrote:
> > On Tue, Dec 05, 2017 at 04:40:46PM -0800, Matthew Wilcox wrote:
> > > From: Matthew Wilcox 
> > > 
> > > I looked through some notes and decided this was version 4 of the XArray.
> > > Last posted two weeks ago, this version includes a *lot* of changes.
> > > I'd like to thank Dave Chinner for his feedback, encouragement and
> > > distracting ideas for improvement, which I'll get to once this is merged.
> > 
> > BTW, you need to fix the "To:" line on your patchbombs:
> > 
> > > To: unlisted-recipients: ;, no To-header on input 
> > > <@gmail-pop.l.google.com> 
> > 
> > This bad email address getting quoted to the cc line makes some MTAs
> > very unhappy.
> 
> I know :-(  I was unhappy when I realised what I'd done.
> 
https://marc.info/?l=git&m=151252237912266&w=2
> 
> > I'll give this a quick burn this afternoon and see what catches fire...
> 
> All of the things ... 0day gave me a 90% chance of hanging in one
> configuration.  Need to drill down on it more and find out what stupid
> thing I've done wrong this time.

Yup, Bad Stuff happened on boot:

[   24.548039] INFO: rcu_preempt detected stalls on CPUs/tasks:
[   24.548978]  1-...!: (0 ticks this GP) idle=688/0/0 softirq=143/143 fqs=0
[   24.549926]  5-...!: (0 ticks this GP) idle=db8/0/0 softirq=120/120 fqs=0
[   24.550864]  6-...!: (0 ticks this GP) idle=d58/0/0 softirq=111/111 fqs=0
[   24.551802]  8-...!: (5 GPs behind) idle=514/0/0 softirq=189/189 fqs=0
[   24.552722]  10-...!: (84 GPs behind) idle=ac0/0/0 softirq=80/80 fqs=0
[   24.553617]  11-...!: (8 GPs behind) idle=cfc/0/0 softirq=95/95 fqs=0
[   24.554496]  13-...!: (8 GPs behind) idle=b0c/0/0 softirq=82/82 fqs=0
[   24.555382]  14-...!: (38 GPs behind) idle=a7c/0/0 softirq=93/93 fqs=0
[   24.556305]  15-...!: (4 GPs behind) idle=b18/0/0 softirq=88/88 fqs=0
[   24.557190]  (detected by 9, t=5252 jiffies, g=-178, c=-179, q=994)
[   24.558051] Sending NMI from CPU 9 to CPUs 1:
[   24.558703] NMI backtrace for cpu 1 skipped: idling at 
native_safe_halt+0x2/0x10
[   24.559654] Sending NMI from CPU 9 to CPUs 5:
[   24.559675] NMI backtrace for cpu 5 skipped: idling at 
native_safe_halt+0x2/0x10
[   24.560654] Sending NMI from CPU 9 to CPUs 6:
[   24.560689] NMI backtrace for cpu 6 skipped: idling at 
native_safe_halt+0x2/0x10
[   24.561655] Sending NMI from CPU 9 to CPUs 8:
[   24.561701] NMI backtrace for cpu 8 skipped: idling at 
native_safe_halt+0x2/0x10
[   24.562654] Sending NMI from CPU 9 to CPUs 10:
[   24.562675] NMI backtrace for cpu 10 skipped: idling at 
native_safe_halt+0x2/0x10
[   24.563653] Sending NMI from CPU 9 to CPUs 11:
[   24.563669] NMI backtrace for cpu 11 skipped: idling at 
native_safe_halt+0x2/0x10
[   24.564653] Sending NMI from CPU 9 to CPUs 13:
[   24.564670] NMI backtrace for cpu 13 skipped: idling at 
native_safe_halt+0x2/0x10
[   24.565652] Sending NMI from CPU 9 to CPUs 14:
[   24.565674] NMI backtrace for cpu 14 skipped: idling at 
native_safe_halt+0x2/0x10
[   24.566652] Sending NMI from CPU 9 to CPUs 15:
[   24.59] NMI backtrace for cpu 15 skipped: idling at 
native_safe_halt+0x2/0x10
[   24.567653] rcu_preempt kthread starved for 5256 jiffies! 
g18446744073709551438 c18446744073709551437 f0x0 RCU_GP_WAIT_FQS(3) 
->state=0x402 ->7
[   24.567654] rcu_preempt I15128 9  2 0x8000
[   24.567660] Call Trace:
[   24.567679]  ? __schedule+0x289/0x880
[   24.567681]  schedule+0x2f/0x90
[   24.567682]  schedule_timeout+0x152/0x370
[   24.567686]  ? __next_timer_interrupt+0xc0/0xc0
[   24.567689]  rcu_gp_kthread+0x561/0x880
[   24.567691]  ? force_qs_rnp+0x1a0/0x1a0
[   24.567693]  kthread+0x111/0x130
[   24.567695]  ? __kthread_create_worker+0x120/0x120
[   24.567697]  ret_from_fork+0x24/0x30
[   44.064092] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kswapd0:854]
[   44.065920] CPU: 0 PID: 854 Comm: kswapd0 Not tainted 4.15.0-rc2-dgc #228
[   44.067769] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-1 04/01/2014
[   44.070030] RIP: 0010:smp_call_function_single+0xce/0x100
[   44.071521] RSP: :c90001d2fb20 EFLAGS: 0202 ORIG_RAX: 
ff11
[   44.073592] RAX:  RBX: 88013ab515c8 RCX: c9000350bb20
[   44.075560] RDX: 0001 RSI: c90001d2fb20 RDI: c90001d2fb20
[   44.077531] RBP: c90001d2fb50 R08: 0007 R09: 0080
[   44.079483] R10: c90001d2fb78 R11: c90001d2fb30 R12: c90001d2fc10
[   44.081465] R13: ea000449fc78 R14: ea000449fc58 R15: 88013ba36c40
[   44.083434] FS:  () GS:88013fc0() 
knlGS:
[   44.085683] CS:  0010 DS:  ES:  CR0: 80050033
[   44.087276] CR2: 7f1ad65f2260 CR3: 02009001 CR4: 000606f0
[   44.089228] Call Trace:
[   44.089942]  ? 

Re: [PATCH v4 00/73] XArray version 4

2017-12-05 Thread Matthew Wilcox
On Wed, Dec 06, 2017 at 01:17:52PM +1100, Dave Chinner wrote:
> On Wed, Dec 06, 2017 at 01:53:41AM +, Matthew Wilcox wrote:
> > Huh, you've caught a couple of problems that 0day hasn't sent me yet.  Try 
> > turning on DAX or TRANSPARENT_HUGEPAGE.  Thanks!
> 
> Dax is turned on, CONFIG_TRANSPARENT_HUGEPAGE is not.
> 
> Looks like nothing is setting CONFIG_RADIX_TREE_MULTIORDER, which is
> what xas_set_order() is hidden under.
> 
> Ah, CONFIG_ZONE_DEVICE turns it on, not CONFIG_DAX/CONFIG_FS_DAX.
> 
> H.  That seems wrong if it's used in fs/dax.c...

Yes, it's my mistake for not making xas_set_order available in the
!MULTIORDER case.  I'm working on a fix now.

> What a godawful mess Kconfig has turned into.

I'm not going to argue with that ...


Re: [PATCH v4 00/73] XArray version 4

2017-12-05 Thread Dave Chinner
On Wed, Dec 06, 2017 at 01:53:41AM +, Matthew Wilcox wrote:
> Huh, you've caught a couple of problems that 0day hasn't sent me yet.  Try 
> turning on DAX or TRANSPARENT_HUGEPAGE.  Thanks!

Dax is turned on, CONFIG_TRANSPARENT_HUGEPAGE is not.

Looks like nothing is setting CONFIG_RADIX_TREE_MULTIORDER, which is
what xas_set_order() is hidden under.

Ah, CONFIG_ZONE_DEVICE turns it on, not CONFIG_DAX/CONFIG_FS_DAX.

H.  That seems wrong if it's used in fs/dax.c...

$ grep DAX .config
CONFIG_DAX=y
CONFIG_FS_DAX=y
$ grep ZONE_DEVICE .config
CONFIG_ARCH_HAS_ZONE_DEVICE=y
$

So I have DAX enabled, but not ZONE_DEVICE? Shouldn't DAX be
selecting ZONE_DEVICE, not relying on a user to select both of them
so that stuff works properly? Hmmm - there's no menu option to turn
on zone device, so it's selected by something else?  Oh, HMM turns
on ZONE device. But that is "default y", so should be turned on. But
it's not?  And there's no obvious HMM menu config option, either

What a godawful mess Kconfig has turned into.

I'm just going to enable TRANSPARENT_HUGEPAGE - madness awaits me if
I follow the other path down the rat hole

Ok, it build this time.

-Dave.

> 
> > -Original Message-
> > From: Dave Chinner [mailto:da...@fromorbit.com]
> > Sent: Tuesday, December 5, 2017 8:51 PM
> > To: Matthew Wilcox 
> > Cc: Matthew Wilcox ; Ross Zwisler
> > ; Jens Axboe ; Rehas
> > Sachdeva ; linux...@kvack.org; linux-
> > fsde...@vger.kernel.org; linux-f2fs-de...@lists.sourceforge.net; linux-
> > ni...@vger.kernel.org; linux-bt...@vger.kernel.org; 
> > linux-...@vger.kernel.org;
> > linux-usb@vger.kernel.org; linux-ker...@vger.kernel.org
> > Subject: Re: [PATCH v4 00/73] XArray version 4
> > 
> > On Wed, Dec 06, 2017 at 12:45:49PM +1100, Dave Chinner wrote:
> > > On Tue, Dec 05, 2017 at 04:40:46PM -0800, Matthew Wilcox wrote:
> > > > From: Matthew Wilcox 
> > > >
> > > > I looked through some notes and decided this was version 4 of the 
> > > > XArray.
> > > > Last posted two weeks ago, this version includes a *lot* of changes.
> > > > I'd like to thank Dave Chinner for his feedback, encouragement and
> > > > distracting ideas for improvement, which I'll get to once this is 
> > > > merged.
> > >
> > > BTW, you need to fix the "To:" line on your patchbombs:
> > >
> > > > To: unlisted-recipients: ;, no To-header on input <@gmail-
> > pop.l.google.com>
> > >
> > > This bad email address getting quoted to the cc line makes some MTAs
> > > very unhappy.
> > >
> > > >
> > > > Highlights:
> > > >  - Over 2000 words of documentation in patch 8!  And lots more 
> > > > kernel-doc.
> > > >  - The page cache is now fully converted to the XArray.
> > > >  - Many more tests in the test-suite.
> > > >
> > > > This patch set is not for applying.  0day is still reporting problems,
> > > > and I'd feel bad for eating someone's data.  These patches apply on top
> > > > of a set of prepatory patches which just aren't interesting.  If you
> > > > want to see the patches applied to a tree, I suggest pulling my git 
> > > > tree:
> > > >
> > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgit.infrade
> > ad.org%2Fusers%2Fwilly%2Flinux-
> > dax.git%2Fshortlog%2Frefs%2Fheads%2Fxarray-2017-12-
> > 04=02%7C01%7Cmawilcox%40microsoft.com%7Ca3e721545f8b4b9dff1
> > 608d53c4bd42f%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C6364
> > 81218740341312=IXNZXXLTf964OQ0eLDpJt2LCv%2BGGWFW%2FQd4Kc
> > KYu6zo%3D=0
> > > > I also left out the idr_preload removals.  They're still in the git 
> > > > tree,
> > > > but I'm not looking for feedback on them.
> > >
> > > I'll give this a quick burn this afternoon and see what catches fire...
> > 
> > Build warnings/errors:
> > 
> > .
> > lib/radix-tree.c:700:13: warning: 'radix_tree_free_nodes' defined but not
> > used
> > [-Wunused-function]
> >  static void radix_tree_free_nodes(struct radix_tree_node *node)
> > .
> > lib/xarray.c: In function 'xas_max':
> > lib/xarray.c:291:16: warning: unused variable 'mask'
> > [-Wunused-variable]
> >   unsigned long mask, max = xas->xa_index;
> >   ^~~~
> > ..
> > fs/dax.c: In function 'grab_mapping_entry':
> > fs/dax.c:305:2: error: implicit declaration of function 'xas_set_order';
> > did you
> > mean 'xas_set_err'?  [-Werror=implicit-function-declaration]
> >   xas_set_order(&xas, index, size_flag ? PMD_ORDER : 0);
> > ^
> > scripts/Makefile.build:310: recipe for target 'fs/dax.o' failed
> > make[1]: *** [fs/dax.o] Error 1
> > 
> > -Dave.
> > --
> > Dave Chinner
> > da...@fromorbit.com

-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH v4 00/73] XArray version 4

2017-12-05 Thread Matthew Wilcox
On Wed, Dec 06, 2017 at 12:45:49PM +1100, Dave Chinner wrote:
> On Tue, Dec 05, 2017 at 04:40:46PM -0800, Matthew Wilcox wrote:
> > From: Matthew Wilcox 
> > 
> > I looked through some notes and decided this was version 4 of the XArray.
> > Last posted two weeks ago, this version includes a *lot* of changes.
> > I'd like to thank Dave Chinner for his feedback, encouragement and
> > distracting ideas for improvement, which I'll get to once this is merged.
> 
> BTW, you need to fix the "To:" line on your patchbombs:
> 
> > To: unlisted-recipients: ;, no To-header on input <@gmail-pop.l.google.com> 
> 
> This bad email address getting quoted to the cc line makes some MTAs
> very unhappy.

I know :-(  I was unhappy when I realised what I'd done.

https://marc.info/?l=git&m=151252237912266&w=2

> I'll give this a quick burn this afternoon and see what catches fire...

All of the things ... 0day gave me a 90% chance of hanging in one
configuration.  Need to drill down on it more and find out what stupid
thing I've done wrong this time.


Re: [PATCH v4 72/73] xfs: Convert mru cache to XArray

2017-12-05 Thread Matthew Wilcox
On Wed, Dec 06, 2017 at 12:36:48PM +1100, Dave Chinner wrote:
> > -   if (radix_tree_preload(GFP_NOFS))
> > -   return -ENOMEM;
> > -
> > INIT_LIST_HEAD(&elem->list_node);
> > elem->key = key;
> >  
> > -   spin_lock(&mru->lock);
> > -   error = radix_tree_insert(&mru->store, key, elem);
> > -   radix_tree_preload_end();
> > -   if (!error)
> > -   _xfs_mru_cache_list_insert(mru, elem);
> > -   spin_unlock(&mru->lock);
> > +   do {
> > +   xas_lock(&xas);
> > +   xas_store(&xas, elem);
> > +   error = xas_error(&xas);
> > +   if (!error)
> > +   _xfs_mru_cache_list_insert(mru, elem);
> > +   xas_unlock(&xas);
> > +   } while (xas_nomem(&xas, GFP_NOFS));
> 
> Ok, so why does this have a retry loop on ENOMEM despite the
> existing code handling that error? And why put such a loop in this
> code and not any of the other XFS code that used
> radix_tree_preload() and is arguably much more important to avoid
> ENOMEM on insert (e.g. the inode cache)?

If we need more nodes in the tree, xas_store() will try to allocate them
with GFP_NOWAIT | __GFP_NOWARN.  If that fails, it signals it in xas_error().
xas_nomem() will notice that we're in an ENOMEM situation, and allocate
a node using your preferred GFP flags (NOIO in your case).  Then we retry,
guaranteeing forward progress. [1]

The other conversions use the normal API instead of the advanced API, so
all of this gets hidden away.  For example, the inode cache does this:

+   curr = xa_cmpxchg(&pag->pag_ici_xa, agino, NULL, ip, GFP_NOFS);

and xa_cmpxchg internally does:

do {
xa_lock_irqsave(xa, flags);
curr = xas_create(&xas);
if (curr == old)
xas_store(&xas, entry);
xa_unlock_irqrestore(xa, flags);
} while (xas_nomem(&xas, gfp));


> Also, I really don't like the pattern of using xa_lock()/xa_unlock()
> to protect access to an external structure. i.e. the mru->lock
> context is protecting multiple fields and operations in the MRU
> structure, not just the radix tree operations. Turning that around
> so that a larger XFS structure and algorithm is now protected by an
> opaque internal lock from generic storage structure the forms part
> of the larger structure seems like a bad design pattern to me...

It's the design pattern I've always intended to use.  Naturally, the
xfs radix trees weren't my initial target; it was the page cache, and
the page cache does the same thing; uses the tree_lock to protect both
the radix tree and several other fields in that same data structure.

I'm open to argument on this though ... particularly if you have a better
design pattern in mind!

[1] I actually have this documented!  It's in the xas_nomem() kernel-doc:

 * If we need to add new nodes to the XArray, we try to allocate memory
 * with GFP_NOWAIT while holding the lock, which will usually succeed.
 * If it fails, @xas is flagged as needing memory to continue.  The caller
 * should drop the lock and call xas_nomem().  If xas_nomem() succeeds,
 * the caller should retry the operation.
 *
 * Forward progress is guaranteed as one node is allocated here and
 * stored in the xa_state where it will be found by xas_alloc().  More
 * nodes will likely be found in the slab allocator, but we do not tie
 * them up here.
 *
 * Return: true if memory was needed, and was successfully allocated.



RE: [PATCH v4 00/73] XArray version 4

2017-12-05 Thread Matthew Wilcox
Huh, you've caught a couple of problems that 0day hasn't sent me yet.  Try 
turning on DAX or TRANSPARENT_HUGEPAGE.  Thanks!

> -Original Message-
> From: Dave Chinner [mailto:da...@fromorbit.com]
> Sent: Tuesday, December 5, 2017 8:51 PM
> To: Matthew Wilcox 
> Cc: Matthew Wilcox ; Ross Zwisler
> ; Jens Axboe ; Rehas
> Sachdeva ; linux...@kvack.org; linux-
> fsde...@vger.kernel.org; linux-f2fs-de...@lists.sourceforge.net; linux-
> ni...@vger.kernel.org; linux-bt...@vger.kernel.org; linux-...@vger.kernel.org;
> linux-usb@vger.kernel.org; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH v4 00/73] XArray version 4
> 
> On Wed, Dec 06, 2017 at 12:45:49PM +1100, Dave Chinner wrote:
> > On Tue, Dec 05, 2017 at 04:40:46PM -0800, Matthew Wilcox wrote:
> > > From: Matthew Wilcox 
> > >
> > > I looked through some notes and decided this was version 4 of the XArray.
> > > Last posted two weeks ago, this version includes a *lot* of changes.
> > > I'd like to thank Dave Chinner for his feedback, encouragement and
> > > distracting ideas for improvement, which I'll get to once this is merged.
> >
> > BTW, you need to fix the "To:" line on your patchbombs:
> >
> > > To: unlisted-recipients: ;, no To-header on input <@gmail-
> pop.l.google.com>
> >
> > This bad email address getting quoted to the cc line makes some MTAs
> > very unhappy.
> >
> > >
> > > Highlights:
> > >  - Over 2000 words of documentation in patch 8!  And lots more kernel-doc.
> > >  - The page cache is now fully converted to the XArray.
> > >  - Many more tests in the test-suite.
> > >
> > > This patch set is not for applying.  0day is still reporting problems,
> > > and I'd feel bad for eating someone's data.  These patches apply on top
> > > of a set of prepatory patches which just aren't interesting.  If you
> > > want to see the patches applied to a tree, I suggest pulling my git tree:
> > >
> https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgit.infrade
> ad.org%2Fusers%2Fwilly%2Flinux-
> dax.git%2Fshortlog%2Frefs%2Fheads%2Fxarray-2017-12-
> 04=02%7C01%7Cmawilcox%40microsoft.com%7Ca3e721545f8b4b9dff1
> 608d53c4bd42f%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C6364
> 81218740341312=IXNZXXLTf964OQ0eLDpJt2LCv%2BGGWFW%2FQd4Kc
> KYu6zo%3D=0
> > > I also left out the idr_preload removals.  They're still in the git tree,
> > > but I'm not looking for feedback on them.
> >
> > I'll give this a quick burn this afternoon and see what catches fire...
> 
> Build warnings/errors:
> 
> .
> lib/radix-tree.c:700:13: warning: 'radix_tree_free_nodes' defined but not used
> [-Wunused-function]
>  static void radix_tree_free_nodes(struct radix_tree_node *node)
> .
> lib/xarray.c: In function 'xas_max':
> lib/xarray.c:291:16: warning: unused variable 'mask'
> [-Wunused-variable]
>   unsigned long mask, max = xas->xa_index;
>   ^~~~
> ..
> fs/dax.c: In function 'grab_mapping_entry':
> fs/dax.c:305:2: error: implicit declaration of function 'xas_set_order'; did
> you
> mean 'xas_set_err'?  [-Werror=implicit-function-declaration]
>   xas_set_order(&xas, index, size_flag ? PMD_ORDER : 0);
> ^
> scripts/Makefile.build:310: recipe for target 'fs/dax.o' failed
> make[1]: *** [fs/dax.o] Error 1
> 
> -Dave.
> --
> Dave Chinner
> da...@fromorbit.com


Re: [PATCH v4 00/73] XArray version 4

2017-12-05 Thread Dave Chinner
On Wed, Dec 06, 2017 at 12:45:49PM +1100, Dave Chinner wrote:
> On Tue, Dec 05, 2017 at 04:40:46PM -0800, Matthew Wilcox wrote:
> > From: Matthew Wilcox 
> > 
> > I looked through some notes and decided this was version 4 of the XArray.
> > Last posted two weeks ago, this version includes a *lot* of changes.
> > I'd like to thank Dave Chinner for his feedback, encouragement and
> > distracting ideas for improvement, which I'll get to once this is merged.
> 
> BTW, you need to fix the "To:" line on your patchbombs:
> 
> > To: unlisted-recipients: ;, no To-header on input <@gmail-pop.l.google.com> 
> 
> This bad email address getting quoted to the cc line makes some MTAs
> very unhappy.
> 
> > 
> > Highlights:
> >  - Over 2000 words of documentation in patch 8!  And lots more kernel-doc.
> >  - The page cache is now fully converted to the XArray.
> >  - Many more tests in the test-suite.
> > 
> > This patch set is not for applying.  0day is still reporting problems,
> > and I'd feel bad for eating someone's data.  These patches apply on top
> > of a set of prepatory patches which just aren't interesting.  If you
> > want to see the patches applied to a tree, I suggest pulling my git tree:
> > http://git.infradead.org/users/willy/linux-dax.git/shortlog/refs/heads/xarray-2017-12-04
> > I also left out the idr_preload removals.  They're still in the git tree,
> > but I'm not looking for feedback on them.
> 
> I'll give this a quick burn this afternoon and see what catches fire...

Build warnings/errors:

.
lib/radix-tree.c:700:13: warning: 'radix_tree_free_nodes' defined but not used
[-Wunused-function]
 static void radix_tree_free_nodes(struct radix_tree_node *node)
.
lib/xarray.c: In function 'xas_max':
lib/xarray.c:291:16: warning: unused variable 'mask'
[-Wunused-variable]
  unsigned long mask, max = xas->xa_index;
  ^~~~
..
fs/dax.c: In function 'grab_mapping_entry':
fs/dax.c:305:2: error: implicit declaration of function 'xas_set_order'; did
you mean 'xas_set_err'?  [-Werror=implicit-function-declaration]
  xas_set_order(&xas, index, size_flag ? PMD_ORDER : 0);
^
scripts/Makefile.build:310: recipe for target 'fs/dax.o' failed
make[1]: *** [fs/dax.o] Error 1

-Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH v4 00/73] XArray version 4

2017-12-05 Thread Dave Chinner
On Tue, Dec 05, 2017 at 04:40:46PM -0800, Matthew Wilcox wrote:
> From: Matthew Wilcox 
> 
> I looked through some notes and decided this was version 4 of the XArray.
> Last posted two weeks ago, this version includes a *lot* of changes.
> I'd like to thank Dave Chinner for his feedback, encouragement and
> distracting ideas for improvement, which I'll get to once this is merged.

BTW, you need to fix the "To:" line on your patchbombs:

> To: unlisted-recipients: ;, no To-header on input <@gmail-pop.l.google.com> 

This bad email address getting quoted to the cc line makes some MTAs
very unhappy.

> 
> Highlights:
>  - Over 2000 words of documentation in patch 8!  And lots more kernel-doc.
>  - The page cache is now fully converted to the XArray.
>  - Many more tests in the test-suite.
> 
> This patch set is not for applying.  0day is still reporting problems,
> and I'd feel bad for eating someone's data.  These patches apply on top
> of a set of prepatory patches which just aren't interesting.  If you
> want to see the patches applied to a tree, I suggest pulling my git tree:
> http://git.infradead.org/users/willy/linux-dax.git/shortlog/refs/heads/xarray-2017-12-04
> I also left out the idr_preload removals.  They're still in the git tree,
> but I'm not looking for feedback on them.

I'll give this a quick burn this afternoon and see what catches fire...

Cheers,

Dave.

-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH v4 72/73] xfs: Convert mru cache to XArray

2017-12-05 Thread Dave Chinner
On Tue, Dec 05, 2017 at 04:41:58PM -0800, Matthew Wilcox wrote:
> From: Matthew Wilcox 
> 
> This eliminates a call to radix_tree_preload().

.

>  void
> @@ -431,24 +424,24 @@ xfs_mru_cache_insert(
>   unsigned long   key,
>   struct xfs_mru_cache_elem *elem)
>  {
> + XA_STATE(xas, &mru->store, key);
>   int error;
>  
>   ASSERT(mru && mru->lists);
>   if (!mru || !mru->lists)
>   return -EINVAL;
>  
> - if (radix_tree_preload(GFP_NOFS))
> - return -ENOMEM;
> -
>   INIT_LIST_HEAD(&elem->list_node);
>   elem->key = key;
>  
> - spin_lock(&mru->lock);
> - error = radix_tree_insert(&mru->store, key, elem);
> - radix_tree_preload_end();
> - if (!error)
> - _xfs_mru_cache_list_insert(mru, elem);
> - spin_unlock(&mru->lock);
> + do {
> + xas_lock(&xas);
> + xas_store(&xas, elem);
> + error = xas_error(&xas);
> + if (!error)
> + _xfs_mru_cache_list_insert(mru, elem);
> + xas_unlock(&xas);
> + } while (xas_nomem(&xas, GFP_NOFS));

Ok, so why does this have a retry loop on ENOMEM despite the
existing code handling that error? And why put such a loop in this
code and not any of the other XFS code that used
radix_tree_preload() and is arguably much more important to avoid
ENOMEM on insert (e.g. the inode cache)?

Also, I really don't like the pattern of using xa_lock()/xa_unlock()
to protect access to an external structure. i.e. the mru->lock
context is protecting multiple fields and operations in the MRU
structure, not just the radix tree operations. Turning that around
so that a larger XFS structure and algorithm is now protected by an
opaque internal lock from a generic storage structure that forms part
of the larger structure seems like a bad design pattern to me...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 07/73] xarray: Define struct xa_node

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

This is a direct replacement for struct radix_tree_node.  Use a #define
so that radix tree users continue to work without change.

Signed-off-by: Matthew Wilcox 
---
 include/linux/radix-tree.h | 29 +++--
 include/linux/xarray.h | 24 
 2 files changed, 27 insertions(+), 26 deletions(-)

diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index f31a278de8eb..f46e3de57115 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -32,6 +32,7 @@
 
 /* Keep unconverted code working */
 #define radix_tree_root    xarray
+#define radix_tree_node    xa_node
 
 /*
  * The bottom two bits of the slot determine how the remaining bits in the
@@ -60,41 +61,17 @@ static inline bool radix_tree_is_internal_node(void *ptr)
 
 /*** radix-tree API starts here ***/
 
-#define RADIX_TREE_MAX_TAGS 3
-
 #define RADIX_TREE_MAP_SHIFT   XA_CHUNK_SHIFT
 #define RADIX_TREE_MAP_SIZE    (1UL << RADIX_TREE_MAP_SHIFT)
 #define RADIX_TREE_MAP_MASK    (RADIX_TREE_MAP_SIZE-1)
 
-#define RADIX_TREE_TAG_LONGS   \
-   ((RADIX_TREE_MAP_SIZE + BITS_PER_LONG - 1) / BITS_PER_LONG)
+#define RADIX_TREE_MAX_TAGS    XA_MAX_TAGS
+#define RADIX_TREE_TAG_LONGS   XA_TAG_LONGS
 
 #define RADIX_TREE_INDEX_BITS  (8 /* CHAR_BIT */ * sizeof(unsigned long))
 #define RADIX_TREE_MAX_PATH (DIV_ROUND_UP(RADIX_TREE_INDEX_BITS, \
  RADIX_TREE_MAP_SHIFT))
 
-/*
- * @count is the count of every non-NULL element in the ->slots array
- * whether that is a data entry, a retry entry, a user pointer,
- * a sibling entry or a pointer to the next level of the tree.
- * @exceptional is the count of every element in ->slots which is
- * either a data entry or a sibling entry for data.
- */
-struct radix_tree_node {
-   unsigned char   shift;  /* Bits remaining in each slot */
-   unsigned char   offset; /* Slot offset in parent */
-   unsigned char   count;  /* Total entry count */
-   unsigned char   exceptional;/* Exceptional entry count */
-   struct radix_tree_node *parent; /* Used when ascending tree */
-   struct radix_tree_root *root;   /* The tree we belong to */
-   union {
-   struct list_head private_list;  /* For tree user */
-   struct rcu_head rcu_head;   /* Used when freeing node */
-   };
-   void __rcu  *slots[RADIX_TREE_MAP_SIZE];
-   unsigned long   tags[RADIX_TREE_MAX_TAGS][RADIX_TREE_TAG_LONGS];
-};
-
 /* The top bits of xa_flags are used to store the root tags and the IDR flag */
 #define ROOT_IS_IDR    ((__force gfp_t)(1 << __GFP_BITS_SHIFT))
 #define ROOT_TAG_SHIFT (__GFP_BITS_SHIFT + 1)
diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index dcdac2053ea6..1aff0069458b 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -133,6 +133,30 @@ static inline bool xa_is_value(void *entry)
 #endif
 #define XA_CHUNK_SIZE  (1UL << XA_CHUNK_SHIFT)
 #define XA_CHUNK_MASK  (XA_CHUNK_SIZE - 1)
+#define XA_MAX_TAGS    3
+#define XA_TAG_LONGS   DIV_ROUND_UP(XA_CHUNK_SIZE, BITS_PER_LONG)
+
+/*
+ * @count is the count of every non-NULL element in the ->slots array
+ * whether that is a data value entry, a retry entry, a user pointer,
+ * a sibling entry or a pointer to the next level of the tree.
+ * @exceptional is the count of every element in ->slots which is
+ * either a data value entry or a sibling entry for a data value.
+ */
+struct xa_node {
+   unsigned char   shift;  /* Bits remaining in each slot */
+   unsigned char   offset; /* Slot offset in parent */
+   unsigned char   count;  /* Total entry count */
+   unsigned char   exceptional;/* Exceptional entry count */
+   struct xa_node *parent; /* Used when ascending tree */
+   struct xarray * root;   /* The tree we belong to */
+   union {
+   struct list_head private_list;  /* For tree user */
+   struct rcu_head rcu_head;   /* Used when freeing node */
+   };
+   void __rcu  *slots[XA_CHUNK_SIZE];
+   unsigned long   tags[XA_MAX_TAGS][XA_TAG_LONGS];
+};
 
 /*
  * Internal entries have the bottom two bits set to the value 10b.  Most
-- 
2.15.0

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 10/73] xarray: Add xa_get_tag, xa_set_tag and xa_clear_tag

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

XArray tags are slightly more strongly typed than the radix tree tags,
but occupy the same bits.  This commit also adds the xas_ family of tag
operations, for cases where the caller is already holding the lock, and
xa_tagged() to ask whether any array member has a particular tag set.
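
A minimal usage sketch of the normal tag API (not part of the patch; "xa"
and "item" are placeholders, and the xa_store() call comes from a later
patch in this series):

	void tag_example(struct xarray *xa, void *item)
	{
		xa_store(xa, 3, item, GFP_KERNEL);	/* from a later patch */
		xa_set_tag(xa, 3, XA_TAG_0);
		if (xa_get_tag(xa, 3, XA_TAG_0))
			pr_debug("index 3 carries XA_TAG_0\n");
		xa_clear_tag(xa, 3, XA_TAG_0);
	}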

Signed-off-by: Matthew Wilcox 
---
 include/linux/xarray.h |  38 +++-
 lib/radix-tree.c   |  52 +--
 lib/xarray.c   | 247 +
 3 files changed, 310 insertions(+), 27 deletions(-)

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index af52ba75e6a3..ed95ebe91169 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -20,6 +20,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -71,6 +72,33 @@ static inline void xa_init(struct xarray *xa)
 
 void *xa_load(struct xarray *, unsigned long index);
 
+typedef unsigned __bitwise xa_tag_t;
+#define XA_TAG_0   ((__force xa_tag_t)0U)
+#define XA_TAG_1   ((__force xa_tag_t)1U)
+#define XA_TAG_2   ((__force xa_tag_t)2U)
+#define XA_NO_TAG  ((__force xa_tag_t)4U)
+
+#define XA_TAG_MAX XA_TAG_2
+#define XA_FREE_TAG    XA_TAG_0
+#define XA_FLAGS_TAG(tag)  ((__force gfp_t)((2U << __GFP_BITS_SHIFT) << \
+   (__force unsigned)(tag)))
+
+/**
+ * xa_tagged() - Inquire whether any entry in this array has a tag set
+ * @xa: Array
+ * @tag: Tag value
+ *
+ * Return: %true if any entry has this tag set.
+ */
+static inline bool xa_tagged(const struct xarray *xa, xa_tag_t tag)
+{
+   return xa->xa_flags & XA_FLAGS_TAG(tag);
+}
+
+bool xa_get_tag(struct xarray *, unsigned long index, xa_tag_t);
+void *xa_set_tag(struct xarray *, unsigned long index, xa_tag_t);
+void *xa_clear_tag(struct xarray *, unsigned long index, xa_tag_t);
+
 #define BITS_PER_XA_VALUE  (BITS_PER_LONG - 1)
 
 /**
@@ -122,6 +150,10 @@ static inline bool xa_is_value(void *entry)
spin_unlock_irqrestore(&(xa)->xa_lock, flags)
 #define xa_lock_held(xa)   lockdep_is_held(&(xa)->xa_lock)
 
+/* Versions of the normal API which require the caller to hold the xa_lock */
+void *__xa_set_tag(struct xarray *, unsigned long index, xa_tag_t);
+void *__xa_clear_tag(struct xarray *, unsigned long index, xa_tag_t);
+
 /*
  * The xarray is constructed out of a set of 'chunks' of pointers.  Choosing
  * the best chunk size requires some tradeoffs.  A power of two recommends
@@ -152,7 +184,7 @@ struct xa_node {
unsigned char   offset; /* Slot offset in parent */
unsigned char   count;  /* Total entry count */
unsigned char   exceptional;/* Exceptional entry count */
-   struct xa_node *parent; /* Used when ascending tree */
+   struct xa_node __rcu *parent;   /* Used when ascending tree */
struct xarray * root;   /* The tree we belong to */
union {
struct list_head private_list;  /* For tree user */
@@ -434,6 +466,10 @@ static inline bool xas_retry(struct xa_state *xas, void 
*entry)
 
 void *xas_load(struct xa_state *);
 
+bool xas_get_tag(const struct xa_state *, xa_tag_t);
+void xas_set_tag(const struct xa_state *, xa_tag_t);
+void xas_clear_tag(const struct xa_state *, xa_tag_t);
+
 /**
  * xas_reload() - Refetch an entry from the xarray.
  * @xas: XArray operation state.
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index a919c60b10a4..8a8485749433 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -126,19 +126,19 @@ static inline gfp_t root_gfp_mask(const struct 
radix_tree_root *root)
return root->xa_flags & __GFP_BITS_MASK;
 }
 
-static inline void tag_set(struct radix_tree_node *node, unsigned int tag,
+static inline void rtag_set(struct radix_tree_node *node, unsigned int tag,
int offset)
 {
__set_bit(offset, node->tags[tag]);
 }
 
-static inline void tag_clear(struct radix_tree_node *node, unsigned int tag,
+static inline void rtag_clear(struct radix_tree_node *node, unsigned int tag,
int offset)
 {
__clear_bit(offset, node->tags[tag]);
 }
 
-static inline int tag_get(const struct radix_tree_node *node, unsigned int tag,
+static inline int rtag_get(const struct radix_tree_node *node, unsigned int 
tag,
int offset)
 {
return test_bit(offset, node->tags[tag]);
@@ -574,14 +574,14 @@ static int radix_tree_extend(struct radix_tree_root 
*root, gfp_t gfp,
if (is_idr(root)) {
all_tag_set(node, IDR_FREE);
if (!root_tag_get(root, IDR_FREE)) {
-   tag_clear(node, IDR_FREE, 0);
+   rtag_clear(node, IDR_FREE, 0);
root_tag_set(root, IDR_FREE);
}
} else 

[PATCH v4 02/73] xarray: Add the xa_lock to the radix_tree_root

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

This results in no change in structure size on 64-bit x86 as it fits in
the padding between the gfp_t and the void *.
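
A minimal sketch of what the embedded lock enables (not part of the patch;
the xa_lock()/xa_unlock() wrappers are the ones added to
include/linux/xarray.h below):

	void tree_modify_example(struct radix_tree_root *root)
	{
		xa_lock(root);
		/* insert into or delete from the tree under its own lock */
		xa_unlock(root);
	}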

Signed-off-by: Matthew Wilcox 
---
 fs/f2fs/gc.c   |  2 +-
 include/linux/idr.h| 12 ++--
 include/linux/radix-tree.h |  7 +--
 include/linux/xarray.h | 34 ++
 kernel/pid.c   |  2 +-
 tools/include/linux/spinlock.h |  1 +
 6 files changed, 48 insertions(+), 10 deletions(-)
 create mode 100644 include/linux/xarray.h

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index d844dcb80570..aac1e02f75df 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -991,7 +991,7 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
unsigned int init_segno = segno;
struct gc_inode_list gc_list = {
.ilist = LIST_HEAD_INIT(gc_list.ilist),
-   .iroot = RADIX_TREE_INIT(GFP_NOFS),
+   .iroot = RADIX_TREE_INIT(gc_list.iroot, GFP_NOFS),
};
 
trace_f2fs_gc_begin(sbi->sb, sync, background,
diff --git a/include/linux/idr.h b/include/linux/idr.h
index 5f55e119d128..4ffdb7058121 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -30,11 +30,11 @@ struct idr {
 /* Set the IDR flag and the IDR_FREE tag */
 #define IDR_RT_MARKER  ((__force gfp_t)(3 << __GFP_BITS_SHIFT))
 
-#define IDR_INIT   \
+#define IDR_INIT(name) \
 {  \
-   .idr_rt = RADIX_TREE_INIT(IDR_RT_MARKER)\
+   .idr_rt = RADIX_TREE_INIT(name, IDR_RT_MARKER)  \
 }
-#define DEFINE_IDR(name)   struct idr name = IDR_INIT
+#define DEFINE_IDR(name)   struct idr name = IDR_INIT(name)
 
 /**
  * idr_get_cursor - Return the current position of the cyclic allocator
@@ -193,10 +193,10 @@ struct ida {
struct radix_tree_root  ida_rt;
 };
 
-#define IDA_INIT   {   \
-   .ida_rt = RADIX_TREE_INIT(IDR_RT_MARKER | GFP_NOWAIT),  \
+#define IDA_INIT(name) {   \
+   .ida_rt = RADIX_TREE_INIT(name, IDR_RT_MARKER | GFP_NOWAIT),\
 }
-#define DEFINE_IDA(name)   struct ida name = IDA_INIT
+#define DEFINE_IDA(name)   struct ida name = IDA_INIT(name)
 
 int ida_pre_get(struct ida *ida, gfp_t gfp_mask);
 int ida_get_new_above(struct ida *ida, int starting_id, int *p_id);
diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index fc55ff31eca7..d2253b540cd7 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -109,20 +109,23 @@ struct radix_tree_node {
 #define ROOT_TAG_SHIFT (__GFP_BITS_SHIFT + 1)
 
 struct radix_tree_root {
+   spinlock_t  xa_lock;
gfp_t   gfp_mask;
struct radix_tree_node  __rcu *rnode;
 };
 
-#define RADIX_TREE_INIT(mask)  {   \
+#define RADIX_TREE_INIT(name, mask){   \
+   .xa_lock = __SPIN_LOCK_UNLOCKED(name.xa_lock),  \
.gfp_mask = (mask), \
.rnode = NULL,  \
 }
 
 #define RADIX_TREE(name, mask) \
-   struct radix_tree_root name = RADIX_TREE_INIT(mask)
+   struct radix_tree_root name = RADIX_TREE_INIT(name, mask)
 
 #define INIT_RADIX_TREE(root, mask)\
 do {   \
+   spin_lock_init(&(root)->xa_lock);   \
(root)->gfp_mask = (mask);  \
(root)->rnode = NULL;   \
 } while (0)
diff --git a/include/linux/xarray.h b/include/linux/xarray.h
new file mode 100644
index ..a5a933925b85
--- /dev/null
+++ b/include/linux/xarray.h
@@ -0,0 +1,34 @@
+#ifndef _LINUX_XARRAY_H
+#define _LINUX_XARRAY_H
+/*
+ * eXtensible Arrays
+ * Copyright (c) 2017 Microsoft Corporation
+ * Author: Matthew Wilcox 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+
+#define xa_trylock(xa) spin_trylock(&(xa)->xa_lock)
+#define xa_lock(xa)    spin_lock(&(xa)->xa_lock)
+#define xa_unlock(xa) 

[PATCH v4 20/73] idr: Convert to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

The IDR distinguishes between unallocated entries (read as NULL) and
entries where the user has chosen to store NULL.  The radix tree was
modified to consider NULL entries which had tag 0 _clear_ as being
allocated, but it added a lot of complexity.

Instead, the XArray has a 'zero entry', which the normal API will treat
as NULL, but is distinct from NULL when using the advanced API.  The IDR
code converts between NULL and zero entries.
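
A hedged sketch of the load-side conversion described above (illustrative
only, not the patch's actual code; the helper name is made up):

	static void *idr_fetch_example(struct xarray *xa, unsigned long id)
	{
		XA_STATE(xas, xa, id);
		void *entry = xas_load(&xas);

		/* the store path does the reverse: a user-supplied NULL is
		 * written as a zero entry so the ID stays reserved */
		return xa_is_zero(entry) ? NULL : entry;
	}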

The idr_for_each_entry_ul() iterator becomes an alias for xa_for_each(),
so we drop the idr_get_next_ul() function as it has no users.

The exported IDR API was a weird mix of GPL-only and general symbols;
I converted them all to GPL as there was no way to use the IDR API
without being GPL.

Signed-off-by: Matthew Wilcox 
---
 Documentation/core-api/xarray.rst   |   6 +
 include/linux/idr.h | 161 +---
 include/linux/xarray.h  |  27 +++-
 lib/idr.c   | 282 +---
 lib/radix-tree.c|  77 +-
 lib/xarray.c|   6 +
 tools/testing/radix-tree/idr-test.c |  23 +++
 7 files changed, 367 insertions(+), 215 deletions(-)

diff --git a/Documentation/core-api/xarray.rst 
b/Documentation/core-api/xarray.rst
index 871161539242..b252bf3dc23f 100644
--- a/Documentation/core-api/xarray.rst
+++ b/Documentation/core-api/xarray.rst
@@ -200,6 +200,12 @@ to :c:func:`xas_retry`, and retry the operation if it 
returns ``true``.
this RCU period.  You should restart the lookup from the head of the
array.
 
+   * - Zero
+ - :c:func:`xa_is_zero`
+ - Zero entries appear as ``NULL`` through the Normal API, but occupy an
+   entry in the XArray which can be tagged or otherwise used to reserve
+   the index.
+
 Other internal entries may be added in the future.  As far as possible, they
 will be handled by :c:func:`xas_retry`.
 
diff --git a/include/linux/idr.h b/include/linux/idr.h
index 4ffdb7058121..06412fbaa65f 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -9,33 +9,34 @@
  * tables.
  */
 
-#ifndef __IDR_H__
-#define __IDR_H__
+#ifndef _LINUX_IDR_H
+#define _LINUX_IDR_H
 
 #include 
 #include 
 #include 
+#include 
 
 struct idr {
-   struct radix_tree_root  idr_rt;
-   unsigned intidr_next;
+   struct xarray   idr_xa;
+   unsigned intidr_next;
 };
 
-/*
- * The IDR API does not expose the tagging functionality of the radix tree
- * to users.  Use tag 0 to track whether a node has free space below it.
- */
-#define IDR_FREE   0
-
-/* Set the IDR flag and the IDR_FREE tag */
-#define IDR_RT_MARKER  ((__force gfp_t)(3 << __GFP_BITS_SHIFT))
+#define IDR_INIT_FLAGS (XA_FLAGS_TRACK_FREE | XA_FLAGS_TAG(0))
 
 #define IDR_INIT(name) \
 {  \
-   .idr_rt = RADIX_TREE_INIT(name, IDR_RT_MARKER)  \
+   .idr_xa = __XARRAY_INIT(name.idr_xa, IDR_INIT_FLAGS),   \
+   .idr_next = 0,  \
 }
 #define DEFINE_IDR(name)   struct idr name = IDR_INIT(name)
 
+static inline void idr_init(struct idr *idr)
+{
+   __xa_init(&idr->idr_xa, IDR_INIT_FLAGS);
+   idr->idr_next = 0;
+}
+
 /**
  * idr_get_cursor - Return the current position of the cyclic allocator
  * @idr: idr handle
@@ -64,62 +65,97 @@ static inline void idr_set_cursor(struct idr *idr, unsigned 
int val)
 
 /**
  * DOC: idr sync
- * idr synchronization (stolen from radix-tree.h)
- *
- * idr_find() is able to be called locklessly, using RCU. The caller must
- * ensure calls to this function are made within rcu_read_lock() regions.
- * Other readers (lock-free or otherwise) and modifications may be running
- * concurrently.
- *
- * It is still required that the caller manage the synchronization and
- * lifetimes of the items. So if RCU lock-free lookups are used, typically
- * this would mean that the items have their own locks, or are amenable to
- * lock-free access; and that the items are freed by RCU (or only freed after
- * having been deleted from the idr tree *and* a synchronize_rcu() grace
- * period).
+ * idr synchronization
+ *
+ * The IDR manages its own locking, using irqsafe spinlocks for operations
+ * which modify the IDR and RCU for operations which do not.  The user of
+ * the IDR may choose to wrap accesses to it in a lock if it needs to
+ * guarantee the IDR does not change during a read access.  The easiest way
+ * to do this is to grab the same lock the IDR uses for write accesses
+ * using one of the idr_lock() wrappers.
+ *
+ * The caller must still manage the synchronization and lifetimes of the
+ * items. So if RCU lock-free lookups are used, typically this would mean
+ * that the items have their own locks, or are amenable to 

[PATCH v4 09/73] xarray: Add xa_load

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

This first function in the XArray API brings with it a lot of support
infrastructure.  The advanced API is based around the xa_state which is
a more capable version of the radix_tree_iter.

As the test-suite demonstrates, it is possible to use the xarray and
radix tree APIs on the same data structure.
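
A minimal usage sketch of the entry point added here (not part of the
patch; xa_store() comes from a later patch in the series and is shown only
to give xa_load() something to find):

	void *load_example(struct xarray *xa, void *item)
	{
		xa_store(xa, 7, item, GFP_KERNEL);	/* from a later patch */
		return xa_load(xa, 7);	/* item; xa_load(xa, 8) would be NULL */
	}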

Signed-off-by: Matthew Wilcox 
---
 include/linux/xarray.h  | 235 
 lib/radix-tree.c|  43 -
 lib/xarray.c| 160 +++
 tools/testing/radix-tree/.gitignore |   1 +
 tools/testing/radix-tree/Makefile   |   7 +-
 tools/testing/radix-tree/linux/radix-tree.h |   1 -
 tools/testing/radix-tree/linux/rcupdate.h   |   1 +
 tools/testing/radix-tree/linux/xarray.h |   1 +
 tools/testing/radix-tree/xarray-test.c  |  56 +++
 9 files changed, 459 insertions(+), 46 deletions(-)
 create mode 100644 tools/testing/radix-tree/xarray-test.c

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index 1aff0069458b..af52ba75e6a3 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -21,6 +21,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 
@@ -67,6 +69,8 @@ static inline void xa_init(struct xarray *xa)
__xa_init(xa, 0);
 }
 
+void *xa_load(struct xarray *, unsigned long index);
+
 #define BITS_PER_XA_VALUE  (BITS_PER_LONG - 1)
 
 /**
@@ -158,6 +162,46 @@ struct xa_node {
unsigned long   tags[XA_MAX_TAGS][XA_TAG_LONGS];
 };
 
+#ifdef XA_DEBUG
+void xa_dump(const struct xarray *);
+void xa_dump_node(const struct xa_node *);
+#define XA_BUG_ON(node, x) do { \
+   if ((x) && (node)) \
+   xa_dump_node(node); \
+   BUG_ON(x); \
+   } while (0)
+#else
+#define XA_BUG_ON(node, x) do { } while (0)
+#endif
+
+/* Private */
+static inline void *xa_head(struct xarray *xa)
+{
+   return rcu_dereference_check(xa->xa_head, xa_lock_held(xa));
+}
+
+/* Private */
+static inline void *xa_head_locked(struct xarray *xa)
+{
+   return rcu_dereference_protected(xa->xa_head, xa_lock_held(xa));
+}
+
+/* Private */
+static inline void *xa_entry(struct xarray *xa,
+   const struct xa_node *node, unsigned int offset)
+{
+   XA_BUG_ON(node, offset >= XA_CHUNK_SIZE);
+   return rcu_dereference_check(node->slots[offset], xa_lock_held(xa));
+}
+
+/* Private */
+static inline void *xa_entry_locked(struct xarray *xa,
+   const struct xa_node *node, unsigned int offset)
+{
+   XA_BUG_ON(node, offset >= XA_CHUNK_SIZE);
+   return rcu_dereference_protected(node->slots[offset], xa_lock_held(xa));
+}
+
 /*
  * Internal entries have the bottom two bits set to the value 10b.  Most
  * internal entries are pointers to the next node in the tree.  Since the
@@ -189,6 +233,12 @@ static inline bool xa_is_internal(void *entry)
return ((unsigned long)entry & 3) == 2;
 }
 
+/* Private */
+static inline struct xa_node *xa_to_node(void *entry)
+{
+   return (struct xa_node *)((unsigned long)entry & ~3UL);
+}
+
 /* Private */
 static inline bool xa_is_node(void *entry)
 {
@@ -222,4 +272,189 @@ static inline bool xa_is_sibling(void *entry)
 
 #define XA_RETRY_ENTRY xa_mk_internal(256)
 
+/**
+ * xa_is_retry() - Is the entry a retry entry?
+ * @entry: Entry retrieved from the XArray
+ *
+ * Return: %true if the entry is a retry entry.
+ */
+static inline bool xa_is_retry(void *entry)
+{
+   return unlikely(entry == XA_RETRY_ENTRY);
+}
+
+/**
+ * typedef xa_update_node_t - A callback function from the XArray.
+ * @node: The node which is being processed
+ *
+ * This function is called every time the XArray updates the count of
+ * present and value entries in a node.  It allows advanced users to
+ * maintain the private_list in the node.
+ */
+typedef void (*xa_update_node_t)(struct xa_node *node);
+
+/*
+ * The xa_state is opaque to its users.  It contains various different pieces
+ * of state involved in the current operation on the XArray.  It should be
+ * declared on the stack and passed between the various internal routines.
+ * The various elements in it should not be accessed directly, but only
+ * through the provided accessor functions.  The below documentation is for
+ * the benefit of those working on the code, not for users of the XArray.
+ *
+ * @xa_node usually points to the xa_node containing the slot we're operating
+ * on (and @xa_offset is the offset in the slots array).  If there is a
+ * single entry in the array at index 0, there are no allocated xa_nodes to
+ * point to, and so we store %NULL in @xa_node.  @xa_node is set to
+ * the value %XAS_RESTART if the xa_state is not walked to the correct
+ * position in the tree of nodes for this operation.  If an error occurs
+ * during an operation, it is set to an 

[PATCH v4 17/73] xarray: Add xas_next and xas_prev

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

These two functions move the xas index by one position, and adjust the
rest of the iterator state to match it.  This is more efficient than
calling xas_set() as it keeps the iterator at the leaves of the tree
instead of walking the iterator from the root each time.
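
A minimal sketch of the intended use (locking, RCU and internal-entry
handling elided; "xa", "start" and "last" are placeholders):

	unsigned long count_range_example(struct xarray *xa,
			unsigned long start, unsigned long last)
	{
		XA_STATE(xas, xa, start);
		unsigned long i, present = 0;
		void *entry = xas_load(&xas);

		for (i = start; i < last; i++) {
			if (entry)
				present++;
			entry = xas_next(&xas);	/* step to i + 1 in place */
		}
		return present;
	}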

Signed-off-by: Matthew Wilcox 
---
 include/linux/xarray.h |  71 ++-
 lib/xarray.c   |  74 
 tools/testing/radix-tree/xarray-test.c | 214 +
 3 files changed, 357 insertions(+), 2 deletions(-)

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index b648c1b93d9f..416708ace115 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -549,6 +549,12 @@ static inline bool xas_not_node(struct xa_node *node)
return (unsigned long)node < 4096;
 }
 
+/* True if the node represents RESTART or an error */
+static inline bool xas_frozen(struct xa_node *node)
+{
+   return (unsigned long)node & 1;
+}
+
 /* True if the node represents head-of-tree, RESTART or BOUNDS */
 static inline bool xas_top(struct xa_node *node)
 {
@@ -664,8 +670,8 @@ static inline bool xa_iter_skip(void *entry)
 }
 
 /*
- * node->shift is always 0 for the inline iterators unless we're processing
- * a multi-index entry.
+ * node->shift is always 0 for next_entry and next_tag unless we're processing
+ * a multi-index entry.  It can be non-0 for next/prev, so it's not used there.
  */
 #ifdef CONFIG_RADIX_TREE_MULTIORDER
 #define xa_node_shift(node)    node->shift
 #else
 #define xa_node_shift(node)    0
 #endif
 
+void *__xas_next(struct xa_state *);
+void *__xas_prev(struct xa_state *);
+
+/**
+ * xas_prev() - Move iterator to previous index.
+ * @xas: XArray operation state.
+ *
+ * If the @xas was in an error state, it will remain in an error state
+ * and this function will return %NULL.  If the @xas has never been walked,
+ * it will have the effect of calling xas_load().  Otherwise one will be
+ * subtracted from the index and the state will be walked to the correct
+ * location in the array for the next operation.
+ *
+ * If the iterator was referencing index 0, this function wraps
+ * around to %ULONG_MAX.
+ *
+ * Return: The entry at the new index.  This may be %NULL or an internal
+ * entry, although it should never be a node entry.
+ */
+static inline void *xas_prev(struct xa_state *xas)
+{
+   struct xa_node *node = xas->xa_node;
+
+   if (unlikely(xas_not_node(node) || node->shift ||
+   xas->xa_offset == 0))
+   return __xas_prev(xas);
+
+   xas->xa_index--;
+   xas->xa_offset--;
+   return xa_entry(xas->xa, node, xas->xa_offset);
+}
+
+/**
+ * xas_next() - Move state to next index.
+ * @xas: XArray operation state.
+ *
+ * If the @xas was in an error state, it will remain in an error state
+ * and this function will return %NULL.  If the @xas has never been walked,
+ * it will have the effect of calling xas_load().  Otherwise one will be
+ * added to the index and the state will be walked to the correct
+ * location in the array for the next operation.
+ *
+ * If the iterator was referencing index %ULONG_MAX, this function wraps
+ * around to 0.
+ *
+ * Return: The entry at the new index.  This may be %NULL or an internal
+ * entry, although it should never be a node entry.
+ */
+static inline void *xas_next(struct xa_state *xas)
+{
+   struct xa_node *node = xas->xa_node;
+
+   if (unlikely(xas_not_node(node) || node->shift ||
+   xas->xa_offset == XA_CHUNK_MASK))
+   return __xas_next(xas);
+
+   xas->xa_index++;
+   xas->xa_offset++;
+   return xa_entry(xas->xa, node, xas->xa_offset);
+}
+
 /**
  * xas_next_entry() - Advance iterator to next present entry.
  * @xas: XArray operation state.
diff --git a/lib/xarray.c b/lib/xarray.c
index f3875b251b41..8c6e83d10554 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -799,6 +799,80 @@ void xas_pause(struct xa_state *xas)
 }
 EXPORT_SYMBOL_GPL(xas_pause);
 
+/*
+ * __xas_prev() - Find the previous entry in the XArray.
+ * @xas: XArray operation state.
+ *
+ * Helper function for xas_prev() which handles all the complex cases
+ * out of line.
+ */
+void *__xas_prev(struct xa_state *xas)
+{
+   void *entry;
+
+   if (!xas_frozen(xas->xa_node))
+   xas->xa_index--;
+   if (xas_not_node(xas->xa_node))
+   return xas_load(xas);
+
+   if (xas->xa_offset != get_offset(xas->xa_index, xas->xa_node))
+   xas->xa_offset--;
+
+   while (xas->xa_offset == 255) {
+   xas->xa_offset = xas->xa_node->offset - 1;
+   xas->xa_node = xa_parent(xas->xa, xas->xa_node);
+   if (!xas->xa_node)
+   return set_bounds(xas);
+   }
+
+   for (;;) {
+   

[PATCH v4 24/73] page cache: Add and replace pages using the XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

Use the XArray APIs to add and replace pages in the page cache.  This
removes two uses of the radix tree preload API and is significantly
shorter code.

Signed-off-by: Matthew Wilcox 
---
 mm/filemap.c | 142 +--
 1 file changed, 61 insertions(+), 81 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 51f88ffc5319..2439747a0a17 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -112,34 +112,6 @@
  *   ->tasklist_lock(memory_failure, collect_procs_ao)
  */
 
-static int page_cache_tree_insert(struct address_space *mapping,
- struct page *page, void **shadowp)
-{
-   struct radix_tree_node *node;
-   void **slot;
-   int error;
-
-   error = __radix_tree_create(&mapping->pages, page->index, 0,
-   &node, &slot);
-   if (error)
-   return error;
-   if (*slot) {
-   void *p;
-
-   p = radix_tree_deref_slot_protected(slot, 
&mapping->pages.xa_lock);
-   if (!xa_is_value(p))
-   return -EEXIST;
-
-   mapping->nrexceptional--;
-   if (shadowp)
-   *shadowp = p;
-   }
-   __radix_tree_replace(&mapping->pages, node, slot, page,
-workingset_lookup_update(mapping));
-   mapping->nrpages++;
-   return 0;
-}
-
 static void page_cache_tree_delete(struct address_space *mapping,
   struct page *page, void *shadow)
 {
@@ -775,51 +747,44 @@ EXPORT_SYMBOL(file_write_and_wait_range);
  * locked.  This function does not add the new page to the LRU, the
  * caller must do that.
  *
- * The remove + add is atomic.  The only way this function can fail is
- * memory allocation failure.
+ * The remove + add is atomic.  This function cannot fail.
  */
 int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask)
 {
-   int error;
+   struct address_space *mapping = old->mapping;
+   void (*freepage)(struct page *) = mapping->a_ops->freepage;
+   pgoff_t offset = old->index;
+   XA_STATE(xas, &mapping->pages, offset);
+   unsigned long flags;
 
VM_BUG_ON_PAGE(!PageLocked(old), old);
VM_BUG_ON_PAGE(!PageLocked(new), new);
VM_BUG_ON_PAGE(new->mapping, new);
 
-   error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
-   if (!error) {
-   struct address_space *mapping = old->mapping;
-   void (*freepage)(struct page *);
-   unsigned long flags;
-
-   pgoff_t offset = old->index;
-   freepage = mapping->a_ops->freepage;
-
-   get_page(new);
-   new->mapping = mapping;
-   new->index = offset;
+   get_page(new);
+   new->mapping = mapping;
+   new->index = offset;
 
-   xa_lock_irqsave(&mapping->pages, flags);
-   __delete_from_page_cache(old, NULL);
-   error = page_cache_tree_insert(mapping, new, NULL);
-   BUG_ON(error);
+   xas_lock_irqsave(&xas, flags);
+   xas_store(&xas, new);
 
-   /*
-* hugetlb pages do not participate in page cache accounting.
-*/
-   if (!PageHuge(new))
-   __inc_node_page_state(new, NR_FILE_PAGES);
-   if (PageSwapBacked(new))
-   __inc_node_page_state(new, NR_SHMEM);
-   xa_unlock_irqrestore(&mapping->pages, flags);
-   mem_cgroup_migrate(old, new);
-   radix_tree_preload_end();
-   if (freepage)
-   freepage(old);
-   put_page(old);
-   }
+   old->mapping = NULL;
+   /* hugetlb pages do not participate in page cache accounting. */
+   if (!PageHuge(old))
+   __dec_node_page_state(new, NR_FILE_PAGES);
+   if (!PageHuge(new))
+   __inc_node_page_state(new, NR_FILE_PAGES);
+   if (PageSwapBacked(old))
+   __dec_node_page_state(new, NR_SHMEM);
+   if (PageSwapBacked(new))
+   __inc_node_page_state(new, NR_SHMEM);
+   xas_unlock_irqrestore(&xas, flags);
+   mem_cgroup_migrate(old, new);
+   if (freepage)
+   freepage(old);
+   put_page(old);
 
-   return error;
+   return 0;
 }
 EXPORT_SYMBOL_GPL(replace_page_cache_page);
 
@@ -828,12 +793,15 @@ static int __add_to_page_cache_locked(struct page *page,
  pgoff_t offset, gfp_t gfp_mask,
  void **shadowp)
 {
+   XA_STATE(xas, &mapping->pages, offset);
int huge = PageHuge(page);
struct mem_cgroup *memcg;
int error;
+   void *old;
 
VM_BUG_ON_PAGE(!PageLocked(page), page);
VM_BUG_ON_PAGE(PageSwapBacked(page), page);
+   xas_set_update(&xas, workingset_lookup_update(mapping));
 
if 

[PATCH v4 13/73] xarray: Add xa_for_each

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

This iterator allows the user to efficiently walk a range of the array,
executing the loop body once for each non-NULL entry in that range.
This commit also includes xa_find() and xa_next() which are helper
functions for xa_for_each() but may also be useful in their own right.

In the xas family of functions, we also have xas_for_each(),
xas_find(), xas_next(), xas_pause() and xas_jump().  xas_pause() allows
a xas_for_each() iteration to be resumed later from the next element
and xas_jump() allows an iteration to be resumed from a specified index.
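
A minimal usage sketch of the convenience iterator (not part of the patch;
the array is assumed to have been populated with xa_store() elsewhere):

	unsigned long count_all_example(struct xarray *xa)
	{
		unsigned long index = 0, n = 0;
		void *entry;

		xa_for_each(xa, entry, index, ULONG_MAX)
			n++;	/* runs once per non-NULL entry; index is its position */
		return n;
	}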

Signed-off-by: Matthew Wilcox 
---
 include/linux/xarray.h | 111 ++
 lib/radix-tree.c   |   4 +-
 lib/xarray.c   | 166 +
 tools/testing/radix-tree/xarray-test.c |  91 ++
 4 files changed, 370 insertions(+), 2 deletions(-)

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index a570d7d9a252..a62baf6f1a28 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -133,6 +133,35 @@ bool xa_get_tag(struct xarray *, unsigned long index, 
xa_tag_t);
 void *xa_set_tag(struct xarray *, unsigned long index, xa_tag_t);
 void *xa_clear_tag(struct xarray *, unsigned long index, xa_tag_t);
 
+void *xa_find(struct xarray *xa, unsigned long *index, unsigned long max);
+void *xa_find_after(struct xarray *xa, unsigned long *index, unsigned long 
max);
+
+/**
+ * xa_for_each() - Iterate over a portion of an XArray.
+ * @xa: XArray.
+ * @entry: Entry retrieved from array.
+ * @index: Index of @entry.
+ * @max: Maximum index to retrieve from array.
+ *
+ * Initialise @index to the minimum index you want to retrieve from
+ * the array.  During the iteration, @entry will have the value of the
+ * entry stored in @xa at @index.  The iteration will skip all NULL
+ * entries in the array.  You may modify @index during the
+ * iteration if you want to skip indices.  It is safe to modify the
+ * array during the iteration.  At the end of the iteration, @entry will
+ * be set to NULL and @index will have a value less than or equal to max.
+ *
+ * xa_for_each() is O(n.log(n)) while xas_for_each() is O(n).  You have
+ * to handle your own locking with xas_for_each(), and if you have to unlock
+ * after each iteration, it will also end up being O(n.log(n)).  xa_for_each()
+ * will spin if it hits a retry entry; if you intend to see retry entries,
+ * you should use the xas_for_each() iterator instead.  The xas_for_each()
+ * iterator will expand into more inline code than xa_for_each().
+ */
+#define xa_for_each(xa, entry, index, max) \
+   for (entry = xa_find(xa, &index, max); entry; \
+entry = xa_find_after(xa, &index, max))
+
 #define BITS_PER_XA_VALUE  (BITS_PER_LONG - 1)
 
 /**
@@ -486,6 +515,12 @@ static inline bool xas_valid(const struct xa_state *xas)
return !xas_invalid(xas);
 }
 
+/* True if the pointer is something other than a node */
+static inline bool xas_not_node(struct xa_node *node)
+{
+   return (unsigned long)node < 4096;
+}
+
 /* True if the node represents head-of-tree, RESTART or BOUNDS */
 static inline bool xas_top(struct xa_node *node)
 {
@@ -514,6 +549,7 @@ static inline bool xas_retry(struct xa_state *xas, void 
*entry)
 void *xas_load(struct xa_state *);
 void *xas_store(struct xa_state *, void *entry);
 void *xas_create(struct xa_state *);
+void *xas_find(struct xa_state *, unsigned long max);
 
 bool xas_get_tag(const struct xa_state *, xa_tag_t);
 void xas_set_tag(const struct xa_state *, xa_tag_t);
@@ -521,6 +557,7 @@ void xas_clear_tag(const struct xa_state *, xa_tag_t);
 void xas_init_tags(const struct xa_state *);
 
 bool xas_nomem(struct xa_state *, gfp_t);
+void xas_pause(struct xa_state *);
 
 /**
  * xas_reload() - Refetch an entry from the xarray.
@@ -590,6 +627,80 @@ static inline void xas_set_update(struct xa_state *xas, 
xa_update_node_t update)
xas->xa_update = update;
 }
 
+/* Skip over any of these entries when iterating */
+static inline bool xa_iter_skip(void *entry)
+{
+   return unlikely(!entry ||
+   (xa_is_internal(entry) && entry < XA_RETRY_ENTRY));
+}
+
+/*
+ * node->shift is always 0 for the inline iterators unless we're processing
+ * a multi-index entry.
+ */
+#ifdef CONFIG_RADIX_TREE_MULTIORDER
+#define xa_node_shift(node)    node->shift
+#else
+#define xa_node_shift(node)    0
+#endif
+
+/**
+ * xas_next_entry() - Advance iterator to next present entry.
+ * @xas: XArray operation state.
+ * @max: Highest index to return.
+ *
+ * xas_next_entry() is an inline function to optimise xarray traversal for
+ * speed.  It is equivalent to calling xas_find(), and will call xas_find()
+ * for all the hard cases.
+ *
+ * Return: The next present entry after the one currently referred to by @xas.
+ */
+static inline void *xas_next_entry(struct xa_state *xas, unsigned long max)

[PATCH v4 11/73] xarray: Add xa_store

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

xa_store() differs from radix_tree_insert() in that it will overwrite an
existing element in the array rather than returning an error.  This is
the behaviour which most users want, and those that want more complex
behaviour generally want to use the xas family of routines anyway.

For memory allocation, xa_store() will first attempt to request memory
from the slab allocator; if memory is not immediately available, it will
drop the xa_lock and allocate memory, keeping a pointer in the xa_state.
It does not use the per-CPU cache, although those will continue to exist
until all radix tree users are converted to the xarray.

This patch also includes xa_erase() and __xa_erase() for a streamlined
way to store NULL.  Since there is no need to allocate memory in order
to store a NULL in the XArray, we do not need to trouble the user with
deciding what memory allocation flags to use.
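
A minimal usage sketch of the API added here ("xa" and "item" are
placeholders):

	void store_example(struct xarray *xa, void *item)
	{
		void *old = xa_store(xa, 5, item, GFP_KERNEL);

		if (old)
			pr_debug("replaced an existing entry at index 5\n");
		xa_erase(xa, 5);	/* stores NULL: no allocation, no GFP flags */
	}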

Signed-off-by: Matthew Wilcox 

squash xa_store

Add __xa_erase
---
 include/linux/xarray.h|  98 +
 lib/radix-tree.c  |   4 +-
 lib/xarray.c  | 569 ++
 tools/testing/radix-tree/linux/rcupdate.h |   1 +
 tools/testing/radix-tree/xarray-test.c| 111 +-
 5 files changed, 779 insertions(+), 4 deletions(-)

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index ed95ebe91169..6f1f55d9fc94 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -71,6 +71,32 @@ static inline void xa_init(struct xarray *xa)
 }
 
 void *xa_load(struct xarray *, unsigned long index);
+void *xa_store(struct xarray *, unsigned long index, void *entry, gfp_t);
+
+/**
+ * xa_erase() - Erase this entry from the XArray.
+ * @xa: XArray.
+ * @index: Index of entry.
+ *
+ * This function is the equivalent of calling xa_store() with %NULL as
+ * the third argument.  The XArray does not need to allocate memory, so
+ * the user does not need to provide GFP flags.
+ */
+static inline void *xa_erase(struct xarray *xa, unsigned long index)
+{
+   return xa_store(xa, index, NULL, 0);
+}
+
+/**
+ * xa_empty() - Determine if an array has any present entries.
+ * @xa: XArray.
+ *
+ * Return: %true if the array contains only NULL pointers.
+ */
+static inline bool xa_empty(const struct xarray *xa)
+{
+   return xa->xa_head == NULL;
+}
 
 typedef unsigned __bitwise xa_tag_t;
 #define XA_TAG_0   ((__force xa_tag_t)0U)
@@ -80,9 +106,15 @@ typedef unsigned __bitwise xa_tag_t;
 
 #define XA_TAG_MAX XA_TAG_2
 #define XA_FREE_TAG    XA_TAG_0
+#define XA_FLAGS_TRACK_FREE    ((__force gfp_t)(1U << __GFP_BITS_SHIFT))
 #define XA_FLAGS_TAG(tag)  ((__force gfp_t)((2U << __GFP_BITS_SHIFT) << \
(__force unsigned)(tag)))
 
+static inline bool xa_track_free(const struct xarray *xa)
+{
+   return xa->xa_flags & XA_FLAGS_TRACK_FREE;
+}
+
 /**
  * xa_tagged() - Inquire whether any entry in this array has a tag set
  * @xa: Array
@@ -151,6 +183,7 @@ static inline bool xa_is_value(void *entry)
 #define xa_lock_held(xa)   lockdep_is_held(&(xa)->xa_lock)
 
 /* Versions of the normal API which require the caller to hold the xa_lock */
+void *__xa_erase(struct xarray *, unsigned long index);
 void *__xa_set_tag(struct xarray *, unsigned long index, xa_tag_t);
 void *__xa_clear_tag(struct xarray *, unsigned long index, xa_tag_t);
 
@@ -265,6 +298,12 @@ static inline bool xa_is_internal(void *entry)
return ((unsigned long)entry & 3) == 2;
 }
 
+/* Private */
+static inline void *xa_mk_node(struct xa_node *node)
+{
+   return (void *)((unsigned long)node | 2);
+}
+
 /* Private */
 static inline struct xa_node *xa_to_node(void *entry)
 {
@@ -445,6 +484,12 @@ static inline bool xas_valid(const struct xa_state *xas)
return !xas_invalid(xas);
 }
 
+/* True if the node represents head-of-tree, RESTART or BOUNDS */
+static inline bool xas_top(struct xa_node *node)
+{
+   return node <= XAS_BOUNDS;
+}
+
 /**
  * xas_retry() - Handle a retry entry.
  * @xas: XArray operation state.
@@ -465,10 +510,15 @@ static inline bool xas_retry(struct xa_state *xas, void 
*entry)
 }
 
 void *xas_load(struct xa_state *);
+void *xas_store(struct xa_state *, void *entry);
+void *xas_create(struct xa_state *);
 
 bool xas_get_tag(const struct xa_state *, xa_tag_t);
 void xas_set_tag(const struct xa_state *, xa_tag_t);
 void xas_clear_tag(const struct xa_state *, xa_tag_t);
+void xas_init_tags(const struct xa_state *);
+
+bool xas_nomem(struct xa_state *, gfp_t);
 
 /**
  * xas_reload() - Refetch an entry from the xarray.
@@ -493,4 +543,52 @@ static inline void *xas_reload(struct xa_state *xas)
return xa_head(xas->xa);
 }
 
+/**
+ * xas_set() - Set up XArray operation state for a different index.
+ * @xas: XArray operation state.
+ * @index: New index into the XArray.
+ *
+ * Move the operation state to refer to a 

[PATCH v4 19/73] xarray: Add MAINTAINERS entry

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

Add myself as XArray and IDR maintainer.

Signed-off-by: Matthew Wilcox 
---
 MAINTAINERS | 12 
 1 file changed, 12 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index d4fdcb12616c..b2f8d606756b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14874,6 +14874,18 @@ T: git 
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/vdso
 S: Maintained
 F: arch/x86/entry/vdso/
 
+XARRAY
+M: Matthew Wilcox 
+M: Matthew Wilcox 
+L: linux-fsde...@vger.kernel.org
+S: Supported
+F: Documentation/core-api/xarray.rst
+F: lib/idr.c
+F: lib/xarray.c
+F: include/linux/idr.h
+F: include/linux/xarray.h
+F: tools/testing/radix-tree
+
 XC2028/3028 TUNER DRIVER
 M: Mauro Carvalho Chehab 
 M: Mauro Carvalho Chehab 
-- 
2.15.0

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 06/73] xarray: Add definition of struct xarray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

This is a direct replacement for struct radix_tree_root.  Some of the
struct members have changed name; convert those, and use a #define so
that radix_tree users continue to work without change.

Signed-off-by: Matthew Wilcox 
---
 include/linux/radix-tree.h   | 31 -
 include/linux/xarray.h   | 45 +++
 lib/Makefile |  2 +-
 lib/idr.c|  4 +-
 lib/radix-tree.c | 75 
 lib/xarray.c | 47 
 tools/include/linux/spinlock.h   |  1 +
 tools/testing/radix-tree/.gitignore  |  1 +
 tools/testing/radix-tree/Makefile|  8 +++-
 tools/testing/radix-tree/linux/bug.h |  1 +
 tools/testing/radix-tree/linux/kconfig.h |  1 +
 tools/testing/radix-tree/linux/xarray.h  |  2 +
 tools/testing/radix-tree/multiorder.c|  6 +--
 tools/testing/radix-tree/test.c  |  6 +--
 14 files changed, 158 insertions(+), 72 deletions(-)
 create mode 100644 lib/xarray.c
 create mode 100644 tools/testing/radix-tree/linux/kconfig.h
 create mode 100644 tools/testing/radix-tree/linux/xarray.h

diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index 5130f44d9f93..f31a278de8eb 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -30,6 +30,9 @@
 #include 
 #include 
 
+/* Keep unconverted code working */
+#define radix_tree_root    xarray
+
 /*
  * The bottom two bits of the slot determine how the remaining bits in the
  * slot are interpreted:
@@ -59,10 +62,7 @@ static inline bool radix_tree_is_internal_node(void *ptr)
 
 #define RADIX_TREE_MAX_TAGS 3
 
-#ifndef RADIX_TREE_MAP_SHIFT
-#define RADIX_TREE_MAP_SHIFT   (CONFIG_BASE_SMALL ? 4 : 6)
-#endif
-
+#define RADIX_TREE_MAP_SHIFT   XA_CHUNK_SHIFT
 #define RADIX_TREE_MAP_SIZE    (1UL << RADIX_TREE_MAP_SHIFT)
 #define RADIX_TREE_MAP_MASK    (RADIX_TREE_MAP_SIZE-1)
 
@@ -95,35 +95,20 @@ struct radix_tree_node {
unsigned long   tags[RADIX_TREE_MAX_TAGS][RADIX_TREE_TAG_LONGS];
 };
 
-/* The top bits of gfp_mask are used to store the root tags and the IDR flag */
+/* The top bits of xa_flags are used to store the root tags and the IDR flag */
 #define ROOT_IS_IDR    ((__force gfp_t)(1 << __GFP_BITS_SHIFT))
 #define ROOT_TAG_SHIFT (__GFP_BITS_SHIFT + 1)
 
-struct radix_tree_root {
-   spinlock_t  xa_lock;
-   gfp_t   gfp_mask;
-   struct radix_tree_node  __rcu *rnode;
-};
-
-#define RADIX_TREE_INIT(name, mask){   \
-   .xa_lock = __SPIN_LOCK_UNLOCKED(name.xa_lock),  \
-   .gfp_mask = (mask), \
-   .rnode = NULL,  \
-}
+#define RADIX_TREE_INIT(name, mask)    __XARRAY_INIT(name, mask)
 
 #define RADIX_TREE(name, mask) \
struct radix_tree_root name = RADIX_TREE_INIT(name, mask)
 
-#define INIT_RADIX_TREE(root, mask)\
-do {   \
-   spin_lock_init(&(root)->xa_lock);   \
-   (root)->gfp_mask = (mask);  \
-   (root)->rnode = NULL;   \
-} while (0)
+#define INIT_RADIX_TREE(root, mask) __xa_init(root, mask)
 
 static inline bool radix_tree_empty(const struct radix_tree_root *root)
 {
-   return root->rnode == NULL;
+   return root->xa_head == NULL;
 }
 
 /**
diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index 2c45d87a3476..dcdac2053ea6 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -19,9 +19,54 @@
  */
 
 #include 
+#include 
+#include 
 #include 
 #include 
 
+/**
+ * struct xarray - The anchor of the XArray.
+ *
+ * To use the xarray, define it statically or embed it in your data structure.
+ * It is a very small data structure, so it does not usually make sense to
+ * allocate it separately and keep a pointer to it in your data structure.
+ */
+/*
+ * If all of the entries in the array are NULL, @xa_head is a NULL pointer.
+ * If the only non-NULL entry in the array is at index 0, @xa_head is that
+ * entry.  If any other entry in the array is non-NULL, @xa_head points
+ * to an @xa_node.
+ */
+struct xarray {
+/* private: The entire xarray is opaque */
+   spinlock_t  xa_lock;
+   gfp_t   xa_flags;
+   void __rcu *xa_head;
+};
+
+#define __XARRAY_INIT(name, flags) {   \
+   .xa_lock = __SPIN_LOCK_UNLOCKED(name.xa_lock),  \
+   .xa_flags = flags,  \
+   .xa_head = NULL,\
+}
+
+#define XARRAY_INIT(name) __XARRAY_INIT(name, 0)
+
+#define 

[PATCH v4 08/73] xarray: Add documentation

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

This is documentation on how to use the XArray, not details about its
internal implementation.

Signed-off-by: Matthew Wilcox 
---
 Documentation/core-api/index.rst  |   1 +
 Documentation/core-api/xarray.rst | 281 ++
 2 files changed, 282 insertions(+)
 create mode 100644 Documentation/core-api/xarray.rst

diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index d5bbe035316d..eb16ba30aeb6 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -18,6 +18,7 @@ Core utilities
local_ops
workqueue
genericirq
+   xarray
flexible-arrays
librs
genalloc
diff --git a/Documentation/core-api/xarray.rst 
b/Documentation/core-api/xarray.rst
new file mode 100644
index ..871161539242
--- /dev/null
+++ b/Documentation/core-api/xarray.rst
@@ -0,0 +1,281 @@
+==
+XArray
+==
+
+Overview
+
+
+The XArray is an abstract data type which behaves like a very large array
+of pointers.  It meets many of the same needs as a hash or a conventional
+resizable array.  Unlike a hash, it allows you to sensibly go to the
+next or previous entry in a cache-efficient manner.  In contrast to
+a resizable array, there is no need for copying data or changing MMU
+mappings in order to grow the array.  It is more memory-efficient,
+parallelisable and cache friendly than a doubly-linked list.  It takes
+advantage of RCU to perform lookups without locking.
+
+The XArray implementation is efficient when the indices used are
+densely clustered; hashing the object and using the hash as the index
+will not perform well.  The XArray is optimised for small indices,
+but still has good performance with large indices.  If your index is
+larger than ULONG_MAX then the XArray is not the data type for you.
+The most important user of the XArray is the page cache.
+
+A freshly-initialised XArray contains a ``NULL`` pointer at every index.
+Each non-``NULL`` entry in the array has three bits associated with
+it called tags.  Each tag may be flipped on or off independently of
+the others.  You can search for entries with a given tag set.
+
+Normal pointers may be stored in the XArray directly.  They must be 4-byte
+aligned, which is true for any pointer returned from :c:func:`kmalloc` and
+:c:func:`alloc_page`.  It isn't true for arbitrary user-space pointers,
+nor for function pointers.  You can store pointers to statically allocated
+objects, as long as those objects have an alignment of at least 4.
+
+The XArray does not support storing :c:func:`IS_ERR` pointers; some
+conflict with data values and others conflict with entries the XArray
+uses for its own purposes.  If you need to store special values which
+cannot be confused with real kernel pointers, the values 4, 8, ... 4092
+are available.
+
+You can also store integers between 0 and ``LONG_MAX`` in the XArray.
+You must first convert it into an entry using :c:func:`xa_mk_value`.
+When you retrieve an entry from the XArray, you can check whether it is
+a data value by calling :c:func:`xa_is_value`, and convert it back to
+an integer by calling :c:func:`xa_to_value`.
+
+An unusual feature of the XArray is the ability to create entries which
+occupy a range of indices.  Once stored to, looking up any index in
+the range will return the same entry as looking up any other index in
+the range.  Setting a tag on one index will set it on all of them.
+Storing to any index will store to all of them.  Multi-index entries can
+be explicitly split into smaller entries, or storing ``NULL`` into any
+entry will cause the XArray to forget about the range.
+
+Normal API
+==
+
+Start by initialising an XArray, either with :c:func:`DEFINE_XARRAY`
+for statically allocated XArrays or :c:func:`xa_init` for dynamically
+allocated ones.
+
+You can then set entries using :c:func:`xa_store` and get entries
+using :c:func:`xa_load`.  xa_store will overwrite any entry with the
+new entry and return the previous entry stored at that index.  If you
+store %NULL, the XArray does not need to allocate memory.  You can call
+:c:func:`xa_erase` to avoid inventing a GFP flags value.  There is no
+difference between an entry that has never been stored to and one that
+has most recently had %NULL stored to it.
+
+You can conditionally replace an entry at an index by using
+:c:func:`xa_cmpxchg`.  Like :c:func:`cmpxchg`, it will only succeed if
+the entry at that index has the 'old' value.  It also returns the entry
+which was at that index; if it returns the same entry which was passed as
+'old', then :c:func:`xa_cmpxchg` succeeded.
+
+You can enquire whether a tag is set on an entry by using
+:c:func:`xa_get_tag`.  If the entry is not ``NULL``, you can set a tag
+on it by using :c:func:`xa_set_tag` and remove the tag from an entry by
+calling :c:func:`xa_clear_tag`.  You can ask whether any entry in the
+XArray 

[PATCH v4 00/73] XArray version 4

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

I looked through some notes and decided this was version 4 of the XArray.
Last posted two weeks ago, this version includes a *lot* of changes.
I'd like to thank Dave Chinner for his feedback, encouragement and
distracting ideas for improvement, which I'll get to once this is merged.

Highlights:
 - Over 2000 words of documentation in patch 8!  And lots more kernel-doc.
 - The page cache is now fully converted to the XArray.
 - Many more tests in the test-suite.

This patch set is not for applying.  0day is still reporting problems,
and I'd feel bad for eating someone's data.  These patches apply on top
of a set of preparatory patches which just aren't interesting.  If you
want to see the patches applied to a tree, I suggest pulling my git tree:
http://git.infradead.org/users/willy/linux-dax.git/shortlog/refs/heads/xarray-2017-12-04
I also left out the idr_preload removals.  They're still in the git tree,
but I'm not looking for feedback on them.

Changes since v3:

XArray API differences:
 - Store a pointer to the struct xarray in the xa_state (changes almost
   every prototype in the advanced API).
 - Added xas_lock() etc to operate on the XArray stored in the xa_state.
 - Added xa_erase() as a synonym for xa_store(..., NULL, 0).
 - Added __xa_erase() which is an exact replacement for radix_tree_delete();
   it assumes you are holding the xa_lock.
 - Renamed xa_next() to xa_find_after().
 - Renamed xas_next() to xas_next_entry().
 - Renamed xas_prev_any() and xas_next_any() to xas_prev() and xas_next().
 - Changed the semantics of xas_prev() and xas_next() substantially
   (see their kernel-doc).
 - Renamed skip entry to zero entry.
 - Introduced a new XAS_BOUNDS state to distinguish between an xa_state
   that has not been walked and an xa_state that has walked off the
   current end of the array.
 - Changed xas_set_err() to take a negative errno, not a positive one.
   XAS_ERROR still takes a positive errno, but this is an undocumented
   internal part of the implementation, not an API.
 - Changed behaviour when returning a multi-index entry; xas.xa_index
   is now always set to the first (canonical) index of this entry.
   Before, it was never rewound.  Eg if you have an entry occupying
   indices 4-7, and called xas_load() with xas.xa_index set to 6, it
   will now set xas.xa_index to 4.
 - Changed xas_nomem() to release any allocated memory if there is no
   ENOMEM error.  This means that (unless the user explicitly bypasses
   calling xas_nomem() on some path), there's no need to call xas_destroy()
   and it is removed from the API.
 - Added xas_create_range() for the benefit of our current hugepage users.
   I hope to be able to remove it again once they are converted to use
   multi-index entries.
 - Add xa_get_maybe_tag() which will call xa_get_entries() if you specify
   XA_NO_TAG and xa_get_tagged otherwise.

IDR API differences:
 - Removed the IDR cyclic API change (decided not to do it after all).
 - Made idr_alloc_ul() and idr_alloc_u32() assign the ID before inserting
   the pointer into the IDR, so a lookup cannot find an uninitialised object.

Bug fixes:
 - Made INIT_RADIX_TREE() initialise the xa_lock correctly so lockdep
   doesn't whine about it.
 - Fixed a locking bug in the IPC IDR conversion.
 - If we call xas_store(, NULL) and that causes the XArray to shrink,
   set the xas to the XAS_BOUNDS state so we don't dereference a pointer
   to a node which has been passed to RCU free.  This is only a problem
   on !SMP machines.
 - Fixed bug when shrinking the XArray to a single entry at index 0.
 - Fixed bug where we could scan off the end of the slot array when storing
   a NULL.
 - Made xas_pause() not do anything if we're in an error state.  Before, it
   would have dereferenced a NULL pointer.
 - Fixed a bug in xa_find_after().  It just plain didn't work.  Now there is
   a test-case for it.

Conversions:
 - Converted backing dev cgroup code from radix tree to XArray.
 - Converted the USB XHCI driver from radix tree to XArray.
 - Moved btrfs_page_exists_in_range() guts to page cache code.
 - Renamed page_cache_{next,prev}_hole() to ..._gap().  The page cache
   doesn't cache holes.
 - Finished the page cache conversion.

Miscellaneous:
 - Documentation.  Lots and lots of documentation.  xarray.rst, more XArray
   kernel-doc and also IDR kernel-doc which has been missing for years.
 - Added MAINTAINERS entry for XArray/IDR.
 - Deleted the now-unused parts of the radix tree API (see git tree).
 - Added XA_DEBUG code and enable it in test-suite.
 - Improved code generation for initialising xa_state by explicitly
   initialising the struct padding (stupid gcc).
 - Stub out more code if CONFIG_RADIX_TREE_MULTIORDER isn't enabled.
 - Added more tests to the test-suite.
 - Removed the IDR preload conversions from this patch set (see git tree).

Matthew Wilcox (73):
  xfs: Rename xa_ elements to ail_
  xarray: Add the xa_lock to the 

[PATCH v4 30/73] mm: Convert workingset to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

We construct a fake XA_STATE and use it to delete the node with xa_store()
rather than adding a special function for this unique use case.

Signed-off-by: Matthew Wilcox 
---
 include/linux/swap.h |  4 ++--
 mm/workingset.c  | 48 
 2 files changed, 22 insertions(+), 30 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index c2b8128799c1..e4a8afcb214c 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -300,12 +300,12 @@ bool workingset_refault(void *shadow);
 void workingset_activation(struct page *page);
 
 /* Do not use directly, use workingset_lookup_update */
-void workingset_update_node(struct radix_tree_node *node);
+void workingset_update_node(struct xa_node *node);
 
 /* Returns workingset_update_node() if the mapping has shadow entries. */
 #define workingset_lookup_update(mapping)  \
 ({ \
-   radix_tree_update_node_t __helper = workingset_update_node; \
+   xa_update_node_t __helper = workingset_update_node; \
if (dax_mapping(mapping) || shmem_mapping(mapping)) \
__helper = NULL;\
__helper;   \
diff --git a/mm/workingset.c b/mm/workingset.c
index 0a3465700d5f..e51deb274d2f 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -148,7 +148,7 @@
  * and activations is maintained (node->inactive_age).
  *
  * On eviction, a snapshot of this counter (along with some bits to
- * identify the node) is stored in the now empty page cache radix tree
+ * identify the node) is stored in the now empty page cache
  * slot of the evicted page.  This is called a shadow entry.
  *
  * On cache misses for which there are shadow entries, an eligible
@@ -162,7 +162,7 @@
 
 /*
  * Eviction timestamps need to be able to cover the full range of
- * actionable refaults. However, bits are tight in the radix tree
+ * actionable refaults. However, bits are tight in the xarray
  * entry, and after storing the identifier for the lruvec there might
  * not be enough left to represent every single actionable refault. In
  * that case, we have to sacrifice granularity for distance, and group
@@ -338,7 +338,7 @@ void workingset_activation(struct page *page)
 
 static struct list_lru shadow_nodes;
 
-void workingset_update_node(struct radix_tree_node *node)
+void workingset_update_node(struct xa_node *node)
 {
/*
 * Track non-empty nodes that contain only shadow entries;
@@ -370,7 +370,7 @@ static unsigned long count_shadow_nodes(struct shrinker 
*shrinker,
local_irq_enable();
 
/*
-* Approximate a reasonable limit for the radix tree nodes
+* Approximate a reasonable limit for the nodes
 * containing shadow entries. We don't need to keep more
 * shadow entries than possible pages on the active list,
 * since refault distances bigger than that are dismissed.
@@ -385,11 +385,11 @@ static unsigned long count_shadow_nodes(struct shrinker 
*shrinker,
 * worst-case density of 1/8th. Below that, not all eligible
 * refaults can be detected anymore.
 *
-* On 64-bit with 7 radix_tree_nodes per page and 64 slots
+* On 64-bit with 7 xa_nodes per page and 64 slots
 * each, this will reclaim shadow entries when they consume
 * ~1.8% of available memory:
 *
-* PAGE_SIZE / radix_tree_nodes / node_entries * 8 / PAGE_SIZE
+* PAGE_SIZE / xa_nodes / node_entries * 8 / PAGE_SIZE
 */
if (sc->memcg) {
cache = mem_cgroup_node_nr_lru_pages(sc->memcg, sc->nid,
@@ -410,9 +410,9 @@ static enum lru_status shadow_lru_isolate(struct list_head 
*item,
  spinlock_t *lru_lock,
  void *arg)
 {
+   XA_STATE(xas, NULL, 0);
struct address_space *mapping;
-   struct radix_tree_node *node;
-   unsigned int i;
+   struct xa_node *node;
int ret;
 
/*
@@ -420,14 +420,14 @@ static enum lru_status shadow_lru_isolate(struct 
list_head *item,
 * the shadow node LRU under the mapping->pages.xa_lock and the
 * lru_lock.  Because the page cache tree is emptied before
 * the inode can be destroyed, holding the lru_lock pins any
-* address_space that has radix tree nodes on the LRU.
+* address_space that has nodes on the LRU.
 *
 * We can then safely transition to the mapping->pages.xa_lock to
 * pin only the address_space of the particular node we want
 * to reclaim, take the node off-LRU, and drop the lru_lock.
 */
 
-   node = container_of(item, struct radix_tree_node, private_list);
+  

[PATCH v4 31/73] mm: Convert truncate to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

This is essentially xa_cmpxchg() with the locking handled above us,
and it doesn't have to handle replacing a NULL entry.
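
In the non-advanced API, which takes the lock itself, the same store would be
spelled roughly as below; this is a sketch for comparison only, since
__clear_shadow_entry() already runs under xa_lock and needs the workingset
update callback:

	xa_cmpxchg(&mapping->pages, index, entry, NULL, GFP_NOWAIT);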

Signed-off-by: Matthew Wilcox 
---
 mm/truncate.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/mm/truncate.c b/mm/truncate.c
index 69bb743dd7e5..70323c347298 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -33,15 +33,12 @@
 static inline void __clear_shadow_entry(struct address_space *mapping,
pgoff_t index, void *entry)
 {
-   struct radix_tree_node *node;
-   void **slot;
+   XA_STATE(xas, &mapping->pages, index);

-   if (!__radix_tree_lookup(&mapping->pages, index, &node, &slot))
+   xas_set_update(&xas, workingset_update_node);
+   if (xas_load(&xas) != entry)
return;
-   if (*slot != entry)
-   return;
-   __radix_tree_replace(&mapping->pages, node, slot, NULL,
-workingset_update_node);
+   xas_store(&xas, NULL);
mapping->nrexceptional--;
 }
 
@@ -746,10 +743,10 @@ int invalidate_inode_pages2_range(struct address_space 
*mapping,
index++;
}
/*
-* For DAX we invalidate page tables after invalidating radix tree.  We
+* For DAX we invalidate page tables after invalidating page cache.  We
 * could invalidate page tables while invalidating each entry however
 * that would be expensive. And doing range unmapping before doesn't
-* work as we have no cheap way to find whether radix tree entry didn't
+* work as we have no cheap way to find whether page cache entry didn't
 * get remapped later.
 */
if (dax_mapping(mapping)) {
-- 
2.15.0



[PATCH v4 27/73] page cache: Convert delete_batch to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

Rename the function from page_cache_tree_delete_batch to just
page_cache_delete_batch.

Signed-off-by: Matthew Wilcox 
---
 mm/filemap.c | 21 +++--
 1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 6c9cad248e7f..9e6158cfbaeb 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -276,7 +276,7 @@ void delete_from_page_cache(struct page *page)
 EXPORT_SYMBOL(delete_from_page_cache);
 
 /*
- * page_cache_tree_delete_batch - delete several pages from page cache
+ * page_cache_delete_batch - delete several pages from page cache
  * @mapping: the mapping to which pages belong
  * @pvec: pagevec with pages to delete
  *
@@ -289,23 +289,18 @@ EXPORT_SYMBOL(delete_from_page_cache);
  *
  * The function expects xa_lock to be held.
  */
-static void
-page_cache_tree_delete_batch(struct address_space *mapping,
+static void page_cache_delete_batch(struct address_space *mapping,
 struct pagevec *pvec)
 {
-   struct radix_tree_iter iter;
-   void **slot;
+   XA_STATE(xas, &mapping->pages, pvec->pages[0]->index);
int total_pages = 0;
int i = 0, tail_pages = 0;
struct page *page;
-   pgoff_t start;
 
-   start = pvec->pages[0]->index;
-   radix_tree_for_each_slot(slot, &mapping->pages, &iter, start) {
+   xas_set_update(&xas, workingset_lookup_update(mapping));
+   xas_for_each(&xas, page, ULONG_MAX) {
if (i >= pagevec_count(pvec) && !tail_pages)
break;
-   page = radix_tree_deref_slot_protected(slot,
-  &mapping->pages.xa_lock);
if (xa_is_value(page))
continue;
if (!tail_pages) {
@@ -328,9 +323,7 @@ page_cache_tree_delete_batch(struct address_space *mapping,
} else {
tail_pages--;
}
-   radix_tree_clear_tags(&mapping->pages, iter.node, slot);
-   __radix_tree_replace(&mapping->pages, iter.node, slot, NULL,
-   workingset_lookup_update(mapping));
+   xas_store(&xas, NULL);
total_pages++;
}
mapping->nrpages -= total_pages;
@@ -351,7 +344,7 @@ void delete_from_page_cache_batch(struct address_space 
*mapping,
 
unaccount_page_cache_page(mapping, pvec->pages[i]);
}
-   page_cache_tree_delete_batch(mapping, pvec);
+   page_cache_delete_batch(mapping, pvec);
xa_unlock_irqrestore(&mapping->pages, flags);
 
for (i = 0; i < pagevec_count(pvec); i++)
-- 
2.15.0



[PATCH v4 26/73] page cache: Convert page cache lookups to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

Introduce page_cache_pin() to factor out the common logic between the
various lookup routines:

find_get_entry
find_get_entries
find_get_pages_range
find_get_pages_contig
find_get_pages_range_tag
find_get_entries_tag
filemap_map_pages

By using the xa_state to control the iteration, we can remove most of
the gotos and just use the normal break/continue loop control flow.
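
After conversion each lookup loop has roughly the shape below (a sketch;
'start', 'end', 'pages' and 'nr_pages' are illustrative, and page_cache_pin()
is the helper added by this patch):

	XA_STATE(xas, &mapping->pages, start);
	struct page *page;
	unsigned int ret = 0;

	rcu_read_lock();
	xas_for_each(&xas, page, end) {
		if (xas_retry(&xas, page))
			continue;
		if (xa_is_value(page))
			continue;		/* shadow or swap entry */
		if (!page_cache_pin(&xas, page))
			continue;		/* raced; the walk restarts */

		pages[ret] = page;
		if (++ret == nr_pages)
			break;
	}
	rcu_read_unlock();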

Also convert the regression1 read-side to XArray since that simulates
the functions being modified here.

Signed-off-by: Matthew Wilcox 
---
 include/linux/pagemap.h|   6 +-
 mm/filemap.c   | 380 +
 tools/testing/radix-tree/regression1.c |  68 +++---
 3 files changed, 129 insertions(+), 325 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 34d4fa3ad1c5..1a59f4a5424a 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -365,17 +365,17 @@ static inline unsigned find_get_pages(struct 
address_space *mapping,
 unsigned find_get_pages_contig(struct address_space *mapping, pgoff_t start,
   unsigned int nr_pages, struct page **pages);
 unsigned find_get_pages_range_tag(struct address_space *mapping, pgoff_t 
*index,
-   pgoff_t end, int tag, unsigned int nr_pages,
+   pgoff_t end, xa_tag_t tag, unsigned int nr_pages,
struct page **pages);
 static inline unsigned find_get_pages_tag(struct address_space *mapping,
-   pgoff_t *index, int tag, unsigned int nr_pages,
+   pgoff_t *index, xa_tag_t tag, unsigned int nr_pages,
struct page **pages)
 {
return find_get_pages_range_tag(mapping, index, (pgoff_t)-1, tag,
nr_pages, pages);
 }
 unsigned find_get_entries_tag(struct address_space *mapping, pgoff_t start,
-   int tag, unsigned int nr_entries,
+   xa_tag_t tag, unsigned int nr_entries,
struct page **entries, pgoff_t *indices);
 
 struct page *grab_cache_page_write_begin(struct address_space *mapping,
diff --git a/mm/filemap.c b/mm/filemap.c
index 6e2808fd3c06..6c9cad248e7f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1401,6 +1401,32 @@ bool page_cache_range_empty(struct address_space 
*mapping, pgoff_t index,
 }
 EXPORT_SYMBOL_GPL(page_cache_range_empty);
 
+/*
+ * page_cache_pin() - Try to pin a page in the page cache.
+ * @xas: The XArray operation state.
+ * @page: The page which has been previously found at this location.
+ *
+ * On success, the page has an elevated refcount, but is not locked.
+ * This implements the lockless pagecache protocol as described in
+ * include/linux/pagemap.h; see page_cache_get_speculative().
+ *
+ * Return: True if the page is still in the cache.
+ */
+static bool page_cache_pin(struct xa_state *xas, struct page *page)
+{
+   struct page *head = compound_head(page);
+   bool got = page_cache_get_speculative(head);
+
+   if (likely(got && (xas_reload(xas) == page) &&
+   (compound_head(page) == head)))
+   return true;
+
+   if (got)
+   put_page(head);
+   xas_retry(xas, XA_RETRY_ENTRY);
+   return false;
+}
+
 /**
  * find_get_entry - find and get a page cache entry
  * @mapping: the address_space to search
@@ -1416,51 +1442,21 @@ EXPORT_SYMBOL_GPL(page_cache_range_empty);
  */
 struct page *find_get_entry(struct address_space *mapping, pgoff_t offset)
 {
-   void **pagep;
-   struct page *head, *page;
+   XA_STATE(xas, &mapping->pages, offset);
+   struct page *page;
 
rcu_read_lock();
-repeat:
-   page = NULL;
-   pagep = radix_tree_lookup_slot(&mapping->pages, offset);
-   if (pagep) {
-   page = radix_tree_deref_slot(pagep);
-   if (unlikely(!page))
-   goto out;
-   if (radix_tree_exception(page)) {
-   if (radix_tree_deref_retry(page))
-   goto repeat;
-   /*
-* A shadow entry of a recently evicted page,
-* or a swap entry from shmem/tmpfs.  Return
-* it without attempting to raise page count.
-*/
-   goto out;
-   }
-
-   head = compound_head(page);
-   if (!page_cache_get_speculative(head))
-   goto repeat;
-
-   /* The page was split under us? */
-   if (compound_head(page) != head) {
-   put_page(head);
-   goto repeat;
-   }
+   do {
+   page = xas_load(&xas);
+   if (xas_retry(&xas, page))
+   continue;
+   if (!page 

[PATCH v4 32/73] mm: Convert add_to_swap_cache to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

Combine __add_to_swap_cache and add_to_swap_cache into one function
since there is no more need to preload.

Signed-off-by: Matthew Wilcox 
---
 mm/swap_state.c | 93 ++---
 1 file changed, 29 insertions(+), 64 deletions(-)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 3f95e8fc4cb2..117b5da9dc01 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -107,14 +107,15 @@ void show_swap_cache_info(void)
 }
 
 /*
- * __add_to_swap_cache resembles add_to_page_cache_locked on swapper_space,
+ * add_to_swap_cache resembles add_to_page_cache_locked on swapper_space,
  * but sets SwapCache flag and private instead of mapping and index.
  */
-int __add_to_swap_cache(struct page *page, swp_entry_t entry)
+int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp)
 {
-   int error, i, nr = hpage_nr_pages(page);
-   struct address_space *address_space;
+   struct address_space *address_space = swap_address_space(entry);
pgoff_t idx = swp_offset(entry);
+   XA_STATE(xas, &address_space->pages, idx);
+   unsigned int i, nr = compound_order(page);
 
VM_BUG_ON_PAGE(!PageLocked(page), page);
VM_BUG_ON_PAGE(PageSwapCache(page), page);
@@ -123,50 +124,30 @@ int __add_to_swap_cache(struct page *page, swp_entry_t 
entry)
page_ref_add(page, nr);
SetPageSwapCache(page);
 
-   address_space = swap_address_space(entry);
-   xa_lock_irq(&address_space->pages);
-   for (i = 0; i < nr; i++) {
-   set_page_private(page + i, entry.val + i);
-   error = radix_tree_insert(&address_space->pages,
- idx + i, page + i);
-   if (unlikely(error))
-   break;
-   }
-   if (likely(!error)) {
+   do {
+   xas_lock_irq(&xas);
+   xas_create_range(&xas, idx + nr - 1);
+   if (xas_error(&xas))
+   goto unlock;
+   for (i = 0; i < nr; i++) {
+   VM_BUG_ON_PAGE(xas.xa_index != idx + i, page);
+   set_page_private(page + i, entry.val + i);
+   xas_store(&xas, page + i);
+   xas_next(&xas);
+   }
address_space->nrpages += nr;
__mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, nr);
ADD_CACHE_INFO(add_total, nr);
-   } else {
-   /*
-* Only the context which have set SWAP_HAS_CACHE flag
-* would call add_to_swap_cache().
-* So add_to_swap_cache() doesn't returns -EEXIST.
-*/
-   VM_BUG_ON(error == -EEXIST);
-   set_page_private(page + i, 0UL);
-   while (i--) {
-   radix_tree_delete(&address_space->pages, idx + i);
-   set_page_private(page + i, 0UL);
-   }
-   ClearPageSwapCache(page);
-   page_ref_sub(page, nr);
-   }
-   xa_unlock_irq(&address_space->pages);
+unlock:
+   xas_unlock_irq(&xas);
+   } while (xas_nomem(&xas, gfp));
 
-   return error;
-}
-
-
-int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp_mask)
-{
-   int error;
+   if (!xas_error(&xas))
+   return 0;
 
-   error = radix_tree_maybe_preload_order(gfp_mask, compound_order(page));
-   if (!error) {
-   error = __add_to_swap_cache(page, entry);
-   radix_tree_preload_end();
-   }
-   return error;
+   ClearPageSwapCache(page);
+   page_ref_sub(page, nr);
+   return xas_error(&xas);
 }
 
 /*
@@ -220,7 +201,7 @@ int add_to_swap(struct page *page)
goto fail;
 
/*
-* Radix-tree node allocations from PF_MEMALLOC contexts could
+* XArray node allocations from PF_MEMALLOC contexts could
 * completely exhaust the page allocator. __GFP_NOMEMALLOC
 * stops emergency reserves from being allocated.
 *
@@ -232,7 +213,6 @@ int add_to_swap(struct page *page)
 */
err = add_to_swap_cache(page, entry,
__GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN);
-   /* -ENOMEM radix-tree allocation failure */
if (err)
/*
 * add_to_swap_cache() doesn't return -EEXIST, so we can safely
@@ -400,19 +380,11 @@ struct page *__read_swap_cache_async(swp_entry_t entry, 
gfp_t gfp_mask,
break;  /* Out of memory */
}
 
-   /*
-* call radix_tree_preload() while we can wait.
-*/
-   err = radix_tree_maybe_preload(gfp_mask & GFP_KERNEL);
-   if (err)
-   break;
-
/*
 * Swap entry may have been freed since our caller observed it.
 */
err = 

[PATCH v4 29/73] mm: Convert page-writeback to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

Includes moving mapping_tagged() to fs.h as a static inline, and
changing it to return bool.

Signed-off-by: Matthew Wilcox 
---
 include/linux/fs.h  | 17 +--
 mm/page-writeback.c | 62 +++--
 2 files changed, 32 insertions(+), 47 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index e4345c13e237..c58bc3c619bf 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -470,15 +470,18 @@ struct block_device {
struct mutexbd_fsfreeze_mutex;
 } __randomize_layout;
 
+/* XArray tags, for tagging dirty and writeback pages in the pagecache. */
+#define PAGECACHE_TAG_DIRTYXA_TAG_0
+#define PAGECACHE_TAG_WRITEBACKXA_TAG_1
+#define PAGECACHE_TAG_TOWRITE  XA_TAG_2
+
 /*
- * Radix-tree tags, for tagging dirty and writeback pages within the pagecache
- * radix trees
+ * Returns true if any of the pages in the mapping are marked with the tag.
  */
-#define PAGECACHE_TAG_DIRTY0
-#define PAGECACHE_TAG_WRITEBACK1
-#define PAGECACHE_TAG_TOWRITE  2
-
-int mapping_tagged(struct address_space *mapping, int tag);
+static inline bool mapping_tagged(struct address_space *mapping, xa_tag_t tag)
+{
+   return xa_tagged(&mapping->pages, tag);
+}
 
 static inline void i_mmap_lock_write(struct address_space *mapping)
 {
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 588ce729d199..0407436a8305 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2098,33 +2098,25 @@ void __init page_writeback_init(void)
  * dirty pages in the file (thus it is important for this function to be quick
  * so that it can tag pages faster than a dirtying process can create them).
  */
-/*
- * We tag pages in batches of WRITEBACK_TAG_BATCH to reduce xa_lock latency.
- */
 void tag_pages_for_writeback(struct address_space *mapping,
 pgoff_t start, pgoff_t end)
 {
-#define WRITEBACK_TAG_BATCH 4096
-   unsigned long tagged = 0;
-   struct radix_tree_iter iter;
-   void **slot;
+   XA_STATE(xas, &mapping->pages, start);
+   unsigned int tagged = 0;
+   void *page;
 
-   xa_lock_irq(&mapping->pages);
-   radix_tree_for_each_tagged(slot, &mapping->pages, &iter, start,
-   PAGECACHE_TAG_DIRTY) {
-   if (iter.index > end)
-   break;
-   radix_tree_iter_tag_set(&mapping->pages, &iter,
-   PAGECACHE_TAG_TOWRITE);
-   tagged++;
-   if ((tagged % WRITEBACK_TAG_BATCH) != 0)
+   xas_lock_irq(&xas);
+   xas_for_each_tag(&xas, page, end, PAGECACHE_TAG_DIRTY) {
+   xas_set_tag(&xas, PAGECACHE_TAG_TOWRITE);
+   if (++tagged % XA_CHECK_SCHED)
continue;
-   slot = radix_tree_iter_resume(slot, &iter);
-   xa_unlock_irq(&mapping->pages);
+
+   xas_pause(&xas);
+   xas_unlock_irq(&xas);
cond_resched();
-   xa_lock_irq(&mapping->pages);
+   xas_lock_irq(&xas);
}
-   xa_unlock_irq(&mapping->pages);
+   xas_unlock_irq(&xas);
 }
 EXPORT_SYMBOL(tag_pages_for_writeback);
 
@@ -2164,7 +2156,7 @@ int write_cache_pages(struct address_space *mapping,
pgoff_t done_index;
int cycled;
int range_whole = 0;
-   int tag;
+   xa_tag_t tag;
 
pagevec_init(&pvec);
if (wbc->range_cyclic) {
@@ -2445,7 +2437,7 @@ void account_page_cleaned(struct page *page, struct 
address_space *mapping,
 
 /*
  * For address_spaces which do not use buffers.  Just tag the page as dirty in
- * its radix tree.
+ * the xarray.
  *
  * This is also used when a single buffer is being dirtied: we want to set the
  * page dirty in that case, but not all the buffers.  This is a "bottom-up"
@@ -2471,7 +2463,7 @@ int __set_page_dirty_nobuffers(struct page *page)
BUG_ON(page_mapping(page) != mapping);
WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
account_page_dirtied(page, mapping);
-   radix_tree_tag_set(&mapping->pages, page_index(page),
+   __xa_set_tag(&mapping->pages, page_index(page),
   PAGECACHE_TAG_DIRTY);
xa_unlock_irqrestore(>pages, flags);
unlock_page_memcg(page);
@@ -2634,13 +2626,13 @@ EXPORT_SYMBOL(__cancel_dirty_page);
  * Returns true if the page was previously dirty.
  *
  * This is for preparing to put the page under writeout.  We leave the page
- * tagged as dirty in the radix tree so that a concurrent write-for-sync
+ * tagged as dirty in the xarray so that a concurrent write-for-sync
  * can discover it via a PAGECACHE_TAG_DIRTY walk.  The ->writepage
  * implementation will run either set_page_writeback() or set_page_dirty(),
- * at which stage we bring the page's dirty flag and radix-tree dirty tag
+ * at which stage we bring the page's dirty 

[PATCH v4 03/73] page cache: Use xa_lock

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

Remove the address_space ->tree_lock and use the xa_lock newly added to
the radix_tree_root.  Rename the address_space ->page_tree to ->pages,
since we don't really care that it's a tree.  Take the opportunity to
rearrange the elements of address_space to pack them better on 64-bit,
and make the comments more useful.
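
The conversion itself is mechanical; a typical before/after pair looks like
this (a sketch, the call between the lock and unlock is illustrative):

	/* Before: */
	spin_lock_irq(&mapping->tree_lock);
	radix_tree_delete(&mapping->page_tree, index);
	spin_unlock_irq(&mapping->tree_lock);

	/* After: */
	xa_lock_irq(&mapping->pages);
	radix_tree_delete(&mapping->pages, index);
	xa_unlock_irq(&mapping->pages);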

Signed-off-by: Matthew Wilcox 
---
 Documentation/cgroup-v1/memory.txt  |   2 +-
 Documentation/vm/page_migration |  14 ++--
 arch/arm/include/asm/cacheflush.h   |   6 +-
 arch/nios2/include/asm/cacheflush.h |   6 +-
 arch/parisc/include/asm/cacheflush.h|   6 +-
 drivers/staging/lustre/lustre/llite/glimpse.c   |   2 +-
 drivers/staging/lustre/lustre/mdc/mdc_request.c |   8 +-
 fs/afs/write.c  |   2 +-
 fs/btrfs/compression.c  |   2 +-
 fs/btrfs/extent_io.c|   8 +-
 fs/btrfs/inode.c|   2 +-
 fs/buffer.c |  10 +--
 fs/cifs/file.c  |   2 +-
 fs/dax.c| 106 
 fs/f2fs/data.c  |   6 +-
 fs/f2fs/dir.c   |   6 +-
 fs/f2fs/inline.c|   6 +-
 fs/f2fs/node.c  |   8 +-
 fs/fs-writeback.c   |  18 ++--
 fs/inode.c  |  11 ++-
 fs/nilfs2/btnode.c  |  20 ++---
 fs/nilfs2/page.c|  22 ++---
 include/linux/backing-dev.h |  12 +--
 include/linux/fs.h  |  17 ++--
 include/linux/mm.h  |   2 +-
 include/linux/pagemap.h |   4 +-
 mm/filemap.c|  83 +--
 mm/huge_memory.c|  10 +--
 mm/khugepaged.c |  49 +--
 mm/memcontrol.c |   2 +-
 mm/migrate.c|  31 ---
 mm/page-writeback.c |  42 +-
 mm/readahead.c  |   2 +-
 mm/rmap.c   |   4 +-
 mm/shmem.c  |  60 +++---
 mm/swap_state.c |  17 ++--
 mm/truncate.c   |  22 ++---
 mm/vmscan.c |  12 +--
 mm/workingset.c |  22 ++---
 39 files changed, 322 insertions(+), 342 deletions(-)

diff --git a/Documentation/cgroup-v1/memory.txt 
b/Documentation/cgroup-v1/memory.txt
index cefb63639070..1d17fb0405ef 100644
--- a/Documentation/cgroup-v1/memory.txt
+++ b/Documentation/cgroup-v1/memory.txt
@@ -262,7 +262,7 @@ When oom event notifier is registered, event will be 
delivered.
 2.6 Locking
 
lock_page_cgroup()/unlock_page_cgroup() should not be called under
-   mapping->tree_lock.
+   mapping xa_lock.
 
Other lock order is following:
PG_locked.
diff --git a/Documentation/vm/page_migration b/Documentation/vm/page_migration
index 0478ae2ad44a..faf849596a85 100644
--- a/Documentation/vm/page_migration
+++ b/Documentation/vm/page_migration
@@ -90,7 +90,7 @@ Steps:
 
 1. Lock the page to be migrated
 
-2. Insure that writeback is complete.
+2. Ensure that writeback is complete.
 
 3. Lock the new page that we want to move to. It is locked so that accesses to
this (not yet uptodate) page immediately lock while the move is in progress.
@@ -100,8 +100,8 @@ Steps:
mapcount is not zero then we do not migrate the page. All user space
processes that attempt to access the page will now wait on the page lock.
 
-5. The radix tree lock is taken. This will cause all processes trying
-   to access the page via the mapping to block on the radix tree spinlock.
+5. The address space xa_lock is taken. This will cause all processes trying
+   to access the page via the mapping to block on the spinlock.
 
 6. The refcount of the page is examined and we back out if references remain
otherwise we know that we are the only one referencing this page.
@@ -114,12 +114,12 @@ Steps:
 
 9. The radix tree is changed to point to the new page.
 
-10. The reference count of the old page is dropped because the radix tree
+10. The reference count of the old page is dropped because the address space
 reference is gone. A reference to the new page is established because
-the new page is referenced to by the radix tree.
+the new page is referenced by the address space.
 
-11. The radix tree lock is dropped. With that lookups in the mapping
-become possible again. Processes will move from spinning on 

[PATCH v4 04/73] xarray: Replace exceptional entries

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

Introduce xarray value entries to replace the radix tree exceptional
entry code.  This is a slight change in encoding to allow the use of an
extra bit (we can now store BITS_PER_LONG - 1 bits in a value entry).
It is also a change in emphasis; exceptional entries are intimidating
and different.  As the comment explains, you can choose to store values
or pointers in the xarray and they are both first-class citizens.
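
The encoding this implies is consistent with the xa_is_value() test visible
in the fs.h hunk below; the exact definitions live in the xarray.h changes,
so treat the following as a sketch:

	static inline void *xa_mk_value(unsigned long v)
	{
		return (void *)((v << 1) | 1);	/* BITS_PER_LONG - 1 usable bits */
	}

	static inline unsigned long xa_to_value(const void *entry)
	{
		return (unsigned long)entry >> 1;
	}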

Signed-off-by: Matthew Wilcox 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h|   4 +-
 arch/powerpc/include/asm/nohash/64/pgtable.h|   4 +-
 drivers/gpu/drm/i915/i915_gem.c |  17 ++--
 drivers/staging/lustre/lustre/mdc/mdc_request.c |   2 +-
 fs/btrfs/compression.c  |   2 +-
 fs/btrfs/inode.c|   4 +-
 fs/dax.c| 113 
 fs/proc/task_mmu.c  |   2 +-
 include/linux/fs.h  |  48 ++
 include/linux/radix-tree.h  |  36 ++--
 include/linux/swapops.h |  19 ++--
 include/linux/xarray.h  |  40 +
 lib/idr.c   |  63 ++---
 lib/radix-tree.c|  21 ++---
 mm/filemap.c|  10 +--
 mm/khugepaged.c |   2 +-
 mm/madvise.c|   2 +-
 mm/memcontrol.c |   2 +-
 mm/mincore.c|   2 +-
 mm/readahead.c  |   2 +-
 mm/shmem.c  |  10 +--
 mm/swap.c   |   2 +-
 mm/truncate.c   |  12 +--
 mm/workingset.c |  12 ++-
 tools/testing/radix-tree/idr-test.c |   6 +-
 tools/testing/radix-tree/linux/radix-tree.h |   1 +
 tools/testing/radix-tree/multiorder.c   |  47 +-
 tools/testing/radix-tree/test.c |   2 +-
 28 files changed, 248 insertions(+), 239 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 44697817ccc6..5025c26f1acd 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -649,9 +649,7 @@ static inline bool pte_user(pte_t pte)
BUILD_BUG_ON(_PAGE_HPTEFLAGS & (0x1f << _PAGE_BIT_SWAP_TYPE)); \
BUILD_BUG_ON(_PAGE_HPTEFLAGS & _PAGE_SWP_SOFT_DIRTY);   \
} while (0)
-/*
- * on pte we don't need handle RADIX_TREE_EXCEPTIONAL_SHIFT;
- */
+
 #define SWP_TYPE_BITS 5
 #define __swp_type(x)  (((x).val >> _PAGE_BIT_SWAP_TYPE) \
& ((1UL << SWP_TYPE_BITS) - 1))
diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h 
b/arch/powerpc/include/asm/nohash/64/pgtable.h
index abddf5830ad5..f711773568d7 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
@@ -329,9 +329,7 @@ static inline void __ptep_set_access_flags(struct mm_struct 
*mm,
 */ \
BUILD_BUG_ON(_PAGE_HPTEFLAGS & (0x1f << _PAGE_BIT_SWAP_TYPE)); \
} while (0)
-/*
- * on pte we don't need handle RADIX_TREE_EXCEPTIONAL_SHIFT;
- */
+
 #define SWP_TYPE_BITS 5
 #define __swp_type(x)  (((x).val >> _PAGE_BIT_SWAP_TYPE) \
& ((1UL << SWP_TYPE_BITS) - 1))
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 3a140eedfc83..0446ed973f75 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5375,7 +5375,8 @@ i915_gem_object_get_sg(struct drm_i915_gem_object *obj,
count = __sg_page_count(sg);
 
while (idx + count <= n) {
-   unsigned long exception, i;
+   void *entry;
+   unsigned long i;
int ret;
 
/* If we cannot allocate and insert this entry, or the
@@ -5390,12 +5391,9 @@ i915_gem_object_get_sg(struct drm_i915_gem_object *obj,
if (ret && ret != -EEXIST)
goto scan;
 
-   exception =
-   RADIX_TREE_EXCEPTIONAL_ENTRY |
-   idx << RADIX_TREE_EXCEPTIONAL_SHIFT;
+   entry = xa_mk_value(idx);
for (i = 1; i < count; i++) {
-   ret = radix_tree_insert(&iter->radix, idx + i,
-   (void *)exception);
+   ret = radix_tree_insert(&iter->radix, idx + i, entry);
if (ret && ret != -EEXIST)
goto scan;
}
@@ -5433,15 +5431,14 @@ 

[PATCH v4 14/73] xarray: Add xas_for_each_tag

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

This iterator operates across each tagged entry in the specified range.
We do not yet have a user for an xa_for_each_tag iterator, but it would
be straight-forward to add one if needed.  This commit also includes
xas_find_tag() and xas_next_tag().
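
Typical usage mirrors the page-writeback conversion later in this series
(a sketch; 'start' and 'end' are illustrative):

	XA_STATE(xas, &mapping->pages, start);
	struct page *page;

	xas_lock_irq(&xas);
	xas_for_each_tag(&xas, page, end, PAGECACHE_TAG_DIRTY)
		xas_set_tag(&xas, PAGECACHE_TAG_TOWRITE);
	xas_unlock_irq(&xas);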

Signed-off-by: Matthew Wilcox 
---
 include/linux/xarray.h | 68 +++
 lib/xarray.c   | 78 ++
 2 files changed, 146 insertions(+)

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index a62baf6f1a28..4e61ebd406f5 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -554,6 +554,7 @@ void *xas_find(struct xa_state *, unsigned long max);
 bool xas_get_tag(const struct xa_state *, xa_tag_t);
 void xas_set_tag(const struct xa_state *, xa_tag_t);
 void xas_clear_tag(const struct xa_state *, xa_tag_t);
+void *xas_find_tag(struct xa_state *, unsigned long max, xa_tag_t);
 void xas_init_tags(const struct xa_state *);
 
 bool xas_nomem(struct xa_state *, gfp_t);
@@ -676,6 +677,55 @@ static inline void *xas_next_entry(struct xa_state *xas, 
unsigned long max)
return entry;
 }
 
+/* Private */
+static inline unsigned int xas_find_chunk(struct xa_state *xas, bool advance,
+   xa_tag_t tag)
+{
+   unsigned long *addr = xas->xa_node->tags[(__force unsigned)tag];
+   unsigned int offset = xas->xa_offset;
+
+   if (advance)
+   offset++;
+   if (XA_CHUNK_SIZE == BITS_PER_LONG) {
+   unsigned long data = *addr & (~0UL << offset);
+   if (data)
+   return __ffs(data);
+   return XA_CHUNK_SIZE;
+   }
+
+   return find_next_bit(addr, XA_CHUNK_SIZE, offset);
+}
+
+/**
+ * xas_next_tag() - Advance iterator to next tagged entry.
+ * @xas: XArray operation state.
+ * @max: Highest index to return.
+ * @tag: Tag to search for.
+ *
+ * xas_next_tag() is an inline function to optimise xarray traversal for
+ * speed.  It is equivalent to calling xas_find_tag(), and will call
+ * xas_find_tag() for all the hard cases.
+ *
+ * Return: The next tagged entry after the one currently referred to by @xas.
+ */
+static inline void *xas_next_tag(struct xa_state *xas, unsigned long max,
+   xa_tag_t tag)
+{
+   struct xa_node *node = xas->xa_node;
+   unsigned int offset;
+
+   if (unlikely(xas_not_node(node) || xa_node_shift(node)))
+   return xas_find_tag(xas, max, tag);
+   offset = xas_find_chunk(xas, true, tag);
+   xas->xa_offset = offset;
+   xas->xa_index = (xas->xa_index & ~XA_CHUNK_MASK) + offset;
+   if (xas->xa_index > max)
+   return NULL;
+   if (offset == XA_CHUNK_SIZE)
+   return xas_find_tag(xas, max, tag);
+   return xa_entry(xas->xa, node, offset);
+}
+
 /*
  * If iterating while holding a lock, drop the lock and reschedule
  * every %XA_CHECK_SCHED loops.
@@ -701,6 +751,24 @@ enum {
for (entry = xas_find(xas, max); entry; \
 entry = xas_next_entry(xas, max))
 
+/**
+ * xas_for_each_tag() - Iterate over a range of an XArray
+ * @xas: XArray operation state.
+ * @entry: Entry retrieved from array.
+ * @max: Maximum index to retrieve from array.
+ * @tag: Tag to search for.
+ *
+ * The loop body will be executed for each tagged entry in the xarray
+ * between the current xas position and @max.  @entry will be set to
+ * the entry retrieved from the xarray.  It is safe to delete entries
+ * from the array in the loop body.  You should hold either the RCU lock
+ * or the xa_lock while iterating.  If you need to drop the lock, call
+ * xas_pause() first.
+ */
+#define xas_for_each_tag(xas, entry, max, tag) \
+   for (entry = xas_find_tag(xas, max, tag); entry; \
+entry = xas_next_tag(xas, max, tag))
+
 /* Internal functions, mostly shared between radix-tree.c, xarray.c and idr.c 
*/
 void xas_destroy(struct xa_state *);
 
diff --git a/lib/xarray.c b/lib/xarray.c
index ac4ff3daf476..f9eaac2d85f9 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -858,6 +858,84 @@ void *xas_find(struct xa_state *xas, unsigned long max)
 }
 EXPORT_SYMBOL_GPL(xas_find);
 
+/**
+ * xas_find_tag() - Find the next tagged entry in the XArray.
+ * @xas: XArray operation state.
+ * @max: Highest index to return.
+ * @tag: Tag number to search for.
+ *
+ * If the xas has not yet been walked to an entry, return the tagged entry
+ * which has an index >= xas.xa_index.  If it has been walked, the entry
+ * currently being pointed at has been processed, and so we move to the
+ * next tagged entry.
+ *
+ * If no tagged entry is found and the array is smaller than @max, @xas is
+ * set to the restart state and xas->xa_index is set to the smallest index
+ * not yet in the array.  This allows @xas to be immediately passed to
+ * xas_create().
+ *
+ * Return: 

[PATCH v4 16/73] xarray: Add xa_destroy

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

This function frees all the internal memory allocated to the xarray
and reinitialises it to be empty.
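
Typical use is in a teardown path, once the objects the array points to no
longer need individual freeing (a sketch; 'struct foo' is illustrative):

	struct foo {
		struct xarray ids;	/* stores value entries; nothing to free */
	};

	static void foo_release(struct foo *foo)
	{
		xa_destroy(&foo->ids);	/* frees all nodes; array is now empty */
	}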

Signed-off-by: Matthew Wilcox 
---
 include/linux/xarray.h |  1 +
 lib/xarray.c   | 26 ++
 2 files changed, 27 insertions(+)

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index c3efcc3432f7..b648c1b93d9f 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -74,6 +74,7 @@ void *xa_load(struct xarray *, unsigned long index);
 void *xa_store(struct xarray *, unsigned long index, void *entry, gfp_t);
 void *xa_cmpxchg(struct xarray *, unsigned long index,
void *old, void *entry, gfp_t);
+void xa_destroy(struct xarray *);
 
 /**
  * xa_erase() - Erase this entry from the XArray.
diff --git a/lib/xarray.c b/lib/xarray.c
index 251724f62b11..f3875b251b41 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -1341,6 +1341,32 @@ int xa_get_tagged(struct xarray *xa, void **dst, 
unsigned long start,
 }
 EXPORT_SYMBOL(xa_get_tagged);
 
+/**
+ * xa_destroy() - Free all internal data structures.
+ * @xa: XArray.
+ *
+ * After calling this function, the XArray is empty and has freed all memory
+ * allocated for its internal data structures.  You are responsible for
+ * freeing the objects referenced by the XArray.
+ */
+void xa_destroy(struct xarray *xa)
+{
+   XA_STATE(xas, xa, 0);
+   unsigned long flags;
+   void *entry;
+
+   xas.xa_node = NULL;
+   xa_lock_irqsave(xa, flags);
+   entry = xa_head_locked(xa);
+   RCU_INIT_POINTER(xa->xa_head, NULL);
+   xas_init_tags(&xas);
+   /* lockdep checks we're still holding the lock in xas_free_nodes() */
+   if (xa_is_node(entry))
+   xas_free_nodes(&xas, xa_to_node(entry));
+   xa_unlock_irqrestore(xa, flags);
+}
+EXPORT_SYMBOL(xa_destroy);
+
 #ifdef XA_DEBUG
 void xa_dump_entry(void *entry, unsigned long index)
 {
-- 
2.15.0



[PATCH v4 05/73] xarray: Change definition of sibling entries

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

Instead of storing a pointer to the slot containing the canonical entry,
store the offset of the slot.  Produces slightly more efficient code
(~300 bytes) and simplifies the implementation.
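
The difference shows up when resolving a sibling entry back to the canonical
slot (a sketch; the real code goes through the usual RCU slot accessors):

	/* Sketch: recover the canonical offset for a possibly-sibling slot. */
	static unsigned int canonical_offset(const struct xa_node *parent,
					     unsigned int offset, void *entry)
	{
		/* Old scheme: 'entry' pointed into parent->slots[], so the
		 * offset had to be recovered with pointer arithmetic:
		 *	return (void __rcu **)entry - parent->slots;
		 * New scheme: the offset is encoded in the entry itself.  */
		if (xa_is_sibling(entry))
			return xa_to_sibling(entry);
		return offset;
	}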

Signed-off-by: Matthew Wilcox 
---
 include/linux/xarray.h | 82 ++
 lib/radix-tree.c   | 65 +++
 2 files changed, 100 insertions(+), 47 deletions(-)

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index e55f5cfd14ed..2c45d87a3476 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -58,6 +58,8 @@ static inline bool xa_is_value(void *entry)
return (unsigned long)entry & 1;
 }
 
+/* Everything below here is the Advanced API.  Proceed with caution. */
+
 #define xa_trylock(xa) spin_trylock(&(xa)->xa_lock)
 #define xa_lock(xa)spin_lock(&(xa)->xa_lock)
 #define xa_unlock(xa)  spin_unlock(&(xa)->xa_lock)
@@ -71,4 +73,84 @@ static inline bool xa_is_value(void *entry)
spin_unlock_irqrestore(&(xa)->xa_lock, flags)
 #define xa_lock_held(xa)   lockdep_is_held(&(xa)->xa_lock)
 
+/*
+ * The xarray is constructed out of a set of 'chunks' of pointers.  Choosing
+ * the best chunk size requires some tradeoffs.  A power of two recommends
+ * itself so that we can walk the tree based purely on shifts and masks.
+ * Generally, the larger the better; as the number of slots per level of the
+ * tree increases, the less tall the tree needs to be.  But that needs to be
+ * balanced against the memory consumption of each node.  On a 64-bit system,
+ * xa_node is currently 576 bytes, and we get 7 of them per 4kB page.  If we
+ * doubled the number of slots per node, we'd get only 3 nodes per 4kB page.
+ */
+#ifndef XA_CHUNK_SHIFT
+#define XA_CHUNK_SHIFT (CONFIG_BASE_SMALL ? 4 : 6)
+#endif
+#define XA_CHUNK_SIZE  (1UL << XA_CHUNK_SHIFT)
+#define XA_CHUNK_MASK  (XA_CHUNK_SIZE - 1)
+
+/*
+ * Internal entries have the bottom two bits set to the value 10b.  Most
+ * internal entries are pointers to the next node in the tree.  Since the
+ * kernel unmaps page 0 to trap NULL pointer dereferences, we can use values
+ * 0-1023 for special purposes.  Values 0-62 are used for sibling
+ * entries.  Value 256 is used for the retry entry.
+ */
+
+/* Private */
+static inline void *xa_mk_internal(unsigned long v)
+{
+   return (void *)((v << 2) | 2);
+}
+
+/* Private */
+static inline unsigned long xa_to_internal(void *entry)
+{
+   return (unsigned long)entry >> 2;
+}
+
+/**
+ * xa_is_internal() - Is the entry an internal entry?
+ * @entry: Entry retrieved from the XArray
+ *
+ * Return: %true if the entry is an internal entry.
+ */
+static inline bool xa_is_internal(void *entry)
+{
+   return ((unsigned long)entry & 3) == 2;
+}
+
+/* Private */
+static inline bool xa_is_node(void *entry)
+{
+   return xa_is_internal(entry) && (unsigned long)entry > 4096;
+}
+
+/* Private */
+static inline void *xa_mk_sibling(unsigned int offset)
+{
+   return xa_mk_internal(offset);
+}
+
+/* Private */
+static inline unsigned long xa_to_sibling(void *entry)
+{
+   return xa_to_internal(entry);
+}
+
+/**
+ * xa_is_sibling() - Is the entry a sibling entry?
+ * @entry: Entry retrieved from the XArray
+ *
+ * Return: %true if the entry is a sibling entry.
+ */
+static inline bool xa_is_sibling(void *entry)
+{
+   return IS_ENABLED(CONFIG_RADIX_TREE_MULTIORDER) &&
+   xa_is_internal(entry) &&
+   (entry < xa_mk_sibling(XA_CHUNK_SIZE - 1));
+}
+
+#define XA_RETRY_ENTRY xa_mk_internal(256)
+
 #endif /* _LINUX_XARRAY_H */
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index cda7a730e591..0a7a21dd9398 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 
 
 /* Number of nodes in fully populated tree of given height */
@@ -97,24 +98,7 @@ static inline void *node_to_entry(void *ptr)
return (void *)((unsigned long)ptr | RADIX_TREE_INTERNAL_NODE);
 }
 
-#define RADIX_TREE_RETRY   node_to_entry(NULL)
-
-#ifdef CONFIG_RADIX_TREE_MULTIORDER
-/* Sibling slots point directly to another slot in the same node */
-static inline
-bool is_sibling_entry(const struct radix_tree_node *parent, void *node)
-{
-   void __rcu **ptr = node;
-   return (parent->slots <= ptr) &&
-   (ptr < parent->slots + RADIX_TREE_MAP_SIZE);
-}
-#else
-static inline
-bool is_sibling_entry(const struct radix_tree_node *parent, void *node)
-{
-   return false;
-}
-#endif
+#define RADIX_TREE_RETRY   XA_RETRY_ENTRY
 
 static inline unsigned long
 get_slot_offset(const struct radix_tree_node *parent, void __rcu **slot)
@@ -128,16 +112,10 @@ static unsigned int radix_tree_descend(const struct 
radix_tree_node *parent,
unsigned int offset = (index >> parent->shift) & 

[PATCH v4 25/73] page cache: Convert page deletion to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

The code is slightly shorter and simpler.

Signed-off-by: Matthew Wilcox 
---
 mm/filemap.c | 26 --
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 2439747a0a17..6e2808fd3c06 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -115,27 +115,25 @@
 static void page_cache_tree_delete(struct address_space *mapping,
   struct page *page, void *shadow)
 {
-   int i, nr;
+   XA_STATE(xas, &mapping->pages, page->index);
+   unsigned int i, nr;
 
-   /* hugetlb pages are represented by one entry in the radix tree */
+   xas_set_update(&xas, workingset_lookup_update(mapping));
+
+   /* hugetlb pages are represented by a single entry in the xarray */
nr = PageHuge(page) ? 1 : hpage_nr_pages(page);
 
VM_BUG_ON_PAGE(!PageLocked(page), page);
VM_BUG_ON_PAGE(PageTail(page), page);
VM_BUG_ON_PAGE(nr != 1 && shadow, page);
 
-   for (i = 0; i < nr; i++) {
-   struct radix_tree_node *node;
-   void **slot;
-
-   __radix_tree_lookup(&mapping->pages, page->index + i,
-   &node, &slot);
-
-   VM_BUG_ON_PAGE(!node && nr != 1, page);
-
-   radix_tree_clear_tags(&mapping->pages, node, slot);
-   __radix_tree_replace(&mapping->pages, node, slot, shadow,
-   workingset_lookup_update(mapping));
+   i = nr;
+repeat:
+   xas_store(&xas, shadow);
+   xas_init_tags(&xas);
+   if (--i) {
+   xas_next(&xas);
+   goto repeat;
}
 
page->mapping = NULL;
-- 
2.15.0



[PATCH v4 23/73] page cache: Add page_cache_range_empty function

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

btrfs has its own custom function for determining whether the page cache
has any pages in a particular range.  Move this functionality to the
page cache, and call it from btrfs.

Signed-off-by: Matthew Wilcox 
---
 fs/btrfs/btrfs_inode.h  |  7 -
 fs/btrfs/inode.c| 70 -
 include/linux/pagemap.h |  2 ++
 mm/filemap.c| 26 ++
 4 files changed, 34 insertions(+), 71 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 63f0ccc92a71..a48bd6e0a0bb 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -365,6 +365,11 @@ static inline void btrfs_print_data_csum_error(struct 
btrfs_inode *inode,
logical_start, csum, csum_expected, mirror_num);
 }
 
-bool btrfs_page_exists_in_range(struct inode *inode, loff_t start, loff_t end);
+static inline bool btrfs_page_exists_in_range(struct inode *inode,
+   loff_t start, loff_t end)
+{
+   return page_cache_range_empty(inode->i_mapping, start >> PAGE_SHIFT,
+   end >> PAGE_SHIFT);
+}
 
 #endif
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 72f763c56127..a2692bceaa98 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7539,76 +7539,6 @@ noinline int can_nocow_extent(struct inode *inode, u64 
offset, u64 *len,
return ret;
 }
 
-bool btrfs_page_exists_in_range(struct inode *inode, loff_t start, loff_t end)
-{
-   struct radix_tree_root *root = &inode->i_mapping->pages;
-   bool found = false;
-   void **pagep = NULL;
-   struct page *page = NULL;
-   unsigned long start_idx;
-   unsigned long end_idx;
-
-   start_idx = start >> PAGE_SHIFT;
-
-   /*
-* end is the last byte in the last page.  end == start is legal
-*/
-   end_idx = end >> PAGE_SHIFT;
-
-   rcu_read_lock();
-
-   /* Most of the code in this while loop is lifted from
-* find_get_page.  It's been modified to begin searching from a
-* page and return just the first page found in that range.  If the
-* found idx is less than or equal to the end idx then we know that
-* a page exists.  If no pages are found or if those pages are
-* outside of the range then we're fine (yay!) */
-   while (page == NULL &&
-  radix_tree_gang_lookup_slot(root, &pagep, NULL, start_idx, 1)) {
-   page = radix_tree_deref_slot(pagep);
-   if (unlikely(!page))
-   break;
-
-   if (radix_tree_exception(page)) {
-   if (radix_tree_deref_retry(page)) {
-   page = NULL;
-   continue;
-   }
-   /*
-* Otherwise, shmem/tmpfs must be storing a swap entry
-* here so return it without attempting to raise page
-* count.
-*/
-   page = NULL;
-   break; /* TODO: Is this relevant for this use case? */
-   }
-
-   if (!page_cache_get_speculative(page)) {
-   page = NULL;
-   continue;
-   }
-
-   /*
-* Has the page moved?
-* This is part of the lockless pagecache protocol. See
-* include/linux/pagemap.h for details.
-*/
-   if (unlikely(page != *pagep)) {
-   put_page(page);
-   page = NULL;
-   }
-   }
-
-   if (page) {
-   if (page->index <= end_idx)
-   found = true;
-   put_page(page);
-   }
-
-   rcu_read_unlock();
-   return found;
-}
-
 static int lock_extent_direct(struct inode *inode, u64 lockstart, u64 lockend,
  struct extent_state **cached_state, int writing)
 {
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 0db127c3ccac..34d4fa3ad1c5 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -245,6 +245,8 @@ pgoff_t page_cache_next_gap(struct address_space *mapping,
 pgoff_t index, unsigned long max_scan);
 pgoff_t page_cache_prev_gap(struct address_space *mapping,
 pgoff_t index, unsigned long max_scan);
+bool page_cache_range_empty(struct address_space *mapping,
+   pgoff_t index, pgoff_t max);
 
 #define FGP_ACCESSED   0x0001
 #define FGP_LOCK   0x0002
diff --git a/mm/filemap.c b/mm/filemap.c
index 650624f7b79d..51f88ffc5319 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1397,6 +1397,32 @@ pgoff_t page_cache_prev_gap(struct address_space 
*mapping,
 }
 

[PATCH v4 01/73] xfs: Rename xa_ elements to ail_

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

This is a simple rename, except that xa_ail becomes ail_head.

Signed-off-by: Matthew Wilcox 
---
 fs/xfs/xfs_buf_item.c|  10 ++--
 fs/xfs/xfs_dquot.c   |   4 +-
 fs/xfs/xfs_dquot_item.c  |  11 ++--
 fs/xfs/xfs_inode_item.c  |  22 +++
 fs/xfs/xfs_log.c |   6 +-
 fs/xfs/xfs_log_recover.c |  80 -
 fs/xfs/xfs_trans.c   |  18 +++---
 fs/xfs/xfs_trans_ail.c   | 152 +++
 fs/xfs/xfs_trans_buf.c   |   4 +-
 fs/xfs/xfs_trans_priv.h  |  42 ++---
 10 files changed, 175 insertions(+), 174 deletions(-)

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index e0a0af0946f2..6c5035544a93 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -459,7 +459,7 @@ xfs_buf_item_unpin(
bp->b_fspriv = NULL;
bp->b_iodone = NULL;
} else {
-   spin_lock(&ailp->xa_lock);
+   spin_lock(&ailp->ail_lock);
xfs_trans_ail_delete(ailp, lip, SHUTDOWN_LOG_IO_ERROR);
xfs_buf_item_relse(bp);
ASSERT(bp->b_fspriv == NULL);
@@ -1056,13 +1056,13 @@ xfs_buf_do_callbacks_fail(
struct xfs_log_item *lip = bp->b_fspriv;
struct xfs_ail  *ailp = lip->li_ailp;
 
-   spin_lock(&ailp->xa_lock);
+   spin_lock(&ailp->ail_lock);
for (; lip; lip = next) {
next = lip->li_bio_list;
if (lip->li_ops->iop_error)
lip->li_ops->iop_error(lip, bp);
}
-   spin_unlock(&ailp->xa_lock);
+   spin_unlock(&ailp->ail_lock);
 }
 
 static bool
@@ -1215,7 +1215,7 @@ xfs_buf_iodone(
 *
 * Either way, AIL is useless if we're forcing a shutdown.
 */
-   spin_lock(&ailp->xa_lock);
+   spin_lock(&ailp->ail_lock);
xfs_trans_ail_delete(ailp, lip, SHUTDOWN_CORRUPT_INCORE);
xfs_buf_item_free(BUF_ITEM(lip));
 }
@@ -1236,7 +1236,7 @@ xfs_buf_resubmit_failed_buffers(
/*
 * Clear XFS_LI_FAILED flag from all items before resubmit
 *
-* XFS_LI_FAILED set/clear is protected by xa_lock, caller  this
+* XFS_LI_FAILED set/clear is protected by ail_lock, caller  this
 * function already have it acquired
 */
for (; lip; lip = next) {
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index f248708c10ff..e2a466df5dd1 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -974,7 +974,7 @@ xfs_qm_dqflush_done(
 (lip->li_flags & XFS_LI_FAILED))) {
 
/* xfs_trans_ail_delete() drops the AIL lock. */
-   spin_lock(&ailp->xa_lock);
+   spin_lock(&ailp->ail_lock);
if (lip->li_lsn == qip->qli_flush_lsn) {
xfs_trans_ail_delete(ailp, lip, 
SHUTDOWN_CORRUPT_INCORE);
} else {
@@ -984,7 +984,7 @@ xfs_qm_dqflush_done(
 */
if (lip->li_flags & XFS_LI_FAILED)
xfs_clear_li_failed(lip);
-   spin_unlock(&ailp->xa_lock);
+   spin_unlock(&ailp->ail_lock);
}
}
 
diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c
index 664dea105e76..62637a226601 100644
--- a/fs/xfs/xfs_dquot_item.c
+++ b/fs/xfs/xfs_dquot_item.c
@@ -160,8 +160,9 @@ xfs_dquot_item_error(
 STATIC uint
 xfs_qm_dquot_logitem_push(
struct xfs_log_item *lip,
-   struct list_head*buffer_list) __releases(&lip->li_ailp->xa_lock)
- __acquires(&lip->li_ailp->xa_lock)
+   struct list_head*buffer_list)
+   __releases(&lip->li_ailp->ail_lock)
+   __acquires(&lip->li_ailp->ail_lock)
 {
struct xfs_dquot*dqp = DQUOT_ITEM(lip)->qli_dquot;
struct xfs_buf  *bp = lip->li_buf;
@@ -208,7 +209,7 @@ xfs_qm_dquot_logitem_push(
goto out_unlock;
}
 
-   spin_unlock(&lip->li_ailp->xa_lock);
+   spin_unlock(&lip->li_ailp->ail_lock);
 
error = xfs_qm_dqflush(dqp, &bp);
if (error) {
@@ -220,7 +221,7 @@ xfs_qm_dquot_logitem_push(
xfs_buf_relse(bp);
}
 
-   spin_lock(&lip->li_ailp->xa_lock);
+   spin_lock(&lip->li_ailp->ail_lock);
 out_unlock:
xfs_dqunlock(dqp);
return rval;
@@ -403,7 +404,7 @@ xfs_qm_qoffend_logitem_committed(
 * Delete the qoff-start logitem from the AIL.
 * xfs_trans_ail_delete() drops the AIL lock.
 */
-   spin_lock(&ailp->xa_lock);
+   spin_lock(&ailp->ail_lock);
xfs_trans_ail_delete(ailp, &qfs->qql_item, SHUTDOWN_LOG_IO_ERROR);
 
kmem_free(qfs->qql_item.li_lv_shadow);
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 6ee5c3bf19ad..071acd4249a0 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -501,8 +501,8 @@ STATIC uint
 

[PATCH v4 21/73] ida: Convert to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

Use the xarray infrastructure like we used the radix tree infrastructure.
This lets us get rid of idr_get_free() from the radix tree code.

Signed-off-by: Matthew Wilcox 
---
 include/linux/idr.h|   8 +-
 include/linux/radix-tree.h |   4 -
 lib/idr.c  | 320 ++---
 lib/radix-tree.c   | 126 --
 4 files changed, 187 insertions(+), 271 deletions(-)

diff --git a/include/linux/idr.h b/include/linux/idr.h
index 06412fbaa65f..d6b3dbe483b8 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -245,11 +245,11 @@ struct ida_bitmap {
 DECLARE_PER_CPU(struct ida_bitmap *, ida_bitmap);
 
 struct ida {
-   struct radix_tree_root  ida_rt;
+   struct xarray   ida_xa;
 };
 
 #define IDA_INIT(name) {   \
-   .ida_rt = RADIX_TREE_INIT(name, IDR_INIT_FLAGS | GFP_NOWAIT),   \
+   .ida_xa = __XARRAY_INIT(name.ida_xa, IDR_INIT_FLAGS)\
 }
 #define DEFINE_IDA(name)   struct ida name = IDA_INIT(name)
 
@@ -264,7 +264,7 @@ void ida_simple_remove(struct ida *ida, unsigned int id);
 
 static inline void ida_init(struct ida *ida)
 {
-   INIT_RADIX_TREE(&ida->ida_rt, IDR_INIT_FLAGS | GFP_NOWAIT);
+   __xa_init(&ida->ida_xa, IDR_INIT_FLAGS);
 }
 
 /**
@@ -281,6 +281,6 @@ static inline int ida_get_new(struct ida *ida, int *p_id)
 
 static inline bool ida_is_empty(const struct ida *ida)
 {
-   return radix_tree_empty(&ida->ida_rt);
+   return xa_empty(&ida->ida_xa);
 }
 #endif /* _LINUX_IDR_H */
diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index f46e3de57115..5d67939b8cd0 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -301,10 +301,6 @@ int radix_tree_split(struct radix_tree_root *, unsigned 
long index,
 int radix_tree_join(struct radix_tree_root *, unsigned long index,
unsigned new_order, void *);
 
-void __rcu **idr_get_free(struct radix_tree_root *root,
- struct radix_tree_iter *iter, gfp_t gfp,
- unsigned long max);
-
 enum {
RADIX_TREE_ITER_TAG_MASK = 0x0f,/* tag index in lower nybble */
RADIX_TREE_ITER_TAGGED   = 0x10,/* lookup tagged slots */
diff --git a/lib/idr.c b/lib/idr.c
index e677d1869ead..eb145da485f2 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -6,7 +6,6 @@
 #include 
 
 DEFINE_PER_CPU(struct ida_bitmap *, ida_bitmap);
-static DEFINE_SPINLOCK(simple_ida_lock);
 
 /* In radix-tree.c temporarily */
 extern bool idr_nomem(struct xa_state *, gfp_t);
@@ -307,26 +306,23 @@ EXPORT_SYMBOL_GPL(idr_replace);
 /*
  * Developer's notes:
  *
- * The IDA uses the functionality provided by the IDR & radix tree to store
- * bitmaps in each entry.  The XA_FREE_TAG tag means there is at least one bit
- * free, unlike the IDR where it means at least one entry is free.
- *
- * I considered telling the radix tree that each slot is an order-10 node
- * and storing the bit numbers in the radix tree, but the radix tree can't
- * allow a single multiorder entry at index 0, which would significantly
- * increase memory consumption for the IDA.  So instead we divide the index
- * by the number of bits in the leaf bitmap before doing a radix tree lookup.
- *
- * As an optimisation, if there are only a few low bits set in any given
- * leaf, instead of allocating a 128-byte bitmap, we store the bits
+ * The IDA uses the functionality provided by the IDR & XArray to store
+ * bitmaps in each entry.  The XA_FREE_TAG tag is used to mean that there
+ * is at least one bit free, unlike the IDR where it means at least one
+ * array entry is free.
+ *
+ * The XArray supports multi-index entries, so I considered teaching the
+ * XArray that each slot is an order-10 node and indexing the XArray by the
+ * ID.  The XArray has the significant optimisation of storing the first
+ * entry in the struct xarray and avoiding allocating an xa_node.
+ * Unfortunately, it can't do that for multi-order entries.
+ * So instead the XArray index is the ID divided by the number of bits in
+ * the bitmap
+ *
+ * As a further optimisation, if there are only a few low bits set in any
+ * given leaf, instead of allocating a 128-byte bitmap, we store the bits
  * directly in the entry.
  *
- * We allow the radix tree 'exceptional' count to get out of date.  Nothing
- * in the IDA nor the radix tree code checks it.  If it becomes important
- * to maintain an accurate exceptional count, switch the rcu_assign_pointer()
- * calls to radix_tree_iter_replace() which will correct the exceptional
- * count.
- *
  * The IDA always requires a lock to alloc/free.  If we add a 'test_bit'
  * equivalent, it will still need locking.  Going to RCU lookup would require
  * using RCU to free bitmaps, and that's not trivial without embedding an
@@ -336,104 +332,114 @@ EXPORT_SYMBOL_GPL(idr_replace);
 
 

[PATCH v4 38/73] mm: Convert collapse_shmem to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

I found another victim of the radix tree being hard to use.  Because
there was no call to radix_tree_preload(), khugepaged was allocating
radix_tree_nodes using GFP_ATOMIC.

I also converted a local_irq_save()/restore() pair to
disable()/enable().

Signed-off-by: Matthew Wilcox 
---
 include/linux/swap.h |   4 +-
 mm/khugepaged.c  | 158 +--
 2 files changed, 67 insertions(+), 95 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 569a8ac4fe3f..9774f43d3e4f 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -300,12 +300,12 @@ bool workingset_refault(void *shadow);
 void workingset_activation(struct page *page);
 
 /* Do not use directly, use workingset_lookup_update */
-void workingset_update_node(struct xa_node *node);
+void workingset_update_node(struct radix_tree_node *node);
 
 /* Returns workingset_update_node() if the mapping has shadow entries. */
 #define workingset_lookup_update(mapping)  \
 ({ \
-   xa_update_node_t __helper = workingset_update_node; \
+   radix_tree_update_node_t __helper = workingset_update_node; \
if (dax_mapping(mapping) || shmem_mapping(mapping)) \
__helper = NULL;\
__helper;   \
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 55ade70c33bb..9f49d0cd61c2 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1282,17 +1282,17 @@ static void retract_page_tables(struct address_space 
*mapping, pgoff_t pgoff)
  *
  * Basic scheme is simple, details are more complex:
  *  - allocate and freeze a new huge page;
- *  - scan over radix tree replacing old pages the new one
+ *  - scan page cache replacing old pages with the new one
  *+ swap in pages if necessary;
  *+ fill in gaps;
- *+ keep old pages around in case if rollback is required;
- *  - if replacing succeed:
+ *+ keep old pages around in case rollback is required;
+ *  - if replacing succeeds:
  *+ copy data over;
  *+ free old pages;
  *+ unfreeze huge page;
  *  - if replacing failed;
  *+ put all pages back and unfreeze them;
- *+ restore gaps in the radix-tree;
+ *+ restore gaps in the page cache;
  *+ free huge page;
  */
 static void collapse_shmem(struct mm_struct *mm,
@@ -1300,12 +1300,11 @@ static void collapse_shmem(struct mm_struct *mm,
struct page **hpage, int node)
 {
gfp_t gfp;
-   struct page *page, *new_page, *tmp;
+   struct page *new_page;
struct mem_cgroup *memcg;
pgoff_t index, end = start + HPAGE_PMD_NR;
LIST_HEAD(pagelist);
-   struct radix_tree_iter iter;
-   void **slot;
+   XA_STATE(xas, &mapping->pages, start);
int nr_none = 0, result = SCAN_SUCCEED;
 
VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
@@ -1330,48 +1329,48 @@ static void collapse_shmem(struct mm_struct *mm,
__SetPageLocked(new_page);
BUG_ON(!page_ref_freeze(new_page, 1));
 
-
/*
-* At this point the new_page is 'frozen' (page_count() is zero), locked
-* and not up-to-date. It's safe to insert it into radix tree, because
-* nobody would be able to map it or use it in other way until we
-* unfreeze it.
+* At this point the new_page is 'frozen' (page_count() is zero),
+* locked and not up-to-date. It's safe to insert it into the page
+* cache, because nobody would be able to map it or use it in other
+* way until we unfreeze it.
 */
 
-   index = start;
-   xa_lock_irq(>pages);
-   radix_tree_for_each_slot(slot, >pages, , start) {
-   int n = min(iter.index, end) - index;
-
-   /*
-* Handle holes in the radix tree: charge it from shmem and
-* insert relevant subpage of new_page into the radix-tree.
-*/
-   if (n && !shmem_charge(mapping->host, n)) {
-   result = SCAN_FAIL;
+   /* This will be less messy when we use multi-index entries */
+   do {
+   xas_lock_irq();
+   xas_create_range(, end - 1);
+   if (!xas_error())
break;
-   }
-   nr_none += n;
-   for (; index < min(iter.index, end); index++) {
-   radix_tree_insert(>pages, index,
-   new_page + (index % HPAGE_PMD_NR));
-   }
+   xas_unlock_irq();
+   if (!xas_nomem(, GFP_KERNEL))
+   goto out;
+   } while (1);
 
-   /* We are done. */
-   if (index >= end)
-   break;
+   for (index = 

[PATCH v4 35/73] mm: Convert __do_page_cache_readahead to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

This one is trivial.

Signed-off-by: Matthew Wilcox 
---
 mm/readahead.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index f64b31b3a84a..66bcaffd47f0 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -174,9 +174,7 @@ int __do_page_cache_readahead(struct address_space 
*mapping, struct file *filp,
if (page_offset > end_index)
break;
 
-   rcu_read_lock();
-   page = radix_tree_lookup(>pages, page_offset);
-   rcu_read_unlock();
+   page = xa_load(>pages, page_offset);
if (page && !xa_is_value(page))
continue;
 
-- 
2.15.0



[PATCH v4 34/73] mm: Convert cgroup writeback to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

This is a fairly naive conversion, leaving in place the GFP_ATOMIC
allocation.  By switching the locking around, we could use GFP_KERNEL
and probably simplify the error handling.
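
For reference, the three possible outcomes of xa_cmpxchg() as used below,
in a hedged sketch (my_insert() is an illustrative name): NULL means the
entry was inserted, an ERR_PTR() means the allocation failed, and anything
else means another thread got there first.

#include <linux/err.h>
#include <linux/xarray.h>

/* Illustrative sketch of interpreting xa_cmpxchg()'s return value. */
static int my_insert(struct xarray *xa, unsigned long id, void *wb)
{
        void *curr = xa_cmpxchg(xa, id, NULL, wb, GFP_ATOMIC);

        if (!curr)
                return 0;               /* inserted into the empty slot */
        if (IS_ERR(curr))
                return PTR_ERR(curr);   /* e.g. -ENOMEM */
        return -EEXIST;                 /* lost the race to another insert */
}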

Signed-off-by: Matthew Wilcox 
---
 include/linux/backing-dev-defs.h |  2 +-
 include/linux/backing-dev.h  |  2 +-
 mm/backing-dev.c | 28 
 3 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index bfe86b54f6c1..074a54aad33c 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -187,7 +187,7 @@ struct backing_dev_info {
struct bdi_writeback wb;  /* the root writeback info for this bdi */
struct list_head wb_list; /* list of all wbs */
 #ifdef CONFIG_CGROUP_WRITEBACK
-   struct radix_tree_root cgwb_tree; /* radix tree of active cgroup wbs */
+   struct xarray cgwb_xa;  /* radix tree of active cgroup wbs */
struct rb_root cgwb_congested_tree; /* their congested states */
 #else
struct bdi_writeback_congested *wb_congested;
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 9038f6c1eeda..50f666d23527 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -271,7 +271,7 @@ static inline struct bdi_writeback *wb_find_current(struct 
backing_dev_info *bdi
if (!memcg_css->parent)
return >wb;
 
-   wb = radix_tree_lookup(>cgwb_tree, memcg_css->id);
+   wb = xa_load(>cgwb_xa, memcg_css->id);
 
/*
 * %current's blkcg equals the effective blkcg of its memcg.  No
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 84b2dc76f140..7aa2d893f929 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -417,8 +417,8 @@ static void wb_exit(struct bdi_writeback *wb)
 #include 
 
 /*
- * cgwb_lock protects bdi->cgwb_tree, bdi->cgwb_congested_tree,
- * blkcg->cgwb_list, and memcg->cgwb_list.  bdi->cgwb_tree is also RCU
+ * cgwb_lock protects bdi->cgwb_xa, bdi->cgwb_congested_tree,
+ * blkcg->cgwb_list, and memcg->cgwb_list.  bdi->cgwb_xa is also RCU
  * protected.
  */
 static DEFINE_SPINLOCK(cgwb_lock);
@@ -539,7 +539,7 @@ static void cgwb_kill(struct bdi_writeback *wb)
 {
lockdep_assert_held(_lock);
 
-   WARN_ON(!radix_tree_delete(>bdi->cgwb_tree, wb->memcg_css->id));
+   WARN_ON(xa_erase(>bdi->cgwb_xa, wb->memcg_css->id) != wb);
list_del(>memcg_node);
list_del(>blkcg_node);
percpu_ref_kill(>refcnt);
@@ -571,7 +571,7 @@ static int cgwb_create(struct backing_dev_info *bdi,
 
/* look up again under lock and discard on blkcg mismatch */
spin_lock_irqsave(_lock, flags);
-   wb = radix_tree_lookup(>cgwb_tree, memcg_css->id);
+   wb = xa_load(>cgwb_xa, memcg_css->id);
if (wb && wb->blkcg_css != blkcg_css) {
cgwb_kill(wb);
wb = NULL;
@@ -615,13 +615,18 @@ static int cgwb_create(struct backing_dev_info *bdi,
if (test_bit(WB_registered, >wb.state) &&
blkcg_cgwb_list->next && memcg_cgwb_list->next) {
/* we might have raced another instance of this function */
-   ret = radix_tree_insert(>cgwb_tree, memcg_css->id, wb);
-   if (!ret) {
+   void *curr = xa_cmpxchg(>cgwb_xa, memcg_css->id, NULL,
+   wb, GFP_ATOMIC);
+   if (!curr) {
list_add_tail_rcu(>bdi_node, >wb_list);
list_add(>memcg_node, memcg_cgwb_list);
list_add(>blkcg_node, blkcg_cgwb_list);
css_get(memcg_css);
css_get(blkcg_css);
+   } else if (IS_ERR(curr)) {
+   ret = PTR_ERR(curr);
+   } else {
+   ret = -EEXIST;
}
}
spin_unlock_irqrestore(_lock, flags);
@@ -682,7 +687,7 @@ struct bdi_writeback *wb_get_create(struct backing_dev_info 
*bdi,
 
do {
rcu_read_lock();
-   wb = radix_tree_lookup(>cgwb_tree, memcg_css->id);
+   wb = xa_load(>cgwb_xa, memcg_css->id);
if (wb) {
struct cgroup_subsys_state *blkcg_css;
 
@@ -704,7 +709,7 @@ static int cgwb_bdi_init(struct backing_dev_info *bdi)
 {
int ret;
 
-   INIT_RADIX_TREE(>cgwb_tree, GFP_ATOMIC);
+   xa_init(>cgwb_xa);
bdi->cgwb_congested_tree = RB_ROOT;
 
ret = wb_init(>wb, bdi, 1, GFP_KERNEL);
@@ -717,15 +722,14 @@ static int cgwb_bdi_init(struct backing_dev_info *bdi)
 
 static void cgwb_bdi_unregister(struct backing_dev_info *bdi)
 {
-   struct radix_tree_iter iter;
-   void **slot;
+   XA_STATE(xas, >cgwb_xa, 0);
struct bdi_writeback *wb;
 
WARN_ON(test_bit(WB_registered, >wb.state));
 

[PATCH v4 47/73] shmem: Convert shmem_alloc_hugepage to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

xa_find() is a slightly easier API to use than
radix_tree_gang_lookup_slot() because it contains its own RCU locking.
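
A hedged usage sketch (my_range_is_empty() is illustrative): xa_find()
takes a pointer to the starting index and an inclusive maximum, handles
the RCU read lock internally, and returns NULL when the range holds no
entries.

#include <linux/xarray.h>

/* Illustrative sketch: probe a range for any entry with xa_find(). */
static bool my_range_is_empty(struct xarray *xa, unsigned long start,
                              unsigned long nr)
{
        unsigned long index = start;

        return xa_find(xa, &index, start + nr - 1) == NULL;
}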

Signed-off-by: Matthew Wilcox 
---
 mm/shmem.c | 13 +++--
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 54fbfc2c6c09..768d470a03da 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1413,23 +1413,16 @@ static struct page *shmem_alloc_hugepage(gfp_t gfp,
struct shmem_inode_info *info, pgoff_t index)
 {
struct vm_area_struct pvma;
-   struct inode *inode = >vfs_inode;
-   struct address_space *mapping = inode->i_mapping;
-   pgoff_t idx, hindex;
-   void __rcu **results;
+   struct address_space *mapping = info->vfs_inode.i_mapping;
+   pgoff_t hindex;
struct page *page;
 
if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGE_PAGECACHE))
return NULL;
 
hindex = round_down(index, HPAGE_PMD_NR);
-   rcu_read_lock();
-   if (radix_tree_gang_lookup_slot(>pages, , ,
-   hindex, 1) && idx < hindex + HPAGE_PMD_NR) {
-   rcu_read_unlock();
+   if (xa_find(>pages, , hindex + HPAGE_PMD_NR - 1))
return NULL;
-   }
-   rcu_read_unlock();
 
shmem_pseudo_vma_init(, info, hindex);
page = alloc_pages_vma(gfp | __GFP_COMP | __GFP_NORETRY | __GFP_NOWARN,
-- 
2.15.0



[PATCH v4 51/73] btrfs: Convert page cache to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

Signed-off-by: Matthew Wilcox 
---
 fs/btrfs/compression.c | 4 +---
 fs/btrfs/extent_io.c   | 6 ++
 2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index e687d06cd97c..4174b166e235 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -449,9 +449,7 @@ static noinline int add_ra_bio_pages(struct inode *inode,
if (pg_index > end_index)
break;
 
-   rcu_read_lock();
-   page = radix_tree_lookup(>pages, pg_index);
-   rcu_read_unlock();
+   page = xa_load(>pages, pg_index);
if (page && !xa_is_value(page)) {
misses++;
if (misses > 4)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index b8b5b4562d50..96328c3a548e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5197,11 +5197,9 @@ void clear_extent_buffer_dirty(struct extent_buffer *eb)
 
clear_page_dirty_for_io(page);
xa_lock_irq(>mapping->pages);
-   if (!PageDirty(page)) {
-   radix_tree_tag_clear(>mapping->pages,
-   page_index(page),
+   if (!PageDirty(page))
+   __xa_clear_tag(>mapping->pages, page_index(page),
PAGECACHE_TAG_DIRTY);
-   }
xa_unlock_irq(>mapping->pages);
ClearPageError(page);
unlock_page(page);
-- 
2.15.0



[PATCH v4 33/73] mm: Convert delete_from_swap_cache to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

Both callers of __delete_from_swap_cache have the swp_entry_t already,
so pass that in to make constructing the XA_STATE easier.
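
A minimal sketch of what that buys us (illustrative names, and assuming
the ->pages xarray as used throughout this series): the XA_STATE can be
initialised at declaration time from swp_offset(entry) rather than being
derived from page_private() inside the function.

#include <linux/swap.h>
#include <linux/swapops.h>
#include <linux/xarray.h>

/* Illustrative sketch only: the entry is known before any work is done. */
static void my_delete(struct address_space *mapping, swp_entry_t entry)
{
        XA_STATE(xas, &mapping->pages, swp_offset(entry));

        xas_store(&xas, NULL);  /* drop the swap cache entry at that index */
}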

Signed-off-by: Matthew Wilcox 
---
 include/linux/swap.h |  5 +++--
 mm/swap_state.c  | 24 ++--
 mm/vmscan.c  |  2 +-
 3 files changed, 14 insertions(+), 17 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index e4a8afcb214c..569a8ac4fe3f 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -413,7 +413,7 @@ extern void show_swap_cache_info(void);
 extern int add_to_swap(struct page *page);
 extern int add_to_swap_cache(struct page *, swp_entry_t, gfp_t);
 extern int __add_to_swap_cache(struct page *page, swp_entry_t entry);
-extern void __delete_from_swap_cache(struct page *);
+extern void __delete_from_swap_cache(struct page *, swp_entry_t entry);
 extern void delete_from_swap_cache(struct page *);
 extern void free_page_and_swap_cache(struct page *);
 extern void free_pages_and_swap_cache(struct page **, int);
@@ -588,7 +588,8 @@ static inline int add_to_swap_cache(struct page *page, 
swp_entry_t entry,
return -1;
 }
 
-static inline void __delete_from_swap_cache(struct page *page)
+static inline void __delete_from_swap_cache(struct page *page,
+   swp_entry_t entry)
 {
 }
 
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 117b5da9dc01..7c862258af66 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -154,23 +154,22 @@ int add_to_swap_cache(struct page *page, swp_entry_t 
entry, gfp_t gfp)
  * This must be called only on pages that have
  * been verified to be in the swap cache.
  */
-void __delete_from_swap_cache(struct page *page)
+void __delete_from_swap_cache(struct page *page, swp_entry_t entry)
 {
-   struct address_space *address_space;
+   struct address_space *address_space = swap_address_space(entry);
int i, nr = hpage_nr_pages(page);
-   swp_entry_t entry;
-   pgoff_t idx;
+   pgoff_t idx = swp_offset(entry);
+   XA_STATE(xas, _space->pages, idx);
 
VM_BUG_ON_PAGE(!PageLocked(page), page);
VM_BUG_ON_PAGE(!PageSwapCache(page), page);
VM_BUG_ON_PAGE(PageWriteback(page), page);
 
-   entry.val = page_private(page);
-   address_space = swap_address_space(entry);
-   idx = swp_offset(entry);
for (i = 0; i < nr; i++) {
-   radix_tree_delete(_space->pages, idx + i);
+   void *entry = xas_store(, NULL);
+   VM_BUG_ON_PAGE(entry != page + i, entry);
set_page_private(page + i, 0);
+   xas_next();
}
ClearPageSwapCache(page);
address_space->nrpages -= nr;
@@ -246,14 +245,11 @@ int add_to_swap(struct page *page)
  */
 void delete_from_swap_cache(struct page *page)
 {
-   swp_entry_t entry;
-   struct address_space *address_space;
-
-   entry.val = page_private(page);
+   swp_entry_t entry = { .val = page_private(page) };
+   struct address_space *address_space = swap_address_space(entry);
 
-   address_space = swap_address_space(entry);
xa_lock_irq(_space->pages);
-   __delete_from_swap_cache(page);
+   __delete_from_swap_cache(page, entry);
xa_unlock_irq(_space->pages);
 
put_swap_page(page, entry);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 96316bd91f91..51df3f9ba0bc 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -715,7 +715,7 @@ static int __remove_mapping(struct address_space *mapping, 
struct page *page,
if (PageSwapCache(page)) {
swp_entry_t swap = { .val = page_private(page) };
mem_cgroup_swapout(page, swap);
-   __delete_from_swap_cache(page);
+   __delete_from_swap_cache(page, swap);
xa_unlock_irqrestore(>pages, flags);
put_swap_page(page, swap);
} else {
-- 
2.15.0



[PATCH v4 45/73] shmem: Convert shmem_wait_for_pins to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

As with shmem_tag_pins(), hold the lock around the entire loop instead
of acquiring & dropping it for each entry we're going to untag.
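
The shape of the locked walk, as a hedged sketch (my_walk() is
illustrative): hold xa_lock_irq() across the whole iteration and drop it
only every XA_CHECK_SCHED entries, via xas_pause(), so the walk can be
resumed after cond_resched().

#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/xarray.h>

/* Illustrative sketch of the periodically-rescheduling locked walk. */
static void my_walk(struct xarray *xa)
{
        XA_STATE(xas, xa, 0);
        unsigned int seen = 0;
        void *entry;

        xas_lock_irq(&xas);
        xas_for_each(&xas, entry, ULONG_MAX) {
                if (xa_is_value(entry))
                        continue;               /* skip exceptional entries */
                /* ... operate on entry under the lock ... */
                if (++seen % XA_CHECK_SCHED)
                        continue;
                xas_pause(&xas);                /* safe point to drop the lock */
                xas_unlock_irq(&xas);
                cond_resched();
                xas_lock_irq(&xas);
        }
        xas_unlock_irq(&xas);
}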

Signed-off-by: Matthew Wilcox 
---
 mm/shmem.c | 59 ---
 1 file changed, 24 insertions(+), 35 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 2f41c7ceea18..e4a2eb1336be 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2636,9 +2636,7 @@ static void shmem_tag_pins(struct address_space *mapping)
  */
 static int shmem_wait_for_pins(struct address_space *mapping)
 {
-   struct radix_tree_iter iter;
-   void **slot;
-   pgoff_t start;
+   XA_STATE(xas, >pages, 0);
struct page *page;
int error, scan;
 
@@ -2646,7 +2644,9 @@ static int shmem_wait_for_pins(struct address_space 
*mapping)
 
error = 0;
for (scan = 0; scan <= LAST_SCAN; scan++) {
-   if (!radix_tree_tagged(>pages, SHMEM_TAG_PINNED))
+   unsigned int tagged = 0;
+
+   if (!xas_tagged(, SHMEM_TAG_PINNED))
break;
 
if (!scan)
@@ -2654,45 +2654,34 @@ static int shmem_wait_for_pins(struct address_space 
*mapping)
else if (schedule_timeout_killable((HZ << scan) / 200))
scan = LAST_SCAN;
 
-   start = 0;
-   rcu_read_lock();
-   radix_tree_for_each_tagged(slot, >pages, ,
-  start, SHMEM_TAG_PINNED) {
-
-   page = radix_tree_deref_slot(slot);
-   if (radix_tree_exception(page)) {
-   if (radix_tree_deref_retry(page)) {
-   slot = radix_tree_iter_retry();
-   continue;
-   }
-
-   page = NULL;
-   }
-
-   if (page &&
-   page_count(page) - page_mapcount(page) != 1) {
-   if (scan < LAST_SCAN)
-   goto continue_resched;
-
+   xas_set(, 0);
+   xas_lock_irq();
+   xas_for_each_tag(, page, ULONG_MAX, SHMEM_TAG_PINNED) {
+   bool clear = true;
+   if (xa_is_value(page))
+   continue;
+   if (page_count(page) - page_mapcount(page) != 1) {
/*
 * On the last scan, we clean up all those tags
 * we inserted; but make a note that we still
 * found pages pinned.
 */
-   error = -EBUSY;
+   if (scan == LAST_SCAN)
+   error = -EBUSY;
+   else
+   clear = false;
}
+   if (clear)
+   xas_clear_tag(, SHMEM_TAG_PINNED);
+   if (++tagged % XA_CHECK_SCHED)
+   continue;
 
-   xa_lock_irq(>pages);
-   radix_tree_tag_clear(>pages,
-iter.index, SHMEM_TAG_PINNED);
-   xa_unlock_irq(>pages);
-continue_resched:
-   if (need_resched()) {
-   slot = radix_tree_iter_resume(slot, );
-   cond_resched_rcu();
-   }
+   xas_pause();
+   xas_unlock_irq();
+   cond_resched();
+   xas_lock_irq();
}
-   rcu_read_unlock();
+   xas_unlock_irq();
}
 
return error;
-- 
2.15.0



[PATCH v4 42/73] shmem: Convert shmem_confirm_swap to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

xa_load has its own RCU locking, so we can eliminate it here.
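
In other words (illustrative sketch, not part of the patch), the whole
lookup-and-compare collapses to a single self-locking call:

#include <linux/xarray.h>

/* Illustrative: xa_load() takes and drops the RCU read lock itself. */
static bool my_confirm(struct xarray *xa, unsigned long index, void *expected)
{
        return xa_load(xa, index) == expected;
}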

Signed-off-by: Matthew Wilcox 
---
 mm/shmem.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index fad6c9e7402e..654f367aca90 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -348,12 +348,7 @@ static int shmem_xa_replace(struct address_space *mapping,
 static bool shmem_confirm_swap(struct address_space *mapping,
   pgoff_t index, swp_entry_t swap)
 {
-   void *item;
-
-   rcu_read_lock();
-   item = radix_tree_lookup(>pages, index);
-   rcu_read_unlock();
-   return item == swp_to_radix_entry(swap);
+   return xa_load(>pages, index) == swp_to_radix_entry(swap);
 }
 
 /*
-- 
2.15.0



[PATCH v4 46/73] shmem: Convert shmem_add_to_page_cache to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

This removes the last caller of radix_tree_maybe_preload_order().
Simpler code, unless we run out of memory for new xa_nodes partway through
inserting entries into the xarray.  Hopefully we can support multi-index
entries in the page cache soon and all the awful code goes away.

Signed-off-by: Matthew Wilcox 
---
 mm/shmem.c | 87 --
 1 file changed, 39 insertions(+), 48 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index e4a2eb1336be..54fbfc2c6c09 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -558,9 +558,10 @@ static unsigned long shmem_unused_huge_shrink(struct 
shmem_sb_info *sbinfo,
  */
 static int shmem_add_to_page_cache(struct page *page,
   struct address_space *mapping,
-  pgoff_t index, void *expected)
+  pgoff_t index, void *expected, gfp_t gfp)
 {
-   int error, nr = hpage_nr_pages(page);
+   XA_STATE(xas, >pages, index);
+   unsigned int i, nr = compound_order(page);
 
VM_BUG_ON_PAGE(PageTail(page), page);
VM_BUG_ON_PAGE(index != round_down(index, nr), page);
@@ -569,49 +570,47 @@ static int shmem_add_to_page_cache(struct page *page,
VM_BUG_ON(expected && PageTransHuge(page));
 
page_ref_add(page, nr);
-   page->mapping = mapping;
page->index = index;
+   page->mapping = mapping;
 
-   xa_lock_irq(>pages);
-   if (PageTransHuge(page)) {
-   void __rcu **results;
-   pgoff_t idx;
-   int i;
-
-   error = 0;
-   if (radix_tree_gang_lookup_slot(>pages,
-   , , index, 1) &&
-   idx < index + HPAGE_PMD_NR) {
-   error = -EEXIST;
+   do {
+   xas_lock_irq();
+   xas_create_range(, index + nr - 1);
+   if (xas_error())
+   goto unlock;
+   for (i = 0; i < nr; i++) {
+   void *entry = xas_load();
+   if (entry != expected)
+   xas_set_err(, -ENOENT);
+   if (xas_error())
+   goto undo;
+   xas_store(, page + i);
+   xas_next();
}
-
-   if (!error) {
-   for (i = 0; i < HPAGE_PMD_NR; i++) {
-   error = radix_tree_insert(>pages,
-   index + i, page + i);
-   VM_BUG_ON(error);
-   }
+   if (PageTransHuge(page)) {
count_vm_event(THP_FILE_ALLOC);
+   __inc_node_page_state(page, NR_SHMEM_THPS);
}
-   } else if (!expected) {
-   error = radix_tree_insert(>pages, index, page);
-   } else {
-   error = shmem_xa_replace(mapping, index, expected, page);
-   }
-
-   if (!error) {
mapping->nrpages += nr;
-   if (PageTransHuge(page))
-   __inc_node_page_state(page, NR_SHMEM_THPS);
__mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, nr);
__mod_node_page_state(page_pgdat(page), NR_SHMEM, nr);
-   xa_unlock_irq(>pages);
-   } else {
+   goto unlock;
+undo:
+   while (i-- > 0) {
+   xas_store(, NULL);
+   xas_prev();
+   }
+unlock:
+   xas_unlock_irq();
+   } while (xas_nomem(, gfp));
+
+   if (xas_error()) {
page->mapping = NULL;
-   xa_unlock_irq(>pages);
page_ref_sub(page, nr);
+   return xas_error();
}
-   return error;
+
+   return 0;
 }
 
 /*
@@ -1159,7 +1158,7 @@ static int shmem_unuse_inode(struct shmem_inode_info 
*info,
 */
if (!error)
error = shmem_add_to_page_cache(*pagep, mapping, index,
-   radswap);
+   radswap, gfp);
if (error != -ENOMEM) {
/*
 * Truncation and eviction use free_swap_and_cache(), which
@@ -1677,7 +1676,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t 
index,
false);
if (!error) {
error = shmem_add_to_page_cache(page, mapping, index,
-   swp_to_radix_entry(swap));
+   swp_to_radix_entry(swap), gfp);
/*
 * We already confirmed swap under page lock, and make
 * no memory allocation here, so usually 

[PATCH v4 28/73] page cache: Remove stray radix comment

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

Signed-off-by: Matthew Wilcox 
---
 mm/filemap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 9e6158cfbaeb..79d0731b8762 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2601,7 +2601,7 @@ static struct page *do_read_cache_page(struct 
address_space *mapping,
put_page(page);
if (err == -EEXIST)
goto repeat;
-   /* Presumably ENOMEM for radix tree node */
+   /* Presumably ENOMEM for xarray node */
return ERR_PTR(err);
}
 
-- 
2.15.0



[PATCH v4 44/73] shmem: Convert shmem_tag_pins to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

Simplify the locking by taking the spinlock while we walk the tree on
the assumption that many acquires and releases of the lock will be
worse than holding the lock for a (potentially) long time.

We could replicate the same locking behaviour with the xarray, but would
have to be careful that the xa_node wasn't RCU-freed under us before we
took the lock.

Signed-off-by: Matthew Wilcox 
---
 mm/shmem.c | 39 ---
 1 file changed, 16 insertions(+), 23 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index ce285ae635ea..2f41c7ceea18 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2601,35 +2601,28 @@ static loff_t shmem_file_llseek(struct file *file, 
loff_t offset, int whence)
 
 static void shmem_tag_pins(struct address_space *mapping)
 {
-   struct radix_tree_iter iter;
-   void **slot;
-   pgoff_t start;
+   XA_STATE(xas, >pages, 0);
struct page *page;
+   unsigned int tagged = 0;
 
lru_add_drain();
-   start = 0;
-   rcu_read_lock();
 
-   radix_tree_for_each_slot(slot, >pages, , start) {
-   page = radix_tree_deref_slot(slot);
-   if (!page || radix_tree_exception(page)) {
-   if (radix_tree_deref_retry(page)) {
-   slot = radix_tree_iter_retry();
-   continue;
-   }
-   } else if (page_count(page) - page_mapcount(page) > 1) {
-   xa_lock_irq(>pages);
-   radix_tree_tag_set(>pages, iter.index,
-  SHMEM_TAG_PINNED);
-   xa_unlock_irq(>pages);
-   }
+   xas_lock_irq();
+   xas_for_each(, page, ULONG_MAX) {
+   if (xa_is_value(page))
+   continue;
+   if (page_count(page) - page_mapcount(page) > 1)
+   xas_set_tag(, SHMEM_TAG_PINNED);
 
-   if (need_resched()) {
-   slot = radix_tree_iter_resume(slot, );
-   cond_resched_rcu();
-   }
+   if (++tagged % XA_CHECK_SCHED)
+   continue;
+
+   xas_pause();
+   xas_unlock_irq();
+   cond_resched();
+   xas_lock_irq();
}
-   rcu_read_unlock();
+   xas_unlock_irq();
 }
 
 /*
-- 
2.15.0



[PATCH v4 41/73] shmem: Convert replace to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

shmem_radix_tree_replace() is renamed to shmem_xa_replace() and
converted to use the XArray API.
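
A hedged sketch of the resulting shape (my_replace() is illustrative): the
caller already holds xa_lock, and because the slot must already exist for
the compare to succeed, xas_store() never needs to allocate, so no
xas_nomem() loop is required.

#include <linux/errno.h>
#include <linux/xarray.h>

/* Illustrative sketch: compare-and-replace under the caller's xa_lock. */
static int my_replace(struct xarray *xa, unsigned long index,
                      void *expected, void *replacement)
{
        XA_STATE(xas, xa, index);

        if (xas_load(&xas) != expected)
                return -ENOENT;
        xas_store(&xas, replacement);   /* slot exists: no allocation needed */
        return 0;
}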

Signed-off-by: Matthew Wilcox 
---
 mm/shmem.c | 22 --
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index c5731bb954a1..fad6c9e7402e 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -321,24 +321,20 @@ void shmem_uncharge(struct inode *inode, long pages)
 }
 
 /*
- * Replace item expected in radix tree by a new item, while holding tree lock.
+ * Replace item expected in xarray by a new item, while holding xa_lock.
  */
-static int shmem_radix_tree_replace(struct address_space *mapping,
+static int shmem_xa_replace(struct address_space *mapping,
pgoff_t index, void *expected, void *replacement)
 {
-   struct radix_tree_node *node;
-   void **pslot;
+   XA_STATE(xas, >pages, index);
void *item;
 
VM_BUG_ON(!expected);
VM_BUG_ON(!replacement);
-   item = __radix_tree_lookup(>pages, index, , );
-   if (!item)
-   return -ENOENT;
+   item = xas_load();
if (item != expected)
return -ENOENT;
-   __radix_tree_replace(>pages, node, pslot,
-replacement, NULL);
+   xas_store(, replacement);
return 0;
 }
 
@@ -605,8 +601,7 @@ static int shmem_add_to_page_cache(struct page *page,
} else if (!expected) {
error = radix_tree_insert(>pages, index, page);
} else {
-   error = shmem_radix_tree_replace(mapping, index, expected,
-page);
+   error = shmem_xa_replace(mapping, index, expected, page);
}
 
if (!error) {
@@ -635,7 +630,7 @@ static void shmem_delete_from_page_cache(struct page *page, 
void *radswap)
VM_BUG_ON_PAGE(PageCompound(page), page);
 
xa_lock_irq(>pages);
-   error = shmem_radix_tree_replace(mapping, page->index, page, radswap);
+   error = shmem_xa_replace(mapping, page->index, page, radswap);
page->mapping = NULL;
mapping->nrpages--;
__dec_node_page_state(page, NR_FILE_PAGES);
@@ -1550,8 +1545,7 @@ static int shmem_replace_page(struct page **pagep, gfp_t 
gfp,
 * a nice clean interface for us to replace oldpage by newpage there.
 */
xa_lock_irq(_mapping->pages);
-   error = shmem_radix_tree_replace(swap_mapping, swap_index, oldpage,
-  newpage);
+   error = shmem_xa_replace(swap_mapping, swap_index, oldpage, newpage);
if (!error) {
__inc_node_page_state(newpage, NR_FILE_PAGES);
__dec_node_page_state(oldpage, NR_FILE_PAGES);
-- 
2.15.0



[PATCH v4 40/73] pagevec: Use xa_tag_t

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

Removes sparse warnings.

Signed-off-by: Matthew Wilcox 
---
 fs/btrfs/extent_io.c| 4 ++--
 fs/ext4/inode.c | 2 +-
 fs/f2fs/data.c  | 2 +-
 fs/gfs2/aops.c  | 2 +-
 include/linux/pagevec.h | 8 +---
 mm/swap.c   | 4 ++--
 6 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 94f734e7e66f..b8b5b4562d50 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3795,7 +3795,7 @@ int btree_write_cache_pages(struct address_space *mapping,
pgoff_t index;
pgoff_t end;/* Inclusive */
int scanned = 0;
-   int tag;
+   xa_tag_t tag;
 
pagevec_init();
if (wbc->range_cyclic) {
@@ -3922,7 +3922,7 @@ static int extent_write_cache_pages(struct address_space 
*mapping,
pgoff_t done_index;
int range_whole = 0;
int scanned = 0;
-   int tag;
+   xa_tag_t tag;
 
/*
 * We have to hold onto the inode so that ordered extents can do their
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 7df2c5644e59..2534304daec3 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2605,7 +2605,7 @@ static int mpage_prepare_extent_to_map(struct 
mpage_da_data *mpd)
long left = mpd->wbc->nr_to_write;
pgoff_t index = mpd->first_page;
pgoff_t end = mpd->last_page;
-   int tag;
+   xa_tag_t tag;
int i, err = 0;
int blkbits = mpd->inode->i_blkbits;
ext4_lblk_t lblk;
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 8f51ac47b77f..c8f6d9806896 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1640,7 +1640,7 @@ static int f2fs_write_cache_pages(struct address_space 
*mapping,
pgoff_t last_idx = ULONG_MAX;
int cycled;
int range_whole = 0;
-   int tag;
+   xa_tag_t tag;
 
pagevec_init();
 
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 1daf15a1f00c..c78ecd008191 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -369,7 +369,7 @@ static int gfs2_write_cache_jdata(struct address_space 
*mapping,
pgoff_t done_index;
int cycled;
int range_whole = 0;
-   int tag;
+   xa_tag_t tag;
 
pagevec_init();
if (wbc->range_cyclic) {
diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
index 5fb6580f7f23..5168901bf06d 100644
--- a/include/linux/pagevec.h
+++ b/include/linux/pagevec.h
@@ -9,6 +9,8 @@
 #ifndef _LINUX_PAGEVEC_H
 #define _LINUX_PAGEVEC_H
 
+#include 
+
 /* 14 pointers + two long's align the pagevec structure to a power of two */
 #define PAGEVEC_SIZE   14
 
@@ -40,12 +42,12 @@ static inline unsigned pagevec_lookup(struct pagevec *pvec,
 
 unsigned pagevec_lookup_range_tag(struct pagevec *pvec,
struct address_space *mapping, pgoff_t *index, pgoff_t end,
-   int tag);
+   xa_tag_t tag);
 unsigned pagevec_lookup_range_nr_tag(struct pagevec *pvec,
struct address_space *mapping, pgoff_t *index, pgoff_t end,
-   int tag, unsigned max_pages);
+   xa_tag_t tag, unsigned max_pages);
 static inline unsigned pagevec_lookup_tag(struct pagevec *pvec,
-   struct address_space *mapping, pgoff_t *index, int tag)
+   struct address_space *mapping, pgoff_t *index, xa_tag_t tag)
 {
return pagevec_lookup_range_tag(pvec, mapping, index, (pgoff_t)-1, tag);
 }
diff --git a/mm/swap.c b/mm/swap.c
index 8d7773cb2c3f..31d79479dacf 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -991,7 +991,7 @@ EXPORT_SYMBOL(pagevec_lookup_range);
 
 unsigned pagevec_lookup_range_tag(struct pagevec *pvec,
struct address_space *mapping, pgoff_t *index, pgoff_t end,
-   int tag)
+   xa_tag_t tag)
 {
pvec->nr = find_get_pages_range_tag(mapping, index, end, tag,
PAGEVEC_SIZE, pvec->pages);
@@ -1001,7 +1001,7 @@ EXPORT_SYMBOL(pagevec_lookup_range_tag);
 
 unsigned pagevec_lookup_range_nr_tag(struct pagevec *pvec,
struct address_space *mapping, pgoff_t *index, pgoff_t end,
-   int tag, unsigned max_pages)
+   xa_tag_t tag, unsigned max_pages)
 {
pvec->nr = find_get_pages_range_tag(mapping, index, end, tag,
min_t(unsigned int, max_pages, PAGEVEC_SIZE), pvec->pages);
-- 
2.15.0



[PATCH v4 37/73] mm: Convert huge_memory to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

Quite a straightforward conversion.

Signed-off-by: Matthew Wilcox 
---
 mm/huge_memory.c | 19 ---
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 28909c475ee5..5a41b00d86bd 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2379,7 +2379,7 @@ static void __split_huge_page_tail(struct page *head, int 
tail,
if (PageAnon(head) && !PageSwapCache(head)) {
page_ref_inc(page_tail);
} else {
-   /* Additional pin to radix tree */
+   /* Additional pin to page cache */
page_ref_add(page_tail, 2);
}
 
@@ -2450,13 +2450,13 @@ static void __split_huge_page(struct page *page, struct 
list_head *list,
ClearPageCompound(head);
/* See comment in __split_huge_page_tail() */
if (PageAnon(head)) {
-   /* Additional pin to radix tree of swap cache */
+   /* Additional pin to swap cache */
if (PageSwapCache(head))
page_ref_add(head, 2);
else
page_ref_inc(head);
} else {
-   /* Additional pin to radix tree */
+   /* Additional pin to page cache */
page_ref_add(head, 2);
xa_unlock(>mapping->pages);
}
@@ -2568,7 +2568,7 @@ bool can_split_huge_page(struct page *page, int 
*pextra_pins)
 {
int extra_pins;
 
-   /* Additional pins from radix tree */
+   /* Additional pins from page cache */
if (PageAnon(page))
extra_pins = PageSwapCache(page) ? HPAGE_PMD_NR : 0;
else
@@ -2664,17 +2664,14 @@ int split_huge_page_to_list(struct page *page, struct 
list_head *list)
spin_lock_irqsave(zone_lru_lock(page_zone(head)), flags);
 
if (mapping) {
-   void **pslot;
+   XA_STATE(xas, >pages, page_index(head));
 
-   xa_lock(>pages);
-   pslot = radix_tree_lookup_slot(>pages,
-   page_index(head));
/*
-* Check if the head page is present in radix tree.
+* Check if the head page is present in page cache.
 * We assume all tail are present too, if head is there.
 */
-   if (radix_tree_deref_slot_protected(pslot,
-   >pages.xa_lock) != head)
+   xa_lock(>pages);
+   if (xas_load() != head)
goto fail;
}
 
-- 
2.15.0



[PATCH v4 36/73] mm: Convert page migration to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

Signed-off-by: Matthew Wilcox 
---
 mm/migrate.c | 40 
 1 file changed, 16 insertions(+), 24 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 59f18c571120..7122fec9b075 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -322,7 +322,7 @@ void __migration_entry_wait(struct mm_struct *mm, pte_t 
*ptep,
page = migration_entry_to_page(entry);
 
/*
-* Once radix-tree replacement of page migration started, page_count
+* Once page cache replacement of page migration started, page_count
 * *must* be zero. And, we don't want to call wait_on_page_locked()
 * against a page without get_page().
 * So, we use get_page_unless_zero(), here. Even failed, page fault
@@ -437,10 +437,10 @@ int migrate_page_move_mapping(struct address_space 
*mapping,
struct buffer_head *head, enum migrate_mode mode,
int extra_count)
 {
+   XA_STATE(xas, >pages, page_index(page));
struct zone *oldzone, *newzone;
int dirty;
int expected_count = 1 + extra_count;
-   void **pslot;
 
/*
 * Device public or private pages have an extra refcount as they are
@@ -466,20 +466,16 @@ int migrate_page_move_mapping(struct address_space 
*mapping,
oldzone = page_zone(page);
newzone = page_zone(newpage);
 
-   xa_lock_irq(>pages);
-
-   pslot = radix_tree_lookup_slot(>pages,
-   page_index(page));
+   xas_lock_irq();
 
expected_count += 1 + page_has_private(page);
-   if (page_count(page) != expected_count ||
-   radix_tree_deref_slot_protected(pslot, >pages.xa_lock) 
!= page) {
-   xa_unlock_irq(>pages);
+   if (page_count(page) != expected_count || xas_load() != page) {
+   xas_unlock_irq();
return -EAGAIN;
}
 
if (!page_ref_freeze(page, expected_count)) {
-   xa_unlock_irq(>pages);
+   xas_unlock_irq();
return -EAGAIN;
}
 
@@ -493,7 +489,7 @@ int migrate_page_move_mapping(struct address_space *mapping,
if (mode == MIGRATE_ASYNC && head &&
!buffer_migrate_lock_buffers(head, mode)) {
page_ref_unfreeze(page, expected_count);
-   xa_unlock_irq(>pages);
+   xas_unlock_irq();
return -EAGAIN;
}
 
@@ -521,7 +517,7 @@ int migrate_page_move_mapping(struct address_space *mapping,
SetPageDirty(newpage);
}
 
-   radix_tree_replace_slot(>pages, pslot, newpage);
+   xas_store(, newpage);
 
/*
 * Drop cache reference from old page by unfreezing
@@ -530,7 +526,7 @@ int migrate_page_move_mapping(struct address_space *mapping,
 */
page_ref_unfreeze(page, expected_count - 1);
 
-   xa_unlock(>pages);
+   xas_unlock();
/* Leave irq disabled to prevent preemption while updating stats */
 
/*
@@ -570,22 +566,18 @@ EXPORT_SYMBOL(migrate_page_move_mapping);
 int migrate_huge_page_move_mapping(struct address_space *mapping,
   struct page *newpage, struct page *page)
 {
+   XA_STATE(xas, >pages, page_index(page));
int expected_count;
-   void **pslot;
-
-   xa_lock_irq(>pages);
-
-   pslot = radix_tree_lookup_slot(>pages, page_index(page));
 
+   xas_lock_irq();
expected_count = 2 + page_has_private(page);
-   if (page_count(page) != expected_count ||
-   radix_tree_deref_slot_protected(pslot, >pages.xa_lock) 
!= page) {
-   xa_unlock_irq(>pages);
+   if (page_count(page) != expected_count || xas_load() != page) {
+   xas_unlock_irq();
return -EAGAIN;
}
 
if (!page_ref_freeze(page, expected_count)) {
-   xa_unlock_irq(>pages);
+   xas_unlock_irq();
return -EAGAIN;
}
 
@@ -594,11 +586,11 @@ int migrate_huge_page_move_mapping(struct address_space 
*mapping,
 
get_page(newpage);
 
-   radix_tree_replace_slot(>pages, pslot, newpage);
+   xas_store(, newpage);
 
page_ref_unfreeze(page, expected_count - 1);
 
-   xa_unlock_irq(>pages);
+   xas_unlock_irq();
 
return MIGRATEPAGE_SUCCESS;
 }
-- 
2.15.0



[PATCH v4 39/73] mm: Convert khugepaged_scan_shmem to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

Slightly shorter and easier to read code.

Signed-off-by: Matthew Wilcox 
---
 mm/khugepaged.c | 17 +
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 9f49d0cd61c2..15f1b2d81a69 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1534,8 +1534,7 @@ static void khugepaged_scan_shmem(struct mm_struct *mm,
pgoff_t start, struct page **hpage)
 {
struct page *page = NULL;
-   struct radix_tree_iter iter;
-   void **slot;
+   XA_STATE(xas, >pages, start);
int present, swap;
int node = NUMA_NO_NODE;
int result = SCAN_SUCCEED;
@@ -1544,17 +1543,11 @@ static void khugepaged_scan_shmem(struct mm_struct *mm,
swap = 0;
memset(khugepaged_node_load, 0, sizeof(khugepaged_node_load));
rcu_read_lock();
-   radix_tree_for_each_slot(slot, >pages, , start) {
-   if (iter.index >= start + HPAGE_PMD_NR)
-   break;
-
-   page = radix_tree_deref_slot(slot);
-   if (radix_tree_deref_retry(page)) {
-   slot = radix_tree_iter_retry();
+   xas_for_each(, page, start + HPAGE_PMD_NR - 1) {
+   if (xas_retry(, page))
continue;
-   }
 
-   if (radix_tree_exception(page)) {
+   if (xa_is_value(page)) {
if (++swap > khugepaged_max_ptes_swap) {
result = SCAN_EXCEED_SWAP_PTE;
break;
@@ -1593,7 +1586,7 @@ static void khugepaged_scan_shmem(struct mm_struct *mm,
present++;
 
if (need_resched()) {
-   slot = radix_tree_iter_resume(slot, );
+   xas_pause();
cond_resched_rcu();
}
}
-- 
2.15.0



[PATCH v4 43/73] shmem: Convert find_swap_entry to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

This is a 1:1 conversion.

Signed-off-by: Matthew Wilcox 
---
 mm/shmem.c | 23 +++
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 654f367aca90..ce285ae635ea 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1076,28 +1076,27 @@ static void shmem_evict_inode(struct inode *inode)
clear_inode(inode);
 }
 
-static unsigned long find_swap_entry(struct radix_tree_root *root, void *item)
+static unsigned long find_swap_entry(struct xarray *xa, void *item)
 {
-   struct radix_tree_iter iter;
-   void **slot;
-   unsigned long found = -1;
+   XA_STATE(xas, xa, 0);
unsigned int checked = 0;
+   void *entry;
 
rcu_read_lock();
-   radix_tree_for_each_slot(slot, root, , 0) {
-   if (*slot == item) {
-   found = iter.index;
+   xas_for_each(, entry, ULONG_MAX) {
+   if (xas_retry(, entry))
+   continue;
+   if (entry == item)
break;
-   }
checked++;
-   if ((checked % 4096) != 0)
+   if ((checked % XA_CHECK_SCHED) != 0)
continue;
-   slot = radix_tree_iter_resume(slot, );
+   xas_pause();
cond_resched_rcu();
}
-
rcu_read_unlock();
-   return found;
+
+   return xas_invalid() ? -1 : xas.xa_index;
 }
 
 /*
-- 
2.15.0



[PATCH v4 50/73] shmem: Comment fixups

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

Remove the last mentions of radix tree from various comments.

Signed-off-by: Matthew Wilcox 
---
 mm/shmem.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 01102e2e0ef3..090937922c1d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -743,7 +743,7 @@ void shmem_unlock_mapping(struct address_space *mapping)
 }
 
 /*
- * Remove range of pages and swap entries from radix tree, and free them.
+ * Remove range of pages and swap entries from page cache, and free them.
  * If !unfalloc, truncate or punch hole; if unfalloc, undo failed fallocate.
  */
 static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
@@ -1118,10 +1118,10 @@ static int shmem_unuse_inode(struct shmem_inode_info 
*info,
 * We needed to drop mutex to make that restrictive page
 * allocation, but the inode might have been freed while we
 * dropped it: although a racing shmem_evict_inode() cannot
-* complete without emptying the radix_tree, our page lock
+* complete without emptying the page cache, our page lock
 * on this swapcache page is not enough to prevent that -
 * free_swap_and_cache() of our swap entry will only
-* trylock_page(), removing swap from radix_tree whatever.
+* trylock_page(), removing swap from page cache whatever.
 *
 * We must not proceed to shmem_add_to_page_cache() if the
 * inode has been freed, but of course we cannot rely on
@@ -1187,7 +1187,7 @@ int shmem_unuse(swp_entry_t swap, struct page *page)
false);
if (error)
goto out;
-   /* No radix_tree_preload: swap entry keeps a place for page in tree */
+   /* No memory allocation: swap entry occupies the slot for the page */
error = -EAGAIN;
 
mutex_lock(_swaplist_mutex);
@@ -1862,7 +1862,7 @@ alloc_nohuge: page = 
shmem_alloc_and_acct_page(gfp, inode,
spin_unlock_irq(>lock);
goto repeat;
}
-   if (error == -EEXIST)   /* from above or from radix_tree_insert */
+   if (error == -EEXIST)
goto repeat;
return error;
 }
@@ -2474,7 +2474,7 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, 
struct iov_iter *to)
 }
 
 /*
- * llseek SEEK_DATA or SEEK_HOLE through the radix_tree.
+ * llseek SEEK_DATA or SEEK_HOLE through the page cache.
  */
 static pgoff_t shmem_seek_hole_data(struct address_space *mapping,
pgoff_t index, pgoff_t end, int whence)
@@ -2562,7 +2562,7 @@ static loff_t shmem_file_llseek(struct file *file, loff_t 
offset, int whence)
 }
 
 /*
- * We need a tag: a new tag would expand every radix_tree_node by 8 bytes,
+ * We need a tag: a new tag would expand every xa_node by 8 bytes,
  * so reuse a tag which we firmly believe is never set or cleared on shmem.
  */
 #define SHMEM_TAG_PINNEDPAGECACHE_TAG_TOWRITE
-- 
2.15.0



[PATCH v4 49/73] shmem: Convert shmem_partial_swap_usage to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

Simpler code because the xarray takes care of things like the limit and
dereferencing the slot.

Signed-off-by: Matthew Wilcox 
---
 mm/shmem.c | 18 +++---
 1 file changed, 3 insertions(+), 15 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index ca45ff493587..01102e2e0ef3 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -658,29 +658,17 @@ static int shmem_free_swap(struct address_space *mapping,
 unsigned long shmem_partial_swap_usage(struct address_space *mapping,
pgoff_t start, pgoff_t end)
 {
-   struct radix_tree_iter iter;
-   void **slot;
+   XA_STATE(xas, >pages, start);
struct page *page;
unsigned long swapped = 0;
 
rcu_read_lock();
-
-   radix_tree_for_each_slot(slot, >pages, , start) {
-   if (iter.index >= end)
-   break;
-
-   page = radix_tree_deref_slot(slot);
-
-   if (radix_tree_deref_retry(page)) {
-   slot = radix_tree_iter_retry();
-   continue;
-   }
-
+   xas_for_each(, page, end - 1) {
if (xa_is_value(page))
swapped++;
 
if (need_resched()) {
-   slot = radix_tree_iter_resume(slot, );
+   xas_pause();
cond_resched_rcu();
}
}
-- 
2.15.0



[PATCH v4 48/73] shmem: Convert shmem_free_swap to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

This is a perfect use for xa_cmpxchg().  Note the use of 0 for GFP
flags; we won't be allocating memory.
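
A hedged usage sketch (my_free_entry() is illustrative): storing NULL over
an existing entry never allocates nodes, which is why a GFP mask of 0 is
safe, and the returned old value tells us whether we raced with someone
else.

#include <linux/errno.h>
#include <linux/xarray.h>

/* Illustrative sketch: conditional delete with xa_cmpxchg() and GFP 0. */
static int my_free_entry(struct xarray *xa, unsigned long index, void *expected)
{
        void *old = xa_cmpxchg(xa, index, expected, NULL, 0);

        return old == expected ? 0 : -ENOENT;
}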

Signed-off-by: Matthew Wilcox 
---
 mm/shmem.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 768d470a03da..ca45ff493587 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -635,16 +635,13 @@ static void shmem_delete_from_page_cache(struct page 
*page, void *radswap)
 }
 
 /*
- * Remove swap entry from radix tree, free the swap and its page cache.
+ * Remove swap entry from page cache, free the swap and its page cache.
  */
 static int shmem_free_swap(struct address_space *mapping,
   pgoff_t index, void *radswap)
 {
-   void *old;
+   void *old = xa_cmpxchg(>pages, index, radswap, NULL, 0);
 
-   xa_lock_irq(>pages);
-   old = radix_tree_delete_item(>pages, index, radswap);
-   xa_unlock_irq(>pages);
if (old != radswap)
return -ENOENT;
free_swap_and_cache(radix_to_swp_entry(radswap));
-- 
2.15.0



[PATCH v4 18/73] xarray: Add xas_create_range

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

This hopefully temporary function is useful for users who have not yet
been converted to multi-index entries.
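
A hedged sketch of how the callers in this series use it (my_create() is
illustrative): position the XA_STATE at the first index, create slots up
to the last index under the lock, and loop through xas_nomem() so node
allocations happen with GFP_KERNEL outside the lock.

#include <linux/xarray.h>

/* Illustrative sketch: pre-create slots for [start, start + nr - 1]. */
static int my_create(struct xarray *xa, unsigned long start, unsigned long nr)
{
        XA_STATE(xas, xa, start);

        do {
                xas_lock_irq(&xas);
                xas_create_range(&xas, start + nr - 1);
                xas_unlock_irq(&xas);
        } while (xas_nomem(&xas, GFP_KERNEL));

        return xas_error(&xas);
}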

Signed-off-by: Matthew Wilcox 
---
 include/linux/xarray.h |  2 ++
 lib/xarray.c   | 22 ++
 2 files changed, 24 insertions(+)

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index 416708ace115..afa3374f20bd 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -594,6 +594,8 @@ void xas_init_tags(const struct xa_state *);
 bool xas_nomem(struct xa_state *, gfp_t);
 void xas_pause(struct xa_state *);
 
+void xas_create_range(struct xa_state *, unsigned long max);
+
 /**
  * xas_reload() - Refetch an entry from the xarray.
  * @xas: XArray operation state.
diff --git a/lib/xarray.c b/lib/xarray.c
index 8c6e83d10554..cc88df7bd6df 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -570,6 +570,28 @@ void *xas_create(struct xa_state *xas)
 }
 EXPORT_SYMBOL_GPL(xas_create);
 
+/**
+ * xas_create_range() - Ensure that stores to this range will succeed
+ * @xas: XArray operation state.
+ * @max: The highest index to create a slot for.
+ *
+ * Creates all of the slots in the range between the current position of
+ * @xas and @max.  This is for the benefit of users who have not yet been
+ * converted to multi-index entries.
+ *
+ * The implementation is naive.
+ */
+void xas_create_range(struct xa_state *xas, unsigned long max)
+{
+   XA_STATE(tmp, xas->xa, xas->xa_index);
+
+   do {
+   xas_create();
+   xas_set(, tmp.xa_index + XA_CHUNK_SIZE);
+   } while (tmp.xa_index < max);
+}
+EXPORT_SYMBOL_GPL(xas_create_range);
+
 static void store_siblings(struct xa_state *xas,
void *entry, int *countp, int *valuesp)
 {
-- 
2.15.0



[PATCH v4 52/73] fs: Convert buffer to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

Mostly comment fixes, but one use of __xa_set_tag.

Signed-off-by: Matthew Wilcox 
---
 fs/buffer.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 33c08624d45b..986b50b0fd50 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -593,7 +593,7 @@ void mark_buffer_dirty_inode(struct buffer_head *bh, struct 
inode *inode)
 EXPORT_SYMBOL(mark_buffer_dirty_inode);
 
 /*
- * Mark the page dirty, and set it dirty in the radix tree, and mark the inode
+ * Mark the page dirty, and set it dirty in the page cache, and mark the inode
  * dirty.
  *
  * If warn is true, then emit a warning if the page is not uptodate and has
@@ -610,8 +610,8 @@ void __set_page_dirty(struct page *page, struct 
address_space *mapping,
if (page->mapping) {/* Race with truncate? */
WARN_ON_ONCE(warn && !PageUptodate(page));
account_page_dirtied(page, mapping);
-   radix_tree_tag_set(>pages,
-   page_index(page), PAGECACHE_TAG_DIRTY);
+   __xa_set_tag(>pages, page_index(page),
+   PAGECACHE_TAG_DIRTY);
}
xa_unlock_irqrestore(>pages, flags);
 }
@@ -1073,7 +1073,7 @@ __getblk_slow(struct block_device *bdev, sector_t block,
  * The relationship between dirty buffers and dirty pages:
  *
  * Whenever a page has any dirty buffers, the page's dirty bit is set, and
- * the page is tagged dirty in its radix tree.
+ * the page is tagged dirty in the page cache.
  *
  * At all times, the dirtiness of the buffers represents the dirtiness of
  * subsections of the page.  If the page has buffers, the page dirty bit is
@@ -1096,9 +1096,9 @@ __getblk_slow(struct block_device *bdev, sector_t block,
  * mark_buffer_dirty - mark a buffer_head as needing writeout
  * @bh: the buffer_head to mark dirty
  *
- * mark_buffer_dirty() will set the dirty bit against the buffer, then set its
- * backing page dirty, then tag the page as dirty in its address_space's radix
- * tree and then attach the address_space's inode to its superblock's dirty
+ * mark_buffer_dirty() will set the dirty bit against the buffer, then set
+ * its backing page dirty, then tag the page as dirty in the page cache
+ * and then attach the address_space's inode to its superblock's dirty
  * inode list.
  *
  * mark_buffer_dirty() is atomic.  It takes bh->b_page->mapping->private_lock,
-- 
2.15.0



[PATCH v4 15/73] xarray: Add xa_get_entries, xa_get_tagged and xa_get_maybe_tag

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

These functions allow a range of xarray entries to be extracted into a
compact normal array.
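
A hedged usage sketch (my_batch() is illustrative): copy up to a small
batch of entries into an on-stack array and process them outside any lock,
bearing in mind that the copy is not an atomic snapshot (see the
documentation below).

#include <linux/kernel.h>
#include <linux/xarray.h>

/* Illustrative sketch: batch entries into a small on-stack array. */
static void my_batch(struct xarray *xa, unsigned long start, unsigned long max)
{
        void *batch[16];
        int i, n;

        n = xa_get_entries(xa, batch, start, max, ARRAY_SIZE(batch));
        for (i = 0; i < n; i++) {
                /* ... process batch[i] ... */
        }
}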

Signed-off-by: Matthew Wilcox 
---
 include/linux/xarray.h | 27 
 lib/xarray.c   | 88 ++
 2 files changed, 115 insertions(+)

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index 4e61ebd406f5..c3efcc3432f7 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -135,6 +135,33 @@ void *xa_clear_tag(struct xarray *, unsigned long index, 
xa_tag_t);
 
 void *xa_find(struct xarray *xa, unsigned long *index, unsigned long max);
 void *xa_find_after(struct xarray *xa, unsigned long *index, unsigned long 
max);
+int xa_get_entries(struct xarray *, void **dst, unsigned long start,
+   unsigned long max, unsigned int n);
+int xa_get_tagged(struct xarray *, void **dst, unsigned long start,
+   unsigned long max, unsigned int n, xa_tag_t);
+
+/**
+ * xa_get_maybe_tag() - Copy entries from the XArray into a normal array.
+ * @xa: The source XArray to copy from.
+ * @dst: The buffer to copy pointers into.
+ * @start: The first index in the XArray eligible to be copied from.
+ * @max: The last index in the XArray eligible to be copied from.
+ * @n: The maximum number of entries to copy.
+ * @tag: Tag number.
+ *
+ * If you specify %XA_NO_TAG as the tag number, this is the same as
+ * xa_get_entries().  Otherwise, it is the same as xa_get_tagged().
+ *
+ * Return: The number of entries copied.
+ */
+static inline int xa_get_maybe_tag(struct xarray *xa, void **dst,
+   unsigned long start, unsigned long max,
+   unsigned int n, xa_tag_t tag)
+{
+   if (tag == XA_NO_TAG)
+   return xa_get_entries(xa, dst, start, max, n);
+   return xa_get_tagged(xa, dst, start, max, n, tag);
+}
 
 /**
  * xa_for_each() - Iterate over a portion of an XArray.
diff --git a/lib/xarray.c b/lib/xarray.c
index f9eaac2d85f9..251724f62b11 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -1253,6 +1253,94 @@ void *xa_find_after(struct xarray *xa, unsigned long 
*indexp, unsigned long max)
 }
 EXPORT_SYMBOL(xa_find_after);
 
+/**
+ * xa_get_entries() - Copy entries from the XArray into a normal array.
+ * @xa: The source XArray to copy from.
+ * @dst: The buffer to copy pointers into.
+ * @start: The first index in the XArray eligible to be copied from.
+ * @max: The last index in the XArray eligible to be copied from.
+ * @n: The maximum number of entries to copy.
+ *
+ * Copies up to @n non-NULL entries from the XArray.  The copied entries will
+ * have indices between @start and @max, inclusive.
+ *
+ * This function uses the RCU lock to protect itself.  That means that the
+ * entries returned may not represent a snapshot of the XArray at a moment
+ * in time.  For example, if index 5 is stored to, then index 10 is stored to,
+ * calling xa_get_entries() may return the old contents of index 5 and the
+ * new contents of index 10.  Indices not modified while this function is
+ * running will not be skipped.
+ *
+ * If you need stronger guarantees, holding the xa_lock across calls to this
+ * function will prevent concurrent modification.
+ *
+ * Return: The number of entries copied.
+ */
+int xa_get_entries(struct xarray *xa, void **dst, unsigned long start,
+   unsigned long max, unsigned int n)
+{
+   XA_STATE(xas, xa, start);
+   void *entry;
+   unsigned int i = 0;
+
+   if (!n)
+   return 0;
+
+   rcu_read_lock();
+   xas_for_each(, entry, max) {
+   if (xas_retry(, entry))
+   continue;
+   dst[i++] = entry;
+   if (i == n)
+   break;
+   }
+   rcu_read_unlock();
+
+   return i;
+}
+EXPORT_SYMBOL(xa_get_entries);
+
+/**
+ * xa_get_tagged() - Copy tagged entries from the XArray into a normal array.
+ * @xa: The source XArray to copy from.
+ * @dst: The buffer to copy pointers into.
+ * @start: The first index in the XArray eligible to be copied from.
+ * @max: The last index in the XArray eligible to be copied from
+ * @n: The maximum number of entries to copy.
+ * @tag: Tag number.
+ *
+ * Copies up to @n non-NULL entries that have @tag set from the XArray.  The
+ * copied entries will have indices between @start and @max, inclusive.
+ *
+ * See the xa_get_entries() documentation for the consistency guarantees
+ * provided.
+ *
+ * Return: The number of entries copied.
+ */
+int xa_get_tagged(struct xarray *xa, void **dst, unsigned long start,
+   unsigned long max, unsigned int n, xa_tag_t tag)
+{
+   XA_STATE(xas, xa, start);
+   void *entry;
+   unsigned int i = 0;
+
+   if (!n)
+   return 0;
+
+   rcu_read_lock();
+   xas_for_each_tag(, entry, max, tag) {
+   if (xas_retry(, 

[PATCH v4 57/73] dax: Convert dax_unlock_mapping_entry to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

Replace slot_locked() with dax_locked() and inline unlock_slot() into
its only caller.
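
The lock bit lives inside the value entry itself, so a hedged sketch of
the helpers looks like this (MY_ENTRY_LOCK stands in for DAX_ENTRY_LOCK;
names illustrative): test the bit with xa_to_value(), clear it and re-wrap
with xa_mk_value().

#include <linux/xarray.h>

#define MY_ENTRY_LOCK   (1UL << 0)      /* stands in for DAX_ENTRY_LOCK */

/* Illustrative sketch: the lock bit is carried in the value entry. */
static bool my_locked(void *entry)
{
        return xa_to_value(entry) & MY_ENTRY_LOCK;
}

static void *my_unlock_entry(void *entry)
{
        return xa_mk_value(xa_to_value(entry) & ~MY_ENTRY_LOCK);
}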

Signed-off-by: Matthew Wilcox 
---
 fs/dax.c | 50 --
 1 file changed, 16 insertions(+), 34 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 86bacca51eed..03bfa599f75c 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -73,6 +73,11 @@ fs_initcall(init_dax_wait_table);
 #define DAX_ZERO_PAGE  (1UL << 2)
 #define DAX_EMPTY  (1UL << 3)
 
+static bool dax_locked(void *entry)
+{
+   return xa_to_value(entry) & DAX_ENTRY_LOCK;
+}
+
 static unsigned long dax_radix_sector(void *entry)
 {
return xa_to_value(entry) >> DAX_SHIFT;
@@ -182,17 +187,6 @@ static void dax_wake_mapping_entry_waiter(struct 
address_space *mapping,
__wake_up(wq, TASK_NORMAL, wake_all ? 0 : 1, );
 }
 
-/*
- * Check whether the given slot is locked. The function must be called with
- * mapping xa_lock held
- */
-static inline int slot_locked(struct address_space *mapping, void **slot)
-{
-   unsigned long entry = xa_to_value(
-   radix_tree_deref_slot_protected(slot, >pages.xa_lock));
-   return entry & DAX_ENTRY_LOCK;
-}
-
 /*
  * Mark the given slot is locked. The function must be called with
  * mapping xa_lock held
@@ -206,19 +200,6 @@ static inline void *lock_slot(struct address_space 
*mapping, void **slot)
return entry;
 }
 
-/*
- * Mark the given slot is unlocked. The function must be called with
- * mapping xa_lock held
- */
-static inline void *unlock_slot(struct address_space *mapping, void **slot)
-{
-   unsigned long v = xa_to_value(
-   radix_tree_deref_slot_protected(slot, >pages.xa_lock));
-   void *entry = xa_mk_value(v & ~DAX_ENTRY_LOCK);
-   radix_tree_replace_slot(>pages, slot, entry);
-   return entry;
-}
-
 /*
  * Lookup entry in radix tree, wait for it to become unlocked if it is
  * a data value entry and return it. The caller must call
@@ -242,8 +223,7 @@ static void *get_unlocked_mapping_entry(struct 
address_space *mapping,
entry = __radix_tree_lookup(>pages, index, NULL,
  );
if (!entry ||
-   WARN_ON_ONCE(!xa_is_value(entry)) ||
-   !slot_locked(mapping, slot)) {
+   WARN_ON_ONCE(!xa_is_value(entry)) || !dax_locked(entry)) {
if (slotp)
*slotp = slot;
return entry;
@@ -262,17 +242,19 @@ static void *get_unlocked_mapping_entry(struct 
address_space *mapping,
 static void dax_unlock_mapping_entry(struct address_space *mapping,
 pgoff_t index)
 {
-   void *entry, **slot;
+   XA_STATE(xas, &mapping->pages, index);
+   void *entry;
 
-   xa_lock_irq(>pages);
-   entry = __radix_tree_lookup(>pages, index, NULL, );
-   if (WARN_ON_ONCE(!entry || !xa_is_value(entry) ||
-!slot_locked(mapping, slot))) {
-   xa_unlock_irq(>pages);
+   xas_lock_irq(&xas);
+   entry = xas_load(&xas);
+   if (WARN_ON_ONCE(!entry || !xa_is_value(entry) || !dax_locked(entry))) {
+   xas_unlock_irq(&xas);
return;
}
-   unlock_slot(mapping, slot);
-   xa_unlock_irq(>pages);
+   entry = xa_mk_value(xa_to_value(entry) & ~DAX_ENTRY_LOCK);
+   xas_store(&xas, entry);
+   /* Safe to not call xas_pause here -- we don't touch the array after */
+   xas_unlock_irq(&xas);
dax_wake_mapping_entry_waiter(mapping, index, entry, false);
 }
 
-- 
2.15.0

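As an aside (not part of the patch), the locked state lives in the low bits of the value entry itself, so testing and toggling it is plain bit arithmetic on the encoded value, roughly:

    /* dax_locked() above is just this test: */
    bool locked = xa_to_value(entry) & DAX_ENTRY_LOCK;

    /* and unlocking re-encodes the value without the lock bit: */
    entry = xa_mk_value(xa_to_value(entry) & ~DAX_ENTRY_LOCK);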


[PATCH v4 53/73] fs: Convert writeback to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

A couple of short loops.

Signed-off-by: Matthew Wilcox 
---
 fs/fs-writeback.c | 27 ++-
 1 file changed, 10 insertions(+), 17 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index a3c2352507f6..18ad86ccba96 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -339,9 +339,9 @@ static void inode_switch_wbs_work_fn(struct work_struct 
*work)
struct address_space *mapping = inode->i_mapping;
struct bdi_writeback *old_wb = inode->i_wb;
struct bdi_writeback *new_wb = isw->new_wb;
-   struct radix_tree_iter iter;
+   XA_STATE(xas, &mapping->pages, 0);
+   struct page *page;
bool switched = false;
-   void **slot;
 
/*
 * By the time control reaches here, RCU grace period has passed
@@ -373,27 +373,20 @@ static void inode_switch_wbs_work_fn(struct work_struct 
*work)
/*
 * Count and transfer stats.  Note that PAGECACHE_TAG_DIRTY points
 * to possibly dirty pages while PAGECACHE_TAG_WRITEBACK points to
-* pages actually under underwriteback.
+* pages actually under writeback.
 */
-   radix_tree_for_each_tagged(slot, &mapping->pages, &iter, 0,
-  PAGECACHE_TAG_DIRTY) {
-   struct page *page = radix_tree_deref_slot_protected(slot,
-   &mapping->pages.xa_lock);
-   if (likely(page) && PageDirty(page)) {
+   xas_for_each_tag(&xas, page, ULONG_MAX, PAGECACHE_TAG_DIRTY) {
+   if (PageDirty(page)) {
dec_wb_stat(old_wb, WB_RECLAIMABLE);
inc_wb_stat(new_wb, WB_RECLAIMABLE);
}
}
 
-   radix_tree_for_each_tagged(slot, &mapping->pages, &iter, 0,
-  PAGECACHE_TAG_WRITEBACK) {
-   struct page *page = radix_tree_deref_slot_protected(slot,
-   &mapping->pages.xa_lock);
-   if (likely(page)) {
-   WARN_ON_ONCE(!PageWriteback(page));
-   dec_wb_stat(old_wb, WB_WRITEBACK);
-   inc_wb_stat(new_wb, WB_WRITEBACK);
-   }
+   xas_set(&xas, 0);
+   xas_for_each_tag(&xas, page, ULONG_MAX, PAGECACHE_TAG_WRITEBACK) {
+   WARN_ON_ONCE(!PageWriteback(page));
+   dec_wb_stat(old_wb, WB_WRITEBACK);
+   inc_wb_stat(new_wb, WB_WRITEBACK);
}
 
wb_get(new_wb);
-- 
2.15.0

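For reference, the same iteration pattern in isolation -- a rough sketch (not taken from the patch) that simply counts pages under writeback for a mapping:

    XA_STATE(xas, &mapping->pages, 0);
    struct page *page;
    unsigned long nr = 0;

    xas_lock_irq(&xas);
    xas_for_each_tag(&xas, page, ULONG_MAX, PAGECACHE_TAG_WRITEBACK)
        nr++;           /* the patch transfers per-wb stats at this point */
    xas_unlock_irq(&xas);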


[PATCH v4 58/73] dax: Convert lock_slot to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

Signed-off-by: Matthew Wilcox 
---
 fs/dax.c | 25 +
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 03bfa599f75c..d2007a17d257 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -188,15 +188,13 @@ static void dax_wake_mapping_entry_waiter(struct 
address_space *mapping,
 }
 
 /*
- * Mark the given slot is locked. The function must be called with
- * mapping xa_lock held
+ * Mark the given slot as locked.  Must be called with xa_lock held.
  */
-static inline void *lock_slot(struct address_space *mapping, void **slot)
+static inline void *lock_slot(struct xa_state *xas)
 {
-   unsigned long v = xa_to_value(
-   radix_tree_deref_slot_protected(slot, >pages.xa_lock));
+   unsigned long v = xa_to_value(xas_load(xas));
void *entry = xa_mk_value(v | DAX_ENTRY_LOCK);
-   radix_tree_replace_slot(>pages, slot, entry);
+   xas_store(xas, entry);
return entry;
 }
 
@@ -247,7 +245,7 @@ static void dax_unlock_mapping_entry(struct address_space 
*mapping,
 
xas_lock_irq(&xas);
entry = xas_load(&xas);
-   if (WARN_ON_ONCE(!entry || !xa_is_value(entry) || !dax_locked(entry))) {
+   if (WARN_ON_ONCE(!xa_is_value(entry) || !dax_locked(entry))) {
xas_unlock_irq(&xas);
return;
}
@@ -306,6 +304,7 @@ static void put_unlocked_mapping_entry(struct address_space 
*mapping,
 static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index,
unsigned long size_flag)
 {
+   XA_STATE(xas, &mapping->pages, index);
bool pmd_downgrade = false; /* splitting 2MiB entry into 4k entries? */
void *entry, **slot;
 
@@ -344,7 +343,7 @@ static void *grab_mapping_entry(struct address_space 
*mapping, pgoff_t index,
 * Make sure 'entry' remains valid while we drop
 * mapping xa_lock.
 */
-   entry = lock_slot(mapping, slot);
+   entry = lock_slot();
}
 
xa_unlock_irq(>pages);
@@ -411,7 +410,7 @@ static void *grab_mapping_entry(struct address_space 
*mapping, pgoff_t index,
xa_unlock_irq(>pages);
return entry;
}
-   entry = lock_slot(mapping, slot);
+   entry = lock_slot();
  out_unlock:
xa_unlock_irq(>pages);
return entry;
@@ -643,6 +642,7 @@ static int dax_writeback_one(struct block_device *bdev,
pgoff_t index, void *entry)
 {
struct radix_tree_root *pages = &mapping->pages;
+   XA_STATE(xas, pages, index);
void *entry2, **slot, *kaddr;
long ret = 0, id;
sector_t sector;
@@ -679,7 +679,7 @@ static int dax_writeback_one(struct block_device *bdev,
if (!radix_tree_tag_get(pages, index, PAGECACHE_TAG_TOWRITE))
goto put_unlocked;
/* Lock the entry to serialize with page faults */
-   entry = lock_slot(mapping, slot);
+   entry = lock_slot();
/*
 * We can clear the tag now but we have to be careful so that concurrent
 * dax_writeback_one() calls for the same index cannot finish before we
@@ -1504,8 +1504,9 @@ static int dax_insert_pfn_mkwrite(struct vm_fault *vmf,
  pfn_t pfn)
 {
struct address_space *mapping = vmf->vma->vm_file->f_mapping;
-   void *entry, **slot;
pgoff_t index = vmf->pgoff;
+   XA_STATE(xas, &mapping->pages, index);
+   void *entry, **slot;
int vmf_ret, error;
 
xa_lock_irq(&mapping->pages);
@@ -1521,7 +1522,7 @@ static int dax_insert_pfn_mkwrite(struct vm_fault *vmf,
return VM_FAULT_NOPAGE;
}
radix_tree_tag_set(&mapping->pages, index, PAGECACHE_TAG_DIRTY);
-   entry = lock_slot(mapping, slot);
+   entry = lock_slot(&xas);
xa_unlock_irq(&mapping->pages);
switch (pe_size) {
case PE_SIZE_PTE:
-- 
2.15.0

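Spelled out, the load/modify/store cycle that lock_slot() now performs through the xa_state looks roughly like this (illustrative only, reusing the constants from the file above):

    XA_STATE(xas, &mapping->pages, index);
    void *entry;

    xas_lock_irq(&xas);
    entry = xas_load(&xas);                 /* current value entry */
    xas_store(&xas, xa_mk_value(xa_to_value(entry) | DAX_ENTRY_LOCK));
    xas_unlock_irq(&xas);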


[PATCH v4 55/73] f2fs: Convert to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

This is a straightforward conversion.

Signed-off-by: Matthew Wilcox 
---
 fs/f2fs/data.c   |  3 +--
 fs/f2fs/dir.c|  5 +
 fs/f2fs/inline.c |  6 +-
 fs/f2fs/node.c   | 10 ++
 4 files changed, 5 insertions(+), 19 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index c8f6d9806896..1f3f192f152f 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -2175,8 +2175,7 @@ void f2fs_set_page_dirty_nobuffers(struct page *page)
xa_lock_irqsave(&mapping->pages, flags);
WARN_ON_ONCE(!PageUptodate(page));
account_page_dirtied(page, mapping);
-   radix_tree_tag_set(&mapping->pages,
-   page_index(page), PAGECACHE_TAG_DIRTY);
+   __xa_set_tag(&mapping->pages, page_index(page), PAGECACHE_TAG_DIRTY);
xa_unlock_irqrestore(&mapping->pages, flags);
unlock_page_memcg(page);
 
diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index b5515ea6bb2f..296070016ec9 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -708,7 +708,6 @@ void f2fs_delete_entry(struct f2fs_dir_entry *dentry, 
struct page *page,
unsigned int bit_pos;
int slots = GET_DENTRY_SLOTS(le16_to_cpu(dentry->name_len));
struct address_space *mapping = page_mapping(page);
-   unsigned long flags;
int i;
 
f2fs_update_time(F2FS_I_SB(dir), REQ_TIME);
@@ -739,10 +738,8 @@ void f2fs_delete_entry(struct f2fs_dir_entry *dentry, 
struct page *page,
 
if (bit_pos == NR_DENTRY_IN_BLOCK &&
!truncate_hole(dir, page->index, page->index + 1)) {
-   xa_lock_irqsave(>pages, flags);
-   radix_tree_tag_clear(>pages, page_index(page),
+   xa_clear_tag(>pages, page_index(page),
 PAGECACHE_TAG_DIRTY);
-   xa_unlock_irqrestore(>pages, flags);
 
clear_page_dirty_for_io(page);
ClearPagePrivate(page);
diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c
index 7858b8e15f33..d3c3f84beca9 100644
--- a/fs/f2fs/inline.c
+++ b/fs/f2fs/inline.c
@@ -204,7 +204,6 @@ int f2fs_write_inline_data(struct inode *inode, struct page 
*page)
void *src_addr, *dst_addr;
struct dnode_of_data dn;
struct address_space *mapping = page_mapping(page);
-   unsigned long flags;
int err;
 
set_new_dnode(, inode, NULL, NULL, 0);
@@ -226,10 +225,7 @@ int f2fs_write_inline_data(struct inode *inode, struct 
page *page)
kunmap_atomic(src_addr);
set_page_dirty(dn.inode_page);
 
-   xa_lock_irqsave(&mapping->pages, flags);
-   radix_tree_tag_clear(&mapping->pages, page_index(page),
-PAGECACHE_TAG_DIRTY);
-   xa_unlock_irqrestore(&mapping->pages, flags);
+   xa_clear_tag(&mapping->pages, page_index(page), PAGECACHE_TAG_DIRTY);
 
set_inode_flag(inode, FI_APPEND_WRITE);
set_inode_flag(inode, FI_DATA_EXIST);
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 6b64a3009d55..0a6d5c2f996e 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -88,14 +88,10 @@ bool available_free_memory(struct f2fs_sb_info *sbi, int 
type)
 static void clear_node_page_dirty(struct page *page)
 {
struct address_space *mapping = page->mapping;
-   unsigned int long flags;
 
if (PageDirty(page)) {
-   xa_lock_irqsave(>pages, flags);
-   radix_tree_tag_clear(>pages,
-   page_index(page),
+   xa_clear_tag(>pages, page_index(page),
PAGECACHE_TAG_DIRTY);
-   xa_unlock_irqrestore(>pages, flags);
 
clear_page_dirty_for_io(page);
dec_page_count(F2FS_M_SB(mapping), F2FS_DIRTY_NODES);
@@ -1142,9 +1138,7 @@ void ra_node_page(struct f2fs_sb_info *sbi, nid_t nid)
return;
f2fs_bug_on(sbi, check_nid_range(sbi, nid));
 
-   rcu_read_lock();
-   apage = radix_tree_lookup(&NODE_MAPPING(sbi)->pages, nid);
-   rcu_read_unlock();
+   apage = xa_load(&NODE_MAPPING(sbi)->pages, nid);
if (apage)
return;
 
-- 
2.15.0

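All three hunks follow the same shape; roughly (a sketch, not a quote from the patch):

    /* before: take the xarray lock by hand around the radix tree call */
    xa_lock_irqsave(&mapping->pages, flags);
    radix_tree_tag_clear(&mapping->pages, page_index(page),
                         PAGECACHE_TAG_DIRTY);
    xa_unlock_irqrestore(&mapping->pages, flags);

    /* after: xa_clear_tag() takes and releases the lock itself */
    xa_clear_tag(&mapping->pages, page_index(page), PAGECACHE_TAG_DIRTY);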


[PATCH v4 56/73] lustre: Convert to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

Signed-off-by: Matthew Wilcox 
---
 drivers/staging/lustre/lustre/llite/glimpse.c   | 12 +---
 drivers/staging/lustre/lustre/mdc/mdc_request.c | 16 
 2 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/glimpse.c 
b/drivers/staging/lustre/lustre/llite/glimpse.c
index 5f2843da911c..25232fdf5797 100644
--- a/drivers/staging/lustre/lustre/llite/glimpse.c
+++ b/drivers/staging/lustre/lustre/llite/glimpse.c
@@ -57,7 +57,7 @@ static const struct cl_lock_descr whole_file = {
 };
 
 /*
- * Check whether file has possible unwriten pages.
+ * Check whether file has possible unwritten pages.
  *
  * \retval 1file is mmap-ed or has dirty pages
  *  0otherwise
@@ -66,16 +66,14 @@ blkcnt_t dirty_cnt(struct inode *inode)
 {
blkcnt_t cnt = 0;
struct vvp_object *vob = cl_inode2vvp(inode);
-   void  *results[1];
 
-   if (inode->i_mapping)
-   cnt += radix_tree_gang_lookup_tag(>i_mapping->pages,
- results, 0, 1,
- PAGECACHE_TAG_DIRTY);
+   if (inode->i_mapping && xa_tagged(>i_mapping->pages,
+   PAGECACHE_TAG_DIRTY))
+   cnt = 1;
if (cnt == 0 && atomic_read(>vob_mmap_cnt) > 0)
cnt = 1;
 
-   return (cnt > 0) ? 1 : 0;
+   return cnt;
 }
 
 int cl_glimpse_lock(const struct lu_env *env, struct cl_io *io,
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_request.c 
b/drivers/staging/lustre/lustre/mdc/mdc_request.c
index 2ec79a6b17da..ea23247e9e02 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_request.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_request.c
@@ -934,17 +934,18 @@ static struct page *mdc_page_locate(struct address_space 
*mapping, __u64 *hash,
 * hash _smaller_ than one we are looking for.
 */
unsigned long offset = hash_x_index(*hash, hash64);
+   XA_STATE(xas, &mapping->pages, offset);
struct page *page;
-   int found;
 
-   xa_lock_irq(&mapping->pages);
-   found = radix_tree_gang_lookup(&mapping->pages,
-  (void **)&page, offset, 1);
-   if (found > 0 && !xa_is_value(page)) {
+   xas_lock_irq(&xas);
+   page = xas_find(&xas, ULONG_MAX);
+   if (xa_is_value(page))
+   page = NULL;
+   if (page) {
struct lu_dirpage *dp;
 
get_page(page);
-   xa_unlock_irq(>pages);
+   xas_unlock_irq();
/*
 * In contrast to find_lock_page() we are sure that directory
 * page cannot be truncated (while DLM lock is held) and,
@@ -992,8 +993,7 @@ static struct page *mdc_page_locate(struct address_space 
*mapping, __u64 *hash,
page = ERR_PTR(-EIO);
}
} else {
-   xa_unlock_irq(>pages);
-   page = NULL;
+   xas_unlock_irq();
}
return page;
 }
-- 
2.15.0



[PATCH v4 72/73] xfs: Convert mru cache to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

This eliminates a call to radix_tree_preload().

Signed-off-by: Matthew Wilcox 
---
 fs/xfs/xfs_mru_cache.c | 72 +++---
 1 file changed, 33 insertions(+), 39 deletions(-)

diff --git a/fs/xfs/xfs_mru_cache.c b/fs/xfs/xfs_mru_cache.c
index f8a674d7f092..2179bede5396 100644
--- a/fs/xfs/xfs_mru_cache.c
+++ b/fs/xfs/xfs_mru_cache.c
@@ -101,10 +101,9 @@
  * an infinite loop in the code.
  */
 struct xfs_mru_cache {
-   struct radix_tree_root  store; /* Core storage data structure.  */
+   struct xarray   store; /* Core storage data structure.  */
struct list_head*lists;/* Array of lists, one per grp.  */
struct list_headreap_list; /* Elements overdue for reaping. */
-   spinlock_t  lock;  /* Lock to protect this struct.  */
unsigned intgrp_count; /* Number of discrete groups.*/
unsigned intgrp_time;  /* Time period spanned by grps.  */
unsigned intlru_grp;   /* Group containing time zero.   */
@@ -232,22 +231,21 @@ _xfs_mru_cache_list_insert(
  * data store, removing it from the reap list, calling the client's free
  * function and deleting the element from the element zone.
  *
- * We get called holding the mru->lock, which we drop and then reacquire.
- * Sparse need special help with this to tell it we know what we are doing.
+ * We get called holding the mru->store lock, which we drop and then reacquire.
+ * Sparse needs special help with this to tell it we know what we are doing.
  */
 STATIC void
 _xfs_mru_cache_clear_reap_list(
struct xfs_mru_cache*mru)
-   __releases(mru->lock) __acquires(mru->lock)
+   __releases(mru->store) __acquires(mru->store)
 {
struct xfs_mru_cache_elem *elem, *next;
struct list_headtmp;
 
INIT_LIST_HEAD();
list_for_each_entry_safe(elem, next, >reap_list, list_node) {
-
/* Remove the element from the data store. */
-   radix_tree_delete(>store, elem->key);
+   __xa_erase(>store, elem->key);
 
/*
 * remove to temp list so it can be freed without
@@ -255,14 +253,14 @@ _xfs_mru_cache_clear_reap_list(
 */
list_move(>list_node, );
}
-   spin_unlock(>lock);
+   xa_unlock(>store);
 
list_for_each_entry_safe(elem, next, , list_node) {
list_del_init(>list_node);
mru->free_func(elem);
}
 
-   spin_lock(>lock);
+   xa_lock(>store);
 }
 
 /*
@@ -284,7 +282,7 @@ _xfs_mru_cache_reap(
if (!mru || !mru->lists)
return;
 
-   spin_lock(>lock);
+   xa_lock(>store);
next = _xfs_mru_cache_migrate(mru, jiffies);
_xfs_mru_cache_clear_reap_list(mru);
 
@@ -298,7 +296,7 @@ _xfs_mru_cache_reap(
queue_delayed_work(xfs_mru_reap_wq, >work, next);
}
 
-   spin_unlock(>lock);
+   xa_unlock(>store);
 }
 
 int
@@ -358,13 +356,8 @@ xfs_mru_cache_create(
for (grp = 0; grp < mru->grp_count; grp++)
INIT_LIST_HEAD(mru->lists + grp);
 
-   /*
-* We use GFP_KERNEL radix tree preload and do inserts under a
-* spinlock so GFP_ATOMIC is appropriate for the radix tree itself.
-*/
-   INIT_RADIX_TREE(>store, GFP_ATOMIC);
+   xa_init(>store);
INIT_LIST_HEAD(>reap_list);
-   spin_lock_init(>lock);
INIT_DELAYED_WORK(>work, _xfs_mru_cache_reap);
 
mru->grp_time  = grp_time;
@@ -394,17 +387,17 @@ xfs_mru_cache_flush(
if (!mru || !mru->lists)
return;
 
-   spin_lock(>lock);
+   xa_lock(>store);
if (mru->queued) {
-   spin_unlock(>lock);
+   xa_unlock(>store);
cancel_delayed_work_sync(>work);
-   spin_lock(>lock);
+   xa_lock(>store);
}
 
_xfs_mru_cache_migrate(mru, jiffies + mru->grp_count * mru->grp_time);
_xfs_mru_cache_clear_reap_list(mru);
 
-   spin_unlock(>lock);
+   xa_unlock(>store);
 }
 
 void
@@ -431,24 +424,24 @@ xfs_mru_cache_insert(
unsigned long   key,
struct xfs_mru_cache_elem *elem)
 {
+   XA_STATE(xas, &mru->store, key);
int error;
 
ASSERT(mru && mru->lists);
if (!mru || !mru->lists)
return -EINVAL;
 
-   if (radix_tree_preload(GFP_NOFS))
-   return -ENOMEM;
-
INIT_LIST_HEAD(>list_node);
elem->key = key;
 
-   spin_lock(>lock);
-   error = radix_tree_insert(>store, key, elem);
-   radix_tree_preload_end();
-   if (!error)
-   _xfs_mru_cache_list_insert(mru, elem);
-   spin_unlock(>lock);
+   do {
+   xas_lock(&xas);
+   xas_store(&xas, elem);
+ 
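
The hunk is cut off above; for context, the usual shape of an xas_store() loop that replaces radix_tree_preload() is roughly the following (a sketch of the pattern, not necessarily the exact remainder of the hunk):

    do {
        xas_lock(&xas);
        xas_store(&xas, elem);
        if (!xas_error(&xas))
            _xfs_mru_cache_list_insert(mru, elem);
        xas_unlock(&xas);
    } while (xas_nomem(&xas, GFP_NOFS));    /* retry after allocating a node */

    error = xas_error(&xas);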

[PATCH v4 63/73] dax: Convert dax_insert_mapping_entry to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

Signed-off-by: Matthew Wilcox 
---
 fs/dax.c | 18 ++
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 619aff70583f..de85ce87d333 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -498,9 +498,9 @@ static void *dax_insert_mapping_entry(struct address_space 
*mapping,
  void *entry, sector_t sector,
  unsigned long flags, bool dirty)
 {
-   struct radix_tree_root *pages = >pages;
void *new_entry;
pgoff_t index = vmf->pgoff;
+   XA_STATE(xas, &mapping->pages, index);
 
if (dirty)
__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
@@ -516,7 +516,7 @@ static void *dax_insert_mapping_entry(struct address_space 
*mapping,
PAGE_SIZE, 0);
}
 
-   xa_lock_irq(&mapping->pages);
+   xas_lock_irq(&xas);
new_entry = dax_radix_locked_entry(sector, flags);
 
if (dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) {
@@ -528,21 +528,15 @@ static void *dax_insert_mapping_entry(struct 
address_space *mapping,
 * existing entry is a PMD, we will just leave the PMD in the
 * tree and dirty it if necessary.
 */
-   struct radix_tree_node *node;
-   void **slot;
-   void *ret;
-
-   ret = __radix_tree_lookup(pages, index, , );
-   WARN_ON_ONCE(ret != entry);
-   __radix_tree_replace(pages, node, slot,
-new_entry, NULL);
+   void *prev = xas_store(&xas, new_entry);
+   WARN_ON_ONCE(prev != entry);
entry = new_entry;
}
 
if (dirty)
-   radix_tree_tag_set(pages, index, PAGECACHE_TAG_DIRTY);
+   xas_set_tag(&xas, PAGECACHE_TAG_DIRTY);

-   xa_unlock_irq(&mapping->pages);
+   xas_unlock_irq(&xas);
return entry;
 }
 
-- 
2.15.0



[PATCH v4 71/73] xfs: Convert xfs dquot to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

This is a pretty straight-forward conversion.

Signed-off-by: Matthew Wilcox 
---
 fs/xfs/xfs_dquot.c | 33 +
 fs/xfs/xfs_qm.c| 32 
 fs/xfs/xfs_qm.h| 18 +-
 3 files changed, 42 insertions(+), 41 deletions(-)

diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index e2a466df5dd1..a35fcc37770b 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -44,7 +44,7 @@
  * Lock order:
  *
  * ip->i_lock
- *   qi->qi_tree_lock
+ *   qi->qi_xa_lock
  * dquot->q_qlock (xfs_dqlock() and friends)
  *   dquot->q_flush (xfs_dqflock() and friends)
  *   qi->qi_lru_lock
@@ -752,8 +752,8 @@ xfs_qm_dqget(
xfs_dquot_t **O_dqpp) /* OUT : locked incore dquot */
 {
struct xfs_quotainfo*qi = mp->m_quotainfo;
-   struct radix_tree_root *tree = xfs_dquot_tree(qi, type);
-   struct xfs_dquot*dqp;
+   struct xarray   *xa = xfs_dquot_xa(qi, type);
+   struct xfs_dquot*dqp, *curr;
int error;
 
ASSERT(XFS_IS_QUOTA_RUNNING(mp));
@@ -772,13 +772,14 @@ xfs_qm_dqget(
}
 
 restart:
-   mutex_lock(&qi->qi_tree_lock);
-   dqp = radix_tree_lookup(tree, id);
+   mutex_lock(&qi->qi_xa_lock);
+   dqp = xa_load(xa, id);
+found:
if (dqp) {
xfs_dqlock(dqp);
if (dqp->dq_flags & XFS_DQ_FREEING) {
xfs_dqunlock(dqp);
-   mutex_unlock(>qi_tree_lock);
+   mutex_unlock(>qi_xa_lock);
trace_xfs_dqget_freeing(dqp);
delay(1);
goto restart;
@@ -788,7 +789,7 @@ xfs_qm_dqget(
if (flags & XFS_QMOPT_DQNEXT) {
if (XFS_IS_DQUOT_UNINITIALIZED(dqp)) {
xfs_dqunlock(dqp);
-   mutex_unlock(>qi_tree_lock);
+   mutex_unlock(>qi_xa_lock);
error = xfs_dq_get_next_id(mp, type, );
if (error)
return error;
@@ -797,14 +798,14 @@ xfs_qm_dqget(
}
 
dqp->q_nrefs++;
-   mutex_unlock(>qi_tree_lock);
+   mutex_unlock(>qi_xa_lock);
 
trace_xfs_dqget_hit(dqp);
XFS_STATS_INC(mp, xs_qm_dqcachehits);
*O_dqpp = dqp;
return 0;
}
-   mutex_unlock(>qi_tree_lock);
+   mutex_unlock(>qi_xa_lock);
XFS_STATS_INC(mp, xs_qm_dqcachemisses);
 
/*
@@ -854,20 +855,20 @@ xfs_qm_dqget(
}
}
 
-   mutex_lock(>qi_tree_lock);
-   error = radix_tree_insert(tree, id, dqp);
-   if (unlikely(error)) {
-   WARN_ON(error != -EEXIST);
+   mutex_lock(>qi_xa_lock);
+   curr = xa_cmpxchg(xa, id, NULL, dqp, GFP_NOFS);
+   if (unlikely(curr)) {
+   WARN_ON(IS_ERR(curr));
 
/*
 * Duplicate found. Just throw away the new dquot and start
 * over.
 */
-   mutex_unlock(>qi_tree_lock);
trace_xfs_dqget_dup(dqp);
xfs_qm_dqdestroy(dqp);
XFS_STATS_INC(mp, xs_qm_dquot_dups);
-   goto restart;
+   dqp = curr;
+   goto found;
}
 
/*
@@ -877,7 +878,7 @@ xfs_qm_dqget(
dqp->q_nrefs = 1;
 
qi->qi_dquots++;
-   mutex_unlock(>qi_tree_lock);
+   mutex_unlock(>qi_xa_lock);
 
/* If we are asked to find next active id, keep looking */
if (flags & XFS_QMOPT_DQNEXT) {
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 010a13a201aa..5a75836faf92 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -67,7 +67,7 @@ xfs_qm_dquot_walk(
void*data)
 {
struct xfs_quotainfo*qi = mp->m_quotainfo;
-   struct radix_tree_root  *tree = xfs_dquot_tree(qi, type);
+   struct xarray   *xa = xfs_dquot_xa(qi, type);
uint32_tnext_index;
int last_error = 0;
int skipped;
@@ -83,11 +83,11 @@ xfs_qm_dquot_walk(
int error = 0;
int i;
 
-   mutex_lock(&qi->qi_tree_lock);
-   nr_found = radix_tree_gang_lookup(tree, (void **)batch,
-   next_index, XFS_DQ_LOOKUP_BATCH);
+   mutex_lock(&qi->qi_xa_lock);
+   nr_found = xa_get_entries(xa, (void **)batch, next_index,
+   ULONG_MAX, XFS_DQ_LOOKUP_BATCH);
if (!nr_found) {
-   mutex_unlock(>qi_tree_lock);
+   mutex_unlock(>qi_xa_lock);

[PATCH v4 73/73] usb: Convert xhci-mem to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

The XArray API is a slightly better fit for xhci_insert_segment_mapping()
than the radix tree API was.

Signed-off-by: Matthew Wilcox 
---
 drivers/usb/host/xhci-mem.c | 70 +
 drivers/usb/host/xhci.h |  6 ++--
 2 files changed, 35 insertions(+), 41 deletions(-)

diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index e1fba4688509..533d813bdc52 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -149,70 +149,64 @@ static void xhci_link_rings(struct xhci_hcd *xhci, struct 
xhci_ring *ring,
 }
 
 /*
- * We need a radix tree for mapping physical addresses of TRBs to which stream
- * ID they belong to.  We need to do this because the host controller won't 
tell
+ * We need to map physical addresses of TRBs to the stream ID they belong to.
+ * We need to do this because the host controller won't tell
  * us which stream ring the TRB came from.  We could store the stream ID in an
  * event data TRB, but that doesn't help us for the cancellation case, since 
the
  * endpoint may stop before it reaches that event data TRB.
  *
- * The radix tree maps the upper portion of the TRB DMA address to a ring
+ * The xarray maps the upper portion of the TRB DMA address to a ring
  * segment that has the same upper portion of DMA addresses.  For example, say 
I
  * have segments of size 1KB, that are always 1KB aligned.  A segment may
  * start at 0x10c91000 and end at 0x10c913f0.  If I use the upper 10 bits, the
- * key to the stream ID is 0x43244.  I can use the DMA address of the TRB to
- * pass the radix tree a key to get the right stream ID:
+ * index of the stream ID is 0x43244.  I can use the DMA address of the TRB as
+ * the xarray index to get the right stream ID:
  *
  * 0x10c90fff >> 10 = 0x43243
  * 0x10c912c0 >> 10 = 0x43244
  * 0x10c91400 >> 10 = 0x43245
  *
  * Obviously, only those TRBs with DMA addresses that are within the segment
- * will make the radix tree return the stream ID for that ring.
+ * will make the xarray return the stream ID for that ring.
  *
- * Caveats for the radix tree:
+ * Caveats for the xarray:
  *
- * The radix tree uses an unsigned long as a key pair.  On 32-bit systems, an
+ * The xarray uses an unsigned long for the index.  On 32-bit systems, an
  * unsigned long will be 32-bits; on a 64-bit system an unsigned long will be
  * 64-bits.  Since we only request 32-bit DMA addresses, we can use that as the
- * key on 32-bit or 64-bit systems (it would also be fine if we asked for 
64-bit
- * PCI DMA addresses on a 64-bit system).  There might be a problem on 32-bit
- * extended systems (where the DMA address can be bigger than 32-bits),
+ * index on 32-bit or 64-bit systems (it would also be fine if we asked for
+ * 64-bit PCI DMA addresses on a 64-bit system).  There might be a problem on
+ * 32-bit extended systems (where the DMA address can be bigger than 32-bits),
  * if we allow the PCI dma mask to be bigger than 32-bits.  So don't do that.
  */
-static int xhci_insert_segment_mapping(struct radix_tree_root *trb_address_map,
+
+static unsigned long trb_index(dma_addr_t dma)
+{
+   return (unsigned long)(dma >> TRB_SEGMENT_SHIFT);
+}
+
+static int xhci_insert_segment_mapping(struct xarray *trb_address_map,
struct xhci_ring *ring,
struct xhci_segment *seg,
-   gfp_t mem_flags)
+   gfp_t gfp)
 {
-   unsigned long key;
-   int ret;
-
-   key = (unsigned long)(seg->dma >> TRB_SEGMENT_SHIFT);
/* Skip any segments that were already added. */
-   if (radix_tree_lookup(trb_address_map, key))
-   return 0;
+   void *entry = xa_cmpxchg(trb_address_map, trb_index(seg->dma), NULL,
+   ring, gfp);
 
-   ret = radix_tree_maybe_preload(mem_flags);
-   if (ret)
-   return ret;
-   ret = radix_tree_insert(trb_address_map,
-   key, ring);
-   radix_tree_preload_end();
-   return ret;
+   if (IS_ERR(entry))
+   return PTR_ERR(entry);
+   return 0;
 }
 
-static void xhci_remove_segment_mapping(struct radix_tree_root 
*trb_address_map,
+static void xhci_remove_segment_mapping(struct xarray *trb_address_map,
struct xhci_segment *seg)
 {
-   unsigned long key;
-
-   key = (unsigned long)(seg->dma >> TRB_SEGMENT_SHIFT);
-   if (radix_tree_lookup(trb_address_map, key))
-   radix_tree_delete(trb_address_map, key);
+   xa_erase(trb_address_map, trb_index(seg->dma));
 }
 
 static int xhci_update_stream_segment_mapping(
-   struct radix_tree_root *trb_address_map,
+   struct xarray *trb_address_map,
struct xhci_ring *ring,
struct xhci_segment *first_seg,
struct xhci_segment *last_seg,
@@ 
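
Tying the worked example back to the code: trb_index() is exactly the shift described in the comment, and the event-time lookup then becomes a single xa_load().  A sketch (the lookup call and the trb_dma variable are illustrative, and the 1KB/shift-of-10 numbers come from the comment's example rather than the real TRB_SEGMENT_SHIFT):

    /* with the comment's 1KB segments: 0x10c912c0 >> 10 == 0x43244 */
    struct xhci_ring *ring;

    ring = xa_load(trb_address_map, trb_index(trb_dma));
    if (!ring)
        ;   /* this TRB is not in any mapped stream segment */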

[PATCH v4 66/73] page cache: Finish XArray conversion

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

With no more radix tree API users left, we can drop the GFP flags
and use xa_init() instead of INIT_RADIX_TREE().

Signed-off-by: Matthew Wilcox 
---
 fs/inode.c | 2 +-
 include/linux/fs.h | 2 +-
 mm/swap_state.c| 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index c7b00573c10d..2046ff6dd1b3 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -348,7 +348,7 @@ EXPORT_SYMBOL(inc_nlink);
 void address_space_init_once(struct address_space *mapping)
 {
memset(mapping, 0, sizeof(*mapping));
-   INIT_RADIX_TREE(&mapping->pages, GFP_ATOMIC | __GFP_ACCOUNT);
+   xa_init(&mapping->pages);
init_rwsem(&mapping->i_mmap_rwsem);
INIT_LIST_HEAD(&mapping->private_list);
spin_lock_init(&mapping->private_lock);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c58bc3c619bf..b459bf4ddb62 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -410,7 +410,7 @@ int pagecache_write_end(struct file *, struct address_space 
*mapping,
  */
 struct address_space {
struct inode*host;
-   struct radix_tree_root  pages;
+   struct xarray   pages;
gfp_t   gfp_mask;
atomic_ti_mmap_writable;
struct rb_root_cached   i_mmap;
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 7c862258af66..101e952e01e6 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -573,7 +573,7 @@ int init_swap_address_space(unsigned int type, unsigned 
long nr_pages)
return -ENOMEM;
for (i = 0; i < nr; i++) {
space = spaces + i;
-   INIT_RADIX_TREE(&space->pages, GFP_ATOMIC|__GFP_NOWARN);
+   xa_init(&space->pages);
atomic_set(&space->i_mmap_writable, 0);
space->a_ops = _aops;
/* swap cache doesn't use writeback related tags */
-- 
2.15.0



[PATCH v4 65/73] dax: Fix sparse warning

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

sparse doesn't know that follow_pte_pmd conditionally acquires the ptl,
so add an annotation to let it know what's going on.

Signed-off-by: Matthew Wilcox 
---
 fs/dax.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/dax.c b/fs/dax.c
index c663d82e8ba3..7a86ff1153dd 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -531,6 +531,7 @@ static void dax_mapping_entry_mkclean(struct address_space 
*mapping,
 */
if (follow_pte_pmd(vma->vm_mm, address, &start, &end, &ptep,
&pmdp, &ptl))
continue;
+   __acquire(ptl); /* Conditionally acquired above */
 
/*
 * No need to call mmu_notifier_invalidate_range() as we are
-- 
2.15.0



[PATCH v4 69/73] xfs: Convert m_perag_tree to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

Getting rid of the m_perag_lock lets us also get rid of the call to
radix_tree_preload().  This is a relatively naive conversion; we could
improve performance over the radix tree implementation by passing around
xa_state pointers instead of indices, possibly at the expense of extending
rcu_read_lock() periods.

Signed-off-by: Matthew Wilcox 
---
 fs/xfs/libxfs/xfs_sb.c |  9 -
 fs/xfs/xfs_icache.c| 35 +--
 fs/xfs/xfs_icache.h|  6 +++---
 fs/xfs/xfs_mount.c | 19 ---
 fs/xfs/xfs_mount.h |  3 +--
 5 files changed, 21 insertions(+), 51 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 9b5aae2bcc0b..3b0b65eb8224 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -59,7 +59,7 @@ xfs_perag_get(
int ref = 0;
 
rcu_read_lock();
-   pag = radix_tree_lookup(>m_perag_tree, agno);
+   pag = xa_load(>m_perag_xa, agno);
if (pag) {
ASSERT(atomic_read(>pag_ref) >= 0);
ref = atomic_inc_return(>pag_ref);
@@ -78,14 +78,13 @@ xfs_perag_get_tag(
xfs_agnumber_t  first,
int tag)
 {
+   XA_STATE(xas, &mp->m_perag_xa, first);
struct xfs_perag *pag;
-   int found;
int ref;

rcu_read_lock();
-   found = radix_tree_gang_lookup_tag(&mp->m_perag_tree,
-   (void **)&pag, first, 1, tag);
-   if (found <= 0) {
+   pag = xas_find_tag(&xas, ULONG_MAX, tag);
+   if (!pag) {
rcu_read_unlock();
return NULL;
}
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 43005fbe8b1e..f56e500d89e2 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -156,13 +156,10 @@ static void
 xfs_reclaim_work_queue(
struct xfs_mount*mp)
 {
-
-   rcu_read_lock();
-   if (radix_tree_tagged(&mp->m_perag_tree, XFS_ICI_RECLAIM_TAG)) {
+   if (xa_tagged(&mp->m_perag_xa, XFS_ICI_RECLAIM_TAG)) {
queue_delayed_work(mp->m_reclaim_workqueue, &mp->m_reclaim_work,
msecs_to_jiffies(xfs_syncd_centisecs / 6 * 10));
}
-   rcu_read_unlock();
 }
 
 /*
@@ -194,10 +191,7 @@ xfs_perag_set_reclaim_tag(
return;
 
/* propagate the reclaim tag up into the perag radix tree */
-   spin_lock(>m_perag_lock);
-   radix_tree_tag_set(>m_perag_tree, pag->pag_agno,
-  XFS_ICI_RECLAIM_TAG);
-   spin_unlock(>m_perag_lock);
+   xa_set_tag(>m_perag_xa, pag->pag_agno, XFS_ICI_RECLAIM_TAG);
 
/* schedule periodic background inode reclaim */
xfs_reclaim_work_queue(mp);
@@ -216,10 +210,7 @@ xfs_perag_clear_reclaim_tag(
return;
 
/* clear the reclaim tag from the perag radix tree */
-   spin_lock(>m_perag_lock);
-   radix_tree_tag_clear(>m_perag_tree, pag->pag_agno,
-XFS_ICI_RECLAIM_TAG);
-   spin_unlock(>m_perag_lock);
+   xa_clear_tag(>m_perag_xa, pag->pag_agno, XFS_ICI_RECLAIM_TAG);
trace_xfs_perag_clear_reclaim(mp, pag->pag_agno, -1, _RET_IP_);
 }
 
@@ -847,12 +838,10 @@ void
 xfs_queue_eofblocks(
struct xfs_mount *mp)
 {
-   rcu_read_lock();
-   if (radix_tree_tagged(>m_perag_tree, XFS_ICI_EOFBLOCKS_TAG))
+   if (xa_tagged(>m_perag_xa, XFS_ICI_EOFBLOCKS_TAG))
queue_delayed_work(mp->m_eofblocks_workqueue,
   >m_eofblocks_work,
   msecs_to_jiffies(xfs_eofb_secs * 1000));
-   rcu_read_unlock();
 }
 
 void
@@ -874,12 +863,10 @@ STATIC void
 xfs_queue_cowblocks(
struct xfs_mount *mp)
 {
-   rcu_read_lock();
-   if (radix_tree_tagged(>m_perag_tree, XFS_ICI_COWBLOCKS_TAG))
+   if (xa_tagged(>m_perag_xa, XFS_ICI_COWBLOCKS_TAG))
queue_delayed_work(mp->m_eofblocks_workqueue,
   >m_cowblocks_work,
   msecs_to_jiffies(xfs_cowb_secs * 1000));
-   rcu_read_unlock();
 }
 
 void
@@ -1542,7 +1529,7 @@ __xfs_inode_set_eofblocks_tag(
void(*execute)(struct xfs_mount *mp),
void(*set_tp)(struct xfs_mount *mp, xfs_agnumber_t agno,
  int error, unsigned long caller_ip),
-   int tag)
+   xa_tag_ttag)
 {
struct xfs_mount *mp = ip->i_mount;
struct xfs_perag *pag;
@@ -1566,11 +1553,9 @@ __xfs_inode_set_eofblocks_tag(
   XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino), tag);
if (!tagged) {
/* propagate the eofblocks tag up into the perag radix tree */
-   spin_lock(>i_mount->m_perag_lock);
-   radix_tree_tag_set(>i_mount->m_perag_tree,
+  

[PATCH v4 22/73] page cache: Convert hole search to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

The page cache offers the ability to search for a miss in the previous or
next N locations.  Rather than teach the XArray about the page cache's
definition of a miss, use xas_prev() and xas_next() to search the page
array.  This should be more efficient as it does not have to start the
lookup from the top for each index.

Signed-off-by: Matthew Wilcox 
---
 fs/nfs/blocklayout/blocklayout.c |   2 +-
 include/linux/pagemap.h  |   4 +-
 mm/filemap.c | 110 ++-
 mm/readahead.c   |   4 +-
 4 files changed, 55 insertions(+), 65 deletions(-)

diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index 995d707537da..7bd643538cff 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -826,7 +826,7 @@ static u64 pnfs_num_cont_bytes(struct inode *inode, pgoff_t 
idx)
end = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
if (end != inode->i_mapping->nrpages) {
rcu_read_lock();
-   end = page_cache_next_hole(mapping, idx + 1, ULONG_MAX);
+   end = page_cache_next_gap(mapping, idx + 1, ULONG_MAX);
rcu_read_unlock();
}
 
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 80a6149152d4..0db127c3ccac 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -241,9 +241,9 @@ static inline gfp_t readahead_gfp_mask(struct address_space 
*x)
 
 typedef int filler_t(void *, struct page *);
 
-pgoff_t page_cache_next_hole(struct address_space *mapping,
+pgoff_t page_cache_next_gap(struct address_space *mapping,
 pgoff_t index, unsigned long max_scan);
-pgoff_t page_cache_prev_hole(struct address_space *mapping,
+pgoff_t page_cache_prev_gap(struct address_space *mapping,
 pgoff_t index, unsigned long max_scan);
 
 #define FGP_ACCESSED   0x0001
diff --git a/mm/filemap.c b/mm/filemap.c
index 1d012dd3629e..650624f7b79d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1326,86 +1326,76 @@ int __lock_page_or_retry(struct page *page, struct 
mm_struct *mm,
 }
 
 /**
- * page_cache_next_hole - find the next hole (not-present entry)
- * @mapping: mapping
- * @index: index
- * @max_scan: maximum range to search
- *
- * Search the set [index, min(index+max_scan-1, MAX_INDEX)] for the
- * lowest indexed hole.
- *
- * Returns: the index of the hole if found, otherwise returns an index
- * outside of the set specified (in which case 'return - index >=
- * max_scan' will be true). In rare cases of index wrap-around, 0 will
- * be returned.
- *
- * page_cache_next_hole may be called under rcu_read_lock. However,
- * like radix_tree_gang_lookup, this will not atomically search a
- * snapshot of the tree at a single point in time. For example, if a
- * hole is created at index 5, then subsequently a hole is created at
- * index 10, page_cache_next_hole covering both indexes may return 10
- * if called under rcu_read_lock.
+ * page_cache_next_gap() - Find the next gap in the page cache.
+ * @mapping: Mapping.
+ * @index: Index.
+ * @max_scan: Maximum range to search.
+ *
+ * Search the range [index, min(index + max_scan - 1, ULONG_MAX)] for the
+ * gap with the lowest index.
+ *
+ * This function may be called under the rcu_read_lock.  However, this will
+ * not atomically search a snapshot of the cache at a single point in time.
+ * For example, if a gap is created at index 5, then subsequently a gap is
+ * created at index 10, page_cache_next_gap covering both indices may
+ * return 10 if called under the rcu_read_lock.
+ *
+ * Return: The index of the gap if found, otherwise an index outside the
+ * range specified (in which case 'return - index >= max_scan' will be true).
+ * In the rare case of index wrap-around, 0 will be returned.
  */
-pgoff_t page_cache_next_hole(struct address_space *mapping,
+pgoff_t page_cache_next_gap(struct address_space *mapping,
 pgoff_t index, unsigned long max_scan)
 {
-   unsigned long i;
+   XA_STATE(xas, &mapping->pages, index);

-   for (i = 0; i < max_scan; i++) {
-   struct page *page;
-
-   page = radix_tree_lookup(&mapping->pages, index);
-   if (!page || xa_is_value(page))
+   while (max_scan--) {
+   void *entry = xas_next(&xas);
+   if (!entry || xa_is_value(entry))
break;
-   index++;
-   if (index == 0)
+   if (xas.xa_index == 0)
break;
}
 
-   return index;
+   return xas.xa_index;
 }
-EXPORT_SYMBOL(page_cache_next_hole);
+EXPORT_SYMBOL(page_cache_next_gap);
 
 /**
- * page_cache_prev_hole - find the prev hole (not-present entry)
- * @mapping: mapping
- * @index: index
- * @max_scan: maximum range to search
- *
- * Search backwards in the 
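
As a usage reminder (not taken from the patch), a caller looking for the first hole at or after @index, scanning at most 128 slots, would do something like:

    pgoff_t gap = page_cache_next_gap(mapping, index, 128);

    if (gap - index >= 128) {
        /* no hole within the scanned range */
    } else {
        /* 'gap' is the first index with no page present */
    }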

[PATCH v4 67/73] vmalloc: Convert to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

The radix tree of vmap blocks is simpler to express as an XArray.
Saves a couple of hundred bytes of text and eliminates a user of the
radix tree preload API.

Signed-off-by: Matthew Wilcox 
---
 mm/vmalloc.c | 39 +--
 1 file changed, 13 insertions(+), 26 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 673942094328..3a46efc27525 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -23,7 +23,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -821,12 +821,11 @@ struct vmap_block {
 static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
 
 /*
- * Radix tree of vmap blocks, indexed by address, to quickly find a vmap block
+ * XArray of vmap blocks, indexed by address, to quickly find a vmap block
  * in the free path. Could get rid of this if we change the API to return a
  * "cookie" from alloc, to be passed to free. But no big deal yet.
  */
-static DEFINE_SPINLOCK(vmap_block_tree_lock);
-static RADIX_TREE(vmap_block_tree, GFP_ATOMIC);
+static DEFINE_XARRAY(vmap_block_tree);
 
 /*
  * We should probably have a fallback mechanism to allocate virtual memory
@@ -865,8 +864,8 @@ static void *new_vmap_block(unsigned int order, gfp_t 
gfp_mask)
struct vmap_block *vb;
struct vmap_area *va;
unsigned long vb_idx;
-   int node, err;
-   void *vaddr;
+   int node;
+   void *ret, *vaddr;
 
node = numa_node_id();
 
@@ -883,13 +882,6 @@ static void *new_vmap_block(unsigned int order, gfp_t 
gfp_mask)
return ERR_CAST(va);
}
 
-   err = radix_tree_preload(gfp_mask);
-   if (unlikely(err)) {
-   kfree(vb);
-   free_vmap_area(va);
-   return ERR_PTR(err);
-   }
-
vaddr = vmap_block_vaddr(va->va_start, 0);
spin_lock_init(>lock);
vb->va = va;
@@ -902,11 +894,12 @@ static void *new_vmap_block(unsigned int order, gfp_t 
gfp_mask)
INIT_LIST_HEAD(>free_list);
 
vb_idx = addr_to_vb_idx(va->va_start);
-   spin_lock(&vmap_block_tree_lock);
-   err = radix_tree_insert(&vmap_block_tree, vb_idx, vb);
-   spin_unlock(&vmap_block_tree_lock);
-   BUG_ON(err);
-   radix_tree_preload_end();
+   ret = xa_store(&vmap_block_tree, vb_idx, vb, gfp_mask);
+   if (IS_ERR(ret)) {
+   kfree(vb);
+   free_vmap_area(va);
+   return ret;
+   }
 
vbq = _cpu_var(vmap_block_queue);
spin_lock(>lock);
@@ -923,9 +916,7 @@ static void free_vmap_block(struct vmap_block *vb)
unsigned long vb_idx;
 
vb_idx = addr_to_vb_idx(vb->va->va_start);
-   spin_lock(&vmap_block_tree_lock);
-   tmp = radix_tree_delete(&vmap_block_tree, vb_idx);
-   spin_unlock(&vmap_block_tree_lock);
+   tmp = xa_erase(&vmap_block_tree, vb_idx);
BUG_ON(tmp != vb);
 
free_vmap_area_noflush(vb->va);
@@ -1031,7 +1022,6 @@ static void *vb_alloc(unsigned long size, gfp_t gfp_mask)
 static void vb_free(const void *addr, unsigned long size)
 {
unsigned long offset;
-   unsigned long vb_idx;
unsigned int order;
struct vmap_block *vb;
 
@@ -1045,10 +1035,7 @@ static void vb_free(const void *addr, unsigned long size)
offset = (unsigned long)addr & (VMAP_BLOCK_SIZE - 1);
offset >>= PAGE_SHIFT;
 
-   vb_idx = addr_to_vb_idx((unsigned long)addr);
-   rcu_read_lock();
-   vb = radix_tree_lookup(_block_tree, vb_idx);
-   rcu_read_unlock();
+   vb = xa_load(_block_tree, addr_to_vb_idx((unsigned long)addr));
BUG_ON(!vb);
 
vunmap_page_range((unsigned long)addr, (unsigned long)addr + size);
-- 
2.15.0

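The interesting part is the error handling: xa_store() allocates its nodes internally and reports failure through an errno-encoded pointer, so the preload/insert/BUG_ON sequence collapses into a plain return-value check, as in new_vmap_block() above (restated here in isolation as a sketch of the pattern):

    ret = xa_store(&vmap_block_tree, vb_idx, vb, gfp_mask);
    if (IS_ERR(ret)) {          /* allocation failed inside xa_store() */
        kfree(vb);
        free_vmap_area(va);
        return ret;
    }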


[PATCH v4 61/73] dax: Convert dax_writeback_one to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

Likewise, an easy conversion.

Signed-off-by: Matthew Wilcox 
---
 fs/dax.c | 17 +++--
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 66f6c4ea18f7..7bd94f1b61d0 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -633,8 +633,7 @@ static int dax_writeback_one(struct block_device *bdev,
struct dax_device *dax_dev, struct address_space *mapping,
pgoff_t index, void *entry)
 {
-   struct radix_tree_root *pages = >pages;
-   XA_STATE(xas, pages, index);
+   XA_STATE(xas, >pages, index);
void *entry2, *kaddr;
long ret = 0, id;
sector_t sector;
@@ -649,7 +648,7 @@ static int dax_writeback_one(struct block_device *bdev,
if (WARN_ON(!xa_is_value(entry)))
return -EIO;
 
-   xa_lock_irq(&mapping->pages);
+   xas_lock_irq(&xas);
entry2 = get_unlocked_mapping_entry(&xas);
/* Entry got punched out / reallocated? */
if (!entry2 || WARN_ON_ONCE(!xa_is_value(entry2)))
@@ -668,7 +667,7 @@ static int dax_writeback_one(struct block_device *bdev,
}
 
/* Another fsync thread may have already written back this entry */
-   if (!radix_tree_tag_get(pages, index, PAGECACHE_TAG_TOWRITE))
+   if (!xas_get_tag(, PAGECACHE_TAG_TOWRITE))
goto put_unlocked;
/* Lock the entry to serialize with page faults */
entry = lock_slot();
@@ -679,8 +678,8 @@ static int dax_writeback_one(struct block_device *bdev,
 * at the entry only under xa_lock and once they do that they will
 * see the entry locked and wait for it to unlock.
 */
-   radix_tree_tag_clear(pages, index, PAGECACHE_TAG_TOWRITE);
-   xa_unlock_irq(>pages);
+   xas_clear_tag(, PAGECACHE_TAG_TOWRITE);
+   xas_unlock_irq();
 
/*
 * Even if dax_writeback_mapping_range() was given a wbc->range_start
@@ -718,9 +717,7 @@ static int dax_writeback_one(struct block_device *bdev,
 * the pfn mappings are writeprotected and fault waits for mapping
 * entry lock.
 */
-   xa_lock_irq(>pages);
-   radix_tree_tag_clear(pages, index, PAGECACHE_TAG_DIRTY);
-   xa_unlock_irq(>pages);
+   xa_clear_tag(>pages, index, PAGECACHE_TAG_DIRTY);
trace_dax_writeback_one(mapping->host, index, size >> PAGE_SHIFT);
  dax_unlock:
dax_read_unlock(id);
@@ -729,7 +726,7 @@ static int dax_writeback_one(struct block_device *bdev,
 
  put_unlocked:
put_unlocked_mapping_entry(, entry2);
-   xa_unlock_irq(>pages);
+   xas_unlock_irq();
return ret;
 }
 
-- 
2.15.0



[PATCH v4 68/73] brd: Convert to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

Convert brd_pages from a radix tree to an XArray.  Simpler and smaller
code; in particular another user of radix_tree_preload is eliminated.

Signed-off-by: Matthew Wilcox 
---
 drivers/block/brd.c | 87 ++---
 1 file changed, 23 insertions(+), 64 deletions(-)

diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 8028a3a7e7fd..4d8ae1b399e6 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -17,7 +17,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -29,9 +29,9 @@
 #define PAGE_SECTORS   (1 << PAGE_SECTORS_SHIFT)
 
 /*
- * Each block ramdisk device has a radix_tree brd_pages of pages that stores
- * the pages containing the block device's contents. A brd page's ->index is
- * its offset in PAGE_SIZE units. This is similar to, but in no way connected
+ * Each block ramdisk device has an xarray brd_pages that stores the pages
+ * containing the block device's contents. A brd page's ->index is its
+ * offset in PAGE_SIZE units. This is similar to, but in no way connected
  * with, the kernel's pagecache or buffer cache (which sit above our block
  * device).
  */
@@ -41,13 +41,7 @@ struct brd_device {
struct request_queue*brd_queue;
struct gendisk  *brd_disk;
struct list_headbrd_list;
-
-   /*
-* Backing store of pages and lock to protect it. This is the contents
-* of the block device.
-*/
-   spinlock_t  brd_lock;
-   struct radix_tree_root  brd_pages;
+   struct xarray   brd_pages;
 };
 
 /*
@@ -62,17 +56,9 @@ static struct page *brd_lookup_page(struct brd_device *brd, 
sector_t sector)
 * The page lifetime is protected by the fact that we have opened the
 * device node -- brd pages will never be deleted under us, so we
 * don't need any further locking or refcounting.
-*
-* This is strictly true for the radix-tree nodes as well (ie. we
-* don't actually need the rcu_read_lock()), however that is not a
-* documented feature of the radix-tree API so it is better to be
-* safe here (we don't have total exclusion from radix tree updates
-* here, only deletes).
 */
-   rcu_read_lock();
idx = sector >> PAGE_SECTORS_SHIFT; /* sector to page index */
-   page = radix_tree_lookup(&brd->brd_pages, idx);
-   rcu_read_unlock();
+   page = xa_load(&brd->brd_pages, idx);
 
BUG_ON(page && page->index != idx);
 
@@ -87,7 +73,7 @@ static struct page *brd_lookup_page(struct brd_device *brd, 
sector_t sector)
 static struct page *brd_insert_page(struct brd_device *brd, sector_t sector)
 {
pgoff_t idx;
-   struct page *page;
+   struct page *curr, *page;
gfp_t gfp_flags;
 
page = brd_lookup_page(brd, sector);
@@ -108,62 +94,36 @@ static struct page *brd_insert_page(struct brd_device 
*brd, sector_t sector)
if (!page)
return NULL;
 
-   if (radix_tree_preload(GFP_NOIO)) {
-   __free_page(page);
-   return NULL;
-   }
-
-   spin_lock(>brd_lock);
idx = sector >> PAGE_SECTORS_SHIFT;
page->index = idx;
-   if (radix_tree_insert(&brd->brd_pages, idx, page)) {
+   curr = xa_cmpxchg(&brd->brd_pages, idx, NULL, page, GFP_NOIO);
+   if (curr) {
__free_page(page);
-   page = radix_tree_lookup(>brd_pages, idx);
+   page = curr;
BUG_ON(!page);
BUG_ON(page->index != idx);
}
-   spin_unlock(>brd_lock);
-
-   radix_tree_preload_end();
 
return page;
 }
 
 /*
- * Free all backing store pages and radix tree. This must only be called when
+ * Free all backing store pages and xarray.  This must only be called when
  * there are no other users of the device.
  */
-#define FREE_BATCH 16
 static void brd_free_pages(struct brd_device *brd)
 {
-   unsigned long pos = 0;
-   struct page *pages[FREE_BATCH];
-   int nr_pages;
-
-   do {
-   int i;
-
-   nr_pages = radix_tree_gang_lookup(>brd_pages,
-   (void **)pages, pos, FREE_BATCH);
-
-   for (i = 0; i < nr_pages; i++) {
-   void *ret;
-
-   BUG_ON(pages[i]->index < pos);
-   pos = pages[i]->index;
-   ret = radix_tree_delete(>brd_pages, pos);
-   BUG_ON(!ret || ret != pages[i]);
-   __free_page(pages[i]);
-   }
-
-   pos++;
-
-   /*
-* This assumes radix_tree_gang_lookup always returns as
-* many pages as possible. If the radix-tree code changes,
-* so will this have to.
-*/
-   } while (nr_pages == 
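
The brd_free_pages() hunk is cut off above; for context, an XArray teardown no longer needs the batched gang lookup at all.  One plausible shape, using only the iteration primitives seen elsewhere in this series (a sketch, not the patch's actual code):

    XA_STATE(xas, &brd->brd_pages, 0);
    struct page *page;

    xas_lock(&xas);
    xas_for_each(&xas, page, ULONG_MAX) {
        xas_store(&xas, NULL);          /* erase the slot in place */
        __free_page(page);
    }
    xas_unlock(&xas);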

[PATCH v4 54/73] nilfs2: Convert to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

I'm not 100% convinced that the rewrite of nilfs_copy_back_pages is
correct, but it will at least have different bugs from the current
version.

Signed-off-by: Matthew Wilcox 
---
 fs/nilfs2/btnode.c | 37 +++-
 fs/nilfs2/page.c   | 72 +++---
 2 files changed, 56 insertions(+), 53 deletions(-)

diff --git a/fs/nilfs2/btnode.c b/fs/nilfs2/btnode.c
index 9e2a00207436..92cf58e244f9 100644
--- a/fs/nilfs2/btnode.c
+++ b/fs/nilfs2/btnode.c
@@ -177,42 +177,36 @@ int nilfs_btnode_prepare_change_key(struct address_space 
*btnc,
ctxt->newbh = NULL;
 
if (inode->i_blkbits == PAGE_SHIFT) {
-   lock_page(obh->b_page);
-   /*
-* We cannot call radix_tree_preload for the kernels older
-* than 2.6.23, because it is not exported for modules.
-*/
+   void *entry;
+   struct page *opage = obh->b_page;
+   lock_page(opage);
 retry:
-   err = radix_tree_preload(GFP_NOFS & ~__GFP_HIGHMEM);
-   if (err)
-   goto failed_unlock;
/* BUG_ON(oldkey != obh->b_page->index); */
-   if (unlikely(oldkey != obh->b_page->index))
-   NILFS_PAGE_BUG(obh->b_page,
+   if (unlikely(oldkey != opage->index))
+   NILFS_PAGE_BUG(opage,
   "invalid oldkey %lld (newkey=%lld)",
   (unsigned long long)oldkey,
   (unsigned long long)newkey);
 
-   xa_lock_irq(>pages);
-   err = radix_tree_insert(>pages, newkey, obh->b_page);
-   xa_unlock_irq(>pages);
+   entry = xa_cmpxchg(>pages, newkey, NULL, opage, GFP_NOFS);
/*
 * Note: page->index will not change to newkey until
 * nilfs_btnode_commit_change_key() will be called.
 * To protect the page in intermediate state, the page lock
 * is held.
 */
-   radix_tree_preload_end();
-   if (!err)
+   if (!entry)
return 0;
-   else if (err != -EEXIST)
+   if (IS_ERR(entry)) {
+   err = PTR_ERR(entry);
goto failed_unlock;
+   }
 
err = invalidate_inode_pages2_range(btnc, newkey, newkey);
if (!err)
goto retry;
/* fallback to copy mode */
-   unlock_page(obh->b_page);
+   unlock_page(opage);
}
 
nbh = nilfs_btnode_create_block(btnc, newkey);
@@ -252,9 +246,8 @@ void nilfs_btnode_commit_change_key(struct address_space 
*btnc,
mark_buffer_dirty(obh);
 
xa_lock_irq(>pages);
-   radix_tree_delete(>pages, oldkey);
-   radix_tree_tag_set(>pages, newkey,
-  PAGECACHE_TAG_DIRTY);
+   __xa_erase(>pages, oldkey);
+   __xa_set_tag(>pages, newkey, PAGECACHE_TAG_DIRTY);
xa_unlock_irq(>pages);
 
opage->index = obh->b_blocknr = newkey;
@@ -283,9 +276,7 @@ void nilfs_btnode_abort_change_key(struct address_space 
*btnc,
return;
 
if (nbh == NULL) {  /* blocksize == pagesize */
-   xa_lock_irq(>pages);
-   radix_tree_delete(>pages, newkey);
-   xa_unlock_irq(>pages);
+   xa_erase(>pages, newkey);
unlock_page(ctxt->bh->b_page);
} else
brelse(nbh);
diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c
index 1c6703efde9e..31d20f624971 100644
--- a/fs/nilfs2/page.c
+++ b/fs/nilfs2/page.c
@@ -304,10 +304,10 @@ int nilfs_copy_dirty_pages(struct address_space *dmap,
 void nilfs_copy_back_pages(struct address_space *dmap,
   struct address_space *smap)
 {
+   XA_STATE(xas, &dmap->pages, 0);
struct pagevec pvec;
unsigned int i, n;
pgoff_t index = 0;
-   int err;
 
pagevec_init();
 repeat:
@@ -317,43 +317,56 @@ void nilfs_copy_back_pages(struct address_space *dmap,
 
for (i = 0; i < pagevec_count(); i++) {
struct page *page = pvec.pages[i], *dpage;
-   pgoff_t offset = page->index;
+   xas_set(&xas, page->index);

lock_page(page);
-   dpage = find_lock_page(dmap, offset);
+   do {
+   xas_lock_irq(&xas);
+   dpage = xas_create(&xas);
+   if (!xas_error(&xas))
+   break;
+   xas_unlock_irq(&xas);
+   if (!xas_nomem(&xas, GFP_NOFS)) {
+   

[PATCH v4 59/73] dax: More XArray conversion

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

This time, we want to convert get_unlocked_mapping_entry() to use the
XArray.  That has a ripple effect, causing us to change the waitqueues
to hash on the address of the xarray rather than the address of the
mapping (functionally equivalent), and create a lot of on-the-stack
xa_state which are only used as a container for passing the xarray and
the index down to deeper function calls.

Also rename dax_wake_mapping_entry_waiter() to dax_wake_entry().

Signed-off-by: Matthew Wilcox 
---
 fs/dax.c | 80 +---
 1 file changed, 36 insertions(+), 44 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index d2007a17d257..ad984dece12e 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -120,7 +120,7 @@ static int dax_is_empty_entry(void *entry)
  * DAX radix tree locking
  */
 struct exceptional_entry_key {
-   struct address_space *mapping;
+   struct xarray *xa;
pgoff_t entry_start;
 };
 
@@ -129,9 +129,10 @@ struct wait_exceptional_entry_queue {
struct exceptional_entry_key key;
 };
 
-static wait_queue_head_t *dax_entry_waitqueue(struct address_space *mapping,
-   pgoff_t index, void *entry, struct exceptional_entry_key *key)
+static wait_queue_head_t *dax_entry_waitqueue(struct xa_state *xas,
+   void *entry, struct exceptional_entry_key *key)
 {
+   unsigned long index = xas->xa_index;
unsigned long hash;
 
/*
@@ -142,10 +143,10 @@ static wait_queue_head_t *dax_entry_waitqueue(struct 
address_space *mapping,
if (dax_is_pmd_entry(entry))
index &= ~PG_PMD_COLOUR;
 
-   key->mapping = mapping;
+   key->xa = xas->xa;
key->entry_start = index;
 
-   hash = hash_long((unsigned long)mapping ^ index, DAX_WAIT_TABLE_BITS);
+   hash = hash_long((unsigned long)xas->xa ^ index, DAX_WAIT_TABLE_BITS);
return wait_table + hash;
 }
 
@@ -156,26 +157,23 @@ static int wake_exceptional_entry_func(wait_queue_entry_t 
*wait, unsigned int mo
struct wait_exceptional_entry_queue *ewait =
container_of(wait, struct wait_exceptional_entry_queue, wait);
 
-   if (key->mapping != ewait->key.mapping ||
+   if (key->xa != ewait->key.xa ||
key->entry_start != ewait->key.entry_start)
return 0;
return autoremove_wake_function(wait, mode, sync, NULL);
 }
 
 /*
- * We do not necessarily hold the mapping xa_lock when we call this
- * function so it is possible that 'entry' is no longer a valid item in the
- * radix tree.  This is okay because all we really need to do is to find the
- * correct waitqueue where tasks might be waiting for that old 'entry' and
- * wake them.
+ * @entry may no longer be the entry at the index in the array.  The
+ * important information it's conveying is whether the entry at this
+ * index *used* to be a PMD entry.
  */
-static void dax_wake_mapping_entry_waiter(struct address_space *mapping,
-   pgoff_t index, void *entry, bool wake_all)
+static void dax_wake_entry(struct xa_state *xas, void *entry, bool wake_all)
 {
struct exceptional_entry_key key;
wait_queue_head_t *wq;
 
-   wq = dax_entry_waitqueue(mapping, index, entry, &key);
+   wq = dax_entry_waitqueue(xas, entry, &key);
 
/*
 * Checking for locked entry and prepare_to_wait_exclusive() happens
@@ -207,10 +205,9 @@ static inline void *lock_slot(struct xa_state *xas)
  *
  * The function must be called with mapping xa_lock held.
  */
-static void *get_unlocked_mapping_entry(struct address_space *mapping,
-   pgoff_t index, void ***slotp)
+static void *get_unlocked_mapping_entry(struct xa_state *xas)
 {
-   void *entry, **slot;
+   void *entry;
struct wait_exceptional_entry_queue ewait;
wait_queue_head_t *wq;
 
@@ -218,22 +215,19 @@ static void *get_unlocked_mapping_entry(struct 
address_space *mapping,
ewait.wait.func = wake_exceptional_entry_func;
 
for (;;) {
-   entry = __radix_tree_lookup(>pages, index, NULL,
- );
-   if (!entry ||
-   WARN_ON_ONCE(!xa_is_value(entry)) || !dax_locked(entry)) {
-   if (slotp)
-   *slotp = slot;
+   entry = xas_load(xas);
+   if (!entry || WARN_ON_ONCE(!xa_is_value(entry)) ||
+   !dax_locked(entry))
return entry;
-   }
 
-   wq = dax_entry_waitqueue(mapping, index, entry, &key);
+   wq = dax_entry_waitqueue(xas, entry, &key);
prepare_to_wait_exclusive(wq, &ewait.wait,
  TASK_UNINTERRUPTIBLE);
-   xa_unlock_irq(&mapping->pages);
+   xas_pause(xas);
+   xas_unlock_irq(xas);
schedule();

[PATCH v4 64/73] dax: Convert grab_mapping_entry to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

Signed-off-by: Matthew Wilcox 
---
 fs/dax.c | 98 +---
 1 file changed, 26 insertions(+), 72 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index de85ce87d333..c663d82e8ba3 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -44,6 +44,7 @@
 
 /* The 'colour' (ie low bits) within a PMD of a page offset.  */
 #define PG_PMD_COLOUR  ((PMD_SIZE >> PAGE_SHIFT) - 1)
+#define PMD_ORDER  (PMD_SHIFT - PAGE_SHIFT)
 
 static wait_queue_head_t wait_table[DAX_WAIT_TABLE_ENTRIES];
 
@@ -89,10 +90,10 @@ static void *dax_radix_locked_entry(sector_t sector, 
unsigned long flags)
DAX_ENTRY_LOCK);
 }
 
-static unsigned int dax_radix_order(void *entry)
+static unsigned int dax_entry_order(void *entry)
 {
if (xa_to_value(entry) & DAX_PMD)
-   return PMD_SHIFT - PAGE_SHIFT;
+   return PMD_ORDER;
return 0;
 }
 
@@ -299,10 +300,11 @@ static void *grab_mapping_entry(struct address_space 
*mapping, pgoff_t index,
 {
XA_STATE(xas, &mapping->pages, index);
bool pmd_downgrade = false; /* splitting 2MiB entry into 4k entries? */
-   void *entry, **slot;
+   void *entry;
 
+   xas_set_order(&xas, index, size_flag ? PMD_ORDER : 0);
 restart:
-   xa_lock_irq(&mapping->pages);
+   xas_lock_irq(&xas);
entry = get_unlocked_mapping_entry(&xas);
 
if (WARN_ON_ONCE(entry && !xa_is_value(entry))) {
@@ -326,84 +328,36 @@ static void *grab_mapping_entry(struct address_space 
*mapping, pgoff_t index,
}
}
 
-   /* No entry for given index? Make sure radix tree is big enough. */
-   if (!entry || pmd_downgrade) {
-   int err;
-
-   if (pmd_downgrade) {
-   /*
-* Make sure 'entry' remains valid while we drop
-* mapping xa_lock.
-*/
-   entry = lock_slot(&xas);
-   }
-
-   xa_unlock_irq(&mapping->pages);
+   if (pmd_downgrade) {
+   entry = lock_slot(&xas);
/*
 * Besides huge zero pages the only other thing that gets
 * downgraded are empty entries which don't need to be
 * unmapped.
 */
-   if (pmd_downgrade && dax_is_zero_entry(entry))
+   if (dax_is_zero_entry(entry)) {
+   xas_pause(&xas);
+   xas_unlock_irq(&xas);
unmap_mapping_range(mapping,
(index << PAGE_SHIFT) & PMD_MASK, PMD_SIZE, 0);
-
-   err = radix_tree_preload(
-   mapping_gfp_mask(mapping) & ~__GFP_HIGHMEM);
-   if (err) {
-   if (pmd_downgrade)
-   put_locked_mapping_entry(mapping, index);
-   return ERR_PTR(err);
+   xas_lock_irq(&xas);
}
-   xa_lock_irq(&mapping->pages);
-
-   if (!entry) {
-   /*
-* We needed to drop the pages lock while calling
-* radix_tree_preload() and we didn't have an entry to
-* lock.  See if another thread inserted an entry at
-* our index during this time.
-*/
-   entry = __radix_tree_lookup(&mapping->pages, index,
-   NULL, &slot);
-   if (entry) {
-   radix_tree_preload_end();
-   xa_unlock_irq(&mapping->pages);
-   goto restart;
-   }
-   }
-
-   if (pmd_downgrade) {
-   radix_tree_delete(&mapping->pages, index);
-   mapping->nrexceptional--;
-   dax_wake_entry(&xas, entry, true);
-   }
-
+   xas_store(&xas, NULL);
+   mapping->nrexceptional--;
+   dax_wake_entry(&xas, entry, true);
+   }
+   if (!entry || pmd_downgrade) {
entry = dax_radix_locked_entry(0, size_flag | DAX_EMPTY);
-
-   err = __radix_tree_insert(&mapping->pages, index,
-   dax_radix_order(entry), entry);
-   radix_tree_preload_end();
-   if (err) {
-   xa_unlock_irq(&mapping->pages);
-   /*
-* Our insertion of a DAX entry failed, most likely
-* because we were inserting a PMD entry and it
-* collided with a PTE sized entry at a different
-* index in the PMD range.  We haven't inserted
-* anything into the radix tree and have no waiters to
-* wake.
-*/
-   

[PATCH v4 70/73] xfs: Convert pag_ici_root to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

Rename pag_ici_root to pag_ici_xa and use XArray APIs instead of radix
tree APIs.  Shorter code, typechecking on tag numbers, better error
checking in xfs_reclaim_inode(), and eliminates a call to
radix_tree_preload().
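
(Not part of the patch — a hedged sketch, with illustrative function names,
gfp flags and lock argument, contrasting the insertion pattern this change
removes with the one it adds: the radix tree wants memory preloaded outside
the spinlock, while xa_store() allocates as needed under its own xa_lock.)

	#include <linux/radix-tree.h>
	#include <linux/xarray.h>
	#include <linux/spinlock.h>
	#include <linux/gfp.h>
	#include <linux/err.h>

	/* old style: preload, lock, insert, unlock, end preload */
	static int demo_radix_insert(struct radix_tree_root *root,
				     unsigned long index, void *item,
				     spinlock_t *lock)
	{
		int error = radix_tree_preload(GFP_NOFS);

		if (error)
			return error;
		spin_lock(lock);
		error = radix_tree_insert(root, index, item);
		spin_unlock(lock);
		radix_tree_preload_end();
		return error;
	}

	/* XArray style: xa_store() takes the xa_lock and allocates itself */
	static int demo_xa_insert(struct xarray *xa, unsigned long index,
				  void *item)
	{
		void *curr = xa_store(xa, index, item, GFP_NOFS);

		return IS_ERR(curr) ? PTR_ERR(curr) : 0;
	}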

Signed-off-by: Matthew Wilcox 
---
 fs/xfs/libxfs/xfs_sb.c |   2 +-
 fs/xfs/libxfs/xfs_sb.h |   2 +-
 fs/xfs/xfs_icache.c| 107 +++--
 fs/xfs/xfs_icache.h|   4 +-
 fs/xfs/xfs_inode.c |  24 ---
 fs/xfs/xfs_mount.c |   3 +-
 fs/xfs/xfs_mount.h |   3 +-
 7 files changed, 54 insertions(+), 91 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 3b0b65eb8224..8fb7c216c761 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -76,7 +76,7 @@ struct xfs_perag *
 xfs_perag_get_tag(
struct xfs_mount*mp,
xfs_agnumber_t  first,
-   int tag)
+   xa_tag_t tag)
 {
XA_STATE(xas, &mp->m_perag_xa, first);
struct xfs_perag*pag;
diff --git a/fs/xfs/libxfs/xfs_sb.h b/fs/xfs/libxfs/xfs_sb.h
index 961e6475a309..d2de90b8f39c 100644
--- a/fs/xfs/libxfs/xfs_sb.h
+++ b/fs/xfs/libxfs/xfs_sb.h
@@ -23,7 +23,7 @@
  */
 extern struct xfs_perag *xfs_perag_get(struct xfs_mount *, xfs_agnumber_t);
 extern struct xfs_perag *xfs_perag_get_tag(struct xfs_mount *, xfs_agnumber_t,
-  int tag);
+  xa_tag_t tag);
 extern voidxfs_perag_put(struct xfs_perag *pag);
 extern int xfs_initialize_perag_data(struct xfs_mount *, xfs_agnumber_t);
 
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index f56e500d89e2..edd44e190f3e 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -186,7 +186,7 @@ xfs_perag_set_reclaim_tag(
 {
struct xfs_mount*mp = pag->pag_mount;
 
-   lockdep_assert_held(&pag->pag_ici_lock);
+   lockdep_assert_held(&pag->pag_ici_xa.xa_lock);
if (pag->pag_ici_reclaimable++)
return;
 
@@ -205,7 +205,7 @@ xfs_perag_clear_reclaim_tag(
 {
struct xfs_mount*mp = pag->pag_mount;
 
-   lockdep_assert_held(&pag->pag_ici_lock);
+   lockdep_assert_held(&pag->pag_ici_xa.xa_lock);
if (--pag->pag_ici_reclaimable)
return;
 
@@ -228,16 +228,16 @@ xfs_inode_set_reclaim_tag(
struct xfs_perag*pag;
 
pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
-   spin_lock(&pag->pag_ici_lock);
+   xa_lock(&pag->pag_ici_xa);
spin_lock(&ip->i_flags_lock);
 
-   radix_tree_tag_set(&pag->pag_ici_root, XFS_INO_TO_AGINO(mp, ip->i_ino),
+   __xa_set_tag(&pag->pag_ici_xa, XFS_INO_TO_AGINO(mp, ip->i_ino),
   XFS_ICI_RECLAIM_TAG);
xfs_perag_set_reclaim_tag(pag);
__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
 
spin_unlock(&ip->i_flags_lock);
-   spin_unlock(&pag->pag_ici_lock);
+   xa_unlock(&pag->pag_ici_xa);
xfs_perag_put(pag);
 }
 
@@ -246,7 +246,7 @@ xfs_inode_clear_reclaim_tag(
struct xfs_perag*pag,
xfs_ino_t   ino)
 {
-   radix_tree_tag_clear(&pag->pag_ici_root,
+   __xa_clear_tag(&pag->pag_ici_xa,
 XFS_INO_TO_AGINO(pag->pag_mount, ino),
 XFS_ICI_RECLAIM_TAG);
xfs_perag_clear_reclaim_tag(pag);
@@ -367,8 +367,8 @@ xfs_iget_cache_hit(
/*
 * We need to set XFS_IRECLAIM to prevent xfs_reclaim_inode
 * from stomping over us while we recycle the inode.  We can't
-* clear the radix tree reclaimable tag yet as it requires
-* pag_ici_lock to be held exclusive.
+* clear the xarray reclaimable tag yet as it requires
+* pag_ici_xa.xa_lock to be held exclusive.
 */
ip->i_flags |= XFS_IRECLAIM;
 
@@ -393,7 +393,7 @@ xfs_iget_cache_hit(
goto out_error;
}
 
-   spin_lock(&pag->pag_ici_lock);
+   xa_lock(&pag->pag_ici_xa);
spin_lock(&ip->i_flags_lock);
 
/*
@@ -410,7 +410,7 @@ xfs_iget_cache_hit(
init_rwsem(&inode->i_rwsem);
 
spin_unlock(&ip->i_flags_lock);
-   spin_unlock(&pag->pag_ici_lock);
+   xa_unlock(&pag->pag_ici_xa);
} else {
/* If the VFS inode is being torn down, pause and try again. */
if (!igrab(inode)) {
@@ -451,7 +451,7 @@ xfs_iget_cache_miss(
int flags,
int lock_flags)
 {
-   struct xfs_inode*ip;
+   struct xfs_inode*ip, *curr;
int error;
xfs_agino_t agino = XFS_INO_TO_AGINO(mp, ino);
int iflags;
@@ -471,17 +471,6 @@ xfs_iget_cache_miss(
goto out_destroy;
}
 
-   

[PATCH v4 60/73] dax: Convert __dax_invalidate_mapping_entry to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

Simple now that we already have an xa_state!

Signed-off-by: Matthew Wilcox 
---
 fs/dax.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index ad984dece12e..66f6c4ea18f7 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -413,24 +413,24 @@ static int __dax_invalidate_mapping_entry(struct 
address_space *mapping,
XA_STATE(xas, &mapping->pages, index);
int ret = 0;
void *entry;
-   struct radix_tree_root *pages = &mapping->pages;
 
xa_lock_irq(&mapping->pages);
entry = get_unlocked_mapping_entry(&xas);
if (!entry || WARN_ON_ONCE(!xa_is_value(entry)))
goto out;
if (!trunc &&
-   (radix_tree_tag_get(pages, index, PAGECACHE_TAG_DIRTY) ||
-radix_tree_tag_get(pages, index, PAGECACHE_TAG_TOWRITE)))
+   (xas_get_tag(&xas, PAGECACHE_TAG_DIRTY) ||
+xas_get_tag(&xas, PAGECACHE_TAG_TOWRITE)))
goto out;
-   radix_tree_delete(pages, index);
+   xas_store(&xas, NULL);
mapping->nrexceptional--;
ret = 1;
 out:
put_unlocked_mapping_entry(&xas, entry);
-   xa_unlock_irq(&mapping->pages);
+   xas_unlock_irq(&xas);
return ret;
 }
+
 /*
  * Delete DAX data value entry at @index from @mapping. Wait for radix tree
  * entry to get unlocked before deleting it.
-- 
2.15.0

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 62/73] dax: Convert dax_insert_pfn_mkwrite to XArray

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

Signed-off-by: Matthew Wilcox 
---
 fs/dax.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 7bd94f1b61d0..619aff70583f 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1498,21 +1498,21 @@ static int dax_insert_pfn_mkwrite(struct vm_fault *vmf,
void *entry;
int vmf_ret, error;
 
-   xa_lock_irq(&mapping->pages);
+   xas_lock_irq(&xas);
entry = get_unlocked_mapping_entry(&xas);
/* Did we race with someone splitting entry or so? */
if (!entry ||
(pe_size == PE_SIZE_PTE && !dax_is_pte_entry(entry)) ||
(pe_size == PE_SIZE_PMD && !dax_is_pmd_entry(entry))) {
put_unlocked_mapping_entry(&xas, entry);
-   xa_unlock_irq(&mapping->pages);
+   xas_unlock_irq(&xas);
trace_dax_insert_pfn_mkwrite_no_entry(mapping->host, vmf,
  VM_FAULT_NOPAGE);
return VM_FAULT_NOPAGE;
}
-   radix_tree_tag_set(&mapping->pages, index, PAGECACHE_TAG_DIRTY);
+   xas_set_tag(&xas, PAGECACHE_TAG_DIRTY);
entry = lock_slot(&xas);
-   xa_unlock_irq(&mapping->pages);
+   xas_unlock_irq(&xas);
switch (pe_size) {
case PE_SIZE_PTE:
error = vm_insert_mixed_mkwrite(vmf->vma, vmf->address, pfn);
-- 
2.15.0

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 12/73] xarray: Add xa_cmpxchg

2017-12-05 Thread Matthew Wilcox
From: Matthew Wilcox 

This works like doing cmpxchg() on an array entry.  Code which wants
the radix_tree_insert() semantic of not overwriting an existing entry
can cmpxchg() with NULL and get the action it wants.  Plus, instead of
having an error returned, they get the value currently stored in the
array, which often saves them a subsequent lookup.
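
(Not part of the patch — a hedged usage sketch of the semantic described
above; the xarray, index and item are illustrative, and the error handling
follows the "old value or ERR_PTR()" return convention documented below.)

	#include <linux/xarray.h>
	#include <linux/gfp.h>
	#include <linux/err.h>
	#include <linux/errno.h>

	/* radix_tree_insert()-like "store only if the slot is empty" */
	static int demo_insert_if_empty(struct xarray *xa, unsigned long index,
					void *item)
	{
		void *curr = xa_cmpxchg(xa, index, NULL, item, GFP_KERNEL);

		if (!curr)
			return 0;		/* slot was empty, item stored */
		if (IS_ERR(curr))
			return PTR_ERR(curr);	/* e.g. allocation failure */
		return -EEXIST;			/* occupied; curr is the existing
						 * entry, so no second lookup */
	}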

Signed-off-by: Matthew Wilcox 
---
 include/linux/xarray.h |  2 ++
 lib/xarray.c   | 37 +
 2 files changed, 39 insertions(+)

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index 6f1f55d9fc94..a570d7d9a252 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -72,6 +72,8 @@ static inline void xa_init(struct xarray *xa)
 
 void *xa_load(struct xarray *, unsigned long index);
 void *xa_store(struct xarray *, unsigned long index, void *entry, gfp_t);
+void *xa_cmpxchg(struct xarray *, unsigned long index,
+   void *old, void *entry, gfp_t);
 
 /**
  * xa_erase() - Erase this entry from the XArray.
diff --git a/lib/xarray.c b/lib/xarray.c
index fbbb02c25b6d..6625b1763123 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -852,6 +852,43 @@ void *xa_store(struct xarray *xa, unsigned long index, 
void *entry, gfp_t gfp)
 }
 EXPORT_SYMBOL(xa_store);
 
+/**
+ * xa_cmpxchg() - Conditionally replace an entry in the XArray.
+ * @xa: XArray.
+ * @index: Index into array.
+ * @old: Old value to test against.
+ * @entry: New value to place in array.
+ * @gfp: Allocation flags.
+ *
+ * If the entry at @index is the same as @old, replace it with @entry.
+ * If the return value is equal to @old, then the exchange was successful.
+ *
+ * Return: The old value at this index or ERR_PTR() if an error happened.
+ */
+void *xa_cmpxchg(struct xarray *xa, unsigned long index,
+   void *old, void *entry, gfp_t gfp)
+{
+   XA_STATE(xas, xa, index);
+   unsigned long flags;
+   void *curr;
+
+   if (WARN_ON_ONCE(xa_is_internal(entry)))
+   return ERR_PTR(-EINVAL);
+
+   do {
+   xa_lock_irqsave(xa, flags);
+   curr = xas_create(&xas);
+   if (curr == old)
+   xas_store(&xas, entry);
+   xa_unlock_irqrestore(xa, flags);
+   } while (xas_nomem(&xas, gfp));
+
+   if (xas_error(&xas))
+   curr = ERR_PTR(xas_error(&xas));
+   return curr;
+}
+EXPORT_SYMBOL(xa_cmpxchg);
+
 /**
  * __xa_set_tag() - Set this tag on this entry while locked.
  * @xa: XArray.
-- 
2.15.0

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] usb: xhci: allow imod-interval to be configurable

2017-12-05 Thread Adam Wallis
Rob,

On 12/5/2017 3:48 AM, Mathias Nyman wrote:
> On 05.12.2017 04:54, Adam Wallis wrote:
>> On 12/4/2017 9:15 PM, Chunfeng Yun wrote:
>>> On Mon, 2017-12-04 at 09:27 -0500, Adam Wallis wrote:
 The xHCI driver currently has the IMOD set to 160, which
[..]
> 
> If Rob Acks this version I'll apply it and remove that blank line.
> 
> -Mathias
> 

Let me know if you have any other issues with this patch, otherwise with your
ACK, we are done.

Thanks

Adam

> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


-- 
Adam Wallis
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] USB: uas and storage: Add US_FL_BROKEN_FUA for another JMicron JMS567 ID

2017-12-05 Thread David Kozub
There is another JMS567-based USB3 UAS enclosure (152d:0578) that fails
with the following error:

[sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[sda] tag#0 Sense Key : Illegal Request [current]
[sda] tag#0 Add. Sense: Invalid field in cdb

The issue occurs both with UAS (occasionally) and mass storage
(immediately after mounting a FS on a disk in the enclosure).

Enabling US_FL_BROKEN_FUA quirk solves this issue.

This patch adds an UNUSUAL_DEV with US_FL_BROKEN_FUA for the enclosure
for both UAS and mass storage.

Signed-off-by: David Kozub 
---
 drivers/usb/storage/unusual_devs.h | 7 +++
 drivers/usb/storage/unusual_uas.h  | 7 +++
 2 files changed, 14 insertions(+)

diff --git a/drivers/usb/storage/unusual_devs.h 
b/drivers/usb/storage/unusual_devs.h
index 2968046e7c05..f72d045ee9ef 100644
--- a/drivers/usb/storage/unusual_devs.h
+++ b/drivers/usb/storage/unusual_devs.h
@@ -2100,6 +2100,13 @@ UNUSUAL_DEV(  0x152d, 0x0567, 0x0114, 0x0116,
USB_SC_DEVICE, USB_PR_DEVICE, NULL,
US_FL_BROKEN_FUA ),
 
+/* Reported by David Kozub  */
+UNUSUAL_DEV(0x152d, 0x0578, 0x0000, 0x9999,
+   "JMicron",
+   "JMS567",
+   USB_SC_DEVICE, USB_PR_DEVICE, NULL,
+   US_FL_BROKEN_FUA),
+
 /*
  * Reported by Alexandre Oliva 
  * JMicron responds to USN and several other SCSI ioctls with a
diff --git a/drivers/usb/storage/unusual_uas.h 
b/drivers/usb/storage/unusual_uas.h
index d520374a824e..e6127fb21c12 100644
--- a/drivers/usb/storage/unusual_uas.h
+++ b/drivers/usb/storage/unusual_uas.h
UNUSUAL_DEV(0x152d, 0x0567, 0x0000, 0x9999,
USB_SC_DEVICE, USB_PR_DEVICE, NULL,
US_FL_BROKEN_FUA | US_FL_NO_REPORT_OPCODES),
 
+/* Reported-by: David Kozub  */
+UNUSUAL_DEV(0x152d, 0x0578, 0x0000, 0x9999,
+   "JMicron",
+   "JMS567",
+   USB_SC_DEVICE, USB_PR_DEVICE, NULL,
+   US_FL_BROKEN_FUA),
+
 /* Reported-by: Hans de Goede  */
UNUSUAL_DEV(0x2109, 0x0711, 0x0000, 0x9999,
"VIA",
-- 
2.15.0

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] usbip: Use common error handling code in stub_recv_cmd_submit()

2017-12-05 Thread Shuah Khan
On 12/05/2017 02:52 PM, SF Markus Elfring wrote:
> From: Markus Elfring 
> Date: Tue, 5 Dec 2017 22:40:30 +0100
> 
> Add a jump target so that a bit of exception handling can be better reused
> at the end of this function.
> 
> Signed-off-by: Markus Elfring 
> ---
>  drivers/usb/usbip/stub_rx.c | 22 ++
>  1 file changed, 10 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/usb/usbip/stub_rx.c b/drivers/usb/usbip/stub_rx.c
> index a70eb81823a3..f46d71abd2f9 100644
> --- a/drivers/usb/usbip/stub_rx.c
> +++ b/drivers/usb/usbip/stub_rx.c
> @@ -445,29 +445,23 @@ static void stub_recv_cmd_submit(struct stub_device 
> *sdev,
>   else
>   priv->urb = usb_alloc_urb(0, GFP_KERNEL);
>  
> - if (!priv->urb) {
> - usbip_event_add(ud, SDEV_EVENT_ERROR_MALLOC);
> - return;
> - }
> + if (!priv->urb)
> + goto add_event_malloc_failure;
>  
>   /* allocate urb transfer buffer, if needed */
>   if (pdu->u.cmd_submit.transfer_buffer_length > 0) {
>   priv->urb->transfer_buffer =
>   kzalloc(pdu->u.cmd_submit.transfer_buffer_length,
>   GFP_KERNEL);
> - if (!priv->urb->transfer_buffer) {
> - usbip_event_add(ud, SDEV_EVENT_ERROR_MALLOC);
> - return;
> - }
> + if (!priv->urb->transfer_buffer)
> + goto add_event_malloc_failure;
>   }
>  
>   /* copy urb setup packet */
>   priv->urb->setup_packet = kmemdup(&pdu->u.cmd_submit.setup, 8,
> GFP_KERNEL);
> - if (!priv->urb->setup_packet) {
> - usbip_event_add(ud, SDEV_EVENT_ERROR_MALLOC);

Please see my comments on the previous patch in this series.
This patch depends on the patch that is incorrect.

> - return;
> - }
> + if (!priv->urb->setup_packet)
> + goto add_event_malloc_failure;
>  
>   /* set other members from the base header of pdu */
>   priv->urb->context= (void *) priv;
> @@ -507,6 +501,10 @@ static void stub_recv_cmd_submit(struct stub_device 
> *sdev,
>   }
>  
>   usbip_dbg_stub_rx("Leave\n");
> + return;
> +
> +add_event_malloc_failure:
> + usbip_event_add(ud, SDEV_EVENT_ERROR_MALLOC);
>  }
>  
>  /* recv a pdu */
> 

I am restructuring this routine to fix a bug and so it stands now,
I don't see a value in taking this patch that doesn't really fix
any specific bug.

thanks,
-- Shuah
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] usbip: Delete an error message for a failed memory allocation in two functions

2017-12-05 Thread Shuah Khan
On 12/05/2017 02:49 PM, SF Markus Elfring wrote:
> From: Markus Elfring 
> Date: Tue, 5 Dec 2017 22:25:38 +0100
> 
> Omit an extra message for a memory allocation failure in these functions.
> 
> This issue was detected by using the Coccinelle software.

Please include the problem and log from Coccinelle software in any
future patches for the issues detected by Coccinelle.

> 
> Signed-off-by: Markus Elfring 
> ---
>  drivers/usb/usbip/stub_rx.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/usb/usbip/stub_rx.c b/drivers/usb/usbip/stub_rx.c
> index 536e037f541f..a70eb81823a3 100644
> --- a/drivers/usb/usbip/stub_rx.c
> +++ b/drivers/usb/usbip/stub_rx.c
> @@ -302,7 +302,6 @@ static struct stub_priv *stub_priv_alloc(struct 
> stub_device *sdev,
>  
>   priv = kmem_cache_zalloc(stub_priv_cache, GFP_ATOMIC);
>   if (!priv) {
> - dev_err(&sdev->udev->dev, "alloc stub_priv\n");
>   spin_unlock_irqrestore(&sdev->priv_lock, flags);
>   usbip_event_add(ud, SDEV_EVENT_ERROR_MALLOC);
>   return NULL;
> @@ -466,7 +465,6 @@ static void stub_recv_cmd_submit(struct stub_device *sdev,
>   priv->urb->setup_packet = kmemdup(&pdu->u.cmd_submit.setup, 8,
> GFP_KERNEL);
>   if (!priv->urb->setup_packet) {
> - dev_err(&udev->dev, "allocate setup_packet\n");

If Coccinelle found this as an extra message, there is something
wrong with the Coccinelle script. This is not an extra message.
This message is for the second kmemdup() failure and is necessary.

>   usbip_event_add(ud, SDEV_EVENT_ERROR_MALLOC);
>   return;
>   }
> 

thanks,
-- Shuah
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

