[PATCH] of_reserved_mem: Increase the number of reserved regions

2020-10-03 Thread Phil Chang
Certain SoCs need to support a large number of reserved memory
regions, especially to follow the GKI rules from Google.
New MediaTek SoCs require more than 68 reserved memory regions
for per-IP usage, such as loading firmware into a specific space,
so more regions need to be reserved.

Signed-off-by: Joe Liu 
Signed-off-by: YJ Chiang 
Signed-off-by: Alix Wu 
Signed-off-by: Phil Chang 
---
 drivers/of/of_reserved_mem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/of/of_reserved_mem.c b/drivers/of/of_reserved_mem.c
index 46b9371c8a33..595f0741dcef 100644
--- a/drivers/of/of_reserved_mem.c
+++ b/drivers/of/of_reserved_mem.c
@@ -22,7 +22,7 @@
 #include 
 #include 
 
-#define MAX_RESERVED_REGIONS   64
+#define MAX_RESERVED_REGIONS   128
 static struct reserved_mem reserved_mem[MAX_RESERVED_REGIONS];
 static int reserved_mem_count;
 
-- 
2.18.0


[PATCH v3] bluetooth: hci_h5: fix memory leak in h5_close

2020-10-03 Thread Anant Thazhemadam
When h5_close() is called and !hu->serdev, h5 is directly freed.
However, h5->rx_skb is not freed before h5 is freed, which causes
a memory leak.
Freeing h5->rx_skb (when !hu->serdev) fixes this memory leak before
freeing h5.

Fixes: ce945552fde4 ("Bluetooth: hci_h5: Add support for serdev enumerated devices")
Reported-by: syzbot+6ce141c55b2f7aafd...@syzkaller.appspotmail.com
Tested-by: syzbot+6ce141c55b2f7aafd...@syzkaller.appspotmail.com
Signed-off-by: Anant Thazhemadam 
---
Changes in v3:
* Free h5->rx_skb when !hu->serdev, and fix the memory leak
* Do not incorrectly and unnecessarily call serdev_device_close()

Changes in v2:
* Fixed the Fixes tag

Hans de Goede also suggested calling h5_reset_rx() on close (for both the
!hu->serdev and hu->serdev cases).
However, doing so seems to lead to a null-ptr-dereference error,
https://syzkaller.appspot.com/text?tag=CrashReport=136a9a5d90,
and for this reason it has not been implemented.

Instead, directly freeing h5->rx_skb seems to suffice to prevent the
reported memory leak.
And since h5 is freed immediately after freeing h5->rx_skb, setting
h5->rx_skb to NULL isn't necessary.

 drivers/bluetooth/hci_h5.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/bluetooth/hci_h5.c b/drivers/bluetooth/hci_h5.c
index e41854e0d79a..171e55c080ce 100644
--- a/drivers/bluetooth/hci_h5.c
+++ b/drivers/bluetooth/hci_h5.c
@@ -248,8 +248,10 @@ static int h5_close(struct hci_uart *hu)
if (h5->vnd && h5->vnd->close)
h5->vnd->close(h5);
 
-   if (!hu->serdev)
-   if (!hu->serdev) {
+   kfree_skb(h5->rx_skb);
kfree(h5);
+   }
 
return 0;
 }
-- 
2.25.1



Re: USBIP is claiming all my USB devices - Commit 7a2f2974f265 is broken

2020-10-03 Thread Greg Kroah-Hartman
On Sat, Oct 03, 2020 at 01:54:46PM -0400, Byron Stanoszek wrote:
> On Sat, 3 Oct 2020, Greg Kroah-Hartman wrote:
> 
> > On Sat, Oct 03, 2020 at 01:18:36PM -0400, Byron Stanoszek wrote:
> > > All,
> > > 
> > > I was testing Linux 5.9-rc7 today when I realized that none of my USB
> > > devices were responding anymore. For instance, my mouse does not
> > > respond and its usual red LED is not on.
> > > 
> > > Reverting git commit 7a2f2974f265 solved the problem for me.
> > 
> > Can you try the patches listed here:
> > https://lore.kernel.org/r/20201003142651.ga794...@kroah.com
> > 
> > As this issue should be solved with them.  Hopefully :)
> 
> I confirm this also solved the problem for me.

Great!  Those patches are now in Linus's tree so all should be good.

thanks for testing and letting me know.

greg k-h


[PATCH 06/10] fpga: fpga-mgr: socfpga: Simplify registration

2020-10-03 Thread Moritz Fischer
Simplify registration using new devm_fpga_mgr_register() API.

Signed-off-by: Moritz Fischer 
---
 drivers/fpga/socfpga.c | 14 +-
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/drivers/fpga/socfpga.c b/drivers/fpga/socfpga.c
index 4a8a2fcd4e6c..1f467173fc1f 100644
--- a/drivers/fpga/socfpga.c
+++ b/drivers/fpga/socfpga.c
@@ -576,18 +576,7 @@ static int socfpga_fpga_probe(struct platform_device *pdev)
if (!mgr)
return -ENOMEM;
 
-   platform_set_drvdata(pdev, mgr);
-
-   return fpga_mgr_register(mgr);
-}
-
-static int socfpga_fpga_remove(struct platform_device *pdev)
-{
-   struct fpga_manager *mgr = platform_get_drvdata(pdev);
-
-   fpga_mgr_unregister(mgr);
-
-   return 0;
+   return devm_fpga_mgr_register(dev, mgr);
 }
 
 #ifdef CONFIG_OF
@@ -601,7 +590,6 @@ MODULE_DEVICE_TABLE(of, socfpga_fpga_of_match);
 
 static struct platform_driver socfpga_fpga_driver = {
.probe = socfpga_fpga_probe,
-   .remove = socfpga_fpga_remove,
.driver = {
.name   = "socfpga_fpga_manager",
.of_match_table = of_match_ptr(socfpga_fpga_of_match),
-- 
2.28.0



[PATCH 10/10] fpga: fpga-mgr: altera-pr-ip: Simplify registration

2020-10-03 Thread Moritz Fischer
Simplify registration using new devm_fpga_mgr_register() API.
Remove the now obsolete altera_pr_unregister() function.

Signed-off-by: Moritz Fischer 
---

We should take another look at this; IIRC, the point of
splitting this up into a separate driver was to make it usable by a
different (pci?) driver later on.

It doesn't seem like this happened, and I think we should just make this
a platform driver?

---
 drivers/fpga/altera-pr-ip-core-plat.c  | 10 --
 drivers/fpga/altera-pr-ip-core.c   | 14 +-
 include/linux/fpga/altera-pr-ip-core.h |  1 -
 3 files changed, 1 insertion(+), 24 deletions(-)

diff --git a/drivers/fpga/altera-pr-ip-core-plat.c b/drivers/fpga/altera-pr-ip-core-plat.c
index 99b9cc0e70f0..b008a6b8d2d3 100644
--- a/drivers/fpga/altera-pr-ip-core-plat.c
+++ b/drivers/fpga/altera-pr-ip-core-plat.c
@@ -28,15 +28,6 @@ static int alt_pr_platform_probe(struct platform_device *pdev)
return alt_pr_register(dev, reg_base);
 }
 
-static int alt_pr_platform_remove(struct platform_device *pdev)
-{
-   struct device *dev = &pdev->dev;
-
-   alt_pr_unregister(dev);
-
-   return 0;
-}
-
 static const struct of_device_id alt_pr_of_match[] = {
{ .compatible = "altr,a10-pr-ip", },
{},
@@ -46,7 +37,6 @@ MODULE_DEVICE_TABLE(of, alt_pr_of_match);
 
 static struct platform_driver alt_pr_platform_driver = {
.probe = alt_pr_platform_probe,
-   .remove = alt_pr_platform_remove,
.driver = {
.name   = "alt_a10_pr_ip",
.of_match_table = alt_pr_of_match,
diff --git a/drivers/fpga/altera-pr-ip-core.c b/drivers/fpga/altera-pr-ip-core.c
index 2cf25fd5e897..dfdf21ed34c4 100644
--- a/drivers/fpga/altera-pr-ip-core.c
+++ b/drivers/fpga/altera-pr-ip-core.c
@@ -195,22 +195,10 @@ int alt_pr_register(struct device *dev, void __iomem *reg_base)
if (!mgr)
return -ENOMEM;
 
-   dev_set_drvdata(dev, mgr);
-
-   return fpga_mgr_register(mgr);
+   return devm_fpga_mgr_register(dev, mgr);
 }
 EXPORT_SYMBOL_GPL(alt_pr_register);
 
-void alt_pr_unregister(struct device *dev)
-{
-   struct fpga_manager *mgr = dev_get_drvdata(dev);
-
-   dev_dbg(dev, "%s\n", __func__);
-
-   fpga_mgr_unregister(mgr);
-}
-EXPORT_SYMBOL_GPL(alt_pr_unregister);
-
 MODULE_AUTHOR("Matthew Gerlach ");
 MODULE_DESCRIPTION("Altera Partial Reconfiguration IP Core");
 MODULE_LICENSE("GPL v2");
diff --git a/include/linux/fpga/altera-pr-ip-core.h b/include/linux/fpga/altera-pr-ip-core.h
index 0b08ac20ab16..a6b4c07858cc 100644
--- a/include/linux/fpga/altera-pr-ip-core.h
+++ b/include/linux/fpga/altera-pr-ip-core.h
@@ -13,6 +13,5 @@
 #include 
 
 int alt_pr_register(struct device *dev, void __iomem *reg_base);
-void alt_pr_unregister(struct device *dev);
 
 #endif /* _ALT_PR_IP_CORE_H */
-- 
2.28.0



[PATCH 05/10] fpga: fpga-mgr: machxo2-spi: Simplify registration

2020-10-03 Thread Moritz Fischer
Simplify registration using new devm_fpga_mgr_register() API.

Signed-off-by: Moritz Fischer 
---
 drivers/fpga/machxo2-spi.c | 14 +-
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/drivers/fpga/machxo2-spi.c b/drivers/fpga/machxo2-spi.c
index b316369156fe..114a64d2b7a4 100644
--- a/drivers/fpga/machxo2-spi.c
+++ b/drivers/fpga/machxo2-spi.c
@@ -371,18 +371,7 @@ static int machxo2_spi_probe(struct spi_device *spi)
if (!mgr)
return -ENOMEM;
 
-   spi_set_drvdata(spi, mgr);
-
-   return fpga_mgr_register(mgr);
-}
-
-static int machxo2_spi_remove(struct spi_device *spi)
-{
-   struct fpga_manager *mgr = spi_get_drvdata(spi);
-
-   fpga_mgr_unregister(mgr);
-
-   return 0;
+   return devm_fpga_mgr_register(dev, mgr);
 }
 
 static const struct of_device_id of_match[] = {
@@ -403,7 +392,6 @@ static struct spi_driver machxo2_spi_driver = {
.of_match_table = of_match_ptr(of_match),
},
.probe = machxo2_spi_probe,
-   .remove = machxo2_spi_remove,
.id_table = lattice_ids,
 };
 
-- 
2.28.0



[PATCH 08/10] fpga: fpga-mgr: xilinx-spi: Simplify registration

2020-10-03 Thread Moritz Fischer
Simplify registration using new devm_fpga_mgr_register() API.

Signed-off-by: Moritz Fischer 
---
 drivers/fpga/xilinx-spi.c | 14 +-
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/drivers/fpga/xilinx-spi.c b/drivers/fpga/xilinx-spi.c
index 824abbbd631e..27defa98092d 100644
--- a/drivers/fpga/xilinx-spi.c
+++ b/drivers/fpga/xilinx-spi.c
@@ -259,18 +259,7 @@ static int xilinx_spi_probe(struct spi_device *spi)
if (!mgr)
return -ENOMEM;
 
-   spi_set_drvdata(spi, mgr);
-
-   return fpga_mgr_register(mgr);
-}
-
-static int xilinx_spi_remove(struct spi_device *spi)
-{
-   struct fpga_manager *mgr = spi_get_drvdata(spi);
-
-   fpga_mgr_unregister(mgr);
-
-   return 0;
+   return devm_fpga_mgr_register(&spi->dev, mgr);
 }
 
 static const struct of_device_id xlnx_spi_of_match[] = {
@@ -285,7 +274,6 @@ static struct spi_driver xilinx_slave_spi_driver = {
.of_match_table = of_match_ptr(xlnx_spi_of_match),
},
.probe = xilinx_spi_probe,
-   .remove = xilinx_spi_remove,
 };
 
 module_spi_driver(xilinx_slave_spi_driver)
-- 
2.28.0



[PATCH 07/10] fpga: fpga-mgr: ts73xx: Simplify registration

2020-10-03 Thread Moritz Fischer
Simplify registration using new devm_fpga_mgr_register() API.

Signed-off-by: Moritz Fischer 
---
 drivers/fpga/ts73xx-fpga.c | 14 +-
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/drivers/fpga/ts73xx-fpga.c b/drivers/fpga/ts73xx-fpga.c
index 2888ff000e4d..101f016c6ed8 100644
--- a/drivers/fpga/ts73xx-fpga.c
+++ b/drivers/fpga/ts73xx-fpga.c
@@ -127,18 +127,7 @@ static int ts73xx_fpga_probe(struct platform_device *pdev)
if (!mgr)
return -ENOMEM;
 
-   platform_set_drvdata(pdev, mgr);
-
-   return fpga_mgr_register(mgr);
-}
-
-static int ts73xx_fpga_remove(struct platform_device *pdev)
-{
-   struct fpga_manager *mgr = platform_get_drvdata(pdev);
-
-   fpga_mgr_unregister(mgr);
-
-   return 0;
+   return devm_fpga_mgr_register(kdev, mgr);
 }
 
 static struct platform_driver ts73xx_fpga_driver = {
@@ -146,7 +135,6 @@ static struct platform_driver ts73xx_fpga_driver = {
.name   = "ts73xx-fpga-mgr",
},
.probe  = ts73xx_fpga_probe,
-   .remove = ts73xx_fpga_remove,
 };
 module_platform_driver(ts73xx_fpga_driver);
 
-- 
2.28.0



[PATCH 09/10] fpga: fpga-mgr: zynqmp: Simplify registration

2020-10-03 Thread Moritz Fischer
Simplify registration using new devm_fpga_mgr_register() API.

Signed-off-by: Moritz Fischer 
---
 drivers/fpga/zynqmp-fpga.c | 21 +
 1 file changed, 1 insertion(+), 20 deletions(-)

diff --git a/drivers/fpga/zynqmp-fpga.c b/drivers/fpga/zynqmp-fpga.c
index 4a1139e05280..125743c9797f 100644
--- a/drivers/fpga/zynqmp-fpga.c
+++ b/drivers/fpga/zynqmp-fpga.c
@@ -95,7 +95,6 @@ static int zynqmp_fpga_probe(struct platform_device *pdev)
struct device *dev = &pdev->dev;
struct zynqmp_fpga_priv *priv;
struct fpga_manager *mgr;
-   int ret;
 
priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
if (!priv)
@@ -108,24 +107,7 @@ static int zynqmp_fpga_probe(struct platform_device *pdev)
if (!mgr)
return -ENOMEM;
 
-   platform_set_drvdata(pdev, mgr);
-
-   ret = fpga_mgr_register(mgr);
-   if (ret) {
-   dev_err(dev, "unable to register FPGA manager");
-   return ret;
-   }
-
-   return 0;
-}
-
-static int zynqmp_fpga_remove(struct platform_device *pdev)
-{
-   struct fpga_manager *mgr = platform_get_drvdata(pdev);
-
-   fpga_mgr_unregister(mgr);
-
-   return 0;
+   return devm_fpga_mgr_register(dev, mgr);
 }
 
 static const struct of_device_id zynqmp_fpga_of_match[] = {
@@ -137,7 +119,6 @@ MODULE_DEVICE_TABLE(of, zynqmp_fpga_of_match);
 
 static struct platform_driver zynqmp_fpga_driver = {
.probe = zynqmp_fpga_probe,
-   .remove = zynqmp_fpga_remove,
.driver = {
.name = "zynqmp_fpga_manager",
.of_match_table = of_match_ptr(zynqmp_fpga_of_match),
-- 
2.28.0



[PATCH 02/10] fpga: fpga-mgr: altera-ps-spi: Simplify registration

2020-10-03 Thread Moritz Fischer
Simplify registration by using new devm_fpga_mgr_register() API.

Signed-off-by: Moritz Fischer 
---
 drivers/fpga/altera-ps-spi.c | 14 +-
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/drivers/fpga/altera-ps-spi.c b/drivers/fpga/altera-ps-spi.c
index 0221dee8dd4c..23bfd4d1ad0f 100644
--- a/drivers/fpga/altera-ps-spi.c
+++ b/drivers/fpga/altera-ps-spi.c
@@ -307,18 +307,7 @@ static int altera_ps_probe(struct spi_device *spi)
if (!mgr)
return -ENOMEM;
 
-   spi_set_drvdata(spi, mgr);
-
-   return fpga_mgr_register(mgr);
-}
-
-static int altera_ps_remove(struct spi_device *spi)
-{
-   struct fpga_manager *mgr = spi_get_drvdata(spi);
-
-   fpga_mgr_unregister(mgr);
-
-   return 0;
+   return devm_fpga_mgr_register(&spi->dev, mgr);
 }
 
 static const struct spi_device_id altera_ps_spi_ids[] = {
@@ -337,7 +326,6 @@ static struct spi_driver altera_ps_driver = {
},
.id_table = altera_ps_spi_ids,
.probe = altera_ps_probe,
-   .remove = altera_ps_remove,
 };
 
 module_spi_driver(altera_ps_driver)
-- 
2.28.0



[PATCH 03/10] fpga: fpga-mgr: dfl-fme-mgr: Simplify registration

2020-10-03 Thread Moritz Fischer
Simplify registration using new devm_fpga_mgr_register() API.

Signed-off-by: Moritz Fischer 
---
 drivers/fpga/dfl-fme-mgr.c | 12 +---
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/drivers/fpga/dfl-fme-mgr.c b/drivers/fpga/dfl-fme-mgr.c
index b3f7eee3c93f..3fc2be87d059 100644
--- a/drivers/fpga/dfl-fme-mgr.c
+++ b/drivers/fpga/dfl-fme-mgr.c
@@ -316,16 +316,7 @@ static int fme_mgr_probe(struct platform_device *pdev)
mgr->compat_id = compat_id;
platform_set_drvdata(pdev, mgr);
 
-   return fpga_mgr_register(mgr);
-}
-
-static int fme_mgr_remove(struct platform_device *pdev)
-{
-   struct fpga_manager *mgr = platform_get_drvdata(pdev);
-
-   fpga_mgr_unregister(mgr);
-
-   return 0;
+   return devm_fpga_mgr_register(dev, mgr);
 }
 
 static struct platform_driver fme_mgr_driver = {
@@ -333,7 +324,6 @@ static struct platform_driver fme_mgr_driver = {
.name= DFL_FPGA_FME_MGR,
},
.probe   = fme_mgr_probe,
-   .remove  = fme_mgr_remove,
 };
 
 module_platform_driver(fme_mgr_driver);
-- 
2.28.0



[PATCH 00/10] Introduce devm_fpga_mgr_register()

2020-10-03 Thread Moritz Fischer
This patchset introduces the devm_fpga_mgr_register API,
a devres managed version of fpga_mgr_register().

It reduces boilerplate that is repeated in literally every
single driver by moving it into the fpga-mgr core.

Moritz Fischer (10):
  fpga: fpga-mgr: Add devm_fpga_mgr_register() API
  fpga: fpga-mgr: altera-ps-spi: Simplify registration
  fpga: fpga-mgr: dfl-fme-mgr: Simplify registration
  fpga: fpga-mgr: ice40-spi: Simplify registration
  fpga: fpga-mgr: machxo2-spi: Simplify registration
  fpga: fpga-mgr: socfpga: Simplify registration
  fpga: fpga-mgr: ts73xx: Simplify registration
  fpga: fpga-mgr: xilinx-spi: Simplify registration
  fpga: fpga-mgr: zynqmp: Simplify registration
  fpga: fpga-mgr: altera-pr-ip: Simplify registration

 drivers/fpga/altera-pr-ip-core-plat.c  | 10 
 drivers/fpga/altera-pr-ip-core.c   | 14 +
 drivers/fpga/altera-ps-spi.c   | 14 +
 drivers/fpga/dfl-fme-mgr.c | 12 +---
 drivers/fpga/fpga-mgr.c| 76 ++
 drivers/fpga/ice40-spi.c   | 14 +
 drivers/fpga/machxo2-spi.c | 14 +
 drivers/fpga/socfpga.c | 14 +
 drivers/fpga/ts73xx-fpga.c | 14 +
 drivers/fpga/xilinx-spi.c  | 14 +
 drivers/fpga/zynqmp-fpga.c | 21 +--
 include/linux/fpga/altera-pr-ip-core.h |  1 -
 include/linux/fpga/fpga-mgr.h  |  2 +
 13 files changed, 77 insertions(+), 143 deletions(-)

-- 
2.28.0



[PATCH 01/10] fpga: fpga-mgr: Add devm_fpga_mgr_register() API

2020-10-03 Thread Moritz Fischer
Add a devm_fpga_mgr_register() API that can be used to register an FPGA
Manager that was created using devm_fpga_mgr_create().

Introduce a struct fpga_mgr_devres that makes the devres
allocation a little more readable and is reused by both
devm_fpga_mgr_create() and devm_fpga_mgr_register().

Signed-off-by: Moritz Fischer 
---
 drivers/fpga/fpga-mgr.c   | 76 ++-
 include/linux/fpga/fpga-mgr.h |  2 +
 2 files changed, 68 insertions(+), 10 deletions(-)

diff --git a/drivers/fpga/fpga-mgr.c b/drivers/fpga/fpga-mgr.c
index f38bab01432e..774ac98fb69c 100644
--- a/drivers/fpga/fpga-mgr.c
+++ b/drivers/fpga/fpga-mgr.c
@@ -21,6 +21,10 @@
 static DEFINE_IDA(fpga_mgr_ida);
 static struct class *fpga_mgr_class;
 
+struct fpga_mgr_devres {
+   struct fpga_manager *mgr;
+};
+
 /**
  * fpga_image_info_alloc - Allocate a FPGA image info struct
  * @dev: owning device
@@ -651,21 +655,21 @@ struct fpga_manager *devm_fpga_mgr_create(struct device *dev, const char *name,
  const struct fpga_manager_ops *mops,
  void *priv)
 {
-   struct fpga_manager **ptr, *mgr;
+   struct fpga_mgr_devres *dr;
 
-   ptr = devres_alloc(devm_fpga_mgr_release, sizeof(*ptr), GFP_KERNEL);
-   if (!ptr)
+   dr = devres_alloc(devm_fpga_mgr_release, sizeof(*dr), GFP_KERNEL);
+   if (!dr)
return NULL;
 
-   mgr = fpga_mgr_create(dev, name, mops, priv);
-   if (!mgr) {
-   devres_free(ptr);
-   } else {
-   *ptr = mgr;
-   devres_add(dev, ptr);
+   dr->mgr = fpga_mgr_create(dev, name, mops, priv);
+   if (!dr->mgr) {
+   devres_free(dr);
+   return NULL;
}
 
-   return mgr;
+   devres_add(dev, dr);
+
+   return dr->mgr;
 }
 EXPORT_SYMBOL_GPL(devm_fpga_mgr_create);
 
@@ -722,6 +726,58 @@ void fpga_mgr_unregister(struct fpga_manager *mgr)
 }
 EXPORT_SYMBOL_GPL(fpga_mgr_unregister);
 
+static int fpga_mgr_devres_match(struct device *dev, void *priv,
+void *match_data)
+{
+   struct fpga_mgr_devres *dr = priv;
+
+   return match_data == dr->mgr;
+}
+
+static void devm_fpga_mgr_unregister(struct device *dev, void *priv)
+{
+   struct fpga_mgr_devres *dr = priv;
+
+   fpga_mgr_unregister(dr->mgr);
+}
+
+/**
+ * devm_fpga_mgr_register - resource managed variant of fpga_mgr_register()
+ * @dev: managing device for this FPGA manager
+ * @mgr: fpga manager struct
+ *
+ * This is the devres variant of fpga_mgr_register() for which the unregister
+ * function will be called automatically when the managing device is detached.
+ */
+int devm_fpga_mgr_register(struct device *dev, struct fpga_manager *mgr)
+{
+   struct fpga_mgr_devres *dr;
+   int err;
+
+   /* Make sure that the struct fpga_manager * that is passed in is
+* managed itself.
+*/
+   if (WARN_ON(!devres_find(dev, devm_fpga_mgr_release,
+fpga_mgr_devres_match, mgr)))
+   return -EINVAL;
+
+   dr = devres_alloc(devm_fpga_mgr_unregister, sizeof(*dr), GFP_KERNEL);
+   if (!dr)
+   return -ENOMEM;
+
+   err = fpga_mgr_register(mgr);
+   if (err) {
+   devres_free(dr);
+   return err;
+   }
+
+   dr->mgr = mgr;
+   devres_add(dev, dr);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(devm_fpga_mgr_register);
+
 static void fpga_mgr_dev_release(struct device *dev)
 {
 }
diff --git a/include/linux/fpga/fpga-mgr.h b/include/linux/fpga/fpga-mgr.h
index e8ca62b2cb5b..2bc3030a69e5 100644
--- a/include/linux/fpga/fpga-mgr.h
+++ b/include/linux/fpga/fpga-mgr.h
@@ -198,6 +198,8 @@ void fpga_mgr_free(struct fpga_manager *mgr);
 int fpga_mgr_register(struct fpga_manager *mgr);
 void fpga_mgr_unregister(struct fpga_manager *mgr);
 
+int devm_fpga_mgr_register(struct device *dev, struct fpga_manager *mgr);
+
 struct fpga_manager *devm_fpga_mgr_create(struct device *dev, const char *name,
  const struct fpga_manager_ops *mops,
  void *priv);
-- 
2.28.0



[PATCH 04/10] fpga: fpga-mgr: ice40-spi: Simplify registration

2020-10-03 Thread Moritz Fischer
Simplify registration using new devm_fpga_mgr_register() API.

Signed-off-by: Moritz Fischer 
---
 drivers/fpga/ice40-spi.c | 14 +-
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/drivers/fpga/ice40-spi.c b/drivers/fpga/ice40-spi.c
index 8d689fea0dab..69dec5af23c3 100644
--- a/drivers/fpga/ice40-spi.c
+++ b/drivers/fpga/ice40-spi.c
@@ -183,18 +183,7 @@ static int ice40_fpga_probe(struct spi_device *spi)
if (!mgr)
return -ENOMEM;
 
-   spi_set_drvdata(spi, mgr);
-
-   return fpga_mgr_register(mgr);
-}
-
-static int ice40_fpga_remove(struct spi_device *spi)
-{
-   struct fpga_manager *mgr = spi_get_drvdata(spi);
-
-   fpga_mgr_unregister(mgr);
-
-   return 0;
+   return devm_fpga_mgr_register(dev, mgr);
 }
 
 static const struct of_device_id ice40_fpga_of_match[] = {
@@ -205,7 +194,6 @@ MODULE_DEVICE_TABLE(of, ice40_fpga_of_match);
 
 static struct spi_driver ice40_fpga_driver = {
.probe = ice40_fpga_probe,
-   .remove = ice40_fpga_remove,
.driver = {
.name = "ice40spi",
.of_match_table = of_match_ptr(ice40_fpga_of_match),
-- 
2.28.0



drivers/spi/spi-orion.c:409:24: sparse: sparse: incorrect type in argument 1 (different base types)

2020-10-03 Thread kernel test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
head:   22fbc037cd32e4e6771d2271b565806cfb8c134c
commit: 80591e61a0f7e88deaada69844e4a31280c4a38f kbuild: tell sparse about the $ARCH
date:   11 months ago
config: alpha-randconfig-s032-20201004 (attached as .config)
compiler: alpha-linux-gcc (GCC) 9.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.2-201-g24bdaac6-dirty
        # https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=80591e61a0f7e88deaada69844e4a31280c4a38f
        git remote add linus https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
        git fetch --no-tags linus master
        git checkout 80591e61a0f7e88deaada69844e4a31280c4a38f
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=alpha

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

echo
echo "sparse warnings: (new ones prefixed by >>)"
echo
>> drivers/spi/spi-orion.c:409:24: sparse: sparse: incorrect type in argument 1 (different base types) @@ expected unsigned int [usertype] b @@ got restricted __le16 [usertype] @@
>> drivers/spi/spi-orion.c:409:24: sparse: expected unsigned int [usertype] b
   drivers/spi/spi-orion.c:409:24: sparse: got restricted __le16 [usertype]
   drivers/spi/spi-orion.c:419:17: sparse: sparse: cast to restricted __le16
   drivers/spi/spi-orion.c:419:17: sparse: sparse: cast to restricted __le16
   drivers/spi/spi-orion.c:419:17: sparse: sparse: cast to restricted __le16
   drivers/spi/spi-orion.c:419:17: sparse: sparse: cast to restricted __le16

vim +409 drivers/spi/spi-orion.c

60cadec9da7b6c drivers/spi/orion_spi.c Shadi Ammouri 2008-08-05  392  
60cadec9da7b6c drivers/spi/orion_spi.c Shadi Ammouri 2008-08-05  393  static inline int
60cadec9da7b6c drivers/spi/orion_spi.c Shadi Ammouri 2008-08-05  394  orion_spi_write_read_16bit(struct spi_device *spi,
60cadec9da7b6c drivers/spi/orion_spi.c Shadi Ammouri 2008-08-05  395                             const u16 **tx_buf, u16 **rx_buf)
60cadec9da7b6c drivers/spi/orion_spi.c Shadi Ammouri 2008-08-05  396  {
60cadec9da7b6c drivers/spi/orion_spi.c Shadi Ammouri 2008-08-05  397          void __iomem *tx_reg, *rx_reg, *int_reg;
60cadec9da7b6c drivers/spi/orion_spi.c Shadi Ammouri 2008-08-05  398          struct orion_spi *orion_spi;
60cadec9da7b6c drivers/spi/orion_spi.c Shadi Ammouri 2008-08-05  399  
60cadec9da7b6c drivers/spi/orion_spi.c Shadi Ammouri 2008-08-05  400          orion_spi = spi_master_get_devdata(spi->master);
60cadec9da7b6c drivers/spi/orion_spi.c Shadi Ammouri 2008-08-05  401          tx_reg = spi_reg(orion_spi, ORION_SPI_DATA_OUT_REG);
60cadec9da7b6c drivers/spi/orion_spi.c Shadi Ammouri 2008-08-05  402          rx_reg = spi_reg(orion_spi, ORION_SPI_DATA_IN_REG);
60cadec9da7b6c drivers/spi/orion_spi.c Shadi Ammouri 2008-08-05  403          int_reg = spi_reg(orion_spi, ORION_SPI_INT_CAUSE_REG);
60cadec9da7b6c drivers/spi/orion_spi.c Shadi Ammouri 2008-08-05  404  
60cadec9da7b6c drivers/spi/orion_spi.c Shadi Ammouri 2008-08-05  405          /* clear the interrupt cause register */
60cadec9da7b6c drivers/spi/orion_spi.c Shadi Ammouri 2008-08-05  406          writel(0x0, int_reg);
60cadec9da7b6c drivers/spi/orion_spi.c Shadi Ammouri 2008-08-05  407  
60cadec9da7b6c drivers/spi/orion_spi.c Shadi Ammouri 2008-08-05  408          if (tx_buf && *tx_buf)
60cadec9da7b6c drivers/spi/orion_spi.c Shadi Ammouri 2008-08-05 @409                  writel(__cpu_to_le16(get_unaligned((*tx_buf)++)), tx_reg);
60cadec9da7b6c drivers/spi/orion_spi.c Shadi Ammouri 2008-08-05  410          else
60cadec9da7b6c drivers/spi/orion_spi.c Shadi Ammouri 2008-08-05  411                  writel(0, tx_reg);
60cadec9da7b6c drivers/spi/orion_spi.c Shadi Ammouri 2008-08-05  412  
60cadec9da7b6c drivers/spi/orion_spi.c Shadi Ammouri 2008-08-05  413          if (orion_spi_wait_till_ready(orion_spi) < 0) {
60cadec9da7b6c drivers/spi/orion_spi.c Shadi Ammouri 2008-08-05  414                  dev_err(&spi->dev, "TXS timed out\n");
60cadec9da7b6c drivers/spi/orion_spi.c Shadi Ammouri 2008-08-05  415                  return -1;
60cadec9da7b6c drivers/spi/orion_spi.c Shadi Ammouri 2008-08-05  416          }
60cadec9da7b6c drivers/spi/orion_spi.c Shadi Ammouri 2008-08-05  417  
60cadec9da7b6c drivers/spi/orion_spi.c Shadi Ammouri 2008-08-05  418          if (rx_buf && *rx_buf)
60cadec9da7b6c drivers/spi/orion_spi.c Shadi Ammouri 2008-08-05  419                  put_unaligned(__le16_to_cpu(readl(rx_reg)), (*rx_buf)++);
60cadec9da7b6c drivers/spi/orion_spi.c Shadi Ammouri 2008-08-05  420  
60cadec9da7b6c drivers/spi/orion_spi.c Shadi Ammouri 2008-08-05  421          return 1;

Re: [PATCH v7 0/5] Fix DPC hotplug race and enhance error handling

2020-10-03 Thread Raj, Ashok
Hi Ethan

On Sat, Oct 03, 2020 at 03:55:09AM -0400, Ethan Zhao wrote:
> Hi,folks,
> 
> This simple patch set fixed some serious security issues found when DPC
> error injection and NVMe SSD hotplug brute force test were doing -- race
> condition between DPC handler and pciehp, AER interrupt handlers, caused
> system hang and system with DPC feature couldn't recover to normal
> working state as expected (NVMe instance lost, mount operation hang,
> race PCIe access caused uncorrectable errors reported alternatively etc).

I think picking from the other commit messages would make this description in
the cover letter a bit clearer. The fundamental premise is that, under error
conditions, events processed by both the DPC handler and the hotplug handling
of DLLSC operate on the same device object, which ends up in crashes.


> 
> With this patch set applied, stable 5.9-rc6 on ICS (Ice Lake SP platform,
> see
> https://en.wikichip.org/wiki/intel/microarchitectures/ice_lake_(server))
> 
> could pass the PCIe Gen4 NVMe SSD brute force hotplug test with any time
> interval between hot-remove and plug-in operation tens of times without
> any errors occur and system works normal.

> 
> With this patch set applied, system with DPC feature could recover from
> NON-FATAL and FATAL errors injection test and works as expected.
> 
> System works smoothly when errors happen while hotplug is doing, no
> uncorrectable errors found.
> 
> Brute DPC error injection script:
> 
> for i in {0..100}
> do
> setpci -s 64:02.0 0x196.w=000a
> setpci -s 65:00.0 0x04.w=0544
> mount /dev/nvme0n1p1 /root/nvme
> sleep 1
> done
> 
> Other details see every commits description part.
> 
> This patch set could be applied to stable 5.9-rc6/rc7 directly.
> 
> Help to review and test.
> 
> v2: changed according to review by Andy Shevchenko.
> v3: changed patch 4/5 to simpler coding.
> v4: move function pci_wait_port_outdpc() to DPC driver and its
>     declaration to pci.h. (tip from Christoph Hellwig).
> v5: fix building issue reported by l...@intel.com with some config.
> v6: move patch[3/5] as the first patch according to Lukas's suggestion.
> and rewrite the comment part of patch[3/5].
> v7: change the patch[4/5], based on Bjorn's code and truth table.
> change the patch[5/5] about the debug output information.
> 
> Thanks,
> Ethan 
> 
> 
> Ethan Zhao (5):
>   PCI/ERR: get device before call device driver to avoid NULL pointer
> dereference
>   PCI/DPC: define a function to check and wait till port finish DPC
> handling
>   PCI: pciehp: check and wait port status out of DPC before handling
> DLLSC and PDC
>   PCI: only return true when dev io state is really changed
>   PCI/ERR: don't mix io state not changed and no driver together
> 
>  drivers/pci/hotplug/pciehp_hpc.c |  4 ++-
>  drivers/pci/pci.h| 55 +---
>  drivers/pci/pcie/dpc.c   | 27 
>  drivers/pci/pcie/err.c   | 18 +--
>  4 files changed, 68 insertions(+), 36 deletions(-)
> 
> 
> base-commit: a1b8638ba1320e6684aa98233c15255eb803fac7
> -- 
> 2.18.4
> 


[PATCH v5] ipvs: Add traffic statistic up even it is VS/DR or VS/TUN mode

2020-10-03 Thread longguang.yue
It's ipvs's duty to do traffic statistics if packets get hit,
no matter what mode it is in.

--
Changes in v1: support DR/TUN mode statistic
Changes in v2: ip_vs_conn_out_get handles DR/TUN mode's conn
Changes in v3: fix checkpatch
Changes in v4, v5: restructure and optimise this feature
--

Signed-off-by: longguang.yue 
---
 net/netfilter/ipvs/ip_vs_conn.c | 18 +++---
 net/netfilter/ipvs/ip_vs_core.c | 17 ++---
 2 files changed, 21 insertions(+), 14 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index a90b8eac16ac..af08ca2d9174 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -401,6 +401,8 @@ struct ip_vs_conn *ip_vs_ct_in_get(const struct ip_vs_conn_param *p)
 struct ip_vs_conn *ip_vs_conn_out_get(const struct ip_vs_conn_param *p)
 {
unsigned int hash;
+   __be16 sport;
+   const union nf_inet_addr *saddr;
struct ip_vs_conn *cp, *ret=NULL;
 
/*
@@ -411,10 +413,20 @@ struct ip_vs_conn *ip_vs_conn_out_get(const struct ip_vs_conn_param *p)
	rcu_read_lock();
 
	hlist_for_each_entry_rcu(cp, &ip_vs_conn_tab[hash], c_list) {
-   if (p->vport == cp->cport && p->cport == cp->dport &&
-   cp->af == p->af &&
+   if (p->vport != cp->cport)
+   continue;
+
+   if (IP_VS_FWD_METHOD(cp) != IP_VS_CONN_F_MASQ) {
+   sport = cp->vport;
+   saddr = &cp->vaddr;
+   } else {
+   sport = cp->dport;
+   saddr = &cp->daddr;
+   }
+
+   if (p->cport == sport && cp->af == p->af &&
ip_vs_addr_equal(p->af, p->vaddr, &cp->caddr) &&
-   ip_vs_addr_equal(p->af, p->caddr, &cp->daddr) &&
+   ip_vs_addr_equal(p->af, p->caddr, saddr) &&
p->protocol == cp->protocol &&
cp->ipvs == p->ipvs) {
if (!__ip_vs_conn_get(cp))
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index e3668a6e54e4..494ea1fcf4d8 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -875,7 +875,7 @@ static int handle_response_icmp(int af, struct sk_buff *skb,
unsigned int verdict = NF_DROP;
 
if (IP_VS_FWD_METHOD(cp) != IP_VS_CONN_F_MASQ)
-   goto ignore_cp;
+   goto after_nat;
 
/* Ensure the checksum is correct */
if (!skb_csum_unnecessary(skb) && ip_vs_checksum_complete(skb, ihl)) {
@@ -900,7 +900,7 @@ static int handle_response_icmp(int af, struct sk_buff *skb,
 
if (ip_vs_route_me_harder(cp->ipvs, af, skb, hooknum))
goto out;
-
+after_nat:
/* do the statistics and put it back */
ip_vs_out_stats(cp, skb);
 
@@ -909,8 +909,6 @@ static int handle_response_icmp(int af, struct sk_buff *skb,
ip_vs_notrack(skb);
else
ip_vs_update_conntrack(skb, cp, 0);
-
-ignore_cp:
verdict = NF_ACCEPT;
 
 out:
@@ -1276,6 +1274,9 @@ handle_response(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
 {
struct ip_vs_protocol *pp = pd->pp;
 
+   if (IP_VS_FWD_METHOD(cp) != IP_VS_CONN_F_MASQ)
+   goto after_nat;
+
IP_VS_DBG_PKT(11, af, pp, skb, iph->off, "Outgoing packet");
 
if (skb_ensure_writable(skb, iph->len))
@@ -1316,6 +1317,7 @@ handle_response(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
 
IP_VS_DBG_PKT(10, af, pp, skb, iph->off, "After SNAT");
 
+after_nat:
ip_vs_out_stats(cp, skb);
ip_vs_set_state(cp, IP_VS_DIR_OUTPUT, skb, pd);
skb->ipvs_property = 1;
@@ -1413,8 +1415,6 @@ ip_vs_out(struct netns_ipvs *ipvs, unsigned int hooknum, struct sk_buff *skb, in
 ipvs, af, skb, &iph);
 
if (likely(cp)) {
-   if (IP_VS_FWD_METHOD(cp) != IP_VS_CONN_F_MASQ)
-   goto ignore_cp;
return handle_response(af, skb, pd, cp, &iph, hooknum);
}
 
@@ -1475,14 +1475,9 @@ ip_vs_out(struct netns_ipvs *ipvs, unsigned int hooknum, struct sk_buff *skb, in
}
}
 
-out:
IP_VS_DBG_PKT(12, af, pp, skb, iph.off,
  "ip_vs_out: packet continues traversal as normal");
return NF_ACCEPT;
-
-ignore_cp:
-   __ip_vs_conn_put(cp);
-   goto out;
 }
 
 /*
-- 
2.20.1 (Apple Git-117)
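For readers skimming the hunks above: the lookup now keys the reply match on
either the virtual or the real-server address, depending on the forwarding
method. A minimal, self-contained sketch of that selection (the structure and
function names here are hypothetical stand-ins, not the kernel's own types):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for ip_vs_conn / ip_vs_conn_param. */
struct conn {
	uint16_t cport, vport, dport;  /* client / virtual / real-server port */
	uint32_t caddr, vaddr, daddr;  /* client / virtual / real-server addr */
	bool masq;                     /* true for IP_VS_CONN_F_MASQ forwarding */
};

struct param {
	uint16_t cport, vport;
	uint32_t caddr, vaddr;
};

/* Mirrors the key selection in the hunk above: for non-MASQ (DR/TUN)
 * connections the reply is matched against the virtual address/port,
 * for MASQ against the real server's address/port. */
static bool conn_out_match(const struct param *p, const struct conn *cp)
{
	uint16_t sport;
	uint32_t saddr;

	if (p->vport != cp->cport)
		return false;

	if (!cp->masq) {
		sport = cp->vport;
		saddr = cp->vaddr;
	} else {
		sport = cp->dport;
		saddr = cp->daddr;
	}

	return p->cport == sport && p->vaddr == cp->caddr && p->caddr == saddr;
}
```

The early `continue` on the port mismatch keeps the common reject path cheap
before the per-forwarding-method key is computed.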




possible deadlock in start_transaction

2020-10-03 Thread syzbot
Hello,

syzbot found the following issue on:

HEAD commit:ccc1d052 Merge tag 'dmaengine-fix-5.9' of git://git.kernel..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=17f1fa5b90
kernel config:  https://syzkaller.appspot.com/x/.config?x=41b736b7ce1b3ea4
dashboard link: https://syzkaller.appspot.com/bug?extid=ec309a632856890f2635
compiler:   gcc (GCC) 10.1.0-syz 20200507

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+ec309a632856890f2...@syzkaller.appspotmail.com

==
WARNING: possible circular locking dependency detected
5.9.0-rc7-syzkaller #0 Not tainted
--
kworker/u4:6/8345 is trying to acquire lock:
888091200640 (sb_internal#3){.+.+}-{0:0}, at: sb_start_intwrite include/linux/fs.h:1690 [inline]
888091200640 (sb_internal#3){.+.+}-{0:0}, at: start_transaction+0xbe7/0x1170 fs/btrfs/transaction.c:624

but task is already holding lock:
c900161ffda8 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: process_one_work+0x85f/0x1670 kernel/workqueue.c:2244

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #4 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}:
   __flush_work+0x60e/0xac0 kernel/workqueue.c:3041
   wb_shutdown+0x180/0x220 mm/backing-dev.c:355
   bdi_unregister+0x174/0x590 mm/backing-dev.c:872
   del_gendisk+0x820/0xa10 block/genhd.c:933
   loop_remove drivers/block/loop.c:2192 [inline]
   loop_control_ioctl drivers/block/loop.c:2291 [inline]
   loop_control_ioctl+0x3b1/0x480 drivers/block/loop.c:2257
   vfs_ioctl fs/ioctl.c:48 [inline]
   __do_sys_ioctl fs/ioctl.c:753 [inline]
   __se_sys_ioctl fs/ioctl.c:739 [inline]
   __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
   do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #3 (loop_ctl_mutex){+.+.}-{3:3}:
   __mutex_lock_common kernel/locking/mutex.c:956 [inline]
   __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103
   lo_open+0x19/0xd0 drivers/block/loop.c:1893
   __blkdev_get+0x759/0x1aa0 fs/block_dev.c:1507
   blkdev_get fs/block_dev.c:1639 [inline]
   blkdev_open+0x227/0x300 fs/block_dev.c:1753
   do_dentry_open+0x4b9/0x11b0 fs/open.c:817
   do_open fs/namei.c:3251 [inline]
   path_openat+0x1b9a/0x2730 fs/namei.c:3368
   do_filp_open+0x17e/0x3c0 fs/namei.c:3395
   do_sys_openat2+0x16d/0x420 fs/open.c:1168
   do_sys_open fs/open.c:1184 [inline]
   __do_sys_open fs/open.c:1192 [inline]
   __se_sys_open fs/open.c:1188 [inline]
   __x64_sys_open+0x119/0x1c0 fs/open.c:1188
   do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #2 (&bdev->bd_mutex){+.+.}-{3:3}:
   __mutex_lock_common kernel/locking/mutex.c:956 [inline]
   __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103
   blkdev_put+0x30/0x520 fs/block_dev.c:1804
   btrfs_close_bdev fs/btrfs/volumes.c:1117 [inline]
   btrfs_close_bdev fs/btrfs/volumes.c:1107 [inline]
   btrfs_close_one_device fs/btrfs/volumes.c:1133 [inline]
   close_fs_devices.part.0+0x1a4/0x800 fs/btrfs/volumes.c:1161
   close_fs_devices fs/btrfs/volumes.c:1193 [inline]
   btrfs_close_devices+0x95/0x1f0 fs/btrfs/volumes.c:1179
   close_ctree+0x688/0x6cb fs/btrfs/disk-io.c:4148
   generic_shutdown_super+0x144/0x370 fs/super.c:464
   kill_anon_super+0x36/0x60 fs/super.c:1108
   btrfs_kill_super+0x38/0x50 fs/btrfs/super.c:2265
   deactivate_locked_super+0x94/0x160 fs/super.c:335
   deactivate_super+0xad/0xd0 fs/super.c:366
   cleanup_mnt+0x3a3/0x530 fs/namespace.c:1118
   task_work_run+0xdd/0x190 kernel/task_work.c:141
   tracehook_notify_resume include/linux/tracehook.h:188 [inline]
   exit_to_user_mode_loop kernel/entry/common.c:165 [inline]
   exit_to_user_mode_prepare+0x1e1/0x200 kernel/entry/common.c:192
   syscall_exit_to_user_mode+0x7e/0x2e0 kernel/entry/common.c:267
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #1 (&fs_devs->device_list_mutex){+.+.}-{3:3}:
   __mutex_lock_common kernel/locking/mutex.c:956 [inline]
   __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103
   btrfs_finish_chunk_alloc+0x281/0xf90 fs/btrfs/volumes.c:5255
   btrfs_create_pending_block_groups+0x2f3/0x700 fs/btrfs/block-group.c:2109
   __btrfs_end_transaction+0xf5/0x690 fs/btrfs/transaction.c:916
   btrfs_alloc_data_chunk_ondemand+0x2a1/0x670 fs/btrfs/delalloc-space.c:167
   btrfs_fallocate+0x279/0x2900 fs/btrfs/file.c:3282
   vfs_fallocate+0x48d/0x9d0 fs/open.c:309
   ksys_fallocate fs/open.c:332 [inline]
   __do_sys_fallocate fs/open.c:340 [inline]
   __se_sys_fallocate fs/open.c:338 

Re: [RFC PATCH v1 09/26] docs: reporting-bugs: help users find the proper place for their report

2020-10-03 Thread Randy Dunlap
On 10/1/20 1:39 AM, Thorsten Leemhuis wrote:
> Make it obvious that bugzilla.kernel.org most of the time is the wrong
> place to file a report, as it's not working well. Instead, tell users
> how to read the MAINTAINERS file to find the proper place for their
> report. Also mention ./scripts/get_maintainer.pl. Sadly this is only
> available for users that have the sourced at hand; in an ideal world
> somebody would build a web-service around of this.
> 
> Signed-off-by: Thorsten Leemhuis 
> ---
> 
> = RFC =
> 
> This section tells users to always CC LKML. I placed this in the text here for
> now as a way to force a general discussion about this, as it would be way 
> easier
> if users had one place where they could search for existing reports; maybe it
> should be the same place where fixes are sent to, as then the single search
> would find those, too.
> 
> That might mean "LKML", which these days afaics is a kind of "catch-all" ml
> anyway (which nearly nobody reads). So it might make sense to go "all in" and
> make people send their reports here, too. But TBH I'm a bit unsure myself if
> that's the right approach. Maybe creating a mailing list like
> 'linux-iss...@vger.kernel.org' would be best (and while at it maybe also
> linux-regressi...@vger.kernel.org).

Yes, LKML has become an archival list for almost everything. However, bug 
reports
should still be sent to their more specific list when possible, e.g., USB to
linux-usb, ACPI to linux-acpi, networking to netdev, wireless to linux-wireless,
SCSI to linux-scsi, etc.

I might be OK with one additional bug/issues/regressions mailing list but I
wouldn't care to see that split into more than one list.

> ---
>  Documentation/admin-guide/reporting-bugs.rst | 166 ++-
>  1 file changed, 121 insertions(+), 45 deletions(-)
> 
> diff --git a/Documentation/admin-guide/reporting-bugs.rst 
> b/Documentation/admin-guide/reporting-bugs.rst
> index 61b6592ddf74..3e9923c9650e 100644
> --- a/Documentation/admin-guide/reporting-bugs.rst
> +++ b/Documentation/admin-guide/reporting-bugs.rst
> @@ -370,6 +370,127 @@ from being loaded by specifying ``foo.blacklist=1`` as 
> kernel parameter (replace
>  'foo' with the name of the module in question).
>  
>  
> +Locate kernel area that causes the issue
> +
> +
> +*Locate the driver or kernel subsystem that seems to be causing the 
> issue.
> +Find out how and where its developers expect reports. Note: most of the 
> time
> +this won't be bugzilla.kernel.org, as issues typically need to be sent by
> +mail to a maintainer and a public mailing list.*
> +
> +It's crucial to send your report to the right people, as the Linux kernel is 
> big

 is 
a big

> +project and most of its developers are only familiar with a very small part 
> of
> +it. Quite a few developers only care for just one driver; some of them also 
> look
> +after the various infrastructure building blocks the driver is building upon,
> +but sometimes other maintainers take care of those. These people talk with 
> each
> +other, but work mostly separately from each other. But most of them don't 
> care
> +about file systems or memory management, which yet other people take care of.
> +
> +Problem is: the Linux kernel lacks a central bug tracker that all maintainers
> +use, so you have to find the right way and place to report issues yourself. 
> One
> +way to do that: look at the `MAINTAINERS file in the Linux kernel sources
> +`_,

The MAINTAINERS list is also available via html at
https://www.kernel.org/doc/html/latest/process/maintainers.html

but since a reporter might need to use scripts/get_maintainer.pl, maybe the html
doesn't help so much.
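For what it's worth, the lookup can be sketched against the raw file, too. A
rough illustration (the MAINTAINERS excerpt below is hypothetical sample data,
and scripts/get_maintainer.pl handles the real file far more thoroughly):

```shell
# Hypothetical miniature MAINTAINERS excerpt: M: maintainer, L: list, F: files.
cat > /tmp/MAINTAINERS.sample <<'EOF'
INTEL WIRELESS WIFI LINK (iwlwifi)
M: Some Maintainer <maint@example.com>
L: linux-wireless@vger.kernel.org
F: drivers/net/wireless/intel/iwlwifi/
EOF

# Print the mailing list of the section whose F: pattern covers a given path.
awk -v f="drivers/net/wireless/intel/iwlwifi/" '
	/^L:/ { list = $2 }
	/^F:/ && index(f, $2) == 1 { print list }
' /tmp/MAINTAINERS.sample
```

This relies on L: lines preceding F: lines within a section, which is the usual
ordering in the file.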

> +which lists the points of contact for the various parts of the kernel. The 
> file
> +contains a long text with sections that among others will mention who 
> maintains
> +the various parts of the kernel and the development mailing list for that 
> code.
> +
> +How to decode the maintainers file

 MAINTAINERS

> +~~
> +
> +To illustrate how to use the file lets assume the Wi-Fi in your Laptop 
> suddenly
> +misbehaves after updating the kernel. In that case it's likely an issue in 
> the
> +Wi-Fi driver; it could also be some code it builds upon: the Wi-Fi subsystem,
> +the TCP/IP stack, which are all part of the Network subsystem. But unless you
> +suspect the culprit lies there stick to the driver. Thus run the command
> +``lspci -k`` to tell which kernel driver manages a particular hardware::

Other times it might be 'lsusb' or 'lsscsi' or this might not be applicable at 
all
to LED drivers or pinctrl drivers or W1 or I2C or GPIO (?).

> +
> +   [user@something ~]$ lspci -k
> +   [...]
> +   3a:00.0 Network 

Re: [RFC PATCH v1 25/26] docs: reporting-bugs: explain things could be easier

2020-10-03 Thread Randy Dunlap
On 10/1/20 1:50 AM, Thorsten Leemhuis wrote:
> A few closing words to explain why things are like this until someone
> steps up to make things easier for people.
> 
> Signed-off-by: Thorsten Leemhuis 
> ---
>  Documentation/admin-guide/reporting-bugs.rst | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/Documentation/admin-guide/reporting-bugs.rst 
> b/Documentation/admin-guide/reporting-bugs.rst
> index 8f60af27635b..42f59419263a 100644
> --- a/Documentation/admin-guide/reporting-bugs.rst
> +++ b/Documentation/admin-guide/reporting-bugs.rst
> @@ -1458,6 +1458,15 @@ But don't worry too much about all of this, a lot of 
> drivers have active
>  maintainers who are quite interested in fixing as many issues as possible.
>  
>  
> +Closing words
> +=
> +
> +Compared with other Free/Libre & Open Source Software it's hard to reporting

   to report

> +issues to the Linux kernel developers: the length and complexity of this
> +document and the implications between the lines illustrate that. But that's 
> how
> +it is for now. The main author of this text hopes documenting the state of 
> the
> +art will lay some groundwork to improve the situation over time.
> +
>  .. 
> 
>  .. Temporary marker added while this document is rewritten. Sections above
>  .. are new and dual-licensed under GPLv2+ and CC-BY 4.0, those below are old.
> 


-- 
~Randy



Re: [RFC PATCH v1 24/26] docs: reporting-bugs: explain why users might get neither reply nor fix

2020-10-03 Thread Randy Dunlap
On 10/1/20 1:50 AM, Thorsten Leemhuis wrote:
> Not even getting a reply after one invested quite a bit of time with
> preparing and writing a report can be quite devastating. But when it
> comes to Linux, this can easily happen for good or bad reasons. Hence,
> use this opportunity to explain why this might happen, hopefully some
> people then will be less disappointed if it happens.
> 
> Signed-off-by: Thorsten Leemhuis 
> ---
>  Documentation/admin-guide/reporting-bugs.rst | 56 
>  1 file changed, 56 insertions(+)
> 
> diff --git a/Documentation/admin-guide/reporting-bugs.rst 
> b/Documentation/admin-guide/reporting-bugs.rst
> index 340fa44b352c..8f60af27635b 100644
> --- a/Documentation/admin-guide/reporting-bugs.rst
> +++ b/Documentation/admin-guide/reporting-bugs.rst
> @@ -1402,6 +1402,62 @@ for the subsystem as well as the stable mailing list 
> the `MAINTAINERS file
>  mention in the section "STABLE BRANCH".
>  
>  
> +Why some issues won't get any reaction or remain unfixed after being reported
> +=
> +
> +When reporting a problem to the Linux developers, be aware only 'issues of 
> high
> +priority' (regression, security issue, severe problems) are definitely going 
> to
> +get resolved. The maintainers or if all else fails Linus Torvalds himself 
> will
> +make sure of that. They and the other kernel developers will fix a lot of 
> other
> +issues as well. But be aware that sometimes they can't or won't help; and
> +sometimes there isn't even anyone to send a report to.
> +
> +This is best explained with kernel developers that contribute to the Linux
> +kernel in their spare time. Quite a few of the drivers in the kernel were
> +written by such programmers, often because they simply wanted to make their
> +hardware usable on their favorite operating system.
> +
> +These programmers most of the time will happily fix problems other people
> +report. But nobody can force them to do, as they are contributing 
> voluntarily.
> +
> +Then there are situations where such developers really want to fix an issue,
> +but can't: they lack hardware programming documentation to do so. This often
> +happens when the publicly available docs are superficial or the driver was
> +written with the help of reverse engineering.
> +
> +Sooner or later spare time developers will also stop caring for the driver.
> +Maybe their test hardware broke, got replaced by something more fancy, or is 
> so
> +old that it's something you don't find much outside of computer museums
> +anymore. Or the developer stops caring for their code and Linux at all, as
> +something different in their life became way more important. Sometimes nobody
> +is willing to take over the job as maintainer – and nobody can be forced to, 
> as
> +contributing to the Linux kernel is done on a voluntary basis. Abandoned
> +drivers nevertheless remain in the kernel: they are still useful for people 
> and
> +removing would be a regression.
> +
> +The situation is not that different with developers that are paid for their
> +work on the Linux kernel. Those contribute most changes these days. But their
> +employers sooner or later also stop caring for some code and make its 
> programmer
> +focus on other thing. Hardware vendors for example earn their money mainly by

 on other things.

> +selling new hardware; quite a few of them hence are not investing much time 
> and
> +energy in maintaining a Linux kernel driver for something they sold years 
> ago.
> +Enterprise Linux distributors often care for a longer time period, but in new
> +version often leave support for old and rare hardware aside to limit the 
> scope.
> +Often spare time contributors take over once a company leaves some orphan 
> some

  drop last: 
some

> +code, but as mentioned above: sooner or later will leave the code behind, 
> too.

   later they will leave the code 
behind, too.

> +
> +Priorities are another reason why some issues are not fixed, as maintainers
> +quite often are forced to set those, as time to work on Linux is limited. 
> That's
> +true for spare time or the time employers grant their developers to spend on
> +maintenance work on the upstream kernel. Sometimes maintainers also get
> +overwhelmed with reports, even if a driver is working nearly perfectly. To 
> not
> +get completely stuck, the programmer thus might have no other choice then to

than to

> +prioritize issue reports and reject some of them.
> +
> +But don't worry too much about all of this, a lot of drivers have active
> +maintainers who are quite interested in fixing as many issues as possible.
> +
> +
>  .. 
> 
>  .. Temporary marker added while this 

Re: [RFC PATCH v1 20/26] docs: reporting-bugs: instructions for handling regressions

2020-10-03 Thread Randy Dunlap
On 10/1/20 1:39 AM, Thorsten Leemhuis wrote:
> Describe what users will have to do if they deal with a regression.
> Point out that bisection is really important.
> 
> While at it explicitly mention the .config files for the newer kernel
> needs to be similar to the old kernel, as that's an important detail
> quite a few people seem to miss sometimes.
> 
> Signed-off-by: Thorsten Leemhuis 
> ---
>  Documentation/admin-guide/bug-bisect.rst |  2 +
>  Documentation/admin-guide/reporting-bugs.rst | 53 
>  2 files changed, 55 insertions(+)
> 
> diff --git a/Documentation/admin-guide/bug-bisect.rst 
> b/Documentation/admin-guide/bug-bisect.rst
> index 59567da344e8..38d9dbe7177d 100644
> --- a/Documentation/admin-guide/bug-bisect.rst
> +++ b/Documentation/admin-guide/bug-bisect.rst
> @@ -1,3 +1,5 @@
> +.. _bugbisect:
> +
>  Bisecting a bug
>  +++
>  
> diff --git a/Documentation/admin-guide/reporting-bugs.rst 
> b/Documentation/admin-guide/reporting-bugs.rst
> index e1219e56979f..71c49347c544 100644
> --- a/Documentation/admin-guide/reporting-bugs.rst
> +++ b/Documentation/admin-guide/reporting-bugs.rst
> @@ -792,6 +792,59 @@ sometimes needs to get decoded to be readable, which is 
> explained in
>  admin-guide/bug-hunting.rst.
>  
>  
> +Special care for regressions
> +
> +
> +*If your problem is a regression, try to narrow down when the issue was
> +introduced as much as possible.*
> +
> +Linux lead developer Linus Torvalds insists that the Linux kernel never
> +worsens, that's why he deems regressions as unacceptable and wants to see 
> them
> +fixed quickly. That's why changes that introduced a regression are often
> +promptly reverted if the issue they cause can't get solved quickly any other
> +way. Reporting a regression is thus a bit like playing a kind of trump card 
> to
> +get something quickly fixed. But for that to happen the culprit needs to be
> +known. Normally it's up to the reporter to track down the change that's 
> causing
> +the regression, as maintainers often won't have the time or setup at hand to
> +reproduce it themselves.
> +
> +To find the culprit there is a process called 'bisection' which the document
> +:ref:`Documentation/admin-guide/bug-bisect.rst ` describes in 
> detail.
> +That process will often require you to build about ten to twenty kernel 
> images
> +and test each of them for the issue. Yes, that takes some time, but 't worry,

   but don't 
worry,

> +it works a lot quicker than most people assume. Thanks to a 'binary search' 
> this
> +will lead you to the one commit in the source code management system that's
> +causing the regression. Once you found it, serch the net for the subject of 
> the

find it, search

Often it can find the bad commit, but sometimes it fails. It's not always 
perfect.
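Agreed. For reference, the mechanics can be demonstrated end to end on a
throwaway repository; everything below is a synthetic example (toy commits, a
BUG marker standing in for the regression), not a real kernel bisection:

```shell
# Build a toy history of 7 commits where the "regression" (a BUG marker)
# lands in "change 4", then let git bisect find it.
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m init
for i in 1 2 3 4 5 6; do
	echo "change $i" >> file
	if [ "$i" -eq 4 ]; then echo BUG >> file; fi
	git add file
	git -c user.email=a@b -c user.name=a commit -q -m "change $i"
done

git bisect start HEAD HEAD~6 > /dev/null   # bad = HEAD, good = initial commit
while :; do
	# "Testing" a kernel here is just checking for the BUG marker.
	if grep -q BUG file; then out=$(git bisect bad); else out=$(git bisect good); fi
	case $out in *"is the first bad commit"*) break ;; esac
done
git log -1 --format=%s refs/bisect/bad     # the first bad commit: "change 4"
```

With a real kernel, each loop iteration is of course a build, boot, and manual
test rather than a grep, which is where the ten-to-twenty builds come from.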

> +change, its commit id and the shortened commit id (the first 12 characters of
> +the commit id). This will lead you to exisiting reports about it, if there 
> are

 existing

> +any.
> +
> +Note, a bisection needs a bit of know-how, which not everyone has, and quite 
> a
> +bit of effort, which not everyone is willing to invest. Nevertheless, it's
> +highly recommended performing a bisection yourself. If you really can't or 
> don't

I would say:
   highly recommended to perform a bisection yourself.

> +want to go down that route at least find out which mainline kernel introduced
> +the regression. If something for example breaks when switching from 5.5.15 to
> +5.8.4, then try at least all the mainline releases in that area (5.6, 5.7 and
> +5.8) to check when it first showed up. Unless you're trying to find a 
> regression
> +in a stable or longterm kernel, avoid testing versions which number has three
> +sections (5.6.12, 5.7.8), as that can lead to confusion and might make your
> +testing useless. Then feel free to go further in the reporting process. But
> +keep in mind: if the developers will be able to help depend on the issue at

depends

> +hand. Sometimes the developers from the report will be able to recognize want
> +went wrong and fix it; other times they might be unable to help unless the
> +reporter performs a bisection.
> +
> +When dealing with regressions make sure the issue you face is really caused 
> by
> +the kernel and not by something else, as outlined above already.
> +
> +In the whole process keep in mind: an issue only qualifies as regression if 
> the
> +older and the newer kernel got build with a similar configuration. The best 
> way

  built

> +to archive this: copy the configuration file (``.config``) from the old 
> kernel
> +freshly to each newer kernel version you try. Afterwards run
> +``make oldnoconfig`` to adjust it for the needs of the new 

Re: [PATCH 2/2] Platform integrity information in sysfs (version 9)

2020-10-03 Thread Randy Dunlap
On 9/30/20 9:37 AM, Daniel Gutson wrote:
> diff --git a/drivers/mtd/spi-nor/controllers/Kconfig 
> b/drivers/mtd/spi-nor/controllers/Kconfig
> index 5c0e0ec2e6d1..e7eaef506fc2 100644
> --- a/drivers/mtd/spi-nor/controllers/Kconfig
> +++ b/drivers/mtd/spi-nor/controllers/Kconfig
> @@ -29,6 +29,7 @@ config SPI_NXP_SPIFI
>  
>  config SPI_INTEL_SPI
>   tristate
> + depends on PLATFORM_INTEGRITY_DATA

So SPI_INTEL_SPI_PCI selects SPI_INTEL_SPI:

config SPI_INTEL_SPI_PCI
tristate "Intel PCH/PCU SPI flash PCI driver (DANGEROUS)"
depends on X86 && PCI
select SPI_INTEL_SPI

without checking that PLATFORM_INTEGRITY_DATA is set/enabled.

"select" does not follow any kconfig dependency chains, so when
PLATFORM_INTEGRITY_DATA is not enabled, this should be causing
a kconfig warning, which is not OK.
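One possible shape for the fix (a sketch only, untested; symbol names as in the
patch): make the user-visible symbol carry the dependency, so the blind select
can never violate it:

```kconfig
config SPI_INTEL_SPI_PCI
	tristate "Intel PCH/PCU SPI flash PCI driver (DANGEROUS)"
	depends on X86 && PCI
	depends on PLATFORM_INTEGRITY_DATA
	select SPI_INTEL_SPI
```

Alternatively the hidden symbol could select PLATFORM_INTEGRITY_DATA instead of
depending on it, if that option has no prompt or dependencies of its own.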


-- 
~Randy



Re: [PATCH] ocfs2: ratelimit the 'max lookup times reached' notice

2020-10-03 Thread Joseph Qi



On 2020/10/2 06:44, Mauricio Faria de Oliveira wrote:
> Running stress-ng on ocfs2 completely fills the kernel log with
> 'max lookup times reached, filesystem may have nested directories.'
> 
> Let's ratelimit this message as done with others in the code.
> 
> Test-case:
> 
>   # mkfs.ocfs2 --mount local $DEV
>   # mount $DEV $MNT
>   # cd $MNT
> 
>   # dmesg -C
>   # stress-ng --dirdeep 1 --dirdeep-ops 1000
>   # dmesg | grep -c 'max lookup times reached'
> 
> Before:
> 
>   # dmesg -C
>   # stress-ng --dirdeep 1 --dirdeep-ops 1000
>   ...
>   stress-ng: info:  [6] successful run completed in 3.03s
> 
>   # dmesg | grep -c 'max lookup times reached'
>   967
> 
> After:
> 
>   # dmesg -C
>   # stress-ng --dirdeep 1 --dirdeep-ops 1000
>   ...
>   stress-ng: info:  [739] successful run completed in 0.96s
> 
>   # dmesg | grep -c 'max lookup times reached'
>   10
> 
>   # dmesg
>   [  259.086086] ocfs2_check_if_ancestor: 1990 callbacks suppressed
>   [  259.086092] (stress-ng-dirde,740,1):ocfs2_check_if_ancestor:1091 max 
> lookup times reached, filesystem may have nested directories, src inode: 
> 18007, dest inode: 17940.
>   ...
> 
> Signed-off-by: Mauricio Faria de Oliveira 

Looks good to me.
Reviewed-by: Joseph Qi 

> ---
>  fs/ocfs2/namei.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
> index 3c908e9416af..0043eddabdb8 100644
> --- a/fs/ocfs2/namei.c
> +++ b/fs/ocfs2/namei.c
> @@ -1095,8 +1095,8 @@ static int ocfs2_check_if_ancestor(struct ocfs2_super 
> *osb,
>   child_inode_no = parent_inode_no;
>  
>   if (++i >= MAX_LOOKUP_TIMES) {
> - mlog(ML_NOTICE, "max lookup times reached, filesystem "
> - "may have nested directories, "
> + mlog_ratelimited(ML_NOTICE, "max lookup times reached, "
> + "filesystem may have nested 
> directories, "
>   "src inode: %llu, dest inode: %llu.\n",
>   (unsigned long long)src_inode_no,
>   (unsigned long long)dest_inode_no);
> 


Re: [PATCH v10 0/7] Introduce sendpage_ok() to detect misused sendpage in network related drivers

2020-10-03 Thread Coly Li
On 2020/10/3 06:28, David Miller wrote:
> From: Coly Li 
> Date: Fri,  2 Oct 2020 16:27:27 +0800
> 
>> As Sagi Grimberg suggested, the original fix is refind to a more common
>> inline routine:
>> static inline bool sendpage_ok(struct page *page)
>> {
>> return  (!PageSlab(page) && page_count(page) >= 1);
>> }
>> If sendpage_ok() returns true, the checking page can be handled by the
>> concrete zero-copy sendpage method in network layer.
> 
> Series applied.
> 
>> The v10 series has 7 patches, fixes a WARN_ONCE() usage from v9 series,
>  ...
> 
> I still haven't heard from you how such a fundamental build failure
> was even possible.
> 

Hi David,

Here are the detailed steps of how I leaked this incomplete patch to you:
1) Add WARN_ONCE() as WARN_ON() to kernel_sendpage(). Maybe I was still
hesitating when I typed WARN_ONCE() on keyboard.
2) Generate the patches, prepare to post
3) Hmm, compiling failed, oh it is WARN_ONCE(). Yeah, WARN_ONCE() might
be more informative and better.
4) Modify to use WARN_ONCE() and compile and try, looks fine.
5) Re-generate the patches to overwrite the previous ones.
6) Post the patches.

The missing part was: before posting the patches, I should have rebased
and committed the change, but (interrupted by other things) it slipped my
mind. Although I regenerated the series, the change was not included. The
result was that an incomplete patch was posted and the second half of the
change stayed in my local file.


> If the v9 patch series did not even compile, how in the world did you
> perform functional testing of these changes?
> 

Only 0002-net-add-WARN_ONCE-in-kernel_sendpage-for-improper-ze.patch was
tested in the v9 series; the other patches were tested in previous versions.

> Please explain this to me, instead of just quietly fixing it and
> posting an updated series.


And not all the patches in the series were tested. Here is the testing
coverage of the series:

The following ones were tested and verified to break nothing and avoid
the mm corruption and panic,
0001-net-introduce-helper-sendpage_ok-in-include-linux-ne.patch
0002-net-add-WARN_ONCE-in-kernel_sendpage-for-improper-ze.patch
0003-nvme-tcp-check-page-by-sendpage_ok-before-calling-ke.patch
0006-scsi-libiscsi-use-sendpage_ok-in-iscsi_tcp_segment_m.patch

The following ones were not tested, due to complicated environment setup,
0005-drbd-code-cleanup-by-using-sendpage_ok-to-check-page.patch
0007-libceph-use-sendpage_ok-in-ceph_tcp_sendpage.patch

This patch I didn't explicitly test, as I lack the knowledge needed to
modify the network code to trigger the buggy condition; it just went along
with the other tested patches,
0004-tcp-use-sendpage_ok-to-detect-misused-.sendpage.patch


Back to the build failure: I have no excuse for leaking this incomplete
version to you. Of course I will try to avoid wasting maintainers' time
with such a silly mess-up.

Thanks for your review and the thorough maintenance.

Coly Li


[PATCH v3 10/10] x86: Reclaim TIF_IA32 and TIF_X32

2020-10-03 Thread Gabriel Krisman Bertazi
Now that these flags are no longer used, reclaim those TI bits.

Signed-off-by: Gabriel Krisman Bertazi 
---
 arch/x86/include/asm/thread_info.h | 4 
 arch/x86/kernel/process_64.c   | 6 --
 2 files changed, 10 deletions(-)

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 267701ae3d86..6888aa39c4d6 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -91,7 +91,6 @@ struct thread_info {
 #define TIF_NEED_FPU_LOAD  14  /* load FPU on return to userspace */
#define TIF_NOCPUID		15	/* CPUID is not accessible in userland */
 #define TIF_NOTSC  16  /* TSC is not accessible in userland */
-#define TIF_IA32   17  /* IA32 compatibility process */
#define TIF_SLD			18	/* Restore split lock detection on context switch */
 #define TIF_MEMDIE 20  /* is terminating due to OOM killer */
#define TIF_POLLING_NRFLAG	21	/* idle is polling for TIF_NEED_RESCHED */
@@ -101,7 +100,6 @@ struct thread_info {
 #define TIF_LAZY_MMU_UPDATES   27  /* task is updating the mmu lazily */
 #define TIF_SYSCALL_TRACEPOINT 28  /* syscall tracepoint instrumentation */
 #define TIF_ADDR32 29  /* 32-bit address space on 64 bits */
-#define TIF_X32			30	/* 32-bit native x86-64 binary */
 #define TIF_FSCHECK31  /* Check FS is USER_DS on return */
 
 #define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE)
@@ -121,7 +119,6 @@ struct thread_info {
 #define _TIF_NEED_FPU_LOAD (1 << TIF_NEED_FPU_LOAD)
 #define _TIF_NOCPUID   (1 << TIF_NOCPUID)
 #define _TIF_NOTSC (1 << TIF_NOTSC)
-#define _TIF_IA32  (1 << TIF_IA32)
 #define _TIF_SLD   (1 << TIF_SLD)
 #define _TIF_POLLING_NRFLAG(1 << TIF_POLLING_NRFLAG)
 #define _TIF_IO_BITMAP (1 << TIF_IO_BITMAP)
@@ -130,7 +127,6 @@ struct thread_info {
 #define _TIF_LAZY_MMU_UPDATES  (1 << TIF_LAZY_MMU_UPDATES)
 #define _TIF_SYSCALL_TRACEPOINT(1 << TIF_SYSCALL_TRACEPOINT)
 #define _TIF_ADDR32(1 << TIF_ADDR32)
-#define _TIF_X32   (1 << TIF_X32)
 #define _TIF_FSCHECK   (1 << TIF_FSCHECK)
 
 /* flags to check in __switch_to() */
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 40fa7973e4f0..9e71101e9d61 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -640,9 +640,7 @@ void set_personality_64bit(void)
/* inherit personality from parent */
 
/* Make sure to be in 64bit mode */
-   clear_thread_flag(TIF_IA32);
clear_thread_flag(TIF_ADDR32);
-   clear_thread_flag(TIF_X32);
/* Pretend that this comes from a 64bit execve */
task_pt_regs(current)->orig_ax = __NR_execve;
current_thread_info()->status &= ~TS_COMPAT;
@@ -659,8 +657,6 @@ void set_personality_64bit(void)
 static void __set_personality_x32(void)
 {
 #ifdef CONFIG_X86_X32
-   clear_thread_flag(TIF_IA32);
-   set_thread_flag(TIF_X32);
if (current->mm)
current->mm->context.flags = 0;
 
@@ -681,8 +677,6 @@ static void __set_personality_x32(void)
 static void __set_personality_ia32(void)
 {
 #ifdef CONFIG_IA32_EMULATION
-   set_thread_flag(TIF_IA32);
-   clear_thread_flag(TIF_X32);
if (current->mm) {
/*
 * uprobes applied to this MM need to know this and
-- 
2.28.0



[PATCH v3 09/10] x86: Convert mmu context ia32_compat into a proper flags field

2020-10-03 Thread Gabriel Krisman Bertazi
The ia32_compat attribute is a weird thing.  It mirrors TIF_IA32 and
TIF_X32 and is used only in two very unrelated places: (1) to decide if
the vsyscall page is accessible (2) for uprobes to find whether the
patched instruction is 32 or 64 bit.  In preparation to remove the TI
flags, we want new values for ia32_compat, but given its odd semantics,
I'd rather make it a real flags field that configures these specific
behaviours.  So, set_personality_x64 can ask for the vsyscall page,
which is not available in x32/ia32 and set_personality_ia32 can
configure the uprobe code as needed.

uprobe cannot rely on other methods like user_64bit_mode() to decide how
to patch, so it needs some specific flag like this.

Signed-off-by: Gabriel Krisman Bertazi 

---
Changes since v2:
  - Rename MM_CONTEXT_GATE_PAGE -> MM_CONTEXT_HAS_VSYSCALL (andy)
---
 arch/x86/entry/vsyscall/vsyscall_64.c |  2 +-
 arch/x86/include/asm/mmu.h|  6 --
 arch/x86/include/asm/mmu_context.h|  2 +-
 arch/x86/kernel/process_64.c  | 17 +++--
 4 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index 44c33103a955..1b40b9297083 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -316,7 +316,7 @@ static struct vm_area_struct gate_vma __ro_after_init = {
 struct vm_area_struct *get_gate_vma(struct mm_struct *mm)
 {
 #ifdef CONFIG_COMPAT
-   if (!mm || mm->context.ia32_compat)
+   if (!mm || !(mm->context.flags & MM_CONTEXT_HAS_VSYSCALL))
return NULL;
 #endif
if (vsyscall_mode == NONE)
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 9257667d13c5..6a00665574ea 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -7,6 +7,9 @@
 #include 
 #include 
 
+#define MM_CONTEXT_UPROBE_IA32 1 /* Uprobes on this MM assume 32-bit code */
+#define MM_CONTEXT_HAS_VSYSCALL 2 /* Whether vsyscall page is accessible on this MM */
+
 /*
  * x86 has arch-specific MMU state beyond what lives in mm_struct.
  */
@@ -33,8 +36,7 @@ typedef struct {
 #endif
 
 #ifdef CONFIG_X86_64
-   /* True if mm supports a task running in 32 bit compatibility mode. */
-   unsigned short ia32_compat;
+   unsigned short flags;
 #endif
 
struct mutex lock;
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index d98016b83755..054a79157323 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -177,7 +177,7 @@ static inline void arch_exit_mmap(struct mm_struct *mm)
 static inline bool is_64bit_mm(struct mm_struct *mm)
 {
return  !IS_ENABLED(CONFIG_IA32_EMULATION) ||
-   !(mm->context.ia32_compat == TIF_IA32);
+   !(mm->context.flags & MM_CONTEXT_UPROBE_IA32);
 }
 #else
 static inline bool is_64bit_mm(struct mm_struct *mm)
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index a4935d134e9d..40fa7973e4f0 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -646,10 +646,8 @@ void set_personality_64bit(void)
/* Pretend that this comes from a 64bit execve */
task_pt_regs(current)->orig_ax = __NR_execve;
current_thread_info()->status &= ~TS_COMPAT;
-
-   /* Ensure the corresponding mm is not marked. */
if (current->mm)
-   current->mm->context.ia32_compat = 0;
+   current->mm->context.flags = MM_CONTEXT_HAS_VSYSCALL;
 
/* TBD: overwrites user setup. Should have two bits.
   But 64bit processes have always behaved this way,
@@ -664,7 +662,8 @@ static void __set_personality_x32(void)
clear_thread_flag(TIF_IA32);
set_thread_flag(TIF_X32);
if (current->mm)
-   current->mm->context.ia32_compat = TIF_X32;
+   current->mm->context.flags = 0;
+
current->personality &= ~READ_IMPLIES_EXEC;
/*
 * in_32bit_syscall() uses the presence of the x32 syscall bit
@@ -684,8 +683,14 @@ static void __set_personality_ia32(void)
 #ifdef CONFIG_IA32_EMULATION
set_thread_flag(TIF_IA32);
clear_thread_flag(TIF_X32);
-   if (current->mm)
-   current->mm->context.ia32_compat = TIF_IA32;
+   if (current->mm) {
+   /*
+* uprobes applied to this MM need to know this and
+* cannot use user_64bit_mode() at that time.
+*/
+   current->mm->context.flags = MM_CONTEXT_UPROBE_IA32;
+   }
+
current->personality |= force_personality32;
/* Prepare the first "return" to user space */
task_pt_regs(current)->orig_ax = __NR_ia32_execve;
-- 
2.28.0



[PATCH v3 04/10] x86: elf: Use e_machine to choose DLINFO in compat

2020-10-03 Thread Gabriel Krisman Bertazi
Since TIF_X32 is going away, avoid using it to find the ELF type on
ARCH_DLINFO.

According to the SysV AMD64 ABI draft, an AMD64 ELF object using ILP32 must
have ELFCLASS32 with (E_MACHINE == EM_X86_64), so use that ELF field to
differentiate an x32 object from an IA32 object when loading ARCH_DLINFO
in compat mode.

Signed-off-by: Gabriel Krisman Bertazi 
Reviewed-by: Andy Lutomirski 
---
 arch/x86/include/asm/elf.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index b9a5d488f1a5..9220efc65d78 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -361,7 +361,7 @@ do {  \
 #define AT_SYSINFO 32
 
 #define COMPAT_ARCH_DLINFO \
-if (test_thread_flag(TIF_X32)) \
+if (exec->e_machine == EM_X86_64)  \
ARCH_DLINFO_X32;\
 else   \
ARCH_DLINFO_IA32
-- 
2.28.0



[PATCH v3 05/10] elf: Expose ELF header in compat_start_thread

2020-10-03 Thread Gabriel Krisman Bertazi
Like it is done for SET_PERSONALITY with x86, which requires the ELF
header to select correct personality parameters, x86 requires the
headers in compat_start_thread to choose the starting CS for ELF32
binaries, instead of relying on the going-away TIF_IA32/X32 flags.

This patch adds an indirection macro to ELF invocations of START_THREAD,
which x86 can reimplement to receive the extra parameter just for ELF
files.  This requires no changes to other architectures that don't need
the header information: they can continue to use the original
start_thread for ELF and non-ELF binaries, and it prevents affecting
non-ELF code paths for x86.
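The macro indirection can be demonstrated standalone.  This is a toy model, not kernel code: `start_thread`/`compat_start_thread` here are stubs that only record which CS would be chosen, and the register/ip/sp arguments are plain ints.

```c
#include <assert.h>

struct elfhdr { int e_machine; };
#define EM_X86_64 62

static int chosen_cs;

static void start_thread(int regs, int ip, int sp)
{
	(void)regs; (void)ip; (void)sp;
	chosen_cs = 64;			/* native path: always the 64-bit CS */
}

static void compat_start_thread(int regs, int ip, int sp, int x32)
{
	(void)regs; (void)ip; (void)sp;
	chosen_cs = x32 ? 64 : 32;	/* x32 keeps the 64-bit CS */
}

/* include/linux/elf.h-style fallback: drop the ELF header argument */
#ifndef START_THREAD
#define START_THREAD(ex, regs, ip, sp) start_thread(regs, ip, sp)
#endif

int main(void)
{
	struct elfhdr x32 = { EM_X86_64 }, ia32 = { 3 /* EM_386 */ };

	START_THREAD(&x32, 0, 0, 0);	/* generic: header is ignored */
	assert(chosen_cs == 64);

	/* fs/compat_binfmt_elf.c-style override: forward header info */
#undef START_THREAD
#define COMPAT_START_THREAD(ex, regs, ip, sp) \
	compat_start_thread(regs, ip, sp, (ex)->e_machine == EM_X86_64)
#define START_THREAD COMPAT_START_THREAD

	START_THREAD(&x32, 0, 0, 0);
	assert(chosen_cs == 64);
	START_THREAD(&ia32, 0, 0, 0);
	assert(chosen_cs == 32);
	return 0;
}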

Signed-off-by: Gabriel Krisman Bertazi 
---
 fs/binfmt_elf.c| 2 +-
 fs/compat_binfmt_elf.c | 9 +++--
 include/linux/elf.h| 5 +
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 13d053982dd7..7fec77a38b8d 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1279,7 +1279,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
 #endif
 
finalize_exec(bprm);
-   start_thread(regs, elf_entry, bprm->p);
+   START_THREAD(elf_ex, regs, elf_entry, bprm->p);
retval = 0;
 out:
return retval;
diff --git a/fs/compat_binfmt_elf.c b/fs/compat_binfmt_elf.c
index 2d24c765cbd7..12b991368f0a 100644
--- a/fs/compat_binfmt_elf.c
+++ b/fs/compat_binfmt_elf.c
@@ -106,8 +106,13 @@
 #endif
 
 #ifdef compat_start_thread
-#undef start_thread
-#define start_thread compat_start_thread
+#define COMPAT_START_THREAD(ex, regs, new_ip, new_sp)  \
+   compat_start_thread(regs, new_ip, new_sp)
+#endif
+
+#ifdef COMPAT_START_THREAD
+#undef START_THREAD
+#define START_THREAD   COMPAT_START_THREAD
 #endif
 
 #ifdef compat_arch_setup_additional_pages
diff --git a/include/linux/elf.h b/include/linux/elf.h
index 5d5b0321da0b..6dbcfe7a3fd7 100644
--- a/include/linux/elf.h
+++ b/include/linux/elf.h
@@ -22,6 +22,11 @@
SET_PERSONALITY(ex)
 #endif
 
+#ifndef START_THREAD
+#define START_THREAD(elf_ex, regs, elf_entry, start_stack) \
+   start_thread(regs, elf_entry, start_stack)
+#endif
+
 #define ELF32_GNU_PROPERTY_ALIGN   4
 #define ELF64_GNU_PROPERTY_ALIGN   8
 
-- 
2.28.0



[PATCH v3 08/10] x86: elf: Use e_machine to select additional_pages between x32

2020-10-03 Thread Gabriel Krisman Bertazi
Since TIF_X32 is going away, avoid using it to find the ELF type when
choosing which additional pages to set up.

According to the SysV AMD64 ABI draft, an AMD64 ELF object using ILP32 must
have ELFCLASS32 with (E_MACHINE == EM_X86_64), so use that ELF field to
differentiate an x32 object from an IA32 object when executing
setup_additional_pages in compat mode.

Signed-off-by: Gabriel Krisman Bertazi 

---
Changes since v2:
  - Avoid a function-like macro in compat_setup_additional_pages (Andy)
---
 arch/x86/entry/vdso/vma.c  | 4 ++--
 arch/x86/include/asm/elf.h | 6 --
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 9185cb1d13b9..50e5d3a2e70a 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -413,10 +413,10 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 
 #ifdef CONFIG_COMPAT
 int compat_arch_setup_additional_pages(struct linux_binprm *bprm,
-  int uses_interp)
+  int uses_interp, bool x32)
 {
 #ifdef CONFIG_X86_X32_ABI
-   if (test_thread_flag(TIF_X32)) {
+   if (x32) {
if (!vdso64_enabled)
return 0;
return map_vdso_randomized(_image_x32);
diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index 109697a19eb1..44a9b9940535 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -383,8 +383,10 @@ struct linux_binprm;
 extern int arch_setup_additional_pages(struct linux_binprm *bprm,
   int uses_interp);
 extern int compat_arch_setup_additional_pages(struct linux_binprm *bprm,
- int uses_interp);
-#define compat_arch_setup_additional_pages compat_arch_setup_additional_pages
+ int uses_interp, bool x32);
+#define COMPAT_ARCH_SETUP_ADDITIONAL_PAGES(bprm, ex, interpreter)  \
+   compat_arch_setup_additional_pages(bprm, interpreter,   \
+  (ex->e_machine == EM_X86_64))
 
 /* Do not change the values. See get_align_mask() */
 enum align_flags {
-- 
2.28.0



[PATCH v3 07/10] elf: Expose ELF header on arch_setup_additional_pages

2020-10-03 Thread Gabriel Krisman Bertazi
Like it is done for SET_PERSONALITY with ARM, which requires the ELF
header to select correct personality parameters, x86 requires the
headers when selecting which vdso to load, instead of relying on the
going-away TIF_IA32/X32 flags.  This patch adds an indirection macro to
arch_setup_additional_pages, that x86 can reimplement to receive the
extra parameter just for ELF files.  This requires no changes to other
architectures, which can continue to use the original
arch_setup_additional_pages for ELF and non-ELF binaries.

Signed-off-by: Gabriel Krisman Bertazi 
---
 fs/binfmt_elf.c|  2 +-
 fs/compat_binfmt_elf.c | 11 ---
 include/linux/elf.h|  5 +
 3 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 7fec77a38b8d..b41ed57271da 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1218,7 +1218,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
	set_binfmt(&elf_format);
 
 #ifdef ARCH_HAS_SETUP_ADDITIONAL_PAGES
-   retval = arch_setup_additional_pages(bprm, !!interpreter);
+   retval = ARCH_SETUP_ADDITIONAL_PAGES(bprm, elf_ex, !!interpreter);
if (retval < 0)
goto out;
 #endif /* ARCH_HAS_SETUP_ADDITIONAL_PAGES */
diff --git a/fs/compat_binfmt_elf.c b/fs/compat_binfmt_elf.c
index 12b991368f0a..2c557229696a 100644
--- a/fs/compat_binfmt_elf.c
+++ b/fs/compat_binfmt_elf.c
@@ -115,11 +115,16 @@
 #define START_THREAD   COMPAT_START_THREAD
 #endif
 
-#ifdef compat_arch_setup_additional_pages
+#ifdef compat_arch_setup_additional_pages
+#define COMPAT_ARCH_SETUP_ADDITIONAL_PAGES(bprm, ex, interpreter) \
+   compat_arch_setup_additional_pages(bprm, interpreter)
+#endif
+
+#ifdef COMPAT_ARCH_SETUP_ADDITIONAL_PAGES
 #undef ARCH_HAS_SETUP_ADDITIONAL_PAGES
 #define ARCH_HAS_SETUP_ADDITIONAL_PAGES 1
-#undef arch_setup_additional_pages
-#define arch_setup_additional_pages compat_arch_setup_additional_pages
+#undef ARCH_SETUP_ADDITIONAL_PAGES
+#define ARCH_SETUP_ADDITIONAL_PAGES COMPAT_ARCH_SETUP_ADDITIONAL_PAGES
 #endif
 
 #ifdef compat_elf_read_implies_exec
diff --git a/include/linux/elf.h b/include/linux/elf.h
index 6dbcfe7a3fd7..c9a46c4e183b 100644
--- a/include/linux/elf.h
+++ b/include/linux/elf.h
@@ -27,6 +27,11 @@
start_thread(regs, elf_entry, start_stack)
 #endif
 
+#if defined(ARCH_HAS_SETUP_ADDITIONAL_PAGES) && !defined(ARCH_SETUP_ADDITIONAL_PAGES)
+#define ARCH_SETUP_ADDITIONAL_PAGES(bprm, ex, interpreter) \
+   arch_setup_additional_pages(bprm, interpreter)
+#endif
+
 #define ELF32_GNU_PROPERTY_ALIGN   4
 #define ELF64_GNU_PROPERTY_ALIGN   8
 
-- 
2.28.0



[PATCH v3 06/10] x86: elf: Use e_machine to select start_thread for x32

2020-10-03 Thread Gabriel Krisman Bertazi
Since TIF_X32 is going away, avoid using it to find the ELF type in
compat_start_thread.

According to the SysV AMD64 ABI draft, an AMD64 ELF object using ILP32 must
have ELFCLASS32 with (E_MACHINE == EM_X86_64), so use that ELF field to
differentiate an x32 object from an IA32 object when executing
start_thread in compat mode.

Signed-off-by: Gabriel Krisman Bertazi 

---
Changes since v2:
  - Avoid a function-like macro in compat_start_thread (Andy)
---
 arch/x86/include/asm/elf.h   | 5 +++--
 arch/x86/kernel/process_64.c | 5 ++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index 9220efc65d78..109697a19eb1 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -186,8 +186,9 @@ static inline void elf_common_init(struct thread_struct *t,
#define COMPAT_ELF_PLAT_INIT(regs, load_addr)  \
	elf_common_init(&current->thread, regs, __USER_DS)
 
-void compat_start_thread(struct pt_regs *regs, u32 new_ip, u32 new_sp);
-#define compat_start_thread compat_start_thread
+void compat_start_thread(struct pt_regs *regs, u32 new_ip, u32 new_sp, bool x32);
+#define COMPAT_START_THREAD(ex, regs, new_ip, new_sp)  \
+   compat_start_thread(regs, new_ip, new_sp, ex->e_machine == EM_X86_64)
 
 void set_personality_ia32(bool);
 #define COMPAT_SET_PERSONALITY(ex) \
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 9afefe325acb..a4935d134e9d 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -511,11 +511,10 @@ start_thread(struct pt_regs *regs, unsigned long new_ip, unsigned long new_sp)
 EXPORT_SYMBOL_GPL(start_thread);
 
 #ifdef CONFIG_COMPAT
-void compat_start_thread(struct pt_regs *regs, u32 new_ip, u32 new_sp)
+void compat_start_thread(struct pt_regs *regs, u32 new_ip, u32 new_sp, bool x32)
 {
start_thread_common(regs, new_ip, new_sp,
-   test_thread_flag(TIF_X32)
-   ? __USER_CS : __USER32_CS,
+   x32 ? __USER_CS : __USER32_CS,
__USER_DS, __USER_DS);
 }
 #endif
-- 
2.28.0



[PATCH v3 03/10] x86: oprofile: Avoid TIF_IA32 when checking 64bit mode

2020-10-03 Thread Gabriel Krisman Bertazi
In preparation to remove TIF_IA32, stop using it in oprofile code.

Signed-off-by: Gabriel Krisman Bertazi 
---
 arch/x86/oprofile/backtrace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/oprofile/backtrace.c b/arch/x86/oprofile/backtrace.c
index a2488b6e27d6..1d8391fcca68 100644
--- a/arch/x86/oprofile/backtrace.c
+++ b/arch/x86/oprofile/backtrace.c
@@ -49,7 +49,7 @@ x86_backtrace_32(struct pt_regs * const regs, unsigned int depth)
struct stack_frame_ia32 *head;
 
/* User process is IA32 */
-   if (!current || !test_thread_flag(TIF_IA32))
+   if (!current || user_64bit_mode(regs))
return 0;
 
head = (struct stack_frame_ia32 *) regs->bp;
-- 
2.28.0



[PATCH v3 00/10] Reclaim TIF_IA32 and TIF_X32

2020-10-03 Thread Gabriel Krisman Bertazi
This is the third version of the patch to reclaim those TI flags.  The
main difference from v2 is that it exports the elf32 headers in the
macros in a proper way, instead of doing some magic to use them.

Andy, I didn't follow (my understanding of) your suggestion to expose
the elf32 headers, because doing that in compat_start_thread would mean
also doing it in start_thread, but the latter is not ELF-specific.
The mechanism I used, which solves the issue and which I hope is not
over-complex, is the same one SET_PERSONALITY uses, so there is
precedent.  It also has the benefit that we don't need to touch other
architectures' functions.  Do you think the approach in this patch
series is fine?

This also drops the vmx patch, since that is being reworked by Sean and
Andy, and my patch doesn't change its behavior.

* original cover letter:

We are running out of TI flags for x86.  This patchset removes several
usages of TIF_IA32 and TIF_X32 in preparation to reclaim these flags.
After these cleanups, there is still one more user of each of them,
which I need to take a closer look at before removing.

Many of the ideas for this patchset came from Andy Lutomirski (Thank
you!)

These were tested by exercising these paths with x32 and ia32 binaries.

Gabriel Krisman Bertazi (10):
  x86: events: Avoid TIF_IA32 when checking 64bit mode
  x86: Simplify compat syscall userspace allocation
  x86: oprofile: Avoid TIF_IA32 when checking 64bit mode
  x86: elf: Use e_machine to choose DLINFO in compat
  elf: Expose ELF header in compat_start_thread
  x86: elf: Use e_machine to select start_thread for x32
  elf: Expose ELF header on arch_setup_additional_pages
  x86: elf: Use e_machine to select additional_pages between x32
  x86: Convert mmu context ia32_compat into a proper flags field
  x86: Reclaim TIF_IA32 and TIF_X32

 arch/x86/entry/vdso/vma.c |  4 ++--
 arch/x86/entry/vsyscall/vsyscall_64.c |  2 +-
 arch/x86/events/core.c|  2 +-
 arch/x86/events/intel/ds.c|  2 +-
 arch/x86/events/intel/lbr.c   |  2 +-
 arch/x86/include/asm/compat.h | 15 +++---
 arch/x86/include/asm/elf.h| 13 -
 arch/x86/include/asm/mmu.h|  6 --
 arch/x86/include/asm/mmu_context.h|  2 +-
 arch/x86/include/asm/thread_info.h|  4 
 arch/x86/kernel/perf_regs.c   |  2 +-
 arch/x86/kernel/process_64.c  | 28 +--
 arch/x86/oprofile/backtrace.c |  2 +-
 fs/binfmt_elf.c   |  4 ++--
 fs/compat_binfmt_elf.c| 20 ++-
 include/linux/elf.h   | 10 ++
 16 files changed, 68 insertions(+), 50 deletions(-)

-- 
2.28.0



[PATCH v3 01/10] x86: events: Avoid TIF_IA32 when checking 64bit mode

2020-10-03 Thread Gabriel Krisman Bertazi
In preparation to remove TIF_IA32, stop using it in perf events code.

Tested by running perf on 32-bit, 64-bit and x32 applications.

Suggested-by: Andy Lutomirski 
Signed-off-by: Gabriel Krisman Bertazi 
Acked-by: Peter Zijlstra (Intel) 
---
 arch/x86/events/core.c  | 2 +-
 arch/x86/events/intel/ds.c  | 2 +-
 arch/x86/events/intel/lbr.c | 2 +-
 arch/x86/kernel/perf_regs.c | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 1cbf57dc2ac8..4fe82d9d959b 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2499,7 +2499,7 @@ perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry_ctx *ent
struct stack_frame_ia32 frame;
const struct stack_frame_ia32 __user *fp;
 
-   if (!test_thread_flag(TIF_IA32))
+   if (user_64bit_mode(regs))
return 0;
 
cs_base = get_segment_base(regs->cs);
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 86848c57b55e..94bd0d3acd15 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1261,7 +1261,7 @@ static int intel_pmu_pebs_fixup_ip(struct pt_regs *regs)
old_to = to;
 
 #ifdef CONFIG_X86_64
-   is_64bit = kernel_ip(to) || !test_thread_flag(TIF_IA32);
+   is_64bit = kernel_ip(to) || any_64bit_mode(regs);
 #endif
insn_init(, kaddr, size, is_64bit);
insn_get_length();
diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 8961653c5dd2..1aadb253d296 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -1221,7 +1221,7 @@ static int branch_type(unsigned long from, unsigned long to, int abort)
 * on 64-bit systems running 32-bit apps
 */
 #ifdef CONFIG_X86_64
-   is64 = kernel_ip((unsigned long)addr) || !test_thread_flag(TIF_IA32);
+   is64 = kernel_ip((unsigned long)addr) || any_64bit_mode(current_pt_regs());
 #endif
insn_init(, addr, bytes_read, is64);
insn_get_opcode();
diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c
index bb7e1132290b..9332c49a64a8 100644
--- a/arch/x86/kernel/perf_regs.c
+++ b/arch/x86/kernel/perf_regs.c
@@ -123,7 +123,7 @@ int perf_reg_validate(u64 mask)
 
 u64 perf_reg_abi(struct task_struct *task)
 {
-   if (test_tsk_thread_flag(task, TIF_IA32))
+   if (!user_64bit_mode(task_pt_regs(task)))
return PERF_SAMPLE_REGS_ABI_32;
else
return PERF_SAMPLE_REGS_ABI_64;
-- 
2.28.0



[PATCH v3 02/10] x86: Simplify compat syscall userspace allocation

2020-10-03 Thread Gabriel Krisman Bertazi
When allocating user memory space for a compat system call, don't
consider whether the originating code is IA32 or X32, just allocate from
a safe region for both, beyond the redzone.  This should be safe for
IA32, and has the benefit of avoiding TIF_IA32, which we want to drop.

Suggested-by: Andy Lutomirski 
Cc: Christoph Hellwig 
Signed-off-by: Gabriel Krisman Bertazi 
---
 arch/x86/include/asm/compat.h | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/compat.h b/arch/x86/include/asm/compat.h
index d4edf281fff4..a4b5126dff4e 100644
--- a/arch/x86/include/asm/compat.h
+++ b/arch/x86/include/asm/compat.h
@@ -179,14 +179,13 @@ typedef struct user_regs_struct compat_elf_gregset_t;
 
 static inline void __user *arch_compat_alloc_user_space(long len)
 {
-   compat_uptr_t sp;
-
-   if (test_thread_flag(TIF_IA32)) {
-   sp = task_pt_regs(current)->sp;
-   } else {
-   /* -128 for the x32 ABI redzone */
-   sp = task_pt_regs(current)->sp - 128;
-   }
+   compat_uptr_t sp = task_pt_regs(current)->sp;
+
+   /*
+* -128 for the x32 ABI redzone.  For IA32, it is not strictly
+* necessary, but not harmful.
+*/
+   sp -= 128;
 
return (void __user *)round_down(sp - len, 16);
 }
-- 
2.28.0



[PATCH v4 1/2] dt-bindings: hwmon: max20730: adding device tree doc for max20730

2020-10-03 Thread Chu Lin
The MAX20730 is an integrated, step-down switching regulator with PMBus.

Signed-off-by: Chu Lin 
---
ChangeLog v1 -> v2
  hwmon: pmbus: max20730:
  - Don't do anything to the ret if an error is returned from pmbus_read_word
  - avoid overflow when doing multiplication

ChangeLog v2 -> v3
  dt-bindings: hwmon: max20730:
  - Provide the binding documentation in yaml format
  hwmon: pmbus: max20730:
  - No change

ChangeLog v3 -> v4
  dt-bindings: hwmon: max20730:
  - Fix highefficiency to high efficiency in description
  - Fix presents to present in vout-voltage-divider
  - Add additionalProperties: false
  hwmon: pmbus: max20730:
  - No change

 .../bindings/hwmon/maxim,max20730.yaml| 65 +++
 1 file changed, 65 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/hwmon/maxim,max20730.yaml

diff --git a/Documentation/devicetree/bindings/hwmon/maxim,max20730.yaml b/Documentation/devicetree/bindings/hwmon/maxim,max20730.yaml
new file mode 100644
index ..93e86e3b4602
--- /dev/null
+++ b/Documentation/devicetree/bindings/hwmon/maxim,max20730.yaml
@@ -0,0 +1,65 @@
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+%YAML 1.2
+---
+
+$id: http://devicetree.org/schemas/hwmon/maxim,max20730.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Maxim max20730
+
+maintainers:
+  - Jean Delvare 
+  - Guenter Roeck 
+
+description: |
+  The MAX20730 is a fully integrated, highly efficient switching regulator
+  with PMBus for applications operating from 4.5V to 16V and requiring
+  up to 25A (max) load. This single-chip regulator provides extremely
+  compact, high efficiency power-delivery solutions with high-precision
+  output voltages and excellent transient response.
+
+  Datasheets:
+https://datasheets.maximintegrated.com/en/ds/MAX20730.pdf
+https://datasheets.maximintegrated.com/en/ds/MAX20734.pdf
+https://datasheets.maximintegrated.com/en/ds/MAX20743.pdf
+
+properties:
+  compatible:
+enum:
+  - maxim,max20730
+  - maxim,max20734
+  - maxim,max20743
+
+  reg:
+maxItems: 1
+
+  vout-voltage-divider:
+description: |
+  If a voltage divider is present at vout, the voltage at the voltage
+  sense pin will be scaled down. This property lets the driver convert
+  the raw reading back to a meaningful number. It holds two values:
+  the first is the output resistance, the second is the total
+  resistance of the divider. The adjusted reading is therefore
+  Vout = Vout_raw * total_resistance / output_resistance.
+$ref: /schemas/types.yaml#/definitions/uint32-array
+minItems: 2
+maxItems: 2
+
+required:
+  - compatible
+  - reg
+
+additionalProperties: false
+
+examples:
+  - |
+i2c {
+  #address-cells = <1>;
+  #size-cells = <0>;
+
+  max20730@10 {
+compatible = "maxim,max20730";
+reg = <0x10>;
+vout-voltage-divider = <1000 2000>; // vout would be scaled to 0.5
+  };
+};
-- 
2.28.0.806.g8561365e88-goog



[PATCH v4 0/2] hwmon: pmbus: max20730: adjust the vout base on

2020-10-03 Thread Chu Lin
The patchset includes:
Patch #1 - Implementation of adjusting vout based on the voltage divider
Patch #2 - device tree binding documentation

ChangeLog v1 -> v2
  hwmon: pmbus: max20730:
  - Don't do anything to the ret if an error is returned from pmbus_read_word
  - avoid overflow when doing multiplication

ChangeLog v2 -> v3
  dt-bindings: hwmon: max20730:
  - Provide the binding documentation in yaml format
  hwmon: pmbus: max20730:
  - No change

ChangeLog v3 -> v4
  dt-bindings: hwmon: max20730:
  - Fix highefficiency to high efficiency in description
  - Fix presents to present in vout-voltage-divider
  - Add additionalProperties: false
  hwmon: pmbus: max20730:
  - No change

Chu Lin (2):
  dt-bindings: hwmon: max20730: adding device tree doc for max20730
  hwmon: pmbus: max20730: adjust the vout reading given voltage divider

 .../bindings/hwmon/maxim,max20730.yaml| 65 +++
 drivers/hwmon/pmbus/max20730.c| 18 +
 2 files changed, 83 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/hwmon/maxim,max20730.yaml

-- 
2.28.0.806.g8561365e88-goog



[PATCH v4 2/2] hwmon: pmbus: max20730: adjust the vout reading given voltage divider

2020-10-03 Thread Chu Lin
Problem:
We use voltage dividers, so the raw voltage presented at the voltage
sense pins is scaled down and confusing on its own. We need to convert
these readings to more meaningful ones, given the voltage divider.

Solution:
Read the voltage divider resistance from dts and convert the voltage
reading to a more meaningful reading.

Testing:
max20730 with voltage divider

Signed-off-by: Chu Lin 
---
ChangeLog v1 -> v2
  hwmon: pmbus: max20730:
  - Don't do anything to the ret if an error is returned from pmbus_read_word
  - avoid overflow when doing multiplication

ChangeLog v2 -> v3
  dt-bindings: hwmon: max20730:
  - Provide the binding documentation in yaml format
  hwmon: pmbus: max20730:
  - No change

ChangeLog v3 -> v4
  dt-bindings: hwmon: max20730:
  - Fix highefficiency to high efficiency in description
  - Fix presents to present in vout-voltage-divider
  - Add additionalProperties: false
  hwmon: pmbus: max20730:
  - No change

 drivers/hwmon/pmbus/max20730.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/drivers/hwmon/pmbus/max20730.c b/drivers/hwmon/pmbus/max20730.c
index a151a2b588a5..fbf2f1e6c969 100644
--- a/drivers/hwmon/pmbus/max20730.c
+++ b/drivers/hwmon/pmbus/max20730.c
@@ -31,6 +31,7 @@ struct max20730_data {
struct pmbus_driver_info info;
struct mutex lock;  /* Used to protect against parallel writes */
u16 mfr_devset1;
+   u32 vout_voltage_divider[2];
 };
 
 #define to_max20730_data(x)  container_of(x, struct max20730_data, info)
@@ -114,6 +115,14 @@ static int max20730_read_word_data(struct i2c_client 
*client, int page,
max_c = max_current[data->id][(data->mfr_devset1 >> 5) & 0x3];
ret = val_to_direct(max_c, PSC_CURRENT_OUT, info);
break;
+   case PMBUS_READ_VOUT:
+   ret = pmbus_read_word_data(client, page, phase, reg);
+   if (ret > 0 && data->vout_voltage_divider[0] && data->vout_voltage_divider[1]) {
+   u64 temp = DIV_ROUND_CLOSEST_ULL((u64)ret * data->vout_voltage_divider[1],
+    data->vout_voltage_divider[0]);
+   ret = clamp_val(temp, 0, 0xffff);
+   }
+   break;
default:
ret = -ENODATA;
break;
@@ -364,6 +373,15 @@ static int max20730_probe(struct i2c_client *client,
	data->id = chip_id;
	mutex_init(&data->lock);
	memcpy(&data->info, &max20730_info[chip_id], sizeof(data->info));
+   if (of_property_read_u32_array(client->dev.of_node, "vout-voltage-divider",
+  data->vout_voltage_divider,
+  ARRAY_SIZE(data->vout_voltage_divider)) != 0)
+   memset(data->vout_voltage_divider, 0, sizeof(data->vout_voltage_divider));
+   if (data->vout_voltage_divider[1] < data->vout_voltage_divider[0]) {
+   dev_err(dev,
+   "The total resistance of voltage divider is less than output resistance\n");
+   return -ENODEV;
+   }
 
ret = i2c_smbus_read_word_data(client, MAX20730_MFR_DEVSET1);
if (ret < 0)
-- 
2.28.0.806.g8561365e88-goog



security/integrity/platform_certs/keyring_handler.c:62:30: warning: no previous prototype for 'get_handler_for_db'

2020-10-03 Thread kernel test robot
Hi Nayna,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
head:   22fbc037cd32e4e6771d2271b565806cfb8c134c
commit: ad723674d6758478829ee766e3f1a2a24d56236f x86/efi: move common keyring handler functions to new file
date:   11 months ago
config: ia64-randconfig-r014-20201004 (attached as .config)
compiler: ia64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ad723674d6758478829ee766e3f1a2a24d56236f
git remote add linus https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git fetch --no-tags linus master
git checkout ad723674d6758478829ee766e3f1a2a24d56236f
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=ia64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

>> security/integrity/platform_certs/keyring_handler.c:62:30: warning: no 
>> previous prototype for 'get_handler_for_db' [-Wmissing-prototypes]
  62 | __init efi_element_handler_t get_handler_for_db(const efi_guid_t 
*sig_type)
 |  ^~
>> security/integrity/platform_certs/keyring_handler.c:73:30: warning: no 
>> previous prototype for 'get_handler_for_dbx' [-Wmissing-prototypes]
  73 | __init efi_element_handler_t get_handler_for_dbx(const efi_guid_t 
*sig_type)
 |  ^~~

vim +/get_handler_for_db +62 security/integrity/platform_certs/keyring_handler.c

57  
58  /*
59   * Return the appropriate handler for particular signature list types found in
60   * the UEFI db and MokListRT tables.
61   */
  > 62  __init efi_element_handler_t get_handler_for_db(const efi_guid_t *sig_type)
63  {
64  if (efi_guidcmp(*sig_type, efi_cert_x509_guid) == 0)
65  return add_to_platform_keyring;
66  return 0;
67  }
68  
69  /*
70   * Return the appropriate handler for particular signature list types found in
71   * the UEFI dbx and MokListXRT tables.
72   */
  > 73  __init efi_element_handler_t get_handler_for_dbx(const efi_guid_t *sig_type)

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org




Re: [RFC][PATCHSET] epoll cleanups

2020-10-03 Thread Al Viro
On Sun, Oct 04, 2020 at 03:36:08AM +0100, Al Viro wrote:
>   Locking and especilly control flow in fs/eventpoll.c is
> overcomplicated.  As the result, the code has been hard to follow
> and easy to fuck up while modifying.
> 
>   The following series attempts to untangle it; there's more to be done
> there, but this should take care of some of the obfuscated bits.  It also
> reduces the memory footprint of that thing.
> 
>   The series lives in
> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #experimental.epoll
> and it survives light local beating.  It really needs review and testing.
> I'll post the individual patches in followups (27 commits, trimming about 120
> lines out of fs/eventpoll.c).

PS: the posted series is on top of #work.epoll, which got merged into mainline
on Friday.  Forgot to mention that - my apologies.


[RFC PATCH 09/27] reverse_path_check_proc(): don't bother with cookies

2020-10-03 Thread Al Viro
From: Al Viro 

We know there are no loops by the time we call it; the
only thing we care about is overly deep reverse paths.

Signed-off-by: Al Viro 
---
 fs/eventpoll.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 3e6f1f97f246..0f540e91aa92 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1311,7 +1311,7 @@ static int reverse_path_check_proc(struct file *file, int 
depth)
int error = 0;
struct epitem *epi;
 
-   if (!ep_push_nested(file)) /* limits recursion */
+   if (depth > EP_MAX_NESTS) /* too deep nesting */
return -1;
 
/* CTL_DEL can remove links here, but that can't increase our count */
@@ -1336,7 +1336,6 @@ static int reverse_path_check_proc(struct file *file, int 
depth)
}
}
rcu_read_unlock();
-   nesting--; /* pop */
return error;
 }
 
-- 
2.11.0



[RFC PATCH 02/27] epoll: get rid of epitem->nwait

2020-10-03 Thread Al Viro
From: Al Viro 

We only use it to report allocation failures from the queueing
callback back to ep_insert().  We might as well use epq.epi for that
reporting...
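A userspace sketch of that reporting scheme (illustrative only; the names
ep_pqueue_sketch, queue_proc, and fail_alloc are made up, not the kernel's):
the callback signals failure by clearing a pointer in the container it is
embedded in, instead of keeping a per-item counter.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for poll_table. */
struct poll_table_sketch {
	void (*qproc)(struct poll_table_sketch *);
};

/* Stand-in for struct ep_pqueue: the poll table plus the context pointer. */
struct ep_pqueue_sketch {
	struct poll_table_sketch pt;
	void *epi;		/* NULL => an allocation failed */
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static int fail_alloc;		/* test knob: pretend the allocator failed */

static void queue_proc(struct poll_table_sketch *pt)
{
	struct ep_pqueue_sketch *epq =
		container_of(pt, struct ep_pqueue_sketch, pt);

	if (!epq->epi)		/* an earlier allocation already failed */
		return;
	if (fail_alloc) {	/* as if kmem_cache_alloc() returned NULL */
		epq->epi = NULL; /* report the failure to the caller */
		return;
	}
	/* normal path: the kernel would link a wait queue entry here */
}
```

The caller then checks `epq.epi` once, after the fact, instead of inspecting
a counter on every item.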

Signed-off-by: Al Viro 
---
 fs/eventpoll.c | 46 --
 1 file changed, 20 insertions(+), 26 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index ae41868d9b35..44aca681d897 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -172,9 +172,6 @@ struct epitem {
/* The file descriptor information this item refers to */
struct epoll_filefd ffd;
 
-   /* Number of active wait queue attached to poll operations */
-   int nwait;
-
/* List containing poll wait queues */
struct eppoll_entry *pwqlist;
 
@@ -351,12 +348,6 @@ static inline struct epitem 
*ep_item_from_wait(wait_queue_entry_t *p)
return container_of(p, struct eppoll_entry, wait)->base;
 }
 
-/* Get the "struct epitem" from an epoll queue wrapper */
-static inline struct epitem *ep_item_from_epqueue(poll_table *p)
-{
-   return container_of(p, struct ep_pqueue, pt)->epi;
-}
-
 /* Initialize the poll safe wake up structure */
 static void ep_nested_calls_init(struct nested_calls *ncalls)
 {
@@ -1307,24 +1298,28 @@ static int ep_poll_callback(wait_queue_entry_t *wait, 
unsigned mode, int sync, v
 static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
 poll_table *pt)
 {
-   struct epitem *epi = ep_item_from_epqueue(pt);
+   struct ep_pqueue *epq = container_of(pt, struct ep_pqueue, pt);
+   struct epitem *epi = epq->epi;
struct eppoll_entry *pwq;
 
-   if (epi->nwait >= 0 && (pwq = kmem_cache_alloc(pwq_cache, GFP_KERNEL))) {
-   init_waitqueue_func_entry(&pwq->wait, ep_poll_callback);
-   pwq->whead = whead;
-   pwq->base = epi;
-   if (epi->event.events & EPOLLEXCLUSIVE)
-   add_wait_queue_exclusive(whead, &pwq->wait);
-   else
-   add_wait_queue(whead, &pwq->wait);
-   pwq->next = epi->pwqlist;
-   epi->pwqlist = pwq;
-   epi->nwait++;
-   } else {
-   /* We have to signal that an error occurred */
-   epi->nwait = -1;
+   if (unlikely(!epi)) // an earlier allocation has failed
+   return;
+
+   pwq = kmem_cache_alloc(pwq_cache, GFP_KERNEL);
+   if (unlikely(!pwq)) {
+   epq->epi = NULL;
+   return;
}
+
+   init_waitqueue_func_entry(&pwq->wait, ep_poll_callback);
+   pwq->whead = whead;
+   pwq->base = epi;
+   if (epi->event.events & EPOLLEXCLUSIVE)
+   add_wait_queue_exclusive(whead, &pwq->wait);
+   else
+   add_wait_queue(whead, &pwq->wait);
+   pwq->next = epi->pwqlist;
+   epi->pwqlist = pwq;
 }
 
 static void ep_rbtree_insert(struct eventpoll *ep, struct epitem *epi)
@@ -1510,7 +1505,6 @@ static int ep_insert(struct eventpoll *ep, const struct 
epoll_event *event,
epi->ep = ep;
ep_set_ffd(&epi->ffd, tfile, fd);
epi->event = *event;
-   epi->nwait = 0;
epi->next = EP_UNACTIVE_PTR;
if (epi->event.events & EPOLLWAKEUP) {
error = ep_create_wakeup_source(epi);
@@ -1555,7 +1549,7 @@ static int ep_insert(struct eventpoll *ep, const struct 
epoll_event *event,
 * high memory pressure.
 */
error = -ENOMEM;
-   if (epi->nwait < 0)
+   if (!epq.epi)
goto error_unregister;
 
/* We have to drop the new item inside our item list to keep track of 
it */
-- 
2.11.0



[RFC PATCH 27/27] epoll: take epitem list out of struct file

2020-10-03 Thread Al Viro
From: Al Viro 

Move the head of epitem list out of struct file; for epoll ones it's
moved into struct eventpoll (->refs there), for non-epoll - into
the new object (struct epitem_head).  In place of ->f_ep_links we
leave a pointer to the list head (->f_ep).

->f_ep is protected by ->f_lock and it's zeroed as soon as the list
of epitems becomes empty (that can happen only in ep_remove() by
now).

The list of files for reverse path check is *not* going through
struct file now - it's a single-linked list going through epitem_head
instances.  It's terminated by ERR_PTR(-1) (== EP_UNACTIVE_POINTER),
so the elements of list can be distinguished by head->next != NULL.

epitem_head instances are allocated at ep_insert() time (by
attach_epitem()) and freed either by ep_remove() (if it empties
the set of epitems *and* epitem_head does not belong to the
reverse path check list) or by clear_tfile_check_list() when
the list is emptied (if the set of epitems is empty by that
point).  Allocations are done from a separate slab - minimal kmalloc()
size is too large on some architectures.

As a result, we trim struct file _and_ get rid of the games with
temporary file references.

Locking and barriers are interesting (aren't they always); see unlist_file()
and ep_remove() for details.  The non-obvious part is that ep_remove() needs
to decide if it will be the one to free the damn thing *before* actually
storing NULL to head->epitems.first - that's what smp_load_acquire is for
in there.  unlist_file() lockless path is safe, since we hit it only if
we observe NULL in head->epitems.first and whoever had done that store is
guaranteed to have observed non-NULL in head->next.  IOW, their last access
had been the store of NULL into ->epitems.first and we can safely free
the sucker.  OTOH, we are under rcu_read_lock() and both epitem and
epitem->file have their freeing RCU-delayed.  So if we see non-NULL
->epitems.first, we can grab ->f_lock (all epitems in there share the
same struct file) and safely recheck the emptiness of ->epitems; again,
->next is still non-NULL, so ep_remove() couldn't have freed head yet.
->f_lock serializes us wrt ep_remove(); the rest is trivial.

Note that once head->epitems becomes NULL, nothing can get inserted into
it - the only remaining reference to head after that point is from the
reverse path check list.
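The list-membership convention described above can be sketched in userspace
(illustrative names only, not the kernel code): a singly linked list
terminated by a non-NULL sentinel, so head->next != NULL by itself tells you
whether a head is already on the check list, and a second insertion is a
no-op.

```c
#include <assert.h>
#include <stddef.h>

struct head_sketch {
	struct head_sketch *next;	/* NULL <=> not on the check list */
};

/* Stands in for ERR_PTR(-1) / EP_UNACTIVE_PTR as the list terminator. */
#define LIST_END ((struct head_sketch *)(-1L))

static struct head_sketch *check_list = LIST_END;

static void list_head_sketch(struct head_sketch *h)
{
	if (!h->next) {			/* not listed yet */
		h->next = check_list;
		check_list = h;
	}
}

static int on_list(const struct head_sketch *h)
{
	return h->next != NULL;
}
```

Because the terminator is non-NULL, no separate "am I listed" flag is needed;
emptying the list is a matter of walking it and storing NULL back into each
->next.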

Signed-off-by: Al Viro 
---
 fs/eventpoll.c| 168 ++
 fs/file_table.c   |   1 -
 include/linux/eventpoll.h |  11 +--
 include/linux/fs.h|   5 +-
 4 files changed, 129 insertions(+), 56 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index eea269670168..d3fdabf6fd34 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -215,6 +215,7 @@ struct eventpoll {
 
/* used to optimize loop detection check */
u64 gen;
+   struct hlist_head refs;
 
 #ifdef CONFIG_NET_RX_BUSY_POLL
/* used to track busy poll napi_id */
@@ -259,7 +260,45 @@ static struct kmem_cache *pwq_cache __read_mostly;
  * List of files with newly added links, where we may need to limit the number
  * of emanating paths. Protected by the epmutex.
  */
-static LIST_HEAD(tfile_check_list);
+struct epitems_head {
+   struct hlist_head epitems;
+   struct epitems_head *next;
+};
+static struct epitems_head *tfile_check_list = EP_UNACTIVE_PTR;
+
+static struct kmem_cache *ephead_cache __read_mostly;
+
+static inline void free_ephead(struct epitems_head *head)
+{
+   if (head)
+   kmem_cache_free(ephead_cache, head);
+}
+
+static void list_file(struct file *file)
+{
+   struct epitems_head *head;
+
+   head = container_of(file->f_ep, struct epitems_head, epitems);
+   if (!head->next) {
+   head->next = tfile_check_list;
+   tfile_check_list = head;
+   }
+}
+
+static void unlist_file(struct epitems_head *head)
+{
+   struct epitems_head *to_free = head;
+   struct hlist_node *p = rcu_dereference(hlist_first_rcu(&head->epitems));
+   if (p) {
+   struct epitem *epi = container_of(p, struct epitem, fllink);
+   spin_lock(&epi->ffd.file->f_lock);
+   if (!hlist_empty(&head->epitems))
+   to_free = NULL;
+   head->next = NULL;
+   spin_unlock(&epi->ffd.file->f_lock);
+   }
+   free_ephead(to_free);
+}
 
 #ifdef CONFIG_SYSCTL
 
@@ -632,6 +671,8 @@ static void epi_rcu_free(struct rcu_head *head)
 static int ep_remove(struct eventpoll *ep, struct epitem *epi)
 {
struct file *file = epi->ffd.file;
+   struct epitems_head *to_free;
+   struct hlist_head *head;
 
lockdep_assert_irqs_enabled();
 
@@ -642,8 +683,20 @@ static int ep_remove(struct eventpoll *ep, struct epitem 
*epi)
 
/* Remove the current item from the list of epoll hooks */
spin_lock(&file->f_lock);
+   to_free = NULL;
+   head = file->f_ep;
+   if (head->first == &epi->fllink && !epi->fllink.next) {

[RFC PATCH 05/27] untangling ep_call_nested(): take pushing cookie into a helper

2020-10-03 Thread Al Viro
From: Al Viro 

Signed-off-by: Al Viro 
---
 fs/eventpoll.c | 26 +-
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 43aecae0935c..bd2cc78c47c8 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -424,6 +424,21 @@ static inline void ep_set_busy_poll_napi_id(struct epitem 
*epi)
 
 #endif /* CONFIG_NET_RX_BUSY_POLL */
 
+static bool ep_push_nested(void *cookie)
+{
+   int i;
+
+   if (nesting > EP_MAX_NESTS) /* too deep nesting */
+   return false;
+
+   for (i = 0; i < nesting; i++) {
+   if (cookies[i] == cookie) /* loop detected */
+   return false;
+   }
+   cookies[nesting++] = cookie;
+   return true;
+}
+
 /**
  * ep_call_nested - Perform a bound (possibly) nested call, by checking
  *  that the recursion limit is not exceeded, and that
@@ -440,17 +455,10 @@ static inline void ep_set_busy_poll_napi_id(struct epitem 
*epi)
 static int ep_call_nested(int (*nproc)(void *, void *, int), void *priv,
  void *cookie)
 {
-   int error, i;
+   int error;
 
-   if (nesting > EP_MAX_NESTS) /* too deep nesting */
+   if (!ep_push_nested(cookie))
return -1;
-
-   for (i = 0; i < nesting; i++) {
-   if (cookies[i] == cookie) /* loop detected */
-   return -1;
-   }
-   cookies[nesting++] = cookie;
-
/* Call the nested function */
error = (*nproc)(priv, cookie, nesting - 1);
nesting--;
-- 
2.11.0



[RFC PATCH 10/27] clean reverse_path_check_proc() a bit

2020-10-03 Thread Al Viro
From: Al Viro 

Signed-off-by: Al Viro 
---
 fs/eventpoll.c | 26 +-
 1 file changed, 9 insertions(+), 17 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 0f540e91aa92..33af838046ea 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1317,23 +1317,15 @@ static int reverse_path_check_proc(struct file *file, 
int depth)
/* CTL_DEL can remove links here, but that can't increase our count */
rcu_read_lock();
list_for_each_entry_rcu(epi, &file->f_ep_links, fllink) {
-   struct file *child_file = epi->ep->file;
-   if (is_file_epoll(child_file)) {
-   if (list_empty(&child_file->f_ep_links)) {
-   if (path_count_inc(depth)) {
-   error = -1;
-   break;
-   }
-   } else {
-   error = reverse_path_check_proc(child_file,
-   depth + 1);
-   }
-   if (error != 0)
-   break;
-   } else {
-   printk(KERN_ERR "reverse_path_check_proc: "
-   "file is not an ep!\n");
-   }
+   struct file *recepient = epi->ep->file;
+   if (WARN_ON(!is_file_epoll(recepient)))
+   continue;
+   if (list_empty(&recepient->f_ep_links))
+   error = path_count_inc(depth);
+   else
+   error = reverse_path_check_proc(recepient, depth + 1);
+   if (error != 0)
+   break;
}
rcu_read_unlock();
return error;
-- 
2.11.0



[RFC PATCH 11/27] ep_loop_check_proc(): lift pushing the cookie into callers

2020-10-03 Thread Al Viro
From: Al Viro 

Signed-off-by: Al Viro 
---
 fs/eventpoll.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 33af838046ea..9edea3933790 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1877,9 +1877,6 @@ static int ep_loop_check_proc(void *priv, void *cookie, 
int depth)
struct rb_node *rbp;
struct epitem *epi;
 
-   if (!ep_push_nested(cookie)) /* limits recursion */
-   return -1;
-
mutex_lock_nested(&ep->mtx, depth + 1);
ep->gen = loop_check_gen;
for (rbp = rb_first_cached(&ep->rbr); rbp; rbp = rb_next(rbp)) {
@@ -1888,8 +1885,13 @@ static int ep_loop_check_proc(void *priv, void *cookie, 
int depth)
ep_tovisit = epi->ffd.file->private_data;
if (ep_tovisit->gen == loop_check_gen)
continue;
-   error = ep_loop_check_proc(epi->ffd.file, ep_tovisit,
+   if (!ep_push_nested(ep_tovisit)) {
+   error = -1;
+   } else {
+   error = ep_loop_check_proc(epi->ffd.file, 
ep_tovisit,
   depth + 1);
+   nesting--;
+   }
if (error != 0)
break;
} else {
@@ -1909,7 +1911,6 @@ static int ep_loop_check_proc(void *priv, void *cookie, 
int depth)
}
}
mutex_unlock(&ep->mtx);
-   nesting--; /* pop */
 
return error;
 }
@@ -1927,7 +1928,12 @@ static int ep_loop_check_proc(void *priv, void *cookie, 
int depth)
  */
 static int ep_loop_check(struct eventpoll *ep, struct file *file)
 {
-   return ep_loop_check_proc(file, ep, 0);
+   int err;
+
+   ep_push_nested(ep); // can't fail
+   err = ep_loop_check_proc(file, ep, 0);
+   nesting--;
+   return err;
 }
 
 static void clear_tfile_check_list(void)
-- 
2.11.0



[RFC PATCH 12/27] get rid of ep_push_nested()

2020-10-03 Thread Al Viro
From: Al Viro 

The only remaining user is loop checking.  But there we only need
to check that we have not walked into the epoll we are inserting
into - we are adding an edge to an acyclic graph, so any loop being
created will have to pass through the source of that edge.

So we don't need that array of cookies - we have only one eventpoll
to watch out for.  RIP ep_push_nested(), along with the cookies
array.
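A toy model of the simplified check (illustrative; the names edge,
inserting_into, and may_add_edge are made up): before adding edge src -> dst
to an acyclic graph, walk everything reachable from dst; a cycle can only be
created if that walk reaches src, so src is the single "cookie" to watch for,
plus a depth cap.

```c
#include <assert.h>

#define MAX_NODES 8
#define MAX_DEPTH 4

static int edge[MAX_NODES][MAX_NODES];	/* adjacency matrix of the DAG */
static int inserting_into;		/* source of the edge being added */

static int loop_check_proc(int node, int depth)
{
	for (int next = 0; next < MAX_NODES; next++) {
		if (!edge[node][next])
			continue;
		if (next == inserting_into || depth > MAX_DEPTH)
			return -1;	/* would close a loop / too deep */
		if (loop_check_proc(next, depth + 1))
			return -1;
	}
	return 0;
}

/* returns 1 if src -> dst may be added without creating a cycle */
static int may_add_edge(int src, int dst)
{
	inserting_into = src;
	return loop_check_proc(dst, 0) == 0;
}
```

The per-traversal cookie array disappears because the graph was already
acyclic: only the one node we are inserting into can close a loop.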

Signed-off-by: Al Viro 
---
 fs/eventpoll.c | 29 -
 1 file changed, 4 insertions(+), 25 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 9edea3933790..6b1990b8b9a0 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -254,8 +254,7 @@ static DEFINE_MUTEX(epmutex);
 static u64 loop_check_gen = 0;
 
 /* Used to check for epoll file descriptor inclusion loops */
-static void *cookies[EP_MAX_NESTS + 1];
-static int nesting;
+static struct eventpoll *inserting_into;
 
 /* Slab cache used to allocate "struct epitem" */
 static struct kmem_cache *epi_cache __read_mostly;
@@ -424,21 +423,6 @@ static inline void ep_set_busy_poll_napi_id(struct epitem 
*epi)
 
 #endif /* CONFIG_NET_RX_BUSY_POLL */
 
-static bool ep_push_nested(void *cookie)
-{
-   int i;
-
-   if (nesting > EP_MAX_NESTS) /* too deep nesting */
-   return false;
-
-   for (i = 0; i < nesting; i++) {
-   if (cookies[i] == cookie) /* loop detected */
-   return false;
-   }
-   cookies[nesting++] = cookie;
-   return true;
-}
-
 /*
  * As described in commit 0ccf831cb lockdep: annotate epoll
  * the use of wait queues used by epoll is done in a very controlled
@@ -1885,12 +1869,11 @@ static int ep_loop_check_proc(void *priv, void *cookie, 
int depth)
ep_tovisit = epi->ffd.file->private_data;
if (ep_tovisit->gen == loop_check_gen)
continue;
-   if (!ep_push_nested(ep_tovisit)) {
+   if (ep_tovisit == inserting_into || depth > 
EP_MAX_NESTS) {
error = -1;
} else {
error = ep_loop_check_proc(epi->ffd.file, 
ep_tovisit,
   depth + 1);
-   nesting--;
}
if (error != 0)
break;
@@ -1928,12 +1911,8 @@ static int ep_loop_check_proc(void *priv, void *cookie, 
int depth)
  */
 static int ep_loop_check(struct eventpoll *ep, struct file *file)
 {
-   int err;
-
-   ep_push_nested(ep); // can't fail
-   err = ep_loop_check_proc(file, ep, 0);
-   nesting--;
-   return err;
+   inserting_into = ep;
+   return ep_loop_check_proc(file, ep, 0);
 }
 
 static void clear_tfile_check_list(void)
-- 
2.11.0



[RFC PATCH 24/27] convert ->f_ep_links/->fllink to hlist

2020-10-03 Thread Al Viro
From: Al Viro 

We don't care about the order of the elements there.

Signed-off-by: Al Viro 
---
 fs/eventpoll.c| 18 +-
 include/linux/eventpoll.h |  4 ++--
 include/linux/fs.h|  2 +-
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 66da645d5eb4..78b8769b72dc 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -160,7 +160,7 @@ struct epitem {
struct eventpoll *ep;
 
/* List header used to link this item to the "struct file" items list */
-   struct list_head fllink;
+   struct hlist_node fllink;
 
/* wakeup_source used when EPOLLWAKEUP is set */
struct wakeup_source __rcu *ws;
@@ -642,7 +642,7 @@ static int ep_remove(struct eventpoll *ep, struct epitem 
*epi)
 
/* Remove the current item from the list of epoll hooks */
spin_lock(&file->f_lock);
-   list_del_rcu(&epi->fllink);
+   hlist_del_rcu(&epi->fllink);
spin_unlock(&file->f_lock);
 
rb_erase_cached(&epi->rbn, &ep->rbr);
@@ -835,7 +835,8 @@ static const struct file_operations eventpoll_fops = {
 void eventpoll_release_file(struct file *file)
 {
struct eventpoll *ep;
-   struct epitem *epi, *next;
+   struct epitem *epi;
+   struct hlist_node *next;
 
/*
 * We don't want to get "file->f_lock" because it is not
@@ -851,7 +852,7 @@ void eventpoll_release_file(struct file *file)
 * Besides, ep_remove() acquires the lock, so we can't hold it here.
 */
mutex_lock(&epmutex);
-   list_for_each_entry_safe(epi, next, &file->f_ep_links, fllink) {
+   hlist_for_each_entry_safe(epi, next, &file->f_ep_links, fllink) {
ep = epi->ep;
mutex_lock_nested(&ep->mtx, 0);
ep_remove(ep, epi);
@@ -1257,11 +1258,11 @@ static int reverse_path_check_proc(struct file *file, 
int depth)
 
/* CTL_DEL can remove links here, but that can't increase our count */
rcu_read_lock();
-   list_for_each_entry_rcu(epi, &file->f_ep_links, fllink) {
+   hlist_for_each_entry_rcu(epi, &file->f_ep_links, fllink) {
struct file *recepient = epi->ep->file;
if (WARN_ON(!is_file_epoll(recepient)))
continue;
-   if (list_empty(&recepient->f_ep_links))
+   if (hlist_empty(&recepient->f_ep_links))
error = path_count_inc(depth);
else
error = reverse_path_check_proc(recepient, depth + 1);
@@ -1361,7 +1362,6 @@ static int ep_insert(struct eventpoll *ep, const struct 
epoll_event *event,
 
/* Item initialization follow here ... */
INIT_LIST_HEAD(&epi->rdllink);
-   INIT_LIST_HEAD(&epi->fllink);
epi->ep = ep;
ep_set_ffd(&epi->ffd, tfile, fd);
epi->event = *event;
@@ -1373,7 +1373,7 @@ static int ep_insert(struct eventpoll *ep, const struct 
epoll_event *event,
mutex_lock(&tep->mtx);
/* Add the current item to the list of active epoll hook for this file 
*/
spin_lock(&tfile->f_lock);
-   list_add_tail_rcu(&epi->fllink, &tfile->f_ep_links);
+   hlist_add_head_rcu(&epi->fllink, &tfile->f_ep_links);
spin_unlock(&tfile->f_lock);
 
/*
@@ -1999,7 +1999,7 @@ int do_epoll_ctl(int epfd, int op, int fd, struct 
epoll_event *epds,
if (error)
goto error_tgt_fput;
if (op == EPOLL_CTL_ADD) {
-   if (!list_empty(&tf.file->f_ep_links) ||
+   if (!hlist_empty(&tf.file->f_ep_links) ||
ep->gen == loop_check_gen ||
is_file_epoll(tf.file)) {
mutex_unlock(&ep->mtx);
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index 8f000fada5a4..4e215ccfa792 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -25,7 +25,7 @@ struct file *get_epoll_tfile_raw_ptr(struct file *file, int 
tfd, unsigned long t
 /* Used to initialize the epoll bits inside the "struct file" */
 static inline void eventpoll_init_file(struct file *file)
 {
-   INIT_LIST_HEAD(&file->f_ep_links);
+   INIT_HLIST_HEAD(&file->f_ep_links);
INIT_LIST_HEAD(&file->f_tfile_llink);
 }
 
@@ -50,7 +50,7 @@ static inline void eventpoll_release(struct file *file)
 * because the file in on the way to be removed and nobody ( but
 * eventpoll ) has still a reference to this file.
 */
-   if (likely(list_empty(&file->f_ep_links)))
+   if (likely(hlist_empty(&file->f_ep_links)))
return;
 
/*
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e019ea2f1347..9dc4c09f1d13 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -951,7 +951,7 @@ struct file {
 
 #ifdef CONFIG_EPOLL
/* Used by fs/eventpoll.c to link all the hooks to this file */
-   struct list_headf_ep_links;
+   struct hlist_head   f_ep_links;
struct list_headf_tfile_llink;
 #endif /* #ifdef CONFIG_EPOLL */
struct address_space*f_mapping;
-- 
2.11.0


[RFC PATCH 06/27] untangling ep_call_nested(): move push/pop of cookie into the callbacks

2020-10-03 Thread Al Viro
From: Al Viro 

Signed-off-by: Al Viro 
---
 fs/eventpoll.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index bd2cc78c47c8..9a6ee5991f3d 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -455,15 +455,7 @@ static bool ep_push_nested(void *cookie)
 static int ep_call_nested(int (*nproc)(void *, void *, int), void *priv,
  void *cookie)
 {
-   int error;
-
-   if (!ep_push_nested(cookie))
-   return -1;
-   /* Call the nested function */
-   error = (*nproc)(priv, cookie, nesting - 1);
-   nesting--;
-
-   return error;
+   return (*nproc)(priv, cookie, nesting);
 }
 
 /*
@@ -1340,6 +1332,9 @@ static int reverse_path_check_proc(void *priv, void 
*cookie, int call_nests)
struct file *child_file;
struct epitem *epi;
 
+   if (!ep_push_nested(cookie)) /* limits recursion */
+   return -1;
+
/* CTL_DEL can remove links here, but that can't increase our count */
rcu_read_lock();
list_for_each_entry_rcu(epi, &file->f_ep_links, fllink) {
@@ -1362,6 +1357,7 @@ static int reverse_path_check_proc(void *priv, void 
*cookie, int call_nests)
}
}
rcu_read_unlock();
+   nesting--; /* pop */
return error;
 }
 
@@ -1913,6 +1909,9 @@ static int ep_loop_check_proc(void *priv, void *cookie, 
int call_nests)
struct rb_node *rbp;
struct epitem *epi;
 
+   if (!ep_push_nested(cookie)) /* limits recursion */
+   return -1;
+
mutex_lock_nested(&ep->mtx, call_nests + 1);
ep->gen = loop_check_gen;
for (rbp = rb_first_cached(&ep->rbr); rbp; rbp = rb_next(rbp)) {
@@ -1942,6 +1941,7 @@ static int ep_loop_check_proc(void *priv, void *cookie, 
int call_nests)
}
}
mutex_unlock(&ep->mtx);
+   nesting--; /* pop */
 
return error;
 }
-- 
2.11.0



[RFC PATCH 16/27] lift the calls of ep_send_events_proc() into the callers

2020-10-03 Thread Al Viro
From: Al Viro 

... and kill ep_scan_ready_list()

Signed-off-by: Al Viro 
---
 fs/eventpoll.c | 33 +
 1 file changed, 5 insertions(+), 28 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 9b9e29e0c85f..3b3a862f8014 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -636,33 +636,6 @@ static void ep_done_scan(struct eventpoll *ep,
mutex_unlock(&ep->mtx);
 }
 
-/**
- * ep_scan_ready_list - Scans the ready list in a way that makes possible for
- *  the scan code, to call f_op->poll(). Also allows for
- *  O(NumReady) performance.
- *
- * @ep: Pointer to the epoll private data structure.
- * @sproc: Pointer to the scan callback.
- * @priv: Private opaque data passed to the @sproc callback.
- * @depth: The current depth of recursive f_op->poll calls.
- * @ep_locked: caller already holds ep->mtx
- *
- * Returns: The same integer error code returned by the @sproc callback.
- */
-static __poll_t ep_scan_ready_list(struct eventpoll *ep,
- __poll_t (*sproc)(struct eventpoll *,
-  struct list_head *, void *),
- void *priv, int depth, bool ep_locked)
-{
-   __poll_t res;
-   LIST_HEAD(txlist);
-
-   ep_start_scan(ep, depth, ep_locked, &txlist);
-   res = (*sproc)(ep, &txlist, priv);
-   ep_done_scan(ep, depth, ep_locked, &txlist);
-   return res;
-}
-
 static void epi_rcu_free(struct rcu_head *head)
 {
struct epitem *epi = container_of(head, struct epitem, rcu);
@@ -1685,11 +1658,15 @@ static int ep_send_events(struct eventpoll *ep,
  struct epoll_event __user *events, int maxevents)
 {
struct ep_send_events_data esed;
+   LIST_HEAD(txlist);
 
esed.maxevents = maxevents;
esed.events = events;
 
-   ep_scan_ready_list(ep, ep_send_events_proc, &esed, 0, false);
+   ep_start_scan(ep, 0, false, &txlist);
+   ep_send_events_proc(ep, &txlist, &esed);
+   ep_done_scan(ep, 0, false, &txlist);
+
return esed.res;
 }
 
-- 
2.11.0



[RFC PATCH 08/27] reverse_path_check_proc(): sane arguments

2020-10-03 Thread Al Viro
From: Al Viro 

No need to force its calling conventions to match the callback for
the late, unlamented ep_call_nested()...

Signed-off-by: Al Viro 
---
 fs/eventpoll.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 8c3b02755a50..3e6f1f97f246 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1306,20 +1306,18 @@ static void path_count_init(void)
path_count[i] = 0;
 }
 
-static int reverse_path_check_proc(void *priv, void *cookie, int depth)
+static int reverse_path_check_proc(struct file *file, int depth)
 {
int error = 0;
-   struct file *file = priv;
-   struct file *child_file;
struct epitem *epi;
 
-   if (!ep_push_nested(cookie)) /* limits recursion */
+   if (!ep_push_nested(file)) /* limits recursion */
return -1;
 
/* CTL_DEL can remove links here, but that can't increase our count */
rcu_read_lock();
list_for_each_entry_rcu(epi, &file->f_ep_links, fllink) {
-   child_file = epi->ep->file;
+   struct file *child_file = epi->ep->file;
if (is_file_epoll(child_file)) {
if (list_empty(&child_file->f_ep_links)) {
if (path_count_inc(depth)) {
@@ -1327,7 +1325,7 @@ static int reverse_path_check_proc(void *priv, void 
*cookie, int depth)
break;
}
} else {
-   error = reverse_path_check_proc(child_file, 
child_file,
+   error = reverse_path_check_proc(child_file,
depth + 1);
}
if (error != 0)
@@ -1360,7 +1358,7 @@ static int reverse_path_check(void)
/* let's call this for all tfiles */
list_for_each_entry(current_file, _check_list, f_tfile_llink) {
path_count_init();
-   error = reverse_path_check_proc(current_file, current_file, 0);
+   error = reverse_path_check_proc(current_file, 0);
if (error)
break;
}
-- 
2.11.0



[RFC PATCH 20/27] ep_insert(): we only need tep->mtx around the insertion itself

2020-10-03 Thread Al Viro
From: Al Viro 

We do need ep->mtx (and we are holding it all along), but that's
the lock on the epoll we are inserting into; locking of the
epoll being inserted is not needed for most of that work -
as a matter of fact, we only need it to provide barriers
for the fastpath check (for now).

Move taking and releasing it into ep_insert().  The caller
(do_epoll_ctl()) doesn't need to bother with that at all.
Moreover, that way we kill the kludge in ep_item_poll() - now
it's always called with tep unlocked.
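The scope-narrowing idea can be sketched outside the kernel (illustrative
names; insert_item, target, and target_lock are made up): hold the caller's
lock all along, but take the target's lock only around the shared-state
mutation it actually protects, instead of across the whole operation.

```c
#include <assert.h>

struct target {
	int lock_held;	/* models the target's mutex */
	int nitems;	/* state the target's lock protects */
};

static void lock(struct target *t)   { assert(!t->lock_held); t->lock_held = 1; }
static void unlock(struct target *t) { assert(t->lock_held);  t->lock_held = 0; }

/* Insert into t; its lock is taken only around the linking itself. */
static int insert_item(struct target *t)
{
	int prep_work = 41;	/* setup that needs no target lock */

	lock(t);		/* narrow critical section starts here */
	t->nitems++;
	unlock(t);		/* ...and ends here */

	return prep_work + 1;	/* more work, again without the lock */
}
```

The caller never touches the target's lock at all, mirroring how
do_epoll_ctl() no longer takes tep->mtx after this change.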

Signed-off-by: Al Viro 
---
 fs/eventpoll.c | 28 ++--
 1 file changed, 10 insertions(+), 18 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index c987b61701e4..39947b71f7af 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -731,8 +731,6 @@ static int ep_eventpoll_release(struct inode *inode, struct 
file *file)
 
 static __poll_t ep_read_events_proc(struct eventpoll *ep, struct list_head 
*head,
   int depth);
-static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
-poll_table *pt);
 
 /*
  * Differs from ep_eventpoll_poll() in that internal callers already have
@@ -745,7 +743,6 @@ static __poll_t ep_item_poll(const struct epitem *epi, 
poll_table *pt,
struct eventpoll *ep;
LIST_HEAD(txlist);
__poll_t res;
-   bool locked;
 
pt->_key = epi->event.events;
if (!is_file_epoll(epi->ffd.file))
@@ -754,15 +751,11 @@ static __poll_t ep_item_poll(const struct epitem *epi, 
poll_table *pt,
ep = epi->ffd.file->private_data;
poll_wait(epi->ffd.file, &ep->poll_wait, pt);
 
-   // kludge: ep_insert() calls us with ep->mtx already locked
-   locked = pt && (pt->_qproc == ep_ptable_queue_proc);
-   if (!locked)
-   mutex_lock_nested(&ep->mtx, depth);
+   mutex_lock_nested(&ep->mtx, depth);
ep_start_scan(ep, &txlist);
res = ep_read_events_proc(ep, &txlist, depth + 1);
ep_done_scan(ep, &txlist);
-   if (!locked)
-   mutex_unlock(&ep->mtx);
+   mutex_unlock(&ep->mtx);
return res & epi->event.events;
 }
 
@@ -1365,6 +1358,10 @@ static int ep_insert(struct eventpoll *ep, const struct 
epoll_event *event,
long user_watches;
struct epitem *epi;
struct ep_pqueue epq;
+   struct eventpoll *tep = NULL;
+
+   if (is_file_epoll(tfile))
+   tep = tfile->private_data;
 
lockdep_assert_irqs_enabled();
 
@@ -1394,6 +1391,8 @@ static int ep_insert(struct eventpoll *ep, const struct 
epoll_event *event,
 
atomic_long_inc(&ep->user->epoll_watches);
 
+   if (tep)
+   mutex_lock(&tep->mtx);
/* Add the current item to the list of active epoll hook for this file 
*/
spin_lock(&tfile->f_lock);
list_add_tail_rcu(&epi->fllink, &tfile->f_ep_links);
@@ -1404,6 +1403,8 @@ static int ep_insert(struct eventpoll *ep, const struct 
epoll_event *event,
 * protected by "mtx", and ep_insert() is called with "mtx" held.
 */
ep_rbtree_insert(ep, epi);
+   if (tep)
+   mutex_unlock(&tep->mtx);
 
/* now check if we've created too many backpaths */
if (unlikely(full_check && reverse_path_check())) {
@@ -2034,13 +2035,6 @@ int do_epoll_ctl(int epfd, int op, int fd, struct 
epoll_event *epds,
error = epoll_mutex_lock(&ep->mtx, 0, nonblock);
if (error)
goto error_tgt_fput;
-   if (is_file_epoll(tf.file)) {
-   error = epoll_mutex_lock(&tep->mtx, 1, nonblock);
-   if (error) {
-   mutex_unlock(&ep->mtx);
-   goto error_tgt_fput;
-   }
-   }
}
}
 
@@ -2076,8 +2070,6 @@ int do_epoll_ctl(int epfd, int op, int fd, struct 
epoll_event *epds,
error = -ENOENT;
break;
}
-   if (tep != NULL)
-   mutex_unlock(&tep->mtx);
mutex_unlock(&ep->mtx);
 
 error_tgt_fput:
-- 
2.11.0



[RFC PATCH 22/27] fold ep_read_events_proc() into the only caller

2020-10-03 Thread Al Viro
From: Al Viro 

Signed-off-by: Al Viro 
---
 fs/eventpoll.c | 49 -
 1 file changed, 20 insertions(+), 29 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index a50b48d26c55..1efe8a1a022a 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -729,14 +729,17 @@ static int ep_eventpoll_release(struct inode *inode, 
struct file *file)
return 0;
 }
 
-static __poll_t ep_read_events_proc(struct eventpoll *ep, struct list_head 
*head,
-  int depth);
+static __poll_t ep_item_poll(const struct epitem *epi, poll_table *pt, int 
depth);
 
 static __poll_t __ep_eventpoll_poll(struct file *file, poll_table *wait, int 
depth)
 {
struct eventpoll *ep = file->private_data;
LIST_HEAD(txlist);
-   __poll_t res;
+   struct epitem *epi, *tmp;
+   poll_table pt;
+   __poll_t res = 0;
+
+   init_poll_funcptr(&pt, NULL);
 
/* Insert inside our poll wait queue */
poll_wait(file, >poll_wait, wait);
@@ -747,7 +750,20 @@ static __poll_t __ep_eventpoll_poll(struct file *file, 
poll_table *wait, int dep
 */
mutex_lock_nested(&ep->mtx, depth);
ep_start_scan(ep, &txlist);
-   res = ep_read_events_proc(ep, &txlist, depth + 1);
+   list_for_each_entry_safe(epi, tmp, , rdllink) {
+   if (ep_item_poll(epi, &pt, depth + 1)) {
+   res = EPOLLIN | EPOLLRDNORM;
+   break;
+   } else {
+   /*
+* Item has been dropped into the ready list by the poll
+* callback, but it's not actually ready, as far as
+* caller requested events goes. We can remove it here.
+*/
+   __pm_relax(ep_wakeup_source(epi));
+   list_del_init(&epi->rdllink);
+   }
+   }
ep_done_scan(ep, &txlist);
mutex_unlock(&ep->mtx);
return res;
@@ -772,31 +788,6 @@ static __poll_t ep_item_poll(const struct epitem *epi, 
poll_table *pt,
return res & epi->event.events;
 }
 
-static __poll_t ep_read_events_proc(struct eventpoll *ep, struct list_head 
*head,
-  int depth)
-{
-   struct epitem *epi, *tmp;
-   poll_table pt;
-
-   init_poll_funcptr(&pt, NULL);
-
-   list_for_each_entry_safe(epi, tmp, head, rdllink) {
-   if (ep_item_poll(epi, &pt, depth)) {
-   return EPOLLIN | EPOLLRDNORM;
-   } else {
-   /*
-* Item has been dropped into the ready list by the poll
-* callback, but it's not actually ready, as far as
-* caller requested events goes. We can remove it here.
-*/
-   __pm_relax(ep_wakeup_source(epi));
-   list_del_init(&epi->rdllink);
-   }
-   }
-
-   return 0;
-}
-
 static __poll_t ep_eventpoll_poll(struct file *file, poll_table *wait)
 {
return __ep_eventpoll_poll(file, wait, 0);
-- 
2.11.0



[RFC PATCH 19/27] ep_insert(): don't open-code ep_remove() on failure exits

2020-10-03 Thread Al Viro
From: Al Viro 

Signed-off-by: Al Viro 
---
 fs/eventpoll.c | 51 ++-
 1 file changed, 14 insertions(+), 37 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index f9c567af1f5f..c987b61701e4 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1384,12 +1384,16 @@ static int ep_insert(struct eventpoll *ep, const struct 
epoll_event *event,
epi->next = EP_UNACTIVE_PTR;
if (epi->event.events & EPOLLWAKEUP) {
error = ep_create_wakeup_source(epi);
-   if (error)
-   goto error_create_wakeup_source;
+   if (error) {
+   kmem_cache_free(epi_cache, epi);
+   return error;
+   }
} else {
RCU_INIT_POINTER(epi->ws, NULL);
}
 
+   atomic_long_inc(&ep->user->epoll_watches);
+
/* Add the current item to the list of active epoll hook for this file 
*/
spin_lock(&tfile->f_lock);
list_add_tail_rcu(&epi->fllink, &tfile->f_ep_links);
@@ -1402,9 +1406,10 @@ static int ep_insert(struct eventpoll *ep, const struct 
epoll_event *event,
ep_rbtree_insert(ep, epi);
 
/* now check if we've created too many backpaths */
-   error = -EINVAL;
-   if (full_check && reverse_path_check())
-   goto error_remove_epi;
+   if (unlikely(full_check && reverse_path_check())) {
+   ep_remove(ep, epi);
+   return -EINVAL;
+   }
 
/* Initialize the poll table using the queue callback */
epq.epi = epi;
@@ -1424,9 +1429,10 @@ static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
 * install process. Namely an allocation for a wait queue failed due
 * high memory pressure.
 */
-   error = -ENOMEM;
-   if (!epq.epi)
-   goto error_unregister;
+   if (unlikely(!epq.epi)) {
+   ep_remove(ep, epi);
+   return -ENOMEM;
+   }
 
/* We have to drop the new item inside our item list to keep track of it */
write_lock_irq(&ep->lock);
@@ -1448,40 +1454,11 @@ static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
 
write_unlock_irq(&ep->lock);
 
-   atomic_long_inc(&ep->user->epoll_watches);
-
/* We have to call this outside the lock */
if (pwake)
ep_poll_safewake(ep, NULL);
 
return 0;
-
-error_unregister:
-   ep_unregister_pollwait(ep, epi);
-error_remove_epi:
-   spin_lock(&tfile->f_lock);
-   list_del_rcu(&epi->fllink);
-   spin_unlock(&tfile->f_lock);
-
-   rb_erase_cached(&epi->rbn, &ep->rbr);
-
-   /*
-* We need to do this because an event could have been arrived on some
-* allocated wait queue. Note that we don't care about the ep->ovflist
-* list, since that is used/cleaned only inside a section bound by "mtx".
-* And ep_insert() is called with "mtx" held.
-*/
-   write_lock_irq(&ep->lock);
-   if (ep_is_linked(epi))
-   list_del_init(&epi->rdllink);
-   write_unlock_irq(&ep->lock);
-
-   wakeup_source_unregister(ep_wakeup_source(epi));
-
-error_create_wakeup_source:
-   kmem_cache_free(epi_cache, epi);
-
-   return error;
 }
 
 /*
-- 
2.11.0



[RFC PATCH 21/27] take the common part of ep_eventpoll_poll() and ep_item_poll() into helper

2020-10-03 Thread Al Viro
From: Al Viro 

The only reason why ep_item_poll() can't simply call ep_eventpoll_poll()
(or, better yet, call vfs_poll() in all cases) is that we need to tell
lockdep how deep into the hierarchy of ->mtx we are.  So let's add
a variant of ep_eventpoll_poll() that takes the depth explicitly
and turn ep_eventpoll_poll() into a wrapper for it.

Signed-off-by: Al Viro 
---
 fs/eventpoll.c | 57 +++--
 1 file changed, 27 insertions(+), 30 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 39947b71f7af..a50b48d26c55 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -732,6 +732,27 @@ static int ep_eventpoll_release(struct inode *inode, struct file *file)
 static __poll_t ep_read_events_proc(struct eventpoll *ep, struct list_head *head,
   int depth);
 
+static __poll_t __ep_eventpoll_poll(struct file *file, poll_table *wait, int depth)
+{
+   struct eventpoll *ep = file->private_data;
+   LIST_HEAD(txlist);
+   __poll_t res;
+
+   /* Insert inside our poll wait queue */
+   poll_wait(file, &ep->poll_wait, wait);
+
+   /*
+* Proceed to find out if wanted events are really available inside
+* the ready list.
+*/
+   mutex_lock_nested(&ep->mtx, depth);
+   ep_start_scan(ep, &txlist);
+   res = ep_read_events_proc(ep, &txlist, depth + 1);
+   ep_done_scan(ep, &txlist);
+   mutex_unlock(&ep->mtx);
+   return res;
+}
+
 /*
  * Differs from ep_eventpoll_poll() in that internal callers already have
  * the ep->mtx so we need to start from depth=1, such that mutex_lock_nested()
@@ -740,22 +761,14 @@ static __poll_t ep_read_events_proc(struct eventpoll *ep, struct list_head *head
 static __poll_t ep_item_poll(const struct epitem *epi, poll_table *pt,
 int depth)
 {
-   struct eventpoll *ep;
-   LIST_HEAD(txlist);
+   struct file *file = epi->ffd.file;
__poll_t res;
 
pt->_key = epi->event.events;
-   if (!is_file_epoll(epi->ffd.file))
-   return vfs_poll(epi->ffd.file, pt) & epi->event.events;
-
-   ep = epi->ffd.file->private_data;
-   poll_wait(epi->ffd.file, &ep->poll_wait, pt);
-
-   mutex_lock_nested(&ep->mtx, depth);
-   ep_start_scan(ep, &txlist);
-   res = ep_read_events_proc(ep, &txlist, depth + 1);
-   ep_done_scan(ep, &txlist);
-   mutex_unlock(&ep->mtx);
+   if (!is_file_epoll(file))
+   res = vfs_poll(file, pt);
+   else
+   res = __ep_eventpoll_poll(file, pt, depth);
return res & epi->event.events;
 }
 
@@ -786,23 +799,7 @@ static __poll_t ep_read_events_proc(struct eventpoll *ep, struct list_head *head
 
 static __poll_t ep_eventpoll_poll(struct file *file, poll_table *wait)
 {
-   struct eventpoll *ep = file->private_data;
-   LIST_HEAD(txlist);
-   __poll_t res;
-
-   /* Insert inside our poll wait queue */
-   poll_wait(file, &ep->poll_wait, wait);
-
-   /*
-* Proceed to find out if wanted events are really available inside
-* the ready list.
-*/
-   mutex_lock(&ep->mtx);
-   ep_start_scan(ep, &txlist);
-   res = ep_read_events_proc(ep, &txlist, 1);
-   ep_done_scan(ep, &txlist);
-   mutex_unlock(&ep->mtx);
-   return res;
+   return __ep_eventpoll_poll(file, wait, 0);
 }
 
 #ifdef CONFIG_PROC_FS
-- 
2.11.0



[RFC PATCH 23/27] ep_insert(): move creation of wakeup source past the fl_ep_links insertion

2020-10-03 Thread Al Viro
From: Al Viro 

That's the beginning of preparations for taking f_ep_links out of struct file.
If insertion might fail, we will need a new failure exit.  Having wakeup
source creation done after that point will simplify life there; ep_remove()
can (and commonly does) live with NULL epi->ws, so it can be used for
cleanup after ep_create_wakeup_source() failure.  It can't be used before
the rbtree insertion, though, so if we are to unify all old failure exits,
we need to move that thing down.  Then we would be free to do simple
kmem_cache_free() on the failure to insert into f_ep_links - no wakeup source
to leak on that failure exit.

Signed-off-by: Al Viro 
---
 fs/eventpoll.c | 20 +---
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 1efe8a1a022a..66da645d5eb4 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1356,26 +1356,16 @@ static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
user_watches = atomic_long_read(&ep->user->epoll_watches);
if (unlikely(user_watches >= max_user_watches))
return -ENOSPC;
-   if (!(epi = kmem_cache_alloc(epi_cache, GFP_KERNEL)))
+   if (!(epi = kmem_cache_zalloc(epi_cache, GFP_KERNEL)))
return -ENOMEM;
 
/* Item initialization follow here ... */
INIT_LIST_HEAD(&epi->rdllink);
INIT_LIST_HEAD(&epi->fllink);
-   epi->pwqlist = NULL;
epi->ep = ep;
ep_set_ffd(&epi->ffd, tfile, fd);
epi->event = *event;
epi->next = EP_UNACTIVE_PTR;
-   if (epi->event.events & EPOLLWAKEUP) {
-   error = ep_create_wakeup_source(epi);
-   if (error) {
-   kmem_cache_free(epi_cache, epi);
-   return error;
-   }
-   } else {
-   RCU_INIT_POINTER(epi->ws, NULL);
-   }
 
atomic_long_inc(&ep->user->epoll_watches);
 
@@ -1400,6 +1390,14 @@ static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
return -EINVAL;
}
 
+   if (epi->event.events & EPOLLWAKEUP) {
+   error = ep_create_wakeup_source(epi);
+   if (error) {
+   ep_remove(ep, epi);
+   return error;
+   }
+   }
+
/* Initialize the poll table using the queue callback */
epq.epi = epi;
init_poll_funcptr(&epq.pt, ep_ptable_queue_proc);
-- 
2.11.0



[RFC PATCH 25/27] lift rcu_read_lock() into reverse_path_check()

2020-10-03 Thread Al Viro
From: Al Viro 

Signed-off-by: Al Viro 
---
 fs/eventpoll.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 78b8769b72dc..8a7ad752befd 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1257,7 +1257,6 @@ static int reverse_path_check_proc(struct file *file, int depth)
return -1;
 
/* CTL_DEL can remove links here, but that can't increase our count */
-   rcu_read_lock();
hlist_for_each_entry_rcu(epi, &file->f_ep_links, fllink) {
struct file *recepient = epi->ep->file;
if (WARN_ON(!is_file_epoll(recepient)))
@@ -1269,7 +1268,6 @@ static int reverse_path_check_proc(struct file *file, int depth)
if (error != 0)
break;
}
-   rcu_read_unlock();
return error;
 }
 
@@ -1291,7 +1289,9 @@ static int reverse_path_check(void)
/* let's call this for all tfiles */
list_for_each_entry(current_file, &tfile_check_list, f_tfile_llink) {
path_count_init();
+   rcu_read_lock();
error = reverse_path_check_proc(current_file, 0);
+   rcu_read_unlock();
if (error)
break;
}
-- 
2.11.0



[RFC PATCH 07/27] untangling ep_call_nested(): and there was much rejoicing

2020-10-03 Thread Al Viro
From: Al Viro 

Signed-off-by: Al Viro 
---
 fs/eventpoll.c | 43 +++
 1 file changed, 11 insertions(+), 32 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 9a6ee5991f3d..8c3b02755a50 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -439,25 +439,6 @@ static bool ep_push_nested(void *cookie)
return true;
 }
 
-/**
- * ep_call_nested - Perform a bound (possibly) nested call, by checking
- *  that the recursion limit is not exceeded, and that
- *  the same nested call (by the meaning of same cookie) is
- *  no re-entered.
- *
- * @nproc: Nested call core function pointer.
- * @priv: Opaque data to be passed to the @nproc callback.
- * @cookie: Cookie to be used to identify this nested call.
- *
- * Returns: Returns the code returned by the @nproc callback, or -1 if
- *  the maximum recursion limit has been exceeded.
- */
-static int ep_call_nested(int (*nproc)(void *, void *, int), void *priv,
- void *cookie)
-{
-   return (*nproc)(priv, cookie, nesting);
-}
-
 /*
  * As described in commit 0ccf831cb lockdep: annotate epoll
  * the use of wait queues used by epoll is done in a very controlled
@@ -1325,7 +1306,7 @@ static void path_count_init(void)
path_count[i] = 0;
 }
 
-static int reverse_path_check_proc(void *priv, void *cookie, int call_nests)
+static int reverse_path_check_proc(void *priv, void *cookie, int depth)
 {
int error = 0;
struct file *file = priv;
@@ -1341,13 +1322,13 @@ static int reverse_path_check_proc(void *priv, void *cookie, int call_nests)
child_file = epi->ep->file;
if (is_file_epoll(child_file)) {
if (list_empty(&child_file->f_ep_links)) {
-   if (path_count_inc(call_nests)) {
+   if (path_count_inc(depth)) {
error = -1;
break;
}
} else {
-   error = ep_call_nested(reverse_path_check_proc,
-   child_file, child_file);
+   error = reverse_path_check_proc(child_file, child_file,
+   depth + 1);
}
if (error != 0)
break;
@@ -1379,8 +1360,7 @@ static int reverse_path_check(void)
/* let's call this for all tfiles */
list_for_each_entry(current_file, &tfile_check_list, f_tfile_llink) {
path_count_init();
-   error = ep_call_nested(reverse_path_check_proc, current_file,
-   current_file);
+   error = reverse_path_check_proc(current_file, current_file, 0);
if (error)
break;
}
@@ -1886,8 +1866,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 }
 
 /**
- * ep_loop_check_proc - Callback function to be passed to the @ep_call_nested()
- *  API, to verify that adding an epoll file inside another
+ * ep_loop_check_proc - verify that adding an epoll file inside another
  *  epoll structure, does not violate the constraints, in
  *  terms of closed loops, or too deep chains (which can
  *  result in excessive stack usage).
@@ -1900,7 +1879,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
  * Returns: Returns zero if adding the epoll @file inside current epoll
  *  structure @ep does not violate the constraints, or -1 otherwise.
  */
-static int ep_loop_check_proc(void *priv, void *cookie, int call_nests)
+static int ep_loop_check_proc(void *priv, void *cookie, int depth)
 {
int error = 0;
struct file *file = priv;
@@ -1912,7 +1891,7 @@ static int ep_loop_check_proc(void *priv, void *cookie, int call_nests)
if (!ep_push_nested(cookie)) /* limits recursion */
return -1;
 
-   mutex_lock_nested(&ep->mtx, call_nests + 1);
+   mutex_lock_nested(&ep->mtx, depth + 1);
ep->gen = loop_check_gen;
for (rbp = rb_first_cached(&ep->rbr); rbp; rbp = rb_next(rbp)) {
epi = rb_entry(rbp, struct epitem, rbn);
@@ -1920,8 +1899,8 @@ static int ep_loop_check_proc(void *priv, void *cookie, int call_nests)
ep_tovisit = epi->ffd.file->private_data;
if (ep_tovisit->gen == loop_check_gen)
continue;
-   error = ep_call_nested(ep_loop_check_proc, epi->ffd.file,
-   ep_tovisit);
+   error = ep_loop_check_proc(epi->ffd.file, ep_tovisit,
+  

[RFC PATCH 14/27] ep_scan_ready_list(): prepare to splitup

2020-10-03 Thread Al Viro
From: Al Viro 

take the stuff done before and after the callback into separate helpers

Signed-off-by: Al Viro 
---
 fs/eventpoll.c | 63 +-
 1 file changed, 36 insertions(+), 27 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index e971e3ace557..eb012fdc152e 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -561,28 +561,10 @@ static inline void ep_pm_stay_awake_rcu(struct epitem *epi)
rcu_read_unlock();
 }
 
-/**
- * ep_scan_ready_list - Scans the ready list in a way that makes possible for
- *  the scan code, to call f_op->poll(). Also allows for
- *  O(NumReady) performance.
- *
- * @ep: Pointer to the epoll private data structure.
- * @sproc: Pointer to the scan callback.
- * @priv: Private opaque data passed to the @sproc callback.
- * @depth: The current depth of recursive f_op->poll calls.
- * @ep_locked: caller already holds ep->mtx
- *
- * Returns: The same integer error code returned by the @sproc callback.
- */
-static __poll_t ep_scan_ready_list(struct eventpoll *ep,
- __poll_t (*sproc)(struct eventpoll *,
-  struct list_head *, void *),
- void *priv, int depth, bool ep_locked)
+static void ep_start_scan(struct eventpoll *ep,
+ int depth, bool ep_locked,
+ struct list_head *txlist)
 {
-   __poll_t res;
-   struct epitem *epi, *nepi;
-   LIST_HEAD(txlist);
-
lockdep_assert_irqs_enabled();
 
/*
@@ -602,14 +584,16 @@ static __poll_t ep_scan_ready_list(struct eventpoll *ep,
 * in a lockless way.
 */
write_lock_irq(&ep->lock);
-   list_splice_init(&ep->rdllist, &txlist);
+   list_splice_init(&ep->rdllist, txlist);
WRITE_ONCE(ep->ovflist, NULL);
write_unlock_irq(&ep->lock);
+}
 
-   /*
-* Now call the callback function.
-*/
-   res = (*sproc)(ep, &txlist, priv);
+static void ep_done_scan(struct eventpoll *ep,
+int depth, bool ep_locked,
+struct list_head *txlist)
+{
+   struct epitem *epi, *nepi;
 
write_lock_irq(&ep->lock);
/*
@@ -644,13 +628,38 @@ static __poll_t ep_scan_ready_list(struct eventpoll *ep,
/*
 * Quickly re-inject items left on "txlist".
 */
-   list_splice(&txlist, &ep->rdllist);
+   list_splice(txlist, &ep->rdllist);
__pm_relax(ep->ws);
write_unlock_irq(&ep->lock);
 
if (!ep_locked)
mutex_unlock(&ep->mtx);
+}
 
+/**
+ * ep_scan_ready_list - Scans the ready list in a way that makes possible for
+ *  the scan code, to call f_op->poll(). Also allows for
+ *  O(NumReady) performance.
+ *
+ * @ep: Pointer to the epoll private data structure.
+ * @sproc: Pointer to the scan callback.
+ * @priv: Private opaque data passed to the @sproc callback.
+ * @depth: The current depth of recursive f_op->poll calls.
+ * @ep_locked: caller already holds ep->mtx
+ *
+ * Returns: The same integer error code returned by the @sproc callback.
+ */
+static __poll_t ep_scan_ready_list(struct eventpoll *ep,
+ __poll_t (*sproc)(struct eventpoll *,
+  struct list_head *, void *),
+ void *priv, int depth, bool ep_locked)
+{
+   __poll_t res;
+   LIST_HEAD(txlist);
+
+   ep_start_scan(ep, depth, ep_locked, &txlist);
+   res = (*sproc)(ep, &txlist, priv);
+   ep_done_scan(ep, depth, ep_locked, &txlist);
return res;
 }
 
-- 
2.11.0



[RFC PATCH 13/27] ep_loop_check_proc(): saner calling conventions

2020-10-03 Thread Al Viro
From: Al Viro 

1) 'cookie' argument is unused; kill it.
2) 'priv' one is always an epoll struct file, and we only care
about its associated struct eventpoll; pass that instead.

Signed-off-by: Al Viro 
---
 fs/eventpoll.c | 38 --
 1 file changed, 16 insertions(+), 22 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 6b1990b8b9a0..e971e3ace557 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1845,19 +1845,14 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
  *  result in excessive stack usage).
  *
  * @priv: Pointer to the epoll file to be currently checked.
- * @cookie: Original cookie for this call. This is the top-of-the-chain epoll
- *  data structure pointer.
- * @call_nests: Current dept of the @ep_call_nested() call stack.
+ * @depth: Current depth of the path being checked.
  *
  * Returns: Returns zero if adding the epoll @file inside current epoll
  *  structure @ep does not violate the constraints, or -1 otherwise.
  */
-static int ep_loop_check_proc(void *priv, void *cookie, int depth)
+static int ep_loop_check_proc(struct eventpoll *ep, int depth)
 {
int error = 0;
-   struct file *file = priv;
-   struct eventpoll *ep = file->private_data;
-   struct eventpoll *ep_tovisit;
struct rb_node *rbp;
struct epitem *epi;
 
@@ -1866,15 +1861,14 @@ static int ep_loop_check_proc(void *priv, void *cookie, int depth)
for (rbp = rb_first_cached(&ep->rbr); rbp; rbp = rb_next(rbp)) {
epi = rb_entry(rbp, struct epitem, rbn);
if (unlikely(is_file_epoll(epi->ffd.file))) {
+   struct eventpoll *ep_tovisit;
ep_tovisit = epi->ffd.file->private_data;
if (ep_tovisit->gen == loop_check_gen)
continue;
-   if (ep_tovisit == inserting_into || depth > EP_MAX_NESTS) {
+   if (ep_tovisit == inserting_into || depth > EP_MAX_NESTS)
error = -1;
-   } else {
-   error = ep_loop_check_proc(epi->ffd.file, ep_tovisit,
-  depth + 1);
-   }
+   else
+   error = ep_loop_check_proc(ep_tovisit, depth + 1);
if (error != 0)
break;
} else {
@@ -1899,20 +1893,20 @@ static int ep_loop_check_proc(void *priv, void *cookie, int depth)
 }
 
 /**
- * ep_loop_check - Performs a check to verify that adding an epoll file (@file)
- * another epoll file (represented by @ep) does not create
+ * ep_loop_check - Performs a check to verify that adding an epoll file (@to)
+ * into another epoll file (represented by @from) does not create
  * closed loops or too deep chains.
  *
- * @ep: Pointer to the epoll private data structure.
- * @file: Pointer to the epoll file to be checked.
+ * @from: Pointer to the epoll we are inserting into.
+ * @to: Pointer to the epoll to be inserted.
  *
- * Returns: Returns zero if adding the epoll @file inside current epoll
- *  structure @ep does not violate the constraints, or -1 otherwise.
+ * Returns: Returns zero if adding the epoll @to inside the epoll @from
+ * does not violate the constraints, or -1 otherwise.
  */
-static int ep_loop_check(struct eventpoll *ep, struct file *file)
+static int ep_loop_check(struct eventpoll *ep, struct eventpoll *to)
 {
inserting_into = ep;
-   return ep_loop_check_proc(file, ep, 0);
+   return ep_loop_check_proc(to, 0);
 }
 
 static void clear_tfile_check_list(void)
@@ -2086,8 +2080,9 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
loop_check_gen++;
full_check = 1;
if (is_file_epoll(tf.file)) {
+   tep = tf.file->private_data;
error = -ELOOP;
-   if (ep_loop_check(ep, tf.file) != 0)
+   if (ep_loop_check(ep, tep) != 0)
goto error_tgt_fput;
} else {
get_file(tf.file);
@@ -2098,7 +2093,6 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
if (error)
goto error_tgt_fput;
if (is_file_epoll(tf.file)) {
-   tep = tf.file->private_data;
error = epoll_mutex_lock(&tep->mtx, 1, nonblock);
if (error) {
mutex_unlock(&ep->mtx);
-- 
2.11.0



[RFC PATCH 26/27] epoll: massage the check list insertion

2020-10-03 Thread Al Viro
From: Al Viro 

in the "non-epoll target" cases do it in ep_insert() rather than
in do_epoll_ctl(), so that we do it only with some epitem is already
guaranteed to exist.

Signed-off-by: Al Viro 
---
 fs/eventpoll.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 8a7ad752befd..eea269670168 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1375,6 +1375,10 @@ static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
spin_lock(&tfile->f_lock);
hlist_add_head_rcu(&epi->fllink, &tfile->f_ep_links);
spin_unlock(&tfile->f_lock);
+   if (full_check && !tep) {
+   get_file(tfile);
+   list_add(&tfile->f_tfile_llink, &tfile_check_list);
+   }
 
/*
 * Add the current item to the RB tree. All RB tree operations are
@@ -2013,10 +2017,6 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
error = -ELOOP;
if (ep_loop_check(ep, tep) != 0)
goto error_tgt_fput;
-   } else {
-   get_file(tf.file);
-   list_add(&tf.file->f_tfile_llink,
-   &tfile_check_list);
}
error = epoll_mutex_lock(>mtx, 0, nonblock);
if (error)
-- 
2.11.0



[RFC PATCH 18/27] lift locking/unlocking ep->mtx out of ep_{start,done}_scan()

2020-10-03 Thread Al Viro
From: Al Viro 

get rid of the depth/ep_locked arguments there and document
the kludge in ep_item_poll() that has led to ep_locked's existence in
the first place

Signed-off-by: Al Viro 
---
 fs/eventpoll.c | 57 ++---
 1 file changed, 26 insertions(+), 31 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index ac996b959e94..f9c567af1f5f 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -554,20 +554,13 @@ static inline void ep_pm_stay_awake_rcu(struct epitem *epi)
rcu_read_unlock();
 }
 
-static void ep_start_scan(struct eventpoll *ep,
- int depth, bool ep_locked,
- struct list_head *txlist)
-{
-   lockdep_assert_irqs_enabled();
-
-   /*
-* We need to lock this because we could be hit by
-* eventpoll_release_file() and epoll_ctl().
-*/
-
-   if (!ep_locked)
-   mutex_lock_nested(&ep->mtx, depth);
 
+/*
+ * ep->mutex needs to be held because we could be hit by
+ * eventpoll_release_file() and epoll_ctl().
+ */
+static void ep_start_scan(struct eventpoll *ep, struct list_head *txlist)
+{
/*
 * Steal the ready list, and re-init the original one to the
 * empty list. Also, set ep->ovflist to NULL so that events
@@ -576,6 +569,7 @@ static void ep_start_scan(struct eventpoll *ep,
 * because we want the "sproc" callback to be able to do it
 * in a lockless way.
 */
+   lockdep_assert_irqs_enabled();
write_lock_irq(&ep->lock);
list_splice_init(&ep->rdllist, txlist);
WRITE_ONCE(ep->ovflist, NULL);
@@ -583,7 +577,6 @@ static void ep_start_scan(struct eventpoll *ep,
 }
 
 static void ep_done_scan(struct eventpoll *ep,
-int depth, bool ep_locked,
 struct list_head *txlist)
 {
struct epitem *epi, *nepi;
@@ -624,9 +617,6 @@ static void ep_done_scan(struct eventpoll *ep,
list_splice(txlist, &ep->rdllist);
__pm_relax(ep->ws);
write_unlock_irq(&ep->lock);
-
-   if (!ep_locked)
-   mutex_unlock(&ep->mtx);
 }
 
 static void epi_rcu_free(struct rcu_head *head)
@@ -763,11 +753,16 @@ static __poll_t ep_item_poll(const struct epitem *epi, poll_table *pt,
 
ep = epi->ffd.file->private_data;
poll_wait(epi->ffd.file, &ep->poll_wait, pt);
-   locked = pt && (pt->_qproc == ep_ptable_queue_proc);
 
-   ep_start_scan(ep, depth, locked, &txlist);
+   // kludge: ep_insert() calls us with ep->mtx already locked
+   locked = pt && (pt->_qproc == ep_ptable_queue_proc);
+   if (!locked)
+   mutex_lock_nested(&ep->mtx, depth);
+   ep_start_scan(ep, &txlist);
res = ep_read_events_proc(ep, &txlist, depth + 1);
-   ep_done_scan(ep, depth, locked, &txlist);
+   ep_done_scan(ep, &txlist);
+   if (!locked)
+   mutex_unlock(&ep->mtx);
return res & epi->event.events;
 }
 
@@ -809,9 +804,11 @@ static __poll_t ep_eventpoll_poll(struct file *file, poll_table *wait)
 * Proceed to find out if wanted events are really available inside
 * the ready list.
 */
-   ep_start_scan(ep, 0, false, &txlist);
+   mutex_lock(&ep->mtx);
+   ep_start_scan(ep, &txlist);
res = ep_read_events_proc(ep, &txlist, 1);
-   ep_done_scan(ep, 0, false, &txlist);
+   ep_done_scan(ep, &txlist);
+   mutex_unlock(&ep->mtx);
return res;
 }
 
@@ -1573,15 +1570,13 @@ static int ep_send_events(struct eventpoll *ep,
 
init_poll_funcptr(&pt, NULL);
 
-   ep_start_scan(ep, 0, false, &txlist);
+   mutex_lock(&ep->mtx);
+   ep_start_scan(ep, &txlist);
 
/*
 * We can loop without lock because we are passed a task private list.
-* Items cannot vanish during the loop because ep_scan_ready_list() is
-* holding "mtx" during this call.
+* Items cannot vanish during the loop we are holding ep->mtx.
 */
-   lockdep_assert_held(&ep->mtx);
-
list_for_each_entry_safe(epi, tmp, &txlist, rdllink) {
struct wakeup_source *ws;
__poll_t revents;
@@ -1609,9 +1604,8 @@ static int ep_send_events(struct eventpoll *ep,
 
/*
 * If the event mask intersect the caller-requested one,
-* deliver the event to userspace. Again, ep_scan_ready_list()
-* is holding ep->mtx, so no operations coming from userspace
-* can change the item.
+* deliver the event to userspace. Again, we are holding ep->mtx,
+* so no operations coming from userspace can change the item.
 */
revents = ep_item_poll(epi, &pt, 1);
if (!revents)
@@ -1645,7 +1639,8 @@ static int ep_send_events(struct eventpoll *ep,
ep_pm_stay_awake(epi);
}
}
-   ep_done_scan(ep, 0, false, &txlist);
+   ep_done_scan(ep, &txlist);
+   mutex_unlock(&ep->mtx);
 
return res;
 }
-- 
2.11.0



[RFC PATCH 04/27] untangling ep_call_nested(): it's all serialized on epmutex.

2020-10-03 Thread Al Viro
From: Al Viro 

IOW,
* no locking is needed to protect the list
* the list is actually a stack
* no need to check ->ctx
* it can bloody well be a static 5-element array - nobody is
going to be accessing it in parallel.

Signed-off-by: Al Viro 
---
 fs/eventpoll.c | 80 --
 1 file changed, 11 insertions(+), 69 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index ef73d71a5dc8..43aecae0935c 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -109,25 +109,6 @@ struct epoll_filefd {
int fd;
 } __packed;
 
-/*
- * Structure used to track possible nested calls, for too deep recursions
- * and loop cycles.
- */
-struct nested_call_node {
-   struct list_head llink;
-   void *cookie;
-   void *ctx;
-};
-
-/*
- * This structure is used as collector for nested calls, to check for
- * maximum recursion dept and loop cycles.
- */
-struct nested_calls {
-   struct list_head tasks_call_list;
-   spinlock_t lock;
-};
-
 /* Wait structure used by the poll hooks */
 struct eppoll_entry {
/* List header used to link this structure to the "struct epitem" */
@@ -273,7 +254,8 @@ static DEFINE_MUTEX(epmutex);
 static u64 loop_check_gen = 0;
 
 /* Used to check for epoll file descriptor inclusion loops */
-static struct nested_calls poll_loop_ncalls;
+static void *cookies[EP_MAX_NESTS + 1];
+static int nesting;
 
 /* Slab cache used to allocate "struct epitem" */
 static struct kmem_cache *epi_cache __read_mostly;
@@ -348,13 +330,6 @@ static inline struct epitem *ep_item_from_wait(wait_queue_entry_t *p)
return container_of(p, struct eppoll_entry, wait)->base;
 }
 
-/* Initialize the poll safe wake up structure */
-static void ep_nested_calls_init(struct nested_calls *ncalls)
-{
-   INIT_LIST_HEAD(>tasks_call_list);
-   spin_lock_init(>lock);
-}
-
 /**
  * ep_events_available - Checks if ready events might be available.
  *
@@ -465,47 +440,20 @@ static inline void ep_set_busy_poll_napi_id(struct epitem *epi)
 static int ep_call_nested(int (*nproc)(void *, void *, int), void *priv,
  void *cookie)
 {
-   int error, call_nests = 0;
-   unsigned long flags;
-   struct nested_calls *ncalls = &poll_loop_ncalls;
-   struct list_head *lsthead = &ncalls->tasks_call_list;
-   struct nested_call_node *tncur;
-   struct nested_call_node tnode;
+   int error, i;
 
-   spin_lock_irqsave(&ncalls->lock, flags);
+   if (nesting > EP_MAX_NESTS) /* too deep nesting */
+   return -1;
 
-   /*
-* Try to see if the current task is already inside this wakeup call.
-* We use a list here, since the population inside this set is always
-* very much limited.
-*/
-   list_for_each_entry(tncur, lsthead, llink) {
-   if (tncur->ctx == current &&
-   (tncur->cookie == cookie || ++call_nests > EP_MAX_NESTS)) {
-   /*
-* Ops ... loop detected or maximum nest level reached.
-* We abort this wake by breaking the cycle itself.
-*/
-   error = -1;
-   goto out_unlock;
-   }
+   for (i = 0; i < nesting; i++) {
+   if (cookies[i] == cookie) /* loop detected */
+   return -1;
}
-
-   /* Add the current task and cookie to the list */
-   tnode.ctx = current;
-   tnode.cookie = cookie;
-   list_add(&tnode.llink, lsthead);
-
-   spin_unlock_irqrestore(&ncalls->lock, flags);
+   cookies[nesting++] = cookie;
 
/* Call the nested function */
-   error = (*nproc)(priv, cookie, call_nests);
-
-   /* Remove the current task from the list */
-   spin_lock_irqsave(&ncalls->lock, flags);
-   list_del(&tnode.llink);
-out_unlock:
-   spin_unlock_irqrestore(&ncalls->lock, flags);
+   error = (*nproc)(priv, cookie, nesting - 1);
+   nesting--;
 
return error;
 }
@@ -2380,12 +2328,6 @@ static int __init eventpoll_init(void)
BUG_ON(max_user_watches < 0);
 
/*
-* Initialize the structure used to perform epoll file descriptor
-* inclusion loops checks.
-*/
-   ep_nested_calls_init(&poll_loop_ncalls);
-
-   /*
 * We can have many thousands of epitems, so prevent this from
 * using an extra cache line on 64-bit (and smaller) CPUs
 */
-- 
2.11.0



[RFC PATCH 01/27] epoll: switch epitem->pwqlist to single-linked list

2020-10-03 Thread Al Viro
From: Al Viro 

We only traverse it once to destroy all associated eppoll_entry at
epitem destruction time.  The order of traversal is irrelevant there.

Signed-off-by: Al Viro 
---
 fs/eventpoll.c | 51 +--
 1 file changed, 25 insertions(+), 26 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 4df61129566d..ae41868d9b35 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -128,6 +128,24 @@ struct nested_calls {
spinlock_t lock;
 };
 
+/* Wait structure used by the poll hooks */
+struct eppoll_entry {
+   /* List header used to link this structure to the "struct epitem" */
+   struct eppoll_entry *next;
+
+   /* The "base" pointer is set to the container "struct epitem" */
+   struct epitem *base;
+
+   /*
+* Wait queue item that will be linked to the target file wait
+* queue head.
+*/
+   wait_queue_entry_t wait;
+
+   /* The wait queue head that linked the "wait" wait queue item */
+   wait_queue_head_t *whead;
+};
+
 /*
  * Each file descriptor added to the eventpoll interface will
  * have an entry of this type linked to the "rbr" RB tree.
@@ -158,7 +176,7 @@ struct epitem {
int nwait;
 
/* List containing poll wait queues */
-   struct list_head pwqlist;
+   struct eppoll_entry *pwqlist;
 
/* The "container" of this item */
struct eventpoll *ep;
@@ -231,24 +249,6 @@ struct eventpoll {
 #endif
 };
 
-/* Wait structure used by the poll hooks */
-struct eppoll_entry {
-   /* List header used to link this structure to the "struct epitem" */
-   struct list_head llink;
-
-   /* The "base" pointer is set to the container "struct epitem" */
-   struct epitem *base;
-
-   /*
-* Wait queue item that will be linked to the target file wait
-* queue head.
-*/
-   wait_queue_entry_t wait;
-
-   /* The wait queue head that linked the "wait" wait queue item */
-   wait_queue_head_t *whead;
-};
-
 /* Wrapper struct used by poll queueing */
 struct ep_pqueue {
poll_table pt;
@@ -617,13 +617,11 @@ static void ep_remove_wait_queue(struct eppoll_entry *pwq)
  */
 static void ep_unregister_pollwait(struct eventpoll *ep, struct epitem *epi)
 {
-   struct list_head *lsthead = &epi->pwqlist;
+   struct eppoll_entry **p = &epi->pwqlist;
struct eppoll_entry *pwq;
 
-   while (!list_empty(lsthead)) {
-   pwq = list_first_entry(lsthead, struct eppoll_entry, llink);
-
-   list_del(&pwq->llink);
+   while ((pwq = *p) != NULL) {
+   *p = pwq->next;
ep_remove_wait_queue(pwq);
kmem_cache_free(pwq_cache, pwq);
}
@@ -1320,7 +1318,8 @@ static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
add_wait_queue_exclusive(whead, &pwq->wait);
else
add_wait_queue(whead, &pwq->wait);
-   list_add_tail(&pwq->llink, &epi->pwqlist);
+   pwq->next = epi->pwqlist;
+   epi->pwqlist = pwq;
epi->nwait++;
} else {
/* We have to signal that an error occurred */
@@ -1507,7 +1506,7 @@ static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
/* Item initialization follow here ... */
INIT_LIST_HEAD(&epi->rdllink);
INIT_LIST_HEAD(&epi->fllink);
-   INIT_LIST_HEAD(&epi->pwqlist);
+   epi->pwqlist = NULL;
epi->ep = ep;
ep_set_ffd(&epi->ffd, tfile, fd);
epi->event = *event;
-- 
2.11.0



[RFC PATCH 03/27] untangling ep_call_nested(): get rid of useless arguments

2020-10-03 Thread Al Viro
From: Al Viro 

ctx is always equal to current, ncalls - to &poll_loop_ncalls.

Signed-off-by: Al Viro 
---
 fs/eventpoll.c | 31 ---
 1 file changed, 12 insertions(+), 19 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 44aca681d897..ef73d71a5dc8 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -455,21 +455,19 @@ static inline void ep_set_busy_poll_napi_id(struct epitem *epi)
  *  the same nested call (by the meaning of same cookie) is
  *  no re-entered.
  *
- * @ncalls: Pointer to the nested_calls structure to be used for this call.
  * @nproc: Nested call core function pointer.
  * @priv: Opaque data to be passed to the @nproc callback.
  * @cookie: Cookie to be used to identify this nested call.
- * @ctx: This instance context.
  *
  * Returns: Returns the code returned by the @nproc callback, or -1 if
  *  the maximum recursion limit has been exceeded.
  */
-static int ep_call_nested(struct nested_calls *ncalls,
- int (*nproc)(void *, void *, int), void *priv,
- void *cookie, void *ctx)
+static int ep_call_nested(int (*nproc)(void *, void *, int), void *priv,
+ void *cookie)
 {
int error, call_nests = 0;
unsigned long flags;
+   struct nested_calls *ncalls = &poll_loop_ncalls;
struct list_head *lsthead = &ncalls->tasks_call_list;
struct nested_call_node *tncur;
struct nested_call_node tnode;
@@ -482,7 +480,7 @@ static int ep_call_nested(struct nested_calls *ncalls,
 * very much limited.
 */
list_for_each_entry(tncur, lsthead, llink) {
-   if (tncur->ctx == ctx &&
+   if (tncur->ctx == current &&
(tncur->cookie == cookie || ++call_nests > EP_MAX_NESTS)) {
/*
 * Ops ... loop detected or maximum nest level reached.
@@ -494,7 +492,7 @@ static int ep_call_nested(struct nested_calls *ncalls,
}
 
/* Add the current task and cookie to the list */
-   tnode.ctx = ctx;
+   tnode.ctx = current;
tnode.cookie = cookie;
list_add(&tnode.llink, lsthead);
 
@@ -1397,10 +1395,8 @@ static int reverse_path_check_proc(void *priv, void 
*cookie, int call_nests)
break;
}
} else {
-   error = ep_call_nested(&poll_loop_ncalls,
-   reverse_path_check_proc,
-   child_file, child_file,
-   current);
+   error = ep_call_nested(reverse_path_check_proc,
+   child_file, child_file);
}
if (error != 0)
break;
@@ -1431,9 +1427,8 @@ static int reverse_path_check(void)
/* let's call this for all tfiles */
list_for_each_entry(current_file, _check_list, f_tfile_llink) {
path_count_init();
-   error = ep_call_nested(&poll_loop_ncalls,
-   reverse_path_check_proc, current_file,
-   current_file, current);
+   error = ep_call_nested(reverse_path_check_proc, current_file,
+   current_file);
if (error)
break;
}
@@ -1970,9 +1965,8 @@ static int ep_loop_check_proc(void *priv, void *cookie, 
int call_nests)
ep_tovisit = epi->ffd.file->private_data;
if (ep_tovisit->gen == loop_check_gen)
continue;
-   error = ep_call_nested(&poll_loop_ncalls,
-   ep_loop_check_proc, epi->ffd.file,
-   ep_tovisit, current);
+   error = ep_call_nested(ep_loop_check_proc, 
epi->ffd.file,
+   ep_tovisit);
if (error != 0)
break;
} else {
@@ -2009,8 +2003,7 @@ static int ep_loop_check_proc(void *priv, void *cookie, 
int call_nests)
  */
 static int ep_loop_check(struct eventpoll *ep, struct file *file)
 {
-   return ep_call_nested(&poll_loop_ncalls,
- ep_loop_check_proc, file, ep, current);
+   return ep_call_nested(ep_loop_check_proc, file, ep);
 }
 
 static void clear_tfile_check_list(void)
-- 
2.11.0



[RFC PATCH 15/27] lift the calls of ep_read_events_proc() into the callers

2020-10-03 Thread Al Viro
From: Al Viro 

Expand the calls of ep_scan_ready_list() that get ep_read_events_proc().
As a side benefit we can pass depth to ep_read_events_proc() by value
and not by address - the latter used to be forced by the signature
expected from ep_scan_ready_list() callback.

Signed-off-by: Al Viro 
---
 fs/eventpoll.c | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index eb012fdc152e..9b9e29e0c85f 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -774,7 +774,7 @@ static int ep_eventpoll_release(struct inode *inode, struct 
file *file)
 }
 
 static __poll_t ep_read_events_proc(struct eventpoll *ep, struct list_head 
*head,
-  void *priv);
+  int depth);
 static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
 poll_table *pt);
 
@@ -787,6 +787,8 @@ static __poll_t ep_item_poll(const struct epitem *epi, 
poll_table *pt,
 int depth)
 {
struct eventpoll *ep;
+   LIST_HEAD(txlist);
+   __poll_t res;
bool locked;
 
pt->_key = epi->event.events;
@@ -797,20 +799,19 @@ static __poll_t ep_item_poll(const struct epitem *epi, 
poll_table *pt,
poll_wait(epi->ffd.file, &ep->poll_wait, pt);
locked = pt && (pt->_qproc == ep_ptable_queue_proc);
 
-   return ep_scan_ready_list(epi->ffd.file->private_data,
- ep_read_events_proc, &depth, depth,
- locked) & epi->event.events;
+   ep_start_scan(ep, depth, locked, &txlist);
+   res = ep_read_events_proc(ep, &txlist, depth + 1);
+   ep_done_scan(ep, depth, locked, &txlist);
+   return res & epi->event.events;
 }
 
 static __poll_t ep_read_events_proc(struct eventpoll *ep, struct list_head 
*head,
-  void *priv)
+  int depth)
 {
struct epitem *epi, *tmp;
poll_table pt;
-   int depth = *(int *)priv;
 
init_poll_funcptr(&pt, NULL);
-   depth++;
 
list_for_each_entry_safe(epi, tmp, head, rdllink) {
if (ep_item_poll(epi, &pt, depth)) {
@@ -832,7 +833,8 @@ static __poll_t ep_read_events_proc(struct eventpoll *ep, 
struct list_head *head
 static __poll_t ep_eventpoll_poll(struct file *file, poll_table *wait)
 {
struct eventpoll *ep = file->private_data;
-   int depth = 0;
+   LIST_HEAD(txlist);
+   __poll_t res;
 
/* Insert inside our poll wait queue */
poll_wait(file, &ep->poll_wait, wait);
@@ -841,8 +843,10 @@ static __poll_t ep_eventpoll_poll(struct file *file, 
poll_table *wait)
 * Proceed to find out if wanted events are really available inside
 * the ready list.
 */
-   return ep_scan_ready_list(ep, ep_read_events_proc,
- &depth, depth, false);
+   ep_start_scan(ep, 0, false, &txlist);
+   res = ep_read_events_proc(ep, &txlist, 1);
+   ep_done_scan(ep, 0, false, &txlist);
+   return res;
 }
 
 #ifdef CONFIG_PROC_FS
-- 
2.11.0
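The shape of this transformation - one callback-driven entry point expanded into explicit begin/work/end calls, letting the worker take its real argument types instead of void * - can be sketched outside the kernel. All names below are illustrative, not the kernel's:

```c
#include <assert.h>

static int lock_depth;                  /* stands in for ep->mtx bookkeeping */

static void scan_begin(void) { lock_depth++; }
static void scan_end(void)   { lock_depth--; }

/* Old shape: one entry point, behaviour injected via a callback with
 * a prototype forced on every user. */
static int scan_with_callback(int (*proc)(int), int arg)
{
    int res;

    scan_begin();
    res = proc(arg);
    scan_end();
    return res;
}

static int double_it(int x) { return 2 * x; }

/* New shape: the caller sequences begin/work/end itself, so the worker
 * keeps its natural signature (here an int by value, not a void *). */
static int scan_expanded(int x)
{
    int res;

    scan_begin();
    res = double_it(x);
    scan_end();
    return res;
}
```

Both shapes compute the same result; the expanded one removes the indirect call and the prototype contortions, which is exactly the benefit claimed for ep_read_events_proc().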



[RFC PATCH 17/27] ep_send_events_proc(): fold into the caller

2020-10-03 Thread Al Viro
From: Al Viro 

... and get rid of struct ep_send_events_data - not needed anymore.
The weird way of passing the arguments in (and real return value
out - nominal return value of ep_send_events_proc() is ignored)
was due to the signature forced on ep_scan_ready_list() callbacks.

Signed-off-by: Al Viro 
---
 fs/eventpoll.c | 60 --
 1 file changed, 20 insertions(+), 40 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 3b3a862f8014..ac996b959e94 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -233,13 +233,6 @@ struct ep_pqueue {
struct epitem *epi;
 };
 
-/* Used by the ep_send_events() function as callback private data */
-struct ep_send_events_data {
-   int maxevents;
-   struct epoll_event __user *events;
-   int res;
-};
-
 /*
  * Configuration options available inside /proc/sys/fs/epoll/
  */
@@ -1570,18 +1563,17 @@ static int ep_modify(struct eventpoll *ep, struct 
epitem *epi,
return 0;
 }
 
-static __poll_t ep_send_events_proc(struct eventpoll *ep, struct list_head 
*head,
-  void *priv)
+static int ep_send_events(struct eventpoll *ep,
+ struct epoll_event __user *events, int maxevents)
 {
-   struct ep_send_events_data *esed = priv;
-   __poll_t revents;
struct epitem *epi, *tmp;
-   struct epoll_event __user *uevent = esed->events;
-   struct wakeup_source *ws;
+   LIST_HEAD(txlist);
poll_table pt;
+   int res = 0;
 
init_poll_funcptr(&pt, NULL);
-   esed->res = 0;
+
+   ep_start_scan(ep, 0, false, &txlist);
 
/*
 * We can loop without lock because we are passed a task private list.
@@ -1590,8 +1582,11 @@ static __poll_t ep_send_events_proc(struct eventpoll 
*ep, struct list_head *head
 */
lockdep_assert_held(&ep->mtx);
 
-   list_for_each_entry_safe(epi, tmp, head, rdllink) {
-   if (esed->res >= esed->maxevents)
+   list_for_each_entry_safe(epi, tmp, &txlist, rdllink) {
+   struct wakeup_source *ws;
+   __poll_t revents;
+
+   if (res >= maxevents)
break;
 
/*
@@ -1622,16 +1617,16 @@ static __poll_t ep_send_events_proc(struct eventpoll 
*ep, struct list_head *head
if (!revents)
continue;
 
-   if (__put_user(revents, &uevent->events) ||
-   __put_user(epi->event.data, &uevent->data)) {
-   list_add(&epi->rdllink, head);
+   if (__put_user(revents, &events->events) ||
+   __put_user(epi->event.data, &events->data)) {
+   list_add(&epi->rdllink, &txlist);
ep_pm_stay_awake(epi);
-   if (!esed->res)
-   esed->res = -EFAULT;
-   return 0;
+   if (!res)
+   res = -EFAULT;
+   break;
}
-   esed->res++;
-   uevent++;
+   res++;
+   events++;
if (epi->event.events & EPOLLONESHOT)
epi->event.events &= EP_PRIVATE_BITS;
else if (!(epi->event.events & EPOLLET)) {
@@ -1650,24 +1645,9 @@ static __poll_t ep_send_events_proc(struct eventpoll 
*ep, struct list_head *head
ep_pm_stay_awake(epi);
}
}
-
-   return 0;
-}
-
-static int ep_send_events(struct eventpoll *ep,
- struct epoll_event __user *events, int maxevents)
-{
-   struct ep_send_events_data esed;
-   LIST_HEAD(txlist);
-
-   esed.maxevents = maxevents;
-   esed.events = events;
-
-   ep_start_scan(ep, 0, false, &txlist);
-   ep_send_events_proc(ep, &txlist, &esed);
ep_done_scan(ep, 0, false, &txlist);
 
-   return esed.res;
+   return res;
 }
 
 static inline struct timespec64 ep_set_mstimeout(long ms)
-- 
2.11.0
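One detail worth noting in the folded loop is the error convention around __put_user() failures: once at least one event has been delivered, a fault stops the loop but the count already delivered is returned, and -EFAULT is reported only when nothing was. A toy userspace model of that convention (deliver() and the simulated fault are made up for illustration):

```c
#include <assert.h>

/* Copy up to n ints from src to dst; fail_at simulates a __put_user()
 * fault on that index (-1 for no fault). Returns the count delivered,
 * or -14 (-EFAULT) only if the fault hit before anything was copied. */
static int deliver(const int *src, int *dst, int n, int fail_at)
{
    int res = 0;

    for (int i = 0; i < n; i++) {
        if (i == fail_at) {             /* simulated copy-out fault */
            if (!res)
                res = -14;              /* -EFAULT */
            break;
        }
        dst[i] = src[i];
        res++;
    }
    return res;
}
```

Reporting partial progress instead of a late error matters to userspace: events already copied out must not be counted as lost.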



[RFC][PATCHSET] epoll cleanups

2020-10-03 Thread Al Viro
Locking and especially control flow in fs/eventpoll.c is
overcomplicated.  As a result, the code has been hard to follow
and easy to fuck up while modifying.

The following series attempts to untangle it; there's more to be done
there, but this should take care of some of the obfuscated bits.  It also
reduces the memory footprint of that thing.

The series lives in
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #experimental.epoll
and it survives light local beating.  It really needs review and testing.
I'll post the individual patches in followups (27 commits, trimming about 120
lines out of fs/eventpoll.c).

First we trim struct epitem a bit:
(1/27) epoll: switch epitem->pwqlist to single-linked list
struct epitem has an associated set of struct eppoll_entry.
It's populated once (at epitem creation), traversed once (at epitem
destruction) and the order of elements does not matter.  No need
to bother with a cyclic list when a single-linked one will work
just fine.
NB: it might make sense to embed the first (and almost always
the only) element of that list into struct epitem itself.  Not in
this series yet.
(2/27) epoll: get rid of epitem->nwait
All it's used for is a rather convoluted mechanism for reporting
eppoll_entry allocation failures to ep_insert() at epitem creation time.
Can be done in a simpler way...

Getting rid of ep_call_nested().  The thing used to have a much
wider use (including being called from wakeup callbacks, etc.); as it is,
it's greatly overcomplicated.  First of all, let's simplify the control
flow there:
(3/27) untangling ep_call_nested(): get rid of useless arguments
Two of the arguments are always the same.  Kill 'em.
(4/27) untangling ep_call_nested(): it's all serialized on epmutex.
It's fully serialized (the remaining calls, that is), which
allows us to simplify things quite a bit - instead of a list of structures
in stack frames of recursive calls (possibly from many threads at the
same time, so they can be interspersed), all we really need is
a static array.  And very little of those structs is actually needed -
we don't need to separate the ones from different threads, etc.
(5/27) untangling ep_call_nested(): take pushing cookie into a helper
ep_call_nested() does three things: it checks that the recursion
depth is not too large, it adds a pointer to the prohibited set (and fails
if it's already been there) and it calls a callback.  Take handling of
prohibited set into a helper.
(6/27) untangling ep_call_nested(): move push/pop of cookie into the callbacks
... and move the calls of that helper into the callbacks - all two
of them.  At that point ep_call_nested() has been reduced to an indirect
function call.
(7/27) untangling ep_call_nested(): and there was much rejoicing

Besides the obfuscated control flow, ep_call_nested() used to have
another nasty effect - the callbacks had been procrusted into the
prototype expected by ep_call_nested().  Now we can untangle those:
(8/27) reverse_path_check_proc(): sane arguments
'priv' and 'cookie' arguments are always equal here, and both are
actually struct file *, not void *.
(9/27) reverse_path_check_proc(): don't bother with cookies
all we needed from ep_call_nested() was the recursion depth limit;
we have already checked for loops by the time we call it.
(10/27) clean reverse_path_check_proc() a bit
(11/27) ep_loop_check_proc(): lift pushing the cookie into callers
move maintaining the prohibited set into the caller; that way
we don't need the 'cookie' argument.
(12/27) get rid of ep_push_nested()
... and we don't really need the prohibited _set_ - we are adding
an edge to an acyclic graph and we want to verify that it won't create a loop.
Sure, we need to walk through the nodes reachable from the destination of
the edge to be, but all we need to verify is that the source of that edge is
not among them.  IOW, we only need to check against *one* prohibited node.
That kills the last remnants of ep_call_nested().
(13/27) ep_loop_check_proc(): saner calling conventions
'cookie' is not used, 'priv' is actually a epoll struct file *and*
we only care about associated struct eventpoll.  So pass that instead.

Next source of obfuscation (and indirect function calls; I like
Haskell, but this is C and it wouldn't have made a good higher order
function anyway) is ep_scan_ready_list().  We start with splitting the
parts before and after the call of callback into new helpers and expanding
the calls of ep_scan_ready_list(), with the callbacks now called directly.

(14/27) ep_scan_ready_list(): prepare to splitup
new helpers
(15/27) lift the calls of ep_read_events_proc() into the callers
expand the calls of ep_scan_ready_list() that get ep_read_events_proc
as callback.  Allows for somewhat saner calling conventions for
ep_read_events_proc() (passing &depth as void *, casting

Re: Where is the declaration of buffer used in kernel_param_ops .get functions?

2020-10-03 Thread Joe Perches
On Sun, 2020-10-04 at 02:36 +0100, Matthew Wilcox wrote:
> On Sat, Oct 03, 2020 at 06:19:18PM -0700, Joe Perches wrote:
> > These patches came up because I was looking for
> > the location of the declaration of the buffer used
> > in kernel/params.c struct kernel_param_ops .get
> > functions.
> > 
> > I didn't find it.
> > 
> > I want to see if it's appropriate to convert the
> > sprintf family of functions used in these .get
> > functions to sysfs_emit.
> > 
> > Patches submitted here:
> > https://lore.kernel.org/lkml/5d606519698ce4c8f1203a2b35797d8254c6050a.1600285923.git@perches.com/T/
> > 
> > Anyone know if it's appropriate to change the
> > sprintf-like uses in these functions to sysfs_emit
> > and/or sysfs_emit_at?
> 
> There's a lot of preprocessor magic to wade through.
> 
> I'm pretty sure this comes through include/linux/moduleparam.h
> and kernel/module.c.

Dunno, looked there, still can't find it.

btw:

The __module_param_call macro looks very dodgy
as it uses both __used and __attribute__((unused))
and likely one of them should be removed (unused?)

It looks like this comes from varying definitions of
__attribute_used__ eventually converted to __used 
for old gcc versions 2, 3, and 4.
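For what it's worth, __attribute__((used)) on its own is enough both to force emission of an otherwise-unreferenced static object and to keep -Wunused quiet about it, which is why stacking unused on top adds nothing. A userspace sketch (the kernel's "__param" section placement and macro plumbing are omitted; struct param and the names here are made up):

```c
#include <assert.h>

/* Toy stand-in for struct kernel_param. */
struct param {
    const char *name;
    int perm;
};

/* "used" forces the compiler to emit this object even though nothing in
 * this translation unit references it by name, and it also suppresses
 * unused-variable warnings - so adding "unused" would be redundant.
 * The kernel additionally pins such objects into a dedicated section so
 * they can be enumerated at boot. */
static const struct param demo_param
    __attribute__((used, aligned(sizeof(void *)))) = { "demo", 0444 };

/* A reader can still reach the object - in the kernel, by walking the
 * section; here, simply by taking its address. */
const struct param *lookup_demo(void)
{
    return &demo_param;
}
```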

1da177e4c3f4:include/linux/compiler-gcc2.h:#define __attribute_used__   
__attribute__((__unused__))
1da177e4c3f4:include/linux/compiler-gcc3.h:# define __attribute_used__  
__attribute__((__used__))
1da177e4c3f4:include/linux/compiler-gcc3.h:# define __attribute_used__  
__attribute__((__unused__))
1da177e4c3f4:include/linux/compiler-gcc4.h:#define __attribute_used__   
__attribute__((__used__))

Maybe:

---
 include/linux/moduleparam.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/moduleparam.h b/include/linux/moduleparam.h
index 47879fc7f75e..fc820b27fb00 100644
--- a/include/linux/moduleparam.h
+++ b/include/linux/moduleparam.h
@@ -288,10 +288,10 @@ struct kparam_array
/* Default value instead of permissions? */ \
static const char __param_str_##name[] = prefix #name;  \
static struct kernel_param __moduleparam_const __param_##name   \
-   __used  \
-__attribute__ ((unused,__section__ ("__param"),aligned(sizeof(void *)))) \
-   = { __param_str_##name, THIS_MODULE, ops,   \
-   VERIFY_OCTAL_PERMISSIONS(perm), level, flags, { arg } }
+   __used __section("__param") __aligned(sizeof(void *)) = {   \
+   __param_str_##name, THIS_MODULE, ops,   \
+   VERIFY_OCTAL_PERMISSIONS(perm), level, flags, { arg }   \
+   }
 
 /* Obsolete - use module_param_cb() */
 #define module_param_call(name, _set, _get, arg, perm) \




arm-linux-gnueabi-ld: drivers/gpu/drm/bridge/sil-sii8620.c:2191: undefined reference to `extcon_register_notifier'

2020-10-03 Thread kernel test robot
Hi Masahiro,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   22fbc037cd32e4e6771d2271b565806cfb8c134c
commit: def2fbffe62c00c330c7f41584a356001179c59c kconfig: allow symbols implied 
by y to become m
date:   7 months ago
config: arm-randconfig-p001-20201004 (attached as .config)
compiler: arm-linux-gnueabi-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=def2fbffe62c00c330c7f41584a356001179c59c
git remote add linus 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git fetch --no-tags linus master
git checkout def2fbffe62c00c330c7f41584a356001179c59c
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arm 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   arm-linux-gnueabi-ld: drivers/gpu/drm/bridge/sil-sii8620.o: in function 
`sii8620_remove':
   drivers/gpu/drm/bridge/sil-sii8620.c:2355: undefined reference to 
`extcon_unregister_notifier'
   arm-linux-gnueabi-ld: drivers/gpu/drm/bridge/sil-sii8620.o: in function 
`sii8620_extcon_init':
   drivers/gpu/drm/bridge/sil-sii8620.c:2179: undefined reference to 
`extcon_find_edev_by_node'
>> arm-linux-gnueabi-ld: drivers/gpu/drm/bridge/sil-sii8620.c:2191: undefined 
>> reference to `extcon_register_notifier'
   arm-linux-gnueabi-ld: drivers/gpu/drm/bridge/sil-sii8620.o: in function 
`sii8620_extcon_work':
   drivers/gpu/drm/bridge/sil-sii8620.c:2139: undefined reference to 
`extcon_get_state'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip


Re: [PATCH v3 5/7] rtc: New driver for RTC in Netronix embedded controller

2020-10-03 Thread Jonathan Neuschäfer
On Fri, Sep 25, 2020 at 07:44:24AM +0200, Uwe Kleine-König wrote:
> Hello Jonathan,
> 
> On Thu, Sep 24, 2020 at 09:24:53PM +0200, Jonathan Neuschäfer wrote:
> > +#define NTXEC_REG_WRITE_YEAR   0x10
> > +#define NTXEC_REG_WRITE_MONTH  0x11
> > +#define NTXEC_REG_WRITE_DAY0x12
> > +#define NTXEC_REG_WRITE_HOUR   0x13
> > +#define NTXEC_REG_WRITE_MINUTE 0x14
> > +#define NTXEC_REG_WRITE_SECOND 0x15
> > +
> > +#define NTXEC_REG_READ_YM  0x20
> > +#define NTXEC_REG_READ_DH  0x21
> > +#define NTXEC_REG_READ_MS  0x23
> 
> Is this an official naming? I think at least ..._MS is a poor name.
> Maybe consider ..._MINSEC instead and make the other two names a bit longer
> for consistency?

It's unofficial (the vendor kernel uses 0x10 etc. directly).
I'll pick longer names.

> > +static int ntxec_read_time(struct device *dev, struct rtc_time *tm)
> > +{
> > +   struct ntxec_rtc *rtc = dev_get_drvdata(dev);
> > +   unsigned int value;
> > +   int res;
> > +
> > +   res = regmap_read(rtc->ec->regmap, NTXEC_REG_READ_YM, &value);
> > +   if (res < 0)
> > +   return res;
> > +
> > +   tm->tm_year = (value >> 8) + 100;
> > +   tm->tm_mon = (value & 0xff) - 1;
> > +
> > +   res = regmap_read(rtc->ec->regmap, NTXEC_REG_READ_DH, &value);
> > +   if (res < 0)
> > +   return res;
> > +
> > +   tm->tm_mday = value >> 8;
> > +   tm->tm_hour = value & 0xff;
> > +
> > +   res = regmap_read(rtc->ec->regmap, NTXEC_REG_READ_MS, &value);
> > +   if (res < 0)
> > +   return res;
> > +
> > +   tm->tm_min = value >> 8;
> > +   tm->tm_sec = value & 0xff;
> > +
> > +   return 0;
> > +}
> > +
> > +static int ntxec_set_time(struct device *dev, struct rtc_time *tm)
> > +{
> > +   struct ntxec_rtc *rtc = dev_get_drvdata(dev);
> > +   int res = 0;
> > +
> > +   res = regmap_write(rtc->ec->regmap, NTXEC_REG_WRITE_YEAR, 
> > ntxec_reg8(tm->tm_year - 100));
> > +   if (res)
> > +   return res;
> > +
> > +   res = regmap_write(rtc->ec->regmap, NTXEC_REG_WRITE_MONTH, 
> > ntxec_reg8(tm->tm_mon + 1));
> > +   if (res)
> > +   return res;
> > +
> > +   res = regmap_write(rtc->ec->regmap, NTXEC_REG_WRITE_DAY, 
> > ntxec_reg8(tm->tm_mday));
> > +   if (res)
> > +   return res;
> > +
> > +   res = regmap_write(rtc->ec->regmap, NTXEC_REG_WRITE_HOUR, 
> > ntxec_reg8(tm->tm_hour));
> > +   if (res)
> > +   return res;
> > +
> > +   res = regmap_write(rtc->ec->regmap, NTXEC_REG_WRITE_MINUTE, 
> > ntxec_reg8(tm->tm_min));
> > +   if (res)
> > +   return res;
> > +
> > +   return regmap_write(rtc->ec->regmap, NTXEC_REG_WRITE_SECOND, 
> > ntxec_reg8(tm->tm_sec));
> 
> I wonder: Is this racy? If you write minute, does the seconds reset to
> zero or something like that? Or can it happen, that after writing the
> minute register and before writing the second register the seconds
> overflow and you end up with the time set to a minute later than
> intended? If so it might be worth to set the seconds to 0 at the start
> of the function (with an explaining comment).

Setting the minutes does not reset the seconds, so I think this race
condition is possible. I'll add the workaround.
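A toy model of the write-side race and the workaround, with all names hypothetical and the ticking clock reduced to a single worst-case tick() between register writes:

```c
#include <assert.h>

/* Toy model of the EC's time registers. */
struct rtc_regs {
    int min, sec;
};

/* The hazard: the hardware clock keeps ticking between register writes. */
static void tick(struct rtc_regs *r)
{
    if (++r->sec == 60) {
        r->sec = 0;
        r->min++;
    }
}

/* Racy order: write minute, then second. If a tick lands in between
 * while sec == 59, the minute just written is silently advanced. */
static void set_racy(struct rtc_regs *r, int min, int sec)
{
    r->min = min;
    tick(r);                /* worst-case tick between the two writes */
    r->sec = sec;
}

/* Workaround from the thread: park the seconds at 0 first, so an
 * intervening tick cannot cross a minute boundary (assuming the
 * remaining writes complete well within a minute). */
static void set_safe(struct rtc_regs *r, int min, int sec)
{
    r->sec = 0;
    tick(r);                /* same worst-case tick, now harmless */
    r->min = min;
    r->sec = sec;
}
```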

> .read_time has a similar race. What happens if minutes overflow between
> reading NTXEC_REG_READ_DH and NTXEC_REG_READ_MS?

Yes, we get read tearing in that case. It could even propagate all the
way to the year/month field, for example when the following time rolls
over:
   A   |  B  |  C
2020-10-31 23:59:59
2020-11-01 00:00:00

- If the increment happens after reading C, we get 2020-10-31 23:59:59
- If the increment happens between reading B and C, we get 2020-10-31 23:00:00
- If the increment happens between reading A and B, we get 2020-10-01 00:00:00
- If the increment happens before reading A, we get 2020-11-01 00:00:00

... the middle two of which are far from correct.

To mitigate this issue, I think something like the following is needed:

- Read year/month
- Read day/hour
- Read minute/second
- Read day/hour, compare with previously read value, restart on mismatch
- Read year/month, compare with previously read value, restart on mismatch

The order of the last two steps doesn't matter, as far as I can see, but
if I remove one of them, I can't catch all cases of read tearing.
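The restart-on-mismatch scheme can be modeled in a few lines; everything here (struct clk, read_reg(), the injected rollover) is hypothetical, but it shows why re-reading the coarser fields catches a torn combination:

```c
#include <assert.h>

/* Toy model of the three register pairs: year/month, day/hour,
 * minute/second, each read in one register access. */
struct clk {
    int ym, dh, ms;
};

static struct clk hw = { .ym = 0, .dh = 0, .ms = 59 };
static int reads;

/* One rollover is injected after the first register read, simulating
 * the clock ticking over to the next day mid-read. */
static int read_reg(int *field)
{
    int v = *field;

    if (++reads == 1) {
        hw.ms = 0;
        hw.dh = 1;
        hw.ym = 1;
    }
    return v;
}

/* The scheme described above: read fine-to-coarse, then re-read the
 * coarser fields (coarsest last) and restart on any mismatch, so a
 * rollover during the sequence can never produce a torn mix. */
static void read_time(struct clk *out)
{
    int ym, dh, ms;

    for (;;) {
        ym = read_reg(&hw.ym);
        dh = read_reg(&hw.dh);
        ms = read_reg(&hw.ms);
        if (read_reg(&hw.dh) != dh)
            continue;
        if (read_reg(&hw.ym) != ym)
            continue;
        break;
    }
    out->ym = ym;
    out->dh = dh;
    out->ms = ms;
}
```

With the injected rollover, the first pass reads the stale year/month against the new day/hour, the mismatch is detected, and the retry returns a self-consistent snapshot.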

> > +static struct platform_driver ntxec_rtc_driver = {
> > +   .driver = {
> > +   .name = "ntxec-rtc",
> > +   },
> > +   .probe = ntxec_rtc_probe,
> 
> No .remove function?

I don't think it would serve a purpose in this driver. There are no
device-specific resources to release (no clocks to unprepare, for
example).


Thanks,
Jonathan Neuschäfer


signature.asc
Description: PGP signature


[PATCH v3 10/14] iommu/amd: Refactor fetch_pte to use struct amd_io_pgtable

2020-10-03 Thread Suravee Suthikulpanit
Refactor fetch_pte to take struct amd_io_pgtable, which simplifies the
function. There is no functional change.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu.h  |  2 +-
 drivers/iommu/amd/io_pgtable.c | 13 +++--
 drivers/iommu/amd/iommu.c  |  4 +++-
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 2059e64fdc53..69996e57fae2 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -134,7 +134,7 @@ extern int iommu_map_page(struct protection_domain *dom,
 extern unsigned long iommu_unmap_page(struct protection_domain *dom,
  unsigned long bus_addr,
  unsigned long page_size);
-extern u64 *fetch_pte(struct protection_domain *domain,
+extern u64 *fetch_pte(struct amd_io_pgtable *pgtable,
  unsigned long address,
  unsigned long *page_size);
 extern void amd_iommu_domain_set_pgtable(struct protection_domain *domain,
diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
index 989db64a89a7..93ff8cb452ed 100644
--- a/drivers/iommu/amd/io_pgtable.c
+++ b/drivers/iommu/amd/io_pgtable.c
@@ -317,7 +317,7 @@ static u64 *alloc_pte(struct protection_domain *domain,
  * This function checks if there is a PTE for a given dma address. If
  * there is one, it returns the pointer to it.
  */
-u64 *fetch_pte(struct protection_domain *domain,
+u64 *fetch_pte(struct amd_io_pgtable *pgtable,
   unsigned long address,
   unsigned long *page_size)
 {
@@ -326,11 +326,11 @@ u64 *fetch_pte(struct protection_domain *domain,
 
*page_size = 0;
 
-   if (address > PM_LEVEL_SIZE(domain->iop.mode))
+   if (address > PM_LEVEL_SIZE(pgtable->mode))
return NULL;
 
-   level  =  domain->iop.mode - 1;
-   pte= &domain->iop.root[PM_LEVEL_INDEX(level, address)];
+   level  =  pgtable->mode - 1;
+   pte= &pgtable->root[PM_LEVEL_INDEX(level, address)];
*page_size =  PTE_LEVEL_PAGE_SIZE(level);
 
while (level > 0) {
@@ -465,6 +465,8 @@ unsigned long iommu_unmap_page(struct protection_domain 
*dom,
   unsigned long iova,
   unsigned long size)
 {
+   struct io_pgtable_ops *ops = &dom->iop.iop.ops;
+   struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(ops);
unsigned long long unmapped;
unsigned long unmap_size;
u64 *pte;
@@ -474,8 +476,7 @@ unsigned long iommu_unmap_page(struct protection_domain 
*dom,
unmapped = 0;
 
while (unmapped < size) {
-   pte = fetch_pte(dom, iova, &unmap_size);
-
+   pte = fetch_pte(pgtable, iova, &unmap_size);
if (pte) {
int i, count;
 
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 3f6ede1e572c..87cea1cde414 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2078,13 +2078,15 @@ static phys_addr_t amd_iommu_iova_to_phys(struct 
iommu_domain *dom,
  dma_addr_t iova)
 {
struct protection_domain *domain = to_pdomain(dom);
+   struct io_pgtable_ops *ops = &domain->iop.iop.ops;
+   struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(ops);
unsigned long offset_mask, pte_pgsize;
u64 *pte, __pte;
 
if (domain->iop.mode == PAGE_MODE_NONE)
return iova;
 
-   pte = fetch_pte(domain, iova, &pte_pgsize);
+   pte = fetch_pte(pgtable, iova, &pte_pgsize);
 
if (!pte || !IOMMU_PTE_PRESENT(*pte))
return 0;
-- 
2.17.1



[PATCH v3 13/14] iommu/amd: Introduce IOMMU flush callbacks

2020-10-03 Thread Suravee Suthikulpanit
Add TLB flush callback functions, which are used by the IO
page table framework.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/io_pgtable.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
index d8b329aa0bb2..3c2faa47ea5d 100644
--- a/drivers/iommu/amd/io_pgtable.c
+++ b/drivers/iommu/amd/io_pgtable.c
@@ -514,6 +514,33 @@ static phys_addr_t iommu_v1_iova_to_phys(struct 
io_pgtable_ops *ops, unsigned lo
 /*
  * 
  */
+static void v1_tlb_flush_all(void *cookie)
+{
+}
+
+static void v1_tlb_flush_walk(unsigned long iova, size_t size,
+ size_t granule, void *cookie)
+{
+}
+
+static void v1_tlb_flush_leaf(unsigned long iova, size_t size,
+ size_t granule, void *cookie)
+{
+}
+
+static void v1_tlb_add_page(struct iommu_iotlb_gather *gather,
+unsigned long iova, size_t granule,
+void *cookie)
+{
+}
+
+const struct iommu_flush_ops v1_flush_ops = {
+   .tlb_flush_all  = v1_tlb_flush_all,
+   .tlb_flush_walk = v1_tlb_flush_walk,
+   .tlb_flush_leaf = v1_tlb_flush_leaf,
+   .tlb_add_page   = v1_tlb_add_page,
+};
+
 static void v1_free_pgtable(struct io_pgtable *iop)
 {
 }
@@ -526,6 +553,8 @@ static struct io_pgtable *v1_alloc_pgtable(struct 
io_pgtable_cfg *cfg, void *coo
pgtable->iop.ops.unmap= iommu_v1_unmap_page;
pgtable->iop.ops.iova_to_phys = iommu_v1_iova_to_phys;
 
+   cfg->tlb = &v1_flush_ops;
+
return &pgtable->iop;
 }
 
-- 
2.17.1



[PATCH v3 11/14] iommu/amd: Introduce iommu_v1_iova_to_phys

2020-10-03 Thread Suravee Suthikulpanit
This implements iova_to_phys for AMD IOMMU v1 pagetable,
which will be used by the IO page table framework.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/io_pgtable.c | 22 ++
 drivers/iommu/amd/iommu.c  | 16 +---
 2 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
index 93ff8cb452ed..7841e5e1e563 100644
--- a/drivers/iommu/amd/io_pgtable.c
+++ b/drivers/iommu/amd/io_pgtable.c
@@ -494,6 +494,26 @@ unsigned long iommu_unmap_page(struct protection_domain 
*dom,
return unmapped;
 }
 
+static phys_addr_t iommu_v1_iova_to_phys(struct io_pgtable_ops *ops, unsigned 
long iova)
+{
+   struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(ops);
+   unsigned long offset_mask, pte_pgsize;
+   u64 *pte, __pte;
+
+   if (pgtable->mode == PAGE_MODE_NONE)
+   return iova;
+
+   pte = fetch_pte(pgtable, iova, &pte_pgsize);
+
+   if (!pte || !IOMMU_PTE_PRESENT(*pte))
+   return 0;
+
+   offset_mask = pte_pgsize - 1;
+   __pte   = __sme_clr(*pte & PM_ADDR_MASK);
+
+   return (__pte & ~offset_mask) | (iova & offset_mask);
+}
+
 /*
  * 
  */
@@ -505,6 +525,8 @@ static struct io_pgtable *v1_alloc_pgtable(struct 
io_pgtable_cfg *cfg, void *coo
 {
struct amd_io_pgtable *pgtable = io_pgtable_cfg_to_data(cfg);
 
+   pgtable->iop.ops.iova_to_phys = iommu_v1_iova_to_phys;
+
return &pgtable->iop;
 }
 
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 87cea1cde414..9a1a16031e00 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2079,22 +2079,8 @@ static phys_addr_t amd_iommu_iova_to_phys(struct 
iommu_domain *dom,
 {
struct protection_domain *domain = to_pdomain(dom);
struct io_pgtable_ops *ops = &domain->iop.iop.ops;
-   struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(ops);
-   unsigned long offset_mask, pte_pgsize;
-   u64 *pte, __pte;
 
-   if (domain->iop.mode == PAGE_MODE_NONE)
-   return iova;
-
-   pte = fetch_pte(pgtable, iova, &pte_pgsize);
-
-   if (!pte || !IOMMU_PTE_PRESENT(*pte))
-   return 0;
-
-   offset_mask = pte_pgsize - 1;
-   __pte   = __sme_clr(*pte & PM_ADDR_MASK);
-
-   return (__pte & ~offset_mask) | (iova & offset_mask);
+   return ops->iova_to_phys(ops, iova);
 }
 
 static bool amd_iommu_capable(enum iommu_cap cap)
-- 
2.17.1
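The address math preserved by this patch - mask off the in-page offset from the PTE's address and carry the low IOVA bits across - is easy to sanity-check in isolation. The helper name and values below are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Standalone model of the final computation in iommu_v1_iova_to_phys():
 * pte_pgsize is the page size of the mapping (a power of two), so
 * pte_pgsize - 1 masks the offset bits within the mapped page. */
static uint64_t pte_to_phys(uint64_t pte_addr, uint64_t iova,
                            uint64_t pte_pgsize)
{
    uint64_t offset_mask = pte_pgsize - 1;

    /* page frame from the PTE, offset from the IOVA */
    return (pte_addr & ~offset_mask) | (iova & offset_mask);
}
```

The same expression works for any supported page size, which is why fetch_pte() reports the page size alongside the PTE.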



[PATCH v3 06/14] iommu/amd: Move IO page table related functions

2020-10-03 Thread Suravee Suthikulpanit
Prepare to migrate to the IO page table framework.
There is no functional change.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu.h  |  18 ++
 drivers/iommu/amd/io_pgtable.c | 473 
 drivers/iommu/amd/iommu.c  | 476 +
 3 files changed, 493 insertions(+), 474 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 8b7be9171030..ee7ff4d827e1 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -122,4 +122,22 @@ void amd_iommu_apply_ivrs_quirks(void);
 static inline void amd_iommu_apply_ivrs_quirks(void) { }
 #endif
 
+/* TODO: These are temporary and will be removed once fully transition */
+extern void free_pagetable(struct domain_pgtable *pgtable);
+extern int iommu_map_page(struct protection_domain *dom,
+ unsigned long bus_addr,
+ unsigned long phys_addr,
+ unsigned long page_size,
+ int prot,
+ gfp_t gfp);
+extern unsigned long iommu_unmap_page(struct protection_domain *dom,
+ unsigned long bus_addr,
+ unsigned long page_size);
+extern u64 *fetch_pte(struct protection_domain *domain,
+ unsigned long address,
+ unsigned long *page_size);
+extern void amd_iommu_domain_get_pgtable(struct protection_domain *domain,
+struct domain_pgtable *pgtable);
+extern void amd_iommu_domain_set_pgtable(struct protection_domain *domain,
+u64 *root, int mode);
 #endif
diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
index 6b2de9e467d9..c11355afe624 100644
--- a/drivers/iommu/amd/io_pgtable.c
+++ b/drivers/iommu/amd/io_pgtable.c
@@ -23,6 +23,479 @@
 #include "amd_iommu_types.h"
 #include "amd_iommu.h"
 
+/*
+ * Helper function to get the first pte of a large mapping
+ */
+static u64 *first_pte_l7(u64 *pte, unsigned long *page_size,
+unsigned long *count)
+{
+   unsigned long pte_mask, pg_size, cnt;
+   u64 *fpte;
+
+   pg_size  = PTE_PAGE_SIZE(*pte);
+   cnt  = PAGE_SIZE_PTE_COUNT(pg_size);
+   pte_mask = ~((cnt << 3) - 1);
+   fpte = (u64 *)(((unsigned long)pte) & pte_mask);
+
+   if (page_size)
+   *page_size = pg_size;
+
+   if (count)
+   *count = cnt;
+
+   return fpte;
+}
+
+/****************************************************************************
+ *
+ * The functions below are used to create the page table mappings for
+ * unity mapped regions.
+ *
+ ****************************************************************************/
+
+static void free_page_list(struct page *freelist)
+{
+   while (freelist != NULL) {
+   unsigned long p = (unsigned long)page_address(freelist);
+
+   freelist = freelist->freelist;
+   free_page(p);
+   }
+}
+
+static struct page *free_pt_page(unsigned long pt, struct page *freelist)
+{
+   struct page *p = virt_to_page((void *)pt);
+
+   p->freelist = freelist;
+
+   return p;
+}
+
+#define DEFINE_FREE_PT_FN(LVL, FN)						\
+static struct page *free_pt_##LVL (unsigned long __pt, struct page *freelist)	\
+{										\
+	unsigned long p;							\
+	u64 *pt;								\
+	int i;									\
+										\
+	pt = (u64 *)__pt;							\
+										\
+	for (i = 0; i < 512; ++i) {						\
+		/* PTE present? */						\
+		if (!IOMMU_PTE_PRESENT(pt[i]))					\
+			continue;						\
+										\
+		/* Large PTE? */						\
+		if (PM_PTE_LEVEL(pt[i]) == 0 ||					\
+		    PM_PTE_LEVEL(pt[i]) == 7)					\
+			continue;						\
+										\
+		p = (unsigned long)IOMMU_PTE_PAGE(pt[i]);			\
+		freelist = FN(p, freelist);					\

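The freelist idiom used by `free_page_list()`/`free_pt_page()` above chains pages pending release through a field of the page itself, so collecting a whole sub-tree requires no extra allocation. A rough userspace sketch under illustrative names (`struct demo_page` stands in for `struct page`):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct page; only the freelist link matters here. */
struct demo_page {
	struct demo_page *freelist;
};

/* Push a page onto the freelist, as free_pt_page() does. */
static struct demo_page *demo_free_pt_page(struct demo_page *p,
					   struct demo_page *freelist)
{
	p->freelist = freelist;
	return p;
}

/* Walk and count the chain, standing in for free_page_list(). */
static int demo_free_page_list(struct demo_page *freelist)
{
	int n = 0;

	while (freelist != NULL) {
		freelist = freelist->freelist;
		n++;	/* real code would free_page() here */
	}
	return n;
}
```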
[PATCH v3 14/14] iommu/amd: Adopt IO page table framework

2020-10-03 Thread Suravee Suthikulpanit
Switch to using IO page table framework for AMD IOMMU v1 page table.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/iommu.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 77f44b927ae7..6f8316206fb8 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include <linux/io-pgtable.h>
 #include 
 #include 
 #include 
@@ -1573,6 +1574,22 @@ static int pdev_iommuv2_enable(struct pci_dev *pdev)
return ret;
 }
 
+struct io_pgtable_ops *
+amd_iommu_setup_io_pgtable_ops(struct iommu_dev_data *dev_data,
+  struct protection_domain *domain)
+{
+   struct amd_iommu *iommu = amd_iommu_rlookup_table[dev_data->devid];
+
+   domain->iop.pgtbl_cfg = (struct io_pgtable_cfg) {
+   .pgsize_bitmap  = AMD_IOMMU_PGSIZES,
+   .ias= IOMMU_IN_ADDR_BIT_SIZE,
+   .oas= IOMMU_OUT_ADDR_BIT_SIZE,
+   .iommu_dev  = &iommu->dev->dev,
+   };
+
+   return alloc_io_pgtable_ops(AMD_IOMMU_V1, &domain->iop.pgtbl_cfg, domain);
+}
+
 /*
  * If a device is not yet associated with a domain, this function makes the
  * device visible in the domain
@@ -1580,6 +1597,7 @@ static int pdev_iommuv2_enable(struct pci_dev *pdev)
 static int attach_device(struct device *dev,
 struct protection_domain *domain)
 {
+   struct io_pgtable_ops *pgtbl_ops;
struct iommu_dev_data *dev_data;
struct pci_dev *pdev;
unsigned long flags;
@@ -1623,6 +1641,12 @@ static int attach_device(struct device *dev,
 skip_ats_check:
ret = 0;
 
+   pgtbl_ops = amd_iommu_setup_io_pgtable_ops(dev_data, domain);
+   if (!pgtbl_ops) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
do_attach(dev_data, domain);
 
/*
@@ -1958,6 +1982,8 @@ static void amd_iommu_domain_free(struct iommu_domain *dom)
if (domain->dev_cnt > 0)
cleanup_domain(domain);
 
+   free_io_pgtable_ops(&domain->iop.iop.ops);
+
BUG_ON(domain->dev_cnt != 0);
 
if (!dom)
-- 
2.17.1
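The `attach_device()` hunk above wires page-table allocation into the attach path and maps a failed allocation to `-ENOMEM`. A miniature of that lifecycle, with illustrative `demo_*` stand-ins rather than the kernel's types:

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative miniatures of the framework objects; not kernel types. */
struct demo_cfg { unsigned long pgsize_bitmap; };
struct demo_ops { struct demo_cfg *cfg; };

/* Stand-in for alloc_io_pgtable_ops(): may fail, returning NULL. */
static struct demo_ops *demo_alloc_ops(struct demo_cfg *cfg)
{
	struct demo_ops *ops = calloc(1, sizeof(*ops));

	if (ops)
		ops->cfg = cfg;
	return ops;
}

/* Mirrors the attach_device() hunk: a NULL result maps to -ENOMEM. */
static int demo_attach(struct demo_cfg *cfg, struct demo_ops **out)
{
	struct demo_ops *ops = demo_alloc_ops(cfg);

	if (!ops)
		return -12;	/* -ENOMEM */
	*out = ops;
	return 0;
}
```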



[PATCH v3 12/14] iommu/amd: Introduce iommu_v1_map_page and iommu_v1_unmap_page

2020-10-03 Thread Suravee Suthikulpanit
These implement the map and unmap callbacks for the AMD IOMMU v1
page table, which will be used by the IO page table framework.

Also clean up unused extern function declarations.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu.h  | 13 -
 drivers/iommu/amd/io_pgtable.c | 25 -
 drivers/iommu/amd/iommu.c  |  7 ---
 3 files changed, 16 insertions(+), 29 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 69996e57fae2..2e8dc2a1ec0f 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -124,19 +124,6 @@ void amd_iommu_apply_ivrs_quirks(void);
 static inline void amd_iommu_apply_ivrs_quirks(void) { }
 #endif
 
-/* TODO: These are temporary and will be removed once fully transition */
-extern int iommu_map_page(struct protection_domain *dom,
- unsigned long bus_addr,
- unsigned long phys_addr,
- unsigned long page_size,
- int prot,
- gfp_t gfp);
-extern unsigned long iommu_unmap_page(struct protection_domain *dom,
- unsigned long bus_addr,
- unsigned long page_size);
-extern u64 *fetch_pte(struct amd_io_pgtable *pgtable,
- unsigned long address,
- unsigned long *page_size);
 extern void amd_iommu_domain_set_pgtable(struct protection_domain *domain,
 u64 *root, int mode);
 extern void amd_iommu_free_pgtable(struct amd_io_pgtable *pgtable);
diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
index 7841e5e1e563..d8b329aa0bb2 100644
--- a/drivers/iommu/amd/io_pgtable.c
+++ b/drivers/iommu/amd/io_pgtable.c
@@ -317,9 +317,9 @@ static u64 *alloc_pte(struct protection_domain *domain,
  * This function checks if there is a PTE for a given dma address. If
  * there is one, it returns the pointer to it.
  */
-u64 *fetch_pte(struct amd_io_pgtable *pgtable,
-  unsigned long address,
-  unsigned long *page_size)
+static u64 *fetch_pte(struct amd_io_pgtable *pgtable,
+ unsigned long address,
+ unsigned long *page_size)
 {
int level;
u64 *pte;
@@ -392,13 +392,10 @@ static struct page *free_clear_pte(u64 *pte, u64 pteval, struct page *freelist)
  * supporting all features of AMD IOMMU page tables like level skipping
  * and full 64 bit address spaces.
  */
-int iommu_map_page(struct protection_domain *dom,
-  unsigned long iova,
-  unsigned long paddr,
-  unsigned long size,
-  int prot,
-  gfp_t gfp)
+static int iommu_v1_map_page(struct io_pgtable_ops *ops, unsigned long iova,
+ phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
 {
+   struct protection_domain *dom = io_pgtable_ops_to_domain(ops);
struct page *freelist = NULL;
bool updated = false;
u64 __pte, *pte;
@@ -461,11 +458,11 @@ int iommu_map_page(struct protection_domain *dom,
return ret;
 }
 
-unsigned long iommu_unmap_page(struct protection_domain *dom,
-  unsigned long iova,
-  unsigned long size)
+static unsigned long iommu_v1_unmap_page(struct io_pgtable_ops *ops,
+ unsigned long iova,
+ size_t size,
+ struct iommu_iotlb_gather *gather)
 {
-   struct io_pgtable_ops *ops = &dom->iop.iop.ops;
struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(ops);
unsigned long long unmapped;
unsigned long unmap_size;
@@ -525,6 +522,8 @@ static struct io_pgtable *v1_alloc_pgtable(struct io_pgtable_cfg *cfg, void *coo
 {
struct amd_io_pgtable *pgtable = io_pgtable_cfg_to_data(cfg);
 
+   pgtable->iop.ops.map  = iommu_v1_map_page;
+   pgtable->iop.ops.unmap= iommu_v1_unmap_page;
pgtable->iop.ops.iova_to_phys = iommu_v1_iova_to_phys;
 
return >iop;
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 9a1a16031e00..77f44b927ae7 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2044,6 +2044,7 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
 gfp_t gfp)
 {
struct protection_domain *domain = to_pdomain(dom);
+   struct io_pgtable_ops *ops = &domain->iop.iop.ops;
int prot = 0;
int ret;
 
@@ -2055,8 +2056,7 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
if (iommu_prot & IOMMU_WRITE)
prot |= IOMMU_PROT_IW;
 
-   ret = iommu_map_page(domain, iova, paddr, page_size, prot, gfp);
-
+   ret = ops->map(ops, iova, paddr, page_size, prot, gfp);

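The shape of `iommu_v1_unmap_page()` shown above is a forward walk that clears whatever mapping size the table holds at each step. A simplified sketch of that loop, assuming (for illustration only) that every PTE covers 4 KiB:

```c
#include <assert.h>

/* Simplified shape of the v1 unmap loop; in the real code unmap_size
 * comes from fetch_pte() and the matching PTE entries are cleared. */
static unsigned long demo_unmap(unsigned long iova, unsigned long size)
{
	unsigned long unmapped = 0;
	unsigned long unmap_size = 0x1000;	/* assume 4 KiB mappings */

	while (unmapped < size) {
		/* real code: pte = fetch_pte(...); zero 'count' entries */
		iova = (iova & ~(unmap_size - 1)) + unmap_size;
		unmapped += unmap_size;
	}
	return unmapped;
}
```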
[PATCH v3 09/14] iommu/amd: Rename variables to be consistent with struct io_pgtable_ops

2020-10-03 Thread Suravee Suthikulpanit
There is no functional change.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/io_pgtable.c | 31 +++
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
index 6c063d2c8bf0..989db64a89a7 100644
--- a/drivers/iommu/amd/io_pgtable.c
+++ b/drivers/iommu/amd/io_pgtable.c
@@ -393,9 +393,9 @@ static struct page *free_clear_pte(u64 *pte, u64 pteval, struct page *freelist)
  * and full 64 bit address spaces.
  */
 int iommu_map_page(struct protection_domain *dom,
-  unsigned long bus_addr,
-  unsigned long phys_addr,
-  unsigned long page_size,
+  unsigned long iova,
+  unsigned long paddr,
+  unsigned long size,
   int prot,
   gfp_t gfp)
 {
@@ -404,15 +404,15 @@ int iommu_map_page(struct protection_domain *dom,
u64 __pte, *pte;
int ret, i, count;
 
-   BUG_ON(!IS_ALIGNED(bus_addr, page_size));
-   BUG_ON(!IS_ALIGNED(phys_addr, page_size));
+   BUG_ON(!IS_ALIGNED(iova, size));
+   BUG_ON(!IS_ALIGNED(paddr, size));
 
ret = -EINVAL;
if (!(prot & IOMMU_PROT_MASK))
goto out;
 
-   count = PAGE_SIZE_PTE_COUNT(page_size);
-   pte   = alloc_pte(dom, bus_addr, page_size, NULL, gfp, &updated);
+   count = PAGE_SIZE_PTE_COUNT(size);
+   pte   = alloc_pte(dom, iova, size, NULL, gfp, &updated);
 
ret = -ENOMEM;
if (!pte)
@@ -425,10 +425,10 @@ int iommu_map_page(struct protection_domain *dom,
updated = true;
 
if (count > 1) {
-   __pte = PAGE_SIZE_PTE(__sme_set(phys_addr), page_size);
+   __pte = PAGE_SIZE_PTE(__sme_set(paddr), size);
__pte |= PM_LEVEL_ENC(7) | IOMMU_PTE_PR | IOMMU_PTE_FC;
} else
-   __pte = __sme_set(phys_addr) | IOMMU_PTE_PR | IOMMU_PTE_FC;
+   __pte = __sme_set(paddr) | IOMMU_PTE_PR | IOMMU_PTE_FC;
 
if (prot & IOMMU_PROT_IR)
__pte |= IOMMU_PTE_IR;
@@ -462,20 +462,19 @@ int iommu_map_page(struct protection_domain *dom,
 }
 
 unsigned long iommu_unmap_page(struct protection_domain *dom,
-  unsigned long bus_addr,
-  unsigned long page_size)
+  unsigned long iova,
+  unsigned long size)
 {
unsigned long long unmapped;
unsigned long unmap_size;
u64 *pte;
 
-   BUG_ON(!is_power_of_2(page_size));
+   BUG_ON(!is_power_of_2(size));
 
unmapped = 0;
 
-   while (unmapped < page_size) {
-
-   pte = fetch_pte(dom, bus_addr, &unmap_size);
+   while (unmapped < size) {
+   pte = fetch_pte(dom, iova, &unmap_size);
 
if (pte) {
int i, count;
@@ -485,7 +484,7 @@ unsigned long iommu_unmap_page(struct protection_domain *dom,
pte[i] = 0ULL;
}
 
-   bus_addr  = (bus_addr & ~(unmap_size - 1)) + unmap_size;
+   iova = (iova & ~(unmap_size - 1)) + unmap_size;
unmapped += unmap_size;
}
 
-- 
2.17.1



[PATCH v3 02/14] iommu/amd: Prepare for generic IO page table framework

2020-10-03 Thread Suravee Suthikulpanit
Add initial hook-up code to implement the generic IO page table framework.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/Kconfig   |  1 +
 drivers/iommu/amd/Makefile  |  2 +-
 drivers/iommu/amd/amd_iommu_types.h | 35 +++
 drivers/iommu/amd/io_pgtable.c  | 43 +
 drivers/iommu/amd/iommu.c   | 10 ---
 drivers/iommu/io-pgtable.c  |  3 ++
 include/linux/io-pgtable.h  |  2 ++
 7 files changed, 85 insertions(+), 11 deletions(-)
 create mode 100644 drivers/iommu/amd/io_pgtable.c

diff --git a/drivers/iommu/amd/Kconfig b/drivers/iommu/amd/Kconfig
index 626b97d0dd21..a3cbafb603f5 100644
--- a/drivers/iommu/amd/Kconfig
+++ b/drivers/iommu/amd/Kconfig
@@ -10,6 +10,7 @@ config AMD_IOMMU
select IOMMU_API
select IOMMU_IOVA
select IOMMU_DMA
+   select IOMMU_IO_PGTABLE
depends on X86_64 && PCI && ACPI && HAVE_CMPXCHG_DOUBLE
help
  With this option you can enable support for AMD IOMMU hardware in
diff --git a/drivers/iommu/amd/Makefile b/drivers/iommu/amd/Makefile
index dc5a2fa4fd37..a935f8f4b974 100644
--- a/drivers/iommu/amd/Makefile
+++ b/drivers/iommu/amd/Makefile
@@ -1,4 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-$(CONFIG_AMD_IOMMU) += iommu.o init.o quirks.o
+obj-$(CONFIG_AMD_IOMMU) += iommu.o init.o quirks.o io_pgtable.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += iommu_v2.o
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index f696ac7c5f89..e3ac3e57e507 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include <linux/io-pgtable.h>
 
 /*
  * Maximum number of IOMMUs supported
@@ -252,6 +253,19 @@
 
 #define GA_GUEST_NR0x1
 
+#define IOMMU_IN_ADDR_BIT_SIZE  52
+#define IOMMU_OUT_ADDR_BIT_SIZE 52
+
+/*
+ * This bitmap is used to advertise the page sizes our hardware support
+ * to the IOMMU core, which will then use this information to split
+ * physically contiguous memory regions it is mapping into page sizes
+ * that we support.
+ *
+ * 512GB Pages are not supported due to a hardware bug
+ */
+#define AMD_IOMMU_PGSIZES  ((~0xFFFUL) & ~(2ULL << 38))
+
 /* Bit value definition for dte irq remapping fields*/
 #define DTE_IRQ_PHYS_ADDR_MASK (((1ULL << 45)-1) << 6)
 #define DTE_IRQ_REMAP_INTCTL_MASK  (0x3ULL << 60)
@@ -461,6 +475,26 @@ struct amd_irte_ops;
 
 #define AMD_IOMMU_FLAG_TRANS_PRE_ENABLED  (1 << 0)
 
+#define io_pgtable_to_data(x) \
+   container_of((x), struct amd_io_pgtable, iop)
+
+#define io_pgtable_ops_to_data(x) \
+   io_pgtable_to_data(io_pgtable_ops_to_pgtable(x))
+
+#define io_pgtable_ops_to_domain(x) \
+   container_of(io_pgtable_ops_to_data(x), \
+struct protection_domain, iop)
+
+#define io_pgtable_cfg_to_data(x) \
+   container_of((x), struct amd_io_pgtable, pgtbl_cfg)
+
+struct amd_io_pgtable {
+   struct io_pgtable_cfg   pgtbl_cfg;
+   struct io_pgtable   iop;
+   int mode;
+   u64 *root;
+};
+
 /*
  * This structure contains generic data for  IOMMU protection domains
  * independent of their use.
@@ -469,6 +503,7 @@ struct protection_domain {
struct list_head dev_list; /* List of all devices in this domain */
struct iommu_domain domain; /* generic domain handle used by
   iommu core code */
+   struct amd_io_pgtable iop;
spinlock_t lock;/* mostly used to lock the page table*/
u16 id; /* the domain id written to the device table */
atomic64_t pt_root; /* pgtable root and pgtable mode */
diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
new file mode 100644
index ..6b2de9e467d9
--- /dev/null
+++ b/drivers/iommu/amd/io_pgtable.c
@@ -0,0 +1,43 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * CPU-agnostic AMD IO page table allocator.
+ *
+ * Copyright (C) 2020 Advanced Micro Devices, Inc.
+ * Author: Suravee Suthikulpanit 
+ */
+
+#define pr_fmt(fmt) "AMD-Vi: " fmt
+#define dev_fmt(fmt)	pr_fmt(fmt)
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "amd_iommu_types.h"
+#include "amd_iommu.h"
+
+/*
+ * 
+ */
+static void v1_free_pgtable(struct io_pgtable *iop)
+{
+}
+
+static struct io_pgtable *v1_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
+{
+   struct amd_io_pgtable *pgtable = io_pgtable_cfg_to_data(cfg);
+
+   return &pgtable->iop;
+}
+
+struct io_pgtable_init_fns io_pgtable_amd_iommu_v1_init_fns = {
+   .alloc  = v1_alloc_pgtable,
+   .free   = v1_free_pgtable,
+};
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index e92b3f744292..2b7eb51dcbb8 100644
--- 

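The `io_pgtable_ops_to_data()`-style macros added in the patch above all reduce to `container_of()`: given a pointer to an embedded member, recover the enclosing structure. A userspace sketch of the same pattern; the `demo_*` names mirror the patch but the types here are illustrative:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace container_of(), matching the kernel's pointer arithmetic. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_io_pgtable { int fmt; };

struct demo_amd_io_pgtable {
	int mode;
	struct demo_io_pgtable iop;	/* embedded, as in the patch */
};

/* Recover the outer amd_io_pgtable from a pointer to its iop member. */
#define demo_io_pgtable_to_data(x) \
	demo_container_of((x), struct demo_amd_io_pgtable, iop)
```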
[PATCH v3 08/14] iommu/amd: Remove amd_iommu_domain_get_pgtable

2020-10-03 Thread Suravee Suthikulpanit
Since the IO page table root and mode parameters have been moved into
the struct amd_io_pgtable, the function is no longer needed. Therefore,
remove it along with the struct domain_pgtable.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu.h   |  4 ++--
 drivers/iommu/amd/amd_iommu_types.h |  6 -
 drivers/iommu/amd/io_pgtable.c  | 36 ++---
 drivers/iommu/amd/iommu.c   | 34 ---
 4 files changed, 19 insertions(+), 61 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 8dff7d85be79..2059e64fdc53 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -101,6 +101,8 @@ static inline
 void amd_iommu_domain_set_pt_root(struct protection_domain *domain, u64 root)
 {
	atomic64_set(&domain->iop.pt_root, root);
+   domain->iop.root = (u64 *)(root & PAGE_MASK);
+   domain->iop.mode = root & 7; /* lowest 3 bits encode pgtable mode */
 }
 
 static inline
@@ -135,8 +137,6 @@ extern unsigned long iommu_unmap_page(struct protection_domain *dom,
 extern u64 *fetch_pte(struct protection_domain *domain,
  unsigned long address,
  unsigned long *page_size);
-extern void amd_iommu_domain_get_pgtable(struct protection_domain *domain,
-struct domain_pgtable *pgtable);
 extern void amd_iommu_domain_set_pgtable(struct protection_domain *domain,
 u64 *root, int mode);
 extern void amd_iommu_free_pgtable(struct amd_io_pgtable *pgtable);
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 80b5c34357ed..de3fe9433080 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -514,12 +514,6 @@ struct protection_domain {
unsigned dev_iommu[MAX_IOMMUS]; /* per-IOMMU reference count */
 };
 
-/* For decoded pt_root */
-struct domain_pgtable {
-   int mode;
-   u64 *root;
-};
-
 /*
  * Structure where we save information about one hardware AMD IOMMU in the
  * system.
diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
index 23e82da2dea8..6c063d2c8bf0 100644
--- a/drivers/iommu/amd/io_pgtable.c
+++ b/drivers/iommu/amd/io_pgtable.c
@@ -184,30 +184,27 @@ static bool increase_address_space(struct protection_domain *domain,
   unsigned long address,
   gfp_t gfp)
 {
-   struct domain_pgtable pgtable;
unsigned long flags;
bool ret = true;
u64 *pte;
 
	spin_lock_irqsave(&domain->lock, flags);
 
-   amd_iommu_domain_get_pgtable(domain, &pgtable);
-
-   if (address <= PM_LEVEL_SIZE(pgtable.mode))
+   if (address <= PM_LEVEL_SIZE(domain->iop.mode))
goto out;
 
ret = false;
-   if (WARN_ON_ONCE(pgtable.mode == PAGE_MODE_6_LEVEL))
+   if (WARN_ON_ONCE(domain->iop.mode == PAGE_MODE_6_LEVEL))
goto out;
 
pte = (void *)get_zeroed_page(gfp);
if (!pte)
goto out;
 
-   *pte = PM_LEVEL_PDE(pgtable.mode, iommu_virt_to_phys(pgtable.root));
+   *pte = PM_LEVEL_PDE(domain->iop.mode, iommu_virt_to_phys(domain->iop.root));
 
-   pgtable.root  = pte;
-   pgtable.mode += 1;
+   domain->iop.root  = pte;
+   domain->iop.mode += 1;
amd_iommu_update_and_flush_device_table(domain);
amd_iommu_domain_flush_complete(domain);
 
@@ -215,7 +212,7 @@ static bool increase_address_space(struct protection_domain *domain,
 * Device Table needs to be updated and flushed before the new root can
 * be published.
 */
-   amd_iommu_domain_set_pgtable(domain, pte, pgtable.mode);
+   amd_iommu_domain_set_pgtable(domain, pte, domain->iop.mode);
 
ret = true;
 
@@ -232,29 +229,23 @@ static u64 *alloc_pte(struct protection_domain *domain,
  gfp_t gfp,
  bool *updated)
 {
-   struct domain_pgtable pgtable;
int level, end_lvl;
u64 *pte, *page;
 
BUG_ON(!is_power_of_2(page_size));
 
-   amd_iommu_domain_get_pgtable(domain, &pgtable);
-
-   while (address > PM_LEVEL_SIZE(pgtable.mode)) {
+   while (address > PM_LEVEL_SIZE(domain->iop.mode)) {
/*
 * Return an error if there is no memory to update the
 * page-table.
 */
if (!increase_address_space(domain, address, gfp))
return NULL;
-
-   /* Read new values to check if update was successful */
-   amd_iommu_domain_get_pgtable(domain, &pgtable);
}
 
 
-   level   = pgtable.mode - 1;
-   pte = &pgtable.root[PM_LEVEL_INDEX(level, address)];
+   level   = domain->iop.mode - 1;
+   pte = &domain->iop.root[PM_LEVEL_INDEX(level, address)];
address = PAGE_SIZE_ALIGN(address, page_size);
end_lvl 

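The `amd_iommu_domain_set_pt_root()` hunk above relies on the page-table root being page-aligned, so the paging mode can ride in the low 3 bits of the same 64-bit word. A sketch of that encoding; the macro and function names here are illustrative, not the kernel's:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_MASK	(~0xfffULL)	/* 4 KiB alignment */

/* Pack a page-aligned root address and a 3-bit mode into one word,
 * as the pt_root field does in the patch above. */
static uint64_t demo_encode_pt_root(uint64_t root, int mode)
{
	return (root & DEMO_PAGE_MASK) | ((uint64_t)mode & 7);
}

/* The two decode steps mirror the assignments in the hunk:
 * root = pt_root & PAGE_MASK; mode = pt_root & 7. */
static uint64_t demo_root(uint64_t pt_root) { return pt_root & DEMO_PAGE_MASK; }
static int demo_mode(uint64_t pt_root) { return (int)(pt_root & 7); }
```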
[PATCH v3 05/14] iommu/amd: Declare functions as extern

2020-10-03 Thread Suravee Suthikulpanit
Move the declarations to a header file so that they can be included
across multiple files. There is no functional change.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu.h |  3 +++
 drivers/iommu/amd/iommu.c | 39 +--
 2 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 22ecacb71675..8b7be9171030 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -48,6 +48,9 @@ extern int amd_iommu_domain_enable_v2(struct iommu_domain *dom, int pasids);
 extern int amd_iommu_flush_page(struct iommu_domain *dom, int pasid,
u64 address);
 extern void amd_iommu_update_and_flush_device_table(struct protection_domain *domain);
+extern void amd_iommu_domain_update(struct protection_domain *domain);
+extern void amd_iommu_domain_flush_complete(struct protection_domain *domain);
+extern void amd_iommu_domain_flush_tlb_pde(struct protection_domain *domain);
 extern int amd_iommu_flush_tlb(struct iommu_domain *dom, int pasid);
 extern int amd_iommu_domain_set_gcr3(struct iommu_domain *dom, int pasid,
 unsigned long cr3);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 09da37c4c9c4..f91f35edb7ba 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -88,7 +88,6 @@ struct iommu_cmd {
 
 struct kmem_cache *amd_iommu_irq_cache;
 
-static void update_domain(struct protection_domain *domain);
 static void detach_device(struct device *dev);
 
 /
@@ -1294,12 +1293,12 @@ static void domain_flush_pages(struct protection_domain *domain,
 }
 
 /* Flush the whole IO/TLB for a given protection domain - including PDE */
-static void domain_flush_tlb_pde(struct protection_domain *domain)
+void amd_iommu_domain_flush_tlb_pde(struct protection_domain *domain)
 {
__domain_flush_pages(domain, 0, CMD_INV_IOMMU_ALL_PAGES_ADDRESS, 1);
 }
 
-static void domain_flush_complete(struct protection_domain *domain)
+void amd_iommu_domain_flush_complete(struct protection_domain *domain)
 {
int i;
 
@@ -1324,7 +1323,7 @@ static void domain_flush_np_cache(struct protection_domain *domain,
 
	spin_lock_irqsave(&domain->lock, flags);
domain_flush_pages(domain, iova, size);
-   domain_flush_complete(domain);
+   amd_iommu_domain_flush_complete(domain);
spin_unlock_irqrestore(>lock, flags);
}
 }
@@ -1481,7 +1480,7 @@ static bool increase_address_space(struct protection_domain *domain,
pgtable.root  = pte;
pgtable.mode += 1;
amd_iommu_update_and_flush_device_table(domain);
-   domain_flush_complete(domain);
+   amd_iommu_domain_flush_complete(domain);
 
/*
 * Device Table needs to be updated and flushed before the new root can
@@ -1734,8 +1733,8 @@ static int iommu_map_page(struct protection_domain *dom,
 * Updates and flushing already happened in
 * increase_address_space().
 */
-   domain_flush_tlb_pde(dom);
-   domain_flush_complete(dom);
+   amd_iommu_domain_flush_tlb_pde(dom);
+   amd_iommu_domain_flush_complete(dom);
	spin_unlock_irqrestore(&dom->lock, flags);
}
 
@@ -1978,10 +1977,10 @@ static void do_detach(struct iommu_dev_data *dev_data)
device_flush_dte(dev_data);
 
/* Flush IOTLB */
-   domain_flush_tlb_pde(domain);
+   amd_iommu_domain_flush_tlb_pde(domain);
 
/* Wait for the flushes to finish */
-   domain_flush_complete(domain);
+   amd_iommu_domain_flush_complete(domain);
 
/* decrease reference counters - needs to happen after the flushes */
domain->dev_iommu[iommu->index] -= 1;
@@ -2114,9 +2113,9 @@ static int attach_device(struct device *dev,
 * left the caches in the IOMMU dirty. So we have to flush
 * here to evict all dirty stuff.
 */
-   domain_flush_tlb_pde(domain);
+   amd_iommu_domain_flush_tlb_pde(domain);
 
-   domain_flush_complete(domain);
+   amd_iommu_domain_flush_complete(domain);
 
 out:
	spin_unlock(&dev_data->lock);
@@ -2277,7 +2276,7 @@ void amd_iommu_update_and_flush_device_table(struct protection_domain *domain)
domain_flush_devices(domain);
 }
 
-static void update_domain(struct protection_domain *domain)
+void amd_iommu_domain_update(struct protection_domain *domain)
 {
struct domain_pgtable pgtable;
 
@@ -2286,8 +2285,8 @@ static void update_domain(struct protection_domain *domain)
amd_iommu_update_and_flush_device_table(domain);
 
/* Flush domain TLB(s) and wait for completion */
-   domain_flush_tlb_pde(domain);
-   domain_flush_complete(domain);
+   

[PATCH v3 04/14] iommu/amd: Convert to using amd_io_pgtable

2020-10-03 Thread Suravee Suthikulpanit
Make use of the new struct amd_io_pgtable in preparation to remove
the struct domain_pgtable.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu.h |  1 +
 drivers/iommu/amd/iommu.c | 25 ++---
 2 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index da6e09657e00..22ecacb71675 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -47,6 +47,7 @@ extern void amd_iommu_domain_direct_map(struct iommu_domain *dom);
 extern int amd_iommu_domain_enable_v2(struct iommu_domain *dom, int pasids);
 extern int amd_iommu_flush_page(struct iommu_domain *dom, int pasid,
u64 address);
+extern void amd_iommu_update_and_flush_device_table(struct protection_domain *domain);
 extern int amd_iommu_flush_tlb(struct iommu_domain *dom, int pasid);
 extern int amd_iommu_domain_set_gcr3(struct iommu_domain *dom, int pasid,
 unsigned long cr3);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index c8b8619cc744..09da37c4c9c4 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -90,8 +90,6 @@ struct kmem_cache *amd_iommu_irq_cache;
 
 static void update_domain(struct protection_domain *domain);
 static void detach_device(struct device *dev);
-static void update_and_flush_device_table(struct protection_domain *domain,
- struct domain_pgtable *pgtable);
 
 /
  *
@@ -1482,7 +1480,7 @@ static bool increase_address_space(struct protection_domain *domain,
 
pgtable.root  = pte;
pgtable.mode += 1;
-   update_and_flush_device_table(domain, &pgtable);
+   amd_iommu_update_and_flush_device_table(domain);
domain_flush_complete(domain);
 
/*
@@ -1857,17 +1855,16 @@ static void free_gcr3_table(struct protection_domain *domain)
 }
 
 static void set_dte_entry(u16 devid, struct protection_domain *domain,
- struct domain_pgtable *pgtable,
  bool ats, bool ppr)
 {
u64 pte_root = 0;
u64 flags = 0;
u32 old_domid;
 
-   if (pgtable->mode != PAGE_MODE_NONE)
-   pte_root = iommu_virt_to_phys(pgtable->root);
+   if (domain->iop.mode != PAGE_MODE_NONE)
+   pte_root = iommu_virt_to_phys(domain->iop.root);
 
-   pte_root |= (pgtable->mode & DEV_ENTRY_MODE_MASK)
+   pte_root |= (domain->iop.mode & DEV_ENTRY_MODE_MASK)
<< DEV_ENTRY_MODE_SHIFT;
pte_root |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_V | DTE_FLAG_TV;
 
@@ -1957,7 +1954,7 @@ static void do_attach(struct iommu_dev_data *dev_data,
 
/* Update device table */
	amd_iommu_domain_get_pgtable(domain, &pgtable);
-   set_dte_entry(dev_data->devid, domain, &pgtable,
+   set_dte_entry(dev_data->devid, domain,
  ats, dev_data->iommu_v2);
clone_aliases(dev_data->pdev);
 
@@ -2263,22 +2260,20 @@ static int amd_iommu_domain_get_attr(struct iommu_domain *domain,
  *
  */
 
-static void update_device_table(struct protection_domain *domain,
-   struct domain_pgtable *pgtable)
+static void update_device_table(struct protection_domain *domain)
 {
struct iommu_dev_data *dev_data;
 
list_for_each_entry(dev_data, >dev_list, list) {
-   set_dte_entry(dev_data->devid, domain, pgtable,
+   set_dte_entry(dev_data->devid, domain,
  dev_data->ats.enabled, dev_data->iommu_v2);
clone_aliases(dev_data->pdev);
}
 }
 
-static void update_and_flush_device_table(struct protection_domain *domain,
- struct domain_pgtable *pgtable)
+void amd_iommu_update_and_flush_device_table(struct protection_domain *domain)
 {
-   update_device_table(domain, pgtable);
+   update_device_table(domain);
domain_flush_devices(domain);
 }
 
@@ -2288,7 +2283,7 @@ static void update_domain(struct protection_domain *domain)
 
/* Update device table */
	amd_iommu_domain_get_pgtable(domain, &pgtable);
-   update_and_flush_device_table(domain, &pgtable);
+   amd_iommu_update_and_flush_device_table(domain);
 
/* Flush domain TLB(s) and wait for completion */
domain_flush_tlb_pde(domain);
-- 
2.17.1



[PATCH v3 01/14] iommu/amd: Re-define amd_iommu_domain_encode_pgtable as inline

2020-10-03 Thread Suravee Suthikulpanit
Move the functions to the header file to allow inclusion in other files.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu.h | 13 +
 drivers/iommu/amd/iommu.c | 10 --
 2 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 57309716fd18..97cdb235ce69 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -93,6 +93,19 @@ static inline void *iommu_phys_to_virt(unsigned long paddr)
return phys_to_virt(__sme_clr(paddr));
 }
 
+static inline
+void amd_iommu_domain_set_pt_root(struct protection_domain *domain, u64 root)
+{
+   atomic64_set(&domain->pt_root, root);
+}
+
+static inline
+void amd_iommu_domain_clr_pt_root(struct protection_domain *domain)
+{
+   amd_iommu_domain_set_pt_root(domain, 0);
+}
+
+
 extern bool translation_pre_enabled(struct amd_iommu *iommu);
 extern bool amd_iommu_is_attach_deferred(struct iommu_domain *domain,
 struct device *dev);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index db4fb840c59c..e92b3f744292 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -162,16 +162,6 @@ static void amd_iommu_domain_get_pgtable(struct protection_domain *domain,
pgtable->mode = pt_root & 7; /* lowest 3 bits encode pgtable mode */
 }
 
-static void amd_iommu_domain_set_pt_root(struct protection_domain *domain, u64 root)
-{
-   atomic64_set(&domain->pt_root, root);
-}
-
-static void amd_iommu_domain_clr_pt_root(struct protection_domain *domain)
-{
-   amd_iommu_domain_set_pt_root(domain, 0);
-}
-
 static void amd_iommu_domain_set_pgtable(struct protection_domain *domain,
 u64 *root, int mode)
 {
-- 
2.17.1



[PATCH v3 07/14] iommu/amd: Restructure code for freeing page table

2020-10-03 Thread Suravee Suthikulpanit
Introduce the amd_iommu_free_pgtable() helper function, which
consolidates the logic for freeing a page table.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu.h  |  2 +-
 drivers/iommu/amd/io_pgtable.c | 12 +++-
 drivers/iommu/amd/iommu.c  | 19 ++-
 3 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index ee7ff4d827e1..8dff7d85be79 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -123,7 +123,6 @@ static inline void amd_iommu_apply_ivrs_quirks(void) { }
 #endif
 
 /* TODO: These are temporary and will be removed once fully transition */
-extern void free_pagetable(struct domain_pgtable *pgtable);
 extern int iommu_map_page(struct protection_domain *dom,
  unsigned long bus_addr,
  unsigned long phys_addr,
@@ -140,4 +139,5 @@ extern void amd_iommu_domain_get_pgtable(struct protection_domain *domain,
 struct domain_pgtable *pgtable);
 extern void amd_iommu_domain_set_pgtable(struct protection_domain *domain,
 u64 *root, int mode);
+extern void amd_iommu_free_pgtable(struct amd_io_pgtable *pgtable);
 #endif
diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
index c11355afe624..23e82da2dea8 100644
--- a/drivers/iommu/amd/io_pgtable.c
+++ b/drivers/iommu/amd/io_pgtable.c
@@ -136,14 +136,24 @@ static struct page *free_sub_pt(unsigned long root, int mode,
return freelist;
 }
 
-void free_pagetable(struct domain_pgtable *pgtable)
+void amd_iommu_free_pgtable(struct amd_io_pgtable *pgtable)
 {
+   struct protection_domain *dom;
struct page *freelist = NULL;
unsigned long root;
 
if (pgtable->mode == PAGE_MODE_NONE)
return;
 
+   dom = container_of(pgtable, struct protection_domain, iop);
+
+   /* Update data structure */
+   amd_iommu_domain_clr_pt_root(dom);
+
+   /* Make changes visible to IOMMUs */
+   amd_iommu_domain_update(dom);
+
+   /* Page-table is not visible to IOMMU anymore, so free it */
BUG_ON(pgtable->mode < PAGE_MODE_NONE ||
   pgtable->mode > PAGE_MODE_6_LEVEL);
 
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 4d65f64236b6..cbbea7b952fb 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1882,17 +1882,13 @@ static void cleanup_domain(struct protection_domain *domain)
 
 static void protection_domain_free(struct protection_domain *domain)
 {
-   struct domain_pgtable pgtable;
-
if (!domain)
return;
 
if (domain->id)
domain_id_free(domain->id);
 
-   amd_iommu_domain_get_pgtable(domain, &pgtable);
-   amd_iommu_domain_clr_pt_root(domain);
-   free_pagetable(&pgtable);
+   amd_iommu_free_pgtable(&domain->iop);
 
kfree(domain);
 }
@@ -2281,22 +2277,11 @@ EXPORT_SYMBOL(amd_iommu_unregister_ppr_notifier);
 void amd_iommu_domain_direct_map(struct iommu_domain *dom)
 {
struct protection_domain *domain = to_pdomain(dom);
-   struct domain_pgtable pgtable;
unsigned long flags;
 
spin_lock_irqsave(&domain->lock, flags);
 
-   /* First save pgtable configuration */
-   amd_iommu_domain_get_pgtable(domain, &pgtable);
-
-   /* Remove page-table from domain */
-   amd_iommu_domain_clr_pt_root(domain);
-
-   /* Make changes visible to IOMMUs */
-   amd_iommu_domain_update(domain);
-
-   /* Page-table is not visible to IOMMU anymore, so free it */
-   free_pagetable(&pgtable);
+   amd_iommu_free_pgtable(&domain->iop);
 
spin_unlock_irqrestore(&domain->lock, flags);
 }
-- 
2.17.1



[PATCH v3 00/14] iommu/amd: Add Generic IO Page Table Framework Support

2020-10-03 Thread Suravee Suthikulpanit
The framework allows a pluggable implementation of the IO page table.
This allows AMD IOMMU driver to switch between different types
of AMD IOMMU page tables (e.g. v1 vs. v2).

This series refactors the current implementation of AMD IOMMU v1 page table
to adopt the framework. There should be no functional change.
Subsequent series will introduce support for the AMD IOMMU v2 page table.

Thanks,
Suravee

Change from V2 
(https://lore.kernel.org/lkml/835c0d46-ed96-9fbe-856a-777dcffac...@amd.com/T/#t)
  - Patch 2/14: Introduce helper function io_pgtable_cfg_to_data.
  - Patch 13/14: Put back the struct iommu_flush_ops since patch v2 would run
into a NULL pointer bug when calling free_io_pgtable_ops if not defined.

Change from V1 (https://lkml.org/lkml/2020/9/23/251)
  - Do not specify struct io_pgtable_cfg.coherent_walk, since it is
not currently used. (per Robin)
  - Remove unused struct iommu_flush_ops.  (patch 2/13)
  - Move amd_iommu_setup_io_pgtable_ops to iommu.c instead of io_pgtable.c
(patch 13/13)

Suravee Suthikulpanit (14):
  iommu/amd: Re-define amd_iommu_domain_encode_pgtable as inline
  iommu/amd: Prepare for generic IO page table framework
  iommu/amd: Move pt_root to struct amd_io_pgtable
  iommu/amd: Convert to using amd_io_pgtable
  iommu/amd: Declare functions as extern
  iommu/amd: Move IO page table related functions
  iommu/amd: Restructure code for freeing page table
  iommu/amd: Remove amd_iommu_domain_get_pgtable
  iommu/amd: Rename variables to be consistent with struct
io_pgtable_ops
  iommu/amd: Refactor fetch_pte to use struct amd_io_pgtable
  iommu/amd: Introduce iommu_v1_iova_to_phys
  iommu/amd: Introduce iommu_v1_map_page and iommu_v1_unmap_page
  iommu/amd: Introduce IOMMU flush callbacks
  iommu/amd: Adopt IO page table framework

 drivers/iommu/amd/Kconfig   |   1 +
 drivers/iommu/amd/Makefile  |   2 +-
 drivers/iommu/amd/amd_iommu.h   |  22 +
 drivers/iommu/amd/amd_iommu_types.h |  43 +-
 drivers/iommu/amd/io_pgtable.c  | 564 
 drivers/iommu/amd/iommu.c   | 646 +++-
 drivers/iommu/io-pgtable.c  |   3 +
 include/linux/io-pgtable.h  |   2 +
 8 files changed, 691 insertions(+), 592 deletions(-)
 create mode 100644 drivers/iommu/amd/io_pgtable.c

-- 
2.17.1



[PATCH v3 03/14] iommu/amd: Move pt_root to struct amd_io_pgtable

2020-10-03 Thread Suravee Suthikulpanit
Move pt_root into struct amd_io_pgtable to better organize the data
structure, since it contains IO page table related information.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu.h   | 2 +-
 drivers/iommu/amd/amd_iommu_types.h | 2 +-
 drivers/iommu/amd/iommu.c   | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 97cdb235ce69..da6e09657e00 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -96,7 +96,7 @@ static inline void *iommu_phys_to_virt(unsigned long paddr)
 static inline
 void amd_iommu_domain_set_pt_root(struct protection_domain *domain, u64 root)
 {
-   atomic64_set(&domain->pt_root, root);
+   atomic64_set(&domain->iop.pt_root, root);
 }
 
 static inline
diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index e3ac3e57e507..80b5c34357ed 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -493,6 +493,7 @@ struct amd_io_pgtable {
struct io_pgtable   iop;
int mode;
u64 *root;
+   atomic64_t pt_root; /* pgtable root and pgtable mode */
 };
 
 /*
@@ -506,7 +507,6 @@ struct protection_domain {
struct amd_io_pgtable iop;
spinlock_t lock;/* mostly used to lock the page table*/
u16 id; /* the domain id written to the device table */
-   atomic64_t pt_root; /* pgtable root and pgtable mode */
int glx;/* Number of levels for GCR3 table */
u64 *gcr3_tbl;  /* Guest CR3 table */
unsigned long flags;/* flags to find out type of domain */
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 2b7eb51dcbb8..c8b8619cc744 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -146,7 +146,7 @@ static struct protection_domain *to_pdomain(struct iommu_domain *dom)
 static void amd_iommu_domain_get_pgtable(struct protection_domain *domain,
 struct domain_pgtable *pgtable)
 {
-   u64 pt_root = atomic64_read(&domain->pt_root);
+   u64 pt_root = atomic64_read(&domain->iop.pt_root);
 
pgtable->root = (u64 *)(pt_root & PAGE_MASK);
pgtable->mode = pt_root & 7; /* lowest 3 bits encode pgtable mode */
-- 
2.17.1



[PATCH] tools: memory-model: Document that the LKMM can easily miss control dependencies

2020-10-03 Thread Alan Stern
Add a small section to the litmus-tests.txt documentation file for
the Linux Kernel Memory Model explaining that the memory model often
fails to recognize certain control dependencies.

Suggested-by: Akira Yokosawa 
Signed-off-by: Alan Stern 

---

 tools/memory-model/Documentation/litmus-tests.txt |   17 +
 1 file changed, 17 insertions(+)

Index: usb-devel/tools/memory-model/Documentation/litmus-tests.txt
===
--- usb-devel.orig/tools/memory-model/Documentation/litmus-tests.txt
+++ usb-devel/tools/memory-model/Documentation/litmus-tests.txt
@@ -946,6 +946,23 @@ Limitations of the Linux-kernel memory m
carrying a dependency, then the compiler can break that dependency
by substituting a constant of that value.
 
+   Conversely, LKMM sometimes doesn't recognize that a particular
+   optimization is not allowed, and as a result, thinks that a
+   dependency is not present (because the optimization would break it).
+   The memory model misses some pretty obvious control dependencies
+   because of this limitation.  A simple example is:
+
+   r1 = READ_ONCE(x);
+   if (r1 == 0)
+   smp_mb();
+   WRITE_ONCE(y, 1);
+
+   There is a control dependency from the READ_ONCE to the WRITE_ONCE,
+   even when r1 is nonzero, but LKMM doesn't realize this and thinks
+   that the write may execute before the read if r1 != 0.  (Yes, that
+   doesn't make sense if you think about it, but the memory model's
+   intelligence is limited.)
+
 2. Multiple access sizes for a single variable are not supported,
and neither are misaligned or partially overlapping accesses.
 


Re: Where is the declaration of buffer used in kernel_param_ops .get functions?

2020-10-03 Thread Matthew Wilcox
On Sat, Oct 03, 2020 at 06:19:18PM -0700, Joe Perches wrote:
> These patches came up because I was looking for
> the location of the declaration of the buffer used
> in kernel/params.c struct kernel_param_ops .get
> functions.
> 
> I didn't find it.
> 
> I want to see if it's appropriate to convert the
> sprintf family of functions used in these .get
> functions to sysfs_emit.
> 
> Patches submitted here:
> https://lore.kernel.org/lkml/5d606519698ce4c8f1203a2b35797d8254c6050a.1600285923.git@perches.com/T/
> 
> Anyone know if it's appropriate to change the
> sprintf-like uses in these functions to sysfs_emit
> and/or sysfs_emit_at?

There's a lot of preprocessor magic to wade through.

I'm pretty sure this comes through include/linux/moduleparam.h
and kernel/module.c.


Re: [External] [RFC] Documentation: Add documentation for new performance_profile sysfs class

2020-10-03 Thread Mark Pearson

Hi Hans,

On 2020-10-03 9:19 a.m., Hans de Goede wrote:

On modern systems CPU/GPU/... performance is often dynamically configurable
in the form of e.g. variable clock-speeds and TDP. The performance is often
automatically adjusted to the load by some automatic-mechanism (which may
very well live outside the kernel).

These auto performance-adjustment mechanisms often can be configured with
one of several performance-profiles, with either a bias towards low-power
consumption (and cool and quiet) or towards performance (and higher power
consumption and thermals).

Introduce a new performance_profile class/sysfs API which offers a generic
API for selecting the performance-profile of these automatic-mechanisms.

Cc: Mark Pearson 
Cc: Elia Devito 
Cc: Bastien Nocera 
Cc: Benjamin Berg 
Cc: linux...@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Signed-off-by: Hans de Goede 
---
  .../testing/sysfs-class-performance_profile   | 104 ++
  1 file changed, 104 insertions(+)
  create mode 100644 Documentation/ABI/testing/sysfs-class-performance_profile

diff --git a/Documentation/ABI/testing/sysfs-class-performance_profile 
b/Documentation/ABI/testing/sysfs-class-performance_profile
new file mode 100644
index ..9c67cae39600
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-performance_profile
@@ -0,0 +1,104 @@
+Performance-profile selection (e.g. 
/sys/class/performance_profile/thinkpad_acpi/)
+
+On modern systems CPU/GPU/... performance is often dynamically configurable
+in the form of e.g. variable clock-speeds and TDP. The performance is often
+automatically adjusted to the load by some automatic-mechanism (which may
+very well live outside the kernel).
+
+These auto performance-adjustment mechanisms often can be configured with
+one of several performance-profiles, with either a bias towards low-power
+consumption (and cool and quiet) or towards performance (and higher power
+consumption and thermals).
+
+The purpose of the performance_profile class is to offer a generic sysfs
+API for selecting the performance-profile of these automatic-mechanisms.
+
+Note that this API is only for selecting the performance-profile, it is
+NOT a goal of this API to allow monitoring the resulting performance
+characteristics. Monitoring performance is best done with device/vendor
+specific tools such as e.g. turbostat.
+
+Specifically when selecting a high-performance profile the actual achieved
+performance may be limited by various factors such as: the heat generated by
+other components, room temperature, free air flow at the bottom of a laptop,
+etc. It is explicitly NOT a goal of this API to let userspace know about
+any sub-optimal conditions which are impeding reaching the requested
+performance level.
+
+Since numbers are a rather meaningless way to describe performance-profiles
+this API uses strings to describe the various profiles. To make sure that
+userspace gets a consistent experience when using this API this API document
+defines a fixed set of profile-names. Drivers *must* map their internal
+profile representation/names onto this fixed set.
+
+If for some reason there is no good match when mapping then a new profile-name
+may be added. Drivers which wish to introduce new profile-names must:
+1. Have very good reasons to do so.
+2. Add the new profile-name to this document, so that future drivers which also
+   have a similar problem can use the same name. Usually new profile-names will
+   be added to the "extra profile-names" section of this document. But in some
+   cases the set of standard profile-names may be extended.
+
+What:  /sys/class/performance_profile//available_profiles
+Date:  October 2020
+Contact:   Hans de Goede 
+Description:
+   Reading this file gives a space separated list of profiles
+   supported for this device.
+
+   Drivers must use the following standard profile-names whenever
+   possible:
+
+   low-power:  Emphasises low power consumption
+   (and also cool and quiet)
+   balanced-low-power: Balances between low power consumption
+   and performance with a slight bias
+   towards low power
+   balanced:   Balance between low power consumption
+   and performance
+   balanced-performance:   Balances between performance and low
+   power consumption with a slight bias
+   towards performance
+   performance:Emphasises performance (and may lead to
+   higher temperatures and fan speeds)
+
+   Userspace may expect drivers to offer at least several of these
+   standard profile-names! If none of the above are a 

Where is the declaration of buffer used in kernel_param_ops .get functions?

2020-10-03 Thread Joe Perches
These patches came up because I was looking for
the location of the declaration of the buffer used
in kernel/params.c struct kernel_param_ops .get
functions.

I didn't find it.

I want to see if it's appropriate to convert the
sprintf family of functions used in these .get
functions to sysfs_emit.

Patches submitted here:
https://lore.kernel.org/lkml/5d606519698ce4c8f1203a2b35797d8254c6050a.1600285923.git@perches.com/T/

Anyone know if it's appropriate to change the
sprintf-like uses in these functions to sysfs_emit
and/or sysfs_emit_at?




[PATCH 3/8] staging: rtl8723bs: replace _RND8 with round_up()

2020-10-03 Thread Ross Schmidt
Use round_up instead of inline _RND8.

Signed-off-by: Ross Schmidt 
---
 drivers/staging/rtl8723bs/hal/rtl8723bs_recv.c|  2 +-
 drivers/staging/rtl8723bs/include/osdep_service.h | 11 ---
 2 files changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/staging/rtl8723bs/hal/rtl8723bs_recv.c 
b/drivers/staging/rtl8723bs/hal/rtl8723bs_recv.c
index b9ccaad748ea..1fbf89cb72d0 100644
--- a/drivers/staging/rtl8723bs/hal/rtl8723bs_recv.c
+++ b/drivers/staging/rtl8723bs/hal/rtl8723bs_recv.c
@@ -369,7 +369,7 @@ static void rtl8723bs_recv_tasklet(struct tasklet_struct *t)
}
}
 
-   pkt_offset = _RND8(pkt_offset);
+   pkt_offset = round_up(pkt_offset, 8);
precvbuf->pdata += pkt_offset;
ptr = precvbuf->pdata;
precvframe = NULL;
diff --git a/drivers/staging/rtl8723bs/include/osdep_service.h 
b/drivers/staging/rtl8723bs/include/osdep_service.h
index 8f0e5cbf485b..c5e9a4eebd27 100644
--- a/drivers/staging/rtl8723bs/include/osdep_service.h
+++ b/drivers/staging/rtl8723bs/include/osdep_service.h
@@ -132,17 +132,6 @@ static inline int rtw_bug_check(void *parg1, void *parg2, 
void *parg3, void *par
 
#define _RND(sz, r) ((((sz)+((r)-1))/(r))*(r))
 
-static inline u32 _RND8(u32 sz)
-{
-
-   u32 val;
-
-   val = ((sz >> 3) + ((sz & 7) ? 1 : 0)) << 3;
-
-   return val;
-
-}
-
 #ifndef MAC_FMT
 #define MAC_FMT "%pM"
 #endif
-- 
2.26.2



[PATCH 8/8] staging: rtl8723bs: replace _cancel_timer with del_timer_sync

2020-10-03 Thread Ross Schmidt
Replace _cancel_timer with API function del_timer_sync.

One instance of del_timer_sync is moved, and an unnecessary
spin-unlock/spin-lock pair is removed.

Signed-off-by: Ross Schmidt 
---
 drivers/staging/rtl8723bs/core/rtw_cmd.c |  3 +--
 drivers/staging/rtl8723bs/core/rtw_mlme.c| 16 ++--
 drivers/staging/rtl8723bs/hal/sdio_ops.c |  3 +--
 .../rtl8723bs/include/osdep_service_linux.h  |  6 --
 4 files changed, 4 insertions(+), 24 deletions(-)

diff --git a/drivers/staging/rtl8723bs/core/rtw_cmd.c 
b/drivers/staging/rtl8723bs/core/rtw_cmd.c
index 047ec5167f86..2abe205e3453 100644
--- a/drivers/staging/rtl8723bs/core/rtw_cmd.c
+++ b/drivers/staging/rtl8723bs/core/rtw_cmd.c
@@ -2034,7 +2034,6 @@ void rtw_joinbss_cmd_callback(struct adapter *padapter,  
struct cmd_obj *pcmd)
 
 void rtw_createbss_cmd_callback(struct adapter *padapter, struct cmd_obj *pcmd)
 {
-   u8 timer_cancelled;
struct sta_info *psta = NULL;
struct wlan_network *pwlan = NULL;
struct  mlme_priv *pmlmepriv = &padapter->mlmepriv;
@@ -2049,7 +2048,7 @@ void rtw_createbss_cmd_callback(struct adapter *padapter, 
struct cmd_obj *pcmd)
_set_timer(&pmlmepriv->assoc_timer, 1);
}
 
-   _cancel_timer(&pmlmepriv->assoc_timer, &timer_cancelled);
+   del_timer_sync(&pmlmepriv->assoc_timer);
 
spin_lock_bh(&pmlmepriv->lock);
 
diff --git a/drivers/staging/rtl8723bs/core/rtw_mlme.c 
b/drivers/staging/rtl8723bs/core/rtw_mlme.c
index e65c5a870b46..9531ba54e95b 100644
--- a/drivers/staging/rtl8723bs/core/rtw_mlme.c
+++ b/drivers/staging/rtl8723bs/core/rtw_mlme.c
@@ -814,7 +814,6 @@ void rtw_survey_event_callback(struct adapter *adapter, u8 *pbuf)
 
 void rtw_surveydone_event_callback(struct adapter  *adapter, u8 *pbuf)
 {
-   u8 timer_cancelled = false;
struct  mlme_priv *pmlmepriv = &(adapter->mlmepriv);
 
spin_lock_bh(&pmlmepriv->lock);
@@ -827,22 +826,12 @@ void rtw_surveydone_event_callback(struct adapter *adapter, u8 *pbuf)
RT_TRACE(_module_rtl871x_mlme_c_, _drv_info_, 
("rtw_surveydone_event_callback: fw_state:%x\n\n", get_fwstate(pmlmepriv)));
 
if (check_fwstate(pmlmepriv, _FW_UNDER_SURVEY)) {
-   /* u8 timer_cancelled; */
-
-   timer_cancelled = true;
-   /* _cancel_timer(&pmlmepriv->scan_to_timer, &timer_cancelled); */
-
+   del_timer_sync(&pmlmepriv->scan_to_timer);
_clr_fwstate_(pmlmepriv, _FW_UNDER_SURVEY);
} else {
 
RT_TRACE(_module_rtl871x_mlme_c_, _drv_err_, ("nic status =%x, 
survey done event comes too late!\n", get_fwstate(pmlmepriv)));
}
-   spin_unlock_bh(&pmlmepriv->lock);
-
-   if (timer_cancelled)
-   _cancel_timer(&pmlmepriv->scan_to_timer, &timer_cancelled);
-
-   spin_lock_bh(&pmlmepriv->lock);
 
rtw_set_signal_stat_timer(&adapter->recvpriv);
 
@@ -1298,7 +1287,6 @@ static void rtw_joinbss_update_network(struct adapter *padapter, struct wlan_net
 void rtw_joinbss_event_prehandle(struct adapter *adapter, u8 *pbuf)
 {
static u8 retry;
-   u8 timer_cancelled;
struct sta_info *ptarget_sta = NULL, *pcur_sta = NULL;
struct  sta_priv *pstapriv = &adapter->stapriv;
struct  mlme_priv *pmlmepriv = &(adapter->mlmepriv);
@@ -1392,7 +1380,7 @@ void rtw_joinbss_event_prehandle(struct adapter *adapter, u8 *pbuf)
}
 
/* s5. Cancel assoc_timer */
-   _cancel_timer(&pmlmepriv->assoc_timer, &timer_cancelled);
+   del_timer_sync(&pmlmepriv->assoc_timer);
 
RT_TRACE(_module_rtl871x_mlme_c_, _drv_info_, ("Cancel 
assoc_timer\n"));
 
diff --git a/drivers/staging/rtl8723bs/hal/sdio_ops.c 
b/drivers/staging/rtl8723bs/hal/sdio_ops.c
index 465f51b99d39..369f55d11519 100644
--- a/drivers/staging/rtl8723bs/hal/sdio_ops.c
+++ b/drivers/staging/rtl8723bs/hal/sdio_ops.c
@@ -945,8 +945,7 @@ void sd_int_dpc(struct adapter *adapter)
if (hal->sdio_hisr & SDIO_HISR_CPWM1) {
struct reportpwrstate_parm report;
 
-   u8 bcancelled;
-   _cancel_timer(&(pwrctl->pwr_rpwm_timer), &bcancelled);
+   del_timer_sync(&(pwrctl->pwr_rpwm_timer));
 
report.state = SdioLocalCmd52Read1Byte(adapter, 
SDIO_REG_HCPWM1_8723B);
 
diff --git a/drivers/staging/rtl8723bs/include/osdep_service_linux.h 
b/drivers/staging/rtl8723bs/include/osdep_service_linux.h
index 4a5bdb93e75d..498d5474010c 100644
--- a/drivers/staging/rtl8723bs/include/osdep_service_linux.h
+++ b/drivers/staging/rtl8723bs/include/osdep_service_linux.h
@@ -83,12 +83,6 @@ static inline void _set_timer(_timer *ptimer, u32 delay_time)
mod_timer(ptimer, (jiffies + (delay_time * HZ / 1000)));
 }
 
-static inline void _cancel_timer(_timer *ptimer, u8 *bcancelled)
-{
-   del_timer_sync(ptimer);
-   *bcancelled =  true;/* true == 1; false == 0 */
-}
-
 static inline void _init_workitem(_workitem *pwork, void *pfunc, void *cntx)
 {
INIT_WORK(pwork, pfunc);
-- 
2.26.2



[PATCH 6/8] staging: rtl8723bs: replace RTW_GET_LE16 with get_unaligned_le16

2020-10-03 Thread Ross Schmidt
Replace RTW_GET_LE16 macro with get_unaligned_le16.

Signed-off-by: Ross Schmidt 
---
 drivers/staging/rtl8723bs/core/rtw_ap.c   | 5 +++--
 drivers/staging/rtl8723bs/core/rtw_ieee80211.c| 4 ++--
 drivers/staging/rtl8723bs/core/rtw_mlme_ext.c | 7 ---
 drivers/staging/rtl8723bs/include/osdep_service.h | 2 --
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/staging/rtl8723bs/core/rtw_ap.c 
b/drivers/staging/rtl8723bs/core/rtw_ap.c
index a76e81330756..4f270d509ad3 100644
--- a/drivers/staging/rtl8723bs/core/rtw_ap.c
+++ b/drivers/staging/rtl8723bs/core/rtw_ap.c
@@ -8,6 +8,7 @@
 
 #include 
 #include 
+#include <asm/unaligned.h>
 
 extern unsigned char RTW_WPA_OUI[];
 extern unsigned char WMM_OUI[];
@@ -995,12 +996,12 @@ int rtw_check_beacon_data(struct adapter *padapter, u8 
*pbuf,  int len)
/* beacon interval */
p = rtw_get_beacon_interval_from_ie(ie);/* ie + 8;  8: TimeStamp, 
2: Beacon Interval 2:Capability */
/* pbss_network->Configuration.BeaconPeriod = le16_to_cpu(*(unsigned 
short*)p); */
-   pbss_network->Configuration.BeaconPeriod = RTW_GET_LE16(p);
+   pbss_network->Configuration.BeaconPeriod = get_unaligned_le16(p);
 
/* capability */
/* cap = *(unsigned short *)rtw_get_capability_from_ie(ie); */
/* cap = le16_to_cpu(cap); */
-   cap = RTW_GET_LE16(ie);
+   cap = get_unaligned_le16(ie);
 
/* SSID */
p = rtw_get_ie(
diff --git a/drivers/staging/rtl8723bs/core/rtw_ieee80211.c 
b/drivers/staging/rtl8723bs/core/rtw_ieee80211.c
index 977f0ed53ad7..3b7a3c220032 100644
--- a/drivers/staging/rtl8723bs/core/rtw_ieee80211.c
+++ b/drivers/staging/rtl8723bs/core/rtw_ieee80211.c
@@ -500,7 +500,7 @@ int rtw_parse_wpa_ie(u8 *wpa_ie, int wpa_ie_len, int 
*group_cipher, int *pairwis
/* pairwise_cipher */
if (left >= 2) {
/* count = le16_to_cpu(*(u16*)pos); */
-   count = RTW_GET_LE16(pos);
+   count = get_unaligned_le16(pos);
pos += 2;
left -= 2;
 
@@ -570,7 +570,7 @@ int rtw_parse_wpa2_ie(u8 *rsn_ie, int rsn_ie_len, int 
*group_cipher, int *pairwi
/* pairwise_cipher */
if (left >= 2) {
  /* count = le16_to_cpu(*(u16*)pos); */
-   count = RTW_GET_LE16(pos);
+   count = get_unaligned_le16(pos);
pos += 2;
left -= 2;
 
diff --git a/drivers/staging/rtl8723bs/core/rtw_mlme_ext.c 
b/drivers/staging/rtl8723bs/core/rtw_mlme_ext.c
index 6db637701063..b912ad2f4b72 100644
--- a/drivers/staging/rtl8723bs/core/rtw_mlme_ext.c
+++ b/drivers/staging/rtl8723bs/core/rtw_mlme_ext.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include <asm/unaligned.h>
 
 static struct mlme_handler mlme_sta_tbl[] = {
{WIFI_ASSOCREQ, "OnAssocReq",   },
@@ -1213,7 +1214,7 @@ unsigned int OnAssocReq(struct adapter *padapter, union recv_frame *precv_frame)
goto asoc_class2_error;
}
 
-   capab_info = RTW_GET_LE16(pframe + WLAN_HDR_A3_LEN);
+   capab_info = get_unaligned_le16(pframe + WLAN_HDR_A3_LEN);
/* capab_info = le16_to_cpu(*(unsigned short *)(pframe + 
WLAN_HDR_A3_LEN)); */
 
left = pkt_len - (sizeof(struct ieee80211_hdr_3addr) + ie_offset);
@@ -1959,7 +1960,7 @@ unsigned int OnAction_back(struct adapter *padapter, union recv_frame *precv_fra
break;
 
case RTW_WLAN_ACTION_ADDBA_RESP: /* ADDBA response */
-   status = RTW_GET_LE16(&frame_body[3]);
+   status = get_unaligned_le16(&frame_body[3]);
tid = ((frame_body[5] >> 2) & 0x7);
 
if (status == 0) {
@@ -1989,7 +1990,7 @@ unsigned int OnAction_back(struct adapter *padapter, union recv_frame *precv_fra
~BIT((frame_body[3] >> 4) & 0xf);
 
/* reason_code = frame_body[4] | (frame_body[5] 
<< 8); */
-   reason_code = RTW_GET_LE16(&frame_body[4]);
+   reason_code = get_unaligned_le16(&frame_body[4]);
} else if ((frame_body[3] & BIT(3)) == BIT(3)) {
tid = (frame_body[3] >> 4) & 0x0F;
 
diff --git a/drivers/staging/rtl8723bs/include/osdep_service.h 
b/drivers/staging/rtl8723bs/include/osdep_service.h
index a26c8db302e0..2f7e1665b6b1 100644
--- a/drivers/staging/rtl8723bs/include/osdep_service.h
+++ b/drivers/staging/rtl8723bs/include/osdep_service.h
@@ -152,8 +152,6 @@ extern void rtw_free_netdev(struct net_device * netdev);
 
 #define RTW_GET_BE16(a) ((u16) (((a)[0] << 8) | (a)[1]))
 
-#define RTW_GET_LE16(a) ((u16) (((a)[1] << 8) | (a)[0]))
-
 void rtw_buf_free(u8 **buf, u32 *buf_len);
 void rtw_buf_update(u8 **buf, u32 *buf_len, u8 *src, u32 src_len);
 
-- 
2.26.2



[PATCH 4/8] staging: rtl8723bs: remove unused macros

2020-10-03 Thread Ross Schmidt
Remove several macros in osdep_service.h because they are not used.

Signed-off-by: Ross Schmidt 
---
 .../staging/rtl8723bs/include/osdep_service.h | 57 ---
 1 file changed, 57 deletions(-)

diff --git a/drivers/staging/rtl8723bs/include/osdep_service.h 
b/drivers/staging/rtl8723bs/include/osdep_service.h
index c5e9a4eebd27..da4aa3e71a4b 100644
--- a/drivers/staging/rtl8723bs/include/osdep_service.h
+++ b/drivers/staging/rtl8723bs/include/osdep_service.h
@@ -151,68 +151,11 @@ extern void rtw_free_netdev(struct net_device * netdev);
 /* Macros for handling unaligned memory accesses */
 
 #define RTW_GET_BE16(a) ((u16) (((a)[0] << 8) | (a)[1]))
-#define RTW_PUT_BE16(a, val)   \
-   do {\
-   (a)[0] = ((u16) (val)) >> 8;\
-   (a)[1] = ((u16) (val)) & 0xff;  \
-   } while (0)
 
 #define RTW_GET_LE16(a) ((u16) (((a)[1] << 8) | (a)[0]))
-#define RTW_PUT_LE16(a, val)   \
-   do {\
-   (a)[1] = ((u16) (val)) >> 8;\
-   (a)[0] = ((u16) (val)) & 0xff;  \
-   } while (0)
 
#define RTW_GET_BE24(a) ((((u32) (a)[0]) << 16) | (((u32) (a)[1]) << 8) | \
 ((u32) (a)[2]))
-#define RTW_PUT_BE24(a, val)   \
-   do {\
-   (a)[0] = (u8) ((((u32) (val)) >> 16) & 0xff);   \
-   (a)[1] = (u8) ((((u32) (val)) >> 8) & 0xff);\
-   (a)[2] = (u8) (((u32) (val)) & 0xff);   \
-   } while (0)
-
-#define RTW_GET_BE32(a) ((((u32) (a)[0]) << 24) | (((u32) (a)[1]) << 16) | \
-(((u32) (a)[2]) << 8) | ((u32) (a)[3]))
-#define RTW_PUT_BE32(a, val)   \
-   do {\
-   (a)[0] = (u8) ((((u32) (val)) >> 24) & 0xff);   \
-   (a)[1] = (u8) ((((u32) (val)) >> 16) & 0xff);   \
-   (a)[2] = (u8) ((((u32) (val)) >> 8) & 0xff);\
-   (a)[3] = (u8) (((u32) (val)) & 0xff);   \
-   } while (0)
-
-#define RTW_GET_LE32(a) ((((u32) (a)[3]) << 24) | (((u32) (a)[2]) << 16) | \
-(((u32) (a)[1]) << 8) | ((u32) (a)[0]))
-#define RTW_PUT_LE32(a, val)   \
-   do {\
-   (a)[3] = (u8) ((((u32) (val)) >> 24) & 0xff);   \
-   (a)[2] = (u8) ((((u32) (val)) >> 16) & 0xff);   \
-   (a)[1] = (u8) ((((u32) (val)) >> 8) & 0xff);\
-   (a)[0] = (u8) (((u32) (val)) & 0xff);   \
-   } while (0)
-
-#define RTW_GET_BE64(a) ((((u64) (a)[0]) << 56) | (((u64) (a)[1]) << 48) | \
-(((u64) (a)[2]) << 40) | (((u64) (a)[3]) << 32) | \
-(((u64) (a)[4]) << 24) | (((u64) (a)[5]) << 16) | \
-(((u64) (a)[6]) << 8) | ((u64) (a)[7]))
-#define RTW_PUT_BE64(a, val)   \
-   do {\
-   (a)[0] = (u8) (((u64) (val)) >> 56);\
-   (a)[1] = (u8) (((u64) (val)) >> 48);\
-   (a)[2] = (u8) (((u64) (val)) >> 40);\
-   (a)[3] = (u8) (((u64) (val)) >> 32);\
-   (a)[4] = (u8) (((u64) (val)) >> 24);\
-   (a)[5] = (u8) (((u64) (val)) >> 16);\
-   (a)[6] = (u8) (((u64) (val)) >> 8); \
-   (a)[7] = (u8) (((u64) (val)) & 0xff);   \
-   } while (0)
-
-#define RTW_GET_LE64(a) ((((u64) (a)[7]) << 56) | (((u64) (a)[6]) << 48) | \
-(((u64) (a)[5]) << 40) | (((u64) (a)[4]) << 32) | \
-(((u64) (a)[3]) << 24) | (((u64) (a)[2]) << 16) | \
-(((u64) (a)[1]) << 8) | ((u64) (a)[0]))
 
 void rtw_buf_free(u8 **buf, u32 *buf_len);
 void rtw_buf_update(u8 **buf, u32 *buf_len, u8 *src, u32 src_len);
-- 
2.26.2



[PATCH 5/8] staging: rtl8723bs: replace RTW_GET_BE24 with get_unaligned_be24

2020-10-03 Thread Ross Schmidt
Replace RTW_GET_BE24 macro with get_unaligned_be24.

Signed-off-by: Ross Schmidt 
---
 drivers/staging/rtl8723bs/core/rtw_ieee80211.c| 3 ++-
 drivers/staging/rtl8723bs/include/osdep_service.h | 3 ---
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/rtl8723bs/core/rtw_ieee80211.c 
b/drivers/staging/rtl8723bs/core/rtw_ieee80211.c
index ca98274ae390..977f0ed53ad7 100644
--- a/drivers/staging/rtl8723bs/core/rtw_ieee80211.c
+++ b/drivers/staging/rtl8723bs/core/rtw_ieee80211.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include <asm/unaligned.h>
 
 u8 RTW_WPA_OUI_TYPE[] = { 0x00, 0x50, 0xf2, 1 };
 u16 RTW_WPA_VERSION = 1;
@@ -874,7 +875,7 @@ static int rtw_ieee802_11_parse_vendor_specific(u8 *pos, 
uint elen,
return -1;
}
 
-   oui = RTW_GET_BE24(pos);
+   oui = get_unaligned_be24(pos);
switch (oui) {
case OUI_MICROSOFT:
/* Microsoft/Wi-Fi information elements are further typed and
diff --git a/drivers/staging/rtl8723bs/include/osdep_service.h 
b/drivers/staging/rtl8723bs/include/osdep_service.h
index da4aa3e71a4b..a26c8db302e0 100644
--- a/drivers/staging/rtl8723bs/include/osdep_service.h
+++ b/drivers/staging/rtl8723bs/include/osdep_service.h
@@ -154,9 +154,6 @@ extern void rtw_free_netdev(struct net_device * netdev);
 
 #define RTW_GET_LE16(a) ((u16) (((a)[1] << 8) | (a)[0]))
 
-#define RTW_GET_BE24(a) ((((u32) (a)[0]) << 16) | (((u32) (a)[1]) << 8) | \
-((u32) (a)[2]))
-
 void rtw_buf_free(u8 **buf, u32 *buf_len);
 void rtw_buf_update(u8 **buf, u32 *buf_len, u8 *src, u32 src_len);
 
-- 
2.26.2



[PATCH 2/8] staging: rtl8723bs: replace _RND4 with round_up()

2020-10-03 Thread Ross Schmidt
Use round_up instead of inline _RND4.

Signed-off-by: Ross Schmidt 
---
 drivers/staging/rtl8723bs/core/rtw_cmd.c  |  2 +-
 drivers/staging/rtl8723bs/hal/sdio_ops.c  |  2 +-
 drivers/staging/rtl8723bs/include/osdep_service.h | 11 ---
 3 files changed, 2 insertions(+), 13 deletions(-)

diff --git a/drivers/staging/rtl8723bs/core/rtw_cmd.c 
b/drivers/staging/rtl8723bs/core/rtw_cmd.c
index bd18d1803e27..047ec5167f86 100644
--- a/drivers/staging/rtl8723bs/core/rtw_cmd.c
+++ b/drivers/staging/rtl8723bs/core/rtw_cmd.c
@@ -469,7 +469,7 @@ int rtw_cmd_thread(void *context)
 
pcmdpriv->cmd_issued_cnt++;
 
-   pcmd->cmdsz = _RND4((pcmd->cmdsz));/* _RND4 */
+   pcmd->cmdsz = round_up((pcmd->cmdsz), 4);
 
memcpy(pcmdbuf, pcmd->parmbuf, pcmd->cmdsz);
 
diff --git a/drivers/staging/rtl8723bs/hal/sdio_ops.c 
b/drivers/staging/rtl8723bs/hal/sdio_ops.c
index 544d5a093229..465f51b99d39 100644
--- a/drivers/staging/rtl8723bs/hal/sdio_ops.c
+++ b/drivers/staging/rtl8723bs/hal/sdio_ops.c
@@ -474,7 +474,7 @@ static u32 sdio_write_port(
return _FAIL;
}
 
-   cnt = _RND4(cnt);
+   cnt = round_up(cnt, 4);
HalSdioGetCmdAddr8723BSdio(adapter, addr, cnt >> 2, &addr);
 
if (cnt > psdio->block_transfer_len)
diff --git a/drivers/staging/rtl8723bs/include/osdep_service.h 
b/drivers/staging/rtl8723bs/include/osdep_service.h
index ea3f4f3c86d2..8f0e5cbf485b 100644
--- a/drivers/staging/rtl8723bs/include/osdep_service.h
+++ b/drivers/staging/rtl8723bs/include/osdep_service.h
@@ -132,17 +132,6 @@ static inline int rtw_bug_check(void *parg1, void *parg2, 
void *parg3, void *par
 
#define _RND(sz, r) ((((sz)+((r)-1))/(r))*(r))
 
-static inline u32 _RND4(u32 sz)
-{
-
-   u32 val;
-
-   val = ((sz >> 2) + ((sz & 3) ? 1 : 0)) << 2;
-
-   return val;
-
-}
-
 static inline u32 _RND8(u32 sz)
 {
 
-- 
2.26.2



[PATCH 1/8] staging: rtl8723bs: replace RND4 with round_up()

2020-10-03 Thread Ross Schmidt
Use round_up instead of the RND4 define.

Signed-off-by: Ross Schmidt 
---
 drivers/staging/rtl8723bs/core/rtw_security.c | 6 +++---
 drivers/staging/rtl8723bs/core/rtw_xmit.c | 4 ++--
 drivers/staging/rtl8723bs/hal/sdio_ops.c  | 6 +++---
 drivers/staging/rtl8723bs/include/osdep_service.h | 1 -
 4 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/staging/rtl8723bs/core/rtw_security.c 
b/drivers/staging/rtl8723bs/core/rtw_security.c
index 7f74e1d05b3a..159d32ace2bc 100644
--- a/drivers/staging/rtl8723bs/core/rtw_security.c
+++ b/drivers/staging/rtl8723bs/core/rtw_security.c
@@ -260,7 +260,7 @@ void rtw_wep_encrypt(struct adapter *padapter, u8 
*pxmitframe)
arcfour_encrypt(, payload+length, 
crc, 4);
 
pframe += pxmitpriv->frag_len;
-   pframe = (u8 *)RND4((SIZE_PTR)(pframe));
+   pframe = (u8 *)round_up((SIZE_PTR)(pframe), 4);
}
}
 
@@ -716,7 +716,7 @@ u32 rtw_tkip_encrypt(struct adapter *padapter, u8 
*pxmitframe)
arcfour_encrypt(&mycontext, payload+length, crc, 4);
 
pframe += pxmitpriv->frag_len;
-   pframe = (u8 *)RND4((SIZE_PTR)(pframe));
+   pframe = (u8 *)round_up((SIZE_PTR)(pframe), 4);
}
}
 
@@ -1523,7 +1523,7 @@ u32 rtw_aes_encrypt(struct adapter *padapter, u8 
*pxmitframe)
 
aes_cipher(prwskey, pattrib->hdrlen, pframe, 
length);
pframe += pxmitpriv->frag_len;
-   pframe = (u8 *)RND4((SIZE_PTR)(pframe));
+   pframe = (u8 *)round_up((SIZE_PTR)(pframe), 4);
}
}
 
diff --git a/drivers/staging/rtl8723bs/core/rtw_xmit.c 
b/drivers/staging/rtl8723bs/core/rtw_xmit.c
index 571353404a95..6ecaff9728fd 100644
--- a/drivers/staging/rtl8723bs/core/rtw_xmit.c
+++ b/drivers/staging/rtl8723bs/core/rtw_xmit.c
@@ -865,7 +865,7 @@ static s32 xmitframe_addmic(struct adapter *padapter, 
struct xmit_frame *pxmitfr
payload = pframe;
 
for (curfragnum = 0; curfragnum < pattrib->nr_frags; 
curfragnum++) {
-   payload = (u8 *)RND4((SIZE_PTR)(payload));
+   payload = (u8 *)round_up((SIZE_PTR)(payload), 
4);
RT_TRACE(_module_rtl871x_xmit_c_, _drv_err_, 
("===curfragnum =%d, pframe = 0x%.2x, 0x%.2x, 0x%.2x, 0x%.2x, 0x%.2x, 0x%.2x, 
0x%.2x, 0x%.2x,!!!\n",
curfragnum, *payload, *(payload+1), 
*(payload+2), *(payload+3), *(payload+4), *(payload+5), *(payload+6), 
*(payload+7)));
 
@@ -1209,7 +1209,7 @@ s32 rtw_xmitframe_coalesce(struct adapter *padapter, _pkt 
*pkt, struct xmit_fram
 
addr = (SIZE_PTR)(pframe);
 
-   mem_start = (unsigned char *)RND4(addr) + hw_hdr_offset;
+   mem_start = (unsigned char *)round_up(addr, 4) + hw_hdr_offset;
memcpy(mem_start, pbuf_start + hw_hdr_offset, pattrib->hdrlen);
}
 
diff --git a/drivers/staging/rtl8723bs/hal/sdio_ops.c 
b/drivers/staging/rtl8723bs/hal/sdio_ops.c
index b6b4adb5a28a..544d5a093229 100644
--- a/drivers/staging/rtl8723bs/hal/sdio_ops.c
+++ b/drivers/staging/rtl8723bs/hal/sdio_ops.c
@@ -534,7 +534,7 @@ static s32 _sdio_local_read(
if (!mac_pwr_ctrl_on)
return _sd_cmd52_read(intfhdl, addr, cnt, buf);
 
-   n = RND4(cnt);
+   n = round_up(cnt, 4);
tmpbuf = rtw_malloc(n);
if (!tmpbuf)
return -1;
@@ -575,7 +575,7 @@ s32 sdio_local_read(
)
return sd_cmd52_read(intfhdl, addr, cnt, buf);
 
-   n = RND4(cnt);
+   n = round_up(cnt, 4);
tmpbuf = rtw_malloc(n);
if (!tmpbuf)
return -1;
@@ -859,7 +859,7 @@ static struct recv_buf *sd_recv_rxfifo(struct adapter 
*adapter, u32 size)
 
/*  Patch for some SDIO Host 4 bytes issue */
/*  ex. RK3188 */
-   readsize = RND4(size);
+   readsize = round_up(size, 4);
 
/* 3 1. alloc recvbuf */
recv_priv = >recvpriv;
diff --git a/drivers/staging/rtl8723bs/include/osdep_service.h 
b/drivers/staging/rtl8723bs/include/osdep_service.h
index be34e279670b..ea3f4f3c86d2 100644
--- a/drivers/staging/rtl8723bs/include/osdep_service.h
+++ b/drivers/staging/rtl8723bs/include/osdep_service.h
@@ -131,7 +131,6 @@ static inline int rtw_bug_check(void *parg1, void *parg2, 
void *parg3, void *par
 }
 
 #define _RND(sz, r) ((((sz)+((r)-1))/(r))*(r))
-#define RND4(x)(((x >> 2) + (((x & 3) == 0) ?  0 : 1)) << 2)
 
 static inline u32 _RND4(u32 sz)
 {
-- 
2.26.2



Re: [PATCH] MIPS: cevt-r4k: Enable intimer for Loongson64 CPUs with extimer

2020-10-03 Thread Jiaxun Yang



On October 2, 2020 9:27:21 PM GMT+08:00, Thomas Bogendoerfer 
 wrote:
>On Wed, Sep 23, 2020 at 07:02:54PM +0800, Jiaxun Yang wrote:
>>  
>> +#ifdef CONFIG_CPU_LOONGSON64
>> +static int c0_compare_int_enable(struct clock_event_device *cd)
>> +{
>> +if (cpu_has_extimer)
>> +set_c0_config6(LOONGSON_CONF6_INTIMER);
>
>why don't you simply do this in loongson64 board setup code and avoid
>the whole cluttering of #ifdef CONFIG_CPU_LOONGSON64 over common code?

Because I'm going to add extimer support later, which requires dynamically 
switching cevt-r4k.

This callback is required.

Thanks.

- Jiaxun

>
>Thomas.
>


[PATCH 7/8] staging: rtl8723bs: replace RTW_GET_BE16 with get_unaligned_be16

2020-10-03 Thread Ross Schmidt
Replace the RTW_GET_BE16 macro with get_unaligned_be16().

Signed-off-by: Ross Schmidt 
---
 drivers/staging/rtl8723bs/core/rtw_ieee80211.c| 4 ++--
 drivers/staging/rtl8723bs/core/rtw_recv.c | 3 ++-
 drivers/staging/rtl8723bs/include/osdep_service.h | 2 --
 drivers/staging/rtl8723bs/os_dep/recv_linux.c | 3 ++-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/rtl8723bs/core/rtw_ieee80211.c 
b/drivers/staging/rtl8723bs/core/rtw_ieee80211.c
index 3b7a3c220032..c43cca4a3828 100644
--- a/drivers/staging/rtl8723bs/core/rtw_ieee80211.c
+++ b/drivers/staging/rtl8723bs/core/rtw_ieee80211.c
@@ -801,8 +801,8 @@ u8 *rtw_get_wps_attr(u8 *wps_ie, uint wps_ielen, u16 
target_attr_id, u8 *buf_att
 
while (attr_ptr - wps_ie < wps_ielen) {
/*  4 = 2(Attribute ID) + 2(Length) */
-   u16 attr_id = RTW_GET_BE16(attr_ptr);
-   u16 attr_data_len = RTW_GET_BE16(attr_ptr + 2);
+   u16 attr_id = get_unaligned_be16(attr_ptr);
+   u16 attr_data_len = get_unaligned_be16(attr_ptr + 2);
u16 attr_len = attr_data_len + 4;
 
/* DBG_871X("%s attr_ptr:%p, id:%u, length:%u\n", __func__, 
attr_ptr, attr_id, attr_data_len); */
diff --git a/drivers/staging/rtl8723bs/core/rtw_recv.c 
b/drivers/staging/rtl8723bs/core/rtw_recv.c
index 7e1da0e35812..6979f8dbccb8 100644
--- a/drivers/staging/rtl8723bs/core/rtw_recv.c
+++ b/drivers/staging/rtl8723bs/core/rtw_recv.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static u8 SNAP_ETH_TYPE_IPX[2] = {0x81, 0x37};
 static u8 SNAP_ETH_TYPE_APPLETALK_AARP[2] = {0x80, 0xf3};
@@ -1906,7 +1907,7 @@ static int amsdu_to_msdu(struct adapter *padapter, union 
recv_frame *prframe)
while (a_len > ETH_HLEN) {
 
/* Offset 12 denote 2 mac address */
-   nSubframe_Length = RTW_GET_BE16(pdata + 12);
+   nSubframe_Length = get_unaligned_be16(pdata + 12);
 
if (a_len < (ETHERNET_HEADER_SIZE + nSubframe_Length)) {
DBG_871X("nRemain_Length is %d and nSubframe_Length is 
: %d\n", a_len, nSubframe_Length);
diff --git a/drivers/staging/rtl8723bs/include/osdep_service.h 
b/drivers/staging/rtl8723bs/include/osdep_service.h
index 2f7e1665b6b1..a94b72397ce7 100644
--- a/drivers/staging/rtl8723bs/include/osdep_service.h
+++ b/drivers/staging/rtl8723bs/include/osdep_service.h
@@ -150,8 +150,6 @@ extern void rtw_free_netdev(struct net_device * netdev);
 
 /* Macros for handling unaligned memory accesses */
 
-#define RTW_GET_BE16(a) ((u16) (((a)[0] << 8) | (a)[1]))
-
 void rtw_buf_free(u8 **buf, u32 *buf_len);
 void rtw_buf_update(u8 **buf, u32 *buf_len, u8 *src, u32 src_len);
 
diff --git a/drivers/staging/rtl8723bs/os_dep/recv_linux.c 
b/drivers/staging/rtl8723bs/os_dep/recv_linux.c
index b2a1bbb30df6..900ff3a3b014 100644
--- a/drivers/staging/rtl8723bs/os_dep/recv_linux.c
+++ b/drivers/staging/rtl8723bs/os_dep/recv_linux.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 void rtw_os_free_recvframe(union recv_frame *precvframe)
 {
@@ -69,7 +70,7 @@ _pkt *rtw_os_alloc_msdu_pkt(union recv_frame *prframe, u16 
nSubframe_Length, u8
skb_reserve(sub_skb, 12);
skb_put_data(sub_skb, (pdata + ETH_HLEN), nSubframe_Length);
 
-   eth_type = RTW_GET_BE16(_skb->data[6]);
+   eth_type = get_unaligned_be16(_skb->data[6]);
 
if (sub_skb->len >= 8 &&
((!memcmp(sub_skb->data, rfc1042_header, SNAP_SIZE) &&
-- 
2.26.2


