Re: [PATCH V5 0/2] nvme-pci: fix the timeout case when reset is ongoing

2018-01-18 Thread Keith Busch
On Thu, Jan 18, 2018 at 06:10:00PM +0800, Jianchao Wang wrote:
> Hello
> 
> Please consider the following scenario.
> nvme_reset_ctrl
>   -> set state to RESETTING
>   -> queue reset_work
>       (scheduling)
> nvme_reset_work
>   -> nvme_dev_disable
>     -> quiesce queues
>     -> nvme_cancel_request
>        on outstanding requests
> ---_boundary_
>   -> nvme initializing (issue request on adminq)
> 
> Before the _boundary_, we not only quiesce the queues, but also cancel
> all the outstanding requests.
> 
> A request could expire when the ctrl state is RESETTING.
>  - If the timeout occurs before the _boundary_, the expired requests
>    are from the previous work.
>  - Otherwise, the expired requests are from the controller initializing
>    procedure, such as sending cq/sq create commands to adminq to set up
>    io queues.
> In the current implementation, nvme_timeout cannot identify the _boundary_,
> so it only handles the second case above.

Bear with me a moment, as I'm only just now getting a real chance to look
at this, and I'm not quite sure I follow what problem this is solving.

The nvme_dev_disable routine makes forward progress without depending on
timeout handling to complete expired commands. Once controller disabling
completes, there can't possibly be any started requests that can expire.
So we don't need nvme_timeout to do anything for requests above the
boundary.


Re: [PATCH 6/6] s390: scrub registers on kernel entry and KVM exit

2018-01-18 Thread Christian Borntraeger


On 01/19/2018 07:29 AM, QingFeng Hao wrote:
> 
> 
> On 2018/1/17 17:48, Martin Schwidefsky wrote:
>> Clear all user space registers on entry to the kernel and all KVM guest
>> registers on KVM guest exit if the register does not contain either a
>> parameter or a result value.
> I am not sure if I understand this, but will it be safer?

It is similar to commit 0cb5b30698fd ("kvm: vmx: Scrub hardware GPRs at
VM-exit").
The idea is to minimize potential payload channels.

> And can we abstract the operations to be a macro like CLEAR_REG_7?

No, please.
xgr %r7,%r7
is absolutely clear what it does, a MACRO often is not.



[PATCH][V2] mtd: nand: marvell: fix spelling mistake: "suceed"-> "succeed"

2018-01-18 Thread Colin King
From: Colin Ian King 

Trivial fix to spelling mistakes in dev_err error message text.

Signed-off-by: Colin Ian King 
---
 drivers/mtd/nand/marvell_nand.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mtd/nand/marvell_nand.c b/drivers/mtd/nand/marvell_nand.c
index b8fec6093b75..4bd53b360277 100644
--- a/drivers/mtd/nand/marvell_nand.c
+++ b/drivers/mtd/nand/marvell_nand.c
@@ -517,7 +517,7 @@ static int marvell_nfc_prepare_cmd(struct nand_chip *chip)
 	/* Poll ND_RUN and clear NDSR before issuing any command */
 	ret = marvell_nfc_wait_ndrun(chip);
 	if (ret) {
-		dev_err(nfc->dev, "Last operation did not suceed\n");
+		dev_err(nfc->dev, "Last operation did not succeed\n");
 		return ret;
 	}
 
-- 
2.15.1



Re: [PATCH 4.4 045/115] sched/deadline: Throttle a constrained deadline task activated after the deadline

2018-01-18 Thread Greg Kroah-Hartman
On Fri, Jan 19, 2018 at 01:00:45AM +, Ben Hutchings wrote:
> On Mon, 2017-12-18 at 16:48 +0100, Greg Kroah-Hartman wrote:
> > 4.4-stable review patch.  If anyone has any objections, please let me
> > know.
> > 
> > --
> > 
> > From: Daniel Bristot de Oliveira 
> > 
> > 
> > [ Upstream commit df8eac8cafce7d086be3bd5cf5a838fa37594dfb ]
> [...]
> 
> I think this needs another fix on top:
> 
> commit ae83b56a56f8d9643dedbee86b457fa1c5d42f59
> Author: Xunlei Pang 
> Date:   Wed May 10 21:03:37 2017 +0800
> 
> sched/deadline: Zero out positive runtime after throttling constrained tasks

Now queued up, thanks.

> There's another fix related to this, but it doesn't appear to fix a
> regression and I don't know how critical it is:
> 
> commit 3effcb4247e74a51f5d8b775a1ee4abf87cc089a
> Author: Daniel Bristot de Oliveira 
> Date:   Mon May 29 16:24:03 2017 +0200
> 
> sched/deadline: Use the revised wakeup rule for suspending constrained dl tasks

I'll hold off on this one until someone actually asks for it, as it's a
big change.

thanks again for the review,

greg k-h


Re: [PATCH 4.4 040/115] scsi: hpsa: update check for logical volume status

2018-01-18 Thread Greg Kroah-Hartman
On Fri, Jan 19, 2018 at 12:29:12AM +, Ben Hutchings wrote:
> On Mon, 2017-12-18 at 16:48 +0100, Greg Kroah-Hartman wrote:
> > 4.4-stable review patch.  If anyone has any objections, please let me know.
> > 
> > --
> > 
> > From: Don Brace 
> > 
> > 
> > [ Upstream commit 85b29008d8af6d94a0723aaa8d93cfb6e041158b ]
> > 
> >  - Add in a new case for volume offline. Resolves internal testing bug
> >    for multilun array management.
> >  - Return correct status for failed TURs.
> [...]
> 
> This apparently caused a regression that is fixed by:
> 
> commit eb94588dabec82e012281608949a860f64752914
> Author: Tomas Henzl 
> Date:   Mon Mar 20 16:42:48 2017 +0100
> 
> scsi: hpsa: fix volume offline state

Many thanks, also now queued up for 4.9 which needs this too.

greg k-h


Re: [PATCH] general protection fault in sock_has_perm

2018-01-18 Thread Greg KH
On Thu, Jan 18, 2018 at 01:58:45PM -0800, Mark Salyzyn wrote:
> general protection fault:  [#1] PREEMPT SMP KASAN
> CPU: 1 PID: 14233 Comm: syz-executor2 Not tainted 4.4.112-g5f6325b #28
> task: 8801d1095f00 task.stack: 8800b595
> RIP: 0010:[]  [] 
> sock_has_perm+0x1fe/0x3e0 security/selinux/hooks.c:4069
> RSP: 0018:8800b5957ce0  EFLAGS: 00010202
> RAX: dc00 RBX: 110016b2af9f RCX: 81b69b51
> RDX: 0002 RSI:  RDI: 0010
> RBP: 8800b5957de0 R08: 0001 R09: 0001
> R10:  R11: 110016b2af68 R12: 8800b5957db8
> R13:  R14: 8800b7259f40 R15: 00d7
> FS:  7f72f5ae2700() GS:8801db30() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 00a2fa38 CR3: 0001d798 CR4: 00160670
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Stack:
>  81b69a1f 8800b5957d58 8000b5957d30 41b58ab3
>  83fc82f2 81b69980 0246 8801d1096770
>  8801d3165668 8157844b 8801d1095f00
>  8801
> Call Trace:
> [] selinux_socket_setsockopt+0x4d/0x80 
> security/selinux/hooks.c:4338
> [] security_socket_setsockopt+0x7d/0xb0 
> security/security.c:1257
> [] SYSC_setsockopt net/socket.c:1757 [inline]
> [] SyS_setsockopt+0xe8/0x250 net/socket.c:1746
> [] entry_SYSCALL_64_fastpath+0x16/0x92
> Code: c2 42 9b b6 81 be 01 00 00 00 48 c7 c7 a0 cb 2b 84 e8
> f7 2f 6d ff 49 8d 7d 10 48 b8 00 00 00 00 00 fc ff df 48 89
> fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 83 01 00
> 00 41 8b 75 10 31
> RIP  [] sock_has_perm+0x1fe/0x3e0 
> security/selinux/hooks.c:4069
> RSP 
> ---[ end trace 7b5aaf788fef6174 ]---
> 
> In the absence of commit a4298e4522d6 ("net: add SOCK_RCU_FREE socket
> flag") and all the associated infrastructure changes to take advantage
> of a RCU grace period before freeing, there is a heightened
> possibility that a security check is performed while an ill-timed
> setsockopt call races in from user space.  It then is prudent to null
> check sk_security, and if the case, reject the permissions.
> 
> This adjustment is orthogonal to infrastructure improvements that may
> nullify the needed check, but should be added as good code hygiene.
> 
> Signed-off-by: Mark Salyzyn 
> Cc: Paul Moore 
> Cc: Stephen Smalley 
> Cc: Eric Paris 
> Cc: James Morris 
> Cc: "Serge E. Hallyn" 
> Cc: seli...@tycho.nsa.gov
> Cc: linux-security-mod...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: sta...@vger.kernel.org
> ---
> This patch should be applied to all stable trees (author wants
> minimum of 3.18, 4.4, 4.9 and 4.14)

Note, if you want this type of thing to show up in the patch itself, so
I will see it when it hits Linus's tree, you can just change the stable
line to be:
cc: stable  # 3.18+

thanks,

greg k-h


Re: Re: Re: [PATCH v6] mfd: Add support for RTS5250S power saving

2018-01-18 Thread 冯锐
> On Wed, Dec 27, 2017 at 05:37:50PM -0600, Bjorn Helgaas wrote:
> > On Tue, Dec 19, 2017 at 08:15:24AM +, 冯锐 wrote:
> > > > On Fri, Dec 15, 2017 at 09:42:45AM +, 冯锐 wrote:
> > > > > > [+cc Hans, Dave, linux-pci]
> > > > > >
> > > > > > On Thu, Sep 07, 2017 at 04:26:39PM +0800, rui_f...@realsil.com.cn wrote:
> > > > > > > From: Rui Feng 
> > > > > >
> > > > > > I wish this had been posted to linux-pci before being merged.
> > > > > >
> > > > > > I'm concerned because some of this appears to overlap and
> > > > > > conflict with PCI core management of ASPM.
> > > > > >
> > > > > > I assume these devices advertise ASPM support in their Link
> > > > > > Capabilities registers, right?  If so, why isn't the existing
> > > > > > PCI core ASPM support sufficient?
> > > > > >
> > > > > When L1SS is configured, the device (hardware) can't enter L1SS
> > > > > status automatically; it needs the driver (software) to do some
> > > > > work to achieve the function.
> > > >
> > > > So this is a hardware defect in the device?  As far as I know,
> > > > ASPM and L1SS are specified such that they should work without
> > > > special driver support.
> > > >
> > > Yes, you can say that.
> > >
> > > > > > > Enable power saving for RTS5250S as following steps:
> > > > > > > 1.Set 0xFE58 to enable clock power management.
> > > > > >
> > > > > > Is this clock power management something specific to RTS5250S,
> > > > > > or is it standard PCIe architected stuff?
> > > > > >
> > > > > 0xFE58 is a register specific to the RTS5250S, not standard PCIe
> > > > > architected stuff.
> > > >
> > > > OK.  I asked because devices often mirror architected PCIe config
> > > > things in device-specific MMIO space, and if I squint just right,
> > > > I can sort of match up the register bits you used with things in the
> > > > PCIe spec.
> > > >
> > > > > > > 2.Check cfg space whether support L1SS or not.
> > > > > >
> > > > > > This sounds like standard PCIe ASPM L1 Substates, right?
> > > > > >
> > > > > Yes.
> > > > >
> > > > > > > 3.If support L1SS, set 0xFF03 to free clkreq.
> > > > > > > 4.When entering idle status, enable aspm
> > > > > > >   and set parameters for L1SS and LTR.
> > > > > > > 5.When entering run status, disable aspm
> > > > > > >   and set parameters for L1SS and LTR.
> > > > > >
> > > > > > In general, drivers should not configure ASPM, L1SS, and LTR
> > > > > > themselves; the PCI core should do that.
> > > > > >
> > > > > > If a driver needs to tweak ASPM at run-time, it should use
> > > > > > interfaces exported by the PCI core to do so.
> > > > > >
> > > > > Which interface can I use to set ASPM? I use "pci_write_config_byte" now.
> > > >
> > > > What do you need to do?  include/linux/pci-aspm.h exports
> > > > pci_disable_link_state(), which is mainly used to avoid ASPM
> > > > states that have hardware errata.
> > > >
> > > I want to enable ASPM(L0 -> L1) and disable ASPM(L1 -> L0), which
> > > interface can I use?
> >
> > You can use pci_disable_link_state() to disable usage of L1.
> >
> > Currently there is no corresponding pci_enable_link_state().  What if
> > we added something like the following (untested)?  Would that work for
> > you?
> 
> Hi Rui,
> 
> Any thoughts on the patch below?

I'm busy with other work, the patch seems ok, I will test it later.
> 
> > commit 209930d809fa602b8aafdd171b26719cee6c6649
> > Author: Bjorn Helgaas 
> > Date:   Wed Dec 27 16:56:26 2017 -0600
> >
> > PCI/ASPM: Add pci_enable_link_state()
> >
> > Some drivers want control over the ASPM states their device is allowed to
> > use.  We already have a pci_disable_link_state(), and drivers can use that
> > to prevent the device from entering L0s or L1.
> >
> > Add a corresponding pci_enable_link_state() so a driver can enable use of
> > L0s or L1 again.
> >
> > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > index 3b9b4d50cd98..ca217195f800 100644
> > --- a/drivers/pci/pcie/aspm.c
> > +++ b/drivers/pci/pcie/aspm.c
> > @@ -1028,6 +1028,67 @@ void pcie_aspm_powersave_config_link(struct pci_dev *pdev)
> >  	up_read(&pci_bus_sem);
> >  }
> >
> > +/**
> > + * pci_enable_link_state - Enable device's link state, so the link may
> > + * enter specific states.  Note that if the BIOS didn't grant ASPM
> > + * control to the OS, this does nothing because we can't touch the LNKCTL
> > + * register.
> > + *
> > + * @pdev: PCI device
> > + * @state: ASPM link state to enable
> > + */
> > +void pci_enable_link_state(struct pci_dev *pdev, int state)
> > +{
> > +	struct pci_dev *parent = pdev->bus->self;
> > +	struct pcie_link_state *link;
> > +	u32 lnkcap;
> > +
> > +	if (!pci_is_pcie(pdev))
> > +		return;
> > +
> > +	if (pdev->has_secondary_link)
> > +		parent = pdev;
> > +	if (!parent || !parent->link_state)
> > +		return;
> > +
> > +	/*
> > +	 * A driver requested that ASPM be enabled on this device, but
> > +	 * if we don't have per

Re: [RFC PATCH] blk-mq: fixup RESTART when queue becomes idle

2018-01-18 Thread Ming Lei
On Fri, Jan 19, 2018 at 05:09:46AM +, Bart Van Assche wrote:
> On Fri, 2018-01-19 at 10:32 +0800, Ming Lei wrote:
> > Now most of the time both NVMe and SCSI won't return BLK_STS_RESOURCE, and
> > it should be DM-only which returns STS_RESOURCE so often.
> 
> That's wrong at least for SCSI. See also 
> https://marc.info/?l=linux-block&m=151578329417076.
> 

> For other scenarios, e.g. if a SCSI initiator submits a
> SCSI request over a fabric and the SCSI target replies with "BUSY" then the

Could you explain a bit when SCSI target replies with BUSY very often?

Inside initiator, we have limited the max per-LUN requests and per-host
requests already before calling .queue_rq().

> SCSI core will end the I/O request with status BLK_STS_RESOURCE after the
> maximum number of retries has been reached (see also scsi_io_completion()).
> In that last case, if a SCSI target sends a "BUSY" reply over the wire back
> to the initiator, there is no other approach for the SCSI initiator to
> figure out whether it can queue another request than to resubmit the
> request. The worst possible strategy is to resubmit a request immediately
> because that will cause a significant fraction of the fabric bandwidth to
> be used just for replying "BUSY" to requests that can't be processed
> immediately.


-- 
Ming


Re: [patch v17 2/4] drivers: jtag: Add Aspeed SoC 24xx and 25xx families JTAG master driver

2018-01-18 Thread kbuild test robot
Hi Oleksandr,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.15-rc8]
[cannot apply to next-20180118]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Oleksandr-Shamray/drivers-jtag-Add-JTAG-core-driver/20180119-123719
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 7.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=ia64 

Note: the linux-review/Oleksandr-Shamray/drivers-jtag-Add-JTAG-core-driver/20180119-123719 HEAD b9c3d4721186f8264960ad87c6c499cdd1b6c2e8 builds fine.
      It only hurts bisectability.

All error/warnings (new ones prefixed by >>):

   drivers/jtag/jtag-aspeed.c: In function 'aspeed_jtag_init':
>> drivers/jtag/jtag-aspeed.c:657:21: error: implicit declaration of function 'devm_reset_control_get_shared'; did you mean 'devm_pinctrl_get_select'? [-Werror=implicit-function-declaration]
 aspeed_jtag->rst = devm_reset_control_get_shared(aspeed_jtag->dev,
^
devm_pinctrl_get_select
>> drivers/jtag/jtag-aspeed.c:657:19: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
 aspeed_jtag->rst = devm_reset_control_get_shared(aspeed_jtag->dev,
  ^
>> drivers/jtag/jtag-aspeed.c:664:2: error: implicit declaration of function 'reset_control_deassert' [-Werror=implicit-function-declaration]
 reset_control_deassert(aspeed_jtag->rst);
 ^~
   drivers/jtag/jtag-aspeed.c: In function 'aspeed_jtag_deinit':
>> drivers/jtag/jtag-aspeed.c:707:2: error: implicit declaration of function 'reset_control_assert' [-Werror=implicit-function-declaration]
 reset_control_assert(aspeed_jtag->rst);
 ^~~~
   cc1: some warnings being treated as errors

vim +657 drivers/jtag/jtag-aspeed.c

   631  
   632  int aspeed_jtag_init(struct platform_device *pdev,
   633   struct aspeed_jtag *aspeed_jtag)
   634  {
   635  struct resource *res;
   636  int err;
   637  
   638  res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
   639  aspeed_jtag->reg_base = devm_ioremap_resource(aspeed_jtag->dev, res);
   640  if (IS_ERR(aspeed_jtag->reg_base))
   641  return -ENOMEM;
   642  
   643  aspeed_jtag->pclk = devm_clk_get(aspeed_jtag->dev, NULL);
   644  if (IS_ERR(aspeed_jtag->pclk)) {
   645  dev_err(aspeed_jtag->dev, "devm_clk_get failed\n");
   646  return PTR_ERR(aspeed_jtag->pclk);
   647  }
   648  
   649  aspeed_jtag->irq = platform_get_irq(pdev, 0);
   650  if (aspeed_jtag->irq < 0) {
   651  dev_err(aspeed_jtag->dev, "no irq specified\n");
   652  return -ENOENT;
   653  }
   654  
   655  clk_prepare_enable(aspeed_jtag->pclk);
   656  
 > 657  aspeed_jtag->rst = devm_reset_control_get_shared(aspeed_jtag->dev,
   658   NULL);
   659  if (IS_ERR(aspeed_jtag->rst)) {
   660  dev_err(aspeed_jtag->dev,
   661  "missing or invalid reset controller device tree entry");
   662  return PTR_ERR(aspeed_jtag->rst);
   663  }
 > 664  reset_control_deassert(aspeed_jtag->rst);
   665  
   666  /* Enable clock */
   667  aspeed_jtag_write(aspeed_jtag, ASPEED_JTAG_CTL_ENG_EN |
   668ASPEED_JTAG_CTL_ENG_OUT_EN, ASPEED_JTAG_CTRL);
   669  aspeed_jtag_write(aspeed_jtag, ASPEED_JTAG_SW_MODE_EN |
   670ASPEED_JTAG_SW_MODE_TDIO, ASPEED_JTAG_SW);
   671  
   672  err = devm_request_irq(aspeed_jtag->dev, aspeed_jtag->irq,
   673 aspeed_jtag_interrupt, 0,
   674 "aspeed-jtag", aspeed_jtag);
   675  if (err) {
   676  dev_err(aspeed_jtag->dev, "unable to get IRQ");
   677  goto clk_unprep;
   678  }
   679  dev_dbg(&pdev->dev, "IRQ %d.\n", aspeed_jtag->irq);
   680  
   681  aspeed_jtag_write(aspeed_jtag, ASPEED_JTAG_ISR_INST_PAUSE |
   682ASPEED_JTAG_ISR_INS

Re: [RFC PATCH] blk-mq: fixup RESTART when queue becomes idle

2018-01-18 Thread Ming Lei
On Thu, Jan 18, 2018 at 09:02:45PM -0700, Jens Axboe wrote:
> On 1/18/18 7:32 PM, Ming Lei wrote:
> > On Thu, Jan 18, 2018 at 01:11:01PM -0700, Jens Axboe wrote:
> >> On 1/18/18 11:47 AM, Bart Van Assche wrote:
>  This is all very tiresome.
> >>>
> >>> Yes, this is tiresome. It is very annoying to me that others keep
> >>> introducing so many regressions in such important parts of the kernel.
> >>> It is also annoying to me that I get blamed if I report a regression
> >>> instead of seeing that the regression gets fixed.
> >>
> >> I agree, it sucks that any change there introduces the regression. I'm
> >> fine with doing the delay insert again until a new patch is proven to be
> >> better.
> > 
> > That way is still buggy as I explained, since rerun queue before adding
> > request to hctx->dispatch_list isn't correct. Who can make sure the request
> > is visible when __blk_mq_run_hw_queue() is called?
> 
> That race basically doesn't exist for a 10ms gap.
> 
> > Not to mention that this way will cause a performance regression again.
> 
> How so? It's _exactly_ the same as what you are proposing, except mine
> will potentially run the queue when it need not do so. But given that
> these are random 10ms queue kicks because we are screwed, it should not
> matter. The key point is that it only should be if we have NO better
> options. If it's a frequently occurring event that we have to return
> BLK_STS_RESOURCE, then we need to get a way to register an event for
> when that condition clears. That event will then kick the necessary
> queue(s).

Please see queue_delayed_work_on(), hctx->run_work is shared by all
scheduling, once blk_mq_delay_run_hw_queue(100ms) returns, no new
scheduling can make progress during the 100ms.
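
To illustrate the point about the shared hctx->run_work, here is a minimal userspace sketch. It is not kernel code: the struct and the toy_* helpers are hypothetical names that only model the documented contract of queue_delayed_work_on(), namely that it returns false and leaves the armed timer alone when the work is already pending.

```c
#include <stdbool.h>

/* Toy model of one delayed work item; not the kernel's struct delayed_work. */
struct toy_delayed_work {
	bool pending;           /* true while a timer is armed */
	unsigned long expires;  /* time (in ms) at which the work will run */
};

/*
 * Follows the documented contract of queue_delayed_work_on(): if the work
 * is already pending, return false and leave the armed timer untouched.
 */
static bool toy_queue_delayed_work(struct toy_delayed_work *w,
				   unsigned long now, unsigned long delay)
{
	if (w->pending)
		return false;
	w->pending = true;
	w->expires = now + delay;
	return true;
}

/* Run the work if its timer has expired; returns true if it ran. */
static bool toy_tick(struct toy_delayed_work *w, unsigned long now)
{
	if (!w->pending || now < w->expires)
		return false;
	w->pending = false;
	return true;
}
```

In this model, once the work is queued at t=0 with a delay of 100, a later request at t=10 with a delay of 0 is rejected and the work still runs only at t=100.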

> 
> >> From the original topic of this email, we have conditions that can cause
> >> the driver to not be able to submit an IO. A set of those conditions can
> >> only happen if IO is in flight, and those cases we have covered just
> >> fine. Another set can potentially trigger without IO being in flight.
> >> These are cases where a non-device resource is unavailable at the time
> >> of submission. This might be iommu running out of space, for instance,
> >> or it might be a memory allocation of some sort. For these cases, we
> >> don't get any notification when the shortage clears. All we can do is
> >> ensure that we restart operations at some point in the future. We're SOL
> >> at that point, but we have to ensure that we make forward progress.
> > 
> > Right, it is a generic issue, not DM-specific one, almost all drivers
> > call kmalloc(GFP_ATOMIC) in IO path.
> 
> GFP_ATOMIC basically never fails, unless we are out of memory. The

I guess GFP_KERNEL may never fail, but GFP_ATOMIC failure is possible;
it is mentioned in [1] that there is such code in the mm allocation
path, and OOM can happen too.

  if (some randomly generated condition) && (request is atomic)
  return NULL;

[1] https://lwn.net/Articles/276731/
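
The pseudocode above could be sketched in userspace roughly as follows. This is only an illustration under stated assumptions: the toy_* names are hypothetical, malloc() stands in for the real allocator, and the "randomly generated condition" is swapped for a deterministic "every Nth atomic request fails" rule so that the behaviour is repeatable.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fault-injection state: fail every interval-th atomic request. */
struct toy_fault_cfg {
	unsigned int interval;  /* 0 disables injection */
	unsigned int calls;     /* atomic requests seen so far */
};

/* Deterministic replacement for the "randomly generated condition". */
static bool toy_should_fail(struct toy_fault_cfg *cfg, bool atomic)
{
	if (!atomic || cfg->interval == 0)
		return false;
	return (++cfg->calls % cfg->interval) == 0;
}

/* Allocation wrapper: atomic requests may return NULL, others go to malloc(). */
static void *toy_alloc(struct toy_fault_cfg *cfg, size_t size, bool atomic)
{
	if (toy_should_fail(cfg, atomic))
		return NULL;
	return malloc(size);
}
```

A caller in the IO path has to handle the NULL case, which is the situation where the driver ends up returning a resource error without any IO in flight.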

> exception is higher order allocations. If a driver has a higher order
> atomic allocation in its IO path, the device driver writer needs to be
> taken out behind the barn and shot. Simple as that. It will NEVER work
> well in a production environment. Witness the disaster that so many NIC
> driver writers have learned.
> 
> This is NOT the case we care about here. It's resources that are more
> readily depleted because other devices are using them. If it's a high
> frequency or generally occurring event, then we simply must have a
> callback to restart the queue from that. The condition then becomes
> identical to device private starvation, the only difference being from
> where we restart the queue.
> 
> > IMO, there is enough time for figuring out a generic solution before
> > 4.16 release.
> 
> I would hope so, but the proposed solutions have not filled me with
> a lot of confidence in the end result so far.
> 
> >> That last set of conditions better not be a common occurrence, since
> >> performance is down the toilet at that point. I don't want to introduce
> >> hot path code to rectify it. Have the driver return if that happens in a
> >> way that is DIFFERENT from needing a normal restart. The driver knows if
> >> this is a resource that will become available when IO completes on this
> >> device or not. If we get that return, we have a generic run-again delay.
> > 
> > > Now most of the time both NVMe and SCSI won't return BLK_STS_RESOURCE, and
> > > it should be DM-only which returns STS_RESOURCE so often.
> 
> > Where does the dm STS_RESOURCE error usually come from - what exact
> > resource are we running out of?

It is from blk_get_request(underlying queue), see multipath_clone_and_map().

Thanks,
Ming


Re: [RESEND PATCH 3/3] x86/apic: Clean up the names of legacy irq mode setting related functions

2018-01-18 Thread Baoquan He
On 01/19/18 at 02:42pm, Dou Liyang wrote:
> Hi Baoquan,
> 
> At 01/05/2018 12:39 PM, Baoquan He wrote:
> [...]
> >   /*
> > - * Not an __init, needed by kexec/kdump code.
> > - * For safety IO-APIC and Local APIC need be cleared before this.
> > + * In legacy irq mode, full DOS compatibility with the uniprocessor PC/AT is
> > + * provided by using the APICs in conjunction with standard 8259A-equivalent
> > + * programmable interrupt controllers (PICs). It's necessary to deliver legacy
> > + * interrupts even when APIC mode is not enabled. This is required by kexec/
> > + * kdump before enter into the 2nd kernel.
> >*/
> >   void switch_to_legacy_irq_mode(void)
> >   {
> > if (!nr_legacy_irqs())
> > return;
> > -   x86_io_apic_ops.disable();
> > +   ioapic_set_virtual_wire_mode();
> > +
> > +   if (boot_cpu_has(X86_FEATURE_APIC) || apic_from_smp_config())
> > +   lapic_set_legacy_irq_mode(ioapic_i8259.pin != -1);
> 
> Seems these two functions, ioapic/lapic_set_legacy_irq_mode, should be
> exclusive.

Thanks for looking into this, dou!

They might not be exclusive. See mp_spec section 3.6.2.2, Virtual Wire
Mode: there are two kinds of virtual wire mode. In one, the
8259A-equivalent PICs are connected to lint0 of the boot cpu's LAPIC; in
the other, the 8259A-equivalent PICs go through the IO-APIC, which is
then connected to lint0 of the LAPIC. In either case, the LAPIC needs to
be set as through-lapic.

Above is what I got from mp_spec. But judging from the functions
native_disable_io_apic() and disconnect_bsp_APIC(), the code seems to
say that if the io-apic is connected to the 8259A-equivalent PICs, we
need to mask lvt0 of the LAPIC. This conflicts with mp_spec 3.6.2.2.

Thanks
Baoquan
> 
> But we do that because both the through-lapic and through-ioapic virtual
> wire modes need to set up APIC_SPIV_APIC_ENABLED, which is only done in
> lapic_set_legacy_irq_mode(). So we need to call them both.
> 
> IMO, this cleanup may not make it clear. We can separate these two modes
> totally or just keep it like before.
> 
> Thanks,
>   dou.
> >   }
> >   #ifdef CONFIG_X86_32
> > diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
> > index 1151ccd72ce9..c30f0f273dbd 100644
> > --- a/arch/x86/kernel/x86_init.c
> > +++ b/arch/x86/kernel/x86_init.c
> > @@ -148,5 +148,5 @@ void arch_restore_msi_irqs(struct pci_dev *dev)
> >   struct x86_io_apic_ops x86_io_apic_ops __ro_after_init = {
> > .read   = native_io_apic_read,
> > -   .disable= native_disable_io_apic,
> > +   .disable= switch_to_legacy_irq_mode,
> >   };
> > diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
> > index 49721b4e1975..751472ddf536 100644
> > --- a/drivers/iommu/irq_remapping.c
> > +++ b/drivers/iommu/irq_remapping.c
> > @@ -37,7 +37,7 @@ static void irq_remapping_disable_io_apic(void)
> >  * now.
> >  */
> > if (boot_cpu_has(X86_FEATURE_APIC) || apic_from_smp_config())
> > -   disconnect_bsp_APIC(0);
> > +   lapic_set_legacy_irq_mode(0);
> >   }
> >   static void __init irq_remapping_modify_x86_ops(void)
> > 
> 
> 


Re: [PATCH v4] perf report: Fix regression when decoding intelPT traces

2018-01-18 Thread Adrian Hunter
On 18/01/18 18:29, Arnaldo Carvalho de Melo wrote:
> Em Wed, Jan 10, 2018 at 01:31:52PM -0700, Mathieu Poirier escreveu:
>> Commit (93d10af26bb7 perf tools: Optimize sample parsing for ordered
>> events) breaks intelPT trace decoding by invariably returning an error if
>> the event type isn't a PERF_SAMPLE_TIME.
> 
> Adrian, have you had the chance of looking at this?
> 
> I'm tentatively applying with Jiri's ack.

Yes, it is fine.  FWIW

Acked-by: Adrian Hunter 

> 
> - Arnaldo
>  
>> With this patch the timestamp is initialised and processing is allowed to
>> continue if the error returned by function
>> perf_evlist__parse_sample_timestamp() is not a fault.
>>
>> Signed-off-by: Mathieu Poirier 
>> Acked-by: Jiri Olsa 
>> ---
>> Changes for v4:
>> - Rebased to latest perf/core branch
>> - Added Jiri's ACK
>> ---
>>  tools/perf/util/session.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
>> index 54e30f1bcbd7..07221884f725 100644
>> --- a/tools/perf/util/session.c
>> +++ b/tools/perf/util/session.c
>> @@ -1508,10 +1508,10 @@ static s64 perf_session__process_event(struct perf_session *session,
>>  		return perf_session__process_user_event(session, event, file_offset);
>>  
>>  	if (tool->ordered_events) {
>> -		u64 timestamp;
>> +		u64 timestamp = -1ULL;
>>  
>>  		ret = perf_evlist__parse_sample_timestamp(evlist, event, &timestamp);
>> -		if (ret)
>> +		if (ret && ret != -1)
>>  			return ret;
>>  
>>  		ret = perf_session__queue_event(session, event, timestamp, file_offset);
>> -- 
>> 2.7.4
> 
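
The control flow of the quoted fix can be sketched in userspace as follows. This is a simplified model, not the perf code: the toy_* functions and their parameters are hypothetical, and it assumes the parser returns -1 when the event carries no sample time and -EFAULT for a real fault, which is how the patch description reads.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical parser: 0 on success, -1 if the event has no timestamp,
 * -EFAULT if the event data is bad.
 */
static int toy_parse_timestamp(int has_time, int corrupt, uint64_t raw,
			       uint64_t *ts)
{
	if (corrupt)
		return -EFAULT;
	if (!has_time)
		return -1;
	*ts = raw;
	return 0;
}

/* Mirrors the patched logic: only a real fault aborts processing. */
static int toy_process_event(int has_time, int corrupt, uint64_t raw,
			     uint64_t *queued_ts)
{
	uint64_t timestamp = UINT64_MAX;  /* same bit pattern as -1ULL */
	int ret;

	ret = toy_parse_timestamp(has_time, corrupt, raw, &timestamp);
	if (ret && ret != -1)
		return ret;

	*queued_ts = timestamp;  /* stands in for queueing the event */
	return 0;
}
```

With this shape, an event without a sample time is still queued, carrying the initial all-ones timestamp, while a fault is still reported to the caller.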



[PATCH] Fix explanation of lower bits in the SPARSEMEM mem_map pointer

2018-01-18 Thread Petr Tesarik
The comment is confusing. On the one hand, it refers to 32-bit
alignment (struct page alignment on 32-bit platforms), but this
would only guarantee that the 2 lowest bits must be zero. On the
other hand, it claims that at least 3 bits are available, and 3 bits
are actually used.

This is not broken, because there is a stronger alignment guarantee,
just less obvious. Let's fix the comment to make it clear how many
bits are available and why.

Signed-off-by: Petr Tesarik 
---
 include/linux/mmzone.h | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 67f2e3c38939..7522a6987595 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1166,8 +1166,16 @@ extern unsigned long usemap_size(void);
 
 /*
  * We use the lower bits of the mem_map pointer to store
- * a little bit of information.  There should be at least
- * 3 bits here due to 32-bit alignment.
+ * a little bit of information.  The pointer is calculated
+ * as mem_map - section_nr_to_pfn(pnum).  The result is
+ * aligned to the minimum alignment of the two values:
+ *   1. All mem_map arrays are page-aligned.
+ *   2. section_nr_to_pfn() always clears PFN_SECTION_SHIFT
+ *  lowest bits.  PFN_SECTION_SHIFT is arch-specific
+ *  (equal SECTION_SIZE_BITS - PAGE_SHIFT), and the
+ *  worst combination is powerpc with 256k pages,
+ *  which results in PFN_SECTION_SHIFT equal 6.
+ * To sum it up, at least 6 bits are available.
  */
 #define	SECTION_MARKED_PRESENT	(1UL<<0)
 #define SECTION_HAS_MEM_MAP	(1UL<<1)
-- 
2.13.6
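
The alignment argument in the new comment can be checked with a small userspace sketch. Everything here is an assumption for illustration: the TOY_* constants model the powerpc 256k-page case named in the patch (PAGE_SHIFT 18, which implies SECTION_SIZE_BITS 24 for the stated result of 6), the helper names are hypothetical, and plain 64-bit integers stand in for struct page pointers.

```c
#include <stdint.h>

/* Assumed worst case from the patch: powerpc with 256k pages. */
#define TOY_PAGE_SHIFT         18
#define TOY_SECTION_SIZE_BITS  24
#define TOY_PFN_SECTION_SHIFT  (TOY_SECTION_SIZE_BITS - TOY_PAGE_SHIFT)

/* Three flag bits kept in the low bits of the encoded value. */
#define TOY_FLAG_BITS  3
#define TOY_FLAG_MASK  ((UINT64_C(1) << TOY_FLAG_BITS) - 1)

/* First PFN of a section; its low TOY_PFN_SECTION_SHIFT bits are zero. */
static uint64_t toy_section_nr_to_pfn(uint64_t section_nr)
{
	return section_nr << TOY_PFN_SECTION_SHIFT;
}

/*
 * Byte-address equivalent of "mem_map - section_nr_to_pfn(pnum)", where the
 * subtraction is pointer arithmetic on elements of page_struct_size bytes.
 */
static uint64_t toy_encode_mem_map(uint64_t mem_map_addr, uint64_t section_nr,
				   uint64_t page_struct_size)
{
	return mem_map_addr - toy_section_nr_to_pfn(section_nr) * page_struct_size;
}

/* Count trailing zero bits of a non-zero value. */
static unsigned int toy_low_zero_bits(uint64_t v)
{
	unsigned int n = 0;

	while (!(v & 1)) {
		v >>= 1;
		n++;
	}
	return n;
}
```

Since the mem_map address is page-aligned and the subtracted term is a multiple of 2^6 elements, the encoded value keeps at least 6 zero low bits. That leaves room for the three flag bits, which can be ORed in and masked off again without touching the pointer part.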


Re: [PATCH v5 0/2] kprobes: improve error handling when arming/disarming kprobes

2018-01-18 Thread Masami Hiramatsu
Hi Ingo,

Could you pick this to tip tree?

Thank you,

On Wed, 10 Jan 2018 00:51:22 +0100
Jessica Yu  wrote:

> Hi,
> 
> This patchset attempts to improve error handling when arming or disarming
> ftrace-based kprobes. The current behavior is to simply WARN when ftrace
> (un-)registration fails, without propagating the error code. This can lead
> to confusing situations where, for example, register_kprobe()/enable_kprobe()
> would return 0 indicating success even if arming via ftrace had failed. In
> this scenario we'd end up with a non-functioning kprobe even though kprobe
> registration (or enablement) returned success. In this patchset, we take
> errors from ftrace into account and propagate the error when we cannot arm
> or disarm a kprobe.
> 
> Below is an example that illustrates the problem using livepatch and
> systemtap (which uses kprobes underneath). Both livepatch and kprobes use
> ftrace ops with the IPMODIFY flag set, so registration at the same
> function entry is limited to only one ftrace user. 
> 
> Before
> --
> # modprobe livepatch-sample   # patches cmdline_proc_show, ftrace ops has IPMODIFY set
> # stap -e 'probe kernel.function("cmdline_proc_show").call { printf ("cmdline_proc_show\n"); }'
> 
>.. (nothing prints after reading /proc/cmdline) ..
> 
> The systemtap handler doesn't execute due to a kprobe arming failure caused
> by a ftrace IPMODIFY conflict with livepatch, and there isn't an obvious
> indication of error from systemtap (because register_kprobe() returned
> success) unless the user inspects dmesg.
> 
> After
> -
> # modprobe livepatch-sample 
> # stap -e 'probe kernel.function("cmdline_proc_show").call { printf ("cmdline_proc_show\n"); }'
> WARNING: probe kernel.function("cmdline_proc_show@/home/jeyu/work/linux-next/fs/proc/cmdline.c:6").call (address 0xa82fe910) registration error (rc -16)
> 
> Although the systemtap handler doesn't execute (as it shouldn't), the
> ftrace error is propagated and now systemtap prints a visible error message
> stating that (kprobe) registration had failed (because register_kprobe()
> returned an error), along with the propagated error code.
> 
> This patchset was based on Petr Mladek's original patchset (patches 2 and 3)
> back in 2015, which improved kprobes error handling, found here:
> 
>https://lkml.org/lkml/2015/2/26/452
> 
> However, further work on this had been paused since then and the patches
> were not upstreamed.
> 
> This patchset has been lightly sanity-tested (on linux-next) with kprobes,
> kretprobes, and optimized kprobes. It passes the kprobes smoke test, but
> more testing is greatly appreciated.
> 
> Changes from v4:
>  - Switch from WARN() to pr_debug() in arm_kprobe_ftrace() so the stack
>dumps don't pollute dmesg, as IPMODIFY conflicts can occur in normal usage
>  - Added Masami's ack to the first patch
> 
> Changes from v3:
>  - Have (dis)arm_kprobe_ftrace() return -ENODEV instead of 0 in case of
>!CONFIG_KPROBES_ON_FTRACE
>  - Add total count of all probes tried in (dis)arm_all_kprobes()
> 
> Changes from v2:
>  - Add missing synchronize rcu in register_aggr_kprobe()
>  - s/kprobes/probes/ on error message in (dis)arm_all_kprobes()
> 
> Changes from v1:
> - Don't arm the kprobe before adding it to the kprobe table, otherwise
>   we'll temporarily see a stray breakpoint.
> - Remove kprobe from the kprobe_table and call synchronize_sched() if
>   arming during register_kprobe() fails.
> - add Masami's ack on the 2nd patch (unchanged from v1)
> 
> ---
> Jessica Yu (2):
>   kprobes: propagate error from arm_kprobe_ftrace()
>   kprobes: propagate error from disarm_kprobe_ftrace()
> 
>  kernel/kprobes.c | 178 +++
>  1 file changed, 128 insertions(+), 50 deletions(-)
> 
> -- 
> 2.13.6
> 


-- 
Masami Hiramatsu 
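
For readers skimming the thread, the behaviour change described in the cover
letter quoted above can be sketched in plain userspace C. This is only an
illustration under stated assumptions, not the kernel code:
fake_ftrace_register(), arm_probe_swallow() and arm_probe_propagate() are
made-up names, and -EBUSY is assumed as the failure code because the systemtap
output above reports rc -16, which is -EBUSY on Linux.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the ftrace registration step (not a kernel API).
 * It fails with -EBUSY when another IPMODIFY user already owns the entry. */
int fake_ftrace_register(bool ipmodify_taken)
{
	return ipmodify_taken ? -EBUSY : 0;
}

/* Behaviour described as "Before": complain loudly, then report success. */
int arm_probe_swallow(bool ipmodify_taken)
{
	int ret = fake_ftrace_register(ipmodify_taken);

	if (ret)
		fprintf(stderr, "WARN: arming failed (%d)\n", ret);
	return 0;	/* the error is lost here */
}

/* Behaviour described as "After": log quietly and hand the error back. */
int arm_probe_propagate(bool ipmodify_taken)
{
	int ret = fake_ftrace_register(ipmodify_taken);

	if (ret)
		fprintf(stderr, "debug: arming failed (%d)\n", ret);
	return ret;	/* the caller can now fail registration */
}
```

With the entry already taken, arm_probe_swallow(true) returns 0 while
arm_probe_propagate(true) returns -EBUSY, which is the difference a caller
such as systemtap would see.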


Re: [PATCH] xhci:Fix NULL pointer in xhci debugfs

2018-01-18 Thread Mathias Nyman

On 19.01.2018 04:13, Zhengjun Xing wrote:

Commit dde634057da7 ("xhci: Fix use-after-free in xhci debugfs") causes a
null pointer dereference while fixing xhci-debugfs usage of ring pointers
that were freed during hibernate.

The fix passed addresses to ring pointers instead, but forgot to do this
change for the xhci_ring_trb_show function.

The address of the ring pointer passed to xhci-debugfs was of a temporary
ring pointer "new_ring" instead of the actual ring "ring" pointer. The
temporary new_ring pointer will be set to NULL later causing the NULL
pointer dereference.

This issue was seen when reading xhci related files in debugfs:

cat /sys/kernel/debug/usb/xhci/*/devices/*/ep*/trbs

[  184.604861] BUG: unable to handle kernel NULL pointer dereference at (null)
[  184.613776] IP: xhci_ring_trb_show+0x3a/0x890
[  184.618733] PGD 264193067 P4D 264193067 PUD 263238067 PMD 0
[  184.625184] Oops:  [#1] SMP
[  184.726410] RIP: 0010:xhci_ring_trb_show+0x3a/0x890
[  184.731944] RSP: 0018:ba8243c0fd90 EFLAGS: 00010246
[  184.737880] RAX:  RBX:  RCX: 000295d6
[  184.746020] RDX: 000295d5 RSI: 0001 RDI: 971a6418d400
[  184.754121] RBP:  R08:  R09: 
[  184.76] R10: 971a64c98a80 R11: 971a62a00e40 R12: 971a62a85500
[  184.770325] R13: 0002 R14: 971a6418d400 R15: 971a6418d400
[  184.778448] FS:  7fe725a79700() GS:971a6ec0() 
knlGS:
[  184.787644] CS:  0010 DS:  ES:  CR0: 80050033
[  184.794168] CR2:  CR3: 00025f365005 CR4: 003606f0
[  184.802318] Call Trace:
[  184.805094]  ? seq_read+0x281/0x3b0
[  184.809068]  seq_read+0xeb/0x3b0
[  184.812735]  full_proxy_read+0x4d/0x70
[  184.817007]  __vfs_read+0x23/0x120
[  184.820870]  vfs_read+0x91/0x130
[  184.824538]  SyS_read+0x42/0x90
[  184.828106]  entry_SYSCALL_64_fastpath+0x1a/0x7d

Fixes: dde634057da7 ("xhci: Fix use-after-free in xhci debugfs")
Signed-off-by: Zhengjun Xing 
---


Thanks, adding to queue

-Mathias
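
A minimal userspace sketch of the bug class described in the commit message
above, assuming a simplified model: struct ring, struct endpoint,
endpoint_commit_ring() and ring_show() are hypothetical names, not the xhci
driver's own. A reader that was handed the address of the staging pointer
sees NULL once the staged ring has been promoted, while a reader handed the
address of the live pointer keeps working. Unlike the real oops, the reader
here checks for NULL so that the sketch can report the condition instead of
crashing.

```c
#include <stddef.h>

struct ring {
	int num_trbs;
};

/* Simplified endpoint: "ring" is the live ring, "new_ring" is a staging
 * pointer that is cleared once the ring has been promoted. */
struct endpoint {
	struct ring *ring;
	struct ring *new_ring;
};

/* Promote the staged ring to live and clear the staging pointer. */
void endpoint_commit_ring(struct endpoint *ep)
{
	ep->ring = ep->new_ring;
	ep->new_ring = NULL;
}

/* Reader that was given the address of a ring pointer at setup time.
 * Returns the TRB count, or -1 if the slot it was given now holds NULL. */
int ring_show(struct ring *const *slot)
{
	const struct ring *r = *slot;

	if (!r)
		return -1;
	return r->num_trbs;
}
```

Registering &ep.new_ring with the reader and then calling
endpoint_commit_ring() leaves the reader looking at a NULL slot; registering
&ep.ring does not.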


linux-next: Tree for Jan 19

2018-01-18 Thread Stephen Rothwell
Hi all,

News: there will probably be very few, if any, releases next week as LCA
is on (unfortunate clash with the merge window).

Changes since 20180118:

The powerpc tree gained a build failure due to an interaction with Linus'
tree, so I applied a merge fix patch.  It gained another for which I
applied a supplied fix patch.

The f2fs tree gained a build failure due to an interaction with the
btrfs tree for which I reverted a commit.

The net-next tree gained a conflict against the net tree.

Non-merge commits (relative to Linus' tree): 9833
 9793 files changed, 406830 insertions(+), 263432 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 256 trees (counting Linus' and 44 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (dda3e15231b3 Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm)
Merging fixes/master (820bf5c419e4 Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi)
Merging kbuild-current/fixes (36c1681678b5 genksyms: drop *.hash.c from .gitignore)
Merging arc-current/for-curr (8ff3afc159f2 ARC: Enable fatal signals on boot for dev platforms)
Merging arm-current/fixes (091f02483df7 ARM: net: bpf: clarify tail_call index)
Merging m68k-current/for-linus (5e387199c17c m68k/defconfig: Update defconfigs for v4.14-rc7)
Merging metag-fixes/fixes (b884a190afce metag/usercopy: Add missing fixups)
Merging powerpc-fixes/fixes (1b689a95ce74 powerpc/pseries: include linux/types.h in asm/hvcall.h)
Merging sparc/master (59585b4be9ae sparc64: repair calling incorrect hweight function from stubs)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (b200bfd6112a fm10k: mark PM functions as __maybe_unused)
Merging bpf/master (7155f8f39157 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf)
Merging ipsec/master (ad9294dbc227 bpf: fix cls_bpf on filter replace)
Merging netfilter/master (889c604fd0b5 netfilter: x_tables: fix int overflow in xt_alloc_table_info())
Merging ipvs/master (f7fb77fc1235 netfilter: nft_compat: check extension hook mask only if set)
Merging wireless-drivers/master (cc124d5cc8d8 brcmfmac: fix CLM load error for legacy chips when user helper is enabled)
Merging mac80211/master (59b179b48ce2 cfg80211: check dev_set_name() return value)
Merging rdma-fixes/for-rc (ae59c3f0b6cf RDMA/mlx5: Fix out-of-bound access while querying AH)
Merging sound-current/for-linus (b3defb791b26 ALSA: seq: Make ioctls race-free)
Merging pci-current/for-linus (d6c1efecd1e1 x86/PCI: Enable AMD 64-bit window on resume)
Merging driver-core.current/driver-core-linus (30a7acd57389 Linux 4.15-rc6)
Merging tty.current/tty-linus (30a7acd57389 Linux 4.15-rc6)
Merging usb.current/usb-linus (a8750ddca918 Linux 4.15-rc8)
Merging usb-gadget-fixes/fixes (b2cd1df66037 Linux 4.15-rc7)
Merging usb-serial-fixes/usb-linus (d14ac576d10f USB: serial: cp210x: add new device ID ELV ALC 8xxx)
Merging usb-chipidea-fixes/ci-for-usb-stable (964728f9f407 USB: chipidea: msm: fix ulpi-node lookup)
Merging phy/fixes (2b88212c4cc6 phy: rcar-gen3-usb2: select USB_COMMON)
Merging staging.current/staging-linus (a8750ddca918 Linux 4.15-rc8)
Merging char-misc.current/char-misc-linus (a8750ddca918 Linux 4

Re: [PATCH V5 2/2] nvme-pci: fixup the timeout case when reset is ongoing

2018-01-18 Thread jianchao.wang
Hi Keith

Thanks for your kind reminder.

On 01/19/2018 02:05 PM, Keith Busch wrote:
>>> The driver may be giving up on the command here, but that doesn't mean
>>> the controller has. We can't just end the request like this because that
>>> will release the memory the controller still owns. We must wait until
>>> after nvme_dev_disable clears bus master because we can't say for sure
>>> the controller isn't going to write to that address right after we end
>>> the request.
>>>
>> Yes, but the controller is going to be reset or shut down at that moment;
>> even if the controller accesses a bad address and goes wrong, everything will
>> be OK after the reset or shutdown. :)
> Hm, I don't follow. DMA access after free is never okay.
Yes, this may cause unexpected memory corruption.

Thanks
Jianchao


[git pull] drm fixes for 4.15 final

2018-01-18 Thread Dave Airlie
Hi Linus,

This is a set of drm regression fixes that I'd like to get into 4.15
final, but I understand if it's too much too late, and am happy to
drop these into -next and make people chase the stable monkey.

The i915 change fixes a display corruption problem introduced in 4.15,
the nouveau changes are for regressions in 4.15, one of the vmwgfx
fixes goes back a little further, the other is a 4.15 regression fix,
the 3 sun4i changes fix blank HDMI output on those devices.

Again happy if you don't take these, just let me know, I suspect 4.15
will have a lot of stable backports for security things over time!

Thanks,
Dave.


The following changes since commit a8750ddca918032d6349adbf9a4b6555e7db20da:

  Linux 4.15-rc8 (2018-01-14 15:32:30 -0800)

are available in the git repository at:

  git://people.freedesktop.org/~airlied/linux tags/drm-fixes-for-v4.15-rc9

for you to fetch changes up to 04cef3eadcf0bf9783a985286cc5f48c5d33fd7a:

  Merge tag 'drm-intel-fixes-2018-01-18' of
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes (2018-01-19
12:40:07 +1000)


nouveau, i915, vmwgfx and sun4i regression fixes


Ben Skeggs (1):
  drm/nouveau/mmu/mcp77: fix regressions in stolen memory handling

Dave Airlie (4):
  Merge branch 'vmwgfx-fixes-4.15' of
git://people.freedesktop.org/~thomash/linux into drm-fixes
  Merge tag 'drm-misc-fixes-2018-01-17' of
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
  Merge branch 'linux-4.15' of git://github.com/skeggsb/linux into drm-fixes
  Merge tag 'drm-intel-fixes-2018-01-18' of
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

Jon Hunter (1):
  drm/nouveau/bar/gk20a: Avoid bar teardown during init

Jonathan Liu (3):
  drm/sun4i: hdmi: Check for unset best_parent in sun4i_tmds_determine_rate
  drm/sun4i: hdmi: Fix incorrect assignment in sun4i_tmds_determine_rate
  drm/sun4i: hdmi: Add missing rate halving check in
sun4i_tmds_determine_rate

Rob Clark (1):
  drm/vmwgfx: fix memory corruption with legacy/sou connectors

Thierry Reding (1):
  drm/nouveau/drm/nouveau: Pass the proper arguments to
nvif_object_map_handle()

Ville Syrjälä (3):
  drm/i915: Add .get_hw_state() method for planes
  drm/i915: Redo plane sanitation during readout
  drm/i915: Fix deadlock in i830_disable_pipe()

Woody Suwalski (1):
  drm/vmwgfx: Fix a boot time warning

 drivers/gpu/drm/i915/intel_display.c   | 303 +++--
 drivers/gpu/drm/i915/intel_drv.h   |   2 +
 drivers/gpu/drm/i915/intel_sprite.c|  83 ++
 drivers/gpu/drm/nouveau/include/nvkm/subdev/mmu.h  |   1 +
 drivers/gpu/drm/nouveau/nouveau_bo.c   |   4 +-
 drivers/gpu/drm/nouveau/nvkm/engine/device/base.c  |   4 +-
 drivers/gpu/drm/nouveau/nvkm/subdev/bar/base.c |   3 +-
 drivers/gpu/drm/nouveau/nvkm/subdev/bar/gk20a.c|   1 -
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/Kbuild |   2 +
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/mcp77.c|  41 +++
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h  |  10 +
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmmcp77.c |  45 +++
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmnv50.c  |  16 +-
 drivers/gpu/drm/sun4i/sun4i_hdmi_tmds_clk.c|   9 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c|   2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c|   4 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c   |   4 +-
 17 files changed, 367 insertions(+), 167 deletions(-)
 create mode 100644 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/mcp77.c
 create mode 100644 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmmcp77.c


Re: [RESEND PATCH 3/3] x86/apic: Clean up the names of legacy irq mode setting related functions

2018-01-18 Thread Dou Liyang

Hi Baoquan,

At 01/05/2018 12:39 PM, Baoquan He wrote:
[...]

  /*
- * Not an __init, needed by kexec/kdump code.
- * For safety IO-APIC and Local APIC need be cleared before this.
+ * In legacy irq mode, full DOS compatibility with the uniprocessor PC/AT is
+ * provided by using the APICs in conjunction with standard 8259A-equivalent
+ * programmable interrupt controllers (PICs). It's necessary to deliver legacy
+ * interrupts even when APIC mode is not enabled. This is required by kexec/
+ * kdump before enter into the 2nd kernel.
   */
  void switch_to_legacy_irq_mode(void)
  {
if (!nr_legacy_irqs())
return;
  
-	x86_io_apic_ops.disable();

+   ioapic_set_virtual_wire_mode();
+
+   if (boot_cpu_has(X86_FEATURE_APIC) || apic_from_smp_config())
+   lapic_set_legacy_irq_mode(ioapic_i8259.pin != -1);


It seems these two functions, ioapic/lapic_set_legacy_irq_mode, should be
mutually exclusive.

But we call both because both the through-LAPIC and through-IO-APIC virtual
wire modes need APIC_SPIV_APIC_ENABLED set up, which is only done in
lapic_set_legacy_irq_mode(). So we need to call them both.

IMO, this cleanup may not make things clearer. We could separate these two
modes completely, or just keep it as before.


Thanks,
dou.

  }
  
  #ifdef CONFIG_X86_32

diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index 1151ccd72ce9..c30f0f273dbd 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -148,5 +148,5 @@ void arch_restore_msi_irqs(struct pci_dev *dev)
  
  struct x86_io_apic_ops x86_io_apic_ops __ro_after_init = {

.read   = native_io_apic_read,
-   .disable= native_disable_io_apic,
+   .disable= switch_to_legacy_irq_mode,
  };
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index 49721b4e1975..751472ddf536 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -37,7 +37,7 @@ static void irq_remapping_disable_io_apic(void)
 * now.
 */
if (boot_cpu_has(X86_FEATURE_APIC) || apic_from_smp_config())
-   disconnect_bsp_APIC(0);
+   lapic_set_legacy_irq_mode(0);
  }
  
  static void __init irq_remapping_modify_x86_ops(void)







Re: [RESEND] phy: sun4i-usb: add support for R40 USB PHY

2018-01-18 Thread Icenowy Zheng


On January 19, 2018, 2:25:09 PM GMT+08:00, Chen-Yu Tsai wrote:
>Hi Kishon,
>
>On Mon, Jan 15, 2018 at 11:06 PM, Hermann Lauer
> wrote:
>> On Wed, Jan 03, 2018 at 04:49:44PM +0800, Icenowy Zheng wrote:
>>> Allwinner R40 features a USB PHY like the one in A64, but with 3
>PHYs.
>>>
>>> Add support for it.
>>>
>>> Signed-off-by: Icenowy Zheng 
>>> Acked-by: Maxime Ripard 
>>> Acked-by: Rob Herring 
>>
>> You may add
>>
>> Tested-by: hermann.la...@iwr.uni-heidelberg.de
>
>Gentle ping for this patch to be included in 4.16

I think maybe I forgot PATCH in the title, so it didn't enter patchwork?

>
>ChenYu
>
>___
>linux-arm-kernel mailing list
>linux-arm-ker...@lists.infradead.org
>http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


Re: [PATCH 6/6] s390: scrub registers on kernel entry and KVM exit

2018-01-18 Thread QingFeng Hao



On 2018/1/17 17:48, Martin Schwidefsky wrote:

Clear all user space registers on entry to the kernel and all KVM guest
registers on KVM guest exit if the register does not contain either a
parameter or a result value.

I am not sure I understand this, but will it be safer?
And can we abstract these operations into a macro like CLEAR_REG_7?
Thanks


Suggested-by: Christian Borntraeger 
Reviewed-by: Christian Borntraeger 
Signed-off-by: Martin Schwidefsky 
---
  arch/s390/kernel/entry.S | 41 +
  1 file changed, 41 insertions(+)

diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
index 2a22c03..47227d3 100644
--- a/arch/s390/kernel/entry.S
+++ b/arch/s390/kernel/entry.S
@@ -322,6 +322,12 @@ ENTRY(sie64a)
  sie_exit:
lg  %r14,__SF_EMPTY+8(%r15) # load guest register save area
stmg    %r0,%r13,0(%r14)        # save guest gprs 0-13
+   xgr %r0,%r0 # clear guest registers
+   xgr %r1,%r1
+   xgr %r2,%r2
+   xgr %r3,%r3
+   xgr %r4,%r4
+   xgr %r5,%r5
lmg %r6,%r14,__SF_GPRS(%r15)    # restore kernel registers
lg  %r2,__SF_EMPTY+16(%r15) # return exit reason code
br  %r14
@@ -358,6 +364,7 @@ ENTRY(system_call)
UPDATE_VTIME %r8,%r9,__LC_SYNC_ENTER_TIMER
BPENTER __TI_flags(%r12),_TIF_NOBP
stmg    %r0,%r7,__PT_R0(%r11)
+   xgr %r0,%r0
mvc __PT_R8(64,%r11),__LC_SAVE_AREA_SYNC
mvc __PT_PSW(16,%r11),__LC_SVC_OLD_PSW
mvc __PT_INT_CODE(4,%r11),__LC_SVC_ILC
@@ -640,6 +647,14 @@ ENTRY(pgm_check_handler)
  4:lgr %r13,%r11
la  %r11,STACK_FRAME_OVERHEAD(%r15)
stmg    %r0,%r7,__PT_R0(%r11)
+   xgr %r0,%r0 # clear user space registers
+   xgr %r1,%r1
+   xgr %r2,%r2
+   xgr %r3,%r3
+   xgr %r4,%r4
+   xgr %r5,%r5
+   xgr %r6,%r6
+   xgr %r7,%r7
mvc __PT_R8(64,%r11),__LC_SAVE_AREA_SYNC
stmg    %r8,%r9,__PT_PSW(%r11)
mvc __PT_INT_CODE(4,%r11),__LC_PGM_ILC
@@ -706,6 +721,15 @@ ENTRY(io_int_handler)
lmg %r8,%r9,__LC_IO_OLD_PSW
SWITCH_ASYNC __LC_SAVE_AREA_ASYNC,__LC_ASYNC_ENTER_TIMER
stmg    %r0,%r7,__PT_R0(%r11)
+   xgr %r0,%r0 # clear user space registers
+   xgr %r1,%r1
+   xgr %r2,%r2
+   xgr %r3,%r3
+   xgr %r4,%r4
+   xgr %r5,%r5
+   xgr %r6,%r6
+   xgr %r7,%r7
+   xgr %r10,%r10
mvc __PT_R8(64,%r11),__LC_SAVE_AREA_ASYNC
stmg    %r8,%r9,__PT_PSW(%r11)
mvc __PT_INT_CODE(12,%r11),__LC_SUBCHANNEL_ID
@@ -924,6 +948,15 @@ ENTRY(ext_int_handler)
lmg %r8,%r9,__LC_EXT_OLD_PSW
SWITCH_ASYNC __LC_SAVE_AREA_ASYNC,__LC_ASYNC_ENTER_TIMER
stmg    %r0,%r7,__PT_R0(%r11)
+   xgr %r0,%r0 # clear user space registers
+   xgr %r1,%r1
+   xgr %r2,%r2
+   xgr %r3,%r3
+   xgr %r4,%r4
+   xgr %r5,%r5
+   xgr %r6,%r6
+   xgr %r7,%r7
+   xgr %r10,%r10
mvc __PT_R8(64,%r11),__LC_SAVE_AREA_ASYNC
stmg    %r8,%r9,__PT_PSW(%r11)
lghi    %r1,__LC_EXT_PARAMS2
@@ -1133,6 +1166,14 @@ ENTRY(mcck_int_handler)
  .Lmcck_skip:
lghi    %r14,__LC_GPREGS_SAVE_AREA+64
stmg    %r0,%r7,__PT_R0(%r11)
+   xgr %r0,%r0 # clear user space registers
+   xgr %r2,%r2
+   xgr %r3,%r3
+   xgr %r4,%r4
+   xgr %r5,%r5
+   xgr %r6,%r6
+   xgr %r7,%r7
+   xgr %r10,%r10
mvc __PT_R8(64,%r11),0(%r14)
stmg    %r8,%r9,__PT_PSW(%r11)
xc  __PT_FLAGS(8,%r11),__PT_FLAGS(%r11)


--
Regards
QingFeng Hao



Re: [RESEND] phy: sun4i-usb: add support for R40 USB PHY

2018-01-18 Thread Chen-Yu Tsai
Hi Kishon,

On Mon, Jan 15, 2018 at 11:06 PM, Hermann Lauer
 wrote:
> On Wed, Jan 03, 2018 at 04:49:44PM +0800, Icenowy Zheng wrote:
>> Allwinner R40 features a USB PHY like the one in A64, but with 3 PHYs.
>>
>> Add support for it.
>>
>> Signed-off-by: Icenowy Zheng 
>> Acked-by: Maxime Ripard 
>> Acked-by: Rob Herring 
>
> You may add
>
> Tested-by: hermann.la...@iwr.uni-heidelberg.de

Gentle ping for this patch to be included in 4.16

ChenYu


Re: [RESEND PATCH 2/3] x86/apic/kexec: Enable legacy irq mode before jump to kexec/kdump kernel

2018-01-18 Thread Dou Liyang

Hi Baoquan,

At 01/17/2018 06:08 PM, Baoquan He wrote:

On 01/17/18 at 05:47pm, Dou Liyang wrote:

Hi Baoquan,

At 01/05/2018 12:38 PM, Baoquan He wrote:

In commit 522e66464467 ("x86/apic: Disable I/O APIC before shutdown of the
local APIC"), the lapic_shutdown() invocation was moved after
disable_IO_APIC(). In fact, disable_IO_APIC() not only calls clear_IO_APIC()
to disable the IO-APIC, it also sets up the LAPIC and IO-APIC to put the
system into PIC or Virtual Wire mode. Because the above commit calls
disable_IO_APIC() earlier, the local APIC is then completely disabled by
lapic_shutdown(). So the legacy irq mode is disabled too before jumping to
the kexec/kdump kernel.


I have a question:

As you said, because disable_IO_APIC() is triggered before
lapic_shutdown(), the interrupt virtual wire mode will be disabled.

But I found that:

After machine_crash_shutdown() is executed, Linux will call
machine_kexec(), and in machine_kexec(), disable_IO_APIC() will be
called again. Why can't it switch to virtual wire mode successfully? Or
is my understanding wrong?

The disable_IO_APIC() call there is guarded by a condition check:

if (image->preserve_context) {
disable_IO_APIC();
}

The preserve_context case comes from kernel_kexec(). You can check it in
the kexec man page; it is another scenario that kexec is used for, but not
the normal kexec and kdump paths.



Understood!

This patch looks good to me and I also tested it, it's OK.

Thanks,
dou.


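+---------------+
| __crash_kexec |
+---------------+
    |
    |    +------------------------+
    +--> | machine_crash_shutdown |
    |    +----+-------------------+
    |         |
    |         |  +-----------------+
    |         +> | disable_IO_APIC |
    |         |  +-----------------+
    |         |
    |         |  +----------------+
    |         +> | lapic_shutdown |
    |            +----------------+
    |
    |    +---------------+
    +--> | machine_kexec |
    |    +----+----------+
    |         |
    |         |  +-----------------+
    |         +> | disable_IO_APIC |
    |            +-----------------+
    |
    v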

Thanks,
dou.

In a normal kernel, the system defaults to PIC mode or Virtual Wire mode
during initialization, before APIC mode is enabled; this is set up by the
BIOS. But the kexec/kdump kernel won't go through the BIOS, so we should put
the system into PIC or Virtual Wire mode before jumping directly to the
kdump kernel code.

So let's take clear_IO_APIC() out of disable_IO_APIC() and rename
disable_IO_APIC() to switch_to_legacy_irq_mode(). Then only call
clear_IO_APIC() when the IO-APIC needs to be disabled, and call
switch_to_legacy_irq_mode() before the kexec/kdump jump.

Signed-off-by: Baoquan He 
---
   arch/x86/include/asm/io_apic.h |  3 ++-
   arch/x86/kernel/apic/io_apic.c | 12 
   arch/x86/kernel/crash.c|  2 +-
   arch/x86/kernel/machine_kexec_32.c | 15 +--
   arch/x86/kernel/machine_kexec_64.c | 15 +--
   arch/x86/kernel/reboot.c   |  2 +-
   6 files changed, 18 insertions(+), 31 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index a8834dd546cd..e38ad3863a2c 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -192,7 +192,8 @@ static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
   extern void setup_IO_APIC(void);
   extern void enable_IO_APIC(void);
-extern void disable_IO_APIC(void);
+extern void clear_IO_APIC (void);
+extern void switch_to_legacy_irq_mode(void);
   extern int IO_APIC_get_PCI_irq_vector(int bus, int devfn, int pin);
   extern void print_IO_APICs(void);
   #else  /* !CONFIG_X86_IO_APIC */
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 8a7963421460..a47aa915d18c 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -587,7 +587,7 @@ static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin)
   mpc_ioapic_id(apic), pin);
   }
-static void clear_IO_APIC (void)
+void clear_IO_APIC (void)
   {
int apic, pin;
@@ -1439,15 +1439,11 @@ void native_disable_io_apic(void)
   }
   /*
- * Not an __init, needed by the reboot code
+ * Not an __init, needed by kexec/kdump code.
+ * For safety IO-APIC and Local APIC need be cleared before this.
*/
-void disable_IO_APIC(void)
+void switch_to_legacy_irq_mode(void)
   {
-   /*
-* Clear the IO-APIC before rebooting:
-*/
-   clear_IO_APIC();
-
if (!nr_legacy_irqs())
return;
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 10e74d4778a1..318ffeaaf55a 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -199,7 +199,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
   #ifdef CONFIG_X86_IO_APIC
/* Prevent crash_kexec() from deadlocking on ioapic_lock. */
ioapic_zap_locks();
-   disable_IO_APIC();
+   clear_IO_APIC();
   #endif
lapic_shutdown();
   #ifdef CONFIG_HPET_TIMER
diff --git a/arch/x86/kernel/machine_kexec_32.c 
b/arch/x86/kernel/machine_kexec_32.c

Re: [PATCH v22 2/3] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_VQ

2018-01-18 Thread Wei Wang

On 01/18/2018 12:44 AM, Michael S. Tsirkin wrote:

On Wed, Jan 17, 2018 at 01:10:11PM +0800, Wei Wang wrote:
  
+static void virtballoon_changed(struct virtio_device *vdev)

+{
+   struct virtio_balloon *vb = vdev->priv;
+   unsigned long flags;
+   __u32 cmd_id;
+   s64 diff = towards_target(vb);
+
+   if (diff) {
+   spin_lock_irqsave(&vb->stop_update_lock, flags);
+   if (!vb->stop_update)

Why do you ignore stop_update for freeze?
This means new wq entries can be added during remove
causing use after free issues.


I think stop_update isn't needed, because the locking is already handled
internally by the APIs. For a similar example, see mem_cgroup_css_free() in
"mm/memcontrol.c": no such lock is used around
cancel_work_sync(&memcg->high_work).


Best,
Wei
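
To make the concern raised in the review above concrete, here is a small
userspace model of a "stop flag checked under a lock" pattern. It is a sketch
under assumptions, not the virtio-balloon driver: struct balloon_model and the
model_* functions are hypothetical, a C11 atomic_flag spinlock stands in for
stop_update_lock, and "queued" merely counts work items that would have been
queued. The point it illustrates is that once the remove path has set the flag
under the lock, the config-changed path can no longer add new work.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct balloon_model {
	atomic_flag lock;	/* stands in for stop_update_lock */
	bool stop_update;	/* set once teardown has started */
	int queued;		/* number of work items "queued" so far */
};

static void model_lock(struct balloon_model *b)
{
	/* Spin until the flag was previously clear, i.e. we own the lock. */
	while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
		;
}

static void model_unlock(struct balloon_model *b)
{
	atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/* Config-changed path: only queue work while updates are still allowed.
 * Returns true if a work item was queued. */
bool model_config_changed(struct balloon_model *b)
{
	bool did_queue = false;

	model_lock(b);
	if (!b->stop_update) {
		b->queued++;
		did_queue = true;
	}
	model_unlock(b);
	return did_queue;
}

/* Remove path: forbid further queueing first. A real driver would cancel
 * outstanding work only after this point. */
void model_remove(struct balloon_model *b)
{
	model_lock(b);
	b->stop_update = true;
	model_unlock(b);
}
```

Without the flag check, a config change arriving after model_remove() could
still queue work against an object that is about to be freed, which is the
use-after-free scenario mentioned in the review.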


Re: [PATCH v5 20/44] dt-bindings: clock: Add bindings for TI DA8XX USB PHY clocks

2018-01-18 Thread Sekhar Nori
On Friday 19 January 2018 12:30 AM, David Lechner wrote:
> On 01/18/2018 06:10 AM, Sekhar Nori wrote:
>> On Monday 08 January 2018 07:47 AM, David Lechner wrote:
>>> This adds a new binding for TI DA8XX USB PHY clocks. These clocks are
>>> part
>>> of a syscon register called CFGCHIP3.
>>
>> CFGCHIP2
>>
>>>
>>> Signed-off-by: David Lechner 
>>
>>> +Examples:
>>> +
>>> +    cfgchip: syscon@1417c {
>>> +    compatible = "ti,da830-cfgchip", "syscon", "simple-mfd";
>>> +    reg = <0x1417c 0x14>;
>>> +
>>> +    usb0_phy_clk: usb0-phy-clock {
>>> +    compatible = "ti,da830-usb0-phy-clock";
>>> +    #clock-cells = <0>;
>>> +    clocks = <&usb_refclkin>, <&pll0_aux_clk>, <&psc1 1>;
>>> +    clock-names = "usb_refclkin", "auxclk", "usb0_lpsc";
>>> +    clock-output-names = "usb0_phy_clk";
>>
>> Probably call this "usb0_phy" to match with the input name used for
>> usb1_phy_clk?
> 
> I was planning on just dropping clock-output-names altogether actually
> since they don't really do anything useful.
> 
> Also, I was considering sending a series to change the con_id for the
> PHY clocks.
> 
> My current revision of the device tree bindings is looking like this:
> 
> usb_phy: usb-phy {
>     compatible = "ti,da830-usb-phy";
>     #phy-cells = <1>;
>     clocks = <&usb_phy_clk 0>, <&usb_phy_clk 1>;
>     clock-names = "usb20_phy", "usb11_phy";
>     status = "disabled";
> };
> usb_phy_clk: usb-phy-clocks {
>     compatible = "ti,da830-usb-phy-clocks";
>     #clock-cells = <1>;
>     clocks = <&psc1 1>, <&usb_refclkin>, <&pll0_auxclk>;
>     clock-names = "fck", "usb_refclkin", "auxclk";
> };
> 
> The clock-names = "usb20_phy", "usb11_phy" comes from the existing con_ids
> in the PHY driver's clk_get()s.
> 
> However, in device tree, we are usually referring to the USB devices as
> usb0 and usb1 instead of usb20 and usb11, respectively. Figure 6-2 "USB
> Clocking Diagram" in spruh82c.pdf (AM1808 TRM) calls these clocks "CLK48"
> and "CLK48MHz from USB 2.0 PHY", so I was thinking of changing the con_ids
> (and therefore also clock-names) to "usb0_clk48" and "usb1_clk48".

This is fine with me.

Thanks,
Sekhar


Re: [PATCH v5 43/44] ARM: da8xx-dt: switch to device tree clocks

2018-01-18 Thread Sekhar Nori
On Friday 19 January 2018 12:10 AM, David Lechner wrote:
> On 01/18/2018 09:27 AM, Sekhar Nori wrote:
>> On Monday 08 January 2018 07:55 AM, David Lechner wrote:
>>> This removes all of the clock init code from da8xx-dt.c. This includes
>>> all of the OF_DEV_AUXDATA that was just used for looking up clocks.
>>>
>>> Note: You need to have clocks defined in your device tree or your system
>>> won't boot after this patch.
>>
>> I am not sure we can do this then, as we cannot break DT compatibility.
>>
> 
> In the past, you have told me that you don't want the .dts changes and code
> changes in the same patch. In this case, if you apply either one

That's still true.

> separately,
> it will break clocks. It does not matter which one is first.
> 
> So either we have to squash [PATCH v5 44/44] ARM: dts: da850: Add clocks
> into this patch or deal with the breakage.

I am not so much concerned about temporary breakage in the middle of the
series, but more about DT compatibility after the entire series is applied.

Thanks,
Sekhar


[PATCH V2 net-next 1/4] net: hns3: add support for get_regs

2018-01-18 Thread Peng Li
From: Fuyun Liang 

This patch adds get_regs support for ethtool cmd.

Signed-off-by: Fuyun Liang 
Signed-off-by: Peng Li 
---
 drivers/net/ethernet/hisilicon/hns3/hnae3.h|   3 +-
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c |  23 +++
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h |   4 +
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 176 +
 4 files changed, 205 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h 
b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
index 634e932..d104ce5 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
@@ -356,7 +356,8 @@ struct hnae3_ae_ops {
u32 stringset, u8 *data);
int (*get_sset_count)(struct hnae3_handle *handle, int stringset);
 
-   void (*get_regs)(struct hnae3_handle *handle, void *data);
+   void (*get_regs)(struct hnae3_handle *handle, u32 *version,
+void *data);
int (*get_regs_len)(struct hnae3_handle *handle);
 
u32 (*get_rss_key_size)(struct hnae3_handle *handle);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index 358f780..1c8b293 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -1063,6 +1063,27 @@ static int hns3_set_coalesce(struct net_device *netdev,
return 0;
 }
 
+static int hns3_get_regs_len(struct net_device *netdev)
+{
+   struct hnae3_handle *h = hns3_get_handle(netdev);
+
+   if (!h->ae_algo->ops->get_regs_len)
+   return -EOPNOTSUPP;
+
+   return h->ae_algo->ops->get_regs_len(h);
+}
+
+static void hns3_get_regs(struct net_device *netdev,
+ struct ethtool_regs *cmd, void *data)
+{
+   struct hnae3_handle *h = hns3_get_handle(netdev);
+
+   if (!h->ae_algo->ops->get_regs)
+   return;
+
+   h->ae_algo->ops->get_regs(h, &cmd->version, data);
+}
+
 static const struct ethtool_ops hns3vf_ethtool_ops = {
.get_drvinfo = hns3_get_drvinfo,
.get_ringparam = hns3_get_ringparam,
@@ -1103,6 +1124,8 @@ static const struct ethtool_ops hns3_ethtool_ops = {
.set_channels = hns3_set_channels,
.get_coalesce = hns3_get_coalesce,
.set_coalesce = hns3_set_coalesce,
+   .get_regs_len = hns3_get_regs_len,
+   .get_regs = hns3_get_regs,
 };
 
 void hns3_ethtool_set_ops(struct net_device *netdev)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
index 3c3159b..2561e7a 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
@@ -102,6 +102,10 @@ enum hclge_opcode_type {
HCLGE_OPC_STATS_64_BIT  = 0x0030,
HCLGE_OPC_STATS_32_BIT  = 0x0031,
HCLGE_OPC_STATS_MAC = 0x0032,
+
+   HCLGE_OPC_QUERY_REG_NUM = 0x0040,
+   HCLGE_OPC_QUERY_32_BIT_REG  = 0x0041,
+   HCLGE_OPC_QUERY_64_BIT_REG  = 0x0042,
/* Device management command */
 
/* MAC commond */
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 27f0ab6..c3d2cca 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -5544,6 +5544,180 @@ static int hclge_set_channels(struct hnae3_handle 
*handle, u32 new_tqps_num)
return ret;
 }
 
+static int hclge_get_regs_num(struct hclge_dev *hdev, u32 *regs_num_32_bit,
+ u32 *regs_num_64_bit)
+{
+   struct hclge_desc desc;
+   u32 total_num;
+   int ret;
+
+   hclge_cmd_setup_basic_desc(&desc, HCLGE_OPC_QUERY_REG_NUM, true);
+   ret = hclge_cmd_send(&hdev->hw, &desc, 1);
+   if (ret) {
+   dev_err(&hdev->pdev->dev,
+   "Query register number cmd failed, ret = %d.\n", ret);
+   return ret;
+   }
+
+   *regs_num_32_bit = le32_to_cpu(desc.data[0]);
+   *regs_num_64_bit = le32_to_cpu(desc.data[1]);
+
+   total_num = *regs_num_32_bit + *regs_num_64_bit;
+   if (!total_num)
+   return -EINVAL;
+
+   return 0;
+}
+
+static int hclge_get_32_bit_regs(struct hclge_dev *hdev, u32 regs_num,
+void *data)
+{
+#define HCLGE_32_BIT_REG_RTN_DATANUM 8
+
+   struct hclge_desc *desc;
+   u32 *reg_val = data;
+   __le32 *desc_data;
+   int cmd_num;
+   int i, k, n;
+   int ret;
+
+   if (regs_num == 0)
+   return 0;
+
+   cmd_num = DIV_ROUND_UP(regs_num + 2, HCLGE_32_BIT_REG_RTN_DATANUM);
+   desc = kcalloc(cmd_num, sizeof(struct hclge_desc), GFP_KERNEL);
+   if (!desc)
+   return -ENOMEM;
+
+   hc
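
[Editor's sketch, not part of the patch] The descriptor count computed in hclge_get_32_bit_regs() above can be checked in isolation. This is a minimal user-space sketch, assuming only what the excerpt shows: 8 values per descriptor and a "+ 2" added to the register count. The excerpt is cut off before it shows why two extra slots are reserved, so that part is copied rather than explained. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Same rounding as the kernel's DIV_ROUND_UP() macro. */
#define TOY_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Values returned per descriptor (HCLGE_32_BIT_REG_RTN_DATANUM in the patch). */
#define TOY_REGS_PER_DESC 8u

/*
 * Number of command descriptors needed to fetch regs_num 32-bit registers.
 * The "+ 2" mirrors the patch; its purpose is not visible in the excerpt.
 */
static uint32_t toy_desc_count_32bit(uint32_t regs_num)
{
	if (regs_num == 0)
		return 0; /* the patch returns early and sends no command */

	/* Guard against wrap-around in the addition below. */
	assert(regs_num <= UINT32_MAX - 2u - (TOY_REGS_PER_DESC - 1u));

	return TOY_DIV_ROUND_UP(regs_num + 2u, TOY_REGS_PER_DESC);
}
```

With these assumptions, 6 registers still fit in one descriptor while 7 need two, which is where the "+ 2" becomes visible.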

[PATCH V2 net-next 2/4] net: hns3: add manager table initialization for hardware

2018-01-18 Thread Peng Li
From: Fuyun Liang 

The manager table is empty by default. If it is not initialized,
management packets such as LLDP will be dropped by hardware. Default
entries need to be added to the manager table.

Signed-off-by: Fuyun Liang 
Signed-off-by: Peng Li 
---
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h |  22 +
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 101 +
 2 files changed, 123 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
index 2561e7a..1cd28e0 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
@@ -605,6 +605,28 @@ struct hclge_mac_vlan_mask_entry_cmd {
u8 rsv2[14];
 };
 
+#define HCLGE_MAC_MGR_MASK_VLAN_B  BIT(0)
+#define HCLGE_MAC_MGR_MASK_MAC_B   BIT(1)
+#define HCLGE_MAC_MGR_MASK_ETHERTYPE_B BIT(2)
+#define HCLGE_MAC_ETHERTYPE_LLDP   0x88cc
+
+struct hclge_mac_mgr_tbl_entry_cmd {
+   u8  flags;
+   u8  resp_code;
+   __le16  vlan_tag;
+   __le32  mac_addr_hi32;
+   __le16  mac_addr_lo16;
+   __le16  rsv1;
+   __le16  ethter_type;
+   __le16  egress_port;
+   __le16  egress_queue;
+   u8  sw_port_id_aware;
+   u8  rsv2;
+   u8  i_port_bitmap;
+   u8  i_port_direction;
+   u8  rsv3[2];
+};
+
 #define HCLGE_CFG_MTA_MAC_SEL_S0x0
 #define HCLGE_CFG_MTA_MAC_SEL_MGENMASK(1, 0)
 #define HCLGE_CFG_MTA_MAC_EN_B 0x7
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index c3d2cca..6e64bed 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -392,6 +392,16 @@ static const struct hclge_comm_stats_str 
g_mac_stats_string[] = {
HCLGE_MAC_STATS_FIELD_OFF(mac_rx_send_app_bad_pkt_num)}
 };
 
+static const struct hclge_mac_mgr_tbl_entry_cmd hclge_mgr_table[] = {
+   {
+   .flags = HCLGE_MAC_MGR_MASK_VLAN_B,
+   .ethter_type = cpu_to_le16(HCLGE_MAC_ETHERTYPE_LLDP),
+   .mac_addr_hi32 = cpu_to_le32(htonl(0x0180C200)),
+   .mac_addr_lo16 = cpu_to_le16(htons(0x000E)),
+   .i_port_bitmap = 0x1,
+   },
+};
+
 static int hclge_64_bit_update_stats(struct hclge_dev *hdev)
 {
 #define HCLGE_64_BIT_CMD_NUM 5
@@ -4249,6 +4259,91 @@ int hclge_rm_mc_addr_common(struct hclge_vport *vport,
return status;
 }
 
+static int hclge_get_mac_ethertype_cmd_status(struct hclge_dev *hdev,
+ u16 cmdq_resp, u8 resp_code)
+{
+#define HCLGE_ETHERTYPE_SUCCESS_ADD0
+#define HCLGE_ETHERTYPE_ALREADY_ADD1
+#define HCLGE_ETHERTYPE_MGR_TBL_OVERFLOW   2
+#define HCLGE_ETHERTYPE_KEY_CONFLICT   3
+
+   int return_status;
+
+   if (cmdq_resp) {
+   dev_err(&hdev->pdev->dev,
+   "cmdq execute failed for get_mac_ethertype_cmd_status, status=%d.\n",
+   cmdq_resp);
+   return -EIO;
+   }
+
+   switch (resp_code) {
+   case HCLGE_ETHERTYPE_SUCCESS_ADD:
+   case HCLGE_ETHERTYPE_ALREADY_ADD:
+   return_status = 0;
+   break;
+   case HCLGE_ETHERTYPE_MGR_TBL_OVERFLOW:
+   dev_err(&hdev->pdev->dev,
+   "add mac ethertype failed for manager table overflow.\n");
+   return_status = -EIO;
+   break;
+   case HCLGE_ETHERTYPE_KEY_CONFLICT:
+   dev_err(&hdev->pdev->dev,
+   "add mac ethertype failed for key conflict.\n");
+   return_status = -EIO;
+   break;
+   default:
+   dev_err(&hdev->pdev->dev,
+   "add mac ethertype failed for undefined, code=%d.\n",
+   resp_code);
+   return_status = -EIO;
+   }
+
+   return return_status;
+}
+
+static int hclge_add_mgr_tbl(struct hclge_dev *hdev,
+const struct hclge_mac_mgr_tbl_entry_cmd *req)
+{
+   struct hclge_desc desc;
+   u8 resp_code;
+   u16 retval;
+   int ret;
+
+   hclge_cmd_setup_basic_desc(&desc, HCLGE_OPC_MAC_ETHTYPE_ADD, false);
+   memcpy(desc.data, req, sizeof(struct hclge_mac_mgr_tbl_entry_cmd));
+
+   ret = hclge_cmd_send(&hdev->hw, &desc, 1);
+   if (ret) {
+   dev_err(&hdev->pdev->dev,
+   "add mac ethertype failed for cmd_send, ret =%d.\n",
+   ret);
+   return ret;
+   }
+
+   resp_code = (le32_to_cpu(desc.data[0]) >> 8) & 0xff;
+   retval = le16_to_cpu(desc.retval);
+
+   return hclge_get_mac_ethertype_cmd_status(hdev, retval, resp_code);
+}
+
+static int init_mgr
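
[Editor's sketch, not part of the patch] hclge_add_mgr_tbl() above pulls resp_code out of the first 32-bit descriptor word with a shift and mask. The user-space sketch below shows why that works, assuming the struct layout from the patch (flags in byte 0, resp_code in byte 1) and a little-endian descriptor. The toy_* helpers are hypothetical, and the status mapping only mirrors the switch in hclge_get_mac_ethertype_cmd_status().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Build the CPU-order value that le32_to_cpu() would yield from 4 LE bytes. */
static uint32_t toy_le32_from_bytes(const uint8_t b[4])
{
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* resp_code is byte 1 of the command, i.e. bits 15:8 of the first word. */
static uint8_t toy_mgr_resp_code(const uint8_t first_word[4])
{
	assert(first_word);
	return (uint8_t)((toy_le32_from_bytes(first_word) >> 8) & 0xff);
}

/* Map the response code to an errno-style result, as the patch's switch does. */
static int toy_mgr_status(uint8_t resp_code)
{
	switch (resp_code) {
	case 0: /* HCLGE_ETHERTYPE_SUCCESS_ADD */
	case 1: /* HCLGE_ETHERTYPE_ALREADY_ADD */
		return 0;
	default: /* table overflow, key conflict, or an undefined code */
		return -EIO;
	}
}
```

Note that "already added" is treated as success, so repeating the initialization is harmless in this model.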

[PATCH V2 net-next 3/4] net: hns3: add ethtool -p support for fiber port

2018-01-18 Thread Peng Li
From: Jian Shen 

Add LED locate support for fiber ports. The LED keeps blinking
while locating.

Signed-off-by: Jian Shen 
Signed-off-by: Peng Li 
---
 drivers/net/ethernet/hisilicon/hns3/hnae3.h|  2 +
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 12 
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h | 20 +++
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 70 ++
 4 files changed, 104 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h 
b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
index d104ce5..fd06bc7 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
@@ -405,6 +405,8 @@ struct hnae3_ae_ops {
int (*set_channels)(struct hnae3_handle *handle, u32 new_tqps_num);
void (*get_flowctrl_adv)(struct hnae3_handle *handle,
 u32 *flowctrl_adv);
+   int (*set_led_id)(struct hnae3_handle *handle,
+ enum ethtool_phys_id_state status);
 };
 
 struct hnae3_dcb_ops {
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index 1c8b293..7410205 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -1084,6 +1084,17 @@ static void hns3_get_regs(struct net_device *netdev,
h->ae_algo->ops->get_regs(h, &cmd->version, data);
 }
 
+static int hns3_set_phys_id(struct net_device *netdev,
+   enum ethtool_phys_id_state state)
+{
+   struct hnae3_handle *h = hns3_get_handle(netdev);
+
+   if (!h->ae_algo || !h->ae_algo->ops || !h->ae_algo->ops->set_led_id)
+   return -EOPNOTSUPP;
+
+   return h->ae_algo->ops->set_led_id(h, state);
+}
+
 static const struct ethtool_ops hns3vf_ethtool_ops = {
.get_drvinfo = hns3_get_drvinfo,
.get_ringparam = hns3_get_ringparam,
@@ -1126,6 +1137,7 @@ static const struct ethtool_ops hns3_ethtool_ops = {
.set_coalesce = hns3_set_coalesce,
.get_regs_len = hns3_get_regs_len,
.get_regs = hns3_get_regs,
+   .set_phys_id = hns3_set_phys_id,
 };
 
 void hns3_ethtool_set_ops(struct net_device *netdev)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
index 1cd28e0..122f862 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
@@ -227,6 +227,9 @@ enum hclge_opcode_type {
 
/* Mailbox cmd */
HCLGEVF_OPC_MBX_PF_TO_VF= 0x2000,
+
+   /* Led command */
+   HCLGE_OPC_LED_STATUS_CFG= 0xB000,
 };
 
 #define HCLGE_TQP_REG_OFFSET   0x8
@@ -807,6 +810,23 @@ struct hclge_reset_cmd {
 #define HCLGE_NIC_CMQ_DESC_NUM 1024
 #define HCLGE_NIC_CMQ_DESC_NUM_S   3
 
+#define HCLGE_LED_PORT_SPEED_STATE_S   0
+#define HCLGE_LED_PORT_SPEED_STATE_M   GENMASK(5, 0)
+#define HCLGE_LED_ACTIVITY_STATE_S 0
+#define HCLGE_LED_ACTIVITY_STATE_M GENMASK(1, 0)
+#define HCLGE_LED_LINK_STATE_S 0
+#define HCLGE_LED_LINK_STATE_M GENMASK(1, 0)
+#define HCLGE_LED_LOCATE_STATE_S   0
+#define HCLGE_LED_LOCATE_STATE_M   GENMASK(1, 0)
+
+struct hclge_set_led_state_cmd {
+   u8 port_speed_led_config;
+   u8 link_led_config;
+   u8 activity_led_config;
+   u8 locate_led_config;
+   u8 rsv[20];
+};
+
 int hclge_cmd_init(struct hclge_dev *hdev);
 static inline void hclge_write_reg(void __iomem *base, u32 reg, u32 value)
 {
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 6e64bed..12150f2 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -5819,6 +5819,75 @@ static void hclge_get_regs(struct hnae3_handle *handle, 
u32 *version,
"Get 64 bit register failed, ret = %d.\n", ret);
 }
 
+static int hclge_set_led_status_sfp(struct hclge_dev *hdev, u8 
speed_led_status,
+   u8 act_led_status, u8 link_led_status,
+   u8 locate_led_status)
+{
+   struct hclge_set_led_state_cmd *req;
+   struct hclge_desc desc;
+   int ret;
+
+   hclge_cmd_setup_basic_desc(&desc, HCLGE_OPC_LED_STATUS_CFG, false);
+
+   req = (struct hclge_set_led_state_cmd *)desc.data;
+   hnae_set_field(req->port_speed_led_config, HCLGE_LED_PORT_SPEED_STATE_M,
+  HCLGE_LED_PORT_SPEED_STATE_S, speed_led_status);
+   hnae_set_field(req->link_led_config, HCLGE_LED_ACTIVITY_STATE_M,
+  HCLGE_LED_ACTIVITY_STATE_S, act_led_status);
+   hnae_set_field(req->activity_led_config, HCLGE_LED_LINK_STATE_M,
+  HCLGE_LED_LINK_STATE_S, link_led_status);
+   hnae_set_field(req->lo
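
[Editor's sketch, not part of the patch] The LED command above is filled in with mask-and-shift field updates. The sketch below is a hypothetical stand-in for that kind of helper, written as a function so it can be tested in user space; it is not the driver's hnae_set_field() macro. The two masks are the values GENMASK(5, 0) and GENMASK(1, 0) from the patch expand to.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SPEED_STATE_M  0x3Fu /* GENMASK(5, 0) */
#define TOY_LOCATE_STATE_M 0x03u /* GENMASK(1, 0) */

/*
 * Clear the bits selected by mask in origin, then store val there.
 * Bits of val that fall outside the mask are silently dropped.
 */
static uint8_t toy_set_field(uint8_t origin, uint8_t mask,
			     unsigned int shift, uint8_t val)
{
	assert(shift < 8);

	origin &= (uint8_t)~mask;
	origin |= (uint8_t)(((unsigned int)val << shift) & mask);
	return origin;
}
```

In the patch the link and activity masks are both GENMASK(1, 0) at shift 0, so in this model it makes no difference to the resulting byte which of the two masks is passed.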

[PATCH V2 net-next 4/4] net: hns3: add net status led support for fiber port

2018-01-18 Thread Peng Li
From: Jian Shen 

Check the net status once per second, including port speed, total rx/tx
packets and link status, and update the LED status for fiber ports.

Signed-off-by: Jian Shen 
Signed-off-by: Peng Li 
---
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h |   1 +
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 109 +
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h|   3 +
 3 files changed, 113 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
index 122f862..3fd10a6 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
@@ -115,6 +115,7 @@ enum hclge_opcode_type {
HCLGE_OPC_QUERY_LINK_STATUS = 0x0307,
HCLGE_OPC_CONFIG_MAX_FRM_SIZE   = 0x0308,
HCLGE_OPC_CONFIG_SPEED_DUP  = 0x0309,
+   HCLGE_OPC_STATS_MAC_TRAFFIC = 0x0314,
/* MACSEC command */
 
/* PFC/Pause CMD*/
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 12150f2..32bc6f6 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -39,6 +39,7 @@ static int hclge_set_mta_filter_mode(struct hclge_dev *hdev,
 static int hclge_set_mtu(struct hnae3_handle *handle, int new_mtu);
 static int hclge_init_vlan_config(struct hclge_dev *hdev);
 static int hclge_reset_ae_dev(struct hnae3_ae_dev *ae_dev);
+static int hclge_update_led_status(struct hclge_dev *hdev);
 
 static struct hnae3_ae_algo ae_algo;
 
@@ -505,6 +506,38 @@ static int hclge_32_bit_update_stats(struct hclge_dev 
*hdev)
return 0;
 }
 
+static int hclge_mac_get_traffic_stats(struct hclge_dev *hdev)
+{
+   struct hclge_mac_stats *mac_stats = &hdev->hw_stats.mac_stats;
+   struct hclge_desc desc;
+   __le64 *desc_data;
+   int ret;
+
+   /* for fiber port, need to query the total rx/tx packets statstics,
+* used for data transferring checking.
+*/
+   if (hdev->hw.mac.media_type != HNAE3_MEDIA_TYPE_FIBER)
+   return 0;
+
+   if (test_bit(HCLGE_STATE_STATISTICS_UPDATING, &hdev->state))
+   return 0;
+
+   hclge_cmd_setup_basic_desc(&desc, HCLGE_OPC_STATS_MAC_TRAFFIC, true);
+   ret = hclge_cmd_send(&hdev->hw, &desc, 1);
+   if (ret) {
+   dev_err(&hdev->pdev->dev,
+   "Get MAC total pkt stats fail, ret = %d\n", ret);
+
+   return ret;
+   }
+
+   desc_data = (__le64 *)(&desc.data[0]);
+   mac_stats->mac_tx_total_pkt_num += le64_to_cpu(*desc_data++);
+   mac_stats->mac_rx_total_pkt_num += le64_to_cpu(*desc_data);
+
+   return 0;
+}
+
 static int hclge_mac_update_stats(struct hclge_dev *hdev)
 {
 #define HCLGE_MAC_CMD_NUM 21
@@ -2846,13 +2879,20 @@ static void hclge_service_task(struct work_struct *work)
struct hclge_dev *hdev =
container_of(work, struct hclge_dev, service_task);
 
+   /* The total rx/tx packets statstics are wanted to be updated
+* per second. Both hclge_update_stats_for_all() and
+* hclge_mac_get_traffic_stats() can do it.
+*/
if (hdev->hw_stats.stats_timer >= HCLGE_STATS_TIMER_INTERVAL) {
hclge_update_stats_for_all(hdev);
hdev->hw_stats.stats_timer = 0;
+   } else {
+   hclge_mac_get_traffic_stats(hdev);
}
 
hclge_update_speed_duplex(hdev);
hclge_update_link_status(hdev);
+   hclge_update_led_status(hdev);
hclge_service_complete(hdev);
 }
 
@@ -5888,6 +5928,75 @@ static int hclge_set_led_id(struct hnae3_handle *handle,
return ret;
 }
 
+enum hclge_led_port_speed {
+   HCLGE_SPEED_LED_FOR_1G,
+   HCLGE_SPEED_LED_FOR_10G,
+   HCLGE_SPEED_LED_FOR_25G,
+   HCLGE_SPEED_LED_FOR_40G,
+   HCLGE_SPEED_LED_FOR_50G,
+   HCLGE_SPEED_LED_FOR_100G,
+};
+
+static u8 hclge_led_get_speed_status(u32 speed)
+{
+   u8 speed_led;
+
+   switch (speed) {
+   case HCLGE_MAC_SPEED_1G:
+   speed_led = HCLGE_SPEED_LED_FOR_1G;
+   break;
+   case HCLGE_MAC_SPEED_10G:
+   speed_led = HCLGE_SPEED_LED_FOR_10G;
+   break;
+   case HCLGE_MAC_SPEED_25G:
+   speed_led = HCLGE_SPEED_LED_FOR_25G;
+   break;
+   case HCLGE_MAC_SPEED_40G:
+   speed_led = HCLGE_SPEED_LED_FOR_40G;
+   break;
+   case HCLGE_MAC_SPEED_50G:
+   speed_led = HCLGE_SPEED_LED_FOR_50G;
+   break;
+   case HCLGE_MAC_SPEED_100G:
+   speed_led = HCLGE_SPEED_LED_FOR_100G;
+   break;
+   default:
+   speed_led = HCLGE_LED_NO_CHANGE;
+   }
+
+   return speed_led;
+}
+
+static int hclge_update_led_status(struct hclge_dev *hdev)
+{
+  

[PATCH V2 net-next 0/4] add some features to hns3 driver

2018-01-18 Thread Peng Li
This patchset adds some features to the hns3 driver, including support
for the ethtool -d and -p commands and for the manager table.

[Patch 1/4] adds support for ethtool -d, whose op is get_regs. The
driver sends a command to the command queue and reads the register
count and register values back from it.
[Patch 2/4] adds manager table initialization for hardware.
[Patch 3/4] adds support for ethtool -p. For fiber ports, the driver
sends a command to the command queue, and the IMP writes the SGPIO regs
to control the LEDs.
[Patch 4/4] adds net status LED support for fiber ports. Net status
includes port speed, total rx/tx packets and link status. The driver
sends the status to the command queue, and the IMP writes SGPIO to
control the LEDs.

---
Change log:
V1 -> V2:
1, fix comments from Andrew Lunn, remove the patch "net: hns3: add
ethtool -p support for phy device".
---

Fuyun Liang (2):
  net: hns3: add support for get_regs
  net: hns3: add manager table initialization for hardware

Jian Shen (2):
  net: hns3: add ethtool -p support for fiber port
  net: hns3: add net status led support for fiber port

 drivers/net/ethernet/hisilicon/hns3/hnae3.h|   5 +-
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c |  35 ++
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h |  47 +++
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 456 +
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h|   3 +
 5 files changed, 545 insertions(+), 1 deletion(-)

-- 
2.9.3



Re: [PATCH 3/4] drm/gem: adjust per file OOM badness on handling buffers

2018-01-18 Thread Chunming Zhou



On 2018-01-19 00:47, Andrey Grodzovsky wrote:

Large amounts of VRAM are usually not CPU accessible, so they are not mapped
into the process's address space. But since device drivers usually support
swapping buffers from VRAM to system memory, we can still run into an out of
memory situation when userspace starts to allocate too much.

This patch gives the OOM killer another hint about which process is
holding how many resources.

Signed-off-by: Andrey Grodzovsky 
---
  drivers/gpu/drm/drm_file.c | 12 
  drivers/gpu/drm/drm_gem.c  |  8 
  include/drm/drm_file.h |  4 
  3 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index b3c6e99..626cc76 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -747,3 +747,15 @@ void drm_send_event(struct drm_device *dev, struct 
drm_pending_event *e)
spin_unlock_irqrestore(&dev->event_lock, irqflags);
  }
  EXPORT_SYMBOL(drm_send_event);
+
+long drm_oom_badness(struct file *f)
+{
+
+   struct drm_file *file_priv = f->private_data;
+
+   if (file_priv)
+   return atomic_long_read(&file_priv->f_oom_badness);
+
+   return 0;
+}
+EXPORT_SYMBOL(drm_oom_badness);
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 01f8d94..ffbadc8 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -264,6 +264,9 @@ drm_gem_object_release_handle(int id, void *ptr, void *data)
drm_gem_remove_prime_handles(obj, file_priv);
drm_vma_node_revoke(&obj->vma_node, file_priv);
  
+	atomic_long_sub(obj->size >> PAGE_SHIFT,

+   &file_priv->f_oom_badness);
+
drm_gem_object_handle_put_unlocked(obj);
  
  	return 0;

@@ -299,6 +302,8 @@ drm_gem_handle_delete(struct drm_file *filp, u32 handle)
idr_remove(&filp->object_idr, handle);
spin_unlock(&filp->table_lock);
  
+	atomic_long_sub(obj->size >> PAGE_SHIFT, &filp->f_oom_badness);

+
return 0;
  }
  EXPORT_SYMBOL(drm_gem_handle_delete);
@@ -417,6 +422,9 @@ drm_gem_handle_create_tail(struct drm_file *file_priv,
}
  
  	*handlep = handle;

+
+   atomic_long_add(obj->size >> PAGE_SHIFT,
+   &file_priv->f_oom_badness);
For the VRAM case, it should be counted only when the VRAM BO is evicted to
system memory.
For example, say total VRAM is 8GB and total system memory is 8GB. One
application allocates 7GB of VRAM and 7GB of system memory, which is allowed,
but if we follow your idea, this application will be killed by the OOM killer,
right?


Regards,
David Zhou

return 0;
  
  err_revoke:

diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
index 0e0c868..ac3aa75 100644
--- a/include/drm/drm_file.h
+++ b/include/drm/drm_file.h
@@ -317,6 +317,8 @@ struct drm_file {
  
  	/* private: */

unsigned long lock_count; /* DRI1 legacy lock count */
+
+   atomic_long_t   f_oom_badness;
  };
  
  /**

@@ -378,4 +380,6 @@ void drm_event_cancel_free(struct drm_device *dev,
  void drm_send_event_locked(struct drm_device *dev, struct drm_pending_event 
*e);
  void drm_send_event(struct drm_device *dev, struct drm_pending_event *e);
  
+long drm_oom_badness(struct file *f);

+
  #endif /* _DRM_FILE_H_ */
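
[Editor's sketch, not part of the patch] The accounting in the patch above, and the 7GB + 7GB objection raised in the reply, can be worked through with a small user-space model. It assumes 4 KiB pages and uses C11 atomics in place of the kernel's atomic_long_t. All toy_* names are hypothetical, and the count_vram switch exists only to compare the two policies under discussion.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12 /* assumes 4 KiB pages */

struct toy_file {
	atomic_long f_oom_badness; /* pages charged to this file */
};

/* Charge (add == true) or uncharge a buffer of size bytes, in pages. */
static void toy_account(struct toy_file *f, uint64_t size, bool add)
{
	long pages = (long)(size >> TOY_PAGE_SHIFT);

	assert(f);
	if (add)
		atomic_fetch_add(&f->f_oom_badness, pages);
	else
		atomic_fetch_sub(&f->f_oom_badness, pages);
}

/*
 * Badness of one file holding a system-memory buffer and a VRAM buffer.
 * count_vram == true models the patch as posted (every handle counts);
 * count_vram == false models counting VRAM only once it is evicted.
 */
static long toy_badness_pages(uint64_t vram_bytes, uint64_t sysmem_bytes,
			      bool count_vram)
{
	struct toy_file f;

	atomic_init(&f.f_oom_badness, 0L);
	toy_account(&f, sysmem_bytes, true);
	if (count_vram)
		toy_account(&f, vram_bytes, true);
	return atomic_load(&f.f_oom_badness);
}
```

With 7 GiB in each pool, charging every handle gives 3670016 pages, well above the 2097152 pages that make up 8 GiB of system memory, while only 1835008 pages actually sit in system RAM. That gap is the concern the reply raises.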




Re: [PATCH V5 2/2] nvme-pci: fixup the timeout case when reset is ongoing

2018-01-18 Thread Keith Busch
On Fri, Jan 19, 2018 at 01:55:29PM +0800, jianchao.wang wrote:
> On 01/19/2018 12:59 PM, Keith Busch wrote:
> > On Thu, Jan 18, 2018 at 06:10:02PM +0800, Jianchao Wang wrote:
> >> +   * - When the ctrl.state is NVME_CTRL_RESETTING, the expired
> >> +   *   request should come from the previous work and we handle
> >> +   *   it as nvme_cancel_request.
> >> +   * - When the ctrl.state is NVME_CTRL_RECONNECTING, the expired
> >> +   *   request should come from the initializing procedure such as
> >> +   *   setup io queues, because all the previous outstanding
> >> +   *   requests should have been cancelled.
> >> */
> >> -  if (dev->ctrl.state == NVME_CTRL_RESETTING) {
> >> -  dev_warn(dev->ctrl.device,
> >> -   "I/O %d QID %d timeout, disable controller\n",
> >> -   req->tag, nvmeq->qid);
> >> -  nvme_dev_disable(dev, false);
> >> +  switch (dev->ctrl.state) {
> >> +  case NVME_CTRL_RESETTING:
> >> +  nvme_req(req)->status = NVME_SC_ABORT_REQ;
> >> +  return BLK_EH_HANDLED;
> >> +  case NVME_CTRL_RECONNECTING:
> >> +  WARN_ON_ONCE(nvmeq->qid);
> >>nvme_req(req)->flags |= NVME_REQ_CANCELLED;
> >>return BLK_EH_HANDLED;
> >> +  default:
> >> +  break;
> >>}
> > 
> > The driver may be giving up on the command here, but that doesn't mean
> > the controller has. We can't just end the request like this because that
> > will release the memory the controller still owns. We must wait until
> > after nvme_dev_disable clears bus master because we can't say for sure
> > the controller isn't going to write to that address right after we end
> > the request.
> > 
> Yes, but the controller is going to be reset or shut down at that moment.
> Even if the controller accesses a bad address and goes wrong, everything
> will be OK after the reset or shutdown. :)

Hm, I don't follow. DMA access after free is never okay.
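
[Editor's sketch, not from the thread] The ordering constraint argued above can be stated as a toy model: ending a request releases its buffer, and the device may keep writing until bus mastering is cleared. This is not kernel code; the enum and function names are hypothetical, and the model deliberately ignores everything except the order of the two steps.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_order {
	TOY_COMPLETE_THEN_DISABLE, /* end the request from the timeout handler */
	TOY_DISABLE_THEN_COMPLETE, /* clear bus master first, then end it */
};

/* Returns true if the modeled device can still DMA into a released buffer. */
static bool toy_dma_after_free_possible(enum toy_order order)
{
	bool bus_master = true;    /* device may still issue DMA writes */
	bool buffer_freed = false;

	assert(order == TOY_COMPLETE_THEN_DISABLE ||
	       order == TOY_DISABLE_THEN_COMPLETE);

	if (order == TOY_DISABLE_THEN_COMPLETE)
		bus_master = false; /* controller disabled before completion */

	buffer_freed = true;        /* ending the request releases its memory */

	/* Hazard window: memory is released while the device can still write. */
	return bus_master && buffer_freed;
}
```

A later reset closes the window but does not undo a write that already landed in memory reused by someone else, which is the point of the objection.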


RE: [RFC] Per file OOM badness

2018-01-18 Thread He, Roger


-----Original Message-----
From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of 
Michal Hocko
Sent: Friday, January 19, 2018 1:14 AM
To: Grodzovsky, Andrey 
Cc: linux...@kvack.org; amd-...@lists.freedesktop.org; 
linux-kernel@vger.kernel.org; dri-de...@lists.freedesktop.org; Koenig, 
Christian 
Subject: Re: [RFC] Per file OOM badness

On Thu 18-01-18 18:00:06, Michal Hocko wrote:
> On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote:
> > Hi, this series is a revised version of an RFC sent by Christian 
> > König a few years ago. The original RFC can be found at 
> > https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
> > 
> > This is the same idea and I've just addressed his concern from the 
> > original RFC and switched to a callback into file_ops instead of a new 
> > member in struct file.
> 
> Please add the full description to the cover letter and do not make 
> people hunt links.
> 
> Here is the original cover letter text
> : I'm currently working on the issue that when device drivers allocate memory on
> : behalf of an application the OOM killer usually doesn't know about that unless
> : the application also gets this memory mapped into its address space.
> : 
> : This is especially annoying for graphics drivers where a lot of the VRAM
> : usually isn't CPU accessible and so doesn't make sense to map into the
> : address space of the process using it.
> : 
> : The problem now is that when an application starts to use a lot of VRAM those
> : buffer objects sooner or later get swapped out to system memory, but when we
> : now run into an out of memory situation the OOM killer obviously doesn't know
> : anything about that memory and so usually kills the wrong process.

OK, but how do you attribute that memory to a particular OOM killable 
entity? And how do you actually enforce that those resources get freed on the 
oom killer action?

Here I think we need finer granularity to distinguish whether a buffer is 
taking VRAM or system memory.

> : The following set of patches tries to address this problem by introducing a per
> : file OOM badness score, which device drivers can use to give the OOM killer a
> : hint how many resources are bound to a file descriptor so that it can make
> : better decisions which process to kill.

But files are not killable, they can be shared... In other words this doesn't 
help the oom killer to make an educated guess at all.

> : 
> : So, a question for everyone: What do you think about this approach?

I think it is just wrong semantically. Non-reclaimable memory is a 
pain, especially when there is way too much of it. If you can free that memory 
somehow then you can hook into the slab shrinker API and react to memory 
pressure. If you can account such memory to a particular process and 
make sure that the consumption is bound by the process lifetime then we can 
think of an accounting that oom_badness can consider when selecting a victim.

I think there is a misunderstanding here.
The memory in TTM pools already has a shrinker, which is registered in 
ttm_pool_mm_shrink_init.
The memory we want to contribute to OOM badness is not in the TTM pools, 
because once a TTM buffer allocation succeeds, that memory has already been 
removed from the TTM pools.

Thanks
Roger(Hongbo.He)

--
Michal Hocko
SUSE Labs


Re: [PATCH V5 2/2] nvme-pci: fixup the timeout case when reset is ongoing

2018-01-18 Thread jianchao.wang
Hi Keith

Thanks for your kind response and guidance.

On 01/19/2018 12:59 PM, Keith Busch wrote:
> On Thu, Jan 18, 2018 at 06:10:02PM +0800, Jianchao Wang wrote:
>> + * - When the ctrl.state is NVME_CTRL_RESETTING, the expired
>> + *   request should come from the previous work and we handle
>> + *   it as nvme_cancel_request.
>> + * - When the ctrl.state is NVME_CTRL_RECONNECTING, the expired
>> + *   request should come from the initializing procedure such as
>> + *   setup io queues, because all the previous outstanding
>> + *   requests should have been cancelled.
>>   */
>> -if (dev->ctrl.state == NVME_CTRL_RESETTING) {
>> -dev_warn(dev->ctrl.device,
>> - "I/O %d QID %d timeout, disable controller\n",
>> - req->tag, nvmeq->qid);
>> -nvme_dev_disable(dev, false);
>> +switch (dev->ctrl.state) {
>> +case NVME_CTRL_RESETTING:
>> +nvme_req(req)->status = NVME_SC_ABORT_REQ;
>> +return BLK_EH_HANDLED;
>> +case NVME_CTRL_RECONNECTING:
>> +WARN_ON_ONCE(nvmeq->qid);
>>  nvme_req(req)->flags |= NVME_REQ_CANCELLED;
>>  return BLK_EH_HANDLED;
>> +default:
>> +break;
>>  }
> 
> The driver may be giving up on the command here, but that doesn't mean
> the controller has. We can't just end the request like this because that
> will release the memory the controller still owns. We must wait until
> after nvme_dev_disable clears bus master because we can't say for sure
> the controller isn't going to write to that address right after we end
> the request.
> 
Yes, but the controller is going to be reset or shut down at that moment.
Even if the controller accesses a bad address and goes wrong, everything
will be OK after the reset or shutdown. :)

Thanks
Jianchao  




linux-next: build failure after merge of the powerpc tree

2018-01-18 Thread Stephen Rothwell
Hi all,

After merging the powerpc tree, today's linux-next build (powerpc64
allnoconfig) failed like this:

arch/powerpc/kernel/mce_power.o: In function `.mce_handle_error':
mce_power.c:(.text+0x5a8): undefined reference to `.hash__tlbiel_all'
mce_power.c:(.text+0x6b8): undefined reference to `.hash__tlbiel_all'
arch/powerpc/mm/hash_utils_64.o: In function `.hash__early_init_mmu':
hash_utils_64.c:(.init.text+0x9d0): undefined reference to `.hash__tlbiel_all'

Caused by commit

  d4748276ae14 ("powerpc/64s: Improve local TLB flush for boot and MCE on POWER9")

The definition of hash__tlbiel_all() is in
arch/powerpc/mm/hash_native_64.c which is only built if CONFIG_PPC_NATIVE
is set, which it is not for this build.

I applied a supplied fix patch.

-- 
Cheers,
Stephen Rothwell


Re: [PATCH 2/4] dmaengine: qcom: bam_dma: add num-channels binding for remotely controlled

2018-01-18 Thread Vinod Koul
On Tue, Jan 16, 2018 at 07:02:34PM +, srinivas.kandaga...@linaro.org wrote:
> From: Srinivas Kandagatla 
> 
> When Linux is master of BAM, it can directly read registers to know number
> of supported channels, however when it's remotely controlled reading these
> registers would trigger a crash if the BAM is not yet initialized/powered up
> on the remote side.
> 
> This patch adds num-channels binding to specify number of supported
> dma channels on remotely controlled BAM.
> 
> Signed-off-by: Srinivas Kandagatla 
> ---
>  Documentation/devicetree/bindings/dma/qcom_bam_dma.txt |  2 ++
>  drivers/dma/qcom/bam_dma.c | 13 +++--
>  2 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/dma/qcom_bam_dma.txt 
> b/Documentation/devicetree/bindings/dma/qcom_bam_dma.txt
> index 9cbf5d9df8fd..aa6822cbb230 100644
> --- a/Documentation/devicetree/bindings/dma/qcom_bam_dma.txt
> +++ b/Documentation/devicetree/bindings/dma/qcom_bam_dma.txt
> @@ -15,6 +15,8 @@ Required properties:
>the secure world.
>  - qcom,controlled-remotely : optional, indicates that the bam is controlled 
> by
>remote proccessor i.e. execution environment.
> +- num-channels : optional, indicates supported number of DMA channels in a
> +  remotely controlled bam.
>  
>  Example:
>  
> diff --git a/drivers/dma/qcom/bam_dma.c b/drivers/dma/qcom/bam_dma.c
> index 78e488e8f96d..523bd178047a 100644
> --- a/drivers/dma/qcom/bam_dma.c
> +++ b/drivers/dma/qcom/bam_dma.c
> @@ -1083,8 +1083,10 @@ static int bam_init(struct bam_device *bdev)
>   if (bdev->ee >= val)
>   return -EINVAL;
>  
> - val = readl_relaxed(bam_addr(bdev, 0, BAM_NUM_PIPES));
> - bdev->num_channels = val & BAM_NUM_PIPES_MASK;
> + if (!bdev->num_channels) {
> + val = readl_relaxed(bam_addr(bdev, 0, BAM_NUM_PIPES));
> + bdev->num_channels = val & BAM_NUM_PIPES_MASK;
> + }
>  
>   if (bdev->controlled_remotely)
>   return 0;
> @@ -1179,6 +1181,13 @@ static int bam_dma_probe(struct platform_device *pdev)
>   bdev->controlled_remotely = of_property_read_bool(pdev->dev.of_node,
>   "qcom,controlled-remotely");
>  
> + if (bdev->controlled_remotely) {

Hmm, so if we remove the remotely controlled instances from DT, Linux won't
see them and won't do anything. Do we need to configure these instances
too?

> + ret = of_property_read_u32(pdev->dev.of_node, "num-channels",
> +&bdev->num_channels);
> + if (ret)
> + dev_err(bdev->dev, "num-channels unspecified in dt\n");
> + }
> +
>   bdev->bamclk = devm_clk_get(bdev->dev, "bam_clk");
>   if (IS_ERR(bdev->bamclk)) {
>   bdev->bamclk = NULL;
> -- 
> 2.15.1
> 

-- 
~Vinod


RE: [PATCH] USB TYPEC: RT1711H Type-C Chip Driver

2018-01-18 Thread 李書帆
Hi Jun,

  For now, RT1711H is not fully compatible with TCPCI. So the existing tcpci.c 
may not work for it.

Best Regards,
*
Shu-Fan Lee
Richtek Technology Corporation
TEL: +886-3-5526789 #2359
FAX: +886-3-5526612
*

-----Original Message-----
From: Jun Li [mailto:jun...@nxp.com]
Sent: Friday, January 19, 2018 11:10 AM
To: ShuFanLee; heikki.kroge...@linux.intel.com
Cc: cy_huang(黃啟原); shufan_lee(李書帆); linux-kernel@vger.kernel.org; 
linux-...@vger.kernel.org; Guenter Roeck
Subject: RE: [PATCH] USB TYPEC: RT1711H Type-C Chip Driver

Hi
> -----Original Message-----
> From: linux-usb-ow...@vger.kernel.org [mailto:linux-usb-
> ow...@vger.kernel.org] On Behalf Of ShuFanLee
> Sent: Wednesday, January 10, 2018 2:59 PM
> To: heikki.kroge...@linux.intel.com
> Cc: cy_hu...@richtek.com; shufan_...@richtek.com; linux-
> ker...@vger.kernel.org; linux-...@vger.kernel.org
> Subject: [PATCH] USB TYPEC: RT1711H Type-C Chip Driver
>
> From: ShuFanLee 
>
> Richtek RT1711H Type-C chip driver that works with Type-C Port
> Controller Manager to provide USB PD and USB Type-C functionalities.

A general question: is this RT1711H Type-C chip compatible with TCPCI 
(Universal Serial Bus Type-C Port Controller Interface Specification)?
It looks like it has the same register map plus some extensions. Can the 
existing ./drivers/staging/typec/tcpci.c basically work for you?

+Guenter

Li Jun

>
> Signed-off-by: ShuFanLee 
> ---
>  .../devicetree/bindings/usb/richtek,rt1711h.txt|   38 +
>  arch/arm64/boot/dts/hisilicon/rt1711h.dtsi |   11 +
>  drivers/usb/typec/Kconfig  |2 +
>  drivers/usb/typec/Makefile |1 +
>  drivers/usb/typec/rt1711h/Kconfig  |7 +
>  drivers/usb/typec/rt1711h/Makefile |2 +
>  drivers/usb/typec/rt1711h/rt1711h.c| 2241 
> 
>  drivers/usb/typec/rt1711h/rt1711h.h|  300 +++
>  8 files changed, 2602 insertions(+)
>  create mode 100644
> Documentation/devicetree/bindings/usb/richtek,rt1711h.txt
>  create mode 100644 arch/arm64/boot/dts/hisilicon/rt1711h.dtsi
>  create mode 100644 drivers/usb/typec/rt1711h/Kconfig  create mode
> 100644 drivers/usb/typec/rt1711h/Makefile
>  create mode 100644 drivers/usb/typec/rt1711h/rt1711h.c
>  create mode 100644 drivers/usb/typec/rt1711h/rt1711h.h
>


Re: [PATCH 1/4] dmaengine: qcom: bam_dma: make bam clk optional

2018-01-18 Thread Vinod Koul
On Tue, Jan 16, 2018 at 07:02:33PM +, srinivas.kandaga...@linaro.org wrote:
> From: Srinivas Kandagatla 
> 
> When BAM is remotely controlled it does not sound correct to control
> its clk on Linux side. Make it optional, so that its not madatory

s/madatory/mandatory

> for remote controlled BAM instances.
> 
> Signed-off-by: Srinivas Kandagatla 
> ---
>  drivers/dma/qcom/bam_dma.c | 15 ---
>  1 file changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/dma/qcom/bam_dma.c b/drivers/dma/qcom/bam_dma.c
> index 03c4eb3fd314..78e488e8f96d 100644
> --- a/drivers/dma/qcom/bam_dma.c
> +++ b/drivers/dma/qcom/bam_dma.c
> @@ -1180,13 +1180,14 @@ static int bam_dma_probe(struct platform_device *pdev)
>   "qcom,controlled-remotely");
>  
>   bdev->bamclk = devm_clk_get(bdev->dev, "bam_clk");

but you still do clk_get unconditionally?

> - if (IS_ERR(bdev->bamclk))
> - return PTR_ERR(bdev->bamclk);
> -
> - ret = clk_prepare_enable(bdev->bamclk);
> - if (ret) {
> - dev_err(bdev->dev, "failed to prepare/enable clock\n");
> - return ret;
> + if (IS_ERR(bdev->bamclk)) {
> + bdev->bamclk = NULL;
> + } else {
> + ret = clk_prepare_enable(bdev->bamclk);
> + if (ret) {
> + dev_err(bdev->dev, "failed to prepare/enable clock\n");
> + return ret;
> + }

wouldn't it be better to set that an instance is remote controlled and thus
not at all visible to Linux?

>   }
>  
>   ret = bam_init(bdev);
> -- 
> 2.15.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe dmaengine" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
~Vinod


Re: [PATCH] print kdump kernel loaded status in stack dump

2018-01-18 Thread Sergey Senozhatsky
On (01/18/18 10:02), Andi Kleen wrote:
> Dave Young  writes:
> > printk("%sHardware name: %s\n",
> >log_lvl, dump_stack_arch_desc_str);
> > +   if (kexec_crash_loaded())
> > +   printk("%skdump kernel loaded\n", log_lvl);
> 
> Oops/warnings are getting longer and longer, often scrolling away
> from the screen, and if the kernel crashes backscroll does not work
> anymore, so precious information is lost.

true. I even ended up having a console_reflush_on_panic() function. it
simply re-prints logbuf entries every once in a while, with a delay [so I
can at least read the oops], starting with the first oops_in_progress
record.

something like below [it's completely hacked up, but at least gives
an idea]

---

 include/linux/console.h |  1 +
 kernel/panic.c  |  7 +++
 kernel/printk/printk.c  | 39 ++-
 3 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index b8920a031a3e..502e3f539448 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -168,6 +168,7 @@ extern void console_unlock(void);
 extern void console_conditional_schedule(void);
 extern void console_unblank(void);
 extern void console_flush_on_panic(void);
+extern void console_reflush_on_panic(void);
 extern struct tty_driver *console_device(int *);
 extern void console_stop(struct console *);
 extern void console_start(struct console *);
diff --git a/kernel/panic.c b/kernel/panic.c
index 2cfef408fec9..39cd59bbfaab 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -137,6 +137,7 @@ void panic(const char *fmt, ...)
va_list args;
long i, i_next = 0;
int state = 0;
+   int reflush_tick = 0;
int old_cpu, this_cpu;
bool _crash_kexec_post_notifiers = crash_kexec_post_notifiers;
 
@@ -298,6 +299,12 @@ void panic(const char *fmt, ...)
i_next = i + 3600 / PANIC_BLINK_SPD;
}
mdelay(PANIC_TIMER_STEP);
+
+   reflush_tick++;
+   if (reflush_tick == 32) { /* don't reflush too often */
+   console_reflush_on_panic();
+   reflush_tick = 0;
+   }
}
 }
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9cb943c90d98..ef3f28d4c741 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -426,6 +426,10 @@ static u32 log_next_idx;
 static u64 console_seq;
 static u32 console_idx;
 
+/* index and sequence number of the record which started the oops print out */
+static u64 log_oops_seq;
+static u32 log_oops_idx;
+
 /* the next printk record to read after the last 'clear' command */
 static u64 clear_seq;
 static u32 clear_idx;
@@ -1736,6 +1740,15 @@ static inline void printk_delay(void)
}
 }
 
+/*
+ * Why do we have printk_delay() in vprintk_emit()
+ * and not in console_unlock()?
+ */
+static inline void console_unlock_delay(void)
+{
+   printk_delay();
+}
+
 /*
  * Continuation lines are buffered, and not committed to the record buffer
  * until the line is complete, or a race forces it. The line fragments
@@ -1849,6 +1862,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 
/* This stops the holder of console_sem just where we want him */
logbuf_lock_irqsave(flags);
+
/*
 * The printf needs to come first; we need the syslog
 * prefix which might be passed-in as a parameter.
@@ -1890,7 +1904,11 @@ asmlinkage int vprintk_emit(int facility, int level,
lflags |= LOG_PREFIX|LOG_NEWLINE;
 
printed_len = log_output(facility, level, lflags, dict, dictlen, text, text_len);
-
+   /* Oops... */
+   if (oops_in_progress && !log_oops_seq) {
+   log_oops_seq = log_next_seq;
+   log_oops_idx = log_next_idx;
+   }
logbuf_unlock_irqrestore(flags);
 
/* If called from the scheduler, we can not call up(). */
@@ -2396,6 +2414,7 @@ void console_unlock(void)
 
stop_critical_timings();/* don't trace print latency */
call_console_drivers(ext_text, ext_len, text, len);
+   console_unlock_delay();
start_critical_timings();
 
if (console_lock_spinning_disable_and_check()) {
@@ -2495,6 +2514,24 @@ void console_flush_on_panic(void)
console_unlock();
 }
 
+/**
+ * console_reflush_on_panic - re-flush console content starting from the
+ * first oops_in_progress record
+ */
+void console_reflush_on_panic(void)
+{
+   unsigned long flags;
+
+   logbuf_lock_irqsave(flags);
+   console_seq = log_oops_seq;
+   console_idx = log_oops_idx;
+   logbuf_unlock_irqrestore(flags);
+
+   if (!printk_delay_msec)
+   printk_delay_msec = 273; /* I can't read any faster */
+   console_flush_on_panic();
+}
+
 /*
  * Return the console tty driver structure and its associated index
  */
-- 
2
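
The reflush idea in the hacked-up patch above can be modelled in userspace as a rough sketch. This is not the posted code: struct log_model, log_store(), console_flush() and console_reflush() are hypothetical names, the log is a plain array rather than the printk ring buffer, and an explicit oops_seen flag is assumed in place of the !log_oops_seq check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LOG_MAX 16	/* arbitrary capacity for this model */

struct log_model {
	const char *rec[LOG_MAX];
	size_t next_seq;	/* where the next record is stored */
	size_t console_seq;	/* next record the console will print */
	size_t oops_seq;	/* first record logged while an oops was in progress */
	int oops_seen;
};

/* Store one record; remember where the first oops record landed. */
static void log_store(struct log_model *l, const char *msg, int oops_in_progress)
{
	assert(l->next_seq < LOG_MAX);
	if (oops_in_progress && !l->oops_seen) {
		l->oops_seq = l->next_seq;
		l->oops_seen = 1;
	}
	l->rec[l->next_seq++] = msg;
}

/* Hand every not-yet-printed record to out[]; returns how many were emitted. */
static size_t console_flush(struct log_model *l, const char **out, size_t max)
{
	size_t n = 0;

	while (l->console_seq < l->next_seq && n < max)
		out[n++] = l->rec[l->console_seq++];
	return n;
}

/* Rewind the console position to the first oops record and flush again. */
static size_t console_reflush(struct log_model *l, const char **out, size_t max)
{
	if (l->oops_seen)
		l->console_seq = l->oops_seq;
	return console_flush(l, out, max);
}
```

Calling console_reflush() repeatedly replays the oops records each time, which is the effect the periodic call from the panic() loop is after.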

Re: [PATCH] cpufreq: remove at32ap-cpufreq

2018-01-18 Thread Viresh Kumar
On 18-01-18, 21:02, Corentin Labbe wrote:
> Since AVR32 arch was removed, at32ap-cpufreq is useless.
> Remove this driver.
> 
> Signed-off-by: Corentin Labbe 
> ---
>  drivers/cpufreq/Kconfig  |  10 ---
>  drivers/cpufreq/Makefile |   1 -
>  drivers/cpufreq/at32ap-cpufreq.c | 127 
> ---
>  3 files changed, 138 deletions(-)
>  delete mode 100644 drivers/cpufreq/at32ap-cpufreq.c

Acked-by: Viresh Kumar 

-- 
viresh


Re: [PATCH 0/7] PM /Domain/OPP: Add support to get performance state from DT

2018-01-18 Thread Viresh Kumar
On 18-01-18, 20:24, Rafael J. Wysocki wrote:
> On Thursday, January 18, 2018 7:34:04 AM CET Viresh Kumar wrote:
> > On 22-12-17, 12:56, Viresh Kumar wrote:
> > > Hi,
> > > 
> > > Now that the DT bindings [1] are already Reviewed/Acked by respective
> > > maintainers, here is the code to start using them.
> > > 
> > > The first two patches provide helpers in the OPP core, [3-5]/7 update
> > > the PM domain core to start supporting domain OPP tables, etc, 6/7
> > > updates the OPP core to use the new callback provided by the PM domains
> > > to get performance state and the last one removes the unused helpers
> > > now.
> > > 
> > > This is tested on Hikey620 and works just fine.
> > 
> > Ping !
> 
> Well, whom are you pinging exactly and why?

Ulf and Kevin, as it's been almost a month since this series was posted
and it has received no comments at all.

-- 
viresh


RE: [RFC] Per file OOM badness

2018-01-18 Thread He, Roger
Basically the idea looks right to me.

1. But we need finer granularity to control the contribution to OOM badness.
 When a TTM buffer resides in VRAM rather than being evicted to system 
memory, we should not count it towards badness.
 But I think that is not easy to implement.

2. If the TTM buffer (GTT here) is mapped to userspace for CPU access, I am not 
quite sure whether the buffer size is already accounted for by the kernel.
 If yes, the size will end up being counted again by your patches.

So, I am wondering whether we can count the TTM buffer size in: 
struct mm_rss_stat {
atomic_long_t count[NR_MM_COUNTERS];
};
which is maintained by the kernel based on the CPU VM (page table).

Something like this:
When a GTT allocation succeeds:
add_mm_counter(vma->vm_mm, MM_ANONPAGES, buffer_size);

When GTT is swapped out:
dec_mm_counter from MM_ANONPAGES first, then 
add_mm_counter(vma->vm_mm, MM_SWAPENTS, buffer_size);  // or MM_SHMEMPAGES, or 
add a new item.

Always update the corresponding item in mm_rss_stat.
That way, we can control the status updates accurately. 
What do you think about that?
And are there any side effects of this approach?


Thanks
Roger(Hongbo.He)

-Original Message-
From: dri-devel [mailto:dri-devel-boun...@lists.freedesktop.org] On Behalf Of 
Andrey Grodzovsky
Sent: Friday, January 19, 2018 12:48 AM
To: linux-kernel@vger.kernel.org; linux...@kvack.org; 
dri-de...@lists.freedesktop.org; amd-...@lists.freedesktop.org
Cc: Koenig, Christian 
Subject: [RFC] Per file OOM badness

Hi, this series is a revised version of an RFC sent by Christian König a few 
years ago. The original RFC can be found at 
https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html

This is the same idea and I've just addressed his concern from the original RFC 
and switched to a callback into file_ops instead of a new member in struct file.

Thanks,
Andrey

___
dri-devel mailing list
dri-de...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [RFC PATCH] e1000e: Remove Other from EIAC.

2018-01-18 Thread Benjamin Poirier
On 2018/01/18 18:42, Shrikrishna Khare wrote:
> 
> 
> On Thu, 18 Jan 2018, Benjamin Poirier wrote:
> 
> > On 2018/01/18 15:50, Benjamin Poirier wrote:
> > > It was reported that emulated e1000e devices in vmware esxi 6.5 Build
> > > 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver
> > > overrun interrupt bursts", v4.15-rc1). Some tracing shows that after
> > > e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other()
> > > on emulated e1000e devices. In comparison, on real e1000e 82574 hardware,
> > > icr=0x8004 (_INT_ASSERTED | _OTHER) in the same situation.
> > > 
> > > Some experimentation showed that this flaw in vmware e1000e emulation can
> > > be worked around by not setting Other in EIAC. This is how it was before
> > > 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1).
> > 
> > vmware folks, please comment.
> 
> Thank you for bringing this to our attention.
> 
> Using the reported build (ESX 6.5, 7526125) and 4.15.0-rc8+ kernel (which 
> has the said patch), I could bring up e1000e interface (version: 3.2.6-k),
> get dhcp address and even do large file downloads without difficulty.
> 
> Could you give us more pointers on how we may be able to reproduce this 
> locally? Was there anything different with the configuration when the 
> issue was observed? Is the issue consistently reproducible?

It's consistently reproducible, however I noticed that once in a while
there is a genuine "Other" interrupt that comes in and triggers the link
status change. The problem is with interrupts that are triggered via a
write to ICS (such as in e1000e_trigger_lsc()). Can you reproduce a
problem if you do:
ip link set ethX down
ip link set ethX up

If you're building your own kernel, you can add the following patch and
cat /sys/kernel/debug/tracing/trace_pipe

For me it shows on v4.15-rc8:
   <...>-2578  [000]  83527.938321: e1000e_trigger_lsc: trigger_lsc
   <...>-2578  [000] d.h. 83527.938398: e1000_msix_other: icr 0x0

With the patch that I submitted, it shows:
 wickedd-1329  [002] .N..    20.123545: e1000e_trigger_lsc: trigger_lsc
  <idle>-0     [000] d.h.    20.123630: e1000_msix_other: icr 0x8104
  <idle>-0     [000] d.h.    20.123654: e1000_msix_other: lsc
  <idle>-0     [000] d.h.    20.123676: e1000_msix_other: mod_timer


diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 9f18d39bdc8f..16620ce840fc 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -1918,22 +1918,29 @@ static irqreturn_t e1000_msix_other(int __always_unused irq, void *data)
bool enable = true;
 
icr = er32(ICR);
+   trace_printk("icr 0x%x\n", icr);
+
if (icr & E1000_ICR_RXO) {
+   trace_printk("rxo\n");
ew32(ICR, E1000_ICR_RXO);
enable = false;
/* napi poll will re-enable Other, make sure it runs */
if (napi_schedule_prep(&adapter->napi)) {
+   trace_printk("napi schedule\n");
adapter->total_rx_bytes = 0;
adapter->total_rx_packets = 0;
__napi_schedule(&adapter->napi);
}
}
if (icr & E1000_ICR_LSC) {
+   trace_printk("lsc\n");
ew32(ICR, E1000_ICR_LSC);
hw->mac.get_link_status = true;
/* guard against interrupt when we're going down */
-   if (!test_bit(__E1000_DOWN, &adapter->state))
+   if (!test_bit(__E1000_DOWN, &adapter->state)) {
+   trace_printk("mod_timer\n");
mod_timer(&adapter->watchdog_timer, jiffies + 1);
+   }
}
 
if (enable && !test_bit(__E1000_DOWN, &adapter->state))
@@ -4221,6 +4228,8 @@ static void e1000e_trigger_lsc(struct e1000_adapter *adapter)
 {
struct e1000_hw *hw = &adapter->hw;
 
+   trace_printk("trigger_lsc\n");
+
if (adapter->msix_entries)
ew32(ICS, E1000_ICS_LSC | E1000_ICS_OTHER);
else


Re: [PATCH v8 5/5] document: add document for kaslr_mem

2018-01-18 Thread Chao Fan
On Fri, Jan 19, 2018 at 11:53:31AM +0800, Baoquan He wrote:
>On 01/19/18 at 11:36am, Chao Fan wrote:
>> Signed-off-by: Chao Fan 
>> ---
>>  Documentation/admin-guide/kernel-parameters.txt | 10 ++
>>  1 file changed, 10 insertions(+)
>> 
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
>> b/Documentation/admin-guide/kernel-parameters.txt
>> index e2de7c006a74..28a879f62560 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -2350,6 +2350,16 @@
>>  allocations which rules out almost all kernel
>>  allocations. Use with caution!
>>  
>> +kaslr_mem=nn[KMG][@ss[KMG]]
>> +[KNL] Force usage of a specific region of memory
>> +for KASLR during kernel decompression stage.
>> +Region of usable memory is from ss to ss+nn. If ss
>> +is omitted, it is equivalent to kaslr_mem=nn[KMG]@0.
>> +Multiple regions can be specified, comma delimited.
>> +Notice: we support 4 regions at most now.
>
>Better not use 'we' here. You can refer to kernel-parameter.txt.

You are right, so I resend this part, and add several Cc.

Thanks,
Chao Fan
>
>> +Example:
>> +kaslr_mem=1G,500M@2G,1G@4G
>> +
>>  MTD_Partition=  [MTD]
>>  Format: ,,,
>>  
>> -- 
>> 2.14.3
>> 
>> 
>> 
>
>




[RESEND PATCH v8 5/5] document: add document for kaslr_mem

2018-01-18 Thread Chao Fan
Cc: linux-...@vger.kernel.org
Cc: Jonathan Corbet 
Cc: Randy Dunlap 
Signed-off-by: Chao Fan 
---
 Documentation/admin-guide/kernel-parameters.txt | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index e2de7c006a74..2e3d5fb13f7f 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2350,6 +2350,16 @@
allocations which rules out almost all kernel
allocations. Use with caution!
 
+   kaslr_mem=nn[KMG][@ss[KMG]]
+   [KNL] Force usage of a specific region of memory
+   for KASLR during kernel decompression stage.
+   Region of usable memory is from ss to ss+nn. If ss
+   is omitted, it is equivalent to kaslr_mem=nn[KMG]@0.
+   Multiple regions can be specified, comma delimited.
+   Notice: only support 4 regions at most now.
+   Example:
+   kaslr_mem=1G,500M@2G,1G@4G
+
MTD_Partition=  [MTD]
Format: ,,,
 
-- 
2.14.3
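
A rough userspace sketch of how the nn[KMG][@ss[KMG]] syntax documented above could be parsed. It is not taken from this patch series: parse_size(), parse_kaslr_mem() and struct mem_region are hypothetical names used only for illustration, and silently ignoring anything past the fourth region is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_KASLR_REGIONS 4	/* the documentation mentions at most 4 regions */

struct mem_region {
	unsigned long long start;
	unsigned long long size;
};

/* Parse "nn[KMG]" at *s and advance *s past it; returns 0 on success, -1 if no number. */
static int parse_size(const char **s, unsigned long long *out)
{
	char *end;
	unsigned long long v = strtoull(*s, &end, 0);

	if (end == *s)
		return -1;
	switch (*end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		end++;
		break;
	default:
		break;
	}
	*s = end;
	*out = v;
	return 0;
}

/*
 * Parse comma-delimited "nn[KMG][@ss[KMG]]" entries into out[].
 * A missing "@ss" means start 0. Returns the number of regions
 * stored, or -1 on a malformed string.
 */
static int parse_kaslr_mem(const char *arg, struct mem_region *out, int max)
{
	int n = 0;

	assert(arg != NULL && out != NULL);
	while (*arg && n < max) {
		if (parse_size(&arg, &out[n].size))
			return -1;
		out[n].start = 0;
		if (*arg == '@') {
			arg++;
			if (parse_size(&arg, &out[n].start))
				return -1;
		}
		n++;
		if (*arg == ',')
			arg++;
		else if (*arg != '\0')
			return -1;
	}
	return n;
}
```

With the example string from the documentation, kaslr_mem=1G,500M@2G,1G@4G, this yields three regions: 1G at 0, 500M at 2G and 1G at 4G.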





Re: [PATCH v4 07/13] ARM: dts: rockchip: add clocks in vop iommu nodes

2018-01-18 Thread Tomasz Figa
On Fri, Jan 19, 2018 at 1:55 PM, JeffyChen  wrote:
> Hi Tomasz,
>
> Thanks for your reply.
>
>
> On 01/19/2018 11:23 AM, Tomasz Figa wrote:
>>
>> On Thu, Jan 18, 2018 at 8:52 PM, Jeffy Chen 
>> wrote:
>>>
>>> Add clocks in vop iommu nodes, since we are going to control clocks in
>>> rockchip iommu driver.
>>>
>>> Signed-off-by: Jeffy Chen 
>>> ---
>>>
>>> Changes in v4: None
>>> Changes in v3: None
>>> Changes in v2: None
>>>
>>>   arch/arm/boot/dts/rk3036.dtsi | 2 ++
>>>   arch/arm/boot/dts/rk3288.dtsi | 4 
>>>   2 files changed, 6 insertions(+)
>>>
>>> diff --git a/arch/arm/boot/dts/rk3036.dtsi
>>> b/arch/arm/boot/dts/rk3036.dtsi
>>> index 3b704cfed69a..95b0ebc7a40f 100644
>>> --- a/arch/arm/boot/dts/rk3036.dtsi
>>> +++ b/arch/arm/boot/dts/rk3036.dtsi
>>> @@ -197,6 +197,8 @@
>>>  reg = <0x10118300 0x100>;
>>>  interrupts = ;
>>>  interrupt-names = "vop_mmu";
>>> +   clocks = <&cru ACLK_LCDC>, <&cru SCLK_LCDC>, <&cru
>>> HCLK_LCDC>;
>>> +   clock-names = "aclk_vop", "dclk_vop", "hclk_vop";
>>
>>
>> We should remove clock-names from IOMMU nodes. The Rockchip IOMMU
>> bindings don't define clock names and only the clocks property should
>> be given.
>>
> hmmm, i'm trying to switch to clk_bulk APIs, the get and put are name based.
> or maybe i can use clk_get/put along with other clk_bulk APIs

I think it should be possible to just put the clock pointers to the
clk_bulk_data struct manually. Otherwise, I'm not sure what names we
could use for clock-names, since the clocks depend on master.
(Something like "clock0, clock1, clock2, ..., clockN" could work, but
it doesn't add any value IMHO...).


Re: linux-next: build warning after merge of the crypto tree

2018-01-18 Thread Herbert Xu
On Fri, Jan 19, 2018 at 09:51:43AM +0530, Harsh Jain wrote:
> Hi Herbert,
> 
> It's an indentation issue. It seems checkpatch and the default compile options 
> do not report this warning.
> 
> How would you like to take the fix? Should I send the whole series again with 
> the fix, or only the indentation patch?

Please send an incremental patch.

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [RFC PATCH] blk-mq: fixup RESTART when queue becomes idle

2018-01-18 Thread Bart Van Assche
On Fri, 2018-01-19 at 10:32 +0800, Ming Lei wrote:
> Now most of times both NVMe and SCSI won't return BLK_STS_RESOURCE, and
> it should be DM-only which returns STS_RESOURCE so often.

That's wrong at least for SCSI. See also 
https://marc.info/?l=linux-block&m=151578329417076.

Bart.

Re: [PATCH v5 29/44] ARM: da8xx: add new USB PHY clock init using common clock framework

2018-01-18 Thread Sekhar Nori
On Friday 19 January 2018 12:13 AM, David Lechner wrote:
> On 01/18/2018 09:14 AM, Sekhar Nori wrote:
>> On Monday 08 January 2018 07:47 AM, David Lechner wrote:
>>> +int __init da8xx_register_usb20_phy_clk(bool use_usb_refclkin)
>>> +{
>>> +    struct regmap *cfgchip;
>>> +    struct clk *usb0_psc_clk, *clk;
>>> +    struct clk_hw *parent;
>>> +
>>> +    cfgchip = syscon_regmap_lookup_by_compatible("ti,da830-cfgchip");
>>
>> Am I right in understanding that this API is only called for non-DT
>> boot? If yes, do we really need the lookup by compatible?
> 
> This code is used in DT boot until [PATCH v5 43/44] "ARM: da8xx-dt:
> switch to device tree clocks". So, yes it is needed temporarily to
> prevent breaking USB.

Alright, so this line should probably be dropped either as part of 43/44
or later.

Thanks,
Sekhar


Re: [PATCH v5 21/44] clk: davinci: New driver for TI DA8XX USB PHY clocks

2018-01-18 Thread Sekhar Nori
On Friday 19 January 2018 12:19 AM, David Lechner wrote:
>>
> 
> or to avoid defining a new macro?
> 
> 
>> regmap_write_bits(clk->regmap, CFGCHIP(2),
>>   CFGCHIP2_USB1PHYCLKMUX,
>>   index ? CFGCHIP2_USB1PHYCLKMUX : 0); 

Looks good as well!

Regards,
Sekhar
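
For readers unfamiliar with the pattern, the regmap_write_bits() call quoted above is a masked read-modify-write. A small userspace model follows, as a sketch only: update_bits() and set_usb1_phy_parent() are hypothetical helpers, and the bit position used for the mux is an assumption, not taken from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-in for CFGCHIP2_USB1PHYCLKMUX; the real bit position may differ. */
#define USB1PHYCLKMUX_BIT (1u << 12)

/* Replace only the bits selected by mask with the matching bits from val. */
static uint32_t update_bits(uint32_t reg, uint32_t mask, uint32_t val)
{
	return (reg & ~mask) | (val & mask);
}

/* Select parent 0 or 1 by clearing or setting the single mux bit. */
static uint32_t set_usb1_phy_parent(uint32_t cfgchip2, unsigned int index)
{
	assert(index <= 1);	/* a one-bit mux has two parents */
	return update_bits(cfgchip2, USB1PHYCLKMUX_BIT,
			   index ? USB1PHYCLKMUX_BIT : 0);
}
```

Passing the mask itself as the value for index 1 and 0 otherwise is what makes a separate shift macro unnecessary.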



Re: [PATCH v4 07/13] ARM: dts: rockchip: add clocks in vop iommu nodes

2018-01-18 Thread JeffyChen

Hi Tomasz,

Thanks for your reply.

On 01/19/2018 11:23 AM, Tomasz Figa wrote:
> On Thu, Jan 18, 2018 at 8:52 PM, Jeffy Chen  wrote:
>> Add clocks in vop iommu nodes, since we are going to control clocks in
>> rockchip iommu driver.
>>
>> Signed-off-by: Jeffy Chen 
>> ---
>>
>> Changes in v4: None
>> Changes in v3: None
>> Changes in v2: None
>>
>>   arch/arm/boot/dts/rk3036.dtsi | 2 ++
>>   arch/arm/boot/dts/rk3288.dtsi | 4 
>>   2 files changed, 6 insertions(+)
>>
>> diff --git a/arch/arm/boot/dts/rk3036.dtsi b/arch/arm/boot/dts/rk3036.dtsi
>> index 3b704cfed69a..95b0ebc7a40f 100644
>> --- a/arch/arm/boot/dts/rk3036.dtsi
>> +++ b/arch/arm/boot/dts/rk3036.dtsi
>> @@ -197,6 +197,8 @@
>>  reg = <0x10118300 0x100>;
>>  interrupts = ;
>>  interrupt-names = "vop_mmu";
>> +   clocks = <&cru ACLK_LCDC>, <&cru SCLK_LCDC>, <&cru HCLK_LCDC>;
>> +   clock-names = "aclk_vop", "dclk_vop", "hclk_vop";
>
> We should remove clock-names from IOMMU nodes. The Rockchip IOMMU
> bindings don't define clock names and only the clocks property should
> be given.

hmmm, i'm trying to switch to clk_bulk APIs, the get and put are name 
based. or maybe i can use clk_get/put along with other clk_bulk APIs

> Not even saying that the names currently listed are not good examples,
> they name SoC clock controller output, rather than device inputs.
>
> Best regards,
> Tomasz


Re: [PATCH V5 2/2] nvme-pci: fixup the timeout case when reset is ongoing

2018-01-18 Thread Keith Busch
On Thu, Jan 18, 2018 at 06:10:02PM +0800, Jianchao Wang wrote:
> +  * - When the ctrl.state is NVME_CTRL_RESETTING, the expired
> +  *   request should come from the previous work and we handle
> +  *   it as nvme_cancel_request.
> +  * - When the ctrl.state is NVME_CTRL_RECONNECTING, the expired
> +  *   request should come from the initializing procedure such as
> +  *   setup io queues, because all the previous outstanding
> +  *   requests should have been cancelled.
>*/
> - if (dev->ctrl.state == NVME_CTRL_RESETTING) {
> - dev_warn(dev->ctrl.device,
> -  "I/O %d QID %d timeout, disable controller\n",
> -  req->tag, nvmeq->qid);
> - nvme_dev_disable(dev, false);
> + switch (dev->ctrl.state) {
> + case NVME_CTRL_RESETTING:
> + nvme_req(req)->status = NVME_SC_ABORT_REQ;
> + return BLK_EH_HANDLED;
> + case NVME_CTRL_RECONNECTING:
> + WARN_ON_ONCE(nvmeq->qid);
>   nvme_req(req)->flags |= NVME_REQ_CANCELLED;
>   return BLK_EH_HANDLED;
> + default:
> + break;
>   }

The driver may be giving up on the command here, but that doesn't mean
the controller has. We can't just end the request like this because that
will release the memory the controller still owns. We must wait until
after nvme_dev_disable clears bus master because we can't say for sure
the controller isn't going to write to that address right after we end
the request.


Re: [PATCH 3/6] s390: add options to change branch prediction behaviour for the kernel

2018-01-18 Thread QingFeng Hao



On 2018/1/17 17:48, Martin Schwidefsky wrote:

Add the PPA instruction to the system entry and exit path to switch
the kernel to a different branch prediction behaviour. The instructions
are added via CPU alternatives and can be disabled with the "nospec"
or the "nobp=0" kernel parameter. If the default behaviour selected
with CONFIG_KERNEL_NOBP is set to "n" then the "nobp=1" parameter can be
used to enable the changed kernel branch prediction.

Acked-by: Christian Borntraeger 
Signed-off-by: Martin Schwidefsky 
---
  arch/s390/Kconfig | 17 +
  arch/s390/include/asm/processor.h |  1 +
  arch/s390/kernel/alternative.c| 23 ++
  arch/s390/kernel/early.c  |  2 ++
  arch/s390/kernel/entry.S  | 50 ++-
  arch/s390/kernel/ipl.c|  1 +
  arch/s390/kernel/smp.c|  2 ++
  7 files changed, 95 insertions(+), 1 deletion(-)

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 829c679..a818644 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -541,6 +541,23 @@ config ARCH_RANDOM

  If unsure, say Y.

+config KERNEL_NOBP

Just a question: can we add a control in the proc filesystem to 
enable/disable the facilities for the whole system by default? Each 
process could still override the default setting.
This may give the operator more flexibility to choose, and to debug 
as well, without rebooting the system, 
e.g. echo 0 > /sys/kernel/debug/s390x/ibpb_enabled
Ref: https://access.redhat.com/articles/3311301


+   def_bool n
+   prompt "Enable modified branch prediction for the kernel by default"
+   help
+  If this option is selected the kernel will switch to a modified
+ branch prediction mode if the firmware interface is available.
+ The modified branch prediction mode improves the behaviour in
+ regard to speculative execution.
+
+ With the option enabled the kernel parameter "nobp=0" or "nospec"
+ can be used to run the kernel in the normal branch prediction mode.
+
+ With the option disabled the modified branch prediction mode is
+ enabled with the "nobp=1" kernel parameter.
+
+ If unsure, say N.
+
  endmenu

[...]

--
Regards
QingFeng Hao



Re: [PATCH] print kdump kernel loaded status in stack dump

2018-01-18 Thread Dave Young
On 01/18/18 at 01:57pm, Steven Rostedt wrote:
> On Thu, 18 Jan 2018 10:02:17 -0800
> Andi Kleen  wrote:
> 
> > Dave Young  writes:
> > >   printk("%sHardware name: %s\n",
> > >  log_lvl, dump_stack_arch_desc_str);
> > > + if (kexec_crash_loaded())
> > > + printk("%skdump kernel loaded\n", log_lvl);  
> > 
> > Oops/warnings are getting longer and longer, often scrolling away
> > from the screen, and if the kernel crashes backscroll does not work
> > anymore, so precious information is lost.
> > 
> > Can you merge it with some other line?
> > 
> > Just a [KDUMP] or so somewhere should be good enough.
> 
> Or perhaps we should add it as a TAINT. Not all taints are bad.

Hmm, I also thought about this before, but it does not seem to match the
meaning of "tainted", given the assumption that a taint is bad :(

Maybe it would be better to do what Andi said, but print a better word
than "KDUMP", e.g. "Kdumpable" sounds better.  If this is fine I can
repost the patch.

> 
> -- Steve

Thanks
Dave


Re: [PATCH v5 3/4] PCI/DPC: Unify and plumb error handling into DPC

2018-01-18 Thread Sinan Kaya
On 1/18/2018 11:23 PM, p...@codeaurora.org wrote:
> On 2018-01-18 23:33, Sinan Kaya wrote:
>> On 1/18/2018 1:00 PM, p...@codeaurora.org wrote:
>>>> I think you would put into include/linux/pci.h only if there is an external
>>>> use of constant outside of drivers/pci directory. Otherwise, you should keep
>>>> the setting inside one of the header files in drivers/pci directory.
>>>>
>>>> I don't see any other subsystem caring about DPC_FATAL definition.
>>>
>>> ok so you are suggesting to move only DPC_FATAL ? so then AER can stay
>>> where it is.
>>
>> Now that both AER and DPC handling is getting unified, I think it makes
>> sense to keep all error codes (AER+DPC) together in drivers/pci/pci.h
>> rather than having them split in aer.h and dpc.h.
>>
>> Otherwise, how would we avoid having a new error type defined with the
>> existing values.
> 
> I agree, it's just that drivers/acpi/apei/ghes.c has to do
> #include ../../pci/pci.h

That's bad. I was just thinking about the DPC error code only. I didn't realize
AER error codes are being referenced from ghes.c.

> 
> but thats okay I think.  let me move error codes to drivers/pci/pci.h.

It is better if error codes move to include/linux/pci.h and keep them together.

> 
> Regards,
> Oza.
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.


[PATCH] printk: drop redundant devkmsg_log_str memsets

2018-01-18 Thread Sergey Senozhatsky
We copy in null terminated strings "on" and "off", no
need to zero out devkmsg_log_str in control_devkmsg().

Signed-off-by: Sergey Senozhatsky 
---
 kernel/printk/printk.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index d70927c384f3..9faddcfd3994 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -136,13 +136,10 @@ static int __init control_devkmsg(char *str)
/*
 * Set sysctl string accordingly:
 */
-   if (devkmsg_log == DEVKMSG_LOG_MASK_ON) {
-   memset(devkmsg_log_str, 0, DEVKMSG_STR_MAX_SIZE);
-   strncpy(devkmsg_log_str, "on", 2);
-   } else if (devkmsg_log == DEVKMSG_LOG_MASK_OFF) {
-   memset(devkmsg_log_str, 0, DEVKMSG_STR_MAX_SIZE);
-   strncpy(devkmsg_log_str, "off", 3);
-   }
+   if (devkmsg_log == DEVKMSG_LOG_MASK_ON)
+   strcpy(devkmsg_log_str, "on");
+   else if (devkmsg_log == DEVKMSG_LOG_MASK_OFF)
+   strcpy(devkmsg_log_str, "off");
/* else "ratelimit" which is set by default. */
 
/*
-- 
2.16.0
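
A small userspace illustration of why the memset() calls become redundant once strcpy() is used. This is a sketch under stated assumptions: BUF_SIZE and the initial "ratelimit" contents stand in for DEVKMSG_STR_MAX_SIZE and the default devkmsg_log_str value, and the two helper names are hypothetical. memcpy() is used to model strncpy(dst, src, strlen(src)), which copies the characters but no terminating NUL.

```c
#include <assert.h>
#include <string.h>

#define BUF_SIZE 10	/* assumed stand-in for DEVKMSG_STR_MAX_SIZE */

/*
 * Models strncpy(buf, src, strlen(src)): the characters are copied,
 * the terminating NUL is not, so old contents show through unless
 * the buffer was zeroed first.
 */
static void copy_no_nul(char *buf, const char *src)
{
	memcpy(buf, src, strlen(src));
}

/* strcpy() copies the terminating NUL too, so no memset() is needed. */
static void copy_with_nul(char *buf, const char *src)
{
	assert(strlen(src) < BUF_SIZE);
	strcpy(buf, src);
}
```

Starting from a buffer holding "ratelimit", copy_no_nul(buf, "off") leaves "offelimit", while copy_with_nul(buf, "off") leaves a properly terminated "off".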



Re: [PATCH v5 3/4] PCI/DPC: Unify and plumb error handling into DPC

2018-01-18 Thread poza

On 2018-01-18 23:33, Sinan Kaya wrote:
> On 1/18/2018 1:00 PM, p...@codeaurora.org wrote:
>>> I think you would put into include/linux/pci.h only if there is an external
>>> use of constant outside of drivers/pci directory. Otherwise, you should keep
>>> the setting inside one of the header files in drivers/pci directory.
>>>
>>> I don't see any other subsystem caring about DPC_FATAL definition.
>>
>> ok so you are suggesting to move only DPC_FATAL ? so then AER can stay
>> where it is.
>
> Now that both AER and DPC handling is getting unified, I think it makes
> sense to keep all error codes (AER+DPC) together in drivers/pci/pci.h
> rather than having them split in aer.h and dpc.h.
>
> Otherwise, how would we avoid having a new error type defined with the
> existing values.

I agree, it's just that drivers/acpi/apei/ghes.c has to do
#include ../../pci/pci.h

but that's okay I think.  Let me move the error codes to drivers/pci/pci.h.

Regards,
Oza.


Re: [PATCH v5 4/4] PCI/DPC: Enumerate the devices after DPC trigger event

2018-01-18 Thread poza

On 2018-01-19 07:13, Keith Busch wrote:

On Thu, Jan 18, 2018 at 11:35:59AM -0500, Sinan Kaya wrote:

On 1/18/2018 12:32 AM, p...@codeaurora.org wrote:
> On 2018-01-18 08:26, Keith Busch wrote:
>> On Wed, Jan 17, 2018 at 08:27:39AM -0800, Sinan Kaya wrote:
>>> On 1/17/2018 5:37 AM, Oza Pawandeep wrote:
>>> > +static bool dpc_wait_link_active(struct pci_dev *pdev)
>>> > +{
>>>
>>> I think you can also make this function common instead of making another 
copy here.
>>> Of course, this would be another patch.
>>
>> It is actually very similar to __pcie_wait_link_active in pciehp_hpc.c,
>> so there's some opprotunity to make even more common code.
>
> in that case there has to be a generic function in
> drivers/pci/pci.c
>
> which covers the following functions from
>
> pcie-dpc.c:
> dpc_wait_link_inactive
> dpc_wait_link_active
>
> drivers/pci/hotplug/pciehp_hpc.c
> pcie_wait_link_active
>
>
> all of the above would become one generic function moved to drivers/pci/pci.c
>
> please let me know if this is okay.

Works for me. Keith/Bjorn?


Yep, I believe common solutions that reduce code are always encouraged
in the Linux kernel.



okay, I will work on this.

Regards,
Oza.
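
The consolidation discussed above could look roughly like the following
userspace sketch. It is only an illustration under stated assumptions:
struct fake_port, fake_read_lnksta() and wait_for_link() are hypothetical
names, not the helper that was eventually written. A kernel version would
read the Link Status register through the PCIe capability accessors and
sleep between polls. The one detail taken from the PCIe spec is the Data
Link Layer Link Active bit (0x2000 in Link Status).

```c
#include <stdbool.h>
#include <stdint.h>

/* Data Link Layer Link Active bit in the PCIe Link Status register. */
#define LNKSTA_DLLLA 0x2000u

/* Hypothetical stand-in for a port: toggles DLLLA after a number of reads. */
struct fake_port {
	int reads_until_flip;	/* 0 means the link state never changes */
	uint16_t lnksta;	/* simulated Link Status register */
};

static uint16_t fake_read_lnksta(struct fake_port *p)
{
	if (p->reads_until_flip > 0 && --p->reads_until_flip == 0)
		p->lnksta ^= LNKSTA_DLLLA;
	return p->lnksta;
}

/*
 * One helper covering both "wait for active" and "wait for inactive":
 * poll until DLLLA matches the requested state or max_polls reads elapse.
 */
static bool wait_for_link(struct fake_port *p, bool active, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		bool up = (fake_read_lnksta(p) & LNKSTA_DLLLA) != 0;

		if (up == active)
			return true;
		/* a real implementation would sleep here between polls */
	}
	return false;
}
```

Passing the desired state as a parameter is what lets a single function
replace the separate active and inactive waiters.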



Re: linux-next: build warning after merge of the crypto tree

2018-01-18 Thread Harsh Jain
Hi Herbert,

It's an indentation issue. It seems checkpatch and the default compile options do
not report this warning.

How would you like to take the fix? Should I send the whole series again with the fix,
or only the indentation patch?


On 19-01-2018 07:19, Stephen Rothwell wrote:
> Hi Herbert,
>
> After merging the crypto tree, today's linux-next build (x86_64
> allmodconfig) produced this warning:
>
> drivers/crypto/chelsio/chcr_algo.c: In function 'create_authenc_wr':
> drivers/crypto/chelsio/chcr_algo.c:2113:2: warning: this 'if' clause does not guard... [-Wmisleading-indentation]
>   if (error)
>   ^~
> drivers/crypto/chelsio/chcr_algo.c:2115:3: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the 'if'
>dnents = sg_nents_xlen(req->dst, assoclen, CHCR_DST_SG_SIZE, 0);
>^~
>
> Introduced by commit
>
>   e1a018e607a3 ("crypto: chelsio - Remove dst sg size zero check")
>



RE: [PATCH 23/35] x86/speculation: Add basic speculation control code

2018-01-18 Thread Van De Ven, Arjan
> Enabling IBRS does not prevent software from controlling the predicted
> targets of indirect branches of unrelated software executed later at
> the same predictor mode (for example, between two different user
> applications, or two different virtual machines). Such isolation can
> be ensured through use of the IBPB command, described in Section
> 2.5.3, “Indirect Branch Predictor Barrier (IBPB)”.
> 
> So maybe it's poorly written, but I see nothing in that language that
> suggests that IBRS=1 (on a CPU without enhanced IBRS) provides any
> guarantees at all about who can or cannot control speculation of
> indirect branches in user mode.

there is no such guarantee. Some of the IBRS implementations will actually 
flush rather than disable, or flush parts and disable other parts.

yes the wording is a bit cryptic, but it's also very explicit about what it 
covers (and the rest is not covered!) and had to allow a few different 
implementations unfortunately.






Re: [PATCH 30/35] x86/speculation: Use Indirect Branch Prediction Barrier in context switch

2018-01-18 Thread Kevin Easton
On Thu, Jan 18, 2018 at 04:38:32PM -0800, Tim Chen wrote:
> On 01/18/2018 05:48 AM, Peter Zijlstra wrote:
> >
> >+/*
> >+ * Avoid user/user BTB poisoning by flushing the branch predictor
> >+ * when switching between processes. This stops one process from
> >+ * doing spectre-v2 attacks on another process's data.
> >+ */
> >+indirect_branch_prediction_barrier();
> >+
> 
> Some optimizations can be done here to avoid overhead in barrier call.
> 
> For example, don't do the barrier if prev and next mm are
> the same.  If the two processes trust each other, or the new process
> already has rights to look into the previous process,
> the barrier could be skipped.

Isn't it the other way around with the BTB poisoning? previous is
potentially attacking next, so the barrier can be avoided only if previous
is allowed to ptrace next?

- Kevin
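
The direction of the check raised above can be sketched as follows. This
is a toy model, not kernel code: struct task, may_inspect() and
need_ibpb() are hypothetical, and the uid comparison merely stands in for
a ptrace-style permission test. It encodes the reading that prev is the
potential attacker, so the barrier is skipped only when prev could
already inspect next.

```c
#include <stdbool.h>

/* Hypothetical, minimal view of a task for this sketch. */
struct task {
	int mm_id;	/* identifies the address space */
	int uid;	/* 0 is treated as fully privileged */
};

/* Stand-in for a ptrace-style check: may tracer look into target? */
static bool may_inspect(const struct task *tracer, const struct task *target)
{
	return tracer->uid == 0 || tracer->uid == target->uid;
}

/* Decide whether a predictor barrier is needed when switching prev -> next. */
static bool need_ibpb(const struct task *prev, const struct task *next)
{
	if (prev->mm_id == next->mm_id)
		return false;	/* same address space, nothing to protect */

	/* prev may have poisoned the predictor to attack next */
	return !may_inspect(prev, next);
}
```

Note the asymmetry: a switch from an unprivileged task to a privileged
one needs the barrier in this model, while the reverse does not.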


Re: [PATCH 23/35] x86/speculation: Add basic speculation control code

2018-01-18 Thread Andy Lutomirski
On Thu, Jan 18, 2018 at 5:41 PM, Andrea Arcangeli  wrote:
> Hello,
>
> On Thu, Jan 18, 2018 at 03:25:25PM -0800, Andy Lutomirski wrote:
>> I read the whitepaper that documented the new MSRs a couple days ago
>> and I'm now completely unable to find it.  If anyone could send the
>> link, that would be great.
>
> I see Andrew posted a link.
>
>> From memory, however, the docs were quite clear that leaving
>> IBRS set when entering user mode or guest mode does not guarantee any
>> particular protection unless an additional CPUID bit (the name of
>> which I forget) is set, and that current CPUs will *not* get that bit
>
> My current understanding is that with SPEC_CTRL alone set in cpuid,
> IBRS is meaningful, other bits don't matter.
>
>> set by microcode update.  IOW the protection given is that, if you set
>> IBRS bit zero after entry to kernel mode, you are protected until you
>> re-enter user mode.  When you're in user mode, you're not protected.
>
> If you leave IBRS set while in user mode, userland is protected as
> strong as kernel mode.

The link that Andrew posted says:

If software sets IA32_SPEC_CTRL.IBRS to 1 after a transition to a more
privileged predictor mode, predicted targets of indirect branches
executed in that predictor mode with IA32_SPEC_CTRL.IBRS = 1 cannot be
controlled by software that was executed in a less privileged
predictor mode or on another logical processor.

...

Enabling IBRS does not prevent software from controlling the predicted
targets of indirect branches of unrelated software executed later at
the same predictor mode (for example, between two different user
applications, or two different virtual machines). Such isolation can
be ensured through use of the IBPB command, described in Section
2.5.3, “Indirect Branch Predictor Barrier (IBPB)”.

So maybe it's poorly written, but I see nothing in that language that
suggests that IBRS=1 (on a CPU without enhanced IBRS) provides any
guarantees at all about who can or cannot control speculation of
indirect branches in user mode.

>
> When you return to kernel mode you've to call IBRS again even if it
> was left set, because there's a higher-privilege mode change. That's
> equivalent to calling only IBPB and leaving STIBP set (only way to
> understand the locations where IBRS has to be set is to imagine IBRS
> as a strict "STIBP; IBPB").

Then Intel should improve their spec to say so.

> IBRS Q/A:
>
> 1) When to write IBRS in SPEC_CTRL? -> imagine it as "STIBP; IBPB"
>
> 2) When to leave IBRS set or when to call IBPB -> imagine the previous
> setting of IBRS as temporarily disabling indirect branch prediction
> without an IBPB implicit in IBRS
>
> If you think it only like 1) you risk missing some places where you've
> to write IBRS even if it was already set.
>
> If you think it only like 2) you risk clearing it too early or you
> risk missing a necessary IBPB.
>
> It has to be thought simultaneously in both ways.

From your description, it sounds like what's actually happening is:

When IBRS is enabled, indirect branch speculation targets cannot be
controlled by code that executed on a different logical processor or
by code that executed prior to the most recent time that IBRS was set
to 1.  Additionally, if the CPU supports enhanced IBRS, then indirect
branch speculation targets cannot be controlled by code that executed
at a less privileged predictor mode.

Is that actually correct?  If so, could Intel please *say* so?
Because writing voodoo magic security code seriously sucks.

>
> The sure thing I get from the specs is IBRS always implies STIBP (even
> when STIBP is a noop), specs are pretty explicit about that.

Ah, it says:

As noted in Section 2.5.1, “Indirect Branch Restricted Speculation
(IBRS)”, enabling IBRS prevents software operating on one logical
processor from controlling the predicted targets of indirect branches
executed on another logical processor. For that reason, it is not
necessary to enable STIBP when IBRS is enabled.

So I guess we can write 1 when we enter the kernel, but we probably
want to write 2 instead of 0 when we exit.
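
For readers decoding the "1" and "2" above: IBRS is bit 0 and STIBP is
bit 1 of IA32_SPEC_CTRL. A minimal sketch of the values being proposed,
with hypothetical helper names and no actual MSR write (a real kernel
would use wrmsr on MSR 0x48):

```c
#include <stdint.h>

/* Bit layout of IA32_SPEC_CTRL as described in the Intel whitepaper. */
#define SPEC_CTRL_IBRS	(UINT64_C(1) << 0)	/* value 1 */
#define SPEC_CTRL_STIBP	(UINT64_C(1) << 1)	/* value 2 */

/* Hypothetical helper: value to write right after entering the kernel. */
static uint64_t spec_ctrl_on_kernel_entry(void)
{
	return SPEC_CTRL_IBRS;
}

/*
 * Hypothetical helper: value to write before returning to user mode.
 * Writing STIBP instead of 0 keeps sibling-thread protection while
 * dropping IBRS.
 */
static uint64_t spec_ctrl_on_kernel_exit(void)
{
	return SPEC_CTRL_STIBP;
}
```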


Re: [RFC PATCH] blk-mq: fixup RESTART when queue becomes idle

2018-01-18 Thread Jens Axboe
On 1/18/18 7:32 PM, Ming Lei wrote:
> On Thu, Jan 18, 2018 at 01:11:01PM -0700, Jens Axboe wrote:
>> On 1/18/18 11:47 AM, Bart Van Assche wrote:
>>>> This is all very tiresome.
>>>
>>> Yes, this is tiresome. It is very annoying to me that others keep
>>> introducing so many regressions in such important parts of the kernel.
>>> It is also annoying to me that I get blamed if I report a regression
>>> instead of seeing that the regression gets fixed.
>>
>> I agree, it sucks that any change there introduces the regression. I'm
>> fine with doing the delay insert again until a new patch is proven to be
>> better.
> 
> That way is still buggy as I explained, since rerun queue before adding
> request to hctx->dispatch_list isn't correct. Who can make sure the request
> is visible when __blk_mq_run_hw_queue() is called?

That race basically doesn't exist for a 10ms gap.

> Not to mention this way will cause performance regression again.

How so? It's _exactly_ the same as what you are proposing, except mine
will potentially run the queue when it need not do so. But given that
these are random 10ms queue kicks because we are screwed, it should not
matter. The key point is that it only should be if we have NO better
options. If it's a frequently occurring event that we have to return
BLK_STS_RESOURCE, then we need to get a way to register an event for
when that condition clears. That event will then kick the necessary
queue(s).

>> From the original topic of this email, we have conditions that can cause
>> the driver to not be able to submit an IO. A set of those conditions can
>> only happen if IO is in flight, and those cases we have covered just
>> fine. Another set can potentially trigger without IO being in flight.
>> These are cases where a non-device resource is unavailable at the time
>> of submission. This might be iommu running out of space, for instance,
>> or it might be a memory allocation of some sort. For these cases, we
>> don't get any notification when the shortage clears. All we can do is
>> ensure that we restart operations at some point in the future. We're SOL
>> at that point, but we have to ensure that we make forward progress.
> 
> Right, it is a generic issue, not DM-specific one, almost all drivers
> call kmalloc(GFP_ATOMIC) in IO path.

GFP_ATOMIC basically never fails, unless we are out of memory. The
exception is higher order allocations. If a driver has a higher order
atomic allocation in its IO path, the device driver writer needs to be
taken out behind the barn and shot. Simple as that. It will NEVER work
well in a production environment. Witness the disaster that so many NIC
driver writers have learned.

This is NOT the case we care about here. It's resources that are more
readily depleted because other devices are using them. If it's a high
frequency or generally occurring event, then we simply must have a
callback to restart the queue from that. The condition then becomes
identical to device private starvation, the only difference being from
where we restart the queue.

> IMO, there is enough time for figuring out a generic solution before
> 4.16 release.

I would hope so, but the proposed solutions have not filled me with
a lot of confidence in the end result so far.

>> That last set of conditions better not be a common occurrence, since
>> performance is down the toilet at that point. I don't want to introduce
>> hot path code to rectify it. Have the driver return if that happens in a
>> way that is DIFFERENT from needing a normal restart. The driver knows if
>> this is a resource that will become available when IO completes on this
>> device or not. If we get that return, we have a generic run-again delay.
> 
> Now most of times both NVMe and SCSI won't return BLK_STS_RESOURCE, and
> it should be DM-only which returns STS_RESOURCE so often.

Where does the dm STS_RESOURCE error usually come from - what exact
resource are we running out of?

-- 
Jens Axboe
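
The split described in this message, between shortages that clear when
IO completes on the device and shortages that give no notification, can
be sketched as a small policy function. All names below are hypothetical
and chosen for this illustration only; the 10ms figure is the delay
mentioned in the thread.

```c
/* Hypothetical driver return codes for this sketch. */
enum dispatch_status {
	DISPATCH_OK,
	DISPATCH_BUSY_DEVICE,	/* clears when IO completes on this device */
	DISPATCH_BUSY_EXTERNAL,	/* e.g. IOMMU space or memory, no notification */
};

/* What the core should do with the queue after a dispatch attempt. */
enum requeue_action {
	REQUEUE_NONE,
	REQUEUE_ON_COMPLETION,	/* normal restart when an IO finishes */
	REQUEUE_AFTER_DELAY,	/* generic run-again delay */
};

#define RERUN_DELAY_MS 10u

static enum requeue_action requeue_policy(enum dispatch_status st,
					  unsigned int *delay_ms)
{
	*delay_ms = 0;

	switch (st) {
	case DISPATCH_BUSY_DEVICE:
		return REQUEUE_ON_COMPLETION;
	case DISPATCH_BUSY_EXTERNAL:
		/* nothing will wake us up, so kick the queue later */
		*delay_ms = RERUN_DELAY_MS;
		return REQUEUE_AFTER_DELAY;
	case DISPATCH_OK:
	default:
		return REQUEUE_NONE;
	}
}
```

Keeping the delayed path behind a distinct return code means the hot path
for ordinary restarts stays untouched.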



Re: [GIT PULL] IMA bug fix for 4.16

2018-01-18 Thread James Morris
On Thu, 18 Jan 2018, Mimi Zohar wrote:

> Hi James,
> 
> Sorry, here's one last patch for 4.16.
> 

Thanks, merged to:

git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git
next-testing
next-integrity


-- 
James Morris




Re: [PATCH v4 12/13] iommu/rockchip: Add runtime PM support

2018-01-18 Thread Tomasz Figa
On Thu, Jan 18, 2018 at 8:52 PM, Jeffy Chen  wrote:
> When the power domain is powered off, the IOMMU cannot be accessed and
> register programming must be deferred until the power domain becomes
> enabled.
>
> Add runtime PM support, and use runtime PM device link from IOMMU to
> master to startup and shutdown IOMMU.
>
> Signed-off-by: Jeffy Chen 
> ---
>
> Changes in v4: None
> Changes in v3:
> Only call startup() and shutdown() when iommu attached.
> Remove pm_mutex.
> Check runtime PM disabled.
> Check pm_runtime in rk_iommu_irq().
>
> Changes in v2: None
>
>  drivers/iommu/rockchip-iommu.c | 180 
> -
>  1 file changed, 141 insertions(+), 39 deletions(-)
>
> diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
> index 2c095f96c033..e2e7acc3039d 100644
> --- a/drivers/iommu/rockchip-iommu.c
> +++ b/drivers/iommu/rockchip-iommu.c
> @@ -22,6 +22,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>
> @@ -99,6 +100,7 @@ struct rk_iommu {
>  };
>
>  struct rk_iommudata {
> +   struct device_link *link; /* runtime PM link from IOMMU to master */
> struct rk_iommu *iommu;
>  };
>
> @@ -583,7 +585,11 @@ static irqreturn_t rk_iommu_irq(int irq, void *dev_id)
> u32 int_status;
> dma_addr_t iova;
> irqreturn_t ret = IRQ_NONE;
> -   int i;
> +   int i, err;
> +
> +   err = pm_runtime_get_if_in_use(iommu->dev);
> +   if (err <= 0 && err != -EINVAL)
> +   return ret;
>
> WARN_ON(rk_iommu_enable_clocks(iommu));
>
> @@ -635,6 +641,9 @@ static irqreturn_t rk_iommu_irq(int irq, void *dev_id)
>
> rk_iommu_disable_clocks(iommu);
>
> +   if (pm_runtime_enabled(iommu->dev))
> +   pm_runtime_put(iommu->dev);

I think this might be racy. There are some places where
pm_runtime_enable/disable() are called on devices implicitly and I'm
not sure if we're guaranteed that they don't happen between our
pm_runtime_get_if_in_use() and pm_runtime_enabled() calls.

An example of a race-free solution would be to save the
pm_runtime_get_if_in_use() result to a local variable (e.g. bool
need_runtime_put) and then call pm_runtime_put() based on that.

> +
> return ret;
>  }
>
> @@ -676,10 +685,20 @@ static void rk_iommu_zap_iova(struct rk_iommu_domain 
> *rk_domain,
> spin_lock_irqsave(&rk_domain->iommus_lock, flags);
> list_for_each(pos, &rk_domain->iommus) {
> struct rk_iommu *iommu;
> +   int ret;
> +
> iommu = list_entry(pos, struct rk_iommu, node);
> -   rk_iommu_enable_clocks(iommu);
> -   rk_iommu_zap_lines(iommu, iova, size);
> -   rk_iommu_disable_clocks(iommu);
> +
> +   /* Only zap TLBs of IOMMUs that are powered on. */
> +   ret = pm_runtime_get_if_in_use(iommu->dev);
> +   if (ret > 0 || ret == -EINVAL) {
> +   rk_iommu_enable_clocks(iommu);
> +   rk_iommu_zap_lines(iommu, iova, size);
> +   rk_iommu_disable_clocks(iommu);
> +   }
> +
> +   if (ret > 0)
> +   pm_runtime_put(iommu->dev);

This one nicely avoids the race I mentioned above. :)

> }
> spin_unlock_irqrestore(&rk_domain->iommus_lock, flags);
>  }
> @@ -882,22 +901,30 @@ static struct rk_iommu *rk_iommu_from_dev(struct device 
> *dev)
> return data ? data->iommu : NULL;
>  }
>
[snip]
> +   spin_lock_irqsave(&rk_domain->iommus_lock, flags);
> +   list_add_tail(&iommu->node, &rk_domain->iommus);
> +   spin_unlock_irqrestore(&rk_domain->iommus_lock, flags);
>
> -   dev_dbg(dev, "Detached from iommu domain\n");
> +   ret = pm_runtime_get_if_in_use(iommu->dev);
> +   if (ret <= 0 && ret != -EINVAL)
> +   return 0;
> +
> +   ret = rk_iommu_startup(iommu);
> +   if (ret)
> +   rk_iommu_detach_device(iommu->domain, dev);
> +
> +   if (pm_runtime_enabled(iommu->dev))
> +   pm_runtime_put(iommu->dev);

Here we should also probably act based on what
pm_runtime_get_if_in_use() returned rather than asking
pm_runtime_enabled().

Best regards,
Tomasz
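
The race-free pattern suggested in this review (remember what the "get"
returned instead of asking whether runtime PM is enabled later) can be
modelled in userspace. The fake_* helpers below are stubs written for
this sketch; they only imitate the documented return values of
pm_runtime_get_if_in_use() (-EINVAL when runtime PM is disabled, 1 when a
reference was taken, 0 otherwise). The enable_midway flag is an invented
hook that simulates runtime PM being enabled between the two calls.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device model: just the two fields this sketch needs. */
struct fake_dev {
	bool pm_enabled;
	int usage;
};

/* Stub imitating the return values of pm_runtime_get_if_in_use(). */
static int fake_get_if_in_use(struct fake_dev *d)
{
	if (!d->pm_enabled)
		return -EINVAL;
	if (d->usage > 0) {
		d->usage++;
		return 1;
	}
	return 0;
}

static void fake_put(struct fake_dev *d)
{
	d->usage--;
}

/* Returns true if the handler was allowed to touch the hardware. */
static bool handle_irq(struct fake_dev *d, bool enable_midway)
{
	int ret = fake_get_if_in_use(d);
	bool need_runtime_put = ret > 0;	/* decided once, up front */

	if (ret <= 0 && ret != -EINVAL)
		return false;			/* powered off, nothing to do */

	if (enable_midway)
		d->pm_enabled = true;		/* simulated concurrent enable */

	/* ... hardware access would happen here ... */

	if (need_runtime_put)
		fake_put(d);			/* balanced with the get above */
	return true;
}
```

With a pm_runtime_enabled()-style check at the end, the enable_midway
case would drop a reference that was never taken; the local variable
keeps get and put paired.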


Re: [PATCH v8 5/5] document: add document for kaslr_mem

2018-01-18 Thread Baoquan He
On 01/19/18 at 11:36am, Chao Fan wrote:
> Signed-off-by: Chao Fan 
> ---
>  Documentation/admin-guide/kernel-parameters.txt | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> b/Documentation/admin-guide/kernel-parameters.txt
> index e2de7c006a74..28a879f62560 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2350,6 +2350,16 @@
>   allocations which rules out almost all kernel
>   allocations. Use with caution!
>  
> + kaslr_mem=nn[KMG][@ss[KMG]]
> + [KNL] Force usage of a specific region of memory
> + for KASLR during kernel decompression stage.
> + Region of usable memory is from ss to ss+nn. If ss
> + is omitted, it is equivalent to kaslr_mem=nn[KMG]@0.
> + Multiple regions can be specified, comma delimited.
> + Notice: we support 4 regions at most now.

Better not to use 'we' here. You can refer to kernel-parameters.txt.

> + Example:
> + kaslr_mem=1G,500M@2G,1G@4G
> +
>   MTD_Partition=  [MTD]
>   Format: ,,,
>  
> -- 
> 2.14.3
> 
> 
> 


Re: [PATCH v8 3/5] x86/KASLR: Give a warning if movable_node specified without kaslr_mem=

2018-01-18 Thread Baoquan He
On 01/19/18 at 11:31am, Chao Fan wrote:
> Specifying 'movable_node' without 'kaslr_mem=' may break memory
> hotplug, so recommend that users specify 'kaslr_mem=' whenever
> 'movable_node' is given.
> 
> Signed-off-by: Chao Fan 
> ---
>  arch/x86/boot/compressed/kaslr.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/x86/boot/compressed/kaslr.c 
> b/arch/x86/boot/compressed/kaslr.c
> index b200a7ceafc1..8703cc764306 100644
> --- a/arch/x86/boot/compressed/kaslr.c
> +++ b/arch/x86/boot/compressed/kaslr.c
> @@ -282,6 +282,16 @@ static int handle_mem_filter(void)
>   !strstr(args, "kaslr_mem="))
>   return 0;

Looks good to me.

Acked-by: Baoquan He 

>  
> +#ifdef CONFIG_MEMORY_HOTPLUG
> + /*
> +  * Check if 'kaslr_mem=' specified when 'movable_node' found. If not,
> +  * just give a warning. Otherwise memory hotplug could be
> +  * affected if kernel is put on movable memory regions.
> +  */
> + if (strstr(args, "movable_node") && !strstr(args, "kaslr_mem="))
> + warn("'kaslr_mem=' should be specified when using 'movable_node'.\n");
> +#endif
> +
>   tmp_cmdline = malloc(len + 1);
>   if (!tmp_cmdline)
>   error("Failed to allocate space for tmp_cmdline");
> -- 
> 2.14.3
> 
> 
> 


Re: [PATCH v8 4/5] x86/KASLR: Skip memory mirror handling if movable_node specified

2018-01-18 Thread Baoquan He
On 01/19/18 at 11:33am, Chao Fan wrote:
> In the kernel proper, the mirror feature is skipped if movable_node is
> specified, so skip the mirror feature in KASLR as well.
> 
> Signed-off-by: Chao Fan 
> ---
>  arch/x86/boot/compressed/kaslr.c | 7 +++
>  1 file changed, 7 insertions(+)

Ack.

Acked-by: Baoquan He 

> 
> diff --git a/arch/x86/boot/compressed/kaslr.c 
> b/arch/x86/boot/compressed/kaslr.c
> index 8703cc764306..e4b487f0b7af 100644
> --- a/arch/x86/boot/compressed/kaslr.c
> +++ b/arch/x86/boot/compressed/kaslr.c
> @@ -692,6 +692,7 @@ static bool
>  process_efi_entries(unsigned long minimum, unsigned long image_size)
>  {
>   struct efi_info *e = &boot_params->efi_info;
> + char *args = (char *)get_cmd_line_ptr();
>   bool efi_mirror_found = false;
>   struct mem_vector region;
>   efi_memory_desc_t *md;
> @@ -725,6 +726,12 @@ process_efi_entries(unsigned long minimum, unsigned long 
> image_size)
>   }
>   }
>  
> +#ifdef CONFIG_MEMORY_HOTPLUG
> + /* Skip memory mirror if 'movable_node' specified */
> + if (strstr(args, "movable_node"))
> + efi_mirror_found = false;
> +#endif
> +
>   for (i = 0; i < nr_desc; i++) {
>   md = efi_early_memdesc_ptr(pmap, e->efi_memdesc_size, i);
>  
> -- 
> 2.14.3
> 
> 
> 


Re: [PATCH v22 2/3] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_VQ

2018-01-18 Thread Wei Wang

On 01/18/2018 12:44 AM, Michael S. Tsirkin wrote:

On Wed, Jan 17, 2018 at 01:10:11PM +0800, Wei Wang wrote:






+{
+   struct scatterlist sg;
+   unsigned int unused;
+   int err;
+
+   sg_init_one(&sg, addr, sizeof(uint32_t));

This passes a guest-endian value to host. This is a problem:
should always pass LE values.


I think the endianness is handled when virtqueue_add_outbuf():

desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);

right?




+
+   /*
+* This handles the cornercase that the vq happens to be full when
+* adding a cmd id. Rarely happen in practice.
+*/
+   while (!vq->num_free)
+   virtqueue_get_buf(vq, &unused);

I dislike this busy-waiting. It's a hint after all -
why not just retry later - hopefully after getting an
interrupt?

Alternatively, stop adding more entries when we have a single
ring entry left, making sure we have space for the command.


I think the second one looks good. Thanks.
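
The second option agreed on above (stop adding hint entries while one
slot is still free, so the command id always fits) can be sketched with a
simple counter. struct ring and both helpers are hypothetical; the real
driver would consult vq->num_free instead.

```c
#include <stdbool.h>

/* Hypothetical ring model: only tracks how many slots are in use. */
struct ring {
	unsigned int size;
	unsigned int used;
};

static unsigned int ring_free(const struct ring *r)
{
	return r->size - r->used;
}

/* Free-page hints must always leave one slot for the command id. */
static bool ring_add_hint(struct ring *r)
{
	if (ring_free(r) <= 1)
		return false;
	r->used++;
	return true;
}

/* The command id may use the reserved last slot. */
static bool ring_add_cmd(struct ring *r)
{
	if (ring_free(r) == 0)
		return false;
	r->used++;
	return true;
}
```

Because hints are only best-effort, refusing one when the ring is nearly
full costs nothing, and the busy-wait loop is no longer needed.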


+   queue_work(system_freezable_wq,
+  &vb->update_balloon_size_work);
+   spin_unlock_irqrestore(&vb->stop_update_lock, flags);
+   }
+
+   virtio_cread(vb->vdev, struct virtio_balloon_config,
+free_page_report_cmd_id, &cmd_id);

You want virtio_cread_feature, don't access the new field
if the feature has not been negotiated.


Right. We probably need to put all the following cmd id related things 
under the feature check,


How about

if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_VQ)) {
	virtio_cread(..);
	if (cmd_id == VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID) {
		...
	}
}






+   if (cmd_id == VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID) {
+   WRITE_ONCE(vb->report_free_page, false);
+   } else if (cmd_id != vb->start_cmd_id) {
+   /*
+* Host requests to start the reporting by sending a new cmd
+* id.
+*/
+   WRITE_ONCE(vb->report_free_page, true);

I don't know why we bother with WRITE_ONCE here.  The point of
report_free_page being used lockless is that it's not a big deal if
it's wrong occasionally, right?


Actually the main reason is that "vb->report_free_page" is a value 
shared by two threads:
Written by the config_change here, and read by the worker thread that 
reports the free pages.


Alternatively, we could let the two sides access to the shared variable 
with "volatile" pointers.








+   vb->start_cmd_id = cmd_id;
+   queue_work(vb->balloon_wq, &vb->report_free_page_work);

It seems that if a command was already queued (with a different id),
this will result in new command id being sent to host twice, which will
likely confuse the host.


I think that case won't happen, because
- the host sends a cmd id to the guest via the config, while the guest 
acks back the received cmd id via the virtqueue;
- the guest acks back a cmd id only when a new cmd id is received from
the host, which is the check above:


if (cmd_id != vb->start_cmd_id) { --> the driver queues the reporting work only when a new cmd id is received
	/*
	 * Host requests to start the reporting by sending a
	 * new cmd id.
	 */
	WRITE_ONCE(vb->report_free_page, true);
	vb->start_cmd_id = cmd_id;
	queue_work(vb->balloon_wq, &vb->report_free_page_work);
}

So the same cmd id wouldn't queue the reporting work twice.
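
A small model of that argument, under loud assumptions: struct balloon,
STOP_ID and on_config_change() are hypothetical, and the queued counter
stands in for queue_work(). It only shows that a repeated id does not
queue work a second time; it does not cover the case of a different id
arriving while earlier work is still pending.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical value for the "stop reporting" command id. */
#define STOP_ID 0u

/* Hypothetical slice of the balloon state used by this sketch. */
struct balloon {
	uint32_t start_cmd_id;
	bool report_free_page;
	int queued;		/* counts simulated queue_work() calls */
};

static void on_config_change(struct balloon *vb, uint32_t cmd_id)
{
	if (cmd_id == STOP_ID) {
		vb->report_free_page = false;
	} else if (cmd_id != vb->start_cmd_id) {
		/* a new id from the host starts a new reporting round */
		vb->report_free_page = true;
		vb->start_cmd_id = cmd_id;
		vb->queued++;
	}
}
```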







+   }
+}
+
  static void update_balloon_size(struct virtio_balloon *vb)
  {
u32 actual = vb->num_pages;
@@ -417,40 +513,113 @@ static void update_balloon_size_func(struct work_struct 
*work)
  
  static int init_vqs(struct virtio_balloon *vb)

  {
-   struct virtqueue *vqs[3];
-   vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request 
};
-   static const char * const names[] = { "inflate", "deflate", "stats" };
-   int err, nvqs;
+   struct virtqueue **vqs;
+   vq_callback_t **callbacks;
+   const char **names;
+   struct scatterlist sg;
+   int i, nvqs, err = -ENOMEM;
+
+   /* Inflateq and deflateq are used unconditionally */
+   nvqs = 2;
+   if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ))
+   nvqs++;
+   if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_VQ))
+   nvqs++;
+
+   /* Allocate space for find_vqs parameters */
+   vqs = kcalloc(nvqs, sizeof(*vqs), GFP_KERNEL);
+   if (!vqs)
+   goto err_vq;
+   callbacks = kmalloc_array(nvqs, sizeof(*callbacks), GFP_KERNEL);
+   if (!callbacks)
+   goto err_callback;
+   names = kmalloc_array(nvqs, sizeof(*names), GFP_KERNEL);
+   if (!names)
+   goto err_names;

Why not just keep these 3 arrays on stack

Re: [PATCH v2 07/11] arm64: Add skeleton to harden the branch predictor against aliasing attacks

2018-01-18 Thread Li Kun

Hi Will,


在 2018/1/17 18:07, Will Deacon 写道:

On Wed, Jan 17, 2018 at 12:10:33PM +0800, Yisheng Xie wrote:

Hi Will,

On 2018/1/5 21:12, Will Deacon wrote:

diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index 5f7097d0cd12..d99b36555a16 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -246,6 +246,8 @@ asmlinkage void post_ttbr_update_workaround(void)
"ic iallu; dsb nsh; isb",
ARM64_WORKAROUND_CAVIUM_27456,
CONFIG_CAVIUM_ERRATUM_27456));
+
+   arm64_apply_bp_hardening();
  }

post_ttbr_update_workaround was used to fix Cavium erratum 27456, so does that
mean that if we do not have this erratum, we do not need
arm64_apply_bp_hardening() at mm switch and kernel exit?

From the code logic, it no longer seems related only to erratum 27456.
Should it be renamed?

post_ttbr_update_workaround just runs code after a TTBR update, which
includes mitigations against variant 2 of "spectre" and also a workaround
for a Cavium erratum. These are separate issues.

But AFAIU, according to the theory of spectre, we don't need to clear
the BTB every time we return to user space, do we?
If we enable CONFIG_ARM64_SW_TTBR0_PAN, there will be a call to
arm64_apply_bp_hardening every time the kernel exits to EL0.

kernel_exit
post_ttbr_update_workaround
arm64_apply_bp_hardening


Will

___
linux-arm-kernel mailing list
linux-arm-ker...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


--
Best Regards
Li Kun



[PATCH v8 5/5] document: add document for kaslr_mem

2018-01-18 Thread Chao Fan
Signed-off-by: Chao Fan 
---
 Documentation/admin-guide/kernel-parameters.txt | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index e2de7c006a74..28a879f62560 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2350,6 +2350,16 @@
allocations which rules out almost all kernel
allocations. Use with caution!
 
+   kaslr_mem=nn[KMG][@ss[KMG]]
+   [KNL] Force usage of a specific region of memory
+   for KASLR during kernel decompression stage.
+   Region of usable memory is from ss to ss+nn. If ss
+   is omitted, it is equivalent to kaslr_mem=nn[KMG]@0.
+   Multiple regions can be specified, comma delimited.
+   Notice: we support 4 regions at most now.
+   Example:
+   kaslr_mem=1G,500M@2G,1G@4G
+
MTD_Partition=  [MTD]
Format: ,,,
 
-- 
2.14.3
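
To make the documented syntax concrete, here is a hedged userspace sketch
of parsing the value that follows "kaslr_mem=". It is not the code from
this series: struct region, parse_size() and parse_kaslr_mem() are
hypothetical names, and the suffix handling only mimics the usual K/M/G
convention. Omitting @ss yields start 0, and the caller passes the
region limit (4 in the text above).

```c
#include <stdlib.h>

/* Hypothetical result type: one usable region [start, start + size). */
struct region {
	unsigned long long start;
	unsigned long long size;
};

/* Parse a number with an optional K/M/G suffix; *end is advanced past it. */
static unsigned long long parse_size(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);
	unsigned int shift = 0;

	switch (**end) {
	case 'K': case 'k': shift = 10; break;
	case 'M': case 'm': shift = 20; break;
	case 'G': case 'g': shift = 30; break;
	default: break;
	}
	if (shift) {
		v <<= shift;
		(*end)++;
	}
	return v;
}

/*
 * Parse "nn[KMG][@ss[KMG]]" entries separated by commas.
 * Returns the number of regions, or -1 on a syntax error or overflow
 * of the region limit.
 */
static int parse_kaslr_mem(const char *arg, struct region *out, int max)
{
	const char *p = arg;
	int n = 0;

	while (*p) {
		char *end;

		if (n == max)
			return -1;	/* more regions than supported */

		out[n].size = parse_size(p, &end);
		if (end == p)
			return -1;	/* no digits where a size was expected */

		out[n].start = 0;	/* default when "@ss" is omitted */
		if (*end == '@') {
			const char *s = end + 1;

			out[n].start = parse_size(s, &end);
			if (end == s)
				return -1;
		}
		n++;

		if (*end == ',')
			end++;
		else if (*end != '\0')
			return -1;
		p = end;
	}
	return n;
}
```

For the example from the patch, "1G,500M@2G,1G@4G", this sketch produces
three regions: 1G at 0, 500M at 2G, and 1G at 4G.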





Re: [PATCH 6/8] staging: lustre: Fix overlong lines

2018-01-18 Thread Dilger, Andreas
On Jan 11, 2018, at 10:17, Fabian Huegel  wrote:
> 
> Fixed four lines that went over the 80 character limit
> to reduce checkpatch warnings.
> 
> Signed-off-by: Fabian Huegel 
> Signed-off-by: Christoph Volkert 
> ---
> drivers/staging/lustre/lustre/include/obd_class.h | 14 ++
> 1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lustre/include/obd_class.h 
> b/drivers/staging/lustre/lustre/include/obd_class.h
> index d195866..06f825b 100644
> --- a/drivers/staging/lustre/lustre/include/obd_class.h
> +++ b/drivers/staging/lustre/lustre/include/obd_class.h
> @@ -850,7 +850,9 @@ static inline int obd_pool_del(struct obd_device *obd, 
> char *poolname)
>   return rc;
> }
> 
> -static inline int obd_pool_add(struct obd_device *obd, char *poolname, char 
> *ostname)
> +static inline int obd_pool_add(struct obd_device *obd,
> +char *poolname,
> +char *ostname)

This only needs a single field moved onto the next line, like:

+static inline int obd_pool_add(struct obd_device *obd, char *poolname,
+  char *ostname)


> @@ -861,7 +863,9 @@ static inline int obd_pool_add(struct obd_device *obd, 
> char *poolname, char *ost
>   return rc;
> }
> 
> -static inline int obd_pool_rem(struct obd_device *obd, char *poolname, char 
> *ostname)
> +static inline int obd_pool_rem(struct obd_device *obd,
> +char *poolname,
> +char *ostname)

Same.

> @@ -997,7 +1001,8 @@ static inline int obd_statfs(const struct lu_env *env, 
> struct obd_export *exp,
>   spin_unlock(&obd->obd_osfs_lock);
>   }
>   } else {
> - CDEBUG(D_SUPER, "%s: use %p cache blocks %llu/%llu objects 
> %llu/%llu\n",
> + CDEBUG(D_SUPER,
> +"%s: use %p cache blocks %llu/%llu objects %llu/%llu\n",
>  obd->obd_name, &obd->obd_osfs,
>  obd->obd_osfs.os_bavail, obd->obd_osfs.os_blocks,
>  obd->obd_osfs.os_ffree, obd->obd_osfs.os_files);
> @@ -1579,7 +1584,8 @@ int class_procfs_init(void);
> int class_procfs_clean(void);
> 
> /* prng.c */
> -#define ll_generate_random_uuid(uuid_out) get_random_bytes(uuid_out, 
> sizeof(class_uuid_t))
> +#define ll_generate_random_uuid(uuid_out) \
> + get_random_bytes(uuid_out, sizeof(class_uuid_t))

This looks like it would be better to replace ll_generate_random_uuid()
callers with generate_random_uuid().

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









[PATCH v8 4/5] x86/KASLR: Skip memory mirror handling if movable_node specified

2018-01-18 Thread Chao Fan
In the kernel proper, the mirror feature is skipped if movable_node is
specified, so skip the mirror feature in KASLR as well.

Signed-off-by: Chao Fan 
---
 arch/x86/boot/compressed/kaslr.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 8703cc764306..e4b487f0b7af 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -692,6 +692,7 @@ static bool
 process_efi_entries(unsigned long minimum, unsigned long image_size)
 {
struct efi_info *e = &boot_params->efi_info;
+   char *args = (char *)get_cmd_line_ptr();
bool efi_mirror_found = false;
struct mem_vector region;
efi_memory_desc_t *md;
@@ -725,6 +726,12 @@ process_efi_entries(unsigned long minimum, unsigned long 
image_size)
}
}
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+   /* Skip memory mirror if 'movable_node' specified */
+   if (strstr(args, "movable_node"))
+   efi_mirror_found = false;
+#endif
+
for (i = 0; i < nr_desc; i++) {
md = efi_early_memdesc_ptr(pmap, e->efi_memdesc_size, i);
 
-- 
2.14.3





[PATCH v8 3/5] x86/KASLR: Give a warning if movable_node specified without kaslr_mem=

2018-01-18 Thread Chao Fan
Specifying 'movable_node' without 'kaslr_mem=' may break memory
hotplug, so recommend that users specify 'kaslr_mem=' whenever
'movable_node' is given.

Signed-off-by: Chao Fan 
---
 arch/x86/boot/compressed/kaslr.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index b200a7ceafc1..8703cc764306 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -282,6 +282,16 @@ static int handle_mem_filter(void)
!strstr(args, "kaslr_mem="))
return 0;
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+   /*
+* Check if 'kaslr_mem=' specified when 'movable_node' found. If not,
+* just give a warning. Otherwise memory hotplug could be
+* affected if kernel is put on movable memory regions.
+*/
+   if (strstr(args, "movable_node") && !strstr(args, "kaslr_mem="))
+   warn("'kaslr_mem=' should be specified when using 'movable_node'.\n");
+#endif
+
tmp_cmdline = malloc(len + 1);
if (!tmp_cmdline)
error("Failed to allocate space for tmp_cmdline");
-- 
2.14.3
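
The condition added by this patch is easy to exercise outside the boot
code. The helper name below is hypothetical; the two strstr() tests are
the same as in the hunk above. Because strstr() matches substrings, any
argument that merely contains "movable_node" also triggers the check.

```c
#include <stdbool.h>
#include <string.h>

/* True when 'movable_node' is present but no 'kaslr_mem=' was given. */
static bool movable_node_needs_warning(const char *args)
{
	return strstr(args, "movable_node") != NULL &&
	       strstr(args, "kaslr_mem=") == NULL;
}
```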





Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes

2018-01-18 Thread Steven Rostedt
On Fri, 19 Jan 2018 11:37:13 +0900
Byungchul Park  wrote:

> On 1/19/2018 12:21 AM, Steven Rostedt wrote:
> > On Thu, 18 Jan 2018 13:01:46 +0900
> > Byungchul Park  wrote:
> >   
> >>> I disagree. It is like a spinlock. You can say a spinlock() that is
> >>> blocked is also waiting for an event. That event being the owner does a
> >>> spin_unlock().  
> >>
> >> That's exactly what I was saying. Excuse me but, I don't understand
> >> what you want to say. Could you explain more? What do you disagree?  
> > 
> > I guess I'm confused at what you are asking for then.  
> 
> Sorry for not enough explanation. What I asked you for is:
> 
> 1. Relocate acquire()s/release()s.
> 2. So make it simpler and remove unnecessary one.
> 3. So make it look like the following form,
>because it's a thing simulating "wait and event".
> 
>A context
>-
>lock_map_acquire(wait); /* Or lock_map_acquire_read(wait) */
>/* "Read" one is better though..*/

Why? I'm assuming you are talking about adding this to the current
owner of the console_owner? This is a mutually exclusive section, no
parallel access. Why the Read?

> 
>/* A section, we suspect a wait for an event might happen. */
>...
> 
>lock_map_release(wait);
> 
>The place actually doing the wait
>-
>lock_map_acquire(wait);
>lock_map_release(wait);
> 
>wait_for_event(wait); /* Actually do the wait */
> 
> Honestly, you used acquire()s/release()s as if they are cross-
> release stuff which mainly handles general waits and events,
> not only things doing "acquire -> critical area -> release".
> But that's not in the mainline at the moment.

Maybe it is more like that. Because, the thing I'm doing is passing off
a semaphore ownership to the waiter.

From a previous email:

> > +   if (spin) {
> > +   /* We spin waiting for the owner to release us */
> > +   spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
> > +   /* Owner will clear console_waiter on hand off */
> > +   while (READ_ONCE(console_waiter))
> > +   cpu_relax();
> > +
> > +   spin_release(&console_owner_dep_map, 1, _THIS_IP_);
> 
> Why don't you move this over "while (READ_ONCE(console_waiter))" and
> right after acquire()?
> 
> As I said last time, only acquisitions between acquire() and release()
> are meaningful. Are you taking care of acquisitions within cpu_relax()?
> If so, leave it.

There are no acquisitions between acquire and release. To get to
"if (spin)" the acquire had to have already been done. If it was released,
this spinner is now the new "owner". There's no race with anyone else.
But it doesn't technically have it till console_waiter is set to NULL.
Why would we call release() before that? Or maybe I'm missing something.

Or are you just saying that it doesn't matter if it is before or after
the while() loop, to just put it before? Does it really matter?

-- Steve


Re: [RFC][PATCH] get rid of the use of set_fs() (by way of kernel_recvmsg()) in sunrpc

2018-01-18 Thread Al Viro
On Thu, Jan 18, 2018 at 07:31:56PM +, Al Viro wrote:

> * SIOCADDRT/SIOCDELRT in compat ioctls

To bring back a question I'd asked back in October - what do
we do about SIOC...RT compat?

To recap:
* AF_INET sockets expect struct rtentry; it differs
between 32bit and 64bit, so routing_ioctl() in net/socket.c
is called from compat_sock_ioctl_trans() and does the right
thing.  All proto_ops instances with .family = PF_INET (and
only they) have inet_ioctl() as ->ioctl(), and end up with
ip_rt_ioctl() called for native ones.  Three of those have
->compat_ioctl() set to inet_compat_ioctl(), the rest have
it NULL.  In any case, inet_compat_ioctl() ignores those,
leaving them to compat_sock_ioctl_trans() to pick up.
* for AF_INET6 the situation is similar, except that
they use struct in6_rtmsg.  Compat is also dealt with in
routing_ioctl().  inet6_ioctl() is ->ioctl() for all such
proto_ops (and only those); ipv6_route_ioctl() is what ends up
handling the native ones.  No ->compat_ioctl() in any
of those.
* AF_PACKET sockets expect struct rtentry and
actually bounce the native calls to inet_ioctl().  No
->compat_ioctl() there, but routing_ioctl() in net/socket.c
does the right thing.
* AF_APPLETALK sockets expect struct rtentry.
Native handled in atrtr_ioctl(); there is ->compat_ioctl(),
but it ignores those ioctls, so we go through the conversion
in net/socket.c.  Also happens to work correctly.

* ax25, ipx, netrom, rose and x25 use structures
of their own, and those structures have identical layouts on
32bit and 64bit.  x25 has ->compat_ioctl() that does the
right thing (bounces to native), the rest either have
->compat_ioctl() ignoring those ioctls (ipx) or do not
have ->compat_ioctl() at all.  That ends up with generic
code picking those and buggering them up - routing_ioctl()
assumes that we want either in6_rtmsg (ipv6) or rtentry
(everything else).  Unfortunately, in case of these
protocols we should just leave the suckers alone.
Back then Ralf has verified that the bug exists
and said he'd put together a fix.  Looks like that fix
has fallen through the cracks, though.

* all other protocols fail those; usually with
ENOTTY, except for AF_QIPCRTR that fails with EINVAL.
Either way, compat is not an issue.

Note that handling of SIOCADDRT on e.g. raw ipv4
sockets from 32bit process is convoluted as hell.  The
call chain is
compat_sys_ioctl()
	compat_sock_ioctl()
		inet_compat_ioctl()
			compat_raw_ioctl()
				=> -ENOIOCTLCMD, possibly
				   by way of ipmr_compat_ioctl()
		compat_sock_ioctl_trans()
			routing_ioctl() [conversion done here]
				sock_do_ioctl()
					inet_ioctl()
						ip_rt_ioctl()
A lot of those are method calls, BTW, and the overhead on those has
just grown...

Does anybody have objections against the following?

1) Somewhere in net/core (or net/compat.c, for that matter) add
int compat_get_rtentry(struct rtentry *r, struct rtentry32 __user *p);

2) In inet_compat_ioctl() recognize SIOC{ADD,DEL}RT and do
err = compat_get_rtentry(&r, (void __user *)arg);
if (!err)
err = ip_rt_ioctl(...)
return err;

3) Add inet_compat_ioctl() as ->compat_ioctl in all PF_INET proto_ops.

4) Lift copyin from atrtr_ioctl() to atalk_ioctl(), teach
atalk_compat_ioctl() about these ioctls (using compat_get_rtentry()
and atrtr_ioctl(), that is).

5) Add ->compat_ioctl() to AF_PACKET, let it just call inet_compat_ioctl()
for those two.

6) Lift copyin from ipv6_route_ioctl() to inet6_ioctl().  
Add inet6_compat_ioctl() that would recognize those two, do compat copyin
and call ipv6_route_ioctl().  Make it ->compat_ioctl for all PF_INET6
proto_ops.

7) Tell compat_sock_ioctl_trans() to move these two into the "just call
sock_do_ioctl()" group of cases.  Or, with Ralf's fix, just remove these
two cases from compat_sock_ioctl_trans() completely.  Either way,
routing_ioctl() becomes dead code and can be removed.
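
As a rough illustration of step 1, here is a userspace model of the kind of
field-by-field widening such a helper would do. Everything in it is an
assumption made for the example: the two structures are trimmed-down
stand-ins (not the real struct rtentry or its 32-bit counterpart), and
model_get_rtentry() is a made-up name. A real helper would read from
userspace and convert the pointer with compat_ptr() instead of a plain cast.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down 32-bit layout: pointers and longs are 32 bits. */
struct route_req32 {
	uint32_t pad1;
	uint16_t flags;
	int16_t  metric;
	uint32_t dev;		/* user pointer as seen by a 32-bit process */
	uint32_t mtu;
};

/* Hypothetical native counterpart with pointer-sized fields. */
struct route_req {
	unsigned long  pad1;
	unsigned short flags;
	short          metric;
	char          *dev;
	unsigned long  mtu;
};

/* Field-by-field widening; returns 0 on success, -1 on bad arguments. */
int model_get_rtentry(struct route_req *r, const struct route_req32 *p)
{
	if (!r || !p)
		return -1;

	memset(r, 0, sizeof(*r));
	r->pad1 = p->pad1;
	r->flags = p->flags;
	r->metric = p->metric;
	/* Widen the 32-bit pointer value; kernel code would use compat_ptr(). */
	r->dev = (char *)(uintptr_t)p->dev;
	r->mtu = p->mtu;
	return 0;
}
```

The point of doing the conversion once, up front, is that the native handler
then only ever sees the native layout, which is what steps 2 to 6 rely on.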


Re: [PATCH v4 07/13] ARM: dts: rockchip: add clocks in vop iommu nodes

2018-01-18 Thread Tomasz Figa
On Thu, Jan 18, 2018 at 8:52 PM, Jeffy Chen  wrote:
> Add clocks in vop iommu nodes, since we are going to control clocks in
> rockchip iommu driver.
>
> Signed-off-by: Jeffy Chen 
> ---
>
> Changes in v4: None
> Changes in v3: None
> Changes in v2: None
>
>  arch/arm/boot/dts/rk3036.dtsi | 2 ++
>  arch/arm/boot/dts/rk3288.dtsi | 4 
>  2 files changed, 6 insertions(+)
>
> diff --git a/arch/arm/boot/dts/rk3036.dtsi b/arch/arm/boot/dts/rk3036.dtsi
> index 3b704cfed69a..95b0ebc7a40f 100644
> --- a/arch/arm/boot/dts/rk3036.dtsi
> +++ b/arch/arm/boot/dts/rk3036.dtsi
> @@ -197,6 +197,8 @@
> reg = <0x10118300 0x100>;
> interrupts = ;
> interrupt-names = "vop_mmu";
> +   clocks = <&cru ACLK_LCDC>, <&cru SCLK_LCDC>, <&cru HCLK_LCDC>;
> +   clock-names = "aclk_vop", "dclk_vop", "hclk_vop";

We should remove clock-names from IOMMU nodes. The Rockchip IOMMU
bindings don't define clock names and only the clocks property should
be given.

Not to mention that the names currently listed are not good examples:
they name SoC clock controller outputs rather than device inputs.

Best regards,
Tomasz


[Patch V2] KVM/x86: Fix references to CR0.PG and CR4.PAE in kvm_valid_sregs()

2018-01-18 Thread Tianyu Lan
kvm_valid_sregs() should use X86_CR0_PG and X86_CR4_PAE to check bit
status rather than X86_CR0_PG_BIT and X86_CR4_PAE_BIT. This patch is
to fix it.

Fixes: f29810335965 ("KVM/x86: Check input paging mode when cs.l is set")
Reported-by: Jeremi Piotrowski 
Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Tianyu Lan 
---
Change since v1:
   Rename title and fix change log.
---
 arch/x86/kvm/x86.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1cec2c6..c53298d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7496,13 +7496,13 @@ EXPORT_SYMBOL_GPL(kvm_task_switch);
 
 int kvm_valid_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
 {
-   if ((sregs->efer & EFER_LME) && (sregs->cr0 & X86_CR0_PG_BIT)) {
+   if ((sregs->efer & EFER_LME) && (sregs->cr0 & X86_CR0_PG)) {
/*
 * When EFER.LME and CR0.PG are set, the processor is in
 * 64-bit mode (though maybe in a 32-bit code segment).
 * CR4.PAE and EFER.LMA must be set.
 */
-   if (!(sregs->cr4 & X86_CR4_PAE_BIT)
+   if (!(sregs->cr4 & X86_CR4_PAE)
|| !(sregs->efer & EFER_LMA))
return -EINVAL;
} else {
-- 
2.7.4
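
To make the difference between the two macro families concrete: the *_BIT
names are bit positions, while the unsuffixed names are the masks built from
them. The sketch below is a userspace illustration only. The constants are
redefined locally with the values the kernel headers use (bit 31 for CR0.PG,
bit 8 for EFER.LME), the helper names are made up for this example, and only
the first condition of kvm_valid_sregs() is modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies for illustration; the kernel defines these in its own headers. */
#define X86_CR0_PG_BIT	31				/* bit position */
#define X86_CR0_PG	(UINT64_C(1) << X86_CR0_PG_BIT)	/* mask */
#define EFER_LME	(UINT64_C(1) << 8)

/* Mirrors the check before the fix: ANDs cr0 with the bit *number* (31). */
bool long_mode_paging_buggy(uint64_t efer, uint64_t cr0)
{
	return (efer & EFER_LME) && (cr0 & X86_CR0_PG_BIT);
}

/* Mirrors the check after the fix: ANDs cr0 with the mask for bit 31. */
bool long_mode_paging_fixed(uint64_t efer, uint64_t cr0)
{
	return (efer & EFER_LME) && (cr0 & X86_CR0_PG);
}
```

With cr0 equal to X86_CR0_PG alone, the first helper does not see paging as
enabled, because 31 only covers the low five bits. With cr0 = 0x1 (only PE
set) it reports paging as enabled even though CR0.PG is clear.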



RE: [PATCH] USB TYPEC: RT1711H Type-C Chip Driver

2018-01-18 Thread Jun Li
Hi
> -Original Message-
> From: linux-usb-ow...@vger.kernel.org [mailto:linux-usb-
> ow...@vger.kernel.org] On Behalf Of ShuFanLee
> Sent: Wednesday, January 10, 2018 2:59 PM
> To: heikki.kroge...@linux.intel.com
> Cc: cy_hu...@richtek.com; shufan_...@richtek.com; linux-
> ker...@vger.kernel.org; linux-...@vger.kernel.org
> Subject: [PATCH] USB TYPEC: RT1711H Type-C Chip Driver
> 
> From: ShuFanLee 
> 
> Richtek RT1711H Type-C chip driver that works with
> Type-C Port Controller Manager to provide USB PD and
> USB Type-C functionalities.

A general question: is this RT1711H Type-C chip compatible with TCPCI
(Universal Serial Bus Type-C Port Controller Interface Specification)?
It looks like it has the same register map plus some extensions. Can
the existing ./drivers/staging/typec/tcpci.c basically work for you?

+Guenter

Li Jun 

> 
> Signed-off-by: ShuFanLee 
> ---
>  .../devicetree/bindings/usb/richtek,rt1711h.txt|   38 +
>  arch/arm64/boot/dts/hisilicon/rt1711h.dtsi |   11 +
>  drivers/usb/typec/Kconfig  |2 +
>  drivers/usb/typec/Makefile |1 +
>  drivers/usb/typec/rt1711h/Kconfig  |7 +
>  drivers/usb/typec/rt1711h/Makefile |2 +
>  drivers/usb/typec/rt1711h/rt1711h.c| 2241 
> 
>  drivers/usb/typec/rt1711h/rt1711h.h|  300 +++
>  8 files changed, 2602 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/usb/richtek,rt1711h.txt
>  create mode 100644 arch/arm64/boot/dts/hisilicon/rt1711h.dtsi
>  create mode 100644 drivers/usb/typec/rt1711h/Kconfig
>  create mode 100644 drivers/usb/typec/rt1711h/Makefile
>  create mode 100644 drivers/usb/typec/rt1711h/rt1711h.c
>  create mode 100644 drivers/usb/typec/rt1711h/rt1711h.h
> 


Re: [PATCH 2/2] blk-mq: simplify queue mapping & schedule with each possisble CPU

2018-01-18 Thread jianchao.wang
Hi Ming,

Sorry for the delay in reporting this.

On 01/17/2018 05:57 PM, Ming Lei wrote:
> 2) hctx->next_cpu can become offline from online before __blk_mq_run_hw_queue
> is run, there isn't warning, but once the IO is submitted to hardware,
> after it is completed, how does the HBA/hw queue notify CPU since CPUs
> assigned to this hw queue(irq vector) are offline? blk-mq's timeout
> handler may cover that, but looks too tricky.

In theory, the irq affinity will be migrated to other cpu. This is done by
fixup_irqs() in the context of stop_machine.
However, in my test, I found this log:

[  267.161043] do_IRQ: 7.33 No irq handler for vector

33 is the vector used by the nvme cq.
The irq seems to be missed, and sometimes an IO hang occurred.
It does not happen every time, maybe because of nvme_process_cq in nvme_queue_rq.

I added a dump_stack() after the error log and got the following:
[  267.161043] do_IRQ: 7.33 No irq handler for vector migration/7
[  267.161045] CPU: 7 PID: 52 Comm: migration/7 Not tainted 4.15.0-rc7+ #27
[  267.161045] Hardware name: LENOVO 10MLS0E339/3106, BIOS M1AKT22A 06/27/2017
[  267.161046] Call Trace:
[  267.161047]  
[  267.161052]  dump_stack+0x7c/0xb5
[  267.161054]  do_IRQ+0xb9/0xf0
[  267.161056]  common_interrupt+0xa2/0xa2
[  267.161057]  
[  267.161059] RIP: 0010:multi_cpu_stop+0xb0/0x120
[  267.161060] RSP: 0018:bb6c81af7e70 EFLAGS: 0202 ORIG_RAX: 
ffde
[  267.161061] RAX: 0001 RBX: 0004 RCX: 
[  267.161062] RDX: 0006 RSI: 898c4591 RDI: 0202
[  267.161063] RBP: bb6c826e7c88 R08: 991abc1256bc R09: 0005
[  267.161063] R10: bb6c81af7db8 R11: 89c91d20 R12: 0001
[  267.161064] R13: bb6c826e7cac R14: 0003 R15: 
[  267.161067]  ? cpu_stop_queue_work+0x90/0x90
[  267.161068]  cpu_stopper_thread+0x83/0x100
[  267.161070]  smpboot_thread_fn+0x161/0x220
[  267.161072]  kthread+0xf5/0x130
[  267.161073]  ? sort_range+0x20/0x20
[  267.161074]  ? kthread_associate_blkcg+0xe0/0xe0
[  267.161076]  ret_from_fork+0x24/0x30

The irq occurred right after irqs were re-enabled in multi_cpu_stop.

0x8112d655 is in multi_cpu_stop 
(/home/will/u04/source_code/linux-block/kernel/stop_machine.c:223).
218  */
219 touch_nmi_watchdog();
220 }
221 } while (curstate != MULTI_STOP_EXIT);
222 
223 local_irq_restore(flags);
224 return err;
225 }


Thanks
Jianchao


Re: [PATCH v7 5/5] document: add document for kaslr_mem

2018-01-18 Thread Baoquan He
On 01/17/18 at 06:53pm, Chao Fan wrote:
> Signed-off-by: Chao Fan 
> ---
>  Documentation/admin-guide/kernel-parameters.txt | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> b/Documentation/admin-guide/kernel-parameters.txt
> index e2de7c006a74..f6d5adde1a73 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2350,6 +2350,16 @@
>   allocations which rules out almost all kernel
>   allocations. Use with caution!
>  
> + kaslr_mem=nn[KMG][@ss[KMG]]
> + [KNL] Force usage of a specific region of memory.
[KNL] Force usage of a specific region of memory
for KASLR during kernel decompression stage.
Region of memory to be used is from ss to ss+nn.
If ss is omitted, it is equivalent to 
kaslr_mem=nn[KMG]@0.
Multiple regions can be specified, comma delimited.
Notice: only support 4 regions at most now.
Example:
kaslr_mem=1G,500M@2G,1G@4G

I tried rewriting the doc as above, just for reference.

> + Make some features, like memory hotplug and 1G huge
> + page work well with KASLR. Region of usable memory is
> + from ss to ss+nn. If ss is omitted, it defaults to 0.
> + Multiple regions can be specified, comma delimited.
> + Notice: we support 4 regions at most now.
> + Example:
> + kaslr_mem=1G,500M@2G,1G@4G
> +
>   MTD_Partition=  [MTD]
>   Format: ,,,
>  
> -- 
> 2.14.3
> 
> 
> 


Re: [RFC PATCH] e1000e: Remove Other from EIAC.

2018-01-18 Thread Shrikrishna Khare


On Thu, 18 Jan 2018, Benjamin Poirier wrote:

> On 2018/01/18 15:50, Benjamin Poirier wrote:
> > It was reported that emulated e1000e devices in vmware esxi 6.5 Build
> > 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver
> > overrun interrupt bursts", v4.15-rc1). Some tracing shows that after
> > e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other()
> > on emulated e1000e devices. In comparison, on real e1000e 82574 hardware,
> > icr=0x8004 (_INT_ASSERTED | _OTHER) in the same situation.
> > 
> > Some experimentation showed that this flaw in vmware e1000e emulation can
> > be worked around by not setting Other in EIAC. This is how it was before
> > 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1).
> 
> vmware folks, please comment.

Thank you for bringing this to our attention.

Using the reported build (ESX 6.5, 7526125) and 4.15.0-rc8+ kernel (which 
has the said patch), I could bring up e1000e interface (version: 3.2.6-k),
get dhcp address and even do large file downloads without difficulty.

Could you give us more pointers on how we may be able to reproduce this 
locally? Was there anything different with the configuration when the 
issue was observed? Is the issue consistently reproducible?

Thanks,
Shri


Re: [PATCH net-next 3/5] net: hns3: add ethtool -p support for phy device

2018-01-18 Thread lipeng (Y)



On 2018/1/18 22:25, Andrew Lunn wrote:

+static int hclge_set_led_status_phy(struct phy_device *phydev, int value)
+{
+   int ret, cur_page;
+
+   mutex_lock(&phydev->lock);
+
+   ret = phy_read(phydev, HCLGE_PHY_PAGE_REG);
+   if (ret < 0)
+   goto out;
+   else
+   cur_page = ret;
+
+   ret = phy_write(phydev, HCLGE_PHY_PAGE_REG, HCLGE_PHY_PAGE_LED);
+   if (ret)
+   goto out;
+
+   ret = phy_write(phydev, HCLGE_LED_FC_REG, value);
+   if (ret)
+   goto out;
+
+   ret = phy_write(phydev, HCLGE_PHY_PAGE_REG, cur_page);
+
+out:
+   mutex_unlock(&phydev->lock);
+   return ret;
+}

Sorry, but NACK.

Please add an interface to phylib and the phy driver you are using to
do this.


  #define HCLGE_PHY_PAGE_MDIX   0
  #define HCLGE_PHY_PAGE_COPPER 0
+#define HCLGE_PHY_PAGE_LED 3
  
  /* Page Selection Reg. */

  #define HCLGE_PHY_PAGE_REG    22
@@ -73,6 +74,15 @@
  /* Copper Specific Status Register */
  #define HCLGE_PHY_CSS_REG 17
  
+/* LED Function Control Register */

+#define HCLGE_LED_FC_REG   16
+
+/* LED Polarity Control Register */
+#define HCLGE_LED_PC_REG   17
+
+#define HCLGE_LED_FORCE_ON 9
+#define HCLGE_LED_FORCE_OFF    8
+

By the looks of these defines, you assume you have a Marvell PHY.
Please make this generic so anybody with a Marvell PHY can use it.

Andrew

Hi Andrew,

As you suggested, we need to add an interface to phylib and the phy driver.
We will resubmit this patch after we address your comments, so we will
drop it from the V2 patch-set.

Thanks
Peng Li








Re: [PATCH] scsi: fas216: fix sense buffer initialization

2018-01-18 Thread Martin K. Petersen

Arnd,

> While testing with the ARM specific memset() macro removed, I ran
> into a compiler warning that shows an old bug:
>
> drivers/scsi/arm/fas216.c: In function 'fas216_rq_sns_done':
> drivers/scsi/arm/fas216.c:2014:40: error: argument to 'sizeof' in 'memset' 
> call is the same expression as the destination; did you mean to provide an 
> explicit length? [-Werror=sizeof-pointer-memaccess]
>
> It turns out that the definition of the scsi_cmd structure changed back
> in linux-2.6.25, so now we clear only four bytes (sizeof(pointer)) instead
> of 96 (SCSI_SENSE_BUFFERSIZE). I did not check whether we actually need
> to initialize the buffer here, but it's clear that if we do it, we
> should use the correct size.

Applied to 4.16/scsi-queue. Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering
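
For readers who have not hit this class of bug: once a member changes from an
array to a pointer, sizeof() on it yields the pointer size rather than the
buffer size. The sketch below is a userspace model under stated assumptions,
not the driver code: struct cmd_model and both helpers are made-up names, and
SENSE_BUFFERSIZE stands in for SCSI_SENSE_BUFFERSIZE (96, as quoted above).
The size is stored in a variable first so that the compiler diagnostic quoted
above does not fire on the example itself.

```c
#include <stddef.h>
#include <string.h>

#define SENSE_BUFFERSIZE 96	/* stands in for SCSI_SENSE_BUFFERSIZE */

/* Hypothetical stand-in for the command structure. */
struct cmd_model {
	unsigned char *sense_buffer;	/* a pointer, no longer an array */
};

/* Clears only sizeof(pointer) bytes; returns how many were cleared. */
size_t clear_sense_buggy(struct cmd_model *cmd)
{
	size_t n = sizeof(cmd->sense_buffer);	/* size of the pointer */

	memset(cmd->sense_buffer, 0, n);
	return n;
}

/* Clears the whole buffer by passing an explicit length. */
size_t clear_sense_fixed(struct cmd_model *cmd)
{
	memset(cmd->sense_buffer, 0, SENSE_BUFFERSIZE);
	return SENSE_BUFFERSIZE;
}
```

On a 32-bit ARM build the first helper clears four bytes, matching the
description above; on a 64-bit host it clears eight. Either way the tail of
the buffer is left untouched.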


[PATCH] ecryptfs: lookup: Don't check if mount_crypt_stat is NULL

2018-01-18 Thread Guenter Roeck
mount_crypt_stat is assigned to
&ecryptfs_superblock_to_private(ecryptfs_dentry->d_sb)->mount_crypt_stat,
and mount_crypt_stat is not the first object in struct ecryptfs_sb_info.
mount_crypt_stat is therefore never NULL. At the same time, no crash
in ecryptfs_lookup() has been reported, and the lookup functions in
other file systems don't check if d_sb is NULL either.
Given that, remove the NULL check.

Signed-off-by: Guenter Roeck 
---
 fs/ecryptfs/inode.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 847904aa63a9..97d17eaeba07 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -395,8 +395,7 @@ static struct dentry *ecryptfs_lookup(struct inode *ecryptfs_dir_inode,
 
mount_crypt_stat = &ecryptfs_superblock_to_private(
ecryptfs_dentry->d_sb)->mount_crypt_stat;
-   if (mount_crypt_stat
-   && (mount_crypt_stat->flags & ECRYPTFS_GLOBAL_ENCRYPT_FILENAMES)) {
+   if (mount_crypt_stat->flags & ECRYPTFS_GLOBAL_ENCRYPT_FILENAMES) {
rc = ecryptfs_encrypt_and_encode_filename(
&encrypted_and_encoded_name, &len,
mount_crypt_stat, name, len);
-- 
2.7.4



Re: [Resend Patch] KVM/x86: Fix wrong macro references of X86_CR0_PG_BIT and X86_CR4_PAE_BIT in kvm_valid_sregs()

2018-01-18 Thread Lan Tianyu
Hi Eric,
Thanks a lot for your review.
On Thu, Jan 18, 2018 at 10:39:04AM -0800, Eric Biggers wrote:
> On Tue, Jan 16, 2018 at 05:34:07PM +0800, Tianyu Lan wrote:
> > kvm_valid_sregs() should use X86_CR0_PG and X86_CR4_PAE to check bit
> > status rather than X86_CR0_PG_BIT and X86_CR4_PAE_BIT. This patch is
> > to fix it.
> > 
> > Fixes: f29810335965a(KVM/x86: Check input paging mode when cs.l is set)
> > Reported-by: Jeremi Piotrowski 
> > Cc: Paolo Bonzini 
> > Cc: Radim Krčmář 
> > Signed-off-by: Tianyu Lan 
> > ---
> > Sorry for noise. Missed kvm maillist.
> > 
> >  arch/x86/kvm/x86.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 1cec2c6..c53298d 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -7496,13 +7496,13 @@ EXPORT_SYMBOL_GPL(kvm_task_switch);
> >  
> >  int kvm_valid_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
> >  {
> > -   if ((sregs->efer & EFER_LME) && (sregs->cr0 & X86_CR0_PG_BIT)) {
> > +   if ((sregs->efer & EFER_LME) && (sregs->cr0 & X86_CR0_PG)) {
> > /*
> >  * When EFER.LME and CR0.PG are set, the processor is in
> >  * 64-bit mode (though maybe in a 32-bit code segment).
> >  * CR4.PAE and EFER.LMA must be set.
> >  */
> > -   if (!(sregs->cr4 & X86_CR4_PAE_BIT)
> > +   if (!(sregs->cr4 & X86_CR4_PAE)
> > || !(sregs->efer & EFER_LMA))
> > return -EINVAL;
> > } else {
> > -- 
> > 2.7.4
> > 
> 
> I came across this too and was just about to send the exact same patch.  It
> looks good to me as long as the bits it's supposed to be checking were correct
> in the first place.  Patch title could maybe be shortened a bit, e.g.
> "KVM/x86: Fix references to CR0.PG and CR4.PAE in kvm_valid_sregs()".  The
> "Fixes:" line is also formatted incorrectly.

That would be better; I will update it.

> 
> Thanks,
> 
> Eric


Re: [PATCH v7 2/5] x86/KASLR: Handle the memory regions specified in kaslr_mem

2018-01-18 Thread Baoquan He
On 01/17/18 at 06:53pm, Chao Fan wrote:
> If no 'kaslr_mem=' specified, just handle the e820/efi entries directly
> as before. Otherwise, limit kernel to memory regions specified in
> 'kaslr_mem=' commandline.
> 
> Rename process_mem_region to slots_count to match
> slots_fetch_random, and name new function as process_mem_region.
> 
> Signed-off-by: Chao Fan 
> ---
>  arch/x86/boot/compressed/kaslr.c | 64 
> +---
>  1 file changed, 53 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
> index b21741135673..b200a7ceafc1 100644
> --- a/arch/x86/boot/compressed/kaslr.c
> +++ b/arch/x86/boot/compressed/kaslr.c
> @@ -548,9 +548,9 @@ static unsigned long slots_fetch_random(void)
>   return 0;
>  }

Looks good, ack.

Acked-by: Baoquan He 

>  
> -static void process_mem_region(struct mem_vector *entry,
> -unsigned long minimum,
> -unsigned long image_size)
> +static void slots_count(struct mem_vector *entry,
> + unsigned long minimum,
> + unsigned long image_size)
>  {
>   struct mem_vector region, overlap;
>   struct slot_area slot_area;
> @@ -627,6 +627,52 @@ static void process_mem_region(struct mem_vector *entry,
>   }
>  }
>  
> +static bool process_mem_region(struct mem_vector region,
> +unsigned long long minimum,
> +unsigned long long image_size)
> +{
> + /*
> +  * If kaslr_mem= specified, walk all the regions, and
> +  * filter the intersection to slots_count.
> +  */
> + if (num_usable_region > 0) {
> + int i;
> +
> + for (i = 0; i < num_usable_region; i++) {
> + struct mem_vector entry;
> + unsigned long long start, end, entry_end, region_end;
> +
> + start = mem_usable[i].start;
> + end = start + mem_usable[i].size;
> + region_end = region.start + region.size;
> +
> + entry.start = clamp(region.start, start, end);
> + entry_end = clamp(region_end, start, end);
> +
> + if (entry.start < entry_end) {
> + entry.size = entry_end - entry.start;
> + slots_count(&entry, minimum, image_size);
> + }
> +
> + if (slot_area_index == MAX_SLOT_AREA) {
> + debug_putstr("Aborted e820/efi memmap scan (slot_areas full)!\n");
> + return 1;
> + }
> + }
> + return 0;
> + }
> +
> + /*
> +  * If no kaslr_mem stored, use region directly
> +  */
> + slots_count(&region, minimum, image_size);
> + if (slot_area_index == MAX_SLOT_AREA) {
> + debug_putstr("Aborted e820/efi memmap scan (slot_areas full)!\n");
> + return 1;
> + }
> + return 0;
> +}
> +
>  #ifdef CONFIG_EFI
>  /*
>   * Returns true if mirror region found (and must have been processed
> @@ -692,11 +738,9 @@ process_efi_entries(unsigned long minimum, unsigned long image_size)
>  
>   region.start = md->phys_addr;
>   region.size = md->num_pages << EFI_PAGE_SHIFT;
> - process_mem_region(&region, minimum, image_size);
> - if (slot_area_index == MAX_SLOT_AREA) {
> - debug_putstr("Aborted EFI scan (slot_areas full)!\n");
> +
> + if (process_mem_region(region, minimum, image_size))
>   break;
> - }
>   }
>   return true;
>  }
> @@ -723,11 +767,9 @@ static void process_e820_entries(unsigned long minimum,
>   continue;
>   region.start = entry->addr;
>   region.size = entry->size;
> - process_mem_region(&region, minimum, image_size);
> - if (slot_area_index == MAX_SLOT_AREA) {
> - debug_putstr("Aborted e820 scan (slot_areas full)!\n");
> +
> + if (process_mem_region(region, minimum, image_size))
>   break;
> - }
>   }
>  }
>  
> -- 
> 2.14.3
> 
> 
> 


Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes

2018-01-18 Thread Byungchul Park

On 1/19/2018 12:21 AM, Steven Rostedt wrote:

On Thu, 18 Jan 2018 13:01:46 +0900
Byungchul Park  wrote:


I disagree. It is like a spinlock. You can say a spinlock() that is
blocked is also waiting for an event. That event being the owner does a
spin_unlock().


That's exactly what I was saying. Excuse me but, I don't understand
what you want to say. Could you explain more? What do you disagree?


I guess I'm confused at what you are asking for then.


Sorry for not enough explanation. What I asked you for is:

   1. Relocate acquire()s/release()s.
   2. So make it simpler and remove unnecessary one.
   3. So make it look like the following form,
  because it's a thing simulating "wait and event".

  A context
  -
  lock_map_acquire(wait); /* Or lock_map_acquire_read(wait) */
  /* "Read" one is better though..*/

  /* A section, we suspect a wait for an event might happen. */
  ...

  lock_map_release(wait);

  The place actually doing the wait
  -
  lock_map_acquire(wait);
  lock_map_release(wait);

  wait_for_event(wait); /* Actually do the wait */

Honestly, you used acquire()s/release()s as if they are cross-
release stuff which mainly handles general waits and events,
not only things doing "acquire -> critical area -> release".
But that's not in the mainline at the moment.


I find your way confusing. I'm simulating a spinlock not a wait for
completion. A wait for completion usually initiates something then


I used the word *event* instead of *completion*. wait_for_completion()
and complete() are just one example of a waiter/event pair.
Lock and unlock are another example.

The important thing is who waits and who triggers the event. Using the
pair, we can achieve various things, for example:

 1. Synchronization like wait_for_completion() does.
 2. Control exclusively entering into a critical area.
 3. Whatever.


waits for it to complete. This is trying to get into a critical area
but another task is currently in it. It's simulating a spinlock as far
as I can see.


Anyway it's an example of "waiter for an event, and the event".

JFYI, spinning or sleeping does not matter. Those are just methods to

 ^
 whether spinning or sleeping doesn't matter.


achieve a wait. I know you're not talking about this though. It's JFYI.


OK, if it is just FYI.


Actually, the last paragraph is JFYI tho.


-- Steve





--
Thanks,
Byungchul


Re: [PATCH] [RESEND] megaraid: use ktime_get_real for firmware time

2018-01-18 Thread Martin K. Petersen

Arnd,

> do_gettimeofday() overflows in 2038 on 32-bit architectures and
> is deprecated, so convert this driver to call ktime_get_real()
> directly. This also simplifies the calculation.

Applied to 4.16/scsi-queue. Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH 0/3] hisi_sas: v2 hw LED support

2018-01-18 Thread Martin K. Petersen

John,

> This patchset includes SGPIO support for driving LEDs for boards
> including a SoC (like hip07) with v2 hw.

Applied to 4.16/scsi-queue. Thank you!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v7 1/5] x86/KASLR: Add kaslr_mem=nn[KMG]@ss[KMG]

2018-01-18 Thread Baoquan He
On 01/17/18 at 06:53pm, Chao Fan wrote:
> Introduce a new kernel parameter kaslr_mem=nn[KMG]@ss[KMG] which is used
> by KASLR only during kernel decompression stage.
> 
> Users can use it to specify memory regions where kernel can be randomized
> into. E.g if movable_node specified in kernel cmdline, kernel could be
  ~ remove 'into'
> extracted into those movable regions, this will make memory hotplug fail.
> With the help of 'kaslr_mem=', limit kernel in those immovable regions
> specified.
> 
> Signed-off-by: Chao Fan 
> ---
>  arch/x86/boot/compressed/kaslr.c | 73 
> ++--
>  1 file changed, 70 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
> index 8199a6187251..b21741135673 100644
> --- a/arch/x86/boot/compressed/kaslr.c
> +++ b/arch/x86/boot/compressed/kaslr.c
> @@ -108,6 +108,15 @@ enum mem_avoid_index {
>  
>  static struct mem_vector mem_avoid[MEM_AVOID_MAX];
>  
> +/* Only support at most 4 usable memory regions specified for kaslr */
> +#define MAX_KASLR_MEM_USABLE 4
> +
> +/* Store the usable memory regions for kaslr */
> +static struct mem_vector mem_usable[MAX_KASLR_MEM_USABLE];

The name xx_usable does not sound great, though I don't know what
would be better. Otherwise this patch looks good to me.

Ack it.

Acked-by: Baoquan He 

> +
> +/* The amount of usable regions for kaslr user specify, not more than 4 */
> +static int num_usable_region;
> +
>  static bool mem_overlaps(struct mem_vector *one, struct mem_vector *two)
>  {
>   /* Item one is entirely before item two. */
> @@ -206,7 +215,62 @@ static void mem_avoid_memmap(char *str)
>   memmap_too_large = true;
>  }
>  
> -static int handle_mem_memmap(void)
> +static int parse_kaslr_mem(char *p,
> +unsigned long long *start,
> +unsigned long long *size)
> +{
> + char *oldp;
> +
> + if (!p)
> + return -EINVAL;
> +
> + oldp = p;
> + *size = memparse(p, &p);
> + if (p == oldp)
> + return -EINVAL;
> +
> + switch (*p) {
> + case '@':
> + *start = memparse(p + 1, &p);
> + return 0;
> + default:
> + /*
> +  * If w/o offset, only size specified, kaslr_mem=nn[KMG]
> +  * has the same behaviour as kaslr_mem=nn[KMG]@0. It means
> +  * the region starts from 0.
> +  */
> + *start = 0;
> + return 0;
> + }
> +
> + return -EINVAL;
> +}
> +
> +static void parse_kaslr_mem_regions(char *str)
> +{
> + static int i;
> +
> + while (str && (i < MAX_KASLR_MEM_USABLE)) {
> + int rc;
> + unsigned long long start, size;
> + char *k = strchr(str, ',');
> +
> + if (k)
> + *k++ = 0;
> +
> + rc = parse_kaslr_mem(str, &start, &size);
> + if (rc < 0)
> + break;
> + str = k;
> +
> + mem_usable[i].start = start;
> + mem_usable[i].size = size;
> + i++;
> + }
> + num_usable_region = i;
> +}
> +
> +static int handle_mem_filter(void)
>  {
>   char *args = (char *)get_cmd_line_ptr();
>   size_t len = strlen((char *)args);
> @@ -214,7 +278,8 @@ static int handle_mem_memmap(void)
>   char *param, *val;
>   u64 mem_size;
>  
> - if (!strstr(args, "memmap=") && !strstr(args, "mem="))
> + if (!strstr(args, "memmap=") && !strstr(args, "mem=") &&
> + !strstr(args, "kaslr_mem="))
>   return 0;
>  
>   tmp_cmdline = malloc(len + 1);
> @@ -239,6 +304,8 @@ static int handle_mem_memmap(void)
>  
>   if (!strcmp(param, "memmap")) {
>   mem_avoid_memmap(val);
> + } else if (!strcmp(param, "kaslr_mem")) {
> + parse_kaslr_mem_regions(val);
>   } else if (!strcmp(param, "mem")) {
>   char *p = val;
>  
> @@ -378,7 +445,7 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
>   /* We don't need to set a mapping for setup_data. */
>  
>   /* Mark the memmap regions we need to avoid */
> - handle_mem_memmap();
> + handle_mem_filter();
>  
>  #ifdef CONFIG_X86_VERBOSE_BOOTUP
>   /* Make sure video RAM can be used. */
> -- 
> 2.14.3
> 
> 
> 
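
For readers who want to try the parsing logic outside the boot code, here is a minimal user-space sketch of the "nn[KMG][@ss[KMG]]" handling in the quoted patch. It is an illustration, not the patch itself: parse_size() is a hypothetical stand-in for the kernel's memparse(), and struct region, parse_one(), parse_regions() and MAX_REGIONS are names made up for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_REGIONS 4	/* assumed to mirror the "not more than 4" limit */

struct region {
	unsigned long long start;
	unsigned long long size;
};

/* Hypothetical user-space stand-in for the kernel's memparse(). */
static unsigned long long parse_size(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10; /* fall through */
	case 'M': case 'm':
		v <<= 10; /* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/* Parse one "nn[KMG][@ss[KMG]]" item; a missing offset means start 0. */
static int parse_one(char *p, struct region *r)
{
	char *end;

	assert(r != NULL);
	if (!p)
		return -1;

	r->size = parse_size(p, &end);
	if (end == p)
		return -1;	/* no digits consumed */

	r->start = (*end == '@') ? parse_size(end + 1, &end) : 0;
	return 0;
}

/* Split a writable comma-separated list; returns the number of regions stored. */
static int parse_regions(char *str, struct region *out)
{
	int n = 0;

	while (str && n < MAX_REGIONS) {
		char *next = strchr(str, ',');

		if (next)
			*next++ = '\0';
		if (parse_one(str, &out[n]) < 0)
			break;	/* stop at the first malformed item */
		n++;
		str = next;
	}
	return n;
}
```

One difference from the patch: the patch keeps its index in a static variable, so entries accumulate across several kaslr_mem= options, while this sketch starts from zero on every call.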


Re: [RFC PATCH] blk-mq: fixup RESTART when queue becomes idle

2018-01-18 Thread Ming Lei
On Thu, Jan 18, 2018 at 01:11:01PM -0700, Jens Axboe wrote:
> On 1/18/18 11:47 AM, Bart Van Assche wrote:
> >> This is all very tiresome.
> > 
> > Yes, this is tiresome. It is very annoying to me that others keep
> > introducing so many regressions in such important parts of the kernel.
> > It is also annoying to me that I get blamed if I report a regression
> > instead of seeing that the regression gets fixed.
> 
> I agree, it sucks that any change there introduces the regression. I'm
> fine with doing the delay insert again until a new patch is proven to be
> better.

That approach is still buggy, as I explained: rerunning the queue before
the request has been added to hctx->dispatch_list isn't correct. Who can
guarantee that the request is visible when __blk_mq_run_hw_queue() is called?

Not to mention that it will cause the performance regression again.

> 
> From the original topic of this email, we have conditions that can cause
> the driver to not be able to submit an IO. A set of those conditions can
> only happen if IO is in flight, and those cases we have covered just
> fine. Another set can potentially trigger without IO being in flight.
> These are cases where a non-device resource is unavailable at the time
> of submission. This might be iommu running out of space, for instance,
> or it might be a memory allocation of some sort. For these cases, we
> don't get any notification when the shortage clears. All we can do is
> ensure that we restart operations at some point in the future. We're SOL
> at that point, but we have to ensure that we make forward progress.

Right, it is a generic issue, not a DM-specific one: almost all drivers
call kmalloc(GFP_ATOMIC) in the IO path.

IMO, there is enough time to figure out a generic solution before the
4.16 release.

> 
> That last set of conditions better not be a common occurrence, since
> performance is down the toilet at that point. I don't want to introduce
> hot path code to rectify it. Have the driver return if that happens in a
> way that is DIFFERENT from needing a normal restart. The driver knows if
> this is a resource that will become available when IO completes on this
> device or not. If we get that return, we have a generic run-again delay.

Most of the time neither NVMe nor SCSI returns BLK_STS_RESOURCE, so
DM should be the only driver that returns BLK_STS_RESOURCE this often.

> 
> This basically becomes the same as doing the delay queue thing from DM,
> but just in a generic fashion.

Yeah, it is right.
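
To make the proposal concrete, here is a small sketch of the decision Jens describes: the driver reports whether the missing resource will be freed by an IO completion on this device, and only the other case gets the generic delayed rerun. All names here (submit_status, rerun_action, pick_rerun) are hypothetical and are not existing blk-mq API.

```c
#include <assert.h>

/*
 * Hypothetical status codes. The thread only says the two busy cases
 * must be distinguishable; it does not name them.
 */
enum submit_status {
	SUBMIT_OK,
	SUBMIT_BUSY_DEVICE,	/* freed when IO on this device completes */
	SUBMIT_BUSY_OTHER,	/* e.g. iommu space or a memory allocation */
};

enum rerun_action {
	RERUN_NONE,
	RERUN_ON_COMPLETION,	/* normal restart, driven by a completion */
	RERUN_AFTER_DELAY,	/* generic run-again delay */
};

/* Choose how the queue gets restarted for a given submission result. */
static enum rerun_action pick_rerun(enum submit_status st)
{
	switch (st) {
	case SUBMIT_OK:
		return RERUN_NONE;
	case SUBMIT_BUSY_DEVICE:
		return RERUN_ON_COMPLETION;
	case SUBMIT_BUSY_OTHER:
		return RERUN_AFTER_DELAY;
	}
	assert(0 && "unknown submit status");
	return RERUN_AFTER_DELAY;
}
```

The point of keeping the third case separate is that no completion will ever arrive to restart the queue, so a timer is the only way to guarantee forward progress.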

-- 
Ming


Re: [PATCH] kconfig: Clarify choice dependency propagation

2018-01-18 Thread Masahiro Yamada
Hi Ulf,


2018-01-19 1:58 GMT+09:00 Ulf Magnusson :
> On Thu, Jan 18, 2018 at 5:47 PM, Masahiro Yamada
>  wrote:
>> 2018-01-14 23:12 GMT+09:00 Ulf Magnusson :
>>> It's easy to miss that choices are special-cased to pass on their mode
>>> as the parent dependency.
>>>
>>> No functional changes. Only comments added.
>>>
>>> Signed-off-by: Ulf Magnusson 
>>> ---
>>>  scripts/kconfig/menu.c | 7 +++
>>>  1 file changed, 7 insertions(+)
>>>
>>> diff --git a/scripts/kconfig/menu.c b/scripts/kconfig/menu.c
>>> index 92d3f06cd8a2..53964d911708 100644
>>> --- a/scripts/kconfig/menu.c
>>> +++ b/scripts/kconfig/menu.c
>>> @@ -323,6 +323,13 @@ void menu_finalize(struct menu *parent)
>>> if (menu->sym && menu->sym->type == S_UNKNOWN)
>>> menu_set_type(sym->type);
>>> }
>>> +
>>> +   /*
>>> +* Use the choice itself as the parent dependency of
>>> +* the contained items. This turns the mode of the
>>> +* choice into an upper bound on the visibility of the
>>> +* choice symbols.
>>> +*/
>>
>> Does the last "choice symbols" mean "choice values"?
>> The "choice" itself is a symbol with NULL name,
>> so I'd like to clarify it.
>
> Yep, means the choice values (which are symbols). "Choice values"
> would probably be clearer, yeah, or maybe "choice value symbols".
>
> Should I submit a new version? I'm fine with just a
> 's/symbols/values/' or 's/symbols/value symbols/' otherwise.
>
> Cheers,
> Ulf


It is trivial, so I will locally fix it up.

s/symbols/value symbols/

Thanks!




-- 
Best Regards
Masahiro Yamada


[GIT] Networking

2018-01-18 Thread David Miller

1) Fix BPF divides by zero, from Eric Dumazet and Alexei Starovoitov.

2) Reject stores into bpf context via st and xadd, from Daniel
   Borkmann.

3) Fix a memory leak in TUN, from Cong Wang.

4) Disable RX aggregation on a specific troublesome configuration of
   r8152 in a Dell TB16b dock.

5) Fix sw_ctx leak in tls, from Sabrina Dubroca.

6) Fix program replacement in cls_bpf, from Daniel Borkmann.

7) Fix uninitialized station_info structures in cfg80211, from Johannes
   Berg.

8) Fix miscalculation of transport header offset field in flow
   dissector, from Eric Dumazet.

9) Fix LPM tree leak on failure in mlxsw driver, from Ido Schimmel.

Please pull, thanks a lot!

The following changes since commit 8cbab92dff778e516064c13113ca15d4869ec883:

  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma (2018-01-16 16:47:40 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 

for you to fetch changes up to a0dca10fce42ae82651edbe682b1c637a8ecd365:

  ibmvnic: Fix IPv6 packet descriptors (2018-01-18 21:19:06 -0500)


Alexei Starovoitov (1):
  bpf: fix 32-bit divide by zero

Alexey Kodanev (1):
  ip6_gre: init dev->mtu and dev->hard_header_len correctly

Arnd Bergmann (1):
  fm10k: mark PM functions as __maybe_unused

Christophe Leroy (1):
  net: fs_enet: do not call phy_stop() in interrupts

Cong Wang (1):
  tun: fix a memory leak for tfile->tx_array

Daniel Borkmann (4):
  bpf, arm64: fix stack_depth tracking in combination with tail calls
  bpf: reject stores into ctx via st and xadd
  bpf: fix cls_bpf on filter replace
  bpf: mark dst unknown on inconsistent {s, u}bounds adjustments

David S. Miller (4):
  Merge tag 'linux-can-fixes-for-4.15-20180116' of ssh://gitolite.kernel.org/.../mkl/linux-can
  Merge git://git.kernel.org/.../bpf/bpf
  Merge tag 'wireless-drivers-for-davem-2018-01-17' of git://git.kernel.org/.../kvalo/wireless-drivers
  Merge tag 'linux-can-fixes-for-4.15-20180118' of ssh://gitolite.kernel.org/.../mkl/linux-can

Eric Dumazet (2):
  bpf: fix divides by zero
  flow_dissector: properly cap thoff field

Guenter Roeck (1):
  bcma: Fix 'allmodconfig' and BCMA builds on MIPS targets

Ido Schimmel (1):
  mlxsw: spectrum_router: Free LPM tree upon failure

Ilya Lesokhin (1):
  net/tls: Only attach to sockets in ESTABLISHED state

James Hogan (1):
  ssb: Disable PCI host for PCI_DRIVERS_GENERIC

Johannes Berg (1):
  cfg80211: fix station info handling bugs

Kai-Heng Feng (1):
  r8152: disable RX aggregation on Dell TB16 dock

Marc Kleine-Budde (2):
  can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once
  can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once

Rex Chang (1):
  Net: ethernet: ti: netcp: Fix inbound ping crash if MTU size is greater than 1500

Sabrina Dubroca (3):
  tls: fix sw_ctx leak
  tls: return -EBUSY if crypto_info is already set
  tls: reset crypto_info when do_tls_setsockopt_tx fails

Stephane Grosjean (1):
  can: peak: fix potential bug in packet fragmentation

Thomas Falcon (2):
  ibmvnic: Fix IP offload control buffer
  ibmvnic: Fix IPv6 packet descriptors

Wei Wang (1):
  ipv6: don't let tb6_root node share routes with other node

Wright Feng (1):
  brcmfmac: fix CLM load error for legacy chips when user helper is enabled

Xin Long (1):
  netlink: reset extack earlier in netlink_rcv_skb

 arch/arm64/net/bpf_jit_comp.c                             | 20 ++-
 drivers/bcma/Kconfig                                      |  2 +-
 drivers/net/can/usb/peak_usb/pcan_usb_fd.c                | 21 +--
 drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c     | 16 +++--
 drivers/net/ethernet/freescale/fs_enet/fs_enet.h          |  1 +
 drivers/net/ethernet/ibm/ibmvnic.c                        | 24 -
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c              |  9 ++---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c     | 20 +++
 drivers/net/ethernet/ti/netcp_core.c                      |  2 +-
 drivers/net/tun.c                                         | 15 ++--
 drivers/net/usb/r8152.c                                   | 13 +++
 drivers/net/wireless/broadcom/brcm80211/brcmfmac/common.c |  9 ++---
 drivers/ssb/Kconfig                                       |  2 +-
 kernel/bpf/core.c                                         |  4 +--
 kernel/bpf/verifier.c                                     | 64 +++--
 net/can/af_can.c                                          | 36 ---
 net/core/filter.c                                         |  4 +++
 net/core/flow_dissector.c                                 |  3 +-

Re: [Ocfs2-devel] [PATCH v4 3/3] ocfs2: nowait aio support

2018-01-18 Thread alex chen
Hi Gang,

Looks good to me.

On 2018/1/15 17:08, Gang He wrote:
> Return -EAGAIN if any of the following checks fail for
> direct I/O with the nowait flag:
> - The related locks cannot be acquired immediately.
> - Blocks are not allocated at the write location, so the write would
>   trigger block allocation, which would block the I/O operation.
> 
> Signed-off-by: Gang He 
Reviewed-by: Alex Chen 

> ---
>  fs/ocfs2/dir.c         |   2 +-
>  fs/ocfs2/dlmglue.c     |  20 +++---
>  fs/ocfs2/dlmglue.h     |   2 +-
>  fs/ocfs2/file.c        | 101 +++--
>  fs/ocfs2/mmap.c        |   2 +-
>  fs/ocfs2/ocfs2_trace.h |  10 +++--
>  6 files changed, 104 insertions(+), 33 deletions(-)
> 
> diff --git a/fs/ocfs2/dir.c b/fs/ocfs2/dir.c
> index febe631..ea50901 100644
> --- a/fs/ocfs2/dir.c
> +++ b/fs/ocfs2/dir.c
> @@ -1957,7 +1957,7 @@ int ocfs2_readdir(struct file *file, struct dir_context *ctx)
>  
>   trace_ocfs2_readdir((unsigned long long)OCFS2_I(inode)->ip_blkno);
>  
> - error = ocfs2_inode_lock_atime(inode, file->f_path.mnt, &lock_level);
> + error = ocfs2_inode_lock_atime(inode, file->f_path.mnt, &lock_level, 1);
>   if (lock_level && error >= 0) {
>   /* We release EX lock which used to update atime
>* and get PR lock again to reduce contention
> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> index a68efa3..07e169f 100644
> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -2515,13 +2515,18 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
>  
>  int ocfs2_inode_lock_atime(struct inode *inode,
> struct vfsmount *vfsmnt,
> -   int *level)
> +   int *level, int wait)
>  {
>   int ret;
>  
> - ret = ocfs2_inode_lock(inode, NULL, 0);
> + if (wait)
> + ret = ocfs2_inode_lock(inode, NULL, 0);
> + else
> + ret = ocfs2_try_inode_lock(inode, NULL, 0);
> +
>   if (ret < 0) {
> - mlog_errno(ret);
> + if (ret != -EAGAIN)
> + mlog_errno(ret);
>   return ret;
>   }
>  
> @@ -2533,9 +2538,14 @@ int ocfs2_inode_lock_atime(struct inode *inode,
>   struct buffer_head *bh = NULL;
>  
>   ocfs2_inode_unlock(inode, 0);
> - ret = ocfs2_inode_lock(inode, &bh, 1);
> + if (wait)
> + ret = ocfs2_inode_lock(inode, &bh, 1);
> + else
> + ret = ocfs2_try_inode_lock(inode, &bh, 1);
> +
>   if (ret < 0) {
> - mlog_errno(ret);
> + if (ret != -EAGAIN)
> + mlog_errno(ret);
>   return ret;
>   }
>   *level = 1;
> diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h
> index 05910fc..c83dbb5 100644
> --- a/fs/ocfs2/dlmglue.h
> +++ b/fs/ocfs2/dlmglue.h
> @@ -123,7 +123,7 @@ void ocfs2_refcount_lock_res_init(struct ocfs2_lock_res *lockres,
>  void ocfs2_open_unlock(struct inode *inode);
>  int ocfs2_inode_lock_atime(struct inode *inode,
> struct vfsmount *vfsmnt,
> -   int *level);
> +   int *level, int wait);
>  int ocfs2_inode_lock_full_nested(struct inode *inode,
>struct buffer_head **ret_bh,
>int ex,
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index a1d0510..5d1784a 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -140,6 +140,8 @@ static int ocfs2_file_open(struct inode *inode, struct file *file)
>   spin_unlock(&oi->ip_lock);
>   }
>  
> + file->f_mode |= FMODE_NOWAIT;
> +
>  leave:
>   return status;
>  }
> @@ -2132,12 +2134,12 @@ static int ocfs2_prepare_inode_for_refcount(struct inode *inode,
>  }
>  
>  static int ocfs2_prepare_inode_for_write(struct file *file,
> -  loff_t pos,
> -  size_t count)
> +  loff_t pos, size_t count, int wait)
>  {
> - int ret = 0, meta_level = 0;
> + int ret = 0, meta_level = 0, overwrite_io = 0;
>   struct dentry *dentry = file->f_path.dentry;
>   struct inode *inode = d_inode(dentry);
> + struct buffer_head *di_bh = NULL;
>   loff_t end;
>  
>   /*
> @@ -2145,13 +2147,40 @@ static int ocfs2_prepare_inode_for_write(struct file *file,
>* if we need to make modifications here.
>*/
>   for(;;) {
> - ret = ocfs2_inode_lock(inode, NULL, meta_level);
> + if (wait)
> + ret = ocfs2_inode_lock(inode, NULL, meta_level);
> + else
> + ret = ocfs2_try_inode_lock(inode,
> + overwrite_io ? NULL : &di_bh, meta_level);
>   if (ret < 0) {
>   meta_level = -1;
> - mlog_errno(ret);
> + 

[PATCH] xhci:Fix NULL pointer in xhci debugfs

2018-01-18 Thread Zhengjun Xing
Commit dde634057da7 ("xhci: Fix use-after-free in xhci debugfs") causes a
null pointer dereference while fixing xhci-debugfs usage of ring pointers
that were freed during hibernate.

The fix passed addresses to ring pointers instead, but forgot to do this
change for the xhci_ring_trb_show function.

The address of the ring pointer passed to xhci-debugfs was of a temporary
ring pointer "new_ring" instead of the actual ring "ring" pointer. The
temporary new_ring pointer will be set to NULL later causing the NULL
pointer dereference.

This issue was seen when reading xhci related files in debugfs:

cat /sys/kernel/debug/usb/xhci/*/devices/*/ep*/trbs

[  184.604861] BUG: unable to handle kernel NULL pointer dereference at (null)
[  184.613776] IP: xhci_ring_trb_show+0x3a/0x890
[  184.618733] PGD 264193067 P4D 264193067 PUD 263238067 PMD 0
[  184.625184] Oops:  [#1] SMP
[  184.726410] RIP: 0010:xhci_ring_trb_show+0x3a/0x890
[  184.731944] RSP: 0018:ba8243c0fd90 EFLAGS: 00010246
[  184.737880] RAX:  RBX:  RCX: 000295d6
[  184.746020] RDX: 000295d5 RSI: 0001 RDI: 971a6418d400
[  184.754121] RBP:  R08:  R09: 
[  184.76] R10: 971a64c98a80 R11: 971a62a00e40 R12: 971a62a85500
[  184.770325] R13: 0002 R14: 971a6418d400 R15: 971a6418d400
[  184.778448] FS:  7fe725a79700() GS:971a6ec0() knlGS:
[  184.787644] CS:  0010 DS:  ES:  CR0: 80050033
[  184.794168] CR2:  CR3: 00025f365005 CR4: 003606f0
[  184.802318] Call Trace:
[  184.805094]  ? seq_read+0x281/0x3b0
[  184.809068]  seq_read+0xeb/0x3b0
[  184.812735]  full_proxy_read+0x4d/0x70
[  184.817007]  __vfs_read+0x23/0x120
[  184.820870]  vfs_read+0x91/0x130
[  184.824538]  SyS_read+0x42/0x90
[  184.828106]  entry_SYSCALL_64_fastpath+0x1a/0x7d

Fixes: dde634057da7 ("xhci: Fix use-after-free in xhci debugfs")
Signed-off-by: Zhengjun Xing 
---
 drivers/usb/host/xhci-debugfs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/host/xhci-debugfs.c b/drivers/usb/host/xhci-debugfs.c
index e26e685d8a57..5851052d4668 100644
--- a/drivers/usb/host/xhci-debugfs.c
+++ b/drivers/usb/host/xhci-debugfs.c
@@ -211,7 +211,7 @@ static void xhci_ring_dump_segment(struct seq_file *s,
 static int xhci_ring_trb_show(struct seq_file *s, void *unused)
 {
int i;
-   struct xhci_ring*ring = s->private;
+   struct xhci_ring*ring = *(struct xhci_ring **)s->private;
struct xhci_segment *seg = ring->first_seg;
 
for (i = 0; i < ring->num_segs; i++) {
@@ -387,7 +387,7 @@ void xhci_debugfs_create_endpoint(struct xhci_hcd *xhci,
 
snprintf(epriv->name, sizeof(epriv->name), "ep%02d", ep_index);
epriv->root = xhci_debugfs_create_ring_dir(xhci,
-  &dev->eps[ep_index].new_ring,
+  &dev->eps[ep_index].ring,
   epriv->name,
   spriv->root);
spriv->eps[ep_index] = epriv;
-- 
2.11.0
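
The following stand-alone sketch shows why the address handed to debugfs has to be that of the long-lived "ring" pointer rather than the temporary "new_ring" pointer. It is a simplified model under stated assumptions: struct ring, struct endpoint, struct show_ctx, show_num_segs() and install_new_ring() are hypothetical names, and unlike the real xhci_ring_trb_show() the show function here checks for NULL and returns -1 instead of crashing.

```c
#include <assert.h>
#include <stddef.h>

struct ring {
	int num_segs;
};

struct endpoint {
	struct ring *ring;	/* live ring, valid while the endpoint exists */
	struct ring *new_ring;	/* temporary, cleared once installed */
};

/* Models the seq_file "private" cookie: it holds the ADDRESS of a ring pointer. */
struct show_ctx {
	void *private;
};

/* Dereference the stored pointer-to-pointer at read time, as the fix does. */
static int show_num_segs(const struct show_ctx *s)
{
	struct ring *ring = *(struct ring **)s->private;

	if (!ring)
		return -1;	/* the unpatched kernel code would oops here */
	return ring->num_segs;
}

/* Models the step where the temporary ring becomes the live one. */
static void install_new_ring(struct endpoint *ep)
{
	assert(ep->new_ring != NULL);
	ep->ring = ep->new_ring;
	ep->new_ring = NULL;
}
```

A cookie built from &ep.new_ring reads NULL once install_new_ring() has run, while one built from &ep.ring keeps following whichever ring is currently live.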



[PATCH] clk: meson: axg: fix the od shift of the sys_pll

2018-01-18 Thread Yixun Lan
According to the datasheet, the od shift of sys_pll is 16.
Fix the typo introduced by the previous commit.

Fixes: 78b4af312f91 ('clk: meson-axg: add clock controller drivers')
Signed-off-by: Yixun Lan 
---
 drivers/clk/meson/axg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clk/meson/axg.c b/drivers/clk/meson/axg.c
index 7988dc8506b0..04a231eaf648 100644
--- a/drivers/clk/meson/axg.c
+++ b/drivers/clk/meson/axg.c
@@ -64,7 +64,7 @@ static struct meson_clk_pll axg_sys_pll = {
},
.od = {
.reg_off = HHI_SYS_PLL_CNTL,
-   .shift   = 10,
+   .shift   = 16,
.width   = 2,
},
.lock = &meson_clk_lock,
-- 
2.15.1
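
As an illustration of what a wrong shift does, here is a small sketch of extracting a width/shift register field. field_get() is a hypothetical helper written for this example, and the register value is made up; it is not the driver's own macro.

```c
#include <assert.h>
#include <stdint.h>

/* Extract a 'width'-bit field that starts at bit 'shift' of 'reg'. */
static uint32_t field_get(unsigned int width, unsigned int shift, uint32_t reg)
{
	assert(width > 0 && width < 32 && shift + width <= 32);
	return (reg >> shift) & ((1u << width) - 1);
}
```

With od = 2 stored at bits 17:16 (register value 0x00020000), field_get(2, 16, reg) yields 2, whereas reading with the old shift of 10 yields 0, so the output divider would be silently ignored.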



Re: [PATCH 3/9] clk: meson: remove unnecessary rounding in the pll clock

2018-01-18 Thread Yixun Lan

On 01/19/18 02:45, Jerome Brunet wrote:
> The pll driver performs the rate calculation in MHz, which adds an
> unnecessary rounding down of the rate to the MHz. Use a 64-bit
> integer to perform this calculation safely on meson8b, and perform the
> calculation in Hz instead.
> 
> Fixes: 7a29a869434e ("clk: meson: Add support for Meson clock controller")
> Signed-off-by: Jerome Brunet 
> ---
>  drivers/clk/meson/clk-pll.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/clk/meson/clk-pll.c b/drivers/clk/meson/clk-pll.c
> index 2614341fc4ad..fa4cec13d6e8 100644
> --- a/drivers/clk/meson/clk-pll.c
> +++ b/drivers/clk/meson/clk-pll.c
> @@ -51,8 +51,7 @@ static unsigned long meson_clk_pll_recalc_rate(struct clk_hw *hw,
>  {
>   struct meson_clk_pll *pll = to_meson_clk_pll(hw);
>   struct parm *p;
> - unsigned long parent_rate_mhz = parent_rate / 1000000;
> - unsigned long rate_mhz;
> + u64 rate;
>   u16 n, m, frac = 0, od, od2 = 0;
>   u32 reg;
>  
> @@ -74,17 +73,18 @@ static unsigned long meson_clk_pll_recalc_rate(struct clk_hw *hw,
>   od2 = PARM_GET(p->width, p->shift, reg);
>   }
>  
> + rate = (u64)m * parent_rate;
> +
>   p = &pll->frac;
>   if (p->width) {
>   reg = readl(pll->base + p->reg_off);
>   frac = PARM_GET(p->width, p->shift, reg);
> - rate_mhz = (parent_rate_mhz * m + \
> - (parent_rate_mhz * frac >> 12)) * 2 / n;
> - rate_mhz = rate_mhz >> od >> od2;
> - } else
> - rate_mhz = (parent_rate_mhz * m / n) >> od >> od2;
>  
> - return rate_mhz * 1000000;
> + rate += (u64)parent_rate * frac >> 12;
> + rate *= 2;
> + }
> +
> + return (rate / n) >> od >> od2;
>  }
>  
>  static long meson_clk_pll_round_rate(struct clk_hw *hw, unsigned long rate,
> 

Hi Jerome:
 This is exactly what I wanted to propose, thanks for pushing this!

 With the whole series, the fixed_pll is more accurate, and the ethernet
driver on axg is capable of choosing fclk_div2.
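
To show the size of the rounding error the patch removes, here is a rough user-space comparison of the two calculations. Assumptions: the function names are made up, the od2 divider is left out, and uint64_t is used throughout, so only the rounding behaviour is modelled, not the 32-bit overflow concern on meson8b.

```c
#include <assert.h>
#include <stdint.h>

/* Old scheme (integer path only): truncate to MHz before doing the math. */
static uint64_t pll_rate_mhz_rounded(uint64_t parent_hz, unsigned int m,
				     unsigned int n, unsigned int od)
{
	uint64_t parent_mhz;

	assert(n != 0);
	parent_mhz = parent_hz / 1000000;
	return ((parent_mhz * m / n) >> od) * 1000000;
}

/* New scheme: stay in Hz and divide once at the end. */
static uint64_t pll_rate_hz(uint64_t parent_hz, unsigned int m, unsigned int n,
			    unsigned int od, int has_frac, unsigned int frac)
{
	uint64_t rate;

	assert(n != 0);
	rate = (uint64_t)m * parent_hz;
	if (has_frac) {
		/* 12-bit fractional part, then doubled as in the patch */
		rate += (parent_hz * frac) >> 12;
		rate *= 2;
	}
	return (rate / n) >> od;
}
```

For a 24 MHz parent with m = 53 and n = 5, the MHz-based version returns 254000000 while the Hz-based one returns 254400000, so the old code loses 400 kHz in that case.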

Yixun

