Re: next: suspicious RCU usage message since commit 'rcu: Remove superfluous versions of rcu_read_lock_sched_held()'

2016-04-24 Thread Paul E. McKenney
On Sun, Apr 24, 2016 at 10:37:25PM -0700, Guenter Roeck wrote:
> On 04/24/2016 10:28 PM, Paul E. McKenney wrote:
> >On Sun, Apr 24, 2016 at 04:56:38PM -0700, Guenter Roeck wrote:
> >>Hi Paul,
> >>
> >>On 04/24/2016 02:31 PM, Paul E. McKenney wrote:
> >>>On Sun, Apr 24, 2016 at 02:14:24PM -0700, Guenter Roeck wrote:
> Hi,
> 
> I see the following log message when running a qemu test for 'beagle'
> with omap2plus_defconfig.
> 
> ===
> [ INFO: suspicious RCU usage. ]
> 4.6.0-rc4-next-20160422 #1 Not tainted
> ---
> include/trace/events/power.h:328 suspicious rcu_dereference_check() usage!
> 
> other info that might help us debug this:
> 
> RCU used illegally from idle CPU!
> rcu_scheduler_active = 1, debug_locks = 0
> RCU used illegally from extended quiescent state!
> no locks held by swapper/0/0.
> 
> stack backtrace:
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6.0-rc4-next-20160422 #1
> Hardware name: Generic OMAP3-GP (Flattened Device Tree)
> [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
> [] (show_stack) from [] (dump_stack+0xa8/0xe0)
> [] (dump_stack) from [] 
> (pwrdm_set_next_pwrst+0xf8/0x1cc)
> [] (pwrdm_set_next_pwrst) from [] 
> (omap3_enter_idle_bm+0x1b8/0x1e8)
> [] (omap3_enter_idle_bm) from [] 
> (cpuidle_enter_state+0x84/0x408)
> [] (cpuidle_enter_state) from [] 
> (cpu_startup_entry+0x1c8/0x3f0)
> [] (cpu_startup_entry) from [] 
> (start_kernel+0x354/0x3cc)
> 
> bisect points to commit 'rcu: Remove superfluous versions of
> rcu_read_lock_sched_held()'. Bisect log is attached.
> >>>
> >>>I believe that the real fix is not a revert of that commit, but rather
> >>>that some of the tracing statements need an "_rcuidle" suffix.
> >>>
> >>>Something like the following (untested, probably does not build) patch.
> >>>
> >>>   Thanx, Paul
> >>>
> >>>
> >>>
> >>>commit ca91304178e1cf53ee391236a0ac3969cc814e5f
> >>>Author: Paul E. McKenney 
> >>>Date:   Sun Apr 24 14:30:16 2016 -0700
> >>>
> >>> arm: Use _rcuidle tracepoint to allow use from idle
> >>>
> >>> Signed-off-by: Paul E. McKenney 
> >>>
> >>>diff --git a/arch/arm/mach-omap2/powerdomain.c 
> >>>b/arch/arm/mach-omap2/powerdomain.c
> >>>index 78af6d8cf2e2..12b66b5bcc55 100644
> >>>--- a/arch/arm/mach-omap2/powerdomain.c
> >>>+++ b/arch/arm/mach-omap2/powerdomain.c
> >>>@@ -523,8 +523,8 @@ int pwrdm_set_next_pwrst(struct powerdomain *pwrdm, u8 
> >>>pwrst)
> >>>
> >>>   if (arch_pwrdm && arch_pwrdm->pwrdm_set_next_pwrst) {
> >>>   /* Trace the pwrdm desired target state */
> >>>-  trace_power_domain_target(pwrdm->name, pwrst,
> >>>-smp_processor_id());
> >>>+  trace_power_domain_target_rcuidle(pwrdm->name, pwrst,
> >>>+smp_processor_id());
> >>>   /* Program the pwrdm desired target state */
> >>>   ret = arch_pwrdm->pwrdm_set_next_pwrst(pwrdm, pwrst);
> >>>   }
> >>>
> >>
> >>It does build. After applying it, I get a different traceback.
> >>
> >>[] (unwind_backtrace) from [] (show_stack+0x10/0x14)
> >>[] (show_stack) from [] (dump_stack+0xa8/0xe0)
> >>[] (dump_stack) from [] 
> >>(_pwrdm_state_switch+0x188/0x32c)
> >>[] (_pwrdm_state_switch) from [] 
> >>(_pwrdm_post_transition_cb+0xc/0x14)
> >>[] (_pwrdm_post_transition_cb) from [] 
> >>(pwrdm_for_each+0x30/0x5c)
> >>[] (pwrdm_for_each) from [] 
> >>(pwrdm_post_transition+0x24/0x30)
> >>[] (pwrdm_post_transition) from [] 
> >>(omap_sram_idle+0xfc/0x240)
> >>[] (omap_sram_idle) from [] 
> >>(omap3_enter_idle_bm+0xf0/0x1e8)
> >>[] (omap3_enter_idle_bm) from [] 
> >>(cpuidle_enter_state+0x84/0x408)
> >>[] (cpuidle_enter_state) from [] 
> >>(cpu_startup_entry+0x1c8/0x3f0)
> >>[] (cpu_startup_entry) from [] 
> >>(start_kernel+0x354/0x3cc)
> >>
> >>After making the same change in _pwrdm_state_switch(), the traceback is gone
> >>from my tests (beagle, beagle-xm, and overo-tobi).
> >
> >Very good!
> >
> >(And yes, you normally find these one at a time...)
> >
> Are you going to submit a formal patch ?

I can, but please feel free to send mine along with yours, if you wish.

Either way, please let me know.

Thanx, Paul



Re: next: suspicious RCU usage message since commit 'rcu: Remove superfluous versions of rcu_read_lock_sched_held()'

2016-04-24 Thread Paul E. McKenney
On Sun, Apr 24, 2016 at 10:37:25PM -0700, Guenter Roeck wrote:
> On 04/24/2016 10:28 PM, Paul E. McKenney wrote:
> >On Sun, Apr 24, 2016 at 04:56:38PM -0700, Guenter Roeck wrote:
> >>Hi Paul,
> >>
> >>On 04/24/2016 02:31 PM, Paul E. McKenney wrote:
> >>>On Sun, Apr 24, 2016 at 02:14:24PM -0700, Guenter Roeck wrote:
> Hi,
> 
> I see the following log message when running a qemu test for 'beagle'
> with omap2plus_defconfig.
> 
> ===
> [ INFO: suspicious RCU usage. ]
> 4.6.0-rc4-next-20160422 #1 Not tainted
> ---
> include/trace/events/power.h:328 suspicious rcu_dereference_check() usage!
> 
> other info that might help us debug this:
> 
> RCU used illegally from idle CPU!
> rcu_scheduler_active = 1, debug_locks = 0
> RCU used illegally from extended quiescent state!
> no locks held by swapper/0/0.
> 
> stack backtrace:
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6.0-rc4-next-20160422 #1
> Hardware name: Generic OMAP3-GP (Flattened Device Tree)
> [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
> [] (show_stack) from [] (dump_stack+0xa8/0xe0)
> [] (dump_stack) from [] 
> (pwrdm_set_next_pwrst+0xf8/0x1cc)
> [] (pwrdm_set_next_pwrst) from [] 
> (omap3_enter_idle_bm+0x1b8/0x1e8)
> [] (omap3_enter_idle_bm) from [] 
> (cpuidle_enter_state+0x84/0x408)
> [] (cpuidle_enter_state) from [] 
> (cpu_startup_entry+0x1c8/0x3f0)
> [] (cpu_startup_entry) from [] 
> (start_kernel+0x354/0x3cc)
> 
> bisect points to commit 'rcu: Remove superfluous versions of
> rcu_read_lock_sched_held()'. Bisect log is attached.
> >>>
> >>>I believe that the real fix is not a revert of that commit, but rather
> >>>that some of the tracing statements need an "_rcuidle" suffix.
> >>>
> >>>Something like the following (untested, probably does not build) patch.
> >>>
> >>>   Thanx, Paul
> >>>
> >>>
> >>>
> >>>commit ca91304178e1cf53ee391236a0ac3969cc814e5f
> >>>Author: Paul E. McKenney 
> >>>Date:   Sun Apr 24 14:30:16 2016 -0700
> >>>
> >>> arm: Use _rcuidle tracepoint to allow use from idle
> >>>
> >>> Signed-off-by: Paul E. McKenney 
> >>>
> >>>diff --git a/arch/arm/mach-omap2/powerdomain.c 
> >>>b/arch/arm/mach-omap2/powerdomain.c
> >>>index 78af6d8cf2e2..12b66b5bcc55 100644
> >>>--- a/arch/arm/mach-omap2/powerdomain.c
> >>>+++ b/arch/arm/mach-omap2/powerdomain.c
> >>>@@ -523,8 +523,8 @@ int pwrdm_set_next_pwrst(struct powerdomain *pwrdm, u8 
> >>>pwrst)
> >>>
> >>>   if (arch_pwrdm && arch_pwrdm->pwrdm_set_next_pwrst) {
> >>>   /* Trace the pwrdm desired target state */
> >>>-  trace_power_domain_target(pwrdm->name, pwrst,
> >>>-smp_processor_id());
> >>>+  trace_power_domain_target_rcuidle(pwrdm->name, pwrst,
> >>>+smp_processor_id());
> >>>   /* Program the pwrdm desired target state */
> >>>   ret = arch_pwrdm->pwrdm_set_next_pwrst(pwrdm, pwrst);
> >>>   }
> >>>
> >>
> >>It does build. After applying it, I get a different traceback.
> >>
> >>[] (unwind_backtrace) from [] (show_stack+0x10/0x14)
> >>[] (show_stack) from [] (dump_stack+0xa8/0xe0)
> >>[] (dump_stack) from [] 
> >>(_pwrdm_state_switch+0x188/0x32c)
> >>[] (_pwrdm_state_switch) from [] 
> >>(_pwrdm_post_transition_cb+0xc/0x14)
> >>[] (_pwrdm_post_transition_cb) from [] 
> >>(pwrdm_for_each+0x30/0x5c)
> >>[] (pwrdm_for_each) from [] 
> >>(pwrdm_post_transition+0x24/0x30)
> >>[] (pwrdm_post_transition) from [] 
> >>(omap_sram_idle+0xfc/0x240)
> >>[] (omap_sram_idle) from [] 
> >>(omap3_enter_idle_bm+0xf0/0x1e8)
> >>[] (omap3_enter_idle_bm) from [] 
> >>(cpuidle_enter_state+0x84/0x408)
> >>[] (cpuidle_enter_state) from [] 
> >>(cpu_startup_entry+0x1c8/0x3f0)
> >>[] (cpu_startup_entry) from [] 
> >>(start_kernel+0x354/0x3cc)
> >>
> >>After making the same change in _pwrdm_state_switch(), the traceback is gone
> >>from my tests (beagle, beagle-xm, and overo-tobi).
> >
> >Very good!
> >
> >(And yes, you normally find these one at a time...)
> >
> Are you going to submit a formal patch ?

I can, but please feel free to send mine along with yours, if you wish.

Either way, please let me know.

Thanx, Paul



Re: [PATCH v7 5/8] [media] vcodec: mediatek: Add Mediatek V4L2 Video Encoder Driver

2016-04-24 Thread 李務誠
> >
> > ...
> >
> > > +static int fops_vcodec_open(struct file *file)
> > > +{
> > > +   struct video_device *vfd = video_devdata(file);
> > > +   struct mtk_vcodec_dev *dev = video_drvdata(file);
> > > +   struct mtk_vcodec_ctx *ctx = NULL;
> > > +   int ret = 0;
> > > +
> > > +   if (dev->instance_mask == ~0UL) {
> > > +   /* ffz Undefined if no zero exists, err handling here */
> > > +   mtk_v4l2_err("Too many open contexts");
> > > +   ret = -EBUSY;
> > > +   goto err_alloc;
> >
> > I'm not happy seeing this here. You should always be able to open the 
> > device.
> > I would expect to see a check like this in e.g. start_streaming, since 
> > that's
> > where you start to use the hardware for real, and checking if you have 
> > enough
> > resources there is perfectly fine.
> >
> > If this is an artificial constraint (i.e. not based on a real hardware 
> > limitation),
> > then it should perhaps just be dropped. Such constraints tend to be 
> > pointless.
> > If you want to encode 20 streams simultaneously, then why not? It will be 
> > very
> > slow, but that's not this driver's problem :-)
> >
> We use ffz to get instance index.
> This only make sure that instance id is correct since in ffz
> description,
> "Undefined if no zero exists, so code should check against ~0UL first."
> In this case, it may not be able to move to start_streaming.
> Any suggestion that how we do this?
The instance index is only used for printing the debug information.
No? If that's the case, you can remove instance index and print
mtk_vcodec_ctx address for debugging.
>
>
> > > +   }
> > > +
> > > +   mutex_lock(>dev_mutex);
> > > +
> > > +   ctx = devm_kzalloc(>plat_dev->dev, sizeof(*ctx), GFP_KERNEL);
> >
> > Why is this a devm_ call? It is not managed by a device, so it seems to me 
> > that
> > a regular kzalloc is good enough here.
> >
> > > +   if (!ctx) {
> > > +   ret = -ENOMEM;
> > > +   goto err_alloc;
> > > +   }
> > > +
> > > +   ctx->idx = ffz(dev->instance_mask);
> > > +   v4l2_fh_init(>fh, video_devdata(file));
> > > +   file->private_data = >fh;
> > > +   v4l2_fh_add(>fh);
> > > +   INIT_LIST_HEAD(>list);
> > > +   ctx->dev = dev;
> > > +   init_waitqueue_head(>queue);
> > > +
> > > +   if (vfd == dev->vfd_enc) {
> > > +   ctx->type = MTK_INST_ENCODER;
> > > +   ret = mtk_vcodec_enc_ctrls_setup(ctx);
> > > +   if (ret) {
> > > +   mtk_v4l2_err("Failed to setup controls() (%d)",
> > > +  ret);
> > > +   goto err_ctrls_setup;
> > > +   }
> > > +   ctx->m2m_ctx = v4l2_m2m_ctx_init(dev->m2m_dev_enc, ctx,
> > > +_vcodec_enc_queue_init);
> > > +   if (IS_ERR(ctx->m2m_ctx)) {
> > > +   ret = PTR_ERR(ctx->m2m_ctx);
> > > +   mtk_v4l2_err("Failed to v4l2_m2m_ctx_init() (%d)",
> > > +  ret);
> > > +   goto err_m2m_ctx_init;
> > > +   }
> > > +   mtk_vcodec_enc_set_default_params(ctx);
> > > +   } else {
> > > +   mtk_v4l2_err("Invalid vfd !");
> >
> > This shouldn't be possible at all. I would just drop the 'if (vfd == 
> > dev->vfd_enc)' check.
> Got it, will remove in next version.
>
> >
> > > +   ret = -ENOENT;
> > > +   goto err_m2m_ctx_init;
> > > +   }
> > > +
> > > +   if (v4l2_fh_is_singular(>fh)) {
> > > +   ret = vpu_load_firmware(dev->vpu_plat_dev);
> > > +   if (ret < 0) {
> > > +   /*
> > > + * Return 0 if downloading firmware successfully,
> > > + * otherwise it is failed
> > > + */
> > > +   mtk_v4l2_err("vpu_load_firmware failed!");
> > > +   goto err_load_fw;
> > > +   }
> >
> > The fw load seems to be a one-time thing, but here it is done every time
> > someone opens the device and nobody else had it open.
> >
> > If this is a one time thing, then using a bool 'loaded_fw' makes more sense.
> >
> More than one module use vpu firmware, encoder/decoder/mdp...etc.
> If this is first encode instance, we need to check and load vpu
> firmware.
> vpu_load_firmware will check and load firmware when necessary.
>
> best regards,
> Tiffany
> > > +
> > > +   dev->enc_capability =
> > > +   vpu_get_venc_hw_capa(dev->vpu_plat_dev);
> > > +   mtk_v4l2_debug(0, "encoder capability %x", 
> > > dev->enc_capability);
> > > +   }
> > > +
> > > +   mtk_v4l2_debug(2, "Create instance [%d]@%p m2m_ctx=%p ",
> > > +ctx->idx, ctx, ctx->m2m_ctx);
> > > +   set_bit(ctx->idx, >instance_mask);
> > > +   dev->num_instances++;
> > > +   list_add(>list, >ctx_list);
> > > +
> > > +   mutex_unlock(>dev_mutex);
> > > +   mtk_v4l2_debug(0, "%s encoder [%d]", dev_name(>plat_dev->dev),
> > > +  ctx->idx);
> > > +   

Re: [PATCH v7 5/8] [media] vcodec: mediatek: Add Mediatek V4L2 Video Encoder Driver

2016-04-24 Thread 李務誠
> >
> > ...
> >
> > > +static int fops_vcodec_open(struct file *file)
> > > +{
> > > +   struct video_device *vfd = video_devdata(file);
> > > +   struct mtk_vcodec_dev *dev = video_drvdata(file);
> > > +   struct mtk_vcodec_ctx *ctx = NULL;
> > > +   int ret = 0;
> > > +
> > > +   if (dev->instance_mask == ~0UL) {
> > > +   /* ffz Undefined if no zero exists, err handling here */
> > > +   mtk_v4l2_err("Too many open contexts");
> > > +   ret = -EBUSY;
> > > +   goto err_alloc;
> >
> > I'm not happy seeing this here. You should always be able to open the 
> > device.
> > I would expect to see a check like this in e.g. start_streaming, since 
> > that's
> > where you start to use the hardware for real, and checking if you have 
> > enough
> > resources there is perfectly fine.
> >
> > If this is an artificial constraint (i.e. not based on a real hardware 
> > limitation),
> > then it should perhaps just be dropped. Such constraints tend to be 
> > pointless.
> > If you want to encode 20 streams simultaneously, then why not? It will be 
> > very
> > slow, but that's not this driver's problem :-)
> >
> We use ffz to get instance index.
> This only make sure that instance id is correct since in ffz
> description,
> "Undefined if no zero exists, so code should check against ~0UL first."
> In this case, it may not be able to move to start_streaming.
> Any suggestion that how we do this?
The instance index is only used for printing the debug information.
No? If that's the case, you can remove instance index and print
mtk_vcodec_ctx address for debugging.
>
>
> > > +   }
> > > +
> > > +   mutex_lock(>dev_mutex);
> > > +
> > > +   ctx = devm_kzalloc(>plat_dev->dev, sizeof(*ctx), GFP_KERNEL);
> >
> > Why is this a devm_ call? It is not managed by a device, so it seems to me 
> > that
> > a regular kzalloc is good enough here.
> >
> > > +   if (!ctx) {
> > > +   ret = -ENOMEM;
> > > +   goto err_alloc;
> > > +   }
> > > +
> > > +   ctx->idx = ffz(dev->instance_mask);
> > > +   v4l2_fh_init(>fh, video_devdata(file));
> > > +   file->private_data = >fh;
> > > +   v4l2_fh_add(>fh);
> > > +   INIT_LIST_HEAD(>list);
> > > +   ctx->dev = dev;
> > > +   init_waitqueue_head(>queue);
> > > +
> > > +   if (vfd == dev->vfd_enc) {
> > > +   ctx->type = MTK_INST_ENCODER;
> > > +   ret = mtk_vcodec_enc_ctrls_setup(ctx);
> > > +   if (ret) {
> > > +   mtk_v4l2_err("Failed to setup controls() (%d)",
> > > +  ret);
> > > +   goto err_ctrls_setup;
> > > +   }
> > > +   ctx->m2m_ctx = v4l2_m2m_ctx_init(dev->m2m_dev_enc, ctx,
> > > +_vcodec_enc_queue_init);
> > > +   if (IS_ERR(ctx->m2m_ctx)) {
> > > +   ret = PTR_ERR(ctx->m2m_ctx);
> > > +   mtk_v4l2_err("Failed to v4l2_m2m_ctx_init() (%d)",
> > > +  ret);
> > > +   goto err_m2m_ctx_init;
> > > +   }
> > > +   mtk_vcodec_enc_set_default_params(ctx);
> > > +   } else {
> > > +   mtk_v4l2_err("Invalid vfd !");
> >
> > This shouldn't be possible at all. I would just drop the 'if (vfd == 
> > dev->vfd_enc)' check.
> Got it, will remove in next version.
>
> >
> > > +   ret = -ENOENT;
> > > +   goto err_m2m_ctx_init;
> > > +   }
> > > +
> > > +   if (v4l2_fh_is_singular(>fh)) {
> > > +   ret = vpu_load_firmware(dev->vpu_plat_dev);
> > > +   if (ret < 0) {
> > > +   /*
> > > + * Return 0 if downloading firmware successfully,
> > > + * otherwise it is failed
> > > + */
> > > +   mtk_v4l2_err("vpu_load_firmware failed!");
> > > +   goto err_load_fw;
> > > +   }
> >
> > The fw load seems to be a one-time thing, but here it is done every time
> > someone opens the device and nobody else had it open.
> >
> > If this is a one time thing, then using a bool 'loaded_fw' makes more sense.
> >
> More than one module use vpu firmware, encoder/decoder/mdp...etc.
> If this is first encode instance, we need to check and load vpu
> firmware.
> vpu_load_firmware will check and load firmware when necessary.
>
> best regards,
> Tiffany
> > > +
> > > +   dev->enc_capability =
> > > +   vpu_get_venc_hw_capa(dev->vpu_plat_dev);
> > > +   mtk_v4l2_debug(0, "encoder capability %x", 
> > > dev->enc_capability);
> > > +   }
> > > +
> > > +   mtk_v4l2_debug(2, "Create instance [%d]@%p m2m_ctx=%p ",
> > > +ctx->idx, ctx, ctx->m2m_ctx);
> > > +   set_bit(ctx->idx, >instance_mask);
> > > +   dev->num_instances++;
> > > +   list_add(>list, >ctx_list);
> > > +
> > > +   mutex_unlock(>dev_mutex);
> > > +   mtk_v4l2_debug(0, "%s encoder [%d]", dev_name(>plat_dev->dev),
> > > +  ctx->idx);
> > > +   

Re: next: suspicious RCU usage message since commit 'rcu: Remove superfluous versions of rcu_read_lock_sched_held()'

2016-04-24 Thread Guenter Roeck

On 04/24/2016 10:28 PM, Paul E. McKenney wrote:

On Sun, Apr 24, 2016 at 04:56:38PM -0700, Guenter Roeck wrote:

Hi Paul,

On 04/24/2016 02:31 PM, Paul E. McKenney wrote:

On Sun, Apr 24, 2016 at 02:14:24PM -0700, Guenter Roeck wrote:

Hi,

I see the following log message when running a qemu test for 'beagle'
with omap2plus_defconfig.

===
[ INFO: suspicious RCU usage. ]
4.6.0-rc4-next-20160422 #1 Not tainted
---
include/trace/events/power.h:328 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 0
RCU used illegally from extended quiescent state!
no locks held by swapper/0/0.

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6.0-rc4-next-20160422 #1
Hardware name: Generic OMAP3-GP (Flattened Device Tree)
[] (unwind_backtrace) from [] (show_stack+0x10/0x14)
[] (show_stack) from [] (dump_stack+0xa8/0xe0)
[] (dump_stack) from [] (pwrdm_set_next_pwrst+0xf8/0x1cc)
[] (pwrdm_set_next_pwrst) from [] 
(omap3_enter_idle_bm+0x1b8/0x1e8)
[] (omap3_enter_idle_bm) from [] 
(cpuidle_enter_state+0x84/0x408)
[] (cpuidle_enter_state) from [] 
(cpu_startup_entry+0x1c8/0x3f0)
[] (cpu_startup_entry) from [] (start_kernel+0x354/0x3cc)

bisect points to commit 'rcu: Remove superfluous versions of
rcu_read_lock_sched_held()'. Bisect log is attached.


I believe that the real fix is not a revert of that commit, but rather
that some of the tracing statements need an "_rcuidle" suffix.

Something like the following (untested, probably does not build) patch.

Thanx, Paul



commit ca91304178e1cf53ee391236a0ac3969cc814e5f
Author: Paul E. McKenney 
Date:   Sun Apr 24 14:30:16 2016 -0700

 arm: Use _rcuidle tracepoint to allow use from idle

 Signed-off-by: Paul E. McKenney 

diff --git a/arch/arm/mach-omap2/powerdomain.c 
b/arch/arm/mach-omap2/powerdomain.c
index 78af6d8cf2e2..12b66b5bcc55 100644
--- a/arch/arm/mach-omap2/powerdomain.c
+++ b/arch/arm/mach-omap2/powerdomain.c
@@ -523,8 +523,8 @@ int pwrdm_set_next_pwrst(struct powerdomain *pwrdm, u8 
pwrst)

if (arch_pwrdm && arch_pwrdm->pwrdm_set_next_pwrst) {
/* Trace the pwrdm desired target state */
-   trace_power_domain_target(pwrdm->name, pwrst,
- smp_processor_id());
+   trace_power_domain_target_rcuidle(pwrdm->name, pwrst,
+ smp_processor_id());
/* Program the pwrdm desired target state */
ret = arch_pwrdm->pwrdm_set_next_pwrst(pwrdm, pwrst);
}



It does build. After applying it, I get a different traceback.

[] (unwind_backtrace) from [] (show_stack+0x10/0x14)
[] (show_stack) from [] (dump_stack+0xa8/0xe0)
[] (dump_stack) from [] (_pwrdm_state_switch+0x188/0x32c)
[] (_pwrdm_state_switch) from [] 
(_pwrdm_post_transition_cb+0xc/0x14)
[] (_pwrdm_post_transition_cb) from [] 
(pwrdm_for_each+0x30/0x5c)
[] (pwrdm_for_each) from [] 
(pwrdm_post_transition+0x24/0x30)
[] (pwrdm_post_transition) from [] 
(omap_sram_idle+0xfc/0x240)
[] (omap_sram_idle) from [] (omap3_enter_idle_bm+0xf0/0x1e8)
[] (omap3_enter_idle_bm) from [] 
(cpuidle_enter_state+0x84/0x408)
[] (cpuidle_enter_state) from [] 
(cpu_startup_entry+0x1c8/0x3f0)
[] (cpu_startup_entry) from [] (start_kernel+0x354/0x3cc)

After making the same change in _pwrdm_state_switch(), the traceback is gone
from my tests (beagle, beagle-xm, and overo-tobi).


Very good!

(And yes, you normally find these one at a time...)


Are you going to submit a formal patch ?

Thanks,
Guenter



Re: next: suspicious RCU usage message since commit 'rcu: Remove superfluous versions of rcu_read_lock_sched_held()'

2016-04-24 Thread Guenter Roeck

On 04/24/2016 10:28 PM, Paul E. McKenney wrote:

On Sun, Apr 24, 2016 at 04:56:38PM -0700, Guenter Roeck wrote:

Hi Paul,

On 04/24/2016 02:31 PM, Paul E. McKenney wrote:

On Sun, Apr 24, 2016 at 02:14:24PM -0700, Guenter Roeck wrote:

Hi,

I see the following log message when running a qemu test for 'beagle'
with omap2plus_defconfig.

===
[ INFO: suspicious RCU usage. ]
4.6.0-rc4-next-20160422 #1 Not tainted
---
include/trace/events/power.h:328 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 0
RCU used illegally from extended quiescent state!
no locks held by swapper/0/0.

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6.0-rc4-next-20160422 #1
Hardware name: Generic OMAP3-GP (Flattened Device Tree)
[] (unwind_backtrace) from [] (show_stack+0x10/0x14)
[] (show_stack) from [] (dump_stack+0xa8/0xe0)
[] (dump_stack) from [] (pwrdm_set_next_pwrst+0xf8/0x1cc)
[] (pwrdm_set_next_pwrst) from [] 
(omap3_enter_idle_bm+0x1b8/0x1e8)
[] (omap3_enter_idle_bm) from [] 
(cpuidle_enter_state+0x84/0x408)
[] (cpuidle_enter_state) from [] 
(cpu_startup_entry+0x1c8/0x3f0)
[] (cpu_startup_entry) from [] (start_kernel+0x354/0x3cc)

bisect points to commit 'rcu: Remove superfluous versions of
rcu_read_lock_sched_held()'. Bisect log is attached.


I believe that the real fix is not a revert of that commit, but rather
that some of the tracing statements need an "_rcuidle" suffix.

Something like the following (untested, probably does not build) patch.

Thanx, Paul



commit ca91304178e1cf53ee391236a0ac3969cc814e5f
Author: Paul E. McKenney 
Date:   Sun Apr 24 14:30:16 2016 -0700

 arm: Use _rcuidle tracepoint to allow use from idle

 Signed-off-by: Paul E. McKenney 

diff --git a/arch/arm/mach-omap2/powerdomain.c 
b/arch/arm/mach-omap2/powerdomain.c
index 78af6d8cf2e2..12b66b5bcc55 100644
--- a/arch/arm/mach-omap2/powerdomain.c
+++ b/arch/arm/mach-omap2/powerdomain.c
@@ -523,8 +523,8 @@ int pwrdm_set_next_pwrst(struct powerdomain *pwrdm, u8 
pwrst)

if (arch_pwrdm && arch_pwrdm->pwrdm_set_next_pwrst) {
/* Trace the pwrdm desired target state */
-   trace_power_domain_target(pwrdm->name, pwrst,
- smp_processor_id());
+   trace_power_domain_target_rcuidle(pwrdm->name, pwrst,
+ smp_processor_id());
/* Program the pwrdm desired target state */
ret = arch_pwrdm->pwrdm_set_next_pwrst(pwrdm, pwrst);
}



It does build. After applying it, I get a different traceback.

[] (unwind_backtrace) from [] (show_stack+0x10/0x14)
[] (show_stack) from [] (dump_stack+0xa8/0xe0)
[] (dump_stack) from [] (_pwrdm_state_switch+0x188/0x32c)
[] (_pwrdm_state_switch) from [] 
(_pwrdm_post_transition_cb+0xc/0x14)
[] (_pwrdm_post_transition_cb) from [] 
(pwrdm_for_each+0x30/0x5c)
[] (pwrdm_for_each) from [] 
(pwrdm_post_transition+0x24/0x30)
[] (pwrdm_post_transition) from [] 
(omap_sram_idle+0xfc/0x240)
[] (omap_sram_idle) from [] (omap3_enter_idle_bm+0xf0/0x1e8)
[] (omap3_enter_idle_bm) from [] 
(cpuidle_enter_state+0x84/0x408)
[] (cpuidle_enter_state) from [] 
(cpu_startup_entry+0x1c8/0x3f0)
[] (cpu_startup_entry) from [] (start_kernel+0x354/0x3cc)

After making the same change in _pwrdm_state_switch(), the traceback is gone
from my tests (beagle, beagle-xm, and overo-tobi).


Very good!

(And yes, you normally find these one at a time...)


Are you going to submit a formal patch ?

Thanks,
Guenter



Re: [PATCH V4 4/4] gpio: tegra: Add support for gpio debounce

2016-04-24 Thread Alexandre Courbot
Sorry, just realized I commented on v3...

On Fri, Apr 22, 2016 at 7:09 PM, Laxman Dewangan  wrote:
> NVIDIA's Tegra210 support the HW debounce in the GPIO controller
> for all its GPIO pins.
>
> Add support for setting debounce timing by implementing the
> set_debounce callback of gpiochip.
>
> Signed-off-by: Laxman Dewangan 
>
> ---
> Changes from V1:
> - Write debounce count before enable.
> - Make sure the debounce count do not have any boot residuals.
>
> Changes from V2:
> - Only access register for debounce when SoC support debounce.
>
> Changes from V3:
> - Add locking mechanism in debounce count register update.
> - Move DBC register from prev patch to here.
>
>  drivers/gpio/gpio-tegra.c | 69 
> ++-
>  1 file changed, 68 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpio/gpio-tegra.c b/drivers/gpio/gpio-tegra.c
> index 6af5eb2..45d80ec 100644
> --- a/drivers/gpio/gpio-tegra.c
> +++ b/drivers/gpio/gpio-tegra.c
> @@ -46,10 +46,13 @@
>  #define GPIO_INT_ENB(t, x) (GPIO_REG(t, x) + 0x50)
>  #define GPIO_INT_LVL(t, x) (GPIO_REG(t, x) + 0x60)
>  #define GPIO_INT_CLR(t, x) (GPIO_REG(t, x) + 0x70)
> +#define GPIO_DBC_CNT(t, x) (GPIO_REG(t, x) + 0xF0)
> +
>
>  #define GPIO_MSK_CNF(t, x) (GPIO_REG(t, x) + t->soc->upper_offset + 0x00)
>  #define GPIO_MSK_OE(t, x)  (GPIO_REG(t, x) + t->soc->upper_offset + 0x10)
>  #define GPIO_MSK_OUT(t, x) (GPIO_REG(t, x) + t->soc->upper_offset + 0X20)
> +#define GPIO_MSK_DBC_EN(t, x)  (GPIO_REG(t, x) + t->soc->upper_offset + 0x30)
>  #define GPIO_MSK_INT_STA(t, x) (GPIO_REG(t, x) + t->soc->upper_offset + 0x40)
>  #define GPIO_MSK_INT_ENB(t, x) (GPIO_REG(t, x) + t->soc->upper_offset + 0x50)
>  #define GPIO_MSK_INT_LVL(t, x) (GPIO_REG(t, x) + t->soc->upper_offset + 0x60)
> @@ -67,6 +70,7 @@ struct tegra_gpio_bank {
> int bank;
> int irq;
> spinlock_t lvl_lock[4];
> +   spinlock_t dbc_lock[4]; /* Lock for updating debounce count register 
> */

I'm nit'ing here, but maybe one spinlock shared by all ports would be
enough? (the same would apply to lvl_lock, so feel free to do this as
a separate patch) I don't think we expect *that* many concurrent
accesses, do we?

>  #ifdef CONFIG_PM_SLEEP
> u32 cnf[4];
> u32 out[4];
> @@ -74,11 +78,14 @@ struct tegra_gpio_bank {
> u32 int_enb[4];
> u32 int_lvl[4];
> u32 wake_enb[4];
> +   u32 dbc_enb[4];
>  #endif
> +   u32 dbc_cnt[4];
> struct tegra_gpio_info *tgi;
>  };
>
>  struct tegra_gpio_soc_config {
> +   bool debounce_supported;
> u32 bank_stride;
> u32 upper_offset;
>  };
> @@ -182,6 +189,38 @@ static int tegra_gpio_direction_output(struct gpio_chip 
> *chip, unsigned offset,
> return 0;
>  }
>
> +static int tegra_gpio_set_debounce(struct gpio_chip *chip, unsigned int 
> offset,
> +  unsigned int debounce)
> +{
> +   struct tegra_gpio_info *tgi = gpiochip_get_data(chip);
> +   unsigned int debounce_ms = DIV_ROUND_UP(debounce, 1000);
> +   unsigned long flags;
> +   int port = GPIO_PORT(offset);
> +   int bank = GPIO_BANK(offset);

Maybe declare "bank" as follows:

struct tegra_gpio_bank *bank = >bank_info[GPIO_BANK(offset)];

This will allow you to simplify the code that follows:

> +
> +   if (!debounce_ms) {
> +   tegra_gpio_mask_write(tgi, GPIO_MSK_DBC_EN(tgi, offset),
> + offset, 0);
> +   return 0;
> +   }
> +
> +   debounce_ms = min(debounce_ms, 255U);
> +
> +   /* There is only one debounce count register per port and hence
> +* set the maximum of current and requested debounce time.
> +*/
> +   spin_lock_irqsave(>bank_info[bank].dbc_lock[port], flags);
> +   if (tgi->bank_info[bank].dbc_cnt[port] < debounce_ms) {
> +   tegra_gpio_writel(tgi, debounce_ms, GPIO_DBC_CNT(tgi, 
> offset));
> +   tgi->bank_info[bank].dbc_cnt[port] = debounce_ms;
> +   }
> +   spin_unlock_irqrestore(>bank_info[bank].dbc_lock[port], flags);

Becomes:

   spin_lock_irqsave(bank->dbc_lock[port], flags);
   if (bank->dbc_cnt[port] < debounce_ms) {
   tegra_gpio_writel(tgi, debounce_ms, GPIO_DBC_CNT(tgi, offset));
   bank->dbc_cnt[port] = debounce_ms;
   }
   spin_unlock_irqrestore(>dbc_lock[port], flags);

Which is nicer to the eyes.

Extra points if you initialize port and bank after we ensure that
debounce_ms is not zero and that their value will actually be used.

> +
> +   tegra_gpio_mask_write(tgi, GPIO_MSK_DBC_EN(tgi, offset), offset, 1);
> +
> +   return 0;
> +}
> +
>  static int tegra_gpio_to_irq(struct gpio_chip *chip, unsigned offset)
>  {
> struct tegra_gpio_info *tgi = gpiochip_get_data(chip);
> @@ -197,6 +236,7 @@ static struct gpio_chip tegra_gpio_chip = {
> 

Re: [PATCH V4 4/4] gpio: tegra: Add support for gpio debounce

2016-04-24 Thread Alexandre Courbot
Sorry, just realized I commented on v3...

On Fri, Apr 22, 2016 at 7:09 PM, Laxman Dewangan  wrote:
> NVIDIA's Tegra210 support the HW debounce in the GPIO controller
> for all its GPIO pins.
>
> Add support for setting debounce timing by implementing the
> set_debounce callback of gpiochip.
>
> Signed-off-by: Laxman Dewangan 
>
> ---
> Changes from V1:
> - Write debounce count before enable.
> - Make sure the debounce count do not have any boot residuals.
>
> Changes from V2:
> - Only access register for debounce when SoC support debounce.
>
> Changes from V3:
> - Add locking mechanism in debounce count register update.
> - Move DBC register from prev patch to here.
>
>  drivers/gpio/gpio-tegra.c | 69 
> ++-
>  1 file changed, 68 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpio/gpio-tegra.c b/drivers/gpio/gpio-tegra.c
> index 6af5eb2..45d80ec 100644
> --- a/drivers/gpio/gpio-tegra.c
> +++ b/drivers/gpio/gpio-tegra.c
> @@ -46,10 +46,13 @@
>  #define GPIO_INT_ENB(t, x) (GPIO_REG(t, x) + 0x50)
>  #define GPIO_INT_LVL(t, x) (GPIO_REG(t, x) + 0x60)
>  #define GPIO_INT_CLR(t, x) (GPIO_REG(t, x) + 0x70)
> +#define GPIO_DBC_CNT(t, x) (GPIO_REG(t, x) + 0xF0)
> +
>
>  #define GPIO_MSK_CNF(t, x) (GPIO_REG(t, x) + t->soc->upper_offset + 0x00)
>  #define GPIO_MSK_OE(t, x)  (GPIO_REG(t, x) + t->soc->upper_offset + 0x10)
>  #define GPIO_MSK_OUT(t, x) (GPIO_REG(t, x) + t->soc->upper_offset + 0X20)
> +#define GPIO_MSK_DBC_EN(t, x)  (GPIO_REG(t, x) + t->soc->upper_offset + 0x30)
>  #define GPIO_MSK_INT_STA(t, x) (GPIO_REG(t, x) + t->soc->upper_offset + 0x40)
>  #define GPIO_MSK_INT_ENB(t, x) (GPIO_REG(t, x) + t->soc->upper_offset + 0x50)
>  #define GPIO_MSK_INT_LVL(t, x) (GPIO_REG(t, x) + t->soc->upper_offset + 0x60)
> @@ -67,6 +70,7 @@ struct tegra_gpio_bank {
> int bank;
> int irq;
> spinlock_t lvl_lock[4];
> +   spinlock_t dbc_lock[4]; /* Lock for updating debounce count register 
> */

I'm nit'ing here, but maybe one spinlock shared by all ports would be
enough? (the same would apply to lvl_lock, so feel free to do this as
a separate patch) I don't think we expect *that* many concurrent
accesses, do we?

>  #ifdef CONFIG_PM_SLEEP
> u32 cnf[4];
> u32 out[4];
> @@ -74,11 +78,14 @@ struct tegra_gpio_bank {
> u32 int_enb[4];
> u32 int_lvl[4];
> u32 wake_enb[4];
> +   u32 dbc_enb[4];
>  #endif
> +   u32 dbc_cnt[4];
> struct tegra_gpio_info *tgi;
>  };
>
>  struct tegra_gpio_soc_config {
> +   bool debounce_supported;
> u32 bank_stride;
> u32 upper_offset;
>  };
> @@ -182,6 +189,38 @@ static int tegra_gpio_direction_output(struct gpio_chip 
> *chip, unsigned offset,
> return 0;
>  }
>
> +static int tegra_gpio_set_debounce(struct gpio_chip *chip, unsigned int 
> offset,
> +  unsigned int debounce)
> +{
> +   struct tegra_gpio_info *tgi = gpiochip_get_data(chip);
> +   unsigned int debounce_ms = DIV_ROUND_UP(debounce, 1000);
> +   unsigned long flags;
> +   int port = GPIO_PORT(offset);
> +   int bank = GPIO_BANK(offset);

Maybe declare "bank" as follows:

struct tegra_gpio_bank *bank = >bank_info[GPIO_BANK(offset)];

This will allow you to simplify the code that follows:

> +
> +   if (!debounce_ms) {
> +   tegra_gpio_mask_write(tgi, GPIO_MSK_DBC_EN(tgi, offset),
> + offset, 0);
> +   return 0;
> +   }
> +
> +   debounce_ms = min(debounce_ms, 255U);
> +
> +   /* There is only one debounce count register per port and hence
> +* set the maximum of current and requested debounce time.
> +*/
> +   spin_lock_irqsave(>bank_info[bank].dbc_lock[port], flags);
> +   if (tgi->bank_info[bank].dbc_cnt[port] < debounce_ms) {
> +   tegra_gpio_writel(tgi, debounce_ms, GPIO_DBC_CNT(tgi, 
> offset));
> +   tgi->bank_info[bank].dbc_cnt[port] = debounce_ms;
> +   }
> +   spin_unlock_irqrestore(>bank_info[bank].dbc_lock[port], flags);

Becomes:

   spin_lock_irqsave(bank->dbc_lock[port], flags);
   if (bank->dbc_cnt[port] < debounce_ms) {
   tegra_gpio_writel(tgi, debounce_ms, GPIO_DBC_CNT(tgi, offset));
   bank->dbc_cnt[port] = debounce_ms;
   }
   spin_unlock_irqrestore(>dbc_lock[port], flags);

Which is nicer to the eyes.

Extra points if you initialize port and bank after we ensure that
debounce_ms is not zero and that their value will actually be used.

> +
> +   tegra_gpio_mask_write(tgi, GPIO_MSK_DBC_EN(tgi, offset), offset, 1);
> +
> +   return 0;
> +}
> +
>  static int tegra_gpio_to_irq(struct gpio_chip *chip, unsigned offset)
>  {
> struct tegra_gpio_info *tgi = gpiochip_get_data(chip);
> @@ -197,6 +236,7 @@ static struct gpio_chip tegra_gpio_chip = {
> .get= tegra_gpio_get,
>  

Re: [PATCH v2 0/6] Introduce ZONE_CMA

2016-04-24 Thread Joonsoo Kim
On Mon, Apr 25, 2016 at 02:21:04PM +0900, js1...@gmail.com wrote:
> From: Joonsoo Kim 
> 
> Hello,
> 
> Changes from v1
> o Separate some patches which deserve to submit independently
> o Modify description to reflect current kernel state
> (e.g. high-order watermark problem disappeared by Mel's work)
> o Don't increase SECTION_SIZE_BITS to make a room in page flags
> (detailed reason is on the patch that adds ZONE_CMA)
> o Adjust ZONE_CMA population code
> 
> This series try to solve problems of current CMA implementation.
> 
> CMA is introduced to provide physically contiguous pages at runtime
> without exclusive reserved memory area. But, current implementation
> works like as previous reserved memory approach, because freepages
> on CMA region are used only if there is no movable freepage. In other
> words, freepages on CMA region are only used as fallback. In that
> situation where freepages on CMA region are used as fallback, kswapd
> would be woken up easily since there is no unmovable and reclaimable
> freepage, too. If kswapd starts to reclaim memory, fallback allocation
> to MIGRATE_CMA doesn't occur any more since movable freepages are
> already refilled by kswapd and then most of freepage on CMA are left
> to be in free. This situation looks like exclusive reserved memory case.
> 
> In my experiment, I found that if system memory has 1024 MB memory and
> 512 MB is reserved for CMA, kswapd is mostly woken up when roughly 512 MB
> free memory is left. Detailed reason is that for keeping enough free
> memory for unmovable and reclaimable allocation, kswapd uses below
> equation when calculating free memory and it easily go under the watermark.
> 
> Free memory for unmovable and reclaimable = Free total - Free CMA pages
> 
> This is derivated from the property of CMA freepage that CMA freepage
> can't be used for unmovable and reclaimable allocation.
> 
> Anyway, in this case, kswapd are woken up when (FreeTotal - FreeCMA)
> is lower than low watermark and tries to make free memory until
> (FreeTotal - FreeCMA) is higher than high watermark. That results
> in that FreeTotal is moving around 512MB boundary consistently. It
> then means that we can't utilize full memory capacity.
> 
> To fix this problem, I submitted some patches [1] about 10 months ago,
> but, found some more problems to be fixed before solving this problem.
> It requires many hooks in allocator hotpath so some developers doesn't
> like it. Instead, some of them suggest different approach [2] to fix
> all the problems related to CMA, that is, introducing a new zone to deal
> with free CMA pages. I agree that it is the best way to go so implement
> here. Although properties of ZONE_MOVABLE and ZONE_CMA is similar, I
> decide to add a new zone rather than piggyback on ZONE_MOVABLE since
> they have some differences. First, reserved CMA pages should not be
> offlined. If freepage for CMA is managed by ZONE_MOVABLE, we need to keep
> MIGRATE_CMA migratetype and insert many hooks on memory hotplug code
> to distiguish hotpluggable memory and reserved memory for CMA in the same
> zone. It would make memory hotplug code which is already complicated
> more complicated. Second, cma_alloc() can be called more frequently
> than memory hotplug operation and possibly we need to control
> allocation rate of ZONE_CMA to optimize latency in the future.
> In this case, separate zone approach is easy to modify. Third, I'd
> like to see statistics for CMA, separately. Sometimes, we need to debug
> why cma_alloc() is failed and separate statistics would be more helpful
> in this situtaion.
> 
> Anyway, this patchset solves four problems related to CMA implementation.
> 
> 1) Utilization problem
> As mentioned above, we can't utilize full memory capacity due to the
> limitation of CMA freepage and fallback policy. This patchset implements
> a new zone for CMA and uses it for GFP_HIGHUSER_MOVABLE request. This
> typed allocation is used for page cache and anonymous pages which
> occupies most of memory usage in normal case so we can utilize full
> memory capacity. Below is the experiment result about this problem.
> 
> 8 CPUs, 1024 MB, VIRTUAL MACHINE
> make -j16
> 
> 
> CMA reserve:0 MB512 MB
> Elapsed-time:   92.4  186.5
> pswpin: 8218647
> pswpout:160   69839
> 
> 
> CMA reserve:0 MB512 MB
> Elapsed-time:   93.1  93.4
> pswpin: 8446
> pswpout:183   92
> 
> FYI, there is another attempt [3] trying to solve this problem in lkml.
> And, as far as I know, Qualcomm also has out-of-tree solution for this
> problem.
> 
> 2) Reclaim problem
> Currently, there is no logic to distinguish CMA pages in reclaim path.
> If reclaim is initiated for unmovable and reclaimable allocation,
> reclaiming CMA pages doesn't help to satisfy the request and reclaiming
> 

Re: [PATCH v2 0/6] Introduce ZONE_CMA

2016-04-24 Thread Joonsoo Kim
On Mon, Apr 25, 2016 at 02:21:04PM +0900, js1...@gmail.com wrote:
> From: Joonsoo Kim 
> 
> Hello,
> 
> Changes from v1
> o Separate some patches which deserve to submit independently
> o Modify description to reflect current kernel state
> (e.g. high-order watermark problem disappeared by Mel's work)
> o Don't increase SECTION_SIZE_BITS to make a room in page flags
> (detailed reason is on the patch that adds ZONE_CMA)
> o Adjust ZONE_CMA population code
> 
> This series try to solve problems of current CMA implementation.
> 
> CMA is introduced to provide physically contiguous pages at runtime
> without exclusive reserved memory area. But, current implementation
> works like as previous reserved memory approach, because freepages
> on CMA region are used only if there is no movable freepage. In other
> words, freepages on CMA region are only used as fallback. In that
> situation where freepages on CMA region are used as fallback, kswapd
> would be woken up easily since there is no unmovable and reclaimable
> freepage, too. If kswapd starts to reclaim memory, fallback allocation
> to MIGRATE_CMA doesn't occur any more since movable freepages are
> already refilled by kswapd and then most of freepage on CMA are left
> to be in free. This situation looks like exclusive reserved memory case.
> 
> In my experiment, I found that if system memory has 1024 MB memory and
> 512 MB is reserved for CMA, kswapd is mostly woken up when roughly 512 MB
> free memory is left. Detailed reason is that for keeping enough free
> memory for unmovable and reclaimable allocation, kswapd uses below
> equation when calculating free memory and it easily go under the watermark.
> 
> Free memory for unmovable and reclaimable = Free total - Free CMA pages
> 
> This is derivated from the property of CMA freepage that CMA freepage
> can't be used for unmovable and reclaimable allocation.
> 
> Anyway, in this case, kswapd are woken up when (FreeTotal - FreeCMA)
> is lower than low watermark and tries to make free memory until
> (FreeTotal - FreeCMA) is higher than high watermark. That results
> in that FreeTotal is moving around 512MB boundary consistently. It
> then means that we can't utilize full memory capacity.
> 
> To fix this problem, I submitted some patches [1] about 10 months ago,
> but, found some more problems to be fixed before solving this problem.
> It requires many hooks in allocator hotpath so some developers doesn't
> like it. Instead, some of them suggest different approach [2] to fix
> all the problems related to CMA, that is, introducing a new zone to deal
> with free CMA pages. I agree that it is the best way to go so implement
> here. Although properties of ZONE_MOVABLE and ZONE_CMA is similar, I
> decide to add a new zone rather than piggyback on ZONE_MOVABLE since
> they have some differences. First, reserved CMA pages should not be
> offlined. If freepage for CMA is managed by ZONE_MOVABLE, we need to keep
> MIGRATE_CMA migratetype and insert many hooks on memory hotplug code
> to distiguish hotpluggable memory and reserved memory for CMA in the same
> zone. It would make memory hotplug code which is already complicated
> more complicated. Second, cma_alloc() can be called more frequently
> than memory hotplug operation and possibly we need to control
> allocation rate of ZONE_CMA to optimize latency in the future.
> In this case, separate zone approach is easy to modify. Third, I'd
> like to see statistics for CMA, separately. Sometimes, we need to debug
> why cma_alloc() is failed and separate statistics would be more helpful
> in this situtaion.
> 
> Anyway, this patchset solves four problems related to CMA implementation.
> 
> 1) Utilization problem
> As mentioned above, we can't utilize full memory capacity due to the
> limitation of CMA freepage and fallback policy. This patchset implements
> a new zone for CMA and uses it for GFP_HIGHUSER_MOVABLE request. This
> typed allocation is used for page cache and anonymous pages which
> occupies most of memory usage in normal case so we can utilize full
> memory capacity. Below is the experiment result about this problem.
> 
> 8 CPUs, 1024 MB, VIRTUAL MACHINE
> make -j16
> 
> 
> CMA reserve:0 MB512 MB
> Elapsed-time:   92.4  186.5
> pswpin: 8218647
> pswpout:160   69839
> 
> 
> CMA reserve:0 MB512 MB
> Elapsed-time:   93.1  93.4
> pswpin: 8446
> pswpout:183   92
> 
> FYI, there is another attempt [3] trying to solve this problem in lkml.
> And, as far as I know, Qualcomm also has out-of-tree solution for this
> problem.
> 
> 2) Reclaim problem
> Currently, there is no logic to distinguish CMA pages in reclaim path.
> If reclaim is initiated for unmovable and reclaimable allocation,
> reclaiming CMA pages doesn't help to satisfy the request and reclaiming
> CMA page is just waste. 

Re: next: suspicious RCU usage message since commit 'rcu: Remove superfluous versions of rcu_read_lock_sched_held()'

2016-04-24 Thread Paul E. McKenney
On Sun, Apr 24, 2016 at 04:56:38PM -0700, Guenter Roeck wrote:
> Hi Paul,
> 
> On 04/24/2016 02:31 PM, Paul E. McKenney wrote:
> >On Sun, Apr 24, 2016 at 02:14:24PM -0700, Guenter Roeck wrote:
> >>Hi,
> >>
> >>I see the following log message when running a qemu test for 'beagle'
> >>with omap2plus_defconfig.
> >>
> >>===
> >>[ INFO: suspicious RCU usage. ]
> >>4.6.0-rc4-next-20160422 #1 Not tainted
> >>---
> >>include/trace/events/power.h:328 suspicious rcu_dereference_check() usage!
> >>
> >>other info that might help us debug this:
> >>
> >>RCU used illegally from idle CPU!
> >>rcu_scheduler_active = 1, debug_locks = 0
> >>RCU used illegally from extended quiescent state!
> >>no locks held by swapper/0/0.
> >>
> >>stack backtrace:
> >>CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6.0-rc4-next-20160422 #1
> >>Hardware name: Generic OMAP3-GP (Flattened Device Tree)
> >>[] (unwind_backtrace) from [] (show_stack+0x10/0x14)
> >>[] (show_stack) from [] (dump_stack+0xa8/0xe0)
> >>[] (dump_stack) from [] 
> >>(pwrdm_set_next_pwrst+0xf8/0x1cc)
> >>[] (pwrdm_set_next_pwrst) from [] 
> >>(omap3_enter_idle_bm+0x1b8/0x1e8)
> >>[] (omap3_enter_idle_bm) from [] 
> >>(cpuidle_enter_state+0x84/0x408)
> >>[] (cpuidle_enter_state) from [] 
> >>(cpu_startup_entry+0x1c8/0x3f0)
> >>[] (cpu_startup_entry) from [] 
> >>(start_kernel+0x354/0x3cc)
> >>
> >>bisect points to commit 'rcu: Remove superfluous versions of
> >>rcu_read_lock_sched_held()'. Bisect log is attached.
> >
> >I believe that the real fix is not a revert of that commit, but rather
> >that some of the tracing statements need an "_rcuidle" suffix.
> >
> >Something like the following (untested, probably does not build) patch.
> >
> > Thanx, Paul
> >
> >
> >
> >commit ca91304178e1cf53ee391236a0ac3969cc814e5f
> >Author: Paul E. McKenney 
> >Date:   Sun Apr 24 14:30:16 2016 -0700
> >
> > arm: Use _rcuidle tracepoint to allow use from idle
> >
> > Signed-off-by: Paul E. McKenney 
> >
> >diff --git a/arch/arm/mach-omap2/powerdomain.c 
> >b/arch/arm/mach-omap2/powerdomain.c
> >index 78af6d8cf2e2..12b66b5bcc55 100644
> >--- a/arch/arm/mach-omap2/powerdomain.c
> >+++ b/arch/arm/mach-omap2/powerdomain.c
> >@@ -523,8 +523,8 @@ int pwrdm_set_next_pwrst(struct powerdomain *pwrdm, u8 
> >pwrst)
> >
> > if (arch_pwrdm && arch_pwrdm->pwrdm_set_next_pwrst) {
> > /* Trace the pwrdm desired target state */
> >-trace_power_domain_target(pwrdm->name, pwrst,
> >-  smp_processor_id());
> >+trace_power_domain_target_rcuidle(pwrdm->name, pwrst,
> >+  smp_processor_id());
> > /* Program the pwrdm desired target state */
> > ret = arch_pwrdm->pwrdm_set_next_pwrst(pwrdm, pwrst);
> > }
> >
> 
> It does build. After applying it, I get a different traceback.
> 
> [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
> [] (show_stack) from [] (dump_stack+0xa8/0xe0)
> [] (dump_stack) from [] (_pwrdm_state_switch+0x188/0x32c)
> [] (_pwrdm_state_switch) from [] 
> (_pwrdm_post_transition_cb+0xc/0x14)
> [] (_pwrdm_post_transition_cb) from [] 
> (pwrdm_for_each+0x30/0x5c)
> [] (pwrdm_for_each) from [] 
> (pwrdm_post_transition+0x24/0x30)
> [] (pwrdm_post_transition) from [] 
> (omap_sram_idle+0xfc/0x240)
> [] (omap_sram_idle) from [] 
> (omap3_enter_idle_bm+0xf0/0x1e8)
> [] (omap3_enter_idle_bm) from [] 
> (cpuidle_enter_state+0x84/0x408)
> [] (cpuidle_enter_state) from [] 
> (cpu_startup_entry+0x1c8/0x3f0)
> [] (cpu_startup_entry) from [] (start_kernel+0x354/0x3cc)
> 
> After making the same change in _pwrdm_state_switch(), the traceback is gone
> from my tests (beagle, beagle-xm, and overo-tobi).

Very good!

(And yes, you normally find these one at a time...)

Thanx, Paul



Re: next: suspicious RCU usage message since commit 'rcu: Remove superfluous versions of rcu_read_lock_sched_held()'

2016-04-24 Thread Paul E. McKenney
On Sun, Apr 24, 2016 at 04:56:38PM -0700, Guenter Roeck wrote:
> Hi Paul,
> 
> On 04/24/2016 02:31 PM, Paul E. McKenney wrote:
> >On Sun, Apr 24, 2016 at 02:14:24PM -0700, Guenter Roeck wrote:
> >>Hi,
> >>
> >>I see the following log message when running a qemu test for 'beagle'
> >>with omap2plus_defconfig.
> >>
> >>===
> >>[ INFO: suspicious RCU usage. ]
> >>4.6.0-rc4-next-20160422 #1 Not tainted
> >>---
> >>include/trace/events/power.h:328 suspicious rcu_dereference_check() usage!
> >>
> >>other info that might help us debug this:
> >>
> >>RCU used illegally from idle CPU!
> >>rcu_scheduler_active = 1, debug_locks = 0
> >>RCU used illegally from extended quiescent state!
> >>no locks held by swapper/0/0.
> >>
> >>stack backtrace:
> >>CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6.0-rc4-next-20160422 #1
> >>Hardware name: Generic OMAP3-GP (Flattened Device Tree)
> >>[] (unwind_backtrace) from [] (show_stack+0x10/0x14)
> >>[] (show_stack) from [] (dump_stack+0xa8/0xe0)
> >>[] (dump_stack) from [] 
> >>(pwrdm_set_next_pwrst+0xf8/0x1cc)
> >>[] (pwrdm_set_next_pwrst) from [] 
> >>(omap3_enter_idle_bm+0x1b8/0x1e8)
> >>[] (omap3_enter_idle_bm) from [] 
> >>(cpuidle_enter_state+0x84/0x408)
> >>[] (cpuidle_enter_state) from [] 
> >>(cpu_startup_entry+0x1c8/0x3f0)
> >>[] (cpu_startup_entry) from [] 
> >>(start_kernel+0x354/0x3cc)
> >>
> >>bisect points to commit 'rcu: Remove superfluous versions of
> >>rcu_read_lock_sched_held()'. Bisect log is attached.
> >
> >I believe that the real fix is not a revert of that commit, but rather
> >that some of the tracing statements need an "_rcuidle" suffix.
> >
> >Something like the following (untested, probably does not build) patch.
> >
> > Thanx, Paul
> >
> >
> >
> >commit ca91304178e1cf53ee391236a0ac3969cc814e5f
> >Author: Paul E. McKenney 
> >Date:   Sun Apr 24 14:30:16 2016 -0700
> >
> > arm: Use _rcuidle tracepoint to allow use from idle
> >
> > Signed-off-by: Paul E. McKenney 
> >
> >diff --git a/arch/arm/mach-omap2/powerdomain.c 
> >b/arch/arm/mach-omap2/powerdomain.c
> >index 78af6d8cf2e2..12b66b5bcc55 100644
> >--- a/arch/arm/mach-omap2/powerdomain.c
> >+++ b/arch/arm/mach-omap2/powerdomain.c
> >@@ -523,8 +523,8 @@ int pwrdm_set_next_pwrst(struct powerdomain *pwrdm, u8 
> >pwrst)
> >
> > if (arch_pwrdm && arch_pwrdm->pwrdm_set_next_pwrst) {
> > /* Trace the pwrdm desired target state */
> >-trace_power_domain_target(pwrdm->name, pwrst,
> >-  smp_processor_id());
> >+trace_power_domain_target_rcuidle(pwrdm->name, pwrst,
> >+  smp_processor_id());
> > /* Program the pwrdm desired target state */
> > ret = arch_pwrdm->pwrdm_set_next_pwrst(pwrdm, pwrst);
> > }
> >
> 
> It does build. After applying it, I get a different traceback.
> 
> [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
> [] (show_stack) from [] (dump_stack+0xa8/0xe0)
> [] (dump_stack) from [] (_pwrdm_state_switch+0x188/0x32c)
> [] (_pwrdm_state_switch) from [] 
> (_pwrdm_post_transition_cb+0xc/0x14)
> [] (_pwrdm_post_transition_cb) from [] 
> (pwrdm_for_each+0x30/0x5c)
> [] (pwrdm_for_each) from [] 
> (pwrdm_post_transition+0x24/0x30)
> [] (pwrdm_post_transition) from [] 
> (omap_sram_idle+0xfc/0x240)
> [] (omap_sram_idle) from [] 
> (omap3_enter_idle_bm+0xf0/0x1e8)
> [] (omap3_enter_idle_bm) from [] 
> (cpuidle_enter_state+0x84/0x408)
> [] (cpuidle_enter_state) from [] 
> (cpu_startup_entry+0x1c8/0x3f0)
> [] (cpu_startup_entry) from [] (start_kernel+0x354/0x3cc)
> 
> After making the same change in _pwrdm_state_switch(), the traceback is gone
> from my tests (beagle, beagle-xm, and overo-tobi).

Very good!

(And yes, you normally find these one at a time...)

Thanx, Paul



[PATCH v2 3/6] mm/cma: populate ZONE_CMA

2016-04-24 Thread js1304
From: Joonsoo Kim 

Until now, reserved pages for CMA are managed in the ordinary zones
where page's pfn are belong to. This approach has numorous problems
and fixing them isn't easy. (It is mentioned on previous patch.)
To fix this situation, ZONE_CMA is introduced in previous patch, but,
not yet populated. This patch implement population of ZONE_CMA
by stealing reserved pages from the ordinary zones.

Unlike previous implementation that kernel allocation request with
__GFP_MOVABLE could be serviced from CMA region, allocation request only
with GFP_HIGHUSER_MOVABLE can be serviced from CMA region in the new
approach. This is an inevitable design decision to use the zone
implementation because ZONE_CMA could contain highmem. Due to this
decision, ZONE_CMA will work like as ZONE_HIGHMEM or ZONE_MOVABLE.

I don't think it would be a problem because most of file cache pages
and anonymous pages are requested with GFP_HIGHUSER_MOVABLE. It could
be proved by the fact that there are many systems with ZONE_HIGHMEM and
they work fine. Notable disadvantage is that we cannot use these pages
for blockdev file cache page, because it usually has __GFP_MOVABLE but
not __GFP_HIGHMEM and __GFP_USER. But, in this case, there is pros and
cons. In my experience, blockdev file cache pages are one of the top
reason that causes cma_alloc() to fail temporarily. So, we can get more
guarantee of cma_alloc() success by discarding that case.

Implementation itself is very easy to understand. Steal when cma area is
initialized and recalculate various per zone stat/threshold.

Signed-off-by: Joonsoo Kim 
---
 include/linux/memory_hotplug.h |  3 ---
 mm/cma.c   | 41 +
 mm/internal.h  |  3 +++
 mm/page_alloc.c| 26 --
 4 files changed, 68 insertions(+), 5 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 20d8a5d..260c741 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -198,9 +198,6 @@ void put_online_mems(void);
 void mem_hotplug_begin(void);
 void mem_hotplug_done(void);
 
-extern void set_zone_contiguous(struct zone *zone);
-extern void clear_zone_contiguous(struct zone *zone);
-
 #else /* ! CONFIG_MEMORY_HOTPLUG */
 /*
  * Stub functions for when hotplug is off
diff --git a/mm/cma.c b/mm/cma.c
index ea506eb..8684f50 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -38,6 +38,7 @@
 #include 
 
 #include "cma.h"
+#include "internal.h"
 
 struct cma cma_areas[MAX_CMA_AREAS];
 unsigned cma_area_count;
@@ -145,6 +146,11 @@ err:
 static int __init cma_init_reserved_areas(void)
 {
int i;
+   struct zone *zone;
+   unsigned long start_pfn = UINT_MAX, end_pfn = 0;
+
+   if (!cma_area_count)
+   return 0;
 
for (i = 0; i < cma_area_count; i++) {
int ret = cma_activate_area(_areas[i]);
@@ -153,6 +159,41 @@ static int __init cma_init_reserved_areas(void)
return ret;
}
 
+   for (i = 0; i < cma_area_count; i++) {
+   if (start_pfn > cma_areas[i].base_pfn)
+   start_pfn = cma_areas[i].base_pfn;
+   if (end_pfn < cma_areas[i].base_pfn + cma_areas[i].count)
+   end_pfn = cma_areas[i].base_pfn + cma_areas[i].count;
+   }
+
+   for_each_populated_zone(zone) {
+   if (!is_zone_cma(zone))
+   continue;
+
+   /* ZONE_CMA doesn't need to exceed CMA region */
+   zone->zone_start_pfn = max(zone->zone_start_pfn, start_pfn);
+   zone->spanned_pages = min(zone_end_pfn(zone), end_pfn) -
+   zone->zone_start_pfn;
+   }
+
+   /*
+* Reserved pages for ZONE_CMA are now activated and this would change
+* ZONE_CMA's managed page counter and other zone's present counter.
+* We need to re-calculate various zone information that depends on
+* this initialization.
+*/
+   build_all_zonelists(NULL, NULL);
+   for_each_populated_zone(zone) {
+   zone_pcp_update(zone);
+   set_zone_contiguous(zone);
+   }
+
+   /*
+* We need to re-init per zone wmark by calling
+* init_per_zone_wmark_min() but doesn't call here because it is
+* registered on module_init and it will be called later than us.
+*/
+
return 0;
 }
 core_initcall(cma_init_reserved_areas);
diff --git a/mm/internal.h b/mm/internal.h
index e30f40e..64e3131 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -156,6 +156,9 @@ extern void __free_pages_bootmem(struct page *page, 
unsigned long pfn,
 extern void prep_compound_page(struct page *page, unsigned int order);
 extern int user_min_free_kbytes;
 
+extern void set_zone_contiguous(struct zone *zone);
+extern void clear_zone_contiguous(struct zone 

[PATCH v2 3/6] mm/cma: populate ZONE_CMA

2016-04-24 Thread js1304
From: Joonsoo Kim 

Until now, reserved pages for CMA are managed in the ordinary zones
where page's pfn are belong to. This approach has numorous problems
and fixing them isn't easy. (It is mentioned on previous patch.)
To fix this situation, ZONE_CMA is introduced in previous patch, but,
not yet populated. This patch implement population of ZONE_CMA
by stealing reserved pages from the ordinary zones.

Unlike previous implementation that kernel allocation request with
__GFP_MOVABLE could be serviced from CMA region, allocation request only
with GFP_HIGHUSER_MOVABLE can be serviced from CMA region in the new
approach. This is an inevitable design decision to use the zone
implementation because ZONE_CMA could contain highmem. Due to this
decision, ZONE_CMA will work like as ZONE_HIGHMEM or ZONE_MOVABLE.

I don't think it would be a problem because most of file cache pages
and anonymous pages are requested with GFP_HIGHUSER_MOVABLE. It could
be proved by the fact that there are many systems with ZONE_HIGHMEM and
they work fine. Notable disadvantage is that we cannot use these pages
for blockdev file cache page, because it usually has __GFP_MOVABLE but
not __GFP_HIGHMEM and __GFP_USER. But, in this case, there is pros and
cons. In my experience, blockdev file cache pages are one of the top
reason that causes cma_alloc() to fail temporarily. So, we can get more
guarantee of cma_alloc() success by discarding that case.

Implementation itself is very easy to understand. Steal when cma area is
initialized and recalculate various per zone stat/threshold.

Signed-off-by: Joonsoo Kim 
---
 include/linux/memory_hotplug.h |  3 ---
 mm/cma.c   | 41 +
 mm/internal.h  |  3 +++
 mm/page_alloc.c| 26 --
 4 files changed, 68 insertions(+), 5 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 20d8a5d..260c741 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -198,9 +198,6 @@ void put_online_mems(void);
 void mem_hotplug_begin(void);
 void mem_hotplug_done(void);
 
-extern void set_zone_contiguous(struct zone *zone);
-extern void clear_zone_contiguous(struct zone *zone);
-
 #else /* ! CONFIG_MEMORY_HOTPLUG */
 /*
  * Stub functions for when hotplug is off
diff --git a/mm/cma.c b/mm/cma.c
index ea506eb..8684f50 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -38,6 +38,7 @@
 #include 
 
 #include "cma.h"
+#include "internal.h"
 
 struct cma cma_areas[MAX_CMA_AREAS];
 unsigned cma_area_count;
@@ -145,6 +146,11 @@ err:
 static int __init cma_init_reserved_areas(void)
 {
int i;
+   struct zone *zone;
+   unsigned long start_pfn = UINT_MAX, end_pfn = 0;
+
+   if (!cma_area_count)
+   return 0;
 
for (i = 0; i < cma_area_count; i++) {
int ret = cma_activate_area(_areas[i]);
@@ -153,6 +159,41 @@ static int __init cma_init_reserved_areas(void)
return ret;
}
 
+   for (i = 0; i < cma_area_count; i++) {
+   if (start_pfn > cma_areas[i].base_pfn)
+   start_pfn = cma_areas[i].base_pfn;
+   if (end_pfn < cma_areas[i].base_pfn + cma_areas[i].count)
+   end_pfn = cma_areas[i].base_pfn + cma_areas[i].count;
+   }
+
+   for_each_populated_zone(zone) {
+   if (!is_zone_cma(zone))
+   continue;
+
+   /* ZONE_CMA doesn't need to exceed CMA region */
+   zone->zone_start_pfn = max(zone->zone_start_pfn, start_pfn);
+   zone->spanned_pages = min(zone_end_pfn(zone), end_pfn) -
+   zone->zone_start_pfn;
+   }
+
+   /*
+* Reserved pages for ZONE_CMA are now activated and this would change
+* ZONE_CMA's managed page counter and other zone's present counter.
+* We need to re-calculate various zone information that depends on
+* this initialization.
+*/
+   build_all_zonelists(NULL, NULL);
+   for_each_populated_zone(zone) {
+   zone_pcp_update(zone);
+   set_zone_contiguous(zone);
+   }
+
+   /*
+* We need to re-init per zone wmark by calling
+* init_per_zone_wmark_min() but doesn't call here because it is
+* registered on module_init and it will be called later than us.
+*/
+
return 0;
 }
 core_initcall(cma_init_reserved_areas);
diff --git a/mm/internal.h b/mm/internal.h
index e30f40e..64e3131 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -156,6 +156,9 @@ extern void __free_pages_bootmem(struct page *page, 
unsigned long pfn,
 extern void prep_compound_page(struct page *page, unsigned int order);
 extern int user_min_free_kbytes;
 
+extern void set_zone_contiguous(struct zone *zone);
+extern void clear_zone_contiguous(struct zone *zone);
+
 #if defined CONFIG_COMPACTION || 

[PATCH v2 4/6] mm/cma: remove ALLOC_CMA

2016-04-24 Thread js1304
From: Joonsoo Kim 

Now, all reserved pages for CMA region are belong to the ZONE_CMA
and it only serves for GFP_HIGHUSER_MOVABLE. Therefore, we don't need to
consider ALLOC_CMA at all.

Signed-off-by: Joonsoo Kim 
---
 mm/internal.h   |  3 +--
 mm/page_alloc.c | 18 ++
 2 files changed, 3 insertions(+), 18 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 64e3131..a25d45b 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -478,8 +478,7 @@ unsigned long reclaim_clean_pages_from_list(struct zone 
*zone,
 #define ALLOC_HARDER   0x10 /* try to alloc harder */
 #define ALLOC_HIGH 0x20 /* __GFP_HIGH set */
 #define ALLOC_CPUSET   0x40 /* check for correct cpuset */
-#define ALLOC_CMA  0x80 /* allow allocations from CMA areas */
-#define ALLOC_FAIR 0x100 /* fair zone allocation */
+#define ALLOC_FAIR 0x80 /* fair zone allocation */
 
 enum ttu_flags;
 struct tlbflush_unmap_batch;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0a6a195..69546b7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2582,12 +2582,6 @@ static bool __zone_watermark_ok(struct zone *z, unsigned 
int order,
else
min -= min / 4;
 
-#ifdef CONFIG_CMA
-   /* If allocation can't use CMA areas don't use free CMA pages */
-   if (!(alloc_flags & ALLOC_CMA))
-   free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES);
-#endif
-
/*
 * Check watermarks for an order-0 allocation request. If these
 * are not met, then a high-order request also cannot go ahead
@@ -2617,10 +2611,8 @@ static bool __zone_watermark_ok(struct zone *z, unsigned 
int order,
}
 
 #ifdef CONFIG_CMA
-   if ((alloc_flags & ALLOC_CMA) &&
-   !list_empty(>free_list[MIGRATE_CMA])) {
+   if (!list_empty(>free_list[MIGRATE_CMA]))
return true;
-   }
 #endif
}
return false;
@@ -3217,10 +3209,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
 unlikely(test_thread_flag(TIF_MEMDIE
alloc_flags |= ALLOC_NO_WATERMARKS;
}
-#ifdef CONFIG_CMA
-   if (gfpflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
-   alloc_flags |= ALLOC_CMA;
-#endif
+
return alloc_flags;
 }
 
@@ -3573,9 +3562,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
if (unlikely(!zonelist->_zonerefs->zone))
return NULL;
 
-   if (IS_ENABLED(CONFIG_CMA) && ac.migratetype == MIGRATE_MOVABLE)
-   alloc_flags |= ALLOC_CMA;
-
 retry_cpuset:
cpuset_mems_cookie = read_mems_allowed_begin();
 
-- 
1.9.1



[PATCH v2 6/6] mm/cma: remove per zone CMA stat

2016-04-24 Thread js1304
From: Joonsoo Kim 

Now, all reserved pages for CMA region are belong to the ZONE_CMA
so we don't need to maintain CMA stat in other zones. Remove it.

Signed-off-by: Joonsoo Kim 
---
 fs/proc/meminfo.c  |  2 +-
 include/linux/cma.h|  6 ++
 include/linux/mmzone.h |  1 -
 mm/cma.c   | 15 +++
 mm/page_alloc.c|  5 ++---
 mm/vmstat.c|  1 -
 6 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index ae5cc52..51449d0 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -172,7 +172,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 #endif
 #ifdef CONFIG_CMA
, K(totalcma_pages)
-   , K(global_page_state(NR_FREE_CMA_PAGES))
+   , K(cma_get_free())
 #endif
);
 
diff --git a/include/linux/cma.h b/include/linux/cma.h
index 29f9e77..816290c 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -28,4 +28,10 @@ extern int cma_init_reserved_mem(phys_addr_t base, 
phys_addr_t size,
struct cma **res_cma);
 extern struct page *cma_alloc(struct cma *cma, size_t count, unsigned int 
align);
 extern bool cma_release(struct cma *cma, const struct page *pages, unsigned 
int count);
+
+#ifdef CONFIG_CMA
+extern unsigned long cma_get_free(void);
+#else
+static inline unsigned long cma_get_free(void) { return 0; }
+#endif
 #endif
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 75b41c5..3996a7c 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -140,7 +140,6 @@ enum zone_stat_item {
NR_SHMEM_HUGEPAGES, /* transparent shmem huge pages */
NR_SHMEM_PMDMAPPED, /* shmem huge pages currently mapped hugely */
NR_SHMEM_FREEHOLES, /* unused memory of high-order allocations */
-   NR_FREE_CMA_PAGES,
NR_VM_ZONE_STAT_ITEMS };
 
 /*
diff --git a/mm/cma.c b/mm/cma.c
index bd436e4..6dbddf2 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -54,6 +54,21 @@ unsigned long cma_get_size(const struct cma *cma)
return cma->count << PAGE_SHIFT;
 }
 
+unsigned long cma_get_free(void)
+{
+   struct zone *zone;
+   unsigned long freecma = 0;
+
+   for_each_populated_zone(zone) {
+   if (!is_zone_cma(zone))
+   continue;
+
+   freecma += zone_page_state(zone, NR_FREE_PAGES);
+   }
+
+   return freecma;
+}
+
 static unsigned long cma_bitmap_aligned_mask(const struct cma *cma,
 int align_order)
 {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 51b2b0c..570edad 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -63,6 +63,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -4107,7 +4108,7 @@ void show_free_areas(unsigned int filter)
global_page_state(NR_SHMEM_FREEHOLES),
global_page_state(NR_FREE_PAGES),
free_pcp,
-   global_page_state(NR_FREE_CMA_PAGES));
+   cma_get_free());
 
for_each_populated_zone(zone) {
int i;
@@ -4150,7 +4151,6 @@ void show_free_areas(unsigned int filter)
" bounce:%lukB"
" free_pcp:%lukB"
" local_pcp:%ukB"
-   " free_cma:%lukB"
" writeback_tmp:%lukB"
" pages_scanned:%lu"
" all_unreclaimable? %s"
@@ -4188,7 +4188,6 @@ void show_free_areas(unsigned int filter)
K(zone_page_state(zone, NR_BOUNCE)),
K(free_pcp),
K(this_cpu_read(zone->pageset->pcp.count)),
-   K(zone_page_state(zone, NR_FREE_CMA_PAGES)),
K(zone_page_state(zone, NR_WRITEBACK_TEMP)),
K(zone_page_state(zone, NR_PAGES_SCANNED)),
(!zone_reclaimable(zone) ? "yes" : "no")
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 39a0c3c..81acdae 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -766,7 +766,6 @@ const char * const vmstat_text[] = {
"nr_shmem_hugepages",
"nr_shmem_pmdmapped",
"nr_shmem_freeholes",
-   "nr_free_cma",
 
/* enum writeback_stat_item counters */
"nr_dirty_threshold",
-- 
1.9.1



[PATCH v2 5/6] mm/cma: remove MIGRATE_CMA

2016-04-24 Thread js1304
From: Joonsoo Kim 

Now, all reserved pages for CMA region are belong to the ZONE_CMA
and there is no other type of pages. Therefore, we don't need to
use MIGRATE_CMA to distinguish and handle differently for CMA pages
and ordinary pages. Remove MIGRATE_CMA.

Unfortunately, this patch make free CMA counter incorrect because
we count it when pages are on the MIGRATE_CMA. It will be fixed
by next patch. I can squash next patch here but it makes changes
complicated and hard to review so I separate that.

Signed-off-by: Joonsoo Kim 
---
 include/linux/gfp.h|  3 +-
 include/linux/mmzone.h | 22 
 include/linux/vmstat.h |  8 -
 mm/cma.c   |  2 +-
 mm/compaction.c| 10 ++
 mm/hugetlb.c   |  2 +-
 mm/page_alloc.c| 90 ++
 mm/page_isolation.c|  5 ++-
 mm/vmstat.c|  5 +--
 9 files changed, 32 insertions(+), 115 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 4d6c008..1a3b869 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -559,8 +559,7 @@ static inline bool pm_suspended_storage(void)
 
 #if (defined(CONFIG_MEMORY_ISOLATION) && defined(CONFIG_COMPACTION)) || 
defined(CONFIG_CMA)
 /* The below functions must be run on a range from a single zone. */
-extern int alloc_contig_range(unsigned long start, unsigned long end,
- unsigned migratetype);
+extern int alloc_contig_range(unsigned long start, unsigned long end);
 extern void free_contig_range(unsigned long pfn, unsigned nr_pages);
 #endif
 
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 5c97ba9..75b41c5 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -41,22 +41,6 @@ enum {
MIGRATE_RECLAIMABLE,
MIGRATE_PCPTYPES,   /* the number of types on the pcp lists */
MIGRATE_HIGHATOMIC = MIGRATE_PCPTYPES,
-#ifdef CONFIG_CMA
-   /*
-* MIGRATE_CMA migration type is designed to mimic the way
-* ZONE_MOVABLE works.  Only movable pages can be allocated
-* from MIGRATE_CMA pageblocks and page allocator never
-* implicitly change migration type of MIGRATE_CMA pageblock.
-*
-* The way to use it is to change migratetype of a range of
-* pageblocks to MIGRATE_CMA which can be done by
-* __free_pageblock_cma() function.  What is important though
-* is that a range of pageblocks must be aligned to
-* MAX_ORDER_NR_PAGES should biggest page be bigger then
-* a single pageblock.
-*/
-   MIGRATE_CMA,
-#endif
 #ifdef CONFIG_MEMORY_ISOLATION
MIGRATE_ISOLATE,/* can't allocate from here */
 #endif
@@ -66,12 +50,6 @@ enum {
 /* In mm/page_alloc.c; keep in sync also with show_migration_types() there */
 extern char * const migratetype_names[MIGRATE_TYPES];
 
-#ifdef CONFIG_CMA
-#  define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
-#else
-#  define is_migrate_cma(migratetype) false
-#endif
-
 #define for_each_migratetype_order(order, type) \
for (order = 0; order < MAX_ORDER; order++) \
for (type = 0; type < MIGRATE_TYPES; type++)
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 02fce41..6ddf080 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -260,14 +260,6 @@ static inline void drain_zonestat(struct zone *zone,
struct per_cpu_pageset *pset) { }
 #endif /* CONFIG_SMP */
 
-static inline void __mod_zone_freepage_state(struct zone *zone, int nr_pages,
-int migratetype)
-{
-   __mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages);
-   if (is_migrate_cma(migratetype))
-   __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, nr_pages);
-}
-
 extern const char * const vmstat_text[];
 
 #endif /* _LINUX_VMSTAT_H */
diff --git a/mm/cma.c b/mm/cma.c
index 8684f50..bd436e4 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -444,7 +444,7 @@ struct page *cma_alloc(struct cma *cma, size_t count, 
unsigned int align)
 
pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
mutex_lock(_mutex);
-   ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA);
+   ret = alloc_contig_range(pfn, pfn + count);
mutex_unlock(_mutex);
if (ret == 0) {
page = pfn_to_page(pfn);
diff --git a/mm/compaction.c b/mm/compaction.c
index 315e5d5..91e0969 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -76,7 +76,7 @@ static void map_pages(struct list_head *list)
 
 static inline bool migrate_async_suitable(int migratetype)
 {
-   return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
+   return migratetype == MIGRATE_MOVABLE;
 }
 
 #ifdef CONFIG_COMPACTION
@@ -965,7 +965,7 @@ static bool suitable_migration_target(struct 

[PATCH v2 4/6] mm/cma: remove ALLOC_CMA

2016-04-24 Thread js1304
From: Joonsoo Kim 

Now, all reserved pages for CMA region are belong to the ZONE_CMA
and it only serves for GFP_HIGHUSER_MOVABLE. Therefore, we don't need to
consider ALLOC_CMA at all.

Signed-off-by: Joonsoo Kim 
---
 mm/internal.h   |  3 +--
 mm/page_alloc.c | 18 ++
 2 files changed, 3 insertions(+), 18 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 64e3131..a25d45b 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -478,8 +478,7 @@ unsigned long reclaim_clean_pages_from_list(struct zone 
*zone,
 #define ALLOC_HARDER   0x10 /* try to alloc harder */
 #define ALLOC_HIGH 0x20 /* __GFP_HIGH set */
 #define ALLOC_CPUSET   0x40 /* check for correct cpuset */
-#define ALLOC_CMA  0x80 /* allow allocations from CMA areas */
-#define ALLOC_FAIR 0x100 /* fair zone allocation */
+#define ALLOC_FAIR 0x80 /* fair zone allocation */
 
 enum ttu_flags;
 struct tlbflush_unmap_batch;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0a6a195..69546b7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2582,12 +2582,6 @@ static bool __zone_watermark_ok(struct zone *z, unsigned 
int order,
else
min -= min / 4;
 
-#ifdef CONFIG_CMA
-   /* If allocation can't use CMA areas don't use free CMA pages */
-   if (!(alloc_flags & ALLOC_CMA))
-   free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES);
-#endif
-
/*
 * Check watermarks for an order-0 allocation request. If these
 * are not met, then a high-order request also cannot go ahead
@@ -2617,10 +2611,8 @@ static bool __zone_watermark_ok(struct zone *z, unsigned 
int order,
}
 
 #ifdef CONFIG_CMA
-   if ((alloc_flags & ALLOC_CMA) &&
-   !list_empty(>free_list[MIGRATE_CMA])) {
+   if (!list_empty(>free_list[MIGRATE_CMA]))
return true;
-   }
 #endif
}
return false;
@@ -3217,10 +3209,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
 unlikely(test_thread_flag(TIF_MEMDIE
alloc_flags |= ALLOC_NO_WATERMARKS;
}
-#ifdef CONFIG_CMA
-   if (gfpflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
-   alloc_flags |= ALLOC_CMA;
-#endif
+
return alloc_flags;
 }
 
@@ -3573,9 +3562,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
if (unlikely(!zonelist->_zonerefs->zone))
return NULL;
 
-   if (IS_ENABLED(CONFIG_CMA) && ac.migratetype == MIGRATE_MOVABLE)
-   alloc_flags |= ALLOC_CMA;
-
 retry_cpuset:
cpuset_mems_cookie = read_mems_allowed_begin();
 
-- 
1.9.1



[PATCH v2 6/6] mm/cma: remove per zone CMA stat

2016-04-24 Thread js1304
From: Joonsoo Kim 

Now, all reserved pages for CMA region are belong to the ZONE_CMA
so we don't need to maintain CMA stat in other zones. Remove it.

Signed-off-by: Joonsoo Kim 
---
 fs/proc/meminfo.c  |  2 +-
 include/linux/cma.h|  6 ++
 include/linux/mmzone.h |  1 -
 mm/cma.c   | 15 +++
 mm/page_alloc.c|  5 ++---
 mm/vmstat.c|  1 -
 6 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index ae5cc52..51449d0 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -172,7 +172,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 #endif
 #ifdef CONFIG_CMA
, K(totalcma_pages)
-   , K(global_page_state(NR_FREE_CMA_PAGES))
+   , K(cma_get_free())
 #endif
);
 
diff --git a/include/linux/cma.h b/include/linux/cma.h
index 29f9e77..816290c 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -28,4 +28,10 @@ extern int cma_init_reserved_mem(phys_addr_t base, 
phys_addr_t size,
struct cma **res_cma);
 extern struct page *cma_alloc(struct cma *cma, size_t count, unsigned int 
align);
 extern bool cma_release(struct cma *cma, const struct page *pages, unsigned 
int count);
+
+#ifdef CONFIG_CMA
+extern unsigned long cma_get_free(void);
+#else
+static inline unsigned long cma_get_free(void) { return 0; }
+#endif
 #endif
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 75b41c5..3996a7c 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -140,7 +140,6 @@ enum zone_stat_item {
NR_SHMEM_HUGEPAGES, /* transparent shmem huge pages */
NR_SHMEM_PMDMAPPED, /* shmem huge pages currently mapped hugely */
NR_SHMEM_FREEHOLES, /* unused memory of high-order allocations */
-   NR_FREE_CMA_PAGES,
NR_VM_ZONE_STAT_ITEMS };
 
 /*
diff --git a/mm/cma.c b/mm/cma.c
index bd436e4..6dbddf2 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -54,6 +54,21 @@ unsigned long cma_get_size(const struct cma *cma)
return cma->count << PAGE_SHIFT;
 }
 
+unsigned long cma_get_free(void)
+{
+   struct zone *zone;
+   unsigned long freecma = 0;
+
+   for_each_populated_zone(zone) {
+   if (!is_zone_cma(zone))
+   continue;
+
+   freecma += zone_page_state(zone, NR_FREE_PAGES);
+   }
+
+   return freecma;
+}
+
 static unsigned long cma_bitmap_aligned_mask(const struct cma *cma,
 int align_order)
 {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 51b2b0c..570edad 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -63,6 +63,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -4107,7 +4108,7 @@ void show_free_areas(unsigned int filter)
global_page_state(NR_SHMEM_FREEHOLES),
global_page_state(NR_FREE_PAGES),
free_pcp,
-   global_page_state(NR_FREE_CMA_PAGES));
+   cma_get_free());
 
for_each_populated_zone(zone) {
int i;
@@ -4150,7 +4151,6 @@ void show_free_areas(unsigned int filter)
" bounce:%lukB"
" free_pcp:%lukB"
" local_pcp:%ukB"
-   " free_cma:%lukB"
" writeback_tmp:%lukB"
" pages_scanned:%lu"
" all_unreclaimable? %s"
@@ -4188,7 +4188,6 @@ void show_free_areas(unsigned int filter)
K(zone_page_state(zone, NR_BOUNCE)),
K(free_pcp),
K(this_cpu_read(zone->pageset->pcp.count)),
-   K(zone_page_state(zone, NR_FREE_CMA_PAGES)),
K(zone_page_state(zone, NR_WRITEBACK_TEMP)),
K(zone_page_state(zone, NR_PAGES_SCANNED)),
(!zone_reclaimable(zone) ? "yes" : "no")
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 39a0c3c..81acdae 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -766,7 +766,6 @@ const char * const vmstat_text[] = {
"nr_shmem_hugepages",
"nr_shmem_pmdmapped",
"nr_shmem_freeholes",
-   "nr_free_cma",
 
/* enum writeback_stat_item counters */
"nr_dirty_threshold",
-- 
1.9.1



[PATCH v2 5/6] mm/cma: remove MIGRATE_CMA

2016-04-24 Thread js1304
From: Joonsoo Kim 

Now, all reserved pages for CMA region are belong to the ZONE_CMA
and there is no other type of pages. Therefore, we don't need to
use MIGRATE_CMA to distinguish and handle differently for CMA pages
and ordinary pages. Remove MIGRATE_CMA.

Unfortunately, this patch make free CMA counter incorrect because
we count it when pages are on the MIGRATE_CMA. It will be fixed
by next patch. I can squash next patch here but it makes changes
complicated and hard to review so I separate that.

Signed-off-by: Joonsoo Kim 
---
 include/linux/gfp.h|  3 +-
 include/linux/mmzone.h | 22 
 include/linux/vmstat.h |  8 -
 mm/cma.c   |  2 +-
 mm/compaction.c| 10 ++
 mm/hugetlb.c   |  2 +-
 mm/page_alloc.c| 90 ++
 mm/page_isolation.c|  5 ++-
 mm/vmstat.c|  5 +--
 9 files changed, 32 insertions(+), 115 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 4d6c008..1a3b869 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -559,8 +559,7 @@ static inline bool pm_suspended_storage(void)
 
 #if (defined(CONFIG_MEMORY_ISOLATION) && defined(CONFIG_COMPACTION)) || 
defined(CONFIG_CMA)
 /* The below functions must be run on a range from a single zone. */
-extern int alloc_contig_range(unsigned long start, unsigned long end,
- unsigned migratetype);
+extern int alloc_contig_range(unsigned long start, unsigned long end);
 extern void free_contig_range(unsigned long pfn, unsigned nr_pages);
 #endif
 
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 5c97ba9..75b41c5 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -41,22 +41,6 @@ enum {
MIGRATE_RECLAIMABLE,
MIGRATE_PCPTYPES,   /* the number of types on the pcp lists */
MIGRATE_HIGHATOMIC = MIGRATE_PCPTYPES,
-#ifdef CONFIG_CMA
-   /*
-* MIGRATE_CMA migration type is designed to mimic the way
-* ZONE_MOVABLE works.  Only movable pages can be allocated
-* from MIGRATE_CMA pageblocks and page allocator never
-* implicitly change migration type of MIGRATE_CMA pageblock.
-*
-* The way to use it is to change migratetype of a range of
-* pageblocks to MIGRATE_CMA which can be done by
-* __free_pageblock_cma() function.  What is important though
-* is that a range of pageblocks must be aligned to
-* MAX_ORDER_NR_PAGES should biggest page be bigger then
-* a single pageblock.
-*/
-   MIGRATE_CMA,
-#endif
 #ifdef CONFIG_MEMORY_ISOLATION
MIGRATE_ISOLATE,/* can't allocate from here */
 #endif
@@ -66,12 +50,6 @@ enum {
 /* In mm/page_alloc.c; keep in sync also with show_migration_types() there */
 extern char * const migratetype_names[MIGRATE_TYPES];
 
-#ifdef CONFIG_CMA
-#  define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
-#else
-#  define is_migrate_cma(migratetype) false
-#endif
-
 #define for_each_migratetype_order(order, type) \
for (order = 0; order < MAX_ORDER; order++) \
for (type = 0; type < MIGRATE_TYPES; type++)
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 02fce41..6ddf080 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -260,14 +260,6 @@ static inline void drain_zonestat(struct zone *zone,
struct per_cpu_pageset *pset) { }
 #endif /* CONFIG_SMP */
 
-static inline void __mod_zone_freepage_state(struct zone *zone, int nr_pages,
-int migratetype)
-{
-   __mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages);
-   if (is_migrate_cma(migratetype))
-   __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, nr_pages);
-}
-
 extern const char * const vmstat_text[];
 
 #endif /* _LINUX_VMSTAT_H */
diff --git a/mm/cma.c b/mm/cma.c
index 8684f50..bd436e4 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -444,7 +444,7 @@ struct page *cma_alloc(struct cma *cma, size_t count, 
unsigned int align)
 
pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
mutex_lock(_mutex);
-   ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA);
+   ret = alloc_contig_range(pfn, pfn + count);
mutex_unlock(_mutex);
if (ret == 0) {
page = pfn_to_page(pfn);
diff --git a/mm/compaction.c b/mm/compaction.c
index 315e5d5..91e0969 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -76,7 +76,7 @@ static void map_pages(struct list_head *list)
 
 static inline bool migrate_async_suitable(int migratetype)
 {
-   return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
+   return migratetype == MIGRATE_MOVABLE;
 }
 
 #ifdef CONFIG_COMPACTION
@@ -965,7 +965,7 @@ static bool suitable_migration_target(struct page *page)
return 

[PATCH v2 1/6] mm/page_alloc: recalculate some of zone threshold when on/offline memory

2016-04-24 Thread js1304
From: Joonsoo Kim 

Some of zone threshold depends on number of managed pages in the zone.
When memory is going on/offline, it can be changed and we need to
adjust them.

This patch add recalculation to appropriate places and clean-up
related function for better maintanance.

Signed-off-by: Joonsoo Kim 
---
 mm/page_alloc.c | 36 +---
 1 file changed, 29 insertions(+), 7 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 71fa015..ffa93e0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4633,6 +4633,8 @@ int local_memory_node(int node)
 }
 #endif
 
+static void setup_min_unmapped_ratio(struct zone *zone);
+static void setup_min_slab_ratio(struct zone *zone);
 #else  /* CONFIG_NUMA */
 
 static void set_zonelist_order(void)
@@ -5747,9 +5749,8 @@ static void __paginginit free_area_init_core(struct 
pglist_data *pgdat)
zone->managed_pages = is_highmem_idx(j) ? realsize : freesize;
 #ifdef CONFIG_NUMA
zone->node = nid;
-   zone->min_unmapped_pages = (freesize*sysctl_min_unmapped_ratio)
-   / 100;
-   zone->min_slab_pages = (freesize * sysctl_min_slab_ratio) / 100;
+   setup_min_unmapped_ratio(zone);
+   setup_min_slab_ratio(zone);
 #endif
zone->name = zone_names[j];
spin_lock_init(>lock);
@@ -6655,6 +6656,7 @@ int __meminit init_per_zone_wmark_min(void)
 {
unsigned long lowmem_kbytes;
int new_min_free_kbytes;
+   struct zone *zone;
 
lowmem_kbytes = nr_free_buffer_pages() * (PAGE_SIZE >> 10);
new_min_free_kbytes = int_sqrt(lowmem_kbytes * 16);
@@ -6672,6 +6674,14 @@ int __meminit init_per_zone_wmark_min(void)
setup_per_zone_wmarks();
refresh_zone_stat_thresholds();
setup_per_zone_lowmem_reserve();
+
+   for_each_zone(zone) {
+#ifdef CONFIG_NUMA
+   setup_min_unmapped_ratio(zone);
+   setup_min_slab_ratio(zone);
+#endif
+   }
+
return 0;
 }
 module_init(init_per_zone_wmark_min)
@@ -6713,6 +6723,12 @@ int watermark_scale_factor_sysctl_handler(struct 
ctl_table *table, int write,
 }
 
 #ifdef CONFIG_NUMA
+static void setup_min_unmapped_ratio(struct zone *zone)
+{
+   zone->min_unmapped_pages = (zone->managed_pages *
+   sysctl_min_unmapped_ratio) / 100;
+}
+
 int sysctl_min_unmapped_ratio_sysctl_handler(struct ctl_table *table, int 
write,
void __user *buffer, size_t *length, loff_t *ppos)
 {
@@ -6724,11 +6740,17 @@ int sysctl_min_unmapped_ratio_sysctl_handler(struct 
ctl_table *table, int write,
return rc;
 
for_each_zone(zone)
-   zone->min_unmapped_pages = (zone->managed_pages *
-   sysctl_min_unmapped_ratio) / 100;
+   setup_min_unmapped_ratio(zone);
+
return 0;
 }
 
+static void setup_min_slab_ratio(struct zone *zone)
+{
+   zone->min_slab_pages = (zone->managed_pages *
+   sysctl_min_slab_ratio) / 100;
+}
+
 int sysctl_min_slab_ratio_sysctl_handler(struct ctl_table *table, int write,
void __user *buffer, size_t *length, loff_t *ppos)
 {
@@ -6740,8 +6762,8 @@ int sysctl_min_slab_ratio_sysctl_handler(struct ctl_table 
*table, int write,
return rc;
 
for_each_zone(zone)
-   zone->min_slab_pages = (zone->managed_pages *
-   sysctl_min_slab_ratio) / 100;
+   setup_min_slab_ratio(zone);
+
return 0;
 }
 #endif
-- 
1.9.1



[PATCH v2 2/6] mm/cma: introduce new zone, ZONE_CMA

2016-04-24 Thread js1304
From: Joonsoo Kim 

Attached cover-letter:

This series try to solve problems of current CMA implementation.

CMA is introduced to provide physically contiguous pages at runtime
without exclusive reserved memory area. But, current implementation
works like as previous reserved memory approach, because freepages
on CMA region are used only if there is no movable freepage. In other
words, freepages on CMA region are only used as fallback. In that
situation where freepages on CMA region are used as fallback, kswapd
would be woken up easily since there is no unmovable and reclaimable
freepage, too. If kswapd starts to reclaim memory, fallback allocation
to MIGRATE_CMA doesn't occur any more since movable freepages are
already refilled by kswapd and then most of freepage on CMA are left
to be in free. This situation looks like exclusive reserved memory case.

In my experiment, I found that if system memory has 1024 MB memory and
512 MB is reserved for CMA, kswapd is mostly woken up when roughly 512 MB
free memory is left. Detailed reason is that for keeping enough free
memory for unmovable and reclaimable allocation, kswapd uses below
equation when calculating free memory and it easily go under the watermark.

Free memory for unmovable and reclaimable = Free total - Free CMA pages

This is derivated from the property of CMA freepage that CMA freepage
can't be used for unmovable and reclaimable allocation.

Anyway, in this case, kswapd are woken up when (FreeTotal - FreeCMA)
is lower than low watermark and tries to make free memory until
(FreeTotal - FreeCMA) is higher than high watermark. That results
in that FreeTotal is moving around 512MB boundary consistently. It
then means that we can't utilize full memory capacity.

To fix this problem, I submitted some patches [1] about 10 months ago,
but, found some more problems to be fixed before solving this problem.
It requires many hooks in allocator hotpath so some developers doesn't
like it. Instead, some of them suggest different approach [2] to fix
all the problems related to CMA, that is, introducing a new zone to deal
with free CMA pages. I agree that it is the best way to go so implement
here. Although properties of ZONE_MOVABLE and ZONE_CMA is similar, I
decide to add a new zone rather than piggyback on ZONE_MOVABLE since
they have some differences. First, reserved CMA pages should not be
offlined. If freepage for CMA is managed by ZONE_MOVABLE, we need to keep
MIGRATE_CMA migratetype and insert many hooks on memory hotplug code
to distiguish hotpluggable memory and reserved memory for CMA in the same
zone. It would make memory hotplug code which is already complicated
more complicated. Second, cma_alloc() can be called more frequently
than memory hotplug operation and possibly we need to control
allocation rate of ZONE_CMA to optimize latency in the future.
In this case, separate zone approach is easy to modify. Third, I'd
like to see statistics for CMA, separately. Sometimes, we need to debug
why cma_alloc() is failed and separate statistics would be more helpful
in this situtaion.

Anyway, this patchset solves four problems related to CMA implementation.

1) Utilization problem
As mentioned above, we can't utilize full memory capacity due to the
limitation of CMA freepage and fallback policy. This patchset implements
a new zone for CMA and uses it for GFP_HIGHUSER_MOVABLE request. This
typed allocation is used for page cache and anonymous pages which
occupies most of memory usage in normal case so we can utilize full
memory capacity. Below is the experiment result about this problem.

8 CPUs, 1024 MB, VIRTUAL MACHINE
make -j16


CMA reserve:0 MB512 MB
Elapsed-time:   92.4186.5
pswpin: 82  18647
pswpout:160 69839


CMA reserve:0 MB512 MB
Elapsed-time:   93.193.4
pswpin: 84  46
pswpout:183 92

FYI, there is another attempt [3] trying to solve this problem in lkml.
And, as far as I know, Qualcomm also has out-of-tree solution for this
problem.

2) Reclaim problem
Currently, there is no logic to distinguish CMA pages in reclaim path.
If reclaim is initiated for unmovable and reclaimable allocation,
reclaiming CMA pages doesn't help to satisfy the request and reclaiming
CMA page is just waste. By managing CMA pages in the new zone, we can
skip to reclaim ZONE_CMA completely if it is unnecessary.

3) Atomic allocation failure problem
Kswapd isn't started to reclaim pages when allocation request is movable
type and there is enough free page in the CMA region. After bunch of
consecutive movable allocation requests, free pages in ordinary region
(not CMA region) would be exhausted without waking up kswapd. At that time,
if atomic unmovable allocation comes, it can't be successful since there
is not enough page in ordinary region. This problem 

[PATCH v2 1/6] mm/page_alloc: recalculate some of zone threshold when on/offline memory

2016-04-24 Thread js1304
From: Joonsoo Kim 

Some of zone threshold depends on number of managed pages in the zone.
When memory is going on/offline, it can be changed and we need to
adjust them.

This patch add recalculation to appropriate places and clean-up
related function for better maintanance.

Signed-off-by: Joonsoo Kim 
---
 mm/page_alloc.c | 36 +---
 1 file changed, 29 insertions(+), 7 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 71fa015..ffa93e0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4633,6 +4633,8 @@ int local_memory_node(int node)
 }
 #endif
 
+static void setup_min_unmapped_ratio(struct zone *zone);
+static void setup_min_slab_ratio(struct zone *zone);
 #else  /* CONFIG_NUMA */
 
 static void set_zonelist_order(void)
@@ -5747,9 +5749,8 @@ static void __paginginit free_area_init_core(struct 
pglist_data *pgdat)
zone->managed_pages = is_highmem_idx(j) ? realsize : freesize;
 #ifdef CONFIG_NUMA
zone->node = nid;
-   zone->min_unmapped_pages = (freesize*sysctl_min_unmapped_ratio)
-   / 100;
-   zone->min_slab_pages = (freesize * sysctl_min_slab_ratio) / 100;
+   setup_min_unmapped_ratio(zone);
+   setup_min_slab_ratio(zone);
 #endif
zone->name = zone_names[j];
spin_lock_init(>lock);
@@ -6655,6 +6656,7 @@ int __meminit init_per_zone_wmark_min(void)
 {
unsigned long lowmem_kbytes;
int new_min_free_kbytes;
+   struct zone *zone;
 
lowmem_kbytes = nr_free_buffer_pages() * (PAGE_SIZE >> 10);
new_min_free_kbytes = int_sqrt(lowmem_kbytes * 16);
@@ -6672,6 +6674,14 @@ int __meminit init_per_zone_wmark_min(void)
setup_per_zone_wmarks();
refresh_zone_stat_thresholds();
setup_per_zone_lowmem_reserve();
+
+   for_each_zone(zone) {
+#ifdef CONFIG_NUMA
+   setup_min_unmapped_ratio(zone);
+   setup_min_slab_ratio(zone);
+#endif
+   }
+
return 0;
 }
 module_init(init_per_zone_wmark_min)
@@ -6713,6 +6723,12 @@ int watermark_scale_factor_sysctl_handler(struct 
ctl_table *table, int write,
 }
 
 #ifdef CONFIG_NUMA
+static void setup_min_unmapped_ratio(struct zone *zone)
+{
+   zone->min_unmapped_pages = (zone->managed_pages *
+   sysctl_min_unmapped_ratio) / 100;
+}
+
 int sysctl_min_unmapped_ratio_sysctl_handler(struct ctl_table *table, int 
write,
void __user *buffer, size_t *length, loff_t *ppos)
 {
@@ -6724,11 +6740,17 @@ int sysctl_min_unmapped_ratio_sysctl_handler(struct 
ctl_table *table, int write,
return rc;
 
for_each_zone(zone)
-   zone->min_unmapped_pages = (zone->managed_pages *
-   sysctl_min_unmapped_ratio) / 100;
+   setup_min_unmapped_ratio(zone);
+
return 0;
 }
 
+static void setup_min_slab_ratio(struct zone *zone)
+{
+   zone->min_slab_pages = (zone->managed_pages *
+   sysctl_min_slab_ratio) / 100;
+}
+
 int sysctl_min_slab_ratio_sysctl_handler(struct ctl_table *table, int write,
void __user *buffer, size_t *length, loff_t *ppos)
 {
@@ -6740,8 +6762,8 @@ int sysctl_min_slab_ratio_sysctl_handler(struct ctl_table 
*table, int write,
return rc;
 
for_each_zone(zone)
-   zone->min_slab_pages = (zone->managed_pages *
-   sysctl_min_slab_ratio) / 100;
+   setup_min_slab_ratio(zone);
+
return 0;
 }
 #endif
-- 
1.9.1



[PATCH v2 2/6] mm/cma: introduce new zone, ZONE_CMA

2016-04-24 Thread js1304
From: Joonsoo Kim 

Attached cover-letter:

This series try to solve problems of current CMA implementation.

CMA is introduced to provide physically contiguous pages at runtime
without exclusive reserved memory area. But, current implementation
works like as previous reserved memory approach, because freepages
on CMA region are used only if there is no movable freepage. In other
words, freepages on CMA region are only used as fallback. In that
situation where freepages on CMA region are used as fallback, kswapd
would be woken up easily since there is no unmovable and reclaimable
freepage, too. If kswapd starts to reclaim memory, fallback allocation
to MIGRATE_CMA doesn't occur any more since movable freepages are
already refilled by kswapd and then most of freepage on CMA are left
to be in free. This situation looks like exclusive reserved memory case.

In my experiment, I found that if system memory has 1024 MB memory and
512 MB is reserved for CMA, kswapd is mostly woken up when roughly 512 MB
free memory is left. Detailed reason is that for keeping enough free
memory for unmovable and reclaimable allocation, kswapd uses below
equation when calculating free memory and it easily go under the watermark.

Free memory for unmovable and reclaimable = Free total - Free CMA pages

This is derivated from the property of CMA freepage that CMA freepage
can't be used for unmovable and reclaimable allocation.

Anyway, in this case, kswapd are woken up when (FreeTotal - FreeCMA)
is lower than low watermark and tries to make free memory until
(FreeTotal - FreeCMA) is higher than high watermark. That results
in that FreeTotal is moving around 512MB boundary consistently. It
then means that we can't utilize full memory capacity.

To fix this problem, I submitted some patches [1] about 10 months ago,
but, found some more problems to be fixed before solving this problem.
It requires many hooks in allocator hotpath so some developers doesn't
like it. Instead, some of them suggest different approach [2] to fix
all the problems related to CMA, that is, introducing a new zone to deal
with free CMA pages. I agree that it is the best way to go so implement
here. Although properties of ZONE_MOVABLE and ZONE_CMA is similar, I
decide to add a new zone rather than piggyback on ZONE_MOVABLE since
they have some differences. First, reserved CMA pages should not be
offlined. If freepage for CMA is managed by ZONE_MOVABLE, we need to keep
MIGRATE_CMA migratetype and insert many hooks on memory hotplug code
to distiguish hotpluggable memory and reserved memory for CMA in the same
zone. It would make memory hotplug code which is already complicated
more complicated. Second, cma_alloc() can be called more frequently
than memory hotplug operation and possibly we need to control
allocation rate of ZONE_CMA to optimize latency in the future.
In this case, separate zone approach is easy to modify. Third, I'd
like to see statistics for CMA, separately. Sometimes, we need to debug
why cma_alloc() is failed and separate statistics would be more helpful
in this situtaion.

Anyway, this patchset solves four problems related to CMA implementation.

1) Utilization problem
As mentioned above, we can't utilize full memory capacity due to the
limitation of CMA freepage and fallback policy. This patchset implements
a new zone for CMA and uses it for GFP_HIGHUSER_MOVABLE request. This
typed allocation is used for page cache and anonymous pages which
occupies most of memory usage in normal case so we can utilize full
memory capacity. Below is the experiment result about this problem.

8 CPUs, 1024 MB, VIRTUAL MACHINE
make -j16


CMA reserve:0 MB512 MB
Elapsed-time:   92.4186.5
pswpin: 82  18647
pswpout:160 69839


CMA reserve:0 MB512 MB
Elapsed-time:   93.193.4
pswpin: 84  46
pswpout:183 92

FYI, there is another attempt [3] trying to solve this problem in lkml.
And, as far as I know, Qualcomm also has out-of-tree solution for this
problem.

2) Reclaim problem
Currently, there is no logic to distinguish CMA pages in reclaim path.
If reclaim is initiated for unmovable and reclaimable allocation,
reclaiming CMA pages doesn't help to satisfy the request and reclaiming
CMA page is just waste. By managing CMA pages in the new zone, we can
skip to reclaim ZONE_CMA completely if it is unnecessary.

3) Atomic allocation failure problem
Kswapd isn't started to reclaim pages when allocation request is movable
type and there is enough free page in the CMA region. After bunch of
consecutive movable allocation requests, free pages in ordinary region
(not CMA region) would be exhausted without waking up kswapd. At that time,
if atomic unmovable allocation comes, it can't be successful since there
is not enough page in ordinary region. This problem is reported
by Aneesh 

[PATCH v2 0/6] Introduce ZONE_CMA

2016-04-24 Thread js1304
From: Joonsoo Kim 

Hello,

Changes from v1
o Separate some patches which deserve to submit independently
o Modify description to reflect current kernel state
(e.g. high-order watermark problem disappeared by Mel's work)
o Don't increase SECTION_SIZE_BITS to make a room in page flags
(detailed reason is on the patch that adds ZONE_CMA)
o Adjust ZONE_CMA population code

This series try to solve problems of current CMA implementation.

CMA is introduced to provide physically contiguous pages at runtime
without exclusive reserved memory area. But, current implementation
works like as previous reserved memory approach, because freepages
on CMA region are used only if there is no movable freepage. In other
words, freepages on CMA region are only used as fallback. In that
situation where freepages on CMA region are used as fallback, kswapd
would be woken up easily since there is no unmovable and reclaimable
freepage, too. If kswapd starts to reclaim memory, fallback allocation
to MIGRATE_CMA doesn't occur any more since movable freepages are
already refilled by kswapd and then most of freepage on CMA are left
to be in free. This situation looks like exclusive reserved memory case.

In my experiment, I found that if system memory has 1024 MB memory and
512 MB is reserved for CMA, kswapd is mostly woken up when roughly 512 MB
free memory is left. Detailed reason is that for keeping enough free
memory for unmovable and reclaimable allocation, kswapd uses below
equation when calculating free memory and it easily go under the watermark.

Free memory for unmovable and reclaimable = Free total - Free CMA pages

This is derivated from the property of CMA freepage that CMA freepage
can't be used for unmovable and reclaimable allocation.

Anyway, in this case, kswapd are woken up when (FreeTotal - FreeCMA)
is lower than low watermark and tries to make free memory until
(FreeTotal - FreeCMA) is higher than high watermark. That results
in that FreeTotal is moving around 512MB boundary consistently. It
then means that we can't utilize full memory capacity.

To fix this problem, I submitted some patches [1] about 10 months ago,
but, found some more problems to be fixed before solving this problem.
It requires many hooks in allocator hotpath so some developers doesn't
like it. Instead, some of them suggest different approach [2] to fix
all the problems related to CMA, that is, introducing a new zone to deal
with free CMA pages. I agree that it is the best way to go so implement
here. Although properties of ZONE_MOVABLE and ZONE_CMA is similar, I
decide to add a new zone rather than piggyback on ZONE_MOVABLE since
they have some differences. First, reserved CMA pages should not be
offlined. If freepage for CMA is managed by ZONE_MOVABLE, we need to keep
MIGRATE_CMA migratetype and insert many hooks on memory hotplug code
to distiguish hotpluggable memory and reserved memory for CMA in the same
zone. It would make memory hotplug code which is already complicated
more complicated. Second, cma_alloc() can be called more frequently
than memory hotplug operation and possibly we need to control
allocation rate of ZONE_CMA to optimize latency in the future.
In this case, separate zone approach is easy to modify. Third, I'd
like to see statistics for CMA, separately. Sometimes, we need to debug
why cma_alloc() is failed and separate statistics would be more helpful
in this situtaion.

Anyway, this patchset solves four problems related to CMA implementation.

1) Utilization problem
As mentioned above, we can't utilize full memory capacity due to the
limitation of CMA freepage and fallback policy. This patchset implements
a new zone for CMA and uses it for GFP_HIGHUSER_MOVABLE request. This
typed allocation is used for page cache and anonymous pages which
occupies most of memory usage in normal case so we can utilize full
memory capacity. Below is the experiment result about this problem.

8 CPUs, 1024 MB, VIRTUAL MACHINE
make -j16


CMA reserve:0 MB512 MB
Elapsed-time:   92.4186.5
pswpin: 82  18647
pswpout:160 69839


CMA reserve:0 MB512 MB
Elapsed-time:   93.193.4
pswpin: 84  46
pswpout:183 92

FYI, there is another attempt [3] trying to solve this problem in lkml.
And, as far as I know, Qualcomm also has out-of-tree solution for this
problem.

2) Reclaim problem
Currently, there is no logic to distinguish CMA pages in reclaim path.
If reclaim is initiated for unmovable and reclaimable allocation,
reclaiming CMA pages doesn't help to satisfy the request and reclaiming
CMA page is just waste. By managing CMA pages in the new zone, we can
skip to reclaim ZONE_CMA completely if it is unnecessary.

3) Atomic allocation failure problem
Kswapd isn't started to reclaim pages when allocation request is movable
type and 

[PATCH v2 0/6] Introduce ZONE_CMA

2016-04-24 Thread js1304
From: Joonsoo Kim 

Hello,

Changes from v1
o Separate some patches which deserve to submit independently
o Modify description to reflect current kernel state
(e.g. high-order watermark problem disappeared by Mel's work)
o Don't increase SECTION_SIZE_BITS to make a room in page flags
(detailed reason is on the patch that adds ZONE_CMA)
o Adjust ZONE_CMA population code

This series try to solve problems of current CMA implementation.

CMA is introduced to provide physically contiguous pages at runtime
without exclusive reserved memory area. But, current implementation
works like as previous reserved memory approach, because freepages
on CMA region are used only if there is no movable freepage. In other
words, freepages on CMA region are only used as fallback. In that
situation where freepages on CMA region are used as fallback, kswapd
would be woken up easily since there is no unmovable and reclaimable
freepage, too. If kswapd starts to reclaim memory, fallback allocation
to MIGRATE_CMA doesn't occur any more since movable freepages are
already refilled by kswapd and then most of freepage on CMA are left
to be in free. This situation looks like exclusive reserved memory case.

In my experiment, I found that if system memory has 1024 MB memory and
512 MB is reserved for CMA, kswapd is mostly woken up when roughly 512 MB
free memory is left. Detailed reason is that for keeping enough free
memory for unmovable and reclaimable allocation, kswapd uses below
equation when calculating free memory and it easily go under the watermark.

Free memory for unmovable and reclaimable = Free total - Free CMA pages

This is derivated from the property of CMA freepage that CMA freepage
can't be used for unmovable and reclaimable allocation.

Anyway, in this case, kswapd are woken up when (FreeTotal - FreeCMA)
is lower than low watermark and tries to make free memory until
(FreeTotal - FreeCMA) is higher than high watermark. That results
in that FreeTotal is moving around 512MB boundary consistently. It
then means that we can't utilize full memory capacity.

To fix this problem, I submitted some patches [1] about 10 months ago,
but, found some more problems to be fixed before solving this problem.
It requires many hooks in allocator hotpath so some developers doesn't
like it. Instead, some of them suggest different approach [2] to fix
all the problems related to CMA, that is, introducing a new zone to deal
with free CMA pages. I agree that it is the best way to go so implement
here. Although properties of ZONE_MOVABLE and ZONE_CMA is similar, I
decide to add a new zone rather than piggyback on ZONE_MOVABLE since
they have some differences. First, reserved CMA pages should not be
offlined. If freepage for CMA is managed by ZONE_MOVABLE, we need to keep
MIGRATE_CMA migratetype and insert many hooks on memory hotplug code
to distiguish hotpluggable memory and reserved memory for CMA in the same
zone. It would make memory hotplug code which is already complicated
more complicated. Second, cma_alloc() can be called more frequently
than memory hotplug operation and possibly we need to control
allocation rate of ZONE_CMA to optimize latency in the future.
In this case, separate zone approach is easy to modify. Third, I'd
like to see statistics for CMA, separately. Sometimes, we need to debug
why cma_alloc() is failed and separate statistics would be more helpful
in this situtaion.

Anyway, this patchset solves four problems related to CMA implementation.

1) Utilization problem
As mentioned above, we can't utilize full memory capacity due to the
limitation of CMA freepage and fallback policy. This patchset implements
a new zone for CMA and uses it for GFP_HIGHUSER_MOVABLE request. This
typed allocation is used for page cache and anonymous pages which
occupies most of memory usage in normal case so we can utilize full
memory capacity. Below is the experiment result about this problem.

8 CPUs, 1024 MB, VIRTUAL MACHINE
make -j16


CMA reserve:0 MB512 MB
Elapsed-time:   92.4186.5
pswpin: 82  18647
pswpout:160 69839


CMA reserve:0 MB512 MB
Elapsed-time:   93.193.4
pswpin: 84  46
pswpout:183 92

FYI, there is another attempt [3] trying to solve this problem in lkml.
And, as far as I know, Qualcomm also has out-of-tree solution for this
problem.

2) Reclaim problem
Currently, there is no logic to distinguish CMA pages in reclaim path.
If reclaim is initiated for unmovable and reclaimable allocation,
reclaiming CMA pages doesn't help to satisfy the request and reclaiming
CMA page is just waste. By managing CMA pages in the new zone, we can
skip to reclaim ZONE_CMA completely if it is unnecessary.

3) Atomic allocation failure problem
Kswapd isn't started to reclaim pages when allocation request is movable
type and there is enough free page 

我的相片在

2016-04-24 Thread 我的相片在
你的老朋友邀你来Q群:343257759 抢优惠券

我的相片在

2016-04-24 Thread 我的相片在
你的老朋友邀你来Q群:343257759 抢优惠券

Re: [PATCH v7 5/8] [media] vcodec: mediatek: Add Mediatek V4L2 Video Encoder Driver

2016-04-24 Thread tiffany lin
Hi Hans,


On Fri, 2016-04-22 at 15:47 +0200, Hans Verkuil wrote:
> On 04/22/2016 06:25 AM, Tiffany Lin wrote:
> > Add v4l2 layer encoder driver for MT8173
> > 
> > Signed-off-by: Tiffany Lin 
> > 
> > ---
> >  drivers/media/platform/Kconfig |   16 +
> >  drivers/media/platform/Makefile|2 +
> >  drivers/media/platform/mtk-vcodec/Makefile |   14 +
> >  drivers/media/platform/mtk-vcodec/mtk_vcodec_drv.h |  339 +
> >  drivers/media/platform/mtk-vcodec/mtk_vcodec_enc.c | 1301 
> > 
> >  drivers/media/platform/mtk-vcodec/mtk_vcodec_enc.h |   59 +
> >  .../media/platform/mtk-vcodec/mtk_vcodec_enc_drv.c |  467 +++
> >  .../media/platform/mtk-vcodec/mtk_vcodec_enc_pm.c  |  137 +++
> >  .../media/platform/mtk-vcodec/mtk_vcodec_enc_pm.h  |   26 +
> >  .../media/platform/mtk-vcodec/mtk_vcodec_intr.c|   56 +
> >  .../media/platform/mtk-vcodec/mtk_vcodec_intr.h|   27 +
> >  .../media/platform/mtk-vcodec/mtk_vcodec_util.c|   96 ++
> >  .../media/platform/mtk-vcodec/mtk_vcodec_util.h|   87 ++
> >  drivers/media/platform/mtk-vcodec/venc_drv_base.h  |   62 +
> >  drivers/media/platform/mtk-vcodec/venc_drv_if.c|  107 ++
> >  drivers/media/platform/mtk-vcodec/venc_drv_if.h|  165 +++
> >  drivers/media/platform/mtk-vcodec/venc_ipi_msg.h   |  210 
> >  17 files changed, 3171 insertions(+)
> >  create mode 100644 drivers/media/platform/mtk-vcodec/Makefile
> >  create mode 100644 drivers/media/platform/mtk-vcodec/mtk_vcodec_drv.h
> >  create mode 100644 drivers/media/platform/mtk-vcodec/mtk_vcodec_enc.c
> >  create mode 100644 drivers/media/platform/mtk-vcodec/mtk_vcodec_enc.h
> >  create mode 100644 drivers/media/platform/mtk-vcodec/mtk_vcodec_enc_drv.c
> >  create mode 100644 drivers/media/platform/mtk-vcodec/mtk_vcodec_enc_pm.c
> >  create mode 100644 drivers/media/platform/mtk-vcodec/mtk_vcodec_enc_pm.h
> >  create mode 100644 drivers/media/platform/mtk-vcodec/mtk_vcodec_intr.c
> >  create mode 100644 drivers/media/platform/mtk-vcodec/mtk_vcodec_intr.h
> >  create mode 100644 drivers/media/platform/mtk-vcodec/mtk_vcodec_util.c
> >  create mode 100644 drivers/media/platform/mtk-vcodec/mtk_vcodec_util.h
> >  create mode 100644 drivers/media/platform/mtk-vcodec/venc_drv_base.h
> >  create mode 100644 drivers/media/platform/mtk-vcodec/venc_drv_if.c
> >  create mode 100644 drivers/media/platform/mtk-vcodec/venc_drv_if.h
> >  create mode 100644 drivers/media/platform/mtk-vcodec/venc_ipi_msg.h
> > 
> > diff --git a/drivers/media/platform/mtk-vcodec/mtk_vcodec_enc.c 
> > b/drivers/media/platform/mtk-vcodec/mtk_vcodec_enc.c
> > new file mode 100644
> > index 000..b2b662c
> > --- /dev/null
> > +++ b/drivers/media/platform/mtk-vcodec/mtk_vcodec_enc.c
> > @@ -0,0 +1,1301 @@
> > +/* V4L2 specification suggests the driver corrects the format struct if 
> > any of
> > +  * the dimensions is unsupported
> > +  */
> > +static int vidioc_try_fmt(struct v4l2_format *f, struct mtk_video_fmt *fmt)
> > +{
> > +   struct v4l2_pix_format_mplane *pix_fmt_mp = >fmt.pix_mp;
> > +   int i;
> > +
> > +   pix_fmt_mp->field = V4L2_FIELD_NONE;
> > +
> > +   if (f->type == V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE) {
> > +   pix_fmt_mp->num_planes = 1;
> > +   pix_fmt_mp->plane_fmt[0].bytesperline = 0;
> > +   } else if (f->type == V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE) {
> > +   int tmp_w, tmp_h;
> > +
> > +   pix_fmt_mp->height = clamp(pix_fmt_mp->height,
> > +   MTK_VENC_MIN_H,
> > +   MTK_VENC_MAX_H);
> > +   pix_fmt_mp->width = clamp(pix_fmt_mp->width,
> > +   MTK_VENC_MIN_W,
> > +   MTK_VENC_MAX_W);
> 
> Strange indentation. I see this in more places. Can you check this and fix
> where needed?
> 
I will fix all this strange indentation in next version.

> > +
> > +   /* find next closer width align 16, heign align 32, size align
> > + * 64 rectangle
> > + */
> > +   tmp_w = pix_fmt_mp->width;
> > +   tmp_h = pix_fmt_mp->height;
> > +   v4l_bound_align_image(_fmt_mp->width,
> > +   MTK_VENC_MIN_W,
> > +   MTK_VENC_MAX_W, 4,
> > +   _fmt_mp->height,
> > +   MTK_VENC_MIN_H,
> > +   MTK_VENC_MAX_H, 5, 6);
> > +
> 
> ...
> 
> > +static int vidioc_try_fmt_vid_out_mplane(struct file *file, void *priv,
> > +struct v4l2_format *f)
> > +{
> > +   struct mtk_video_fmt *fmt;
> > +
> > +   fmt = mtk_venc_find_format(f);
> > +   if (!fmt) {
> > +  

Re: [PATCH v7 5/8] [media] vcodec: mediatek: Add Mediatek V4L2 Video Encoder Driver

2016-04-24 Thread tiffany lin
Hi Hans,


On Fri, 2016-04-22 at 15:47 +0200, Hans Verkuil wrote:
> On 04/22/2016 06:25 AM, Tiffany Lin wrote:
> > Add v4l2 layer encoder driver for MT8173
> > 
> > Signed-off-by: Tiffany Lin 
> > 
> > ---
> >  drivers/media/platform/Kconfig |   16 +
> >  drivers/media/platform/Makefile|2 +
> >  drivers/media/platform/mtk-vcodec/Makefile |   14 +
> >  drivers/media/platform/mtk-vcodec/mtk_vcodec_drv.h |  339 +
> >  drivers/media/platform/mtk-vcodec/mtk_vcodec_enc.c | 1301 
> > 
> >  drivers/media/platform/mtk-vcodec/mtk_vcodec_enc.h |   59 +
> >  .../media/platform/mtk-vcodec/mtk_vcodec_enc_drv.c |  467 +++
> >  .../media/platform/mtk-vcodec/mtk_vcodec_enc_pm.c  |  137 +++
> >  .../media/platform/mtk-vcodec/mtk_vcodec_enc_pm.h  |   26 +
> >  .../media/platform/mtk-vcodec/mtk_vcodec_intr.c|   56 +
> >  .../media/platform/mtk-vcodec/mtk_vcodec_intr.h|   27 +
> >  .../media/platform/mtk-vcodec/mtk_vcodec_util.c|   96 ++
> >  .../media/platform/mtk-vcodec/mtk_vcodec_util.h|   87 ++
> >  drivers/media/platform/mtk-vcodec/venc_drv_base.h  |   62 +
> >  drivers/media/platform/mtk-vcodec/venc_drv_if.c|  107 ++
> >  drivers/media/platform/mtk-vcodec/venc_drv_if.h|  165 +++
> >  drivers/media/platform/mtk-vcodec/venc_ipi_msg.h   |  210 
> >  17 files changed, 3171 insertions(+)
> >  create mode 100644 drivers/media/platform/mtk-vcodec/Makefile
> >  create mode 100644 drivers/media/platform/mtk-vcodec/mtk_vcodec_drv.h
> >  create mode 100644 drivers/media/platform/mtk-vcodec/mtk_vcodec_enc.c
> >  create mode 100644 drivers/media/platform/mtk-vcodec/mtk_vcodec_enc.h
> >  create mode 100644 drivers/media/platform/mtk-vcodec/mtk_vcodec_enc_drv.c
> >  create mode 100644 drivers/media/platform/mtk-vcodec/mtk_vcodec_enc_pm.c
> >  create mode 100644 drivers/media/platform/mtk-vcodec/mtk_vcodec_enc_pm.h
> >  create mode 100644 drivers/media/platform/mtk-vcodec/mtk_vcodec_intr.c
> >  create mode 100644 drivers/media/platform/mtk-vcodec/mtk_vcodec_intr.h
> >  create mode 100644 drivers/media/platform/mtk-vcodec/mtk_vcodec_util.c
> >  create mode 100644 drivers/media/platform/mtk-vcodec/mtk_vcodec_util.h
> >  create mode 100644 drivers/media/platform/mtk-vcodec/venc_drv_base.h
> >  create mode 100644 drivers/media/platform/mtk-vcodec/venc_drv_if.c
> >  create mode 100644 drivers/media/platform/mtk-vcodec/venc_drv_if.h
> >  create mode 100644 drivers/media/platform/mtk-vcodec/venc_ipi_msg.h
> > 
> > diff --git a/drivers/media/platform/mtk-vcodec/mtk_vcodec_enc.c 
> > b/drivers/media/platform/mtk-vcodec/mtk_vcodec_enc.c
> > new file mode 100644
> > index 000..b2b662c
> > --- /dev/null
> > +++ b/drivers/media/platform/mtk-vcodec/mtk_vcodec_enc.c
> > @@ -0,0 +1,1301 @@
> > +/* V4L2 specification suggests the driver corrects the format struct if 
> > any of
> > +  * the dimensions is unsupported
> > +  */
> > +static int vidioc_try_fmt(struct v4l2_format *f, struct mtk_video_fmt *fmt)
> > +{
> > +   struct v4l2_pix_format_mplane *pix_fmt_mp = >fmt.pix_mp;
> > +   int i;
> > +
> > +   pix_fmt_mp->field = V4L2_FIELD_NONE;
> > +
> > +   if (f->type == V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE) {
> > +   pix_fmt_mp->num_planes = 1;
> > +   pix_fmt_mp->plane_fmt[0].bytesperline = 0;
> > +   } else if (f->type == V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE) {
> > +   int tmp_w, tmp_h;
> > +
> > +   pix_fmt_mp->height = clamp(pix_fmt_mp->height,
> > +   MTK_VENC_MIN_H,
> > +   MTK_VENC_MAX_H);
> > +   pix_fmt_mp->width = clamp(pix_fmt_mp->width,
> > +   MTK_VENC_MIN_W,
> > +   MTK_VENC_MAX_W);
> 
> Strange indentation. I see this in more places. Can you check this and fix
> where needed?
> 
I will fix all this strange indentation in next version.

> > +
> > +   /* find next closer width align 16, heign align 32, size align
> > + * 64 rectangle
> > + */
> > +   tmp_w = pix_fmt_mp->width;
> > +   tmp_h = pix_fmt_mp->height;
> > +   v4l_bound_align_image(_fmt_mp->width,
> > +   MTK_VENC_MIN_W,
> > +   MTK_VENC_MAX_W, 4,
> > +   _fmt_mp->height,
> > +   MTK_VENC_MIN_H,
> > +   MTK_VENC_MAX_H, 5, 6);
> > +
> 
> ...
> 
> > +static int vidioc_try_fmt_vid_out_mplane(struct file *file, void *priv,
> > +struct v4l2_format *f)
> > +{
> > +   struct mtk_video_fmt *fmt;
> > +
> > +   fmt = mtk_venc_find_format(f);
> > +   if (!fmt) {
> > +   f->fmt.pix.pixelformat = 

Re: [PATCH V4 3/4] gpio: tegra: Get rid of all file scoped global variables

2016-04-24 Thread Alexandre Courbot
On Fri, Apr 22, 2016 at 7:09 PM, Laxman Dewangan  wrote:
> Move the file scoped multiple global variable from Tegra GPIO
> driver to the structure and make this as gpiochip data which
> can be referred from GPIO chip callbacks.
>
> Signed-off-by: Laxman Dewangan 
> Reviewed-by: Stephen Warren 
>
> ---
> This patch is reworked on top of earlier patch
> gpio: tegra: Remove the need of keeping device handle for gpio driver
>
> There was review comment that we should get for all variable and hence
> this is outcome of the discussion.
>
> Changes from V3:
> - Remove DBC/EN registers.
> - Remove non-required new lines.
> - Collected RB from Stephen.
>
>  drivers/gpio/gpio-tegra.c | 296 
> +++---
>  1 file changed, 176 insertions(+), 120 deletions(-)
>
> diff --git a/drivers/gpio/gpio-tegra.c b/drivers/gpio/gpio-tegra.c
> index cd69422..6af5eb2 100644
> --- a/drivers/gpio/gpio-tegra.c
> +++ b/drivers/gpio/gpio-tegra.c
> @@ -35,24 +35,24 @@
>  #define GPIO_PORT(x)   (((x) >> 3) & 0x3)
>  #define GPIO_BIT(x)((x) & 0x7)
>
> -#define GPIO_REG(x)(GPIO_BANK(x) * tegra_gpio_bank_stride + \
> +#define GPIO_REG(tgi, x)   (GPIO_BANK(x) * tgi->soc->bank_stride + \
> GPIO_PORT(x) * 4)
>
> -#define GPIO_CNF(x)(GPIO_REG(x) + 0x00)
> -#define GPIO_OE(x) (GPIO_REG(x) + 0x10)
> -#define GPIO_OUT(x)(GPIO_REG(x) + 0X20)
> -#define GPIO_IN(x) (GPIO_REG(x) + 0x30)
> -#define GPIO_INT_STA(x)(GPIO_REG(x) + 0x40)
> -#define GPIO_INT_ENB(x)(GPIO_REG(x) + 0x50)
> -#define GPIO_INT_LVL(x)(GPIO_REG(x) + 0x60)
> -#define GPIO_INT_CLR(x)(GPIO_REG(x) + 0x70)
> -
> -#define GPIO_MSK_CNF(x)(GPIO_REG(x) + 
> tegra_gpio_upper_offset + 0x00)
> -#define GPIO_MSK_OE(x) (GPIO_REG(x) + tegra_gpio_upper_offset + 0x10)
> -#define GPIO_MSK_OUT(x)(GPIO_REG(x) + 
> tegra_gpio_upper_offset + 0X20)
> -#define GPIO_MSK_INT_STA(x)(GPIO_REG(x) + tegra_gpio_upper_offset + 0x40)
> -#define GPIO_MSK_INT_ENB(x)(GPIO_REG(x) + tegra_gpio_upper_offset + 0x50)
> -#define GPIO_MSK_INT_LVL(x)(GPIO_REG(x) + tegra_gpio_upper_offset + 0x60)
> +#define GPIO_CNF(t, x) (GPIO_REG(t, x) + 0x00)
> +#define GPIO_OE(t, x)  (GPIO_REG(t, x) + 0x10)
> +#define GPIO_OUT(t, x) (GPIO_REG(t, x) + 0X20)
> +#define GPIO_IN(t, x)  (GPIO_REG(t, x) + 0x30)
> +#define GPIO_INT_STA(t, x) (GPIO_REG(t, x) + 0x40)
> +#define GPIO_INT_ENB(t, x) (GPIO_REG(t, x) + 0x50)
> +#define GPIO_INT_LVL(t, x) (GPIO_REG(t, x) + 0x60)
> +#define GPIO_INT_CLR(t, x) (GPIO_REG(t, x) + 0x70)
> +
> +#define GPIO_MSK_CNF(t, x) (GPIO_REG(t, x) + t->soc->upper_offset + 0x00)
> +#define GPIO_MSK_OE(t, x)  (GPIO_REG(t, x) + t->soc->upper_offset + 0x10)
> +#define GPIO_MSK_OUT(t, x) (GPIO_REG(t, x) + t->soc->upper_offset + 0X20)
> +#define GPIO_MSK_INT_STA(t, x) (GPIO_REG(t, x) + t->soc->upper_offset + 0x40)
> +#define GPIO_MSK_INT_ENB(t, x) (GPIO_REG(t, x) + t->soc->upper_offset + 0x50)
> +#define GPIO_MSK_INT_LVL(t, x) (GPIO_REG(t, x) + t->soc->upper_offset + 0x60)
>
>  #define GPIO_INT_LVL_MASK  0x010101
>  #define GPIO_INT_LVL_EDGE_RISING   0x000101
> @@ -61,6 +61,8 @@
>  #define GPIO_INT_LVL_LEVEL_HIGH0x01
>  #define GPIO_INT_LVL_LEVEL_LOW 0x00
>
> +struct tegra_gpio_info;
> +
>  struct tegra_gpio_bank {
> int bank;
> int irq;
> @@ -73,6 +75,7 @@ struct tegra_gpio_bank {
> u32 int_lvl[4];
> u32 wake_enb[4];
>  #endif
> +   struct tegra_gpio_info *tgi;
>  };
>
>  struct tegra_gpio_soc_config {
> @@ -80,22 +83,25 @@ struct tegra_gpio_soc_config {
> u32 upper_offset;
>  };
>
> -static struct device *dev;
> -static struct irq_domain *irq_domain;
> -static void __iomem *regs;
> -static u32 tegra_gpio_bank_count;
> -static u32 tegra_gpio_bank_stride;
> -static u32 tegra_gpio_upper_offset;
> -static struct tegra_gpio_bank *tegra_gpio_banks;
> +struct tegra_gpio_info {

I think tegra_gpio_chip would be a better name for this structure
(especially if you make "struct gpio_chip gc" its first member to
highlight the fact that it inherits from it) and more in line with
what other GPIO drivers do.

You can then rename the former tegra_gpio_chip to something like
tegra_gpio_funcs since it would just be used to set the chip's
functions if you follow my comment on patch 4/4.

> +   struct device   *dev;
> +   void __iomem*regs;
> +   struct irq_domain   *irq_domain;
> +   struct tegra_gpio_bank  *bank_info;
> +   const struct tegra_gpio_soc_config  *soc;
> +   struct gpio_chip*gc;
> +   u32  

Re: [PATCH V4 3/4] gpio: tegra: Get rid of all file scoped global variables

2016-04-24 Thread Alexandre Courbot
On Fri, Apr 22, 2016 at 7:09 PM, Laxman Dewangan  wrote:
> Move the file scoped multiple global variable from Tegra GPIO
> driver to the structure and make this as gpiochip data which
> can be referred from GPIO chip callbacks.
>
> Signed-off-by: Laxman Dewangan 
> Reviewed-by: Stephen Warren 
>
> ---
> This patch is reworked on top of earlier patch
> gpio: tegra: Remove the need of keeping device handle for gpio driver
>
> There was review comment that we should get for all variable and hence
> this is outcome of the discussion.
>
> Changes from V3:
> - Remove DBC/EN registers.
> - Remove non-required new lines.
> - Collected RB from Stephen.
>
>  drivers/gpio/gpio-tegra.c | 296 
> +++---
>  1 file changed, 176 insertions(+), 120 deletions(-)
>
> diff --git a/drivers/gpio/gpio-tegra.c b/drivers/gpio/gpio-tegra.c
> index cd69422..6af5eb2 100644
> --- a/drivers/gpio/gpio-tegra.c
> +++ b/drivers/gpio/gpio-tegra.c
> @@ -35,24 +35,24 @@
>  #define GPIO_PORT(x)   (((x) >> 3) & 0x3)
>  #define GPIO_BIT(x)((x) & 0x7)
>
> -#define GPIO_REG(x)(GPIO_BANK(x) * tegra_gpio_bank_stride + \
> +#define GPIO_REG(tgi, x)   (GPIO_BANK(x) * tgi->soc->bank_stride + \
> GPIO_PORT(x) * 4)
>
> -#define GPIO_CNF(x)(GPIO_REG(x) + 0x00)
> -#define GPIO_OE(x) (GPIO_REG(x) + 0x10)
> -#define GPIO_OUT(x)(GPIO_REG(x) + 0X20)
> -#define GPIO_IN(x) (GPIO_REG(x) + 0x30)
> -#define GPIO_INT_STA(x)(GPIO_REG(x) + 0x40)
> -#define GPIO_INT_ENB(x)(GPIO_REG(x) + 0x50)
> -#define GPIO_INT_LVL(x)(GPIO_REG(x) + 0x60)
> -#define GPIO_INT_CLR(x)(GPIO_REG(x) + 0x70)
> -
> -#define GPIO_MSK_CNF(x)(GPIO_REG(x) + 
> tegra_gpio_upper_offset + 0x00)
> -#define GPIO_MSK_OE(x) (GPIO_REG(x) + tegra_gpio_upper_offset + 0x10)
> -#define GPIO_MSK_OUT(x)(GPIO_REG(x) + 
> tegra_gpio_upper_offset + 0X20)
> -#define GPIO_MSK_INT_STA(x)(GPIO_REG(x) + tegra_gpio_upper_offset + 0x40)
> -#define GPIO_MSK_INT_ENB(x)(GPIO_REG(x) + tegra_gpio_upper_offset + 0x50)
> -#define GPIO_MSK_INT_LVL(x)(GPIO_REG(x) + tegra_gpio_upper_offset + 0x60)
> +#define GPIO_CNF(t, x) (GPIO_REG(t, x) + 0x00)
> +#define GPIO_OE(t, x)  (GPIO_REG(t, x) + 0x10)
> +#define GPIO_OUT(t, x) (GPIO_REG(t, x) + 0X20)
> +#define GPIO_IN(t, x)  (GPIO_REG(t, x) + 0x30)
> +#define GPIO_INT_STA(t, x) (GPIO_REG(t, x) + 0x40)
> +#define GPIO_INT_ENB(t, x) (GPIO_REG(t, x) + 0x50)
> +#define GPIO_INT_LVL(t, x) (GPIO_REG(t, x) + 0x60)
> +#define GPIO_INT_CLR(t, x) (GPIO_REG(t, x) + 0x70)
> +
> +#define GPIO_MSK_CNF(t, x) (GPIO_REG(t, x) + t->soc->upper_offset + 0x00)
> +#define GPIO_MSK_OE(t, x)  (GPIO_REG(t, x) + t->soc->upper_offset + 0x10)
> +#define GPIO_MSK_OUT(t, x) (GPIO_REG(t, x) + t->soc->upper_offset + 0X20)
> +#define GPIO_MSK_INT_STA(t, x) (GPIO_REG(t, x) + t->soc->upper_offset + 0x40)
> +#define GPIO_MSK_INT_ENB(t, x) (GPIO_REG(t, x) + t->soc->upper_offset + 0x50)
> +#define GPIO_MSK_INT_LVL(t, x) (GPIO_REG(t, x) + t->soc->upper_offset + 0x60)
>
>  #define GPIO_INT_LVL_MASK  0x010101
>  #define GPIO_INT_LVL_EDGE_RISING   0x000101
> @@ -61,6 +61,8 @@
>  #define GPIO_INT_LVL_LEVEL_HIGH0x01
>  #define GPIO_INT_LVL_LEVEL_LOW 0x00
>
> +struct tegra_gpio_info;
> +
>  struct tegra_gpio_bank {
> int bank;
> int irq;
> @@ -73,6 +75,7 @@ struct tegra_gpio_bank {
> u32 int_lvl[4];
> u32 wake_enb[4];
>  #endif
> +   struct tegra_gpio_info *tgi;
>  };
>
>  struct tegra_gpio_soc_config {
> @@ -80,22 +83,25 @@ struct tegra_gpio_soc_config {
> u32 upper_offset;
>  };
>
> -static struct device *dev;
> -static struct irq_domain *irq_domain;
> -static void __iomem *regs;
> -static u32 tegra_gpio_bank_count;
> -static u32 tegra_gpio_bank_stride;
> -static u32 tegra_gpio_upper_offset;
> -static struct tegra_gpio_bank *tegra_gpio_banks;
> +struct tegra_gpio_info {

I think tegra_gpio_chip would be a better name for this structure
(especially if you make "struct gpio_chip gc" its first member to
highlight the fact that it inherits from it) and more in line with
what other GPIO drivers do.

You can then rename the former tegra_gpio_chip to something like
tegra_gpio_funcs since it would just be used to set the chip's
functions if you follow my comment on patch 4/4.

> +   struct device   *dev;
> +   void __iomem*regs;
> +   struct irq_domain   *irq_domain;
> +   struct tegra_gpio_bank  *bank_info;
> +   const struct tegra_gpio_soc_config  *soc;
> +   struct gpio_chip*gc;
> +   u32 bank_count;
> +};
>
> -static inline void 

Re: [RFC][PATCH 0/6] /dev/random - a new approach

2016-04-24 Thread Stephan Mueller
Am Sonntag, 24. April 2016, 23:25:00 schrieb Pavel Machek:

Hi Pavel,

> > /* This RNG does not work if no high-resolution timer is available */
> > BUG_ON(!random_get_entropy() && !random_get_entropy());
> 
> Heh, does this cause BUG() with 2^-64 probability? :-).

No, but for the listed arches, get_cycles would return 0. And I only call the 
function twice to not be tripped by a potential wrap around at the time of 
calling.
> 
> > If there is no high-resolution timer, the LRNG will not produce good
> > entropic random numbers. The current kernel code implements
> > high-resolution timers for all but the following architectures where
> > neither random_get_entropy nor
> > get_cycles are implemented:
> Ok, what about stuff like Intel 486 (no RDTSC)?
> 
> > Thus, for all large-scale architectures, the LRNG would be applicable.
> > 
> > Please note that also the legacy /dev/random will have hard time to obtain
> > entropy for these environments. The majority of the entropy comes
> > from high-
> 
> Understood.
> 
> > Though, the patch I offer leaves the legacy /dev/random in peace for those
> > architectures to not touch the status quo.
> 
> Well -- that's the major problem -- right? Makes it tricky to tell
> what changed, and we had two RNGs to maintain.

I would rather think that even the legacy /dev/random should not return any 
values in those environments. The random numbers that are returned on these 
systems are bogus, considering that the only noise source that could deliver 
some entropy excluding timestamps (if you trust the user) are the HID event 
values. And for those listed systems, I doubt very much that they are used in 
a desktop environment where you have a console.

If everybody agrees, I can surely add some logic to make the LRNG working on 
those systems. But those additions cannot be subjected to a thorough entropy 
analysis. Yet I feel that this is wrong.

My goal with the LRNG is to provide a new design using proven techniques that 
is forward looking. I am aware that the design does not work in circumstances 
where the high-res timer is not present. But do we have to settle on the least 
common denominator knowing that this one will not really work to begin with?

Ciao
Stephan


Re: [RFC][PATCH 0/6] /dev/random - a new approach

2016-04-24 Thread Stephan Mueller
Am Sonntag, 24. April 2016, 23:25:00 schrieb Pavel Machek:

Hi Pavel,

> > /* This RNG does not work if no high-resolution timer is available */
> > BUG_ON(!random_get_entropy() && !random_get_entropy());
> 
> Heh, does this cause BUG() with 2^-64 probability? :-).

No, but for the listed arches, get_cycles would return 0. And I only call the 
function twice to not be tripped by a potential wrap around at the time of 
calling.
> 
> > If there is no high-resolution timer, the LRNG will not produce good
> > entropic random numbers. The current kernel code implements
> > high-resolution timers for all but the following architectures where
> > neither random_get_entropy nor
> > get_cycles are implemented:
> Ok, what about stuff like Intel 486 (no RDTSC)?
> 
> > Thus, for all large-scale architectures, the LRNG would be applicable.
> > 
> > Please note that also the legacy /dev/random will have hard time to obtain
> > entropy for these environments. The majority of the entropy comes
> > from high-
> 
> Understood.
> 
> > Though, the patch I offer leaves the legacy /dev/random in peace for those
> > architectures to not touch the status quo.
> 
> Well -- that's the major problem -- right? Makes it tricky to tell
> what changed, and we had two RNGs to maintain.

I would rather think that even the legacy /dev/random should not return any 
values in those environments. The random numbers that are returned on these 
systems are bogus, considering that the only noise source that could deliver 
some entropy excluding timestamps (if you trust the user) are the HID event 
values. And for those listed systems, I doubt very much that they are used in 
a desktop environment where you have a console.

If everybody agrees, I can surely add some logic to make the LRNG working on 
those systems. But those additions cannot be subjected to a thorough entropy 
analysis. Yet I feel that this is wrong.

My goal with the LRNG is to provide a new design using proven techniques that 
is forward looking. I am aware that the design does not work in circumstances 
where the high-res timer is not present. But do we have to settle on the least 
common denominator knowing that this one will not really work to begin with?

Ciao
Stephan


Re: [PATCH V3 4/4] gpio: tegra: Add support for gpio debounce

2016-04-24 Thread Alexandre Courbot
On Wed, Apr 20, 2016 at 10:30 PM, Laxman Dewangan  wrote:
> NVIDIA's Tegra210 support the HW debounce in the GPIO
> controller for all its GPIO pins.
>
> Add support for setting debounce timing by implementing the
> set_debounce callback of gpiochip.
>
> Signed-off-by: Laxman Dewangan 
>
> ---
> Changes from V1:
> - Write debounce count before enable.
> - Make sure the debounce count do not have any boot residuals.
>
> Changes from V2:
> - Only access register fo debounce when SoC support debounce.
> ---
>  drivers/gpio/gpio-tegra.c | 58 
> +++
>  1 file changed, 58 insertions(+)
>
> diff --git a/drivers/gpio/gpio-tegra.c b/drivers/gpio/gpio-tegra.c
> index 36e865f..1f8ec24 100644
> --- a/drivers/gpio/gpio-tegra.c
> +++ b/drivers/gpio/gpio-tegra.c
> @@ -76,11 +76,14 @@ struct tegra_gpio_bank {
> u32 int_enb[4];
> u32 int_lvl[4];
> u32 wake_enb[4];
> +   u32 dbc_enb[4];
>  #endif
> +   u32 dbc_cnt[4];
> struct tegra_gpio_info *tgi;
>  };
>
>  struct tegra_gpio_soc_config {
> +   bool debounce_supported;
> u32 bank_stride;
> u32 upper_offset;
>  };
> @@ -184,6 +187,35 @@ static int tegra_gpio_direction_output(struct gpio_chip 
> *chip, unsigned offset,
> return 0;
>  }
>
> +static int tegra_gpio_set_debounce(struct gpio_chip *chip, unsigned int 
> offset,
> +  unsigned int debounce)
> +{
> +   struct tegra_gpio_info *tgi = gpiochip_get_data(chip);
> +   unsigned int debounce_ms = DIV_ROUND_UP(debounce, 1000);
> +   int port = GPIO_PORT(offset);
> +   int bank = GPIO_BANK(offset);
> +
> +   if (!debounce_ms) {
> +   tegra_gpio_mask_write(tgi, GPIO_MSK_DBC_EN(tgi, offset),
> + offset, 0);
> +   return 0;
> +   }
> +
> +   debounce_ms = min(debounce_ms, 255U);
> +
> +   /* There is only one debounce count register per port and hence
> +* set the maximum of current and requested debounce time.
> +*/
> +   if (tgi->bank_info[bank].dbc_cnt[port] < debounce_ms) {
> +   tegra_gpio_writel(tgi, debounce_ms, GPIO_DBC_CNT(tgi, 
> offset));
> +   tgi->bank_info[bank].dbc_cnt[port] = debounce_ms;
> +   }
> +
> +   tegra_gpio_mask_write(tgi, GPIO_MSK_DBC_EN(tgi, offset), offset, 1);
> +
> +   return 0;
> +}
> +
>  static int tegra_gpio_to_irq(struct gpio_chip *chip, unsigned offset)
>  {
> struct tegra_gpio_info *tgi = gpiochip_get_data(chip);
> @@ -199,6 +231,7 @@ static struct gpio_chip tegra_gpio_chip = {
> .get= tegra_gpio_get,
> .direction_output   = tegra_gpio_direction_output,
> .set= tegra_gpio_set,
> +   .set_debounce   = tegra_gpio_set_debounce,
> .to_irq = tegra_gpio_to_irq,
> .base   = 0,
>  };
> @@ -363,6 +396,14 @@ static int tegra_gpio_resume(struct device *dev)
> unsigned int gpio = (b<<5) | (p<<3);
> tegra_gpio_writel(tgi, bank->cnf[p],
>   GPIO_CNF(tgi, gpio));
> +
> +   if (tgi->soc->debounce_supported) {
> +   tegra_gpio_writel(tgi, bank->dbc_cnt[p],
> + GPIO_DBC_CNT(tgi, gpio));
> +   tegra_gpio_writel(tgi, bank->dbc_enb[p],
> + GPIO_MSK_DBC_EN(tgi, gpio));
> +   }
> +
> tegra_gpio_writel(tgi, bank->out[p],
>   GPIO_OUT(tgi, gpio));
> tegra_gpio_writel(tgi, bank->oe[p],
> @@ -398,6 +439,13 @@ static int tegra_gpio_suspend(struct device *dev)
> GPIO_OUT(tgi, gpio));
> bank->oe[p] = tegra_gpio_readl(tgi,
>GPIO_OE(tgi, gpio));
> +   if (tgi->soc->debounce_supported) {
> +   bank->dbc_enb[p] = tegra_gpio_readl(tgi,
> +   GPIO_MSK_DBC_EN(tgi, gpio));
> +   bank->dbc_enb[p] = (bank->dbc_enb[p] << 8) |
> +   bank->dbc_enb[p];
> +   }
> +
> bank->int_enb[p] = tegra_gpio_readl(tgi,
> GPIO_INT_ENB(tgi, gpio));
> bank->int_lvl[p] = tegra_gpio_readl(tgi,
> @@ -550,6 +598,9 @@ static int tegra_gpio_probe(struct platform_device *pdev)
>
> platform_set_drvdata(pdev, tgi);
>
> +   if (!config->debounce_supported)
> +   tgi->gc->set_debounce = NULL;

This 

Re: [PATCH V3 4/4] gpio: tegra: Add support for gpio debounce

2016-04-24 Thread Alexandre Courbot
On Wed, Apr 20, 2016 at 10:30 PM, Laxman Dewangan  wrote:
> NVIDIA's Tegra210 support the HW debounce in the GPIO
> controller for all its GPIO pins.
>
> Add support for setting debounce timing by implementing the
> set_debounce callback of gpiochip.
>
> Signed-off-by: Laxman Dewangan 
>
> ---
> Changes from V1:
> - Write debounce count before enable.
> - Make sure the debounce count do not have any boot residuals.
>
> Changes from V2:
> - Only access register fo debounce when SoC support debounce.
> ---
>  drivers/gpio/gpio-tegra.c | 58 
> +++
>  1 file changed, 58 insertions(+)
>
> diff --git a/drivers/gpio/gpio-tegra.c b/drivers/gpio/gpio-tegra.c
> index 36e865f..1f8ec24 100644
> --- a/drivers/gpio/gpio-tegra.c
> +++ b/drivers/gpio/gpio-tegra.c
> @@ -76,11 +76,14 @@ struct tegra_gpio_bank {
> u32 int_enb[4];
> u32 int_lvl[4];
> u32 wake_enb[4];
> +   u32 dbc_enb[4];
>  #endif
> +   u32 dbc_cnt[4];
> struct tegra_gpio_info *tgi;
>  };
>
>  struct tegra_gpio_soc_config {
> +   bool debounce_supported;
> u32 bank_stride;
> u32 upper_offset;
>  };
> @@ -184,6 +187,35 @@ static int tegra_gpio_direction_output(struct gpio_chip 
> *chip, unsigned offset,
> return 0;
>  }
>
> +static int tegra_gpio_set_debounce(struct gpio_chip *chip, unsigned int 
> offset,
> +  unsigned int debounce)
> +{
> +   struct tegra_gpio_info *tgi = gpiochip_get_data(chip);
> +   unsigned int debounce_ms = DIV_ROUND_UP(debounce, 1000);
> +   int port = GPIO_PORT(offset);
> +   int bank = GPIO_BANK(offset);
> +
> +   if (!debounce_ms) {
> +   tegra_gpio_mask_write(tgi, GPIO_MSK_DBC_EN(tgi, offset),
> + offset, 0);
> +   return 0;
> +   }
> +
> +   debounce_ms = min(debounce_ms, 255U);
> +
> +   /* There is only one debounce count register per port and hence
> +* set the maximum of current and requested debounce time.
> +*/
> +   if (tgi->bank_info[bank].dbc_cnt[port] < debounce_ms) {
> +   tegra_gpio_writel(tgi, debounce_ms, GPIO_DBC_CNT(tgi, 
> offset));
> +   tgi->bank_info[bank].dbc_cnt[port] = debounce_ms;
> +   }
> +
> +   tegra_gpio_mask_write(tgi, GPIO_MSK_DBC_EN(tgi, offset), offset, 1);
> +
> +   return 0;
> +}
> +
>  static int tegra_gpio_to_irq(struct gpio_chip *chip, unsigned offset)
>  {
> struct tegra_gpio_info *tgi = gpiochip_get_data(chip);
> @@ -199,6 +231,7 @@ static struct gpio_chip tegra_gpio_chip = {
> .get= tegra_gpio_get,
> .direction_output   = tegra_gpio_direction_output,
> .set= tegra_gpio_set,
> +   .set_debounce   = tegra_gpio_set_debounce,
> .to_irq = tegra_gpio_to_irq,
> .base   = 0,
>  };
> @@ -363,6 +396,14 @@ static int tegra_gpio_resume(struct device *dev)
> unsigned int gpio = (b<<5) | (p<<3);
> tegra_gpio_writel(tgi, bank->cnf[p],
>   GPIO_CNF(tgi, gpio));
> +
> +   if (tgi->soc->debounce_supported) {
> +   tegra_gpio_writel(tgi, bank->dbc_cnt[p],
> + GPIO_DBC_CNT(tgi, gpio));
> +   tegra_gpio_writel(tgi, bank->dbc_enb[p],
> + GPIO_MSK_DBC_EN(tgi, gpio));
> +   }
> +
> tegra_gpio_writel(tgi, bank->out[p],
>   GPIO_OUT(tgi, gpio));
> tegra_gpio_writel(tgi, bank->oe[p],
> @@ -398,6 +439,13 @@ static int tegra_gpio_suspend(struct device *dev)
> GPIO_OUT(tgi, gpio));
> bank->oe[p] = tegra_gpio_readl(tgi,
>GPIO_OE(tgi, gpio));
> +   if (tgi->soc->debounce_supported) {
> +   bank->dbc_enb[p] = tegra_gpio_readl(tgi,
> +   GPIO_MSK_DBC_EN(tgi, gpio));
> +   bank->dbc_enb[p] = (bank->dbc_enb[p] << 8) |
> +   bank->dbc_enb[p];
> +   }
> +
> bank->int_enb[p] = tegra_gpio_readl(tgi,
> GPIO_INT_ENB(tgi, gpio));
> bank->int_lvl[p] = tegra_gpio_readl(tgi,
> @@ -550,6 +598,9 @@ static int tegra_gpio_probe(struct platform_device *pdev)
>
> platform_set_drvdata(pdev, tgi);
>
> +   if (!config->debounce_supported)
> +   tgi->gc->set_debounce = NULL;

This last line is equivalent to doing

 

[PATCH] sched: don't output cpu sched info by default

2016-04-24 Thread Zhang Long
From: LongX Zhang 

Users usually install dozens of apps on Android devices.
When system memory is used up as thousands of threads might run,
Android userspace debug prcoess might dump system info by sysrq.
One info is of cpu sched. Usually, one thread has one line dump.
Such log is huge sometimes and such dumping spends lots of time
and make the system worse. Sometimes, watchdog resets the system
in the end.

The patch fixes it by dumping cpu sched info only when
sched_debug_enabled is true.

Signed-off-by: Zhang Yanmin 
Signed-off-by: LongX Zhang 
---
 kernel/sched/core.c | 37 +
 1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8b489fc..92d2d83 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4973,6 +4973,25 @@ void sched_show_task(struct task_struct *p)
show_stack(p, NULL);
 }
 
+#ifdef CONFIG_SCHED_DEBUG
+
+static __read_mostly int sched_debug_enabled;
+
+static int __init sched_debug_setup(char *str)
+{
+   sched_debug_enabled = 1;
+
+   return 0;
+}
+early_param("sched_debug", sched_debug_setup);
+
+static inline bool sched_debug(void)
+{
+   return sched_debug_enabled;
+}
+
+#endif
+
 void show_state_filter(unsigned long state_filter)
 {
struct task_struct *g, *p;
@@ -4998,7 +5017,8 @@ void show_state_filter(unsigned long state_filter)
touch_all_softlockup_watchdogs();
 
 #ifdef CONFIG_SCHED_DEBUG
-   sysrq_sched_debug_show();
+   if (sched_debug())
+   sysrq_sched_debug_show();
 #endif
rcu_read_unlock();
/*
@@ -5499,21 +5519,6 @@ static cpumask_var_t sched_domains_tmpmask; /* 
sched_domains_mutex */
 
 #ifdef CONFIG_SCHED_DEBUG
 
-static __read_mostly int sched_debug_enabled;
-
-static int __init sched_debug_setup(char *str)
-{
-   sched_debug_enabled = 1;
-
-   return 0;
-}
-early_param("sched_debug", sched_debug_setup);
-
-static inline bool sched_debug(void)
-{
-   return sched_debug_enabled;
-}
-
 static int sched_domain_debug_one(struct sched_domain *sd, int cpu, int level,
  struct cpumask *groupmask)
 {
-- 
1.9.1





[PATCH] sched: don't output cpu sched info by default

2016-04-24 Thread Zhang Long
From: LongX Zhang 

Users usually install dozens of apps on Android devices.
When system memory is used up as thousands of threads might run,
Android userspace debug prcoess might dump system info by sysrq.
One info is of cpu sched. Usually, one thread has one line dump.
Such log is huge sometimes and such dumping spends lots of time
and make the system worse. Sometimes, watchdog resets the system
in the end.

The patch fixes it by dumping cpu sched info only when
sched_debug_enabled is true.

Signed-off-by: Zhang Yanmin 
Signed-off-by: LongX Zhang 
---
 kernel/sched/core.c | 37 +
 1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8b489fc..92d2d83 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4973,6 +4973,25 @@ void sched_show_task(struct task_struct *p)
show_stack(p, NULL);
 }
 
+#ifdef CONFIG_SCHED_DEBUG
+
+static __read_mostly int sched_debug_enabled;
+
+static int __init sched_debug_setup(char *str)
+{
+   sched_debug_enabled = 1;
+
+   return 0;
+}
+early_param("sched_debug", sched_debug_setup);
+
+static inline bool sched_debug(void)
+{
+   return sched_debug_enabled;
+}
+
+#endif
+
 void show_state_filter(unsigned long state_filter)
 {
struct task_struct *g, *p;
@@ -4998,7 +5017,8 @@ void show_state_filter(unsigned long state_filter)
touch_all_softlockup_watchdogs();
 
 #ifdef CONFIG_SCHED_DEBUG
-   sysrq_sched_debug_show();
+   if (sched_debug())
+   sysrq_sched_debug_show();
 #endif
rcu_read_unlock();
/*
@@ -5499,21 +5519,6 @@ static cpumask_var_t sched_domains_tmpmask; /* 
sched_domains_mutex */
 
 #ifdef CONFIG_SCHED_DEBUG
 
-static __read_mostly int sched_debug_enabled;
-
-static int __init sched_debug_setup(char *str)
-{
-   sched_debug_enabled = 1;
-
-   return 0;
-}
-early_param("sched_debug", sched_debug_setup);
-
-static inline bool sched_debug(void)
-{
-   return sched_debug_enabled;
-}
-
 static int sched_domain_debug_one(struct sched_domain *sd, int cpu, int level,
  struct cpumask *groupmask)
 {
-- 
1.9.1





RE: [PATCH 5/6] Documentation: Binding doc for ethernet master in NS2

2016-04-24 Thread Pramod Kumar
Hi Rob,

Thanks for review and providing your valuable comments.

> -Original Message-
> From: Rob Herring [mailto:r...@kernel.org]
> Sent: 23 April 2016 01:44
> To: Pramod Kumar
> Cc: Catalin Marinas; Will Deacon; Masahiro Yamada; Chen-Yu Tsai; BCM
Kernel
> Feedback; Pawel Moll; Mark Rutland; Arnd Bergmann; Suzuki K Poulose;
Punit
> Agrawal; devicet...@vger.kernel.org;
linux-arm-ker...@lists.infradead.org;
> linux-kernel@vger.kernel.org; net...@vger.kernel.org
> Subject: Re: [PATCH 5/6] Documentation: Binding doc for ethernet master
in
> NS2
>
> On Thu, Apr 21, 2016 at 02:48:42PM +0530, Pramod Kumar wrote:
> > Adding binding doc for ethernet master present in shared MDIO
> > controller.
> >
> > Signed-off-by: Pramod Kumar 
> > Reviewed-by: Ray Jui 
> > Reviewed-by: Scott Branden 
> > ---
> >  .../bindings/net/brcm,iproc-mdio-shared.txt| 32
> ++
> >  1 file changed, 32 insertions(+)
> >  create mode 100644
> > Documentation/devicetree/bindings/net/brcm,iproc-mdio-shared.txt
> >
> > diff --git
> > a/Documentation/devicetree/bindings/net/brcm,iproc-mdio-shared.txt
> > b/Documentation/devicetree/bindings/net/brcm,iproc-mdio-shared.txt
> > new file mode 100644
> > index 000..1ffdd4b
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/net/brcm,iproc-mdio-shared.txt
> > @@ -0,0 +1,32 @@
> > +Broadcom Ethernet master for shared mdio controller
> > +
> > +Required properties:
> > +- compatible: must be "brcm,iproc-mdio-master-eth"
> > +- reg: master id of Ethernet block
> > +- address-cells: should be 1
> > +- size-cells: should be 0
> > +
> > +Sub-nodes:
> > +  Each port's PHY should be represented as a sub-node.
> > +
> > +Sub-nodes required properties:
> > +- reg: the PHY number
> > +- phy-mode: media type connecting the PHY and MAC.
> > +
> > +
> > +Example:
> > +   eth-master@0 {
>
> Is this a child of something?
>

This is an Shared MDIO master node as described in cover letter and is the
child node of iProc_shared_mdio plarform driver.


> Why is this not just an mdio bus underneath the ethernet controller? How
is this
> accessed?

This is the part for Shared MDIO controller which is shared among other
subsystem as well hence defined here. This works as glue layer between
This controller and legacy MDIO framework and register the MDIO bus to
legacy framework.
When any read/write request is issue from Legacy MDIO controller framework
it gets propagated to Shared controller platform driver via this driver
and finally platform driver issue MDIO transaction over bus.

>
> > +   compatible = "brcm,iproc-mdio-master-eth";
> > +   reg = <0x0>;
> > +   #address-cells = <1>;
> > +   #size-cells = <0>;
> > +   gphy0: eth-phy@10 {
> > +   reg = <0x10>;
> > +   phy-mode = "mii";
> > +   };
> > +   status = "ok"
> > +   };
> > +
> > +For more info on ethernet phy binding, please,refer to:
> > +Documentation/devicetree/bindings/net/phy.txt
> > +Documentation/devicetree/bindings/net/ethernet.txt
> > --
> > 1.9.1
> >

Regards,
Pramod


RE: [PATCH 5/6] Documentation: Binding doc for ethernet master in NS2

2016-04-24 Thread Pramod Kumar
Hi Rob,

Thanks for review and providing your valuable comments.

> -Original Message-
> From: Rob Herring [mailto:r...@kernel.org]
> Sent: 23 April 2016 01:44
> To: Pramod Kumar
> Cc: Catalin Marinas; Will Deacon; Masahiro Yamada; Chen-Yu Tsai; BCM
Kernel
> Feedback; Pawel Moll; Mark Rutland; Arnd Bergmann; Suzuki K Poulose;
Punit
> Agrawal; devicet...@vger.kernel.org;
linux-arm-ker...@lists.infradead.org;
> linux-kernel@vger.kernel.org; net...@vger.kernel.org
> Subject: Re: [PATCH 5/6] Documentation: Binding doc for ethernet master
in
> NS2
>
> On Thu, Apr 21, 2016 at 02:48:42PM +0530, Pramod Kumar wrote:
> > Adding binding doc for ethernet master present in shared MDIO
> > controller.
> >
> > Signed-off-by: Pramod Kumar 
> > Reviewed-by: Ray Jui 
> > Reviewed-by: Scott Branden 
> > ---
> >  .../bindings/net/brcm,iproc-mdio-shared.txt| 32
> ++
> >  1 file changed, 32 insertions(+)
> >  create mode 100644
> > Documentation/devicetree/bindings/net/brcm,iproc-mdio-shared.txt
> >
> > diff --git
> > a/Documentation/devicetree/bindings/net/brcm,iproc-mdio-shared.txt
> > b/Documentation/devicetree/bindings/net/brcm,iproc-mdio-shared.txt
> > new file mode 100644
> > index 000..1ffdd4b
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/net/brcm,iproc-mdio-shared.txt
> > @@ -0,0 +1,32 @@
> > +Broadcom Ethernet master for shared mdio controller
> > +
> > +Required properties:
> > +- compatible: must be "brcm,iproc-mdio-master-eth"
> > +- reg: master id of Ethernet block
> > +- address-cells: should be 1
> > +- size-cells: should be 0
> > +
> > +Sub-nodes:
> > +  Each port's PHY should be represented as a sub-node.
> > +
> > +Sub-nodes required properties:
> > +- reg: the PHY number
> > +- phy-mode: media type connecting the PHY and MAC.
> > +
> > +
> > +Example:
> > +   eth-master@0 {
>
> Is this a child of something?
>

This is an Shared MDIO master node as described in cover letter and is the
child node of iProc_shared_mdio plarform driver.


> Why is this not just an mdio bus underneath the ethernet controller? How
is this
> accessed?

This is the part for Shared MDIO controller which is shared among other
subsystem as well hence defined here. This works as glue layer between
This controller and legacy MDIO framework and register the MDIO bus to
legacy framework.
When any read/write request is issue from Legacy MDIO controller framework
it gets propagated to Shared controller platform driver via this driver
and finally platform driver issue MDIO transaction over bus.

>
> > +   compatible = "brcm,iproc-mdio-master-eth";
> > +   reg = <0x0>;
> > +   #address-cells = <1>;
> > +   #size-cells = <0>;
> > +   gphy0: eth-phy@10 {
> > +   reg = <0x10>;
> > +   phy-mode = "mii";
> > +   };
> > +   status = "ok"
> > +   };
> > +
> > +For more info on ethernet phy binding, please,refer to:
> > +Documentation/devicetree/bindings/net/phy.txt
> > +Documentation/devicetree/bindings/net/ethernet.txt
> > --
> > 1.9.1
> >

Regards,
Pramod


RE: [PATCH 2/6] Documentation: DT binding doc for iProc Shared MDIO Controller.

2016-04-24 Thread Pramod Kumar
Hi Rob,

Thanks for reviewing and providing your valuable comments.

> -Original Message-
> From: Rob Herring [mailto:r...@kernel.org]
> Sent: 23 April 2016 01:41
> To: Pramod Kumar
> Cc: Catalin Marinas; Will Deacon; Masahiro Yamada; Chen-Yu Tsai; Mark
> Rutland; devicet...@vger.kernel.org; Pawel Moll; Arnd Bergmann; Suzuki K
> Poulose; net...@vger.kernel.org; Punit Agrawal;
linux-kernel@vger.kernel.org;
> BCM Kernel Feedback; linux-arm-ker...@lists.infradead.org
> Subject: Re: [PATCH 2/6] Documentation: DT binding doc for iProc Shared
MDIO
> Controller.
>
> On Thu, Apr 21, 2016 at 02:48:39PM +0530, Pramod Kumar wrote:
> > Add DT binding doc for iProc Shared MDIO Controller which populate all
> > masters to Shared MDIO framework.
> >
> > Signed-off-by: Pramod Kumar 
> > Reviewed-by: Ray Jui 
> > Reviewed-by: Scott Branden 
> > ---
> >  .../bindings/bus/brcm,iproc-shared-mdio.txt| 76
> ++
> >  1 file changed, 76 insertions(+)
> >  create mode 100644
> > Documentation/devicetree/bindings/bus/brcm,iproc-shared-mdio.txt
> >
> > diff --git
> > a/Documentation/devicetree/bindings/bus/brcm,iproc-shared-mdio.txt
> > b/Documentation/devicetree/bindings/bus/brcm,iproc-shared-mdio.txt
> > new file mode 100644
> > index 000..f455f3d
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/bus/brcm,iproc-shared-mdio.txt
> > @@ -0,0 +1,76 @@
> > +Broadcom iProc Shared MDIO Controller
> > +
> > +Required properties:
> > +- compatible:
> > +"brcm,iproc-shared-mdio" for shared MDIO controller.
> > +- reg: specifies the base physical address and size of the registers
> > +- reg-names: should be "mdio"
> > +- #address-cells: must be 1.
> > +- #size-cells: must be 0.
> > +
> > +optional:
> > +child nodes: Masters are represented as a child node of shared MDIO
> > +controller and all the PHYs handled by this master are represented as
its child
> node.
>
> Seems kind of useless if child nodes are optional.
>

I agree. I'll put it under required properties.

> > +
> > +Master nodes properties are defined as-
> > +
> > +Required properties:
> > +- compatible: Used to match driver of this PHY.
> > +- reg: MDIO master ID.
> > +- #address-cells: must be 1.
> > +- #size-cells: must be 0.
> > +
> > +optional:
> > +-brcm,phy-internal: if presents, PHYs are internal. Absence shows phy
> > +is external.
> > +-brcm,is-c45: if presents, Controller uses Clause-45 to issue MDIO
> transaction.
> > +else Controller uses Clause-22 for transactions.
>
> Isn't this a property of the phy? IIRC, there is a standard property or
compatible
> string for this.
>

Shared MDIO controller's master holds all above property to ensure the
proper MDIO transaction over its bus.  Hence tried to pinned here.
These properties are standard one for the Ethernet PHY but Broadcom SoCs
MDIO is shared even with other I/O subsystem. Other subsystem does not
defines this property hence has been defined here.

> > +
> > +PHY nodes are represented as the child node of Master. Child nodes
> > +properties are defined as-
> > +
> > +Required properties:
> > +-reg: phy id of attached PHY.
> > +
> > +optional:
> > +There could be additional properties required to configure the
> > +specific phy like phy-mode in case of gpphy node below. These should
> > +be defined here and used by respective drivers.
> > +
> > +Example:
> > +iproc_mdio: iproc_mdio@663f {
> > +   compatible = "brcm,iproc-shared-mdio";
> > +   reg = <0x6602023c 0x14>;
> > +   reg-names = "mdio";
> > +   #address-cells = <1>;
> > +   #size-cells = <0>;
> > +
> > +   sata-master@6 {
>
> mdio@6

Do you advise us to rename the node name as mdio@6? If yes, I don't have
any personal preferences here and would do it. I'd thought to give proper
name so that consumer could get what this node is representing.
Please suggest us.

>
> > +   compatible = "brcm,iproc-ns2-sata-phy";
> > +   reg = <0x6>;
> > +   #address-cells = <1>;
> > +   #size-cells = <0>;
> > +   brcm,phy-internal;
> > +
> > +   sata_phy0: sata-phy@1 {
>
> phy@1
>

Same as above.

> > +   reg = <0x1>;
> > +   #phy-cells = <0>;
> > +   };
> > +
> > +   sata_phy1: sata-phy@2 {
>
> phy@2
>

Same as above.

> > +   reg = <0x2>;
> > +   #phy-cells = <0>;
> > +   };
> > +   };
> > +
> > +   eth-master@0 {
>
> mdio@0

Same as above.
>
> > +   compatible = "brcm,iproc-mdio-master-eth";
> > +   reg = <0x0>;
> > +   #address-cells = <1>;
> > +   #size-cells = <0>;
> > +   gphy0: eth-phy@10 {
>
> phy@10
>

Same as above.
> > +   reg = <0x10>;
> > +   phy-mode = "mii";
> > +   };
> > +   };
> > +};
> > --
> > 1.9.1
> >
> >
> > ___
> > linux-arm-kernel mailing list
> > 

RE: [PATCH 2/6] Documentation: DT binding doc for iProc Shared MDIO Controller.

2016-04-24 Thread Pramod Kumar
Hi Rob,

Thanks for reviewing and providing your valuable comments.

> -Original Message-
> From: Rob Herring [mailto:r...@kernel.org]
> Sent: 23 April 2016 01:41
> To: Pramod Kumar
> Cc: Catalin Marinas; Will Deacon; Masahiro Yamada; Chen-Yu Tsai; Mark
> Rutland; devicet...@vger.kernel.org; Pawel Moll; Arnd Bergmann; Suzuki K
> Poulose; net...@vger.kernel.org; Punit Agrawal;
linux-kernel@vger.kernel.org;
> BCM Kernel Feedback; linux-arm-ker...@lists.infradead.org
> Subject: Re: [PATCH 2/6] Documentation: DT binding doc for iProc Shared
MDIO
> Controller.
>
> On Thu, Apr 21, 2016 at 02:48:39PM +0530, Pramod Kumar wrote:
> > Add DT binding doc for iProc Shared MDIO Controller which populate all
> > masters to Shared MDIO framework.
> >
> > Signed-off-by: Pramod Kumar 
> > Reviewed-by: Ray Jui 
> > Reviewed-by: Scott Branden 
> > ---
> >  .../bindings/bus/brcm,iproc-shared-mdio.txt| 76
> ++
> >  1 file changed, 76 insertions(+)
> >  create mode 100644
> > Documentation/devicetree/bindings/bus/brcm,iproc-shared-mdio.txt
> >
> > diff --git
> > a/Documentation/devicetree/bindings/bus/brcm,iproc-shared-mdio.txt
> > b/Documentation/devicetree/bindings/bus/brcm,iproc-shared-mdio.txt
> > new file mode 100644
> > index 000..f455f3d
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/bus/brcm,iproc-shared-mdio.txt
> > @@ -0,0 +1,76 @@
> > +Broadcom iProc Shared MDIO Controller
> > +
> > +Required properties:
> > +- compatible:
> > +"brcm,iproc-shared-mdio" for shared MDIO controller.
> > +- reg: specifies the base physical address and size of the registers
> > +- reg-names: should be "mdio"
> > +- #address-cells: must be 1.
> > +- #size-cells: must be 0.
> > +
> > +optional:
> > +child nodes: Masters are represented as a child node of shared MDIO
> > +controller and all the PHYs handled by this master are represented as
its child
> node.
>
> Seems kind of useless if child nodes are optional.
>

I agree. I'll put it under required properties.

> > +
> > +Master nodes properties are defined as-
> > +
> > +Required properties:
> > +- compatible: Used to match driver of this PHY.
> > +- reg: MDIO master ID.
> > +- #address-cells: must be 1.
> > +- #size-cells: must be 0.
> > +
> > +optional:
> > +-brcm,phy-internal: if presents, PHYs are internal. Absence shows phy
> > +is external.
> > +-brcm,is-c45: if presents, Controller uses Clause-45 to issue MDIO
> transaction.
> > +else Controller uses Clause-22 for transactions.
>
> Isn't this a property of the phy? IIRC, there is a standard property or
compatible
> string for this.
>

Shared MDIO controller's master holds all above property to ensure the
proper MDIO transaction over its bus.  Hence tried to pinned here.
These properties are standard one for the Ethernet PHY but Broadcom SoCs
MDIO is shared even with other I/O subsystem. Other subsystem does not
defines this property hence has been defined here.

> > +
> > +PHY nodes are represented as the child node of Master. Child nodes
> > +properties are defined as-
> > +
> > +Required properties:
> > +-reg: phy id of attached PHY.
> > +
> > +optional:
> > +There could be additional properties required to configure the
> > +specific phy like phy-mode in case of gpphy node below. These should
> > +be defined here and used by respective drivers.
> > +
> > +Example:
> > +iproc_mdio: iproc_mdio@663f {
> > +   compatible = "brcm,iproc-shared-mdio";
> > +   reg = <0x6602023c 0x14>;
> > +   reg-names = "mdio";
> > +   #address-cells = <1>;
> > +   #size-cells = <0>;
> > +
> > +   sata-master@6 {
>
> mdio@6

Do you advise us to rename the node name as mdio@6? If yes, I don't have
any personal preferences here and would do it. I'd thought to give proper
name so that consumer could get what this node is representing.
Please suggest us.

>
> > +   compatible = "brcm,iproc-ns2-sata-phy";
> > +   reg = <0x6>;
> > +   #address-cells = <1>;
> > +   #size-cells = <0>;
> > +   brcm,phy-internal;
> > +
> > +   sata_phy0: sata-phy@1 {
>
> phy@1
>

Same as above.

> > +   reg = <0x1>;
> > +   #phy-cells = <0>;
> > +   };
> > +
> > +   sata_phy1: sata-phy@2 {
>
> phy@2
>

Same as above.

> > +   reg = <0x2>;
> > +   #phy-cells = <0>;
> > +   };
> > +   };
> > +
> > +   eth-master@0 {
>
> mdio@0

Same as above.
>
> > +   compatible = "brcm,iproc-mdio-master-eth";
> > +   reg = <0x0>;
> > +   #address-cells = <1>;
> > +   #size-cells = <0>;
> > +   gphy0: eth-phy@10 {
>
> phy@10
>

Same as above.
> > +   reg = <0x10>;
> > +   phy-mode = "mii";
> > +   };
> > +   };
> > +};
> > --
> > 1.9.1
> >
> >
> > ___
> > linux-arm-kernel mailing list
> > linux-arm-ker...@lists.infradead.org
> > 

Re: [RFC][PATCH 03/31] locking,arc: Implement atomic_fetch_{add,sub,and,andnot,or,xor}()

2016-04-24 Thread Vineet Gupta
On Friday 22 April 2016 07:46 PM, Peter Zijlstra wrote:
> On Fri, Apr 22, 2016 at 10:50:41AM +, Vineet Gupta wrote:
>
>>> > > +#define ATOMIC_FETCH_OP(op, c_op, asm_op)  
>>> > > \
>>> > > +static inline int atomic_fetch_##op(int i, atomic_t *v)
>>> > > \
>>> > > +{  
>>> > > \
>>> > > +   unsigned int val, result;   
>>> > > \
>>> > > +   SCOND_FAIL_RETRY_VAR_DEF
>>> > > \
>>> > > +   
>>> > > \
>>> > > +   /*  
>>> > > \
>>> > > +* Explicit full memory barrier needed before/after as  
>>> > > \
>>> > > +* LLOCK/SCOND thmeselves don't provide any such semantics  
>>> > > \
>>> > > +*/ 
>>> > > \
>>> > > +   smp_mb();   
>>> > > \
>>> > > +   
>>> > > \
>>> > > +   __asm__ __volatile__(   
>>> > > \
>>> > > +   "1: llock   %[val], [%[ctr]]\n" 
>>> > > \
>>> > > +   "   mov %[result], %[val]   \n" 
>>> > > \
>> > 
>> > Calling it result could be a bit confusing, this is meant to be the "orig" 
>> > value.
>> > So it indeed "result" of the API, but for atomic operation it is pristine 
>> > value.
>> > 
>> > Also we can optimize away that MOV - given there are plenty of regs, so
>> > 
>>> > > +   "   " #asm_op " %[val], %[val], %[i]\n" 
>>> > > \
>>> > > +   "   scond   %[val], [%[ctr]]\n" 
>>> > > \
>> > 
>> > Instead have
>> > 
>> > +  "   " #asm_op " %[result], %[val], %[i] \n" \
>> > +  "   scond   %[result], [%[ctr]] \n" \
>> > 
>> > 
> Indeed, how about something like so?
>
> ---
> Subject: locking,arc: Implement atomic_fetch_{add,sub,and,andnot,or,xor}()
> From: Peter Zijlstra 
> Date: Mon Apr 18 01:16:09 CEST 2016
>
> Implement FETCH-OP atomic primitives, these are very similar to the
> existing OP-RETURN primitives we already have, except they return the
> value of the atomic variable _before_ modification.
>
> This is especially useful for irreversible operations -- such as
> bitops (because it becomes impossible to reconstruct the state prior
> to modification).
>
> Signed-off-by: Peter Zijlstra (Intel) 

Acked-by: Vineet Gupta 



Re: [RFC][PATCH 03/31] locking,arc: Implement atomic_fetch_{add,sub,and,andnot,or,xor}()

2016-04-24 Thread Vineet Gupta
On Friday 22 April 2016 07:46 PM, Peter Zijlstra wrote:
> On Fri, Apr 22, 2016 at 10:50:41AM +, Vineet Gupta wrote:
>
>>> > > +#define ATOMIC_FETCH_OP(op, c_op, asm_op)  
>>> > > \
>>> > > +static inline int atomic_fetch_##op(int i, atomic_t *v)
>>> > > \
>>> > > +{  
>>> > > \
>>> > > +   unsigned int val, result;   
>>> > > \
>>> > > +   SCOND_FAIL_RETRY_VAR_DEF
>>> > > \
>>> > > +   
>>> > > \
>>> > > +   /*  
>>> > > \
>>> > > +* Explicit full memory barrier needed before/after as  
>>> > > \
>>> > > +* LLOCK/SCOND thmeselves don't provide any such semantics  
>>> > > \
>>> > > +*/ 
>>> > > \
>>> > > +   smp_mb();   
>>> > > \
>>> > > +   
>>> > > \
>>> > > +   __asm__ __volatile__(   
>>> > > \
>>> > > +   "1: llock   %[val], [%[ctr]]\n" 
>>> > > \
>>> > > +   "   mov %[result], %[val]   \n" 
>>> > > \
>> > 
>> > Calling it result could be a bit confusing, this is meant to be the "orig" 
>> > value.
>> > So it indeed "result" of the API, but for atomic operation it is pristine 
>> > value.
>> > 
>> > Also we can optimize away that MOV - given there are plenty of regs, so
>> > 
>>> > > +   "   " #asm_op " %[val], %[val], %[i]\n" 
>>> > > \
>>> > > +   "   scond   %[val], [%[ctr]]\n" 
>>> > > \
>> > 
>> > Instead have
>> > 
>> > +  "   " #asm_op " %[result], %[val], %[i] \n" \
>> > +  "   scond   %[result], [%[ctr]] \n" \
>> > 
>> > 
> Indeed, how about something like so?
>
> ---
> Subject: locking,arc: Implement atomic_fetch_{add,sub,and,andnot,or,xor}()
> From: Peter Zijlstra 
> Date: Mon Apr 18 01:16:09 CEST 2016
>
> Implement FETCH-OP atomic primitives, these are very similar to the
> existing OP-RETURN primitives we already have, except they return the
> value of the atomic variable _before_ modification.
>
> This is especially useful for irreversible operations -- such as
> bitops (because it becomes impossible to reconstruct the state prior
> to modification).
>
> Signed-off-by: Peter Zijlstra (Intel) 

Acked-by: Vineet Gupta 



Re: [PATCH 2/2 v6] arc: axs10x: Add DT bindings for I2S PLL Clock

2016-04-24 Thread Vineet Gupta
On Thursday 21 April 2016 10:49 PM, Jose Abreu wrote:
> Add device tree bindings for AXS10X I2S PLL Clock driver.
>
> Signed-off-by: Jose Abreu 

Lets worry about different firmware versions et all after basic patch is merged.
I presume this patch will be merged via clk tree ?

Acked-by: Vineet Gupta 



Re: [PATCH 2/2 v6] arc: axs10x: Add DT bindings for I2S PLL Clock

2016-04-24 Thread Vineet Gupta
On Thursday 21 April 2016 10:49 PM, Jose Abreu wrote:
> Add device tree bindings for AXS10X I2S PLL Clock driver.
>
> Signed-off-by: Jose Abreu 

Lets worry about different firmware versions et all after basic patch is merged.
I presume this patch will be merged via clk tree ?

Acked-by: Vineet Gupta 



Re: [PATCH] cpufreq: governor: Fix prev_load initialization in cpufreq_governor_start()

2016-04-24 Thread Viresh Kumar
On 25-04-16, 03:07, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki 
> 
> The way cpufreq_governor_start() initializes j_cdbs->prev_load is
> questionable.
> 
> First off, j_cdbs->prev_cpu_wall used as a denominator in the
> computation may be zero.  The case this happens is when
> get_cpu_idle_time_us() returns -1 and get_cpu_idle_time_jiffy()
> used to return that number is called exactly at the jiffies_64
> wrap time.  It is rather hard to trigger that error, but it is not
> impossible and it will just crash the kernel then.
> 
> Second, j_cdbs->prev_load is computed as the average load during
> the entire time since the system started and it may not reflect the
> load in the previous sampling period (as it is expected to).
> That doesn't play well with the way dbs_update() uses that value.
> Namely, if the update time delta (wall_time) happens do be greater
> than twice the sampling rate on the first invocation of it, the
> initial value of j_cdbs->prev_load (which may be completely off) will
> be returned to the caller as the current load (unless it is equal to
> zero and unless another CPU sharing the same policy object has a
> greater load value).
> 
> For this reason, notice that the prev_load field of struct cpu_dbs_info
> is only used by dbs_update() and only in that one place, so if
> cpufreq_governor_start() is modified to always initialize it to 0,
> it will make dbs_update() always compute the actual load first time
> it checks the update time delta against the doubled sampling rate
> (after initialization) and there won't be any side effects of it.
> 
> Consequently, modify cpufreq_governor_start() as described.
> 
> Signed-off-by: Rafael J. Wysocki 
> ---
>  drivers/cpufreq/cpufreq_governor.c |8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> Index: linux-pm/drivers/cpufreq/cpufreq_governor.c
> ===
> --- linux-pm.orig/drivers/cpufreq/cpufreq_governor.c
> +++ linux-pm/drivers/cpufreq/cpufreq_governor.c
> @@ -508,12 +508,12 @@ static int cpufreq_governor_start(struct
>  
>   for_each_cpu(j, policy->cpus) {
>   struct cpu_dbs_info *j_cdbs = _cpu(cpu_dbs, j);
> - unsigned int prev_load;
>  
>   j_cdbs->prev_cpu_idle = get_cpu_idle_time(j, 
> _cdbs->prev_cpu_wall, io_busy);
> -
> - prev_load = j_cdbs->prev_cpu_wall - j_cdbs->prev_cpu_idle;
> - j_cdbs->prev_load = 100 * prev_load / (unsigned 
> int)j_cdbs->prev_cpu_wall;
> + /*
> +  * Make the first invocation of dbs_update() compute the load.
> +  */
> + j_cdbs->prev_load = 0;
>  
>   if (ignore_nice)
>   j_cdbs->prev_cpu_nice = 
> kcpustat_cpu(j).cpustat[CPUTIME_NICE];

I tried to understand why the

commit 18b46abd0009 ("cpufreq: governor: Be friendly towards
latency-sensitive bursty workloads")

modify the START section and added this stuff and I completely failed
to understand it now. Do you remember why was this added at all ?

-- 
viresh


Re: [PATCH] cpufreq: governor: Fix prev_load initialization in cpufreq_governor_start()

2016-04-24 Thread Viresh Kumar
On 25-04-16, 03:07, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki 
> 
> The way cpufreq_governor_start() initializes j_cdbs->prev_load is
> questionable.
> 
> First off, j_cdbs->prev_cpu_wall used as a denominator in the
> computation may be zero.  The case this happens is when
> get_cpu_idle_time_us() returns -1 and get_cpu_idle_time_jiffy()
> used to return that number is called exactly at the jiffies_64
> wrap time.  It is rather hard to trigger that error, but it is not
> impossible and it will just crash the kernel then.
> 
> Second, j_cdbs->prev_load is computed as the average load during
> the entire time since the system started and it may not reflect the
> load in the previous sampling period (as it is expected to).
> That doesn't play well with the way dbs_update() uses that value.
> Namely, if the update time delta (wall_time) happens do be greater
> than twice the sampling rate on the first invocation of it, the
> initial value of j_cdbs->prev_load (which may be completely off) will
> be returned to the caller as the current load (unless it is equal to
> zero and unless another CPU sharing the same policy object has a
> greater load value).
> 
> For this reason, notice that the prev_load field of struct cpu_dbs_info
> is only used by dbs_update() and only in that one place, so if
> cpufreq_governor_start() is modified to always initialize it to 0,
> it will make dbs_update() always compute the actual load first time
> it checks the update time delta against the doubled sampling rate
> (after initialization) and there won't be any side effects of it.
> 
> Consequently, modify cpufreq_governor_start() as described.
> 
> Signed-off-by: Rafael J. Wysocki 
> ---
>  drivers/cpufreq/cpufreq_governor.c |8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> Index: linux-pm/drivers/cpufreq/cpufreq_governor.c
> ===
> --- linux-pm.orig/drivers/cpufreq/cpufreq_governor.c
> +++ linux-pm/drivers/cpufreq/cpufreq_governor.c
> @@ -508,12 +508,12 @@ static int cpufreq_governor_start(struct
>  
>   for_each_cpu(j, policy->cpus) {
>   struct cpu_dbs_info *j_cdbs = _cpu(cpu_dbs, j);
> - unsigned int prev_load;
>  
>   j_cdbs->prev_cpu_idle = get_cpu_idle_time(j, 
> _cdbs->prev_cpu_wall, io_busy);
> -
> - prev_load = j_cdbs->prev_cpu_wall - j_cdbs->prev_cpu_idle;
> - j_cdbs->prev_load = 100 * prev_load / (unsigned 
> int)j_cdbs->prev_cpu_wall;
> + /*
> +  * Make the first invocation of dbs_update() compute the load.
> +  */
> + j_cdbs->prev_load = 0;
>  
>   if (ignore_nice)
>   j_cdbs->prev_cpu_nice = 
> kcpustat_cpu(j).cpustat[CPUTIME_NICE];

I tried to understand why the

commit 18b46abd0009 ("cpufreq: governor: Be friendly towards
latency-sensitive bursty workloads")

modify the START section and added this stuff and I completely failed
to understand it now. Do you remember why was this added at all ?

-- 
viresh


RE: [PATCH 1/6] bus: Add shared MDIO bus framework

2016-04-24 Thread Pramod Kumar
Hi David,

Thanks for providing input over the patch.  Will address the comment as
described below.

> -Original Message-
> From: David Miller [mailto:da...@davemloft.net]
> Sent: 24 April 2016 23:48
> To: pramod.ku...@broadcom.com
> Cc: robh...@kernel.org; catalin.mari...@arm.com; will.dea...@arm.com;
> yamada.masah...@socionext.com; w...@csie.org; bcm-kernel-feedback-
> l...@broadcom.com; pawel.m...@arm.com; mark.rutl...@arm.com;
> a...@arndb.de; suzuki.poul...@arm.com; punit.agra...@arm.com;
> devicet...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; linux-
> ker...@vger.kernel.org; net...@vger.kernel.org; anup.pa...@broadcom.com
> Subject: Re: [PATCH 1/6] bus: Add shared MDIO bus framework
>
> From: Pramod Kumar 
> Date: Thu, 21 Apr 2016 14:48:38 +0530
>
> > +struct shared_mdio_master *shared_mdio_alloc_master(struct device
> *parent,
> > +   struct device_node
*node)
> > +{
> > +   int ret = 0;
> > +   struct shared_mdio_master *master;
>
> Always order local variable declarations in reverse christmas tree
(longest to
> shortest line) order.
>

Sure. Next patch will address this.

> > +static int shared_mdio_driver_probe(struct device *dev) {
> > +   int rc;
> > +   struct shared_mdio_master *master = to_shared_mdio_master(dev);
> > +   struct shared_mdio_driver *drv =
to_shared_mdio_driver(dev->driver);
>
> Likewise.

Sure.

> Please audit your entire submission for this issue.

Sure. I'll audit the entire patch set for above issue.

Regards,
Pramod


RE: [PATCH 1/6] bus: Add shared MDIO bus framework

2016-04-24 Thread Pramod Kumar
Hi David,

Thanks for providing input over the patch.  Will address the comment as
described below.

> -Original Message-
> From: David Miller [mailto:da...@davemloft.net]
> Sent: 24 April 2016 23:48
> To: pramod.ku...@broadcom.com
> Cc: robh...@kernel.org; catalin.mari...@arm.com; will.dea...@arm.com;
> yamada.masah...@socionext.com; w...@csie.org; bcm-kernel-feedback-
> l...@broadcom.com; pawel.m...@arm.com; mark.rutl...@arm.com;
> a...@arndb.de; suzuki.poul...@arm.com; punit.agra...@arm.com;
> devicet...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; linux-
> ker...@vger.kernel.org; net...@vger.kernel.org; anup.pa...@broadcom.com
> Subject: Re: [PATCH 1/6] bus: Add shared MDIO bus framework
>
> From: Pramod Kumar 
> Date: Thu, 21 Apr 2016 14:48:38 +0530
>
> > +struct shared_mdio_master *shared_mdio_alloc_master(struct device
> *parent,
> > +   struct device_node
*node)
> > +{
> > +   int ret = 0;
> > +   struct shared_mdio_master *master;
>
> Always order local variable declarations in reverse christmas tree
(longest to
> shortest line) order.
>

Sure. Next patch will address this.

> > +static int shared_mdio_driver_probe(struct device *dev) {
> > +   int rc;
> > +   struct shared_mdio_master *master = to_shared_mdio_master(dev);
> > +   struct shared_mdio_driver *drv =
to_shared_mdio_driver(dev->driver);
>
> Likewise.

Sure.

> Please audit your entire submission for this issue.

Sure. I'll audit the entire patch set for above issue.

Regards,
Pramod


[PATCH] arm64: fix /proc/cpuinfo for elf32

2016-04-24 Thread Zeng Tao
For elf32 thread, personality is used for arm32,
and thread_flag for arm64.

Here personality is used for arm64, so fix it.

Signed-off-by: Zeng Tao 
---
 arch/arm64/kernel/cpuinfo.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index 84c8684..f739398 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -126,7 +126,7 @@ static int c_show(struct seq_file *m, void *v)
 * software which does already (at least for 32-bit).
 */
seq_puts(m, "Features\t:");
-   if (personality(current->personality) == PER_LINUX32) {
+   if (test_thread_flag(TIF_32BIT)) {
 #ifdef CONFIG_COMPAT
for (j = 0; compat_hwcap_str[j]; j++)
if (compat_elf_hwcap & (1 << j))
-- 
1.9.1



[PATCH] arm64: fix /proc/cpuinfo for elf32

2016-04-24 Thread Zeng Tao
For elf32 thread, personality is used for arm32,
and thread_flag for arm64.

Here personality is used for arm64, so fix it.

Signed-off-by: Zeng Tao 
---
 arch/arm64/kernel/cpuinfo.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index 84c8684..f739398 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -126,7 +126,7 @@ static int c_show(struct seq_file *m, void *v)
 * software which does already (at least for 32-bit).
 */
seq_puts(m, "Features\t:");
-   if (personality(current->personality) == PER_LINUX32) {
+   if (test_thread_flag(TIF_32BIT)) {
 #ifdef CONFIG_COMPAT
for (j = 0; compat_hwcap_str[j]; j++)
if (compat_elf_hwcap & (1 << j))
-- 
1.9.1



Re: [PATCH net-next] hv_netvsc: Fix the list processing for network change event

2016-04-24 Thread David Miller
From: Haiyang Zhang 
Date: Thu, 21 Apr 2016 16:13:01 -0700

> RNDIS_STATUS_NETWORK_CHANGE event is handled as two "half events" --
> media disconnect & connect. The second half should be added to the list
> head, not to the tail. So all events are processed in normal order.
> 
> Signed-off-by: Haiyang Zhang 
> Reviewed-by: K. Y. Srinivasan 

Applied, thanks.


Re: [PATCH net-next] hv_netvsc: Fix the list processing for network change event

2016-04-24 Thread David Miller
From: Haiyang Zhang 
Date: Thu, 21 Apr 2016 16:13:01 -0700

> RNDIS_STATUS_NETWORK_CHANGE event is handled as two "half events" --
> media disconnect & connect. The second half should be added to the list
> head, not to the tail. So all events are processed in normal order.
> 
> Signed-off-by: Haiyang Zhang 
> Reviewed-by: K. Y. Srinivasan 

Applied, thanks.


Re: [PATCH net] ipv4/fib: don't warn when primary address is missing if in_dev is dead

2016-04-24 Thread David Miller
From: Paolo Abeni 
Date: Thu, 21 Apr 2016 22:23:31 +0200

> After commit fbd40ea0180a ("ipv4: Don't do expensive useless work
> during inetdev destroy.") when deleting an interface,
> fib_del_ifaddr() can be executed without any primary address
> present on the dead interface.
> 
> The above is safe, but triggers some "bug: prim == NULL" warnings.
> 
> This commit avoids warning if the in_dev is dead
> 
> Signed-off-by: Paolo Abeni 

Applied, thank you.


Re: [PATCH net] ipv4/fib: don't warn when primary address is missing if in_dev is dead

2016-04-24 Thread David Miller
From: Paolo Abeni 
Date: Thu, 21 Apr 2016 22:23:31 +0200

> After commit fbd40ea0180a ("ipv4: Don't do expensive useless work
> during inetdev destroy.") when deleting an interface,
> fib_del_ifaddr() can be executed without any primary address
> present on the dead interface.
> 
> The above is safe, but triggers some "bug: prim == NULL" warnings.
> 
> This commit avoids warning if the in_dev is dead
> 
> Signed-off-by: Paolo Abeni 

Applied, thank you.


Re: [RFC][PATCHSET] reduce messing with iovecs in cifs

2016-04-24 Thread Steve French
Reviewed-by: Steve French 

Let me know if you want any of them to go in via the cifs tree or
prefer going in through your tree (other than patch 1 which could go
in the net-next tree are you indicated)

On Sat, Apr 9, 2016 at 3:43 PM, Al Viro  wrote:
> Now that sendmsg/recvmsg do not mangle iovecs and are capable of
> handling bvec-based ->msg_iter, we can seriously reduce the amount of PITA
> in cifs.  The series below is completely untested, and I would appreciate
> comments/review/testing/etc.
>
> I'll post the individual patches in followups; for those who prefer to use
> git it can be found in
> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git sendmsg.cifs
>
> 1/6: [net] drop 'size' argument of sock_recvmsg()
> should go via net-next; does what it says.
>
> 2/6: cifs: merge the hash calculation helpers
> takes the common parts of {cifs,smb2,smb3}_calc_signature() into a
> common helper.
>
> 3/6: cifs: quit playing games with draining iovecs
> Switch smb_send_kvec() to passing msghdr (and thus iov_iter) and
> make it use sock_sendmsg() - that allows to avoid draining iovecs, since
> ->msg_iter will be advanced properly and all we need is to keep it around
> between the calls of sock_sendmsg(), rather than reinitializing it on each
> loop iteration.  The same thing allows to get rid of messing with kmap()
> when sending the stuff in ->rq_pages[] - ITER_BVEC will do the right thing.
>
> 4/6: cifs: no need to wank with copying and advancing iovec on recvmsg side 
> either
> Similar to the previous - use sock_recvmsg() in 
> cifs_readv_from_socket()
> and there's no need to modify iovecs, or allocate a copy especially for
> such modifications, etc.
>
> 5/6: cifs_readv_receive: use cifs_read_from_socket()
> building a 1-element iovec array for cifs_readv_from_socket() is
> an overkill - simple cifs_read_from_socket() will do just fine.
>
> 6/6: cifs: don't bother with kmap on read_pages side
> Similar to the other half of 3/6: we can use ITER_BVEC for
> read-into-page case.  Just make cifs_readv_from_socket() take msghdr from
> caller and use a helper that would feed it a bvec-backed ->msg_iter.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Thanks,

Steve


Re: [RFC][PATCHSET] reduce messing with iovecs in cifs

2016-04-24 Thread Steve French
Reviewed-by: Steve French 

Let me know if you want any of them to go in via the cifs tree or
prefer going in through your tree (other than patch 1 which could go
in the net-next tree are you indicated)

On Sat, Apr 9, 2016 at 3:43 PM, Al Viro  wrote:
> Now that sendmsg/recvmsg do not mangle iovecs and are capable of
> handling bvec-based ->msg_iter, we can seriously reduce the amount of PITA
> in cifs.  The series below is completely untested, and I would appreciate
> comments/review/testing/etc.
>
> I'll post the individual patches in followups; for those who prefer to use
> git it can be found in
> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git sendmsg.cifs
>
> 1/6: [net] drop 'size' argument of sock_recvmsg()
> should go via net-next; does what it says.
>
> 2/6: cifs: merge the hash calculation helpers
> takes the common parts of {cifs,smb2,smb3}_calc_signature() into a
> common helper.
>
> 3/6: cifs: quit playing games with draining iovecs
> Switch smb_send_kvec() to passing msghdr (and thus iov_iter) and
> make it use sock_sendmsg() - that allows to avoid draining iovecs, since
> ->msg_iter will be advanced properly and all we need is to keep it around
> between the calls of sock_sendmsg(), rather than reinitializing it on each
> loop iteration.  The same thing allows to get rid of messing with kmap()
> when sending the stuff in ->rq_pages[] - ITER_BVEC will do the right thing.
>
> 4/6: cifs: no need to wank with copying and advancing iovec on recvmsg side 
> either
> Similar to the previous - use sock_recvmsg() in 
> cifs_readv_from_socket()
> and there's no need to modify iovecs, or allocate a copy especially for
> such modifications, etc.
>
> 5/6: cifs_readv_receive: use cifs_read_from_socket()
> building a 1-element iovec array for cifs_readv_from_socket() is
> an overkill - simple cifs_read_from_socket() will do just fine.
>
> 6/6: cifs: don't bother with kmap on read_pages side
> Similar to the other half of 3/6: we can use ITER_BVEC for
> read-into-page case.  Just make cifs_readv_from_socket() take msghdr from
> caller and use a helper that would feed it a bvec-backed ->msg_iter.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Thanks,

Steve


[PATCH] Use MICRO UINT_MAX instead of actual value

2016-04-24 Thread Minfei Huang
It's more elegant to use MICRO UINT_MAX to represent the max value of
type unsigned int. So replace the actual value by using this MICRO.

Signed-off-by: Minfei Huang 
---
 drivers/nvme/host/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 643f457..2c0bb13 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -597,7 +597,7 @@ static void nvme_config_discard(struct nvme_ns *ns)
 
ns->queue->limits.discard_alignment = logical_block_size;
ns->queue->limits.discard_granularity = logical_block_size;
-   blk_queue_max_discard_sectors(ns->queue, 0x);
+   blk_queue_max_discard_sectors(ns->queue, UINT_MAX);
queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, ns->queue);
 }
 
-- 
2.6.3



[PATCH] Use MICRO UINT_MAX instead of actual value

2016-04-24 Thread Minfei Huang
It's more elegant to use MICRO UINT_MAX to represent the max value of
type unsigned int. So replace the actual value by using this MICRO.

Signed-off-by: Minfei Huang 
---
 drivers/nvme/host/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 643f457..2c0bb13 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -597,7 +597,7 @@ static void nvme_config_discard(struct nvme_ns *ns)
 
ns->queue->limits.discard_alignment = logical_block_size;
ns->queue->limits.discard_granularity = logical_block_size;
-   blk_queue_max_discard_sectors(ns->queue, 0x);
+   blk_queue_max_discard_sectors(ns->queue, UINT_MAX);
queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, ns->queue);
 }
 
-- 
2.6.3



RE: [PATCH kernel 1/2] mm: add the related functions to build the free page bitmap

2016-04-24 Thread Li, Liang Z
> On Fri, Apr 22, 2016 at 10:48:38AM +0100, Dr. David Alan Gilbert wrote:
> > * Michael S. Tsirkin (m...@redhat.com) wrote:
> > > On Tue, Apr 19, 2016 at 03:02:09PM +, Li, Liang Z wrote:
> > > > > On Tue, 2016-04-19 at 22:34 +0800, Liang Li wrote:
> > > > > > The free page bitmap will be sent to QEMU through virtio
> > > > > > interface and used for live migration optimization.
> > > > > > Drop the cache before building the free page bitmap can get
> > > > > > more free pages. Whether dropping the cache is decided by user.
> > > > > >
> > > > >
> > > > > How do you prevent the guest from using those recently-freed
> > > > > pages for something else, between when you build the bitmap and
> > > > > the live migration completes?
> > > >
> > > > Because the dirty page logging is enabled before building the
> > > > bitmap, there is no need to prevent the guest from using the recently-
> freed pages ...
> > > >
> > > > Liang
> > >
> > > Well one point of telling host that page is free is so that it can
> > > mark it clean even if it was dirty previously.
> > > So I think you must pass the pages to guest under the lock.
> > > This will allow host optimizations such as marking these pages
> > > MADV_DONTNEED or MADV_FREE.
> > > Otherwise it's all too tied up to a specific usecase - you aren't
> > > telling host that a page is free, you are telling it that a page was
> > > free in the past.
> >
> > But doing it under lock sounds pretty expensive, especially given how
> > long the userspace side is going to take to work through the bitmap
> > and device what to do.
> >
> > Dave
> 
> We need to make it as fast as we can since the VCPU is stopped on exit
> anyway. This just means e.g. sizing the bitmap reasonably - don't always try
> to fit all memory in a single bitmap.

Then we should pause the whole VM when using the bitmap, too expensive?

> Really, if the page can in fact be in use when you tell host it's free, then 
> it's
> rather hard to explain what does it mean from host/guest interface point of
> view.
> 

How about rename the interface to a more appropriate name other than 'free 
page' ?

Liang.
> It probably can be defined but the interface seems very complex.
> 
> Let's start with a simple thing instead unless it can be shown that there's a
> performance problem.
> 
> 
> > >
> > > --
> > > MST
> > --
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK


RE: [PATCH kernel 1/2] mm: add the related functions to build the free page bitmap

2016-04-24 Thread Li, Liang Z
> On Fri, Apr 22, 2016 at 10:48:38AM +0100, Dr. David Alan Gilbert wrote:
> > * Michael S. Tsirkin (m...@redhat.com) wrote:
> > > On Tue, Apr 19, 2016 at 03:02:09PM +, Li, Liang Z wrote:
> > > > > On Tue, 2016-04-19 at 22:34 +0800, Liang Li wrote:
> > > > > > The free page bitmap will be sent to QEMU through virtio
> > > > > > interface and used for live migration optimization.
> > > > > > Drop the cache before building the free page bitmap can get
> > > > > > more free pages. Whether dropping the cache is decided by user.
> > > > > >
> > > > >
> > > > > How do you prevent the guest from using those recently-freed
> > > > > pages for something else, between when you build the bitmap and
> > > > > the live migration completes?
> > > >
> > > > Because the dirty page logging is enabled before building the
> > > > bitmap, there is no need to prevent the guest from using the recently-
> freed pages ...
> > > >
> > > > Liang
> > >
> > > Well one point of telling host that page is free is so that it can
> > > mark it clean even if it was dirty previously.
> > > So I think you must pass the pages to guest under the lock.
> > > This will allow host optimizations such as marking these pages
> > > MADV_DONTNEED or MADV_FREE.
> > > Otherwise it's all too tied up to a specific usecase - you aren't
> > > telling host that a page is free, you are telling it that a page was
> > > free in the past.
> >
> > But doing it under lock sounds pretty expensive, especially given how
> > long the userspace side is going to take to work through the bitmap
> > and device what to do.
> >
> > Dave
> 
> We need to make it as fast as we can since the VCPU is stopped on exit
> anyway. This just means e.g. sizing the bitmap reasonably - don't always try
> to fit all memory in a single bitmap.

Then we should pause the whole VM when using the bitmap, too expensive?

> Really, if the page can in fact be in use when you tell host it's free, then 
> it's
> rather hard to explain what does it mean from host/guest interface point of
> view.
> 

How about rename the interface to a more appropriate name other than 'free 
page' ?

Liang.
> It probably can be defined but the interface seems very complex.
> 
> Let's start with a simple thing instead unless it can be shown that there's a
> performance problem.
> 
> 
> > >
> > > --
> > > MST
> > --
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK


Re: [PATCH 26/41] Documentation: kasan: fix spelling mistake

2016-04-24 Thread Randy Dunlap
On 04/24/16 17:24, Eric Engestrom wrote:
> Signed-off-by: Eric Engestrom 
> ---
>  Documentation/kasan.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/Documentation/kasan.txt b/Documentation/kasan.txt
> index 7dd95b3..9c696e4 100644
> --- a/Documentation/kasan.txt
> +++ b/Documentation/kasan.txt
> @@ -116,7 +116,7 @@ Memory state around the buggy address:
>   8800693bc800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  ==
>  
> -The header of the report discribe what kind of bug happened and what kind of
> +The header of the report describe what kind of bug happened and what kind of

describes

>  access caused it. It's followed by the description of the accessed slub 
> object
>  (see 'SLUB Debug output' section in Documentation/vm/slub.txt for details) 
> and
>  the description of the accessed memory page.
> 


-- 
~Randy


Re: [PATCH 26/41] Documentation: kasan: fix spelling mistake

2016-04-24 Thread Randy Dunlap
On 04/24/16 17:24, Eric Engestrom wrote:
> Signed-off-by: Eric Engestrom 
> ---
>  Documentation/kasan.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/Documentation/kasan.txt b/Documentation/kasan.txt
> index 7dd95b3..9c696e4 100644
> --- a/Documentation/kasan.txt
> +++ b/Documentation/kasan.txt
> @@ -116,7 +116,7 @@ Memory state around the buggy address:
>   8800693bc800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  ==
>  
> -The header of the report discribe what kind of bug happened and what kind of
> +The header of the report describe what kind of bug happened and what kind of

describes

>  access caused it. It's followed by the description of the accessed slub 
> object
>  (see 'SLUB Debug output' section in Documentation/vm/slub.txt for details) 
> and
>  the description of the accessed memory page.
> 


-- 
~Randy


[PATCH 4/4] thermal: bang-bang governor: act on lower trip boundary

2016-04-24 Thread Caesar Wang
From: Sascha Hauer 

With interrupt driven thermal zones we pass the lower and upper
temperature on which shall be acted, so in the governor we have to act on
the exact lower temperature to be consistent. Otherwise an interrupt maybe
generated on the exact lower temperature, but the bang bang governor does
not react.

Signed-off-by: Sascha Hauer 
Signed-off-by: Caesar Wang 
Cc: Zhang Rui 
Cc: Eduardo Valentin 
Cc: linux...@vger.kernel.org

---

 drivers/thermal/gov_bang_bang.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/thermal/gov_bang_bang.c b/drivers/thermal/gov_bang_bang.c
index 70836c5..9d1dfea 100644
--- a/drivers/thermal/gov_bang_bang.c
+++ b/drivers/thermal/gov_bang_bang.c
@@ -59,7 +59,7 @@ static void thermal_zone_trip_update(struct 
thermal_zone_device *tz, int trip)
if (instance->target == 0 && tz->temperature >= trip_temp)
instance->target = 1;
else if (instance->target == 1 &&
-   tz->temperature < trip_temp - trip_hyst)
+   tz->temperature <= trip_temp - trip_hyst)
instance->target = 0;
 
dev_dbg(>cdev->device, "target=%d\n",
-- 
1.9.1



[PATCH 4/4] thermal: bang-bang governor: act on lower trip boundary

2016-04-24 Thread Caesar Wang
From: Sascha Hauer 

With interrupt driven thermal zones we pass the lower and upper
temperature on which shall be acted, so in the governor we have to act on
the exact lower temperature to be consistent. Otherwise an interrupt maybe
generated on the exact lower temperature, but the bang bang governor does
not react.

Signed-off-by: Sascha Hauer 
Signed-off-by: Caesar Wang 
Cc: Zhang Rui 
Cc: Eduardo Valentin 
Cc: linux...@vger.kernel.org

---

 drivers/thermal/gov_bang_bang.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/thermal/gov_bang_bang.c b/drivers/thermal/gov_bang_bang.c
index 70836c5..9d1dfea 100644
--- a/drivers/thermal/gov_bang_bang.c
+++ b/drivers/thermal/gov_bang_bang.c
@@ -59,7 +59,7 @@ static void thermal_zone_trip_update(struct 
thermal_zone_device *tz, int trip)
if (instance->target == 0 && tz->temperature >= trip_temp)
instance->target = 1;
else if (instance->target == 1 &&
-   tz->temperature < trip_temp - trip_hyst)
+   tz->temperature <= trip_temp - trip_hyst)
instance->target = 0;
 
dev_dbg(>cdev->device, "target=%d\n",
-- 
1.9.1



[PATCH 3/4] thermal: streamline get_trend callbacks

2016-04-24 Thread Caesar Wang
From: Sascha Hauer 

The .get_trend callback in struct thermal_zone_device_ops has
the prototype:
int (*get_trend) (struct thermal_zone_device *, int,
  enum thermal_trend *);
whereas the .get_trend callback in struct thermal_zone_of_device_ops
has:
int (*get_trend)(void *, long *);

Streamline both prototypes and add the trip argument to the OF callback
aswell and use enum thermal_trend * instead of an integer pointer.

While the OF prototype may be the better one, this should be decided at
framework level and not on OF level.

Signed-off-by: Sascha Hauer 
Signed-off-by: Caesar Wang 
Cc: Zhang Rui 
Cc: Eduardo Valentin 
Cc: linux...@vger.kernel.org
---

 drivers/thermal/of-thermal.c   | 11 +-
 drivers/thermal/ti-soc-thermal/ti-thermal-common.c | 25 +++---
 include/linux/thermal.h|  2 +-
 3 files changed, 10 insertions(+), 28 deletions(-)

diff --git a/drivers/thermal/of-thermal.c b/drivers/thermal/of-thermal.c
index 8722e63..13833d9 100644
--- a/drivers/thermal/of-thermal.c
+++ b/drivers/thermal/of-thermal.c
@@ -202,24 +202,15 @@ static int of_thermal_get_trend(struct 
thermal_zone_device *tz, int trip,
enum thermal_trend *trend)
 {
struct __thermal_zone *data = tz->devdata;
-   long dev_trend;
int r;
 
if (!data->ops->get_trend)
return -EINVAL;
 
-   r = data->ops->get_trend(data->sensor_data, _trend);
+   r = data->ops->get_trend(data->sensor_data, trip, trend);
if (r)
return r;
 
-   /* TODO: These intervals might have some thresholds, but in core code */
-   if (dev_trend > 0)
-   *trend = THERMAL_TREND_RAISING;
-   else if (dev_trend < 0)
-   *trend = THERMAL_TREND_DROPPING;
-   else
-   *trend = THERMAL_TREND_STABLE;
-
return 0;
 }
 
diff --git a/drivers/thermal/ti-soc-thermal/ti-thermal-common.c 
b/drivers/thermal/ti-soc-thermal/ti-thermal-common.c
index 15c0a9a..4a6757c 100644
--- a/drivers/thermal/ti-soc-thermal/ti-thermal-common.c
+++ b/drivers/thermal/ti-soc-thermal/ti-thermal-common.c
@@ -239,7 +239,7 @@ static int ti_thermal_get_trip_temp(struct 
thermal_zone_device *thermal,
return 0;
 }
 
-static int __ti_thermal_get_trend(void *p, long *trend)
+static int __ti_thermal_get_trend(void *p, int trip, enum thermal_trend *trend)
 {
struct ti_thermal_data *data = p;
struct ti_bandgap *bgp;
@@ -252,22 +252,6 @@ static int __ti_thermal_get_trend(void *p, long *trend)
if (ret)
return ret;
 
-   *trend = tr;
-
-   return 0;
-}
-
-/* Get the temperature trend callback functions for thermal zone */
-static int ti_thermal_get_trend(struct thermal_zone_device *thermal,
-   int trip, enum thermal_trend *trend)
-{
-   int ret;
-   long tr;
-
-   ret = __ti_thermal_get_trend(thermal->devdata, );
-   if (ret)
-   return ret;
-
if (tr > 0)
*trend = THERMAL_TREND_RAISING;
else if (tr < 0)
@@ -278,6 +262,13 @@ static int ti_thermal_get_trend(struct thermal_zone_device 
*thermal,
return 0;
 }
 
+/* Get the temperature trend callback functions for thermal zone */
+static int ti_thermal_get_trend(struct thermal_zone_device *thermal,
+   int trip, enum thermal_trend *trend)
+{
+   return __ti_thermal_get_trend(thermal->devdata, trip, trend);
+}
+
 /* Get critical temperature callback functions for thermal zone */
 static int ti_thermal_get_crit_temp(struct thermal_zone_device *thermal,
int *temp)
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index cb64866..3b96961 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -344,7 +344,7 @@ struct thermal_genl_event {
  */
 struct thermal_zone_of_device_ops {
int (*get_temp)(void *, int *);
-   int (*get_trend)(void *, long *);
+   int (*get_trend)(void *, int, enum thermal_trend *);
int (*set_trips)(void *, int, int);
int (*set_emul_temp)(void *, int);
int (*set_trip_temp)(void *, int, int);
-- 
1.9.1



[PATCH 3/4] thermal: streamline get_trend callbacks

2016-04-24 Thread Caesar Wang
From: Sascha Hauer 

The .get_trend callback in struct thermal_zone_device_ops has
the prototype:
int (*get_trend) (struct thermal_zone_device *, int,
  enum thermal_trend *);
whereas the .get_trend callback in struct thermal_zone_of_device_ops
has:
int (*get_trend)(void *, long *);

Streamline both prototypes and add the trip argument to the OF callback
aswell and use enum thermal_trend * instead of an integer pointer.

While the OF prototype may be the better one, this should be decided at
framework level and not on OF level.

Signed-off-by: Sascha Hauer 
Signed-off-by: Caesar Wang 
Cc: Zhang Rui 
Cc: Eduardo Valentin 
Cc: linux...@vger.kernel.org
---

 drivers/thermal/of-thermal.c   | 11 +-
 drivers/thermal/ti-soc-thermal/ti-thermal-common.c | 25 +++---
 include/linux/thermal.h|  2 +-
 3 files changed, 10 insertions(+), 28 deletions(-)

diff --git a/drivers/thermal/of-thermal.c b/drivers/thermal/of-thermal.c
index 8722e63..13833d9 100644
--- a/drivers/thermal/of-thermal.c
+++ b/drivers/thermal/of-thermal.c
@@ -202,24 +202,15 @@ static int of_thermal_get_trend(struct 
thermal_zone_device *tz, int trip,
enum thermal_trend *trend)
 {
struct __thermal_zone *data = tz->devdata;
-   long dev_trend;
int r;
 
if (!data->ops->get_trend)
return -EINVAL;
 
-   r = data->ops->get_trend(data->sensor_data, _trend);
+   r = data->ops->get_trend(data->sensor_data, trip, trend);
if (r)
return r;
 
-   /* TODO: These intervals might have some thresholds, but in core code */
-   if (dev_trend > 0)
-   *trend = THERMAL_TREND_RAISING;
-   else if (dev_trend < 0)
-   *trend = THERMAL_TREND_DROPPING;
-   else
-   *trend = THERMAL_TREND_STABLE;
-
return 0;
 }
 
diff --git a/drivers/thermal/ti-soc-thermal/ti-thermal-common.c 
b/drivers/thermal/ti-soc-thermal/ti-thermal-common.c
index 15c0a9a..4a6757c 100644
--- a/drivers/thermal/ti-soc-thermal/ti-thermal-common.c
+++ b/drivers/thermal/ti-soc-thermal/ti-thermal-common.c
@@ -239,7 +239,7 @@ static int ti_thermal_get_trip_temp(struct 
thermal_zone_device *thermal,
return 0;
 }
 
-static int __ti_thermal_get_trend(void *p, long *trend)
+static int __ti_thermal_get_trend(void *p, int trip, enum thermal_trend *trend)
 {
struct ti_thermal_data *data = p;
struct ti_bandgap *bgp;
@@ -252,22 +252,6 @@ static int __ti_thermal_get_trend(void *p, long *trend)
if (ret)
return ret;
 
-   *trend = tr;
-
-   return 0;
-}
-
-/* Get the temperature trend callback functions for thermal zone */
-static int ti_thermal_get_trend(struct thermal_zone_device *thermal,
-   int trip, enum thermal_trend *trend)
-{
-   int ret;
-   long tr;
-
-   ret = __ti_thermal_get_trend(thermal->devdata, );
-   if (ret)
-   return ret;
-
if (tr > 0)
*trend = THERMAL_TREND_RAISING;
else if (tr < 0)
@@ -278,6 +262,13 @@ static int ti_thermal_get_trend(struct thermal_zone_device 
*thermal,
return 0;
 }
 
+/* Get the temperature trend callback functions for thermal zone */
+static int ti_thermal_get_trend(struct thermal_zone_device *thermal,
+   int trip, enum thermal_trend *trend)
+{
+   return __ti_thermal_get_trend(thermal->devdata, trip, trend);
+}
+
 /* Get critical temperature callback functions for thermal zone */
 static int ti_thermal_get_crit_temp(struct thermal_zone_device *thermal,
int *temp)
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index cb64866..3b96961 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -344,7 +344,7 @@ struct thermal_genl_event {
  */
 struct thermal_zone_of_device_ops {
int (*get_temp)(void *, int *);
-   int (*get_trend)(void *, long *);
+   int (*get_trend)(void *, int, enum thermal_trend *);
int (*set_trips)(void *, int, int);
int (*set_emul_temp)(void *, int);
int (*set_trip_temp)(void *, int, int);
-- 
1.9.1



Re: [PATCH 19/41] Documentation: dt: opp: fix spelling mistake

2016-04-24 Thread Viresh Kumar
On 25-04-16, 01:24, Eric Engestrom wrote:
> Signed-off-by: Eric Engestrom 
> ---
>  Documentation/devicetree/bindings/opp/opp.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/Documentation/devicetree/bindings/opp/opp.txt 
> b/Documentation/devicetree/bindings/opp/opp.txt
> index 601256f..ee91cbd 100644
> --- a/Documentation/devicetree/bindings/opp/opp.txt
> +++ b/Documentation/devicetree/bindings/opp/opp.txt
> @@ -45,7 +45,7 @@ Devices supporting OPPs must set their 
> "operating-points-v2" property with
>  phandle to a OPP table in their DT node. The OPP core will use this phandle 
> to
>  find the operating points for the device.
>  
> -If required, this can be extended for SoC vendor specfic bindings. Such 
> bindings
> +If required, this can be extended for SoC vendor specific bindings. Such 
> bindings
>  should be documented as 
> Documentation/devicetree/bindings/power/-opp.txt
>  and should have a compatible description like: 
> "operating-points-v2-".

Acked-by: Viresh Kumar 

-- 
viresh


Re: [PATCH 19/41] Documentation: dt: opp: fix spelling mistake

2016-04-24 Thread Viresh Kumar
On 25-04-16, 01:24, Eric Engestrom wrote:
> Signed-off-by: Eric Engestrom 
> ---
>  Documentation/devicetree/bindings/opp/opp.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/Documentation/devicetree/bindings/opp/opp.txt 
> b/Documentation/devicetree/bindings/opp/opp.txt
> index 601256f..ee91cbd 100644
> --- a/Documentation/devicetree/bindings/opp/opp.txt
> +++ b/Documentation/devicetree/bindings/opp/opp.txt
> @@ -45,7 +45,7 @@ Devices supporting OPPs must set their 
> "operating-points-v2" property with
>  phandle to a OPP table in their DT node. The OPP core will use this phandle 
> to
>  find the operating points for the device.
>  
> -If required, this can be extended for SoC vendor specfic bindings. Such 
> bindings
> +If required, this can be extended for SoC vendor specific bindings. Such 
> bindings
>  should be documented as 
> Documentation/devicetree/bindings/power/-opp.txt
>  and should have a compatible description like: 
> "operating-points-v2-".

Acked-by: Viresh Kumar 

-- 
viresh


[PATCH 0/4] Thermal: Support for hardware-tracked trip points

2016-04-24 Thread Caesar Wang
The history patches come from Mikko and Sascha.
http://thread.gmane.org/gmane.linux.power-management.general/59451

Now, I pick them up to continue upstream.
Nevermind!

Tis series adds support for hardware trip points. It picks up earlier
work from Mikko Perttunen. Mikko implemented hardware trip points as part
of the device tree support. It was suggested back then to move the
functionality to the thermal core instead of putting more code into the
device tree support. This series does exactly that.

This series patches rebase the conflicts.
Note that the hardware-tracked trip points are very well tested currently.

Verified and tested on 
https://github.com/Caesar-github/rockchip/tree/wip/fixes-thermal-0425
That's based on linux-kernel 20160422.

[0.00] Linux version 4.6.0-rc4-next-20160422-00016-g0ac0bfb-dirty



Sascha Hauer (4):
  thermal: Add support for hardware-tracked trip points
  thermal: of: implement .set_trips for device tree thermal zones
  thermal: streamline get_trend callbacks
  thermal: bang-bang governor: act on lower trip boundary

 drivers/thermal/gov_bang_bang.c|  2 +-
 drivers/thermal/of-thermal.c   | 23 ++-
 drivers/thermal/thermal_core.c | 48 ++
 drivers/thermal/ti-soc-thermal/ti-thermal-common.c | 25 ---
 include/linux/thermal.h|  9 +++-
 5 files changed, 78 insertions(+), 29 deletions(-)

-- 
1.9.1



[PATCH 1/4] thermal: Add support for hardware-tracked trip points

2016-04-24 Thread Caesar Wang
From: Sascha Hauer 

This adds support for hardware-tracked trip points to the device tree
thermal sensor framework.

The framework supports an arbitrary number of trip points. Whenever
the current temperature is updated, the trip points immediately
below and above the current temperature are found. A .set_trips
callback is then called with the temperatures. If there is no trip
point above or below the current temperature, the passed trip
temperature will be -INT_MAX or INT_MAX respectively. In this callback,
the driver should program the hardware such that it is notified
when either of these trip points are triggered. When a trip point
is triggered, the driver should call `thermal_zone_device_update'
for the respective thermal zone. This will cause the trip points
to be updated again.

If .set_trips is not implemented, the framework behaves as before.

This patch is based on an earlier version from Mikko Perttunen


Signed-off-by: Sascha Hauer 
Signed-off-by: Caesar Wang 
Cc: Zhang Rui 
Cc: Eduardo Valentin 
Cc: linux...@vger.kernel.org
---

 drivers/thermal/thermal_core.c | 48 ++
 include/linux/thermal.h|  3 +++
 2 files changed, 51 insertions(+)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index f1db496..cfef8cc 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -520,6 +520,47 @@ exit:
 }
 EXPORT_SYMBOL_GPL(thermal_zone_get_temp);
 
+static void thermal_zone_set_trips(struct thermal_zone_device *tz)
+{
+   int low = -INT_MAX;
+   int high = INT_MAX;
+   int trip_temp, hysteresis;
+   int temp = tz->temperature;
+   int i, ret;
+
+   if (!tz->ops->set_trips)
+   return;
+
+   for (i = 0; i < tz->trips; i++) {
+   int trip_low;
+
+   tz->ops->get_trip_temp(tz, i, _temp);
+   tz->ops->get_trip_hyst(tz, i, );
+
+   trip_low = trip_temp - hysteresis;
+
+   if (trip_low < temp && trip_low > low)
+   low = trip_low;
+
+   if (trip_temp > temp && trip_temp < high)
+   high = trip_temp;
+   }
+
+   /* No need to change trip points */
+   if (tz->prev_low_trip == low && tz->prev_high_trip == high)
+   return;
+
+   tz->prev_low_trip = low;
+   tz->prev_high_trip = high;
+
+   dev_dbg(>device, "new temperature boundaries: %d < x < %d\n",
+   low, high);
+
+   ret = tz->ops->set_trips(tz, low, high);
+   if (ret)
+   dev_err(>device, "Failed to set trips: %d\n", ret);
+}
+
 static void update_temperature(struct thermal_zone_device *tz)
 {
int temp, ret;
@@ -569,6 +610,8 @@ void thermal_zone_device_update(struct thermal_zone_device 
*tz)
 
update_temperature(tz);
 
+   thermal_zone_set_trips(tz);
+
for (count = 0; count < tz->trips; count++)
handle_thermal_trip(tz, count);
 }
@@ -754,6 +797,9 @@ trip_point_hyst_store(struct device *dev, struct 
device_attribute *attr,
 */
ret = tz->ops->set_trip_hyst(tz, trip, temperature);
 
+   if (!ret)
+   thermal_zone_set_trips(tz);
+
return ret ? ret : count;
 }
 
@@ -1843,6 +1889,8 @@ struct thermal_zone_device 
*thermal_zone_device_register(const char *type,
tz->trips = trips;
tz->passive_delay = passive_delay;
tz->polling_delay = polling_delay;
+   tz->prev_low_trip = INT_MAX;
+   tz->prev_high_trip = -INT_MAX;
/* A new thermal zone needs to be updated anyway. */
atomic_set(>need_update, 1);
 
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index e45abe7..e258359 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -98,6 +98,7 @@ struct thermal_zone_device_ops {
int (*unbind) (struct thermal_zone_device *,
   struct thermal_cooling_device *);
int (*get_temp) (struct thermal_zone_device *, int *);
+   int (*set_trips) (struct thermal_zone_device *, int, int);
int (*get_mode) (struct thermal_zone_device *,
 enum thermal_device_mode *);
int (*set_mode) (struct thermal_zone_device *,
@@ -199,6 +200,8 @@ struct thermal_zone_device {
int last_temperature;
int emul_temperature;
int passive;
+   int prev_low_trip;
+   int prev_high_trip;
unsigned int forced_passive;
atomic_t need_update;
struct thermal_zone_device_ops *ops;
-- 
1.9.1



[PATCH 2/4] thermal: of: implement .set_trips for device tree thermal zones

2016-04-24 Thread Caesar Wang
From: Sascha Hauer 

Signed-off-by: Sascha Hauer 
Signed-off-by: Caesar Wang 
Cc: Zhang Rui 
Cc: Eduardo Valentin 
Cc: linux...@vger.kernel.org
---

 drivers/thermal/of-thermal.c | 12 
 include/linux/thermal.h  |  4 
 2 files changed, 16 insertions(+)

diff --git a/drivers/thermal/of-thermal.c b/drivers/thermal/of-thermal.c
index b8e509c..8722e63 100644
--- a/drivers/thermal/of-thermal.c
+++ b/drivers/thermal/of-thermal.c
@@ -101,6 +101,17 @@ static int of_thermal_get_temp(struct thermal_zone_device 
*tz,
return data->ops->get_temp(data->sensor_data, temp);
 }
 
+static int of_thermal_set_trips(struct thermal_zone_device *tz,
+   int low, int high)
+{
+   struct __thermal_zone *data = tz->devdata;
+
+   if (!data->ops || !data->ops->set_trips)
+   return -EINVAL;
+
+   return data->ops->set_trips(data->sensor_data, low, high);
+}
+
 /**
  * of_thermal_get_ntrips - function to export number of available trip
  *points.
@@ -427,6 +438,7 @@ thermal_zone_of_add_sensor(struct device_node *zone,
 
tzd->ops->get_temp = of_thermal_get_temp;
tzd->ops->get_trend = of_thermal_get_trend;
+   tzd->ops->set_trips = of_thermal_set_trips;
tzd->ops->set_emul_temp = of_thermal_set_emul_temp;
mutex_unlock(>lock);
 
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index e258359..cb64866 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -336,12 +336,16 @@ struct thermal_genl_event {
  *
  * Optional:
  * @get_trend: a pointer to a function that reads the sensor temperature trend.
+ * @@set_trips: a pointer to a function that sets a temperature window. When
+ * this window is left the driver must inform the thermal core via
+ *  thermal_zone_device_update.
  * @set_emul_temp: a pointer to a function that sets sensor emulated
  *temperature.
  */
 struct thermal_zone_of_device_ops {
int (*get_temp)(void *, int *);
int (*get_trend)(void *, long *);
+   int (*set_trips)(void *, int, int);
int (*set_emul_temp)(void *, int);
int (*set_trip_temp)(void *, int, int);
 };
-- 
1.9.1



[PATCH 1/4] thermal: Add support for hardware-tracked trip points

2016-04-24 Thread Caesar Wang
From: Sascha Hauer 

This adds support for hardware-tracked trip points to the device tree
thermal sensor framework.

The framework supports an arbitrary number of trip points. Whenever
the current temperature is updated, the trip points immediately
below and above the current temperature are found. A .set_trips
callback is then called with the temperatures. If there is no trip
point above or below the current temperature, the passed trip
temperature will be -INT_MAX or INT_MAX respectively. In this callback,
the driver should program the hardware such that it is notified
when either of these trip points are triggered. When a trip point
is triggered, the driver should call `thermal_zone_device_update'
for the respective thermal zone. This will cause the trip points
to be updated again.

If .set_trips is not implemented, the framework behaves as before.

This patch is based on an earlier version from Mikko Perttunen


Signed-off-by: Sascha Hauer 
Signed-off-by: Caesar Wang 
Cc: Zhang Rui 
Cc: Eduardo Valentin 
Cc: linux...@vger.kernel.org
---

 drivers/thermal/thermal_core.c | 48 ++
 include/linux/thermal.h|  3 +++
 2 files changed, 51 insertions(+)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index f1db496..cfef8cc 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -520,6 +520,47 @@ exit:
 }
 EXPORT_SYMBOL_GPL(thermal_zone_get_temp);
 
+static void thermal_zone_set_trips(struct thermal_zone_device *tz)
+{
+   int low = -INT_MAX;
+   int high = INT_MAX;
+   int trip_temp, hysteresis;
+   int temp = tz->temperature;
+   int i, ret;
+
+   if (!tz->ops->set_trips)
+   return;
+
+   for (i = 0; i < tz->trips; i++) {
+   int trip_low;
+
+   tz->ops->get_trip_temp(tz, i, _temp);
+   tz->ops->get_trip_hyst(tz, i, );
+
+   trip_low = trip_temp - hysteresis;
+
+   if (trip_low < temp && trip_low > low)
+   low = trip_low;
+
+   if (trip_temp > temp && trip_temp < high)
+   high = trip_temp;
+   }
+
+   /* No need to change trip points */
+   if (tz->prev_low_trip == low && tz->prev_high_trip == high)
+   return;
+
+   tz->prev_low_trip = low;
+   tz->prev_high_trip = high;
+
+   dev_dbg(>device, "new temperature boundaries: %d < x < %d\n",
+   low, high);
+
+   ret = tz->ops->set_trips(tz, low, high);
+   if (ret)
+   dev_err(>device, "Failed to set trips: %d\n", ret);
+}
+
 static void update_temperature(struct thermal_zone_device *tz)
 {
int temp, ret;
@@ -569,6 +610,8 @@ void thermal_zone_device_update(struct thermal_zone_device 
*tz)
 
update_temperature(tz);
 
+   thermal_zone_set_trips(tz);
+
for (count = 0; count < tz->trips; count++)
handle_thermal_trip(tz, count);
 }
@@ -754,6 +797,9 @@ trip_point_hyst_store(struct device *dev, struct 
device_attribute *attr,
 */
ret = tz->ops->set_trip_hyst(tz, trip, temperature);
 
+   if (!ret)
+   thermal_zone_set_trips(tz);
+
return ret ? ret : count;
 }
 
@@ -1843,6 +1889,8 @@ struct thermal_zone_device 
*thermal_zone_device_register(const char *type,
tz->trips = trips;
tz->passive_delay = passive_delay;
tz->polling_delay = polling_delay;
+   tz->prev_low_trip = INT_MAX;
+   tz->prev_high_trip = -INT_MAX;
/* A new thermal zone needs to be updated anyway. */
atomic_set(>need_update, 1);
 
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index e45abe7..e258359 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -98,6 +98,7 @@ struct thermal_zone_device_ops {
int (*unbind) (struct thermal_zone_device *,
   struct thermal_cooling_device *);
int (*get_temp) (struct thermal_zone_device *, int *);
+   int (*set_trips) (struct thermal_zone_device *, int, int);
int (*get_mode) (struct thermal_zone_device *,
 enum thermal_device_mode *);
int (*set_mode) (struct thermal_zone_device *,
@@ -199,6 +200,8 @@ struct thermal_zone_device {
int last_temperature;
int emul_temperature;
int passive;
+   int prev_low_trip;
+   int prev_high_trip;
unsigned int forced_passive;
atomic_t need_update;
struct thermal_zone_device_ops *ops;
-- 
1.9.1



[PATCH 2/4] thermal: of: implement .set_trips for device tree thermal zones

2016-04-24 Thread Caesar Wang
From: Sascha Hauer 

Signed-off-by: Sascha Hauer 
Signed-off-by: Caesar Wang 
Cc: Zhang Rui 
Cc: Eduardo Valentin 
Cc: linux...@vger.kernel.org
---

 drivers/thermal/of-thermal.c | 12 
 include/linux/thermal.h  |  4 
 2 files changed, 16 insertions(+)

diff --git a/drivers/thermal/of-thermal.c b/drivers/thermal/of-thermal.c
index b8e509c..8722e63 100644
--- a/drivers/thermal/of-thermal.c
+++ b/drivers/thermal/of-thermal.c
@@ -101,6 +101,17 @@ static int of_thermal_get_temp(struct thermal_zone_device 
*tz,
return data->ops->get_temp(data->sensor_data, temp);
 }
 
+static int of_thermal_set_trips(struct thermal_zone_device *tz,
+   int low, int high)
+{
+   struct __thermal_zone *data = tz->devdata;
+
+   if (!data->ops || !data->ops->set_trips)
+   return -EINVAL;
+
+   return data->ops->set_trips(data->sensor_data, low, high);
+}
+
 /**
  * of_thermal_get_ntrips - function to export number of available trip
  *points.
@@ -427,6 +438,7 @@ thermal_zone_of_add_sensor(struct device_node *zone,
 
tzd->ops->get_temp = of_thermal_get_temp;
tzd->ops->get_trend = of_thermal_get_trend;
+   tzd->ops->set_trips = of_thermal_set_trips;
tzd->ops->set_emul_temp = of_thermal_set_emul_temp;
mutex_unlock(>lock);
 
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index e258359..cb64866 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -336,12 +336,16 @@ struct thermal_genl_event {
  *
  * Optional:
  * @get_trend: a pointer to a function that reads the sensor temperature trend.
+ * @@set_trips: a pointer to a function that sets a temperature window. When
+ * this window is left the driver must inform the thermal core via
+ *  thermal_zone_device_update.
  * @set_emul_temp: a pointer to a function that sets sensor emulated
  *temperature.
  */
 struct thermal_zone_of_device_ops {
int (*get_temp)(void *, int *);
int (*get_trend)(void *, long *);
+   int (*set_trips)(void *, int, int);
int (*set_emul_temp)(void *, int);
int (*set_trip_temp)(void *, int, int);
 };
-- 
1.9.1



[PATCH 0/4] Thermal: Support for hardware-tracked trip points

2016-04-24 Thread Caesar Wang
The history patches come from Mikko and Sascha.
http://thread.gmane.org/gmane.linux.power-management.general/59451

Now, I pick them up to continue upstream.
Nevermind!

Tis series adds support for hardware trip points. It picks up earlier
work from Mikko Perttunen. Mikko implemented hardware trip points as part
of the device tree support. It was suggested back then to move the
functionality to the thermal core instead of putting more code into the
device tree support. This series does exactly that.

This series patches rebase the conflicts.
Note that the hardware-tracked trip points are very well tested currently.

Verified and tested on 
https://github.com/Caesar-github/rockchip/tree/wip/fixes-thermal-0425
That's based on linux-kernel 20160422.

[0.00] Linux version 4.6.0-rc4-next-20160422-00016-g0ac0bfb-dirty



Sascha Hauer (4):
  thermal: Add support for hardware-tracked trip points
  thermal: of: implement .set_trips for device tree thermal zones
  thermal: streamline get_trend callbacks
  thermal: bang-bang governor: act on lower trip boundary

 drivers/thermal/gov_bang_bang.c|  2 +-
 drivers/thermal/of-thermal.c   | 23 ++-
 drivers/thermal/thermal_core.c | 48 ++
 drivers/thermal/ti-soc-thermal/ti-thermal-common.c | 25 ---
 include/linux/thermal.h|  9 +++-
 5 files changed, 78 insertions(+), 29 deletions(-)

-- 
1.9.1



Re: [PATCH 10/10] cpufreq: mvebu: Use generic platdev driver

2016-04-24 Thread Viresh Kumar
On 23-04-16, 00:42, Arnd Bergmann wrote:
> On Thursday 21 April 2016 14:29:02 Viresh Kumar wrote:
> > diff --git a/arch/arm/mach-mvebu/pmsu.c b/arch/arm/mach-mvebu/pmsu.c
> > index 79d0a5d9da8e..f24f46776fbb 100644
> > --- a/arch/arm/mach-mvebu/pmsu.c
> > +++ b/arch/arm/mach-mvebu/pmsu.c
> > @@ -685,8 +685,6 @@ static int __init armada_xp_pmsu_cpufreq_init(void)
> > dev_err(cpu_dev, "%s: failed to mark OPPs as 
> > shared: %d\n",
> > __func__, ret);
> > }
> > -
> > -   platform_device_register_simple("cpufreq-dt", -1, NULL, 0);
> > return 0;
> >  }
> >  
> > diff --git a/drivers/cpufreq/cpufreq-dt-platdev.c 
> > b/drivers/cpufreq/cpufreq-dt-platdev.c
> > index 69b2a222c84e..5704a92c52dc 100644
> > --- a/drivers/cpufreq/cpufreq-dt-platdev.c
> > +++ b/drivers/cpufreq/cpufreq-dt-platdev.c
> > @@ -33,6 +33,8 @@ static const struct of_device_id machines[] = {
> >  
> > { .compatible = "marvell,berlin", },
> >  
> > +   { .compatible = "marvell,armadaxp", },
> > +
> > { .compatible = "samsung,exynos3250", },
> > { .compatible = "samsung,exynos4210", },
> > { .compatible = "samsung,exynos4212", },
> 
> I think to make it clear that the ordering is significant here, I would leave 
> this
> platform_device_register_simple() in armada_xp_pmsu_cpufreq_init().
> 
> However, it might be helpful to move that function into a new file in
> drivers/cpufreq/ if that works.

We just can't get wrong with the ordering here, as this is done from
device_initcall() and so that point is out of question.

The other thing that can happen is that armada_xp_pmsu_cpufreq_init()
call can fail. In that case, most of the times cpufreq-dt ->init()
will fail as well, so even that is fine for me.

And, so I think we can keep this patch as is.

Do you agree ?

-- 
viresh


Re: [PATCH 10/10] cpufreq: mvebu: Use generic platdev driver

2016-04-24 Thread Viresh Kumar
On 23-04-16, 00:42, Arnd Bergmann wrote:
> On Thursday 21 April 2016 14:29:02 Viresh Kumar wrote:
> > diff --git a/arch/arm/mach-mvebu/pmsu.c b/arch/arm/mach-mvebu/pmsu.c
> > index 79d0a5d9da8e..f24f46776fbb 100644
> > --- a/arch/arm/mach-mvebu/pmsu.c
> > +++ b/arch/arm/mach-mvebu/pmsu.c
> > @@ -685,8 +685,6 @@ static int __init armada_xp_pmsu_cpufreq_init(void)
> > dev_err(cpu_dev, "%s: failed to mark OPPs as 
> > shared: %d\n",
> > __func__, ret);
> > }
> > -
> > -   platform_device_register_simple("cpufreq-dt", -1, NULL, 0);
> > return 0;
> >  }
> >  
> > diff --git a/drivers/cpufreq/cpufreq-dt-platdev.c 
> > b/drivers/cpufreq/cpufreq-dt-platdev.c
> > index 69b2a222c84e..5704a92c52dc 100644
> > --- a/drivers/cpufreq/cpufreq-dt-platdev.c
> > +++ b/drivers/cpufreq/cpufreq-dt-platdev.c
> > @@ -33,6 +33,8 @@ static const struct of_device_id machines[] = {
> >  
> > { .compatible = "marvell,berlin", },
> >  
> > +   { .compatible = "marvell,armadaxp", },
> > +
> > { .compatible = "samsung,exynos3250", },
> > { .compatible = "samsung,exynos4210", },
> > { .compatible = "samsung,exynos4212", },
> 
> I think to make it clear that the ordering is significant here, I would leave 
> this
> platform_device_register_simple() in armada_xp_pmsu_cpufreq_init().
> 
> However, it might be helpful to move that function into a new file in
> drivers/cpufreq/ if that works.

We just can't get wrong with the ordering here, as this is done from
device_initcall() and so that point is out of question.

The other thing that can happen is that armada_xp_pmsu_cpufreq_init()
call can fail. In that case, most of the times cpufreq-dt ->init()
will fail as well, so even that is fine for me.

And, so I think we can keep this patch as is.

Do you agree ?

-- 
viresh


Re: [PATCH 09/41] Documentation: dt: clock: fix spelling mistakes

2016-04-24 Thread Randy Dunlap
On 04/24/16 17:24, Eric Engestrom wrote:
> Signed-off-by: Eric Engestrom 
> ---
>  Documentation/devicetree/bindings/clock/rockchip,rk3188-cru.txt | 2 +-
>  Documentation/devicetree/bindings/clock/rockchip,rk3288-cru.txt | 2 +-
>  Documentation/devicetree/bindings/clock/st/st,clkgen.txt| 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/clock/rockchip,rk3188-cru.txt 
> b/Documentation/devicetree/bindings/clock/rockchip,rk3188-cru.txt
> index 0c2bf5e..7f36853 100644
> --- a/Documentation/devicetree/bindings/clock/rockchip,rk3188-cru.txt
> +++ b/Documentation/devicetree/bindings/clock/rockchip,rk3188-cru.txt
> @@ -16,7 +16,7 @@ Required Properties:
>  Optional Properties:
>  
>  - rockchip,grf: phandle to the syscon managing the "general register files"
> -  If missing pll rates are not changable, due to the missing pll lock status.
> +  If missing pll rates are not changeable, due to the missing pll lock 
> status.
>  
>  Each clock is assigned an identifier and client nodes can use this identifier
>  to specify the clock which they consume. All available clocks are defined as
> diff --git a/Documentation/devicetree/bindings/clock/rockchip,rk3288-cru.txt 
> b/Documentation/devicetree/bindings/clock/rockchip,rk3288-cru.txt
> index c9fbb76..8cb47c3 100644
> --- a/Documentation/devicetree/bindings/clock/rockchip,rk3288-cru.txt
> +++ b/Documentation/devicetree/bindings/clock/rockchip,rk3288-cru.txt
> @@ -15,7 +15,7 @@ Required Properties:
>  Optional Properties:
>  
>  - rockchip,grf: phandle to the syscon managing the "general register files"
> -  If missing pll rates are not changable, due to the missing pll lock status.
> +  If missing pll rates are not changeable, due to the missing pll lock 
> status.
>  
>  Each clock is assigned an identifier and client nodes can use this identifier
>  to specify the clock which they consume. All available clocks are defined as
> diff --git a/Documentation/devicetree/bindings/clock/st/st,clkgen.txt 
> b/Documentation/devicetree/bindings/clock/st/st,clkgen.txt
> index 78978f1..426bdda 100644
> --- a/Documentation/devicetree/bindings/clock/st/st,clkgen.txt
> +++ b/Documentation/devicetree/bindings/clock/st/st,clkgen.txt
> @@ -40,7 +40,7 @@ address is common of all subnode.
>   };
>  
>  This binding uses the common clock binding[1].
> -Each subnode should use the binding discribe in [2]..[7]
> +Each subnode should use the binding describe in [2]..[7]

described ?

>  
>  [1] Documentation/devicetree/bindings/clock/clock-bindings.txt
>  [2] Documentation/devicetree/bindings/clock/st,clkgen-divmux.txt
> 


-- 
~Randy


Re: [PATCH 09/41] Documentation: dt: clock: fix spelling mistakes

2016-04-24 Thread Randy Dunlap
On 04/24/16 17:24, Eric Engestrom wrote:
> Signed-off-by: Eric Engestrom 
> ---
>  Documentation/devicetree/bindings/clock/rockchip,rk3188-cru.txt | 2 +-
>  Documentation/devicetree/bindings/clock/rockchip,rk3288-cru.txt | 2 +-
>  Documentation/devicetree/bindings/clock/st/st,clkgen.txt| 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/clock/rockchip,rk3188-cru.txt 
> b/Documentation/devicetree/bindings/clock/rockchip,rk3188-cru.txt
> index 0c2bf5e..7f36853 100644
> --- a/Documentation/devicetree/bindings/clock/rockchip,rk3188-cru.txt
> +++ b/Documentation/devicetree/bindings/clock/rockchip,rk3188-cru.txt
> @@ -16,7 +16,7 @@ Required Properties:
>  Optional Properties:
>  
>  - rockchip,grf: phandle to the syscon managing the "general register files"
> -  If missing pll rates are not changable, due to the missing pll lock status.
> +  If missing pll rates are not changeable, due to the missing pll lock 
> status.
>  
>  Each clock is assigned an identifier and client nodes can use this identifier
>  to specify the clock which they consume. All available clocks are defined as
> diff --git a/Documentation/devicetree/bindings/clock/rockchip,rk3288-cru.txt 
> b/Documentation/devicetree/bindings/clock/rockchip,rk3288-cru.txt
> index c9fbb76..8cb47c3 100644
> --- a/Documentation/devicetree/bindings/clock/rockchip,rk3288-cru.txt
> +++ b/Documentation/devicetree/bindings/clock/rockchip,rk3288-cru.txt
> @@ -15,7 +15,7 @@ Required Properties:
>  Optional Properties:
>  
>  - rockchip,grf: phandle to the syscon managing the "general register files"
> -  If missing pll rates are not changable, due to the missing pll lock status.
> +  If missing pll rates are not changeable, due to the missing pll lock 
> status.
>  
>  Each clock is assigned an identifier and client nodes can use this identifier
>  to specify the clock which they consume. All available clocks are defined as
> diff --git a/Documentation/devicetree/bindings/clock/st/st,clkgen.txt 
> b/Documentation/devicetree/bindings/clock/st/st,clkgen.txt
> index 78978f1..426bdda 100644
> --- a/Documentation/devicetree/bindings/clock/st/st,clkgen.txt
> +++ b/Documentation/devicetree/bindings/clock/st/st,clkgen.txt
> @@ -40,7 +40,7 @@ address is common of all subnode.
>   };
>  
>  This binding uses the common clock binding[1].
> -Each subnode should use the binding discribe in [2]..[7]
> +Each subnode should use the binding describe in [2]..[7]

described ?

>  
>  [1] Documentation/devicetree/bindings/clock/clock-bindings.txt
>  [2] Documentation/devicetree/bindings/clock/st,clkgen-divmux.txt
> 


-- 
~Randy


Re: [PATCH 04/41] Documentation: cgroup: fix spelling mistakes

2016-04-24 Thread Randy Dunlap
On 04/24/16 17:24, Eric Engestrom wrote:
> @@ -1123,7 +1123,7 @@ writeback as follows.
>  
>  6-1. Basics
>  
> -cgroup namespace provides a mechanism to virtualize the view of the
> +cgroup namespace provides a mechanism to virtualise the view of the
>  "/proc/$PID/cgroup" file and cgroup mounts.  The CLONE_NEWCGROUP clone
>  flag can be used with clone(2) and unshare(2) to create a new cgroup
>  namespace.  The process running inside the cgroup namespace will have
> @@ -1256,7 +1256,7 @@ This will mount the unified cgroup hierarchy with 
> cgroupns root as the
>  filesystem root.  The process needs CAP_SYS_ADMIN against its user and
>  mount namespaces.
>  
> -The virtualization of /proc/self/cgroup file combined with restricting
> +The virtualisation of /proc/self/cgroup file combined with restricting
>  the view of cgroup hierarchy by namespace-private cgroupfs mount
>  provides a properly isolated cgroup view inside the container.

We generally accept British or American spellings, so these changes
are not needed.


-- 
~Randy


Re: [PATCH 04/41] Documentation: cgroup: fix spelling mistakes

2016-04-24 Thread Randy Dunlap
On 04/24/16 17:24, Eric Engestrom wrote:
> @@ -1123,7 +1123,7 @@ writeback as follows.
>  
>  6-1. Basics
>  
> -cgroup namespace provides a mechanism to virtualize the view of the
> +cgroup namespace provides a mechanism to virtualise the view of the
>  "/proc/$PID/cgroup" file and cgroup mounts.  The CLONE_NEWCGROUP clone
>  flag can be used with clone(2) and unshare(2) to create a new cgroup
>  namespace.  The process running inside the cgroup namespace will have
> @@ -1256,7 +1256,7 @@ This will mount the unified cgroup hierarchy with 
> cgroupns root as the
>  filesystem root.  The process needs CAP_SYS_ADMIN against its user and
>  mount namespaces.
>  
> -The virtualization of /proc/self/cgroup file combined with restricting
> +The virtualisation of /proc/self/cgroup file combined with restricting
>  the view of cgroup hierarchy by namespace-private cgroupfs mount
>  provides a properly isolated cgroup view inside the container.

We generally accept British or American spellings, so these changes
are not needed.


-- 
~Randy


[PATCH] clk: fix member type of struct clk_hw_onecell_data

2016-04-24 Thread Masahiro Yamada
We cannot assign any value to an array type variable.  So,

  hw_data->hws = kcalloc(hw_data->num, sizeof(struct clk_hw *),
 GFP_KERNEL);

fails with "invalid use of flexible array member" error.

There are two ways to fix this issue.

[1] Make it a double-pointer
  struct clk_hw_onecell_data {
  size_t num;
  struct clk_hw **hws;
  };

This works as struct clk_onecell_data does.

[2] Make it a zero-sized array
  struct clk_hw_onecell_data {
  size_t num;
  struct clk_hw *hws[0];
  };

This allows one-shot memory allocation like this:

  hw_data = kmalloc(sizeof(*hw_data) + clk_num * sizeof(struct clk_hw *),
GFP_KERNEL);

This commit adopts [2] because it looks like Stephen's intention
(he moved hws[] to the bottom of struct clk_hw_onecell_data).

Fixes: 0861e5b8cf80 ("clk: Add clk_hw OF clk providers")
Signed-off-by: Masahiro Yamada 
---

 include/linux/clk-provider.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h
index fd2ccd5..1850e25 100644
--- a/include/linux/clk-provider.h
+++ b/include/linux/clk-provider.h
@@ -769,7 +769,7 @@ struct clk_onecell_data {
 
 struct clk_hw_onecell_data {
size_t num;
-   struct clk_hw *hws[];
+   struct clk_hw *hws[0];
 };
 
 extern struct of_device_id __clk_of_table;
-- 
1.9.1



[PATCH] clk: fix member type of struct clk_hw_onecell_data

2016-04-24 Thread Masahiro Yamada
We cannot assign any value to an array type variable.  So,

  hw_data->hws = kcalloc(hw_data->num, sizeof(struct clk_hw *),
 GFP_KERNEL);

fails with "invalid use of flexible array member" error.

There are two ways to fix this issue.

[1] Make it a double-pointer
  struct clk_hw_onecell_data {
  size_t num;
  struct clk_hw **hws;
  };

This works as struct clk_onecell_data does.

[2] Make it a zero-sized array
  struct clk_hw_onecell_data {
  size_t num;
  struct clk_hw *hws[0];
  };

This allows one-shot memory allocation like this:

  hw_data = kmalloc(sizeof(*hw_data) + clk_num * sizeof(struct clk_hw *),
GFP_KERNEL);

This commit adopts [2] because it looks like Stephen's intention
(he moved hws[] to the bottom of struct clk_hw_onecell_data).

Fixes: 0861e5b8cf80 ("clk: Add clk_hw OF clk providers")
Signed-off-by: Masahiro Yamada 
---

 include/linux/clk-provider.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h
index fd2ccd5..1850e25 100644
--- a/include/linux/clk-provider.h
+++ b/include/linux/clk-provider.h
@@ -769,7 +769,7 @@ struct clk_onecell_data {
 
 struct clk_hw_onecell_data {
size_t num;
-   struct clk_hw *hws[];
+   struct clk_hw *hws[0];
 };
 
 extern struct of_device_id __clk_of_table;
-- 
1.9.1



[PATCH 1/2] bfp tools: Remove expression with no effect

2016-04-24 Thread Florian Fainelli
Assigning "attr" to "attr" does not have any effect, but was caught by
Coverity, so let's remove this.

Reported-by: coverity (CID 1354720)
Fixes: 1b76c13e4b36 ("bpf tools: Introduce 'bpf' library and add bpf feature 
check")
Signed-off-by: Florian Fainelli 
---
 tools/build/feature/test-bpf.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tools/build/feature/test-bpf.c b/tools/build/feature/test-bpf.c
index b389026839b9..8236df9a46ca 100644
--- a/tools/build/feature/test-bpf.c
+++ b/tools/build/feature/test-bpf.c
@@ -27,7 +27,6 @@ int main(void)
attr.log_level = 0;
attr.kern_version = 0;
 
-   attr = attr;
/*
 * Test existence of __NR_bpf and BPF_PROG_LOAD.
 * This call should fail if we run the testcase.
-- 
2.7.4



[PATCH 1/2] bfp tools: Remove expression with no effect

2016-04-24 Thread Florian Fainelli
Assigning "attr" to "attr" does not have any effect, but was caught by
Coverity, so let's remove this.

Reported-by: coverity (CID 1354720)
Fixes: 1b76c13e4b36 ("bpf tools: Introduce 'bpf' library and add bpf feature 
check")
Signed-off-by: Florian Fainelli 
---
 tools/build/feature/test-bpf.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tools/build/feature/test-bpf.c b/tools/build/feature/test-bpf.c
index b389026839b9..8236df9a46ca 100644
--- a/tools/build/feature/test-bpf.c
+++ b/tools/build/feature/test-bpf.c
@@ -27,7 +27,6 @@ int main(void)
attr.log_level = 0;
attr.kern_version = 0;
 
-   attr = attr;
/*
 * Test existence of __NR_bpf and BPF_PROG_LOAD.
 * This call should fail if we run the testcase.
-- 
2.7.4



[PATCH 2/2] bfp tools: Fix syscall argument

2016-04-24 Thread Florian Fainelli
Coverity flagged this under CID 1354884 as a sizeof mismatch, it turns
out that the argument "attr" passed to syscall should have been a
pointer to attr in the first place.

Reported-by: coverity (CID 1354884)
Fixes: 8f9e05fb298f ("perf tools: Fix PowerPC native building")
Signed-off-by: Florian Fainelli 
---
 tools/build/feature/test-bpf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/build/feature/test-bpf.c b/tools/build/feature/test-bpf.c
index 8236df9a46ca..e04ab89a1013 100644
--- a/tools/build/feature/test-bpf.c
+++ b/tools/build/feature/test-bpf.c
@@ -31,5 +31,5 @@ int main(void)
 * Test existence of __NR_bpf and BPF_PROG_LOAD.
 * This call should fail if we run the testcase.
 */
-   return syscall(__NR_bpf, BPF_PROG_LOAD, attr, sizeof(attr));
+   return syscall(__NR_bpf, BPF_PROG_LOAD, , sizeof(attr));
 }
-- 
2.7.4



[PATCH 2/2] bfp tools: Fix syscall argument

2016-04-24 Thread Florian Fainelli
Coverity flagged this under CID 1354884 as a sizeof mismatch, it turns
out that the argument "attr" passed to syscall should have been a
pointer to attr in the first place.

Reported-by: coverity (CID 1354884)
Fixes: 8f9e05fb298f ("perf tools: Fix PowerPC native building")
Signed-off-by: Florian Fainelli 
---
 tools/build/feature/test-bpf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/build/feature/test-bpf.c b/tools/build/feature/test-bpf.c
index 8236df9a46ca..e04ab89a1013 100644
--- a/tools/build/feature/test-bpf.c
+++ b/tools/build/feature/test-bpf.c
@@ -31,5 +31,5 @@ int main(void)
 * Test existence of __NR_bpf and BPF_PROG_LOAD.
 * This call should fail if we run the testcase.
 */
-   return syscall(__NR_bpf, BPF_PROG_LOAD, attr, sizeof(attr));
+   return syscall(__NR_bpf, BPF_PROG_LOAD, , sizeof(attr));
 }
-- 
2.7.4



[PATCH 0/2] bfp tools: Couple Coverity fixes

2016-04-24 Thread Florian Fainelli
Hi all,

Two trivial patches that were flagged by Coverity.

Thanks!

Florian Fainelli (2):
  bfp tools: Remove expression with no effect
  bfp tools: Fix syscall argument

 tools/build/feature/test-bpf.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

-- 
2.7.4



[PATCH 0/2] bfp tools: Couple Coverity fixes

2016-04-24 Thread Florian Fainelli
Hi all,

Two trivial patches that were flagged by Coverity.

Thanks!

Florian Fainelli (2):
  bfp tools: Remove expression with no effect
  bfp tools: Fix syscall argument

 tools/build/feature/test-bpf.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

-- 
2.7.4



RE: [PATCH] checkpatch: Add support to check already applied git commits

2016-04-24 Thread Du, Changbin
Hi,

> From: Joe Perches [mailto:j...@perches.com]
> Sent: Monday, April 25, 2016 7:12 AM
> To: Andrew Morton ; Andy Whitcroft
> 
> Cc: Du, Changbin ; linux-kernel@vger.kernel.org
> Subject: [PATCH] checkpatch: Add support to check already applied git
> commits
> 
> It's sometimes useful to scan already committed patches.
> 
> Add --git  to scan specific or multiple commits.
> 
> Single commits are scanned with
>   --git 
> Multiple commits are scanned with
>   --git 
>   --git -
> 
> Signed-off-by: "Du, Changbin" 
> Signed-off-by: Joe Perches 
> ---
> 
> A few miscellaneous improvements to Changbin's original patch:
> 
> o Don't exec git for each -,
>   use a single "git log - "
> o Consolidate the git exec for the  and - variants
> o Output 12 character commit hash ids
> o Don't scan git commit merges
> o Use -M to reduce the size of rename commits
> 

Thanks, it has been much more better now. I like this new one. 
This is my first time write a Perl script. :)

Also thanks for Sebastian's explanation, I got this idea just
because I had the same use case with you.

Best Regards,
Du, Changbin


RE: [PATCH] checkpatch: Add support to check already applied git commits

2016-04-24 Thread Du, Changbin
Hi,

> From: Joe Perches [mailto:j...@perches.com]
> Sent: Monday, April 25, 2016 7:12 AM
> To: Andrew Morton ; Andy Whitcroft
> 
> Cc: Du, Changbin ; linux-kernel@vger.kernel.org
> Subject: [PATCH] checkpatch: Add support to check already applied git
> commits
> 
> It's sometimes useful to scan already committed patches.
> 
> Add --git  to scan specific or multiple commits.
> 
> Single commits are scanned with
>   --git 
> Multiple commits are scanned with
>   --git 
>   --git -
> 
> Signed-off-by: "Du, Changbin" 
> Signed-off-by: Joe Perches 
> ---
> 
> A few miscellaneous improvements to Changbin's original patch:
> 
> o Don't exec git for each -,
>   use a single "git log - "
> o Consolidate the git exec for the  and - variants
> o Output 12 character commit hash ids
> o Don't scan git commit merges
> o Use -M to reduce the size of rename commits
> 

Thanks, it has been much more better now. I like this new one. 
This is my first time write a Perl script. :)

Also thanks for Sebastian's explanation, I got this idea just
because I had the same use case with you.

Best Regards,
Du, Changbin


[lkp] [thermal] fd87ba5cc7: general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC

2016-04-24 Thread kernel test robot
FYI, we noticed the following commit:

https://github.com/0day-ci/linux 
Eduardo-Valentin/thermal-sysfs-rework/20160424-073943
commit fd87ba5cc746cfd6ad36f7a26a77849fb674e2c3 ("thermal: use dev.groups to 
manage always present tz attributes")

on test machine: vm-lkp-wsx03-2G: 2 threads qemu-system-x86_64 -enable-kvm -cpu 
host with 2G memory

caused below changes:


[8.442718] power_supply test_battery: uevent
[8.443279] power_supply test_battery: POWER_SUPPLY_NAME=test_battery
[8.444309] [ cut here ]
[8.444905] WARNING: CPU: 1 PID: 1 at fs/sysfs/group.c:61 
internal_create_group+0x16c/0x264
[8.446198] general protection fault:  [#1] SMP DEBUG_PAGEALLOC 
[8.447057] Modules linked in:
[8.447477] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 
4.6.0-rc4-00067-gfd87ba5 #1
[8.448413] task: 88007515c000 ti: 88007516 task.ti: 
88007516
[8.449337] RIP: 0010:[]  [] 
string+0x29/0x4c
[8.450293] RSP: :880075163b10  EFLAGS: 00010086
[8.450955] RAX:  RBX: 880075163c48 RCX: 0a00ff04
[8.451885] RDX: 0065766973736170 RSI:  RDI: ba58c32a
[8.452773] RBP: 880075163b10 R08: ba58c700 R09: 
[8.453659] R10:  R11: ba58bf0d R12: ba58c700
[8.454557] R13: ba58c32a R14: b98b139e R15: 03e0
[8.455447] FS:  () GS:88007810() 
knlGS:
[8.456451] CS:  0010 DS:  ES:  CR0: 80050033
[8.457163] CR2:  CR3: 39a06000 CR4: 06e0
[8.458057] Stack:
[8.458336]  880075163b60 b91b3fae b98b139c 
ba58c320
[8.459343]  0a00ff04 03e0  

[8.460329]  0001  880075163b78 
b91b41c4
[8.461342] Call Trace:
[8.461660]  [] vsnprintf+0x272/0x47b
[8.462321]  [] vscnprintf+0xd/0x23
[8.462956]  [] vprintk_emit+0x1e4/0x395
[8.463650]  [] ? internal_create_group+0x16c/0x264
[8.464452]  [] vprintk_default+0x18/0x1a
[8.465148]  [] vprintk+0x9/0xb
[8.465753]  [] __warn+0x80/0xd3
[8.466364]  [] warn_slowpath_fmt+0x46/0x4e
[8.467090]  [] ? sysfs_add_file_mode_ns+0xca/0x171
[8.467892]  [] internal_create_group+0x16c/0x264
[8.468673]  [] sysfs_create_group+0xe/0x10
[8.469393]  [] sysfs_create_groups+0x31/0x66
[8.470129]  [] device_add+0x267/0x4b4
[8.470801]  [] ? __raw_spin_lock_init+0x2e/0x4c
[8.471590]  [] device_register+0x15/0x18
[8.472301]  [] thermal_zone_device_register+0x12e/0x769
[8.473239]  [] thermal_zone_device_register+0x6d/0x84
[8.474092]  [] __power_supply_register+0x24c/0x304
[8.474900]  [] power_supply_register+0xe/0x10
[8.475663]  [] test_power_init+0x32/0xa8
[8.476369]  [] ? wm8350_power_driver_init+0x14/0x14
[8.477176]  [] do_one_initcall+0xe2/0x169
[8.477897]  [] kernel_init_freeable+0x118/0x19e
[8.478672]  [] kernel_init+0x9/0xeb
[8.479331]  [] ret_from_fork+0x22/0x40
[8.47]  [] ? rest_init+0xbd/0xbd
[8.480659] Code: 5d c3 55 49 89 ca 48 c7 c0 29 6d 8a b9 49 c1 fa 30 48 81 
fa ff 0f 00 00 49 89 f0 48 0f 46 d0 48 89 e5 31 c0 49 39 c2 89 c6 74 19 <44> 8a 
0c 02 45 84 c9 74 10 4c 39 c7 73 03 44 88 0f 48 ff c7 48 
[8.484158] RIP  [] string+0x29/0x4c
[8.484831]  RSP 
[8.485299] ---[ end trace 8d682e68977f59c6 ]---
[8.485882] Kernel panic - not syncing: Fatal exception


FYI, raw QEMU command line is:

qemu-system-x86_64 -enable-kvm -cpu host -kernel 
/pkg/linux/x86_64-randconfig-s2-04240821/gcc-5/fd87ba5cc746cfd6ad36f7a26a77849fb674e2c3/vmlinuz-4.6.0-rc4-00067-gfd87ba5
 -append 'root=/dev/ram0 user=lkp 
job=/lkp/scheduled/vm-lkp-wsx03-2G-11/bisect_boot-1-debian-x86_64-2015-02-07.cgz-x86_64-randconfig-s2-04240821-fd87ba5cc746cfd6ad36f7a26a77849fb674e2c3-20160424-71155-1r13ao5-0.yaml
 ARCH=x86_64 kconfig=x86_64-randconfig-s2-04240821 
branch=linux-devel/devel-spot-201604240758 
commit=fd87ba5cc746cfd6ad36f7a26a77849fb674e2c3 
BOOT_IMAGE=/pkg/linux/x86_64-randconfig-s2-04240821/gcc-5/fd87ba5cc746cfd6ad36f7a26a77849fb674e2c3/vmlinuz-4.6.0-rc4-00067-gfd87ba5
 max_uptime=600 
RESULT_ROOT=/result/boot/1/vm-lkp-wsx03-2G/debian-x86_64-2015-02-07.cgz/x86_64-randconfig-s2-04240821/gcc-5/fd87ba5cc746cfd6ad36f7a26a77849fb674e2c3/0
 LKP_SERVER=inn earlyprintk=ttyS0,115200 systemd.log_level=err debug apic=debug 
sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 
softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 
prompt_ramdisk=0 console=ttyS0,115200 console=tty0 vga=normal rw 
ip=vm-lkp-wsx03-2G-11::dhcp'  -initrd /fs/sdc1/initrd-vm-lkp-wsx03-2G-11 -m 
2048 -smp 2 -device e1000,netdev=net0 -netdev 
user,id=net0,hostfwd=tcp::23630-:22 -boot order=nc -no-reboot -watchdog 
i6300esb -rtc base=localtime 

[lkp] [thermal] fd87ba5cc7: general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC

2016-04-24 Thread kernel test robot
FYI, we noticed the following commit:

https://github.com/0day-ci/linux 
Eduardo-Valentin/thermal-sysfs-rework/20160424-073943
commit fd87ba5cc746cfd6ad36f7a26a77849fb674e2c3 ("thermal: use dev.groups to 
manage always present tz attributes")

on test machine: vm-lkp-wsx03-2G: 2 threads qemu-system-x86_64 -enable-kvm -cpu 
host with 2G memory

caused below changes:


[8.442718] power_supply test_battery: uevent
[8.443279] power_supply test_battery: POWER_SUPPLY_NAME=test_battery
[8.444309] [ cut here ]
[8.444905] WARNING: CPU: 1 PID: 1 at fs/sysfs/group.c:61 
internal_create_group+0x16c/0x264
[8.446198] general protection fault:  [#1] SMP DEBUG_PAGEALLOC 
[8.447057] Modules linked in:
[8.447477] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 
4.6.0-rc4-00067-gfd87ba5 #1
[8.448413] task: 88007515c000 ti: 88007516 task.ti: 
88007516
[8.449337] RIP: 0010:[]  [] 
string+0x29/0x4c
[8.450293] RSP: :880075163b10  EFLAGS: 00010086
[8.450955] RAX:  RBX: 880075163c48 RCX: 0a00ff04
[8.451885] RDX: 0065766973736170 RSI:  RDI: ba58c32a
[8.452773] RBP: 880075163b10 R08: ba58c700 R09: 
[8.453659] R10:  R11: ba58bf0d R12: ba58c700
[8.454557] R13: ba58c32a R14: b98b139e R15: 03e0
[8.455447] FS:  () GS:88007810() 
knlGS:
[8.456451] CS:  0010 DS:  ES:  CR0: 80050033
[8.457163] CR2:  CR3: 39a06000 CR4: 06e0
[8.458057] Stack:
[8.458336]  880075163b60 b91b3fae b98b139c 
ba58c320
[8.459343]  0a00ff04 03e0  

[8.460329]  0001  880075163b78 
b91b41c4
[8.461342] Call Trace:
[8.461660]  [] vsnprintf+0x272/0x47b
[8.462321]  [] vscnprintf+0xd/0x23
[8.462956]  [] vprintk_emit+0x1e4/0x395
[8.463650]  [] ? internal_create_group+0x16c/0x264
[8.464452]  [] vprintk_default+0x18/0x1a
[8.465148]  [] vprintk+0x9/0xb
[8.465753]  [] __warn+0x80/0xd3
[8.466364]  [] warn_slowpath_fmt+0x46/0x4e
[8.467090]  [] ? sysfs_add_file_mode_ns+0xca/0x171
[8.467892]  [] internal_create_group+0x16c/0x264
[8.468673]  [] sysfs_create_group+0xe/0x10
[8.469393]  [] sysfs_create_groups+0x31/0x66
[8.470129]  [] device_add+0x267/0x4b4
[8.470801]  [] ? __raw_spin_lock_init+0x2e/0x4c
[8.471590]  [] device_register+0x15/0x18
[8.472301]  [] thermal_zone_device_register+0x12e/0x769
[8.473239]  [] thermal_zone_device_register+0x6d/0x84
[8.474092]  [] __power_supply_register+0x24c/0x304
[8.474900]  [] power_supply_register+0xe/0x10
[8.475663]  [] test_power_init+0x32/0xa8
[8.476369]  [] ? wm8350_power_driver_init+0x14/0x14
[8.477176]  [] do_one_initcall+0xe2/0x169
[8.477897]  [] kernel_init_freeable+0x118/0x19e
[8.478672]  [] kernel_init+0x9/0xeb
[8.479331]  [] ret_from_fork+0x22/0x40
[8.47]  [] ? rest_init+0xbd/0xbd
[8.480659] Code: 5d c3 55 49 89 ca 48 c7 c0 29 6d 8a b9 49 c1 fa 30 48 81 
fa ff 0f 00 00 49 89 f0 48 0f 46 d0 48 89 e5 31 c0 49 39 c2 89 c6 74 19 <44> 8a 
0c 02 45 84 c9 74 10 4c 39 c7 73 03 44 88 0f 48 ff c7 48 
[8.484158] RIP  [] string+0x29/0x4c
[8.484831]  RSP 
[8.485299] ---[ end trace 8d682e68977f59c6 ]---
[8.485882] Kernel panic - not syncing: Fatal exception


FYI, raw QEMU command line is:

qemu-system-x86_64 -enable-kvm -cpu host -kernel 
/pkg/linux/x86_64-randconfig-s2-04240821/gcc-5/fd87ba5cc746cfd6ad36f7a26a77849fb674e2c3/vmlinuz-4.6.0-rc4-00067-gfd87ba5
 -append 'root=/dev/ram0 user=lkp 
job=/lkp/scheduled/vm-lkp-wsx03-2G-11/bisect_boot-1-debian-x86_64-2015-02-07.cgz-x86_64-randconfig-s2-04240821-fd87ba5cc746cfd6ad36f7a26a77849fb674e2c3-20160424-71155-1r13ao5-0.yaml
 ARCH=x86_64 kconfig=x86_64-randconfig-s2-04240821 
branch=linux-devel/devel-spot-201604240758 
commit=fd87ba5cc746cfd6ad36f7a26a77849fb674e2c3 
BOOT_IMAGE=/pkg/linux/x86_64-randconfig-s2-04240821/gcc-5/fd87ba5cc746cfd6ad36f7a26a77849fb674e2c3/vmlinuz-4.6.0-rc4-00067-gfd87ba5
 max_uptime=600 
RESULT_ROOT=/result/boot/1/vm-lkp-wsx03-2G/debian-x86_64-2015-02-07.cgz/x86_64-randconfig-s2-04240821/gcc-5/fd87ba5cc746cfd6ad36f7a26a77849fb674e2c3/0
 LKP_SERVER=inn earlyprintk=ttyS0,115200 systemd.log_level=err debug apic=debug 
sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 
softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 
prompt_ramdisk=0 console=ttyS0,115200 console=tty0 vga=normal rw 
ip=vm-lkp-wsx03-2G-11::dhcp'  -initrd /fs/sdc1/initrd-vm-lkp-wsx03-2G-11 -m 
2048 -smp 2 -device e1000,netdev=net0 -netdev 
user,id=net0,hostfwd=tcp::23630-:22 -boot order=nc -no-reboot -watchdog 
i6300esb -rtc base=localtime 

[lkp] [Input] 50fea9b0cf: kmsg.i8042:Unable_to_get_stable_CTR_read

2016-04-24 Thread kernel test robot
FYI, we noticed the following commit:

https://github.com/0day-ci/linux 
Mark-Laws/Input-i8042-Fix-console-keyboard-support-on-Gen2-Hyper-V-VMs/20160422-210451
commit 50fea9b0cfa3721f9320fd422942a662db568a29 ("Input: i8042 - Fix console 
keyboard support on Gen2 Hyper-V VMs")

on test machine: vm-lkp-wsx03-openwrt-i386: 1 threads qemu-system-i386 
-enable-kvm with 192M memory

caused below changes:


[2.178106] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 
0x60,0x64 irq 1,12
[2.178106] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 
0x60,0x64 irq 1,12
[2.180340] i8042: Unable to get stable CTR read
[2.180340] i8042: Unable to get stable CTR read
[2.181286] i8042: probe of i8042 failed with error -5
[2.181286] i8042: probe of i8042 failed with error -5


FYI, raw QEMU command line is:

qemu-system-i386 -enable-kvm -kernel 
/pkg/linux/i386-randconfig-i1-201616/gcc-5/50fea9b0cfa3721f9320fd422942a662db568a29/vmlinuz-4.5.0-00062-g50fea9b
 -append 'root=/dev/ram0 user=lkp 
job=/lkp/scheduled/vm-lkp-wsx03-openwrt-i386-6/bisect_boot-1-openwrt-i386.cgz-i386-randconfig-i1-201616-50fea9b0cfa3721f9320fd422942a662db568a29-20160422-17397-11xfb0e-1.yaml
 ARCH=i386 kconfig=i386-randconfig-i1-201616 
branch=linux-devel/devel-catchup-201604222108 
commit=50fea9b0cfa3721f9320fd422942a662db568a29 
BOOT_IMAGE=/pkg/linux/i386-randconfig-i1-201616/gcc-5/50fea9b0cfa3721f9320fd422942a662db568a29/vmlinuz-4.5.0-00062-g50fea9b
 max_uptime=600 
RESULT_ROOT=/result/boot/1/vm-lkp-wsx03-openwrt-i386/openwrt-i386.cgz/i386-randconfig-i1-201616/gcc-5/50fea9b0cfa3721f9320fd422942a662db568a29/0
 LKP_SERVER=inn earlyprintk=ttyS0,115200 systemd.log_level=err debug apic=debug 
sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 
softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 
prompt_ramdisk=0 console=ttyS0,115200 console=tty0 vga=normal rw 
ip=vm-lkp-wsx03-openwrt-i386-6::dhcp drbd.minor_count=8'  -initrd 
/fs/sdc1/initrd-vm-lkp-wsx03-openwrt-i386-6 -m 192 -smp 1 -device 
e1000,netdev=net0 -netdev user,id=net0 -boot order=nc -no-reboot -watchdog 
i6300esb -rtc base=localtime -drive 
file=/fs/sdc1/disk0-vm-lkp-wsx03-openwrt-i386-6,media=disk,if=virtio -drive 
file=/fs/sdc1/disk1-vm-lkp-wsx03-openwrt-i386-6,media=disk,if=virtio -pidfile 
/dev/shm/kboot/pid-vm-lkp-wsx03-openwrt-i386-6 -serial 
file:/dev/shm/kboot/serial-vm-lkp-wsx03-openwrt-i386-6 -daemonize -display none 
-monitor null 



Thanks,
Xiaolong
#
# Automatically generated file; DO NOT EDIT.
# Linux/i386 4.5.0 Kernel Configuration
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_PERF_EVENTS_INTEL_UNCORE=y
CONFIG_OUTPUT_FORMAT="elf32-i386"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_BITS_MAX=16
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_32_LAZY_GS=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-ecx -fcall-saved-edx"
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_PGTABLE_LEVELS=3
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
# CONFIG_KERNEL_GZIP is not set
# CONFIG_KERNEL_BZIP2 is not set
CONFIG_KERNEL_LZMA=y
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
# CONFIG_SYSVIPC is not set
CONFIG_POSIX_MQUEUE=y
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_FHANDLE=y
CONFIG_USELIB=y
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_IRQ_DOMAIN_DEBUG=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y

[lkp] [Input] 50fea9b0cf: kmsg.i8042:Unable_to_get_stable_CTR_read

2016-04-24 Thread kernel test robot
FYI, we noticed the following commit:

https://github.com/0day-ci/linux 
Mark-Laws/Input-i8042-Fix-console-keyboard-support-on-Gen2-Hyper-V-VMs/20160422-210451
commit 50fea9b0cfa3721f9320fd422942a662db568a29 ("Input: i8042 - Fix console 
keyboard support on Gen2 Hyper-V VMs")

on test machine: vm-lkp-wsx03-openwrt-i386: 1 threads qemu-system-i386 
-enable-kvm with 192M memory

caused below changes:


[2.178106] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 
0x60,0x64 irq 1,12
[2.178106] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 
0x60,0x64 irq 1,12
[2.180340] i8042: Unable to get stable CTR read
[2.180340] i8042: Unable to get stable CTR read
[2.181286] i8042: probe of i8042 failed with error -5
[2.181286] i8042: probe of i8042 failed with error -5


FYI, raw QEMU command line is:

qemu-system-i386 -enable-kvm -kernel 
/pkg/linux/i386-randconfig-i1-201616/gcc-5/50fea9b0cfa3721f9320fd422942a662db568a29/vmlinuz-4.5.0-00062-g50fea9b
 -append 'root=/dev/ram0 user=lkp 
job=/lkp/scheduled/vm-lkp-wsx03-openwrt-i386-6/bisect_boot-1-openwrt-i386.cgz-i386-randconfig-i1-201616-50fea9b0cfa3721f9320fd422942a662db568a29-20160422-17397-11xfb0e-1.yaml
 ARCH=i386 kconfig=i386-randconfig-i1-201616 
branch=linux-devel/devel-catchup-201604222108 
commit=50fea9b0cfa3721f9320fd422942a662db568a29 
BOOT_IMAGE=/pkg/linux/i386-randconfig-i1-201616/gcc-5/50fea9b0cfa3721f9320fd422942a662db568a29/vmlinuz-4.5.0-00062-g50fea9b
 max_uptime=600 
RESULT_ROOT=/result/boot/1/vm-lkp-wsx03-openwrt-i386/openwrt-i386.cgz/i386-randconfig-i1-201616/gcc-5/50fea9b0cfa3721f9320fd422942a662db568a29/0
 LKP_SERVER=inn earlyprintk=ttyS0,115200 systemd.log_level=err debug apic=debug 
sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 
softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 
prompt_ramdisk=0 console=ttyS0,115200 console=tty0 vga=normal rw 
ip=vm-lkp-wsx03-openwrt-i386-6::dhcp drbd.minor_count=8'  -initrd 
/fs/sdc1/initrd-vm-lkp-wsx03-openwrt-i386-6 -m 192 -smp 1 -device 
e1000,netdev=net0 -netdev user,id=net0 -boot order=nc -no-reboot -watchdog 
i6300esb -rtc base=localtime -drive 
file=/fs/sdc1/disk0-vm-lkp-wsx03-openwrt-i386-6,media=disk,if=virtio -drive 
file=/fs/sdc1/disk1-vm-lkp-wsx03-openwrt-i386-6,media=disk,if=virtio -pidfile 
/dev/shm/kboot/pid-vm-lkp-wsx03-openwrt-i386-6 -serial 
file:/dev/shm/kboot/serial-vm-lkp-wsx03-openwrt-i386-6 -daemonize -display none 
-monitor null 



Thanks,
Xiaolong
#
# Automatically generated file; DO NOT EDIT.
# Linux/i386 4.5.0 Kernel Configuration
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_PERF_EVENTS_INTEL_UNCORE=y
CONFIG_OUTPUT_FORMAT="elf32-i386"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_BITS_MAX=16
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_32_LAZY_GS=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-ecx -fcall-saved-edx"
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_PGTABLE_LEVELS=3
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
# CONFIG_KERNEL_GZIP is not set
# CONFIG_KERNEL_BZIP2 is not set
CONFIG_KERNEL_LZMA=y
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
# CONFIG_SYSVIPC is not set
CONFIG_POSIX_MQUEUE=y
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_FHANDLE=y
CONFIG_USELIB=y
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_IRQ_DOMAIN_DEBUG=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y

Re: [PATCH 25/41] Documentation: filesystems: fix spelling mistakes

2016-04-24 Thread Steve French
Reviewed-by: Steve French 

On Sun, Apr 24, 2016 at 7:24 PM, Eric Engestrom  wrote:
> Signed-off-by: Eric Engestrom 
> ---
>  Documentation/filesystems/autofs4.txt  | 6 +++---
>  Documentation/filesystems/cifs/CHANGES | 2 +-
>  Documentation/filesystems/proc.txt | 4 ++--
>  Documentation/filesystems/vfs.txt  | 2 +-
>  4 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/filesystems/autofs4.txt 
> b/Documentation/filesystems/autofs4.txt
> index 39d02e1..25fe9db 100644
> --- a/Documentation/filesystems/autofs4.txt
> +++ b/Documentation/filesystems/autofs4.txt
> @@ -225,7 +225,7 @@ unmount any filesystems mounted on the autofs filesystem 
> or remove any
>  symbolic links or empty directories any time it likes.  If the unmount
>  or removal is successful the filesystem will be returned to the state
>  it was before the mount or creation, so that any access of the name
> -will trigger normal auto-mount processing.  In particlar, `rmdir` and
> +will trigger normal auto-mount processing.  In particular, `rmdir` and
>  `unlink` do not leave negative entries in the dcache as a normal
>  filesystem would, so an attempt to access a recently-removed object is
>  passed to autofs for handling.
> @@ -242,7 +242,7 @@ time stamp on each directory or symlink.  For symlinks it 
> genuinely
>  does record the last time the symlink was "used" or followed to find
>  out where it points to.  For directories the field is a slight
>  misnomer.  It actually records the last time that autofs checked if
> -the directory or one of its descendents was busy and found that it
> +the directory or one of its descendants was busy and found that it
>  was.  This is just as useful and doesn't require updating the field so
>  often.
>
> @@ -255,7 +255,7 @@ up.
>
>  There is an option with indirect mounts to consider each of the leaves
>  that has been mounted on instead of considering the top-level names.
> -This is intended for compatability with version 4 of autofs and should
> +This is intended for compatibility with version 4 of autofs and should
>  be considered as deprecated.
>
>  When autofs considers a directory it checks the `last_used` time and
> diff --git a/Documentation/filesystems/cifs/CHANGES 
> b/Documentation/filesystems/cifs/CHANGES
> index bc0025c..fe8f1ed 100644
> --- a/Documentation/filesystems/cifs/CHANGES
> +++ b/Documentation/filesystems/cifs/CHANGES
> @@ -455,7 +455,7 @@ Fix internationalization problem in cifs readdir with 
> filenames that map to
>  longer UTF-8 strings than the string on the wire was in Unicode.  Add 
> workaround
>  for readdir to netapp servers. Fix search rewind (seek into readdir to return
>  non-consecutive entries).  Do not do readdir when server negotiates
> -buffer size to small to fit filename. Add support for reading POSIX ACLs from
> +buffer size too small to fit filename. Add support for reading POSIX ACLs 
> from
>  the server (add also acl and noacl mount options).
>
>  Version 1.24
> diff --git a/Documentation/filesystems/proc.txt 
> b/Documentation/filesystems/proc.txt
> index 7f5607a..03b6019 100644
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -462,7 +462,7 @@ accessed.
>  "Anonymous" shows the amount of memory that does not belong to any file.  
> Even
>  a mapping associated with a file may contain anonymous pages: when 
> MAP_PRIVATE
>  and a page is modified, the file page is replaced by a private anonymous 
> copy.
> -"AnonHugePages" shows the ammount of memory backed by transparent hugepage.
> +"AnonHugePages" shows the amount of memory backed by transparent hugepage.
>  "Shared_Hugetlb" and "Private_Hugetlb" show the ammounts of memory backed by
>  hugetlbfs page which is *not* counted in "RSS" or "PSS" field for historical
>  reasons. And these are not included in {Shared,Private}_{Clean,Dirty} field.
> @@ -1899,7 +1899,7 @@ hidepid=1 means users may not access any /proc// 
> directories but their
>  own.  Sensitive files like cmdline, sched*, status are now protected against
>  other users.  This makes it impossible to learn whether any user runs
>  specific program (given the program doesn't reveal itself by its behaviour).
> -As an additional bonus, as /proc//cmdline is unaccessible for other 
> users,
> +As an additional bonus, as /proc//cmdline is inaccessible for other 
> users,
>  poorly written programs passing sensitive information via program arguments 
> are
>  now protected against local eavesdroppers.
>
> diff --git a/Documentation/filesystems/vfs.txt 
> b/Documentation/filesystems/vfs.txt
> index 4164bd6..ec67866 100644
> --- a/Documentation/filesystems/vfs.txt
> +++ b/Documentation/filesystems/vfs.txt
> @@ -1014,7 +1014,7 @@ struct dentry_operations {
> Useful for some pseudo filesystems (sockfs, pipefs, ...) to delay
> pathname generation. (Instead of doing it when dentry is created,
> 

Re: [PATCH 25/41] Documentation: filesystems: fix spelling mistakes

2016-04-24 Thread Steve French
Reviewed-by: Steve French 

On Sun, Apr 24, 2016 at 7:24 PM, Eric Engestrom  wrote:
> Signed-off-by: Eric Engestrom 
> ---
>  Documentation/filesystems/autofs4.txt  | 6 +++---
>  Documentation/filesystems/cifs/CHANGES | 2 +-
>  Documentation/filesystems/proc.txt | 4 ++--
>  Documentation/filesystems/vfs.txt  | 2 +-
>  4 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/filesystems/autofs4.txt 
> b/Documentation/filesystems/autofs4.txt
> index 39d02e1..25fe9db 100644
> --- a/Documentation/filesystems/autofs4.txt
> +++ b/Documentation/filesystems/autofs4.txt
> @@ -225,7 +225,7 @@ unmount any filesystems mounted on the autofs filesystem 
> or remove any
>  symbolic links or empty directories any time it likes.  If the unmount
>  or removal is successful the filesystem will be returned to the state
>  it was before the mount or creation, so that any access of the name
> -will trigger normal auto-mount processing.  In particlar, `rmdir` and
> +will trigger normal auto-mount processing.  In particular, `rmdir` and
>  `unlink` do not leave negative entries in the dcache as a normal
>  filesystem would, so an attempt to access a recently-removed object is
>  passed to autofs for handling.
> @@ -242,7 +242,7 @@ time stamp on each directory or symlink.  For symlinks it 
> genuinely
>  does record the last time the symlink was "used" or followed to find
>  out where it points to.  For directories the field is a slight
>  misnomer.  It actually records the last time that autofs checked if
> -the directory or one of its descendents was busy and found that it
> +the directory or one of its descendants was busy and found that it
>  was.  This is just as useful and doesn't require updating the field so
>  often.
>
> @@ -255,7 +255,7 @@ up.
>
>  There is an option with indirect mounts to consider each of the leaves
>  that has been mounted on instead of considering the top-level names.
> -This is intended for compatability with version 4 of autofs and should
> +This is intended for compatibility with version 4 of autofs and should
>  be considered as deprecated.
>
>  When autofs considers a directory it checks the `last_used` time and
> diff --git a/Documentation/filesystems/cifs/CHANGES 
> b/Documentation/filesystems/cifs/CHANGES
> index bc0025c..fe8f1ed 100644
> --- a/Documentation/filesystems/cifs/CHANGES
> +++ b/Documentation/filesystems/cifs/CHANGES
> @@ -455,7 +455,7 @@ Fix internationalization problem in cifs readdir with 
> filenames that map to
>  longer UTF-8 strings than the string on the wire was in Unicode.  Add 
> workaround
>  for readdir to netapp servers. Fix search rewind (seek into readdir to return
>  non-consecutive entries).  Do not do readdir when server negotiates
> -buffer size to small to fit filename. Add support for reading POSIX ACLs from
> +buffer size too small to fit filename. Add support for reading POSIX ACLs 
> from
>  the server (add also acl and noacl mount options).
>
>  Version 1.24
> diff --git a/Documentation/filesystems/proc.txt 
> b/Documentation/filesystems/proc.txt
> index 7f5607a..03b6019 100644
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -462,7 +462,7 @@ accessed.
>  "Anonymous" shows the amount of memory that does not belong to any file.  
> Even
>  a mapping associated with a file may contain anonymous pages: when 
> MAP_PRIVATE
>  and a page is modified, the file page is replaced by a private anonymous 
> copy.
> -"AnonHugePages" shows the ammount of memory backed by transparent hugepage.
> +"AnonHugePages" shows the amount of memory backed by transparent hugepage.
>  "Shared_Hugetlb" and "Private_Hugetlb" show the ammounts of memory backed by
>  hugetlbfs page which is *not* counted in "RSS" or "PSS" field for historical
>  reasons. And these are not included in {Shared,Private}_{Clean,Dirty} field.
> @@ -1899,7 +1899,7 @@ hidepid=1 means users may not access any /proc// 
> directories but their
>  own.  Sensitive files like cmdline, sched*, status are now protected against
>  other users.  This makes it impossible to learn whether any user runs
>  specific program (given the program doesn't reveal itself by its behaviour).
> -As an additional bonus, as /proc//cmdline is unaccessible for other 
> users,
> +As an additional bonus, as /proc//cmdline is inaccessible for other 
> users,
>  poorly written programs passing sensitive information via program arguments 
> are
>  now protected against local eavesdroppers.
>
> diff --git a/Documentation/filesystems/vfs.txt 
> b/Documentation/filesystems/vfs.txt
> index 4164bd6..ec67866 100644
> --- a/Documentation/filesystems/vfs.txt
> +++ b/Documentation/filesystems/vfs.txt
> @@ -1014,7 +1014,7 @@ struct dentry_operations {
> Useful for some pseudo filesystems (sockfs, pipefs, ...) to delay
> pathname generation. (Instead of doing it when dentry is created,
> it's done only when the path is needed.). Real filesystems probably

Re: [PATCH RESEND v2 4/4] ARM64: dts: rockchip: add dts file for RK3399 evaluation board

2016-04-24 Thread Caesar Wang



在 2016年04月22日 13:51, Jianqun Xu 写道:

This patch add rk3399-evb.dts for RK3399 evaluation board.
Tested on RK3399 evb.

Signed-off-by: Jianqun Xu 
---
changes in v2:
- remove rk808 since without i2c, which will upstream independently
- remove es8316 since without i2c, which will upstream independently
- fix codingstyle issues

  arch/arm64/boot/dts/rockchip/Makefile   |   1 +
  arch/arm64/boot/dts/rockchip/rk3399-evb.dts | 122 
  2 files changed, 123 insertions(+)
  create mode 100644 arch/arm64/boot/dts/rockchip/rk3399-evb.dts

diff --git a/arch/arm64/boot/dts/rockchip/Makefile 
b/arch/arm64/boot/dts/rockchip/Makefile
index df37865..7037a16 100644
--- a/arch/arm64/boot/dts/rockchip/Makefile
+++ b/arch/arm64/boot/dts/rockchip/Makefile
@@ -1,6 +1,7 @@
  dtb-$(CONFIG_ARCH_ROCKCHIP) += rk3368-evb-act8846.dtb
  dtb-$(CONFIG_ARCH_ROCKCHIP) += rk3368-geekbox.dtb
  dtb-$(CONFIG_ARCH_ROCKCHIP) += rk3368-r88.dtb
+dtb-$(CONFIG_ARCH_ROCKCHIP) += rk3399-evb.dtb
  
  always		:= $(dtb-y)

  subdir-y  := $(dts-dirs)
diff --git a/arch/arm64/boot/dts/rockchip/rk3399-evb.dts 
b/arch/arm64/boot/dts/rockchip/rk3399-evb.dts
new file mode 100644
index 000..309f870
--- /dev/null
+++ b/arch/arm64/boot/dts/rockchip/rk3399-evb.dts
@@ -0,0 +1,122 @@
+/*
+ * Copyright (c) 2016 Fuzhou Rockchip Electronics Co., Ltd
+ *
+ * This file is dual-licensed: you can use it either under the terms
+ * of the GPL or the X11 license, at your option. Note that this dual
+ * licensing only applies to this file, and not this project as a
+ * whole.
+ *
+ *  a) This file is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ *
+ * This file is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Or, alternatively,
+ *
+ *  b) Permission is hereby granted, free of charge, to any person
+ * obtaining a copy of this software and associated documentation
+ * files (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use,
+ * copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following
+ * conditions:
+ *
+ * The above copyright notice and this permission notice shall be
+ * included in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/dts-v1/;
+#include 
+#include "rk3399.dtsi"
+
+/ {
+   compatible = "rockchip,rk3399-evb", "rockchip,rk3399";


I pick them up in my github. 
(https://github.com/Caesar-github/rockchip/tree/wip/fixes-thermal-0425)


Can we add the following strings to match the loader (coreboot)?
 No matter, we can match the loader(coreboot) to bring up the evb board 
on now or in the future.


...,"google,rk3399evb-rev3", 
google,rk3399evb-rev2",google,rk3399evb-rev1","google,rk3399evb-rev0"


Bring up my evb board:
...
Compat preference: google,rk3399evb-rev0
Config conf@4, kernel kernel@1, fdt fdt@4, compat google,rk3399evb-rev0 
(match) rockchip,rk3399-evb rockchip,rk3399
Config conf@3, kernel kernel@1, fdt fdt@3, compat rockchip,r88 
rockchip,rk3368
Config conf@2, kernel kernel@1, fdt fdt@2, compat geekbuying,geekbox 
rockchip,rk3368
Config conf@1 (default), kernel kernel@1, fdt fdt@1, compat 
rockchip,rk3368-evb-act8846 rockchip,rk3368

Choosing best match conf@4.
Shutting down all USB controllers.
Exiting depthcharge with code 4 at timestamp: 6031792
WARNING: Skipping low memory range [0x0:0x50]!
Relocating kernel to 0x68
jumping to kernel
[0.00] Booting Linux on physical CPU 0x0
[0.00] Linux version 
4.6.0-rc4-next-20160422-00016-g0ac0bfb-dirty (wxt@nb) (gcc version 
4.9.x-google 20140827 (prerelease) (GCC) ) #16 SMP PR




--



-
Caesar

--
Thanks,
Caesar



Re: [PATCH RESEND v2 4/4] ARM64: dts: rockchip: add dts file for RK3399 evaluation board

2016-04-24 Thread Caesar Wang



在 2016年04月22日 13:51, Jianqun Xu 写道:

This patch add rk3399-evb.dts for RK3399 evaluation board.
Tested on RK3399 evb.

Signed-off-by: Jianqun Xu 
---
changes in v2:
- remove rk808 since without i2c, which will upstream independently
- remove es8316 since without i2c, which will upstream independently
- fix codingstyle issues

  arch/arm64/boot/dts/rockchip/Makefile   |   1 +
  arch/arm64/boot/dts/rockchip/rk3399-evb.dts | 122 
  2 files changed, 123 insertions(+)
  create mode 100644 arch/arm64/boot/dts/rockchip/rk3399-evb.dts

diff --git a/arch/arm64/boot/dts/rockchip/Makefile 
b/arch/arm64/boot/dts/rockchip/Makefile
index df37865..7037a16 100644
--- a/arch/arm64/boot/dts/rockchip/Makefile
+++ b/arch/arm64/boot/dts/rockchip/Makefile
@@ -1,6 +1,7 @@
  dtb-$(CONFIG_ARCH_ROCKCHIP) += rk3368-evb-act8846.dtb
  dtb-$(CONFIG_ARCH_ROCKCHIP) += rk3368-geekbox.dtb
  dtb-$(CONFIG_ARCH_ROCKCHIP) += rk3368-r88.dtb
+dtb-$(CONFIG_ARCH_ROCKCHIP) += rk3399-evb.dtb
  
  always		:= $(dtb-y)

  subdir-y  := $(dts-dirs)
diff --git a/arch/arm64/boot/dts/rockchip/rk3399-evb.dts 
b/arch/arm64/boot/dts/rockchip/rk3399-evb.dts
new file mode 100644
index 000..309f870
--- /dev/null
+++ b/arch/arm64/boot/dts/rockchip/rk3399-evb.dts
@@ -0,0 +1,122 @@
+/*
+ * Copyright (c) 2016 Fuzhou Rockchip Electronics Co., Ltd
+ *
+ * This file is dual-licensed: you can use it either under the terms
+ * of the GPL or the X11 license, at your option. Note that this dual
+ * licensing only applies to this file, and not this project as a
+ * whole.
+ *
+ *  a) This file is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ *
+ * This file is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Or, alternatively,
+ *
+ *  b) Permission is hereby granted, free of charge, to any person
+ * obtaining a copy of this software and associated documentation
+ * files (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use,
+ * copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following
+ * conditions:
+ *
+ * The above copyright notice and this permission notice shall be
+ * included in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/dts-v1/;
+#include 
+#include "rk3399.dtsi"
+
+/ {
+   compatible = "rockchip,rk3399-evb", "rockchip,rk3399";


I pick them up in my github. 
(https://github.com/Caesar-github/rockchip/tree/wip/fixes-thermal-0425)


Can we add the following strings to match the loader (coreboot)?
 No matter, we can match the loader(coreboot) to bring up the evb board 
on now or in the future.


...,"google,rk3399evb-rev3", 
google,rk3399evb-rev2",google,rk3399evb-rev1","google,rk3399evb-rev0"


Bring up my evb board:
...
Compat preference: google,rk3399evb-rev0
Config conf@4, kernel kernel@1, fdt fdt@4, compat google,rk3399evb-rev0 
(match) rockchip,rk3399-evb rockchip,rk3399
Config conf@3, kernel kernel@1, fdt fdt@3, compat rockchip,r88 
rockchip,rk3368
Config conf@2, kernel kernel@1, fdt fdt@2, compat geekbuying,geekbox 
rockchip,rk3368
Config conf@1 (default), kernel kernel@1, fdt fdt@1, compat 
rockchip,rk3368-evb-act8846 rockchip,rk3368

Choosing best match conf@4.
Shutting down all USB controllers.
Exiting depthcharge with code 4 at timestamp: 6031792
WARNING: Skipping low memory range [0x0:0x50]!
Relocating kernel to 0x68
jumping to kernel
[0.00] Booting Linux on physical CPU 0x0
[0.00] Linux version 
4.6.0-rc4-next-20160422-00016-g0ac0bfb-dirty (wxt@nb) (gcc version 
4.9.x-google 20140827 (prerelease) (GCC) ) #16 SMP PR




--



-
Caesar

--
Thanks,
Caesar



  1   2   3   4   5   6   >