Re: [PATCH v3 17/17] mm, vmscan: make compaction_ready() more accurate and readable

2016-07-05 Thread Joonsoo Kim
On Fri, Jun 24, 2016 at 11:54:37AM +0200, Vlastimil Babka wrote:
> The compaction_ready() is used during direct reclaim for costly order
> allocations to skip reclaim for zones where compaction should be attempted
> instead. It's combining the standard compaction_suitable() check with its own
> watermark check based on high watermark with extra gap, and the result is
> confusing at best.
> 
> This patch attempts to better structure and document the checks involved.
> First, compaction_suitable() can determine that the allocation should either
> succeed already, or that compaction doesn't have enough free pages to proceed.
> The third possibility is that compaction has enough free pages, but we still
> decide to reclaim first - unless we are already above the high watermark with
> gap.  This does not mean that the reclaim will actually reach this watermark
> during single attempt, this is rather an over-reclaim protection. So document
> the code as such. The check for compaction_deferred() is removed completely, 
> as
> it in fact had no proper role here.
> 
> The result after this patch is mainly a less confusing code. We also skip some
> over-reclaim in cases where the allocation should already succed.
> 
> Signed-off-by: Vlastimil Babka 
> Acked-by: Michal Hocko 
> ---
>  mm/vmscan.c | 43 ---
>  1 file changed, 20 insertions(+), 23 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 484ff05d5a8f..724131661f0c 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2462,40 +2462,37 @@ static bool shrink_zone(struct zone *zone, struct 
> scan_control *sc,
>  }
>  
>  /*
> - * Returns true if compaction should go ahead for a high-order request, or
> - * the high-order allocation would succeed without compaction.
> + * Returns true if compaction should go ahead for a costly-order request, or
> + * the allocation would already succeed without compaction. Return false if 
> we
> + * should reclaim first.
>   */
>  static inline bool compaction_ready(struct zone *zone, int order, int 
> classzone_idx)
>  {
>   unsigned long balance_gap, watermark;
> - bool watermark_ok;
> + enum compact_result suitable;
> +
> + suitable = compaction_suitable(zone, order, 0, classzone_idx);
> + if (suitable == COMPACT_PARTIAL)
> + /* Allocation should succeed already. Don't reclaim. */
> + return true;
> + if (suitable == COMPACT_SKIPPED)
> + /* Compaction cannot yet proceed. Do reclaim. */
> + return false;
>  
>   /*
> -  * Compaction takes time to run and there are potentially other
> -  * callers using the pages just freed. Continue reclaiming until
> -  * there is a buffer of free pages available to give compaction
> -  * a reasonable chance of completing and allocating the page
> +  * Compaction is already possible, but it takes time to run and there
> +  * are potentially other callers using the pages just freed. So proceed
> +  * with reclaim to make a buffer of free pages available to give
> +  * compaction a reasonable chance of completing and allocating the page.
> +  * Note that we won't actually reclaim the whole buffer in one attempt
> +  * as the target watermark in should_continue_reclaim() is lower. But if
> +  * we are already above the high+gap watermark, don't reclaim at all.
>*/
>   balance_gap = min(low_wmark_pages(zone), DIV_ROUND_UP(
>   zone->managed_pages, KSWAPD_ZONE_BALANCE_GAP_RATIO));
>   watermark = high_wmark_pages(zone) + balance_gap + compact_gap(order);
> - watermark_ok = zone_watermark_ok_safe(zone, 0, watermark, 
> classzone_idx);

Hmm... it doesn't explain why both high_wmark_pages and balance_gap
are needed. If we want to make a buffer, one of them would work.

Thanks.

> -
> - /*
> -  * If compaction is deferred, reclaim up to a point where
> -  * compaction will have a chance of success when re-enabled
> -  */
> - if (compaction_deferred(zone, order))
> - return watermark_ok;
> -
> - /*
> -  * If compaction is not ready to start and allocation is not likely
> -  * to succeed without it, then keep reclaiming.
> -  */
> - if (compaction_suitable(zone, order, 0, classzone_idx) == 
> COMPACT_SKIPPED)
> - return false;
>  
> - return watermark_ok;
> + return zone_watermark_ok_safe(zone, 0, watermark, classzone_idx);
>  }
>  
>  /*
> -- 
> 2.8.4


Re: [PATCH v3 17/17] mm, vmscan: make compaction_ready() more accurate and readable

2016-07-05 Thread Joonsoo Kim
On Fri, Jun 24, 2016 at 11:54:37AM +0200, Vlastimil Babka wrote:
> The compaction_ready() is used during direct reclaim for costly order
> allocations to skip reclaim for zones where compaction should be attempted
> instead. It's combining the standard compaction_suitable() check with its own
> watermark check based on high watermark with extra gap, and the result is
> confusing at best.
> 
> This patch attempts to better structure and document the checks involved.
> First, compaction_suitable() can determine that the allocation should either
> succeed already, or that compaction doesn't have enough free pages to proceed.
> The third possibility is that compaction has enough free pages, but we still
> decide to reclaim first - unless we are already above the high watermark with
> gap.  This does not mean that the reclaim will actually reach this watermark
> during single attempt, this is rather an over-reclaim protection. So document
> the code as such. The check for compaction_deferred() is removed completely, 
> as
> it in fact had no proper role here.
> 
> The result after this patch is mainly a less confusing code. We also skip some
> over-reclaim in cases where the allocation should already succed.
> 
> Signed-off-by: Vlastimil Babka 
> Acked-by: Michal Hocko 
> ---
>  mm/vmscan.c | 43 ---
>  1 file changed, 20 insertions(+), 23 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 484ff05d5a8f..724131661f0c 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2462,40 +2462,37 @@ static bool shrink_zone(struct zone *zone, struct 
> scan_control *sc,
>  }
>  
>  /*
> - * Returns true if compaction should go ahead for a high-order request, or
> - * the high-order allocation would succeed without compaction.
> + * Returns true if compaction should go ahead for a costly-order request, or
> + * the allocation would already succeed without compaction. Return false if 
> we
> + * should reclaim first.
>   */
>  static inline bool compaction_ready(struct zone *zone, int order, int 
> classzone_idx)
>  {
>   unsigned long balance_gap, watermark;
> - bool watermark_ok;
> + enum compact_result suitable;
> +
> + suitable = compaction_suitable(zone, order, 0, classzone_idx);
> + if (suitable == COMPACT_PARTIAL)
> + /* Allocation should succeed already. Don't reclaim. */
> + return true;
> + if (suitable == COMPACT_SKIPPED)
> + /* Compaction cannot yet proceed. Do reclaim. */
> + return false;
>  
>   /*
> -  * Compaction takes time to run and there are potentially other
> -  * callers using the pages just freed. Continue reclaiming until
> -  * there is a buffer of free pages available to give compaction
> -  * a reasonable chance of completing and allocating the page
> +  * Compaction is already possible, but it takes time to run and there
> +  * are potentially other callers using the pages just freed. So proceed
> +  * with reclaim to make a buffer of free pages available to give
> +  * compaction a reasonable chance of completing and allocating the page.
> +  * Note that we won't actually reclaim the whole buffer in one attempt
> +  * as the target watermark in should_continue_reclaim() is lower. But if
> +  * we are already above the high+gap watermark, don't reclaim at all.
>*/
>   balance_gap = min(low_wmark_pages(zone), DIV_ROUND_UP(
>   zone->managed_pages, KSWAPD_ZONE_BALANCE_GAP_RATIO));
>   watermark = high_wmark_pages(zone) + balance_gap + compact_gap(order);
> - watermark_ok = zone_watermark_ok_safe(zone, 0, watermark, 
> classzone_idx);

Hmm... it doesn't explain why both high_wmark_pages and balance_gap
are needed. If we want to make a buffer, one of them would work.

Thanks.

> -
> - /*
> -  * If compaction is deferred, reclaim up to a point where
> -  * compaction will have a chance of success when re-enabled
> -  */
> - if (compaction_deferred(zone, order))
> - return watermark_ok;
> -
> - /*
> -  * If compaction is not ready to start and allocation is not likely
> -  * to succeed without it, then keep reclaiming.
> -  */
> - if (compaction_suitable(zone, order, 0, classzone_idx) == 
> COMPACT_SKIPPED)
> - return false;
>  
> - return watermark_ok;
> + return zone_watermark_ok_safe(zone, 0, watermark, classzone_idx);
>  }
>  
>  /*
> -- 
> 2.8.4


Re: [PATCH v3 13/17] mm, compaction: use correct watermark when checking allocation success

2016-07-05 Thread Joonsoo Kim
On Fri, Jun 24, 2016 at 11:54:33AM +0200, Vlastimil Babka wrote:
> The __compact_finished() function uses low watermark in a check that has to
> pass if the direct compaction is to finish and allocation should succeed. This
> is too pessimistic, as the allocation will typically use min watermark. It may
> happen that during compaction, we drop below the low watermark (due to 
> parallel
> activity), but still form the target high-order page. By checking against low
> watermark, we might needlessly continue compaction.
> 
> Similarly, __compaction_suitable() uses low watermark in a check whether
> allocation can succeed without compaction. Again, this is unnecessarily
> pessimistic.
> 
> After this patch, these check will use direct compactor's alloc_flags to
> determine the watermark, which is effectively the min watermark.
> 
> Signed-off-by: Vlastimil Babka 
> Acked-by: Michal Hocko 
> ---
>  mm/compaction.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 76897850c3c2..371760a85085 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1320,7 +1320,7 @@ static enum compact_result __compact_finished(struct 
> zone *zone, struct compact_
>   return COMPACT_CONTINUE;
>  
>   /* Compaction run is not finished if the watermark is not met */
> - watermark = low_wmark_pages(zone);
> + watermark = zone->watermark[cc->alloc_flags & ALLOC_WMARK_MASK];

finish condition is changed. We have two more watermark checks in
try_to_compact_pages() and kcompactd_do_work() and they should be
changed too.

Thanks.
>  
>   if (!zone_watermark_ok(zone, cc->order, watermark, cc->classzone_idx,
>   cc->alloc_flags))
> @@ -1385,7 +1385,7 @@ static enum compact_result __compaction_suitable(struct 
> zone *zone, int order,
>   if (is_via_compact_memory(order))
>   return COMPACT_CONTINUE;
>  
> - watermark = low_wmark_pages(zone);
> + watermark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];
>   /*
>* If watermarks for high-order allocation are already met, there
>* should be no need for compaction at all.
> @@ -1399,7 +1399,7 @@ static enum compact_result __compaction_suitable(struct 
> zone *zone, int order,
>* This is because during migration, copies of pages need to be
>* allocated and for a short time, the footprint is higher
>*/
> - watermark += (2UL << order);
> + watermark = low_wmark_pages(zone) + (2UL << order);
>   if (!__zone_watermark_ok(zone, 0, watermark, classzone_idx,
>alloc_flags, wmark_target))
>   return COMPACT_SKIPPED;
> -- 
> 2.8.4


Re: [PATCH v3 13/17] mm, compaction: use correct watermark when checking allocation success

2016-07-05 Thread Joonsoo Kim
On Fri, Jun 24, 2016 at 11:54:33AM +0200, Vlastimil Babka wrote:
> The __compact_finished() function uses low watermark in a check that has to
> pass if the direct compaction is to finish and allocation should succeed. This
> is too pessimistic, as the allocation will typically use min watermark. It may
> happen that during compaction, we drop below the low watermark (due to 
> parallel
> activity), but still form the target high-order page. By checking against low
> watermark, we might needlessly continue compaction.
> 
> Similarly, __compaction_suitable() uses low watermark in a check whether
> allocation can succeed without compaction. Again, this is unnecessarily
> pessimistic.
> 
> After this patch, these check will use direct compactor's alloc_flags to
> determine the watermark, which is effectively the min watermark.
> 
> Signed-off-by: Vlastimil Babka 
> Acked-by: Michal Hocko 
> ---
>  mm/compaction.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 76897850c3c2..371760a85085 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1320,7 +1320,7 @@ static enum compact_result __compact_finished(struct 
> zone *zone, struct compact_
>   return COMPACT_CONTINUE;
>  
>   /* Compaction run is not finished if the watermark is not met */
> - watermark = low_wmark_pages(zone);
> + watermark = zone->watermark[cc->alloc_flags & ALLOC_WMARK_MASK];

finish condition is changed. We have two more watermark checks in
try_to_compact_pages() and kcompactd_do_work() and they should be
changed too.

Thanks.
>  
>   if (!zone_watermark_ok(zone, cc->order, watermark, cc->classzone_idx,
>   cc->alloc_flags))
> @@ -1385,7 +1385,7 @@ static enum compact_result __compaction_suitable(struct 
> zone *zone, int order,
>   if (is_via_compact_memory(order))
>   return COMPACT_CONTINUE;
>  
> - watermark = low_wmark_pages(zone);
> + watermark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];
>   /*
>* If watermarks for high-order allocation are already met, there
>* should be no need for compaction at all.
> @@ -1399,7 +1399,7 @@ static enum compact_result __compaction_suitable(struct 
> zone *zone, int order,
>* This is because during migration, copies of pages need to be
>* allocated and for a short time, the footprint is higher
>*/
> - watermark += (2UL << order);
> + watermark = low_wmark_pages(zone) + (2UL << order);
>   if (!__zone_watermark_ok(zone, 0, watermark, classzone_idx,
>alloc_flags, wmark_target))
>   return COMPACT_SKIPPED;
> -- 
> 2.8.4


Re: [PATCH v7 1/4] soc: mediatek: Refine scpsys to support multiple platform

2016-07-05 Thread James Liao
Hi Matthias,

On Sat, 2016-07-02 at 18:33 +0200, Matthias Brugger wrote:
> 
> On 05/16/2016 11:28 AM, James Liao wrote:
> > Refine scpsys driver common code to support multiple SoC / platform.
> >
> > Signed-off-by: James Liao 
> > Reviewed-by: Kevin Hilman 
> > ---
> >  drivers/soc/mediatek/mtk-scpsys.c | 363 
> > +++---
> >  1 file changed, 220 insertions(+), 143 deletions(-)
> >
> > diff --git a/drivers/soc/mediatek/mtk-scpsys.c 
> > b/drivers/soc/mediatek/mtk-scpsys.c
> > index 57e781c..5870a24 100644
> > --- a/drivers/soc/mediatek/mtk-scpsys.c
> > +++ b/drivers/soc/mediatek/mtk-scpsys.c
> > @@ -11,17 +11,15 @@
> >   * GNU General Public License for more details.
> >   */
> >  #include 
> > -#include 
> > +#include 
> >  #include 
> > -#include 
> >  #include 
> > -#include 
> >  #include 
> >  #include 
> >  #include 
> > -#include 
> > -#include 
> >  #include 
> > +#include 
> > +
> >  #include 
> >
> >  #define SPM_VDE_PWR_CON0x0210
> > @@ -34,6 +32,7 @@
> >  #define SPM_MFG_2D_PWR_CON 0x02c0
> >  #define SPM_MFG_ASYNC_PWR_CON  0x02c4
> >  #define SPM_USB_PWR_CON0x02cc
> > +
> >  #define SPM_PWR_STATUS 0x060c
> >  #define SPM_PWR_STATUS_2ND 0x0610
> >
> > @@ -55,12 +54,12 @@
> >  #define PWR_STATUS_USB BIT(25)
> >
> >  enum clk_id {
> > -   MT8173_CLK_NONE,
> > -   MT8173_CLK_MM,
> > -   MT8173_CLK_MFG,
> > -   MT8173_CLK_VENC,
> > -   MT8173_CLK_VENC_LT,
> > -   MT8173_CLK_MAX,
> > +   CLK_NONE,
> > +   CLK_MM,
> > +   CLK_MFG,
> > +   CLK_VENC,
> > +   CLK_VENC_LT,
> > +   CLK_MAX,
> >  };
> >
> >  #define MAX_CLKS   2
> > @@ -76,98 +75,6 @@ struct scp_domain_data {
> > bool active_wakeup;
> >  };
> >
> > -static const struct scp_domain_data scp_domain_data[] = {
> > -   [MT8173_POWER_DOMAIN_VDEC] = {
> > -   .name = "vdec",
> > -   .sta_mask = PWR_STATUS_VDEC,
> > -   .ctl_offs = SPM_VDE_PWR_CON,
> > -   .sram_pdn_bits = GENMASK(11, 8),
> > -   .sram_pdn_ack_bits = GENMASK(12, 12),
> > -   .clk_id = {MT8173_CLK_MM},
> > -   },
> > -   [MT8173_POWER_DOMAIN_VENC] = {
> > -   .name = "venc",
> > -   .sta_mask = PWR_STATUS_VENC,
> > -   .ctl_offs = SPM_VEN_PWR_CON,
> > -   .sram_pdn_bits = GENMASK(11, 8),
> > -   .sram_pdn_ack_bits = GENMASK(15, 12),
> > -   .clk_id = {MT8173_CLK_MM, MT8173_CLK_VENC},
> > -   },
> > -   [MT8173_POWER_DOMAIN_ISP] = {
> > -   .name = "isp",
> > -   .sta_mask = PWR_STATUS_ISP,
> > -   .ctl_offs = SPM_ISP_PWR_CON,
> > -   .sram_pdn_bits = GENMASK(11, 8),
> > -   .sram_pdn_ack_bits = GENMASK(13, 12),
> > -   .clk_id = {MT8173_CLK_MM},
> > -   },
> > -   [MT8173_POWER_DOMAIN_MM] = {
> > -   .name = "mm",
> > -   .sta_mask = PWR_STATUS_DISP,
> > -   .ctl_offs = SPM_DIS_PWR_CON,
> > -   .sram_pdn_bits = GENMASK(11, 8),
> > -   .sram_pdn_ack_bits = GENMASK(12, 12),
> > -   .clk_id = {MT8173_CLK_MM},
> > -   .bus_prot_mask = MT8173_TOP_AXI_PROT_EN_MM_M0 |
> > -   MT8173_TOP_AXI_PROT_EN_MM_M1,
> > -   },
> > -   [MT8173_POWER_DOMAIN_VENC_LT] = {
> > -   .name = "venc_lt",
> > -   .sta_mask = PWR_STATUS_VENC_LT,
> > -   .ctl_offs = SPM_VEN2_PWR_CON,
> > -   .sram_pdn_bits = GENMASK(11, 8),
> > -   .sram_pdn_ack_bits = GENMASK(15, 12),
> > -   .clk_id = {MT8173_CLK_MM, MT8173_CLK_VENC_LT},
> > -   },
> > -   [MT8173_POWER_DOMAIN_AUDIO] = {
> > -   .name = "audio",
> > -   .sta_mask = PWR_STATUS_AUDIO,
> > -   .ctl_offs = SPM_AUDIO_PWR_CON,
> > -   .sram_pdn_bits = GENMASK(11, 8),
> > -   .sram_pdn_ack_bits = GENMASK(15, 12),
> > -   .clk_id = {MT8173_CLK_NONE},
> > -   },
> > -   [MT8173_POWER_DOMAIN_USB] = {
> > -   .name = "usb",
> > -   .sta_mask = PWR_STATUS_USB,
> > -   .ctl_offs = SPM_USB_PWR_CON,
> > -   .sram_pdn_bits = GENMASK(11, 8),
> > -   .sram_pdn_ack_bits = GENMASK(15, 12),
> > -   .clk_id = {MT8173_CLK_NONE},
> > -   .active_wakeup = true,
> > -   },
> > -   [MT8173_POWER_DOMAIN_MFG_ASYNC] = {
> > -   .name = "mfg_async",
> > -   .sta_mask = PWR_STATUS_MFG_ASYNC,
> > -   .ctl_offs = SPM_MFG_ASYNC_PWR_CON,
> > -   .sram_pdn_bits = GENMASK(11, 8),
> > -   .sram_pdn_ack_bits = 0,
> > -   .clk_id = {MT8173_CLK_MFG},
> > -   },
> > -   [MT8173_POWER_DOMAIN_MFG_2D] = {
> > -   .name = "mfg_2d",
> > -   .sta_mask = PWR_STATUS_MFG_2D,
> > -   .ctl_offs = SPM_MFG_2D_PWR_CON,
> > -   .sram_pdn_bits = GENMASK(11, 8),
> > -   .sram_pdn_ack_bits = GENMASK(13, 12),
> > -   .clk_id = {MT8173_CLK_NONE},
> > -   },
> > -   

Re: [PATCH v7 1/4] soc: mediatek: Refine scpsys to support multiple platform

2016-07-05 Thread James Liao
Hi Matthias,

On Sat, 2016-07-02 at 18:33 +0200, Matthias Brugger wrote:
> 
> On 05/16/2016 11:28 AM, James Liao wrote:
> > Refine scpsys driver common code to support multiple SoC / platform.
> >
> > Signed-off-by: James Liao 
> > Reviewed-by: Kevin Hilman 
> > ---
> >  drivers/soc/mediatek/mtk-scpsys.c | 363 
> > +++---
> >  1 file changed, 220 insertions(+), 143 deletions(-)
> >
> > diff --git a/drivers/soc/mediatek/mtk-scpsys.c 
> > b/drivers/soc/mediatek/mtk-scpsys.c
> > index 57e781c..5870a24 100644
> > --- a/drivers/soc/mediatek/mtk-scpsys.c
> > +++ b/drivers/soc/mediatek/mtk-scpsys.c
> > @@ -11,17 +11,15 @@
> >   * GNU General Public License for more details.
> >   */
> >  #include 
> > -#include 
> > +#include 
> >  #include 
> > -#include 
> >  #include 
> > -#include 
> >  #include 
> >  #include 
> >  #include 
> > -#include 
> > -#include 
> >  #include 
> > +#include 
> > +
> >  #include 
> >
> >  #define SPM_VDE_PWR_CON0x0210
> > @@ -34,6 +32,7 @@
> >  #define SPM_MFG_2D_PWR_CON 0x02c0
> >  #define SPM_MFG_ASYNC_PWR_CON  0x02c4
> >  #define SPM_USB_PWR_CON0x02cc
> > +
> >  #define SPM_PWR_STATUS 0x060c
> >  #define SPM_PWR_STATUS_2ND 0x0610
> >
> > @@ -55,12 +54,12 @@
> >  #define PWR_STATUS_USB BIT(25)
> >
> >  enum clk_id {
> > -   MT8173_CLK_NONE,
> > -   MT8173_CLK_MM,
> > -   MT8173_CLK_MFG,
> > -   MT8173_CLK_VENC,
> > -   MT8173_CLK_VENC_LT,
> > -   MT8173_CLK_MAX,
> > +   CLK_NONE,
> > +   CLK_MM,
> > +   CLK_MFG,
> > +   CLK_VENC,
> > +   CLK_VENC_LT,
> > +   CLK_MAX,
> >  };
> >
> >  #define MAX_CLKS   2
> > @@ -76,98 +75,6 @@ struct scp_domain_data {
> > bool active_wakeup;
> >  };
> >
> > -static const struct scp_domain_data scp_domain_data[] = {
> > -   [MT8173_POWER_DOMAIN_VDEC] = {
> > -   .name = "vdec",
> > -   .sta_mask = PWR_STATUS_VDEC,
> > -   .ctl_offs = SPM_VDE_PWR_CON,
> > -   .sram_pdn_bits = GENMASK(11, 8),
> > -   .sram_pdn_ack_bits = GENMASK(12, 12),
> > -   .clk_id = {MT8173_CLK_MM},
> > -   },
> > -   [MT8173_POWER_DOMAIN_VENC] = {
> > -   .name = "venc",
> > -   .sta_mask = PWR_STATUS_VENC,
> > -   .ctl_offs = SPM_VEN_PWR_CON,
> > -   .sram_pdn_bits = GENMASK(11, 8),
> > -   .sram_pdn_ack_bits = GENMASK(15, 12),
> > -   .clk_id = {MT8173_CLK_MM, MT8173_CLK_VENC},
> > -   },
> > -   [MT8173_POWER_DOMAIN_ISP] = {
> > -   .name = "isp",
> > -   .sta_mask = PWR_STATUS_ISP,
> > -   .ctl_offs = SPM_ISP_PWR_CON,
> > -   .sram_pdn_bits = GENMASK(11, 8),
> > -   .sram_pdn_ack_bits = GENMASK(13, 12),
> > -   .clk_id = {MT8173_CLK_MM},
> > -   },
> > -   [MT8173_POWER_DOMAIN_MM] = {
> > -   .name = "mm",
> > -   .sta_mask = PWR_STATUS_DISP,
> > -   .ctl_offs = SPM_DIS_PWR_CON,
> > -   .sram_pdn_bits = GENMASK(11, 8),
> > -   .sram_pdn_ack_bits = GENMASK(12, 12),
> > -   .clk_id = {MT8173_CLK_MM},
> > -   .bus_prot_mask = MT8173_TOP_AXI_PROT_EN_MM_M0 |
> > -   MT8173_TOP_AXI_PROT_EN_MM_M1,
> > -   },
> > -   [MT8173_POWER_DOMAIN_VENC_LT] = {
> > -   .name = "venc_lt",
> > -   .sta_mask = PWR_STATUS_VENC_LT,
> > -   .ctl_offs = SPM_VEN2_PWR_CON,
> > -   .sram_pdn_bits = GENMASK(11, 8),
> > -   .sram_pdn_ack_bits = GENMASK(15, 12),
> > -   .clk_id = {MT8173_CLK_MM, MT8173_CLK_VENC_LT},
> > -   },
> > -   [MT8173_POWER_DOMAIN_AUDIO] = {
> > -   .name = "audio",
> > -   .sta_mask = PWR_STATUS_AUDIO,
> > -   .ctl_offs = SPM_AUDIO_PWR_CON,
> > -   .sram_pdn_bits = GENMASK(11, 8),
> > -   .sram_pdn_ack_bits = GENMASK(15, 12),
> > -   .clk_id = {MT8173_CLK_NONE},
> > -   },
> > -   [MT8173_POWER_DOMAIN_USB] = {
> > -   .name = "usb",
> > -   .sta_mask = PWR_STATUS_USB,
> > -   .ctl_offs = SPM_USB_PWR_CON,
> > -   .sram_pdn_bits = GENMASK(11, 8),
> > -   .sram_pdn_ack_bits = GENMASK(15, 12),
> > -   .clk_id = {MT8173_CLK_NONE},
> > -   .active_wakeup = true,
> > -   },
> > -   [MT8173_POWER_DOMAIN_MFG_ASYNC] = {
> > -   .name = "mfg_async",
> > -   .sta_mask = PWR_STATUS_MFG_ASYNC,
> > -   .ctl_offs = SPM_MFG_ASYNC_PWR_CON,
> > -   .sram_pdn_bits = GENMASK(11, 8),
> > -   .sram_pdn_ack_bits = 0,
> > -   .clk_id = {MT8173_CLK_MFG},
> > -   },
> > -   [MT8173_POWER_DOMAIN_MFG_2D] = {
> > -   .name = "mfg_2d",
> > -   .sta_mask = PWR_STATUS_MFG_2D,
> > -   .ctl_offs = SPM_MFG_2D_PWR_CON,
> > -   .sram_pdn_bits = GENMASK(11, 8),
> > -   .sram_pdn_ack_bits = GENMASK(13, 12),
> > -   .clk_id = {MT8173_CLK_NONE},
> > -   },
> > -   [MT8173_POWER_DOMAIN_MFG] = {
> > -   .name = "mfg",

Re: [PATCH] KVM: VMX: switch to hrtimer for TSC deadline timer when L2 guest is running

2016-07-05 Thread Wanpeng Li
Cc Jan,
2016-07-06 13:10 GMT+08:00 Haozhong Zhang :
> A different VMCS is loaded when L2 guest is running, so it's incorrect
> to use the VMX preemption timer for L1 TSC deadline timer. This patch
> switches to hrtimer for L1 TSC deadline timer when entering L2 guest,
> and switches back to VMX preemption timer when nested VMEXIT from L2 to
> L1.

Nested preemption timer is emulated by hrtimer, so it doesn't
influence vmcs02, why this is needed?

Regards,
Wanpeng Li


Re: [PATCH] KVM: VMX: switch to hrtimer for TSC deadline timer when L2 guest is running

2016-07-05 Thread Wanpeng Li
Cc Jan,
2016-07-06 13:10 GMT+08:00 Haozhong Zhang :
> A different VMCS is loaded when L2 guest is running, so it's incorrect
> to use the VMX preemption timer for L1 TSC deadline timer. This patch
> switches to hrtimer for L1 TSC deadline timer when entering L2 guest,
> and switches back to VMX preemption timer when nested VMEXIT from L2 to
> L1.

Nested preemption timer is emulated by hrtimer, so it doesn't
influence vmcs02, why this is needed?

Regards,
Wanpeng Li


Re: [PATCH v3 12/17] mm, compaction: more reliably increase direct compaction priority

2016-07-05 Thread Joonsoo Kim
On Fri, Jun 24, 2016 at 11:54:32AM +0200, Vlastimil Babka wrote:
> During reclaim/compaction loop, compaction priority can be increased by the
> should_compact_retry() function, but the current code is not optimal. Priority
> is only increased when compaction_failed() is true, which means that 
> compaction
> has scanned the whole zone. This may not happen even after multiple attempts
> with the lower priority due to parallel activity, so we might needlessly
> struggle on the lower priority and possibly run out of compaction retry
> attempts in the process.
> 
> We can remove these corner cases by increasing compaction priority regardless
> of compaction_failed(). Examining further the compaction result can be
> postponed only after reaching the highest priority. This is a simple solution
> and we don't need to worry about reaching the highest priority "too soon" 
> here,
> because hen should_compact_retry() is called it means that the system is
> already struggling and the allocation is supposed to either try as hard as
> possible, or it cannot fail at all. There's not much point staying at lower
> priorities with heuristics that may result in only partial compaction.
> Also we now count compaction retries only after reaching the highest priority.

I'm not sure that this patch is safe. Deferring and skip-bit in
compaction is highly related to reclaim/compaction. Just ignoring them and 
(almost)
unconditionally increasing compaction priority will result in less
reclaim and less success rate on compaction. And, as a necessarily, it
would trigger OOM more frequently.

It would not be your fault. This patch is reasonable in current
situation. It just makes current things more deterministic
although I dislike that current things and this patch would amplify
those problem.

Thanks.

> The only exception here is the COMPACT_SKIPPED result, which means that
> compaction could not run at all due to being below order-0 watermarks. In that
> case, don't increase compaction priority, and check if compaction could 
> proceed
> when everything reclaimable was reclaimed. Before this patch, this was tied to
> compaction_withdrawn(), but the other results considered there are in fact 
> only
> possible due to low compaction priority so we can ignore them thanks to the
> patch. Since there are no other callers of compaction_withdrawn(), change its
> semantics to remove the low priority scenarios.
> 
> Signed-off-by: Vlastimil Babka 
> Acked-by: Michal Hocko 
> ---
>  include/linux/compaction.h | 28 ++---
>  mm/page_alloc.c| 51 
> ++
>  2 files changed, 31 insertions(+), 48 deletions(-)
> 
> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
> index 869b594cf4ff..a6b3d5d2ae53 100644
> --- a/include/linux/compaction.h
> +++ b/include/linux/compaction.h
> @@ -106,8 +106,8 @@ static inline bool compaction_failed(enum compact_result 
> result)
>  }
>  
>  /*
> - * Compaction  has backed off for some reason. It might be throttling or
> - * lock contention. Retrying is still worthwhile.
> + * Compaction has backed off because it cannot proceed until there is enough
> + * free memory. Retrying is still worthwhile after reclaim.
>   */
>  static inline bool compaction_withdrawn(enum compact_result result)
>  {
> @@ -118,30 +118,6 @@ static inline bool compaction_withdrawn(enum 
> compact_result result)
>   if (result == COMPACT_SKIPPED)
>   return true;
>  
> - /*
> -  * If compaction is deferred for high-order allocations, it is
> -  * because sync compaction recently failed. If this is the case
> -  * and the caller requested a THP allocation, we do not want
> -  * to heavily disrupt the system, so we fail the allocation
> -  * instead of entering direct reclaim.
> -  */
> - if (result == COMPACT_DEFERRED)
> - return true;
> -
> - /*
> -  * If compaction in async mode encounters contention or blocks higher
> -  * priority task we back off early rather than cause stalls.
> -  */
> - if (result == COMPACT_CONTENDED)
> - return true;
> -
> - /*
> -  * Page scanners have met but we haven't scanned full zones so this
> -  * is a back off in fact.
> -  */
> - if (result == COMPACT_PARTIAL_SKIPPED)
> - return true;
> -
>   return false;
>  }
>  
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 204cc988fd64..e1efdc8d2a52 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3217,7 +3217,7 @@ static inline bool
>  should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
>enum compact_result compact_result,
>enum compact_priority *compact_priority,
> -  int compaction_retries)
> +  int *compaction_retries)
>  {
>   int max_retries = MAX_COMPACT_RETRIES;
>  
> @@ -3225,28 +3225,35 

Re: [PATCH v3 12/17] mm, compaction: more reliably increase direct compaction priority

2016-07-05 Thread Joonsoo Kim
On Fri, Jun 24, 2016 at 11:54:32AM +0200, Vlastimil Babka wrote:
> During reclaim/compaction loop, compaction priority can be increased by the
> should_compact_retry() function, but the current code is not optimal. Priority
> is only increased when compaction_failed() is true, which means that 
> compaction
> has scanned the whole zone. This may not happen even after multiple attempts
> with the lower priority due to parallel activity, so we might needlessly
> struggle on the lower priority and possibly run out of compaction retry
> attempts in the process.
> 
> We can remove these corner cases by increasing compaction priority regardless
> of compaction_failed(). Examining further the compaction result can be
> postponed only after reaching the highest priority. This is a simple solution
> and we don't need to worry about reaching the highest priority "too soon" 
> here,
> because hen should_compact_retry() is called it means that the system is
> already struggling and the allocation is supposed to either try as hard as
> possible, or it cannot fail at all. There's not much point staying at lower
> priorities with heuristics that may result in only partial compaction.
> Also we now count compaction retries only after reaching the highest priority.

I'm not sure that this patch is safe. Deferring and skip-bit in
compaction is highly related to reclaim/compaction. Just ignoring them and 
(almost)
unconditionally increasing compaction priority will result in less
reclaim and less success rate on compaction. And, as a necessarily, it
would trigger OOM more frequently.

It would not be your fault. This patch is reasonable in current
situation. It just makes current things more deterministic
although I dislike that current things and this patch would amplify
those problem.

Thanks.

> The only exception here is the COMPACT_SKIPPED result, which means that
> compaction could not run at all due to being below order-0 watermarks. In that
> case, don't increase compaction priority, and check if compaction could 
> proceed
> when everything reclaimable was reclaimed. Before this patch, this was tied to
> compaction_withdrawn(), but the other results considered there are in fact 
> only
> possible due to low compaction priority so we can ignore them thanks to the
> patch. Since there are no other callers of compaction_withdrawn(), change its
> semantics to remove the low priority scenarios.
> 
> Signed-off-by: Vlastimil Babka 
> Acked-by: Michal Hocko 
> ---
>  include/linux/compaction.h | 28 ++---
>  mm/page_alloc.c| 51 
> ++
>  2 files changed, 31 insertions(+), 48 deletions(-)
> 
> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
> index 869b594cf4ff..a6b3d5d2ae53 100644
> --- a/include/linux/compaction.h
> +++ b/include/linux/compaction.h
> @@ -106,8 +106,8 @@ static inline bool compaction_failed(enum compact_result 
> result)
>  }
>  
>  /*
> - * Compaction  has backed off for some reason. It might be throttling or
> - * lock contention. Retrying is still worthwhile.
> + * Compaction has backed off because it cannot proceed until there is enough
> + * free memory. Retrying is still worthwhile after reclaim.
>   */
>  static inline bool compaction_withdrawn(enum compact_result result)
>  {
> @@ -118,30 +118,6 @@ static inline bool compaction_withdrawn(enum 
> compact_result result)
>   if (result == COMPACT_SKIPPED)
>   return true;
>  
> - /*
> -  * If compaction is deferred for high-order allocations, it is
> -  * because sync compaction recently failed. If this is the case
> -  * and the caller requested a THP allocation, we do not want
> -  * to heavily disrupt the system, so we fail the allocation
> -  * instead of entering direct reclaim.
> -  */
> - if (result == COMPACT_DEFERRED)
> - return true;
> -
> - /*
> -  * If compaction in async mode encounters contention or blocks higher
> -  * priority task we back off early rather than cause stalls.
> -  */
> - if (result == COMPACT_CONTENDED)
> - return true;
> -
> - /*
> -  * Page scanners have met but we haven't scanned full zones so this
> -  * is a back off in fact.
> -  */
> - if (result == COMPACT_PARTIAL_SKIPPED)
> - return true;
> -
>   return false;
>  }
>  
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 204cc988fd64..e1efdc8d2a52 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3217,7 +3217,7 @@ static inline bool
>  should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
>enum compact_result compact_result,
>enum compact_priority *compact_priority,
> -  int compaction_retries)
> +  int *compaction_retries)
>  {
>   int max_retries = MAX_COMPACT_RETRIES;
>  
> @@ -3225,28 +3225,35 @@ should_compact_retry(struct 

Re: [PATCH] lockdep: Add a document describing crossrelease feature

2016-07-05 Thread Byungchul Park
On Wed, Jul 06, 2016 at 11:17:10AM +0900, Byungchul Park wrote:
> 
> lock(A)
> wait_for(B)
>  <- serialized by atomic operation
>   lock(A)
>   unlock(A)
>   wake(B)
> unlock(A)

By the way, I have a question. Is there anyone who could answer it?

I want to serialize between two context's lock operations, for example,

context A   context B
--  --
lock A
lock B  ...
lock C
atomic_inc_return
~~ <- serialization
atomic_read
lock D
... lock E
lock F

so that we can see these in the order like A -> B -> C -> D -> E -> F.

atomic_inc_return() is used after lock C in context A, and atomic_read()
is used before lock D in context B. And I want to make it serialized when
the atomic_read() can see the increased value.

Can I use smp_mb__after_atomic() just after atomic_read() or should I use
smp_mb()? I think anyway I have to choose one of them for that ordering.

Thank you,
Byungchul


[PATCH] clocksource: cadence_ttc: fix a return value in case of error

2016-07-05 Thread Christophe JAILLET
IS_ERR and PTR_ERR should use the same variable, clk_ce in this case.

Fixes: 4de1eb07c47f (Convert init function to return error)

Signed-off-by: Christophe JAILLET 
---
 drivers/clocksource/cadence_ttc_timer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clocksource/cadence_ttc_timer.c 
b/drivers/clocksource/cadence_ttc_timer.c
index 388a77b..fbfbdec 100644
--- a/drivers/clocksource/cadence_ttc_timer.c
+++ b/drivers/clocksource/cadence_ttc_timer.c
@@ -523,7 +523,7 @@ static int __init ttc_timer_init(struct device_node *timer)
clk_ce = of_clk_get(timer, clksel);
if (IS_ERR(clk_ce)) {
pr_err("ERROR: timer input clock not found\n");
-   return PTR_ERR(clk_cs);
+   return PTR_ERR(clk_ce);
}
 
ret = ttc_setup_clocksource(clk_cs, timer_baseaddr, timer_width);
-- 
2.7.4



Re: [PATCH] lockdep: Add a document describing crossrelease feature

2016-07-05 Thread Byungchul Park
On Wed, Jul 06, 2016 at 11:17:10AM +0900, Byungchul Park wrote:
> 
> lock(A)
> wait_for(B)
>  <- serialized by atomic operation
>   lock(A)
>   unlock(A)
>   wake(B)
> unlock(A)

By the way, I have a question. Is there anyone who could answer it?

I want to serialize between two context's lock operations, for example,

context A   context B
--  --
lock A
lock B  ...
lock C
atomic_inc_return
~~ <- serialization
atomic_read
lock D
... lock E
lock F

so that we can see these in the order like A -> B -> C -> D -> E -> F.

atomic_inc_return() is used after lock C in context A, and atomic_read()
is used before lock D in context B. And I want to make it serialized when
the atomic_read() can see the increased value.

Can I use smp_mb__after_atomic() just after atomic_read() or should I use
smp_mb()? I think anyway I have to choose one of them for that ordering.

Thank you,
Byungchul


[PATCH] clocksource: cadence_ttc: fix a return value in case of error

2016-07-05 Thread Christophe JAILLET
IS_ERR and PTR_ERR should use the same variable, clk_ce in this case.

Fixes: 4de1eb07c47f (Convert init function to return error)

Signed-off-by: Christophe JAILLET 
---
 drivers/clocksource/cadence_ttc_timer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clocksource/cadence_ttc_timer.c 
b/drivers/clocksource/cadence_ttc_timer.c
index 388a77b..fbfbdec 100644
--- a/drivers/clocksource/cadence_ttc_timer.c
+++ b/drivers/clocksource/cadence_ttc_timer.c
@@ -523,7 +523,7 @@ static int __init ttc_timer_init(struct device_node *timer)
clk_ce = of_clk_get(timer, clksel);
if (IS_ERR(clk_ce)) {
pr_err("ERROR: timer input clock not found\n");
-   return PTR_ERR(clk_cs);
+   return PTR_ERR(clk_ce);
}
 
ret = ttc_setup_clocksource(clk_cs, timer_baseaddr, timer_width);
-- 
2.7.4



Re: porting kcov to android

2016-07-05 Thread Dmitry Vyukov
Well, something is broken.
Shadow in the report is complete mess (fc is heap redzone, while f4 is
stack redzone). I wonder if it is the bootstrap shadow page that is
used for both heap and stack. Or maybe we return poisoned pages to
pagealloc.
The first thing I would try is to disable stack and global
instrumentation (there are separate flags somewhere in the makefiles).


On Wed, Jul 6, 2016 at 6:57 AM, Baozeng  wrote:
> Hello all,
> I backported KASAN to 3.10.102 stable kerenl
> (ca1199fccf14540e86f6da955333e31d6fec5f3e), based on Andrey Ryabinin's work
> (backport KASAN to RHEL7-based (3.10 based) OpenVZ kernel). I met the
> following kernel panic when starting the kernel using the following command:
>
> qemu-system-x86_64 -hda ./wheezy.img -snapshot -m 2048 -net nic -net
> user,host=10.0.2.10,hostfwd=tcp::51727-:22 -nographic -enable-kvm -numa
> node,nodeid=0,cpus=0-1 -numa node,nodeid=1,cpus=2-3 -smp
> sockets=2,cores=2,threads=1 -usb -usbdevice mouse -usbdevice tablet -soundhw
> all -kernel ./bzImage -append console=ttyS0 root=/dev/sda debug
> earlyprintk=serial slub_debug=UZ
>
> any suggestions?
>
> ==
> BUG: KASan: out of bounds access in usage_match+0x63/0x70 at addr
> 88002c81ff40
> Read of size 8 by task khubd/923
> =
> BUG kmalloc-4096 (Not tainted): kasan: bad access detected
> -
>
> Disabling lock debugging due to kernel taint
> INFO: Allocated in input_dev_pm_ops+0x520/0x5e0 age=131944943344261 cpu=0
> pid=-536871936
> 0x41b58ab3
> [<  none  >] vsock_dgram_ops+0x337bd3/0x3a5a50 ??:?
> [<  none  >] sysfs_new_dirent+0x0/0x410
> /linux-stable/fs/sysfs/dir.c:1027
> 0x88002c8209d8
> 0xed000590413c
> 0xdc00
> 0x88002c8209e0
> 0x88002c820920
> [<  none  >] mutex_unlock+0x15/0x20 /linux-stable/kernel/mutex.c:252
> 0x11000590412f
> 0x88002c820958
> [<  none  >] sysfs_attr_ns+0x162/0x260
> /linux-stable/fs/sysfs/file.c:522
> 0x11000590412f
> 0x88002c820a18
> [<  none  >] dev_attr_uniq+0x0/0x60
> arch/x86/crypto/sha512-avx2-asm.o:?
> 0x8800280feae0
> INFO: Freed in sysfs_add_file_mode+0x141/0x2d0 age=6421765850 cpu=746719736
> pid=-30720
> 0x1242cf991f0
> 0x0002
> 0x41b58ab3
> [<  none  >] vsock_dgram_ops+0x337b87/0x3a5a50 ??:?
> [<  none  >] sysfs_add_file_mode+0x0/0x2d0
> /linux-stable/fs/sysfs/file.c:693
> 0x88002cf998c8
> INFO: Slab 0xeab20600 objects=7 used=0 fp=0x88002c818000
> flags=0x1fc4080
> INFO: Object 0x88002c81f8c0 @offset=30912 fp=0x0002
>
>
> Redzone 88002c8208c0: 1a 41 90 05 00 f1 ff 1f
> .A..
> Padding 88002c8209f8: 40 0a 82 2c 00 88 ff ff
> @..,
> CPU: 0 PID: 923 Comm: khubd Tainted: GB3.10.102+ #2
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
>  88002c818000 88002c81fc60 850cbe98 88002c81fc90
>  81584f48 88002d806f40 eab20600 88002c81f8c0
>   88002c81fcb8 8158b731 ed0005903fe8
> Call Trace:
> Memory state around the buggy address:
>  88002c81fe00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>  88002c81fe80: fc fc f1 f1 f1 f1 00 f4 f4 f4 f2 f2 f2 f2 00 f4
>>88002c81ff00: f4 f4 f2 f2 f2 f2 fc fc fc fc fc fc fc fc f2 f2
>^
>  88002c81ff80: f2 f2 fc fc fc fc fc fc fc fc f3 f3 f3 f3 fc fc
>  88002c82: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> ==
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory accessgeneral
> protection fault:  [#1] SMP KASAN
> Modules linked in:
> CPU: 0 PID: 923 Comm: khubd Tainted: GB3.10.102+ #2
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
> task: 88002cf991f0 ti: 88002c82 task.ti: 88002c82
> RIP: 0010:[]  []
> cpuacct_charge+0x1ab/0x490
> RSP: :88002de03be0  EFLAGS: 00010046
> RAX: dc001d5585dc RBX: c5a0 RCX: eaac2ee0
> RDX: 869c2c60 RSI: 10c1a6c0 RDI: 860d3600
> RBP: 88002de03c28 R08: 0001 R09: 0001
> R10: 0020 R11: ed000fffb001 R12: 860d35a0
> R13: dc00 R14: 134c2dae R15: 2c820050
> FS:  () GS:88002de0() knlGS:
> CS:  0010 DS:  ES:  CR0: 8005003b
> CR2:  CR3: 0600d000 CR4: 06f0
> DR0:  DR1: 

Re: porting kcov to android

2016-07-05 Thread Dmitry Vyukov
Well, something is broken.
Shadow in the report is complete mess (fc is heap redzone, while f4 is
stack redzone). I wonder if it is the bootstrap shadow page that is
used for both heap and stack. Or maybe we return poisoned pages to
pagealloc.
The first thing I would try is to disable stack and global
instrumentation (there are separate flags somewhere in the makefiles).


On Wed, Jul 6, 2016 at 6:57 AM, Baozeng  wrote:
> Hello all,
> I backported KASAN to 3.10.102 stable kerenl
> (ca1199fccf14540e86f6da955333e31d6fec5f3e), based on Andrey Ryabinin's work
> (backport KASAN to RHEL7-based (3.10 based) OpenVZ kernel). I met the
> following kernel panic when starting the kernel using the following command:
>
> qemu-system-x86_64 -hda ./wheezy.img -snapshot -m 2048 -net nic -net
> user,host=10.0.2.10,hostfwd=tcp::51727-:22 -nographic -enable-kvm -numa
> node,nodeid=0,cpus=0-1 -numa node,nodeid=1,cpus=2-3 -smp
> sockets=2,cores=2,threads=1 -usb -usbdevice mouse -usbdevice tablet -soundhw
> all -kernel ./bzImage -append console=ttyS0 root=/dev/sda debug
> earlyprintk=serial slub_debug=UZ
>
> any suggestions?
>
> ==
> BUG: KASan: out of bounds access in usage_match+0x63/0x70 at addr
> 88002c81ff40
> Read of size 8 by task khubd/923
> =
> BUG kmalloc-4096 (Not tainted): kasan: bad access detected
> -
>
> Disabling lock debugging due to kernel taint
> INFO: Allocated in input_dev_pm_ops+0x520/0x5e0 age=131944943344261 cpu=0
> pid=-536871936
> 0x41b58ab3
> [<  none  >] vsock_dgram_ops+0x337bd3/0x3a5a50 ??:?
> [<  none  >] sysfs_new_dirent+0x0/0x410
> /linux-stable/fs/sysfs/dir.c:1027
> 0x88002c8209d8
> 0xed000590413c
> 0xdc00
> 0x88002c8209e0
> 0x88002c820920
> [<  none  >] mutex_unlock+0x15/0x20 /linux-stable/kernel/mutex.c:252
> 0x11000590412f
> 0x88002c820958
> [<  none  >] sysfs_attr_ns+0x162/0x260
> /linux-stable/fs/sysfs/file.c:522
> 0x11000590412f
> 0x88002c820a18
> [<  none  >] dev_attr_uniq+0x0/0x60
> arch/x86/crypto/sha512-avx2-asm.o:?
> 0x8800280feae0
> INFO: Freed in sysfs_add_file_mode+0x141/0x2d0 age=6421765850 cpu=746719736
> pid=-30720
> 0x1242cf991f0
> 0x0002
> 0x41b58ab3
> [<  none  >] vsock_dgram_ops+0x337b87/0x3a5a50 ??:?
> [<  none  >] sysfs_add_file_mode+0x0/0x2d0
> /linux-stable/fs/sysfs/file.c:693
> 0x88002cf998c8
> INFO: Slab 0xeab20600 objects=7 used=0 fp=0x88002c818000
> flags=0x1fc4080
> INFO: Object 0x88002c81f8c0 @offset=30912 fp=0x0002
>
>
> Redzone 88002c8208c0: 1a 41 90 05 00 f1 ff 1f
> .A..
> Padding 88002c8209f8: 40 0a 82 2c 00 88 ff ff
> @..,
> CPU: 0 PID: 923 Comm: khubd Tainted: GB3.10.102+ #2
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
>  88002c818000 88002c81fc60 850cbe98 88002c81fc90
>  81584f48 88002d806f40 eab20600 88002c81f8c0
>   88002c81fcb8 8158b731 ed0005903fe8
> Call Trace:
> Memory state around the buggy address:
>  88002c81fe00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>  88002c81fe80: fc fc f1 f1 f1 f1 00 f4 f4 f4 f2 f2 f2 f2 00 f4
>>88002c81ff00: f4 f4 f2 f2 f2 f2 fc fc fc fc fc fc fc fc f2 f2
>^
>  88002c81ff80: f2 f2 fc fc fc fc fc fc fc fc f3 f3 f3 f3 fc fc
>  88002c82: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> ==
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory accessgeneral
> protection fault:  [#1] SMP KASAN
> Modules linked in:
> CPU: 0 PID: 923 Comm: khubd Tainted: GB3.10.102+ #2
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
> task: 88002cf991f0 ti: 88002c82 task.ti: 88002c82
> RIP: 0010:[]  []
> cpuacct_charge+0x1ab/0x490
> RSP: :88002de03be0  EFLAGS: 00010046
> RAX: dc001d5585dc RBX: c5a0 RCX: eaac2ee0
> RDX: 869c2c60 RSI: 10c1a6c0 RDI: 860d3600
> RBP: 88002de03c28 R08: 0001 R09: 0001
> R10: 0020 R11: ed000fffb001 R12: 860d35a0
> R13: dc00 R14: 134c2dae R15: 2c820050
> FS:  () GS:88002de0() knlGS:
> CS:  0010 DS:  ES:  CR0: 8005003b
> CR2:  CR3: 0600d000 CR4: 06f0
> DR0:  DR1:  DR2: 

Re: [PATCH v7 2/4] soc: mediatek: Init MT8173 scpsys driver earlier

2016-07-05 Thread James Liao
On Sat, 2016-07-02 at 18:35 +0200, Matthias Brugger wrote:
> 
> On 05/16/2016 11:28 AM, James Liao wrote:
> > Some power domain comsumers may init before module_init.
> > So the power domain provider (scpsys) need to be initialized
> > earlier too.
> >
> > Take an example for our IOMMU (M4U) and SMI. SMI is a bridge
> > between IOMMU and multimedia HW. SMI is responsible to
> > enable/disable iommu and help transfer data for each multimedia
> > HW. Both of them have to wait until the power and clocks are
> > enabled.
> >
> > So scpsys driver should be initialized before SMI, and SMI should
> > be initialized before IOMMU, and then init IOMMU consumers
> > (display/vdec/venc/camera etc.).
> >
> > IOMMU is subsys_init by default. So we need to init scpsys driver
> > before subsys_init.
> >
> > Signed-off-by: James Liao 
> > Reviewed-by: Kevin Hilman 
> > ---
> >  drivers/soc/mediatek/mtk-scpsys.c | 19 ++-
> >  1 file changed, 18 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/soc/mediatek/mtk-scpsys.c 
> > b/drivers/soc/mediatek/mtk-scpsys.c
> > index 5870a24..00c0adb 100644
> > --- a/drivers/soc/mediatek/mtk-scpsys.c
> > +++ b/drivers/soc/mediatek/mtk-scpsys.c
> > @@ -617,4 +617,21 @@ static struct platform_driver scpsys_drv = {
> > .of_match_table = of_match_ptr(of_scpsys_match_tbl),
> > },
> >  };
> > -builtin_platform_driver(scpsys_drv);
> > +
> > +static int __init scpsys_drv_init(void)
> > +{
> > +   return platform_driver_register(_drv);
> > +}
> > +
> > +/*
> > + * There are some Mediatek drivers which depend on the power domain driver 
> > need
> > + * to probe in earlier initcall levels. So scpsys driver also need to probe
> > + * earlier.
> > + *
> > + * IOMMU(M4U) and SMI drivers for example. SMI is a bridge between IOMMU 
> > and
> > + * multimedia HW. IOMMU depends on SMI, and SMI is a power domain consumer,
> > + * so the proper probe sequence should be scpsys -> SMI -> IOMMU driver.
> > + * IOMMU drivers are initialized during subsys_init by default, so we need 
> > to
> > + * move SMI and scpsys drivers to subsys_init or earlier init levels.
> > + */
> > +subsys_initcall(scpsys_drv_init);
> >
> 
> Can't we achieve this with probe deferring? I'm not really keen on 
> coding the order of the different drivers like this.

Hi Matthias,

Some drivers such as IOMMU don't support probe deferring. So scpsys need
to init before them by changing init level.


Best regards,

James




[PATCH v5 2/7] perf config: Add macros assigning key-value pairs to default_config_item

2016-07-05 Thread Taeung Song
In near future, default_config_item arrays will be added
(e.g. const struct default_config_item colors_config_items[])
To simply assign config key-value pairs to default_config_item,
add macros that are CONF_VAR() and CONF_*_VAR() for each type.

Cc: Namhyung Kim 
Cc: Jiri Olsa 
Cc: Wang Nan 
Cc: Masami Hiramatsu 
Cc: Alexander Shishkin 
Signed-off-by: Taeung Song 
---
 tools/perf/util/config.h | 20 
 1 file changed, 20 insertions(+)

diff --git a/tools/perf/util/config.h b/tools/perf/util/config.h
index 1fd8e4c..613900f 100644
--- a/tools/perf/util/config.h
+++ b/tools/perf/util/config.h
@@ -92,4 +92,24 @@ struct default_config_section {
const struct default_config_item *items;
 };
 
+#define CONF_VAR(_name, _field, _val, _type)   \
+   { .name = _name, .value._field = _val, .type = _type }
+
+#define CONF_BOOL_VAR(_name, _val) \
+   CONF_VAR(_name, b, _val, CONFIG_TYPE_BOOL)
+#define CONF_INT_VAR(_name, _val)  \
+   CONF_VAR(_name, i, _val, CONFIG_TYPE_INT)
+#define CONF_LONG_VAR(_name, _val) \
+   CONF_VAR(_name, l, _val, CONFIG_TYPE_LONG)
+#define CONF_U64_VAR(_name, _val)  \
+   CONF_VAR(_name, ll, _val, CONFIG_TYPE_U64)
+#define CONF_FLOAT_VAR(_name, _val)\
+   CONF_VAR(_name, f, _val, CONFIG_TYPE_FLOAT)
+#define CONF_DOUBLE_VAR(_name, _val)   \
+   CONF_VAR(_name, d, _val, CONFIG_TYPE_DOUBLE)
+#define CONF_STR_VAR(_name, _val)  \
+   CONF_VAR(_name, s, _val, CONFIG_TYPE_STRING)
+#define CONF_END() \
+   { .name = NULL }
+
 #endif /* __PERF_CONFIG_H */
-- 
2.5.0



Re: [PATCH v7 2/4] soc: mediatek: Init MT8173 scpsys driver earlier

2016-07-05 Thread James Liao
On Sat, 2016-07-02 at 18:35 +0200, Matthias Brugger wrote:
> 
> On 05/16/2016 11:28 AM, James Liao wrote:
> > Some power domain comsumers may init before module_init.
> > So the power domain provider (scpsys) need to be initialized
> > earlier too.
> >
> > Take an example for our IOMMU (M4U) and SMI. SMI is a bridge
> > between IOMMU and multimedia HW. SMI is responsible to
> > enable/disable iommu and help transfer data for each multimedia
> > HW. Both of them have to wait until the power and clocks are
> > enabled.
> >
> > So scpsys driver should be initialized before SMI, and SMI should
> > be initialized before IOMMU, and then init IOMMU consumers
> > (display/vdec/venc/camera etc.).
> >
> > IOMMU is subsys_init by default. So we need to init scpsys driver
> > before subsys_init.
> >
> > Signed-off-by: James Liao 
> > Reviewed-by: Kevin Hilman 
> > ---
> >  drivers/soc/mediatek/mtk-scpsys.c | 19 ++-
> >  1 file changed, 18 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/soc/mediatek/mtk-scpsys.c 
> > b/drivers/soc/mediatek/mtk-scpsys.c
> > index 5870a24..00c0adb 100644
> > --- a/drivers/soc/mediatek/mtk-scpsys.c
> > +++ b/drivers/soc/mediatek/mtk-scpsys.c
> > @@ -617,4 +617,21 @@ static struct platform_driver scpsys_drv = {
> > .of_match_table = of_match_ptr(of_scpsys_match_tbl),
> > },
> >  };
> > -builtin_platform_driver(scpsys_drv);
> > +
> > +static int __init scpsys_drv_init(void)
> > +{
> > +   return platform_driver_register(_drv);
> > +}
> > +
> > +/*
> > + * There are some Mediatek drivers which depend on the power domain driver 
> > need
> > + * to probe in earlier initcall levels. So scpsys driver also need to probe
> > + * earlier.
> > + *
> > + * IOMMU(M4U) and SMI drivers for example. SMI is a bridge between IOMMU 
> > and
> > + * multimedia HW. IOMMU depends on SMI, and SMI is a power domain consumer,
> > + * so the proper probe sequence should be scpsys -> SMI -> IOMMU driver.
> > + * IOMMU drivers are initialized during subsys_init by default, so we need 
> > to
> > + * move SMI and scpsys drivers to subsys_init or earlier init levels.
> > + */
> > +subsys_initcall(scpsys_drv_init);
> >
> 
> Can't we achieve this with probe deferring? I'm not really keen on 
> coding the order of the different drivers like this.

Hi Matthias,

Some drivers such as IOMMU don't support probe deferring. So scpsys need
to init before them by changing init level.


Best regards,

James




[PATCH v5 2/7] perf config: Add macros assigning key-value pairs to default_config_item

2016-07-05 Thread Taeung Song
In near future, default_config_item arrays will be added
(e.g. const struct default_config_item colors_config_items[])
To simply assign config key-value pairs to default_config_item,
add macros that are CONF_VAR() and CONF_*_VAR() for each type.

Cc: Namhyung Kim 
Cc: Jiri Olsa 
Cc: Wang Nan 
Cc: Masami Hiramatsu 
Cc: Alexander Shishkin 
Signed-off-by: Taeung Song 
---
 tools/perf/util/config.h | 20 
 1 file changed, 20 insertions(+)

diff --git a/tools/perf/util/config.h b/tools/perf/util/config.h
index 1fd8e4c..613900f 100644
--- a/tools/perf/util/config.h
+++ b/tools/perf/util/config.h
@@ -92,4 +92,24 @@ struct default_config_section {
const struct default_config_item *items;
 };
 
+#define CONF_VAR(_name, _field, _val, _type)   \
+   { .name = _name, .value._field = _val, .type = _type }
+
+#define CONF_BOOL_VAR(_name, _val) \
+   CONF_VAR(_name, b, _val, CONFIG_TYPE_BOOL)
+#define CONF_INT_VAR(_name, _val)  \
+   CONF_VAR(_name, i, _val, CONFIG_TYPE_INT)
+#define CONF_LONG_VAR(_name, _val) \
+   CONF_VAR(_name, l, _val, CONFIG_TYPE_LONG)
+#define CONF_U64_VAR(_name, _val)  \
+   CONF_VAR(_name, ll, _val, CONFIG_TYPE_U64)
+#define CONF_FLOAT_VAR(_name, _val)\
+   CONF_VAR(_name, f, _val, CONFIG_TYPE_FLOAT)
+#define CONF_DOUBLE_VAR(_name, _val)   \
+   CONF_VAR(_name, d, _val, CONFIG_TYPE_DOUBLE)
+#define CONF_STR_VAR(_name, _val)  \
+   CONF_VAR(_name, s, _val, CONFIG_TYPE_STRING)
+#define CONF_END() \
+   { .name = NULL }
+
 #endif /* __PERF_CONFIG_H */
-- 
2.5.0



[PATCH v5 7/7] perf config: Initialize annotate_browser__opts with default config items

2016-07-05 Thread Taeung Song
Set default config values for 'annotate' section with 'annotate_config_items[]'
instead of actual bool type values.
(e.g. using annotate_config_items[CONFIG_ANNOTATE_USE_OFFSET].value
instead of 'true' bool type value for 'annotate.use_offset'.)

Cc: Namhyung Kim 
Cc: Jiri Olsa 
Cc: Masami Hiramatsu 
Cc: Wang Nan 
Cc: Alexander Shishkin 
Signed-off-by: Taeung Song 
---
 tools/perf/ui/browsers/annotate.c | 16 
 tools/perf/util/config.h  |  6 ++
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/tools/perf/ui/browsers/annotate.c 
b/tools/perf/ui/browsers/annotate.c
index 29dc6d2..0fb78b5 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -38,10 +38,7 @@ static struct annotate_browser_opt {
 show_linenr,
 show_nr_jumps,
 show_total_period;
-} annotate_browser__opts = {
-   .use_offset = true,
-   .jump_arrows= true,
-};
+} annotate_browser__opts;
 
 struct annotate_browser {
struct ui_browser b;
@@ -1157,7 +1154,18 @@ static int annotate__config(const char *var, const char 
*value,
return 0;
 }
 
+static void default_annotate_config_init(void)
+{
+   annotate_browser__opts.hide_src_code = 
CONF_ANNOTATE_DEFAULT_VAL(HIDE_SRC_CODE, b);
+   annotate_browser__opts.use_offset = 
CONF_ANNOTATE_DEFAULT_VAL(USE_OFFSET, b);
+   annotate_browser__opts.jump_arrows = 
CONF_ANNOTATE_DEFAULT_VAL(JUMP_ARROWS, b);
+   annotate_browser__opts.show_linenr = 
CONF_ANNOTATE_DEFAULT_VAL(SHOW_LINENR, b);
+   annotate_browser__opts.show_nr_jumps = 
CONF_ANNOTATE_DEFAULT_VAL(SHOW_NR_JUMPS, b);
+   annotate_browser__opts.show_total_period = 
CONF_ANNOTATE_DEFAULT_VAL(SHOW_TOTAL_PERIOD, b);
+}
+
 void annotate_browser__init(void)
 {
+   default_annotate_config_init();
perf_config(annotate__config, NULL);
 }
diff --git a/tools/perf/util/config.h b/tools/perf/util/config.h
index 470c93a..c30e6bb 100644
--- a/tools/perf/util/config.h
+++ b/tools/perf/util/config.h
@@ -136,6 +136,12 @@ struct default_config_section {
 #define CONF_END() \
{ .name = NULL }
 
+#define CONF_DEFAULT_VAL(section, name, field) \
+   section##_config_items[CONFIG_##name].value.field
+
+#define CONF_ANNOTATE_DEFAULT_VAL(name, field) \
+   CONF_DEFAULT_VAL(annotate, ANNOTATE_##name, field)
+
 extern const struct default_config_item colors_config_items[];
 extern const struct default_config_item annotate_config_items[];
 
-- 
2.5.0



[PATCH v5 7/7] perf config: Initialize annotate_browser__opts with default config items

2016-07-05 Thread Taeung Song
Set default config values for 'annotate' section with 'annotate_config_items[]'
instead of actual bool type values.
(e.g. using annotate_config_items[CONFIG_ANNOTATE_USE_OFFSET].value
instead of 'true' bool type value for 'annotate.use_offset'.)

Cc: Namhyung Kim 
Cc: Jiri Olsa 
Cc: Masami Hiramatsu 
Cc: Wang Nan 
Cc: Alexander Shishkin 
Signed-off-by: Taeung Song 
---
 tools/perf/ui/browsers/annotate.c | 16 
 tools/perf/util/config.h  |  6 ++
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/tools/perf/ui/browsers/annotate.c 
b/tools/perf/ui/browsers/annotate.c
index 29dc6d2..0fb78b5 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -38,10 +38,7 @@ static struct annotate_browser_opt {
 show_linenr,
 show_nr_jumps,
 show_total_period;
-} annotate_browser__opts = {
-   .use_offset = true,
-   .jump_arrows= true,
-};
+} annotate_browser__opts;
 
 struct annotate_browser {
struct ui_browser b;
@@ -1157,7 +1154,18 @@ static int annotate__config(const char *var, const char 
*value,
return 0;
 }
 
+static void default_annotate_config_init(void)
+{
+   annotate_browser__opts.hide_src_code = 
CONF_ANNOTATE_DEFAULT_VAL(HIDE_SRC_CODE, b);
+   annotate_browser__opts.use_offset = 
CONF_ANNOTATE_DEFAULT_VAL(USE_OFFSET, b);
+   annotate_browser__opts.jump_arrows = 
CONF_ANNOTATE_DEFAULT_VAL(JUMP_ARROWS, b);
+   annotate_browser__opts.show_linenr = 
CONF_ANNOTATE_DEFAULT_VAL(SHOW_LINENR, b);
+   annotate_browser__opts.show_nr_jumps = 
CONF_ANNOTATE_DEFAULT_VAL(SHOW_NR_JUMPS, b);
+   annotate_browser__opts.show_total_period = 
CONF_ANNOTATE_DEFAULT_VAL(SHOW_TOTAL_PERIOD, b);
+}
+
 void annotate_browser__init(void)
 {
+   default_annotate_config_init();
perf_config(annotate__config, NULL);
 }
diff --git a/tools/perf/util/config.h b/tools/perf/util/config.h
index 470c93a..c30e6bb 100644
--- a/tools/perf/util/config.h
+++ b/tools/perf/util/config.h
@@ -136,6 +136,12 @@ struct default_config_section {
 #define CONF_END() \
{ .name = NULL }
 
+#define CONF_DEFAULT_VAL(section, name, field) \
+   section##_config_items[CONFIG_##name].value.field
+
+#define CONF_ANNOTATE_DEFAULT_VAL(name, field) \
+   CONF_DEFAULT_VAL(annotate, ANNOTATE_##name, field)
+
 extern const struct default_config_item colors_config_items[];
 extern const struct default_config_item annotate_config_items[];
 
-- 
2.5.0



[PATCH v5 4/7] perf config: Use combined {fore,back}ground colors value instead of each two color

2016-07-05 Thread Taeung Song
To manage all default config values at one spot (at util/config.c),
it would be better that actual variables for each 'colors' config
also have only one value like 'default_config_item' type.

If we do, it smoothly work to initialize 'colors' default config values
by 'colors_config_items' array that have default values at util/config.c
because both actual variable and config item of 'colors_config_items'
are equal in the number of values (as just one).

Cc: Namhyung Kim 
Cc: Jiri Olsa 
Cc: Masami Hiramatsu 
Cc: Wang Nan 
Cc: Alexander Shishkin 
Signed-off-by: Taeung Song 
---
 tools/perf/ui/browser.c | 81 -
 1 file changed, 39 insertions(+), 42 deletions(-)

diff --git a/tools/perf/ui/browser.c b/tools/perf/ui/browser.c
index 3eb3edb..b4e21d1 100644
--- a/tools/perf/ui/browser.c
+++ b/tools/perf/ui/browser.c
@@ -503,61 +503,53 @@ unsigned int ui_browser__list_head_refresh(struct 
ui_browser *browser)
 }
 
 static struct ui_browser_colorset {
-   const char *name, *fg, *bg;
+   const char *name, *fore_back_colors;
int colorset;
 } ui_browser__colorsets[] = {
{
-   .colorset = HE_COLORSET_TOP,
-   .name = "top",
-   .fg   = "red",
-   .bg   = "default",
+   .colorset  = HE_COLORSET_TOP,
+   .name  = "top",
+   .fore_back_colors = "red, default",
},
{
-   .colorset = HE_COLORSET_MEDIUM,
-   .name = "medium",
-   .fg   = "green",
-   .bg   = "default",
+   .colorset  = HE_COLORSET_MEDIUM,
+   .name  = "medium",
+   .fore_back_colors = "green, default",
},
{
-   .colorset = HE_COLORSET_NORMAL,
-   .name = "normal",
-   .fg   = "default",
-   .bg   = "default",
+   .colorset  = HE_COLORSET_NORMAL,
+   .name  = "normal",
+   .fore_back_colors = "default, default",
},
{
-   .colorset = HE_COLORSET_SELECTED,
-   .name = "selected",
-   .fg   = "black",
-   .bg   = "yellow",
+   .colorset  = HE_COLORSET_SELECTED,
+   .name  = "selected",
+   .fore_back_colors = "black, yellow",
},
{
-   .colorset = HE_COLORSET_JUMP_ARROWS,
-   .name = "jump_arrows",
-   .fg   = "blue",
-   .bg   = "default",
+   .colorset  = HE_COLORSET_JUMP_ARROWS,
+   .name  = "jump_arrows",
+   .fore_back_colors = "blue, default",
},
{
-   .colorset = HE_COLORSET_ADDR,
-   .name = "addr",
-   .fg   = "magenta",
-   .bg   = "default",
+   .colorset  = HE_COLORSET_ADDR,
+   .name  = "addr",
+   .fore_back_colors = "magenta, default",
},
{
-   .colorset = HE_COLORSET_ROOT,
-   .name = "root",
-   .fg   = "white",
-   .bg   = "blue",
+   .colorset  = HE_COLORSET_ROOT,
+   .name  = "root",
+   .fore_back_colors = "white, blue",
},
{
.name = NULL,
}
 };
 
-
 static int ui_browser__color_config(const char *var, const char *value,
void *data __maybe_unused)
 {
-   char *fg = NULL, *bg;
+   char *fore_back_colors;
int i;
 
/* same dir for all commands */
@@ -570,22 +562,18 @@ static int ui_browser__color_config(const char *var, 
const char *value,
if (strcmp(ui_browser__colorsets[i].name, name) != 0)
continue;
 
-   fg = strdup(value);
-   if (fg == NULL)
-   break;
+   if (strstr(value, ",") == NULL)
+   return -1;
 
-   bg = strchr(fg, ',');
-   if (bg == NULL)
+   fore_back_colors = strdup(value);
+   if (fore_back_colors == NULL)
break;
+   ui_browser__colorsets[i].fore_back_colors = fore_back_colors;
 
-   *bg = '\0';
-   while (isspace(*++bg));
-   ui_browser__colorsets[i].bg = bg;
-   ui_browser__colorsets[i].fg = fg;
return 0;
}
 
-   free(fg);
+   free(fore_back_colors);
return -1;
 }
 
@@ -743,8 +731,17 @@ void ui_browser__init(void)
perf_config(ui_browser__color_config, NULL);
 
while (ui_browser__colorsets[i].name) {
+   char *fore_back_colors, 

[PATCH v5 6/7] perf config: Add 'annotate' section default configs arrrays

2016-07-05 Thread Taeung Song
Actual variable for configs of 'annotate' section is like below.

(at ui/browsers/annoate.c)
static struct annotate_browser_opt {
   bool hide_src_code,
use_offset,
jump_arrows,
show_linenr,
show_nr_jumps,
show_total_period;
} annotate_browser__opts = {
   .use_offset  = true,
   .jump_arrows = true,
};

But I suggest using 'annoate' default config array that
have all default config key-value pairs for 'annotate' section

In near future, this arrays will be used on ui/browsers/annoate.c
because of setting default actual variable for 'annotate' config.

Cc: Namhyung Kim 
Cc: Jiri Olsa 
Cc: Masami Hiramatsu 
Cc: Wang Nan 
Cc: Alexander Shishkin 
Signed-off-by: Taeung Song 
---
 tools/perf/util/config.c | 11 +++
 tools/perf/util/config.h | 11 +++
 2 files changed, 22 insertions(+)

diff --git a/tools/perf/util/config.c b/tools/perf/util/config.c
index a0c0170..d8d5415 100644
--- a/tools/perf/util/config.c
+++ b/tools/perf/util/config.c
@@ -41,8 +41,19 @@ const struct default_config_item colors_config_items[] = {
CONF_END()
 };
 
+const struct default_config_item annotate_config_items[] = {
+   CONF_BOOL_VAR("hide_src_code", false),
+   CONF_BOOL_VAR("use_offset", true),
+   CONF_BOOL_VAR("jump_arrows", true),
+   CONF_BOOL_VAR("show_nr_jumps", false),
+   CONF_BOOL_VAR("show_linenr", false),
+   CONF_BOOL_VAR("show_total_period", false),
+   CONF_END()
+};
+
 const struct default_config_section default_sections[] = {
{ .name = "colors", .items = colors_config_items },
+   { .name = "annotate", .items = annotate_config_items },
 };
 
 static int get_next_char(void)
diff --git a/tools/perf/util/config.h b/tools/perf/util/config.h
index f2ce8b4..470c93a 100644
--- a/tools/perf/util/config.h
+++ b/tools/perf/util/config.h
@@ -75,6 +75,7 @@ enum perf_config_type {
 
 enum config_section_idx {
CONFIG_COLORS,
+   CONFIG_ANNOTATE,
 };
 
 enum colors_config_items_idx {
@@ -87,6 +88,15 @@ enum colors_config_items_idx {
CONFIG_COLORS_ROOT,
 };
 
+enum annotate_config_items_idx {
+   CONFIG_ANNOTATE_HIDE_SRC_CODE,
+   CONFIG_ANNOTATE_USE_OFFSET,
+   CONFIG_ANNOTATE_JUMP_ARROWS,
+   CONFIG_ANNOTATE_SHOW_NR_JUMPS,
+   CONFIG_ANNOTATE_SHOW_LINENR,
+   CONFIG_ANNOTATE_SHOW_TOTAL_PERIOD,
+};
+
 struct default_config_item {
const char *name;
union {
@@ -127,5 +137,6 @@ struct default_config_section {
{ .name = NULL }
 
 extern const struct default_config_item colors_config_items[];
+extern const struct default_config_item annotate_config_items[];
 
 #endif /* __PERF_CONFIG_H */
-- 
2.5.0



[PATCH v5 5/7] perf config: Initialize ui_browser__colorsets with default config items

2016-07-05 Thread Taeung Song
Set default config values for 'colors' section with 'colors_config_items[]'
instead of actual const char * type values.
(e.g. using colors_config_item[CONFIG_COLORS_TOP].value
instead of "red, default" string value for 'colors.top')

Cc: Namhyung Kim 
Cc: Jiri Olsa 
Cc: Masami Hiramatsu 
Cc: Wang Nan 
Cc: Alexander Shishkin 
Signed-off-by: Taeung Song 
---
 tools/perf/ui/browser.c | 53 +
 1 file changed, 32 insertions(+), 21 deletions(-)

diff --git a/tools/perf/ui/browser.c b/tools/perf/ui/browser.c
index b4e21d1..380abab 100644
--- a/tools/perf/ui/browser.c
+++ b/tools/perf/ui/browser.c
@@ -507,39 +507,32 @@ static struct ui_browser_colorset {
int colorset;
 } ui_browser__colorsets[] = {
{
-   .colorset  = HE_COLORSET_TOP,
-   .name  = "top",
-   .fore_back_colors = "red, default",
+   .colorset = HE_COLORSET_TOP,
+   .name = "top",
},
{
-   .colorset  = HE_COLORSET_MEDIUM,
-   .name  = "medium",
-   .fore_back_colors = "green, default",
+   .colorset = HE_COLORSET_MEDIUM,
+   .name = "medium",
},
{
-   .colorset  = HE_COLORSET_NORMAL,
-   .name  = "normal",
-   .fore_back_colors = "default, default",
+   .colorset = HE_COLORSET_NORMAL,
+   .name = "normal",
},
{
-   .colorset  = HE_COLORSET_SELECTED,
-   .name  = "selected",
-   .fore_back_colors = "black, yellow",
+   .colorset = HE_COLORSET_SELECTED,
+   .name = "selected",
},
{
-   .colorset  = HE_COLORSET_JUMP_ARROWS,
-   .name  = "jump_arrows",
-   .fore_back_colors = "blue, default",
+   .colorset = HE_COLORSET_JUMP_ARROWS,
+   .name = "jump_arrows",
},
{
-   .colorset  = HE_COLORSET_ADDR,
-   .name  = "addr",
-   .fore_back_colors = "magenta, default",
+   .colorset = HE_COLORSET_ADDR,
+   .name = "addr",
},
{
-   .colorset  = HE_COLORSET_ROOT,
-   .name  = "root",
-   .fore_back_colors = "white, blue",
+   .colorset = HE_COLORSET_ROOT,
+   .name = "root",
},
{
.name = NULL,
@@ -724,10 +717,28 @@ void __ui_browser__line_arrow(struct ui_browser *browser, 
unsigned int column,
__ui_browser__line_arrow_down(browser, column, start, end);
 }
 
+static void default_colors_config_init(void)
+{
+   int i, j;
+
+   for (i = 0; ui_browser__colorsets[i].name != NULL; ++i) {
+   const char *name = ui_browser__colorsets[i].name;
+
+   for (j = 0; colors_config_items[j].name != NULL; j++) {
+   if (!strcmp(name, colors_config_items[j].name)) {
+   ui_browser__colorsets[i].fore_back_colors =
+   colors_config_items[j].value.s;
+   break;
+   }
+   }
+   }
+}
+
 void ui_browser__init(void)
 {
int i = 0;
 
+   default_colors_config_init();
perf_config(ui_browser__color_config, NULL);
 
while (ui_browser__colorsets[i].name) {
-- 
2.5.0



[PATCH v5 4/7] perf config: Use combined {fore,back}ground colors value instead of each two color

2016-07-05 Thread Taeung Song
To manage all default config values at one spot (at util/config.c),
it would be better that actual variables for each 'colors' config
also have only one value like 'default_config_item' type.

If we do, it smoothly work to initialize 'colors' default config values
by 'colors_config_items' array that have default values at util/config.c
because both actual variable and config item of 'colors_config_items'
are equal in the number of values (as just one).

Cc: Namhyung Kim 
Cc: Jiri Olsa 
Cc: Masami Hiramatsu 
Cc: Wang Nan 
Cc: Alexander Shishkin 
Signed-off-by: Taeung Song 
---
 tools/perf/ui/browser.c | 81 -
 1 file changed, 39 insertions(+), 42 deletions(-)

diff --git a/tools/perf/ui/browser.c b/tools/perf/ui/browser.c
index 3eb3edb..b4e21d1 100644
--- a/tools/perf/ui/browser.c
+++ b/tools/perf/ui/browser.c
@@ -503,61 +503,53 @@ unsigned int ui_browser__list_head_refresh(struct 
ui_browser *browser)
 }
 
 static struct ui_browser_colorset {
-   const char *name, *fg, *bg;
+   const char *name, *fore_back_colors;
int colorset;
 } ui_browser__colorsets[] = {
{
-   .colorset = HE_COLORSET_TOP,
-   .name = "top",
-   .fg   = "red",
-   .bg   = "default",
+   .colorset  = HE_COLORSET_TOP,
+   .name  = "top",
+   .fore_back_colors = "red, default",
},
{
-   .colorset = HE_COLORSET_MEDIUM,
-   .name = "medium",
-   .fg   = "green",
-   .bg   = "default",
+   .colorset  = HE_COLORSET_MEDIUM,
+   .name  = "medium",
+   .fore_back_colors = "green, default",
},
{
-   .colorset = HE_COLORSET_NORMAL,
-   .name = "normal",
-   .fg   = "default",
-   .bg   = "default",
+   .colorset  = HE_COLORSET_NORMAL,
+   .name  = "normal",
+   .fore_back_colors = "default, default",
},
{
-   .colorset = HE_COLORSET_SELECTED,
-   .name = "selected",
-   .fg   = "black",
-   .bg   = "yellow",
+   .colorset  = HE_COLORSET_SELECTED,
+   .name  = "selected",
+   .fore_back_colors = "black, yellow",
},
{
-   .colorset = HE_COLORSET_JUMP_ARROWS,
-   .name = "jump_arrows",
-   .fg   = "blue",
-   .bg   = "default",
+   .colorset  = HE_COLORSET_JUMP_ARROWS,
+   .name  = "jump_arrows",
+   .fore_back_colors = "blue, default",
},
{
-   .colorset = HE_COLORSET_ADDR,
-   .name = "addr",
-   .fg   = "magenta",
-   .bg   = "default",
+   .colorset  = HE_COLORSET_ADDR,
+   .name  = "addr",
+   .fore_back_colors = "magenta, default",
},
{
-   .colorset = HE_COLORSET_ROOT,
-   .name = "root",
-   .fg   = "white",
-   .bg   = "blue",
+   .colorset  = HE_COLORSET_ROOT,
+   .name  = "root",
+   .fore_back_colors = "white, blue",
},
{
.name = NULL,
}
 };
 
-
 static int ui_browser__color_config(const char *var, const char *value,
void *data __maybe_unused)
 {
-   char *fg = NULL, *bg;
+   char *fore_back_colors;
int i;
 
/* same dir for all commands */
@@ -570,22 +562,18 @@ static int ui_browser__color_config(const char *var, 
const char *value,
if (strcmp(ui_browser__colorsets[i].name, name) != 0)
continue;
 
-   fg = strdup(value);
-   if (fg == NULL)
-   break;
+   if (strstr(value, ",") == NULL)
+   return -1;
 
-   bg = strchr(fg, ',');
-   if (bg == NULL)
+   fore_back_colors = strdup(value);
+   if (fore_back_colors == NULL)
break;
+   ui_browser__colorsets[i].fore_back_colors = fore_back_colors;
 
-   *bg = '\0';
-   while (isspace(*++bg));
-   ui_browser__colorsets[i].bg = bg;
-   ui_browser__colorsets[i].fg = fg;
return 0;
}
 
-   free(fg);
+   free(fore_back_colors);
return -1;
 }
 
@@ -743,8 +731,17 @@ void ui_browser__init(void)
perf_config(ui_browser__color_config, NULL);
 
while (ui_browser__colorsets[i].name) {
+   char *fore_back_colors, *fg, *bg;
struct ui_browser_colorset *c = _browser__colorsets[i++];
-   sltt_set_color(c->colorset, c->name, 

[PATCH v5 6/7] perf config: Add 'annotate' section default configs arrrays

2016-07-05 Thread Taeung Song
Actual variable for configs of 'annotate' section is like below.

(at ui/browsers/annoate.c)
static struct annotate_browser_opt {
   bool hide_src_code,
use_offset,
jump_arrows,
show_linenr,
show_nr_jumps,
show_total_period;
} annotate_browser__opts = {
   .use_offset  = true,
   .jump_arrows = true,
};

But I suggest using 'annoate' default config array that
have all default config key-value pairs for 'annotate' section

In near future, this arrays will be used on ui/browsers/annoate.c
because of setting default actual variable for 'annotate' config.

Cc: Namhyung Kim 
Cc: Jiri Olsa 
Cc: Masami Hiramatsu 
Cc: Wang Nan 
Cc: Alexander Shishkin 
Signed-off-by: Taeung Song 
---
 tools/perf/util/config.c | 11 +++
 tools/perf/util/config.h | 11 +++
 2 files changed, 22 insertions(+)

diff --git a/tools/perf/util/config.c b/tools/perf/util/config.c
index a0c0170..d8d5415 100644
--- a/tools/perf/util/config.c
+++ b/tools/perf/util/config.c
@@ -41,8 +41,19 @@ const struct default_config_item colors_config_items[] = {
CONF_END()
 };
 
+const struct default_config_item annotate_config_items[] = {
+   CONF_BOOL_VAR("hide_src_code", false),
+   CONF_BOOL_VAR("use_offset", true),
+   CONF_BOOL_VAR("jump_arrows", true),
+   CONF_BOOL_VAR("show_nr_jumps", false),
+   CONF_BOOL_VAR("show_linenr", false),
+   CONF_BOOL_VAR("show_total_period", false),
+   CONF_END()
+};
+
 const struct default_config_section default_sections[] = {
{ .name = "colors", .items = colors_config_items },
+   { .name = "annotate", .items = annotate_config_items },
 };
 
 static int get_next_char(void)
diff --git a/tools/perf/util/config.h b/tools/perf/util/config.h
index f2ce8b4..470c93a 100644
--- a/tools/perf/util/config.h
+++ b/tools/perf/util/config.h
@@ -75,6 +75,7 @@ enum perf_config_type {
 
 enum config_section_idx {
CONFIG_COLORS,
+   CONFIG_ANNOTATE,
 };
 
 enum colors_config_items_idx {
@@ -87,6 +88,15 @@ enum colors_config_items_idx {
CONFIG_COLORS_ROOT,
 };
 
+enum annotate_config_items_idx {
+   CONFIG_ANNOTATE_HIDE_SRC_CODE,
+   CONFIG_ANNOTATE_USE_OFFSET,
+   CONFIG_ANNOTATE_JUMP_ARROWS,
+   CONFIG_ANNOTATE_SHOW_NR_JUMPS,
+   CONFIG_ANNOTATE_SHOW_LINENR,
+   CONFIG_ANNOTATE_SHOW_TOTAL_PERIOD,
+};
+
 struct default_config_item {
const char *name;
union {
@@ -127,5 +137,6 @@ struct default_config_section {
{ .name = NULL }
 
 extern const struct default_config_item colors_config_items[];
+extern const struct default_config_item annotate_config_items[];
 
 #endif /* __PERF_CONFIG_H */
-- 
2.5.0



[PATCH v5 5/7] perf config: Initialize ui_browser__colorsets with default config items

2016-07-05 Thread Taeung Song
Set default config values for 'colors' section with 'colors_config_items[]'
instead of actual const char * type values.
(e.g. using colors_config_item[CONFIG_COLORS_TOP].value
instead of "red, default" string value for 'colors.top')

Cc: Namhyung Kim 
Cc: Jiri Olsa 
Cc: Masami Hiramatsu 
Cc: Wang Nan 
Cc: Alexander Shishkin 
Signed-off-by: Taeung Song 
---
 tools/perf/ui/browser.c | 53 +
 1 file changed, 32 insertions(+), 21 deletions(-)

diff --git a/tools/perf/ui/browser.c b/tools/perf/ui/browser.c
index b4e21d1..380abab 100644
--- a/tools/perf/ui/browser.c
+++ b/tools/perf/ui/browser.c
@@ -507,39 +507,32 @@ static struct ui_browser_colorset {
int colorset;
 } ui_browser__colorsets[] = {
{
-   .colorset  = HE_COLORSET_TOP,
-   .name  = "top",
-   .fore_back_colors = "red, default",
+   .colorset = HE_COLORSET_TOP,
+   .name = "top",
},
{
-   .colorset  = HE_COLORSET_MEDIUM,
-   .name  = "medium",
-   .fore_back_colors = "green, default",
+   .colorset = HE_COLORSET_MEDIUM,
+   .name = "medium",
},
{
-   .colorset  = HE_COLORSET_NORMAL,
-   .name  = "normal",
-   .fore_back_colors = "default, default",
+   .colorset = HE_COLORSET_NORMAL,
+   .name = "normal",
},
{
-   .colorset  = HE_COLORSET_SELECTED,
-   .name  = "selected",
-   .fore_back_colors = "black, yellow",
+   .colorset = HE_COLORSET_SELECTED,
+   .name = "selected",
},
{
-   .colorset  = HE_COLORSET_JUMP_ARROWS,
-   .name  = "jump_arrows",
-   .fore_back_colors = "blue, default",
+   .colorset = HE_COLORSET_JUMP_ARROWS,
+   .name = "jump_arrows",
},
{
-   .colorset  = HE_COLORSET_ADDR,
-   .name  = "addr",
-   .fore_back_colors = "magenta, default",
+   .colorset = HE_COLORSET_ADDR,
+   .name = "addr",
},
{
-   .colorset  = HE_COLORSET_ROOT,
-   .name  = "root",
-   .fore_back_colors = "white, blue",
+   .colorset = HE_COLORSET_ROOT,
+   .name = "root",
},
{
.name = NULL,
@@ -724,10 +717,28 @@ void __ui_browser__line_arrow(struct ui_browser *browser, 
unsigned int column,
__ui_browser__line_arrow_down(browser, column, start, end);
 }
 
+static void default_colors_config_init(void)
+{
+   int i, j;
+
+   for (i = 0; ui_browser__colorsets[i].name != NULL; ++i) {
+   const char *name = ui_browser__colorsets[i].name;
+
+   for (j = 0; colors_config_items[j].name != NULL; j++) {
+   if (!strcmp(name, colors_config_items[j].name)) {
+   ui_browser__colorsets[i].fore_back_colors =
+   colors_config_items[j].value.s;
+   break;
+   }
+   }
+   }
+}
+
 void ui_browser__init(void)
 {
int i = 0;
 
+   default_colors_config_init();
perf_config(ui_browser__color_config, NULL);
 
while (ui_browser__colorsets[i].name) {
-- 
2.5.0



[PATCH v5 1/7] perf config: Introduce default_config_section and default_config_item for default config key-value pairs

2016-07-05 Thread Taeung Song
When initializing default perf config values,
we currently use values of actual type(int, bool, char *, etc.).

For example,
If there isn't user config value at ~/.perfconfig
for 'annotate.use_offset' config variable,
default value for it is 'true' bool type value in perf like below.

At ui/browsers/annoate.c

static struct annotate_browser_opt {
   bool hide_src_code,
use_offset,
jump_arrows,
show_linenr,
show_nr_jumps,
show_total_period;
} annotate_browser__opts = {
   .use_offset  = true,
   .jump_arrows = true,
};

But I suggest using 'struct default_config_section' and
'struct default_config_item' that have default config key-value pairs
in order to initialize default config values with them, in near future.

If we do, we could manage default perf config values at one spot (i.e. 
util/config.c)
with default config arrays and it could be easy and simple to modify
default config values or add new configs.

Cc: Namhyung Kim 
Cc: Jiri Olsa 
Cc: Wang Nan 
Cc: Peter Zijlstra 
Cc: Ingo Molnar 
Cc: Masami Hiramatsu 
Cc: Alexander Shishkin 
Signed-off-by: Taeung Song 
---
 tools/perf/util/config.h | 29 +
 1 file changed, 29 insertions(+)

diff --git a/tools/perf/util/config.h b/tools/perf/util/config.h
index 6f813d4..1fd8e4c 100644
--- a/tools/perf/util/config.h
+++ b/tools/perf/util/config.h
@@ -63,4 +63,33 @@ void perf_config__refresh(void);
perf_config_sections__for_each_entry(>sections, section)   
\
perf_config_items__for_each_entry(>items, item)
 
+enum perf_config_type {
+   CONFIG_TYPE_BOOL,
+   CONFIG_TYPE_INT,
+   CONFIG_TYPE_LONG,
+   CONFIG_TYPE_U64,
+   CONFIG_TYPE_FLOAT,
+   CONFIG_TYPE_DOUBLE,
+   CONFIG_TYPE_STRING
+};
+
+struct default_config_item {
+   const char *name;
+   union {
+   bool b;
+   int i;
+   u32 l;
+   u64 ll;
+   float f;
+   double d;
+   const char *s;
+   } value;
+   enum perf_config_type type;
+};
+
+struct default_config_section {
+   const char *name;
+   const struct default_config_item *items;
+};
+
 #endif /* __PERF_CONFIG_H */
-- 
2.5.0



[RFC PATCH v5 0/7] perf config: Introduce default config key-value pairs arrays

2016-07-05 Thread Taeung Song
Hello, :)

When initializing default perf config values,
we currently use values of actual type(int, bool, char *, etc.).
But I suggest using default config key-value pairs arrays.

For example,
If there isn't user config value at ~/.perfconfig for 'annotate.use_offset' 
config variable,
default value for it is 'true' bool type value in perf like below.

At ui/browsers/annoate.c

static struct annotate_browser_opt {
   bool hide_src_code,
use_offset,
jump_arrows,
show_linenr,
show_nr_jumps,
show_total_period;
} annotate_browser__opts = {
   .use_offset  = true,
   .jump_arrows = true,
};

But if we use new config arrays that have all default config key-value pairs,
we could initialize default config values with them.

If we do, we can manage default perf config values at one spot (like 
util/config.c)
and It can be easy and simple to modify default config values or add new 
configs.

For example,
If we use new default config arrays and there isn't user config value for 
'annoate.use_offset'
default value for it will be set as 
annotate_config_items[CONFIG_ANNOATE_USE_OFFSET].value
instead of actual boolean type value 'true'.

IMHO, I think it would needed to use new default config arrays
to manage default perf config values more effectively.
And this pathset contains patchs for only 'colors' and 'annoate' section
because waiting for other opinions.

If you review this patchset, I'd appreciate it :-)

Thanks,
Taeung

v5:
- rebased on current acme/perf/core

v4:
- rename 'fb_ground' to 'fore_back_colors' (Namhyung)
- add struct default_config_section
- split first patch[PATCH 1/7] as two
- remove perf_default_config_init() at perf.c
- rebased on current acme/perf/core

v3:
- remove default config arrays for the rest sections except 'colors' and 
'annotate'
- use combined {fore, back}ground colors instead of each two color
- introduce perf_default_config_init() that call all default_*_config_init()
  for each config section

v2:
- rename 'ui_browser__config_gcolors' to 'ui_browser__config_colors' (Arnaldo)
- change 'ground colors' to '{back, fore}ground colors' (Arnaldo)
- use strtok + ltrim instead of strchr and while (isspace(*++bg)); (Arnaldo)

Taeung Song (7):
  perf config: Introduce default_config_section and default_config_item
for default config key-value pairs
  perf config: Add macros assigning key-value pairs to
default_config_item
  perf config: Add 'colors' section default configs arrrays
  perf config: Use combined {fore,back}ground colors value instead of
each two color
  perf config: Initialize ui_browser__colorsets with default config
items
  perf config: Add 'annotate' section default configs arrrays
  perf config: Initialize annotate_browser__opts with default config
items

 tools/perf/ui/browser.c   | 64 +-
 tools/perf/ui/browsers/annotate.c | 16 ++--
 tools/perf/util/config.c  | 26 +
 tools/perf/util/config.h  | 82 +++
 4 files changed, 156 insertions(+), 32 deletions(-)

-- 
2.5.0



[PATCH v5 3/7] perf config: Add 'colors' section default configs arrrays

2016-07-05 Thread Taeung Song
Actual variable for configs of 'colors' section is like below.

(at ui/browser.c)
static struct ui_browser_colorset {
const char *name, *fg, *bg;
int colorset;
} ui_browser__colorsets[] = {
{
.colorset = HE_COLORSET_TOP,
.name = "top",
.fg   = "red",
.bg   = "default",
},
...

But I suggest using 'colors' default config array that
have all default config key-value pairs for 'colors' section

In near future, this array will be used on ui/browser.c
because of setting default actual variable for 'colors' config.

Cc: Namhyung Kim 
Cc: Jiri Olsa 
Cc: Masami Hiramatsu 
Cc: Wang Nan 
Cc: Alexander Shishkin 
Signed-off-by: Taeung Song 
---
 tools/perf/util/config.c | 15 +++
 tools/perf/util/config.h | 16 
 2 files changed, 31 insertions(+)

diff --git a/tools/perf/util/config.c b/tools/perf/util/config.c
index 18dae74..a0c0170 100644
--- a/tools/perf/util/config.c
+++ b/tools/perf/util/config.c
@@ -30,6 +30,21 @@ static struct perf_config_set *config_set;
 
 const char *config_exclusive_filename;
 
+const struct default_config_item colors_config_items[] = {
+   CONF_STR_VAR("top", "red, default"),
+   CONF_STR_VAR("medium", "green, default"),
+   CONF_STR_VAR("normal", "default, default"),
+   CONF_STR_VAR("selected", "black, yellow"),
+   CONF_STR_VAR("jump_arrows", "blue, default"),
+   CONF_STR_VAR("addr", "magenta, default"),
+   CONF_STR_VAR("root", "white, blue"),
+   CONF_END()
+};
+
+const struct default_config_section default_sections[] = {
+   { .name = "colors", .items = colors_config_items },
+};
+
 static int get_next_char(void)
 {
int c;
diff --git a/tools/perf/util/config.h b/tools/perf/util/config.h
index 613900f..f2ce8b4 100644
--- a/tools/perf/util/config.h
+++ b/tools/perf/util/config.h
@@ -73,6 +73,20 @@ enum perf_config_type {
CONFIG_TYPE_STRING
 };
 
+enum config_section_idx {
+   CONFIG_COLORS,
+};
+
+enum colors_config_items_idx {
+   CONFIG_COLORS_TOP,
+   CONFIG_COLORS_MEDIUM,
+   CONFIG_COLORS_NORMAL,
+   CONFIG_COLORS_SELECTED,
+   CONFIG_COLORS_JUMP_ARROWS,
+   CONFIG_COLORS_ADDR,
+   CONFIG_COLORS_ROOT,
+};
+
 struct default_config_item {
const char *name;
union {
@@ -112,4 +126,6 @@ struct default_config_section {
 #define CONF_END() \
{ .name = NULL }
 
+extern const struct default_config_item colors_config_items[];
+
 #endif /* __PERF_CONFIG_H */
-- 
2.5.0



[PATCH v5 1/7] perf config: Introduce default_config_section and default_config_item for default config key-value pairs

2016-07-05 Thread Taeung Song
When initializing default perf config values,
we currently use values of actual type(int, bool, char *, etc.).

For example,
If there isn't user config value at ~/.perfconfig
for 'annotate.use_offset' config variable,
default value for it is 'true' bool type value in perf like below.

At ui/browsers/annoate.c

static struct annotate_browser_opt {
   bool hide_src_code,
use_offset,
jump_arrows,
show_linenr,
show_nr_jumps,
show_total_period;
} annotate_browser__opts = {
   .use_offset  = true,
   .jump_arrows = true,
};

But I suggest using 'struct default_config_section' and
'struct default_config_item' that have default config key-value pairs
in order to initialize default config values with them, in near future.

If we do, we could manage default perf config values at one spot (i.e. 
util/config.c)
with default config arrays and it could be easy and simple to modify
default config values or add new configs.

Cc: Namhyung Kim 
Cc: Jiri Olsa 
Cc: Wang Nan 
Cc: Peter Zijlstra 
Cc: Ingo Molnar 
Cc: Masami Hiramatsu 
Cc: Alexander Shishkin 
Signed-off-by: Taeung Song 
---
 tools/perf/util/config.h | 29 +
 1 file changed, 29 insertions(+)

diff --git a/tools/perf/util/config.h b/tools/perf/util/config.h
index 6f813d4..1fd8e4c 100644
--- a/tools/perf/util/config.h
+++ b/tools/perf/util/config.h
@@ -63,4 +63,33 @@ void perf_config__refresh(void);
perf_config_sections__for_each_entry(>sections, section)   
\
perf_config_items__for_each_entry(>items, item)
 
+enum perf_config_type {
+   CONFIG_TYPE_BOOL,
+   CONFIG_TYPE_INT,
+   CONFIG_TYPE_LONG,
+   CONFIG_TYPE_U64,
+   CONFIG_TYPE_FLOAT,
+   CONFIG_TYPE_DOUBLE,
+   CONFIG_TYPE_STRING
+};
+
+struct default_config_item {
+   const char *name;
+   union {
+   bool b;
+   int i;
+   u32 l;
+   u64 ll;
+   float f;
+   double d;
+   const char *s;
+   } value;
+   enum perf_config_type type;
+};
+
+struct default_config_section {
+   const char *name;
+   const struct default_config_item *items;
+};
+
 #endif /* __PERF_CONFIG_H */
-- 
2.5.0



[RFC PATCH v5 0/7] perf config: Introduce default config key-value pairs arrays

2016-07-05 Thread Taeung Song
Hello, :)

When initializing default perf config values,
we currently use values of actual type(int, bool, char *, etc.).
But I suggest using default config key-value pairs arrays.

For example,
If there isn't user config value at ~/.perfconfig for 'annotate.use_offset' 
config variable,
default value for it is 'true' bool type value in perf like below.

At ui/browsers/annoate.c

static struct annotate_browser_opt {
   bool hide_src_code,
use_offset,
jump_arrows,
show_linenr,
show_nr_jumps,
show_total_period;
} annotate_browser__opts = {
   .use_offset  = true,
   .jump_arrows = true,
};

But if we use new config arrays that have all default config key-value pairs,
we could initialize default config values with them.

If we do, we can manage default perf config values at one spot (like 
util/config.c)
and It can be easy and simple to modify default config values or add new 
configs.

For example,
If we use new default config arrays and there isn't user config value for 
'annoate.use_offset'
default value for it will be set as 
annotate_config_items[CONFIG_ANNOATE_USE_OFFSET].value
instead of actual boolean type value 'true'.

IMHO, I think it would needed to use new default config arrays
to manage default perf config values more effectively.
And this pathset contains patchs for only 'colors' and 'annoate' section
because waiting for other opinions.

If you review this patchset, I'd appreciate it :-)

Thanks,
Taeung

v5:
- rebased on current acme/perf/core

v4:
- rename 'fb_ground' to 'fore_back_colors' (Namhyung)
- add struct default_config_section
- split first patch[PATCH 1/7] as two
- remove perf_default_config_init() at perf.c
- rebased on current acme/perf/core

v3:
- remove default config arrays for the rest sections except 'colors' and 
'annotate'
- use combined {fore, back}ground colors instead of each two color
- introduce perf_default_config_init() that call all default_*_config_init()
  for each config section

v2:
- rename 'ui_browser__config_gcolors' to 'ui_browser__config_colors' (Arnaldo)
- change 'ground colors' to '{back, fore}ground colors' (Arnaldo)
- use strtok + ltrim instead of strchr and while (isspace(*++bg)); (Arnaldo)

Taeung Song (7):
  perf config: Introduce default_config_section and default_config_item
for default config key-value pairs
  perf config: Add macros assigning key-value pairs to
default_config_item
  perf config: Add 'colors' section default configs arrrays
  perf config: Use combined {fore,back}ground colors value instead of
each two color
  perf config: Initialize ui_browser__colorsets with default config
items
  perf config: Add 'annotate' section default configs arrrays
  perf config: Initialize annotate_browser__opts with default config
items

 tools/perf/ui/browser.c   | 64 +-
 tools/perf/ui/browsers/annotate.c | 16 ++--
 tools/perf/util/config.c  | 26 +
 tools/perf/util/config.h  | 82 +++
 4 files changed, 156 insertions(+), 32 deletions(-)

-- 
2.5.0



[PATCH v5 3/7] perf config: Add 'colors' section default configs arrrays

2016-07-05 Thread Taeung Song
Actual variable for configs of 'colors' section is like below.

(at ui/browser.c)
static struct ui_browser_colorset {
const char *name, *fg, *bg;
int colorset;
} ui_browser__colorsets[] = {
{
.colorset = HE_COLORSET_TOP,
.name = "top",
.fg   = "red",
.bg   = "default",
},
...

But I suggest using 'colors' default config array that
have all default config key-value pairs for 'colors' section

In near future, this array will be used on ui/browser.c
because of setting default actual variable for 'colors' config.

Cc: Namhyung Kim 
Cc: Jiri Olsa 
Cc: Masami Hiramatsu 
Cc: Wang Nan 
Cc: Alexander Shishkin 
Signed-off-by: Taeung Song 
---
 tools/perf/util/config.c | 15 +++
 tools/perf/util/config.h | 16 
 2 files changed, 31 insertions(+)

diff --git a/tools/perf/util/config.c b/tools/perf/util/config.c
index 18dae74..a0c0170 100644
--- a/tools/perf/util/config.c
+++ b/tools/perf/util/config.c
@@ -30,6 +30,21 @@ static struct perf_config_set *config_set;
 
 const char *config_exclusive_filename;
 
+const struct default_config_item colors_config_items[] = {
+   CONF_STR_VAR("top", "red, default"),
+   CONF_STR_VAR("medium", "green, default"),
+   CONF_STR_VAR("normal", "default, default"),
+   CONF_STR_VAR("selected", "black, yellow"),
+   CONF_STR_VAR("jump_arrows", "blue, default"),
+   CONF_STR_VAR("addr", "magenta, default"),
+   CONF_STR_VAR("root", "white, blue"),
+   CONF_END()
+};
+
+const struct default_config_section default_sections[] = {
+   { .name = "colors", .items = colors_config_items },
+};
+
 static int get_next_char(void)
 {
int c;
diff --git a/tools/perf/util/config.h b/tools/perf/util/config.h
index 613900f..f2ce8b4 100644
--- a/tools/perf/util/config.h
+++ b/tools/perf/util/config.h
@@ -73,6 +73,20 @@ enum perf_config_type {
CONFIG_TYPE_STRING
 };
 
+enum config_section_idx {
+   CONFIG_COLORS,
+};
+
+enum colors_config_items_idx {
+   CONFIG_COLORS_TOP,
+   CONFIG_COLORS_MEDIUM,
+   CONFIG_COLORS_NORMAL,
+   CONFIG_COLORS_SELECTED,
+   CONFIG_COLORS_JUMP_ARROWS,
+   CONFIG_COLORS_ADDR,
+   CONFIG_COLORS_ROOT,
+};
+
 struct default_config_item {
const char *name;
union {
@@ -112,4 +126,6 @@ struct default_config_section {
 #define CONF_END() \
{ .name = NULL }
 
+extern const struct default_config_item colors_config_items[];
+
 #endif /* __PERF_CONFIG_H */
-- 
2.5.0



Re: [PATCH v7 4/4] soc: mediatek: Add MT2701 scpsys driver

2016-07-05 Thread James Liao
On Sat, 2016-07-02 at 18:41 +0200, Matthias Brugger wrote:
> 
> On 05/16/2016 11:28 AM, James Liao wrote:
> > From: Shunli Wang 
> >
> > Add scpsys driver for MT2701.
> >
> > mtk-scpsys now supports MT8173 (arm64) and MT2701 (arm). So it should
> > be enabled on both arm64 and arm platforms.
> >
> > Signed-off-by: Shunli Wang 
> > Signed-off-by: James Liao 
> > Reviewed-by: Kevin Hilman 
> > ---
> >  drivers/soc/mediatek/Kconfig  |   2 +-
> >  drivers/soc/mediatek/mtk-scpsys.c | 117 
> > +-
> >  2 files changed, 116 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/soc/mediatek/Kconfig b/drivers/soc/mediatek/Kconfig
> > index 0a4ea80..609bb34 100644
> > --- a/drivers/soc/mediatek/Kconfig
> > +++ b/drivers/soc/mediatek/Kconfig
> > @@ -23,7 +23,7 @@ config MTK_PMIC_WRAP
> >  config MTK_SCPSYS
> > bool "MediaTek SCPSYS Support"
> > depends on ARCH_MEDIATEK || COMPILE_TEST
> > -   default ARM64 && ARCH_MEDIATEK
> > +   default ARCH_MEDIATEK
> > select REGMAP
> > select MTK_INFRACFG
> > select PM_GENERIC_DOMAINS if PM
> > diff --git a/drivers/soc/mediatek/mtk-scpsys.c 
> > b/drivers/soc/mediatek/mtk-scpsys.c
> > index 00c0adb..f4d1230 100644
> > --- a/drivers/soc/mediatek/mtk-scpsys.c
> > +++ b/drivers/soc/mediatek/mtk-scpsys.c
> > @@ -20,6 +20,7 @@
> >  #include 
> >  #include 
> >
> > +#include 
> >  #include 
> >
> >  #define SPM_VDE_PWR_CON0x0210
> > @@ -27,8 +28,13 @@
> >  #define SPM_VEN_PWR_CON0x0230
> >  #define SPM_ISP_PWR_CON0x0238
> >  #define SPM_DIS_PWR_CON0x023c
> > +#define SPM_CONN_PWR_CON   0x0280
> >  #define SPM_VEN2_PWR_CON   0x0298
> > -#define SPM_AUDIO_PWR_CON  0x029c
> > +#define SPM_AUDIO_PWR_CON  0x029c  /* MT8173 */
> > +#define SPM_BDP_PWR_CON0x029c  /* MT2701 */
> > +#define SPM_ETH_PWR_CON0x02a0
> > +#define SPM_HIF_PWR_CON0x02a4
> > +#define SPM_IFR_MSC_PWR_CON0x02a8
> >  #define SPM_MFG_2D_PWR_CON 0x02c0
> >  #define SPM_MFG_ASYNC_PWR_CON  0x02c4
> >  #define SPM_USB_PWR_CON0x02cc
> > @@ -42,10 +48,15 @@
> >  #define PWR_ON_2ND_BIT BIT(3)
> >  #define PWR_CLK_DIS_BITBIT(4)
> >
> > +#define PWR_STATUS_CONNBIT(1)
> >  #define PWR_STATUS_DISPBIT(3)
> >  #define PWR_STATUS_MFG BIT(4)
> >  #define PWR_STATUS_ISP BIT(5)
> >  #define PWR_STATUS_VDECBIT(7)
> > +#define PWR_STATUS_BDP BIT(14)
> > +#define PWR_STATUS_ETH BIT(15)
> > +#define PWR_STATUS_HIF BIT(16)
> > +#define PWR_STATUS_IFR_MSC BIT(17)
> >  #define PWR_STATUS_VENC_LT BIT(20)
> >  #define PWR_STATUS_VENCBIT(21)
> >  #define PWR_STATUS_MFG_2D  BIT(22)
> > @@ -59,6 +70,7 @@ enum clk_id {
> > CLK_MFG,
> > CLK_VENC,
> > CLK_VENC_LT,
> > +   CLK_ETHIF,
> > CLK_MAX,
> >  };
> >
> > @@ -321,7 +333,8 @@ static void init_clks(struct platform_device *pdev, 
> > struct clk *clk[CLK_MAX])
> > CLK_MM,
> > CLK_MFG,
> > CLK_VENC,
> > -   CLK_VENC_LT
> > +   CLK_VENC_LT,
> > +   CLK_ETHIF
> > };
> >
> > static const char * const clk_names[] = {
> > @@ -329,6 +342,7 @@ static void init_clks(struct platform_device *pdev, 
> > struct clk *clk[CLK_MAX])
> > "mfg",
> > "venc",
> > "venc_lt",
> > +   "ethif",
> > };
> >
> > int i;
> > @@ -459,6 +473,102 @@ static void mtk_register_power_domains(struct 
> > platform_device *pdev,
> >  }
> >
> >  /*
> > + * MT2701 power domain support
> > + */
> > +
> > +static const struct scp_domain_data scp_domain_data_mt2701[] = {
> > +   [MT2701_POWER_DOMAIN_CONN] = {
> > +   .name = "conn",
> > +   .sta_mask = PWR_STATUS_CONN,
> > +   .ctl_offs = SPM_CONN_PWR_CON,
> > +   .bus_prot_mask = 0x0104,
> > +   .active_wakeup = true,
> 
> .clk_id = {CLK_NONE},
> 
> > +   },
> > +   [MT2701_POWER_DOMAIN_DISP] = {
> > +   .name = "disp",
> > +   .sta_mask = PWR_STATUS_DISP,
> > +   .ctl_offs = SPM_DIS_PWR_CON,
> > +   .sram_pdn_bits = GENMASK(11, 8),
> > +   .clk_id = {CLK_MM},
> > +   .bus_prot_mask = 0x0002,
> > +   .active_wakeup = true,
> > +   },
> > +   [MT2701_POWER_DOMAIN_MFG] = {
> > +   .name = "mfg",
> > +   .sta_mask = PWR_STATUS_MFG,
> > +   .ctl_offs = SPM_MFG_PWR_CON,
> > +   .sram_pdn_bits = GENMASK(11, 8),
> > +   .sram_pdn_ack_bits = GENMASK(12, 12),
> > +   .clk_id = {CLK_MFG},
> > +  

Re: [PATCH v7 4/4] soc: mediatek: Add MT2701 scpsys driver

2016-07-05 Thread James Liao
On Sat, 2016-07-02 at 18:41 +0200, Matthias Brugger wrote:
> 
> On 05/16/2016 11:28 AM, James Liao wrote:
> > From: Shunli Wang 
> >
> > Add scpsys driver for MT2701.
> >
> > mtk-scpsys now supports MT8173 (arm64) and MT2701 (arm). So it should
> > be enabled on both arm64 and arm platforms.
> >
> > Signed-off-by: Shunli Wang 
> > Signed-off-by: James Liao 
> > Reviewed-by: Kevin Hilman 
> > ---
> >  drivers/soc/mediatek/Kconfig  |   2 +-
> >  drivers/soc/mediatek/mtk-scpsys.c | 117 
> > +-
> >  2 files changed, 116 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/soc/mediatek/Kconfig b/drivers/soc/mediatek/Kconfig
> > index 0a4ea80..609bb34 100644
> > --- a/drivers/soc/mediatek/Kconfig
> > +++ b/drivers/soc/mediatek/Kconfig
> > @@ -23,7 +23,7 @@ config MTK_PMIC_WRAP
> >  config MTK_SCPSYS
> > bool "MediaTek SCPSYS Support"
> > depends on ARCH_MEDIATEK || COMPILE_TEST
> > -   default ARM64 && ARCH_MEDIATEK
> > +   default ARCH_MEDIATEK
> > select REGMAP
> > select MTK_INFRACFG
> > select PM_GENERIC_DOMAINS if PM
> > diff --git a/drivers/soc/mediatek/mtk-scpsys.c 
> > b/drivers/soc/mediatek/mtk-scpsys.c
> > index 00c0adb..f4d1230 100644
> > --- a/drivers/soc/mediatek/mtk-scpsys.c
> > +++ b/drivers/soc/mediatek/mtk-scpsys.c
> > @@ -20,6 +20,7 @@
> >  #include 
> >  #include 
> >
> > +#include 
> >  #include 
> >
> >  #define SPM_VDE_PWR_CON0x0210
> > @@ -27,8 +28,13 @@
> >  #define SPM_VEN_PWR_CON0x0230
> >  #define SPM_ISP_PWR_CON0x0238
> >  #define SPM_DIS_PWR_CON0x023c
> > +#define SPM_CONN_PWR_CON   0x0280
> >  #define SPM_VEN2_PWR_CON   0x0298
> > -#define SPM_AUDIO_PWR_CON  0x029c
> > +#define SPM_AUDIO_PWR_CON  0x029c  /* MT8173 */
> > +#define SPM_BDP_PWR_CON0x029c  /* MT2701 */
> > +#define SPM_ETH_PWR_CON0x02a0
> > +#define SPM_HIF_PWR_CON0x02a4
> > +#define SPM_IFR_MSC_PWR_CON0x02a8
> >  #define SPM_MFG_2D_PWR_CON 0x02c0
> >  #define SPM_MFG_ASYNC_PWR_CON  0x02c4
> >  #define SPM_USB_PWR_CON0x02cc
> > @@ -42,10 +48,15 @@
> >  #define PWR_ON_2ND_BIT BIT(3)
> >  #define PWR_CLK_DIS_BITBIT(4)
> >
> > +#define PWR_STATUS_CONNBIT(1)
> >  #define PWR_STATUS_DISPBIT(3)
> >  #define PWR_STATUS_MFG BIT(4)
> >  #define PWR_STATUS_ISP BIT(5)
> >  #define PWR_STATUS_VDECBIT(7)
> > +#define PWR_STATUS_BDP BIT(14)
> > +#define PWR_STATUS_ETH BIT(15)
> > +#define PWR_STATUS_HIF BIT(16)
> > +#define PWR_STATUS_IFR_MSC BIT(17)
> >  #define PWR_STATUS_VENC_LT BIT(20)
> >  #define PWR_STATUS_VENCBIT(21)
> >  #define PWR_STATUS_MFG_2D  BIT(22)
> > @@ -59,6 +70,7 @@ enum clk_id {
> > CLK_MFG,
> > CLK_VENC,
> > CLK_VENC_LT,
> > +   CLK_ETHIF,
> > CLK_MAX,
> >  };
> >
> > @@ -321,7 +333,8 @@ static void init_clks(struct platform_device *pdev, 
> > struct clk *clk[CLK_MAX])
> > CLK_MM,
> > CLK_MFG,
> > CLK_VENC,
> > -   CLK_VENC_LT
> > +   CLK_VENC_LT,
> > +   CLK_ETHIF
> > };
> >
> > static const char * const clk_names[] = {
> > @@ -329,6 +342,7 @@ static void init_clks(struct platform_device *pdev, 
> > struct clk *clk[CLK_MAX])
> > "mfg",
> > "venc",
> > "venc_lt",
> > +   "ethif",
> > };
> >
> > int i;
> > @@ -459,6 +473,102 @@ static void mtk_register_power_domains(struct 
> > platform_device *pdev,
> >  }
> >
> >  /*
> > + * MT2701 power domain support
> > + */
> > +
> > +static const struct scp_domain_data scp_domain_data_mt2701[] = {
> > +   [MT2701_POWER_DOMAIN_CONN] = {
> > +   .name = "conn",
> > +   .sta_mask = PWR_STATUS_CONN,
> > +   .ctl_offs = SPM_CONN_PWR_CON,
> > +   .bus_prot_mask = 0x0104,
> > +   .active_wakeup = true,
> 
> .clk_id = {CLK_NONE},
> 
> > +   },
> > +   [MT2701_POWER_DOMAIN_DISP] = {
> > +   .name = "disp",
> > +   .sta_mask = PWR_STATUS_DISP,
> > +   .ctl_offs = SPM_DIS_PWR_CON,
> > +   .sram_pdn_bits = GENMASK(11, 8),
> > +   .clk_id = {CLK_MM},
> > +   .bus_prot_mask = 0x0002,
> > +   .active_wakeup = true,
> > +   },
> > +   [MT2701_POWER_DOMAIN_MFG] = {
> > +   .name = "mfg",
> > +   .sta_mask = PWR_STATUS_MFG,
> > +   .ctl_offs = SPM_MFG_PWR_CON,
> > +   .sram_pdn_bits = GENMASK(11, 8),
> > +   .sram_pdn_ack_bits = GENMASK(12, 12),
> > +   .clk_id = {CLK_MFG},
> > +   .active_wakeup = true,
> > +   },
> > +   [MT2701_POWER_DOMAIN_VDEC] = {
> > +   .name = 

[PATCH v2 2/3] arm/xen: add support for vm_assist hypercall

2016-07-05 Thread Juergen Gross
Add support for the Xen HYPERVISOR_vm_assist hypercall.

Signed-off-by: Juergen Gross 
Reviewed-by: Boris Ostrovsky 
Reviewed-by: Julien Grall 
Reviewed-by: Stefano Stabellini 
---
 arch/arm/include/asm/xen/hypercall.h | 1 +
 arch/arm/xen/enlighten.c | 1 +
 arch/arm/xen/hypercall.S | 1 +
 arch/arm64/xen/hypercall.S   | 1 +
 4 files changed, 4 insertions(+)

diff --git a/arch/arm/include/asm/xen/hypercall.h 
b/arch/arm/include/asm/xen/hypercall.h
index b6b962d..9d874db 100644
--- a/arch/arm/include/asm/xen/hypercall.h
+++ b/arch/arm/include/asm/xen/hypercall.h
@@ -52,6 +52,7 @@ int HYPERVISOR_memory_op(unsigned int cmd, void *arg);
 int HYPERVISOR_physdev_op(int cmd, void *arg);
 int HYPERVISOR_vcpu_op(int cmd, int vcpuid, void *extra_args);
 int HYPERVISOR_tmem_op(void *arg);
+int HYPERVISOR_vm_assist(unsigned int cmd, unsigned int type);
 int HYPERVISOR_platform_op_raw(void *arg);
 static inline int HYPERVISOR_platform_op(struct xen_platform_op *op)
 {
diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index 71db30c..0f3aa12 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -389,4 +389,5 @@ EXPORT_SYMBOL_GPL(HYPERVISOR_vcpu_op);
 EXPORT_SYMBOL_GPL(HYPERVISOR_tmem_op);
 EXPORT_SYMBOL_GPL(HYPERVISOR_platform_op);
 EXPORT_SYMBOL_GPL(HYPERVISOR_multicall);
+EXPORT_SYMBOL_GPL(HYPERVISOR_vm_assist);
 EXPORT_SYMBOL_GPL(privcmd_call);
diff --git a/arch/arm/xen/hypercall.S b/arch/arm/xen/hypercall.S
index 9a36f4f..a648dfc 100644
--- a/arch/arm/xen/hypercall.S
+++ b/arch/arm/xen/hypercall.S
@@ -91,6 +91,7 @@ HYPERCALL3(vcpu_op);
 HYPERCALL1(tmem_op);
 HYPERCALL1(platform_op_raw);
 HYPERCALL2(multicall);
+HYPERCALL2(vm_assist);
 
 ENTRY(privcmd_call)
stmdb sp!, {r4}
diff --git a/arch/arm64/xen/hypercall.S b/arch/arm64/xen/hypercall.S
index 70df80e..329c802 100644
--- a/arch/arm64/xen/hypercall.S
+++ b/arch/arm64/xen/hypercall.S
@@ -82,6 +82,7 @@ HYPERCALL3(vcpu_op);
 HYPERCALL1(tmem_op);
 HYPERCALL1(platform_op_raw);
 HYPERCALL2(multicall);
+HYPERCALL2(vm_assist);
 
 ENTRY(privcmd_call)
mov x16, x0
-- 
2.6.6



[PATCH v2 2/3] arm/xen: add support for vm_assist hypercall

2016-07-05 Thread Juergen Gross
Add support for the Xen HYPERVISOR_vm_assist hypercall.

Signed-off-by: Juergen Gross 
Reviewed-by: Boris Ostrovsky 
Reviewed-by: Julien Grall 
Reviewed-by: Stefano Stabellini 
---
 arch/arm/include/asm/xen/hypercall.h | 1 +
 arch/arm/xen/enlighten.c | 1 +
 arch/arm/xen/hypercall.S | 1 +
 arch/arm64/xen/hypercall.S   | 1 +
 4 files changed, 4 insertions(+)

diff --git a/arch/arm/include/asm/xen/hypercall.h 
b/arch/arm/include/asm/xen/hypercall.h
index b6b962d..9d874db 100644
--- a/arch/arm/include/asm/xen/hypercall.h
+++ b/arch/arm/include/asm/xen/hypercall.h
@@ -52,6 +52,7 @@ int HYPERVISOR_memory_op(unsigned int cmd, void *arg);
 int HYPERVISOR_physdev_op(int cmd, void *arg);
 int HYPERVISOR_vcpu_op(int cmd, int vcpuid, void *extra_args);
 int HYPERVISOR_tmem_op(void *arg);
+int HYPERVISOR_vm_assist(unsigned int cmd, unsigned int type);
 int HYPERVISOR_platform_op_raw(void *arg);
 static inline int HYPERVISOR_platform_op(struct xen_platform_op *op)
 {
diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index 71db30c..0f3aa12 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -389,4 +389,5 @@ EXPORT_SYMBOL_GPL(HYPERVISOR_vcpu_op);
 EXPORT_SYMBOL_GPL(HYPERVISOR_tmem_op);
 EXPORT_SYMBOL_GPL(HYPERVISOR_platform_op);
 EXPORT_SYMBOL_GPL(HYPERVISOR_multicall);
+EXPORT_SYMBOL_GPL(HYPERVISOR_vm_assist);
 EXPORT_SYMBOL_GPL(privcmd_call);
diff --git a/arch/arm/xen/hypercall.S b/arch/arm/xen/hypercall.S
index 9a36f4f..a648dfc 100644
--- a/arch/arm/xen/hypercall.S
+++ b/arch/arm/xen/hypercall.S
@@ -91,6 +91,7 @@ HYPERCALL3(vcpu_op);
 HYPERCALL1(tmem_op);
 HYPERCALL1(platform_op_raw);
 HYPERCALL2(multicall);
+HYPERCALL2(vm_assist);
 
 ENTRY(privcmd_call)
stmdb sp!, {r4}
diff --git a/arch/arm64/xen/hypercall.S b/arch/arm64/xen/hypercall.S
index 70df80e..329c802 100644
--- a/arch/arm64/xen/hypercall.S
+++ b/arch/arm64/xen/hypercall.S
@@ -82,6 +82,7 @@ HYPERCALL3(vcpu_op);
 HYPERCALL1(tmem_op);
 HYPERCALL1(platform_op_raw);
 HYPERCALL2(multicall);
+HYPERCALL2(vm_assist);
 
 ENTRY(privcmd_call)
mov x16, x0
-- 
2.6.6



[PATCH] KVM: VMX: switch to hrtimer for TSC deadline timer when L2 guest is running

2016-07-05 Thread Haozhong Zhang
A different VMCS is loaded when L2 guest is running, so it's incorrect
to use the VMX preemption timer for L1 TSC deadline timer. This patch
switches to hrtimer for L1 TSC deadline timer when entering L2 guest,
and switches back to VMX preemption timer when nested VMEXIT from L2 to
L1.

Signed-off-by: Haozhong Zhang 
---
 arch/x86/kvm/vmx.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 85e2f0a..cc29c2a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10203,6 +10203,9 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool 
launch)
if (!vmcs02)
return -ENOMEM;
 
+   if (kvm_lapic_hv_timer_in_use(vcpu))
+   kvm_lapic_switch_to_sw_timer(vcpu);
+
enter_guest_mode(vcpu);
 
vmx->nested.vmcs01_tsc_offset = vmcs_read64(TSC_OFFSET);
@@ -10227,6 +10230,7 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool 
launch)
if (msr_entry_idx) {
leave_guest_mode(vcpu);
vmx_load_vmcs01(vcpu);
+   kvm_lapic_switch_to_hv_timer(vcpu);
nested_vmx_entry_failure(vcpu, vmcs12,
EXIT_REASON_MSR_LOAD_FAIL, msr_entry_idx);
return 1;
@@ -10700,6 +10704,7 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, 
u32 exit_reason,
nested_vmx_abort(vcpu, VMX_ABORT_SAVE_GUEST_MSR_FAIL);
 
vmx_load_vmcs01(vcpu);
+   kvm_lapic_switch_to_hv_timer(vcpu);
 
if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
&& nested_exit_intr_ack_set(vcpu)) {
-- 
2.9.0



[PATCH] KVM: VMX: switch to hrtimer for TSC deadline timer when L2 guest is running

2016-07-05 Thread Haozhong Zhang
A different VMCS is loaded when L2 guest is running, so it's incorrect
to use the VMX preemption timer for L1 TSC deadline timer. This patch
switches to hrtimer for L1 TSC deadline timer when entering L2 guest,
and switches back to VMX preemption timer when nested VMEXIT from L2 to
L1.

Signed-off-by: Haozhong Zhang 
---
 arch/x86/kvm/vmx.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 85e2f0a..cc29c2a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10203,6 +10203,9 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool 
launch)
if (!vmcs02)
return -ENOMEM;
 
+   if (kvm_lapic_hv_timer_in_use(vcpu))
+   kvm_lapic_switch_to_sw_timer(vcpu);
+
enter_guest_mode(vcpu);
 
vmx->nested.vmcs01_tsc_offset = vmcs_read64(TSC_OFFSET);
@@ -10227,6 +10230,7 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool 
launch)
if (msr_entry_idx) {
leave_guest_mode(vcpu);
vmx_load_vmcs01(vcpu);
+   kvm_lapic_switch_to_hv_timer(vcpu);
nested_vmx_entry_failure(vcpu, vmcs12,
EXIT_REASON_MSR_LOAD_FAIL, msr_entry_idx);
return 1;
@@ -10700,6 +10704,7 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, 
u32 exit_reason,
nested_vmx_abort(vcpu, VMX_ABORT_SAVE_GUEST_MSR_FAIL);
 
vmx_load_vmcs01(vcpu);
+   kvm_lapic_switch_to_hv_timer(vcpu);
 
if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
&& nested_exit_intr_ack_set(vcpu)) {
-- 
2.9.0



Re: [PATCH 2/2] i2c-dev: Don't block the adapter from unregistering

2016-07-05 Thread kbuild test robot
Hi,

[auto build test WARNING on wsa/i2c/for-next]
[also build test WARNING on v4.7-rc6 next-20160705]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Viresh-Kumar/i2c-dev-Don-t-let-userspace-block-adapter/20160706-110245
base:   https://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git 
i2c/for-next
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

   include/linux/init.h:1: warning: no structured comments found
   kernel/sched/core.c:2079: warning: No description found for parameter 
'cookie'
   kernel/sys.c:1: warning: no structured comments found
   drivers/dma-buf/seqno-fence.c:1: warning: no structured comments found
>> include/linux/i2c.h:242: warning: No description found for parameter 
>> 'adapter_nr'

vim +/adapter_nr +242 include/linux/i2c.h

d64f73be David Brownell  2007-07-12  226   * managing the device.
^1da177e Linus Torvalds  2005-04-16  227   */
^1da177e Linus Torvalds  2005-04-16  228  struct i2c_client {
2096b956 David Brownell  2007-05-01  229unsigned short flags;   
/* div., see below  */
5071860a Jean Delvare2005-07-20  230unsigned short addr;
/* chip address - NOTE: 7bit*/
^1da177e Linus Torvalds  2005-04-16  231
/* addresses are stored in the  */
5071860a Jean Delvare2005-07-20  232
/* _LOWER_ 7 bits   */
2096b956 David Brownell  2007-05-01  233char name[I2C_NAME_SIZE];
^1da177e Linus Torvalds  2005-04-16  234struct i2c_adapter *adapter;
/* the adapter we sit on*/
^1da177e Linus Torvalds  2005-04-16  235struct device dev;  
/* the device structure */
8e29da9e Wolfram Sang2008-07-01  236int irq;
/* irq issued by device */
42fc20b8 Viresh Kumar2016-07-05  237int adapter_nr;
4735c98f Jean Delvare2008-07-14  238struct list_head detected;
d5fd120e Jean Delvare2015-01-26  239  #if IS_ENABLED(CONFIG_I2C_SLAVE)
4b1acc43 Wolfram Sang2014-11-18  240i2c_slave_cb_t slave_cb;
/* callback for slave mode  */
d5fd120e Jean Delvare2015-01-26  241  #endif
^1da177e Linus Torvalds  2005-04-16 @242  };
^1da177e Linus Torvalds  2005-04-16  243  #define to_i2c_client(d) 
container_of(d, struct i2c_client, dev)
^1da177e Linus Torvalds  2005-04-16  244  
9b766b81 David Brownell  2008-01-27  245  extern struct i2c_client 
*i2c_verify_client(struct device *dev);
643dd09e Stephen Warren  2012-04-17  246  extern struct i2c_adapter 
*i2c_verify_adapter(struct device *dev);
9b766b81 David Brownell  2008-01-27  247  
a61fc683 Ben Gardner 2005-07-27  248  static inline struct i2c_client 
*kobj_to_i2c_client(struct kobject *kobj)
a61fc683 Ben Gardner 2005-07-27  249  {
d75d53cd Mark M. Hoffman 2007-07-12  250struct device * const dev = 
container_of(kobj, struct device, kobj);

:: The code at line 242 was first introduced by commit
:: 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 Linux-2.6.12-rc2

:: TO: Linus Torvalds <torva...@ppc970.osdl.org>
:: CC: Linus Torvalds <torva...@ppc970.osdl.org>

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: Binary data


Re: [PATCH 2/2] i2c-dev: Don't block the adapter from unregistering

2016-07-05 Thread kbuild test robot
Hi,

[auto build test WARNING on wsa/i2c/for-next]
[also build test WARNING on v4.7-rc6 next-20160705]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Viresh-Kumar/i2c-dev-Don-t-let-userspace-block-adapter/20160706-110245
base:   https://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git 
i2c/for-next
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

   include/linux/init.h:1: warning: no structured comments found
   kernel/sched/core.c:2079: warning: No description found for parameter 
'cookie'
   kernel/sys.c:1: warning: no structured comments found
   drivers/dma-buf/seqno-fence.c:1: warning: no structured comments found
>> include/linux/i2c.h:242: warning: No description found for parameter 
>> 'adapter_nr'

vim +/adapter_nr +242 include/linux/i2c.h

d64f73be David Brownell  2007-07-12  226   * managing the device.
^1da177e Linus Torvalds  2005-04-16  227   */
^1da177e Linus Torvalds  2005-04-16  228  struct i2c_client {
2096b956 David Brownell  2007-05-01  229unsigned short flags;   
/* div., see below  */
5071860a Jean Delvare2005-07-20  230unsigned short addr;
/* chip address - NOTE: 7bit*/
^1da177e Linus Torvalds  2005-04-16  231
/* addresses are stored in the  */
5071860a Jean Delvare2005-07-20  232
/* _LOWER_ 7 bits   */
2096b956 David Brownell  2007-05-01  233char name[I2C_NAME_SIZE];
^1da177e Linus Torvalds  2005-04-16  234struct i2c_adapter *adapter;
/* the adapter we sit on*/
^1da177e Linus Torvalds  2005-04-16  235struct device dev;  
/* the device structure */
8e29da9e Wolfram Sang2008-07-01  236int irq;
/* irq issued by device */
42fc20b8 Viresh Kumar2016-07-05  237int adapter_nr;
4735c98f Jean Delvare2008-07-14  238struct list_head detected;
d5fd120e Jean Delvare2015-01-26  239  #if IS_ENABLED(CONFIG_I2C_SLAVE)
4b1acc43 Wolfram Sang2014-11-18  240i2c_slave_cb_t slave_cb;
/* callback for slave mode  */
d5fd120e Jean Delvare2015-01-26  241  #endif
^1da177e Linus Torvalds  2005-04-16 @242  };
^1da177e Linus Torvalds  2005-04-16  243  #define to_i2c_client(d) 
container_of(d, struct i2c_client, dev)
^1da177e Linus Torvalds  2005-04-16  244  
9b766b81 David Brownell  2008-01-27  245  extern struct i2c_client 
*i2c_verify_client(struct device *dev);
643dd09e Stephen Warren  2012-04-17  246  extern struct i2c_adapter 
*i2c_verify_adapter(struct device *dev);
9b766b81 David Brownell  2008-01-27  247  
a61fc683 Ben Gardner 2005-07-27  248  static inline struct i2c_client 
*kobj_to_i2c_client(struct kobject *kobj)
a61fc683 Ben Gardner 2005-07-27  249  {
d75d53cd Mark M. Hoffman 2007-07-12  250struct device * const dev = 
container_of(kobj, struct device, kobj);

:: The code at line 242 was first introduced by commit
:: 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 Linux-2.6.12-rc2

:: TO: Linus Torvalds 
:: CC: Linus Torvalds 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: Binary data


Re: [PATCH v3 09/17] mm, compaction: make whole_zone flag ignore cached scanner positions

2016-07-05 Thread Joonsoo Kim
On Fri, Jun 24, 2016 at 11:54:29AM +0200, Vlastimil Babka wrote:
> A recent patch has added whole_zone flag that compaction sets when scanning
> starts from the zone boundary, in order to report that zone has been fully
> scanned in one attempt. For allocations that want to try really hard or cannot
> fail, we will want to introduce a mode where scanning whole zone is guaranteed
> regardless of the cached positions.
> 
> This patch reuses the whole_zone flag in a way that if it's already passed 
> true
> to compaction, the cached scanner positions are ignored. Employing this flag

Okay. But, please don't reset cached scanner position even if whole_zone
flag is set. Just set cc->migrate_pfn and free_pfn, appropriately. With
your following patches, whole_zone could be set without any compaction
try so there is no point to reset cached scanner position in this
case.

Thanks.

> during reclaim/compaction loop will be done in the next patch. This patch
> however converts compaction invoked from userspace via procfs to use this 
> flag.
> Before this patch, the cached positions were first reset to zone boundaries 
> and
> then read back from struct zone, so there was a window where a parallel
> compaction could replace the reset values, making the manual compaction less
> effective. Using the flag instead of performing reset is more robust.
> 
> Signed-off-by: Vlastimil Babka 
> Acked-by: Michal Hocko 
> ---
>  mm/compaction.c | 15 +--
>  mm/internal.h   |  2 +-
>  2 files changed, 6 insertions(+), 11 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index f825a58bc37c..e7fe848e318e 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1501,11 +1501,13 @@ static enum compact_result compact_zone(struct zone 
> *zone, struct compact_contro
>*/
>   cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync];
>   cc->free_pfn = zone->compact_cached_free_pfn;
> - if (cc->free_pfn < start_pfn || cc->free_pfn >= end_pfn) {
> + if (cc->whole_zone || cc->free_pfn < start_pfn ||
> + cc->free_pfn >= end_pfn) {
>   cc->free_pfn = pageblock_start_pfn(end_pfn - 1);
>   zone->compact_cached_free_pfn = cc->free_pfn;
>   }
> - if (cc->migrate_pfn < start_pfn || cc->migrate_pfn >= end_pfn) {
> + if (cc->whole_zone || cc->migrate_pfn < start_pfn ||
> + cc->migrate_pfn >= end_pfn) {
>   cc->migrate_pfn = start_pfn;
>   zone->compact_cached_migrate_pfn[0] = cc->migrate_pfn;
>   zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
> @@ -1751,14 +1753,6 @@ static void __compact_pgdat(pg_data_t *pgdat, struct 
> compact_control *cc)
>   INIT_LIST_HEAD(>freepages);
>   INIT_LIST_HEAD(>migratepages);
>  
> - /*
> -  * When called via /proc/sys/vm/compact_memory
> -  * this makes sure we compact the whole zone regardless of
> -  * cached scanner positions.
> -  */
> - if (is_via_compact_memory(cc->order))
> - __reset_isolation_suitable(zone);
> -
>   if (is_via_compact_memory(cc->order) ||
>   !compaction_deferred(zone, cc->order))
>   compact_zone(zone, cc);
> @@ -1794,6 +1788,7 @@ static void compact_node(int nid)
>   .order = -1,
>   .mode = MIGRATE_SYNC,
>   .ignore_skip_hint = true,
> + .whole_zone = true,
>   };
>  
>   __compact_pgdat(NODE_DATA(nid), );
> diff --git a/mm/internal.h b/mm/internal.h
> index 680e5ce2ab37..153bb52335b4 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -179,7 +179,7 @@ struct compact_control {
>   enum migrate_mode mode; /* Async or sync migration mode */
>   bool ignore_skip_hint;  /* Scan blocks even if marked skip */
>   bool direct_compaction; /* False from kcompactd or /proc/... */
> - bool whole_zone;/* Whole zone has been scanned */
> + bool whole_zone;/* Whole zone should/has been scanned */
>   int order;  /* order a direct compactor needs */
>   const gfp_t gfp_mask;   /* gfp mask of a direct compactor */
>   const unsigned int alloc_flags; /* alloc flags of a direct compactor */
> -- 
> 2.8.4


Re: [PATCH v3 09/17] mm, compaction: make whole_zone flag ignore cached scanner positions

2016-07-05 Thread Joonsoo Kim
On Fri, Jun 24, 2016 at 11:54:29AM +0200, Vlastimil Babka wrote:
> A recent patch has added whole_zone flag that compaction sets when scanning
> starts from the zone boundary, in order to report that zone has been fully
> scanned in one attempt. For allocations that want to try really hard or cannot
> fail, we will want to introduce a mode where scanning whole zone is guaranteed
> regardless of the cached positions.
> 
> This patch reuses the whole_zone flag in a way that if it's already passed 
> true
> to compaction, the cached scanner positions are ignored. Employing this flag

Okay. But, please don't reset cached scanner position even if whole_zone
flag is set. Just set cc->migrate_pfn and free_pfn, appropriately. With
your following patches, whole_zone could be set without any compaction
try so there is no point to reset cached scanner position in this
case.

Thanks.

> during reclaim/compaction loop will be done in the next patch. This patch
> however converts compaction invoked from userspace via procfs to use this 
> flag.
> Before this patch, the cached positions were first reset to zone boundaries 
> and
> then read back from struct zone, so there was a window where a parallel
> compaction could replace the reset values, making the manual compaction less
> effective. Using the flag instead of performing reset is more robust.
> 
> Signed-off-by: Vlastimil Babka 
> Acked-by: Michal Hocko 
> ---
>  mm/compaction.c | 15 +--
>  mm/internal.h   |  2 +-
>  2 files changed, 6 insertions(+), 11 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index f825a58bc37c..e7fe848e318e 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1501,11 +1501,13 @@ static enum compact_result compact_zone(struct zone 
> *zone, struct compact_contro
>*/
>   cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync];
>   cc->free_pfn = zone->compact_cached_free_pfn;
> - if (cc->free_pfn < start_pfn || cc->free_pfn >= end_pfn) {
> + if (cc->whole_zone || cc->free_pfn < start_pfn ||
> + cc->free_pfn >= end_pfn) {
>   cc->free_pfn = pageblock_start_pfn(end_pfn - 1);
>   zone->compact_cached_free_pfn = cc->free_pfn;
>   }
> - if (cc->migrate_pfn < start_pfn || cc->migrate_pfn >= end_pfn) {
> + if (cc->whole_zone || cc->migrate_pfn < start_pfn ||
> + cc->migrate_pfn >= end_pfn) {
>   cc->migrate_pfn = start_pfn;
>   zone->compact_cached_migrate_pfn[0] = cc->migrate_pfn;
>   zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
> @@ -1751,14 +1753,6 @@ static void __compact_pgdat(pg_data_t *pgdat, struct 
> compact_control *cc)
>   INIT_LIST_HEAD(>freepages);
>   INIT_LIST_HEAD(>migratepages);
>  
> - /*
> -  * When called via /proc/sys/vm/compact_memory
> -  * this makes sure we compact the whole zone regardless of
> -  * cached scanner positions.
> -  */
> - if (is_via_compact_memory(cc->order))
> - __reset_isolation_suitable(zone);
> -
>   if (is_via_compact_memory(cc->order) ||
>   !compaction_deferred(zone, cc->order))
>   compact_zone(zone, cc);
> @@ -1794,6 +1788,7 @@ static void compact_node(int nid)
>   .order = -1,
>   .mode = MIGRATE_SYNC,
>   .ignore_skip_hint = true,
> + .whole_zone = true,
>   };
>  
>   __compact_pgdat(NODE_DATA(nid), );
> diff --git a/mm/internal.h b/mm/internal.h
> index 680e5ce2ab37..153bb52335b4 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -179,7 +179,7 @@ struct compact_control {
>   enum migrate_mode mode; /* Async or sync migration mode */
>   bool ignore_skip_hint;  /* Scan blocks even if marked skip */
>   bool direct_compaction; /* False from kcompactd or /proc/... */
> - bool whole_zone;/* Whole zone has been scanned */
> + bool whole_zone;/* Whole zone should/has been scanned */
>   int order;  /* order a direct compactor needs */
>   const gfp_t gfp_mask;   /* gfp mask of a direct compactor */
>   const unsigned int alloc_flags; /* alloc flags of a direct compactor */
> -- 
> 2.8.4


Re: [PATCH net] net: mvneta: set real interrupt per packet for tx_done

2016-07-05 Thread Willy Tarreau
Hi,

On Wed, Jul 06, 2016 at 04:18:58AM +0200, Marcin Wojtas wrote:
> From: Dmitri Epshtein 
> 
> Commit aebea2ba0f74 ("net: mvneta: fix Tx interrupt delay") intended to
> set coalescing threshold to a value guaranteeing interrupt generation
> per each sent packet, so that buffers can be released with no delay.
> 
> In fact setting threshold to '1' was wrong, because it causes interrupt
> every two packets. According to the documentation a reason behind it is
> following - interrupt occurs once sent buffers counter reaches a value,
> which is higher than one specified in MVNETA_TXQ_SIZE_REG(q). This
> behavior was confirmed during tests. Also when testing the SoC working
> as a NAS device, better performance was observed with int-per-packet,
> as it strongly depends on the fact that all transmitted packets are
> released immediately.
> 
> This commit enables NETA controller work in interrupt per sent packet mode
> by setting coalescing threshold to 0.

We had a discussion about this in January 2015 and I thought I sent a patch
to change it but I can't find it, so I think it was only proposed to some
users for testing. I also remember that on more recent kernels by then
(>=3.13) we observed a slightly better performance with this value set to
zero.

Acked-by: Willy Tarreau 

Willy


Re: [PATCH net] net: mvneta: set real interrupt per packet for tx_done

2016-07-05 Thread Willy Tarreau
Hi,

On Wed, Jul 06, 2016 at 04:18:58AM +0200, Marcin Wojtas wrote:
> From: Dmitri Epshtein 
> 
> Commit aebea2ba0f74 ("net: mvneta: fix Tx interrupt delay") intended to
> set coalescing threshold to a value guaranteeing interrupt generation
> per each sent packet, so that buffers can be released with no delay.
> 
> In fact setting threshold to '1' was wrong, because it causes interrupt
> every two packets. According to the documentation a reason behind it is
> following - interrupt occurs once sent buffers counter reaches a value,
> which is higher than one specified in MVNETA_TXQ_SIZE_REG(q). This
> behavior was confirmed during tests. Also when testing the SoC working
> as a NAS device, better performance was observed with int-per-packet,
> as it strongly depends on the fact that all transmitted packets are
> released immediately.
> 
> This commit enables NETA controller work in interrupt per sent packet mode
> by setting coalescing threshold to 0.

We had a discussion about this in January 2015 and I thought I sent a patch
to change it but I can't find it, so I think it was only proposed to some
users for testing. I also remember that on more recent kernels by then
(>=3.13) we observed a slightly better performance with this value set to
zero.

Acked-by: Willy Tarreau 

Willy


[PATCH v2 0/3] xen: add full support for CONFIG_PARAVIRT_TIME_ACCOUNTING

2016-07-05 Thread Juergen Gross
With most recent Xen hypervisor (4.8) it is now possible to add full
support of CONFIG_PARAVIRT_TIME_ACCOUNTING.

To be applied on top of commit ed2f61fdd2356c2a1d1239aa1507ce4e2e059306
"xen: add steal_clock support on x86" of kernel/git/xen/tip.git

Runtime tested on x86_64, compile tested on arm64.

Changes in V2:
- patch 3: removed static variable as requested by Stefano Stabellini

Juergen Gross (3):
  xen: update xen headers
  arm/xen: add support for vm_assist hypercall
  xen: support runqueue steal time on xen

 arch/arm/include/asm/xen/hypercall.h |  1 +
 arch/arm/xen/enlighten.c |  1 +
 arch/arm/xen/hypercall.S |  1 +
 arch/arm64/xen/hypercall.S   |  1 +
 drivers/xen/time.c   | 40 +---
 include/xen/interface/vcpu.h | 24 ++
 include/xen/interface/xen.h  | 17 ++-
 7 files changed, 58 insertions(+), 27 deletions(-)

-- 
2.6.6



[PATCH v2 3/3] xen: support runqueue steal time on xen

2016-07-05 Thread Juergen Gross
Up to now reading the stolen time of a remote cpu was not possible in a
performant way under Xen. This made support of runqueue steal time via
paravirt_steal_rq_enabled impossible.

With the addition of an appropriate hypervisor interface this is now
possible, so add the support.

Signed-off-by: Juergen Gross 
---
V2: removed static variable as requested by Stefano Stabellini
---
 drivers/xen/time.c | 40 +++-
 1 file changed, 23 insertions(+), 17 deletions(-)

diff --git a/drivers/xen/time.c b/drivers/xen/time.c
index 2257b66..a7fe35b 100644
--- a/drivers/xen/time.c
+++ b/drivers/xen/time.c
@@ -47,27 +47,31 @@ static u64 get64(const u64 *p)
return ret;
 }
 
-/*
- * Runstate accounting
- */
-void xen_get_runstate_snapshot(struct vcpu_runstate_info *res)
+static void xen_get_runstate_snapshot_cpu(struct vcpu_runstate_info *res,
+ unsigned int cpu)
 {
u64 state_time;
struct vcpu_runstate_info *state;
 
BUG_ON(preemptible());
 
-   state = this_cpu_ptr(_runstate);
+   state = per_cpu_ptr(_runstate, cpu);
 
-   /*
-* The runstate info is always updated by the hypervisor on
-* the current CPU, so there's no need to use anything
-* stronger than a compiler barrier when fetching it.
-*/
do {
state_time = get64(>state_entry_time);
+   rmb();  /* Hypervisor might update data. */
*res = READ_ONCE(*state);
-   } while (get64(>state_entry_time) != state_time);
+   rmb();  /* Hypervisor might update data. */
+   } while (get64(>state_entry_time) != state_time ||
+(state_time & XEN_RUNSTATE_UPDATE));
+}
+
+/*
+ * Runstate accounting
+ */
+void xen_get_runstate_snapshot(struct vcpu_runstate_info *res)
+{
+   xen_get_runstate_snapshot_cpu(res, smp_processor_id());
 }
 
 /* return true when a vcpu could run but has no real cpu to run on */
@@ -80,8 +84,7 @@ static u64 xen_steal_clock(int cpu)
 {
struct vcpu_runstate_info state;
 
-   BUG_ON(cpu != smp_processor_id());
-   xen_get_runstate_snapshot();
+   xen_get_runstate_snapshot_cpu(, cpu);
return state.time[RUNSTATE_runnable] + state.time[RUNSTATE_offline];
 }
 
@@ -98,11 +101,14 @@ void xen_setup_runstate_info(int cpu)
 
 void __init xen_time_setup_guest(void)
 {
+   bool xen_runstate_remote;
+
+   xen_runstate_remote = !HYPERVISOR_vm_assist(VMASST_CMD_enable,
+   VMASST_TYPE_runstate_update_flag);
+
pv_time_ops.steal_clock = xen_steal_clock;
 
static_key_slow_inc(_steal_enabled);
-   /*
-* We can't set paravirt_steal_rq_enabled as this would require the
-* capability to read another cpu's runstate info.
-*/
+   if (xen_runstate_remote)
+   static_key_slow_inc(_steal_rq_enabled);
 }
-- 
2.6.6



[PATCH v2 1/3] xen: update xen headers

2016-07-05 Thread Juergen Gross
Update some Xen headers to be able to use new functionality.

Signed-off-by: Juergen Gross 
Reviewed-by: Boris Ostrovsky 
Reviewed-by: Stefano Stabellini 
---
 include/xen/interface/vcpu.h | 24 +++-
 include/xen/interface/xen.h  | 17 -
 2 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/include/xen/interface/vcpu.h b/include/xen/interface/vcpu.h
index b05288c..98188c8 100644
--- a/include/xen/interface/vcpu.h
+++ b/include/xen/interface/vcpu.h
@@ -75,15 +75,21 @@
  */
 #define VCPUOP_get_runstate_info4
 struct vcpu_runstate_info {
-   /* VCPU's current state (RUNSTATE_*). */
-   int  state;
-   /* When was current state entered (system time, ns)? */
-   uint64_t state_entry_time;
-   /*
-* Time spent in each RUNSTATE_* (ns). The sum of these times is
-* guaranteed not to drift from system time.
-*/
-   uint64_t time[4];
+   /* VCPU's current state (RUNSTATE_*). */
+   int  state;
+   /* When was current state entered (system time, ns)? */
+   uint64_t state_entry_time;
+   /*
+* Update indicator set in state_entry_time:
+* When activated via VMASST_TYPE_runstate_update_flag, set during
+* updates in guest memory mapped copy of vcpu_runstate_info.
+*/
+#define XEN_RUNSTATE_UPDATE(1ULL << 63)
+   /*
+* Time spent in each RUNSTATE_* (ns). The sum of these times is
+* guaranteed not to drift from system time.
+*/
+   uint64_t time[4];
 };
 DEFINE_GUEST_HANDLE_STRUCT(vcpu_runstate_info);
 
diff --git a/include/xen/interface/xen.h b/include/xen/interface/xen.h
index d133112..1b0d189 100644
--- a/include/xen/interface/xen.h
+++ b/include/xen/interface/xen.h
@@ -413,7 +413,22 @@ DEFINE_GUEST_HANDLE_STRUCT(mmuext_op);
 /* x86/PAE guests: support PDPTs above 4GB. */
 #define VMASST_TYPE_pae_extended_cr3 3
 
-#define MAX_VMASST_TYPE 3
+/*
+ * x86 guests: Sane behaviour for virtual iopl
+ *  - virtual iopl updated from do_iret() hypercalls.
+ *  - virtual iopl reported in bounce frames.
+ *  - guest kernels assumed to be level 0 for the purpose of iopl checks.
+ */
+#define VMASST_TYPE_architectural_iopl   4
+
+/*
+ * All guests: activate update indicator in vcpu_runstate_info
+ * Enable setting the XEN_RUNSTATE_UPDATE flag in guest memory mapped
+ * vcpu_runstate_info during updates of the runstate information.
+ */
+#define VMASST_TYPE_runstate_update_flag 5
+
+#define MAX_VMASST_TYPE 5
 
 #ifndef __ASSEMBLY__
 
-- 
2.6.6



[PATCH v2 0/3] xen: add full support for CONFIG_PARAVIRT_TIME_ACCOUNTING

2016-07-05 Thread Juergen Gross
With most recent Xen hypervisor (4.8) it is now possible to add full
support of CONFIG_PARAVIRT_TIME_ACCOUNTING.

To be applied on top of commit ed2f61fdd2356c2a1d1239aa1507ce4e2e059306
"xen: add steal_clock support on x86" of kernel/git/xen/tip.git

Runtime tested on x86_64, compile tested on arm64.

Changes in V2:
- patch 3: removed static variable as requested by Stefano Stabellini

Juergen Gross (3):
  xen: update xen headers
  arm/xen: add support for vm_assist hypercall
  xen: support runqueue steal time on xen

 arch/arm/include/asm/xen/hypercall.h |  1 +
 arch/arm/xen/enlighten.c |  1 +
 arch/arm/xen/hypercall.S |  1 +
 arch/arm64/xen/hypercall.S   |  1 +
 drivers/xen/time.c   | 40 +---
 include/xen/interface/vcpu.h | 24 ++
 include/xen/interface/xen.h  | 17 ++-
 7 files changed, 58 insertions(+), 27 deletions(-)

-- 
2.6.6



[PATCH v2 3/3] xen: support runqueue steal time on xen

2016-07-05 Thread Juergen Gross
Up to now reading the stolen time of a remote cpu was not possible in a
performant way under Xen. This made support of runqueue steal time via
paravirt_steal_rq_enabled impossible.

With the addition of an appropriate hypervisor interface this is now
possible, so add the support.

Signed-off-by: Juergen Gross 
---
V2: removed static variable as requested by Stefano Stabellini
---
 drivers/xen/time.c | 40 +++-
 1 file changed, 23 insertions(+), 17 deletions(-)

diff --git a/drivers/xen/time.c b/drivers/xen/time.c
index 2257b66..a7fe35b 100644
--- a/drivers/xen/time.c
+++ b/drivers/xen/time.c
@@ -47,27 +47,31 @@ static u64 get64(const u64 *p)
return ret;
 }
 
-/*
- * Runstate accounting
- */
-void xen_get_runstate_snapshot(struct vcpu_runstate_info *res)
+static void xen_get_runstate_snapshot_cpu(struct vcpu_runstate_info *res,
+ unsigned int cpu)
 {
u64 state_time;
struct vcpu_runstate_info *state;
 
BUG_ON(preemptible());
 
-   state = this_cpu_ptr(_runstate);
+   state = per_cpu_ptr(_runstate, cpu);
 
-   /*
-* The runstate info is always updated by the hypervisor on
-* the current CPU, so there's no need to use anything
-* stronger than a compiler barrier when fetching it.
-*/
do {
state_time = get64(>state_entry_time);
+   rmb();  /* Hypervisor might update data. */
*res = READ_ONCE(*state);
-   } while (get64(>state_entry_time) != state_time);
+   rmb();  /* Hypervisor might update data. */
+   } while (get64(>state_entry_time) != state_time ||
+(state_time & XEN_RUNSTATE_UPDATE));
+}
+
+/*
+ * Runstate accounting
+ */
+void xen_get_runstate_snapshot(struct vcpu_runstate_info *res)
+{
+   xen_get_runstate_snapshot_cpu(res, smp_processor_id());
 }
 
 /* return true when a vcpu could run but has no real cpu to run on */
@@ -80,8 +84,7 @@ static u64 xen_steal_clock(int cpu)
 {
struct vcpu_runstate_info state;
 
-   BUG_ON(cpu != smp_processor_id());
-   xen_get_runstate_snapshot();
+   xen_get_runstate_snapshot_cpu(, cpu);
return state.time[RUNSTATE_runnable] + state.time[RUNSTATE_offline];
 }
 
@@ -98,11 +101,14 @@ void xen_setup_runstate_info(int cpu)
 
 void __init xen_time_setup_guest(void)
 {
+   bool xen_runstate_remote;
+
+   xen_runstate_remote = !HYPERVISOR_vm_assist(VMASST_CMD_enable,
+   VMASST_TYPE_runstate_update_flag);
+
pv_time_ops.steal_clock = xen_steal_clock;
 
static_key_slow_inc(_steal_enabled);
-   /*
-* We can't set paravirt_steal_rq_enabled as this would require the
-* capability to read another cpu's runstate info.
-*/
+   if (xen_runstate_remote)
+   static_key_slow_inc(_steal_rq_enabled);
 }
-- 
2.6.6



[PATCH v2 1/3] xen: update xen headers

2016-07-05 Thread Juergen Gross
Update some Xen headers to be able to use new functionality.

Signed-off-by: Juergen Gross 
Reviewed-by: Boris Ostrovsky 
Reviewed-by: Stefano Stabellini 
---
 include/xen/interface/vcpu.h | 24 +++-
 include/xen/interface/xen.h  | 17 -
 2 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/include/xen/interface/vcpu.h b/include/xen/interface/vcpu.h
index b05288c..98188c8 100644
--- a/include/xen/interface/vcpu.h
+++ b/include/xen/interface/vcpu.h
@@ -75,15 +75,21 @@
  */
 #define VCPUOP_get_runstate_info4
 struct vcpu_runstate_info {
-   /* VCPU's current state (RUNSTATE_*). */
-   int  state;
-   /* When was current state entered (system time, ns)? */
-   uint64_t state_entry_time;
-   /*
-* Time spent in each RUNSTATE_* (ns). The sum of these times is
-* guaranteed not to drift from system time.
-*/
-   uint64_t time[4];
+   /* VCPU's current state (RUNSTATE_*). */
+   int  state;
+   /* When was current state entered (system time, ns)? */
+   uint64_t state_entry_time;
+   /*
+* Update indicator set in state_entry_time:
+* When activated via VMASST_TYPE_runstate_update_flag, set during
+* updates in guest memory mapped copy of vcpu_runstate_info.
+*/
+#define XEN_RUNSTATE_UPDATE(1ULL << 63)
+   /*
+* Time spent in each RUNSTATE_* (ns). The sum of these times is
+* guaranteed not to drift from system time.
+*/
+   uint64_t time[4];
 };
 DEFINE_GUEST_HANDLE_STRUCT(vcpu_runstate_info);
 
diff --git a/include/xen/interface/xen.h b/include/xen/interface/xen.h
index d133112..1b0d189 100644
--- a/include/xen/interface/xen.h
+++ b/include/xen/interface/xen.h
@@ -413,7 +413,22 @@ DEFINE_GUEST_HANDLE_STRUCT(mmuext_op);
 /* x86/PAE guests: support PDPTs above 4GB. */
 #define VMASST_TYPE_pae_extended_cr3 3
 
-#define MAX_VMASST_TYPE 3
+/*
+ * x86 guests: Sane behaviour for virtual iopl
+ *  - virtual iopl updated from do_iret() hypercalls.
+ *  - virtual iopl reported in bounce frames.
+ *  - guest kernels assumed to be level 0 for the purpose of iopl checks.
+ */
+#define VMASST_TYPE_architectural_iopl   4
+
+/*
+ * All guests: activate update indicator in vcpu_runstate_info
+ * Enable setting the XEN_RUNSTATE_UPDATE flag in guest memory mapped
+ * vcpu_runstate_info during updates of the runstate information.
+ */
+#define VMASST_TYPE_runstate_update_flag 5
+
+#define MAX_VMASST_TYPE 5
 
 #ifndef __ASSEMBLY__
 
-- 
2.6.6



Re: [PATCH v2 2/4] powerpc/spinlock: support vcpu preempted check

2016-07-05 Thread xinhui

Hi, wanpeng

On 2016年07月05日 17:57, Wanpeng Li wrote:

Hi Xinhui,
2016-06-28 22:43 GMT+08:00 Pan Xinhui :

This is to fix some lock holder preemption issues. Some other locks
implementation do a spin loop before acquiring the lock itself. Currently
kernel has an interface of bool vcpu_is_preempted(int cpu). It take the cpu
as parameter and return true if the cpu is preempted. Then kernel can break
the spin loops upon on the retval of vcpu_is_preempted.

As kernel has used this interface, So lets support it.

Only pSeries need supoort it. And the fact is powerNV are built into same
kernel image with pSeries. So we need return false if we are runnig as
powerNV. The another fact is that lppaca->yiled_count keeps zero on
powerNV. So we can just skip the machine type.


Lock holder vCPU preemption can be detected by hardware pSeries or
paravirt method?


There is one shard struct between kernel and powerVM/KVM. And we read the 
yield_count of this struct to detect if one vcpu is running or not.
SO it's easy for ppc to implement such interface. Note that yield_count is set 
by powerVM/KVM.
and only pSeries can run a guest for now. :)

I also review x86 related code, looks like we need add one hyer-call to get 
such vcpu preemption info?

thanks
xinui

Regards,
Wanpeng Li





Re: [PATCH v2 2/4] powerpc/spinlock: support vcpu preempted check

2016-07-05 Thread xinhui

Hi, wanpeng

On 2016年07月05日 17:57, Wanpeng Li wrote:

Hi Xinhui,
2016-06-28 22:43 GMT+08:00 Pan Xinhui :

This is to fix some lock holder preemption issues. Some other locks
implementation do a spin loop before acquiring the lock itself. Currently
kernel has an interface of bool vcpu_is_preempted(int cpu). It take the cpu
as parameter and return true if the cpu is preempted. Then kernel can break
the spin loops upon on the retval of vcpu_is_preempted.

As kernel has used this interface, So lets support it.

Only pSeries need supoort it. And the fact is powerNV are built into same
kernel image with pSeries. So we need return false if we are runnig as
powerNV. The another fact is that lppaca->yiled_count keeps zero on
powerNV. So we can just skip the machine type.


Lock holder vCPU preemption can be detected by hardware pSeries or
paravirt method?


There is one shard struct between kernel and powerVM/KVM. And we read the 
yield_count of this struct to detect if one vcpu is running or not.
SO it's easy for ppc to implement such interface. Note that yield_count is set 
by powerVM/KVM.
and only pSeries can run a guest for now. :)

I also review x86 related code, looks like we need add one hyer-call to get 
such vcpu preemption info?

thanks
xinui

Regards,
Wanpeng Li





Re: [PATCH v3 1/2] clk: exynos5433: do not use CLK_IGNORE_UNUSED for SPI clocks

2016-07-05 Thread Andi Shyti
Hi Sylwester,

> +#if 0
> clocks = <_peric CLK_PCLK_SPI1>,
>  <_top CLK_SCLK_SPI1_PERIC>;
> +#else
> +   clocks = <_peric CLK_PCLK_SPI1>,
> +<_peric CLK_SCLK_SPI1>;
> +#endif

Yes, that's how it should be, indeed.

> /* Disable Clock */
> if (sdd->port_conf->clk_from_cmu) {
> -   clk_disable_unprepare(sdd->src_clk);
> +   /* clk_disable_unprepare(sdd->src_clk); */
> } else {
> val = readl(regs + S3C64XX_SPI_CLK_CFG);
> val &= ~S3C64XX_SPI_ENCLK_ENABLE;
> @@ -626,7 +626,7 @@ static void s3c64xx_spi_config(struct 
> s3c64xx_spi_driver_data *sdd)
> /* There is half-multiplier before the SPI */
> clk_set_rate(sdd->src_clk, sdd->cur_speed * 2);
> /* Enable Clock */
> -   clk_prepare_enable(sdd->src_clk);
> +   /* clk_prepare_enable(sdd->src_clk); */
> } else {
> /* Configure Clock */
> val = readl(regs + S3C64XX_SPI_CLK_CFG);

I don't see anything wrong on the above. We could make it as:

@@ -596,9 +597,7 @@ static void s3c64xx_spi_config(struct 
s3c64xx_spi_driver_data *sdd)
u32 val;
 
/* Disable Clock */
-   if (sdd->port_conf->clk_from_cmu) {
-   clk_disable_unprepare(sdd->src_clk);
-   } else {
+   if (!sdd->port_conf->clk_from_cmu) {
val = readl(regs + S3C64XX_SPI_CLK_CFG);
val &= ~S3C64XX_SPI_ENCLK_ENABLE;
writel(val, regs + S3C64XX_SPI_CLK_CFG);
@@ -640,13 +639,7 @@ static void s3c64xx_spi_config(struct 
s3c64xx_spi_driver_data *sdd)
 
writel(val, regs + S3C64XX_SPI_MODE_CFG);
 
-   if (sdd->port_conf->clk_from_cmu) {
-   /* Configure Clock */
-   /* There is half-multiplier before the SPI */
-   clk_set_rate(sdd->src_clk, sdd->cur_speed * 2);
-   /* Enable Clock */
-   clk_prepare_enable(sdd->src_clk);
-   } else {
+   if (!sdd->port_conf->clk_from_cmu) {
/* Configure Clock */
val = readl(regs + S3C64XX_SPI_CLK_CFG);
val &= ~S3C64XX_SPI_PSR_MASK;

> I meant we could amend which clocks are specified at the SPI bus device
> DT nodes and change handling of clocks in the spi-s3c64xx driver to model
> everything properly and get it all working.

I think that if the clock comes from the cmu it's not necessary
to disable it. I would like to avoid adding DTS properties
because we have the clock disabling inherited from old code which
it might not be required at all (in our tests, indeed it works).

Thanks,
Andi


Re: [PATCH v3 1/2] clk: exynos5433: do not use CLK_IGNORE_UNUSED for SPI clocks

2016-07-05 Thread Andi Shyti
Hi Sylwester,

> +#if 0
> clocks = <_peric CLK_PCLK_SPI1>,
>  <_top CLK_SCLK_SPI1_PERIC>;
> +#else
> +   clocks = <_peric CLK_PCLK_SPI1>,
> +<_peric CLK_SCLK_SPI1>;
> +#endif

Yes, that's how it should be, indeed.

> /* Disable Clock */
> if (sdd->port_conf->clk_from_cmu) {
> -   clk_disable_unprepare(sdd->src_clk);
> +   /* clk_disable_unprepare(sdd->src_clk); */
> } else {
> val = readl(regs + S3C64XX_SPI_CLK_CFG);
> val &= ~S3C64XX_SPI_ENCLK_ENABLE;
> @@ -626,7 +626,7 @@ static void s3c64xx_spi_config(struct 
> s3c64xx_spi_driver_data *sdd)
> /* There is half-multiplier before the SPI */
> clk_set_rate(sdd->src_clk, sdd->cur_speed * 2);
> /* Enable Clock */
> -   clk_prepare_enable(sdd->src_clk);
> +   /* clk_prepare_enable(sdd->src_clk); */
> } else {
> /* Configure Clock */
> val = readl(regs + S3C64XX_SPI_CLK_CFG);

I don't see anything wrong on the above. We could make it as:

@@ -596,9 +597,7 @@ static void s3c64xx_spi_config(struct 
s3c64xx_spi_driver_data *sdd)
u32 val;
 
/* Disable Clock */
-   if (sdd->port_conf->clk_from_cmu) {
-   clk_disable_unprepare(sdd->src_clk);
-   } else {
+   if (!sdd->port_conf->clk_from_cmu) {
val = readl(regs + S3C64XX_SPI_CLK_CFG);
val &= ~S3C64XX_SPI_ENCLK_ENABLE;
writel(val, regs + S3C64XX_SPI_CLK_CFG);
@@ -640,13 +639,7 @@ static void s3c64xx_spi_config(struct 
s3c64xx_spi_driver_data *sdd)
 
writel(val, regs + S3C64XX_SPI_MODE_CFG);
 
-   if (sdd->port_conf->clk_from_cmu) {
-   /* Configure Clock */
-   /* There is half-multiplier before the SPI */
-   clk_set_rate(sdd->src_clk, sdd->cur_speed * 2);
-   /* Enable Clock */
-   clk_prepare_enable(sdd->src_clk);
-   } else {
+   if (!sdd->port_conf->clk_from_cmu) {
/* Configure Clock */
val = readl(regs + S3C64XX_SPI_CLK_CFG);
val &= ~S3C64XX_SPI_PSR_MASK;

> I meant we could amend which clocks are specified at the SPI bus device
> DT nodes and change handling of clocks in the spi-s3c64xx driver to model
> everything properly and get it all working.

I think that if the clock comes from the cmu it's not necessary
to disable it. I would like to avoid adding DTS properties
because we have the clock disabling inherited from old code which
it might not be required at all (in our tests, indeed it works).

Thanks,
Andi


Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.

2016-07-05 Thread Oleg Drokin

On Jul 5, 2016, at 11:25 PM, Oleg Drokin wrote:

> 
> On Jul 5, 2016, at 11:20 PM, Al Viro wrote:
> 
>> On Tue, Jul 05, 2016 at 08:29:37PM -0400, Oleg Drokin wrote:
 +  /* Otherwise we just unhash it to be rehashed afresh via
 +   * lookup if necessary
 +   */
 +  d_drop(dentry);
>>> 
>>> So we can even drop this part and retain the top condition as it was.
>>> d_add does not care if the dentry we are feeding it was hashed or not,
>>> so do you see any downsides to doing that I wonder?
>> 
>> d_add() on hashed dentry will end up reaching this:
>> static void __d_rehash(struct dentry * entry, struct hlist_bl_head *b)
>> {
>>   BUG_ON(!d_unhashed(entry));
> 
> Ah, ok. Yes, I remember about it now from the older issue with nfs.
> 
> It's still puzzling why I did not hit it yet, but oh well.
> 
> I wonder if handling of negative dentries broke… Time for more investigations.

Ah, the reason was just looking me in the eyes as the first thing we do in
ll_revalidate_dentry():

if ((lookup_flags & (LOOKUP_OPEN | LOOKUP_CREATE)) ==
(LOOKUP_OPEN | LOOKUP_CREATE))
return 0;


Apparently we are trying to protect from the case where dentry is valid when we
check it, but gets pulled under us as soon as we are done (no lustre locking 
held
around it).
So if we don't do this, we fall into ll_file_open/ll_intent_file_open and form
a request to open the file based on the inode. If this inode is already gone
server-side, we might get ESTALE, since we don't really send the name to the
server in this case so it does not know what is it we are after anyway.

On the userspace side of things the picture is not pretty then, we are doing
open(O_CREAT) and get ESTALE back.

Looking at do_filp_open(), there's actually a retry:
if (unlikely(filp == ERR_PTR(-ESTALE)))
filp = path_openat(, op, flags | LOOKUP_REVAL);

But it's only once so for a contended file we can still have some other thread
do a lookup, provide us with a new cached dentry that's similarly pulled
under us next time around.
But I guess the whole idea of this is to let us know the file is contended and
we can just only fail revalidate then.

Hm.. looking at the server - it's even worse, we'd get ENOENT from the server if
the desired inode no longer exist, so a perfect opportunity for the
client to turn that ENOENT into ESTALE if the create flag was set.


Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.

2016-07-05 Thread Oleg Drokin

On Jul 5, 2016, at 11:25 PM, Oleg Drokin wrote:

> 
> On Jul 5, 2016, at 11:20 PM, Al Viro wrote:
> 
>> On Tue, Jul 05, 2016 at 08:29:37PM -0400, Oleg Drokin wrote:
 +  /* Otherwise we just unhash it to be rehashed afresh via
 +   * lookup if necessary
 +   */
 +  d_drop(dentry);
>>> 
>>> So we can even drop this part and retain the top condition as it was.
>>> d_add does not care if the dentry we are feeding it was hashed or not,
>>> so do you see any downsides to doing that I wonder?
>> 
>> d_add() on hashed dentry will end up reaching this:
>> static void __d_rehash(struct dentry * entry, struct hlist_bl_head *b)
>> {
>>   BUG_ON(!d_unhashed(entry));
> 
> Ah, ok. Yes, I remember about it now from the older issue with nfs.
> 
> It's still puzzling why I did not hit it yet, but oh well.
> 
> I wonder if handling of negative dentries broke… Time for more investigations.

Ah, the reason was just looking me in the eyes as the first thing we do in
ll_revalidate_dentry():

if ((lookup_flags & (LOOKUP_OPEN | LOOKUP_CREATE)) ==
(LOOKUP_OPEN | LOOKUP_CREATE))
return 0;


Apparently we are trying to protect from the case where dentry is valid when we
check it, but gets pulled under us as soon as we are done (no lustre locking 
held
around it).
So if we don't do this, we fall into ll_file_open/ll_intent_file_open and form
a request to open the file based on the inode. If this inode is already gone
server-side, we might get ESTALE, since we don't really send the name to the
server in this case so it does not know what is it we are after anyway.

On the userspace side of things the picture is not pretty then, we are doing
open(O_CREAT) and get ESTALE back.

Looking at do_filp_open(), there's actually a retry:
if (unlikely(filp == ERR_PTR(-ESTALE)))
filp = path_openat(, op, flags | LOOKUP_REVAL);

But it's only once so for a contended file we can still have some other thread
do a lookup, provide us with a new cached dentry that's similarly pulled
under us next time around.
But I guess the whole idea of this is to let us know the file is contended and
we can just only fail revalidate then.

Hm.. looking at the server - it's even worse, we'd get ENOENT from the server if
the desired inode no longer exist, so a perfect opportunity for the
client to turn that ENOENT into ESTALE if the create flag was set.


Re: [linux-sunxi] [PATCH 2/4] power: add axp20x-battery driver

2016-07-05 Thread Michael Haas

Hi There,

On 05.07.2016 10:33, Icenowy Zheng wrote:

On 01.07.2016 11:29, Icenowy Zheng wrote:



  +
  +static enum power_supply_property axp22x_battery_properties[] = {
  + POWER_SUPPLY_PROP_CAPACITY,
  + POWER_SUPPLY_PROP_HEALTH,
  + POWER_SUPPLY_PROP_PRESENT,
  + POWER_SUPPLY_PROP_STATUS,
  + POWER_SUPPLY_PROP_CURRENT_NOW,
  + POWER_SUPPLY_PROP_VOLTAGE_NOW,
  +};


Here's what Bruno's driver supports:

static enum power_supply_property axp20x_battery_power_properties[] = {
POWER_SUPPLY_PROP_PRESENT,
POWER_SUPPLY_PROP_ONLINE,
POWER_SUPPLY_PROP_STATUS,
POWER_SUPPLY_PROP_VOLTAGE_NOW,
POWER_SUPPLY_PROP_CURRENT_NOW,
POWER_SUPPLY_PROP_CURRENT_MAX,
POWER_SUPPLY_PROP_HEALTH,
POWER_SUPPLY_PROP_TECHNOLOGY,
POWER_SUPPLY_PROP_VOLTAGE_MAX_DESIGN,
POWER_SUPPLY_PROP_VOLTAGE_MIN_DESIGN,
/* POWER_SUPPLY_PROP_POWER_NOW, */
POWER_SUPPLY_PROP_CHARGE_FULL_DESIGN,
/* POWER_SUPPLY_PROP_CHARGE_NOW, */
POWER_SUPPLY_PROP_CAPACITY,
POWER_SUPPLY_PROP_TEMP,
POWER_SUPPLY_PROP_TEMP_ALERT_MIN,
POWER_SUPPLY_PROP_TEMP_ALERT_MAX,
};


  +
  +static int axp20x_battery_probe(struct platform_device *pdev)
  +{
  + struct axp20x_dev *axp20x = dev_get_drvdata(pdev->dev.parent);
  + struct power_supply_config psy_cfg = {};
  + struct axp20x_battery *power;
  + static const char * const axp22x_irq_names[] = {
  + "BATT_PLUGIN", "BATT_REMOVAL", "BATT_ENT_ACT_MODE",
  + "BATT_EXIT_ACT_MODE", "CHARG", "CHARG_DONE", NULL };



And here are the interrupts handled:

static const char * const irq_names[] = { "BATT_HOT", "BATT_COLD",
"BATT_PLUGIN", "BATT_REMOVAL", "BATT_ACTIVATE",
"BATT_ACTIVATED", "BATT_CHARGING", "BATT_CHARGED",
"BATT_CHG_CURR_LOW", "BATT_POWER_LOW_WARN",
"BATT_POWER_LOW_CRIT" };



There are a couple of issues with the version of Bruno's driver that I have:

* power management is disabled (in the driver, not in the charger) - 
think suspend/resume
* the temperature sensor data is not turned into a temperature value 
correctly

* the IRQ handlers need to be cleaned up (remove logging)
* device tree binding documentation is missing

Other than that, it's basically working and I have been using it at 
least for some testing.


The device tree bindings support:
* OCV curve support
* battery resistance
* battery capacity
* temp sensor settings

You can find my copy of Bruno's driver here:
https://github.com/mhaas/linux-sunxi/blob/axp209-charger/drivers/power/axp20x_fuel_gauge.c

Thanks,

Michael







Re: [linux-sunxi] [PATCH 2/4] power: add axp20x-battery driver

2016-07-05 Thread Michael Haas

Hi There,

On 05.07.2016 10:33, Icenowy Zheng wrote:

On 01.07.2016 11:29, Icenowy Zheng wrote:



  +
  +static enum power_supply_property axp22x_battery_properties[] = {
  + POWER_SUPPLY_PROP_CAPACITY,
  + POWER_SUPPLY_PROP_HEALTH,
  + POWER_SUPPLY_PROP_PRESENT,
  + POWER_SUPPLY_PROP_STATUS,
  + POWER_SUPPLY_PROP_CURRENT_NOW,
  + POWER_SUPPLY_PROP_VOLTAGE_NOW,
  +};


Here's what Bruno's driver supports:

static enum power_supply_property axp20x_battery_power_properties[] = {
POWER_SUPPLY_PROP_PRESENT,
POWER_SUPPLY_PROP_ONLINE,
POWER_SUPPLY_PROP_STATUS,
POWER_SUPPLY_PROP_VOLTAGE_NOW,
POWER_SUPPLY_PROP_CURRENT_NOW,
POWER_SUPPLY_PROP_CURRENT_MAX,
POWER_SUPPLY_PROP_HEALTH,
POWER_SUPPLY_PROP_TECHNOLOGY,
POWER_SUPPLY_PROP_VOLTAGE_MAX_DESIGN,
POWER_SUPPLY_PROP_VOLTAGE_MIN_DESIGN,
/* POWER_SUPPLY_PROP_POWER_NOW, */
POWER_SUPPLY_PROP_CHARGE_FULL_DESIGN,
/* POWER_SUPPLY_PROP_CHARGE_NOW, */
POWER_SUPPLY_PROP_CAPACITY,
POWER_SUPPLY_PROP_TEMP,
POWER_SUPPLY_PROP_TEMP_ALERT_MIN,
POWER_SUPPLY_PROP_TEMP_ALERT_MAX,
};


  +
  +static int axp20x_battery_probe(struct platform_device *pdev)
  +{
  + struct axp20x_dev *axp20x = dev_get_drvdata(pdev->dev.parent);
  + struct power_supply_config psy_cfg = {};
  + struct axp20x_battery *power;
  + static const char * const axp22x_irq_names[] = {
  + "BATT_PLUGIN", "BATT_REMOVAL", "BATT_ENT_ACT_MODE",
  + "BATT_EXIT_ACT_MODE", "CHARG", "CHARG_DONE", NULL };



And here are the interrupts handled:

static const char * const irq_names[] = { "BATT_HOT", "BATT_COLD",
"BATT_PLUGIN", "BATT_REMOVAL", "BATT_ACTIVATE",
"BATT_ACTIVATED", "BATT_CHARGING", "BATT_CHARGED",
"BATT_CHG_CURR_LOW", "BATT_POWER_LOW_WARN",
"BATT_POWER_LOW_CRIT" };



There are a couple of issues with the version of Bruno's driver that I have:

* power management is disabled (in the driver, not in the charger) - 
think suspend/resume
* the temperature sensor data is not turned into a temperature value 
correctly

* the IRQ handlers need to be cleaned up (remove logging)
* device tree binding documentation is missing

Other than that, it's basically working and I have been using it at 
least for some testing.


The device tree bindings support:
* OCV curve support
* battery resistance
* battery capacity
* temp sensor settings

You can find my copy of Bruno's driver here:
https://github.com/mhaas/linux-sunxi/blob/axp209-charger/drivers/power/axp20x_fuel_gauge.c

Thanks,

Michael







Re: [PATCH 5/5] overlayfs: Use vfs_getxattr_noperm() for real inode

2016-07-05 Thread Miklos Szeredi
On Tue, Jul 5, 2016 at 11:16 PM, Vivek Goyal  wrote:
> On Tue, Jul 05, 2016 at 01:29:39PM -0700, Casey Schaufler wrote:
>> On 7/5/2016 8:50 AM, Vivek Goyal wrote:
>> > ovl_getxattr() currently uses vfs_getxattr() on realinode. This fails
>> > if mounter does not have DAC/MAC permission to access getxattr.
>> >
>> > Specifically this becomes a problem when selinux is trying to initialize
>> > overlay inode and does ->getxattr(overlay_inode). A task might trigger
>> > initialization of overlay inode and we will access real inode xattr in the
>> > context of mounter and if mounter does not have permissions, then inode
>> > selinux context initialization fails and inode is labeled as unlabeled_t.
>> >
>> > One way to deal with it is to let SELinux do getxattr checks both on
>> > overlay inode and underlying inode and overlay can call 
>> > vfs_getxattr_noperm()
>> > to make sure when selinux is trying to initialize label on inode, it does
>> > not go through checks on lower levels and initialization is successful.
>> > And after inode initialization, SELinux will make sure task has getatttr
>> > permission.
>> >
>> > One issue with this approach is that it does not work for directories as
>> > d_real() returns the overlay dentry for directories and not the underlying
>> > directory dentry.
>> >
>> > Another way to deal with it to introduce another function pointer in
>> > inode_operations, say getxattr_noperm(), which is responsible to get
>> > xattr without any checks. SELinux initialization code will call this
>> > first if it is available on inode. So user space code path will call
>> > ->getxattr() and that will go through checks and SELinux internal
>> > initialization will call ->getxattr_noperm() and that will not
>> > go through checks.
>> >
>> > For now, I am just converting ovl_getxattr() to get xattr without
>> > any checks on underlying inode. That means it is possible for
>> > a task to get xattr of a file/dir on lower/upper through overlay mount
>> > while it is not possible outside overlay mount.
>> >
>> > If this is a major concern, I can look into implementing getxattr_noperm().
>>
>> This is a major concern.
>
> Hmm.., In that case I will write patch to provide another inode operation
> getxattr_noperm() and a wrapper which falls back to getxattr() if noperm
> variant is not defined. That should take care of this issue.

That's not going to fly.  A slighly better, but still quite ugly
solution would be to add a "flags" arg to the current ->getxattr()
callback indicating whether the caller wants permission checking
inside the call or not.

But we already have the current->creds.  Can't that be used to control
the permission checking done by the callback?

Thanks,
Miklos


Re: [PATCH 5/5] overlayfs: Use vfs_getxattr_noperm() for real inode

2016-07-05 Thread Miklos Szeredi
On Tue, Jul 5, 2016 at 11:16 PM, Vivek Goyal  wrote:
> On Tue, Jul 05, 2016 at 01:29:39PM -0700, Casey Schaufler wrote:
>> On 7/5/2016 8:50 AM, Vivek Goyal wrote:
>> > ovl_getxattr() currently uses vfs_getxattr() on realinode. This fails
>> > if mounter does not have DAC/MAC permission to access getxattr.
>> >
>> > Specifically this becomes a problem when selinux is trying to initialize
>> > overlay inode and does ->getxattr(overlay_inode). A task might trigger
>> > initialization of overlay inode and we will access real inode xattr in the
>> > context of mounter and if mounter does not have permissions, then inode
>> > selinux context initialization fails and inode is labeled as unlabeled_t.
>> >
>> > One way to deal with it is to let SELinux do getxattr checks both on
>> > overlay inode and underlying inode and overlay can call 
>> > vfs_getxattr_noperm()
>> > to make sure when selinux is trying to initialize label on inode, it does
>> > not go through checks on lower levels and initialization is successful.
>> > And after inode initialization, SELinux will make sure task has getatttr
>> > permission.
>> >
>> > One issue with this approach is that it does not work for directories as
>> > d_real() returns the overlay dentry for directories and not the underlying
>> > directory dentry.
>> >
>> > Another way to deal with it to introduce another function pointer in
>> > inode_operations, say getxattr_noperm(), which is responsible to get
>> > xattr without any checks. SELinux initialization code will call this
>> > first if it is available on inode. So user space code path will call
>> > ->getxattr() and that will go through checks and SELinux internal
>> > initialization will call ->getxattr_noperm() and that will not
>> > go through checks.
>> >
>> > For now, I am just converting ovl_getxattr() to get xattr without
>> > any checks on underlying inode. That means it is possible for
>> > a task to get xattr of a file/dir on lower/upper through overlay mount
>> > while it is not possible outside overlay mount.
>> >
>> > If this is a major concern, I can look into implementing getxattr_noperm().
>>
>> This is a major concern.
>
> Hmm.., In that case I will write patch to provide another inode operation
> getxattr_noperm() and a wrapper which falls back to getxattr() if noperm
> variant is not defined. That should take care of this issue.

That's not going to fly.  A slighly better, but still quite ugly
solution would be to add a "flags" arg to the current ->getxattr()
callback indicating whether the caller wants permission checking
inside the call or not.

But we already have the current->creds.  Can't that be used to control
the permission checking done by the callback?

Thanks,
Miklos


Re: [PATCH v2 0/7] lib: string: add functions to case-convert strings

2016-07-05 Thread Markus Mayer
On 5 July 2016 at 15:56, Joe Perches  wrote:
> On Tue, 2016-07-05 at 15:36 -0700, Markus Mayer wrote:
>> On 5 July 2016 at 15:14, Joe Perches  wrote:
>> > On Tue, 2016-07-05 at 13:47 -0700, Markus Mayer wrote:
>> > > This series introduces a family of generic string case conversion
>> > > functions. This kind of functionality is needed in several places in
>> > > the kernel. Right now, everybody seems to be implementing their own
>> > > copy of this functionality.
>> > >
>> > > Based on the discussion of the previous version of this series[1] and
>> > > the use cases found in the kernel, it does look like having several
>> > > flavours of case conversion functions is beneficial. The use cases fall
>> > > into three categories:
>> > > - copying a string and converting the case while specifying a
>> > >   maximum length to mimic strncpy()
>> > > - copying a string and converting the case without specifying a
>> > >   length to mimic strcpy()
>> > > - converting the case of a string in-place (i.e. modifying the
>> > >   string that was passed in)
>> > >
>> > > Consequently, I am proposing these new functions:
>> > > char *strncpytoupper(char *dst, const char *src, size_t len);
>> > > char *strncpytolower(char *dst, const char *src, size_t len);
>> > > char *strcpytoupper(char *dst, const char *src);
>> > > char *strcpytolower(char *dst, const char *src);
>> > > char *strtoupper(char *s);
>> > > char *strtolower(char *s);
>> > I think there isn't much value in anything other
>> > than strto.
>> >
>> > Using str[n]cpy followed by strto is
>> > pretty obvious and rarely used anyway.
>> First time around, folks were proposing the "copy" variants when I
>> submitted just strtolower() by itself[1]. They just asked for source
>> and destination parameters to strtolower(), but looking at the use
>> cases that wouldn't have worked so well. Hence it evolved into these 6
>> functions.
>>
>> Here's a breakdown of how the functions are being used (patches 2-7),
>> see also [2]:
>>
>> Patch 2: strncpytolower()
>> Patch 3: strtolower()
>> Patch 4: strncpytolower() and strtolower()
>> Patch 5: strtolower()
>> Patch 6: strcpytoupper()
>> Patch 7: strcpytoupper()
>>
>> So it does look like the copy + change case variant is more frequently
>> used than just strto.
>
> Are these functions useful?   Not to me, not so much.

The use cases do exist. I'll leave it up to the maintainers to decide
whether duplicate implementations or potentially unused generic
functions are to be preferred.

What I do know is that I have a driver in the wings that also needs a
strolower() implementation. If this ends up not being accepted, I'll
have to add yet another driver-local strtolower() implementation to
the kernel. But if that's the decision then so be it.

> None of the functions would have the strcpy performance of
> the arch / asm
> versions of strcpy and the savings in overall
> code isn't significant (or
> measured?).

100% agreed. These functions won't set any speed records. Keep in mind
also that in 4 out of the 6 cases where I replaced local
implementations of "strcpytoXXX", the code was doing essentially the
same I am proposing here (no assembly code). Only 2 of the 6 called
strncpy() before, benefiting from optimized assembly implementations.
But they still had to walk the string explicitly afterwards to convert
the case, which probably means the overall speed won't change that
much using the functions proposed here.

> Of course none of the uses are runtime performance important.

That is also very true.

> This patch also adds always compiled functions that aren't used
> in many .configs.

It adds 2 functions (strncpytolower() and strncpytoupper()). The other
4 are static inline one-liners and won't show up anywhere if not used
(and if used the compiler will insert calls to strncpyto
instead).

Regards,
-Markus


Re: [PATCH v2 0/7] lib: string: add functions to case-convert strings

2016-07-05 Thread Markus Mayer
On 5 July 2016 at 15:56, Joe Perches  wrote:
> On Tue, 2016-07-05 at 15:36 -0700, Markus Mayer wrote:
>> On 5 July 2016 at 15:14, Joe Perches  wrote:
>> > On Tue, 2016-07-05 at 13:47 -0700, Markus Mayer wrote:
>> > > This series introduces a family of generic string case conversion
>> > > functions. This kind of functionality is needed in several places in
>> > > the kernel. Right now, everybody seems to be implementing their own
>> > > copy of this functionality.
>> > >
>> > > Based on the discussion of the previous version of this series[1] and
>> > > the use cases found in the kernel, it does look like having several
>> > > flavours of case conversion functions is beneficial. The use cases fall
>> > > into three categories:
>> > > - copying a string and converting the case while specifying a
>> > >   maximum length to mimic strncpy()
>> > > - copying a string and converting the case without specifying a
>> > >   length to mimic strcpy()
>> > > - converting the case of a string in-place (i.e. modifying the
>> > >   string that was passed in)
>> > >
>> > > Consequently, I am proposing these new functions:
>> > > char *strncpytoupper(char *dst, const char *src, size_t len);
>> > > char *strncpytolower(char *dst, const char *src, size_t len);
>> > > char *strcpytoupper(char *dst, const char *src);
>> > > char *strcpytolower(char *dst, const char *src);
>> > > char *strtoupper(char *s);
>> > > char *strtolower(char *s);
>> > I think there isn't much value in anything other
>> > than strto.
>> >
>> > Using str[n]cpy followed by strto is
>> > pretty obvious and rarely used anyway.
>> First time around, folks were proposing the "copy" variants when I
>> submitted just strtolower() by itself[1]. They just asked for source
>> and destination parameters to strtolower(), but looking at the use
>> cases that wouldn't have worked so well. Hence it evolved into these 6
>> functions.
>>
>> Here's a breakdown of how the functions are being used (patches 2-7),
>> see also [2]:
>>
>> Patch 2: strncpytolower()
>> Patch 3: strtolower()
>> Patch 4: strncpytolower() and strtolower()
>> Patch 5: strtolower()
>> Patch 6: strcpytoupper()
>> Patch 7: strcpytoupper()
>>
>> So it does look like the copy + change case variant is more frequently
>> used than just strto.
>
> Are these functions useful?   Not to me, not so much.

The use cases do exist. I'll leave it up to the maintainers to decide
whether duplicate implementations or potentially unused generic
functions are to be preferred.

What I do know is that I have a driver in the wings that also needs a
strolower() implementation. If this ends up not being accepted, I'll
have to add yet another driver-local strtolower() implementation to
the kernel. But if that's the decision then so be it.

> None of the functions would have the strcpy performance of
> the arch / asm
> versions of strcpy and the savings in overall
> code isn't significant (or
> measured?).

100% agreed. These functions won't set any speed records. Keep in mind
also that in 4 out of the 6 cases where I replaced local
implementations of "strcpytoXXX", the code was doing essentially the
same I am proposing here (no assembly code). Only 2 of the 6 called
strncpy() before, benefiting from optimized assembly implementations.
But they still had to walk the string explicitly afterwards to convert
the case, which probably means the overall speed won't change that
much using the functions proposed here.

> Of course none of the uses are runtime performance important.

That is also very true.

> This patch also adds always compiled functions that aren't used
> in many .configs.

It adds 2 functions (strncpytolower() and strncpytoupper()). The other
4 are static inline one-liners and won't show up anywhere if not used
(and if used the compiler will insert calls to strncpyto
instead).

Regards,
-Markus


Re: [PATCH 04/14] PCI: generic: make it explicitly non-modular

2016-07-05 Thread Jayachandran C
On Wed, Jul 6, 2016 at 1:49 AM, David Daney  wrote:
>
> On 07/04/2016 10:37 AM, Will Deacon wrote:
>>
>> On Sat, Jul 02, 2016 at 07:13:24PM -0400, Paul Gortmaker wrote:
>>>
>>> The Kconfig currently controlling compilation of this code is:
>>>
>>> drivers/pci/host/Kconfig:config PCI_HOST_GENERIC
>>> drivers/pci/host/Kconfig:   bool "Generic PCI host controller"
>>>
>>> ...meaning that it currently is not being built as a module by anyone.
>>>
>>> Lets remove the few trace uses of modular code and macros, so that
>>> when reading the driver there is no doubt it is builtin-only.
>>>
>>> Since module_platform_driver() uses the same init level priority as
>>> builtin_platform_driver() the init ordering remains unchanged with
>>> this commit.
>>>
>>> Also note that MODULE_DEVICE_TABLE is a no-op for non-modular code.
>>>
>>> We also delete the MODULE_LICENSE tag etc. since all that information
>>> is already contained at the top of the file in the comments.
>>
>>
>> Ideally, we'd simply fix this to build as a module, but it's not clear
>> how to do that now that the ecam accessors have been split out into
>> their own file. A liberal sprinkling of EXPORT_SYMBOL might work, but
>> it's a bit grotty.
>>
>> David, Jayachandran -- do you have any desire to build your PCI host
>> controller drivers as modules?
>
>
> I can only speak to the Cavium case.
>
> The system is not usable without PCI, so there is no advantage to making the 
> PCI host drivers modular.  At this point I don't see any reason to expend 
> effort making it work as a module.
>

Broadcom's case is also similar - pci-host-generic would to be built-in
when booting with device tree, or it would be unnecessary when booting
with ACPI. We don't have a scenario in which having it as a module would
be useful.

JC


Re: [PATCH 3/3] xen: support runqueue steal time on xen

2016-07-05 Thread Juergen Gross
On 05/07/16 17:23, Stefano Stabellini wrote:
> On Wed, 22 Jun 2016, Juergen Gross wrote:
>> Up to now reading the stolen time of a remote cpu was not possible in a
>> performant way under Xen. This made support of runqueue steal time via
>> paravirt_steal_rq_enabled impossible.
>>
>> With the addition of an appropriate hypervisor interface this is now
>> possible, so add the support.
>>
>> Signed-off-by: Juergen Gross 
>> ---
>>  drivers/xen/time.c | 42 +-
>>  1 file changed, 25 insertions(+), 17 deletions(-)
>>
>> diff --git a/drivers/xen/time.c b/drivers/xen/time.c
>> index 2257b66..65afbe9 100644
>> --- a/drivers/xen/time.c
>> +++ b/drivers/xen/time.c
>> @@ -19,6 +19,9 @@
>>  /* runstate info updated by Xen */
>>  static DEFINE_PER_CPU(struct vcpu_runstate_info, xen_runstate);
>>  
>> +/* runstate info of remote cpu accessible */
>> +static bool xen_runstate_remote;
> 
> Honestly I would rather have one global variable less if it means only
> dropping one BUG_ON.

Okay, I'll remove it.


Juergen


Re: [PATCH 3/3] xen: support runqueue steal time on xen

2016-07-05 Thread Juergen Gross
On 05/07/16 17:23, Stefano Stabellini wrote:
> On Wed, 22 Jun 2016, Juergen Gross wrote:
>> Up to now reading the stolen time of a remote cpu was not possible in a
>> performant way under Xen. This made support of runqueue steal time via
>> paravirt_steal_rq_enabled impossible.
>>
>> With the addition of an appropriate hypervisor interface this is now
>> possible, so add the support.
>>
>> Signed-off-by: Juergen Gross 
>> ---
>>  drivers/xen/time.c | 42 +-
>>  1 file changed, 25 insertions(+), 17 deletions(-)
>>
>> diff --git a/drivers/xen/time.c b/drivers/xen/time.c
>> index 2257b66..65afbe9 100644
>> --- a/drivers/xen/time.c
>> +++ b/drivers/xen/time.c
>> @@ -19,6 +19,9 @@
>>  /* runstate info updated by Xen */
>>  static DEFINE_PER_CPU(struct vcpu_runstate_info, xen_runstate);
>>  
>> +/* runstate info of remote cpu accessible */
>> +static bool xen_runstate_remote;
> 
> Honestly I would rather have one global variable less if it means only
> dropping one BUG_ON.

Okay, I'll remove it.


Juergen


Re: [PATCH 04/14] PCI: generic: make it explicitly non-modular

2016-07-05 Thread Jayachandran C
On Wed, Jul 6, 2016 at 1:49 AM, David Daney  wrote:
>
> On 07/04/2016 10:37 AM, Will Deacon wrote:
>>
>> On Sat, Jul 02, 2016 at 07:13:24PM -0400, Paul Gortmaker wrote:
>>>
>>> The Kconfig currently controlling compilation of this code is:
>>>
>>> drivers/pci/host/Kconfig:config PCI_HOST_GENERIC
>>> drivers/pci/host/Kconfig:   bool "Generic PCI host controller"
>>>
>>> ...meaning that it currently is not being built as a module by anyone.
>>>
>>> Lets remove the few trace uses of modular code and macros, so that
>>> when reading the driver there is no doubt it is builtin-only.
>>>
>>> Since module_platform_driver() uses the same init level priority as
>>> builtin_platform_driver() the init ordering remains unchanged with
>>> this commit.
>>>
>>> Also note that MODULE_DEVICE_TABLE is a no-op for non-modular code.
>>>
>>> We also delete the MODULE_LICENSE tag etc. since all that information
>>> is already contained at the top of the file in the comments.
>>
>>
>> Ideally, we'd simply fix this to build as a module, but it's not clear
>> how to do that now that the ecam accessors have been split out into
>> their own file. A liberal sprinkling of EXPORT_SYMBOL might work, but
>> it's a bit grotty.
>>
>> David, Jayachandran -- do you have any desire to build your PCI host
>> controller drivers as modules?
>
>
> I can only speak to the Cavium case.
>
> The system is not usable without PCI, so there is no advantage to making the 
> PCI host drivers modular.  At this point I don't see any reason to expend 
> effort making it work as a module.
>

Broadcom's case is also similar - pci-host-generic would to be built-in
when booting with device tree, or it would be unnecessary when booting
with ACPI. We don't have a scenario in which having it as a module would
be useful.

JC


Re: [f2fs-dev] [PATCH 1/2] f2fs: check only data or node for summary

2016-07-05 Thread He YunLei

On 2016/6/11 5:01, Jaegeuk Kim wrote:

We can check data or node types only for gc, since we allocate different type of
data/node blocks in a different logs occasionally.

Signed-off-by: Jaegeuk Kim 
---
  fs/f2fs/gc.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index e1d274c..c2c4ac3 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -806,7 +806,8 @@ static int do_garbage_collect(struct f2fs_sb_info *sbi,
f2fs_put_page(sum_page, 0);

sum = page_address(sum_page);
-   f2fs_bug_on(sbi, type != GET_SUM_TYPE((>footer)));
+   f2fs_bug_on(sbi, IS_DATASEG(type) !=
+   IS_DATASEG(GET_SUM_TYPE((>footer;


Hi, Kim
type has been transformed into SUM_TYPE_DATA or SUM_TYPE_NODE, so here
no need to do this.

Thanks


/*
 * this is to avoid deadlock:





Re: [f2fs-dev] [PATCH 1/2] f2fs: check only data or node for summary

2016-07-05 Thread He YunLei

On 2016/6/11 5:01, Jaegeuk Kim wrote:

We can check data or node types only for gc, since we allocate different type of
data/node blocks in a different logs occasionally.

Signed-off-by: Jaegeuk Kim 
---
  fs/f2fs/gc.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index e1d274c..c2c4ac3 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -806,7 +806,8 @@ static int do_garbage_collect(struct f2fs_sb_info *sbi,
f2fs_put_page(sum_page, 0);

sum = page_address(sum_page);
-   f2fs_bug_on(sbi, type != GET_SUM_TYPE((>footer)));
+   f2fs_bug_on(sbi, IS_DATASEG(type) !=
+   IS_DATASEG(GET_SUM_TYPE((>footer;


Hi, Kim
type has been transformed into SUM_TYPE_DATA or SUM_TYPE_NODE, so here
no need to do this.

Thanks


/*
 * this is to avoid deadlock:





Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-05 Thread Neo Jia
On Wed, Jul 06, 2016 at 10:22:59AM +0800, Xiao Guangrong wrote:
> 
> 
> On 07/05/2016 11:07 PM, Neo Jia wrote:
> >This is kept there in case the validate_map_request() is not provided by 
> >vendor
> >driver then by default assume 1:1 mapping. So if validate_map_request() is 
> >not
> >provided, fault handler should not fail.
> 
> THESE are the parameters you passed to validate_map_request(), and these info 
> is
> available in mmap(), it really does not matter if you move 
> validate_map_request()
> to mmap(). That's what i want to say.

Let me answer this at the end of my response.

> 
> >
> >>
> >>>None of such information is available at VFIO mmap() time. For example, 
> >>>several VMs
> >>>are sharing the same physical device to provide mediated access. All VMs 
> >>>will
> >>>call the VFIO mmap() on their virtual BAR as part of QEMU vfio/pci 
> >>>initialization
> >>>process, at that moment, we definitely can't mmap the entire physical MMIO
> >>>into both VM blindly for obvious reason.
> >>>
> >>
> >>mmap() carries @length information, so you only need to allocate the 
> >>specified size
> >>(corresponding to @length) of memory for them.
> >
> >Again, you still look at this as a static partition at QEMU configuration 
> >time
> >where the guest mmio will be mapped as a whole at some offset of the physical
> >mmio region. (You still can do that like I said above by not providing
> >validate_map_request() in your vendor driver.)
> >
> 
> Then you can move validate_map_request() to here to achieve custom 
> allocation-policy.
> 
> >But this is not the framework we are defining here.
> >
> >The framework we have here is to provide the driver vendor flexibility to 
> >decide
> >the guest mmio and physical mmio mapping on page basis, and such information 
> >is
> >available during runtime.
> >
> >How such information gets communicated between guest and host driver is up to
> >driver vendor.
> 
> The problems is the sequence of the way "provide the driver vendor
> flexibility to decide the guest mmio and physical mmio mapping on page basis"
> and mmap().
> 
> We should provide such allocation info first then do mmap(). You current 
> design,
> do mmap() -> communication telling such info -> use such info when fault 
> happens,
> is really BAD, because you can not control the time when memory fault will 
> happen.
> The guest may access this memory before the communication you mentioned above,
> and another reason is that KVM MMU can prefetch memory at any time.

Like I have said before if your implementation doesn't need such flexibility,
you can still do a static mapping at VFIO mmap() time, then your mediated driver
doesn't have to provide validate_map_request, also the fault handler will not be
called. 

Let me address your questions below.

1. Information available at VFIO mmap() time?

So you are saying that the @req_size and  are both available in the time
when people are calling VFIO mmap() when guest OS is not even running, right?

The answer is No, the only thing are available at VFIO mmap are the following:

1) guest MMIO size

2) host physical MMIO size

3) guest MMIO starting address

4) host MMIO starting address

But none of above are the @req_size and @pgoff that we are talking about at the
validate_map_request time.

Our host MMIO is representing the means to access GPU HW resource. Those GPU HW
resources are allocated dynamically at runtime. So we have no visibility of the
@pgoff and @req_size that is covering some specific type of GPU HW resource at
VFIO mmap time. Also we don't even know if such resource will be required for a
particular VM or not.

For example, VM1 will need to launch a lot of graphics workload than VM2. So the
end result is that VM1 will gets a lot of resource A allocated than VM2 to
support his graphics workload. And to access resource A, the host mmio region
will be allocated as well, say [pfn_a size_a], the VM2 is [pfn_b, size_b].

Clearly, such region can be destroyed and reallocated through mediated driver
lifetime. This is why we need to have a fault handler there to map the proper
pages into guest after validation in the runtime.

I hope above response can address your question why we can't provide such
allocation info at VFIO mmap() time.

2. Guest might access mmio region at any time ...

Guest with a mediated GPU inside can definitely access his BARs at any time. If
guest is accessing some his BAR region that is not previously allocated, then
such access will be denied and with current scheme VM will crash to prevent
malicious access from the guest. This is another reason we choose to keep the
guest MMIO mediated.

3. KVM MMU can prefetch memory at any time.

You are talking about the KVM MMU prefetch the guest mmio region which is marked
as prefetchable right?

On baremetal, the prefetch is basic a cache line fill, where the range needs to
be marked as cachable for the CPU, then it issues a read to anywhere in the
cache line.

Is KVM MMU prefetch the same as 

Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-05 Thread Neo Jia
On Wed, Jul 06, 2016 at 10:22:59AM +0800, Xiao Guangrong wrote:
> 
> 
> On 07/05/2016 11:07 PM, Neo Jia wrote:
> >This is kept there in case the validate_map_request() is not provided by 
> >vendor
> >driver then by default assume 1:1 mapping. So if validate_map_request() is 
> >not
> >provided, fault handler should not fail.
> 
> THESE are the parameters you passed to validate_map_request(), and these info 
> is
> available in mmap(), it really does not matter if you move 
> validate_map_request()
> to mmap(). That's what i want to say.

Let me answer this at the end of my response.

> 
> >
> >>
> >>>None of such information is available at VFIO mmap() time. For example, 
> >>>several VMs
> >>>are sharing the same physical device to provide mediated access. All VMs 
> >>>will
> >>>call the VFIO mmap() on their virtual BAR as part of QEMU vfio/pci 
> >>>initialization
> >>>process, at that moment, we definitely can't mmap the entire physical MMIO
> >>>into both VM blindly for obvious reason.
> >>>
> >>
> >>mmap() carries @length information, so you only need to allocate the 
> >>specified size
> >>(corresponding to @length) of memory for them.
> >
> >Again, you still look at this as a static partition at QEMU configuration 
> >time
> >where the guest mmio will be mapped as a whole at some offset of the physical
> >mmio region. (You still can do that like I said above by not providing
> >validate_map_request() in your vendor driver.)
> >
> 
> Then you can move validate_map_request() to here to achieve custom 
> allocation-policy.
> 
> >But this is not the framework we are defining here.
> >
> >The framework we have here is to provide the driver vendor flexibility to 
> >decide
> >the guest mmio and physical mmio mapping on page basis, and such information 
> >is
> >available during runtime.
> >
> >How such information gets communicated between guest and host driver is up to
> >driver vendor.
> 
> The problems is the sequence of the way "provide the driver vendor
> flexibility to decide the guest mmio and physical mmio mapping on page basis"
> and mmap().
> 
> We should provide such allocation info first then do mmap(). You current 
> design,
> do mmap() -> communication telling such info -> use such info when fault 
> happens,
> is really BAD, because you can not control the time when memory fault will 
> happen.
> The guest may access this memory before the communication you mentioned above,
> and another reason is that KVM MMU can prefetch memory at any time.

Like I have said before if your implementation doesn't need such flexibility,
you can still do a static mapping at VFIO mmap() time, then your mediated driver
doesn't have to provide validate_map_request, also the fault handler will not be
called. 

Let me address your questions below.

1. Information available at VFIO mmap() time?

So you are saying that the @req_size and  are both available in the time
when people are calling VFIO mmap() when guest OS is not even running, right?

The answer is No, the only thing are available at VFIO mmap are the following:

1) guest MMIO size

2) host physical MMIO size

3) guest MMIO starting address

4) host MMIO starting address

But none of above are the @req_size and @pgoff that we are talking about at the
validate_map_request time.

Our host MMIO is representing the means to access GPU HW resource. Those GPU HW
resources are allocated dynamically at runtime. So we have no visibility of the
@pgoff and @req_size that is covering some specific type of GPU HW resource at
VFIO mmap time. Also we don't even know if such resource will be required for a
particular VM or not.

For example, VM1 will need to launch a lot of graphics workload than VM2. So the
end result is that VM1 will gets a lot of resource A allocated than VM2 to
support his graphics workload. And to access resource A, the host mmio region
will be allocated as well, say [pfn_a size_a], the VM2 is [pfn_b, size_b].

Clearly, such region can be destroyed and reallocated through mediated driver
lifetime. This is why we need to have a fault handler there to map the proper
pages into guest after validation in the runtime.

I hope above response can address your question why we can't provide such
allocation info at VFIO mmap() time.

2. Guest might access mmio region at any time ...

Guest with a mediated GPU inside can definitely access his BARs at any time. If
guest is accessing some his BAR region that is not previously allocated, then
such access will be denied and with current scheme VM will crash to prevent
malicious access from the guest. This is another reason we choose to keep the
guest MMIO mediated.

3. KVM MMU can prefetch memory at any time.

You are talking about the KVM MMU prefetch the guest mmio region which is marked
as prefetchable right?

On baremetal, the prefetch is basic a cache line fill, where the range needs to
be marked as cachable for the CPU, then it issues a read to anywhere in the
cache line.

Is KVM MMU prefetch the same as 

Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-05 Thread Xiao Guangrong



On 07/06/2016 10:57 AM, Neo Jia wrote:

On Wed, Jul 06, 2016 at 10:35:18AM +0800, Xiao Guangrong wrote:



On 07/06/2016 10:18 AM, Neo Jia wrote:

On Wed, Jul 06, 2016 at 10:00:46AM +0800, Xiao Guangrong wrote:



On 07/05/2016 08:18 PM, Paolo Bonzini wrote:



On 05/07/2016 07:41, Neo Jia wrote:

On Thu, Jun 30, 2016 at 03:01:49PM +0200, Paolo Bonzini wrote:

The vGPU folks would like to trap the first access to a BAR by setting
vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault handler
then can use remap_pfn_range to place some non-reserved pages in the VMA.

KVM lacks support for this kind of non-linear VM_PFNMAP mapping, and these
patches should fix this.


Hi Paolo,

I have tested your patches with the mediated passthru patchset that is being
reviewed in KVM and QEMU mailing list.

The fault handler gets called successfully and the previously mapped memory gets
unmmaped correctly via unmap_mapping_range.


Great, then I'll include them in 4.8.


Code is okay, but i still suspect if this implementation, fetch mmio pages in 
fault
handler, is needed. We'd better include these patches after the design of vfio
framework is decided.


Hi Guangrong,

I disagree. The design of VFIO framework has been actively discussed in the KVM
and QEMU mailing for a while and the fault handler is agreed upon to provide the
flexibility for different driver vendors' implementation. With that said, I am
still open to discuss with you and anybody else about this framework as the goal
is to allow multiple vendor to plugin into this framework to support their
mediated device virtualization scheme, such as Intel, IBM and us.


The discussion is still going on. And current vfio patchset we reviewed is still
problematic.


My point is the fault handler part has been discussed already, with that said I
am always open to any constructive suggestions to make things better and
maintainable. (Appreciate your code review on the VFIO thread, I think we still
own you another response, will do that.)



It always can be changed especially the vfio patchset is not in a good shape.





May I ask you what the exact issue you have with this interface for Intel to 
support
your own GPU virtualization?


Intel's vGPU can work with this framework. We really appreciate your / nvidia's
contribution.


Then, I don't think we should embargo Paolo's patch.


This patchset is specific for the framework design, i.e, mapping memory when 
fault
happens rather than mmap(), and this design is exact what we are discussing for
nearly two days.



Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-05 Thread Xiao Guangrong



On 07/06/2016 10:57 AM, Neo Jia wrote:

On Wed, Jul 06, 2016 at 10:35:18AM +0800, Xiao Guangrong wrote:



On 07/06/2016 10:18 AM, Neo Jia wrote:

On Wed, Jul 06, 2016 at 10:00:46AM +0800, Xiao Guangrong wrote:



On 07/05/2016 08:18 PM, Paolo Bonzini wrote:



On 05/07/2016 07:41, Neo Jia wrote:

On Thu, Jun 30, 2016 at 03:01:49PM +0200, Paolo Bonzini wrote:

The vGPU folks would like to trap the first access to a BAR by setting
vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault handler
then can use remap_pfn_range to place some non-reserved pages in the VMA.

KVM lacks support for this kind of non-linear VM_PFNMAP mapping, and these
patches should fix this.


Hi Paolo,

I have tested your patches with the mediated passthru patchset that is being
reviewed in KVM and QEMU mailing list.

The fault handler gets called successfully and the previously mapped memory gets
unmmaped correctly via unmap_mapping_range.


Great, then I'll include them in 4.8.


Code is okay, but i still suspect if this implementation, fetch mmio pages in 
fault
handler, is needed. We'd better include these patches after the design of vfio
framework is decided.


Hi Guangrong,

I disagree. The design of VFIO framework has been actively discussed in the KVM
and QEMU mailing for a while and the fault handler is agreed upon to provide the
flexibility for different driver vendors' implementation. With that said, I am
still open to discuss with you and anybody else about this framework as the goal
is to allow multiple vendor to plugin into this framework to support their
mediated device virtualization scheme, such as Intel, IBM and us.


The discussion is still going on. And current vfio patchset we reviewed is still
problematic.


My point is the fault handler part has been discussed already, with that said I
am always open to any constructive suggestions to make things better and
maintainable. (Appreciate your code review on the VFIO thread, I think we still
own you another response, will do that.)



It always can be changed especially the vfio patchset is not in a good shape.





May I ask you what the exact issue you have with this interface for Intel to 
support
your own GPU virtualization?


Intel's vGPU can work with this framework. We really appreciate your / nvidia's
contribution.


Then, I don't think we should embargo Paolo's patch.


This patchset is specific for the framework design, i.e, mapping memory when 
fault
happens rather than mmap(), and this design is exact what we are discussing for
nearly two days.



Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.

2016-07-05 Thread Oleg Drokin

On Jul 5, 2016, at 11:20 PM, Al Viro wrote:

> On Tue, Jul 05, 2016 at 08:29:37PM -0400, Oleg Drokin wrote:
>>> +   /* Otherwise we just unhash it to be rehashed afresh via
>>> +* lookup if necessary
>>> +*/
>>> +   d_drop(dentry);
>> 
>> So we can even drop this part and retain the top condition as it was.
>> d_add does not care if the dentry we are feeding it was hashed or not,
>> so do you see any downsides to doing that I wonder?
> 
> d_add() on hashed dentry will end up reaching this:
> static void __d_rehash(struct dentry * entry, struct hlist_bl_head *b)
> {
>BUG_ON(!d_unhashed(entry));

Ah, ok. Yes, I remember about it now from the older issue with nfs.

It's still puzzling why I did not hit it yet, but oh well.

I wonder if handling of negative dentries broke… Time for more investigations.



Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.

2016-07-05 Thread Oleg Drokin

On Jul 5, 2016, at 11:20 PM, Al Viro wrote:

> On Tue, Jul 05, 2016 at 08:29:37PM -0400, Oleg Drokin wrote:
>>> +   /* Otherwise we just unhash it to be rehashed afresh via
>>> +* lookup if necessary
>>> +*/
>>> +   d_drop(dentry);
>> 
>> So we can even drop this part and retain the top condition as it was.
>> d_add does not care if the dentry we are feeding it was hashed or not,
>> so do you see any downsides to doing that I wonder?
> 
> d_add() on hashed dentry will end up reaching this:
> static void __d_rehash(struct dentry * entry, struct hlist_bl_head *b)
> {
>BUG_ON(!d_unhashed(entry));

Ah, ok. Yes, I remember about it now from the older issue with nfs.

It's still puzzling why I did not hit it yet, but oh well.

I wonder if handling of negative dentries broke… Time for more investigations.



Re: [PATCH 2/2] s390/mm: use ipte range to invalidate multiple page table entries

2016-07-05 Thread Hillf Danton
> 
> +void ptep_invalidate_range(struct mm_struct *mm, unsigned long start,
> +unsigned long end, pte_t *ptep)
> +{
> + unsigned long nr;
> +
> + if (!MACHINE_HAS_IPTE_RANGE || mm_has_pgste(mm))
> + return;
> + preempt_disable();
> + nr = (end - start) >> PAGE_SHIFT;
> + /* If the flush is likely to be local skip the ipte range */
> + if (nr && !cpumask_equal(mm_cpumask(mm),
> +  cpumask_of(smp_processor_id(

s/smp/raw_smp/ to avoid adding schedule entry with page table
lock held?

> + __ptep_ipte_range(start, nr - 1, ptep);
> + preempt_enable();
> +}
> +EXPORT_SYMBOL(ptep_invalidate_range);
> +

thanks
Hillf



Re: [PATCH 2/2] s390/mm: use ipte range to invalidate multiple page table entries

2016-07-05 Thread Hillf Danton
> 
> +void ptep_invalidate_range(struct mm_struct *mm, unsigned long start,
> +unsigned long end, pte_t *ptep)
> +{
> + unsigned long nr;
> +
> + if (!MACHINE_HAS_IPTE_RANGE || mm_has_pgste(mm))
> + return;
> + preempt_disable();
> + nr = (end - start) >> PAGE_SHIFT;
> + /* If the flush is likely to be local skip the ipte range */
> + if (nr && !cpumask_equal(mm_cpumask(mm),
> +  cpumask_of(smp_processor_id(

s/smp/raw_smp/ to avoid adding schedule entry with page table
lock held?

> + __ptep_ipte_range(start, nr - 1, ptep);
> + preempt_enable();
> +}
> +EXPORT_SYMBOL(ptep_invalidate_range);
> +

thanks
Hillf



Re: [GIT PULL 3/3] bcm2835-arm64-next-2016-07-03

2016-07-05 Thread Florian Fainelli
Le 05/07/2016 11:03, Jason Cooper a écrit :
> Eric,
> 
> On Sun, Jul 03, 2016 at 05:45:36PM -0700, Eric Anholt wrote:
>>   Linux 4.7-rc1 (2016-05-29 09:29:24 -0700)
>>
>> are available in the git repository at:
>>
>>   https://github.com/anholt/linux tags/bcm2835-arm64-next-2016-07-03
>>
>> for you to fetch changes up to 628d30d1ccb897fee54a6f7312561cf2f6f72f09:
>>
>>   arm64: Add platform selection for BCM2835. (2016-06-15 14:10:30 -0700)
>>
>> 
>> This pull request brings in the build support for the Raspberry Pi
>> arm64 port.
>>
>> 
>> Alexander Graf (1):
>>   arm64: Allow for different DMA and CPU bus offsets
>>
>> Eric Anholt (3):
>>   irqchip: bcm2835: Avoid arch/arm-specific handle_IRQ
>>   Merge remote-tracking branch 'irqchip/irqchip/bcm' into 
>> bcm2835-arm64-next
> 
> You may want to draw attention to the fact that this has an external
> dependency on my irqchip/bcm branch.  Also, that it's stable, been in
> -next for a couple of weeks, and based on v4.7-rc1, etc.

Thanks for the heads-up, merged and updated the merge commit to make
this clear, thanks everyone.
-- 
Florian


Re: [GIT PULL 3/3] bcm2835-arm64-next-2016-07-03

2016-07-05 Thread Florian Fainelli
Le 05/07/2016 11:03, Jason Cooper a écrit :
> Eric,
> 
> On Sun, Jul 03, 2016 at 05:45:36PM -0700, Eric Anholt wrote:
>>   Linux 4.7-rc1 (2016-05-29 09:29:24 -0700)
>>
>> are available in the git repository at:
>>
>>   https://github.com/anholt/linux tags/bcm2835-arm64-next-2016-07-03
>>
>> for you to fetch changes up to 628d30d1ccb897fee54a6f7312561cf2f6f72f09:
>>
>>   arm64: Add platform selection for BCM2835. (2016-06-15 14:10:30 -0700)
>>
>> 
>> This pull request brings in the build support for the Raspberry Pi
>> arm64 port.
>>
>> 
>> Alexander Graf (1):
>>   arm64: Allow for different DMA and CPU bus offsets
>>
>> Eric Anholt (3):
>>   irqchip: bcm2835: Avoid arch/arm-specific handle_IRQ
>>   Merge remote-tracking branch 'irqchip/irqchip/bcm' into 
>> bcm2835-arm64-next
> 
> You may want to draw attention to the fact that this has an external
> dependency on my irqchip/bcm branch.  Also, that it's stable, been in
> -next for a couple of weeks, and based on v4.7-rc1, etc.

Thanks for the heads-up, merged and updated the merge commit to make
this clear, thanks everyone.
-- 
Florian


Re: [GIT PULL 2/3] bcm2835-dt-64-next-2016-07-03

2016-07-05 Thread Florian Fainelli
Le 03/07/2016 17:45, Eric Anholt a écrit :
>   Linux 4.7-rc1 (2016-05-29 09:29:24 -0700)
> 
> are available in the git repository at:
> 
>   https://github.com/anholt/linux tags/bcm2835-dt-64-next-2016-07-03
> 
> for you to fetch changes up to 02d08603649816a941246c18252e5c41fd07625a:
> 
>   ARM: bcm2837: dt: Add the ethernet to the device trees (2016-06-07 15:23:08 
> -0700)
> 
> 
> This pull request brings in the Raspberry Pi 3 DT for its arm64
> support.  Note that it also merges in the ethernet DT changes so that
> the Pi3's ethernet can also get the MAC address.
> 
> 
> Eric Anholt (3):
>   dt-bindings: Add root properties for Raspberry Pi 3
>   ARM: bcm2835: Add devicetree for the Raspberry Pi 3.
>   Merge tag 'bcm2835-dt-ethernet' into HEAD
> 
> Gerd Hoffmann (1):
>   ARM: bcm2837: dt: Add the ethernet to the device trees
> 
> Lubomir Rintel (1):
>   ARM: bcm2835: dt: Add the ethernet to the device trees

Merged, thanks Eric!
-- 
Florian


Re: [GIT PULL 2/3] bcm2835-dt-64-next-2016-07-03

2016-07-05 Thread Florian Fainelli
Le 03/07/2016 17:45, Eric Anholt a écrit :
>   Linux 4.7-rc1 (2016-05-29 09:29:24 -0700)
> 
> are available in the git repository at:
> 
>   https://github.com/anholt/linux tags/bcm2835-dt-64-next-2016-07-03
> 
> for you to fetch changes up to 02d08603649816a941246c18252e5c41fd07625a:
> 
>   ARM: bcm2837: dt: Add the ethernet to the device trees (2016-06-07 15:23:08 
> -0700)
> 
> 
> This pull request brings in the Raspberry Pi 3 DT for its arm64
> support.  Note that it also merges in the ethernet DT changes so that
> the Pi3's ethernet can also get the MAC address.
> 
> 
> Eric Anholt (3):
>   dt-bindings: Add root properties for Raspberry Pi 3
>   ARM: bcm2835: Add devicetree for the Raspberry Pi 3.
>   Merge tag 'bcm2835-dt-ethernet' into HEAD
> 
> Gerd Hoffmann (1):
>   ARM: bcm2837: dt: Add the ethernet to the device trees
> 
> Lubomir Rintel (1):
>   ARM: bcm2835: dt: Add the ethernet to the device trees

Merged, thanks Eric!
-- 
Florian


Re: [GIT PULL 1/3] bcm2835-dt-next-2016-07-03

2016-07-05 Thread Florian Fainelli
Le 03/07/2016 17:45, Eric Anholt a écrit :
>   Linux 4.7-rc1 (2016-05-29 09:29:24 -0700)
> 
> are available in the git repository at:
> 
>   https://github.com/anholt/linux tags/bcm2835-dt-next-2016-07-03
> 
> for you to fetch changes up to 6a93792774fc72861b7e8efaa3545a88272b4413:
> 
>   ARM: bcm2835: dt: Add the ethernet to the device trees (2016-05-31 10:32:34 
> -0700)
> 
> 
> This pull request brings in the change to describe the ethernet in the
> DT so that the firmware can tell us its MAC address.
> 
> 
> Lubomir Rintel (1):
>   ARM: bcm2835: dt: Add the ethernet to the device trees

Merged, thanks!
-- 
Florian


Re: [GIT PULL 1/3] bcm2835-dt-next-2016-07-03

2016-07-05 Thread Florian Fainelli
Le 03/07/2016 17:45, Eric Anholt a écrit :
>   Linux 4.7-rc1 (2016-05-29 09:29:24 -0700)
> 
> are available in the git repository at:
> 
>   https://github.com/anholt/linux tags/bcm2835-dt-next-2016-07-03
> 
> for you to fetch changes up to 6a93792774fc72861b7e8efaa3545a88272b4413:
> 
>   ARM: bcm2835: dt: Add the ethernet to the device trees (2016-05-31 10:32:34 
> -0700)
> 
> 
> This pull request brings in the change to describe the ethernet in the
> DT so that the firmware can tell us its MAC address.
> 
> 
> Lubomir Rintel (1):
>   ARM: bcm2835: dt: Add the ethernet to the device trees

Merged, thanks!
-- 
Florian


[PATCH v3] fs/dcache.c: avoid soft-lockup in dput()

2016-07-05 Thread Wei Fang
We triggered soft-lockup under stress test which
open/access/write/close one file concurrently on more than
five different CPUs:

WARN: soft lockup - CPU#0 stuck for 11s! [who:30631]
...
[] dput+0x100/0x298
[] terminate_walk+0x4c/0x60
[] path_lookupat+0x5cc/0x7a8
[] filename_lookup+0x38/0xf0
[] user_path_at_empty+0x78/0xd0
[] user_path_at+0x1c/0x28
[] SyS_faccessat+0xb4/0x230

->d_lock trylock may failed many times because of concurrently
operations, and dput() may execute a long time.

Fix this by replacing cpu_relax() with cond_resched().
dput() used to be sleepable, so make it sleepable again
should be safe.

Cc: 
Signed-off-by: Wei Fang 
---
Changes v1->v2:
- add might_sleep() to annotate that dput() can sleep
Changes v2->v3:
- put cond_resched() in dput()

 fs/dcache.c |7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index ad4a542..77f6687 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -589,7 +589,6 @@ static struct dentry *dentry_kill(struct dentry *dentry)
 
 failed:
spin_unlock(>d_lock);
-   cpu_relax();
return dentry; /* try again with same dentry */
 }
 
@@ -763,6 +762,8 @@ void dput(struct dentry *dentry)
return;
 
 repeat:
+   might_sleep();
+
rcu_read_lock();
if (likely(fast_dput(dentry))) {
rcu_read_unlock();
@@ -796,8 +797,10 @@ repeat:
 
 kill_it:
dentry = dentry_kill(dentry);
-   if (dentry)
+   if (dentry) {
+   cond_resched();
goto repeat;
+   }
 }
 EXPORT_SYMBOL(dput);
 
-- 
1.7.1



[PATCH v3] fs/dcache.c: avoid soft-lockup in dput()

2016-07-05 Thread Wei Fang
We triggered soft-lockup under stress test which
open/access/write/close one file concurrently on more than
five different CPUs:

WARN: soft lockup - CPU#0 stuck for 11s! [who:30631]
...
[] dput+0x100/0x298
[] terminate_walk+0x4c/0x60
[] path_lookupat+0x5cc/0x7a8
[] filename_lookup+0x38/0xf0
[] user_path_at_empty+0x78/0xd0
[] user_path_at+0x1c/0x28
[] SyS_faccessat+0xb4/0x230

->d_lock trylock may failed many times because of concurrently
operations, and dput() may execute a long time.

Fix this by replacing cpu_relax() with cond_resched().
dput() used to be sleepable, so make it sleepable again
should be safe.

Cc: 
Signed-off-by: Wei Fang 
---
Changes v1->v2:
- add might_sleep() to annotate that dput() can sleep
Changes v2->v3:
- put cond_resched() in dput()

 fs/dcache.c |7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index ad4a542..77f6687 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -589,7 +589,6 @@ static struct dentry *dentry_kill(struct dentry *dentry)
 
 failed:
spin_unlock(>d_lock);
-   cpu_relax();
return dentry; /* try again with same dentry */
 }
 
@@ -763,6 +762,8 @@ void dput(struct dentry *dentry)
return;
 
 repeat:
+   might_sleep();
+
rcu_read_lock();
if (likely(fast_dput(dentry))) {
rcu_read_unlock();
@@ -796,8 +797,10 @@ repeat:
 
 kill_it:
dentry = dentry_kill(dentry);
-   if (dentry)
+   if (dentry) {
+   cond_resched();
goto repeat;
+   }
 }
 EXPORT_SYMBOL(dput);
 
-- 
1.7.1



Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.

2016-07-05 Thread Al Viro
On Tue, Jul 05, 2016 at 08:29:37PM -0400, Oleg Drokin wrote:
> > +   /* Otherwise we just unhash it to be rehashed afresh via
> > +* lookup if necessary
> > +*/
> > +   d_drop(dentry);
> 
> So we can even drop this part and retain the top condition as it was.
> d_add does not care if the dentry we are feeding it was hashed or not,
> so do you see any downsides to doing that I wonder?

d_add() on hashed dentry will end up reaching this:
static void __d_rehash(struct dentry * entry, struct hlist_bl_head *b)
{
BUG_ON(!d_unhashed(entry));



Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.

2016-07-05 Thread Al Viro
On Tue, Jul 05, 2016 at 08:29:37PM -0400, Oleg Drokin wrote:
> > +   /* Otherwise we just unhash it to be rehashed afresh via
> > +* lookup if necessary
> > +*/
> > +   d_drop(dentry);
> 
> So we can even drop this part and retain the top condition as it was.
> d_add does not care if the dentry we are feeding it was hashed or not,
> so do you see any downsides to doing that I wonder?

d_add() on hashed dentry will end up reaching this:
static void __d_rehash(struct dentry * entry, struct hlist_bl_head *b)
{
BUG_ON(!d_unhashed(entry));



Re: [RESEND PATCH v5 1/5] mfd: RK808: Add RK818 support

2016-07-05 Thread Andy Yan

Hi Wadim:

On 2016年06月09日 16:23, Wadim Egorov wrote:

Hi,

On 08.06.2016 16:17, Lee Jones wrote:

On Thu, 02 Jun 2016, Wadim Egorov wrote:


The RK818 chip is a power management IC for multimedia and handheld

"Power Management IC (PMIC)"


devices. It contains the following components:

- Regulators
- RTC
- Clkout

Clocking


- battery support

Battery support


Both chips RK808 and RK818 are using a similar register map.

"Both RK808 ad RK818 chips"


So we can reuse the RTC and Clkout functionality.

Swap '.' for ','.


Signed-off-by: Wadim Egorov 
---
  drivers/mfd/Kconfig   |   4 +-
  drivers/mfd/rk808.c   | 231 ++
  include/linux/mfd/rk808.h | 162 ++--
  3 files changed, 350 insertions(+), 47 deletions(-)

diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig
index 1bcf601..7ba464b 100644
--- a/drivers/mfd/Kconfig
+++ b/drivers/mfd/Kconfig
@@ -839,13 +839,13 @@ config MFD_RC5T583
  different functionality of the device.
  
  config MFD_RK808

-   tristate "Rockchip RK808 Power Management chip"
+   tristate "Rockchip RK808/RK818 Power Management chip"

"Chip"


depends on I2C && OF
select MFD_CORE
select REGMAP_I2C
select REGMAP_IRQ
help
- If you say yes here you get support for the RK808
+ If you say yes here you get support for the RK808 and RK818
  Power Management chips.
  This driver provides common support for accessing the device
  through I2C interface. The device supports multiple sub-devices
diff --git a/drivers/mfd/rk808.c b/drivers/mfd/rk808.c
index 49d7f62..3cf9724 100644
--- a/drivers/mfd/rk808.c
+++ b/drivers/mfd/rk808.c
@@ -1,11 +1,15 @@
  /*
- * MFD core driver for Rockchip RK808
+ * MFD core driver for Rockchip RK808/RK818
   *
   * Copyright (c) 2014, Fuzhou Rockchip Electronics Co., Ltd
   *
   * Author: Chris Zhong 
   * Author: Zhang Qing 
   *
+ * Copyright (C) 2016 PHYTEC Messtechnik GmbH
+ *
+ * Author: Wadim Egorov 
+ *
   * This program is free software; you can redistribute it and/or modify it
   * under the terms and conditions of the GNU General Public License,
   * version 2, as published by the Free Software Foundation.
@@ -22,12 +26,7 @@
  #include 
  #include 
  #include 
-
-struct rk808_reg_data {
-   int addr;
-   int mask;
-   int value;
-};

Why are you moving this to the header?

It is now part of the rk808 struct.


+#include 
  
  static bool rk808_is_volatile_reg(struct device *dev, unsigned int reg)

  {
@@ -57,6 +56,14 @@ static bool rk808_is_volatile_reg(struct device *dev, 
unsigned int reg)
return false;
  }
  
+static const struct regmap_config rk818_regmap_config = {

+   .reg_bits = 8,
+   .val_bits = 8,
+   .max_register = RK818_USB_CTRL_REG,
+   .cache_type = REGCACHE_RBTREE,
+   .volatile_reg = rk808_is_volatile_reg,
+};
+
  static const struct regmap_config rk808_regmap_config = {
.reg_bits = 8,
.val_bits = 8,
@@ -83,7 +90,17 @@ static const struct mfd_cell rk808s[] = {
},
  };
  
-static const struct rk808_reg_data pre_init_reg[] = {

+static const struct mfd_cell rk818s[] = {
+   { .name = "rk808-clkout", },

How does this differ to a normal -clock driver?

I don't know. It is a normal clock driver.


+   { .name = "rk808-regulator", },
+   {
+   .name = "rk808-rtc",
+   .num_resources = ARRAY_SIZE(rtc_resources),
+   .resources = _resources[0],

.resources = rtc_resources,  ?


+   },
+};
+
+static const struct rk8xx_reg_data rk808_pre_init_reg[] = {
{ RK808_BUCK3_CONFIG_REG, BUCK_ILMIN_MASK,  BUCK_ILMIN_150MA },
{ RK808_BUCK4_CONFIG_REG, BUCK_ILMIN_MASK,  BUCK_ILMIN_200MA },
{ RK808_BOOST_CONFIG_REG, BOOST_ILMIN_MASK, BOOST_ILMIN_100MA },
@@ -94,6 +111,22 @@ static const struct rk808_reg_data pre_init_reg[] = {
VB_LO_SEL_3500MV },
  };
  
+static const struct rk8xx_reg_data rk818_pre_init_reg[] = {

+   { RK818_USB_CTRL_REG,   RK818_USB_ILIM_SEL_MASK,
+   RK818_USB_ILMIN_2000MA },
+   /* close charger when usb lower then 3.4V */
+   { RK818_USB_CTRL_REG,   RK818_USB_CHG_SD_VSEL_MASK, (0x7 << 4) },
+   /* no action when vref */
+   { RK818_H5V_EN_REG, BIT(1), RK818_REF_RDY_CTRL },
+   /* enable HDMI 5V */
+   { RK818_H5V_EN_REG, BIT(0), RK818_H5V_EN },
+   /* improve efficiency */
+   { RK818_BUCK2_CONFIG_REG, BUCK2_RATE_MASK,  BUCK_ILMIN_250MA },
+   { RK818_BUCK4_CONFIG_REG, BUCK_ILMIN_MASK,  BUCK_ILMIN_250MA },
+   { RK818_BOOST_CONFIG_REG, BOOST_ILMIN_MASK, BOOST_ILMIN_100MA },
+   { RK808_VB_MON_REG, MASK_ALL,   VB_LO_ACT | VB_LO_SEL_3500MV },
+};

The 

Re: [RESEND PATCH v5 1/5] mfd: RK808: Add RK818 support

2016-07-05 Thread Andy Yan

Hi Wadim:

On 2016年06月09日 16:23, Wadim Egorov wrote:

Hi,

On 08.06.2016 16:17, Lee Jones wrote:

On Thu, 02 Jun 2016, Wadim Egorov wrote:


The RK818 chip is a power management IC for multimedia and handheld

"Power Management IC (PMIC)"


devices. It contains the following components:

- Regulators
- RTC
- Clkout

Clocking


- battery support

Battery support


Both chips RK808 and RK818 are using a similar register map.

"Both RK808 ad RK818 chips"


So we can reuse the RTC and Clkout functionality.

Swap '.' for ','.


Signed-off-by: Wadim Egorov 
---
  drivers/mfd/Kconfig   |   4 +-
  drivers/mfd/rk808.c   | 231 ++
  include/linux/mfd/rk808.h | 162 ++--
  3 files changed, 350 insertions(+), 47 deletions(-)

diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig
index 1bcf601..7ba464b 100644
--- a/drivers/mfd/Kconfig
+++ b/drivers/mfd/Kconfig
@@ -839,13 +839,13 @@ config MFD_RC5T583
  different functionality of the device.
  
  config MFD_RK808

-   tristate "Rockchip RK808 Power Management chip"
+   tristate "Rockchip RK808/RK818 Power Management chip"

"Chip"


depends on I2C && OF
select MFD_CORE
select REGMAP_I2C
select REGMAP_IRQ
help
- If you say yes here you get support for the RK808
+ If you say yes here you get support for the RK808 and RK818
  Power Management chips.
  This driver provides common support for accessing the device
  through I2C interface. The device supports multiple sub-devices
diff --git a/drivers/mfd/rk808.c b/drivers/mfd/rk808.c
index 49d7f62..3cf9724 100644
--- a/drivers/mfd/rk808.c
+++ b/drivers/mfd/rk808.c
@@ -1,11 +1,15 @@
  /*
- * MFD core driver for Rockchip RK808
+ * MFD core driver for Rockchip RK808/RK818
   *
   * Copyright (c) 2014, Fuzhou Rockchip Electronics Co., Ltd
   *
   * Author: Chris Zhong 
   * Author: Zhang Qing 
   *
+ * Copyright (C) 2016 PHYTEC Messtechnik GmbH
+ *
+ * Author: Wadim Egorov 
+ *
   * This program is free software; you can redistribute it and/or modify it
   * under the terms and conditions of the GNU General Public License,
   * version 2, as published by the Free Software Foundation.
@@ -22,12 +26,7 @@
  #include 
  #include 
  #include 
-
-struct rk808_reg_data {
-   int addr;
-   int mask;
-   int value;
-};

Why are you moving this to the header?

It is now part of the rk808 struct.


+#include 
  
  static bool rk808_is_volatile_reg(struct device *dev, unsigned int reg)

  {
@@ -57,6 +56,14 @@ static bool rk808_is_volatile_reg(struct device *dev, 
unsigned int reg)
return false;
  }
  
+static const struct regmap_config rk818_regmap_config = {

+   .reg_bits = 8,
+   .val_bits = 8,
+   .max_register = RK818_USB_CTRL_REG,
+   .cache_type = REGCACHE_RBTREE,
+   .volatile_reg = rk808_is_volatile_reg,
+};
+
  static const struct regmap_config rk808_regmap_config = {
.reg_bits = 8,
.val_bits = 8,
@@ -83,7 +90,17 @@ static const struct mfd_cell rk808s[] = {
},
  };
  
-static const struct rk808_reg_data pre_init_reg[] = {

+static const struct mfd_cell rk818s[] = {
+   { .name = "rk808-clkout", },

How does this differ to a normal -clock driver?

I don't know. It is a normal clock driver.


+   { .name = "rk808-regulator", },
+   {
+   .name = "rk808-rtc",
+   .num_resources = ARRAY_SIZE(rtc_resources),
+   .resources = _resources[0],

.resources = rtc_resources,  ?


+   },
+};
+
+static const struct rk8xx_reg_data rk808_pre_init_reg[] = {
{ RK808_BUCK3_CONFIG_REG, BUCK_ILMIN_MASK,  BUCK_ILMIN_150MA },
{ RK808_BUCK4_CONFIG_REG, BUCK_ILMIN_MASK,  BUCK_ILMIN_200MA },
{ RK808_BOOST_CONFIG_REG, BOOST_ILMIN_MASK, BOOST_ILMIN_100MA },
@@ -94,6 +111,22 @@ static const struct rk808_reg_data pre_init_reg[] = {
VB_LO_SEL_3500MV },
  };
  
+static const struct rk8xx_reg_data rk818_pre_init_reg[] = {

+   { RK818_USB_CTRL_REG,   RK818_USB_ILIM_SEL_MASK,
+   RK818_USB_ILMIN_2000MA },
+   /* close charger when usb lower then 3.4V */
+   { RK818_USB_CTRL_REG,   RK818_USB_CHG_SD_VSEL_MASK, (0x7 << 4) },
+   /* no action when vref */
+   { RK818_H5V_EN_REG, BIT(1), RK818_REF_RDY_CTRL },
+   /* enable HDMI 5V */
+   { RK818_H5V_EN_REG, BIT(0), RK818_H5V_EN },
+   /* improve efficiency */
+   { RK818_BUCK2_CONFIG_REG, BUCK2_RATE_MASK,  BUCK_ILMIN_250MA },
+   { RK818_BUCK4_CONFIG_REG, BUCK_ILMIN_MASK,  BUCK_ILMIN_250MA },
+   { RK818_BOOST_CONFIG_REG, BOOST_ILMIN_MASK, BOOST_ILMIN_100MA },
+   { RK808_VB_MON_REG, MASK_ALL,   VB_LO_ACT | VB_LO_SEL_3500MV },
+};

The alignment here looks odd.

I will fix it in the next version.


  static const struct 

Re: [PATCH] drm/radeon: Remove deprecated create_singlethread_workqueue

2016-07-05 Thread Michel Dänzer
On 06.07.2016 06:06, Tejun Heo wrote:
> On Mon, Jul 04, 2016 at 12:58:32PM +0900, Michel Dänzer wrote:
>> On 02.07.2016 22:46, Tejun Heo wrote:
>>> On Sat, Jul 02, 2016 at 04:33:50PM +0530, Bhaktipriya Shridhar wrote:
 alloc_workqueue replaces deprecated create_singlethread_workqueue().

 A dedicated workqueue has been used since work items need to be flushed
 as a group rather than individually.
> 
> There seem to be two work items involved, the flip one and unpin one.
> Provided that there's no ordering requirement between the two, can't
> we just flush them individually?

There is an ordering requirement between the two queues, but it's
enforced by the driver (by only queuing the unpin work once a flip has
completed, which only happens after the corresponding flip work has run).


Not being very familiar with the workqueue APIs, I'll describe how it's
supposed to work from a driver POV, which will hopefully help you guys
decide on the most appropriate alloc_workqueue parameters.

There is one flip work queue for each hardware CRTC. At most one
radeon_flip_work_func item can be queued for any of them at any time.
When a radeon_flip_work_func item is queued, it should be executed ASAP
(so WQ_HIGHPRI might be appropriate?).

In contrast, the radeon_unpin_work_func items aren't particularly
urgent, and it's okay for several of them to be queued up. So I guess it
would actually make sense to use a different workqueue for them, maybe
even the default one?


 Since there are only a fixed number of work items, explicit concurrency
 limit is unnecessary here.
>>>
>>> What are the involved work items?
>>
>> drivers/gpu/drm/radeon/radeon_display.c:radeon_flip_work_func()
>>
>>> Is it safe to run them concurrently?
>>
>> Concurrently with what exactly?
> 
> The flip and unpin work items.

Yes, it's safe to run those concurrently.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [PATCH] drm/radeon: Remove deprecated create_singlethread_workqueue

2016-07-05 Thread Michel Dänzer
On 06.07.2016 06:06, Tejun Heo wrote:
> On Mon, Jul 04, 2016 at 12:58:32PM +0900, Michel Dänzer wrote:
>> On 02.07.2016 22:46, Tejun Heo wrote:
>>> On Sat, Jul 02, 2016 at 04:33:50PM +0530, Bhaktipriya Shridhar wrote:
 alloc_workqueue replaces deprecated create_singlethread_workqueue().

 A dedicated workqueue has been used since work items need to be flushed
 as a group rather than individually.
> 
> There seem to be two work items involved, the flip one and unpin one.
> Provided that there's no ordering requirement between the two, can't
> we just flush them individually?

There is an ordering requirement between the two queues, but it's
enforced by the driver (by only queuing the unpin work once a flip has
completed, which only happens after the corresponding flip work has run).


Not being very familiar with the workqueue APIs, I'll describe how it's
supposed to work from a driver POV, which will hopefully help you guys
decide on the most appropriate alloc_workqueue parameters.

There is one flip work queue for each hardware CRTC. At most one
radeon_flip_work_func item can be queued for any of them at any time.
When a radeon_flip_work_func item is queued, it should be executed ASAP
(so WQ_HIGHPRI might be appropriate?).

In contrast, the radeon_unpin_work_func items aren't particularly
urgent, and it's okay for several of them to be queued up. So I guess it
would actually make sense to use a different workqueue for them, maybe
even the default one?


 Since there are only a fixed number of work items, explicit concurrency
 limit is unnecessary here.
>>>
>>> What are the involved work items?
>>
>> drivers/gpu/drm/radeon/radeon_display.c:radeon_flip_work_func()
>>
>>> Is it safe to run them concurrently?
>>
>> Concurrently with what exactly?
> 
> The flip and unpin work items.

Yes, it's safe to run those concurrently.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-05 Thread Neo Jia
On Wed, Jul 06, 2016 at 10:35:18AM +0800, Xiao Guangrong wrote:
> 
> 
> On 07/06/2016 10:18 AM, Neo Jia wrote:
> >On Wed, Jul 06, 2016 at 10:00:46AM +0800, Xiao Guangrong wrote:
> >>
> >>
> >>On 07/05/2016 08:18 PM, Paolo Bonzini wrote:
> >>>
> >>>
> >>>On 05/07/2016 07:41, Neo Jia wrote:
> On Thu, Jun 30, 2016 at 03:01:49PM +0200, Paolo Bonzini wrote:
> >The vGPU folks would like to trap the first access to a BAR by setting
> >vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault handler
> >then can use remap_pfn_range to place some non-reserved pages in the VMA.
> >
> >KVM lacks support for this kind of non-linear VM_PFNMAP mapping, and 
> >these
> >patches should fix this.
> 
> Hi Paolo,
> 
> I have tested your patches with the mediated passthru patchset that is 
> being
> reviewed in KVM and QEMU mailing list.
> 
> The fault handler gets called successfully and the previously mapped 
> memory gets
> unmmaped correctly via unmap_mapping_range.
> >>>
> >>>Great, then I'll include them in 4.8.
> >>
> >>Code is okay, but i still suspect if this implementation, fetch mmio pages 
> >>in fault
> >>handler, is needed. We'd better include these patches after the design of 
> >>vfio
> >>framework is decided.
> >
> >Hi Guangrong,
> >
> >I disagree. The design of VFIO framework has been actively discussed in the 
> >KVM
> >and QEMU mailing for a while and the fault handler is agreed upon to provide 
> >the
> >flexibility for different driver vendors' implementation. With that said, I 
> >am
> >still open to discuss with you and anybody else about this framework as the 
> >goal
> >is to allow multiple vendor to plugin into this framework to support their
> >mediated device virtualization scheme, such as Intel, IBM and us.
> 
> The discussion is still going on. And current vfio patchset we reviewed is 
> still
> problematic.

My point is the fault handler part has been discussed already, with that said I
am always open to any constructive suggestions to make things better and
maintainable. (Appreciate your code review on the VFIO thread, I think we still
own you another response, will do that.)

> 
> >
> >May I ask you what the exact issue you have with this interface for Intel to 
> >support
> >your own GPU virtualization?
> 
> Intel's vGPU can work with this framework. We really appreciate your / 
> nvidia's
> contribution.

Then, I don't think we should embargo Paolo's patch.

> 
> i didn’t mean to offend you, i just want to make sure if this complexity is 
> really
> needed and inspect if this framework is safe enough and think it over if we 
> have
> a better implementation.

Not at all. :-)

Suggestions are always welcome, I just want to know the exact issues you have
with the code so I can have a better response to address that with proper 
information.

Thanks,
Neo



Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-05 Thread Neo Jia
On Wed, Jul 06, 2016 at 10:35:18AM +0800, Xiao Guangrong wrote:
> 
> 
> On 07/06/2016 10:18 AM, Neo Jia wrote:
> >On Wed, Jul 06, 2016 at 10:00:46AM +0800, Xiao Guangrong wrote:
> >>
> >>
> >>On 07/05/2016 08:18 PM, Paolo Bonzini wrote:
> >>>
> >>>
> >>>On 05/07/2016 07:41, Neo Jia wrote:
> On Thu, Jun 30, 2016 at 03:01:49PM +0200, Paolo Bonzini wrote:
> >The vGPU folks would like to trap the first access to a BAR by setting
> >vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault handler
> >then can use remap_pfn_range to place some non-reserved pages in the VMA.
> >
> >KVM lacks support for this kind of non-linear VM_PFNMAP mapping, and 
> >these
> >patches should fix this.
> 
> Hi Paolo,
> 
> I have tested your patches with the mediated passthru patchset that is 
> being
> reviewed in KVM and QEMU mailing list.
> 
> The fault handler gets called successfully and the previously mapped 
> memory gets
> unmmaped correctly via unmap_mapping_range.
> >>>
> >>>Great, then I'll include them in 4.8.
> >>
> >>Code is okay, but i still suspect if this implementation, fetch mmio pages 
> >>in fault
> >>handler, is needed. We'd better include these patches after the design of 
> >>vfio
> >>framework is decided.
> >
> >Hi Guangrong,
> >
> >I disagree. The design of VFIO framework has been actively discussed in the 
> >KVM
> >and QEMU mailing for a while and the fault handler is agreed upon to provide 
> >the
> >flexibility for different driver vendors' implementation. With that said, I 
> >am
> >still open to discuss with you and anybody else about this framework as the 
> >goal
> >is to allow multiple vendor to plugin into this framework to support their
> >mediated device virtualization scheme, such as Intel, IBM and us.
> 
> The discussion is still going on. And current vfio patchset we reviewed is 
> still
> problematic.

My point is the fault handler part has been discussed already, with that said I
am always open to any constructive suggestions to make things better and
maintainable. (Appreciate your code review on the VFIO thread, I think we still
own you another response, will do that.)

> 
> >
> >May I ask you what the exact issue you have with this interface for Intel to 
> >support
> >your own GPU virtualization?
> 
> Intel's vGPU can work with this framework. We really appreciate your / 
> nvidia's
> contribution.

Then, I don't think we should embargo Paolo's patch.

> 
> i didn’t mean to offend you, i just want to make sure if this complexity is 
> really
> needed and inspect if this framework is safe enough and think it over if we 
> have
> a better implementation.

Not at all. :-)

Suggestions are always welcome, I just want to know the exact issues you have
with the code so I can have a better response to address that with proper 
information.

Thanks,
Neo



[PATCH 0/2] i2c-dev: Don't let userspace block adapter

2016-07-05 Thread Viresh Kumar
Hi Wolfram/Jean,

I am part of the kernel team for Google's projectara [1], where we are
building a module smart phone.

This series tries to fix one of the problems we hit on our system as we
are required to hotplug pretty much every thing on the phone and so this
fixes hotplug issues with i2c-dev.

As described in the second patch, the current implementation of i2c-dev
file operations doesn't let the modules (hardware attached to the phone)
eject from the phone as the cleanup path for the module hasn't finished
yet (i2c adapter not removed).

We can't let the userspace block the kernel devices forever in such
cases.

I was able to test them on the ARA phone with kernel 3.10 only and not
mainline.

--
viresh

[1] https://atap.google.com/ara/

Viresh Kumar (2):
  i2c-dev: don't get i2c adapter via i2c_dev
  i2c-dev: Don't block the adapter from unregistering

 drivers/i2c/i2c-dev.c | 79 ++-
 include/linux/i2c.h   |  1 +
 2 files changed, 67 insertions(+), 13 deletions(-)

-- 
2.7.4



[PATCH 0/2] i2c-dev: Don't let userspace block adapter

2016-07-05 Thread Viresh Kumar
Hi Wolfram/Jean,

I am part of the kernel team for Google's projectara [1], where we are
building a module smart phone.

This series tries to fix one of the problems we hit on our system as we
are required to hotplug pretty much every thing on the phone and so this
fixes hotplug issues with i2c-dev.

As described in the second patch, the current implementation of i2c-dev
file operations doesn't let the modules (hardware attached to the phone)
eject from the phone as the cleanup path for the module hasn't finished
yet (i2c adapter not removed).

We can't let the userspace block the kernel devices forever in such
cases.

I was able to test them on the ARA phone with kernel 3.10 only and not
mainline.

--
viresh

[1] https://atap.google.com/ara/

Viresh Kumar (2):
  i2c-dev: don't get i2c adapter via i2c_dev
  i2c-dev: Don't block the adapter from unregistering

 drivers/i2c/i2c-dev.c | 79 ++-
 include/linux/i2c.h   |  1 +
 2 files changed, 67 insertions(+), 13 deletions(-)

-- 
2.7.4



[PATCH 2/2] i2c-dev: Don't block the adapter from unregistering

2016-07-05 Thread Viresh Kumar
The i2c-dev calls i2c_get_adapter() from the .open() callback, which
doesn't let the adapter device unregister unless the .close() callback
is called.

On some platforms (like Google ARA), this doesn't let the modules
(hardware attached to the phone) eject from the phone as the cleanup
path for the module hasn't finished yet (i2c adapter not removed).

We can't let the userspace block the kernel forever in such cases.

Fix this by calling i2c_get_adapter() from all other file operations,
i.e.  read/write/ioctl, to make sure the adapter doesn't get away while
we are in the middle of a operation, but not otherwise. In .open() we
will release the adapter device before returning and so if there is no
data transfer in progress, then the i2c-dev doesn't block the adapter
from unregistering.

Signed-off-by: Viresh Kumar 
---
 drivers/i2c/i2c-dev.c | 72 ++-
 include/linux/i2c.h   |  1 +
 2 files changed, 66 insertions(+), 7 deletions(-)

diff --git a/drivers/i2c/i2c-dev.c b/drivers/i2c/i2c-dev.c
index 66f323fd3982..b2562603daa9 100644
--- a/drivers/i2c/i2c-dev.c
+++ b/drivers/i2c/i2c-dev.c
@@ -142,13 +142,25 @@ static ssize_t i2cdev_read(struct file *file, char __user 
*buf, size_t count,
int ret;
 
struct i2c_client *client = file->private_data;
+   struct i2c_adapter *adap;
+
+   adap = i2c_get_adapter(client->adapter_nr);
+   if (!adap)
+   return -ENODEV;
+
+   if (adap != client->adapter) {
+   ret = -EINVAL;
+   goto put_adapter;
+   }
 
if (count > 8192)
count = 8192;
 
tmp = kmalloc(count, GFP_KERNEL);
-   if (tmp == NULL)
-   return -ENOMEM;
+   if (tmp == NULL) {
+   ret = -ENOMEM;
+   goto put_adapter;
+   }
 
pr_debug("i2c-dev: i2c-%d reading %zu bytes.\n",
iminor(file_inode(file)), count);
@@ -157,6 +169,9 @@ static ssize_t i2cdev_read(struct file *file, char __user 
*buf, size_t count,
if (ret >= 0)
ret = copy_to_user(buf, tmp, count) ? -EFAULT : ret;
kfree(tmp);
+
+put_adapter:
+   i2c_put_adapter(adap);
return ret;
 }
 
@@ -166,19 +181,34 @@ static ssize_t i2cdev_write(struct file *file, const char 
__user *buf,
int ret;
char *tmp;
struct i2c_client *client = file->private_data;
+   struct i2c_adapter *adap;
+
+   adap = i2c_get_adapter(client->adapter_nr);
+   if (!adap)
+   return -ENODEV;
+
+   if (adap != client->adapter) {
+   ret = -EINVAL;
+   goto put_adapter;
+   }
 
if (count > 8192)
count = 8192;
 
tmp = memdup_user(buf, count);
-   if (IS_ERR(tmp))
-   return PTR_ERR(tmp);
+   if (IS_ERR(tmp)) {
+   ret = PTR_ERR(tmp);
+   goto put_adapter;
+   }
 
pr_debug("i2c-dev: i2c-%d writing %zu bytes.\n",
iminor(file_inode(file)), count);
 
ret = i2c_master_send(client, tmp, count);
kfree(tmp);
+
+put_adapter:
+   i2c_put_adapter(adap);
return ret;
 }
 
@@ -412,9 +442,9 @@ static noinline int i2cdev_ioctl_smbus(struct i2c_client 
*client,
return res;
 }
 
-static long i2cdev_ioctl(struct file *file, unsigned int cmd, unsigned long 
arg)
+static long __i2cdev_ioctl(struct i2c_client *client, unsigned int cmd,
+  unsigned long arg)
 {
-   struct i2c_client *client = file->private_data;
unsigned long funcs;
 
dev_dbg(>adapter->dev, "ioctl, cmd=0x%02x, arg=0x%02lx\n",
@@ -480,6 +510,28 @@ static long i2cdev_ioctl(struct file *file, unsigned int 
cmd, unsigned long arg)
return 0;
 }
 
+static long i2cdev_ioctl(struct file *file, unsigned int cmd, unsigned long 
arg)
+{
+   struct i2c_client *client = file->private_data;
+   struct i2c_adapter *adap;
+   unsigned long ret;
+
+   adap = i2c_get_adapter(client->adapter_nr);
+   if (!adap)
+   return -ENODEV;
+
+   if (adap != client->adapter) {
+   ret = -EINVAL;
+   goto put_adapter;
+   }
+
+   ret = __i2cdev_ioctl(client, cmd, arg);
+
+put_adapter:
+   i2c_put_adapter(adap);
+   return ret;
+}
+
 static int i2cdev_open(struct inode *inode, struct file *file)
 {
unsigned int minor = iminor(inode);
@@ -504,9 +556,16 @@ static int i2cdev_open(struct inode *inode, struct file 
*file)
}
snprintf(client->name, I2C_NAME_SIZE, "i2c-dev %d", adap->nr);
 
+   client->adapter_nr = minor;
client->adapter = adap;
file->private_data = client;
 
+   /*
+* Allow the adapter to unregister while userspace has opened the i2c
+* device.
+*/
+   i2c_put_adapter(client->adapter);
+
return 0;
 }
 
@@ -514,7 +573,6 @@ static int i2cdev_release(struct inode 

[PATCH 1/2] i2c-dev: don't get i2c adapter via i2c_dev

2016-07-05 Thread Viresh Kumar
There is no code protecting i2c_dev to be freed after it is returned
from i2c_dev_get_by_minor() and using it to access the value which we
already have (minor) isn't safe really.

Avoid using it and get the adapter directly from 'minor'.

Signed-off-by: Viresh Kumar 
---
 drivers/i2c/i2c-dev.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/i2c/i2c-dev.c b/drivers/i2c/i2c-dev.c
index 6ecfd76270f2..66f323fd3982 100644
--- a/drivers/i2c/i2c-dev.c
+++ b/drivers/i2c/i2c-dev.c
@@ -485,13 +485,8 @@ static int i2cdev_open(struct inode *inode, struct file 
*file)
unsigned int minor = iminor(inode);
struct i2c_client *client;
struct i2c_adapter *adap;
-   struct i2c_dev *i2c_dev;
-
-   i2c_dev = i2c_dev_get_by_minor(minor);
-   if (!i2c_dev)
-   return -ENODEV;
 
-   adap = i2c_get_adapter(i2c_dev->adap->nr);
+   adap = i2c_get_adapter(minor);
if (!adap)
return -ENODEV;
 
-- 
2.7.4



[PATCH 2/2] i2c-dev: Don't block the adapter from unregistering

2016-07-05 Thread Viresh Kumar
The i2c-dev calls i2c_get_adapter() from the .open() callback, which
doesn't let the adapter device unregister unless the .close() callback
is called.

On some platforms (like Google ARA), this doesn't let the modules
(hardware attached to the phone) eject from the phone as the cleanup
path for the module hasn't finished yet (i2c adapter not removed).

We can't let the userspace block the kernel forever in such cases.

Fix this by calling i2c_get_adapter() from all other file operations,
i.e.  read/write/ioctl, to make sure the adapter doesn't get away while
we are in the middle of a operation, but not otherwise. In .open() we
will release the adapter device before returning and so if there is no
data transfer in progress, then the i2c-dev doesn't block the adapter
from unregistering.

Signed-off-by: Viresh Kumar 
---
 drivers/i2c/i2c-dev.c | 72 ++-
 include/linux/i2c.h   |  1 +
 2 files changed, 66 insertions(+), 7 deletions(-)

diff --git a/drivers/i2c/i2c-dev.c b/drivers/i2c/i2c-dev.c
index 66f323fd3982..b2562603daa9 100644
--- a/drivers/i2c/i2c-dev.c
+++ b/drivers/i2c/i2c-dev.c
@@ -142,13 +142,25 @@ static ssize_t i2cdev_read(struct file *file, char __user 
*buf, size_t count,
int ret;
 
struct i2c_client *client = file->private_data;
+   struct i2c_adapter *adap;
+
+   adap = i2c_get_adapter(client->adapter_nr);
+   if (!adap)
+   return -ENODEV;
+
+   if (adap != client->adapter) {
+   ret = -EINVAL;
+   goto put_adapter;
+   }
 
if (count > 8192)
count = 8192;
 
tmp = kmalloc(count, GFP_KERNEL);
-   if (tmp == NULL)
-   return -ENOMEM;
+   if (tmp == NULL) {
+   ret = -ENOMEM;
+   goto put_adapter;
+   }
 
pr_debug("i2c-dev: i2c-%d reading %zu bytes.\n",
iminor(file_inode(file)), count);
@@ -157,6 +169,9 @@ static ssize_t i2cdev_read(struct file *file, char __user 
*buf, size_t count,
if (ret >= 0)
ret = copy_to_user(buf, tmp, count) ? -EFAULT : ret;
kfree(tmp);
+
+put_adapter:
+   i2c_put_adapter(adap);
return ret;
 }
 
@@ -166,19 +181,34 @@ static ssize_t i2cdev_write(struct file *file, const char 
__user *buf,
int ret;
char *tmp;
struct i2c_client *client = file->private_data;
+   struct i2c_adapter *adap;
+
+   adap = i2c_get_adapter(client->adapter_nr);
+   if (!adap)
+   return -ENODEV;
+
+   if (adap != client->adapter) {
+   ret = -EINVAL;
+   goto put_adapter;
+   }
 
if (count > 8192)
count = 8192;
 
tmp = memdup_user(buf, count);
-   if (IS_ERR(tmp))
-   return PTR_ERR(tmp);
+   if (IS_ERR(tmp)) {
+   ret = PTR_ERR(tmp);
+   goto put_adapter;
+   }
 
pr_debug("i2c-dev: i2c-%d writing %zu bytes.\n",
iminor(file_inode(file)), count);
 
ret = i2c_master_send(client, tmp, count);
kfree(tmp);
+
+put_adapter:
+   i2c_put_adapter(adap);
return ret;
 }
 
@@ -412,9 +442,9 @@ static noinline int i2cdev_ioctl_smbus(struct i2c_client 
*client,
return res;
 }
 
-static long i2cdev_ioctl(struct file *file, unsigned int cmd, unsigned long 
arg)
+static long __i2cdev_ioctl(struct i2c_client *client, unsigned int cmd,
+  unsigned long arg)
 {
-   struct i2c_client *client = file->private_data;
unsigned long funcs;
 
dev_dbg(>adapter->dev, "ioctl, cmd=0x%02x, arg=0x%02lx\n",
@@ -480,6 +510,28 @@ static long i2cdev_ioctl(struct file *file, unsigned int 
cmd, unsigned long arg)
return 0;
 }
 
+static long i2cdev_ioctl(struct file *file, unsigned int cmd, unsigned long 
arg)
+{
+   struct i2c_client *client = file->private_data;
+   struct i2c_adapter *adap;
+   unsigned long ret;
+
+   adap = i2c_get_adapter(client->adapter_nr);
+   if (!adap)
+   return -ENODEV;
+
+   if (adap != client->adapter) {
+   ret = -EINVAL;
+   goto put_adapter;
+   }
+
+   ret = __i2cdev_ioctl(client, cmd, arg);
+
+put_adapter:
+   i2c_put_adapter(adap);
+   return ret;
+}
+
 static int i2cdev_open(struct inode *inode, struct file *file)
 {
unsigned int minor = iminor(inode);
@@ -504,9 +556,16 @@ static int i2cdev_open(struct inode *inode, struct file 
*file)
}
snprintf(client->name, I2C_NAME_SIZE, "i2c-dev %d", adap->nr);
 
+   client->adapter_nr = minor;
client->adapter = adap;
file->private_data = client;
 
+   /*
+* Allow the adapter to unregister while userspace has opened the i2c
+* device.
+*/
+   i2c_put_adapter(client->adapter);
+
return 0;
 }
 
@@ -514,7 +573,6 @@ static int i2cdev_release(struct inode *inode, struct file 
*file)
 {

[PATCH 1/2] i2c-dev: don't get i2c adapter via i2c_dev

2016-07-05 Thread Viresh Kumar
There is no code protecting i2c_dev to be freed after it is returned
from i2c_dev_get_by_minor() and using it to access the value which we
already have (minor) isn't safe really.

Avoid using it and get the adapter directly from 'minor'.

Signed-off-by: Viresh Kumar 
---
 drivers/i2c/i2c-dev.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/i2c/i2c-dev.c b/drivers/i2c/i2c-dev.c
index 6ecfd76270f2..66f323fd3982 100644
--- a/drivers/i2c/i2c-dev.c
+++ b/drivers/i2c/i2c-dev.c
@@ -485,13 +485,8 @@ static int i2cdev_open(struct inode *inode, struct file 
*file)
unsigned int minor = iminor(inode);
struct i2c_client *client;
struct i2c_adapter *adap;
-   struct i2c_dev *i2c_dev;
-
-   i2c_dev = i2c_dev_get_by_minor(minor);
-   if (!i2c_dev)
-   return -ENODEV;
 
-   adap = i2c_get_adapter(i2c_dev->adap->nr);
+   adap = i2c_get_adapter(minor);
if (!adap)
return -ENODEV;
 
-- 
2.7.4



  1   2   3   4   5   6   7   8   9   10   >