Re: [PATCH RFC V2 00/10] perf top optimization

2017-09-19 Thread Arnaldo Carvalho de Melo
Em Tue, Sep 19, 2017 at 12:39:47PM +, Liang, Kan escreveu:
> > On Mon, Sep 18, 2017 at 10:01:00AM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Mon, Sep 18, 2017 at 10:57:08AM +0200, Jiri Olsa escreveu:
> > > > he proposed solution and it was changed by Arnaldo in here:

> > > >   https://marc.info/?l=linux-kernel=149132267410294=2

> > > > but looks like it never got merged

> > > > could you please add this or similar code before you add the locking
> > > > code/overhead in?

> > > I'm rehashing that patch and adding it on top of what is in my
> > > perf/core branch, will push soon, for now you can take a look at
> > tmp.perf/core.

> > checked the code.. one nit, could we have single threaded by default?
> > only one command is multithreaded atm, it could call perf_set_multihreaded
> > instead of all current related commands call perf_set_singlethreaded

> I agree with single threaded as default setting, also I think we need both
> functions, perf_set_multihreaded and perf_set_singlethreaded.
> Perf tools probably be half single threaded and half multithreaded.
> E.g. the perf top optimization. Only the events synthesize codes are
> multithreaded. So we have to set multithreaded first, then change it
> to single threaded.

Ok, agreed with both of you, i.e. I'll make it single threaded by
default, and provide both functions, this way we get a default that is
what most tools use, and a way to select multithreaded mode for when it
is needed, then going back to single threaded.

- Arnaldo


Re: [PATCH RFC V2 00/10] perf top optimization

2017-09-19 Thread Arnaldo Carvalho de Melo
Em Tue, Sep 19, 2017 at 12:39:47PM +, Liang, Kan escreveu:
> > On Mon, Sep 18, 2017 at 10:01:00AM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Mon, Sep 18, 2017 at 10:57:08AM +0200, Jiri Olsa escreveu:
> > > > he proposed solution and it was changed by Arnaldo in here:

> > > >   https://marc.info/?l=linux-kernel=149132267410294=2

> > > > but looks like it never got merged

> > > > could you please add this or similar code before you add the locking
> > > > code/overhead in?

> > > I'm rehashing that patch and adding it on top of what is in my
> > > perf/core branch, will push soon, for now you can take a look at
> > tmp.perf/core.

> > checked the code.. one nit, could we have single threaded by default?
> > only one command is multithreaded atm, it could call perf_set_multihreaded
> > instead of all current related commands call perf_set_singlethreaded

> I agree with single threaded as default setting, also I think we need both
> functions, perf_set_multihreaded and perf_set_singlethreaded.
> Perf tools probably be half single threaded and half multithreaded.
> E.g. the perf top optimization. Only the events synthesize codes are
> multithreaded. So we have to set multithreaded first, then change it
> to single threaded.

Ok, agreed with both of you, i.e. I'll make it single threaded by
default, and provide both functions, this way we get a default that is
what most tools use, and a way to select multithreaded mode for when it
is needed, then going back to single threaded.

- Arnaldo


RE: [PATCH RFC V2 00/10] perf top optimization

2017-09-19 Thread Liang, Kan
> On Mon, Sep 18, 2017 at 10:01:00AM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Mon, Sep 18, 2017 at 10:57:08AM +0200, Jiri Olsa escreveu:
> > > On Sun, Sep 10, 2017 at 07:23:13PM -0700, kan.li...@intel.com wrote:
> > > > From: Kan Liang 
> > > >
> > > > The patch series intends to fix the severe performance issue in
> > > > Knights Landing/Mill, when monitoring in heavy load system.
> > > > perf top costs a few minutes to show the result, which is
> > > > unacceptable.
> > > > With the patch series applied, the latency will reduces to several
> > > > seconds.
> > > >
> > > > machine__synthesize_threads and perf_top__mmap_read costs most
> of
> > > > the perf top time (> 99%).
> > >
> > > looks like this patchset adds locking into code paths used by other
> > > single threaded tools and that might be bad for them as noted by
> > > Andi in here:
> > >
> > >   https://marc.info/?l=linux-kernel=149031672928989=2
> > >
> > > he proposed solution and it was changed by Arnaldo in here:
> > >
> > >   https://marc.info/?l=linux-kernel=149132267410294=2
> > >
> > > but looks like it never got merged
> > >
> > > could you please add this or similar code before you add the locking
> > > code/overhead in?
> >
> > I'm rehashing that patch and adding it on top of what is in my
> > perf/core branch, will push soon, for now you can take a look at
> tmp.perf/core.
> 
> checked the code.. one nit, could we have single threaded by default?
> only one command is multithreaded atm, it could call perf_set_multihreaded
> instead of all current related commands call perf_set_singlethreaded

I agree with single threaded as default setting, also I think we need both
functions, perf_set_multihreaded and perf_set_singlethreaded.
Perf tools probably be half single threaded and half multithreaded.
E.g. the perf top optimization. Only the events synthesize codes are
multithreaded. So we have to set multithreaded first, then change it
to single threaded.

Thanks,
Kan

> 
> other than that it looks ok
> 
> thanks,
> jirka


RE: [PATCH RFC V2 00/10] perf top optimization

2017-09-19 Thread Liang, Kan
> On Mon, Sep 18, 2017 at 10:01:00AM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Mon, Sep 18, 2017 at 10:57:08AM +0200, Jiri Olsa escreveu:
> > > On Sun, Sep 10, 2017 at 07:23:13PM -0700, kan.li...@intel.com wrote:
> > > > From: Kan Liang 
> > > >
> > > > The patch series intends to fix the severe performance issue in
> > > > Knights Landing/Mill, when monitoring in heavy load system.
> > > > perf top costs a few minutes to show the result, which is
> > > > unacceptable.
> > > > With the patch series applied, the latency will reduces to several
> > > > seconds.
> > > >
> > > > machine__synthesize_threads and perf_top__mmap_read costs most
> of
> > > > the perf top time (> 99%).
> > >
> > > looks like this patchset adds locking into code paths used by other
> > > single threaded tools and that might be bad for them as noted by
> > > Andi in here:
> > >
> > >   https://marc.info/?l=linux-kernel=149031672928989=2
> > >
> > > he proposed solution and it was changed by Arnaldo in here:
> > >
> > >   https://marc.info/?l=linux-kernel=149132267410294=2
> > >
> > > but looks like it never got merged
> > >
> > > could you please add this or similar code before you add the locking
> > > code/overhead in?
> >
> > I'm rehashing that patch and adding it on top of what is in my
> > perf/core branch, will push soon, for now you can take a look at
> tmp.perf/core.
> 
> checked the code.. one nit, could we have single threaded by default?
> only one command is multithreaded atm, it could call perf_set_multihreaded
> instead of all current related commands call perf_set_singlethreaded

I agree with single threaded as default setting, also I think we need both
functions, perf_set_multihreaded and perf_set_singlethreaded.
Perf tools probably be half single threaded and half multithreaded.
E.g. the perf top optimization. Only the events synthesize codes are
multithreaded. So we have to set multithreaded first, then change it
to single threaded.

Thanks,
Kan

> 
> other than that it looks ok
> 
> thanks,
> jirka


Re: [PATCH RFC V2 00/10] perf top optimization

2017-09-19 Thread Jiri Olsa
On Mon, Sep 18, 2017 at 10:01:00AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, Sep 18, 2017 at 10:57:08AM +0200, Jiri Olsa escreveu:
> > On Sun, Sep 10, 2017 at 07:23:13PM -0700, kan.li...@intel.com wrote:
> > > From: Kan Liang 
> > > 
> > > The patch series intends to fix the severe performance issue in
> > > Knights Landing/Mill, when monitoring in heavy load system.
> > > perf top costs a few minutes to show the result, which is
> > > unacceptable.
> > > With the patch series applied, the latency will reduces to
> > > several seconds.
> > > 
> > > machine__synthesize_threads and perf_top__mmap_read costs most of
> > > the perf top time (> 99%).
> > 
> > looks like this patchset adds locking into code paths
> > used by other single threaded tools and that might
> > be bad for them as noted by Andi in here:
> > 
> >   https://marc.info/?l=linux-kernel=149031672928989=2
> > 
> > he proposed solution and it was changed by Arnaldo in here:
> > 
> >   https://marc.info/?l=linux-kernel=149132267410294=2
> > 
> > but looks like it never got merged
> > 
> > could you please add this or similar code before you add the
> > locking code/overhead in?
> 
> I'm rehashing that patch and adding it on top of what is in my perf/core
> branch, will push soon, for now you can take a look at tmp.perf/core.

checked the code.. one nit, could we have single threaded by default?
only one command is multithreaded atm, it could call perf_set_multihreaded
instead of all current related commands call perf_set_singlethreaded

other than that it looks ok

thanks,
jirka


Re: [PATCH RFC V2 00/10] perf top optimization

2017-09-19 Thread Jiri Olsa
On Mon, Sep 18, 2017 at 10:01:00AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, Sep 18, 2017 at 10:57:08AM +0200, Jiri Olsa escreveu:
> > On Sun, Sep 10, 2017 at 07:23:13PM -0700, kan.li...@intel.com wrote:
> > > From: Kan Liang 
> > > 
> > > The patch series intends to fix the severe performance issue in
> > > Knights Landing/Mill, when monitoring in heavy load system.
> > > perf top costs a few minutes to show the result, which is
> > > unacceptable.
> > > With the patch series applied, the latency will reduces to
> > > several seconds.
> > > 
> > > machine__synthesize_threads and perf_top__mmap_read costs most of
> > > the perf top time (> 99%).
> > 
> > looks like this patchset adds locking into code paths
> > used by other single threaded tools and that might
> > be bad for them as noted by Andi in here:
> > 
> >   https://marc.info/?l=linux-kernel=149031672928989=2
> > 
> > he proposed solution and it was changed by Arnaldo in here:
> > 
> >   https://marc.info/?l=linux-kernel=149132267410294=2
> > 
> > but looks like it never got merged
> > 
> > could you please add this or similar code before you add the
> > locking code/overhead in?
> 
> I'm rehashing that patch and adding it on top of what is in my perf/core
> branch, will push soon, for now you can take a look at tmp.perf/core.

checked the code.. one nit, could we have single threaded by default?
only one command is multithreaded atm, it could call perf_set_multihreaded
instead of all current related commands call perf_set_singlethreaded

other than that it looks ok

thanks,
jirka


RE: [PATCH RFC V2 00/10] perf top optimization

2017-09-18 Thread Liang, Kan


> Em Mon, Sep 18, 2017 at 10:57:08AM +0200, Jiri Olsa escreveu:
> > On Sun, Sep 10, 2017 at 07:23:13PM -0700, kan.li...@intel.com wrote:
> > > From: Kan Liang 
> > >
> > > The patch series intends to fix the severe performance issue in
> > > Knights Landing/Mill, when monitoring in heavy load system.
> > > perf top costs a few minutes to show the result, which is
> > > unacceptable.
> > > With the patch series applied, the latency will reduces to several
> > > seconds.
> > >
> > > machine__synthesize_threads and perf_top__mmap_read costs most of
> > > the perf top time (> 99%).
> >
> > looks like this patchset adds locking into code paths used by other
> > single threaded tools and that might be bad for them as noted by Andi
> > in here:
> >
> >   https://marc.info/?l=linux-kernel=149031672928989=2
> >
> > he proposed solution and it was changed by Arnaldo in here:
> >
> >   https://marc.info/?l=linux-kernel=149132267410294=2
> >
> > but looks like it never got merged
> >
> > could you please add this or similar code before you add the locking
> > code/overhead in?
> 
> I'm rehashing that patch and adding it on top of what is in my perf/core
> branch, will push soon, for now you can take a look at tmp.perf/core.

Thanks.
I will make the V3 based on tmp.perf/core.

Kan



RE: [PATCH RFC V2 00/10] perf top optimization

2017-09-18 Thread Liang, Kan


> Em Mon, Sep 18, 2017 at 10:57:08AM +0200, Jiri Olsa escreveu:
> > On Sun, Sep 10, 2017 at 07:23:13PM -0700, kan.li...@intel.com wrote:
> > > From: Kan Liang 
> > >
> > > The patch series intends to fix the severe performance issue in
> > > Knights Landing/Mill, when monitoring in heavy load system.
> > > perf top costs a few minutes to show the result, which is
> > > unacceptable.
> > > With the patch series applied, the latency will reduces to several
> > > seconds.
> > >
> > > machine__synthesize_threads and perf_top__mmap_read costs most of
> > > the perf top time (> 99%).
> >
> > looks like this patchset adds locking into code paths used by other
> > single threaded tools and that might be bad for them as noted by Andi
> > in here:
> >
> >   https://marc.info/?l=linux-kernel=149031672928989=2
> >
> > he proposed solution and it was changed by Arnaldo in here:
> >
> >   https://marc.info/?l=linux-kernel=149132267410294=2
> >
> > but looks like it never got merged
> >
> > could you please add this or similar code before you add the locking
> > code/overhead in?
> 
> I'm rehashing that patch and adding it on top of what is in my perf/core
> branch, will push soon, for now you can take a look at tmp.perf/core.

Thanks.
I will make the V3 based on tmp.perf/core.

Kan



Re: [PATCH RFC V2 00/10] perf top optimization

2017-09-18 Thread Arnaldo Carvalho de Melo
Em Mon, Sep 18, 2017 at 10:57:08AM +0200, Jiri Olsa escreveu:
> On Sun, Sep 10, 2017 at 07:23:13PM -0700, kan.li...@intel.com wrote:
> > From: Kan Liang 
> > 
> > The patch series intends to fix the severe performance issue in
> > Knights Landing/Mill, when monitoring in heavy load system.
> > perf top costs a few minutes to show the result, which is
> > unacceptable.
> > With the patch series applied, the latency will reduces to
> > several seconds.
> > 
> > machine__synthesize_threads and perf_top__mmap_read costs most of
> > the perf top time (> 99%).
> 
> looks like this patchset adds locking into code paths
> used by other single threaded tools and that might
> be bad for them as noted by Andi in here:
> 
>   https://marc.info/?l=linux-kernel=149031672928989=2
> 
> he proposed solution and it was changed by Arnaldo in here:
> 
>   https://marc.info/?l=linux-kernel=149132267410294=2
> 
> but looks like it never got merged
> 
> could you please add this or similar code before you add the
> locking code/overhead in?

I'm rehashing that patch and adding it on top of what is in my perf/core
branch, will push soon, for now you can take a look at tmp.perf/core.

- Arnaldo


Re: [PATCH RFC V2 00/10] perf top optimization

2017-09-18 Thread Arnaldo Carvalho de Melo
Em Mon, Sep 18, 2017 at 10:57:08AM +0200, Jiri Olsa escreveu:
> On Sun, Sep 10, 2017 at 07:23:13PM -0700, kan.li...@intel.com wrote:
> > From: Kan Liang 
> > 
> > The patch series intends to fix the severe performance issue in
> > Knights Landing/Mill, when monitoring in heavy load system.
> > perf top costs a few minutes to show the result, which is
> > unacceptable.
> > With the patch series applied, the latency will reduces to
> > several seconds.
> > 
> > machine__synthesize_threads and perf_top__mmap_read costs most of
> > the perf top time (> 99%).
> 
> looks like this patchset adds locking into code paths
> used by other single threaded tools and that might
> be bad for them as noted by Andi in here:
> 
>   https://marc.info/?l=linux-kernel=149031672928989=2
> 
> he proposed solution and it was changed by Arnaldo in here:
> 
>   https://marc.info/?l=linux-kernel=149132267410294=2
> 
> but looks like it never got merged
> 
> could you please add this or similar code before you add the
> locking code/overhead in?

I'm rehashing that patch and adding it on top of what is in my perf/core
branch, will push soon, for now you can take a look at tmp.perf/core.

- Arnaldo


Re: [PATCH RFC V2 00/10] perf top optimization

2017-09-18 Thread Jiri Olsa
On Sun, Sep 10, 2017 at 07:23:13PM -0700, kan.li...@intel.com wrote:
> From: Kan Liang 
> 
> The patch series intends to fix the severe performance issue in
> Knights Landing/Mill, when monitoring in heavy load system.
> perf top costs a few minutes to show the result, which is
> unacceptable.
> With the patch series applied, the latency will reduces to
> several seconds.
> 
> machine__synthesize_threads and perf_top__mmap_read costs most of
> the perf top time (> 99%).

looks like this patchset adds locking into code paths
used by other single threaded tools and that might
be bad for them as noted by Andi in here:

  https://marc.info/?l=linux-kernel=149031672928989=2

he proposed solution and it was changed by Arnaldo in here:

  https://marc.info/?l=linux-kernel=149132267410294=2

but looks like it never got merged

could you please add this or similar code before you add the
locking code/overhead in?

thanks,
jirka


Re: [PATCH RFC V2 00/10] perf top optimization

2017-09-18 Thread Jiri Olsa
On Sun, Sep 10, 2017 at 07:23:13PM -0700, kan.li...@intel.com wrote:
> From: Kan Liang 
> 
> The patch series intends to fix the severe performance issue in
> Knights Landing/Mill, when monitoring in heavy load system.
> perf top costs a few minutes to show the result, which is
> unacceptable.
> With the patch series applied, the latency will reduces to
> several seconds.
> 
> machine__synthesize_threads and perf_top__mmap_read costs most of
> the perf top time (> 99%).

looks like this patchset adds locking into code paths
used by other single threaded tools and that might
be bad for them as noted by Andi in here:

  https://marc.info/?l=linux-kernel=149031672928989=2

he proposed solution and it was changed by Arnaldo in here:

  https://marc.info/?l=linux-kernel=149132267410294=2

but looks like it never got merged

could you please add this or similar code before you add the
locking code/overhead in?

thanks,
jirka


RE: [PATCH RFC V2 00/10] perf top optimization

2017-09-15 Thread Liang, Kan
> Em Fri, Sep 15, 2017 at 05:29:13PM +, Liang, Kan escreveu:
> > > Em Fri, Sep 15, 2017 at 03:11:51PM +, Liang, Kan escreveu:
> > > > They are small fixes. I think it's better to merge them with the old
> patches.
> > > > Should I include the modified hashtable patches in V3?
> 
> > > I'll add these now and test, then push another branch, ok?
> 
> > Sure. Thanks.
> > I will prepare the V3 for the new branch then.
> 
> Ok, passes all the tests I trew at them, including 'perf test' and building in
> several distro containers.
> 
> I just pushed perf/core, please continue from there, ok?
> 

Sure.

Thanks,
Kan


RE: [PATCH RFC V2 00/10] perf top optimization

2017-09-15 Thread Liang, Kan
> Em Fri, Sep 15, 2017 at 05:29:13PM +, Liang, Kan escreveu:
> > > Em Fri, Sep 15, 2017 at 03:11:51PM +, Liang, Kan escreveu:
> > > > They are small fixes. I think it's better to merge them with the old
> patches.
> > > > Should I include the modified hashtable patches in V3?
> 
> > > I'll add these now and test, then push another branch, ok?
> 
> > Sure. Thanks.
> > I will prepare the V3 for the new branch then.
> 
> Ok, passes all the tests I trew at them, including 'perf test' and building in
> several distro containers.
> 
> I just pushed perf/core, please continue from there, ok?
> 

Sure.

Thanks,
Kan


Re: [PATCH RFC V2 00/10] perf top optimization

2017-09-15 Thread Arnaldo Carvalho de Melo
Em Fri, Sep 15, 2017 at 05:29:13PM +, Liang, Kan escreveu:
> > Em Fri, Sep 15, 2017 at 03:11:51PM +, Liang, Kan escreveu:
> > > They are small fixes. I think it's better to merge them with the old 
> > > patches.
> > > Should I include the modified hashtable patches in V3?

> > I'll add these now and test, then push another branch, ok?
 
> Sure. Thanks.
> I will prepare the V3 for the new branch then. 

Ok, passes all the tests I trew at them, including 'perf test' and
building in several distro containers.

I just pushed perf/core, please continue from there, ok?

- Arnaldo


Re: [PATCH RFC V2 00/10] perf top optimization

2017-09-15 Thread Arnaldo Carvalho de Melo
Em Fri, Sep 15, 2017 at 05:29:13PM +, Liang, Kan escreveu:
> > Em Fri, Sep 15, 2017 at 03:11:51PM +, Liang, Kan escreveu:
> > > They are small fixes. I think it's better to merge them with the old 
> > > patches.
> > > Should I include the modified hashtable patches in V3?

> > I'll add these now and test, then push another branch, ok?
 
> Sure. Thanks.
> I will prepare the V3 for the new branch then. 

Ok, passes all the tests I trew at them, including 'perf test' and
building in several distro containers.

I just pushed perf/core, please continue from there, ok?

- Arnaldo


RE: [PATCH RFC V2 00/10] perf top optimization

2017-09-15 Thread Liang, Kan
> Em Fri, Sep 15, 2017 at 03:11:51PM +, Liang, Kan escreveu:
> > > Em Wed, Sep 13, 2017 at 12:38:19PM -0300, Arnaldo Carvalho de Melo
> > > escreveu:
> > > > Em Wed, Sep 13, 2017 at 03:29:44PM +, Liang, Kan escreveu:
> > > > > >
> > > > > > Em Sun, Sep 10, 2017 at 07:23:13PM -0700, kan.li...@intel.com
> > > escreveu:
> > > > > >
> > > > > > So I got the first two patches already merged, and made some
> > > > > > comments about the other patches, please check those,
> > > > > >
> > > > >
> > > > > Thanks for the review Arnaldo.
> > > > >
> > > > > I will take a close look for the comments.
> > > > > For the next version, I only need to include patch 3-10, correct?
> > > >
> > > > Right, and go from my perf/core branch. The hashtable patch is
> > > > still not there as I am running tests before pushing out, but it
> > > > should be there later today.
> > >
> > > So, its at my repo, branch tmp.perf/threads_hashtable
> > >
> > > But 'perf trace' is broken, please take a look below:
> > >
> > > [root@jouet ~]# gdb -c core
> > > GNU gdb (GDB) Fedora 8.0-20.fc26
> > > 
> > > Core was generated by `perf trace -e block:block_bio_queue'.
> > > Program terminated with signal SIGSEGV, Segmentation fault.
> > > #0  0x0051089a in ?? ()
> > > (gdb) file perf
> > > Reading symbols from perf...done.
> > > (gdb) bt
> > > #0  0x0051089a in machine__findnew_thread
> > > (machine=0x3dfcab0, threads=0x3dfca78, pid=-1, tid=-1, create=false)
> > > at
> > > util/machine.c:429
> >
> > I think the root cause is tid==-1. So the index of hashtable will be -1.
> > The patch as below should fix it.
> >
> > diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
> > index e6d5381..3c564b8 100644
> > --- a/tools/perf/util/machine.h
> > +++ b/tools/perf/util/machine.h
> > @@ -57,7 +57,7 @@ struct machine {
> >
> >  static inline struct threads *machine__threads(struct machine
> > *machine, pid_t tid)  {
> > -   return >threads[tid % THREADS__TABLE_SIZE];
> > +   return >threads[(unsigned int)tid %
> THREADS__TABLE_SIZE];
> >  }
> >
> >  static inline
> >
> >
> > There should be another issue which was introduced by
> > 33013b9a5607 ("perf machine: Optimize a bit the
> > machine__findnew_thread() methods") It should use tid not pid to get the
> threads.
> >
> > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > index 90ae9c7..ddeea05 100644
> > --- a/tools/perf/util/machine.c
> > +++ b/tools/perf/util/machine.c
> > @@ -473,7 +473,7 @@ static struct thread
> > *machine__findnew_thread(struct machine *machine,
> >
> >  struct thread *__machine__findnew_thread(struct machine *machine,
> > pid_t pid, pid_t tid)  {
> > -   return machine__findnew_thread(machine,
> machine__threads(machine, pid), pid, tid, true);
> > +   return machine__findnew_thread(machine,
> > +machine__threads(machine, tid), pid, tid, true);
> >  }
> >
> > They are small fixes. I think it's better to merge them with the old 
> > patches.
> > Should I include the modified hashtable patches in V3?
> 
> I'll add these now and test, then push another branch, ok?
>

Sure. Thanks.
I will prepare the V3 for the new branch then. 

Thanks,
Kan


RE: [PATCH RFC V2 00/10] perf top optimization

2017-09-15 Thread Liang, Kan
> Em Fri, Sep 15, 2017 at 03:11:51PM +, Liang, Kan escreveu:
> > > Em Wed, Sep 13, 2017 at 12:38:19PM -0300, Arnaldo Carvalho de Melo
> > > escreveu:
> > > > Em Wed, Sep 13, 2017 at 03:29:44PM +, Liang, Kan escreveu:
> > > > > >
> > > > > > Em Sun, Sep 10, 2017 at 07:23:13PM -0700, kan.li...@intel.com
> > > escreveu:
> > > > > >
> > > > > > So I got the first two patches already merged, and made some
> > > > > > comments about the other patches, please check those,
> > > > > >
> > > > >
> > > > > Thanks for the review Arnaldo.
> > > > >
> > > > > I will take a close look for the comments.
> > > > > For the next version, I only need to include patch 3-10, correct?
> > > >
> > > > Right, and go from my perf/core branch. The hashtable patch is
> > > > still not there as I am running tests before pushing out, but it
> > > > should be there later today.
> > >
> > > So, its at my repo, branch tmp.perf/threads_hashtable
> > >
> > > But 'perf trace' is broken, please take a look below:
> > >
> > > [root@jouet ~]# gdb -c core
> > > GNU gdb (GDB) Fedora 8.0-20.fc26
> > > 
> > > Core was generated by `perf trace -e block:block_bio_queue'.
> > > Program terminated with signal SIGSEGV, Segmentation fault.
> > > #0  0x0051089a in ?? ()
> > > (gdb) file perf
> > > Reading symbols from perf...done.
> > > (gdb) bt
> > > #0  0x0051089a in machine__findnew_thread
> > > (machine=0x3dfcab0, threads=0x3dfca78, pid=-1, tid=-1, create=false)
> > > at
> > > util/machine.c:429
> >
> > I think the root cause is tid==-1. So the index of hashtable will be -1.
> > The patch as below should fix it.
> >
> > diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
> > index e6d5381..3c564b8 100644
> > --- a/tools/perf/util/machine.h
> > +++ b/tools/perf/util/machine.h
> > @@ -57,7 +57,7 @@ struct machine {
> >
> >  static inline struct threads *machine__threads(struct machine
> > *machine, pid_t tid)  {
> > -   return >threads[tid % THREADS__TABLE_SIZE];
> > +   return >threads[(unsigned int)tid %
> THREADS__TABLE_SIZE];
> >  }
> >
> >  static inline
> >
> >
> > There should be another issue which was introduced by
> > 33013b9a5607 ("perf machine: Optimize a bit the
> > machine__findnew_thread() methods") It should use tid not pid to get the
> threads.
> >
> > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > index 90ae9c7..ddeea05 100644
> > --- a/tools/perf/util/machine.c
> > +++ b/tools/perf/util/machine.c
> > @@ -473,7 +473,7 @@ static struct thread
> > *machine__findnew_thread(struct machine *machine,
> >
> >  struct thread *__machine__findnew_thread(struct machine *machine,
> > pid_t pid, pid_t tid)  {
> > -   return machine__findnew_thread(machine,
> machine__threads(machine, pid), pid, tid, true);
> > +   return machine__findnew_thread(machine,
> > +machine__threads(machine, tid), pid, tid, true);
> >  }
> >
> > They are small fixes. I think it's better to merge them with the old 
> > patches.
> > Should I include the modified hashtable patches in V3?
> 
> I'll add these now and test, then push another branch, ok?
>

Sure. Thanks.
I will prepare the V3 for the new branch then. 

Thanks,
Kan


Re: [PATCH RFC V2 00/10] perf top optimization

2017-09-15 Thread Arnaldo Carvalho de Melo
Em Fri, Sep 15, 2017 at 03:11:51PM +, Liang, Kan escreveu:
> > Em Wed, Sep 13, 2017 at 12:38:19PM -0300, Arnaldo Carvalho de Melo
> > escreveu:
> > > Em Wed, Sep 13, 2017 at 03:29:44PM +, Liang, Kan escreveu:
> > > > >
> > > > > Em Sun, Sep 10, 2017 at 07:23:13PM -0700, kan.li...@intel.com
> > escreveu:
> > > > >
> > > > > So I got the first two patches already merged, and made some
> > > > > comments about the other patches, please check those,
> > > > >
> > > >
> > > > Thanks for the review Arnaldo.
> > > >
> > > > I will take a close look for the comments.
> > > > For the next version, I only need to include patch 3-10, correct?
> > >
> > > Right, and go from my perf/core branch. The hashtable patch is still
> > > not there as I am running tests before pushing out, but it should be
> > > there later today.
> > 
> > So, its at my repo, branch tmp.perf/threads_hashtable
> > 
> > But 'perf trace' is broken, please take a look below:
> > 
> > [root@jouet ~]# gdb -c core
> > GNU gdb (GDB) Fedora 8.0-20.fc26
> > 
> > Core was generated by `perf trace -e block:block_bio_queue'.
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > #0  0x0051089a in ?? ()
> > (gdb) file perf
> > Reading symbols from perf...done.
> > (gdb) bt
> > #0  0x0051089a in machine__findnew_thread
> > (machine=0x3dfcab0, threads=0x3dfca78, pid=-1, tid=-1, create=false) at
> > util/machine.c:429
> 
> I think the root cause is tid==-1. So the index of hashtable will be -1.
> The patch as below should fix it.
> 
> diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
> index e6d5381..3c564b8 100644
> --- a/tools/perf/util/machine.h
> +++ b/tools/perf/util/machine.h
> @@ -57,7 +57,7 @@ struct machine {
>  
>  static inline struct threads *machine__threads(struct machine *machine, 
> pid_t tid)
>  {
> - return >threads[tid % THREADS__TABLE_SIZE];
> + return >threads[(unsigned int)tid % THREADS__TABLE_SIZE];
>  }
>  
>  static inline
> 
> 
> There should be another issue which was introduced by  
> 33013b9a5607 ("perf machine: Optimize a bit the machine__findnew_thread() 
> methods")
> It should use tid not pid to get the threads.
> 
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 90ae9c7..ddeea05 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -473,7 +473,7 @@ static struct thread *machine__findnew_thread(struct 
> machine *machine,
>  
>  struct thread *__machine__findnew_thread(struct machine *machine, pid_t pid, 
> pid_t tid)
>  {
> - return machine__findnew_thread(machine, machine__threads(machine, 
> pid), pid, tid, true);
> + return machine__findnew_thread(machine, machine__threads(machine, 
> tid), pid, tid, true);
>  }
> 
> They are small fixes. I think it's better to merge them with the old patches.
> Should I include the modified hashtable patches in V3?

I'll add these now and test, then push another branch, ok?

- Arnaldo


Re: [PATCH RFC V2 00/10] perf top optimization

2017-09-15 Thread Arnaldo Carvalho de Melo
Em Fri, Sep 15, 2017 at 03:11:51PM +, Liang, Kan escreveu:
> > Em Wed, Sep 13, 2017 at 12:38:19PM -0300, Arnaldo Carvalho de Melo
> > escreveu:
> > > Em Wed, Sep 13, 2017 at 03:29:44PM +, Liang, Kan escreveu:
> > > > >
> > > > > Em Sun, Sep 10, 2017 at 07:23:13PM -0700, kan.li...@intel.com
> > escreveu:
> > > > >
> > > > > So I got the first two patches already merged, and made some
> > > > > comments about the other patches, please check those,
> > > > >
> > > >
> > > > Thanks for the review Arnaldo.
> > > >
> > > > I will take a close look for the comments.
> > > > For the next version, I only need to include patch 3-10, correct?
> > >
> > > Right, and go from my perf/core branch. The hashtable patch is still
> > > not there as I am running tests before pushing out, but it should be
> > > there later today.
> > 
> > So, its at my repo, branch tmp.perf/threads_hashtable
> > 
> > But 'perf trace' is broken, please take a look below:
> > 
> > [root@jouet ~]# gdb -c core
> > GNU gdb (GDB) Fedora 8.0-20.fc26
> > 
> > Core was generated by `perf trace -e block:block_bio_queue'.
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > #0  0x0051089a in ?? ()
> > (gdb) file perf
> > Reading symbols from perf...done.
> > (gdb) bt
> > #0  0x0051089a in machine__findnew_thread
> > (machine=0x3dfcab0, threads=0x3dfca78, pid=-1, tid=-1, create=false) at
> > util/machine.c:429
> 
> I think the root cause is tid==-1. So the index of hashtable will be -1.
> The patch as below should fix it.
> 
> diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
> index e6d5381..3c564b8 100644
> --- a/tools/perf/util/machine.h
> +++ b/tools/perf/util/machine.h
> @@ -57,7 +57,7 @@ struct machine {
>  
>  static inline struct threads *machine__threads(struct machine *machine, 
> pid_t tid)
>  {
> - return >threads[tid % THREADS__TABLE_SIZE];
> + return >threads[(unsigned int)tid % THREADS__TABLE_SIZE];
>  }
>  
>  static inline
> 
> 
> There should be another issue which was introduced by  
> 33013b9a5607 ("perf machine: Optimize a bit the machine__findnew_thread() 
> methods")
> It should use tid not pid to get the threads.
> 
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 90ae9c7..ddeea05 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -473,7 +473,7 @@ static struct thread *machine__findnew_thread(struct 
> machine *machine,
>  
>  struct thread *__machine__findnew_thread(struct machine *machine, pid_t pid, 
> pid_t tid)
>  {
> - return machine__findnew_thread(machine, machine__threads(machine, 
> pid), pid, tid, true);
> + return machine__findnew_thread(machine, machine__threads(machine, 
> tid), pid, tid, true);
>  }
> 
> They are small fixes. I think it's better to merge them with the old patches.
> Should I include the modified hashtable patches in V3?

I'll add these now and test, then push another branch, ok?

- Arnaldo


RE: [PATCH RFC V2 00/10] perf top optimization

2017-09-15 Thread Liang, Kan
> Em Wed, Sep 13, 2017 at 12:38:19PM -0300, Arnaldo Carvalho de Melo
> escreveu:
> > Em Wed, Sep 13, 2017 at 03:29:44PM +, Liang, Kan escreveu:
> > > >
> > > > Em Sun, Sep 10, 2017 at 07:23:13PM -0700, kan.li...@intel.com
> escreveu:
> > > >
> > > > So I got the first two patches already merged, and made some
> > > > comments about the other patches, please check those,
> > > >
> > >
> > > Thanks for the review Arnaldo.
> > >
> > > I will take a close look for the comments.
> > > For the next version, I only need to include patch 3-10, correct?
> >
> > Right, and go from my perf/core branch. The hashtable patch is still
> > not there as I am running tests before pushing out, but it should be
> > there later today.
> 
> So, its at my repo, branch tmp.perf/threads_hashtable
> 
> But 'perf trace' is broken, please take a look below:
> 
> [root@jouet ~]# gdb -c core
> GNU gdb (GDB) Fedora 8.0-20.fc26
> 
> Core was generated by `perf trace -e block:block_bio_queue'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x0051089a in ?? ()
> (gdb) file perf
> Reading symbols from perf...done.
> (gdb) bt
> #0  0x0051089a in machine__findnew_thread
> (machine=0x3dfcab0, threads=0x3dfca78, pid=-1, tid=-1, create=false) at
> util/machine.c:429

I think the root cause is tid==-1. So the index of hashtable will be -1.
The patch as below should fix it.

diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index e6d5381..3c564b8 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -57,7 +57,7 @@ struct machine {
 
 static inline struct threads *machine__threads(struct machine *machine, pid_t 
tid)
 {
-   return >threads[tid % THREADS__TABLE_SIZE];
+   return >threads[(unsigned int)tid % THREADS__TABLE_SIZE];
 }
 
 static inline


There should be another issue which was introduced by  
33013b9a5607 ("perf machine: Optimize a bit the machine__findnew_thread() 
methods")
It should use tid not pid to get the threads.

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 90ae9c7..ddeea05 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -473,7 +473,7 @@ static struct thread *machine__findnew_thread(struct 
machine *machine,
 
 struct thread *__machine__findnew_thread(struct machine *machine, pid_t pid, 
pid_t tid)
 {
-   return machine__findnew_thread(machine, machine__threads(machine, 
pid), pid, tid, true);
+   return machine__findnew_thread(machine, machine__threads(machine, 
tid), pid, tid, true);
 }

They are small fixes. I think it's better to merge them with the old patches.
Should I include the modified hashtable patches in V3?

Thanks,
Kan



RE: [PATCH RFC V2 00/10] perf top optimization

2017-09-15 Thread Liang, Kan
> Em Wed, Sep 13, 2017 at 12:38:19PM -0300, Arnaldo Carvalho de Melo
> escreveu:
> > Em Wed, Sep 13, 2017 at 03:29:44PM +, Liang, Kan escreveu:
> > > >
> > > > Em Sun, Sep 10, 2017 at 07:23:13PM -0700, kan.li...@intel.com
> escreveu:
> > > >
> > > > So I got the first two patches already merged, and made some
> > > > comments about the other patches, please check those,
> > > >
> > >
> > > Thanks for the review Arnaldo.
> > >
> > > I will take a close look for the comments.
> > > For the next version, I only need to include patch 3-10, correct?
> >
> > Right, and go from my perf/core branch. The hashtable patch is still
> > not there as I am running tests before pushing out, but it should be
> > there later today.
> 
> So, its at my repo, branch tmp.perf/threads_hashtable
> 
> But 'perf trace' is broken, please take a look below:
> 
> [root@jouet ~]# gdb -c core
> GNU gdb (GDB) Fedora 8.0-20.fc26
> 
> Core was generated by `perf trace -e block:block_bio_queue'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x0051089a in ?? ()
> (gdb) file perf
> Reading symbols from perf...done.
> (gdb) bt
> #0  0x0051089a in machine__findnew_thread
> (machine=0x3dfcab0, threads=0x3dfca78, pid=-1, tid=-1, create=false) at
> util/machine.c:429

I think the root cause is tid==-1. So the index of hashtable will be -1.
The patch as below should fix it.

diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index e6d5381..3c564b8 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -57,7 +57,7 @@ struct machine {
 
 static inline struct threads *machine__threads(struct machine *machine, pid_t 
tid)
 {
-   return >threads[tid % THREADS__TABLE_SIZE];
+   return >threads[(unsigned int)tid % THREADS__TABLE_SIZE];
 }
 
 static inline


There should be another issue which was introduced by  
33013b9a5607 ("perf machine: Optimize a bit the machine__findnew_thread() 
methods")
It should use tid not pid to get the threads.

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 90ae9c7..ddeea05 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -473,7 +473,7 @@ static struct thread *machine__findnew_thread(struct 
machine *machine,
 
 struct thread *__machine__findnew_thread(struct machine *machine, pid_t pid, 
pid_t tid)
 {
-   return machine__findnew_thread(machine, machine__threads(machine, 
pid), pid, tid, true);
+   return machine__findnew_thread(machine, machine__threads(machine, 
tid), pid, tid, true);
 }

They are small fixes. I think it's better to merge them with the old patches.
Should I include the modified hashtable patches in V3?

Thanks,
Kan



Re: [PATCH RFC V2 00/10] perf top optimization

2017-09-14 Thread Arnaldo Carvalho de Melo
Em Wed, Sep 13, 2017 at 12:38:19PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Wed, Sep 13, 2017 at 03:29:44PM +, Liang, Kan escreveu:
> > > 
> > > Em Sun, Sep 10, 2017 at 07:23:13PM -0700, kan.li...@intel.com escreveu:
> > > 
> > > So I got the first two patches already merged, and made some comments
> > > about the other patches, please check those,
> > > 
> > 
> > Thanks for the review Arnaldo.
> > 
> > I will take a close look for the comments. 
> > For the next version, I only need to include patch 3-10, correct?
> 
> Right, and go from my perf/core branch. The hashtable patch is still not
> there as I am running tests before pushing out, but it should be there
> later today.

So, its at my repo, branch tmp.perf/threads_hashtable

But 'perf trace' is broken, please take a look below:

[root@jouet ~]# gdb -c core
GNU gdb (GDB) Fedora 8.0-20.fc26

Core was generated by `perf trace -e block:block_bio_queue'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0051089a in ?? ()
(gdb) file perf
Reading symbols from perf...done.
(gdb) bt
#0  0x0051089a in machine__findnew_thread (machine=0x3dfcab0, 
threads=0x3dfca78, pid=-1, tid=-1, create=false) at util/machine.c:429
#1  0x00510b49 in machine__find_thread (machine=0x3dfcab0, pid=-1, 
tid=-1) at util/machine.c:498
#2  0x00483dd0 in trace__set_filter_loop_pids (trace=0x7ffc0d23f880) at 
builtin-trace.c:2247
#3  0x0048438b in trace__run (trace=0x7ffc0d23f880, argc=0, 
argv=0x7ffc0d242a00) at builtin-trace.c:2385
#4  0x00487219 in cmd_trace (argc=0, argv=0x7ffc0d242a00) at 
builtin-trace.c:3121
#5  0x004c0b4a in run_builtin (p=0xa75d00 , argc=3, 
argv=0x7ffc0d242a00) at perf.c:296
#6  0x004c0db7 in handle_internal_command (argc=3, argv=0x7ffc0d242a00) 
at perf.c:348
#7  0x004c0f09 in run_argv (argcp=0x7ffc0d24285c, argv=0x7ffc0d242850) 
at perf.c:392
#8  0x004c12d7 in main (argc=3, argv=0x7ffc0d242a00) at perf.c:536
(gdb)


Re: [PATCH RFC V2 00/10] perf top optimization

2017-09-14 Thread Arnaldo Carvalho de Melo
Em Wed, Sep 13, 2017 at 12:38:19PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Wed, Sep 13, 2017 at 03:29:44PM +, Liang, Kan escreveu:
> > > 
> > > Em Sun, Sep 10, 2017 at 07:23:13PM -0700, kan.li...@intel.com escreveu:
> > > 
> > > So I got the first two patches already merged, and made some comments
> > > about the other patches, please check those,
> > > 
> > 
> > Thanks for the review Arnaldo.
> > 
> > I will take a close look for the comments. 
> > For the next version, I only need to include patch 3-10, correct?
> 
> Right, and go from my perf/core branch. The hashtable patch is still not
> there as I am running tests before pushing out, but it should be there
> later today.

So, its at my repo, branch tmp.perf/threads_hashtable

But 'perf trace' is broken, please take a look below:

[root@jouet ~]# gdb -c core
GNU gdb (GDB) Fedora 8.0-20.fc26

Core was generated by `perf trace -e block:block_bio_queue'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0051089a in ?? ()
(gdb) file perf
Reading symbols from perf...done.
(gdb) bt
#0  0x0051089a in machine__findnew_thread (machine=0x3dfcab0, 
threads=0x3dfca78, pid=-1, tid=-1, create=false) at util/machine.c:429
#1  0x00510b49 in machine__find_thread (machine=0x3dfcab0, pid=-1, 
tid=-1) at util/machine.c:498
#2  0x00483dd0 in trace__set_filter_loop_pids (trace=0x7ffc0d23f880) at 
builtin-trace.c:2247
#3  0x0048438b in trace__run (trace=0x7ffc0d23f880, argc=0, 
argv=0x7ffc0d242a00) at builtin-trace.c:2385
#4  0x00487219 in cmd_trace (argc=0, argv=0x7ffc0d242a00) at 
builtin-trace.c:3121
#5  0x004c0b4a in run_builtin (p=0xa75d00 , argc=3, 
argv=0x7ffc0d242a00) at perf.c:296
#6  0x004c0db7 in handle_internal_command (argc=3, argv=0x7ffc0d242a00) 
at perf.c:348
#7  0x004c0f09 in run_argv (argcp=0x7ffc0d24285c, argv=0x7ffc0d242850) 
at perf.c:392
#8  0x004c12d7 in main (argc=3, argv=0x7ffc0d242a00) at perf.c:536
(gdb)


Re: [PATCH RFC V2 00/10] perf top optimization

2017-09-13 Thread Arnaldo Carvalho de Melo
Em Wed, Sep 13, 2017 at 03:29:44PM +, Liang, Kan escreveu:
> > 
> > Em Sun, Sep 10, 2017 at 07:23:13PM -0700, kan.li...@intel.com escreveu:
> > 
> > So I got the first two patches already merged, and made some comments
> > about the other patches, please check those,
> > 
> 
> Thanks for the review Arnaldo.
> 
> I will take a close look for the comments. 
> For the next version, I only need to include patch 3-10, correct?

Right, and go from my perf/core branch. The hashtable patch is still not
there as I am running tests before pushing out, but it should be there
later today.

Thanks!

- Arnaldo
 
> 
> Thanks,
> Kan
> 
> > Thanks,
> > 
> > - Arnaldo
> > 
> > > Changes since V1:
> > >  - Patch 1: machine threads and hashtable related renaming (Arnaldo)
> > >  - Patch 6: use a smaller locked section for comm_str__put
> > >add a locked wrapper for comm_str__findnew  (Arnaldo)
> > >
> > > Kan Liang (10):
> > >   perf tools: hashtable for machine threads
> > >   perf tools: using scandir to replace readdir
> > >   petf tools: using comm_str to replace comm in hist_entry
> > >   petf tools: introduce a new function to set namespaces id
> > >   perf tools: lock to protect thread list
> > >   perf tools: lock to protect comm_str rb tree
> > >   perf tools: change machine comm_exec type to atomic
> > >   perf top: implement multithreading for perf_event__synthesize_threads
> > >   perf top: add option to set the number of thread for event synthesize
> > >   perf top: switch back to overwrite mode
> > >
> > >  tools/perf/builtin-kvm.c  |   3 +-
> > >  tools/perf/builtin-record.c   |   2 +-
> > >  tools/perf/builtin-top.c  |   9 +-
> > >  tools/perf/builtin-trace.c|  21 +++--
> > >  tools/perf/tests/mmap-thread-lookup.c |   2 +-
> > >  tools/perf/ui/browsers/hists.c|   2 +-
> > >  tools/perf/util/comm.c|  18 +++-
> > >  tools/perf/util/event.c   | 149 
> > > +---
> > >  tools/perf/util/event.h   |  14 ++-
> > >  tools/perf/util/evlist.c  |   5 +-
> > >  tools/perf/util/hist.c|  11 +--
> > >  tools/perf/util/machine.c | 158 
> > > +-
> > >  tools/perf/util/machine.h |  34 ++--
> > >  tools/perf/util/rb_resort.h   |   5 +-
> > >  tools/perf/util/sort.c|   8 +-
> > >  tools/perf/util/sort.h|   2 +-
> > >  tools/perf/util/thread.c  |  68 ---
> > >  tools/perf/util/thread.h  |   6 +-
> > >  tools/perf/util/top.h |   1 +
> > >  19 files changed, 376 insertions(+), 142 deletions(-)
> > >
> > > --
> > > 2.5.5


Re: [PATCH RFC V2 00/10] perf top optimization

2017-09-13 Thread Arnaldo Carvalho de Melo
Em Wed, Sep 13, 2017 at 03:29:44PM +, Liang, Kan escreveu:
> > 
> > Em Sun, Sep 10, 2017 at 07:23:13PM -0700, kan.li...@intel.com escreveu:
> > 
> > So I got the first two patches already merged, and made some comments
> > about the other patches, please check those,
> > 
> 
> Thanks for the review Arnaldo.
> 
> I will take a close look for the comments. 
> For the next version, I only need to include patch 3-10, correct?

Right, and go from my perf/core branch. The hashtable patch is still not
there as I am running tests before pushing out, but it should be there
later today.

Thanks!

- Arnaldo
 
> 
> Thanks,
> Kan
> 
> > Thanks,
> > 
> > - Arnaldo
> > 
> > > Changes since V1:
> > >  - Patch 1: machine threads and hashtable related renaming (Arnaldo)
> > >  - Patch 6: use a smaller locked section for comm_str__put
> > >add a locked wrapper for comm_str__findnew  (Arnaldo)
> > >
> > > Kan Liang (10):
> > >   perf tools: hashtable for machine threads
> > >   perf tools: using scandir to replace readdir
> > >   petf tools: using comm_str to replace comm in hist_entry
> > >   petf tools: introduce a new function to set namespaces id
> > >   perf tools: lock to protect thread list
> > >   perf tools: lock to protect comm_str rb tree
> > >   perf tools: change machine comm_exec type to atomic
> > >   perf top: implement multithreading for perf_event__synthesize_threads
> > >   perf top: add option to set the number of thread for event synthesize
> > >   perf top: switch back to overwrite mode
> > >
> > >  tools/perf/builtin-kvm.c  |   3 +-
> > >  tools/perf/builtin-record.c   |   2 +-
> > >  tools/perf/builtin-top.c  |   9 +-
> > >  tools/perf/builtin-trace.c|  21 +++--
> > >  tools/perf/tests/mmap-thread-lookup.c |   2 +-
> > >  tools/perf/ui/browsers/hists.c|   2 +-
> > >  tools/perf/util/comm.c|  18 +++-
> > >  tools/perf/util/event.c   | 149 
> > > +---
> > >  tools/perf/util/event.h   |  14 ++-
> > >  tools/perf/util/evlist.c  |   5 +-
> > >  tools/perf/util/hist.c|  11 +--
> > >  tools/perf/util/machine.c | 158 
> > > +-
> > >  tools/perf/util/machine.h |  34 ++--
> > >  tools/perf/util/rb_resort.h   |   5 +-
> > >  tools/perf/util/sort.c|   8 +-
> > >  tools/perf/util/sort.h|   2 +-
> > >  tools/perf/util/thread.c  |  68 ---
> > >  tools/perf/util/thread.h  |   6 +-
> > >  tools/perf/util/top.h |   1 +
> > >  19 files changed, 376 insertions(+), 142 deletions(-)
> > >
> > > --
> > > 2.5.5


RE: [PATCH RFC V2 00/10] perf top optimization

2017-09-13 Thread Liang, Kan
> 
> Em Sun, Sep 10, 2017 at 07:23:13PM -0700, kan.li...@intel.com escreveu:
> 
> So I got the first two patches already merged, and made some comments
> about the other patches, please check those,
> 

Thanks for the review Arnaldo.

I will take a close look for the comments. 
For the next version, I only need to include patch 3-10, correct?


Thanks,
Kan

> Thanks,
> 
> - Arnaldo
> 
> > Changes since V1:
> >  - Patch 1: machine threads and hashtable related renaming (Arnaldo)
> >  - Patch 6: use a smaller locked section for comm_str__put
> >add a locked wrapper for comm_str__findnew  (Arnaldo)
> >
> > Kan Liang (10):
> >   perf tools: hashtable for machine threads
> >   perf tools: using scandir to replace readdir
> >   petf tools: using comm_str to replace comm in hist_entry
> >   petf tools: introduce a new function to set namespaces id
> >   perf tools: lock to protect thread list
> >   perf tools: lock to protect comm_str rb tree
> >   perf tools: change machine comm_exec type to atomic
> >   perf top: implement multithreading for perf_event__synthesize_threads
> >   perf top: add option to set the number of thread for event synthesize
> >   perf top: switch back to overwrite mode
> >
> >  tools/perf/builtin-kvm.c  |   3 +-
> >  tools/perf/builtin-record.c   |   2 +-
> >  tools/perf/builtin-top.c  |   9 +-
> >  tools/perf/builtin-trace.c|  21 +++--
> >  tools/perf/tests/mmap-thread-lookup.c |   2 +-
> >  tools/perf/ui/browsers/hists.c|   2 +-
> >  tools/perf/util/comm.c|  18 +++-
> >  tools/perf/util/event.c   | 149 
> > +---
> >  tools/perf/util/event.h   |  14 ++-
> >  tools/perf/util/evlist.c  |   5 +-
> >  tools/perf/util/hist.c|  11 +--
> >  tools/perf/util/machine.c | 158 
> > +-
> >  tools/perf/util/machine.h |  34 ++--
> >  tools/perf/util/rb_resort.h   |   5 +-
> >  tools/perf/util/sort.c|   8 +-
> >  tools/perf/util/sort.h|   2 +-
> >  tools/perf/util/thread.c  |  68 ---
> >  tools/perf/util/thread.h  |   6 +-
> >  tools/perf/util/top.h |   1 +
> >  19 files changed, 376 insertions(+), 142 deletions(-)
> >
> > --
> > 2.5.5


RE: [PATCH RFC V2 00/10] perf top optimization

2017-09-13 Thread Liang, Kan
> 
> Em Sun, Sep 10, 2017 at 07:23:13PM -0700, kan.li...@intel.com escreveu:
> 
> So I got the first two patches already merged, and made some comments
> about the other patches, please check those,
> 

Thanks for the review Arnaldo.

I will take a close look for the comments. 
For the next version, I only need to include patch 3-10, correct?


Thanks,
Kan

> Thanks,
> 
> - Arnaldo
> 
> > Changes since V1:
> >  - Patch 1: machine threads and hashtable related renaming (Arnaldo)
> >  - Patch 6: use a smaller locked section for comm_str__put
> >add a locked wrapper for comm_str__findnew  (Arnaldo)
> >
> > Kan Liang (10):
> >   perf tools: hashtable for machine threads
> >   perf tools: using scandir to replace readdir
> >   petf tools: using comm_str to replace comm in hist_entry
> >   petf tools: introduce a new function to set namespaces id
> >   perf tools: lock to protect thread list
> >   perf tools: lock to protect comm_str rb tree
> >   perf tools: change machine comm_exec type to atomic
> >   perf top: implement multithreading for perf_event__synthesize_threads
> >   perf top: add option to set the number of thread for event synthesize
> >   perf top: switch back to overwrite mode
> >
> >  tools/perf/builtin-kvm.c  |   3 +-
> >  tools/perf/builtin-record.c   |   2 +-
> >  tools/perf/builtin-top.c  |   9 +-
> >  tools/perf/builtin-trace.c|  21 +++--
> >  tools/perf/tests/mmap-thread-lookup.c |   2 +-
> >  tools/perf/ui/browsers/hists.c|   2 +-
> >  tools/perf/util/comm.c|  18 +++-
> >  tools/perf/util/event.c   | 149 
> > +---
> >  tools/perf/util/event.h   |  14 ++-
> >  tools/perf/util/evlist.c  |   5 +-
> >  tools/perf/util/hist.c|  11 +--
> >  tools/perf/util/machine.c | 158 
> > +-
> >  tools/perf/util/machine.h |  34 ++--
> >  tools/perf/util/rb_resort.h   |   5 +-
> >  tools/perf/util/sort.c|   8 +-
> >  tools/perf/util/sort.h|   2 +-
> >  tools/perf/util/thread.c  |  68 ---
> >  tools/perf/util/thread.h  |   6 +-
> >  tools/perf/util/top.h |   1 +
> >  19 files changed, 376 insertions(+), 142 deletions(-)
> >
> > --
> > 2.5.5


Re: [PATCH RFC V2 00/10] perf top optimization

2017-09-13 Thread Arnaldo Carvalho de Melo
Em Sun, Sep 10, 2017 at 07:23:13PM -0700, kan.li...@intel.com escreveu:

So I got the first two patches already merged, and made some comments
about the other patches, please check those,

Thanks,

- Arnaldo
 
> Changes since V1:
>  - Patch 1: machine threads and hashtable related renaming (Arnaldo)
>  - Patch 6: use a smaller locked section for comm_str__put
>add a locked wrapper for comm_str__findnew  (Arnaldo)
> 
> Kan Liang (10):
>   perf tools: hashtable for machine threads
>   perf tools: using scandir to replace readdir
>   petf tools: using comm_str to replace comm in hist_entry
>   petf tools: introduce a new function to set namespaces id
>   perf tools: lock to protect thread list
>   perf tools: lock to protect comm_str rb tree
>   perf tools: change machine comm_exec type to atomic
>   perf top: implement multithreading for perf_event__synthesize_threads
>   perf top: add option to set the number of thread for event synthesize
>   perf top: switch back to overwrite mode
> 
>  tools/perf/builtin-kvm.c  |   3 +-
>  tools/perf/builtin-record.c   |   2 +-
>  tools/perf/builtin-top.c  |   9 +-
>  tools/perf/builtin-trace.c|  21 +++--
>  tools/perf/tests/mmap-thread-lookup.c |   2 +-
>  tools/perf/ui/browsers/hists.c|   2 +-
>  tools/perf/util/comm.c|  18 +++-
>  tools/perf/util/event.c   | 149 +---
>  tools/perf/util/event.h   |  14 ++-
>  tools/perf/util/evlist.c  |   5 +-
>  tools/perf/util/hist.c|  11 +--
>  tools/perf/util/machine.c | 158 
> +-
>  tools/perf/util/machine.h |  34 ++--
>  tools/perf/util/rb_resort.h   |   5 +-
>  tools/perf/util/sort.c|   8 +-
>  tools/perf/util/sort.h|   2 +-
>  tools/perf/util/thread.c  |  68 ---
>  tools/perf/util/thread.h  |   6 +-
>  tools/perf/util/top.h |   1 +
>  19 files changed, 376 insertions(+), 142 deletions(-)
> 
> -- 
> 2.5.5


Re: [PATCH RFC V2 00/10] perf top optimization

2017-09-13 Thread Arnaldo Carvalho de Melo
Em Sun, Sep 10, 2017 at 07:23:13PM -0700, kan.li...@intel.com escreveu:

So I got the first two patches already merged, and made some comments
about the other patches, please check those,

Thanks,

- Arnaldo
 
> Changes since V1:
>  - Patch 1: machine threads and hashtable related renaming (Arnaldo)
>  - Patch 6: use a smaller locked section for comm_str__put
>add a locked wrapper for comm_str__findnew  (Arnaldo)
> 
> Kan Liang (10):
>   perf tools: hashtable for machine threads
>   perf tools: using scandir to replace readdir
>   petf tools: using comm_str to replace comm in hist_entry
>   petf tools: introduce a new function to set namespaces id
>   perf tools: lock to protect thread list
>   perf tools: lock to protect comm_str rb tree
>   perf tools: change machine comm_exec type to atomic
>   perf top: implement multithreading for perf_event__synthesize_threads
>   perf top: add option to set the number of thread for event synthesize
>   perf top: switch back to overwrite mode
> 
>  tools/perf/builtin-kvm.c  |   3 +-
>  tools/perf/builtin-record.c   |   2 +-
>  tools/perf/builtin-top.c  |   9 +-
>  tools/perf/builtin-trace.c|  21 +++--
>  tools/perf/tests/mmap-thread-lookup.c |   2 +-
>  tools/perf/ui/browsers/hists.c|   2 +-
>  tools/perf/util/comm.c|  18 +++-
>  tools/perf/util/event.c   | 149 +---
>  tools/perf/util/event.h   |  14 ++-
>  tools/perf/util/evlist.c  |   5 +-
>  tools/perf/util/hist.c|  11 +--
>  tools/perf/util/machine.c | 158 
> +-
>  tools/perf/util/machine.h |  34 ++--
>  tools/perf/util/rb_resort.h   |   5 +-
>  tools/perf/util/sort.c|   8 +-
>  tools/perf/util/sort.h|   2 +-
>  tools/perf/util/thread.c  |  68 ---
>  tools/perf/util/thread.h  |   6 +-
>  tools/perf/util/top.h |   1 +
>  19 files changed, 376 insertions(+), 142 deletions(-)
> 
> -- 
> 2.5.5