Re: [PATCH v5 07/10] perf record: implement -z,--compression_level=n option and compression

2019-03-11 Thread Jiri Olsa
On Sun, Mar 10, 2019 at 07:17:08PM +0300, Alexey Budankov wrote:

SNIP

> > so to be on the same page.. normal processing without compression is:
> > 
> >   perf_mmap__push does:
> > push(mmap buf)
> >   record__pushfn
> > record__write
> >   write(buf)
> > 
> >   perf_mmap__aio_push does:
> > memcpy(aio buf, mmap buf)
> >   push(aio buf)
> > record__aio_pushfn
> >   record__aio_write
> > aio_write(aio buf)
> > 
> > 
> > and for compression it would be:
> > 
> >   perf_mmap__push does:
> > push(mmap buf)
> >   compress_push
> > memcpy(compress buffer, mmap buf)  EXTRA copy
> >   record__pushfn
> > record__write
> >   write(buf)
> > 
> >   perf_mmap__aio_push does:
> > memcpy(aio buf, mmap buf)
> >   memcpy(compress buffer, mmap buf)  EXTRA copy
> > push(aio buf)
> >   record__aio_pushfn
> > record__aio_write
> >   aio_write(aio buf)
> > 
> > 
> > side note: that actually makes me think why we even have perf_mmap__aio_push,
> > it looks like we could copy the buf in the callback push function with no harm?
> 
> Well, yes, perf_mmap__aio_push() can be avoided and perf_mmap__push() can be used
> both for serial and for AIO, moving all the specifics from mmap.c to the record
> code, like this:
> 
> Serial
>   perf_mmap__push(, record__pushfn)
> push(), possibly two times
>   record__pushfn()
> if (-z) zstd_compress(map->base => map->data) <-- compressing memcpy()
> record__write(-z ? map->data : map->base)
> AIO
>   record__aio_push()
> perf_mmap__push(, record__aio_pushfn())
>   push(), possibly two times
> record__aio_pushfn()
>   if (-z) zstd_compress(map->base => map->aio.data[i]) <-- compressing memcpy()
>   else memcpy(map->base => map->aio.data[i]) <-- plain memcpy()
>   record__aio_write(map->aio.data[i])
> 
> So now it looks optimal both from the performance and data-loss-reduction
> perspective and from the design perspective. What do you think?
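The agreed scheme above can be sketched roughly as follows. This is a hypothetical illustration, not the patch's actual code: the types, buffer sizes, and the pass-through zstd_compress() (which only copies, standing in for real ZSTD streaming compression) are all stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch: perf_mmap__push() stays generic and hands each ring-buffer chunk
 * to a push callback; record__pushfn() decides, inside the callback, whether
 * to run a "compressing memcpy" into a scratch buffer before writing. */

struct sink {
	unsigned char data[4096];	/* stands in for the perf.data file */
	size_t len;
};

/* stand-in for zstd_compress(src => dst); returns the "compressed" size */
static size_t zstd_compress(void *dst, size_t dst_len,
			    const void *src, size_t src_len)
{
	assert(src_len <= dst_len);
	memcpy(dst, src, src_len);	/* real code would compress here */
	return src_len;
}

/* record__write(): append a chunk to the output */
static int record__write(struct sink *s, const void *buf, size_t size)
{
	if (s->len + size > sizeof(s->data))
		return -1;
	memcpy(s->data + s->len, buf, size);
	s->len += size;
	return 0;
}

/* record__pushfn() for the serial case: compress in the callback when -z */
static int record__pushfn(struct sink *s, int zlevel,
			  unsigned char *scratch, size_t scratch_len,
			  void *buf, size_t size)
{
	if (zlevel) {			/* -z: compressing memcpy */
		size = zstd_compress(scratch, scratch_len, buf, size);
		buf = scratch;
	}
	return record__write(s, buf, size);
}

/* perf_mmap__push() sketch: calls the callback twice when the ring wraps */
static int perf_mmap__push(unsigned char *ring, size_t part1, size_t part2,
			   struct sink *s, int zlevel,
			   unsigned char *scratch, size_t scratch_len)
{
	if (record__pushfn(s, zlevel, scratch, scratch_len, ring, part1) < 0)
		return -1;
	if (part2 && record__pushfn(s, zlevel, scratch, scratch_len,
				    ring + part1, part2) < 0)
		return -1;
	return 0;
}
```

The key property of the design is visible here: compression happens inside the callback, so the generic mmap-push loop needs no compression knowledge at all.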

yes, that looks much better.. so we'd have record__pushfn
for standard (serial) and record__aio_pushfn for AIO

thanks,
jirka


Re: [PATCH v5 07/10] perf record: implement -z,--compression_level=n option and compression

2019-03-10 Thread Alexey Budankov
On 08.03.2019 13:46, Jiri Olsa wrote:
> On Thu, Mar 07, 2019 at 06:26:47PM +0300, Alexey Budankov wrote:
>>
>> On 07.03.2019 15:14, Jiri Olsa wrote:
>>> On Thu, Mar 07, 2019 at 11:39:46AM +0300, Alexey Budankov wrote:

 On 05.03.2019 15:25, Jiri Olsa wrote:
> On Fri, Mar 01, 2019 at 06:58:32PM +0300, Alexey Budankov wrote:
>
> SNIP
>
>>  
>>  /*
>>   * Increment md->refcount to guard md->data[idx] buffer
>> @@ -350,7 +357,7 @@ int perf_mmap__aio_push(struct perf_mmap *md, void 
>> *to, int idx,
>>  md->prev = head;
>>  perf_mmap__consume(md);
>>  
>> -rc = push(to, &md->aio.cblocks[idx], md->aio.data[idx], size0 + size, *off);
>> +rc = push(to, md->aio.data[idx], size0 + size, *off, &md->aio.cblocks[idx]);
>>  if (!rc) {
>>  *off += size0 + size;
>>  } else {
>> @@ -556,13 +563,15 @@ int perf_mmap__read_init(struct perf_mmap *map)
>>  }
>>  
>>  int perf_mmap__push(struct perf_mmap *md, void *to,
>> -int push(struct perf_mmap *map, void *to, void 
>> *buf, size_t size))
>> +int push(struct perf_mmap *map, void *to, void 
>> *buf, size_t size),
>> +perf_mmap__compress_fn_t compress, void *comp_data)
>>  {
>>  u64 head = perf_mmap__read_head(md);
>>  unsigned char *data = md->base + page_size;
>>  unsigned long size;
>>  void *buf;
>>  int rc = 0;
>> +size_t mmap_len = perf_mmap__mmap_len(md);
>>  
>>  rc = perf_mmap__read_init(md);
>>  if (rc < 0)
>> @@ -574,7 +583,10 @@ int perf_mmap__push(struct perf_mmap *md, void *to,
>>  buf = &data[md->start & md->mask];
>>  size = md->mask + 1 - (md->start & md->mask);
>>  md->start += size;
>> -
>> +if (compress) {
>> +size = compress(comp_data, md->data, mmap_len, 
>> buf, size);
>> +buf = md->data;
>> +}
>>  if (push(md, to, buf, size) < 0) {
>>  rc = -1;
>>  goto out;
>
> when we discussed the compress callback should be another layer
> in perf_mmap__push I was thinking more of the layered/fifo design,
> like:
>
> normally we call:
>
>   perf_mmap__push(... push = record__pushfn ...)
>   -> reads mmap data and calls push(data), which translates as:
>
>   record__pushfn(data);
>   - which stores the data
>
>
> for compressed it'd be:
>
>   perf_mmap__push(... push = compressed_push ...)
>
>   -> reads mmap data and calls push(data), which translates as:
>
>   compressed_push(data)
>   -> reads data, compresses them and calls the next push callback in line:
>
>   record__pushfn(data)
>   - which stores the data
>
>
> there'd need to be the logic for compressed_push to
> remember the 'next push' function
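The layered/fifo design quoted above could be sketched roughly like this. The names and the pass-through "compression" (a plain copy standing in for real ZSTD) are illustrative assumptions, not the actual perf code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch: each push callback may remember the next callback in line, so
 * compressed_push() can transform the data and forward it to record__pushfn(). */

struct push_layer;
typedef int (*push_fn_t)(struct push_layer *layer, void *buf, size_t size);

struct push_layer {
	push_fn_t fn;
	struct push_layer *next;	/* the remembered 'next push' callback */
	void *priv;
};

struct sink {
	unsigned char data[256];
	size_t len;
};

/* final layer: store the data, like record__pushfn() */
static int record__pushfn(struct push_layer *layer, void *buf, size_t size)
{
	struct sink *s = layer->priv;

	memcpy(s->data + s->len, buf, size);
	s->len += size;
	return 0;
}

/* middle layer: "compress" (here just a copy through a scratch buffer) and
 * call the next push callback in line */
static int compressed_push(struct push_layer *layer, void *buf, size_t size)
{
	unsigned char scratch[256];

	assert(size <= sizeof(scratch));
	memcpy(scratch, buf, size);	/* real code would compress here */
	return layer->next->fn(layer->next, scratch, size);
}
```

Chaining is then just wiring `next` pointers: a `compressed_push` layer whose `next` is the storing layer. The objection raised in the thread is that for AIO this adds an extra copy compared with compressing directly into the aio buffer.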

 That is suboptimal for AIO. Also, compression is an independent operation that
 could be applied at any of the push stages you mention.
>>>
>>> not sure what you mean by suboptimal, but I think
>>> that it can still happen in subsequent push callback
>>>

>
> but I think this was the original idea behind the
> perf_mmap__push -> it gets the data and pushes them for
> the next processing.. it should stay as simple as that

 Agree on keeping simplicity and, at the moment, there is no push to the next
 processing stage in the code, so the provided implementation fits both serial
 and AIO while sticking to simplicity as much as possible. If you see something
 that would fit better, please speak up and share.
>>>
>>> I have to insist that perf_mmap__push stays untouched
>>> and we do other processing in the push callbacks
>>
>> What about perf_mmap__aio_push()?
>>
>> Without compression it does 
>>  memcpy(), memcpy(), aio_push()
>>
>> With compression it does
>>  memcpy_with_compression(), memcpy_with_compression(), aio_push()
> 
> so to be on the same page.. normal processing without compression is:
> 
>   perf_mmap__push does:
>   push(mmap buf)
> record__pushfn
>   record__write
> write(buf)
> 
>   perf_mmap__aio_push does:
>   memcpy(aio buf, mmap buf)
> push(aio buf)
>   record__aio_pushfn
> record__aio_write
>   aio_write(aio buf)
> 
> 
> and for compression it would be:
> 
>   perf_mmap__push does:
>   push(mmap buf)
> compress_push
>   memcpy(compress buffer, mmap buf)  EXTRA copy
> record__pushfn
>   record__write
> write(buf)
> 
>   perf_mmap__aio_push does:
>   memcpy(aio buf, mmap buf)
> memcpy(compress buffer, mmap buf)  EXTRA copy
>   push(aio buf)
> record__aio_pushfn
>   record__aio_write
> aio_write(aio buf)

Re: [PATCH v5 07/10] perf record: implement -z,--compression_level=n option and compression

2019-03-08 Thread Jiri Olsa
On Thu, Mar 07, 2019 at 06:26:47PM +0300, Alexey Budankov wrote:
> 
> On 07.03.2019 15:14, Jiri Olsa wrote:
> > On Thu, Mar 07, 2019 at 11:39:46AM +0300, Alexey Budankov wrote:
> >>
> >> On 05.03.2019 15:25, Jiri Olsa wrote:
> >>> On Fri, Mar 01, 2019 at 06:58:32PM +0300, Alexey Budankov wrote:
> >>>
> >>> SNIP
> >>>
>   
>   /*
>    * Increment md->refcount to guard md->data[idx] buffer
>  @@ -350,7 +357,7 @@ int perf_mmap__aio_push(struct perf_mmap *md, void 
>  *to, int idx,
>   md->prev = head;
>   perf_mmap__consume(md);
>   
>  -rc = push(to, &md->aio.cblocks[idx], md->aio.data[idx], size0 + size, *off);
>  +rc = push(to, md->aio.data[idx], size0 + size, *off, &md->aio.cblocks[idx]);
>   if (!rc) {
>   *off += size0 + size;
>   } else {
>  @@ -556,13 +563,15 @@ int perf_mmap__read_init(struct perf_mmap *map)
>   }
>   
>   int perf_mmap__push(struct perf_mmap *md, void *to,
>  -int push(struct perf_mmap *map, void *to, void 
>  *buf, size_t size))
>  +int push(struct perf_mmap *map, void *to, void 
>  *buf, size_t size),
>  +perf_mmap__compress_fn_t compress, void *comp_data)
>   {
>   u64 head = perf_mmap__read_head(md);
>   unsigned char *data = md->base + page_size;
>   unsigned long size;
>   void *buf;
>   int rc = 0;
>  +size_t mmap_len = perf_mmap__mmap_len(md);
>   
>   rc = perf_mmap__read_init(md);
>   if (rc < 0)
>  @@ -574,7 +583,10 @@ int perf_mmap__push(struct perf_mmap *md, void *to,
>   buf = &data[md->start & md->mask];
>   size = md->mask + 1 - (md->start & md->mask);
>   md->start += size;
>  -
>  +if (compress) {
>  +size = compress(comp_data, md->data, mmap_len, 
>  buf, size);
>  +buf = md->data;
>  +}
>   if (push(md, to, buf, size) < 0) {
>   rc = -1;
>   goto out;
> >>>
> >>> when we discussed the compress callback should be another layer
> >>> in perf_mmap__push I was thinking more of the layered/fifo design,
> >>> like:
> >>>
> >>> normally we call:
> >>>
> >>>   perf_mmap__push(... push = record__pushfn ...)
> >>>   -> reads mmap data and calls push(data), which translates as:
> >>>
> >>>   record__pushfn(data);
> >>>   - which stores the data
> >>>
> >>>
> >>> for compressed it'd be:
> >>>
> >>>   perf_mmap__push(... push = compressed_push ...)
> >>>
> >>>   -> reads mmap data and calls push(data), which translates as:
> >>>
> >>>   compressed_push(data)
> >>> -> reads data, compresses them and calls the next push callback in line:
> >>>
> >>>   record__pushfn(data)
> >>>   - which stores the data
> >>>
> >>>
> >>> there'd need to be the logic for compressed_push to
> >>> remember the 'next push' function
> >>
> >> That is suboptimal for AIO. Also, compression is an independent operation that
> >> could be applied at any of the push stages you mention.
> > 
> > not sure what you mean by suboptimal, but I think
> > that it can still happen in subsequent push callback
> > 
> >>
> >>>
> >>> but I think this was the original idea behind the
> >>> perf_mmap__push -> it gets the data and pushes them for
> >>> the next processing.. it should stay as simple as that
> >>
> >> Agree on keeping simplicity and, at the moment, there is no push to the next
> >> processing stage in the code, so the provided implementation fits both serial
> >> and AIO while sticking to simplicity as much as possible. If you see something
> >> that would fit better, please speak up and share.
> > 
> > I have to insist that perf_mmap__push stays untouched
> > and we do other processing in the push callbacks
> 
> What about perf_mmap__aio_push()?
> 
> Without compression it does 
>   memcpy(), memcpy(), aio_push()
> 
> With compression it does
>   memcpy_with_compression(), memcpy_with_compression(), aio_push()

so to be on the same page.. normal processing without compression is:

  perf_mmap__push does:
push(mmap buf)
  record__pushfn
record__write
  write(buf)

  perf_mmap__aio_push does:
memcpy(aio buf, mmap buf)
  push(aio buf)
record__aio_pushfn
  record__aio_write
aio_write(aio buf)


and for compression it would be:

  perf_mmap__push does:
push(mmap buf)
  compress_push
memcpy(compress buffer, mmap buf)  EXTRA copy
  record__pushfn
record__write
  write(buf)

  perf_mmap__aio_push does:
memcpy(aio buf, mmap buf)
  memcpy(compress buffer, mmap buf)  EXTRA copy
push(aio buf)
  record__aio_pushfn
record__aio_write
  aio_write(aio buf)

Re: [PATCH v5 07/10] perf record: implement -z,--compression_level=n option and compression

2019-03-07 Thread Alexey Budankov


On 07.03.2019 18:26, Alexey Budankov wrote:
> 
> On 07.03.2019 15:14, Jiri Olsa wrote:
>> On Thu, Mar 07, 2019 at 11:39:46AM +0300, Alexey Budankov wrote:
>>>
>>> On 05.03.2019 15:25, Jiri Olsa wrote:
 On Fri, Mar 01, 2019 at 06:58:32PM +0300, Alexey Budankov wrote:

 SNIP

>  
>   /*
>* Increment md->refcount to guard md->data[idx] buffer
> @@ -350,7 +357,7 @@ int perf_mmap__aio_push(struct perf_mmap *md, void 
> *to, int idx,
>   md->prev = head;
>   perf_mmap__consume(md);
>  
> - rc = push(to, &md->aio.cblocks[idx], md->aio.data[idx], size0 + size, *off);
> + rc = push(to, md->aio.data[idx], size0 + size, *off, &md->aio.cblocks[idx]);
>   if (!rc) {
>   *off += size0 + size;
>   } else {
> @@ -556,13 +563,15 @@ int perf_mmap__read_init(struct perf_mmap *map)
>  }
>  
>  int perf_mmap__push(struct perf_mmap *md, void *to,
> - int push(struct perf_mmap *map, void *to, void *buf, size_t 
> size))
> + int push(struct perf_mmap *map, void *to, void *buf, size_t 
> size),
> + perf_mmap__compress_fn_t compress, void *comp_data)
>  {
>   u64 head = perf_mmap__read_head(md);
>   unsigned char *data = md->base + page_size;
>   unsigned long size;
>   void *buf;
>   int rc = 0;
> + size_t mmap_len = perf_mmap__mmap_len(md);
>  
>   rc = perf_mmap__read_init(md);
>   if (rc < 0)
> @@ -574,7 +583,10 @@ int perf_mmap__push(struct perf_mmap *md, void *to,
>   buf = &data[md->start & md->mask];
>   size = md->mask + 1 - (md->start & md->mask);
>   md->start += size;
> -
> + if (compress) {
> + size = compress(comp_data, md->data, mmap_len, buf, 
> size);
> + buf = md->data;
> + }
>   if (push(md, to, buf, size) < 0) {
>   rc = -1;
>   goto out;

 when we discussed the compress callback should be another layer
 in perf_mmap__push I was thinking more of the layered/fifo design,
 like:

 normally we call:

perf_mmap__push(... push = record__pushfn ...)
-> reads mmap data and calls push(data), which translates as:

record__pushfn(data);
- which stores the data


 for compressed it'd be:

perf_mmap__push(... push = compressed_push ...)

-> reads mmap data and calls push(data), which translates as:

compressed_push(data)
-> reads data, compresses them and calls the next push callback in line:

record__pushfn(data)
- which stores the data


 there'd need to be the logic for compressed_push to
 remember the 'next push' function
>>>
>>> That is suboptimal for AIO. Also, compression is an independent operation that
>>> could be applied at any of the push stages you mention.
>>
>> not sure what you mean by suboptimal, but I think
>> that it can still happen in subsequent push callback
>>
>>>

 but I think this was the original idea behind the
 perf_mmap__push -> it gets the data and pushes them for
 the next processing.. it should stay as simple as that
>>>
>>> Agree on keeping simplicity and, at the moment, there is no push to the next
>>> processing stage in the code, so the provided implementation fits both serial
>>> and AIO while sticking to simplicity as much as possible. If you see something
>>> that would fit better, please speak up and share.
>>
>> I have to insist that perf_mmap__push stays untouched
>> and we do other processing in the push callbacks
> 
> What about perf_mmap__aio_push()?
> 
> Without compression it does 
>   memcpy(), memcpy(), aio_push()
> 
> With compression it does
>   memcpy_with_compression(), memcpy_with_compression(), aio_push()
> 
> and a deviation that increases the number of copy operations, i.e. implementing
> three or more, is suboptimal in terms of runtime overhead and data-loss reduction.
> 
> Compression for serial streaming can be implemented in the push() callback.
> The AIO case would get compression via a parameter to aio_push().

I meant compress_fn and comp_data parameters _for_ aio_push().

> So both trace-writing schemes could be optimally extended.
> 
> ~Alexey
> 
>>
>> jirka
>>
> 


Re: [PATCH v5 07/10] perf record: implement -z,--compression_level=n option and compression

2019-03-07 Thread Alexey Budankov


On 07.03.2019 15:14, Jiri Olsa wrote:
> On Thu, Mar 07, 2019 at 11:39:46AM +0300, Alexey Budankov wrote:
>>
>> On 05.03.2019 15:25, Jiri Olsa wrote:
>>> On Fri, Mar 01, 2019 at 06:58:32PM +0300, Alexey Budankov wrote:
>>>
>>> SNIP
>>>
  
/*
 * Increment md->refcount to guard md->data[idx] buffer
 @@ -350,7 +357,7 @@ int perf_mmap__aio_push(struct perf_mmap *md, void 
 *to, int idx,
md->prev = head;
perf_mmap__consume(md);
  
 -  rc = push(to, &md->aio.cblocks[idx], md->aio.data[idx], size0 + size, *off);
 +  rc = push(to, md->aio.data[idx], size0 + size, *off, &md->aio.cblocks[idx]);
if (!rc) {
*off += size0 + size;
} else {
 @@ -556,13 +563,15 @@ int perf_mmap__read_init(struct perf_mmap *map)
  }
  
  int perf_mmap__push(struct perf_mmap *md, void *to,
 -  int push(struct perf_mmap *map, void *to, void *buf, size_t 
 size))
 +  int push(struct perf_mmap *map, void *to, void *buf, size_t 
 size),
 +  perf_mmap__compress_fn_t compress, void *comp_data)
  {
u64 head = perf_mmap__read_head(md);
unsigned char *data = md->base + page_size;
unsigned long size;
void *buf;
int rc = 0;
 +  size_t mmap_len = perf_mmap__mmap_len(md);
  
rc = perf_mmap__read_init(md);
if (rc < 0)
 @@ -574,7 +583,10 @@ int perf_mmap__push(struct perf_mmap *md, void *to,
buf = &data[md->start & md->mask];
size = md->mask + 1 - (md->start & md->mask);
md->start += size;
 -
 +  if (compress) {
 +  size = compress(comp_data, md->data, mmap_len, buf, 
 size);
 +  buf = md->data;
 +  }
if (push(md, to, buf, size) < 0) {
rc = -1;
goto out;
>>>
>>> when we discussed the compress callback should be another layer
>>> in perf_mmap__push I was thinking more of the layered/fifo design,
>>> like:
>>>
>>> normally we call:
>>>
>>> perf_mmap__push(... push = record__pushfn ...)
>>> -> reads mmap data and calls push(data), which translates as:
>>>
>>> record__pushfn(data);
>>> - which stores the data
>>>
>>>
>>> for compressed it'd be:
>>>
>>> perf_mmap__push(... push = compressed_push ...)
>>>
>>> -> reads mmap data and calls push(data), which translates as:
>>>
>>> compressed_push(data)
>>> -> reads data, compresses them and calls the next push callback in line:
>>>
>>> record__pushfn(data)
>>> - which stores the data
>>>
>>>
>>> there'd need to be the logic for compressed_push to
>>> remember the 'next push' function
>>
>> That is suboptimal for AIO. Also, compression is an independent operation that
>> could be applied at any of the push stages you mention.
> 
> not sure what you mean by suboptimal, but I think
> that it can still happen in subsequent push callback
> 
>>
>>>
>>> but I think this was the original idea behind the
>>> perf_mmap__push -> it gets the data and pushes them for
>>> the next processing.. it should stay as simple as that
>>
>> Agree on keeping simplicity and, at the moment, there is no push to the next
>> processing stage in the code, so the provided implementation fits both serial
>> and AIO while sticking to simplicity as much as possible. If you see something
>> that would fit better, please speak up and share.
> 
> I have to insist that perf_mmap__push stays untouched
> and we do other processing in the push callbacks

What about perf_mmap__aio_push()?

Without compression it does 
memcpy(), memcpy(), aio_push()

With compression it does
memcpy_with_compression(), memcpy_with_compression(), aio_push()

and a deviation that increases the number of copy operations, i.e. implementing
three or more, is suboptimal in terms of runtime overhead and data-loss reduction.

Compression for serial streaming can be implemented in the push() callback.
The AIO case would get compression via a parameter to aio_push().
So both trace-writing schemes could be optimally extended.
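That parameterized aio_push() idea might look roughly like this. It is a hypothetical sketch: the names, the compress_fn_t signature, and the pass-through "compressor" (a plain copy standing in for ZSTD) are assumptions for illustration, not perf's actual API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch: the AIO path already copies mmap data into an aio buffer, so
 * passing an optional compress_fn/comp_data pair lets that very copy become
 * a "compressing memcpy" without adding a third copy operation. */

typedef size_t (*compress_fn_t)(void *comp_data, void *dst, size_t dst_len,
				const void *src, size_t src_len);

/* stand-in "compression": plain copy (real code would run ZSTD here) */
static size_t copy_compress(void *comp_data, void *dst, size_t dst_len,
			    const void *src, size_t src_len)
{
	(void)comp_data;
	assert(src_len <= dst_len);
	memcpy(dst, src, src_len);
	return src_len;
}

/* perf_mmap__aio_push() sketch: one copy into the aio buffer, compressed or
 * plain depending on whether a compress callback was supplied */
static size_t aio_push(unsigned char *aio_buf, size_t aio_len,
		       const unsigned char *mmap_buf, size_t size,
		       compress_fn_t compress, void *comp_data)
{
	if (compress)
		return compress(comp_data, aio_buf, aio_len, mmap_buf, size);
	assert(size <= aio_len);
	memcpy(aio_buf, mmap_buf, size);	/* plain memcpy() path */
	return size;
}
```

This captures the argument being made: with the callback as a parameter, the copy count stays the same with or without -z, which the layered design cannot guarantee for AIO.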

~Alexey

> 
> jirka
> 


Re: [PATCH v5 07/10] perf record: implement -z,--compression_level=n option and compression

2019-03-07 Thread Alexey Budankov


On 07.03.2019 14:59, Jiri Olsa wrote:
> On Thu, Mar 07, 2019 at 11:26:16AM +0300, Alexey Budankov wrote:
>>
>> On 05.03.2019 15:26, Jiri Olsa wrote:
>>> On Fri, Mar 01, 2019 at 06:58:32PM +0300, Alexey Budankov wrote:
>>>
>>> SNIP
>>>
 +static size_t record__process_comp_header(void *record, size_t increment)
 +{
 +  struct compressed_event *event = record;
 +  size_t size = sizeof(struct compressed_event);
 +
 +  if (increment) {
 +  event->header.size += increment;
 +  return increment;
 +  } else {
 +  event->header.type = PERF_RECORD_COMPRESSED;
 +  event->header.size = size;
 +  return size;
 +  }
>>>
>>> so zstd_compress_stream_to_records calls this in the loop:
>>>
>>>   while (input.pos < input.size) {
>>> ...
>>> size = process_header(record, 0);
>>> ...
>>> size = process_header(record, size);
>>> }
>>>
>>> the header.size will get overwritten with every new iteration, no?
>>
>> Every new record is headed by a PERF_RECORD_COMPRESSED header and updated with
>> the resulting compressed frame size after compression.
>>
>> process_header(, 0) puts header, then 
>> compression puts compressed frame, then 
>> process_header(, N) updates header.size value (+=).
> 
> and then process_header(, 0) is called again no?

Yes, in the next iteration these three steps repeat for the next record.
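The two-phase header protocol described above can be sketched as follows. The struct layout mirrors the patch loosely, and the PERF_RECORD_COMPRESSED value is an illustrative placeholder, not necessarily perf's real constant:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of record__process_comp_header(): the compression loop calls this
 * once with increment == 0 to lay down a fresh PERF_RECORD_COMPRESSED header
 * for the record about to be emitted, then once more with the produced frame
 * size to grow header.size. Each loop iteration starts a new record, so the
 * "overwrite" only ever initializes the next record's header. */

#define PERF_RECORD_COMPRESSED 81	/* illustrative value for this sketch */

struct perf_event_header {
	unsigned int type;
	unsigned short misc;
	unsigned short size;
};

struct compressed_event {
	struct perf_event_header header;
	char data[];			/* compressed frame follows the header */
};

static size_t record__process_comp_header(void *record, size_t increment)
{
	struct compressed_event *event = record;
	size_t size = sizeof(struct compressed_event);

	if (increment) {		/* second call: account the frame size */
		event->header.size += increment;
		return increment;
	}
	/* first call: start a fresh record for this iteration */
	event->header.type = PERF_RECORD_COMPRESSED;
	event->header.size = size;
	return size;
}
```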

~Alexey

> 
> jirka
> 


Re: [PATCH v5 07/10] perf record: implement -z,--compression_level=n option and compression

2019-03-07 Thread Jiri Olsa
On Thu, Mar 07, 2019 at 11:39:46AM +0300, Alexey Budankov wrote:
> 
> On 05.03.2019 15:25, Jiri Olsa wrote:
> > On Fri, Mar 01, 2019 at 06:58:32PM +0300, Alexey Budankov wrote:
> > 
> > SNIP
> > 
> >>  
> >>/*
> >> * Increment md->refcount to guard md->data[idx] buffer
> >> @@ -350,7 +357,7 @@ int perf_mmap__aio_push(struct perf_mmap *md, void 
> >> *to, int idx,
> >>md->prev = head;
> >>perf_mmap__consume(md);
> >>  
> >> -  rc = push(to, &md->aio.cblocks[idx], md->aio.data[idx], size0 + size, *off);
> >> +  rc = push(to, md->aio.data[idx], size0 + size, *off, &md->aio.cblocks[idx]);
> >>if (!rc) {
> >>*off += size0 + size;
> >>} else {
> >> @@ -556,13 +563,15 @@ int perf_mmap__read_init(struct perf_mmap *map)
> >>  }
> >>  
> >>  int perf_mmap__push(struct perf_mmap *md, void *to,
> >> -  int push(struct perf_mmap *map, void *to, void *buf, size_t 
> >> size))
> >> +  int push(struct perf_mmap *map, void *to, void *buf, size_t 
> >> size),
> >> +  perf_mmap__compress_fn_t compress, void *comp_data)
> >>  {
> >>u64 head = perf_mmap__read_head(md);
> >>unsigned char *data = md->base + page_size;
> >>unsigned long size;
> >>void *buf;
> >>int rc = 0;
> >> +  size_t mmap_len = perf_mmap__mmap_len(md);
> >>  
> >>rc = perf_mmap__read_init(md);
> >>if (rc < 0)
> >> @@ -574,7 +583,10 @@ int perf_mmap__push(struct perf_mmap *md, void *to,
> >>buf = &data[md->start & md->mask];
> >>size = md->mask + 1 - (md->start & md->mask);
> >>md->start += size;
> >> -
> >> +  if (compress) {
> >> +  size = compress(comp_data, md->data, mmap_len, buf, 
> >> size);
> >> +  buf = md->data;
> >> +  }
> >>if (push(md, to, buf, size) < 0) {
> >>rc = -1;
> >>goto out;
> > 
> > when we discussed the compress callback should be another layer
> > in perf_mmap__push I was thinking more of the layered/fifo design,
> > like:
> > 
> > normally we call:
> > 
> > perf_mmap__push(... push = record__pushfn ...)
> > -> reads mmap data and calls push(data), which translates as:
> > 
> > record__pushfn(data);
> > - which stores the data
> > 
> > 
> > for compressed it'd be:
> > 
> > perf_mmap__push(... push = compressed_push ...)
> > 
> > -> reads mmap data and calls push(data), which translates as:
> > 
> > compressed_push(data)
> > -> reads data, compresses them and calls the next push callback in line:
> > 
> > record__pushfn(data)
> > - which stores the data
> > 
> > 
> > there'd need to be the logic for compressed_push to
> > remember the 'next push' function
> 
> That is suboptimal for AIO. Also, compression is an independent operation that
> could be applied at any of the push stages you mention.

not sure what you mean by suboptimal, but I think
that it can still happen in subsequent push callback

> 
> > 
> > but I think this was the original idea behind the
> > perf_mmap__push -> it gets the data and pushes them for
> > the next processing.. it should stay as simple as that
> 
> Agree on keeping simplicity and, at the moment, there is no push to the next
> processing stage in the code, so the provided implementation fits both serial
> and AIO while sticking to simplicity as much as possible. If you see something
> that would fit better, please speak up and share.

I have to insist that perf_mmap__push stays untouched
and we do other processing in the push callbacks

jirka


Re: [PATCH v5 07/10] perf record: implement -z,--compression_level=n option and compression

2019-03-07 Thread Jiri Olsa
On Thu, Mar 07, 2019 at 11:26:16AM +0300, Alexey Budankov wrote:
> 
> On 05.03.2019 15:26, Jiri Olsa wrote:
> > On Fri, Mar 01, 2019 at 06:58:32PM +0300, Alexey Budankov wrote:
> > 
> > SNIP
> > 
> >> +static size_t record__process_comp_header(void *record, size_t increment)
> >> +{
> >> +  struct compressed_event *event = record;
> >> +  size_t size = sizeof(struct compressed_event);
> >> +
> >> +  if (increment) {
> >> +  event->header.size += increment;
> >> +  return increment;
> >> +  } else {
> >> +  event->header.type = PERF_RECORD_COMPRESSED;
> >> +  event->header.size = size;
> >> +  return size;
> >> +  }
> > 
> > so zstd_compress_stream_to_records calls this in the loop:
> > 
> >   while (input.pos < input.size) {
> > ...
> > size = process_header(record, 0);
> > ...
> > size = process_header(record, size);
> > }
> > 
> > the header.size will get overwritten with every new iteration, no?
> 
> Every new record is headed by a PERF_RECORD_COMPRESSED header and updated with
> the resulting compressed frame size after compression.
> 
> process_header(, 0) puts header, then 
> compression puts compressed frame, then 
> process_header(, N) updates header.size value (+=).

and then process_header(, 0) is called again no?

jirka


Re: [PATCH v5 07/10] perf record: implement -z,--compression_level=n option and compression

2019-03-07 Thread Alexey Budankov


On 05.03.2019 15:25, Jiri Olsa wrote:
> On Fri, Mar 01, 2019 at 06:58:32PM +0300, Alexey Budankov wrote:
> 
> SNIP
> 
>>  
>>  /*
>>   * Increment md->refcount to guard md->data[idx] buffer
>> @@ -350,7 +357,7 @@ int perf_mmap__aio_push(struct perf_mmap *md, void *to, 
>> int idx,
>>  md->prev = head;
>>  perf_mmap__consume(md);
>>  
>> -rc = push(to, &md->aio.cblocks[idx], md->aio.data[idx], size0 + size, *off);
>> +rc = push(to, md->aio.data[idx], size0 + size, *off, &md->aio.cblocks[idx]);
>>  if (!rc) {
>>  *off += size0 + size;
>>  } else {
>> @@ -556,13 +563,15 @@ int perf_mmap__read_init(struct perf_mmap *map)
>>  }
>>  
>>  int perf_mmap__push(struct perf_mmap *md, void *to,
>> -int push(struct perf_mmap *map, void *to, void *buf, size_t 
>> size))
>> +int push(struct perf_mmap *map, void *to, void *buf, size_t 
>> size),
>> +perf_mmap__compress_fn_t compress, void *comp_data)
>>  {
>>  u64 head = perf_mmap__read_head(md);
>>  unsigned char *data = md->base + page_size;
>>  unsigned long size;
>>  void *buf;
>>  int rc = 0;
>> +size_t mmap_len = perf_mmap__mmap_len(md);
>>  
>>  rc = perf_mmap__read_init(md);
>>  if (rc < 0)
>> @@ -574,7 +583,10 @@ int perf_mmap__push(struct perf_mmap *md, void *to,
>>  buf = &data[md->start & md->mask];
>>  size = md->mask + 1 - (md->start & md->mask);
>>  md->start += size;
>> -
>> +if (compress) {
>> +size = compress(comp_data, md->data, mmap_len, buf, 
>> size);
>> +buf = md->data;
>> +}
>>  if (push(md, to, buf, size) < 0) {
>>  rc = -1;
>>  goto out;
> 
> when we discussed the compress callback should be another layer
> in perf_mmap__push I was thinking more of the layered/fifo design,
> like:
> 
> normally we call:
> 
>   perf_mmap__push(... push = record__pushfn ...)
>   -> reads mmap data and calls push(data), which translates as:
> 
>   record__pushfn(data);
>   - which stores the data
> 
> 
> for compressed it'd be:
> 
>   perf_mmap__push(... push = compressed_push ...)
> 
>   -> reads mmap data and calls push(data), which translates as:
> 
>   compressed_push(data)
>   -> reads data, compresses them and calls the next push callback in line:
> 
>   record__pushfn(data)
>   - which stores the data
> 
> 
> there'd need to be the logic for compressed_push to
> remember the 'next push' function

That is suboptimal for AIO. Also, compression is an independent operation that
could be applied at any of the push stages you mention.

> 
> but I think this was the original idea behind the
> perf_mmap__push -> it gets the data and pushes them for
> the next processing.. it should stay as simple as that

Agree on keeping simplicity and, at the moment, there is no push to the next
processing stage in the code, so the provided implementation fits both serial
and AIO while sticking to simplicity as much as possible. If you see something
that would fit better, please speak up and share.

~Alexey

> 
> jirka
> 


Re: [PATCH v5 07/10] perf record: implement -z,--compression_level=n option and compression

2019-03-07 Thread Alexey Budankov


On 05.03.2019 15:26, Jiri Olsa wrote:
> On Fri, Mar 01, 2019 at 06:58:32PM +0300, Alexey Budankov wrote:
> 
> SNIP
> 
>> +static int record__aio_enabled(struct record *rec);
>> +
>>  static void record__aio_mmap_read_sync(struct record *rec)
>>  {
>>  int i;
>>  struct perf_evlist *evlist = rec->evlist;
>>  struct perf_mmap *maps = evlist->mmap;
>>  
>> -if (!rec->opts.nr_cblocks)
>> +if (!record__aio_enabled(rec))
>>  return;
>>  
>>  for (i = 0; i < evlist->nr_mmaps; i++) {
>> @@ -292,13 +294,17 @@ static int record__aio_parse(const struct option *opt,
>>  
>>  if (unset) {
>>  opts->nr_cblocks = 0;
>> -} else {
>> -if (str)
>> -opts->nr_cblocks = strtol(str, NULL, 0);
>> -if (!opts->nr_cblocks)
>> -opts->nr_cblocks = nr_cblocks_default;
>> +return 0;
>>  }
>>  
>> +if (str)
>> +opts->nr_cblocks = strtol(str, NULL, 0);
>> +if (!opts->nr_cblocks)
>> +opts->nr_cblocks = nr_cblocks_default;
>> +
>> +if (opts->nr_cblocks > nr_cblocks_max)
>> +opts->nr_cblocks = nr_cblocks_max;
>> +
>>  return 0;
> 
> does not seem to be related to the patch, if that's the case please separate

in v7.

~Alexey

> 
> jirka
> 


Re: [PATCH v5 07/10] perf record: implement -z,--compression_level=n option and compression

2019-03-07 Thread Alexey Budankov


On 05.03.2019 15:26, Jiri Olsa wrote:
> On Fri, Mar 01, 2019 at 06:58:32PM +0300, Alexey Budankov wrote:
> 
> SNIP
> 
>> +static size_t record__process_comp_header(void *record, size_t increment)
>> +{
>> +struct compressed_event *event = record;
>> +size_t size = sizeof(struct compressed_event);
>> +
>> +if (increment) {
>> +event->header.size += increment;
>> +return increment;
>> +} else {
>> +event->header.type = PERF_RECORD_COMPRESSED;
>> +event->header.size = size;
>> +return size;
>> +}
> 
> so zstd_compress_stream_to_records calls this in the loop:
> 
>   while (input.pos < input.size) {
>   ...
> size = process_header(record, 0);
>   ...
> size = process_header(record, size);
>   }
> 
> the header.size will get overwritten with every new iteration, no?

Every new record is headed by a PERF_RECORD_COMPRESSED header and updated with 
the resulting compressed frame size after compression. 

process_header(, 0) puts the header, then 
compression puts the compressed frame, then 
process_header(, N) updates the header.size value (+=).

~Alexey

> 
> jirka
> 


Re: [PATCH v5 07/10] perf record: implement -z,--compression_level=n option and compression

2019-03-07 Thread Alexey Budankov


On 05.03.2019 15:26, Jiri Olsa wrote:
> On Fri, Mar 01, 2019 at 06:58:32PM +0300, Alexey Budankov wrote:
> 
> SNIP
> 
>>  
>> +static size_t record__process_comp_header(void *record, size_t increment)
>> +{
>> +struct compressed_event *event = record;
>> +size_t size = sizeof(struct compressed_event);
>> +
>> +if (increment) {
>> +event->header.size += increment;
>> +return increment;
>> +} else {
>> +event->header.type = PERF_RECORD_COMPRESSED;
>> +event->header.size = size;
>> +return size;
>> +}
>> +}
>> +
>> +static size_t record__zstd_compress(void *data, void *dst, size_t dst_size,
> 
> we use the __ to mark 'struct' related function,
> which supposed to be the first argument.. IMO zstd_compress
> will be fine in here

in v7.

~Alexey

> 
> jirka
> 


Re: [PATCH v5 07/10] perf record: implement -z,--compression_level=n option and compression

2019-03-07 Thread Alexey Budankov


On 05.03.2019 15:25, Jiri Olsa wrote:
> On Fri, Mar 01, 2019 at 06:58:32PM +0300, Alexey Budankov wrote:
> 
> SNIP
> 
>> +static size_t record__zstd_compress(void *data, void *dst, size_t dst_size,
>> +void *src, size_t src_size)
>> +{
>> +size_t compressed;
>> +struct perf_session *session = data;
>> +/* maximum size of record data size (2^16 - 1 - header) */
>> +const size_t max_record_size = (1 << 8 * sizeof(u16)) -
> 
> I dont follow this calculation, could you just use PERF_SAMPLE_MAX_SIZE?

in v7.

~Alexey

> 
> jirka
> 
>> +1 - sizeof(struct compressed_event);
>> +
> 
> SNIP
> 


Re: [PATCH v5 07/10] perf record: implement -z,--compression_level=n option and compression

2019-03-05 Thread Jiri Olsa
On Fri, Mar 01, 2019 at 06:58:32PM +0300, Alexey Budankov wrote:

SNIP

> +static size_t record__process_comp_header(void *record, size_t increment)
> +{
> + struct compressed_event *event = record;
> + size_t size = sizeof(struct compressed_event);
> +
> + if (increment) {
> + event->header.size += increment;
> + return increment;
> + } else {
> + event->header.type = PERF_RECORD_COMPRESSED;
> + event->header.size = size;
> + return size;
> + }

so zstd_compress_stream_to_records calls this in the loop:

  while (input.pos < input.size) {
...
size = process_header(record, 0);
...
size = process_header(record, size);
}

the header.size will get overwritten with every new iteration, no?

jirka


Re: [PATCH v5 07/10] perf record: implement -z,--compression_level=n option and compression

2019-03-05 Thread Jiri Olsa
On Fri, Mar 01, 2019 at 06:58:32PM +0300, Alexey Budankov wrote:

SNIP

> +static int record__aio_enabled(struct record *rec);
> +
>  static void record__aio_mmap_read_sync(struct record *rec)
>  {
>   int i;
>   struct perf_evlist *evlist = rec->evlist;
>   struct perf_mmap *maps = evlist->mmap;
>  
> - if (!rec->opts.nr_cblocks)
> + if (!record__aio_enabled(rec))
>   return;
>  
>   for (i = 0; i < evlist->nr_mmaps; i++) {
> @@ -292,13 +294,17 @@ static int record__aio_parse(const struct option *opt,
>  
>   if (unset) {
>   opts->nr_cblocks = 0;
> - } else {
> - if (str)
> - opts->nr_cblocks = strtol(str, NULL, 0);
> - if (!opts->nr_cblocks)
> - opts->nr_cblocks = nr_cblocks_default;
> + return 0;
>   }
>  
> + if (str)
> + opts->nr_cblocks = strtol(str, NULL, 0);
> + if (!opts->nr_cblocks)
> + opts->nr_cblocks = nr_cblocks_default;
> +
> + if (opts->nr_cblocks > nr_cblocks_max)
> + opts->nr_cblocks = nr_cblocks_max;
> +
>   return 0;

does not seem to be related to the patch, if that's the case please separate

jirka


Re: [PATCH v5 07/10] perf record: implement -z,--compression_level=n option and compression

2019-03-05 Thread Jiri Olsa
On Fri, Mar 01, 2019 at 06:58:32PM +0300, Alexey Budankov wrote:

SNIP

>  
> +static size_t record__process_comp_header(void *record, size_t increment)
> +{
> + struct compressed_event *event = record;
> + size_t size = sizeof(struct compressed_event);
> +
> + if (increment) {
> + event->header.size += increment;
> + return increment;
> + } else {
> + event->header.type = PERF_RECORD_COMPRESSED;
> + event->header.size = size;
> + return size;
> + }
> +}
> +
> +static size_t record__zstd_compress(void *data, void *dst, size_t dst_size,

we use the __ to mark 'struct' related function,
which supposed to be the first argument.. IMO zstd_compress
will be fine in here

jirka


Re: [PATCH v5 07/10] perf record: implement -z,--compression_level=n option and compression

2019-03-05 Thread Jiri Olsa
On Fri, Mar 01, 2019 at 06:58:32PM +0300, Alexey Budankov wrote:

SNIP

> +static size_t record__zstd_compress(void *data, void *dst, size_t dst_size,
> + void *src, size_t src_size)
> +{
> + size_t compressed;
> + struct perf_session *session = data;
> + /* maximum size of record data size (2^16 - 1 - header) */
> + const size_t max_record_size = (1 << 8 * sizeof(u16)) -

I dont follow this calculation, could you just use PERF_SAMPLE_MAX_SIZE?

jirka

> + 1 - sizeof(struct compressed_event);
> +

SNIP


Re: [PATCH v5 07/10] perf record: implement -z,--compression_level=n option and compression

2019-03-05 Thread Jiri Olsa
On Fri, Mar 01, 2019 at 06:58:32PM +0300, Alexey Budankov wrote:

SNIP

>  
>   /*
>* Increment md->refcount to guard md->data[idx] buffer
> @@ -350,7 +357,7 @@ int perf_mmap__aio_push(struct perf_mmap *md, void *to, int idx,
>   md->prev = head;
>   perf_mmap__consume(md);
>  
> - rc = push(to, &md->aio.cblocks[idx], md->aio.data[idx], size0 + size, *off);
> + rc = push(to, md->aio.data[idx], size0 + size, *off, &md->aio.cblocks[idx]);
>   if (!rc) {
>   *off += size0 + size;
>   } else {
> @@ -556,13 +563,15 @@ int perf_mmap__read_init(struct perf_mmap *map)
>  }
>  
>  int perf_mmap__push(struct perf_mmap *md, void *to,
> - int push(struct perf_mmap *map, void *to, void *buf, size_t size))
> + int push(struct perf_mmap *map, void *to, void *buf, size_t size),
> + perf_mmap__compress_fn_t compress, void *comp_data)
>  {
>   u64 head = perf_mmap__read_head(md);
>   unsigned char *data = md->base + page_size;
>   unsigned long size;
>   void *buf;
>   int rc = 0;
> + size_t mmap_len = perf_mmap__mmap_len(md);
>  
>   rc = perf_mmap__read_init(md);
>   if (rc < 0)
> @@ -574,7 +583,10 @@ int perf_mmap__push(struct perf_mmap *md, void *to,
>   buf = &data[md->start & md->mask];
>   size = md->mask + 1 - (md->start & md->mask);
>   md->start += size;
> -
> + if (compress) {
> + size = compress(comp_data, md->data, mmap_len, buf, size);
> + buf = md->data;
> + }
>   if (push(md, to, buf, size) < 0) {
>   rc = -1;
>   goto out;

when we discussed the compress callback should be another layer
in perf_mmap__push I was thinking more of the layered/fifo design,
like:

normaly we call:

perf_mmap__push(... push = record__pushfn ...)
-> reads mmap data and calls push(data), which translates as:

record__pushfn(data);
- which stores the data


for compressed it'd be:

perf_mmap__push(... push = compressed_push ...)

-> reads mmap data and calls push(data), which translates as:

compressed_push(data)
-> reads data, compresses them and calls, next push 
callback in line:

record__pushfn(data)
- which stores the data


there'd need to be the logic for compressed_push to
remember the 'next push' function

but I think this was the original idea behind the
perf_mmap__push -> it gets the data and pushes them for
the next processing.. it should stay as simple as that

jirka


Re: [PATCH v5 07/10] perf record: implement -z,--compression_level=n option and compression

2019-03-05 Thread Alexey Budankov


On 05.03.2019 3:01, Andi Kleen wrote:
> On Fri, Mar 01, 2019 at 06:58:32PM +0300, Alexey Budankov wrote:
> 
> Could do this as a follow up patch, but at some point the new
> records need to be documented in Documentation/perf.data-file-format.txt

Well, let's have it as a part of v6 04/10.

Thanks,
Alexey

> 
> -Andi
> 


Re: [PATCH v5 07/10] perf record: implement -z,--compression_level=n option and compression

2019-03-04 Thread Andi Kleen
On Fri, Mar 01, 2019 at 06:58:32PM +0300, Alexey Budankov wrote:

Could do this as a follow up patch, but at some point the new
records need to be documented in Documentation/perf.data-file-format.txt

-Andi