Re: [PATCH] pgbench log file headers

2024-03-30 Thread Andrey M. Borodin



> On 6 Jan 2024, at 20:20, vignesh C  wrote:
> 
> One of the test has failed in CFBOT at[1] with:

Hi Adam,

This is a kind reminder that CF entry [0] is waiting for an update. Thanks!


Best regards, Andrey Borodin.

[0] https://commitfest.postgresql.org/47/4660/




Re: [PATCH] pgbench log file headers

2024-01-06 Thread vignesh C
On Tue, 21 Nov 2023 at 09:52, Adam Hendel  wrote:
>
> Hello,
>
> On Mon, Nov 13, 2023 at 6:01 PM Andres Freund  wrote:
>>
>> Hi,
>>
>> On 2023-11-13 11:55:07 -0600, Adam Hendel wrote:
>> > Currently, pgbench will log individual transactions to a logfile when the
>> > `--log` parameter flag is provided. The logfile, however, does not include
>> > column header. It has become a fairly standard expectation of users to have
>> > column headers present in flat files. Without the header in the pgbench log
>> > files, new users must navigate to the docs and piece together the column
>> > headers themselves. Most industry leading frameworks have tooling built in
>> > to read column headers though, for example python/pandas read_csv().
>>
>> The disadvantage of doing that is that a single pgbench run with --log will
>> generate many files when using -j, to avoid contention. The easiest way to
>> deal with that after the run is to cat all the log files to a larger one. If
>> you do that with headers present, you obviously have a few bogus rows (the
>> heades from the various files).
>>
>> I guess you could avoid the "worst" of that by documenting that the combined
>> log file should be built by
>>   cat {$log_prefix}.${pid} {$log_prefix}.${pid}.*
>> and only outputting the header in the file generated by the main thread.
>>
>>
>> > We can improve the experience for users by adding column headers to pgbench
>> > logfiles with an optional command line flag, `--log-header`. This will keep
>> > the API backwards compatible by making users opt-in to the column headers.
>> > It follows the existing pattern of having conditional flags in pgbench’s
>> > API; the `--log` option would have both –log-prefix and –log-header if this
>> > work is accepted.
>>
>> > The implementation considers the column headers only when the
>> > `--log-header` flag is present. The values for the columns are taken
>> > directly from the “Per-Transaction Logging” section in
>> > https://www.postgresql.org/docs/current/pgbench.html and takes into account
>> > the conditional columns `schedule_lag` and `retries`.
>> >
>> >
>> > Below is an example of what that logfile will look like:
>> >
>> >
>> > pgbench  postgres://postgres:postgres@localhost:5432/postgres --log
>> > --log-header
>> >
>> > client_id transaction_no time script_no time_epoch time_us
>>
>> Independent of your patch, but we imo ought to combine time_epoch time_us in
>> the log output. There's no point in forcing consumers to combine those 
>> fields,
>> and it makes logging more expensive...  And if we touch this, we should just
>> switch to outputting nanoseconds instead of microseconds.
>>
>> It also is quite odd that we have "time" and "time_epoch", "time_us", where
>> time is the elapsed time of an individual "transaction" and time_epoch +
>> time_us together are a wall-clock timestamp.  Without differentiating between
>> those, the column headers seem not very useful, because one needs to look in
>> the documentation to understand the fields.
>>
>>
>> I don't think there's all that strong a need for backward compatibility in
>> pgbench. We could just change the columns as I suggest above and then always
>> emit the header in the "main" log file.
>
>
> I updated the patch to always log the header and only off the main thread. As 
> for the time headers, I will work on renaming/combining those in a separate 
> commit as:

One of the test has failed in CFBOT at[1] with:
[09:15:00.526](0.000s) not ok 422 - transaction count for
/tmp/cirrus-ci-build/build/testrun/pgbench/001_pgbench_with_server/data/t_001_pgbench_with_server_main_data/001_pgbench_log_3.25193
(11)
[09:15:00.526](0.000s) #   Failed test 'transaction count for
/tmp/cirrus-ci-build/build/testrun/pgbench/001_pgbench_with_server/data/t_001_pgbench_with_server_main_data/001_pgbench_log_3.25193
(11)'
#   at /tmp/cirrus-ci-build/src/bin/pgbench/t/001_pgbench_with_server.pl
line 1257.
[09:15:00.526](0.000s) not ok 423 - transaction format for 001_pgbench_log_3
[09:15:00.526](0.000s) #   Failed test 'transaction format for
001_pgbench_log_3'
#   at /tmp/cirrus-ci-build/src/bin/pgbench/t/001_pgbench_with_server.pl
line 1257.
# Log entry not matching: client_id transaction_no time script_no
time_epoch time_us
# Running: pgbench --no-vacuum -f
/tmp/cirrus-ci-build/build/testrun/pgbench/001_pgbench_with_server/data/t_001_pgbench_with_server_main_data/001_pgbench_incomplete_transaction_block

More details for the same is available at [2]

[1] - https://cirrus-ci.com/task/5139049757802496
[2] - 
https://api.cirrus-ci.com/v1/artifact/task/5139049757802496/testrun/build/testrun/pgbench/001_pgbench_with_server/log/regress_log_001_pgbench_with_server

Regards,
Vignesh




Re: [PATCH] pgbench log file headers

2023-11-20 Thread Adam Hendel
Hello,

On Mon, Nov 13, 2023 at 6:01 PM Andres Freund  wrote:

> Hi,
>
> On 2023-11-13 11:55:07 -0600, Adam Hendel wrote:
> > Currently, pgbench will log individual transactions to a logfile when the
> > `--log` parameter flag is provided. The logfile, however, does not
> include
> > column header. It has become a fairly standard expectation of users to
> have
> > column headers present in flat files. Without the header in the pgbench
> log
> > files, new users must navigate to the docs and piece together the column
> > headers themselves. Most industry leading frameworks have tooling built
> in
> > to read column headers though, for example python/pandas read_csv().
>
> The disadvantage of doing that is that a single pgbench run with --log will
> generate many files when using -j, to avoid contention. The easiest way to
> deal with that after the run is to cat all the log files to a larger one.
> If
> you do that with headers present, you obviously have a few bogus rows (the
> heades from the various files).
>
> I guess you could avoid the "worst" of that by documenting that the
> combined
> log file should be built by
>   cat {$log_prefix}.${pid} {$log_prefix}.${pid}.*
> and only outputting the header in the file generated by the main thread.
>
>
> > We can improve the experience for users by adding column headers to
> pgbench
> > logfiles with an optional command line flag, `--log-header`. This will
> keep
> > the API backwards compatible by making users opt-in to the column
> headers.
> > It follows the existing pattern of having conditional flags in pgbench’s
> > API; the `--log` option would have both –log-prefix and –log-header if
> this
> > work is accepted.
>
> > The implementation considers the column headers only when the
> > `--log-header` flag is present. The values for the columns are taken
> > directly from the “Per-Transaction Logging” section in
> > https://www.postgresql.org/docs/current/pgbench.html and takes into
> account
> > the conditional columns `schedule_lag` and `retries`.
> >
> >
> > Below is an example of what that logfile will look like:
> >
> >
> > pgbench  postgres://postgres:postgres@localhost:5432/postgres --log
> > --log-header
> >
> > client_id transaction_no time script_no time_epoch time_us
>
> Independent of your patch, but we imo ought to combine time_epoch time_us
> in
> the log output. There's no point in forcing consumers to combine those
> fields,
> and it makes logging more expensive...  And if we touch this, we should
> just
> switch to outputting nanoseconds instead of microseconds.
>
> It also is quite odd that we have "time" and "time_epoch", "time_us", where
> time is the elapsed time of an individual "transaction" and time_epoch +
> time_us together are a wall-clock timestamp.  Without differentiating
> between
> those, the column headers seem not very useful, because one needs to look
> in
> the documentation to understand the fields.
>
>
> I don't think there's all that strong a need for backward compatibility in
> pgbench. We could just change the columns as I suggest above and then
> always
> emit the header in the "main" log file.
>

I updated the patch to always log the header and only off the main thread.
As for the time headers, I will work on renaming/combining those in a
separate commit as:

time -> time_elapsed
time_epoch + time_us -> time_completion_us



> Greetings,
>
> Andres Freund
>


v2-0001-Adds-colum-headers-to-the-log-file-produced-by-pg.patch
Description: Binary data


Re: [PATCH] pgbench log file headers

2023-11-15 Thread Adam Hendel
Hello


On Mon, Nov 13, 2023 at 6:01 PM Andres Freund  wrote:

> Hi,
>
> On 2023-11-13 11:55:07 -0600, Adam Hendel wrote:
> > Currently, pgbench will log individual transactions to a logfile when the
> > `--log` parameter flag is provided. The logfile, however, does not
> include
> > column header. It has become a fairly standard expectation of users to
> have
> > column headers present in flat files. Without the header in the pgbench
> log
> > files, new users must navigate to the docs and piece together the column
> > headers themselves. Most industry leading frameworks have tooling built
> in
> > to read column headers though, for example python/pandas read_csv().
>
> The disadvantage of doing that is that a single pgbench run with --log will
> generate many files when using -j, to avoid contention. The easiest way to
> deal with that after the run is to cat all the log files to a larger one.
> If
> you do that with headers present, you obviously have a few bogus rows (the
> heades from the various files).
>
> I guess you could avoid the "worst" of that by documenting that the
> combined
> log file should be built by
>   cat {$log_prefix}.${pid} {$log_prefix}.${pid}.*
> and only outputting the header in the file generated by the main thread.
>
>
> We can improve the experience for users by adding column headers to
> pgbench
> > logfiles with an optional command line flag, `--log-header`. This will
> keep
> > the API backwards compatible by making users opt-in to the column
> headers.
> > It follows the existing pattern of having conditional flags in pgbench’s
> > API; the `--log` option would have both –log-prefix and –log-header if
> this
> > work is accepted.
>
> > The implementation considers the column headers only when the
> > `--log-header` flag is present. The values for the columns are taken
> > directly from the “Per-Transaction Logging” section in
> > https://www.postgresql.org/docs/current/pgbench.html and takes into
> account
> > the conditional columns `schedule_lag` and `retries`.
> >
> >
> > Below is an example of what that logfile will look like:
> >
> >
> > pgbench  postgres://postgres:postgres@localhost:5432/postgres --log
> > --log-header
> >
> > client_id transaction_no time script_no time_epoch time_us
>
> Independent of your patch, but we imo ought to combine time_epoch time_us
> in
> the log output. There's no point in forcing consumers to combine those
> fields,
> and it makes logging more expensive...  And if we touch this, we should
> just
> switch to outputting nanoseconds instead of microseconds.
>
> It also is quite odd that we have "time" and "time_epoch", "time_us", where
> time is the elapsed time of an individual "transaction" and time_epoch +
> time_us together are a wall-clock timestamp.  Without differentiating
> between
> those, the column headers seem not very useful, because one needs to look
> in
> the documentation to understand the fields.


> I don't think there's all that strong a need for backward compatibility in
> pgbench. We could just change the columns as I suggest above and then
> always
> emit the header in the "main" log file.
>
>
Do you think this should be done in separate patches?

First patch: log the column headers in the "main" log file. No --log-header
flag, make it the default behavior of --log.

Next patch: collapse "time_epoch" and "time_us" in the log output and give
the "time" column a name that is more clear.

Adam


> Greetings,
>
> Andres Freund
>


Re: [PATCH] pgbench log file headers

2023-11-13 Thread Andres Freund
Hi,

On 2023-11-13 11:55:07 -0600, Adam Hendel wrote:
> Currently, pgbench will log individual transactions to a logfile when the
> `--log` parameter flag is provided. The logfile, however, does not include
> column header. It has become a fairly standard expectation of users to have
> column headers present in flat files. Without the header in the pgbench log
> files, new users must navigate to the docs and piece together the column
> headers themselves. Most industry leading frameworks have tooling built in
> to read column headers though, for example python/pandas read_csv().

The disadvantage of doing that is that a single pgbench run with --log will
generate many files when using -j, to avoid contention. The easiest way to
deal with that after the run is to cat all the log files to a larger one. If
you do that with headers present, you obviously have a few bogus rows (the
heades from the various files).

I guess you could avoid the "worst" of that by documenting that the combined
log file should be built by
  cat {$log_prefix}.${pid} {$log_prefix}.${pid}.*
and only outputting the header in the file generated by the main thread.


> We can improve the experience for users by adding column headers to pgbench
> logfiles with an optional command line flag, `--log-header`. This will keep
> the API backwards compatible by making users opt-in to the column headers.
> It follows the existing pattern of having conditional flags in pgbench’s
> API; the `--log` option would have both –log-prefix and –log-header if this
> work is accepted.

> The implementation considers the column headers only when the
> `--log-header` flag is present. The values for the columns are taken
> directly from the “Per-Transaction Logging” section in
> https://www.postgresql.org/docs/current/pgbench.html and takes into account
> the conditional columns `schedule_lag` and `retries`.
> 
> 
> Below is an example of what that logfile will look like:
> 
> 
> pgbench  postgres://postgres:postgres@localhost:5432/postgres --log
> --log-header
> 
> client_id transaction_no time script_no time_epoch time_us

Independent of your patch, but we imo ought to combine time_epoch time_us in
the log output. There's no point in forcing consumers to combine those fields,
and it makes logging more expensive...  And if we touch this, we should just
switch to outputting nanoseconds instead of microseconds.

It also is quite odd that we have "time" and "time_epoch", "time_us", where
time is the elapsed time of an individual "transaction" and time_epoch +
time_us together are a wall-clock timestamp.  Without differentiating between
those, the column headers seem not very useful, because one needs to look in
the documentation to understand the fields.


I don't think there's all that strong a need for backward compatibility in
pgbench. We could just change the columns as I suggest above and then always
emit the header in the "main" log file.

Greetings,

Andres Freund




Re: [PATCH] pgbench log file headers

2023-11-13 Thread Andrew Atkinson
Hi Adam. Column headers in pgbench log files seem helpful. Besides
programs, it seems helpful for humans to understand the column data as
well. I was able to apply your patch and verify that the headers are added
to the log file:

andy@MacBook-Air-4 ~/P/postgres (master)> rm pgbench_log.*

andy@MacBook-Air-4 ~/P/postgres (master)> src/bin/pgbench/pgbench
postgres://andy:@localhost:5432/postgres --log --log-header

pgbench (17devel)




andy@MacBook-Air-4 ~/P/postgres (master)> cat pgbench_log.*

client_id transaction_no time script_no time_epoch time_us

0 1 8435 0 1699902315 902700

0 2 1130 0 1699902315 903973

...



The generated pgbench_log.62387 log file showed headers "client_id
transaction_no time script_no time_epoch time_us". Hope that helps with
your patch acceptance journey.


Good luck!


Andrew Atkinson

On Mon, Nov 13, 2023 at 11:55 AM Adam Hendel  wrote:

> Hello Hackers!
>
> Currently, pgbench will log individual transactions to a logfile when the
> `--log` parameter flag is provided. The logfile, however, does not include
> column header. It has become a fairly standard expectation of users to have
> column headers present in flat files. Without the header in the pgbench log
> files, new users must navigate to the docs and piece together the column
> headers themselves. Most industry leading frameworks have tooling built in
> to read column headers though, for example python/pandas read_csv().
>
> We can improve the experience for users by adding column headers to
> pgbench logfiles with an optional command line flag, `--log-header`. This
> will keep the API backwards compatible by making users opt-in to the column
> headers. It follows the existing pattern of having conditional flags in
> pgbench’s API; the `--log` option would have both –log-prefix and
> –log-header if this work is accepted.
>
> The implementation considers the column headers only when the
> `--log-header` flag is present. The values for the columns are taken
> directly from the “Per-Transaction Logging” section in
> https://www.postgresql.org/docs/current/pgbench.html and takes into
> account the conditional columns `schedule_lag` and `retries`.
>
>
> Below is an example of what that logfile will look like:
>
>
> pgbench  postgres://postgres:postgres@localhost:5432/postgres --log
> --log-header
>
> client_id transaction_no time script_no time_epoch time_us
>
> 0 1 1863 0 169988 791102
>
> 0 2 706 0 169988 791812
>
>
> If the interface and overall approach makes sense, I will work on adding
> documentation and tests for this too.
>
> Respectfully,
>
> Adam Hendel
>
>


[PATCH] pgbench log file headers

2023-11-13 Thread Adam Hendel
Hello Hackers!

Currently, pgbench will log individual transactions to a logfile when the
`--log` parameter flag is provided. The logfile, however, does not include
column header. It has become a fairly standard expectation of users to have
column headers present in flat files. Without the header in the pgbench log
files, new users must navigate to the docs and piece together the column
headers themselves. Most industry leading frameworks have tooling built in
to read column headers though, for example python/pandas read_csv().

We can improve the experience for users by adding column headers to pgbench
logfiles with an optional command line flag, `--log-header`. This will keep
the API backwards compatible by making users opt-in to the column headers.
It follows the existing pattern of having conditional flags in pgbench’s
API; the `--log` option would have both –log-prefix and –log-header if this
work is accepted.

The implementation considers the column headers only when the
`--log-header` flag is present. The values for the columns are taken
directly from the “Per-Transaction Logging” section in
https://www.postgresql.org/docs/current/pgbench.html and takes into account
the conditional columns `schedule_lag` and `retries`.


Below is an example of what that logfile will look like:


pgbench  postgres://postgres:postgres@localhost:5432/postgres --log
--log-header

client_id transaction_no time script_no time_epoch time_us

0 1 1863 0 169988 791102

0 2 706 0 169988 791812


If the interface and overall approach makes sense, I will work on adding
documentation and tests for this too.

Respectfully,

Adam Hendel


v1-0001-Adds-log-header-flag-to-pgbench-which-adds-column.patch
Description: Binary data