Re: [PATCH v4 3/4] trace2: don't overload target directories

2019-10-04 Thread Josh Steadmon
On 2019.10.04 11:12, Johannes Schindelin wrote:
> Hi Josh,
> 
> On Thu, 3 Oct 2019, Josh Steadmon wrote:
> 
> > [...]
> > diff --git a/trace2/tr2_dst.c b/trace2/tr2_dst.c
> > index 5dda0ca1cd..af3405f179 100644
> > --- a/trace2/tr2_dst.c
> > +++ b/trace2/tr2_dst.c
> > @@ -1,3 +1,5 @@
> > +#include 
> > +
> 
> This completely breaks the Windows build:
> 
>   In file included from trace2/tr2_dst.c:1:
>   compat/win32/dirent.h:13:14: error: 'MAX_PATH' undeclared here (not in a
>   function)
>  13 |  char d_name[MAX_PATH * 3]; /* file name (* 3 for UTF-8 
> conversion) */
> |  ^~~~
> 
> See the full build log in all its glory here:
> 
> https://dev.azure.com/gitgitgadget/git/_build/results?buildId=17409&view=logs&jobId=fd490c07-0b22-5182-fac9-6d67fe1e939b&taskId=ce91d5d6-0c55-50f5-8ab9-6695c03ab102&lineStart=252&lineEnd=255&colStart=1&colEnd=30
> 
> The fix is as easy as probably surprising: simply delete that
> `#include`. It is redundant anyway:
> https://github.com/git/git/blob/v2.23.0/git-compat-util.h#L184

Sorry about that. Fixed in V5, which will be out shortly. Is there a way
to trigger the Azure pipeline apart from using GitGitGadget?

> Deleting that `#include` line from your patch not only lets the file
> compile cleanly on Windows, it also conforms to our coding style where
> `cache.h` or `git-compat-util.h` need to be the first `#include`. That
> rule's purpose is precisely to prevent platform-specific compile errors
> like the one shown above.

Hmm, I wonder if Coccinelle could catch more CodingGuideline mistakes
such as this? Although it seems there are a few exceptions to this rule,
so maybe it's not a good fit in this particular case.

> Ciao,
> Johannes
> 
> >  #include "cache.h"
> >  #include "trace2/tr2_dst.h"
> >  #include "trace2/tr2_sid.h"
> > @@ -8,6 +10,19 @@
> >   */
> >  #define MAX_AUTO_ATTEMPTS 10
> >
> > +/*
> > + * Sentinel file used to detect when we're overloading a directory with 
> > too many
> > + * trace files.
> > + */
> > +#define OVERLOAD_SENTINEL_NAME "git-trace2-overload"
> > +
> > +/*
> > + * When set to zero, disables directory overload checking. Otherwise, 
> > controls
> > + * how many files we can write to a directory before entering overload 
> > mode.
> > + * This can be overridden via the TR2_SYSENV_MAX_FILES setting.
> > + */
> > +static int tr2env_max_files = 0;
> > +
> >  static int tr2_dst_want_warning(void)
> >  {
> > static int tr2env_dst_debug = -1;
> > @@ -32,6 +47,67 @@ void tr2_dst_trace_disable(struct tr2_dst *dst)
> > dst->need_close = 0;
> >  }
> >
> > +/*
> > + * Check to make sure we're not overloading the target directory with too 
> > many
> > + * files. First get the threshold (if present) from the config or envvar. 
> > If
> > + * it's zero or unset, disable this check.  Next check for the presence of 
> > a
> > + * sentinel file, then check file count. If we are overloaded, create the
> > + * sentinel file if it doesn't already exist.
> > + *
> > + * We expect that some trace processing system is gradually collecting 
> > files
> > + * from the target directory; after it removes the sentinel file we'll 
> > start
> > + * writing traces again.
> > + */
> > +static int tr2_dst_overloaded(const char *tgt_prefix)
> > +{
> > +   int file_count = 0, max_files = 0, ret = 0;
> > +   const char *max_files_var;
> > +   DIR *dirp;
> > +   struct strbuf path = STRBUF_INIT, sentinel_path = STRBUF_INIT;
> > +   struct stat statbuf;
> > +
> > +   /* Get the config or envvar and decide if we should continue this check 
> > */
> > +   max_files_var = tr2_sysenv_get(TR2_SYSENV_MAX_FILES);
> > +   if (max_files_var && *max_files_var && ((max_files = 
> > atoi(max_files_var)) >= 0))
> > +   tr2env_max_files = max_files;
> > +
> > +   if (!tr2env_max_files) {
> > +   ret = 0;
> > +   goto cleanup;
> > +   }
> > +
> > +   strbuf_addstr(&path, tgt_prefix);
> > +   if (!is_dir_sep(path.buf[path.len - 1])) {
> > +   strbuf_addch(&path, '/');
> > +   }
> > +
> > +   /* check sentinel */
> > +   strbuf_addbuf(&sentinel_path, &path);
> > +   strbuf_addstr(&sentinel_path, OVERLOAD_SENTINEL_NAME);
> > +   if (!stat(sentinel_path.buf, &statbuf)) {
> > +   ret = 1;
> > +   goto cleanup;
> > +   }
> > +
> > +   /* check file count */
> > +   dirp = opendir(path.buf);
> > +   while (file_count < tr2env_max_files && dirp && readdir(dirp))
> > +   file_count++;
> > +   if (dirp)
> > +   closedir(dirp);
> > +
> > +   if (file_count >= tr2env_max_files) {
> > +   creat(sentinel_path.buf, 0666);
> > +   ret = 1;
> > +   goto cleanup;
> > +   }
> > +
> > +cleanup:
> > +   strbuf_release(&path);
> > +   strbuf_release(&sentinel_path);
> > +   return ret;
> > +}
> > +
> >  static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char 
> > *tgt_prefix)
> >  {
> > int fd;
> > @@ -50,6 +126,16 @@ static int tr2_dst_try_auto_path(struct t

Re: [PATCH v4 3/4] trace2: don't overload target directories

2019-10-04 Thread Josh Steadmon
On 2019.10.04 09:25, Junio C Hamano wrote:
> Josh Steadmon  writes:
> 
> > trace2 can write files into a target directory. With heavy usage, this
> > directory can fill up with files, causing difficulty for
> > trace-processing systems.
> 
> Sorry for not mentioning this, but "don't overload" is a suboptimal
> keyword for the entire topic and for this particular step for a few
> reasons.  For one, "overload" is an overloaded verb that gives an
> incorrect impression that the problem you are dealing with is that
> the target directory you specify is (mis)used for other purposes,
> which is not the case.  You instead refrain from creating too many
> files.  The other (which is probably more serious) is that it is
> unclear what approach you chose to solve the "directory ends up
> holding too many files".  One could simply discard new traces to do
> so, one could concatenate to existing files to avoid creating new
> files, one could even cycle the directory (i.e. path/to/log may
> become path/to/log.old.1 and path/to/log is recreated as an empty
> directory when it gets a new file).
> 
> trace2: discard new traces when a target directory has too many files
> 
> or something would convey the problem you are solving (i.e. "too
> many files" implying negative performance and usability impact
> coming from it) and solution (i.e. "discard new traces"), if it is
> the approach you have chosen.

Understood. Reworded in V5, which will be out shortly.

> > +   /* check sentinel */
> > +   strbuf_addbuf(&sentinel_path, &path);
> > +   strbuf_addstr(&sentinel_path, OVERLOAD_SENTINEL_NAME);
> > +   if (!stat(sentinel_path.buf, &statbuf)) {
> > +   ret = 1;
> > +   goto cleanup;
> > +   }
> > +
> > +   /* check file count */
> > +   dirp = opendir(path.buf);
> > +   while (file_count < tr2env_max_files && dirp && readdir(dirp))
> > +   file_count++;
> > +   if (dirp)
> > +   closedir(dirp);
> 
> So, until we accumulate too many files in the directory, every
> process when it starts tracing will scan the output directory.
> Hopefully the max is not set to too large a value.


Re: [PATCH v4 3/4] trace2: don't overload target directories

2019-10-04 Thread Johannes Schindelin
Hi Josh,

On Thu, 3 Oct 2019, Josh Steadmon wrote:

> [...]
> diff --git a/trace2/tr2_dst.c b/trace2/tr2_dst.c
> index 5dda0ca1cd..af3405f179 100644
> --- a/trace2/tr2_dst.c
> +++ b/trace2/tr2_dst.c
> @@ -1,3 +1,5 @@
> +#include 
> +

This completely breaks the Windows build:

In file included from trace2/tr2_dst.c:1:
compat/win32/dirent.h:13:14: error: 'MAX_PATH' undeclared here (not in a
function)
   13 |  char d_name[MAX_PATH * 3]; /* file name (* 3 for UTF-8 
conversion) */
  |  ^~~~

See the full build log in all its glory here:

https://dev.azure.com/gitgitgadget/git/_build/results?buildId=17409&view=logs&jobId=fd490c07-0b22-5182-fac9-6d67fe1e939b&taskId=ce91d5d6-0c55-50f5-8ab9-6695c03ab102&lineStart=252&lineEnd=255&colStart=1&colEnd=30

The fix is as easy as probably surprising: simply delete that
`#include`. It is redundant anyway:
https://github.com/git/git/blob/v2.23.0/git-compat-util.h#L184

Deleting that `#include` line from your patch not only lets the file
compile cleanly on Windows, it also conforms to our coding style where
`cache.h` or `git-compat-util.h` need to be the first `#include`. That
rule's purpose is precisely to prevent platform-specific compile errors
like the one shown above.

Ciao,
Johannes

>  #include "cache.h"
>  #include "trace2/tr2_dst.h"
>  #include "trace2/tr2_sid.h"
> @@ -8,6 +10,19 @@
>   */
>  #define MAX_AUTO_ATTEMPTS 10
>
> +/*
> + * Sentinel file used to detect when we're overloading a directory with too 
> many
> + * trace files.
> + */
> +#define OVERLOAD_SENTINEL_NAME "git-trace2-overload"
> +
> +/*
> + * When set to zero, disables directory overload checking. Otherwise, 
> controls
> + * how many files we can write to a directory before entering overload mode.
> + * This can be overridden via the TR2_SYSENV_MAX_FILES setting.
> + */
> +static int tr2env_max_files = 0;
> +
>  static int tr2_dst_want_warning(void)
>  {
>   static int tr2env_dst_debug = -1;
> @@ -32,6 +47,67 @@ void tr2_dst_trace_disable(struct tr2_dst *dst)
>   dst->need_close = 0;
>  }
>
> +/*
> + * Check to make sure we're not overloading the target directory with too 
> many
> + * files. First get the threshold (if present) from the config or envvar. If
> + * it's zero or unset, disable this check.  Next check for the presence of a
> + * sentinel file, then check file count. If we are overloaded, create the
> + * sentinel file if it doesn't already exist.
> + *
> + * We expect that some trace processing system is gradually collecting files
> + * from the target directory; after it removes the sentinel file we'll start
> + * writing traces again.
> + */
> +static int tr2_dst_overloaded(const char *tgt_prefix)
> +{
> + int file_count = 0, max_files = 0, ret = 0;
> + const char *max_files_var;
> + DIR *dirp;
> + struct strbuf path = STRBUF_INIT, sentinel_path = STRBUF_INIT;
> + struct stat statbuf;
> +
> + /* Get the config or envvar and decide if we should continue this check 
> */
> + max_files_var = tr2_sysenv_get(TR2_SYSENV_MAX_FILES);
> + if (max_files_var && *max_files_var && ((max_files = 
> atoi(max_files_var)) >= 0))
> + tr2env_max_files = max_files;
> +
> + if (!tr2env_max_files) {
> + ret = 0;
> + goto cleanup;
> + }
> +
> + strbuf_addstr(&path, tgt_prefix);
> + if (!is_dir_sep(path.buf[path.len - 1])) {
> + strbuf_addch(&path, '/');
> + }
> +
> + /* check sentinel */
> + strbuf_addbuf(&sentinel_path, &path);
> + strbuf_addstr(&sentinel_path, OVERLOAD_SENTINEL_NAME);
> + if (!stat(sentinel_path.buf, &statbuf)) {
> + ret = 1;
> + goto cleanup;
> + }
> +
> + /* check file count */
> + dirp = opendir(path.buf);
> + while (file_count < tr2env_max_files && dirp && readdir(dirp))
> + file_count++;
> + if (dirp)
> + closedir(dirp);
> +
> + if (file_count >= tr2env_max_files) {
> + creat(sentinel_path.buf, 0666);
> + ret = 1;
> + goto cleanup;
> + }
> +
> +cleanup:
> + strbuf_release(&path);
> + strbuf_release(&sentinel_path);
> + return ret;
> +}
> +
>  static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
>  {
>   int fd;
> @@ -50,6 +126,16 @@ static int tr2_dst_try_auto_path(struct tr2_dst *dst, 
> const char *tgt_prefix)
>   strbuf_addstr(&path, sid);
>   base_path_len = path.len;
>
> + if (tr2_dst_overloaded(tgt_prefix)) {
> + strbuf_release(&path);
> + if (tr2_dst_want_warning())
> + warning("trace2: not opening %s trace file due to too "
> + "many files in target directory %s",
> + tr2_sysenv_display_name(dst->sysenv_var),
> + tgt_prefix);
> + return 0;
> + }
> +
>   for (attempt_count = 0; attempt

Re: [PATCH v4 3/4] trace2: don't overload target directories

2019-10-03 Thread Junio C Hamano
Josh Steadmon  writes:

> trace2 can write files into a target directory. With heavy usage, this
> directory can fill up with files, causing difficulty for
> trace-processing systems.

Sorry for not mentioning this, but "don't overload" is a suboptimal
keyword for the entire topic and for this particular step for a few
reasons.  For one, "overload" is an overloaded verb that gives an
incorrect impression that the problem you are dealing with is that
the target directory you specify is (mis)used for other purposes,
which is not the case.  You instead refrain from creating too many
files.  The other (which is probably more serious) is that it is
unclear what approach you chose to solve the "directory ends up
holding too many files".  One could simply discard new traces to do
so, one could concatenate to existing files to avoid creating new
files, one could even cycle the directory (i.e. path/to/log may
become path/to/log.old.1 and path/to/log is recreated as an empty
directory when it gets a new file).

trace2: discard new traces when a target directory has too many files

or something would convey the problem you are solving (i.e. "too
many files" implying negative performance and usability impact
coming from it) and solution (i.e. "discard new traces"), if it is
the approach you have chosen.

> + /* check sentinel */
> + strbuf_addbuf(&sentinel_path, &path);
> + strbuf_addstr(&sentinel_path, OVERLOAD_SENTINEL_NAME);
> + if (!stat(sentinel_path.buf, &statbuf)) {
> + ret = 1;
> + goto cleanup;
> + }
> +
> + /* check file count */
> + dirp = opendir(path.buf);
> + while (file_count < tr2env_max_files && dirp && readdir(dirp))
> + file_count++;
> + if (dirp)
> + closedir(dirp);

So, until we accumulate too many files in the directory, every
process when it starts tracing will scan the output directory.
Hopefully the max is not set to too large a value.


[PATCH v4 3/4] trace2: don't overload target directories

2019-10-03 Thread Josh Steadmon
trace2 can write files into a target directory. With heavy usage, this
directory can fill up with files, causing difficulty for
trace-processing systems.

This patch adds a config option (trace2.maxFiles) to set a maximum
number of files that trace2 will write to a target directory. The
following behavior is enabled when the maxFiles is set to a positive
integer:
  When trace2 would write a file to a target directory, first check
  whether or not the directory is overloaded. A directory is overloaded
  if there is a sentinel file declaring an overload, or if the number of
  files exceeds trace2.maxFiles. If the latter, create a sentinel file
  to speed up later overload checks.

The assumption is that a separate trace-processing system is dealing
with the generated traces; once it processes and removes the sentinel
file, it should be safe to generate new trace files again.

The default value for trace2.maxFiles is zero, which disables the
overload check.

The config can also be overridden with a new environment variable:
GIT_TRACE2_MAX_FILES.

Signed-off-by: Josh Steadmon 
---
 Documentation/config/trace2.txt |  6 +++
 t/t0212-trace2-event.sh | 17 +++
 trace2/tr2_dst.c| 86 +
 trace2/tr2_sysenv.c |  3 ++
 trace2/tr2_sysenv.h |  2 +
 5 files changed, 114 insertions(+)

diff --git a/Documentation/config/trace2.txt b/Documentation/config/trace2.txt
index 2edbfb02fe..4ce0b9a6d1 100644
--- a/Documentation/config/trace2.txt
+++ b/Documentation/config/trace2.txt
@@ -54,3 +54,9 @@ trace2.destinationDebug::
By default, these errors are suppressed and tracing is
silently disabled.  May be overridden by the
`GIT_TRACE2_DST_DEBUG` environment variable.
+
+trace2.maxFiles::
+   Integer.  When writing trace files to a target directory, do not
+   write additional traces if we would exceed this many files. Instead,
+   write a sentinel file that will block further tracing to this
+   directory. Defaults to 0, which disables this check.
diff --git a/t/t0212-trace2-event.sh b/t/t0212-trace2-event.sh
index ff5b9cc729..2ff97e72da 100755
--- a/t/t0212-trace2-event.sh
+++ b/t/t0212-trace2-event.sh
@@ -265,4 +265,21 @@ test_expect_success JSON_PP 'using global config, event 
stream, error event' '
test_cmp expect actual
 '
 
+test_expect_success "don't overload target directory" '
+   mkdir trace_target_dir &&
+   test_when_finished "rm -r trace_target_dir" &&
+   (
+   GIT_TRACE2_MAX_FILES=5 &&
+   export GIT_TRACE2_MAX_FILES &&
+   cd trace_target_dir &&
+   test_seq $GIT_TRACE2_MAX_FILES >../expected_filenames.txt &&
+   xargs touch <../expected_filenames.txt &&
+   cd .. &&
+   GIT_TRACE2_EVENT="$(pwd)/trace_target_dir" test-tool trace2 
001return 0
+   ) &&
+   echo git-trace2-overload >>expected_filenames.txt &&
+   ls trace_target_dir >ls_output.txt &&
+   test_cmp expected_filenames.txt ls_output.txt
+'
+
 test_done
diff --git a/trace2/tr2_dst.c b/trace2/tr2_dst.c
index 5dda0ca1cd..af3405f179 100644
--- a/trace2/tr2_dst.c
+++ b/trace2/tr2_dst.c
@@ -1,3 +1,5 @@
+#include 
+
 #include "cache.h"
 #include "trace2/tr2_dst.h"
 #include "trace2/tr2_sid.h"
@@ -8,6 +10,19 @@
  */
 #define MAX_AUTO_ATTEMPTS 10
 
+/*
+ * Sentinel file used to detect when we're overloading a directory with too 
many
+ * trace files.
+ */
+#define OVERLOAD_SENTINEL_NAME "git-trace2-overload"
+
+/*
+ * When set to zero, disables directory overload checking. Otherwise, controls
+ * how many files we can write to a directory before entering overload mode.
+ * This can be overridden via the TR2_SYSENV_MAX_FILES setting.
+ */
+static int tr2env_max_files = 0;
+
 static int tr2_dst_want_warning(void)
 {
static int tr2env_dst_debug = -1;
@@ -32,6 +47,67 @@ void tr2_dst_trace_disable(struct tr2_dst *dst)
dst->need_close = 0;
 }
 
+/*
+ * Check to make sure we're not overloading the target directory with too many
+ * files. First get the threshold (if present) from the config or envvar. If
+ * it's zero or unset, disable this check.  Next check for the presence of a
+ * sentinel file, then check file count. If we are overloaded, create the
+ * sentinel file if it doesn't already exist.
+ *
+ * We expect that some trace processing system is gradually collecting files
+ * from the target directory; after it removes the sentinel file we'll start
+ * writing traces again.
+ */
+static int tr2_dst_overloaded(const char *tgt_prefix)
+{
+   int file_count = 0, max_files = 0, ret = 0;
+   const char *max_files_var;
+   DIR *dirp;
+   struct strbuf path = STRBUF_INIT, sentinel_path = STRBUF_INIT;
+   struct stat statbuf;
+
+   /* Get the config or envvar and decide if we should continue this check 
*/
+   max_files_var = tr2_sysenv_get(TR2_SYSENV_MAX_FILES);
+   if (max_files_var