Re: [PATCH v4 0/3] Fix a segfault caused by regexec() being called on mmap()ed data

2016-09-21 Thread Jeff King
On Wed, Sep 21, 2016 at 08:23:11PM +0200, Johannes Schindelin wrote:

> We solve this by introducing a helper, regexec_buf(), that takes a
> pointer and a length instead of a NUL-terminated string.
> 
> This helper then uses REG_STARTEND where available, and falls back to
> allocating and constructing a NUL-terminated string. Given the
> wide-spread support for REG_STARTEND (Linux has it, MacOSX has it, Git
> for Windows has it because it uses compat/regex/ that has it), I think
> this is a fair trade-off.

I did a double-take on this, but then read:

> Changes since v3:
> [...]
> - removed fallback when REG_STARTEND is not supported, in favor of
>   requiring NO_REGEX.

So I think we are all in agreement. :)

With the exception of a few commit message fixups that Junio already
pointed out, this looks good to me. Thanks.

-Peff


[PATCH v4 0/3] Fix a segfault caused by regexec() being called on mmap()ed data

2016-09-21 Thread Johannes Schindelin
[Cc:ing Benjamin Kramer & René Scharfe because they both worked on
the REG_STARTEND code in grep.c that I replace in this iteration of the
patch series]

This patch series addresses a problem where `git diff` is called using
`-G` or `-S --pickaxe-regex` on new-born files that are configured
without user diff drivers, and that hence get mmap()ed into memory.

The problem with that: mmap()ed memory is *not* NUL-terminated, yet the
pickaxe code calls regexec() on it just the same.

This problem has been reported by my colleague Chris Sidi.

We solve this by introducing a helper, regexec_buf(), that takes a
pointer and a length instead of a NUL-terminated string.

This helper then uses REG_STARTEND where available, and falls back to
allocating and constructing a NUL-terminated string. Given the
wide-spread support for REG_STARTEND (Linux has it, MacOSX has it, Git
for Windows has it because it uses compat/regex/ that has it), I think
this is a fair trade-off.
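
For illustration only, the REG_STARTEND approach described above can be
sketched as a small standalone program fragment. The names
sketch_regexec_buf() and sketch_buf_matches() are hypothetical and are not
taken from the series, and the sketch assumes a regex library that
provides REG_STARTEND (glibc and the BSD-derived implementations do):

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

#ifndef REG_STARTEND
#error "this sketch assumes a regex library that provides REG_STARTEND"
#endif

/*
 * Hypothetical stand-in for the helper described above: match preg
 * against exactly `size` bytes at `buf`, which need not be NUL-terminated.
 */
static int sketch_regexec_buf(const regex_t *preg, const char *buf,
			      size_t size, size_t nmatch,
			      regmatch_t pmatch[], int eflags)
{
	assert(nmatch > 0 && pmatch);
	/* With REG_STARTEND, pmatch[0] carries the search window on input. */
	pmatch[0].rm_so = 0;
	pmatch[0].rm_eo = (regoff_t)size;
	return regexec(preg, buf, nmatch, pmatch, eflags | REG_STARTEND);
}

/*
 * Hypothetical convenience wrapper: returns 1 if `pattern` matches within
 * the first `size` bytes of `buf`, 0 if it does not, and -1 if the
 * pattern fails to compile.
 */
static int sketch_buf_matches(const char *pattern, const char *buf,
			      size_t size)
{
	regex_t re;
	regmatch_t m[1];
	int hit;

	if (regcomp(&re, pattern, REG_EXTENDED))
		return -1;
	hit = !sketch_regexec_buf(&re, buf, size, 1, m, 0);
	regfree(&re);
	return hit;
}
```

Because the length is passed explicitly, the matcher should not need to
read past buf + size, which is the property that matters for mmap()ed
data.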

Changes since v3:

- reworded the onelines as per Junio's suggestions.

- removed fallback when REG_STARTEND is not supported, in favor of
  requiring NO_REGEX.

- removed the regmatch() function from grep.c, in favor of using
  regexec_buf().
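
As a rough illustration of the grep.c use case in the last item, where
each line is delimited by an end-of-line pointer rather than a NUL, the
same technique can be applied line by line. The function
sketch_count_matching_lines() is hypothetical and only mirrors the
"line, eol - line" calling pattern; it again assumes REG_STARTEND is
available:

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

#ifndef REG_STARTEND
#error "this sketch assumes a regex library that provides REG_STARTEND"
#endif

/*
 * Hypothetical helper, not part of the series: count the lines in
 * buf[0..size) that match `pattern`. The buffer need not be
 * NUL-terminated. Returns -1 if the pattern does not compile.
 */
static int sketch_count_matching_lines(const char *pattern,
				       const char *buf, size_t size)
{
	regex_t re;
	const char *line = buf, *end = buf + size;
	int count = 0;

	if (regcomp(&re, pattern, REG_EXTENDED))
		return -1;

	while (line < end) {
		const char *eol = memchr(line, '\n', (size_t)(end - line));
		regmatch_t match;

		if (!eol)
			eol = end;	/* last line has no trailing newline */

		/* Restrict the search to the bytes in [line, eol). */
		match.rm_so = 0;
		match.rm_eo = (regoff_t)(eol - line);
		if (!regexec(&re, line, 1, &match, REG_STARTEND))
			count++;

		line = eol < end ? eol + 1 : end;
	}

	regfree(&re);
	return count;
}
```

Each call sees only one line, so no temporary NUL needs to be written
into the buffer to terminate it.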


Johannes Schindelin (3):
  regex: -G feeds a non NUL-terminated string to regexec() and fails
  regex: add regexec_buf() that can work on a non NUL-terminated string
  regex: use regexec_buf()

 Makefile                |  3 ++-
 diff.c                  |  3 ++-
 diffcore-pickaxe.c      | 18 --
 git-compat-util.h       | 13 +
 grep.c                  | 14 ++
 t/t4061-diff-pickaxe.sh | 22 ++
 xdiff-interface.c       | 13 -
 7 files changed, 53 insertions(+), 33 deletions(-)
 create mode 100755 t/t4061-diff-pickaxe.sh

Published-As: https://github.com/dscho/git/releases/tag/mmap-regexec-v4
Fetch-It-Via: git fetch https://github.com/dscho/git mmap-regexec-v4

Interdiff vs v3:

 diff --git a/Makefile b/Makefile
 index df4f86b..c6f7f66 100644
 --- a/Makefile
 +++ b/Makefile
 @@ -301,7 +301,8 @@ all::
  # crashes due to allocation and free working on different 'heaps'.
  # It's defined automatically if USE_NED_ALLOCATOR is set.
  #
 -# Define NO_REGEX if you have no or inferior regex support in your C library.
 +# Define NO_REGEX if your C library lacks regex support with REG_STARTEND
 +# feature.
  #
  # Define HAVE_DEV_TTY if your system can open /dev/tty to interact with the
  # user.
 diff --git a/git-compat-util.h b/git-compat-util.h
 index 627ec5f..8aab0c3 100644
 --- a/git-compat-util.h
 +++ b/git-compat-util.h
 @@ -977,25 +977,17 @@ void git_qsort(void *base, size_t nmemb, size_t size,
  #define qsort git_qsort
  #endif
  
 +#ifndef REG_STARTEND
 +#error "Git requires REG_STARTEND support. Compile with NO_REGEX=NeedsStartEnd"
 +#endif
 +
  static inline int regexec_buf(const regex_t *preg, const char *buf, size_t size,
  			      size_t nmatch, regmatch_t pmatch[], int eflags)
  {
 -#ifdef REG_STARTEND
  	assert(nmatch > 0 && pmatch);
  	pmatch[0].rm_so = 0;
  	pmatch[0].rm_eo = size;
  	return regexec(preg, buf, nmatch, pmatch, eflags | REG_STARTEND);
 -#else
 -  char *buf2 = xmalloc(size + 1);
 -  int ret;
 -
 -  memcpy(buf2, buf, size);
 -  buf2[size] = '\0';
 -  ret = regexec(preg, buf2, nmatch, pmatch, eflags);
 -  free(buf2);
 -
 -  return ret;
 -#endif
  }
  
  #ifndef DIR_HAS_BSD_GROUP_SEMANTICS
 diff --git a/grep.c b/grep.c
 index d7d00b8..1194d35 100644
 --- a/grep.c
 +++ b/grep.c
 @@ -898,17 +898,6 @@ static int fixmatch(struct grep_pat *p, char *line, char *eol,
  	}
  }
  
 -static int regmatch(const regex_t *preg, char *line, char *eol,
 -  regmatch_t *match, int eflags)
 -{
 -#ifdef REG_STARTEND
 -  match->rm_so = 0;
 -  match->rm_eo = eol - line;
 -  eflags |= REG_STARTEND;
 -#endif
 -  return regexec(preg, line, 1, match, eflags);
 -}
 -
  static int patmatch(struct grep_pat *p, char *line, char *eol,
  		    regmatch_t *match, int eflags)
  {
 @@ -919,7 +908,8 @@ static int patmatch(struct grep_pat *p, char *line, char *eol,
  	else if (p->pcre_regexp)
  		hit = !pcrematch(p, line, eol, match, eflags);
  	else
 -		hit = !regmatch(&p->regexp, line, eol, match, eflags);
 +		hit = !regexec_buf(&p->regexp, line, eol - line, 1, match,
 +				   eflags);
  
  	return hit;
  }

-- 
2.10.0.windows.1.10.g803177d

base-commit: f6727b0509ec3417a5183ba6e658143275a734f5