Re: [PATCH] Better error recovery for merge-conflict markers (v4)

2015-12-16 Thread David Malcolm
On Wed, 2015-12-09 at 18:44 +0100, Bernd Schmidt wrote:
> On 12/09/2015 05:58 PM, David Malcolm wrote:
> > On Wed, 2015-11-04 at 14:56 +0100, Bernd Schmidt wrote:
> >>
> >> This seems like fairly low impact but also low cost, so I'm fine with it
> >> in principle. I wonder whether the length of the marker is the same
> >> across all versions of patch (and VC tools)?
> >
> > It's hardcoded for GNU patch:
> [...]
>  >From what I can tell, Perforce is the outlier here.
> 
> Thanks for checking all that.
> 
> >> Just thinking out loud - I guess it would be too much to hope for to
> >> share lexers between frontends so that we need only one copy of this?
> >
> > Probably :(
> 
> Someone slap sense into me, I just thought of deriving C and C++ parsers 
> from a common base class... (no this is not a suggestion for this patch).
> 
> > Would a better wording be:
> >
> > extern short some_var; /* This line would lead to a warning due to the
> >duplicate name, but it is skipped when handling
> >the conflict marker.  */
> 
> I think so, yes.
> 
> > That said, it's not clear they're always at the beginning of a line;
> > this bazaar bug indicates that CVS (and bazaar) can emit them
> > mid-line:
> >https://bugs.launchpad.net/bzr/+bug/36399
> 
> Ok. CVS I think we shouldn't worry about, and it looks like this is one 
> particular bug/corner case where the conflict end marker is the last 
> thing in the file. I think on the whole it's best to check for beginning 
> of the line as you've done.
> 
> > Wording-wise, should it be "merge conflict marker", rather
> > than "patch conflict marker"?
> >
> > Clang spells it:
> > "error: version control conflict marker in file"
> > http://blog.llvm.org/2010/04/amazing-feats-of-clang-error-recovery.html#merge_conflicts
> 
> Yeah, if another compiler has a similar/identical diagnostic I think we 
> should just copy that unless there's a very good reason not to.
> 
> > Rebased on top of r231445 (from yesterday).
> > Successfully bootstrapped on x86_64-pc-linux-gnu.
> > Adds 82 new PASSes to g++.sum and 27 new PASSes to gcc.sum.
> >
> > OK for trunk?
> 
> I'm inclined to say yes since it was originally submitted in time and 
> it's hard to imagine how the change could be risky (I'll point out right 
> away that there are one or two other patches in the queue that were also 
> submitted in time which I feel should not be considered for gcc-6 at 
> this point due to risk).
> 
> Let's wait until the end of the week for objections, commit then.

Thanks.  I updated it based on the feedback above, including changing
the wording to match clang's:
 "error: version control conflict marker in file"
I replaced "patch conflict marker" with "conflict marker" in the code
and names of test cases.

I took the liberty of adding:
  gcc_assert (n > 0);
to c_parser_peek_nth_token based on Martin's feedback.

Having verified bootstrap (on x86_64-pc-linux-gnu), I've
committed it to trunk as r231712.

I'm attaching what I committed, for reference.
Index: gcc/c-family/ChangeLog
===
--- gcc/c-family/ChangeLog	(revision 231711)
+++ gcc/c-family/ChangeLog	(revision 231712)
@@ -1,3 +1,8 @@
+2015-12-16  David Malcolm  
+
+	* c-common.h (conflict_marker_get_final_tok_kind): New prototype.
+	* c-lex.c (conflict_marker_get_final_tok_kind): New function.
+
 2015-12-15  Ilya Verbin  
 
 	* c-common.c (c_common_attribute_table): Handle "omp declare target
Index: gcc/c-family/c-lex.c
===
--- gcc/c-family/c-lex.c	(revision 231711)
+++ gcc/c-family/c-lex.c	(revision 231712)
@@ -1263,3 +1263,29 @@
 
   return value;
 }
+
+/* Helper function for c_parser_peek_conflict_marker
+   and cp_lexer_peek_conflict_marker.
+   Given a possible conflict marker token of kind TOK1_KIND
+   consisting of a pair of characters, get the token kind for the
+   standalone final character.  */
+
+enum cpp_ttype
+conflict_marker_get_final_tok_kind (enum cpp_ttype tok1_kind)
+{
+  switch (tok1_kind)
+{
+default: gcc_unreachable ();
+case CPP_LSHIFT:
+  /* "<<" and '<' */
+  return CPP_LESS;
+
+case CPP_EQ_EQ:
+  /* "==" and '=' */
+  return CPP_EQ;
+
+case CPP_RSHIFT:
+  /* ">>" and '>' */
+  return CPP_GREATER;
+}
+}
Index: gcc/c-family/c-common.h
===
--- gcc/c-family/c-common.h	(revision 231711)
+++ gcc/c-family/c-common.h	(revision 231712)
@@ -1089,6 +1089,10 @@
 extern int c_gimplify_expr (tree *, gimple_seq *, gimple_seq *);
 extern tree c_build_bind_expr (location_t, tree, tree);
 
+/* In c-lex.c.  */
+extern enum cpp_ttype
+conflict_marker_get_final_tok_kind (enum cpp_ttype tok1_kind);
+
 /* In c-pch.c  */
 extern void pch_init (void);
 extern void pch_cpp_save_state (void);
Index: 

[PATCH] Better error recovery for merge-conflict markers (v4)

2015-12-09 Thread David Malcolm
On Wed, 2015-11-04 at 14:56 +0100, Bernd Schmidt wrote:
> On 10/30/2015 04:16 PM, David Malcolm wrote:
> > The idea is to more gracefully handle merger conflict markers
> > in the source code being compiled.  Specifically, in the C and
> > C++ frontends, if we're about to emit an error, see if the
> > source code is at a merger conflict marker, and if so, emit
> > a more specific message, so that the user gets this:
> >
> > foo.c:1:1: error: source file contains patch conflict marker
> >   <<< HEAD
> >   ^
> >
> > It's something of a "fit and finish" cosmetic item, but these
> > things add up.
>
> This seems like fairly low impact but also low cost, so I'm fine with it
> in principle. I wonder whether the length of the marker is the same
> across all versions of patch (and VC tools)?

It's hardcoded for GNU patch:
  http://git.savannah.gnu.org/cgit/patch.git/tree/src/merge.c
which hardcodes e.g.:
  fputs (outstate->after_newline + "\n<<<\n", fp);

I don't know if it's hardcoded for CVS or Subversion, but both have
documentation showing that format:
 ftp://ftp.gnu.org/old-gnu/Manuals/cvs/html_node/cvs_38.html
 http://svnbook.red-bean.com/en/1.7/svn.tour.cycle.html#svn.tour.cycle.resolve

It's the default of git:
  http://git.kernel.org/cgit/git/git.git/tree/Documentation/merge-config.txt
(config option "merge.conflictStyle")
This git commit (to git) seems to have generalized it to support a
"conflict-marker-size" attribute:
  https://github.com/git/git/commit/8588567c96490b8d236b1bc13f9bcb0dfa118efe

Mercurial uses them; the format appears to be a keyword-argument in:
  https://selenic.com/hg/file/tip/mercurial/simplemerge.py#l91
but it's hardcoded in this regex in filemerge.py:
if re.search("^(<<< .*|===|>>> .*)$", fcd.data(),

Bazaar uses them; see e.g.:
  
http://bazaar.launchpad.net/~bzr-pqm/bzr/bzr.dev/view/head:/bzrlib/tests/test_merge3.py
(I couldn't easily tell if they're configurable)

FWIW, Perforce appears to use a different format;
http://www.perforce.com/perforce/doc.current/manuals/p4guide/chapter.resolve.html
has an example showing:

 ORIGINAL file#n
(text from the original version)
 THEIR file#m
(text from their file)
 YOURS file
(text from your file)


>From what I can tell, Perforce is the outlier here.

> > +static bool
> > +c_parser_peek_conflict_marker (c_parser *parser, enum cpp_ttype tok1_kind)
> > +{
> > +  c_token *token2 = c_parser_peek_2nd_token (parser);
> > +  if (token2->type != tok1_kind)
> > +return false;
> > +  c_token *token3 = c_parser_peek_nth_token (parser, 3);
> > +  if (token3->type != tok1_kind)
> > +return false;
> > +  c_token *token4 = c_parser_peek_nth_token (parser, 4);
> > +  if (token4->type != conflict_marker_get_final_tok_kind (tok1_kind))
> > +return false;
> > +  return true;
> > +}
>
> Just thinking out loud - I guess it would be too much to hope for to
> share lexers between frontends so that we need only one copy of this?

Probably :(

> > +extern short some_var; /* this line would lead to a warning */
>
> Would or does? I don't see anything suppressing it?

It's skipped in error-handling.

c_parser_declaration_or_fndef has:
1794  declarator = c_parser_declarator (parser,
1795specs->typespec_kind != 
ctsk_none,
1796C_DTR_NORMAL, );
1797  if (declarator == NULL)
1798{
[...snip...]
1807  c_parser_skip_to_end_of_block_or_statement (parser);

The call to c_parser_declarator fails, and when issuing:
3465  c_parser_error (parser, "expected identifier or %<(%>");
we emit the "conflict marker" wording error for the error.

Then at line 1807 we skip, discarding everything up to the ";" in that
decl.

Would a better wording be:

extern short some_var; /* This line would lead to a warning due to the
  duplicate name, but it is skipped when handling
  the conflict marker.  */

> There seems to be no testcase verifying what happens if the marker is
> not at the start of the line (IMO it should not be interpreted as a marker).

The v3 patch actually reported them as markers regardless of
location.
The v4 patch now verifies that they are at the start of the line;
I've added test coverage for this (patch-conflict-markers-11.c).

That said, it's not clear they're always at the beginning of a line;
this bazaar bug indicates that CVS (and bazaar) can emit them
mid-line:
  https://bugs.launchpad.net/bzr/+bug/36399

I noticed a visual glitch with the v3 patch now that we have range
information for tokens:  with caret-printing, we get just the first
token within the marker underlined:

 <<< HEAD
 ^~

which looks strange (especially with the underlined chars colorized).

Hence in the v4 patch I've added a location tweak so that it
underline/colorizes *all* of the marker:

 <<< HEAD
 ^~~

Wording-wise, should it be 

Re: [PATCH] Better error recovery for merge-conflict markers (v4)

2015-12-09 Thread Bernd Schmidt

On 12/09/2015 05:58 PM, David Malcolm wrote:

On Wed, 2015-11-04 at 14:56 +0100, Bernd Schmidt wrote:


This seems like fairly low impact but also low cost, so I'm fine with it
in principle. I wonder whether the length of the marker is the same
across all versions of patch (and VC tools)?


It's hardcoded for GNU patch:

[...]
>From what I can tell, Perforce is the outlier here.

Thanks for checking all that.


Just thinking out loud - I guess it would be too much to hope for to
share lexers between frontends so that we need only one copy of this?


Probably :(


Someone slap sense into me, I just thought of deriving C and C++ parsers 
from a common base class... (no this is not a suggestion for this patch).



Would a better wording be:

extern short some_var; /* This line would lead to a warning due to the
   duplicate name, but it is skipped when handling
   the conflict marker.  */


I think so, yes.


That said, it's not clear they're always at the beginning of a line;
this bazaar bug indicates that CVS (and bazaar) can emit them
mid-line:
   https://bugs.launchpad.net/bzr/+bug/36399


Ok. CVS I think we shouldn't worry about, and it looks like this is one 
particular bug/corner case where the conflict end marker is the last 
thing in the file. I think on the whole it's best to check for beginning 
of the line as you've done.



Wording-wise, should it be "merge conflict marker", rather
than "patch conflict marker"?

Clang spells it:
"error: version control conflict marker in file"
http://blog.llvm.org/2010/04/amazing-feats-of-clang-error-recovery.html#merge_conflicts


Yeah, if another compiler has a similar/identical diagnostic I think we 
should just copy that unless there's a very good reason not to.



Rebased on top of r231445 (from yesterday).
Successfully bootstrapped on x86_64-pc-linux-gnu.
Adds 82 new PASSes to g++.sum and 27 new PASSes to gcc.sum.

OK for trunk?


I'm inclined to say yes since it was originally submitted in time and 
it's hard to imagine how the change could be risky (I'll point out right 
away that there are one or two other patches in the queue that were also 
submitted in time which I feel should not be considered for gcc-6 at 
this point due to risk).


Let's wait until the end of the week for objections, commit then.


Bernd


Re: [PATCH] Better error recovery for merge-conflict markers (v4)

2015-12-09 Thread Jeff Law

On 12/09/2015 10:44 AM, Bernd Schmidt wrote:

Just thinking out loud - I guess it would be too much to hope for to
share lexers between frontends so that we need only one copy of this?


Probably :(


Someone slap sense into me, I just thought of deriving C and C++ parsers
from a common base class... (no this is not a suggestion for this patch).

It'd be nice.  But I don't even think it's on anyone's TODO list.


Wording-wise, should it be "merge conflict marker", rather
than "patch conflict marker"?

Clang spells it:
"error: version control conflict marker in file"
http://blog.llvm.org/2010/04/amazing-feats-of-clang-error-recovery.html#merge_conflicts



Yeah, if another compiler has a similar/identical diagnostic I think we
should just copy that unless there's a very good reason not to.

Agreed.




Rebased on top of r231445 (from yesterday).
Successfully bootstrapped on x86_64-pc-linux-gnu.
Adds 82 new PASSes to g++.sum and 27 new PASSes to gcc.sum.

OK for trunk?


I'm inclined to say yes since it was originally submitted in time and
it's hard to imagine how the change could be risky (I'll point out right
away that there are one or two other patches in the queue that were also
submitted in time which I feel should not be considered for gcc-6 at
this point due to risk).

Let's wait until the end of the week for objections, commit then.
As the person who has come the closest to objecting, I'll go on the 
record as "not objecting".


jeff