Re: [PATCH 0/2] Levenshtein-based suggestions (v3)

2015-11-13 Thread David Malcolm
On Fri, 2015-11-13 at 07:57 +0100, Marek Polacek wrote:
> Probably coming too late, sorry.

> On Thu, Nov 12, 2015 at 09:08:36PM -0500, David Malcolm wrote:
> > index 4335a87..eb4e1fc 100644
> > --- a/gcc/c/c-typeck.c
> > +++ b/gcc/c/c-typeck.c
> > @@ -47,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "c-family/c-ubsan.h"
> >  #include "cilk.h"
> >  #include "gomp-constants.h"
> > +#include "spellcheck.h"
> >  
> >  /* Possible cases of implicit bad conversions.  Used to select
> > diagnostic messages in convert_for_assignment.  */
> > @@ -2242,6 +2243,72 @@ lookup_field (tree type, tree component)
> >return tree_cons (NULL_TREE, field, NULL_TREE);
> >  }
> >  
> > +/* Recursively append candidate IDENTIFIER_NODEs to CANDIDATES.  */
> > +
> > +static void
> > +lookup_field_fuzzy_find_candidates (tree type, tree component,
> > +   vec<tree> *candidates)
> > +{
> > +  tree field;
> > +  for (field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
> 
> I'd prefer declaring field in the for loop, so
>   for (tree field = TYPE_FIELDS...
> 
> > + && (TREE_CODE (TREE_TYPE (field)) == RECORD_TYPE
> > + || TREE_CODE (TREE_TYPE (field)) == UNION_TYPE))
> 
> This is RECORD_OR_UNION_TYPE_P (TREE_TYPE (field)).

I based this code on the code in lookup_field right above it;
I copied-and-pasted that conditional, so presumably it should also be
changed in lookup_field (which has the condition twice)?

FWIW I notice RECORD_OR_UNION_TYPE_P also covers QUAL_UNION_TYPE.

/* Nonzero if TYPE is a record or union type.  */
#define RECORD_OR_UNION_TYPE_P(TYPE)   \
  (TREE_CODE (TYPE) == RECORD_TYPE     \
   || TREE_CODE (TYPE) == UNION_TYPE   \
   || TREE_CODE (TYPE) == QUAL_UNION_TYPE)

FWIW I've made the change in the attached patch (both to the new
function, and to lookup_field).

> > +   {
> > + lookup_field_fuzzy_find_candidates (TREE_TYPE (field),
> > + component,
> > + candidates);
> > +   }
> 
> Lose the brackets around a single statement.

Done.
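
To illustrate the recursion being reviewed here, a minimal sketch in plain C follows. It does not use GCC's tree API: `sketch_field`, `sketch_type` and `collect_candidates` are hypothetical stand-ins. The first half of the condition is not visible in the quoted diff, so the sketch assumes that recursion happens only for unnamed members of record or union type, whose members are reachable through the enclosing type.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for GCC's tree nodes; every name here is hypothetical.  */
struct sketch_type;

struct sketch_field
{
  const char *name;                 /* NULL for an anonymous member.  */
  const struct sketch_type *type;   /* NULL for a scalar member.  */
  const struct sketch_field *next;  /* Next member, like DECL_CHAIN.  */
};

struct sketch_type
{
  bool is_record_or_union;
  const struct sketch_field *fields;  /* First member, like TYPE_FIELDS.  */
};

/* Append the names reachable from TYPE to OUT, starting at index N and
   never writing past CAP entries.  Returns the new count.  */
static size_t
collect_candidates (const struct sketch_type *type,
                    const char **out, size_t cap, size_t n)
{
  for (const struct sketch_field *field = type->fields; field;
       field = field->next)
    {
      /* Assumed condition: an anonymous struct/union exposes its
         members in the parent, so descend into it.  */
      if (field->name == NULL
          && field->type != NULL
          && field->type->is_record_or_union)
        n = collect_candidates (field->type, out, cap, n);

      /* Named members are themselves candidates.  */
      if (field->name != NULL && n < cap)
        out[n++] = field->name;
    }
  return n;
}
```

For a type laid out like `struct { struct { int x; int y; }; int z; }`, the sketch collects `x`, `y` and `z` in that order.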

> > +  if (DECL_NAME (field))
> > +   candidates->safe_push (DECL_NAME (field));
> > +}
> > +}
> > +
> > +/* Like "lookup_field", but find the closest matching IDENTIFIER_NODE,
> > +   rather than returning a TREE_LIST for an exact match.  */
> > +
> > +static tree
> > +lookup_field_fuzzy (tree type, tree component)
> > +{
> > +  gcc_assert (TREE_CODE (component) == IDENTIFIER_NODE);
> > +
> > +  /* First, gather a list of candidates.  */
> > +  auto_vec <tree> candidates;
> > +
> > +  lookup_field_fuzzy_find_candidates (type, component,
> > + &candidates);
> > +
> > +  /* Now determine which is closest.  */
> > +  int i;
> > +  tree identifier;
> > +  tree best_identifier = NULL;
> 
> NULL_TREE

Fixed.

> > +  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
> > +  FOR_EACH_VEC_ELT (candidates, i, identifier)
> > +{
> > +  gcc_assert (TREE_CODE (identifier) == IDENTIFIER_NODE);
> > +  edit_distance_t dist = levenshtein_distance (component, identifier);
> > +  if (dist < best_distance)
> > +   {
> > + best_distance = dist;
> > + best_identifier = identifier;
> > +   }
> > +}
> > +
> > +  /* If more than half of the letters were misspelled, the suggestion is
> > + likely to be meaningless.  */
> > +  if (best_identifier)
> > +{
> > +  unsigned int cutoff = MAX (IDENTIFIER_LENGTH (component),
> > +IDENTIFIER_LENGTH (best_identifier)) / 2;
> > +  if (best_distance > cutoff)
> > +   return NULL;
> 
> NULL_TREE

Fixed.
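
The cutoff in the quoted hunk is plain integer arithmetic, sketched below in C under the assumption that the edit distance has already been computed. The name `suggestion_is_plausible` is hypothetical; the rule itself (reject when the distance exceeds half the longer length) is taken from the quoted code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if a candidate at edit distance DIST is
   close enough to be worth suggesting.  The cutoff is half the longer
   of the two identifier lengths, using integer division, and a
   candidate is rejected only when DIST is strictly greater.  */
static bool
suggestion_is_plausible (size_t typed_len, size_t candidate_len,
                         unsigned int dist)
{
  size_t longer = typed_len > candidate_len ? typed_len : candidate_len;
  size_t cutoff = longer / 2;
  return dist <= cutoff;
}
```

For example, "colour" (6 letters) against "color" (5 letters) gives a cutoff of 3, so a distance of 1 is accepted, whereas "foo" against "bar" gives a cutoff of 1, so a distance of 3 is rejected.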

> > +/* The Levenshtein distance is an "edit-distance": the minimal
> > +   number of one-character insertions, removals or substitutions
> > +   that are needed to change one string into another.
> > +
> > +   This implementation uses the Wagner-Fischer algorithm.  */
> > +
> > +static edit_distance_t
> > +levenshtein_distance (const char *s, int len_s,
> > + const char *t, int len_t)
> > +{
> > +  const bool debug = false;
> > +
> > +  if (debug)
> > +{
> > +  printf ("s: \"%s\" (len_s=%i)\n", s, len_s);
> > +  printf ("t: \"%s\" (len_t=%i)\n", t, len_t);
> > +}
> 
> Did you leave this debug stuff here intentionally?

I find it useful, but I believe it's against our policy, so I've deleted
it in the attached patch.
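
For readers without the patch at hand, here is a minimal two-row sketch of the Wagner-Fischer algorithm in plain C. It follows the variable names visible in the quoted hunk (`v0`, `v1`, `cost`, `deletion`, `insertion`), but it is not the committed spellcheck.c: the name `levenshtein_sketch`, the `unsigned int` typedef and the allocation handling are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned int edit_distance_t;  /* Assumed width.  */

/* Return the number of one-character insertions, removals or
   substitutions needed to turn S (LEN_S chars) into T (LEN_T chars).  */
static edit_distance_t
levenshtein_sketch (const char *s, int len_s, const char *t, int len_t)
{
  assert (s && t && len_s >= 0 && len_t >= 0);
  if (len_s == 0)
    return len_t;
  if (len_t == 0)
    return len_s;

  /* Only two rows of the distance matrix are live at any time.  */
  edit_distance_t *v0 = malloc ((len_s + 1) * sizeof *v0);
  edit_distance_t *v1 = malloc ((len_s + 1) * sizeof *v1);
  if (!v0 || !v1)
    abort ();

  /* First row: matching a prefix of S against the empty string.  */
  for (int j = 0; j <= len_s; j++)
    v0[j] = j;

  for (int i = 0; i < len_t; i++)
    {
      /* First column: matching the empty string against a prefix of T.  */
      v1[0] = i + 1;

      /* Fill the rest of the row from the west, north and northwest.  */
      for (int j = 0; j < len_s; j++)
        {
          edit_distance_t cost = (s[j] == t[i] ? 0 : 1);
          edit_distance_t deletion = v1[j] + 1;
          edit_distance_t insertion = v0[j + 1] + 1;
          edit_distance_t substitution = v0[j] + cost;
          edit_distance_t best = deletion < insertion ? deletion : insertion;
          v1[j + 1] = best < substitution ? best : substitution;
        }

      /* The row just built becomes the "previous" row.  */
      edit_distance_t *tmp = v0;
      v0 = v1;
      v1 = tmp;
    }

  edit_distance_t result = v0[len_s];
  free (v0);
  free (v1);
  return result;
}
```

Swapping the row pointers instead of copying keeps the inner loop as the only per-character work, and memory use is proportional to the length of `s`.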

> > +  /* Build the rest of the row by considering neighbours to
> > +the north, west and northwest.  */
> > +  for (int j = 0; j < len_s; j++)
> > +   {
> > + edit_distance_t cost = (s[j] == t[i] ? 0 : 1);
> > + edit_distance_t deletion = v1[j] + 1;
> > + edit_distance_t insertion= v0[j + 1] + 1;
> 
> The formatting doesn't look right here.

It's correct; it's "diff" inserting two spaces before a tab combined
with our mixed spaces+tab convention: the "for" is at column 6 (6
spaces), whereas the other lines are at column 8 (1 tab), which looks
weird in a diff.

Re: [PATCH 0/2] Levenshtein-based suggestions (v3)

2015-11-13 Thread Marek Polacek
On Fri, Nov 13, 2015 at 07:16:08AM -0500, David Malcolm wrote:
> > > +   && (TREE_CODE (TREE_TYPE (field)) == RECORD_TYPE
> > > +   || TREE_CODE (TREE_TYPE (field)) == UNION_TYPE))
> > 
> > This is RECORD_OR_UNION_TYPE_P (TREE_TYPE (field)).
> 
> I based this code on the code in lookup_field right above it;
> I copied-and-pasted that conditional, so presumably it should also be
> changed in lookup_field (which has the condition twice)?
> 
> FWIW I notice RECORD_OR_UNION_TYPE_P also covers QUAL_UNION_TYPE.
> 
> /* Nonzero if TYPE is a record or union type.  */
> #define RECORD_OR_UNION_TYPE_P(TYPE)   \
>   (TREE_CODE (TYPE) == RECORD_TYPE     \
>    || TREE_CODE (TYPE) == UNION_TYPE   \
>    || TREE_CODE (TYPE) == QUAL_UNION_TYPE)
> 
> FWIW I've made the change in the attached patch (both to the new
> function, and to lookup_field).

Sorry, I changed my mind.  Since QUAL_UNION_TYPE is an Ada-only thing and
we check (RECORD_TYPE || UNION_TYPE) in a lot of places in the C FE,
introducing RECORD_OR_UNION_TYPE_P everywhere would unnecessarily slow
things down.  I think we should have a C FE-only macro, maybe called
RECORD_OR_UNION_TYPE_P that only checks for those two types, but this is
something that I can deal with later on.

So please just drop these changes for now.  Sorry again.

> > > +  const bool debug = false;
> > > +
> > > +  if (debug)
> > > +{
> > > +  printf ("s: \"%s\" (len_s=%i)\n", s, len_s);
> > > +  printf ("t: \"%s\" (len_t=%i)\n", t, len_t);
> > > +}
> > 
> > Did you leave this debug stuff here intentionally?
> 
> I find it useful, but I believe it's against our policy, so I've deleted
> it in the attached patch.

Probably.  But you could surely have a separate DEBUG_FUNCTION that can be
called from gdb.
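
A minimal sketch of that idea in plain C follows. The function names are hypothetical, and the `DEBUG_FUNCTION` definition here is a stand-in that assumes GCC's macro boils down to the `used` attribute, which keeps an otherwise unreferenced function in the binary so a debugger can call it.

```c
#include <stdio.h>

/* Stand-in for GCC's DEBUG_FUNCTION marker (assumed expansion).  */
#if defined (__GNUC__)
# define DEBUG_FUNCTION __attribute__ ((__used__))
#else
# define DEBUG_FUNCTION
#endif

/* Hypothetical helper: format both inputs into BUF.  "%.*s" is used
   because the strings come with explicit lengths and need not be
   NUL-terminated.  Returns snprintf's result.  */
static int
format_distance_inputs (char *buf, size_t size,
                        const char *s, int len_s,
                        const char *t, int len_t)
{
  return snprintf (buf, size,
                   "s: \"%.*s\" (len_s=%i)\nt: \"%.*s\" (len_t=%i)\n",
                   len_s, s, len_s, len_t, t, len_t);
}

/* Dump the inputs to stderr.  Never called from the normal code path;
   intended to be invoked by hand from a debugger.  */
DEBUG_FUNCTION void
debug_distance_inputs (const char *s, int len_s, const char *t, int len_t)
{
  char buf[256];
  format_distance_inputs (buf, sizeof buf, s, len_s, t, len_t);
  fputs (buf, stderr);
}
```

From a gdb session stopped inside the distance function, something like `call debug_distance_inputs (s, len_s, t, len_t)` would then print the two strings without any debug code in the function body itself.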
 
> > > +  /* Build the rest of the row by considering neighbours to
> > > +  the north, west and northwest.  */
> > > +  for (int j = 0; j < len_s; j++)
> > > + {
> > > +   edit_distance_t cost = (s[j] == t[i] ? 0 : 1);
> > > +   edit_distance_t deletion = v1[j] + 1;
> > > +   edit_distance_t insertion= v0[j + 1] + 1;
> > 
> > The formatting doesn't look right here.
> 
> It's correct; it's "diff" inserting two spaces before a tab combined
> with our mixed spaces+tab convention: the "for" is at column 6 (6
> spaces), whereas the other lines are at column 8 (1 tab), which looks
> weird in a diff.

Sorry, what I had in mind were the spaces after "deletion" and "insertion"
before "=".  Not a big deal, of course.
 
> Patch attached; only tested lightly so far (compiles, and passes
> spellcheck subset of tests).
> 
> OK for trunk if it passes bootstrap?

Ok modulo the RECORD_OR_UNION_TYPE_P changes, thanks.

Marek


Re: [PATCH 0/2] Levenshtein-based suggestions (v3)

2015-11-13 Thread Jakub Jelinek
On Fri, Nov 13, 2015 at 04:53:05PM +0100, Marek Polacek wrote:
> On Fri, Nov 13, 2015 at 04:44:21PM +0100, Bernd Schmidt wrote:
> > On 11/13/2015 04:11 PM, Marek Polacek wrote:
> > > >Sorry, I changed my mind.  Since QUAL_UNION_TYPE is an Ada-only thing and
> > >we check (RECORD_TYPE || UNION_TYPE) in a lot of places in the C FE,
> > >introducing RECORD_OR_UNION_TYPE_P everywhere would unnecessarily slow
> > >things down.
> > 
> > I don't think so, the three codes are adjacent so we should be generating
> > "(unsigned)(code - RECORD_TYPE) < 3".
> 
> Interesting.  Yeah, if we change the RECORD_OR_UNION_TYPE_P macro to this
> form, then we don't need a separate version for the C FE.

Why?  The compiler should do that already, or do you care about
-O0 builds or host compilers other than gcc that aren't able to do this?
The disadvantage of writing it manually that way is that you need to assert
somewhere that the 3 values are indeed consecutive, whereas
when the (host?) compiler performs this optimization, it does so only if
they are consecutive; if they are not, the code will just be less efficient.
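
The trade-off can be sketched in plain C as follows. The enum is an illustrative stand-in (the real codes are generated from GCC's tree.def) and every `sk_`/`SK_` name is hypothetical; the point is only that the hand-written range check needs a compile-time guard, while the three-way comparison stays correct whatever the values are.

```c
#include <stdbool.h>

/* Illustrative tree codes; only their relative order matters here.  */
enum sketch_tree_code
{
  SK_INTEGER_TYPE,
  SK_RECORD_TYPE,
  SK_UNION_TYPE,
  SK_QUAL_UNION_TYPE,
  SK_POINTER_TYPE
};

/* The manual range check below is valid only while the three codes
   are consecutive, so that has to be asserted somewhere.  */
_Static_assert (SK_UNION_TYPE == SK_RECORD_TYPE + 1
                && SK_QUAL_UNION_TYPE == SK_RECORD_TYPE + 2,
                "record/union codes must be consecutive");

/* Straightforward form: correct regardless of the enum layout.  An
   optimizing compiler may fold it into a range check by itself.  */
static bool
sk_record_or_union_compare (enum sketch_tree_code code)
{
  return (code == SK_RECORD_TYPE
          || code == SK_UNION_TYPE
          || code == SK_QUAL_UNION_TYPE);
}

/* Hand-written form: codes below SK_RECORD_TYPE wrap around to a
   large unsigned value, so one comparison covers both bounds.  */
static bool
sk_record_or_union_range (enum sketch_tree_code code)
{
  return (unsigned int) (code - SK_RECORD_TYPE) < 3;
}
```

Both functions agree for every code in the enum; if someone later inserted a code between the three, only the first form would keep working, and the static assertion would stop the second from compiling.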

Jakub


Re: [PATCH 0/2] Levenshtein-based suggestions (v3)

2015-11-13 Thread Marek Polacek
On Fri, Nov 13, 2015 at 04:56:30PM +0100, Jakub Jelinek wrote:
> On Fri, Nov 13, 2015 at 04:53:05PM +0100, Marek Polacek wrote:
> > On Fri, Nov 13, 2015 at 04:44:21PM +0100, Bernd Schmidt wrote:
> > > I don't think so, the three codes are adjacent so we should be generating
> > > "(unsigned)(code - RECORD_TYPE) < 3".
> > 
> > Interesting.  Yeah, if we change the RECORD_OR_UNION_TYPE_P macro to this
> > form, then we don't need a separate version for the C FE.
> 
> Why?  The compiler should do that already, or do you care about
> -O0 builds or host compilers other than gcc that aren't able to do this?

I don't.

> The disadvantage of writing it manually that way is that you need to assert
> somewhere that the 3 values are indeed consecutive, whereas
> when the (host?) compiler performs this optimization, it does so only if
> they are consecutive; if they are not, the code will just be less efficient.

Ok, I understand now what Bernd meant.  I didn't realize the compiler already
does such optimization with those _TYPEs...

Marek


Re: [PATCH 0/2] Levenshtein-based suggestions (v3)

2015-11-13 Thread Bernd Schmidt

On 11/13/2015 04:11 PM, Marek Polacek wrote:
> Sorry, I changed my mind.  Since QUAL_UNION_TYPE is an Ada-only thing and
> we check (RECORD_TYPE || UNION_TYPE) in a lot of places in the C FE,
> introducing RECORD_OR_UNION_TYPE_P everywhere would unnecessarily slow
> things down.

I don't think so, the three codes are adjacent so we should be
generating "(unsigned)(code - RECORD_TYPE) < 3".



Bernd


Re: [PATCH 0/2] Levenshtein-based suggestions (v3)

2015-11-13 Thread Marek Polacek
On Fri, Nov 13, 2015 at 04:44:21PM +0100, Bernd Schmidt wrote:
> On 11/13/2015 04:11 PM, Marek Polacek wrote:
> >Sorry, I changed my mind.  Since QUAL_UNION_TYPE is an Ada-only thing and
> >we check (RECORD_TYPE || UNION_TYPE) in a lot of places in the C FE,
> >introducing RECORD_OR_UNION_TYPE_P everywhere would unnecessarily slow
> >things down.
> 
> I don't think so, the three codes are adjacent so we should be generating
> "(unsigned)(code - RECORD_TYPE) < 3".

Interesting.  Yeah, if we change the RECORD_OR_UNION_TYPE_P macro to this
form, then we don't need a separate version for the C FE.

I'll look at this cleanup next week.

Marek


Re: [PATCH 0/2] Levenshtein-based suggestions (v3)

2015-11-12 Thread David Malcolm
On Sun, 2015-11-01 at 23:44 -0700, Jeff Law wrote:
> On 10/30/2015 06:47 AM, David Malcolm wrote:
> 
> > The typename suggestion seems to be at least somewhat controversial,
> > whereas (I hope) the misspelled field names suggestion is more
> > acceptable.
> >
> > Hence I'm focusing on the field name lookup for now; other uses of the
> > algorithm (e.g. the typename lookup) could be done in followup patches,
> > but I'm deferring them for now in the hope of getting the simplest case
> > into trunk as a first step.  Similarly, for simplicity, I didn't
> > implement any attempt at error-recovery using the hint.
> >
> > The following patch kit is in two parts (for ease of review; they would
> > be applied together):
> >
> >patch 1: Implement Levenshtein distance
> >patch 2: C FE: suggest corrections for misspelled field names
> >
> > I didn't implement a limiter, on the grounds that this only fires
> > once per "has no member named" error, and so is unlikely to slow
> > things down noticeably.
> >
> > Successfully bootstrapped the combination of these two
> > on x86_64-pc-linux-gnu (adds 11 new PASS results to gcc.sum)
> >
> > OK for trunk?
> >
> >   gcc/Makefile.in  |   1 +
> >   gcc/c/c-typeck.c |  70 +++-
> >   gcc/spellcheck.c | 136 +++
> >   gcc/spellcheck.h |  32 ++
> >   gcc/testsuite/gcc.dg/plugin/levenshtein-test-1.c |   9 ++
> >   gcc/testsuite/gcc.dg/plugin/levenshtein_plugin.c |  64 +++
> >   gcc/testsuite/gcc.dg/plugin/plugin.exp   |   1 +
> >   gcc/testsuite/gcc.dg/spellcheck-fields.c |  63 +++
> >   8 files changed, 375 insertions(+), 1 deletion(-)
> >   create mode 100644 gcc/spellcheck.c
> >   create mode 100644 gcc/spellcheck.h
> >   create mode 100644 gcc/testsuite/gcc.dg/plugin/levenshtein-test-1.c
> >   create mode 100644 gcc/testsuite/gcc.dg/plugin/levenshtein_plugin.c
> >   create mode 100644 gcc/testsuite/gcc.dg/spellcheck-fields.c
> I'm going to assume you got levenshtein's algorithm reasonably correct.
> 
> This is OK for the trunk.  

Thanks.

FWIW I applied some fixes for the nits identified by Mikael in:
  https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00046.html
renaming params "m" and "n" to "len_s" and "len_t", and fixing the
comment - under the "obvious" rule.

I've committed the combination of the two patches (with the nit fixes)
as r230284; attached is what I committed (for reference).

> Obviously I'd like to see it extend into the 
> other front-ends (C++ in particular).  Then I'd like to see it extend 
> beyond just misspelled field names.

(nods)
From 7d22e0182f7d21f2b18a64530e7f94dd36cec7b0 Mon Sep 17 00:00:00 2001
From: David Malcolm 
Date: Thu, 29 Oct 2015 15:29:26 -0400
Subject: [PATCH] Implement Levenshtein distance; use in C FE for
 misspelled field names

This is the combination of:
  [PATCH 1/2] Implement Levenshtein distance
  [PATCH 2/2] C FE: suggest corrections for misspelled field names
plus some nit fixes to spellcheck.c.

gcc/ChangeLog:
	* Makefile.in (OBJS): Add spellcheck.o.
	* spellcheck.c: New file.
	* spellcheck.h: New file.

gcc/c/ChangeLog:
	* c-typeck.c: Include spellcheck.h.
	(lookup_field_fuzzy_find_candidates): New function.
	(lookup_field_fuzzy): New function.
	(build_component_ref): If the field was not found, try using
	lookup_field_fuzzy and potentially offer a suggestion.

gcc/testsuite/ChangeLog:
	* gcc.dg/plugin/levenshtein-test-1.c: New file.
	* gcc.dg/plugin/levenshtein_plugin.c: New file.
	* gcc.dg/plugin/plugin.exp (plugin_test_list): Add
	levenshtein_plugin.c.
	* gcc.dg/spellcheck-fields.c: New file.
---
 gcc/Makefile.in  |   1 +
 gcc/c/c-typeck.c |  74 +++-
 gcc/spellcheck.c | 136 +++
 gcc/spellcheck.h |  32 ++
 gcc/testsuite/gcc.dg/plugin/levenshtein-test-1.c |   9 ++
 gcc/testsuite/gcc.dg/plugin/levenshtein_plugin.c |  64 +++
 gcc/testsuite/gcc.dg/plugin/plugin.exp   |   1 +
 gcc/testsuite/gcc.dg/spellcheck-fields.c |  63 +++
 8 files changed, 379 insertions(+), 1 deletion(-)
 create mode 100644 gcc/spellcheck.c
 create mode 100644 gcc/spellcheck.h
 create mode 100644 gcc/testsuite/gcc.dg/plugin/levenshtein-test-1.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/levenshtein_plugin.c
 create mode 100644 gcc/testsuite/gcc.dg/spellcheck-fields.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 34d2356..f17234d 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1403,6 +1403,7 @@ OBJS = \
 	shrink-wrap.o \
 	simplify-rtx.o \
 	sparseset.o \
+	spellcheck.o \
 	sreal.o \
 	stack-ptr-mod.o \
 	statistics.o \
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 4335a87..eb4e1fc 100644
--- a/gcc/c/c-typeck.c
+++ 

Re: [PATCH 0/2] Levenshtein-based suggestions (v3)

2015-11-12 Thread Marek Polacek
Probably coming too late, sorry.

On Thu, Nov 12, 2015 at 09:08:36PM -0500, David Malcolm wrote:
> index 4335a87..eb4e1fc 100644
> --- a/gcc/c/c-typeck.c
> +++ b/gcc/c/c-typeck.c
> @@ -47,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "c-family/c-ubsan.h"
>  #include "cilk.h"
>  #include "gomp-constants.h"
> +#include "spellcheck.h"
>  
>  /* Possible cases of implicit bad conversions.  Used to select
> diagnostic messages in convert_for_assignment.  */
> @@ -2242,6 +2243,72 @@ lookup_field (tree type, tree component)
>return tree_cons (NULL_TREE, field, NULL_TREE);
>  }
>  
> +/* Recursively append candidate IDENTIFIER_NODEs to CANDIDATES.  */
> +
> +static void
> +lookup_field_fuzzy_find_candidates (tree type, tree component,
> + vec<tree> *candidates)
> +{
> +  tree field;
> +  for (field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))

I'd prefer declaring field in the for loop, so
  for (tree field = TYPE_FIELDS...

> +   && (TREE_CODE (TREE_TYPE (field)) == RECORD_TYPE
> +   || TREE_CODE (TREE_TYPE (field)) == UNION_TYPE))

This is RECORD_OR_UNION_TYPE_P (TREE_TYPE (field)).

> + {
> +   lookup_field_fuzzy_find_candidates (TREE_TYPE (field),
> +   component,
> +   candidates);
> + }

Lose the brackets around a single statement.

> +  if (DECL_NAME (field))
> + candidates->safe_push (DECL_NAME (field));
> +}
> +}
> +
> +/* Like "lookup_field", but find the closest matching IDENTIFIER_NODE,
> +   rather than returning a TREE_LIST for an exact match.  */
> +
> +static tree
> +lookup_field_fuzzy (tree type, tree component)
> +{
> +  gcc_assert (TREE_CODE (component) == IDENTIFIER_NODE);
> +
> +  /* First, gather a list of candidates.  */
> +  auto_vec <tree> candidates;
> +
> +  lookup_field_fuzzy_find_candidates (type, component,
> +   &candidates);
> +
> +  /* Now determine which is closest.  */
> +  int i;
> +  tree identifier;
> +  tree best_identifier = NULL;

NULL_TREE

> +  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
> +  FOR_EACH_VEC_ELT (candidates, i, identifier)
> +{
> +  gcc_assert (TREE_CODE (identifier) == IDENTIFIER_NODE);
> +  edit_distance_t dist = levenshtein_distance (component, identifier);
> +  if (dist < best_distance)
> + {
> +   best_distance = dist;
> +   best_identifier = identifier;
> + }
> +}
> +
> +  /* If more than half of the letters were misspelled, the suggestion is
> + likely to be meaningless.  */
> +  if (best_identifier)
> +{
> +  unsigned int cutoff = MAX (IDENTIFIER_LENGTH (component),
> +  IDENTIFIER_LENGTH (best_identifier)) / 2;
> +  if (best_distance > cutoff)
> + return NULL;

NULL_TREE

> +/* The Levenshtein distance is an "edit-distance": the minimal
> +   number of one-character insertions, removals or substitutions
> +   that are needed to change one string into another.
> +
> +   This implementation uses the Wagner-Fischer algorithm.  */
> +
> +static edit_distance_t
> +levenshtein_distance (const char *s, int len_s,
> +   const char *t, int len_t)
> +{
> +  const bool debug = false;
> +
> +  if (debug)
> +{
> +  printf ("s: \"%s\" (len_s=%i)\n", s, len_s);
> +  printf ("t: \"%s\" (len_t=%i)\n", t, len_t);
> +}

Did you leave this debug stuff here intentionally?

> +  /* Build the rest of the row by considering neighbours to
> +  the north, west and northwest.  */
> +  for (int j = 0; j < len_s; j++)
> + {
> +   edit_distance_t cost = (s[j] == t[i] ? 0 : 1);
> +   edit_distance_t deletion = v1[j] + 1;
> +   edit_distance_t insertion= v0[j + 1] + 1;

The formatting doesn't look right here.

Marek


Re: [PATCH 0/2] Levenshtein-based suggestions (v3)

2015-11-01 Thread Jeff Law

On 10/30/2015 06:47 AM, David Malcolm wrote:


> The typename suggestion seems to be at least somewhat controversial,
> whereas (I hope) the misspelled field names suggestion is more
> acceptable.
>
> Hence I'm focusing on the field name lookup for now; other uses of the
> algorithm (e.g. the typename lookup) could be done in followup patches,
> but I'm deferring them for now in the hope of getting the simplest case
> into trunk as a first step.  Similarly, for simplicity, I didn't
> implement any attempt at error-recovery using the hint.
>
> The following patch kit is in two parts (for ease of review; they would
> be applied together):
>
>    patch 1: Implement Levenshtein distance
>    patch 2: C FE: suggest corrections for misspelled field names
>
> I didn't implement a limiter, on the grounds that this only fires
> once per "has no member named" error, and so is unlikely to slow
> things down noticeably.
>
> Successfully bootstrapped the combination of these two
> on x86_64-pc-linux-gnu (adds 11 new PASS results to gcc.sum)
>
> OK for trunk?
>
>   gcc/Makefile.in  |   1 +
>   gcc/c/c-typeck.c |  70 +++-
>   gcc/spellcheck.c | 136 +++
>   gcc/spellcheck.h |  32 ++
>   gcc/testsuite/gcc.dg/plugin/levenshtein-test-1.c |   9 ++
>   gcc/testsuite/gcc.dg/plugin/levenshtein_plugin.c |  64 +++
>   gcc/testsuite/gcc.dg/plugin/plugin.exp   |   1 +
>   gcc/testsuite/gcc.dg/spellcheck-fields.c |  63 +++
>   8 files changed, 375 insertions(+), 1 deletion(-)
>   create mode 100644 gcc/spellcheck.c
>   create mode 100644 gcc/spellcheck.h
>   create mode 100644 gcc/testsuite/gcc.dg/plugin/levenshtein-test-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/plugin/levenshtein_plugin.c
>   create mode 100644 gcc/testsuite/gcc.dg/spellcheck-fields.c

I'm going to assume you got levenshtein's algorithm reasonably correct.

This is OK for the trunk.  Obviously I'd like to see it extend into the 
other front-ends (C++ in particular).  Then I'd like to see it extend 
beyond just misspelled field names.


jeff