Bug#616100: flex with %bison-locations yyset_lloc usage

2016-02-15 Thread Manoj Srivastava
reassign 616100 bison
thanks

This is mostly about bison documenting how to use the yyloc_param,
especially for reentrant parsers.

manoj
-- 
PROGRAMMER: (n) Red-eyed, mumbling mammal capable of conversing with
inanimate objects.
Manoj Srivastava    
4096R/C5779A1C E37E 5EC5 2A01 DA25 AD20  05B6 CF48 9438 C577 9A1C



Bug#616100: flex with %bison-locations yyset_lloc usage

2011-03-02 Thread Ian Jackson
Package: flex, bison
Version: 2.5.35-6, 1:2.3.dfsg-5

The parameter yyloc_param to yylex is specified in the bison
documentation to be only a way for yylex to return the location to
bison.  It is not documented to be available for storage by yylex
between calls to the lexer.  In bison, in a reentrant parser, the
variable is allocated on the stack inside yyparse, which means that it
may be corrupted between different calls to yyparse.  And, indeed, a
caller other than bison might reasonably pass an even more local stack
variable for such an out parameter.

Furthermore, in the obvious calling pattern one would expect the
contents to be used uninitialised.  However, it seems that bison _does_
initialise the variable to {1,0,1,0} on entry to yyparse if it has the
default type, but not otherwise.  I can't find this documented
anywhere either.

This causes difficulties because a caller might want to call yyparse
more than once on the same stream, since the application's actions can
cause yyparse to return early by using YYACCEPT or YYABORT.  Currently
such an application would find that the location counting is reset on
each entry to yyparse.
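
The reset described above can be reproduced without bison at all.  The
following is only a simulation under stated assumptions: Loc mirrors the
default YYLTYPE layout, and Stream, toy_lex, toy_parse and
location_was_reset are hypothetical names invented for this sketch, not
anything flex or bison generates.

```c
#include <assert.h>
#include <stdio.h>

/* Same field layout as bison's default YYLTYPE; redeclared here so the
 * sketch stands alone. */
typedef struct {
    int first_line, first_column, last_line, last_column;
} Loc;

/* Hypothetical input stream: only counts the lines consumed so far. */
typedef struct {
    int lines_consumed;
} Stream;

/* Hypothetical lexer: "reads" one line and advances the location it was
 * handed, relying on *loc being preserved between calls. */
void toy_lex(Stream *s, Loc *loc)
{
    s->lines_consumed++;
    loc->first_line = loc->last_line;
    loc->last_line++;
}

/* Hypothetical parser: keeps the location on its own stack and sets it
 * to {1,0,1,0} on entry, as the report observes yyparse doing.  It lexes
 * n_tokens lines, then returns early (as after YYACCEPT) with the last
 * line number the lexer assigned. */
int toy_parse(Stream *s, int n_tokens)
{
    Loc loc = {1, 0, 1, 0};
    for (int i = 0; i < n_tokens; i++)
        toy_lex(s, &loc);
    return loc.first_line;
}

/* Parse the same stream twice.  Returns 1 if the second call's location
 * restarted from line 1 although the stream had already moved on. */
int location_was_reset(void)
{
    Stream s = {0};
    int first = toy_parse(&s, 3);   /* lines 1..3 of the stream */
    int second = toy_parse(&s, 2);  /* really lines 4..5 of the stream */
    assert(s.lines_consumed == 5);
    printf("first=%d second=%d consumed=%d\n",
           first, second, s.lines_consumed);
    return first == 3 && second == 2;
}
```

The second call reports line 2 for what is actually the fifth line of
the stream, because the stack-allocated location was reinitialised.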

With the attached input file, the following command:
  flex --header-file=libxlu_cfg_l.h --outfile=libxlu_cfg_l.c libxlu_cfg_l.l
produces a scanner which relies on the value of yyloc_param passed
(probably by bison) to each yylex call being actually a pointer to the
same structure, untouched from each call to the next.  This is not
in accordance with the documentation, though it does work.


Relatedly, in a reentrant lexer with locations, we get this function:

  void xlu__cfg_yyset_lloc (YYLTYPE *  yylloc_param , yyscan_t yyscanner)
  {
  struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
  yylloc = yylloc_param;
  }

This squirrels away the user's provided pointer!  This is not even
slightly documented in the manual; the manual doesn't mention the
semantics of yyset_lloc at all.  The natural interpretation of the
prototype is that it copies *yylloc_param (ie, the contents), not the
pointer.  Likewise yyget_lloc returns the pointer.
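
To make the difference between the two readings concrete, here is a
minimal sketch.  Scanner, set_lloc_by_pointer, set_lloc_by_copy and
aliasing_demo are hypothetical stand-ins written for illustration; only
the pointer-storing behaviour is taken from the generated function
quoted above.

```c
#include <assert.h>

/* Same field layout as bison's default YYLTYPE; redeclared here so the
 * sketch stands alone. */
typedef struct {
    int first_line, first_column, last_line, last_column;
} Loc;

/* Hypothetical stand-in for the scanner state (struct yyguts_t in the
 * generated code). */
typedef struct {
    Loc *lloc_ptr;   /* what the generated yyset_lloc keeps: the pointer */
    Loc  lloc_copy;  /* what the prototype suggests: a copy of the contents */
} Scanner;

/* Mirrors the generated function: remembers the caller's pointer. */
void set_lloc_by_pointer(Loc *p, Scanner *sc)
{
    sc->lloc_ptr = p;
}

/* The "natural interpretation": copies *p into the scanner. */
void set_lloc_by_copy(const Loc *p, Scanner *sc)
{
    sc->lloc_copy = *p;
}

/* Returns the first_line the scanner sees through the stored pointer
 * after the caller has modified its own variable. */
int aliasing_demo(void)
{
    Scanner sc;
    Loc user = {1, 0, 1, 0};

    set_lloc_by_pointer(&user, &sc);
    set_lloc_by_copy(&user, &sc);

    user.first_line = 42;  /* caller reuses its variable afterwards */

    assert(sc.lloc_copy.first_line == 1);  /* the copy is unaffected */
    return sc.lloc_ptr->first_line;        /* the pointer sees the write */
}
```

With the pointer-storing version, the caller's variable must also stay
alive for as long as the scanner may use it; passing the address of a
short-lived local would leave the scanner with a dangling pointer.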


There are two reasons to change the documentation for the calling
convention for reentrant yylex with locations, rather than the code:

1. This convention, with a persistent location in the parser, avoids
   unnecessary copying of the location on each lexer symbol.

2. Changing it would break old code, unless a new lexer option were
   introduced, which would add complexity.

In this view it would seem that some means needs to be provided for
the user to initialise the location explicitly on entry to yyparse,
either because they want it to have a different type, or because they
want the default type but to preserve the value somehow.
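
One possible shape for such a means, sketched purely as an assumption:
the caller owns the location in a context that outlives each parse call,
and the parser copies it in on entry and back out on return.  ParseCtx,
ctx_lex, ctx_parse and resumed_line are hypothetical names; nothing here
is an existing bison or flex interface.

```c
#include <assert.h>

/* Same field layout as bison's default YYLTYPE; redeclared here so the
 * sketch stands alone. */
typedef struct {
    int first_line, first_column, last_line, last_column;
} Loc;

/* Hypothetical caller-owned context that outlives any single parse. */
typedef struct {
    Loc saved;           /* location carried from one parse to the next */
    int lines_consumed;  /* stands in for the position in the stream */
} ParseCtx;

/* Hypothetical lexer: consumes one line and advances the location. */
void ctx_lex(ParseCtx *ctx, Loc *loc)
{
    ctx->lines_consumed++;
    loc->first_line = loc->last_line;
    loc->last_line++;
}

/* Hypothetical parser entry point: the location is still a stack
 * variable, but it is initialised from, and saved back to, the context. */
int ctx_parse(ParseCtx *ctx, int n_tokens)
{
    Loc loc = ctx->saved;          /* explicit initialisation on entry */
    for (int i = 0; i < n_tokens; i++)
        ctx_lex(ctx, &loc);
    ctx->saved = loc;              /* preserve for the next call */
    return loc.first_line;
}

/* Parse three lines, return early, then parse two more.  Returns the
 * line number reported at the end of the second call. */
int resumed_line(void)
{
    ParseCtx ctx = { {1, 0, 1, 0}, 0 };
    int first = ctx_parse(&ctx, 3);
    assert(first == 3);
    return ctx_parse(&ctx, 2);
}
```

Here the second call continues at line 4 and ends on line 5, so the
reported location keeps matching the position in the stream.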

In any case the reentrant versions of the yyset/get_lloc functions
need to be fixed.  It is difficult to imagine anyone using them in
their current state.

Ian.

/* -*- fundamental -*- */

%{
#include "libxlu_cfg_i.h"

#define ctx ((CfgParseContext*)yyextra)
#define YY_NO_INPUT

#define GOT(x) do{                      \
    yylloc->first_line= yylineno;       \
    return (x);                         \
  }while(0)

/* Some versions of flex have a bug (Fedora bugzilla 612465) which causes
 * it to fail to declare these functions, which it defines.  So declare
 * them ourselves.  Hopefully we won't have to simultaneously support
 * a flex version which declares these differently somehow. */
int xlu__cfg_yyget_column(yyscan_t yyscanner);
void xlu__cfg_yyset_column(int  column_no, yyscan_t yyscanner);

%}

%option warn
%option nodefault
%option batch
%option 8bit
%option yylineno
%option noyywrap
%option bison-bridge
%option bison-locations
%option reentrant
%option prefix=xlu__cfg_yy
%option nounput

%x lexerr

%%

[a-z][_0-9a-z]* {
  yylval->string= xlu__cfgl_strdup(ctx,yytext);
  GOT(IDENT);
}
[0-9][0-9a-fx]* {
  yylval->string= xlu__cfgl_strdup(ctx,yytext);
  GOT(NUMBER);
}

[ \t]

,   { GOT(','); }
\[  { GOT('['); }
\]  { GOT(']'); }
\=  { GOT('='); }
\;  { GOT(';'); }

\n|\#.*\n   { yylloc->first_line= yylineno-1; return NEWLINE; }

\'([^\'\\\n]|\\.)*\'    {
  yylval->string= xlu__cfgl_dequote(ctx,yytext);
  GOT(STRING);
}
\"([^\"\\\n]|\\.)*\"    {
  yylval->string= xlu__cfgl_dequote(ctx,yytext);
  GOT(STRING);
}

[+-.():]    {
  ctx->likely_python= 1;
  BEGIN(lexerr);
  yymore();
}

.   {
  BEGIN(lexerr);
  yymore();
}

lexerr[^