Hi,
I tried to read the whole patch (please leave out the generated
parser.rb next time; it's 90% of the patch for no added value) and it
looks (really) good, so I'm +1, if that was even needed.
It looks like you now allow full expressions in the ${ } tag, which is
really nice.
I have a minor comment (see below).
On Thu, 2009-12-03 at 10:46 -0800, Markus Roberts wrote:
> This patch moves the syntactic aspects of string interpolation up
> into the lexer/parser phase, preparatory to moving the semantic
> portions down to the as yet unnamed futures resolution phase.
>
> This is an enabling move, designed to allow:
>
> * Futures resolution in and between interpolated strings
> * Interpolation of hash elements into strings
> * Removal of certain order-dependent paths
> * Further modularization of the lexer/parser
>
> The key change is switching from viewing strings with interpolation
> as single lexical entities (which await later special case processing)
> to viewing them as formulas for constructing strings, with the internal
> structure of the string exposed by the parser.
>
> Thus a string like:
>
> "Hello $name, are you enjoying ${language_feature}?"
>
> internally becomes something like:
>
> concat("Hello ",$name,", are you enjoying ",$language_feature,"?")
>
> where "concat" is an internal string concatenation function.
>
> A few test cases to show the user observable effects of this change:
>
> notice("string with ${'a nested single quoted string'} inside it.")
> $v2 = 3+4
> notice("string with ${['an array ',3,'+',4,'=',$v2]} in it.")
> notice("string with ${(3+5)/4} nested math ops in it.")
>
> ...and so forth.
>
> The key changes in the internals are:
>
> * Unification of SQTEXT and DQTEXT into a new token type STRING (since
> nothing past the lexer cares about the distinction).
> * Creation of several new token types to represent the components of
> an interpolated string:
>
> DQPRE The initial portion of an interpolated string
> DQMID The portion of a string betwixt two interpolations
> DQPOST The final portion of an interpolated string
> DQCONT The as-yet-unlexed portion after an interpolation
>
> Thus, in the example above (phantom curly braces added for clarity),
>
> DQPRE "Hello ${
> DQMID }, are you enjoying ${
> DQPOST }?"
>
> DQCONT is a bookkeeping token and is never generated.
> * Creation of a DOLLAR_VAR token to strip the "$" off of variables
> with explicit dollar signs, so that the VARIABLEs produced from
> things like "Test ${x}" (where the "$" has already been consumed)
> do not fail for want of a "$"
> * Reworking the grammar rules in the obvious way
> * Introduction of a "concatenation" AST node type (which will be going
> away in a subsequent refactor).
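Just to check that I follow: for the "Hello" example above, the full
token stream (ignoring the trailing EOF pair) would be something like

  [:DQPRE,    "Hello "]
  [:VARIABLE, "name"]
  [:DQMID,    ", are you enjoying "]
  [:VARIABLE, "language_feature"]
  [:DQPOST,   "?"]

which the dqrval/dqtail grammar rules then fold back into a single
AST::Concat, right?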
>
> Note finally that this is a component of a set of interrelated refactors,
> and some of the changes around the edges of the above will only make
> sense in context of the other parts.
>
> Signed-off-by: Markus Roberts <[email protected]>
> ---
> lib/puppet/parser/ast/leaf.rb | 10 +
> lib/puppet/parser/grammar.ra | 18 +-
> lib/puppet/parser/lexer.rb | 152 ++--
> lib/puppet/parser/parser.rb | 1964
> +++++++++++++++++++++--------------------
> spec/unit/parser/lexer.rb | 229 ++----
> 5 files changed, 1181 insertions(+), 1192 deletions(-)
>
> diff --git a/lib/puppet/parser/ast/leaf.rb b/lib/puppet/parser/ast/leaf.rb
> index c8ac6f7..957d271 100644
> --- a/lib/puppet/parser/ast/leaf.rb
> +++ b/lib/puppet/parser/ast/leaf.rb
> @@ -72,6 +72,16 @@ class Puppet::Parser::AST
> end
> end
>
> + class Concat < AST::Leaf
> + def evaluate(scope)
> + @value.collect { |x| x.evaluate(scope) }.join
> + end
> +
> + def to_s
> + "concat(#[email protected](',')})"
> + end
These to_s methods are mainly used by puppetdoc (RDoc) to recreate the
puppet manifest from the AST nodes. Please change this method to return
something closer to the originally parsed string, otherwise all
interpolated strings will look strange in puppetdoc.
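Something along these lines would probably be enough (untested, and
assuming AST::String leaves expose their raw text via #value and that
the other nodes have a sensible to_s; everything non-literal gets
re-wrapped as ${...}):

  def to_s
    "\"" + @value.collect { |x|
      x.is_a?(Puppet::Parser::AST::String) ? x.value : "${#{x}}"
    }.join + "\""
  end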
But since your comment says this node type will disappear soon in favor
of futures, you can leave it for now.
BTW, I'm wondering how puppetdoc will deal with futures
"interpretation" :-(
I'll look at that when you release the futures patch.
> + end
> +
> # The 'default' option on case statements and selectors.
> class Default < AST::Leaf; end
>
> diff --git a/lib/puppet/parser/grammar.ra b/lib/puppet/parser/grammar.ra
> index 4c74211..078c65b 100644
> --- a/lib/puppet/parser/grammar.ra
> +++ b/lib/puppet/parser/grammar.ra
> @@ -4,7 +4,8 @@
>
> class Puppet::Parser::Parser
>
> -token LBRACK DQTEXT SQTEXT RBRACK LBRACE RBRACE SYMBOL FARROW COMMA TRUE
> +token STRING DQPRE DQMID DQPOST
> +token LBRACK RBRACK LBRACE RBRACE SYMBOL FARROW COMMA TRUE
> token FALSE EQUALS APPENDS LESSEQUAL NOTEQUAL DOT COLON LLCOLLECT RRCOLLECT
> token QMARK LPAREN RPAREN ISEQUAL GREATEREQUAL GREATERTHAN LESSTHAN
> token IF ELSE IMPORT DEFINE ELSIF VARIABLE CLASS INHERITS NODE BOOLEAN
> @@ -421,11 +422,13 @@ funcrvalue: NAME LPAREN funcvalues RPAREN {
> :ftype => :rvalue
> }
>
> -quotedtext: DQTEXT {
> - result = ast AST::String, :value => val[0][:value], :line =>
> val[0][:line]
> -} | SQTEXT {
> - result = ast AST::FlatString, :value => val[0][:value], :line =>
> val[0][:line]
> -}
> +quotedtext: STRING { result = ast AST::String, :value =>
> val[0][:value], :line => val[0][:line] }
> + | DQPRE dqrval { result = ast AST::Concat, :value =>
> [ast(AST::String,val[0])]+val[1], :line => val[0][:line] }
> +
> +dqrval: expression dqtail { result = [val[0]] + val[1] }
> +
> +dqtail: DQPOST { result = [ast(AST::String,val[0])] }
> + | DQMID dqrval { result = [ast(AST::String,val[0])] + val[1] }
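Nice and compact. So a string with two interpolations derives as

  quotedtext -> DQPRE dqrval
             -> DQPRE expression dqtail
             -> DQPRE expression DQMID dqrval
             -> DQPRE expression DQMID expression DQPOST

which lines up exactly with the token stream the lexer emits.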
>
> boolean: BOOLEAN {
> result = ast AST::Boolean, :value => val[0][:value], :line =>
> val[0][:line]
> @@ -696,8 +699,7 @@ nodename: hostname {
> }
>
> hostname: NAME { result = val[0][:value] }
> - | SQTEXT { result = val[0][:value] }
> - | DQTEXT { result = val[0][:value] }
> + | STRING { result = val[0][:value] }
> | DEFAULT { result = val[0][:value] }
> | regex
>
> diff --git a/lib/puppet/parser/lexer.rb b/lib/puppet/parser/lexer.rb
> index bb4fdf9..26e6b60 100644
> --- a/lib/puppet/parser/lexer.rb
> +++ b/lib/puppet/parser/lexer.rb
> @@ -11,11 +11,14 @@ end
> module Puppet::Parser; end
>
> class Puppet::Parser::Lexer
> - attr_reader :last, :file, :lexing_context
> + attr_reader :last, :file, :lexing_context, :token_queue
>
> attr_accessor :line, :indefine
>
> - # Our base token class.
> + def lex_error msg
> + raise Puppet::LexError.new(msg)
> + end
> +
> class Token
> attr_accessor :regex, :name, :string, :skip, :incr_line, :skip_text,
> :accumulate
>
> @@ -28,6 +31,7 @@ class Puppet::Parser::Lexer
> end
> end
>
> + # MQR: Why not just alias?
This is certainly my fault. I wasn't aware of method aliasing when I
wrote this, and this looked like a pattern already used in puppet...
You have my blessing to rewrite it the way you want.
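For instance (untested, but plain Ruby):

  %w{skip accumulate}.each do |m|
    alias_method "#{m}?", m
  end

should behave identically.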
> %w{skip accumulate}.each do |method|
> define_method(method+"?") do
> self.send(method)
> @@ -142,10 +146,13 @@ class Puppet::Parser::Lexer
> '=~' => :MATCH,
> '!~' => :NOMATCH,
> %r{([a-z][-\w]*)?(::[a-z][-\w]*)+} => :CLASSNAME, # Require '::' in
> the class name, else we'd compete with NAME
> - %r{((::){0,1}[A-Z][-\w]*)+} => :CLASSREF
> - )
> -
> - TOKENS.add_tokens "Whatever" => :DQTEXT, "Nomatter" => :SQTEXT,
> "alsonomatter" => :BOOLEAN
> + %r{((::){0,1}[A-Z][-\w]*)+} => :CLASSREF,
> + "<string>" => :STRING,
> + "<dqstring up to first interpolation>" => :DQPRE,
> + "<dqstring between two interpolations>" => :DQMID,
> + "<dqstring after final interpolation>" => :DQPOST,
> + "<boolean>" => :BOOLEAN
> + )
>
> TOKENS.add_token :NUMBER,
> %r{\b(?:0[xX][0-9A-Fa-f]+|0?\d+(?:\.\d+)?(?:[eE]-?\d+)?)\b} do |lexer, value|
> [TOKENS[:NAME], value]
> @@ -163,6 +170,9 @@ class Puppet::Parser::Lexer
> end
> [string_token, value]
> end
> + def (TOKENS[:NAME]).acceptable?(context={})
> + ![:DQPRE,:DQMID].include? context[:after]
> + end
>
> TOKENS.add_token :COMMENT, %r{#.*}, :accumulate => true, :skip => true
> do |lexer,value|
> value.sub!(/# ?/,'')
> @@ -176,7 +186,7 @@ class Puppet::Parser::Lexer
> [self,value]
> end
>
> - regex_token = TOKENS.add_token :REGEX, %r{/[^/\n]*/} do |lexer, value|
> + TOKENS.add_token :REGEX, %r{/[^/\n]*/} do |lexer, value|
> # Make sure we haven't matched an escaped /
> while value[-2..-2] == '\\'
> other = lexer.scan_until(%r{/})
> @@ -186,27 +196,40 @@ class Puppet::Parser::Lexer
> [self, Regexp.new(regex)]
> end
>
> - def regex_token.acceptable?(context={})
> + def (TOKENS[:REGEX]).acceptable?(context={})
> [:NODE,:LBRACE,:RBRACE,:MATCH,:NOMATCH,:COMMA].include?
> context[:after]
> end
>
> TOKENS.add_token :RETURN, "\n", :skip => true, :incr_line => true,
> :skip_text => true
>
> TOKENS.add_token :SQUOTE, "'" do |lexer, value|
> - value = lexer.slurpstring(value)
> - [TOKENS[:SQTEXT], value]
> + [TOKENS[:STRING], lexer.slurpstring(value).first ]
> end
>
> - TOKENS.add_token :DQUOTE, '"' do |lexer, value|
> - value = lexer.slurpstring(value)
> - [TOKENS[:DQTEXT], value]
> + DQ_initial_token_types = {'$' => :DQPRE,'"' => :STRING}
> + DQ_continuation_token_types = {'$' => :DQMID,'"' => :DQPOST}
> +
> + TOKENS.add_token :DQUOTE, /"/ do |lexer, value|
> + lexer.tokenize_interpolated_string(DQ_initial_token_types)
> end
>
> - TOKENS.add_token :VARIABLE, %r{\$(\w*::)*\w+} do |lexer, value|
> - value = value.sub(/^\$/, '')
> - [self, value]
> + TOKENS.add_token :DQCONT, /\}/ do |lexer, value|
> + lexer.tokenize_interpolated_string(DQ_continuation_token_types)
> + end
> + def (TOKENS[:DQCONT]).acceptable?(context={})
> + context[:string_interpolation_depth] > 0
> end
>
> + TOKENS.add_token :DOLLAR_VAR, %r{\$(\w*::)*\w+} do |lexer, value|
> + [TOKENS[:VARIABLE],value[1..-1]]
> + end
> +
> + TOKENS.add_token :VARIABLE, %r{(\w*::)*\w+}
> + def (TOKENS[:VARIABLE]).acceptable?(context={})
> + [:DQPRE,:DQMID].include? context[:after]
> + end
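If I read the two acceptable? guards right, the braced form is what
they are for: in "${bar}" the expression is lexed by the normal token
loop, where NAME refuses to match right after DQPRE/DQMID and VARIABLE
only matches there (the bare "$bar" form is handled inline by
tokenize_interpolated_string below). So, hand-run and roughly (value
hashes and the trailing [false, false] pair elided):

  lexer = Puppet::Parser::Lexer.new
  lexer.string = '"foo ${bar} baz"'
  lexer.fullscan
  # => [:DQPRE, "foo "], [:VARIABLE, "bar"], [:DQPOST, " baz"]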
> +
> +
> TOKENS.sort_tokens
>
> @@pairs = {
> @@ -244,9 +267,7 @@ class Puppet::Parser::Lexer
> def expected
> return nil if @expected.empty?
> name = @expected[-1]
> - raise "Could not find expected token %s" % name unless token =
> TOKENS.lookup(name)
> -
> - return token
> + TOKENS.lookup(name) or lex_error "Could not find expected token
> #{name}"
> end
>
> # scan the whole file
> @@ -274,22 +295,19 @@ class Puppet::Parser::Lexer
> }
> end
>
> - def find_string_token
> - matched_token = value = nil
> + def shift_token
> + @token_queue.shift
> + end
>
> + def find_string_token
> # We know our longest string token is three chars, so try each size
> in turn
> # until we either match or run out of chars. This way our
> worst-case is three
> - # tries, where it is otherwise the number of string chars we have.
> Also,
> tries, where it is otherwise the number of string tokens we have.
> Also,
> # the lookups are optimized hash lookups, instead of regex scans.
> - [3, 2, 1].each do |i|
> - str = @scanner.peek(i)
> - if matched_token = TOKENS.lookup(str)
> - value = @scanner.scan(matched_token.regex)
> - break
> - end
> - end
> -
> - return matched_token, value
> + #
> + s = @scanner.peek(3)
> + token = TOKENS.lookup(s[0,3]) || TOKENS.lookup(s[0,2]) ||
> TOKENS.lookup(s[0,1])
> + [ token, token && @scanner.scan(token.regex) ]
> end
>
> # Find the next token that matches a regex. We look for these first.
> @@ -316,7 +334,7 @@ class Puppet::Parser::Lexer
> # Find the next token, returning the string and the token.
> def find_token
> @find += 1
> - find_regex_token || find_string_token
> + shift_token || find_regex_token || find_string_token
> end
>
> def indefine?
> @@ -343,10 +361,15 @@ class Puppet::Parser::Lexer
> @skip = %r{[ \t]+}
>
> @namestack = []
> + @token_queue = []
> @indefine = false
> @expected = []
> @commentstack = [ ['', @line] ]
> - @lexing_context = {:after => nil, :start_of_line => true}
> + @lexing_context = {
> + :after => nil,
> + :start_of_line => true,
> + :string_interpolation_depth => 0
> + }
> end
>
> # Make any necessary changes to the token and/or value.
> @@ -396,28 +419,17 @@ class Puppet::Parser::Lexer
> # this is the heart of the lexer
> def scan
> #Puppet.debug("entering scan")
> - raise Puppet::LexError.new("Invalid or empty string") unless @scanner
> + lex_error "Invalid or empty string" unless @scanner
>
> # Skip any initial whitespace.
> skip()
>
> - until @scanner.eos? do
> + until token_queue.empty? and @scanner.eos? do
> yielded = false
> matched_token, value = find_token
>
> # error out if we didn't match anything at all
> - if matched_token.nil?
> - nword = nil
> - # Try to pull a 'word' out of the remaining string.
> - if @scanner.rest =~ /^(\S+)/
> - nword = $1
> - elsif @scanner.rest =~ /^(\s+)/
> - nword = $1
> - else
> - nword = @scanner.rest
> - end
> - raise "Could not match '%s'" % nword
> - end
> + lex_error "Could not match #{@scanner.rest[/^(\S+|\s+|.*)/]}"
> unless matched_token
>
> newline = matched_token.name == :RETURN
>
> @@ -433,6 +445,8 @@ class Puppet::Parser::Lexer
> end
>
> lexing_context[:after] = final_token.name unless newline
> + lexing_context[:string_interpolation_depth] += 1 if
> final_token.name == :DQPRE
> + lexing_context[:string_interpolation_depth] -= 1 if
> final_token.name == :DQPOST
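If I follow, this depth counter is what keeps DQCONT from hijacking
every right brace: regex tokens are tried before string tokens, so /\}/
would otherwise always beat RBRACE, but with the acceptable? guard a
"}" only resumes string lexing when we're actually inside a "${ }".
Roughly (using the tokens_scanned_from helper from the specs below):

  tokens_scanned_from('node foo { }')  # "}" lexes as plain :RBRACE
  tokens_scanned_from('"x ${$v} y"')   # "}" fires :DQCONT instead,
                                       # giving :DQPRE, :VARIABLE, :DQPOST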
>
> value = token_value[:value]
>
> @@ -481,24 +495,40 @@ class Puppet::Parser::Lexer
> @scanner.scan_until(regex)
> end
>
> - # we've encountered an opening quote...
> + # we've encountered the start of a string...
> # slurp in the rest of the string and return it
> - def slurpstring(quote)
> + Valid_escapes_in_strings = %w{ \\ $ ' " n t s }+["\n"]
> + def slurpstring(terminators)
> # we search for the next quote that isn't preceded by a
> # backslash; the caret is there to match empty strings
> - str = @scanner.scan_until(/([^\\]|^)#{quote}/)
> - if str.nil?
> - raise Puppet::LexError.new("Unclosed quote after '%s' in '%s'" %
> - [self.last,self.rest])
> - else
> - str.sub!(/#{quote}\Z/,"")
> - str.gsub!(/\\#{quote}/,quote)
> - end
> -
> - # Add to our line count for every carriage return in multi-line
> strings.
> - @line += str.count("\n")
> + str = @scanner.scan_until(/([^\\]|^)[#{terminators}]/) or lex_error
> "Unclosed quote after '#{last}' in '#{rest}'"
> + @line += str.count("\n") # literal carriage returns add to the line
> count.
> + str.gsub!(/\\(.)/) {
> + case ch=$1
> + when 'n'; "\n"
> + when 't'; "\t"
> + when 's'; " "
> + else
> + if Valid_escapes_in_strings.include? ch
> + ch
> + else
> + Puppet.warning "Unrecognised escape sequence
> '\\#{ch}'#{file && " in file #{file}"}#{line && " at line #{line}"}"
> + "\\#{ch}"
> + end
> + end
> + }
> + [ str[0..-2],str[-1,1] ]
> + end
>
> - return str
> + def tokenize_interpolated_string(token_type)
> + value,terminator = slurpstring('"$')
> + token_queue << [TOKENS[token_type[terminator]],value]
> + while terminator == '$' and not @scanner.scan(/\{/)
> + token_queue <<
> [TOKENS[:VARIABLE],@scanner.scan(%r{(\w*::)*\w+|[0-9]})]
> + value,terminator = slurpstring('"$')
> + token_queue <<
> [TOKENS[DQ_continuation_token_types[terminator]],value]
> + end
> + token_queue.shift
> end
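For my own benefit, hand-tracing %q{"a $x b"} through this (so take it
with a grain of salt):

  1. DQUOTE fires on the opening quote and calls this method;
     slurpstring('"$') returns ["a ", '$'], so [:DQPRE, "a "] is queued.
  2. The terminator is '$' and the next char is not '{', so the
     variable is scanned inline: [:VARIABLE, "x"] is queued, and the
     next slurpstring returns [" b", '"'], queueing [:DQPOST, " b"].
  3. The terminator is now '"', the loop exits, and token_queue.shift
     hands [:DQPRE, "a "] back to the scan loop; the queued VARIABLE
     and DQPOST get drained later through shift_token, which is why
     scan now loops until the queue is empty *and* the scanner is at
     eos.

Did I get that right?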
>
> # just parse a string, not a whole file
> diff --git a/spec/unit/parser/lexer.rb b/spec/unit/parser/lexer.rb
> index 959f360..2e58ef4 100755
> --- a/spec/unit/parser/lexer.rb
> +++ b/spec/unit/parser/lexer.rb
> @@ -5,18 +5,12 @@ require File.dirname(__FILE__) + '/../../spec_helper'
> require 'puppet/parser/lexer'
>
> # This is a special matcher to match easily lexer output
> -Spec::Matchers.create :be_like do |ary|
> - match do |result|
> - r = true
> - result.zip(ary) do |a,b|
> - unless a[0] == b[0] and ((a[1].is_a?(Hash) and a[1][:value] ==
> b[1]) or a[1] == b[1])
> - r = false
> - break
> - end
> - end
> - r
> +Spec::Matchers.create :be_like do |*expected|
> + match do |actual|
> + expected.zip(actual).all? { |e,a| !e or a[0] == e or (e.is_a? Array
> and a[0] == e[0] and (a[1] == e[1] or (a[1].is_a?(Hash) and a[1][:value] ==
> e[1]))) }
> end
> end
> +__ = nil
What's the use of this?
> describe Puppet::Parser::Lexer do
> describe "when reading strings" do
> @@ -217,7 +211,7 @@ describe Puppet::Parser::Lexer::TOKENS do
> end
>
> # These tokens' strings don't matter, just that the tokens exist.
> - [:DQTEXT, :SQTEXT, :BOOLEAN, :NAME, :NUMBER, :COMMENT, :MLCOMMENT,
> :RETURN, :SQUOTE, :DQUOTE, :VARIABLE].each do |name|
> + [:STRING, :DQPRE, :DQMID, :DQPOST, :BOOLEAN, :NAME, :NUMBER, :COMMENT,
> :MLCOMMENT, :RETURN, :SQUOTE, :DQUOTE, :VARIABLE].each do |name|
> it "should have a token named #{name.to_s}" do
> Puppet::Parser::Lexer::TOKENS[name].should_not be_nil
> end
> @@ -294,7 +288,6 @@ end
> describe Puppet::Parser::Lexer::TOKENS[:NUMBER] do
> before do
> @token = Puppet::Parser::Lexer::TOKENS[:NUMBER]
> -# @regex = Regexp.new('^'[email protected]+'$')
> @regex = @token.regex
> end
>
> @@ -401,48 +394,42 @@ describe Puppet::Parser::Lexer::TOKENS[:RETURN] do
> end
> end
>
> -describe Puppet::Parser::Lexer::TOKENS[:SQUOTE] do
> - before { @token = Puppet::Parser::Lexer::TOKENS[:SQUOTE] }
> -
> - it "should match against single quotes" do
> - @token.regex.should =~ "'"
> - end
> -
> - it "should slurp the rest of the quoted string" do
> - lexer = stub("lexer")
> - lexer.expects(:slurpstring).with("myval").returns("otherval")
> - @token.convert(lexer, "myval")
> - end
> -
> - it "should return the SQTEXT token with the slurped string" do
> - lexer = stub("lexer")
> - lexer.stubs(:slurpstring).with("myval").returns("otherval")
> - @token.convert(lexer, "myval").should ==
> [Puppet::Parser::Lexer::TOKENS[:SQTEXT], "otherval"]
> - end
> +def tokens_scanned_from(s)
> + lexer = Puppet::Parser::Lexer.new
> + lexer.string = s
> + lexer.fullscan[0..-2]
> end
>
> -describe Puppet::Parser::Lexer::TOKENS[:DQUOTE] do
> - before { @token = Puppet::Parser::Lexer::TOKENS[:DQUOTE] }
> -
> - it "should match against single quotes" do
> - @token.regex.should =~ '"'
> - end
> -
> - it "should slurp the rest of the quoted string" do
> - lexer = stub("lexer")
> - lexer.expects(:slurpstring).with("myval").returns("otherval")
> - @token.convert(lexer, "myval")
> - end
> -
> - it "should return the DQTEXT token with the slurped string" do
> - lexer = stub("lexer")
> - lexer.stubs(:slurpstring).with("myval").returns("otherval")
> - @token.convert(lexer, "myval").should ==
> [Puppet::Parser::Lexer::TOKENS[:DQTEXT], "otherval"]
> - end
> +describe Puppet::Parser::Lexer,"when lexing strings" do
> + {
> + %q['single quoted string')] =>
> [[:STRING,'single quoted string']],
> + %q["double quoted string"] =>
> [[:STRING,'double quoted string']],
> + %q['single quoted string with an escaped "\\'"'] =>
> [[:STRING,'single quoted string with an escaped "\'"']],
> + %q["string with an escaped '\\"'"] =>
> [[:STRING,"string with an escaped '\"'"]],
> + %q["string with an escaped '\\$'"] =>
> [[:STRING,"string with an escaped '$'"]],
> + %q["string with $v (but no braces)"] =>
> [[:DQPRE,"string with "],[:VARIABLE,'v'],[:DQPOST,' (but no braces)']],
> + %q["string with ${v} in braces"] =>
> [[:DQPRE,"string with "],[:VARIABLE,'v'],[:DQPOST,' in braces']],
> + %q["string with $v and $v (but no braces)"] =>
> [[:DQPRE,"string with "],[:VARIABLE,"v"],[:DQMID," and
> "],[:VARIABLE,"v"],[:DQPOST," (but no braces)"]],
> + %q["string with ${v} and ${v} in braces"] =>
> [[:DQPRE,"string with "],[:VARIABLE,"v"],[:DQMID," and
> "],[:VARIABLE,"v"],[:DQPOST," in braces"]],
> + %q["string with ${'a nested single quoted string'} inside it."] =>
> [[:DQPRE,"string with "],[:STRING,'a nested single quoted string'],[:DQPOST,'
> inside it.']],
> + %q["string with ${['an array ',$v2]} in it."] =>
> [[:DQPRE,"string with "],:LBRACK,[:STRING,"an array
> "],:COMMA,[:VARIABLE,"v2"],:RBRACK,[:DQPOST," in it."]],
> + %q{a simple "scanner" test} =>
> [[:NAME,"a"],[:NAME,"simple"], [:STRING,"scanner"],[:NAME,"test"]],
> + %q{a simple 'single quote scanner' test} =>
> [[:NAME,"a"],[:NAME,"simple"], [:STRING,"single quote
> scanner"],[:NAME,"test"]],
> + %q{a harder 'a $b \c"'} =>
> [[:NAME,"a"],[:NAME,"harder"], [:STRING,'a $b \c"']],
> + %q{a harder "scanner test"} =>
> [[:NAME,"a"],[:NAME,"harder"], [:STRING,"scanner test"]],
> + %q{a hardest "scanner \"test\""} =>
> [[:NAME,"a"],[:NAME,"hardest"],[:STRING,'scanner "test"']],
> + %Q{a hardestest "scanner \\"test\\"\n"} =>
> [[:NAME,"a"],[:NAME,"hardestest"],[:STRING,%Q{scanner "test"\n}]],
> + %q{function("call")} =>
> [[:NAME,"function"],[:LPAREN,"("],[:STRING,'call'],[:RPAREN,")"]],
> + %q["string with ${(3+5)/4} nested math."] =>
> [[:DQPRE,"string with
> "],:LPAREN,[:NAME,"3"],:PLUS,[:NAME,"5"],:RPAREN,:DIV,[:NAME,"4"],[:DQPOST,"
> nested math."]]
> + }.each { |src,expected_result|
> + it "should handle #{src} correctly" do
> + tokens_scanned_from(src).should be_like(*expected_result)
> + end
> + }
> end
>
> -describe Puppet::Parser::Lexer::TOKENS[:VARIABLE] do
> - before { @token = Puppet::Parser::Lexer::TOKENS[:VARIABLE] }
> +describe Puppet::Parser::Lexer::TOKENS[:DOLLAR_VAR] do
> + before { @token = Puppet::Parser::Lexer::TOKENS[:DOLLAR_VAR] }
>
> it "should match against alpha words prefixed with '$'" do
> @token.regex.should =~ '$this_var'
> @@ -465,26 +452,16 @@ describe Puppet::Parser::Lexer::TOKENS[:REGEX] do
> end
>
> describe "when scanning" do
> - def tokens_scanned_from(s)
> - lexer = Puppet::Parser::Lexer.new
> - lexer.string = s
> - tokens = []
> - lexer.scan do |name, value|
> - tokens << value
> - end
> - tokens[0..-2]
> - end
> -
> it "should not consider escaped slashes to be the end of a regex" do
> - tokens_scanned_from("$x =~ /this \\/ foo/").last[:value].should
> == Regexp.new("this / foo")
> + tokens_scanned_from("$x =~ /this \\/ foo/").should
> be_like(__,__,[:REGEX,%r{this / foo}])
Ah ok, got it: `__ = nil` gives be_like a readable "don't care"
placeholder (a nil expectation matches any token). You can dismiss my
previous question...
> end
>
> it "should not lex chained division as a regex" do
> - tokens_scanned_from("$x = $a/$b/$c").any? {|t| t[:value].class
> == Regexp }.should == false
> + tokens_scanned_from("$x = $a/$b/$c").collect { |name, data| name
> }.should_not be_include( :REGEX )
> end
>
> it "should accept a regular expression after NODE" do
> - tokens_scanned_from("node
> /www.*\.mysite\.org/").last[:value].should == Regexp.new("www.*\.mysite\.org")
> + tokens_scanned_from("node /www.*\.mysite\.org/").should
> be_like(__,[:REGEX,Regexp.new("www.*\.mysite\.org")])
> end
>
> it "should accept regular expressions in a CASE" do
> @@ -493,7 +470,9 @@ describe Puppet::Parser::Lexer::TOKENS[:REGEX] do
> /regex/: {notice("this notably sucks")}
> }
> }
> - tokens_scanned_from(s)[12][:value].should == Regexp.new("regex")
> + tokens_scanned_from(s).should be_like(
> +
> :CASE,:VARIABLE,:LBRACE,:STRING,:COLON,:LBRACE,:VARIABLE,:EQUALS,:NAME,:DIV,:NAME,:RBRACE,[:REGEX,/regex/],:COLON,:LBRACE,:NAME,:LPAREN,:STRING,:RPAREN,:RBRACE,:RBRACE
> + )
> end
>
> end
> @@ -540,8 +519,7 @@ describe Puppet::Parser::Lexer, "when lexing comments" do
> end
>
> it "should skip whitespace before lexing the next token after a
> non-token" do
> - @lexer.string = "/* 1\n\n */ \ntest"
> - @lexer.fullscan.should be_like([[:NAME, "test"],[false,false]])
> + tokens_scanned_from("/* 1\n\n */ \ntest").should be_like([:NAME,
> "test"])
> end
>
> it "should not return comments seen after the current line" do
> @@ -564,50 +542,17 @@ describe "Puppet::Parser::Lexer in the old tests" do
> before { @lexer = Puppet::Parser::Lexer.new }
>
> it "should do simple lexing" do
> - strings = {
> -%q{\\} => [[:BACKSLASH,"\\"],[false,false]],
> -%q{simplest scanner test} =>
> [[:NAME,"simplest"],[:NAME,"scanner"],[:NAME,"test"],[false,false]],
> -%q{returned scanner test
> -} => [[:NAME,"returned"],[:NAME,"scanner"],[:NAME,"test"],[false,false]]
> - }
> - strings.each { |str,ary|
> - @lexer.string = str
> - @lexer.fullscan().should be_like(ary)
> - }
> - end
> -
> - it "should correctly lex quoted strings" do
> - strings = {
> -%q{a simple "scanner" test
> -} =>
> [[:NAME,"a"],[:NAME,"simple"],[:DQTEXT,"scanner"],[:NAME,"test"],[false,false]],
> -%q{a simple 'single quote scanner' test
> -} => [[:NAME,"a"],[:NAME,"simple"],[:SQTEXT,"single quote
> scanner"],[:NAME,"test"],[false,false]],
> -%q{a harder 'a $b \c"'
> -} => [[:NAME,"a"],[:NAME,"harder"],[:SQTEXT,'a $b \c"'],[false,false]],
> -%q{a harder "scanner test"
> -} => [[:NAME,"a"],[:NAME,"harder"],[:DQTEXT,"scanner test"],[false,false]],
> -%q{a hardest "scanner \"test\""
> -} => [[:NAME,"a"],[:NAME,"hardest"],[:DQTEXT,'scanner
> "test"'],[false,false]],
> -%q{a hardestest "scanner \"test\"
> -"
> -} => [[:NAME,"a"],[:NAME,"hardestest"],[:DQTEXT,'scanner "test"
> -'],[false,false]],
> -%q{function("call")} =>
> [[:NAME,"function"],[:LPAREN,"("],[:DQTEXT,'call'],[:RPAREN,")"],[false,false]]
> -}
> - strings.each { |str,array|
> - @lexer.string = str
> - @lexer.fullscan().should be_like(array)
> + {
> + %q{\\} => [[:BACKSLASH,"\\"]],
> + %q{simplest scanner test} =>
> [[:NAME,"simplest"],[:NAME,"scanner"],[:NAME,"test"]],
> + %Q{returned scanner test\n} =>
> [[:NAME,"returned"],[:NAME,"scanner"],[:NAME,"test"]]
> + }.each { |source,expected|
> + tokens_scanned_from(source).should be_like(*expected)
> }
> end
>
> it "should fail usefully" do
> - strings = %w{
> - ^
> - }
> - strings.each { |str|
> - @lexer.string = str
> - lambda { @lexer.fullscan() }.should raise_error(RuntimeError)
> - }
> + lambda { tokens_scanned_from('^') }.should raise_error(RuntimeError)
> end
>
> it "should fail if the string is not set" do
> @@ -615,106 +560,64 @@ describe "Puppet::Parser::Lexer in the old tests" do
> end
>
> it "should correctly identify keywords" do
> - @lexer.string = "case"
> - @lexer.fullscan.should be_like([[:CASE, "case"], [false, false]])
> + tokens_scanned_from("case").should be_like([:CASE, "case"])
> end
>
> - it "should correctly match strings" do
> - names = %w{this is a bunch of names}
> - types = %w{Many Different Words A Word}
> - words = %w{differently Cased words A a}
> + it "should correctly parse class references" do
> + %w{Many Different Words A Word}.each { |t|
> tokens_scanned_from(t).should be_like([:CLASSREF,t])}
> + end
>
> - names.each { |t|
> - @lexer.string = t
> - @lexer.fullscan.should be_like([[:NAME,t],[false,false]])
> - }
> - types.each { |t|
> - @lexer.string = t
> - @lexer.fullscan.should be_like([[:CLASSREF,t],[false,false]])
> - }
> + # #774
> + it "should correctly parse namespaced class refernces token" do
> + %w{Foo ::Foo Foo::Bar ::Foo::Bar}.each { |t|
> tokens_scanned_from(t).should be_like([:CLASSREF, t]) }
> end
>
> - it "should correctly parse names with numerals" do
> - string = %w{1name name1 11names names11}
> + it "should correctly parse names" do
> + %w{this is a bunch of names}.each { |t|
> tokens_scanned_from(t).should be_like([:NAME,t]) }
> + end
>
> - string.each { |t|
> - @lexer.string = t
> - @lexer.fullscan.should be_like([[:NAME,t],[false,false]])
> - }
> + it "should correctly parse names with numerals" do
> + %w{1name name1 11names names11}.each { |t|
> tokens_scanned_from(t).should be_like([:NAME,t]) }
> end
>
> it "should correctly parse empty strings" do
> - bit = '$var = ""'
> -
> - @lexer.string = bit
> -
> - lambda { @lexer.fullscan }.should_not raise_error
> + lambda { tokens_scanned_from('$var = ""') }.should_not raise_error
> end
>
> it "should correctly parse virtual resources" do
> - string = "@type {"
> -
> - @lexer.string = string
> -
> - @lexer.fullscan.should be_like([[:AT, "@"], [:NAME, "type"],
> [:LBRACE, "{"], [false,false]])
> + tokens_scanned_from("@type {").should be_like([:AT, "@"], [:NAME,
> "type"], [:LBRACE, "{"])
> end
>
> it "should correctly deal with namespaces" do
> @lexer.string = %{class myclass}
> -
> @lexer.fullscan
> -
> @lexer.namespace.should == "myclass"
>
> @lexer.namepop
> -
> @lexer.namespace.should == ""
>
> @lexer.string = "class base { class sub { class more"
> -
> @lexer.fullscan
> -
> @lexer.namespace.should == "base::sub::more"
>
> @lexer.namepop
> -
> @lexer.namespace.should == "base::sub"
> end
>
> it "should correctly handle fully qualified names" do
> @lexer.string = "class base { class sub::more {"
> -
> @lexer.fullscan
> -
> @lexer.namespace.should == "base::sub::more"
>
> @lexer.namepop
> -
> @lexer.namespace.should == "base"
> end
>
> it "should correctly lex variables" do
> ["$variable", "$::variable", "$qualified::variable",
> "$further::qualified::variable"].each do |string|
> - @lexer.string = string
> -
> - @lexer.scan do |t, s|
> - t.should == :VARIABLE
> - string.sub(/^\$/, '').should == s[:value]
> - break
> - end
> + tokens_scanned_from(string).should
> be_like([:VARIABLE,string.sub(/^\$/,'')])
> end
> end
> -
> - # #774
> - it "should correctly parse the CLASSREF token" do
> - string = ["Foo", "::Foo","Foo::Bar","::Foo::Bar"]
> -
> - string.each do |foo|
> - @lexer.string = foo
> - @lexer.fullscan.should be_like([[:CLASSREF, foo],[false,false]])
> - end
> - end
> -
> end
>
> require 'puppettest/support/utils'
--
Brice Figureau
Follow the latest Puppet Community evolutions on www.planetpuppet.org!