Thanks a lot, Adrian! It's working beautifully!

On Mon, Jul 18, 2011 at 4:49 AM,  <[email protected]> wrote:
> Hi Talek, what you should do is include the tail items in the scanner itself
> and add a pattern that covers any word that is not 'select'. If you specify
> 'select' ahead of the generic pattern, the scanner will prefer it over the
> generic pattern when both match the same text, which happens only for that
> exact word.
>
> Adrian
> -----Original Message-----
> From: Alec Tica <[email protected]>
> Sender: [email protected]
> Date: Fri, 15 Jul 2011 00:20:42
> To: <[email protected]>
> Reply-To: [email protected]
> Subject: [ragel-users] Detect keywords with a ragel scanner
>
> Hi,
>
> I'm new to Ragel and I'm trying to figure out how to solve what seems to
> be a very simple problem. Let's say I have the following text:
>
> "select 1 from dual;select 2 from dual;/*comment*/select 3 from dual;select"
>
> I want to detect all "select" keywords using a scanner, taking word
> boundaries into consideration. "select" is a keyword only if it:
>
> 1. starts at the very beginning of the text, or is preceded by whitespace,
> a comment, or a statement separator (;)
> 2. ends at the very end of the text, or is followed by whitespace, a
> comment, or a statement separator (;)
> 3. is not within quotes
> 4. is not part of a comment
>
> So far I have:
>
> <code>
> %%{
>  machine example;
>
>  action is_eof {
>    true if p == eof - 1
>  }
>
>  # eof
>  EOF = zlen when is_eof;
>
>  # strings
>  squoted_string = ['] ( (any - [''])** ) ['];
>  dquoted_string = '"' ( any )* :>> '"';
>
>  # comments
>  ml_comment = '/*' ( any )* :>> '*/';
>  sl_comment = '--' ( any )* :>> ('\n' | EOF);
>  comment = ml_comment | sl_comment;
>
>  tail = space | comment | ';' | EOF;
>
>  # keyword
>  select = 'select' . tail;
>
>  main := |*
>    squoted_string;
>    dquoted_string;
>    comment;
>    select => { puts "found at #{ts}-#{te}" };
>    any;
>  *|;
>
> }%%
>
> %% write data;
>
> data = 'unselect 1 from dual;select 2 from dual;/*comment*/select 3 from dual;select'
> # convert the provided string into an array of byte values; the code that
> # Ragel generates for Ruby reads the input through the variable named data
> data = data.unpack("c*") if data.is_a?(String)
> eof = data.length
>
> %% write init;
> %% write exec;
> </code>
>
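> As written, the scanner incorrectly matches the "select" inside the
> "unselect" word from the data: the catch-all "any" pattern consumes "u" and
> "n" one character at a time, and the select pattern then matches starting at
> the "s". More generally, I feel that I'm not on the right track, so I'd like
> to ask for your advice.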
>
> Many thanks in advance!
>
> --
> talek
>
> _______________________________________________
> ragel-users mailing list
> [email protected]
> http://www.complang.org/mailman/listinfo/ragel-users
>
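This is a minimal sketch of the approach Adrian describes, applied to the sample text from the question. It is written in plain Ruby rather than Ragel so that it runs without generated code. It assumes Ragel's scanner rule: at each position the longest match wins, and when two patterns match the same length, the one listed first wins. The names RULES and find_selects, and the regexes, are hypothetical translations of the machines in the question. The offsets assume ASCII input.

```ruby
require 'strscan'

# Hypothetical plain-Ruby emulation of a Ragel |* ... *| scanner.
# Assumed rule (Ragel scanner semantics): longest match wins; on a tie,
# the rule listed first wins. Regexes are illustrative translations only.
RULES = [
  [:squoted_string, /'[^']*'/],
  [:dquoted_string, /"[^"]*"/],
  [:comment,        %r{/\*.*?\*/|--[^\n]*}m], # the newline ending "--" is left to :tail
  [:select,         /select/],                # keyword listed ahead of the generic word
  [:word,           /\w+/],                   # any other word, e.g. "unselect" or "selected"
  [:tail,           /\s+|;/],                 # tail items scanned as tokens of their own
  [:other,          /./m]                     # any single remaining character
].freeze

# Returns [ts, te] pairs (te exclusive, like Ragel's te) for each "select" keyword.
def find_selects(text)
  ss = StringScanner.new(text)
  found = []
  until ss.eos?
    best_name = nil
    best_len = 0
    RULES.each do |name, re|
      len = ss.match?(re)                 # length of the anchored match, or nil
      next unless len && len > best_len   # strictly longer wins; ties keep the earlier rule
      best_name = name
      best_len = len
    end
    ts = ss.pos
    ss.pos += best_len                    # :other guarantees best_len >= 1
    found << [ts, ss.pos] if best_name == :select
  end
  found
end

data = "unselect 1 from dual;select 2 from dual;/*comment*/select 3 from dual;select"
find_selects(data).each { |ts, te| puts "found at #{ts}-#{te}" }
```

On the sample text this reports 21-27, 51-57 and 70-76. "unselect" goes to the generic word rule because that match is longer. In Ragel itself, the same idea roughly corresponds to two changes inside the |* ... *| block. First, list 'select' before a generic word pattern such as (alnum | '_')+. Second, scan space and ';' as separate tokens instead of appending tail to the keyword. Unlike Ragel's compiled state machine, this sketch re-tries every rule at every position, so it is only meant to illustrate the matching rule.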

-- 
talek