Re: [ruby.parslet] Parsing the NCBI Genetic Code Table

Stefan Rohlfing Wed, 10 Aug 2011 01:40:57 -0700

Melissa,

I really thought 'match' would take any regular expression, but I looked it
up and you are right:


# Returns an atom matching a character class. All regular expressions can be
# used, as long as they match only a single character at a time.

With this information I got :id to work, but got stuck again at :name. The
:name value sometimes spans two lines, but my parser has already split the
document into lines at each newline.

I then tried to split the document into blocks at the parentheses first, but
did not succeed. However, I will try to solve this problem in the next few
days.

Thanks again for your help.

Stefan




On Tue, Aug 9, 2011 at 21:06, Melissa Whittington <[email protected]> wrote:

> Stefan,
>
> Ah! I missed one important mistake, one that I've easily made myself
> before. You can't use 'match' to match multiple characters; the
> regular expression can only match one character. I find that slightly
> unintuitive, and Parslet gives no warning if you try to do this.
>
> I tried this:
>  rule(:content)         {str('  id ') >> match('\d').repeat >> textdata.repeat}
>  rule(:no_value)        {textdata.repeat(1)}
>
> Because the parser tries :content first, it only falls back to :no_value
> when :content fails to match. With that, all the lines containing "id"
> matched.
>
> Learning Parslet has involved a lot of trial and error for me, too. And
> Google thinks 'parsley' is a much better word to search for than
> 'parslet', heh.
>
> -mj
>
> On Tue, Aug 9, 2011 at 4:57 AM, Stefan Rohlfing
> <[email protected]> wrote:
> > Melissa,
> > Thanks for your help!
> > However, after fixing the problems you pointed out, I got stuck again:
> > https://github.com/bytesource/CodonTableParser/blob/master/parser.rb
> > I am realizing that I am more or less relying on trial and error here. In
> > other words, I still lack the knowledge of how to describe a document in
> > Backus-Naur form (BNF), which I could then express as a parser (Parslet).
> > As I have no background in computer science, I would be interested in any
> > resources (printed or online) you have found valuable in laying the basis
> > for building a parser. This question is for everyone, as I am always
> > interested in different opinions.
> > Stefan
> >
> > On Mon, Aug 8, 2011 at 19:49, Melissa Whittington
> > <[email protected]> wrote:
> >>
> >> Whoops, I meant "The :file rule's repeat is what is describing multiple
> >> lines."
> >>
> >> -mj
> >>
> >> On Mon, Aug 8, 2011 at 7:47 AM, Melissa Whittington
> >> <[email protected]> wrote:
> >> > Stefan,
> >> >
> >> > You're getting that error on the last line because there is no
> >> > newline at the end of the last line, so just switch it to
> >> > 'newline.maybe'.
> >> >
> >> > Your :line rule also does not need the .repeat because there will only
> >> > be one of either a :codon or a :comment and not more. The :line rule's
> >> > repeat is what is describing multiple lines.
> >> >
> >> > Also, I don't know what "repeat(1)" by itself does, but you probably
> >> > don't mean that?
> >> >
> >> > Don't forget that any only matches one character. You should probably
> >> > not use any for this, either. Your :content and :no_value rules should
> >> > match everything on a line (minus a possible newline). You could use
> >> > any.repeat to parse the rest of the line, but it will try to parse
> >> > *anything*, including newlines, and continue on to the next lines, which
> >> > is not what you want.
> >> >
> >> > So it will probably help to be more specific about what a line can
> >> > contain.
> >> >
> >> > Hope that helps you make a little more progress!
> >> >
> >> > -mj
> >> >
> >> > On Mon, Aug 8, 2011 at 12:21 AM, Stefan Rohlfing
> >> > <[email protected]> wrote:
> >> >> Hi,
> >> >> I am trying to parse the NCBI genetic code table:
> >> >>
> >> >> https://github.com/bytesource/CodonTableParser/blob/master/data/codons.txt
> >> >> to extract those lines of each block that contain either "name", "id",
> >> >> "ncbieaa", or "sncbieaa".
> >> >> As each line either contains the content I am interested in or text
> >> >> that can be discarded, I started by parsing the document on a per-line
> >> >> basis:
> >> >> https://github.com/bytesource/CodonTableParser/blob/master/parser.rb
> >> >> Unfortunately, parsing the file resulted in an error message telling
> >> >> me that Parslet failed to parse line 233, which is the very last line
> >> >> of the file:
> >> >> Expected at least 1 of LINE NEWLINE at line 1 char 1.
> >> >> `- Expected at least 1 of LINE NEWLINE at line 1 char 1.
> >> >>    `- Failed to match sequence (LINE NEWLINE) at line 233 char 1.
> >> >>       `- Failed to match sequence (LF CR?) at line 233 char 1.
> >> >>          `- Premature end of input at line 233 char 1.
> >> >> However, apart from knowing where the problem is located, I have
> >> >> difficulty finding out where my code went wrong.
> >> >> I already read Parslet's documentation without finding a solution, so
> >> >> now I hope someone on this list can help me with my problem.
> >> >> On a side note, I am often not sure when to use 'repeat(1)' instead of
> >> >> just 'repeat'. I know the latter repeats the rule zero or more times,
> >> >> but how do I decide when zero is enough? Is there a rule to follow?
> >> >> Thanks again in advance!
> >> >> Stefan
> >> >>
> >> >
> >
> >
>
