Melissa, I really thought 'match' would take any regular expression, but I looked it up and you are right:
# Returns an atom matching a character class. All regular expressions can be
# used, as long as they match only a single character at a time.

With this information I got :id to work, but got stuck again at :name. :name sometimes spans two lines, but the document had already been split into lines at each newline. I then tried to first split the document into blocks before and after a parenthesis, but did not succeed.

However, I will try to solve this problem in the next few days.

Thanks again for your help
Stefan

On Tue, Aug 9, 2011 at 21:06, Melissa Whittington <[email protected]> wrote:
> Stefan,
>
> Ah! I missed one important mistake that I've easily made myself
> before. You can't use 'match' to match multiple characters; the
> regular expression can only match one character. I find that slightly
> unintuitive and it gives no warning if you try to do this.
>
> I tried this:
> rule(:content) {str(' id ') >> match('\d').repeat >> textdata.repeat}
> rule(:no_value) {textdata.repeat(1)}
>
> Because it tries to match :content first, it will only match :no_value
> if it didn't match :content. That matched all the lines with "id".
>
> For me, learning parslet has been fairly trial and error too. And
> Google thinks 'parsley' is a much better word to search for than
> 'parslet', heh.
>
> -mj
>
> On Tue, Aug 9, 2011 at 4:57 AM, Stefan Rohlfing
> <[email protected]> wrote:
> > Melissa,
> > Thanks for your help!
> > However, after fixing the problems you pointed me to I got stuck again
> > https://github.com/bytesource/CodonTableParser/blob/master/parser.rb
> > and I am realizing that I am more or less relying on trial & error here. In
> > other words, I still lack the knowledge to translate a document into
> > its Backus-Naur form, which I can then feed to the parser (Parslet).
> > As I have no background in computer science, I would be interested in any
> > resources (printed or online) you have found valuable in laying the basis
> > for building a parser. This question is for everyone, as I am always
> > interested in different opinions.
> > Stefan
> >
> > On Mon, Aug 8, 2011 at 19:49, Melissa Whittington
> > <[email protected]> wrote:
> >>
> >> Whoops, I meant "The :file rule's repeat is what is describing multiple
> >> lines."
> >>
> >> -mj
> >>
> >> On Mon, Aug 8, 2011 at 7:47 AM, Melissa Whittington
> >> <[email protected]> wrote:
> >> > Stefan,
> >> >
> >> > The reason you're getting that error on the last line is that there
> >> > will be no newline at the end of the last line, so just switch it to
> >> > 'newline.maybe'.
> >> >
> >> > Your :line rule also does not need the .repeat because there will only
> >> > be one of either a :codon or a :comment and not more. The :line rule's
> >> > repeat is what is describing multiple lines.
> >> >
> >> > Also, I don't know what "repeat(1)" by itself does, but you probably
> >> > don't mean that?
> >> >
> >> > Don't forget that 'any' only matches one character. You should probably
> >> > not use 'any', either. For your :content and :no_value rules, they should
> >> > be matching everything on a line (sans a possible newline). You could use
> >> > any.repeat to parse the rest of the line, but it will try to parse
> >> > *anything*, including newlines and on to the next lines, which is not
> >> > what you want.
> >> >
> >> > So, it'll probably be helpful to be a little more descriptive.
> >> >
> >> > Hope that helps you make a little more progress!
> >> >
> >> > -mj
> >> >
> >> > On Mon, Aug 8, 2011 at 12:21 AM, Stefan Rohlfing
> >> > <[email protected]> wrote:
> >> >> Hi,
> >> >> I am trying to parse the NCBI genetic code table:
> >> >>
> >> >> https://github.com/bytesource/CodonTableParser/blob/master/data/codons.txt
> >> >> to extract those lines of each block that contain either "name", "id",
> >> >> "ncbieaa", or "sncbieaa".
> >> >> As each line either contains the content I am interested in or text that
> >> >> can be discarded, I started by first parsing the document on a per-line
> >> >> basis:
> >> >> https://github.com/bytesource/CodonTableParser/blob/master/parser.rb
> >> >> Unfortunately, parsing the file resulted in an error message that tells
> >> >> me Parslet failed to parse line 233, which is the very last line of the
> >> >> file:
> >> >> Expected at least 1 of LINE NEWLINE at line 1 char 1.
> >> >> `- Expected at least 1 of LINE NEWLINE at line 1 char 1.
> >> >>    `- Failed to match sequence (LINE NEWLINE) at line 233 char 1.
> >> >>       `- Failed to match sequence (LF CR?) at line 233 char 1.
> >> >>          `- Premature end of input at line 233 char 1.
> >> >> However, apart from knowing where the problem is located, I have
> >> >> difficulty finding out where my code went wrong.
> >> >> I already read Parslet's documentation without finding a solution, so
> >> >> now I hope someone on this list might help me with my problem.
> >> >> On a side note, I am often not sure when to use 'repeat(1)' instead of
> >> >> just repeat. I know the latter repeats the rule zero or more times, but
> >> >> how do I decide when zero is enough? Is there a rule to follow?
> >> >> Thanks again in advance!
> >> >> Stefan
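
On the question of repeat versus repeat(1) in the original message, a rough analogy with Ruby's built-in regular expressions may help. The sketch below is plain Ruby, not Parslet code and not the parser from the linked repository. It assumes that repeat, repeat(1) and maybe correspond to the regex quantifiers *, + and ?, and the sample string and all constant names are made up for illustration.

```ruby
# Plain-Ruby regex analogy only; this is not Parslet code.

# Hypothetical sample: three lines, the last one without a trailing newline.
SAMPLE = " id 1 ,\nsome text to discard\n id 11 ,"

# A bare character class consumes exactly one character, much like
# match('\d'); a quantifier is needed to consume a run, like
# match('\d').repeat(1).
ONE_DIGIT  = "233"[/\A\d/]    # => "2"
ALL_DIGITS = "233"[/\A\d+/]   # => "233"

# `*` (compare: repeat) accepts zero occurrences,
# `+` (compare: repeat(1)) requires at least one.
STAR_ACCEPTS_EMPTY = /\A\d*\z/.match?("")   # => true
PLUS_ACCEPTS_EMPTY = /\A\d+\z/.match?("")   # => false

# Requiring a newline after every line rejects input whose last line has
# none; an optional final newline (compare: newline.maybe) accepts it.
STRICT  = /\A(?:[^\n]*\n)+\z/
LENIENT = /\A(?:[^\n]*\n)*[^\n]*\z/
STRICT_OK  = STRICT.match?(SAMPLE)    # => false
LENIENT_OK = LENIENT.match?(SAMPLE)   # => true

# Ordered choice per line: try the " id <digits>" pattern first and
# otherwise discard the line, similar in spirit to trying :content
# before :no_value.
IDS = SAMPLE.each_line.map { |line| line[/\A id (\d+)/, 1] }.compact
# => ["1", "11"]
```

Read this way, one rule of thumb is to use repeat(1) wherever an empty match would be meaningless (a number with no digits, a line with no text) and plain repeat where the element is genuinely optional, such as trailing text after the id.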
