On Thu, Oct 27, 2011 at 08:12:13AM -0400, James Cox wrote:
> Hey,
>
> I'm trying to come up with a peg for natural language time parsing.
>
> given a string like:
>
> "the sunday before last"
>
> or
>
> "4 fridays hence"
>
> or
>
> "jan 1st last year"
>
> I'd like to parse these and convert to meaningful data.
>
> 'the sunday before last' translates to sunday, past, 2
> '2 fridays hence' translates to friday, future, 4
> 'jan 1st last year' translates to jan, 1, past, year-1
>
> (or something like that :))
>
> i've started in building my grammar, but i'm struggling to get my head
> around how to approach it - and therefore would appreciate any
> feedback as to the best approach.
>
> e.g. i'm not sure that PEG is completely right, and since there are no
> tokens separating content (other than a space) it's been tricky to
> figure out how to approach it.
>
> so... if anyone is willing to help share some pointers and discuss
> approach for this, i'd very much appreciate it!
>
If you have prior experience with other grammar languages, the most
significant difference about PEG, to me, is what you can do with
backtracking. With the & predicate you can assert, at some crucial early
stage, that a condition *will* hold much later in your parse. Similarly,
with the ! predicate you can assert that some future condition will not
hold.

Since things that were traditionally done in the lexing stage are done in
the grammar, I've mostly seen them handled by convention. For instance, if
you make sure you swallow all whitespace after matching an atom, and you
do so consistently, you'll develop a good feel for where you need to add
rules for handling whitespace in your grammar. It may look a bit different
at first, but it's no more or less wrong.

At the end of the day, your grammar will describe a formal language that
will incidentally look like a natural language. You'll likely find you
have to make judgment calls and simply decide that one particular
interpretation of an input string is correct. Or your domain may be so
limited that the interpretation is obvious or there is no overlap.

A brief look at your example statements suggests that the high-level
structure is:

    quantity? date reference-point

where reference-point is the set of statements like "hence" and "before
last", date is the English description of a date, and quantity is an
integer, perhaps spoken like you're Abe Lincoln.

If you'd like to get used to PEG, I would suggest starting with that
middle component, date, and parsing a date in some number of supported but
unambiguous formats. Take this problem and turn it into bite-sized pieces.
:-)

-Alan
--
.i ma'a lo bradi cu penmi gi'e du
_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg
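
To make the "quantity? date reference-point" structure concrete, here is a minimal sketch, assuming Python and a hand-rolled recursive-descent parser written in PEG style rather than any particular PEG library. Every name in it (Parser, word, quantity, weekday, reference_point, parse) is hypothetical, and it covers only weekday phrases such as two of the examples from the question, not month/day dates. It shows ordered choice with backtracking, a "not followed by a letter" check standing in for the ! predicate, and the convention of swallowing whitespace after every atom.

```python
import re

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]
DIGITS = re.compile(r"\d+")


class Parser:
    def __init__(self, text):
        self.text = text.lower()
        self.pos = 0

    def skip_ws(self):
        # Convention: every atom swallows the whitespace that follows it.
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1

    def word(self, literal):
        """Match a literal word; on failure the position is left unchanged."""
        end = self.pos + len(literal)
        if not self.text.startswith(literal, self.pos):
            return False
        # Plays the role of PEG's '!' predicate: the literal must NOT be
        # followed by another letter, so "hence" does not match "henceforth".
        if end < len(self.text) and self.text[end].isalpha():
            return False
        self.pos = end
        self.skip_ws()
        return True

    def quantity(self):
        # quantity <- [0-9]+ (digits only; spelled-out numbers are not handled)
        m = DIGITS.match(self.text, self.pos)
        if m is None:
            return None
        self.pos = m.end()
        self.skip_ws()
        return int(m.group())

    def weekday(self):
        # Ordered choice: try the singular form first, then the plural.
        for day in WEEKDAYS:
            if self.word(day) or self.word(day + "s"):
                return day
        return None

    def reference_point(self):
        # reference-point <- "before" "last" / "hence"
        start = self.pos
        if self.word("before") and self.word("last"):
            return ("past", 2)
        self.pos = start  # backtrack before trying the next alternative
        if self.word("hence"):
            return ("future", 1)
        return None


def parse(text):
    """Return (weekday, direction, count), or None if the input is rejected."""
    p = Parser(text.strip())
    p.word("the")            # optional leading article
    count = p.quantity()     # optional quantity
    day = p.weekday()
    ref = p.reference_point() if day else None
    if day is None or ref is None or p.pos != len(p.text):
        return None
    direction, default_count = ref
    return (day, direction, count if count is not None else default_count)


print(parse("the sunday before last"))  # ('sunday', 'past', 2)
print(parse("4 fridays hence"))         # ('friday', 'future', 4)
print(parse("4 fridays henceforth"))    # None
```

The default counts attached to "before last" and "hence" are one possible reading of the examples in the question; as noted above, a grammar like this has to pick one interpretation and commit to it.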