On Thu, Oct 27, 2011 at 08:12:13AM -0400, James Cox wrote:
> Hey,
> 
> I'm trying to come up with a peg for natural language time parsing.
> 
> given a string like:
> 
> "the sunday before last"
> 
> or
> 
> "4 fridays hence"
> 
> or
> 
> "jan 1st last year"
> 
> I'd like to parse these and convert to meaningful data.
> 
> 'the sunday before last' translates to sunday, past, 2
> '2 fridays hence' translates to friday, future, 4
> 'jan 1st last year' translates to jan, 1, past, year-1
> 
> (or something like that :))
> 
> i've started in building my grammar, but i'm struggling to get my head
> around how to approach it - and therefore would appreciate any
> feedback as to the best approach.
> 
> e.g. i'm not sure that PEG is completely right, and since there are no
> tokens separating content (other than a space) it's been tricky to
> figure out how to approach it.
> 
> so... if anyone is willing to help share some pointers and discuss
> approach for this, i'd very much appreciate it!
> 

If you have prior experience with other grammar languages, the most
significant difference in PEG, to me, is what you can do with
backtracking.  With the & predicate you can assert, at some crucial
early stage, that a condition *will* hold later in your parse,
without consuming any input.  Similarly, the ! predicate asserts
that some future condition will not hold.
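
To make the two predicates concrete, here is a minimal sketch in
Python.  It is not taken from any PEG library; the helper names
(lit, seq, and_pred, not_pred) are made up for illustration, and a
"parser" is assumed to be a function taking (text, pos) and
returning the new position, or None on failure.

```python
# Hypothetical mini-combinators: a parser is a function
# (text, pos) -> new position, or None when it fails.

def lit(s):
    """Match the literal string s at the current position."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def seq(*parsers):
    """Match each parser in order; fail if any of them fails."""
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def and_pred(p):
    """&p: succeed only if p would match here; consume nothing."""
    def parse(text, pos):
        return pos if p(text, pos) is not None else None
    return parse

def not_pred(p):
    """!p: succeed only if p would NOT match here; consume nothing."""
    def parse(text, pos):
        return pos if p(text, pos) is None else None
    return parse

# "last" only when " year" does not follow:  "last" !" year"
last_alone = seq(lit("last"), not_pred(lit(" year")))

# "last" only when " year" follows, leaving " year" unconsumed:
# "last" &" year"
last_before_year = seq(lit("last"), and_pred(lit(" year")))
```

Both rules stop at position 4, right after "last", when they
succeed; the predicate only decides whether the match is allowed.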

Since things that were traditionally done in the lexing stage are
done in the grammar, I've mostly seen them handled by convention.
For instance, if you make sure you swallow all whitespace after
matching an atom, and you do so consistently, you'll develop a
good feel for where you need to add rules for handling whitespace
in your grammar.  It may look a bit different at first, but it's
no more or less wrong.
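
As a rough illustration of that convention, the sketch below (plain
Python, with made-up helper names skip_ws, token and parse_words)
has every token swallow the whitespace that follows it, so the
rules built on top never mention whitespace themselves.

```python
import re

_WS = re.compile(r"\s*")

def skip_ws(text, pos):
    """Return the position after any whitespace starting at pos."""
    return _WS.match(text, pos).end()

def token(word):
    """Match a literal word, then swallow the whitespace after it."""
    def parse(text, pos):
        if text.startswith(word, pos):
            return skip_ws(text, pos + len(word))
        return None
    return parse

def parse_words(text, words):
    """Match the given words in order, ignoring spacing between them."""
    pos = skip_ws(text, 0)  # leading whitespace is handled once, up front
    for w in words:
        pos = token(w)(text, pos)
        if pos is None:
            return False
    return pos == len(text)
```

Note that this token does not check for a word boundary, so
token("sun") would also match the start of "sunday"; a real grammar
would want a ! predicate or similar guard there.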

At the end of the day, your grammar will describe a formal language
that will incidentally look like a natural language.  You'll likely
find you have to make judgment calls and simply decide that one
particular interpretation of an input string is correct.  Or your
domain may be so limited that the interpretation is obvious or
there is no overlap.

A brief look at your example statements suggests that the high-level
structure is:

  quantity? date reference-point

Where reference-point is the set of statements like "hence" and
"before last", date is the English description of a date, and
quantity is an integer, perhaps spoken like you're Abe Lincoln.
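
A hand-written sketch of that structure in Python might look like
the following.  It is written in the spirit of a PEG (ordered
choice, longest alternative listed first) rather than with a PEG
tool, and it makes several assumptions not stated above: "date" is
limited to weekday names with an optional plural "s", a leading
"the" is skipped, quantity must be written in digits, and the only
reference points are "before last" and "hence".

```python
import re

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]
# Assumed set; ordered choice tries these in the order listed.
REFERENCE_POINTS = ["before last", "hence"]

_WS = re.compile(r"\s*")
_DIGITS = re.compile(r"\d+")

def _ws(text, pos):
    return _WS.match(text, pos).end()

def _word(options, text, pos):
    """Ordered choice over literal words; (word, new_pos) or None."""
    for opt in options:
        end = pos + len(opt)
        if text.startswith(opt, pos) and (
                end == len(text) or not text[end].isalpha()):
            return opt, _ws(text, end)  # swallow trailing whitespace
    return None

def _quantity(text, pos):
    m = _DIGITS.match(text, pos)
    return (int(m.group()), _ws(text, m.end())) if m else None

def _date(text, pos):
    for day in WEEKDAYS:
        hit = _word([day + "s", day], text, pos)  # plural first
        if hit is not None:
            return day, hit[1]
    return None

def parse_phrase(text):
    """Parse: "the"? quantity? date reference-point"""
    text = text.lower()
    pos = _ws(text, 0)
    article = _word(["the"], text, pos)
    if article is not None:
        pos = article[1]
    quantity = None  # left as None when the phrase gives no number
    q = _quantity(text, pos)
    if q is not None:
        quantity, pos = q
    d = _date(text, pos)
    if d is None:
        return None
    day, pos = d
    r = _word(REFERENCE_POINTS, text, pos)
    if r is None or r[1] != len(text):
        return None
    return {"quantity": quantity, "date": day, "reference": r[0]}
```

The result is only a structured version of the words; deciding that
"before last" means "past, 2" is a separate interpretation step.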

If you'd like to get used to PEG, I would suggest starting with
that middle component, date, and parse a date in some number of
supported but unambiguous formats.  Take this problem and turn
it into bite-sized pieces.  :-)
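
As one possible starting point for the date piece, here is a small
Python sketch that accepts two formats chosen purely for
illustration: ISO dates like "2011-10-27" and month-plus-day forms
like "jan 1st" or "January 15".  It uses regular expressions rather
than a PEG, the function name parse_date is made up, and the day is
only range-checked against 1..31, not against the month.

```python
import re

_FULL_NAMES = ["january", "february", "march", "april", "may", "june",
               "july", "august", "september", "october", "november",
               "december"]

# Map both full names and three-letter abbreviations to month numbers.
MONTHS = {}
for number, name in enumerate(_FULL_NAMES, start=1):
    MONTHS[name] = number
    MONTHS[name[:3]] = number

_ISO = re.compile(r"(\d{4})-(\d{2})-(\d{2})$")
_MONTH_DAY = re.compile(r"([a-z]+)\.?\s+(\d{1,2})(?:st|nd|rd|th)?$")

def parse_date(text):
    """Return (year, month, day), with year None if omitted, else None."""
    text = text.strip().lower()

    # First alternative: ISO format, which carries an explicit year.
    m = _ISO.match(text)
    if m:
        year, month, day = map(int, m.groups())
        if 1 <= month <= 12 and 1 <= day <= 31:
            return (year, month, day)
        return None

    # Second alternative: month name followed by a day number.
    m = _MONTH_DAY.match(text)
    if m:
        month = MONTHS.get(m.group(1))
        day = int(m.group(2))
        if month is not None and 1 <= day <= 31:
            return (None, month, day)

    # Anything else (e.g. "1/2/2011") is rejected as unsupported.
    return None
```

Forms like "1/2/2011" are deliberately left out, since they read
differently depending on locale.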

-Alan
-- 
.i ma'a lo bradi cu penmi gi'e du

_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg
