Hey folks! I'm writing the "definitive" URL parser class. Lofty goal, perhaps, but also a learning exercise. I have an issue with entering and leaving actions.
My code's on GitHub: https://github.com/francois/urlparser/blob/master/url.rl#L34

Given the following two URLs:

    tcp://127.0.0.1:1234
    tcp://a:[email protected]:1234/

For both URLs, I correctly recognize the scheme. For both, however, either the user or the hostname is wrong, and in both cases the port is not recognized.

My Ruby implementation is at https://github.com/francois/urlparser/blob/master/ruby/lib/urlparser/parser.rl#L14

My question boils down to: how do I definitively know whether what I'm looking at is a user or a hostname, since both accept nearly the same set of characters?

Should I be using "State Action Embedding Operators"? Actually, scratch that: it seems that's what I should be doing, because I managed to recognize the host in some cases. For the first URL above, I can recognize most of the port: I end up with 123, not 1234, thus losing the last character.

A pointer to an existing parser with similar behavior would be appreciated.

Thanks!
François

_______________________________________________
ragel-users mailing list
[email protected]
http://www.complang.org/mailman/listinfo/ragel-users
