Hello again,

I'm building an HTML5 parser in Smalltalk, following the pseudocode provided by WHATWG, mainly these sections:

Tokenization: http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html
Tree Construction: http://www.whatwg.org/specs/web-apps/current-work/multipage/tree-construction.html
I'm still in the tokenization phase. The spec defines a state machine and describes what has to be done in each state. My approach is to represent each state as a method. Each method performs that state's operations (calling other methods to change state, returning tokens, etc.). I will be translating the provided pseudocode to Smalltalk as is.

I am not sure whether this is the best approach, especially since I am still new to Smalltalk. Stephane Ducasse suggested that I use PetitParser. From a quick reading, I noticed that it is used when a grammar is available. In my case I don't have a grammar, only the pseudocode of a parser.

Does anyone have suggestions for such a project, or comments on the approach I am planning to follow?

Thanks in advance,
Mohammad
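
P.S. To make the state-per-method idea concrete, here is a rough sketch of the shape I have in mind. It is written in Python only for brevity and is not the Smalltalk code itself. The class name `TinyTokenizer`, the token tuples, and the helper names are all made up for illustration, and it covers only a toy subset of the spec's states (data, tag open, end tag open, tag name) with simplified error handling and no attributes.

```python
class TinyTokenizer:
    """Toy subset of a spec-style tokenizer: one method per state.

    Hypothetical sketch: names and token shapes are illustrative only.
    """

    EOF = None  # sentinel returned when the input is exhausted

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.tokens = []
        self.chars = []   # pending character data, emitted as one token
        self.tag = None   # tag token under construction: [kind, name]
        self.state = self.data_state

    def run(self):
        # Drive the machine: each state method consumes input and sets
        # self.state to the method for the next state (None means stop).
        while self.state is not None:
            self.state()
        return self.tokens

    def next_char(self):
        if self.pos >= len(self.text):
            return self.EOF
        ch = self.text[self.pos]
        self.pos += 1
        return ch

    def reconsume(self, ch):
        # Step back so the next state sees the same character again.
        if ch is not self.EOF:
            self.pos -= 1

    def flush_chars(self):
        if self.chars:
            self.tokens.append(("Characters", "".join(self.chars)))
            self.chars = []

    @staticmethod
    def is_ascii_letter(ch):
        return ch is not None and ch.isascii() and ch.isalpha()

    def data_state(self):
        ch = self.next_char()
        if ch is self.EOF:
            self.flush_chars()
            self.tokens.append(("EOF", ""))
            self.state = None
        elif ch == "<":
            self.state = self.tag_open_state
        else:
            self.chars.append(ch)

    def tag_open_state(self):
        ch = self.next_char()
        if ch == "/":
            self.state = self.end_tag_open_state
        elif self.is_ascii_letter(ch):
            self.flush_chars()
            self.tag = ["StartTag", ch.lower()]
            self.state = self.tag_name_state
        else:
            # Simplified: treat the "<" as text and reconsume in data state.
            self.chars.append("<")
            self.reconsume(ch)
            self.state = self.data_state

    def end_tag_open_state(self):
        ch = self.next_char()
        if self.is_ascii_letter(ch):
            self.flush_chars()
            self.tag = ["EndTag", ch.lower()]
            self.state = self.tag_name_state
        else:
            # Simplified: the spec has more cases here (e.g. bogus comments).
            self.chars.extend("</")
            self.reconsume(ch)
            self.state = self.data_state

    def tag_name_state(self):
        ch = self.next_char()
        if ch == ">":
            self.tokens.append(tuple(self.tag))
            self.state = self.data_state
        elif ch is self.EOF:
            # Simplified: drop the unfinished tag; data state emits EOF.
            self.state = self.data_state
        else:
            # Attributes and whitespace are not handled in this sketch.
            self.tag[1] += ch.lower()


print(TinyTokenizer("<p>Hi</p>").run())
# [('StartTag', 'p'), ('Characters', 'Hi'), ('EndTag', 'p'), ('EOF', '')]
```

One choice in this sketch: a state method records the next state in `self.state` and returns to a driver loop, rather than calling the next state's method directly. That keeps the call stack flat on long inputs. I assume the same structure would carry over to Smalltalk, for example by storing a selector and using `perform:`, but I have not tried that yet.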
