On Fri, 15 Mar 2013, Mohammad Al Houssami (Alumni) wrote:
>
> I just want to make sure that in places where no state change is called
> it means we stay in the same state right? Take the RCDATA state below.
> In the anything else branch we emit character token and then go consume
> another character and check all the cases in this state. This is the
> only thing that makes sense but I just want to make sure :)

On Sat, 16 Mar 2013, Bjoern Hoehrmann wrote:
>
> You missed "When a token is emitted, it must immediately be handled by
> the tree construction stage. The tree construction stage can affect the
> state of the tokenization stage ..." but if that does not result in a
> change of state either, then yes, as far as I am aware.

On Fri, 15 Mar 2013, Mohammad Al Houssami (Alumni) wrote:
>
> I'm trying to build an HTML5 Parser in Smalltalk and as a first step I'm
> implementing the tokenizer and everything happens there. I think this is
> the case only when we have scripts that add characters to the HTML
> document which is out of the scope of the project I am working on at the
> moment. Is this true or not ?

On Sat, 16 Mar 2013, Bjoern Hoehrmann wrote:
>
> No. Grepping for "PLAINTEXT" should make this clear.

There's a number of places in the tree construction stage that change the
tokenizer state, in particular, the parsing for these elements: title,
noscript, noframes, style, xmp, iframe, noembed, script, plaintext,
textarea.

HTH,
-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
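
The two points in this thread can be illustrated with a minimal Python
sketch. It is not the spec's algorithm, only a toy under stated
assumptions: the state names follow the spec, but the Tokenizer class,
the on_token hook (standing in for the tree construction stage), the
tokenize helper, and the tuple-shaped tokens are all hypothetical. It
handles only a tiny subset of HTML (no end tags, attributes, or
character references). It shows that a branch with no "switch to" step
leaves the state unchanged, and that handling an emitted token can
change the tokenizer state from outside, as happens for plaintext.

```python
from enum import Enum, auto


class State(Enum):
    DATA = auto()
    TAG_OPEN = auto()
    TAG_NAME = auto()
    PLAINTEXT = auto()


class Tokenizer:
    """Toy tokenizer; names and token shapes are illustrative assumptions."""

    def __init__(self, text, on_token):
        self.text = text
        self.pos = 0
        self.state = State.DATA
        self.on_token = on_token  # hypothetical stand-in for tree construction
        self.tag_name = ""

    def emit(self, token):
        # The token is handled immediately; the handler may change self.state.
        self.on_token(self, token)

    def run(self):
        while self.pos < len(self.text):
            ch = self.text[self.pos]
            self.pos += 1
            if self.state is State.DATA:
                if ch == "<":
                    self.state = State.TAG_OPEN
                else:
                    # "Anything else": emit, no switch, so we stay in DATA.
                    self.emit(("char", ch))
            elif self.state is State.TAG_OPEN:
                if ch.isalpha():
                    self.tag_name = ch.lower()
                    self.state = State.TAG_NAME
                else:
                    # Simplification: treat "<" as text and reconsume ch in DATA.
                    self.emit(("char", "<"))
                    self.pos -= 1
                    self.state = State.DATA
            elif self.state is State.TAG_NAME:
                if ch == ">":
                    # Switch to DATA first, then emit; the handler may override.
                    self.state = State.DATA
                    self.emit(("start_tag", self.tag_name))
                else:
                    self.tag_name += ch.lower()
            elif self.state is State.PLAINTEXT:
                # No branch here ever leaves PLAINTEXT.
                self.emit(("char", ch))
        self.emit(("eof", None))


def tokenize(text):
    tokens = []

    def handle(tokenizer, token):
        tokens.append(token)
        # Tree construction side: a plaintext start tag switches the tokenizer.
        if token == ("start_tag", "plaintext"):
            tokenizer.state = State.PLAINTEXT

    Tokenizer(text, handle).run()
    return tokens
```

With this sketch, tokenize("<plaintext><b>") yields the plaintext start
tag followed by "<", "b", and ">" as character tokens, because the
handler moved the tokenizer to PLAINTEXT before the next character was
consumed. A tokenizer built without that hook would misparse such
input, which is why the state change cannot be ignored even when
scripting is out of scope.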
