Re: [whatwg] Parsing the string html
If you go to http://livedom.validator.nu/ and try to add <html>, the DOM tree shows the head and body elements. I am using this as a reference to compare against my results. Is it not reliable, then?
Thank you

-----Original Message-----
From: Tab Atkins Jr. [mailto:jackalm...@gmail.com]
Sent: Saturday, August 03, 2013 1:18 AM
To: Mohammad Al Houssami (Alumni)
Cc: Ian Hickson; wha...@whatwg.org
Subject: Re: [whatwg] Parsing the string html

On Fri, Aug 2, 2013 at 5:08 PM, Mohammad Al Houssami (Alumni) mh...@mail.aub.edu wrote:
That is totally correct. But are the head and body elements added to the document? So basically, when we stop parsing, the document should only have the html element; is that correct?

No, the spec clearly says "Insert an HTML element..." for those as you trace through the parsing.

~TJ
Re: [whatwg] Parsing the string html
That is totally correct. But are the head and body elements added to the document? So basically, when we stop parsing, the document should only have the html element; is that correct?

-----Original Message-----
From: Ian Hickson [mailto:i...@hixie.ch]
Sent: Saturday, August 03, 2013 12:05 AM
To: Mohammad Al Houssami (Alumni)
Cc: wha...@whatwg.org
Subject: Re: [whatwg] Parsing the string html

On Fri, 2 Aug 2013, Mohammad Al Houssami (Alumni) wrote:
When parsing the string <html>, the document should supposedly have an html root with head and body children (this is what Live DOM Viewer shows, at least). But according to the spec (if I'm not wrong), we only get the document with the html element, and the stack of open elements will have the html, head, and body elements in it.

The html start tag token causes you to jump from the "initial" insertion mode to the "before html" insertion mode, and then the html element is created and you jump to "before head". You then hit the end-of-file token, and that causes the head element to be generated, and switches you to "in head", where head is popped and you switch to "after head", where you insert a body element and switch to "in body", at which point you stop parsing.

--
Ian Hickson
http://ln.hixie.ch/
Things that are impossible just take longer.
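The insertion-mode walk Ian describes can be replayed in a few lines. What follows is a minimal Python sketch, not a conforming parser. It assumes the tokenizer has already produced an html start tag token followed by an EOF token. It models only the branches this input reaches and uses a simplified "insert an HTML element" that appends the new element to the current node and pushes it onto the stack of open elements. The names Node and tree_construction are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)


def tree_construction(tokens):
    """Run only the insertion-mode branches reached by "<html>" followed by EOF.

    Any other (mode, token) combination raises NotImplementedError.
    """
    document = Node("#document")
    open_elements = []   # the stack of open elements
    mode = "initial"
    visited = []         # insertion modes, in the order they handle a token

    def insert_element(name):
        # Simplified "insert an HTML element": the new element is appended to
        # the current node (top of the stack) AND pushed onto the stack.
        node = Node(name)
        open_elements[-1].children.append(node)
        open_elements.append(node)

    i = 0
    while True:
        kind, name = tokens[i]
        visited.append(mode)
        if mode == "initial" and kind != "DOCTYPE":
            # "Anything else": parse error and quirks mode (not modelled);
            # switch mode and reprocess the same token.
            mode = "before html"
        elif mode == "before html" and (kind, name) == ("StartTag", "html"):
            html = Node("html")
            document.children.append(html)
            open_elements.append(html)
            mode = "before head"
            i += 1                    # this token is consumed
        elif mode == "before head" and kind == "EOF":
            insert_element("head")    # as if a head start tag had been seen
            mode = "in head"          # reprocess EOF
        elif mode == "in head" and kind == "EOF":
            open_elements.pop()       # head leaves the stack but stays in the tree
            mode = "after head"       # reprocess EOF
        elif mode == "after head" and kind == "EOF":
            insert_element("body")
            mode = "in body"          # reprocess EOF
        elif mode == "in body" and kind == "EOF":
            break                     # stop parsing
        else:
            raise NotImplementedError((mode, kind, name))
    return document, open_elements, visited


doc, stack, modes = tree_construction([("StartTag", "html"), ("EOF", None)])
html = doc.children[0]
print([child.name for child in html.children])   # ['head', 'body']
print([el.name for el in stack])                 # ['html', 'body']
```

The sketch shows that inserting an element both attaches it to the tree and pushes it onto the stack. Popping head in "in head" removes it from the stack only. The finished document therefore has html with head and body children, while the stack ends as html, body.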
[whatwg] HTML5 parsing and specification challenges and difficulties
Hello there, I am implementing an HTML5 parser, and as part of the project I have to write a report covering the major challenges faced while writing the parser specification and during the actual parsing process. I have been trying to find useful information but couldn't find any. Does anyone know of any references for this? Thanks a lot :)
[whatwg] Html5 Parser Tree Construction Stage
Hello all, I am building an HTML5 parser according to the spec on the WHATWG website. I am currently at the tree construction stage, and it is very hard to get a general view of what is happening just by reading the spec. It is even hard to know what things are needed (like node types, element types, and the variables of each). Is there any place where these things are listed, or an explanation that gives a general view of what happens during the tree construction stage? Any help is much appreciated :)
Mohammad
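As a rough orientation, the tree construction stage revolves around a handful of node types and a small amount of parser state. Below is a minimal Python sketch of those data structures only, not the algorithms that use them. The class and field names are hypothetical. The state fields follow, as I understand it, what the spec's parse state section describes: insertion mode, original insertion mode, stack of open elements, list of active formatting elements, head and form element pointers, and the scripting and frameset-ok flags.

```python
from dataclasses import dataclass, field
from typing import Optional

HTML_NS = "http://www.w3.org/1999/xhtml"


# Node types the parser creates.
@dataclass
class DocumentType:
    name: str
    public_id: str = ""
    system_id: str = ""


@dataclass
class Text:
    data: str


@dataclass
class Comment:
    data: str


@dataclass
class Element:
    local_name: str
    namespace: str = HTML_NS
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


@dataclass
class Document:
    mode: str = "no-quirks"          # may become "quirks" or "limited-quirks"
    children: list = field(default_factory=list)


# State the tree construction stage keeps between tokens.
@dataclass
class ParserState:
    insertion_mode: str = "initial"
    original_insertion_mode: Optional[str] = None   # remembered by "text" / "in table text"
    open_elements: list = field(default_factory=list)       # stack of open elements
    active_formatting: list = field(default_factory=list)   # elements and markers
    head_element: Optional[Element] = None          # head element pointer
    form_element: Optional[Element] = None          # form element pointer
    scripting: bool = False                         # scripting flag
    frameset_ok: bool = True                        # frameset-ok flag starts as "ok"
    document: Document = field(default_factory=Document)

    @property
    def current_node(self):
        # The current node is the bottommost node on the stack of open elements,
        # i.e. the most recently pushed one.
        return self.open_elements[-1] if self.open_elements else None


state = ParserState()
state.open_elements.append(Element("html"))
print(state.current_node.local_name)  # html
```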
[whatwg] HTML Namespace Elements
Hello everyone. The tokenizer section of the HTML5 parser specification says: "Otherwise, if there is a current node and it is not an element in the HTML namespace". What does this mean? The phrase links to http://www.w3.org/1999/xhtml, which doesn't provide any information about the HTML namespace. Any help is much appreciated. Thanks :)
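The namespace URI is an identifier, not a page meant to be read. Every element in the tree carries a namespace. Elements created for ordinary HTML tags get http://www.w3.org/1999/xhtml, while elements inside svg and math get the SVG and MathML namespace URIs. As far as I can tell, in the spec text of this period the quoted condition appears in the markup declaration open state. There it decides whether <![CDATA[ starts a CDATA section, which is allowed only in such foreign content. Below is a minimal Python sketch of that check, assuming the current node is the last entry on the stack of open elements; Element and cdata_section_allowed are hypothetical names.

```python
from dataclasses import dataclass

HTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


@dataclass
class Element:
    local_name: str
    namespace: str   # a namespace URI, used purely as an identifier


def cdata_section_allowed(open_elements):
    """Sketch of "there is a current node and it is not an element in the HTML namespace".

    The current node is the most recently pushed entry on the stack of open elements.
    """
    if not open_elements:
        return False                      # no current node at all
    return open_elements[-1].namespace != HTML_NS


stack = [Element("html", HTML_NS), Element("body", HTML_NS)]
print(cdata_section_allowed(stack))       # False: current node is an HTML body
stack.append(Element("svg", SVG_NS))
print(cdata_section_allowed(stack))       # True: current node is an SVG element
```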
[whatwg] HTML5 Tokenizer Test Cases and Correct Output
Hello everyone. I was wondering whether there is a set of tests for the tokenizer, along with the correct output tokens and a standard way of representing tokens. What I have in mind is running the tokenizer on some HTML input and printing the tokens in the same format as the expected output. I would then compare my result with the expected one, character by character. :) Thanks in advance.
Mohammad
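One widely used suite is html5lib-tests. Its tokenizer tests are JSON files that list an input string and the expected tokens as arrays, with adjacent character tokens merged:
- ["StartTag", name, {attributes}], with true appended for self-closing tags
- ["EndTag", name]
- ["Character", data]
- ["Comment", data]
- ["DOCTYPE", name, public_id, system_id, correctness]

How parse errors are recorded has changed between versions of that suite, so check the README of the copy you use. The sketch below assumes your tokenizer yields hypothetical dicts and converts them to that shape. It compares decoded JSON values rather than raw text, which avoids false mismatches from whitespace or key order.

```python
import json


def to_test_format(tokens):
    """Convert tokens to the list-of-lists shape of html5lib-tests tokenizer tests.

    Tokens are hypothetical dicts such as
    {"type": "StartTag", "name": "p", "attrs": {}, "self_closing": False}.
    Parse errors are not handled here.
    """
    out = []
    for tok in tokens:
        kind = tok["type"]
        if kind == "Character":
            if out and out[-1][0] == "Character":
                out[-1][1] += tok["data"]          # merge adjacent character tokens
            else:
                out.append(["Character", tok["data"]])
        elif kind == "StartTag":
            entry = ["StartTag", tok["name"], tok["attrs"]]
            if tok.get("self_closing"):
                entry.append(True)
            out.append(entry)
        elif kind == "EndTag":
            out.append(["EndTag", tok["name"]])
        elif kind == "Comment":
            out.append(["Comment", tok["data"]])
        elif kind == "DOCTYPE":
            # correctness is true unless the force-quirks flag is set
            out.append(["DOCTYPE", tok["name"], tok["public_id"],
                        tok["system_id"], not tok["force_quirks"]])
        # EOF tokens do not appear in the expected output.
    return out


expected = json.loads('[["StartTag", "p", {"class": "x"}], ["Character", "hi"], ["EndTag", "p"]]')
mine = to_test_format([
    {"type": "StartTag", "name": "p", "attrs": {"class": "x"}},
    {"type": "Character", "data": "h"},
    {"type": "Character", "data": "i"},
    {"type": "EndTag", "name": "p"},
    {"type": "EOF"},
])
print(mine == expected)  # True
```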
[whatwg] Attribute value (double-quoted) state
Hello everyone, can someone explain what is meant in the attribute value (double-quoted) state in the tokenization spec: http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#attribute-value-%28double-quoted%29-state

It says the following:

U+0026 AMPERSAND (&)
Switch to the character reference in attribute value state (http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#character-reference-in-attribute-value-state), with the additional allowed character (http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#additional-allowed-character) being U+0022 QUOTATION MARK (").

What does "additional allowed character" mean? The spec says the following:

The additional allowed character, if there is one
Not a character reference. No characters are consumed, and nothing is returned. (This is not an error, either.)

It didn't make any sense to me. I'm still a beginner, I know very little about this, and I'm trying to build a parser. Any help or explanation would be very much appreciated.

Thanks in advance,
Mohammad
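In practice, the additional allowed character covers the case where the character right after the & is the quote that delimits the attribute. Then the & does not start a character reference and nothing is consumed. The character reference in attribute value state then appends a literal & to the attribute value. The quote is read again by the attribute value state and closes the value. Below is a minimal Python sketch. It assumes a tiny hypothetical named-reference table that requires the trailing semicolon, and it skips numeric references, NULL handling, and parse errors.

```python
# Hypothetical stand-in for the spec's named character reference table.
NAMED_REFS = {"amp;": "&", "lt;": "<", "gt;": ">", "quot;": '"'}

# Characters that always mean "not a character reference".
NOT_A_REFERENCE = {"\t", "\n", "\f", " ", "<", "&"}


def consume_char_ref(text, pos, additional_allowed=None):
    """Try to consume a character reference at text[pos] (just after the '&').

    Returns (replacement, new_pos), or (None, pos) when "nothing is returned".
    """
    if pos >= len(text):                       # EOF
        return None, pos
    c = text[pos]
    if c in NOT_A_REFERENCE or c == additional_allowed:
        # "Not a character reference. No characters are consumed,
        # and nothing is returned. (This is not an error, either.)"
        return None, pos
    for name, value in NAMED_REFS.items():
        if text.startswith(name, pos):
            return value, pos + len(name)
    return None, pos


def double_quoted_value(text, pos):
    """Read an attribute value after its opening '"'; return (value, pos after closing '"')."""
    value = []
    while pos < len(text):
        c = text[pos]
        pos += 1
        if c == '"':
            return "".join(value), pos
        if c == "&":
            # Character reference in attribute value state, additional allowed char '"'.
            ref, pos = consume_char_ref(text, pos, additional_allowed='"')
            # If nothing is returned, a literal '&' goes into the value.
            value.append(ref if ref is not None else "&")
        else:
            value.append(c)
    raise ValueError("EOF inside attribute value")


print(double_quoted_value('a&amp;b"', 0))   # ('a&b', 8)
print(double_quoted_value('R&"', 0))        # ('R&', 3): the '"' still closes the value
```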
[whatwg] Tokenizor PseudoCode
Hello everyone, I just want to make sure that where no state change is specified, we stay in the same state, right? Take the RCDATA state below. In the "Anything else" branch we emit a character token, then consume another character and check all the cases of this state again. This is the only thing that makes sense, but I just want to make sure :)
Thanks

12.2.4.3 RCDATA state

Consume the next input character (http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#next-input-character):

U+0026 AMPERSAND (&)
Switch to the character reference in RCDATA state (http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#character-reference-in-rcdata-state).

U+003C LESS-THAN SIGN (<)
Switch to the RCDATA less-than sign state (http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#rcdata-less-than-sign-state).

U+0000 NULL
Parse error (http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#parse-error). Emit a U+FFFD REPLACEMENT CHARACTER character token.

EOF
Emit an end-of-file token.

Anything else
Emit the current input character (http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#current-input-character) as a character token.
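When a branch does not say "switch to", the tokenizer stays in the current state and consumes the next character there. A state is effectively a loop that runs until some branch switches away. Below is a minimal Python sketch of the RCDATA state alone. It assumes the input is a plain string and stops at any switch, since the target states are not modelled. Tokens are hypothetical tuples, and parse errors are not recorded.

```python
def tokenize_rcdata(text):
    """Run only the RCDATA state; return (tokens, state when stopping, position)."""
    tokens = []
    state = "RCDATA"
    pos = 0
    while state == "RCDATA":
        if pos >= len(text):                 # EOF
            tokens.append(("EOF",))
            break
        c = text[pos]                        # consume the next input character
        pos += 1
        if c == "&":
            state = "character reference in RCDATA"
        elif c == "<":
            state = "RCDATA less-than sign"
        elif c == "\0":
            tokens.append(("Character", "\ufffd"))  # parse error (not recorded)
        else:
            # Anything else: emit the character. No switch, so the loop
            # consumes the next character in the RCDATA state again.
            tokens.append(("Character", c))
    return tokens, state, pos


print(tokenize_rcdata("a<b"))   # ([('Character', 'a')], 'RCDATA less-than sign', 2)
```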
Re: [whatwg] Tokenizor PseudoCode
I'm trying to build an HTML5 parser in Smalltalk, and as a first step I'm implementing the tokenizer, where everything happens. I think the tree construction stage changes the tokenizer state only when scripts add characters to the HTML document, which is out of the scope of my current project. Is that true?
Thanks
Mohammad

-----Original Message-----
From: Bjoern Hoehrmann [mailto:derhoe...@gmx.net]
Sent: Friday, March 15, 2013 11:30 PM
To: Mohammad Al Houssami (Alumni)
Cc: whatwg@lists.whatwg.org
Subject: Re: [whatwg] Tokenizor PseudoCode

* Mohammad Al Houssami (Alumni) wrote:
I just want to make sure that in places where no state change is called it means we stay in the same state right?

You missed "When a token is emitted, it must immediately be handled by the tree construction stage. The tree construction stage can affect the state of the tokenization stage ..." but if that does not result in a change of state either, then yes, as far as I am aware.
--
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
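The tree construction stage also changes the tokenizer state for ordinary markup, not only when scripts insert text. For example, the tree builder switches the tokenizer as follows:
- title or textarea start tag: RCDATA state
- style start tag: RAWTEXT
- script start tag: script data state
- plaintext start tag: PLAINTEXT

None of these requires running any script. Below is a minimal Python sketch of that hand-off. It assumes hypothetical Tokenizer and TreeBuilder classes and ignores insertion modes, foreign content, and the scripting-dependent noscript case.

```python
# Tokenizer state the tree builder selects after these start tags
# (scripting flag off; noscript would switch to RAWTEXT only with scripting on).
STATE_AFTER_START_TAG = {
    "title": "RCDATA",
    "textarea": "RCDATA",
    "style": "RAWTEXT",
    "xmp": "RAWTEXT",
    "iframe": "RAWTEXT",
    "noembed": "RAWTEXT",
    "noframes": "RAWTEXT",
    "script": "script data",
    "plaintext": "PLAINTEXT",
}


class TreeBuilder:
    def __init__(self):
        self.tokenizer = None   # set by the tokenizer that feeds this builder

    def process(self, token):
        kind, name = token
        if kind == "StartTag" and name in STATE_AFTER_START_TAG:
            # The tree builder reaches back into the tokenizer.
            self.tokenizer.state = STATE_AFTER_START_TAG[name]


class Tokenizer:
    def __init__(self, tree_builder):
        self.state = "data"
        self.tree_builder = tree_builder
        tree_builder.tokenizer = self

    def emit(self, token):
        # "When a token is emitted, it must immediately be handled by the
        # tree construction stage", so the state may change right here.
        self.tree_builder.process(token)


tokenizer = Tokenizer(TreeBuilder())
tokenizer.emit(("StartTag", "p"))
print(tokenizer.state)   # data
tokenizer.emit(("StartTag", "title"))
print(tokenizer.state)   # RCDATA
```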