Re: [whatwg] Parsing the string html

2013-08-03 Thread Mohammad Al Houssami (Alumni)
If you go to http://livedom.validator.nu/ and enter <html>, the DOM tree 
shows the head and body elements. I am using this as a reference to compare 
against my results. Is it not reliable, then?
Thank you 

-Original Message-
From: Tab Atkins Jr. [mailto:jackalm...@gmail.com] 
Sent: Saturday, August 03, 2013 1:18 AM
To: Mohammad Al Houssami (Alumni)
Cc: Ian Hickson; wha...@whatwg.org
Subject: Re: [whatwg] Parsing the string html

On Fri, Aug 2, 2013 at 5:08 PM, Mohammad Al Houssami (Alumni) 
mh...@mail.aub.edu wrote:
> That is totally correct. But are the head and body elements added to the 
> document? So basically, when we stop parsing, the document should only have 
> the html element. Is that correct?

No, the spec clearly says "Insert an HTML element..." for those as you trace 
through the parsing.

~TJ



Re: [whatwg] Parsing the string html

2013-08-02 Thread Mohammad Al Houssami (Alumni)
That is totally correct. But are the head and body elements added to the 
document? So basically, when we stop parsing, the document should only have 
the html element. Is that correct?

-Original Message-
From: Ian Hickson [mailto:i...@hixie.ch] 
Sent: Saturday, August 03, 2013 12:05 AM
To: Mohammad Al Houssami (Alumni)
Cc: wha...@whatwg.org
Subject: Re: [whatwg] Parsing the string html

On Fri, 2 Aug 2013, Mohammad Al Houssami (Alumni) wrote:
 
> When parsing the string <html>, the document should supposedly have an 
> html root with head and body children (this is what the Live DOM Viewer 
> shows, at least). But according to the specs (if I'm not wrong), we only 
> get the document with the html element, and the stack of open elements 
> will have the html, head, and body elements in it.

The <html> start tag token causes you to jump from the "initial" 
insertion mode to the "before html" insertion mode, and then the html element 
is created and you jump to "before head".

You then hit the end-of-file token, and that causes the head element to be 
generated, and switches you to "in head", where head is popped and you switch 
to "after head", where you insert a body element and switch to "in body", at 
which point you stop parsing.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
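
For readers following the thread, the walk described above can be traced with 
a small Python sketch. It is only an illustration under strong assumptions: it 
models just the branches that the input <html> reaches, and the function name, 
the dict-based nodes, and the token tuples are all made up for this example 
rather than taken from the spec.

```python
# Minimal sketch: only the branches reached by the input "<html>" are modelled.
# Node and token representations here are hypothetical.

def parse_html_start_tag_only():
    document = {"name": "#document", "children": []}
    stack = []   # stack of open elements
    trace = []   # insertion modes visited, in order

    def insert(name):
        # Append a new element to the current node (or the document) and push it.
        node = {"name": name, "children": []}
        parent = stack[-1] if stack else document
        parent["children"].append(node)
        stack.append(node)

    tokens = [("StartTag", "html"), ("EOF", None)]
    mode = "initial"
    i = 0
    while i < len(tokens):
        trace.append(mode)
        if mode == "initial":
            mode = "before html"      # no DOCTYPE: switch and reprocess the token
        elif mode == "before html":
            insert("html")            # the html start tag creates the root element
            mode = "before head"
            i += 1                    # token consumed; the next token is EOF
        elif mode == "before head":
            insert("head")            # EOF hits "anything else": head is generated
            mode = "in head"
        elif mode == "in head":
            stack.pop()               # head is popped off the stack
            mode = "after head"
        elif mode == "after head":
            insert("body")
            mode = "in body"
        elif mode == "in body":
            break                     # EOF in "in body": stop parsing
    return document, [node["name"] for node in stack], trace
```

In this sketch the document ends up with an html element that has head and 
body children, while the stack of open elements holds only html and body at 
the point where parsing stops, because head was popped in "in head".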




[whatwg] HTML5 parsing and specification challenges and difficulties

2013-07-30 Thread Mohammad Al Houssami (Alumni)
Hello there,

I am implementing an HTML5 parser, and as part of the project I have to write a 
report covering the major challenges faced in writing the parser specification 
and in the parsing process itself. I have been looking for useful information 
but couldn't find any. Does anyone know of any references on this?

Thanks a lot :)


[whatwg] Html5 Parser Tree Construction Stage

2013-06-23 Thread Mohammad Al Houssami (Alumni)
Hello All,

I am building an HTML5 parser according to the specs on the WHATWG website. I 
am currently at the tree construction stage, and it is complex enough that I 
find it hard to get a general view of what is happening by reading the specs, 
or even to know what is needed (node types, element types, the variables of 
each, and so on). Is there any place where these things are listed, or an 
explanation of the tree construction stage that gives a general view of what 
is happening?

Any help is much appreciated :)
Mohammad


[whatwg] HTML Namespace Elements

2013-04-08 Thread Mohammad Al Houssami (Alumni)
Hello Everyone.

In the tokenizer specification of the HTML5 parser the following is written:

"Otherwise, if there is a current node and it is not an element in the HTML 
namespace"

What does it mean?
It is linked to the page http://www.w3.org/1999/xhtml, which doesn't provide any 
information regarding the HTML namespace.

Any help is much appreciated.

Thanks :)
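
One way to read the quoted condition: the URL is an identifier attached to 
each element, not a page to consult, and the check asks whether the element 
on top of the stack of open elements carries that identifier. The Python 
sketch below illustrates this. The Element class and the function name are 
hypothetical; the two namespace URIs are the standard HTML and SVG ones.

```python
# Sketch of "there is a current node and it is not an element in the HTML
# namespace". Element and current_node_is_foreign are illustrative names.

HTML_NAMESPACE = "http://www.w3.org/1999/xhtml"
SVG_NAMESPACE = "http://www.w3.org/2000/svg"

class Element:
    def __init__(self, local_name, namespace=HTML_NAMESPACE):
        self.local_name = local_name
        self.namespace = namespace

def current_node_is_foreign(stack_of_open_elements):
    # No current node at all: the condition does not hold.
    if not stack_of_open_elements:
        return False
    current_node = stack_of_open_elements[-1]  # bottommost node of the stack
    return current_node.namespace != HTML_NAMESPACE
```

For example, with a stack of html, body, and an svg element created in the 
SVG namespace, the function returns True; with only html on the stack it 
returns False.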



[whatwg] HTML5 Tokenizer Test Cases and Correct Output

2013-03-28 Thread Mohammad Al Houssami (Alumni)
Hello everyone.

I was wondering whether there are tests for the tokenizer, along with the 
correct token output and a standard way of representing tokens.
What I have in mind is running the tokenizer on some HTML input and printing 
the tokens in the same format as the expected output.
I will then compare my result with the expected one, character by character. :)

Thanks in advance.
Mohammad
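
A character-by-character comparison only works if both sides print tokens in 
exactly the same way. The Python sketch below shows one possible approach, 
assuming a made-up representation in which each token is a list such as 
["StartTag", name, attributes]. Neither the representation nor the function 
names come from the thread or from any particular test suite.

```python
import json

# Hypothetical token representation: ["StartTag", name, attrs],
# ["EndTag", name], ["Character", data].

def serialize_tokens(tokens):
    # One token per line; sort_keys makes attribute order deterministic.
    return "\n".join(json.dumps(token, sort_keys=True) for token in tokens)

def first_difference(actual, expected):
    # Index of the first differing character, or None if the strings are equal.
    for index, (a, e) in enumerate(zip(actual, expected)):
        if a != e:
            return index
    if len(actual) != len(expected):
        return min(len(actual), len(expected))
    return None
```

Serializing both the tokenizer's output and the expected tokens with the same 
function, then calling first_difference on the two strings, points at the 
first place where they disagree.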



[whatwg] Attribute value (double-quoted) state

2013-03-25 Thread Mohammad Al Houssami (Alumni)
Hello everyone,
Can someone explain what is meant in the attribute value (double-quoted) state 
in the tokenization specs: 
http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#attribute-value-%28double-quoted%29-state
It says the following:

U+0026 AMPERSAND (&)
Switch to the character reference in attribute value state 
<http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#character-reference-in-attribute-value-state>, 
with the additional allowed character 
<http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#additional-allowed-character> 
being U+0022 QUOTATION MARK (").

What does the additional allowed character mean? The spec says the following:

The additional allowed character, if there is one
Not a character reference. No characters are consumed, and nothing is returned. 
(This is not an error, either.)

It didn't make any sense to me. I'm still a beginner and know very few things 
about this, and I'm trying to build a parser.
Any help or explanation would be very much appreciated.

Thanks in advance
Mohammad
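
The quoted rule can be read as an early exit: when the character right after 
the ampersand is the additional allowed character, the ampersand is treated as 
literal text. The Python sketch below covers only that first check. The 
function name and signature are hypothetical, and the set of other characters 
that also end the attempt (tab, line feed, form feed, space, "<", "&", and end 
of input) is taken from the same part of the spec as it stood at the time.

```python
# Sketch of the first step of "consume a character reference".
# may_start_character_reference is an illustrative name, not a spec term.

NEVER_STARTS_A_REFERENCE = {"\t", "\n", "\f", " ", "<", "&"}

def may_start_character_reference(text, pos, additional_allowed=None):
    """text[pos] is the character immediately after the '&'."""
    if pos >= len(text):                      # end of input
        return False
    ch = text[pos]
    if ch in NEVER_STARTS_A_REFERENCE:
        return False
    if additional_allowed is not None and ch == additional_allowed:
        return False                          # e.g. '"' in a double-quoted value
    return True                               # go on and try to match a reference
```

In a double-quoted attribute value such as "a&", the character after the 
ampersand is the closing quotation mark, so with additional_allowed set to '"' 
the function returns False: nothing is consumed, the ampersand stays literal, 
and the quotation mark is left for the attribute value state to handle.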




[whatwg] Tokenizor PseudoCode

2013-03-15 Thread Mohammad Al Houssami (Alumni)
Hello Everyone,

I just want to make sure that where no state change is specified, we stay in 
the same state, right?
Take the RCDATA state below. In the "anything else" branch we emit a character 
token, then consume another character and check all the cases of this state 
again.
This is the only reading that makes sense, but I just want to make sure :)

Thanks


12.2.4.3 RCDATA state
Consume the next input character 
<http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#next-input-character>:
U+0026 AMPERSAND (&)
Switch to the character reference in RCDATA state 
<http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#character-reference-in-rcdata-state>.
U+003C LESS-THAN SIGN (<)
Switch to the RCDATA less-than sign state 
<http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#rcdata-less-than-sign-state>.
U+0000 NULL
Parse error 
<http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#parse-error>.
 Emit a U+FFFD REPLACEMENT CHARACTER character token.
EOF
Emit an end-of-file token.
Anything else
Emit the current input character 
<http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#current-input-character> 
as a character token.
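
The state listing above can be written as a function that returns the next 
state together with any emitted tokens, which makes the "no switch means stay" 
reading explicit. This is a minimal Python sketch: the function names, the 
tuple-based tokens, and the use of None for end of input are assumptions made 
for illustration, and the two states it switches to are not modelled.

```python
EOF = None  # hypothetical marker for end of input

def rcdata_state(ch):
    """Handle one consumed character; return (next_state, emitted_tokens)."""
    if ch == "&":
        return "character reference in RCDATA", []
    if ch == "<":
        return "RCDATA less-than sign", []
    if ch == "\0":
        # Parse error; emit U+FFFD and stay in the RCDATA state.
        return "RCDATA", [("Character", "\ufffd")]
    if ch is EOF:
        return "RCDATA", [("EOF",)]
    # Anything else: emit the character, no switch, so the state is unchanged.
    return "RCDATA", [("Character", ch)]

def run_rcdata(text):
    state, tokens = "RCDATA", []
    for ch in list(text) + [EOF]:
        if state != "RCDATA":
            break  # other states are outside this sketch
        state, emitted = rcdata_state(ch)
        tokens.extend(emitted)
    return state, tokens
```

Running run_rcdata on plain text keeps the state at "RCDATA" for every 
character, while an input containing "<" leaves the loop as soon as the state 
changes.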



Re: [whatwg] Tokenizor PseudoCode

2013-03-15 Thread Mohammad Al Houssami (Alumni)
I'm trying to build an HTML5 Parser in Smalltalk and as a first step I'm 
implementing the tokenizer and everything happens there. I think this is the 
case only when we have scripts that add characters to the HTML document which 
is out of the scope of the project I am working on at the moment. Is this true 
or not?
Thanks
Mohammad


-Original Message-
From: Bjoern Hoehrmann [mailto:derhoe...@gmx.net] 
Sent: Friday, March 15, 2013 11:30 PM
To: Mohammad Al Houssami (Alumni)
Cc: whatwg@lists.whatwg.org
Subject: Re: [whatwg] Tokenizor PseudoCode

* Mohammad Al Houssami (Alumni) wrote:
> I just want to make sure that in places where no state change is called 
> it means we stay in the same state right?

You missed "When a token is emitted, it must immediately be handled by the tree 
construction stage. The tree construction stage can affect the state of the 
tokenization stage ...", but if that does not result in a change of state 
either, then yes, as far as I am aware.
--
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
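
The point about tree construction affecting the tokenizer can be shown without 
any scripting involved. In the Python sketch below, a stand-in tree 
construction callback switches the tokenizer to the RCDATA state when it 
receives a title start tag, which is what the spec's handling of title in the 
"in head" insertion mode does. The Tokenizer class, its emit method, and the 
token tuples are hypothetical and greatly simplified.

```python
# Sketch: emitted tokens are handled immediately, and the handler may change
# the tokenizer's state. Class and function names are illustrative only.

class Tokenizer:
    def __init__(self, on_token):
        self.state = "data"
        self.on_token = on_token

    def emit(self, token):
        # The token is handed to tree construction before tokenizing continues.
        self.on_token(self, token)

def tree_construction(tokenizer, token):
    # Only one case is modelled: a title start tag switches to RCDATA.
    if token == ("StartTag", "title"):
        tokenizer.state = "RCDATA"

tokenizer = Tokenizer(tree_construction)
tokenizer.emit(("StartTag", "p"))
state_after_p = tokenizer.state        # unchanged by this token
tokenizer.emit(("StartTag", "title"))
state_after_title = tokenizer.state    # changed by tree construction
```

So a tokenizer state can stay the same after an emit, as in the first call, 
but the tokenizer has to re-read its state after each emitted token rather 
than assume it.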