Maciej,
*I* deemed using a character type template for the HTMLTokenizer as being
unwieldy. Given there was the existing SegmentedString input abstraction, it
made logical sense to put the 8/16 bit coding there. If I had moved the
8/16 logic into the tokenizer itself, we might have
Oh, Ok. I misunderstood your original message to say that the project
as a whole had reached this conclusion, which certainly isn't the
case, rather than that you personally had reached that conclusion.
As for the long-term direction of the HTML parser, my guess is that
the optimum design will
On Mar 11, 2013, at 9:54 AM, Adam Barth aba...@webkit.org wrote:
As for the long-term direction of the HTML parser, my guess is that the
optimum design will be to deliver the network bytes to the parser directly on
the parser thread.
Sounds right to me.
If you're about to reply
No complaints with the long term direction. I agree that it is a tall order to
implement.
- Michael
On Mar 11, 2013, at 9:54 AM, Adam Barth aba...@webkit.org wrote:
Oh, Ok. I misunderstood your original message to say that the project
as a whole had reached this conclusion, which certainly
On Mon, Mar 11, 2013 at 9:56 AM, Darin Adler da...@apple.com wrote:
On Mar 11, 2013, at 9:54 AM, Adam Barth aba...@webkit.org wrote:
If you're about to reply complaining about the above, please save your
complaints for another time.
Huh?
The last time we tried to talk about changing the
On Mar 7, 2013 10:37 PM, Brady Eidson beid...@apple.com wrote:
On Thu, Mar 7, 2013 at 2:14 PM, Michael Saboff msab...@apple.com
wrote:
The various tokenizers / lexers work various ways to handle LChar
versus UChar input streams. Most of the other tokenizers are templatized
on input character
On Sat, Mar 9, 2013 at 12:48 PM, Luis de Bethencourt
l...@debethencourt.com wrote:
On Mar 7, 2013 10:37 PM, Brady Eidson beid...@apple.com wrote:
On Thu, Mar 7, 2013 at 2:14 PM, Michael Saboff msab...@apple.com
wrote:
The various tokenizers / lexers work various ways to handle LChar
On Mar 9, 2013, at 3:05 PM, Adam Barth aba...@webkit.org wrote:
In retrospect, I think what I was reacting to was msaboff's statement
that an unnamed group of people had decided that the HTML tokenizer
was too unwieldy to have a dedicated 8-bit path. In particular, it's
unclear to me who
Hi folks.
Today, bytes that come in from the network get turned into UTF-16 by the
decoding process. We then turn some of them back into Latin-1 during the
parsing process. Should we make changes so there’s an 8-bit path? It might be
as simple as writing code that has more of an all-ASCII
There is an all-ASCII case in TextCodecUTF8::decode(). It should be keeping
all ASCII data as 8 bit. TextCodecWindowsLatin1::decode() has not only an
all-ASCII case, but it only upconverts to 16 bit in a couple of rare cases.
Is there some other case you don't think we are handling?
-
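To make the all-ASCII case concrete, here is a minimal sketch of the idea, not the actual TextCodecUTF8::decode() code. The names isAllASCII and tryDecodeASCIIAs8Bit are hypothetical, and LChar is a stand-in typedef rather than the real WTF definition. The assumption is that a pure-ASCII byte run can be copied into an 8-bit buffer unchanged, and anything else falls back to the general decoder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for WebKit's 8-bit character type (assumption for this sketch).
using LChar = std::uint8_t;

// Returns true when every byte is below 0x80, i.e. the input is pure ASCII.
// OR-ing all bytes together lets us test the high bit once at the end.
inline bool isAllASCII(const std::uint8_t* bytes, std::size_t length)
{
    std::uint8_t bits = 0;
    for (std::size_t i = 0; i < length; ++i)
        bits |= bytes[i];
    return !(bits & 0x80);
}

// Hypothetical fast path: if the input is pure ASCII, copy it unchanged into
// an 8-bit buffer and return true. Otherwise return false and leave `out`
// untouched; a real decoder would then run its general (possibly 16-bit) path.
inline bool tryDecodeASCIIAs8Bit(const std::uint8_t* bytes, std::size_t length,
                                 std::vector<LChar>& out)
{
    if (!isAllASCII(bytes, length))
        return false;
    out.assign(bytes, bytes + length);
    return true;
}
```

A caller would try tryDecodeASCIIAs8Bit() first and only invoke the full codec when it returns false, so ASCII-only documents never pay for a 16-bit copy.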
No. I retract my question. Sounds like we already have it right! Thanks for
setting me straight.
Maybe some day we could make a non copying code path that points directly at
the data in the SharedBuffer, but I have no idea if that'd be beneficial.
-- Darin
On Mar 7,
The HTMLTokenizer still works in UChars. There's likely some
performance to be gained by moving it to an 8-bit character type.
There's some trickiness involved because HTML entities can expand to
characters outside of Latin-1. Also, it's unclear if we want two
tokenizers (one that's 8 bits wide
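The entity issue mentioned above can be illustrated with a small sketch. This is not HTMLTokenizer code; TokenBufferSketch is a hypothetical class, and LChar / UChar are stand-in typedefs. It assumes one possible approach: keep token text in 8 bits until a character above U+00FF arrives (for example, the code point U+20AC produced by an entity such as &euro;), then widen the buffer once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-ins for WebKit's character types (assumptions for this sketch).
using LChar = std::uint8_t;
using UChar = char16_t;

// Hypothetical token buffer that starts in 8-bit mode and upconverts to
// 16-bit storage the first time a non-Latin-1 character is appended.
class TokenBufferSketch {
public:
    void append(UChar c)
    {
        if (m_is8Bit && c <= 0xFF) {
            m_chars8.push_back(static_cast<LChar>(c));
            return;
        }
        if (m_is8Bit)
            upconvert();
        m_chars16.push_back(c);
    }

    bool is8Bit() const { return m_is8Bit; }
    std::size_t length() const { return m_is8Bit ? m_chars8.size() : m_chars16.size(); }
    UChar at(std::size_t i) const
    {
        return m_is8Bit ? static_cast<UChar>(m_chars8[i]) : m_chars16[i];
    }

private:
    // Zero-extends every stored 8-bit character into the 16-bit buffer.
    void upconvert()
    {
        m_chars16.assign(m_chars8.begin(), m_chars8.end());
        m_chars8.clear();
        m_is8Bit = false;
    }

    bool m_is8Bit = true;
    std::vector<LChar> m_chars8;
    std::vector<UChar> m_chars16;
};
```

The cost of this design is a branch on every append plus a one-time copy when widening; whether that beats two separate tokenizers is exactly the open question in the message.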
The various tokenizers / lexers work various ways to handle LChar versus UChar
input streams. Most of the other tokenizers are templatized on input character
type. In the case of HTML, the tokenizer handles a UChar character at a time.
For 8 bit input streams, the zero extension of an LChar to
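The two approaches described in this message can be contrasted with a rough sketch. None of this is WebKit code: countTagOpens, InputStreamSketch, and countTagOpensViaStream are hypothetical names, and LChar / UChar are stand-in typedefs. The first function is templatized on the input character type; the second keeps a single UChar-based consumer and hides the 8/16 bit difference in an input abstraction that zero-extends each LChar, loosely in the spirit of what the thread says SegmentedString does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-ins for WebKit's character types (assumptions for this sketch).
using LChar = std::uint8_t;
using UChar = char16_t;

// Approach A: templatize the scanning code on the character type, so the
// compiler generates one copy for LChar input and one for UChar input.
template <typename CharType>
std::size_t countTagOpens(const CharType* chars, std::size_t length)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < length; ++i) {
        if (chars[i] == '<')
            ++count;
    }
    return count;
}

// Approach B: a hypothetical input abstraction that stores either width and
// always hands out UChar, zero-extending 8-bit characters on the way out.
class InputStreamSketch {
public:
    InputStreamSketch(const LChar* chars, std::size_t length)
        : m_is8Bit(true), m_chars8(chars), m_chars16(nullptr), m_length(length), m_position(0) { }
    InputStreamSketch(const UChar* chars, std::size_t length)
        : m_is8Bit(false), m_chars8(nullptr), m_chars16(chars), m_length(length), m_position(0) { }

    bool atEnd() const { return m_position == m_length; }

    // Precondition: !atEnd().
    UChar next()
    {
        UChar c = m_is8Bit ? static_cast<UChar>(m_chars8[m_position]) : m_chars16[m_position];
        ++m_position;
        return c;
    }

private:
    bool m_is8Bit;
    const LChar* m_chars8;
    const UChar* m_chars16;
    std::size_t m_length;
    std::size_t m_position;
};

// Single non-template consumer: it only ever sees UChar.
inline std::size_t countTagOpensViaStream(InputStreamSketch& input)
{
    std::size_t count = 0;
    while (!input.atEnd()) {
        if (input.next() == '<')
            ++count;
    }
    return count;
}
```

Approach A duplicates the consumer per character type but has no per-character width check; approach B keeps one consumer at the cost of a branch inside next().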
Yes, I understand how the HTML tokenizer works. :)
Adam
On Thu, Mar 7, 2013 at 2:14 PM, Michael Saboff msab...@apple.com wrote:
The various tokenizers / lexers work various ways to handle LChar versus
UChar input streams. Most of the other tokenizers are templatized on input
character
On Thu, Mar 7, 2013 at 2:14 PM, Michael Saboff msab...@apple.com wrote:
The various tokenizers / lexers work various ways to handle LChar versus
UChar input streams. Most of the other tokenizers are templatized on input
character type. In the case of HTML, the tokenizer handles a UChar