Re: [webkit-dev] Proposal: Use ICU in WebKit code

2013-10-07 Patrick Gansterer
On 05.10.2013, at 19:13, Brent Fulgham wrote:

 The WinCairo port is as close to the AppleWin port as possible. It uses ICU 
 and I have no intention of changing that.
 
 The WinCE port is maintained by Patrick Gansterer. I believe that this port 
 does not want to use ICU, preferring to use the limited subset of i18n 
 features provided by the operating system.

That's correct, but I think that changing the current API to ICU and implementing 
some stub functions instead is a good idea.
Can we put the source code of this dummy-ICU somewhere in the tree?
See also the discussion about a reduced ICU at [1], but this would require a 
big copy of the ICU code in the tree, which I don't see as a good idea.

 I have heard from a number of people, mainly using WebKit in 
 resource-constrained environments, who prefer to omit ICU due to its relatively large 
 footprint. But many of their concerns about library size might be satisfied 
 by rebuilding ICU with settings that omit the large encoding database. This 
 makes sense if their use cases do not need these features.

If you use WebKit, e.g., as a simple English-only GUI without text input, there is 
no need for ICU, except to compile the remaining code. So a dummy-ICU would 
be everything you need for this use case, and it reduces the required resources.

 -Brent
 
 On Oct 4, 2013, at 11:48 PM, Dirk Schulze dschu...@adobe.com wrote:
 
 
 On Oct 5, 2013, at 7:37 AM, Darin Adler da...@apple.com wrote:
 
 Any thoughts on this? I am not sure what the status of the WinCE port is, 
 but I’d like to hear from the maintainers of that port on the port status 
 and their view on this strategy.
 
 Do you really mean WinCE or WinCairo? I thought that WinCE was discontinued 
 a long time ago and already removed. Probably I was wrong.
 
 Greetings,
 Dirk

-- Patrick

[1] https://lists.webkit.org/pipermail/webkit-dev/2013-June/025018.html

___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] Proposal: Use ICU in WebKit code

2013-10-07 Darin Adler
On Oct 7, 2013, at 1:34 AM, Patrick Gansterer par...@paroga.com wrote:

 On 05.10.2013, at 19:13, Brent Fulgham wrote:
 
 The WinCairo port is as close to the AppleWin port as possible. It uses ICU 
 and I have no intention of changing that.
 
 The WinCE port is maintained by Patrick Gansterer. I believe that this port 
 does not want to use ICU, preferring to use the limited subset of i18n 
 features provided by the operating system.
 
 That's correct, but I think that changing the current API to ICU and 
 implementing some stub functions instead is a good idea.
 Can we put the source code of this dummy-ICU somewhere in the tree?

Sure, seems fine to have it in the WebKit tree, presumably alongside or inside 
WTF. It’s really the same thing as what’s in wtf/unicode right now with some 
different function names. Mostly it would be moving that code inside functions 
with new names.

What we need is a road map.

I know how to change WebKit to use ICU directly, and how to test that both on 
my own Mac and the EWS and buildbot machines, but I don’t know how to test and 
figure out exactly how many of these stub functions are needed, even for a port 
with a buildbot but especially for one without.

Patrick, to state the obvious, if it really is only the WinCE port that would need 
this, then I think it’s a project we need your help on.

— Darin


Re: [webkit-dev] Proposal: Use ICU in WebKit code

2013-10-07 Patrick Gansterer

On 07.10.2013, at 18:28, Darin Adler wrote:

 On Oct 7, 2013, at 1:34 AM, Patrick Gansterer par...@paroga.com wrote:
 
 On 05.10.2013, at 19:13, Brent Fulgham wrote:
 
 The WinCairo port is as close to the AppleWin port as possible. It uses ICU 
 and I have no intention of changing that.
 
 The WinCE port is maintained by Patrick Gansterer. I believe that this 
 port does not want to use ICU, preferring to use the limited subset of i18n 
 features provided by the operating system.
 
 That's correct, but I think that changing the current API to ICU and 
 implementing some stub functions instead is a good idea.
 Can we put the source code of this dummy-ICU somewhere in the tree?
 
 Sure, seems fine to have it in the WebKit tree, presumably alongside or 
 inside WTF. It’s really the same thing as what’s in wtf/unicode right now 
 with some different function names. Mostly it would be moving that code 
 inside functions with new names.
 
 What we need is a road map.
 
 I know how to change WebKit to use ICU directly, and how to test that both on 
 my own Mac and the EWS and buildbot machines, but I don’t know how to test 
 and figure out exactly how many of these stub functions are needed, even for 
 a port with a buildbot but especially for one without.
 
 Patrick, to state the obvious, if it really is only the WinCE port that would need 
 this, then I think it’s a project we need your help on.

I'd suggest that you do your thing by kicking out WTF-Unicode and replacing 
it with the ICU functions. Maybe you can think about where the dummy functions 
should live, and I'll do the rest. I don't want to block anyone with this, but it 
would be great if someone then felt responsible for reviewing my patches. :-)

-- Patrick


Re: [webkit-dev] Proposal: Use ICU in WebKit code

2013-10-06 Geoffrey Garen
 There is an issue with ICU: it uses UTF-16 as its internal representation, 
 while most of the Web nowadays is UTF-8. Therefore, page text goes through 
 unnecessary encoding conversion, and takes more memory than it would in UTF-8 
 (for most languages). So it might not be a good development direction to tie 
 WebKit to ICU.

Is there a benchmark or website that can verify these claims?

Thanks,
Geoff


Re: [webkit-dev] Proposal: Use ICU in WebKit code

2013-10-06 Alp Toker
Geoffrey, http://userguide.icu-project.org/conversion/converters says:

Since ICU uses Unicode (UTF-16) internally, all converters convert
between UTF-16 (with the endianness according to the current platform)
and another encoding.

That said, I don't think it's a major concern because ICU works on byte
streams. It's not like these strings will persist internally somewhere
eating lots of memory.
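The pivot design in that quote can be illustrated with a minimal C++ sketch, which is not ICU code: Latin-1 input is first widened into a UTF-16 pivot and then encoded as UTF-8. The function names are hypothetical. Unlike this sketch, a streaming converter would typically work through a small fixed-size pivot buffer instead of materializing whole strings, which fits the point that the intermediate UTF-16 need not persist.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stage 1 (hypothetical helper): ISO-8859-1 to the UTF-16 pivot. Every
// Latin-1 byte value equals the Unicode code point it encodes, so this
// is a plain widening copy.
std::u16string latin1ToUTF16(const std::string& latin1)
{
    std::u16string pivot;
    pivot.reserve(latin1.size());
    for (char byte : latin1)
        pivot.push_back(static_cast<unsigned char>(byte));
    return pivot;
}

// Stage 2 (hypothetical helper): UTF-16 pivot to UTF-8.
// Surrogate pairs are combined; unpaired surrogates become U+FFFD.
std::string utf16ToUTF8(const std::u16string& utf16)
{
    std::string out;
    for (std::size_t i = 0; i < utf16.size(); ++i) {
        char32_t c = utf16[i];
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < utf16.size()
            && utf16[i + 1] >= 0xDC00 && utf16[i + 1] <= 0xDFFF) {
            c = 0x10000 + ((c - 0xD800) << 10) + (utf16[i + 1] - 0xDC00);
            ++i; // The low surrogate has been consumed as well.
        } else if (c >= 0xD800 && c <= 0xDFFF) {
            c = 0xFFFD;
        }

        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else if (c < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        } else if (c < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (c >> 12)));
            out.push_back(static_cast<char>(0x80 | ((c >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (c >> 18)));
            out.push_back(static_cast<char>(0x80 | ((c >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((c >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}

// Source encoding -> UTF-16 pivot -> target encoding.
std::string latin1ToUTF8ViaPivot(const std::string& latin1)
{
    return utf16ToUTF8(latin1ToUTF16(latin1));
}
```

For example, the Latin-1 byte 0xE9 (é) comes out as the two UTF-8 bytes 0xC3 0xA9.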

From experience, the old WTF in-place converters found in WebKit
mobile ports of the past were very buggy and probably only ever tested with
ASCII. I'd say use ICU and don't look back :-)

Alp.


On 06/10/2013 20:08, Geoffrey Garen wrote:
 There is an issue with ICU: it uses UTF-16 as its internal representation, 
 while most of the Web nowadays is UTF-8. Therefore, page text goes through 
 unnecessary encoding conversion, and takes more memory than it would in UTF-8 
 (for most languages). So it might not be a good development direction to tie 
 WebKit to ICU.
 Is there a benchmark or website that can verify these claims?

 Thanks,
 Geoff

-- 
http://www.nuanti.com
the browser experts



Re: [webkit-dev] Proposal: Use ICU in WebKit code

2013-10-06 Benjamin Poulain
I think the question was about the performance impact of using UTF-16 as
an internal representation of characters.

The original claim was in effect that the encoding conversion to UTF-16
is so costly that it offsets any gain of doing codepoint operations on
UTF-16 instead of UTF-8.

It is a very strong claim because experiments so far have proven the
opposite. I think the statement against ICU/UTF-16 needs to be backed by
experimental data.

Benjamin

On 10/6/13, 12:31 PM, Alp Toker wrote:
 Geoffrey, http://userguide.icu-project.org/conversion/converters says:
 
 Since ICU uses Unicode (UTF-16) internally, all converters convert
 between UTF-16 (with the endianness according to the current platform)
 and another encoding.
 
 That said, I don't think it's a major concern because ICU works on byte
 streams. It's not like these strings will persist internally somewhere
 eating lots of memory.
 
 From experience, the old WTF in-place converters found in WebKit
 mobile ports of the past were very buggy and probably only ever tested with
 ASCII. I'd say use ICU and don't look back :-)
 
 Alp.
 
 
 On 06/10/2013 20:08, Geoffrey Garen wrote:
 There is an issue with ICU: it uses UTF-16 as its internal representation, 
 while most of the Web nowadays is UTF-8. Therefore, page text goes through 
 unnecessary encoding conversion, and takes more memory than it would in UTF-8 
 (for most languages). So it might not be a good development direction to tie 
 WebKit to ICU.
 Is there a benchmark or website that can verify these claims?

 Thanks,
 Geoff
 



Re: [webkit-dev] Proposal: Use ICU in WebKit code

2013-10-06 Geoffrey Garen
 “Since ICU uses Unicode (UTF-16) internally, all converters convert
 between UTF-16 (with the endianness according to the current platform)
 and another encoding.”

The claim I would like to verify is that this design is “slower and takes more 
memory” due to “unnecessary encoding conversion”.

Engineers working on WebKit performance should provide — and require of others 
— verifiable empirical data to back up performance claims.

Geoff


Re: [webkit-dev] Proposal: Use ICU in WebKit code

2013-10-06 Alexey Proskuryakov

On 05.10.2013, at 04:09, Konstantin Tokarev annu...@yandex.ru wrote:

 There is an issue with ICU: it uses UTF-16 as its internal representation, 
 while most of the Web nowadays is UTF-8. Therefore, page text goes through 
 unnecessary encoding conversion, and takes more memory than it would in UTF-8 
 (for most languages). So it might not be a good development direction to tie 
 WebKit to ICU.

UTF-8 decoding is performed by a custom codec in WTF; we don't use ICU for that.

So the question of which internal representation to use for strings that were 
UTF-8 on the wire is orthogonal to whether we use ICU directly or through an 
abstraction layer.
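To make the point concrete, here is a minimal C++ sketch of a UTF-8 decoder that produces UTF-16. It is not the actual WTF codec, and decodeUTF8 is a hypothetical name. Whether the output is UTF-16 or something else is decided at this step, independently of ICU. The error handling (one U+FFFD per malformed sequence) is a simplification, not the exact rules of the WHATWG Encoding Standard.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical UTF-8 -> UTF-16 decoder sketch (not WTF's codec).
// Malformed input yields one U+FFFD per malformed sequence.
std::u16string decodeUTF8(const std::string& input)
{
    const char16_t replacementCharacter = 0xFFFD;
    std::u16string output;
    // UTF-16 never needs more code units than the UTF-8 input has bytes.
    output.reserve(input.size());
    std::size_t i = 0;
    while (i < input.size()) {
        unsigned char lead = static_cast<unsigned char>(input[i]);
        if (lead < 0x80) { // ASCII: one byte, one code unit.
            output.push_back(lead);
            ++i;
            continue;
        }

        std::size_t trailCount;
        char32_t codePoint;
        char32_t minimum; // Smallest value this sequence length may encode.
        if ((lead & 0xE0) == 0xC0) {
            trailCount = 1; codePoint = lead & 0x1F; minimum = 0x80;
        } else if ((lead & 0xF0) == 0xE0) {
            trailCount = 2; codePoint = lead & 0x0F; minimum = 0x800;
        } else if ((lead & 0xF8) == 0xF0) {
            trailCount = 3; codePoint = lead & 0x07; minimum = 0x10000;
        } else { // Stray continuation byte or invalid lead byte.
            output.push_back(replacementCharacter);
            ++i;
            continue;
        }

        // Read trail bytes; stop at the first one that is not 10xxxxxx.
        std::size_t consumed = 1;
        bool valid = true;
        for (; consumed <= trailCount; ++consumed) {
            if (i + consumed >= input.size()) {
                valid = false;
                break;
            }
            unsigned char trail = static_cast<unsigned char>(input[i + consumed]);
            if ((trail & 0xC0) != 0x80) {
                valid = false;
                break;
            }
            codePoint = (codePoint << 6) | (trail & 0x3F);
        }
        // Reject overlong forms, surrogate code points and values above U+10FFFF.
        if (valid && (codePoint < minimum || codePoint > 0x10FFFF
                || (codePoint >= 0xD800 && codePoint <= 0xDFFF)))
            valid = false;

        i += consumed; // An offending trail byte, if any, is examined again next.
        if (!valid) {
            output.push_back(replacementCharacter);
            continue;
        }
        if (codePoint < 0x10000) {
            output.push_back(static_cast<char16_t>(codePoint));
        } else { // Supplementary code point: emit a surrogate pair.
            codePoint -= 0x10000;
            output.push_back(static_cast<char16_t>(0xD800 + (codePoint >> 10)));
            output.push_back(static_cast<char16_t>(0xDC00 + (codePoint & 0x3FF)));
        }
    }
    return output;
}
```

For example, the four bytes F0 9F 98 80 decode to the surrogate pair D83D DE00, and a lone trailing C3 byte becomes a single U+FFFD.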

- WBR, Alexey Proskuryakov



Re: [webkit-dev] Proposal: Use ICU in WebKit code

2013-10-05 Dirk Schulze

On Oct 5, 2013, at 7:37 AM, Darin Adler da...@apple.com wrote:

 Any thoughts on this? I am not sure what the status of the WinCE port is, but 
 I’d like to hear from the maintainers of that port on the port status and 
 their view on this strategy.

Do you really mean WinCE or WinCairo? I thought that WinCE was discontinued a 
long time ago and already removed. Probably I was wrong.

Greetings,
Dirk


Re: [webkit-dev] Proposal: Use ICU in WebKit code

2013-10-05 Darin Adler
On Oct 4, 2013, at 11:48 PM, Dirk Schulze dschu...@adobe.com wrote:

 On Oct 5, 2013, at 7:37 AM, Darin Adler da...@apple.com wrote:
 
 Any thoughts on this? I am not sure what the status of the WinCE port is, 
 but I’d like to hear from the maintainers of that port on the port status 
 and their view on this strategy.
 
 Do you really mean WinCE or WinCairo? I thought that WinCE was discontinued a 
 long time ago and already removed. Probably I was wrong.

I don’t know. Let me word the question differently:

Is anyone using UnicodeWchar.h/cpp for their port? If so, please respond on 
this thread.

— Darin


Re: [webkit-dev] Proposal: Use ICU in WebKit code

2013-10-05 Konstantin Tokarev


On 05.10.2013, at 09:38, Darin Adler da...@apple.com wrote:
 Hi folks.

 A while back the WebKit project made use of ICU directly. There were some 
 port maintainers who instead wanted to make WebKit work without ICU. At the 
 time, the strategy we pursued was to make a Unicode layer in WTF that layered 
 on top of ICU. We then created multiple implementations of that layer on top 
 of other back ends.

There is an issue with ICU: it uses UTF-16 as its internal representation, while 
most of the Web nowadays is UTF-8. Therefore, page text goes through unnecessary 
encoding conversion, and takes more memory than it would in UTF-8 (for most 
languages). So it might not be a good development direction to tie WebKit to 
ICU.
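How much memory each encoding needs depends on the script, which a byte count can show. A minimal C++ sketch (the helper names are hypothetical) that counts only storage bytes per code point; it ignores markup and says nothing about conversion time, which is what a benchmark would have to measure:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Bytes needed to store one Unicode code point in UTF-8 (1 to 4).
inline std::size_t utf8ByteCount(char32_t codePoint)
{
    if (codePoint < 0x80)
        return 1;
    if (codePoint < 0x800)
        return 2;
    if (codePoint < 0x10000)
        return 3;
    return 4;
}

// Bytes needed in UTF-16: one 16-bit code unit inside the Basic
// Multilingual Plane, a surrogate pair (two code units) above U+FFFF.
inline std::size_t utf16ByteCount(char32_t codePoint)
{
    return codePoint < 0x10000 ? 2 : 4;
}

struct EncodedSizes {
    std::size_t utf8Bytes;
    std::size_t utf16Bytes;
};

// Storage size of a sequence of code points in each encoding.
inline EncodedSizes encodedSizes(const std::u32string& text)
{
    EncodedSizes sizes { 0, 0 };
    for (char32_t codePoint : text) {
        sizes.utf8Bytes += utf8ByteCount(codePoint);
        sizes.utf16Bytes += utf16ByteCount(codePoint);
    }
    return sizes;
}
```

ASCII text such as "hello" takes 5 bytes in UTF-8 versus 10 in UTF-16; Cyrillic "привет" takes 12 bytes in both; the two CJK characters "中文" take 6 bytes in UTF-8 versus 4 in UTF-16.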


-- 
Regards,
Konstantin


Re: [webkit-dev] Proposal: Use ICU in WebKit code

2013-10-05 Brent Fulgham
The WinCairo port is as close to the AppleWin port as possible. It uses ICU and 
I have no intention of changing that.

The WinCE port is maintained by Patrick Gansterer. I believe that this port 
does not want to use ICU, preferring to use the limited subset of i18n features 
provided by the operating system.

I have heard from a number of people, mainly using WebKit in resource-constrained 
environments, who prefer to omit ICU due to its relatively large 
footprint. But many of their concerns about library size might be satisfied by 
rebuilding ICU with settings that omit the large encoding database. This makes 
sense if their use cases do not need these features.

-Brent

 On Oct 4, 2013, at 11:48 PM, Dirk Schulze dschu...@adobe.com wrote:
 
 
 On Oct 5, 2013, at 7:37 AM, Darin Adler da...@apple.com wrote:
 
 Any thoughts on this? I am not sure what the status of the WinCE port is, 
 but I’d like to hear from the maintainers of that port on the port status 
 and their view on this strategy.
 
 Do you really mean WinCE or WinCairo? I thought that WinCE was discontinued a 
 long time ago and already removed. Probably I was wrong.
 
 Greetings,
 Dirk


[webkit-dev] Proposal: Use ICU in WebKit code

2013-10-04 Darin Adler
Hi folks.

A while back the WebKit project made use of ICU directly. There were some port 
maintainers who instead wanted to make WebKit work without ICU. At the time, 
the strategy we pursued was to make a Unicode layer in WTF that layered on top 
of ICU. We then created multiple implementations of that layer on top of other 
back ends.

But this Unicode layer is simply an awkward renamed subset of ICU. I find it 
inconvenient when doing work that requires ICU features and it has held back my 
work in the past.
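To make the layering concrete, here is a minimal C++ sketch of such a wrapper. The namespace layout and the EXAMPLE_USE_ICU switch are hypothetical, not WebKit's real code or build flags. u_tolower and u_toupper are real ICU C API functions declared in unicode/uchar.h; in the ICU back end each wrapper is just a renamed one-line forwarder, and the second back end forwards to the C library's wide-character functions instead.

```cpp
#include <cassert>
#include <cstdint>
#include <cwctype>

// EXAMPLE_USE_ICU is a hypothetical build switch used only for this sketch.
#if defined(EXAMPLE_USE_ICU)
#include <unicode/uchar.h>
#endif

namespace WTF {
namespace Unicode {

using UChar32 = std::int32_t;

#if defined(EXAMPLE_USE_ICU)
// ICU back end: the "abstraction" only renames ICU's C API.
inline UChar32 toLower(UChar32 c) { return u_tolower(c); }
inline UChar32 toUpper(UChar32 c) { return u_toupper(c); }
#else
// Non-ICU back end: forwards to the C library's wide-character functions.
// Results depend on the current C locale, and where wint_t is 16 bits,
// code points above U+FFFF cannot be passed through at all.
inline UChar32 toLower(UChar32 c)
{
    return static_cast<UChar32>(std::towlower(static_cast<std::wint_t>(c)));
}
inline UChar32 toUpper(UChar32 c)
{
    return static_cast<UChar32>(std::towupper(static_cast<std::wint_t>(c)));
}
#endif

} // namespace Unicode
} // namespace WTF
```

In this arrangement, any ICU feature that WebKit code wants must first be wrapped and renamed here, then implemented again for every other back end.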

At this point we are down to only two back ends: the one for ICU, and one that 
is implemented on top of Windows functions, UnicodeWchar.h/cpp. I believe 
UnicodeWchar is currently used only by the WinCE port. A number of the 
UnicodeWchar implementations are not complete. For example, the toLower 
function does not handle the “ß” character.

I suggest we remove the Unicode.h abstraction and use ICU directly. I suggest 
we continue to use the ICU C API, by the way, not the C++ API.

For the WinCE port, I suggest we do one of these two things:

A) Change the port to require the ICU library.

B) Implement a subset of ICU that is enough to compile WebKit, using 
implementations quite like the ones in UnicodeWchar.h/cpp today, but using the ICU 
function names and constants, rather than an abstraction layer invented for WTF.
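Option B could look roughly like this minimal sketch of a hypothetical ICU-subset header. It keeps the real ICU C API names UChar32, u_tolower and u_toupper, so calling code is identical whether it links against ICU or against the stub, but this implementation only knows the simple case mappings of ASCII and Latin-1 and returns every other code point unchanged. A real compatibility layer would need whatever subset WebKit actually calls.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical ICU-subset ("dummy ICU") header. Names follow the ICU C API
// so callers need no changes; the behavior is deliberately limited.
typedef std::int32_t UChar32;

// Simple lowercase mapping for ASCII and Latin-1 only.
inline UChar32 u_tolower(UChar32 c)
{
    if (c >= 'A' && c <= 'Z')
        return c + 0x20;
    // U+00C0..U+00DE map to U+00E0..U+00FE, except U+00D7 MULTIPLICATION SIGN.
    if (c >= 0xC0 && c <= 0xDE && c != 0xD7)
        return c + 0x20;
    return c; // Every other code point is returned unchanged by this stub.
}

// Simple uppercase mapping for ASCII and Latin-1 only.
inline UChar32 u_toupper(UChar32 c)
{
    if (c >= 'a' && c <= 'z')
        return c - 0x20;
    // U+00E0..U+00FE map to U+00C0..U+00DE, except U+00F7 DIVISION SIGN.
    if (c >= 0xE0 && c <= 0xFE && c != 0xF7)
        return c - 0x20;
    if (c == 0xB5)
        return 0x39C; // MICRO SIGN -> GREEK CAPITAL LETTER MU
    if (c == 0xFF)
        return 0x178; // y with diaeresis -> LATIN CAPITAL LETTER Y WITH DIAERESIS
    // U+00DF (sharp s) has no single-code-point uppercase, so it stays as is.
    return c;
}
```

With this stub, u_tolower(0xC9) returns U+00E9 just as ICU does, while u_tolower(0x0410) (CYRILLIC CAPITAL LETTER A) is left unchanged here but returns U+0430 from ICU; gaps of that kind are what an incomplete subset looks like.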

Thus, code in WebKit can make use of ICU directly in a way that’s easier to 
understand. Any port that wants to work without ICU can implement an ICU subset 
compatibility layer in a way that does not require changes to the WebKit code.

I am not in a good position to test this ICU subset compatibility layer, but I 
think it would be a quick easy job to refactor UnicodeWchar.h/cpp into that 
form.

Any thoughts on this? I am not sure what the status of the WinCE port is, but 
I’d like to hear from the maintainers of that port on the port status and their 
view on this strategy.

— Darin