Re: [webkit-dev] Proposal: Use ICU in WebKit code
On 07.10.2013, at 18:28, Darin Adler wrote:

> On Oct 7, 2013, at 1:34 AM, Patrick Gansterer wrote:
>
>> On 05.10.2013, at 19:13, Brent Fulgham wrote:
>>
>>> The WinCairo port is as close to the AppleWin port as possible. It uses ICU
>>> and I have no intention of changing that.
>>>
>>> The WinCE port is maintained by Patrick Gansterer. I believe that this
>>> port does not want to use ICU, preferring to use the limited subset of i18n
>>> features provided by the operating system.
>>
>> That's correct, but I think that changing the current API to ICU and
>> implementing some stub functions instead is a good idea.
>> Can we put the source code of this "dummy-ICU" somewhere into the tree?
>
> Sure, seems fine to have it in the WebKit tree, presumably alongside or
> inside WTF. It’s really the same thing as what’s in wtf/unicode right now
> with some different function names. Mostly it would be moving that code
> inside functions with new names.
>
> What we need is a road map.
>
> I know how to change WebKit to use ICU directly, and how to test that both on
> my own Mac and the EWS and buildbot machines, but I don’t know how to test
> and figure out exactly how many of these stub functions are needed, even for
> a port with a buildbot but especially for one without.
>
> Patrick, to state the obvious, if it really is only the WinCE port that would
> need this, then I think it’s a project we need your help on.

I'd suggest that you go ahead with your part: remove the WTF Unicode layer and replace it with the ICU functions. Maybe you can think about where the dummy functions can live, and I'll do the rest. I don't want to block anyone with this, but it would be great if someone feels "responsible" for reviewing my patches then. :-)

-- Patrick

___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev
Re: [webkit-dev] Proposal: Use ICU in WebKit code
On Oct 7, 2013, at 1:34 AM, Patrick Gansterer wrote:

> On 05.10.2013, at 19:13, Brent Fulgham wrote:
>
>> The WinCairo port is as close to the AppleWin port as possible. It uses ICU
>> and I have no intention of changing that.
>>
>> The WinCE port is maintained by Patrick Gansterer. I believe that this port
>> does not want to use ICU, preferring to use the limited subset of i18n
>> features provided by the operating system.
>
> That's correct, but I think that changing the current API to ICU and
> implementing some stub functions instead is a good idea.
> Can we put the source code of this "dummy-ICU" somewhere into the tree?

Sure, seems fine to have it in the WebKit tree, presumably alongside or inside WTF. It’s really the same thing as what’s in wtf/unicode right now with some different function names. Mostly it would be moving that code inside functions with new names.

What we need is a road map.

I know how to change WebKit to use ICU directly, and how to test that both on my own Mac and the EWS and buildbot machines, but I don’t know how to test and figure out exactly how many of these stub functions are needed, even for a port with a buildbot but especially for one without.

Patrick, to state the obvious, if it really is only the WinCE port that would need this, then I think it’s a project we need your help on.

-- Darin
Re: [webkit-dev] Proposal: Use ICU in WebKit code
On 05.10.2013, at 19:13, Brent Fulgham wrote:

> The WinCairo port is as close to the AppleWin port as possible. It uses ICU
> and I have no intention of changing that.
>
> The WinCE port is maintained by Patrick Gansterer. I believe that this port
> does not want to use ICU, preferring to use the limited subset of i18n
> features provided by the operating system.

That's correct, but I think that changing the current API to ICU and implementing some stub functions instead is a good idea.
Can we put the source code of this "dummy-ICU" somewhere into the tree?

See also the discussion about a reduced ICU at [1], but this would require a big copy of the ICU code in the tree, which I don't see as a good idea.

> I have heard from a number of people, mainly using WebKit in
> resource-constrained environments, who prefer to omit ICU due to its
> relatively large footprint. But many of their concerns about library size
> might be satisfied by rebuilding ICU with settings that omit the large
> encoding database. This makes sense if their use cases do not need these
> features.

If you use WebKit, e.g., as a simple English-only GUI without text input, there is no need for ICU except to compile the remaining code. So a "dummy-ICU" would be everything you need for this use case, and it reduces the required resources.

> -Brent
>
> Sent from my iPad
>
>> On Oct 4, 2013, at 11:48 PM, Dirk Schulze wrote:
>>
>>> On Oct 5, 2013, at 7:37 AM, Darin Adler wrote:
>>>
>>> Any thoughts on this? I am not sure what the status of the WinCE port is,
>>> but I’d like to hear from the maintainers of that port on the port status
>>> and their view on this strategy.
>>
>> Do you really mean WinCE or WinCairo? I thought that WinCE was discontinued
>> a long time ago and already removed. Probably I was wrong.
>>
>> Greetings,
>> Dirk

-- Patrick

[1] https://lists.webkit.org/pipermail/webkit-dev/2013-June/025018.html
Re: [webkit-dev] Proposal: Use ICU in WebKit code
On 05.10.2013, at 04:09, Konstantin Tokarev wrote:

> There is an issue with ICU: it uses UTF16 as its internal representation,
> while most of the Web nowadays is UTF8. Therefore, page text goes through
> unnecessary encoding conversion, and takes more memory than in UTF8 (for most
> languages). So it might not be a good development direction to tie WebKit
> to ICU.

UTF-8 decoding is performed by a custom codec in WTF; we don't use ICU for that. So the question of which internal representation to use for strings that were UTF-8 on the wire is orthogonal to whether we use ICU directly or through an abstraction layer.

- WBR, Alexey Proskuryakov
Re: [webkit-dev] Proposal: Use ICU in WebKit code
> "Since ICU uses Unicode (UTF-16) internally, all converters convert
> between UTF-16 (with the endianness according to the current platform)
> and another encoding."

The claim I would like to verify is that this design is slower and "takes more memory" due to "unnecessary encoding conversion".

Engineers working on WebKit performance should provide, and require of others, verifiable empirical data to back up performance claims.

Geoff
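As a rough illustration of the kind of measurement being asked for here, the sketch below converts UTF-8 to UTF-16 with a small hand-written decoder and times repeated runs. It is an assumption-laden example, not the WTF codec and not ICU: `utf8ToUtf16`, `TimingResult`, and `timeConversion` are hypothetical names, and a credible benchmark would also need realistic page content, warm-up runs, and repeated trials.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>

// Decodes UTF-8 into UTF-16. Returns false on malformed input (truncated or
// overlong sequences, surrogates, values above U+10FFFF); `out` may then
// hold a partial result. Hypothetical helper, not WebKit or ICU code.
inline bool utf8ToUtf16(const std::string& in, std::u16string& out)
{
    out.clear();
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t c;
        std::size_t extra;  // number of continuation bytes expected
        char32_t minValue;  // smallest value allowed for this length
        if (lead < 0x80) { c = lead; extra = 0; minValue = 0; }
        else if ((lead & 0xE0) == 0xC0) { c = lead & 0x1F; extra = 1; minValue = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { c = lead & 0x0F; extra = 2; minValue = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { c = lead & 0x07; extra = 3; minValue = 0x10000; }
        else return false;
        if (in.size() - i <= extra)
            return false; // sequence runs past the end of the input
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                return false;
            c = (c << 6) | (b & 0x3F);
        }
        if (c < minValue || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
            return false;
        if (c < 0x10000) {
            out.push_back(static_cast<char16_t>(c));
        } else {
            c -= 0x10000; // encode as a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (c >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (c & 0x3FF)));
        }
        i += extra + 1;
    }
    return true;
}

struct TimingResult {
    double seconds;        // wall-clock time for all repetitions
    std::size_t codeUnits; // UTF-16 code units produced by the last run
};

// Times `repetitions` conversions of the same input.
inline TimingResult timeConversion(const std::string& in, int repetitions)
{
    assert(repetitions > 0);
    std::u16string out;
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repetitions; ++r)
        utf8ToUtf16(in, out);
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return TimingResult{elapsed.count(), out.size()};
}
```

Calling `timeConversion` on the bytes of a saved page and comparing the result with total page load time would give one data point; it says nothing by itself about the cost of later operations on the converted text.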
Re: [webkit-dev] Proposal: Use ICU in WebKit code
I think the question was about the performance impact of using UTF-16 as an internal representation of characters.

The original claim was in effect that the encoding conversion to UTF-16 is so costly that it offsets any gain of doing codepoint operations on UTF-16 instead of UTF-8. It is a very strong claim because experiments so far have proven the opposite.

I think the statement against ICU/UTF16 needs to be backed by experimental data.

Benjamin

On 10/6/13, 12:31 PM, Alp Toker wrote:

> Geoffrey, http://userguide.icu-project.org/conversion/converters says:
>
> "Since ICU uses Unicode (UTF-16) internally, all converters convert
> between UTF-16 (with the endianness according to the current platform)
> and another encoding."
>
> That said, I don't think it's a major concern because ICU works on byte
> streams. It's not like these strings will persist internally somewhere
> eating lots of memory.
>
> From experience, the old WTF in-place converters found in WebKit
> "mobile" ports of the past were way-buggy and probably only ever tested
> with ASCII. I'd say use ICU and don't look back :-)
>
> Alp.
>
> On 06/10/2013 20:08, Geoffrey Garen wrote:
>>> There is an issue with ICU: it uses UTF16 as its internal representation,
>>> while most of the Web nowadays is UTF8. Therefore, page text goes through
>>> unnecessary encoding conversion, and takes more memory than in UTF8 (for
>>> most languages). So it might not be a good development direction to tie
>>> WebKit to ICU.
>> Is there a benchmark or website that can verify these claims?
>>
>> Thanks,
>> Geoff
Re: [webkit-dev] Proposal: Use ICU in WebKit code
Geoffrey, http://userguide.icu-project.org/conversion/converters says:

"Since ICU uses Unicode (UTF-16) internally, all converters convert between UTF-16 (with the endianness according to the current platform) and another encoding."

That said, I don't think it's a major concern because ICU works on byte streams. It's not like these strings will persist internally somewhere eating lots of memory.

From experience, the old WTF in-place converters found in WebKit "mobile" ports of the past were way-buggy and probably only ever tested with ASCII. I'd say use ICU and don't look back :-)

Alp.

On 06/10/2013 20:08, Geoffrey Garen wrote:
>> There is an issue with ICU: it uses UTF16 as its internal representation,
>> while most of the Web nowadays is UTF8. Therefore, page text goes through
>> unnecessary encoding conversion, and takes more memory than in UTF8 (for
>> most languages). So it might not be a good development direction to tie
>> WebKit to ICU.
> Is there a benchmark or website that can verify these claims?
>
> Thanks,
> Geoff

--
http://www.nuanti.com
the browser experts
Re: [webkit-dev] Proposal: Use ICU in WebKit code
> There is an issue with ICU: it uses UTF16 as its internal representation,
> while most of the Web nowadays is UTF8. Therefore, page text goes through
> unnecessary encoding conversion, and takes more memory than in UTF8 (for most
> languages). So it might not be a good development direction to tie WebKit
> to ICU.

Is there a benchmark or website that can verify these claims?

Thanks,
Geoff
Re: [webkit-dev] Proposal: Use ICU in WebKit code
The WinCairo port is as close to the AppleWin port as possible. It uses ICU and I have no intention of changing that.

The WinCE port is maintained by Patrick Gansterer. I believe that this port does not want to use ICU, preferring to use the limited subset of i18n features provided by the operating system.

I have heard from a number of people, mainly using WebKit in resource-constrained environments, who prefer to omit ICU due to its relatively large footprint. But many of their concerns about library size might be satisfied by rebuilding ICU with settings that omit the large encoding database. This makes sense if their use cases do not need these features.

-Brent

Sent from my iPad

> On Oct 4, 2013, at 11:48 PM, Dirk Schulze wrote:
>
>> On Oct 5, 2013, at 7:37 AM, Darin Adler wrote:
>>
>> Any thoughts on this? I am not sure what the status of the WinCE port is,
>> but I’d like to hear from the maintainers of that port on the port status
>> and their view on this strategy.
>
> Do you really mean WinCE or WinCairo? I thought that WinCE was discontinued a
> long time ago and already removed. Probably I was wrong.
>
> Greetings,
> Dirk
Re: [webkit-dev] Proposal: Use ICU in WebKit code
05.10.2013, 09:38, "Darin Adler":

> Hi folks.
>
> A while back the WebKit project made use of ICU directly. There were some
> port maintainers who instead wanted to make WebKit work without ICU. At the
> time, the strategy we pursued was to make a Unicode layer in WTF that layered
> on top of ICU. We then created multiple implementations of that layer on top
> of other back ends.

There is an issue with ICU: it uses UTF16 as its internal representation, while most of the Web nowadays is UTF8. Therefore, page text goes through unnecessary encoding conversion, and takes more memory than in UTF8 (for most languages). So it might not be a good development direction to tie WebKit to ICU.

--
Regards,
Konstantin
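The memory part of this claim comes down to simple arithmetic on encoded sizes, which the following sketch makes concrete. It only counts bytes per code point; the helper names (`utf8Length`, `utf16Length`, `encodedSizes`) are made up for illustration, and the sketch says nothing about conversion speed or about how WebKit actually stores strings.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed to encode one Unicode code point in UTF-8.
inline std::size_t utf8Length(char32_t c)
{
    assert(c <= 0x10FFFF);
    if (c < 0x80)
        return 1; // ASCII
    if (c < 0x800)
        return 2; // e.g. Latin-1 supplement, Greek, Cyrillic
    if (c < 0x10000)
        return 3; // rest of the Basic Multilingual Plane, e.g. CJK
    return 4;     // supplementary planes
}

// Bytes needed in UTF-16: one 16-bit unit, or a surrogate pair.
inline std::size_t utf16Length(char32_t c)
{
    assert(c <= 0x10FFFF);
    return c < 0x10000 ? 2 : 4;
}

struct EncodedSizes {
    std::size_t utf8;
    std::size_t utf16;
};

// Total encoded size of `count` code points in both encodings.
inline EncodedSizes encodedSizes(const char32_t* text, std::size_t count)
{
    EncodedSizes sizes{0, 0};
    for (std::size_t i = 0; i < count; ++i) {
        sizes.utf8 += utf8Length(text[i]);
        sizes.utf16 += utf16Length(text[i]);
    }
    return sizes;
}
```

Under these rules ASCII text is half the size in UTF-8, Cyrillic or Greek text is the same size in both, and CJK text is larger in UTF-8 (3 bytes versus 2 per character), so which encoding is smaller depends on the mix of markup and text in a given page.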
Re: [webkit-dev] Proposal: Use ICU in WebKit code
On Oct 4, 2013, at 11:48 PM, Dirk Schulze wrote:

> On Oct 5, 2013, at 7:37 AM, Darin Adler wrote:
>
>> Any thoughts on this? I am not sure what the status of the WinCE port is,
>> but I’d like to hear from the maintainers of that port on the port status
>> and their view on this strategy.
>
> Do you really mean WinCE or WinCairo? I thought that WinCE was discontinued a
> long time ago and already removed. Probably I was wrong.

I don’t know. Let me word the question differently:

Is anyone using UnicodeWchar.h/cpp for their port? If so, please respond on this thread.

-- Darin
Re: [webkit-dev] Proposal: Use ICU in WebKit code
On Oct 5, 2013, at 7:37 AM, Darin Adler wrote:

> Any thoughts on this? I am not sure what the status of the WinCE port is, but
> I’d like to hear from the maintainers of that port on the port status and
> their view on this strategy.

Do you really mean WinCE or WinCairo? I thought that WinCE was discontinued a long time ago and already removed. Probably I was wrong.

Greetings,
Dirk
[webkit-dev] Proposal: Use ICU in WebKit code
Hi folks.

A while back the WebKit project made use of ICU directly. There were some port maintainers who instead wanted to make WebKit work without ICU. At the time, the strategy we pursued was to make a Unicode layer in WTF that layered on top of ICU. We then created multiple implementations of that layer on top of other back ends.

But this Unicode layer is simply an awkwardly renamed subset of ICU. I find it inconvenient when doing work that requires ICU features, and it has held back my work in the past.

At this point we are down to only two back ends: the one for ICU, and one that is implemented on top of Windows functions, UnicodeWchar.h/cpp. I believe UnicodeWchar is currently used only by the WinCE port. A number of the UnicodeWchar implementations are not complete. For example, the toLower function does not handle the “ß” character.

I suggest we remove the Unicode.h abstraction and use ICU directly. I suggest we continue to use the ICU C API, by the way, not the C++ API.

For the WinCE port, I suggest we do one of these two things:

A) Change the port to require the ICU library.

B) Implement a subset of ICU that is enough to compile WebKit, using implementations quite like the ones in UnicodeWchar.h/cpp today, but using the ICU function names and constants, rather than an abstraction layer invented for WTF.

Thus, code in WebKit can make use of ICU directly in a way that’s easier to understand. Any port that wants to work without ICU can implement an ICU subset compatibility layer in a way that does not require changes to the WebKit code. I am not in a good position to test this ICU subset compatibility layer, but I think it would be a quick, easy job to refactor UnicodeWchar.h/cpp into that form.

Any thoughts on this? I am not sure what the status of the WinCE port is, but I’d like to hear from the maintainers of that port on the port status and their view on this strategy.

-- Darin
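To make option B above concrete, here is a minimal sketch of what one piece of an ICU subset compatibility layer might look like. Only the names `UChar32`, `u_tolower`, and `u_toupper` are taken from the ICU C API; the bodies, the `dummyICUIsBMP` helper, and `dummyICUSelfCheck` are assumptions for illustration, built on the C library's wide-character functions in place of the Windows calls a WinCE port would presumably use. It is not the code in UnicodeWchar.h/cpp and it does not reproduce ICU's behavior for non-ASCII input.

```cpp
// Hypothetical "dummy ICU" header sketch. Only the names UChar32, u_tolower
// and u_toupper come from ICU's C API; the bodies are placeholders built on
// the C library and are not ICU's implementation.
#include <cassert>
#include <cstdint>
#include <cwctype>

typedef std::int32_t UChar32; // ICU's code point type is a signed 32-bit integer.

// Assumption: the platform's wide-character functions are only trusted for
// the Basic Multilingual Plane; every other value is passed through as is.
inline bool dummyICUIsBMP(UChar32 c)
{
    return c >= 0 && c <= 0xFFFF;
}

// Same name and signature as ICU's u_tolower; results outside ASCII depend
// on the C library and the current locale.
inline UChar32 u_tolower(UChar32 c)
{
    if (!dummyICUIsBMP(c))
        return c;
    return static_cast<UChar32>(std::towlower(static_cast<std::wint_t>(c)));
}

// Same name and signature as ICU's u_toupper, with the same caveats.
inline UChar32 u_toupper(UChar32 c)
{
    if (!dummyICUIsBMP(c))
        return c;
    return static_cast<UChar32>(std::towupper(static_cast<std::wint_t>(c)));
}

// Sanity check limited to ASCII, which is the only range this sketch
// promises anything about.
inline void dummyICUSelfCheck()
{
    assert(u_tolower('A') == 'a');
    assert(u_toupper('a') == 'A');
    assert(u_tolower('1') == '1');
}
```

WebKit code written against the ICU names would then compile unchanged against either the real library or a header like this one, which is the property the proposal is after; how many such functions a port would actually need is the open question raised in the thread.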