On 8/13/20 7:05 PM, Frank Tang (譚永鋒) wrote:


On Thu, 13 Aug 2020 at 02:19, Emilio Cobos Álvarez <[email protected] <mailto:[email protected]>> wrote:


    On 8/13/20 12:17 AM, Frank Tang wrote:


            [Note: Resent due to message header problem. Sorry]


            Contact emails

    [email protected] <mailto:[email protected]>, [email protected]
    <mailto:[email protected]>


            Explainer


    https://github.com/tc39/proposal-intl-segmenter


            Specification

    https://tc39.github.io/proposal-intl-segmenter/


            Design docs


    
    https://docs.google.com/document/d/1xugLpLmgRFnNXK8ztariTAbD2IXueDw1T3VNuuZCz8k/edit#heading=h.xgjl2srtytjt
    https://docs.google.com/presentation/d/1X2zBU3bZ4ergVMWfubCsdnHFzeaDgqiTRJVgvNGjQBs/edit#slide=id.p


            TAG review

    review by ECMA402


            Summary

    Intl.Segmenter implements methods for finding the location of
    boundaries in text, including grapheme, line, word and sentence
    boundary analysis.
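
As a rough illustration of the summary above, the sketch below uses the API in the shape it reached at Stage 3. The helper name `segmentsOf` and the sample strings are assumptions made for illustration and do not come from the proposal. Line granularity is left out because, as noted later in this thread, line break support was removed from the spec.

```javascript
// Hypothetical helper (not part of the proposal): collect the segments of
// `text` for a given locale and granularity into a plain array.
function segmentsOf(text, locale, granularity) {
  const segmenter = new Intl.Segmenter(locale, { granularity });
  // segment() returns an iterable of objects with `segment`, `index`,
  // `input`, and (for word granularity) `isWordLike`.
  return Array.from(segmenter.segment(text));
}

// Grapheme boundaries: "e" followed by a combining acute accent is one
// user-perceived character, even though it is two code points.
const graphemes = segmentsOf("e\u0301a", "en", "grapheme").map((s) => s.segment);

// Word boundaries: isWordLike filters out whitespace and punctuation segments.
const words = segmentsOf("Hello, world!", "en", "word")
  .filter((s) => s.isWordLike)
  .map((s) => s.segment);

// Sentence boundaries.
const sentences = segmentsOf("Hello world. How are you?", "en", "sentence")
  .map((s) => s.segment);

console.log(graphemes.length, words, sentences.length);
```

Each segment object also carries its `index` into the input string, so callers that only need boundary positions can read those instead of the substrings.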


            Motivation

    Chrome currently ships Intl.v8BreakIterator, a non-standard API
    with similar functionality. According to
    https://www.chromestatus.com/metrics/feature/timeline/popularity/556
    0.74% of web pages used it as of February 2020. Intl.Segmenter
    is the web standard that replaces it.


            Risks



            Interoperability and Compatibility

    The specification advanced to Stage 3 at the July 2020 TC39
    meeting, with support from ECMA402.

    /Gecko/: In development
    (https://bugzilla.mozilla.org/show_bug.cgi?id=1423593)

    FWIW, in development seems a bit of a stretch since there hasn't
    been activity in the bug for a while.

The main reason is that there was a long discussion about the approach in the spec, and the spec was moved from Stage 3 back to Stage 2 for the last two years so the new champion could revise it. It finally reached a better shape and moved to Stage 3 at the TC39 July meeting, after folks from Mozilla expressed support during the monthly ECMA402 meeting.

Sure, to be clear, I wasn't trying to push back. I just wanted to point out that it doesn't seem to be worked on right now, so "in development" doesn't seem quite accurate. "Positive" seems like a more accurate description per this document <https://docs.google.com/document/d/1xkHRXnFS8GDqZi7E0SSbR3a7CZsGScdxPUWBsNgo-oo/edit#>?

    The patch is three years old and there was a bit of a concern due
    to the binary size growing quite a bit
    <https://bugzilla.mozilla.org/show_bug.cgi?id=1423593#c9>.

    (Not an expert on this, but Gecko's layout engine doesn't use ICU
    for line-breaking,

I know; I hand-wrote that between 1998 and 2002, when I was Mozilla's i18n module owner and Netscape's i18n client manager. As far as I know, that part of the code has not changed in the last 20+ years (besides the work I did with some Thai folks in late 2003 to deal with Thai line breaking). The current version of the Intl.Segmenter spec took out line break support two years ago, so that is irrelevant anyway.

That's an awesome bit of trivia, thanks :)

FWIW, it seems that for a bunch of more complex languages (Thai included) nowadays we rely on platform-native APIs instead (see bug 389520 <https://bugzilla.mozilla.org/show_bug.cgi?id=389520>, bug 336959 <https://bugzilla.mozilla.org/show_bug.cgi?id=336959>, bug 390048 <https://bugzilla.mozilla.org/show_bug.cgi?id=390048>, etc).

 -- Emilio

    IIRC, so a lot of the ICU data that would be required for this has
    to be imported).

I reduced the ICU break rule table size by ~50% in https://github.com/unicode-org/icu/pull/1100, so the break iterator data in ICU 68, scheduled for release in late October 2020, will be ~230K smaller than in ICU 67.

    There may be alternative implementation strategies or what not,
    but it doesn't seem to be actively worked on.


    /WebKit/: No signal

    /Web developers/: No signals


            Ergonomics

    An engineer from Apple believed we should not add line break
    support to Intl.Segmenter, because developers might abuse the API
    to perform text layout themselves instead of depending on CSS.
    The line break feature was therefore removed from the
    specification in its current shape.


            Will this feature be supported on all six Blink platforms
            (Windows, Mac, Linux, Chrome OS, Android, and Android
            WebView)?

    Yes


            Is this feature fully tested by web-platform-tests
            <https://chromium.googlesource.com/chromium/src/+/master/docs/testing/web_platform_tests.md>?

    Yes
    https://github.com/tc39/test262/tree/master/test/intl402/Segmenter


            Link to entry on the Chrome Platform Status

    https://www.chromestatus.com/feature/6099397733515264

    This intent message was generated by Chrome Platform Status
    <https://www.chromestatus.com/>.
    --
    You received this message because you are subscribed to the
    Google Groups "blink-dev" group.
    To unsubscribe from this group and stop receiving emails from it,
    send an email to [email protected]
    <mailto:[email protected]>.
    To view this discussion on the web visit
    https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAOcELL-m7DZ5OAwZj9FqX9VKZKWYd_Qf0YeaXCs3YAEbcnPsKA%40mail.gmail.com.



--
Frank Yung-Fong Tang
譚永鋒 / 🌭🍊
Sr. Software Engineer

