[go-nuts] Re: [unicode] Missing Katakana runes in rangetable?

2022-06-27 Matt Sherman
Ah, I was barking up the wrong tree on this; please disregard. It’s an extending character, which by itself (I infer) is not categorized as Katakana. On Monday, June 27, 2022 at 10:51:07 PM UTC-4 Matt Sherman wrote: > Hi there, I stumbled across a surprising discovery that > unic
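For reference, Unicode's Scripts.txt assigns U+30FC (ー, KATAKANA-HIRAGANA PROLONGED SOUND MARK) the script Common rather than Katakana, because it is shared by Hiragana and Katakana text. Its general category is Lm (modifier letter). The sketch below checks the rune against a few of Go's `unicode` tables. The helper name `describe` is illustrative and does not come from the thread.

```go
package main

import (
	"fmt"
	"unicode"
)

// describe reports whether r is in a few script and category tables.
func describe(r rune) map[string]bool {
	return map[string]bool{
		"Katakana": unicode.Is(unicode.Katakana, r), // Script=Katakana
		"Common":   unicode.Is(unicode.Common, r),   // Script=Common (shared across scripts)
		"Lm":       unicode.Is(unicode.Lm, r),       // General category: modifier letter
	}
}

func main() {
	fmt.Printf("%U %v\n", 'ー', describe('ー')) // U+30FC PROLONGED SOUND MARK
	fmt.Printf("%U %v\n", 'ア', describe('ア')) // U+30A2 KATAKANA LETTER A
}
```

In other words, script membership and "appears in the Katakana block" are different properties. A tokenizer that wants to keep ー attached to a Katakana word needs a different property than `unicode.Katakana`.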

[go-nuts] [unicode] Missing Katakana runes in rangetable?

2022-06-27 Matt Sherman
Hi there, I stumbled across a surprising discovery: unicode.Is(unicode.Katakana, 'ー') returns false. This is code point U+30FC, and it appears in the Katakana code block. Looking at the rangetable

[go-nuts] Re: [ANN] Unicode text segmentation

2020-05-07 Matt Sherman
Sorry, bad link. Here it is: https://github.com/clipperhouse/uax29 On Thursday, May 7, 2020 at 12:06:18 PM UTC-4, Matt Sherman wrote: > > Hi gophers, I’ve implemented Unicode text segmentation for Go: > https://github.com/clipperhouse/uax29/words > > It tokenizes text into w

[go-nuts] [ANN] Unicode text segmentation

2020-05-07 Matt Sherman
Hi gophers, I’ve implemented Unicode text segmentation for Go: https://github.com/clipperhouse/uax29/words It tokenizes text into words, sentences or graphemes according to the Unicode spec. I’d been tokenizing text in ad hoc ways, and then learned that
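Counting runes is not the same as counting user-perceived characters. That gap is why grapheme segmentation is needed. The sketch below uses only the standard library. It does not perform UAX 29 segmentation itself. It only shows that one grapheme cluster can span several runes. The helper `counts` is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts returns the byte length and rune count of s.
func counts(s string) (bytes, runes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	// "é" spelled as 'e' + U+0301 COMBINING ACUTE ACCENT: one user-perceived
	// character (a single grapheme cluster), but two runes.
	decomposed := "e\u0301"
	// A flag emoji is a pair of regional indicator runes, also one grapheme.
	flag := "\U0001F1EF\U0001F1F5"
	for _, s := range []string{decomposed, flag} {
		b, r := counts(s)
		fmt.Println(s, "bytes:", b, "runes:", r)
	}
}
```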

Re: [go-nuts] x/text: Interest in Unicode text segmentation?

2020-04-17 Matt Sherman
to generate canonical x/text files, such as including the Unicode > and CLDR versions. > > The top-level file gen.go is used to orchestrate building x/text and > captured dependencies between packages. > > I may have some designs laying around for the API. > > On Thu, 16 A

Re: [go-nuts] x/text: Interest in Unicode text segmentation?

2020-04-16 Matt Sherman
/gen.go On Thu, Apr 16, 2020 at 1:52 PM wrote: > Yes that would be interesting. Especially if it can be generated from the > Unicode raw data upon updates. > > On Wed, 15 Apr 2020 at 23:56 Ian Lance Taylor wrote: > >> [ +mpvl ] >> >> On Wed, Apr 15, 2020 at 2:30 PM

[go-nuts] x/text: Interest in Unicode text segmentation?

2020-04-15 Matt Sherman
Hi, I am working on a tokenizer based on Unicode text segmentation (UAX 29). I am wondering whether there would be interest in adding range tables for word break categories

Re: [go-nuts] Re: Generics as builtin typeclasses

2018-09-06 Matt Sherman
Thanks Ian, was hoping you’d weigh in. Perhaps a compromise position would be that these type groups/classes/contracts are not language builtins but live in the stdlib? contracts.Comparable, etc. If we really don’t want to see the package qualifier, we use a dot import (import with .). And, for a first implementation, only the
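For comparison, the generics design that eventually shipped lands close to this compromise. `comparable` is a predeclared constraint (Go 1.18). Ordering lives in the standard library as `cmp.Ordered` (Go 1.21). The sketch below assumes Go 1.21+. The function name `Largest` is illustrative.

```go
package main

import (
	"cmp"
	"fmt"
)

// Largest returns the greatest element of xs, or T's zero value if xs is
// empty. cmp.Ordered is a standard-library constraint, not a language
// builtin; it permits the > operator on T.
func Largest[T cmp.Ordered](xs ...T) T {
	var m T
	for i, x := range xs {
		if i == 0 || x > m {
			m = x
		}
	}
	return m
}

func main() {
	fmt.Println(Largest(3, 7, 2), Largest("pear", "apple"))
}
```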

Re: [go-nuts] Generics as builtin typeclasses

2018-09-04 Matt Sherman
@Matthias I don’t mention it in my post, but I think that’d be fine, e.g.:
    type Set(type T comparable) []T
    type OrderedSlice(type T orderable) []T
On Tuesday, September 4, 2018 at 3:52:50 PM UTC-4, Matthias B. wrote: > > On Tue, 4 Sep 2018 11:57:02 -0700 (PDT) > Matt Sherma

[go-nuts] Re: Generics as builtin typeclasses

2018-09-04 Matt Sherman
eric) T { > > > or > > func Sum(x []T Numeric) T { > > > > Jon > > On Tuesday, September 4, 2018 at 11:57:02 AM UTC-7, Matt Sherman wrote: >> >> Here’s a riff on generics focused on builtin typeclasses (instead of user >> contracts): https://c
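The `Sum(x []T Numeric)` spelling quoted above corresponds, in Go 1.18+ syntax, to a type-parameter list plus a constraint interface. This is a minimal sketch. The type set listed in `Numeric` is an assumption, not the one proposed in the thread.

```go
package main

import "fmt"

// Numeric is a hypothetical constraint listing a few number types; the ~
// also admits named types whose underlying type is one of these.
type Numeric interface {
	~int | ~int32 | ~int64 | ~float32 | ~float64
}

// Sum adds the elements of x, returning the zero value for an empty slice.
func Sum[T Numeric](x []T) T {
	var total T
	for _, v := range x {
		total += v
	}
	return total
}

func main() {
	fmt.Println(Sum([]int{1, 2, 3}), Sum([]float64{0.5, 0.25}))
}
```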

[go-nuts] Generics as builtin typeclasses

2018-09-04 Matt Sherman
Here’s a riff on generics focused on builtin typeclasses (instead of user contracts): https://clipperhouse.com/go-generics-typeclasses/ Feedback welcome.

[go-nuts] Re: Custom types for use with range (Go 2)

2018-08-07 Matt Sherman
Sorry for the late reply: yes, it’s sugar, and a first implementation might be to have the compiler simply rewrite it like a macro, as in your example. And I realize that my example was more verbose than it needed to be. We don’t call an iterator on arrays, maps, etc., so my example should have been: for t
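Below is a sketch of the macro-style rewrite described here. It assumes a hypothetical `Next`/`Value` iterator interface, which is not taken from any proposal text quoted in this thread. Under that assumption, the compiler would expand the sugared `for t := range c` into the explicit loop spelled out in `Collect`. Generics (Go 1.18+) are used only to keep the sketch short.

```go
package main

import "fmt"

// Iterator is a hypothetical interface a rangeable custom type might expose.
type Iterator[T any] interface {
	Next() bool // advance to the next element; false once exhausted
	Value() T   // the current element; valid only after Next returns true
}

// sliceIter is a hypothetical Iterator over a slice, for demonstration.
type sliceIter[T any] struct {
	items []T
	pos   int // number of elements consumed so far
}

func (s *sliceIter[T]) Next() bool {
	if s.pos >= len(s.items) {
		return false
	}
	s.pos++
	return true
}

func (s *sliceIter[T]) Value() T { return s.items[s.pos-1] }

// Collect spells out the loop a compiler could emit for the sugared form
// `for t := range it { ... }`; here the loop body appends t to a slice.
func Collect[T any](it Iterator[T]) []T {
	var out []T
	for it.Next() {
		t := it.Value()
		out = append(out, t)
	}
	return out
}

func main() {
	fmt.Println(Collect[string](&sliceIter[string]{items: []string{"a", "b", "c"}}))
}
```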

[go-nuts] [show] A lemmatizer for Go

2018-05-27 Matt Sherman
Hi, been a while since I’ve been on the list! I’ve started a package with tokenizers and lemmatizers for Go, called ‘Jargon’. It’s intended to be useful for detecting synonyms in text and turning them into their canonical terms. It’s early, so I am looking for feedback: would you find such a