[Apertium-stuff] My idea of tokenization flow

2020-04-22 Thread 杨伟哲
Hi all, My proposed GSoC project is "Robust tokenization in lttoolbox"[1]. Over the past few days I have studied the source files of *lt-proc*, the lexical analysis tool of lttoolbox, in more detail. I now have a preliminary idea of how lttoolbox performs tokenization. My idea is recorded

Re: [Apertium-stuff] How do I get a list of lemmas for nouns

2020-04-22 Thread Daniel Swanson
Hi Per, If I understand correctly, this might give what you want:
lt-expand apertium-swe.swe.dix | grep -E "[^<:>]+:[^<:>]+<n>" | sed -E 's/[^<:>]+:([^<:>]+).*/\1/g' | uniq
lt-expand lists all the forms, grep finds all the ones where the first tag is <n>, sed gets rid of everything but the lemma, and
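
The same filtering could also be sketched in Python, in case post-processing is easier there. This is only an illustration, not part of the original message: it assumes lt-expand prints one `surface:lemma<tags>` entry per line and that nouns carry `<n>` as their first tag (the usual Apertium convention). The function name `noun_lemmas` and the sample entries are made up for the example. Unlike `uniq`, which only drops adjacent repeats, the sketch drops every repeated lemma.

```python
import re

# Assumed lt-expand line shape: surface:lemma<n><other tags>.
# Entries with direction markers (":>:" or ":<:") do not match, because
# the lemma part may not contain '<', ':' or '>'.
NOUN_LINE = re.compile(r"^[^<:>]+:([^<:>]+)<n>")


def noun_lemmas(lines):
    """Return noun lemmas in first-seen order, without duplicates."""
    seen = set()
    lemmas = []
    for line in lines:
        match = NOUN_LINE.match(line.strip())
        if match is None:
            continue  # not a noun entry, or not in the assumed format
        lemma = match.group(1)
        if lemma not in seen:
            seen.add(lemma)
            lemmas.append(lemma)
    return lemmas


# Hypothetical sample entries, written by hand for illustration only.
sample = [
    "hus:hus<n><nt><sg><ind><nom>",
    "husen:hus<n><nt><pl><def><nom>",
    "springa:springa<vblex><inf>",
    "bilar:bil<n><ut><pl><ind><nom>",
]
print(noun_lemmas(sample))
```

To run it on real data, the output of `lt-expand apertium-swe.swe.dix` could be saved to a file and its lines passed to `noun_lemmas`.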

Re: [Apertium-stuff] Bylaws Overhaul Proposal

2020-04-22 Thread Jonathan Washington
If you don't like the diff that GitHub offers, you can clone the repos and use whatever diff tool you prefer. The git executable itself has several diff modes too, including a word diff (which sounds like what you're after?). -- Jonathan On Wed, Apr 22, 2020, 01:14 Tino Didriksen wrote: > On

[Apertium-stuff] How do I get a list of lemmas for nouns

2020-04-22 Thread Per Tunedal
Hi, I need an ordinary dictionary of Swedish lemmas (just the lemmas, nothing else). How do I accomplish this? I read the Wiki: http://wiki.apertium.org/wiki/Dixtools:_Grep Thus I tried: apertium-dixtools grep --par '.*__n' apertium-swe.swe.dix but nothing was filtered. I got the whole file.

Re: [Apertium-stuff] Where do I find the dictionaries

2020-04-22 Thread Per Tunedal
Hi, thank you all! apertium-get apertium-swe worked like a charm. Yours, Per Tunedal On Tue, Apr 21, 2020, at 19:05, Tino Didriksen wrote: > Correct, data packages are not meant for development use. > > The monolingual packages install only exactly as much as is needed for > building pair