Hi all,
My proposed GSoC project is "Robust tokenization in lttoolbox"[1].
Over the past few days I have studied the source files of *lt-proc*, the
lexical analysis tool of lttoolbox, in more detail. I now have a
preliminary idea of how lttoolbox implements tokenization.
My idea is recorded
Hi Per,
If I understand correctly, this might give what you want:
lt-expand apertium-swe.swe.dix | grep -E "[^<:>]+:[^<:>]+" | sed -E
's/[^<:>]+:([^<:>]+).*/\1/g' | uniq
lt-expand lists all the forms, grep finds all the ones where the first tag
is , sed gets rid of everything but the lemma, and
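
If a script is easier to tweak than the one-liner, here is a minimal Python
sketch of the same filtering idea. It assumes each lt-expand line looks like
surface:lemma<tag1><tag2>... and, like the grep above, it skips
direction-restricted lines of the form surface:>:lemma... or
surface:<:lemma.... The names extract_lemmas and first_tag are made up for
this illustration, and the sample lines are invented, not taken from
apertium-swe.

```python
import re

# Assumed line shape: surface:lemma<tag1><tag2>...
# Lines such as "a:>:b<n>" (direction-restricted) do not match and are skipped.
ENTRY = re.compile(r"^[^<:>]+:([^<:>]+)((?:<[^<>]+>)*)")


def extract_lemmas(lines, first_tag=None):
    """Return the lemmas in first-seen order, without duplicates.

    first_tag is a hypothetical convenience parameter: when given, only
    entries whose first tag equals it (for example "n") are kept.
    """
    seen = {}
    for line in lines:
        match = ENTRY.match(line.strip())
        if not match:
            continue
        lemma, tags = match.group(1), match.group(2)
        if first_tag is not None and not tags.startswith(f"<{first_tag}>"):
            continue
        seen.setdefault(lemma, None)  # dict keeps insertion order
    return list(seen)


# Invented sample lines, for illustration only.
sample = [
    "hus:hus<n><nt><sg>",
    "huset:hus<n><nt><sg><def>",
    "springer:springa<vblex><pres>",
    "a:>:b<n>",
]
all_lemmas = extract_lemmas(sample)
noun_lemmas = extract_lemmas(sample, first_tag="n")
```

Unlike plain uniq, which only collapses adjacent duplicates, this removes
repeated lemmas wherever they occur. Passing sys.stdin instead of the sample
list would let the function sit at the end of the lt-expand pipe.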
If you don't like the diff that GitHub offers, you can clone the repos and
use whatever diff tool you prefer. The git executable itself has several
diff modes too, including a word diff (which sounds like what you're
after?).
--
Jonathan
On Wed, Apr 22, 2020, 01:14 Tino Didriksen wrote:
> On
Hi,
I need an ordinary dictionary of Swedish lemmas (just the lemmas, nothing
else). How do I accomplish this?
I read the Wiki:
http://wiki.apertium.org/wiki/Dixtools:_Grep
Thus I tried:
apertium-dixtools grep --par '.*__n' apertium-swe.swe.dix
but nothing was filtered. I got the whole file.
Hi,
thank you all!
apertium-get apertium-swe worked like a charm.
Yours,
Per Tunedal
On Tue, Apr 21, 2020, at 19:05, Tino Didriksen wrote:
> Correct, data packages are not meant for development use.
>
> The monolingual packages install only exactly as much as is needed for
> building pair