Re: [Apertium-stuff] How useful is eliminating trimming for language developers?

2020-05-26 Thread Jonathan Washington
Hi all, After having read through and thought some on this thread, I have some responses. First of all, I don't care what the "default" is (i.e., whatever apertium-init creates without flags), as long as there remains choice. A lot of pairs already have things set up in different ways, and I

Re: [Apertium-stuff] How useful is eliminating trimming for language developers?

2020-05-26 Thread Jonathan Washington
On Tue, May 26, 2020, 08:48 Francis Tyers wrote: > On 2020-05-26 12:27, Kevin Brubeck Unhammer wrote: > > Xavi Ivars wrote: > > > >> * In the trimming disadvantages number 1, we're stating that we're OK > >> having crappy monodixes because we *fix* that later on with trimming. > >> I'm > >>

Re: [Apertium-stuff] How useful is eliminating trimming for language developers?

2020-05-26 Thread Francis Tyers
On 2020-05-26 12:27, Kevin Brubeck Unhammer wrote: Xavi Ivars wrote: * In the trimming disadvantages number 1, we're stating that we're OK having crappy monodixes because we *fix* that later on with trimming. I'm sure that's where we are now, but as a project that focuses a lot on

Re: [Apertium-stuff] How useful is eliminating trimming for language developers?

2020-05-26 Thread Kevin Brubeck Unhammer
Tanmai Khanna wrote: > Here's a timing test for weighted dictionaries. > On apertium-eng-kaz: > > 1. > real 0m4.257s > 2. > real 0m7.990s With nob→nno, the plain lt-trim in 1. takes 33s, whereas the long script in 2. takes 45s. File size increases from 1.2M to 5.5M, which seems acceptable (the unweighted,
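
To put the timings quoted above side by side, here is a minimal sketch that turns them into slowdown ratios. It is not part of the thread; the helper names `parse_real_seconds` and `slowdown` are made up for illustration, and the only inputs are the figures quoted in the messages (eng-kaz: 0m4.257s vs 0m7.990s; nob→nno: 33s vs 45s).

```python
import re


def parse_real_seconds(text: str) -> float:
    """Convert a bash `time`-style duration such as '0m4.257s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def slowdown(baseline: float, candidate: float) -> float:
    """Return how many times slower `candidate` is than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return candidate / baseline


# Figures as quoted in the thread: plain lt-trim (1.) vs the weighted script (2.).
eng_kaz = slowdown(parse_real_seconds("0m4.257s"), parse_real_seconds("0m7.990s"))
nob_nno = slowdown(33.0, 45.0)

print(f"eng-kaz slowdown: {eng_kaz:.2f}x")
print(f"nob-nno slowdown: {nob_nno:.2f}x")
```

The ratios only compare wall-clock time on whatever machines the posters used, so they say nothing about memory use or runtime lookup speed.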

Re: [Apertium-stuff] How useful is eliminating trimming for language developers?

2020-05-26 Thread Tanmai Khanna
> * In the trimming disadvantages number 1, we're stating that we're OK having crappy monodixes because we *fix* that later on with trimming. I'm sure that's where we are now, but as a project that focuses a lot on providing free (as in speech) language resources that are later used for many other

Re: [Apertium-stuff] How useful is eliminating trimming for language developers?

2020-05-25 Thread Tanmai Khanna
Here's a timing test for weighted dictionaries. On apertium-eng-kaz:
1. lt-trim analyser.bin bidix.bin analyser-found.bin
   Time: real 0m4.257s user 0m4.120s sys 0m0.131s
2. lt-trim analyser.bin bidix.bin analyser-found.bin
   lt-print -H analyser.bin > analyser.att
   lt-print -H

Re: [Apertium-stuff] How useful is eliminating trimming for language developers?

2020-05-25 Thread Samuel Sloniker
Maybe make trimming the default, but make apertium-init disable it for new pairs? On Mon, May 25, 2020, 10:01 Tino Didriksen wrote: > On Mon, 25 May 2020 at 12:29, Xavi Ivars wrote: > >> * In the trimming disadvantages number 1, we're stating that we're OK >> having crappy monodixes because we

Re: [Apertium-stuff] How useful is eliminating trimming for language developers?

2020-05-25 Thread Tino Didriksen
On Mon, 25 May 2020 at 12:29, Xavi Ivars wrote: > * In the trimming disadvantages number 1, we're stating that we're OK > having crappy monodixes because we *fix* that later on with trimming. I'm > sure that's where we are now, but as a project that focuses a lot on > providing free (as in

Re: [Apertium-stuff] How useful is eliminating trimming for language developers?

2020-05-25 Thread Daniel Swanson
Hi Apertiumers, > Wasn't there a "separable"-based solution that looked good though? Besides trimming and not trimming, I would like to suggest a third alternative. As of yesterday, apertium-separable can read and merge multiple source files. I suggest moving MWEs from monodixes to -separable

Re: [Apertium-stuff] How useful is eliminating trimming for language developers?

2020-05-25 Thread Kevin Brubeck Unhammer
Flammie A Pirinen wrote: >> 4. Weighting the monodix will take more compile time than just trimming it. > > Some numbers would be interesting, I think both are quite heavy and we > don't do much further processing in finite-state algebra (/hfst space) > so the weighted models won't blow up. In

Re: [Apertium-stuff] How useful is eliminating trimming for language developers?

2020-05-25 Thread Flammie A Pirinen
On Mon, May 25, 2020 at 03:10:28PM +0530, Tanmai Khanna wrote: > *Disadvantages:* > 1. The monodix has some erroneous analyses - wrong surface forms, wrong > analyses, or even MWEs that aren't really MWEs and can be translated word > by word. These are currently removed since bidixes are more

Re: [Apertium-stuff] How useful is eliminating trimming for language developers?

2020-05-25 Thread Kevin Brubeck Unhammer
Tanmai Khanna wrote: > *making trimming the norm and having the option of > eliminating it, or making eliminating trimming the norm and having the > option of activating it, or to have partial trimming, as discussed later.* I'd vote for keeping trimming the norm, implementing the project

Re: [Apertium-stuff] How useful is eliminating trimming for language developers?

2020-05-25 Thread Xavi Ivars
Hi, I'm not sure if default trimming or default non-trimming should be the right decision (probably, to be safer, a default-trimming approach would be better as the starting point), but I want to bring up a few comments on your list. * In the trimming disadvantages number 1, we're stating that

[Apertium-stuff] How useful is eliminating trimming for language developers?

2020-05-25 Thread Tanmai Khanna
Hey Apertiumers, This mail is regarding an ongoing project to eliminate dictionary trimming. The project idea can be found here. The project description was to work around everything in Why we trim