
Re: [Apertium-stuff] Lexical Selection

2021-03-25 Thread Francis Tyers via Apertium-stuff


On 2021-03-23 10:49, Helena Egea Piñeiro wrote:

Hi!

I wanted to ask about the difference in lexical selection between
Apertium 3.2 and 3.3. In an earlier thread I asked how it would be
possible to obtain several translation options, as in "Tengo mucho
trabajo" > "Tinc molta feina/treball". For this I followed the
recommendation to interrupt the pipeline, since for some examples in the
Apertium I installed from the spa-cat package the disambiguation was not
done through dictionary entries of the SRL or SLR type; there were
independent entries instead. When that is not the case, I get no
options from this flow:

 lt-proc -w
'/usr/share/apertium/apertium-spa-cat/spa-cat.automorf.bin' | cg-proc
-w '/usr/share/apertium/apertium-spa-cat/spa-cat.rlx.bin' |
apertium-tagger -g $2
'/usr/share/apertium/apertium-spa-cat/spa-cat.prob' |
apertium-pretransfer| lt-proc -b
'/usr/share/apertium/apertium-spa-cat/spa-cat.autobil.bin'

I don't know whether Apertium 3.2 allows entries without
disambiguation that are resolved later, or whether the Apertium tools
can show the options before a choice is made when processing entries of
the SRL or SLR type, or whether I can modify the dictionaries and leave
separate entries without affecting the later selection.


Hi Helena,

At the moment this is not possible. The main problem is that
bilingual dictionaries can contain correspondences that are not
compatible in terms of transfer rules. In fact, there is one in your
example:

Tengo un trabajo pesado. -> Tinc un treball/feina pesat*

The problem is that the transfer rules would also have to handle
multiple possibilities.
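
To make the incompatibility concrete, here is a minimal Python sketch. It is only an illustration: the toy lexicon and the `agree` and `render_alternatives` helpers are hypothetical and are not Apertium data or APIs. It assumes only that Catalan "treball" is masculine and "feina" is feminine, so the adjective needs a different form for each alternative.

```python
# Hypothetical toy lexicon: Catalan nouns with their grammatical gender.
NOUN_GENDER = {"treball": "m", "feina": "f"}

# Hypothetical adjective paradigm, indexed by gender.
ADJ_FORMS = {"pesat": {"m": "pesat", "f": "pesada"}}


def agree(noun: str, adj_lemma: str) -> str:
    """Return the noun followed by the adjective form that agrees with it."""
    return f"{noun} {ADJ_FORMS[adj_lemma][NOUN_GENDER[noun]]}"


def render_alternatives(nouns: list[str], adj_lemma: str) -> list[str]:
    """Build one phrase per noun alternative, each with its own agreement."""
    return [agree(noun, adj_lemma) for noun in nouns]


if __name__ == "__main__":
    for phrase in render_alternatives(["treball", "feina"], "pesat"):
        print(phrase)
```

A single transfer pass over a slash-separated "treball/feina" can emit only one adjective form, which is why each alternative would need its own pass through the agreement logic.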

This could be done, but what we have seen so far is that few people
actually want that functionality. For translators it is more work to
constantly have to delete one word out of every two. And for people
who only want to get an idea of what a text is about, it does not
help much either.

If you give us more information about your idea/project, perhaps we
can point you in a better direction.

Thank you very much for your email, and welcome to Apertium,

Francis M. Tyers


___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff

Re: [Apertium-stuff] Lexical Selection

2021-03-25 Thread Tino Didriksen

Quick note: You're talking about Apertium version 3.2 and 3.3. Those
versions are from 2010 and 2014. We're at Apertium version 3.7.1 these
days, and we only support development with latest versions of all tools.

Version 3.3 is so old it's not even in the oldest supported Ubuntu or
Debian.

If you need help installing newer versions (please use our binaries), ask
on IRC (Freenode #apertium).

-- Tino Didriksen



Re: [Apertium-stuff] Lexical Selection

2020-04-13 Thread Jonathan Washington

I believe the multitrans script in lex-tools (
https://github.com/apertium/apertium-lex-tools) makes it possible to get
all versions of the translation by expanding the dictionary and skipping
lexical selection.  So you'd get two sentences output for this particular
example:

- The season is more rainy
- The station is more rainy
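
As a rough illustration of what "all versions of the translation" means, the Python sketch below enumerates every combination of per-word options. It is not the multitrans implementation; the `expand` function and the toy input are assumptions made for this example.

```python
from itertools import product


def expand(sentence: list[list[str]]) -> list[str]:
    """Return every sentence obtained by picking one option per position."""
    return [" ".join(choice) for choice in product(*sentence)]


if __name__ == "__main__":
    # Each position lists its candidate translations (toy data, not bidix output).
    sentence = [["The"], ["season", "station"], ["is"], ["more"], ["rainy"]]
    for line in expand(sentence):
        print(line)
```

With one ambiguous word that has two options, the sketch yields exactly the two sentences listed above.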

--
Jonathan


Re: [Apertium-stuff] Lexical Selection

2020-04-03 Thread Kevin Brubeck Unhammer

Jaume Ortolà i Font wrote:

> Message from egea piñeiro helena on Wed, 1 Apr 2020 at 10:48:
>
>> How to show the text translated with the multiple options due to polysemy.
>> "The *season/station* more rainy is"
>
> This is a recurrent request, that could be useful in some applications, but
> there is no way to do it in Apertium now.

You can make a new pipeline that splits into separate lexical units
instead of disambiguating. There's an example for eng-ita at
http://wiki.apertium.org/wiki/Translate_without_disambiguation

Basically, replace cg-proc+apertium-tagger with

#!/usr/bin/python3
import sys
import streamparser

# Emit every reading of each lexical unit as its own unit, joined by [/].
for (b, lu) in streamparser.parse_file(sys.stdin, with_text=True):
    print(b + "[/]".join("^" + streamparser.reading_to_string(r) + "$"
                         for r in lu.readings), end="")

and replace lrx-proc with

#!/usr/bin/python3
import sys
import streamparser

# Same idea after biltrans: pair the source word form with each
# translation in turn.
for (b, lu) in streamparser.parse_file(sys.stdin, with_text=True):
    print(b + "[/]".join("^" + lu.wordform + "/" + streamparser.reading_to_string(r) + "$"
                         for r in lu.readings), end="")

in your pipeline and you get slash-separated alternatives.


Of course, this won't get handled correctly by transfer (transfer will
see e.g. several nouns in a row where there was one source noun), but if
all you want is to send all alternatives through, it may be Good Enough
for some purposes (e.g. testvoc, or MT for language learning).
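
For readers without streamparser installed, the splitting step can be sketched with the standard library alone. This is a simplified illustration, not the streamparser API: the `split_readings` function and the regular expression are assumptions, and the sketch ignores backslash escapes and superblanks, which the real stream format allows.

```python
import re

# Simplified pattern for a lexical unit ^...$ (no escape handling).
LU = re.compile(r"\^([^$]*)\$")


def split_readings(stream: str) -> str:
    """Rewrite each ^surface/r1/r2$ unit as ^r1$[/]^r2$."""
    def repl(match):
        surface, *readings = match.group(1).split("/")
        if not readings:
            # No analyses after the surface form: leave the unit untouched.
            return match.group(0)
        return "[/]".join(f"^{reading}$" for reading in readings)

    return LU.sub(repl, stream)


if __name__ == "__main__":
    print(split_readings("^estació/estació<n><f><sg>$ ^més/més<adv>/més<det>$"))
```

Text between lexical units is copied through unchanged, which mirrors what the `b` (blank) value does in the scripts above.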



Re: [Apertium-stuff] Lexical Selection

2020-04-03 Thread Tanmai Khanna

To show the translated text with multiple options due to polysemy would
require Apertium to preserve all the polysemous forms of a word in the
Lexical Unit. While this isn't possible as of now, we're working on a
project to try and extend the Apertium stream such that we can include an
arbitrary amount of secondary/optional information.

Once this is done, we can add an option to preserve the disambiguated
option as well as the other possible options in the Lexical Unit, and then
output all of these instead of the one that seems the most likely.

Tanmai


-- 
*Khanna, Tanmai*

Re: [Apertium-stuff] Lexical Selection

2020-04-03 Thread Jaume Ortolà i Font

Message from egea piñeiro helena on Wed, 1 Apr 2020 at 10:48:

> How to show the text translated with the multiple options due to polysemy.
> "The *season/station* more rainy is"
>

This is a recurrent request, that could be useful in some applications, but
there is no way to do it in Apertium now.


> Also, to study this, I'm trying to find out how to change the lexical
> selection rules (.lrx files) and see an actual change in the output, but
> the commands in the pipeline don't seem to refer to that file. All I can
> change to test different polysemy cases are the 'srl' and 'slr' attributes
> in the dictionary, along with the 'D' (default) option. I assumed that
> autolex.bin somehow called the lrx file, but I don't know for sure and I
> couldn't find information so far.
>

You can also see examples of lexical selection rules in the spa-cat metalrx
files. In eng-cat, Marc Riera can tell you how it is done.

Jaume Ortolà

Re: [Apertium-stuff] Lexical Selection

2020-04-01 Thread Hèctor Alòs i Font

Hi Helena,

There are lexical selection rules for "estació" in the apertium-fra-cat
pair. You can look at the file apertium-fra-cat.cat-fra.metalrx. If you
refine the rules for English, you can also do so in the ones for French
:) And if you need any further clarification, don't hesitate to ask.

Best regards,
Hèctor

Message from egea piñeiro helena on Wed, 1 Apr 2020 at 11:48:

> Hello!
>
>
> I'm working on the eng-cat and spa-cat dictionaries, trying to find a way
> to avoid lexical selection. I mean, in an example such as this one:
>
> ^El/The$ ^*estació/season/station*$ ^més/more$ ^plujós/rainy$ ^ser/be$
> ^el/the$ ^estiu/summer$
>
> how can the translated text be shown with the multiple options that arise
> from polysemy?
>
> "The *season/station* more rainy is"
>
> Also, to study this, I'm trying to find out how to change the lexical
> selection rules (.lrx files) and see an actual change in the output, but
> the commands in the pipeline don't seem to refer to that file. All I can
> change to test different polysemy cases are the 'srl' and 'slr' attributes
> in the dictionary, along with the 'D' (default) option. I assumed that
> autolex.bin somehow called the lrx file, but I don't know for sure and I
> couldn't find information so far.
>
>
> echo "L'estació més plujosa és l'estiu" | lt-proc -w
> '/usr/share/apertium/apertium-eng-cat/cat-eng.automorf.bin' | cg-proc -w
> '/usr/share/apertium/apertium-eng-cat/cat-eng.rlx.bin' | apertium-tagger -g
> $2 '/usr/share/apertium/apertium-eng-cat/cat-eng.prob' |
> apertium-pretransfer| lsx-proc
> '/usr/share/apertium/apertium-eng-cat/cat-eng.autosep.bin' | lt-proc -b
> '/usr/share/apertium/apertium-eng-cat/cat-eng.autobil.bin' |* lrx-proc -m
> -t '/usr/share/apertium/apertium-eng-cat/cat-eng.autolex.bin' *
>
>
> Thanks!
>
>

Re: [Apertium-stuff] Lexical selection rule learning questions

2019-06-14 Thread Francis Tyers


On 2019-06-14 05:43, Jonathan Washington wrote:

> Thu, 13 Jun 2019 at 20:54, Francis Tyers wrote:
>
> > On 2019-06-13 22:34, Danielle Rossetti Dos Santos wrote:
> > > Hello,
> > >
> > > I'm working with the monolingual transfer rule learning code and have
> > > a few questions:
> > >
> > > 1. I see some language pairs used to have a multi mode (such as in
> > > this old version of eng-cat [1]). They also used to have "poly"
> > > dictionaries (such as this one [2]). These files seem necessary for
> > > the latest monolingual rule learning script I've found [3]. Why do
> > > language pairs no longer have a multi mode or poly dictionaries?
> >
> > They are deprecated.
> >
> > > 2. Is there a script that can generate a poly dictionary from a
> > > bilingual dictionary?
> >
> > Not really no, it is deprecated.
>
> Multi modes seem to be an important step in the training of lexical
> selection rules using monolingual corpora.  According to both of the
> following pages, the mode is mandatory:
>
> http://wiki.apertium.org/wiki/Running_the_monolingual_rule_learning
> and
> http://wiki.apertium.org/wiki/Generating_lexical-selection_rules_from_monolingual_corpora
>
> Is there some way around this?  Alternatively, might it be appropriate
> to restore (and de-deprecate) poly dictionaries and multi modes so
> that lexical selection rules can be learned?  Or do you have some
> other suggestion for how to proceed here?



Look at the -multi mode in sh-mk (like the docs suggest)

https://github.com/apertium/apertium-hbs-mkd/blob/master/modes.xml

All other -multi modes with apertium-multi-translations are deprecated 
and should not be used/referred to.



F.



Re: [Apertium-stuff] Lexical selection rule learning questions

2019-06-13 Thread Jonathan Washington

Thu, 13 Jun 2019 at 20:54, Francis Tyers wrote:

> On 2019-06-13 22:34, Danielle Rossetti Dos Santos wrote:
> > Hello,
> >
> > I'm working with the monolingual transfer rule learning code and have
> > a few questions:
> >
> > 1. I see some language pairs used to have a multi mode (such as in
> > this old version of eng-cat [1]). They also used to have "poly"
> > dictionaries (such as this one [2]). These files seem necessary for
> > the latest monolingual rule learning script I've found [3]. Why do
> > language pairs no longer have a multi mode or poly dictionaries?
>
> They are deprecated.
>
> > 2. Is there a script that can generate a poly dictionary from a
> > bilingual dictionary?
>
> Not really no, it is deprecated.
>

Multi modes seem to be an important step in the training of lexical
selection rules using monolingual corpora.  According to both of the
following pages, the mode is mandatory:
http://wiki.apertium.org/wiki/Running_the_monolingual_rule_learning
and
http://wiki.apertium.org/wiki/Generating_lexical-selection_rules_from_monolingual_corpora

Is there some way around this?  Alternatively, might it be appropriate to
restore (and de-deprecate) poly dictionaries and multi modes so that
lexical selection rules can be learned?  Or do you have some other
suggestion for how to proceed here?

--
Jonathan



Re: [Apertium-stuff] Lexical selection rule learning questions

2019-06-13 Thread Francis Tyers


On 2019-06-13 22:34, Danielle Rossetti Dos Santos wrote:

> Hello,
>
> I'm working with the monolingual transfer rule learning code and have
> a few questions:
>
> 1. I see some language pairs used to have a multi mode (such as in
> this old version of eng-cat [1]). They also used to have "poly"
> dictionaries (such as this one [2]). These files seem necessary for
> the latest monolingual rule learning script I've found [3]. Why do
> language pairs no longer have a multi mode or poly dictionaries?

They are deprecated.

> 2. Is there a script that can generate a poly dictionary from a
> bilingual dictionary?

Not really no, it is deprecated.

> 3. The third step in the monolingual rule learning script I linked
> above says this should be run:
>
> cat europarl.en-es.es.tagged | ~/source/apertium-lex-tools/multitrans
> ~/source/apertium-en-es/en-es.autobil -m -f -t -n >
> europarl.en-es.es.multi-trimmed
>
> I was trying to do this step with the apertium-en-pt language pair
> using 10% of the English-Portuguese Europarl corpus. I stopped the
> program because the output file was getting really big (dozens of
> gigabytes). Is this expected behavior from ./multitrans with the -m
> option? If so, how are the English-Spanish Europarl examples run?


Yes, they are run with a very large harddisk. :)

However, it would be helpful to know

1) what kind of output you are getting
2) what the exact setup is that you are using.
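
A back-of-the-envelope Python sketch shows why the output can grow so quickly. It assumes that one output line is written per combination of translations, which is an assumption about the tool's behaviour and not something stated in this thread; the `combinations_per_sentence` helper is hypothetical.

```python
from math import prod


def combinations_per_sentence(translation_counts: list[int]) -> int:
    """Number of distinct sentences when each word picks one translation."""
    return prod(translation_counts)


if __name__ == "__main__":
    # Ten ambiguous words with three translations each.
    print(combinations_per_sentence([3] * 10))
```

Under that assumption the count is the product of the per-word translation counts, so a few highly ambiguous words in a long sentence dominate the size of the output.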

F.

F.
in order to work out if there



Re: [Apertium-stuff] Lexical selection

2017-02-16 Thread Hèctor Alòs i Font

Thanks, Kevin. Interesting solution :)


Re: [Apertium-stuff] Lexical selection

2017-02-14 Thread Kevin Brubeck Unhammer

Hèctor Alòs i Font wrote:

> I'm writing lexical selection rules for French prix (price/prize). Is there
> any way to match a word specifically at the beginning of a sentence (as can
> be done in CG)? Matching the word with capital letters is not a good
> solution in this case.

Something like

[onetwo.lrx, an XML file with three rules; its markup was lost in this archived copy]

will turn "One. H One. One." into "One. H Two. One.".

$ cat from-biltrans 
^One/Two/One$^./.$
^H/H$ 
^one/two/one$^./.$
^One/Two/One$^./.$

$ lrx-comp onetwo.lrx onetwo.autolex.bin
3: 47@51

$ lrx-proc -m -t onetwo.autolex.bin < from-biltrans
4:SELECT:1.0:One:One
^One/One$^./.$
4:SELECT:2.0:one:two
^H/H$ 
^one/two$^./.$
4:SELECT:4.0:One:One
^One/One$^./.$

The first One is matched only by rule 1.
The second One is matched only by rule 2.
The third One is matched by rules 1 and 3.

I don't know if there's a way to match "beginning-of-stream" explicitly
– then you could do with just two rules.
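
The intended effect, treating a word differently when it starts a sentence, can be sketched in Python on the same toy stream. This is a hypothetical post-processing illustration, not lrx-proc behaviour or lrx rule syntax; the `select_sentence_initial` function, its parameters, and the set of sentence-final marks are assumptions, and the data carries no tags, as in the message.

```python
import re

# Simplified pattern for a lexical unit ^...$ (no escape handling).
LU = re.compile(r"\^([^$]*)\$")
SENTENCE_END = {".", "!", "?"}


def select_sentence_initial(stream, lemma, initial_choice, default_choice):
    """Keep initial_choice for lemma at sentence start, default_choice elsewhere."""
    out = []
    pos = 0
    at_start = True  # the beginning of the stream counts as a sentence start
    for match in LU.finditer(stream):
        out.append(stream[pos:match.start()])
        source, *targets = match.group(1).split("/")
        if source.lower() == lemma.lower() and len(targets) > 1:
            wanted = initial_choice if at_start else default_choice
            # Fall back to all targets if the wanted one is not listed.
            kept = [t for t in targets if t.lower() == wanted.lower()] or targets
            out.append("^" + "/".join([source] + kept) + "$")
        else:
            out.append(match.group(0))
        at_start = source in SENTENCE_END
        pos = match.end()
    out.append(stream[pos:])
    return "".join(out)


if __name__ == "__main__":
    data = "^One/Two/One$^./.$\n^H/H$\n^one/two/one$^./.$\n^One/Two/One$^./.$\n"
    print(select_sentence_initial(data, "one", "one", "two"), end="")
```

On the input shown above, the sketch keeps "One" after the start of the stream and after a full stop, and "two" after "H", which matches the selections in the lrx-proc output apart from the trace lines.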



Re: [Apertium-stuff] Lexical selection also in source language

2016-02-21 Thread Mikel L. Forcada

Joonas,
well, in my message I was advocating more of a division of work (CG for
morphosyntactical disambiguation with lexical selection dealing with
problems related to lemmas). This does not mean that you cannot do
everything with CG — in fact, I see no principled reason why you can't.

A couple of things should however be taken into account:

(1) our CG processor is currently a bit slow, while Fran's lexical
selection processor is lots faster as rules are compiled as finite-state
processors

(2) when writing a morphosyntactical disambiguation rule in CG gets tough,
one can let a statistical part-of-speech tagger deal with the remaining
ambiguity (I recently added a Google Summer of Code idea to do this).

Hope this helps

Mikel




-- 
Mikel L. Forcada            E-mail: m...@dlsi.ua.es
Departament de Llenguatges  Phone: +34-96-590-9776
i Sistemes Informàtics      also +34-96-590-3772.
UNIVERSITAT D'ALACANT   Fax:   +34-96-590-9326, -3464
E-03071 ALACANT, Spain.

URL: http://www.dlsi.ua.es/~mlf

Re: [Apertium-stuff] Lexical selection also in source language

2016-02-21 Thread Joonas Kylmälä

Thanks for the information, Tino. So since we can achieve 100%
disambiguation with CG, there's no need to add an extra lexical
selection module after the CG! :)

On Sun, Feb 21, 2016 at 6:18 PM, Tino Didriksen  wrote:
> On 21 February 2016 at 17:07, Joonas Kylmälä  wrote:
>>
>> I read from  that
>> CG can leave 3-7% of all words ambiguous (not sure how reliable that
>> information is..) and at the moment the language pairs that use
>> vislcg3 don't have anything after vislcg3 in the pipeline that would
>> resolve those ambiguities, and so the first analysis is selected
>> whether or not it is the right one.
>
>
> It is not a limitation of CG. You can achieve 100% disambiguation if you add
> or improve the CG rules.
>
> -- Tino Didriksen
>

Re: [Apertium-stuff] Lexical selection also in source language

2016-02-21 Thread Mikel L. Forcada

Hi Joonas:
I always thought that the role of Constraint Grammar (CG) in Apertium was
more that of morphosyntactic disambiguation. Therefore, even if the CG
processor completely solved the morphosyntactic ambiguity of each and every
source-language surface form, there could still well be the chance for a
given *lemma* to have more than one target-language equivalent. For
instance, the Spanish surface form "registro" can have two lexical forms:
it may be a verb (registrar.vblex.pri.1.sg) or a noun (registro.n.m.sg). A
CG rule could use the context to discard the noun lexical form, but when
translating into English, any verb lexical form with "registrar" as lemma
could mean "search", as when the police have a "search warrant" and enter a
house to look for some evidence, or "register", as in annotating something
in a register.  This second problem is clearly a candidate for a lexical
selection module.
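
The second step can be pictured with a small Python sketch. It is not how lrx-proc works: the cue lists, the default, and the `select_translation` function are all hypothetical, chosen only to show a context-based choice between "search" and "register" once the verb reading has already been selected.

```python
# Hypothetical context cues (Spanish lemmas) for each English translation.
CUES = {
    "search": {"policía", "casa", "orden"},
    "register": {"nombre", "libro", "formulario"},
}
DEFAULT = "register"  # arbitrary fallback for this sketch


def select_translation(context_lemmas: set[str]) -> str:
    """Pick the translation whose cue words overlap most with the context."""
    best, best_score = DEFAULT, 0
    for translation, cues in CUES.items():
        score = len(cues & context_lemmas)
        if score > best_score:
            best, best_score = translation, score
    return best


if __name__ == "__main__":
    print(select_translation({"policía", "casa"}))
    print(select_translation({"nombre"}))
```

The point of the sketch is the division of labour: the part of speech is already fixed before this function runs, and only the choice among target-language lemmas remains.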

Cheers
Mikel

2016-02-21 17:07 GMT+01:00 Joonas Kylmälä :

> Hey everyone,
>
> I read from  that
> CG can leave 3-7% of all words ambiguous (not sure how reliable that
> information is..) and at the moment the language pairs that use
> vislcg3 don't have anything after vislcg3 in the pipeline that would
> resolve those ambiguities, and so the first analysis is selected
> whether or not it is the right one. Could we use the
> apertium-lex-tools (lrx-proc) also after cg-proc to get better
> translations? I think we might need to do some changes to
> apertium-lex-tools (or possibly not) in order to get it working.
>
> I also understand that it slows down the translation process a bit but
> it would benefit those people who want more accurate translations, and
> it would be easy to make it optional for those that don't want it
> because of the pipeline architecture we use.
>
> What do you think about this?
>
> -Joonas
>
>
>
>


-- 
Mikel L. Forcada            E-mail: m...@dlsi.ua.es
Departament de Llenguatges  Phone: +34-96-590-9776
i Sistemes Informàtics      also +34-96-590-3772.
UNIVERSITAT D'ALACANT   Fax:   +34-96-590-9326, -3464
E-03071 ALACANT, Spain.

URL: http://www.dlsi.ua.es/~mlf

Re: [Apertium-stuff] lexical selection: updates and questions

2011-12-02 Thread Isaac Clerencia

On Fri, Dec 2, 2011 at 11:27 AM, Francis Tyers fty...@prompsit.com wrote:
> One of our GCI students, Brian Toews, has been writing lexical selection
> rules for English-Spanish, people can check out his work here:
>
> https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-en-es/dev/apertium-en-es.en-es.lrx

Oh, I wasn't aware of this. Looks great. Is it already being used in
SVN? (i.e., if I build apertium from svn, will it already use this?)

-- 
Isaac Clerencia
isaac.cleren...@gmail.com

Re: [Apertium-stuff] lexical selection: updates and questions

2011-12-02 Thread Francis Tyers

On Fri, 2 Dec 2011 at 15:53, Jimmy O'Regan wrote:
> On 2 December 2011 15:51, Isaac Clerencia is...@warp.es wrote:
> > On Fri, Dec 2, 2011 at 11:27 AM, Francis Tyers fty...@prompsit.com wrote:
> > > One of our GCI students, Brian Toews, has been writing lexical selection
> > > rules for English-Spanish, people can check out his work here:
> > >
> > > https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-en-es/dev/apertium-en-es.en-es.lrx
> >
> > Oh, I wasn't aware of this. Looks great. Is it already being used in
> > SVN? (i.e., if I build apertium from svn, will it already use this?)
>
> No. This is bleeding edge - the mail you quote mentions a format change :)

There is a prototype which works, but I'm going to change the rule
format to take into account some suggestions from other apertium
developers. 

I hope to have a stable release by Christmas, at which point I'm sure the
rule format won't change again, so if anyone is interested in writing
rules for their language pair, please let me know.

Fran


