Re: [Apertium-stuff] Automatically change first-person to third-person
Hi Kevin,
I'd be interested in your new solution with rtx. I've attached your old Python script. Your answer from 2018:

$ curl http://termbin.com/sah4 >pres2pret.py
$ curl https://raw.githubusercontent.com/goavki/streamparser/master/streamparser.py >streamparser.py
$ sudo apt install apertium-nno
$ echo 'ååh hald no opp, eg gjer det etter kvart.' |\
  apertium-deshtml -n |\
  lt-proc /usr/share/apertium/apertium-nno/nno.automorf.bin |\
  cg-proc -1 /usr/share/apertium/apertium-nno/nno.rlx.bin |\
  python3 pres2pret.py |\
  lt-proc -g /usr/share/apertium/apertium-nno/nno.autogen.bin |\
  apertium-rehtml-noent

which gives the (rather ungrammatical) answer "ååh hald no opp, eg gjorde det etter kvart".

The link once again:
https://sourceforge.net/p/apertium/mailman/apertium-stuff/thread/1519736195.3991384.1284992528.191E054E%40webmail.messagingengine.com/#msg36238830
Or search for "change of tense" in the list archives. The original subject was: Automatic change of tense

--
Kind regards,
Per Tunedal

On Mon, Feb 14, 2022, at 20:10, Kevin Brubeck Unhammer wrote:
>> The link in that earlier email is dead, so I can't see what the original
>> script was doing, but based on the name it might have just been replacing
>> with , in which case, if you still have that script, you could
>> just edit it to replace with .
>
> Wops, I should've attached it …
>
> These days I think I'd use rtx for this, probably would be an even
> shorter file =D
>
> ___
> Apertium-stuff mailing list
> Apertium-stuff@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>
> Attachments:
> * signature.asc

Attached script (pres2pret.py):

#!/usr/bin/env python3
# Turn present-tense lexical verbs into preterite in an Apertium stream
# (between cg-proc -1 and lt-proc -g). Written against streamparser.py as
# downloaded in 2018; newer streamparser versions may use snake_case names.
import sys
from streamparser import parse_file, readingToString, SReading

for blank, lu in parse_file(sys.stdin, withText=True):
    pres = [s for r in lu.readings for s in r if s.tags == ['vblex', 'pres']]
    if pres:
        pret = [SReading(baseform=s.baseform, tags=['vblex', 'pret']) for s in pres]
        # Emit the changed unit once, as ^lemma<tags>$ for the generator
        print(blank + "^{}$".format(readingToString(pret)), end="")
    else:
        # Wrap every other word in [] so that lt-proc -g passes it through
        print(blank + "[" + lu.wordform + "]", end="")
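Swedish verbs do not inflect for person, so for first-to-third-person narration the main job is swapping the pronouns (jag → hon, mig → henne), and a script with the same shape as pres2pret.py could do it. Below is a minimal sketch that skips streamparser and works directly on the stream format (^surface/reading$). It assumes, hypothetically, that apertium-swe tags first-person singular pronouns with <prn>, <p1>, <mf> and <sg> and that a feminine <f> reading exists for the generator; the lemma "prpers" in the sample is made up. Check the real tags with lt-proc first. Possessives such as min → hennes are not handled.

```python
import re

# One lexical unit in Apertium's stream format: ^surface/reading1/reading2...$
LU = re.compile(r"\^([^/$]*)/([^$]*)\$")

# Hypothetical tags: check what apertium-swe actually outputs for "jag"/"mig".
FIRST_PERSON_SG = ("<prn>", "<p1>", "<sg>")


def first_to_third(stream):
    """Turn first-person singular pronouns into third-person feminine ones.

    Only the first reading of each unit is looked at (as after cg-proc -1).
    Changed units are emitted as ^reading$ for the generator (lt-proc -g);
    all other words are wrapped in [] as pres2pret.py does, so they pass
    through the generator untouched.
    """
    def rewrite(match):
        surface, readings = match.group(1), match.group(2)
        reading = readings.split("/")[0]
        if all(tag in reading for tag in FIRST_PERSON_SG):
            # p1 -> p3, and choose feminine gender: jag -> hon, mig -> henne
            reading = reading.replace("<p1>", "<p3>").replace("<mf>", "<f>")
            return "^" + reading + "$"
        return "[" + surface + "]"

    return LU.sub(rewrite, stream)


# Made-up analyser output for "Jag längtade"
sample = "^Jag/prpers<prn><subj><p1><mf><sg>$ ^längtade/längta<vblex><past><actv>$"
print(first_to_third(sample))
# ^prpers<prn><subj><p3><f><sg>$ [längtade]
```

In the pipeline it would take the place of `python3 pres2pret.py`, e.g. by wrapping it as `sys.stdout.write(first_to_third(sys.stdin.read()))`. Capitalisation ("Jag" → "hon") would still need post-editing.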
Re: [Apertium-stuff] Automatically change first-person to third-person
Hi again,
Oops, I happened to take the wrong book from the bookshelf. I should have taken one of those a bit further to the left :-( The examples are from the works of another author, Sun Axelsson, not Majgull Axelsson. But the question is still valid. And you could still make Majgull happy on her birthday.

Yours,
Per Tunedal

--
Kind regards,
Per Tunedal

On Mon, Feb 14, 2022, at 11:11, Per Tunedal wrote:
> [...]
[Apertium-stuff] Automatically change first-person to third-person
Hi,
Today is the birthday of the Swedish author Majgull Axelsson. I've just read an interview with her in my morning paper, Dagens Nyheter. She talks about her work on a new novel. She has just decided to change the narrative perspective from first person to third person (singular) and would like some help:
"Jag önskar mig innerligt i födelsedagspresent, ett datorprogram som kan ändra tempus och berättarperspektiv väldigt enkelt, skämtar hon."

My translation:
"For my birthday I fervently wish for a computer program that can change the tense and the narrative perspective very easily, she jokes."

I have already asked about changing the tense in the past, and that is doable.
https://sourceforge.net/p/apertium/mailman/apertium-stuff/thread/1519736195.3991384.1284992528.191E054E%40webmail.messagingengine.com/#msg36238830

She would like to change the perspective as well, something like:

Example 1:
"Jag längtade hem och lyfte glaset med en sliskig mintlikör till munnen. Jag svalde klunk efter klunk och tog klumparna jag svalde för sockeravlagringar. Tills jag såg att det var flugor."
(Her book "Honungsvargar", page 19)
(Roughly: "I longed for home and raised the glass of sickly-sweet mint liqueur to my mouth. I swallowed gulp after gulp and took the lumps I swallowed for sugar deposits. Until I saw that they were flies.")

To:
"Hon längtade hem och lyfte glaset med en sliskig mintlikör till munnen. Hon svalde klunk efter klunk och tog klumparna hon svalde för sockeravlagringar. Tills hon såg att det var flugor. ..." (narrator changed from first person to third person)

or:
"Hon längtar hem och lyfter glaset med en sliskig mintlikör till munnen. Hon sväljer klunk efter klunk och tar klumparna hon sväljer för sockeravlagringar. Tills hon ser att det är flugor. ..." (narrator changed from first person to third person, and past tense to present tense)

Example 2:
"Mig, skrev han, hade han aldrig älskat."
("Honungsvargar", page 23)
(Roughly: "Me, he wrote, he had never loved.")

To:
"Henne, skrev han, hade han aldrig älskat." (narrator changed from first person to third person)

Maybe the Apertium community could make her wish come true?

Yours,
Per Tunedal

BTW, a more fundamental change of narrative perspective would be harder, like changing the narrator from the woman to the man in this text. I don't think that's possible.
The first example would become something like:
"Han såg henne lyfta glaset med den sliskiga mintlikören till munnen. Hon svalde klunk efter klunk. Plötsligt stelnade hon till och stirrade ned i glaset. 'Flugor', skrek hon."
(Roughly: "He saw her raise the glass of sickly-sweet mint liqueur to her mouth. She swallowed gulp after gulp. Suddenly she stiffened and stared down into the glass. 'Flies,' she screamed.")
Re: [Apertium-stuff] Semantics in Apertium (was Apertium's Wider Use & Secondary Tags)
Hi all,
I liked your examples, Hèctor.

1. Synonyms might help with a problem in Swedish. As I've mentioned in the past, neuter nouns (t-gender, neutrum) in the singular indefinite form cannot be combined with adjectives ending in the letter "t" or "d". That adjective form is not used because it can neither be pronounced nor written. It is usually marked as "nonexistent" or "not used" in Swedish dictionaries and grammars. Some examples:
A lion cannot be afraid! "ett (impossible form of rädd) lejon"
But two lions can: "två rädda lejon"
The same applies to e.g. ghosts (spöken) and children (barn). Normally common gender (n-gender, utrum) is used for animate nouns (living things) in Swedish, but some words have for some reason got the "wrong" gender. When you encounter such words, you have to replace the adjective with a synonym (or rephrase).

2. Genre might be useful for word selection in some cases. In the past I began adding genre information to the Swedish word list, for future use. When choosing a good synonym for "rädd" (afraid) as above, there is no exact match. Synonyms are e.g. "skrajsen" (having got the wind up), genre fam = colloquial/casual/informal (familier), or "skräckslagen" (terrified/terror-struck), genre neu = neutral (neutre) or maybe a bit formal; "rädd" is much more common. (BTW the connotations differ between "rädd" and "skräckslagen"; the latter is stronger ...)
I used the following genres, inspired by Le Petit Robert, the Oxford Advanced Learner's Dictionary and Bonniers svenska ordbok:
neu = neutral (neutre)
sol = solemn (solennel)
fam = colloquial/casual/informal (familier)
pej = depreciatory/pejorative (dénigrant/péjoratif)
vulg = vulgar (vulgaire)
old = old-fashioned (vieilli/archaïque)
dial = dialectal (dialectal)
It might be a good idea to agree on which genres to use, and apply them to all languages.

3. In the past I also began adding domain information to the Swedish word list. I hoped it might be useful for word selection. I used e.g. c="domain:general style:fam" in the -tag, as proposed by Francis. I have no opinion on the best way to add the information; I'm just eager to have the possibility, and a way to use the information. It might be a good idea to agree on domains as well.

Yours,
Per Tunedal

On Mon, Jun 15, 2020, at 18:38, Hèctor Alòs i Font wrote:
> Here come several practical examples. I tried to select them for their
> variety. The result is more a wish list than something structured.
>
> Let's begin with "je la baise". Depending on the context this may be "I kiss
> her" or "I fuck her". The context can tell us if we are in a formal or
> colloquial type of language. Another issue is that in this case anaphora
> resolution can also help us: if the pronoun's referent is "hand", it can only
> be "kiss"; if it is a person, the doubt persists.
>
> Another kind of problem is the Arpitan words "chamô" ("camel"; plural
> "camels") and "chamôs" ("chamois"; unchanged in plural). So, translating into
> French, yesterday I got chamois in a Bible text of Exodus xD I solved it
> by deciding in a CG rule that all "chamôs" (with nothing around them marking
> the singular) are camels. (Similar cases in French: fil/fils, foi/fois, cour/cours)
>
> In French there are plenty of words with different meanings depending on the
> gender: livre, page, tour, etc. The problem is that often the immediate
> surrounding context does not disambiguate: des livres, les pages, de tour,
> etc. A similar but slightly different case is the word pairs homicide
> mf/homicide m, féminicide mf/féminicide m, parricide mf/parricide, etc.: the
> one with the gender "mf" is a person and the other is the action.
>
> Other problems come in lexical selection. For instance, as a rule, the Catalan
> preposition "de" is translated as "de" in French, but if the following word
> is a material, "en" must be selected (de fusta > en bois). So in the
> Catalan2French lrx file we have a list of materials, as we have a list of
> countries, a list of musical instruments, a list of animals, etc. I dream
> about a monolingual dictionary where we could get this kind of information.
> It is not practical to repeat these lists in many language pairs involving
> Catalan. This information should be in apertium-cat and not in every
> apertium-cat-xxx lrx file.
>
> Moreover, if we had words not only with different kinds of semantic labels,
> but also marked as synonyms, maybe it'd be possible to give a translation
> using a word labeled as a synonym (if it has a translation) instead of
> "unknown".
>
> Hèctor
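Per's c="domain:general style:fam" convention is easy to read back by a program, for example to choose a replacement when the neuter form of "rädd" is impossible. A minimal sketch, assuming the attribute holds space-separated key:value pairs exactly as in that example; the candidate list is hand-made from the genres given above (skrajsen = fam, skräckslagen = neu), not read from apertium-swe, and the function names are hypothetical.

```python
def parse_comment(c):
    """Parse 'domain:general style:fam' into {'domain': 'general', 'style': 'fam'}."""
    pairs = (item.split(":", 1) for item in c.split() if ":" in item)
    return {key: value for key, value in pairs}


def pick_synonym(candidates, preferred_styles):
    """Return the candidate whose style comes earliest in preferred_styles.

    candidates: list of (word, comment) pairs. Words whose style is not in
    preferred_styles are ignored; returns None if nothing fits.
    """
    best, best_rank = None, len(preferred_styles)
    for word, comment in candidates:
        style = parse_comment(comment).get("style")
        if style in preferred_styles:
            rank = preferred_styles.index(style)
            if rank < best_rank:
                best, best_rank = word, rank
    return best


# Hand-made example: synonyms for "rädd" when the neuter form is impossible
synonyms = [
    ("skrajsen", "domain:general style:fam"),
    ("skräckslagen", "domain:general style:neu"),
]
print(pick_synonym(synonyms, ["neu", "sol"]))  # skräckslagen
print(pick_synonym(synonyms, ["fam", "neu"]))  # skrajsen
```

Passing the preferred genres as an ordered list lets a formal text and a colloquial one pick different words from the same entries.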
[Apertium-stuff] OT: Diceware and Dicelist
Hi all,
thank you very much for the help with extracting words from apertium-swe.swe.dix. See the threads:
How do I get a list of lemmas for nouns
List of verbs

Thanks to you, I have managed to build Swedish word lists for creating secure passwords with the help of dice. A passphrase of random words is far easier to remember than a password of random characters.

You can find my first dicelist here:
https://github.com/havet/Dicelist

The original idea comes from Arnold G. Reinhold, and Diceware is his registered trademark. His list is for 5 dice and consists of 6^5 = 7776 English words. Later, the Electronic Frontier Foundation published three alternative word lists, two of them for 4 dice (6^4 = 1296 English words).

For a start, I've published a Swedish word list for 4 dice (1296 words). Roll 4 dice to get a random four-digit combination, which corresponds to a word in the list. The combination 1234, e.g., corresponds to the word "avog", and the combination 5316 to the word "roa". You need at least 8 words to form a secure password. It will be slightly stronger than a password of 12 random characters chosen from upper- and lower-case letters (a-z), digits and symbols.

I've excluded the numbers and the strange character combinations that are included in the original Diceware list. I have also tried to exclude rare words, offensive words, homophones and words that are hard to spell.

Contributions are welcome! Make a word list in your own language. It's fairly easy if your language is in Apertium. It helps to have access to lists of vulgar words and of homophones in your language. A word-frequency list is useful as well.

More information:
https://en.wikipedia.org/wiki/Diceware
http://world.std.com/~reinhold/diceware.html
https://www.eff.org/deeplinks/2016/07/new-wordlists-random-passphrases

Yours,
Per Tunedal
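The strength comparison above can be checked, and the dice-to-word lookup made explicit, with a short sketch. It assumes the word list is ordered 1111, 1112, ..., 6666, so that each roll is a base-6 number, and it takes the 94 printable ASCII characters (excluding space) as the alphabet for the random-character password; both are assumptions, not taken from the Dicelist repository. The `secrets` module only simulates fair dice.

```python
import math
import secrets


def roll_key(n_dice=4):
    """Simulate n_dice fair dice; returns a string like '1234' (digits 1-6)."""
    return "".join(str(secrets.randbelow(6) + 1) for _ in range(n_dice))


def key_to_index(key):
    """Map a dice key such as '1234' to a 0-based position in the word list."""
    index = 0
    for ch in key:
        index = index * 6 + (int(ch) - 1)  # each die is one base-6 digit
    return index


def entropy_bits(choices, picks):
    """Entropy in bits of `picks` independent uniform choices among `choices`."""
    return picks * math.log2(choices)


words = entropy_bits(6 ** 4, 8)  # 8 words from a 1296-word list
chars = entropy_bits(94, 12)     # 12 characters from 94 printable ASCII symbols
print(round(words, 1), round(chars, 1))  # 82.7 78.7
```

About 82.7 bits against 78.7 bits, which matches "slightly stronger". Real rolls should of course come from physical dice, as the method intends.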
Re: [Apertium-stuff] List of verbs
Hi again Kevin,
thank you for the explanation. Case closed!

Yours,
Per Tunedal

On Mon, May 11, 2020, at 13:39, Kevin Brubeck Unhammer wrote:
> "Per Tunedal" wrote:
>
> > Hi Kevin,
> > Thanks for the explanation. But what's the point of expanding the
> > dictionary, anyway?
>
> It gets you all forms, instead of just lemmas, and it can get you lemmas
> even where they're not marked with lm (the lm attribute is just treated
> as a comment by apertium code, whereas the part of the entry is
> actually used, thus more likely to be checked for correctness)
Re: [Apertium-stuff] List of verbs
Hi Kevin,
Thanks for the explanation. But what's the point of expanding the dictionary, anyway?

I successfully tried:
grep "lm=" apertium-swe.swe.dix | grep "__n_"
grep "lm=" apertium-swe.swe.dix | grep "__vblex"
grep "lm=" apertium-swe.swe.dix | grep "__adj"
etc.

Faster and easier. But I didn't get the two nouns "tur", because of the comment in Norwegian. I had to add tur manually. I used:
grep "lm=" apertium-swe.swe.dix | grep "__adj" | sed 's/\"><.*//' | sed 's/vals
But it didn't match these two lines:
turtur¹
turtur²
I didn't care, as it was just two lines that had comments.

Yours,
Per Tunedal

On Mon, May 11, 2020, at 10:18, Kevin Brubeck Unhammer wrote:
> "Per Tunedal" wrote:
>
> [...]
>
> > arna
> > arnas
> > arnas-
> > ars
> > ars-
> > I have so far not been able to find out where they come from. They are not
> > listed as nouns in apertium-swe.swe.dix
>
> Probably the sed not being able to handle lines like
>
> DJ:arna:DJ
>
> You may have to grep out lines with two colons first.
>
> > Among the adjectives I got e.g. the following verbs:
> > abbreviera
> > abdikera
> > abonnera
> > abortera
>
> Participles of verbs get tagged / . You can grep
> them out.
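Grepping single lines of the XML works as long as each entry sits on one line in the expected layout; the two tur entries with comments show how that can break. A sketch that reads the dix as XML instead, assuming entries of the form <e lm="..."><i>...</i><par n="..."/></e> with the part of speech at the end of the paradigm name, as the greps for __n_, __vblex and __adj imply. The miniature dictionary and its paradigm names are made up. Kevin's caveat still applies: lm is only a comment to Apertium tools, so lt-expand remains the more reliable source.

```python
import xml.etree.ElementTree as ET


def lemmas_by_pos(dix_text, pos):
    """Return sorted lemmas of <e lm="..."> entries whose paradigm is tagged pos.

    A paradigm counts when its name ends in "__<pos>" or contains "__<pos>_"
    (e.g. a hypothetical "bil__n_ut"), so "__n" does not also match "__num".
    Entries inside <pardef> have no lm attribute and are skipped.
    """
    root = ET.fromstring(dix_text)
    marker = "__" + pos
    lemmas = set()
    for entry in root.iter("e"):
        lemma = entry.get("lm")
        par = entry.find("par")
        if lemma is None or par is None:
            continue
        name = par.get("n", "")
        if name.endswith(marker) or (marker + "_") in name:
            lemmas.add(lemma)
    return sorted(lemmas)


# Made-up miniature dix; the paradigm names only imitate Apertium's naming style.
SAMPLE = """<dictionary><section id="main" type="standard">
  <e lm="tur"><i>tur</i><par n="tur__n_ut"/></e><!-- comments are ignored -->
  <e lm="hus"><i>hus</i><par n="hus__n_nt"/></e>
  <e lm="abdikera"><i>abdiker</i><par n="abdiker/a__vblex"/></e>
  <e lm="avog"><i>avog</i><par n="avog__adj"/></e>
</section></dictionary>"""

print(lemmas_by_pos(SAMPLE, "n"))      # ['hus', 'tur']
print(lemmas_by_pos(SAMPLE, "vblex"))  # ['abdikera']
```

For the real file one would read it first, e.g. `lemmas_by_pos(open("apertium-swe.swe.dix", encoding="utf-8").read(), "n")`.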
Re: [Apertium-stuff] List of verbs
Hi again,
I've found that the solution you suggested doesn't work properly. Some non-existent words are produced in the process and survive all the filtering. This got worse when I tried to get adjectives: the list was full of strange words, as well as words of other parts of speech, e.g. verbs. I suspected that the expansion produces some output that pollutes the result. So I tried working directly on apertium-swe.swe.dix, like this:
grep "lm=" apertium-swe.swe.dix | grep "__n_" | less
This produced a usable list of nouns. A side effect is that it is far faster.

Remember, I asked about some very strange Swedish "nouns":
arna
arnas
arnas-
ars
ars-
I have so far not been able to find out where they come from. They are not listed as nouns in apertium-swe.swe.dix.

Among the adjectives I got e.g. the following verbs:
abbreviera
abdikera
abonnera
abortera

I used:
lt-expand apertium-swe.swe.dix | grep -E "[^<:>]+:[^<:>]+" | sed -E 's/[^<:>]+:([^<:>]+).*/\1/g' | sed 's/[¹²³]//g'

Does anyone have a clue?

Yours,
Per Tunedal

On Tue, Apr 28, 2020, at 18:36, Samuel Sloniker wrote:
> egrep and fgrep are deprecated. Use grep -E and grep -F.
>
> On Tue, Apr 28, 2020 at 7:56 AM Per Tunedal wrote:
>> [...]
Re: [Apertium-stuff] Strange Swedish nouns in apertium-swe.swe.dix
Hi,
I suspect a bug somewhere. I'll look into it soon.

Yours,
Per Tunedal

On Fri, Apr 24, 2020, at 15:55, Per Tunedal wrote:
> Hi Fran,
> I thought there might be some reason for the strange nouns. Some trick
> to solve some problem? Or maybe an error in the expression I've used to
> get the list:
>
> lt-expand apertium-swe.swe.dix | grep -E "[^<:>]+:[^<:>]+" | sed -E 's/[^<:>]+:([^<:>]+).*/\1/g' | sort -u
>
> Yours,
> Per Tunedal
>
> On Fri, Apr 24, 2020, at 15:51, Francis Tyers wrote:
> > On 2020-04-24 14:48, Per Tunedal wrote:
> > > Hi,
> > > I've found some strange nouns in my list of Swedish nouns from
> > > apertium-swe.swe.dix:
> > > arna
> > > arnas
> > > arnas-
> > > ars
> > > ars-
> > >
> > > What's that?
> > >
> > > And a misspelled word:
> > > Södermalmvåning
> > > should be:
> > > Södermalmsvåning
> > >
> > > Yours,
> > > Per Tunedal
> >
> > Dear Per,
> >
> > Please feel free to send a pull request via GitHub!
> >
> > Best regards,
> >
> > Fran
Re: [Apertium-stuff] List of verbs
Hi,
thank you all for your kind help. I'm getting the lists I need.

Yours,
Per Tunedal

On Mon, Apr 27, 2020, at 20:35, Bernard Chardonneau wrote:
> Yes, I'd rather do that instead of
>
> (|||)
>
> and I also use fgrep and egrep instead of grep -F and grep -E,
> as it was (is?) in UNIX.
>
> > Date: Sun, 26 Apr 2020 10:40:39 -0700
> > From: Samuel Sloniker
> > To: apertium-stuff@lists.sourceforge.net
> > Reply-To: apertium-stuff@lists.sourceforge.net
> > Subject: Re: [Apertium-stuff] List of verbs
> >
> > Shouldn't also work?
> >
> > On Fri, Apr 24, 2020 at 7:25 AM Daniel Swanson wrote:
> >
> > > Also, to explain the patterns
> > >
> > > [^<:>]+ is "match any string of characters that doesn't contain a tag or a
> > > colon"
> > >
> > > So the grep is "anything without tags or colons (i.e. a surface form) then
> > > a colon then another string (a lemma) then a tag"
> > >
> > > The sed matches roughly the same thing except it has () around the lemma
> > > so it can refer to it later and .* to match whatever tags there may be. \1
> > > then replaces the line with the contents of the first (), i.e. the lemma.
>
> Bernard Chardonneau (France)
> Phone : [33] 9 72 36 32 90
> GSM phone : [33] 7 69 46 16 31
>
> An alternative Apertium translation website :
> http://apertiumtrad.tuxfamily.org
>
> Multilingual websites for my free softwares :
> http://libremail.free.fr and http://libremail.tuxfamily.org
> http://cyloop.tuxfamily.org (mainly translated with Apertium)
>
> My general website (in french only)
> http://bech.free.fr
[Apertium-stuff] List of verbs
Hi,
Now I would like to list the verbs. I don't fully understand the grep and sed expressions for getting the nouns:

lt-expand apertium-swe.swe.dix | grep -E "[^<:>]+:[^<:>]+" | sed -E 's/[^<:>]+:([^<:>]+).*/\1/g' | sort -u

How should I modify the expressions to get the verbs instead?

Yours,
Per Tunedal
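Daniel's explanation of the patterns can be tried out in Python, which also shows how to get verbs: only the tag after the lemma changes. In this sketch the tag is a parameter; n for nouns and vblex for lexical verbs are Apertium's usual tags (vblex is also what pres2pret.py checks), but the exact tag in Daniel's original command did not survive in this archive. The sample lines imitate lt-expand's surface:lemma<tags> output and are made up. The homonym marks are stripped as with Kevin's `sed 's/[¹²³]//g'`, and `sorted(set(...))` plays the role of `sort -u`.

```python
import re

# Homonym markers such as those in "tur¹" / "tur²"
HOMONYM_MARKS = str.maketrans("", "", "¹²³")


def lemmas_for_tag(lines, tag):
    """Return the sorted, de-duplicated lemmas whose first tag is <tag>."""
    # surface form (no < : >), a colon, the lemma (no < : >), then the tag
    pattern = re.compile(r"[^<:>]+:([^<:>]+)<" + re.escape(tag) + ">")
    lemmas = set()
    for line in lines:
        match = pattern.match(line)
        if match:
            lemmas.add(match.group(1).translate(HOMONYM_MARKS))
    return sorted(lemmas)


# Made-up lines in the style of lt-expand output
sample = [
    "tur:tur¹<n><ut><sg><ind><nom>",
    "turen:tur¹<n><ut><sg><def><nom>",
    "tur:tur²<n><ut><sg><ind><nom>",
    "abdikerar:abdikera<vblex><pres><actv>",
    "avog:avog<adj><posi><ut><sg><ind>",
]
print(lemmas_for_tag(sample, "n"))      # ['tur']
print(lemmas_for_tag(sample, "vblex"))  # ['abdikera']
```

On real data, the lines would come from running lt-expand on apertium-swe.swe.dix.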
Re: [Apertium-stuff] How do I get a list of lemmas for nouns
Hi,
Thank you, Kevin! Works like a charm. BTW, I had already changed 'uniq' to 'sort -u'.

Yours,
Per

On Thu, Apr 23, 2020, at 10:42, Kevin Brubeck Unhammer wrote:
> "Per Tunedal" wrote:
>
> > Hi Kevin,
> > thanks for the explanation. Thus they are homonyms. How do I get rid of the
> > duplicates?
> > I just want:
> >
> > tur
>
> before the `| uniq`, stick in
>
> | sed 's/[¹²³]//g'
>
> (You may have to change `uniq` to `sort -u` in case things are not ordered
> already)
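Kevin's remark about `uniq` versus `sort -u` is easy to see in a small sketch: `uniq` only collapses adjacent repeats, so unsorted input keeps its duplicates. The two functions below merely imitate the commands; Python sorts by code point, whereas `sort` follows the locale's collation.

```python
from itertools import groupby


def uniq(lines):
    """Like `uniq`: drop only *adjacent* duplicate lines."""
    return [key for key, _ in groupby(lines)]


def sort_u(lines):
    """Like `sort -u`: sort, then keep one copy of each line."""
    return sorted(set(lines))


lemmas = ["tur", "bil", "tur"]
print(uniq(lemmas))    # ['tur', 'bil', 'tur']
print(sort_u(lemmas))  # ['bil', 'tur']
```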
Re: [Apertium-stuff] How do I get a list of lemmas for nouns
Hi Kevin,
thanks for the explanation. So they are homonyms. How do I get rid of the duplicates? I just want:

tur

Yours,
Per Tunedal

On Thu, Apr 23, 2020, at 10:00, Kevin Brubeck Unhammer wrote:
> "Per Tunedal" wrote:
>
> > Hi Daniel,
> > Thank you! Works like a charm with a small exception.
> >
> > I get some strange duplicates like e.g. tur:
> >
> > tur¹
> > tur²
>
> slump vs färd (chance vs. trip), they have different paradigms:
>
> turtur¹
> turtur²
Re: [Apertium-stuff] How do I get a list of lemmas for nouns
Hi Daniel,
Thank you! Works like a charm, with a small exception.

I get some strange duplicates, e.g. tur:

tur¹
tur²

Yours,
Per Tunedal

On Wed, Apr 22, 2020, at 16:28, Daniel Swanson wrote:
> Hi Per,
>
> If I understand correctly, this might give what you want:
>
> lt-expand apertium-swe.swe.dix | grep -E "[^<:>]+:[^<:>]+" | sed -E 's/[^<:>]+:([^<:>]+).*/\1/g' | uniq
>
> lt-expand lists all the forms, grep finds all the ones where the first tag is
> , sed gets rid of everything but the lemma, and uniq removes duplicates.
>
> Daniel
>
> On Wed, Apr 22, 2020 at 7:54 AM Per Tunedal wrote:
>> [...]
[Apertium-stuff] How do I get a list of lemmas for nouns
Hi,
I need an ordinary dictionary of Swedish lemmas (just the lemmas, nothing else). How do I accomplish this?

I read the wiki:
http://wiki.apertium.org/wiki/Dixtools:_Grep

So I tried:
apertium-dixtools grep --par '.*__n' apertium-swe.swe.dix
but nothing was filtered. I got the whole file.

I have some trouble using grep, as I find regular expressions a bit hard to grasp. Unfortunately, I often get them wrong and get unexpected results.

Now, I would like a list of nouns (just the lemmas) for a start. Then I need lists of the other parts of speech, verbs for instance.

The command below, from http://wiki.apertium.org/wiki/Dictionary_reader:
apertium-dixtools dic-reader list-lemmas apertium-swe.swe.dix
gives me ALL lemmas. But I would like to choose the part of speech.

I'm running Ubuntu as an app on Windows 10.

Please give me a hand!

Yours,
Per Tunedal
Re: [Apertium-stuff] Where do I find the dictionaries
Hi,
thank you all! apertium-get apertium-swe worked like a charm.

Yours,
Per Tunedal

On Tue, Apr 21, 2020, at 19:05, Tino Didriksen wrote:
> Correct, data packages are not meant for development use.
>
> The monolingual packages install only exactly as much as is needed for
> building pair packages and what an end-user may need for corpus analysis.
>
> Developers can use the apertium-get helper to install and build a
> development-usable data package from source. E.g. running "apertium-get
> apertium-swe" will install apertium-swe in the active folder.
>
> -- Tino Didriksen
>
> On Tue, 21 Apr 2020 at 18:29, Jonathan Washington wrote:
>> Hi Per,
>>
>> To add to what Daniel said, language data installed from apt is put in
>> system directories as root, and is not good for doing dev work.
>>
>> As a fairly up-to-date Apertium language data developer, I don't know the
>> path of system-installed language data off the top of my head (you can
>> always run dpkg -L apertium-swe to find out) and I'm not even sure it
>> includes the uncompiled dictionaries. Maybe I'm just an elite developer
>> without my pulse on the needs of actual Apertium users.
>>
>> But I do recommend what Daniel suggested; that would be the easiest approach,
>> imo.
>>
>> --
>> Jonathan
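Jonathan's `dpkg -L apertium-swe` lists the files a package installed. To check whether any uncompiled .dix files are present at all, for example under /usr/share/apertium (where Kevin's commands find the compiled nno files) or in the folder apertium-get creates, a small sketch that searches a directory tree might look like this. The function name is hypothetical and the root directory is whatever you pass in.

```python
from pathlib import Path


def find_dix(root, pattern="*.dix"):
    """Return every file below root whose name matches pattern, sorted."""
    return sorted(path for path in Path(root).rglob(pattern) if path.is_file())
```

Usage: `for path in find_dix("/usr/share/apertium"): print(path)`. An empty result would confirm Jonathan's suspicion that the system packages ship only compiled files.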
[Apertium-stuff] Where do I find the dictionaries
Hi,
I'm a bit rusty, not having used Apertium for a long time. I would like to get a dictionary containing Swedish lemmas, doing something like:
apertium-dixtools grep --par '.*__n' apertium-swe.dix

Where do I find the Swedish monodix?

I'm running Ubuntu as an app on Windows 10. I've installed the Apertium nightly build. The language pairs swe-dan and swe-nor are installed from the repository with sudo apt-get install ... And I've successfully installed apertium-dixtools. Then I got stuck: I cannot figure out where the language files are installed.

Yours,
Per Tunedal
[Apertium-stuff] AddToDix-for-Apertium
Hi,
I've just uploaded my old Java programs to GitHub:
https://github.com/havet/AddToDix-for-Apertium

I have published them in the hope that they might be useful for the Apertium community. The programs are made for contributors who prefer human languages to XML and who work in Windows. The main advantage is that the contributor doesn't have to edit the actual dictionaries, which are written in XML. A language-savvy person wanting to contribute might be deterred by the look of the XML files, and might make lots of trivial errors when trying to edit the code.

The tools are made for adding to the old, deprecated Apertium dictionaries for translation between Danish and Swedish (apertium-sv-da.sv etc.). They have to be adapted to the new dictionaries to be useful for contributing to Apertium.

Yours,
Per Tunedal

PS I noticed the programs were still being downloaded from my old site. I plan to remove them from that site, along with other programs. Some of the other programs will be published on GitHub as well.
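The core of such a tool is small: the contributor supplies a word and a paradigm, and the program writes well-formed XML, including escaping of characters such as & and quotes. A minimal sketch of that step, not AddToDix's actual code (which is Java and targets the old sv-da dictionaries), assuming a monodix entry of the form <e lm="..."><i>stem</i><par n="..."/></e> and paradigm names of the form stem/suffix__pos; the paradigm "abdiker/a__vblex" is hypothetical.

```python
import xml.etree.ElementTree as ET


def make_entry(lemma, paradigm):
    """Build a monodix entry <e lm="..."><i>stem</i><par n="..."/></e>.

    Assumes paradigm names follow a "stem/suffix__pos" pattern, e.g. the
    hypothetical "abdiker/a__vblex": the text after "/" (here "a") is cut off
    the lemma to get the stem. Names without "/" keep the whole lemma.
    """
    head = paradigm.split("__")[0]
    suffix = head.split("/", 1)[1] if "/" in head else ""
    if suffix and not lemma.endswith(suffix):
        raise ValueError(f"{lemma!r} does not end in {suffix!r} required by {paradigm!r}")
    stem = lemma[: len(lemma) - len(suffix)]
    entry = ET.Element("e", lm=lemma)
    ET.SubElement(entry, "i").text = stem
    ET.SubElement(entry, "par", n=paradigm)
    # ElementTree escapes special characters in text and attributes
    return ET.tostring(entry, encoding="unicode")


print(make_entry("dansa", "abdiker/a__vblex"))
print(make_entry("tur", "tur__n"))
```

Rejecting a lemma that does not fit the paradigm's suffix catches one of the trivial errors a hand-edited entry could contain.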
[Apertium-stuff] Apertium versions for iPhone and iPad
Hi, may I remind you of the idea of dual licensing to permit Apertium apps for iPhone and iPad? We have the opportunity whenever a language is created from scratch. In those cases a limited number of developers are involved, and their consent can be obtained. The same applies to new modules etc. Yours, Per Tunedal
Re: [Apertium-stuff] Automatic change of tense
Hi Kevin, that would work. Obviously, some post-editing is needed, but Apertium would do a lot of the tedious work. I'll give it a try. Nice Python script. Thank you! Yours, Per Tunedal On Tue, Feb 27, 2018, at 21:00, Kevin Brubeck Unhammer wrote: > > $ curl http://termbin.com/sah4 >pres2pret.py > > $ curl > https://raw.githubusercontent.com/goavki/streamparser/master/streamparser.py > >streamparser.py > > $ sudo apt install apertium-nno > > $ echo 'ååh hald no opp, eg gjer det etter kvart.' |\ > apertium-deshtml -n |\ > lt-proc /usr/share/apertium/apertium-nno/nno.automorf.bin |\ > cg-proc -1 /usr/share/apertium/apertium-nno/nno.rlx.bin |\ > python3 pres2pret.py|\ > lt-proc -g /usr/share/apertium/apertium-nno/nno.autogen.bin |\ > apertium-rehtml-noent > > > which gives the (rather ungrammatical) answer "ååh hald no opp, eg > gjorde det etter kvart". > > > Per Tunedal <per.tune...@operamail.com> wrote: > > > Hi all of you, > > can Apertium be used to change tense in a text? > > > > Scenario: > > > > I've written a text of some hundred pages in e.g. past tense and would > > like to have it in present (or the other way around). > > > > I suppose all information needed is in the monolingual dictionary for > > the language in question. We've got an analyser and a generator. > > > > The question is: > > > > How do I put this together to: > > > > - analyse a monolingual text > > - change the tense of the verbs as needed > > - generate the text with the chosen tense > > > > Yours, > > Per Tunedal > > Email had 2 attachments: > + pres2pret.py > 1k (text/x-python) > + signature.asc > 1k (application/pgp-signature)
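The middle step of that pipeline (analysed stream in, stream for the generator out) can also be sketched without streamparser. The sketch below is not Kevin's pres2pret.py. It is a minimal regex version under three assumptions: each lexical unit carries exactly one reading, as after cg-proc -1; there are no escaped characters; and the verb tags are exactly <vblex><pres>. Like the original script, it wraps every other surface form in [ ] so that the generator passes it through unchanged. The Nynorsk analyses in the test are made up for illustration.

```python
import re

# ^surface/reading$ as printed by cg-proc -1 (one reading per lexical unit)
LU = re.compile(r"\^([^/$]*)/([^$]*)\$")
# a lemma followed by zero or more <tag>s
READING = re.compile(r"([^<]*)((?:<[^>]+>)*)")


def pres_to_pret(stream, old="<vblex><pres>", new="<vblex><pret>"):
    """Rewrite verbs tagged `old` to `new`; wrap every other surface form in [ ]."""
    def rewrite(m):
        surface, reading = m.group(1), m.group(2)
        parts = READING.fullmatch(reading)
        if parts and parts.group(2) == old:
            return "^" + parts.group(1) + new + "$"
        # Bracketed text is treated as a blank and copied through by lt-proc -g
        return "[" + surface + "]"
    return LU.sub(rewrite, stream)
```

Going the other way, from past to present as Per's scenario also requires, only means swapping the old and new tag strings.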
[Apertium-stuff] Automatic change of tense
Hi all of you, can Apertium be used to change tense in a text? Scenario: I've written a text of some hundred pages in e.g. past tense and would like to have it in present (or the other way around). I suppose all information needed is in the monolingual dictionary for the language in question. We've got an analyser and a generator. The question is: How do I put this together to: - analyse a monolingual text - change the tense of the verbs as needed - generate the text with the chosen tense Yours, Per Tunedal
Re: [Apertium-stuff] apertium-swe-nor 0.2.0 released!
Hi Fran, sorry for the confusion. I'm a bit rusty on Apertium. If the meaning anledning = anledning is the least frequent, then 1) removing the anledning = anledning entry and 2) adding an entry möjlighet = anledning (or tillfälle = anledning) would do the trick. Yours, Per Tunedal On Wed, Jun 8, 2016, at 15:39, Francis Tyers wrote: > On 2016-06-08 14:37, Per Tunedal wrote: > > Hi Fran! > > You'd better double check this with a Norwegian, because looking in my > > Norwegian dictionary "anledning" can have more meanings: > > 1) what I described (very common as far as I know) > > 2) = tillfälle, tilldragelse (occurrence, event) > > 3) = anledning, orsak (as in Swedish: reason, motive) > > > > I don't know how frequent the third meaning is, but I have never > > noticed > > it - maybe because there wasn't any problem :-) > > > > If the first meaning is the most frequent, I would translate to > > "möjlighet" or maybe "tillfälle" would be better as it includes meaning > > 2 above (but it's not fluent Swedish in the first meaning). > > > >> <e><p><l>anledning<s n="n"/></l><r>möjlighet<s n="n"/></r></p></e> > > That entry doesn't make sense as "möjlighet" is not a Norwegian word. > > The pair is Swedish -- Norwegian, so that means that > <l>SWEDISH</l><r>NORWEGIAN</r> > > <e><p><l>anledning<s n="n"/></l><r>anledning<s n="n"/></r></p></e> > <e><p><l>anledning<s n="n"/></l><r>grunn<s n="n"/></r></p></e> > <e><p><l>anledning<s n="n"/></l><r>høve<s n="n"/></r></p></e> > > <e><p><l>tillfälle<s n="n"/></l><r>anledning<s n="n"/></r></p></e> > <e><p><l>möjlighet<s n="n"/></l><r>anledning<s n="n"/></r></p></e> > > Is what you are suggesting simply to remove the anledning = anledning > entry? > > Fran
Re: [Apertium-stuff] apertium-swe-nor 0.2.0 released!
Hi Fran! You'd better double check this with a Norwegian, because according to my Norwegian dictionary "anledning" can have more meanings: 1) what I described (very common as far as I know) 2) = tillfälle, tilldragelse (occurrence, event) 3) = anledning, orsak (as in Swedish: reason, motive) I don't know how frequent the third meaning is, but I have never noticed it - maybe because there wasn't any problem :-) If the first meaning is the most frequent, I would translate to "möjlighet", or maybe "tillfälle" would be better as it includes meaning 2 above (but it's not fluent Swedish in the first meaning). > <e><p><l>anledning<s n="n"/></l><r>möjlighet<s n="n"/></r></p></e> Yours, Per Tunedal On Wed, Jun 8, 2016, at 11:11, Francis Tyers wrote: > On 2016-06-08 09:52, Per Tunedal wrote: > > Hi! > > Congratulations! > > I've done some very superficial testing and so far just found some > > problems with well-known "false friends", e.g. the common word > > "anledning". The meaning in Swedish is "reason, motive", but in > > Norwegian rather "opportunity, possibility". This might cause some > > severe misunderstandings between Norwegians and Swedes. > > > > If a Swede for instance invites a Norwegian to a > > party/wedding/christening, the Norwegian might answer "Jeg har ikke > > anledning til det." A direct, word for word, translation to "Jag har > > inte anledning till det" would be a very offensive answer to the > > invitation and might cause lifelong hostility, if the > > misunderstanding isn't resolved. > > > > My neighbour answered me: "... vis jeg har anledning til det." And I > > explained to him that it wasn't a good answer to a Swede ... > > > > It might be a good idea to check for other false friends and adjust the > > translation. > > Thanks for the bug report! > > Here are the entries in the bilingual dictionary for "anledning", how > would you change them to > make them better? 
> > <e><p><l>anledning<s n="n"/></l><r>anledning<s n="n"/></r></p></e> > <e><p><l>anledning<s n="n"/></l><r>grunn<s n="n"/></r></p></e> > <e><p><l>anledning<s n="n"/></l><r>høve<s n="n"/></r></p></e> > > <e><p><l>tillfälle<s n="n"/></l><r>anledning<s n="n"/></r></p></e> > > Fran
Re: [Apertium-stuff] apertium-swe-nor 0.2.0 released!
Hi! Congratulations! I've done some very superficial testing and so far just found some problems with well-known "false friends", e.g. the common word "anledning". The meaning in Swedish is "reason, motive", but in Norwegian rather "opportunity, possibility". This might cause some severe misunderstandings between Norwegians and Swedes. If a Swede, for instance, invites a Norwegian to a party/wedding/christening, the Norwegian might answer "Jeg har ikke anledning til det." A direct, word-for-word translation to "Jag har inte anledning till det" would be a very offensive answer to the invitation and might cause lifelong hostility if the misunderstanding isn't resolved. My neighbour answered me: "... vis jeg har anledning til det." And I explained to him that it wasn't a good answer to a Swede ... It might be a good idea to check for other false friends and adjust the translation. Yours, Per Tunedal On Tue, Jun 7, 2016, at 22:51, Kevin Brubeck Unhammer wrote: > 111 years ago today, the union between Sweden and Norway was dissolved. > But we're still good friends. > > Here's the first proper release of apertium-swe-nor, giving translation > from Swedish→Nynorsk, Nynorsk→Swedish, Swedish→Bokmål and Bokmål→Swedish > – meaning all directions between Swedish, Norwegian and Danish are now > covered by apertium :-) > > Changes from the beta[1] include better disambiguation of Swedish, > complete > testvoc, expanded bidix and better transfer from/to supine. We'll be > asking the natives on this list for some help with evaluation in a bit … > > As with the previous dan-nor and swe-dan releases, this work was > sponsored by Apertium and Wikimedia Foundation. 
> > Signed tarballs available from > https://sourceforge.net/projects/apertium/files/apertium-swe/ (0.7.0) > https://sourceforge.net/projects/apertium/files/apertium-nno/ (0.9.0) > https://sourceforge.net/projects/apertium/files/apertium-nob/ (0.9.0) > https://sourceforge.net/projects/apertium/files/apertium-swe-nor/ (0.2.0) > > The pair is already testable from https://apertium.org and it seems > Kartik and Tino are hard at work packaging stuff so it should be in > Content Translation for testing Quite Soon™. > > > [1] http://permalink.gmane.org/gmane.comp.nlp.apertium/5809 > > -- > Kevin Brubeck Unhammer > > GPG: 0x766AC60C > Email had 1 attachment: > + signature.asc > 1k (application/pgp-signature)
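A mechanical starting point for Per's suggestion to check for other false friends is to list the bidix entries whose Swedish and Norwegian lemmas are spelled identically. Not every such pair is a false friend, but every spelling-identical false friend such as "anledning" will show up. This is a rough sketch assuming the <e><p><l>lemma<s n="..."/></l><r>lemma<s n="..."/></r></p></e> bidix layout. It compares only the text before the first tag, so multiword lemmas containing <b/> are not handled.

```python
import xml.etree.ElementTree as ET


def identical_pairs(bidix_xml):
    """Lemmas spelled the same on the <l> and <r> side of a bidix <p> pair."""
    root = ET.fromstring(bidix_xml)
    same = []
    for pair in root.iter("p"):
        left, right = pair.find("l"), pair.find("r")
        if left is None or right is None:
            continue
        # .text is the lemma up to the first child element, e.g. <s n="n"/>
        l_lemma = (left.text or "").strip()
        r_lemma = (right.text or "").strip()
        if l_lemma and l_lemma == r_lemma and l_lemma not in same:
            same.append(l_lemma)
    return same
```

A native speaker would then review the resulting list by hand.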
Re: [Apertium-stuff] Backporting of swe-dan to sv-da was: Re: Java Pairs Updated
Hi, On Mon, Mar 7, 2016, at 09:58, Kevin Brubeck Unhammer wrote: > Per Tunedal <per.tune...@operamail.com> wrote: > > > Hi Mikel, > > Would you include it in the Java-versions, if I make a back-port of > > swe-dan to sv-da? > > As mentioned, you don't need a backport, you just need to remove the CG > from the pipeline. I don't know how the modes files are represented in > the omegaT plugin, but presumably there is somewhere in it that says > > "lt-proc swe-dan.automorf.bin" > "cg-proc swe-dan.rlx.bin" > "apertium-tagger -g swe-dan.prob" > > etc.; and then you just remove the "cg-proc" bit and leave the rest, and > it should still run. > > > Is it a requirement that the new version would be "released"? What would > > qualify the new version of sv-da as a "release"? > > If we are to serve it from the official channels, yes, -- snip -- Yes, please. > > -Kevin Per Tunedal
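Kevin's advice amounts to deleting one stage from a pipeline. The sketch below does this on a plain |-separated pipeline string like the one he quotes. How the OmegaT plugin actually stores its modes is not known from the thread, so this is only an illustration. It also splits naively on "|", so it would break on stages that contain a quoted "|", such as some sed expressions.

```python
def drop_stage(pipeline, program="cg-proc"):
    """Remove every stage of a |-separated pipeline whose command is `program`."""
    stages = [stage.strip() for stage in pipeline.split("|")]
    kept = [s for s in stages if s and s.split()[0] != program]
    return " | ".join(kept)
```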
Re: [Apertium-stuff] Lemmatisation was: Re: apertium-swe-dan-0.7.0 released
Hi, The page http://wiki.apertium.org/wiki/Lemmatisation tells me to run: $ echo "Den här är en test." | apertium -d . swe-tagger | cg-proc guesser.bin | sed 's/<[^>]\+>//g' | cg-proc -n guesser.bin So the swe-tagger mode already implies that CG is used for disambiguation, and I don't have to change anything at all? What about other languages? Will the command work just by changing swe to the actual language, regardless of whether the language uses CG or not? Yours, Per Tunedal On Mon, Mar 7, 2016, at 10:20, Kevin Brubeck Unhammer wrote: > Kevin Brubeck Unhammer <unham...@fsfe.org> wrote: > > > Per Tunedal <per.tune...@operamail.com> wrote: > > > >> Hi again, > >> Obviously, CG would be quite helpful for disambiguation when doing > >> lemmatisation. Would it be complicated to add an option to use CG (if > >> present)? Using the cg-rules for the language would probably remove some > >> more ambiguity. > > > > Exchange -tagger for -disam to run CG as well. > > Sorry, I confused myself: -tagger actually runs CG now. So > > swe-morph = lt-proc swe.automorf.bin > > swe-disam = lt-proc swe.automorf.bin | cg-proc swe.rlx.bin > > swe-tagger = lt-proc swe.automorf.bin | cg-proc swe.rlx.bin | > apertium-tagger swe.prob > Email had 1 attachment: > + signature.asc > 1k (application/pgp-signature)
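The sed step in the wiki command, sed 's/<[^>]\+>//g', deletes every <...> tag and leaves lemmas, the ^ $ delimiters and the / separators in place. The Python equivalent below shows the effect. The input strings in the test are illustrative and not actual swe-tagger output.

```python
import re

TAG = re.compile(r"<[^>]+>")


def strip_tags(stream):
    # Same effect as sed 's/<[^>]\+>//g': delete every <...> tag, keep everything else
    return TAG.sub("", stream)
```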
Re: [Apertium-stuff] Software supporting the translation process
Hi Trond, As I have some experience of both translation and of using OmegaT, I just want to point out some issues: 1. Translators are very often explicitly forbidden to use any kind of web tool due to strict confidentiality enforced by the client company. 2. The Apertium plug-in for OmegaT has the great advantage over e.g. Google Translate that it runs locally. BTW, another advantage is of course that it's free; using the Google API will be quite expensive in the long run. Unfortunately, the translation quality is presently far better with Google Translate. Still, Apertium is an interesting option for translators working in the proliferating commercial market. Making the OmegaT plugin a web service would certainly make it less interesting for translators. Finally, I agree with your reflection: > I am quite > satisfied with the Apertium+Omega-T platform as it is, the only problem > is that it does not work for the languages I work with. Yours, Per Tunedal On Sun, Mar 6, 2016, at 21:22, Trosterud Trond wrote: > > As one of the people working hard for some of the Apertium projects in > the nursery catalogue, I find it a challenge to convince people in the > language communities that this whole enterprise is a good idea. Since the > usage scenario is translation for text production, and not gisting, I am > dependent upon two things: > > - high quality output (which obviously I and my team are responsible for > ourselves), and > - translation programs to support translators in their work. > > For the last type, there are two candidates for Apertium: The Wikipedia > content translator, which is great, and has the functionality I want (see > below), but is only for Wikipedia translation, and the Apertium + Omega-T > program setup. > > I have for a while tried to get the Apertium+Omega-T working for sme-smn, > but with no results. My last attempt (and bugfix from one of the involved > developers, thanks!!)
brought me to the point where I got a window > telling there was a path problem. > > Annoying as this showstopper is, that is not the point of this letter, it > is only a symptom of the neglect the issue has. My point is that the good > translation programs we build within the Apertium framework are not put > into use (and hence lose the opportunity for much developmental feedback > from users and communities), since we do not have platforms for their > use. > > Prompsit uses Apertium, this I think is fantastic. But they have their > own priorities, and most of the language pairs we work with are outside > those priorities. > > What I would like to see is first and foremost work on the > Apertium+Omega-T (or similar) platform(s), set up as a **web-based** > service, so that users may download the program, and set up the MT > service with paths (preferably by choosing languages from a menu, or > possibly having a menu referring to a dynamic list of language pairs). I am quite > satisfied with the Apertium+Omega-T platform as it is, the only problem > is that it does not work for the languages I work with. And when I cannot > get it to work, the actual translators will not make it either. What they > need is a setup that saves their time, where they may either take the MT > sentence offered, or translate for themselves, and where, and this is > **very** important, the program fixes formatting, pictures, etc. for > them. The sad thing is that we have all this, we just do not see to it > that it works. > > I know there is a subset of the Apertium languages for which one is able > to just click and download. This is fine, for the ones that work with > those. I am also not against fixes that make it possible for anyone with > a working commandline version of any Apertium pair to use it in Omega-T. > On the contrary, that would be great -- for me, as a developer. But that > will be irrelevant to the language community and their translators.
What > they need is the possibility to use a web-based MT input, just like for > the Wikipedia Content Translation. > > During Google Code bids I have always favoured projects geared towards > concrete language works, although I have seen that there always have been > plenty of programmers applying with a lot of interest but less relevant > language knowledge. > > This is their time. Here, I really would like to see some input. The > difference between saying that something __is__ useful and that something > __could be__ useful is simply too big to be ignored. > > Trond
[Apertium-stuff] CG versus tagger was: Re: Lemmatisation was: Re: apertium-swe-dan-0.7.0 released
Hi, Just a thought: couldn't this kind of rule just as well be implemented in the TSX file that's used to train the tagger? In that case, retraining the tagger might do the trick as well. Yours, Per Tunedal On Fri, Mar 4, 2016, at 09:41, Kevin Brubeck Unhammer wrote: > Per Tunedal <per.tune...@operamail.com> wrote: > > > 'ta en blå kon' (=take a blue cone) to Danish. 'kon' might be the > > indefinite form of 'kon' (= cone) or the definite form of 'ko' (= the > > cow). We have: > > > > (kon→ kon/ko) > > > > Translating the whole sentence would give us: > > > > tag en blå kegle / tag en blå koen (= take a blue cone / take a blue the > > cow) > > > > Wouldn't that be quite revealing in many cases? In this case e.g. a > > statistical language model could easily separate the wheat from the > > chaff. > > That example argues against your point – here the source language has > two analyses of "kon", with different ind/def taggings (as it should). > > This is not a lexical selection problem, but a morphological > disambiguation problem. > > It took me all of five minutes to write a CG rule to select indefinite > for nouns after indefinite determiners: > > LIST IndA = (adj ind) (adj comp) ; > SET NotIndA = (*) - IndA ; > REMOVE:en-blå-kon N + Def IF (0 N + Ind) (*-1 Det + Ind CBARRIER NotIndA) > ; > > and a quick corpus diff seems to show it generalises well: > > http://sprunge.us/hhbf?diff > > -- > Kevin Brubeck Unhammer > > GPG: 0x766AC60C > Email had 1 attachment: > + signature.asc > 1k (application/pgp-signature)
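To make the logic of Kevin's rule concrete, here is a toy Python re-implementation of what it checks on "ta en blå kon". It is only an illustration and not how vislcg3 evaluates rules. Sets are reduced to simple tag intersections, CG's protection against removing the last reading is not modelled, and the tag names in the test are illustrative. The scan goes leftwards from the word before the noun. It succeeds at the first cohort with a Det + Ind reading and gives up at a cohort that has no (adj ind) or (adj comp) reading at all, which is the careful barrier on NotIndA.

```python
def reading(lemma, *tags):
    """A reading is a (lemma, frozenset of tags) pair."""
    return (lemma, frozenset(tags))


def has(r, *tags):
    return set(tags) <= r[1]


def is_ind_a(r):
    # LIST IndA = (adj ind) (adj comp) ;
    return has(r, "adj", "ind") or has(r, "adj", "comp")


def remove_def_after_ind_det(sentence):
    """Toy REMOVE N + Def IF (0 N + Ind) (*-1 Det + Ind CBARRIER NotIndA)."""
    for i, cohort in enumerate(sentence):
        targets = [r for r in cohort if has(r, "n", "def")]
        if not targets or not any(has(r, "n", "ind") for r in cohort):
            continue  # no N + Def target, or the (0 N + Ind) condition fails
        for j in range(i - 1, -1, -1):  # *-1: scan leftwards from the previous word
            left = sentence[j]
            if any(has(r, "det", "ind") for r in left):
                sentence[i] = [r for r in cohort if r not in targets]
                break
            if not any(is_ind_a(r) for r in left):
                break  # CBARRIER NotIndA: every reading here is outside IndA
    return sentence
```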
[Apertium-stuff] apertium.org
Hi, Looking at https://www.apertium.org/ I find that I still cannot view the page in Swedish, although I contributed a translation some time ago. Any plans to update to the latest version of apertium-html-tools? BTW, is that page using the new release of the pair swe-dan? Yours, Per Tunedal
Re: [Apertium-stuff] Lemmatisation was: Re: apertium-swe-dan-0.7.0 released
Hi Tino, would it be possible to do this with CG, or would this need to be implemented in some new program? Anyhow, I suppose this involves a lot of work writing clever rules. My rationale for target language smoothing is: a) my bad experience of the performance of the old pair sv-da, with lots of blatant errors made by the tagger; b) target language smoothing is very easy to implement. When doing lemmatisation the tagger isn't used, as far as I can see. This leaves all inherent ambiguities in the Swedish dictionary to be handled somehow. So I just proposed a quick fix that might improve the result significantly in this special case (and maybe in other cases with a lot of ambiguity that is not properly handled). Other means may be more adequate and/or more effective. Yours, Per Tunedal On Fri, Mar 4, 2016, at 08:53, Tino Didriksen wrote: > On 4 March 2016 at 07:52, Per Tunedal > <per.tune...@operamail.com> wrote: >> Yes, of course! That has always seemed a bit unnatural to me. It's >> harder to decide on the right source language lemma before translating >> than doing it after translation. > > I almost entirely disagree, and I've got experience and data to back > it up. Target language smoothing does not help much, if your source > language analysis is good. > > You can disambiguate the source language to nigh-100% if you use more > analysis levels, such as dependency and semantics. This is what we do > at GrammarSoft / VISL. It works. > > Apertium could also do it this way, and it would benefit all languages > built from a specific source. > > -- Tino Didriksen
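What Per calls target language smoothing can be illustrated in its simplest form: translate each remaining source ambiguity, then keep the candidate whose word sequence looks most familiar in the target language. The toy sketch below scores candidates by raw bigram counts from a tiny, made-up three-sentence Danish corpus. A real language model would use smoothed log-probabilities from a large corpus. The function names are hypothetical.

```python
from collections import Counter


def bigram_counts(corpus):
    """Count adjacent word pairs in a list of target-language sentences."""
    counts = Counter()
    for sentence in corpus:
        words = sentence.lower().split()
        counts.update(zip(words, words[1:]))
    return counts


def best_candidate(candidates, counts):
    """Return the candidate whose bigrams were seen most often (ties: first wins)."""
    def score(sentence):
        words = sentence.lower().split()
        return sum(counts[pair] for pair in zip(words, words[1:]))
    return max(candidates, key=score)
```

With the toy corpus in the test, "tag en blå kegle" scores 4 and "tag en blå koen" scores 2. Tino's objection still applies, though: this only helps when the source analysis has left the ambiguity open.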
Re: [Apertium-stuff] Lemmatisation was: Re: apertium-swe-dan-0.7.0 released
Hi again, Obviously, CG would be quite helpful for disambiguation when doing lemmatisation. Would it be complicated to add an option to use CG (if present)? Using the CG rules for the language would probably remove some more ambiguity. Looking at the page http://wiki.apertium.org/wiki/Lemmatisation: what does the command actually do? $ echo "Den här är en test." | apertium -d . swe-tagger | cg-proc guesser.bin | sed 's/<[^>]\+>//g' | cg-proc -n guesser.bin The page says it will give lemmatised output where the tokens are encased in ^ and $, and ambiguous stems/lemmas are given separated by '/'. Yours, Per Tunedal On Fri, Mar 4, 2016, at 09:41, Kevin Brubeck Unhammer wrote: > Per Tunedal <per.tune...@operamail.com> wrote: > > > 'ta en blå kon' (=take a blue cone) to Danish. 'kon' might be the > > indefinite form of 'kon' (= cone) or the definite form of 'ko' (= the > > cow). We have: > > > > (kon→ kon/ko) > > > > Translating the whole sentence would give us: > > > > tag en blå kegle / tag en blå koen (= take a blue cone / take a blue the > > cow) > > > > Wouldn't that be quite revealing in many cases? In this case e.g. a > > statistical language model could easily separate the wheat from the > > chaff. > > That example argues against your point – here the source language has > two analyses of "kon", with different ind/def taggings (as it should). > > This is not a lexical selection problem, but a morphological > disambiguation problem. > > It took me all of five minutes to write a CG rule to select indefinite > for nouns after indefinite determiners: > > LIST IndA = (adj ind) (adj comp) ; > SET NotIndA = (*) - IndA ; > REMOVE:en-blå-kon N + Def IF (0 N + Ind) (*-1 Det + Ind CBARRIER NotIndA) > ; > > and a quick corpus diff seems to show it generalises well: > > http://sprunge.us/hhbf?diff > > -- > Kevin Brubeck Unhammer > > GPG: 0x766AC60C > Email had 1 attachment: > + signature.asc > 1k (application/pgp-signature)
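To work further with output in the format the wiki page describes (tokens in ^...$, ambiguous lemmas separated by "/"), it can be split into one list of candidate lemmas per token. This is a minimal sketch that assumes exactly that format with no escaped characters. The sample string in the test is made up, not real pipeline output.

```python
import re

TOKEN = re.compile(r"\^([^$]*)\$")


def lemma_candidates(output):
    """One list of candidate lemmas per ^...$ token, duplicates removed, order kept."""
    tokens = []
    for m in TOKEN.finditer(output):
        lemmas = []
        for lemma in m.group(1).split("/"):
            if lemma and lemma not in lemmas:
                lemmas.append(lemma)
        tokens.append(lemmas)
    return tokens
```

Tokens with more than one candidate, such as kon/ko, are the ones that still need disambiguation.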
Re: [Apertium-stuff] Backporting of swe-dan to sv-da was: Re: Java Pairs Updated
Hi Mikel, Would you include it in the Java versions if I make a back-port of swe-dan to sv-da? Is it a requirement that the new version be "released"? What would qualify the new version of sv-da as a "release"? Yours, Per Tunedal On Thu, Mar 3, 2016, at 12:39, Kevin Brubeck Unhammer wrote: > Per Tunedal <per.tune...@operamail.com> wrote: > > > Hi, > > is it possible to do a "back-port" of swe-dan to sv-da? The > > back-ported version would benefit from the improved Swedish monodix. > > Unfortunately, the most blatant problem, the untrained tagger, would > > persist, though. > > Do you mean in order to avoid the CG requirement? In that case, you can > just remove it from the pipeline, no further change needed. > > > -Kevin > Email had 1 attachment: > + signature.asc > 1k (application/pgp-signature)
Re: [Apertium-stuff] Lemmatisation was: Re: apertium-swe-dan-0.7.0 released
Hi Keld, Yes, I remember taking a look at your suggested algorithm. Did you ever try it out? I don't remember whether you intended to use it on the source language before the tagger (a possible alternative or addition to CG) or for lexical selection in the target language (a possible alternative to Francis's lexical selection module). Yours, Per Tunedal On Fri, Mar 4, 2016, at 16:19, k...@keldix.com wrote: > On Fri, Mar 04, 2016 at 02:10:50PM +0100, Per Tunedal wrote: > > Hi Kevin, > > > > Back to Lemmatisation: > > What's the easiest way to do a disambiguation, rather than get a list of > > possible lemmas? > > I am not sure it is the easiest way, but I have previously suggested that > we > use wordnet data, which is freely available for Danish, to find out which > of the > lemmas that has the shortest distance to other lemmas in the surrounding > text > for the given homonym. > > Best regards > Keld
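Keld's idea can be sketched as a graph search: treat the wordnet as a graph of lemmas, measure the shortest path from each candidate lemma to the lemmas in the surrounding text, and keep the closest candidate. The Python below is a toy version. It uses breadth-first search on an unweighted graph, and the graph in the test is made up rather than taken from any real wordnet. Real wordnets link synsets with typed relations, so the way edges are built here is a simplifying assumption.

```python
from collections import deque

UNREACHABLE = 10**6  # penalty when two lemmas are not connected at all


def build_graph(edges):
    """Undirected adjacency sets from (lemma, lemma) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def distance(graph, start, goal):
    """Breadth-first search: number of links from start to goal, or None."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, d = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == goal:
                return d + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None


def pick_lemma(candidates, context, graph):
    """Choose the candidate with the smallest summed distance to the context lemmas."""
    def cost(lemma):
        total = 0
        for other in context:
            d = distance(graph, lemma, other)
            total += UNREACHABLE if d is None else d
        return total
    return min(candidates, key=cost)
```

In the toy graph, "kon" next to "bonde" and "mjölk" resolves to "ko", while "kon" next to "cylinder" stays "kon".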
Re: [Apertium-stuff] Lemmatisation was: Re: apertium-swe-dan-0.7.0 released
Hi Francis, I didn't think so: Ta en kon! --> Tage en koen! I will test some of the other problems later on. Yours, Per Tunedal On Fri, Mar 4, 2016, at 14:14, Francis Tyers wrote: > On 2016-03-04 14:10, Per Tunedal wrote: > > Hi Kevin, > > Yes, this could definitely be fixed before the translation as it's > > evident looking at the grammatical construction of the sentence. And of > > course it's much better to fix it before translation than after. > > > > My point was that translation adds more information, this makes it > > possible to quite easily fix ambiguity that has not been sorted out > > before translation. Even simple solutions like a language model might > > help. > > > > And Apertium sv-da has a lot of problems of this kind - I don't know how > > much > > training of the tagger would have helped. Anyhow, now we've got a brand > > new release of Apertium swe-dan with CG. Maybe some of these problems > > are solved by now. Unfortunately I've not been able to test as the two > > of my boxes running Apertium are bound for the city dump. I hope to see > > Apertium swe-dan soon at Apertium.org or maybe I'll find some time to > > install Apertium at some other box. The Java versions cannot use CG. > > > > It's already on apertium.org. > > F.
Re: [Apertium-stuff] Lemmatisation was: Re: apertium-swe-dan-0.7.0 released
Hi Kevin, Yes, this could definitely be fixed before the translation, as it's evident from the grammatical construction of the sentence. And of course it's much better to fix it before translation than after. My point was that translation adds more information, which makes it quite easy to fix ambiguities that have not been sorted out before translation. Even simple solutions like a language model might help. And Apertium sv-da has a lot of problems of this kind; I don't know how much training the tagger would have helped. Anyhow, now we've got a brand new release of Apertium swe-dan with CG. Maybe some of these problems are solved by now. Unfortunately I've not been able to test, as the two of my boxes running Apertium are bound for the city dump. I hope to see Apertium swe-dan soon at Apertium.org, or maybe I'll find some time to install Apertium on some other box. The Java versions cannot use CG. Back to Lemmatisation: What's the easiest way to do a disambiguation, rather than get a list of possible lemmas? Yours, Per Tunedal On Fri, Mar 4, 2016, at 09:41, Kevin Brubeck Unhammer wrote: > Per Tunedal <per.tune...@operamail.com> wrote: > > > 'ta en blå kon' (=take a blue cone) to Danish. 'kon' might be the indefinite form of 'kon' (= cone) or the definite form of 'ko' (= the cow). We have: > > > > (kon → kon/ko) > > > > Translating the whole sentence would give us: > > > > tag en blå kegle / tag en blå koen (= take a blue cone / take a blue the cow) > > > > Wouldn't that be quite revealing in many cases? In this case e.g. a statistical language model could easily separate the wheat from the chaff. > > That example argues against your point – here the source language has two analyses of "kon", with different ind/def taggings (as it should). > > This is not a lexical selection problem, but a morphological disambiguation problem. 
> > It took me all of five minutes to write a CG rule to select indefinite for nouns after indefinite determiners: > > LIST IndA = (adj ind) (adj comp) ; > SET NotIndA = (*) - IndA ; > REMOVE:en-blå-kon N + Def IF (0 N + Ind) (*-1 Det + Ind CBARRIER NotIndA) ; > > and a quick corpus diff seems to show it generalises well: > > http://sprunge.us/hhbf?diff > > -- > Kevin Brubeck Unhammer > > GPG: 0x766AC60C
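For readers who don't read CG, the logic of Kevin's rule can be approximated in Python. Remove the definite-noun reading of a word that also has an indefinite-noun reading, but only if scanning leftwards reaches an indefinite determiner before any word that has no indefinite or comparative adjective reading. This is a rough sketch, not vislcg3 semantics. It assumes plain tags (`n`, `def`, `ind`, `det`, `adj`, `comp`) stand in for the grammar's `N`, `Def`, `Ind` and `Det` sets. It evaluates every context on the unmodified input and models `CBARRIER NotIndA` as "stop at a word none of whose readings is in IndA".

```python
# A cohort is the list of readings for one word; a reading is (lemma, set_of_tags).

def has(reading, *tags):
    """True if the reading carries all of the given tags."""
    return set(tags) <= reading[1]

def in_ind_a(reading):
    # LIST IndA = (adj ind) (adj comp) ;
    return has(reading, "adj", "ind") or has(reading, "adj", "comp")

def det_ind_to_left(sentence, i):
    # (*-1 Det + Ind CBARRIER NotIndA): scan leftwards starting at the previous word.
    for cohort in reversed(sentence[:i]):
        if any(has(r, "det", "ind") for r in cohort):
            return True
        # Careful barrier: stop only if *every* reading is outside IndA.
        if not any(in_ind_a(r) for r in cohort):
            return False
    return False

def remove_def_after_ind_det(sentence):
    """Apply the REMOVE rule to every cohort and return a new sentence."""
    result = []
    for i, cohort in enumerate(sentence):
        if any(has(r, "n", "ind") for r in cohort) and det_ind_to_left(sentence, i):
            kept = [r for r in cohort if not has(r, "n", "def")]
            # CG never removes the last remaining reading of a cohort.
            if kept:
                cohort = kept
        result.append(cohort)
    return result

sentence = [
    [("ta", {"vblex", "imp"})],
    [("en", {"det", "ind", "ut", "sg"})],
    [("blå", {"adj", "pst", "ind"}), ("blå", {"adj", "pst", "def"})],
    [("kon", {"n", "ut", "sg", "ind"}), ("ko", {"n", "ut", "sg", "def"})],
]
print([[lemma for lemma, _ in cohort] for cohort in remove_def_after_ind_det(sentence)])
```

In "ta en blå kon", the "ko" (the cow) reading is removed because "blå" has an indefinite adjective reading and so does not block the scan back to "en". Without the determiner, both readings survive.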
Re: [Apertium-stuff] Lemmatisation was: Re: apertium-swe-dan-0.7.0 released
Hi, Yes, of course! That has always seemed a bit unnatural to me. It's harder to decide on the right source-language lemma before translating than after translation. The ambiguity in the source language is in most cases not present in the target language. After translation you will have an indication of whether the translation "makes sense" or not; this could be quite useful when choosing between two different translations caused by ambiguity in the source language. But Apertium doesn't work that way: you just bet on one of the possible source lemmas before translation. And yes, the ambiguity about how to translate that one and only source lemma is a more limited task. Can Apertium be manipulated somehow to translate all possible source lemmas into translation hypotheses for the whole sentence (instead of choosing just one source lemma)? The lemmatisation seems to do half of the job: displaying all possible lemmas, separated by '|'. I would like to go one step further and translate all possible variants (analyses) of a sentence. An example: Apertium sv-da has some trouble translating the sentence 'ta en blå kon' (= take a blue cone) to Danish. 'kon' might be the indefinite form of 'kon' (= cone) or the definite form of 'ko' (= the cow). We have: (kon → kon/ko) Translating the whole sentence would give us: tag en blå kegle / tag en blå koen (= take a blue cone / take a blue the cow) Wouldn't that be quite revealing in many cases? In this case, e.g., a statistical language model could easily separate the wheat from the chaff. BTW Apertium sv-da bets on the second option. Yours, Per Tunedal On Thu, Mar 3, 2016, at 21:53, Kevin Brubeck Unhammer wrote: > Per Tunedal <per.tune...@operamail.com> wrote: > > > If the constraint-based lexical selection module is used for a pair, I cannot see why it couldn't be used. The rules are already in place. 
All you have to do is to translate the ambiguous sentences and let the module select the best translation. > > > > The tricky bit would be to use this information backwards to choose the right lemma in the original language. I'm not savvy enough to figure out how to do it. > > The right source language lemma is already selected by the time lexical selection runs. Lexical selection is about selecting the right *target* language lemma. > > -- > Kevin Brubeck Unhammer > > GPG: 0x766AC60C
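Per's proposal (translate every analysis, then let a language model choose) could look roughly like this. Build every combination of per-word target candidates and score each full hypothesis with a target-language model. This is a minimal sketch, not an Apertium feature. It assumes an add-one-smoothed bigram model trained on a tiny made-up Danish `corpus`. The number of hypotheses also grows multiplicatively with the number of ambiguous words, so a real system would need pruning.

```python
import math
from collections import Counter
from itertools import product

def train_bigrams(corpus):
    """Count bigrams and their histories, with sentence boundary markers."""
    histories, bigrams = Counter(), Counter()
    for sentence in corpus:
        words = ["<s>"] + sentence.split() + ["</s>"]
        histories.update(words[:-1])
        bigrams.update(zip(words, words[1:]))
    vocab_size = len({w for s in corpus for w in s.split()}) + 1  # +1 for </s>
    return histories, bigrams, vocab_size

def log_prob(words, model):
    """Add-one smoothed bigram log-probability of a tokenised sentence."""
    histories, bigrams, v = model
    padded = ["<s>"] + words + ["</s>"]
    return sum(math.log((bigrams[(a, b)] + 1) / (histories[a] + v))
               for a, b in zip(padded, padded[1:]))

def best_translation(options, model):
    """options: one list of candidate target words per source word."""
    hypotheses = [list(h) for h in product(*options)]
    return max(hypotheses, key=lambda h: log_prob(h, model))

# Hypothetical miniature target-language corpus.
corpus = ["tag en blå kegle", "en rød kegle", "koen står på marken"]
model = train_bigrams(corpus)
options = [["tag"], ["en"], ["blå"], ["kegle", "koen"]]
print(" ".join(best_translation(options, model)))
```

Because "blå kegle" occurs in the toy corpus and "blå koen" does not, the cone reading wins here. The quality of the choice depends entirely on the language model.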
Re: [Apertium-stuff] Lemmatisation was: Re: apertium-swe-dan-0.7.0 released
Hi Francis, well, maybe. Anyhow, lemmatisation is useful for many applications. One is frequency lists, e.g. as a way to choose which new words to add to a language pair. It might be worth some effort to figure out an easy way to do disambiguation, or rather lexical selection. Intuitively, I would like to use a translation to distinguish between different meanings. A language model might do the trick, but maybe an Apertium translation wouldn't be fluent enough to stand the test. Maybe that's why the lextor module proved ineffective for lexical selection. If the constraint-based lexical selection module is used for a pair, I cannot see why it couldn't be used. The rules are already in place. All you have to do is to translate the ambiguous sentences and let the module select the best translation. The tricky bit would be to use this information backwards to choose the right lemma in the original language. I'm not savvy enough to figure out how to do it. Yours, Per Tunedal On Wed, Mar 2, 2016, at 09:11, Francis Tyers wrote: > On 2016-03-02 09:09, Per Tunedal wrote: > > Hi again, > > > > On Wed, Mar 2, 2016, at 08:34, Francis Tyers wrote: > >> On 2016-03-02 08:25, Per Tunedal wrote: > >> > Hi Francis, > >> > lemmatisation would be interesting to try, but what about disambiguation? > >> > > >> > "ambiguous stems/lemmas are given separated by '/' " > >> > > >> > Can this be improved by your new lexical selection module somehow? It would be better to choose the most probable lemma than simply the first. > >> > >> No, it couldn't. > > > > Any other way to do lexical selection that might work? > > > > I wouldn't bother, I would let it be ambiguous and then fix it in a post-processing step. > > F.
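The frequency-list use mentioned above could look roughly like this. Read Apertium analyser output, count lemmas, and separately count unknown words, which are the natural candidates to add to the dictionaries. When a word has several candidate lemmas, the count is split evenly between them. This is a sketch under stated assumptions. It expects the `^surface/analysis1/analysis2$` stream format with `*` marking unknown words, and it ignores escaped characters and multiword units. The even split is one arbitrary choice; taking tagger output instead would be another.

```python
import re
from collections import Counter

LEXICAL_UNIT = re.compile(r"\^(.*?)\$")  # ^surface/reading1/reading2$

def frequency_lists(stream):
    """Return (lemma counts, unknown-word counts) from analyser output."""
    lemmas, unknown = Counter(), Counter()
    for unit in LEXICAL_UNIT.findall(stream):
        surface, *readings = unit.split("/")
        if readings and readings[0].startswith("*"):
            unknown[surface] += 1  # not in the dictionary: candidate to add
            continue
        # A lemma is everything before the first tag, e.g. "ko" in "ko<n><ut>".
        candidates = sorted({r.split("<", 1)[0] for r in readings})
        for lemma in candidates:
            lemmas[lemma] += 1 / len(candidates)
    return lemmas, unknown

stream = ("^ta/ta<vblex><imp>$ ^en/en<det><ind><ut><sg>$ "
          "^kon/kon<n><ut><sg><ind>/ko<n><ut><sg><def>$ "
          "^afrikanskare/*afrikanskare$")
lemmas, unknown = frequency_lists(stream)
print(lemmas.most_common(), unknown.most_common())
```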
Re: [Apertium-stuff] Java Pairs Updated
Hi, Now there's a brand new swe-dan release! Any chance you could include it? Yours, Per Tunedal On Tue, Feb 23, 2016, at 09:27, Tino Didriksen wrote: > I have updated all the pairs I could in https://svn.code.sf.net/p/apertium/svn/builds with these omissions: > > --snip-- > > apertium-sv-da has since been renamed to swe-dan, but there is no release with that name yet. > > --snip-- > -- Tino Didriksen
Re: [Apertium-stuff] Lemmatisation was: Re: apertium-swe-dan-0.7.0 released
Hi again, On Wed, Mar 2, 2016, at 08:34, Francis Tyers wrote: > On 2016-03-02 08:25, Per Tunedal wrote: > > Hi Francis, > > lemmatisation would be interesting to try, but what about disambiguation? > > > > "ambiguous stems/lemmas are given separated by '/' " > > > > Can this be improved by your new lexical selection module somehow? It would be better to choose the most probable lemma than simply the first. > > No, it couldn't. Any other way to do lexical selection that might work? > > > And OOV words (not found in the dictionary, but present in the corpus)? How to handle them? Can the lemmas be guessed? I suppose some statistical model might do the trick. > > Those are guessed, read the page ;) Oops! I've now looked at the Guesser section in the Swedish monodix and got an idea of the process. --snip-- Yours, Per Tunedal
[Apertium-stuff] Lemmatisation was: Re: apertium-swe-dan-0.7.0 released
Hi Francis, lemmatisation would be interesting to try, but what about disambiguation? "ambiguous stems/lemmas are given separated by '/' " Can this be improved by your new lexical selection module somehow? It would be better to choose the most probable lemma than simply the first. And OOV words (not found in the dictionary, but present in the corpus)? How to handle them? Can the lemmas be guessed? I suppose some statistical model might do the trick. Or maybe the dictionary can be used in some inventive way? It contains a lot of paradigms, but unfortunately nothing about how common they are. What about sorting them according to frequency in a reference corpus? Or adding the frequency as a tag in the paradigms? (That might be useful anyway, e.g. when adding words to the monodix: a GUI could propose the most likely paradigms at the top of a drop-down list. That might minimise the risk of choosing a rare and probably wrong paradigm.) Yours, Per Tunedal On Tue, Mar 1, 2016, at 23:27, Francis Tyers wrote: --snip-- > If you'd like to share any of your probabilistic lexicons for Swedish--Norwegian or Swedish--Danish we'd be interested in looking at them. > > If you have experience in SMT, the word alignments for Europarl for Swedish--Danish could be pretty useful! Especially if you use the lemmatisation step described here: > > http://wiki.apertium.org/wiki/Lemmatisation > > Fran
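The paradigm-ranking idea can be approximated without changing the dictionary format. For every paradigm of a given part of speech, sum the reference-corpus frequencies of the lemmas that already use it, then offer the paradigms in descending order. This is a minimal sketch under two assumptions: paradigm names end in `__<pos>` as in Apertium monodixes, and `lemma_freq` comes from some reference corpus. The sample dictionary and counts below are made up.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def rank_paradigms(monodix_xml, lemma_freq, pos):
    """Order paradigms for one part of speech by corpus frequency of their lemmas."""
    root = ET.fromstring(monodix_xml)
    score = Counter()
    for section in root.iter("section"):  # dictionary entries, not <pardefs>
        for entry in section.iter("e"):
            for par in entry.iter("par"):
                name = par.get("n", "")
                if name.endswith("__" + pos):
                    score[name] += lemma_freq.get(entry.get("lm"), 0)
    return [name for name, _ in score.most_common()]

MONODIX = """<dictionary><section id="main" type="standard">
  <e lm="afrikansk"><i>afrikansk</i><par n="afrikansk__adj"/></e>
  <e lm="svensk"><i>svensk</i><par n="afrikansk__adj"/></e>
  <e lm="stor"><i>st</i><par n="st/or__adj"/></e>
</section></dictionary>"""
print(rank_paradigms(MONODIX, {"afrikansk": 3, "svensk": 10, "stor": 50}, "adj"))
```

A GUI could show this list first in its drop-down. Paradigms used by no frequent lemma would sink to the bottom rather than disappear.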
Re: [Apertium-stuff] apertium-swe-dan-0.7.0 released
Hi all of you, Congratulations on the new release! I haven't tested it yet, but I'm very curious. It will be very interesting to see whether at least the most blatant errors are finally gone. I'm sorry to hear that I contributed to Lars's bad experiences in the past :-( I spent several months struggling with adding words and trying to correct errors I found. Unfortunately, I didn't manage to solve all of the tricky issues I found and was very disappointed when the translation quality stayed very low. Later I found thousands more errors in the Swedish dictionary by expanding the words and running a spell check on the expanded list. I fixed just a few of them, but didn't find any solution to some of the problems. I simply abandoned the tedious task of correcting the errors; there were too many. One big issue that I didn't solve was that the tagger had never been trained, so it made some very blatant errors. Further, some of the new vocabulary wouldn't show up in translations. Is the tagger trained now? Another problem was lexical selection. I'm eager to see how this is handled now. My intention was to create the pair Norwegian-Swedish, but I was advised to start with Swedish-Danish. My bad experiences with that pair made me abandon my original intention and explore statistical translation instead, although I had already learnt some Norwegian. Anyhow, congratulations on the new release! It was probably a brilliant idea to simply make a new Swedish dictionary and thus get rid of all the old errors. I do hope the translation quality has finally improved! Yours, Per Tunedal PS My Danish isn't any good; I've asked Danes several times to review my additions to the Danish dictionary. Now is the time, if not done yet. Maybe an expansion of those words would reveal quite a few errors to a native Dane.
On Tue, Mar 1, 2016, at 19:46, Francis Tyers wrote: > On 2016-03-01 16:10, Lars Aronsson wrote: > > On 03/01/2016 02:03 PM, Kevin Brubeck Unhammer wrote: > >> The bidix has a lot of additions by Per Tunedal from earlier (and I think I saw your name in there as well?), although additions to *monodix* are mostly encompassed by the new SALDO lexicon. There was a lot of inconsistency to work out due to the whole lexicon change, although the swe dictionary is probably a lot more correct now, considering the original one was initially created by Danes :) (and of course it's much bigger). > > > > I started to contribute, but failed and left. I don't remember the details, but I think it was something like this: When one grammatical rule covers the inflected forms of many words, and a handful of them need an exception to that rule, Per had a tendency to rewrite the rule for those few words without considering the many others. It was too easy to modify the rule, and there were no regression tests for the other words or phrases that used that rule. It seemed impossible to me to guarantee any quality or correctness. > > That sounds like a pretty bad experience. But now, with the SALDO-based dictionary it should be much better. In essence you should only need to add words to the bilingual dictionary :) > > Fran
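The check Per describes (expand the dictionary, then spell-check every generated form) can be sketched as below. It assumes the input is `lt-expand` output, one `surface:analysis` pair per line, where `:>:` and `:<:` mark entries restricted to one direction. A plain Python set of known-good word forms stands in for a real spell checker such as Hunspell, and the sample lines are invented.

```python
import re

# "surface:analysis", or "surface:>:analysis" / "surface:<:analysis"
# for entries restricted to one direction.
EXPANDED_LINE = re.compile(r"^(.*?):(?:[<>]:)?(.*)$")

def suspicious_forms(expanded_lines, known_words):
    """Return (surface, analysis) pairs whose surface form is not a known word."""
    suspects = []
    for line in expanded_lines:
        match = EXPANDED_LINE.match(line.strip())
        if not match:
            continue
        surface, analysis = match.groups()
        if surface.lower() not in known_words:
            suspects.append((surface, analysis))
    return suspects

lines = [
    "afrikansk:afrikansk<adj><pst><ut><sg><ind>",
    "afrikanskare:>:afrikansk<adj><comp><un><sp>",
    "afrikanskst:>:afrikansk<adj><sup><un><sp><ind>",
]
print(suspicious_forms(lines, {"afrikansk", "afrikanskare", "afrikanskast"}))
```

Each flagged pair points back to the analysis, and therefore to the paradigm, that generated the suspicious form. That is what makes the expanded list useful for fixing the dictionary rather than just the word.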
Re: [Apertium-stuff] Apertium-html-tools: sort localisation languages
Done. On Fri, Nov 27, 2015, at 16:48, Sushain Cherivirala wrote: > Please file an issue on the GitHub repository (copy your email) and we can continue the discussion there and preserve it for the future. Thanks! > > On Fri, Nov 27, 2015, 9:41 AM Per Tunedal <per.tune...@operamail.com> wrote: >> Hi, >> I noticed that the list of localisation languages in locales.json is used "as is": it would be more user-friendly if it were sorted. >> >> I suppose this would be a minor change in localization.js, but unfortunately I've never learnt JavaScript. I got confused looking at the code. I'd better not touch it. >> >> Yours, >> Per Tunedal
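As a stopgap that needs no JavaScript, the file itself could be kept sorted. This sketch assumes `locales.json` is a flat JSON object mapping language codes to each language's own name; check the actual file before relying on that. It orders entries by name using Python's `casefold()`, which is not locale-aware collation, so names in non-Latin scripts end up after the Latin ones. Sorting at display time in localization.js would be the more robust fix.

```python
import json

def sort_locales(text):
    """Return locales.json content with entries ordered by display name."""
    locales = json.loads(text)
    ordered = dict(sorted(locales.items(), key=lambda item: item[1].casefold()))
    return json.dumps(ordered, ensure_ascii=False, indent=4)

sample = '{"swe": "Svenska", "eng": "English", "dan": "Dansk", "nno": "Norsk nynorsk"}'
print(sort_locales(sample))
```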
Re: [Apertium-stuff] Some silly mistake translating Apertium-html-tools
Hi, That's true regarding installation, but the procedure for contribution is new. Yours, Per Tunedal Francis Tyers wrote: >It shouldn't make a difference as the GitHub and SVN versions should be synched. > >Fran > >On 2015-11-26 08:19, Per Tunedal wrote: >> Hi again, >> >> Finally I've sent a pull request (Havet). I had some trouble due to a bug in GitHub Desktop, now fixed. >> >> Would you please update the Apertium wiki page on Apertium-html-tools? The information on installation and contribution seems outdated due to the move to GitHub: >> >> http://wiki.apertium.org/wiki/Apertium-html-tools >> >> Yours, >> >> Per Tunedal >> >> --snip--
Re: [Apertium-stuff] Some silly mistake translating Apertium-html-tools
Hi again, Finally I've sent a pull request (Havet). I had some trouble due to a bug in GitHub Desktop, now fixed. Would you please update the Apertium wiki page on Apertium-html-tools? The information on installation and contribution seems outdated due to the move to GitHub: http://wiki.apertium.org/wiki/Apertium-html-tools Yours, Per Tunedal On Mon, Nov 23, 2015, at 17:03, Sushain Cherivirala wrote: > Development of apertium-html-tools is proceeding on GitHub[1], please either send a pull request or email us the file for manual inclusion. > > Thanks for your help! > > -- > Sushain K. Cherivirala www.skc.name > > On Mon, Nov 23, 2015 at 4:51 AM, Per Tunedal <per.tune...@operamail.com> wrote: >> --snip-- Links: 1. https://github.com/goavki/apertium-html-tools
Re: [Apertium-stuff] Some silly mistake translating Apertium-html-tools
Hi, Excellent. I've added the hint to the wiki. Now I have trouble submitting the locales.json file: svn: Server sent unexpected return value (423 Locked) in response to PUT request for '/p/apertium/svn/!svn/wrk/eac98752-ece9-4fe0-ac28-360ac3138eea/trunk/apertium-tools/apertium-html-tools/assets/strings/locales.json' Is someone else updating it? I have tried several times, though. Yours, Per Tunedal On Sun, Nov 22, 2015, at 22:11, Tino Didriksen wrote: > Line 49 "authors": [Per Tunedal], is not valid JSON. It should be "authors": ["Per Tunedal"], > > If in doubt, check your JSON with http://jsonlint.com/ or similar. > > -- TD > > On 22 November 2015 at 22:01, Per Tunedal <per.tune...@operamail.com> wrote: >> I've committed the file all the same. I suppose I've made some silly mistake. Any clue?
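Tino's advice can also be followed offline. Python's standard `json` module reports the line and column of the first syntax error, which is enough to find a mistake like the unquoted name above. This is a minimal sketch; the sample text is made up to mirror the broken `"authors"` line.

```python
import json

def first_json_error(text):
    """Return (line, column, message) for the first syntax error, or None if valid."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return err.lineno, err.colno, err.msg
    return None

broken = '{\n    "authors": [Per Tunedal]\n}'
fixed = '{\n    "authors": ["Per Tunedal"]\n}'
print(first_json_error(broken))
print(first_json_error(fixed))
```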
[Apertium-stuff] Some silly mistake translating Apertium-html-tools
Hi, I just tried translating Apertium-html-tools to Swedish, but I get an exception when running ./localisation-tools.py all swe:

Traceback (most recent call last):
  File "/usr/lib/python3.1/json/decoder.py", line 355, in raw_decode
    obj, end = self.scan_once(s, idx)
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./localisation-tools.py", line 73, in <module>
    strings = OrderedDict(filter(lambda x: x[0] in canonicalStrings.keys(), loadJSON(f).items()))
  File "./localisation-tools.py", line 15, in loadJSON
    return json.loads(f.read(), object_pairs_hook=OrderedDict)
  File "/usr/lib/python3.1/json/__init__.py", line 318, in loads
    return cls(**kw).decode(s)
  File "/usr/lib/python3.1/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.1/json/decoder.py", line 357, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

I've committed the file all the same. I suppose I've made some silly mistake. Any clue? Yours, Per Tunedal
Re: [Apertium-stuff] No package apertium-swe found
Hi Ilnar, Thank you! It works like a charm. Yours, Per Tunedal On Tue, Mar 10, 2015, at 16:32, Ilnar Salimzyan wrote: > Hello, > > 2015-03-10 15:58 GMT+01:00 Per Tunedal per.tune...@operamail.com: >> Hi, got into trouble when reinstalling. ./autogen.sh in the swe-dan directory gives the following error message: No package 'apertium-swe' found > > You need to tell autogen.sh where the apertium-swe package resides, e.g.: ./autogen.sh --with-lang1=../../languages/apertium-swe > > And, possibly, the same for the apertium-dan package: ./autogen.sh --with-lang1=../../languages/apertium-swe --with-lang2=../../languages/apertium-dan > > Best, Ilnar >> although the language package is downloaded from SVN and I've run ./autogen.sh and make in the folder. (make returns: "Inget behöver göras för All." [Nothing to be done for 'all'.]) >> Yours, Per Tunedal
Re: [Apertium-stuff] Compare adjectives in Swedish
Hi Francis, well, yes, it's in both the monodix and the bidix. There must be some trivial error I've overlooked: 1. monodix: !-- PT: usually inflected with mer, mest, but other words with this paradigm are compared normally: two paradigms? -- pardef n=afrikansk__adj e pl/l rs n=adj/s n=pst/s n=ut/s n=sg/s n=ind//r/p/e e plt/l rs n=adj/s n=pst/s n=nt/s n=sg/s n=ind//r/p/e e ple/l rs n=adj/s n=pst/s n=m/s n=sg/s n=def//r/p/e e pla/l rs n=adj/s n=pst/s n=un/s n=pl/s n=ind//r/p/e e pla/l rs n=adj/s n=pst/s n=un/s n=sp/s n=def//r/p/e e r=LR a=PT plare/l rers n=adj/s n=comp/s n=un/s n=sp//r/p/e e r=LR a=PT plast/l rers n=adj/s n=sup/s n=un/s n=sp/s n=ind//r/p/e e r=LR a=PT plaste/l rers n=adj/s n=sup/s n=un/s n=sp/s n=def//r/p/e /pardef e lm=afrikansk iafrikansk/ipar n=afrikansk__adj//e 2. bidix: pardef n=afrikansk_amerikansk__adj e pls n=pst/s n=un/s n=sp/s n=def//lrs n=pst/s n=un/s n=sg/s n=def//r/p/e e pls n=pst/s n=un/s n=pl/s n=ind//lrs n=pst/s n=un/s n=pl/s n=ind//r/p/e e r=LRpls n=pst/s n=ut/s n=sg/s n=ind//lrs n=pst/s n=un/s n=sg/s n=ind//r/p/e e r=LRpls n=pst/s n=nt/s n=sg/s n=ind//lrs n=pst/s n=un/s n=sg/s n=ind//r/p/e e r=LRpls n=pst/s n=m/s n=sg/s n=def//lrs n=pst/s n=un/s n=sg/s n=def//r/p/e e r=RLpls n=pst/s n=GD//lrs n=pst/s n=un/s n=sg/s n=ind//r/p/e e r=LR a=PTpls n=comp/s n=un/s n=sp//lrs n=unsint/s n=comp/s n=un/s n=ND//r/p/e e r=LR a=PTpls n=sup/s n=un/s n=sp//lrs n=unsint/s n=sup/s n=un/s n=ND//r/p/e e r=RL a=PTpls n=unsint/s n=comp/s n=un/s n=sp//lrs n=unsint/s n=comp/s n=un/s n=ND//r/p/e e r=RL a=PTpls n=unsint/s n=sup/s n=un/s n=sp//lrs n=unsint/s n=sup/s n=un/s n=ND//r/p/e /pardef rafrikansks n=adj//r/ppar n=afrikansk_amerikansk__adj//e I don't know whether the last two lines in the bidix are necessary; I added them, just in case, when it didn't work without them. But adding them didn't help.
Yours, Per Tunedal On Wed, Mar 11, 2015, at 10:14, Francis Tyers wrote: On 2015-03-11 07:51, Per Tunedal wrote: Hi, it works slightly differently in swe-dan, as the grades are included in each paradigm (adjgrad doesn't exist). I looked at the diskret_amerikansk__adj paradigm and tried to add the following lines in afrikansk_amerikansk__adj: e r=LR a=PTpls n=comp/s n=un/s n=sp//lrs n=unsint/s n=comp/s n=un/s n=ND//r/p/e e r=LR a=PTpls n=sup/s n=un/s n=sp//lrs n=unsint/s n=sup/s n=un/s n=ND//r/p/e Unfortunately this doesn't work: echo afrikanskare | apertium -d . swe-dan *afrikanskare The * means unknown word. Is it in the bilingual dictionary and in the morphological analyser? F.
Re: [Apertium-stuff] Compare adjectives in Swedish
Hi, it works slightly differently in swe-dan, as the grades are included in each paradigm (adjgrad doesn't exist). I looked at the diskret_amerikansk__adj paradigm and tried to add the following lines in afrikansk_amerikansk__adj: e r=LR a=PTpls n=comp/s n=un/s n=sp//lrs n=unsint/s n=comp/s n=un/s n=ND//r/p/e e r=LR a=PTpls n=sup/s n=un/s n=sp//lrs n=unsint/s n=sup/s n=un/s n=ND//r/p/e Unfortunately this doesn't work: echo afrikanskare | apertium -d . swe-dan *afrikanskare Yours, Per Tunedal On Mon, Mar 9, 2015, at 09:00, Kevin Brubeck Unhammer wrote: Per Tunedal per.tune...@operamail.com writes: Hi Francis, Excellent! What should I do to get it to work in the opposite direction? I would like to keep the analysis of incorrectly inflected adjectives, like samhälleligare, but mer samhällelig should be generated. You could mark samhälleligare as LR. The nno-nob bidix uses pardefs for the various possibilities, like adj.sint to adj (or adj to adj.sint). If you allow a synthetic LR form in an otherwise analytic monodix pardef, the bidix pardefs could deal with that.
If we use the bidix pardefs from nno-nob as a basis, it'd probably look something like this: pardef n=adjgrad c=Used by adj pardefs e pls n=comp//l rs n=comp//r/p/e e pls n=pst//l rs n=pst//r/p/e e pls n=sup//l rs n=sup//r/p/e /pardef pardef n=adj c=Analytic on both sides e pls n=adj//l rs n=adj//r/ppar n=adjgrad//e e r=LRpls n=adj/s n=sint//l rs n=adj//r/ppar n=adjgrad//e e r=RLpls n=adj//l rs n=adj/s n=sint//r/ppar n=adjgrad//e /pardef pardef n=adj_sint c=Synthetic on both sides e pls n=adj/s n=sint//l rs n=adj/s n=sint//r/ppar n=adjgrad//e /pardef pardef n=adj_sint:adj c=Synthetic left, analytic right e pls n=adj/s n=sint//l rs n=adj//r/ppar n=adjgrad//e e r=RLpls n=adj/s n=sint//l rs n=adj/s n=sint//r/ppar n=adjgrad//e /pardef pardef n=adj:adj_sint c=Analytic left, synthetic right e pls n=adj//l rs n=adj/s n=sint//r/ppar n=adjgrad//e e r=LRpls n=adj/s n=sint//l rs n=adj/s n=sint//r/ppar n=adjgrad//e /pardef The analytic adj pardef translates adj to adj, but if it sees an adj.sint (samhälleligare), it translates it into an analytic adj. The other pardefs work similarly. The adjgrad pardef is there to make sure we don't have two pardefs matching the same input. - I don't think it makes sense to correct in the other direction; that'd just lead to overcorrection (try searching the web for e.g. "mer vacker" ("more beautiful"); most of the hits seem to be correct, like "lite mer vacker höstskräck" ("a little more beautiful autumn horror") or "aldrig mer väcker" ("never again wakes")). -- Kevin Brubeck Unhammer GPG: 0x766AC60C
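To make the behaviour of Kevin's four bidix pardefs concrete, here is a minimal Python sketch. It assumes each bidix entry can be reduced to (left tags, right tags, restriction), with `r="LR"`/`r="RL"` enabling an entry only in that translation direction. It matches tag sequences exactly and ignores the adjgrad continuation and lttoolbox's real prefix matching, so it illustrates the idea rather than how lt-proc works internally.

```python
from typing import Optional

# One bidix entry: (left-side tags, right-side tags, restriction).
# restriction None = both directions, "LR" = left-to-right only, "RL" = right-to-left only.
Entry = tuple[tuple[str, ...], tuple[str, ...], Optional[str]]

PARDEFS: dict[str, list[Entry]] = {
    "adj": [  # analytic on both sides
        (("adj",), ("adj",), None),
        (("adj", "sint"), ("adj",), "LR"),
        (("adj",), ("adj", "sint"), "RL"),
    ],
    "adj_sint": [  # synthetic on both sides
        (("adj", "sint"), ("adj", "sint"), None),
    ],
    "adj_sint:adj": [  # synthetic left, analytic right
        (("adj", "sint"), ("adj",), None),
        (("adj", "sint"), ("adj", "sint"), "RL"),
    ],
    "adj:adj_sint": [  # analytic left, synthetic right
        (("adj",), ("adj", "sint"), None),
        (("adj", "sint"), ("adj", "sint"), "LR"),
    ],
}


def lookup(pardef: str, tags: tuple[str, ...], direction: str) -> Optional[tuple[str, ...]]:
    """Return the target-side tags for `tags`, or None if no entry matches.

    direction is "LR" (left language -> right language) or "RL".
    """
    for left, right, restriction in PARDEFS[pardef]:
        if restriction is not None and restriction != direction:
            continue  # entry is disabled in this direction
        source, target = (left, right) if direction == "LR" else (right, left)
        if source == tags:
            return target
    return None
```

For example, `lookup("adj", ("adj", "sint"), "LR")` returns `("adj",)`: a synthetic form such as samhälleligare on the left side comes out as an analytic adjective on the right.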
[Apertium-stuff] The pair eng-deu was: Re: why two dat sg in apertium-deu?
Hi Wolfgang, I cannot find any pair eng-deu in the repository. Yours, Per Tunedal On Tue, Mar 10, 2015, at 23:43, wolfgang...@web.de wrote: Hi, I'm working with the German apertium-deu monodix (for an eng-deu translation). I checked the dictionary and noticed that many of the noun pardefs have two dative singular forms (one of them, always with an e at the end, is wrong), e.g. pardef n=Abf/all__n_m e r=LRplalle/l ralls n=n/s n=m/s n=sg/s n=dat//r/ppar n=cmp-R//e e plall/l ralls n=n/s n=m/s n=sg/s n=dat//r/ppar n=cmp-R//e In German grammar there is only one dat sg (and one dat pl). Are these second dative forms necessary for old translations? In my local installation I removed these entries. Best regards, Wolfgang
[Apertium-stuff] No package apertium-swe found
Hi, got into trouble when reinstalling. ./autogen.sh in the swe-dan directory gives the following error message: No package 'apertium-swe' found although the language package is downloaded from SVN and I've run ./autogen.sh and make in the folder. (Make returns: Nothing to be done for All.) Yours, Per Tunedal
Re: [Apertium-stuff] Compare adjectives in Swedish
Hi, I'm wondering whether translation from a language with comparative and superlative forms of an adjective into e.g. Swedish works when the corresponding Swedish adjective is compared with mer and mest (more and most). Can the inflected form of the adjective in the source language be translated to two words (mer/mest + adjective) in the target language? Yours, Per Tunedal On Thu, Mar 5, 2015, at 08:57, Per Tunedal wrote: Hi again Kevin, I cannot find the transfer rules in apertium-dan-nor. What file should I look for? I would like to understand how it would work in practice. Yours, Per Tunedal On Wed, Mar 4, 2015, at 21:19, Kevin Brubeck Unhammer wrote: Per Tunedal per.tune...@operamail.com writes: Hi again Kevin. Funny, I just looked at apertium-dan-nor.nno.dix! Hmm ... When updating, quite a lot of files disappeared :-) I would like to see what it looks like in the monolingual dictionaries. languages/apertium-nno/apertium-nno.nno.dix has e.g.: pardef n=sein__adj e plare/l rs n=adj/s n=sint/s n=comp/s n=un/s n=sp//r/p/e e pl/l rs n=adj/s n=sint/s n=posi/s n=mf/s n=sg/s n=ind//r/p/e e plt/l rs n=adj/s n=sint/s n=posi/s n=nt/s n=sg/s n=ind//r/p/e e ple/l rs n=adj/s n=sint/s n=posi/s n=un/s n=pl/s n=ind//r/p/e e ple/l rs n=adj/s n=sint/s n=posi/s n=un/s n=sp/s n=def//r/p/e e plaste/l rs n=adj/s n=sint/s n=sup/s n=un/s n=sp/s n=def//r/p/e e plast/l rs n=adj/s n=sint/s n=sup/s n=un/s n=sp/s n=ind//r/p/e /pardef pardef n=OK__adj e pl/l rs n=adj/s n=posi/s n=mf/s n=sg/s n=ind//r/p/e e pl/l rs n=adj/s n=posi/s n=nt/s n=sg/s n=ind//r/p/e e pl/l rs n=adj/s n=posi/s n=un/s n=pl/s n=ind//r/p/e e pl/l rs n=adj/s n=posi/s n=un/s n=sp/s n=def//r/p/e /pardef (posi→pst is on the TODO …) Now I've got: pardef n=samhällelig__adj !-- PT: Compared with mer and mest, or not at all -- e pl/l rs n=adj/s n=pst/s n=ut/s n=sg/s n=ind//r/p/e e plt/l rs n=adj/s n=pst/s n=nt/s n=sg/s n=ind//r/p/e e ple/l rs n=adj/s n=pst/s n=m/s n=sg/s n=def//r/p/e e pla/l rs n=adj/s n=pst/s n=un/s n=pl/s n=ind//r/p/e e pla/l rs n=adj/s n=pst/s n=un/s n=sp/s n=def//r/p/e e r=RL c=style:fam a=PT plare/l rs n=adj/s n=comp/s n=un/s n=sp//r/p/e e r=RL c=style:fam a=PT plast/l rs n=adj/s n=sup/s n=un/s n=sp/s n=ind//r/p/e e r=RL c=style:fam a=PT plaste/l rs n=adj/s n=sup/s n=un/s n=sp/s n=def//r/p/e /pardef e lm=samhällelig a=isisamhällelig/ipar n=samhällelig__adj//e Yours, Per Tunedal -- Kevin Brubeck Unhammer GPG: 0x766AC60C
Re: [Apertium-stuff] Compare adjectives in Swedish
Hi again Kevin, I cannot find the transfer rules in apertium-dan-nor. What file should I look for? I would like to understand how it would work in practice. Yours, Per Tunedal On Wed, Mar 4, 2015, at 21:19, Kevin Brubeck Unhammer wrote: Per Tunedal per.tune...@operamail.com writes: Hi again Kevin. Funny, I just looked at apertium-dan-nor.nno.dix! Hmm ... When updating, quite a lot of files disappeared :-) I would like to see what it looks like in the monolingual dictionaries. languages/apertium-nno/apertium-nno.nno.dix has e.g.: pardef n=sein__adj e plare/l rs n=adj/s n=sint/s n=comp/s n=un/s n=sp//r/p/e e pl/l rs n=adj/s n=sint/s n=posi/s n=mf/s n=sg/s n=ind//r/p/e e plt/l rs n=adj/s n=sint/s n=posi/s n=nt/s n=sg/s n=ind//r/p/e e ple/l rs n=adj/s n=sint/s n=posi/s n=un/s n=pl/s n=ind//r/p/e e ple/l rs n=adj/s n=sint/s n=posi/s n=un/s n=sp/s n=def//r/p/e e plaste/l rs n=adj/s n=sint/s n=sup/s n=un/s n=sp/s n=def//r/p/e e plast/l rs n=adj/s n=sint/s n=sup/s n=un/s n=sp/s n=ind//r/p/e /pardef pardef n=OK__adj e pl/l rs n=adj/s n=posi/s n=mf/s n=sg/s n=ind//r/p/e e pl/l rs n=adj/s n=posi/s n=nt/s n=sg/s n=ind//r/p/e e pl/l rs n=adj/s n=posi/s n=un/s n=pl/s n=ind//r/p/e e pl/l rs n=adj/s n=posi/s n=un/s n=sp/s n=def//r/p/e /pardef (posi→pst is on the TODO …) Now I've got: pardef n=samhällelig__adj !-- PT: Compared with mer and mest, or not at all -- e pl/l rs n=adj/s n=pst/s n=ut/s n=sg/s n=ind//r/p/e e plt/l rs n=adj/s n=pst/s n=nt/s n=sg/s n=ind//r/p/e e ple/l rs n=adj/s n=pst/s n=m/s n=sg/s n=def//r/p/e e pla/l rs n=adj/s n=pst/s n=un/s n=pl/s n=ind//r/p/e e pla/l rs n=adj/s n=pst/s n=un/s n=sp/s n=def//r/p/e e r=RL c=style:fam a=PT plare/l rs n=adj/s n=comp/s n=un/s n=sp//r/p/e e r=RL c=style:fam a=PT plast/l rs n=adj/s n=sup/s n=un/s n=sp/s n=ind//r/p/e e r=RL c=style:fam a=PT plaste/l rs n=adj/s n=sup/s n=un/s n=sp/s n=def//r/p/e /pardef e lm=samhällelig a=isisamhällelig/ipar n=samhällelig__adj//e Yours, Per Tunedal -- Kevin Brubeck Unhammer GPG: 0x766AC60C
Re: [Apertium-stuff] High Frequency Missing Words
Hi, wouldn't it be great if the input language were detected automatically? Maybe TextCat http://www.let.rug.nl/vannoord/TextCat/ would do the trick? Yours, Per Tunedal On Wed, Mar 4, 2015, at 21:16, Flammie Pirinen wrote: 2015-03-04, Tino Didriksen wrote: There's a lot of source/target language confusion and people using the entirely wrong language pair, which means we have a problem and need to fix the apertium.org interface so people don't make that mistake. Or make it detect languages better and override people's choice when they're clearly wrong. I'm not sure it's an actual problem in the UI. I try out a bunch of things back and forth all the time, and since these newfangled widgets process what I copy-paste without me clicking any buttons, it often happens that I paste stuff first and change languages afterwards, by which time it has already translated things in obviously wrong pairs. (Yes, I am aware of the checkbox, but wrong translations on the fly are not a big enough issue for me as a user that I'd bother...) -- Flammie, computer scientist bachelor + linguist master = computational linguist doctor, free software Finnish localiser, and more! http://www.iki.fi/flammie/
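Per's TextCat suggestion is about guessing the language from character n-grams. Below is a minimal Python sketch of the rank-order ("out-of-place") comparison that TextCat-style classifiers use. It is not TextCat's own code: the n-gram length (up to 3), the profile size (300) and the tiny training sentences are arbitrary choices for illustration, and real profiles would be built from much larger samples of each language.

```python
from collections import Counter


def ngram_profile(text: str, n_max: int = 3, top: int = 300) -> list[str]:
    """Rank the most frequent character n-grams (lengths 1..n_max) of `text`."""
    counts: Counter = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # descending frequency, then alphabetical, so the ranking is deterministic
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top]]


def out_of_place(doc: list[str], lang: list[str]) -> int:
    """Sum of rank differences; n-grams missing from `lang` get the maximum penalty."""
    rank = {gram: i for i, gram in enumerate(lang)}
    penalty = len(lang)
    return sum(abs(i - rank[g]) if g in rank else penalty for i, g in enumerate(doc))


def guess_language(text: str, profiles: dict[str, list[str]]) -> str:
    """Return the language whose profile is closest to the profile of `text`."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))


if __name__ == "__main__":
    swe = "det här är en svensk mening och den är inte så lång men det räcker"
    dan = "det her er en dansk sætning og den er ikke så lang men det er nok"
    profiles = {"swe": ngram_profile(swe), "dan": ngram_profile(dan)}
    print(guess_language("den här meningen är svensk", profiles))
```

Used in the web interface, such a guess could at most warn the user or pre-select a pair; whether it should override the user's choice is exactly what the thread is debating.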
Re: [Apertium-stuff] Mitzuli released
Hi Mikel, Congratulations! The app works great. To make it successful, the most important thing is to work on the language pairs to improve their translation quality. (For Swedish-Danish the quality is actually better than what the GF translator achieves, though. What a surprise!) Yours, Per Tunedal On Mon, Mar 2, 2015, at 10:06, Mikel Artetxe wrote: Hi Apertiumers, I just wanted to let you know that, after a year as a beta, Mitzuli has finally been released today, and it is publicly available on Google Play. For those of you who have not heard about it, Mitzuli is an Apertium-based translator app for Android with a nice user interface and support for advanced features like ASR (voice input), OCR (camera input), and TTS (voice output). For more information, you can visit its new website at https://www.mitzuli.com (btw, there is now a section for the projects on which Mitzuli is based, and Apertium could not be missing there, of course ;-) The app can be downloaded from https://play.google.com/store/apps/details?id=com.mitzuli And its source code can be found at https://github.com/artetxem/mitzuli Regards, Mikel
[Apertium-stuff] Fwd: [Moses-support] DiscoMT 2015 Shared Task on Pronoun Translation (at EMNLP 2015)
Hi, can Apertium somehow be used for this task? Intuitively, the analysis (part-of-speech tags) would be useful, especially if the tags from the previous sentence could be remembered. Yours, Per Tunedal - Original message - From: joerg tiede...@gmail.com To: moses-support moses-supp...@mit.edu Subject: [Moses-support] DiscoMT 2015 Shared Task on Pronoun Translation (at EMNLP 2015) Date: Fri, 27 Feb 2015 13:43:50 +0100 === DiscoMT 2015 Shared Task on Pronoun Translation === Website: https://www.idiap.ch/workshop/DiscoMT/shared-task In connection with EMNLP 2015 (http://emnlp2015.emnlp.org) We are happy to announce a new exciting task for people interested in (discourse-aware) machine translation, anaphora resolution and machine learning in general. The EMNLP 2015 Workshop on Discourse in Machine Translation features two shared tasks: Task 1: Pronoun-Focused Machine Translation Task 2: Cross-Lingual Pronoun Prediction Task 1 requires machine translation (from English to French) and focuses on the evaluation of translated pronouns. We provide training data and a baseline SMT model to get started. Task 2 is a straightforward classification task in which one has to predict the correct translation of a given pronoun in English (it or they) into French (ce, elle, elles, il, ils, ça, cela, on, OTHER). We provide training and development data and a simple baseline system using an N-gram language model.
More details of the two tasks are attached below and can be found at our website: https://www.idiap.ch/workshop/DiscoMT/shared-task Important Dates: 4 May, 2015 Release of the MT test set (task 1) 10 May, 2015 Submission of translations (task 1) 11 May, 2015 Release of the classification test set (task 2) 18 May, 2015 Submissions of classification results (task 2) 28 May, 2015 System paper submission deadline Sep., 2015 Workshop in Lisbon Mailing list: https://groups.google.com/d/forum/discomt2015 Downloads: https://www.dropbox.com/sh/c8qnpag5z29jyh6/qk1TE9-UvcgEnfccdRwxa?dl=0 Download alternative 1: http://opus.lingfil.uu.se/DiscoMT2015/ Download alternative 2: http://stp.lingfil.uu.se/~joerg/DiscoMT2015/ - Acknowledgements: Funding for the manual evaluation of the pronoun-focused translation task is generously provided by the European Association for Machine Translation (EAMT) - == Detailed Task Description: == * Overview The DiscoMT 2015 shared task will consist of two subtasks, relevant to both the MT and discourse communities: pronoun-focused translation, a practical MT task, and cross-lingual pronoun prediction, a classification task that requires no specific MT expertise and is interesting as a machine learning task in its own right. For groups wishing to participate in both tasks, one possibility is to convert a system for the classification task into an MT feature model using existing software such as the Docent decoder (Hardmeier et al., ACL 2013). Both tasks use the English–French language pair, which has a sufficiently high baseline performance to produce basically intelligible output, as well as interesting differences in their pronoun systems. * Task 1: Pronoun-Focused Translation Task In the pronoun-focused translation task, you are given a collection of English input documents, which you are asked to translate into French. This task is the same as for other MT shared tasks such as that of WMT. 
The difference is in the way the translations are evaluated. Instead of checking the overall translation quality, we specifically look at how the English subject pronouns it and they were translated. The principal evaluation will be carried out manually and will focus specifically on the correctness of pronoun translation. Thanks to a grant from the EAMT, the manual evaluation will be run by the organisers and participants don't have to contribute evaluations. Automatic reference-based metrics are available for development purposes. The texts in the test corpus will consist of transcripts of TED talks. The training data contains an in-domain corpus of TED talks as well as some additional data from Europarl and news texts. To make the participating systems as comparable as possible, we ask you to constrain the training data of your system to the resources listed below as far as you can, but this is not a strict requirement and we do accept submissions using additional resources. If your system uses any resources other than those of the official data release, please be specific about what was included in the system description paper. For the same reason, we also suggest that you use the tokeniser provided by us unless you have a good reason to do otherwise. The test set will be supplied
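Per's idea of reusing Apertium's analysis, and remembering tags from the previous sentence, could be prototyped on the output of an Apertium analyser and tagger. The minimal Python sketch below parses the `^surface/lemma<tags>$` stream format, splits sentences on the `<sent>` tag, and pairs every pronoun (`<prn>`) with the nouns (`<n>`) of the previous sentence as a hypothetical feature for the pronoun-prediction task. The tag names and the example analyses are assumptions, escaped characters in the stream are not handled, and a proper stream parser such as the streamparser module would be more robust.

```python
import re

# One lexical unit in Apertium stream format, e.g. ^cats/cat<n><pl>$
LU = re.compile(r"\^([^/$]*)/([^$]*)\$")
TAG = re.compile(r"<([^>]+)>")


def sentences(stream: str):
    """Yield sentences as lists of (surface, lemma, tags), splitting after <sent>."""
    current = []
    for surface, analysis in LU.findall(stream):
        first = analysis.split("/")[0]  # keep only the first reading if still ambiguous
        lemma = first.split("<", 1)[0]
        tags = TAG.findall(first)
        current.append((surface, lemma, tags))
        if "sent" in tags:
            yield current
            current = []
    if current:
        yield current


def pronouns_with_context(stream: str):
    """Yield (pronoun, its tags, nouns of the previous sentence) triples."""
    previous_nouns = []
    for sent in sentences(stream):
        for surface, _lemma, tags in sent:
            if "prn" in tags:
                yield surface, tags, previous_nouns
        # remember this sentence's nouns for the pronouns of the next one
        previous_nouns = [(lemma, tags) for _, lemma, tags in sent if "n" in tags]
```

A classifier for Task 2 could then use, for instance, the number tags of the remembered nouns as features; whether that helps is an open question, not something the task description claims.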
Re: [Apertium-stuff] Compare adjectives in Swedish
Hi again Kevin. Funny, I just looked at apertium-dan-nor.nno.dix! Hmm ... When updating, quite a lot of files disappeared :-) I would like to see what it looks like in the monolingual dictionaries. Now I've got: pardef n=samhällelig__adj !-- PT: Compared with mer and mest, or not at all -- e pl/l rs n=adj/s n=pst/s n=ut/s n=sg/s n=ind//r/p/e e plt/l rs n=adj/s n=pst/s n=nt/s n=sg/s n=ind//r/p/e e ple/l rs n=adj/s n=pst/s n=m/s n=sg/s n=def//r/p/e e pla/l rs n=adj/s n=pst/s n=un/s n=pl/s n=ind//r/p/e e pla/l rs n=adj/s n=pst/s n=un/s n=sp/s n=def//r/p/e e r=RL c=style:fam a=PT plare/l rs n=adj/s n=comp/s n=un/s n=sp//r/p/e e r=RL c=style:fam a=PT plast/l rs n=adj/s n=sup/s n=un/s n=sp/s n=ind//r/p/e e r=RL c=style:fam a=PT plaste/l rs n=adj/s n=sup/s n=un/s n=sp/s n=def//r/p/e /pardef e lm=samhällelig a=isisamhällelig/ipar n=samhällelig__adj//e Yours, Per Tunedal On Wed, Mar 4, 2015, at 13:46, Kevin Brubeck Unhammer wrote: Per Tunedal per.tune...@operamail.com writes: Hi Kevin, I cannot find any sint tag in the dan-nor dictionaries, e.g. apertium-dan-nor.nno.dix There is no apertium-dan-nor.nno.dix. -- Kevin Brubeck Unhammer GPG: 0x766AC60C
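The `r="RL"` marks on the -are/-ast/-aste entries in the samhällelig pardef above decide which compiled dictionary an entry ends up in. Assuming the usual lttoolbox convention for monolingual dictionaries (`r="LR"` entries go only into the analyser, `r="RL"` entries only into the generator), the minimal Python sketch below models that with two plain lookup tables. Only three of Per's entries are included, and the function and variable names are hypothetical.

```python
# Each entry: (suffix added to the stem, analysis tags, restriction).
# None = both directions; in a monodix "LR" is assumed to mean analysis only
# and "RL" generation only (the usual lttoolbox convention).
SAMHALLELIG_ENTRIES = [
    ("", "<adj><pst><ut><sg><ind>", None),
    ("t", "<adj><pst><nt><sg><ind>", None),
    ("are", "<adj><comp><un><sp>", "RL"),  # restricted as in Per's pardef above
]


def compile_direction(stem: str, entries, direction: str) -> dict:
    """Return surface->analysis (direction "LR") or analysis->surface (direction "RL")."""
    table = {}
    for suffix, tags, restriction in entries:
        if restriction is not None and restriction != direction:
            continue  # this entry is only compiled in the other direction
        surface, analysis = stem + suffix, stem + tags
        if direction == "LR":
            table[surface] = analysis
        else:
            table[analysis] = surface
    return table


analyser = compile_direction("samhällelig", SAMHALLELIG_ENTRIES, "LR")
generator = compile_direction("samhällelig", SAMHALLELIG_ENTRIES, "RL")
```

Under that assumption, samhälleligare would be generated but not analysed, which is the opposite of keeping the analysis while generating mer samhällelig. Changing the restriction to "LR", as Kevin suggests in the Mar 9 message, would make the form analysable but never generated.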
Re: [Apertium-stuff] High Frequency Missing Words
Hi Tino, well, the one-time words are not of any interest anyway; they just make the list much longer. As far as I'm concerned, you could just delete them. Yours, Per Tunedal On Wed, Mar 4, 2015, at 14:24, Tino Didriksen wrote: On 4 March 2015 at 14:16, Joonas Kylmälä j.kylm...@gmail.com wrote: This looks good! But there is one thing that came to my mind: what if people write something personal there that they don't want shown publicly? If the problem hasn't been taken into account yet, maybe we could show only the words that occur two or more times? Thought about that, but something secret whose context you could somehow make out when you have 1 token to look at? Highly unlikely. Sure, if someone writes MyDirtySecretAllInOneTokenAndMyNameIsXAndILiveInY then that'll show, but I just can't see that being a real-world problem. -- TD
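Joonas's suggestion (show only words that occur two or more times), which Per would also welcome, amounts to a frequency threshold. A minimal Python sketch, assuming the unknown words are available as a plain list of tokens; how the real missing-words list is collected and stored is not described in this thread, so `frequent_unknowns` is a hypothetical helper.

```python
from collections import Counter


def frequent_unknowns(tokens, min_count: int = 2) -> list[tuple[str, int]]:
    """Count unknown-word tokens and keep only those seen at least `min_count` times."""
    counts = Counter(tokens)
    kept = [(word, n) for word, n in counts.items() if n >= min_count]
    # most frequent first; ties broken alphabetically for a stable listing
    return sorted(kept, key=lambda item: (-item[1], item[0]))
```

With the default `min_count=2`, hapax words (the one-time words Per mentions) simply never reach the public list.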
Re: [Apertium-stuff] Compare adjectives in Swedish
Hi Kevin, I cannot find any sint tag in the dan-nor dictionaries, e.g. apertium-dan-nor.nno.dix. Yours, Per Tunedal On Wed, Mar 4, 2015, at 11:42, Kevin Brubeck Unhammer wrote: Per Tunedal per.tune...@operamail.com writes: Hi, how should I treat adjectives that are compared with mer and mest (more and most) instead of the more common endings -are and -aste? I cannot have a paradigm with words before the adjective, can I? It should not be: *samhällelig *samhälleligare *samhälleligast but: samhällelig mer samhällelig mest samhällelig, if you were to compare this rare adjective. There are a lot of similar adjectives in Swedish. In fact, most long adjectives are compared this way. That's handled in transfer; see dan-nor, nno-nob (e.g. macro set_grau_aux2). They should have different tags: adjectives that inflect «-are/-ast(e)» have <adj><sint> (synthetic), while adjectives that take «mer/mest» should just have <adj> (analytic). This way, transfer knows whether it's possible to generate a sup/comp form. -- Kevin Brubeck Unhammer GPG: 0x766AC60C
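The decision Kevin describes (synthetic `<sint>` adjectives get an inflected comparative, analytic ones get mer/mest plus the positive form) can be sketched in Python as below. This is not the set_grau_aux2 macro or any real transfer code. The function name is hypothetical, and real Swedish entries carry more tags (gender, number, definiteness) that are left out here. The output uses the usual Apertium `^lemma<tags>$` stream notation.

```python
def compare_adj(lemma, tags, degree):
    """Return stream-format lexical units for a comparative/superlative.

    Hypothetical sketch: assumes synthetic adjectives have 'sint' in their
    tag list and analytic ones do not, as described for dan-nor/nno-nob.
    """
    if degree not in ("comp", "sup"):
        raise ValueError("degree must be 'comp' or 'sup'")
    if "sint" in tags:
        # -are / -ast(e): the generator can inflect the adjective itself
        return [f"^{lemma}<adj><sint><{degree}>$"]
    # mer / mest: a separate adverb followed by the positive form
    adverb = "mer" if degree == "comp" else "mest"
    return [f"^{adverb}<adv>$", f"^{lemma}<adj><pst>$"]
```

For example, `compare_adj("samhällelig", ["adj"], "sup")` yields the two units for "mest samhällelig", so the analyser never needs a paradigm with words in front of the adjective.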
Re: [Apertium-stuff] High Frequency Missing Words
Hi Tino, very interesting. Looking at the swe-dan file, I'm a bit confused. I should look for Swedish words, not Danish, shouldn't I? Or are both directions included? Yours, Per Tunedal On Wed, Mar 4, 2015, at 11:22, Tino Didriksen wrote: On 4 March 2015 at 10:53, Francis Tyers fty...@prompsit.com wrote: For the kaz-tat and tat-kaz directions you could grep out Latin characters; that would remove at least some of the bokmål and nynorsk :D But that would mean adding special-case code to the export script, which sounds boring. Instead, I've added a Download link to each pair so anyone can get the entire dump as a tab-separated UTF-8 plain-text file and do their own filtering. E.g., http://apertium.projectjj.com/missingFreqs.php?export=swe-dan -- TD
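Francis's suggestion of grepping out Latin characters can be applied to the downloaded dump on the user's side, as Tino proposes. Below is a minimal sketch. It assumes the word is in the first tab-separated column; the remaining columns (for example a count) are passed through unchanged. The Latin range used here (ASCII letters plus U+00C0 to U+024F) is an approximation, and the function name is hypothetical.

```python
import csv
import io
import re

# Approximate "Latin letter": ASCII letters plus Latin-1 Supplement and
# Latin Extended-A/B (this also catches å, ø, æ from bokmål/nynorsk)
LATIN = re.compile(r"[A-Za-z\u00C0-\u024F]")


def drop_latin(tsv_text):
    """Yield rows of a tab-separated dump whose first column has no Latin letters."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for row in reader:
        if row and not LATIN.search(row[0]):
            yield row
```

After the export has been saved locally, its contents can be read with `encoding="utf-8"` and passed to `drop_latin`.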
Re: [Apertium-stuff] High Frequency Missing Words
Hi Tino, Excellent. Interesting to read, and quite an incentive to add some more words. It might be a good idea to publish the data on a regular basis, e.g. once a year. BTW, what puzzles me is that the missing words are not very frequent in a general-domain corpus. The words apparently reflect the interests of the users. Maybe the general-domain users are quite happy, but some popular domains are missing? Yours, Per Tunedal On Wed, Mar 4, 2015, at 13:54, Tino Didriksen wrote: On 4 March 2015 at 13:49, Per Tunedal per.tune...@operamail.com wrote: very interesting. Looking at the swe-dan file I'm a bit confused. I should look for Swedish words, not Danish, shouldn't I? Or are both directions included? They're separate: - http://apertium.projectjj.com/missingFreqs.php?pair=dan-swe - http://apertium.projectjj.com/missingFreqs.php?pair=swe-dan Which just goes to show people pick the wrong direction, or don't pick a direction at all. -- TD
[Apertium-stuff] Compare adjectives in Swedish
Hi, how should I treat adjectives that are compared with mer and mest (more and most) instead of the more common endings -are and -aste? I cannot have a paradigm with words before the adjective, can I? It should not be: *samhällelig *samhälleligare *samhälleligast but: samhällelig mer samhällelig mest samhällelig, if you were to compare this rare adjective. There are a lot of similar adjectives in Swedish. In fact, most long adjectives are compared this way. Yours, Per Tunedal
Re: [Apertium-stuff] The pipeline
Hi Francis, Thank you! It works like a charm. Yours, Per Tunedal On Wed, Feb 18, 2015, at 17:25, Francis Tyers wrote: On 2015-02-18 17:15, Per Tunedal wrote: Hi Francis, Thank you. -g works OK: echo Vi behöver en annan boll | lt-proc sv-da.automorf.bin | apertium-tagger -g $2 sv-da.prob | apertium-pretransfer | lt-proc -b sv-da.autobil.bin | apertium-transfer -b apertium-sv-da.sv-da.t1x sv-da.t1x.bin | lt-proc -g $1 sv-da.autogen.bin | less But why isn't there any -g switch after lt-proc in the modes file? And how do I get rid of * @ etc.? See below: Vi behøver en anden *karavanförare \@genast (END) Yours, Per Tunedal You shouldn't be using $1 and $2 on the command line; they refer to command-line arguments. The switch comes from the main apertium script, which is why it is not in the modes file. To get rid of the diagnostics you can use: -n, --non-marked-genmorph: generation without unknown word marks (from the lt-proc --help output) Fran
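Francis's point is that `$1` and `$2` are placeholders filled in by the main apertium script, so a hand-run pipeline needs concrete flags in their place. Below is a minimal Python sketch of Per's sv-da pipeline with those placeholders resolved. It drops `$2` entirely, on the assumption that no extra tagger options are needed, and it uses `-g`, `-n` or `-d` for the generator, as listed in the thread. File names are taken from the commands above. The function names are hypothetical. `run_stages` only works when the Apertium tools and compiled data are actually present.

```python
import os
import shlex
import subprocess


def sv_da_stages(data_dir=".", gen_flag="-g"):
    """Return the sv-da pipeline as argv lists with $1/$2 resolved by hand."""
    def f(name):
        return os.path.join(data_dir, name)

    return [
        ["lt-proc", f("sv-da.automorf.bin")],
        ["apertium-tagger", "-g", f("sv-da.prob")],  # "$2" dropped
        ["apertium-pretransfer"],
        ["lt-proc", "-b", f("sv-da.autobil.bin")],
        ["apertium-transfer", "-b", f("apertium-sv-da.sv-da.t1x"), f("sv-da.t1x.bin")],
        ["lt-proc", gen_flag, f("sv-da.autogen.bin")],  # "$1" -> -g / -n / -d
    ]


def as_shell(stages):
    """Render the stages as one shell pipeline string to paste into a terminal."""
    return " | ".join(shlex.join(argv) for argv in stages)


def run_stages(text, stages):
    """Feed text through each stage in turn (requires the tools on PATH)."""
    data = text.encode("utf-8")
    for argv in stages:
        data = subprocess.run(argv, input=data, stdout=subprocess.PIPE,
                              check=True).stdout
    return data.decode("utf-8")
```

Printing `as_shell(sv_da_stages(".", "-n"))` gives a command line without any `$` placeholders. With `-n`, the generator should leave out the `*` and `@` marks shown in the output above.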
Re: [Apertium-stuff] The pipeline
Hi Francis, Thank you. -g works OK: echo Vi behöver en annan boll | lt-proc sv-da.automorf.bin | apertium-tagger -g $2 sv-da.prob | apertium-pretransfer | lt-proc -b sv-da.autobil.bin | apertium-transfer -b apertium-sv-da.sv-da.t1x sv-da.t1x.bin | lt-proc -g $1 sv-da.autogen.bin | less But why isn't there any -g switch after lt-proc in the modes file? And how do I get rid of * @ etc.? See below: Vi behøver en anden *karavanförare \@genast (END) Yours, Per Tunedal On Wed, Feb 18, 2015, at 16:50, Francis Tyers wrote: On 2015-02-18 14:23, Per Tunedal wrote: Hi Mikel, Thank you. Everything works except the last step: lt-proc $1 /home/per/Repository/apertium-sv-da/da-sv.autogen.bin I run: echo Vi behöver en annan boll | lt-proc sv-da.automorf.bin | apertium-tagger -g $2 sv-da.prob | apertium-pretransfer | lt-proc -b sv-da.autobil.bin | apertium-transfer -b apertium-sv-da.sv-da.t1x sv-da.t1x.bin | lt-proc $1 sv-da.autogen.bin | less and get: std::exception But if I run it the ordinary way, everything works OK: echo Vi behöver en annan boll | apertium -u -d . da-sv BTW, how do I pass the -u if I run the commands manually? Try: lt-proc -n /home/per/Repository/apertium-sv-da/da-sv.autogen.bin lt-proc -g /home/per/Repository/apertium-sv-da/da-sv.autogen.bin lt-proc -d /home/per/Repository/apertium-sv-da/da-sv.autogen.bin These are different generation modes; the --help output will explain. F.
Re: [Apertium-stuff] The pipeline
Hi Mikel, Thank you. Everything works except the last step: lt-proc $1 /home/per/Repository/apertium-sv-da/da-sv.autogen.bin I run: echo Vi behöver en annan boll | lt-proc sv-da.automorf.bin | apertium-tagger -g $2 sv-da.prob | apertium-pretransfer | lt-proc -b sv-da.autobil.bin | apertium-transfer -b apertium-sv-da.sv-da.t1x sv-da.t1x.bin | lt-proc $1 sv-da.autogen.bin | less and get: std::exception But if I run it the ordinary way, everything works OK: echo Vi behöver en annan boll | apertium -u -d . da-sv BTW, how do I pass the -u if I run the commands manually? Yours, Per Tunedal On Wed, Feb 18, 2015, at 11:54, Mikel L. Forcada wrote: Hi, Per. When the modes.xml file is compiled, a set of shell scripts is generated in the directory modes/, one for each mode. Checking these out may give you some inspiration on how to run the commands manually. HTH Mikel On 18/02/15 at 09:42, Per Tunedal wrote: Hi, I would like to explore the pipeline Apertium uses for translation. I can see the steps in the modes.xml file, but I cannot figure out how to write the commands to run them manually. How do I get the output of lt-proc as it looks when it is forwarded to the tagger? <mode name="swe-dan-bytecode" install="yes"> <pipeline> <program name="lt-proc"> <file name="swe-dan.automorf.bin"/> </program> <program name="apertium-tagger -g $2"> <file name="swe-dan.prob"/> </program> The following doesn't work: echo Vi behöver en annan boll | lt-proc swe-dan.automorf.bin | less What should I enter instead? Yours, Per Tunedal -- Mikel L. Forcada (http://www.dlsi.ua.es/~mlf/) Departament de Llenguatges i Sistemes Informàtics Universitat d'Alacant E-03071 Alacant, Spain Phone: +34 96 590 9776 Fax: +34 96 590 9326
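Besides reading the generated scripts in modes/, as Mikel suggests, one can list the stages of a mode directly from modes.xml. Below is a minimal sketch using Python's standard ElementTree. It assumes the file has a `<modes>` root containing `<mode>`, `<pipeline>`, `<program name=…>` and `<file name=…/>` elements, as in the snippet quoted above. The function name is hypothetical. Placeholders such as `$2` are left as they are, so they still have to be resolved before anything is run by hand, as Francis explains in his reply.

```python
import xml.etree.ElementTree as ET


def mode_commands(modes_xml, mode_name):
    """Return one command string per <program> in the named mode's pipeline."""
    root = ET.fromstring(modes_xml)
    for mode in root.iter("mode"):
        if mode.get("name") == mode_name:
            commands = []
            for prog in mode.iter("program"):
                # program name may already contain flags, e.g. "apertium-tagger -g $2"
                files = [f.get("name") for f in prog.findall("file")]
                commands.append(" ".join([prog.get("name")] + files))
            return commands
    raise KeyError(mode_name)
```

Piping the echoed sentence into the first command only, with the correct path to the .bin file, shows what the tagger receives as input.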
Re: [Apertium-stuff] The pipeline
Hi Mikel On Wed, Feb 18, 2015, at 11:54, Mikel L. Forcada wrote: Hi, Per. When the modes.xml file is compiled, a set of shell scripts is generated in the directory modes/, one for each mode. Checking these out may give you some inspiration on how to run the commands manually. HTH Mikel On 18/02/15 at 09:42, Per Tunedal wrote: Hi, I would like to explore the pipeline Apertium uses for translation. I can see the steps in the modes.xml file, but I cannot figure out how to write the commands to run them manually. How do I get the output of lt-proc as it looks when it is forwarded to the tagger? <mode name="swe-dan-bytecode" install="yes"> <pipeline> <program name="lt-proc"> <file name="swe-dan.automorf.bin"/> </program> <program name="apertium-tagger -g $2"> <file name="swe-dan.prob"/> </program> The following doesn't work: echo Vi behöver en annan boll | lt-proc swe-dan.automorf.bin | less What should I enter instead? Yours, Per Tunedal -- Mikel L. Forcada
[Apertium-stuff] Fwd: languages/apertium-swe failed nightly build
Hi, don't blame me! The dependencies are not met. Yours, Per Tunedal - Original message - From: root apertium-packag...@projectjj.com To: apertium-packag...@lists.sourceforge.net, tune...@users.sourceforge.net Subject: languages/apertium-swe failed nightly build Date: Tue, 17 Feb 2015 03:27:34 + (UTC) Package: languages/apertium-swe started: Tue Feb 17 03:22:26 UTC 2015 latest: 0.1.0~r58937 existing: 0.1.0~r58773-1 distv: 1 launching rebuild data only stopped: Tue Feb 17 03:27:33 UTC 2015 FAILED: http://apertium.projectjj.com/apt/logs/apertium-swe/jessie-amd64.log blames in revisions 58774:58937 : tunedal
Re: [Apertium-stuff] Merging apertium-is-sv.is-sv.dix into apertium-swe.swe.dix
Hi Francis, what if someone would like to create a new pair with Swedish and some more distant language like German, English or French? To me it looks great to include this feature, as it makes the monolingual dictionary more independent of the language pairs. And further, shouldn't this be the standard for all languages? It would facilitate the development of new pairs: the more distant the languages in a pair, the greater the benefit. Yours, Per Tunedal On Mon, Feb 16, 2015, at 11:33, Francis Tyers wrote: On 2015-02-16 09:03, Per Tunedal wrote: Hi, can anyone explain the implications of how the months are treated in the is-sv pair? It differs significantly from the pair swe-dan. Should this somehow be introduced into the pair swe-dan? <!-- Punctuation --> <pardef n="mánuðir"> <e><p><l>januari</l><r>januari</r></p></e> <e><p><l>februari</l><r>februari</r></p></e> <e><p><l>mars</l><r>mars</r></p></e> <e><p><l>april</l><r>april</r></p></e> <e><p><l>maj</l><r>maj</r></p></e> <e><p><l>juni</l><r>juni</r></p></e> <e><p><l>juli</l><r>juli</r></p></e> <e><p><l>augusti</l><r>augusti</r></p></e> <e><p><l>september</l><r>september</r></p></e> <e><p><l>oktober</l><r>oktober</r></p></e> <e><p><l>november</l><r>november</r></p></e> <e><p><l>december</l><r>december</r></p></e> </pardef> <!-- 1st 2nd 3rd 4th 5th 6th 7th 8th 9th 10th 11th 12th 13th 14th 15th ... 20th 21st --> <pardef n="dates"> <e><re>[2-3]?[0,4-9]</re><p><l><b/></l><r><b/></r></p><par n="mánuðir"/><p><l/><r><s n="num"/></r></p></e> <e><re>1[0-9]</re><p><l><b/></l><r><b/></r></p><par n="mánuðir"/><p><l/><r><s n="num"/></r></p></e> </pardef> This is a fine way to deal with months for translation, e.g. to get 3rd November from 3. nóvember. However, I suppose in swe-dan and isl-swe it isn't strictly necessary, as the languages in those pairs work the same way. Fran
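To see which day numbers the two `<re>` entries in the "dates" pardef actually cover, the patterns can be tested in Python. This is only a sketch. It assumes that lttoolbox's `<re>` treats these simple character classes the same way Python's `re` does. In both engines, the comma inside `[0,4-9]` is a literal character, not a separator.

```python
import re

# The two <re> patterns from the "dates" pardef quoted above
PATTERNS = [r"[2-3]?[0,4-9]", r"1[0-9]"]


def covered(token):
    """True if the whole token matches one of the pardef's regexes."""
    return any(re.fullmatch(p, str(token)) for p in PATTERNS)


matched = [d for d in range(1, 32) if covered(d)]
unmatched = [d for d in range(1, 32) if not covered(d)]
print(unmatched)  # [1, 2, 3, 21, 22, 23, 31]
```

Days ending in 1, 2 or 3 (apart from 11 to 13), which take "st", "nd" or "rd" in English, are not matched by these two entries. Any handling for them would have to come from entries that are not in the quoted snippet. A bare "," also matches the first pattern, which suggests that `[04-9]` may have been intended.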
Re: [Apertium-stuff] Merging apertium-is-sv.is-sv.dix into apertium-swe.swe.dix
Hi, can anyone explain the implications of how the months are treated in the is-sv pair? It differs significantly from the pair swe-dan. Should this somehow be introduced into the pair swe-dan? <!-- Punctuation --> <pardef n="mánuðir"> <e><p><l>januari</l><r>januari</r></p></e> <e><p><l>februari</l><r>februari</r></p></e> <e><p><l>mars</l><r>mars</r></p></e> <e><p><l>april</l><r>april</r></p></e> <e><p><l>maj</l><r>maj</r></p></e> <e><p><l>juni</l><r>juni</r></p></e> <e><p><l>juli</l><r>juli</r></p></e> <e><p><l>augusti</l><r>augusti</r></p></e> <e><p><l>september</l><r>september</r></p></e> <e><p><l>oktober</l><r>oktober</r></p></e> <e><p><l>november</l><r>november</r></p></e> <e><p><l>december</l><r>december</r></p></e> </pardef> <!-- 1st 2nd 3rd 4th 5th 6th 7th 8th 9th 10th 11th 12th 13th 14th 15th ... 20th 21st --> <pardef n="dates"> <e><re>[2-3]?[0,4-9]</re><p><l><b/></l><r><b/></r></p><par n="mánuðir"/><p><l/><r><s n="num"/></r></p></e> <e><re>1[0-9]</re><p><l><b/></l><r><b/></r></p><par n="mánuðir"/><p><l/><r><s n="num"/></r></p></e> </pardef> Yours, Per Tunedal On Tue, Feb 10, 2015, at 09:01, Per Tunedal wrote: Hi, --snip-- The months are treated differently. I'm not sure exactly what Tihomir has done, but it looks neat. This applies to the bidix as well. Can anyone explain? Should this be used in apertium-swe.swe.dix and apertium-swe-dan.swe-dan.dix? --snip-- Yours, Per Tunedal
[Apertium-stuff] Finding errors in dictionaries
Hi, I've now created the page Finding errors in dictionaries on the wiki: http://wiki.apertium.org/wiki/Finding_errors_in_dictionaries I hope it will help contributors improve translation quality. I haven't translated it into French yet, though. Yours, Per Tunedal
Re: [Apertium-stuff] New GSOC ideas
Hi Francis, I really like the idea "Make a program which tests Apertium data files for suspicious or unrecommended constructs (likely to be bugs)". For someone like me it's very easy to make a minor mistake when editing those bloody XML files :-) It's quite easy to miss a quotation mark (") or some other symbols (<>) that aren't all that important in ordinary language, or to omit a closing symbol at the right side of the expression (/). One way of improving the checking would be not just to have separate programs like Jimmy O'Regan's lint tool for tsx files, but also to make the make script more explicit about errors: give some helpful hints about common errors and print the offending line (or rather the offending expression?) with explicit info. This applies to make scripts for dictionaries as well as for tagger training. The advantage is that everyone has to run the make script, whereas it's easy to forget to run a special tool or simply not be aware of its existence. Regarding the make scripts for tagger training, it would be very welcome if they worked with comments in the tsx files. Working without comments complicates the work considerably. That's the main reason why I abandoned the work on retraining the tagger for the pair Swedish-Danish. Yours, Per Tunedal On Thu, Feb 12, 2015, at 01:08, Francis Tyers wrote: Hello all, We've added some new ideas for GSOC: * Weighted transfer rules * Automatic blank handling * Integration and debugging tools for Grammatical Framework * Weights in lttoolbox * Improvements to the Apertium website Please don't feel shy about fleshing out the ideas and improving the descriptions. :D http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code We currently have thirteen ideas and could do with a few more. Something around seven or eight more would be good. Entry level: 3 Medium: 5 Hard: 5 It would be good to have a mix, so four more entry-level ones and two each of medium and hard or so. Fran
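Per asks for errors that "print the offending line with explicit info". The smallest version of that, a well-formedness check, can be sketched with Python's standard library. This is not Jimmy O'Regan's lint tool and not part of any Apertium make script. It only catches XML syntax errors such as a missing closing tag or an unbalanced quotation mark, not DTD violations. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def first_xml_error(text):
    """Return (line, column, message, offending_line) for the first
    well-formedness error in `text`, or None if it parses cleanly."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, col = err.position  # position reported by the expat parser
        lines = text.splitlines()
        offending = lines[line - 1] if 0 < line <= len(lines) else ""
        return line, col, str(err), offending
    return None
```

For a pardef whose `<e>` is never closed, the error is reported on the line where the parser notices the mismatch (the `</pardef>` line), which is not necessarily the line containing the actual mistake. That is why hints about common errors, as Per suggests, would still help.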
[Apertium-stuff] How do I add a page in the Wiki
Hi, I would like to contribute some experiences of how to find errors in a dictionary. Unfortunately, I cannot figure out how to add a new page to the Wiki. I would like to add it with the Documentation page as its parent. Yours, Per Tunedal
[Apertium-stuff] Changes for pronouns and adjectives in apertium-swe.swe.dix
Hi, I've made some changes to pronouns and adjectives in apertium-swe.swe.dix to make it more accurate. These changes have not been reflected in the Danish dictionary, though, because I believe that might break the pair Danish-Norwegian. And further, I'm not all that strong in Danish anyway. These are rather important features in a language and have to be right. Could anyone more savvy have a look and make the appropriate changes in the Danish dictionary (and possibly in the bidix)? What about the tricky word vad, for instance? Now it looks like this in the bidix: <!-- PT: Changed to pronoun: <s n="prn"/><s n="rel"/> <e><p><l>vad<s n="adv"/><s n="itg"/></l><r>hvad<s n="adv"/><s n="itg"/></r></p></e> --> <e a="PT"><p><l>vad<s n="prn"/><s n="rel"/></l><r>hvad<s n="adv"/><s n="itg"/></r></p></e> Yours, Per Tunedal
Re: [Apertium-stuff] Why am I getting this for swe-dan
Hi Tino, thank you. Now I've fixed the typo. Yours, Per Tunedal On Tue, Feb 10, 2015, at 08:36, Tino Didriksen wrote: Replied inline... On 10 February 2015 at 08:24, Per Tunedal per.tune...@operamail.com wrote: I've got this strange message: Package: languages/apertium-swe started: Tue Feb 10 07:13:28 UTC 2015 latest: 0.1.0~r58796 existing: 0.1.0~r58773-1 distv: 1 launching rebuild data only stopped: Tue Feb 10 07:18:25 UTC 2015 FAILED: http://apertium.projectjj.com/apt/logs/apertium-swe/wheezy-amd64.log http://apertium.projectjj.com/apt/logs/apertium-swe/jessie-amd64.log http://apertium.projectjj.com/apt/logs/apertium-swe/sid-amd64.log http://apertium.projectjj.com/apt/logs/apertium-swe/precise-amd64.log http://apertium.projectjj.com/apt/logs/apertium-swe/trusty-amd64.log http://apertium.projectjj.com/apt/logs/apertium-swe/utopic-amd64.log http://apertium.projectjj.com/apt/logs/apertium-swe/vivid-amd64.log blames in revisions 58774:58796 : tunedal Why? Have I made some mistake? This is the kind of mail you get if you break the build for a package. You broke apertium-swe, and the why is in the linked logs, usually at the bottom. In this case, the errors are: apertium-swe.swe.dix:1264: element e: Schemas validity error : Element 'e': Character content other than whitespace is not allowed because the content type is 'element-only'. apertium-swe.swe.dix:1264: element e: validity error : Element e content does not follow the DTD, expecting (i | p | par | re)+, got (p CDATA) You must have forgotten to check that make for apertium-swe passed before committing your changes. -- Tino Didriksen
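The error Tino quotes ("got (p CDATA)") means that an `<e>` element contained stray text outside its child elements. Below is a minimal sketch of a check for exactly that case, using Python's standard ElementTree. It is a hypothetical helper, not the schema/DTD validation that produced the log. Running `make` before committing remains the real safeguard.

```python
import xml.etree.ElementTree as ET


def stray_text_in_entries(dix_text):
    """List (lemma, text) for every <e> holding non-whitespace text
    directly, i.e. outside any child element such as <p>, <i> or <par>."""
    problems = []
    root = ET.fromstring(dix_text)
    for e in root.iter("e"):
        # text before the first child, plus the text after each child
        pieces = [e.text] + [child.tail for child in e]
        for piece in pieces:
            if piece and piece.strip():
                problems.append((e.get("lm"), piece.strip()))
    return problems
```

Entries inside a `<pardef>` usually have no `lm` attribute, so they are reported with `None` as the lemma.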
Re: [Apertium-stuff] Error in expanded dictionary
Hi Kevin, Thanks for the reply. Yours, Per Tunedal On Tue, Feb 10, 2015, at 10:49, Kevin Brubeck Unhammer wrote: Per Tunedal per.tune...@operamail.com writes: Hi, why is NON_ANALYSIS appended after words on many lines? E.g. aktrisernaNON_ANALYSIS There's a bug in lt-comp where, if you have a pardef that looks like <par n="foo"> <e></e> </par> then it'll produce an FST that leads to lt-proc hanging. So if you want a pardef like <pardef n="cmp"> <e><p><l></l> <r></r></p></e> <e r="RL"><p><l></l> <r><s n="cmp"/></r></p></e> </pardef> which adds the cmp tag only for the RL FST, then the LR FST uses <pardef n="cmp"> <e><p><l></l> <r></r></p></e> </pardef> which gives this bug. Thus we do <pardef n="cmp"> <e><p><l></l> <r></r></p></e> <e r="RL"><p><l></l> <r><s n="cmp"/></r></p></e> <e><p><l>NON_ANALYSIS</l> <r>DUE_TO_LT_PROC_HANG</r></p></e> </pardef> Yes, the bug should be fixed; it just hasn't been annoying enough yet that anyone's gotten around to it :-) (And the NON_ANALYSIS will of course presumably never be seen in a corpus[1], so it's harmless to have it in there.) [1] Except for the corpus of apertium-stuff emails and #apertium IRC logs. -- Kevin Brubeck Unhammer GPG: 0x766AC60C
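Because the NON_ANALYSIS workaround Kevin describes ends up in expanded dictionaries, anyone building word lists from such an expansion may want to drop those lines. Below is a minimal sketch that treats the expansion as plain text lines and simply skips any line containing one of the two dummy strings from the pardef above. It makes no assumptions about the rest of the line format. The names are hypothetical.

```python
# The two dummy strings from the workaround entry in the "cmp" pardef
PLACEHOLDERS = ("NON_ANALYSIS", "DUE_TO_LT_PROC_HANG")


def drop_placeholders(expanded_lines):
    """Return the lines of an expanded dictionary without the dummy entries."""
    return [line for line in expanded_lines
            if not any(p in line for p in PLACEHOLDERS)]
```

Since the dummy entry exists only to prevent the lt-proc hang, removing it from a derived word list should lose no real analyses.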
Re: [Apertium-stuff] Changes for pronouns and adjectives in apertium-swe.swe.dix
Hi Francis,
OK, now vad - hvad is treated both as a relative pronoun and as an interrogative adverb. Before my changes it was treated only as an interrogative adverb. In my Swedish grammar it's both a relative and an interrogative pronoun.
Yours, Per Tunedal
On Tue, Feb 10, 2015, at 10:11, Francis Tyers wrote:
On 2015-02-10 09:24, Per Tunedal wrote:
Hi, I've made some changes to pronouns and adjectives in apertium-swe.swe.dix to make it more accurate. These changes have not been reflected in the Danish dictionary, though, because I believe they might break the Danish-Norwegian pair. Furthermore, I'm not all that strong in Danish anyway. These are rather important features of a language and they have to be right. Could anyone more savvy have a look and make the appropriate changes in the Danish dictionary (and possibly in the bidix)? What about the tricky word vad, for instance? Now it looks like this in the bidix:
<!-- PT: Changed to pronoun: <s n="prn"/><s n="rel"/>
<e><p><l>vad<s n="adv"/><s n="itg"/></l><r>hvad<s n="adv"/><s n="itg"/></r></p></e> -->
<e a="PT"><p><l>vad<s n="prn"/><s n="rel"/></l><r>hvad<s n="adv"/><s n="itg"/></r></p></e>
When making changes to the Swedish dictionary I propose that you follow the patterns in the Norwegian and Danish dictionaries.
dan-nor:
<e vr="nob"><p><l>hvad<s n="prn"/><s n="itg"/></l><r>hva<s n="prn"/><s n="itg"/></r></p></e>
<e vr="nno"><p><l>hvad<s n="prn"/><s n="itg"/></l><r>kva<s n="prn"/><s n="itg"/></r></p></e>
<e vr="nob"><p><l>hvad<s n="adv"/></l><r>hva<s n="adv"/></r></p></e>
<e vr="nno"><p><l>hvad<s n="adv"/></l><r>kva<s n="adv"/></r></p></e>
nno-nob:
<e><p><l>kva<s n="prn"/></l><r>hva<s n="prn"/></r></p></e>
<e><p><l>kva<s n="adv"/></l><r>hva<s n="adv"/></r></p></e>
Fran
___ Apertium-stuff mailing list Apertium-stuff@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/apertium-stuff
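To compare bidix entries like the ones quoted above side by side, a small script can list each entry's left and right lemma together with its tags. This is a minimal Python sketch, assuming simple `<e><p><l>...</l><r>...</r></p></e>` entries as in the examples; the function names are hypothetical, and `<b/>` blanks, `<g>` groups, `<i>` identity entries and paradigm references are not handled. ElementTree drops XML comments by default, so the commented-out entry is ignored.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

Side = Tuple[str, List[str]]  # (lemma, tag names)


def read_side(elem: Optional[ET.Element]) -> Side:
    """Return the lemma and the n= values of the <s> tags of an <l> or <r> element."""
    if elem is None:
        return "", []
    lemma = elem.text or ""  # text before the first <s/>; a <b/> blank would cut it short
    tags = [s.get("n", "") for s in elem.findall("s")]
    return lemma, tags


def bidix_pairs(xml_text: str) -> List[Tuple[Side, Side, Dict[str, str]]]:
    """List (left, right, attributes) for every <e> that contains a <p> pair."""
    root = ET.fromstring(xml_text)  # comments such as <!-- PT: ... --> are dropped
    pairs = []
    for entry in root.iter("e"):
        pair = entry.find("p")
        if pair is None:
            continue  # identity (<i>) or paradigm (<par>) entries are skipped in this sketch
        pairs.append((read_side(pair.find("l")), read_side(pair.find("r")), dict(entry.attrib)))
    return pairs
```

Run on the PT entry, it reports the Swedish side as prn.rel and the Danish side as adv.itg; in the dan-nor and nno-nob entries, by contrast, both sides carry the same tags.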
Re: [Apertium-stuff] apertium-sv-da.sv.dix merged into apertium-swe.swe.dix
Hi, I've got stuck:
On Mon, Feb 9, 2015, at 10:24, Kevin Brubeck Unhammer wrote:
Per Tunedal per.tune...@operamail.com writes:
Hi, Now I've merged apertium-sv-da.sv.dix into apertium-swe.swe.dix. Kevin, would you please make apertium-sv-da depend on languages/apertium-swe with that little change to the makefiles? I suppose I had better move apertium-sv-da.sv.dix to another folder to avoid mistakes in the future.
Set up, and changed to three-letter codes. Make sure you're running the newest SVN of apertium/lttoolbox/apertium-lex-tools (on .deb or .rpm-based Linuxes you can just use Tino Didriksen's repos, http://wiki.apertium.org/wiki/Prerequisites_for_Debian or http://wiki.apertium.org/wiki/Prerequisites_for_RPM ). Then do:
for l in swe dan; do svn checkout https://svn.code.sf.net/p/apertium/svn/languages/apertium-$l && cd apertium-$l && ./autogen.sh && cd .. || break; done
svn checkout https://svn.code.sf.net/p/apertium/svn/trunk/apertium-swe-dan
Works OK until the next step:
cd apertium-swe-dan
./autogen.sh --with-lang1=../apertium-swe --with-lang2=../apertium-dan
Here I get stuck: "You don't have cg-comp installed". I thought this language pair didn't use any constraint grammar? I've tried to follow the instructions in the wiki for Debian/Ubuntu, but it doesn't work. On this box I'm trying Ubuntu for a change. I get a message that the repository cannot be read because of a wrong format.
Now compile both the monolingual data and the pair by doing:
make -j3 langs
and this should give some output:
make test
-- Kevin Brubeck Unhammer GPG: 0x766AC60C
___ Apertium-stuff mailing list Apertium-stuff@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/apertium-stuff
Re: [Apertium-stuff] apertium-sv-da.sv.dix merged into apertium-swe.swe.dix
Hi Francis,
Yes, I noticed that a long time ago and have used some of those entries. It was like panning for gold in SALDO: I was trying to find the words from my corpus, and found some.
Yours, Per Tunedal
On Mon, Feb 9, 2015, at 10:31, Francis Tyers wrote:
On 2015-02-09 10:24, Kevin Brubeck Unhammer wrote:
Per Tunedal per.tune...@operamail.com writes:
Hi, Now I've merged apertium-sv-da.sv.dix into apertium-swe.swe.dix. Kevin, would you please make apertium-sv-da depend on languages/apertium-swe with that little change to the makefiles? I suppose I had better move apertium-sv-da.sv.dix to another folder to avoid mistakes in the future.
Set up, and changed to three-letter codes. ...snip...
Note that one of our GCI students, Joonas, converted the nouns and verbs from SALDO to something approximating the Apertium tagset. The files and scripts are in apertium-swe/dev/saldo.
Fran
___ Apertium-stuff mailing list Apertium-stuff@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/apertium-stuff
[Apertium-stuff] apertium-sv-da.sv.dix merged into apertium-swe.swe.dix
Hi,
Now I've merged apertium-sv-da.sv.dix into apertium-swe.swe.dix. Kevin, would you please make apertium-sv-da depend on languages/apertium-swe with that little change to the makefiles? I suppose I had better move apertium-sv-da.sv.dix to another folder to avoid mistakes in the future.
Yours, Per Tunedal
___ Apertium-stuff mailing list Apertium-stuff@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/apertium-stuff
Re: [Apertium-stuff] Expand monodix: generation only
Hi Kevin, Thank you for your quick answer.
On Fri, Feb 6, 2015, at 18:51, Kevin Brubeck Unhammer wrote:
Per Tunedal per.tune...@operamail.com writes:
Hi, I've successfully extracted a Swedish word list from apertium-sv-da.sv.dix as follows: --snip--
LR entries are output from lt-expand with :>: as the field separator, so you can do
lt-expand *.sv.dix | grep -v ':>:' | cut -f1 -d: > sv.expanded
You might also want to exclude RL-marked entries (they tend to be a bit weird in monodixes):
lt-expand *.sv.dix | grep -v ':[<>]:' | cut -f1 -d: > sv.expanded
Excellent! Just what I need.
Anyhow, I continued by checking the list in word-processing programs to get the real errors and found quite a lot. Some of them I have already corrected in the sv-da pair. What about the separate language dictionary? Should I merge my corrections somehow? What's the recommended procedure when improving/adding to an existing language pair?
It'd be great if you could merge your changes in there; before your changes the diff was only 32 lines long, so I don't think it should be much work (you might even be able to just copy it over).
OK. I will give it a try.
By the way: how do I use the separate language monodixes? Can they be used for existing pairs or only when creating new pairs? What's the recommendation for new pairs? The Apertium New Language Pair HOWTO still assumes that the monodixes are made exclusively for the new pair.
The challenge is just getting the monodixes merged; if you merge in those changes, we can make apertium-sv-da depend on languages/apertium-swe with a little change to the makefiles.
Does that mean that the monolingual dictionaries are now independent of the language pairs? What about the old requirement that all words in the monodix had to be translated for the pair, i.e. words had to be present in both monodixes and in the bidix? Is that requirement now abandoned? What happens when translating to Swedish if a form in the foreign language is missing in Swedish or vice versa? Is it now possible to extend the Swedish dictionary without having to extend the Danish dictionary at the same time? If so, it would facilitate contributions considerably. Lars Aronsson would be happy.
(The diff for the Danish side is 67736 lines long, so that may be more of a challenge to merge … but I'd still say it's worth it to merge the Swedish side right away.)
The next step, after I have merged sv-da.sv.dix into the swe dix, would be to merge in the is-sv pair. That way both the sv-da pair and the is-sv pair would benefit from corrections in the Swedish monodix.
-- Kevin Brubeck Unhammer GPG: 0x766AC60C
-- Yours, Per Tunedal
___ Apertium-stuff mailing list Apertium-stuff@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/apertium-stuff
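Kevin's grep/cut filters can also be expressed in Python, for instance to combine them with further word-list processing. This is a minimal sketch, assuming (as Kevin's commands imply) that lt-expand prints one `surface:analysis` pair per line and marks r="LR" (analysis-only) entries with `:>:` and r="RL" (generation-only) entries with `:<:`; `surface_forms` is a hypothetical name. Unlike `cut`, it drops empty lines.

```python
from typing import Iterable, List

LR_MARK = ":>:"  # r="LR": analysis-only entries (assumed lt-expand marker)
RL_MARK = ":<:"  # r="RL": generation-only entries (assumed lt-expand marker)


def surface_forms(lines: Iterable[str], drop_rl: bool = False) -> List[str]:
    """Mimic `grep -v ':>:' | cut -f1 -d:` (or `grep -v ':[<>]:'` when drop_rl is True)."""
    forms = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or LR_MARK in line:
            continue  # skip blank lines and analysis-only variants
        if drop_rl and RL_MARK in line:
            continue  # optionally skip generation-only entries as well
        forms.append(line.split(":", 1)[0])  # surface form = first ':'-separated field
    return forms
```

Called as `surface_forms(sys.stdin)` from a small script, it can stand in for the `grep -v ... | cut -f1 -d:` part of the pipeline.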
[Apertium-stuff] Expand monodix: generation only
Hi,
I've successfully extracted a Swedish word list from apertium-sv-da.sv.dix as follows:
lt-expand apertium-sv-da.sv.dix | cut -f1 -d':' > apertium-sv-da.sv.dix.expanded
Going through the list I found lots of errors. I excluded words present in the Aspell dictionary to get a shorter list of misspelled words. It was still quite long, though, and worse: it contained mostly correctly spelled words unknown to Aspell. Hunspell (used by e.g. OpenOffice/LibreOffice) knows many more words. Does anyone happen to know how to extract/get Hunspell word lists as text files?
Looking at the misspelled list I realised that many of the errors are variants added for analysis only (r=LR). Is there an easy way to expand only the variants that are used for generation? Such a procedure would produce a much shorter and more correct list.
Anyhow, I continued by checking the list in word-processing programs to get the real errors and found quite a lot. Some of them I have already corrected in the sv-da pair. What about the separate language dictionary? Should I merge my corrections somehow? What's the recommended procedure when improving/adding to an existing language pair?
By the way: how do I use the separate language monodixes? Can they be used for existing pairs or only when creating new pairs? What's the recommendation for new pairs? The Apertium New Language Pair HOWTO still assumes that the monodixes are made exclusively for the new pair.
Yours, Per Tunedal
___ Apertium-stuff mailing list Apertium-stuff@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/apertium-stuff
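Excluding the words a spell checker already knows is essentially a set difference. This is a minimal sketch, assuming the spell checker's vocabulary has already been exported to a plain text list with one word per line (how to export it from Aspell or Hunspell is not covered here); `unknown_words` is a hypothetical helper. Comparing case-insensitively lets e.g. Stockholm match stockholm, at the price of hiding genuine capitalisation mistakes.

```python
from typing import Iterable, List


def unknown_words(forms: Iterable[str], known: Iterable[str], ignore_case: bool = True) -> List[str]:
    """Return the distinct forms that do not occur in the known-word list, sorted."""
    def norm(word: str) -> str:
        return word.casefold() if ignore_case else word

    known_set = {norm(w.strip()) for w in known if w.strip()}
    return sorted({f.strip() for f in forms if f.strip() and norm(f.strip()) not in known_set})
```

Both arguments can be open text files (opened with `encoding="utf-8"`), since iterating over a file yields its lines.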
Re: [Apertium-stuff] Extract words from monodix
Hi Kevin, Thank you. It works like a charm. Yours, Per Tunedal
On Mon, Feb 2, 2015, at 09:07, Kevin Brubeck Unhammer wrote:
Per Tunedal per.tune...@operamail.com writes:
Hi, I've successfully extracted a Swedish word list from apertium-sv-da.sv.dix as follows:
lt-expand apertium-sv-da.sv.dix | cut -f1 -d':' > apertium-sv-da.sv.dix.expanded
I would like to get English and French word lists as well. How do I proceed with the pairs fr-es and en-es or en-ca? There aren't any similar files for English or French in those pairs, only for Spanish.
The dix file is compiled from a .metadix file. First, compile the pair, then look for a .dix file, possibly in .deps/, like .deps/en.dix or something.
BTW, would it be better to extract words from http://wiki.apertium.org/wiki/Languages rather than from the pairs?
Probably not for those languages … though if you're only after forms anyway, you could just grab all the words from all the directories and then do
cat apertium-sv-da.sv.dix.expanded apertium-swe.swe.dix.expanded \
  | sort -u > combined-apertium-swe.swe.dix.expanded
-Kevin
___ Apertium-stuff mailing list Apertium-stuff@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/apertium-stuff
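Kevin's `cat ... | sort -u` step amounts to a set union followed by a sort. This is a minimal Python sketch with hypothetical helper names. One difference worth knowing: Python sorts by code point, while `sort` follows the current locale's collation. For example, Python puts ä (U+00E4) before å (U+00E5), whereas Swedish collation puts å first. With `LC_ALL=C`, `sort` compares bytes, which for UTF-8 text gives the same order as Python.

```python
from pathlib import Path
from typing import Iterable, List


def merge_word_lists(*lists: Iterable[str]) -> List[str]:
    """Union of several word lists, without blank lines, sorted by code point."""
    merged = set()
    for words in lists:
        merged.update(w.strip() for w in words if w.strip())
    return sorted(merged)


def merge_files(inputs: List[str], output: str) -> None:
    """Read UTF-8 word lists, merge them and write one word per line (file names are up to you)."""
    lists = [Path(p).read_text(encoding="utf-8").splitlines() for p in inputs]
    Path(output).write_text("\n".join(merge_word_lists(*lists)) + "\n", encoding="utf-8")
```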
[Apertium-stuff] Abbreviation list for tokenization was: Re: Using GIZA++
Hi again Miquel,
I've manually replaced the variables and the script bitextor-builddics.sh works like a charm! I've got a complaint about a missing list of Swedish abbreviations, though:
TOKENISING THE CORPUS...
WARNING: No known abbreviations for language 'sv', attempting fall-back to English version...
Where do I find those lists of abbreviations (what program, what folder)? It would be quite easy for me to supply such a list, as I've already done it for Apertium-sv-da and for bligner.py.
Yours, Per Tunedal
On Thu, Feb 20, 2014, at 19:48, Miquel Esplà wrote:
Well, of course you can try to replace the variables with paths manually (as I told you, you would have to replace the variables starting and ending with __). I don't think I can help you much more because I never did this, but I'm sure that with a bit of patience you will do it ;) Good luck! Cheers, Miquel.
---snip---
I'm sorry, I didn't explain it well: as I said, [1]bitextor-builddics.in is only the template of the script. What I didn't say is that you need to compile the project to get the true script. If you have a look at the code of the template, you will see that there are many variables starting and ending with __ (such as __PREFIX__). These variables are replaced by the corresponding paths at compilation time. So, to use the script, you have to download the whole trunk directory, and then run:
./autogen.sh
./configure
make
make install
As you know, you can use the option --prefix=LOCALDIR when running ./configure to install bitextor in a specific path (for example LOCALDIR could be /home/per/local/). Best, Miquel.
Yours, Per Tunedal
On Tue, Feb 18, 2014, at 12:38, Miquel Esplà wrote:
Hi Per, I think that the explanation on this website: [2]http://rali.iro.umontreal.ca/rali/?q=en/node/1325 is quite useful. It helps a lot to understand the structure and the content of each file generated by OmegaT.
About the script: in the last release of bitextor we included a script called bitextor-builddics (you can find the template of this script here: [3]https://svn.code.sf.net/p/bitextor/code/trunk/bitextor-builddics.in) which uses GIZA++ to obtain a plain text bilingual dictionary, but only including pairs of words fulfilling: a) both words occur at least 10 times in the corpus, and b) the harmonic mean of their probabilities in both probabilistic dictionaries (S -> T and T -> S) is higher than 0.2. If you want to use this, I recommend using the version in the trunk, which fixes some minor bugs still present in the release. Best, Miquel.
--snip---
References
1. http://bitextor-builddics.in/
2. http://rali.iro.umontreal.ca/rali/?q=en/node/1325
3. https://svn.code.sf.net/p/bitextor/code/trunk/bitextor-builddics.in
___ Apertium-stuff mailing list Apertium-stuff@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/apertium-stuff
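Per replaced the `__...__` placeholders in the template by hand. A small script can do that substitution and fail loudly on any placeholder that was forgotten. This is a minimal sketch, assuming placeholders are upper-case names wrapped in double underscores, as with `__PREFIX__` and `__PYTHON__` in the error output; the function names are hypothetical. Building with `./autogen.sh`, `./configure` and `make`, as Miquel describes, remains the intended route.

```python
import re
from typing import Mapping, Set

# Upper-case names wrapped in double underscores, e.g. __PREFIX__ or __PYTHON__ (assumed convention)
PLACEHOLDER = re.compile(r"__([A-Z][A-Z0-9_]*?)__")


def placeholders(template: str) -> Set[str]:
    """Names of all placeholders that occur in the template."""
    return set(PLACEHOLDER.findall(template))


def fill_template(template: str, values: Mapping[str, str]) -> str:
    """Replace every __NAME__ with values[NAME]; raise KeyError if any value is missing."""
    missing = placeholders(template) - set(values)
    if missing:
        raise KeyError("no value for: " + ", ".join(sorted(missing)))
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)
```

Running `placeholders()` on the template first shows which values you need to supply.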
Re: [Apertium-stuff] Using GIZA++
Hi Miquel, Thanks for your thorough answer. I've tried ./autogen.sh. I had to install httrack, but then got:
checking for a Python interpreter with version >= 2.7... none
configure: error: You don't have Python 2.7 or later installed.
Is it really necessary to update Python? It appears that the configure script demands Python >= 2.7. In Debian Squeeze, Python 2.6.6 is the default. I'm afraid of messing things up if I install Python manually rather than with Synaptic. Lots of things depend on Python. And upgrading to Debian Wheezy might mess things up as well ...
Yours, Per Tunedal
On Wed, Feb 19, 2014, at 9:58, Miquel Esplà wrote:
Hi Per,
2014-02-18 21:37 GMT+01:00 Per Tunedal per.tune...@operamail.com:
Hi Miquel, thank you. Looks like a good approach. Looking at the script: It runs GIZA++ in both directions to begin with? I just have to supply the bitext files?
Yes, you only need to provide the bitext files compressed with gzip.
But the script has some trouble finding the GIZA++ files:
per@Pers-debian:~/script$ sh bitextor-builddics.in sv fr /home/per/corpora/OpenOffice3.fr-sv.sv /home/per/corpora/OpenOffice3.fr-sv.fr /home/per/block_world_corpus/GIZA++_wordlists/bitextor/OpenOffice3.gizadict.sv-fr
TOKENISING THE CORPUS...
Can't open perl script __PREFIX__/share/bitextor/utils/tokenizer.perl: Filen eller katalogen finns inte
gzip: /home/per/corpora/OpenOffice3.fr-sv.sv: not in gzip format
Can't open perl script __PREFIX__/share/bitextor/utils/tokenizer.perl: Filen eller katalogen finns inte
gzip: /home/per/corpora/OpenOffice3.fr-sv.fr: not in gzip format
LOWERCASING THE CORPUS...
FILTERING OUT TOO LONG SENTENCES...
FORMATTING THE CORPUS FOR PROCESSING...
mv: kan inte ta status på /tmp/tempcorpuspreproc.QP7LM/corpus.clean.sv_corpus.clean.fr.snt: Filen eller katalogen finns inte
mv: kan inte ta status på /tmp/tempcorpuspreproc.QP7LM/corpus.clean.fr_corpus.clean.sv.snt: Filen eller katalogen finns inte
mv: kan inte ta status på /tmp/tempcorpuspreproc.QP7LM/corpus.clean.sv.vcb: Filen eller katalogen finns inte
mv: kan inte ta status på /tmp/tempcorpuspreproc.QP7LM/corpus.clean.fr.vcb: Filen eller katalogen finns inte
BUILDING WORD CLASSES FOR IMPROVING ALIGNMENT...
CHECKING COOCURRENCE OF WORDS IN THE CORPUS...
BUILDING PROBABILISTIC DICTIONARIES...
FILTERING DICTIONARY...
egrep: /tmp/tempgizamodel.RlVVs/fr.vcbegrep: /tmp/tempgizamodel.RlVVs/sv.vcb: Filen eller katalogen finns inte
: Filen eller katalogen finns inte
bitextor-builddics.in: 173: __PYTHON__: not found
DONE!
I'm sorry, I didn't explain it well: as I said, bitextor-builddics.in is only the template of the script. What I didn't say is that you need to compile the project to get the true script. If you have a look at the code of the template, you will see that there are many variables starting and ending with __ (such as __PREFIX__). These variables are replaced by the corresponding paths at compilation time. So, to use the script, you have to download the whole trunk directory, and then run:
./autogen.sh
./configure
make
make install
As you know, you can use the option --prefix=LOCALDIR when running ./configure to install bitextor in a specific path (for example LOCALDIR could be /home/per/local/). Best, Miquel.
Yours, Per Tunedal
On Tue, Feb 18, 2014, at 12:38, Miquel Esplà wrote:
Hi Per, I think that the explanation on this website: http://rali.iro.umontreal.ca/rali/?q=en/node/1325 is quite useful. It helps a lot to understand the structure and the content of each file generated by OmegaT.
About the script: in the last release of bitextor we included a script called bitextor-builddics (you can find the template of this script here: https://svn.code.sf.net/p/bitextor/code/trunk/bitextor-builddics.in) which uses GIZA++ to obtain a plain text bilingual dictionary, but only including pairs of words fulfilling: a) both words occur at least 10 times in the corpus, and b) the harmonic mean of their probabilities in both probabilistic dictionaries (S -> T and T -> S) is higher than 0.2. If you want to use this, I recommend using the version in the trunk, which fixes some minor bugs still present in the release. Best, Miquel.
2014-02-17 14:21 GMT+01:00 Per Tunedal per.tune...@operamail.com:
Hi Miquel, thank you for your informative answer. Indeed I needed to create a co-occurrence file. I successfully created such a file with snt2cooc.out, and GIZA++ has run successfully and made a lot of files in my home directory (!). How do I redirect the output to a more suitable folder? -outputpath? Where can I find an explanation of the content of the files? I suppose the dictionary is in the translation table *.t3.final. Is there any convenient way to extract plain text dictionaries (without going one step further and using Moses)? Is there some script available to decode the translation table using the vocabulary files *.vcb?
Yours, Per Tunedal
On Mon, Feb 17, 2014, at 11:08, Miquel Esplà wrote
Re: [Apertium-stuff] Using GIZA++
Hi Miquel, yes, that was what I had in mind. But it doesn't help much, though. The next dependency is some Python library for Levenshtein distance ... There must be an easier way to test the script and see if it gives me something useful. I'm not interested in testing the other functions right now. Just compile the script somehow? Or just hard-code paths into the script?
Yours, Per Tunedal
On Thu, Feb 20, 2014, at 10:46, Miquel Esplà wrote:
Hi Per, I didn't try to compile with the version of Python you are using, but you can try to change this condition in configure.ac to do so. Cheers, Miquel.
2014-02-20 10:19 GMT+01:00 Per Tunedal per.tune...@operamail.com:
Hi Miquel, Thanks for your thorough answer. I've tried ./autogen.sh. I had to install httrack, but then got:
checking for a Python interpreter with version >= 2.7... none
configure: error: You don't have Python 2.7 or later installed.
Is it really necessary to update Python? It appears that the configure script demands Python >= 2.7. In Debian Squeeze, Python 2.6.6 is the default. I'm afraid of messing things up if I install Python manually rather than with Synaptic. Lots of things depend on Python. And upgrading to Debian Wheezy might mess things up as well ...
Yours, Per Tunedal
On Wed, Feb 19, 2014, at 9:58, Miquel Esplà wrote:
Hi Per,
2014-02-18 21:37 GMT+01:00 Per Tunedal per.tune...@operamail.com:
Hi Miquel, thank you. Looks like a good approach. Looking at the script: It runs GIZA++ in both directions to begin with? I just have to supply the bitext files?
Yes, you only need to provide the bitext files compressed with gzip.
But the script has some trouble finding the GIZA++ files:
per@Pers-debian:~/script$ sh bitextor-builddics.in sv fr /home/per/corpora/OpenOffice3.fr-sv.sv /home/per/corpora/OpenOffice3.fr-sv.fr /home/per/block_world_corpus/GIZA++_wordlists/bitextor/OpenOffice3.gizadict.sv-fr
TOKENISING THE CORPUS...
Can't open perl script __PREFIX__/share/bitextor/utils/tokenizer.perl: Filen eller katalogen finns inte
gzip: /home/per/corpora/OpenOffice3.fr-sv.sv: not in gzip format
Can't open perl script __PREFIX__/share/bitextor/utils/tokenizer.perl: Filen eller katalogen finns inte
gzip: /home/per/corpora/OpenOffice3.fr-sv.fr: not in gzip format
LOWERCASING THE CORPUS...
FILTERING OUT TOO LONG SENTENCES...
FORMATTING THE CORPUS FOR PROCESSING...
mv: kan inte ta status på /tmp/tempcorpuspreproc.QP7LM/corpus.clean.sv_corpus.clean.fr.snt: Filen eller katalogen finns inte
mv: kan inte ta status på /tmp/tempcorpuspreproc.QP7LM/corpus.clean.fr_corpus.clean.sv.snt: Filen eller katalogen finns inte
mv: kan inte ta status på /tmp/tempcorpuspreproc.QP7LM/corpus.clean.sv.vcb: Filen eller katalogen finns inte
mv: kan inte ta status på /tmp/tempcorpuspreproc.QP7LM/corpus.clean.fr.vcb: Filen eller katalogen finns inte
BUILDING WORD CLASSES FOR IMPROVING ALIGNMENT...
CHECKING COOCURRENCE OF WORDS IN THE CORPUS...
BUILDING PROBABILISTIC DICTIONARIES...
FILTERING DICTIONARY...
egrep: /tmp/tempgizamodel.RlVVs/fr.vcbegrep: /tmp/tempgizamodel.RlVVs/sv.vcb: Filen eller katalogen finns inte
: Filen eller katalogen finns inte
bitextor-builddics.in: 173: __PYTHON__: not found
DONE!
I'm sorry, I didn't explain it well: as I said, bitextor-builddics.in is only the template of the script. What I didn't say is that you need to compile the project to get the true script. If you have a look at the code of the template, you will see that there are many variables starting and ending with __ (such as __PREFIX__). These variables are replaced by the corresponding paths at compilation time. So, to use the script, you have to download the whole trunk directory, and then run:
./autogen.sh
./configure
make
make install
As you know, you can use the option --prefix=LOCALDIR when running ./configure to install bitextor in a specific path (for example LOCALDIR could be /home/per/local/). Best, Miquel.
Yours, Per Tunedal
On Tue, Feb 18, 2014, at 12:38, Miquel Esplà wrote:
Hi Per, I think that the explanation on this website: http://rali.iro.umontreal.ca/rali/?q=en/node/1325 is quite useful. It helps a lot to understand the structure and the content of each file generated by OmegaT.
About the script: in the last release of bitextor we included a script called bitextor-builddics (you can find the template of this script here: https://svn.code.sf.net/p/bitextor/code/trunk/bitextor-builddics.in) which uses GIZA++ to obtain a plain text bilingual dictionary, but only including pairs of words fulfilling: a) both words occur at least 10 times in the corpus, and b) the harmonic mean of their probabilities in both probabilistic dictionaries (S -> T and T -> S) is higher than 0.2. If you want to use this, I recommend using
Re: [Apertium-stuff] Using GIZA++
Hi Miquel, thank you. I will give it a try.
Yours, Per Tunedal
On Thu, Feb 20, 2014, at 19:48, Miquel Esplà wrote:
Well, of course you can try to replace the variables with paths manually (as I told you, you would have to replace the variables starting and ending with __). I don't think I can help you much more because I never did this, but I'm sure that with a bit of patience you will do it ;) Good luck! Cheers, Miquel.
2014-02-20 14:11 GMT+01:00 Per Tunedal per.tune...@operamail.com:
Hi Miquel, yes, that was what I had in mind. But it doesn't help much, though. The next dependency is some Python library for Levenshtein distance ... There must be an easier way to test the script and see if it gives me something useful. I'm not interested in testing the other functions right now. Just compile the script somehow? Or just hard-code paths into the script?
Yours, Per Tunedal
On Thu, Feb 20, 2014, at 10:46, Miquel Esplà wrote:
Hi Per, I didn't try to compile with the version of Python you are using, but you can try to change this condition in configure.ac to do so. Cheers, Miquel.
2014-02-20 10:19 GMT+01:00 Per Tunedal per.tune...@operamail.com:
Hi Miquel, Thanks for your thorough answer. I've tried ./autogen.sh. I had to install httrack, but then got:
checking for a Python interpreter with version >= 2.7... none
configure: error: You don't have Python 2.7 or later installed.
Is it really necessary to update Python? It appears that the configure script demands Python >= 2.7. In Debian Squeeze, Python 2.6.6 is the default. I'm afraid of messing things up if I install Python manually rather than with Synaptic. Lots of things depend on Python. And upgrading to Debian Wheezy might mess things up as well ...
Yours, Per Tunedal
On Wed, Feb 19, 2014, at 9:58, Miquel Esplà wrote:
Hi Per,
2014-02-18 21:37 GMT+01:00 Per Tunedal per.tune...@operamail.com:
Hi Miquel, thank you. Looks like a good approach. Looking at the script: It runs GIZA++ in both directions to begin with? I just have to supply the bitext files?
Yes, you only need to provide the bitext files compressed with gzip.
But the script has some trouble finding the GIZA++ files:
per@Pers-debian:~/script$ sh bitextor-builddics.in sv fr /home/per/corpora/OpenOffice3.fr-sv.sv /home/per/corpora/OpenOffice3.fr-sv.fr /home/per/block_world_corpus/GIZA++_wordlists/bitextor/OpenOffice3.gizadict.sv-fr
TOKENISING THE CORPUS...
Can't open perl script __PREFIX__/share/bitextor/utils/tokenizer.perl: Filen eller katalogen finns inte
gzip: /home/per/corpora/OpenOffice3.fr-sv.sv: not in gzip format
Can't open perl script __PREFIX__/share/bitextor/utils/tokenizer.perl: Filen eller katalogen finns inte
gzip: /home/per/corpora/OpenOffice3.fr-sv.fr: not in gzip format
LOWERCASING THE CORPUS...
FILTERING OUT TOO LONG SENTENCES...
FORMATTING THE CORPUS FOR PROCESSING...
mv: kan inte ta status på /tmp/tempcorpuspreproc.QP7LM/corpus.clean.sv_corpus.clean.fr.snt: Filen eller katalogen finns inte
mv: kan inte ta status på /tmp/tempcorpuspreproc.QP7LM/corpus.clean.fr_corpus.clean.sv.snt: Filen eller katalogen finns inte
mv: kan inte ta status på /tmp/tempcorpuspreproc.QP7LM/corpus.clean.sv.vcb: Filen eller katalogen finns inte
mv: kan inte ta status på /tmp/tempcorpuspreproc.QP7LM/corpus.clean.fr.vcb: Filen eller katalogen finns inte
BUILDING WORD CLASSES FOR IMPROVING ALIGNMENT...
CHECKING COOCURRENCE OF WORDS IN THE CORPUS...
BUILDING PROBABILISTIC DICTIONARIES...
FILTERING DICTIONARY...
egrep: /tmp/tempgizamodel.RlVVs/fr.vcbegrep: /tmp/tempgizamodel.RlVVs/sv.vcb: Filen eller katalogen finns inte
: Filen eller katalogen finns inte
bitextor-builddics.in: 173: __PYTHON__: not found
DONE!
I'm sorry, I didn't explain it well: as I said, bitextor-builddics.in is only the template of the script. What I didn't say is that you need to compile the project to get the true script. If you have a look at the code of the template, you will see that there are many variables starting and ending with __ (such as __PREFIX__). These variables are replaced by the corresponding paths at compilation time. So, to use the script, you have to download the whole trunk directory, and then run:
./autogen.sh
./configure
make
make install
As you know, you can use the option --prefix=LOCALDIR when running ./configure to install bitextor in a specific path (for example LOCALDIR could be /home/per/local/). Best, Miquel.
Yours, Per Tunedal
On Tue, Feb 18, 2014, at 12:38, Miquel Esplà wrote:
Hi Per, I think that the explanation on this website: http://rali.iro.umontreal.ca/rali/?q=en/node/1325 is quite useful. It helps a lot to understand the structure and the content of each file generated by OmegaT. About the script
Re: [Apertium-stuff] The beginnings of an Icelandic - Russian dictionary for Apertium
Hi, it might be a good idea to store the dictionary somewhere else if you would like it to be publicly available. At the moment, access is restricted.
Yours, Per Tunedal
On Mon, Feb 17, 2014, at 19:53, Ingibjorg Elsa Bjornsdottir wrote:
Hi there Apertium community, I have created an Icelandic - Russian wordlist/dictionary in Google Docs. You are all welcome to contribute or to send me any ideas you may have. The path is the following: https://docs.google.com/spreadsheet/ccc?key=0AtVcoB9lkZjndGRqMVJqWnBUbTdlekRSSVFiSDRNbUE&usp=sharing
Kindest regards from southern Iceland, Ingibjorg Elsa Bjornsdottir (Ingella), Selfoss, Southern Iceland.
___ Apertium-stuff mailing list Apertium-stuff@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/apertium-stuff
Re: [Apertium-stuff] Using GIZA++
Hi Miquel, thank you. Looks like a good approach. Looking at the script: does it run GIZA++ in both directions to begin with? Do I just have to supply the bitext files? But the script has some trouble finding the GIZA++ files: per@Pers-debian:~/script$ sh bitextor-builddics.in sv fr /home/per/corpora/OpenOffice3.fr-sv.sv /home/per/corpora/OpenOffice3.fr-sv.fr /home/per/block_world_corpus/GIZA++_wordlists/bitextor/OpenOffice3.giz adict.sv-fr TOKENISING THE CORPUS... Can't open perl script __PREFIX__/share/bitextor/utils/tokenizer.perl: Filen eller katalogen finns inte [No such file or directory] gzip: /home/per/corpora/OpenOffice3.fr-sv.sv: not in gzip format Can't open perl script __PREFIX__/share/bitextor/utils/tokenizer.perl: Filen eller katalogen finns inte gzip: /home/per/corpora/OpenOffice3.fr-sv.fr: not in gzip format LOWERCASING THE CORPUS... FILTERING OUT TOO LONG SENTENCES... FORMATTING THE CORPUS FOR PROCESSING... mv: kan inte ta status på [cannot stat] /tmp/tempcorpuspreproc.QP7LM/corpus.clean.sv_corpus.clean.fr.snt: Filen eller katalogen finns inte mv: kan inte ta status på /tmp/tempcorpuspreproc.QP7LM/corpus.clean.fr_corpus.clean.sv.snt: Filen eller katalogen finns inte mv: kan inte ta status på /tmp/tempcorpuspreproc.QP7LM/corpus.clean.sv.vcb: Filen eller katalogen finns inte mv: kan inte ta status på /tmp/tempcorpuspreproc.QP7LM/corpus.clean.fr.vcb: Filen eller katalogen finns inte BUILDING WORD CLASSES FOR IMPROVING ALIGNMENT... CHECKING COOCURRENCE OF WORDS IN THE CORPUS... BUILDING PROBABILISTIC DICTIONARIES... FILTERING DICTIONARY... egrep: /tmp/tempgizamodel.RlVVs/fr.vcbegrep: /tmp/tempgizamodel.RlVVs/sv.vcb: Filen eller katalogen finns inte : Filen eller katalogen finns inte bitextor-builddics.in: 173: __PYTHON__: not found DONE! Yours, Per Tunedal On Tue, Feb 18, 2014, at 12:38, Miquel Esplà wrote: Hi Per, I think that the explanation on this website: http://rali.iro.umontreal.ca/rali/?q=en/node/1325 is quite useful.
It helps a lot to understand the structure and the content of each file generated by OmegaT. About the script, in the last release of bitextor we included a script called bitextor-builddics (you can find the template of this script here: https://svn.code.sf.net/p/bitextor/code/trunk/bitextor-builddics.in), which uses GIZA++ to obtain a plain text bilingual dictionary, but only including pairs of words fulfilling: a) both words occur at least 10 times in the corpus, and b) the harmonic mean of their probabilities in both probabilistic dictionaries (S - T and T - S) is higher than 0.2. If you want to use this, I recommend that you use the version in the trunk, which fixes some minor bugs still present in the release. Best, Miquel. 2014-02-17 14:21 GMT+01:00 Per Tunedal per.tune...@operamail.com: Hi Miquel, thank you for your informative answer. Indeed, I needed to create a co-occurrence file. I successfully created such a file with snt2cooc.out, and GIZA++ has run successfully and made a lot of files in my home directory (!). How do I redirect the output to a more suitable folder? -outputpath? Where can I find an explanation of the content of the files? I suppose the dictionary is in the translation table *.t3.final. Is there any convenient way to extract plain text dictionaries (without going one step further and using Moses)? Is there some script available to decode the translation table using the vocabulary files *.vcb? Yours, Per Tunedal On Mon, Feb 17, 2014, at 11:08, Miquel Esplà wrote: Hi Per, if I am not wrong, depending on how you compile GIZA++, it can generate the co-occurrence files on the fly during alignment, or you may need to create them before running the alignment. Actually, I think that, with the standard compilation, you are in the second case. Have a look here: https://code.google.com/p/giza-pp/issues/detail?id=9 I hope the link will be helpful! Cheers, Miquel.
2014-02-17 10:30 GMT+01:00 Per Tunedal per.tune...@operamail.com: Hi, I tried the procedure described at http://wiki.apertium.org/wiki/Using_GIZA%2B%2B to get a rough dictionary, but encountered the following error in the last step: ERROR: NO COOCURRENCE FILE GIVEN! Is one step missing in the procedure? Yours, Per Tunedal
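The two filtering criteria Miquel describes can be sketched as follows, assuming the per-direction probabilities and the corpus frequencies of each word have already been read from the GIZA++ output. This is not bitextor-builddics's actual code; the class and function names and the example words are hypothetical. The harmonic mean of the two probabilities p and q is 2pq / (p + q).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    source: str
    target: str
    p_st: float  # P(target | source), from the S - T probabilistic dictionary
    p_ts: float  # P(source | target), from the T - S probabilistic dictionary


def harmonic_mean(p: float, q: float) -> float:
    """2pq / (p + q), defined as 0 when both probabilities are 0."""
    return 0.0 if p + q == 0 else 2 * p * q / (p + q)


def filter_pairs(candidates, counts_src, counts_tgt, min_count=10, threshold=0.2):
    """Keep pairs whose words both occur >= min_count times and whose harmonic mean > threshold."""
    kept = []
    for c in candidates:
        # Criterion a): both words occur at least `min_count` times in the corpus.
        if counts_src.get(c.source, 0) < min_count or counts_tgt.get(c.target, 0) < min_count:
            continue
        # Criterion b): harmonic mean of both directional probabilities is above the threshold.
        if harmonic_mean(c.p_st, c.p_ts) > threshold:
            kept.append((c.source, c.target))
    return kept
```

The harmonic mean is pulled down by the smaller of the two probabilities, so a pair only survives if each word is a reasonably likely translation of the other in both directions: with the made-up values p = 0.9 and q = 0.1 it is 0.18, below the 0.2 threshold.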
[Apertium-stuff] Using GIZA++
Hi, I tried the procedure described at http://wiki.apertium.org/wiki/Using_GIZA%2B%2B to get a rough dictionary, but encountered the following error in the last step: ERROR: NO COOCURRENCE FILE GIVEN! Is one step missing in the procedure? Yours, Per Tunedal
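As background to the error: a co-occurrence file lists which source and target word ids appear together in at least one sentence pair. The Python sketch below illustrates that idea, assuming the .snt layout of three lines per sentence pair (a count, the source word ids, the target word ids). It is not a drop-in replacement for snt2cooc.out, whose real output may differ in detail (for example, in how the empty NULL word is handled), so use that tool for actual runs. The function names are hypothetical.

```python
def read_snt(text: str):
    """Yield (source_ids, target_ids) pairs from GIZA++ .snt text.

    Assumed layout: three lines per sentence pair, namely a repetition count,
    the source sentence as word ids and the target sentence as word ids.
    """
    lines = text.splitlines()
    for i in range(0, len(lines) - 2, 3):
        source = [int(tok) for tok in lines[i + 1].split()]
        target = [int(tok) for tok in lines[i + 2].split()]
        yield source, target


def cooccurrences(pairs) -> list:
    """Every (source_id, target_id) that appears together in at least one sentence pair."""
    seen = set()
    for source, target in pairs:
        for s in set(source):
            for t in set(target):
                seen.add((s, t))
    return sorted(seen)


def format_cooc(cooc) -> str:
    """One 'source_id target_id' pair per line."""
    return "".join(f"{s} {t}\n" for s, t in cooc)
```

Only pairs in this set are candidates for alignment, which is why GIZA++ needs the file before it can build its translation tables.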
Re: [Apertium-stuff] Using GIZA++
Hi Miquel, thank you for your informative answer. Indeed, I needed to create a co-occurrence file. I successfully created such a file with snt2cooc.out, and GIZA++ has run successfully and made a lot of files in my home directory (!). How do I redirect the output to a more suitable folder? -outputpath? Where can I find an explanation of the content of the files? I suppose the dictionary is in the translation table *.t3.final. Is there any convenient way to extract plain text dictionaries (without going one step further and using Moses)? Is there some script available to decode the translation table using the vocabulary files *.vcb? Yours, Per Tunedal On Mon, Feb 17, 2014, at 11:08, Miquel Esplà wrote: Hi Per, if I am not wrong, depending on how you compile GIZA++, it can generate the co-occurrence files on the fly during alignment, or you may need to create them before running the alignment. Actually, I think that, with the standard compilation, you are in the second case. Have a look here: https://code.google.com/p/giza-pp/issues/detail?id=9 I hope the link will be helpful! Cheers, Miquel. 2014-02-17 10:30 GMT+01:00 Per Tunedal per.tune...@operamail.com: Hi, I tried the procedure described at http://wiki.apertium.org/wiki/Using_GIZA%2B%2B to get a rough dictionary, but encountered the following error in the last step: ERROR: NO COOCURRENCE FILE GIVEN! Is one step missing in the procedure? Yours, Per Tunedal
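On Per's question about decoding *.t3.final with the *.vcb files: a minimal sketch follows. It assumes the layouts described in the GIZA++ README, which you should check against your own files: vocabulary lines of the form "id word count", translation-table lines of the form "s_id t_id P(t_id|s_id)", and id 0 standing for the empty (NULL) word. The function names and the `min_prob` cut-off are hypothetical.

```python
from typing import Dict, List, Tuple


def read_vcb(text: str) -> Dict[int, str]:
    """Map word id -> word from a GIZA++ .vcb file ('id word count' per line)."""
    vocab = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            vocab[int(parts[0])] = parts[1]
    return vocab


def decode_ttable(text: str, src_vocab: Dict[int, str], tgt_vocab: Dict[int, str],
                  min_prob: float = 0.0) -> List[Tuple[str, str, float]]:
    """Turn 's_id t_id prob' lines into (source_word, target_word, prob) triples."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        s, t, p = int(parts[0]), int(parts[1]), float(parts[2])
        if p < min_prob:
            continue
        # Assumption: id 0 is the NULL word; unknown ids are shown as <id>.
        source = src_vocab.get(s, "NULL" if s == 0 else f"<{s}>")
        target = tgt_vocab.get(t, f"<{t}>")
        entries.append((source, target, p))
    # Group by source word, most probable translation first.
    entries.sort(key=lambda e: (e[0], -e[2]))
    return entries
```

Printing each triple tab-separated gives a rough plain text dictionary; raising `min_prob` trades coverage for precision.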
[Apertium-stuff] Translation memories with Apertium
Hi, I've glanced through the GSOC ideas list and found: "Currently Apertium has support for translation memories, basically as follows: if an input sentence is found exactly in the translation memory, it is not machine translated but instead retrieved from the translation memory." That's very interesting. I've read the wiki page with great interest: http://wiki.apertium.org/wiki/Translation_memory Yours, Per Tunedal
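The exact-match behaviour quoted from the ideas list can be sketched in a few lines. This is an illustration, not Apertium's implementation: the memory is a plain dictionary, `machine_translate` stands in for a call to the translation pipeline, and the Swedish-Danish entry is made up.

```python
from typing import Callable, Dict, List


def translate_with_memory(sentences: List[str], memory: Dict[str, str],
                          machine_translate: Callable[[str], str]) -> List[str]:
    """Use the stored translation on an exact match, otherwise fall back to MT."""
    output = []
    for sentence in sentences:
        # Only surrounding whitespace is ignored; everything else must match exactly.
        key = sentence.strip()
        if key in memory:
            output.append(memory[key])
        else:
            output.append(machine_translate(sentence))
    return output


def fake_mt(sentence: str) -> str:
    """Hypothetical stand-in for the Apertium pipeline."""
    return "[MT] " + sentence


if __name__ == "__main__":
    memory = {"Spara filen.": "Gem filen."}  # hypothetical sv -> da entry
    print(translate_with_memory(["Spara filen.", "Stäng filen."], memory, fake_mt))
```

Because matching is exact, a sentence that differs by a single character falls through to machine translation; fuzzy matching would be the natural next step.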
Re: [Apertium-stuff] GSOC idea: improve support for non-standard input
Hi, I agree with Mikel. Per Tunedal On Wed, Feb 12, 2014, at 19:22, Mikel Forcada wrote: Tweet translation could be a task in itself. Mikel On 02/12/2014 05:26 PM, Francis Tyers wrote: I came up with another idea for GSOC, what do people think? Description: Machine translation systems, especially rule-based systems, are pretty fragile when it comes to non-standard input. Get a comma, space, apostrophe or hyphen in the wrong place and the output can come out all wrong. But we definitely want to be able to translate IRC, SMS, tweets and YouTube comments... This could possibly be merged with the accent/diacritic restoration task too. http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Improving_support_for_non-standard_text_input Note: We have two days left before the deadline. I'd encourage people to take a look at the ideas list and add anything you would be interested in mentoring. Alternatively, email the list about your idea and we will see about adding it. Fran -- Mikel L. Forcada (http://www.dlsi.ua.es/~mlf/) Departament de Llenguatges i Sistemes Informàtics Universitat d'Alacant E-03071 Alacant, Spain Phone: +34 96 590 9776 Fax: +34 96 590 9326
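One simple form of the support Fran describes is a normalisation pass that runs before analysis and repairs misplaced commas, spaces and apostrophes. The rules below are hypothetical examples chosen for this sketch, not part of Apertium. A real system would need language-specific rules; French typography, for instance, puts a space before "?" and "!", which this sketch would remove.

```python
import re

# Hypothetical clean-up rules for illustration only.
APOSTROPHES = str.maketrans({"\u2019": "'", "\u2018": "'", "\u00b4": "'", "`": "'"})
SPACE_BEFORE_PUNCT = re.compile(r"\s+([,.;:!?])")   # "word ," -> "word,"
MISSING_SPACE_AFTER_COMMA = re.compile(r",(?=[^\s\d])")  # "a,b" -> "a, b" (but keep "1,5")
MULTISPACE = re.compile(r"[ \t]{2,}")


def normalise(text: str) -> str:
    """Apply the clean-up rules in order and trim the result."""
    text = text.translate(APOSTROPHES)
    text = SPACE_BEFORE_PUNCT.sub(r"\1", text)
    text = MISSING_SPACE_AFTER_COMMA.sub(", ", text)
    text = MULTISPACE.sub(" ", text)
    return text.strip()
```

Keeping each repair as a separate, named rule makes it easy to switch rules on or off per language pair.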
Re: [Apertium-stuff] GSOC idea: improve support for non-standard input
Hey there! What happened to the iPad app? Yours, Per Tunedal On Wed, Feb 12, 2014, at 17:26, Francis Tyers wrote: --snip-- Note: We have two days left before the deadline. I'd encourage people to take a look at the ideas list and add anything you would be interested in mentoring. Alternatively, email the list about your idea and we will see about adding it. Fran
Re: [Apertium-stuff] GSOC idea: make an app for Iphone/Ipad
Hi Tino, I guess that means that the Apertium project doesn't own the code and cannot release it under any license other than the current one: GPL v2 or any later version. In that case the only solution would be a completely new application, wouldn't it? I'm not even sure whether such an application could use the Apertium online service. What about the dictionaries? What's the trouble with Apple's requirements for the App Store? Are all open source licences impossible to use? That would explain the absence of many good open source projects from their store. Yours, Per Tunedal On Wed, Feb 5, 2014, at 9:54, Tino Didriksen wrote: On 5 February 2014 07:46, Per Tunedal per.tune...@operamail.com wrote: what about going a bit commercial? Many companies use a dual licence model: GPL + proprietary (e.g. the small Swedish company that made MySQL started that way). That's not a trivial task. Apertium doesn't require copyright assignment, so you'd have to track down and get assent from each and every Apertium contributor to add a non-FOSS licence option. Plus all 3rd party tools, such as HFST and CG-3 (well, CG-3 already has the non-FOSS license option). -- Tino Didriksen
Re: [Apertium-stuff] GSOC idea: make an app for Iphone/Ipad
Hi, the point is that we won't reach the intended audience if the iOS app isn't available through the official App Store. I haven't studied the market, but the price would have to be low. With crowdfunding of the app, it might be possible to set the price to zero (free). Is anyone familiar with GOTEO http://goteo.org/ or some other crowdfunding platform? Yours, Per Tunedal BTW I noticed that GnuPG has raised quite a lot of money through GOTEO. We won't need that much, I suppose. On Wed, Feb 5, 2014, at 9:09, Mikel Artetxe wrote: On Wed, Feb 5, 2014 at 7:46 AM, Per Tunedal per.tune...@operamail.com wrote: Hi, what about going a bit commercial? Many companies use a dual licence model: GPL + proprietary (e.g. the small Swedish company that made MySQL started that way). I'm not the one who has to take this decision, but doing it just because stupid Apple doesn't want us to use GPL doesn't sound like a strong enough reason to me. I mean, I insist that it's not me who has to decide what license to use, but it's not Apple either, right? Would it do any harm if the project offered an iOS port for $10? And used the income for the development of Apertium? You probably wouldn't sell too many copies. Not for that price and without a good marketing campaign at least. Unfortunately, that would not fit into GSOC though. We would have to finance it some other way. GOTEO? http://goteo.org/ Or some other crowdfunding? Most iPad/iPhone users are the opposite of hackers: they are happy with the limitations of the system. An Apertium app for iPad must be accessible in the App Store to reach the audience. Yours, Per Tunedal On Wed, Feb 5, 2014, at 0:08, Jimmy O'Regan wrote: --snip-- First of all, the basic problem with an iOS port is that it would not be distributable through the App Store (its terms are GPL-incompatible).
This is the reason why an iOS port was not pursued in the past, and nothing has changed: see http://www.fsf.org/blogs/licensing/more-about-the-app-store-gpl-enforcement --snip-- -- Sefam Are any of the mentors around? jimregan yes, they're the ones trolling you
[Apertium-stuff] GSOC idea: make an app for Iphone/Ipad
Hi, last summer we got a nice app for Android devices, but that's no good for my iPad. Maybe it would be easy to make an app for iOS devices? Yours, Per Tunedal
Re: [Apertium-stuff] GSOC Idea: Take a language pair and make it state of the art
Hi, I agree with Lars. It's essential that it becomes VERY easy to contribute words. We need to enable thousands of people to make small contributions without effort. Yours, Per Tunedal On Tue, Feb 4, 2014, at 23:33, Lars Aronsson wrote: On 02/04/2014 10:47 PM, Francis Tyers wrote: Out of the 39 or so language pairs that we have in trunk/, only two or three could be considered to offer state of the art performance with [...] Any thoughts? --snip-- It seems much harder to make small, incremental improvements to the Swedish (-Danish) language pair of Apertium, and quite easy to cause chaos. I made a few contributions in August 2013, but then I got ideas for some radical changes, and I couldn't predict whether they were net improvements or whether they could have dangerous side effects. Apertium would benefit if the creation (and testing) of new paradigms were more clearly separated from adding new words to the dictionaries. These are two different roles that require different skills. Right now, everything is code that is submitted to SVN, which requires the programmer-like ability to edit large text files. Adding words to the dictionaries should be more at the level of simple wiki-editing skills. -- Lars Aronsson (l...@aronsson.se) Project Runeberg - free Nordic literature - http://runeberg.org/
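Lars's separation of roles can be illustrated with a small helper: a contributor supplies only a lemma, its invariant stem and the name of an existing paradigm, and the tool writes the monolingual dictionary entry in the usual `<e lm="..."><i>...</i><par n="..."/></e>` form. This is a hypothetical sketch, not an existing Apertium tool. The paradigm name "flick/a__n" and the rule that the stem must be a prefix of the lemma are simplifying assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Set


def make_entry(lemma: str, stem: str, paradigm: str) -> str:
    """Build one monolingual .dix entry: <e lm="lemma"><i>stem</i><par n="paradigm"/></e>."""
    e = ET.Element("e", lm=lemma)
    i = ET.SubElement(e, "i")
    i.text = stem
    ET.SubElement(e, "par", n=paradigm)
    return ET.tostring(e, encoding="unicode")


def add_word(lemma: str, stem: str, paradigm: str, known_paradigms: Set[str]) -> str:
    """Contributor-facing check: only existing paradigms may be used."""
    if paradigm not in known_paradigms:
        raise ValueError(f"unknown paradigm {paradigm!r}; ask a paradigm maintainer")
    if not lemma.startswith(stem):  # simplifying assumption: stem is a prefix of the lemma
        raise ValueError(f"stem {stem!r} is not a prefix of lemma {lemma!r}")
    return make_entry(lemma, stem, paradigm)


if __name__ == "__main__":
    print(add_word("flicka", "flick", "flick/a__n", {"flick/a__n"}))
```

Creating and testing paradigms stays a job for maintainers, while adding a word reduces to picking a lemma, a stem and an existing paradigm; ElementTree also takes care of XML escaping.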
Re: [Apertium-stuff] Unsupervised tagger training
Warning: There is not coarse tag for the fine tag '1888num' This is because of an incomplete tagset definition or a dictionary error Warning: There is not coarse tag for the fine tag '1865num' This is because of an incomplete tagset definition or a dictionary error Warning: There is not coarse tag for the fine tag '1880num' This is because of an incomplete tagset definition or a dictionary error Error: A new ambiguity class was found. I cannot continue. Word 'min' not found in the dictionary. New ambiguity class: {ABBR,PRNPOS} Take a look at the dictionary and at the training corpus. Then, retrain. make: *** [sv-da.prob] Fel 1 [Error 1] Yours, Per Tunedal On Thu, Jan 16, 2014, at 1:04, Francis Tyers wrote: On Wed, 15 Jan 2014 at 20:18 +0100, Per Tunedal wrote: Thank you Francis! I've corrected one thing myself: changed a coarse tag. But this one I don't understand: per@Pers-debian:~/apertium-sv-da$ make -f sv-da-unsupervised.make apertium-validate-tagger apertium-sv-da.sv.tsx apertium-tagger -t 8 \ sv-tagger-data/sv.dic \ sv-tagger-data/sv.crp \ apertium-sv-da.sv.tsx \ sv-da.prob; Calculating ambiguity classes... 97 states and 98 ambiguity classes Kupiec's initialization of transition and emission probabilities... Error: A new ambiguity class was found. I cannot continue. Word 'en' not found in the dictionary. New ambiguity class: {DETIND,NUM} Take a look at the dictionary and at the training corpus. Then, retrain. make: *** [sv-da.prob] Fel 1 Yours, Per Tunedal I just made some updates, adding some missing coarse tags to the TSX file. Looking at nn-nb, it seems like you might need to make further changes to sv-da-unsupervised.make to deal with compounds. Fran
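The two messages above fit together as follows. Each fine tag is mapped to a coarse tag (the warnings appear when no mapping exists). A word's ambiguity class is the set of coarse tags it can receive. Training stops when a corpus word has a class that never occurred in the dictionary-derived data, as with 'en' and {DETIND,NUM}. The Python sketch below is a much-simplified model of that check, not apertium-tagger's code: the real mapping is read from the .tsx file, while the rules and the dot-separated tag strings here are hypothetical.

```python
import fnmatch
from typing import Dict, FrozenSet, Iterable, List, Optional, Set, Tuple

# Hypothetical fine-to-coarse rules, loosely in the spirit of TSX <def-label> patterns
# (fine tags written here as dot-separated strings such as "det.ind.sg").
COARSE_RULES = [
    ("det.ind.*", "DETIND"),
    ("num", "NUM"),
    ("num.*", "NUM"),
    ("abbr", "ABBR"),
    ("prn.pos.*", "PRNPOS"),
]


def coarse_tag(fine: str) -> Optional[str]:
    """Map a fine tag to a coarse tag, or None when no rule matches (cf. the warnings)."""
    for pattern, coarse in COARSE_RULES:
        if fnmatch.fnmatchcase(fine, pattern):
            return coarse
    return None


def ambiguity_class(analyses: Iterable[str]) -> FrozenSet[str]:
    """The set of coarse tags a word can receive."""
    return frozenset(c for c in map(coarse_tag, analyses) if c is not None)


def new_classes(known: Set[FrozenSet[str]],
                corpus: Dict[str, List[str]]) -> List[Tuple[str, List[str]]]:
    """Corpus words whose ambiguity class never occurred in the dictionary-derived data."""
    problems = []
    for word, analyses in corpus.items():
        cls = ambiguity_class(analyses)
        if cls not in known:
            problems.append((word, sorted(cls)))
    return problems
```

In this model, the fix is to make the dictionary data and the training corpus agree, either by completing the tag mapping or by correcting the dictionary, and then to regenerate the data and retrain, which is what the error message suggests.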
Re: [Apertium-stuff] OT: SVN problem
Eureka! Debian defaults to using the GNOME keyring, and that didn't work for some reason. I've changed ~/.subversion/config by adding a new line: password-stores = (with an empty value). Now SVN works as usual. Yours, Per Tunedal On Thu, Jan 16, 2014, at 0:54, Francis Tyers wrote: On Wed, 15 Jan 2014 at 20:21 +0100, Per Tunedal wrote: Hi, I cannot commit from my new box: Could not authenticate to server: rejected Basic challenge I found the following in the wiki, but it doesn't help. Nothing changes. Could not authenticate to server: rejected Basic challenge This happens because you have checked out with http and not https; you need to switch as follows: $ svn switch --relocate http://svn.code.sf.net/p/apertium/svn/ https://svn.code.sf.net/p/apertium/svn/ Check your password and try checking out again, making sure that you use https:// instead of http://. Fran
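For reference, password-stores belongs in the [auth] section of ~/.subversion/config, and Subversion accepts the store names separated by spaces or commas. The read-only sketch below uses Python's standard configparser to report the current setting, assuming the file parses as plain INI; the function name is hypothetical. It deliberately does not write the file back, because configparser discards comments when saving.

```python
import configparser
import re
from typing import List, Optional


def password_stores(config_text: str) -> Optional[List[str]]:
    """Return the [auth] password-stores list: [] if set but empty, None if not set at all."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if not parser.has_option("auth", "password-stores"):
        return None  # Subversion falls back to its built-in default stores
    raw = parser.get("auth", "password-stores")
    # Store names may be separated by spaces or commas.
    return [name for name in re.split(r"[\s,]+", raw) if name]


if __name__ == "__main__":
    import os
    path = os.path.expanduser("~/.subversion/config")
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            print(password_stores(f.read()))
```

An empty list corresponds to Per's fix: no keyring is configured, so Subversion no longer tries the GNOME keyring.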
Re: [Apertium-stuff] Unsupervised tagger training
Hi Francis, thank you for fixing my typos. New error: per@Pers-debian:~/apertium-sv-da$ make -f sv-da-unsupervised.make Generating sv-tagger-data/sv.dic This may take some time. Please, take a cup of coffee and come back later. apertium-validate-dictionary apertium-sv-da.sv.dix apertium-validate-tagger apertium-sv-da.sv.tsx lt-expand apertium-sv-da.sv.dix | grep -v __REGEXP__ | grep -v :: |\ awk 'BEGIN{FS=::|:}{print $1 .;}' | apertium-destxt sv.dic.expanded lt-proc -a sv-da.automorf.bin sv.dic.expanded | \ apertium-filter-ambiguity apertium-sv-da.sv.tsx sv-tagger-data/sv.dic Error: (137): '#comment' tag unexpected. make: *** [sv-tagger-data/sv.dic] Fel 1 [Error 1] per@Pers-debian:~/apertium-sv-da$ make -f sv-da-unsupervised.make apertium-destxt sv-tagger-data/sv.crp.txt | lt-proc sv-da.automorf.bin sv-tagger-data/sv.crp apertium-validate-tagger apertium-sv-da.sv.tsx apertium-tagger -t 8 \ sv-tagger-data/sv.dic \ sv-tagger-data/sv.crp \ apertium-sv-da.sv.tsx \ sv-da.prob; Error: (137): '#comment' tag unexpected. make: *** [sv-da.prob] Fel 1 per@Pers-debian:~/apertium-sv-da$ What now? Yours, Per Tunedal On Fri, Jan 10, 2014, at 17:43, Francis Tyers wrote: Your XML is really messed up. I've tried to fix it, and now it validates, but you might want to check that it is doing the right thing. Try doing svn diff -r HEAD apertium-sv-da.sv.tsx to see the diff. Fran On Fri, 10 Jan 2014 at 17:20 +0100, Per Tunedal wrote: Hi Francis, I followed the advice in the wiki; maybe some explanation would be appropriate there. Now I've tried the makefile that was already present in apertium-sv-da and encountered new errors: Generating sv-tagger-data/sv.dic This may take some time. Please, take a cup of coffee and come back later.
apertium-validate-dictionary apertium-sv-da.sv.dix apertium-validate-tagger apertium-sv-da.sv.tsx apertium-sv-da.sv.tsx:471: parser error : Opening and ending tag mismatch: tagset line 119 and def-label /def-label ^ apertium-sv-da.sv.tsx:497: parser error : Opening and ending tag mismatch: tagger line 2 and tagset /tagset ^ apertium-sv-da.sv.tsx:499: parser error : Extra content at the end of the document forbid ^ make: *** [sv-tagger-data/sv.dic] Fel 1 I've looked into the tsx file and cannot understand what's wrong. There is a tagset label above line 497, and forbid shouldn't be anything strange, should it? Yours, Per Tunedal On Fri, Jan 10, 2014, at 16:14, Francis Tyers wrote: Hi Per, it seems like you've copied a tagger training makefile from a language that uses the metadix format. If you change .dixtmp1 to .dix it should work. Fran On Wed, 8 Jan 2014 at 16:13 +0100, Per Tunedal wrote: Hi, I've tried to follow the instructions in the wiki http://wiki.apertium.org/wiki/Unsupervised_tagger_training to train sv-da. I got several complaints from the compiler. When I moved the files to the apertium-sv-da folder I got a bit further, but I'm still stuck: per@Pers-debian:~/apertium-sv-da$ make -f sv-da-unsupervised.make make: *** Ingen regel för att skapa målet apertium-sv-da.sv.dixtmp1, som behövs till sv-tagger-data/sv.dic. Stannar. My translation: No rule to make target apertium-sv-da.sv.dixtmp1, needed by sv-tagger-data/sv.dic. Stop. Any idea? Yours, Per Tunedal
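The "Opening and ending tag mismatch" errors are well-formedness problems: a start tag closed by the wrong end tag, so everything after it is misplaced. A quick way to locate the first such error is sketched below using Python's standard xml.etree.ElementTree, whose ParseError carries a (line, column) position. This only checks well-formedness, not validity against the tagger DTD that apertium-validate-tagger presumably also applies, and Python's messages are worded differently from the ones above. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple, Union


def first_xml_error(xml_text: Union[str, bytes]) -> Optional[Tuple[int, int, str]]:
    """Return (line, column, message) for the first well-formedness error, or None if it parses."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # line is 1-based, column 0-based
        return line, column, str(err)
    return None


if __name__ == "__main__":
    import sys
    # Usage (file name is an example): python3 check_xml.py apertium-sv-da.sv.tsx
    with open(sys.argv[1], "rb") as f:  # bytes, so the XML declaration's encoding is honoured
        print(first_xml_error(f.read()) or "well-formed")
```

Because the first mismatch makes the parser's later complaints unreliable, it usually pays to fix only the first reported error and then rerun the check.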