Re: [Apertium-stuff] Translators on www.apertium.org
2010/4/19 Felipe Sánchez Martínez fsanc...@dlsi.ua.es:

> Hi all,
>
> Prompsit did not go down, why? Because the language pairs offered there are stable and tested. I would like to raise a question: should we offer translation between developing language pairs on the webpage? IMHO we shouldn't.

But what's the measure?

Released pairs? The ones at apertium.org have all had a release.

All language pairs that have reached version 1.0? That's rather arbitrary…

All that have had a thorough testvoc? All released pairs _should_ have this.

Should one simply let, say, half a year pass before putting a release on the server, to collect bug reports? How many people actually download the language packages, run lots of text through them, and then _report the bugs_?

It seems to me that a better solution is to use ScaleMT, and perhaps let those language pairs that we, for whatever reason, consider too untested run on a different server. Unless I completely misunderstood Victor's presentation last fall, with ScaleMT it should be possible to keep the web page going even when one server goes down (do you even need ScaleMT to do that?). That way developers can get quick feedback on what's wrong (oh, and apertium.org gets to offer more language pairs).

Of course, this assumes that there is the possibility of having yet another server…

best regards,
Kevin Brubeck Unhammer

___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
Re: [Apertium-stuff] issues with apertium service ?
2010/4/20 Francis Tyers fty...@prompsit.com:

> Friedel has noticed a change in the listPairs method, it doesn't seem to list pairs, is he doing anything wrong?
>
> <friedel1> spectie: Hi.
> <friedel1> spectie: Aware of any issues with your service at the moment?
> <friedel1> curl http://api.apertium.org/json/listPairs
> <friedel1> {"responseData":[],"responseDetails":null,"responseStatus":200}

Is api.apertium.org running apertium-service? (There the method is languagePairs, not listPairs.)

I can't test from my IP ;-)

-Kevin Unhammer
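For anyone scripting against the service, here is a minimal Python sketch of how a client might tell "the call succeeded but listed nothing" apart from a real error. It only parses the reply pasted above; the JSON quoting is restored by assumption, and the helper name `pairs_from_response` is made up for illustration. It does not contact api.apertium.org and says nothing about what a non-empty `responseData` looks like.

```python
import json

# The reply friedel1 pasted, with the JSON quoting restored (an assumption).
SAMPLE = '{"responseData":[],"responseDetails":null,"responseStatus":200}'


def pairs_from_response(body):
    """Return whatever is listed under responseData.

    Raises RuntimeError when the service reports a status other than 200,
    so an empty list always means "succeeded, but nothing was listed".
    """
    reply = json.loads(body)
    if reply.get("responseStatus") != 200:
        raise RuntimeError("service error: %r" % (reply.get("responseDetails"),))
    return reply.get("responseData") or []


if __name__ == "__main__":
    pairs = pairs_from_response(SAMPLE)
    print("no pairs listed" if not pairs else pairs)
```

With the pasted reply this reports that no pairs were listed, which matches what Friedel saw: the request itself succeeded.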
[Apertium-stuff] strange goings on in post-generation
Hi,

I tried making a post-generation dictionary with just one rule:

  <?xml version="1.0" encoding="iso-8859-1"?>
  <dictionary>
    <alphabet/>
    <sdefs>
      <sdef n="test"/>
    </sdefs>
    <section id="main" type="standard">
      <e>
        <p>
          <l><a/>e</l>
          <r>e</r>
        </p>
      </e>
    </section>
  </dictionary>

but I get slashed output when I try running it:

  $ echo '~el' | lt-proc -p foo.autopgen.bin
  e\/el

Is this a bug or am I missing something?

-- 
Kevin Brubeck Unhammer
[Apertium-stuff] Arch Linux PKGBUILD's now available for (almost) all released language pairs
Hi,

I just wanted to let people know that I've uploaded AUR packages for Arch Linux for all released language pairs in Apertium (except for is-en, which seems to require a newer version of apertium-pretransfer than what is in apertium-3.1.1). If anyone's running Arch Linux, I'd be very happy if they could give them a try and let me know where the bugs are hiding :-)

Also, in making the packages I discovered some problems with certain pairs; I had to apply the following patches to make these pairs compile:

apertium-es-ro:
  http://aur.archlinux.org/packages/apertium-es-ro/apertium-es-ro/trules.patch

apertium-oc-ca:
  http://aur.archlinux.org/packages/apertium-oc-ca/apertium-oc-ca/t1x.patch

apertium-oc-es:
  http://aur.archlinux.org/packages/apertium-oc-es/apertium-oc-es/oc-es.t1x.patch
  http://aur.archlinux.org/packages/apertium-oc-es/apertium-oc-es/es-oc.t1x.patch

...I'm not sure I got the logic here as intended; these should probably have a maintenance release.

For all the pairs, I had to modify the Makefile.am in this manner:

  - $(INSTALL_DATA) $(BASENAME).$(PREFIX2).t1x $(apertium_nn_nbdir)
  + $(INSTALL_DATA) $(BASENAME).$(PREFIX2).t1x $(DESTDIR)$(apertium_nn_nbdir)

I think $(DESTDIR) could be in the svn Makefile.am's without causing any trouble; it seems to make no difference except when creating these packages.

best regards,
Kevin B. Unhammer
Re: [Apertium-stuff] maintenance release of lttoolbox and Apertium
2010/9/21 Francis Tyers fty...@prompsit.com:

> For some reason, we had versioned Apertium and lttoolbox as 3.2 in SVN, but never got around to making a 3.2 release. There have been some minor bugfixes and improvements -- fixing an issue in pretransfer and updating the DTDs, and I think it is worth making a 3.2 release -- not least because Unhammer wants to release apertium-nn-nb 0.7.0 ;)

Thanks =D

Also, the code for append was added to interchunk/postchunk.cc (it was in the DTD's but not in the code).

-Kevin
Re: [Apertium-stuff] Bug in apertium-en-es
2010/10/6 Miquel Esplà miqueles...@gmail.com:

> Hi everybody,
>
> I've found a problem with the version of apertium-en-es in the SVN (https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-en-es) in the release 25956. When I try to translate an English text containing a $ symbol, the symbol disappears in the translation. I've tried to translate a file with the only sentence "hello $ world" and the result is: "hello world". When I tried the translation from Spanish to English it worked, but from English to Spanish it fails.
>
> I am using lttoolbox-3.2.0 and apertium-3.2.0 and the version of apertium-es-en in the SVN. Can anybody help, please?
>
> Cheers,
> Miquel.

If you add $ to the <alphabet/>, it will work (and $ will be marked unknown if you don't use -u). But I'm not sure whether this causes other problems.

best regards,
Kevin Brubeck Unhammer
Re: [Apertium-stuff] News from the mentor summit about GCI -- make tasks specific/Taskset 1: crossdics
2010/10/27 Jacob Nordfalk jacob.nordf...@gmail.com:

> I've looked at http://wiki.apertium.org/wiki/Ideas_for_Google_Code-in
>
> Do you really think anyone can:
>
> 1) translate a text of 34,268 bytes (the new language pair HOWTO) into another language
> 2) go through it for a new pair of languages
> 3) when finished, upload to the Incubator
>
> in 2-3 HOURS!??!??
>
> Well, I've tried that task when I started out. I might be extraordinarily slow, but just doing step 1) would take me at least half a day for Esperanto.
>
> Same goes for the other proposals: these <=18 age students must be really bright, but in general I would multiply all your estimates by a factor of 3.
>
> Here is a proposal for what I would consider a realistic task for GCI:
>
> Add 50 nouns to apertium-sv-da. Check that the words work in both directions (from Swedish to Danish and from Danish to Swedish).
> Time: 14 hours (install and compile: 4 hours. Understand the format of the 3 .dix files to edit: 2 hours. Adding the words: 4 hours. Checking translation in both directions and fixing problems: 4 hours).

The time estimates do seem rather low, yes. However, I think they're supposed to reflect only the work on that specific task (students can work on several tasks, so they won't install apertium for each one...)

The wiki page also does say "The time column gives the minimum estimated amount of time that should be spent on the task. It does not include time taken to install / set up apertium." (now boldfaced, as I missed it the first time too)

-Kevin
Re: [Apertium-stuff] diccionarios de Apertium
jgime...@lsi.upc.edu writes:

[...]

> On 24/11/10 17:15, Jesús Giménez wrote:
>
> For now, I've been taking a look at it and I think the simplest thing will be to use apertium-dixtools to read the .dix files.
>
> Needless to say, any suggestion from you will be very welcome!
>
> Many thanks,
> jesus
>
> PS: by the way, when checking out all of Apertium from Subversion I got an encoding problem --
>
> svn: Can't convert string from 'UTF-8' to native encoding:
> svn: apertium/apertium-nn-nb/dev/dansknorsk-h?\195?\184gnorsk-todo.dix

(Sorry for replying in English)

Does the problem only occur with this file? That is only a scratch file which should not compile in any case (it does not even validate); in general, files in dev folders are likely to have errors...

best regards,
Kevin Brubeck Unhammer
Re: [Apertium-stuff] Compound words and dix format
Francis Tyers fty...@prompsit.com writes:

> Now we have the java compound word implementation ported to C++ we can probably consider this 'de facto' how we are going to do compounds in lttoolbox -- it is _in use_ and there have been _no alternatives_.
>
> So it is probably worth looking at how we are going to represent this nicely in the .dix format. At the moment we use two 'special' symbols:
>
>   <sdef n="compound-only-L" c="for a form that can only appear on the L"/>
>   <sdef n="compound-R" c="for a form that can only appear on the R, or as a word on its own"/>
>
> I propose making a new element <c> for compound, and having one attribute r for restriction.
>
>   <s n="compound-only-L"/> would be replaced with <c r="L"/>
>
> and
>
>   <s n="compound-R"/> would be replaced with <c r="R"/>

I think it would be better if elements with <c r="R"/> are, like <c r="L"/>, compound-only. As the examples below show, an element marked <s n="compound-R"/> now allows use both in compounds and outside compounds, while <s n="compound-only-L"/> marks a path that's only reachable in compounds. I think new users would find it less confusing if the two mean the same thing, even though it requires a slightly more explicit dix file.

So instead of

  <e><p><l>plast</l><r>plast<s n="n"/><s n="m"/><s n="sg"/><s n="ind"/><c r="L"/></r></p></e>
  <e><p><l>plast</l><r>plast<s n="n"/><s n="m"/><s n="sg"/><s n="ind"/></r></p></e>
  <e><p><l>kortet</l><r>kort<s n="n"/><s n="nt"/><s n="sg"/><s n="def"/><c r="R"/></r></p></e>

you would have to have

  <e><p><l>plast</l><r>plast<s n="n"/><s n="m"/><s n="sg"/><s n="ind"/><c r="L"/></r></p></e>
  <e><p><l>plast</l><r>plast<s n="n"/><s n="m"/><s n="sg"/><s n="ind"/></r></p></e>
  <e><p><l>kortet</l><r>kort<s n="n"/><s n="nt"/><s n="sg"/><s n="def"/><c r="R"/></r></p></e>
  <e><p><l>kortet</l><r>kort<s n="n"/><s n="nt"/><s n="sg"/><s n="def"/></r></p></e>

(Note the beautiful symmetry.)

The original reason for having this difference was that we so far have no examples of forms that can be compound-R but not words on their own, so having those extra identical lines means longer dix files.

However, lttoolbox has this wonderful feature called pardefs :) So what the line for kortet really looks like is this:

  <e><p><l>kortet</l><r>kort<s n="n"/><s n="nt"/><s n="sg"/><s n="def"/></r></p><par n="cp-R"/></e>

where

  <pardef n="cp-R">
    <!-- can appear in compounds: -->
    <e><p><l></l><r><c r="R"/></r></p></e>
    <!-- can appear as a word on its own: -->
    <e><p><l></l><r></r></p></e>
  </pardef>

So, if we're deciding on specifications, that's the only thing I'd like to see changed.

-Kevin
-- 
Sent from my Emacs
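To make the proposed symmetry concrete, here is a toy Python model of the marking scheme. It is not how lttoolbox analyses compounds; it assumes only that every non-final part of a compound needs an L-marked entry, the final part needs an R-marked entry, and a word on its own needs an unmarked entry. The role letter "W" and all function names are invented for the sketch.

```python
# Toy lexicon: surface form -> set of roles it has entries for.
#   "W" = unmarked entry (word on its own)
#   "L" = compound-only left part, "R" = compound-only right part
LEXICON = {
    "plast": {"W", "L"},
    "kortet": {"W", "R"},
}


def splits(word):
    """Yield every way of cutting `word` into forms found in LEXICON."""
    if not word:
        yield []
        return
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in LEXICON:
            for rest in splits(word[i:]):
                yield [head] + rest


def analyses(word):
    """Return the segmentations that the role marks allow."""
    accepted = []
    for parts in splits(word):
        if len(parts) == 1:
            # A single part must have an unmarked entry.
            if "W" in LEXICON[parts[0]]:
                accepted.append(parts)
        elif (all("L" in LEXICON[p] for p in parts[:-1])
              and "R" in LEXICON[parts[-1]]):
            accepted.append(parts)
    return accepted


if __name__ == "__main__":
    for w in ("plastkortet", "kortetplast", "kortet"):
        print(w, analyses(w))
```

Under this reading, removing "W" from the entry for kortet would make it usable only as the last part of a compound, which is exactly the "compound-only" meaning proposed for both marks.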
Re: [Apertium-stuff] Compound words and dix format
Francis Tyers fty...@prompsit.com writes:

> Hi!
>
> The problem with this is that there are so many different metadix formats that it will be impossible to come up with one that covers them all. For example, if I remember correctly, how the alt works is different in es-pt and in oc-es. I think it was decided that it was desirable to have them functioning differently, or at least that a unified format would require substantial changes in either language pair -- changes that without some push (and let's face it, cash) are not going to get made.
>
> On the other hand, implementing compound words gives us the chance to strike while the iron is hot! We can make a fairly innocuous change (any language pair that does not have compounding will be unaffected) before getting a plethora of different options, and thus avoid the metadix problem for another set of issues.
>
> Btw, thinking about metadix I have some probably unpopular ideas that would preclude any standardisation. I think that maybe we should not have one format, but rather many _codified_ formats depending on the language (group). For example, how to include a verb would be different in Tajik and Dutch, because different things are important.
>
> Unnecessary examples:
>
>   <e lm="aanzitten"><par n="z/itten__vblex" prefix="aan" pp="aangezeten"/></e>
>
> Giving:
>
>   <e lm="aanzitten"><i>aanz</i><par n="aanz/itten__vblex_sep"/></e>
>   <e lm="aanzitten"><p><l>z</l><r>aanz</r></p><par n="z/itten#_aan__vblex_sep"/><p><l><b/>aan</l><r></r></p></e>
>   <e lm="aanzitten"><p><l>aangezeten</l><r>aanzitten</r></p><par n="gesproken__vblex_sep"/></e>
>
> Or in Tajik:
>
>   <e lm="харидан"><par n="кард/ан__vblex" stem1="харид" stem2="хар"/></e>

In the unification proposal from http://wiki.apertium.org/wiki/Unification_of_metadix_and_parametrized_dictionaries#A_unifying_proposal the calls would look like

  <e lm="aanzitten"><par n="z/itten__vblex" prms="prefix='aan' pp='aangezeten'"/></e>

and

  <e lm="харидан"><par n="кард/ан__vblex" prms="stem1='харид' stem2='хар'"/></e>

Are there good reasons not to go with that kind of syntax?

-- 
Kevin Brubeck Unhammer
[Apertium-stuff] modify-case aa on uppercased input
Hi,

Is there a bug in

  <modify-case>
    <clip pos="1" side="tl" part="lemh"/>
    <lit v="aa"/>
  </modify-case>

when the input is all uppercase, or am I using it wrong?

  wget http://apertium.codepad.org/GdrOe3nL/raw.txt -O problem.t1x
  wget http://apertium.codepad.org/wo597sse/raw.txt -O problem.dix
  lt-comp lr problem.dix problem.dix.bin
  apertium-preprocess-transfer problem.t1x problem.t1x.bin
  echo '^GUOKTENum$' | apertium-transfer problem.t1x problem.t1x.bin problem.dix.bin

gives

  ^detdetqnt{^tOdetqnt$}$

whereas I was expecting to see

  ^detdetqnt{^todetqnt$}$

-- 
best regards,
Kevin Brubeck Unhammer
Re: [Apertium-stuff] modify-case aa on uppercased input
Francis Tyers fty...@prompsit.com writes:

> On Wed, 5 Jan 2011 at 09:32 +0100, Kevin Brubeck Unhammer wrote:
>> Hi,
>>
>> Is there a bug in
>>
>>   <modify-case>
>>     <clip pos="1" side="tl" part="lemh"/>
>>     <lit v="aa"/>
>>   </modify-case>
>>
>> when the input is all uppercase, or am I using it wrong?
>>
>>   wget http://apertium.codepad.org/GdrOe3nL/raw.txt -O problem.t1x
>>   wget http://apertium.codepad.org/wo597sse/raw.txt -O problem.dix
>>   lt-comp lr problem.dix problem.dix.bin
>>   apertium-preprocess-transfer problem.t1x problem.t1x.bin
>>   echo '^GUOKTENum$' | apertium-transfer problem.t1x problem.t1x.bin problem.dix.bin
>>
>> gives
>>
>>   ^detdetqnt{^tOdetqnt$}$
>>
>> whereas I was expecting to see
>>
>>   ^detdetqnt{^todetqnt$}$
>
> I think the code that deals with this is in transfer.cc
>
>   string Transfer::copycase(string const &source_word, string const &target_word)
>
> I'm struggling to make heads or tails of that though.
>
> In the en-ca rules, you find:
>
>   <modify-case>
>     <clip pos="1" side="tl" part="lem"/>
>     <lit v="aa"/>
>   </modify-case>
>
> and in the es-ca rules too. So I guess you are calling it right. It would seem to be a bug of some description.

s_word == "aa", t_word == "TO"

Then for s_word: firstupper is false, uppercase is false, sizeone is false.

  if(!uppercase || (sizeone && uppercase))
  {
    result = t_word;
    result[0] = towlower(result[0]);
    //result = StringUtils::tolower(t_word);
  }
  else
  {
    result = StringUtils::toupper(t_word);
  }

  if(firstupper)
  {
    result[0] = towupper(result[0]);
  }

gives us "tO" (the first test passes). If we change the first branch to

  if(!uppercase || (sizeone && uppercase))
  {
    result = t_word;
    //result[0] = towlower(result[0]);
    result = StringUtils::tolower(t_word);
  }

we get the expected "to". Does anyone know why we would want to lowercase only the first character?

On a related note, why is sizeone && uppercase treated as if it were lowercase? Isn't it safer to simply ignore sizeone words passed to modify-case? E.g.

  if(!sizeone) {
    if(!uppercase) { tolower } else { toupper }
    if(firstupper) { toupper [0] }
  }

-Kevin
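The branch logic quoted above is easy to try out in isolation. Below is a small Python re-statement of it, a sketch rather than the code in transfer.cc. The thread does not show how the three flags are computed, so their definitions here are an assumption (first character uppercase; first and last characters uppercase; source length of one), and the `lower_whole` switch is invented to toggle between the quoted behaviour and the suggested change.

```python
def copycase(s_word, t_word, lower_whole=False):
    """Copy the case pattern of s_word onto t_word.

    The flag definitions are assumed, not taken from the thread.
    lower_whole=False mirrors the quoted code (only the first character
    is lowercased); lower_whole=True mirrors the suggested change.
    """
    firstupper = s_word[0].isupper()
    uppercase = firstupper and s_word[-1].isupper()
    sizeone = len(s_word) == 1

    if not uppercase or (sizeone and uppercase):
        if lower_whole:
            result = t_word.lower()
        else:
            result = t_word[0].lower() + t_word[1:]
    else:
        result = t_word.upper()

    if firstupper:
        result = result[0].upper() + result[1:]
    return result


if __name__ == "__main__":
    print(copycase("aa", "TO"))                    # quoted behaviour
    print(copycase("aa", "TO", lower_whole=True))  # suggested change
```

With source pattern "aa" and target "TO", the first call reproduces the reported "tO" and the second gives "to", so under these assumptions the one-line change is enough for the all-uppercase case.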
[Apertium-stuff] Election of new Apertium PMC
Hi all,

The Apertium Project Management Committee has just been elected by the census of Committers, as per the Apertium By-laws[1].

According to the by-laws, the responsibilities of the PMC include deciding what is suitable for release as an Apertium product, maintaining the repositories and web sites, speaking on behalf of the project, resolving license disputes, granting commit access, maintaining the by-laws, promoting Apertium and attracting and distributing funds of the project.

The newly elected PMC members are:

  Mikel (president)
  Jacob
  Juan Antonio
  Jim
  Felipe
  Sergio
  Fran

Congratulations to them all :-)

best regards,
Kevin Brubeck Unhammer, of the Election Board

Footnotes:
[1] http://wiki.apertium.org/wiki/By-laws
[Apertium-stuff] Null character makes lt-proc (without -z option) exit
Hi,

The -z option makes lt-proc flush whenever it sees the null character, which is nice. But if you don't give it -z, it exits on the null character -- I'm guessing it shouldn't...

Added a bug here: http://bugs.apertium.org/cgi-bin/bugzilla/show_bug.cgi?id=108

(I got a null character out when converting a pdf to text, so they do occur in the wild.)

-- 
Kevin Brubeck Unhammer
Re: [Apertium-stuff] Null character makes lt-proc (without -z option) exit
Jimmy O'Regan jore...@gmail.com writes:

> On 26 January 2011 09:34, Kevin Brubeck Unhammer unham...@fsfe.org wrote:
>> Hi,
>>
>> The -z option makes lt-proc flush whenever it sees the null character, which is nice. But if you don't give it -z, it exits on the null character -- I'm guessing it shouldn't...
>
> Yeah, though I think it's one of those things that falls into the category of "if this has happened, you have bigger problems than the translator not working". It would probably be enough to either escape or discard nulls in the deformatter. Is there any compelling reason to not simply discard them?

Only if you want to use lt-proc -z. That is, removing nulls in the deformatter would have to be optional, so it can still work with lt-proc -z.

Of course you can just run everything with lt-proc -z anyway... but maybe that gives other side effects?

>> Added a bug here: http://bugs.apertium.org/cgi-bin/bugzilla/show_bug.cgi?id=108
>>
>> (I got a null character out when converting a pdf to text, so they do occur in the wild.)
>
> Seems to me to be a double bug -- whatever you were using almost certainly should not have given you a null in its output.

Of course; notified pdfminer of the bug too.

-Kevin
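As a rough illustration of the "optional" part, here is a Python sketch of a pre-filter that discards null characters unless the caller wants them kept as flush markers. It is not the Apertium deformatter, and the function and parameter names are invented; it only shows the shape of the switch being discussed.

```python
def filter_nulls(text, keep_for_flush=False):
    """Discard NUL characters unless they are wanted as flush markers.

    keep_for_flush=True leaves the text untouched, for pipelines whose
    later stages are run in null-flush mode.
    """
    if keep_for_flush:
        return text
    return text.replace("\0", "")


if __name__ == "__main__":
    sample = "first sentence.\0second sentence."
    print(repr(filter_nulls(sample)))
    print(repr(filter_nulls(sample, keep_for_flush=True)))
```

Making the switch explicit keeps both uses working: stray nulls from a bad PDF conversion are dropped by default, while a null-flush pipeline can still pass them through.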
[Apertium-stuff] en-es-generador gives a bit too many alternatives on all-caps regexp names
Hi,

This oddness happens in apertium-en-es, current svn revision:

  $ echo Mrs. FOOBAR|apertium -d /l/a/apertium-en-es en-es-generador
  FOOBAR/FOOBAr/FOOBaR/FOOBar/FOObAR/FOObAr/FOObaR/FOObar/FOoBAR/FOoBAr/FOoBaR/FOoBar/FOobAR/FOobAr/FOobaR/FOobar/FoOBAR/FoOBAr/FoOBaR/FoOBar/FoObAR/FoObAr/FoObaR/FoObar/FooBAR/FooBAr/FooBaR/FooBar/FoobAR/FoobAr/FoobaR/Foobar

It seems to be fine up until postchunk:

  $ echo Mrs. FOOBAR|apertium -d /path/to/apertium-en-es en-es-postchunk
  ^Pn000FOOBARnpantmfsg$^.sent$

(and the web gives "Señora FOOBAR", so I guess it did work before).

-- 
Kevin Brubeck Unhammer
http://donttrack.us/ -- because you're worth it
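One observation that may help whoever debugs this: the 32 alternatives are exactly the first letter as typed followed by every upper/lower combination of the remaining five letters (2^5 = 32). The Python sketch below just checks that arithmetic against the output pasted above; it says nothing about where in the generator the expansion comes from, and `case_variants` is a made-up helper.

```python
from itertools import product


def case_variants(word):
    """Keep the first character, vary the case of every later character."""
    head, tail = word[0], word[1:]
    combos = product(*[(c.upper(), c.lower()) for c in tail])
    return [head + "".join(combo) for combo in combos]


# The generator output from the report above, copied verbatim.
REPORTED = (
    "FOOBAR/FOOBAr/FOOBaR/FOOBar/FOObAR/FOObAr/FOObaR/FOObar/"
    "FOoBAR/FOoBAr/FOoBaR/FOoBar/FOobAR/FOobAr/FOobaR/FOobar/"
    "FoOBAR/FoOBAr/FoOBaR/FoOBar/FoObAR/FoObAr/FoObaR/FoObar/"
    "FooBAR/FooBAr/FooBaR/FooBar/FoobAR/FoobAr/FoobaR/Foobar"
)

if __name__ == "__main__":
    variants = case_variants("FOOBAR")
    print(len(variants))
    print("/".join(variants) == REPORTED)
```

The match suggests, though it does not prove, that each non-initial letter is being generated case-insensitively.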
Re: [Apertium-stuff] GSoC,New Language Pair tr-ky.
mirlan mip1...@yahoo.com writes:

>> ** If you use trmorph, how will you trim the lemmas to the contents of the bilingual dictionary ?
>
> I am working on it.

Please explain how ;-)

The regular method[1] is to take an lttoolbox analyser, find the full set of possible input-output pairs using the program lt-expand, and run that through the translator to check for errors. Unfortunately, when your analyser is in SFST/HFST format -- which allows lots of loops in the analyser -- things get a bit more complicated. Brian Croom's hfst-fst2strings[2] attempts to do something similar to lt-expand, while providing some ways to filter the possibilities.

>> * How will you make the bilingual lexicon ? I presume there are few freely-available (e.g. open-source/free software) dictionaries, so you will probably have to build your own. Someone with experience of Apertium can do ~400 words in a day, so we would like to see a start on the lexicon to make sure you understand the problems involved.
>
> Right now I have a StarDict tr-ky dictionary, I hope it could help me.

Is there a link? Does it have part-of-speech (word class) information? (That would make it a lot easier to use.)

>> * It would be a good idea to start looking at any transfer (syntactic/morphological) issues between the two languages.
>
> tr-ky have some similarities […]

We are more interested in the differences ;) E.g. differences in case system, inflection, word order, etc. The best way to document such differences (or similarities) is to make a page like http://wiki.apertium.org/wiki/English_and_French/Pending_tests which you can then test your language pair on.

Do come on IRC more so we can discuss the issues and any possible problems you have; we don't want anyone to waste lots of time on something that could be solved by discussing it on IRC :)

best regards,
Kevin Brubeck Unhammer

Footnotes:
[1] http://wiki.apertium.org/wiki/Testvoc
[2] http://sourceforge.net/mailarchive/forum.php?thread_name=AANLkTinYnDtHehxWWAJf25JVXKYaM0Uw95Kzr41jgKZo%40mail.gmail.com&forum_name=apertium-stuff
Re: [Apertium-stuff] GSoC11 Draft Proposal: Rule-based finite-state disambiguation
> primarily to detail item (5)
>
> 1 week sprint: final polish, debugging and documentation effort

I'd like to see a more detailed plan, especially wrt. which features should be implemented and prioritised. Some of the CG functions implemented by e.g. vislcg3[1] are a lot more important than others, so think about the feature set and test cases for that. E.g. LIST, SELECT/REMOVE, star (*), BARRIER, Careful (C) are important. Things like spanning window boundaries, setting marks or making dependency trees should be deferred until much later. Unification can be avoided by just writing more rules.

[...]

> Recently I’ve been working on an online chessboard (jQuery/node.js),

Include the URL in your proposal, if you can ;)

-- 
Kevin Brubeck Unhammer
Re: [Apertium-stuff] Update propsal for GSoC 2011 Apertium tr-ky language pair.
mirlan mip1...@yahoo.com writes:

> Hi,
>
> Please find attached my proposal for GSoC 2011.

Looks promising, but please make sure you answer all the questions in http://wiki.apertium.org/wiki/Top_tips_for_GSOC_applications#Template (and in the same order).

What do you plan to do in the Community Bonding period?

best regards,
Kevin Brubeck Unhammer
Re: [Apertium-stuff] Which package to download?
Congmin min marlon...@gmail.com writes:

> Hi, I am new to Apertium and have two questions I'd like your help with:
>
> 1) It seems there is not a single bundled package on sourceforge for downloading. Then which ones should I download for Linux or Windows? For example, I want to download and install, and then try out the English-Spanish translation first.

lttoolbox, apertium, apertium-en-es (install them in that order)

However, if you're planning on developing a language pair, it would be better to install from SVN: http://wiki.apertium.org/wiki/Minimal_installation_from_SVN

> 2) Is it possible to develop an English-Chinese language pair without significantly changing the system?

There might _eventually_ be a problem with handling the alphabet size in lttoolbox, although if I remember the conversation from last time, jimregan said it shouldn't be too much trouble to fix… Other than that, I can't foresee any technical issues.

-- 
Kevin Brubeck Unhammer
Sent from my emacs.
Re: [Apertium-stuff] Apertium-stuff Digest, Vol 49, Issue 4
Aish Raj Dahal dahalaish...@gmail.com writes:

>>> In the above example, I have noticed that the ukar symbol of Devnagari is not being rendered so ? is being seen as ???. There is also a problem with rendering of half letters (sorry, i do not know the linguistic term for it). Here is an example of what I mean:
>>>
>>>   echo computer|apertium en-ne
>>>   ??
>>>
>>> In the above example the word ?computer? should have given ?
>>
>> I get ? -- the problem is with your terminal not rendering the combining characters, not with Apertium. gnome-terminal is known to have issues with Devanagari, is that what you're using?
>
> Well, I guessed so. I am using the terminal Konsole under KDE 4.6 (Kubuntu 11.04). Is there a way to work around this problem?

I get the same behaviour under Konsole on Arch Linux with KDE 4.6:

  $ echo computer | apertium -d . en-ne
  कमपयटर

while piping into a file gives me

  कम्प्युटर

It seems to be a known bug, with a patch (last one from 2 years ago?) if you feel like recompiling: http://bugs.kde.org/show_bug.cgi?id=156071

But it might be quicker to just install gnome-terminal/xterm/something else. Or open emacs and do M-x shell, which displays it correctly :)

-- 
Kevin Brubeck Unhammer
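A quick way to check that this is a display problem and not a translation problem is to compare the two strings at the code-point level. The Python sketch below assumes the two strings are the code points written out in the escapes (the natural reading of the text as shown in this message); `strip_marks` is an invented helper that drops every character in a Unicode "Mark" category.

```python
import unicodedata

# What ends up in the file (assumed code points):
# KA MA VIRAMA PA VIRAMA YA VOWEL-SIGN-U TTA RA
IN_FILE = "\u0915\u092e\u094d\u092a\u094d\u092f\u0941\u091f\u0930"

# What the terminal shows (assumed code points): KA MA PA YA TTA RA
ON_SCREEN = "\u0915\u092e\u092a\u092f\u091f\u0930"


def strip_marks(text):
    """Drop every combining mark (Unicode general category M*)."""
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("M")
    )


if __name__ == "__main__":
    for ch in IN_FILE:
        print("U+%04X" % ord(ch), unicodedata.category(ch), unicodedata.name(ch))
    print(strip_marks(IN_FILE) == ON_SCREEN)
```

If the comparison holds, the terminal is showing the translator's output minus its combining marks (the virama and the vowel sign), which fits the rendering bug linked above.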
Re: [Apertium-stuff] Official Apertium buttons
Fajro fai...@gmail.com writes:

> On Tue, Jun 7, 2011 at 3:16 PM, Mikel Forcada m...@dlsi.ua.es wrote:
>> Hi Apertiumers,
>>
>> would HTML/javascript buttons such as the one below (which of course can easily be improved) be acceptable to the Apertium community?
>
> +1.
>
> I made a facebook page 2 years ago: http://www.facebook.com/Apertium
> Still less than 50 fans :(
> Anyone want to be admin?
>
> Apertium also should have a cool blog; something like http://googletranslate.blogspot.com/ but better.

Or at least a planet (blog aggregator)? (see https://secure.wikimedia.org/wikipedia/en/wiki/Planet_%28software%29)

-Kevin
[Apertium-stuff] Constraint Grammar infrastructure at risk
The University of Southern Denmark has decided to cut financial support of the VISL Constraint Grammar infrastructure, and the developers are calling for moral/financial contributions or lobbying initiatives: https://groups.google.com/group/constraint-grammar/browse_thread/thread/515081fab2b2797d -- Kevin Brubeck Unhammer
Re: [Apertium-stuff] [libvoikko] Lttoolbox (Apertium) morphology backend
Francis Tyers fty...@prompsit.com writes: On Sun 28 Feb 2010 at 21:40 +0200, Harri Pitkänen wrote: On Sunday 28 February 2010, Francis Tyers wrote: I don't know Icelandic at all and therefore can't tell whether some of the words are accepted or rejected incorrectly. Nice, it looks good. Some of the capitalised words should be recognised correctly, at least 'Bretlandi' and 'Norðmenn'. I tried to fix the checking of capitalized words but started to run into problems. It seems that the library API works in somewhat surprising (at least to me) ways when you enter a word that starts with a capital letter and ends with garbage. The implementation is here http://voikko.svn.sourceforge.net/viewvc/voikko/trunk/libvoikko/src/morphology/LttoolboxAnalyzer.cpp?revision=3182&view=markup and test cases here http://voikko.svn.sourceforge.net/viewvc/voikko/trunk/libvoikko/python/ApertiumIcelandicTest.py?revision=3183&view=markup I was able to get all test cases except the one with TODO in method name implemented. How would you suggest fixing the code so that all tests would pass? Of course a patch would be most welcome :) Hmm, strangely enough, when I try an unknown word I get similar strange output: $ ./test mor.bin ^Reykjanghfghesi$ --> ^Reykja<vblex><actv><inf>/Reykja<vblex><actv><pri><p3><pl>/Reykur<n><m><pl><gen><ind>$ Seems to be a bug with partly-matching regexes in the biltrans functions.
Testing the different functions, I get: biltransWithQueue: ^Reykja<vblex><actv><inf>/Reykja<vblex><actv><pri><p3><pl>/Reykur<n><m><pl><gen><ind>$ qSize: 0 biltransWithoutQueue: ^Reykja<vblex><actv><inf>/Reykja<vblex><actv><pri><p3><pl>/Reykur<n><m><pl><gen><ind>$ biltrans: ^Reykja<vblex><actv><inf>/Reykja<vblex><actv><pri><p3><pl>/Reykur<n><m><pl><gen><ind>$ biltransfull: ^$ But, if I comment out the two regex entries <e><par n="persons"/></e> <e><par n="organisations"/></e> at the end of apertium-is-en.is.dix, I get biltransWithQueue: @Reykjanghfghesi qSize: 0 biltransWithoutQueue: @Reykjanghfghesi biltrans: @Reykjanghfghesi biltransfull: @Reykjanghfghesi Similarly on the command line with lt-proc -b (while regular lt-proc -a returns unknown, as it should – the persons/organisations regexes don't fully match either). -- Kevin Brubeck Unhammer
Re: [Apertium-stuff] Demonyms from ca.wikipedia
Jimmy O'Regan jore...@gmail.com writes:
$ wget http://downloads.dbpedia.org/3.7/ca/mappingbased_properties_ca.nt.bz2
$ bzgrep '/demonym' mappingbased_properties_ca.nt.bz2 | perl -MURI::Escape '-MUnicode::Escape qw(unescape)' -ane 'if (m!<http://dbpedia.org/resource/([^>]*)> <http://dbpedia.org/ontology/demonym> "([^"]*)"\@ca .!) {print uri_unescape($1)."\t".unescape($2)."\n";}'
gives things like:
Alcover        Alcoverenc, alcoverenca
Aiguamúrcia    Aiguamurcienc, aiguamurcienca
Amer           Amerencs, amerenques
Almoster       Almosterenc, almosterenca
L'Albiol       Albiolenc, albiolenca
Alforja        Alforgenc, alforgenca
Argelaguer     Argelaguenc, argelaguenca
L'Arboç        Arbocenc, arbocenca
Arbúcies       Arbucienc, arbucienca
Albinyana      Albinyanenc, albinyanenca
...of course, it's not all /that/ neat and tidy:
Newcastle_upon_Tyne       Geordie
Encarnación_(Paraguai)    encarnacero/a
Kristiansand              kristiansander
Bodø                      bodøværing
Haugesund                 haugesundar, -er
Demonym is a name for someone who's from a certain place? In that case, at least the last three should be correct and official[1].
[1] http://www.sprakrad.no/nb-NO/Sprakhjelp/Rettskrivning_Ordboeker/Innbyggjarnamn/
-- Kevin B. Unhammer
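For readers without the two Perl modules, the same extraction can be sketched in Python. This is an illustrative equivalent, not the script used in the thread: the sample lines are constructed by hand from the output shown above (they are not copied from the DBpedia dump), and it assumes that literals escape non-ASCII characters as `\uXXXX`, as the use of Unicode::Escape suggests.

```python
import re
from urllib.parse import unquote

# One N-Triples statement: <subject> <demonym property> "literal"@ca .
TRIPLE = re.compile(
    r'<http://dbpedia\.org/resource/([^>]*)> '
    r'<http://dbpedia\.org/ontology/demonym> '
    r'"([^"]*)"@ca \.'
)


def decode_u_escapes(text: str) -> str:
    """Turn \\uXXXX escapes into the characters they stand for."""
    return re.sub(
        r"\\u([0-9A-Fa-f]{4})", lambda m: chr(int(m.group(1), 16)), text
    )


def demonym_pairs(lines):
    """Yield (place, demonym) for every line that is a demonym statement."""
    for line in lines:
        match = TRIPLE.search(line)
        if match:
            yield unquote(match.group(1)), decode_u_escapes(match.group(2))


# Hand-made sample lines, for illustration only.
SAMPLE = [
    '<http://dbpedia.org/resource/Alcover> '
    '<http://dbpedia.org/ontology/demonym> "Alcoverenc, alcoverenca"@ca .',
    '<http://dbpedia.org/resource/Bod%C3%B8> '
    '<http://dbpedia.org/ontology/demonym> "bod\\u00F8v\\u00E6ring"@ca .',
    '<http://dbpedia.org/resource/Alcover> '
    '<http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/Spain> .',
]

if __name__ == "__main__":
    for place, demonym in demonym_pairs(SAMPLE):
        print(f"{place}\t{demonym}")
```

To run it on the real dump, the lines could be read with `bz2.open("mappingbased_properties_ca.nt.bz2", "rt", encoding="utf-8")` instead of `SAMPLE`.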
Re: [Apertium-stuff] [libvoikko] Lttoolbox (Apertium) morphology backend
Kevin Brubeck Unhammer unham...@fsfe.org writes: Francis Tyers fty...@prompsit.com writes: On Sun 28 Feb 2010 at 21:40 +0200, Harri Pitkänen wrote: On Sunday 28 February 2010, Francis Tyers wrote: I don't know Icelandic at all and therefore can't tell whether some of the words are accepted or rejected incorrectly. Nice, it looks good. Some of the capitalised words should be recognised correctly, at least 'Bretlandi' and 'Norðmenn'. I tried to fix the checking of capitalized words but started to run into problems. It seems that the library API works in somewhat surprising (at least to me) ways when you enter a word that starts with a capital letter and ends with garbage. The implementation is here http://voikko.svn.sourceforge.net/viewvc/voikko/trunk/libvoikko/src/morphology/LttoolboxAnalyzer.cpp?revision=3182&view=markup and test cases here http://voikko.svn.sourceforge.net/viewvc/voikko/trunk/libvoikko/python/ApertiumIcelandicTest.py?revision=3183&view=markup I was able to get all test cases except the one with TODO in method name implemented. How would you suggest fixing the code so that all tests would pass? Of course a patch would be most welcome :) Hmm, strangely enough, when I try an unknown word I get similar strange output: $ ./test mor.bin ^Reykjanghfghesi$ --> ^Reykja<vblex><actv><inf>/Reykja<vblex><actv><pri><p3><pl>/Reykur<n><m><pl><gen><ind>$ Seems to be a bug with partly-matching regexes in the biltrans functions.
Testing the different functions, I get: biltransWithQueue: ^Reykja<vblex><actv><inf>/Reykja<vblex><actv><pri><p3><pl>/Reykur<n><m><pl><gen><ind>$ qSize: 0 biltransWithoutQueue: ^Reykja<vblex><actv><inf>/Reykja<vblex><actv><pri><p3><pl>/Reykur<n><m><pl><gen><ind>$ biltrans: ^Reykja<vblex><actv><inf>/Reykja<vblex><actv><pri><p3><pl>/Reykur<n><m><pl><gen><ind>$ biltransfull: ^$ But, if I comment out the two regex entries <e><par n="persons"/></e> <e><par n="organisations"/></e> at the end of apertium-is-en.is.dix, I get biltransWithQueue: @Reykjanghfghesi qSize: 0 biltransWithoutQueue: @Reykjanghfghesi biltrans: @Reykjanghfghesi biltransfull: @Reykjanghfghesi Similarly on the command line with lt-proc -b (while regular lt-proc -a returns unknown, as it should – the persons/organisations regexes don't fully match either). I put a patch up at http://bugs.apertium.org/cgi-bin/bugzilla/show_bug.cgi?id=131 which solves this for both lt-proc -b and biltransWithQueue. Please test. I haven't tried with the other biltrans* functions (I can't see that they're actually used in the rest of Apertium, so I'm not sure what they're there for). It also fixes a problem where superfluous characters after tags would pass as matches in lt-proc -b (this bug was not present in biltransWithQueue). It's still possible to carry over _tags_ after the analysis of course. I guess it's not strange that this bug was here, since normally you never have words without tags in bidix, but when using these functions on a monodix it of course becomes a problem. (And, although it's not recommended, if people really do want to have non-tagged lemmas in bidix, lttoolbox should at least not give analyses for lemmas that are _not_ in the bidix.) best regards, Kevin Brubeck Unhammer
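The general failure mode described in this message, accepting an input after only part of it has matched, can be shown with a toy lookup. The sketch below is not lttoolbox code and the actual cause inside lttoolbox may differ in detail; the one-entry lexicon and the function names are invented for illustration.

```python
# Toy lexicon, invented for illustration: surface form -> analyses.
LEXICON = {
    "Reykja": ["Reykja<vblex><actv><inf>"],
}


def lookup_prefix(word: str) -> list[str]:
    """Faulty behaviour: accept as soon as an entry matches the start of the word."""
    return [
        analysis
        for form, analyses in LEXICON.items()
        if word.startswith(form)
        for analysis in analyses
    ]


def lookup_full(word: str) -> list[str]:
    """Intended behaviour: accept only when the whole word is consumed."""
    return LEXICON.get(word, [])


def format_result(word: str, analyses: list[str]) -> str:
    """Mimic the stream format: ^a/b$ for a match, @word for no match."""
    if analyses:
        return "^" + "/".join(analyses) + "$"
    return "@" + word


if __name__ == "__main__":
    word = "Reykjanghfghesi"
    print(format_result(word, lookup_prefix(word)))  # analysis despite trailing garbage
    print(format_result(word, lookup_full(word)))    # marked as unknown
```

The second call corresponds to the output the thread reports once the partly-matching entries are out of the way: the unknown word comes back prefixed with `@`.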
Re: [Apertium-stuff] Installing Language Pair from Incubator
Francis Tyers fty...@prompsit.com writes:
Hi Francis, It should work the same as usual,
$ svn co https://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-en-fr
$ cd apertium-en-fr
$ ./autogen.sh --prefix=/home/fran/local/
$ make
$ make install
$ echo "Ceci n'est pas une preuve" | apertium -d . fr-en
This no is a proof
(Then whichever scripts are needed for ScaleMT).
ScaleMT has this script that downloads and installs pairs from trunk to a prefix folder, changing modes.xml in the process: http://apertium.svn.sourceforge.net/viewvc/apertium/trunk/scaleMT/ScaleMTSlave/src/main/assembly-files/installApertiumAndPairs.sh?revision=34131&content-type=text%2Fplain&pathrev=34131
It shouldn't be too hard to rewrite it to accept incubator/apertium-en-it as an option to -l. Or you can just do what the script does manually if you're in a hurry. E.g. if you installed everything into the default prefix ~/local it should be something like:
cd apertium-en-it
mv modes.xml modes.xml.original
if [ "$TRADUBI_ENABLED" = yes ]
then
  java -jar ../../ScaleMTSlave-1.0.jar -processModes -inputModes modes.xml.original -outputModes modes.xml -tradubiDictionaryPath "$DICT_DIR" -prefix ~/local/bin
else
  java -jar ../../ScaleMTSlave-1.0.jar -processModes -inputModes modes.xml.original -outputModes modes.xml
fi
PKG_CONFIG_PATH=~/local/lib/pkgconfig sh autogen.sh --prefix="$HOME/local"
make
make install
mv modes.xml modes.xml.modified
mv modes.xml.original modes.xml
On Mon 10 Oct 2011 at 08:39, Francis Gwapo wrote: Hello, I am using ScaleMT. How do I install a language pair from the Incubator directory? I would like to install English to French. Any help is highly appreciated. Francis
Re: [Apertium-stuff] Transllation ca-en error
Jimmy O'Regan jore...@gmail.com writes: On 21 October 2011 09:27, Kevin Brubeck Unhammer unham...@fsfe.org wrote: Jimmy O'Regan jore...@gmail.com I've fixed it in SVN. One of the macros was being called with too few parameters, which was causing a segfault. $ apertium-transfervm-compiler -i apertium-en-ca.ca-en.t1x -o ca-en.v1x.bin Error: line 6 107, macro 'f_bcond' needs 2 parameters, passed 1 Why .bin? I thought that one of the problems with was that it /doesn't/ output binary. Ah, true, hadn't even looked at that. ISTR that java transfer issues these warnings too, and I'd be far more inclined to use that for debugging because java debuggers exist, and the (uncompiled) output is easier to read by far. It does, as well as some others :) $ apertium-preprocess-transfer-bytecode-j apertium-en-ca.ca-en.t1x apertium-en-ca.ca-en.t1x.class Parsing apertium-en-ca.ca-en.t1x // WARNING: Macro f_bcond is invoked with too few parameters. Adding blank parameters - for transfer default=chunk/section-rules/rule comment=pro + pro + ANAR + INF (m'ho va donar - gave it to me)/action/choose/otherwise/call-macro n=f_bcond // WARNING: Attribute a_prep is not defined. Valid attributes are: [a_nom, a_np_acr, a_adj, grau, a_det, a_num, a_verb, pron, sep, a_adv, a_rel, a_pp, a_prn, tipus_prn, pers, gen, nbr, temps, neg, lem, lemq, lemh, whole, tags, chname, chcontent, content] // Replacing with error_UNKNOWN_ATTR - for transfer default=chunk/section-rules/rule comment=a + DET + MESO ( al juliol → in July/action/out/chunk case=caseFirstWord name=in_meso/tags/tag/clip part=a_prep pos=1 side=tl Compiling: javac -cp /usr/local/bin/../share/apertium/lttoolbox.jar ./apertium_en_ca_ca_en_t1x.java
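The "needs 2 parameters, passed 1" check reported by both compilers can be approximated without either of them. The sketch below is a standalone illustration, not the code of apertium-transfervm-compiler or the Java transfer compiler; it assumes the usual .t1x layout in which `def-macro` carries `n` and `npar` attributes and each `call-macro` lists its arguments as `with-param` children. The sample document is a heavily trimmed, made-up fragment.

```python
import xml.etree.ElementTree as ET


def check_macro_calls(t1x_source: str) -> list[str]:
    """Report every call-macro whose argument count differs from npar."""
    root = ET.fromstring(t1x_source)
    declared = {
        macro.get("n"): int(macro.get("npar", "0"))
        for macro in root.iter("def-macro")
    }
    problems = []
    for call in root.iter("call-macro"):
        name = call.get("n")
        passed = len(call.findall("with-param"))
        if name not in declared:
            problems.append(f"macro '{name}' is not defined")
        elif passed != declared[name]:
            problems.append(
                f"macro '{name}' needs {declared[name]} parameters, passed {passed}"
            )
    return problems


# Trimmed, invented fragment; a real .t1x file has many more sections.
SAMPLE = """
<transfer default="chunk">
  <section-def-macros>
    <def-macro n="f_bcond" npar="2"/>
  </section-def-macros>
  <section-rules>
    <rule>
      <action>
        <call-macro n="f_bcond"><with-param pos="1"/></call-macro>
      </action>
    </rule>
  </section-rules>
</transfer>
"""

if __name__ == "__main__":
    for problem in check_macro_calls(SAMPLE):
        print(problem)
```

Passing the text of a real transfer file to `check_macro_calls` would list all mismatched calls in one go, which can be quicker than waiting for a crash at run time.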
Re: [Apertium-stuff] GUI for adding / editing of words in dictionaries
Francis Tyers fty...@prompsit.com writes: On Mon 24 Oct 2011 at 01:00, Francis Gwapo wrote: Hello, Are there any GUI tools for adding words to dictionaries? Any help is highly appreciated. There have been a number of attempts, you can look at trunk/apertium-forms-server, and branches/gsoc2010/alessiojr -- but neither of them is particularly effective. The former has a screenshot and some info at http://wiki.apertium.org/wiki/Tools#Tools_for_developers -- Kevin Brubeck Unhammer
Re: [Apertium-stuff] Apertium for Sugar Labs IRC live translation
Jimmy O'Regan jore...@gmail.com writes: On 31 October 2011 10:49, Jimmy O'Regan jore...@gmail.com wrote: On 31 October 2011 08:39, Kevin Brubeck Unhammer unham...@fsfe.org wrote: Currently, I think the best we have is email a developer (or the mailing list). http://wiki.apertium.org/wiki/Tradubi might be an alternative, here users can enter translations in a web interface which are applied to their system. For these translations to be contributed back to the Apertium project, a developer would have to go through them and add some meta-information, but it could still be very helpful. Tradubi should really be seen as an alternative to adding words to the system, not a means to achieving it. I might accept a _short_ wordlist from Tradubi once, but not on an ongoing basis. It's really no more useful than a list of unknown words. I guess I should note that I've never used it myself, and never will (Affero), so I don't know if it has other export options than TMX. If there are, maybe they're more useful than a list of unknowns, but I still don't imagine it being a viable way of expanding dictionaries. The article says they create lttoolbox bidix (inserted between the part-of-speech tagger and the structural transfer module). I guess it should be possible to add POS along with the lemmas in the GUI, but it doesn't seem to be implemented yet anyway. -Kevin
Re: [Apertium-stuff] Apertium for Sugar Labs IRC live translation
Aleksey Lim alsr...@activitycentral.org writes: On Mon, Oct 31, 2011 at 03:31:46AM +, Aleksey Lim wrote: Hi all! About Sugar: Sugar is a learning platform that reinvents how computers are used for education. Collaboration, reflection, and discovery are integrated directly into the user interface. Sugar promotes studio thinking and reflective practice. Through Sugar's clarity of design, children and teachers have the opportunity to use computers on their own terms. Students can reshape, reinvent, and reapply both software and content into powerful learning activities. Sugar's focus on sharing, criticism, and exploration is grounded in the culture of free software (FLOSS). More information about Sugar might be found on http://wiki.sugarlabs.org/. For some time, Sugar Labs used Google translation API to automatically translate IRC posts in several Sugar related channels [1]. But Google is closing this service for free usage. Since Sugar is totally about learning/doing and, not the least one, supporting FOSS, it might be useful to start using Apertium and ask Sugar community start contributing to Apertium languages data bases. In this regard, a couple of questions: * It seems that our most need is en-es/es-en translation, how Apertium is good for, at least, initial usage for live translation? * Is there any ongoing project to develop a tool to simplify accepting [small] contributions from community members? For example, Sugar Labs uses Pootle instance [2] to coordinate i18n efforts, which is a web service to accept contributions from the community. [1] http://chat.sugarlabs.org/ [2] translate.sugarlabs.org Thanks to everyone, I will try to setup Apertium (and maybe with openmatrex) as en-es/es-en translation backend for Sugar Labs IRC channels to start using it on regular basis. You might want to look at http://wiki.apertium.org/wiki/ScaleMT if you're going to have a lot of users on your server (this is the service that runs on http://api.apertium.org ). 
Please post to this list if you have trouble installing apertium/scaleMT :) Also, if I got it right, existing Web application, http://wiki.apertium.org/wiki/Tradubi, is not FOSS and doesn't directly contribute to Apertium database. Maybe it makes sense to start thinking about having a la Pootle for contributing directly to Apertium or so, i.e., tools do matter to have sustainable community contribution. I've CCed to Sugar Labs i18n coordinator, Chris Leonard, maybe he has some ideas. Well, Tradubi is AGPL (ScaleMT as well), so it's FOSS, but a controversial license (the gist of it is that changes you make have to be contributed back _even if you're just running the software as a web service_). -- Kevin Brubeck Unhammer
Re: [Apertium-stuff] Dictionaries, coverage and other dull tasks
. check source text and find it was 'tiempito' 3. edit to … a short while … Another problem is that transfer becomes more complex, do you insert 'small' before or after other adjectives (adverbs, preadverbs) in a chunk? You now have to think about this for every possible noun chunking rule. Unfortunately, the original sme analyser which we use as a basis in sme-nob is even more complex, since it is meant to cover pretty much all productive derivations. It's very good for annotating and disambiguating a corpus, but without modifications it is too complex for MT. All productive derivations includes derivations that change the part-of-speech, and compounds, and derivations of derivations … If you can have a diminutive of a deverbal noun, you have to think about how to add 'small' in all rules, including those that originally were meant for verbs (a transfer rule pattern matching v.* will match v.derivation.n.*). In sme→nob, we restrict the possible derivations in the analyser quite a lot, and only stick to a small set of single derivations (no derivations of derivations) for which it is easy to find a way of rewriting that sounds alright and doesn't induce too much transfer complexity. Even so, most of the time spent debugging transfer/bidix and the analyser stems from derivations, and if I spoke Sámi I'm pretty sure my time would have been better spent on adding words to bidix rather than on trying to juggle all the possible ways in which derivations interact. However, derivations can work if (1) the derivation is high enough frequency, and (2) it is possible to deal with it in transfer (and the analyser) in a simple way, and (3) it is possible to make the translation sound good while preserving the meaning, and (4) the translator is meant for gisting/assimilation, not post-editing/dissemination, and (5) you have a lot of time on your hands. Elsewise, I'm not sure it's worth it. 
[1] The main thing HFST adds is 'flag diacritics', which basically allow you to put restrictions on which tags can go together in one analysis. Thus you could put optional diminutives at the end of _all_ noun analyses, and if there's a certain noun that can't have a diminutive, you just put a special 'hidden tag' on that particular noun in its section, the diminutive line of the noun pardef then adds another 'hidden' tag that is incompatible with the first one, and doesn't allow analyses that contain both tags. You could achieve the same effect in lttoolbox by duplicating all noun pardefs into with_diminutive- and without_diminutive-versions. best regards, Kevin Brubeck Unhammer
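The 'hidden tag' mechanism in footnote [1] can be made concrete with a very reduced model. The sketch below is not HFST code: it handles only two flag operations, written in the usual `@P.FEATURE.VALUE@` (set) and `@D.FEATURE.VALUE@` (disallow) notation, and the feature name `DIM`, the words and the `<dim>` tag are all invented for illustration.

```python
def path_allowed(symbols: list[str]) -> bool:
    """Walk a path left to right, applying P (set) and D (disallow) flags."""
    features: dict[str, str] = {}
    for symbol in symbols:
        if not (symbol.startswith("@") and symbol.endswith("@")):
            continue  # ordinary letter or tag, no effect on the flags
        operation, feature, value = symbol.strip("@").split(".")
        if operation == "P":
            features[feature] = value
        elif operation == "D" and features.get(feature) == value:
            return False  # an earlier hidden tag forbids this continuation
    return True


# Shared continuation: the optional diminutive line of the noun paradigm.
DIMINUTIVE = ["@D.DIM.NO@", "<dim>"]

# An ordinary noun, and one that is marked as never taking a diminutive.
PLAIN_NOUN = ["h", "u", "s", "<n>"]
NO_DIM_NOUN = ["@P.DIM.NO@", "s", "a", "n", "d", "<n>"]

if __name__ == "__main__":
    print(path_allowed(PLAIN_NOUN + DIMINUTIVE))   # diminutive analysis kept
    print(path_allowed(NO_DIM_NOUN))               # plain analysis kept
    print(path_allowed(NO_DIM_NOUN + DIMINUTIVE))  # diminutive analysis rejected
```

The point of the design is that the diminutive continuation is written once and shared by every noun; only the exceptional nouns carry the extra flag, whereas the lttoolbox workaround mentioned above needs two copies of each paradigm.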
Re: [Apertium-stuff] update
Felipe Sánchez Martínez fsanc...@dlsi.ua.es writes: Hi all, I think the task "find X rules for how to translate words with more than one possible translation" could be misunderstood, as they could mix up lexical selection problems with part-of-speech ambiguity problems. I see graduate students doing so, every year. Agree … perhaps a link to http://wiki.apertium.org/wiki/Ambiguity in the description would be enough? I'm not sure there's a shorter way of saying it if you don't already know the concepts. On 16/11/11 15:20, Francis Tyers wrote: On Wed 16 Nov 2011 at 14:18, Jimmy O'Regan wrote: On 16 Nov 2011 14:09, Francis Tyers fty...@prompsit.com wrote: On Wed 16 Nov 2011 at 14:02, Jimmy O'Regan wrote: On 16 Nov 2011 13:35, Francis Tyers fty...@prompsit.com wrote: Hey all! I've thrown all the parts together and have a working prototype of the lexical selection module. A rule compiler, and a processor. At the moment the rule format is like: https://apertium.svn.sourceforge.net/svnroot/apertium/branches/apertium-lex-tools/examples/rules.txt But we have also discussed an XML-based format, which would be like: https://apertium.svn.sourceforge.net/svnroot/apertium/branches/apertium-lex-tools/examples/rules.xml I would like to, as my next step, improve the rule compiler (at the moment there is a lot of string mangling that I think could be improved on -- e.g. for holding the pattern lengths/ids), and support the XML format, but in order to do this, I would first like to get comments on it. Is there anything that you would change? Do you feel comfortable writing rules in this format? It might be better to ask next week, when GCI tasks have been sorted and finalised. Split focus and so on. What a great idea! We could make some GCI tasks like "come up with X lexical selection rules for a language pair of your choice". You'll want to rephrase that, significantly.
GCI students are casually browsing a list of titles so you should pick a title that doesn't rely on a relatively obscure phrase - something that immediately informs them that they probably already know this. Yeah, how about: "find X rules for how to translate words with more than one possible translation"?
Re: [Apertium-stuff] Tagger training
Jimmy O'Regan jore...@gmail.com writes: On 14 December 2011 20:19, Pim Otte otte@gmail.com wrote: I'm not sure how I should get the output of the analyser, but running the makefile itself results in an empty af-tagger-data/af.dic running this line: after creating af.dic.expand gives usage on lt-proc usage lt-proc -e -w -a af-nl.automorf.bin af.dic.expanded Well, there's your problem. Usage prints to stderr, hence empty file. Any pointers? -a is the mode switch, it should be the first option. -w is completely superfluous for tagger training, get rid of it. If you want to train a tagger that's aware of pin-the-tail-on-the-compound mode, you'll probably have to do something extra, because (IIRC) it's only invoked when it encounters words that are not in the dictionary, which will never be the case on an expansion of the dictionary - so either manually add a bunch of compounds, or get rid of that, too. -e is the compound thing, -w just ensures lemmas don't get surface case applied (I guess that's pointless too though?) -- Kevin Brubeck Unhammer
Re: [Apertium-stuff] Tagger training
Francis Tyers fty...@prompsit.com writes: On Thu 15 Dec 2011 at 10:42 +0100, Kevin Brubeck Unhammer wrote: Jimmy O'Regan jore...@gmail.com writes: On 14 December 2011 20:19, Pim Otte otte@gmail.com wrote: I'm not sure how I should get the output of the analyser, but running the makefile itself results in an empty af-tagger-data/af.dic running this line: after creating af.dic.expand gives usage on lt-proc usage lt-proc -e -w -a af-nl.automorf.bin af.dic.expanded Well, there's your problem. Usage prints to stderr, hence empty file. Any pointers? -a is the mode switch, it should be the first option. -w is completely superfluous for tagger training, get rid of it. If you want to train a tagger that's aware of pin-the-tail-on-the-compound mode, you'll probably have to do something extra, because (IIRC) it's only invoked when it encounters words that are not in the dictionary, which will never be the case on an expansion of the dictionary - so either manually add a bunch of compounds, or get rid of that, too. -e is the compound thing, -w just ensures lemmas don't get surface case applied (I guess that's pointless too though?) Do you think the error might be because it finds a word which has a compound analysis, but that isn't in the dictionary ? Unhammer: Have you ever tried to train the tagger for nn-nb with compound mode ? never tried tagger training at all … -KBU
Re: [Apertium-stuff] Tagger training
Jimmy O'Regan jore...@gmail.com writes: On 15 December 2011 10:13, Francis Tyers fty...@prompsit.com wrote: On Thu 15 Dec 2011 at 10:42 +0100, Kevin Brubeck Unhammer wrote: Jimmy O'Regan jore...@gmail.com writes: On 14 December 2011 20:19, Pim Otte otte@gmail.com wrote: I'm not sure how I should get the output of the analyser, but running the makefile itself results in an empty af-tagger-data/af.dic running this line: after creating af.dic.expand gives usage on lt-proc usage lt-proc -e -w -a af-nl.automorf.bin af.dic.expanded Well, there's your problem. Usage prints to stderr, hence empty file. Any pointers? -a is the mode switch, it should be the first option. -w is completely superfluous for tagger training, get rid of it. If you want to train a tagger that's aware of pin-the-tail-on-the-compound mode, you'll probably have to do something extra, because (IIRC) it's only invoked when it encounters words that are not in the dictionary, which will never be the case on an expansion of the dictionary - so either manually add a bunch of compounds, or get rid of that, too. -e is the compound thing, -w just ensures lemmas don't get surface case applied (I guess that's pointless too though?) Do you think the error might be because it finds a word which has a compound analysis, but that isn't in the dictionary ? That shouldn't happen, because the input is the expansion of the dictionary. If it is the case, it's most likely that the filtering of the expansion is faulty. But that's beside the point, the problem is that the options, as specified, are triggering the usage information. This could be because 1) -a needs to be first; or 2) some conflict among -a, -w, -e. If it's a conflict between -a and -w and/or -e, then that's a bug in the option handling in lt-proc, and someone who cares about -w and -e should fix it (i.e., it ain't gonna be me).
If it's 2), my point is that the bug can be worked around by simply omitting -w and -e, because they do nothing -- or omit -a, because it's the default mode. Whatever works. I'm sure that -w does nothing in this context, but I'm not entirely sure about -e - my recollection is that it is engaged if and only if there is no dictionary analysis of the word, which, see above, should not happen. I don't know, I've never used it. Leading from that, if you want to train the tagger to have some awareness of these guesses at compounds, then the tagger dictionary will need to contain material other than the expansion of the dictionary.
$ echo foo |lt-proc -w -a en-es.automorf.bin
^foo/*foo$
It's not 1)
$ echo foo |lt-proc -e -a en-es.automorf.bin
lt-proc: process a stream with a letter transducer [SNIP]
It's a conflict between -e and -a.
I think that was because -e can be seen as a replacement for -a (another main mode, and it doesn't make sense to use it with -b nor -g), so I'd say it's a not-a-bug. -- Kevin Brubeck Unhammer
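Both observations in this thread (two "main mode" switches excluding each other, and the usage text going to stderr so that a redirected output file stays empty) can be reproduced with a toy command-line parser. The sketch below uses Python's argparse and says nothing about how lt-proc parses its options; the program name `toy-proc` is made up, and the flag descriptions simply paraphrase the thread.

```python
import argparse
import contextlib
import io


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for a tool with several 'main mode' switches."""
    parser = argparse.ArgumentParser(prog="toy-proc")
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("-a", action="store_true", help="analysis (default mode)")
    mode.add_argument("-e", action="store_true", help="analysis with compound guessing")
    parser.add_argument("-w", action="store_true", help="keep dictionary case on lemmas")
    return parser


def run(argv: list[str]) -> tuple[int, str, str]:
    """Parse argv and return (exit code, captured stdout, captured stderr)."""
    out, err = io.StringIO(), io.StringIO()
    code = 0
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        try:
            build_parser().parse_args(argv)
            print("ok")  # stands in for the normal output of the tool
        except SystemExit as exc:
            code = exc.code
    return code, out.getvalue(), err.getvalue()


if __name__ == "__main__":
    print(run(["-w", "-a"]))  # accepted: -w is independent of the mode
    print(run(["-e", "-a"]))  # rejected: usage on stderr, nothing on stdout
```

With the conflicting pair, everything the toy prints lands in the stderr capture and the stdout capture is empty, which is the same shape of problem as the empty af.dic: the shell redirection only caught stdout.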
Re: [Apertium-stuff] debug mode files
Francis Tyers fty...@prompsit.com writes: On Sun 18 Dec 2011 at 22:42 -0500, Hector wrote: Hi all, here's a hack to help debug the output of mode files in Apertium: https://gist.github.com/1495251 The trick is to use the tee command. Then sed is used to replace the pipes in the mode file with tee /dev/tty. The sample output will give you a clear idea of what happens through the pipeline. Hope it helps. Best! Cool! A similar script, although a bit more involved, is Unhammer's 'apertium-view.sh': http://wiki.apertium.org/wiki/Apertium-view.sh … more involved, but not necessarily better … I definitely prefer Hector's simple solution, never knew you could tee to /dev/tty :-) -- Kevin Brubeck Unhammer
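The rewrite that the gist performs with sed can be sketched as a small string transformation. This is an illustration of the idea only, not the content of the gist; the two-stage mode line is invented, and the function assumes that no stage contains a literal `|` inside quotes.

```python
def add_tee(pipeline: str, sink: str = "/dev/tty") -> str:
    """Insert 'tee SINK' between every pair of stages in a shell pipeline.

    Assumes no stage contains a literal '|' inside quotes.
    """
    stages = [stage.strip() for stage in pipeline.split("|")]
    return f" | tee {sink} | ".join(stages)


# Invented two-stage mode line, for illustration only.
MODE = "lt-proc en-es.automorf.bin | apertium-tagger -g en-es.prob"

if __name__ == "__main__":
    print(add_tee(MODE))
```

Each `tee /dev/tty` copies the intermediate stream to the terminal while still passing it on to the next stage, so the output of every module becomes visible without changing the final result. Using `sink="/dev/stderr"` is an alternative when no terminal is attached.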
Re: [Apertium-stuff] Problem with new language HOWTO
Bartias bart...@o2.pl writes:

> I can include the files, but I'm not sure how - should I post the code somewhere, or send the files to you? Anyway, with the help of Unhammer, I did manage to make some progress. Currently, the program does produce the correct output for domy --- houses (command: http://codepad.org/52ZA8kvv). However, in the opposite direction, the result for houses is #dom (command: http://codepad.org/RrjDTEYH). When I drop the last command, that is, when I type in
>
> echo houses | lt-proc en-pl.automorf.bin | ./gawk | apertium-transfer apertium-pl-en.pl-en.t1x en-pl.t1x.bin en-pl.autobil.bin
>
> I get ^dom<n><pl>$. I did type in
>
> lt-comp rl apertium-pl-en.en.dix en-pl.autogen.bin
>
> but it did not change anything.

That last command looks like it should be

lt-comp rl apertium-pl-en.pl.dix en-pl.autogen.bin

(assuming this is for compiling the Polish generator, for the en→pl direction.)

-- Kevin Brubeck Unhammer
Re: [Apertium-stuff] regular expressions in lttoolbox: proposal
Francis Tyers fty...@prompsit.com writes:

> Is there anyone else that would be interested in having an alphabet symbol for expressions in lttoolbox like \w -- e.g. any alphabetic char? It would avoid having long (and inadequate) lists such as the following:
>
> ÀÁÂÄÇÈÉÊËÌÍÎÏÑÒÓÔÖÙÚÛÜàáâäçèéêëìíîïñòóôöùúûüABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
>
> I conceive of it working as follows:
>
> * \w would be compiled into a special symbol.
> * in state.cc, the apply() method would be changed to check if the input is !punctuation and !space, and then follow it (or something)
>
> Something along these lines is preferable to just adding all unicode character symbols. Another option would be to make \w basically the same as the alphabet in the dictionary files. (Although this would mean that I would need to add an alphabet header to the lexical selection rule format.) Any thoughts / comments?

I think I prefer the first option (\w means !punct and !space, perhaps non-numeric?). Sounds useful, although aren't those long regexes normally used for names? Because then it would perhaps make sense to additionally have a symbol \u or something for only uppercase characters (although I don't know how good isupper(unicode) is in whatever lib lttoolbox uses). Imagine <re>\u\w+ \u\w+</re> instead of the mess that currently is ...

-- Kevin Brubeck Unhammer
Re: [Apertium-stuff] Problem with HTML deformatter
Miquel Esplà miqueles...@gmail.com writes:

> Hi everybody, I am performing some experiments with the fr-es pair. I am trying to translate small n-grams (with 1<=n<=3). To be sure that they are translated independently, I enclose each n-gram in HTML paragraph tags (<p></p>). Now, this is my problem: the deformatter adds a dot at the end of each n-gram. One of my n-grams in French ends with the word avr (I know it means nothing, but it is automatically extracted from a text) and when the dot is added, it is recognised by lt-proc as an abbreviation of avril. As a consequence, this paragraph and the following one are concatenated in the resulting translation. This is an example of the output of the deformatter:
>
> .[][<html><body><p>]- avr.[][<\/p><p>]- axes.[][<\/p><\/body><\/html>
>
> and this is what lt-proc outputs:
>
> ^./.<sent>$[][<html><body><p>]- ^avr./avr.<n><m><sg>$[][<\/p><p>]- ^axes/axe<n><m><pl>/axer<vblex><pri><p2><sg>/axer<vblex><prs><p2><sg>$^./.<sent>$[][<\/p><\/body><\/html>
>
> I have been taking a look at the deformatter definition, but I am not sure how to solve this. I guess if a space were added before the dot by the deformatter, the problem would be solved, but I am not sure where to add this feature. Can anybody help me?

cd trunk/apertium
wget http://paste.pocoo.org/raw/536603/ -O nodot.patch
patch -p0 < nodot.patch
make
make install

This will ensure none of the deformatters add any dots that weren't in the input text. I'm not sure why they do in the first place. Perhaps it helps tagging sometimes. I just find it a nuisance, so I keep an install in another prefix for when I want a deformatter that doesn't mess up punctuation.

hope this helps, Kevin Brubeck Unhammer
Re: [Apertium-stuff] Problem with HTML deformatter
Miquel Esplà miqueles...@gmail.com writes:

> Hi Kevin and Fran, I tried Kevin's patch, but something changes that causes mistakes when reformatting. I will try to add some rubbish at the end of my segments. Anyway, thank you so much for your help!

Ah, yes, the reformatter tries to remove those inserted dots again, but if none of your dots are inserted by the deformatter, then it'll remove stuff it shouldn't. http://paste.pocoo.org/raw/537208/ should make it stop doing that, but I haven't really tried reformatting much without the dot.

-Kevin
Re: [Apertium-stuff] REQUEST NICE FOR INSTALL APERTIUM IN VIRTUAL MACHINE
john felipe urrego mejia ingenierofelipeurr...@gmail.com writes:

> Hi, can someone give me some pointers on what I need to set up something like http://translator.apertium.eu/ in a local environment (192.168.xx.xx)? I have installed VirtualBox and an Ubuntu VM; please help.

You should be able to install Apertium + Lttoolbox + your language pairs of choice using this guide: http://wiki.apertium.org/wiki/Apertium_on_Ubuntu

If you want to set up a web page, I think https://help.ubuntu.com/community/ApacheMySQLPHP is the official Ubuntu guide. The source of apertium.org is in http://apertium.svn.sourceforge.net/viewvc/apertium/trunk/webspace

I don't know what translator.apertium.eu runs, but if you install Apache+PHP and put webspace in your /var/www, you should get your translator at 192.168.something.something.

-- Kevin Brubeck Unhammer
Re: [Apertium-stuff] From the libvoikko list: Developing spellchecker infrastructure from automata
Trond Trosterud trond.troste...@uit.no writes:

> Quoting a letter from a colleague of mine, Sjur Moshagen (sjur.mosha...@uit.no). Since many of the new apertium languages now are built as fst with lexc / hfst, a setup like the one referred to below would as a side effect create spellcheckers directly from apertium-based automata. People interested can contact Sjur (address above).

Not only the lexc/hfst-based ones; libvoikko has (experimental) support for lttoolbox fst's too :) see http://wiki.apertium.org/wiki/Spell_checking

Regarding the level of experimentalness, http://sourceforge.net/apps/trac/voikko/wiki/libvoikko/SupportedLanguages says:

* BLOCKER: Namespace pollution (generic class names in default namespace)
* PROBLEM: No proper method for performing analysis on a string argument.

The BLOCKER sounds like it requires a lot of sedding in lttoolbox, might require the same replacements in apertium too? The PROBLEM I'm pretty sure is solved. The method in http://wiki.apertium.org/wiki/Lttoolbox_API#Using_as_a_module_from_Python should work now (at least the known bugs are dealt with).

> ---
> Hello list members,
>
> The Divvun group at the University of Tromsø is looking for someone that could upgrade the VoikkoSpellService code for payment. The upgrade should contain at least the following features:
>
> * support for multiple speller languages
> * support for the latest version of libvoikko, with support for zhfst files
> * support for speller files within user home dirs (~/.voikko/...)
> * universal binary (at least 32 and 64 bit, perhaps also PPC)
>
> I have asked both the main developer of Voikko and the developer of the present VoikkoSpellService code if they would like to do it, but they have kindly asked me to forward the request to the list. If anyone is interested, please contact me off-list.
>
> Regards, Sjur Moshagen Divvun/UiT www.divvun.no

-- Kevin Brubeck Unhammer
Re: [Apertium-stuff] Using Lttoolbox from within java
stevens35 steven...@llnl.gov writes:

> Hello, I've been searching around for a good morphological analyzer for a while and came across Lttoolbox. The analyzer step does exactly what I want for words in a language: it splits the word into its lexical base and then adds morphological tags based on how the word was formed. Up until now, I've just been using the Porter Stemmer to get the root word, but it's always been displeasing because it throws away the rest of the surface form. However, most of the text processing code I work with is in Java, and if possible, I'd like to keep everything within Java. Has anyone had any experience linking to Lttoolbox from Java? Or does anyone know of any Java versions of Lttoolbox that utilize the existing dictionaries, or a similar tool for Java?

lttoolbox-java works fine with all the existing dictionaries, and should be feature-complete with the C++ version. lttoolbox and lttoolbox-java are completely independent of each other, so you don't need the C++ version to use the Java version and vice versa, so keeping everything within Java should work fine.

-- Kevin Brubeck Unhammer
Re: [Apertium-stuff] Apertium-af-nl release
Congrats :)

Pim Otte otte@gmail.com writes:

> Hello everyone,
>
> We are proud to present the release of apertium-af-nl v0.2.0, with the Afrikaans half from the currently dormant af-en project and the Dutch half a product of the past two Google Code-ins. This means apertium-af-nl is now in the trunk and the release is available at https://sourceforge.net/projects/apertium/files/apertium-af-nl/apertium-af-nl-0.2.0.tar.gz/download
>
> Some stats:
>
> Coverage:
> Afrikaans:
>   Number of tokenised words in the corpus: 4691956
>   Number of known words in the corpus: 3919969
>   Coverage: 83.5 %
> Dutch:
>   Number of tokenised words in the corpus: 105037639
>   Number of known words in the corpus: 86269947
>   Coverage: 82.1 %
>
> Number of entries:
>   Afrikaans: 7459
>   Bilingual: 6152
>   Dutch: 7236
>
> It is testvoc clean. More statistics and information on the construction can be found in Otte, P. and Tyers, F. M. (2011) Rapid rule-based machine translation between Dutch and Afrikaans. Proceedings of the 16th Annual Conference of the European Association of Machine Translation, EAMT11 ( xixona.dlsi.ua.es/~fran/publications/eamt2011a.pdf )
>
> Regards, Pim
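The coverage figures in the announcement are simply known words divided by tokenised words. As a quick check, a minimal sketch (the helper name coverage is made up here and is not part of any Apertium tool):

```shell
# coverage KNOWN TOTAL: print known/total as a percentage with one decimal.
# Hypothetical helper for checking the figures quoted in the announcement.
coverage() {
  awk -v known="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * known / total }'
}

coverage 3919969 4691956      # Afrikaans corpus
coverage 86269947 105037639   # Dutch corpus
```

The two calls reproduce the 83.5 % and 82.1 % reported for Afrikaans and Dutch.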
Re: [Apertium-stuff] Lint Checker Ideas for GSOC
Aaron Rubin aaronjrub...@gmail.com writes:

> Hi all, I've spoken with Francis a few times over e-mail and been on IRC a bit, but I don't think I've introduced myself to the whole listhost. I'm a third-year student at the University of Chicago, majoring in linguistics with a minor in Comp Sci. Most of my programming experience is doing various analyses of text files in C, so it seemed that of all the project ideas, the lint tester for suspicious constructs in .dix files would be the best for me (I thought about proposing a Japanese-English language pair, but Google does a fairly OK job with Japanese as it is, and there's no way I could surpass that in three months). I've already written a duplicate tag checker in C and sent it out to the listhost earlier today, and I've been thinking about how I'd implement some of the other suggestions on the lint tester ideas page, as well as a few ideas of my own. The problem, though, is that I'm not sure how I'd be able to fill up the whole summer doing it! This is my tentative schedule:
>
> Week 1: Redundant Entry Finder
> Week 2: Testing Full Entries in Lemmas where Part of the Lemma is Specified by the Pardef
> Week 3: Testing Misspelled Tags and Pardefs
> Week 4: Testing Incompatible Tags (multiple gender tags instead of combined tags for nouns of ambiguous gender, multiple number tags, a noun and adj tag on the same entry)
> Week 5: Testing Tag Missing on One Side of Translation Equivalents (a noun tag on the English side, but not on the Spanish side)
> Week 6: Testing Missing Gender on Gendered Languages (this would be an intricate one... I'd have to investigate which of the languages in the language pairs have gender or noun class systems and have the program take that into account)
>
> But not all of those would necessarily take up a week, and there's no way that all of this will take 12 weeks! So I've been thinking about common errors that might show up in transfer rules files, but nothing's really come to mind. Has anyone else noticed common mistakes in .dix or transfer rules files that would be suitable for this kind of program to look for?

Say you're editing a transfer file that has

<def-attr n="a_det">
  <attr-item tags="det"/>
  <attr-item tags="det.emph"/>
  <attr-item tags="det.dem"/>
  <attr-item tags="det.itg"/>
  <attr-item tags="det.qnt"/>
  <attr-item tags="det.pos"/>
</def-attr>
…
<not><equal><clip pos="1" side="tl" part="a_det"/><lit v=""/></equal></not>

(ie. it's not a determiner at all) and you want to make it a more specific requirement, like it has to be the tag sequence <det><pos>. It's easy to leave out the -tag and write

<not><equal><clip pos="1" side="tl" part="a_det"/><lit v="det.pos"/></equal></not>

where the correct version would be

<not><equal><clip pos="1" side="tl" part="a_det"/><lit-tag v="det.pos"/></equal></not>

or to write det.poss or something, which would never match since it's not defined in a_det. Here you could give a warning if the user tests for a def-attr-defined clip being anything other than 1) empty, 2) a tag/tag sequence from the def-attr, or 3) a variable. There are also default clips not defined in def-attr, like lemh, lemq, lem, that can contain empty or non-empty lit's, but never tags. I guess you could also do the same for begins-with instead of equal.

You could probably also warn about

<in><clip part="a_det"/><list n="some-list-that-is-disjoint-from-a_det"/></in>

And then there's calling a macro with the wrong number of arguments; the various vm for transfer compilers show this check, but the standard one does not, so it wouldn't hurt to put it in.

-Kevin
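One of the checks suggested above (a def-attr clip compared against a non-empty lit where lit-tag was probably meant) could be prototyped along these lines. This is only a crude sketch, not an existing Apertium tool: the function name lint_lit is made up, it works line by line instead of parsing the XML, and it assumes that the clip and the lit of one comparison sit on the same line and that attributes use double quotes.

```shell
# lint_lit FILE (hypothetical helper): warn when a <clip> whose part is
# declared in a <def-attr> shares a line with a non-empty <lit v="...">.
# Empty <lit v=""/> and <lit-tag .../> are not reported.
lint_lit() {
  awk '
    # Remember every attribute name declared with <def-attr n="NAME">.
    match($0, /<def-attr n="[^"]+"/) {
      attrs[substr($0, RSTART + 13, RLENGTH - 14)] = 1
    }
    # A clip and a non-empty plain <lit> on the same line: check the part.
    /<clip / && /<lit v="[^"]+"/ {
      for (a in attrs)
        if (index($0, "part=\"" a "\""))
          printf "%s:%d: non-empty <lit> compared with def-attr %s; did you mean <lit-tag>?\n", FILENAME, FNR, a
    }
  ' "$1"
}
```

Run as lint_lit on a .t1x file, it prints one warning per suspicious line. The other cases mentioned (a tag sequence not listed in the def-attr, disjoint lists, macro argument counts) would need a real XML parser and are not covered by this sketch.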
Re: [Apertium-stuff] Lint Checker Ideas for GSOC
Jacob Nordfalk jacob.nordf...@gmail.com writes:

> 2012/3/21 Aaron Rubin aaronjrub...@gmail.com
>> But not all of those would necessarily take up a week, and there's no way that all of this will take 12 weeks! So I've been thinking about common errors that might show up in transfer rules files, but nothing's really come to mind. Has anyone else noticed common mistakes in .dix or transfer rules files that would be suitable for this kind of program to look for?
>
> This might not strictly be a lint checker's job, but have a look at 'beginner errors' like breaking the XML or not following the XML Schema:
>
> - forgetting an end tag (like writing <s n="adj"> instead of <s n="adj"/>)
> - messing up the quotes, like writing <s n="adj>
> - mis-naming an attribute
> - forgetting attributes
>
> etc. I am not sure these kinds of errors are always reported to the user in a meaningful way by the compiler. However, I am sure that there are some users who spend a lot of time struggling with such errors. So I'd suggest leaving a week for seeing if there is something you could do to help out dix editor novices.

Couldn't the lint just run apertium-validate-dictionary first? Although, one issue is what to do about those who use xslt transformed dictionaries (e.g. using the alt attribute). I'm guessing it would be easier to run lint on the transformed dictionary, but not as helpful since line numbers would have changed. On the other hand, if you run lint on the source dictionary, it won't validate (though you can still check that it's well-formed XML).

-Kevin
Re: [Apertium-stuff] Lint Checker Ideas for GSOC
Jimmy O'Regan jore...@gmail.com writes:

> On 22 March 2012 08:29, Francis Tyers fty...@prompsit.com wrote:
>> isn't there. (b) Or using lit when you mean lit-tag because what you're checking against can only be a tag.
>
> Counter example:
>
> <let><clip pos="1" side="tl" part="some_part"/><lit v=""/></let>

http://permalink.gmane.org/gmane.comp.nlp.apertium/1676 so def-attrs can be empty lit's, lit-tags, or variables (but not non-empty lit's), and this goes for at least equal, begins-with (ends-with? can't remember if we have that), and let.

Another exception is that you can do stuff like

<concat><lit v="&lt;"/><lit v="tag"/><lit v="&gt;"/></concat>

on the right hand side. I'm not sure why you'd want to though (unless you were using a variable, in which case we're out of lint's league anyway), and in the above example I would want a warning.

-Kevin
Re: [Apertium-stuff] GSoC: Adopting a language pair: Tur-Tat / Kaz-Tat
Francis Tyers fty...@prompsit.com writes:

> On Mon, 26 Mar 2012 at 07:08 +0400, Ilnar Salimzyan wrote:
>> I am sorry that this email is so long. Honestly, I shortened it several times. Consider it to be the first draft of the proposal.
>
> No problem :)
>
>> Dear Apertium mentors, my name is Ilnar Salimzyanov (‘selimcan’ on Sourceforge and IRC, ‘Ilnar.salimzyan’ on Apertium’s wiki, ‘Ilnar Salimzyan’ in many other places).
>
> Hi Ilnar! :D
>
>> My native language is Tatar; I also speak Russian at a native level.
>>
>> = Reason I am writing =
>>
>> I would like to apply for Google Summer of Code and work on adopting the Turkish-Tatar / Kazakh-Tatar language pair. I am writing here to discuss my plans and to get some feedback, which would facilitate writing my proposal.
>>
>> = Who I am / Some background information =
>>
>> I am a first-year master’s student at the Kazan Federal University, studying Applied Linguistics [1]. I first got to know about Apertium in 2009, while writing a small paper at the university on a comparison of available machine translation systems. Apertium fascinated me then, being open source, showing rapid growth and being a good potential starting point for Tatar and other Turkic languages (yes, I have thought about them too). I played around with an lttoolbox dictionary for Tatar (bad idea, I know, but I didn’t know about FSTs then and there weren’t any other Turkic languages involved). I even managed to model noun morphotactics using it! :)
>
> Well, lttoolbox also produces FSTs; the difference is that in HFST you have separate morphotactics and morphophonology transducers, which are then composed to form the final transducer, while in lttoolbox there is only a single transducer.
>
>> Back in 2009 I translated part of the Official Documentation into Russian [2] (up to chapter 3.2.3; besides someone willing to finish it, the translation needs a good editor). Also in 2009 I translated the Apertium New language pair Howto into Russian. I was one of the participants of the Šupaškar Apertium Workshop, held in January this year, where Francis Tyers, Hector Alos-i-Font, Jonathan Washington and Trond Trosterud were instructors.
>
> Cool =D
>
>> I was very fortunate to see Jonathan and Francis work on the Tatar-Bashkir pair as an example pair for the Šupaškar Workshop and move it to nursery. It is very useful to have a transducer for my native language (and a language closest to it) to learn the semantics and structure of lexc and twol files (which I wasn’t really familiar with, since using HFST with Apertium is a relatively new thing and it is not mentioned in the Official Documentation), along with reading the famous FSMBook. :) I have been involved in work on the Tatar-Bashkir pair as, let’s say, “language consultant” and “tester”. With another fellow from Ufa we have been translating the top-5000 wordlist of the Russian National Corpus into Tatar and Bashkir. These translations were then added to the translator files. Also, I have been analyzing some errors in the translations, finding out where Apertium-tt-ba performed not so well, describing it on the wiki [3,4] and committing from time to time to svn.
>>
>> = Resources =
>>
>> // I will list all relevant resources on the wiki before submitting the proposal //
>>
>> For both language pairs I will not have to start from absolute scratch. Transducers for all three languages (Turkish, Kazakh and Tatar) perform quite well, having 87%, 76% and 56% coverage respectively [5]. Given that, I think the way to benefit most from these separate transducers with the least work is to write bidix files, translating words from each lexc file into Tatar.
>>
>> == Bilingual dictionaries ==
>>
>> ===Kazakh===
>>
>> All words in kazakh.lexc [6] were commented with English glosses (thanks to whoever did this!). Using a simple sed one-liner, I prepared bidix entries with Kazakh words as the left side, putting the English glosses into comments again. In a few hours' work, I translated ~500 nouns (not proper nouns) and most of the adjectives into Tatar [7]. For Kazakh words which look very similar to Tatar ones and have the same meaning as these Tatar equivalents, this can be done very quickly. For others I consulted Kazakh-Russian dictionaries too, but again, translating all remaining words from kazakh.lexc will take no more than a few days of focused work.
>>
>> ===Turkish===
>>
>> Unfortunately very few words in Turkish.lexc have English glosses. But there is a Tatar-Turkish dictionary, which was released under GPL [8], and another Tatar-Turkish online dictionary [9], also under GPL. The process will be similar for Turkish too: take stems from turkish.lexc, put them automatically into the bidix and translate them into Tatar, consulting where necessary the dictionaries mentioned above or a Turkish-Tatar dictionary in print.
>>
>> ==Parallel corpora==
>>
>> Some sentences for Turkish-Tatar are available at the Tatoeba project. As a source for parallel corpora Bible or Quran translations can
Re: [Apertium-stuff] Proposal: don't prefix paths in apertium-gen-modes (but prefix dirname $0 to PATH in apertium)
Stephen Tigner stephen.tig...@gmail.com writes:

> On Sun, Mar 25, 2012 at 10:49 AM, Jacob Nordfalk jacob.nordf...@gmail.com wrote:
>> +1 for the proposal.
>>
>> 2012/3/24 Stephen Tigner stephen.tig...@gmail.com
>>> I think I'm gonna need to read that again a few times to see if that'd affect the Java runtime at all,
>>
>> Unhammer is actually writing this as a result of a discussion I started on IRC, on the occasion that I didn't like lttoolbox-java having to cut away these paths. (Unhammer, I want you as my ghost writer :-)
>
> Ah, I see. Makes more sense now, thanks.
>
>>> but I thought I'd at least pitch in with an explanation of how the Java runtime currently handles .mode files.
>>
>> Thanks for a great explanation. Anyone who wants to browse the code he explains can look at http://apertium.svn.sourceforge.net/viewvc/apertium/trunk/lttoolbox-java/src/org/apertium/pipeline/
>>
>>> A quick fix that uses path could be to check for existence of the program at the specified path, and if not, try running it w/ just the command name w/o a full path.
>>
>> It's not that clear from the code diff, but the idea is first to look for the commands in the installation dir, then on the general PATH:
>>
>> PATH=${APERTIUM_PATH}:${PATH}
>>
>> I think the java port should do the same, but first check if the task can be done by lttoolbox-java itself internally. So, like
>>
>> PATH=can we do it without invoking external stuff?:${APERTIUM_PATH}:${PATH}
>>
>> :-)
>
> Ah, okay, so I'm assuming APERTIUM_PATH is an environment variable? If so, that should be fairly easy to implement. Just need to tweak a bit how the UNKNOWN programs are called. I'll try and take a look at it tonight if I have time. n.n

If you install apertium to /usr/local/bin/, the shell script /usr/local/bin/apertium will have APERTIUM_PATH=/usr/local/bin as the second line.

-- Kevin Brubeck Unhammer
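The lookup order discussed in this thread (the program at the path given in the mode file if it exists, otherwise the installation directory, otherwise the general PATH) can be sketched as below. This is an illustration under assumptions, not the code in the apertium script or in lttoolbox-java: the helper name resolve_cmd is made up, and the /usr/local/bin default is only the example value mentioned in the thread.

```shell
# Assumed default; in an installed apertium script the value is written into
# the script itself, as described above.
APERTIUM_PATH=${APERTIUM_PATH:-/usr/local/bin}

# Look in the installation directory first, then on the general PATH.
PATH="${APERTIUM_PATH}:${PATH}"

# resolve_cmd PROG (hypothetical helper): if PROG exists as given and is
# executable, use it; otherwise strip the directory part and look the bare
# command name up on PATH.
resolve_cmd() {
  if [ -x "$1" ]; then
    printf '%s\n' "$1"
  else
    command -v "$(basename "$1")"
  fi
}
```

For example, resolve_cmd /some/old/prefix/bin/lt-proc would fall back to whichever lt-proc is found first on the adjusted PATH, and returns a non-zero status if there is none.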
Re: [Apertium-stuff] About de-duplicating of dictionaries
Ilnar Salimzyan ilnar.salimz...@gmail.com writes:

This thread grew out of the discussion of my proposal draft [see "GSoC: Adopting a language pair: Tur-Tat / Kaz-Tat" from March 26]. Having discussed the problem of monodixes/lexc files being copied into many pairs (and into more and more pairs) with Jonathan, and seeing that people on IRC come to this question quite often (like "What lexc of Tatar should I choose for my new Tatar-X translator?"), I decided to start a new discussion here :)

On Mon, Mar 26, 2012 at 2:37 PM, Kevin Brubeck Unhammer unham...@fsfe.org wrote:

It'd be nice to have some general method for deduplicating dictionaries

I think we all share the same view. Obviously, having single transducers for many related languages, compatible with each other, is great. It would facilitate the creation of new translators. And I think that keeping them compatible on the tags/morphotactics level can and should be done.

…

We use a trimming script in apertium-sme-nob; with this method, you would have apertium-kaz and apertium-tat as just development dependencies. So you'd add stuff to apertium-kaz/kaz.lexc and to your bidix, and then run a script from apertium-kaz-tat with the path to apertium-kaz and it creates a file apertium-kaz-tat/kaz.lexc (and you never change this file, although it's in SVN). Similarly for tat.lexc. This works, as long as the trimming script is well configured, but perhaps it'd be 'cleaner' to have apertium-kaz/apertium-tat as make dependencies and do the trimming each time you type make (no need for apertium-kaz-tat to have generated kaz.lexc/tat.lexc files in SVN).

(The weak point in the chain is the trimming script though, which expects the lexc files to be fairly easily parsable (they're not, really). Ideally we would have ways of trimming both HFST and lttoolbox dictionaries so that we never had to copy-paste anything between pairs, but language pairs tend to have stuff in them that's rather specific to that pair, not sure how that is best dealt with.)

= Reasons why we have monodixes copied =

1. Historical (there weren't many pairs with a common part initially, but Apertium keeps growing);
2. Because of the stuff specific to a given pair.

= Some imaginable solutions =

Just to sum up:

1. Transducers for language A and language B as make-dependencies;
2. Mono-dictionaries in apertium-langA and apertium-langB as development-dependencies + some trimming / duplicating / keeping-up-to-date scripts.

= Strengths and weaknesses of each solution =

Strengths and weaknesses become clear when we 'do' need to add language-pair-specific stuff to mono-dictionaries. All examples that come to mind are for Russian-Tatar (= not related languages), so for related languages this might not be relevant. Maybe they won't need any pair-specific stuff in their mono-dictionaries at all, but this sounds too good to be true :)

Consider the Russian word заговорить ("start to talk"). It is translated into Tatar with two words, just as into English. And in a Russian-Tatar / Russian-English pair we will need to add "start to talk" as a multiword. I am sure that similar cases, where a word of languageA is translated into languageB with a multiword, can be found for related languages too.

== 1. Make-dependencies ==

We can add such words to the monodictionaries in apertium-langA, separating them into sublexicons or commenting them like "this stuff is needed for the langA-langB pair". But this way the transducer will become noisier and noisier.

== 2. Mono-dictionaries in apertium-langA and apertium-langB as development-dependencies + some trimming / duplicating / keeping-up-to-date scripts ==

In this case the monodictionaries in apertium-langX are considered to be something like vanilla software. They are kept close to the linguistic traditions of POS-tagging etc. And they serve as a base for building new pairs involving these languages. Modifying them for a given pair is like patching the vanilla software. A script could keep these modified versions in apertium-langX-langY up-to-date with the mono-dictionaries in apertium-langX and apertium-langY. A challenge here is not to overwrite modifications while updating. Although the script used in sme-nob solves the problem of updating, as I understand it, it will overwrite any modifications made in apertium-sme-nob. And I am not sure if this can be done at all technically.

We never modify the trimmed dictionary, we consider it a generated file. All modifications go to the dictionary it was trimmed from. Although we don't, we _could_ actually have sme-nob-specific additions to the sme dictionary. It shouldn't be much worse than concatenating another .lexc file onto the trimmed sme.lexc. Note that this would only be lexicon additions (like "start to talk", good example), not changes to tagging etc. On the other hand, if you're already trimming, it shouldn't hurt to put "start to talk" into the monolingual module (apertium-eng or whatever
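To make the "trim, then concatenate pair-specific additions" idea above a bit more concrete, here is a minimal sketch in Python. It is not the actual apertium-sme-nob trimming script; the function name `trim_lexc`, the `stem_lexicons` parameter and the one-entry-per-line assumption are all hypothetical, and real lexc files are (as noted above) much harder to parse than this.

```python
def trim_lexc(lines, bidix_lemmas, stem_lexicons):
    """Keep only stem entries whose lemma occurs in the bidix.

    Toy assumptions (not true of real lexc in general):
    - one entry per line, shaped like 'lemma:stem ContClass ;'
    - only lexicons named in stem_lexicons hold lemma entries
    - comments start with '!'
    """
    kept = []
    current = None  # name of the LEXICON we are currently inside
    for line in lines:
        tokens = line.split()
        if len(tokens) > 1 and tokens[0] == "LEXICON":
            current = tokens[1]
            kept.append(line)
            continue
        stripped = line.strip()
        is_entry = (current in stem_lexicons
                    and stripped != ""
                    and not stripped.startswith("!"))
        if is_entry:
            upper = stripped.split(":", 1)[0].split()
            lemma = upper[0] if upper else ""
            if lemma not in bidix_lemmas:
                continue  # lemma is not in the bidix: trim it away
        kept.append(line)
    return kept


def build_pair_lexc(mono_lines, bidix_lemmas, stem_lexicons, pair_additions):
    """Trimmed monolingual lexc followed by pair-specific lexicon additions."""
    return trim_lexc(mono_lines, bidix_lemmas, stem_lexicons) + list(pair_additions)
```

The output of `build_pair_lexc` is meant to be treated as a generated file: edits go into the monolingual lexc or into the additions file, never into the result.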
Re: [Apertium-stuff] apertium tagger usage
Orosz György oros...@itk.ppke.hu writes:

Dear All, I am asking for your help, and hope someone can clarify these things: I am wondering if it is possible to use the apertium tagger as a standalone application, without creating all the resources used by the MT system.

It's possible to use it by itself, like

$ echo '^foo/foo<n><sg>/foo<ij>$ ^bar/bar<n><sg>/bar<vblex><inf>$' | apertium-tagger en.prob

I don't think you can compile apertium-tagger without compiling other core apertium functions, if that's what you mean. But the only other build dependency you have to compile is lttoolbox, and when you've got them installed, apertium-tagger is perfectly usable by itself (without using the rest of the scripts).

We have a morphologically disambiguated training corpus, and a morphological analyzer. Is it possible to train the tagger in a supervised mode using only these resources? (Of course we can convert the output of the MA to the format that is used by Apertium.) If not, could anyone explain how to use the tool? The manual states the following parameters are needed: apertium-tagger [-d] -s=n DIC CRP TSX TAGGER_DATA HTAG UNTAG

http://wiki.apertium.org/wiki/Tagger_training should be a good starting point (it seems no one has written the section on Supervised training, but see Question 2 under http://wiki.apertium.org/wiki/Unsupervised_tagger_training#Improving_the_tagger_performance ).

hope this helps,
Kevin
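As a rough illustration of "converting the output of the MA to the format that is used by Apertium", the sketch below builds one lexical unit of the stream format (`^surface/lemma<tag>.../lemma<tag>...$`) from a list of analyses. The function name and the (lemma, tags) input shape are assumptions made for the example, and escaping of reserved characters is left out.

```python
def to_stream_unit(surface, analyses):
    """Format one token as an Apertium lexical unit.

    analyses is a list of (lemma, [tag, ...]) pairs, e.g. taken from
    some other morphological analyser (hypothetical input shape).
    Tokens without analyses are marked as unknown with '*'.
    """
    if not analyses:
        return "^{0}/*{0}$".format(surface)
    readings = []
    for lemma, tags in analyses:
        readings.append(lemma + "".join("<{}>".format(tag) for tag in tags))
    return "^" + "/".join([surface] + readings) + "$"


if __name__ == "__main__":
    units = [
        to_stream_unit("foo", [("foo", ["n", "sg"]), ("foo", ["ij"])]),
        to_stream_unit("bar", [("bar", ["n", "sg"]), ("bar", ["vblex", "inf"])]),
    ]
    print(" ".join(units))
```

The printed line has the same shape as the echo example in the message above, so it could be piped into apertium-tagger in the same way.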
Re: [Apertium-stuff] About de-duplicating of dictionaries
Ilnar Salimzyan ilnar.salimz...@gmail.com writes: On Wed, Mar 28, 2012 at 11:11 AM, Kevin Brubeck Unhammer unham...@fsfe.org wrote: [snip]
[Apertium-stuff] soft hyphens and tokenisation
Hi,

I notice that soft/hidden hyphens (&#173;) can split words, e.g. in Jespersen there's a soft hyphen between n and t, but it should be analysed as one word. I've noticed this a lot in web pages, I guess a lot of news sites and such use programs that hyphenate using that character.

The problem is, if we don't have the soft hyphen in the alphabet, we get two lexical units; if we have it there, we get one unknown word, even if Jespersen is in the dix.

Is it possible to use ACX files[1] or something to say that any soft hyphen can be skipped? It seems sort of similar to what ACX does at least …

[1] http://wiki.apertium.org/wiki/Acx

-Kevin
Re: [Apertium-stuff] soft hyphens and tokenisation
Kevin Brubeck Unhammer unham...@fsfe.org writes:

Hi, I notice that soft/hidden hyphens (&#173;) can split words, e.g. in Jespersen there's a soft hyphen between n and t, but it should be analysed as one word.

Wops, between r and s!

I've noticed this a lot in web pages, I guess a lot of news sites and such use programs that hyphenate using that character. The problem is, if we don't have the soft hyphen in the alphabet, we get two lexical units; if we have it there, we get one unknown word, even if Jespersen is in the dix. Is it possible to use ACX files[1] or something to say that any soft hyphen can be skipped? It seems sort of similar to what ACX does at least …

[1] http://wiki.apertium.org/wiki/Acx

-Kevin
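Until the analyser itself can skip soft hyphens, one workaround is to strip them from the text before it reaches lt-proc. The following is only a sketch of such a pre-processing step (the function name is made up for the example); it is not something Apertium does by itself.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD, written &#173; in HTML


def strip_soft_hyphens(text):
    """Remove every soft hyphen so that hyphenated words stay one token."""
    return text.replace(SOFT_HYPHEN, "")


if __name__ == "__main__":
    import sys
    # Filter stdin to stdout so the script can sit in front of a pipeline.
    for line in sys.stdin:
        sys.stdout.write(strip_soft_hyphens(line))
```

Note that this throws the hyphenation hints away, so the translated page will not contain them either.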
Re: [Apertium-stuff] suggestion for lt-proc generation
Francis Tyers fty...@prompsit.com writes:

Hello all, What do you think about having a new mode (yes, *groan*, a new mode) for lt-proc where we can generate keeping the lexical form, e.g.

Input: ^cantar<vblex><pres><p1><sg>$ ^de<pr>$
Normal output '-g': canto ~de
Output with '-k -g' mode: ^canto/cantar<vblex><pres><p1><sg>$ ^~de/de<pr>$

This would be teamed with a '-k -p' mode for postgeneration which would strip the analysis: canto ~de

Why would this be useful? Well, I could imagine you could do stuff with the analysis. Like with a language model or something. Hmm, if, e.g. you wanted to have a module which post-edited prepositions or something and you wanted to train it on lexical forms as well as surface forms.

Fran

+1. Could be useful in testvoc as well. Also, lt-proc -a already goes from l to both l and r – going from r to both l and r would make -g more symmetric :)

-Kevin
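To illustrate what the "strip the analysis" step of the proposed '-k -p' mode would amount to, here is a small sketch that turns `^surface/lexical form$` units back into plain surface forms. It only shows the idea on strings; it is not how lt-proc is implemented, and it ignores escaped characters in the stream.

```python
import re

# One lexical unit: ^surface/anything-up-to-the-closing-dollar$
UNIT = re.compile(r"\^([^/$]*)/[^$]*\$")


def strip_analyses(stream):
    """Replace each ^surface/analysis$ unit by its surface form."""
    return UNIT.sub(r"\1", stream)


if __name__ == "__main__":
    kept = "^canto/cantar<vblex><pres><p1><sg>$ ^~de/de<pr>$"
    print(strip_analyses(kept))
```

A module sitting between the two steps (a language model, a preposition post-editor) would see both the surface form and the lexical form of every unit.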
Re: [Apertium-stuff] Install on Debian server
Per Tunedal per.tune...@operamail.com writes:

Hi, my list of locales: C POSIX sv_SE.UTF-8 Do I need something for the other languages like Spanish and French? How do I create that?

Any UTF-8 locale should do. Does

$ echo "J'ai deux frères" | LANG=sv_SE.UTF-8 apertium fr-es

not work either?

-Kevin

On Thu, May 10, 2012, at 09:30, Kevin Brubeck Unhammer wrote:

Per Tunedal per.tune...@operamail.com writes:

--snip--

1. special characters aren't recognised, e.g. the example echo "J'ai deux frères" | apertium fr-es gives an error on frères.

I think you just need to set a UTF-8 locale, put e.g.

export LANG=sv_SE.UTF-8

in your ~/.bashrc or any scripts that run apertium. The command

$ locale -a

should give you a list of locales; if you don't have any UTF-8 ones, you can do e.g.

$ echo sv_SE.UTF-8 UTF-8 | sudo tee -a /var/lib/locales/supported.d/local
$ sudo dpkg-reconfigure locales

--snip--

-Kevin
Re: [Apertium-stuff] Install on Debian server
Per Tunedal per.tune...@operamail.com writes:

No, that doesn't work either. I get: Tengo dos *fruser@computer:~$ :(

Then I'm at my wits' end. Anyone?

I have a desktop installation of Debian where I have installed Apertium with Synaptic. It works as expected, with one exception: If I put an exclamation mark after a sentence I get: !: Event not found (The same happens in my server installation.)

Yeah, that's not Apertium-related. Bash interprets ! as a special symbol even within "-quotes, so you'll get that even with just

$ echo "J'ai deux frères!"

You can use

$ echo "J'ai deux frères"'!'

instead (i.e. you can follow "-quotes by '-quotes, and so on), but the safest general method is to put the text into a file.

-Kevin
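For scripts that have to build such a command line, the quoting can be left to a library instead of being done by hand. The sketch below uses Python's `shlex.quote`, which wraps the text in '-quotes (where bash does no ! expansion) and splices in any apostrophes separately; the helper name `apertium_command` is made up for the example.

```python
import shlex


def apertium_command(text, pair):
    """Build a shell command line that pipes text through apertium.

    shlex.quote puts the text inside single quotes, so neither the
    apostrophe in "J'ai" nor a trailing "!" is interpreted by bash.
    """
    return "echo {} | apertium {}".format(shlex.quote(text), shlex.quote(pair))


if __name__ == "__main__":
    print(apertium_command("J'ai deux frères!", "fr-es"))
```

Passing the text on standard input from a file, as suggested above, avoids the quoting question altogether.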
Re: [Apertium-stuff] suggestion for lt-proc generation
Jimmy O'Regan jore...@gmail.com writes:

On 30 April 2012 10:39, Jimmy O'Regan jore...@gmail.com wrote:

On 30 April 2012 10:21, Francis Tyers fty...@prompsit.com wrote:

Hello all, What do you think about having a new mode (yes, *groan*, a new mode) for lt-proc where we can generate keeping the lexical form, e.g. Input: ^cantar<vblex><pres><p1><sg>$ ^de<pr>$

lt-proc -l

...by which I mean, 'that mode already exists, and it's -l (or --tagged-gen)'. I added it for feeding Apertium output into a speech synthesiser, so I had no need for post-generation. Sergio originally committed it as '-b' but later had a better candidate for 'b' so changed it to 'l'.

He must have missed 'k' :)

It seems to skip @-tagged words:

$ echo kiteb|apertium -d . mt-ar-transfer
^@kiteb<vblex><past><p3><m><sg>$^.<sent>$
$ echo kiteb|apertium -d . mt-ar-transfer |lt-proc -l mt-ar.autogen.bin
^./.<sent>$

(bug or feature?)

-Kevin
Re: [Apertium-stuff] Proposal: don't prefix paths in apertium-gen-modes (but prefix dirname $0 to PATH in apertium)
Jacob Nordfalk jacob.nordf...@gmail.com writes:

Hi there! As no objections have been raised I think we can conclude that this proposal has PASSED :-) Unhammer, could you apply your patch to the SVN trunk? Jacob

Committed revision 38346.

2012/3/27 Stephen Tigner stephen.tig...@gmail.com

On Mon, Mar 26, 2012 at 1:04 PM, Jacob Nordfalk jacob.nordf...@gmail.com wrote:

2012/3/26 Stephen Tigner stephen.tig...@gmail.com

[snip] Ah, okay, so I'm assuming APERTIUM_PATH is an environment variable?

Sorry, it wasn't clear (again - Unhammer, you're fired as ghost writer :-). APERTIUM_PATH would be the path to the 'apertium binary'. In shell language it would be expressed as

APERTIUM_PATH=$(dirname $0)

This makes sure that binaries are first searched for in the same directory as the 'apertium' command.

Ah, okay. That makes more sense then.

On Mon, Mar 26, 2012 at 1:08 PM, Kevin Brubeck Unhammer unham...@fsfe.org wrote:

Stephen Tigner stephen.tig...@gmail.com writes:

[snip] Ah, okay, so I'm assuming APERTIUM_PATH is an environment variable? If so, that should be fairly easy to implement. Just need to tweak a bit how the UNKNOWN programs are called. I'll try and take a look at it tonight if I have time. n.n

If you install apertium to /usr/local/bin/, the shell script /usr/local/bin/apertium will have APERTIUM_PATH=/usr/local/bin as the second line.

So basically it just prepends the apertium path to the system path? Well, I don't really think any modification of the existing Java code would be needed, then. Because the desired behavior is already the current behavior, as long as you remove the explicit paths from the mode file. This is because it always checks if it can be done internally first, and then it already depends on the host runtime to run the UNKNOWN programs, and that of course would reference the system path and any conventions the system has for finding executables to run. (Like the convention that the current working directory is always considered implicitly first on the path in Windows.)

I used the same trick (letting the host runtime handle path searching) for trying to run cygpath (since, AFAIK, I have no way of knowing where cygwin is installed, or at least not an elegant and robust way) and javac for run-time compilation of transfer files (for instance when running on a JRE instead of a JDK, but the JDK is present on the system and in the system path).

Hopeful that he's not just rambling and needing sleep now, ;)

-- Stephen
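The lookup order discussed in this thread (first the directory of the 'apertium' command itself, then the ordinary system path) can be sketched as follows. This is an illustration in Python rather than the shell or Java code actually used; the function name `find_program` and its parameters are invented for the example.

```python
import os
import shutil


def find_program(name, launcher_path, env_path=None):
    """Resolve a pipeline program the way the proposal describes.

    The directory containing the launcher (what $(dirname $0) gives in
    the shell script) is searched first, then the rest of PATH.
    Returns the full path, or None if the program is not found.
    """
    launcher_dir = os.path.dirname(os.path.abspath(launcher_path))
    if env_path is None:
        env_path = os.environ.get("PATH", "")
    if env_path:
        search_path = os.pathsep.join([launcher_dir, env_path])
    else:
        search_path = launcher_dir
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    # With apertium installed in /usr/local/bin, an lt-proc in that same
    # directory wins over any other lt-proc later on the PATH.
    print(find_program("lt-proc", "/usr/local/bin/apertium"))
```

With mode files that contain bare program names, this is the only place where installation paths need to be known.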
Re: [Apertium-stuff] # before translated word
Jimmy O'Regan jore...@gmail.com writes:

On 7 June 2012 11:00, Kevin Brubeck Unhammer unham...@fsfe.org wrote:

Unfortunately, searching for # puts you back at the front page (bug in mediawiki I guess)

Nope. HTTP does not transmit '#', which is for anchor names (and typing %23 is a pain :)

Well, the search box could still redirect to %23 on typing #, but I see http://wiki.apertium.org/wiki/%23 is an illegal page. Oh well.

-- Kevin Brubeck Unhammer GPG: 0x766AC60C
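The point about '#' can be checked with any URL library: everything after the '#' is the fragment, which the browser keeps to itself, and a literal '#' has to be sent percent-encoded as %23. A small sketch using Python's standard library (the helper name is made up, and the Main_Page URL is only an illustrative example):

```python
from urllib.parse import quote, urlsplit


def split_fragment(url):
    """Return (what is sent to the server, what stays in the browser)."""
    parts = urlsplit(url)
    request_part = parts._replace(fragment="").geturl()
    return request_part, parts.fragment


if __name__ == "__main__":
    print(split_fragment("http://wiki.apertium.org/wiki/Main_Page#Foo"))
    print(quote("#"))
```

So a search for a bare # ends up as a request for the page with an empty title, which would explain landing on the front page.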
Re: [Apertium-stuff] Online language pair packages
Mikel Artetxe artet...@gmail.com writes:

Hi everybody,

As some of you might know, I am working on the embeddability of lttoolbox-java as part of my GSoC project (you can follow my progress here). The central part of the project consists of having standalone packages for each language pair that can be run independently as well as easily integrated in bigger Java projects. This way, the project wouldn't make much sense if we don't maintain an infrastructure of ready-to-use packages online for all the language pairs that Apertium supports. Since this is something that would involve the whole Apertium community, I am writing to the list to, first, present to you what I have been working on so far and, second, get feedback from you, discuss all this, and adopt the decisions that we take.

First of all, let's see what those language pair packages consist of. In order to get a general idea, you can check any of the following links:

* Esperanto ⇆ English
* Basque → English
* Basque → Spanish

As long as you have Java on your machine, a simple program to translate between those languages should be launched (and if it is not working for you, please let me know). And I would like to remark that the only requirement is Java; the user doesn't need to have any other program installed on his/her machine, and it works on any operating system, including Windows. The app is run locally, so it can work offline.

The secret behind those links, that is, the real language pair packages, are provisionally kept here. The Jars there (one per language pair) are the actual self-contained Java executables, and not only do they work on the desktop, but they can also work on Android and will be adopted by Arink for the Android app that he is working on. In fact, any other Java program could easily use them thanks to the API class that lttoolbox-java now offers.

Very cool =D

And now the big question is, how can we maintain all this? I think that we can distinguish two different steps regarding it:

1) Create the packages (those self-contained Jar files)
2) Maintain the packages online (keep updated versions of all the language pairs online for anybody to use)

The solution that I have been (and I'm still) working on comes in the form of two bash scripts, one for each of the tasks (you can find them here in my branch):

1) apertium-pack-j offers an easy way to generate the packages. It requires lttoolbox-java (the one in my branch, not the one in trunk) and android-sdk to be installed, and their location must be specified by setting the LTTOOLBOX_JAVA_PATH and ANDROID_SDK_PATH environment variables. After that, you can simply run it, passing the path to the mode files for which you want to generate the package as arguments, and a ready-to-use package will be created by the script. For instance, the following command would create a ready-to-use package for the Esperanto-English language pair named apertium-eo-en.jar on my machine:

LTTOOLBOX_JAVA_PATH=/usr/local/share/apertium/lttoolbox.jar \
ANDROID_SDK_PATH=/home/mikel/developer/android-sdk-linux \
./apertium-pack-j /usr/local/share/apertium/modes/eo-en.mode /usr/local/share/apertium/modes/en-eo.mode

As you can see, I simply specify the correct location of lttoolbox-java and android-sdk on my machine, and pass the location of eo-en.mode and en-eo.mode (the main modes that correspond to the Esperanto-English language pair) as arguments to apertium-pack-j.

2) apertium-upload-j offers an easy way to maintain the packages created this way online. For instance, running the following command after the one shown in the previous step would automatically update (or upload for the first time) the package for Esperanto-English:

./apertium-upload-j apertium-eo-en.jar

More precisely, it would correctly rename the package to avoid duplication, generate a jnlp file (which is used to run the package through Java Web Start, as in the links above) and commit them to SVN (provisionally to my branch; Jacob suggested creating a specific directory such as binaries outside trunk for them, but any suggestion is welcome).

As an idea, both scripts could be integrated in the makefiles of each language pair so that a simple make upload, for instance, would automatically create and upload the appropriate packages.

I'd prefer that method over making one person (i.e., you) do all the maintenance; it would be nice to simply type make upload. The only thing that's sort of an annoyance is having to get the full android-sdk in addition to lttoolbox-java, but I guess that will be appearing in the various distro repositories …

-- Kevin Brubeck Unhammer
Re: [Apertium-stuff] Definite determiner in apertium-es-ca
Bernard Chardonneau bechapert...@free.fr writes:

Hey everybody.

After 10 days mostly out in nature without a computer, and just before another 8 weeks without a permanent internet connection (largely by choice), I want to give my opinion, as a new pair developer, on the discussion about what dictionaries should contain.

1) For monodixes, I completely agree with Fran and some others that all interesting information should be there, even if it is not used by several pairs. As doing that generally means writing a complete paradigm once, and afterwards just using it hundreds or thousands of times (for the main ones), it is not a big problem.

2) For bidixes, the most natural way to build them is to write something like:

<e><p><l>my_word<s n="kind1"/></l><r>my_translation<s n="kind2"/></r></p></e>

where kind1 and kind2 are often the same and can be built from the name of the paradigm used in the monodix.

I say this because I quickly realised that adding a new line by typing the right XML syntax into a file with more than 40 000 other lines quickly becomes painful. So I wrote a 4-parameter shell script to generate new lines, and another to put these lines in the right place. I think a lot of pair developers have their own shell scripts to do the same or something similar to build a bidix when the monodixes are available.

So, writing bidix lines like the one above means that any other <s n="something"/> would be better left out if not needed. Of course, there are exceptions which give pleasant results, as in the fr-es pair:

<e><p><l>coma<s n="n"/><s n="m"/></l><r>coma<s n="n"/><s n="m"/></r></p></e>
<e><p><l>virgule<s n="n"/><s n="f"/></l><r>coma<s n="n"/><s n="f"/></r></p></e>

or

<e><p><l>composant<s n="n"/><s n="m"/></l><r>componente<s n="n"/><s n="m"/></r></p></e>
<e><p><l>composante<s n="n"/><s n="f"/></l><r>componente<s n="n"/><s n="f"/></r></p></e>

But having to write (in the eo-fr pair)

<e><p><l>ABC<s n="np"/><s n="al"/></l><r>ABC<s n="np"/><s n="al"/><s n="mf"/></r></p></e>

without forgetting any <s n="al"/> or the <s n="mf"/>, to prevent getting a # in the translation, is not a very nice way to work. There is of course the problem of the beginner not doing that and asking on the list why it does not work. But that can be learned quickly. The most important problem is being obliged to do that almost always, and ending up with longer and slightly less readable lines in the bidix.

I think even in this case:

<e><p><l>ajout<s n="n"/><s n="m"/></l><r>adición<s n="n"/><s n="f"/></r></p></e>

(gender changing), there should be no need to give the gender if there is no word ambiguity in either language (unlike coma and componente in Spanish).

And of course something like:

<e r="LR"><p><l>binaire<s n="adj"/><s n="mf"/></l><r>binario<s n="adj"/><s n="GD"/></r></p></e>
<e r="RL"><p><l>binaire<s n="adj"/><s n="mf"/></l><r>binario<s n="adj"/><s n="f"/></r></p></e>
<e r="RL"><p><l>binaire<s n="adj"/><s n="mf"/></l><r>binario<s n="adj"/><s n="m"/></r></p></e>

would be simpler as one line.

So, the question is how to manage that without breaking things.

Solution 1: paradigm

Several people spoke about it but without details. I note that the <s n="kind"/> information inside bidixes can generally be generated from the name of the paradigm used in the monodix, which looks like something__kind (or foo__bar if you prefer). But of course, there is less information in kind than in something__kind. So a nice approach would be, for each paradigm of every monodix, to build a paradigm with the same name in the bidix, containing just an invariant list of information like:

<s n="thing1"/><s n="thing2"/>

And like that, even gender ambiguities like that of the Spanish word coma could be solved elegantly:

<e><p><l>coma<s n="livre__n"/></l><r>coma<s n="abismo__n"/></r></p></e>
<e><p><l>virgule<s n="abeille__n"/></l><r>coma<s n="abeja__n"/></r></p></e>

Didn't Jacob Nordfalk and Michael Kristensen make a script to do that kind of thing with sv-da? I.e. automatically create bidix pardefs based on monodix pardefs.

Solution 2: during compilation

That's another approach. When compiling bidix files, there are two cases:

- a piece of information is given in an <s n="thing"/>, so just use it
- this information is not given, so it is taken from the monodix.

Have a good summer.

You too :-)

-- Kevin Brubeck Unhammer GPG: 0x766AC60C
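The "4-parameter script to generate new lines" mentioned in this message is not shown, but the idea is easy to sketch. The Python function below (its name and parameters are invented for the example) builds one bidix entry from a left word, its tags, a right word and its tags, with an optional r="LR"/"RL" restriction; multiword lemmas, which need <b/> for spaces, are not handled.

```python
from xml.sax.saxutils import escape, quoteattr


def symbols(tags):
    """Render a list of tag names as <s n="..."/> elements."""
    return "".join("<s n={}/>".format(quoteattr(tag)) for tag in tags)


def bidix_entry(left, left_tags, right, right_tags, restriction=None):
    """Build one <e> line for a bidix (single-word lemmas only)."""
    attr = " r={}".format(quoteattr(restriction)) if restriction else ""
    return "<e{}><p><l>{}{}</l><r>{}{}</r></p></e>".format(
        attr,
        escape(left), symbols(left_tags),
        escape(right), symbols(right_tags),
    )


if __name__ == "__main__":
    print(bidix_entry("virgule", ["n", "f"], "coma", ["n", "f"]))
    print(bidix_entry("binaire", ["adj", "mf"], "binario", ["adj", "GD"], "LR"))
```

Generating the lines this way at least removes the risk of a mistyped tag or a forgotten closing element; it does not by itself solve the problem of which tags have to be listed.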
Re: [Apertium-stuff] New applications: Apertium Caffeine and Apertium plug-in for OmegaT
Mikel Forcada m...@dlsi.ua.es writes:

[...]

There is one thing that could be easily solved. Víctor Sánchez (cc-ed) maybe can help you. When one uses the Apertium webservice from inside OmegaT, we avoid translating the tags (<u0>, etc.). Some minor changes were made to the code that calls Apertium as a webservice (you'll easily find them, but if not, I can help) and some changes were made in the webservice itself (Víctor can help here). I think it is a matter of using some regular expressions to hide these in some way...

I guess that you are talking about this. I might be blind, but I haven't been able to identify the relevant piece of code there...

You're right. Most of the work is done at the Apertium server when it receives format=omegat.

Perhaps you can just use the translate me<apertium-notrans>don't translate me</apertium-notrans> method; this works in e.g. the html and html-noent formats (grep tells me it should also be supported in odt, pptx, xlsx, wxml).

[...]

Yes. We should probably create a new directory in SVN and start creating and uploading packages for every language pair. The question is how to maintain it in the long term: we could integrate my script in the makefiles of each language pair to make things easier (although the dependency on Android-SDK and lttoolbox-java can still be a problem for some people), but we would still need the involvement of every language pair developer in Apertium (or someone responsible for taking care of the whole maintenance). This deserves deeper thought. Any ideas?

I liked the idea of just adding a make goal, though perhaps the script could be installed by lttoolbox-java (since that's a dependency of the script anyway), so that copies wouldn't be required by every language pair?

-- Kevin Brubeck Unhammer GPG: 0x766AC60C
Re: [Apertium-stuff] New applications: Apertium Caffeine and Apertium plug-in for OmegaT
Jimmy O'Regan jore...@gmail.com writes:

On 6 August 2012 10:24, Mikel Artetxe artet...@gmail.com wrote:

apertium-es-ro: Document apertium-es-ro.trules-ro-es.xml does not validate against /usr/local/share/apertium/transfer.dtd

I can't find any instance of 'trules' anywhere in that package. Are you using the current SVN version?

It's in the release tarball (it needed https://gist.github.com/3273244 to compile here).

apertium-oc-ca: Document oc-ca.t1x does not validate against /usr/local/share/apertium/transfer.dtd
apertium-oc-es: Document oc-es.t1x does not validate against /usr/local/share/apertium/transfer.dtd

These two involve running an xsl script (alt.xsl) on the transfer files first.

… but they could all do with a bugfix release (when I packaged the releases for Arch Linux, I had to do https://gist.github.com/3273264 and https://gist.github.com/3273266 to make them compile). Who maintains the packages?

-- Kevin Brubeck Unhammer GPG: 0x766AC60C
Re: [Apertium-stuff] Apertium on Android
Arink Verma arinkve...@iitrpr.ac.in writes: One more important feature added! Application can read text from SMS inbox as input for translation Very cool :-) Download link https://github.com/downloads/arinkverma/Apertiurm-Androind-app-devlopment/ApertiumAndroid_8_7.apk Just tried it on my HTC Desire, looks slick =D Some notes: When I open the application and have nothing installed, I am able to click the → arrow and it says "There is no mode from to to from". I love how clicking "from" or "to" goes straight into the download list, but then when I click a pair, nothing happens; I have to long-click. Is there a reason for that? After selecting a pair to install, should the heading really say "Modes"? How about "Translation directions" or something a bit less apertium-jargony? "Unziping files" should probably be spelled "Unzipping files". When I switch the 'from' language, 'to' is set to "to" and clicking → gives "There is no mode from null to katalansk"; perhaps it should switch to the last used mode with that from-language (or at least the first available mode with that from-language). On clicking 'Translate', the waiting box says "Translator"; it should probably say "Translating". When I select all text in the input box, both the text and the background are black … Perhaps it should be possible to select and copy the output text too (though I see now it's possible using the Clipboard Push feature). -- Kevin Brubeck Unhammer GPG: 0x766AC60C
Re: [Apertium-stuff] Swedish - Norwegian
to Norwegian Bokmål (nb)? And the same in the other direction (i.e. convert the transfer rules for sv-da to rules for sv-nb)? Reusing transfer rules probably isn't necessary. If you don't feel like writing them, then you can write testcases on the Wiki and ask someone on the list to write them. Well, from nb to sv you could copy-paste some of the compound chunking rules, but yeah transfer rules don't take very long to write. Perhaps the maintainer of Danish (da) - Norwegian Bokmål (nb) can give me a hint? He's probably very updated on the differences between the two languages. There is no maintainer that I know of. And I don't think that pair has any work done apart from bidix entries … D. Linguistic resources for Norwegian. I have found frequency word lists for Norwegian Bokmål (nb) at http://helmer.aksis.uib.no/nta/ and can thus prioritize my work to the most important words. http://www.nb.no/spraakbanken/tilgjengelege-ressursar/tekstressursar has more frequency lists (they also taunt you with this enormous corpus, but it's currently in beta, very messy, and best avoided for now). […] E. Any advice for me if I start working on the pair Swedish (sv) - Norwegian Bokmål (nb)? Have I missed something I need to know? Any other resources I can use? My advice would be to start small, to avoid getting overwhelmed. Start from scratch on a small task. For example translating this short story: http://www.unilang.org/ulrview.php?res=422,416 Once you have managed to make the system to translate this without any system errors (the @, * # you see, not necessarily translation errors), then you should have a good understanding of the system, and be well founded to start working with the other resources. It shouldn't take longer than a week, and some have done it in a couple of days. +1 on that. 
-- Kevin Brubeck Unhammer GPG: 0x766AC60C
Re: [Apertium-stuff] Swedish - Norwegian
Per Tunedal per.tune...@operamail.com writes: Hi Keld, On Thu, Aug 9, 2012, at 19:55, k...@keldix.com wrote: On Thu, Aug 09, 2012 at 02:54:27PM +0200, Kevin Brubeck Unhammer wrote: Francis Tyers fty...@prompsit.com writes: --snip-- (3) You make the two translators in the one pair. For this, you could have the same Swedish dictionary, but would need different nb and nn dictionaries, different sv-nb and sv-nn dictionaries and different sv-nb and sv-nn transfer rules. I think that (3) is probably best, but would like input from others (e.g. Unhammer or Trond). (3) sounds best to me too. Perhaps you could even do with one bidix, and just use the alt=nn vs alt=nb attribute; a rough and dirty count shows that the majority of entries in the nn-nb bidix carry over the same lemma/tag:
$ lt-expand apertium-nn-nb.nn-nb.dix | grep -v ':[<>]:' | awk -F: '$1==$2' | wc -l
71628
$ lt-expand apertium-nn-nb.nn-nb.dix | grep -v ':[<>]:' | awk -F: '$1!=$2' | wc -l
11365
Can someone tell me the easiest way to add the alt-tags to the dictionaries, before merging them? Maybe one can have an easy procedure to add new entries when the included languages are updated? You wouldn't be adding alt-tags until after you add the third language (i.e. Nynorsk if you start with Swedish-Bokmål) to the pair, so it's not something to worry about yet. -- Kevin Brubeck Unhammer Sent using my emacs
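The rough count in the message above can also be sketched in Python, which may be easier to extend than the awk one-liner. This is a minimal sketch, assuming the input is lt-expand output with one left:right pair per line and that direction-restricted entries show up as left:>:right or left:<:right; count_same_and_different is a hypothetical name, not an Apertium tool.

```python
def count_same_and_different(expanded_lines):
    """Count bidix entries whose two sides are identical vs. different.

    Each item is one line of (assumed) lt-expand output, e.g.
    'bil<n><m>:bil<n><m>'. Direction-restricted entries (':>:' or ':<:')
    are skipped, mirroring the grep -v in the shell version.
    """
    same = 0
    different = 0
    for line in expanded_lines:
        line = line.rstrip("\n")
        if ":>:" in line or ":<:" in line:
            continue
        left, separator, right = line.partition(":")
        if not separator:
            continue  # not an entry line
        if left == right:
            same += 1
        else:
            different += 1
    return same, different
```

Fed from a pipe, this could be called as count_same_and_different(sys.stdin) after importing sys.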
Re: [Apertium-stuff] Swedish - Norwegian
Per Tunedal per.tune...@operamail.com writes: Hi, On Thu, Aug 9, 2012, at 23:23, Trosterud Trond wrote: Per Tunedal kirjoitti 9. aug. 2012 kello 20:21: Tihomir has told before that he plans to start developing a constraint grammar for Swedish. Good. Again: - Are there open resources? - Could something be ported from Norwegian? (perhaps only indirectly). Yes, a production system (say, I want to translate a sv article to nn on Wikipedia) (…) Yes, that was the scenario I first had in mind. But it would break if there is a need for a constraint grammar, wouldn't it? And then there wont be any use left for the Apertium-translation. Well. Since a handful of rules will remove most ambiguities, what is left will be partly disambiguated. And how bad this is for MT needs to be seen. So it will not break. It will only be more problematic, and the result will be poorer. Mikel Artetxe has explained that the OmegaT plug-in doesn't work for language pairs that depends on programs that aren't a part of lttoolbox-java. Six language pairs depend on the Constraint Grammar package and are thus excluded, one of them is apertium-nn-nb. But sv-da doesn't use any constraint grammar, thus I concluded that sv-nb (Norsk bokmål) wouldn't need one either. And would come to real use, by real translators, using OmegaT. If the pair cannot be used, I don't see any need to develop it. In any case a CG could be added later, as an option for those who aren't using OmegaT or Android. [...] What kind of resources do I need? For 1: swetwol :-) But it seems there are resources in Gothenburg: http://www.cse.chalmers.se/alumni/markus/FM/ http://www.cse.chalmers.se/alumni/markus/FM/download/swedish.lexicon This might even work. As an input for transfer rules or for a potential constraint grammar? The .lexicon file might be used to enlarge sv.dix. However, is-sv.sv.dix should already be big enough to get a pair started. (What's the license on that FM stuff anyway?) As for 2, Lexin might be one resource. 
I am at Euralex in Oslo right now, and will ask around. Fine! Besides, what's Lexin? Lexicon för invandrare ("Lexicon for immigrants"), http://lexin.nada.kth.se/lexin/ As a native Swede, I don't see any need for this. Your machine translation system, however, is not a native Swede, and might have a need to know that e.g. katt is a noun. But it doesn't seem that Lexin is free software: "Går lexikonen att ladda ner? Nej. Däremot kan man ladda ner Folkets lexikon, som ersätter engelska Lexin, men enbart i xml-format." ("Can the dictionaries be downloaded? No. However, you can download Folkets lexikon, which replaces the English Lexin, but only in XML format.") http://lexin.nada.kth.se/lexin/#about=1;main=3; -- Kevin Brubeck Unhammer Sent using my emacs
Re: [Apertium-stuff] wiki down
Mikel L. Forcada m...@dlsi.ua.es writes: They both seem to be back. Power cuts were scheduled for August... Cheers Mikel 2012/8/6, Francis Tyers fty...@prompsit.com: The Wiki (and the whole of the DLSI) is down. August has finally arrived! In the meantime, you can try using the Google Cache, or if something is urgent, come on IRC. Fran Down again :/ How about a Flattr[1] button on apertium.org, where all donations go to an on-call apertium sysadmin with the sole purpose of keeping the wiki online? =P [1] https://flattr.com/ -- Kevin Brubeck Unhammer GPG: 0x766AC60C
Re: [Apertium-stuff] Benefits of Apertium for translators
Jimmy O'Regan jore...@gmail.com writes: On 22 August 2012 12:06, Per Tunedal per.tune...@operamail.com wrote: Hi, OK. Back to my original wish for some kind of easy to use interface for contributions for specific domains. I guess most experts in law or beetles are not hackers and would not even think of contributing by adding codes to dictionary files. There have been a number of attempts at this over the years, and it's surprisingly difficult to get right. IIRC, someone at UA was working on it, but I don't have details. That said, if you can live with import-only, it's really easy to convert from something like a spreadsheet. http://wiki.apertium.org/wiki/Contributing should be a bit clearer about that now. -- Kevin Brubeck Unhammer GPG: 0x766AC60C
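As an illustration of the import-only route mentioned above, here is a minimal Python sketch that turns spreadsheet rows (exported as CSV) into bilingual dictionary entries. The three-column layout (left word, right word, part-of-speech tag) and the helper name rows_to_bidix_entries are assumptions made for this example; the tag must already be declared in the dictionary's symbol definitions, and monolingual dictionaries would additionally need a paradigm for each word.

```python
import csv
import io
from xml.sax.saxutils import escape


def _dix_text(word: str) -> str:
    """Escape XML specials and write spaces in multiwords as <b/>."""
    return escape(word.strip()).replace(" ", "<b/>")


def rows_to_bidix_entries(csv_text: str):
    """Convert 'left,right,tag' rows into bidix <e> entries (as strings)."""
    entries = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        left, right, tag = row
        entries.append(
            '<e><p><l>{0}<s n="{2}"/></l><r>{1}<s n="{2}"/></r></p></e>'.format(
                _dix_text(left), _dix_text(right), tag.strip()
            )
        )
    return entries
```

The resulting lines would be pasted into the main section of the bidix by someone with commit access, so the domain expert never has to touch the XML.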
Re: [Apertium-stuff] Solved: Constraint grammar installation was: Re: Swedish - Norwegian
Tino Didriksen tino.didrik...@gmail.com writes: For the record, using a prefix for CG-3 is fine, if your prefix's bin folder is in your $PATH as it should be. Also, using a prefix for language pairs dependent on CG-3 also works fine as long as the above is true AND apertium, lttoolbox and the language pair were also installed to that prefix. And there is no apertium-command installed to a prefix which appears earlier in the $PATH. But since that turns into a rather long checklist, I guess non-prefixed installations are easier to get right for new developers. On Wed, Sep 5, 2012 at 9:42 AM, Per Tunedal per.tune...@operamail.com wrote: Ah! Wouldn't it be a good idea to put a warning on the Wiki page about the Constraint Grammar: DO NOT USE PREFIX, if you don't know what you are doing! A beginner should be able to just copy the commands. http://wiki.apertium.org/wiki/Vislcg3#Installing_VISL_CG3 I started all over again, this time simply using: ./cmake.sh and the rest worked like a charm. -- Kevin Brubeck Unhammer GPG: 0x766AC60C
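The prefix checklist above boils down to "which copy of each command does the shell find first?". A minimal Python sketch of that check follows; first_on_path and shadowed are hypothetical helper names, and the sketch only looks at file presence and the executable bit, not at whether the binaries were actually built against each other.

```python
import os


def first_on_path(command, path_dirs):
    """Return the first directory in path_dirs holding an executable `command`."""
    for directory in path_dirs:
        candidate = os.path.join(directory, command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return directory
    return None


def shadowed(command, prefix_bin, path_dirs):
    """True if the copy in prefix_bin is not the one the shell would pick.

    That covers both cases from the checklist: prefix_bin missing from the
    path, and another copy of the command sitting earlier in the path.
    """
    return first_on_path(command, path_dirs) != prefix_bin
```

A typical call would pass os.environ["PATH"].split(os.pathsep) as path_dirs and check each command the pair needs, such as apertium, lt-proc and cg-proc.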
Re: [Apertium-stuff] end of gsoc (partial) report
Francis Tyers fty...@prompsit.com writes: Hello all, We had 11 projects accepted this year. Of those, 2 failed. Here is a rundown of the remaining projects I was keeping an eye on: * apertium-id-ms: 80% coverage over Wikipedia, error rate of 10%. This pair has been moved to trunk and will be released shortly. Congratulations Raymond and Tina for the excellent work. This is a canonical Apertium pair. Testvoc clean in both directions. Comes close to Google level for this pair. Further information here: http://wiki.apertium.org/wiki/Indonesian_and_Malaysian/Work_plan * apertium-mt-ar: 80% coverage over Wikipedia, error rate around 10-20%. This pair has been moved to staging, and a preliminary/beta version will be released shortly. Excellent work by Miri and Unhammer. This pair uses lttoolbox/CG. Beats Google for this pair. Testvoc clean in both directions. Further information here: http://wiki.apertium.org/wiki/Maltese_and_Arabic/Work_plan * apertium-sh-sl: 80% coverage over Wikipedia, error rate around 20%. This pair will be moved to trunk. Fantastic work by Aleš, Hrvoje and Jernej. This pair uses lttoolbox/CG. Testvoc clean sh-sl, needs work sl-sh. Further information here: http://wiki.apertium.org/wiki/Serbo-Croatian_and_Slovenian * apertium-tat-kaz: 80% coverage over Wikipedia, error rate around 20%. This pair will be moved to staging. Really great work by Ilnar and Jonathan. This pair relies on HFST/CG. More or less clean, but challenging to testvoc because of `agglutinative' morphology. Pair not in Google. Further information here: http://wiki.apertium.org/wiki/Kazakh_and_Tatar http://wiki.apertium.org/wiki/Kazakh_and_Tatar/Work_plan * apertium-quz-spa: Coverage difficult to calculate because of weak adherence to orthographic norm. Will be moved to nursery. Good work by pato. Not clean. Pair not in Google. 
Further information here: http://wiki.apertium.org/wiki/Quechua_cuzque%C3%B1o_y_castellano http://wiki.apertium.org/wiki/Quechua_cuzque%C3%B1o_y_castellano/Apertium-quz-spa/Ortograf%C3%ADa * Corpus based lexicalised feature selection: The project was a reasonable success. We didn't achieve any improvements for definiteness, but we did for preposition selection. Further information and links on Filip's user page: http://wiki.apertium.org/wiki/User:Fpetkovski * Finite-state disambiguation: It was discovered at midterm that this was too much work for the student, Hrvoje, in the available time. After the midterm he continued to work on Slovenian--Serbo-Croatian. Thanks to all our students for taking part! :) Fran +1, great work, apertiumers :-) -- Kevin Brubeck Unhammer GPG: 0x766AC60C
Re: [Apertium-stuff] apertium-id-ms 0.1.0 released
Francis Tyers fty...@prompsit.com writes: Hello everyone, Just a quick note to say that I've released the first version of apertium-id-ms, a pair for translation between Indonesian and Malaysian. This is a GSOC pair, developed by Raymond Susanto and mentored by Septina Larasati and myself. The pair has over 80% coverage of Wikipedia and the News domain, and an error rate of between 8-15% or so. It doesn't quite beat Google yet, but goes a reasonable way there. Congratulations to Raymond for successfully completing his GSOC, and great work all round! You can find the package in the SF repository. :) Fran Wow, congrats on the first released Austronesian language pair in Apertium :-) -- Kevin Brubeck Unhammer Sent from my emacs
Re: [Apertium-stuff] proposal for change to .lrx format
Mikel Forcada m...@dlsi.ua.es writes: Fran, I'd like to include the possibility for lists in the LRX format. This would involve adding a couple of new tags, That should not be a problem as long as the old format works in the new definition and changing the root tag, You would have to give a good reason for it. the idea is to have something like: http://pastebin.com/GGzfM5qc OK. I would have appreciated comments in the .lrx file... Calling a list would just involve putting its contents in the rule. So with <or> it would work like an OR, but without, it would work like a sequence. I don't like this at all. This is very opaque. I would not use the tag <list>. If it is a set from where you choose, call it set or option... if it is a sequence define it as a sequence. But don't overload something called a list that really isn't. Another option would be to have a named macro that can contain anything that a rule can contain. Then you would use <or> for lists, etc. But <or><list>...</list></or> does not read well. applyor/list sounds more correct =P Since CG uses 'list' only for the disjunction, I think CG-ers would be confused too, if that matters. (CG uses 'template' for the sequence, though those templates can be a bit more involved.) -- Kevin Brubeck Unhammer GPG: 0x766AC60C
Re: [Apertium-stuff] Danish to Sweden broken se-da
Per Tunedal per.tune...@operamail.com writes: Hi, Fine! Why didn't you tell me before? I was trying to get it perfect, scared to break anything! https://systemerrorcs.wordpress.com/2011/06/09/commit-early-commit-often/
Re: [Apertium-stuff] Modes for Icelandic - Swedish is-sv
Per Tunedal per.tune...@operamail.com writes: Hi, I would like to test the pair Icelandic (is) - Swedish (sv), but cannot find any mode files.
$ sh autogen.sh
$ make
[…]
$ ls modes
is-sv-anmor.mode        is-sv-generador.mode    is-sv.mode              is-sv-pretransfer.mode
sv-is-anmor.mode        sv-is-generador.mode    sv-is.mode              sv-is-pretransfer.mode
is-sv-chunker.mode      is-sv-interchunk.mode   is-sv-postchunk.mode    is-sv-tagger.mode
sv-is-chunker.mode      sv-is-interchunk.mode   sv-is-postchunk.mode    sv-is-tagger.mode
$ echo 'Svifnökkvinn minn er fullur af álum' | apertium -d . is-sv
*Svifnökkvinn #min är Full av *álum
Hmm, needs some work. -- Kevin Brubeck Unhammer GPG: 0x766AC60C
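To put a number on "needs some work", the debug marks in Apertium output can be counted. The sketch below assumes the usual meaning of the marks (* for a word unknown to the analyser, @ for a lemma missing from the bilingual dictionary, # for a form the generator could not produce); count_marks is a hypothetical helper, not an Apertium command.

```python
from collections import Counter

# Assumed meaning of the marks Apertium prefixes to problem words.
MARKS = {
    "*": "unknown word",
    "@": "missing from bilingual dictionary",
    "#": "generation failed",
}


def count_marks(translation: str) -> Counter:
    """Count tokens in a translation that start with a debug mark."""
    counts = Counter()
    for token in translation.split():
        label = MARKS.get(token[0])
        if label is not None:
            counts[label] += 1
    return counts


if __name__ == "__main__":
    # The is-sv output quoted in the message above.
    print(count_marks("*Svifnökkvinn #min är Full av *álum"))
```

Running the same count over a whole translated corpus gives a quick view of whether the analyser, the bidix or the generator needs attention first.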
Re: [Apertium-stuff] new language pair es-de
Isabel Imbernón isabelimber...@gmail.com writes: Hi, I'm trying to create the pair es-de from scratch, as you already know. For that purpose, I copied the es-ca directory and started to change names. I reused the Spanish dictionary and now I have the files apertium-es-de.es.dix and apertium-es-de.es.acx for my new pair. I also want to add the German dictionary from the incubator so that I can have apertium-es-de.de.dix, but then I should also have the apertium-es-de.de.acx, shouldn't I? There is no such file in the incubator; could anyone help me to create this? .acx files are optional, so no need to worry about that (see http://wiki.apertium.org/wiki/Acx for what they do, I don't think they're able to deal with the German double-s unfortunately). Concerning the bilingual dictionary, some months ago I created the file apertium-es-de.es-de.dix just by copying the bilingual dictionary of the directory en-de from the incubator and manually replacing English words with Spanish ones. I didn't do it with the whole dictionary, of course, I just have some of them. Do you think this is a good start? Or should I do this differently? Sounds like a good start, although after doing the closed classes (pronouns, conjunctions, etc.) it would be a good idea to sort the words you are about to translate by frequency. Further reading: http://wiki.apertium.eu/index.php/Appendix_A:_Frequency http://wiki.apertium.org/wiki/Building_dictionaries#Frequency I was also thinking of having an empty file of rules for the beginning, just to start translating, first of all, between words of the dictionaries. How could I do that? Copy-paste this http://wiki.apertium.org/wiki/A_long_introduction_to_transfer_rules#Overview_of_a_transfer_file into your apertium-es-de.es-de.t1x (or apertium-es-de.de-es.t1x for the other direction). Hope this helps, good luck :-) -- Kevin Brubeck Unhammer GPG: 0x766AC60C
Re: [Apertium-stuff] Debugging?
Jimmy O'Regan jore...@gmail.com writes: On 30 October 2012 16:07, Yannis Haralambous yannis.haralamb...@telecom-bretagne.eu wrote: dear Apertium people, is it possible to follow the structural transfer of a sentence step by step? For example: what are the chunks, which rule is applied to each, what is the result for each chunk. In other words, is there a debugging option for the structural transfer module? You can get this with apertium-transfer -t but it's not available from the script (it wouldn't make sense) -- you'll have to manually provide the entire pipeline. You can also use http://wiki.apertium.org/wiki/Apertium-viewer to get a quick overview of the input/output of each stage (it doesn't show rule numbers though, like apertium-transfer -t does). -- Kevin Brubeck Unhammer GPG: 0x766AC60C
Re: [Apertium-stuff] compounding
Francis Tyers fty...@prompsit.com writes: El dc 07 de 11 de 2012 a les 17:32 +0100, en/na Per Tunedal va escriure: Hi, thank you. I've read the Wiki and looked into the apertium-nn-nb.nb.dix file. Apparently, this is solved in a less transparent way in the nn-nb pair than in the examples in the Wiki. It's less transparent because it is more complete. I think that compounds work very similarly in sv, da, nn, nb so you could probably just copy these paradigms and see how it goes. In the beginning of the dictionary, there are a lot of pardefs treating compounds, that I don't understand. Can anyone explain? I can try. [...] Exactly :) The only thing I would add is that the tag cmp is a normal tag (as opposed to compound-only-L and compound-R, the special hidden compounding tags). It's not strictly necessary to have it there to do compounding, but it is helpful. E.g. in transfer, it's used to distinguish a compound from two nouns simply following each other, and it's also very helpful in generation, for those cases where the sg.ind form is not equal to the form used in compounds (e.g. the Nynorsk word 'vatn' when used as the left-part of a compound becomes 'vass'). -Kevin
Re: [Apertium-stuff] compounding
Per Tunedal per.tune...@operamail.com writes: [...] The noun kjempe is advertised as possible to use in compounds, yet there is an entry for the adjective kjempehøy (= very high/tall). Why? Assume you have dynamic[1] compounding turned on for the open classes nouns, verbs, adjectives – these are all fairly common in compounding (though nouns cover over 70 % in nn/nb), and you remove kjempehøy from your dictionary. Now, since nb.dix has these analyses of kjempe and høy:
kjempe<vblex><inf>/kjempe<n><m><sg><ind>/kjempe<n><f><sg><ind>/kjempe<n><m><sg><ind>/kjempe<n><f><sg><ind>
høye<vblex><imp>/høy<n><nt><sg><ind>/høy<n><nt><pl><ind>/høy<adj><posi><mf><sg><ind>
your compound analysis will be ambiguous over at least:
kjempe<n><f><sg><ind>+høy<n><nt><pl><ind>
kjempe<n><f><sg><ind>+høy<n><nt><sg><ind>
kjempe<n><f><sg><ind>+høye<vblex><imp>
kjempe<n><f><sg><ind>+høy<adj><posi><mf><sg><ind>
kjempe<n><m><sg><ind>+høy<n><nt><sg><ind>
kjempe<n><m><sg><ind>+høy<n><nt><pl><ind>
kjempe<n><m><sg><ind>+høye<vblex><imp>
kjempe<n><m><sg><ind>+høy<adj><posi><mf><sg><ind>
kjempe<vblex><inf>+høy<n><nt><pl><ind>
kjempe<vblex><inf>+høy<n><nt><sg><ind>
kjempe<vblex><inf>+høye<vblex><imp>
kjempe<vblex><inf>+høy<adj><posi><mf><sg><ind>
And it gets even worse if there's some possibility of segmenting at the wrong place, e.g. Bokmål 'te+skje' (tea+spoon) could be mis-segmented 'te+s+kje' (tea+epenthetic+kid goat), similarly 'bilde+liste' (image+list) vs 'bildel+iste' (image+iced/image+ice tea). Compare this with the ambiguity-count of the analysis given when we do have kjempehøy in the dictionary:
kjempehøy<adj><posi><mf><sg><ind>
Only one analysis, and it's the correct one. So you avoid useless ambiguity by adding more compounds. Useless ambiguity is harmful not only to the translation of that word, but of the context (given the sequence adj vblex/n, it's easy to see that the second word is most likely a noun, not so with adj/n/vblex vblex/n).
In addition to all that, a decompounding analysis takes a lot longer per word than a simple analysis (you have to check all the possible ways of segmenting the word into two parts, then three parts, etc.), and adding full compound words further helps decompounding compounds of compounds (it's safer and faster to segment 'bildeliste+generator' than 'bilde+liste+generator', where you might end up with 'bildel+iste+generator'). Aaand, finally, sometimes the sum is greater than the parts, e.g. Bokmål 'kjempemessig' might be better translated to 'ovstor' or 'diger' in Nynorsk, 'bedømmelseskommité'→'domsnemnd' etc. In summary: Dynamic compounding leads to more ambiguity and slower analysis, and is thus used only when there is no lexicalised analysis. Adding lexicalised compounds improves not only analysis of those compounds and their contexts, but also improves dynamic compounding of longer compounds. BTW I've found only one similar Danish word: kæmpestor (very large). I don't know if there are any more. If kæmpe- is not very productive in Danish, it might be better to translate those words into something else (kjempelett→pærelet, kjempegod→knippelgod?). Adding such pairs as lexicalised compounds in the dictionaries will override dynamic compounding for those words. [1] Dynamic compounding is when the analyser only contains the parts and guesses how they fit together, lexicalised compounds are defined as those we spell out completely in the dictionary. -- Kevin Brubeck Unhammer GPG: 0x766AC60C
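The ambiguity argument above can be reproduced with a toy model. The following Python sketch is not how lttoolbox implements compounding (among other things, it ignores the compound-only-L and compound-R restrictions); it simply enumerates every split of a word into lexicon entries and multiplies out their analyses. The lexicon contents, the tag spellings and the function names are illustrative assumptions.

```python
from itertools import product

# Toy lexicon: surface form -> analyses. Tag spellings are assumptions.
LEXICON = {
    "kjempe": [
        "kjempe<vblex><inf>",
        "kjempe<n><m><sg><ind>",
        "kjempe<n><f><sg><ind>",
    ],
    "høy": [
        "høye<vblex><imp>",
        "høy<n><nt><sg><ind>",
        "høy<n><nt><pl><ind>",
        "høy<adj><posi><mf><sg><ind>",
    ],
}


def segmentations(word, lexicon):
    """Return every way of splitting `word` into a sequence of lexicon keys."""
    if word == "":
        return [[]]
    results = []
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in lexicon:
            for rest in segmentations(word[i:], lexicon):
                results.append([head] + rest)
    return results


def analyses(word, lexicon):
    """Lexicalised analyses if the word is listed, else dynamic compound ones."""
    if word in lexicon:
        return list(lexicon[word])  # a lexicalised entry overrides guessing
    found = []
    for parts in segmentations(word, lexicon):
        for combination in product(*(lexicon[part] for part in parts)):
            found.append("+".join(combination))
    return found


if __name__ == "__main__":
    print(len(analyses("kjempehøy", LEXICON)))  # three readings times four
```

Adding a single "kjempehøy" key to the lexicon collapses the result to that one entry, which is the point made in the message: the lexicalised compound removes the useless ambiguity. The same segmentations function also shows the 'te+skje' versus 'te+s+kje' problem when "te", "skje", "s" and "kje" are all present as keys.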
Re: [Apertium-stuff] compounding
Kevin Brubeck Unhammer unham...@fsfe.org writes: [...] And it gets even worse if there's some possibility of segmenting at the wrong place, e.g. Bokmål 'te+skje' (tea+spoon) could be mis-segmented 'te+s+kje' (tea+epenthetic+kid goat), similarly 'bilde+liste' (image+list) vs 'bildel+iste' (image+iced/image+ice tea). Whoops, mis-glossed the mis-segmentation: 'bildel+iste' would mean car part+iced or car part+iced tea. -- Kevin Brubeck Unhammer GPG: 0x766AC60C
Re: [Apertium-stuff] Paradigms in Bidixes
Francis Tyers fty...@prompsit.com writes: El dg 11 de 11 de 2012 a les 14:11 +0100, en/na Per Tunedal va escriure: Hi, OK. I just thought the other way around: Because coverage is so low, it would be fruitful to generate translations for unknown words. In the next step, I intended to add the most frequent words, bit by bit. Great! As you have pointed out, it's much more effective to have a word in the dictionaries than to generate it by some rule. Thus the gain is obviously largest from adding the most frequent compounds and derivations explicitly in the dictionaries. But it's still nice to get translations of the more rare compounds and derivations. Bad investment in terms of time. You want your work to have maximum, not minimum impact. Thus, work by frequency. Add the frequent stuff first. As https://xkcd.com/1133/ shows, with only the 1000 most frequent words in English you can explain rocket science ;-) -- Kevin Brubeck Unhammer GPG: 0x766AC60C
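"Work by frequency" can be made concrete with a few lines of Python. This is a naive sketch that treats a word as known only if its lowercased surface form is in a given set; a real setup would run the corpus through the analyser instead. The function names and the set-based notion of "known" are assumptions made for illustration.

```python
from collections import Counter


def unknown_by_frequency(tokens, known):
    """List (word, count) for words not in `known`, most frequent first."""
    counts = Counter(
        token.lower() for token in tokens if token.lower() not in known
    )
    return counts.most_common()


def coverage(tokens, known):
    """Fraction of running tokens whose lowercased form is in `known`."""
    if not tokens:
        return 0.0
    covered = sum(1 for token in tokens if token.lower() in known)
    return covered / len(tokens)
```

Working down the list returned by unknown_by_frequency and re-measuring coverage after each batch shows directly how much each added word buys, which is why the frequent words come first.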
Re: [Apertium-stuff] Cannot commit new mode file
Per Tunedal per.tune...@operamail.com writes: Hi, succeeded after some trouble: sudo sh apertium-generate-modes modes.xml Gives: sh: Can't open apertium-generate-modes Tested: sudo sh autogen.sh make Couldn't remove the existing mode files. Removed the files with: rm * cd .. Removed the directory with: rmdir And finally: sudo sh autogen.sh make Worked!! Don't use sudo before autogen, the only thing that should require sudo is make install. If anything else requires sudo, there's likely either a bug in the makefile, or you at some point earlier sudo'ed when you shouldn't have ;) (Unfortunately, if you've sudo'ed earlier, you might have to use sudo to rm some files that have been created with root permission.) -- Kevin Brubeck Unhammer GPG: 0x766AC60C
Re: [Apertium-stuff] Cannot commit new mode file
Kevin Brubeck Unhammer unham...@fsfe.org writes:
Per Tunedal per.tune...@operamail.com writes:
Hi, succeeded after some trouble:
sudo sh apertium-generate-modes modes.xml
Gives: sh: Can't open apertium-generate-modes
Tested:
sudo sh autogen.sh
make
Couldn't remove the existing mode files. Removed the files with:
rm *
cd ..
Removed the directory with: rmdir
And finally:
sudo sh autogen.sh
make
Worked!!
Don't use sudo before autogen; the only thing that should require sudo is make install. If anything else requires sudo, there's likely either a bug in the makefile, or you at some point earlier sudo'ed when you shouldn't have ;)
Aaand now I see that that makefile will give the modes directory root permission when you do sudo make install. I uploaded a fix; you might have to sudo rm -r modes once if you've done a sudo make install.
-- Kevin Brubeck Unhammer GPG: 0x766AC60C
Re: [Apertium-stuff] Domain and style/genre
Per Tunedal per.tune...@operamail.com writes:
Hi, See below. Yours, Per Tunedal
On Sun, Dec 16, 2012, at 23:07, Francis Tyers wrote:
On Sun, 16 Dec 2012 at 14:21 +0100, Per Tunedal wrote:
Hi, I consider info on primarily domain and secondly style useful for disambiguation. As a first step it would be very nice to be able to add a domain-tag to words. Adding info on style would make it possible to further improve translation results. The translation would improve considerably if the user could choose the appropriate domain when requesting a translation. Consider e.g. the example of translating the English word "key", or similarly the French word "clé/clef", to Swedish. If the domain is e.g. Tourism/accommodation/real estate or similar, the word would most likely translate to "nyckel" (to lock/unlock the door of a house). On the other hand, if the domain is e.g. information technology (or even music), the word would most likely translate to "tangent" (on your keyboard or piano). Obviously, a lexical selector/disambiguator could be trained on a corpus from a specific domain as well, further improving the translation.
I did this in my thesis. It's quite effective. It's possible to tune the vocabulary to a domain with either parallel or monolingual corpora using apertium-lex-tools.[1] You won't be interested in it though as it doesn't work with the Java version.
What will work with the Java version? And what will not? What's the problem?
No one has re-implemented the apertium-lex-tools package in Java yet.
What I would like to do is:
- add info about domain in the dictionaries
- do some training on an appropriate corpus
You want to first add the domain-specific translation manually, and then have the system automatically discover the domain-specific translation? That sounds like duplicating work, and what do you do if the training and dictionaries don't agree?
The way lex-tools training works is: The English word "key" is listed in the en-sv bilingual dictionary with both "nyckel" and "tangent" as possible translations. You then give the bilingual dictionary and a corpus to the lex-tools training scripts, which give you a .lrx file. You can run the training scripts twice, once with a general corpus and once with a music-domain corpus, in order to get both a general .lrx file and a music-domain .lrx file. The training scripts don't need any manual specification of what domain you're in; they learn what the best translation is from the (domain-specific) corpus.
- let the user choose a suitable domain (if any), as an alternative to the general domain
That'd be easy by giving a new translation mode, e.g. en-sv_music.mode would point to a different lrx file from the general en-sv.mode.
- let Apertium use info from the dictionaries and the training to solve ambiguities. BTW Would it make any difference if you trained the tagger on a domain corpus?
It might.
-- Kevin Brubeck Unhammer GPG: 0x766AC60C
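To make the "one .lrx per corpus, one mode per .lrx" idea concrete, here is a toy Python sketch. It is not the lex-tools algorithm (as far as I understand, the real scripts learn context-dependent selection rules); it only counts which translation is most frequent in each corpus. The function name, the observation lists and the MODES table are all invented for illustration; only the words key/nyckel/tangent and the mode names en-sv and en-sv_music come from the message.

```python
from collections import Counter

def train_defaults(observations):
    """Toy stand-in for training: pick the most frequent translation per word.

    observations is an iterable of (source_word, translation) pairs, as one
    might extract from a corpus. Real lex-tools training is more involved.
    """
    counts = {}
    for source, target in observations:
        counts.setdefault(source, Counter())[target] += 1
    return {source: c.most_common(1)[0][0] for source, c in counts.items()}

# Invented toy corpora: the general one prefers "nyckel", the music one "tangent".
general_corpus = [("key", "nyckel"), ("key", "nyckel"), ("key", "tangent")]
music_corpus = [("key", "tangent"), ("key", "tangent"), ("key", "nyckel")]

# One trained table per mode, mirroring en-sv.mode vs en-sv_music.mode.
MODES = {
    "en-sv": train_defaults(general_corpus),
    "en-sv_music": train_defaults(music_corpus),
}

def translate_word(word, mode="en-sv"):
    """Look up the default translation of a word in the chosen mode."""
    return MODES[mode].get(word, word)
```

The point of the sketch is that no domain tag is ever written by hand: the user only picks a mode, and the preference itself comes from whichever corpus that mode's table was trained on.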
Re: [Apertium-stuff] Video talk on Apertium
Mikel Forcada m...@dlsi.ua.es writes:
Jacob, I cannot see your video with the free software I have available. I posted in your entry, but you didn't answer. Mikel
http://downloadvimeo.com/#http://vimeo.com/54075259 will download the mp4 file. For future reference: YouTube encodes videos as .webm (Flash-free playback in a non-patent-encumbered codec).
On 12/13/2012 12:28 PM, Jacob Nordfalk wrote:
Hi there, A film of a talk I gave on Apertium and MT at the International Congress University (of the World Congress of Esperanto) in 2011 has finally been prepared and published. https://plus.google.com/114820443085046080944/posts/ZtG7zjfD1ns The talk is in Esperanto. Enjoy and share :-) Jacob
-- Kevin Brubeck Unhammer GPG: 0x766AC60C
Re: [Apertium-stuff] Apertium for Android version 1.0 soon to be released
Xavi Ivars x...@infobenissa.com writes:
Hi, I've tested it with two old mobiles (SonyEricsson Xperia X10 mini and Samsung Galaxy S) and a Nexus7 tablet.
With the Nexus7 = awesome! But not many differences from the old version. I guess 2 GB of RAM make this feature less important than on older devices.
With the GalaxyS = really-really-really awesome! The difference between this version and the last one, which loaded the whole files, is really big. The translations are really fast, including the first one.
With the X10 mini = I couldn't install any package. I got some space errors (the device has really small storage). Also, because of bad data connectivity, there was an IOException while downloading the lang package, and the exception message was shown in the target textarea.
In all 3 devices I got the bug of two icons in the app launcher. Like Kevin, I think I would remove all the metainformation about the translation process.
Another thing I noticed now (I think this may have been a problem before too): on second startup the translation direction said "from → to", where clicking the arrow gave "There is no language direction from from to to", and clicking "to" gave the heading "Translated to" and no language. Only clicking "from" has an effect. Perhaps the buttons should be grayed out or something when they're not useful. Also, a little UI request: if there's only one possible direction, it would make sense to auto-pick that; and if there's only one possible "to" for a certain "from", that should be auto-picked on selecting that "from".
About the permissions, what I would do is try to require as few permissions as possible.
I mostly agree (either see what feature requests people make, or at least have a simple base that others can build on). I think SD card installation would be good though; on my Desire I had to remove some stuff before being able to install. 
The app does 1) work well on older phones and 2) work without a net connection, so I'm guessing SD install would make it fit well into the older-phone-market (as well as work great on newer phones).
Anyway, THANKS A LOT Jacob for your work with this app.
Yes, thanks Jacob (and Mikel and Arink); offline translation on a phone is quite awesome :-)
-- Kevin Brubeck Unhammer GPG: 0x766AC60C
Re: [Apertium-stuff] Apertium for Android version 1.0 soon to be released
Jacob Nordfalk jacob.nordf...@gmail.com writes:
2012/12/25 Kevin Brubeck Unhammer unham...@fsfe.org Xavi Ivars x...@infobenissa.com writes:
In all 3 devices I got the bug of two icons in the app launcher.
The two Apertium icons were the two versions: Arink's work, and a simpler derivative where code from Mikel's 'Apertium-Caffeine' is fused with Arink's code and simplified a lot.
Ah, that cleared up a lot :-)
Arink's is the more sophisticated; among other things it has two separate buttons for "from" and "to", where the simple one just has one "Choose languages" button. But with sophistication also comes complexity; Arink's code uses databases and consists of ~25 classes and, as a result of the sophistication, will be hard to use as example code for others to follow/import into their own projects. Another consequence of the complexity was that I had trouble making it react robustly to screen turning and application restarts. In the end I resolved to fuse the code into 5 classes where there are no databases involved.
:)
on second startup the translation direction said "from → to", where clicking the arrow gave "There is no language direction from from to to", and clicking "to" gave the heading "Translated to" and no language. Only clicking "from" has an effect. Perhaps the buttons should be grayed out or something when they're not useful. Also, a little UI request: if there's only one possible direction, it would make sense to auto-pick that; and if there's only one possible "to" for a certain "from", that should be auto-picked on selecting that "from".
This is Arink's code. Sorry for the confusion! Please update the app or choose the other icon. From inside the basic activity you can choose 'Show extended example' after pressing the MENU button to use Arink's code.
About the permissions, what I would do is try to require as few permissions as possible.
I mostly agree (either see what feature requests people make, or at least have a simple base that others can build on). 
I think SD card installation would be good though; on my Desire I had to remove some stuff before being able to install. The app does 1) work well on older phones and 2) work without a net connection, so I'm guessing SD install would make it fit well into the older-phone-market (as well as work great on newer phones).
I've changed the app so it can be moved to SD card. This might make it work on older phones with memory constraints. Could you download again and try if you can move it to SD card and see if it helps? https://apertium.svn.sourceforge.net/svnroot/apertium/builds/apertium-android/
I'm not quite sure how to read the space requirements:
On phone, no lang.pair: Total 2.05MB, App 2.05MB, Data 0.00KB
On SD card, no lang.pair: Total 1.65MB, App 1.64MB, Data 4.00KB
On phone, one lang.pair: Total 9.93MB, App 2.05MB, Data 7.88MB
On SD card, one lang.pair: Total 9.52MB, App 1.64MB, Data 7.88MB
-- Kevin Brubeck Unhammer GPG: 0x766AC60C
Re: [Apertium-stuff] Installation troubleshooting: cg-comp not found
Federico Gobbo federico.go...@univaq.it writes:
Hi there [this message is directed in particular to the maintainer of the pair eo-es]
I am installing Apertium on Ubuntu and I managed to follow the instructions from SVN without any problem until step 5: http://wiki.apertium.org/wiki/Apertium_on_Ubuntu I decided to choose the language pairs. I did not find any problem with eo-ca and eo-en, but when I chose eo-es, make (with a correctly created makefile, according to my shell) says that cg-comp was not found.
Install vislcg3. See: http://wiki.apertium.org/wiki/Apertium_and_Constraint_Grammar#Installing_VISL_CG3 (I see the troubleshooting page only mentioned cg-proc, added cg-comp now.)
-- Kevin Brubeck Unhammer GPG: 0x766AC60C
Re: [Apertium-stuff] Why is interchunk done after transfer on the target language ?
Bernard Chardonneau bechapert...@free.fr writes:
The question does not explain the whole problem. If the source language has a strict word order and the target language has another, equally strict one, I don't see any problem doing the transfer step first. But if the source language does not impose a word order (there is another way to know which is the subject or the object) but the target language does, it may be simpler to reorder words on the source language side. [...]
In apertium-interchunk, you can reorder chunks. These have been created by apertium-transfer. If you run interchunk before transfer, you won't have any chunks to reorder. Since interchunk operates on chunks, you don't have access to either source or target language lemmas, only the chunk tags. In apertium-transfer, you have access to both source and target language lemmas. If I understand you correctly, I think you want to do more of your changes in apertium-transfer, and less in interchunk?
-- Kevin Brubeck Unhammer GPG: 0x766AC60C
Re: [Apertium-stuff] Installation troubleshooting: cg-comp not found
Hèctor Alòs i Font h.a...@esperanto.cat writes:
Dear Federico, As far as I can see, es-eo has an experimental use of Constraint Grammar. I tried to use CG for better morphological disambiguation, but not successfully enough to launch a new version. That is why there are two modes: es-eo and es-eo-no_cg (which should be more or less equal to the official version).
It should be possible to have two make goals, e.g. ./configure --with-cg, so that people/distros get the choice of whether to install another dependency or not.
-- Kevin Brubeck Unhammer Sent from my emacs
Re: [Apertium-stuff] Installation troubles (second and last part)
Federico Gobbo federico.go...@univaq.it writes:
Briefly:
@Héctor: I compiled VISLCG3 as advised, but make again failed to find cg-proc :( How can I compile without CG, at least for the moment? I don't see a file apertium-eo-es_no_cg or similar?
Did you remember to sudo make install after compiling CG? Also, did you re-run autogen.sh in apertium-eo-es before running make there?
@Kevin: thanks for your help. I didn't understand if the two goals you mentioned (i.e., ./configure --with-cg) are a proposal or an actual option I can choose.
Sorry for the confusion, that was directed at Hector as an idea. You can't choose --with-cg.
@all: I had another error in the language pair eo-fr:
---snip---
NOTE: lttoolbox-java (used for bytecode accelerated transfer) is missing
Therefore the following will fail (but it's OK)
apertium-preprocess-transfer-bytecode-j apertium-eo-fr.fr-eo.t1x fr-eo.t1x.class
/bin/bash: apertium-preprocess-transfer-bytecode-j: comando non trovato
make[1]: *** [fr-eo.t1x.bin] Errore 127
make[1]: uscita dalla directory /home/riko/apertium/apertium-eo-fr
make: *** [all] Errore 2
---snap---
What does it mean? Should I ignore it?
You can ignore it. It may provide for faster transfer, but the quality of the translation should be the same, I think.
The good news is that I could add ca-it es-it straightforwardly in my good old Lubuntu machine, so more or less now I am prepared to start contributing to Apertium. Last, a tickle question: why do some language pairs use CG if there is already the TSX_format for tagging (if I read the wiki correctly)? Advantages? Costs? Thanks in advance,
CG is a rule-based disambiguation system; it allows for writing very nuanced / powerful rules for selecting the right reading, and can match e.g. sequences from the beginning of the sentence to the end (and beyond, actually …). It's probably Turing Complete. 
apertium-tagger, which uses the TSX format, is a statistical disambiguator; it runs faster, comes installed with apertium, and lets you train on corpora instead of writing lots of rules. But it only matches two-word sequences.
-Kevin
Re: [Apertium-stuff] How to handle differences between languages
Per Tunedal per.tune...@operamail.com writes:
Sorry, not verbs! Should be adjectives. Some adjectives that are used to describe animate (living) things have masculine forms, like rädd : rädde, instead of rädda. If I remove such forms in sv dix for non-animate adjectives, I might get into trouble.
Well, you don't really have to remove them; even if "abelsk" is only used to describe very inanimate mathematical structures, it doesn't hurt that sv.dix is in theory capable of producing "abelske". If you do remove them, you need to tag them differently in bidix, so that transfer will know whether it's dealing with an adjective capable of the animate form, or one with no animate forms. I'd guess that that would involve more work. The question is: if someone uses "abelsk" about something _animate_ in Danish, does it hurt that you output "abelske [animate_noun]"?
Besides: Norwegian has masculine, feminine and neuter nouns, but Swedish and Danish have common and neuter. How to handle that in my future pair no-sv (nb/nn-sv)?
For nouns, it's easy:
<e><p><l>pige<s n="n"/><s n="ut"/></l><r>jente<s n="n"/><s n="f"/></r></p></e>
For adjectives, a transfer rule just looks at the noun to the right and picks the correct gender.
I guess nouns in sv-da bidix are not tagged with animacy. The animate ones probably should have a tag so that transfer rules can decide whether to use the masculine adjective.
-Kevin
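As a rough illustration of "look at the noun to the right and pick its gender", here is a small Python sketch. It is not Apertium transfer-rule syntax; words are modelled as (lemma, tags) pairs, the tag names adj/n/m/f/nt/ut follow the usual Apertium conventions, and the function name and the adjective "liten" in the usage note are assumptions made for the example.

```python
GENDER_TAGS = {"m", "f", "nt", "ut"}

def agree_adjectives(words):
    """Give each adjective the gender of the nearest noun to its right.

    words is a list of (lemma, tags) pairs, tags being a list of strings.
    Simplification: the gender tag is appended last, whereas a real
    dictionary expects tags in a fixed order.
    """
    result = []
    for index, (lemma, tags) in enumerate(words):
        if "adj" in tags:
            gender = None
            for _, later_tags in words[index + 1:]:
                if "n" in later_tags:
                    # Take the first gender tag found on that noun, if any.
                    gender = next((t for t in later_tags if t in GENDER_TAGS), None)
                    break
            if gender is not None:
                tags = [t for t in tags if t not in GENDER_TAGS] + [gender]
        result.append((lemma, tags))
    return result
```

For instance, agree_adjectives([("liten", ["adj", "ut"]), ("jente", ["n", "f"])]) would change the adjective's tags to ["adj", "f"], because the bidix entry has already turned the common-gender noun into feminine "jente".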
Re: [Apertium-stuff] Swedish and Danish pronouns
Per Tunedal per.tune...@operamail.com writes:
Hi, I'm stuck. I can't get the translation of Swedish pronouns to Danish to work. Specifically, I've introduced the many Swedish variations of saying "I" and "you" by using an expression in third person. I've tried treating the possessive for "somliga" as a genitive, causing # instead of *. But the possessive is treated separately for the other personal pronouns, originally present. And I am trying to translate "somliga" to the Danish "du" (although, in rare cases, it can refer to "ni", 3rd person plural - both "you" in English!) [...]
That was a bit overwhelming. Take one problematic sentence, and show its output in the various stages.
-- Kevin Brubeck Unhammer GPG: 0x766AC60C
Re: [Apertium-stuff] Swedish and Danish pronouns
Per Tunedal per.tune...@operamail.com writes:
Hej Keld, thank you for the translation! Unfortunately it doesn't work:
Somliga har det bra som kan sitta där i skuggan och läsa.
^Somliga/Somliga<prn><pers><p3><un><pl><nom>/Somliga<prn><pers><p3><un><pl><acc>$ ^har/har<abbr>/ha<vbhaver><pres><actv>$ ^det/den<det><def><nt><sg>/det<prn><pers><p3><nt><sg><nom>/det<prn><pers><p3><nt><sg><acc>$ ^bra/bra<adv>$ ^som/som<cnjsub>/som<prn><rel><un><sp><nom>$ ^kan/kunna<vbmod><pres><actv>$ ^sitta/sitta<vblex><inf><actv>$ ^där/där<adv>/där<cnjsub>$ ^i/i<pr>$ ^skuggan/skugga<n><ut><sg><def><nom>/skugga<n><ut><sg><def><cmp><compound-only-L>/skugga<n><ut><sg><def><nom><compound-R>$ ^och/och<cnjcoo>$ ^läsa/läsa<vblex><inf><actv>$^./.<sent>$
^Somliga<prn><pers><p3><un><pl><nom>$ ^ha<vbhaver><pres><actv>$ ^det<prn><pers><p3><nt><sg><nom>$ ^bra<adv>$ ^som<prn><rel><un><sp><nom>$ ^kunna<vbmod><pres><actv>$ ^sitta<vblex><inf><actv>$ ^där<adv>$ ^i<pr>$ ^skugga<n><ut><sg><def><nom>$ ^och<cnjcoo>$ ^läsa<vblex><inf><actv>$^.<sent>$
^Somliga<prn><pers><p3><un><pl><nom>$ ^ha<vbhaver><pres><actv>$ ^det<prn><pers><p3><nt><sg><nom>$ ^bra<adv>$ ^som<prn><rel><un><sp><nom>$ ^kunna<vbmod><pres><actv>$ ^sitta<vblex><inf><actv>$ ^där<adv>$ ^i<pr>$ ^skugga<n><ut><sg><def><nom>$ ^och<cnjcoo>$ ^läsa<vblex><inf><actv>$^.<sent>$
^Somliga<prn><pers><p3><un><pl><nom>/@Somliga<prn><pers><p3><un><pl><nom>$ ^ha<vbhaver><pres><actv>/have<vbhaver><pres><actv>$ ^det<prn><pers><p3><nt><sg><nom>/det<prn><pers><p3><nt><sg><nom>$ ^bra<adv>/godt<adv>$ ^som<prn><rel><un><sp><nom>/som<prn><rel><un><sp><nom>$ ^kunna<vbmod><pres><actv>/kunne<vbmod><pres><actv>$ ^sitta<vblex><inf><actv>/sidde<vblex><inf><actv>$ ^där<adv>/der<adv>$ ^i<pr>/i<pr>$ ^skugga<n><ut><sg><def><nom>/skygge<n><ut><sg><def><nom>$ ^och<cnjcoo>/og<cnjcoo>$ ^läsa<vblex><inf><actv>/læse<vblex><inf><actv>$^.<sent>/.<sent>$
The @ is introduced by bidix. Either "somliga" is not in there at all, or it's in there, but with the wrong main part-of-speech tag, or misspelt or something.
-- Kevin Brubeck Unhammer GPG: 0x766AC60C
Re: [Apertium-stuff] Swedish and Danish pronouns
Per Tunedal per.tune...@operamail.com writes:
Hi again, one more example:
Det kan undertecknad självfallet inte alls instämma i.
^Det/Den<det><def><nt><sg>/Det<prn><pers><p3><nt><sg><nom>/Det<prn><pers><p3><nt><sg><acc>$ ^kan/kunna<vbmod><pres><actv>$ ^undertecknad/underteckna<vblex><pp><ut><sg><ind>/undertecknad<prn><pers><p3><ut><sg><nom>/undertecknad<prn><pers><p3><ut><sg><acc>$ ^självfallet/*självfallet$ ^inte/inte<adv>$ ^alls/all<prn><def><ut><sg><gen>/all<prn><def><ut><sg><gen><compound-R>$ ^instämma/*instämma$ ^i/i<pr>$^./.<sent>$
^Det<prn><pers><p3><nt><sg><nom>$ ^kunna<vbmod><pres><actv>$ ^undertecknad<prn><pers><p3><ut><sg><nom>$ ^*självfallet$ ^inte<adv>$ ^all<prn><def><ut><sg><gen>$ ^*instämma$ ^i<pr>$^.<sent>$
Should "undertecknad" really be analysed as a pronoun??
^Det<prn><pers><p3><nt><sg><nom>$ ^kunna<vbmod><pres><actv>$ ^undertecknad<prn><pers><p3><ut><sg><nom>$ ^*självfallet$ ^inte<adv>$ ^all<prn><def><ut><sg><gen>$ ^*instämma$ ^i<pr>$^.<sent>$
^Det<prn><pers><p3><nt><sg><nom>/Det<prn><pers><p3><nt><sg><nom>$ ^kunna<vbmod><pres><actv>/kunne<vbmod><pres><actv>$ ^undertecknad<prn><pers><p3><ut><sg><nom>/underskriver<prn><pers><p3><ut><sg><nom>$ ^*självfallet/*självfallet$ ^inte<adv>/ikke<adv>$ ^all<prn><def><ut><sg><gen>/al<prn><def><ut><sg><gen>$ ^*instämma/*instämma$ ^i<pr>/i<pr>$^.<sent>/.<sent>$
^Det<prn><pers><p3><nt><sg><nom>$ ^kunne<vbmod><pres><actv>$ ^underskriver<prn><pers><p3><ut><sg><nom>$ ^*självfallet$ ^ikke<adv>$ ^al<prn><def><ut><sg><gen>$ ^*instämma$ ^i<pr>$^.<sent>$
Det kan #underskriver *självfallet ikke als *instämma i.
The # is introduced because there's no ^underskriver<prn><pers><p3><ut><sg><nom>$ in da.dix (and I'd say that's a good thing!).
-- Kevin Brubeck Unhammer GPG: 0x766AC60C
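The three marks seen in this thread each point at a different stage: * means the analyser did not know the word, @ means the lemma was not found in bidix, and # means the generator could not produce the requested form. A small Python sketch for listing them in translated output, assuming the marks appear directly in front of a word as in the sentence above (the function name and the wording of the explanations are my own):

```python
import re

# What each debug mark in Apertium output points at.
MARKS = {
    "*": "unknown to the source-language analyser (monodix)",
    "@": "lemma not found in the bilingual dictionary (bidix)",
    "#": "form could not be generated by the target-language monodix",
}

def flag_problems(text):
    """Return (word, explanation) for every marked word in the output text."""
    # A mark only counts at the start of a token, i.e. after whitespace
    # or at the very beginning of the text.
    pattern = re.compile(r"(?<!\S)([*@#])(\w+)")
    return [(m.group(2), MARKS[m.group(1)]) for m in pattern.finditer(text)]

if __name__ == "__main__":
    sentence = "Det kan #underskriver *självfallet ikke als *instämma i."
    for word, why in flag_problems(sentence):
        print(f"{word}: {why}")
```

Sorting the flagged words by mark tells you which file to open first: sv.dix for *, the bidix for @, da.dix for #.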
Re: [Apertium-stuff] Swedish and Danish pronouns
Per Tunedal per.tune...@operamail.com writes:
Hi, well, I think it's a good idea to be able to translate the not uncommon formal Swedish "undertecknad" instead of "jag" = "I".
Hmm, guess that makes sense …
I've changed the translation to "undertegnede", as Keld suggested. But it still doesn't work. As far as I can see, the requested form is available in the Danish monodix. What am I doing wrong?
Before da monodix:
^undertecknad<prn><pers><p3><ut><sg><nom>/undertegnede<prn><pers><p3><ut><sg><nom>$
da monodix:
<pardef n="undertegnede__prn">
  <e a="PT"><p><l></l><r><s n="prn"/><s n="pers"/><s n="p3"/><s n="ut"/><s n="sg"/><s n="nom"/></r></p></e>
  <e><p><l></l><r><s n="prn"/><s n="pers"/><s n="p3"/><s n="ut"/><s n="sg"/><s n="acc"/></r></p></e>
  <e><p><l>s</l><r><s n="prn"/><s n="pers"/><s n="p3"/><s n="ut"/><s n="sg"/><s n="gen"/></r></p></e>
  <e><p><l></l><r><s n="prn"/><s n="pers"/><s n="p3"/><s n="ut"/><s n="pl"/><s n="nom"/></r></p></e>
  <e><p><l></l><r><s n="prn"/><s n="pers"/><s n="p3"/><s n="ut"/><s n="pl"/><s n="acc"/></r></p></e>
  <e><p><l>s</l><r><s n="prn"/><s n="pers"/><s n="p3"/><s n="ut"/><s n="pl"/><s n="gen"/></r></p></e>
</pardef>
And what does the call to the pardef look like?
-Kevin
Re: [Apertium-stuff] Swedish and Danish pronouns
Per Tunedal per.tune...@operamail.com writes:
Hi again, bidix:
<e r="LR" a="PT"><p><l>somliga<s n="prn"/><s n="pers"/><s n="p3"/><s n="pl"/></l><r>nogen<s n="prn"/><s n="ut"/><s n="p3"/><s n="sg"/></r></p></e>
I'm trying to translate a pronoun in 3rd person plural to another in 3rd person singular, utrum. Maybe this isn't the way to do it?
It should work.
-Kevin
Re: [Apertium-stuff] Swedish and Danish pronouns
Per Tunedal per.tune...@operamail.com writes:
Hi again, the call:
<e lm="undertegnede" r="RL" a="PT"><i></i><par n="undertegnede__prn"/></e>
The lemma and form are missing; that is, nowhere do you add "undertegnede" to <l> or <r>. If it's the same in all forms, the simplest would be to add it to the <i> (which is shorthand for <p><l>…</l><r>…</r></p>). Remember that the lm attribute is, to lttoolbox, regarded as a comment.
-Kevin
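To see why the empty <i> is the problem, here is a small Python sketch that rewrites <i>…</i> into the <p><l>…</l><r>…</r></p> pair it stands for, using only the standard library. It is an illustration, not how lttoolbox is implemented; the function name is made up, and it only handles plain text inside <i> (no nested elements).

```python
import xml.etree.ElementTree as ET

def expand_identity(entry_xml):
    """Replace each <i>text</i> child of an <e> entry by <p><l>text</l><r>text</r></p>.

    The lm attribute is left alone and never consulted: the surface form
    and lemma come only from what is written inside <i>, <l> and <r>.
    """
    entry = ET.fromstring(entry_xml)
    for index, child in enumerate(list(entry)):
        if child.tag == "i":
            pair = ET.Element("p")
            ET.SubElement(pair, "l").text = child.text
            ET.SubElement(pair, "r").text = child.text
            entry.remove(child)
            entry.insert(index, pair)
    return entry

# The entry as posted: <i> is empty, so both sides stay empty.
broken = expand_identity(
    '<e lm="undertegnede"><i></i><par n="undertegnede__prn"/></e>'
)
# The suggested fix: put the invariant part inside <i>.
fixed = expand_identity(
    '<e lm="undertegnede"><i>undertegnede</i><par n="undertegnede__prn"/></e>'
)
```

In the broken entry both <l> and <r> end up with no text, so the only material in the entry is what the paradigm adds; in the fixed one both sides carry "undertegnede" before the paradigm is applied.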