from:"Kevin Brubeck Unhammer"

Re: [Apertium-stuff] Translators on www.apertium.org

2010-04-19 Thread Kevin Brubeck Unhammer

2010/4/19 Felipe Sánchez Martínez fsanc...@dlsi.ua.es:

> Hi all,
>
> Prompsit did not go down. Why? Because the language pairs offered there
> are stable and tested.
>
> I would like to raise a question. Should we offer translation between
> developing language pairs on the web page? IMHO we shouldn't.

But what's the measure? Released pairs? The ones at apertium.org have
all had a release. All language pairs that have reached version 1.0?
That's rather arbitrary… All that have had a thorough testvoc?  All
released pairs _should_ have this. Should one simply let, say, half a
year pass before putting a release on the server, to collect bug
reports? How many people actually download the language packages, run
lots of text through them, and then _report the bugs_?

It seems to me like a better solution is to use ScaleMT, and perhaps
let those language pairs that we, for whatever reason, consider too
untested run on a different server. Unless I completely misunderstood
Victor's presentation last fall, using ScaleMT it should be possible
to keep the web page going even though one server goes down (do you
even need ScaleMT to do that?). Thus developers can get quick feedback
on what's wrong (oh, and apertium.org gets to offer more language
pairs). Of course, this assumes that there is the possibility of
having yet another server…


best regards,
Kevin Brubeck Unhammer

___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff

Re: [Apertium-stuff] issues with apertium service ?

2010-04-20 Thread Kevin Brubeck Unhammer

2010/4/20 Francis Tyers fty...@prompsit.com:
> Friedel has noticed a change in the listPairs method: it doesn't seem to
> list pairs. Is he doing anything wrong?
>
> <friedel1> spectie: Hi.
> <friedel1> spectie: Aware of any issues with your service at the moment?
> <friedel1> curl http://api.apertium.org/json/listPairs
> <friedel1>
> {"responseData":[],"responseDetails":null,"responseStatus":200}

Is api.apertium.org running apertium-service? (there it's
languagePairs, not listPairs) I can't test from my IP ;-)
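
A minimal sketch of how a client could tell an empty pair list apart from a failed request, assuming the reply is well-formed JSON with quoted keys. The helper name has_pairs is hypothetical and not part of any Apertium API; only the three field names come from the reply quoted above.

```python
import json


def has_pairs(raw):
    """Return True if the reply reports success and lists at least one pair."""
    reply = json.loads(raw)
    # A 200 status with an empty responseData is the symptom described above.
    return reply.get("responseStatus") == 200 and bool(reply.get("responseData"))


# The reply quoted in the thread: status 200, but no pairs listed.
sample = '{"responseData":[],"responseDetails":null,"responseStatus":200}'
print(has_pairs(sample))
```

A check like this only shows that the call succeeded without returning data; it says nothing about whether the method name itself is the right one for the service that is running.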

-Kevin Unhammer


[Apertium-stuff] strange goings on in post-generation

2010-07-28 Thread Kevin Brubeck Unhammer

Hi,

I tried making a post-generation dictionary with just one rule


<?xml version="1.0" encoding="iso-8859-1"?>
<dictionary>
  <alphabet/>
  <sdefs>
    <sdef n="test"/>
  </sdefs>
  <section id="main" type="standard">
    <e>
      <p>
        <l><a/>e</l>
        <r>e</r>
      </p>
    </e>
  </section>
</dictionary>


but I get slashed output when I try running it:


$ echo '~el' | lt-proc -p foo.autopgen.bin
e\/el


is this a bug or am I missing something?


--
Kevin Brubeck Unhammer


[Apertium-stuff] Arch Linux PKGBUILD's now available for (almost) all released language pairs

2010-09-15 Thread Kevin Brubeck Unhammer

Hi,

I just wanted to let people know that I've uploaded AUR packages for
Arch Linux for all released language pairs in Apertium (except for
is-en, which seems to require a newer version of apertium-pretransfer
than what is in apertium-3.1.1). If anyone's running Arch Linux, I'd
be very happy if they could give them a try and let me know where the
bugs are hiding :-)


Also, in making the packages I discovered some problems with certain
pairs; I had to apply the following patches to make these pairs
compile:
apertium-es-ro:
http://aur.archlinux.org/packages/apertium-es-ro/apertium-es-ro/trules.patch
apertium-oc-ca:
http://aur.archlinux.org/packages/apertium-oc-ca/apertium-oc-ca/t1x.patch
apertium-oc-es:
http://aur.archlinux.org/packages/apertium-oc-es/apertium-oc-es/oc-es.t1x.patch

http://aur.archlinux.org/packages/apertium-oc-es/apertium-oc-es/es-oc.t1x.patch
...I'm not sure I got the logic here as intended; these pairs should
probably have a maintenance release.


For all the pairs, I had to modify the Makefile.am in this manner:

-   $(INSTALL_DATA) $(BASENAME).$(PREFIX2).t1x $(apertium_nn_nbdir)
+   $(INSTALL_DATA) $(BASENAME).$(PREFIX2).t1x $(DESTDIR)$(apertium_nn_nbdir)

I think $(DESTDIR) could be added to the Makefile.am files in SVN without
causing any trouble; it seems to make no difference except when creating
these packages.


best regards,
Kevin B. Unhammer


Re: [Apertium-stuff] maintenance release of lttoolbox and Apertium

2010-09-21 Thread Kevin Brubeck Unhammer

2010/9/21 Francis Tyers fty...@prompsit.com:
> For some reason, we had versioned Apertium and lttoolbox as 3.2 in SVN,
> but never got around to making a 3.2 release. There have been some minor
> bugfixes and improvements -- fixing an issue in pretransfer and updating
> the DTDs -- and I think it is worth making a 3.2 release, not least
> because Unhammer wants to release apertium-nn-nb 0.7.0 ;)

Thanks =D
Also, the code for <append> was added to interchunk/postchunk.cc (it
was in the DTDs but not in the code).


-Kevin


Re: [Apertium-stuff] Bug in apertium-en-es

2010-10-06 Thread Kevin Brubeck Unhammer

2010/10/6 Miquel Esplà miqueles...@gmail.com:
> Hi everybody,
> I've found a problem with the version of apertium-en-es in SVN
> (https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-en-es)
> at revision 25956. When I try to translate an English text containing a
> $ symbol, the symbol disappears in the translation. I've tried to
> translate a file with the single sentence
> hello $ world
> and the result is:
> hello world.
> When I tried the translation from Spanish to English it worked, but
> from English to Spanish it fails.
> I am using lttoolbox-3.2.0 and apertium-3.2.0 and the version of
> apertium-es-en in SVN.
> Can anybody help, please? Cheers,
> Miquel.

If you add $ to the <alphabet>, it will work (and $ will be marked
unknown if you don't use -u). But I'm not sure whether this causes other
problems.

best regards,
Kevin Brubeck Unhammer


Re: [Apertium-stuff] News from the mentor summit about GCI -- make tasks specific/Taskset 1: crossdics

2010-10-27 Thread Kevin Brubeck Unhammer

2010/10/27 Jacob Nordfalk jacob.nordf...@gmail.com:
> I've looked at http://wiki.apertium.org/wiki/Ideas_for_Google_Code-in
> Do you really think anyone can:
> 1) translate a text of 34,268 bytes (the new language pair HOWTO) into
> another language
> 2) go through it for a new pair of languages.
> 3) When finished, upload to the Incubator.
> in 2-3 HOURS!??!??
> Well, I've tried that task when I started out. I might be extraordinarily
> slow, but just doing step 1) would take me at least half a day for Esperanto.
> The same goes for the other proposals: these <=18-year-old students must be
> really bright, but in general I would multiply all your estimates by a
> factor of 3.
>
> Here is a proposal for what I would consider a realistic task for GCI:
> Add 50 nouns to apertium-sv-da. Check that the words work in both
> directions (from Swedish to Danish and from Danish to Swedish).
> Time: 14 hours
> (install & compile: 4 hours. Understand the format of the 3 .dix files to
> edit: 2 hours. Adding the words: 4 hours. Checking translation in both
> directions and fixing problems: 4 hours).

The time estimates do seem rather low yes. However, I think they're
supposed to reflect only the work that's on that specific task (since
students can work on several tasks, so they won't install apertium for
each task...)

The wiki page also does say "The time column gives the minimum
estimated amount of time that should be spent on the task. It does not
include time taken to install / set up apertium." (now boldfaced, as I
missed it the first time too)



-Kevin


Re: [Apertium-stuff] diccionarios de Apertium

2010-11-25 Thread Kevin Brubeck Unhammer

jgime...@lsi.upc.edu writes:

[...]

> On 24/11/10 17:15, Jesús Giménez wrote:
>> For now, I have been taking a look, and I think the simplest thing
>> will be to use apertium-dixtools to read the .dix files.
>>
>> Needless to say, any suggestion from you will be welcome!
>>
>> Many thanks,
>>
>> jesus
>>
>>
>> PS: by the way, checking out all of Apertium from Subversion gave me
>> an encoding problem --
>>
>> svn: Can't convert string from 'UTF-8' to native encoding:
>> svn: apertium/apertium-nn-nb/dev/dansknorsk-h?\195?\184gnorsk-todo.dix

(Sorry for replying in English)

Does the problem only occur with this file? That is only a scratch
file which should not compile in any case (it does not even validate);
in general, files in dev folders are likely to have errors...

best regards,
Kevin Brubeck Unhammer


Re: [Apertium-stuff] Compound words and dix format

2010-12-19 Thread Kevin Brubeck Unhammer

Francis Tyers fty...@prompsit.com writes:

> Now we have the java compound word implementation ported to C++ we can
> probably consider this 'de facto' how we are going to do compounds in
> lttoolbox -- it is _in use_ and there have been _no alternatives_.
>
> So it is probably worth looking at how we are going to represent this
> nicely in the .dix format. At the moment we use two 'special' symbols:
>
> <sdef n="compound-only-L" c="for a form that can only appear on the L"/>
> <sdef n="compound-R" c="for a form that can only appear on the R, or
> as a word on its own"/>
>
> I propose making a new element <c> for compound, and having one
> attribute r for restriction.
>
> <s n="compound-only-L"/> would be replaced with <c r="L"/> and
> <s n="compound-R"/> would be replaced with <c r="R"/>

I think it would be better if elements with <c r="R"/> are, like
<c r="L"/>, compound-only. As the examples below show, an element
marked <s n="compound-R"/> now allows use both in compounds and outside
compounds, while <s n="compound-only-L"/> marks a path that's only
reachable in compounds. I think new users would find it less confusing
if they mean the same thing, even though it requires a slightly more
explicit dix file. So instead of

   <e><p><l>plast</l><r>plast<s n="n"/><s n="m"/><s n="sg"/><s n="ind"/><c r="L"/></r></p></e>
   <e><p><l>plast</l><r>plast<s n="n"/><s n="m"/><s n="sg"/><s n="ind"/></r></p></e>
   <e><p><l>kortet</l><r>kort<s n="n"/><s n="nt"/><s n="sg"/><s n="def"/><c r="R"/></r></p></e>

you would have to have

   <e><p><l>plast</l><r>plast<s n="n"/><s n="m"/><s n="sg"/><s n="ind"/><c r="L"/></r></p></e>
   <e><p><l>plast</l><r>plast<s n="n"/><s n="m"/><s n="sg"/><s n="ind"/></r></p></e>
   <e><p><l>kortet</l><r>kort<s n="n"/><s n="nt"/><s n="sg"/><s n="def"/><c r="R"/></r></p></e>
   <e><p><l>kortet</l><r>kort<s n="n"/><s n="nt"/><s n="sg"/><s n="def"/></r></p></e>

(Note the beautiful symmetry.)


The original reason for having this difference was that we so far have
no examples of forms that can be compound-R but not words on their own,
so having those extra identical lines means longer dix files. 

However, lttoolbox has this wonderful feature called pardefs :) So what
the line for kortet really looks like is this:

  <e>   <p><l>kortet</l><r>kort<s n="n"/><s n="nt"/><s n="sg"/><s n="def"/></r></p><par n="cp-R"/></e>

where

<pardef n="cp-R">
   <!-- can appear in compounds: -->
   <e>   <p><l></l>  <r><c r="R"/></r></p></e>
   <!-- can appear as a word on its own: -->
   <e>   <p><l></l>  <r></r></p></e>
</pardef>


So, if we're deciding on specifications, that's the only thing I'd like
to see changed. 


-Kevin


-- 

Sent from my Emacs



Re: [Apertium-stuff] Compound words and dix format

2010-12-21 Thread Kevin Brubeck Unhammer

Francis Tyers fty...@prompsit.com writes:

> Hi!
>
> The problem with this is that there are so many different metadix
> formats that it will be impossible to come up with one that covers them
> all. For example, if I remember correctly, how the alt works is
> different in es-pt and in oc-es. I think it was decided that it was
> desirable to have them functioning differently, or at least that it would
> require substantial changes in either language pair to get a unified
> format -- changes that without some push (and let's face it, cash) are
> not going to get made.
>
> On the other hand, implementing compound words gives us the chance to
> strike while the iron is hot! We can make a fairly innocuous change
> (any language pair that does not have compounding will be unaffected)
> before getting a plethora of different options, thus avoiding the
> metadix problem for another set of issues.
>
> Btw, thinking about metadix I have some probably unpopular ideas
> that would preclude any standardisation. I think that maybe we should not
> have one format, but rather many _codified_ formats depending on the
> language (group). For example, how to include a verb would be different in
> Tajik and Dutch, because different things are important. Unnecessary
> examples:
>
> <e lm="aanzitten"><par n="z/itten__vblex" prefix="aan" pp="aangezeten"/></e>
>
> Giving:
>
> <e lm="aanzitten"><i>aanz</i><par n="aanz/itten__vblex_sep"/></e>
> <e lm="aanzitten"><p><l>z</l><r>aanz</r></p><par n="z/itten#_aan__vblex_sep"/><p><l><b/>aan</l><r></r></p></e>
> <e lm="aanzitten"><p><l>aangezeten</l><r>aanzitten</r></p><par n="gesproken__vblex_sep"/></e>
>
> Or in Tajik:
>
> <e lm="харидан"><par n="кард/ан__vblex" stem1="харид" stem2="хар"/></e>

In the unification proposal from

http://wiki.apertium.org/wiki/Unification_of_metadix_and_parametrized_dictionaries#A_unifying_proposal

the calls would look like

<e lm="aanzitten"><par n="z/itten__vblex" prms="prefix='aan' pp='aangezeten'"/></e>

and

<e lm="харидан"><par n="кард/ан__vblex" prms="stem1='харид' stem2='хар'"/></e>


Are there good reasons not to go with that kind of syntax?


-- 
Kevin Brubeck Unhammer


[Apertium-stuff] modify-case aa on uppercased input

2011-01-05 Thread Kevin Brubeck Unhammer


Hi,

Is there a bug in 

<modify-case>
  <clip pos="1" side="tl" part="lemh"/>
  <lit v="aa"/>
</modify-case>

when the input is all uppercase, or am I using it wrong?


wget http://apertium.codepad.org/GdrOe3nL/raw.txt -O problem.t1x
wget http://apertium.codepad.org/wo597sse/raw.txt -O problem.dix
lt-comp lr problem.dix problem.dix.bin
apertium-preprocess-transfer problem.t1x problem.t1x.bin
echo '^GUOKTE<Num>$' | apertium-transfer problem.t1x problem.t1x.bin problem.dix.bin


gives


^det<det><qnt>{^tO<det><qnt>$}$


whereas I was expecting to see


^det<det><qnt>{^to<det><qnt>$}$


-- 
best regards,
Kevin Brubeck Unhammer



Re: [Apertium-stuff] modify-case aa on uppercased input

2011-01-05 Thread Kevin Brubeck Unhammer

Francis Tyers fty...@prompsit.com writes:

> On Wed, 5 Jan 2011 at 09:32 +0100, Kevin Brubeck Unhammer wrote:
>> Hi,
>>
>> Is there a bug in
>>
>> <modify-case>
>>   <clip pos="1" side="tl" part="lemh"/>
>>   <lit v="aa"/>
>> </modify-case>
>>
>> when the input is all uppercase, or am I using it wrong?
>>
>>
>> wget http://apertium.codepad.org/GdrOe3nL/raw.txt -O problem.t1x
>> wget http://apertium.codepad.org/wo597sse/raw.txt -O problem.dix
>> lt-comp lr problem.dix problem.dix.bin
>> apertium-preprocess-transfer problem.t1x problem.t1x.bin
>> echo '^GUOKTE<Num>$' | apertium-transfer problem.t1x problem.t1x.bin problem.dix.bin
>>
>>
>> gives
>>
>>
>> ^det<det><qnt>{^tO<det><qnt>$}$
>>
>>
>> whereas I was expecting to see
>>
>>
>> ^det<det><qnt>{^to<det><qnt>$}$
>
> I think the code that deals with this is in transfer.cc
>
> string
> Transfer::copycase(string const &source_word, string const &target_word)
>
> I'm struggling to make heads or tails of that though. In the en-ca
> rules, you find:
>
>   <modify-case>
>     <clip pos="1" side="tl" part="lem"/>
>     <lit v="aa"/>
>   </modify-case>
>
> and in the es-ca rules too. So I guess you are calling it right.
>
> It would seem to be a bug of some description.

s_word == "aa", t_word == "TO"
then for s_word: firstupper is false, uppercase is false, sizeone is false

  if(!uppercase || (sizeone && uppercase))
  {
    result = t_word;
    result[0] = towlower(result[0]);
    //result = StringUtils::tolower(t_word);
  }
  else
  {
    result = StringUtils::toupper(t_word);
  }

  if(firstupper)
  {
    result[0] = towupper(result[0]);
  }

gives us "tO" (first test passes). If we change the first test to

  if(!uppercase || (sizeone && uppercase))
  {
    result = t_word;
    //result[0] = towlower(result[0]);
    result = StringUtils::tolower(t_word);
  }

we get the expected "to". Does anyone know why we would want to only
lowercase the first character?



On a related note, why is sizeone && uppercase treated as if it were
lowercase? Isn't it safer to simply ignore sizeone words passed to
modify-case? E.g.

  if(!sizeone){
    if(!uppercase) { tolower }
    else { toupper }
    if(firstupper) { toupper [0] }
  }
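
To make the two variants easier to compare, here is a small Python sketch of the logic discussed above. It is an illustration only, not the C++ in transfer.cc: the way firstupper, uppercase and sizeone are derived from s_word is an assumption that merely agrees with the values stated for s_word == "aa", and the lowercase_all switch is a hypothetical parameter standing in for the proposed change.

```python
def copycase(s_word, t_word, lowercase_all=False):
    """Copy the case pattern of s_word onto t_word (sketch, not the real code)."""
    # Assumed definitions; they give False/False/False for s_word == "aa".
    firstupper = s_word[:1].isupper()
    uppercase = firstupper and s_word[-1:].isupper()
    sizeone = len(s_word) == 1

    if not uppercase or (sizeone and uppercase):
        if lowercase_all:
            # Proposed change: lowercase the whole target word.
            result = t_word.lower()
        else:
            # Behaviour reported above: only the first character is lowercased.
            result = t_word[:1].lower() + t_word[1:]
    else:
        result = t_word.upper()

    if firstupper:
        result = result[:1].upper() + result[1:]
    return result


print(copycase("aa", "TO"))                      # reported behaviour
print(copycase("aa", "TO", lowercase_all=True))  # proposed behaviour
```

With the "aa" pattern and an all-uppercase lemma, the first call reproduces the mixed-case result and the second gives the fully lowercased one.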



-Kevin


[Apertium-stuff] Election of new Apertium PMC

2011-01-25 Thread Kevin Brubeck Unhammer


Hi all, 

The Apertium Project Management Committee has just been elected by the
census of Committers, as per the Apertium By-laws[1]. 

According to the by-laws, the responsibilities of the PMC include
deciding what is suitable for release as an Apertium product,
maintaining the repositories and web sites, speaking on behalf of the
project, resolving license disputes, granting commit access, maintaining
the by-laws, promoting Apertium and attracting and distributing funds of
the project.


The newly elected PMC members are:

 Mikel (president)
 Jacob
 Juan Antonio
 Jim
 Felipe
 Sergio
 Fran


Congratulations to them all :-) 



best regards,
Kevin Brubeck Unhammer, of the Election Board


Footnotes: 
[1]  http://wiki.apertium.org/wiki/By-laws


[Apertium-stuff] Null character makes lt-proc (without -z option) exit

2011-01-26 Thread Kevin Brubeck Unhammer

Hi,

The -z option makes lt-proc flush whenever it sees the null character,
which is nice. But if you don't give it -z, it exits on the null
character -- I'm guessing it shouldn't... 

Added a bug here: 
http://bugs.apertium.org/cgi-bin/bugzilla/show_bug.cgi?id=108

(I got a null character out when converting a pdf to text, so they do
occur in the wild.)
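
Until that is fixed, one workaround on the user's side is to drop null characters from the text before it reaches the pipeline. The sketch below is only that, a workaround under the assumption that the nulls carry no meaning in the input; strip_nulls is a hypothetical helper, not an Apertium tool, and it must not be used together with lt-proc -z, which relies on the null character as a flush signal.

```python
def strip_nulls(text):
    """Remove every NUL character so it cannot end processing early."""
    return text.replace("\x00", "")


# Example: text extracted from a PDF with a stray NUL in the middle.
extracted = "first sentence.\x00 second sentence."
print(strip_nulls(extracted))
```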


--
Kevin Brubeck Unhammer






Re: [Apertium-stuff] Null character makes lt-proc (without -z option) exit

2011-01-26 Thread Kevin Brubeck Unhammer

Jimmy O'Regan jore...@gmail.com writes:

> On 26 January 2011 09:34, Kevin Brubeck Unhammer unham...@fsfe.org wrote:
>> Hi,
>>
>> The -z option makes lt-proc flush whenever it sees the null character,
>> which is nice. But if you don't give it -z, it exits on the null
>> character -- I'm guessing it shouldn't...
>>
>
> Yeah, though I think it's one of those things that falls into the
> category of "if this has happened, you have bigger problems than the
> translator not working".
>
> It would probably be enough to either escape or discard nulls in the
> deformatter. Is there any compelling reason to not simply discard
> them?

Only if you want to use lt-proc -z. That is, removing nulls in the
deformatter would have to be optional, so it can still work with
lt-proc -z.

Of course you can just run everything with lt-proc -z anyway... but
maybe that gives other side effects?

>> Added a bug here:
>> http://bugs.apertium.org/cgi-bin/bugzilla/show_bug.cgi?id=108
>>
>> (I got a null character out when converting a pdf to text, so they do
>> occur in the wild.)
>>
>
> Seems to me to be a double bug -- whatever you were using almost
> certainly should not have given you a null in its output.

Of course; notified pdfminer of the bug too.


-Kevin



[Apertium-stuff] en-es-generador gives a bit too many alternatives on all-caps regexp names

2011-02-06 Thread Kevin Brubeck Unhammer


Hi,

This oddness happens in apertium-en-es, current svn revision:

$ echo Mrs. FOOBAR|apertium -d /l/a/apertium-en-es en-es-generador 
FOOBAR/FOOBAr/FOOBaR/FOOBar/FOObAR/FOObAr/FOObaR/FOObar/FOoBAR/FOoBAr/FOoBaR/FOoBar/FOobAR/FOobAr/FOobaR/FOobar/FoOBAR/FoOBAr/FoOBaR/FoOBar/FoObAR/FoObAr/FoObaR/FoObar/FooBAR/FooBAr/FooBaR/FooBar/FoobAR/FoobAr/FoobaR/Foobar

It seems to be fine up until postchunk:

$ echo Mrs. FOOBAR|apertium -d /path/to/apertium-en-es en-es-postchunk 
^Pn000FOOBAR<np><ant><mf><sg>$^.<sent>$

(and the web gives Señora FOOBAR so I guess it did work before).
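
For what it's worth, the 32 alternatives above are exactly the forms you get by keeping the initial F and letting each of the remaining five letters be upper or lower case (2^5 = 32). The Python sketch below only reproduces that pattern to make it visible; it is not taken from the generator and says nothing about where in the pipeline the expansion happens. The function name case_variants is hypothetical.

```python
from itertools import product


def case_variants(word):
    """All forms with the first letter kept and every other letter in either case."""
    head, tail = word[0], word[1:]
    # For each remaining letter, offer the uppercase form first, then lowercase.
    choices = [(ch.upper(), ch.lower()) for ch in tail]
    return [head + "".join(combo) for combo in product(*choices)]


variants = case_variants("FOOBAR")
print(len(variants))
print("/".join(variants))
```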



-- 
Kevin Brubeck Unhammer


http://donttrack.us/ -- because you're worth it



Re: [Apertium-stuff] GSoC,New Language Pair tr-ky.

2011-03-28 Thread Kevin Brubeck Unhammer

mirlan mip1...@yahoo.com writes:

>> * If you use trmorph, how will you trim the lemmas to the contents of
>> the bilingual dictionary?
>
> I am working on it.

Please explain how ;-)

The regular method[1] is to take an lttoolbox analyser, find the
full set of possible input-output pairs using the program lt-expand, and
run that through the translator to check for errors. Unfortunately, when
your analyser is in SFST/HFST format -- which allows lots of loops
in the analyser -- things get a bit more complicated. Brian Croom's
hfst-fst2strings[2] attempts to do something similar to lt-expand, while
providing some ways to filter the possibilities.
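
As a rough illustration of the first step of that method, the Python sketch below turns lines of lt-expand output into lexical units that could be fed to the rest of the translator. It rests on assumptions that should be checked against your lttoolbox version: that each line has the form surface:analysis, that ":>:" marks analysis-only entries and ":<:" generation-only entries, and that surface forms contain no colon. The function name is hypothetical.

```python
def expanded_line_to_lexical_unit(line):
    """Convert one lt-expand line into '^analysis$', or None if it is skipped."""
    line = line.strip()
    if not line or ":<:" in line:
        # Assumed to be generation-only entries; not reachable from analysis.
        return None
    if ":>:" in line:
        _surface, analysis = line.split(":>:", 1)
    else:
        _surface, analysis = line.split(":", 1)
    return "^" + analysis + "$"


# Made-up sample lines in the assumed format.
sample = ["houses:house<n><pl>", "colour:>:color<n><sg>", "x:<:y<n><sg>"]
units = [u for u in map(expanded_line_to_lexical_unit, sample) if u]
print(" ".join(units))
```

Anything that comes out of the translator with an error mark for one of these units points at a gap between the analyser and the bilingual dictionary or generator.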

>> * How will you make the bilingual lexicon? I presume there are few
>> freely-available (e.g. open-source/free software) dictionaries, so you
>> will probably have to build your own. Someone with experience of
>> Apertium can do ~400 words in a day, so we would like to see a start on
>> the lexicon to make sure you understand the problems involved.
>
> Right now I have a StarDict tr-ky dictionary; I hope it could help me.

Is there a link? Does it have part-of-speech (word class) information?
(That would make it a lot easier to use.)

>> * It would be a good idea to start looking at any transfer
>> (syntactic/morphological) issues between the two languages.
>
> tr-ky have some similarities […]

We are more interested in the differences ;) E.g. differences in case
system, inflection, word order, etc.

The best way to document such differences (or similarities) is to make a
page like http://wiki.apertium.org/wiki/English_and_French/Pending_tests
which you can then test your language pair on.

Do come on IRC more so we can discuss the issues and any possible
problems you have; we don't want anyone to waste lots of time on
something that could be solved by discussing it on IRC :)

best regards,
Kevin Brubeck Unhammer

Footnotes:

[1] http://wiki.apertium.org/wiki/Testvoc

[2]
http://sourceforge.net/mailarchive/forum.php?thread_name=AANLkTinYnDtHehxWWAJf25JVXKYaM0Uw95Kzr41jgKZo%40mail.gmail.com&forum_name=apertium-stuff


Re: [Apertium-stuff] GSoC11 Draft Proposal: Rule-based finite-state disambiguation

2011-04-01 Thread Kevin Brubeck Unhammer

> primarily to detail item (5)
>   1 week sprint: final polish, debugging and documentation effort

I'd like to see a more detailed plan, especially wrt. which features
should be implemented and prioritised. Some of the CG functions
implemented by e.g. vislcg3[1] are a lot more important than others, so
think about the feature set and test cases for that. E.g. LIST,
SELECT/REMOVE, star (*), BARRIER and Careful (C) are important. Things like
spanning window boundaries, setting marks or making dependency trees
should be deferred until much later. Unification can be avoided by
simply writing more rules.


[...]

> Recently I've been working on an online chessboard (jQuery/node.js),

Include the URL in your proposal, if you can ;) 







-- 
Kevin Brubeck Unhammer


Re: [Apertium-stuff] Update propsal for GSoC 2011 Apertium tr-ky language pair.

2011-04-04 Thread Kevin Brubeck Unhammer

mirlan mip1...@yahoo.com writes:

> Hi,
> Please find attached my proposal for GSoC 2011.

Looks promising, but please make sure you answer all the questions in
http://wiki.apertium.org/wiki/Top_tips_for_GSOC_applications#Template
(and in the same order).

What do you plan to do in the Community Bonding period?



best regards,
Kevin Brubeck Unhammer


Re: [Apertium-stuff] Which package to download?

2011-05-03 Thread Kevin Brubeck Unhammer

Congmin min marlon...@gmail.com writes:

> Hi, I am new to Apertium and have two questions I would like help with:
> 1) It seems there is no single bundled package on SourceForge for
> downloading. Which ones should I download for Linux or Windows? For
> example, I want to download and install Apertium, and then try out the
> English-Spanish translation first.

lttoolbox, apertium, apertium-en-es 
(install them in that order)

However, if you're planning on developing a language pair, it would be
better to install from SVN:
http://wiki.apertium.org/wiki/Minimal_installation_from_SVN

> 2) Is it possible to develop an English-Chinese language pair without
> significantly changing the system?

There might _eventually_ be a problem with handling the alphabet size in
lttoolbox, although if I remember the conversation from last time,
jimregan said it shouldn't be too much trouble to fix… Other than that,
I can't foresee any technical issues.


-- 
Kevin Brubeck Unhammer


Sent from my emacs.



Re: [Apertium-stuff] Apertium-stuff Digest, Vol 49, Issue 4

2011-05-24 Thread Kevin Brubeck Unhammer

Aish Raj Dahal dahalaish...@gmail.com writes:

 In the above example, I have noticed that the ukar symbol of
 Devnagari is not being rendered so ? is being seen as ???.
 There is also a problem with rendering of half letters (sorry, i do
 not know the linguistic term for it). Here is an example of what I
 mean:

 echo computer|apertium en-ne
 ??
 In the above example the word ?computer? should have given ?

 I get ? -- the problem is with your terminal not rendering the
 combining characters, not with Apertium. gnome-terminal is known to
 have issues with Devanagari, is that what you're using?

 Well, I guessed so. I am using the terminal Konsole under KDE 4.6 (Kubuntu
 11.04). Is there a way to work around this problem?

I get the same behaviour under Konsole on Arch Linux with KDE 4.6:

$ echo computer | apertium -d . en-ne
कमपयटर

while piping into a file gives me कम्प्युटर

It seems to be a known bug, with a patch (last one from 2 years ago?) if
you feel like recompiling: http://bugs.kde.org/show_bug.cgi?id=156071

But it might be quicker to just install gnome-terminal/xterm/something
else. Or open emacs and do M-x shell, which displays it correctly :)
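
One way to check that the data itself is intact is to look at which code points the terminal has to combine. The following is only a rough Python sketch, not part of Apertium; the function name is made up for illustration. It drops the combining marks from the correct output, and what remains looks like the string Konsole shows above.

```python
import unicodedata

# The correct translation, written with escapes so the example does not
# depend on how this text is rendered:
# KA MA VIRAMA PA VIRAMA YA VOWEL-SIGN-U TTA RA
CORRECT = "\u0915\u092e\u094d\u092a\u094d\u092f\u0941\u091f\u0930"


def strip_combining_marks(text):
    """Remove nonspacing (Mn) and spacing (Mc) combining marks."""
    return "".join(ch for ch in text
                   if unicodedata.category(ch) not in ("Mn", "Mc"))


if __name__ == "__main__":
    # List every code point with its general category and name.
    for ch in CORRECT:
        print("U+%04X %s %s" % (ord(ch), unicodedata.category(ch),
                                unicodedata.name(ch)))
    print(strip_combining_marks(CORRECT))
```

If the bytes in the file contain the virama and the vowel sign, Apertium did its job and only the display is at fault.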


-- 
Kevin Brubeck Unhammer


Re: [Apertium-stuff] Official Apertium buttons

2011-06-07 Thread Kevin Brubeck Unhammer

Fajro fai...@gmail.com writes:

 On Tue, Jun 7, 2011 at 3:16 PM, Mikel Forcada m...@dlsi.ua.es wrote:
 Hi Apertiumers,
 would HTML/javascript buttons such as the one below (which of course
 can easily be improved) be acceptable to the Apertium community.

 +1.


 I made a facebook page 2 years ago: http://www.facebook.com/Apertium

 Still less than 50 fans :(  Anyone want to be admin?


 Apertium also should have a cool blog; something like
 http://googletranslate.blogspot.com/ but better.

Or at least a planet (blog aggregator) ?
(see https://secure.wikimedia.org/wikipedia/en/wiki/Planet_%28software%29)

-Kevin


[Apertium-stuff] Constraint Grammar infrastructure at risk

2011-06-09 Thread Kevin Brubeck Unhammer


The University of Southern Denmark has decided to cut financial support
of the VISL Constraint Grammar infrastructure, and the developers are
calling for moral/financial contributions or lobbying initiatives:

https://groups.google.com/group/constraint-grammar/browse_thread/thread/515081fab2b2797d



-- 
Kevin Brubeck Unhammer


Re: [Apertium-stuff] [libvoikko] Lttoolbox (Apertium) morphology backend

2011-08-16 Thread Kevin Brubeck Unhammer

Francis Tyers fty...@prompsit.com writes:

 El dg 28 de 02 de 2010 a les 21:40 +0200, en/na Harri Pitkänen va
 escriure:
 On Sunday 28 February 2010, Francis Tyers wrote:
   I don't know Icelandic at all and therefore can't tell whether some of
   the  words are accepted or rejected incorrectly.
  
  Nice, it looks good. Some of the capitalised words should be recognised
  correctly, at least 'Bretlandi' and 'Norðmenn'.
 
 I tried to fix the checking of capitalized words but started to run into
 problems. It seems that the library API works in somewhat surprising (at
 least to me) ways when you enter a word that starts with a capital letter
 and ends with garbage.
 
 The implementation is here
 http://voikko.svn.sourceforge.net/viewvc/voikko/trunk/libvoikko/src/morphology/LttoolboxAnalyzer.cpp?revision=3182&view=markup
 
 and test cases here
 http://voikko.svn.sourceforge.net/viewvc/voikko/trunk/libvoikko/python/ApertiumIcelandicTest.py?revision=3183&view=markup
 
 I was able to get all test cases except the one with TODO in method name
 implemented. How would you suggest fixing the code so that all tests would
 pass? Of course a patch would be most welcome :)

 Hmm, strangely enough, when I try an unknown word I get similar strange
 output:

 $ ./test mor.bin 
 ^Reykjanghfghesi$ -->
 ^Reykja<vblex><actv><inf>/Reykja<vblex><actv><pri><p3><pl>/Reykur<n><m><pl><gen><ind>$

Seems to be a bug with partly-matching regexes in the biltrans
functions.

Testing the different functions, I get:

biltransWithQueue: ^Reykja<vblex><actv><inf>/Reykja<vblex><actv><pri><p3><pl>/Reykur<n><m><pl><gen><ind>$ qSize: 0
biltransWithoutQueue: ^Reykja<vblex><actv><inf>/Reykja<vblex><actv><pri><p3><pl>/Reykur<n><m><pl><gen><ind>$
biltrans: ^Reykja<vblex><actv><inf>/Reykja<vblex><actv><pri><p3><pl>/Reykur<n><m><pl><gen><ind>$
biltransfull: ^$

But, if I comment out the two regex entries

<e>  <par n="persons"/></e>
<e>  <par n="organisations"/></e>

at the end of apertium-is-en.is.dix, I get

biltransWithQueue: @Reykjanghfghesi qSize: 0
biltransWithoutQueue: @Reykjanghfghesi
biltrans: @Reykjanghfghesi
biltransfull: @Reykjanghfghesi

Similarly on the command line with lt-proc -b (while regular lt-proc -a
returns unknown, as it should – the persons/organisations regexes don't
fully match either).
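
To illustrate the difference between a partial and a full match, here is a toy sketch in Python. It is not lttoolbox code, and the pattern is a hypothetical stand-in; the real regexes behind the persons/organisations entries may look quite different.

```python
import re

# Hypothetical stand-in for a regex entry such as the "persons" one.
PERSON_RE = re.compile(r"[A-Z][a-z]+sson")


def matches_prefix(word):
    """Accept as soon as some prefix of the word matches (the buggy behaviour)."""
    return PERSON_RE.match(word) is not None


def matches_fully(word):
    """Accept only if the whole word matches (the expected behaviour)."""
    return PERSON_RE.fullmatch(word) is not None


if __name__ == "__main__":
    for word in ("Jonsson", "Jonssonghfgh"):
        print(word, matches_prefix(word), matches_fully(word))
```

A word that starts like a name and ends with garbage passes the prefix check but not the full one, which is roughly the situation with Reykjanghfghesi above.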


-- 
Kevin Brubeck Unhammer


Re: [Apertium-stuff] Demonyms from ca.wikipedia

2011-08-25 Thread Kevin Brubeck Unhammer

Jimmy O'Regan jore...@gmail.com
writes:

 $ wget http://downloads.dbpedia.org/3.7/ca/mappingbased_properties_ca.nt.bz2
 $ bzgrep '/demonym' mappingbased_properties_ca.nt.bz2 | \
     perl -MURI::Escape '-MUnicode::Escape qw(unescape)' -ane \
     'if (m!<http://dbpedia.org/resource/([^>]*)> <http://dbpedia.org/ontology/demonym> "([^"]*)"\@ca .!) { print uri_unescape($1)."\t".unescape($2)."\n"; }'

 gives things like:
 Alcover                   Alcoverenc, alcoverenca
 Aiguamúrcia               Aiguamurcienc, aiguamurcienca
 Amer                      Amerencs, amerenques
 Almoster                  Almosterenc, almosterenca
 L'Albiol                  Albiolenc, albiolenca
 Alforja                   Alforgenc, alforgenca
 Argelaguer                Argelaguenc, argelaguenca
 L'Arboç                   Arbocenc, arbocenca
 Arbúcies                  Arbucienc, arbucienca
 Albinyana                 Albinyanenc, albinyanenca

 ...of course, it's not all /that/ neat and tidy:

 Newcastle_upon_Tyne       Geordie
 Encarnación_(Paraguai)    encarnacero/a
 Kristiansand              kristiansander
 Bodø                      bodøværing
 Haugesund                 haugesundar, -er

Demonym is a name for someone who's from a certain place? In that case,
at least the last three should be correct and official[1].


[1]  
http://www.sprakrad.no/nb-NO/Sprakhjelp/Rettskrivning_Ordboeker/Innbyggjarnamn/

-- 
Kevin B. Unhammer



Re: [Apertium-stuff] [libvoikko] Lttoolbox (Apertium) morphology backend

2011-09-02 Thread Kevin Brubeck Unhammer

Kevin Brubeck Unhammer unham...@fsfe.org writes:

 Francis Tyers fty...@prompsit.com writes:

 El dg 28 de 02 de 2010 a les 21:40 +0200, en/na Harri Pitkänen va
 escriure:
 On Sunday 28 February 2010, Francis Tyers wrote:
   I don't know Icelandic at all and therefore can't tell whether some of
   the  words are accepted or rejected incorrectly.
  
  Nice, it looks good. Some of the capitalised words should be recognised
  correctly, at least 'Bretlandi' and 'Norðmenn'.
 
 I tried to fix the checking of capitalized words but started to run into
 problems. It seems that the library API works in somewhat surprising (at
 least to me) ways when you enter a word that starts with a capital letter
 and ends with garbage.
 
 The implementation is here
 http://voikko.svn.sourceforge.net/viewvc/voikko/trunk/libvoikko/src/morphology/LttoolboxAnalyzer.cpp?revision=3182&view=markup
 
 and test cases here
 http://voikko.svn.sourceforge.net/viewvc/voikko/trunk/libvoikko/python/ApertiumIcelandicTest.py?revision=3183&view=markup
 
 I was able to get all test cases except the one with TODO in method name
 implemented. How would you suggest fixing the code so that all tests would
 pass? Of course a patch would be most welcome :)

 Hmm, strangely enough, when I try an unknown word I get similar strange
 output:

 $ ./test mor.bin 
 ^Reykjanghfghesi$ -->
 ^Reykja<vblex><actv><inf>/Reykja<vblex><actv><pri><p3><pl>/Reykur<n><m><pl><gen><ind>$

 Seems to be a bug with partly-matching regexes in the biltrans
 functions.

 Testing the different functions, I get:

 biltransWithQueue: ^Reykja<vblex><actv><inf>/Reykja<vblex><actv><pri><p3><pl>/Reykur<n><m><pl><gen><ind>$ qSize: 0
 biltransWithoutQueue: ^Reykja<vblex><actv><inf>/Reykja<vblex><actv><pri><p3><pl>/Reykur<n><m><pl><gen><ind>$
 biltrans: ^Reykja<vblex><actv><inf>/Reykja<vblex><actv><pri><p3><pl>/Reykur<n><m><pl><gen><ind>$
 biltransfull: ^$

 But, if I comment out the two regex entries

 <e>  <par n="persons"/></e>
 <e>  <par n="organisations"/></e>

 at the end of apertium-is-en.is.dix, I get

 biltransWithQueue: @Reykjanghfghesi qSize: 0
 biltransWithoutQueue: @Reykjanghfghesi
 biltrans: @Reykjanghfghesi
 biltransfull: @Reykjanghfghesi

 Similarly on the command line with lt-proc -b (while regular lt-proc -a
 returns unknown, as it should – the persons/organisations regexes don't
 fully match either).

I put a patch up at
http://bugs.apertium.org/cgi-bin/bugzilla/show_bug.cgi?id=131 which
solves this for both lt-proc -b, as well as biltransWithQueue. Please
test.

I haven't tried with the other biltrans* functions (I can't see that
they're actually used in the rest of Apertium, so I'm not sure what
they're there for).

It also fixes a problem where superfluous characters after tags would
pass as matches in lt-proc -b (this bug was not present in
biltransWithQueue). It's still possible to carry over _tags_ after the
analysis of course.
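
As a rough picture of the intended behaviour (a toy model in Python, not the actual patch; the dictionary contents and function name are made up for illustration): a lookup should succeed only if the input starts with a dictionary entry and whatever follows consists purely of whole tags, which are then carried over.

```python
import re

# Zero or more complete tags such as <m><pl>; anything else is "garbage".
TAGS_ONLY = re.compile(r"(?:<[^<>]+>)*")

# Hypothetical one-entry bidix, for illustration only.
TOY_BIDIX = {"Reykur<n>": "smoke<n>"}


def toy_biltrans(lexical_unit, bidix=TOY_BIDIX):
    """Translate via the longest matching entry, carrying over trailing tags.

    Unknown input is returned with a leading '@', as in the output above.
    """
    best = None
    for source, target in bidix.items():
        if not lexical_unit.startswith(source):
            continue
        rest = lexical_unit[len(source):]
        if TAGS_ONLY.fullmatch(rest) is None:
            continue  # superfluous characters after the tags: no match
        if best is None or len(source) > len(best[0]):
            best = (source, target + rest)
    return best[1] if best else "@" + lexical_unit


if __name__ == "__main__":
    for lu in ("Reykur<n><m><pl>", "Reykur<n>xyz", "Reykjanghfghesi"):
        print(lu, "->", toy_biltrans(lu))
```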


I guess it's not strange that this bug was here, since normally you
never have words without tags in bidix, but when using these functions
on a monodix it of course becomes a problem. (And, although it's not
recommended, if people really do want to have non-tagged lemmas in
bidix, lttoolbox should at least not give analyses for lemmas that are
_not_ in the bidix.)


best regards,
Kevin Brubeck Unhammer



Re: [Apertium-stuff] Installing Language Pair from Incubator

2011-10-10 Thread Kevin Brubeck Unhammer

Francis Tyers fty...@prompsit.com writes:

 Hi Francis,

 It should work the same as usual,

  $ svn co https://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-en-fr
  $ cd apertium-en-fr
  $ ./autogen.sh --prefix=/home/fran/local/
  $ make
  $ make install

 $ echo "Ceci n'est pas une preuve" | apertium -d . fr-en
 This no is  a proof

 (Then whichever scripts are needed for ScaleMT).

ScaleMT has this script that downloads and installs pairs from trunk to
a prefix folder, changing modes.xml in the process:

http://apertium.svn.sourceforge.net/viewvc/apertium/trunk/scaleMT/ScaleMTSlave/src/main/assembly-files/installApertiumAndPairs.sh?revision=34131&content-type=text%2Fplain&pathrev=34131

It shouldn't be too hard to rewrite it to accept
incubator/apertium-en-it as an option to -l. 
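
The core of such a change would be mapping the -l argument to an SVN URL. Here is a minimal sketch of that mapping in Python (the real script is shell, and how it parses -l is not shown here, so the function and its default are assumptions; the trunk URL is inferred from the incubator one):

```python
# Base taken from the incubator checkout URL used earlier in this thread.
SVN_ROOT = "https://apertium.svn.sourceforge.net/svnroot/apertium"


def pair_url(spec, default_module="trunk"):
    """Map 'apertium-en-es' or 'incubator/apertium-en-it' to an SVN URL.

    A bare pair name is assumed to live in trunk; a 'module/' prefix
    overrides that.
    """
    module, sep, name = spec.rpartition("/")
    if not sep:
        module = default_module
    if not name.startswith("apertium-"):
        raise ValueError("not a language pair name: %r" % spec)
    return "%s/%s/%s" % (SVN_ROOT, module, name)


if __name__ == "__main__":
    print(pair_url("apertium-en-es"))
    print(pair_url("incubator/apertium-en-it"))
```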

Or you can just do what the script does manually if you're in a hurry.
E.g. if you installed everything into the default prefix ~/local it
should be something like:

cd apertium-en-it
mv modes.xml modes.xml.original
if [ "$TRADUBI_ENABLED" = "yes" ]
then
    java -jar ../../ScaleMTSlave-1.0.jar -processModes \
        -inputModes modes.xml.original -outputModes modes.xml \
        -tradubiDictionaryPath "$DICT_DIR" -prefix ~/local/bin
else
    java -jar ../../ScaleMTSlave-1.0.jar -processModes \
        -inputModes modes.xml.original -outputModes modes.xml
fi
PKG_CONFIG_PATH=~/local/lib/pkgconfig sh autogen.sh --prefix="$HOME/local"
make
make install
mv modes.xml modes.xml.modified
mv modes.xml.original modes.xml

 El dl 10 de 10 de 2011 a les 08:39 +, en/na Francis Gwapo va
 escriure:
 Hello,
 
 I am using ScaleMT.  How do i install a language pair from the Incubator
 directory.  I would like to install english to french.
 
 Any help is highly appreciated.
 
 
 Francis



Re: [Apertium-stuff] Translation ca-en error

2011-10-21 Thread Kevin Brubeck Unhammer

Jimmy O'Regan jore...@gmail.com
writes:

 On 21 October 2011 09:27, Kevin Brubeck Unhammer unham...@fsfe.org wrote:
 Jimmy O'Regan jore...@gmail.com
 I've fixed it in SVN. One of the macros was being called with too few
 parameters, which was causing a segfault.

 $ apertium-transfervm-compiler -i apertium-en-ca.ca-en.t1x  -o ca-en.v1x.bin
 Error: line 6 107, macro 'f_bcond' needs 2 parameters, passed 1

 Why .bin? I thought that one of the problems with was that it
 /doesn't/ output binary.

Ah, true, hadn't even looked at that.

 ISTR that java transfer issues these warnings too, and I'd be far more
 inclined to use that for debugging because java debuggers exist, and
 the (uncompiled) output is easier to read by far.

It does, as well as some others :)

$ apertium-preprocess-transfer-bytecode-j apertium-en-ca.ca-en.t1x apertium-en-ca.ca-en.t1x.class
Parsing apertium-en-ca.ca-en.t1x
// WARNING: Macro f_bcond is invoked with too few parameters. Adding blank parameters  - for transfer default=chunk/section-rules/rule comment=pro + pro + ANAR + INF (m'ho va donar - gave it to me)/action/choose/otherwise/call-macro n=f_bcond
// WARNING: Attribute a_prep is not defined. Valid attributes are: [a_nom, a_np_acr, a_adj, grau, a_det, a_num, a_verb, pron, sep, a_adv, a_rel, a_pp, a_prn, tipus_prn, pers, gen, nbr, temps, neg, lem, lemq, lemh, whole, tags, chname, chcontent, content]
// Replacing with error_UNKNOWN_ATTR - for transfer default=chunk/section-rules/rule comment=a + DET + MESO ( al juliol → in July/action/out/chunk case=caseFirstWord name=in_meso/tags/tag/clip part=a_prep pos=1 side=tl
Compiling: javac -cp /usr/local/bin/../share/apertium/lttoolbox.jar ./apertium_en_ca_ca_en_t1x.java



Re: [Apertium-stuff] GUI for adding / editing of words in dictionaries

2011-10-24 Thread Kevin Brubeck Unhammer

Francis Tyers fty...@prompsit.com writes:

 El dl 24 de 10 de 2011 a les 01:00 +, en/na Francis Gwapo va
 escriure:
 Hello,
 
 Are there any tools for GUI for adding words in dictionaries?
 
 
 Any help is highly appreciated.

 There have been a number of attempts, you can look at
 trunk/apertium-forms-server, and branches/gsoc2010/alessiojr -- but
 neither of them are particularly effective.

The former has a screenshot and some info at
http://wiki.apertium.org/wiki/Tools#Tools_for_developers


-- 
Kevin Brubeck Unhammer



Re: [Apertium-stuff] Apertium for Sugar Labs IRC live translation

2011-10-31 Thread Kevin Brubeck Unhammer

Jimmy O'Regan jore...@gmail.com
writes:

 On 31 October 2011 10:49, Jimmy O'Regan jore...@gmail.com wrote:
 On 31 October 2011 08:39, Kevin Brubeck Unhammer unham...@fsfe.org wrote:
 Currently, I think the best we have is email a developer (or the
 mailing list). http://wiki.apertium.org/wiki/Tradubi might be an
 alternative, here users can enter translations in a web interface which
 are applied to their system. For these translations to be contributed
 back to the Apertium project, a developer would have to go through them
 and add some meta-information, but it could still be very helpful.

 Tradubi should really be seen as an alternative to adding words to the
 system, not a means to achieving it. I might accept a _short_ wordlist
 from Tradubi once, but not on an ongoing basis. It's really no more
 useful than a list of unknown words.

 I guess I should note that I've never used it myself, and never will
 (Affero), so I don't know if it has other export options than TMX. If
 there are, maybe they're more useful than a list of unknowns, but I
 still don't imagine it being a viable way of expanding dictionaries.

The article says they create lttoolbox bidix (inserted between the
part-of-speech tagger and the structural transfer module).

I guess it should be possible to add POS along with the lemmas in the
GUI, but it doesn't seem to be implemented yet anyway.


-Kevin



Re: [Apertium-stuff] Apertium for Sugar Labs IRC live translation

2011-11-01 Thread Kevin Brubeck Unhammer

Aleksey Lim alsr...@activitycentral.org
writes:

 On Mon, Oct 31, 2011 at 03:31:46AM +, Aleksey Lim wrote:
 Hi all!
 
 About Sugar:
 
 Sugar is a learning platform that reinvents how computers are used
 for education. Collaboration, reflection, and discovery are
 integrated directly into the user interface. Sugar promotes studio
 thinking and reflective practice. Through Sugar's clarity of
 design, children and teachers have the opportunity to use computers
 on their own terms. Students can reshape, reinvent, and reapply both
 software and content into powerful learning activities. Sugar's
 focus on sharing, criticism, and exploration is grounded in the
 culture of free software (FLOSS). 
 
 More information about Sugar might be found on http://wiki.sugarlabs.org/.
 
 For some time, Sugar Labs used Google translation API to automatically
 translate IRC posts in several Sugar related channels [1]. But Google
 is closing this service for free usage. Since Sugar is totally about
 learning/doing and, not the least one, supporting FOSS, it might be
 useful to start using Apertium and ask Sugar community start contributing
 to Apertium languages data bases. In this regard, a couple of questions:
 
 * It seems that our most need is en-es/es-en translation,
   how Apertium is good for, at least, initial usage for live
   translation?
 
 * Is there any ongoing project to develop a tool to simplify accepting
   [small] contributions from community members? For example, Sugar Labs
   uses Pootle instance [2] to coordinate i18n efforts, which is a web
   service to accept contributions from the community.
 
 [1] http://chat.sugarlabs.org/
 [2] translate.sugarlabs.org

 Thanks to everyone,

 I will try to setup Apertium (and maybe with openmatrex) as en-es/es-en
 translation backend for Sugar Labs IRC channels to start using it on
 regular basis.

You might want to look at http://wiki.apertium.org/wiki/ScaleMT if
you're going to have a lot of users on your server (this is the service
that runs on http://api.apertium.org ). Please post to this list if you
have trouble installing apertium/scaleMT :)

 Also, if I got it right, existing Web application,
 http://wiki.apertium.org/wiki/Tradubi, is not FOSS and doesn't directly
 contribute to Apertium database. Maybe it makes sense to start thinking
 about having a la Pootle for contributing directly to Apertium or so,
 i.e., tools do matter to have sustainable community contribution.
 I've CCed to Sugar Labs i18n coordinator, Chris Leonard, maybe he has some
 ideas.

Well, Tradubi is AGPL (ScaleMT as well), so it's FOSS, but a
controversial license (the gist of it is that changes you make have to
be contributed back _even if you're just running the software as a web
service_). 


-- 
Kevin Brubeck Unhammer




Re: [Apertium-stuff] Dictionaries, coverage and other dull tasks

2011-11-16 Thread Kevin Brubeck Unhammer

. check source text and find it was 'tiempito'
3. edit to … a short while …


Another problem is that transfer becomes more complex, do you insert
'small' before or after other adjectives (adverbs, preadverbs) in a
chunk? You now have to think about this for every possible noun chunking
rule.


Unfortunately, the original sme analyser which we use as a basis in
sme-nob is even more complex, since it is meant to cover pretty much all
productive derivations. It's very good for annotating and disambiguating
a corpus, but without modifications it is too complex for MT.

All productive derivations includes derivations that change the
part-of-speech, and compounds, and derivations of derivations … If you
can have a diminutive of a deverbal noun, you have to think about how to
add 'small' in all rules, including those that originally were meant for
verbs (a transfer rule pattern matching v.* will match
v.derivation.n.*).
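
A toy illustration of why this happens, sketched in Python (this is not how apertium-transfer is implemented, and the tag strings are made up; it only mimics a trailing .* meaning "any further tags"):

```python
def pattern_matches(pattern, tags):
    """Toy tag pattern: 'v.*' means the tag 'v' followed by any tags at all."""
    want = pattern.split(".")
    have = tags.split(".")
    if want[-1] == "*":
        head = want[:-1]
        return have[:len(head)] == head
    return want == have


if __name__ == "__main__":
    # A plain verb and a deverbal noun both start with the verb tag.
    for tags in ("v.pres.p3.sg", "v.derivation.n.sg.nom"):
        print(tags, pattern_matches("v.*", tags), pattern_matches("n.*", tags))
```

The deverbal noun is caught by the verb pattern and missed by the noun pattern, so every verb rule has to be checked for what it does with derived nouns.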

In sme→nob, we restrict the possible derivations in the analyser quite a
lot, and only stick to a small set of single derivations (no derivations
of derivations) for which it is easy to find a way of rewriting that
sounds alright and doesn't induce too much transfer complexity. Even so,
most of the time spent debugging transfer/bidix and the analyser stems
from derivations, and if I spoke Sámi I'm pretty sure my time would have
been better spent on adding words to bidix rather than on trying to
juggle all the possible ways in which derivations interact.

However, derivations can work if 
(1) the derivation is high enough frequency, and
(2) it is possible to deal with it in transfer (and the analyser) in a
simple way, and 
(3) it is possible to make the translation sound good
while preserving the meaning, and
(4) the translator is meant for gisting/assimilation, not
post-editing/dissemination, and 
(5) you have a lot of time on your hands.

Elsewise, I'm not sure it's worth it.


[1] The main thing HFST adds is 'flag diacritics', which basically allow
you to put restrictions on which tags can go together in one
analysis. Thus you could put optional diminutives at the end of
_all_ noun analyses, and if there's a certain noun that can't have a
diminutive, you just put a special 'hidden tag' on that particular
noun in its section, the diminutive line of the noun pardef then
adds another 'hidden' tag that is incompatible with the first one,
and doesn't allow analyses that contain both tags. You could achieve
the same effect in lttoolbox by duplicating all noun pardefs into
with_diminutive- and without_diminutive-versions.
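
To make the 'hidden tag' idea a bit more concrete, here is a small Python sketch of how such flags filter paths. It only models the P (set) and D (disallow) operations, the symbol lists and the DIM/OFF names are invented for illustration, and it is not how HFST itself is implemented:

```python
import re

# Flag diacritics look like @P.FEATURE.VALUE@ or @D.FEATURE.VALUE@.
FLAG = re.compile(r"@([PD])\.([^.@]+)\.([^.@]+)@")


def accept(path):
    """Return the analysis string for a path, or None if the flags clash.

    P sets a feature to a value; D rejects the path if the feature
    currently has that value. Flags never show up in the output.
    """
    state = {}
    visible = []
    for symbol in path:
        m = FLAG.fullmatch(symbol)
        if m is None:
            visible.append(symbol)
            continue
        op, feature, value = m.groups()
        if op == "P":
            state[feature] = value
        elif state.get(feature) == value:  # op == "D"
            return None
    return "".join(visible)


if __name__ == "__main__":
    # A noun marked as not taking diminutives, followed by the diminutive line.
    print(accept(["@P.DIM.OFF@", "hus", "<n>", "@D.DIM.OFF@", "<dim>"]))
    # An unmarked noun takes the same diminutive line without trouble.
    print(accept(["hus", "<n>", "@D.DIM.OFF@", "<dim>"]))
```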


best regards,
Kevin Brubeck Unhammer



Re: [Apertium-stuff] update

2011-11-17 Thread Kevin Brubeck Unhammer

Felipe Sánchez Martínez
fsanc...@dlsi.ua.es writes:

Hi all,

I think the task

find X rules for how to translate words with more than one possible
translation

could be misunderstood, since students could mix up lexical selection
problems with part-of-speech ambiguity problems. I see graduate students
doing so every year.

Agree … perhaps a link to http://wiki.apertium.org/wiki/Ambiguity in the
description would be enough? I'm not sure there's a shorter way of
saying it if you don't already know the concepts.
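
For those who like to see it as data, here is a small Python sketch of the two kinds of ambiguity. The entries are hand-picked toy examples for es→en, not taken from any Apertium dictionary, and the function names are made up:

```python
# Surface form -> possible analyses (ambiguity here is the tagger's job).
TOY_ANALYSES = {
    "vino": ["vino<n><m><sg>", "venir<vblex><ifi><p3><sg>"],
    "estación": ["estación<n><f><sg>"],
}

# Lemma + part of speech -> possible translations (ambiguity here is
# what lexical selection rules are for).
TOY_TRANSLATIONS = {
    "vino<n>": ["wine<n>"],
    "estación<n>": ["station<n>", "season<n>"],
}


def has_pos_ambiguity(surface):
    """More than one analysis for the same surface form."""
    return len(TOY_ANALYSES.get(surface, [])) > 1


def needs_lexical_selection(lemma_pos):
    """More than one translation for the same lemma and part of speech."""
    return len(TOY_TRANSLATIONS.get(lemma_pos, [])) > 1


if __name__ == "__main__":
    print("vino:", has_pos_ambiguity("vino"))
    print("estación<n>:", needs_lexical_selection("estación<n>"))
```

The task is about the second table only: "estación" has one analysis but two translations, while "vino" is something the tagger should already have sorted out.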

El 16/11/11 15:20, Francis Tyers escribió:
El dc 16 de 11 de 2011 a les 14:18 +, en/na Jimmy O'Regan va
escriure:
On 16 Nov 2011 14:09, Francis Tyers fty...@prompsit.com wrote:

El dc 16 de 11 de 2011 a les 14:02 +, en/na Jimmy O'Regan va
escriure:
On 16 Nov 2011 13:35, Francis Tyers fty...@prompsit.com wrote:

Hey all!

I've thrown all the parts together and have a working prototype of the
lexical selection module. A rule compiler, and a processor.

At the moment the rule format is like:

https://apertium.svn.sourceforge.net/svnroot/apertium/branches/apertium-lex-tools/examples/rules.txt

But we have also discussed an XML-based format, which would be like:

https://apertium.svn.sourceforge.net/svnroot/apertium/branches/apertium-lex-tools/examples/rules.xml

I would like to, as my next step, improve the rule compiler (at the
moment there is a lot of string mangling that I think could be improved
on -- e.g. for holding the pattern lengths/ids), and support the XML
format, but in order to do this, I would first like to get comments on
it. Is there anything that you would change? Do you feel comfortable
writing rules in this format?

It might be better to ask next week, when GCI tasks have been sorted
and finalised. Split focus and so on.

What a great idea! We could make some GCI tasks like come up with X
lexical selection rules for a language pair of your choice.

You'll want to rephrase that, significantly. GCI students are casually
browsing a list of titles so you should pick a title that doesn't rely
on a relatively obscure phrase - something that immediately informs
them that they probably already know this.

Yeah, how about: find X rules for how to translate words with more than
one possible translation ?


Re: [Apertium-stuff] Tagger training

2011-12-15 Thread Kevin Brubeck Unhammer

Jimmy O'Regan jore...@gmail.com
writes:

 On 14 December 2011 20:19, Pim Otte otte@gmail.com wrote:
 I'm not sure how i should get the output of the analyser.

 but running the makefile itself results in an empty af-tagger-data/af.dic

 running this line: after creating af.dic.expand gives usage on lt-proc
 usage lt-proc -e -w -a af-nl.automorf.bin  af.dic.expanded


 Well, there's your problem. Usage prints to stderr, hence empty file.

 Any pointers?

 -a is the mode switch, it should be the first option. -w is completely
 superfluous for tagger training, get rid of it. If you want to train a
 tagger that's aware of pin-the-tail-on-the-compound mode, you'll
 probably have to do something extra, because (IIRC) it's only invoked
 when it encounters words that are not in the dictionary, which will
 never be the case on an expansion of the dictionary - so either
 manually add a bunch of compounds, or get rid of that, too.

-e is the compound thing, -w just ensures lemmas don't get surface case
applied (I guess that's pointless too though?)
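
As for the empty file: that is just what happens when only stdout is redirected and the program complains on stderr. A small self-contained Python sketch of the effect, where the child process is a stand-in that merely imitates a tool rejecting its options (it does not call lt-proc):

```python
import subprocess
import sys

# Stand-in child: prints a usage message to stderr, nothing to stdout,
# and exits with a non-zero status.
CHILD = "import sys; sys.stderr.write('usage: tool [options]\\n'); sys.exit(1)"


def run_child():
    """Run the stand-in and return (exit status, stdout text, stderr text)."""
    proc = subprocess.run([sys.executable, "-c", CHILD],
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True)
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    status, out, err = run_child()
    print("exit status:", status)
    print("stdout (what a redirect to a file would capture):", repr(out))
    print("stderr:", repr(err))
```

So an empty output file together with a usage message on the screen points at the option handling, not at the dictionary.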

-- 
Kevin Brubeck Unhammer





Re: [Apertium-stuff] Tagger training

2011-12-15 Thread Kevin Brubeck Unhammer

Francis Tyers fty...@prompsit.com writes:

 El dj 15 de 12 de 2011 a les 10:42 +0100, en/na Kevin Brubeck Unhammer
 va escriure:
 Jimmy O'Regan jore...@gmail.com
 writes:
 
  On 14 December 2011 20:19, Pim Otte otte@gmail.com wrote:
  I'm not sure how i should get the output of the analyser.
 
  but running the makefile itself results in an empty af-tagger-data/af.dic
 
  running this line: after creating af.dic.expand gives usage on lt-proc
  usage lt-proc -e -w -a af-nl.automorf.bin  af.dic.expanded
 
 
  Well, there's your problem. Usage prints to stderr, hence empty file.
 
  Any pointers?
 
  -a is the mode switch, it should be the first option. -w is completely
  superfluous for tagger training, get rid of it. If you want to train a
  tagger that's aware of pin-the-tail-on-the-compound mode, you'll
  probably have to do something extra, because (IIRC) it's only invoked
  when it encounters words that are not in the dictionary, which will
  never be the case on an expansion of the dictionary - so either
  manually add a bunch of compounds, or get rid of that, too.
 
 -e is the compound thing, -w just ensures lemmas don't get surface case
 applied (I guess that's pointless too though?)

 Do you think the error might be because it finds a word which has a
 compound analysis, but that isn't in the dictionary ? 

 Unhammer: Have you ever tried to train the tagger for nn-nb with
 compound mode ? 

never tried tagger training at all …

-KBU



Re: [Apertium-stuff] Tagger training

2011-12-15 Thread Kevin Brubeck Unhammer

Jimmy O'Regan jore...@gmail.com
writes:

 On 15 December 2011 10:13, Francis Tyers fty...@prompsit.com wrote:
 El dj 15 de 12 de 2011 a les 10:42 +0100, en/na Kevin Brubeck Unhammer
 va escriure:
 Jimmy O'Regan jore...@gmail.com
 writes:

  On 14 December 2011 20:19, Pim Otte otte@gmail.com wrote:
  I'm not sure how i should get the output of the analyser.
 
  but running the makefile itself results in an empty af-tagger-data/af.dic
 
  running this line: after creating af.dic.expand gives usage on lt-proc
  usage lt-proc -e -w -a af-nl.automorf.bin  af.dic.expanded
 
 
  Well, there's your problem. Usage prints to stderr, hence empty file.
 
  Any pointers?
 
  -a is the mode switch, it should be the first option. -w is completely
  superfluous for tagger training, get rid of it. If you want to train a
  tagger that's aware of pin-the-tail-on-the-compound mode, you'll
  probably have to do something extra, because (IIRC) it's only invoked
  when it encounters words that are not in the dictionary, which will
  never be the case on an expansion of the dictionary - so either
  manually add a bunch of compounds, or get rid of that, too.

 -e is the compound thing, -w just ensures lemmas don't get surface case
 applied (I guess that's pointless too though?)

 Do you think the error might be because it finds a word which has a
 compound analysis, but that isn't in the dictionary ?

 That shouldn't happen, because the input is the expansion of the
 dictionary. If it is the case, it's most likely that the filtering of
 the expansion is faulty.

 But that's beside the point, the problem is that the options, as
 specified, are triggering the usage information. This could be because
 1) -a needs to be first; or 2) some conflict among -a, -w, -e. If it's
 a conflict between -a and -w and/or -e, then that's a bug in the
 option handling in lt-proc, and someone who cares about -w and -e
 should fix it (i.e., it ain't gonna be me).

 If it's 2), my point is that the bug can be worked around by simply
 omitting -w and -e, because they do nothing -- or omit -a, because
 it's the default mode. Whatever works. I'm sure that -w does nothing
 in this context, but I'm not entirely sure about -e - my recollection
 is that it is engaged if and only if there is no dictionary analysis
 of the word, which, see above, should not happen. I don't know, I've
 never used it.

 Leading from that, if you want to train the tagger to have some
 awareness of these guesses at compounds, then the tagger dictionary
 will need to contain material other than the expansion of the
 dictionary.

 $ echo foo |lt-proc -w -a   en-es.automorf.bin
 ^foo/*foo$

 It's not 1)

 $ echo foo |lt-proc -e -a   en-es.automorf.bin
 lt-proc: process a stream with a letter transducer
 [SNIP]

 It's a conflict between -e and -a.

I think that was because -e can be seen as a replacement for -a (another
main mode, and it doesn't make sense to use it with -b nor -g), so I'd
say it's a not-a-bug.
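
The "empty file" symptom from earlier in this thread (usage goes to stderr, so a stdout redirect captures nothing) can be reproduced with a small sketch. It uses a stand-in command rather than lt-proc itself, so the stand-in and its message are assumptions for illustration only:

```python
import subprocess
import sys

# Stand-in for a tool that rejects its options: it prints a usage message
# to stderr and exits non-zero. (Hypothetical; no real lt-proc is run here.)
FAKE_TOOL = [sys.executable, "-c",
             "import sys; sys.stderr.write('USAGE: tool [options]\\n'); sys.exit(1)"]

def run_and_capture(cmd):
    """Run cmd and return (stdout, stderr, exit status) as text."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True)
    return proc.stdout, proc.stderr, proc.returncode

out, err, status = run_and_capture(FAKE_TOOL)
# 'out' is what a "> af.dic" redirect would have written to the file.
```

Checking the exit status (or stderr) in the makefile rule would surface this kind of failure instead of silently leaving an empty file behind.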


-- 
Kevin Brubeck Unhammer



Re: [Apertium-stuff] debug mode files

2011-12-19 Thread Kevin Brubeck Unhammer

Francis Tyers fty...@prompsit.com writes:

 El dg 18 de 12 de 2011 a les 22:42 -0500, en/na Hector va escriure:
 Hi all,
 here's a hack to help debug the output of mode files in Apertium:
 
 https://gist.github.com/1495251
 
 The trick is to use the tee command. Then sed is used to replace the
 pipes in the mode file with tee /dev/tty. The sample output will
 give you a clear idea of what happens through the pipeline.
 Hope it helps. Best!

 Cool! A similar script, although a bit more involved, is Unhammer's
 'apertium-view.sh': 

 http://wiki.apertium.org/wiki/Apertium-view.sh

… more involved, but not necessarily better … I definitely prefer
Hector's simple solution, never knew you could tee to /dev/tty :-)
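
For reference, the pipe-to-tee rewrite can be sketched in a few lines. Hector's gist does this with sed; the Python below is only an illustration of the same idea, and it naively assumes that no stage of the pipeline contains a literal `|` of its own:

```python
def add_tee(pipeline, sink="/dev/tty"):
    """Insert `tee sink` between every pair of stages in a pipeline string.

    Naive: splits on every '|' character, so quoted pipes would break it.
    """
    stages = [stage.strip() for stage in pipeline.split("|")]
    return (" | tee " + sink + " | ").join(stages)

# Illustrative pipeline; the file names are made up for the example.
debug_line = add_tee("lt-proc xx-yy.automorf.bin | apertium-pretransfer")
```

The last stage is left alone, since its output already reaches the terminal.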


-- 
Kevin Brubeck Unhammer



Re: [Apertium-stuff] Problem with new language HOWTO

2012-01-03 Thread Kevin Brubeck Unhammer

Bartias bart...@o2.pl writes:

 I can include the files, but I'm not sure how - should I post the code 
 somewhere, or send the files to you?

 Anyway, with the help of Unhammer, I did manage to get some progress.

 Currently, the program does produce a correct output for domy --- houses 
 (command: http://codepad.org/52ZA8kvv)

 However, in the opposite direction, the result for houses is #dom (command: 
 http://codepad.org/RrjDTEYH)


 When I drop the last command, that is, when I type in

 echo houses | lt-proc en-pl.automorf.bin | ./gawk | apertium-transfer apertium-pl-en.pl-en.t1x en-pl.t1x.bin en-pl.autobil.bin

 I get ^dom<n><pl>$

 I did type in

 lt-comp rl apertium-pl-en.en.dix en-pl.autogen.bin

 but it did not change anything.

That last command looks like it should be

lt-comp rl apertium-pl-en.pl.dix en-pl.autogen.bin

(assuming this is for compiling the Polish generator, for the en→pl
direction.)

-- 
Kevin Brubeck Unhammer



Re: [Apertium-stuff] regular expressions in lttoolbox: proposal

2012-01-08 Thread Kevin Brubeck Unhammer

Francis Tyers fty...@prompsit.com writes:

Is there anyone else that would be interested in having an alphabet
symbol for expressions in lttoolbox like \w -- e.g. any alphabetic
char ? It would avoid having long (and inadequate) lists such as the
following:

ÀÁÂÄÇÈÉÊËÌÍÎÏÑÒÓÔÖÙÚÛÜàáâäçèéêëìíîïñòóôöùúûüABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

I conceive of it working as follows:

* \w would be compiled into a special symbol.
* in state.cc, the apply() method would be changed to check if the input
is !punctuation and !space, and then follow it (or something)

Something along these lines is preferable to just adding all unicode
character symbols.

Another option would be to make \w basically the same as the alphabet
in the dictionary files. (Although this would mean that I would need to
add an alphabet header to the lexical selection rule format)

Any thoughts / comments ?

I think I prefer the first option (\w means !punct!space, perhaps
non-numeric?). Sounds useful, although aren't those long regexes
normally used for names? Because then it would perhaps make sense to
additionally have a symbol \u or something for only uppercase characters
(although I don't know how good isupper(unicode) is in whatever lib
lttoolbox uses). Imagine <re>\u\w+ \u\w+</re> instead of the mess that
currently is ...
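
A rough sketch of what the two classes could mean, using Unicode character properties. This is only my reading of the proposal (\w as "not punctuation, not space, not numeric"; \u as "uppercase"), written in Python rather than against lttoolbox, so the function names are hypothetical:

```python
import unicodedata

def is_w(ch):
    """Hypothetical \\w: not punctuation, not whitespace, not numeric."""
    category = unicodedata.category(ch)
    return not (ch.isspace() or category.startswith("P") or category.startswith("N"))

def is_u(ch):
    """Hypothetical \\u: an uppercase letter."""
    return ch.isupper()

def looks_like_name(text):
    """Rough equivalent of the pattern \\u\\w+ \\u\\w+ from the mail."""
    parts = text.split(" ")
    if len(parts) != 2:
        return False
    return all(len(p) >= 2 and is_u(p[0]) and all(is_w(c) for c in p[1:])
               for p in parts)
```

Note that under this reading symbols such as "$" still count as \w, since they are neither punctuation nor space in Unicode terms; whether that is wanted would need deciding.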

--
Kevin Brubeck Unhammer


Re: [Apertium-stuff] Problem with HTML deformatter

2012-01-18 Thread Kevin Brubeck Unhammer

Miquel Esplà miqueles...@gmail.com
writes:

 Hi everybody,

 I am performing some experiments with the fr-es pair. I am trying to
 translate small n-grams (with 1<=n<=3). To be sure that they are
 translated independently, I enclose each n-gram in HTML paragraph tags
 (<p></p>). Now, this is my problem: the deformatter adds a dot at the
 end of each n-gram. One of my n-grams in French ends with the word
 "avr" (I know it means nothing, but it is automatically extracted from
 a text), and when the dot is added, it is recognised by lt-proc as an
 abbreviation of "avril". As a consequence, this paragraph and the
 following one are concatenated in the resulting translation.

 This is an example of the output of the deformatter:
 .[][<html><body><p>]- avr.[][<\/p><p>]- axes.[][<\/p><\/body><\/html>]
 and this is what lt-proc outputs:
 ^./.<sent>$[][<html><body><p>]- ^avr./avr.<n><m><sg>$[][<\/p><p>]- ^axes/axe<n><m><pl>/axer<vblex><pri><p2><sg>/axer<vblex><prs><p2><sg>$^./.<sent>$[][<\/p><\/body><\/html>]

 I have been taking a look at the deformatter definition, but I am not
 sure how to solve this. I guess if a space were added before the dot by
 the deformatter, the problem would be solved, but I am not sure where
 to add this feature. Can anybody help me?

cd trunk/apertium
wget http://paste.pocoo.org/raw/536603/ -O nodot.patch
patch -p0 < nodot.patch
make && make install

This will ensure none of the deformatters add any dots that weren't in
the input text. I'm not sure why they do in the first place. Perhaps it
helps tagging some times. I just find it a nuisance, so I keep an
install in another prefix for when I want a deformatter that doesn't
mess up punctuation.


hope this helps,
Kevin Brubeck Unhammer



Re: [Apertium-stuff] Problem with HTML deformatter

2012-01-19 Thread Kevin Brubeck Unhammer

Miquel Esplà miqueles...@gmail.com
writes:

 Hi Kevin and Fran,

 I applied Kevin's patch, but something changes that causes mistakes
 when reformatting. I will try to add some rubbish at the end of my
 segments. Anyway, thank you so much for your help!

Ah, yes the reformatter tries to remove those inserted dots again, but
if none of your dots are inserted by the deformatter, then it'll remove
stuff it shouldn't.

http://paste.pocoo.org/raw/537208/ should make it stop doing that, but I
haven't really tried reformatting much without the dot. 

-Kevin






Re: [Apertium-stuff] REQUEST NICE FOR INSTALL APERTIUM IN VIRTUAL MACHINE

2012-01-25 Thread Kevin Brubeck Unhammer

john felipe urrego mejia
ingenierofelipeurr...@gmail.com writes:

 Hi, can you give me some pointers on what I need in order to set up
 something like http://translator.apertium.eu/ in a local environment
 (192.168.xx.xx)? I have installed VirtualBox and an Ubuntu VM. Please help.

You should be able to install Apertium + Lttoolbox + your language pairs
of choice using this guide:

http://wiki.apertium.org/wiki/Apertium_on_Ubuntu

If you want to set up a web page, I think
https://help.ubuntu.com/community/ApacheMySQLPHP is the official Ubuntu
guide. The source of apertium.org is in
http://apertium.svn.sourceforge.net/viewvc/apertium/trunk/webspace 
I don't know what translator.apertium.eu runs, but if you install
Apache+PHP and put webspace in your /var/www, you should get your
translator at 192.168.something.something. 


-- 
Kevin Brubeck Unhammer



Re: [Apertium-stuff] From the libvoikko list: Developing spellchecker infrastructure from automata

2012-02-01 Thread Kevin Brubeck Unhammer

Trond Trosterud trond.troste...@uit.no writes:

 Quoting a letter from a colleague of mine, Sjur Moshagen
 (sjur.mosha...@uit.no).

 Since many of the new Apertium languages are now built as FSTs with
 lexc / hfst, a setup like the one referred to below would, as a side
 effect, create spellcheckers directly from Apertium-based automata.
 People interested can contact Sjur (address above).

Not only the lexc/hfst-based ones; libvoikko has (experimental) support
for lttoolbox fst's too :) see
http://wiki.apertium.org/wiki/Spell_checking


Regarding the level of experimentalness,
http://sourceforge.net/apps/trac/voikko/wiki/libvoikko/SupportedLanguages
says:
 * BLOCKER: Namespace pollution (generic class names in default namespace) 
 * PROBLEM: No proper method for performing analysis on a string argument. 

The BLOCKER sounds like it requires a lot of sedding in lttoolbox, might
require the same replacements in apertium too?

The PROBLEM I'm pretty sure is solved. The method in
http://wiki.apertium.org/wiki/Lttoolbox_API#Using_as_a_module_from_Python
should work now (at least the known bugs are dealt with).




 ---

 Hello list members,

 The Divvun group at the University of Tromsø is looking for someone that 
 could upgrade the VoikkoSpellService code for payment. The upgrade should 
 contain at least the following features:

 * support for multiple speller languages
 * support for the latest version of libvoikko, with support for zhfst files
 * support for speller files within user home dirs (~/.voikko/...)
 * universal binary (at least 32 and 64 bit, perhaps also PPC)

 I have asked both the main developer of Voikko and the developer of the 
 present VoikkoSpellService code if they would like to do it, but they have 
 kindly asked me to forward the request to the list.

 If anyone is interested, please contact me off-list.

 Regards,
 Sjur Moshagen
 Divvun/UiT
 www.divvun.no

-- 
Kevin Brubeck Unhammer



Re: [Apertium-stuff] Using Lttoolbox from within java

2012-02-08 Thread Kevin Brubeck Unhammer

stevens35 steven...@llnl.gov writes:

 Hello,

 I've been searching around for a good morphological analyzer for a
 while and came across Lttoolbox. The analyzer step does exactly what I
 want for words in a language: it splits the word into its lexical base
 and then adds morphological tags based on how the word was formed. Up
 until now, I've just been using the Porter Stemmer to get the root
 word, but it's always been displeasing because it throws away the rest
 of the surface form.

 However, most of the text processing code I work with is in Java, and
 if possible, I'd like to keep everything within Java. Has anyone had
 any experience linking to Lttoolbox from Java? Or does anyone know of
 any Java versions of Lttoolbox that utilize the existing dictionaries,
 or a similar tool for Java?

lttoolbox-java works fine with all the existing dictionaries, and should
be feature-complete with the C++ version. lttoolbox and lttoolbox-java
are completely independent of each other, so you don't need the C++
version to use the Java version and vice versa, so keeping everything
within Java should work fine.


-- 
Kevin Brubeck Unhammer



Re: [Apertium-stuff] Apertium-af-nl release

2012-02-08 Thread Kevin Brubeck Unhammer

Congrats :)

Pim Otte otte@gmail.com writes:

Hello everyone,

We are proud to present: the release of apertium-af-nl v0.2.0.
With the Afrikaans half from the currently dormant af-en project and
the Dutch half as a product from the past two Google Code-ins.
This means apertium-af-nl is now in the trunk and the release is
available
at https://sourceforge.net/projects/apertium/files/apertium-af-nl/apertium-af-nl-0.2.0.tar.gz/download

Some stats:

Coverage:
Afrikaans:
Number of tokenised words in the corpus: 4691956
Number of known words in the corpus: 3919969
Coverage: 83.5 %
Dutch:
Number of tokenised words in the corpus: 105037639
Number of known words in the corpus: 86269947
Coverage: 82.1 %
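
The coverage figures above follow directly from the two counts (known tokens divided by all tokens); a quick check of the arithmetic:

```python
def coverage(known, total):
    """Percentage of tokens that received an analysis, to one decimal."""
    return round(100.0 * known / total, 1)

afrikaans = coverage(3919969, 4691956)
dutch = coverage(86269947, 105037639)
```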

Number of entries:
Afrikaans:
7459
Bilingual:
6152
Dutch:
7236

It is testvoc clean. More statistics and information on the
construction can be found in Otte, P. and Tyers, F. M. (2011) Rapid
rule-based machine translation between Dutch and Afrikaans.
Proceedings of the 16th Annual Conference of the European Association
of Machine Translation, EAMT11 (
xixona.dlsi.ua.es/~fran/publications/eamt2011a.pdf )
Regards,

Pim


Re: [Apertium-stuff] Lint Checker Ideas for GSOC

2012-03-22 Thread Kevin Brubeck Unhammer

Aaron Rubin aaronjrub...@gmail.com
writes:

 Hi all,

 I've spoken with Francis a few times over e-mail and been on IRC a bit,
 but I don't think I've introduced myself to the whole listhost. I'm a
 third-year student at the University of Chicago, majoring in
 linguistics with a minor in Comp Sci. Most of my programming experience
 is doing various analyses of text files in C, so it seemed that of all
 the project ideas, the lint tester for suspicious constructs in .dix
 files would be the best for me (I thought about proposing a
 Japanese-English language pair, but Google does a fairly OK job with
 Japanese as it is, and there's no way I could surpass that in three
 months). I've already written a duplicate tag checker in C and sent it
 out to the listhost earlier today, and I've been thinking about how I'd
 implement some of the other suggestions on the lint tester ideas page,
 as well as a few ideas of my own. The problem, though, is that I'm not
 sure how I'd be able to fill up the whole summer doing it! This is my
 tentative schedule:

 Week 1: Redundant Entry Finder
 Week 2: Testing Full Entries in Lemmas where Part of the Lemma is
 Specified by the Pardef
 Week 3: Testing Misspelled Tags and Pardefs
 Week 4: Testing Incompatible Tags (multiple gender tags instead of
 combined tags for nouns of ambiguous gender, multiple number tags, a
 noun and adj tag on the same entry)
 Week 5: Testing Tag Missing on One Side of Translation Equivalents (a
 noun tag on the English side, but not on the Spanish side)
 Week 6: Testing Missing Gender on Gendered Languages (this would be an
 intricate one... I'd have to investigate which of the languages in the
 language pairs have gender or noun class systems and have the program
 take that into account)

 But not all of those would necessarily take up a week, and there's no
 way that all of this will take 12 weeks! So I've been thinking about
 common errors that might show up in transfer rules files, but
 nothing's really come to mind. Has anyone else noticed common mistakes
 in .dix or transfer rules files that would be suitable for this kind
 of program to look for?

Say you're editing a transfer file that has 

<def-attr n="a_det">
  <attr-item tags="det"/>
  <attr-item tags="det.emph"/>
  <attr-item tags="det.dem"/>
  <attr-item tags="det.itg"/>
  <attr-item tags="det.qnt"/>
  <attr-item tags="det.pos"/>
</def-attr>
…
<not>
 <equal>
  <clip pos="1" side="tl" part="a_det"/>   <lit v=""/>
 </equal>
</not>

(i.e. it's not a determiner at all) and you want to make it a more
specific requirement, like it has to be the tag sequence <det><pos>.
It's easy to leave out the -tag and write

<not>
 <equal>
  <clip pos="1" side="tl" part="a_det"/>   <lit v="det.pos"/>
 </equal>
</not>

where the correct version would be

<not>
 <equal>
  <clip pos="1" side="tl" part="a_det"/>   <lit-tag v="det.pos"/>
 </equal>
</not>

or to write det.poss or something, which would never match since it's
not defined in a_det. Here you could give a warning if the user tests
for a def-attr-defined clip being anything other than 1) empty, 2) a
tag/tag sequence from the def-attr, or 3) a variable.

There are also default clips not defined in def-attr, like lemh,
lemq, lem, that can contain empty or non-empty <lit>'s, but never
tags.

I guess you could also do the same for <begins-with> instead of <equal>.
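
The warning described above could be sketched roughly like this. It is a minimal illustration, not a proposal for the real tool: it only looks at <equal> tests with the clip as the first child, the sample is a made-up fragment rather than a valid transfer file, and the function name is hypothetical. A <begins-with> version would need a prefix check instead of set membership.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only, not a complete or valid transfer file.
SAMPLE = """<transfer>
  <def-attr n="a_det">
    <attr-item tags="det"/>
    <attr-item tags="det.pos"/>
  </def-attr>
  <equal><clip pos="1" side="tl" part="a_det"/><lit v=""/></equal>
  <equal><clip pos="1" side="tl" part="a_det"/><lit v="det.pos"/></equal>
  <equal><clip pos="1" side="tl" part="a_det"/><lit-tag v="det.poss"/></equal>
  <equal><clip pos="1" side="tl" part="a_det"/><lit-tag v="det.pos"/></equal>
  <equal><clip pos="1" side="tl" part="lem"/><lit v="house"/></equal>
</transfer>"""

def lint_equal_tests(root):
    """Warn about <equal> tests on def-attr clips that can never match."""
    attrs = {}
    for definition in root.iter("def-attr"):
        attrs[definition.get("n")] = set(
            item.get("tags") for item in definition.iter("attr-item"))
    warnings = []
    for test in root.iter("equal"):
        children = list(test)
        if len(children) != 2 or children[0].tag != "clip":
            continue  # only the simple clip-first shape is handled here
        part = children[0].get("part")
        if part not in attrs:
            continue  # lem, lemh, lemq: free text, so not checked
        other = children[1]
        value = other.get("v", "")
        if other.tag == "lit" and value:
            warnings.append('%s: non-empty <lit v="%s"/>, did you mean <lit-tag>?'
                            % (part, value))
        elif other.tag == "lit-tag" and value not in attrs[part]:
            warnings.append('%s: tag "%s" is not listed in the def-attr'
                            % (part, value))
    return warnings

found = lint_equal_tests(ET.fromstring(SAMPLE))
```

On the sample, the empty <lit>, the correct <lit-tag> and the lem test pass silently; the other two tests are reported.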


You could probably also warn about

<in>
 <clip part="a_det"/>
 <list n="some-list-that-is-disjoint-from-a_det"/>
</in>


And then there's calling a macro with the wrong number of arguments; the
various VM-for-transfer compilers perform this check, but the standard one
does not, so it wouldn't hurt to put it in.
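
The macro check is simple to sketch, assuming (as far as I remember the transfer format) that <def-macro> declares its arity in the npar attribute and that each argument of a <call-macro> is a <with-param> child. The sample fragment and the function name are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only, not a complete or valid transfer file.
SAMPLE = """<transfer>
  <def-macro n="f_concord" npar="2"/>
  <call-macro n="f_concord"><with-param pos="1"/></call-macro>
  <call-macro n="f_concord"><with-param pos="1"/><with-param pos="2"/></call-macro>
</transfer>"""

def check_macro_calls(root):
    """Return (macro name, problem) pairs for calls with the wrong arity."""
    npar = {}
    for macro in root.iter("def-macro"):
        npar[macro.get("n")] = int(macro.get("npar"))
    problems = []
    for call in root.iter("call-macro"):
        name = call.get("n")
        given = len(call.findall("with-param"))
        if name not in npar:
            problems.append((name, "undefined macro"))
        elif given != npar[name]:
            problems.append((name, "expects %d, got %d" % (npar[name], given)))
    return problems

problems = check_macro_calls(ET.fromstring(SAMPLE))
```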


-Kevin



Re: [Apertium-stuff] Lint Checker Ideas for GSOC

2012-03-22 Thread Kevin Brubeck Unhammer

Jacob Nordfalk jacob.nordf...@gmail.com
writes:

 2012/3/21 Aaron Rubin aaronjrub...@gmail.com

 But not all of those would necessarily take up a week, and there's no
 way that all of this will take 12 weeks! So I've been thinking about
 common errors that might show up in transfer rules files, but
 nothing's really come to mind. Has anyone else noticed common mistakes
 in .dix or transfer rules files that would be suitable for this kind
 of program to look for?

 This might not strictly be a lint checker's job, but have a look at
 'beginner errors' like breaking the XML or not following the XML
 Schema:
  - forgetting an end tag (like writing <s n="adj"> instead of <s n="adj"/>)
  - messing up the "'s, like writing <s n=adj>
  - mis-naming an attribute
  - forgetting attributes

 etc.

 I am not sure these kinds of errors are always reported to the user in
 a meaningful way by the compiler.

 However, I am sure that there are some users who spend a lot of time
 struggling with such errors.

 So I'd suggest leaving a week for seeing if there is something you
 could do to help out dix editor novices.

Couldn't the lint just run apertium-validate-dictionary first?

Although, one issue is what to do about those who use xslt transformed
dictionaries (e.g. using the alt attribute). I'm guessing it would be
easier to run lint on the transformed dictionary, but not as helpful
since line number would have changed. On the other hand, if you run lint
on the source dictionary, it won't validate (though you can still check
that it's well-formed XML).
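
The well-formedness half is cheap to do on the source dictionary, whatever happens with validation. A minimal sketch in Python (the function name is hypothetical; real validation against the schema would still be left to apertium-validate-dictionary):

```python
import xml.etree.ElementTree as ET

def well_formed(xml_text):
    """Return None if xml_text parses, else (line, column) of the first error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return err.position
    return None

good = well_formed('<e><p><l>dom</l><r>dom<s n="n"/></r></p></e>')
bad = well_formed('<e><p><l>dom</l><r>dom<s n="n"></r></p></e>')
```

Since the position refers to the file that was actually parsed, running this on the pre-XSLT source keeps the line numbers meaningful to the person editing it.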


-Kevin



Re: [Apertium-stuff] Lint Checker Ideas for GSOC

2012-03-22 Thread Kevin Brubeck Unhammer

Jimmy O'Regan jore...@gmail.com
writes:

 On 22 March 2012 08:29, Francis Tyers fty...@prompsit.com wrote:
 isn't there. (b) Or using <lit> when you mean <lit-tag> because what
 you're checking against can only be a tag.


 Counter example:
 <let>
   <clip pos="1" side="tl" part="some_part"/>
   <lit v=""/>
 </let>

http://permalink.gmane.org/gmane.comp.nlp.apertium/1676 so def-attrs can
be empty <lit>'s, <lit-tag>s, or variables (but not non-empty <lit>'s), and
this goes for at least <equal>, <begins-with> (<ends-with>? can't
remember if we have that), and <let>.

Another exception is that you can do stuff like

   <concat><lit v="&lt;"/><lit v="tag"/><lit v="&gt;"/></concat>

on the right hand side. I'm not sure why you'd want to though (unless
you were using a variable, in which case we're out of lint's league
anyway), and in the above example I would want a warning.


-Kevin



Re: [Apertium-stuff] GSoC: Adopting a language pair: Tur-Tat / Kaz-Tat

2012-03-26 Thread Kevin Brubeck Unhammer

Francis Tyers fty...@prompsit.com writes:

 El dl 26 de 03 de 2012 a les 07:08 +0400, en/na Ilnar Salimzyan va
 escriure:
 I am sorry for this email to be so long. Honestly, I shortened it
 several times. Consider it to be the first draft of the proposal.

 No problem :)

 Dear Apertium mentors,
 
 my name is Ilnar Salimzyanov (‘selimcan’ on Sourceforge and IRC,
 ‘Ilnar.salimzyan’ on Apertium’s wiki, ‘Ilnar Salimzyan’ on many other
 places).

 Hi Ilnar! :D

 
 My native language is Tatar, I also speak Russian on native level.
 
 = Reason I am writing =
 
 I would like to apply for Google Summer of Code and work on adopting
 Turkish-Tatar / Kazakh-Tatar language pair. I am writing here to
 discuss my plans, to get some feedback, which would facilitate writing
 my proposal.
 
 = Who I am / Some background information =
 I am a first-year master’s student at the Kazan Federal University,
 studying Applied Linguistics [1].
 
 I first got to know about Apertium in 2009, while writing a small
 paper at the university on comparison of available machine translation
 systems. Apertium fascinated me then, being open source, showing rapid
 growth and being a good potential starting point for Tatar and other
 Turkic languages (yes, I have thought about them too). I played around
 with an lttoolbox dictionary for Tatar (bad idea, I know, but I didn’t
 know about FSTs then and there weren’t any other Turkic languages
 involved). I even managed to model noun morphotactics using it! :)

 Well, lttoolbox also produces FSTs; the difference is that in HFST you
 have separate morphotactics and morphophonology transducers, which are
 then composed to form the final transducer, whereas in lttoolbox there
 is only a single transducer.

 Back in 2009 I translated part of the Official Documentation into
 Russian [2] (till chapter 3.2.3; besides someone willing to finish it
 the translation needs a good editor). Also in 2009 I translated
 Apertium New language pair Howto into Russian.
 
 I was one of the participants of the Šupaškar Apertium Workshop, held
 in January this year, where Francis Tyers, Hector Alos-i-Font,
 Jonathan Washington and Trond Trosterud were instructors.

Cool =D

 I was very fortunate to see Jonathan and Francis work on Tatar-Bashkir
 pair as an example pair for the Šupaškar Workshop and move it to
 nursery. It is very useful to have a transducer for my native language
 (and a language closest to it) to learn the semantics and structure of
 lexc and twol files (which I wasn’t really familiar with, since using
 HFST with Apertium is relatively new thing and it is not mentioned in
 the Official Documentation), along with the reading of the famous
 FSMBook.

 :)

 I have been involved in work on the Tatar-Bashkir pair as, let’s say,
 “language-consultant” and “tester”. With another fellow from Ufa we
 have been translating the top-5000 wordlist of the Russian National
 Corpus into Tatar and Bashkir. These translations were then added to
 the translator files. Also, I have been analyzing some errors in the
 translations, finding out where Apertium-tt-ba performed not so well,
 describing it on the wiki [3,4] and committing from time to time to svn.
 
 = Resources =
 // I will list all relevant resources on the wiki before submitting
 the proposal//
 
 For both language pairs I will not have to start from absolute
 scratch. Transducers for all three languages —  Turkish, Kazakh and
 Tatar —  perform quite well, having 87%, 76% and 56% coverage each
 [5]. Having that, I thought that the crucial thing to benefit from
 these separate  transducers most with less work is to write bidix
 files, translating words from each lexc file into Tatar.
 
 == Bilingual dictionaries ==
 ===Kazakh===
 All words in kazakh.lexc [6] were commented with English glosses
 (thanks to whoever did this!). Using a simple sed one-liner, I prepared
 bidix entries with Kazakh words as the left side, putting the English
 glosses again into comments. In a few hours' work, I translated ~500
 nouns (not proper nouns) and most of the adjectives into Tatar [7].
 For Kazakh words which look very similar to Tatar ones and have the
 same meaning as these Tatar equivalents, this can be done very
 quickly. For others I consulted Kazakh-Russian dictionaries too, but
 again, translating all remaining words from kazakh.lexc will take no
 more than a few days of focused work.
 ===Turkish===
 Unfortunately very few words in Turkish.lexc have English glosses. But
 there is a Tatar-Turkish dictionary, which was released under GPL [8],
 and another Tatar-Turkish online dictionary [9], also under GPL. The
 process will be similar for Turkish too —  take stems from
 turkish.lexc, put them automatically to bidix and translate them into
 Tatar, consulting where necessary dictionaries mentioned above or
 Turkish-Tatar dictionary in print.
 
 ==Parallel corpora==
 Some sentences for Turkish-Tatar are available at the Tatoeba project. As
 a source for parallel corpora, Bible or Quran translations can

Re: [Apertium-stuff] Proposal: don't prefix paths in apertium-gen-modes (but prefix dirname $0 to PATH in apertium)

2012-03-26 Thread Kevin Brubeck Unhammer

Stephen Tigner stephen.tig...@gmail.com
writes:

On Sun, Mar 25, 2012 at 10:49 AM, Jacob Nordfalk
jacob.nordf...@gmail.com wrote:
+1 for the proposal.

2012/3/24 Stephen Tigner stephen.tig...@gmail.com

I think I'm gonna need to read that again a few times to see if that'd
affect the Java runtime at all,

Unhammer is actually writing this as a result of a discussion I started on
IRC, on the occasion that I didn't like lttoolbox-java having to cut away
these paths.

(Unhammer, I want you as my ghost writer :-)
Ah, I see. makes more sense now, thanks.

but I thought I'd at least pitch in
with an explanation of how the Java runtime currently handles .mode
files.

Thanks for a great explanation.
Anyone who wants to browse the code he explains can look
at http://apertium.svn.sourceforge.net/viewvc/apertium/trunk/lttoolbox-java/src/org/apertium/pipeline/

A quick fix that uses path could be to check for existence of the
program at the specified path, and if not, try running it w/ just the
command name w/o a full path.

It's not that clear from the code diff, but the idea is first to look for the
commands in the installation dir, then on the general PATH:

PATH=${APERTIUM_PATH}:${PATH}

I think the java port should do the same, but first check if the task can be
done by lttoolbox-java itself internally. So, like

PATH=<can we do it without invoking external stuff?>:${APERTIUM_PATH}:${PATH}

:-)

Ah, okay, so I'm assuming APERTIUM_PATH is an environment variable? If
so, that should be fairly easy to implement. Just need to tweak a bit
how the UNKNOWN programs are called. I'll try and take a look at it
tonight if I have time. n.n

If you install apertium to /usr/local/bin/, the shell script
/usr/local/bin/apertium will have
APERTIUM_PATH=/usr/local/bin
as the second line.
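
The lookup order discussed in this thread (installation directory first, then the general PATH) can be tried out without touching a real installation. This is only a sketch: `demo_dir` and the fake `lt-proc` script in it are stand-ins made up for the demonstration.

```shell
# Sketch: commands are looked up in the install dir before the general PATH.
# demo_dir and its fake lt-proc are stand-ins, not real Apertium files.
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho "lt-proc from install dir"\n' > "$demo_dir/lt-proc"
chmod +x "$demo_dir/lt-proc"

APERTIUM_PATH=$demo_dir
# The subshell keeps the modified PATH from leaking into the caller.
resolved=$(PATH="${APERTIUM_PATH}:${PATH}"; command -v lt-proc)
echo "$resolved"

rm -rf "$demo_dir"
```

Because the install dir comes first, a mode file that only says `lt-proc` picks up the copy next to the `apertium` script even when another copy is installed elsewhere on the PATH.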

--
Kevin Brubeck Unhammer


Re: [Apertium-stuff] About de-duplicating of dictionaries

2012-03-28 Thread Kevin Brubeck Unhammer

Ilnar Salimzyan
ilnar.salimz...@gmail.com writes:

 This thread grew out of the discussion of my proposal draft [see
 "GSoC: Adopting a language pair: Tur-Tat / Kaz-Tat" from March 26].

 Having discussed the problem of monodixes/lexc files copied in many
 pairs (and in more and more pairs) with Jonathan, and seeing that
 people on IRC come to this question quite often (like "What lexc of
 Tatar should I choose for my new Tatar-X translator?"), I decided to
 start a new discussion here :)

 On Mon, Mar 26, 2012 at 2:37 PM, Kevin Brubeck Unhammer
 unham...@fsfe.org wrote:

 It'd be nice to have some general method for deduplicating
 dictionaries

 I think we all share the same view.

 Obviously, having single transducers for many related languages that are
 compatible with each other is great. It would facilitate the creation of
 new translators.
 And I think that keeping them compatible on the tags/morphotactics
 level can and should be done.

… We use a trimming script in apertium-sme-nob; with this
 method, you would have apertium-kaz and apertium-tat as just
 development dependencies. So you'd add stuff to apertium-kaz/kaz.lexc
 and to your bidix, and then run a script from apertium-kaz-tat with the
 path to apertium-kaz and it creates a file apertium-kaz-tat/kaz.lexc
 (and you never change this file, although it's in SVN). Similarly for
 tat.lexc.

 This works, as long as the trimming script is well configured, but
 perhaps it'd be 'cleaner' to have apertium-kaz/apertium-tat as make
 dependencies and do the trimming each time you type make (no need for
 apertium-kaz-tat to have generated kaz.lexc/tat.lexc files in SVN).

 (The weak point in the chain is the trimming script though, which
 expects the lexc files to be fairly easily parsable (they're not,
 really). Ideally we would have ways of trimming both HFST and lttoolbox
 dictionaries so that we never had to copy-paste anything between pairs,
 but language pairs tend to have stuff in them that's rather specific to
 that pair, not sure how that is best dealt with.)
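
To make the idea of trimming concrete, here is a minimal sketch. It is not the apertium-sme-nob script. It assumes one entry per line in the form `lemma:stem CONT ;`, and a separate file with one lemma per line (for instance extracted from the bidix). The function name `trim_lexc`, the file names and the sample entries are all hypothetical.

```shell
# Sketch only, not the apertium-sme-nob script.
# trim_lexc LEMMAS LEXC: print LEXICON headers, comments and blank lines
# unchanged, plus those entries whose lemma (the text before the first
# colon) occurs in LEMMAS (one lemma per line).
trim_lexc() {
    awk -F: '
        NR == FNR { keep[$1] = 1; next }            # first file: lemma list
        /^LEXICON/ || /^!/ || /^$/ { print; next }  # structure and comments
        ($1 in keep) { print }                      # entry with a known lemma
    ' "$1" "$2"
}

# Hypothetical sample data in a throwaway directory.
tmp=$(mktemp -d)
printf '%s\n' 'alma' 'kitap' > "$tmp/lemmas.txt"
printf '%s\n' 'LEXICON Nouns' 'alma:alma N1 ; ! "apple"' \
    'at:at N1 ; ! "horse"' 'kitap:kitap N1 ; ! "book"' > "$tmp/kaz.lexc"
trimmed=$(trim_lexc "$tmp/lemmas.txt" "$tmp/kaz.lexc")
printf '%s\n' "$trimmed"
rm -rf "$tmp"
```

Here the `at` entry is dropped because it is not in the lemma list. Real lexc files are much harder to parse than this (multi-line entries, escapes, multichar symbols), which is exactly the weak point mentioned above.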

 = Reasons why we have monodixes copied =
 1. Historical (there weren't many pairs having common part initially,
 but Apertium keeps growing);
 2. Because of the stuff specific to a given pair.

 = Some imaginable solutions =
 Just to sum up:
 1. Transducers for language A and Language B as make-dependencies;
 2. Mono-dictionaries in apertium-langA and apertium-langB as
 development-dependencies + some trimming / duplicating /
 keeping-up-to-date scripts.

 = Strengths and weaknesses of each solution =
 Strengths and weaknesses become clear when we 'do' need to add
 language-pair-specific stuff to mono-dictionaries.

 All examples that come to mind are for Russian-Tatar (= unrelated
 languages), so for related languages this might not be relevant. Maybe
 they won't need any pair-specific stuff in their mono-dictionaries at
 all, but this sounds too good to be true :)

 Consider the Russian word заговорить ("start to talk"). It is translated
 into Tatar with two words, just like into English. And in a Russian-Tatar
 / Russian-English pair we will need to add "start to talk" as a
 multiword.

 I am sure that similar cases, when a word of languageA is translated
 to languageB with a multiword, can be found for related languages too.

 == 1. Make-dependencies ==
 We can add such words to the monodictionaries in apertium-langA,
 separating them into sublexicons or commenting them like "this stuff
 is needed for the langA-langB pair".
 But this way the transducer will become noisier and noisier.

 == 2. Mono-dictionaries in apertium-langA and apertium-langB as
 development-dependencies + some trimming / duplicating /
 keeping-up-to-date scripts ==
 In this case the monodictionaries in apertium-langX are considered to be
 something like vanilla software. They are kept close to the linguistic
 traditions of POS-tagging etc., and they serve as a base for building new
 pairs involving these languages.

 Modifying them for a given pair is like patching the vanilla software.
 A script could keep these modified versions in apertium-langX-langY
 up-to-date with the mono-dictionaries in apertium-langX and
 apertium-langY.

 A challenge here is not to overwrite modifications while updating.
 Although the script used in sme-nob solves the problem of updating, as I
 understand it, it will overwrite any modifications made in
 apertium-sme-nob. And I am not sure if this can be done at all
 technically.

We never modify the trimmed dictionary, we consider it a generated file.
All modifications go to the dictionary it was trimmed from.

Although we don't, we _could_ actually have sme-nob-specific additions
to the sme dictionary. It shouldn't be much worse than concatenating
another .lexc file onto the trimmed sme.lexc. Note that this would only
be lexicon additions (like "start to talk", good example), not changes
to tagging etc.

On the other hand, if you're already trimming, it shouldn't hurt to put
"start to talk" into the monolingual module (apertium-eng or whatever

Re: [Apertium-stuff] apertium tagger usage

2012-03-28 Thread Kevin Brubeck Unhammer

Orosz György oros...@itk.ppke.hu writes:

 Dear All,

 I am asking for your help; I hope someone can clarify these things. I am
 wondering if it is possible to use the apertium tagger as a standalone
 application, without creating all the resources used by the MT system.

It's possible to use it by itself, like 

echo '^foo/foo<n><sg>/foo<ij>$ ^bar/bar<n><sg>/bar<vblex><inf>$' |
apertium-tagger en.prob

I don't think you can compile apertium-tagger without compiling other
core apertium functions, if that's what you mean. But the only other
build dependency you have to compile is lttoolbox, and when you've got
them installed, apertium-tagger is perfectly usable by itself (without
using the rest of the scripts).

 We have a morphologically disambiguated training corpus, and a
 morphological analyzer. Is it possible to train the tagger in a
 supervised mode using only these resources? (Of course we can convert
 the output of the MA to the format that is used by Apertium.)

 If not, could anyone explain how to use the tool? 
 The manual states the following parameters are needed:
 apertium-tagger[-d] -s=n DIC CRP TSX TAGGER_DATA HTAG UNTAG

http://wiki.apertium.org/wiki/Tagger_training should be a good starting
point (it seems no one has written the section on Supervised training,
but see Question 2 under
http://wiki.apertium.org/wiki/Unsupervised_tagger_training#Improving_the_tagger_performance
).
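
For the conversion step mentioned in the question, the tagger input is a sequence of lexical units of the shape `^surface/analysis1/analysis2$`. A minimal sketch of such a conversion follows, assuming the analyser can write one token per line as `surface<TAB>analysis<TAB>analysis...` with the analyses already in Apertium tag notation. That input layout and the name `to_stream` are assumptions, not something the tools prescribe.

```shell
# Sketch: tab-separated analyser output -> Apertium-style lexical units.
# Assumed (hypothetical) input: surface<TAB>analysis1<TAB>analysis2 ...
# Reserved stream characters inside tokens are NOT escaped here.
to_stream() {
    awk -F'\t' '{
        unit = "^" $1
        for (i = 2; i <= NF; i++) unit = unit "/" $i
        printf "%s%s", (NR > 1 ? " " : ""), unit "$"
    }
    END { if (NR > 0) print "" }'
}

printf 'foo\tfoo<n><sg>\tfoo<ij>\nbar\tbar<n><sg>\tbar<vblex><inf>\n' | to_stream
```

A real converter would also need to escape characters that are special in the stream format and to map the analyser's own tagset onto the tags the tagger definition file expects.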


hope this helps,
Kevin



[Apertium-stuff] soft hyphens and tokenisation

2012-04-17 Thread Kevin Brubeck Unhammer

Hi,

I notice that soft/hidden hyphens (&#173;) can split words, e.g. in

Jespersen

there's a soft hyphen between n and t, but it should be analysed as one
word. I've noticed this a lot in web pages, I guess a lot of news sites
and such use programs that hyphenate using that character.

The problem is, if we don't have the soft hyphen in <alphabet>, we get
two lexical units; if we have it there, we get one unknown word, even if
Jespersen is in the dix.

Is it possible to use ACX files[1] or something to say that any soft hyphen
can be skipped? It seems sort of similar to what ACX does at least …


[1]  http://wiki.apertium.org/wiki/Acx
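
Independently of whether ACX can express this, one possible workaround is to strip soft hyphens from the text before it reaches the analyser. A minimal sketch, assuming UTF-8 input, where U+00AD is the byte pair C2 AD (the function name `strip_soft_hyphens` is made up here):

```shell
# Sketch: remove every soft hyphen (U+00AD, UTF-8 bytes C2 AD) from stdin.
strip_soft_hyphens() {
    shy=$(printf '\302\255')   # the two bytes of U+00AD, written in octal
    sed "s/${shy}//g"
}

# The sample word carries a soft hyphen between "r" and "s".
printf 'Jesper\302\255sen\n' | strip_soft_hyphens
```

The cost of this approach is that the hyphenation hints are gone from the translated output, so it only suits cases where they do not need to be preserved.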


-Kevin






Re: [Apertium-stuff] soft hyphens and tokenisation

2012-04-17 Thread Kevin Brubeck Unhammer

Kevin Brubeck Unhammer unham...@fsfe.org writes:

 Hi,

 I notice that soft/hidden hyphens (&#173;) can split words, e.g. in

 Jespersen

 there's a soft hyphen between n and t, but it should be analysed as one

Wops, between r and s!

 word. I've noticed this a lot in web pages, I guess a lot of news sites
 and such use programs that hyphenate using that character.

 The problem is, if we don't have the soft hyphen in <alphabet>, we get
 two lexical units; if we have it there, we get one unknown word, even if
 Jespersen is in the dix.

 Is it possible to use ACX files[1] or something to say that any soft hyphen
 can be skipped? It seems sort of similar to what ACX does at least …


 [1]  http://wiki.apertium.org/wiki/Acx


 -Kevin





Re: [Apertium-stuff] suggestion for lt-proc generation

2012-04-30 Thread Kevin Brubeck Unhammer

Francis Tyers fty...@prompsit.com writes:

 Hello all,

 What do you think about having a new mode (yes, *groan*,  a new mode)
 for lt-proc where we can generate keeping the lexical form, e.g.

 Input:

   ^cantar<vblex><pres><p1><sg>$
   ^de<pr>$

 Normal output '-g':

   canto
   ~de 

 Output with '-k -g' mode:

   ^canto/cantar<vblex><pres><p1><sg>$
   ^~de/de<pr>$

 This would be teamed with a '-k -p' mode for postgeneration which would
 strip the analysis:

   canto
   ~de 

 Why would this be useful? Well, I could imagine you could do stuff with
 the analysis. Like with a language model or something. Hmm, if, e.g. you
 wanted to have a module which posteditted prepositions or something and
 you wanted to train it on lexical forms as well as surface forms.

 Fran


+1. Could be useful in testvoc as well. Also, lt-proc -a already goes
from <l> to both <l> and <r>; going from <r> to both <l> and <r> would
make -g more symmetric :)
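
To illustrate the shape of the data only (this is not how lt-proc does or would implement it), the stripping step of the proposed '-k -p' mode amounts to keeping the surface form of each `^surface/lexical-form$` unit. A small sketch with sed, where the function name `strip_analysis` is made up and escaped `/` or `$` inside a unit are not handled:

```shell
# Sketch: reduce each ^surface/lexical-form$ unit to its surface form.
# Escaped "/" or "$" characters inside a unit are not handled.
strip_analysis() {
    sed 's|\^\([^/$]*\)/[^$]*\$|\1|g'
}

printf '%s\n' '^canto/cantar<vblex><pres><p1><sg>$ ^~de/de<pr>$' | strip_analysis
```

Anything between the units (blanks, punctuation, format markup) passes through untouched, which is what a downstream module working on both forms would rely on.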


-Kevin


--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff

Re: [Apertium-stuff] Install on Debian server

2012-05-10 Thread Kevin Brubeck Unhammer

Per Tunedal per.tune...@operamail.com
writes:

 Hi,
 my list of locales:

 C
 POSIX
 sv_SE.UTF-8

 Do I need something for the other languages like Spanish and French?
 How do I create that?

Any UTF-8 locale should do. 

Does

$ echo "J'ai deux frères" | LANG=sv_SE.UTF-8 apertium fr-es

not work either?
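
A quick way to check the "any UTF-8 locale" condition in a script is to scan the locale list for a UTF-8 entry. A minimal sketch; the helper name `has_utf8_locale` is made up, and the pattern accepts both the `UTF-8` and `utf8` spellings that `locale -a` may print:

```shell
# Sketch: succeed if the locale list on stdin names a UTF-8 locale.
has_utf8_locale() {
    grep -Eiq 'utf-?8'
}

if locale -a 2>/dev/null | has_utf8_locale; then
    echo "UTF-8 locale available"
else
    echo "no UTF-8 locale found"
fi
```

This only tells you that such a locale is installed; the shell that runs apertium still has to select it, e.g. through LANG.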

-Kevin

 On Thu, May 10, 2012, at 09:30, Kevin Brubeck Unhammer wrote:
 Per Tunedal per.tune...@operamail.com
 writes:
 --snip--
 
  1. special characters aren't recognised
 
  eg. the example echo "J'ai deux frères" | apertium fr-es gives an error
  on frères.
 
 I think you just need to set a UTF-8 locale, put e.g. 
 
 export LANG=sv_SE.UTF-8
 
 in your ~/.bashrc or any scripts that run apertium.
 
 The command
 
 $ locale -a
 
 should give you a list of locales, if you don't have any UTF-8 ones,
 you can do e.g. 
 
 $ echo sv_SE.UTF-8 UTF-8 | sudo tee -a /var/lib/locales/supported.d/local
 $ sudo dpkg-reconfigure locales
 
 --snip--
 
 
 -Kevin



Re: [Apertium-stuff] Install on Debian server

2012-05-10 Thread Kevin Brubeck Unhammer

Per Tunedal per.tune...@operamail.com
writes:

 No,
 that doesn't work either. I get:

 Tengo dos *fruser@computer:~$

:( 

Then I'm at my wits' end. Anyone?

 I have a desktop installation of Debian where I have installed Apertium
 with Synaptic. It works as expected,
 with one exception:

 If I put an exclamation mark after a sentence I get:

 !: Event not found

 (The same happens in my server installation.)

Yeah, that's not Apertium-related. Bash interprets ! as a special symbol
even within "-quotes, so you'll get that even with just

$ echo "J'ai deux frères!"

You can use

$ echo "J'ai deux frères"'!'

instead (ie. you can follow "-quotes by '-quotes, and so on), but the
safest general method is to put the text into a file.
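
Both suggestions can be written down as a small script. This is only a sketch: `input_file` is a throwaway temporary file, and `cat` stands in for whatever command would read the text.

```shell
# Sketch: two ways to get a literal exclamation mark through the shell.

# 1) Close the double quotes and put the exclamation mark in single quotes.
text="J'ai deux frères"'!'
printf '%s\n' "$text"

# 2) Put the text into a file and let the next command read that file,
#    so the text never appears on an interactive command line at all.
input_file=$(mktemp)
printf '%s\n' "$text" > "$input_file"
from_file=$(cat "$input_file")
rm -f "$input_file"
printf '%s\n' "$from_file"
```

History expansion only happens in interactive shells by default, so the problem shows up when typing at a prompt rather than when running a script like this one.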

-Kevin





Re: [Apertium-stuff] suggestion for lt-proc generation

2012-05-16 Thread Kevin Brubeck Unhammer

Jimmy O'Regan jore...@gmail.com
writes:

 On 30 April 2012 10:39, Jimmy O'Regan jore...@gmail.com wrote:
 On 30 April 2012 10:21, Francis Tyers fty...@prompsit.com wrote:
 Hello all,

 What do you think about having a new mode (yes, *groan*,  a new mode)
 for lt-proc where we can generate keeping the lexical form, e.g.

 Input:

  ^cantar<vblex><pres><p1><sg>$
  ^de<pr>$

 lt-proc -l

 ...by which I mean, 'that mode already exists, and it's -l (or --tagged-gen)'.

 I added it for feeding Apertium output into a speech synthesiser, so I
 had no need for post-generation. Sergio originally committed it as
 '-b' but later had a better candidate for 'b' so changed it to 'l'. He
 must have missed 'k' :)

It seems to skip @-tagged words:

$ echo kiteb|apertium -d . mt-ar-transfer
^@kiteb<vblex><past><p3><m><sg>$^.<sent>$

$ echo kiteb|apertium -d . mt-ar-transfer |lt-proc -l mt-ar.autogen.bin
^./.<sent>$

(bug or feature?)

-Kevin



Re: [Apertium-stuff] Proposal: don't prefix paths in apertium-gen-modes (but prefix dirname $0 to PATH in apertium)

2012-05-20 Thread Kevin Brubeck Unhammer

Jacob Nordfalk jacob.nordf...@gmail.com
writes:

 Hi there!

 As no objections have been raised I think we can conclude that this proposal
 has PASSED :-)

 Unhammer, could you apply your patch to the SVN trunk?

 Jacob

Committed revision 38346.


 2012/3/27 Stephen Tigner stephen.tig...@gmail.com

 On Mon, Mar 26, 2012 at 1:04 PM, Jacob Nordfalk
 jacob.nordf...@gmail.com wrote:
 
 
  2012/3/26 Stephen Tigner stephen.tig...@gmail.com
 [snip]
  Ah, okay, so I'm assuming APERTIUM_PATH is an environment variable?
 
 
  Sorry, it wasn't clear (again - Unhammer, you're fired as ghost writer :-).
  APERTIUM_PATH would be the path to the 'apertium binary'.
  In shell language it would be expressed as

  APERTIUM_PATH=$(dirname $0)

  this makes sure that binaries are first searched for in the same directory
  as the 'apertium' command.
 
 Ah, okay. That makes more sense then.

 On Mon, Mar 26, 2012 at 1:08 PM, Kevin Brubeck Unhammer
 unham...@fsfe.org wrote:
  Stephen Tigner stephen.tig...@gmail.com
  writes:
 [snip]
  Ah, okay, so I'm assuming APERTIUM_PATH is an environment variable? If
  so, that should be fairly easy to implement. Just need to tweak a bit
  how the UNKNOWN programs are called. I'll try and take a look at it
  tonight if I have time. n.n
 
  If you install apertium to /usr/local/bin/, the shell script
  /usr/local/bin/apertium will have
  APERTIUM_PATH=/usr/local/bin
  as the second line.

 So basically it just prepends the apertium path to the system path?
 Well, I don't really think any modification of the existing Java code
 would be needed, then. Because the desired behavior is already the
 current behavior, as long as you remove the explicit paths from the
 mode file.

 This is because it always checks if it can be done internally first,
 and then it already depends on the host runtime to run the UNKNOWN
 programs, and that of course would reference the system path and any
 conventions the system has for finding executables to run. (Like the
 convention that the current working directory is always considered
 implicitly first on the path in Windows.)

 I used the same trick (letting the host runtime handle path searching)
 for trying to run cygpath (since, AFAIK, I have no way of
 knowing where cygwin is installed, or at least not an elegant and
 robust way) and javac for run-time compilation of transfer files (for
 instance when running on a JRE instead of a JDK, but the JDK is
 present on the system and in the system path).

 Hopeful that he's not just rambling and needing sleep now,  ;)
 -- Stephen



Re: [Apertium-stuff] # before translated word

2012-06-07 Thread Kevin Brubeck Unhammer

Jimmy O'Regan jore...@gmail.com
writes:

 On 7 June 2012 11:00, Kevin Brubeck Unhammer unham...@fsfe.org wrote:
 Unfortunately, searching for # puts you back at the front page (bug in
 mediawiki I guess)

 Nope. HTTP does not transmit '#', which is for anchor names (and
 typing %23 is a pain :)

Well, the search box could still redirect to %23 on typing #, but I see
http://wiki.apertium.org/wiki/%23 is an illegal page. Oh well.


-- 
Kevin Brubeck Unhammer

GPG: 0x766AC60C



Re: [Apertium-stuff] Online language pair packages

2012-07-01 Thread Kevin Brubeck Unhammer

Mikel Artetxe artet...@gmail.com writes:

 Hi everybody,

 As some of you might know, I am working on the embeddability of
 lttoolbox-java as part of my GSoC project (you can follow my progress
 here). The central part of the project consists of having standalone
 packages for each language pair that can be run independently as well
 as easily integrated in bigger Java projects. This way, the project
 wouldn't make much sense if we don't maintain an infrastructure of
 ready-to-use packages online for all the language pairs that Apertium
 supports. Since this is something that would involve the whole
 Apertium community, I am writing to the list to, first, present you
 what I have been working on so far and, second, get feedback from you,
 discuss all this, and adopt the decisions that we take.

 First of all, let's see what those language pair packages consist of.
 In order to get a general idea, you can check any of the following
 links:

 * Esperanto ⇆ English

 * Basque → English

 * Basque → Spanish

 As long as you have Java in your machine, a simple program to
 translate between those languages should be launched (and if it is not
 working for you, please let me know). And I would like to remark that
 the only requirement is Java, the user doesn't need to have any other
 program installed in his/her machine, and it works in any operating
 system, including Windows. The app is run locally, so it can work
 offline.

 The secret behind those links, that is, the real language pair
 packages, are provisionally kept here. The Jars there (one per
 language pair) are the actual self-contained Java executables, and not
 only do they work on the desktop, but they can also work on Android and will
 be adopted by Arink for the Android app that he is working on. In
 fact, any other Java program could easily use them thanks to the API
 class that lttoolbox-java now offers.

Very cool =D

 And now the big question is, how can we maintain all this? I think
 that we can distinguish two different steps regarding it:

 1) Create the packages (those self-contained Jar files)

 2) Maintain the packages online (keep updated versions of all the
 language pairs online for anybody to use)


 The solution that I have been (and I'm still) working on comes in form
 of two bash scripts, each one to carry out one of the tasks (you can
 find them here in my branch):

 1) apertium-pack-j offers an easy way to generate the packages. It
 requires lttoolbox-java (the one in my branch, not the one in
 trunk) and android-sdk to be installed, and their locations must be
 specified by setting the LTTOOLBOX_JAVA_PATH and ANDROID_SDK_PATH
 environment variables. After that, you can simply run it, passing the
 paths to the mode files for which you want to generate the package as
 arguments, and a ready-to-use package will be created by the script.
 For instance, the following command would create a ready-to-use package
 for the Esperanto-English language pair named apertium-eo-en.jar on my
 machine:

 LTTOOLBOX_JAVA_PATH=/usr/local/share/apertium/lttoolbox.jar \
 ANDROID_SDK_PATH=/home/mikel/developer/android-sdk-linux \
 ./apertium-pack-j /usr/local/share/apertium/modes/eo-en.mode \
 /usr/local/share/apertium/modes/en-eo.mode

 As you can see, I simply specify the correct location of
 lttoolbox-java and android-sdk in my machine, and pass the location of
 eo-en.mode and en-eo.mode (the main modes that correspond to the
 Esperanto-English language pair) as argument to apertium-pack-j.


 2) apertium-upload-j offers an easy way to maintain the packages
 created this way online. For instance, running the following command
 after the one exposed in the previous step would automatically update
 (or upload for the first time) the package for Esperanto-English:

 ./apertium-upload-j apertium-eo-en.jar

 More precisely, it would correctly rename the package to avoid
 duplication, generate a jnlp file (which is used to run the package
 through Java Web Start, as in the links above) and commit them to SVN
 (provisionally to my branch; Jacob suggested creating a specific
 directory such as "binaries" outside trunk for them, but any
 suggestion is welcome).


 As an idea, both scripts could be integrated into the makefiles of each
 language pair so that a simple "make upload", for instance, would
 automatically create and upload the appropriate packages.

I'd prefer that method over making one person (i.e., you) do all the
maintenance; it would be nice to simply type make upload. The only
thing that's sort of an annoyance is having to get the full android-sdk
in addition to lttoolbox-java, but I guess that will be appearing in the
various distro repositories …


--
Kevin Brubeck Unhammer



Re: [Apertium-stuff] Definite determiner in apertium-es-ca

2012-07-11 Thread Kevin Brubeck Unhammer

Bernard Chardonneau bechapert...@free.fr
writes:

 Hey everybody.

 After 10 days mostly out in nature without a computer, and just before
 another 8 weeks without a permanent internet connection (largely by
 choice), I want to give my opinion, as a new pair developer, on the
 discussion about what dictionaries should contain.

 1) For monodixes, I fully agree with Fran and some others that all
 interesting information should be there, even if some pairs do not
 use it.

 Since doing that generally means writing a complete paradigm once, and
 afterwards just using it hundreds or thousands of times for the main
 ones, it is not a big problem.

 2) For bidixes, the most natural way to build them is to write something
 like :

 <e><p><l>my_word<s n="kind1"/></l><r>my_translation<s n="kind2"/></r></p></e>

 where kind1 and kind2 are often the same and can be built from the
 name of the paradigm used in the monodix.

 I say this because I quickly realised that typing a new line with the
 right XML syntax into a file with more than 40 000 other lines soon
 becomes painful.
 So I wrote a 4-parameter shell script to generate new lines, and
 another to put these lines in the right place.

 I think a lot of pair developers have their own shell scripts to do
 the same, or something similar, to build a bidix when the monodixes
 are available.

 So, writing bidix lines like the one above means that any further
 <s n="something"/> would be better left out when not needed.

 Of course, there are exceptions which give pleasant results,
 like in the fr-es pair:

 <e><p><l>coma<s n="n"/><s n="m"/></l><r>coma<s n="n"/><s n="m"/></r></p></e>
 <e><p><l>virgule<s n="n"/><s n="f"/></l><r>coma<s n="n"/><s n="f"/></r></p></e>

 or

 <e><p><l>composant<s n="n"/><s n="m"/></l><r>componente<s n="n"/><s n="m"/></r></p></e>
 <e><p><l>composante<s n="n"/><s n="f"/></l><r>componente<s n="n"/><s n="f"/></r></p></e>

 But having to write (in the eo-fr pair)
 <e><p><l>ABC<s n="np"/><s n="al"/></l><r>ABC<s n="np"/><s n="al"/><s n="mf"/></r></p></e>
 without forgetting any <s n="al"/> or the <s n="mf"/>, to prevent
 getting a # in the translation, is not a very nice way to work.

 There is of course the problem of the beginner not doing that and
 asking on the list why it does not work. But that can be learned
 quickly.

 But the most important problem is being obliged to do that almost
 always, and ending up with bigger and slightly less readable lines
 in the bidix.

 I think even in this case:
 <e><p><l>ajout<s n="n"/><s n="m"/></l><r>adición<s n="n"/><s n="f"/></r></p></e>
 (gender changing), there should be no need to give the gender if there
 is no word ambiguity in either language (as there is for coma and
 componente in Spanish).

 And of course something like:
 <e r="LR"><p><l>binaire<s n="adj"/><s n="mf"/></l><r>binario<s n="adj"/><s n="GD"/></r></p></e>
 <e r="RL"><p><l>binaire<s n="adj"/><s n="mf"/></l><r>binario<s n="adj"/><s n="f"/></r></p></e>
 <e r="RL"><p><l>binaire<s n="adj"/><s n="mf"/></l><r>binario<s n="adj"/><s n="m"/></r></p></e>

 would become simpler in one line.

 So, the question is how to do that without breaking things.


 Solution 1 : paradigm

 Several people spoke about it, but without details.
 I note that the <s n="kind"/> information inside bidixes can generally
 be generated from the name of the paradigm used in the monodix,
 which looks like something__kind (or foo__bar if you prefer).

 But of course, there is less information in kind than in
 something__kind.

 So a nice approach would be, for each paradigm of every monodix, to
 build a paradigm with the same name in the bidix, containing just
 an invariant list of tags like:

 <s n="thing1"/><s n="thing2"/>

 That way, even gender ambiguities like that of the Spanish word
 coma could be solved elegantly:

 <e><p><l>coma<s n="livre__n"/></l><r>coma<s n="abismo__n"/></r></p></e>
 <e><p><l>virgule<s n="abeille__n"/></l><r>coma<s n="abeja__n"/></r></p></e>

Didn't Jacob Nordfalk and Michael Kristensen make a script to do that
kind of thing with sv-da? Ie. automatically create bidix pardefs based
on monodix pardefs.
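
A minimal sketch of what "bidix pardefs based on monodix pardefs" could look like, in Python. This is not the sv-da script; the function names are hypothetical, and it assumes the invariant tags of a paradigm are simply the leading tags shared by every entry of that pardef in the monodix.

```python
import xml.etree.ElementTree as ET


def invariant_tags(pardef):
    """Return the tags shared, as a common prefix, by every entry of a <pardef>."""
    tag_lists = [[s.get("n") for s in e.findall("p/r/s")]
                 for e in pardef.findall("e")]
    if not tag_lists:
        return []
    prefix = tag_lists[0]
    for tags in tag_lists[1:]:
        common = 0
        for a, b in zip(prefix, tags):
            if a != b:
                break
            common += 1
        prefix = prefix[:common]
    return prefix


def bidix_pardefs(monodix_xml):
    """Map each monodix paradigm name to the invariant <s n=.../> string for the bidix."""
    root = ET.fromstring(monodix_xml)
    result = {}
    for pardef in root.iter("pardef"):
        symbols = "".join(f'<s n="{tag}"/>' for tag in invariant_tags(pardef))
        result[pardef.get("n")] = symbols
    return result
```

With a pardef abeja__n whose entries carry n.f.sg and n.f.pl, the invariant part comes out as the n and f tags, which is the information a bidix entry would need for the gender-ambiguous cases discussed above.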

 Solution 2 : during compilation

 That's another approach. When compiling bidix files, there are two cases:
 - the information is given in an <s n="thing"/>, so just use it
 - this information is not indicated, so it is taken from the
   monodix.
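
The two cases above can be sketched in Python as follows. This is only an illustration under stated assumptions: the monodix is represented as a hypothetical mapping from lemma to its possible tag lists, and a lemma with more than one analysis (like Spanish coma) must still be tagged explicitly. None of these names exist in lttoolbox.

```python
def resolve_tags(lemma, explicit_tags, monodix):
    """Use the tags written in the bidix if any, else take them from the monodix."""
    if explicit_tags:
        return list(explicit_tags)
    analyses = monodix.get(lemma, [])
    if len(analyses) != 1:
        raise ValueError(
            f"{lemma!r}: tags must be given explicitly "
            f"({len(analyses)} monodix analyses)")
    return list(analyses[0])


def side(element, lemma, tags):
    """Render one side (<l> or <r>) of a bidix entry."""
    symbols = "".join(f'<s n="{tag}"/>' for tag in tags)
    return f"<{element}>{lemma}{symbols}</{element}>"


def compile_entry(left, right, left_monodix, right_monodix):
    """Expand a (lemma, tags) pair for each side into a full bidix entry."""
    (l_lemma, l_tags), (r_lemma, r_tags) = left, right
    l_full = resolve_tags(l_lemma, l_tags, left_monodix)
    r_full = resolve_tags(r_lemma, r_tags, right_monodix)
    return "<e><p>" + side("l", l_lemma, l_full) + side("r", r_lemma, r_full) + "</p></e>"
```

An entry written with no tags at all for ajout and adición would then be completed from the two monodixes, while an untagged coma would be rejected as ambiguous.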


 Have a good summer.

You too :-)

--
Kevin Brubeck Unhammer

GPG: 0x766AC60C



Re: [Apertium-stuff] New applications: Apertium Caffeine and Apertium plug-in for OmegaT

2012-07-19 Thread Kevin Brubeck Unhammer

Mikel Forcada m...@dlsi.ua.es writes:

[...]

 There is one thing that could be easily solved. Víctor Sánchez
 (cc-ed) maybe can help you. When one uses the Apertium
 webservice from inside OmegaT, we avoid translating the tags
 (<u0>, etc.). Some minor changes were made to the code that
 calls Apertium as a webservice (you'll easily find them, but
 if not, I can help) and some changes were made in the
 webservice itself (Víctor can help here). I think it is a
 matter of using some regular expressions to hide these in some
 way...
 
 
 I guess that you are talking about this. I might be blind, but I
 haven't been able to identify the relevant piece of code there...
 
 You're right. Most of the work is done at the Apertium server when it
 receives format=omegat. 

Perhaps you can just use the

translate me<apertium-notrans>don't translate me</apertium-notrans>

method, this works in e.g. html and html-noent formats (grep tells me it
should also be supported in odt, pptx, xlsx, wxml).
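
As a rough sketch of that method in Python: the helper below wraps every placeholder tag in an apertium-notrans element before the text is sent for translation. The function name and the default pattern (tags such as u0 or f1) are assumptions for illustration, and whether the wrapped text needs further escaping for a given format is not addressed here.

```python
import re


def protect(text, pattern=r"</?[a-z]+[0-9]+/?>"):
    """Wrap every match of `pattern` in <apertium-notrans> ... </apertium-notrans>."""
    return re.sub(
        pattern,
        lambda m: f"<apertium-notrans>{m.group(0)}</apertium-notrans>",
        text,
    )
```

The default pattern only matches lowercase tag names that end in digits, so it leaves ordinary markup and the apertium-notrans tags themselves untouched.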

[...]

 Yes. We should probably create a new directory in SVN and start
 creating and uploading packages for every language pair. The
 question is how to maintain it in long-term: we could integrate my
 script in the makefiles of each language pair to make things
 easier (although the dependency of Android-SDK and lttoolbox-java
 can still be a problem for some people), but we would still need
 the implication of every language pair developer in Apertium (or
 some responsible to take care of the whole maintenance).
 
 This deserves a deeper thought. Any ideas?

I liked the idea of just adding a make goal, though perhaps the script
could be installed by lttoolbox-java (since that's a dependency of the
script anyway), so that copies wouldn't be required by every language
pair?


-- 
Kevin Brubeck Unhammer

GPG: 0x766AC60C



Re: [Apertium-stuff] New applications: Apertium Caffeine and Apertium plug-in for OmegaT

2012-08-06 Thread Kevin Brubeck Unhammer

Jimmy O'Regan jore...@gmail.com
writes:

 On 6 August 2012 10:24, Mikel Artetxe
 artet...@gmail.com wrote:
 apertium-es-ro: Document apertium-es-ro.trules-ro-es.xml does not validate
 against /usr/local/share/apertium/transfer.dtd

 I can't find any instance of 'trules' anywhere in that package. Are
 you using the current SVN version?

It's in the release tarball (it needed https://gist.github.com/3273244
to compile here).

 apertium-oc-ca: Document oc-ca.t1x does not validate against
 /usr/local/share/apertium/transfer.dtd
 apertium-oc-es: Document oc-es.t1x does not validate against
 /usr/local/share/apertium/transfer.dtd

 These two involve running an xsl script (alt.xsl) on the transfer files first.


… but they could all do with a bugfix release (when I packaged the
releases for Arch Linux, I had to do https://gist.github.com/3273264 and
https://gist.github.com/3273266 to make them compile).

Who maintains the packages?


-- 
Kevin Brubeck Unhammer

GPG: 0x766AC60C



Re: [Apertium-stuff] Apertium on Android

2012-08-07 Thread Kevin Brubeck Unhammer

Arink Verma arinkve...@iitrpr.ac.in writes:

 One more important feature added! 

 Application can read text from SMS inbox as input for translation

Very cool :-)

 Download link
 https://github.com/downloads/arinkverma/Apertiurm-Androind-app-devlopment/ApertiumAndroid_8_7.apk

Just tried it on my HTC Desire, looks slick =D

Some notes:

When I open the application and have nothing installed, I am able to
click the → arrow and it says "There is no mode from to to from".

I love how clicking from or to goes straight into the download list,
but then when I click a pair, nothing happens, I have to long-click, is
there a reason for that?

After selecting a pair to install, should the heading really say
"Modes"? How about "Translation directions" or something a bit less
apertium-jargony?

"Unziping files" should probably be spelled "Unzipping files".

When I switch the 'from' language, 'to' is set to "to" and clicking →
gives "There is no mode from null to katalansk"; perhaps it should
switch to the last used mode with that from-language (or at least the
first available mode with that from-language).

On clicking 'Translate', the waiting box says "Translator"; it should
probably say "Translating".

When I select all text in the input box, both the text and the
background are black …

Perhaps it should be possible to select and copy the output text too
(though I see now it's possible using the Clipboard Push feature).




-- 
Kevin Brubeck Unhammer

GPG: 0x766AC60C



Re: [Apertium-stuff] Swedish - Norwegian

2012-08-09 Thread Kevin Brubeck Unhammer

 to Norwegian Bokmål (nb)? And the
 same in the other direction (i.e. convert the transfer rules for sv-da
 to rules for sv-nb)?

 Reusing transfer rules probably isn't necessary. If you don't feel like
 writing them, then you can write testcases on the Wiki and ask someone
 on the list to write them. 

Well, from nb to sv you could copy-paste some of the compound chunking
rules, but yeah transfer rules don't take very long to write.

 Perhaps the maintainer of Danish (da) - Norwegian Bokmål (nb) can give
 me a hint? He's probably very updated on the differences between the two
 languages.

 There is no maintainer that I know of. 

And I don't think that pair has any work done apart from bidix entries …

 D. Linguistic resources for Norwegian.
 
 I have found frequency word lists for Norwegian Bokmål (nb) at
 http://helmer.aksis.uib.no/nta/ and can thus prioritize my work to the
 most important words.

http://www.nb.no/spraakbanken/tilgjengelege-ressursar/tekstressursar has
more frequency lists (they also taunt you with this enormous corpus, but
it's currently in beta, very messy, and best avoided for now).


[…]

 E. Any advice for me if I start working on the pair Swedish (sv) -
 Norwegian Bokmål (nb)? Have I missed something I need to know? Any other
 resources I can use?

 My advice would be to start small, to avoid getting overwhelmed. 

 Start from scratch on a small task. For example translating this short
 story: 

 http://www.unilang.org/ulrview.php?res=422,416

 Once you have managed to make the system translate this without any
 system errors (the @, * and # you see, not necessarily translation
 errors), you should have a good understanding of the system, and be
 well placed to start working with the other resources.

 It shouldn't take longer than a week, and some have done it in a couple
 of days.

+1 on that.


-- 
Kevin Brubeck Unhammer

GPG: 0x766AC60C



Re: [Apertium-stuff] Swedish - Norwegian

2012-08-10 Thread Kevin Brubeck Unhammer

Per Tunedal per.tune...@operamail.com
writes:

 Hi Keld,


 On Thu, Aug 9, 2012, at 19:55, k...@keldix.com wrote:
 On Thu, Aug 09, 2012 at 02:54:27PM +0200, Kevin Brubeck Unhammer wrote:
  Francis Tyers fty...@prompsit.com writes:
 --snip--
  
   (3) You make the two translators in the one pair. For this, you could
   have the same Swedish dictionary, but would need different nb and nn
   dictionaries, different sv-nb and sv-nn dictionaries and different sv-nb
   and sv-nn transfer rules.
  
   I think that (3) is probably best, but would like input from others
   (e.g. Unhammer or Trond).
  
  (3) sounds best to me too. Perhaps you could even do with one bidix, and
  just use the alt="nn" vs alt="nb" attribute; a rough and dirty count
  shows that the majority of entries in the nn-nb bidix carry over the
  same lemma/tag:
  
  $ lt-expand apertium-nn-nb.nn-nb.dix | grep -v ':[<>]:' | awk -F: '$1==$2' | wc -l
  71628
  $ lt-expand apertium-nn-nb.nn-nb.dix | grep -v ':[<>]:' | awk -F: '$1!=$2' | wc -l
  11365
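
The same rough count can be sketched in Python, assuming lt-expand prints one left:right pair per line and marks direction-restricted entries with :>: or :<:. The function name is hypothetical, and unlike the awk version this one skips empty lines.

```python
import re

# Entries restricted to one direction look like left:>:right or left:<:right.
DIRECTION_RESTRICTED = re.compile(r":[<>]:")


def count_same_and_different(lines):
    """Count expanded bidix entries whose two sides are identical vs. different."""
    same = different = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or DIRECTION_RESTRICTED.search(line):
            continue
        fields = line.split(":")
        if len(fields) < 2:
            continue
        if fields[0] == fields[1]:
            same += 1
        else:
            different += 1
    return same, different
```

Calling count_same_and_different(sys.stdin) in a small script, with the lt-expand output piped in, would give both numbers in a single pass.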
  

 Can someone tell me the easiest way to add the alt-tags to the
 dictionaries before merging them?
 Maybe there could be an easy procedure for adding new entries when the
 included languages are updated?

You wouldn't be adding alt-tags until after you add the third language
(ie. Nynorsk if you start with Swedish-Bokmål) to the pair, so it's not
something to worry about yet.


-- 
Kevin Brubeck Unhammer

Sent using my emacs



Re: [Apertium-stuff] Swedish - Norwegian

2012-08-10 Thread Kevin Brubeck Unhammer

Per Tunedal per.tune...@operamail.com
writes:

 Hi,

 On Thu, Aug 9, 2012, at 23:23, Trosterud Trond wrote:
 
 Per Tunedal wrote on 9 Aug 2012 at 20:21:
  Tihomir has told before that he plans to start developing a constraint
  grammar for Swedish.
 
 Good. Again: 
 - Are there open resources?
 - Could something be ported from Norwegian? (perhaps only indirectly).
 
  Yes, a production system (say, I want to translate a sv article to nn on
  Wikipedia) (…)
 
  Yes, that was the scenario I first had in mind. But it would break if
  there is a need for a constraint grammar, wouldn't it? And then there
  won't be any use left for the Apertium-translation.
 
 Well. Since a handful of rules will remove most ambiguities, what is left
 will be partly disambiguated. And how bad this is for MT needs to be
 seen. So it will not break. It will only be more problematic, and the
 result will be poorer.

 Mikel Artetxe has explained that the OmegaT plug-in doesn't work for
 language pairs that depend on programs that aren't a part of
 lttoolbox-java. Six language pairs depend on the Constraint Grammar
 package and are thus excluded, one of them is apertium-nn-nb. But sv-da
 doesn't use any constraint grammar, thus I concluded that sv-nb (Norsk
 bokmål) wouldn't need one either. And would come to real use, by real
 translators, using OmegaT. If the pair cannot be used, I don't see any
 need to develop it.

In any case a CG could be added later, as an option for those who aren't
using OmegaT or Android.


[...]

  What kind of resources do I need?
 
 For 1: swetwol :-) But it seems there are resources in Gothenburg:
 
 http://www.cse.chalmers.se/alumni/markus/FM/
 http://www.cse.chalmers.se/alumni/markus/FM/download/swedish.lexicon
 
 This might even work.

 As an input for transfer rules or for a potential constraint grammar?

The .lexicon file might be used to enlarge sv.dix. However, is-sv.sv.dix
should already be big enough to get a pair started.

(What's the license on that FM stuff anyway?)

  As for 2, Lexin might be one resource. I am on Euralex in Oslo right now,
  and will ask around.
  Fine! Besides, what's Lexin?
 
 Lexicon för invandrare, http://lexin.nada.kth.se/lexin/

 As a native Swede, I don't see any need for this.

Your machine translation system, however, is not a native Swede, and
might have a need to know that e.g. katt is a noun.

But it doesn't seem that Lexin is free software:

Can the dictionaries be downloaded?
No. However, you can download Folkets lexikon, which replaces the English
Lexin, but only in XML format.
http://lexin.nada.kth.se/lexin/#about=1;main=3;


-- 
Kevin Brubeck Unhammer

Sent using my emacs



Re: [Apertium-stuff] wiki down

2012-08-13 Thread Kevin Brubeck Unhammer

Mikel L. Forcada m...@dlsi.ua.es writes:

 They both seem to be back. Power cuts were scheduled for August...

 Cheers

 Mikel

 2012/8/6, Francis Tyers fty...@prompsit.com:
 The Wiki (and the whole of the DLSI) is down. August has finally
 arrived!

 In the meantime, you can try using the Google Cache, or if something is
 urgent, come on IRC.

 Fran


Down again :/ 

How about a Flattr[1] button on apertium.org, where all donations go to
an on-call apertium sysadmin with the sole purpose of keeping the wiki
online? =P


[1]  https://flattr.com/


-- 
Kevin Brubeck Unhammer

GPG: 0x766AC60C



Re: [Apertium-stuff] Benefits of Apertium for translators

2012-08-22 Thread Kevin Brubeck Unhammer

Jimmy O'Regan jore...@gmail.com
writes:

 On 22 August 2012 12:06, Per Tunedal
 per.tune...@operamail.com wrote:
 Hi,
 OK.
 Back to my original wish for some kind of easy to use interface for
 contributions for specific domains. I guess most experts in law or
 beetles are not hackers and would not even think of contributing by
 adding codes to dictionary files.

 There have been a number of attempts at this over the years, and it's
 surprisingly difficult to get right. IIRC, someone at UA was working
 on it, but I don't have details.

 That said, if you can live with import-only, it's really easy to
 convert from something like a spreadsheet.

http://wiki.apertium.org/wiki/Contributing should be a bit clearer about
that now.
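
For the import-only route, a minimal Python sketch of converting spreadsheet rows to bidix entries might look like this. The three-column layout (left lemma, right lemma, part-of-speech tag) and the function name are assumptions made for illustration, not something the wiki page prescribes.

```python
import csv
import io
from xml.sax.saxutils import escape


def rows_to_bidix(csv_text):
    """Turn 'left,right,pos' CSV rows into bidix <e> lines; other rows are skipped."""
    entries = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 3 or not row[0].strip():
            continue
        left, right, pos = (cell.strip() for cell in row)
        # Spaces inside a multiword lemma are written as <b/> in .dix files.
        left = escape(left).replace(" ", "<b/>")
        right = escape(right).replace(" ", "<b/>")
        entries.append(
            f'<e><p><l>{left}<s n="{pos}"/></l><r>{right}<s n="{pos}"/></r></p></e>')
    return entries
```

A domain expert could then keep working in a spreadsheet, export it as CSV, and leave the generated lines to be reviewed by a pair developer before they are committed.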

-- 
Kevin Brubeck Unhammer

GPG: 0x766AC60C



Re: [Apertium-stuff] Solved: Constraint grammar installation was: Re: Swedish - Norwegian

2012-09-05 Thread Kevin Brubeck Unhammer

Tino Didriksen tino.didrik...@gmail.com
writes:

 For the record, using a prefix for CG-3 is fine, if your prefix's bin
 folder is in your $PATH as it should be.

Using a prefix for language pairs that depend on CG-3 also works
fine, as long as the above is true AND apertium, lttoolbox and the
language pair were installed to that same prefix, AND there is no
apertium command installed to a prefix which appears earlier in the
$PATH.

But since that turns into a rather long checklist, I guess non-prefixed
installations are easier to get right for new developers.
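
Part of that checklist can be automated. The Python sketch below reports which commands do not resolve inside the bin folder of a given prefix; the function name is hypothetical, the default tool names are assumed to be the usual binaries, and it checks neither symlinks nor whether the language pair itself was installed to the prefix.

```python
import os
import shutil


def tools_outside_prefix(prefix, tools=("apertium", "lt-proc", "vislcg3"),
                         which=shutil.which):
    """Return {tool: path_or_None} for tools that $PATH does not resolve to prefix/bin."""
    bindir = os.path.join(os.path.abspath(prefix), "bin")
    problems = {}
    for tool in tools:
        found = which(tool)
        if found is None or os.path.dirname(os.path.abspath(found)) != bindir:
            problems[tool] = found
    return problems
```

An empty result means every listed command is found first in the prefix; a non-empty one shows which command is missing (None) or shadowed by a copy earlier in the $PATH.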

 On Wed, Sep 5, 2012 at 9:42 AM, Per Tunedal
 per.tune...@operamail.com wrote:

 Ah! Wouldn't it be a good idea to put a warning on the Wiki page
 about the Constraint Grammar: DO NOT USE PREFIX, if you don't know
 what you are doing! A beginner should be able to just copy the
 commands.
 
 http://wiki.apertium.org/wiki/Vislcg3#Installing_VISL_CG3
 
 I started all over again, this time simply using:
 
 ./cmake.sh
 
 and the rest worked like a charm.

-- 
Kevin Brubeck Unhammer

GPG: 0x766AC60C



Re: [Apertium-stuff] end of gsoc (partial) report

2012-09-07 Thread Kevin Brubeck Unhammer

Francis Tyers fty...@prompsit.com writes:

 Hello all, 

 We had 11 projects accepted this year. Of those, 2 failed. Here is a
 rundown of the remaining projects I was keeping an eye on:

 * apertium-id-ms: 80% coverage over Wikipedia, error rate of  10%.
 This pair has been moved to trunk and will be released shortly.
 Congratulations Raymond and Tina for the excellent work. This is a
 canonical Apertium pair. Testvoc clean in both directions. Comes close
 to Google level for this pair. Further information here:
   http://wiki.apertium.org/wiki/Indonesian_and_Malaysian/Work_plan

 * apertium-mt-ar: 80% coverage over Wikipedia, error rate around
 10-20%. This pair has been moved to staging, and a preliminary/beta
 version will be released shortly. Excellent work by Miri and Unhammer.
 This pair uses lttoolbox/CG. Beats Google for this pair. Testvoc clean
 in both directions. Further information here:
   http://wiki.apertium.org/wiki/Maltese_and_Arabic/Work_plan

 * apertium-sh-sl: 80% coverage over Wikipedia, error rate around 20%.
 This pair will be moved to trunk. Fantastic work by Aleš, Hrvoje and
 Jernej. This pair uses lttoolbox/CG. Testvoc clean sh-sl, needs work
 sl-sh. Further information here:
   http://wiki.apertium.org/wiki/Serbo-Croatian_and_Slovenian

 * apertium-tat-kaz: 80% coverage over Wikipedia, error rate around 20%.
 This pair will be moved to staging. Really great work by Ilnar and
 Jonathan. This pair relies on HFST/CG. More or less clean, but
 challenging to testvoc because of `agglutinative' morphology. Pair not
 in Google. Further information here:
   http://wiki.apertium.org/wiki/Kazakh_and_Tatar
   http://wiki.apertium.org/wiki/Kazakh_and_Tatar/Work_plan

 * apertium-quz-spa: Coverage difficult to calculate because of weak
 adherence to orthographic norm. Will be moved to nursery. Good work by
 pato. Not clean. Pair not in Google. Further information here: 
  http://wiki.apertium.org/wiki/Quechua_cuzque%C3%B1o_y_castellano
  http://wiki.apertium.org/wiki/Quechua_cuzque%C3%B1o_y_castellano/Apertium-quz-spa/Ortograf%C3%ADa

 * Corpus based lexicalised feature selection: The project was a
 reasonable success. We didn't achieve any improvements for definiteness,
 but we did for preposition selection. Further information and links on
 Filip's user page:
  http://wiki.apertium.org/wiki/User:Fpetkovski

 * Finite-state disambiguation: It was discovered at midterm that this
 was too much work for the student, Hrvoje, in the available time. After
 the midterm he continued to work on Slovenian--Serbo-Croatian.

 Thanks to all our students for taking part! :)

 Fran

+1, great work, apertiumers :-)


-- 
Kevin Brubeck Unhammer

GPG: 0x766AC60C



Re: [Apertium-stuff] apertium-id-ms 0.1.0 released

2012-09-21 Thread Kevin Brubeck Unhammer

Francis Tyers fty...@prompsit.com writes:

 Hello everyone,

 Just a quick note to say that I've released the first version of
 apertium-id-ms, a pair for translation between Indonesian and Malaysian.
 This is a GSOC pair, developed by Raymond Susanto and mentored by
 Septina Larasati and myself.

 The pair has over 80% coverage of Wikipedia and the News domain, and an
 error rate of between 8-15% or so. It doesn't quite beat Google yet, but
 goes a reasonable way there.

 Congratulations to Raymond for successfully completing his GSOC, and
 great work all round!

 You can find the package in the SF repository. :)

 Fran

Wow, congrats on the first released Austronesian language pair in
Apertium :-)

-- 
Kevin Brubeck Unhammer

Sent from my emacs



Re: [Apertium-stuff] proposal for change to .lrx format

2012-10-07 Thread Kevin Brubeck Unhammer

Mikel Forcada m...@dlsi.ua.es writes:

 Fran,


 I'd like to include the possibility for lists in the LRX format. This
 would involve adding a couple of new tags,
 That should not be a problem as long as the old format works in the new 
 definition
 and changing the root tag,
 You would have to give a good reason for it.
 the idea is to have something like:

 http://pastebin.com/GGzfM5qc
 OK. I would have appreciated comments in the .lrx file...

 Calling a list would just involve putting its contents in the rule.
 So with <or> it would work like an OR, but without, it would work like a
 sequence.
 I don't like this at all. This is very opaque. I would not use the tag 
 <list>. If it is a set from where you choose, call it <set> or 
 <option>... if it is a sequence define it as a sequence. But don't 
 overload something called a list that really isn't.

 Another option would be to have a named macro that can contain anything 
 that a rule can contain. Then you would use <or> for lists, etc. But 
 <or><list>...</list></or> does not read well.

<apply><or/><list> sounds more correct =P

Since CG uses 'list' only for the disjunction, I think CG-ers would be
confused too, if that matters. (CG uses 'template' for the sequence,
though those templates can be a bit more involved.)


-- 
Kevin Brubeck Unhammer

GPG: 0x766AC60C



Re: [Apertium-stuff] Danish to Sweden broken se-da

2012-10-11 Thread Kevin Brubeck Unhammer

Per Tunedal per.tune...@operamail.com
writes:

 Hi,
 Fine! Why didn't you tell me before? I was trying to get it perfect,
 scared to break anything!

https://systemerrorcs.wordpress.com/2011/06/09/commit-early-commit-often/



Re: [Apertium-stuff] Modes for Icelandic - Swedish is-sv

2012-10-20 Thread Kevin Brubeck Unhammer

Per Tunedal per.tune...@operamail.com
writes:

 Hi,
 I would like to test the pair Icelandic (is) - Swedish (sv), but cannot
 find any mode files.

$ sh autogen.sh && make
[…]

$ ls modes
is-sv-anmor.mode        sv-is-anmor.mode
is-sv-chunker.mode      sv-is-chunker.mode
is-sv-generador.mode    sv-is-generador.mode
is-sv-interchunk.mode   sv-is-interchunk.mode
is-sv.mode              sv-is.mode
is-sv-postchunk.mode    sv-is-postchunk.mode
is-sv-pretransfer.mode  sv-is-pretransfer.mode
is-sv-tagger.mode       sv-is-tagger.mode

$ echo 'Svifnökkvinn minn er fullur af álum' | apertium -d . is-sv
*Svifnökkvinn #min är Full av *álum


Hmm, needs some work.
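
A quick way to quantify "needs some work" is to count the debug markers in the output. The Python sketch below assumes the usual reading of the markers (* for a word the analyser does not know, @ for a lemma missing from the bidix, # for a form the generator could not produce) and only looks at the first character of each whitespace-separated token; the names are hypothetical.

```python
MARKERS = {
    "*": "unknown to the analyser",
    "@": "missing from the bidix",
    "#": "not generated",
}


def count_markers(translation):
    """Count tokens that start with one of the Apertium debug markers."""
    counts = {marker: 0 for marker in MARKERS}
    for token in translation.split():
        if len(token) > 1 and token[0] in counts:
            counts[token[0]] += 1
    return counts
```

For the sentence above, this would report two unknown words and one generation problem.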


-- 
Kevin Brubeck Unhammer

GPG: 0x766AC60C



Re: [Apertium-stuff] new language pair es-de

2012-10-23 Thread Kevin Brubeck Unhammer

Isabel Imbernón isabelimber...@gmail.com
writes:

Hi,

I'm trying to create the pair es-de from scratch, as you already know.
For that purpose, I copied the es-ca directory and started to change
names. I reused the Spanish dictionary and now I have the files
apertium-es-de.es.dix and apertium-es-de.es.acx for my new pair. I
want to add also the German dictionary from the incubator so that I
can have apertium-es-de.de.dix, but then I should also have the
apertium-es-de.de.acx, shouldn't I? There is not such a similar file
in the incubator, could anyone help me to create this?

.acx files are optional, so no need to worry about that (see
http://wiki.apertium.org/wiki/Acx for what they do, I don't think
they're able to deal with the German double-s unfortunately).

 Concerning the bilingual dictionary, some months ago I created the
 file apertium-es-de.es-de.dix just by copying the bilingual dictionary
 of the directory en-de from the incubator and manually replacing
 English words with Spanish ones. I didn't do it with the whole
 dictionary, of course, I just have some of them. Do you think this is
 a good start? Or should I do this differently?

Sounds like a good start, although after doing the closed classes
(pronouns, conjunctions, etc.) it would be a good idea to sort the
words you are about to translate by frequency and prioritise the most
frequent ones.

Further reading:
http://wiki.apertium.eu/index.php/Appendix_A:_Frequency
http://wiki.apertium.org/wiki/Building_dictionaries#Frequency
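
A minimal sketch of the frequency idea, assuming you have some plain-text corpus of the source language at hand. The function name and the crude tokenisation are illustrative only and are not part of any Apertium tool:

```python
from collections import Counter
import re


def frequency_list(text):
    """Return (word, count) pairs, most frequent first; ties in alphabetical order."""
    words = re.findall(r"\w+", text.lower())  # crude tokenisation, good enough for a sketch
    counts = Counter(words)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    sample = "el gato y el perro y el pez"
    for word, count in frequency_list(sample):
        print(count, word)
```

Working down such a list from the top means every new bidix entry covers as many running words as possible.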

 I was also thinking of having an empty file of rules for the
 beginning, just to start translating, first of all, between words of
 the dictionaries. How could I do that?

Copy-paste this
http://wiki.apertium.org/wiki/A_long_introduction_to_transfer_rules#Overview_of_a_transfer_file
into your apertium-es-de.es-de.t1x (or apertium-es-de.de-es.t1x for the
other direction).

Hope this helps, good luck :-)

--
Kevin Brubeck Unhammer

GPG: 0x766AC60C


Re: [Apertium-stuff] Debugging?

2012-10-31 Thread Kevin Brubeck Unhammer

Jimmy O'Regan jore...@gmail.com
writes:

 On 30 October 2012 16:07, Yannis Haralambous
 yannis.haralamb...@telecom-bretagne.eu wrote:
 dear Apertium people,

 is it possible to follow the structural transfer of a sentence step
 by step? For example: what are the chunks, which rule is applied to
 each, what is the result for each chunk. In other words, is there a
 debugging option for the structural transfer module?


 You can get this with
 apertium-transfer -t
 but it's not available from the script (it wouldn't make sense) --
 you'll have to manually provide the entire pipeline.

You can also use http://wiki.apertium.org/wiki/Apertium-viewer to get a
quick overview of the input/output of each stage (it doesn't show rule
numbers though, like apertium-transfer -t does).
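
If you want to look at the stages by hand, one way is to cut the pipeline into growing prefixes and run them one after the other. The sketch below assumes the pipeline is a single line of commands joined by '|' (as in a mode file); the stage names in the example are placeholders, not real Apertium commands, and the splitting is naive (it would break on a '|' inside a quoted argument):

```python
def split_pipeline(mode_line):
    """Split a one-line shell pipeline into its stage commands.

    Naive: assumes no '|' occurs inside quoted arguments.
    """
    return [stage.strip() for stage in mode_line.split("|") if stage.strip()]


def growing_prefixes(stages):
    """Return the pipeline cut off after stage 1, after stage 2, and so on."""
    return [" | ".join(stages[:end]) for end in range(1, len(stages) + 1)]


if __name__ == "__main__":
    # Hypothetical pipeline; real stage names and arguments come from your own mode file.
    example = "stage-one first.bin | stage-two second.bin | stage-three"
    for command in growing_prefixes(split_pipeline(example)):
        print(command)
```

Each printed line can then be pasted after an `echo '...' |` to see what the text looks like after that stage.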

-- 
Kevin Brubeck Unhammer

GPG: 0x766AC60C



Re: [Apertium-stuff] compounding

2012-11-07 Thread Kevin Brubeck Unhammer

Francis Tyers fty...@prompsit.com writes:

 On Wed, 07 Nov 2012 at 17:32 +0100, Per Tunedal wrote:
 Hi,
 thank you. I've read the Wiki and looked into the apertium-nn-nb.nb.dix
 file.
 
 Apparently, this is solved in a less transparent way in the nn-nb pair
 than in the examples in the Wiki. 

 It's less transparent because it is more complete. I think that
 compounds work very similarly in sv, da, nn, nb so you could probably
 just copy these paradigms and see how it goes.

 In the beginning of the dictionary,
 there are a lot of pardefs treating compounds, that I don't understand.
 Can anyone explain?

 I can try.

[...]

Exactly :)

The only thing I would add is that the tag <cmp> is a normal tag (as
opposed to <compound-only-L> and <compound-R>, the special hidden
compounding tags). It's not strictly necessary to have it there to do
compounding, but it is helpful.

E.g. in transfer, it's used to distinguish a compound from two nouns
simply following each other, and it's also very helpful in generation,
for those cases where the sg.ind form is not equal to the form used in
compounds (e.g. the Nynorsk word 'vatn' when used as the left-part of a
compound becomes 'vass').


-Kevin



Re: [Apertium-stuff] compounding

2012-11-08 Thread Kevin Brubeck Unhammer

Per Tunedal per.tune...@operamail.com
writes:

[...]

 The noun kjempe is advertised as possible to use in compounds, yet
 there is an entry for the adjective kjempehøy (= very high/tall). Why?

Assume you have dynamic[1] compounding turned on for the open classes
nouns, verbs, adjectives – these are all fairly common in compounding
(though nouns cover over 70 % in nn/nb), and you remove kjempehøy from
your dictionary.

Now, since nb.dix has these analyses of kjempe and høy:


kjempe<vblex><inf>/kjempe<n><m><sg><ind>/kjempe<n><f><sg><ind>/kjempe<n><m><sg><ind>/kjempe<n><f><sg><ind>

høye<vblex><imp>/høy<n><nt><sg><ind>/høy<n><nt><pl><ind>/høy<adj><posi><mf><sg><ind>

your compound analysis will be ambiguous over at least:

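kjempe<n><f><sg><ind>+høy<n><nt><pl><ind>
kjempe<n><f><sg><ind>+høy<n><nt><sg><ind>
kjempe<n><f><sg><ind>+høye<vblex><imp>
kjempe<n><f><sg><ind>+høy<adj><posi><mf><sg><ind>
kjempe<n><m><sg><ind>+høy<n><nt><sg><ind>
kjempe<n><m><sg><ind>+høy<n><nt><pl><ind>
kjempe<n><m><sg><ind>+høye<vblex><imp>
kjempe<n><m><sg><ind>+høy<adj><posi><mf><sg><ind>
kjempe<vblex><inf>+høy<n><nt><pl><ind>
kjempe<vblex><inf>+høy<n><nt><sg><ind>
kjempe<vblex><inf>+høye<vblex><imp>
kjempe<vblex><inf>+høy<adj><posi><mf><sg><ind>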
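
The twelve readings are simply every analysis of the left part combined with every analysis of the right part. A small sketch of that arithmetic, with the tags written in angle brackets and the analyses typed in by hand from the listing above (the function name is made up for illustration):

```python
from itertools import product

# Distinct analyses of each part, copied by hand from the listing above.
kjempe = ["kjempe<vblex><inf>", "kjempe<n><m><sg><ind>", "kjempe<n><f><sg><ind>"]
hoy = [
    "høye<vblex><imp>",
    "høy<n><nt><sg><ind>",
    "høy<n><nt><pl><ind>",
    "høy<adj><posi><mf><sg><ind>",
]


def compound_readings(left, right):
    """Combine every analysis of the left part with every analysis of the right part."""
    return [l + "+" + r for l, r in product(left, right)]


if __name__ == "__main__":
    readings = compound_readings(kjempe, hoy)
    print(len(readings))  # 3 * 4 readings
    for reading in readings:
        print(reading)
```

The count grows multiplicatively with every extra part, which is why three-part compounds get out of hand so quickly.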

And it gets even worse if there's some possibility of segmenting at the
wrong place, e.g. Bokmål 'te+skje' (tea+spoon) could be mis-segmented as
'te+s+kje' (tea+epenthetic+kid goat), similarly 'bilde+liste'
(image+list) vs 'bildel+iste' (image+iced/image+ice tea).

Compare this with the ambiguity-count of the analysis given when we do
have kjempehøy in the dictionary:

kjempehøy<adj><posi><mf><sg><ind>

Only one analysis, and it's the correct one. 

So you avoid useless ambiguity by adding more compounds. Useless
ambiguity is harmful not only to the translation of that word, but also
to that of the context (given the sequence <adj> <vblex>/<n>, it's easy
to see that the second word is most likely a noun, not so with
<adj>/<n>/<vblex> <vblex>/<n>).


In addition to all that, a decompounding analysis takes a lot longer per
word than a simple analysis (you have to check all the possible ways of
segmenting the word into two parts, then three parts, etc.). Adding
full compound words also helps when decompounding compounds of
compounds (it's safer and faster to segment 'bildeliste+generator' than
'bilde+liste+generator', where you might end up with
'bildel+iste+generator').
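
To make the segmentation point concrete, here is a toy segmenter that tries every split point against a small word list. It is only an illustration of the search problem, not how lttoolbox implements compounding, and the lexicon is a hand-picked assumption:

```python
def segmentations(word, lexicon):
    """Return every way of splitting `word` into a sequence of lexicon entries."""
    if word == "":
        return [[]]
    results = []
    for end in range(1, len(word) + 1):
        head = word[:end]
        if head in lexicon:
            # Recurse on the remainder; each successful tail completes a segmentation.
            for rest in segmentations(word[end:], lexicon):
                results.append([head] + rest)
    return results


if __name__ == "__main__":
    parts_only = {"bilde", "liste", "bildel", "iste", "generator"}
    for seg in segmentations("bildelistegenerator", parts_only):
        print("+".join(seg))

    # With the full compound 'bildeliste' in the lexicon, a two-part reading appears.
    with_compound = parts_only | {"bildeliste"}
    for seg in segmentations("bildelistegenerator", with_compound):
        print("+".join(seg))
```

With only the parts in the lexicon, both the right and the wrong three-part split come out; once 'bildeliste' is listed, there is a two-part candidate that an analyser can prefer.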

And, finally, sometimes the sum is greater than the parts, e.g.
Bokmål 'kjempemessig' might be better translated to 'ovstor' or 'diger'
in Nynorsk, 'bedømmelseskommité'→'domsnemnd' etc.


In summary: Dynamic compounding leads to more ambiguity and slower
analysis, and is thus used only when there is no lexicalised analysis.
Adding lexicalised compounds improves not only analysis of those
compounds and their contexts, but also improves dynamic compounding of
longer compounds.


 BTW I've found only one similar Danish word: kæmpestor (very large). I
 don't know if there are any more.

If kæmpe- is not very productive in Danish, it might be better to
translate those words into something else (kjempelett→pærelet,
kjempegod→knippelgod?). Adding such pairs as lexicalised compounds in
the dictionaries will override dynamic compounding for those words.
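
The override can be pictured as a lookup with a fallback. This is a toy model with made-up dictionaries (the part-by-part entries are assumptions for illustration), not the actual bidix lookup code:

```python
def translate_compound(parts, whole_word_bidix, part_bidix):
    """Prefer a lexicalised whole-word entry; otherwise translate part by part."""
    whole = "".join(parts)
    if whole in whole_word_bidix:
        return whole_word_bidix[whole]
    # Unknown parts are marked with '*', as Apertium does for unknown words.
    return "".join(part_bidix.get(part, "*" + part) for part in parts)


if __name__ == "__main__":
    whole_word_bidix = {"kjempelett": "pærelet"}  # lexicalised compound
    part_bidix = {"kjempe": "kæmpe", "stor": "stor", "lett": "let"}  # toy entries
    print(translate_compound(["kjempe", "lett"], whole_word_bidix, part_bidix))
    print(translate_compound(["kjempe", "stor"], whole_word_bidix, part_bidix))
```

Only the compounds without a whole-word entry ever reach the part-by-part branch.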



[1] Dynamic compounding is when the analyser only contains the parts and
guesses how they fit together, lexicalised compounds are defined as
those we spell out completely in the dictionary.


-- 
Kevin Brubeck Unhammer

GPG: 0x766AC60C



Re: [Apertium-stuff] compounding

2012-11-08 Thread Kevin Brubeck Unhammer

Kevin Brubeck Unhammer unham...@fsfe.org writes:

[...]

 And it gets even worse if there's some possibility of segmenting at the
 wrong place, e.g. Bokmål 'te+skje' (tea+spoon) could be mis-segmented
 'te+s+kje' (tea+epenthetic+kid goat), similarly 'bilde+liste'
 (image+list) vs 'bildel+iste' (image+iced/image+ice tea).

Whoops, mis-glossed the mis-segmentation: 'bildel+iste' would mean car
part+iced or car part+iced tea.

-- 
Kevin Brubeck Unhammer

GPG: 0x766AC60C



Re: [Apertium-stuff] Paradigms in Bidixes

2012-11-12 Thread Kevin Brubeck Unhammer

Francis Tyers fty...@prompsit.com writes:

 On Sun, 11 Nov 2012 at 14:11 +0100, Per Tunedal wrote:
 Hi,
 OK. I just thought the other way around:
 
 Because coverage is so low, it would be fruitful to generate
 translations for unknown words.
 
 In the next step, I intended to add the most frequent words, bit by bit.

 Great!

 As you have pointed out, it's much more effective to have a word in the
 dictionaries than to generate it by some rule. Thus the gain is
 obviously largest from adding the most frequent compounds and
 derivations explicitly in the dictionaries. But it's still nice to get
 translations of the more rare compounds and derivations.

 Bad investment in terms of time. You want your work to have maximum, not
 minimum impact. Thus, work by frequency. Add the frequent stuff first.

As https://xkcd.com/1133/ shows, with only the 1000 most frequent words
in English you can explain rocket science ;-)
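
A rough way to see why frequency order pays off is to measure how much running text the N most frequent words cover. The sketch below assumes an already tokenised corpus; the function name is illustrative:

```python
from collections import Counter


def coverage(tokens, top_n):
    """Fraction of running tokens covered by the top_n most frequent word forms."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    covered = sum(count for _, count in counts.most_common(top_n))
    return covered / len(tokens)


if __name__ == "__main__":
    tokens = "a a a a b b c d".split()
    for n in (1, 2, 4):
        print(n, coverage(tokens, n))
```

On real corpora the curve flattens quickly, so the first few hundred entries buy far more coverage than the next few thousand.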




-- 
Kevin Brubeck Unhammer

GPG: 0x766AC60C



Re: [Apertium-stuff] Cannot commit new mode file

2012-11-21 Thread Kevin Brubeck Unhammer

Per Tunedal per.tune...@operamail.com
writes:

 Hi,
 succeeded after some trouble:

 sudo sh apertium-generate-modes modes.xml
 Gives: sh: Can't open apertium-generate-modes

 Tested:
 sudo sh autogen.sh && make
 Couldn't remove the existing mode files.

 Removed the files with:
 rm *
 cd ..
 Removed the directory with:
 rmdir

 And finally:
 sudo sh autogen.sh && make
 Worked!!

Don't use sudo before autogen; the only thing that should require sudo
is make install. If anything else requires sudo, there's likely either
a bug in the makefile, or you at some point earlier sudo'ed when you
shouldn't have ;)

(Unfortunately, if you've sudo'ed earlier, you might have to use
sudo to rm some files that have been created with root permission.)
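
To find out which files an earlier sudo left behind, you can list everything in the checkout that is not owned by your own user. A small sketch for Unix-like systems (os.getuid is not available on Windows); the function name is made up:

```python
import os


def files_not_owned_by_me(root):
    """List paths under `root` whose owner differs from the current user."""
    me = os.getuid()
    foreign = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat so that symlinks are checked themselves, not their targets
            if os.lstat(path).st_uid != me:
                foreign.append(path)
    return sorted(foreign)


if __name__ == "__main__":
    for path in files_not_owned_by_me("."):
        print(path)
```

Anything it prints is a candidate for the one-off sudo rm; an empty result means a plain make should not need sudo.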


-- 
Kevin Brubeck Unhammer

GPG: 0x766AC60C



Re: [Apertium-stuff] Cannot commit new mode file

2012-11-21 Thread Kevin Brubeck Unhammer

Kevin Brubeck Unhammer unham...@fsfe.org writes:

 Per Tunedal per.tune...@operamail.com
 writes:

 Hi,
 succeeded after some trouble:

 sudo sh apertium-generate-modes modes.xml
 Gives: sh: Can't open apertium-generate-modes

 Tested:
 sudo sh autogen.sh && make
 Couldn't remove the existing mode files.

 Removed the files with:
 rm *
 cd ..
 Removed the directory with:
 rmdir

 And finally:
 sudo sh autogen.sh && make
 Worked!!

 Don't use sudo before autogen, the only thing that should require sudo
 is make install. If anything else requires sudo, there's likely either
 a bug in the makefile, or you at some point earlier sudo'ed when you
 shouldn't have ;)

And now I see that the makefile will give the modes directory
root permission when you do sudo make install. I uploaded a fix; you
might have to sudo rm -r modes once if you've already done a sudo make
install.

-- 
Kevin Brubeck Unhammer

GPG: 0x766AC60C



Re: [Apertium-stuff] Domain and style/genre

2012-12-17 Thread Kevin Brubeck Unhammer

Per Tunedal per.tune...@operamail.com
writes:

 Hi,
 See below.
 Yours,
 Per Tunedal

 On Sun, Dec 16, 2012, at 23:07, Francis Tyers wrote:
 On Sun, 16 Dec 2012 at 14:21 +0100, Per Tunedal wrote:
  Hi,
  I consider info on primarily domain and secondly style useful for
  disambiguation. As a first step it would be very nice to be able to add
  a domain-tag to words. Adding info on style would make it possible to
  further improve translation results.
  
  The translation would improve considerably if the user could choose the
  appropriate domain when demanding a translation. Consider e.g. the
  example of translating the English word key, or similarly the French
  word clé/clef, to Swedish. If the domain is e.g.
  Tourism/accommodation/real estate or similar, the word would most likely
  translate to nyckel (to lock/unlock the door of a house). On the other
  hand if the domain is e.g. information technology (or even music) the
  word would most likely translate to tangent (on your keyboard or
  piano). Obviously, a lexical selector/disambiguator could be trained on
  a corpus from a specific domain as well, further improving the
  translation.
 
 I did this in my thesis. It's quite effective. It's possible to tune the
 vocabulary to a domain with either parallel or monolingual corpora using
 apertium-lex-tools.[1] You won't be interested in it though as it
 doesn't work with the Java version.

 What will work with the java-version? And what will not? What's the
 problem?

No one has re-implemented the apertium-lex-tools package in Java yet.

 What I would like to do is:
 - adding info about domain in the dictionaries
 - do some training on an appropriate corpus

You want to first add the domain-specific translation manually, and then
have the system automatically discover the domain-specific translation?
That sounds like duplicating work, and what do you do if the training
and dictionaries don't agree?


The way lex-tools training works is:

The English word "key" is listed in the en-sv bilingual dictionary with
both "nyckel" and "tangent" as possible translations. You then give the
bilingual dictionary and a corpus to the lex-tools training scripts,
which give you a .lrx file.

You can run the training scripts twice, once with a general corpus and
once with a music-domain corpus in order to get both a general .lrx
file and a music-domain .lrx file.

The training scripts don't need any manual specification of what domain
you're in, they learn what the best translation is from the
(domain-specific) corpus.
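
The following toy is NOT the lex-tools training algorithm; it only illustrates the idea that the same dictionary plus a different corpus yields a different preferred translation. The function name and the tiny token lists are made up for illustration:

```python
from collections import Counter


def pick_translations(candidates, target_corpus_tokens):
    """For each source word, pick the candidate seen most often in the target corpus.

    Ties go to the candidate listed first.
    """
    counts = Counter(target_corpus_tokens)
    return {
        source: max(options, key=lambda word: counts[word])
        for source, options in candidates.items()
    }


if __name__ == "__main__":
    bidix = {"key": ["nyckel", "tangent"]}  # both translations are allowed
    general_corpus = ["nyckel", "dörr", "nyckel", "hus", "tangent"]
    music_corpus = ["piano", "tangent", "tangent", "nyckel"]
    print(pick_translations(bidix, general_corpus))
    print(pick_translations(bidix, music_corpus))
```

Nobody had to label "tangent" as a music word; the preference falls out of which corpus was used.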

 - let the user choose a suitable domain (if any), as an alternative to
 the general domain

That'd be easy to do by adding a new translation mode, e.g.
en-sv_music.mode would point to a different .lrx file from the general
en-sv.mode.

 - let Apertium use info from the dictionaries and the training to solve
 ambiguities.

 BTW Would it make any difference if you trained the tagger on a domain
 corpus?

It might. 


-- 
Kevin Brubeck Unhammer

GPG: 0x766AC60C



Re: [Apertium-stuff] Video talk on Apertium

2012-12-19 Thread Kevin Brubeck Unhammer

Mikel Forcada m...@dlsi.ua.es writes:

 Jacob,
 i cannot see your video with the free software I have available. I
 posted in your entry, but you didn't answer.

 Mikel

http://downloadvimeo.com/#http://vimeo.com/54075259 will download the
mp4 file.

For future reference: Youtube encodes videos as .webm (Flash-free
playback in a non-patent-encumbered codec).

 On 12/13/2012 12:28 PM, Jacob Nordfalk wrote:

 Hi there,
 
 A film of a talk I gave on Apertium and MT at the International
 Congress University (of the World Congress of Esperanto) in 2011 has
 finally been prepared and published.
 
 https://plus.google.com/114820443085046080944/posts/ZtG7zjfD1ns
 
 The talk is in Esperanto. Enjoy and share :-)
 

 Jacob

-- 
Kevin Brubeck Unhammer

GPG: 0x766AC60C



Re: [Apertium-stuff] Apertium for Android version 1.0 soon to be released

2012-12-25 Thread Kevin Brubeck Unhammer

Xavi Ivars x...@infobenissa.com writes:

 Hi,


 I've tested it with two old (SonyEricsson Xperia X10 mini and
 Samsung Galaxy S) mobiles and a Nexus7 tablet.

 With the Nexus7 = awesome! But not many differences from the old
 version. I guess 2Gb of RAM make this feature not as important as on
 other, older devices.

 With the GalaxyS = really-really-really awesome! The difference
 between this version and the last one that loaded the whole files is
 really big. The translations are really fast, including the first one.

 With the X10 mini = I couldn't install any package. I got some space
 errors (the device has really small storage). Also, because of bad
 data connectivity, there was an IOException while downloading the lang
 package, and the exception message was shown in the target
 textarea.

 In all 3 devices I got the bug of two icons in the app launcher.

 As Kevin, I think I would remove all the metainformation about the
 translation process.

Another thing I noticed now (I think this may have been a problem before
too): on second startup the translation direction said "from → to",
where clicking the arrow gave "There is no language direction from
from to to", and clicking "to" gave the heading "Translated to"
and no language. Only clicking "from" has an effect. Perhaps the
buttons should be grayed out or something when they're not useful.

Also, a little UI request: if there's only one possible direction, it
would make sense to auto-pick that; and if there's only one possible
"to" for a certain "from", that should be auto-picked on selecting that
"from".

 About the permissions, what I would do is try to require as few
 permissions as possible.

I mostly agree (either see what feature requests people make, or at
least have a simple base that others can build on). I think SD card
installation would be good though; on my Desire I had to remove some
stuff before being able to install. The app does 1) work well on older
phones and 2) work without a net connection, so I'm guessing SD install
would make it fit well into the older-phone-market (as well as work
great on newer phones).

 Anyway, THANKS A LOT Jacob for your work with this app. 

Yes, thanks Jacob – and Mikel and Arink; offline translation on a phone
is quite awesome :-)

-- 
Kevin Brubeck Unhammer

GPG: 0x766AC60C



Re: [Apertium-stuff] Apertium for Android version 1.0 soon to be released

2012-12-26 Thread Kevin Brubeck Unhammer

Jacob Nordfalk jacob.nordf...@gmail.com
writes:

 2012/12/25 Kevin Brubeck Unhammer
 unham...@fsfe.org

 Xavi Ivars x...@infobenissa.com writes:
 
  In all 3 devices I got the bug of two icons in the app
 launcher.
 

 The two Apertium icons were the two versions: Arink's work, and a
 simpler derivative where code from Mikel's 'Apertium-Caffeine' is fused
 with Arink's code and simplified a lot.

Ah, that cleared up a lot :-)

 Arink's is the more sophisticated; among other things it has two separate
 buttons for "from" and "to", where the simple one just has a single
 "Choose languages" button. But with sophistication also comes complexity:
 Arink's code uses databases and consists of ~25 classes and, as a result
 of the sophistication, will be hard to use as example code for others to
 follow/import into their own projects.

 Another consequence of the complexity was that I had trouble making it
 react robustly to screen turning and application restarts. 

 In the end I resolved to fuse the code into 5 classes with no
 databases involved.

:)


 on second startup the translation direction said "from → to",
 where clicking the arrow gave "There is no language direction from
 from to to", and clicking "to" gave the heading "Translated to"
 and no language. Only clicking "from" has an effect. Perhaps the
 buttons should be grayed out or something when they're not useful.
 
 Also, a little UI request: if there's only one possible direction, it
 would make sense to auto-pick that; and if there's only one possible
 "to" for a certain "from", that should be auto-picked on selecting
 that "from".
 


 This is Arink's code. Sorry for the confusion!
 Please update the app or choose the other icon. 
 From inside the basic activity you can choose 'Show extended example'
 after pressing the MENU button to use Arink's code.

  

 
  About the permissions, what I would do is try to require as few
  permissions as possible.
 
 I mostly agree (either see what feature requests people make, or at
 least have a simple base that others can build on). I think SD card
 installation would be good though; on my Desire I had to remove some
 stuff before being able to install. The app does 1) work well on older
 phones and 2) work without a net connection, so I'm guessing SD install
 would make it fit well into the older-phone-market (as well as work
 great on newer phones).
 

 I've changed the app so it can be moved to SD card. This might make it
 work on older phones with memory constraints.

 Could you download again and try if you can move it to SD card and see
 if it helps?
 https://apertium.svn.sourceforge.net/svnroot/apertium/builds/apertium-android/

I'm not quite sure how to read the space requirements:

                           Total    App      Data
On phone, no lang.pair     2.05MB   2.05MB   0.00KB
On SD Card, no lang.pair   1.65MB   1.64MB   4.00KB
On phone, one lang.pair    9.93MB   2.05MB   7.88MB
On SD Card, one lang.pair  9.52MB   1.64MB   7.88MB

-- 
Kevin Brubeck Unhammer

GPG: 0x766AC60C



Re: [Apertium-stuff] Installation troubleshooting: cg-comp not found

2013-01-14 Thread Kevin Brubeck Unhammer

Federico Gobbo federico.go...@univaq.it
writes:

 Hi there

 [this message is directed in particular to the maintainer
 of the pair eo-es]

 I am installing Apertium on Ubuntu and I succeeded
 to follow the instructions from SVN without any problem
 until step 5:

 http://wiki.apertium.org/wiki/Apertium_on_Ubuntu

 I decided to choose the language pairs. I did not find
 any problem with

 eo-ca
 eo-en

 but when I have chosen

 eo-es

 the make file (correctly created, according to my shell),
 says that cg-comp was not found.

Install vislcg3. See:
http://wiki.apertium.org/wiki/Apertium_and_Constraint_Grammar#Installing_VISL_CG3

(I see the troubleshooting page only mentioned cg-proc, added cg-comp
now.)

-- 
Kevin Brubeck Unhammer

GPG: 0x766AC60C



Re: [Apertium-stuff] Why is interchunk done after transfer on the target language ?

2013-01-14 Thread Kevin Brubeck Unhammer

Bernard Chardonneau bechapert...@free.fr
writes:

 The question does not explain the whole problem. If in the source language,
 there is a strict way for ordering words and in the target language, another
 way, as strict for ordering them, I don't see any problem doing the transfer
 step first.

 But if the source language does not impose a way to put words (there is
 another way to know where is the subject or the object) but the target
 language does, it may be more simple to reorder words on the source
 language.

[...]

In apertium-interchunk, you can reorder chunks. These have been created
by apertium-transfer. If you run interchunk before transfer, you won't
have any chunks to reorder.

Since interchunk operates on chunks, you have access to neither
source nor target language lemmas, only the chunk tags. In
apertium-transfer, you have access to both source and target language
lemmas. If I understand you correctly, I think you want to do more of
your changes in apertium-transfer, and less in interchunk?
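
As a mental model only (real interchunk rules are written in Apertium's XML rule format, not Python), the sketch below reorders chunks purely by their tags and never looks inside them. The tag names "nom", "verb" and "acc" and the chunk names are hypothetical:

```python
from dataclasses import dataclass

# Hypothetical tag names and their desired order in the target language.
ROLE_ORDER = {"nom": 0, "verb": 1, "acc": 2}


@dataclass
class Chunk:
    name: str
    tags: tuple
    words: tuple  # carried along, never inspected when reordering


def role_rank(chunk):
    """Rank a chunk by the first role tag it carries; unknown chunks go last."""
    for tag in chunk.tags:
        if tag in ROLE_ORDER:
            return ROLE_ORDER[tag]
    return len(ROLE_ORDER)


def reorder_by_role(chunks):
    """Stable reordering that only consults chunk tags, as an interchunk stage would."""
    return sorted(chunks, key=role_rank)


if __name__ == "__main__":
    sentence = [
        Chunk("np", ("acc",), ("object-words",)),
        Chunk("vp", ("verb",), ("verb-words",)),
        Chunk("np", ("nom",), ("subject-words",)),
    ]
    for chunk in reorder_by_role(sentence):
        print(chunk.name, chunk.tags, chunk.words)
```

Anything the reordering needs to know (here: case) therefore has to be put on the chunk as a tag by the earlier transfer stage.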

-- 
Kevin Brubeck Unhammer

GPG: 0x766AC60C



Re: [Apertium-stuff] Installation troubleshooting: cg-comp not found

2013-01-15 Thread Kevin Brubeck Unhammer

Hèctor Alòs i Font h.a...@esperanto.cat
writes:

 Dear Federico,

 As far as I can see, es-eo has an experimental use of Constraint Grammar.
 I tried to use CG for better morphological disambiguation, but not
 successfully enough to launch a new version. That is why there are two
 modes: es-eo and es-eo-no_cg (which should be more or less the same as
 the official version).

It should be possible to have two make goals, e.g. 

./configure --with-cg

so that people/distros get the choice of whether to install another
dependency or not.

-- 
Kevin Brubeck Unhammer

Sent from my emacs



Re: [Apertium-stuff] Installation troubles (second and last part)

2013-01-15 Thread Kevin Brubeck Unhammer

Federico Gobbo federico.go...@univaq.it
writes:

 Briefly:

 @Héctor: I compiled VISLCG3 as advised, but make again failed to find
 cg-proc :(
 How can I compile without CG, at least for the moment?
 I don't see a file apertium-eo-es_no_cg or similar.

Did you remember to sudo make install after compiling CG?

Also, did you re-run autogen.sh in apertium-eo-es before running make
there?

 @Kevin: thanks for your help. I didn't understand if the
 two goals you mentioned (i.e., ./configure --with-cg) are
 a proposal or an actual option I can choose.

Sorry for the confusion, that was directed at Hector as an idea. You
can't choose --with-cg.

 @all: I had another error in the language pair eo-fr:

 ---snip---
 NOTE: lttoolbox-java (used for bytecode accelerated transfer) is missing
   Therefore the following will fail (but it's OK)

 apertium-preprocess-transfer-bytecode-j apertium-eo-fr.fr-eo.t1x 
 fr-eo.t1x.class
 /bin/bash: apertium-preprocess-transfer-bytecode-j: comando non trovato
 make[1]: *** [fr-eo.t1x.bin] Errore 127
 make[1]: uscita dalla directory /home/riko/apertium/apertium-eo-fr
 make: *** [all] Errore 2
 ---snap---

 What does it mean? Should I ignore it? The good news is
 that I could add

You can ignore it. It may provide faster transfer, but the quality
of the translation should be the same, I think.

 ca-it
 es-it

 straightforwardly in my good old Lubuntu machine, so more or
 less now I am prepared to start contributing to Apertium.

 Last, a tickle question: why some language pairs use CG if
 there is already the TSX_format for tagging (if I read the wiki
 correctly)? Advantages? Costs? Thanks in advance,

CG is a rule-based disambiguation system, it allows for writing very
nuanced / powerful rules for selecting the right reading, and can match
e.g. sequences from the beginning of the sentence to the end (and
beyond, actually …). It's probably Turing Complete.

apertium-tagger, which uses the TSX format, is a statistical
disambiguator; it runs faster, comes installed with apertium, and lets
you train on corpora instead of writing lots of rules. But it only
matches two-word sequences.


-Kevin



Re: [Apertium-stuff] How to handle differences between languages

2013-01-28 Thread Kevin Brubeck Unhammer

Per Tunedal per.tune...@operamail.com
writes:

 Sorry, not verbs! Should be adjectives. Some adjectives that are used to
 describe animate, living things have masculine forms, like rädd:
 rädde instead of rädda. If I remove such forms in sv dix for adjectives
 that are not used about animate things, I might get into trouble.

Well, you don't really have to remove them; even if abelsk is only
used to describe very inanimate mathematical structures, it doesn't hurt
that sv.dix is in theory capable of producing abelske.

If you do remove them, you need to tag them differently in bidix, so
that transfer will know whether it's dealing with an adjective capable
of the animate form, or one with no animate forms. I'd guess that that
would involve more work. The question is: if someone uses abelsk about
something _animate_ in Danish, does it hurt that you output abelske
[animate_noun]?

 Besides: Norwegian has masculine, feminine and neuter nouns, but
 Swedish and Danish have common and neuter. How to handle that in my
 future pair no-sv (nb/nn-sv)?

For nouns, it's easy:

<e><p><l>pige<s n="n"/><s n="ut"/></l><r>jente<s n="n"/><s n="f"/></r></p></e>

For adjectives, a transfer rule just looks at the noun to the right and
picks the correct gender. 

I guess nouns in sv-da bidix are not tagged with animacy. The animate
ones probably should have a tag so that transfer rules can decide
whether to use the masculine adjective.
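
The agreement step described above (look at the noun to the right, copy its gender onto the adjective) can be sketched in Python. This is only an illustration of the logic, not an Apertium transfer rule: real rules are written in the XML transfer format, and the (lemma, tags) tuples, the GENDERS set and the function name here are all assumptions made for the example.

```python
# Hypothetical, simplified representation: each word is (lemma, list_of_tags).
GENDERS = {"m", "f", "nt", "ut"}


def agree_adjectives(words):
    """Give each adjective the gender of the nearest noun to its right."""
    result = []
    for i, (lemma, tags) in enumerate(words):
        if "adj" in tags:
            gender = None
            # Scan rightwards for the first noun and take its gender tag.
            for _, later_tags in words[i + 1:]:
                if "n" in later_tags:
                    gender = next((t for t in later_tags if t in GENDERS), None)
                    break
            if gender is not None:
                # Replace any existing gender tag; append if there was none.
                tags = [gender if t in GENDERS else t for t in tags]
                if gender not in tags:
                    tags.append(gender)
        result.append((lemma, tags))
    return result


phrase = [("liten", ["adj", "ut"]), ("jente", ["n", "f"])]
print(agree_adjectives(phrase))
```

An animacy tag on nouns could be handled the same way: the rule would read it off the noun and choose the masculine adjective form only when it is present.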

-Kevin



Re: [Apertium-stuff] Swedish and Danish pronouns

2013-01-29 Thread Kevin Brubeck Unhammer

Per Tunedal per.tune...@operamail.com
writes:

 Hi,
 I'm stuck. I can't get the translation of Swedish pronouns to Danish
 to work. Specifically, I've introduced the many Swedish variations of
 saying "I" and "you" by using an expression in the third person. I've
 tried treating the possessive for somliga as a genitive, causing #
 instead of *. But the possessive is treated separately for the other
 personal pronouns that were originally present. And I am trying to
 translate somliga to the Danish du (although, in rare cases, it can
 refer to ni, 3rd person plural - both "you" in English!)


[...]

That was a bit overwhelming. Take one problematic sentence, and show its
output in the various stages.

-- 
Kevin Brubeck Unhammer

GPG: 0x766AC60C



Re: [Apertium-stuff] Swedish and Danish pronouns

2013-01-30 Thread Kevin Brubeck Unhammer

Per Tunedal per.tune...@operamail.com
writes:

 Hej Keld,
 thank you for the translation!
 Unfortunately it doesn't work:

 Somliga har det bra som kan sitta där i skuggan och läsa.

 ^Somliga/Somliga<prn><pers><p3><un><pl><nom>/Somliga<prn><pers><p3><un><pl><acc>$
 ^har/har<abbr>/ha<vbhaver><pres><actv>$
 ^det/den<det><def><nt><sg>/det<prn><pers><p3><nt><sg><nom>/det<prn><pers><p3><nt><sg><acc>$
 ^bra/bra<adv>$ ^som/som<cnjsub>/som<prn><rel><un><sp><nom>$
 ^kan/kunna<vbmod><pres><actv>$ ^sitta/sitta<vblex><inf><actv>$
 ^där/där<adv>/där<cnjsub>$ ^i/i<pr>$
 ^skuggan/skugga<n><ut><sg><def><nom>/skugga<n><ut><sg><def><cmp><compound-only-L>/skugga<n><ut><sg><def><nom><compound-R>$
 ^och/och<cnjcoo>$ ^läsa/läsa<vblex><inf><actv>$^./.<sent>$

 ^Somliga<prn><pers><p3><un><pl><nom>$ ^ha<vbhaver><pres><actv>$
 ^det<prn><pers><p3><nt><sg><nom>$ ^bra<adv>$
 ^som<prn><rel><un><sp><nom>$ ^kunna<vbmod><pres><actv>$
 ^sitta<vblex><inf><actv>$ ^där<adv>$ ^i<pr>$
 ^skugga<n><ut><sg><def><nom>$ ^och<cnjcoo>$
 ^läsa<vblex><inf><actv>$^.<sent>$

 ^Somliga<prn><pers><p3><un><pl><nom>$ ^ha<vbhaver><pres><actv>$
 ^det<prn><pers><p3><nt><sg><nom>$ ^bra<adv>$
 ^som<prn><rel><un><sp><nom>$ ^kunna<vbmod><pres><actv>$
 ^sitta<vblex><inf><actv>$ ^där<adv>$ ^i<pr>$
 ^skugga<n><ut><sg><def><nom>$ ^och<cnjcoo>$
 ^läsa<vblex><inf><actv>$^.<sent>$

 ^Somliga<prn><pers><p3><un><pl><nom>/@Somliga<prn><pers><p3><un><pl><nom>$
 ^ha<vbhaver><pres><actv>/have<vbhaver><pres><actv>$
 ^det<prn><pers><p3><nt><sg><nom>/det<prn><pers><p3><nt><sg><nom>$
 ^bra<adv>/godt<adv>$
 ^som<prn><rel><un><sp><nom>/som<prn><rel><un><sp><nom>$
 ^kunna<vbmod><pres><actv>/kunne<vbmod><pres><actv>$
 ^sitta<vblex><inf><actv>/sidde<vblex><inf><actv>$ ^där<adv>/der<adv>$
 ^i<pr>/i<pr>$ ^skugga<n><ut><sg><def><nom>/skygge<n><ut><sg><def><nom>$
 ^och<cnjcoo>/og<cnjcoo>$
 ^läsa<vblex><inf><actv>/læse<vblex><inf><actv>$^.<sent>/.<sent>$

The @ is introduced by bidix. Either somliga is not in there at all, or
it's in there, but with the wrong main part-of-speech tag, or misspelt
or something.
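
To find such entries quickly in a long biltrans dump, something like the following Python sketch may help. It assumes the usual stream format, where each lexical unit is written ^source/target$ with tags in angle brackets, and it ignores escaped characters; the function name and the sample string are made up for illustration.

```python
import re

# One lexical unit in biltrans output: ^source/target$
UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def missing_in_bidix(biltrans_output):
    """Return the source-side readings whose target side starts with '@'."""
    return [src for src, tgt in UNIT.findall(biltrans_output)
            if tgt.startswith("@")]


sample = (
    "^Somliga<prn><pers><p3><un><pl><nom>/@Somliga<prn><pers><p3><un><pl><nom>$ "
    "^ha<vbhaver><pres><actv>/have<vbhaver><pres><actv>$"
)
print(missing_in_bidix(sample))
```

Each reading it returns can then be compared tag by tag with the left side of the corresponding bidix entry.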

-- 
Kevin Brubeck Unhammer

GPG: 0x766AC60C



Re: [Apertium-stuff] Swedish and Danish pronouns

2013-01-30 Thread Kevin Brubeck Unhammer

Per Tunedal per.tune...@operamail.com
writes:

 Hi again,
 one more example:
 Det kan undertecknad självfallet inte alls instämma i.

 ^Det/Den<det><def><nt><sg>/Det<prn><pers><p3><nt><sg><nom>/Det<prn><pers><p3><nt><sg><acc>$
 ^kan/kunna<vbmod><pres><actv>$
 ^undertecknad/underteckna<vblex><pp><ut><sg><ind>/undertecknad<prn><pers><p3><ut><sg><nom>/undertecknad<prn><pers><p3><ut><sg><acc>$
 ^självfallet/*självfallet$ ^inte/inte<adv>$
 ^alls/all<prn><def><ut><sg><gen>/all<prn><def><ut><sg><gen><compound-R>$
 ^instämma/*instämma$ ^i/i<pr>$^./.<sent>$

 ^Det<prn><pers><p3><nt><sg><nom>$ ^kunna<vbmod><pres><actv>$
 ^undertecknad<prn><pers><p3><ut><sg><nom>$ ^*självfallet$ ^inte<adv>$
 ^all<prn><def><ut><sg><gen>$ ^*instämma$ ^i<pr>$^.<sent>$

Should undertecknad really be analysed as a pronoun??

 ^Det<prn><pers><p3><nt><sg><nom>$ ^kunna<vbmod><pres><actv>$
 ^undertecknad<prn><pers><p3><ut><sg><nom>$ ^*självfallet$ ^inte<adv>$
 ^all<prn><def><ut><sg><gen>$ ^*instämma$ ^i<pr>$^.<sent>$

 ^Det<prn><pers><p3><nt><sg><nom>/Det<prn><pers><p3><nt><sg><nom>$
 ^kunna<vbmod><pres><actv>/kunne<vbmod><pres><actv>$
 ^undertecknad<prn><pers><p3><ut><sg><nom>/underskriver<prn><pers><p3><ut><sg><nom>$
 ^*självfallet/*självfallet$ ^inte<adv>/ikke<adv>$
 ^all<prn><def><ut><sg><gen>/al<prn><def><ut><sg><gen>$
 ^*instämma/*instämma$ ^i<pr>/i<pr>$^.<sent>/.<sent>$

 ^Det<prn><pers><p3><nt><sg><nom>$ ^kunne<vbmod><pres><actv>$
 ^underskriver<prn><pers><p3><ut><sg><nom>$ ^*självfallet$ ^ikke<adv>$
 ^al<prn><def><ut><sg><gen>$ ^*instämma$ ^i<pr>$^.<sent>$

 Det kan #underskriver *självfallet ikke als *instämma i.

The # is introduced because there's no
^underskriver<prn><pers><p3><ut><sg><nom>$ in da.dix (and I'd say that's
a good thing!).
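
As a quick reminder of what the marks in the final output mean, here is a small Python sketch that lists marked words in a translated sentence. The meanings in MARKS follow the usual Apertium conventions as I understand them; the function name and the regular expression are assumptions for this example, not part of any Apertium tool.

```python
import re

# Usual meaning of the debugging marks in Apertium output (stated as an
# assumption here; check the Apertium documentation for your version).
MARKS = {
    "*": "unknown to the source-language analyser",
    "@": "no matching entry in the bidix",
    "#": "target-language generator cannot produce this form",
}


def explain_marks(translation):
    """Return (word, explanation) for every word prefixed with *, @ or #."""
    pattern = r"(?<!\S)([*@#])([\w-]+)"
    return [(word, MARKS[mark]) for mark, word in re.findall(pattern, translation)]


for word, reason in explain_marks(
    "Det kan #underskriver *självfallet ikke als *instämma i."
):
    print(word, "->", reason)
```

Here underskriver is reported as a generation problem (fix da.dix or the bidix target), while självfallet and instämma are simply missing from the Swedish analyser.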

-- 
Kevin Brubeck Unhammer

GPG: 0x766AC60C



Re: [Apertium-stuff] Swedish and Danish pronouns

2013-01-30 Thread Kevin Brubeck Unhammer

Per Tunedal per.tune...@operamail.com
writes:

 Hi,
 well, I think it's a good idea to be able to translate the not uncommon
 formal Swedish undertecknad instead of jag = I.

Hmm, guess that makes sense …

 I've changed the translation to undertegnede, as Keld suggested. But
 it still doesn't work.

 As far as I can see, the requested form is available in the Danish
 monodix. What am I doing wrong?

 Before da monodix:
 ^undertecknad<prn><pers><p3><ut><sg><nom>/undertegnede<prn><pers><p3><ut><sg><nom>$

 da monodix:

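 <pardef n="undertegnede__prn">
   <e a="PT">   <p><l/><r><s n="prn"/><s n="pers"/>
   <s n="p3"/><s n="ut"/><s n="sg"/><s n="nom"/></r></p></e>
   <e>   <p><l/><r><s n="prn"/><s n="pers"/><s n="p3"/>
   <s n="ut"/><s n="sg"/><s n="acc"/></r></p></e>
   <e>   <p><l>s</l><r><s n="prn"/><s n="pers"/><s n="p3"/>
   <s n="ut"/><s n="sg"/><s n="gen"/></r></p></e>
   <e>   <p><l/><r><s n="prn"/><s n="pers"/><s n="p3"/>
   <s n="ut"/><s n="pl"/><s n="nom"/></r></p></e>
   <e>   <p><l/><r><s n="prn"/><s n="pers"/><s n="p3"/>
   <s n="ut"/><s n="pl"/><s n="acc"/></r></p></e>
   <e>   <p><l>s</l><r><s n="prn"/><s n="pers"/><s n="p3"/>
   <s n="ut"/><s n="pl"/><s n="gen"/></r></p></e>
 </pardef>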

And what does the call to the pardef look like?


-Kevin



Re: [Apertium-stuff] Swedish and Danish pronouns

2013-01-30 Thread Kevin Brubeck Unhammer

Per Tunedal per.tune...@operamail.com
writes:

 Hi again,
 bidix:
 <e r="LR" a="PT">   <p><l>somliga<s n="prn"/><s n="pers"/>
 <s n="p3"/><s n="pl"/></l><r>nogen<s n="prn"/><s n="ut"/><s n="p3"/>
 <s n="sg"/></r></p></e>

 I'm trying to translate a third-person plural pronoun into another
 pronoun in third person singular, common gender (utrum). Maybe this
 isn't the way to do it?

It should work. 


-Kevin



Re: [Apertium-stuff] Swedish and Danish pronouns

2013-01-31 Thread Kevin Brubeck Unhammer

Per Tunedal per.tune...@operamail.com
writes:

 Hi again,
 the call:
 <e lm="undertegnede" r="RL" a="PT"><i></i><par
 n="undertegnede__prn"/></e>

The lemma and form are missing; that is, nowhere do you add
undertegnede to <l> or <r>. If it's the same in all forms, the
simplest would be to add it to the <i> (which is shorthand for
<p><l>…</l><r>…</r></p>).

Remember that the lm attribute is, to lttoolbox, regarded as a comment.
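
To make the <i> shorthand concrete, here is a small Python sketch that rewrites <i>text</i> as the equivalent <p><l>text</l><r>text</r></p>. It is only an illustration using the standard xml.etree module, not how lttoolbox itself processes entries; the function name is made up, it handles plain text inside <i> only, and the entry in `fixed` is my guess at what the corrected call would look like.

```python
import xml.etree.ElementTree as ET


def expand_identity(entry_xml):
    """Rewrite every <i>text</i> in a dix entry as <p><l>text</l><r>text</r></p>."""
    entry = ET.fromstring(entry_xml)
    for index, child in enumerate(list(entry)):
        if child.tag == "i":
            pair = ET.Element("p")
            left = ET.SubElement(pair, "l")
            right = ET.SubElement(pair, "r")
            # Identity: the same string on the surface and lexical side.
            left.text = right.text = child.text
            pair.tail = child.tail
            entry.remove(child)
            entry.insert(index, pair)
    return ET.tostring(entry, encoding="unicode")


# Assumed fix: the stem goes inside <i>; lm stays purely informational.
fixed = ('<e lm="undertegnede" r="RL" a="PT">'
         '<i>undertegnede</i><par n="undertegnede__prn"/></e>')
print(expand_identity(fixed))
```

With the stem in place, the paradigm's empty <l/> gives undertegnede and its <l>s</l> gives undertegnedes; with an empty <i></i>, as in the quoted entry, there is no stem at all for the paradigm to attach to.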

-Kevin



