Tanmai Khanna
wrote:
> the analyser sees wordbound blanks as normal blanks,
So currently if I have the multiword "i dag", it'll recognize
"idag" but it won't recognize "i dag"? (And I suppose if
I have the non-multiword "today" it won't recognize "today".)
One possibility might be to have
Tanmai Khanna
wrote:
>> So currently if I have the multiword "i dag", it'll recognize
> "idag" but it won't recognize "i dag"? (And I suppose if
> I have the non-multiword "today" it won't recognize "today".)
>
> Exactly, but even when it recognises "idag", the will probably
> be lost because
Congrats
5.7% WER is pretty nice =D
signature.asc
Description: PGP signature
___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
Or just call the NMT system "NMT+apertium" =P
Zanga Chimombo
wrote:
> I do not mean to be unduly polemic by questioning the methodology in
> choosing what to compare, neither do I want to overlook the shortfalls
> of Apertium/ RBMT, however, if Apertium was "good enough" to create
> corpora
Woohoo congrats and thanks for all the hard work Tanmai and Tino =D
The superblank issues have been a pain for quite some time.
How does it work with transfer now, what are the semantics of things
like or just ?
Tanmai Khanna
wrote:
> we no longer need the user to be worried about blank
> positions in transfer rules. The latest update to the apertium code makes
> it such that <b pos="X"/> is now the same as <b/>. You can change the <b pos="X"/> in your transfer rules to just <b/> and it'll work.
>
> Now, the only thing you
Tanmai Khanna
wrote:
> So what I'll try to do, is after the blanks are collected, lets say X is
> the number of source LUs in the pattern and Y is the number of output LUs.
> If X = Y then we can keep them in the same place, if X < Y, then we can
> keep them in the first X gaps the rest can be
Zanga Chimombo
wrote:
> One of the processes that occurs in one of the languages I am dealing
> with is "nk-" becoming "ng-"
>
> I thought I would be able to fix this using the post generator here:
> https://gitlab.com/zangaphee/CiBantu/-/blob/master/twoc/apertium-yao/apertium-yao.post-yao.dix
>
Tanmai Khanna
wrote:
> Thanks Unhammer!
> So now we have three kinds of units: block tags, superblanks, and wordbound
> blanks. Block tags are hard breaks in the text, wordbound blanks move
> around with words, and superblanks are tags that aren't hard breaks but not
> attached to words (such as
Tanmai Khanna
wrote:
> I always thought that's the default behaviour. That if some blanks aren't
> explicitly printed in the transfer rules then they're flushed. I'll check
> it out, but it should be that.
The old behaviour has been to just throw away anything that's eaten by
a rule but not
(Antonio: I forwarded your message to apertium-stuff since you're more
likely to get help there)
Are the TMX tools still used by anyone?
Start of forwarded message
From: Antonio Giovanni Contarino
Date: Sat, 19 Sep 2020 14:46:39 +0200
To:
Tanmai Khanna
wrote:
> *making trimming the norm and having the option of
> eliminating it, or making eliminating trimming the norm and having the
> option of activating it, or to have partial trimming, as discussed later.*
I'd vote for keeping trimming the norm, implementing the project
Flammie A Pirinen wrote:
>> 4. Weighting the monodix will take more compile time than just trimming it.
>
> Some numbers would be interesting, I think both are quite heavy and we
> don't do much further processing in finite-state algebra (/hfst space)
> so the weighted models won't blow up. In
Tanmai Khanna
wrote:
> Here's a timing test for weighted dictionaries.
> On apertium-eng-kaz:
>
> 1.
> real 0m4.257s
> 2.
> real 0m7.990s
With nob→nno plain lt-trim 1. takes 33s
whereas the long script in 2. takes 45s.
File size increases from 1.2M to 5.5M, seems acceptable (the unweighted,
Francis Tyers wrote:
> Sourcehut is a free/open-source "forge" type thing run by Drew DeVault.
> They have mailing lists.
>
> Our current mailing lists are with SourceForge and all of the terrible
> stuff that goes with that.
>
> Here is a link:
>
> https://lists.sr.ht/
>
> What do people
⭐
Francis Tyers wrote:
> fust
> ferro
>
> ->
>
>
>
>
>
>
> These could then be included into the .lrx file by the Makefile, or
> a separate, monolingual, file could be another argument to lrx-comp.
That's a great idea =D
I came across this article about various FOSS tools used by journalists
digging through a data leak:
https://www.icij.org/investigations/luanda-leaks/how-we-mined-more-than-715000-luanda-leaks-records
where they mention using Apertium to avoid sharing sensitive information:
With more than
Flammie A Pirinen wrote:
> Hi all,
>
> I've written a handful of apertium-fin-* prototypes and I usually end up
> spending way too much time with all the useless subclasses of proper
> nouns we have (cogs, ants, als, tops, orgs, and to top all that,
> sometimes ms and fs for some extra
"Bernard Chardonneau"
wrote:
> a solution could be to give both source code of Apertium tools
> and source code of system libraries it uses. These libraries would be
> compiled with Apertium tools using them and object files stored outside
> /usr/lib . So, there would not be compatibility
Kevin Brubeck Unhammer wrote:
> it's probably the impossiblest to compile the same way, just put it into
that line was missing a "not" there :-)
VIVEK VICKY
wrote:
> Hello everyone,
> The eng-spa parallel corpora I am using (http://www.statmt.org/europarl/,
> http://www.statmt.org/wmt13/training-parallel-nc-v8.tgz) have empty lines
> in either language due to splitting of a sentence into two or merging of
> two sentences after the
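One possible way to handle the empty lines described above is to drop every line pair where either side is empty, so the two files stay aligned. This is only a sketch, not something from the thread: the function name `drop_empty_pairs` and the four file arguments are hypothetical, and it assumes that no line in either corpus contains a tab character.

```shell
# Drop sentence pairs where either side is empty or whitespace-only,
# keeping the two output files line-aligned.
# Usage: drop_empty_pairs SRC TGT OUT_SRC OUT_TGT  (all names are placeholders)
# Assumption: no line in SRC or TGT contains a tab character.
drop_empty_pairs() {
    tmp="$3.tmp.$$"
    # Join the files side by side, then keep rows where both columns have content.
    paste "$1" "$2" |
        awk -F '\t' '$1 ~ /[^[:space:]]/ && $2 ~ /[^[:space:]]/' > "$tmp"
    # Split the surviving rows back into two aligned files.
    cut -f1 "$tmp" > "$3"
    cut -f2 "$tmp" > "$4"
    rm -f "$tmp"
}
```

Note that this throws away the non-empty half of each broken pair as well, so it trades some data for a clean alignment.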
Hi,
I've tagged some new releases of nno, nob and apertium-nno-nob.
Like before[0], the work has been funded by the Norwegian Ministry of
Culture via Nynorsk pressekontor (NPK) and the Norwegian News Agency,
now with direct commits from contributors Anja, Victoria and Hallvard of
NPK :-)
One
Hèctor Alòs i Font
wrote:
> I am more sceptical about the need to distinguish between toponyms and
> hydronyms. In some languages one will have an article and the other will
> not, but these are rare cases. On the other hand, we do not distinguish
> between countries (or regions) and cities,
Hi all,
I made a thing:
https://apertium.trigram.no/?dir=nob-nno=Vi%20liker%20enten%20%C3%A5%20fortsette%20%C3%A5%20bygge%20n%C3%A5r%20vi%20blant%20annet%20s%C3%B8ker%20forskjellen%20mens%20dere%20er%20uenige.#translation
(try toggling the various "Style preferences")
Norway in general has a
Volda University College recently had a seminar on MT in education –
half of it dedicated to Apertium:
https://nynorsksenteret.no/blogg/program-for-digital-fagdag-om-omsetjingsteknologi
They also link to an article
https://nynorsksenteret.no/uploads/images/Artikkel-Aasbrenn-okt2020.pdf
titled
Rajarshi Roychoudhury
wrote:
> Bhojpuri and Hindi are very closely related languages.
> As far as I know (correct me if I am wrong), apart from some minor
> phonetic changes they can be considered identical.
Seems like a good fit for Apertium then :) considering one of the most
Tino Didriksen
wrote:
> The monolingual packages install many more modes, because they are used for
> further development. So you can get morph from those. But biltrans is not
> normal to want if you aren't a developer, and thus building from source.
The reasoning is that
- people who want to
Hi,
Does anyone have any smart methods for protecting text in quotes?
It's certainly possible with a little pre-/postprocessing like
https://github.com/apertium/apertium/issues/32
(a bit less safe if your language uses "" instead of «», I suppose
you could restrict it to only match fairly short
Jonathan Washington
wrote:
> As to Andrey's question concerning kaz-rus not working because of a
> missing .t4x file, that sounds like a legit packaging error, which I'm
> not sure how to fix (I really should learn...)
That was fixed in
Flammie A Pirinen wrote:
> chunk names had an effect on the
> translation
Postchunk by default applies the chunk case onto the lemma.
Make sure you compile (`make -j langs`) before testing, and preferably
before each commit as well. The version on github doesn't compile right
now:
apertium-eng-sat.eng-sat.t1x:69: element rule: validity error : Element
rule content does not follow the DTD, expecting (pattern , action), got
> please unsubscribe me
try https://sourceforge.net/projects/apertium/lists/apertium-stuff/unsubscribe
> please unsubscribe me
https://sourceforge.net/projects/apertium/lists/apertium-stuff/unsubscribe
> Please unsubscribe me from this list. Or please tell me how to.
https://sourceforge.net/projects/apertium/lists/apertium-stuff/unsubscribe
> The link in that earlier email is dead, so I can't see what the original
> script was doing, but based on the name it might have just been replacing
> with , in which case, if you still have that script, you could
> just edit it to replace with .
Wops, I should've attached it …
These days I
Hi all,
If anyone uses Emacs out there, the CG package now does error/warning
highlighting out-of-the-box:
https://wiki.apertium.org/w/images/0/03/Cg-flymake.gif
There are also new Toolbar Buttons for those who like clicking things,
letting you open the input editor, run the grammar and filter
TL;DR: Put `export LT_JOBS=yes` in your bash profile for faster dix
compilation. You'll need the newest packages for this. My most commonly
run compilation now takes 60s instead of 90s.
Please try it out and report back if you find any bugs.
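A minimal sketch of the TL;DR step, assuming bash and a profile file such as ~/.bashrc. The helper name `enable_lt_jobs` is made up for illustration; only the `export LT_JOBS=yes` line itself comes from the message above. It appends the line once and does nothing if it is already there:

```shell
# Append `export LT_JOBS=yes` to a profile file unless it is already present.
# Usage: enable_lt_jobs [PROFILE]   (defaults to ~/.bashrc, an assumption)
enable_lt_jobs() {
    profile="${1:-$HOME/.bashrc}"
    touch "$profile"
    # -x matches the whole line, -F treats the pattern as a fixed string.
    grep -qxF 'export LT_JOBS=yes' "$profile" ||
        printf '%s\n' 'export LT_JOBS=yes' >> "$profile"
}
```

After running it, open a new shell (or source the profile) so the variable is set before the next compile.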
In more depth: The newest package of lttoolbox in
congrats!
> Theoretically, if I am able to get through the sysadmins, this is “all” we need for
> Suse?
> # Nightly, unstable, new, almost always use this:
> curl -sS https://apertium.projectjj.com/rpm/install-nightly.sh | sudo bash
>
> —OR—
>
> # Release, stable, old:
> curl -sS
it with just lttoolbox.jar. The
easiest option is to just shell out to the regular Apertium pipeline.
Is this meant to run on Android or something since you're looking at
Java?
best regards,
Kevin Brubeck Unhammer
> Thank you Kevin
>
> I was asked to find offline translator for a tomcat based webapp.
> Tomcat itself is running on linux (development on win and mac). Our
> goal was with pure java to get less integration issues with system
> administration and to keep all logic in the same (WAR) file. But know
> === Candidates:
> Do you want to be a PMC member? Speak up!
I do.
-Kevin
Hi,
Forwarding
https://giella.zulipchat.com/#narrow/stream/124588-all_langs/topic/IDIL.20langtech.20network
> The Norwegian Ministry of Local Government and Regional Development together
> with the Norwegian Sámi Parliament are going to use the International Decade of
> Indigenous Languages
Xavi Ivars wrote:
> But also, voting just to confirm (or also push back?) the only group of
> people that volunteered seems a bit useless.
>
> Maybe if someone outside the PMC gave their opinion, voting would make more
> sense. But so far, it's been only the ones in the PMC (+ Sushain +
Hi,
I've tagged new releases of nno, nob and apertium-nno-nob.
As before, work has been funded by the Norwegian Ministry of Culture via
Nynorsk pressekontor (NPK) and the Norwegian News Agency, with commits
from contributors Mari, Anja, Maria, Victoria and Hallvard of NPK.
One major change is
Goddag,
I've just tagged new releases of swe-nor and dan-nor.
The work on swe-nor is partially funded by the Norwegian News Agency,
and dan-nor by Store norske leksikon.
For both pairs, all directions now use apertium-separable (lsx) and
recursive transfer (rtx), with testing by
I'll mentor :)
--
Kevin Brubeck Unhammer
> GSoC 2023 org application is open, but do we have mentors for this year?
> Please report in if you want to mentor.
>
> And as every year, please review
> https://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code -
> add/
Hi,
Those sound like great projects. There's no limit to how many projects
can be related to one language; proposals are ranked based on things like
student application/experience/involvement, mentor availability, and how
well the proposal fits with the Apertium project's overall goals and how
> As far as rewriting the
> transfer rules using apertium-recursive is concerned, a co-mentor with
> experience in the module would be highly desirable.
I can try to assist :)
Congrats on the release!
And that documentation is impressive :)
> 1) We have a serious problem in the translation from Gascon into French.
> The basic issue is that some Gascon speakers use something called
> enunciatives and others do not. These enunciatives, when they are used, are
> found
What if you do
lt-proc oci.automorf.bin | cg-proc enondetect.rlx.bin | cg-proc oci.rlx.bin | …
The first CG step would output a stream variable, so that what the next
step sees is
[]
^que/que/que$
[more text here]
If the next step is CG, it's just
REMOVE:var-is-set (enon) IF (0
Hèctor Alòs i Font
wrote:
> Enunciatives are a kind of adverbs that are put just before verbs in main
> clauses (although they can also be found in subordinate clauses too). For
> affirmative clauses, it works like the English reinforcement "do" in "I do
> like", but it is syntactically
Daniel Swanson
wrote:
> Greetings Apertiumers!
>
> This morning I set out to change the Ancient Hebrew analyzer from
> Latin script to Hebrew script (a task I don't wish upon anyone) and in
> the process produced a search-and-replace tool that understands the
> structure of several of our source
> Since there was no coding challenge mentioned on the wiki,
> I'm assuming that there is none
The task just links to the issue tracker
https://github.com/apertium/apertium-python/issues
so I'm guessing a good place to start would be to
1. try out the api, see what features are there
2. try
ine.
(Think about how this will be integrated into apertium – we have a
translation pipeline which expects a certain format
https://wiki.apertium.org/wiki/Apertium_stream_format )
best regards,
Kevin Brubeck Unhammer
Daniel Swanson
wrote:
> To be clear, I meant splitting into .
> One of my ideals for the tagset is that every tag be
> position-independent, so that the only reason I need to care about
> order is because of FST topology (and maybe not even then).
Aren't the tags themselves already
> I'd like to participate in Google Summer of Code 2023 at Apertium.
> In particular, I'm interested in adding a new language pair, and I am
> thinking of adding Japanese-English as I speak Japanese. I took a summer
> school at Tokyo University online on natural language processing
> before.
> Could you
> I use Metawiki as a translator. Often I find that the English word
> 'for' is translated by Apertium with 'partorisca'. This Italian word
> is a verb meaning 'to give birth'. The correct translation is 'per'.
> Is it possible to fix it?
According to Kartik Mistry, apertium-eng-ita shouldn't be
Hi,
Cf. http://tinodidriksen.com/pisg/OFTC/logs/%23hfst/2023-02-28.log
perhaps you can make an xfst rule to do the equivalent of
sed 's/\(.*\)/\1/'
?
new lexical selection
- 34 new separable/mwe entries
best regards,
Kevin Brubeck Unhammer
> Greetings Apertiumers!
>
> I recently identified a way that apertium-preprocess-transfer was
> being rather inefficient and today I fixed it, so tomorrow you all
> should be able to update to apertium 3.9.4 and see some improved
> compile times for any pairs not using apertium-recursive, with
>
Congrats, that's great =D
> Dear all,
>
> The pairs Spanish-Aragonese and Aragonese-Catalan are ready to release
> (can anyone tag them?)
>
> apertium-spa-arg 0.6.0 (commit 61048e9) depends on apertium-spa
> (commit d2455cf, needs new tag) and apertium-arg 0.2.0 (commit
> 0b9f06e).
>
>
> I am looking at this again. Removing the extra tag at the transfer
> stage seems to be too late down the pipeline (I need the adjective to
> match the noun which is done by CG). Actually, surely removing the
> extra tag could be done at the same CG stage?
If you use an xfst rule, that happens
Try https://sourceforge.net/projects/apertium/lists/apertium-stuff/unsubscribe
> [CC: -stuff and PMC]
>
> Should we apply for Google Summer of Code this year? Deadline Feb 6th.
>
> -- Tino Didriksen
I'd be happy to mentor at least. Some projects that I personally would
love to see happen:
* More dictionaries and language data! Whether from scratch or converting
sources
*
> Yes, even if you already registered last year. We just got a warning that
> we only have 1 admin (me), even though I was sure we had 3. So,
> https://summerofcode.withgoogle.com/
>
> "Before you can add an Org Member who has participated in previous programs
> to your organization for 2024, they
Check your inbox (and spam folder if not there) for a password.
> On Tue, 4 Jun 2024 at 17:44, Aure Séguier wrote:
>> Is it possible to make disambiguation rules specific to one
>> variety? For example, in Gascon, we have the enunciatives ("que", "ne", etc.)
>> that don't exist in the other varieties. If we change the system for
>> managing the
> Occitan can manage variety in its metadix file. My question is, is
> there a way to manage variety in the .rlx file ?
There is :)
> For instance, we have the word "bad", "evil" which is "mal" in
> lengadocian and "mau" in gascon. But "mau" can also be a conjugated
> verb (a pretty rare one). I
> How can I define src_lengadocian as the variable that means the source
> language is lengadocian ?
Hm, it kind of depends. In general, if you use variables, you can do
export AP_SETVAR=src_lengadocian
echo mau o mal | apertium -d . oci-fra
and that variable will be available to the