Yup, if you see the transfer output as well, prepositions fail as transfer
matches "pr" and not "pr.*". Hence, all FSTs will be ignoring secondary
tags and there will be a separate matching mechanism for secondary tags.
The problem with treating secondary tags like primary tags is that
secondary
I think what's written in the proposal is to have pattern matching FSTs
skip secondary tags (in this case a small modification to lrx-proc).
It was suggested that matching secondary tags would end up as some sort of
hash table lookup separate from the FSTs, but I think it could also work to
just
The main thing I worry about here is lrx rules.
Currently a lot of pairs have rules that match e.g. tags="adj", but not
necessarily tags="adj.*". So something that's normally hargle might
now be hargle, and that means the lrx rule won't match.
Since we want this to be backwards-compatible
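To illustrate the lrx concern, here is a minimal sketch. It is not lrx-proc's actual matcher. It assumes secondary tags are written as <feature:value>; the `sf` feature, the sample lexical unit and the helper names are all hypothetical. It treats tags="adj" as an exact match and tags="adj.*" as a prefix match.

```python
import re

# Matches the contents of every <...> tag in a lexical unit.
TAG_RE = re.compile(r"<([^<>]+)>")


def all_tags(lexical_unit):
    """Return every tag, primary and secondary alike."""
    return TAG_RE.findall(lexical_unit)


def primary_tags(lexical_unit):
    """Return only primary tags, skipping hypothetical <feature:value> tags."""
    return [t for t in TAG_RE.findall(lexical_unit) if ":" not in t]


def tags_match(pattern, tags):
    """Simplified lrx-style match (assumption, not the real implementation):
    'adj' requires exactly [adj]; 'adj.*' allows any trailing tags."""
    parts = pattern.split(".")
    if parts[-1] == "*":
        prefix = parts[:-1]
        return tags[:len(prefix)] == prefix
    return tags == parts


unit = "^big<adj><sf:Big>$"  # hypothetical unit carrying a secondary tag

# Treating the secondary tag like a primary one breaks the exact rule...
print(tags_match("adj", all_tags(unit)))
# ...while skipping secondary tags keeps the existing rule working.
print(tags_match("adj", primary_tags(unit)))
```

Under these assumptions, a rule written with tags="adj" keeps matching only if the matcher skips secondary tags, which is the backwards-compatibility point being made here.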
Hey Francis,
I agree that it does seem like a solution searching for a problem if we
look at it in isolation. But it's important to look at this in the context
of eliminating trimming. Chronologically, this project was first, and
still is, about eliminating dictionary trimming. Modification
In a nutshell, by using the source analysis for disambiguation and
transfer, we make the translation output better, and by outputting the
source surface form instead of the source lemma, we make the output more
comprehensible, or post-editable.
Tanmai
On Tue, Apr 21, 2020 at 12:19 AM Tanmai
On 2020-04-20 19:21, Daniel Swanson wrote:
Another way of putting this is that it looks like a technical solution
in search of a problem, rather than a problem description in search
of a solution.
To me the most obvious thing to do with it is to put markup
information in secondary tags as
> Another way of putting this is that it looks like a technical solution
> in search of a problem, rather than a problem description in search
> of a solution.
To me the most obvious thing to do with it is to put markup information in
secondary tags as a way of solving the superblank reordering
On 2020-04-20 19:14, Francis Tyers wrote:
On 2020-04-20 19:05, Tanmai Khanna wrote:
Hey guys,
When I proposed the modification to the Apertium stream format
earlier, it was rightly pointed out to be a bit premature and not
coupled with adequate justification. As part of preparation for my
On 2020-04-20 19:05, Tanmai Khanna wrote:
Hey guys,
When I proposed the modification to the Apertium stream format
earlier, it was rightly pointed out to be a bit premature and not
coupled with adequate justification. As part of preparation for my
project, I have tried to document the
Hey guys,
When I proposed the modification to the Apertium stream format earlier, it
was rightly pointed out to be a bit premature and not coupled with adequate
justification. As part of preparation for my project, I have tried to
document the modification in a robust way, such that it makes it
Just to clarify, in this original example:
^potato/patata$
case refers to capitalisation. Morphological case already has a tag, which
would be primary information so this wouldn't touch that at all. So if it
felt like we're changing the format, we're not, and this would continue
to be backwards
Instead of looking at this as modifying or extending the apertium stream
format, we could look at this as making tags more versatile by creating a
new kind of tags which have a feature:value pair. That's all there is to
it, really. In effect, it allows us to pass an arbitrary amount of info in
the
Hi Mikel,
> (0) No change should be made without proper regression testing. I think we
> all agree on that!
>
Definitely, and this is something I'll add in the proposal.
> (1) I still believe that the functionality should be proven without
> rewriting the (critical) format parsing portions in
Folks:
A quick round of comments after the responses by Tino, Xavi, Tanmai,
and Fran. Did I miss anyone?
(00) I cannot claim to have thoroughly considered all of the details of
the proposal. Therefore, I can change my mind.
(0) No change should be made without proper regression testing.
Mikel,
This is a preliminary idea and a suggestion that we discussed only
yesterday, but I assure you that it will be justified in an uncontestable
way before even one line of code is written. Ensuring backwards
compatibility is of utmost importance, and because of this, in the proposal
to modify
It's all transparent. Nobody has to add secondary information to the
stream. All current pipes will continue to work as-is, unmodified. All old
data and files remain valid.
The work is to allow for arbitrary secondary information to be added to the
stream. Initially for use with surface forms, so
Message from Mikel L. Forcada on Sun, 29 March 2020
at 12:22:
> Folks:
>
> The elders in Apertium will not be surprised if I voice my opposition to
> changing the Apertium formats used between different modules
> of the pipeline. In any case, this affects the core
On 2020-03-29 11:21, Mikel L. Forcada wrote:
Folks:
The elders in Apertium will not be surprised if I voice my opposition
to changing the Apertium formats used between different
modules of the pipeline. In any case, this affects the core
functionality of Apertium in many
Folks:
The elders in Apertium will not be surprised if I voice my opposition
to changing the Apertium formats used between different
modules of the pipeline. In any case, this affects the core
functionality of Apertium in many ways and its need should be justified
in an
No dixes will be harmed during this procedure. Nobody has to touch any
existing language files for this work to be incredibly useful. The proposal
is to allow the stream to carry secondary information. This secondary
information can come from anywhere, and will mostly be dynamic.
Initially, the
I apologise, it seems like the link got removed when the message was sent.
Here it is:
http://wiki.apertium.org/wiki/User:Khannatanmai/GSoC2020Proposal_Trimming
Thanks
Tanmai
On Sun, Mar 29, 2020 at 3:11 PM Tanmai Khanna
wrote:
> Hey guys,
> Here's a draft proposal for this project. Any comments
Hey guys,
Here's a draft proposal <http://wiki.apertium.org/wiki/User:Khannatanmai/GSoC2020Proposal_Trimming>
for this project. Any comments will be appreciated :)
Thanks,
Tanmai
On Sun, Mar 29, 2020 at 12:52 PM Tanmai Khanna
wrote:
> Hi Hèctor,
> A fundamental motivation for this proposal is
Hi Hèctor,
A fundamental motivation for this proposal is the possibility of giving the
power to each program to use and propagate as much information as it needs
in the pipeline. In our discussion on the IRC, Tino Didriksen said:
> You should see how much secondary information VISL's streams
Hi Tanmai,
I am surprised by this proposal. It involves some very important changes
that should be better justified. I don't quite understand when one should
define the "optional secondary information" in addition to the current
morphological fields. Will it be in the language module
I think you could reasonably consider it consistent, just with primary
information having an empty prefix, which makes sense, given that it is
primary.
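The "empty prefix" reading can be sketched as follows. This is only an illustration under assumptions: the <feature:value> syntax, the `case:Aa` value and the function name `parse_tags` are hypothetical and not part of any agreed format. Every tag becomes a (feature, value) pair, and primary tags simply get the empty feature.

```python
import re

# Matches the contents of every <...> tag in a lexical unit.
TAG_RE = re.compile(r"<([^<>]+)>")


def parse_tags(lexical_unit):
    """Split each tag into a (feature, value) pair.

    Primary tags such as <n> carry no prefix, so they are stored with
    the empty feature "". Hypothetical secondary tags such as <case:Aa>
    are split at the first colon.
    """
    pairs = []
    for tag in TAG_RE.findall(lexical_unit):
        feature, sep, value = tag.partition(":")
        if sep:
            pairs.append((feature, value))
        else:
            pairs.append(("", tag))
    return pairs


pairs = parse_tags("^potato<n><sg><case:Aa>$")
print(pairs)

# Recover the primary tags by selecting the empty feature.
print([value for feature, value in pairs if feature == ""])
```

Seen this way, existing streams stay valid as they are: they contain only tags with the empty feature.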
On Sat, Mar 28, 2020 at 6:00 PM Scoop Gracie wrote:
> Oh, okay, that makes sense. I was also thinking it might make it easier
> for humans to
Oh, okay, that makes sense. I was also thinking it might make it easier for
humans to debug the format.
On Sat, Mar 28, 2020, 14:55 Tanmai Khanna wrote:
> Scoopgracie,
> We discussed something similar to this on the IRC. While doing that would
> make things very consistent, it would become too
Scoopgracie,
We discussed something similar to this on the IRC. While doing that would
make things very consistent, it would become too verbose, which is why it
might be easier to not have the feature:value format for primary
information, i.e., information that's almost always going to be there,
That sounds like a great idea to me. Maybe could even become ?
On Sat, Mar 28, 2020, 13:51 Tanmai Khanna wrote:
> Hey guys,
> As part of the project to eliminate trimming, I had to come up with a way
> to include the surface form in the lexical unit and hence modify the
> apertium stream
Or =
On Sat, Mar 28, 2020, 13:58 Scoop Gracie wrote:
> That sounds like a great idea to me. Maybe could even become ?
>
> On Sat, Mar 28, 2020, 13:51 Tanmai Khanna wrote:
>
>> Hey guys,
>> As part of the project to eliminate trimming, I had to come up with a way
>> to include the surface form
Hey guys,
As part of the project to eliminate trimming, I had to come up with a way
to include the surface form in the lexical unit and hence modify the
apertium stream format. To do this I would have to modify the parsers of
every program in the pipeline, and if that has to happen, we